Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-06 Thread via GitHub


alamb commented on PR #34:
URL: https://github.com/apache/parquet-site/pull/34#issuecomment-2152291093

   > > I think it would also help to define what is required of an 
implementation to have a "check" in the corresponding feature row.
   > 
   > I worry about this becoming a little bit of a pedantic discussion. My 2 
cents: I think a reasonable approach is to let maintainers decide on checking 
the box or not. Maybe we can have a ternary value. If a maintainer feels it is 
fully supported it is a check. If they think reasonable people might be 
confused it gets '-' or check without footnote to explain the exception. If it 
is completely not supported it gets an 'X'. We can always adjust criteria as we 
get feedback.
   
   I agree that it would be best to start with a relatively lax, low-barrier 
approach to self-reporting.
   
   Over time we can add more stringency (ideally with automated checks) 
if/when that would add value.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-05 Thread via GitHub


emkornfield commented on PR #34:
URL: https://github.com/apache/parquet-site/pull/34#issuecomment-2151393203

   > I think it would also help to define what is required of an implementation 
to have a "check" in the corresponding feature row.
   
   I worry about this becoming a little bit of a pedantic discussion.  My 2 
cents: I think a reasonable approach is to let maintainers decide on checking 
the box or not.  Maybe we can have a ternary value.  If a maintainer feels it 
is fully supported it is a check.  If they think reasonable people might be 
confused it gets '-' or check without footnote to explain the exception.  If it 
is completely not supported it gets an 'X'.  We can always adjust criteria as 
we get feedback.
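
The ternary value described above could be modeled roughly as follows. This is only a sketch: the `Support` enum, the `render_row` helper, the "✓" glyph standing in for a check, and the feature name are all hypothetical and not part of the PR.

```python
from enum import Enum


class Support(Enum):
    """Tri-state support level that a maintainer self-reports for one feature."""

    FULL = "✓"     # maintainer considers the feature fully supported
    PARTIAL = "-"  # reasonable people might be confused; a footnote can explain
    NONE = "X"     # completely unsupported


def render_row(feature, statuses):
    """Render one Markdown table row: the feature, then one cell per implementation."""
    cells = [feature] + [status.value for status in statuses]
    return "| " + " | ".join(cells) + " |"


# Hypothetical feature name and values, for illustration only.
print(render_row("Example feature", [Support.FULL, Support.PARTIAL, Support.NONE]))
```

Keeping the cell values in one enum would make it easy to adjust the criteria later, as suggested above, without touching the table-rendering code.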





[PR] DRAFT: PARQUET-2489: Strawman proposal for releases [parquet-site]

2024-06-05 Thread via GitHub


emkornfield opened a new pull request, #61:
URL: https://github.com/apache/parquet-site/pull/61

   (no comment)





Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-04 Thread via GitHub


alippai commented on PR #34:
URL: https://github.com/apache/parquet-site/pull/34#issuecomment-2147981789

   I like both!





Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-04 Thread via GitHub


alamb commented on PR #34:
URL: https://github.com/apache/parquet-site/pull/34#issuecomment-2147973327

   >  Totally agree, thanks for the guide. What do you think about the 
non-Apache or other projects? (Duckdb, fastparquet, impala, cudf)
   
   Echoing what @pitrou  said, I suggest we add a note somewhere that "this 
table includes any open source currently maintained implementation of parquet 
whose maintainers have helped fill it out. If you wish to add a new 
implementation to this table, please open a PR to do so"
   
   It might also be worth adding a column with some sort of placeholder (`?` 
for example) for those implementations (as a way of encouraging their help). 
However, that might be better done as a follow-on PR.





Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-04 Thread via GitHub


pitrou commented on PR #34:
URL: https://github.com/apache/parquet-site/pull/34#issuecomment-2147379889

   IMHO, any currently maintained open source implementation of Parquet 
deserves mentioning there. But that also requires involvement from their 
respective maintainers (we shouldn't expect us Parquet maintainers to make sure 
the information is up to date).





Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-06-04 Thread via GitHub


alippai commented on PR #34:
URL: https://github.com/apache/parquet-site/pull/34#issuecomment-2147369156

   Totally agree, thanks for the guide. What do you think about the non-Apache 
or other projects? (Duckdb, fastparquet, impala, cudf)





Re: [PR] Updating refereces from parquet-mr -> parquet-java [parquet-site]

2024-05-22 Thread via GitHub


emkornfield merged PR #60:
URL: https://github.com/apache/parquet-site/pull/60





Re: [PR] Updating refereces from parquet-mr -> parquet-java [parquet-site]

2024-05-22 Thread via GitHub


emkornfield commented on PR #60:
URL: https://github.com/apache/parquet-site/pull/60#issuecomment-2125731630

   LGTM, thank you @vinooganesh 





Re: [PR] Updating refereces from parquet-mr -> parquet-java [parquet-site]

2024-05-22 Thread via GitHub


alamb commented on code in PR #60:
URL: https://github.com/apache/parquet-site/pull/60#discussion_r1610633755


##
content/en/_index.html:
##
@@ -24,8 +24,8 @@
 {{% /blocks/feature %}}
 
 
-{{% blocks/feature icon="fab fa-github" title="Contributions welcome!" 
url="https://github.com/apache/parquet-mr" %}}
-We do a [Pull Request](https://github.com/apache/parquet-mr/pulls) 
contributions workflow on **GitHub**. New users are always welcome!
+{{% blocks/feature icon="fab fa-github" title="Contributions welcome!" 
url="https://github.com/apache/parquet-java" %}}

Review Comment:
   I think it is something to revisit as a follow on PR






Re: [PR] Updating refereces from parquet-mr -> parquet-java [parquet-site]

2024-05-22 Thread via GitHub


alamb commented on code in PR #60:
URL: https://github.com/apache/parquet-site/pull/60#discussion_r1610633516


##
README.md:
##
@@ -63,7 +63,7 @@ You can now preview the site locally on http://localhost:1313/
 
 To create documentation for a new release of `parquet-format` create a new 
.md file under `content/en/blog/parquet-format`. Please see 
existing files in that directory as an example.
 
-To create documentation for a new release of `parquet-mr` create a new 
.md file under `content/en/blog/parquet-mr`. Please see existing 
files in that directory as an example.
+To create documentation for a new release of `parquet-java` create a new 
.md file under `content/en/blog/parquet-java`. Please see 
existing files in that directory as an example.

Review Comment:
   I don't feel strongly






Re: [PR] Updating refereces from parquet-mr -> parquet-java [parquet-site]

2024-05-21 Thread via GitHub


vinooganesh commented on code in PR #60:
URL: https://github.com/apache/parquet-site/pull/60#discussion_r1609027971


##
content/en/_index.html:
##
@@ -24,8 +24,8 @@
 {{% /blocks/feature %}}
 
 
-{{% blocks/feature icon="fab fa-github" title="Contributions welcome!" 
url="https://github.com/apache/parquet-mr" %}}
-We do a [Pull Request](https://github.com/apache/parquet-mr/pulls) 
contributions workflow on **GitHub**. New users are always welcome!
+{{% blocks/feature icon="fab fa-github" title="Contributions welcome!" 
url="https://github.com/apache/parquet-java" %}}

Review Comment:
   That's a great point. This was actually what I was torn about the most when 
I first built the new site. I figured more people would want to contribute to 
`mr` than `format` (there's actual code in the former), so I went with `mr` 
everywhere. More than happy to revisit this as it was mostly just a guess on my 
part. 






Re: [PR] Updating refereces from parquet-mr -> parquet-java [parquet-site]

2024-05-21 Thread via GitHub


vinooganesh commented on code in PR #60:
URL: https://github.com/apache/parquet-site/pull/60#discussion_r1609020796


##
README.md:
##
@@ -63,7 +63,7 @@ You can now preview the site locally on http://localhost:1313/
 
 To create documentation for a new release of `parquet-format` create a new 
.md file under `content/en/blog/parquet-format`. Please see 
existing files in that directory as an example.
 
-To create documentation for a new release of `parquet-mr` create a new 
.md file under `content/en/blog/parquet-mr`. Please see existing 
files in that directory as an example.
+To create documentation for a new release of `parquet-java` create a new 
.md file under `content/en/blog/parquet-java`. Please see 
existing files in that directory as an example.

Review Comment:
   Ah I see the confusion - these notes have to do with updating the website 
to announce the new release: https://parquet.apache.org/blog/. So the flow would be
   1. Make a release of parquet-java in that repo
   2. Put up a blog post entry on the website containing the release information
   
   Happy to remove this if folks feel strongly - but was thinking it may be 
good to have some instructions on how to actually make the post. 
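
As a rough illustration of step 2, the location of a new release post could be derived like this. The directory layout comes from the README text quoted above; the `blog_post_path` helper, the `<version>.md` file name, and the version string are assumptions made for this sketch and are not the site's actual convention.

```python
from pathlib import PurePosixPath

# Blog sections named in the README; anything else is rejected.
KNOWN_PROJECTS = {"parquet-format", "parquet-java"}


def blog_post_path(project, version):
    """Return the path of the .md file announcing a release of `project`."""
    if project not in KNOWN_PROJECTS:
        raise ValueError(f"unknown project: {project}")
    # Naming the file "<version>.md" is an assumption; check the existing
    # files in the directory for the real pattern.
    return PurePosixPath("content/en/blog") / project / f"{version}.md"


# Placeholder version number, for illustration only.
print(blog_post_path("parquet-java", "0.0.0"))
```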
   









Re: [PR] Updating refereces from parquet-mr -> parquet-java [parquet-site]

2024-05-21 Thread via GitHub


vinooganesh commented on PR #60:
URL: https://github.com/apache/parquet-site/pull/60#issuecomment-2123541356

   Thanks @alamb!
   
   > Should we also do a sweep and update the contribution guidelines to 
include the new ways to contribute? 
   > I am not sure what this is asking
   
   Sorry -- this is a typo on my side. I meant to include a new contribution 
template (edited the message above). It was a response to this thread: 
https://lists.apache.org/thread/5oohcx3m16kqs8dmtl3vm1cgd8z0q10b. 
   
   It's probably worth having separate release announcement templates for 
`parquet-format` and `parquet-java`.
   
   





Re: [PR] Updating refereces from parquet-mr -> parquet-java [parquet-site]

2024-05-21 Thread via GitHub


alamb commented on code in PR #60:
URL: https://github.com/apache/parquet-site/pull/60#discussion_r1608995912


##
content/en/_index.html:
##
@@ -24,8 +24,8 @@
 {{% /blocks/feature %}}
 
 
-{{% blocks/feature icon="fab fa-github" title="Contributions welcome!" 
url="https://github.com/apache/parquet-mr" %}}
-We do a [Pull Request](https://github.com/apache/parquet-mr/pulls) 
contributions workflow on **GitHub**. New users are always welcome!
+{{% blocks/feature icon="fab fa-github" title="Contributions welcome!" 
url="https://github.com/apache/parquet-java" %}}

Review Comment:
   As part of another PR perhaps we should revisit this link (perhaps it should 
link to parquet-format?) as again linking to the java implementation from the 
homepage might be more confusing than helpful



##
README.md:
##
@@ -63,7 +63,7 @@ You can now preview the site locally on http://localhost:1313/
 
 To create documentation for a new release of `parquet-format` create a new 
.md file under `content/en/blog/parquet-format`. Please see 
existing files in that directory as an example.
 
-To create documentation for a new release of `parquet-mr` create a new 
.md file under `content/en/blog/parquet-mr`. Please see existing 
files in that directory as an example.
+To create documentation for a new release of `parquet-java` create a new 
.md file under `content/en/blog/parquet-java`. Please see 
existing files in that directory as an example.

Review Comment:
   I would personally suggest moving the discussion about releases of 
`parquet-mr`/`parquet-java` to that repo. It seems confusing to have 
instructions on how to do a release from another repo in `parquet-site`.



##
content/en/docs/Overview/_index.md:
##
@@ -18,14 +18,14 @@ The parquet-format repository hosts the official 
specification of the Apache Par
 As a repository focused on specification, the parquet-format repository does 
not contain source code. 
 
 
-### parquet-mr
+### parquet-java

Review Comment:
   I agree adding a note like this would be clearer
   
   ```
   The parquet-java repository(previously named `parquet-mr`) is part of the 
Apache Parquet project and specifically focuses on providing Java tools for
   ```






Re: [PR] Updating refereces from parquet-mr -> parquet-java [parquet-site]

2024-05-21 Thread via GitHub


alamb commented on PR #60:
URL: https://github.com/apache/parquet-site/pull/60#issuecomment-2123503469

   > @alamb @wgtmac I put a very basic PR together to update _some_ of the 
references on the website from `parquet-mr` to `parquet-java`. I only chose to 
do some because I think we have a few questions to figure out first:
   > 
   > 1. Are we going to change the published artifact name of `parquet-mr` to 
`parquet-java` or do we just want to keep publishing under mr?
   
   I personally suggest not making this change unless there is a compelling 
use case. It seems like it doesn't hurt to leave the artifacts as parquet-mr, and 
updating them now would only cause downstream pain for very little gain.
   
   > 2. Do we want to actually "rewrite history" and update the past references 
(contributions, etc..) in the docs to refer to `parquet-java` instead? I'm not 
a fan of rewriting history but figured I'd start a conversation just in case 
people want to.
   
   I recommend against doing this, again on the justification of "what benefit 
would we get from it"?
   
   > 3. Should we also do a sweep and update the contribution guidelines to 
include the new ways to contribute?
   
   I am not sure what this is asking
   
   > 4. Should we introduce a new section of the blog called `parquet-java` (I 
had been hacking using the blog for releases) to add a note (assuming we change 
the name of the artifact) that things have changed?
   
   Maybe we could create a blog post announcing some of the recent changes / 
activity (e.g. discussions on the V3 format, clarifications on repos, new website, 
etc).





Re: [PR] Updating refereces from parquet-mr -> parquet-java [parquet-site]

2024-05-21 Thread via GitHub


emkornfield commented on code in PR #60:
URL: https://github.com/apache/parquet-site/pull/60#discussion_r1608936158


##
content/en/docs/Overview/_index.md:
##
@@ -18,14 +18,14 @@ The parquet-format repository hosts the official 
specification of the Apache Par
 As a repository focused on specification, the parquet-format repository does 
not contain source code. 
 
 
-### parquet-mr
+### parquet-java

Review Comment:
   maybe note here that this was previously referred to as parquet-mr due to 
the name of the repository (which has also been moved)?






[PR] Updating refereces from parquet-mr -> parquet-java [parquet-site]

2024-05-21 Thread via GitHub


vinooganesh opened a new pull request, #60:
URL: https://github.com/apache/parquet-site/pull/60

   @alamb @wgtmac I put a very basic PR together to update *some* of the 
references on the website from `parquet-mr` to `parquet-java`. I only chose to 
do some because I think we have a few questions to figure out first:
   
   1. Are we going to change the published artifact name of `parquet-mr` to 
`parquet-java` or do we just want to keep publishing under mr?
   2. Do we want to actually "rewrite history" and update the past references 
(contributions, etc..) in the docs to refer to `parquet-java` instead? I'm not 
a fan of rewriting history but figured I'd start a conversation just in case 
people want to.
   3. Should we also do a sweep and update the contribution guidelines to 
include the new ways to contribute?
   4. Should we introduce a new section of the blog called `parquet-java` (I 
had been hacking using the blog for releases) to add a note (assuming we change 
the name of the artifact) that things have changed? 





Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-16 Thread via GitHub


julienledem commented on PR #59:
URL: https://github.com/apache/parquet-site/pull/59#issuecomment-2116406982

   Thanks!





Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-16 Thread via GitHub


alamb commented on PR #59:
URL: https://github.com/apache/parquet-site/pull/59#issuecomment-2115741244

   Thanks @wgtmac 





Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-16 Thread via GitHub


wgtmac commented on PR #59:
URL: https://github.com/apache/parquet-site/pull/59#issuecomment-2115380985

   Let me merge this. Thanks everyone!





Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-16 Thread via GitHub


wgtmac merged PR #59:
URL: https://github.com/apache/parquet-site/pull/59





Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-15 Thread via GitHub


vinooganesh commented on code in PR #59:
URL: https://github.com/apache/parquet-site/pull/59#discussion_r1602060617


##
content/en/docs/Overview/_index.md:
##
@@ -6,11 +6,11 @@ description: >
   All about Parquet.
 ---
 
-Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+Apache Parquet is an open source, column-oriented data file format designed 
for efficient data storage and retrieval.
+It provides high performance compression and encoding schemes to handle 
complex data in bulk and is supported in many programming language and 
analytics tools.

Review Comment:
   No really strong feelings, was just wondering if there was a subtextual 
focus intended






Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-15 Thread via GitHub


alamb commented on code in PR #59:
URL: https://github.com/apache/parquet-site/pull/59#discussion_r1602019285


##
content/en/docs/Overview/_index.md:
##
@@ -6,11 +6,11 @@ description: >
   All about Parquet.
 ---
 
-Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+Apache Parquet is an open source, column-oriented data file format designed 
for efficient data storage and retrieval.
+It provides high performance compression and encoding schemes to handle 
complex data in bulk and is supported in many programming language and 
analytics tools.

Review Comment:
   I didn't mean for the comma or lack thereof to carry any additional 
semantic meaning. I am happy to put a comma there if you like.






Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-15 Thread via GitHub


vinooganesh commented on code in PR #59:
URL: https://github.com/apache/parquet-site/pull/59#discussion_r1601981340


##
content/en/docs/Overview/_index.md:
##
@@ -6,11 +6,11 @@ description: >
   All about Parquet.
 ---
 
-Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+Apache Parquet is an open source, column-oriented data file format designed 
for efficient data storage and retrieval.
+It provides high performance compression and encoding schemes to handle 
complex data in bulk and is supported in many programming language and 
analytics tools.

Review Comment:
   Did we mean for this to say "high performance compression" or is it "high 
performance, compression"? I think it may be the latter. Or maybe "It provides 
performant compression and encoding schemes..." I was thinking the first 
versions sound too much like the compression tool rather than the format 









Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-15 Thread via GitHub


alamb commented on code in PR #59:
URL: https://github.com/apache/parquet-site/pull/59#discussion_r1601975186


##
content/en/_index.md:
##
@@ -9,7 +9,10 @@ title: Parquet
 
   Download 
 
-Apache Parquet is a columnar storage format available to 
any project in the Hadoop ecosystem, regardless of the choice of data 
processing framework, data model or programming language.
+

Review Comment:
   Text is updated






Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-15 Thread via GitHub


alamb commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1601930883


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,30 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### Parquet Format
+
+The "Parquet Format" project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### Parquet MR 
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+ Parquet MR can be thought of as a "reference" implementation of 
parquet-format. There are a number of other Parquet Format implementations, 
such as [parquet-cpp](https://github.com/apache/parquet-cpp) and [parquet 
rust](https://github.com/apache/arrow-rs/blob/master/parquet/README.md). 

Review Comment:
   To follow up -- we are discussing reference implementations on the mailing 
list: https://lists.apache.org/thread/f9379yx0lf5gtpkgyv922pvowtzy4kmm






Re: [PR] Remove staging [parquet-site]

2024-05-15 Thread via GitHub


alamb commented on PR #58:
URL: https://github.com/apache/parquet-site/pull/58#issuecomment-2112691416

   > Ah okay, so it does seem like we need to clear out the output directory of 
`asf-staging`: https://parquet.staged.apache.org/
   
   I suggest we make a PR that directly targets that branch that leaves a 
pointer to https://parquet.apache.org/ for anyone who stumbles on it





Re: [PR] Remove staging [parquet-site]

2024-05-15 Thread via GitHub


vinooganesh commented on PR #58:
URL: https://github.com/apache/parquet-site/pull/58#issuecomment-2112682297

   Ah okay, so it does seem like we need to clear out the output directory of 
`asf-staging`: https://parquet.staged.apache.org/





Re: [PR] Remove staging [parquet-site]

2024-05-15 Thread via GitHub


wgtmac merged PR #58:
URL: https://github.com/apache/parquet-site/pull/58





Re: [PR] Remove staging [parquet-site]

2024-05-15 Thread via GitHub


wgtmac commented on PR #58:
URL: https://github.com/apache/parquet-site/pull/58#issuecomment-2112664342

   Sure, let me merge this.





Re: [PR] Remove staging [parquet-site]

2024-05-15 Thread via GitHub


vinooganesh commented on PR #58:
URL: https://github.com/apache/parquet-site/pull/58#issuecomment-2112458273

   I think we can merge as is, if only to see what the behavior will be. It 
seems pretty low risk, especially since we still have the branches and can 
revert if needed. 





Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-15 Thread via GitHub


vinooganesh commented on PR #53:
URL: https://github.com/apache/parquet-site/pull/53#issuecomment-2112434241

   Thanks everyone!





Re: [PR] Remove staging [parquet-site]

2024-05-15 Thread via GitHub


wgtmac commented on PR #58:
URL: https://github.com/apache/parquet-site/pull/58#issuecomment-2112333016

   > > I think I can delete the staging branch. Before that, should we send a 
notice to the dev ML in case there is any objection? Maybe we can set a 
deadline and proceed after that. I don't think this worth a formal vote.
   > 
   > I agree -- either an email or JIRA ticket is probably good to record the 
rationale and decision in a more easily discoverable location
   > 
   > > BTW, should we do anything to remove the site: 
https://parquet.staged.apache.org/? Or it will be removed automatically after 
we are done?
   > 
   > I suggest we make one final PR to the staging branch to have it push some 
sort of notice or redirect so https://parquet.staged.apache.org/ redirects to 
https://parquet.apache.org/
   > 
   > I don't know how to remove it
   
   Do we want to make additional changes, or is it good to merge as is? 
@vinooganesh @alamb 
   
   I can try to delete the staging-related branches once you see fit.





Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-15 Thread via GitHub


wgtmac commented on PR #53:
URL: https://github.com/apache/parquet-site/pull/53#issuecomment-2112328673

   I just merged it. Thanks everyone!





Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-15 Thread via GitHub


wgtmac merged PR #53:
URL: https://github.com/apache/parquet-site/pull/53





Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-15 Thread via GitHub


alamb commented on PR #53:
URL: https://github.com/apache/parquet-site/pull/53#issuecomment-2112312510

   I think we should merge this PR and begin working on the next steps (feature 
compatibility matrix)
   
   This is quite an impressive list of ✅   
   
   ![Screenshot 2024-05-15 at 7 44 35 
AM](https://github.com/apache/parquet-site/assets/490673/690e6b2b-85e7-4253-abde-3cdae98eeb17)
   





Re: [PR] Remove staging [parquet-site]

2024-05-14 Thread via GitHub


wgtmac commented on PR #58:
URL: https://github.com/apache/parquet-site/pull/58#issuecomment-2111443901

   cc @gszadovszky @julienledem 





Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-14 Thread via GitHub


alamb commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1599985436


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,41 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+The parquet-mr repo contains a reference implementation of the Parquet format. 
There are a number of other Parquet format implementations, which are listed 
below. 
+
+Included in parquet-mr:
+* Java Implementation: It contains the core Java implementation of the Parquet 
format, making it possible to use Parquet files in Java applications, 
particularly those based on Hadoop.
+
+* Utilities and APIs: It provides various utilities and APIs for working with 
Parquet files, including tools for data import/export, schema management, and 
data conversion.
+
+
+###  Other Clients / Libraries / Tools
+
+The Parquet ecosystem is rich and varied, encompassing a wide array of tools, 
libraries, and clients, each offering different levels of feature support. It's 
important to note that not all implementations support the same features of the 
Parquet format. When integrating multiple Parquet implementations within your 
workflow, it is crucial to conduct thorough testing to ensure compatibility and 
performance across different platforms and tools.
+
+Here is a non-exhaustive list of Parquet implementations:
+
+* [parquet-mr](https://github.com/apache/parquet-mr)
+* [Parquet C++, a subproject of Arrow 
C++](https://github.com/apache/arrow/tree/main/cpp/src/parquet) 
([documentation](https://arrow.apache.org/docs/cpp/parquet.html))
+* [parquet go](https://github.com/apache/arrow/tree/main/go/parquet)

Review Comment:
   FYI this work is tracked by 
https://issues.apache.org/jira/browse/PARQUET-2310 and there is a draft at 
https://github.com/apache/parquet-site/pull/34. Once this PR gets merged we'll 
start working on that






Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-14 Thread via GitHub


crepererum commented on code in PR #59:
URL: https://github.com/apache/parquet-site/pull/59#discussion_r1599844055


##
content/en/docs/Overview/_index.md:
##
@@ -6,4 +6,7 @@ description: >
   All about Parquet.
 ---
 
-Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+Apache Parquet is an open source, column-oriented data file format designed 
for efficient data storage and retrieval. 
+It provides efficient data compression and encoding schemes with enhanced 
performance to handle complex data in bulk.
+Parquet is available in multiple languages including Java, C++, and Python.

Review Comment:
   I think mentioning implementations (both as end-user software and as libs) is 
valuable but shouldn't be part of the elevator pitch. Other formats usually 
solve this by a dedicated sub-section or page, e.g.:
   
   - https://jpeg.org/jpegxl/software.html (the list format is good, the fact 
that there's only a single implementation is not)
   - https://paseto.io/
   - https://autocrypt.org/dev-status.html
   
   This would also allow multiple implementations for a single language, which 
sometimes can be valuable (e.g. if you have a backwards compatible, 
conservative variant and a fancy new one).






Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-14 Thread via GitHub


alamb commented on code in PR #59:
URL: https://github.com/apache/parquet-site/pull/59#discussion_r1599769911


##
content/en/_index.md:
##
@@ -9,7 +9,10 @@ title: Parquet
 
   Download 
 
-Apache Parquet is a columnar storage format available to 
any project in the Hadoop ecosystem, regardless of the choice of data 
processing framework, data model or programming language.
+

Review Comment:
   My plan is to wait another day or so for any additional comments, and if I 
don't hear any I will update this PR so that all three locations use the 
generic "Parquet is supported in many programming languages and analytics 
tools." phrasing






Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-13 Thread via GitHub


amoeba commented on code in PR #59:
URL: https://github.com/apache/parquet-site/pull/59#discussion_r1599097041


##
content/en/_index.md:
##
@@ -9,7 +9,10 @@ title: Parquet
 
   Download 
 
-Apache Parquet is a columnar storage format available to 
any project in the Hadoop ecosystem, regardless of the choice of data 
processing framework, data model or programming language.
+

Review Comment:
   SGTM






Re: [PR] Remove staging [parquet-site]

2024-05-13 Thread via GitHub


vinooganesh commented on PR #58:
URL: https://github.com/apache/parquet-site/pull/58#issuecomment-2108802214

   I initially built this to publish to the `output/` dir per 
https://github.com/apache/infrastructure-asfyaml/tree/main?tab=readme-ov-file#pelican-sub-directories-for-static-output 
(see 
https://github.com/apache/parquet-site/blob/staging/.github/workflows/deploy.yml#L46).
 We may also want to publish an empty commit to the `output` dir of the 
`asf-staging` branch as a sanity-check cleanup. 





Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-13 Thread via GitHub


alamb commented on code in PR #59:
URL: https://github.com/apache/parquet-site/pull/59#discussion_r1599070791


##
content/en/_index.md:
##
@@ -9,7 +9,10 @@ title: Parquet
 
   Download 
 
-Apache Parquet is a columnar storage format available to 
any project in the Hadoop ecosystem, regardless of the choice of data 
processing framework, data model or programming language.
+

Review Comment:
   I think the generic sentiment of wide support is good  
   
   I tried to remove the specific Java/C++/Go tech references which are on 
https://projects.apache.org/project.html?parquet (comes from 
https://github.com/apache/parquet-site/blob/production/static/doap_Parquet.rdf 
it seems)
   
   What do you think about changing all three locations (DOAP, this, and the 
overview @etseidl  mentions in 
https://github.com/apache/parquet-site/pull/59/files#r1599056646) to use the 
more generic phrasing?
   
   
   > "Parquet is supported in many programming languages and analytics tools."
   






Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-13 Thread via GitHub


alamb commented on code in PR #59:
URL: https://github.com/apache/parquet-site/pull/59#discussion_r1599066379


##
content/en/_index.md:
##
@@ -9,7 +9,10 @@ title: Parquet
 
   Download 
 
-Apache Parquet is a columnar storage format available to 
any project in the Hadoop ecosystem, regardless of the choice of data 
processing framework, data model or programming language.
+
+Apache Parquet is an open source, column-oriented data file format designed 
for efficient data storage and retrieval. 
+It provides high performance data compression and encoding schemes with to 
handle complex data in bulk.

Review Comment:
   nice catch -- I actually did this locally but forgot to push the change






Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-13 Thread via GitHub


amoeba commented on code in PR #59:
URL: https://github.com/apache/parquet-site/pull/59#discussion_r1599045106


##
content/en/_index.md:
##
@@ -9,7 +9,10 @@ title: Parquet
 
   Download 
 
-Apache Parquet is a columnar storage format available to 
any project in the Hadoop ecosystem, regardless of the choice of data 
processing framework, data model or programming language.
+

Review Comment:
   At the risk of bike-shedding this (+1 as-is), I think it might be good to 
indicate that Parquet has readers/writers in many languages and in many tools. 
In the recent [IANA 
registration](https://www.iana.org/assignments/media-types/application/vnd.apache.parquet)
 I went with the vague "used across a wide variety of platforms, technologies, 
and environments.". But here maybe,
   
   > "Parquet is supported in many programming languages and analytics tools."
   
   What do you think?






Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-13 Thread via GitHub


vinooganesh commented on PR #59:
URL: https://github.com/apache/parquet-site/pull/59#issuecomment-2108729440

   +1! 





Re: [PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-13 Thread via GitHub


alamb commented on code in PR #59:
URL: https://github.com/apache/parquet-site/pull/59#discussion_r1599034830


##
content/en/_index.md:
##
@@ -9,7 +9,10 @@ title: Parquet
 
   Download 
 
-Apache Parquet is a columnar storage format available to 
any project in the Hadoop ecosystem, regardless of the choice of data 
processing framework, data model or programming language.
+

Review Comment:
   I wordsmithed the landing page a little to reduce its length and make it 
flow better. I can make it exactly mirror the DOAP text if reviewers prefer






[PR] PARQUET-2470: Update website with larger ecosystem emphasis [parquet-site]

2024-05-13 Thread via GitHub


alamb opened a new pull request, #59:
URL: https://github.com/apache/parquet-site/pull/59

   # Rationale
   As described on https://issues.apache.org/jira/browse/PARQUET-2470, 
Parquet's role in the analytics ecosystem is substantial. 
   
   However, https://parquet.apache.org/ currently emphasizes Parquet's role in 
the Hadoop ecosystem. I think this causes confusion in several ways:
   
   1. It implies that Parquet is only focused on Hadoop, when I think it is a 
critical technology across other ecosystems that are unrelated to Hadoop (e.g. 
Apache Iceberg, Delta Lake, etc.)
   2. It may further the perception that the Apache Parquet project only 
focuses on / cares about Hadoop / Java implementation
   
   # Changes
   Update the home page content to mirror the Apache Project Description 
https://projects.apache.org/project.html?parquet (which does not mention Hadoop 
specifically)
   
   > Apache Parquet is an open source, column-oriented data file format 
designed for efficient data storage and retrieval. It provides efficient data 
compression and encoding schemes with enhanced performance to handle complex 
data in bulk. Parquet is available in multiple languages including Java, C++, 
and Python.
   
   
   
   ## Before this PR
   
   ![Screenshot 2024-05-13 at 4 13 31 
PM](https://github.com/apache/parquet-site/assets/490673/86a76878-f304-4d43-8156-a3555ccebfbc)
   
   
   ## After the PR
   
   ![Screenshot 2024-05-13 at 4 15 17 
PM](https://github.com/apache/parquet-site/assets/490673/7479dd8f-3054-410e-9c14-4a8d2a0dccaa)
   





Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-13 Thread via GitHub


alamb commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1599015648


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,41 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+The parquet-mr repo contains a reference implementation of the Parquet format. 
There are a number of other Parquet format implementations, which are listed 
below. 

Review Comment:
   I started a thread on the mailing list about this topic to see if we can 
reach consensus: 
https://lists.apache.org/thread/f9379yx0lf5gtpkgyv922pvowtzy4kmm 
   






Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-13 Thread via GitHub


vinooganesh commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1598594225


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,40 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) repositories. 
+
+
+### parquet-format
+
+The parquet-format repository hosts the official specification of the Apache 
Parquet file format, defining how data is structured and stored. This 
specification, along with Thrift metadata definitions and other crucial 
components, is essential for developers to effectively read and write Parquet 
files. The parquet-format project specifically contains the format 
specifications needed to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. The "mr" stands for MapReduce. Essentially, 
this repository includes all the necessary Java libraries and modules that 
allow developers to read and write Apache Parquet files.
+
+The parquet-mr repository contains an implementation of the Apache Parquet 
format. There are a number of other Parquet format implementations, which are 
listed below. 
+
+Included in parquet-mr:
+* Java Implementation: It contains the core Java implementation of the Apache 
Parquet format, making it possible to use Parquet files in Java applications, 
particularly those based on Hadoop.
+
+* Utilities and APIs: It provides various utilities and APIs for working with 
Apache Parquet files, including tools for data import/export, schema 
management, and data conversion.
+
+
+###  Other Clients / Libraries / Tools
+
+The Parquet ecosystem is rich and varied, encompassing a wide array of tools, 
libraries, and clients, each offering different levels of feature support. It's 
important to note that not all implementations support the same features of the 
Parquet format. When integrating multiple Parquet implementations within your 
workflow, it is crucial to conduct thorough testing to ensure compatibility and 
performance across different platforms and tools.
+
+Here is a non-exhaustive list of Parquet implementations:
+
+* [Parquet-mr](https://github.com/apache/parquet-mr)
+* [Parquet C++, a subproject of Arrow 
C++](https://github.com/apache/arrow/tree/main/cpp/src/parquet) 
([documentation](https://arrow.apache.org/docs/cpp/parquet.html))
+* [Parquet Go, a subproject for Arrow 
Go](https://github.com/apache/arrow/tree/main/go/parquet) 
([documentation](https://github.com/apache/arrow/tree/main/go))
+* [Parquet 
Rust](https://github.com/apache/arrow-rs/blob/master/parquet/README.md)
+* [cudf](https://github.com/rapidsai/cudf)

Review Comment:
   Ahh, thanks @etseidl ! I didn't realize this was the stylized version






Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-13 Thread via GitHub


etseidl commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1598588034


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,40 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) repositories. 
+
+
+### parquet-format
+
+The parquet-format repository hosts the official specification of the Apache 
Parquet file format, defining how data is structured and stored. This 
specification, along with Thrift metadata definitions and other crucial 
components, is essential for developers to effectively read and write Parquet 
files. The parquet-format project specifically contains the format 
specifications needed to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. The "mr" stands for MapReduce. Essentially, 
this repository includes all the necessary Java libraries and modules that 
allow developers to read and write Apache Parquet files.
+
+The parquet-mr repository contains an implementation of the Apache Parquet 
format. There are a number of other Parquet format implementations, which are 
listed below. 
+
+Included in parquet-mr:
+* Java Implementation: It contains the core Java implementation of the Apache 
Parquet format, making it possible to use Parquet files in Java applications, 
particularly those based on Hadoop.
+
+* Utilities and APIs: It provides various utilities and APIs for working with 
Apache Parquet files, including tools for data import/export, schema 
management, and data conversion.
+
+
+###  Other Clients / Libraries / Tools
+
+The Parquet ecosystem is rich and varied, encompassing a wide array of tools, 
libraries, and clients, each offering different levels of feature support. It's 
important to note that not all implementations support the same features of the 
Parquet format. When integrating multiple Parquet implementations within your 
workflow, it is crucial to conduct thorough testing to ensure compatibility and 
performance across different platforms and tools.
+
+Here is a non-exhaustive list of Parquet implementations:
+
+* [Parquet-mr](https://github.com/apache/parquet-mr)
+* [Parquet C++, a subproject of Arrow 
C++](https://github.com/apache/arrow/tree/main/cpp/src/parquet) 
([documentation](https://arrow.apache.org/docs/cpp/parquet.html))
+* [Parquet Go, a subproject for Arrow 
Go](https://github.com/apache/arrow/tree/main/go/parquet) 
([documentation](https://github.com/apache/arrow/tree/main/go))
+* [Parquet 
Rust](https://github.com/apache/arrow-rs/blob/master/parquet/README.md)
+* [cudf](https://github.com/rapidsai/cudf)

Review Comment:
   ```suggestion
   * [cuDF](https://github.com/rapidsai/cudf)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-13 Thread via GitHub


vinooganesh commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1598499413


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,41 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+The parquet-mr repo contains a reference implementation of the Parquet format. 
There are a number of other Parquet format implementations, which are listed 
below. 

Review Comment:
   cc @gszadovszky @Fokko @xhochy @wgtmac. I don't want to further block this 
PR by settling this beforehand, so I'm going to remove the word "reference" and 
we can add it back if we want to in a subsequent PR.






Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-13 Thread via GitHub


vinooganesh commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1598495733


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,41 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.

Review Comment:
   @pitrou I assume it's MapReduce, but please correct me if I'm wrong.






Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-13 Thread via GitHub


pitrou commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1598406934


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,41 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+The parquet-mr repo contains a reference implementation of the Parquet format. 
There are a number of other Parquet format implementations, which are listed 
below. 

Review Comment:
   If we don't agree on what a reference implementation is, then we should not 
list parquet-mr as a reference implementation. The term "reference 
implementation" has an official connotation and implies a specific status; it 
certainly should not be assigned lightly.






Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-13 Thread via GitHub


vinooganesh commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1598331634


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,41 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+The parquet-mr repo contains a reference implementation of the Parquet format. 
There are a number of other Parquet format implementations, which are listed 
below. 

Review Comment:
   There have been a lot of conversations about this: 
https://github.com/apache/parquet-site/pull/53#discussion_r1582882267 (and 
others on the thread) and I'm inclined to keep this as is. I don't think we 
need to exhaustively list the other reference implementations when there is a 
list of implementations below. @gszadovszky has also called this a reference 
implementation, and I think it helps clarify the relationship between 
`parquet-format` and `parquet-mr`. I'm more than happy to update this once the 
community has reached a consensus after the mailing list discussion that 
@alamb suggested, though. 






Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-13 Thread via GitHub


vinooganesh commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1598329089


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,41 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.

Review Comment:
   I can make this change. These are referred to publicly as both projects and 
repos (on our mailing list as well), so I deliberately put both in. I'll stick 
with repository, though. 






Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-13 Thread via GitHub


alamb commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1598319137


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,41 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+The parquet-mr repo contains a reference implementation of the Parquet format. 
There are a number of other Parquet format implementations, which are listed 
below. 

Review Comment:
   If we knew what those reference implementations are, I agree it would be 
valuable to document them.
   
   However, I think consensus is required before we make such a 
determination.
   
   Thus, for this PR I suggest:
   1. Remove the word "reference"
   2. File a follow-on ticket / start a discussion on the mailing list to 
figure out what should be listed as reference implementations
   ```suggestion
   The parquet-mr repo contains an implementation of the Parquet format. There 
are a number of other Parquet format implementations, which are listed below. 
   ```






Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-13 Thread via GitHub


alamb commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1598315092


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,41 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+The parquet-mr repo contains a reference implementation of the Parquet format. 
There are a number of other Parquet format implementations, which are listed 
below. 
+
+Included in parquet-mr:
+* Java Implementation: It contains the core Java implementation of the Parquet 
format, making it possible to use Parquet files in Java applications, 
particularly those based on Hadoop.
+
+* Utilities and APIs: It provides various utilities and APIs for working with 
Parquet files, including tools for data import/export, schema management, and 
data conversion.
+
+
+###  Other Clients / Libraries / Tools
+
+The Parquet ecosystem is rich and varied, encompassing a wide array of tools, 
libraries, and clients, each offering different levels of feature support. It's 
important to note that not all implementations support the same features of the 
Parquet format. When integrating multiple Parquet implementations within your 
workflow, it is crucial to conduct thorough testing to ensure compatibility and 
performance across different platforms and tools.
+
+Here is a non-exhaustive list of Parquet implementations:
+
+* [parquet-mr](https://github.com/apache/parquet-mr)
+* [Parquet C++, a subproject of Arrow 
C++](https://github.com/apache/arrow/tree/main/cpp/src/parquet) 
([documentation](https://arrow.apache.org/docs/cpp/parquet.html))
+* [parquet go](https://github.com/apache/arrow/tree/main/go/parquet)

Review Comment:
   I recommend that (as a follow-on PR) we turn this list into a table, 
something like
   
   | Project | Language | Website | API Docs |
   |---|---|---|---|
   | parquet-mr | Java | link | |
   | parquet-cpp | C++ | link | link |
   | parquet-rs | Rust | link | link |
   
   Also, I recommend that the criterion for being listed here be "Open source 
implementations of the Parquet format" (which is a low bar, to be sure).
   
   I would be happy to propose such changes as a follow-on PR.






Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-13 Thread via GitHub


alamb commented on PR #53:
URL: https://github.com/apache/parquet-site/pull/53#issuecomment-2107320484

   Given the size and substance of this PR, it is unlikely we will get it 
perfect the first time. Also, I don't see any disagreement across commenters on 
the value of this information. 
   
   I would personally suggest that we address all the outstanding comments as 
best as possible, merge this PR, and then iterate on the content in subsequent 
PRs





Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-13 Thread via GitHub


jorisvandenbossche commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1598301378


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,41 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+The parquet-mr repo contains a reference implementation of the Parquet format. 
There are a number of other Parquet format implementations, which are listed 
below. 
+
+Included in parquet-mr:
+* Java Implementation: It contains the core Java implementation of the Parquet 
format, making it possible to use Parquet files in Java applications, 
particularly those based on Hadoop.
+
+* Utilities and APIs: It provides various utilities and APIs for working with 
Parquet files, including tools for data import/export, schema management, and 
data conversion.
+
+
+###  Other Clients / Libraries / Tools
+
+The Parquet ecosystem is rich and varied, encompassing a wide array of tools, 
libraries, and clients, each offering different levels of feature support. It's 
important to note that not all implementations support the same features of the 
Parquet format. When integrating multiple Parquet implementations within your 
workflow, it is crucial to conduct thorough testing to ensure compatibility and 
performance across different platforms and tools.
+
+Here is a non-exhaustive list of Parquet implementations:
+
+* [parquet-mr](https://github.com/apache/parquet-mr)
+* [Parquet C++, a subproject of Arrow 
C++](https://github.com/apache/arrow/tree/main/cpp/src/parquet) 
([documentation](https://arrow.apache.org/docs/cpp/parquet.html))
+* [parquet go](https://github.com/apache/arrow/tree/main/go/parquet)

Review Comment:
   We could also mention for Go (and similarly for Rust below) that it is a 
subproject of Arrow Go, as is done for C++ above.
   
   (It also seems there are several Parquet Go implementations, including 
others that are not part of the Arrow project, so it's good to clarify that 
this one is Arrow-related. But at the same time, it's not entirely clear what 
the criterion is for being listed here.)






Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-13 Thread via GitHub


pitrou commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1598282487


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,41 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.

Review Comment:
   Can we please make the terminology consistent? Either describe both 
parquet-format and parquet-mr as "projects" or both as "GitHub repositories", 
but not a mix of the two.



##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,41 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+The parquet-mr repo contains a reference implementation of the Parquet format. 
There are a number of other Parquet format implementations, which are listed 
below. 

Review Comment:
   It's "a" reference implementation, which means there are other ones. But I 
don't see them listed here.
   
   Either list all reference implementations explicitly, or make this "the" 
reference implementation.



##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,41 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.

Review Comment:
   Also, can we explain what "mr" stands for? It's a mystery for most people.




Re: [PR] Remove staging [parquet-site]

2024-05-13 Thread via GitHub


alamb commented on PR #58:
URL: https://github.com/apache/parquet-site/pull/58#issuecomment-2107096367

   > I think I can delete the staging branch. Before that, should we send a 
notice to the dev ML in case there is any objection? Maybe we can set a 
deadline and proceed after that. I don't think this worth a formal vote.
   
   I agree -- either an email or JIRA ticket is probably good to record the 
rationale and decision in a more easily discoverable location
   
   > BTW, should we do anything to remove the site: 
https://parquet.staged.apache.org/? Or will it be removed automatically after 
we are done?
   
   I suggest we make one final PR to the staging branch to have it push some 
sort of notice or redirect so https://parquet.staged.apache.org/ redirects to 
https://parquet.apache.org/
   
   I don't know how to remove it
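
A static page with a meta refresh is one common way to get such a redirect when server-side configuration is out of reach. The sketch below is only an illustration of that idea: the `write_redirect_page` helper and the `./redirect-demo` output path are assumptions, not something taken from the parquet-site repository or its build.

```shell
# Hypothetical helper: write a minimal HTML page that redirects to a target URL.
write_redirect_page() {
  target_url="$1"   # where visitors should end up
  out_file="$2"     # path of the HTML file to create
  mkdir -p "$(dirname "$out_file")"
  printf '%s\n' \
    '<!DOCTYPE html>' \
    '<html>' \
    '  <head>' \
    "    <meta http-equiv=\"refresh\" content=\"0; url=${target_url}\">" \
    "    <link rel=\"canonical\" href=\"${target_url}\">" \
    '  </head>' \
    '  <body>' \
    "    <p>This site has moved to <a href=\"${target_url}\">${target_url}</a>.</p>" \
    '  </body>' \
    '</html>' > "$out_file"
}

# Example call; the output path is an assumption, use whatever the staging build publishes.
write_redirect_page "https://parquet.apache.org/" "./redirect-demo/index.html"
```

Whether the staging host serves such a page as-is would still need to be checked with INFRA.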





Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-13 Thread via GitHub


jorisvandenbossche commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1597944460


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,41 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+The parquet-mr repo contains a reference implementation of the Parquet format. 
There are a number of other Parquet format implementations, which are listed 
below. 
+
+Included in parquet-mr:
+* Java Implementation: It contains the core Java implementation of the Parquet 
format, making it possible to use Parquet files in Java applications, 
particularly those based on Hadoop.
+
+* Utilities and APIs: It provides various utilities and APIs for working with 
Parquet files, including tools for data import/export, schema management, and 
data conversion.
+
+
+###  Other Clients / Libraries / Tools
+
+The Parquet ecosystem is rich and varied, encompassing a wide array of tools, 
libraries, and clients, each offering different levels of feature support. It's 
important to note that not all implementations support the same features of the 
Parquet format. When integrating multiple Parquet implementations within your 
workflow, it is crucial to conduct thorough testing to ensure compatibility and 
performance across different platforms and tools.
+
+Here is a non-exhaustive list of Parquet implementations:
+
+* [parquet-mr](https://github.com/apache/parquet-mr)
+* [Parquet C++, a subproject of Arrow 
C++](https://github.com/apache/arrow/tree/main/cpp/src/parquet) 
([documentation](https://arrow.apache.org/docs/cpp/parquet.html))
+* [parquet go](https://github.com/apache/arrow/tree/main/go/parquet)
+* [parquet 
rust](https://github.com/apache/arrow-rs/blob/master/parquet/README.md)
+* [cudf](https://github.com/rapidsai/cudf)
+* [apache impala](https://github.com/apache/impala)
+* [duckdb](https://github.com/duckdb/duckdb)
+* [fast-parquet python](https://github.com/dask/fastparquet)
+* [parquet go](https://github.com/apache/arrow/tree/main/go/parquet)

Review Comment:
   ```suggestion
   ```
   
   It's twice in the list






Re: [PR] Remove staging [parquet-site]

2024-05-12 Thread via GitHub


wgtmac commented on PR #58:
URL: https://github.com/apache/parquet-site/pull/58#issuecomment-2106471367

   I think I can delete the staging branch. Before that, should we send a 
notice to the dev ML in case there is any objection? Maybe we can set a 
deadline and proceed after that. I don't think this is worth a formal vote.
   
   BTW, should we do anything to remove the site: 
https://parquet.staged.apache.org? Or will it be removed automatically after we 
are done?





Re: [PR] Remove staging [parquet-site]

2024-05-12 Thread via GitHub


alamb commented on PR #58:
URL: https://github.com/apache/parquet-site/pull/58#issuecomment-2106397019

   > The main thing I'm curious about is whether the PMC can delete branches 
easily from GitHub. If so, it may be much more straightforward; otherwise we will 
have to file INFRA tickets
   
   In the arrow and datafusion repos, any committer can delete any branch other 
than the "protected" one (typically the main one)
   
   Thus I suspect someone like @wgtmac  could do so in this repo





Re: [PR] Remove staging [parquet-site]

2024-05-12 Thread via GitHub


vinooganesh commented on PR #58:
URL: https://github.com/apache/parquet-site/pull/58#issuecomment-2106379277

   Yep, there is actually a sequencing of things that need to happen here:
   1. Deleting the `asf-staging` branch
   2. Deleting the `staging` branch
   3. Deleting the README from the production branch.
   
   The main thing I'm curious about is whether the PMC can delete branches 
easily from GitHub. If so, it may be much more straightforward; otherwise we will 
have to file INFRA tickets
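
If committers do have permission, steps 1 and 2 could look roughly like the sketch below. `git push <remote> --delete <branch>` is the standard way to delete a remote branch; the `delete_remote_branches` helper, its dry-run default, and the `origin` remote name are assumptions made for illustration.

```shell
# Hypothetical helper: delete the given branches on a remote.
# Prints the commands by default; set DRY_RUN=0 to actually run them.
delete_remote_branches() {
  remote="$1"
  shift
  for branch in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "git push $remote --delete $branch"
    else
      git push "$remote" --delete "$branch"
    fi
  done
}

# Steps 1 and 2 from the list above (dry run: only prints the commands).
delete_remote_branches origin asf-staging staging
```

Keeping the dry run as the default makes it easy to paste the planned commands into the dev ML notice before anything is deleted.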





Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-12 Thread via GitHub


alamb commented on PR #53:
URL: https://github.com/apache/parquet-site/pull/53#issuecomment-2106376534

   > Thanks - and just to make sure it's clear, my main goal was to start the 
process of actually documenting the institutional knowledge in the community 
and this PR is mostly intended as a starting point. There are some other much 
meatier topics (parquet v2's definition for example) that will need to be 
discussed in follow up PRs.
   
   I think documenting the current / institutional knowledge is super 
helpful. Thank you for pushing this forward





Re: [PR] Remove staging [parquet-site]

2024-05-12 Thread via GitHub


alamb commented on PR #58:
URL: https://github.com/apache/parquet-site/pull/58#issuecomment-2106376324

   We may also want to update the readme too: 
https://github.com/apache/parquet-site/blob/production/README.md





Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-12 Thread via GitHub


vinooganesh commented on PR #53:
URL: https://github.com/apache/parquet-site/pull/53#issuecomment-2106374201

   Thanks - and just to make sure it's clear, my main goal was to start the 
process of actually documenting the institutional knowledge in the community 
and this PR is mostly intended as a starting point. There are some other much 
meatier topics (parquet v2's definition for example) that will need to be 
discussed in follow up PRs. 





Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-12 Thread via GitHub


vinooganesh commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1597713221


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,40 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+The parquet-mr repo contains a "reference" implementation of the Parquet 
format. There are a number of other Parquet format implementations, which are 
listed below. 

Review Comment:
   Will update






Re: [PR] Remove staging [parquet-site]

2024-05-12 Thread via GitHub


vinooganesh commented on PR #58:
URL: https://github.com/apache/parquet-site/pull/58#issuecomment-2106372596

   cc @wgtmac @gszadovszky @alamb after conversation on 
https://github.com/apache/parquet-site/pull/56 





[PR] Remove staging [parquet-site]

2024-05-12 Thread via GitHub


vinooganesh opened a new pull request, #58:
URL: https://github.com/apache/parquet-site/pull/58

   There still needs to be an infra ticket filed to actually delete the 
`staging` branch (unless a PMC member can delete the branch)





Re: [PR] Add Dockerfile + instructions on how to preview site using docker rather than installing `hugo` locally [parquet-site]

2024-05-12 Thread via GitHub


alamb commented on PR #56:
URL: https://github.com/apache/parquet-site/pull/56#issuecomment-2106180380

   Thanks @wgtmac  and @vinooganesh 





Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-12 Thread via GitHub


wgtmac commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1597576971


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,40 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### Parquet Format
+
+The "Parquet Format" project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### Parquet-MR 
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+Parquet-MR can be seen as a "reference" implementation of parquet-format. 
There are a number of other Parquet Format implementations, which are listed 
below. 
+
+Included in parquet-mr:
+* Java/Scala Implementation: It contains the core Java/Scala implementation of 
the Parquet format, making it possible to use Parquet files in Java 
applications, particularly those based on Hadoop.

Review Comment:
   Perhaps we should just say Java implementation here. The Scala code is just 
for filters and we don't have a full Scala implementation.






Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-12 Thread via GitHub


xhochy commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1597576454


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,40 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+The parquet-mr repo contains a "reference" implementation of the Parquet 
format. There are a number of other Parquet format implementations, which are 
listed below. 

Review Comment:
   I would second the removal of the quotes here.






Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-12 Thread via GitHub


wgtmac commented on PR #53:
URL: https://github.com/apache/parquet-site/pull/53#issuecomment-2106156421

   @xhochy @pitrou @tustvold Would you like to take a final pass?
   
   Will merge it if there is no further comment next week.





Re: [PR] Add Dockerfile + instructions on how to preview site using docker rather than installing `hugo` locally [parquet-site]

2024-05-12 Thread via GitHub


wgtmac merged PR #56:
URL: https://github.com/apache/parquet-site/pull/56





Re: [PR] Add Dockerfile + instructions on how to preview site using docker rather than installing `hugo` locally [parquet-site]

2024-05-12 Thread via GitHub


wgtmac commented on PR #56:
URL: https://github.com/apache/parquet-site/pull/56#issuecomment-2106152954

   For the staging site, I had a discussion with @gszadovszky here: 
https://github.com/apache/parquet-site/pull/31#issuecomment-1474023977. I think 
we can remove the staging site now and use the Dockerfile for debugging purposes.





Re: [PR] Update README.md on asf-site branch with pointer to real readme [parquet-site]

2024-05-12 Thread via GitHub


wgtmac merged PR #57:
URL: https://github.com/apache/parquet-site/pull/57





Re: [PR] Update README.md on asf-site branch with pointer to real readme [parquet-site]

2024-05-12 Thread via GitHub


wgtmac commented on PR #57:
URL: https://github.com/apache/parquet-site/pull/57#issuecomment-2106151363

   Thanks Andrew for doing this!





Re: [PR] Add Dockerfile + instructions on how to preview site using docker rather than installing `hugo` locally [parquet-site]

2024-05-11 Thread via GitHub


vinooganesh commented on PR #56:
URL: https://github.com/apache/parquet-site/pull/56#issuecomment-2105999510

   Good question @alamb. Technically the "best practice" from the docsy 
instructions was to create a staging website so I mostly just followed them 
when I remade the parquet one. Back then, there was a lot of stuff to work 
through with hugo builds and migrating from the old jenkins site, so having a 
place to test was definitely helpful. At this point though, I don't think it's 
necessary to have the staging site anymore.





Re: [PR] Update README.md on asf-site branch with pointer to real readme [parquet-site]

2024-05-11 Thread via GitHub


alamb commented on PR #57:
URL: https://github.com/apache/parquet-site/pull/57#issuecomment-2105997880

   Thanks @vinooganesh   -- I started a thread to discuss this on the mailing 
list https://lists.apache.org/thread/97g4zqlvobr9knntvsbghjs6v3gr63x2
   
   (which is typically what INFRA likes to see when making changes such as the 
default branch)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add Dockerfile + instructions on how to preview site using docker rather than installing `hugo` locally [parquet-site]

2024-05-11 Thread via GitHub


alamb commented on PR #56:
URL: https://github.com/apache/parquet-site/pull/56#issuecomment-2105997211

   > The other thing that we haven't been doing a good job of is maintaining 
the staging website. I made a bunch of changes to get the staging and 
production branch in sync, but staging still isn't heavily used.
   
   I wonder what the use case for the staging website is? (maybe we should just 
not use it?)
   
   FWIW for https://arrow.apache.org/ and https://datafusion.apache.org/ we 
simply publish to the production version of the site.
   
   Sometimes the staging site might be helpful to host pre-release api docs or 
something, but I didn't see any on this site 🤔
   
   
   





Re: [PR] Update README.md on asf-site branch with pointer to real readme [parquet-site]

2024-05-11 Thread via GitHub


vinooganesh commented on PR #57:
URL: https://github.com/apache/parquet-site/pull/57#issuecomment-2105996926

   +1, I think this will require an INFRA ticket. @shangxinli couldn't change 
it in the Github UI the last time we attempted to update it





Re: [PR] Add Dockerfile + instructions on how to preview site using docker rather than installing `hugo` locally [parquet-site]

2024-05-11 Thread via GitHub


vinooganesh commented on PR #56:
URL: https://github.com/apache/parquet-site/pull/56#issuecomment-2105996596

   This is a great suggestion and the timing is right. I spent some time a few 
weeks ago moving the parquet site's docsy dependency to a hugo module, so now 
they can be managed separately. 
   
   The other thing that we haven't been doing a good job of is maintaining the 
staging website. I made a bunch of changes to get the `staging` and 
`production` branch in sync, but staging still isn't heavily used. 





Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-11 Thread via GitHub


vinooganesh commented on PR #53:
URL: https://github.com/apache/parquet-site/pull/53#issuecomment-2105995733

   @wgtmac - given consensus here, would you be able to merge? 





Re: [I] Build failed [parquet-site]

2024-05-11 Thread via GitHub


vinooganesh closed issue #54: Build failed
URL: https://github.com/apache/parquet-site/issues/54





[PR] Update README.md on asf-site branch with pointer to real readme [parquet-site]

2024-05-11 Thread via GitHub


alamb opened a new pull request, #57:
URL: https://github.com/apache/parquet-site/pull/57

   Note that this PR purposely targets the `asf-site` branch rather than the 
`development` or `production` branches
   
   ## Rationale
   The landing page of https://github.com/apache/parquet-site is confusing (it 
seems to have an outdated readme):
   
   ![Screenshot 2024-05-11 at 2 40 24 
PM](https://github.com/apache/parquet-site/assets/490673/bf9c3187-e69a-4423-aaca-84a78b393e61)
   
   It was not immediately clear to me that the `asf-site` branch hosts the 
output of statically building the website and that the updated README / etc are 
on the `production` branch
   
   ## Changes
   Update the README to direct people to the `production` branch which has an 
updated readme
   





Re: [PR] Add Dockerfile + instructions on how to preview site using docker rather than installing `hugo` locally [parquet-site]

2024-05-11 Thread via GitHub


alamb commented on PR #56:
URL: https://github.com/apache/parquet-site/pull/56#issuecomment-2105983298

   Thanks for the review @wgtmac  -- I have implemented your suggestion and 
created a Dockerfile and updated the instructions to use it. 





Re: [PR] Add Dockerfile + instructions on how to preview site using docker rather than installing `hugo` locally [parquet-site]

2024-05-11 Thread via GitHub


alamb commented on PR #56:
URL: https://github.com/apache/parquet-site/pull/56#issuecomment-2105983147

   > Thanks for the improvement! My only concern is that these steps may be out 
of sync easily (e.g. when the provided URLs are broken).
   
   Perhaps we can update the instructions over time if/when they become broken? 
I am sure there are better ways to make such scripts, but in my opinion this is 
a step in the right direction





Re: [PR] Add instructions on how to preview site using docker rather than installing `hugo` locally [parquet-site]

2024-05-11 Thread via GitHub


alamb commented on code in PR #56:
URL: https://github.com/apache/parquet-site/pull/56#discussion_r1597483388


##
README.md:
##
@@ -14,21 +15,61 @@ cd parquet-site
 git submodule update --init --recursive
 ```
 
-To build or update your site’s CSS resources, you also need PostCSS to create 
the final assets. By default npm installs tools under the directory where you 
run npm install.
+To build or update CSS resources, you also need PostCSS to create the final 
assets.  By default npm installs tools under the directory where you run npm 
install.
 
 ```
 npm install -D autoprefixer
 npm install -D postcss-cli
 npm install -D postcss
 ```
 
-To run this website site locally, run the following in the root of the 
directory:
+To preview this website site locally, run the following in the root of the 
directory:
 
 ```shell
 hugo server
 ```
 
-# Release Documentation
+## Building and Running in Docker
+
+If you don't want to install `hugo` and its dependencies local machine, you 
can use
+docker to preview locally.  First checkout the `parquet-site` explained above 
+and then run:
+
+```shell
+# run docker container mounting the current directory to /parquet-site and 
exposing port 1313
+docker run -it -v `pwd`:/parquet-site -p 1313:1313  debian:bullseye-slim  bash

Review Comment:
   A Dockerfile is a good idea. I will make one
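
For reference, the `docker run` line under review breaks down as follows: `-v` mounts the checkout at `/parquet-site` in the container and `-p` publishes the port `hugo server` listens on (1313 by default). The sketch below wraps the same command in a small function; the `preview_in_docker` name, the optional host-port argument, and the dry-run default are assumptions added for illustration, not part of the PR.

```shell
# Hypothetical helper: build the docker command from the README diff and
# either print it (DRY_RUN=1, the default) or run it (DRY_RUN=0).
preview_in_docker() {
  site_dir="$1"             # local checkout of parquet-site
  host_port="${2:-1313}"    # port on the host; hugo listens on 1313 in the container
  set -- docker run -it \
    -v "${site_dir}:/parquet-site" \
    -p "${host_port}:1313" \
    debian:bullseye-slim bash
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the command for the current directory without starting a container.
preview_in_docker "$(pwd)"
```

Passing the directory as a quoted argument, rather than using unquoted backticks, keeps the mount working when the checkout path contains spaces. Inside a container, `hugo server` usually also needs `--bind 0.0.0.0` to be reachable through the published port.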






Re: [PR] Add instructions on how to preview site using docker rather than installing `hugo` locally [parquet-site]

2024-05-11 Thread via GitHub


wgtmac commented on code in PR #56:
URL: https://github.com/apache/parquet-site/pull/56#discussion_r1597467736


##
README.md:
##
@@ -14,21 +15,61 @@ cd parquet-site
 git submodule update --init --recursive
 ```
 
-To build or update your site’s CSS resources, you also need PostCSS to create 
the final assets. By default npm installs tools under the directory where you 
run npm install.
+To build or update CSS resources, you also need PostCSS to create the final 
assets.  By default npm installs tools under the directory where you run npm 
install.
 
 ```
 npm install -D autoprefixer
 npm install -D postcss-cli
 npm install -D postcss
 ```
 
-To run this website site locally, run the following in the root of the 
directory:
+To preview this website site locally, run the following in the root of the 
directory:
 
 ```shell
 hugo server
 ```
 
-# Release Documentation
+## Building and Running in Docker
+
+If you don't want to install `hugo` and its dependencies local machine, you 
can use

Review Comment:
   ```suggestion
   If you don't want to install `hugo` and its dependencies on local machine, 
you can use
   ```



##
README.md:
##
@@ -14,21 +15,61 @@ cd parquet-site
 git submodule update --init --recursive
 ```
 
-To build or update your site’s CSS resources, you also need PostCSS to create 
the final assets. By default npm installs tools under the directory where you 
run npm install.
+To build or update CSS resources, you also need PostCSS to create the final 
assets.  By default npm installs tools under the directory where you run npm 
install.
 
 ```
 npm install -D autoprefixer
 npm install -D postcss-cli
 npm install -D postcss
 ```
 
-To run this website site locally, run the following in the root of the 
directory:
+To preview this website site locally, run the following in the root of the 
directory:
 
 ```shell
 hugo server
 ```
 
-# Release Documentation
+## Building and Running in Docker
+
+If you don't want to install `hugo` and its dependencies local machine, you 
can use
+docker to preview locally.  First checkout the `parquet-site` explained above 
+and then run:
+
+```shell
+# run docker container mounting the current directory to /parquet-site and 
exposing port 1313
+docker run -it -v `pwd`:/parquet-site -p 1313:1313  debian:bullseye-slim  bash

Review Comment:
   Would it be better to use a Dockerfile, which is much easier to use? I'm 
just asking; no change is required. These steps are helpful enough.






[PR] Add instructions on how to preview site using docker rather than installing `hugo` locally [parquet-site]

2024-05-11 Thread via GitHub


alamb opened a new pull request, #56:
URL: https://github.com/apache/parquet-site/pull/56

   To make changes to the website with confidence that we won't break things, 
we should make sure we can see the results of the work locally. 
   
   I can't / don't want to figure out how to get a `hugo` install running 
locally, and prefer to use docker.
   
   I figured these instructions might help others.
   
   BTW I am happy to make a JIRA for this PR, but it isn't clear to me whether 
that is desired in Parquet.





Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-11 Thread via GitHub


alamb commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1597433857


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,40 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+The parquet-mr repo contains a "reference" implementation of the Parquet 
format. There are a number of other Parquet format implementations, which are 
listed below. 
+
+Included in parquet-mr:
+* Java/Scala Implementation: It contains the core Java/Scala implementation of 
the Parquet format, making it possible to use Parquet files in Java 
applications, particularly those based on Hadoop.
+
+* Utilities and APIs: It provides various utilities and APIs for working with 
Parquet files, including tools for data import/export, schema management, and 
data conversion.
+
+
+###  Other Clients / Libraries / Tools
+
+The Parquet ecosystem is rich and varied, encompassing a wide array of tools, 
libraries, and clients, each offering different levels of feature support. It's 
important to note that not all implementations support the same features of the 
Parquet format. When integrating multiple Parquet implementations within your 
workflow, it is crucial to conduct thorough testing to ensure compatibility and 
performance across different platforms and tools.
+
+Here is a non-exhaustive list of Parquet implementations:
+
+* [parquet-mr](https://github.com/apache/parquet-mr)
+* [Parquet C++, a subproject of Arrow 
C++](https://github.com/apache/arrow/tree/main/cpp/src/parquet) 
([documentation](https://arrow.apache.org/docs/cpp/parquet.html))
+* [parquet 
rust](https://github.com/apache/arrow-rs/blob/master/parquet/README.md)

Review Comment:
   ```suggestion
   * [parquet go](https://github.com/apache/arrow/tree/main/go/parquet)
   * [parquet 
rust](https://github.com/apache/arrow-rs/blob/master/parquet/README.md)
   ```
   






Re: [PR] PARQUET-2310: implementation status [parquet-site]

2024-05-11 Thread via GitHub


alamb commented on PR #34:
URL: https://github.com/apache/parquet-site/pull/34#issuecomment-2105688283

   FYI https://github.com/apache/parquet-site/pull/53 is a related 
conversation. Once that PR merges, perhaps there will be a more natural location 
for this chart.





Re: [PR] First draft of docs about parquet format vs mr [parquet-site]

2024-05-11 Thread via GitHub


alamb commented on code in PR #53:
URL: https://github.com/apache/parquet-site/pull/53#discussion_r1597433429


##
content/en/docs/Overview/_index.md:
##
@@ -7,3 +7,40 @@ description: >
 ---
 
 Apache Parquet is a columnar storage format available to any project in the 
Hadoop ecosystem, regardless of the choice of data processing framework, data 
model or programming language.
+
+This documentation contains information about both the 
[parquet-mr](https://github.com/apache/parquet-mr) and 
[parquet-format](https://github.com/apache/parquet-format) projects. 
+
+
+### parquet-format
+
+The parquet-format project hosts the official specification of the Parquet 
file format, defining how data is structured and stored. This specification, 
along with Thrift metadata definitions and other crucial components, is 
essential for developers to effectively read and write Parquet files. The 
parquet-format project specifically contains the format specifications needed 
to understand and properly utilize Parquet files.
+
+As a repository focused on specification, the parquet-format repository does 
not contain source code. 
+
+
+### parquet-mr
+
+The parquet-mr GitHub repository is part of the Apache Parquet project and 
specifically focuses on providing Java tools for handling the Parquet file 
format within the Hadoop ecosystem. Essentially, this repository includes all 
the necessary Java libraries and modules that allow developers to read and 
write Parquet files.
+
+The parquet-mr repo contains a "reference" implementation of the Parquet 
format. There are a number of other Parquet format implementations, which are 
listed below. 

Review Comment:
   I don't think there is any reason to add quotes.
   
   ```suggestion
   The parquet-mr repo contains a reference implementation of the Parquet 
format. There are a number of other Parquet format implementations, which are 
listed below. 
   ```





