Re: [PR] Put same docs on prod [parquet-site]
wgtmac merged PR #52: URL: https://github.com/apache/parquet-site/pull/52 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Update .asf.yaml to include updated info [parquet-site]
wgtmac merged PR #51: URL: https://github.com/apache/parquet-site/pull/51
Re: [PR] Fix typos [parquet-site]
Fokko merged PR #46: URL: https://github.com/apache/parquet-site/pull/46
Re: [PR] Fix typos [parquet-site]
vinooganesh commented on PR #46: URL: https://github.com/apache/parquet-site/pull/46#issuecomment-1989448621 @deining this should be good to go now!
[PR] Put same docs on prod [parquet-site]
vinooganesh opened a new pull request, #52: URL: https://github.com/apache/parquet-site/pull/52 We updated the docs and added more release versions to the docs on staging. Applying the same settings to prod.
[PR] Update .asf.yaml to include updated info [parquet-site]
vinooganesh opened a new pull request, #51: URL: https://github.com/apache/parquet-site/pull/51 @shangxinli @wgtmac
Re: [PR] Adding Old Parquet Format Releases with Links [parquet-site]
wgtmac merged PR #47: URL: https://github.com/apache/parquet-site/pull/47
Re: [PR] Update to new website [parquet-site]
shangxinli merged PR #50: URL: https://github.com/apache/parquet-site/pull/50
Re: [PR] Adding Old Parquet Format Releases with Links [parquet-site]
vinooganesh commented on PR #47: URL: https://github.com/apache/parquet-site/pull/47#issuecomment-1988358117 @wgtmac - Ugh sorry, I'm moving to vscode and was too used to the automatic file saving before commit in my intellij. Actually fixed now.
Re: [PR] Adding Old Parquet Format Releases with Links [parquet-site]
wgtmac commented on code in PR #47: URL: https://github.com/apache/parquet-site/pull/47#discussion_r1519649944 ## content/en/blog/parquet-format/2.8.0.md: ## @@ -0,0 +1,20 @@ +--- +title: "2.8.0" +date: 2020-01-13 +description: > +--- + +[Github Release Link](https://github.com/apache/parquet-format/releases/tag/apache-parquet-format-2.7.0). Review Comment: This page is not right. ## content/en/blog/parquet-format/2.9.0.md: ## @@ -4,31 +4,16 @@ date: 2021-10-06 description: > --- -The [latest version of parquet-format is 2.9.0](https://www.apache.org/dyn/closer.lua/parquet/apache-parquet-format-2.9.0/apache-parquet-format-2.9.0.tar.gz). +[Github Release Link](https://github.com/apache/parquet-format/releases/tag/apache-parquet-format-2.7.0). Review Comment: ditto
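The mismatch flagged in this review (a page titled 2.8.0 whose release link points at the 2.7.0 tag) is the kind of copy-paste slip a small script could catch before review. The following is only a sketch under assumptions: the function name `mismatched_links` is hypothetical, it is not part of the parquet-site repository, and it assumes each release page carries a `title: "X.Y.Z"` front-matter line and ordinary Markdown links as in the quoted diff.

```python
import re

# Matches dotted version numbers such as 2.8.0 or 2.10.0.
VERSION = re.compile(r"\d+\.\d+\.\d+")
# Matches Markdown links of the form [text](url).
LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def mismatched_links(page_text):
    """Return (text, url) pairs that mention a version other than the page title.

    Hypothetical helper for illustration; assumes the title is a plain
    version string in the front matter.
    """
    title = re.search(r'^title:\s*"(\d+\.\d+\.\d+)"', page_text, re.MULTILINE)
    if title is None:
        return []
    expected = title.group(1)
    bad = []
    for text, url in LINK.findall(page_text):
        # Collect every version mentioned in either the link text or the URL.
        versions = set(VERSION.findall(text)) | set(VERSION.findall(url))
        if versions - {expected}:
            bad.append((text, url))
    return bad


page = (
    '---\ntitle: "2.8.0"\ndate: 2020-01-13\n---\n\n'
    "[Github Release Link](https://github.com/apache/parquet-format/"
    "releases/tag/apache-parquet-format-2.7.0).\n"
)
print(mismatched_links(page))
```

Because the check looks at both the link text and the URL, it would also flag a page whose text names a different version than its title even when the URL is correct.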
Re: [PR] Adding Old Parquet Format Releases with Links [parquet-site]
vinooganesh commented on PR #47: URL: https://github.com/apache/parquet-site/pull/47#issuecomment-1988295518 @wgtmac updated to fix typo and new release url
Re: [PR] Fix hugo deprecation warning [parquet-site]
wgtmac merged PR #48: URL: https://github.com/apache/parquet-site/pull/48
Re: [PR] Adding Old Parquet Format Releases with Links [parquet-site]
wgtmac commented on code in PR #47: URL: https://github.com/apache/parquet-site/pull/47#discussion_r1519317314 ## content/en/blog/parquet-format/2.7.0.md: ## @@ -0,0 +1,20 @@ +--- +title: "2.7.0" +date: 2019-09-25 +description: > +--- + +[Github Release Link](https://github.com/apache/parquet-format/releases/tag/apache-parquet-format-2.7.0). + +The [latest version of parquet-format is 2.7.0](https://www.apache.org/dyn/closer.lua/parquet/apache-parquet-format-2.7.0/apache-parquet-format-2.7.0.tar.gz). Review Comment: ditto ## content/en/blog/parquet-format/2.10.0.md: ## @@ -0,0 +1,18 @@ +--- +title: "2.10.0" +date: 2023-11-20 +description: > +--- + +[Github Release Link](https://github.com/apache/parquet-format/releases/tag/apache-parquet-format-2.10.0). + +The [latest version of parquet-format is 2.9.0](https://www.apache.org/dyn/closer.lua/parquet/apache-parquet-format-2.10.0/apache-parquet-format-2.10.0.tar.gz). Review Comment: This looks strange to me.
Re: [PR] [DO NOT MERGE] Updating Parquet Website [parquet-site]
vinooganesh closed pull request #49: [DO NOT MERGE] Updating Parquet Website URL: https://github.com/apache/parquet-site/pull/49
[PR] Update to new website [parquet-site]
vinooganesh opened a new pull request, #50: URL: https://github.com/apache/parquet-site/pull/50 New website update. @wgtmac @shangxinli
[PR] Updating Parquet Website [parquet-site]
vinooganesh opened a new pull request, #49: URL: https://github.com/apache/parquet-site/pull/49 New Hugo module based website based on staging
Re: [PR] Fix hugo deprecation warning [parquet-site]
vinooganesh commented on PR #48: URL: https://github.com/apache/parquet-site/pull/48#issuecomment-1986424564 Hi @deining - this is actually fixed on staging and there will be a pretty large bulk change going to production soon: https://github.com/apache/parquet-site/blob/staging/config.toml#L50. I'd recommend waiting until that change has merged, because many issues have been fixed there.
Re: [PR] Fix typos [parquet-site]
vinooganesh commented on PR #46: URL: https://github.com/apache/parquet-site/pull/46#issuecomment-1986398154 Thanks @deining - I'll merge these into the next version of the website. I'm working on a revamp right now to use hugo modules and change some of the info.
[PR] Adding Old Parquet Format Releases with Links [parquet-site]
vinooganesh opened a new pull request, #47: URL: https://github.com/apache/parquet-site/pull/47 Updating to add more parquet format releases
[PR] Fix typos [parquet-site]
deining opened a new pull request, #46: URL: https://github.com/apache/parquet-site/pull/46 This PR fixes a few typos I spotted in the project. It also fixes a broken link and converts some links from `http` to `https` protocol.
Re: [PR] Force redeploy website [parquet-site]
vinooganesh commented on PR #45: URL: https://github.com/apache/parquet-site/pull/45#issuecomment-1979959820 @wgtmac - Ah, so I think the main ones may be coming from g...@apache.org, which I believe is a gitbox config. I can look into how to get rid of those notifications, but I think that config is specified outside of this yaml config, unfortunately.
Re: [PR] Force redeploy website [parquet-site]
wgtmac merged PR #45: URL: https://github.com/apache/parquet-site/pull/45
Re: [PR] Force redeploy website [parquet-site]
wgtmac commented on PR #45: URL: https://github.com/apache/parquet-site/pull/45#issuecomment-1979956360 Could you also update the asf yaml like https://github.com/apache/parquet-mr/blob/master/.asf.yaml to avoid notifications to dev@ ML? Thanks!
[PR] Force redeploy website [parquet-site]
vinooganesh opened a new pull request, #45: URL: https://github.com/apache/parquet-site/pull/45 @wgtmac - apologies, the last PR didn't contain any core website changes so nothing was redeployed. This should fix that.
Re: [PR] Updating asf Staging [parquet-site]
wgtmac merged PR #44: URL: https://github.com/apache/parquet-site/pull/44
Re: [PR] Updating asf Staging [parquet-site]
vinooganesh commented on PR #44: URL: https://github.com/apache/parquet-site/pull/44#issuecomment-1979940099 @wgtmac one more for you to get us close to done
[PR] Updating asf Staging [parquet-site]
vinooganesh opened a new pull request, #44: URL: https://github.com/apache/parquet-site/pull/44 ASF Docs: https://github.com/apache/infrastructure-asfyaml/blob/main/README.md Iceberg Example: https://github.com/apache/iceberg/blob/main/.asf.yaml
Re: [PR] Dropping go to 1.12 in go.mod [parquet-site]
vinooganesh commented on PR #43: URL: https://github.com/apache/parquet-site/pull/43#issuecomment-1979908298 Thanks - looks like this fixed staging: https://parquet.staged.apache.org/!
Re: [PR] Adding updated doap file [parquet-site]
wgtmac merged PR #41: URL: https://github.com/apache/parquet-site/pull/41
Re: [PR] Dropping go to 1.12 in go.mod [parquet-site]
wgtmac merged PR #43: URL: https://github.com/apache/parquet-site/pull/43
Re: [PR] Adding updated doap file [parquet-site]
vinooganesh commented on PR #41: URL: https://github.com/apache/parquet-site/pull/41#issuecomment-1979782097 @wgtmac @shangxinli also this one please! I'll follow up with a full production PR once the staging site build passes
Re: [PR] Dropping go to 1.12 in go.mod [parquet-site]
vinooganesh commented on PR #43: URL: https://github.com/apache/parquet-site/pull/43#issuecomment-1979780007 @wgtmac could you +1 and merge this too?
[PR] Dropping go to 1.12 in go.mod [parquet-site]
vinooganesh opened a new pull request, #43: URL: https://github.com/apache/parquet-site/pull/43 This should fix the build. The failure was caused by a missing toolchain, as described here: https://github.com/golang/go/issues/62278
Re: [PR] Updating Staging [parquet-site]
vinooganesh commented on PR #40: URL: https://github.com/apache/parquet-site/pull/40#issuecomment-1977909721 Ugh I think staging is hitting this: https://github.com/golang/go/issues/62278 ```go: download go1.23 for linux/amd64: toolchain not available```
Re: [PR] Updating Staging [parquet-site]
vinooganesh commented on PR #40: URL: https://github.com/apache/parquet-site/pull/40#issuecomment-1977811642 @wgtmac - that's a valid point. It's good for situations like this where we have to debug something strange before going to prod (https://github.com/apache/parquet-site/actions/runs/8149845126/job/22275151185). But, overall, it's pretty common for folks to have a staging site. For example: https://github.com/apache/dubbo-website/blob/master/.asf.yaml#L31. ASF docs have it as an option too: https://github.com/apache/infrastructure-asfyaml/blob/main/README.md#staging.
Re: [PR] Updating Staging [parquet-site]
wgtmac commented on PR #40: URL: https://github.com/apache/parquet-site/pull/40#issuecomment-1977806694 Is it possible to retire the staging site? It does not seem necessary to me.
Re: [PR] Add DOAP to staging [parquet-site]
wgtmac merged PR #42: URL: https://github.com/apache/parquet-site/pull/42
[PR] Add DOAP to staging [parquet-site]
vinooganesh opened a new pull request, #42: URL: https://github.com/apache/parquet-site/pull/42 (no comment)
Re: [PR] Updating Staging [parquet-site]
vinooganesh commented on PR #40: URL: https://github.com/apache/parquet-site/pull/40#issuecomment-1977115694 FYI in case someone finds this later, the URL of staging is https://parquet.staged.apache.org/
[PR] Adding updated doap file [parquet-site]
vinooganesh opened a new pull request, #41: URL: https://github.com/apache/parquet-site/pull/41 Updating DOAP file to latest parquet release
Re: [PR] Updating Staging [parquet-site]
shangxinli merged PR #40: URL: https://github.com/apache/parquet-site/pull/40
Re: [PR] Updating Staging [parquet-site]
shangxinli commented on PR #40: URL: https://github.com/apache/parquet-site/pull/40#issuecomment-1973486947 +1
Re: [PR] Updating README [parquet-site]
shangxinli merged PR #39: URL: https://github.com/apache/parquet-site/pull/39
Re: [PR] Updating README [parquet-site]
shangxinli commented on PR #39: URL: https://github.com/apache/parquet-site/pull/39#issuecomment-1973485657 +1
Re: [PR] Updating README [parquet-site]
vinooganesh commented on PR #39: URL: https://github.com/apache/parquet-site/pull/39#issuecomment-1973401634 @shangxinli can you +1 and merge? Once it's good here, I'll make the change on prod.
Re: [PR] Updating Staging [parquet-site]
vinooganesh commented on PR #40: URL: https://github.com/apache/parquet-site/pull/40#issuecomment-1973400827 @shangxinli can you +1 and merge? Once it's good here, I'll make the change on prod.
Re: [PR] Updating Staging [parquet-site]
vinooganesh commented on PR #40: URL: https://github.com/apache/parquet-site/pull/40#issuecomment-1966813570 Yeah haha @wgtmac, that's mostly me being negligent. I'm going to get everything up to date on both staging and production now
Re: [PR] Updating Staging [parquet-site]
wgtmac commented on PR #40: URL: https://github.com/apache/parquet-site/pull/40#issuecomment-1966791036 IIUC, the staging site is no longer maintained. cc @gszadovszky @shangxinli
[PR] Updating README [parquet-site]
vinooganesh opened a new pull request, #39: URL: https://github.com/apache/parquet-site/pull/39 (no comment)
Re: [PR] Updating docsy to 2bfdac43ca13cb6605f1103581f77ba6e08a6c72 [parquet-site]
vinooganesh closed pull request #27: Updating docsy to 2bfdac43ca13cb6605f1103581f77ba6e08a6c72 URL: https://github.com/apache/parquet-site/pull/27
Re: [PR] Update logicaltypes.md [parquet-site]
wgtmac commented on code in PR #38: URL: https://github.com/apache/parquet-site/pull/38#discussion_r1470597540 ## content/en/docs/File Format/Types/logicaltypes.md: ## @@ -10,4 +10,4 @@ of primitive types to a minimum and reuses parquet's efficient encodings. For example, strings are stored as byte arrays (binary) with a UTF8 annotation. These annotations define how to further decode and interpret the data. Annotations are stored as `LogicalType` fields in the file metadata and are -documented in LogicalTypes.md. +documented in [LogicalTypes.md](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md) Review Comment: We've had a similar discussion since it is painful to update the site for every release: https://lists.apache.org/thread/5qrcz8ps83hnd079dtzd1q6xm31zjbvh
Re: [PR] Update logicaltypes.md [parquet-site]
jasonhorner commented on code in PR #38: URL: https://github.com/apache/parquet-site/pull/38#discussion_r1470502252 ## content/en/docs/File Format/Types/logicaltypes.md: ## @@ -10,4 +10,4 @@ of primitive types to a minimum and reuses parquet's efficient encodings. For example, strings are stored as byte arrays (binary) with a UTF8 annotation. These annotations define how to further decode and interpret the data. Annotations are stored as `LogicalType` fields in the file metadata and are -documented in LogicalTypes.md. +documented in [LogicalTypes.md](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md) Review Comment: Is there a way to do this so that it always links to the right version of the docs? Happy to do the update.
Re: [PR] Update logicaltypes.md [parquet-site]
wgtmac commented on code in PR #38: URL: https://github.com/apache/parquet-site/pull/38#discussion_r1460506762 ## content/en/docs/File Format/Types/logicaltypes.md: ## @@ -10,4 +10,4 @@ of primitive types to a minimum and reuses parquet's efficient encodings. For example, strings are stored as byte arrays (binary) with a UTF8 annotation. These annotations define how to further decode and interpret the data. Annotations are stored as `LogicalType` fields in the file metadata and are -documented in LogicalTypes.md. +documented in [LogicalTypes.md](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md) Review Comment: This is on purpose. Or at least we should link to a release version: https://github.com/apache/parquet-format/blob/apache-parquet-format-2.10.0/LogicalTypes.md
[PR] Update logicaltypes.md [parquet-site]
jasonhorner opened a new pull request, #38: URL: https://github.com/apache/parquet-site/pull/38 Fixing broken link
Re: [PR] Sync site with format release v2.10.0 [parquet-site]
krylov-krylov commented on code in PR #37: URL: https://github.com/apache/parquet-site/pull/37#discussion_r1451890924 ## content/en/docs/File Format/bloomfilter.md: ## @@ -0,0 +1,335 @@ +--- +title: "Bloom Filter" +linkTitle: "Bloom Filter" +weight: 7 +--- +### Problem statement +In their current format, column statistics and dictionaries can be used for predicate +pushdown. Statistics include minimum and maximum value, which can be used to filter out +values not in the range. Dictionaries are more specific, and readers can filter out values +that are between min and max but not in the dictionary. However, when there are too many +distinct values, writers sometimes choose not to add dictionaries because of the extra +space they occupy. This leaves columns with large cardinalities and widely separated min +and max without support for predicate pushdown. + +A [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) is a compact data structure that +overapproximates a set. It can respond to membership queries with either "definitely no" or +"probably yes", where the probability of false positives is configured when the filter is +initialized. Bloom filters do not have false negatives. + +Because Bloom filters are small compared to dictionaries, they can be used for predicate +pushdown even in columns with high cardinality and when space is at a premium. + +### Goal +* Enable predicate pushdown for high-cardinality columns while using less space than + dictionaries. + +* Induce no additional I/O overhead when executing queries on columns without Bloom + filters attached or when executing non-selective queries. + +### Technical Approach + +The section describes split block Bloom filters, which is the first +(and, at time of writing, only) Bloom filter representation supported +in Parquet. + +First we will describe a "block". This is the main component split +block Bloom filters are composed of. 
+ +Each block is 256 bits, broken up into eight contiguous "words", each +consisting of 32 bits. Each word is thought of as an array of bits; +each bit is either "set" or "not set". + +When initialized, a block is "empty", which means each of the eight +component words has no bits set. In addition to initialization, a +block supports two other operations: `block_insert` and +`block_check`. Both take a single unsigned 32-bit integer as input; +`block_insert` returns no value, but modifies the block, while +`block_check` returns a boolean. The semantics of `block_check` are +that it must return `true` if `block_insert` was previously called on +the block with the same argument, and otherwise it returns `false` +with high probability. For more details of the probability, see below. + +The operations `block_insert` and `block_check` depend on some +auxiliary artifacts. First, there is a sequence of eight odd unsigned +32-bit integer constants called the `salt`. Second, there is a method +called `mask` that takes as its argument a single unsigned 32-bit +integer and returns a block in which each word has exactly one bit +set. + +``` +unsigned int32 salt[8] = {0x47b6137bU, 0x44974d91U, 0x8824ad5bU, + 0xa2b7289dU, 0x705495c7U, 0x2df1424bU, + 0x9efc4947U, 0x5c6bfb31U} + Review Comment: Go -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
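The block operations described in the quoted documentation can be sketched in Java. This is a minimal sketch under stated assumptions, not the parquet-mr implementation. The salt values are the ones quoted above. The quoted diff cuts off before defining `mask`, so the rule used here (multiply the input by each salt and use the top five bits of the product as the bit index) is taken from the Bloom filter section of the Parquet format specification. The class name `SplitBlockSketch` and its method names are hypothetical.

```java
// Hypothetical sketch of one 256-bit block: eight 32-bit words.
final class SplitBlockSketch {
    // Salt constants as quoted in the documentation diff above.
    private static final int[] SALT = {
        0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
        0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31
    };

    // A new block is "empty": every word starts with no bits set.
    private final int[] words = new int[8];

    // Returns a block-shaped mask in which each word has exactly one bit set.
    static int[] mask(int x) {
        int[] result = new int[8];
        for (int i = 0; i < 8; i++) {
            int y = x * SALT[i];         // wraps modulo 2^32, like unsigned multiplication
            result[i] = 1 << (y >>> 27); // top five bits pick one of 32 bit positions
        }
        return result;
    }

    // block_insert: set every bit of the mask in the block.
    void blockInsert(int x) {
        int[] m = mask(x);
        for (int i = 0; i < 8; i++) {
            words[i] |= m[i];
        }
    }

    // block_check: true only if every mask bit is already set in the block.
    boolean blockCheck(int x) {
        int[] m = mask(x);
        for (int i = 0; i < 8; i++) {
            if ((words[i] & m[i]) == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SplitBlockSketch block = new SplitBlockSketch();
        block.blockInsert(42);
        System.out.println("check(42) after insert: " + block.blockCheck(42));
    }
}
```

Because `blockInsert` sets exactly the bits that `blockCheck` later tests, a value that was inserted is always reported as present, which matches the "no false negatives" property stated in the quoted text.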
Re: [PR] Sync site with format release v2.10.0 [parquet-site]
shangxinli merged PR #37: URL: https://github.com/apache/parquet-site/pull/37
Re: [PR] PARQUET-2259: Update site to sync with latest parquet-format [parquet-site]
wgtmac closed pull request #31: PARQUET-2259: Update site to sync with latest parquet-format URL: https://github.com/apache/parquet-site/pull/31
Re: [PR] Sync site with format release v2.10.0 [parquet-site]
wgtmac commented on PR #37: URL: https://github.com/apache/parquet-site/pull/37#issuecomment-1890940974 I have just updated the site, PTAL. @gszadovszky @shangxinli @Fokko
[PR] Sync site with format release v2.10.0 [parquet-site]
wgtmac opened a new pull request, #37: URL: https://github.com/apache/parquet-site/pull/37 (no comment)
Re: [PR] Fix checksumming.md [parquet-site]
wgtmac merged PR #35: URL: https://github.com/apache/parquet-site/pull/35
Re: [PR] PARQUET-2409: Add custom .asf.yaml for email notification [parquet-format]
wgtmac merged PR #224: URL: https://github.com/apache/parquet-format/pull/224
Re: [PR] PARQUET-2409: Add custom .asf.yaml for email notification [parquet-format]
wgtmac commented on PR #224: URL: https://github.com/apache/parquet-format/pull/224#issuecomment-1841970937 Thanks for the suggestion! @kou
Re: [PR] PARQUET-2409: Add custom .asf.yaml for email notification [parquet-format]
wgtmac commented on PR #224: URL: https://github.com/apache/parquet-format/pull/224#issuecomment-1841024568 > @kou Does the file syntax look ok? I copied this from https://github.com/apache/zookeeper/blob/master/.asf.yaml
Re: [PR] PARQUET-2409: Add custom .asf.yaml for email notification [parquet-format]
pitrou commented on PR #224: URL: https://github.com/apache/parquet-format/pull/224#issuecomment-1841021328 @kou Does the file syntax look ok?
Re: [PR] MINOR: Fix INTERVAL sort order doc in parquet.thrift to be undefined [parquet-format]
wgtmac merged PR #222: URL: https://github.com/apache/parquet-format/pull/222
Re: [PR] PARQUET-2409: Add custom .asf.yaml for email notification [parquet-format]
wgtmac commented on PR #224: URL: https://github.com/apache/parquet-format/pull/224#issuecomment-1841016437 PTAL. Thanks! @pitrou @gszadovszky @shangxinli
[PR] PARQUET-2409: Add custom .asf.yaml for email notification [parquet-format]
wgtmac opened a new pull request, #224: URL: https://github.com/apache/parquet-format/pull/224 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR" - https://issues.apache.org/jira/browse/PARQUET-2409 - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Commits - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain Javadoc that explain what it does
Re: [PR] PARQUET-2407: Add customized .asf.yaml for email notifications [parquet-mr]
wgtmac merged PR #1232: URL: https://github.com/apache/parquet-mr/pull/1232
Re: [PR] PARQUET-2407: Add customized .asf.yaml for email notifications [parquet-mr]
wgtmac commented on PR #1232: URL: https://github.com/apache/parquet-mr/pull/1232#issuecomment-1839865078 Thanks @gszadovszky @pitrou! I'll merge it to see what happens. If it works as expected, I'll create a PR for parquet-format as well.
Re: [PR] PARQUET-2408: Fix license header in .gitattributes [parquet-mr]
Fokko merged PR #1231: URL: https://github.com/apache/parquet-mr/pull/1231
Re: [PR] PARQUET-2261: Implement SizeStatistics [parquet-mr]
shangxinli commented on code in PR #1177: URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1414220467 ## parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnWriterBase.java: ## @@ -409,4 +428,14 @@ abstract void writePage( ValuesWriter definitionLevels, ValuesWriter values) throws IOException; + + abstract void writePage( + int rowCount, + int valueCount, + Statistics statistics, + SizeStatistics sizeStatistics, Review Comment: The two names could be confusing: sizeStatistics is one type of statistics, but we are separating them.
Re: [PR] PARQUET-2261: Implement SizeStatistics [parquet-mr]
shangxinli commented on code in PR #1177: URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1414218649 ## parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnWriterBase.java: ## @@ -389,7 +400,14 @@ void writePage() { this.rowsWrittenSoFar += pageRowCount; if (DEBUG) LOG.debug("write page"); try { - writePage(pageRowCount, valueCount, statistics, repetitionLevelColumn, definitionLevelColumn, dataColumn); + writePage( + pageRowCount, + valueCount, + statistics, + sizeStatisticsBuilder.build(), Review Comment: Can we have the parity of lines 406 and 407? You can use a variable in line 407
Re: [PR] PARQUET-2407: Add customized .asf.yaml for email notifications [parquet-mr]
wgtmac commented on PR #1232: URL: https://github.com/apache/parquet-mr/pull/1232#issuecomment-1838950498 Per the [discussion on ML](https://lists.apache.org/thread/4x2ob2ojkznfft3czz0gypmtoz7vo9fz), I propose creating an .asf.yaml file to customize email notifications. Please take a look, thanks! @gszadovszky @shangxinli @pitrou @emkornfield
[PR] PARQUET-2407: Add customized .asf.yaml for email notifications [parquet-mr]
wgtmac opened a new pull request, #1232: URL: https://github.com/apache/parquet-mr/pull/1232 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR" - https://issues.apache.org/jira/browse/PARQUET-2407 - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Style - [ ] My contribution adheres to the code style guidelines and Spotless passes. - To apply the necessary changes, run `mvn spotless:apply -Pvector-plugins` ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain Javadoc that explain what it does
Re: [PR] PARQUET-2408: Fix license header in .gitattributes [parquet-mr]
wgtmac commented on PR #1231: URL: https://github.com/apache/parquet-mr/pull/1231#issuecomment-1838934525 PTAL @amousavigourabi @Fokko
[PR] PARQUET-2408: Fix license header in .gitattributes [parquet-mr]
wgtmac opened a new pull request, #1231: URL: https://github.com/apache/parquet-mr/pull/1231 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR" - https://issues.apache.org/jira/browse/PARQUET-2408 - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Style - [ ] My contribution adheres to the code style guidelines and Spotless passes. - To apply the necessary changes, run `mvn spotless:apply -Pvector-plugins` ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain Javadoc that explain what it does
Re: [PR] PARQUET-1822: Avoid requiring Hadoop installation for reading/writing [parquet-mr]
drealeed commented on PR #: URL: https://github.com/apache/parquet-mr/pull/#issuecomment-1838584976 @amousavigourabi , that's actually what I did and it's working for us now. Thanks
[PR] PARQUET-2405: Include fallback CodecFactory implementation [parquet-mr]
amousavigourabi opened a new pull request, #1230: URL: https://github.com/apache/parquet-mr/pull/1230 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR" - https://issues.apache.org/jira/browse/PARQUET-XXX - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Style - [x] My contribution adheres to the code style guidelines and Spotless passes. - To apply the necessary changes, run `mvn spotless:apply -Pvector-plugins` ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain Javadoc that explain what it does
[PR] PARQUET-2406: Clean up valueOf calls where possible [parquet-mr]
amousavigourabi opened a new pull request, #1229: URL: https://github.com/apache/parquet-mr/pull/1229 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR" - https://issues.apache.org/jira/browse/PARQUET-XXX - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Style - [ ] My contribution adheres to the code style guidelines and Spotless passes. - To apply the necessary changes, run `mvn spotless:apply -Pvector-plugins` ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain Javadoc that explain what it does
Re: [PR] PARQUET-2393: Make `ColumnIOCreatorVisitor` static [parquet-mr]
Fokko merged PR #1216: URL: https://github.com/apache/parquet-mr/pull/1216
Re: [PR] PARQUET-2401: Synchronize on final fields [parquet-mr]
Fokko merged PR #1224: URL: https://github.com/apache/parquet-mr/pull/1224
Re: [PR] PARQUET-2392: Remove StringBuilder in `LogicalTypeAnnotation` [parquet-mr]
Fokko merged PR #1215: URL: https://github.com/apache/parquet-mr/pull/1215
Re: [PR] PARQUET-2394: Use `computeIfAbsent` in `MessageColumnIO` [parquet-mr]
Fokko merged PR #1217: URL: https://github.com/apache/parquet-mr/pull/1217
Re: [PR] PARQUET-1647: [Java][Parquet] Implement FLOAT16 logical type [parquet-mr]
wgtmac commented on PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#issuecomment-1838002383 BTW, it would be good to add an interoperability test to read parquet files from here: https://github.com/apache/parquet-testing/commit/da467dac2f095b979af37bcf40fa0d1dee5ff652. You may want to take a look at this example: https://github.com/apache/parquet-mr/blob/44b56225be6fe7b74667f4f2430326ef1f076cc5/parquet-hadoop/src/test/java/org/apache/parquet/hadoop/codec/TestInteropReadLz4RawCodec.java#L40
Re: [PR] PARQUET-2395: Prefer `singletonList` over `asList` [parquet-mr]
wgtmac commented on PR #1218: URL: https://github.com/apache/parquet-mr/pull/1218#issuecomment-1837995639 Thanks for the explanation!
Re: [PR] PARQUET-2395: Prefer `singletonList` over `asList` [parquet-mr]
Fokko commented on PR #1218: URL: https://github.com/apache/parquet-mr/pull/1218#issuecomment-1837993837 Thanks for the review @wgtmac, @zhangjiashen and @amousavigourabi
Re: [PR] PARQUET-2395: Prefer `singletonList` over `asList` [parquet-mr]
Fokko merged PR #1218: URL: https://github.com/apache/parquet-mr/pull/1218
Re: [PR] PARQUET-2395: Prefer `singletonList` over `asList` [parquet-mr]
Fokko commented on PR #1218: URL: https://github.com/apache/parquet-mr/pull/1218#issuecomment-1837993326 @wgtmac Two things: - `singletonList` is completely immutable, while with `asList` you can still mutate the reference. - `singletonList` is not backed by an array, reducing the memory footprint. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
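The mutability difference mentioned in this comment can be illustrated with a short Java sketch. It assumes only the standard `java.util` API; the class name `ListMutabilityDemo` and the helper `allowsSet` are hypothetical names introduced for illustration and do not appear in parquet-mr.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of parquet-mr.
final class ListMutabilityDemo {
    // Returns true if the list accepts set(0, ...), false if it rejects the call.
    static boolean allowsSet(List<String> list) {
        try {
            list.set(0, "changed");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Arrays.asList is a fixed-size view over an array: elements can be replaced.
        List<String> viaAsList = Arrays.asList("a");
        // Collections.singletonList is immutable: set() is rejected.
        List<String> viaSingleton = Collections.singletonList("a");

        System.out.println("asList allows set: " + allowsSet(viaAsList));
        System.out.println("singletonList allows set: " + allowsSet(viaSingleton));
    }
}
```

Both lists reject `add`, since the `asList` view is fixed-size. The difference lies in element replacement, which `asList` writes through to its backing array.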
Re: [PR] PARQUET-2344: Bump to Thrift 0.19.0 [parquet-mr]
wgtmac commented on code in PR #1192: URL: https://github.com/apache/parquet-mr/pull/1192#discussion_r141345 ## pom.xml: ## @@ -619,6 +622,9 @@ true true + + javax.annotation:javax.annotation-api:jar:1.3.2 Review Comment: Why do we need to ignore this? ## parquet-thrift/src/main/java/org/apache/parquet/thrift/ThriftSchemaConverter.java: ## @@ -225,14 +225,18 @@ private static ThriftField toThriftField(String name, Field field, ThriftField.R final Field listElemField = field.getListElemField(); type = new ThriftType.ListType(toThriftField(listElemField.getName(), listElemField, requirement)); break; + case UUID: case ENUM: -Collection enumValues = field.getEnumValues(); -List values = new ArrayList(); -for (TEnum tEnum : enumValues) { - values.add(new EnumValue(tEnum.getValue(), tEnum.toString())); +if (field.isEnum()) { Review Comment: Why mixing UUID and ENUM in this case? ## parquet-format-structures/pom.xml: ## @@ -156,6 +156,11 @@ libthrift ${format.thrift.version} + + javax.annotation + javax.annotation-api Review Comment: Where do we need this? ## parquet-thrift/src/main/java/org/apache/parquet/thrift/struct/ThriftTypeID.java: ## @@ -51,10 +51,15 @@ public enum ThriftTypeID { LIST (TType.LIST, true, ListType.class), ENUM (TType.ENUM, TType.I32, EnumType.class); - private static ThriftTypeID[] types = new ThriftTypeID[17]; + private static final ThriftTypeID[] types; static { +types = new ThriftTypeID[18]; for (ThriftTypeID t : ThriftTypeID.values()) { - types[t.thriftType] = t; + if (t.thriftType == -1) { Review Comment: It would be good to add the link to the comment as well. Or at least we need to explain why -1 is used here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] PARQUET-2396: Refactor `ColumnIndexBuilder` [parquet-mr]
Fokko merged PR #1219: URL: https://github.com/apache/parquet-mr/pull/1219
Re: [PR] PARQUET-2391: Remove unnecessary unboxing [parquet-mr]
Fokko commented on PR #1214: URL: https://github.com/apache/parquet-mr/pull/1214#issuecomment-1837969587 Thanks for the review @wgtmac & @amousavigourabi
Re: [PR] PARQUET-1647: [Java][Parquet] Implement FLOAT16 logical type [parquet-mr]
zhangjiashen commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1413455235

## pom.xml: ##
@@ -596,6 +597,9 @@
 org.apache.parquet.arrow.schema.SchemaMapping
+
+ org.apache.parquet.io.api.Binary

Review Comment: @wgtmac Just updated pom.xml to only exclude methods
Re: [PR] PARQUET-2391: Remove unnecessary unboxing [parquet-mr]
Fokko merged PR #1214: URL: https://github.com/apache/parquet-mr/pull/1214
Re: [PR] PARQUET-2385: Allow user to specify CodecFactory for ParquetWriter [parquet-mr]
wgtmac merged PR #1203: URL: https://github.com/apache/parquet-mr/pull/1203
Re: [PR] PARQUET-2390: Replace anonymous functions with lambdas [parquet-mr]
Fokko commented on PR #1213: URL: https://github.com/apache/parquet-mr/pull/1213#issuecomment-1837879063 Thanks for the review @wgtmac
Re: [PR] PARQUET-2390: Replace anonymous functions with lambdas [parquet-mr]
Fokko merged PR #1213: URL: https://github.com/apache/parquet-mr/pull/1213
Re: [PR] PARQUET-2400: Update Spotless command in PR prompt to include plugins [parquet-mr]
wgtmac merged PR #1223: URL: https://github.com/apache/parquet-mr/pull/1223
Re: [PR] PARQUET-1647: [Java][Parquet] Implement FLOAT16 logical type [parquet-mr]
wgtmac commented on code in PR #1142: URL: https://github.com/apache/parquet-mr/pull/1142#discussion_r1413353347

## pom.xml: ##
@@ -596,6 +597,9 @@
 org.apache.parquet.arrow.schema.SchemaMapping
+
+ org.apache.parquet.io.api.Binary

Review Comment: Thanks for rebasing! Could you change the exclusion to the level of a specific method? Class level seems too wide to me.