[GitHub] [parquet-mr] wgtmac commented on pull request #1011: PARQUET-2159: Vectorized BytePacker decoder using Java VectorAPI

2023-03-01 Thread via GitHub
wgtmac commented on PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#issuecomment-1451385421 I'd request sign off from @gszadovszky @shangxinli -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-03-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695499#comment-17695499 ] ASF GitHub Bot commented on PARQUET-2159: - wgtmac commented on PR #1011: URL:

[jira] [Commented] (PARQUET-2252) Make some methods public to allow external projects to implement page skipping

2023-03-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695023#comment-17695023 ] ASF GitHub Bot commented on PARQUET-2252: - zhongyujiang commented on PR #1038: URL:

[GitHub] [parquet-mr] zhongyujiang commented on pull request #1038: PARQUET-2252: Make some methods public to allow external projects to …

2023-03-01 Thread via GitHub
zhongyujiang commented on PR #1038: URL: https://github.com/apache/parquet-mr/pull/1038#issuecomment-1449991193 @wgtmac @rdblue Can you please help review this?

[GitHub] [parquet-mr] gszadovszky commented on pull request #1036: PARQUET-2230: [CLI] Deprecate commands replaced by rewrite

2023-03-01 Thread via GitHub
gszadovszky commented on PR #1036: URL: https://github.com/apache/parquet-mr/pull/1036#issuecomment-1450139433 Thanks a lot, @wgtmac. It looks good to me.

[jira] [Commented] (PARQUET-2230) Add a new rewrite command powered by ParquetRewriter

2023-03-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695070#comment-17695070 ] ASF GitHub Bot commented on PARQUET-2230: - gszadovszky commented on PR #1036: URL:

Re: Fallback Encoding for Very Sparse or Sorted Datasets

2023-03-01 Thread Patrick Hansert
Hi Gang, thanks for your reply. On 01.03.23 03:09, Gang Wu wrote: If at least one record in the beginning 2 rows is not null, then the encoded size will be much better. That is the workaround I have been using for the past weeks, although my tests show that at least two values are

[GitHub] [parquet-mr] zhongyujiang opened a new pull request, #1038: PARQUET-2252: Make some methods public to allow external projects to …

2023-03-01 Thread via GitHub
zhongyujiang opened a new pull request, #1038: URL: https://github.com/apache/parquet-mr/pull/1038 …implement page skipping. Issue: [PARQUET-2252](https://issues.apache.org/jira/browse/PARQUET-2252) This PR makes some methods required to implement column index filter public to

[jira] [Commented] (PARQUET-2252) Make some methods public to allow external projects to implement page skipping

2023-03-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695022#comment-17695022 ] ASF GitHub Bot commented on PARQUET-2252: - zhongyujiang opened a new pull request, #1038: URL:

[GitHub] [parquet-mr] gszadovszky commented on pull request #1036: PARQUET-2230: [CLI] Deprecate commands replaced by rewrite

2023-03-01 Thread via GitHub
gszadovszky commented on PR #1036: URL: https://github.com/apache/parquet-mr/pull/1036#issuecomment-1450142055 (Congrats for the committership! From now on I won't push your PRs. :wink: )

[jira] [Commented] (PARQUET-2230) Add a new rewrite command powered by ParquetRewriter

2023-03-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695073#comment-17695073 ] ASF GitHub Bot commented on PARQUET-2230: - gszadovszky commented on PR #1036: URL:

[jira] [Created] (PARQUET-2252) Make some methods public to allow external projects to implement page skipping

2023-03-01 Thread Yujiang Zhong (Jira)
Yujiang Zhong created PARQUET-2252: -- Summary: Make some methods public to allow external projects to implement page skipping Key: PARQUET-2252 URL: https://issues.apache.org/jira/browse/PARQUET-2252

Re: Fallback Encoding for Very Sparse or Sorted Datasets

2023-03-01 Thread Gang Wu
> What are the reasons for forcing the dictionary to be the first page? This is by design. I guess it benefits sequential scan where the dictionary page is read first and then followed by its encoded indices in the data pages. Otherwise we need to seek anyway. > can this be changed to allow for
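The layout argument in the reply above (dictionary page first, then data pages holding encoded indices) can be illustrated with a toy model. This is only a sketch under stated assumptions: the function names `dictionary_encode` and `sequential_decode` and the `("dict", ...)` / `("data", ...)` page tuples are hypothetical and do not reflect the real Parquet on-disk format or the parquet-mr API.

```python
from typing import Dict, List, Sequence, Tuple


def dictionary_encode(values: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, indices); the dictionary is built in first-seen order."""
    dictionary: List[str] = []
    positions: Dict[str, int] = {}
    indices: List[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def sequential_decode(pages: Sequence[Tuple[str, list]]) -> List[str]:
    """Decode a page stream in a single forward pass (hypothetical layout)."""
    dictionary = None
    out: List[str] = []
    for kind, payload in pages:
        if kind == "dict":
            dictionary = payload
        else:
            # Without the dictionary up front, a reader would have to seek.
            if dictionary is None:
                raise ValueError("data page seen before dictionary page")
            out.extend(dictionary[i] for i in payload)
    return out


values = ["a", "b", "a", "a", "c", "b"]
dictionary, indices = dictionary_encode(values)
# Dictionary page first, followed by two data pages of indices.
pages = [("dict", dictionary), ("data", indices[:3]), ("data", indices[3:])]
decoded = sequential_decode(pages)
```

With the dictionary placed first, the reader never has to look ahead; swapping the page order in `pages` makes `sequential_decode` fail, which mirrors the "otherwise we need to seek anyway" remark.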

[GitHub] [parquet-mr] wgtmac merged pull request #1036: PARQUET-2230: [CLI] Deprecate commands replaced by rewrite

2023-03-01 Thread via GitHub
wgtmac merged PR #1036: URL: https://github.com/apache/parquet-mr/pull/1036

[jira] [Commented] (PARQUET-2230) Add a new rewrite command powered by ParquetRewriter

2023-03-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695171#comment-17695171 ] ASF GitHub Bot commented on PARQUET-2230: - wgtmac merged PR #1036: URL:

[GitHub] [parquet-mr] wgtmac commented on pull request #1036: PARQUET-2230: [CLI] Deprecate commands replaced by rewrite

2023-03-01 Thread via GitHub
wgtmac commented on PR #1036: URL: https://github.com/apache/parquet-mr/pull/1036#issuecomment-1450365870 > (Congrats for the committership! From now on I won't push your PRs. :wink: ) Thank you for your help all the time! @gszadovszky

[jira] [Commented] (PARQUET-2230) Add a new rewrite command powered by ParquetRewriter

2023-03-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695170#comment-17695170 ] ASF GitHub Bot commented on PARQUET-2230: - wgtmac commented on PR #1036: URL:

[jira] [Commented] (PARQUET-2252) Make some methods public to allow external projects to implement page skipping

2023-03-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695180#comment-17695180 ] ASF GitHub Bot commented on PARQUET-2252: - gszadovszky commented on PR #1038: URL:

[GitHub] [parquet-mr] gszadovszky commented on pull request #1038: PARQUET-2252: Make some methods public to allow external projects to …

2023-03-01 Thread via GitHub
gszadovszky commented on PR #1038: URL: https://github.com/apache/parquet-mr/pull/1038#issuecomment-1450409997 Since these are already used in Iceberg, I think it is better to have them public and maintain backward compatibility.

[jira] [Commented] (PARQUET-2252) Make some methods public to allow external projects to implement page skipping

2023-03-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695166#comment-17695166 ] ASF GitHub Bot commented on PARQUET-2252: - wgtmac commented on PR #1038: URL:

[GitHub] [parquet-mr] wgtmac commented on pull request #1038: PARQUET-2252: Make some methods public to allow external projects to …

2023-03-01 Thread via GitHub
wgtmac commented on PR #1038: URL: https://github.com/apache/parquet-mr/pull/1038#issuecomment-1450359195 @gszadovszky @shangxinli Do you have any concerns?

[GitHub] [parquet-format] wgtmac commented on a diff in pull request #190: Minor: add FIXED_LEN_BYTE_ARRAY under Types in doc

2023-03-01 Thread via GitHub
wgtmac commented on code in PR #190: URL: https://github.com/apache/parquet-format/pull/190#discussion_r1122508344 ## README.md: ## @@ -132,6 +132,7 @@ readers and writers for the format. The types are: - FLOAT: IEEE 32-bit floating point values - DOUBLE: IEEE 64-bit

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-03-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695405#comment-17695405 ] ASF GitHub Bot commented on PARQUET-2159: - jiangjiguang commented on code in PR #1011: URL:

[GitHub] [parquet-mr] jiangjiguang commented on a diff in pull request #1011: PARQUET-2159: java17 vector parquet bit-packing decode optimization

2023-03-01 Thread via GitHub
jiangjiguang commented on code in PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1121226362 ## .github/workflows/vector-plugins.yml: ## @@ -0,0 +1,56 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

[jira] [Commented] (PARQUET-2252) Make some methods public to allow external projects to implement page skipping

2023-03-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695407#comment-17695407 ] ASF GitHub Bot commented on PARQUET-2252: - wgtmac commented on code in PR #1038: URL:

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-03-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695406#comment-17695406 ] ASF GitHub Bot commented on PARQUET-2159: - jiangjiguang commented on code in PR #1011: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1038: PARQUET-2252: Make some methods public to allow external projects to …

2023-03-01 Thread via GitHub
wgtmac commented on code in PR #1038: URL: https://github.com/apache/parquet-mr/pull/1038#discussion_r1122511179 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1011,6 +1012,35 @@ public PageReadStore readFilteredRowGroup(int

[jira] [Resolved] (PARQUET-2251) Avoid generating Bloomfilter when all pages of a column are encoded by dictionary

2023-03-01 Thread Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars resolved PARQUET-2251. --- Resolution: Fixed > Avoid generating Bloomfilter when all pages of a column are encoded by > dictionary >

[GitHub] [parquet-mr] jiangjiguang commented on a diff in pull request #1011: PARQUET-2159: Vectorized BytePacker decoder using Java VectorAPI

2023-03-01 Thread via GitHub
jiangjiguang commented on code in PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1122538089 ## .github/workflows/vector-plugins.yml: ## @@ -0,0 +1,56 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-03-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695408#comment-17695408 ] ASF GitHub Bot commented on PARQUET-2159: - jiangjiguang commented on code in PR #1011: URL:

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-03-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695418#comment-17695418 ] ASF GitHub Bot commented on PARQUET-2159: - jiangjiguang commented on code in PR #1011: URL: