[jira] [Updated] (PARQUET-2250) Expose column descriptor through RecordReader

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated PARQUET-2250: Labels: pull-request-available (was: ) > Expose column descriptor through RecordReader

[jira] [Created] (PARQUET-2250) Expose column descriptor through RecordReader

2023-02-23 Thread fatemah (Jira)
fatemah created PARQUET-2250: Summary: Expose column descriptor through RecordReader Key: PARQUET-2250 URL: https://issues.apache.org/jira/browse/PARQUET-2250 Project: Parquet Issue Type:

[jira] [Commented] (PARQUET-2251) Avoid generating Bloomfilter when all pages of a column are encoded by dictionary

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693007#comment-17693007 ] ASF GitHub Bot commented on PARQUET-2251: - wgtmac commented on code in PR #1033: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1033: PARQUET-2251 Avoid generating Bloomfilter when all pages of a column are encoded by dictionary in parquet pageV1

2023-02-23 Thread via GitHub
wgtmac commented on code in PR #1033: URL: https://github.com/apache/parquet-mr/pull/1033#discussion_r1116520642 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestStoreBloomFilter.java: ## @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[jira] [Commented] (PARQUET-2251) Avoid generating Bloomfilter when all pages of a column are encoded by dictionary

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693017#comment-17693017 ] ASF GitHub Bot commented on PARQUET-2251: - yabola commented on code in PR #1033: URL:

[jira] [Commented] (PARQUET-2149) Implement async IO for Parquet file reader

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693023#comment-17693023 ] ASF GitHub Bot commented on PARQUET-2149: - whcdjj commented on PR #968: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1011: PARQUET-2159: java17 vector parquet bit-packing decode optimization

2023-02-23 Thread via GitHub
wgtmac commented on code in PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1116427502 ## parquet-generator/src/main/java/org/apache/parquet/encoding/vectorbitpacking/BitPackingGenerator512Vector.java: ## @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache

[jira] [Commented] (PARQUET-831) Corrupt Parquet Files

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692972#comment-17692972 ] ASF GitHub Bot commented on PARQUET-831: wgtmac commented on code in PR #1022: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1022: PARQUET-831: fix estimate page size check overflow corrupting parquet

2023-02-23 Thread via GitHub
wgtmac commented on code in PR #1022: URL: https://github.com/apache/parquet-mr/pull/1022#discussion_r1116429546 ## parquet-column/src/test/java/org/apache/parquet/column/impl/TestColumnWriterV1.java: ## @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[jira] [Commented] (PARQUET-2251) Avoid generating Bloomfilter when all pages of a column are encoded by dictionary

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692993#comment-17692993 ] ASF GitHub Bot commented on PARQUET-2251: - yabola opened a new pull request, #1033: URL:

[GitHub] [parquet-mr] yabola opened a new pull request, #1033: PARQUET-2251 Avoid generating Bloomfilter when all pages of a column are encoded by dictionary in parquet pageV1

2023-02-23 Thread via GitHub
yabola opened a new pull request, #1033: URL: https://github.com/apache/parquet-mr/pull/1033 In parquet pageV1, even when all pages of a column are dictionary-encoded, a BloomFilter is still generated. Generating it is unnecessary: it costs time and occupies storage.
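The reasoning in this PR description can be illustrated with a small sketch. If every data page of a column chunk is dictionary-encoded, the dictionary page lists every distinct value in that chunk, so a reader can answer "might this value be present?" exactly from the dictionary. A Bloom filter then only adds a probabilistic answer at extra write time and storage cost. The sketch assumes the writer knows each page's encoding. The class `BloomFilterDecision`, the enum `PageEncoding` and both methods are hypothetical names for illustration only. They are not parquet-mr's API and not the code in PR #1033.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (illustrative names, not parquet-mr API): decide whether a
 * column chunk needs a Bloom filter, given the encodings of its data pages.
 */
public class BloomFilterDecision {

  /** Illustrative subset of Parquet page encodings (assumption, not the real enum). */
  enum PageEncoding {
    PLAIN,
    PLAIN_DICTIONARY,
    RLE_DICTIONARY;

    boolean usesDictionary() {
      return this == PLAIN_DICTIONARY || this == RLE_DICTIONARY;
    }
  }

  /**
   * Returns true only if at least one data page fell back to a non-dictionary
   * encoding. If every page is dictionary-encoded, the dictionary already holds
   * every distinct value, so no Bloom filter is needed.
   */
  static boolean shouldWriteBloomFilter(List<PageEncoding> pageEncodings) {
    for (PageEncoding encoding : pageEncodings) {
      if (!encoding.usesDictionary()) {
        return true; // fallback page: values outside the dictionary exist
      }
    }
    return false; // all pages dictionary-encoded: the dictionary answers lookups exactly
  }

  /** Exact membership check answered from the dictionary alone. */
  static boolean dictionaryMayContain(Set<String> dictionary, String value) {
    return dictionary.contains(value);
  }

  public static void main(String[] args) {
    List<PageEncoding> allDictionary =
        List.of(PageEncoding.PLAIN_DICTIONARY, PageEncoding.PLAIN_DICTIONARY);
    List<PageEncoding> fellBack =
        List.of(PageEncoding.PLAIN_DICTIONARY, PageEncoding.PLAIN);

    System.out.println(shouldWriteBloomFilter(allDictionary));
    System.out.println(shouldWriteBloomFilter(fellBack));
    System.out.println(dictionaryMayContain(Set.of("a", "b"), "c"));
  }
}
```

The check is a simple "any page fell back?" scan. A real writer would also have to handle a chunk whose dictionary grew too large mid-write, which is exactly the case where pages switch to `PLAIN` and a Bloom filter becomes useful again.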

[jira] [Commented] (PARQUET-2251) Avoid generating Bloomfilter when all pages of a column are encoded by dictionary

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693000#comment-17693000 ] ASF GitHub Bot commented on PARQUET-2251: - yabola commented on PR #1033: URL:

[GitHub] [parquet-mr] yabola commented on pull request #1033: PARQUET-2251 Avoid generating Bloomfilter when all pages of a column are encoded by dictionary in parquet pageV1

2023-02-23 Thread via GitHub
yabola commented on PR #1033: URL: https://github.com/apache/parquet-mr/pull/1033#issuecomment-1442792170 @wgtmac @gerashegalov Please take a look, thank you~ And I will update [PR](https://github.com/apache/parquet-mr/pull/1023) to skip bloomfilter when all pages are encoded in

[jira] [Resolved] (PARQUET-2201) Add Stress test for RecordReader SkipRecords

2023-02-23 Thread Micah Kornfield (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield resolved PARQUET-2201. -- Fix Version/s: cpp-11.0.0 Resolution: Fixed Issue resolved by pull request

[jira] [Assigned] (PARQUET-2201) Add Stress test for RecordReader SkipRecords

2023-02-23 Thread Micah Kornfield (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield reassigned PARQUET-2201: Assignee: fatemah > Add Stress test for RecordReader SkipRecords >

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692970#comment-17692970 ] ASF GitHub Bot commented on PARQUET-2159: - wgtmac commented on code in PR #1011: URL:

[jira] [Updated] (PARQUET-2251) Avoid generating Bloomfilter when all pages of one column are encoded by dictionary

2023-02-23 Thread Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2251: -- Description: In parquet pageV1, even all pages of one column are encoded by dictionary (was: In parquet

[jira] [Updated] (PARQUET-2251) Avoid generating Bloomfilter when all pages of a column are encoded by dictionary

2023-02-23 Thread Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2251: -- Summary: Avoid generating Bloomfilter when all pages of a column are encoded by dictionary (was: Avoid

[jira] [Updated] (PARQUET-2251) Avoid generating Bloomfilter when all pages of one column are encoded by dictionary

2023-02-23 Thread Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2251: -- Description: In parquet pageV1, even all pages of a column are encoded by dictionary, it will still

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692968#comment-17692968 ] ASF GitHub Bot commented on PARQUET-2159: - wgtmac commented on code in PR #1011: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1011: PARQUET-2159: java17 vector parquet bit-packing decode optimization

2023-02-23 Thread via GitHub
wgtmac commented on code in PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1116420531 ## pom.xml: ## @@ -151,6 +151,9 @@ parquet-scala parquet-thrift parquet-hadoop-bundle + +http://maven.apache.org/POM/4.0.0; +

[jira] [Created] (PARQUET-2251) Avoid generating Bloomfilter when all pages of one column are encoded by dictionary

2023-02-23 Thread Mars (Jira)
Mars created PARQUET-2251: - Summary: Avoid generating Bloomfilter when all pages of one column are encoded by dictionary Key: PARQUET-2251 URL: https://issues.apache.org/jira/browse/PARQUET-2251 Project:

[jira] [Updated] (PARQUET-2251) Avoid generating Bloomfilter when all pages of one column are encoded by dictionary

2023-02-23 Thread Mars (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated PARQUET-2251: -- Description: In parquet pageV1,  > Avoid generating Bloomfilter when all pages of one column are encoded by

[GitHub] [parquet-mr] whcdjj commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

2023-02-23 Thread via GitHub
whcdjj commented on PR #968: URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1442878000 Hi, I am very interested in this optimization and have some questions after testing it in a cluster with 4 nodes/96 cores using Spark 3.1. Unfortunately, I see little improvement. I

[GitHub] [parquet-mr] yabola commented on a diff in pull request #1033: PARQUET-2251 Avoid generating Bloomfilter when all pages of a column are encoded by dictionary in parquet v1

2023-02-23 Thread via GitHub
yabola commented on code in PR #1033: URL: https://github.com/apache/parquet-mr/pull/1033#discussion_r1116549921 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestStoreBloomFilter.java: ## @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692604#comment-17692604 ] ASF GitHub Bot commented on PARQUET-2159: - gszadovszky commented on code in PR #1011: URL:

[GitHub] [parquet-mr] gszadovszky commented on a diff in pull request #1011: PARQUET-2159: java17 vector parquet bit-packing decode optimization

2023-02-23 Thread via GitHub
gszadovszky commented on code in PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1115429762 ## parquet-column/src/main/java/org/apache/parquet/column/values/bitpacking/ParquetReadRouter.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692616#comment-17692616 ] ASF GitHub Bot commented on PARQUET-2159: - jiangjiguang commented on code in PR #1011: URL:

[GitHub] [parquet-mr] jiangjiguang commented on a diff in pull request #1011: PARQUET-2159: java17 vector parquet bit-packing decode optimization

2023-02-23 Thread via GitHub
jiangjiguang commented on code in PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1115471862 ## parquet-column/src/main/java/org/apache/parquet/column/values/bitpacking/ParquetReadRouter.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692606#comment-17692606 ] ASF GitHub Bot commented on PARQUET-2159: - jatin-bhateja commented on code in PR #1011: URL:

[GitHub] [parquet-mr] jatin-bhateja commented on a diff in pull request #1011: PARQUET-2159: java17 vector parquet bit-packing decode optimization

2023-02-23 Thread via GitHub
jatin-bhateja commented on code in PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1115408615 ## parquet-column/src/main/java/org/apache/parquet/column/values/bitpacking/ParquetReadRouter.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692614#comment-17692614 ] ASF GitHub Bot commented on PARQUET-2159: - jiangjiguang commented on code in PR #1011: URL:

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692597#comment-17692597 ] ASF GitHub Bot commented on PARQUET-2159: - jatin-bhateja commented on code in PR #1011: URL:

[GitHub] [parquet-mr] jatin-bhateja commented on a diff in pull request #1011: PARQUET-2159: java17 vector parquet bit-packing decode optimization

2023-02-23 Thread via GitHub
jatin-bhateja commented on code in PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1115435366 ## parquet-column/src/main/java/org/apache/parquet/column/values/bitpacking/ParquetReadRouter.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692644#comment-17692644 ] ASF GitHub Bot commented on PARQUET-2159: - jiangjiguang commented on code in PR #1011: URL:

[GitHub] [parquet-mr] jiangjiguang commented on a diff in pull request #1011: PARQUET-2159: java17 vector parquet bit-packing decode optimization

2023-02-23 Thread via GitHub
jiangjiguang commented on code in PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1115317241 ## parquet-generator/src/main/java/org/apache/parquet/encoding/vectorbitpacking/BitPackingGenerator512Vector.java: ## @@ -0,0 +1,67 @@ +/* + * Licensed to the

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692642#comment-17692642 ] ASF GitHub Bot commented on PARQUET-2159: - jiangjiguang commented on code in PR #1011: URL:

[jira] [Commented] (PARQUET-2246) Add short circuit logic to column index filter

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692685#comment-17692685 ] ASF GitHub Bot commented on PARQUET-2246: - gszadovszky merged PR #1030: URL:

[jira] [Commented] (PARQUET-2198) Vulnerabilities in jackson-databind

2023-02-23 Thread Brais Couce (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692703#comment-17692703 ] Brais Couce commented on PARQUET-2198: -- I see that the PR associated with this ticket was merged

[jira] [Assigned] (PARQUET-2246) Add short circuit logic to column index filter

2023-02-23 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2246: - Assignee: Yujiang Zhong > Add short circuit logic to column index filter >

[jira] [Resolved] (PARQUET-2246) Add short circuit logic to column index filter

2023-02-23 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2246. --- Resolution: Fixed > Add short circuit logic to column index filter >

[GitHub] [parquet-mr] gszadovszky merged pull request #1030: PARQUET-2246: Add short circuit logic to column index filter

2023-02-23 Thread via GitHub
gszadovszky merged PR #1030: URL: https://github.com/apache/parquet-mr/pull/1030

[jira] [Commented] (PARQUET-831) Corrupt Parquet Files

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692816#comment-17692816 ] ASF GitHub Bot commented on PARQUET-831: jianchun commented on code in PR #1022: URL:

[GitHub] [parquet-mr] jianchun commented on a diff in pull request #1022: PARQUET-831: fix estimate page size check overflow corrupting parquet

2023-02-23 Thread via GitHub
jianchun commented on code in PR #1022: URL: https://github.com/apache/parquet-mr/pull/1022#discussion_r1116058380 ## parquet-column/src/test/java/org/apache/parquet/column/impl/TestColumnWriterV1.java: ## @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692819#comment-17692819 ] ASF GitHub Bot commented on PARQUET-2159: - jatin-bhateja commented on code in PR #1011: URL: