Re: [PR] PARQUET-1942: Bump Arrow to 14.0.1 [parquet-mr]

2023-11-16 Thread via GitHub
Fokko merged PR #1193: URL: https://github.com/apache/parquet-mr/pull/1193 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[jira] [Commented] (PARQUET-1942) Bump Apache Arrow 2.0.0

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786667#comment-17786667 ] ASF GitHub Bot commented on PARQUET-1942: - Fokko merged PR #1193: URL:

[jira] [Updated] (PARQUET-1942) Bump Apache Arrow 2.0.0

2023-11-16 Thread Fokko Driesprong (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated PARQUET-1942: -- Affects Version/s: 1.13.1 (was: 1.11.0) > Bump Apache

Re: [PR] PARQUET-1942: Bump Arrow to 14.0.1 [parquet-mr]

2023-11-16 Thread via GitHub
Fokko commented on PR #1193: URL: https://github.com/apache/parquet-mr/pull/1193#issuecomment-1814012249 Thanks for the review @wgtmac

[jira] [Commented] (PARQUET-1942) Bump Apache Arrow 2.0.0

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786668#comment-17786668 ] ASF GitHub Bot commented on PARQUET-1942: - Fokko commented on PR #1193: URL:

[jira] [Resolved] (PARQUET-1942) Bump Apache Arrow 2.0.0

2023-11-16 Thread Fokko Driesprong (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved PARQUET-1942. --- Resolution: Fixed > Bump Apache Arrow 2.0.0 > --- > >

[jira] [Updated] (PARQUET-1942) Bump Apache Arrow 2.0.0

2023-11-16 Thread Fokko Driesprong (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated PARQUET-1942: -- Fix Version/s: 1.14.0 > Bump Apache Arrow 2.0.0 > --- > >

[jira] [Commented] (PARQUET-2378) Problem with a cat

2023-11-16 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786777#comment-17786777 ] Gang Wu commented on PARQUET-2378: -- Thanks for reporting the issue! I can reproduce it on my end. Let

Re: [PR] PARQUET-2380: Decouple rewriter from Hadoop [parquet-mr]

2023-11-16 Thread via GitHub
wgtmac commented on PR #1195: URL: https://github.com/apache/parquet-mr/pull/1195#issuecomment-1814783385 cc @ConeyLiu

[jira] [Commented] (PARQUET-2380) Decouple RewriteOptions from Hadoop classes

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786833#comment-17786833 ] ASF GitHub Bot commented on PARQUET-2380: - wgtmac commented on PR #1195: URL:

[jira] [Commented] (PARQUET-2382) Remove the deprecated OriginalType

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786835#comment-17786835 ] ASF GitHub Bot commented on PARQUET-2382: - wgtmac commented on PR #1194: URL:

Re: [PR] PARQUET-2382: Remove the deprecated `OriginalType` [parquet-mr]

2023-11-16 Thread via GitHub
wgtmac commented on PR #1194: URL: https://github.com/apache/parquet-mr/pull/1194#issuecomment-1814785912 I am fine with it, but I'd like to seek advice from @gszadovszky @shangxinli

Re: [PR] PARQUET-2375 Extend vectorized bit unpacking benchmark for various bit sizes. [parquet-mr]

2023-11-16 Thread via GitHub
jatin-bhateja commented on PR #1186: URL: https://github.com/apache/parquet-mr/pull/1186#issuecomment-1814978209 Hi @wgtmac, may I ask you to kindly merge this? I do not have write access to the repo.

[jira] [Commented] (PARQUET-2375) Extend vectorized bit unpacking benchmark for various bit sizes.

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786877#comment-17786877 ] ASF GitHub Bot commented on PARQUET-2375: - jatin-bhateja commented on PR #1186: URL:

[jira] [Comment Edited] (PARQUET-2378) Problem with a cat

2023-11-16 Thread Jiashen Zhang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787046#comment-17787046 ] Jiashen Zhang edited comment on PARQUET-2378 at 11/17/23 6:50 AM: --

[jira] [Comment Edited] (PARQUET-2378) Problem with a cat

2023-11-16 Thread Jiashen Zhang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787046#comment-17787046 ] Jiashen Zhang edited comment on PARQUET-2378 at 11/17/23 6:49 AM: --

[jira] [Comment Edited] (PARQUET-2378) Problem with a cat

2023-11-16 Thread Jiashen Zhang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787046#comment-17787046 ] Jiashen Zhang edited comment on PARQUET-2378 at 11/17/23 6:30 AM: --

Re: [PR] PARQUET-2374: Add metrics support for parquet file reader [parquet-mr]

2023-11-16 Thread via GitHub
ConeyLiu commented on code in PR #1187: URL: https://github.com/apache/parquet-mr/pull/1187#discussion_r1396756877 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java: ## @@ -80,10 +80,12 @@ static final class ColumnChunkPageReader implements

[jira] [Comment Edited] (PARQUET-2378) Problem with a cat

2023-11-16 Thread Jiashen Zhang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787030#comment-17787030 ] Jiashen Zhang edited comment on PARQUET-2378 at 11/17/23 5:42 AM: --

[jira] [Commented] (PARQUET-2378) Problem with a cat

2023-11-16 Thread Jiashen Zhang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787030#comment-17787030 ] Jiashen Zhang commented on PARQUET-2378:

[jira] [Comment Edited] (PARQUET-2378) Problem with a cat

2023-11-16 Thread Jiashen Zhang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787046#comment-17787046 ] Jiashen Zhang edited comment on PARQUET-2378 at 11/17/23 6:28 AM: --

[jira] [Commented] (PARQUET-2374) Add metrics support for parquet file reader

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787049#comment-17787049 ] ASF GitHub Bot commented on PARQUET-2374: - ConeyLiu commented on code in PR #1187: URL:

[jira] [Comment Edited] (PARQUET-2378) Problem with a cat

2023-11-16 Thread Jiashen Zhang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787046#comment-17787046 ] Jiashen Zhang edited comment on PARQUET-2378 at 11/17/23 6:56 AM: --

[jira] [Commented] (PARQUET-2374) Add metrics support for parquet file reader

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787048#comment-17787048 ] ASF GitHub Bot commented on PARQUET-2374: - ConeyLiu commented on code in PR #1187: URL:

Re: [PR] PARQUET-2373: Improve I/O performance with bloom_filter_length [parquet-mr]

2023-11-16 Thread via GitHub
zhangjiashen commented on code in PR #1184: URL: https://github.com/apache/parquet-mr/pull/1184#discussion_r1396702452 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1347,11 +1348,24 @@ public BloomFilter

[jira] [Commented] (PARQUET-2373) Improve I/O performance with bloom_filter_length

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787028#comment-17787028 ] ASF GitHub Bot commented on PARQUET-2373: - zhangjiashen commented on code in PR #1184: URL:

[jira] [Commented] (PARQUET-2380) Decouple RewriteOptions from Hadoop classes

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787041#comment-17787041 ] ASF GitHub Bot commented on PARQUET-2380: - ConeyLiu commented on code in PR #1195: URL:

[jira] [Commented] (PARQUET-2378) Problem with a cat

2023-11-16 Thread Jiashen Zhang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787046#comment-17787046 ] Jiashen Zhang commented on PARQUET-2378: What about we directly print content given a parquet

[jira] [Resolved] (PARQUET-2375) Extend vectorized bit unpacking benchmark for various bit sizes.

2023-11-16 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2375. -- Fix Version/s: 1.14.0 Assignee: JATIN BHATEJA Resolution: Fixed > Extend vectorized

[jira] [Commented] (PARQUET-2380) Decouple RewriteOptions from Hadoop classes

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787045#comment-17787045 ] ASF GitHub Bot commented on PARQUET-2380: - ConeyLiu commented on PR #1195: URL:

Re: [PR] PARQUET-2380: Decouple rewriter from Hadoop [parquet-mr]

2023-11-16 Thread via GitHub
ConeyLiu commented on PR #1195: URL: https://github.com/apache/parquet-mr/pull/1195#issuecomment-1815809399 Thanks @amousavigourabi for this contribution, and thanks @wgtmac for pinging me. Just two minor comments.

[jira] [Commented] (PARQUET-2375) Extend vectorized bit unpacking benchmark for various bit sizes.

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787044#comment-17787044 ] ASF GitHub Bot commented on PARQUET-2375: - wgtmac merged PR #1186: URL:

[jira] [Updated] (PARQUET-2378) Problem with a cat

2023-11-16 Thread Jiashen Zhang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiashen Zhang updated PARQUET-2378: --- Attachment: image-2023-11-16-21-40-07-628.png > Problem with a cat > -- >

Re: [VOTE] Release Apache Parquet Format 2.10.0 RC0

2023-11-16 Thread wish maple
+1 (non-binding) Thanks Gang for the release! Best, Xuwei Fu Gang Wu wrote on Thu, Nov 16, 2023 at 14:07: > Hi everyone, > > I propose the following RC to be released as the official Apache Parquet > Format 2.10.0 release. > > The commit id is b9c4fa81c3be13dc98760c92b037fa4dd465cef8 > * This corresponds to

Re: [PR] PARQUET-2373: Improve I/O performance with bloom_filter_length [parquet-mr]

2023-11-16 Thread via GitHub
zhangjiashen commented on code in PR #1184: URL: https://github.com/apache/parquet-mr/pull/1184#discussion_r1396684158 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ColumnChunkMetaData.java: ## @@ -341,6 +351,15 @@ public long getBloomFilterOffset() {

[jira] [Commented] (PARQUET-2373) Improve I/O performance with bloom_filter_length

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787022#comment-17787022 ] ASF GitHub Bot commented on PARQUET-2373: - zhangjiashen commented on code in PR #1184: URL:

[jira] [Commented] (PARQUET-2378) Problem with a cat

2023-11-16 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787036#comment-17787036 ] Gang Wu commented on PARQUET-2378: -- Can we get rid of the schema conversion via AvroSchemaConverter?

Re: [PR] PARQUET-2380: Decouple rewriter from Hadoop [parquet-mr]

2023-11-16 Thread via GitHub
ConeyLiu commented on code in PR #1195: URL: https://github.com/apache/parquet-mr/pull/1195#discussion_r1396744262 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/rewrite/ParquetRewriterTest.java: ## @@ -99,28 +103,47 @@ public class ParquetRewriterTest { private

[jira] [Commented] (PARQUET-2373) Improve I/O performance with bloom_filter_length

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787035#comment-17787035 ] ASF GitHub Bot commented on PARQUET-2373: - zhangjiashen commented on code in PR #1184: URL:

Re: [PR] PARQUET-2373: Improve I/O performance with bloom_filter_length [parquet-mr]

2023-11-16 Thread via GitHub
zhangjiashen commented on code in PR #1184: URL: https://github.com/apache/parquet-mr/pull/1184#discussion_r1396684158 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ColumnChunkMetaData.java: ## @@ -341,6 +351,15 @@ public long getBloomFilterOffset() {

[jira] [Commented] (PARQUET-2373) Improve I/O performance with bloom_filter_length

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787034#comment-17787034 ] ASF GitHub Bot commented on PARQUET-2373: - zhangjiashen commented on code in PR #1184: URL:

Re: [PR] PARQUET-2373: Improve I/O performance with bloom_filter_length [parquet-mr]

2023-11-16 Thread via GitHub
zhangjiashen commented on code in PR #1184: URL: https://github.com/apache/parquet-mr/pull/1184#discussion_r1396702452 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1347,11 +1348,24 @@ public BloomFilter

Re: [PR] PARQUET-2380: Decouple rewriter from Hadoop [parquet-mr]

2023-11-16 Thread via GitHub
ConeyLiu commented on code in PR #1195: URL: https://github.com/apache/parquet-mr/pull/1195#discussion_r1396740241 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/RewriteOptions.java: ## @@ -64,15 +72,53 @@ private RewriteOptions(Configuration conf,

[jira] [Commented] (PARQUET-2380) Decouple RewriteOptions from Hadoop classes

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787038#comment-17787038 ] ASF GitHub Bot commented on PARQUET-2380: - ConeyLiu commented on code in PR #1195: URL:

Re: [PR] PARQUET-2375: Extend vectorized bit unpacking benchmark for various bit sizes. [parquet-mr]

2023-11-16 Thread via GitHub
wgtmac merged PR #1186: URL: https://github.com/apache/parquet-mr/pull/1186

[jira] [Commented] (PARQUET-2374) Add metrics support for parquet file reader

2023-11-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787047#comment-17787047 ] ASF GitHub Bot commented on PARQUET-2374: - ConeyLiu commented on code in PR #1187: URL:

Re: [PR] PARQUET-2374: Add metrics support for parquet file reader [parquet-mr]

2023-11-16 Thread via GitHub
ConeyLiu commented on code in PR #1187: URL: https://github.com/apache/parquet-mr/pull/1187#discussion_r1396754140 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -125,10 +125,20 @@ public class ParquetFileReader implements Closeable {

[jira] [Comment Edited] (PARQUET-2378) Problem with a cat

2023-11-16 Thread Jiashen Zhang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787046#comment-17787046 ] Jiashen Zhang edited comment on PARQUET-2378 at 11/17/23 6:26 AM: --

Re: [PR] PARQUET-2374: Add metrics support for parquet file reader [parquet-mr]

2023-11-16 Thread via GitHub
ConeyLiu commented on code in PR #1187: URL: https://github.com/apache/parquet-mr/pull/1187#discussion_r1396753652 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -125,10 +125,20 @@ public class ParquetFileReader implements Closeable {