[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #998: PARQUET-2195: Add scan command to parquet-cli

2022-10-09 Thread GitBox
shangxinli commented on code in PR #998: URL: https://github.com/apache/parquet-mr/pull/998#discussion_r990826841 ## parquet-cli/src/main/java/org/apache/parquet/cli/commands/ScanCommand.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[jira] [Commented] (PARQUET-2195) Add scan command to parquet-cli

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614795#comment-17614795 ] ASF GitHub Bot commented on PARQUET-2195: - shangxinli commented on code in PR #998: URL:

[GitHub] [parquet-mr] shangxinli commented on pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-10-09 Thread GitBox
shangxinli commented on PR #988: URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1272621608 Hi @jinyius and @matthieun, Thank both of you for the contribution and we really appreciate your patience with us. Now we have two PRs for the same issue, we better merge them into

[jira] [Updated] (PARQUET-2202) Redundant String allocation on the hot path in CapacityByteArrayOutputStream.setByte

2022-10-09 Thread Andrei Pangin (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrei Pangin updated PARQUET-2202: --- Description: Profiling of a Spark application revealed a performance issue in production:

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614786#comment-17614786 ] ASF GitHub Bot commented on PARQUET-2196: - shangxinli commented on code in PR #1000: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-09 Thread GitBox
shangxinli commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r990824259 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/Lz4RawCodec.java: ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [parquet-mr] shangxinli merged pull request #989: PARQUET-2176: Column index/statistics truncation in ParquetWriter

2022-10-09 Thread GitBox
shangxinli merged PR #989: URL: https://github.com/apache/parquet-mr/pull/989 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [parquet-mr] shangxinli merged pull request #990: PARQUET-2142: Update the parquet-cli document to avoid NoSuchMethodError

2022-10-09 Thread GitBox
shangxinli merged PR #990: URL: https://github.com/apache/parquet-mr/pull/990

[jira] [Commented] (PARQUET-2142) parquet-cli without hadoop throws java.lang.NoSuchMethodError on any parquet file access command

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614799#comment-17614799 ] ASF GitHub Bot commented on PARQUET-2142: - shangxinli merged PR #990: URL:

[GitHub] [parquet-mr] shangxinli commented on pull request #1003: Bump protobuf-java from 3.17.3 to 3.19.6 in /parquet-protobuf

2022-10-09 Thread GitBox
shangxinli commented on PR #1003: URL: https://github.com/apache/parquet-mr/pull/1003#issuecomment-1272620318 Not sure what does the 'compatibility' unknown mean?

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614787#comment-17614787 ] ASF GitHub Bot commented on PARQUET-2196: - shangxinli commented on code in PR #1000: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-09 Thread GitBox
shangxinli commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r990824600 ## parquet-hadoop/pom.xml: ## @@ -102,6 +102,11 @@ jar compile + + io.airlift Review Comment: Generally, we are strict to add

[jira] [Commented] (PARQUET-2196) Support LZ4_RAW codec

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614788#comment-17614788 ] ASF GitHub Bot commented on PARQUET-2196: - shangxinli commented on code in PR #1000: URL:

[GitHub] [parquet-mr] shangxinli merged pull request #962: Performance optimization to ByteBitPackingValuesReader

2022-10-09 Thread GitBox
shangxinli merged PR #962: URL: https://github.com/apache/parquet-mr/pull/962

[jira] [Commented] (PARQUET-2195) Add scan command to parquet-cli

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614797#comment-17614797 ] ASF GitHub Bot commented on PARQUET-2195: - shangxinli commented on code in PR #998: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #998: PARQUET-2195: Add scan command to parquet-cli

2022-10-09 Thread GitBox
shangxinli commented on code in PR #998: URL: https://github.com/apache/parquet-mr/pull/998#discussion_r990827069 ## parquet-cli/src/main/java/org/apache/parquet/cli/commands/ScanCommand.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[jira] [Updated] (PARQUET-2202) Redundant String allocation on the hot path in CapacityByteArrayOutputStream.setByte

2022-10-09 Thread Andrei Pangin (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrei Pangin updated PARQUET-2202: --- Description: Profiling of a Spark application revealed a performance issue in production:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #998: PARQUET-2195: Add scan command to parquet-cli

2022-10-09 Thread GitBox
shangxinli commented on code in PR #998: URL: https://github.com/apache/parquet-mr/pull/998#discussion_r990827325 ## parquet-cli/src/main/java/org/apache/parquet/cli/commands/ScanCommand.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[jira] [Commented] (PARQUET-2195) Add scan command to parquet-cli

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614798#comment-17614798 ] ASF GitHub Bot commented on PARQUET-2195: - shangxinli commented on code in PR #998: URL:

[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614809#comment-17614809 ] ASF GitHub Bot commented on PARQUET-1711: - shangxinli commented on PR #988: URL:

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-09 Thread GitBox
shangxinli commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r990824756 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/codec/TestInteropReadLz4RawCodec.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software

[jira] [Commented] (PARQUET-2176) Parquet writers should allow for configurable index/statistics truncation

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614791#comment-17614791 ] ASF GitHub Bot commented on PARQUET-2176: - shangxinli merged PR #989: URL:

[GitHub] [parquet-mr] shangxinli commented on pull request #974: PARQUET-2156: Column bloom filter: Show bloom filters in tools

2022-10-09 Thread GitBox
shangxinli commented on PR #974: URL: https://github.com/apache/parquet-mr/pull/974#issuecomment-1272610018 @panbingkun Do you still need this PR open?

[jira] [Commented] (PARQUET-2156) Column bloom filter: Show bloom filters in tools

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614800#comment-17614800 ] ASF GitHub Bot commented on PARQUET-2156: - shangxinli commented on PR #974: URL:

[jira] [Created] (PARQUET-2202) Redundant String allocation on the hot path in CapacityByteArrayOutputStream.setByte

2022-10-09 Thread Andrei Pangin (Jira)
Andrei Pangin created PARQUET-2202: -- Summary: Redundant String allocation on the hot path in CapacityByteArrayOutputStream.setByte Key: PARQUET-2202 URL: https://issues.apache.org/jira/browse/PARQUET-2202

[GitHub] [parquet-mr] jinyius commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-09 Thread GitBox
jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990912850 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -99,9 +139,9 @@ private Type.Repetition getRepetition(FieldDescriptor

[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614882#comment-17614882 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on code in PR #995: URL:

[GitHub] [parquet-mr] jinyius commented on pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-10-09 Thread GitBox
jinyius commented on PR #988: URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1272784474 > Hi @jinyius and @matthieun, Thank both of you for the contribution and we really appreciate your patience with us. Now we have two PRs for the same issue, we better merge them into

[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614889#comment-17614889 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on code in PR #995: URL:

[GitHub] [parquet-mr] jinyius commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-09 Thread GitBox
jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990914702 ## parquet-protobuf/src/test/resources/Trees.proto: ## @@ -0,0 +1,37 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614878#comment-17614878 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on PR #988: URL:

[GitHub] [parquet-mr] jinyius commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-09 Thread GitBox
jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990914134 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java: ## @@ -559,7 +564,14 @@ final void writeRawValue(Object value) { class

[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614887#comment-17614887 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on code in PR #995: URL:

[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614888#comment-17614888 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on code in PR #995: URL:

[GitHub] [parquet-mr] jinyius commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-09 Thread GitBox
jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990914303 ## parquet-protobuf/src/test/resources/BinaryTree.par: ## @@ -0,0 +1,50 @@ +message Trees.BinaryTree { + optional group value = 1 { Review Comment: this is

[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614890#comment-17614890 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on code in PR #995: URL:

[GitHub] [parquet-mr] jinyius commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-09 Thread GitBox
jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990915628 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java: ## @@ -559,7 +564,14 @@ final void writeRawValue(Object value) { class

[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-10-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614886#comment-17614886 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on code in PR #995: URL:

[GitHub] [parquet-mr] jinyius commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-09 Thread GitBox
jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990913344 ## parquet-protobuf/src/test/java/org/apache/parquet/proto/ProtoSchemaConverterTest.java: ## @@ -82,264 +93,447 @@ public void testConvertAllDatatypes() throws