[GitHub] [parquet-mr] avinashkolluru opened a new pull request, #1005: PARQUET-2198 : Updating jackson data bind version to fix CVEs

2022-10-19 Thread GitBox
avinashkolluru opened a new pull request, #1005: URL: https://github.com/apache/parquet-mr/pull/1005 Fixes CVE-2022-42003 and CVE-2022-42004 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet

[GitHub] [parquet-mr] ggershinsky commented on pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-19 Thread GitBox
ggershinsky commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1283566340 I would also like to recommend adding @matthieun as a co-author to this PR, per the discussion in the parallel PR. -- This is an automated message from the Apache Git Service. To

[GitHub] [parquet-mr] ggershinsky commented on pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-19 Thread GitBox
ggershinsky commented on PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#issuecomment-1283562000 > > Yep, we test encryption interop using binary files in the parquet-testing repo. @wgtmac Please have a look at this code:

[GitHub] [parquet-mr] pitrou commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-18 Thread GitBox
pitrou commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r997975040 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/NonBlockedCompressor.java: ## @@ -0,0 +1,186 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [parquet-mr] pitrou commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-18 Thread GitBox
pitrou commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r997973251 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/codec/TestLz4RawCodec.java: ## @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-18 Thread GitBox
wgtmac commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r997973218 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/NonBlockedCompressor.java: ## @@ -0,0 +1,186 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [parquet-mr] pitrou commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-18 Thread GitBox
pitrou commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r997972294 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/codec/TestLz4RawCodec.java: ## @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [parquet-mr] pitrou commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-18 Thread GitBox
pitrou commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r997969281 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/codec/TestInteropReadLz4RawCodec.java: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software

[GitHub] [parquet-mr] pitrou commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-18 Thread GitBox
pitrou commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r997966317 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/NonBlockedDecompressor.java: ## @@ -0,0 +1,174 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [parquet-mr] pitrou commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-18 Thread GitBox
pitrou commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r997965979 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/NonBlockedCompressor.java: ## @@ -0,0 +1,186 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [parquet-mr] pitrou commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-18 Thread GitBox
pitrou commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r997964441 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/Lz4RawCodec.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [parquet-mr] pitrou commented on pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-18 Thread GitBox
pitrou commented on PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#issuecomment-1282107004 It would probably be easier to use a git submodule (I wonder why that's not the approach being used in `parquet-mr`), but in any case reference files should go into the

[GitHub] [parquet-mr] wgtmac commented on pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-18 Thread GitBox
wgtmac commented on PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#issuecomment-1282105001 > Yep, we test encryption interop using binary files in the parquet-testing repo. @wgtmac Please have a look at this code:

[GitHub] [parquet-mr] ggershinsky commented on pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-18 Thread GitBox
ggershinsky commented on PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#issuecomment-1281945116 Yep, we test encryption interop using binary files in the parquet-testing repo. @wgtmac Please have a look at this code:

[GitHub] [parquet-mr] emkornfield commented on pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-17 Thread GitBox
emkornfield commented on PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#issuecomment-1281847541 FWIW, Arrow/Parquet C++ checkout https://github.com/apache/parquet-testing when running parquet tests (instead of checking binary files into the main repo). As an aside, I

[GitHub] [parquet-mr] shangxinli commented on pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-17 Thread GitBox
shangxinli commented on PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#issuecomment-1281720715 Hm... any opinion on this @ggershinsky ?

[GitHub] [parquet-mr] wgtmac commented on pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-17 Thread GitBox
wgtmac commented on PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#issuecomment-1281706700 > Looks good. The only thing is we checked in binary files directly. It would be hard to maintain in the future. Can you generate the parquet file using the parquetwriter?

[GitHub] [parquet-mr] mukund-thakur commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-17 Thread GitBox
mukund-thakur commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r997562516 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-17 Thread GitBox
parthchandra commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r997548363 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] sheinbergon commented on pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-10-17 Thread GitBox
sheinbergon commented on PR #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1281551940 @shangxinli any reason why this PR hasn't been merged yet?

[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-17 Thread GitBox
parthchandra commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r997546366 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] mukund-thakur commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-17 Thread GitBox
mukund-thakur commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r997539131 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-17 Thread GitBox
parthchandra commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r997390810 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] shangxinli commented on pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-17 Thread GitBox
shangxinli commented on PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#issuecomment-1281244709 Looks good. The only thing is we checked in binary files directly. It would be hard to maintain in the future. Can you generate the parquet file using the parquetwriter?

[GitHub] [parquet-mr] shangxinli commented on pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-17 Thread GitBox
shangxinli commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1281237456 @ggershinsky Can you have a look?

[GitHub] [parquet-mr] jinyius commented on pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-17 Thread GitBox
jinyius commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1281213675 > Mostly looks reasonable, I'm not too familiar with parquet-mr @shangxinli can you recommend someone who might be able to give a better review? pinging @shangxinli :)

[GitHub] [parquet-mr] shangxinli commented on pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

2022-10-17 Thread GitBox
shangxinli commented on PR #978: URL: https://github.com/apache/parquet-mr/pull/978#issuecomment-1281184662 @ala Thanks for pinging me! At this moment, I don't have ETA yet.

[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-17 Thread GitBox
parthchandra commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r997314660 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] ala commented on pull request #978: PARQUET-2161: Fix row index generation in combination with range filtering

2022-10-17 Thread GitBox
ala commented on PR #978: URL: https://github.com/apache/parquet-mr/pull/978#issuecomment-1281067651 @ggershinsky @shangxinli Hi! I just wanted to ask if 1.12.4 release might be happening soon (it seems in the previous years there usually was a release around September-October time)? We

[GitHub] [parquet-mr] mukund-thakur commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-14 Thread GitBox
mukund-thakur commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r996074063 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-14 Thread GitBox
parthchandra commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r996038868 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] mukund-thakur commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-14 Thread GitBox
mukund-thakur commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r995950328 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-13 Thread GitBox
parthchandra commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r995178483 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-13 Thread GitBox
parthchandra commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r995178115 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] mukund-thakur commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-13 Thread GitBox
mukund-thakur commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r995132403 ## parquet-common/src/main/java/org/apache/parquet/io/SeekableInputStream.java: ## @@ -23,6 +23,10 @@ import java.io.IOException; import java.io.InputStream;

[GitHub] [parquet-mr] mukund-thakur commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-13 Thread GitBox
mukund-thakur commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r995112398 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-12 Thread GitBox
parthchandra commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r993981829 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] mukund-thakur commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-12 Thread GitBox
mukund-thakur commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r993792933 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] ChrisCollinsIBM commented on pull request #888: PARQUET-2020: Remove deprecated modules

2022-10-12 Thread GitBox
ChrisCollinsIBM commented on PR #888: URL: https://github.com/apache/parquet-mr/pull/888#issuecomment-1276356059 Would still love to get some clarity on a path forward on this @Fokko @gszadovszky Thanks!

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-11 Thread GitBox
wgtmac commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r992923598 ## parquet-hadoop/pom.xml: ## @@ -102,6 +102,11 @@ jar compile + + io.airlift Review Comment: @shangxinli I have checked that

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-11 Thread GitBox
wgtmac commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r992924054 ## parquet-hadoop/pom.xml: ## @@ -102,6 +102,11 @@ jar compile + + io.airlift Review Comment: From the

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-11 Thread GitBox
wgtmac commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r992920886 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/codec/TestInteropReadLz4RawCodec.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-11 Thread GitBox
wgtmac commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r992915727 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/Lz4RawCodec.java: ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-11 Thread GitBox
wgtmac commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r992915052 ## parquet-common/src/main/java/org/apache/parquet/hadoop/metadata/CompressionCodecName.java: ## @@ -30,7 +30,8 @@ public enum CompressionCodecName {

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-11 Thread GitBox
wgtmac commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r992914648 ## parquet-cli/src/main/java/org/apache/parquet/cli/Util.java: ## @@ -151,6 +151,8 @@ public static String shortCodec(CompressionCodecName codec) { return

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-11 Thread GitBox
wgtmac commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r992914232 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/Lz4RawDecompressor.java: ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-11 Thread GitBox
parthchandra commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r992841047 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] mukund-thakur commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-11 Thread GitBox
mukund-thakur commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r992826565 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] mukund-thakur commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-11 Thread GitBox
mukund-thakur commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r992824827 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-11 Thread GitBox
parthchandra commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r992787569 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] mukund-thakur commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-11 Thread GitBox
mukund-thakur commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r992678347 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1093,10 +1099,38 @@ private ColumnChunkPageReadStore

[GitHub] [parquet-mr] mukund-thakur commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-11 Thread GitBox
mukund-thakur commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r992670321 ## parquet-common/src/main/java/org/apache/parquet/io/SeekableInputStream.java: ## @@ -23,6 +23,10 @@ import java.io.IOException; import java.io.InputStream;

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #998: PARQUET-2195: Add scan command to parquet-cli

2022-10-11 Thread GitBox
wgtmac commented on code in PR #998: URL: https://github.com/apache/parquet-mr/pull/998#discussion_r992446760 ## parquet-cli/src/main/java/org/apache/parquet/cli/commands/ScanCommand.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #998: PARQUET-2195: Add scan command to parquet-cli

2022-10-11 Thread GitBox
wgtmac commented on code in PR #998: URL: https://github.com/apache/parquet-mr/pull/998#discussion_r992437914 ## parquet-cli/src/main/java/org/apache/parquet/cli/commands/ScanCommand.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #998: PARQUET-2195: Add scan command to parquet-cli

2022-10-11 Thread GitBox
wgtmac commented on code in PR #998: URL: https://github.com/apache/parquet-mr/pull/998#discussion_r992436297 ## parquet-cli/src/main/java/org/apache/parquet/cli/commands/ScanCommand.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #998: PARQUET-2195: Add scan command to parquet-cli

2022-10-11 Thread GitBox
wgtmac commented on code in PR #998: URL: https://github.com/apache/parquet-mr/pull/998#discussion_r992431971 ## parquet-cli/src/main/java/org/apache/parquet/cli/commands/ScanCommand.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-10 Thread GitBox
parthchandra commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r991511389 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -1811,6 +1845,44 @@ public void readAll(SeekableInputStream f,

[GitHub] [parquet-mr] matthieun commented on pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-10-10 Thread GitBox
matthieun commented on PR #988: URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1273621299 Hi, I am fine with whatever solution. If you choose #995 that works, please just close this one!

[GitHub] [parquet-mr] jinyius commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-09 Thread GitBox
jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990915628 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java: ## @@ -559,7 +564,14 @@ final void writeRawValue(Object value) { class

[GitHub] [parquet-mr] jinyius commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-09 Thread GitBox
jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990914702 ## parquet-protobuf/src/test/resources/Trees.proto: ## @@ -0,0 +1,37 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [parquet-mr] jinyius commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-09 Thread GitBox
jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990914303 ## parquet-protobuf/src/test/resources/BinaryTree.par: ## @@ -0,0 +1,50 @@ +message Trees.BinaryTree { + optional group value = 1 { Review Comment: this is

[GitHub] [parquet-mr] jinyius commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-09 Thread GitBox
jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990914134 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java: ## @@ -559,7 +564,14 @@ final void writeRawValue(Object value) { class

[GitHub] [parquet-mr] jinyius commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-09 Thread GitBox
jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990913344 ## parquet-protobuf/src/test/java/org/apache/parquet/proto/ProtoSchemaConverterTest.java: ## @@ -82,264 +93,447 @@ public void testConvertAllDatatypes() throws

[GitHub] [parquet-mr] jinyius commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-09 Thread GitBox
jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990912850 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -99,9 +139,9 @@ private Type.Repetition getRepetition(FieldDescriptor

[GitHub] [parquet-mr] jinyius commented on pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-10-09 Thread GitBox
jinyius commented on PR #988: URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1272784474 > Hi @jinyius and @matthieun, Thank both of you for the contribution and we really appreciate your patience with us. Now we have two PRs for the same issue, we better merge them into

[GitHub] [parquet-mr] shangxinli commented on pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-10-09 Thread GitBox
shangxinli commented on PR #988: URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1272621608 Hi @jinyius and @matthieun, Thank both of you for the contribution and we really appreciate your patience with us. Now we have two PRs for the same issue, we better merge them into

[GitHub] [parquet-mr] shangxinli commented on pull request #1003: Bump protobuf-java from 3.17.3 to 3.19.6 in /parquet-protobuf

2022-10-09 Thread GitBox
shangxinli commented on PR #1003: URL: https://github.com/apache/parquet-mr/pull/1003#issuecomment-1272620318 Not sure what does the 'compatibility' unknown mean?

[GitHub] [parquet-mr] shangxinli commented on pull request #974: PARQUET-2156: Column bloom filter: Show bloom filters in tools

2022-10-09 Thread GitBox
shangxinli commented on PR #974: URL: https://github.com/apache/parquet-mr/pull/974#issuecomment-1272610018 @panbingkun Do you still need this PR open?

[GitHub] [parquet-mr] shangxinli merged pull request #990: PARQUET-2142: Update the parquet-cli document to avoid NoSuchMethodError

2022-10-09 Thread GitBox
shangxinli merged PR #990: URL: https://github.com/apache/parquet-mr/pull/990

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #998: PARQUET-2195: Add scan command to parquet-cli

2022-10-09 Thread GitBox
shangxinli commented on code in PR #998: URL: https://github.com/apache/parquet-mr/pull/998#discussion_r990827325 ## parquet-cli/src/main/java/org/apache/parquet/cli/commands/ScanCommand.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #998: PARQUET-2195: Add scan command to parquet-cli

2022-10-09 Thread GitBox
shangxinli commented on code in PR #998: URL: https://github.com/apache/parquet-mr/pull/998#discussion_r990827069 ## parquet-cli/src/main/java/org/apache/parquet/cli/commands/ScanCommand.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #998: PARQUET-2195: Add scan command to parquet-cli

2022-10-09 Thread GitBox
shangxinli commented on code in PR #998: URL: https://github.com/apache/parquet-mr/pull/998#discussion_r990826841 ## parquet-cli/src/main/java/org/apache/parquet/cli/commands/ScanCommand.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [parquet-mr] shangxinli merged pull request #962: Performance optimization to ByteBitPackingValuesReader

2022-10-09 Thread GitBox
shangxinli merged PR #962: URL: https://github.com/apache/parquet-mr/pull/962

[GitHub] [parquet-mr] shangxinli merged pull request #989: PARQUET-2176: Column index/statistics truncation in ParquetWriter

2022-10-09 Thread GitBox
shangxinli merged PR #989: URL: https://github.com/apache/parquet-mr/pull/989

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-09 Thread GitBox
shangxinli commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r990824756 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/codec/TestInteropReadLz4RawCodec.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-09 Thread GitBox
shangxinli commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r990824600 ## parquet-hadoop/pom.xml: ## @@ -102,6 +102,11 @@ jar compile + + io.airlift Review Comment: Generally, we are strict to add

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-10-09 Thread GitBox
shangxinli commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r990824259 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/Lz4RawCodec.java: ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [parquet-mr] emkornfield commented on pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-08 Thread GitBox
emkornfield commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1272447374 Mostly looks reasonable. I'm not too familiar with parquet-mr; @shangxinli, can you recommend someone who might be able to give a better review?

[GitHub] [parquet-mr] emkornfield commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-08 Thread GitBox
emkornfield commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990727498 ## parquet-protobuf/src/test/resources/Trees.proto: ## @@ -0,0 +1,37 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [parquet-mr] emkornfield commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-08 Thread GitBox
emkornfield commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990727282 ## parquet-protobuf/src/test/resources/BinaryTree.par: ## @@ -0,0 +1,50 @@ +message Trees.BinaryTree { + optional group value = 1 { Review Comment: or is par

[GitHub] [parquet-mr] emkornfield commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-08 Thread GitBox
emkornfield commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990727030 ## parquet-protobuf/src/test/resources/BinaryTree.par: ## @@ -0,0 +1,50 @@ +message Trees.BinaryTree { + optional group value = 1 { Review Comment: Aren't

[GitHub] [parquet-mr] emkornfield commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-08 Thread GitBox
emkornfield commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990726331 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java: ## @@ -559,7 +564,14 @@ final void writeRawValue(Object value) { class

[GitHub] [parquet-mr] emkornfield commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-08 Thread GitBox
emkornfield commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990726272 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java: ## @@ -559,7 +564,14 @@ final void writeRawValue(Object value) { class

[GitHub] [parquet-mr] emkornfield commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-08 Thread GitBox
emkornfield commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990726095 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -99,9 +139,9 @@ private Type.Repetition getRepetition(FieldDescriptor

[GitHub] [parquet-mr] emkornfield commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-08 Thread GitBox
emkornfield commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990725805 ## parquet-protobuf/src/test/java/org/apache/parquet/proto/ProtoSchemaConverterTest.java: ## @@ -82,264 +93,447 @@ public void testConvertAllDatatypes() throws

[GitHub] [parquet-mr] danielcweeks commented on a diff in pull request #999: [DRAFT] PR to show Vectored IO integration, compilation fails now.

2022-10-08 Thread GitBox
danielcweeks commented on code in PR #999: URL: https://github.com/apache/parquet-mr/pull/999#discussion_r990665831 ## parquet-common/src/main/java/org/apache/parquet/io/SeekableInputStream.java: ## @@ -23,6 +23,10 @@ import java.io.IOException; import java.io.InputStream;

[GitHub] [parquet-mr] jinyius commented on pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-10-07 Thread GitBox
jinyius commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1271822621 ping

[GitHub] [parquet-mr] songhuicheng opened a new pull request, #1004: PARQUET-2199: Fix checkBlockSizeReached zero record size perf issue

2022-10-06 Thread GitBox
songhuicheng opened a new pull request, #1004: URL: https://github.com/apache/parquet-mr/pull/1004 Parquet checks the block size after writing records to decide when to flush. This check is relatively expensive, so it estimates when to run the next check based on record size, record count, etc. For

[GitHub] [parquet-mr] ggershinsky merged pull request #1001: PARQUET-2197: Document uniform encryption

2022-10-05 Thread GitBox
ggershinsky merged PR #1001: URL: https://github.com/apache/parquet-mr/pull/1001

[GitHub] [parquet-mr] jinyius commented on pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-09-30 Thread GitBox
jinyius commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1263878341 thanks for the review. updated to handle the logging perf concern as well as fixing the javadoc errors.

[GitHub] [parquet-mr] jinyius commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-09-30 Thread GitBox
jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r984839210 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -99,9 +139,9 @@ private Type.Repetition getRepetition(FieldDescriptor

[GitHub] [parquet-mr] jinyius commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-09-30 Thread GitBox
jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r984838003 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -124,35 +164,61 @@ private Builder>, GroupBuilder> addR

[GitHub] [parquet-mr] jinyius commented on a diff in pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth

2022-09-30 Thread GitBox
jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r984809641 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -124,35 +164,61 @@ private Builder>, GroupBuilder> addR

[GitHub] [parquet-mr] ggershinsky commented on a diff in pull request #959: PARQUET-2126: Make cached (de)compressors thread-safe

2022-09-30 Thread GitBox
ggershinsky commented on code in PR #959: URL: https://github.com/apache/parquet-mr/pull/959#discussion_r984394753 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java: ## @@ -244,16 +272,60 @@ protected CompressionCodec getCodec(CompressionCodecName

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-30 Thread GitBox
wgtmac commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r984349672 ## parquet-cli/src/main/java/org/apache/parquet/cli/Util.java: ## @@ -151,6 +151,8 @@ public static String shortCodec(CompressionCodecName codec) { return

[GitHub] [parquet-mr] pitrou commented on a diff in pull request #1000: PARQUET-2196: Support LZ4_RAW codec

2022-09-30 Thread GitBox
pitrou commented on code in PR #1000: URL: https://github.com/apache/parquet-mr/pull/1000#discussion_r984376166 ## parquet-common/src/main/java/org/apache/parquet/hadoop/metadata/CompressionCodecName.java: ## @@ -30,7 +30,8 @@ public enum CompressionCodecName {
