[GitHub] [parquet-mr] rdblue commented on a diff in pull request #953: Performance optimizations: Merged all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-04-24 Thread GitBox
rdblue commented on code in PR #953: URL: https://github.com/apache/parquet-mr/pull/953#discussion_r857175853 ## parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java: ## @@ -88,6 +132,15 @@ public long skip(long n) { return bytesToSkip; }

[GitHub] [parquet-mr] rdblue commented on a diff in pull request #953: Performance optimizations: Merged all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-04-24 Thread GitBox
rdblue commented on code in PR #953: URL: https://github.com/apache/parquet-mr/pull/953#discussion_r857175617 ## parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java: ## @@ -46,12 +74,19 @@ public long position() { return buffer.position() -

[GitHub] [parquet-mr] rdblue commented on a diff in pull request #953: Performance optimizations: Merged all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-04-24 Thread GitBox
rdblue commented on code in PR #953: URL: https://github.com/apache/parquet-mr/pull/953#discussion_r857175587 ## parquet-common/src/main/java/org/apache/parquet/bytes/SingleBufferInputStream.java: ## @@ -46,12 +74,19 @@ public long position() { return buffer.position() -

[GitHub] [parquet-mr] theosib-amazon commented on pull request #948: PARQUET-2128: Upgrade Thrift to 0.16.0

2022-04-22 Thread GitBox
theosib-amazon commented on PR #948: URL: https://github.com/apache/parquet-mr/pull/948#issuecomment-1106722530 The pull request doesn't explain why this change was made. Can anyone explain? The system where I'm doing my development doesn't have Thrift 0.16.0 in its standard packages yet,

[GitHub] [parquet-mr] theosib-amazon commented on pull request #959: PARQUET-2126: Make cached (de)compressors thread-safe

2022-04-22 Thread GitBox
theosib-amazon commented on PR #959: URL: https://github.com/apache/parquet-mr/pull/959#issuecomment-1106632106 Alright. You have a point. If the maintainers want me to delete that stuff, they can let me know, and I'll go ahead and do it. -- This is an automated message from the

[GitHub] [parquet-mr] dossett commented on pull request #959: PARQUET-2126: Make cached (de)compressors thread-safe

2022-04-22 Thread GitBox
dossett commented on PR #959: URL: https://github.com/apache/parquet-mr/pull/959#issuecomment-1106556192 Seems good to me (non-binding!). Revisiting whether or not the caching strategy makes sense might be worthwhile, but that shouldn't stop this fix. Small comment: I would remove

[GitHub] [parquet-mr] dossett commented on pull request #959: PARQUET-2126: Make cached (de)compressors thread-safe

2022-04-22 Thread GitBox
dossett commented on PR #959: URL: https://github.com/apache/parquet-mr/pull/959#issuecomment-1106556193 Seems good to me (non-binding!). Revisiting whether or not the caching strategy makes sense might be worthwhile, but that shouldn't stop this fix. Small comment: I would remove

[GitHub] [parquet-mr] theosib-amazon opened a new pull request, #959: PARQUET-2126: Make cached (de)compressors thread-safe

2022-04-21 Thread GitBox
theosib-amazon opened a new pull request, #959: URL: https://github.com/apache/parquet-mr/pull/959 CodecFactory cached instances of compressors and decompressors across threads, which was not thread-safe. This change makes the caches thread-local.
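The thread-local caching idea described in this PR can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual `CodecFactory` change: the class name `ThreadLocalCache`, its `get` method, and the factory function are all hypothetical and do not appear in the PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration of a per-thread cache; not code from parquet-mr. */
class ThreadLocalCache<K, V> {
  // Each thread lazily receives its own private map, so cached values
  // (for example, stateful compressors) are never shared across threads.
  private final ThreadLocal<Map<K, V>> perThread = ThreadLocal.withInitial(HashMap::new);

  // Creates a value for a key the current thread has not seen yet.
  private final Function<K, V> factory;

  ThreadLocalCache(Function<K, V> factory) {
    this.factory = factory;
  }

  /** Returns the calling thread's instance for the key, creating it on first use. */
  V get(K key) {
    return perThread.get().computeIfAbsent(key, factory);
  }
}
```

With this shape, repeated calls on one thread reuse the same instance, while a second thread asking for the same key gets a separate one. The trade-off is one cached instance per thread per key rather than one per key.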

[GitHub] [parquet-mr] shangxinli commented on pull request #958: PARQUET-2138: Add ShowBloomFilterCommand to parquet-cli

2022-04-20 Thread GitBox
shangxinli commented on PR #958: URL: https://github.com/apache/parquet-mr/pull/958#issuecomment-1104439140 Does it work for column-encrypted files? Can you add tests for it? For how to create an encrypted test file, you can refer to this file

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #958: PARQUET-2138: Add ShowBloomFilterCommand to parquet-cli

2022-04-20 Thread GitBox
shangxinli commented on code in PR #958: URL: https://github.com/apache/parquet-mr/pull/958#discussion_r854534798 ## parquet-cli/src/main/java/org/apache/parquet/cli/commands/ShowBloomFilterCommand.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [parquet-mr] shangxinli commented on pull request #950: PARQUET-2006: Column resolution by ID

2022-04-20 Thread GitBox
shangxinli commented on PR #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1104081321 @rdblue regarding 'using Iceberg expressions and filters', we agreed to use them. We are finding resources to work on it. Huaxin may be able to work on it after this column resolution

[GitHub] [parquet-mr] shangxinli commented on pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-04-20 Thread GitBox
shangxinli commented on PR #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1104043913 @mwong38 Can you address the feedback from @emkornfield before we can merge?

[GitHub] [parquet-mr] WangGuangxin opened a new pull request, #958: PARQUET-2138: Add ShowBloomFilterCommand to parquet-cli

2022-04-19 Thread GitBox
WangGuangxin opened a new pull request, #958: URL: https://github.com/apache/parquet-mr/pull/958 Add `ShowBloomFilter` command to parquet-cli. We can leverage it to check whether a block can be filtered by bloom filter. Example usage: ``` parquet-cli bloom-filter -c test_column

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #953: Performance optimizations: Merged all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-04-18 Thread GitBox
theosib-amazon commented on code in PR #953: URL: https://github.com/apache/parquet-mr/pull/953#discussion_r852512302 ## parquet-column/src/main/java/org/apache/parquet/column/values/bitpacking/ByteBitPackingValuesReader.java: ## @@ -1,14 +1,14 @@ -/* +/* * Licensed to the

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #953: Performance optimizations: Merged all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-04-18 Thread GitBox
theosib-amazon commented on code in PR #953: URL: https://github.com/apache/parquet-mr/pull/953#discussion_r852508717 ## parquet-column/src/main/java/org/apache/parquet/column/values/bitpacking/ByteBitPackingValuesReader.java: ## @@ -1,14 +1,14 @@ -/* +/* * Licensed to the

[GitHub] [parquet-mr] rdblue commented on a diff in pull request #953: Performance optimizations: Merged all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-04-18 Thread GitBox
rdblue commented on code in PR #953: URL: https://github.com/apache/parquet-mr/pull/953#discussion_r852492474 ## parquet-column/src/main/java/org/apache/parquet/column/values/bitpacking/ByteBitPackingValuesReader.java: ## @@ -1,14 +1,14 @@ -/* +/* * Licensed to the Apache

[GitHub] [parquet-mr] theosib-amazon opened a new pull request, #957: PARQUET-2069: Allow list and array record types to be compatible.

2022-04-14 Thread GitBox
theosib-amazon opened a new pull request, #957: URL: https://github.com/apache/parquet-mr/pull/957 This PR addresses the following JIRA entry: https://issues.apache.org/jira/browse/PARQUET-2069 ParquetMR breaks compatibility with itself by including a JSON representation of a

[GitHub] [parquet-mr] shangxinli merged pull request #954: PARQUET-2136: File writer construction with encryptor

2022-04-14 Thread GitBox
shangxinli merged PR #954: URL: https://github.com/apache/parquet-mr/pull/954

[GitHub] [parquet-mr] shangxinli commented on pull request #954: PARQUET-2136: File writer construction with encryptor

2022-04-14 Thread GitBox
shangxinli commented on PR #954: URL: https://github.com/apache/parquet-mr/pull/954#issuecomment-1099348386 LGTM

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #954: PARQUET-2136: File writer construction with encryptor

2022-04-14 Thread GitBox
shangxinli commented on code in PR #954: URL: https://github.com/apache/parquet-mr/pull/954#discussion_r850595178 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java: ## @@ -336,20 +354,30 @@ public ParquetFileWriter(OutputFile file, MessageType

[GitHub] [parquet-mr] ggershinsky commented on a diff in pull request #954: PARQUET-2136: File writer construction with encryptor

2022-04-14 Thread GitBox
ggershinsky commented on code in PR #954: URL: https://github.com/apache/parquet-mr/pull/954#discussion_r850561540 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java: ## @@ -336,20 +354,30 @@ public ParquetFileWriter(OutputFile file, MessageType

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #954: PARQUET-2136: File writer construction with encryptor

2022-04-14 Thread GitBox
shangxinli commented on code in PR #954: URL: https://github.com/apache/parquet-mr/pull/954#discussion_r850550025 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java: ## @@ -336,20 +354,30 @@ public ParquetFileWriter(OutputFile file, MessageType

[GitHub] [parquet-mr] ggershinsky commented on a diff in pull request #954: PARQUET-2136: File writer construction with encryptor

2022-04-14 Thread GitBox
ggershinsky commented on code in PR #954: URL: https://github.com/apache/parquet-mr/pull/954#discussion_r850543188 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java: ## @@ -336,20 +354,30 @@ public ParquetFileWriter(OutputFile file, MessageType

[GitHub] [parquet-mr] ggershinsky commented on a diff in pull request #954: PARQUET-2136: File writer construction with encryptor

2022-04-14 Thread GitBox
ggershinsky commented on code in PR #954: URL: https://github.com/apache/parquet-mr/pull/954#discussion_r850534162 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java: ## @@ -336,20 +354,30 @@ public ParquetFileWriter(OutputFile file, MessageType

[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #954: PARQUET-2136: File writer construction with encryptor

2022-04-14 Thread GitBox
shangxinli commented on code in PR #954: URL: https://github.com/apache/parquet-mr/pull/954#discussion_r850523822 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java: ## @@ -336,20 +354,30 @@ public ParquetFileWriter(OutputFile file, MessageType

[GitHub] [parquet-mr] ggershinsky commented on pull request #954: PARQUET-2136: File writer construction with encryptor

2022-04-14 Thread GitBox
ggershinsky commented on PR #954: URL: https://github.com/apache/parquet-mr/pull/954#issuecomment-1099243404 (it basically adds a new constructor, so shouldn't affect the existing stuff)

[GitHub] [parquet-mr] ggershinsky commented on pull request #954: PARQUET-2136: File writer construction with encryptor

2022-04-14 Thread GitBox
ggershinsky commented on PR #954: URL: https://github.com/apache/parquet-mr/pull/954#issuecomment-1099230977 Hi @shangxinli, can you have a quick look at this one?

[GitHub] [parquet-mr] dependabot[bot] opened a new pull request, #956: Bump hadoop-common from 2.10.1 to 3.2.3

2022-04-12 Thread GitBox
dependabot[bot] opened a new pull request, #956: URL: https://github.com/apache/parquet-mr/pull/956 Bumps hadoop-common from 2.10.1 to 3.2.3. [![Dependabot compatibility

[GitHub] [parquet-mr] JackBuggins commented on pull request #955: PARQUET-2127: update jackson-databind to 2.13.2.2

2022-04-07 Thread GitBox
JackBuggins commented on PR #955: URL: https://github.com/apache/parquet-mr/pull/955#issuecomment-1091923632 @shangxinli - would you mind taking a look at this one since you recently reviewed a similar change?

[GitHub] [parquet-mr] JackBuggins opened a new pull request, #955: PARQUET-2127: update jackson-databind to 2.13.2.2

2022-04-07 Thread GitBox
JackBuggins opened a new pull request, #955: URL: https://github.com/apache/parquet-mr/pull/955 Address the following CVE: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-36518 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the

[GitHub] [parquet-site] shangxinli merged pull request #23: Add search functionality to the Parquet Website with Algolia

2022-04-06 Thread GitBox
shangxinli merged PR #23: URL: https://github.com/apache/parquet-site/pull/23

[GitHub] [parquet-mr] rdblue commented on a diff in pull request #950: PARQUET-2006: Column resolution by ID

2022-04-04 Thread GitBox
rdblue commented on code in PR #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842099171 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -878,11 +880,97 @@ public String getFile() { return blocks; } -

[GitHub] [parquet-mr] rdblue commented on a diff in pull request #950: PARQUET-2006: Column resolution by ID

2022-04-04 Thread GitBox
rdblue commented on code in PR #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842097255 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -878,11 +880,97 @@ public String getFile() { return blocks; } -

[GitHub] [parquet-mr] rdblue commented on a diff in pull request #950: PARQUET-2006: Column resolution by ID

2022-04-04 Thread GitBox
rdblue commented on code in PR #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842096174 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -878,11 +880,97 @@ public String getFile() { return blocks; } -

[GitHub] [parquet-mr] rdblue commented on a diff in pull request #950: PARQUET-2006: Column resolution by ID

2022-04-04 Thread GitBox
rdblue commented on code in PR #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842094667 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java: ## @@ -878,11 +880,97 @@ public String getFile() { return blocks; } -

[GitHub] [parquet-mr] rdblue commented on a diff in pull request #950: PARQUET-2006: Column resolution by ID

2022-04-04 Thread GitBox
rdblue commented on code in PR #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842091649 ## parquet-column/src/main/java/org/apache/parquet/filter2/predicate/SchemaCompatibilityValidator.java: ## @@ -170,6 +174,24 @@ public Void visit(Not not) {

[GitHub] [parquet-mr] rdblue commented on a diff in pull request #950: PARQUET-2006: Column resolution by ID

2022-04-04 Thread GitBox
rdblue commented on code in PR #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842090063 ## parquet-column/src/main/java/org/apache/parquet/filter2/predicate/SchemaCompatibilityValidator.java: ## @@ -170,6 +174,24 @@ public Void visit(Not not) {

[GitHub] [parquet-mr] rdblue commented on a diff in pull request #950: PARQUET-2006: Column resolution by ID

2022-04-04 Thread GitBox
rdblue commented on code in PR #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842088106 ## parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java: ## @@ -48,10 +55,18 @@ protected Column(ColumnPath columnPath, Class columnType) {

[GitHub] [parquet-mr] rdblue commented on a diff in pull request #950: PARQUET-2006: Column resolution by ID

2022-04-04 Thread GitBox
rdblue commented on code in PR #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842087619 ## parquet-column/src/main/java/org/apache/parquet/filter2/predicate/FilterApi.java: ## @@ -72,26 +73,50 @@ public static IntColumn intColumn(String columnPath) {

[GitHub] [parquet-mr] rdblue commented on a diff in pull request #950: PARQUET-2006: Column resolution by ID

2022-04-04 Thread GitBox
rdblue commented on code in PR #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842086915 ## parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java: ## @@ -511,6 +519,11 @@ public Builder withPageWriteChecksumEnabled(boolean val) {

[GitHub] [parquet-mr] rdblue commented on a diff in pull request #950: PARQUET-2006: Column resolution by ID

2022-04-04 Thread GitBox
rdblue commented on code in PR #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842086583 ## parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java: ## @@ -266,6 +269,10 @@ public int getMaxBloomFilterBytes() { return

[GitHub] [parquet-mr] rdblue commented on a diff in pull request #950: PARQUET-2006: Column resolution by ID

2022-04-04 Thread GitBox
rdblue commented on code in PR #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842085396 ## parquet-column/src/main/java/org/apache/parquet/column/ColumnDescriptor.java: ## @@ -70,7 +71,20 @@ public ColumnDescriptor(String[] path, PrimitiveTypeName type,

[GitHub] [parquet-mr] shangxinli commented on pull request #951: PARQUET-2134: Fix type checking in HadoopStreams.wrap

2022-04-03 Thread GitBox
shangxinli commented on PR #951: URL: https://github.com/apache/parquet-mr/pull/951#issuecomment-1086966074 Thanks for adding the check and debug log. LGTM! One more thing (sorry for not asking in the first-round review): do you think it makes sense to add tests?

[GitHub] [parquet-site] shangxinli merged pull request #16: Add ASF Links

2022-03-30 Thread GitBox
shangxinli merged pull request #16: URL: https://github.com/apache/parquet-site/pull/16

[GitHub] [parquet-site] shangxinli merged pull request #21: Updating tagline and readme

2022-03-30 Thread GitBox
shangxinli merged pull request #21: URL: https://github.com/apache/parquet-site/pull/21

[GitHub] [parquet-site] shangxinli merged pull request #22: Adding static updated DOAP parquet file

2022-03-30 Thread GitBox
shangxinli merged pull request #22: URL: https://github.com/apache/parquet-site/pull/22

[GitHub] [parquet-mr] theosib-amazon commented on pull request #953: Performance optimizations: Merged all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-03-29 Thread GitBox
theosib-amazon commented on pull request #953: URL: https://github.com/apache/parquet-mr/pull/953#issuecomment-1081948012 I forgot to add this to a comment in the code: The reason PlainValuesReader still includes an unused LittleEndianDataInputStream member is because if I don't, the

[GitHub] [parquet-mr] theosib-amazon opened a new pull request #953: Performance optimizations: Merged all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-03-29 Thread GitBox
theosib-amazon opened a new pull request #953: URL: https://github.com/apache/parquet-mr/pull/953 This PR is all performance optimization. In benchmarking with Trino, we find query performance to improve from 5% to 15%, depending on the query, and that includes all the I/O time from S3.

[GitHub] [parquet-mr] ggershinsky commented on pull request #945: PARQUET-2117: Expose Row Index via ParquetReader and ParquetRecordReader

2022-03-28 Thread GitBox
ggershinsky commented on pull request #945: URL: https://github.com/apache/parquet-mr/pull/945#issuecomment-1080253613 @prakharjain09 the upcoming parquet release will include the current master (plus a couple of WIP PRs, once they are merged), so this patch will be covered.

[GitHub] [parquet-mr] emkornfield commented on a change in pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-27 Thread GitBox
emkornfield commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r836044690 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java ## @@ -97,6 +127,46 @@ public MessageType

[GitHub] [parquet-mr] emkornfield commented on a change in pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-27 Thread GitBox
emkornfield commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r836042758 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java ## @@ -35,22 +35,49 @@ import java.util.List;

[GitHub] [parquet-mr] emkornfield commented on a change in pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-27 Thread GitBox
emkornfield commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r836040766 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java ## @@ -97,6 +127,46 @@ public MessageType

[GitHub] [parquet-mr] emkornfield commented on a change in pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-27 Thread GitBox
emkornfield commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r836039431 ## File path: parquet-protobuf/pom.xml ## @@ -57,6 +58,16 @@ protobuf-java ${protobuf.version} + + com.google.protobuf

[GitHub] [parquet-mr] prakharjain09 commented on pull request #945: PARQUET-2117: Expose Row Index via ParquetReader and ParquetRecordReader

2022-03-27 Thread GitBox
prakharjain09 commented on pull request #945: URL: https://github.com/apache/parquet-mr/pull/945#issuecomment-1080148066 @shangxinli @ggershinsky Thanks a lot for reviewing this change. This will unblock SPARK-37980 if this is released as part of upcoming parquet release. Do we need

[GitHub] [parquet-site] vinooganesh opened a new pull request #22: Adding static updated DOAP parquet file

2022-03-27 Thread GitBox
vinooganesh opened a new pull request #22: URL: https://github.com/apache/parquet-site/pull/22 Adding DOAP file per instructions here: https://projects.apache.org/doap.html

[GitHub] [parquet-site] vinooganesh closed pull request #15: Add Release docs and new GCS engine id

2022-03-26 Thread GitBox
vinooganesh closed pull request #15: URL: https://github.com/apache/parquet-site/pull/15

[GitHub] [parquet-mr] shangxinli commented on pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-25 Thread GitBox
shangxinli commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1079368200 Will have another look soon.

[GitHub] [parquet-mr] sheinbergon commented on pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-25 Thread GitBox
sheinbergon commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1079025532 @shangxinli any news about merging this version? Are there still any blockers?

[GitHub] [parquet-format] BMDan opened a new pull request #183: Correct formatting errors in Encodings.md

2022-03-24 Thread GitBox
BMDan opened a new pull request #183: URL: https://github.com/apache/parquet-format/pull/183 In particular, address the issue that formatting caused the text after "The data stream looks like:" in "Delta-length byte array" to disappear entirely. The remainder of changes are simply

[GitHub] [parquet-site] shangxinli merged pull request #20: Updating readme to retrigger deploy

2022-03-24 Thread GitBox
shangxinli merged pull request #20: URL: https://github.com/apache/parquet-site/pull/20

[GitHub] [parquet-site] shangxinli closed pull request #19: Seems like this file has to be on the branch

2022-03-24 Thread GitBox
shangxinli closed pull request #19: URL: https://github.com/apache/parquet-site/pull/19

[GitHub] [parquet-site] vinooganesh opened a new pull request #20: Updating readme to retrigger deploy

2022-03-24 Thread GitBox
vinooganesh opened a new pull request #20: URL: https://github.com/apache/parquet-site/pull/20 Updating readme to retrigger deploy

[GitHub] [parquet-site] vinooganesh opened a new pull request #19: Seems like this file has to be on the branch

2022-03-24 Thread GitBox
vinooganesh opened a new pull request #19: URL: https://github.com/apache/parquet-site/pull/19 Per https://infra-reports.apache.org/site-source/

[GitHub] [parquet-site] shangxinli merged pull request #17: Cleaning out existing branch with full history

2022-03-24 Thread GitBox
shangxinli merged pull request #17: URL: https://github.com/apache/parquet-site/pull/17

[GitHub] [parquet-site] shangxinli merged pull request #18: Creating new Parquet production site

2022-03-24 Thread GitBox
shangxinli merged pull request #18: URL: https://github.com/apache/parquet-site/pull/18

[GitHub] [parquet-site] vinooganesh opened a new pull request #18: Creating new Parquet production site

2022-03-24 Thread GitBox
vinooganesh opened a new pull request #18: URL: https://github.com/apache/parquet-site/pull/18 https://github.com/apache/parquet-site/pull/17 will need to merge first before this one can

[GitHub] [parquet-mr] 7c00 commented on a change in pull request #951: PARQUET-2134: Fix type checking in HadoopStreams.wrap

2022-03-23 Thread GitBox
7c00 commented on a change in pull request #951: URL: https://github.com/apache/parquet-mr/pull/951#discussion_r833885795 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java ## @@ -66,6 +67,15 @@ public static SeekableInputStream

[GitHub] [parquet-site] vinooganesh opened a new pull request #15: Add Release docs and new GCS engine id

2022-03-23 Thread GitBox
vinooganesh opened a new pull request #15: URL: https://github.com/apache/parquet-site/pull/15 Add documentation on how to release and new GCS engine id

[GitHub] [parquet-mr] shangxinli commented on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-23 Thread GitBox
shangxinli commented on pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1076485916 > hi @huaxingao , can you describe the lifecycle of the column IDs at a high level, either in the PR description, or in a comment? Where these IDs are stored (if in footer

[GitHub] [parquet-mr] ggershinsky commented on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-22 Thread GitBox
ggershinsky commented on pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1075923688 I'll join too.

[GitHub] [parquet-mr] huaxingao commented on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-22 Thread GitBox
huaxingao commented on pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1075889722 @shangxinli Yes, I will join the meeting tomorrow.

[GitHub] [parquet-mr] huaxingao commented on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-22 Thread GitBox
huaxingao commented on pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1075889461 @ggershinsky I think that in write/read, if `COLUMN_ID_RESOLUTION` is set to true but field_id was not set by the caller/writer, we need to throw an Exception.

[GitHub] [parquet-mr] shangxinli edited a comment on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-22 Thread GitBox
shangxinli edited a comment on pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1075683666 @huaxingao @ggershinsky Will you be able to join tomorrow's meeting to have a discussion on the open issues? We can try to close them in the meeting and move this

[GitHub] [parquet-mr] shangxinli commented on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-22 Thread GitBox
shangxinli commented on pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1075683666 @huaxingao @ggershinsky Will you be able to join tomorrow's meeting to have a discussion on the open issues? We can try to close them in the meeting and move this PR

[GitHub] [parquet-mr] dossett commented on pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-22 Thread GitBox
dossett commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1075164744

[GitHub] [parquet-mr] ggershinsky commented on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-22 Thread GitBox
ggershinsky commented on pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1074835875 Thanks @huaxingao , one more question / clarification. In the writer, > field_id has to be unique in the entire schema, otherwise, an Exception will be thrown.

[GitHub] [parquet-mr] huaxingao commented on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
huaxingao commented on pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1074572886 @ggershinsky I updated the description. Please check again to see if it is clear to you. Thanks!

[GitHub] [parquet-mr] huaxingao commented on a change in pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
huaxingao commented on a change in pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r831619229 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java ## @@ -472,7 +488,13 @@ public static boolean

[GitHub] [parquet-mr] huaxingao commented on a change in pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
huaxingao commented on a change in pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r831619033 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -878,11 +880,92 @@ public String getFile() {

[GitHub] [parquet-mr] huaxingao commented on a change in pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
huaxingao commented on a change in pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r831618518 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -878,11 +880,92 @@ public String getFile() {

[GitHub] [parquet-mr] huaxingao commented on a change in pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
huaxingao commented on a change in pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r831618391 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordReader.java ## @@ -181,7 +181,7 @@ public void

[GitHub] [parquet-mr] huaxingao commented on a change in pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
huaxingao commented on a change in pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r831618277 ## File path: parquet-column/src/main/java/org/apache/parquet/filter2/predicate/SchemaCompatibilityValidator.java ## @@ -170,6 +174,24 @@ public Void

[GitHub] [parquet-mr] shangxinli merged pull request #952: PARQUET-2127: upgrade jackson-databind to 2.13.2

2022-03-21 Thread GitBox
shangxinli merged pull request #952: URL: https://github.com/apache/parquet-mr/pull/952

[GitHub] [parquet-mr] trevorurquhart opened a new pull request #952: PARQUET-2127: upgrade jackson-databind to 2.13.2

2022-03-21 Thread GitBox
trevorurquhart opened a new pull request #952: URL: https://github.com/apache/parquet-mr/pull/952 ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. -

[GitHub] [parquet-mr] mwong38 commented on a change in pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-21 Thread GitBox
mwong38 commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r830942166 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java ## @@ -427,6 +485,218 @@ public void addBinary(Binary

[GitHub] [parquet-mr] mwong38 commented on a change in pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-21 Thread GitBox
mwong38 commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r830941957 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java ## @@ -427,6 +485,218 @@ public void addBinary(Binary

[GitHub] [parquet-mr] sheinbergon commented on pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-21 Thread GitBox
sheinbergon commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1073708633 @mwong38 let me know if you want me to help in any way

[GitHub] [parquet-mr] ggershinsky commented on a change in pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
ggershinsky commented on a change in pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r830823180 ## File path: parquet-column/src/main/java/org/apache/parquet/filter2/predicate/SchemaCompatibilityValidator.java ## @@ -170,6 +174,24 @@ public

[GitHub] [parquet-mr] ggershinsky commented on a change in pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
ggershinsky commented on a change in pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r830820621 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -878,11 +880,92 @@ public String getFile() {

[GitHub] [parquet-mr] ggershinsky edited a comment on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
ggershinsky edited a comment on pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1073553770 hi @huaxingao , can you describe the lifecycle of the column IDs at a high level, either in the PR description, or in a comment? Where these IDs are stored (if in

[GitHub] [parquet-mr] ggershinsky commented on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
ggershinsky commented on pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1073553770 hi @huaxingao , can you describe the lifecycle of the column IDs at a high level, either in the PR description, or in a comment? Where these IDs are stored (if in footer

[GitHub] [parquet-mr] shangxinli commented on a change in pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-20 Thread GitBox
shangxinli commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r830653898 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java ## @@ -427,6 +485,218 @@ public void

[GitHub] [parquet-mr] shangxinli commented on a change in pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-20 Thread GitBox
shangxinli commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r830652740 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java ## @@ -427,6 +485,218 @@ public void

[GitHub] [parquet-mr] shangxinli commented on a change in pull request #951: PARQUET-2134: Fix type checking in HadoopStreams.wrap

2022-03-20 Thread GitBox
shangxinli commented on a change in pull request #951: URL: https://github.com/apache/parquet-mr/pull/951#discussion_r830648757 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java ## @@ -66,6 +67,15 @@ public static SeekableInputStream

[GitHub] [parquet-mr] shangxinli merged pull request #945: PARQUET-2117: Expose Row Index via ParquetReader and ParquetRecordReader

2022-03-19 Thread GitBox
shangxinli merged pull request #945: URL: https://github.com/apache/parquet-mr/pull/945

[GitHub] [parquet-mr] shangxinli commented on pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-19 Thread GitBox
shangxinli commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1073074540 Can you squash all the commits?

[GitHub] [parquet-mr] shangxinli commented on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-19 Thread GitBox
shangxinli commented on pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1073074030 Hi. @huaxingao Thanks for working on it. I just had a first-round review and left some comments. After we address them, I will have another look.
