[jira] [Commented] (PARQUET-2157) Add BloomFilter fpp config

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553850#comment-17553850 ] ASF GitHub Bot commented on PARQUET-2157: - huaxingao commented on code in PR #975: URL:

[GitHub] [parquet-mr] huaxingao commented on a diff in pull request #975: PARQUET-2157: add bloom filter fpp config

2022-06-13 Thread GitBox
huaxingao commented on code in PR #975: URL: https://github.com/apache/parquet-mr/pull/975#discussion_r896299022 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestParquetWriter.java: ## @@ -282,6 +286,63 @@ public void testParquetFileWithBloomFilter() throws

[jira] [Commented] (PARQUET-2157) Add BloomFilter fpp config

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553842#comment-17553842 ] ASF GitHub Bot commented on PARQUET-2157: - chenjunjiedada commented on code in PR #975: URL:

[jira] [Commented] (PARQUET-2157) Add BloomFilter fpp config

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553843#comment-17553843 ] ASF GitHub Bot commented on PARQUET-2157: - chenjunjiedada commented on code in PR #975: URL:

[GitHub] [parquet-mr] chenjunjiedada commented on a diff in pull request #975: PARQUET-2157: add bloom filter fpp config

2022-06-13 Thread GitBox
chenjunjiedada commented on code in PR #975: URL: https://github.com/apache/parquet-mr/pull/975#discussion_r896285374 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestParquetWriter.java: ## @@ -282,6 +286,63 @@ public void testParquetFileWithBloomFilter() throws

[GitHub] [parquet-mr] chenjunjiedada commented on a diff in pull request #975: PARQUET-2157: add bloom filter fpp config

2022-06-13 Thread GitBox
chenjunjiedada commented on code in PR #975: URL: https://github.com/apache/parquet-mr/pull/975#discussion_r896285197 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestParquetWriter.java: ## @@ -282,6 +286,63 @@ public void testParquetFileWithBloomFilter() throws

[jira] [Commented] (PARQUET-2157) Add BloomFilter fpp config

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553756#comment-17553756 ] ASF GitHub Bot commented on PARQUET-2157: - huaxingao commented on PR #975: URL:

[GitHub] [parquet-mr] huaxingao commented on pull request #975: PARQUET-2157: add bloom filter fpp config

2022-06-13 Thread GitBox
huaxingao commented on PR #975: URL: https://github.com/apache/parquet-mr/pull/975#issuecomment-1154253580 cc @chenjunjiedada @ggershinsky @shangxinli -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[jira] [Commented] (PARQUET-2157) Add BloomFilter fpp config

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553755#comment-17553755 ] ASF GitHub Bot commented on PARQUET-2157: - huaxingao commented on PR #975: URL:

[GitHub] [parquet-mr] huaxingao commented on pull request #975: PARQUET-2157: add bloom filter fpp config

2022-06-13 Thread GitBox
huaxingao commented on PR #975: URL: https://github.com/apache/parquet-mr/pull/975#issuecomment-1154252101 The CI passed. Thanks a lot @dongjoon-hyun

[jira] [Commented] (PARQUET-2157) Add BloomFilter fpp config

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553715#comment-17553715 ] ASF GitHub Bot commented on PARQUET-2157: - dongjoon-hyun commented on code in PR #975: URL:

[jira] [Commented] (PARQUET-2157) Add BloomFilter fpp config

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553716#comment-17553716 ] ASF GitHub Bot commented on PARQUET-2157: - dongjoon-hyun commented on code in PR #975: URL:

[GitHub] [parquet-mr] dongjoon-hyun commented on a diff in pull request #975: PARQUET-2157: add bloom filter fpp config

2022-06-13 Thread GitBox
dongjoon-hyun commented on code in PR #975: URL: https://github.com/apache/parquet-mr/pull/975#discussion_r895949925 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestParquetWriter.java: ## @@ -39,13 +39,17 @@ import java.io.File; import java.io.IOException;

[jira] [Commented] (PARQUET-2153) Cannot read schema from parquet file

2022-06-13 Thread Timothy Miller (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553635#comment-17553635 ] Timothy Miller commented on PARQUET-2153: - Is this related to anything fixed by

[jira] [Commented] (PARQUET-2158) Upgrade Hadoop dependency to version 3.2.0

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553623#comment-17553623 ] ASF GitHub Bot commented on PARQUET-2158: - steveloughran commented on PR #976: URL:

[GitHub] [parquet-mr] steveloughran commented on pull request #976: PARQUET-2158: Upgrade Hadoop dependency to version 3.2.0

2022-06-13 Thread GitBox
steveloughran commented on PR #976: URL: https://github.com/apache/parquet-mr/pull/976#issuecomment-1154018331 This PR fixes Parquet to build/link against Hadoop 3.2.0 and higher. It would be cleaner to remove the deprecated class causing compatibility issues - the fact that nobody has ever

[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553622#comment-17553622 ] ASF GitHub Bot commented on PARQUET-1020: - dossett commented on PR #963: URL:

[GitHub] [parquet-mr] dossett commented on pull request #963: PARQUET-1020 Add DynamicMessage writing support

2022-06-13 Thread GitBox
dossett commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1154013785 @guillaume-fetter I see what you mean, that makes sense. I think for my use case (reading protobuf data from kafka via the confluent schema registry and then writing to parquet) I won't

[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553618#comment-17553618 ] ASF GitHub Bot commented on PARQUET-1020: - guillaume-fetter commented on PR #963: URL:

[GitHub] [parquet-mr] guillaume-fetter commented on pull request #963: PARQUET-1020 Add DynamicMessage writing support

2022-06-13 Thread GitBox
guillaume-fetter commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1154004113 @dossett Depends on your use case. If you are running a simple program that does data processing on a single host, then you're good. If you are using a big data processing tool

[jira] [Commented] (PARQUET-2149) Implement async IO for Parquet file reader

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553586#comment-17553586 ] ASF GitHub Bot commented on PARQUET-2149: - steveloughran commented on PR #968: URL:

[GitHub] [parquet-mr] steveloughran commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

2022-06-13 Thread GitBox
steveloughran commented on PR #968: URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1153924743 (I could of course add those probes into the shim class, so at least that access of internals was in one place)

[jira] [Commented] (PARQUET-2149) Implement async IO for Parquet file reader

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553585#comment-17553585 ] ASF GitHub Bot commented on PARQUET-2149: - steveloughran commented on PR #968: URL:

[GitHub] [parquet-mr] steveloughran commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

2022-06-13 Thread GitBox
steveloughran commented on PR #968: URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1153923501 bq. perhaps check if the ByteBufferReadable interface is implemented in the stream? The requirement for the `hasCapability("in:readbytebuffer")` to return true postdates

[jira] [Commented] (PARQUET-2134) Incorrect type checking in HadoopStreams.wrap

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553563#comment-17553563 ] ASF GitHub Bot commented on PARQUET-2134: - steveloughran commented on PR #951: URL:

[GitHub] [parquet-mr] steveloughran commented on pull request #951: PARQUET-2134: Fix type checking in HadoopStreams.wrap

2022-06-13 Thread GitBox
steveloughran commented on PR #951: URL: https://github.com/apache/parquet-mr/pull/951#issuecomment-1153871066 whoever actually commits this can use the github squash option to combine all commits into one before merging. FYI, I've just started writing a shim library so that apps

[jira] [Commented] (PARQUET-2153) Cannot read schema from parquet file

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553558#comment-17553558 ] ASF GitHub Bot commented on PARQUET-2153: - NilsB44 opened a new pull request, #977: URL:

[GitHub] [parquet-mr] NilsB44 opened a new pull request, #977: PARQUET-2153: SchemaParseException: Can't redefine: element for FixedSchema

2022-06-13 Thread GitBox
NilsB44 opened a new pull request, #977: URL: https://github.com/apache/parquet-mr/pull/977 This extends the previous issue PARQUET-1441, where this issue was fixed for RecordSchema but not for FixedSchema. Make sure you have checked _all_ steps below. ### Jira - [ ]

[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553557#comment-17553557 ] ASF GitHub Bot commented on PARQUET-1020: - dossett commented on PR #963: URL:

[GitHub] [parquet-mr] dossett commented on pull request #963: PARQUET-1020 Add DynamicMessage writing support

2022-06-13 Thread GitBox
dossett commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1153856375 Oh that's interesting @guillaume-fetter so you can't just write out a dynamic message into parquet without jumping through more hoops?

[jira] [Commented] (PARQUET-2153) Cannot read schema from parquet file

2022-06-13 Thread Nils Broman (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553555#comment-17553555 ] Nils Broman commented on PARQUET-2153: -- I found out that this is indeed the same issue as for

[jira] [Commented] (PARQUET-2158) Upgrade Hadoop dependency to version 3.2.0

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553554#comment-17553554 ] ASF GitHub Bot commented on PARQUET-2158: - steveloughran commented on PR #976: URL:

[GitHub] [parquet-mr] steveloughran commented on pull request #976: PARQUET-2158: Upgrade Hadoop dependency to version 3.2.0

2022-06-13 Thread GitBox
steveloughran commented on PR #976: URL: https://github.com/apache/parquet-mr/pull/976#issuecomment-1153851509 The thrift module doesn't compile: it uses a Hadoop internal class, tagged as private, that changed incompatibly in Hadoop 3. See HADOOP-12436 ``` Error: Failed to

[jira] [Commented] (PARQUET-2158) Upgrade Hadoop dependency to version 3.2.0

2022-06-13 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553552#comment-17553552 ] Steve Loughran commented on PARQUET-2158: - build is broken by HADOOP-12436 {code} Error:

[jira] [Commented] (PARQUET-2158) Upgrade Hadoop dependency to version 3.2.0

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553524#comment-17553524 ] ASF GitHub Bot commented on PARQUET-2158: - steveloughran opened a new pull request, #976: URL:

[GitHub] [parquet-mr] steveloughran opened a new pull request, #976: PARQUET-2158: Upgrade Hadoop dependency to version 3.2.0

2022-06-13 Thread GitBox
steveloughran opened a new pull request, #976: URL: https://github.com/apache/parquet-mr/pull/976 This updates Parquet's Hadoop dependency to 3.2.0. This version adds compatibility with Java 11, as well as many other features and bug fixes. ### Jira - [X] My PR

[jira] [Created] (PARQUET-2158) Upgrade Hadoop dependency to version 3.2.0

2022-06-13 Thread Steve Loughran (Jira)
Steve Loughran created PARQUET-2158: --- Summary: Upgrade Hadoop dependency to version 3.2.0 Key: PARQUET-2158 URL: https://issues.apache.org/jira/browse/PARQUET-2158 Project: Parquet Issue

[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553445#comment-17553445 ] ASF GitHub Bot commented on PARQUET-1020: - guillaume-fetter commented on PR #963: URL:

[GitHub] [parquet-mr] guillaume-fetter commented on pull request #963: PARQUET-1020 Add DynamicMessage writing support

2022-06-13 Thread GitBox
guillaume-fetter commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1153626738 Just a heads-up (because I have run into that issue), DynamicMessage is not serializable. So this means that this use-case is for local-only instances of a DynamicMessage.

[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553434#comment-17553434 ] ASF GitHub Bot commented on PARQUET-1020: - guillaume-fetter commented on code in PR #963: URL:

[GitHub] [parquet-mr] guillaume-fetter commented on a diff in pull request #963: PARQUET-1020 Add DynamicMessage writing support

2022-06-13 Thread GitBox
guillaume-fetter commented on code in PR #963: URL: https://github.com/apache/parquet-mr/pull/963#discussion_r895442880 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java: ## @@ -115,27 +120,32 @@ public void prepareForWrite(RecordConsumer

[jira] [Commented] (PARQUET-2117) Add rowPosition API in parquet record readers

2022-06-13 Thread Gidon Gershinsky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553425#comment-17553425 ] Gidon Gershinsky commented on PARQUET-2117: --- [~sha...@uber.com] Could you add

[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553412#comment-17553412 ] ASF GitHub Bot commented on PARQUET-2042: - sheinbergon commented on PR #900: URL:

[GitHub] [parquet-mr] sheinbergon commented on pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-06-13 Thread GitBox
sheinbergon commented on PR #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1153562823 @mwong38 any way I can help with finalizing this PR?

[jira] [Commented] (PARQUET-2134) Incorrect type checking in HadoopStreams.wrap

2022-06-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553401#comment-17553401 ] ASF GitHub Bot commented on PARQUET-2134: - 7c00 commented on PR #951: URL:

[GitHub] [parquet-mr] 7c00 commented on pull request #951: PARQUET-2134: Fix type checking in HadoopStreams.wrap

2022-06-13 Thread GitBox
7c00 commented on PR #951: URL: https://github.com/apache/parquet-mr/pull/951#issuecomment-1153522439 Thanks @steveloughran @shangxinli. I have cherry-picked the commit from https://github.com/apache/parquet-mr/pull/971