[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #925: PARQUET-2078: Failed to read parquet file after writing with the same …

2021-08-30 Thread GitBox
gszadovszky commented on a change in pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#discussion_r698296466 ## File path: parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestParquetFileWriter.java ## @@ -239,6 +248,82 @@ public void testWriteRead()

[jira] [Commented] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-30 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406621#comment-17406621 ] Gabor Szadovszky commented on PARQUET-2078: --- [~nemon], I am not sure how it would be

[jira] [Commented] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406626#comment-17406626 ] ASF GitHub Bot commented on PARQUET-2078: - gszadovszky commented on a change in pull request

[jira] [Commented] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-30 Thread Nemon Lou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406658#comment-17406658 ] Nemon Lou commented on PARQUET-2078: The wrongly set dictionary page offset does not spread

[GitHub] [parquet-mr] huaxingao commented on pull request #923: [PARQUET-1968] FilterApi support In predicate

2021-08-30 Thread GitBox
huaxingao commented on pull request #923: URL: https://github.com/apache/parquet-mr/pull/923#issuecomment-908849008 @gszadovszky @shangxinli @dbtsai Thank you all very much for reviewing! I have changed the code to generate the visit methods for in/notIn and also added the default by

[jira] [Commented] (PARQUET-1968) FilterApi support In predicate

2021-08-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407030#comment-17407030 ] ASF GitHub Bot commented on PARQUET-1968: - huaxingao commented on pull request #923: URL:

[jira] [Commented] (PARQUET-1968) FilterApi support In predicate

2021-08-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407029#comment-17407029 ] ASF GitHub Bot commented on PARQUET-1968: - huaxingao commented on a change in pull request

[GitHub] [parquet-mr] huaxingao commented on a change in pull request #923: [PARQUET-1968] FilterApi support In predicate

2021-08-30 Thread GitBox
huaxingao commented on a change in pull request #923: URL: https://github.com/apache/parquet-mr/pull/923#discussion_r698937308 ## File path: parquet-column/src/main/java/org/apache/parquet/filter2/recordlevel/IncrementallyUpdatedFilterPredicate.java ## @@ -123,6 +124,46 @@

[jira] [Commented] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-30 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406722#comment-17406722 ] Gabor Szadovszky commented on PARQUET-2078: --- [~nemon], you are right, so

[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #925: PARQUET-2078: Failed to read parquet file after writing with the same …

2021-08-30 Thread GitBox
gszadovszky commented on a change in pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#discussion_r698473966 ## File path: parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestParquetFileWriter.java ## @@ -239,6 +248,82 @@ public void testWriteRead()

Re: Any Parquet implementations might be impacted by PARQUET-2078

2021-08-30 Thread Gabor Szadovszky
It turned out that ColumnMetaData.dictionary_page_offset is not impacted by this issue, so it is much easier to handle. It seems that 1.12.0 is the first parquet-mr release which writes ColumnChunk.file_offset and according to PARQUET-2078

[jira] [Commented] (PARQUET-2080) Deprecate RowGroup.file_offset

2021-08-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406743#comment-17406743 ] ASF GitHub Bot commented on PARQUET-2080: - gszadovszky opened a new pull request #178: URL:

[GitHub] [parquet-format] gszadovszky opened a new pull request #178: PARQUET-2080: Deprecate RowGroup.file_offset

2021-08-30 Thread GitBox
gszadovszky opened a new pull request #178: URL: https://github.com/apache/parquet-format/pull/178

[jira] [Commented] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406724#comment-17406724 ] ASF GitHub Bot commented on PARQUET-2078: - gszadovszky commented on a change in pull request

[jira] [Assigned] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-30 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2078: - Assignee: Nemon Lou > Failed to read parquet file after writing with the same

[jira] [Created] (PARQUET-2080) Deprecate RowGroup.file_offset

2021-08-30 Thread Gabor Szadovszky (Jira)
Gabor Szadovszky created PARQUET-2080: - Summary: Deprecate RowGroup.file_offset Key: PARQUET-2080 URL: https://issues.apache.org/jira/browse/PARQUET-2080 Project: Parquet Issue Type: Bug

[jira] [Commented] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406776#comment-17406776 ] ASF GitHub Bot commented on PARQUET-2078: - shangxinli commented on pull request #925: URL:

[jira] [Commented] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406789#comment-17406789 ] ASF GitHub Bot commented on PARQUET-2078: - gszadovszky commented on pull request #925: URL:

[GitHub] [parquet-mr] gszadovszky commented on pull request #925: PARQUET-2078: Failed to read parquet file after writing with the same …

2021-08-30 Thread GitBox
gszadovszky commented on pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-908452452 @ggershinsky, even though this PR fixes the write path as well, we have already released 1.12.0, so we have to prepare for the case where `RowGroup.file_offset` is incorrect.
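
For readers of this thread, here is a minimal Python sketch of what "preparing for an incorrect `RowGroup.file_offset`" could look like on the read side. It is not the code from pull request #925. The `RowGroupInfo` type, the `resolve_start_offsets` helper, and the plausibility rule (an offset must not point before the end of the previous row group) are all assumptions made for illustration. The fallback to the first column chunk's start is also the step that cannot work when that column's metadata is encrypted and the reader has no key, which is the concern raised elsewhere in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional

# A Parquet file begins with a 4-byte magic number, so no row group can start earlier.
MAGIC_LENGTH = 4


@dataclass
class RowGroupInfo:
    """Hypothetical, simplified view of one row group's footer metadata."""
    file_offset: Optional[int]         # RowGroup.file_offset as written; may be wrong or absent
    total_compressed_size: int         # compressed size of the whole row group in bytes
    first_column_start: Optional[int]  # start of the first column chunk; None if unreadable


def resolve_start_offsets(groups: List[RowGroupInfo]) -> List[int]:
    """Return a start offset per row group, distrusting implausible file_offset values.

    Assumed rule: a row group cannot start before the previous one ends.
    If file_offset violates that, fall back to the first column chunk's start.
    """
    starts: List[int] = []
    min_start = MAGIC_LENGTH
    for index, group in enumerate(groups):
        if group.file_offset is not None and group.file_offset >= min_start:
            start = group.file_offset
        elif group.first_column_start is not None:
            start = group.first_column_start
        else:
            # Neither source is usable, e.g. first column metadata is encrypted without a key.
            raise ValueError(f"cannot determine start offset of row group {index}")
        starts.append(start)
        min_start = start + group.total_compressed_size
    return starts
```

With this sketch, a second row group whose written `file_offset` points into the first row group is corrected from its first column chunk, while a row group with neither a plausible offset nor readable column metadata raises an error.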

[jira] [Commented] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406797#comment-17406797 ] ASF GitHub Bot commented on PARQUET-2078: - ggershinsky commented on pull request #925: URL:

[GitHub] [parquet-mr] ggershinsky commented on pull request #925: PARQUET-2078: Failed to read parquet file after writing with the same …

2021-08-30 Thread GitBox
ggershinsky commented on pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-908463834 Yep, but the current fix perpetuates the situation where some readers can't process encrypted files, even if they have keys for all projected columns; doesn't look like an

[jira] [Commented] (PARQUET-1968) FilterApi support In predicate

2021-08-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406813#comment-17406813 ] ASF GitHub Bot commented on PARQUET-1968: - shangxinli commented on a change in pull request

[GitHub] [parquet-mr] shangxinli commented on pull request #925: PARQUET-2078: Failed to read parquet file after writing with the same …

2021-08-30 Thread GitBox
shangxinli commented on pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-908428874 @ggershinsky Do you want to have a look?

[jira] [Commented] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406784#comment-17406784 ] ASF GitHub Bot commented on PARQUET-2078: - ggershinsky commented on pull request #925: URL:

[GitHub] [parquet-mr] ggershinsky commented on pull request #925: PARQUET-2078: Failed to read parquet file after writing with the same …

2021-08-30 Thread GitBox
ggershinsky commented on pull request #925: URL: https://github.com/apache/parquet-mr/pull/925#issuecomment-908443796 Sure. This won't work if the first column is encrypted and the reader doesn't have its key. Can the "write" part be fixed instead, so the RowGroup offset is set correctly?

[GitHub] [parquet-mr] shangxinli commented on a change in pull request #923: [PARQUET-1968] FilterApi support In predicate

2021-08-30 Thread GitBox
shangxinli commented on a change in pull request #923: URL: https://github.com/apache/parquet-mr/pull/923#discussion_r698598441 ## File path: parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java ## @@ -247,6 +250,80 @@ public int hashCode() {

[jira] [Commented] (PARQUET-1968) FilterApi support In predicate

2021-08-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406788#comment-17406788 ] ASF GitHub Bot commented on PARQUET-1968: - shangxinli commented on a change in pull request

[GitHub] [parquet-mr] shangxinli commented on a change in pull request #923: [PARQUET-1968] FilterApi support In predicate

2021-08-30 Thread GitBox
shangxinli commented on a change in pull request #923: URL: https://github.com/apache/parquet-mr/pull/923#discussion_r698606499 ## File path: parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java ## @@ -247,6 +250,80 @@ public int hashCode() {

[jira] [Created] (PARQUET-2081) Encryption translation tool - Parquet-hadoop

2021-08-30 Thread Xinli Shang (Jira)
Xinli Shang created PARQUET-2081: Summary: Encryption translation tool - Parquet-hadoop Key: PARQUET-2081 URL: https://issues.apache.org/jira/browse/PARQUET-2081 Project: Parquet Issue Type:

[jira] [Created] (PARQUET-2082) Encryption translation tool - Parquet-cli

2021-08-30 Thread Xinli Shang (Jira)
Xinli Shang created PARQUET-2082: Summary: Encryption translation tool - Parquet-cli Key: PARQUET-2082 URL: https://issues.apache.org/jira/browse/PARQUET-2082 Project: Parquet Issue Type: