[jira] [Commented] (PARQUET-2006) Column resolution by ID

2022-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17510175#comment-17510175 ] ASF GitHub Bot commented on PARQUET-2006: - huaxingao commented on pull request #950: URL:

[GitHub] [parquet-mr] huaxingao commented on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
huaxingao commented on pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1074572886 @ggershinsky I updated the description. Please check again to see if it is clear to you. Thanks! -- This is an automated message from the Apache Git Service. To

[jira] [Commented] (PARQUET-2006) Column resolution by ID

2022-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17510141#comment-17510141 ] ASF GitHub Bot commented on PARQUET-2006: - huaxingao commented on a change in pull request

[jira] [Commented] (PARQUET-2006) Column resolution by ID

2022-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17510140#comment-17510140 ] ASF GitHub Bot commented on PARQUET-2006: - huaxingao commented on a change in pull request

[GitHub] [parquet-mr] huaxingao commented on a change in pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
huaxingao commented on a change in pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r831619229 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java ## @@ -472,7 +488,13 @@ public static boolean

[GitHub] [parquet-mr] huaxingao commented on a change in pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
huaxingao commented on a change in pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r831619033 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -878,11 +880,92 @@ public String getFile() {

[jira] [Commented] (PARQUET-2006) Column resolution by ID

2022-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17510139#comment-17510139 ] ASF GitHub Bot commented on PARQUET-2006: - huaxingao commented on a change in pull request

[GitHub] [parquet-mr] huaxingao commented on a change in pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
huaxingao commented on a change in pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r831618518 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -878,11 +880,92 @@ public String getFile() {

[jira] [Commented] (PARQUET-2006) Column resolution by ID

2022-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17510138#comment-17510138 ] ASF GitHub Bot commented on PARQUET-2006: - huaxingao commented on a change in pull request

[jira] [Commented] (PARQUET-2006) Column resolution by ID

2022-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17510137#comment-17510137 ] ASF GitHub Bot commented on PARQUET-2006: - huaxingao commented on a change in pull request

[GitHub] [parquet-mr] huaxingao commented on a change in pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
huaxingao commented on a change in pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r831618391 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordReader.java ## @@ -181,7 +181,7 @@ public void

[GitHub] [parquet-mr] huaxingao commented on a change in pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
huaxingao commented on a change in pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r831618277 ## File path: parquet-column/src/main/java/org/apache/parquet/filter2/predicate/SchemaCompatibilityValidator.java ## @@ -170,6 +174,24 @@ public Void

Re: Multiple pages with indexes vs multiple row groups with one data page per chunk

2022-03-21 Thread Weston Pace
This is speculative, since we haven't implemented page/column indices in C++ yet, but I don't know if shrinking the row group size works well if you have differently sized columns. For example, imagine you have a wide column of varstring "comments" that averages 50 bytes per cell. If you then

[jira] [Commented] (PARQUET-2127) Security risk in latest parquet-jackson-1.12.2.jar

2022-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17510122#comment-17510122 ] ASF GitHub Bot commented on PARQUET-2127: - shangxinli merged pull request #952: URL:

[GitHub] [parquet-mr] shangxinli merged pull request #952: PARQUET-2127: upgrade jackson-databind to 2.13.2

2022-03-21 Thread GitBox
shangxinli merged pull request #952: URL: https://github.com/apache/parquet-mr/pull/952

[jira] [Commented] (PARQUET-2127) Security risk in latest parquet-jackson-1.12.2.jar

2022-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17510088#comment-17510088 ] ASF GitHub Bot commented on PARQUET-2127: - trevorurquhart opened a new pull request #952: URL:

[GitHub] [parquet-mr] trevorurquhart opened a new pull request #952: PARQUET-2127: upgrade jackson-databind to 2.13.2

2022-03-21 Thread GitBox
trevorurquhart opened a new pull request #952: URL: https://github.com/apache/parquet-mr/pull/952 ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. -

[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay

2022-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509743#comment-17509743 ] ASF GitHub Bot commented on PARQUET-2042: - mwong38 commented on a change in pull request #900:

[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay

2022-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509744#comment-17509744 ] ASF GitHub Bot commented on PARQUET-2042: - mwong38 commented on a change in pull request #900:

[GitHub] [parquet-mr] mwong38 commented on a change in pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-21 Thread GitBox
mwong38 commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r830942166 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java ## @@ -427,6 +485,218 @@ public void addBinary(Binary

[GitHub] [parquet-mr] mwong38 commented on a change in pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-21 Thread GitBox
mwong38 commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r830941957 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java ## @@ -427,6 +485,218 @@ public void addBinary(Binary

[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay

2022-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509738#comment-17509738 ] ASF GitHub Bot commented on PARQUET-2042: - sheinbergon commented on pull request #900: URL:

[GitHub] [parquet-mr] sheinbergon commented on pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-03-21 Thread GitBox
sheinbergon commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1073708633 @mwong38 let me know if you want me to help in any way

[jira] [Commented] (PARQUET-2006) Column resolution by ID

2022-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509660#comment-17509660 ] ASF GitHub Bot commented on PARQUET-2006: - ggershinsky commented on a change in pull request

[GitHub] [parquet-mr] ggershinsky commented on a change in pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
ggershinsky commented on a change in pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r830823180 ## File path: parquet-column/src/main/java/org/apache/parquet/filter2/predicate/SchemaCompatibilityValidator.java ## @@ -170,6 +174,24 @@ public

[jira] [Commented] (PARQUET-2006) Column resolution by ID

2022-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509653#comment-17509653 ] ASF GitHub Bot commented on PARQUET-2006: - ggershinsky commented on a change in pull request

[GitHub] [parquet-mr] ggershinsky commented on a change in pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
ggershinsky commented on a change in pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#discussion_r830820621 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -878,11 +880,92 @@ public String getFile() {

[jira] [Commented] (PARQUET-2006) Column resolution by ID

2022-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509652#comment-17509652 ] ASF GitHub Bot commented on PARQUET-2006: - ggershinsky edited a comment on pull request #950:

[GitHub] [parquet-mr] ggershinsky edited a comment on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
ggershinsky edited a comment on pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1073553770 Hi @huaxingao, can you describe the lifecycle of the column IDs at a high level, either in the PR description or in a comment? Where these IDs are stored (if in

[jira] [Commented] (PARQUET-2006) Column resolution by ID

2022-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509651#comment-17509651 ] ASF GitHub Bot commented on PARQUET-2006: - ggershinsky commented on pull request #950: URL:

[GitHub] [parquet-mr] ggershinsky commented on pull request #950: PARQUET-2006: Column resolution by ID

2022-03-21 Thread GitBox
ggershinsky commented on pull request #950: URL: https://github.com/apache/parquet-mr/pull/950#issuecomment-1073553770 Hi @huaxingao, can you describe the lifecycle of the column IDs at a high level, either in the PR description or in a comment? Where these IDs are stored (if in footer