[jira] [Commented] (HIVE-7800) Parquet Column Index Access Schema Size Checking
[ https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186386#comment-14186386 ]
Gunther Hagleitner commented on HIVE-7800:
+1 for 0.14. [~brocknoland], can you commit to the branch?

> Parquet Column Index Access Schema Size Checking
>
> Key: HIVE-7800
> URL: https://issues.apache.org/jira/browse/HIVE-7800
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Daniel Weeks
> Assignee: Daniel Weeks
> Priority: Critical
> Fix For: 0.15.0
> Attachments: HIVE-7800.1.patch, HIVE-7800.2.patch, HIVE-7800.3.patch
>
> If a Parquet-formatted table has partitions whose files have schemas of different sizes, using column index access can result in an index-out-of-bounds exception.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
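With column index access, table column i is read from column i of the file's schema instead of being matched by name. The failure in the issue summary can be sketched in plain Java, assuming simple lists of column names stand in for the Hive table schema and the Parquet file schema; the class and method names (`ColumnIndexLookup`, `naiveLookup`, `checkedLookup`) are hypothetical and not Hive's API.

```java
import java.util.List;

public class ColumnIndexLookup {
    // Naive positional lookup: table column i is read from file column i.
    // Throws IndexOutOfBoundsException when the file schema has fewer columns than the table.
    static String naiveLookup(List<String> fileColumns, int tableIndex) {
        return fileColumns.get(tableIndex);
    }

    // Bounds-checked positional lookup: a table column past the end of the
    // file schema resolves to null instead of throwing.
    static String checkedLookup(List<String> fileColumns, int tableIndex) {
        return tableIndex < fileColumns.size() ? fileColumns.get(tableIndex) : null;
    }

    public static void main(String[] args) {
        // Hypothetical partition file written before a third table column was added.
        List<String> oldPartitionFile = List.of("id", "name");

        System.out.println(checkedLookup(oldPartitionFile, 2)); // prints null
        try {
            naiveLookup(oldPartitionFile, 2);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("naive lookup failed: " + e);
        }
    }
}
```

The bounds check only prevents the crash; what the reader does with the unresolved column (for example, null padding) is a separate decision.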
[jira] [Commented] (HIVE-7800) Parquet Column Index Access Schema Size Checking
[ https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159348#comment-14159348 ]
Brock Noland commented on HIVE-7800:
+1
[jira] [Commented] (HIVE-7800) Parquet Column Index Access Schema Size Checking
[ https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159152#comment-14159152 ]
Hive QA commented on HIVE-7800:
Overall: +1 all checks pass
Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12672847/HIVE-7800.3.patch
SUCCESS: +1 6538 tests passed
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1113/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1113/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1113/
Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
This message is automatically generated.
ATTACHMENT ID: 12672847
[jira] [Commented] (HIVE-7800) Parquet Column Index Access Schema Size Checking
[ https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158599#comment-14158599 ]
Daniel Weeks commented on HIVE-7800:
One more for the list:
4) Certain operations (GROUP BY + ORDER BY) lose the Hive schema in the configuration, so the table information isn't available in 'prepareForRead' and column index access resolution didn't work.
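Item 4 describes the table schema going missing from the configuration before the reader resolves columns. One defensive option, sketched here under stated assumptions and not necessarily what the patch does, is to detect the missing entry explicitly instead of resolving indexes against an empty table schema. A `Map<String, String>` stands in for the job configuration, the `columns` key is assumed to carry a comma-separated list of table column names (modeled on Hive's `columns` property), and `TableSchemaGuard`/`columnsToResolve` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class TableSchemaGuard {
    // Assumption: the table's column names travel in the job configuration as a
    // comma-separated list under this key (modeled on Hive's "columns" property).
    static final String COLUMNS_KEY = "columns";

    // Returns the column list that positional (index) resolution should use.
    // If the table schema is missing from the configuration, fall back to the
    // file's own columns rather than resolving indexes against nothing.
    static List<String> columnsToResolve(Map<String, String> conf, List<String> fileColumns) {
        String tableColumns = conf.get(COLUMNS_KEY);
        if (tableColumns == null || tableColumns.isEmpty()) {
            return fileColumns;
        }
        return Arrays.asList(tableColumns.split(","));
    }

    public static void main(String[] args) {
        List<String> fileColumns = List.of("id", "name");
        // Configuration that lost the table schema, as in the GROUP BY + ORDER BY case.
        System.out.println(columnsToResolve(Map.of(), fileColumns)); // prints [id, name]
        // Configuration that still carries the table schema.
        System.out.println(columnsToResolve(Map.of("columns", "id,name,email"), fileColumns)); // prints [id, name, email]
    }
}
```

The fallback keeps the read from failing, but it cannot recover table columns that the file lacks; restoring the schema in the configuration remains the more complete fix.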
[jira] [Commented] (HIVE-7800) Parquet Column Index Access Schema Size Checking
[ https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158594#comment-14158594 ]
Daniel Weeks commented on HIVE-7800:
This patch actually resolves a few different issues:
1) If the file schema size and table schema size differ across partitions, the reader no longer throws an index-out-of-bounds exception.
2) There was an odd case where, if the calculated input splits resulted in a mapper not processing the first split (due to the row group boundary checking), the ArrayWritable used to back the materialized rows would be initialized to the full table length rather than the projected column length. With column index access, this caused failures because that case was not handled.
3) A previously included check did not allow the file schema to vary from the table schema (i.e., a query could not request a column that doesn't exist in the underlying file). This prevented schema evolution, so the check was removed. Columns missing from the file schema should be null-padded in the final result.
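Items 1 to 3 combine into one rule for materializing a row: size it to the projected columns, resolve each projected table index by position against the file schema, and null-pad any index the file does not have. A minimal sketch, assuming an `Object[]` stands in for the ArrayWritable and a `Map` of column name to value stands in for one decoded Parquet record; `ProjectedRowBuilder` and `buildRow` are hypothetical names, not Hive's converter API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class ProjectedRowBuilder {
    // Builds one output row sized to the projected columns (not the full table width).
    // Each projected table index is resolved by position against the file schema;
    // indexes past the end of the file schema stay null (null padding).
    static Object[] buildRow(int[] projectedTableIndexes, List<String> fileColumns,
                             Map<String, Object> fileRecord) {
        Object[] row = new Object[projectedTableIndexes.length]; // projected length, not table length
        for (int i = 0; i < projectedTableIndexes.length; i++) {
            int tableIndex = projectedTableIndexes[i];
            if (tableIndex < fileColumns.size()) {
                row[i] = fileRecord.get(fileColumns.get(tableIndex));
            }
            // else: column missing from this file's schema, so row[i] stays null
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> fileColumns = List.of("id", "name"); // older partition: 2 columns
        Map<String, Object> record = Map.<String, Object>of("id", 7, "name", "a");
        int[] projection = {0, 2};                          // table columns 0 and 2 requested
        System.out.println(Arrays.toString(buildRow(projection, fileColumns, record))); // prints [7, null]
    }
}
```

Sizing the array from the projection rather than the table width is what keeps the result consistent regardless of which split a mapper happens to process first.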
[jira] [Commented] (HIVE-7800) Parquet Column Index Access Schema Size Checking
[ https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155444#comment-14155444 ]
Daniel Weeks commented on HIVE-7800:
[~brocknoland] The most recent patch isn't sufficient. I have a patch that moves the logic into the init method, but it was written against parquet-hive, which is now significantly different from the current Hive implementation. I need to update this patch to reflect those changes.