[jira] [Commented] (HIVE-7800) Parquet Column Index Access Schema Size Checking

2014-10-27 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186386#comment-14186386
 ] 

Gunther Hagleitner commented on HIVE-7800:
--

+1 for 0.14. [~brocknoland] can you commit to the branch?

> Parquet Column Index Access Schema Size Checking
> 
>
> Key: HIVE-7800
> URL: https://issues.apache.org/jira/browse/HIVE-7800
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Daniel Weeks
> Assignee: Daniel Weeks
> Priority: Critical
> Fix For: 0.15.0
>
> Attachments: HIVE-7800.1.patch, HIVE-7800.2.patch, HIVE-7800.3.patch
>
>
> When a Parquet-formatted table has partitions whose files have schemas of 
> different sizes, using column index access can result in an index out of 
> bounds exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7800) Parquet Column Index Access Schema Size Checking

2014-10-04 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159348#comment-14159348
 ] 

Brock Noland commented on HIVE-7800:


+1



[jira] [Commented] (HIVE-7800) Parquet Column Index Access Schema Size Checking

2014-10-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159152#comment-14159152
 ] 

Hive QA commented on HIVE-7800:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12672847/HIVE-7800.3.patch

{color:green}SUCCESS:{color} +1 6538 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1113/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1113/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1113/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12672847



[jira] [Commented] (HIVE-7800) Parquet Column Index Access Schema Size Checking

2014-10-03 Thread Daniel Weeks (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158599#comment-14158599
 ] 

Daniel Weeks commented on HIVE-7800:


One more for the list:

4) Certain operations (group by + order by) lose the Hive schema in the 
configuration, so the table information isn't available in 'prepareForRead', 
and column index access resolution didn't work.



[jira] [Commented] (HIVE-7800) Parquet Column Index Access Schema Size Checking

2014-10-03 Thread Daniel Weeks (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158594#comment-14158594
 ] 

Daniel Weeks commented on HIVE-7800:


This patch actually resolves a few different issues:

1) If the file schema size and table schema size differ across partitions, it 
no longer throws an index out of bounds exception.
2) There was an odd case where, if the calculated input splits resulted in a 
mapper not processing the first split (due to the row group boundary checking), 
the array writable used to back the materialized rows would be initialized to 
the full table length rather than the projected column length.  The column 
index access path could not handle that case.
3) A previously included check didn't allow the file schema to vary from the 
table schema (i.e. you could not request a column that doesn't exist in the 
underlying file).  That check prevented schema evolution and was removed.  
Columns missing from the file schema should be null-padded in the final result.
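
As a rough illustration of points 1 and 3, the sketch below resolves requested 
columns by position against a file schema that may be shorter than the table 
schema. It is not the code from the patch: the class and method names 
(ColumnIndexAccessSketch, projectByIndex, materialize) are hypothetical, and 
plain Java lists and arrays stand in for the Parquet schema and the array 
writable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the Hive or Parquet API.
public class ColumnIndexAccessSketch {

    /**
     * Picks file columns by table position. Positions that fall outside the
     * file schema are skipped instead of triggering an out-of-bounds access.
     */
    public static List<String> projectByIndex(List<String> fileColumns,
                                              List<Integer> requestedIndexes) {
        List<String> projected = new ArrayList<>();
        for (int index : requestedIndexes) {
            if (index >= 0 && index < fileColumns.size()) {
                projected.add(fileColumns.get(index));
            }
        }
        return projected;
    }

    /**
     * Builds a row sized to the projected column count (not the full table
     * width). Columns missing from the file row are left as null.
     */
    public static Object[] materialize(Object[] fileRow,
                                       List<Integer> requestedIndexes) {
        Object[] row = new Object[requestedIndexes.size()];
        for (int i = 0; i < row.length; i++) {
            int index = requestedIndexes.get(i);
            row[i] = (index >= 0 && index < fileRow.length) ? fileRow[index] : null;
        }
        return row;
    }

    public static void main(String[] args) {
        // An older partition file has two columns; the table now has three.
        List<String> fileColumns = Arrays.asList("id", "name");
        List<Integer> requested = Arrays.asList(0, 2);

        System.out.println(projectByIndex(fileColumns, requested));
        System.out.println(Arrays.toString(
                materialize(new Object[] {1, "a"}, requested)));
    }
}
```

Sizing the row to the requested columns rather than the table width mirrors 
point 2: the backing array matches the projection regardless of which split a 
mapper starts on.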



[jira] [Commented] (HIVE-7800) Parquet Column Index Access Schema Size Checking

2014-10-01 Thread Daniel Weeks (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155444#comment-14155444
 ] 

Daniel Weeks commented on HIVE-7800:


[~brocknoland]  The most recent patch isn't sufficient.  I have a patch that 
moves the logic into the init method, but it was written against parquet-hive, 
which is now significantly different from the current Hive implementation.  I 
need to update the patch to reflect those changes. 

> Parquet Column Index Access Schema Size Checking
> 
>
> Key: HIVE-7800
> URL: https://issues.apache.org/jira/browse/HIVE-7800
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Daniel Weeks
> Assignee: Daniel Weeks
> Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-7800.1.patch, HIVE-7800.2.patch
>
>
> When a Parquet-formatted table has partitions whose files have schemas of 
> different sizes, using column index access can result in an index out of 
> bounds exception.


