[jira] [Commented] (HIVE-19015) Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException

2018-05-09 Thread Haifeng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469821#comment-16469821
 ] 

Haifeng Chen commented on HIVE-19015:
-

[~vihangk1] This is the same nested complex type problem as it is not yet 
implement in Parquet vectorized reader. I will get this done together with 
HIVE-19016. 

The nested complex type handling will be much complex than primitives but fewer 
cases. My current thought is for root columns which is primitive or List, 
Struct and Map with primitives, we will go with the current implementation as 
fast path.  When we found a root column with nested complex types, we will go 
with a tree reader which can handling the definition level and repetition level 
properly.

> Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q 
> gets a ClassCastException
> -
>
> Key: HIVE-19015
> URL: https://issues.apache.org/jira/browse/HIVE-19015
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>
> Adding "SET hive.vectorized.execution.enabled=true;"  to 
> parquet_map_of_arrays_of_ints.q triggers this call stack:
> {noformat}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to 
> org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:67)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> {noformat}
> FYI: [~vihangk1]
> Adding parquet_map_of_maps.q, too.  Stack trace seems related.
> {noformat}
> Caused by: java.lang.ClassCastException: optional group value (MAP) {
>   repeated group key_value {
> optional binary key (UTF8);
> required int32 value;
>   }
> } is not primitive
>   at org.apache.parquet.schema.Type.asPrimitiveType(Type.java:213) 
> ~[parquet-hadoop-bundle-1.9.0.jar:1.9.0]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.BaseVectorizedColumnReader.(BaseVectorizedColumnReader.java:130)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.(VectorizedListColumnReader.java:52)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:568)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19015) Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException

2018-04-25 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453249#comment-16453249
 ] 

Matt McCline commented on HIVE-19015:
-

And, HIVE-19016 seems similar.

> Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q 
> gets a ClassCastException
> -
>
> Key: HIVE-19015
> URL: https://issues.apache.org/jira/browse/HIVE-19015
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>
> Adding "SET hive.vectorized.execution.enabled=true;"  to 
> parquet_map_of_arrays_of_ints.q triggers this call stack:
> {noformat}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to 
> org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:67)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> {noformat}
> FYI: [~vihangk1]
> Adding parquet_map_of_maps.q, too.  Stack trace seems related.
> {noformat}
> Caused by: java.lang.ClassCastException: optional group value (MAP) {
>   repeated group key_value {
> optional binary key (UTF8);
> required int32 value;
>   }
> } is not primitive
>   at org.apache.parquet.schema.Type.asPrimitiveType(Type.java:213) 
> ~[parquet-hadoop-bundle-1.9.0.jar:1.9.0]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.BaseVectorizedColumnReader.(BaseVectorizedColumnReader.java:130)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.(VectorizedListColumnReader.java:52)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:568)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19015) Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException

2018-04-25 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453232#comment-16453232
 ] 

Matt McCline commented on HIVE-19015:
-

[~vihangk1] seems like the vectorized Parquet reader does not handle complex 
types as LIST elements.

I can add a new HiveConf variable to disallow non-primitive LIST elements for 
Parquet input format in the Vectorizer.  Sound like a plan?

> Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q 
> gets a ClassCastException
> -
>
> Key: HIVE-19015
> URL: https://issues.apache.org/jira/browse/HIVE-19015
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>
> Adding "SET hive.vectorized.execution.enabled=true;"  to 
> parquet_map_of_arrays_of_ints.q triggers this call stack:
> {noformat}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to 
> org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:67)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> {noformat}
> FYI: [~vihangk1]
> Adding parquet_map_of_maps.q, too.  Stack trace seems related.
> {noformat}
> Caused by: java.lang.ClassCastException: optional group value (MAP) {
>   repeated group key_value {
> optional binary key (UTF8);
> required int32 value;
>   }
> } is not primitive
>   at org.apache.parquet.schema.Type.asPrimitiveType(Type.java:213) 
> ~[parquet-hadoop-bundle-1.9.0.jar:1.9.0]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.BaseVectorizedColumnReader.(BaseVectorizedColumnReader.java:130)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.(VectorizedListColumnReader.java:52)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:568)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19015) Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException

2018-03-21 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409076#comment-16409076
 ] 

Matt McCline commented on HIVE-19015:
-

[~vihangk1] You may need the fixes in HIVE-19019 in order to make progress on 
this one.

> Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q 
> gets a ClassCastException
> -
>
> Key: HIVE-19015
> URL: https://issues.apache.org/jira/browse/HIVE-19015
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>
> Adding "SET hive.vectorized.execution.enabled=true;"  to 
> parquet_map_of_arrays_of_ints.q triggers this call stack:
> {noformat}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to 
> org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:67)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> {noformat}
> FYI: [~vihangk1]
> Adding parquet_map_of_maps.q, too.  Stack trace seems related.
> {noformat}
> Caused by: java.lang.ClassCastException: optional group value (MAP) {
>   repeated group key_value {
> optional binary key (UTF8);
> required int32 value;
>   }
> } is not primitive
>   at org.apache.parquet.schema.Type.asPrimitiveType(Type.java:213) 
> ~[parquet-hadoop-bundle-1.9.0.jar:1.9.0]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.BaseVectorizedColumnReader.(BaseVectorizedColumnReader.java:130)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.(VectorizedListColumnReader.java:52)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:568)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)