[jira] [Commented] (HIVE-19015) Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469821#comment-16469821 ] Haifeng Chen commented on HIVE-19015: - [~vihangk1] This is the same nested complex type problem as it is not yet implement in Parquet vectorized reader. I will get this done together with HIVE-19016. The nested complex type handling will be much complex than primitives but fewer cases. My current thought is for root columns which is primitive or List, Struct and Map with primitives, we will go with the current implementation as fast path. When we found a root column with nested complex types, we will go with a tree reader which can handling the definition level and repetition level properly. > Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q > gets a ClassCastException > - > > Key: HIVE-19015 > URL: https://issues.apache.org/jira/browse/HIVE-19015 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.0.0 >Reporter: Matt McCline >Assignee: Vihang Karajgaonkar >Priority: Critical > > Adding "SET hive.vectorized.execution.enabled=true;" to > parquet_map_of_arrays_of_ints.q triggers this call stack: > {noformat} > Caused by: java.lang.ClassCastException: > org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to > org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:67) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > {noformat} > FYI: [~vihangk1] > Adding parquet_map_of_maps.q, too. Stack trace seems related. > {noformat} > Caused by: java.lang.ClassCastException: optional group value (MAP) { > repeated group key_value { > optional binary key (UTF8); > required int32 value; > } > } is not primitive > at org.apache.parquet.schema.Type.asPrimitiveType(Type.java:213) > ~[parquet-hadoop-bundle-1.9.0.jar:1.9.0] > at > org.apache.hadoop.hive.ql.io.parquet.vector.BaseVectorizedColumnReader.(BaseVectorizedColumnReader.java:130) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.(VectorizedListColumnReader.java:52) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:568) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-19015) Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453249#comment-16453249 ] Matt McCline commented on HIVE-19015: - And, HIVE-19016 seems similar. > Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q > gets a ClassCastException > - > > Key: HIVE-19015 > URL: https://issues.apache.org/jira/browse/HIVE-19015 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.0.0 >Reporter: Matt McCline >Assignee: Vihang Karajgaonkar >Priority: Critical > > Adding "SET hive.vectorized.execution.enabled=true;" to > parquet_map_of_arrays_of_ints.q triggers this call stack: > {noformat} > Caused by: java.lang.ClassCastException: > org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to > org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:67) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > {noformat} > FYI: [~vihangk1] > Adding parquet_map_of_maps.q, too. Stack trace seems related. > {noformat} > Caused by: java.lang.ClassCastException: optional group value (MAP) { > repeated group key_value { > optional binary key (UTF8); > required int32 value; > } > } is not primitive > at org.apache.parquet.schema.Type.asPrimitiveType(Type.java:213) > ~[parquet-hadoop-bundle-1.9.0.jar:1.9.0] > at > org.apache.hadoop.hive.ql.io.parquet.vector.BaseVectorizedColumnReader.(BaseVectorizedColumnReader.java:130) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.(VectorizedListColumnReader.java:52) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:568) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-19015) Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453232#comment-16453232 ] Matt McCline commented on HIVE-19015: - [~vihangk1] seems like the vectorized Parquet reader does not handle complex types as LIST elements. I can add a new HiveConf variable to disallow non-primitive LIST elements for Parquet input format in the Vectorizer. Sound like a plan? > Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q > gets a ClassCastException > - > > Key: HIVE-19015 > URL: https://issues.apache.org/jira/browse/HIVE-19015 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.0.0 >Reporter: Matt McCline >Assignee: Vihang Karajgaonkar >Priority: Critical > > Adding "SET hive.vectorized.execution.enabled=true;" to > parquet_map_of_arrays_of_ints.q triggers this call stack: > {noformat} > Caused by: java.lang.ClassCastException: > org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to > org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:67) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > {noformat} > FYI: [~vihangk1] > Adding parquet_map_of_maps.q, too. Stack trace seems related. > {noformat} > Caused by: java.lang.ClassCastException: optional group value (MAP) { > repeated group key_value { > optional binary key (UTF8); > required int32 value; > } > } is not primitive > at org.apache.parquet.schema.Type.asPrimitiveType(Type.java:213) > ~[parquet-hadoop-bundle-1.9.0.jar:1.9.0] > at > org.apache.hadoop.hive.ql.io.parquet.vector.BaseVectorizedColumnReader.(BaseVectorizedColumnReader.java:130) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.(VectorizedListColumnReader.java:52) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:568) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-19015) Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q gets a ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409076#comment-16409076 ] Matt McCline commented on HIVE-19015: - [~vihangk1] You may need the fixes in HIVE-19019 in order to make progress on this one. > Vectorization and Parquet: When vectorized, parquet_map_of_arrays_of_ints.q > gets a ClassCastException > - > > Key: HIVE-19015 > URL: https://issues.apache.org/jira/browse/HIVE-19015 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.0.0 >Reporter: Matt McCline >Assignee: Vihang Karajgaonkar >Priority: Critical > > Adding "SET hive.vectorized.execution.enabled=true;" to > parquet_map_of_arrays_of_ints.q triggers this call stack: > {noformat} > Caused by: java.lang.ClassCastException: > org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to > org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:67) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > {noformat} > FYI: [~vihangk1] > Adding parquet_map_of_maps.q, too. Stack trace seems related. > {noformat} > Caused by: java.lang.ClassCastException: optional group value (MAP) { > repeated group key_value { > optional binary key (UTF8); > required int32 value; > } > } is not primitive > at org.apache.parquet.schema.Type.asPrimitiveType(Type.java:213) > ~[parquet-hadoop-bundle-1.9.0.jar:1.9.0] > at > org.apache.hadoop.hive.ql.io.parquet.vector.BaseVectorizedColumnReader.(BaseVectorizedColumnReader.java:130) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.(VectorizedListColumnReader.java:52) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:568) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)