[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970726#comment-13970726 ]

Brock Noland commented on HIVE-6785:

+1

query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

    Key: HIVE-6785
    URL: https://issues.apache.org/jira/browse/HIVE-6785
    Project: Hive
    Issue Type: Bug
    Components: File Formats, Serializers/Deserializers
    Affects Versions: 0.13.0
    Reporter: Tongjie Chen
    Fix For: 0.14.0
    Attachments: HIVE-6785.1.patch.txt, HIVE-6785.2.patch.txt, HIVE-6785.3.patch

When a Hive table's SerDe is ParquetHiveSerDe while some partitions use a different SerDe, and the table has string column(s), Hive generates a confusing error message:

Failed with exception java.io.IOException:java.lang.ClassCastException: parquet.hive.serde.primitive.ParquetStringInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableTimestampObjectInspector

This is confusing because timestamp is mentioned even though the table does not use it.

The reason is that when the table and partition SerDes differ, Hive tries to convert between the object inspectors of the two SerDes. ParquetHiveSerDe's object inspector for the string type is ParquetStringInspector (newly introduced), which is a subclass of neither WritableStringObjectInspector nor JavaStringObjectInspector, the types ObjectInspectorConverters expects for a string-category object inspector. There is no break statement in the STRING case, so the following TIMESTAMP case is executed, producing the confusing error message.

See also the following Parquet issue: https://github.com/Parquet/parquet-mr/issues/324

The fix is relatively easy: make ParquetStringInspector a subclass of JavaStringObjectInspector instead of AbstractPrimitiveJavaObjectInspector. But because the constructor of JavaStringObjectInspector is package scope instead of public or protected, we would need to move ParquetStringInspector into the same package as JavaStringObjectInspector.

Also, ArrayWritableObjectInspector's setStructFieldData needs to accept List data as well, since the corresponding setStructFieldData and create methods return a list. This is also needed when the table SerDe is ParquetHiveSerDe and the partition SerDe is something else.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
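The fall-through described in the issue can be reproduced in isolation. The sketch below is not Hive's ObjectInspectorConverters code; every type and method name in it (FallThroughDemo, Inspector, pickConverter, and so on) is a hypothetical stand-in, chosen only to show how a STRING case that neither returns nor breaks ends up running the TIMESTAMP cast.

```java
// Hypothetical stand-ins for Hive's object inspector types; none of these
// are the real Hive classes or the real converter logic.
public class FallThroughDemo {
    public enum Category { STRING, TIMESTAMP }

    public interface Inspector { Category category(); }
    public interface SettableStringInspector extends Inspector {}
    public interface SettableTimestampInspector extends Inspector {}

    // Plays the role of ParquetStringInspector before the fix: a STRING
    // inspector that is not one of the types the STRING case recognizes.
    public static final class UnrecognizedStringInspector implements Inspector {
        public Category category() { return Category.STRING; }
    }

    // Plays the role of a string inspector the converter does recognize.
    public static final class KnownStringInspector implements SettableStringInspector {
        public Category category() { return Category.STRING; }
    }

    public static String pickConverter(Inspector out) {
        switch (out.category()) {
            case STRING:
                if (out instanceof SettableStringInspector) {
                    return "string converter";
                }
                // No break or throw here, so control falls through.
            case TIMESTAMP:
                // For an unrecognized STRING inspector this cast fails, and the
                // exception names the timestamp type rather than the string one.
                SettableTimestampInspector ts = (SettableTimestampInspector) out;
                return "timestamp converter for " + ts.category();
            default:
                throw new IllegalArgumentException("unsupported: " + out.category());
        }
    }

    // Returns the converter description, or the ClassCastException message.
    public static String failureMessage(Inspector out) {
        try {
            return pickConverter(out);
        } catch (ClassCastException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(pickConverter(new KnownStringInspector()));
        System.out.println(failureMessage(new UnrecognizedStringInspector()));
    }
}
```

Making the unrecognized inspector implement the expected string interface, which is the analogue of the subclassing fix proposed in the issue, keeps it out of the fall-through path entirely.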
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970121#comment-13970121 ]

Szehon Ho commented on HIVE-6785:

+1 (non-binding)

+ [~brocknoland]
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966355#comment-13966355 ]

Szehon Ho commented on HIVE-6785:

Hi Tongjie, these are deprecated now and will be removed. See the discussion on HIVE-6757 for the current state. Use:

    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe',
    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat',
    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967047#comment-13967047 ]

Hive QA commented on HIVE-6785:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12639834/HIVE-6785.3.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5615 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority2
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2221/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2221/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12639834
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967069#comment-13967069 ]

Tongjie Chen commented on HIVE-6785:

[~szehon], are those failures transient? The new patch only changes the parquet class, which has nothing to do with these test cases.
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967087#comment-13967087 ]

Szehon Ho commented on HIVE-6785:

Yeah, it doesn't look related to this patch.
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967259#comment-13967259 ]

Szehon Ho commented on HIVE-6785:

Looked at the new patch; it looks fine to me.
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966093#comment-13966093 ]

Tongjie Chen commented on HIVE-6785:

Hi [~brocknoland], are you suggesting using:

    ALTER TABLE parquet_mixed_fileformat SET FILEFORMAT PARQUET;

(after I execute this statement, the table can no longer be found)

instead of:

    ALTER TABLE parquet_mixed_fileformat set SERDE 'parquet.hive.serde.ParquetHiveSerDe';
    ALTER TABLE parquet_mixed_fileformat SET FILEFORMAT INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

Please advise.
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962473#comment-13962473 ]

Szehon Ho commented on HIVE-6785:

+1 (non-binding), thanks for adding the q-test and addressing the comments.

FYI [~brocknoland]
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962482#comment-13962482 ]

Brock Noland commented on HIVE-6785:

Hi, LGTM except I see we are using the parquet... class names when creating a table, which are soon to be removed.
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962493#comment-13962493 ]

Szehon Ho commented on HIVE-6785:

Good catch Brock, I missed that.
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961491#comment-13961491 ]

Hive QA commented on HIVE-6785:

Overall: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12638885/HIVE-6785.2.patch.txt

SUCCESS: +1 5549 tests passed

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2153/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2153/console

Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase

This message is automatically generated.

ATTACHMENT ID: 12638885

query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

Key: HIVE-6785
URL: https://issues.apache.org/jira/browse/HIVE-6785
Project: Hive
Issue Type: Bug
Components: File Formats, Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Tongjie Chen
Fix For: 0.14.0
Attachments: HIVE-6785.1.patch.txt, HIVE-6785.2.patch.txt

When a Hive table's SerDe is ParquetHiveSerDe while some partitions use a different SerDe, and the table has string column(s), Hive generates a confusing error message:

Failed with exception java.io.IOException:java.lang.ClassCastException: parquet.hive.serde.primitive.ParquetStringInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableTimestampObjectInspector

This is confusing because a timestamp is mentioned even though the table does not use one. The reason is that when the table and partition SerDes differ, Hive tries to convert between the object inspectors of the two SerDes. ParquetHiveSerDe's object inspector for the string type is ParquetStringInspector (newly introduced), which is a subclass of neither WritableStringObjectInspector nor JavaStringObjectInspector, the types ObjectInspectorConverters expects for a string-category object inspector. There is no break statement in the STRING case, so execution falls through to the following TIMESTAMP case, which produces the confusing error message.

See also the following Parquet issue: https://github.com/Parquet/parquet-mr/issues/324

The fix is relatively easy: make ParquetStringInspector a subclass of JavaStringObjectInspector instead of AbstractPrimitiveJavaObjectInspector. But because the constructor of JavaStringObjectInspector is package scope rather than public or protected, we would need to move ParquetStringInspector into the same package as JavaStringObjectInspector.

Also, ArrayWritableObjectInspector's setStructFieldData needs to accept List data as well, since the corresponding setStructFieldData and create methods return a list. This is also needed when the table SerDe is ParquetHiveSerDe and the partition SerDe is something else.

--
This message was sent by Atlassian JIRA (v6.2#6252)
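The fall-through described in the issue text above can be illustrated with a small, self-contained Java sketch. This is not Hive's actual ObjectInspectorConverters code: every type and method name below (Inspector, SettableStringInspector, SettableTimestampInspector, UnexpectedStringInspector, ExpectedStringInspector, Category, pickConverter) is a hypothetical stand-in chosen for illustration. It only shows how a STRING case that neither returns nor breaks for an unexpected inspector type can end up in the TIMESTAMP case, so that the resulting ClassCastException names a timestamp type.

```java
// Hypothetical stand-ins for Hive's object inspector types (not the real Hive classes).
interface Inspector {}
interface SettableStringInspector extends Inspector {}
interface SettableTimestampInspector extends Inspector {}

// Plays the role of ParquetStringInspector before the fix: a string-category
// inspector that is not one of the string inspector types the converter expects.
final class UnexpectedStringInspector implements Inspector {}

// Plays the role of ParquetStringInspector after the fix.
final class ExpectedStringInspector implements SettableStringInspector {}

enum Category { STRING, TIMESTAMP }

final class FallThroughDemo {
    // Picks a converter name for the given category and output inspector.
    static String pickConverter(Category category, Inspector outputOI) {
        switch (category) {
            case STRING:
                if (outputOI instanceof SettableStringInspector) {
                    return "string converter";
                }
                // No break or return for the unexpected type: control falls through.
            case TIMESTAMP:
                // For a string inspector that fell through, this cast throws
                // ClassCastException, and the message names the timestamp type.
                SettableTimestampInspector ts = (SettableTimestampInspector) outputOI;
                return "timestamp converter for " + ts.getClass().getSimpleName();
            default:
                throw new IllegalArgumentException("unsupported category: " + category);
        }
    }

    public static void main(String[] args) {
        System.out.println(pickConverter(Category.STRING, new ExpectedStringInspector()));
        try {
            pickConverter(Category.STRING, new UnexpectedStringInspector());
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

In this sketch, making the string inspector implement the expected interface (the analogue of subclassing JavaStringObjectInspector) avoids the fall-through without touching the switch itself, which is the approach the issue description proposes.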
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961231#comment-13961231 ]

Tongjie Chen commented on HIVE-6785:

When I added a qtest, I realized that this bug is resolved with this patch in hive-trunk, but it is still a bug in Hive 0.11.

Digging a little, I found that when the partition SerDe and the table SerDe are different, Hive 0.11 tries to convert the object inspector whenever the two are not equal. In hive-trunk (0.13 or 0.14), however, no conversion happens if all fields of the output ObjectInspector are settable, so the bug described in this JIRA no longer shows up in hive-trunk.

However, I do think that ParquetStringInspector should be a subclass of JavaStringObjectInspector, so that Hive 0.11 has no problem either.

Related Hive JIRAs: HIVE-5202, HIVE-5394

--- hive-trunk (0.13, 0.14, etc.) code snippet from ObjectInspectorConverters ---

    // 1. If equalsCheck is true and the inputOI is the same as the outputOI OR
    // 2. If the outputOI has all fields settable, return it
    if ((equalsCheck && inputOI.equals(outputOI)) ||
        ObjectInspectorUtils.hasAllFieldsSettable(outputOI, oiSettableProperties) == true) {
      return outputOI;
    }

--- Hive 0.11 code snippet from ObjectInspectorConverters ---

    // If the inputOI is the same as the outputOI, just return it
    if (inputOI.equals(outputOI)) {
      return outputOI;
    }
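The difference between the two quoted checks can be reduced to a pair of boolean rules. The sketch below is only a model of the quoted conditions, not Hive code: the class and method names (ConversionCheckDemo, convertsInHive011, convertsInTrunk) are hypothetical, the inspectors are plain placeholder objects, and allFieldsSettable stands in for the result of ObjectInspectorUtils.hasAllFieldsSettable. It reads the first trunk condition as `equalsCheck && inputOI.equals(outputOI)`, which is what the quoted code comment describes.

```java
final class ConversionCheckDemo {
    // Hive 0.11 rule as quoted above: the output inspector is reused only when
    // it equals the input inspector; otherwise a conversion is attempted.
    static boolean convertsInHive011(Object inputOI, Object outputOI) {
        return !inputOI.equals(outputOI);
    }

    // hive-trunk rule as quoted above. allFieldsSettable is a placeholder for
    // the result of ObjectInspectorUtils.hasAllFieldsSettable(...).
    static boolean convertsInTrunk(Object inputOI, Object outputOI,
                                   boolean equalsCheck, boolean allFieldsSettable) {
        boolean reuseOutput = (equalsCheck && inputOI.equals(outputOI)) || allFieldsSettable;
        return !reuseOutput;
    }

    public static void main(String[] args) {
        // Two distinct placeholder inspectors, standing in for the partition
        // and table SerDe inspectors.
        Object partitionOI = "partition-inspector";
        Object tableOI = "table-inspector";
        System.out.println("0.11 converts: " + convertsInHive011(partitionOI, tableOI));
        System.out.println("trunk converts: " + convertsInTrunk(partitionOI, tableOI, true, true));
    }
}
```

Under these assumptions, two unequal inspectors always lead to a conversion attempt with the 0.11 rule, while the trunk rule skips conversion as soon as the output inspector reports all fields settable, which matches the explanation of why the bug is only visible on the older release.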
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961240#comment-13961240 ]

Tongjie Chen commented on HIVE-6785:

In my previous comment, I meant that this bug is not reproducible in hive-trunk because of the patches from HIVE-5202 and HIVE-5394. The patch introduced in this JIRA is still an enhancement, though.

By the way, how can I edit my own comment once it has been submitted?
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957453#comment-13957453 ]

Szehon Ho commented on HIVE-6785:

You shouldn't need to use svn for that; you can try 'git add' / 'git rm'. I left a couple of comments on the RB for consideration. Also, when you are ready, click 'Submit Patch' to trigger the pre-commit test.

query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

Key: HIVE-6785
URL: https://issues.apache.org/jira/browse/HIVE-6785
Project: Hive
Issue Type: Bug
Components: File Formats, Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Tongjie Chen
Attachments: HIVE-6785.1.patch.txt

More specifically, if the table contains string type columns, it will result in the following exception:

Failed with exception java.io.IOException:java.lang.ClassCastException: parquet.hive.serde.primitive.ParquetStringInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableTimestampObjectInspector

See also the following Parquet issue: https://github.com/Parquet/parquet-mr/issues/324
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957992#comment-13957992 ]

Szehon Ho commented on HIVE-6785:

In my opinion, it is better to change JavaStringObjectInspector's constructor to protected, so we can keep all the Parquet inspectors in the same package.

It would also be good to add a q-test for this case. You can write one following the example of parquet_create.q and then generate and verify the result by running the following command (Maven version) on your q-test file:

https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-HowdoupdatetheoutputofaCliDrivertestcase?
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958010#comment-13958010 ]

Tongjie Chen commented on HIVE-6785:

If we change JavaStringObjectInspector's constructor to protected, that should work fine for 0.13. But when we backport this JIRA to parquet-hive, it will break unless we also make the same change in Hive 0.12 and Hive 0.10, right?
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958043#comment-13958043 ]

Szehon Ho commented on HIVE-6785:

Yeah, that's right. Do you think it would be a huge issue, though, if it's a Hive 0.13-only fix?
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958054#comment-13958054 ]

Tongjie Chen commented on HIVE-6785:

The only issue I see is that parquet-hive cannot backport this JIRA.
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958074#comment-13958074 ]

Szehon Ho commented on HIVE-6785:

OK. I'm not a huge fan of moving that inspector to an unnatural place, because it will be stuck like that going forward in Hive, but we can let others chime in too. If it's really important to support earlier Hive versions, maybe one option is to backport a different version of the patch into Parquet?
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958081#comment-13958081 ]

Tongjie Chen commented on HIVE-6785:

If we can be flexible by backporting a different version of the patch into parquet-hive, that would be great! I would like to keep Parquet-related code in the parquet package as well.
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957258#comment-13957258 ]

Tongjie Chen commented on HIVE-6785:

This patch involves deleting a file and adding new files (mv), and https://cwiki.apache.org/confluence/display/Hive/HowToContribute has no instructions for deleting or adding files when using git. My patch was generated with git diff; if that does not work, I will resubmit a patch using svn.

https://reviews.apache.org/r/19896/
[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe
[ https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954291#comment-13954291 ]

Brock Noland commented on HIVE-6785:

FYI [~jcoffey] [~xuefuz] [~szehon]