[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-16 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970726#comment-13970726
 ] 

Brock Noland commented on HIVE-6785:


+1

 query fails when partitioned table's table level serde is ParquetHiveSerDe 
 and partition level serde is of different SerDe
 --

 Key: HIVE-6785
 URL: https://issues.apache.org/jira/browse/HIVE-6785
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Tongjie Chen
 Fix For: 0.14.0

 Attachments: HIVE-6785.1.patch.txt, HIVE-6785.2.patch.txt, 
 HIVE-6785.3.patch


 When a Hive table's SerDe is ParquetHiveSerDe while some partitions use a 
 different SerDe, and the table has one or more string columns, Hive generates 
 a confusing error message:
 Failed with exception java.io.IOException:java.lang.ClassCastException: 
 parquet.hive.serde.primitive.ParquetStringInspector cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableTimestampObjectInspector
 This is confusing because a timestamp is mentioned even though the table does 
 not use one. The reason is that when the table and partition SerDes differ, 
 Hive tries to convert between the object inspectors of the two SerDes. 
 ParquetHiveSerDe's object inspector for the string type is 
 ParquetStringInspector (newly introduced), which is a subclass of neither 
 WritableStringObjectInspector nor JavaStringObjectInspector, the types that 
 ObjectInspectorConverters expects for a string-category object inspector. 
 There is no break statement in the STRING case, so execution falls through to 
 the following TIMESTAMP case, which produces the confusing error message.
 See also the following Parquet issue:
 https://github.com/Parquet/parquet-mr/issues/324
 The fix is relatively easy: make ParquetStringInspector a subclass of 
 JavaStringObjectInspector instead of AbstractPrimitiveJavaObjectInspector. 
 However, because the constructor of JavaStringObjectInspector is package 
 scoped rather than public or protected, we would need to move 
 ParquetStringInspector into the same package as JavaStringObjectInspector.
 ArrayWritableObjectInspector's setStructFieldData also needs to accept List 
 data, since the corresponding setStructFieldData and create methods return a 
 list. This is likewise needed when the table SerDe is ParquetHiveSerDe and the 
 partition SerDe is something else.
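
 The fall-through described above can be illustrated with a minimal, 
 self-contained sketch. Every type and method name below (Inspector, 
 SettableStringInspector, UnrecognizedStringInspector, pickConverterBuggy, and 
 so on) is a hypothetical stand-in, not one of Hive's classes, and the switch 
 only imitates the shape of the converter lookup as this description 
 characterizes it. It is not the actual ObjectInspectorConverters code.

```java
public class FallThroughDemo {
    // Hypothetical stand-ins for Hive's object inspector types (assumptions, not the real API).
    interface Inspector {}
    interface SettableStringInspector extends Inspector {}
    interface SettableTimestampInspector extends Inspector {}

    // Plays the role of a string inspector that the STRING case does not recognize.
    static class UnrecognizedStringInspector implements Inspector {}

    // Plays the role of a string inspector that the STRING case does recognize.
    static class RecognizedStringInspector implements SettableStringInspector {}

    enum Category { STRING, TIMESTAMP }

    // Buggy shape: the STRING case returns only for recognized inspectors and
    // otherwise falls through into the TIMESTAMP case.
    static String pickConverterBuggy(Category category, Inspector output) {
        switch (category) {
            case STRING:
                if (output instanceof SettableStringInspector) {
                    return "string converter";
                }
                // Missing break or throw: control falls through to TIMESTAMP.
            case TIMESTAMP:
                // For an unrecognized string inspector this cast fails, so the
                // error names a timestamp type the caller never asked for.
                SettableTimestampInspector ts = (SettableTimestampInspector) output;
                return "timestamp converter for " + ts.getClass().getSimpleName();
            default:
                throw new IllegalArgumentException("Unsupported category: " + category);
        }
    }

    // Runs the lookup and reports whether it ended in a ClassCastException.
    static String describe(Category category, Inspector output) {
        try {
            return pickConverterBuggy(category, output);
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Category.STRING, new UnrecognizedStringInspector()));
        System.out.println(describe(Category.STRING, new RecognizedStringInspector()));
    }
}
```

 In this sketch, making the string inspector implement the recognized 
 interface corresponds to the proposed fix of subclassing 
 JavaStringObjectInspector: the STRING case then returns before control can 
 reach the TIMESTAMP cast.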



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-15 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970121#comment-13970121
 ] 

Szehon Ho commented on HIVE-6785:
-

+1 (non-binding) +  [~brocknoland] 



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-11 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966355#comment-13966355
 ] 

Szehon Ho commented on HIVE-6785:
-

Hi Tongjie, the parquet.hive.* class names are deprecated now and will be 
removed. See the discussion on HIVE-6757 for the current state.

Use 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe',
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat',
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967047#comment-13967047
 ] 

Hive QA commented on HIVE-6785:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12639834/HIVE-6785.3.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5615 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority2
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2221/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2221/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12639834



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-11 Thread Tongjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967069#comment-13967069
 ] 

Tongjie Chen commented on HIVE-6785:


[~szehon], are those failures transient? The new patch only changes the Parquet 
class, which has nothing to do with these test cases.



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-11 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967087#comment-13967087
 ] 

Szehon Ho commented on HIVE-6785:
-

Yeah, they don't look related to this patch.



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-11 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967259#comment-13967259
 ] 

Szehon Ho commented on HIVE-6785:
-

I looked at the new patch; it looks fine to me.



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-10 Thread Tongjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966093#comment-13966093
 ] 

Tongjie Chen commented on HIVE-6785:


Hi [~brocknoland], 

are you suggesting using:

ALTER TABLE parquet_mixed_fileformat SET FILEFORMAT PARQUET;
(After I execute this statement, the table can no longer be found.)

instead of:

ALTER TABLE parquet_mixed_fileformat SET SERDE 
'parquet.hive.serde.ParquetHiveSerDe';
ALTER TABLE parquet_mixed_fileformat
 SET FILEFORMAT
 INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
 OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

Please advise.



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-07 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962473#comment-13962473
 ] 

Szehon Ho commented on HIVE-6785:
-

+1 (non-binding), thanks for adding the q-test and addressing the comments.

FYI [~brocknoland]



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-07 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962482#comment-13962482
 ] 

Brock Noland commented on HIVE-6785:


Hi,

LGTM except I see we are using the parquet... class names when creating a 
table, which are soon to be removed.



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-07 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962493#comment-13962493
 ] 

Szehon Ho commented on HIVE-6785:
-

Good catch Brock, I missed that.



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961491#comment-13961491
 ] 

Hive QA commented on HIVE-6785:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12638885/HIVE-6785.2.patch.txt

{color:green}SUCCESS:{color} +1 5549 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2153/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2153/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12638885

 query fails when partitioned table's table level serde is ParquetHiveSerDe 
 and partition level serde is of different SerDe
 --

 Key: HIVE-6785
 URL: https://issues.apache.org/jira/browse/HIVE-6785
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Tongjie Chen
 Fix For: 0.14.0

 Attachments: HIVE-6785.1.patch.txt, HIVE-6785.2.patch.txt




[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-05 Thread Tongjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961231#comment-13961231
 ] 

Tongjie Chen commented on HIVE-6785:


When I added a qtest, I realized that this bug is resolved with this patch in 
hive-trunk. But it is still a bug in Hive 0.11 now.

Digging a little, I found that when the partition SerDe and table SerDe are 
different, hive 0.11 tries to convert the object inspectors whenever they are 
not equal; however, in hive-trunk (0.13 or 0.14), if all fields of the output 
ObjectInspector are settable, no conversion happens, hence the bug presented 
in this jira no longer shows up in hive-trunk.

However, I do think that ParquetStringInspector should be a subclass of 
JavaStringObjectInspector, so that Hive 0.11 would have no problem as well.

related Hive Jiras:

HIVE-5202
HIVE-5394

--- HIVE-trunk (0.13, 0.14 etc) code snippet for ObjectInspectorConverters ---
// 1. If equalsCheck is true and the inputOI is the same as the outputOI OR
// 2. If the outputOI has all fields settable, return it
if ((equalsCheck && inputOI.equals(outputOI)) ||
    ObjectInspectorUtils.hasAllFieldsSettable(outputOI, oiSettableProperties) == true) {
  return outputOI;
}

--- HIVE-0.11 code snippet for ObjectInspectorConverters ---
// If the inputOI is the same as the outputOI, just return it
if (inputOI.equals(outputOI)) {
  return outputOI;
}
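
To make the difference between the two snippets concrete, here is a small self-contained sketch. It is only an illustration under assumptions: ConversionCheckDemo, its Inspector class, and the needsConversion011 / needsConversionTrunk methods are hypothetical names, and the allFieldsSettable flag stands in for the result of ObjectInspectorUtils.hasAllFieldsSettable.

```java
public class ConversionCheckDemo {
    // Hypothetical stand-in for an ObjectInspector; equals() is identity-based.
    public static final class Inspector {
        final String name;
        final boolean allFieldsSettable;

        public Inspector(String name, boolean allFieldsSettable) {
            this.name = name;
            this.allFieldsSettable = allFieldsSettable;
        }
    }

    // Hive 0.11 check as quoted: conversion is skipped only for equal inspectors.
    public static boolean needsConversion011(Inspector inputOI, Inspector outputOI) {
        return !inputOI.equals(outputOI);
    }

    // Trunk check as quoted: conversion is also skipped when the output
    // inspector has all fields settable.
    public static boolean needsConversionTrunk(Inspector inputOI, Inspector outputOI,
                                               boolean equalsCheck) {
        if ((equalsCheck && inputOI.equals(outputOI)) || outputOI.allFieldsSettable) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Inspector partitionOI = new Inspector("partition (other SerDe)", true);
        Inspector tableOI = new Inspector("table (ParquetHiveSerDe)", true);
        System.out.println("0.11 converts:  " + needsConversion011(partitionOI, tableOI));
        System.out.println("trunk converts: " + needsConversionTrunk(partitionOI, tableOI, true));
    }
}
```

With two different inspectors whose output side is fully settable, the 0.11-style check goes on to conversion while the trunk-style check returns early, which is consistent with the bug showing up on 0.11 but not on trunk.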



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-05 Thread Tongjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961240#comment-13961240
 ] 

Tongjie Chen commented on HIVE-6785:


In my previous comment, I meant that this bug is not reproducible in hive trunk 
due to the patches from HIVE-5202 and HIVE-5394.

But the patch introduced in this Jira is still an enhancement.

Btw, how can I edit my own previous comment once submitted?



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-02 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13957453#comment-13957453
 ] 

Szehon Ho commented on HIVE-6785:
-

You shouldn't need to use svn for that; you can try 'git add' / 'git rm'.

I left a couple of comments on the rb for consideration.  Also, when you are 
ready, click 'submit patch' to trigger the pre-commit test.

 query fails when partitioned table's table level serde is ParquetHiveSerDe 
 and partition level serde is of different SerDe
 --

 Key: HIVE-6785
 URL: https://issues.apache.org/jira/browse/HIVE-6785
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Tongjie Chen
 Attachments: HIVE-6785.1.patch.txt


 More specifically, if the table contains string type columns, it will result 
 in the following exception:
 Failed with exception java.io.IOException:java.lang.ClassCastException: 
 parquet.hive.serde.primitive.ParquetStringInspector cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableTimestampObjectInspector
 See also the following parquet issue:
 https://github.com/Parquet/parquet-mr/issues/324





[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-02 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13957992#comment-13957992
 ] 

Szehon Ho commented on HIVE-6785:
-

In my opinion, it is better to change JavaStringObjectInspector's constructor 
to protected, so we can keep all the parquet inspectors in the same package.

Also, it would be good to add a q-test for this case. You can write one 
following the example of parquet_create.q and then generate and verify the 
result using the following command (maven version) on your q-test file: 
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-HowdoupdatetheoutputofaCliDrivertestcase?



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-02 Thread Tongjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13958010#comment-13958010
 ] 

Tongjie Chen commented on HIVE-6785:


If we change JavaStringObjectInspector's constructor to be protected, that 
should work fine for 0.13. But when we backport this jira to parquet-hive, it 
will break unless we also make the same change in hive 0.12 and hive 0.10, 
right?




[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-02 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13958043#comment-13958043
 ] 

Szehon Ho commented on HIVE-6785:
-

Yeah, that's right. Do you think it would be a huge issue, though, if it's a 
hive 0.13-only fix?



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-02 Thread Tongjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13958054#comment-13958054
 ] 

Tongjie Chen commented on HIVE-6785:


The only issue I see is that parquet-hive cannot backport this jira.



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-02 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13958074#comment-13958074
 ] 

Szehon Ho commented on HIVE-6785:
-

OK, I'm not a huge fan of moving that inspector to an unnatural place, because 
it will be stuck like that going forward in hive, but we can let others chime 
in as well.

If it's really important to support earlier hive versions, maybe one option is 
to back-port a different version of the patch into parquet?



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-02 Thread Tongjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13958081#comment-13958081
 ] 

Tongjie Chen commented on HIVE-6785:


If we can be flexible by back-porting a different version of the patch into 
parquet-hive, that would be great!

I would also like to keep parquet-related code in the parquet package.



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-04-01 Thread Tongjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13957258#comment-13957258
 ] 

Tongjie Chen commented on HIVE-6785:


This patch involves deleting a file and adding new files (mv), and there are 
no instructions for deleting/adding files with git in 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute. My patch 
was generated with git diff; if that does not work, I will resubmit a patch 
using svn.

https://reviews.apache.org/r/19896/



[jira] [Commented] (HIVE-6785) query fails when partitioned table's table level serde is ParquetHiveSerDe and partition level serde is of different SerDe

2014-03-29 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954291#comment-13954291
 ] 

Brock Noland commented on HIVE-6785:


FYI [~jcoffey] [~xuefuz] [~szehon]
