[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is arraystring with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish updated HIVE-7850: -- Attachment: HIVE-7850.2.patch New patch submitted based on comments and suggestions from ryan. Hive Query failed if the data type is arraystring with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.14.0, 0.13.1 Reporter: Sathish Assignee: Sathish Labels: parquet, serde Fix For: 0.14.0 Attachments: HIVE-7850.1.patch, HIVE-7850.2.patch, HIVE-7850.patch * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is arraystring with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish updated HIVE-7850: -- Attachment: HIVE-7850.1.patch New patch file submitted by correcting indentations. Hive Query failed if the data type is arraystring with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.14.0, 0.13.1 Reporter: Sathish Assignee: Sathish Labels: parquet, serde Fix For: 0.14.0 Attachments: HIVE-7850.1.patch, HIVE-7850.patch * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is arraystring with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish updated HIVE-7850: -- Description: * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. was: * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at
[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is arraystring with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish updated HIVE-7850: -- Fix Version/s: 0.14.0 Status: Patch Available (was: Open) Hive Query failed if the data type is arraystring with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1, 0.14.0 Reporter: Sathish Labels: parquet, serde Fix For: 0.14.0 * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is arraystring with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish updated HIVE-7850: -- Status: Open (was: Patch Available) Hive Query failed if the data type is arraystring with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1, 0.14.0 Reporter: Sathish Labels: parquet, serde Fix For: 0.14.0 * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is arraystring with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish updated HIVE-7850: -- Attachment: HIVE-7850.patch This patch fixes this issue,Since this feature we want to use in the next release of Hive. Requesting someone to look into this patch changes and merge to the main branch. Hive Query failed if the data type is arraystring with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.14.0, 0.13.1 Reporter: Sathish Labels: parquet, serde Fix For: 0.14.0 Attachments: HIVE-7850.patch * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is arraystring with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish updated HIVE-7850: -- Status: Patch Available (was: Open) Hive Query failed if the data type is arraystring with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1, 0.14.0 Reporter: Sathish Labels: parquet, serde Fix For: 0.14.0 Attachments: HIVE-7850.patch * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is arraystring with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7850: --- Assignee: Sathish Hive Query failed if the data type is arraystring with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.14.0, 0.13.1 Reporter: Sathish Assignee: Sathish Labels: parquet, serde Fix For: 0.14.0 Attachments: HIVE-7850.patch * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)