[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-26 Thread Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish updated HIVE-7850:
--

Attachment: HIVE-7850.2.patch

New patch submitted based on comments and suggestions from Ryan.

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.14.0, 0.13.1
Reporter: Sathish
Assignee: Sathish
  Labels: parquet, serde
 Fix For: 0.14.0

 Attachments: HIVE-7850.1.patch, HIVE-7850.2.patch, HIVE-7850.patch


 * Created a Parquet file from an Avro file that has one array data type while
 the rest are primitive types (see the writer sketch after the description).
 Avro schema of the array data type, e.g.:
 {code}
 { "name": "action", "type": [ { "type": "array", "items": "string" }, "null" ] }
 {code}
 * Created an external Hive table with the array type as below:
 {code}
 create external table paraArray (action Array<string>) partitioned by (partitionid int)
 row format serde 'parquet.hive.serde.ParquetHiveSerDe'
 stored as inputformat 'parquet.hive.MapredParquetInputFormat'
 outputformat 'parquet.hive.MapredParquetOutputFormat'
 location '/testPara';
 alter table paraArray add partition(partitionid=1) location '/testPara';
 {code}
 * Ran the following query (select action from paraArray limit 10); the
 map-reduce jobs fail with the exception below (a sketch of the failing cast
 follows the trace).
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row [Error getting row data with exception 
 java.lang.ClassCastException: 
 parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
 org.apache.hadoop.io.ArrayWritable
 at 
 parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ]
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 {code}
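 The trace points at the unchecked cast inside
 parquet.hive.serde.ParquetHiveArrayInspector.getList, which assumes every array
 column arrives as an org.apache.hadoop.io.ArrayWritable. The snippet below is a
 minimal sketch for illustration only (class, method, and variable names are
 assumptions, not the actual Parquet Hive source or the HIVE-7850 patch): it
 shows the failing shape and one defensive variant that checks the runtime type
 before casting.
 {code}
 // Illustrative sketch, not the actual Hive/Parquet source or the HIVE-7850 patch.
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;

 import org.apache.hadoop.io.ArrayWritable;
 import org.apache.hadoop.io.Writable;

 public class ArrayInspectorSketch {

   // Shape of the failing call: an unchecked cast that throws ClassCastException
   // when the value arrives as a single (dictionary-encoded) binary writable.
   public List<Writable> getListUnsafe(Object data) {
     ArrayWritable array = (ArrayWritable) data; // <- the cast seen in the trace
     return Arrays.asList(array.get());
   }

   // Defensive variant: verify the runtime type first and wrap a lone element,
   // so a scalar-shaped value no longer kills the mapper.
   public List<Writable> getListSafe(Object data) {
     if (data == null) {
       return null;
     }
     if (data instanceof ArrayWritable) {
       Writable[] elements = ((ArrayWritable) data).get();
       return elements == null ? null : new ArrayList<Writable>(Arrays.asList(elements));
     }
     if (data instanceof Writable) {
       return new ArrayList<Writable>(Arrays.asList((Writable) data));
     }
     throw new UnsupportedOperationException(
         "Cannot inspect " + data.getClass().getCanonicalName());
   }
 }
 {code}
 This only illustrates the failure mode; the attached HIVE-7850 patches contain
 the actual fix in the Parquet Hive SerDe.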
 This issue was posted on the Parquet issues list a while back. Since it is
 related to the Parquet Hive SerDe, I have created this Hive issue; the details
 and history are available at https://github.com/Parquet/parquet-mr/issues/281.
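 For reference, a Parquet file with the shape described in the first step can be
 produced with the parquet-avro writer of that era. The following is a minimal
 sketch under stated assumptions: the enclosing record name, the extra id field,
 and the output path are illustrative, and the pre-Apache parquet.avro package
 name is assumed.
 {code}
 // Illustrative sketch only; record name, id field, and output path are assumptions.
 import java.io.IOException;
 import java.util.Arrays;

 import org.apache.avro.Schema;
 import org.apache.avro.generic.GenericData;
 import org.apache.avro.generic.GenericRecord;
 import org.apache.hadoop.fs.Path;

 import parquet.avro.AvroParquetWriter;

 public class WriteActionParquet {
   public static void main(String[] args) throws IOException {
     // Record schema embedding the array<string>-or-null "action" field from the report.
     String schemaJson =
         "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
       + "{\"name\":\"id\",\"type\":\"int\"},"
       + "{\"name\":\"action\",\"type\":[{\"type\":\"array\",\"items\":\"string\"},\"null\"]}"
       + "]}";
     Schema schema = new Schema.Parser().parse(schemaJson);

     // Write one record into the location used by the partition added above.
     AvroParquetWriter<GenericRecord> writer =
         new AvroParquetWriter<GenericRecord>(new Path("/testPara/part-00000.parquet"), schema);
     try {
       GenericRecord rec = new GenericData.Record(schema);
       rec.put("id", 1);
       rec.put("action", Arrays.asList("click", "view")); // array<string> branch of the union
       writer.write(rec);
     } finally {
       writer.close();
     }
   }
 }
 {code}
 Selecting the action column through the table above is, per the report, what
 triggers the ClassCastException shown in the trace.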



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-25 Thread Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish updated HIVE-7850:
--

Attachment: HIVE-7850.1.patch

New patch file submitted with corrected indentation.

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.14.0, 0.13.1
Reporter: Sathish
Assignee: Sathish
  Labels: parquet, serde
 Fix For: 0.14.0

 Attachments: HIVE-7850.1.patch, HIVE-7850.patch


 * Created a Parquet file from an Avro file that has one array data type while
 the rest are primitive types. Avro schema of the array data type, e.g.:
 {code}
 { "name": "action", "type": [ { "type": "array", "items": "string" }, "null" ] }
 {code}
 * Created an external Hive table with the array type as below:
 {code}
 create external table paraArray (action Array<string>) partitioned by (partitionid int)
 row format serde 'parquet.hive.serde.ParquetHiveSerDe'
 stored as inputformat 'parquet.hive.MapredParquetInputFormat'
 outputformat 'parquet.hive.MapredParquetOutputFormat'
 location '/testPara';
 alter table paraArray add partition(partitionid=1) location '/testPara';
 {code}
 * Ran the following query (select action from paraArray limit 10); the
 map-reduce jobs fail with the following exception:
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row [Error getting row data with exception 
 java.lang.ClassCastException: 
 parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
 org.apache.hadoop.io.ArrayWritable
 at 
 parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ]
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 {code}
 This issue was posted on the Parquet issues list a while back. Since it is
 related to the Parquet Hive SerDe, I have created this Hive issue; the details
 and history are available at https://github.com/Parquet/parquet-mr/issues/281.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-22 Thread Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish updated HIVE-7850:
--

Description: 
* Created a Parquet file from an Avro file that has one array data type while
the rest are primitive types. Avro schema of the array data type, e.g.:
{code}
{ "name": "action", "type": [ { "type": "array", "items": "string" }, "null" ] }
{code}
* Created an external Hive table with the array type as below:
{code}
create external table paraArray (action Array<string>) partitioned by (partitionid int)
row format serde 'parquet.hive.serde.ParquetHiveSerDe'
stored as inputformat 'parquet.hive.MapredParquetInputFormat'
outputformat 'parquet.hive.MapredParquetOutputFormat'
location '/testPara';
alter table paraArray add partition(partitionid=1) location '/testPara';
{code}
* Ran the following query (select action from paraArray limit 10); the
map-reduce jobs fail with the following exception:
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row [Error getting row data with exception 
java.lang.ClassCastException: 
parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
org.apache.hadoop.io.ArrayWritable
at 
parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
]
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
... 8 more
{code}


This issue was posted on the Parquet issues list a while back. Since it is
related to the Parquet Hive SerDe, I have created this Hive issue; the details
and history are available at https://github.com/Parquet/parquet-mr/issues/281.


[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-22 Thread Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish updated HIVE-7850:
--

Fix Version/s: 0.14.0
   Status: Patch Available  (was: Open)

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1, 0.14.0
Reporter: Sathish
  Labels: parquet, serde
 Fix For: 0.14.0


 * Created a Parquet file from an Avro file that has one array data type while
 the rest are primitive types. Avro schema of the array data type, e.g.:
 {code}
 { "name": "action", "type": [ { "type": "array", "items": "string" }, "null" ] }
 {code}
 * Created an external Hive table with the array type as below:
 {code}
 create external table paraArray (action Array<string>) partitioned by (partitionid int)
 row format serde 'parquet.hive.serde.ParquetHiveSerDe'
 stored as inputformat 'parquet.hive.MapredParquetInputFormat'
 outputformat 'parquet.hive.MapredParquetOutputFormat'
 location '/testPara';
 alter table paraArray add partition(partitionid=1) location '/testPara';
 {code}
 * Ran the following query (select action from paraArray limit 10); the
 map-reduce jobs fail with the following exception:
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row [Error getting row data with exception 
 java.lang.ClassCastException: 
 parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
 org.apache.hadoop.io.ArrayWritable
 at 
 parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ]
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 {code}
 This issue was posted on the Parquet issues list a while back. Since it is
 related to the Parquet Hive SerDe, I have created this Hive issue; the details
 and history are available at https://github.com/Parquet/parquet-mr/issues/281.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-22 Thread Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish updated HIVE-7850:
--

Status: Open  (was: Patch Available)

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1, 0.14.0
Reporter: Sathish
  Labels: parquet, serde
 Fix For: 0.14.0


 * Created a Parquet file from an Avro file that has one array data type while
 the rest are primitive types. Avro schema of the array data type, e.g.:
 {code}
 { "name": "action", "type": [ { "type": "array", "items": "string" }, "null" ] }
 {code}
 * Created an external Hive table with the array type as below:
 {code}
 create external table paraArray (action Array<string>) partitioned by (partitionid int)
 row format serde 'parquet.hive.serde.ParquetHiveSerDe'
 stored as inputformat 'parquet.hive.MapredParquetInputFormat'
 outputformat 'parquet.hive.MapredParquetOutputFormat'
 location '/testPara';
 alter table paraArray add partition(partitionid=1) location '/testPara';
 {code}
 * Ran the following query (select action from paraArray limit 10); the
 map-reduce jobs fail with the following exception:
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row [Error getting row data with exception 
 java.lang.ClassCastException: 
 parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
 org.apache.hadoop.io.ArrayWritable
 at 
 parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ]
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 {code}
 This issue was posted on the Parquet issues list a while back. Since it is
 related to the Parquet Hive SerDe, I have created this Hive issue; the details
 and history are available at https://github.com/Parquet/parquet-mr/issues/281.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-22 Thread Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish updated HIVE-7850:
--

Attachment: HIVE-7850.patch

This patch fixes the issue. Since we want to use this feature in the next 
release of Hive, requesting someone to review the patch changes and merge them 
to the main branch.

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.14.0, 0.13.1
Reporter: Sathish
  Labels: parquet, serde
 Fix For: 0.14.0

 Attachments: HIVE-7850.patch


 * Created a Parquet file from an Avro file that has one array data type while
 the rest are primitive types. Avro schema of the array data type, e.g.:
 {code}
 { "name": "action", "type": [ { "type": "array", "items": "string" }, "null" ] }
 {code}
 * Created an external Hive table with the array type as below:
 {code}
 create external table paraArray (action Array<string>) partitioned by (partitionid int)
 row format serde 'parquet.hive.serde.ParquetHiveSerDe'
 stored as inputformat 'parquet.hive.MapredParquetInputFormat'
 outputformat 'parquet.hive.MapredParquetOutputFormat'
 location '/testPara';
 alter table paraArray add partition(partitionid=1) location '/testPara';
 {code}
 * Ran the following query (select action from paraArray limit 10); the
 map-reduce jobs fail with the following exception:
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row [Error getting row data with exception 
 java.lang.ClassCastException: 
 parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
 org.apache.hadoop.io.ArrayWritable
 at 
 parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ]
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 {code}
 This issue was posted on the Parquet issues list a while back. Since it is
 related to the Parquet Hive SerDe, I have created this Hive issue; the details
 and history are available at https://github.com/Parquet/parquet-mr/issues/281.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-22 Thread Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish updated HIVE-7850:
--

Status: Patch Available  (was: Open)

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1, 0.14.0
Reporter: Sathish
  Labels: parquet, serde
 Fix For: 0.14.0

 Attachments: HIVE-7850.patch


 * Created a Parquet file from an Avro file that has one array data type while
 the rest are primitive types. Avro schema of the array data type, e.g.:
 {code}
 { "name": "action", "type": [ { "type": "array", "items": "string" }, "null" ] }
 {code}
 * Created an external Hive table with the array type as below:
 {code}
 create external table paraArray (action Array<string>) partitioned by (partitionid int)
 row format serde 'parquet.hive.serde.ParquetHiveSerDe'
 stored as inputformat 'parquet.hive.MapredParquetInputFormat'
 outputformat 'parquet.hive.MapredParquetOutputFormat'
 location '/testPara';
 alter table paraArray add partition(partitionid=1) location '/testPara';
 {code}
 * Ran the following query (select action from paraArray limit 10); the
 map-reduce jobs fail with the following exception:
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row [Error getting row data with exception 
 java.lang.ClassCastException: 
 parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
 org.apache.hadoop.io.ArrayWritable
 at 
 parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ]
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 {code}
 This issue was posted on the Parquet issues list a while back. Since it is
 related to the Parquet Hive SerDe, I have created this Hive issue; the details
 and history are available at https://github.com/Parquet/parquet-mr/issues/281.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-22 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7850:
---

Assignee: Sathish

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.14.0, 0.13.1
Reporter: Sathish
Assignee: Sathish
  Labels: parquet, serde
 Fix For: 0.14.0

 Attachments: HIVE-7850.patch


 * Created a Parquet file from an Avro file that has one array data type while
 the rest are primitive types. Avro schema of the array data type, e.g.:
 {code}
 { "name": "action", "type": [ { "type": "array", "items": "string" }, "null" ] }
 {code}
 * Created an external Hive table with the array type as below:
 {code}
 create external table paraArray (action Array<string>) partitioned by (partitionid int)
 row format serde 'parquet.hive.serde.ParquetHiveSerDe'
 stored as inputformat 'parquet.hive.MapredParquetInputFormat'
 outputformat 'parquet.hive.MapredParquetOutputFormat'
 location '/testPara';
 alter table paraArray add partition(partitionid=1) location '/testPara';
 {code}
 * Ran the following query (select action from paraArray limit 10); the
 map-reduce jobs fail with the following exception:
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row [Error getting row data with exception 
 java.lang.ClassCastException: 
 parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
 org.apache.hadoop.io.ArrayWritable
 at 
 parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ]
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 {code}
 This issue was posted on the Parquet issues list a while back. Since it is
 related to the Parquet Hive SerDe, I have created this Hive issue; the details
 and history are available at https://github.com/Parquet/parquet-mr/issues/281.



--
This message was sent by Atlassian JIRA
(v6.2#6252)