Hi Ashutosh,

Thanks for your reply.

Not sure if HIVE-9324 is the same issue we hit.
We found it to be a bug in CDH when using MR1 with Hive 0.13.1. The bug
does not show up when running on YARN with 0.13.1.



Guodong

On Fri, Jan 16, 2015 at 1:21 AM, Ashutosh Chauhan <hashut...@apache.org>
wrote:

> Seems like you are hitting:
> https://issues.apache.org/jira/browse/HIVE-9324
>
> On Thu, Jan 15, 2015 at 1:53 AM, Guodong Wang <wangg...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using hive 0.13.1 and currently I am blocked by a bug when joining 2
>> tables. Here is the sample query.
>>
>> INSERT OVERWRITE TABLE test_archive PARTITION(data='2015-01-17', name,
>> type)
>> SELECT COALESCE(b.resource_id, a.id) AS id,
>>        a.timestamp,
>>        a.payload,
>>        a.name,
>>        a.type
>> FROM test_data a LEFT OUTER JOIN id_mapping b on a.id = b.id
>> WHERE a.date='2015-01-17'
>>     AND a.name IN ('a', 'b', 'c')
>>     AND a.type <= 14;
>>
>> It turns out that when there are more than 25000 joined rows for a specific
>> id, the Hive MR job fails, throwing a NegativeArraySizeException.
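>>
>> (As a rough illustration only: a query along these lines can confirm which
>> ids exceed that threshold. It is not taken from our actual job; the table
>> and column names simply mirror the sample query above.)
>>
>> -- illustrative only: list join keys that produce more than 25000 output rows
>> SELECT a.id, COUNT(*) AS joined_rows
>> FROM test_data a LEFT OUTER JOIN id_mapping b ON a.id = b.id
>> WHERE a.date = '2015-01-17'
>>     AND a.name IN ('a', 'b', 'c')
>>     AND a.type <= 14
>> GROUP BY a.id
>> HAVING COUNT(*) > 25000;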
>>
>> Here is the stack trace
>>
>> 2015-01-15 14:38:42,693 ERROR 
>> org.apache.hadoop.hive.ql.exec.persistence.RowContainer:
>> java.lang.NegativeArraySizeException
>>      at 
>> org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
>>      at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
>>      at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
>>      at 
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>>      at 
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>>      at 
>> org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2244)
>>      at 
>> org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2228)
>>      at 
>> org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>>      at 
>> org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>>      at 
>> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
>>      at 
>> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
>>      at 
>> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>>      at 
>> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
>>      at 
>> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>>      at 
>> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>>      at 
>> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>>      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>>      at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>      at java.security.AccessController.doPrivileged(Native Method)
>>      at javax.security.auth.Subject.doAs(Subject.java:415)
>>      at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>>      at org.apache.hadoop.mapred.Child.main(Child.java:262)
>> 2015-01-15 14:38:42,707 FATAL ExecReducer: 
>> org.apache.hadoop.hive.ql.metadata.HiveException: 
>> org.apache.hadoop.hive.ql.metadata.HiveException: 
>> java.lang.NegativeArraySizeException
>>      at 
>> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
>>      at 
>> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>>      at 
>> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
>>      at 
>> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>>      at 
>> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>>      at 
>> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>>      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>>      at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>      at java.security.AccessController.doPrivileged(Native Method)
>>      at javax.security.auth.Subject.doAs(Subject.java:415)
>>      at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>>      at org.apache.hadoop.mapred.Child.main(Child.java:262)
>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
>> java.lang.NegativeArraySizeException
>>      at 
>> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:385)
>>      at 
>> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
>>      ... 11 more
>> Caused by: java.lang.NegativeArraySizeException
>>      at 
>> org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
>>      at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
>>      at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
>>      at 
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>>      at 
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>>      at 
>> org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2244)
>>      at 
>> org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2228)
>>      at 
>> org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>>      at 
>> org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>>      at 
>> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
>>      ... 12 more
>>
>>
>> I found that when the exception is thrown, there is a log entry like this:
>>
>> 2015-01-15 14:38:42,045 INFO 
>> org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer 
>> created temp file 
>> /local/data0/mapred/taskTracker/ubuntu/jobcache/job_201412171918_0957/attempt_201412171918_0957_r_000000_0/work/tmp/hive-rowcontainer5023288010679723993/RowContainer5093924743042924240.tmp
>>
>>
>> It looks like when RowContainer collects more than 25000 row records,
>> it flushes the block out to local disk, but then it fails to read
>> those blocks back.
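>>
>> If that 25000-row block size is hive.join.cache.size (whose default is
>> 25000), the failure starts exactly when the join has to spill its first
>> block to disk. As a sketch only, under that assumption, raising the value
>> postpones the spill for moderately skewed keys; the number below is just
>> an example, not a recommendation:
>>
>> -- assumption: hive.join.cache.size (default 25000) sets the RowContainer
>> -- block size for the join; 100000 is only an illustrative value
>> SET hive.join.cache.size=100000;
>> -- then re-run the INSERT OVERWRITE ... SELECT above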
>>
>> Any help is really appreciated!
>>
>>
>>
>> Guodong
>>
>
>
