Further findings:
Works:  select deviceid from (select deviceid from test1) t1 left outer join
(select deviceid from test2) t2 on t1.deviceid=t2.deviceid;
Failed:  select deviceid from (select distinct deviceid from test1) t1 left
outer join (select deviceid from test2) t2 on t1.deviceid=t2.deviceid;

Difference: the second query runs as two MR jobs; the first computes the
distinct deviceids from test1, and the second joins that result with test2.
Job 1's output files are SequenceFiles, but the map tasks of job 2 that read
job 1's output also get paths from test2's files in their combined split,
while the split carries only one input format,
"SequenceFileInputFormat".
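To confirm which files are really SequenceFiles, the header can be checked directly: a SequenceFile always starts with the 3-byte magic "SEQ" plus a version byte, and that is exactly the check behind the "not a SequenceFile" IOException. A minimal sketch (paths are placeholders; run it against files copied out of HDFS with `hdfs dfs -get`):

```python
import sys

# A SequenceFile begins with the 3-byte magic "SEQ" followed by a
# version byte; SequenceFile.Reader.init throws "not a SequenceFile"
# when this header is missing. Checking the header shows which files
# in a combined split are actually SequenceFiles.
def is_sequence_file(path):
    with open(path, "rb") as f:
        return f.read(3) == b"SEQ"

if __name__ == "__main__":
    # File paths are supplied on the command line.
    for p in sys.argv[1:]:
        print(p, "SequenceFile" if is_sequence_file(p) else "not a SequenceFile")
```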

So, is there any change between Hadoop 2.0 and 2.4 in combine-split handling
that could cause this?
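One experiment that might narrow it down (an assumption on my side, not something I have verified on Hive 0.9): take CombineHiveInputFormat out of the picture, so that every split maps to exactly one table's storage format:

```sql
-- Possible workaround to test: fall back to the non-combining input
-- format so each split carries a single input format.
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
```

If the failing query succeeds with this set, that would point at how the combined splits are grouped rather than at the table metadata.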

Thanks for help.


From: Peng <[email protected]>
Date: Wed, 15 Oct 2014 23:17:51 +0800
To: <[email protected]>
Subject: Textfile table but some map task try to use SequenceFile reader

We upgraded from hadoop-2.0.0 to hadoop-2.4.0 without upgrading Hive; we are
still on Hive 0.9 (not recompiled against hadoop-2.4).

Normal queries, like counts and UDFs, work well, but some queries with JOINs
fail. I found that some map tasks fail because Hive gets the input type wrong:
the input table is stored as textfile, but during the scan some map splits are
assigned SequenceFileInputFormat while others get TextInputFormat.

I know Hive 0.9 is very old, but I can't figure out what difference between
Hadoop 2.0 and 2.4 causes this weird result.

Thanks for any help.
The failed map task log is below:


2014-10-15 22:44:41,320 INFO [main] 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 5 finished. closing...
2014-10-15 22:44:41,320 INFO [main] 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 5 forwarded 0 rows
2014-10-15 22:44:41,320 INFO [main] 
org.apache.hadoop.hive.ql.exec.SelectOperator: 4 Close done
2014-10-15 22:44:41,320 INFO [main] 
org.apache.hadoop.hive.ql.exec.FilterOperator: 3 Close done
2014-10-15 22:44:41,320 INFO [main] 
org.apache.hadoop.hive.ql.exec.TableScanOperator: 2 Close done
2014-10-15 22:44:41,321 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: 
DESERIALIZE_ERRORS:0
2014-10-15 22:44:41,321 INFO [main] 
org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2014-10-15 22:44:41,321 INFO [main] 
org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 99481 rows
2014-10-15 22:44:41,321 INFO [main] 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 1 finished. closing...
2014-10-15 22:44:41,321 INFO [main] 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 1 forwarded 0 rows
2014-10-15 22:44:41,321 INFO [main] 
org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2014-10-15 22:44:41,321 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: 
10 Close done
2014-10-15 22:44:41,321 INFO [main] ExecMapper: ExecMapper: processed 99481 
rows: used memory = 187570680
2014-10-15 22:44:41,327 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.io.IOException: 
java.lang.reflect.InvocationTargetException
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:350)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:229)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1589)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:336)
... 11 more
Caused by: java.io.IOException: 
hdfs://cluster/new_user/createdate=2013-05-21/2013-05-21_204 not a SequenceFile
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1854)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1814)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1763)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1777)
at 
org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:51)
at 
org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
... 16 more
