Hi,
I am running a MapReduce job written in Pig Latin on Cloudera, and one of
the map tasks failed with the following exception:
014-04-15 11:30:39,532 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.DataBag
at org.apache.pig.builtin.Distinct.getDistinctFromNestedBags(Distinct.java:140)
at org.apache.pig.builtin.Distinct.access$100(Distinct.java:39)
at org.apache.pig.builtin.Distinct$Intermediate.exec(Distinct.java:101)
at org.apache.pig.builtin.Distinct$Intermediate.exec(Distinct.java:94)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:376)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:354)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:308)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:263)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:220)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:210)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:185)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1477)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1587)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1199)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:609)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:675)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Looking at the stack trace, I think the exception is thrown at line 140 of:
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.pig/pig/0.11.0-cdh4.3.1/org/apache/pig/builtin/Distinct.java
However, the second retry of this map task succeeded, even though it used
exactly the same data and the same code. This really confuses me.
Does anyone have any insight into this?
Thanks,
Lei