Hi,
while running the Pig script below, I ran into the ClassCastException
named in the subject of this mail. The log shows that the error occurs
in the reduce phase, and I suspect that my UDF is involved: the
exception is thrown at GreaterThanExpr.java:106, and the only
greater-than comparison in the script is in the step that defines
"j4". BUT ...
1. My UDF returns an int (both in code and as defined by @outputSchema)
2. The returned value is compared to an int (so this should not be the
problem)
Does anybody see the reason for this behaviour, and a solution?
Best regards,
Elmar
##########################################
register 'myUDF.py' using jython as moins;
t = load 'rt_bow_notag';
rt = load 'rt_only_notag';
j = join t by $1, rt by $2;
j3 = foreach j generate $0 as ots, $1 as oauthor, $2 as omention, $3 as
otag1, $4 as otag2, $5 as ourl, $6 as ort, $7 as oown, $8 as omsg, $9 as
rtts, $10 as rtauthor, $11 as rtmention, moins.isRT($8,$17) as isrt;
j4 = filter j3 by (isrt > 0);
j5 = limit j4 5;
dump j5;
###########################################
The UDF "isRT" is defined in myUDF.py as follows:
###########################################
@outputSchema("isRT:int")
def isRT(bag1, bag2):
    # Treat a missing bag as "not a retweet".
    if bag1 is None or bag2 is None:
        return 0
    set1 = set(bag1)
    set2 = set(bag2)
    intersection_size = len(set1 & set2)
    max_size = max(len(set1), len(set2))
    # Two empty bags count as a match.
    if max_size == 0:
        return 1
    # float() forces true division; under Jython (Python 2), int / int floors.
    if float(intersection_size) / max_size > 0.5:
        return 1
    return 0
############################################
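A side note on the ratio test in isRT, independent of the exception: Jython follows Python 2 semantics, where "/" between two ints floors the result, so a ratio of two set sizes can only come out as 0 or 1. The sketch below compares the two behaviours. The helper names overlap_ratio_floor and overlap_ratio_true are hypothetical and not part of myUDF.py, "//" is used to reproduce the Python 2 result on any interpreter, and both helpers assume at least one non-empty input.

```python
def overlap_ratio_floor(bag1, bag2):
    # Hypothetical helper: "//" reproduces what "/" does for two ints in Python 2.
    set1, set2 = set(bag1), set(bag2)
    return len(set1 & set2) // max(len(set1), len(set2))


def overlap_ratio_true(bag1, bag2):
    # Hypothetical helper: float() forces true division on Python 2 and 3 alike.
    set1, set2 = set(bag1), set(bag2)
    return float(len(set1 & set2)) / max(len(set1), len(set2))


a = ["x", "y", "z"]
b = ["x", "y", "w"]  # shares 2 of its 3 distinct items with a

floored = overlap_ratio_floor(a, b)
exact = overlap_ratio_true(a, b)

# Compare both results against the 0.5 threshold used in isRT.
print("%s %s" % (floored > 0.5, exact > 0.5))
```

With flooring, the 0.5 threshold is only exceeded when the two sets are identical, which is probably stricter than intended. Either way the function returns the literals 0 or 1, so this does not by itself explain the ClassCastException.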
And finally, the stack trace looks like this:
############################################
2012-09-19 13:35:18,358 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 18% complete
2012-09-19 13:38:25,393 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
2012-09-19 13:41:33,896 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-09-19 13:41:38,422 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201209181449_0024 has failed! Stop running all dependent jobs
2012-09-19 13:41:38,422 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-09-19 13:41:38,683 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Double
    at java.lang.Double.compareTo(Double.java:49)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GreaterThanExpr.doComparison(GreaterThanExpr.java:106)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GreaterThanExpr.getNext(GreaterThanExpr.java:74)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.getNext(POLimit.java:117)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:460)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:428)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:400)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:262)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
2012-09-19 13:41:38,684 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2012-09-19 13:41:38,685 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: