Have you checked that each record in your input data has at least the number of fields you specify? Have you checked that the field separator in your data matches the default for PigPerformanceLoader (^A, I think)?
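
One rough way to check both at once, outside of Pig, is to count fields per line in the raw input. The sketch below is only an illustration: it assumes newline-delimited text records and takes ^A (\x01) as the delimiter purely on the guess above, and the function names are made up for this example. A line that does not contain the delimiter at all splits into a single field, which would be consistent with the "Index: 1, Size: 1" in the trace.

```python
import sys
from collections import Counter


def field_count_histogram(lines, delimiter="\x01"):
    """Map number-of-fields -> number-of-records for the given lines.

    The default delimiter (^A) is an assumption, not a confirmed default.
    """
    counts = Counter()
    for line in lines:
        record = line.rstrip("\r\n")
        counts[len(record.split(delimiter))] += 1
    return counts


def short_records(lines, expected_fields, delimiter="\x01"):
    """Yield (line_number, field_count) for records with too few fields."""
    for number, line in enumerate(lines, start=1):
        n_fields = len(line.rstrip("\r\n").split(delimiter))
        if n_fields < expected_fields:
            yield number, n_fields


# Only runs when an expected field count is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    expected = int(sys.argv[1])
    histogram = field_count_histogram(sys.stdin)
    for n_fields, n_records in sorted(histogram.items()):
        flag = "  <-- too few" if n_fields < expected else ""
        print(f"{n_fields} fields: {n_records} records{flag}")
```

Saved under a name of your choosing (say check_fields.py), it could be fed from something like "hadoop fs -cat <input path> | python check_fields.py <expected field count>", with both placeholders filled in for each of the three inputs. If almost every record shows up with one field, the separator is the more likely suspect; if only a handful are short, it is more likely bad records in the generated data.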


On Oct 13, 2009, at 10:28 AM, Dmitriy Ryaboy wrote:

We ran into what looks like some edge case bug in Pig, which causes it
to throw an IndexOutOfBoundsException (stack trace below).  The script
just joins two relations; it looks like our data was generated
incorrectly, and the join is empty, which may be what's causing the
failure. It also appears to happen only when at least one of the
inputs is on the large side (at least a few hundred megs).  Any ideas
on what could be happening and how to zoom in on the underlying cause?
We are running off unmodified trunk.


register datagen.jar;
E =  load 'Employee' using
org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
D =  load 'Department' using
org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
P =  load 'Project' using
org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
R1 = JOIN E by dc, D by dept_id;
R2 = JOIN R1 by E::id, P by emp_id;
store R2 into 'TestCase2Output';

R2 join fails with the stack trace below. It also fails if we
pre-calculate R1, store it, and load it directly (so, load R1, load P,
join R1 by $0, P by emp_id). We've verified that the records in R1 and
R2 have the expected fields, etc.

Stack Trace:

java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
       at java.util.ArrayList.RangeCheck(ArrayList.java:547)
       at java.util.ArrayList.get(ArrayList.java:322)
       at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:148)
       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:226)
       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:260)
       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
       at org.apache.hadoop.mapred.Child.main(Child.java:170)

