Yes and yes. In any case, the latest from SVN doesn't have this
issue. Guessing it was 921 that did it.
-D
On Tue, Oct 13, 2009 at 4:01 PM, Alan Gates wrote:
> Have you checked that each record in your input data has at least the
> number of fields you specify? Have you checked that the field separator in
> your data matches the default for PigPerformanceLoader (^A I think)?
>
> Alan.
>
> On Oct 13, 2009, at 10:28 AM, Dmitriy Ryaboy wrote:
>
>> We ran into what looks like an edge-case bug in Pig that causes it
>> to throw an IndexOutOfBoundsException (stack trace below). The script
>> just joins two relations; it looks like our data was generated
>> incorrectly and the join is empty, which may be what's causing the
>> failure. It also appears to happen only when at least one of the
>> inputs is on the large side (at least a few hundred megs). Any ideas
>> on what could be happening and how to zero in on the underlying cause?
>> We are running off unmodified trunk.
>>
>> Script:
>>
>> register datagen.jar;
>> E = load 'Employee' using
>> org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
>> (id,name,cc,dc);
>> D = load 'Department' using
>> org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
>> (dept_id,dept_nm);
>> P = load 'Project' using
>> org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
>> (id,emp_id,role);
>> R1 = JOIN E by dc, D by dept_id;
>> R2 = JOIN R1 by E::id, P by emp_id;
>> store R2 into 'TestCase2Output';
>>
>> The R2 join fails with the stack trace below. It also fails if we
>> pre-calculate R1, store it, and load it directly (so: load R1, load P,
>> join R1 by $0, P by emp_id). We've verified that the records in R1 and
>> R2 have the expected fields, etc.
>>
>>
>> Stack Trace:
>>
>> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>> at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>> at java.util.ArrayList.get(ArrayList.java:322)
>> at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:148)
>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:226)
>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:260)
>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>> at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>
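[Editor's note: the sanity check Alan suggests — verifying that every record has at least the declared number of fields, split on the ^A (Ctrl-A, `\x01`) separator — can be sketched as below. This is an illustrative standalone script, not part of the thread; the separator and the field count of 4 (the Employee schema `id, name, cc, dc`) are assumptions to adjust for your data.]

```python
# Sketch: flag records with fewer Ctrl-A-separated fields than the schema declares.
SEP = "\x01"          # assumed PigPerformanceLoader default separator (^A)
EXPECTED_FIELDS = 4   # assumed: the Employee schema (id, name, cc, dc)

def bad_records(lines, expected=EXPECTED_FIELDS, sep=SEP):
    """Return (line_number, field_count) pairs for records that are too short."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        n = line.rstrip("\n").count(sep) + 1  # k separators => k+1 fields
        if n < expected:
            bad.append((lineno, n))
    return bad

if __name__ == "__main__":
    with open("Employee") as f:  # hypothetical local copy of the input file
        for lineno, n in bad_records(f):
            print(f"line {lineno}: only {n} fields")
```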