Yes and yes. In any case, the latest from SVN doesn't have this
issue. Guessing it was 921 that did it.
-D
On Tue, Oct 13, 2009 at 4:01 PM, Alan Gates wrote:
> Have you checked that each record in your input data has at least the
> number of fields you specify? Have you checked that the field separator in
> your data matches the default for PigPerformanceLoader (^A I think)?
>
> Alan.
>
> On Oct 13, 2009, at 10:28 AM, Dmitriy Ryaboy wrote:
>
>> We ran into what looks like an edge-case bug in Pig that causes it
>> to throw an IndexOutOfBoundsException (stack trace below). The script
>> just joins two relations; it looks like our data was generated
>> incorrectly and the join is empty, which may be what's causing the
>> failure. It also appears to happen only when at least one of the
>> inputs is on the large side (at least a few hundred megs). Any ideas
>> on what could be happening and how to zero in on the underlying cause?
>> We are running off unmodified trunk.
>>
>> Script:
>>
>> register datagen.jar;
>> E = load 'Employee' using
>> org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
>> (id,name,cc,dc);
>> D = load 'Department' using
>> org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
>> (dept_id,dept_nm);
>> P = load 'Project' using
>> org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
>> (id,emp_id,role);
>> R1 = JOIN E by dc, D by dept_id;
>> R2 = JOIN R1 by E::id, P by emp_id;
>> store R2 into 'TestCase2Output';
>>
>> The R2 join fails with the stack trace below. It also fails if we
>> pre-calculate R1, store it, and load it directly (so: load R1, load P,
>> join R1 by $0, P by emp_id). We've verified that the records in R1 and
>> R2 have the expected fields, etc.
>>
>>
>> Stack Trace:
>>
>> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>> at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>> at java.util.ArrayList.get(ArrayList.java:322)
>> at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:148)
>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:226)
>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:260)
>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>> at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>
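[Editor's note: the sanity check Alan suggests — verifying that every record has at least the declared number of fields, split on the ^A (Ctrl-A, `\x01`) separator — can be sketched as below. This is an illustrative standalone script, not part of the thread; the separator and the field count of 4 (the Employee schema `id, name, cc, dc`) are assumptions to adjust for your data.]

```python
# Sketch: flag records with fewer Ctrl-A-separated fields than the schema declares.
SEP = "\x01"          # assumed PigPerformanceLoader default separator (^A)
EXPECTED_FIELDS = 4   # assumed: the Employee schema (id, name, cc, dc)

def bad_records(lines, expected=EXPECTED_FIELDS, sep=SEP):
    """Return (line_number, field_count) pairs for records that are too short."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        n = line.rstrip("\n").count(sep) + 1  # k separators => k+1 fields
        if n < expected:
            bad.append((lineno, n))
    return bad

if __name__ == "__main__":
    with open("Employee") as f:  # hypothetical local copy of the input file
        for lineno, n in bad_records(f):
            print(f"line {lineno}: only {n} fields")
```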