[ 
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788387#action_12788387
 ] 

Viraj Bhat commented on PIG-1131:
---------------------------------

Hi Pradeep,
 So the workaround for this is for the user to specify the schema for the 
largest size tuple or which contains the maximum number of fields/columns?
Viraj

> Pig simple join does not work when it contains empty lines
> ----------------------------------------------------------
>
>                 Key: PIG-1131
>                 URL: https://issues.apache.org/jira/browse/PIG-1131
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Viraj Bhat
>            Priority: Critical
>             Fix For: 0.7.0
>
>         Attachments: junk1.txt, junk2.txt, simplejoinscript.pig
>
>
> I have a simple script, which does a JOIN.
> {code}
> input1 = load '/user/viraj/junk1.txt' using PigStorage(' ');
> describe input1;
> input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001');
> describe input2;
> joineddata = JOIN input1 by $0, input2 by $0;
> describe joineddata;
> store joineddata into 'result';
> {code}
> The input data contains empty lines.  
> The join fails in the Map phase with the following error in the 
> PRLocalRearrange.java
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>       at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>       at java.util.ArrayList.get(ArrayList.java:322)
>       at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
>       at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464)
>       at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360)
>       at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94)
>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>       at org.apache.hadoop.mapred.Child.main(Child.java:159)
> I am surprised that the test cases did not detect this error. Could we add 
> this data which contains empty lines to the testcases?
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to