[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines

2010-02-12 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833250#action_12833250
 ] 

Richard Ding commented on PIG-1131:
---

+1 for commit.

 Pig simple join does not work when it contains empty lines
 --

 Key: PIG-1131
 URL: https://issues.apache.org/jira/browse/PIG-1131
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: junk1.txt, junk2.txt, pig-1131.patch, pig-1131.patch, 
 simplejoinscript.pig


 I have a simple script, which does a JOIN.
 {code}
 input1 = load '/user/viraj/junk1.txt' using PigStorage(' ');
 describe input1;
 input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001');
 describe input2;
 joineddata = JOIN input1 by $0, input2 by $0;
 describe joineddata;
 store joineddata into 'result';
 {code}
 The input data contains empty lines.  
 The join fails in the Map phase with the following error in 
 POLocalRearrange.java:
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at java.util.ArrayList.RangeCheck(ArrayList.java:547)
   at java.util.ArrayList.get(ArrayList.java:322)
   at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
   at org.apache.hadoop.mapred.Child.main(Child.java:159)
 I am surprised that the test cases did not detect this error. Could we add 
 this data, which contains empty lines, to the test cases?
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines

2010-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831818#action_12831818
 ] 

Hadoop QA commented on PIG-1131:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435394/pig-1131.patch
  against trunk revision 908177.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/196/console

This message is automatically generated.

 Pig simple join does not work when it contains empty lines
 --

 Key: PIG-1131
 URL: https://issues.apache.org/jira/browse/PIG-1131
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: junk1.txt, junk2.txt, pig-1131.patch, 
 simplejoinscript.pig





[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines

2010-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831898#action_12831898
 ] 

Hadoop QA commented on PIG-1131:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435402/pig-1131.patch
  against trunk revision 908324.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/197/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/197/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/197/console

This message is automatically generated.

 Pig simple join does not work when it contains empty lines
 --

 Key: PIG-1131
 URL: https://issues.apache.org/jira/browse/PIG-1131
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: junk1.txt, junk2.txt, pig-1131.patch, pig-1131.patch, 
 simplejoinscript.pig





[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines

2010-02-08 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831248#action_12831248
 ] 

Viraj Bhat commented on PIG-1131:
-

Olga, I marked it as critical because we say that Pig can eat any type of 
data, yet the example script shows that we need data with a fixed schema even 
to perform a simple join.

Viraj

 Pig simple join does not work when it contains empty lines
 --

 Key: PIG-1131
 URL: https://issues.apache.org/jira/browse/PIG-1131
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: junk1.txt, junk2.txt, simplejoinscript.pig





[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines

2010-02-08 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831251#action_12831251
 ] 

Viraj Bhat commented on PIG-1131:
-

Ashutosh, I was able to reproduce a similar problem on trunk:

java -cp pig-withouthadoop.jar org.apache.pig.Main -version


Apache Pig version 0.7.0-dev (r907874) 

compiled Feb 08 2010, 17:35:04

Viraj

 Pig simple join does not work when it contains empty lines
 --

 Key: PIG-1131
 URL: https://issues.apache.org/jira/browse/PIG-1131
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: junk1.txt, junk2.txt, simplejoinscript.pig





[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines

2010-02-04 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829736#action_12829736
 ] 

Ashutosh Chauhan commented on PIG-1131:
---

I can't reproduce this on trunk. PIG-1194 touched the same piece of code and 
was recently checked in, so it might have fixed this issue too. Viraj, can you 
please confirm whether you can reproduce it, or some variant of it?

 Pig simple join does not work when it contains empty lines
 --

 Key: PIG-1131
 URL: https://issues.apache.org/jira/browse/PIG-1131
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: junk1.txt, junk2.txt, simplejoinscript.pig





[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines

2009-12-07 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787234#action_12787234
 ] 

Pradeep Kamath commented on PIG-1131:
-

I'm not sure, but from reading the code, the issue might also occur whenever 
the first input tuple has more fields than subsequent input tuples, for either 
input of the join. The cause is in POLocalRearrange, in the optimization that 
avoids sending parts of the value that are already present in the key: we 
optimize further by remembering which parts of the value were present in the 
key for the first value. If a later value has a different number of fields, we 
hit the exception shown in the description.
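
To make the failure mode in this comment concrete, here is a minimal sketch in plain Java. It is not Pig's actual code: the class RearrangeSketch, its methods, and the use of List<Object> as a stand-in for a tuple are all hypothetical, and modelling an empty input line as a tuple with a single null field is an assumption based only on the "Index: 1, Size: 1" message in the stack trace. The unsafe variant remembers the column layout of the first tuple and reuses it; the guarded variant derives the layout from each tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of the caching described in the comment above.
// This is NOT Pig's POLocalRearrange; all names here are illustrative only.
class RearrangeSketch {
    private final int keyIndex;            // column used as the join key
    private List<Integer> cachedValueIdx;  // non-key columns, computed from the FIRST tuple only

    RearrangeSketch(int keyIndex) {
        this.keyIndex = keyIndex;
    }

    // Unsafe variant: reuses the column layout remembered from the first tuple.
    List<Object> valueUnsafe(List<Object> tuple) {
        if (cachedValueIdx == null) {
            cachedValueIdx = new ArrayList<>();
            for (int i = 0; i < tuple.size(); i++) {
                if (i != keyIndex) {
                    cachedValueIdx.add(i);
                }
            }
        }
        List<Object> value = new ArrayList<>();
        for (int i : cachedValueIdx) {
            // Throws IndexOutOfBoundsException when this tuple is shorter than the first one.
            value.add(tuple.get(i));
        }
        return value;
    }

    // Guarded variant: derives the layout from each tuple's own size.
    List<Object> valueSafe(List<Object> tuple) {
        List<Object> value = new ArrayList<>();
        for (int i = 0; i < tuple.size(); i++) {
            if (i != keyIndex) {
                value.add(tuple.get(i));
            }
        }
        return value;
    }

    public static void main(String[] args) {
        List<Object> full = Arrays.asList("k1", "v1");
        // Assumption: an empty input line becomes a tuple with one null field.
        List<Object> emptyLine = Collections.singletonList(null);

        RearrangeSketch lr = new RearrangeSketch(0);
        System.out.println("first tuple value: " + lr.valueUnsafe(full));
        try {
            lr.valueUnsafe(emptyLine);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unsafe variant failed on the shorter tuple");
        }
        System.out.println("safe variant value: " + lr.valueSafe(emptyLine));
    }
}
```

The guarded variant gives up the per-tuple saving of the cached layout in exchange for tolerating tuples of varying width; whether the actual patch takes this approach or another one is not stated in this thread.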

 Pig simple join does not work when it contains empty lines
 --

 Key: PIG-1131
 URL: https://issues.apache.org/jira/browse/PIG-1131
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Priority: Critical
 Fix For: 0.7.0

 Attachments: junk1.txt, junk2.txt, simplejoinscript.pig

