[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833250#action_12833250 ] Richard Ding commented on PIG-1131: --- +1 for commit. Pig simple join does not work when it contains empty lines -- Key: PIG-1131 URL: https://issues.apache.org/jira/browse/PIG-1131 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: junk1.txt, junk2.txt, pig-1131.patch, pig-1131.patch, simplejoinscript.pig I have a simple script, which does a JOIN. {code} input1 = load '/user/viraj/junk1.txt' using PigStorage(' '); describe input1; input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001'); describe input2; joineddata = JOIN input1 by $0, input2 by $0; describe joineddata; store joineddata into 'result'; {code} The input data contains empty lines. The join fails in the Map phase with the following error in the PRLocalRearrange.java java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:159) I am surprised that the test cases did not detect this error. Could we add this data which contains empty lines to the testcases? Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831818#action_12831818 ] Hadoop QA commented on PIG-1131: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435394/pig-1131.patch against trunk revision 908177. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/196/console This message is automatically generated. Pig simple join does not work when it contains empty lines -- Key: PIG-1131 URL: https://issues.apache.org/jira/browse/PIG-1131 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: junk1.txt, junk2.txt, pig-1131.patch, simplejoinscript.pig I have a simple script, which does a JOIN. {code} input1 = load '/user/viraj/junk1.txt' using PigStorage(' '); describe input1; input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001'); describe input2; joineddata = JOIN input1 by $0, input2 by $0; describe joineddata; store joineddata into 'result'; {code} The input data contains empty lines. The join fails in the Map phase with the following error in the PRLocalRearrange.java java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:159) I am surprised that the test cases did not detect this error. Could we add this data which contains empty lines to the testcases? Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831898#action_12831898 ] Hadoop QA commented on PIG-1131: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435402/pig-1131.patch against trunk revision 908324. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/197/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/197/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/197/console This message is automatically generated. Pig simple join does not work when it contains empty lines -- Key: PIG-1131 URL: https://issues.apache.org/jira/browse/PIG-1131 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: junk1.txt, junk2.txt, pig-1131.patch, pig-1131.patch, simplejoinscript.pig I have a simple script, which does a JOIN. {code} input1 = load '/user/viraj/junk1.txt' using PigStorage(' '); describe input1; input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001'); describe input2; joineddata = JOIN input1 by $0, input2 by $0; describe joineddata; store joineddata into 'result'; {code} The input data contains empty lines. The join fails in the Map phase with the following error in the PRLocalRearrange.java java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:159) I am surprised that the test cases did not detect this error. Could we add this data which contains empty lines to the testcases? Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831248#action_12831248 ] Viraj Bhat commented on PIG-1131: - Olga I marked it as critical since we mention that Pig can eat any type of data, and the example script shows that we need data with fixed schema's and to perform a simple join. Viraj Pig simple join does not work when it contains empty lines -- Key: PIG-1131 URL: https://issues.apache.org/jira/browse/PIG-1131 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: junk1.txt, junk2.txt, simplejoinscript.pig I have a simple script, which does a JOIN. {code} input1 = load '/user/viraj/junk1.txt' using PigStorage(' '); describe input1; input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001'); describe input2; joineddata = JOIN input1 by $0, input2 by $0; describe joineddata; store joineddata into 'result'; {code} The input data contains empty lines. The join fails in the Map phase with the following error in the PRLocalRearrange.java java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:159) I am surprised that the test cases did not detect this error. Could we add this data which contains empty lines to the testcases? Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831251#action_12831251 ] Viraj Bhat commented on PIG-1131: - Ashutosh I was able to recreate a similar problem using the trunk. java -cp pig-withouthadoop.jar org.apache.pig.Main -version Apache Pig version 0.7.0-dev (r907874) compiled Feb 08 2010, 17:35:04 Viraj Pig simple join does not work when it contains empty lines -- Key: PIG-1131 URL: https://issues.apache.org/jira/browse/PIG-1131 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: junk1.txt, junk2.txt, simplejoinscript.pig I have a simple script, which does a JOIN. {code} input1 = load '/user/viraj/junk1.txt' using PigStorage(' '); describe input1; input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001'); describe input2; joineddata = JOIN input1 by $0, input2 by $0; describe joineddata; store joineddata into 'result'; {code} The input data contains empty lines. The join fails in the Map phase with the following error in the PRLocalRearrange.java java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:159) I am surprised that the test cases did not detect this error. Could we add this data which contains empty lines to the testcases? Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829736#action_12829736 ] Ashutosh Chauhan commented on PIG-1131: --- Can't reproduce this on trunk. PIG-1194 touched upon the same piece of code and was recently checked in. That one might have fixed this one too. Viraj, can you please confirm if you can reproduce it or some variant of it ? Pig simple join does not work when it contains empty lines -- Key: PIG-1131 URL: https://issues.apache.org/jira/browse/PIG-1131 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: junk1.txt, junk2.txt, simplejoinscript.pig I have a simple script, which does a JOIN. {code} input1 = load '/user/viraj/junk1.txt' using PigStorage(' '); describe input1; input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001'); describe input2; joineddata = JOIN input1 by $0, input2 by $0; describe joineddata; store joineddata into 'result'; {code} The input data contains empty lines. The join fails in the Map phase with the following error in the PRLocalRearrange.java java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:159) I am surprised that the test cases did not detect this error. Could we add this data which contains empty lines to the testcases? Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787234#action_12787234 ] Pradeep Kamath commented on PIG-1131: - Not sure, but looking through the code, the issue might be present even if first input tuple has more fields than subsequent input tuples for either of the inputs in the join. The issue is because in the optimization to not send parts of the value present in the key, in POLocalRearrange, we further try to optimize by remembering which parts of the value were present in the key for the first value - so if the next value has different number of fields, we hit the exception seen in the description. Pig simple join does not work when it contains empty lines -- Key: PIG-1131 URL: https://issues.apache.org/jira/browse/PIG-1131 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Priority: Critical Fix For: 0.7.0 Attachments: junk1.txt, junk2.txt, simplejoinscript.pig I have a simple script, which does a JOIN. {code} input1 = load '/user/viraj/junk1.txt' using PigStorage(' '); describe input1; input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001'); describe input2; joineddata = JOIN input1 by $0, input2 by $0; describe joineddata; store joineddata into 'result'; {code} The input data contains empty lines. The join fails in the Map phase with the following error in the PRLocalRearrange.java java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:159) I am surprised that the test cases did not detect this error. Could we add this data which contains empty lines to the testcases? Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.