[jira] Updated: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1131:
--
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

Patch committed.

> Pig simple join does not work when it contains empty lines
> --
>
> Key: PIG-1131
> URL: https://issues.apache.org/jira/browse/PIG-1131
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.7.0
> Reporter: Viraj Bhat
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: junk1.txt, junk2.txt, pig-1131.patch, pig-1131.patch, simplejoinscript.pig
>
>
> I have a simple script, which does a JOIN.
> {code}
> input1 = load '/user/viraj/junk1.txt' using PigStorage(' ');
> describe input1;
> input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001');
> describe input2;
> joineddata = JOIN input1 by $0, input2 by $0;
> describe joineddata;
> store joineddata into 'result';
> {code}
> The input data contains empty lines.
> The join fails in the Map phase with the following error in POLocalRearrange.java:
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>     at java.util.ArrayList.get(ArrayList.java:322)
>     at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>     at org.apache.hadoop.mapred.Child.main(Child.java:159)
> I am surprised that the test cases did not detect this error. Could we add this data which contains empty lines to the testcases?
> Viraj

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1131:
--
    Status: Patch Available (was: Open)

> Pig simple join does not work when it contains empty lines
> --
>
> Key: PIG-1131
> URL: https://issues.apache.org/jira/browse/PIG-1131
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.7.0
> Reporter: Viraj Bhat
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: junk1.txt, junk2.txt, pig-1131.patch, pig-1131.patch, simplejoinscript.pig
[jira] Updated: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1131:
--
    Attachment: pig-1131.patch

Previous patch was stale. Merged with trunk and regenerated the patch.

> Pig simple join does not work when it contains empty lines
> --
>
> Key: PIG-1131
> URL: https://issues.apache.org/jira/browse/PIG-1131
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.7.0
> Reporter: Viraj Bhat
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: junk1.txt, junk2.txt, pig-1131.patch, pig-1131.patch, simplejoinscript.pig
[jira] Updated: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1131:
--
    Status: Open (was: Patch Available)

> Pig simple join does not work when it contains empty lines
> --
>
> Key: PIG-1131
> URL: https://issues.apache.org/jira/browse/PIG-1131
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.7.0
> Reporter: Viraj Bhat
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: junk1.txt, junk2.txt, pig-1131.patch, pig-1131.patch, simplejoinscript.pig
[jira] Updated: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1131:
--
    Status: Patch Available (was: Reopened)

> Pig simple join does not work when it contains empty lines
> --
>
> Key: PIG-1131
> URL: https://issues.apache.org/jira/browse/PIG-1131
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.7.0
> Reporter: Viraj Bhat
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: junk1.txt, junk2.txt, pig-1131.patch, simplejoinscript.pig
[jira] Updated: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1131:
--
    Attachment: pig-1131.patch

In POLocalRearrange, the number of elements in the tuple that are not part of the key (and are thus put in the value) is computed the first time and then cached as an optimization. This patch removes that caching because of the problem illustrated in this bug. A test case that reproduces the bug is included.

> Pig simple join does not work when it contains empty lines
> --
>
> Key: PIG-1131
> URL: https://issues.apache.org/jira/browse/PIG-1131
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.7.0
> Reporter: Viraj Bhat
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: junk1.txt, junk2.txt, pig-1131.patch, simplejoinscript.pig
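
The caching problem described in the patch comment above can be sketched in Java. This is a hypothetical illustration, not the Pig source: the class LocalRearrangeSketch, its methods, and the use of String.split as a stand-in for PigStorage are all assumptions made for the example. It also assumes that an empty input line parses to a single-field tuple, which is consistent with the "Index: 1, Size: 1" in the reported trace but is not confirmed by the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the bug pattern; this is NOT the Pig implementation.
class LocalRearrangeSketch {

    // Number of non-key fields, computed from the first tuple only and then
    // reused for every later tuple (the optimization that goes wrong).
    private int cachedValueFieldCount = -1;

    // Stand-in for a delimiter-based loader (assumption): an empty line
    // yields a tuple with a single empty field.
    static List<String> parse(String line, String delimiter) {
        return new ArrayList<>(Arrays.asList(line.split(delimiter, -1)));
    }

    // Buggy variant: assumes every tuple has as many fields as the first one.
    List<String> valueFieldsCached(List<String> tuple) {
        if (cachedValueFieldCount < 0) {
            cachedValueFieldCount = tuple.size() - 1; // field 0 is the join key
        }
        List<String> value = new ArrayList<>();
        for (int i = 1; i <= cachedValueFieldCount; i++) {
            // Throws IndexOutOfBoundsException when this tuple is shorter
            // than the first tuple seen.
            value.add(tuple.get(i));
        }
        return value;
    }

    // Variant without caching: the count is derived from the tuple at hand.
    static List<String> valueFieldsPerTuple(List<String> tuple) {
        return new ArrayList<>(tuple.subList(1, tuple.size()));
    }

    public static void main(String[] args) {
        LocalRearrangeSketch lr = new LocalRearrangeSketch();
        List<String> normal = parse("k1 v1", " ");
        List<String> empty = parse("", " ");

        System.out.println("cached, normal line: " + lr.valueFieldsCached(normal));
        try {
            lr.valueFieldsCached(empty);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("cached, empty line: failed with " + e.getClass().getSimpleName());
        }
        System.out.println("per tuple, empty line: " + valueFieldsPerTuple(empty));
    }
}
```

Recomputing the count for each tuple costs slightly more per record, but it stays correct when tuples from the same input have different numbers of fields.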
[jira] Updated: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1131:
    Priority: Major (was: Critical)

Not sure why this issue was marked critical

> Pig simple join does not work when it contains empty lines
> --
>
> Key: PIG-1131
> URL: https://issues.apache.org/jira/browse/PIG-1131
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.7.0
> Reporter: Viraj Bhat
> Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: junk1.txt, junk2.txt, simplejoinscript.pig
[jira] Updated: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-1131:
    Attachment: simplejoinscript.pig
                junk2.txt
                junk1.txt

Dummy datasets and pig script

> Pig simple join does not work when it contains empty lines
> --
>
> Key: PIG-1131
> URL: https://issues.apache.org/jira/browse/PIG-1131
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.7.0
> Reporter: Viraj Bhat
> Priority: Critical
> Fix For: 0.7.0
>
> Attachments: junk1.txt, junk2.txt, simplejoinscript.pig