[jira] Commented: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792468#action_12792468 ] Hadoop QA commented on PIG-1157: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428359/PIG-1157.patch against trunk revision 892125. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/140/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/140/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/140/console This message is automatically generated. Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM --- Key: PIG-1157 URL: https://issues.apache.org/jira/browse/PIG-1157 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.6.0 Attachments: oomreplicatedjoin.pig, PIG-1157.patch, replicatedjoinexplain.log Hi all, I have a script which does 2 replicated joins in succession. Please note that the inputs do not exist on the HDFS. {code} A = LOAD '/tmp/abc' USING PigStorage('\u0001') AS (a:long, b, c); A1 = FOREACH A GENERATE a; B = GROUP A1 BY a; C = LOAD '/tmp/xyz' USING PigStorage('\u0001') AS (x:long, y); D = JOIN C BY x, B BY group USING replicated; E = JOIN A BY a, D by x USING replicated; dump E; {code} 2009-12-16 19:12:00,253 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 4 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-only splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-reduce splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 2 out of total 2 splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2 2009-12-16 19:12:00,713 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. unable to create new native thread Details at logfile: pig_1260990666148.log Looking at the log file: Pig Stack Trace --- ERROR 2998: Unhandled internal error. unable to create new native thread java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773) at org.apache.pig.PigServer.store(PigServer.java:522) at org.apache.pig.PigServer.openIterator(PigServer.java:458) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) at org.apache.pig.Main.main(Main.java:397) If we want to look at the explain output, we find that there is no Map Reduce plan that is generated. Why is the M/R plan not generated? Attaching the script and explain output. Viraj -- This message is automatically generated by JIRA. - You can reply
[jira] Commented: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792593#action_12792593 ] Olga Natkovich commented on PIG-1157: - +1. Patch looks good. Will commit once the tests pass. Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM --- Key: PIG-1157 URL: https://issues.apache.org/jira/browse/PIG-1157 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.6.0 Attachments: oomreplicatedjoin.pig, PIG-1157.patch, PIG-1157.patch, replicatedjoinexplain.log Hi all, I have a script which does 2 replicated joins in succession. Please note that the inputs do not exist on the HDFS. {code} A = LOAD '/tmp/abc' USING PigStorage('\u0001') AS (a:long, b, c); A1 = FOREACH A GENERATE a; B = GROUP A1 BY a; C = LOAD '/tmp/xyz' USING PigStorage('\u0001') AS (x:long, y); D = JOIN C BY x, B BY group USING replicated; E = JOIN A BY a, D by x USING replicated; dump E; {code} 2009-12-16 19:12:00,253 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 4 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-only splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-reduce splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 2 out of total 2 splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2 2009-12-16 19:12:00,713 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. unable to create new native thread Details at logfile: pig_1260990666148.log Looking at the log file: Pig Stack Trace --- ERROR 2998: Unhandled internal error. unable to create new native thread java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773) at org.apache.pig.PigServer.store(PigServer.java:522) at org.apache.pig.PigServer.openIterator(PigServer.java:458) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) at org.apache.pig.Main.main(Main.java:397) If we want to look at the explain output, we find that there is no Map Reduce plan that is generated. Why is the M/R plan not generated? Attaching the script and explain output. Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792633#action_12792633 ] Hadoop QA commented on PIG-1157: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428448/PIG-1157.patch against trunk revision 892125. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/141/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/141/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/141/console This message is automatically generated. Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM --- Key: PIG-1157 URL: https://issues.apache.org/jira/browse/PIG-1157 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.6.0 Attachments: oomreplicatedjoin.pig, PIG-1157.patch, PIG-1157.patch, replicatedjoinexplain.log Hi all, I have a script which does 2 replicated joins in succession. Please note that the inputs do not exist on the HDFS. {code} A = LOAD '/tmp/abc' USING PigStorage('\u0001') AS (a:long, b, c); A1 = FOREACH A GENERATE a; B = GROUP A1 BY a; C = LOAD '/tmp/xyz' USING PigStorage('\u0001') AS (x:long, y); D = JOIN C BY x, B BY group USING replicated; E = JOIN A BY a, D by x USING replicated; dump E; {code} 2009-12-16 19:12:00,253 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 4 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-only splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-reduce splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 2 out of total 2 splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2 2009-12-16 19:12:00,713 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. unable to create new native thread Details at logfile: pig_1260990666148.log Looking at the log file: Pig Stack Trace --- ERROR 2998: Unhandled internal error. unable to create new native thread java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773) at org.apache.pig.PigServer.store(PigServer.java:522) at org.apache.pig.PigServer.openIterator(PigServer.java:458) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) at org.apache.pig.Main.main(Main.java:397) If we want to look at the explain output, we find that there is no Map Reduce plan that is generated. Why is the M/R plan not generated? Attaching the script and explain output. Viraj -- This message is automatically generated by JIRA. -
[jira] Commented: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791730#action_12791730 ] Richard Ding commented on PIG-1157: --- Two quick observations: 1. The script works if multi-query optimization is disabled (-M). 2. The script also works if using regular join instead of replicated join. I'll look into it further. Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM --- Key: PIG-1157 URL: https://issues.apache.org/jira/browse/PIG-1157 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.6.0 Attachments: oomreplicatedjoin.pig, replicatedjoinexplain.log Hi all, I have a script which does 2 replicated joins in succession. Please note that the inputs do not exist on the HDFS. {code} A = LOAD '/tmp/abc' USING PigStorage('\u0001') AS (a:long, b, c); A1 = FOREACH A GENERATE a; B = GROUP A1 BY a; C = LOAD '/tmp/xyz' USING PigStorage('\u0001') AS (x:long, y); D = JOIN C BY x, B BY group USING replicated; E = JOIN A BY a, D by x USING replicated; dump E; {code} 2009-12-16 19:12:00,253 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 4 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-only splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-reduce splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 2 out of total 2 splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2 2009-12-16 19:12:00,713 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. unable to create new native thread Details at logfile: pig_1260990666148.log Looking at the log file: Pig Stack Trace --- ERROR 2998: Unhandled internal error. unable to create new native thread java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773) at org.apache.pig.PigServer.store(PigServer.java:522) at org.apache.pig.PigServer.openIterator(PigServer.java:458) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) at org.apache.pig.Main.main(Main.java:397) If we want to look at the explain output, we find that there is no Map Reduce plan that is generated. Why is the M/R plan not generated? Attaching the script and explain output. Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.