[jira] Updated: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1157:

    Resolution: Fixed
    Fix Version/s: 0.7.0 (was: 0.6.0)
    Status: Resolved (was: Patch Available)

patch committed, thanks Richard

> Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
> ---
>
> Key: PIG-1157
> URL: https://issues.apache.org/jira/browse/PIG-1157
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.6.0
> Reporter: Viraj Bhat
> Assignee: Richard Ding
> Fix For: 0.7.0
> Attachments: oomreplicatedjoin.pig, PIG-1157.patch, PIG-1157.patch, replicatedjoinexplain.log
>
> Hi all,
> I have a script which does 2 replicated joins in succession. Please note that the inputs do not exist on the HDFS.
> {code}
> A = LOAD '/tmp/abc' USING PigStorage('\u0001') AS (a:long, b, c);
> A1 = FOREACH A GENERATE a;
> B = GROUP A1 BY a;
> C = LOAD '/tmp/xyz' USING PigStorage('\u0001') AS (x:long, y);
> D = JOIN C BY x, B BY group USING "replicated";
> E = JOIN A BY a, D by x USING "replicated";
> dump E;
> {code}
> 2009-12-16 19:12:00,253 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 4
> 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-only splittees.
> 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-reduce splittees.
> 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 2 out of total 2 splittees.
> 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
> 2009-12-16 19:12:00,713 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. unable to create new native thread
> Details at logfile: pig_1260990666148.log
>
> Looking at the log file:
>
> Pig Stack Trace
> ---
> ERROR 2998: Unhandled internal error. unable to create new native thread
> java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:597)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131)
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
>     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773)
>     at org.apache.pig.PigServer.store(PigServer.java:522)
>     at org.apache.pig.PigServer.openIterator(PigServer.java:458)
>     at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>     at org.apache.pig.Main.main(Main.java:397)
>
> If we want to look at the explain output, we find that there is no Map Reduce plan that is generated.
> Why is the M/R plan not generated?
> Attaching the script and explain output.
> Viraj

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1157:

    Status: Patch Available (was: Open)
[jira] Updated: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1157:

    Attachment: PIG-1157.patch
[jira] Updated: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1157:

    Status: Open (was: Patch Available)
[jira] Updated: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1157:

    Attachment: PIG-1157.patch

The problem is that, by merging an MR splittee with an FR join, the MultiQuery optimizer may introduce a directed cycle into the graph of the MR plan. This patch fixes the problem by not merging FR splittees. This is actually stronger than necessary. A better solution would be to check whether merging an MR splittee would form a directed cycle in the original DAG before merging it and, if not, allow the merge to go ahead.
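The check Richard Ding describes in the comment above (test whether a merge would form a directed cycle before merging) can be illustrated with a small, generic sketch. This is not the code from PIG-1157.patch or from Pig's MultiQueryOptimizer. The function names, the dictionary-based graph, and the job names (`splitter`, `group_job`, `join_D`, `join_E`) are all hypothetical, chosen only to loosely mirror the script in the issue. The idea is that merging two nodes of a DAG creates a cycle exactly when one can reach the other through at least one other node.

```python
def reachable_via_other(graph, src, dst):
    """Return True if dst is reachable from src through at least one intermediate node.

    graph maps each node to a list of its successors (hypothetical representation).
    A direct src -> dst edge alone does not count, because that edge becomes
    internal to the merged node.
    """
    stack = [n for n in graph.get(src, ()) if n != dst]
    seen = set(stack)
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return True
            if nxt not in seen and nxt != src:
                seen.add(nxt)
                stack.append(nxt)
    return False


def merge_would_create_cycle(graph, a, b):
    """Return True if contracting nodes a and b into one node would create a directed cycle."""
    return reachable_via_other(graph, a, b) or reachable_via_other(graph, b, a)


# Hypothetical job graph: the splitter feeds a group job and a replicated join (join_E),
# and join_E also depends, through join_D, on the output of the group job.
plan = {
    "splitter": ["group_job", "join_E"],
    "group_job": ["join_D"],
    "join_D": ["join_E"],
    "join_E": [],
}

safe_to_merge_group = not merge_would_create_cycle(plan, "splitter", "group_job")
safe_to_merge_join = not merge_would_create_cycle(plan, "splitter", "join_E")
```

In this made-up plan, merging `group_job` into `splitter` is safe, while merging `join_E` is not, because the merged job would both feed `join_D` and consume its output.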
[jira] Updated: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1157:

    Status: Patch Available (was: Open)
[jira] Updated: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-1157:

    Attachment: oomreplicatedjoin.pig
                replicatedjoinexplain.log

Explain output and Pig script.