Hi all, I have a Pig script that only runs if I turn on "-no_multiquery".
What the script does is this:
- read from disk a relation where each tuple has 10 fields, one of which is a count
- take each non-count field in turn, group by it, and sum the counts for each group

The full code is included at the end of the email.

With "-no_multiquery" each of the groups is processed individually, and things work just fine. Without that option, I get a bunch of

java.lang.OutOfMemoryError: GC overhead limit exceeded

and the failed job message says:

JobId Alias Feature Message Outputs
job_201101201235_0287 merged_rules,statL_rules,statL_rules_grouped,statL_totals,statLt_rules,statLt_rules_grouped,statLt_totals,statR_rules,statR_rules_grouped,statR_totals,statRt_rules,statRt_rules_grouped,statRt_totals,statT_rules,statT_rules_grouped,statT_totals MULTI_QUERY,COMBINER Message: Job failed!

I'm running Pig 0.8.0 on Hadoop 0.20.2 and Java 1.6.0_06.

My questions are:
- Is it expected that Pig's multiquery execution would create enough overhead that the execution fails?
- Can someone explain (or point me to an explanation of) where the multiquery overhead comes from? I'd really like to understand it.
- Is there a better way to write the Pig code to do that computation? Maybe I can restructure my computation, or configure my cluster differently? Or am I stuck with a no_multiquery execution?

Many thanks,
Dragos Munteanu

CODE:

merged_rules = LOAD 'RuleProcess.xTCxi/rules' AS (pruneType:int, dbkey:chararray, root:chararray, lhs:chararray, lhsTokens:chararray, rhs:chararray, rhsTokens:chararray, align:chararray, count:long, features:chararray);

-- compute stats: root
statT_rules = FOREACH merged_rules GENERATE root, count;
statT_rules_grouped = GROUP statT_rules BY root PARALLEL 30;
statT_totals = FOREACH statT_rules_grouped GENERATE FLATTEN(group), SUM(statT_rules.count) AS total;
STORE statT_totals INTO 'RuleProcess.xTCxi.4/stats.root' using PigStorage;

-- compute stats: lhs
statL_rules = FOREACH merged_rules GENERATE lhs, count;
statL_rules_grouped = GROUP statL_rules BY lhs PARALLEL 30;
statL_totals = FOREACH statL_rules_grouped GENERATE FLATTEN(group), SUM(statL_rules.count) AS total;
STORE statL_totals INTO 'RuleProcess.xTCxi.4/stats.lhs' using PigStorage;

-- compute stats: lhsTokens
statLt_rules = FOREACH merged_rules GENERATE lhsTokens, count;
statLt_rules_grouped = GROUP statLt_rules BY lhsTokens PARALLEL 30;
statLt_totals = FOREACH statLt_rules_grouped GENERATE FLATTEN(group), SUM(statLt_rules.count) AS total;
STORE statLt_totals INTO 'RuleProcess.xTCxi.4/stats.lhsTokens' using PigStorage;

-- compute stats: rhs
statR_rules = FOREACH merged_rules GENERATE rhs, count;
statR_rules_grouped = GROUP statR_rules BY rhs PARALLEL 30;
statR_totals = FOREACH statR_rules_grouped GENERATE FLATTEN(group), SUM(statR_rules.count) AS total;
STORE statR_totals INTO 'RuleProcess.xTCxi.4/stats.rhs' using PigStorage;

-- compute stats: rhsTokens
statRt_rules = FOREACH merged_rules GENERATE rhsTokens, count;
statRt_rules_grouped = GROUP statRt_rules BY rhsTokens PARALLEL 30;
statRt_totals = FOREACH statRt_rules_grouped GENERATE FLATTEN(group), SUM(statRt_rules.count) AS total;
STORE statRt_totals INTO 'RuleProcess.xTCxi.4/stats.rhsTokens' using PigStorage;
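
In case it helps to see the intended computation outside of Pig, here is a minimal Python sketch of the same per-field totals. The helper name `field_totals` and the sample rows are made up for illustration; they are not part of the actual job or data, and only two of the grouped fields are shown.

```python
from collections import defaultdict

# Hypothetical sample rows (assumption: real tuples have 10 fields).
rows = [
    {"root": "NP", "lhs": "NP(x0 x1)", "count": 3},
    {"root": "NP", "lhs": "NP(x0)", "count": 2},
    {"root": "VP", "lhs": "NP(x0)", "count": 5},
]

def field_totals(rows, field):
    """Group rows by `field` and sum their counts (like GROUP ... BY plus SUM)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[field]] += row["count"]
    return dict(totals)

# One independent pass per field, mirroring how each group is
# processed individually when multiquery is turned off.
for field in ("root", "lhs"):
    print(field, field_totals(rows, field))
```

Each call scans the rows once and keeps one running total per distinct key, which is all the Pig script asks for in each of its five stanzas.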
