Hi all,

I have a Pig script that only runs if I turn on "-no_multiquery".

What the script does is this:
- read from disk a relation where each tuple has 10 fields, one of which is
a count
- take each non-count field in turn, group by it, and sum the counts for
each group.
The full code is included at the end of the email.

With "-no_multiquery" each of the groups is processed individually, and
things work just fine.
Without that option, the job fails with a number of errors of the form
java.lang.OutOfMemoryError: GC overhead limit exceeded
and the failed-job summary says:
JobId   Alias   Feature Message Outputs
job_201101201235_0287   merged_rules,statL_rules,statL_rules_grouped,statL_totals,statLt_rules,statLt_rules_grouped,statLt_totals,statR_rules,statR_rules_grouped,statR_totals,statRt_rules,statRt_rules_grouped,statRt_totals,statT_rules,statT_rules_grouped,statT_totals   MULTI_QUERY,COMBINER   Message: Job failed!

I'm running Pig 0.8.0 on Hadoop 0.20.2 and Java 1.6.0_06.

My questions are:
- Is it expected that Pig's multiquery execution adds enough overhead to
make the job fail?
- Can someone explain (or point me to an explanation of) where the
multiquery overhead comes from? I'd really like to understand it.
- Is there a better way to write the Pig code for this computation? Maybe
I can restructure the computation, or configure my cluster differently? Or
am I stuck with running with -no_multiquery?

Many thanks,
Dragos Munteanu


CODE:
merged_rules = LOAD 'RuleProcess.xTCxi/rules' AS (pruneType:int,
    dbkey:chararray, root:chararray, lhs:chararray, lhsTokens:chararray,
    rhs:chararray, rhsTokens:chararray, align:chararray, count:long,
    features:chararray);

-- compute stats: root
statT_rules = FOREACH merged_rules GENERATE root, count;
statT_rules_grouped = GROUP statT_rules BY root PARALLEL 30;
statT_totals = FOREACH statT_rules_grouped GENERATE FLATTEN(group),
    SUM(statT_rules.count) AS total;
STORE statT_totals INTO 'RuleProcess.xTCxi.4/stats.root' using PigStorage;

-- compute stats: lhs
statL_rules = FOREACH merged_rules GENERATE lhs, count;
statL_rules_grouped = GROUP statL_rules BY lhs PARALLEL 30;
statL_totals = FOREACH statL_rules_grouped GENERATE FLATTEN(group),
    SUM(statL_rules.count) AS total;
STORE statL_totals INTO 'RuleProcess.xTCxi.4/stats.lhs' using PigStorage;

-- compute stats: lhsTokens
statLt_rules = FOREACH merged_rules GENERATE lhsTokens, count;
statLt_rules_grouped = GROUP statLt_rules BY lhsTokens PARALLEL 30;
statLt_totals = FOREACH statLt_rules_grouped GENERATE FLATTEN(group),
    SUM(statLt_rules.count) AS total;
STORE statLt_totals INTO 'RuleProcess.xTCxi.4/stats.lhsTokens'
    using PigStorage;

-- compute stats: rhs
statR_rules = FOREACH merged_rules GENERATE rhs, count;
statR_rules_grouped = GROUP statR_rules BY rhs PARALLEL 30;
statR_totals = FOREACH statR_rules_grouped GENERATE FLATTEN(group),
    SUM(statR_rules.count) AS total;
STORE statR_totals INTO 'RuleProcess.xTCxi.4/stats.rhs' using PigStorage;

-- compute stats: rhsTokens
statRt_rules = FOREACH merged_rules GENERATE rhsTokens, count;
statRt_rules_grouped = GROUP statRt_rules BY rhsTokens PARALLEL 30;
statRt_totals = FOREACH statRt_rules_grouped GENERATE FLATTEN(group),
    SUM(statRt_rules.count) AS total;
STORE statRt_totals INTO 'RuleProcess.xTCxi.4/stats.rhsTokens'
    using PigStorage;

</pre>