Do you mean "d = group c by (var1, var2);"? If so, I can see the combiner being used. Which version of Pig are you using?

Daniel

On 06/16/2011 11:13 AM, Shubham Chopra wrote:
Hi,

My pig query is roughly the following:

register some_lib.jar
a = load 'somefile' using CustomUDF();
b = foreach a generate CustomProjectionUDF();
c = foreach b generate var1, var2, var3;
d = group b by (var1, var2);
e = foreach d generate flatten(group), SUM(c.var1), SUM(c.var2), SUM(c.var3);
store e into 'file';

I was expecting the combiner to be used, but the optimizer did not use one. This is the output I see (version 0.8.1):
INFO executionengine.HExecutionEngine: pig.usenewlogicalplan is set to true. New logical plan will be used.
INFO executionengine.HExecutionEngine: (Name: agg: Store(hdfs://machine:9000/SomeFile:PigStorage('|')) - scope-4353 Operator Key: scope-4353)
INFO mapReduceLayer.MRCompiler: File concatenation threshold: 100 optimistic? false
INFO mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
INFO mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
INFO mapReduceLayer.AccumulatorOptimizer: Reducer is to run in accumulative mode.
INFO pigstats.ScriptState: Pig script settings are added to the job
INFO mapReduceLayer.JobControlCompiler: BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=611579950
INFO mapReduceLayer.JobControlCompiler: Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
INFO mapReduceLayer.MapReduceLauncher: 1 map-reduce job(s) waiting for submission.

How can I enforce the use of combiner here?

Thanks,
Shubham.
