Thanks! I guess it's time to move to the trunk then!

~Shubham.
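
[Archive note: the thread below turns on Pig's rule that the combiner fires only when every aggregate is algebraic, i.e. computable from mergeable partial results. The sketch below illustrates that three-stage decomposition for SUM in plain Python; the function names mirror the initial/intermediate/final stages of Pig's Algebraic interface, but this is illustrative code, not Pig's actual Java API.]

```python
# Why SUM is "algebraic" and therefore combiner-friendly: partial sums
# computed map-side can be merged any number of times before the reducer,
# and the final answer is independent of how the input was split.
# Stage names are illustrative, modeled on Pig's Algebraic interface.

def initial(values):
    # map stage: fold raw input values into one partial sum
    return sum(values)

def intermed(partials):
    # combiner stage: merge partial sums (may run zero or more times)
    return sum(partials)

def final(partials):
    # reduce stage: merge whatever partials remain into the answer
    return sum(partials)

# Any split of the input yields the same result as a single pass:
data = [1, 3, 2, 3]
one_pass = final([initial(data)])
combined = final([intermed([initial([1, 3]), initial([2, 3])])])
assert one_pass == combined == 9
```

A non-algebraic expression in the same foreach (or, as the 0.8.1 bug below shows, even an unrelated constant projection) forces all raw values to the reducer instead.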

On Thu, Jun 16, 2011 at 3:13 PM, Dmitriy Ryaboy <[email protected]> wrote:

> I've confirmed this behavior in 8.1 and the fact that it's fixed in
> trunk (didn't check 9).
>
>
> On Thu, Jun 16, 2011 at 12:00 PM, Shubham Chopra
> <[email protected]> wrote:
> > Hi Daniel,
> >
> > I am seeing this behaviour with 0.8.1.
> >
> > Consider an input file named a containing the following:
> > 1|2|3
> > 3||4
> >
> > I start pig in the local mode and then use the following script:
> > a = load 'a' using PigStorage('|');
> > b = group a by $0;
> > c = foreach b generate 'Test' as name, flatten(group), SUM(a.$0) as s0,
> > SUM(a.$1) as s1, SUM(a.$2) as s2;
> > dump c;
> >
> > The above script does not use the combiner.
> >
> > However, the following script does:
> > a = load 'a' using PigStorage('|');
> > b = group a by $0;
> > c = foreach b generate flatten(group), SUM(a.$0) as s0, SUM(a.$1) as s1,
> > SUM(a.$2) as s2;
> > dump c;
> >
> > This script uses the combiner.
> >
> > I pinpointed the difference to using or not using a constant in the
> > foreach statement. Is this expected behavior? I was thinking the decision
> > to use a combiner depends on UDFs implementing the Algebraic interface.
> > Why is the constant projection stopping the combiner from being used?
> >
> > Thanks,
> > Shubham.
> >
> > On Thu, Jun 16, 2011 at 2:38 PM, Daniel Dai <[email protected]> wrote:
> >
> >> Do you mean "d = group c by (var1, var2);"? If so, I can see the
> >> combiner being used. Which version of Pig are you using?
> >>
> >> Daniel
> >>
> >>
> >> On 06/16/2011 11:13 AM, Shubham Chopra wrote:
> >>
> >>> Hi,
> >>>
> >>> My pig query is roughly the following:
> >>>
> >>> register some_lib.jar
> >>> a = load 'somefile' using CustomUDF();
> >>> b = foreach a generate CustomProjectionUDF();
> >>> c = foreach b generate var1, var2, var3;
> >>> d = group b by (var1, var2);
> >>> e = foreach d generate flatten(group), SUM(c.var1), SUM(c.var2),
> >>> SUM(c.var3);
> >>> store e into 'file';
> >>>
> >>> I was expecting to see the combiner being used, but the optimizer did
> >>> not use one. The following is the output I see (version 0.8.1):
> >>> INFO executionengine.HExecutionEngine: pig.usenewlogicalplan is set to
> >>> true. New logical plan will be used.
> >>> INFO executionengine.HExecutionEngine: (Name: agg:
> >>> Store(hdfs://machine:9000/SomeFile:PigStorage('|')) - scope-4353
> >>> Operator Key: scope-4353)
> >>> INFO mapReduceLayer.MRCompiler: File concatenation threshold: 100
> >>> optimistic? false
> >>> INFO mapReduceLayer.MultiQueryOptimizer: MR plan size before
> >>> optimization: 1
> >>> INFO mapReduceLayer.MultiQueryOptimizer: MR plan size after
> >>> optimization: 1
> >>> INFO mapReduceLayer.AccumulatorOptimizer: Reducer is to run in
> >>> accumulative mode.
> >>> INFO pigstats.ScriptState: Pig script settings are added to the job
> >>> INFO mapReduceLayer.JobControlCompiler: BytesPerReducer=1000000000
> >>> maxReducers=999 totalInputFileSize=611579950
> >>> INFO mapReduceLayer.JobControlCompiler: Neither PARALLEL nor default
> >>> parallelism is set for this job. Setting number of reducers to 1
> >>> INFO mapReduceLayer.MapReduceLauncher: 1 map-reduce job(s) waiting for
> >>> submission.
> >>>
> >>> How can I enforce the use of combiner here?
> >>>
> >>> Thanks,
> >>> Shubham.
> >>>
> >>
> >>
> >
>
