Thanks! I guess it's time to move to trunk then! ~Shubham.
On Thu, Jun 16, 2011 at 3:13 PM, Dmitriy Ryaboy <[email protected]> wrote:
> I've confirmed this behavior in 0.8.1 and the fact that it's fixed in
> trunk (didn't check 9).
>
> On Thu, Jun 16, 2011 at 12:00 PM, Shubham Chopra <[email protected]> wrote:
> > Hi Daniel,
> >
> > I am seeing this behaviour with 0.8.1.
> >
> > Consider an input file named 'a' containing the following:
> > 1|2|3
> > 3||4
> >
> > I start Pig in local mode and then use the following script:
> > a = load 'a' using PigStorage('|');
> > b = group a by $0;
> > c = foreach b generate 'Test' as name, flatten(group), SUM(a.$0) as s0,
> >     SUM(a.$1) as s1, SUM(a.$2) as s2;
> > dump c;
> >
> > The above script does not use the combiner.
> >
> > However, the following script does:
> > a = load 'a' using PigStorage('|');
> > b = group a by $0;
> > c = foreach b generate flatten(group), SUM(a.$0) as s0, SUM(a.$1) as s1,
> >     SUM(a.$2) as s2;
> > dump c;
> >
> > This script uses the combiner.
> >
> > I pinpointed the difference to using or not using a constant in the
> > foreach statement. Is this expected behavior? I thought the decision to
> > use a combiner depended on the UDFs implementing the Algebraic
> > interface. Why is the constant projection stopping the combiner from
> > being used?
> >
> > Thanks,
> > Shubham.
> >
> > On Thu, Jun 16, 2011 at 2:38 PM, Daniel Dai <[email protected]> wrote:
> >> Do you mean "d = group c by (var1, var2);"? If so, I can see the
> >> combiner being used. Which version of Pig are you using?
> >>
> >> Daniel
> >>
> >> On 06/16/2011 11:13 AM, Shubham Chopra wrote:
> >>> Hi,
> >>>
> >>> My Pig query is roughly the following:
> >>>
> >>> register some_lib.jar
> >>> a = load 'somefile' using CustomUDF();
> >>> b = foreach a generate CustomProjectionUDF();
> >>> c = foreach b generate var1, var2, var3;
> >>> d = group b by (var1, var2);
> >>> e = foreach d generate flatten(group), SUM(c.var1), SUM(c.var2),
> >>>     SUM(c.var3);
> >>> store e into 'file';
> >>>
> >>> I was expecting to see the combiner being used, but the optimizer did
> >>> not use one. The following is the output I see (version 0.8.1):
> >>>
> >>> INFO executionengine.HExecutionEngine: pig.usenewlogicalplan is set to true.
> >>> New logical plan will be used.
> >>> INFO executionengine.HExecutionEngine: (Name: agg:
> >>> Store(hdfs://machine:9000/SomeFile:PigStorage('|')) - scope-4353 Operator
> >>> Key: scope-4353)
> >>> INFO mapReduceLayer.MRCompiler: File concatenation threshold: 100
> >>> optimistic? false
> >>> INFO mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
> >>> INFO mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
> >>> INFO mapReduceLayer.AccumulatorOptimizer: Reducer is to run in
> >>> accumulative mode.
> >>> INFO pigstats.ScriptState: Pig script settings are added to the job
> >>> INFO mapReduceLayer.JobControlCompiler: BytesPerReducer=1000000000
> >>> maxReducers=999 totalInputFileSize=611579950
> >>> INFO mapReduceLayer.JobControlCompiler: Neither PARALLEL nor default
> >>> parallelism is set for this job. Setting number of reducers to 1
> >>> INFO mapReduceLayer.MapReduceLauncher: 1 map-reduce job(s) waiting for
> >>> submission.
> >>>
> >>> How can I enforce the use of combiner here?
> >>>
> >>> Thanks,
> >>> Shubham.
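
[Editor's note: the thread turns on SUM being "algebraic", i.e. decomposable into initial/intermediate/final stages so partial sums can run in a combiner. The sketch below is a minimal Python simulation of that decomposition, not Pig's actual Algebraic interface (which in Java exposes getInitial(), getIntermed(), and getFinal()); the function names and data are illustrative assumptions.]

```python
# Hypothetical simulation of an algebraic aggregate (SUM).
# Because merging partial sums gives the same answer as summing
# everything at once, Pig can safely run the intermediate step
# in a map-side combiner.

def initial(values):
    # Map side: collapse one group's local values to a partial sum.
    return sum(values)

def intermediate(partials):
    # Combiner: merge partial sums from several map outputs.
    return sum(partials)

def final(partials):
    # Reducer: merge remaining partials into the final result.
    return sum(partials)

# Two simulated map tasks emit partials for the same group key
# (column $0 of the sample rows 1|2|3 and 3||4):
map1 = initial([1, 3])
map2 = initial([])            # an empty split contributes 0
combined = intermediate([map1, map2])
print(final([combined]))      # same as sum([1, 3]) computed in one pass
```

A non-algebraic expression in the same foreach (here, the constant projection 'Test' in Pig 0.8.1) can make the optimizer fall back to computing everything in the reducer, which matches the behavior reported above.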
