Thanks Daniel for the comments.

>> PIG-1270 is to solve 2, but performance test does not show improvement

This puts a restriction on the PigRecordReader itself and prevents mappers
from reading more data. Isn't that supposed to improve performance? What
data size did you use?

If this patch is compatible with 0.9, I can try it on my cluster.

On Mon, Sep 12, 2011 at 11:14 AM, Daniel Dai <[email protected]> wrote:

> Two ways to optimize:
> 1. Launching less maps
> 2. For each map, stop earlier
>
> PIG-1270 is to solve 2, but performance test does not show improvement. For
> 1, in extreme case, such as 2T data only contains 100 records, launching all
> maps is necessary. Pig currently does not probe the input data before
> launching map-reduce jobs. Maybe we can launch fewer maps as initial guess
> and launch all maps if guess fail. Thoughts?
>
> Daniel
>
> On Sun, Sep 11, 2011 at 10:13 PM, Rajesh Balamohan <[email protected]> wrote:
>
> > I have a large data set (> 2 TB) and I tried scanning 100 records from it.
> >
> > a = load '/usr/largedata/' using PigStorage(',');
> > b = limit a 100;
> > dump b;
> >
> > >>>>
> > 2011-09-11 21:56:34,262 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
> > 2011-09-11 21:56:34,414 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
> > 2011-09-11 21:56:34,483 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
> > 2011-09-11 21:56:34,484 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
> > >>>>
> >
> > This ends up launching a MR job with 20,000+ Maps and a single reducer.
> >
> > Is it possible for PIG to analyze such cases and realistically scan only
> > 100 rows (rather than scanning the entire data and emitting 100 rows?).
> >
> > This is on PIG 0.9.
> >
> > --
> > ~Rajesh.B

--
~Rajesh.B
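
P.S. To make the "launch fewer maps as initial guess and launch all maps if
guess fail" idea a bit more concrete, here is a rough sketch in plain Python.
It is only a simulation and not Pig or Hadoop code: each input split is
modeled as an in-memory list of records, and the names limited_scan,
initial_batch and growth are hypothetical, chosen just for illustration. The
assumption is that maps are launched in waves, each wave larger than the
last, and that launching stops once the limit is satisfied.

```python
from typing import List, Sequence, Tuple


def limited_scan(
    splits: Sequence[Sequence[str]],
    limit: int,
    initial_batch: int = 1,   # hypothetical knob: size of the first wave of maps
    growth: int = 4,          # hypothetical knob: how fast each wave grows
) -> Tuple[List[str], int]:
    """Simulate a LIMIT query that launches maps in growing waves.

    Returns the collected records and the number of splits launched.
    This is an illustrative model, not how Pig schedules map tasks.
    """
    if limit <= 0:
        return [], 0

    records: List[str] = []
    next_split = 0
    batch = max(1, initial_batch)

    while next_split < len(splits) and len(records) < limit:
        end = min(next_split + batch, len(splits))
        # One "wave": every split in [next_split, end) counts as launched,
        # even if the limit is reached partway through the wave.
        for split in splits[next_split:end]:
            for record in split:
                records.append(record)
                if len(records) == limit:
                    break
            if len(records) == limit:
                break
        next_split = end
        # Grow the wave so sparse inputs still finish in few rounds.
        batch = max(1, batch * growth)

    return records, next_split


if __name__ == "__main__":
    dense = [["row"] * 50 for _ in range(200)]
    rows, launched = limited_scan(dense, 100)
    print(len(rows), "records from", launched, "of", len(dense), "splits")
```

With dense input only a handful of splits are touched. In the extreme case
Daniel describes, where the matching records sit in the last split, the waves
grow until every split has been launched, so the cost is a few extra rounds
rather than a wrong answer.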
