Dmitriy V. Ryaboy commented on PIG-1062:


bq. sending a special tuple with number of rows in the split will likely lead 
to more accurate estimate of number of reducers required.

You can get the same info from the counters without unnecessarily complicating 
tuple processing, imo. In fact, you can use (num bytes read / num records read) 
to get the old calculation, and not rely on the number of samples and local 
average size estimates.
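
To make the arithmetic concrete, here is a minimal sketch of the ratio above. 
The function names and counter values are hypothetical, and the code does not 
call any Hadoop or Pig API; it assumes the two counter values have already been 
read from the job. The extrapolation to a total row count is likewise an 
assumption about how the ratio might be used, not something the patch does.

```python
def average_record_size(bytes_read: int, records_read: int) -> float:
    """Average bytes per record: (num bytes read / num records read)."""
    if records_read <= 0:
        raise ValueError("records_read must be positive")
    return bytes_read / records_read


def estimate_total_records(total_input_bytes: int,
                           bytes_read: int,
                           records_read: int) -> int:
    """Assumed use of the ratio: extrapolate the row count of the full input
    by dividing its size by the average record size seen so far."""
    avg_size = average_record_size(bytes_read, records_read)
    return round(total_input_bytes / avg_size)


# Hypothetical counter values, for illustration only.
avg = average_record_size(bytes_read=1_000_000, records_read=4_000)
rows = estimate_total_records(total_input_bytes=10_000_000,
                              bytes_read=1_000_000,
                              records_read=4_000)
print(avg, rows)
```

Both inputs come from counters, so nothing extra has to travel through the 
tuple stream.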

> load-store-redesign branch: change SampleLoader and subclasses to work with 
> new LoadFunc interface 
> ---------------------------------------------------------------------------------------------------
>                 Key: PIG-1062
>                 URL: https://issues.apache.org/jira/browse/PIG-1062
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
> This is part of the effort to implement new load store interfaces as laid out 
> in http://wiki.apache.org/pig/LoadStoreRedesignProposal .
> PigStorage and BinStorage are now working.
> SampleLoader and its subclasses (RandomSampleLoader, PoissonSampleLoader) need 
> to be changed to work with the new LoadFunc interface.
> Fixing SampleLoader and RandomSampleLoader will get order-by queries working.
> PoissonSampleLoader is used by skew join. 

