Olga Natkovich resolved PIG-1081.

    Resolution: Duplicate

Duplicate of PIG-1084.

> PigCookBook use of PARALLEL keyword
> -----------------------------------
>                 Key: PIG-1081
>                 URL: https://issues.apache.org/jira/browse/PIG-1081
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 0.5.0
>            Reporter: Viraj Bhat
>             Fix For: 0.5.0
> Hi all,
> I am looking at the tips for optimizing Pig programs (Pig Cookbook) that cover
> the PARALLEL keyword:
> http://hadoop.apache.org/pig/docs/r0.5.0/cookbook.html#Use+PARALLEL+Keyword
> Pig 0.5 currently runs on Hadoop 0.20, which by default launches a single
> reducer for every job.
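> For illustration, here is a minimal Pig Latin sketch (file and field names
> are hypothetical) of how PARALLEL overrides that default for a reduce-side
> operation:
>
>     A = LOAD 'input.txt' AS (name:chararray, cnt:int);
>     -- GROUP forms a Map-Reduce boundary; PARALLEL sets its reducer count
>     B = GROUP A BY name PARALLEL 20;
>     DUMP B;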
> The documentation currently recommends setting the number of reducers to:
> <num machines> * <num reduce slots per machine> * 0.9
> This advice was valid for HoD (Hadoop on Demand), where each user creates a
> private Hadoop cluster. But on a shared cluster running either the Capacity
> Scheduler
> (http://hadoop.apache.org/common/docs/current/capacity_scheduler.html) or the
> Fair Scheduler
> (http://hadoop.apache.org/common/docs/current/fair_scheduler.html), following
> it means a single job could occupy around 90% of the cluster's reduce slots.
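> To make that concrete (the numbers are hypothetical): on 10 machines with 2
> reduce slots each, the formula yields 10 * 2 * 0.9 = 18 reducers. On a
> private HoD cluster that is fine, but on a shared cluster a single Pig job
> would then claim roughly 90% of everyone's reduce slots.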
> We should change this to something like:
> The number of reducers needed for a particular Pig construct that forms a
> Map-Reduce boundary depends entirely on your data and the number of
> intermediate keys your mappers generate. In the best cases we have seen, a
> reducer processing about 500 MB of data performs efficiently. It is hard to
> define an optimal number of reducers, since it depends completely on the
> partitioner and the distribution of the map (and combiner) output keys.
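> As a sketch of that 500 MB heuristic (input size and relation names are
> hypothetical): for roughly 10 GB of intermediate data, 10240 MB / 500 MB
> gives about 20 reducers, which in Pig Latin would look like:
>
>     -- ~10 GB of map output / ~500 MB per reducer => about 20 reducers
>     C = JOIN A BY name, B BY name PARALLEL 20;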
> Viraj
