Amr Awadallah commented on PIG-856:

Please keep in mind that on a loaded system (i.e. one with many
concurrent jobs), the fair scheduler has a better chance of allocating
mappers that process local data for your job if there are more replicas
(I'm not sure whether the capacity scheduler does the same). So while
setting the replication factor below 3 might improve performance when yours
is the only job running on the system, it will hurt performance when you are
sharing the cluster with many other jobs.
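A rough way to see the locality effect: under a simplified model (an assumption for illustration, not a measurement from a real cluster), put a block's replicas on distinct nodes chosen uniformly at random and ask how likely it is that at least one of the nodes with a free map slot holds a copy. The function name and parameters are hypothetical, and rack awareness and real scheduler behavior are ignored.

```python
from math import comb


def p_local_slot(num_nodes: int, replicas: int, free_nodes: int) -> float:
    """Hypothetical model: probability that at least one of `free_nodes`
    nodes (chosen uniformly at random) holds a replica of a block whose
    `replicas` copies sit on distinct, uniformly random nodes.
    Rack awareness and real scheduler behavior are ignored."""
    if not 0 <= replicas <= num_nodes or not 0 <= free_nodes <= num_nodes:
        raise ValueError("replicas and free_nodes must be in [0, num_nodes]")
    # Ways to pick the free nodes entirely from nodes WITHOUT a replica,
    # divided by all ways to pick them; math.comb returns 0 when k > n.
    p_miss = comb(num_nodes - replicas, free_nodes) / comb(num_nodes, free_nodes)
    return 1.0 - p_miss


if __name__ == "__main__":
    for r in (1, 2, 3):
        print(f"replicas={r}: P(local slot) = {p_local_slot(100, r, 10):.3f}")
```

With 100 nodes and 10 of them free, this toy model gives roughly 0.10, 0.19 and 0.27 for 1, 2 and 3 replicas, so each replica dropped noticeably lowers the odds of finding a data-local slot.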

Not to mention that this also affects speculative execution, etc.

-- amr

> PERFORMANCE: reduce number of replicas
> --------------------------------------
>                 Key: PIG-856
>                 URL: https://issues.apache.org/jira/browse/PIG-856
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Olga Natkovich
> Currently Pig uses the default replication factor (3) for the temporary
> data passed between MR jobs. Given that this data is temporary, we should
> never need more than 2 replicas, and we should set the factor explicitly
> to improve performance and to reduce load on the NameNode.

This message is automatically generated by JIRA.