Milind Bhandarkar commented on PIG-856:

+1 on seeing performance differences. But is there code in Pig to determine 
that the output of a previous map-reduce stage is inaccessible because of 
datanode failures (as opposed to some other reason), and to repeat that 
map-reduce stage? With a replication factor of 1, a single datanode failure 
makes temporary data unavailable, and such a failure is very likely during 
long-running queries.

> PERFORMANCE: reduce number of replicas
> --------------------------------------
>                 Key: PIG-856
>                 URL: https://issues.apache.org/jira/browse/PIG-856
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Olga Natkovich
> Pig currently uses the default number of replicas between MR jobs, which 
> is 3. Given the temporary nature of the data, we should never need more 
> than 2 and should explicitly set it to improve performance and to be nicer 
> to the name node.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
