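Replication is not only about reliability. It also helps data locality when
running MapReduce jobs on a larger cluster: more replicas give the scheduler
more nodes on which a task can read its block locally. You can reduce the
replication factor; that's why it's a configurable parameter.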
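
For reference, a minimal sketch of where that parameter is set, assuming a
stock HDFS install configured through hdfs-site.xml. The value 2 is only the
figure from your question, not a recommendation:

```xml
<!-- hdfs-site.xml: default replication factor for newly written files -->
<!-- Assumption: the value 2 is illustrative, taken from the question. -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```

This default applies only to files written after the change. Files already
in HDFS keep their current factor until you change it explicitly, e.g. with
`hadoop fs -setrep -w 2 <path>`, where <path> is a placeholder for your own
directory.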


On Thu, Apr 3, 2014 at 7:10 AM, Fengyun RAO <[email protected]> wrote:

> I know the default replication is 3, which ensures reliability when 2
> nodes crash at the same time.
>
> However, for a small cluster, e.g. 10 to 20 nodes, the probability that 2
> nodes crash at the same time is very small.
>
> Can we simply set the replication to 2, or are there any other drawbacks?
>
> Any information is appreciated!
>
