The reason for replication also has to do with data locality when running MapReduce jobs in a larger cluster: more replicas mean more nodes where a task can read its input locally. You can reduce the replication; that's why it's a configurable parameter.
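Here is a rough back-of-the-envelope sketch of the trade-off in the question below. It assumes that each block's replicas land on distinct nodes chosen uniformly at random and independently per block. That is a simplification, because real HDFS placement is rack-aware. The 1,000-block count is a hypothetical figure, not a measurement.

```python
from math import comb


def block_loss_probability(nodes: int, failed: int, replication: int) -> float:
    """Chance that one block loses every replica when `failed` of `nodes`
    machines are down at once.

    Assumption (simplified, not HDFS's real rack-aware policy): the block's
    `replication` replicas sit on distinct nodes chosen uniformly at random.
    """
    if failed < replication:
        # Fewer failures than replicas: at least one copy survives.
        return 0.0
    # All replica holders must fall inside the failed set:
    # C(failed, r) favourable r-subsets out of C(nodes, r) possible ones.
    return comb(failed, replication) / comb(nodes, replication)


def any_block_lost_probability(nodes: int, failed: int,
                               replication: int, blocks: int) -> float:
    """Chance that at least one of `blocks` independently placed blocks is lost."""
    p = block_loss_probability(nodes, failed, replication)
    return 1.0 - (1.0 - p) ** blocks


if __name__ == "__main__":
    # Hypothetical 10-node cluster, 2 simultaneous failures, 1,000 blocks.
    for r in (2, 3):
        print(f"replication={r}: "
              f"one block lost p={block_loss_probability(10, 2, r):.4f}, "
              f"any of 1000 lost p={any_block_lost_probability(10, 2, r, 1000):.6f}")
```

Under these assumptions, two simultaneous failures in a 10-node cluster lose a given block with probability 1/45 at replication 2. Across 1,000 blocks, some data is almost certainly lost. At replication 3, two failures cannot lose any block.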
On Thu, Apr 3, 2014 at 7:10 AM, Fengyun RAO <[email protected]> wrote:
> I know the default replication is 3, which ensures reliability when 2
> nodes crash at the same time.
>
> However, for a small cluster, e.g. 10-20 nodes, the probability that 2
> nodes crash at the same time is very small.
>
> Can we simply set the replication to 2, or are there any other drawbacks?
>
> Any information is appreciated!
