Thanks, Peyman! I know it's configurable. What I don't know is whether it's typical to reduce it in a small cluster,
or whether there is any recommended setting, such as 2 for a 10-node cluster, 3 for a 100-node cluster, 4 for a 1000-node cluster. Or should it just be set to 3 no matter how big the cluster is?

2014-04-03 21:13 GMT+08:00 Peyman Mohajerian <[email protected]>:

> The reason for replication also has to do with data locality in a larger
> cluster for running map-reduce jobs. You can reduce the replication,
> that's why it's a configurable parameter.
>
>
> On Thu, Apr 3, 2014 at 7:10 AM, Fengyun RAO <[email protected]> wrote:
>
>> I know the default replication is 3, which ensures reliability when 2
>> nodes crash at the same time.
>>
>> However, for a small cluster, e.g. 10~20 nodes, the probability that 2
>> nodes crash at the same time is very small.
>>
>> Can we simply set the replication to 2, or are there any other drawbacks?
>>
>> Any information is appreciated!
>>
>
>
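
To make the trade-off in this thread concrete, here is a rough back-of-the-envelope sketch of how much data would be at risk if several nodes fail at once. It is only an illustration under simplifying assumptions that are not from the thread: replicas of each block are placed uniformly at random on distinct nodes (rack awareness is ignored), and the failures happen before any re-replication can run. The function name `fraction_of_blocks_lost` is hypothetical.

```python
from math import comb


def fraction_of_blocks_lost(nodes: int, replication: int, failed: int) -> float:
    """Probability that every replica of a given block sits on a failed node.

    Assumptions (simplifications, not actual HDFS placement policy):
    - replicas are placed uniformly at random on distinct nodes;
    - all `failed` nodes go down before re-replication can happen.
    """
    if not 0 < replication <= nodes:
        raise ValueError("replication must be between 1 and the node count")
    if not 0 <= failed <= nodes:
        raise ValueError("failed must be between 0 and the node count")
    # Ways to put all replicas on failed nodes, over all possible placements.
    # math.comb returns 0 when replication > failed, i.e. the block survives.
    return comb(failed, replication) / comb(nodes, replication)


if __name__ == "__main__":
    for nodes in (10, 20, 100):
        for replication in (2, 3):
            lost = fraction_of_blocks_lost(nodes, replication, failed=2)
            print(f"nodes={nodes} replication={replication} "
                  f"fraction lost with 2 failed nodes={lost:.4f}")
```

Under these assumptions, with 10 nodes and replication 2, two simultaneous failures would lose 1 in 45 blocks (about 2.2%), while replication 3 loses none. So the question is less whether two nodes are likely to fail together and more how much data loss is acceptable if they do.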
