Hello, I use hadoop 2.7.3 non-HA setup with hbase 1.2.3 on top of it.
I'm trying to understand these options in hdfs-site.xml: dfs.replication 3 Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time. dfs.namenode.replication.min 1 Minimal block replication. What I'm trying to do is to make sure that on write we always end up with 2 replicas minimum. In other words, a write should fail if we don't end up with 2 replicas of each block. As I understand, on write, hadoop creates a write pipeline of datanodes where each datanode writes to the next one. Here's a diagram from Cloudera: [image: Inline image 1] Is it correct to say that the dfs.namenode.replication.min option controls how many datanodes in the pipeline must have COMPLETEd the block in order to consider a write successful and then acks to the client about success? And dfs.replication option means that we eventually want to have this many replicas of each block, but it doesn't need to be done at the write time but could be done asynchronously later by the Namenode? So, essentially, if I want a guarantee that I have one back up of each block at all times, I need to set to dfs.namenode.replication.min=2. And, if I want to make sure that I won't go into safemode on startup too often, I should set dfs.replication = 3 to tolerate one replica loss.
