Version: hadoop-2.2.0

Our HDFS cluster had 13 nodes, and we wanted to decommission 7 of them. We tried two methods, as follows:
Method 1: First, we set the dfs.hosts.exclude parameter and successfully decommissioned the 7 nodes, which left many under-replicated blocks to be replicated. However, after about 20 hours the replication still had not finished. The replication speed we observed was very slow.

Method 2: Later, we gave up on that method and instead stopped the datanodes one at a time. We stopped one datanode, waited until its under-replicated blocks had finished replicating, and then stopped the next datanode, until all 7 nodes were stopped. This took about 12 hours, and the replication was clearly much faster than with Method 1.

We expected Method 1 to be faster than Method 2, but in practice Method 2 was much faster. Why?
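
For readers unfamiliar with Method 1, an exclude-file setup typically looks something like the sketch below. This is only an illustration: the file path is a placeholder assumption and is not taken from the cluster described above.

```xml
<!-- hdfs-site.xml on the NameNode: a minimal sketch, not the actual cluster config -->
<property>
  <name>dfs.hosts.exclude</name>
  <!-- Placeholder path (assumption). The file lists one datanode hostname
       per line for each node that should be decommissioned. -->
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```

After the hostnames are added to that file, running `hdfs dfsadmin -refreshNodes` asks the NameNode to re-read it, and `hdfs dfsadmin -report` shows the decommission status of each datanode.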
