version: hadoop-2.2.0

There were 13 nodes in our HDFS cluster, and we wanted to decommission 7 of 
them. We tried two methods, as follows:

Method 1:
At first, we set the dfs.hosts.exclude parameter and successfully 
decommissioned the 7 nodes, which left many under-replicated blocks to be 
re-replicated. However, after about 20 hours the replication still had not 
finished. We observed that the replication speed was very slow.
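
For context, an exclude-file setup of this kind usually looks something like 
the fragment below in hdfs-site.xml. This is only a sketch: the path 
/etc/hadoop/conf/dfs.exclude is a placeholder, not our actual path. The file 
lists one datanode hostname per line, and the NameNode re-reads it when 
`hdfs dfsadmin -refreshNodes` is run.

```xml
<!-- hdfs-site.xml on the NameNode (illustrative; the path is an assumed placeholder) -->
<property>
  <name>dfs.hosts.exclude</name>
  <!-- Local file listing the datanodes to decommission, one hostname per line -->
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```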

Method 2:
Later, we gave up on that method and instead stopped the datanodes one at a 
time. We stopped one datanode and, once the under-replicated blocks from that 
node had finished replicating, stopped the next one, until all 7 nodes were 
stopped. This took about 12 hours, and replication was clearly much faster 
than with method 1.

We expected method 1 to be faster than method 2, but in practice method 2 was 
much faster than method 1. Why?
