Hello, I am decommissioning data nodes for an OS upgrade on an HPC cluster. Currently, users can run jobs that read data stored on /hdfs; they can reach every datanode/compute node except the one being decommissioned.
Is this safe to do? Will files edited during this window affect the node being decommissioned? My procedure has been: add the node to /usr/lib/hadoop-0.20/conf/hosts_exclude, run 'hadoop dfsadmin -refreshNodes' on the namenode, and then wait for the namenode logs to report that decommissioning is complete. After the upgrade, I remove the node from hosts_exclude and start Hadoop again on the datanode.

Also: in the namenode web interface I just noticed that a node I decommissioned previously now shows 0 for Configured Capacity, Used, and Remaining space, and is reported as 100% used. I used the same /etc/sysconfig/hadoop file as before the upgrade, removed the node from hosts_exclude, and ran '-refreshNodes' afterwards. What steps have I missed, either in the decommissioning process or while bringing the data node back online?
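For reference, here is roughly the sequence I have been running. The hostname node042.example.com is a stand-in for the real node, and the datanode init-script name is a guess based on my packaging; dfs.hosts.exclude in hdfs-site.xml already points at the exclude file:

    # On the namenode: mark the node for decommissioning
    echo "node042.example.com" >> /usr/lib/hadoop-0.20/conf/hosts_exclude
    hadoop dfsadmin -refreshNodes

    # Watch until the node is listed as "Decommissioned" rather than
    # "Decommission in progress"
    hadoop dfsadmin -report

    # ... OS upgrade happens on the datanode here ...

    # Bring the node back: drop it from the exclude file and refresh again
    sed -i '/node042.example.com/d' /usr/lib/hadoop-0.20/conf/hosts_exclude
    hadoop dfsadmin -refreshNodes

    # On the datanode itself (init-script name guessed from my install):
    service hadoop-0.20-datanode start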
