Hi,
I think I have faced this before. The problem is that you have a
replication factor of 3, so the decommission seems to hang because it
needs 3 nodes to satisfy that factor (replicas of a block are not
created on the same node). If you set the replication factor to 2, I
think you will not have this issue. So in general you must make sure
that the replication factor is <= the number of available datanodes.
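
To make the rule concrete, here is a minimal Python sketch. It is only a toy model of the constraint described above, not the NameNode's actual logic; the function name and the block IDs are made up for illustration.

```python
from typing import Dict, List


def blocks_blocking_decommission(
    expected_replicas: Dict[str, int],
    total_datanodes: int,
    decommissioning: int = 1,
) -> List[str]:
    """Return IDs of blocks that expect more replicas than there are
    datanodes left once the decommissioning nodes are gone.

    Toy model: each replica of a block must live on a different
    datanode, so a block expecting r replicas needs at least r nodes.
    """
    remaining = total_datanodes - decommissioning
    return sorted(
        block_id
        for block_id, replicas in expected_replicas.items()
        if replicas > remaining
    )


# Scenario from this thread: 3 datanodes, one of them (dn3) leaving.
# Block IDs and replica counts are hypothetical.
blocks = {"blk_1": 2, "blk_2": 3}
print(blocks_blocking_decommission(blocks, total_datanodes=3))  # ['blk_2']
```

In this model, any block that still expects 3 replicas cannot be satisfied by the 2 remaining datanodes, which would match the hang you are seeing.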
BR,
George
On 6/21/2013 12:29 PM, sam liu wrote:
Hi,
I encountered an issue where the decommission operation hangs. Steps to
reproduce:
1. Install a Hadoop 1.1.1 cluster with 2 datanodes, dn1 and dn2, and
set 'dfs.replication' to 2 in hdfs-site.xml
2. Add node dn3 to the cluster as a new datanode, leaving the
'dfs.replication' value in hdfs-site.xml unchanged at 2
note: step 2 passed
3. Decommission dn3 from the cluster
Expected result: dn3 could be decommissioned successfully
Actual result: the decommission hangs and the status always stays at
'Waiting DataNode status: Decommissioned'
However, if the initial cluster includes >= 3 datanodes, this issue
does not occur when adding or removing another datanode.
Also, after step 2, I noticed that some blocks have an expected replica
count of 3, even though the 'dfs.replication' value in hdfs-site.xml
is still 2!
Could anyone please help triage this?
Thanks in advance!
--
---------------------------
George Kousiouris, PhD
Electrical and Computer Engineer
Division of Communications,
Electronics and Information Engineering
School of Electrical and Computer Engineering
Tel: +30 210 772 2546
Mobile: +30 6939354121
Fax: +30 210 772 2569
Email: [email protected]
Site: http://users.ntua.gr/gkousiou/
National Technical University of Athens
9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece