But please note that the 'dfs.replication' value on the cluster has always been 2, even when the datanode count was 3. And I am pretty sure I did not manually create any files with repl=3. So why were some HDFS files created with repl=3 instead of repl=2?
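A minimal sketch of the likely cause, per Harsh's reply below: 'dfs.replication' is a client-side, per-file setting. Assuming the writing client reaches the cluster via core-site.xml but does not have the cluster's hdfs-site.xml (with dfs.replication=2) on its classpath, it falls back to the built-in default of 3, and every file it creates gets repl=3. The paths here are illustrative only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DefaultReplDemo {
    public static void main(String[] args) throws Exception {
        // Assumption: core-site.xml points this client at the cluster,
        // but the cluster's hdfs-site.xml (dfs.replication=2) is NOT on
        // the classpath, so the client uses the bundled default of 3.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Client-side default replication, not the datanodes' setting;
        // typically prints 3 here:
        System.out.println("client default replication: "
                + fs.getDefaultReplication());

        // Created with the client's default (3), so on a 2-DN cluster
        // one expected replica can never be placed:
        fs.create(new Path("/tmp/created-with-default")).close();

        // Passing the factor explicitly pins it per file:
        fs.create(new Path("/tmp/created-with-repl2"), (short) 2).close();

        fs.close();
    }
}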
2013/8/1 Harsh J <[email protected]>
> The step (a) points to both your problem and its solution. You have files
> being created with repl=3 on a 2-DN cluster, which will prevent
> decommission. This is not a bug.
>
> On Wed, Jul 31, 2013 at 12:09 PM, sam liu <[email protected]> wrote:
> > I opened a JIRA to track this issue:
> > https://issues.apache.org/jira/browse/HDFS-5046
> >
> > 2013/7/2 sam liu <[email protected]>
> >> Yes, the default replication factor is 3. However, in my case it's
> >> strange: while the decommission hung, I found that some blocks' expected
> >> replica count was 3, but the 'dfs.replication' value in hdfs-site.xml on
> >> every cluster node has been 2 since the cluster was set up. Below are my
> >> steps:
> >>
> >> 1. Install a Hadoop 1.1.1 cluster with 2 datanodes, dn1 and dn2, and set
> >>    'dfs.replication' to 2 in hdfs-site.xml.
> >> 2. Add node dn3 into the cluster as a new datanode, without changing the
> >>    'dfs.replication' value in hdfs-site.xml; it stays at 2.
> >>    Note: step 2 passed.
> >> 3. Decommission dn3 from the cluster.
> >>
> >> Expected result: dn3 is decommissioned successfully.
> >> Actual result:
> >> a) The decommission progress hangs and the status stays 'Waiting
> >>    DataNode status: Decommissioned'. But if I execute
> >>    'hadoop dfs -setrep -R 2 /', the decommission continues and finally
> >>    completes.
> >> b) However, if the initial cluster includes >= 3 datanodes, this issue
> >>    is not encountered when adding/removing another datanode. For
> >>    example, if I set up a cluster with 3 datanodes, I can successfully
> >>    add a 4th datanode and then successfully remove it again.
> >>
> >> I suspect it's a bug and plan to open a JIRA against Hadoop HDFS for
> >> this. Any comments?
> >>
> >> Thanks!
> >>
> >> 2013/6/21 Harsh J <[email protected]>
> >>> dfs.replication is a per-file parameter. If you have a client that
> >>> does not use the supplied configs, then its default replication is 3,
> >>> and all files it creates (as part of the app or via a job config)
> >>> will have replication factor 3.
> >>>
> >>> You can do an -lsr to find all files and filter which ones were
> >>> created with a factor of 3 (versus the expected config of 2).
> >>>
> >>> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <[email protected]> wrote:
> >>> > Hi George,
> >>> >
> >>> > Actually, in my hdfs-site.xml I have always set 'dfs.replication'
> >>> > to 2, but I still encounter this issue.
> >>> >
> >>> > Thanks!
> >>> >
> >>> > 2013/6/21 George Kousiouris <[email protected]>
> >>> >> Hi,
> >>> >>
> >>> >> I think I have faced this before. The problem is that you have
> >>> >> replication factor 3, so the decommission seems to hang because 3
> >>> >> nodes are needed to achieve that factor (replicas are not created
> >>> >> on the same node). If you set the replication factor to 2, I think
> >>> >> you will not have this issue. In general, you must make sure that
> >>> >> the replication factor is <= the number of available datanodes.
> >>> >>
> >>> >> BR,
> >>> >> George
> >>> >>
> >>> >> On 6/21/2013 12:29 PM, sam liu wrote:
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I encountered an issue which hangs the decommission operation. The
> >>> >> steps:
> >>> >> 1. Install a Hadoop 1.1.1 cluster with 2 datanodes, dn1 and dn2,
> >>> >>    and set 'dfs.replication' to 2 in hdfs-site.xml.
> >>> >> 2. Add node dn3 into the cluster as a new datanode, without
> >>> >>    changing the 'dfs.replication' value in hdfs-site.xml; it stays
> >>> >>    at 2.
> >>> >>    Note: step 2 passed.
> >>> >> 3. Decommission dn3 from the cluster.
> >>> >>
> >>> >> Expected result: dn3 is decommissioned successfully.
> >>> >>
> >>> >> Actual result: the decommission progress hangs and the status stays
> >>> >> 'Waiting DataNode status: Decommissioned'.
> >>> >>
> >>> >> However, if the initial cluster includes >= 3 datanodes, this issue
> >>> >> is not encountered when adding/removing another datanode.
> >>> >>
> >>> >> Also, after step 2, I noticed that some blocks' expected replica
> >>> >> count was 3, even though the 'dfs.replication' value in
> >>> >> hdfs-site.xml has always been 2!
> >>> >>
> >>> >> Could anyone please help triage this?
> >>> >>
> >>> >> Thanks in advance!
> >>> >>
> >>> >> --
> >>> >> ---------------------------
> >>> >> George Kousiouris, PhD
> >>> >> Electrical and Computer Engineer
> >>> >> Division of Communications,
> >>> >> Electronics and Information Engineering
> >>> >> School of Electrical and Computer Engineering
> >>> >> Tel: +30 210 772 2546
> >>> >> Mobile: +30 6939354121
> >>> >> Fax: +30 210 772 2569
> >>> >> Email: [email protected]
> >>> >> Site: http://users.ntua.gr/gkousiou/
> >>> >>
> >>> >> National Technical University of Athens
> >>> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
> >>>
> >>> --
> >>> Harsh J
>
> --
> Harsh J
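For reference, a rough sketch of the check Harsh suggests (walk the namespace as '-lsr' would and filter for files created with a factor above what the cluster can host), combined with the per-file equivalent of 'hadoop dfs -setrep -R 2 /'. Written against the Hadoop 1.x FileSystem API; the MAX_REPL constant and class name are illustrative, not part of Hadoop.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FindOverReplicated {
    // Replicas a 2-DN cluster can actually host: HDFS never places two
    // replicas of the same block on one datanode.
    static final short MAX_REPL = 2;

    public static void main(String[] args) throws Exception {
        // Assumes the cluster's config files are on the classpath here.
        FileSystem fs = FileSystem.get(new Configuration());
        walk(fs, new Path("/"));
        fs.close();
    }

    static void walk(FileSystem fs, Path dir) throws Exception {
        for (FileStatus st : fs.listStatus(dir)) {
            if (st.isDir()) {
                walk(fs, st.getPath());
            } else if (st.getReplication() > MAX_REPL) {
                // Report the offending file, then lower its factor; this
                // is what 'hadoop dfs -setrep -R 2 /' does wholesale.
                System.out.println(st.getReplication() + "  " + st.getPath());
                fs.setReplication(st.getPath(), MAX_REPL);
            }
        }
    }
}

Once no file expects more replicas than the remaining datanodes can hold, the namenode can re-replicate every block off the decommissioning node, which is why the decommission then completes.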
