Occasions where I do not see the node go into the decommissioned state are when the replication factor (dfs.replication) is equal to or greater than the number of DataNodes that are active.
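If you want to rule that case out before kicking off a decommission, you can compare the configured replication factor with the live DataNode count from the command line. A minimal sketch, assuming the hadoop client is on the path and the commands run as the hdfs user (the exact summary line in the report varies between Hadoop versions):

    # configured default replication factor
    hdfs getconf -confKey dfs.replication

    # live DataNode count as seen by the NameNode
    hdfs dfsadmin -report | grep -E 'Live datanodes|Datanodes available'

If dfs.replication is equal to or greater than the number of live DataNodes, the NameNode has nowhere to re-replicate the node's blocks, so the node never reaches the decommissioned state.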
Hosts get removed from the exclude file when the host gets deleted. This was added at some point so that when the host is added back the DataNode can join normally. Host component start/stop should not trigger this.

________________________________
From: Greg Hill <[email protected]>
Sent: Thursday, May 14, 2015 11:46 AM
To: [email protected]; Sean Roberts
Subject: Re: Ambari 2.0 DECOMMISSION

Some further testing results:

1. Turning on maintenance mode beforehand didn't seem to affect it.
2. The DataNodes do go to decommissioning briefly before they go back to live, so it is at least trying to decommission them. Shouldn't they go to 'decommissioned' after it finishes, though?
3. Some operation I'm doing (either stopping host components or deleting host components) is causing Ambari to automatically issue a request like this for each node that has been decommissioned:

    Remove host slave-6.local from exclude file

When that is done is when they get marked "dead" by the NameNode. This worked fine in Ambari 1.7, so I'm guessing the "remove host from exclude file" step is what's breaking it, as that is new. Is there some way to disable that? Can someone explain the rationale behind it? I'd like to be able to remove nodes without having to restart the NameNode.

Greg

From: Greg <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, May 14, 2015 at 10:59 AM
To: "[email protected]" <[email protected]>, Sean Roberts <[email protected]>
Subject: COMMERCIAL: Ambari 2.0 DECOMMISSION

Did anything change with DECOMMISSION in the 2.0 release? The process appears to decommission fine (the request completes and says it updated the dfs.exclude file), but the DataNodes aren't decommissioned and HDFS now says they're dead and I need to restart the NameNode. For YARN, the NodeManagers appear to have decommissioned OK and are in decommissioned status, but it says I need to restart the ResourceManager (this wasn't the case in 1.7.0).

The only difference is that I don't set maintenance mode on the DataNodes until after the decommission completes, because that wasn't working for me at one point (it turns out hitting the API slightly differently would have made it work). Is that maybe the cause? Is restarting the master services now required after a decommission?
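For context on what the exclude file is doing here: the decommission that Ambari drives boils down to writing the hostname into the exclude file and asking the NameNode (and, for YARN, the ResourceManager) to re-read it. A rough manual equivalent, using the same paths that appear in the task output below and slave-6.local purely as an example host:

    # add the host to the HDFS exclude file on the NameNode host
    echo "slave-6.local" >> /etc/hadoop/conf/dfs.exclude
    sudo -u hdfs hdfs dfsadmin -refreshNodes

    # same idea for YARN
    echo "slave-6.local" >> /etc/hadoop/conf/yarn.exclude
    sudo -u yarn yarn rmadmin -refreshNodes

If the host is later removed from dfs.exclude and the nodes are refreshed again while its DataNode process is down, it is no longer in the excluded set and is not heartbeating, so the NameNode reports it as dead rather than decommissioned, which is consistent with the behaviour described above.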
Task output:

DataNode Decommission: slave-2.local,slave-4.local

stderr: None
stdout:
2015-05-14 14:45:48,439 - u"File['/etc/hadoop/conf/dfs.exclude']" {'owner': 'hdfs', 'content': Template('exclude_hosts_list.j2'), 'group': 'hadoop'}
2015-05-14 14:45:48,670 - Writing u"File['/etc/hadoop/conf/dfs.exclude']" because contents don't match
2015-05-14 14:45:48,864 - u"Execute['']" {'user': 'hdfs'}
2015-05-14 14:45:48,968 - u"ExecuteHadoop['dfsadmin -refreshNodes']" {'bin_dir': '/usr/hdp/current/hadoop-client/bin', 'conf_dir': '/etc/hadoop/conf', 'kinit_override': True, 'user': 'hdfs'}
2015-05-14 14:45:49,011 - u"Execute['hadoop --config /etc/hadoop/conf dfsadmin -refreshNodes']" {'logoutput': None, 'try_sleep': 0, 'environment': {}, 'tries': 1, 'user': 'hdfs', 'path': ['/usr/hdp/current/hadoop-client/bin']}

DataNodes Status: 3 live / 2 dead / 0 decommissioning

NodeManager Decommission: slave-2.local,slave-4.local

stderr: None
stdout:
2015-05-14 14:47:16,491 - u"File['/etc/hadoop/conf/yarn.exclude']" {'owner': 'yarn', 'content': Template('exclude_hosts_list.j2'), 'group': 'hadoop'}
2015-05-14 14:47:16,866 - Writing u"File['/etc/hadoop/conf/yarn.exclude']" because contents don't match
2015-05-14 14:47:17,057 - u"Execute[' yarn --config /etc/hadoop/conf rmadmin -refreshNodes']" {'environment': {'PATH': '/usr/sbin:/sbin:/usr/lib/ambari-server/*:/sbin:/usr/sbin:/bin:/usr/bin:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin:/usr/hdp/current/hadoop-yarn-resourcemanager/bin'}, 'user': 'yarn'}

NodeManagers Status: 3 active / 0 lost / 0 unhealthy / 0 rebooted / 2 decommissioned
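For reference, the task output above is produced by a DECOMMISSION command request, which can also be issued straight against the Ambari REST API. A sketch of what that call looks like; the server host, cluster name c1, and admin:admin credentials are placeholders, and the exact payload fields may differ between Ambari versions, so check it against your own installation:

    curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
      'http://ambari-server:8080/api/v1/clusters/c1/requests' \
      -d '{
        "RequestInfo": {
          "context": "Decommission DataNode",
          "command": "DECOMMISSION",
          "parameters": {
            "slave_type": "DATANODE",
            "excluded_hosts": "slave-2.local,slave-4.local"
          },
          "operation_level": {"level": "HOST_COMPONENT", "cluster_name": "c1"}
        },
        "Requests/resource_filters": [
          {"service_name": "HDFS", "component_name": "NAMENODE"}
        ]
      }'

Maintenance mode is set through a separate request on the host component resource, which may be what the "hitting the API slightly differently" remark above refers to.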
