-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53430/#review154733
-----------------------------------------------------------

Ship it!


Ship It!

- Jonathan Hurley


On Nov. 3, 2016, 10:48 a.m., Dmitro Lisnichenko wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53430/
> -----------------------------------------------------------
>
> (Updated Nov. 3, 2016, 10:48 a.m.)
>
>
> Review request for Ambari, Jonathan Hurley and Nate Cole.
>
>
> Bugs: AMBARI-18786
>     https://issues.apache.org/jira/browse/AMBARI-18786
>
>
> Repository: ambari
>
>
> Description
> -------
>
> Starting with Ambari 2.4, the HDP upgrade fails during the NameNode restart
> on large clusters.
>
> The restart command waits for the NameNode to leave safemode. On a large
> cluster the NameNode takes longer to leave safemode, and Ambari marks the
> action as failed because the NameNode did not leave safemode within the
> timeout configured in the Ambari scripts.
>
>
> {code}
>
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 42, in get_value_from_jmx
>     return data_dict["beans"][0][property]
> IndexError: list index out of range
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 420, in <module>
>     NameNode().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
>     method(env)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 720, in restart
>     self.start(env, upgrade_type=upgrade_type)
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 101, in start
>     upgrade_suspended=params.upgrade_suspended, env=env)
>   File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 184, in namenode
>     if is_this_namenode_active() is False:
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 55, in wrapper
>     return function(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 554, in is_this_namenode_active
>     raise Fail(format("The NameNode {namenode_id} is not listed as Active or Standby, waiting..."))
> resource_management.core.exceptions.Fail: The NameNode nn1 is not listed as Active or Standby, waiting...
> {code}
>
> To resolve this, we increased the timeout in Ambari:
>
> 1. Changed the retry settings in
> /var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py
> from this:
>     @retry(times=5, sleep_time=5, backoff_factor=2, err_class=Fail)
> to this:
>     @retry(times=25, sleep_time=25, backoff_factor=2, err_class=Fail)
>
> 2. Restarted the Ambari server.
>
> After this, the upgrade went through fine.
>
> I think it's better to increase the timeout permanently so that we don't have
> to deal with this issue again.
>
>
> Diffs
> -----
>
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py db52ee1
>
> Diff: https://reviews.apache.org/r/53430/diff/
>
>
> Testing
> -------
>
> mvn clean test
>
>
> Thanks,
>
> Dmitro Lisnichenko
>
>

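For readers unfamiliar with the retry settings quoted in the review request above, here is a minimal sketch of how a retry decorator with exponential backoff could behave, and how the arguments translate into a worst-case wait. It is not Ambari's implementation: the retry function below is a local re-implementation written for illustration, Fail is a stand-in for resource_management.core.exceptions.Fail, and the max_sleep_time and sleep parameters and the worst_case_wait helper are hypothetical names that do not appear in the request. The sketch assumes that times is the total number of attempts and that the delay is multiplied by backoff_factor after every failed attempt; whether the real decorator caps the delay is not stated in the request.

```python
import time
from functools import wraps


class Fail(Exception):
    """Stand-in for resource_management.core.exceptions.Fail (assumption)."""


def retry(times=3, sleep_time=1, backoff_factor=1, err_class=Exception,
          max_sleep_time=None, sleep=time.sleep):
    """Call the wrapped function up to `times` times.

    After each failed attempt except the last, sleep for the current delay,
    then multiply the delay by `backoff_factor`. `max_sleep_time` (hypothetical)
    optionally caps the delay; `sleep` (hypothetical) is injectable for testing.
    """
    def decorator(function):
        @wraps(function)
        def wrapper(*args, **kwargs):
            delay = sleep_time
            for _ in range(times - 1):
                try:
                    return function(*args, **kwargs)
                except err_class:
                    sleep(delay)
                    delay *= backoff_factor
                    if max_sleep_time is not None:
                        delay = min(delay, max_sleep_time)
            # Final attempt: any exception propagates to the caller.
            return function(*args, **kwargs)
        return wrapper
    return decorator


def worst_case_wait(times, sleep_time, backoff_factor, max_sleep_time=None):
    """Total seconds slept if every attempt fails, under the rules above."""
    total, delay = 0, sleep_time
    for _ in range(times - 1):
        total += delay
        delay *= backoff_factor
        if max_sleep_time is not None:
            delay = min(delay, max_sleep_time)
    return total
```

Under these assumptions, the original settings (times=5, sleep_time=5, backoff_factor=2) sleep for at most 5 + 10 + 20 + 40 = 75 seconds before the final attempt. worst_case_wait can be used the same way to compare other candidate settings, with or without a cap on the delay.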