Re: Review Request 53430: HDP Upgrade fails when the cluster size is large

Jonathan Hurley Thu, 03 Nov 2016 08:17:05 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53430/#review154733
-----------------------------------------------------------



Ship It!

- Jonathan Hurley


On Nov. 3, 2016, 10:48 a.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53430/
> -----------------------------------------------------------
> 
> (Updated Nov. 3, 2016, 10:48 a.m.)
> 
> 
> Review request for Ambari, Jonathan Hurley and Nate Cole.
> 
> 
> Bugs: AMBARI-18786
>     https://issues.apache.org/jira/browse/AMBARI-18786
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Starting with Ambari 2.4, the HDP upgrade fails on large clusters during the
> NameNode restart.
> 
> This is because the restart command waits for the NameNode to leave safe mode.
> On a large cluster the NameNode takes longer to leave safe mode, and Ambari
> marks the action as failed because the NameNode did not leave safe mode within
> the timeout configured in the Ambari scripts.
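The failure described above is a bounded wait: poll a condition until it holds or a deadline passes, and report failure otherwise. A minimal Python sketch follows; `wait_until` and its parameters are hypothetical names for illustration, not Ambari's API, and `predicate` stands in for whatever safe-mode check the restart script actually performs.

```python
import time


def wait_until(predicate, timeout, interval, clock=time.monotonic, sleep=time.sleep):
    """Poll ``predicate`` until it returns True or ``timeout`` seconds elapse.

    Hypothetical helper modelling "wait for the NameNode to leave safe mode,
    fail if that takes longer than the configured timeout". Not Ambari code.
    Returns True if the condition was met, False if the deadline passed first.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            # Deadline reached without the condition holding: the caller
            # would mark the restart as failed at this point.
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

Injecting `clock` and `sleep` keeps the sketch testable without real delays. With a NameNode that needs 30 seconds to leave safe mode, `timeout=20` returns False while `timeout=60` returns True, which is the same effect the request gets by raising the retry values.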
> 
> 
> {code}
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 42, in get_value_from_jmx
>     return data_dict["beans"][0][property]
> IndexError: list index out of range
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 420, in <module>
>     NameNode().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
>     method(env)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 720, in restart
>     self.start(env, upgrade_type=upgrade_type)
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 101, in start
>     upgrade_suspended=params.upgrade_suspended, env=env)
>   File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 184, in namenode
>     if is_this_namenode_active() is False:
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 55, in wrapper
>     return function(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 554, in is_this_namenode_active
>     raise Fail(format("The NameNode {namenode_id} is not listed as Active or Standby, waiting..."))
> resource_management.core.exceptions.Fail: The NameNode nn1 is not listed as Active or Standby, waiting...
> {code}
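The first traceback fails on `data_dict["beans"][0][property]` with `IndexError`, which means the JMX response contained an empty `beans` list. A defensive lookup could report "no value yet" instead of raising. The sketch below assumes the JMX response arrives as a JSON string; `get_jmx_property` is a hypothetical name, and `tag.HAState` in the test values is an assumed property name used only for illustration.

```python
import json


def get_jmx_property(jmx_json, prop):
    """Return ``prop`` from the first JMX bean, or None if it is unavailable.

    Hypothetical variant of the failing lookup ``data_dict["beans"][0][property]``:
    an empty (or missing) "beans" list, or a bean without ``prop``, yields None
    instead of raising IndexError/KeyError.
    """
    data = json.loads(jmx_json)
    beans = data.get("beans") or []
    if not beans:
        return None
    return beans[0].get(prop)
```

Returning `None` would let a caller such as `is_this_namenode_active` treat an empty response as "not Active or Standby yet" and leave the decision to its retry loop, rather than surfacing a separate `IndexError`.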
> 
> To resolve this, we increased the Ambari timeout:
> 
> 1. Increased the retry timeout in
>    /var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py
>    from:
>    @retry(times=5, sleep_time=5, backoff_factor=2, err_class=Fail)
>    to:
>    @retry(times=25, sleep_time=25, backoff_factor=2, err_class=Fail)
> 
> 2. Restarted the Ambari server.
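How long the old and new `@retry` settings actually wait depends on how the decorator grows the delay. The sketch below estimates the worst-case total sleep under stated assumptions: `times - 1` sleeps between attempts, each delay multiplied by `backoff_factor`, optionally held back by a cap. `total_retry_sleep` and the 8-second cap are hypothetical; check `resource_management/libraries/functions/decorator.py` for the real semantics before relying on these numbers.

```python
def total_retry_sleep(times, sleep_time, backoff_factor, max_sleep_time=None):
    """Worst-case seconds spent sleeping by a retry decorator that gives up.

    Assumed semantics (a sketch, not Ambari's decorator.py):
    - the function is attempted ``times`` times with a sleep between
      consecutive attempts, so there are ``times - 1`` sleeps;
    - after each sleep the delay is multiplied by ``backoff_factor``, unless
      ``max_sleep_time`` is given and the grown delay would exceed it.
    """
    total = 0
    delay = sleep_time
    for _ in range(times - 1):
        total += delay
        grown = delay * backoff_factor
        if max_sleep_time is None or grown <= max_sleep_time:
            delay = grown
    return total


# Old vs. new settings from the request, with and without a hypothetical 8 s cap.
print(total_retry_sleep(5, 5, 2))                        # 75
print(total_retry_sleep(5, 5, 2, max_sleep_time=8))      # 20
print(total_retry_sleep(25, 25, 2, max_sleep_time=8))    # 600
```

Without any cap, the new settings would allow 25 * (2^24 - 1) seconds (over 13 years) before giving up, so whether the delay is capped matters as much as the values passed to `@retry`.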
> 
> After this, the upgrade went through fine.
> 
> I think it's better to increase the timeout permanently so that we don't have
> to deal with this issue again.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py db52ee1
> 
> Diff: https://reviews.apache.org/r/53430/diff/
> 
> 
> Testing
> -------
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>
