> On April 11, 2017, 9:20 p.m., Alejandro Fernandez wrote:
> > ambari-common/src/main/python/resource_management/libraries/script/script.py
> > Lines 355 (patched)
> > <https://reviews.apache.org/r/58208/diff/2/?file=1688457#file1688457line358>
> >
> >     Should we have a hard limit? If the wait exceeds, say, 5 minutes, abort so we
> > can avoid an infinite loop.

We still have the STOP command timeout limit. I'm not sure that adding yet
another hardcoded timeout would suit all cases. It would also require failing
the entire STOP command instead of marking it as timed out.
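
As a rough illustration of the idea (a minimal sketch, not the code in the patch), the wait could look like the snippet below. The names `pid_is_alive` and `wait_for_exit` are hypothetical, and the sketch assumes a Unix host where `os.kill(pid, 0)` only probes for the process. It deliberately has no internal deadline: the loop is expected to be bounded by the agent's existing STOP command timeout.

```python
import errno
import os
import time


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists (Unix only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without signalling
    except OSError as e:
        # EPERM: the process exists but is owned by another user.
        # Any other error (typically ESRCH) is treated as "process is gone".
        return e.errno == errno.EPERM
    return True


def wait_for_exit(pid, poll_interval=1.0, is_alive=pid_is_alive, sleep=time.sleep):
    """Poll until the process disappears; return how many times we had to sleep.

    There is no timeout here on purpose: the caller's command-level timeout
    is assumed to interrupt the wait if the process never exits.
    """
    polls = 0
    while is_alive(pid):
        polls += 1
        sleep(poll_interval)
    return polls
```

The `is_alive` and `sleep` parameters exist only so the loop can be exercised without a real daemon. Note that a zombie process still counts as alive for `os.kill(pid, 0)`, so a real implementation might need a stricter check.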


- Dmitro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171595
-----------------------------------------------------------


On April 11, 2017, 6:22 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58208/
> -----------------------------------------------------------
> 
> (Updated April 11, 2017, 6:22 p.m.)
> 
> 
> Review request for Ambari, Jonathan Hurley and Nate Cole.
> 
> 
> Bugs: AMBARI-20682
>     https://issues.apache.org/jira/browse/AMBARI-20682
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> During a rolling upgrade (especially on a large, heavily used cluster), the 
> DataNodes do not shut down immediately. However, they do de-register from the 
> NameNode, which tricks Ambari into thinking that they are down.
> 
> Since the rolling upgrade uses a {{RESTART}} command, we attempt to start the 
> DataNode back up before the daemon has shut down:
> 
> {code}
> 2017-03-14 05:00:25,602 - call['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -shutdownDatanode 0.0.0.0:8010 upgrade'] {'user': 'hdfs'}
> 2017-03-14 05:00:28,438 - call returned (0, 'Submitted a shutdown request to datanode 0.0.0.0:8010')
> 2017-03-14 05:00:28,438 - Execute['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -D ipc.client.connect.max.retries=5 -D ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
> 2017-03-14 05:00:35,976 - DataNode has successfully shutdown for upgrade.
> {code}
> 
> Even though ~6 seconds have passed, the daemon is still running as it 
> drains. Therefore, we attempt to start it, which results in a NOOP.
> 
> Instead, we should also monitor for the PID.
> 
> -----------------
> Now the STOP command waits until the component process has actually exited. 
> The motivation: we don't want to run START against a component that is still 
> running (e.g. during upgrade/RESTART).
> 
> 
> Diffs
> -----
> 
>   ambari-common/src/main/python/resource_management/libraries/script/script.py 9a5da04278 
>   ambari-funtest/src/test/resources/stacks/HDP/2.0.7/services/HIVE/package/scripts/mysql_service.py 4716343fb2 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py 151e26cace 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py b55237dd1f 
>   ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/mysql_service.py 11bbdd8e6b 
>   ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/postgresql_service.py cc7b4cc14e 
>   ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 1c3c5b7932 
> 
> 
> Diff: https://reviews.apache.org/r/58208/diff/2/
> 
> 
> Testing
> -------
> 
> mvn clean test 
> and test on live cluster
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>
