Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Dmitro Lisnichenko Wed, 05 Apr 2017 09:23:58 -0700


> On April 5, 2017, 4:10 p.m., Jonathan Hurley wrote:
> > ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py
> > Lines 51-59 (original), 51-59 (patched)
> > <https://reviews.apache.org/r/58208/diff/1/?file=1685218#file1685218line51>
> >
> >     Let's say that this fails to stop gracefully on the de-register. The
> >     code which calls this is looking for a boolean to be returned:
> >     
> >     ```
> >           stopped = datanode_upgrade.pre_rolling_upgrade_shutdown(hdfs_binary)
> >           if not stopped:
> >             datanode(action="stop")
> >     ```
> >     
> >     Should we try/catch `_check_datanode_shutdown` and return False if it
> >     fails?
> 
> Dmitro Lisnichenko wrote:
>     this try-catch whould shadow a failure that is covered by 2 of our tests


* would
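
For reference, the trade-off being discussed could be sketched roughly as follows. This is only an illustration, not the code in datanode_upgrade.py: `Fail` and the `check_shutdown` callable are hypothetical stand-ins for whatever `_check_datanode_shutdown` actually raises and does.

```python
class Fail(Exception):
    """Hypothetical stand-in for the error raised when the shutdown check fails."""


def pre_rolling_upgrade_shutdown(check_shutdown):
    """Return True if the DataNode stopped, False if the shutdown check failed.

    `check_shutdown` is a hypothetical callable standing in for
    `_check_datanode_shutdown`; it is assumed to raise Fail on failure.
    """
    try:
        check_shutdown()
    except Fail:
        # Returning False lets the caller fall back to datanode(action="stop"),
        # but the original failure no longer propagates, so a test that
        # expects the exception would stop seeing it.
        return False
    return True
```

With this shape the caller always gets a boolean, at the cost of hiding the failure that the existing tests exercise.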


- Dmitro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171118
-----------------------------------------------------------


On April 5, 2017, 3:27 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58208/
> -----------------------------------------------------------
> 
> (Updated April 5, 2017, 3:27 p.m.)
> 
> 
> Review request for Ambari, Jonathan Hurley and Nate Cole.
> 
> 
> Bugs: AMBARI-20682
>     https://issues.apache.org/jira/browse/AMBARI-20682
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> During a rolling upgrade (especially on a large, heavily used cluster), the
> DataNodes do not shut down immediately. However, they do de-register from the
> NameNode, which tricks Ambari into thinking that they are down.
> 
> Since the rolling upgrade uses a {{RESTART}} command, we attempt to start the
> DataNode back up before the daemon has shut down:
> 
> {code}
> 2017-03-14 05:00:25,602 - call['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -shutdownDatanode 0.0.0.0:8010 upgrade'] {'user': 'hdfs'}
> 2017-03-14 05:00:28,438 - call returned (0, 'Submitted a shutdown request to datanode 0.0.0.0:8010')
> 2017-03-14 05:00:28,438 - Execute['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -D ipc.client.connect.max.retries=5 -D ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
> 2017-03-14 05:00:35,976 - DataNode has successfully shutdown for upgrade.
> {code}
> 
> Even though ~6 seconds have passed, the daemon is still running as it
> drains. Therefore, we attempt to start it, which results in a NOOP.
> 
> Instead, we should also monitor for the PID.
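
A minimal sketch of what monitoring for the PID could look like, assuming a POSIX host. The function names, the timeout and the polling interval are illustrative assumptions and are not taken from the patch; how the PID is obtained is left out.

```python
import errno
import os
import time


def pid_is_running(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except OSError as err:
        # ESRCH: no such process. EPERM: it exists but belongs to another user.
        return err.errno == errno.EPERM
    return True


def wait_for_pid_exit(pid, timeout=30.0, interval=1.0, is_running=pid_is_running):
    """Poll until the process exits; return False if it is still up at the timeout."""
    deadline = time.time() + timeout
    while is_running(pid):
        if time.time() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A caller could treat a `False` result the same way as a failed de-register and fall back to a regular stop before attempting the start.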
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py b55237d
>   ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 1c3c5b7
> 
> 
> Diff: https://reviews.apache.org/r/58208/diff/1/
> 
> 
> Testing
> -------
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>
