Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Alejandro Fernandez Wed, 05 Apr 2017 10:30:14 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171147
-----------------------------------------------------------





ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py
Line 113 (original), 114 (patched)
<https://reviews.apache.org/r/58208/#comment244035>

    Should we catch a specific exception here that means it was deregistered? 
Catching any seems too wide a net.



ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py
Line 117 (original), 126 (patched)
<https://reviews.apache.org/r/58208/#comment244036>

    Technically, this function could also be called outside of an upgrade, so 
the message shouldn't really know it's "upgrade".
    
    Does it make sense to also call this function during any restart command?


- Alejandro Fernandez


On April 5, 2017, 12:27 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58208/
> -----------------------------------------------------------
> 
> (Updated April 5, 2017, 12:27 p.m.)
> 
> 
> Review request for Ambari, Jonathan Hurley and Nate Cole.
> 
> 
> Bugs: AMBARI-20682
>     https://issues.apache.org/jira/browse/AMBARI-20682
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> During a rolling upgrade (especially on a large, heavily used cluster), the 
> DataNodes do not shutdown immediately. However, they do de-register from the 
> NameNode which tricks Ambari into thinking that they are down.
> 
> Since the rolling upgrade uses a {{RESTART}} command, we attempt to start the 
> DataNode back up before the daemon has shutdown:
> 
> {code}
> 2017-03-14 05:00:25,602 - 
> call['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha 
> -shutdownDatanode 0.0.0.0:8010 upgrade'] {'user': 'hdfs'}
> 2017-03-14 05:00:28,438 - call returned (0, 'Submitted a shutdown request to 
> datanode 0.0.0.0:8010')
> 2017-03-14 05:00:28,438 - 
> Execute['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs 
> hdfs://c1ha -D ipc.client.connect.max.retries=5 -D 
> ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010'] 
> {'tries': 1, 'user': 'hdfs'}
> 2017-03-14 05:00:35,976 - DataNode has successfully shutdown for upgrade.
> {code}
> 
> Even though ~ 6 seconds have passed, the daemon is still running as it 
> drains. Therefore, we attempt to start it which causes a NOOP.
> 
> Instead, we should also monitor for the PID.
> 
> 
> Diffs
> -----
> 
>   
> ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py
>  b55237d 
>   ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 1c3c5b7 
> 
> 
> Diff: https://reviews.apache.org/r/58208/diff/1/
> 
> 
> Testing
> -------
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>

Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Reply via email to