-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46434/#review129882
-----------------------------------------------------------


Fix it, then Ship it!





ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/package/scripts/upgrade.py
 (line 59)
<https://reviews.apache.org/r/46434/#comment193431>

    If the process is running, then the second part of the condition (after 
the `or`) will never be evaluated.
    
    If the process is down, then the second part of the condition will always 
fail, won't it?
    
    So this `if` statement can be reduced to just 
`if not process_is_running:`
    
    Checking only whether the process is running might not be a robust 
solution. Shouldn't we take a similar approach as for DataNode?
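    
    The reduction described above can be sketched as follows. This is only an 
illustration: it assumes the condition has the shape 
`not (process_is_running or second_check())`, and both `should_fail` and 
`second_check` are hypothetical names, not the actual code in upgrade.py.

```python
# Illustration only: hypothetical names, not the code under review.
calls = []


def second_check():
    """Stand-in for the part of the condition after the `or`.

    Assumption: it always fails (returns False) when the process is down.
    """
    calls.append("second_check")
    return False


def should_fail(process_is_running):
    """Assumed shape of the current condition."""
    # `or` short-circuits: second_check() is skipped when the process runs.
    return not (process_is_running or second_check())


def should_fail_reduced(process_is_running):
    """The reduced condition suggested in the review."""
    return not process_is_running
```

    Under that assumption both functions return the same value for either 
state of the process, and `second_check` is only called when the process is 
down.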



ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py
 (lines 77 - 90)
<https://reviews.apache.org/r/46434/#comment193432>

    Retry logic is already available through decorators. Please consider 
whether `@safe_retry` or `@retry` can be used here. See RangerAdminV2 for an 
example of how these decorators are used.
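    
    As a rough idea of the pattern, here is a self-contained sketch. The 
`retry` decorator below is a simplified stand-in written for this comment, 
not Ambari's implementation, and its parameters (`times`, `sleep_time`, 
`err_class`) as well as `check_datanode_registered` are assumptions made for 
illustration.

```python
import functools
import time


def retry(times=3, sleep_time=1, err_class=Exception):
    """Simplified stand-in for a retry decorator (not Ambari's own)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except err_class:
                    # Re-raise once the last allowed attempt has failed.
                    if attempt == times:
                        raise
                    time.sleep(sleep_time)
        return wrapper
    return decorator


attempts = []


@retry(times=3, sleep_time=0, err_class=RuntimeError)
def check_datanode_registered():
    """Hypothetical check that succeeds only on the third attempt."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("DataNode not yet reported as live")
    return True
```

    With a decorator like this, the explicit loop and sleep calls in the 
check would no longer be needed; the wait between attempts becomes a 
decorator argument.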


- Sebastian Toader


On April 21, 2016, 10:33 a.m., Daniel Gergely wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46434/
> -----------------------------------------------------------
> 
> (Updated April 21, 2016, 10:33 a.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Miklos Gergely, Oliver Szabo, 
> Sandor Magyari, and Sebastian Toader.
> 
> 
> Bugs: AMBARI-15991
>     https://issues.apache.org/jira/browse/AMBARI-15991
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> If the upgrade process takes longer than expected, DataNode and RegionServer 
> are reported as failed. This happens because they need more time to finish 
> the update.
> 
> The fix for RegionServer checks whether the process is running; if it is, 
> this is not considered a failure.
> For DataNode the process is also checked, and if it is running the check is 
> repeated 2 times with a 5-minute wait. I had a limitation here: Python scripts 
> are allowed to run for 20 minutes by default, and this checking takes 16 
> minutes (2 minutes initial check, 5 minutes sleep if there is a failure, 
> 2 minutes regular check, 5 minutes sleep, 2 minutes final check).
> If more time is needed, then the default value of *server.task.timeout* and 
> the number of repetitions of the 5-minute check should be increased.
> 
> 
> Diffs
> -----
> 
>   
> ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/package/scripts/upgrade.py
>  01a8156 
>   
> ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py
>  8f36001 
>   
> ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/params_linux.py
>  7ad9f39 
>   ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 78b8171 
> 
> Diff: https://reviews.apache.org/r/46434/diff/
> 
> 
> Testing
> -------
> 
> I did manual testing on this:
> For RegionServer the process check is tested.
> For DataNodes I made an intentional exception to see if it keeps waiting. 
> (this is how I ran into the 20 minutes server task timeout)
> 
> ----------------------------------------------------------------------
> Total run:970
> Total errors:0
> Total failures:0
> OK
> 
> 
> Thanks,
> 
> Daniel Gergely
> 
>
