-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46434/#review129880
-----------------------------------------------------------


Ship it!

- Oliver Szabo


On April 21, 2016, 8:33 a.m., Daniel Gergely wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46434/
> -----------------------------------------------------------
> 
> (Updated April 21, 2016, 8:33 a.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Miklos Gergely, Oliver Szabo, 
> Sandor Magyari, and Sebastian Toader.
> 
> 
> Bugs: AMBARI-15991
>     https://issues.apache.org/jira/browse/AMBARI-15991
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> If the upgrade process takes longer than expected, DataNode and RegionServer 
> are reported as failed. This happens because they need more time to finish 
> the update.
> 
> The fix for RegionServer checks whether the process is running; if it is, 
> this is not considered a failure.
> For DataNode the process is also checked, and if it is running the check is 
> repeated 2 times with a 5-minute wait. There is a limitation here: python 
> scripts are allowed to run for 20 minutes by default, and this checking takes 
> 16 minutes (2 minutes initial check, 5 minutes sleep if there is a failure, 
> 2 minutes regular check, 5 minutes sleep, 2 minutes final check).
> If more time is needed, both the default value of *server.task.timeout* and 
> the number of repetitions of the 5-minute check should be increased.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/package/scripts/upgrade.py 01a8156 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py 8f36001 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/params_linux.py 7ad9f39 
>   ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 78b8171 
> 
> Diff: https://reviews.apache.org/r/46434/diff/
> 
> 
> Testing
> -------
> 
> I tested this manually:
> For RegionServer, the process check was tested.
> For DataNode, I raised an intentional exception to see whether it keeps 
> waiting (this is how I ran into the 20-minute server task timeout).
> 
> ----------------------------------------------------------------------
> Total run:970
> Total errors:0
> Total failures:0
> OK
> 
> 
> Thanks,
> 
> Daniel Gergely
> 
>
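
For reference, the DataNode wait schedule in the quoted description (an initial 2-minute check, then up to two repetitions of a 5-minute sleep followed by another 2-minute check) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from datanode_upgrade.py: the function names, the constants, and the injected `check`, `process_running` and `sleep` callables are all hypothetical.

```python
import time

# Values taken from the description; the names themselves are hypothetical.
CHECK_TIMEOUT_MIN = 2   # time one check may take
SLEEP_MIN = 5           # wait between checks
EXTRA_CHECKS = 2        # repetitions after the initial check
TASK_TIMEOUT_MIN = 20   # default limit for python scripts per the description


def worst_case_minutes(check_min=CHECK_TIMEOUT_MIN, sleep_min=SLEEP_MIN,
                       extra_checks=EXTRA_CHECKS):
    """Upper bound: one initial check plus (sleep + check) per repetition."""
    return check_min + extra_checks * (sleep_min + check_min)


def wait_for_datanode(check, process_running, extra_checks=EXTRA_CHECKS,
                      sleep_seconds=SLEEP_MIN * 60, sleep=time.sleep):
    """Return True once check() succeeds.

    Retries only while the process is still alive, so a dead process is
    reported as a failure immediately instead of after the full wait.
    """
    for attempt in range(extra_checks + 1):
        if check():
            return True
        if not process_running():
            return False
        if attempt < extra_checks:  # no sleep after the final check
            sleep(sleep_seconds)
    return False
```

With the defaults, `worst_case_minutes()` evaluates to 2 + 2 * (5 + 2) = 16, which stays under the 20-minute limit. A third repetition would bring it to 23, which is why the task timeout would have to be raised together with the repetition count.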
