-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46434/
-----------------------------------------------------------

(Updated April 22, 2016, 12:42 p.m.)


Review request for Ambari, Alejandro Fernandez, Miklos Gergely, Oliver Szabo, 
Sandor Magyari, and Sebastian Toader.


Changes
-------

Review fixes


Bugs: AMBARI-15991
    https://issues.apache.org/jira/browse/AMBARI-15991


Repository: ambari


Description
-------

If the upgrade process takes longer than expected, DataNode and RegionServer are 
reported as failed. This happens because they need more time to finish the update.

The fix for RegionServer checks whether the process is running; if it is, the 
result is not treated as a failure.
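
To illustrate the idea, here is a minimal sketch of such a process check. It is 
not the code from the diff: the function names are hypothetical, and probing the 
PID with signal 0 is an assumption about how "is the process running" could be 
answered on a POSIX host.

```python
import errno
import os


def is_process_running(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        # Signal 0 delivers nothing; it only performs the existence check.
        os.kill(pid, 0)
    except OSError as err:
        # EPERM means the process exists but belongs to another user.
        return err.errno == errno.EPERM
    return True


def upgrade_step_failed(status_check_ok, pid):
    """Hypothetical rule: a failed check only counts if the process is gone."""
    return not status_check_ok and not is_process_running(pid)
```

With this rule, a RegionServer whose status check fails while its process is 
still alive is not reported as failed.
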
For DataNode the process is also checked, and if it is running the check is 
repeated 2 times with a 5-minute wait in between. There is a limitation here: 
Python scripts are allowed to run for 20 minutes by default, and this checking 
takes up to 16 minutes (2 minutes for the initial check, 5 minutes of sleep if 
there is a failure, 2 minutes for a regular check, 5 minutes of sleep, 2 minutes 
for the final check).
If more time is needed, both the default value of *server.task.timeout* and the 
number of repetitions of the 5-minute check should be increased.
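
The timing above can be sketched as follows. This is an illustration only, not 
the code from the diff: `wait_for_datanode`, `worst_case_minutes`, and the 
`check` / `is_running` callables are hypothetical names, and the constants are 
just the figures quoted in this description.

```python
import time

CHECK_MINUTES = 2          # time one check may take, per the description
SLEEP_MINUTES = 5          # wait between checks
TASK_TIMEOUT_MINUTES = 20  # default script time limit quoted in the description


def worst_case_minutes(retries):
    """One initial check plus `retries` rounds of sleep followed by a check."""
    return CHECK_MINUTES + retries * (SLEEP_MINUTES + CHECK_MINUTES)


def wait_for_datanode(check, is_running, retries=2, sleep=time.sleep):
    """Return True once `check` passes.

    Gives up early if the process is no longer running, or after `retries`
    additional checks have failed.
    """
    if check():
        return True
    for _ in range(retries):
        if not is_running():
            return False
        sleep(SLEEP_MINUTES * 60)
        if check():
            return True
    return False
```

Under these assumptions `worst_case_minutes(2)` is 16, which fits within the 
20-minute limit, while `worst_case_minutes(3)` is 23 and would not fit unless 
the timeout is raised as well.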


Diffs (updated)
-----

  
  ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/package/scripts/upgrade.py 01a8156 
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py 8f36001 
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/params_linux.py 7ad9f39 
  ambari-server/src/test/python/stacks/2.0.6/HBASE/test_hbase_regionserver.py 8d187ec 
  ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 78b8171 

Diff: https://reviews.apache.org/r/46434/diff/


Testing
-------

I tested this manually:
For RegionServer, the process check was tested.
For DataNode, I raised an intentional exception to see whether it keeps waiting 
(this is how I ran into the 20-minute server task timeout).

----------------------------------------------------------------------
Total run:970
Total errors:0
Total failures:0
OK


Thanks,

Daniel Gergely
