-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46434/#review130144
-----------------------------------------------------------
Fix it, then Ship it!



ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/package/scripts/upgrade.py (line 46)
<https://reviews.apache.org/r/46434/#comment193825>

    Can we change the sleep time to 30 and retries to 20?


- Alejandro Fernandez


On April 22, 2016, 12:42 p.m., Daniel Gergely wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46434/
> -----------------------------------------------------------
> 
> (Updated April 22, 2016, 12:42 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Miklos Gergely, Oliver Szabo,
> Sandor Magyari, and Sebastian Toader.
> 
> 
> Bugs: AMBARI-15991
>     https://issues.apache.org/jira/browse/AMBARI-15991
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> If the upgrade process takes longer than expected, DataNode and RegionServer are
> reported as failed. This happens because they need more time to finish the upgrade.
> 
> The fix for RegionServer checks whether the process is running; if it is, the
> check is not considered a failure.
> For DataNode the process is also checked, and if it is running, the check is
> repeated 2 more times with a 5-minute wait before each. I had a limitation here:
> Python scripts are allowed to run for 20 minutes by default, and this checking
> takes up to 16 minutes (2 minutes initial check, 5 minutes sleep if there is a
> failure, 2 minutes regular check, 5 minutes sleep, 2 minutes final check).
> If more time is needed, then the default value of *server.task.timeout* and the
> number of repetitions of the 5-minute check should be increased.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/package/scripts/upgrade.py 01a8156
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py 8f36001
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/params_linux.py 7ad9f39
>   ambari-server/src/test/python/stacks/2.0.6/HBASE/test_hbase_regionserver.py 8d187ec
>   ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 78b8171
> 
> Diff: https://reviews.apache.org/r/46434/diff/
> 
> 
> Testing
> -------
> 
> I did manual testing on this:
> For RegionServer, the process check was tested.
> For DataNodes, I deliberately raised an exception to see whether the check keeps
> waiting (this is how I ran into the 20-minute server task timeout).
> 
> ----------------------------------------------------------------------
> Total run:970
> Total errors:0
> Total failures:0
> OK
> 
> 
> Thanks,
> 
> Daniel Gergely
> 
>
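The DataNode retry behaviour and the 16-minute budget from the description can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in datanode_upgrade.py: `wait_for_datanode`, `worst_case_seconds`, and the injected `check`, `process_running`, and `sleep` callables are hypothetical names, and the 2-minute check duration is taken from the description's timing estimate. Injecting `sleep` lets a test exercise the waiting path, much like the manual test of forcing an exception, without actually waiting five minutes.

```python
import time
from typing import Callable

# Figures from the review description; all names here are hypothetical.
CHECK_SECONDS = 2 * 60          # one DataNode check takes up to ~2 minutes
SLEEP_SECONDS = 5 * 60          # wait between a failed check and the next one
ATTEMPTS = 3                    # initial check + 2 repetitions
TASK_TIMEOUT_SECONDS = 20 * 60  # default server.task.timeout


def worst_case_seconds(check_s: float, sleep_s: float, attempts: int) -> float:
    """Runtime when every attempt fails: all checks plus the sleeps between them."""
    return attempts * check_s + (attempts - 1) * sleep_s


def wait_for_datanode(check: Callable[[], None],
                      process_running: Callable[[], bool],
                      attempts: int = ATTEMPTS,
                      sleep_s: float = SLEEP_SECONDS,
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Retry a failing check while the process is alive; re-raise otherwise."""
    for attempt in range(1, attempts + 1):
        try:
            check()
            return  # check passed, nothing more to do
        except Exception:
            # A stopped process or an exhausted retry budget is a real failure.
            if attempt == attempts or not process_running():
                raise
            sleep(sleep_s)  # process still up: give the upgrade more time


# 3 * 120 + 2 * 300 = 960 s (16 min), which fits in the 20-minute default.
assert worst_case_seconds(CHECK_SECONDS, SLEEP_SECONDS, ATTEMPTS) == 16 * 60

# Fault injection similar to the manual test: the first two checks raise.
calls = []


def flaky_check() -> None:
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("DataNode not ready yet")


slept = []
wait_for_datanode(flaky_check, process_running=lambda: True, sleep=slept.append)
print(len(calls), slept)  # 3 [300, 300]
```

Defaulting `sleep` to `time.sleep` keeps the real waiting behaviour while letting a unit test substitute a recorder. In this sketch, allowing more time means raising `ATTEMPTS` or `SLEEP_SECONDS` and re-checking `worst_case_seconds` against *server.task.timeout*, which is the trade-off the description points out.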