> On June 2, 2016, 10:42 a.m., Jonathan Hurley wrote:
> > ambari-agent/src/main/python/ambari_agent/RecoveryManager.py, lines 323-334
> > <https://reviews.apache.org/r/48096/diff/1/?file=1402778#file1402778line323>
> >
> >     This logic is getting a bit "if-elsy". Perhaps a state machine might be in order here?
>
> Nahappan Somasundaram wrote:
>     Yes, previously discussed this issue with Sumit. Opened a new JIRA https://issues.apache.org/jira/browse/AMBARI-17069 to fix this.
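For reference, a table-driven state machine of the kind suggested above might look like the minimal sketch below. It is a hypothetical illustration, not the code in RecoveryManager.py: apart from INSTALL_FAILED, the state names, the command names, the TRANSITIONS table and the next_recovery_command helper are all assumptions.

```python
"""Hypothetical table-driven recovery state machine (sketch only).

Only INSTALL_FAILED comes from this review; every other name here is
assumed for illustration and is not taken from RecoveryManager.py.
"""

# Assumed component states.
INIT = "INIT"
INSTALLED = "INSTALLED"
INSTALL_FAILED = "INSTALL_FAILED"
STARTED = "STARTED"

# Assumed recovery commands.
INSTALL = "INSTALL"
START = "START"

# (current_state, desired_state) -> command that moves the component forward.
# Any pair missing from the table needs no recovery action.
TRANSITIONS = {
    (INIT, INSTALLED): INSTALL,
    (INSTALL_FAILED, INSTALLED): INSTALL,
    (INIT, STARTED): INSTALL,
    (INSTALL_FAILED, STARTED): INSTALL,
    (INSTALLED, STARTED): START,
}


def next_recovery_command(current_state, desired_state):
    """Return the recovery command for a state pair, or None if none applies."""
    return TRANSITIONS.get((current_state, desired_state))
```

With this shape, a component in INSTALL_FAILED whose desired state is STARTED is first re-installed; once it reaches INSTALLED, the next lookup yields START. Supporting another recoverable state means adding rows to the table rather than another elif branch.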
Thanks for creating the Jira!

- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48096/#review135934
-----------------------------------------------------------


On May 31, 2016, 6:23 p.m., Nahappan Somasundaram wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48096/
> -----------------------------------------------------------
>
> (Updated May 31, 2016, 6:23 p.m.)
>
>
> Review request for Ambari, Ajit Kumar, Jonathan Hurley, Sumit Mohanty, and Sid Wagle.
>
>
> Bugs: AMBARI-16935
>     https://issues.apache.org/jira/browse/AMBARI-16935
>
>
> Repository: ambari
>
>
> Description
> -------
>
> AMBARI-16935: Retry and recover from component install failures
>
> ** Issue **
>
> There are multiple instances where components end up in INSTALL_FAILED state during cluster setup. Ambari does not retry or recover from INSTALL_FAILED state.
>
> Ambari should retry and recover from installation failures.
>
>
> Diffs
> -----
>
>   ambari-agent/src/main/python/ambari_agent/ActionQueue.py 4a843d840dafd96023ead8b929fef33efcb9fa41
>   ambari-agent/src/main/python/ambari_agent/RecoveryManager.py 87d9483c634026897629396bb48ec0cbabfcfae6
>   ambari-agent/src/test/python/ambari_agent/TestRecoveryManager.py ed0fd2fd3cfd37f535fa14f52835ddefd376038b
>
> Diff: https://reviews.apache.org/r/48096/diff/
>
>
> Testing
> -------
>
> ** 1. mvn clean install **
>
> [INFO] ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Ambari Main ....................................... SUCCESS [8.346s]
> [INFO] Apache Ambari Project POM ......................... SUCCESS [0.036s]
> [INFO] Ambari Web ........................................ SUCCESS [24.196s]
> [INFO] Ambari Views ...................................... SUCCESS [1.370s]
> [INFO] Ambari Admin View ................................. SUCCESS [7.555s]
> [INFO] ambari-metrics .................................... SUCCESS [0.388s]
> [INFO] Ambari Metrics Common ............................. SUCCESS [14.289s]
> [INFO] Ambari Metrics Hadoop Sink ........................ SUCCESS [1.879s]
> [INFO] Ambari Metrics Flume Sink ......................... SUCCESS [0.951s]
> [INFO] Ambari Metrics Kafka Sink ......................... SUCCESS [1.085s]
> [INFO] Ambari Metrics Storm Sink ......................... SUCCESS [2.354s]
> [INFO] Ambari Metrics Collector .......................... SUCCESS [6.883s]
> [INFO] Ambari Metrics Monitor ............................ SUCCESS [2.126s]
> [INFO] Ambari Metrics Grafana ............................ SUCCESS [0.886s]
> [INFO] Ambari Metrics Assembly ........................... SUCCESS [1:15.977s]
> [INFO] Ambari Server ..................................... SUCCESS [3:06.681s]
> [INFO] Ambari Functional Tests ........................... SUCCESS [1.430s]
> [INFO] Ambari Agent ...................................... SUCCESS [30.176s]
> [INFO] Ambari Client ..................................... SUCCESS [0.052s]
> [INFO] Ambari Python Client .............................. SUCCESS [1.129s]
> [INFO] Ambari Groovy Client .............................. SUCCESS [2.394s]
> [INFO] Ambari Shell ...................................... SUCCESS [0.078s]
> [INFO] Ambari Python Shell ............................... SUCCESS [0.858s]
> [INFO] Ambari Groovy Shell ............................... SUCCESS [4.609s]
> [INFO] ambari-logsearch .................................. SUCCESS [0.264s]
> [INFO] Ambari Logsearch Appender ......................... SUCCESS [0.231s]
> [INFO] Ambari Logsearch Solr Client ...................... SUCCESS [4.324s]
> [INFO] Ambari Logsearch Portal ........................... SUCCESS [6.150s]
> [INFO] Ambari Logsearch Log Feeder ....................... SUCCESS [2.309s]
> [INFO] Ambari Logsearch Assembly ......................... SUCCESS [0.101s]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 6:29.831s
> [INFO] Finished at: Tue May 31 15:21:18 PDT 2016
> [INFO] Final Memory: 294M/1039M
> [INFO] ------------------------------------------------------------------------
>
> ** 2. mvn test -DskipSurefireTests **
>
> ----------------------------------------------------------------------
> Ran 261 tests in 6.695s
>
> OK
> ----------------------------------------------------------------------
> Total run:1052
> Total errors:0
> Total failures:0
> OK
> INFO: AMBARI_SERVER_LIB is not set, using default /usr/lib/ambari-server
> INFO: Return code from stack upgrade command, retcode = 0
> StackAdvisor implementation for stack HDP1, version 2.0.6 was not found
> Returning DefaultStackAdvisor implementation
> StackAdvisor implementation for stack XYZ, version 1.0.0 was loaded
> StackAdvisor implementation for stack XYZ, version 1.0.1 was loaded
> Returning XYZ101StackAdvisor implementation
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 55.370s
> [INFO] Finished at: Tue May 31 15:09:42 PDT 2016
> [INFO] Final Memory: 57M/1010M
> [INFO] ------------------------------------------------------------------------
>
> ** 3. Manual tests **
> Deployed a single node cluster VM and copied over ActionQueue.py and RecoveryManager.py from the build to the VM. Put in some code in ActionQueue.py to fail randomly on executing an install command. Verified that re-install was attempted when install failed.
>
>
> Thanks,
>
> Nahappan Somasundaram
>
>
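The retry behaviour the request asks for, and the fault injection used in the manual test, can be sketched as a bounded retry loop around a randomly failing install stub. This is a hypothetical illustration under stated assumptions: install_with_retries, make_flaky_install, the max_attempts limit and the failure_rate parameter are illustrative names, not functions or settings from ActionQueue.py or RecoveryManager.py.

```python
"""Hypothetical bounded install retry with random fault injection (sketch only).

None of these names come from ActionQueue.py or RecoveryManager.py.
"""
import random


def install_with_retries(install, max_attempts):
    """Call install() until it returns True or max_attempts calls are used.

    Returns a (succeeded, attempts_used) tuple.
    """
    for attempt in range(1, max_attempts + 1):
        if install():
            return True, attempt
    return False, max_attempts


def make_flaky_install(failure_rate, rng):
    """Build an install stub that fails with probability failure_rate.

    rng.random() is uniform on [0.0, 1.0), so failure_rate=0.0 never fails
    and failure_rate=1.0 always fails.
    """
    def flaky_install():
        return rng.random() >= failure_rate
    return flaky_install


if __name__ == "__main__":
    # Seeded RNG so a run is reproducible; the outcome depends on the seed.
    stub = make_flaky_install(failure_rate=0.5, rng=random.Random(42))
    print(install_with_retries(stub, max_attempts=5))
```

Capping the number of attempts keeps a component that can never install (for example, a missing package repository) from being retried forever; a real agent would likely also want a delay between attempts.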