> On June 2, 2016, 10:42 a.m., Jonathan Hurley wrote:
> > ambari-agent/src/main/python/ambari_agent/RecoveryManager.py, lines 323-334
> > <https://reviews.apache.org/r/48096/diff/1/?file=1402778#file1402778line323>
> >
> >     This logic is getting a bit "if-elsy". Perhaps a state machine might be 
> > in order here?
> 
> Nahappan Somasundaram wrote:
>     Yes, I previously discussed this issue with Sumit. I opened a new JIRA,
> https://issues.apache.org/jira/browse/AMBARI-17069, to fix this.

Thanks for creating the Jira!
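
For reference, the state-machine idea raised in the review comment could look roughly like the table-driven sketch below. This is only an illustration: apart from INSTALL_FAILED, the state names, the command names, and the function next_recovery_command are assumptions and are not taken from RecoveryManager.py.

```python
# Hypothetical sketch: the states, commands, and transitions below are
# illustrative assumptions, not the contents of RecoveryManager.py.

# (current state, desired state) -> recovery command to issue
TRANSITIONS = {
    ("INSTALL_FAILED", "INSTALLED"): "INSTALL",
    ("INSTALL_FAILED", "STARTED"): "INSTALL",
    ("INIT", "INSTALLED"): "INSTALL",
    ("INIT", "STARTED"): "INSTALL",
    ("INSTALLED", "STARTED"): "START",
    ("STARTED", "INSTALLED"): "STOP",
}


def next_recovery_command(current_state, desired_state):
    """Return the command to issue, or None when no recovery is needed."""
    return TRANSITIONS.get((current_state, desired_state))
```

Keeping the transitions in one table means a new case is a new entry rather than another nested branch, which may be easier to review and to test.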


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48096/#review135934
-----------------------------------------------------------


On May 31, 2016, 6:23 p.m., Nahappan Somasundaram wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48096/
> -----------------------------------------------------------
> 
> (Updated May 31, 2016, 6:23 p.m.)
> 
> 
> Review request for Ambari, Ajit Kumar, Jonathan Hurley, Sumit Mohanty, and 
> Sid Wagle.
> 
> 
> Bugs: AMBARI-16935
>     https://issues.apache.org/jira/browse/AMBARI-16935
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> AMBARI-16935: Retry and recover from component install failures
> 
> ** Issue **
> 
> There are multiple instances where components end up in INSTALL_FAILED state 
> during cluster setup. Ambari does not retry or recover from INSTALL_FAILED 
> state. 
> 
> Ambari should retry and recover from installation failures.
> 
> 
> Diffs
> -----
> 
>   ambari-agent/src/main/python/ambari_agent/ActionQueue.py 
> 4a843d840dafd96023ead8b929fef33efcb9fa41 
>   ambari-agent/src/main/python/ambari_agent/RecoveryManager.py 
> 87d9483c634026897629396bb48ec0cbabfcfae6 
>   ambari-agent/src/test/python/ambari_agent/TestRecoveryManager.py 
> ed0fd2fd3cfd37f535fa14f52835ddefd376038b 
> 
> Diff: https://reviews.apache.org/r/48096/diff/
> 
> 
> Testing
> -------
> 
> ** 1. mvn clean install **
> 
> [INFO] 
> ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Ambari Main ....................................... SUCCESS [8.346s]
> [INFO] Apache Ambari Project POM ......................... SUCCESS [0.036s]
> [INFO] Ambari Web ........................................ SUCCESS [24.196s]
> [INFO] Ambari Views ...................................... SUCCESS [1.370s]
> [INFO] Ambari Admin View ................................. SUCCESS [7.555s]
> [INFO] ambari-metrics .................................... SUCCESS [0.388s]
> [INFO] Ambari Metrics Common ............................. SUCCESS [14.289s]
> [INFO] Ambari Metrics Hadoop Sink ........................ SUCCESS [1.879s]
> [INFO] Ambari Metrics Flume Sink ......................... SUCCESS [0.951s]
> [INFO] Ambari Metrics Kafka Sink ......................... SUCCESS [1.085s]
> [INFO] Ambari Metrics Storm Sink ......................... SUCCESS [2.354s]
> [INFO] Ambari Metrics Collector .......................... SUCCESS [6.883s]
> [INFO] Ambari Metrics Monitor ............................ SUCCESS [2.126s]
> [INFO] Ambari Metrics Grafana ............................ SUCCESS [0.886s]
> [INFO] Ambari Metrics Assembly ........................... SUCCESS [1:15.977s]
> [INFO] Ambari Server ..................................... SUCCESS [3:06.681s]
> [INFO] Ambari Functional Tests ........................... SUCCESS [1.430s]
> [INFO] Ambari Agent ...................................... SUCCESS [30.176s]
> [INFO] Ambari Client ..................................... SUCCESS [0.052s]
> [INFO] Ambari Python Client .............................. SUCCESS [1.129s]
> [INFO] Ambari Groovy Client .............................. SUCCESS [2.394s]
> [INFO] Ambari Shell ...................................... SUCCESS [0.078s]
> [INFO] Ambari Python Shell ............................... SUCCESS [0.858s]
> [INFO] Ambari Groovy Shell ............................... SUCCESS [4.609s]
> [INFO] ambari-logsearch .................................. SUCCESS [0.264s]
> [INFO] Ambari Logsearch Appender ......................... SUCCESS [0.231s]
> [INFO] Ambari Logsearch Solr Client ...................... SUCCESS [4.324s]
> [INFO] Ambari Logsearch Portal ........................... SUCCESS [6.150s]
> [INFO] Ambari Logsearch Log Feeder ....................... SUCCESS [2.309s]
> [INFO] Ambari Logsearch Assembly ......................... SUCCESS [0.101s]
> [INFO] 
> ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] 
> ------------------------------------------------------------------------
> [INFO] Total time: 6:29.831s
> [INFO] Finished at: Tue May 31 15:21:18 PDT 2016
> [INFO] Final Memory: 294M/1039M
> [INFO] 
> ------------------------------------------------------------------------
> 
> ** 2. mvn test -DskipSurefireTests **
> 
> ----------------------------------------------------------------------
> Ran 261 tests in 6.695s
> 
> OK
> ----------------------------------------------------------------------
> Total run:1052
> Total errors:0
> Total failures:0
> OK
> INFO: AMBARI_SERVER_LIB is not set, using default /usr/lib/ambari-server
> INFO: Return code from stack upgrade command, retcode = 0
> StackAdvisor implementation for stack HDP1, version 2.0.6 was not found
> Returning DefaultStackAdvisor implementation
> StackAdvisor implementation for stack XYZ, version 1.0.0 was loaded
> StackAdvisor implementation for stack XYZ, version 1.0.1 was loaded
> Returning XYZ101StackAdvisor implementation
> [INFO] 
> ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] 
> ------------------------------------------------------------------------
> [INFO] Total time: 55.370s
> [INFO] Finished at: Tue May 31 15:09:42 PDT 2016
> [INFO] Final Memory: 57M/1010M
> [INFO] 
> ------------------------------------------------------------------------
> 
> ** 3. Manual tests **
> Deployed a single-node cluster VM and copied ActionQueue.py and 
> RecoveryManager.py from the build to the VM. Added code to 
> ActionQueue.py that randomly fails the install command. Verified 
> that a re-install was attempted when the install failed.
> 
> 
> Thanks,
> 
> Nahappan Somasundaram
> 
>
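
For illustration only, the manual test described above (random install failures followed by a re-install attempt) could be approximated with a sketch like the one below. The function names, the failure rate, and the attempt limit are assumptions; this is not the code that was added to ActionQueue.py or RecoveryManager.py.

```python
import random


def flaky_install(rng, failure_rate=0.5):
    """Hypothetical stand-in for an install command that fails at random."""
    return "INSTALLED" if rng.random() >= failure_rate else "INSTALL_FAILED"


def install_with_retry(install, max_attempts=3):
    """Call install() until it succeeds or attempts run out.

    Returns a (final_state, attempts_used) tuple.
    """
    state = "INSTALL_FAILED"
    for attempt in range(1, max_attempts + 1):
        state = install()
        if state != "INSTALL_FAILED":
            return state, attempt
    return state, max_attempts
```

A call such as install_with_retry(lambda: flaky_install(random.Random())) would exercise the retry path; passing a seeded random.Random makes a run repeatable.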
