-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46496/
-----------------------------------------------------------

(Updated April 21, 2016, 5:51 p.m.)


Review request for Ambari, Daniel Gergely, Laszlo Puskas, Sandor Magyari, Sumit 
Mohanty, and Sid Wagle.


Changes
-------

Address review comments.


Bugs: AMBARI-16013
    https://issues.apache.org/jira/browse/AMBARI-16013


Repository: ambari


Description
-------

When hosts register with the Ambari server, `TopologyManager` adds them to its 
`availableHosts` collection. When a cluster is provisioned using Blueprints, 
`TopologyManager` tries to allocate the required hosts to host groups from 
this collection. Hosts that moved to the HEARTBEAT_LOST state were not removed 
from `availableHosts`, so logical tasks could be scheduled to unreachable 
hosts. When such a host becomes reachable again, it re-registers with the 
Ambari server. Because the server has already scheduled logical tasks for that 
host, it does not try again and therefore never creates the role commands to 
be executed by the host.

`TopologyManager` is now hooked into the HEARTBEAT_LOST state transition and 
removes the host in question from its internal `availableHosts` collection.
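
For reviewers who want the idea in isolation, here is a minimal sketch of the 
behaviour described above. It is not the code in this diff: the class 
`AvailableHostsSketch` and all of its method names are hypothetical, and hosts 
are reduced to plain host-name strings. It only illustrates that a host which 
loses its heartbeat is dropped from the pool, so it cannot be allocated until 
it registers again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the host pool kept by TopologyManager.
// Names and structure are assumptions made for illustration only.
class AvailableHostsSketch {
    private final List<String> availableHosts = new ArrayList<>();

    // Called when a host registers (or re-registers) with the server.
    synchronized void onHostRegistered(String hostName) {
        if (!availableHosts.contains(hostName)) {
            availableHosts.add(hostName);
        }
    }

    // Called on the HEARTBEAT_LOST transition: stop offering the host.
    // Returns true if the host was in the pool.
    synchronized boolean onHostHeartbeatLost(String hostName) {
        return availableHosts.remove(hostName);
    }

    // Hands out one host for a host group, or empty if none is available.
    synchronized Optional<String> allocateHost() {
        if (availableHosts.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(availableHosts.remove(0));
    }

    synchronized int availableCount() {
        return availableHosts.size();
    }
}
```

With this shape, a host that re-registers after a lost heartbeat re-enters the 
pool through `onHostRegistered` and can be allocated like any newly registered 
host.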


Diffs (updated)
-----

  ambari-server/src/main/java/org/apache/ambari/server/state/host/HostImpl.java d221112 
  ambari-server/src/main/java/org/apache/ambari/server/topology/TopologyManager.java 5a0aca0 

Diff: https://reviews.apache.org/r/46496/diff/


Testing
-------

Manual testing with a 5-node cluster using Blueprints.

Unit tests:
Results :

Tests run: 3561, Failures: 0, Errors: 0, Skipped: 36


Thanks,

Sebastian Toader
