-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46496/
-----------------------------------------------------------
(Updated April 21, 2016, 5:51 p.m.)


Review request for Ambari, Daniel Gergely, Laszlo Puskas, Sandor Magyari, Sumit Mohanty, and Sid Wagle.


Changes
-------

Address review comments.


Bugs: AMBARI-16013
    https://issues.apache.org/jira/browse/AMBARI-16013


Repository: ambari


Description
-------

When hosts register with the Ambari server, `TopologyManager` adds them to its `availableHosts` collection. When a cluster is provisioned using Blueprints, `TopologyManager` tries to allocate the required hosts to host groups from this collection.

Hosts that moved into the HEARTBEAT_LOST state were not removed from `availableHosts`, so logical tasks could be scheduled on unreachable hosts. When such a host becomes reachable again, it re-registers with the Ambari server. Because the server has already scheduled logical tasks for that host, it does not try again, and so it never creates the role commands for the host to execute.

`TopologyManager` is now hooked into the HEARTBEAT_LOST state transition and removes the affected host from its internal `availableHosts` collection.


Diffs (updated)
-----

  ambari-server/src/main/java/org/apache/ambari/server/state/host/HostImpl.java d221112
  ambari-server/src/main/java/org/apache/ambari/server/topology/TopologyManager.java 5a0aca0

Diff: https://reviews.apache.org/r/46496/diff/


Testing
-------

Manual testing with a 5-node cluster using Blueprints.

Unit tests:
Results :
Tests run: 3561, Failures: 0, Errors: 0, Skipped: 36


Thanks,

Sebastian Toader
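

Illustration
-------

The behaviour outlined in the Description can be sketched roughly as below. This is a simplified model, not the code in the diff: `SketchTopologyManager`, `SketchHost`, `HeartbeatLostListener`, and all method names are hypothetical stand-ins, and the real `HostImpl` and `TopologyManager` classes are wired differently. It only shows the idea that a host leaving the healthy state is dropped from the available pool and is added back when it re-registers.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical callback; the actual hook between HostImpl and TopologyManager may differ.
interface HeartbeatLostListener {
    void onHeartbeatLost(String hostName);
}

// Simplified stand-in for TopologyManager: tracks available host names only.
class SketchTopologyManager implements HeartbeatLostListener {
    private final Set<String> availableHosts = new LinkedHashSet<>();

    // Called when a host registers (or re-registers) with the server.
    synchronized void onHostRegistered(String hostName) {
        availableHosts.add(hostName);
    }

    // Called on the HEARTBEAT_LOST transition: the host must no longer be allocatable.
    @Override
    public synchronized void onHeartbeatLost(String hostName) {
        availableHosts.remove(hostName);
    }

    // Hands out the first available host to a host group, or null if none is left.
    synchronized String allocateHost() {
        Iterator<String> it = availableHosts.iterator();
        if (!it.hasNext()) {
            return null;
        }
        String host = it.next();
        it.remove();
        return host;
    }

    // Returns a snapshot copy so callers cannot mutate the internal set.
    synchronized List<String> getAvailableHosts() {
        return new ArrayList<>(availableHosts);
    }
}

// Simplified stand-in for HostImpl with a reduced set of states.
class SketchHost {
    enum State { INIT, HEALTHY, HEARTBEAT_LOST }

    private final String name;
    private final SketchTopologyManager manager;
    private State state = State.INIT;

    SketchHost(String name, SketchTopologyManager manager) {
        this.name = name;
        this.manager = manager;
    }

    void register() {
        state = State.HEALTHY;
        manager.onHostRegistered(name);
    }

    void heartbeatLost() {
        state = State.HEARTBEAT_LOST;
        manager.onHeartbeatLost(name); // notify so the host leaves the available pool
    }

    State getState() {
        return state;
    }
}
```

In this model, calling `heartbeatLost()` on a registered host removes it from `getAvailableHosts()`, so `allocateHost()` can no longer return it, and a later `register()` call makes it available again.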