-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46496/#review129906
-----------------------------------------------------------


ambari-server/src/main/java/org/apache/ambari/server/topology/TopologyManager.java
(line 395)
<https://reviews.apache.org/r/46496/#comment193458>

    Add the host to the log message.


- Laszlo Puskas


On April 21, 2016, 3:19 p.m., Sebastian Toader wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46496/
> -----------------------------------------------------------
> 
> (Updated April 21, 2016, 3:19 p.m.)
> 
> 
> Review request for Ambari, Daniel Gergely, Laszlo Puskas, Sandor Magyari,
> Sumit Mohanty, and Sid Wagle.
> 
> 
> Bugs: AMBARI-16013
>     https://issues.apache.org/jira/browse/AMBARI-16013
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When hosts register with the Ambari server, the `TopologyManager` adds them
> to its `availableHosts` collection. When a cluster is provisioned using
> Blueprints, `TopologyManager` tries to allocate the required hosts to host
> groups from the available hosts collection. Hosts that transitioned to the
> HEARTBEAT_LOST state were not removed from `availableHosts`, which resulted
> in logical tasks being scheduled on unreachable hosts. When these hosts
> become reachable again, they re-register with the Ambari server. Because the
> server has already scheduled logical tasks for them, it does not try again,
> and so it never creates the role commands to be executed by those hosts.
> 
> `TopologyManager` is now hooked into the HEARTBEAT_LOST state transition so
> that it removes the host in question from its internal `availableHosts`
> collection.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/state/host/HostImpl.java d221112
>   ambari-server/src/main/java/org/apache/ambari/server/topology/TopologyManager.java 5a0aca0
> 
> Diff: https://reviews.apache.org/r/46496/diff/
> 
> 
> Testing
> -------
> 
> Manual testing with a 5-node cluster using Blueprints.
> 
> Unit tests:
> Results :
> 
> Tests run: 3561, Failures: 0, Errors: 0, Skipped: 36
> 
> 
> Thanks,
> 
> Sebastian Toader
> 
>
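
A minimal sketch of the bookkeeping described in the review request, assuming a heavily simplified model: the class `AvailableHostsSketch` and the methods `onHostRegistered`, `onHostHeartbeatLost` and `allocateHost` are hypothetical and are not Ambari's actual `TopologyManager` API. It shows a host being dropped from `availableHosts` on the HEARTBEAT_LOST transition, with the host name included in the log message as the reviewer requested.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical, simplified model of TopologyManager's host bookkeeping.
// Class and method names are illustrative only; this is not Ambari's API.
class AvailableHostsSketch {
  private static final Logger LOG =
      Logger.getLogger(AvailableHostsSketch.class.getName());

  // Registered hosts that have not yet been allocated to a host group.
  // LinkedHashSet keeps registration order so allocation is predictable.
  private final Set<String> availableHosts = new LinkedHashSet<>();

  // Called when a host registers (or re-registers) with the server.
  public synchronized void onHostRegistered(String hostName) {
    if (availableHosts.add(hostName)) {
      LOG.info("Host " + hostName + " registered; added to available hosts");
    }
  }

  // Called on the transition to HEARTBEAT_LOST. Removing the host here keeps
  // it from being allocated (and having logical tasks scheduled) while it is
  // unreachable. Returns true if the host was in the collection.
  public synchronized boolean onHostHeartbeatLost(String hostName) {
    boolean removed = availableHosts.remove(hostName);
    if (removed) {
      // The host name is part of the message, as asked for in the review.
      LOG.info("Host " + hostName
          + " transitioned to HEARTBEAT_LOST; removed from available hosts");
    }
    return removed;
  }

  // Allocates the earliest-registered available host to a host group,
  // or returns null if no host is currently available.
  public synchronized String allocateHost() {
    Iterator<String> it = availableHosts.iterator();
    if (!it.hasNext()) {
      return null;
    }
    String hostName = it.next();
    it.remove();
    return hostName;
  }
}
```

In this model, a host that comes back after losing its heartbeat simply re-registers through `onHostRegistered` and becomes allocatable again, so allocation only picks hosts that have not been marked HEARTBEAT_LOST since they last registered.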