-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61203/#review181647
-----------------------------------------------------------

ambari-metrics/ambari-metrics-timelineservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/metrics/timeline/availability/MetricCollectorHAController.java
Lines 137 (patched)
<https://reviews.apache.org/r/61203/#comment257276>

    So what happens if one collector only created a partial structure? That situation would require a restart to get out of.


- Sid Wagle


On July 28, 2017, 4:50 a.m., Aravindan Vijayan wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61203/
> -----------------------------------------------------------
>
> (Updated July 28, 2017, 4:50 a.m.)
>
>
> Review request for Ambari, Dmytro Sen, Sumit Mohanty, and Sid Wagle.
>
>
> Bugs: AMBARI-21593
>     https://issues.apache.org/jira/browse/AMBARI-21593
>
>
> Repository: ambari
>
>
> Description
> -------
>
> PROBLEM
> When 2 metric collectors are started up simultaneously, both of them fail to
> start.
>
> BUG
> There is a race condition in the Metric Collector HA controller
> initialization, introduced through AMBARI-20179. When a helix
> controller instance finds that the /ambari-metrics-collector znode exists but
> a child node does not exist, it deletes the entire znode and recreates it. If
> another controller instance initializes at the same time, a race condition
> can occur wherein each instance ends up cancelling the effort of the
> other.
>
> FIX
> Do not delete and recreate the znode. Wait and retry for a few seconds to
> check whether /ambari-metrics-collector was fully initialized.
>
>
> Diffs
> -----
>
>   ambari-metrics/ambari-metrics-timelineservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/metrics/timeline/availability/MetricCollectorHAController.java 53e6304
>
>
> Diff: https://reviews.apache.org/r/61203/diff/1/
>
>
> Testing
> -------
>
> Manually tested.
>
>
> Thanks,
>
> Aravindan Vijayan
>
>
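
For reference, the wait-and-retry approach described under FIX could look roughly like the sketch below. This is not the code from the patch. The class name ZnodeInitWaiter, the method waitUntilInitialized, and the attempt and sleep parameters are all hypothetical, and the actual znode check is passed in as a BooleanSupplier so that no Helix or ZooKeeper call is assumed.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, for illustration only; not part of the Ambari code base.
public class ZnodeInitWaiter {

    /**
     * Polls the supplied check until it reports true or the attempts run out.
     *
     * @param isInitialized caller-supplied check, e.g. "does the expected child
     *                      of /ambari-metrics-collector exist yet?" (assumption)
     * @param maxAttempts   how many times to run the check
     * @param sleepMillis   pause between consecutive checks
     * @return true if the check succeeded within maxAttempts, false otherwise
     */
    public static boolean waitUntilInitialized(BooleanSupplier isInitialized,
                                               int maxAttempts,
                                               long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isInitialized.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        // Structure is still incomplete; the caller has to decide what to do.
        return false;
    }
}
```

A false return is the case raised in the review comment above: if the other collector created only a partial structure and never finishes it, retrying alone does not recover, so the caller still needs some policy for that outcome.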