-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61203/
-----------------------------------------------------------

Review request for Ambari, Dmytro Sen, Sumit Mohanty, and Sid Wagle.


Bugs: AMBARI-21593
    https://issues.apache.org/jira/browse/AMBARI-21593


Repository: ambari


Description
-------

PROBLEM
When two metric collectors are started simultaneously, both of them fail to start.

BUG
A race condition exists in the Metric Collector HA controller initialization, introduced by AMBARI-20179. When a Helix controller instance finds that the /ambari-metrics-collector znode exists but a child node does not exist, it deletes the entire znode and recreates it. If another controller instance initializes at the same time, each instance can end up undoing the work of the other.

FIX
Do not delete and recreate the znode. Instead, wait and retry for a few seconds to check whether /ambari-metrics-collector has been fully initialized.


Diffs
-----

  ambari-metrics/ambari-metrics-timelineservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/metrics/timeline/availability/MetricCollectorHAController.java 53e6304


Diff: https://reviews.apache.org/r/61203/diff/1/


Testing
-------

Manually tested.


Thanks,

Aravindan Vijayan
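
Illustration
------------

The wait-and-retry approach described under FIX could look roughly like the sketch below. This is not the code in the diff: the class name ZnodeInitWaiter, the method waitForInit, and its parameters are hypothetical, and the check for the child node is passed in as a plain BooleanSupplier so the example has no ZooKeeper or Helix dependency. In the real controller that check would presumably query the /ambari-metrics-collector znode.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Ambari: polls a caller-supplied check
// instead of deleting and recreating the znode.
final class ZnodeInitWaiter {

    private ZnodeInitWaiter() {
    }

    /**
     * Polls isInitialized up to maxAttempts times, sleeping sleepMillis
     * between attempts.
     *
     * @return true as soon as the check passes; false if every attempt
     *         fails or the thread is interrupted while waiting
     */
    static boolean waitForInit(BooleanSupplier isInitialized, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isInitialized.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

The point of this shape is that neither controller instance removes state the other may be in the middle of creating; a caller that gets false back can decide how to fail, rather than racing to rebuild the znode.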