-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61203/
-----------------------------------------------------------
(Updated July 31, 2017, 4:25 p.m.)
Review request for Ambari, Dmytro Sen, Sumit Mohanty, and Sid Wagle.
Bugs: AMBARI-21593
https://issues.apache.org/jira/browse/AMBARI-21593
Repository: ambari
Description
-------
PROBLEM
When two Metrics Collectors are started simultaneously, both of them fail to
start.
BUG
There is a race condition in the Metric Collector HA controller
initialization, which was introduced through AMBARI-20179. When a Helix
controller instance finds that the /ambari-metrics-collector znode exists but a
child node does not exist, it deletes the entire znode and recreates it. If
another controller instance initializes at the same time, a race condition
can occur wherein each instance ends up cancelling the effort of the other.
FIX
Do not delete and recreate the znode. Instead, wait and retry for a few seconds
to check whether /ambari-metrics-collector has been fully initialized.
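The wait-and-retry idea can be sketched roughly as follows. This is only an
illustration and not the code in the diff: the class name ZnodeInitWait, the
method waitUntil, and the attempt count and sleep interval are all
hypothetical, and the ZooKeeper/Helix check is replaced by a plain
BooleanSupplier so the example runs on its own.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Ambari: polls a readiness check a bounded
// number of times instead of deleting and recreating the znode.
class ZnodeInitWait {

    /**
     * Returns true as soon as the check passes, or false once maxAttempts
     * checks have failed (or the thread is interrupted while sleeping).
     */
    static boolean waitUntil(BooleanSupplier initialized, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (initialized.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and give up waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for "the other controller finishes creating the child
        // node": the check starts passing on the third call.
        int[] checks = {0};
        boolean ok = waitUntil(() -> ++checks[0] >= 3, 5, 10L);
        System.out.println("initialized=" + ok + " after " + checks[0] + " checks");
    }
}
```

In the real controller, the supplier would be whatever check determines that
the child node under /ambari-metrics-collector exists. If the wait runs out,
the caller has to decide how to fail; that decision is outside this sketch.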
Diffs (updated)
-----
ambari-metrics/ambari-metrics-timelineservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/metrics/timeline/availability/MetricCollectorHAController.java
53e6304
Diff: https://reviews.apache.org/r/61203/diff/2/
Changes: https://reviews.apache.org/r/61203/diff/1-2/
Testing
-------
Manually tested.
Thanks,
Aravindan Vijayan