-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61203/
-----------------------------------------------------------

Review request for Ambari, Dmytro Sen, Sumit Mohanty, and Sid Wagle.


Bugs: AMBARI-21593
    https://issues.apache.org/jira/browse/AMBARI-21593


Repository: ambari


Description
-------

PROBLEM
When two metric collectors are started simultaneously, both of them fail to 
start.

BUG
There is a race condition in the Metric Collector HA controller 
initialization, which was introduced through AMBARI-20179. When a Helix 
controller instance finds that the /ambari-metrics-collector znode exists but a 
child node does not exist, it deletes the entire znode and recreates it. If 
another controller instance initializes at the same time, a race condition 
can occur in which each instance ends up undoing the work of the other.

FIX
Do not delete and recreate the znode. Instead, wait and retry for a few seconds 
to check whether /ambari-metrics-collector has been fully initialized.
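
The wait-and-retry idea can be sketched roughly as below. This is only an 
illustration, not the code in the diff: the class name ZnodeInitWait, the 
method waitUntilInitialized, and its parameters are hypothetical, and the 
actual ZooKeeper/Helix check for the child node is abstracted behind a 
BooleanSupplier so that the sketch stays self-contained.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not taken from MetricCollectorHAController.
public class ZnodeInitWait {

    /**
     * Polls the supplied check up to maxAttempts times, sleeping
     * sleepMillis between attempts. Nothing is deleted or recreated
     * here; the caller only waits for another instance to finish
     * initializing the znode.
     *
     * @return true if the check passed within the allowed attempts,
     *         false if it never passed or the thread was interrupted
     */
    public static boolean waitUntilInitialized(BooleanSupplier isInitialized,
                                               int maxAttempts,
                                               long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isInitialized.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and give up waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

In the controller, the supplier would presumably wrap whatever check is used 
to detect the missing child node, with the attempt count and sleep interval 
chosen so that the total wait is a few seconds.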


Diffs
-----

  
ambari-metrics/ambari-metrics-timelineservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/metrics/timeline/availability/MetricCollectorHAController.java
 53e6304 


Diff: https://reviews.apache.org/r/61203/diff/1/


Testing
-------

Manually tested.


Thanks,

Aravindan Vijayan
