> On July 28, 2017, 5:02 a.m., Sumit Mohanty wrote:
> > ambari-metrics/ambari-metrics-timelineservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/metrics/timeline/availability/MetricCollectorHAController.java
> > Lines 147 (patched)
> > <https://reviews.apache.org/r/61203/diff/1/?file=1785078#file1785078line151>
> >
> >     Should we add randomness to the sleep (say, 5 + a random value between 0 and 5) 
> > so that both instances do not retry at the same time?
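
For reference, the jittered delay proposed in this comment (a 5-second base plus a random 0-5 seconds) could be sketched as follows. The class name, constants, and millisecond units are assumptions for illustration, not code from the patch.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class JitteredSleep {

  // Hypothetical values mirroring the review comment: 5 s base + 0-5 s jitter.
  static final long BASE_DELAY_MS = 5_000L;
  static final long MAX_JITTER_MS = 5_000L;

  /** Returns a delay in [BASE_DELAY_MS, BASE_DELAY_MS + MAX_JITTER_MS]. */
  static long nextRetryDelayMs() {
    // nextLong(origin, bound) excludes bound, so add 1 to make MAX_JITTER_MS reachable.
    long jitter = ThreadLocalRandom.current().nextLong(0, MAX_JITTER_MS + 1);
    return BASE_DELAY_MS + jitter;
  }

  /** Sleeps for one jittered interval before the caller retries. */
  static void sleepBeforeRetry() throws InterruptedException {
    TimeUnit.MILLISECONDS.sleep(nextRetryDelayMs());
  }

  public static void main(String[] args) {
    System.out.println("Next retry in " + nextRetryDelayMs() + " ms");
  }
}
```

Because each instance draws its own jitter, two collectors that fail at the same moment are unlikely to wake up and retry in lockstep.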

The two collector instances should never both error out and reach this step: 
one of them creates the znode undisturbed, while the other sleeps and retries 
until the first one completes the work.
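
A minimal sketch of this wait-and-retry behaviour, assuming the "is /ambari-metrics-collector fully initialized?" check can be wrapped in a `BooleanSupplier`. The class and method names, attempt count, and sleep interval are hypothetical and not taken from the patch; the point is that the waiting instance never deletes or recreates anything.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class WaitForInit {

  /**
   * Polls isInitialized until it returns true or maxAttempts is exhausted.
   * Nothing is deleted or recreated; the instance that lost the race only waits.
   * Returns true if initialization was observed, false on timeout or interrupt.
   */
  static boolean waitUntilInitialized(BooleanSupplier isInitialized,
                                      int maxAttempts,
                                      long sleepMs) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (isInitialized.getAsBoolean()) {
        return true;
      }
      if (attempt < maxAttempts) {
        try {
          TimeUnit.MILLISECONDS.sleep(sleepMs);
        } catch (InterruptedException e) {
          // Preserve the interrupt status and give up waiting.
          Thread.currentThread().interrupt();
          return false;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Simulated check: reports "initialized" on the third poll.
    int[] polls = {0};
    BooleanSupplier fakeCheck = () -> ++polls[0] >= 3;
    boolean ok = waitUntilInitialized(fakeCheck, 5, 10);
    System.out.println("initialized=" + ok + " after " + polls[0] + " polls");
  }
}
```

Running `main` prints `initialized=true after 3 polls`. If the first instance never finishes, the second gives up after `maxAttempts` polls instead of destroying the first instance's work.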


- Aravindan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61203/#review181646
-----------------------------------------------------------


On July 28, 2017, 4:50 a.m., Aravindan Vijayan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61203/
> -----------------------------------------------------------
> 
> (Updated July 28, 2017, 4:50 a.m.)
> 
> 
> Review request for Ambari, Dmytro Sen, Sumit Mohanty, and Sid Wagle.
> 
> 
> Bugs: AMBARI-21593
>     https://issues.apache.org/jira/browse/AMBARI-21593
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> PROBLEM
> When two metric collectors are started simultaneously, both of them fail to 
> start.
> 
> BUG
> There is a race condition in the Metric Collector HA controller 
> initialization, introduced by AMBARI-20179. When a Helix controller 
> instance finds that the /ambari-metrics-collector znode exists but a child 
> node does not exist, it deletes the entire znode and recreates it. If 
> another controller instance initializes at the same time, each instance can 
> end up undoing the work of the other.
> 
> FIX
> Do not delete and recreate the znode. Instead, wait and retry for a few 
> seconds to check whether /ambari-metrics-collector has been fully initialized.
> 
> 
> Diffs
> -----
> 
>   
> ambari-metrics/ambari-metrics-timelineservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/metrics/timeline/availability/MetricCollectorHAController.java
>  53e6304 
> 
> 
> Diff: https://reviews.apache.org/r/61203/diff/1/
> 
> 
> Testing
> -------
> 
> Manually tested.
> 
> 
> Thanks,
> 
> Aravindan Vijayan
> 
>
