Leader election duration

Karol Dudzinski Tue, 28 Apr 2015 10:42:30 -0700

Hi,

We're seeing some rather strange leader election behaviour in one of our
clusters.  The duration reported by the "FOLLOWING - LEADER ELECTION TOOK" log
line (and the equivalent line on the leader) varies hugely.  During one rolling
reboot, I saw values as small as 39 ms and as large as 57 seconds (the
difference in units is not a typo).  The average is about 10 seconds and the
standard deviation is also about 10 seconds.  So the time taken is not only
quite large, it's also very variable.
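
For reference, here is a minimal sketch of how figures like these can be pulled
out of the logs.  It assumes the duration is the integer (in milliseconds) that
follows "LEADER ELECTION TOOK - " and that the leader's line starts with
"LEADING"; both are assumptions about the log format, and the function names
are made up for illustration.

```python
import re
import statistics

# Assumed format: "... FOLLOWING - LEADER ELECTION TOOK - 39" (value in ms),
# with "LEADING" in place of "FOLLOWING" on the leader.
ELECTION_RE = re.compile(r"(FOLLOWING|LEADING) - LEADER ELECTION TOOK - (\d+)")


def election_times_ms(lines):
    """Return the election durations (ms) found in an iterable of log lines."""
    times = []
    for line in lines:
        match = ELECTION_RE.search(line)
        if match:
            times.append(int(match.group(2)))
    return times


def summarize(times_ms):
    """Return (min, max, mean, population std dev) of the durations in ms."""
    return (
        min(times_ms),
        max(times_ms),
        statistics.mean(times_ms),
        statistics.pstdev(times_ms),
    )
```

Feeding it the lines of each server's log (for example an open file object)
and calling summarize() on the result gives the min, max, mean and std dev.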


We have other clusters, but the average election time in those is in the
hundreds of milliseconds, with std dev in a similar ballpark.  One difference
is that the "slow" cluster has 5 participants while the others have 3, which
may be a factor, but I wouldn't expect it to make two orders of magnitude
difference!

So my question is, what factors contribute to the election time reported by 
these log lines? And what can we do to speed this up?

As far as I understand from the logs and a quick browse through the code, that
figure covers only the time to select a leader.  Syncing up to the leader
happens after that.  I can understand the syncing part varying with load, but
I don't see why selecting the leader would.

Thanks,
Karol
