Just out of curiosity: if you start the 5-node cluster with only 3 of the nodes up (i.e. configure 5 servers, but only bring up 3 processes), does that speed up the leader election, or is it still slow?
C

On Tue, Apr 28, 2015 at 1:41 PM, Karol Dudzinski <[email protected]> wrote:

> Hi,
>
> We're seeing some rather strange leader election in one of our clusters.
> The duration reported by the "FOLLOWING - LEADER ELECTION TOOK" log line
> (and equivalent for the leader) seems to vary hugely. During one rolling
> reboot, I saw the number reported as small as 39ms and as large as 57
> seconds (difference in units is not a typo). The average is just about 10
> seconds and std dev also about 10 seconds. So the time taken is not only
> quite large, it's also very variable.
>
> We have other clusters but the average election time in those is in the
> hundreds of millis with std dev in a similar ballpark. I guess one
> difference is the "slow" cluster is 5 participants while the others are 3,
> which may be a factor but I wouldn't expect it to make two orders of
> magnitude difference!
>
> So my question is, what factors contribute to the election time reported
> by these log lines? And what can we do to speed this up?
>
> As far as I understand from logs and a quick browse through the code that
> time is the time to select a leader. Syncing up to the leader happens
> after that. The syncing part I can understand will vary depending on load
> but I don't see why selecting the leader would.
>
> Thanks,
> Karol
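
For comparing election times across clusters or test runs, a small script that pulls the reported durations out of the server logs may be handy. The following is only a minimal Python sketch. It assumes the log line ends in "LEADER ELECTION TOOK - <number>" with the number in milliseconds, which may differ between ZooKeeper versions; the function names and the sample log lines are hypothetical and exist only for illustration.

```python
import re
import statistics

# Assumed format: "... FOLLOWING - LEADER ELECTION TOOK - <millis>"
# (or "LEADING - ..." on the leader). Adjust the pattern to your logs.
ELECTION_RE = re.compile(r"(FOLLOWING|LEADING) - LEADER ELECTION TOOK - (\d+)")


def election_times_ms(lines):
    """Return the election durations (assumed ms) found in the given log lines."""
    times = []
    for line in lines:
        match = ELECTION_RE.search(line)
        if match:
            times.append(int(match.group(2)))
    return times


def summarize(times):
    """Return (count, mean, sample std dev) for a list of durations."""
    if not times:
        return 0, 0.0, 0.0
    mean = statistics.mean(times)
    # The sample standard deviation needs at least two data points.
    stdev = statistics.stdev(times) if len(times) > 1 else 0.0
    return len(times), mean, stdev


# Hypothetical sample lines, not taken from a real cluster.
sample = [
    "2015-04-28 13:00:01,000 INFO  [QuorumPeer] - FOLLOWING - LEADER ELECTION TOOK - 39",
    "2015-04-28 13:05:42,000 INFO  [QuorumPeer] - LEADING - LEADER ELECTION TOOK - 57000",
    "2015-04-28 13:05:43,000 INFO  some unrelated line",
]

if __name__ == "__main__":
    count, mean, stdev = summarize(election_times_ms(sample))
    print(f"elections={count} mean={mean:.1f}ms stdev={stdev:.1f}ms")
```

Running the same script over the logs from the 5-of-5 and 3-of-5 startups would give the two sets of numbers side by side.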
