The snapshots are a bit larger than I'd like (~170MB) but the bit that takes time is actually before that.
On a successful election, on the new leader I see the following: - LEADER ELECTION TOOK: 12345 - Synchronizing with Follower... - Sending DIFF or SNAP That last bit will take a long time if the snapshot is large, agreed. However, for me it's arriving at the LEADER ELECTION TOOK bit that takes a long time. And the bit between that and the Synchronizing often times out with the messages I described in my previous email. Karol > On 29 Apr 2015, at 18:46, Flavio Junqueira <[email protected]> > wrote: > > Check the size of your snapshot files. If during election you need to > transfer snapshots and they are large, then recovery is going to take longer. > We try to send a diff of the txns when possible, and sending a diff is much > faster compared to send a full snapshot. Sending a diff isn't always > possible, depending on how far behind followers compared to the leader, so > the difference might be due to snapshots and diffs. > -Flavio > > > On Wednesday, April 29, 2015 6:32 PM, Camille Fournier > <[email protected]> wrote: > > > > Don't suppose you could share some snippets of the log messages you're > seeing a lot? > > On Wed, Apr 29, 2015 at 1:29 PM, Karol Dudzinski <[email protected]> > wrote: > >> Did a bit more investigation today though I didn't try the prod snapshot >> in a test cluster. I did however find a couple of other strange things >> which I'd initially missed. >> >> When the leader of this particular cluster goes for a reboot, I see many >> notification printouts from FastLeaderElection. On some occasions I've >> seen nearly 50. In other clusters I usually see 5 or thereabouts. The >> number of these notifications seems to correlate with how long election >> took. What could cause it to require exchanging so many messages? >> >> This cluster also seems to often timeout while syncing after selecting the >> leader. On the new leader I see the expected number of "Follower sid: ..." >> messages but it doesn't always get to the "Synchronizing with Follower" >> part. On the leader I then get a "Unexpected exception causing shutdown >> while sock still open" message followed by a GOODBYE for one of the >> followers. At the same time on that follower I see a "Exception when >> following the leader" caused by a read timeout in >> Learner.registerWithLeader. Does anything happen prior to syncing that >> could take a long time? >> >> Thanks, >> Karol >> >>>> On 28 Apr 2015, at 20:22, Karol Dudzinski <[email protected]> >>> wrote: >>> >>> Well these are prod clusters so my ability to experiment is rather >> limited. I can take a copy of the snapshot and try both 3 node and 5 in a >> test cluster. >>> >>> One thing I forgot to mention is that in most clusters the number of >> election notification log lines I see is typically, give or take, the same >> as the number of participants. In this cluster however, it's typically 2 >> or 3 times as many notifications as the number of participants. >>> >>> My gut feeling is it's more likely to be due to load as the 5 node >> cluster is much busier and the election time has been increasing over time >> (as has load). I have no idea exactly what load though, whether it's >> number of clients, frequency of transactions, total data size, etc. I >> don't understand why though but that may just be my limited knowledge of >> the election protocol. >>> >>> Karol >>> >>>> On 28 Apr 2015, at 19:54, Camille Fournier <[email protected]> wrote: >>>> >>>> Just out of curiosity, if you start the 5 node cluster up with only 3 of >>>> the nodes to begin with (like, config 5, but only bring up 3 processes), >>>> does it speed up the leader election or is it still slow? >>>> >>>> C >>>> >>>> On Tue, Apr 28, 2015 at 1:41 PM, Karol Dudzinski < >> [email protected]> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> We're seeing some rather strange leader election in one of our >> clusters. >>>>> The duration reported by the "FOLLOWING - LEADER ELECTION TOOK" log >> line >>>>> (and equivalent for the leader) seems to vary hugely. During one >> rolling >>>>> reboot, I saw the number reported as small as 39ms and as large as 57 >>>>> seconds (difference in units is not a typo). The average is just >> about 10 >>>>> seconds and std dev also about 10 seconds. So the time taken is not >> only >>>>> quite large, it's also very variable. >>>>> >>>>> We have other clusters but the average election time in those is in the >>>>> hundreds of millis with std dev in a similar ballpark. I guess one >>>>> difference is the "slow" cluster is 5 participants while the others >> are 3, >>>>> which may be a factor but I wouldn't expect it to make two orders of >>>>> magnitude difference! >>>>> >>>>> So my question is, what factors contribute to the election time >> reported >>>>> by these log lines? And what can we do to speed this up? >>>>> >>>>> As far as I understand from logs and a quick browse through the code >> that >>>>> time is the time to select a leader. Syncing up to the leader happens >>>>> after that. The syncing part I can understand will vary depending on >> load >>>>> but I don't see why selecting the leader would. >>>>> >>>>> Thanks, >>>>> Karol > > >
