170MB isn't that large. I suspect the election might be taking so long because of the issue described in ZK-2164. In that JIRA the election was failing outright, but I can see the same blocking call making the election take longer as well. You may want to test that fix and see if it makes any difference for you.

-Flavio
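(Archive note: ZK-2164, as I read it, is about the election stalling on a blocking connection attempt to a peer that is down or unreachable. The fragment below is only a minimal sketch of that general pattern, with made-up class and method names rather than ZooKeeper's actual QuorumCnxManager code; it shows why an unbounded connect can dominate the reported election time and why bounding it, in the spirit of that fix, keeps the delay predictable.)

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    // Illustration only: hypothetical names, not ZooKeeper's QuorumCnxManager.
    public class ElectionConnectSketch {

        // Unbounded connect: if the peer host is unreachable, connect() can
        // block for the OS-level TCP timeout (often tens of seconds), and the
        // thread driving the election makes no progress while it waits.
        static void connectBlocking(InetSocketAddress peer) throws IOException {
            try (Socket sock = new Socket()) {
                sock.connect(peer); // may hang far longer than an election should take
            }
        }

        // Bounded connect: the same call with an explicit timeout fails fast
        // on a dead peer, so the election can move on to the next server.
        static void connectWithTimeout(InetSocketAddress peer, int timeoutMs)
                throws IOException {
            try (Socket sock = new Socket()) {
                sock.connect(peer, timeoutMs); // SocketTimeoutException instead of a hang
            }
        }
    }

If one of the five servers happened to be unreachable during the rolling reboot, a stall like this would show up exactly as a long LEADER ELECTION TOOK value.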
> On 29 Apr 2015, at 19:05, Karol Dudzinski <[email protected]> wrote:
>
> The snapshots are a bit larger than I'd like (~170MB) but the bit that takes
> time is actually before that.
>
> On a successful election, on the new leader I see the following:
> - LEADER ELECTION TOOK: 12345
> - Synchronizing with Follower...
> - Sending DIFF or SNAP
>
> That last bit will take a long time if the snapshot is large, agreed.
> However, for me it's arriving at the LEADER ELECTION TOOK bit that takes a
> long time. And the bit between that and the Synchronizing often times out
> with the messages I described in my previous email.
>
> Karol
>
>> On 29 Apr 2015, at 18:46, Flavio Junqueira <[email protected]> wrote:
>>
>> Check the size of your snapshot files. If during election you need to
>> transfer snapshots and they are large, then recovery is going to take
>> longer. We try to send a diff of the txns when possible, and sending a diff
>> is much faster compared to sending a full snapshot. Sending a diff isn't
>> always possible, depending on how far behind followers are compared to the
>> leader, so the difference might be due to snapshots and diffs.
>>
>> -Flavio
>>
>> On Wednesday, April 29, 2015 6:32 PM, Camille Fournier
>> <[email protected]> wrote:
>>
>> Don't suppose you could share some snippets of the log messages you're
>> seeing a lot?
>>
>> On Wed, Apr 29, 2015 at 1:29 PM, Karol Dudzinski <[email protected]>
>> wrote:
>>
>>> Did a bit more investigation today, though I didn't try the prod snapshot
>>> in a test cluster. I did however find a couple of other strange things
>>> which I'd initially missed.
>>>
>>> When the leader of this particular cluster goes for a reboot, I see many
>>> notification printouts from FastLeaderElection. On some occasions I've
>>> seen nearly 50. In other clusters I usually see 5 or thereabouts. The
>>> number of these notifications seems to correlate with how long the
>>> election took. What could cause it to require exchanging so many messages?
>>>
>>> This cluster also often seems to time out while syncing after selecting
>>> the leader. On the new leader I see the expected number of "Follower sid:
>>> ..." messages but it doesn't always get to the "Synchronizing with
>>> Follower" part. On the leader I then get an "Unexpected exception causing
>>> shutdown while sock still open" message followed by a GOODBYE for one of
>>> the followers. At the same time, on that follower I see an "Exception when
>>> following the leader" caused by a read timeout in
>>> Learner.registerWithLeader. Does anything happen prior to syncing that
>>> could take a long time?
>>>
>>> Thanks,
>>> Karol
>>>
>>>> On 28 Apr 2015, at 20:22, Karol Dudzinski <[email protected]> wrote:
>>>>
>>>> Well, these are prod clusters so my ability to experiment is rather
>>>> limited. I can take a copy of the snapshot and try both 3 nodes and 5 in
>>>> a test cluster.
>>>>
>>>> One thing I forgot to mention is that in most clusters the number of
>>>> election notification log lines I see is typically, give or take, the
>>>> same as the number of participants. In this cluster, however, it's
>>>> typically 2 or 3 times as many notifications as the number of
>>>> participants.
>>>>
>>>> My gut feeling is it's more likely to be due to load, as the 5 node
>>>> cluster is much busier and the election time has been increasing over
>>>> time (as has load). I have no idea exactly what load though, whether it's
>>>> number of clients, frequency of transactions, total data size, etc.
>>>> I don't understand why it would matter, though, but that may just be my
>>>> limited knowledge of the election protocol.
>>>>
>>>> Karol
>>>>
>>>>> On 28 Apr 2015, at 19:54, Camille Fournier <[email protected]> wrote:
>>>>>
>>>>> Just out of curiosity, if you start the 5 node cluster up with only 3 of
>>>>> the nodes to begin with (like, config 5, but only bring up 3 processes),
>>>>> does it speed up the leader election or is it still slow?
>>>>>
>>>>> C
>>>>>
>>>>> On Tue, Apr 28, 2015 at 1:41 PM, Karol Dudzinski <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We're seeing some rather strange leader election in one of our
>>>>>> clusters. The duration reported by the "FOLLOWING - LEADER ELECTION
>>>>>> TOOK" log line (and the equivalent for the leader) seems to vary
>>>>>> hugely. During one rolling reboot, I saw the number reported as small
>>>>>> as 39ms and as large as 57 seconds (the difference in units is not a
>>>>>> typo). The average is just about 10 seconds and the std dev is also
>>>>>> about 10 seconds. So the time taken is not only quite large, it's also
>>>>>> very variable.
>>>>>>
>>>>>> We have other clusters, but the average election time in those is in
>>>>>> the hundreds of millis with a std dev in a similar ballpark. I guess
>>>>>> one difference is that the "slow" cluster has 5 participants while the
>>>>>> others have 3, which may be a factor, but I wouldn't expect it to make
>>>>>> two orders of magnitude of difference!
>>>>>>
>>>>>> So my question is, what factors contribute to the election time
>>>>>> reported by these log lines? And what can we do to speed this up?
>>>>>>
>>>>>> As far as I understand from the logs and a quick browse through the
>>>>>> code, that time is the time to select a leader. Syncing up to the
>>>>>> leader happens after that. The syncing part I can understand will vary
>>>>>> depending on load, but I don't see why selecting the leader would.
>>>>>>
>>>>>> Thanks,
>>>>>> Karol
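(Archive note on Flavio's DIFF-versus-SNAP point in the quoted thread: the sketch below is a simplified, hypothetical version of that choice, with illustrative names rather than the actual leader-side sync code, which also handles truncation and other corner cases. The idea is that a follower whose last zxid still falls inside the window of transactions the leader keeps in memory can be caught up with a DIFF; anything further behind gets the full snapshot, which is the path that gets slow as snapshots grow.)

    // Simplified illustration of the DIFF-vs-SNAP decision described above.
    // Hypothetical names; the real logic lives in ZooKeeper's leader-side
    // sync code and covers more cases.
    public class SyncDecisionSketch {

        enum SyncMode { DIFF, SNAP }

        // followerLastZxid: last transaction the follower has seen
        // leaderMinLogZxid: oldest transaction still in the leader's in-memory log
        // leaderLastZxid:   newest transaction on the leader
        static SyncMode chooseSync(long followerLastZxid,
                                   long leaderMinLogZxid,
                                   long leaderLastZxid) {
            if (followerLastZxid >= leaderMinLogZxid
                    && followerLastZxid <= leaderLastZxid) {
                // Close enough: replay only the missing transactions.
                return SyncMode.DIFF;
            }
            // Too far behind: ship the whole snapshot, the slow path for
            // large data sets.
            return SyncMode.SNAP;
        }
    }

The practical consequence for this thread: the further a follower lags behind the new leader, the more likely the slow SNAP path is taken, and that cost is separate from the time spent actually choosing the leader.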

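(Archive note on the read timeout in Learner.registerWithLeader mentioned above: as far as I can tell, and this is worth verifying against the exact version in use, the follower's socket read timeout for the registration and initial sync phase is derived from tickTime * initLimit, switching to tickTime * syncLimit only once syncing is done. So anything the new leader does slowly before it starts synchronizing, such as loading a large snapshot or finishing a drawn-out election round, eats into that budget, and raising initLimit is the usual way to buy headroom. A back-of-the-envelope sketch:)

    // Rough arithmetic for the timeout discussed above. The formula is an
    // assumption about how the follower sizes its socket timeout during
    // registration/initial sync; verify against your ZooKeeper version.
    public class SyncTimeoutSketch {

        static long initialSyncTimeoutMs(long tickTimeMs, int initLimit) {
            return tickTimeMs * initLimit;
        }

        public static void main(String[] args) {
            // With common settings tickTime=2000 and initLimit=10, the budget
            // for connecting to the new leader and completing the initial
            // sync is about 20 seconds; if the leader takes longer than that
            // to get ready, the follower gives up with exactly the kind of
            // read timeout described in the thread.
            System.out.println(initialSyncTimeoutMs(2000, 10) + " ms");
        }
    }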