There is a plan to work on this optimization: ZOOKEEPER-1674.

--
Thawan Kooburat
On 7/16/13 1:37 PM, "kishore g" <[email protected]> wrote:

>All servers in the quorum reading the snapshot from disk as part of the
>synchronization phase. From Thawan's email it looks like whenever there
>is a leader election, all zk servers read the snapshot from disk. I am
>not sure why all servers should reload the snapshot from disk as this
>increases unavailability time.
>
>
>On Tue, Jul 16, 2013 at 12:35 PM, Flavio Junqueira
><[email protected]> wrote:
>
>> The synchronization phase is part of the protocol and we use it to
>> guarantee that we expose a consistent view of the state. During the
>> synchronization phase, servers do not accept requests.
>>
>> Which behavior are you proposing we change, Kishore?
>>
>> -Flavio
>>
>> On Jul 16, 2013, at 7:04 PM, kishore g <[email protected]> wrote:
>>
>> > Thanks for clarification Flavio. Does this mean during the leader
>> > election, both reads and writes are not supported? Do we start a
>> > separate thread/jira of changing this behavior?
>> >
>> > thanks,
>> > Kishore G
>> >
>> >
>> > On Tue, Jul 16, 2013 at 9:16 AM, Flavio Junqueira
>> > <[email protected]> wrote:
>> >
>> >> The disk state should be the authoritative state of a server, so if I
>> >> remember correctly, we load the database as a way of validating the
>> >> disk state. I don't claim that this is strictly necessary, but if we
>> >> are to change it, then I would need to think this through.
>> >>
>> >> About leader election, if a leader loses support from a quorum of
>> >> followers, then it will drop leadership. Any event that causes a
>> >> follower to stop receiving messages from the leader or the follower
>> >> to disconnect from the leader will make it stop supporting the
>> >> current leader.
>> >>
>> >> -Flavio
>> >>
>> >> -----Original Message-----
>> >> From: Sergey Maslyakov [mailto:[email protected]]
>> >> Sent: 16 July 2013 16:16
>> >> To: [email protected]
>> >> Subject: Re: Maximum size of a snapshot
>> >>
>> >> And another extension on top of Kishore's question: do the
>> >> reelections happen if the previously elected leader remains in the
>> >> cluster? In other words, what events can trigger re-election and the
>> >> corresponding temporary degradation of the service provided by
>> >> Zookeeper?
>> >>
>> >>
>> >> Thank you,
>> >> /Sergey
>> >>
>> >>
>> >> On Tue, Jul 16, 2013 at 2:21 AM, kishore g <[email protected]>
>> >> wrote:
>> >>
>> >>> Regarding #2. Is that really true that during leader election every
>> >>> machine reloads snapshot data from disk? Any reason why this is
>> >>> needed unless it really needs to truncate or undo conflicting
>> >>> transactions already applied?
>> >>>
>> >>>
>> >>> On Mon, Jul 15, 2013 at 9:50 PM, Thawan Kooburat <[email protected]>
>> >>> wrote:
>> >>>
>> >>>> Max snapshot size:
>> >>>>
>> >>>> Here is my take on these issues, others feel free to add or
>> >>>> correct.
>> >>>>
>> >>>> 1. Depends on how much RAM your machine has. Snapshot should be
>> >>>> less than the available RAM since everything is loaded into memory.
>> >>>> 2. Depends on what is the availability guarantee that the client
>> >>>> needs. If there is leader election, every machine needs to reload
>> >>>> the data from disk. So the quorum will be down for at least the
>> >>>> snapshot loading time. The session timeout on the client side
>> >>>> should be at least longer than expected downtime during leader
>> >>>> election.
>> >>>>
>> >>>> --
>> >>>> Thawan Kooburat
>> >>>>
>> >>>>
>> >>>> On 7/15/13 8:46 PM, "Sergey Maslyakov" <[email protected]> wrote:
>> >>>>
>> >>>>> I have a couple of sizing questions to the users and developers.
>> >>>>> Hope you don't mind answering those.
>> >>>>>
>> >>>>> What is the guideline for the maximum reasonable size of a
>> >>>>> DataTree that a single ZK server can manage? If ZK server writes
>> >>>>> out a snapshot of about 1GB in size, is it pushed beyond the
>> >>>>> limits or is it still manageable? If so, where is the critical
>> >>>>> threshold when ZK is really being abused?
>> >>>>>
>> >>>>> Similarly, how can I estimate the propagation delay of a change
>> >>>>> across an ensemble of three ZK servers?
>> >>>>>
>> >>>>>
>> >>>>> Thank you,
>> >>>>> /Sergey
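
The two sizing rules from Thawan's reply (snapshot smaller than available RAM, client session timeout longer than the snapshot loading time) can be turned into a rough back-of-the-envelope check. The sketch below is only an illustration: the function names, the parameters, and the assumption that load time is roughly snapshot size divided by sequential disk read throughput are all hypothetical and not taken from ZooKeeper itself. It ignores deserialization cost and the election itself, so the estimate is a lower bound on downtime.

```python
def estimate_load_seconds(snapshot_bytes, read_bytes_per_sec):
    """Rough time to read a snapshot from disk.

    Assumption: load time ~ size / sequential read throughput.
    Deserialization and leader election time are NOT included.
    """
    if read_bytes_per_sec <= 0:
        raise ValueError("read_bytes_per_sec must be positive")
    return snapshot_bytes / read_bytes_per_sec


def check_sizing(snapshot_bytes, available_ram_bytes, read_bytes_per_sec,
                 session_timeout_sec):
    """Apply the two rules of thumb from the thread (hypothetical helper)."""
    load_sec = estimate_load_seconds(snapshot_bytes, read_bytes_per_sec)
    return {
        # Rule 1: the snapshot should be smaller than available RAM.
        "fits_in_ram": snapshot_bytes < available_ram_bytes,
        # Lower-bound estimate of the time spent reloading the snapshot.
        "estimated_load_sec": load_sec,
        # Rule 2: the session timeout should exceed the expected downtime.
        "timeout_covers_load": session_timeout_sec > load_sec,
    }


if __name__ == "__main__":
    GIB = 1024 ** 3
    MIB = 1024 ** 2
    # Illustrative numbers only: 1 GiB snapshot, 8 GiB RAM,
    # 100 MiB/s disk reads, 30 s client session timeout.
    print(check_sizing(1 * GIB, 8 * GIB, 100 * MIB, 30))
```

With real numbers, the throughput figure should come from measuring the actual disk, and the session timeout should leave headroom above the estimate, since the reload is only one part of the downtime during an election.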
