Re: About ZooKeeper Dynamic Reconfiguration
Hi oo4load, I have a question that has confused me for quite a long time. As we all know, ZK servers frequently take snapshots while processing requests. When a ZK server restores from a snapshot and then replays a transaction that had already been applied before the snapshot was taken, that transaction will be executed twice. However, if a version number is specified in the transaction, it will not match the current version number when the transaction is replayed. How does the ZK server solve this problem? I really look forward to your answer! Thank you. -- Sent from: http://zookeeper-user.578899.n2.nabble.com/
Re: About ZooKeeper Dynamic Reconfiguration
Please reply to my private mail address from now on.
Re: About ZooKeeper Dynamic Reconfiguration
Hi Chris,
I received your code for the zookeeper balancer. It seems that a few Java class files are missing. They include:

nl.ing.profileha.util.EventCreator
nl.ing.profileha.util.FailsafeTriggeredException
nl.ing.profileha.util.StringUtils
nl.ing.profileha.util.Validator
nl.ing.profileha.util.shell.SystemCommandExecutorWithTimeout
nl.ing.profileha.zoomonitor.LocalConfig
nl.ing.profileha.util.httpGetRequester
ACLMode.java
ZookeeperTreeCache.java

Would you please send these class files to me? I really appreciate your kindness!
Thanks
Re: About ZooKeeper Dynamic Reconfiguration
I sent it again, please check.
Re: About ZooKeeper Dynamic Reconfiguration
Hi oo4load,
Where did you send it? Through this site, or directly to my email? I received your pseudocode last week; it looked like this:

buildDatacenterAndServerModel(configurationFile) {
    enum zookeeperRole PARTICIPANT, OBSERVER, NONE, DOWN
    object datacenter has servers
    object server has zookeeperRole configuredRole, zookeeperRole activeRole
    parse(configurationFile) into (datacenter, servers);
}

shiftMajority(designatedSurvivorDatacenter) {
    designatedSurvivorDatacenter.someObserver.dynamicReconfigure(server=PARTICIPANT)
    otherDatacenter.someParticipant.dynamicReconfigure(server=OBSERVER)
}

balanceServerRoles() {
    if (designatedSurvivorDatacenter.hasMinimumQuorum)
        someParticipant.dynamicReconfigure(server=OBSERVER)
    if (quorumSize.aboveSafeLimit)
        someObserver.dynamicReconfigure(server=PARTICIPANT)
    // This is a lot more complicated than 2 simple commands; you need an
    // algorithm or have to define several scenarios.
}

main() {
    buildDatacenterAndServerModel(configurationFile);
    while (IamLeader) {
        parse(zk.getData("/zookeeper/config")) into servers.configuredRole;
        foreach(server) getServerRole("server:8081/commands/stat") into servers.activeRole;
        foreach(server.activeRole=DOWN) dynamicReconfigure(server=OBSERVER); server.setConfiguredRole(OBSERVER);
        if (designatedSurvivorDatacenter != datacenter.hasMajority) shiftMajority(designatedSurvivorDatacenter);
        balanceServerRoles();
    }
}

If the above is not what you meant, would you please send it again? I really appreciate your kindness!
Re: About ZooKeeper Dynamic Reconfiguration
I sent it a week ago.
Re: About ZooKeeper Dynamic Reconfiguration
Hi oo4load, If it is convenient for you, I would like to get the actual code for the zookeeper cluster balancer implementation from you. My email address is: wei@arcserve.com Thank you again.
Re: About ZooKeeper Dynamic Reconfiguration
Hi oo4load, Would you please send me the actual code of the implementation? Thank you very much!
Re: About ZooKeeper Dynamic Reconfiguration
No problem, I will send it on Monday.
Re: About ZooKeeper Dynamic Reconfiguration
Hi oo4load, If it is convenient for you, I would like to get the actual code for the zookeeper cluster balancer implementation from you. My email address is: wei@arcserve.com Thank you again.
Re: About ZooKeeper Dynamic Reconfiguration
Hi oo4load, Got it. Thanks a lot!
Re: About ZooKeeper Dynamic Reconfiguration
No, you have to build a zookeeper cluster manager client using my code. It's a zookeeper client.
Re: About ZooKeeper Dynamic Reconfiguration
Hi oo4load, How could we integrate this implementation with ZooKeeper 3.5.5? Does it mean we have to mix the implementation code into the already-released ZooKeeper 3.5.5, rebuild it into another ZooKeeper, and re-install it? Thanks.
Re: About ZooKeeper Dynamic Reconfiguration
Hi oo4load, Thank you so much for your reply! I would really like to study your design together with the actual code. I look forward to hearing from you.
Re: About ZooKeeper Dynamic Reconfiguration
Let me write this from memory. :)

We have the following:
- A running zookeeper cluster with the adminserver enabled.
- One or more balancer client processes (one per datacenter), of which one has a master role through some leader election. The master does the work; the others do nothing.
- In our case, we work with a designated survivor datacenter (it has 3 participants; the other, non-survivor datacenter has 2 participants and 1 observer), and the balancer always resides in the designated survivor datacenter. This is not a requirement, due to the above leader election.
- A balancer client configuration file with all predefined Zookeeper servers (used for building the client connection string and generating the server list). Under normal conditions, each predefined server has a running Zookeeper in either the participant or observer role.

Balancer design:

buildDatacenterAndServerModel(configurationFile) {
    enum zookeeperRole PARTICIPANT, OBSERVER, NONE, DOWN
    object datacenter has servers
    object server has zookeeperRole configuredRole, zookeeperRole activeRole
    parse(configurationFile) into (datacenter, servers);
}

shiftMajority(designatedSurvivorDatacenter) {
    designatedSurvivorDatacenter.someObserver.dynamicReconfigure(server=PARTICIPANT)
    otherDatacenter.someParticipant.dynamicReconfigure(server=OBSERVER)
}

balanceServerRoles() {
    if (designatedSurvivorDatacenter.hasMinimumQuorum)
        someParticipant.dynamicReconfigure(server=OBSERVER)
    if (quorumSize.aboveSafeLimit)
        someObserver.dynamicReconfigure(server=PARTICIPANT)
    // This is a lot more complicated than 2 simple commands; you need an
    // algorithm or have to define several scenarios.
}

main() {
    buildDatacenterAndServerModel(configurationFile);
    while (iAmLeader) {
        parse(zk.getData("/zookeeper/config")) into servers.configuredRole
        foreach(server) getServerRole("server:8081/commands/stat") into servers.activeRole
        foreach(server.activeRole=DOWN) dynamicReconfigure(server=OBSERVER); server.setConfiguredRole(OBSERVER)
        if (designatedSurvivorDatacenter != datacenter.hasMajority) shiftMajority(designatedSurvivorDatacenter)
        balanceServerRoles()
    }
}

Hope this helps. If you need more details, I can check the actual code in the coming week.
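[Editor's note] As a rough illustration of the decision logic in the pseudocode above, here is a small self-contained Python model. All names here (Server, Role, plan_rebalance) are invented for this sketch; the real balancer is a Java ZooKeeper client that issues actual dynamic reconfigure calls instead of returning a plan.

```python
from dataclasses import dataclass
from enum import Enum

class Role(Enum):
    PARTICIPANT = "participant"
    OBSERVER = "observer"
    DOWN = "down"

@dataclass
class Server:
    name: str
    datacenter: str
    role: Role

def plan_rebalance(servers, survivor_dc):
    """Return a list of (server, new_role) reconfiguration actions."""
    actions = []
    # 1. Demote dead servers to observers so they cannot endanger the quorum.
    for s in servers:
        if s.role == Role.DOWN:
            actions.append((s.name, Role.OBSERVER))
    live = [s for s in servers if s.role != Role.DOWN]
    participants = [s for s in live if s.role == Role.PARTICIPANT]
    # 2. Shift the majority: if the designated survivor datacenter does not
    #    hold more participants than the other one, promote one of its
    #    observers and demote a participant elsewhere.
    in_dc = [s for s in participants if s.datacenter == survivor_dc]
    out_dc = [s for s in participants if s.datacenter != survivor_dc]
    if len(in_dc) <= len(out_dc):
        observers_in_dc = [s for s in live
                           if s.datacenter == survivor_dc and s.role == Role.OBSERVER]
        if observers_in_dc and out_dc:
            actions.append((observers_in_dc[0].name, Role.PARTICIPANT))
            actions.append((out_dc[0].name, Role.OBSERVER))
    return actions
```

A balancer loop would compute such a plan on every iteration and apply each action via a reconfig call; the real implementation needs more scenarios, as noted in the pseudocode.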
Re: About ZooKeeper Dynamic Reconfiguration
Hi oo4load, Could you please tell me how to implement this to avoid the problem above? Thanks
Re: About ZooKeeper Dynamic Reconfiguration
We have 3+3, of which 1 is a floating observer in the non-target datacenter, with automatic reconfiguration to more observers if we are losing participants. If the target datacenter blows up this doesn't work, but our main application will be able to serve customers in a read-only state until operators switch the non-target datacenter to active mode.
RE: About ZooKeeper Dynamic Reconfiguration
At my organization we solve that by running a 3rd site, as mentioned in another email. We run a 5-node ensemble with 2 nodes in each primary data center and 1 node in the co-location facility. We try to minimize usage of the 5th node, so we explicitly exclude it from our clients' connection string. This way, if there is a network partition between the datacenters, whichever one can still talk to the node at the 3rd datacenter will maintain quorum.

Ideally, if it were possible, we'd somehow like the node at the third datacenter to never be elected as the leader, and even better if there were some way for it to be a voting member only and not bear any data (similar to MongoDB's arbiter).
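[Editor's note] The quorum arithmetic behind the 2+2+1 layout described above can be sanity-checked with a tiny model (my own illustrative Python, not from the thread):

```python
def has_quorum(reachable_voters, total_voters):
    """A partition stays operational only with a strict majority of voters."""
    return reachable_voters > total_voters // 2

# 5 voters: 2 in DC-A, 2 in DC-B, 1 at the third site. If the A<->B link
# is cut, whichever primary DC still reaches the third-site node has 3 of
# 5 voters and keeps quorum; the isolated DC has only 2 of 5 and stops.
dc_with_third_site = 2 + 1
isolated_dc = 2
```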
Re: About ZooKeeper Dynamic Reconfiguration
How many zk servers do you have? 2 + 3? If you lose DC #1 you are okay, but if you lose DC #2 you cannot have a quorum of 3, and you cannot simply add another server to DC #1.

Enrico
Re: About ZooKeeper Dynamic Reconfiguration
Yes, one side loses quorum and the other remains active. However, we actively control which side that is, because our main application is active/passive with 2 datacenters. We need Zookeeper to remain active in the application's active datacenter.
Re: About ZooKeeper Dynamic Reconfiguration
I have recently talked about a similar problem with Norbert K. on Gitter chat. We came to the conclusion that you need 3 datacenters.

Enrico
Re: About ZooKeeper Dynamic Reconfiguration
That's great! Thanks for sharing.

> Added benefit is that we can also control which data center gets the
> quorum in case of a network outage between the two.

Can you explain how this works? In case of a network outage between two DCs, one of them has a quorum of participants and the other doesn't. The participants in the smaller set should not be operational at this time, since they can't get quorum, no?

Thanks,
Alex
Re: About ZooKeeper Dynamic Reconfiguration
We have solved this by implementing a 'zookeeper cluster balancer': it calls the admin server API of each zookeeper to get the current status, and issues dynamic reconfigure commands to change dead servers into observers so the quorum is not in danger. Once the dead servers reconnect, they take the observer role and are then reconfigured into participants again.

An added benefit is that we can also control which data center gets the quorum in case of a network outage between the two.

Regards,
Chris
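[Editor's note] A sketch of what "calls the admin server API of each zookeeper" could look like: the snippet below polls the AdminServer's /commands/stat endpoint and maps its server_state field to a coarse role. This is illustrative Python, not Chris's Java implementation; the port and field names follow ZooKeeper AdminServer defaults as best I recall, so verify them against your deployment.

```python
import json
from urllib.request import urlopen

def classify(stat_json):
    """Map an AdminServer 'stat' response body to a coarse role label."""
    state = stat_json.get("server_state", "")
    if state == "observer":
        return "OBSERVER"
    if state in ("leader", "follower"):
        return "PARTICIPANT"
    return "UNKNOWN"

def fetch_role(host, port=8080, timeout=2.0):
    """Poll one server's admin endpoint; report DOWN if unreachable."""
    try:
        with urlopen(f"http://{host}:{port}/commands/stat", timeout=timeout) as r:
            return classify(json.load(r))
    except OSError:
        return "DOWN"
```

A balancer would run fetch_role over every predefined server each loop iteration and feed the results into its reconfiguration decisions.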
Re: About ZooKeeper Dynamic Reconfiguration
Hi,

Reconfiguration, as implemented, is not automatic. In your case, failures do not change the ensemble membership. When 2 of 5 fail, that is still a minority, so everything should work normally; you just won't be able to handle an additional failure. If you'd like to remove them from the ensemble, you need to issue an explicit reconfiguration command to do so.

Please see the details in the manual:
https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html

Alex

On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei wrote:
> Hi,
> I have run into a problem which blocks my development of load balancing
> using ZooKeeper 3.5.5.
> I have a ZooKeeper cluster which comprises five zk servers, and the
> dynamic configuration file is as follows:
>
> server.1=zk1:2888:3888:participant;0.0.0.0:2181
> server.2=zk2:2888:3888:participant;0.0.0.0:2181
> server.3=zk3:2888:3888:participant;0.0.0.0:2181
> server.4=zk4:2888:3888:participant;0.0.0.0:2181
> server.5=zk5:2888:3888:participant;0.0.0.0:2181
>
> The zk cluster works fine as long as every member works normally.
> However, if, say, two of them suddenly go down without notice, the
> dynamic configuration file shown above is not synchronized dynamically,
> which leads the zk cluster to fail to work normally.
> I think this is a very common case which may happen at any time. If so,
> how can we resolve it?
> Really looking forward to hearing from you!
> Thanks
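[Editor's note] For anyone scripting around these files: a dynamic config line of the form server.N=host:quorumPort:electionPort[:role][;clientAddress] can be split mechanically. Below is a small illustrative Python helper (my own sketch, not part of ZooKeeper; it does not handle every variant of the server-spec syntax):

```python
def parse_dynamic_config(line):
    """Parse one 'server.N=host:quorum:election[:role][;client]' line."""
    key, _, value = line.partition("=")
    server_id = int(key.split(".")[1])          # "server.1" -> 1
    server_part, _, client_part = value.partition(";")
    fields = server_part.split(":")
    host, quorum_port, election_port = fields[0], int(fields[1]), int(fields[2])
    # The role field is optional; ZooKeeper defaults to participant.
    role = fields[3] if len(fields) > 3 else "participant"
    return {"id": server_id, "host": host, "quorumPort": quorum_port,
            "electionPort": election_port, "role": role,
            "clientAddress": client_part or None}
```

This makes it easy, for example, to count participants per datacenter before deciding whether a reconfiguration is needed.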