Re: About ZooKeeper Dynamic Reconfiguration

2019-12-03 Thread Gao,Wei
Hi oo4load,
  I have a question that has confused me for quite a long time.
  As we all know, ZK servers frequently take snapshots while processing
requests. When a ZK server loads a snapshot and then replays a transaction
that had already been applied before the snapshot was taken, that
transaction is executed a second time. However, if a version number is
specified in the transaction, it will no longer match the current version
number when the transaction is replayed.
  How does the ZK server solve this problem?
  I really look forward to your answers!
  Thank you.





--
Sent from: http://zookeeper-user.578899.n2.nabble.com/


Re: About ZooKeeper Dynamic Reconfiguration

2019-10-09 Thread Chris T.
Please reply to my private mail address from now on.

On Thu, Oct 10, 2019 at 5:01 AM Gao,Wei  wrote:

> Hi Chris,
> I received your code for the zookeeper balancer. It seems that a few
> java class files are missing. They include:
> nl.ing.profileha.util.EventCreator;
> nl.ing.profileha.util.FailsafeTriggeredException;
> nl.ing.profileha.util.StringUtils;
> nl.ing.profileha.util.Validator;
> nl.ing.profileha.util.shell.SystemCommandExecutorWithTimeout;
> nl.ing.profileha.zoomonitor.LocalConfig;
> nl.ing.profileha.util.httpGetRequester;
> ACLMode.java;
> ZookeeperTreeCache.java;
>
> Would you please send these class files to me?
> I really appreciate your kindness!
> Thanks
>
>
>
>
> --
> Sent from: http://zookeeper-user.578899.n2.nabble.com/
>


Re: About ZooKeeper Dynamic Reconfiguration

2019-10-09 Thread Gao,Wei
Hi Chris,
I received your code for the zookeeper balancer. It seems that a few
java class files are missing. They include:
nl.ing.profileha.util.EventCreator;
nl.ing.profileha.util.FailsafeTriggeredException;
nl.ing.profileha.util.StringUtils;
nl.ing.profileha.util.Validator;
nl.ing.profileha.util.shell.SystemCommandExecutorWithTimeout;
nl.ing.profileha.zoomonitor.LocalConfig;
nl.ing.profileha.util.httpGetRequester;
ACLMode.java;
ZookeeperTreeCache.java;

Would you please send these class files to me?
I really appreciate your kindness!
Thanks




--
Sent from: http://zookeeper-user.578899.n2.nabble.com/


Re: About ZooKeeper Dynamic Reconfiguration

2019-10-09 Thread Chris T.
I sent it again, please check.

On Wed, Oct 9, 2019 at 6:31 AM Gao,Wei  wrote:

> Hi oo4load,
>   Where did you send it? Through this site or directly to my email?
> I received your pseudo code last week, as shown below:
>
> buildDatacenterAndServerModel(configurationFile) {
>   enum zookeeperRole PARTICIPANT, OBSERVER, NONE, DOWN
>   object datacenter has servers
>   object server has zookeeperRole configuredRole, zookeeperRole activeRole
>   parse(configurationFile) into (datacenter, servers);
> }
>
> shiftMajority(designatedSurvivorDatacenter) {
>   designatedSurvivorDatacenter.someObserver.dynamicReconfigure(server=PARTICIPANT)
>   otherDatacenter.someParticipant.dynamicReconfigure(server=OBSERVER)
> }
>
> balanceServerRoles() {
>   if (designatedSurvivorDatacenter.hasMinimumQuorum)
>     someParticipant.dynamicReconfigure(server=OBSERVER)
>   if (quorumSize.aboveSafeLimit)
>     someObserver.dynamicReconfigure(server=PARTICIPANT)
>   // This is a lot more complicated than 2 simple commands; you need an
>   // algorithm or define several scenarios.
> }
>
> main() {
>   buildDatacenterAndServerModel(configurationFile);
>   while (IamLeader) {
>     parse(zk.getData("/zookeeper/config")) into servers.configuredRole;
>     foreach(server) getServerRole("server:8081/commands/stat") into servers.activeRole;
>
>     foreach(server.activeRole=DOWN) dynamicReconfigure(server=OBSERVER);
>       server.setConfiguredRole(OBSERVER);
>
>     if (designatedSurvivorDatacenter != datacenter.hasMajority)
>       shiftMajority(designatedSurvivorDatacenter);
>     balanceServerRoles();
>   }
> }
>
> If the above is not what you mean, would you please send it again?
> I really appreciate your kindness!
>
>
>
>
>
> --
> Sent from: http://zookeeper-user.578899.n2.nabble.com/
>


Re: About ZooKeeper Dynamic Reconfiguration

2019-10-08 Thread Gao,Wei
Hi oo4load,
  Where did you send it? Through this site or directly to my email?
I received your pseudo code last week, as shown below:

buildDatacenterAndServerModel(configurationFile) {
  enum zookeeperRole PARTICIPANT, OBSERVER, NONE, DOWN
  object datacenter has servers
  object server has zookeeperRole configuredRole, zookeeperRole activeRole
  parse(configurationFile) into (datacenter, servers);
}

shiftMajority(designatedSurvivorDatacenter) {
  designatedSurvivorDatacenter.someObserver.dynamicReconfigure(server=PARTICIPANT)
  otherDatacenter.someParticipant.dynamicReconfigure(server=OBSERVER)
}

balanceServerRoles() {
  if (designatedSurvivorDatacenter.hasMinimumQuorum)
    someParticipant.dynamicReconfigure(server=OBSERVER)
  if (quorumSize.aboveSafeLimit)
    someObserver.dynamicReconfigure(server=PARTICIPANT)
  // This is a lot more complicated than 2 simple commands; you need an
  // algorithm or define several scenarios.
}

main() {
  buildDatacenterAndServerModel(configurationFile);
  while (IamLeader) {
    parse(zk.getData("/zookeeper/config")) into servers.configuredRole;
    foreach(server) getServerRole("server:8081/commands/stat") into servers.activeRole;

    foreach(server.activeRole=DOWN) dynamicReconfigure(server=OBSERVER);
      server.setConfiguredRole(OBSERVER);

    if (designatedSurvivorDatacenter != datacenter.hasMajority)
      shiftMajority(designatedSurvivorDatacenter);
    balanceServerRoles();
  }
}

If the above is not what you mean, would you please send it again?
I really appreciate your kindness!





--
Sent from: http://zookeeper-user.578899.n2.nabble.com/


Re: About ZooKeeper Dynamic Reconfiguration

2019-10-08 Thread Chris T.
I sent it 1 week ago.

On Tue, Oct 8, 2019 at 10:08 AM Gao,Wei  wrote:

> Hi oo4load,
> If it is convenient for you, I would like to get the actual code from you
> about the zookeeper cluster balancer implementation. My email address is:
> wei@arcserve.com
> Thank you again.
>
>
>
> --
> Sent from: http://zookeeper-user.578899.n2.nabble.com/
>


Re: About ZooKeeper Dynamic Reconfiguration

2019-10-08 Thread Gao,Wei
Hi oo4load,
  If it is convenient for you, I would like to get the actual code from you
about the zookeeper cluster balancer implementation. My email address is: 
wei@arcserve.com
Thank you again.



--
Sent from: http://zookeeper-user.578899.n2.nabble.com/


Re: About ZooKeeper Dynamic Reconfiguration

2019-10-07 Thread Gao,Wei
Hi oo4load,
 Would you please send me the actual code of the implementation?
Thank you very much!



--
Sent from: http://zookeeper-user.578899.n2.nabble.com/


Re: About ZooKeeper Dynamic Reconfiguration

2019-09-28 Thread Cee Tee

No problem, I will send it to you on Monday.

On 29 September 2019 04:30:28 "Gao,Wei"  wrote:


Hi oo4load,
 If it is convenient for you, I would like to get the actual code from you
about the zookeeper cluster balancer implementation. My email address is:
 wei@arcserve.com
 Thank you again.



--
Sent from: http://zookeeper-user.578899.n2.nabble.com/






Re: About ZooKeeper Dynamic Reconfiguration

2019-09-28 Thread Gao,Wei
Hi oo4load,
  If it is convenient for you, I would like to get the actual code from you
about the zookeeper cluster balancer implementation. My email address is:
  wei@arcserve.com
  Thank you again.



--
Sent from: http://zookeeper-user.578899.n2.nabble.com/


Re: About ZooKeeper Dynamic Reconfiguration

2019-09-27 Thread Gao,Wei
Hi oo4load,
  Got it. Thanks a lot!



--
Sent from: http://zookeeper-user.578899.n2.nabble.com/


Re: About ZooKeeper Dynamic Reconfiguration

2019-09-27 Thread Cee Tee
No, you have to build a zookeeper cluster manager client using my code. It's
a zookeeper client.


On 27 September 2019 10:44:51 "Gao,Wei"  wrote:


Hi oo4load,
 How could we integrate this implementation with ZooKeeper 3.5.5? Does it
mean we have to merge the implementation code into the already released
ZooKeeper 3.5.5, rebuild it into another ZooKeeper, and re-install it?
 Thanks.



--
Sent from: http://zookeeper-user.578899.n2.nabble.com/






Re: About ZooKeeper Dynamic Reconfiguration

2019-09-27 Thread Gao,Wei
Hi oo4load,
  How could we integrate this implementation with ZooKeeper 3.5.5? Does it
mean we have to merge the implementation code into the already released
ZooKeeper 3.5.5, rebuild it into another ZooKeeper, and re-install it?
  Thanks.



--
Sent from: http://zookeeper-user.578899.n2.nabble.com/


Re: About ZooKeeper Dynamic Reconfiguration

2019-09-27 Thread Gao,Wei
Hi oo4load,
  Thank you so much for your reply!
  I would really like to see your design as actual code!
  Really look forward to hearing from you.



--
Sent from: http://zookeeper-user.578899.n2.nabble.com/


Re: About ZooKeeper Dynamic Reconfiguration

2019-09-27 Thread Chris T.
Let me write this from memory. :)

We have the following:

- A running zookeeper cluster with adminserver enabled
- One or more balancer client processes (one per datacenter), of which one
  has a master role through some leader election. The master does the work,
  the others do nothing.
- In our case, we work with a designated survivor datacenter (it has 3
  participants, and the other non-survivor datacenter has 2 participants and
  1 observer), and the balancer always resides in the designated survivor
  datacenter. This is not a requirement, due to the above leader election.
- A balancer client configuration file with all predefined Zookeeper servers
  (used for building the client connection string and generating the server
  list; a sketch follows below). Each predefined server under normal
  conditions has a running Zookeeper in either the participant or observer role.
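
Purely as an illustration of what that balancer configuration file could contain
(the actual format is whatever your implementation chooses; all names below are
made up):

  # DC1 is the designated survivor datacenter
  datacenter.DC1.servers=zk1,zk2,zk3
  datacenter.DC2.servers=zk4,zk5,zk6
  clientPort=2181
  adminServerPort=8081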

Balancer design:

buildDatacenterAndServerModel(configurationFile) {
  enum zookeeperRole PARTICIPANT, OBSERVER, NONE, DOWN
  object datacenter has servers
  object server has zookeeperRole configuredRole, zookeeperRole activeRole
  parse(configurationFile) into (datacenter, servers);
}

shiftMajority(designatedSurvivorDatacenter) {
  designatedSurvivorDatacenter.someObserver.dynamicReconfigure(server=PARTICIPANT)
  otherDatacenter.someParticipant.dynamicReconfigure(server=OBSERVER)
}

balanceServerRoles() {
  if (designatedSurvivorDatacenter.hasMinimumQuorum)
    someParticipant.dynamicReconfigure(server=OBSERVER)
  if (quorumSize.aboveSafeLimit)
    someObserver.dynamicReconfigure(server=PARTICIPANT)
  // This is a lot more complicated than 2 simple commands; you need an
  // algorithm or define several scenarios.
}

main() {
  buildDatacenterAndServerModel(configurationFile);
  while (iAmLeader) {
    parse(zk.getData(/zookeeper/config)) into servers.configuredRole
    foreach(server) getServerRole(server:8081/commands/stat) into servers.activeRole

    foreach(server.activeRole=DOWN) dynamicReconfigure(server=OBSERVER);
      server.setConfiguredRole(OBSERVER)

    if (designatedSurvivorDatacenter != datacenter.hasMajority)
      shiftMajority(designatedSurvivorDatacenter)

    balanceServerRoles()
  }
}
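
As a reference for the dynamicReconfigure(...) steps above: a single role change
can be issued through the standard 3.5.x client API. This is only a minimal
sketch from memory (connection string, server id and ports are made up, error
handling is omitted, and on 3.5.3+ reconfig must be enabled with
reconfigEnabled=true and the client must be authorized), not our actual balancer
code:

import org.apache.zookeeper.admin.ZooKeeperAdmin;
import org.apache.zookeeper.data.Stat;

public class RoleChangeSketch {
    public static void main(String[] args) throws Exception {
        // Connect like a normal client; ZooKeeperAdmin adds the reconfigure() call.
        ZooKeeperAdmin admin = new ZooKeeperAdmin(
                "zk1:2181,zk2:2181,zk3:2181", 30000, event -> { });

        // Re-add server.5 with the observer role; joining a server id that is
        // already in the ensemble changes its role. fromConfig = -1 means
        // "apply against the current configuration version".
        String joining = "server.5=zk5:2888:3888:observer;0.0.0.0:2181";
        byte[] newConfig = admin.reconfigure(joining, null, null, -1, new Stat());

        // The returned bytes are the resulting dynamic configuration.
        System.out.println(new String(newConfig));
        admin.close();
    }
}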



Hope this helps. If you need more details, I can check the actual code in
the coming week.

On Fri, Sep 27, 2019 at 5:06 AM Gao,Wei  wrote:

> Hi oo4load,
>   Could you please tell me how to implement this to avoid the problem
> above?
> Thanks
>
>
>
> --
> Sent from: http://zookeeper-user.578899.n2.nabble.com/
>


Re: About ZooKeeper Dynamic Reconfiguration

2019-09-26 Thread Gao,Wei
Hi oo4load,
  Could you please tell me how to implement this to avoid the problem
above?
Thanks



--
Sent from: http://zookeeper-user.578899.n2.nabble.com/


Re: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Cee Tee
We have 3+3, of which 1 is a floating observer in the non-target datacenter, and
automatic reconfiguration to more observers if we are losing participants.


If the target datacenter blows up, this doesn't work, but our main
application will be able to serve customers in a read-only state until
operators switch the non-target datacenter to active mode.


On 21 August 2019 20:39:21 Enrico Olivelli  wrote:


On Wed, 21 Aug 2019 at 20:27, Cee Tee  wrote:



Yes, one side loses quorum and the other remains active. However we
actively control which side that is, because our main application is
active/passive with 2 datacenters. We need Zookeeper to remain active in


the applications active datacenter.




How many zk servers do you have? 2 + 3?
If you lose DC #1 you are okay, but if you lose #2 you cannot have a
quorum of 3, and you cannot simply add another server to #1.

Enrico



On 21 August 2019 17:22:00 Alexander Shraer  wrote:
> That's great! Thanks for sharing.
>
>
>> Added benefit is that we can also control which data center gets the
quorum
>> in case of a network outage between the two.
>
>
> Can you explain how this works? In case of a network outage between two
> DCs, one of them has a quorum of participants and the other doesn't.
> The participants in the smaller set should not be operational at this
time,
> since they can't get quorum. no ?
>
>
>
> Thanks,
> Alex
>
>
> On Wed, Aug 21, 2019 at 7:55 AM Cee Tee  wrote:
>
> We have solved this by implementing a 'zookeeper cluster balancer', it
> calls the admin server api of each zookeeper to get the current status
and
> will issue dynamic reconfigure commands to change dead servers into
> observers so the quorum is not in danger. Once the dead servers
reconnect,
> they take the observer role and are then reconfigured into participants
again.
>
> Added benefit is that we can also control which data center gets the
quorum
> in case of a network outage between the two.
> Regards
> Chris
>
> On 21 August 2019 16:42:37 Alexander Shraer  wrote:
>
>> Hi,
>>
>> Reconfiguration, as implemented, is not automatic. In your case, when
>> failures happen, this doesn't change the ensemble membership.
>> When 2 of 5 fail, this is still a minority, so everything should work
>> normally, you just won't be able to handle an additional failure. If
you'd
>> like
>> to remove them from the ensemble, you need to issue an explicit
>> reconfiguration command to do so.
>>
>> Please see details in the manual:
>> https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html
>>
>> Alex
>>
>> On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:
>>
>>> Hi
>>>I encounter a problem which blocks my development of load balance
using
>>> ZooKeeper 3.5.5.
>>>Actually, I have a ZooKeeper cluster which comprises of five zk
>>> servers. And the dynamic configuration file is as follows:
>>>
>>>   server.1=zk1:2888:3888:participant;0.0.0.0:2181
>>>   server.2=zk2:2888:3888:participant;0.0.0.0:2181
>>>   server.3=zk3:2888:3888:participant;0.0.0.0:2181
>>>   server.4=zk4:2888:3888:participant;0.0.0.0:2181
>>>   server.5=zk5:2888:3888:participant;0.0.0.0:2181
>>>
>>>   The zk cluster can work fine if every member works normally.
However, if
>>> say two of them are suddenly down without previously being notified,
>>> the dynamic configuration file shown above will not be synchronized
>>> dynamically, which leads to the zk cluster fail to work normally.
>>>   I think this is a very common case which may happen at any time. If
so,
>>> how can we resolve it?
>>>   Really look forward to hearing from you!
>>> Thanks
>>>








RE: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Kathryn Hogg
At my organization we solve that by running a 3rd site, as mentioned in another 
email.  We run a 5-node ensemble with 2 nodes in each primary data center and 1 
node in the co-location facility.  We try to minimize usage of the 5th node, so 
we explicitly exclude it from our clients' connection string.
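
For illustration only (host names are made up to match the example earlier in
this thread): if the five servers are zk1..zk5 and zk5 is the node at the 3rd
site, our clients would get a connection string like
zk1:2181,zk2:2181,zk3:2181,zk4:2181 rather than one that lists all five servers.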

This way, if there is a network partition between datacenters, whichever one 
can still talk to the node at the 3rd datacenter will maintain quorum.

Ideally, if it were possible, we'd like the node at the third datacenter to 
never be elected leader, and even better if there were some way for it to be a 
voting-only member that does not bear any data (similar to MongoDB's arbiter).


-Original Message-
From: Cee Tee [mailto:c.turks...@gmail.com] 
Sent: Wednesday, August 21, 2019 1:27 PM
To: Alexander Shraer 
Cc: user@zookeeper.apache.org
Subject: Re: About ZooKeeper Dynamic Reconfiguration


Yes, one side loses quorum and the other remains active. However we actively 
control which side that is, because our main application is active/passive with 
2 datacenters. We need Zookeeper to remain active in the applications active 
datacenter.

On 21 August 2019 17:22:00 Alexander Shraer  wrote:
> That's great! Thanks for sharing.
>
>
>> Added benefit is that we can also control which data center gets the 
>> quorum in case of a network outage between the two.
>
>
> Can you explain how this works? In case of a network outage between 
> two DCs, one of them has a quorum of participants and the other doesn't.
> The participants in the smaller set should not be operational at this 
> time, since they can't get quorum. no ?
>
>
>
> Thanks,
> Alex
>
>
> On Wed, Aug 21, 2019 at 7:55 AM Cee Tee  wrote:
>
> We have solved this by implementing a 'zookeeper cluster balancer', it 
> calls the admin server api of each zookeeper to get the current status 
> and will issue dynamic reconfigure commands to change dead servers 
> into observers so the quorum is not in danger. Once the dead servers 
> reconnect, they take the observer role and are then reconfigured into 
> participants again.
>
> Added benefit is that we can also control which data center gets the 
> quorum in case of a network outage between the two.
> Regards
> Chris
>
> On 21 August 2019 16:42:37 Alexander Shraer  wrote:
>
>> Hi,
>>
>> Reconfiguration, as implemented, is not automatic. In your case, when 
>> failures happen, this doesn't change the ensemble membership.
>> When 2 of 5 fail, this is still a minority, so everything should work 
>> normally, you just won't be able to handle an additional failure. If 
>> you'd like to remove them from the ensemble, you need to issue an 
>> explicit reconfiguration command to do so.
>>
>> Please see details in the manual:
>> https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html
>>
>> Alex
>>
>> On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:
>>
>>> Hi
>>>I encounter a problem which blocks my development of load balance 
>>> using ZooKeeper 3.5.5.
>>>Actually, I have a ZooKeeper cluster which comprises of five zk 
>>> servers. And the dynamic configuration file is as follows:
>>>
>>>   server.1=zk1:2888:3888:participant;0.0.0.0:2181
>>>   server.2=zk2:2888:3888:participant;0.0.0.0:2181
>>>   server.3=zk3:2888:3888:participant;0.0.0.0:2181
>>>   server.4=zk4:2888:3888:participant;0.0.0.0:2181
>>>   server.5=zk5:2888:3888:participant;0.0.0.0:2181
>>>
>>>   The zk cluster can work fine if every member works normally. 
>>> However, if say two of them are suddenly down without previously 
>>> being notified, the dynamic configuration file shown above will not 
>>> be synchronized dynamically, which leads to the zk cluster fail to work 
>>> normally.
>>>   I think this is a very common case which may happen at any time. 
>>> If so, how can we resolve it?
>>>   Really look forward to hearing from you!
>>> Thanks
>>>



Re: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Enrico Olivelli
On Wed, 21 Aug 2019 at 20:27, Cee Tee  wrote:

>
> Yes, one side loses quorum and the other remains active. However we
> actively control which side that is, because our main application is
> active/passive with 2 datacenters. We need Zookeeper to remain active in

the applications active datacenter.
>

How many zk servers do you have? 2 + 3?
If you lose DC #1 you are okay, but if you lose #2 you cannot have a
quorum of 3, and you cannot simply add another server to #1.

Enrico

>
> On 21 August 2019 17:22:00 Alexander Shraer  wrote:
> > That's great! Thanks for sharing.
> >
> >
> >> Added benefit is that we can also control which data center gets the
> quorum
> >> in case of a network outage between the two.
> >
> >
> > Can you explain how this works? In case of a network outage between two
> > DCs, one of them has a quorum of participants and the other doesn't.
> > The participants in the smaller set should not be operational at this
> time,
> > since they can't get quorum. no ?
> >
> >
> >
> > Thanks,
> > Alex
> >
> >
> > On Wed, Aug 21, 2019 at 7:55 AM Cee Tee  wrote:
> >
> > We have solved this by implementing a 'zookeeper cluster balancer', it
> > calls the admin server api of each zookeeper to get the current status
> and
> > will issue dynamic reconfigure commands to change dead servers into
> > observers so the quorum is not in danger. Once the dead servers
> reconnect,
> > they take the observer role and are then reconfigured into participants
> again.
> >
> > Added benefit is that we can also control which data center gets the
> quorum
> > in case of a network outage between the two.
> > Regards
> > Chris
> >
> > On 21 August 2019 16:42:37 Alexander Shraer  wrote:
> >
> >> Hi,
> >>
> >> Reconfiguration, as implemented, is not automatic. In your case, when
> >> failures happen, this doesn't change the ensemble membership.
> >> When 2 of 5 fail, this is still a minority, so everything should work
> >> normally, you just won't be able to handle an additional failure. If
> you'd
> >> like
> >> to remove them from the ensemble, you need to issue an explicit
> >> reconfiguration command to do so.
> >>
> >> Please see details in the manual:
> >> https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html
> >>
> >> Alex
> >>
> >> On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:
> >>
> >>> Hi
> >>>I encounter a problem which blocks my development of load balance
> using
> >>> ZooKeeper 3.5.5.
> >>>Actually, I have a ZooKeeper cluster which comprises of five zk
> >>> servers. And the dynamic configuration file is as follows:
> >>>
> >>>   server.1=zk1:2888:3888:participant;0.0.0.0:2181
> >>>   server.2=zk2:2888:3888:participant;0.0.0.0:2181
> >>>   server.3=zk3:2888:3888:participant;0.0.0.0:2181
> >>>   server.4=zk4:2888:3888:participant;0.0.0.0:2181
> >>>   server.5=zk5:2888:3888:participant;0.0.0.0:2181
> >>>
> >>>   The zk cluster can work fine if every member works normally.
> However, if
> >>> say two of them are suddenly down without previously being notified,
> >>> the dynamic configuration file shown above will not be synchronized
> >>> dynamically, which leads to the zk cluster fail to work normally.
> >>>   I think this is a very common case which may happen at any time. If
> so,
> >>> how can we resolve it?
> >>>   Really look forward to hearing from you!
> >>> Thanks
> >>>
>
>


Re: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Cee Tee


Yes, one side loses quorum and the other remains active. However, we 
actively control which side that is, because our main application is 
active/passive with 2 datacenters. We need Zookeeper to remain active in 
the application's active datacenter.


On 21 August 2019 17:22:00 Alexander Shraer  wrote:

That's great! Thanks for sharing.



Added benefit is that we can also control which data center gets the quorum
in case of a network outage between the two.



Can you explain how this works? In case of a network outage between two 
DCs, one of them has a quorum of participants and the other doesn't.
The participants in the smaller set should not be operational at this time, 
since they can't get quorum. no ?




Thanks,
Alex


On Wed, Aug 21, 2019 at 7:55 AM Cee Tee  wrote:

We have solved this by implementing a 'zookeeper cluster balancer', it
calls the admin server api of each zookeeper to get the current status and
will issue dynamic reconfigure commands to change dead servers into
observers so the quorum is not in danger. Once the dead servers reconnect,
they take the observer role and are then reconfigured into participants again.

Added benefit is that we can also control which data center gets the quorum
in case of a network outage between the two.
Regards
Chris

On 21 August 2019 16:42:37 Alexander Shraer  wrote:


Hi,

Reconfiguration, as implemented, is not automatic. In your case, when
failures happen, this doesn't change the ensemble membership.
When 2 of 5 fail, this is still a minority, so everything should work
normally, you just won't be able to handle an additional failure. If you'd
like
to remove them from the ensemble, you need to issue an explicit
reconfiguration command to do so.

Please see details in the manual:
https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html

Alex

On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:


Hi
   I encounter a problem which blocks my development of load balance using
ZooKeeper 3.5.5.
   Actually, I have a ZooKeeper cluster which comprises of five zk
servers. And the dynamic configuration file is as follows:

  server.1=zk1:2888:3888:participant;0.0.0.0:2181
  server.2=zk2:2888:3888:participant;0.0.0.0:2181
  server.3=zk3:2888:3888:participant;0.0.0.0:2181
  server.4=zk4:2888:3888:participant;0.0.0.0:2181
  server.5=zk5:2888:3888:participant;0.0.0.0:2181

  The zk cluster can work fine if every member works normally. However, if
say two of them are suddenly down without previously being notified,
the dynamic configuration file shown above will not be synchronized
dynamically, which leads to the zk cluster fail to work normally.
  I think this is a very common case which may happen at any time. If so,
how can we resolve it?
  Really look forward to hearing from you!
Thanks





Re: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Enrico Olivelli
On Wed, 21 Aug 2019 at 17:22, Alexander Shraer  wrote:

> That's great! Thanks for sharing.
>
> > Added benefit is that we can also control which data center gets the
> quorum
> > in case of a network outage between the two.
>
> Can you explain how this works? In case of a network outage between two
> DCs, one of them has a quorum of participants and the other doesn't.
> The participants in the smaller set should not be operational at this time,
> since they can't get quorum. no ?
>

I have recently talked about a similar problem with Norbert K. on gitter
chat. We came to the conclusion that you need 3 datacenters.

Enrico



> Thanks,
> Alex
>
> On Wed, Aug 21, 2019 at 7:55 AM Cee Tee  wrote:
>
> > We have solved this by implementing a 'zookeeper cluster balancer', it
> > calls the admin server api of each zookeeper to get the current status
> and
> > will issue dynamic reconfigure commands to change dead servers into
> > observers so the quorum is not in danger. Once the dead servers
> reconnect,
> > they take the observer role and are then reconfigured into participants
> > again.
> >
> > Added benefit is that we can also control which data center gets the
> > quorum
> > in case of a network outage between the two.
> > Regards
> > Chris
> >
> > On 21 August 2019 16:42:37 Alexander Shraer  wrote:
> >
> > > Hi,
> > >
> > > Reconfiguration, as implemented, is not automatic. In your case, when
> > > failures happen, this doesn't change the ensemble membership.
> > > When 2 of 5 fail, this is still a minority, so everything should work
> > > normally, you just won't be able to handle an additional failure. If
> > you'd
> > > like
> > > to remove them from the ensemble, you need to issue an explicit
> > > reconfiguration command to do so.
> > >
> > > Please see details in the manual:
> > > https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html
> > >
> > > Alex
> > >
> > > On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:
> > >
> > >> Hi
> > >>I encounter a problem which blocks my development of load balance
> > using
> > >> ZooKeeper 3.5.5.
> > >>Actually, I have a ZooKeeper cluster which comprises of five zk
> > >> servers. And the dynamic configuration file is as follows:
> > >>
> > >>   server.1=zk1:2888:3888:participant;0.0.0.0:2181
> > >>   server.2=zk2:2888:3888:participant;0.0.0.0:2181
> > >>   server.3=zk3:2888:3888:participant;0.0.0.0:2181
> > >>   server.4=zk4:2888:3888:participant;0.0.0.0:2181
> > >>   server.5=zk5:2888:3888:participant;0.0.0.0:2181
> > >>
> > >>   The zk cluster can work fine if every member works normally.
> However,
> > if
> > >> say two of them are suddenly down without previously being notified,
> > >> the dynamic configuration file shown above will not be synchronized
> > >> dynamically, which leads to the zk cluster fail to work normally.
> > >>   I think this is a very common case which may happen at any time. If
> > so,
> > >> how can we resolve it?
> > >>   Really look forward to hearing from you!
> > >> Thanks
> > >>
> >
> >
> >
> >
>


Re: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Alexander Shraer
That's great! Thanks for sharing.

> Added benefit is that we can also control which data center gets the
quorum
> in case of a network outage between the two.

Can you explain how this works? In case of a network outage between two
DCs, one of them has a quorum of participants and the other doesn't.
The participants in the smaller set should not be operational at this time,
since they can't get quorum. no ?

Thanks,
Alex

On Wed, Aug 21, 2019 at 7:55 AM Cee Tee  wrote:

> We have solved this by implementing a 'zookeeper cluster balancer', it
> calls the admin server api of each zookeeper to get the current status and
> will issue dynamic reconfigure commands to change dead servers into
> observers so the quorum is not in danger. Once the dead servers reconnect,
> they take the observer role and are then reconfigured into participants
> again.
>
> Added benefit is that we can also control which data center gets the
> quorum
> in case of a network outage between the two.
> Regards
> Chris
>
> On 21 August 2019 16:42:37 Alexander Shraer  wrote:
>
> > Hi,
> >
> > Reconfiguration, as implemented, is not automatic. In your case, when
> > failures happen, this doesn't change the ensemble membership.
> > When 2 of 5 fail, this is still a minority, so everything should work
> > normally, you just won't be able to handle an additional failure. If
> you'd
> > like
> > to remove them from the ensemble, you need to issue an explicit
> > reconfiguration command to do so.
> >
> > Please see details in the manual:
> > https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html
> >
> > Alex
> >
> > On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:
> >
> >> Hi
> >>I encounter a problem which blocks my development of load balance
> using
> >> ZooKeeper 3.5.5.
> >>Actually, I have a ZooKeeper cluster which comprises of five zk
> >> servers. And the dynamic configuration file is as follows:
> >>
> >>   server.1=zk1:2888:3888:participant;0.0.0.0:2181
> >>   server.2=zk2:2888:3888:participant;0.0.0.0:2181
> >>   server.3=zk3:2888:3888:participant;0.0.0.0:2181
> >>   server.4=zk4:2888:3888:participant;0.0.0.0:2181
> >>   server.5=zk5:2888:3888:participant;0.0.0.0:2181
> >>
> >>   The zk cluster can work fine if every member works normally. However,
> if
> >> say two of them are suddenly down without previously being notified,
> >> the dynamic configuration file shown above will not be synchronized
> >> dynamically, which leads to the zk cluster fail to work normally.
> >>   I think this is a very common case which may happen at any time. If
> so,
> >> how can we resolve it?
> >>   Really look forward to hearing from you!
> >> Thanks
> >>
>
>
>
>


Re: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Cee Tee
We have solved this by implementing a 'zookeeper cluster balancer': it 
calls the admin server API of each zookeeper to get the current status and 
will issue dynamic reconfigure commands to change dead servers into 
observers so the quorum is not in danger. Once the dead servers reconnect, 
they take the observer role and are then reconfigured into participants again.
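
As a rough sketch of that status check (for illustration only: the host name is
made up, the admin server port defaults to 8080 unless reconfigured, and the
exact JSON field names should be checked against your ZooKeeper version):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class AdminServerStatCheck {

    // Fetch /commands/stat from one server's admin server and return the raw
    // JSON, or null if the server is unreachable (treated as a dead server).
    static String fetchStat(String host, int adminPort) {
        try {
            URL url = new URL("http://" + host + ":" + adminPort + "/commands/stat");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The returned JSON includes the server's state (leader/follower/observer),
        // which is what gets mapped to the server's current role.
        String json = fetchStat("zk1", 8080);
        System.out.println(json == null ? "DOWN" : json);
    }
}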


Added benefit is that we can also control which data center gets the quorum 
in case of a network outage between the two.

Regards
Chris

On 21 August 2019 16:42:37 Alexander Shraer  wrote:


Hi,

Reconfiguration, as implemented, is not automatic. In your case, when
failures happen, this doesn't change the ensemble membership.
When 2 of 5 fail, this is still a minority, so everything should work
normally, you just won't be able to handle an additional failure. If you'd
like
to remove them from the ensemble, you need to issue an explicit
reconfiguration command to do so.

Please see details in the manual:
https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html

Alex

On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:


Hi
   I encounter a problem which blocks my development of load balance using
ZooKeeper 3.5.5.
   Actually, I have a ZooKeeper cluster which comprises of five zk
servers. And the dynamic configuration file is as follows:

  server.1=zk1:2888:3888:participant;0.0.0.0:2181
  server.2=zk2:2888:3888:participant;0.0.0.0:2181
  server.3=zk3:2888:3888:participant;0.0.0.0:2181
  server.4=zk4:2888:3888:participant;0.0.0.0:2181
  server.5=zk5:2888:3888:participant;0.0.0.0:2181

  The zk cluster can work fine if every member works normally. However, if
say two of them are suddenly down without previously being notified,
the dynamic configuration file shown above will not be synchronized
dynamically, which leads to the zk cluster fail to work normally.
  I think this is a very common case which may happen at any time. If so,
how can we resolve it?
  Really look forward to hearing from you!
Thanks







Re: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Alexander Shraer
Hi,

Reconfiguration, as implemented, is not automatic. In your case, when
failures happen, this doesn't change the ensemble membership.
When 2 of 5 fail, this is still a minority, so everything should work
normally, you just won't be able to handle an additional failure. If you'd
like
to remove them from the ensemble, you need to issue an explicit
reconfiguration command to do so.
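
For example (just a sketch, assuming the two failed servers have ids 4 and 5),
from zkCli.sh connected to one of the healthy servers:

  reconfig -remove 4,5

Note that on 3.5.3 and later, reconfig must be enabled (reconfigEnabled=true)
and the client must be authorized to use it.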

Please see details in the manual:
https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html

Alex

On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:

> Hi
>    I have encountered a problem which blocks my development of load balancing
> using ZooKeeper 3.5.5.
>    Actually, I have a ZooKeeper cluster which comprises five zk
> servers. And the dynamic configuration file is as follows:
>
>   server.1=zk1:2888:3888:participant;0.0.0.0:2181
>   server.2=zk2:2888:3888:participant;0.0.0.0:2181
>   server.3=zk3:2888:3888:participant;0.0.0.0:2181
>   server.4=zk4:2888:3888:participant;0.0.0.0:2181
>   server.5=zk5:2888:3888:participant;0.0.0.0:2181
>
>   The zk cluster can work fine if every member works normally. However, if
> say two of them are suddenly down without previously being notified,
> the dynamic configuration file shown above will not be synchronized
> dynamically, which leads to the zk cluster failing to work normally.
>   I think this is a very common case which may happen at any time. If so,
> how can we resolve it?
>   Really look forward to hearing from you!
> Thanks
>