Re: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Cee Tee
We have 3+3 servers, of which one is a floating observer in the non-target
datacenter, and we automatically reconfigure to more observers if we are
losing participants.


If the target datacenter blows up this doesn't work, but our main
application will still be able to serve customers in a read-only state until
operators switch the non-target datacenter to active mode.
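One reading of that 3+3 layout, in the same dynamic-config syntax quoted later
in this thread (hostnames here are hypothetical), is five participants plus one
observer, so the target datacenter holds 3 of the 5 voting members:

  # target datacenter (illustrative hostnames)
  server.1=dca-zk1:2888:3888:participant;0.0.0.0:2181
  server.2=dca-zk2:2888:3888:participant;0.0.0.0:2181
  server.3=dca-zk3:2888:3888:participant;0.0.0.0:2181
  # non-target datacenter, including the floating observer
  server.4=dcb-zk1:2888:3888:participant;0.0.0.0:2181
  server.5=dcb-zk2:2888:3888:participant;0.0.0.0:2181
  server.6=dcb-zk3:2888:3888:observer;0.0.0.0:2181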


On 21 August 2019 20:39:21 Enrico Olivelli  wrote:


On Wed 21 Aug 2019, 20:27, Cee Tee  wrote:



Yes, one side loses quorum and the other remains active. However, we
actively control which side that is, because our main application is
active/passive with 2 datacenters. We need ZooKeeper to remain active in
the application's active datacenter.




How many zk servers do you have? 2 + 3?
If you lose DC #1 you are okay, but if you lose DC #2 you cannot form a
quorum of 3, and you cannot simply add another server to DC #1.

Enrico



On 21 August 2019 17:22:00 Alexander Shraer  wrote:
> That's great! Thanks for sharing.
>
>
>> Added benefit is that we can also control which data center gets the
>> quorum in case of a network outage between the two.
>
>
> Can you explain how this works? In case of a network outage between two
> DCs, one of them has a quorum of participants and the other doesn't.
> The participants in the smaller set should not be operational at this
> time, since they can't get quorum. no ?
>
>
>
> Thanks,
> Alex
>
>
> On Wed, Aug 21, 2019 at 7:55 AM Cee Tee  wrote:
>
> We have solved this by implementing a 'zookeeper cluster balancer', it
> calls the admin server api of each zookeeper to get the current status and
> will issue dynamic reconfigure commands to change dead servers into
> observers so the quorum is not in danger. Once the dead servers reconnect,
> they take the observer role and are then reconfigured into participants again.
>
> Added benefit is that we can also control which data center gets the quorum
> in case of a network outage between the two.
> Regards
> Chris
>
> On 21 August 2019 16:42:37 Alexander Shraer  wrote:
>
>> Hi,
>>
>> Reconfiguration, as implemented, is not automatic. In your case, when
>> failures happen, this doesn't change the ensemble membership.
>> When 2 of 5 fail, this is still a minority, so everything should work
>> normally, you just won't be able to handle an additional failure. If you'd
>> like to remove them from the ensemble, you need to issue an explicit
>> reconfiguration command to do so.
>>
>> Please see details in the manual:
>> https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html
>>
>> Alex
>>
>> On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:
>>
>>> Hi
>>>I encounter a problem which blocks my development of load balance
>>> using ZooKeeper 3.5.5.
>>>Actually, I have a ZooKeeper cluster which comprises of five zk
>>> servers. And the dynamic configuration file is as follows:
>>>
>>>   server.1=zk1:2888:3888:participant;0.0.0.0:2181
>>>   server.2=zk2:2888:3888:participant;0.0.0.0:2181
>>>   server.3=zk3:2888:3888:participant;0.0.0.0:2181
>>>   server.4=zk4:2888:3888:participant;0.0.0.0:2181
>>>   server.5=zk5:2888:3888:participant;0.0.0.0:2181
>>>
>>>   The zk cluster can work fine if every member works normally. However, if
>>> say two of them are suddenly down without previously being notified,
>>> the dynamic configuration file shown above will not be synchronized
>>> dynamically, which leads to the zk cluster fail to work normally.
>>>   I think this is a very common case which may happen at any time. If so,
>>> how can we resolve it?
>>>   Really look forward to hearing from you!
>>> Thanks
>>>








RE: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Kathryn Hogg
At my organization we solve that by running a 3rd site, as mentioned in another 
email. We run a 5-node ensemble with 2 nodes in each primary data center and 1 
node in the co-location facility. We try to minimize usage of the 5th node, so 
we explicitly exclude it from our clients' connection string (see the sketch 
below).

This way, if there is a network partition between datacenters, whichever one 
can still talk to the node at the 3rd datacenter will maintain quorum.

Ideally, if it were possible, we'd somehow like the node at the third datacenter 
to never be elected as the leader, and even better if there were some way for it 
to be a voting member only and not bear any data (similar to MongoDB's arbiter).
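A rough sketch of that layout, in the dynamic-config syntax quoted elsewhere in
this thread (hostnames are hypothetical):

  # primary data center 1
  server.1=dc1-zk1:2888:3888:participant;0.0.0.0:2181
  server.2=dc1-zk2:2888:3888:participant;0.0.0.0:2181
  # primary data center 2
  server.3=dc2-zk1:2888:3888:participant;0.0.0.0:2181
  server.4=dc2-zk2:2888:3888:participant;0.0.0.0:2181
  # co-location facility (tie-breaker)
  server.5=colo-zk1:2888:3888:participant;0.0.0.0:2181

The clients' connection string would then list only the four primary-DC nodes,
e.g. dc1-zk1:2181,dc1-zk2:2181,dc2-zk1:2181,dc2-zk2:2181, so the tie-breaker
still votes but rarely serves client traffic.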


-Original Message-
From: Cee Tee [mailto:c.turks...@gmail.com] 
Sent: Wednesday, August 21, 2019 1:27 PM
To: Alexander Shraer 
Cc: user@zookeeper.apache.org
Subject: Re: About ZooKeeper Dynamic Reconfiguration



Yes, one side loses quorum and the other remains active. However, we actively 
control which side that is, because our main application is active/passive with 
2 datacenters. We need ZooKeeper to remain active in the application's active 
datacenter.

On 21 August 2019 17:22:00 Alexander Shraer  wrote:
> That's great! Thanks for sharing.
>
>
>> Added benefit is that we can also control which data center gets the 
>> quorum in case of a network outage between the two.
>
>
> Can you explain how this works? In case of a network outage between 
> two DCs, one of them has a quorum of participants and the other doesn't.
> The participants in the smaller set should not be operational at this 
> time, since they can't get quorum. no ?
>
>
>
> Thanks,
> Alex
>
>
> On Wed, Aug 21, 2019 at 7:55 AM Cee Tee  wrote:
>
> We have solved this by implementing a 'zookeeper cluster balancer', it 
> calls the admin server api of each zookeeper to get the current status 
> and will issue dynamic reconfigure commands to change dead servers 
> into observers so the quorum is not in danger. Once the dead servers 
> reconnect, they take the observer role and are then reconfigured into 
> participants again.
>
> Added benefit is that we can also control which data center gets the 
> quorum in case of a network outage between the two.
> Regards
> Chris
>
> On 21 August 2019 16:42:37 Alexander Shraer  wrote:
>
>> Hi,
>>
>> Reconfiguration, as implemented, is not automatic. In your case, when 
>> failures happen, this doesn't change the ensemble membership.
>> When 2 of 5 fail, this is still a minority, so everything should work 
>> normally, you just won't be able to handle an additional failure. If 
>> you'd like to remove them from the ensemble, you need to issue an 
>> explicit reconfiguration command to do so.
>>
>> Please see details in the manual:
>> https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html
>>
>> Alex
>>
>> On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:
>>
>>> Hi
>>>I encounter a problem which blocks my development of load balance 
>>> using ZooKeeper 3.5.5.
>>>Actually, I have a ZooKeeper cluster which comprises of five zk 
>>> servers. And the dynamic configuration file is as follows:
>>>
>>>   server.1=zk1:2888:3888:participant;0.0.0.0:2181
>>>   server.2=zk2:2888:3888:participant;0.0.0.0:2181
>>>   server.3=zk3:2888:3888:participant;0.0.0.0:2181
>>>   server.4=zk4:2888:3888:participant;0.0.0.0:2181
>>>   server.5=zk5:2888:3888:participant;0.0.0.0:2181
>>>
>>>   The zk cluster can work fine if every member works normally. 
>>> However, if say two of them are suddenly down without previously 
>>> being notified, the dynamic configuration file shown above will not 
>>> be synchronized dynamically, which leads to the zk cluster fail to work 
>>> normally.
>>>   I think this is a very common case which may happen at any time. 
>>> If so, how can we resolve it?
>>>   Really look forward to hearing from you!
>>> Thanks
>>>



Re: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Enrico Olivelli
On Wed 21 Aug 2019, 20:27, Cee Tee  wrote:

>
> Yes, one side loses quorum and the other remains active. However, we
> actively control which side that is, because our main application is
> active/passive with 2 datacenters. We need ZooKeeper to remain active in
> the application's active datacenter.
>

How many zk servers do you have? 2 + 3?
If you lose DC #1 you are okay, but if you lose DC #2 you cannot form a
quorum of 3, and you cannot simply add another server to DC #1.

Enrico

>
> On 21 August 2019 17:22:00 Alexander Shraer  wrote:
> > That's great! Thanks for sharing.
> >
> >
> >> Added benefit is that we can also control which data center gets the
> >> quorum in case of a network outage between the two.
> >
> >
> > Can you explain how this works? In case of a network outage between two
> > DCs, one of them has a quorum of participants and the other doesn't.
> > The participants in the smaller set should not be operational at this
> > time, since they can't get quorum. no ?
> >
> >
> >
> > Thanks,
> > Alex
> >
> >
> > On Wed, Aug 21, 2019 at 7:55 AM Cee Tee  wrote:
> >
> > We have solved this by implementing a 'zookeeper cluster balancer', it
> > calls the admin server api of each zookeeper to get the current status and
> > will issue dynamic reconfigure commands to change dead servers into
> > observers so the quorum is not in danger. Once the dead servers reconnect,
> > they take the observer role and are then reconfigured into participants again.
> >
> > Added benefit is that we can also control which data center gets the quorum
> > in case of a network outage between the two.
> > Regards
> > Chris
> >
> > On 21 August 2019 16:42:37 Alexander Shraer  wrote:
> >
> >> Hi,
> >>
> >> Reconfiguration, as implemented, is not automatic. In your case, when
> >> failures happen, this doesn't change the ensemble membership.
> >> When 2 of 5 fail, this is still a minority, so everything should work
> >> normally, you just won't be able to handle an additional failure. If you'd
> >> like to remove them from the ensemble, you need to issue an explicit
> >> reconfiguration command to do so.
> >>
> >> Please see details in the manual:
> >> https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html
> >>
> >> Alex
> >>
> >> On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:
> >>
> >>> Hi
> >>>I encounter a problem which blocks my development of load balance
> >>> using ZooKeeper 3.5.5.
> >>>Actually, I have a ZooKeeper cluster which comprises of five zk
> >>> servers. And the dynamic configuration file is as follows:
> >>>
> >>>   server.1=zk1:2888:3888:participant;0.0.0.0:2181
> >>>   server.2=zk2:2888:3888:participant;0.0.0.0:2181
> >>>   server.3=zk3:2888:3888:participant;0.0.0.0:2181
> >>>   server.4=zk4:2888:3888:participant;0.0.0.0:2181
> >>>   server.5=zk5:2888:3888:participant;0.0.0.0:2181
> >>>
> >>>   The zk cluster can work fine if every member works normally. However, if
> >>> say two of them are suddenly down without previously being notified,
> >>> the dynamic configuration file shown above will not be synchronized
> >>> dynamically, which leads to the zk cluster fail to work normally.
> >>>   I think this is a very common case which may happen at any time. If so,
> >>> how can we resolve it?
> >>>   Really look forward to hearing from you!
> >>> Thanks
> >>>
>
>


Re: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Cee Tee


Yes, one side loses quorum and the other remains active. However, we 
actively control which side that is, because our main application is 
active/passive with 2 datacenters. We need ZooKeeper to remain active in 
the application's active datacenter.


On 21 August 2019 17:22:00 Alexander Shraer  wrote:

That's great! Thanks for sharing.



Added benefit is that we can also control which data center gets the quorum
in case of a network outage between the two.



Can you explain how this works? In case of a network outage between two 
DCs, one of them has a quorum of participants and the other doesn't.
The participants in the smaller set should not be operational at this time, 
since they can't get quorum. no ?




Thanks,
Alex


On Wed, Aug 21, 2019 at 7:55 AM Cee Tee  wrote:

We have solved this by implementing a 'zookeeper cluster balancer', it
calls the admin server api of each zookeeper to get the current status and
will issue dynamic reconfigure commands to change dead servers into
observers so the quorum is not in danger. Once the dead servers reconnect,
they take the observer role and are then reconfigured into participants again.

Added benefit is that we can also control which data center gets the quorum
in case of a network outage between the two.
Regards
Chris

On 21 August 2019 16:42:37 Alexander Shraer  wrote:


Hi,

Reconfiguration, as implemented, is not automatic. In your case, when
failures happen, this doesn't change the ensemble membership.
When 2 of 5 fail, this is still a minority, so everything should work
normally, you just won't be able to handle an additional failure. If you'd
like to remove them from the ensemble, you need to issue an explicit
reconfiguration command to do so.

Please see details in the manual:
https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html

Alex

On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:


Hi
   I encounter a problem which blocks my development of load balance using
ZooKeeper 3.5.5.
   Actually, I have a ZooKeeper cluster which comprises of five zk
servers. And the dynamic configuration file is as follows:

  server.1=zk1:2888:3888:participant;0.0.0.0:2181
  server.2=zk2:2888:3888:participant;0.0.0.0:2181
  server.3=zk3:2888:3888:participant;0.0.0.0:2181
  server.4=zk4:2888:3888:participant;0.0.0.0:2181
  server.5=zk5:2888:3888:participant;0.0.0.0:2181

  The zk cluster can work fine if every member works normally. However, if
say two of them are suddenly down without previously being notified,
the dynamic configuration file shown above will not be synchronized
dynamically, which leads to the zk cluster fail to work normally.
  I think this is a very common case which may happen at any time. If so,
how can we resolve it?
  Really look forward to hearing from you!
Thanks





Re: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Enrico Olivelli
On Wed 21 Aug 2019, 17:22, Alexander Shraer  wrote:

> That's great! Thanks for sharing.
>
> > Added benefit is that we can also control which data center gets the
> > quorum in case of a network outage between the two.
>
> Can you explain how this works? In case of a network outage between two
> DCs, one of them has a quorum of participants and the other doesn't.
> The participants in the smaller set should not be operational at this time,
> since they can't get quorum. no ?
>

I have recently talked about a similar problem with Norbert K. on Gitter
chat. We came to the conclusion that you need 3 datacenters.

Enrico



> Thanks,
> Alex
>
> On Wed, Aug 21, 2019 at 7:55 AM Cee Tee  wrote:
>
> > We have solved this by implementing a 'zookeeper cluster balancer', it
> > calls the admin server api of each zookeeper to get the current status and
> > will issue dynamic reconfigure commands to change dead servers into
> > observers so the quorum is not in danger. Once the dead servers reconnect,
> > they take the observer role and are then reconfigured into participants
> > again.
> >
> > Added benefit is that we can also control which data center gets the
> > quorum
> > in case of a network outage between the two.
> > Regards
> > Chris
> >
> > On 21 August 2019 16:42:37 Alexander Shraer  wrote:
> >
> > > Hi,
> > >
> > > Reconfiguration, as implemented, is not automatic. In your case, when
> > > failures happen, this doesn't change the ensemble membership.
> > > When 2 of 5 fail, this is still a minority, so everything should work
> > > normally, you just won't be able to handle an additional failure. If you'd
> > > like to remove them from the ensemble, you need to issue an explicit
> > > reconfiguration command to do so.
> > >
> > > Please see details in the manual:
> > > https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html
> > >
> > > Alex
> > >
> > > On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:
> > >
> > >> Hi
> > >>I encounter a problem which blocks my development of load balance
> > >> using ZooKeeper 3.5.5.
> > >>Actually, I have a ZooKeeper cluster which comprises of five zk
> > >> servers. And the dynamic configuration file is as follows:
> > >>
> > >>   server.1=zk1:2888:3888:participant;0.0.0.0:2181
> > >>   server.2=zk2:2888:3888:participant;0.0.0.0:2181
> > >>   server.3=zk3:2888:3888:participant;0.0.0.0:2181
> > >>   server.4=zk4:2888:3888:participant;0.0.0.0:2181
> > >>   server.5=zk5:2888:3888:participant;0.0.0.0:2181
> > >>
> > >>   The zk cluster can work fine if every member works normally. However, if
> > >> say two of them are suddenly down without previously being notified,
> > >> the dynamic configuration file shown above will not be synchronized
> > >> dynamically, which leads to the zk cluster fail to work normally.
> > >>   I think this is a very common case which may happen at any time. If so,
> > >> how can we resolve it?
> > >>   Really look forward to hearing from you!
> > >> Thanks
> > >>
> >
> >
> >
> >
>


Re: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Alexander Shraer
That's great! Thanks for sharing.

> Added benefit is that we can also control which data center gets the
> quorum in case of a network outage between the two.

Can you explain how this works? In case of a network outage between two
DCs, one of them has a quorum of participants and the other doesn't.
The participants in the smaller set should not be operational at this time,
since they can't get quorum. no ?

Thanks,
Alex

On Wed, Aug 21, 2019 at 7:55 AM Cee Tee  wrote:

> We have solved this by implementing a 'zookeeper cluster balancer', it
> calls the admin server api of each zookeeper to get the current status and
> will issue dynamic reconfigure commands to change dead servers into
> observers so the quorum is not in danger. Once the dead servers reconnect,
> they take the observer role and are then reconfigured into participants
> again.
>
> Added benefit is that we can also control which data center gets the
> quorum
> in case of a network outage between the two.
> Regards
> Chris
>
> On 21 August 2019 16:42:37 Alexander Shraer  wrote:
>
> > Hi,
> >
> > Reconfiguration, as implemented, is not automatic. In your case, when
> > failures happen, this doesn't change the ensemble membership.
> > When 2 of 5 fail, this is still a minority, so everything should work
> > normally, you just won't be able to handle an additional failure. If you'd
> > like to remove them from the ensemble, you need to issue an explicit
> > reconfiguration command to do so.
> >
> > Please see details in the manual:
> > https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html
> >
> > Alex
> >
> > On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:
> >
> >> Hi
> >>I encounter a problem which blocks my development of load balance
> >> using ZooKeeper 3.5.5.
> >>Actually, I have a ZooKeeper cluster which comprises of five zk
> >> servers. And the dynamic configuration file is as follows:
> >>
> >>   server.1=zk1:2888:3888:participant;0.0.0.0:2181
> >>   server.2=zk2:2888:3888:participant;0.0.0.0:2181
> >>   server.3=zk3:2888:3888:participant;0.0.0.0:2181
> >>   server.4=zk4:2888:3888:participant;0.0.0.0:2181
> >>   server.5=zk5:2888:3888:participant;0.0.0.0:2181
> >>
> >>   The zk cluster can work fine if every member works normally. However,
> >> if say two of them are suddenly down without previously being notified,
> >> the dynamic configuration file shown above will not be synchronized
> >> dynamically, which leads to the zk cluster fail to work normally.
> >>   I think this is a very common case which may happen at any time. If
> >> so, how can we resolve it?
> >>   Really look forward to hearing from you!
> >> Thanks
> >>
>
>
>
>


Re: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Cee Tee
We have solved this by implementing a 'zookeeper cluster balancer': it 
calls the admin server API of each ZooKeeper to get the current status and 
issues dynamic reconfigure commands to change dead servers into 
observers, so the quorum is not in danger. Once the dead servers reconnect, 
they take the observer role and are then reconfigured into participants again.


An added benefit is that we can also control which data center gets the quorum 
in case of a network outage between the two.

Regards
Chris
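No balancer code is attached to this thread, but a minimal sketch of one pass of
the idea, assuming ZooKeeper 3.5 with the AdminServer enabled on its default
port 8080, reconfigEnabled=true, and a caller authorized to run reconfig
(hostnames and the health probe are illustrative only):

  #!/bin/sh
  # Illustrative sketch only: demote unreachable participants to observers so
  # they no longer count against the quorum. Check flags and ACL requirements
  # against the reconfig manual linked in the replies below.
  ALIVE=zk1:2181                 # any reachable ensemble member (assumption)
  for id in 1 2 3 4 5; do
    host="zk${id}"
    # Probe the AdminServer's ruok command; treat a failed probe as a dead server.
    if ! curl -sf "http://${host}:8080/commands/ruok" >/dev/null; then
      echo "server.${id} (${host}) looks dead, demoting to observer"
      # Remove the dead participant, then re-add it with the observer role.
      zkCli.sh -server "$ALIVE" reconfig -remove "${id}"
      zkCli.sh -server "$ALIVE" reconfig -add "server.${id}=${host}:2888:3888:observer;0.0.0.0:2181"
    fi
  done

When a demoted server comes back and has synced, the same kind of reconfig call
can promote it to a participant again, matching the behaviour described above.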

On 21 August 2019 16:42:37 Alexander Shraer  wrote:


Hi,

Reconfiguration, as implemented, is not automatic. In your case, when
failures happen, this doesn't change the ensemble membership.
When 2 of 5 fail, this is still a minority, so everything should work
normally, you just won't be able to handle an additional failure. If you'd
like to remove them from the ensemble, you need to issue an explicit
reconfiguration command to do so.

Please see details in the manual:
https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html

Alex

On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:


Hi
   I encounter a problem which blocks my development of load balance using
ZooKeeper 3.5.5.
   Actually, I have a ZooKeeper cluster which comprises of five zk
servers. And the dynamic configuration file is as follows:

  server.1=zk1:2888:3888:participant;0.0.0.0:2181
  server.2=zk2:2888:3888:participant;0.0.0.0:2181
  server.3=zk3:2888:3888:participant;0.0.0.0:2181
  server.4=zk4:2888:3888:participant;0.0.0.0:2181
  server.5=zk5:2888:3888:participant;0.0.0.0:2181

  The zk cluster can work fine if every member works normally. However, if
say two of them are suddenly down without previously being notified,
the dynamic configuration file shown above will not be synchronized
dynamically, which leads to the zk cluster fail to work normally.
  I think this is a very common case which may happen at any time. If so,
how can we resolve it?
  Really look forward to hearing from you!
Thanks







Re: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Alexander Shraer
Hi,

Reconfiguration, as implemented, is not automatic. In your case, when
failures happen, this doesn't change the ensemble membership.
When 2 of 5 fail, this is still a minority, so everything should work
normally, you just won't be able to handle an additional failure. If you'd
like to remove them from the ensemble, you need to issue an explicit
reconfiguration command to do so.

Please see details in the manual:
https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html

Alex
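As a concrete sketch of such a command (server ids and addresses are
hypothetical; note that from 3.5.3 on reconfig has to be enabled with
reconfigEnabled=true and is restricted to authorized users):

  # drop the two failed servers from the ensemble
  zkCli.sh -server zk1:2181 reconfig -remove 4,5

  # once server 4 is healthy again, add it back as a participant
  zkCli.sh -server zk1:2181 reconfig -add "server.4=zk4:2888:3888:participant;0.0.0.0:2181"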

On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:

> Hi
>I encounter a problem which blocks my development of load balance using
> ZooKeeper 3.5.5.
>Actually, I have a ZooKeeper cluster which comprises of five zk
> servers. And the dynamic configuration file is as follows:
>
>   server.1=zk1:2888:3888:participant;0.0.0.0:2181
>   server.2=zk2:2888:3888:participant;0.0.0.0:2181
>   server.3=zk3:2888:3888:participant;0.0.0.0:2181
>   server.4=zk4:2888:3888:participant;0.0.0.0:2181
>   server.5=zk5:2888:3888:participant;0.0.0.0:2181
>
>   The zk cluster can work fine if every member works normally. However, if
> say two of them are suddenly down without previously being notified,
> the dynamic configuration file shown above will not be synchronized
> dynamically, which leads to the zk cluster fail to work normally.
>   I think this is a very common case which may happen at any time. If so,
> how can we resolve it?
>   Really look forward to hearing from you!
> Thanks
>


About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Gao,Wei
Hi
   I have encountered a problem which blocks my development of load balancing using 
ZooKeeper 3.5.5.
   Actually, I have a ZooKeeper cluster which comprises five zk servers, and 
the dynamic configuration file is as follows:

  server.1=zk1:2888:3888:participant;0.0.0.0:2181
  server.2=zk2:2888:3888:participant;0.0.0.0:2181
  server.3=zk3:2888:3888:participant;0.0.0.0:2181
  server.4=zk4:2888:3888:participant;0.0.0.0:2181
  server.5=zk5:2888:3888:participant;0.0.0.0:2181

  The zk cluster works fine if every member works normally. However, if say 
two of them suddenly go down without previously being notified,
the dynamic configuration file shown above will not be synchronized 
dynamically, which leads the zk cluster to fail to work normally.
  I think this is a very common case which may happen at any time. If so, how 
can we resolve it?
  Really look forward to hearing from you!
Thanks


Re: The current epoch, 7, is older than the last zxid, 8589935882

2019-08-21 Thread Debraj Manna
With the other two zookeeper servers running, I stopped the zookeeper on the
broken node, deleted all the contents inside
/var/lib/zookeeper/version-2,
and started the zookeeper back up on the node. It is running fine now and got
all the data from the other servers.

I am getting confused after going through ZOOKEEPER-1653
and ZOOKEEPER-2354. The issues say it
is fixed in 3.4.6 but exists in 3.5.x, yet I am seeing the issue in 3.4.13
as well. Can someone confirm whether the issue is present in 3.4.13 too?
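For reference, the recovery described above boils down to roughly the following
on the broken node; the service name and systemctl usage are assumptions, and it
should only be done on one node at a time while the rest of the ensemble still
has quorum:

  # on the broken node only, while the other two servers are up and serving
  sudo systemctl stop zookeeper                # or however ZooKeeper is run here (assumption)
  sudo rm -rf /var/lib/zookeeper/version-2/*   # wipe local snapshots/txn logs; myid lives outside version-2
  sudo systemctl start zookeeper               # the node rejoins and re-syncs from the current leader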



On Wed, Aug 21, 2019 at 8:54 AM Debraj Manna 
wrote:

> Thanks for replying.
>
> What is the recommended way to remove a node and delete all data from it
> and make it start fresh?
>
> On Wed 21 Aug, 2019, 12:58 AM Enrico Olivelli, 
> wrote:
>
>> Hello,
>> Sorry for so late reply.
>> If you have 3 servers you can nuke the broken one and make it start from
>> scratch, it will join the cluster and then recover data from the other
>> servers
>>
>> Try it in a staging env, not in production
>>
>> Enrico
>>
>> On Tue 20 Aug 2019, 20:30, Debraj Manna  wrote:
>>
>> > The same has been asked on stackoverflow
>> > <https://stackoverflow.com/questions/57574298/zookeeper-error-the-current-epoch-is-older-than-the-last-zxid>
>> > also, but there has been no response there either.
>> >
>> > Does anyone have any thoughts on this one?
>> >
>> > On Tue, Aug 20, 2019 at 4:43 PM Debraj Manna 
>> > wrote:
>> >
>> > > Posted wrong Jira link. I meant
>> > > https://issues.apache.org/jira/browse/ZOOKEEPER-2354. Can someone
>> > > let me know what is the recommended way to recover the node?
>> > >
>> > > support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
>> > > 8
>> > > support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
>> > > 7
>> > > support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
>> > > 8
>> > > support@platform2
>> > >
>> > > On Tue, Aug 20, 2019 at 3:14 PM Debraj Manna <
>> subharaj.ma...@gmail.com>
>> > > wrote:
>> > >
>> > >> Hi
>> > >>
>> > >> I am using a zookeeper ensemble of 3 nodes running 3.4.13. Sometimes
>> > >> after a reboot of the machine zookeeper does not start and I am seeing
>> > >> the below errors in the logs.
>> > >>
>> > >> I have seen https://issues.apache.org/jira/browse/ZOOKEEPER-1653. Can
>> > >> someone let me know if this is fixed in 3.4.13 or not, as I can see the
>> > >> issue is still open? Also, can someone suggest what is the recommended
>> > >> way to recover the set-up?
>> > >>
>> > >> 2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
>> > >> java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
>> > >>   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
>> > >>   at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>> > >>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>> > >>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>> > >>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>> > >> 2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
>> > >> java.lang.RuntimeException: Unable to run quorum server
>> > >>   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
>> > >>   at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>> > >>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>> > >>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>> > >>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>> > >> Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
>> > >>   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
>> > >>   ... 4 more
>> > >>
>> > >>
>> > >>
>> >
>>
>