Re: How find if the zxid is reaching the limit (zxid lower 32 bits have rolled over, forcing re-election)

2022-09-19 Thread rammohan ganapavarapu
Hi,

One more question: does the proposal count metric get reset at every
leader election? What I observed is that the old leader still has the
highest proposal count.

[image: image.png]
Thanks,
Ram




Re: How find if the zxid is reaching the limit (zxid lower 32 bits have rolled over, forcing re-election)

2022-08-24 Thread rammohan ganapavarapu
Yes, your understanding is correct. I would like to predict it and control
the leader election by manual restart.

Thanks



Re: How find if the zxid is reaching the limit (zxid lower 32 bits have rolled over, forcing re-election)

2022-08-24 Thread Szalay-Bekő Máté
Hello Ram,

Sorry, I don't really understand the question. The zxid is a 64-bit long
number. The upper 32 bits encode an election epoch number (a logical
time / counter for leader elections), while the lower 32 bits are an
auto-incremented id for all the changes made (committed) in ZooKeeper.
As far as I understand, followers forward write requests to the leader,
the leader turns them into proposals, and each accepted (committed)
proposal increases the zxid. The "current" / "latest" zxid is the same
across the whole cluster (followers can of course lag behind a little,
but in theory not by much if they are in sync and part of the quorum).
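
To make the layout concrete, here is a minimal Python sketch that splits a
zxid into its two halves. The names split_zxid and counter_used_fraction
are only illustrative (they are not part of ZooKeeper), and the sample
value is the zxid 0x2000afcbf from the srvr example in this thread.

```python
from typing import Tuple

COUNTER_BITS = 32
COUNTER_MASK = 0xFFFFFFFF  # lower 32 bits: per-epoch transaction counter


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter) for a 64-bit zxid."""
    epoch = zxid >> COUNTER_BITS      # upper 32 bits: leader election epoch
    counter = zxid & COUNTER_MASK     # lower 32 bits: counter within the epoch
    return epoch, counter


def counter_used_fraction(zxid: int) -> float:
    """Fraction of the 32-bit counter space already used in this epoch."""
    _, counter = split_zxid(zxid)
    return counter / COUNTER_MASK


if __name__ == "__main__":
    epoch, counter = split_zxid(0x2000AFCBF)
    print(f"epoch={epoch} counter={counter}")  # epoch=2 counter=720063
    print(f"used={counter_used_fraction(0x2000AFCBF):.4%}")
```

For that sample the epoch is 2 and the counter is 720063, so the ensemble
in the example is still very far from the limit.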

My understanding is that what you want to catch is the moment when the
lower 32 bits of the zxid approach 0xffffffff. When the lower 32 bits
reach 0xffffffff, a new leader election is triggered automatically and
ZooKeeper won't be able to serve requests for a short period of time. I
guess you want to control this event, and maybe restart the leader
manually at a time that suits you better?
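
One way to predict the event is to sample the zxid twice and extrapolate.
The following is only a rough sketch under the assumption that the write
rate stays roughly constant between samples; the function name
seconds_until_rollover is hypothetical, not a ZooKeeper API.

```python
from typing import Optional

COUNTER_MAX = 0xFFFFFFFF  # the lower 32 bits roll over after this value


def seconds_until_rollover(zxid_then: int, zxid_now: int,
                           elapsed_seconds: float) -> Optional[float]:
    """Linear estimate of the seconds left until the counter hits the limit.

    Returns None when the two samples are from different epochs (a leader
    election happened in between) or when no writes were observed.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if (zxid_then >> 32) != (zxid_now >> 32):
        return None  # epoch changed, so the samples are not comparable
    consumed = (zxid_now & COUNTER_MAX) - (zxid_then & COUNTER_MAX)
    if consumed <= 0:
        return None  # no progress between samples, no rate to extrapolate
    rate = consumed / elapsed_seconds          # transactions per second
    remaining = COUNTER_MAX - (zxid_now & COUNTER_MAX)
    return remaining / rate
```

Feeding it two readings taken some minutes apart gives an estimate you can
alert on, for example when fewer than a few days remain. Bursty write
traffic will make a single pair of samples unreliable, so averaging over a
longer window is probably wiser.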

But maybe I misunderstood your question.

Máté



Re: How find if the zxid is reaching the limit (zxid lower 32 bits have rolled over, forcing re-election)

2022-08-23 Thread rammohan ganapavarapu
Máté,

Thanks for the quick reply. Yes, I did see that the srvr command gives the
current zxid. I also see a metric in mntr, "proposal_count", which gives
the total number of proposals, and when we hit the zxid limit the
proposal_count metric matched 2^32 (4,294,967,296). So I am trying to
understand how the zxid gets incremented. I don't see the zxid in the logs
for normal events, only at leader election time.

Ram





Re: How find if the zxid is reaching the limit (zxid lower 32 bits have rolled over, forcing re-election)

2022-08-23 Thread Szalay-Bekő Máté
Hello!

I think the "srvr" four-letter-word diagnostic command should print the
current zxid. A similar command is also available through the Admin REST
API (if it is enabled).

See:
https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_zkCommands

An example:


echo srvr | nc localhost 2181

Zookeeper version: 3.5.5-136-69648f116c849ccd757e97c26d3450022d4b1dae, built on 08/08/2022 11:04 GMT
Latency min/avg/max: 0/0/1808
Received: 9599434
Sent: 9673689
Connections: 41
Outstanding: 0
Zxid: 0x2000afcbf <- this line
Mode: leader
Node count: 1384
Proposal sizes last/min/max: 32/32/4226
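
If you want to collect this value from a script rather than by eye, a
small Python sketch along these lines should work. The helper names
parse_srvr_zxid and query_srvr are my own, and query_srvr assumes the
"srvr" command is allowed on the server and that the client port is
reachable, as in the nc example above.

```python
import re
import socket

# Matches the "Zxid: 0x..." line of the srvr reply.
_ZXID_LINE = re.compile(r"^Zxid:\s*(0x[0-9a-fA-F]+)", re.MULTILINE)


def parse_srvr_zxid(srvr_output: str) -> int:
    """Extract the zxid from the text returned by the srvr command."""
    match = _ZXID_LINE.search(srvr_output)
    if match is None:
        raise ValueError("no 'Zxid:' line found in srvr output")
    return int(match.group(1), 16)


def query_srvr(host: str = "localhost", port: int = 2181,
               timeout: float = 5.0) -> str:
    """Send 'srvr' to the client port and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling parse_srvr_zxid(query_srvr()) periodically gives the samples
needed to track how fast the lower 32 bits are growing.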




Also, the zxid is part of the names of the snapshot and transaction log
files that are flushed to the file system, like log.<zxid> or
snapshot.<zxid>.

e.g.:

ls -la -R /var/lib/zookeeper/version-2/

/var/lib/zookeeper/version-2/:
total 57328
drwxr-xr-x 2 zookeeper zookeeper     4096 Aug 23 10:42 .
drwxr-x--- 3 zookeeper zookeeper     4096 Aug  9 10:41 ..
-rw-r--r-- 1 zookeeper zookeeper        1 Aug 10 17:55 acceptedEpoch
-rw-r--r-- 1 zookeeper zookeeper        1 Aug 10 17:55 currentEpoch
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 17 10:09 log.20004c9fc
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 19 00:37 log.20005a541
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 20 18:43 log.20006fc19
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 21 21:40 log.200087550
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 23 06:30 log.200096ed6
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 23 17:05 log.2000a9c57
-rw-r--r-- 1 zookeeper zookeeper  1372956 Aug 17 10:09 snapshot.20005a540
-rw-r--r-- 1 zookeeper zookeeper  1370403 Aug 19 00:37 snapshot.20006fc18
-rw-r--r-- 1 zookeeper zookeeper  1369122 Aug 20 18:43 snapshot.20008754f
-rw-r--r-- 1 zookeeper zookeeper  1369034 Aug 21 21:40 snapshot.200096ed4
-rw-r--r-- 1 zookeeper zookeeper  1379613 Aug 23 06:30 snapshot.2000a9c56
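
The hex suffixes in such a listing can also be read programmatically. This
is just a sketch with made-up helper names (zxids_from_filenames,
latest_zxid_on_disk). Because the files were created some time ago, the
largest suffix is only a lower bound on the live zxid, so srvr remains the
better source for the current value.

```python
import os
import re
from typing import List

# File names of the form log.<hex zxid> or snapshot.<hex zxid>.
_ZXID_FILE = re.compile(r"^(?:log|snapshot)\.([0-9a-fA-F]+)$")


def zxids_from_filenames(names: List[str]) -> List[int]:
    """Return the sorted zxids encoded in log.* / snapshot.* file names."""
    zxids = []
    for name in names:
        match = _ZXID_FILE.match(name)
        if match:  # skips acceptedEpoch, currentEpoch and anything else
            zxids.append(int(match.group(1), 16))
    return sorted(zxids)


def latest_zxid_on_disk(data_dir: str) -> int:
    """Largest zxid found in the file names under data_dir.

    Raises ValueError if the directory has no matching files.
    """
    return max(zxids_from_filenames(os.listdir(data_dir)))
```

For the directory shown above, latest_zxid_on_disk would return
0x2000a9c57, taken from the newest transaction log file name.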



Best regards,
Máté



Re: How find if the zxid is reaching the limit (zxid lower 32 bits have rolled over, forcing re-election)

2022-08-23 Thread rammohan ganapavarapu
Hi,

We recently had a leader election due to "*zxid lower 32 bits have rolled
over, forcing re-election*". This is the first time we are seeing this,
and we are trying to understand how to tell whether the ensemble is
approaching that limit. Are there any metrics available in zk to track
this? How can I estimate when my zk cluster will reach this limit?

Thanks,
Ram