Re: How to find if the zxid is reaching the limit (zxid lower 32 bits have rolled over, forcing re-election)
Hi,

One more question: does the proposal count metric get reset on every leader election? What I observed is that the old leader still has the highest proposal count.

[image: image.png]

Thanks,
Ram

On Wed, Aug 24, 2022 at 4:51 AM Szalay-Bekő Máté wrote:
> [...]
Re: How to find if the zxid is reaching the limit (zxid lower 32 bits have rolled over, forcing re-election)
Yes, your understanding is correct. I would like to predict it and control the leader election with a manual restart.

Thanks

On Wed, Aug 24, 2022, 4:51 AM Szalay-Bekő Máté wrote:
> [...]
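Predicting the rollover can be approximated from two zxid samples taken some time apart (for example two srvr readings, or snapshot file names together with their modification times) by extrapolating the lower-32-bit counter linearly. This is a rough Python sketch under stated assumptions: the write rate stays about constant, both samples belong to the same epoch (a leader election starts a new epoch and resets the counter), and 0xffffffff is the limit, as described in this thread. `estimate_seconds_to_rollover` is a hypothetical helper, not a ZooKeeper API.

```python
from datetime import datetime

COUNTER_MASK = 0xFFFFFFFF  # lower 32 bits of the zxid; the limit discussed in this thread


def estimate_seconds_to_rollover(t1: datetime, zxid1: int, t2: datetime, zxid2: int):
    """Hypothetical helper: linear estimate of the time left before the lower
    32 bits of the zxid reach 0xffffffff.

    Returns (transactions_per_second, seconds_left), assuming the write rate
    observed between the two samples stays the same afterwards.
    """
    if (zxid1 >> 32) != (zxid2 >> 32):
        # A new epoch (leader election) resets the counter, so the samples are not comparable.
        raise ValueError("samples belong to different epochs")
    c1, c2 = zxid1 & COUNTER_MASK, zxid2 & COUNTER_MASK
    elapsed = (t2 - t1).total_seconds()
    if elapsed <= 0 or c2 <= c1:
        raise ValueError("need two samples with increasing time and zxid")
    rate = (c2 - c1) / elapsed  # transactions per second
    return rate, (COUNTER_MASK - c2) / rate


# Two snapshot names and their mtimes from the example `ls` listing in this thread
# (an mtime only approximates when that zxid was reached).
rate, seconds_left = estimate_seconds_to_rollover(
    datetime(2022, 8, 17, 10, 9), 0x20005A540,
    datetime(2022, 8, 23, 6, 30), 0x2000A9C56,
)
print(f"{rate:.2f} txn/s, about {seconds_left / 86400 / 365.25:.0f} years left")
```

With those two samples the estimate comes out at roughly 0.64 transactions per second and on the order of two centuries of headroom for that particular test cluster. The estimate is only as good as the assumption of a steady write rate, so it is worth re-running periodically.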
Re: How to find if the zxid is reaching the limit (zxid lower 32 bits have rolled over, forcing re-election)
Hello Ram,

sorry, I don't really understand the question. The zxid is a 64-bit long number. The upper 32 bits encode an election epoch number (a logical time / counter for leader elections), while the lower 32 bits provide an auto-incremented id for all the changes made (committed) in ZooKeeper. As far as I understand, write requests are forwarded to the leader, which turns them into proposals, and each accepted (committed) proposal increases the zxid. The "current" / "latest" zxid is the same across the whole cluster (of course followers can lag behind a little, but in theory not by much, if they are in sync and part of the quorum).

My understanding is that what you want to catch is the moment when the lower 32 bits of the zxid are approaching 0xffffffff. When the lower 32 bits of the zxid reach 0xffffffff, a new leader election is triggered automatically and ZooKeeper won't be able to serve requests for a short period of time. And I guess you want to control this event and maybe restart the leader manually at a time that suits you better?

But maybe I misunderstood your question.

Máté

On Tue, Aug 23, 2022 at 11:00 PM rammohan ganapavarapu <rammohanga...@gmail.com> wrote:
> [...]
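The zxid layout described above can be illustrated with a small Python sketch: shift right by 32 bits for the epoch, mask with 0xffffffff for the counter, and subtract the counter from 0xffffffff to see how many proposals are left in the current epoch. The helper names are hypothetical, and treating 0xffffffff as the point where the leader forces a re-election follows the description in this thread.

```python
from typing import Tuple

COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1  # 0xffffffff: the lower 32 bits


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Hypothetical helper: return (epoch, counter) for a 64-bit zxid."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


def remaining_in_epoch(zxid: int) -> int:
    """Proposals left before the counter reaches 0xffffffff, the point at which
    (per this thread) the leader forces a re-election."""
    return COUNTER_MASK - (zxid & COUNTER_MASK)


zxid = 0x2000AFCBF  # the Zxid value from the srvr example in this thread
epoch, counter = split_zxid(zxid)
print(f"epoch={epoch} counter={counter} used={counter / COUNTER_MASK:.4%} left={remaining_in_epoch(zxid)}")
```

For 0x2000afcbf this gives epoch 2 and counter 720063, so only a tiny fraction of the 32-bit range had been used in that epoch.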
Re: How to find if the zxid is reaching the limit (zxid lower 32 bits have rolled over, forcing re-election)
Máté,

Thanks for the quick reply. Yes, I did see that the srvr command can give the current zxid. I also see a metric in mntr, "proposal_count", which gives the total number of proposals, and when we hit the zxid limit it matched 2^32 (4,294,967,296). So I am trying to understand how the zxid gets incremented. I don't see the zxid in the logs for normal events, only at leader election time.

Ram

On Tue, Aug 23, 2022 at 10:10 AM Szalay-Bekő Máté <szalay.beko.m...@gmail.com> wrote:
> [...]
Re: How to find if the zxid is reaching the limit (zxid lower 32 bits have rolled over, forcing re-election)
Hello!

I think the "srvr" four-letter-word diagnostic command should print the current zxid. A similar command also works on the Admin REST API (if it is enabled).

See: https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_zkCommands

An example:

echo srvr | nc localhost 2181

Zookeeper version: 3.5.5-136-69648f116c849ccd757e97c26d3450022d4b1dae, built on 08/08/2022 11:04 GMT
Latency min/avg/max: 0/0/1808
Received: 9599434
Sent: 9673689
Connections: 41
Outstanding: 0
Zxid: 0x2000afcbf <- this line
Mode: leader
Node count: 1384
Proposal sizes last/min/max: 32/32/4226

The zxid is also part of the names of the snapshot / transaction log files that are flushed to the file system, like log.<zxid> or snapshot.<zxid>.

e.g.:

ls -la -R /var/lib/zookeeper/version-2/

/var/lib/zookeeper/version-2/:
total 57328
drwxr-xr-x 2 zookeeper zookeeper     4096 Aug 23 10:42 .
drwxr-x--- 3 zookeeper zookeeper     4096 Aug  9 10:41 ..
-rw-r--r-- 1 zookeeper zookeeper        1 Aug 10 17:55 acceptedEpoch
-rw-r--r-- 1 zookeeper zookeeper        1 Aug 10 17:55 currentEpoch
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 17 10:09 log.20004c9fc
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 19 00:37 log.20005a541
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 20 18:43 log.20006fc19
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 21 21:40 log.200087550
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 23 06:30 log.200096ed6
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 23 17:05 log.2000a9c57
-rw-r--r-- 1 zookeeper zookeeper  1372956 Aug 17 10:09 snapshot.20005a540
-rw-r--r-- 1 zookeeper zookeeper  1370403 Aug 19 00:37 snapshot.20006fc18
-rw-r--r-- 1 zookeeper zookeeper  1369122 Aug 20 18:43 snapshot.20008754f
-rw-r--r-- 1 zookeeper zookeeper  1369034 Aug 21 21:40 snapshot.200096ed4
-rw-r--r-- 1 zookeeper zookeeper  1379613 Aug 23 06:30 snapshot.2000a9c56

Best regards,
Máté

On Tue, Aug 23, 2022 at 6:55 PM rammohan ganapavarapu <rammohanga...@gmail.com> wrote:
> [...]
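To poll this from a script, the following Python sketch does roughly what `echo srvr | nc localhost 2181` does and pulls the value out of the `Zxid:` line; it can also read the zxid from `log.*` / `snapshot.*` file names. It assumes the output format shown in the example above and the default client port 2181; all function names are made up for illustration. Since a file gets its name when it is created, the highest zxid among the file names is only a lower bound on the current zxid.

```python
import re
import socket
from typing import Iterable, Optional

# "Zxid: 0x2000afcbf" as printed by srvr (anything after the hex value is ignored).
ZXID_LINE = re.compile(r"^Zxid:\s*0x([0-9a-fA-F]+)", re.MULTILINE)
# Transaction log / snapshot names such as log.20004c9fc or snapshot.2000a9c56.
DATA_FILE = re.compile(r"^(?:log|snapshot)\.([0-9a-fA-F]+)$")


def fetch_srvr(host: str = "localhost", port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'srvr' four-letter word and return the raw reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr_zxid(srvr_output: str) -> int:
    """Extract the zxid from srvr output as an integer."""
    match = ZXID_LINE.search(srvr_output)
    if match is None:
        raise ValueError("no 'Zxid:' line in srvr output")
    return int(match.group(1), 16)


def zxid_from_filename(name: str) -> Optional[int]:
    """Return the zxid encoded in a log.*/snapshot.* name, or None for other files."""
    match = DATA_FILE.match(name)
    return int(match.group(1), 16) if match else None


def latest_file_zxid(names: Iterable[str]) -> Optional[int]:
    """Highest zxid among log.*/snapshot.* names (a lower bound on the current zxid)."""
    zxids = [z for z in (zxid_from_filename(n) for n in names) if z is not None]
    return max(zxids) if zxids else None
```

Typical use would be `parse_srvr_zxid(fetch_srvr("localhost", 2181))` against a live server, or `latest_file_zxid(os.listdir("/var/lib/zookeeper/version-2"))` on a node's data directory (path taken from the listing above).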
Re: How to find if the zxid is reaching the limit (zxid lower 32 bits have rolled over, forcing re-election)
Hi,

We recently had a leader election due to "zxid lower 32 bits have rolled over, forcing re-election". This is the first time we have seen this, and we are trying to understand how to find out whether the ensemble is approaching that limit. Are there any metrics available in zk to track this? How can I estimate when my zk cluster will reach this limit?

Thanks,
Ram