[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user asdf2014 commented on the issue: https://github.com/apache/zookeeper/pull/262 Hi, @breed. Thanks for your comment. You are right, we should keep the epoch large enough to avoid an epoch overflow. That is why I proposed a 24-bit epoch as a better solution in my second comment. Even if a leader election happened once every hour, we would not hit an epoch overflow for about **1915.2** years ($2^{24} / (24 \times 365) \approx 1915.2$). ![image](https://user-images.githubusercontent.com/8108788/34022152-9f04832c-e178-11e7-9bf3-c1b047613dae.png) ---
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user breed commented on the issue: https://github.com/apache/zookeeper/pull/262 i think it would be much better to extend ZOOKEEPER-1277 to more transparently do the rollover without a full leader election. the main issue i have with shortening the epoch size is that once the epoch hits the maximum value the ensemble is stuck, nothing can proceed, so we really need to keep the epoch size big enough that we would never hit that condition. i don't think a 16-bit epoch satisfies that requirement. ---
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user asdf2014 commented on the issue: https://github.com/apache/zookeeper/pull/262 Hi, @phunt. Indeed, the `FastLeaderElection` algorithm is very efficient. Most leader elections finish within a few hundred milliseconds. However, some real-time streaming frameworks such as Apache Kafka and Apache Storm can put a lot of pressure on a ZooKeeper cluster when they carry too much business data or processing logic. So leader election may be triggered very frequently, and the process becomes time-consuming. ---
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user phunt commented on the issue: https://github.com/apache/zookeeper/pull/262 Ok, thanks for the update. fwiw restarting taking a few minutes is going to be an issue regardless, no? Any regular type issue, such as a temporary network outage, could cause the quorum to be lost and leader election triggered. ---
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user phunt commented on the issue: https://github.com/apache/zookeeper/pull/262 Are you seeing this behavior with ZOOKEEPER-1277 applied? If so it's a bug in that change, because after that's applied the leader should shut down as we approach the rollover. It would be nice to address this by changing the zxid semantics, but I don't believe that's a great idea. Instead I would rather see us address any shortcoming in my original fix (1277). fwiw, what I have seen people do in this situation is to monitor the zxid and, when it gets close to the rollover (say within 10%), have an automated script restart the leader, which forces a re-election. However 1277 should be doing this for you. Given you are seeing this issue perhaps you can help with resolving any bugs in 1277? thanks! ---
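The monitoring workaround described above (restart the leader once the zxid counter gets within about 10% of the rollover) could be sketched roughly as follows. This is only an illustration: the class `ZxidRolloverMonitor`, its method names, and the threshold constant are hypothetical, and the sketch assumes the current layout of a 32-bit epoch over a 32-bit counter. How the zxid is read from the server and how the leader is restarted are left out.

```java
final class ZxidRolloverMonitor {
    // Assumed layout: high 32 bits = epoch, low 32 bits = counter.
    private static final long COUNTER_MASK = 0xffffffffL;
    // Hypothetical threshold: act when less than 10% of the counter space is left.
    private static final double REMAINING_THRESHOLD = 0.10;

    /** Fraction of the 32-bit counter space already used by this zxid. */
    static double counterUsage(long zxid) {
        long counter = zxid & COUNTER_MASK;
        return (double) counter / (double) COUNTER_MASK;
    }

    /** True when the counter is within the threshold of rolling over. */
    static boolean shouldRestartLeader(long zxid) {
        return 1.0 - counterUsage(zxid) < REMAINING_THRESHOLD;
    }

    public static void main(String[] args) {
        long early = (5L << 32) | 1_000L;      // epoch 5, counter barely used
        long late = (5L << 32) | 0xf0000000L;  // epoch 5, counter mostly used
        System.out.println(shouldRestartLeader(early));
        System.out.println(shouldRestartLeader(late));
    }
}
```

A script built on this idea would poll the zxid periodically and trigger the restart during a quiet period once `shouldRestartLeader` turns true.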
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user asdf2014 commented on the issue: https://github.com/apache/zookeeper/pull/262 @JarenGlover It's a good idea, but not the best solution. Still, for now we can use the `restart` operation to work around this problem without any changes to ZooKeeper. (BTW, you can find more details about this problem in my [blog](https://yuzhouwan.com/posts/31915#%e6%9e%b6%e6%9e%84%e8%ae%be%e8%ae%a1) :-) ---
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user JarenGlover commented on the issue: https://github.com/apache/zookeeper/pull/262 @yunfan123 @asdf2014 i have seen this issue twice over a one-month period. is there anything one can do to prevent this from happening? maybe allowing for leader restarts at "off peak hours" weekly? (yuck, i know) it sounds like we can move forward with this if we move to 48 bits low, correct? note version: `3.4.10` ---
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user yunfan123 commented on the issue: https://github.com/apache/zookeeper/pull/262 Hi, @asdf2014. In most cases, I don't think the epoch can overflow 16 bits. In general, ZooKeeper leader election is very rare, and it may take several seconds or even several minutes to finish. ZooKeeper is also totally unavailable during leader election. If the ZooKeeper you use can overflow 16 bits, it means that ZooKeeper is totally unreliable. Finally, compatibility with old versions is really important. Without it, I must restart all my ZooKeeper nodes. All of the nodes then need to reload the snapshot and log from disk, which costs a lot of time. I believe this upgrade process is unacceptable to most ZooKeeper users. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user asdf2014 commented on the issue: https://github.com/apache/zookeeper/pull/262 Hi, @yunfan123. Thank you for your suggestion. As you said, that layout can guarantee a smooth upgrade. However, if the 16-bit `epoch` overflows, rather than the `counter`, ZooKeeper can no longer recover by re-election and stops providing service. So I think we should keep enough space for `epoch`. What do you think? ---
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user yunfan123 commented on the issue: https://github.com/apache/zookeeper/pull/262 Hi, I think 48 bits low is better for a high-throughput zk cluster. Another benefit is that with 48 bits low we assume the epoch is lower than `(1<<16)`, so we can use the high 16 bits to judge whether a zxid comes from the old version or the new version. So with 48 bits low we can make the upgrade process smooth. ---
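One possible reading of the compatibility idea above: under the old 32/32 layout, an epoch below `1 << 16` leaves the top 16 bits of the zxid at zero, whereas under a 16/48 layout those bits hold the epoch itself. A rough sketch of that check follows. `ZxidLayout` and its methods are hypothetical names, not part of the patch, and the sketch assumes new-layout epochs start at 1 (an epoch of 0 would be indistinguishable from the old layout).

```java
final class ZxidLayout {
    private static final int NEW_COUNTER_BITS = 48;
    private static final long NEW_COUNTER_MASK = (1L << NEW_COUNTER_BITS) - 1;
    private static final long OLD_COUNTER_MASK = 0xffffffffL;

    /** Assumption: non-zero top 16 bits means the zxid uses the 16/48 layout. */
    static boolean isNewLayout(long zxid) {
        return (zxid >>> NEW_COUNTER_BITS) != 0;
    }

    /** Epoch, decoded according to whichever layout the zxid appears to use. */
    static long epochOf(long zxid) {
        return isNewLayout(zxid) ? zxid >>> NEW_COUNTER_BITS : zxid >>> 32;
    }

    /** Counter, decoded according to whichever layout the zxid appears to use. */
    static long counterOf(long zxid) {
        return isNewLayout(zxid) ? zxid & NEW_COUNTER_MASK : zxid & OLD_COUNTER_MASK;
    }

    public static void main(String[] args) {
        long oldZxid = (7L << 32) | 42L;  // old layout: epoch 7, counter 42
        long newZxid = (7L << 48) | 42L;  // new layout: epoch 7, counter 42
        System.out.println(epochOf(oldZxid) + " " + counterOf(oldZxid));
        System.out.println(epochOf(newZxid) + " " + counterOf(newZxid));
    }
}
```

The unsigned shift `>>>` matters here: a 16-bit epoch with its top bit set makes the `long` negative, and a signed shift would smear the sign bit into the result.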
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user asdf2014 commented on the issue: https://github.com/apache/zookeeper/pull/262 @maoling You are welcome. I have already changed it to the following [code](https://github.com/apache/zookeeper/pull/262/files#diff-f4e58b67b9a4084420cb9b58398a953cR125). With that, I think it can still guarantee idempotency.
```java
long count = ZxidUtils.getCounterFromZxid(zxid);
long epoch = ZxidUtils.getEpochFromZxid(zxid);
```
---
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user maoling commented on the issue: https://github.com/apache/zookeeper/pull/262 Hi, @asdf2014. Thanks for your explanation! But I am still confused about question one. Look at code like this:
```
int epoch = (int)Long.rotateRight(zxid, 32);// >> 32;
long count = zxid & 0xffL;
```
It all depends on **zxid** not being altered in a multithreaded situation (no write after **zxid** has been generated the first time); otherwise the epoch and count are not idempotent. Should **zxid** be declared **final**?
---
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user asdf2014 commented on the issue: https://github.com/apache/zookeeper/pull/262 Hi, @maoling. Thanks for the discussion. My description was probably unclear, which caused the confusion.

1. I was worried that if the lower 8 bits of the upper 32 bits are moved into the low part of the `long`, making it 40 bits low, there might be a concurrency problem. Actually, there is nothing to worry about: every operation on `ZXID` is a bit operation rather than an `=` assignment, so it cannot become a concurrency problem at the `JVM` level.
2. Yep, it is. In particular, at `1k/s` ops the `ZXID` counter is exhausted after only $2^{32} / (86400 \times 1000) \approx 49.7$ days. A heavier load makes the `re-election` come even earlier. At the same time, the `re-election` process could take half a minute, which is unacceptable.
3. So far, it throws an `XidRolloverException` to force the `re-election` process and reset the `counter` to zero.

---
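As a rough illustration of point 3 above, the sketch below shows the general shape of such a rollover check: once the low 32 bits of the zxid are used up, an exception is raised so that a re-election starts a new epoch with the counter back at zero. The exception class here is a local stand-in defined only for the example, and `nextZxid` and `newEpoch` are hypothetical helpers, not ZooKeeper's actual code.

```java
final class RolloverSketch {
    // Assumed layout: high 32 bits = epoch, low 32 bits = counter.
    private static final long COUNTER_MASK = 0xffffffffL;

    /** Local stand-in for the exception mentioned above. */
    static final class XidRolloverException extends Exception {
        XidRolloverException(String message) {
            super(message);
        }
    }

    /** Returns the next zxid, or throws once the 32-bit counter is used up. */
    static long nextZxid(long zxid) throws XidRolloverException {
        if ((zxid & COUNTER_MASK) == COUNTER_MASK) {
            throw new XidRolloverException("zxid lower 32 bits are exhausted");
        }
        return zxid + 1;
    }

    /** After a re-election: epoch incremented, counter reset to zero. */
    static long newEpoch(long zxid) {
        return ((zxid >>> 32) + 1) << 32;
    }

    public static void main(String[] args) {
        long zxid = (3L << 32) | COUNTER_MASK;  // epoch 3, counter exhausted
        try {
            zxid = nextZxid(zxid);
        } catch (XidRolloverException e) {
            zxid = newEpoch(zxid);              // stands in for the forced re-election
        }
        System.out.println(Long.toHexString(zxid));
    }
}
```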
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user maoling commented on the issue: https://github.com/apache/zookeeper/pull/262 A good and interesting question! ---
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user asdf2014 commented on the issue: https://github.com/apache/zookeeper/pull/262 Due to this [jvm bug](http://bugs.java.com/bugdatabase/view_bug.do?bug_id=7177813), JDK7 cannot resolve the `static import`... I will use the fully qualified name instead.
```bash
[javac] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/src/contrib/loggraph/src/java/org/apache/zookeeper/graph/JsonGenerator.java:129: error: cannot find symbol
[javac] long epoch = getEpochFromZxid(zxid);
```
---
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user asdf2014 commented on the issue: https://github.com/apache/zookeeper/pull/262 It seems all test cases [passed](https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/738/testReport/), but a problem occurred in `Zookeeper_operations::testOperationsAndDisconnectConcurrently1`:
```bash
[exec] BUILD FAILED
[exec] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1298: The following error occurred while executing this line:
[exec] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1308: exec returned: 2
[exec]
[exec] Total time: 15 minutes 45 seconds
[exec] /bin/kill -9 16911
[exec]
[exec] Zookeeper_operations::testAsyncWatcher1 : assertion : elapsed 1044
[exec]
[exec] Zookeeper_operations::testAsyncGetOperation : elapsed 4 : OK
[exec]
[exec] Zookeeper_operations::testOperationsAndDisconnectConcurrently1FAIL: zktest-mt
[exec]
[exec] ==
[exec]
[exec] 1 of 2 tests failed
[exec]
[exec] Please report to u...@zookeeper.apache.org
[exec]
[exec] ==
[exec]
[exec] make[1]: Leaving directory `/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build/test/test-cppunit'
[exec]
[exec] /bin/bash: line 5: 15116 Segmentation fault ZKROOT=/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/src/c/../.. CLASSPATH=$CLASSPATH:$CLOVER_HOME/lib/clover.jar ${dir}$tst
[exec]
[exec] make[1]: *** [check-TESTS] Error 1
[exec]
[exec] make: *** [check-am] Error 2
[exec]
[exec] Running contrib tests.
[exec] ==
[exec]
[exec] /home/jenkins/tools/ant/apache-ant-1.9.9/bin/ant -DZookeeperPatchProcess= -Dtest.junit.output.format=xml -Dtest.output=yes test-contrib
[exec] Buildfile: /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml
[exec]
[exec] test-contrib:
[exec]
[exec] BUILD SUCCESSFUL
[exec] Total time: 0 seconds
```
---
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user asdf2014 commented on the issue: https://github.com/apache/zookeeper/pull/262 Why did `jenkins` report the following message?
```bash
mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file
```
---
[GitHub] zookeeper issue #262: ZOOKEEPER-2789: Reassign `ZXID` for solving 32bit over...
Github user asdf2014 commented on the issue: https://github.com/apache/zookeeper/pull/262 Thinking about some abnormal situations, maybe 24 bits for `epoch` and 40 bits for `counter` is a better choice. Assuming one leader election per hour and `1k/s` ops, the number of years before either field overflows is: $\min\left(\frac{2^{24}}{24 \times 365}, \frac{2^{40}}{86400 \times 1000 \times 365}\right) \approx \min(1915.2, 34.9) = 34.9$ ---
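The estimate above can be reproduced for other epoch/counter splits with a few lines of Java. This is a back-of-the-envelope sketch only: the class and method names are made up for the example, and the rates (one election per hour, 1k ops/s) are the assumptions implied by the formula, not measurements.

```java
final class ZxidSplitEstimate {
    /** Years until an epoch field of the given width overflows. */
    static double epochYears(int epochBits, double electionsPerHour) {
        return Math.pow(2, epochBits) / (electionsPerHour * 24 * 365);
    }

    /** Years until a counter field of the given width overflows. */
    static double counterYears(int counterBits, double opsPerSecond) {
        return Math.pow(2, counterBits) / (opsPerSecond * 86400 * 365);
    }

    public static void main(String[] args) {
        int[] epochWidths = {32, 24, 16};
        for (int epochBits : epochWidths) {
            int counterBits = 64 - epochBits;
            double e = epochYears(epochBits, 1.0);         // assumed: one election per hour
            double c = counterYears(counterBits, 1000.0);  // assumed: 1k ops/s
            System.out.printf("%d/%d: epoch %.1f y, counter %.1f y, min %.1f y%n",
                    epochBits, counterBits, e, c, Math.min(e, c));
        }
    }
}
```

The smaller of the two values is what matters, since whichever field overflows first limits the lifetime of the layout under the assumed load.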