[jira] [Commented] (ZOOKEEPER-2778) Potential server deadlock between follower sync with leader and follower receiving external connection requests.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640957#comment-16640957 ]

maoling commented on ZOOKEEPER-2778:

[~hanm] Are still working on this?Could I pick up it?

> Potential server deadlock between follower sync with leader and follower receiving external connection requests.
>
> Key: ZOOKEEPER-2778
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2778
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.5.3
> Reporter: Michael Han
> Assignee: Michael Han
> Priority: Critical
>
> It's possible to have a deadlock during the recovery phase.
> Found this issue by analyzing thread dumps of the "flaky" ReconfigRecoveryTest [1]. Here is a sample thread dump that illustrates the state of the execution:
> {noformat}
> [junit] java.lang.Thread.State: BLOCKED
> [junit] at org.apache.zookeeper.server.quorum.QuorumPeer.getElectionAddress(QuorumPeer.java:686)
> [junit] at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:265)
> [junit] at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:445)
> [junit] at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:369)
> [junit] at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:642)
> [junit]
> [junit] java.lang.Thread.State: BLOCKED
> [junit] at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:472)
> [junit] at org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1438)
> [junit] at org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1471)
> [junit] at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:520)
> [junit] at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:88)
> [junit] at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {noformat}
> The deadlock happens between the quorum peer thread, which runs the follower that is syncing with the leader, and the listener thread of the qcm (QuorumCnxManager) of the same quorum peer, which is receiving a connection. To finish syncing with the leader, the follower needs to synchronize on both QV_LOCK and the qcm object it owns; to finish setting up an incoming connection, the receiver thread needs to synchronize on both the qcm object the quorum peer owns and the same QV_LOCK. The problem is that the two locks are acquired in different orders, so depending on timing and actual execution order, each thread might end up waiting for one lock while holding the other.
> [1] org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
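The lock-ordering problem described in the issue can be illustrated with a small, self-contained sketch. Everything here is hypothetical: the two plain `Object` monitors only stand in for QV_LOCK and the qcm object, and acquiring them in one fixed order is a common general remedy for this class of deadlock, not necessarily the change made for ZOOKEEPER-2778.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for the two monitors named in the report; not ZooKeeper code.
class LockOrderSketch {
    private final Object qvLock = new Object();      // plays the role of QV_LOCK
    private final Object cnxManager = new Object();  // plays the role of the qcm monitor
    final AtomicInteger completed = new AtomicInteger();

    // Follower sync path: QV_LOCK first, then the connection manager.
    void syncPath() {
        synchronized (qvLock) {
            synchronized (cnxManager) {
                completed.incrementAndGet();
            }
        }
    }

    // Listener path: a deadlock-prone version would take cnxManager first.
    // Taking the locks in the same order as syncPath() removes the cycle.
    void receivePath() {
        synchronized (qvLock) {
            synchronized (cnxManager) {
                completed.incrementAndGet();
            }
        }
    }

    // Runs both paths concurrently and returns how many critical sections completed.
    static int runBoth(int iterations) {
        LockOrderSketch s = new LockOrderSketch();
        Thread a = new Thread(() -> { for (int i = 0; i < iterations; i++) s.syncPath(); });
        Thread b = new Thread(() -> { for (int i = 0; i < iterations; i++) s.receivePath(); });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s.completed.get();
    }
}
```

With a single global order, neither thread can hold one monitor while waiting for the other in the opposite direction, so both loops always finish.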
[jira] [Comment Edited] (ZOOKEEPER-2778) Potential server deadlock between follower sync with leader and follower receiving external connection requests.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640957#comment-16640957 ]

maoling edited comment on ZOOKEEPER-2778 at 10/7/18 3:02 AM:

[~hanm] Are you still working on this? Could I pick it up (smirk)?

was (Author: maoling):
[~hanm] Are still working on this?Could I pick up it?

> Potential server deadlock between follower sync with leader and follower receiving external connection requests.
>
> Key: ZOOKEEPER-2778
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2778
ZooKeeper-trunk - Build # 221 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk/221/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 333.68 KB...]
[exec] : elapsed 1001 : OK
[exec] Zookeeper_simpleSystem::testLogCallbackClearLog Message Received: [2018-10-06 23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1080: Client environment:zookeeper.version=zookeeper C client 3.6.0]
[exec] Log Message Received: [2018-10-06 23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1084: Client environment:host.name=asf909.gq1.ygridcore.net]
[exec] Log Message Received: [2018-10-06 23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1091: Client environment:os.name=Linux]
[exec] Log Message Received: [2018-10-06 23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1092: Client environment:os.arch=3.13.0-153-generic]
[exec] Log Message Received: [2018-10-06 23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1093: Client environment:os.version=#203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018]
[exec] Log Message Received: [2018-10-06 23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1101: Client environment:user.name=jenkins]
[exec] Log Message Received: [2018-10-06 23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1109: Client environment:user.home=/home/jenkins]
[exec] Log Message Received: [2018-10-06 23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1121: Client environment:user.dir=/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build/test/test-cppunit]
[exec] Log Message Received: [2018-10-06 23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@zookeeper_init_internal@1167: Initiating client connection, host=127.0.0.1:22181 sessionTimeout=1 watcher=0x4639e0 sessionId=0 sessionPasswd= context=0x7ffc43b3fe80 flags=0]
[exec] Log Message Received: [2018-10-06 23:45:47,542:30308(0x2afe0a205700):ZOO_INFO@check_events@2454: initiated connection to server 127.0.0.1:22181]
[exec] Log Message Received: [2018-10-06 23:45:47,561:30308(0x2afe0a205700):ZOO_INFO@check_events@2506: session establishment complete on server 127.0.0.1:22181, sessionId=0x101cdc8d80e000f, negotiated timeout=1 ]
[exec] : elapsed 1001 : OK
[exec] Zookeeper_simpleSystem::testAsyncWatcherAutoReset ZooKeeper server started : elapsed 10520 : OK
[exec] Zookeeper_simpleSystem::testDeserializeString : elapsed 0 : OK
[exec] Zookeeper_simpleSystem::testFirstServerDown : elapsed 1001 : OK
[exec] Zookeeper_simpleSystem::testNonexistentHost : elapsed 1034 : OK
[exec] Zookeeper_simpleSystem::testNullData : elapsed 1032 : OK
[exec] Zookeeper_simpleSystem::testIPV6 : elapsed 1005 : OK
[exec] Zookeeper_simpleSystem::testCreate : elapsed 1016 : OK
[exec] Zookeeper_simpleSystem::testPath : elapsed 1049 : OK
[exec] Zookeeper_simpleSystem::testPathValidation : elapsed 1158 : OK
[exec] Zookeeper_simpleSystem::testPing : elapsed 17642 : OK
[exec] Zookeeper_simpleSystem::testAcl : elapsed 1016 : OK
[exec] Zookeeper_simpleSystem::testChroot : elapsed 3081 : OK
[exec] Zookeeper_simpleSystem::testAuth ZooKeeper server started ZooKeeper server started : elapsed 31095 : OK
[exec] Zookeeper_simpleSystem::testHangingClient : elapsed 1046 : OK
[exec] Zookeeper_simpleSystem::testWatcherAutoResetWithGlobal ZooKeeper server started ZooKeeper server started ZooKeeper server started : elapsed 15679 : OK
[exec] Zookeeper_simpleSystem::testWatcherAutoResetWithLocal ZooKeeper server started ZooKeeper server started ZooKeeper server started : elapsed 15783 : OK
[exec] Zookeeper_simpleSystem::testGetChildren2 : elapsed 1079 : OK
[exec] Zookeeper_simpleSystem::testLastZxid : elapsed 4537 : OK
[exec] Zookeeper_simpleSystem::testRemoveWatchers ZooKeeper server started : elapsed 4718 : OK
[exec] *** Error in `./zktest-mt': free(): invalid pointer: 0x2afe0818e000 ***
[exec] /bin/bash: line 5: 30308 Aborted ZKROOT=/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/zookeeper-client/zookeeper-client-c/../.. CLASSPATH=$CLASSPATH:$CLOVER_HOME/lib/clover.jar ${dir}$tst
[exec] Zookeeper_readOnly::testReadOnly : elapsed 4132 : OK
[exec] Zookeeper_logClientEnv::testLogClientEnv : elapsed 1 : OK
[exec] OK (76)
[exec] FAIL: zktest-mt
[exec] ==
[exec] 1 of 2 tests failed
[exec] Please report to u...@zookeeper.apache.org
[exec] ==
[exec] make[1]: Leaving directory `/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build/test/test-cppunit'
[exec] make[1]: *** [check-TESTS] Error 1
[exec] make: *** [check-am] Error 2

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1550: The following error occurred while
Roadmap to MetricsProvider
Hi guys,
as I am going to work on the Prometheus implementation, I should at least add some minimal metrics to expose through MetricsProvider.

In order to introduce MetricsProvider-based instrumentation we should eventually drop the existing instrumentation and replace it with the new system. The challenge will be to avoid dropping the 4 letter words API and/or duplicating all the instrumentation points.

A minimal instrumentation, just to expose some useful values, is to add this method to ZooKeeperServer.java and expose basic data:

    protected void setupMetrics() {
        rootMetricsContext.registerGauge("outstanding_requests", () -> {
            return serverStats.getOutstandingRequests();
        });
        rootMetricsContext.registerGauge("znode_count", () -> {
            return zkDb.getNodeCount();
        });
        rootMetricsContext.registerGauge("watch_count", () -> {
            return zkDb.getDataTree().getWatchCount();
        });
        rootMetricsContext.registerGauge("ephemerals_count", () -> {
            return zkDb.getDataTree().getEphemeralsCount();
        });
        rootMetricsContext.registerGauge("approximate_data_size", () -> {
            return zkDb.getDataTree().cachedApproximateDataSize();
        });
        rootMetricsContext.registerGauge("global_sessions", () -> {
            return zkDb.getSessionCount();
        });
        rootMetricsContext.registerGauge("local_sessions", () -> {
            return sessionTracker.getLocalSessionCount();
        });
    }

This approach is not the one I expect for the long term, as each subsystem (ZkDatabase, Packet Processor, ...) will have its own specific instrumentation. It can work in the very short term, and only for "gauges", not for Summaries (with avg/min/max...) and Counters, which should be collected in place.

Do you have any suggestions?

Enrico
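To make the gauge versus counter distinction in the message concrete, here is a minimal sketch. The `SketchMetricsRegistry` and `SketchServer` classes and the metric name `requests_total` are hypothetical and are not the ZooKeeper MetricsProvider API; they only show why a gauge can be bolted on from outside while a counter has to be updated where the event happens.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical registry, for illustration only; not the ZooKeeper MetricsProvider API.
class SketchMetricsRegistry {
    private final Map<String, Supplier<Number>> gauges = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // A gauge is pulled: the registry calls the supplier whenever a value is read.
    void registerGauge(String name, Supplier<Number> supplier) {
        gauges.put(name, supplier);
    }

    // A counter is pushed: instrumented code must increment it at the event site.
    AtomicLong counter(String name) {
        return counters.computeIfAbsent(name, k -> new AtomicLong());
    }

    // Reads a gauge if one is registered under the name, otherwise a counter (0 if unknown).
    long read(String name) {
        Supplier<Number> gauge = gauges.get(name);
        if (gauge != null) {
            return gauge.get().longValue();
        }
        AtomicLong counter = counters.get(name);
        return counter == null ? 0L : counter.get();
    }
}

// Hypothetical server with one gauge over existing state and one in-place counter.
class SketchServer {
    final SketchMetricsRegistry metrics = new SketchMetricsRegistry();
    private final AtomicLong outstanding = new AtomicLong();  // state the server already keeps
    private final AtomicLong requestCounter;

    SketchServer() {
        // Gauge: registered once, needs no change to the request path.
        metrics.registerGauge("outstanding_requests", () -> outstanding.get());
        requestCounter = metrics.counter("requests_total");
    }

    void submit() {
        outstanding.incrementAndGet();
        requestCounter.incrementAndGet();  // counter: collected in place
    }

    void complete() {
        outstanding.decrementAndGet();
    }
}
```

The gauge only wraps state that already exists, which is why a single `setupMetrics()` method is enough for it; the counter needs a line inside `submit()`, which is the per-subsystem instrumentation the message anticipates.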
[GitHub] zookeeper pull request #632: [ZOOKEEPER-3150] Add tree digest check and veri...
Github user eolivelli commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/632#discussion_r223181405

    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java ---
    @@ -1521,4 +1562,179 @@ public boolean removeWatch(String path, WatcherType type, Watcher watcher) {
         public ReferenceCountedACLCache getReferenceCountedAclCache() {
             return aclCache;
         }
    +
    +    /**
    +     * Add the digest to the historical list, and update the latest zxid digest.
    +     */
    +    private void logZxidDigest(long zxid, long digest) {
    +        ZxidDigest zxidDigest = new ZxidDigest(zxid, DigestCalculator.DIGEST_VERSION, digest);
    +        lastProcessedZxidDigest = zxidDigest;
    +        if (zxidDigest.zxid % 128 == 0) {
    +            synchronized (digestLog) {
    +                digestLog.add(zxidDigest);
    +                if (digestLog.size() > DIGEST_LOG_LIMIT) {
    +                    digestLog.poll();
    +                }
    +            }
    +        }
    +    }
    +
    +    /**
    +     * Serializing the digest to snapshot, this is done after the data tree
    +     * is being serialized, so when we replay the txns and it hits this zxid
    +     * we know we should be in a non-fuzzy state, and have the same digest.
    +     *
    +     * @param oa the output stream to write to
    +     * @return true if the digest is serialized successfully
    +     */
    +    public Boolean serializeZxidDigest(OutputArchive oa) throws IOException {
    +        if (!DigestCalculator.digestEnabled()) {
    +            return false;
    +        }
    +
    +        ZxidDigest zxidDigest = lastProcessedZxidDigest;
    +        if (zxidDigest == null) {
    +            // write an empty digest
    +            zxidDigest = new ZxidDigest();
    +        }
    +        zxidDigest.serialize(oa);
    +        return true;
    +    }
    +
    +    /**
    +     * Deserializing the zxid digest from the input stream and update the
    +     * digestFromLoadedSnapshot.
    +     *
    +     * @param ia the input stream to read from
    +     * @return the true if it deserialized successfully
    +     */
    +    public Boolean deserializeZxidDigest(InputArchive ia) throws IOException {
    --- End diff --

    Nit: boolean

---
[GitHub] zookeeper pull request #632: [ZOOKEEPER-3150] Add tree digest check and veri...
Github user eolivelli commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/632#discussion_r223181364

    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java ---
    @@ -1521,4 +1562,179 @@ public boolean removeWatch(String path, WatcherType type, Watcher watcher) {
         public ReferenceCountedACLCache getReferenceCountedAclCache() {
             return aclCache;
         }
    +
    +    /**
    +     * Add the digest to the historical list, and update the latest zxid digest.
    +     */
    +    private void logZxidDigest(long zxid, long digest) {
    +        ZxidDigest zxidDigest = new ZxidDigest(zxid, DigestCalculator.DIGEST_VERSION, digest);
    +        lastProcessedZxidDigest = zxidDigest;
    +        if (zxidDigest.zxid % 128 == 0) {
    +            synchronized (digestLog) {
    +                digestLog.add(zxidDigest);
    +                if (digestLog.size() > DIGEST_LOG_LIMIT) {
    +                    digestLog.poll();
    +                }
    +            }
    +        }
    +    }
    +
    +    /**
    +     * Serializing the digest to snapshot, this is done after the data tree
    +     * is being serialized, so when we replay the txns and it hits this zxid
    +     * we know we should be in a non-fuzzy state, and have the same digest.
    +     *
    +     * @param oa the output stream to write to
    +     * @return true if the digest is serialized successfully
    +     */
    +    public Boolean serializeZxidDigest(OutputArchive oa) throws IOException {
    --- End diff --

    Nit: boolean not Boolean

---
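As background for the nit, here is a small sketch of why a primitive `boolean` return type is usually preferred when the method can only answer true or false. The class and method names are hypothetical and are not taken from the pull request.

```java
// Hypothetical illustration; names are assumptions, not the pull request's code.
class BoxedBooleanSketch {
    // A boxed return type permits null, which callers may not expect.
    static Boolean boxedResult(boolean enabled) {
        return enabled ? Boolean.TRUE : null;
    }

    // A primitive return type can only be true or false.
    static boolean primitiveResult(boolean enabled) {
        return enabled;
    }

    // Returns true if unboxing a null Boolean threw NullPointerException.
    static boolean unboxingFails() {
        try {
            boolean b = boxedResult(false);  // auto-unboxing of null happens here
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

The methods in the diff only ever return `true` or `false`, so the boxed type adds a possible null and an allocation-free but unnecessary wrapper without offering anything in return.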
[GitHub] zookeeper pull request #632: [ZOOKEEPER-3150] Add tree digest check and veri...
Github user eolivelli commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/632#discussion_r223181383

    --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java ---
    @@ -1521,4 +1562,179 @@ public boolean removeWatch(String path, WatcherType type, Watcher watcher) {
         public ReferenceCountedACLCache getReferenceCountedAclCache() {
             return aclCache;
         }
    +
    +    /**
    +     * Add the digest to the historical list, and update the latest zxid digest.
    +     */
    +    private void logZxidDigest(long zxid, long digest) {
    +        ZxidDigest zxidDigest = new ZxidDigest(zxid, DigestCalculator.DIGEST_VERSION, digest);
    +        lastProcessedZxidDigest = zxidDigest;
    +        if (zxidDigest.zxid % 128 == 0) {
    --- End diff --

    Can you explain this magic value '128'? Maybe a comment will help. Maybe I am missing something.

---
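One way to address this kind of review comment is to give the value a name and a comment. The sketch below assumes, from reading the diff, that the condition samples every 128th zxid into a bounded history; the class name, the constant names, and the limit of 4 are hypothetical and chosen only to keep the example small.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; names and the limit value are assumptions, not the pull request's code.
class SampledDigestLog {
    // Record only every 128th zxid so the history stays sparse.
    static final long SAMPLE_INTERVAL = 128;
    // Assumed cap on retained entries, for illustration.
    static final int LOG_LIMIT = 4;

    private final Deque<long[]> log = new ArrayDeque<>();

    // Adds {zxid, digest} when the zxid falls on the sampling interval, evicting the oldest entry.
    synchronized void record(long zxid, long digest) {
        if (zxid % SAMPLE_INTERVAL != 0) {
            return;
        }
        log.add(new long[] {zxid, digest});
        if (log.size() > LOG_LIMIT) {
            log.poll();  // drop the oldest sample
        }
    }

    synchronized int size() {
        return log.size();
    }

    // Returns the zxid of the oldest retained sample, or -1 if the log is empty.
    synchronized long oldestZxid() {
        return log.isEmpty() ? -1L : log.peek()[0];
    }
}
```

A named constant documents the intent at the point of use and gives a single place to change the interval later.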
[jira] [Commented] (ZOOKEEPER-3109) Avoid long unavailable time due to voter changed mind when activating the leader during election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640701#comment-16640701 ]

Lasaro Camargos commented on ZOOKEEPER-3109:

I believe I've seen this problem in 3.4.10, where it caused a leader never to be elected after the original leader got disconnected. Only after I increased the maximum connection time did a new leader get elected.
I've tried to reproduce the issue but wasn't successful.

Lásaro

> Avoid long unavailable time due to voter changed mind when activating the leader during election
>
> Key: ZOOKEEPER-3109
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3109
> Project: ZooKeeper
> Issue Type: Improvement
> Components: quorum, server
> Affects Versions: 3.6.0
> Reporter: Fangmin Lv
> Assignee: Fangmin Lv
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.6.0
>
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> Occasionally, electing a leader takes a long time, possibly longer than 1 minute, depending on how large initLimit and tickTime are set.
>
> This exposes an issue in the leader election protocol. During leader election, before a voter goes to the LEADING/FOLLOWING state, it waits for a finalizeWait time before changing its state. Depending on the order of notifications, some voter might change its mind just after voting for a server. If the server it was previously voting for has a majority of votes after counting this one, then that server will go to the LEADING state. In some corner cases, the leader may end up timing out while waiting for the epoch ACK from a majority, because of the voter that changed its mind. This usually happens when there is an even number of servers in the ensemble (either because one of the servers is down, or because it is being restarted and takes a long time to restart). If there are 5 servers in the ensemble, then we'll find two of them in the LEADING/FOLLOWING state and another two in the LOOKING state, but the LOOKING servers cannot join the quorum since they're waiting for a majority of servers FOLLOWING the current leader before changing to FOLLOWING as well.
>
> As far as we know, a voter will change its mind if it receives a vote from another host which has just started and votes for itself, or if a server takes a long time to shut down its previous ZK server and votes for itself when starting the leader election process.
>
> Also, the follower may abandon the leader if the leader is not ready to accept learner connections when the follower tries to connect to it.
>
> To solve this issue, there are multiple options:
> 1. increase the finalizeWait time
> 2. smartly detect this state on the leader and quit earlier
>
> The 1st option is straightforward and easier to change, but it will cause longer leader election time in common cases.
>
> The 2nd option is more complex, but it can efficiently solve the problem without sacrificing performance in common cases. The leader remembers the first majority of servers voting for it, and checks whether any of them changed its mind while it is waiting for the epoch ACK. The leader will wait for some time before quitting the LEADING state, since one voter changing its mind may not be a problem if there is still a majority of voters voting for it.
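The bookkeeping behind the 2nd option can be sketched in a few lines. This is only an illustration of the idea as described in the issue text; `VoterTracker` and its methods are hypothetical names, and the real change would also need the waiting period and the notification handling that the description mentions.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of option 2; class and method names are assumptions, not ZooKeeper code.
class VoterTracker {
    private final int ensembleSize;
    private final Set<Long> supporters;  // server ids in the first majority that voted for this leader

    VoterTracker(int ensembleSize, Set<Long> firstMajority) {
        this.ensembleSize = ensembleSize;
        this.supporters = new HashSet<>(firstMajority);
    }

    // Called when a notification shows that a voter now votes for another server.
    void voterChangedMind(long serverId) {
        supporters.remove(serverId);
    }

    // True while a strict majority of the full ensemble still supports this leader.
    boolean stillHasMajority() {
        return supporters.size() > ensembleSize / 2;
    }
}
```

In a 5-server ensemble, a leader elected by exactly three voters loses its majority as soon as one of them changes its mind, while a leader elected by four voters can tolerate one such change; that is the case the description calls out as "may not be a problem".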
[jira] [Comment Edited] (ZOOKEEPER-2844) Zookeeper auto purge process does not purge files
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640676#comment-16640676 ]

maoling edited comment on ZOOKEEPER-2844 at 10/6/18 10:44 AM:

[~astei...@varonis.com], [~timkrueger] Could you please provide some more clues about what happened on your Windows server? Have you made any progress?

was (Author: maoling):
[~astei...@varonis.com][~timkrueger] Could you plz provide some more clues?

> Zookeeper auto purge process does not purge files
>
> Key: ZOOKEEPER-2844
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2844
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.6
> Environment: Windows Server 2008 R2
> Reporter: Avi Steiner
> Priority: Major
> Attachments: ZK.zip
>
> I'm using Zookeeper 3.4.6
>
> The ZK log data folder keeps growing with transaction log files (log.*).
>
> I set the following in zoo.cfg:
> autopurge.purgeInterval=1
> autopurge.snapRetainCount=3
> dataDir=..\\data
>
> Per the ZK log, it reads those parameters:
>
> 2017-07-13 10:36:21,266 [myid:] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
> 2017-07-13 10:36:21,266 [myid:] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 1
>
> It also says that the cleanup process is running:
>
> 2017-07-13 10:36:21,266 [myid:] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
> 2017-07-13 10:36:21,297 [myid:] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
>
> But actually nothing is deleted.
> Every service restart, a new file is created.
>
> The only parameter I managed to change is preAllocSize, which sets the minimum size per file. The default is 64MB. I changed it to 10KB only for testing, and I saw the effect as expected: new files were created with 10KB.
> I also tried to create a batch file that will run the following:
> java -cp zookeeper-3.4.6.jar;lib/slf4j-api-1.6.1.jar;lib/slf4j-log4j12-1.6.1.jar;lib/log4j-1.2.16.jar;conf org.apache.zookeeper.server.PurgeTxnLog .\data -n 3
> But it still doesn't do the job.
> Please advise.
[jira] [Commented] (ZOOKEEPER-2844) Zookeeper auto purge process does not purge files
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640676#comment-16640676 ]

maoling commented on ZOOKEEPER-2844:

[~astei...@varonis.com][~timkrueger] Could you plz provide some more clues?

> Zookeeper auto purge process does not purge files
>
> Key: ZOOKEEPER-2844
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2844
Success: ZOOKEEPER- PreCommit Build #2376
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 82.17 MB...] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] [exec] Error: No value specified for option "issue" [exec] Session logged out. Session was JSESSIONID=C89B2FC4BD5DFEE55F40E7CB2215FAF2. [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file BUILD SUCCESSFUL Total time: 24 minutes 18 seconds Archiving artifacts Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Recording test results Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 [description-setter] Description set: ZOOKEEPER-3125 Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Adding one-line test results to commit status... 
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Setting status of 1d1d50c3b7c8eac97d5d7ff83c9839e1d789d0cb to SUCCESS with url https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376/ and message: 'SUCCESS 1722 tests run, 3 skipped, 0 failed.' Using context: Jenkins Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376/ Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Email was triggered for: Success Sending email for trigger: Success Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 ### ## FAILED TESTS (if any) ## All tests passed
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user asfgit commented on the issue: https://github.com/apache/zookeeper/pull/647 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376/ ---
[GitHub] zookeeper issue #632: [ZOOKEEPER-3150] Add tree digest check and verify data...
Github user asfgit commented on the issue: https://github.com/apache/zookeeper/pull/632 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2375/ ---
Failed: ZOOKEEPER- PreCommit Build #2375
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2375/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 76.57 MB...]
[exec]
[exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2375//testReport/
[exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2375//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
[exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2375//console
[exec]
[exec] This message is automatically generated.
[exec]
[exec]
[exec] ==
[exec] ==
[exec] Adding comment to Jira.
[exec] ==
[exec] ==
[exec]
[exec]
[exec]
[exec] Error: No value specified for option "issue"
[exec] Session logged out. Session was JSESSIONID=522F58EB8CA26E6301ABC0BF9D3AD120.
[exec]
[exec]
[exec] ==
[exec] ==
[exec] Finished build.
[exec] ==
[exec] ==
[exec]
[exec]
[exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1953: exec returned: 1

Total time: 11 minutes 58 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-3150
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Adding one-line test results to commit status...
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting status of e040eae608452ff3fa73840152013b329ff95e7c to FAILURE with url https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2375/ and message: 'FAILURE
 1765 tests run, 1 skipped, 1 failed.'
Using context: Jenkins
Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2375/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8

### ## FAILED TESTS (if any) ##
1 tests failed.
FAILED: org.apache.zookeeper.server.quorum.FuzzySnapshotRelatedTest.testPZxidUpdatedWhenLoadingSnapshot

Error Message:
KeeperErrorCode = ConnectionLoss for /testPZxidUpdatedDuringTakingSnapshot

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /testPZxidUpdatedDuringTakingSnapshot
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2046)
    at org.apache.zookeeper.server.quorum.FuzzySnapshotRelatedTest.compareStat(FuzzySnapshotRelatedTest.java:260)
    at org.apache.zookeeper.server.quorum.FuzzySnapshotRelatedTest.testPZxidUpdatedWhenLoadingSnapshot(FuzzySnapshotRelatedTest.java:235)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
[GitHub] zookeeper issue #632: [ZOOKEEPER-3150] Add tree digest check and verify data...
Github user lvfangmin commented on the issue:

    https://github.com/apache/zookeeper/pull/632

    This is a really useful feature, which helps us find multiple data inconsistency issues, like ZOOKEEPER-3144, ZOOKEEPER-3127, and ZOOKEEPER-3125. It can help avoid introducing new inconsistency bugs in ZooKeeper in the future, so please take a look when you have time.

    I'll introduce the 2nd part after this gets reviewed and merged.

    For performance, we saw only a very minor impact; I will provide the micro-benchmark results.

---
Success: ZOOKEEPER- PreCommit Build #2374
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2374/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 79.81 MB...] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2374//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2374//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2374//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] [exec] Error: No value specified for option "issue" [exec] Session logged out. Session was JSESSIONID=000DA200F798C7F3CB313A1BB097C711. [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file BUILD SUCCESSFUL Total time: 18 minutes 10 seconds Archiving artifacts Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Recording test results Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 [description-setter] Description set: ZOOKEEPER-3114 Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Adding one-line test results to commit status... 
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Setting status of 34673d9889a33aab11dc8686c79501547dc40847 to SUCCESS with url https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2374/ and message: 'SUCCESS 1737 tests run, 1 skipped, 0 failed.' Using context: Jenkins Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2374/ Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Email was triggered for: Success Sending email for trigger: Success Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8 ### ## FAILED TESTS (if any) ## All tests passed
[GitHub] zookeeper issue #632: [ZOOKEEPER-3150] Add tree digest check and verify data...
Github user asfgit commented on the issue: https://github.com/apache/zookeeper/pull/632 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2374/ ---