[jira] [Commented] (ZOOKEEPER-2778) Potential server deadlock between follower sync with leader and follower receiving external connection requests.

2018-10-06 Thread maoling (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640957#comment-16640957
 ] 

maoling commented on ZOOKEEPER-2778:


[~hanm] 
Are you still working on this? Could I pick it up?

> Potential server deadlock between follower sync with leader and follower 
> receiving external connection requests.
> 
>
> Key: ZOOKEEPER-2778
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2778
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.5.3
>Reporter: Michael Han
>Assignee: Michael Han
>Priority: Critical
>
> It's possible to have a deadlock during recovery phase. 
> Found this issue by analyzing thread dumps of the "flaky" ReconfigRecoveryTest 
> [1]. Here is a sample thread dump that illustrates the state of the 
> execution:
> {noformat}
> [junit]  java.lang.Thread.State: BLOCKED
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.getElectionAddress(QuorumPeer.java:686)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:265)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:445)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:369)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:642)
> [junit] 
> [junit]  java.lang.Thread.State: BLOCKED
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:472)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1438)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1471)
> [junit] at  
> org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:520)
> [junit] at  
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:88)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {noformat}
> The deadlock happens between the quorum peer thread, which runs the 
> follower that is syncing with the leader, and the listener thread of the 
> QuorumCnxManager (qcm) of the same quorum peer, which receives incoming 
> connections. To finish syncing with the leader, the follower needs to 
> synchronize on both QV_LOCK and the qcm object it owns; meanwhile, to 
> finish setting up an incoming connection, the receiver thread needs to 
> synchronize on both the qcm object the quorum peer owns and the same 
> QV_LOCK. The problem is that the two locks are acquired in different 
> orders, so depending on timing and actual execution order, the two threads 
> might each end up waiting for one lock while holding the other.
> [1] 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig
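The inversion above boils down to two code paths taking the same pair of monitors in opposite order. A minimal standalone sketch of the conventional fix, a single global lock order, is shown below; `qvLock` and `cnxLock` are illustrative stand-ins, not the actual ZooKeeper fields:

```java
// Minimal sketch of the fix for the lock-order inversion described above:
// both code paths acquire the two monitors in the SAME order, so no
// hold-and-wait cycle can form. All names are illustrative, not ZooKeeper's.
public class LockOrderDemo {
    private final Object qvLock = new Object();   // stands in for QV_LOCK
    private final Object cnxLock = new Object();  // stands in for the qcm monitor

    void syncWithLeader() {
        synchronized (qvLock) {
            synchronized (cnxLock) {
                // ... connect to new peers from the last seen config ...
            }
        }
    }

    void receiveConnection() {
        synchronized (qvLock) {        // in the buggy version, cnxLock was taken first here
            synchronized (cnxLock) {
                // ... resolve election address, initiate connection ...
            }
        }
    }

    // Runs both paths concurrently; returns true if neither thread got stuck.
    public static boolean run() {
        LockOrderDemo demo = new LockOrderDemo();
        Thread a = new Thread(() -> { for (int i = 0; i < 1000; i++) demo.syncWithLeader(); });
        Thread b = new Thread(() -> { for (int i = 0; i < 1000; i++) demo.receiveConnection(); });
        a.start();
        b.start();
        try {
            a.join(5000);
            b.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !a.isAlive() && !b.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(LockOrderDemo.run()); // true: consistent ordering cannot deadlock
    }
}
```

With the original opposite orderings, the same harness can hang indefinitely once each thread holds one monitor and waits for the other.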



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ZOOKEEPER-2778) Potential server deadlock between follower sync with leader and follower receiving external connection requests.

2018-10-06 Thread maoling (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640957#comment-16640957
 ] 

maoling edited comment on ZOOKEEPER-2778 at 10/7/18 3:02 AM:
-

[~hanm] 
Are you still working on this? Could I pick it up? (smirk)


was (Author: maoling):
[~hanm] 
Are still working on this?Could I pick up it?




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


ZooKeeper-trunk - Build # 221 - Still Failing

2018-10-06 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk/221/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 333.68 KB...]
 [exec]  : elapsed 1001 : OK
 [exec] Zookeeper_simpleSystem::testLogCallbackClearLog Message Received: 
[2018-10-06 23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1080: Client 
environment:zookeeper.version=zookeeper C client 3.6.0]
 [exec] Log Message Received: [2018-10-06 
23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1084: Client 
environment:host.name=asf909.gq1.ygridcore.net]
 [exec] Log Message Received: [2018-10-06 
23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1091: Client 
environment:os.name=Linux]
 [exec] Log Message Received: [2018-10-06 
23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1092: Client 
environment:os.arch=3.13.0-153-generic]
 [exec] Log Message Received: [2018-10-06 
23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1093: Client 
environment:os.version=#203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018]
 [exec] Log Message Received: [2018-10-06 
23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1101: Client 
environment:user.name=jenkins]
 [exec] Log Message Received: [2018-10-06 
23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1109: Client 
environment:user.home=/home/jenkins]
 [exec] Log Message Received: [2018-10-06 
23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@log_env@1121: Client 
environment:user.dir=/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build/test/test-cppunit]
 [exec] Log Message Received: [2018-10-06 
23:45:47,542:30308(0x2afe081a2f40):ZOO_INFO@zookeeper_init_internal@1167: 
Initiating client connection, host=127.0.0.1:22181 sessionTimeout=1 
watcher=0x4639e0 sessionId=0 sessionPasswd= context=0x7ffc43b3fe80 
flags=0]
 [exec] Log Message Received: [2018-10-06 
23:45:47,542:30308(0x2afe0a205700):ZOO_INFO@check_events@2454: initiated 
connection to server 127.0.0.1:22181]
 [exec] Log Message Received: [2018-10-06 
23:45:47,561:30308(0x2afe0a205700):ZOO_INFO@check_events@2506: session 
establishment complete on server 127.0.0.1:22181, sessionId=0x101cdc8d80e000f, 
negotiated timeout=1 ]
 [exec]  : elapsed 1001 : OK
 [exec] Zookeeper_simpleSystem::testAsyncWatcherAutoReset ZooKeeper server 
started : elapsed 10520 : OK
 [exec] Zookeeper_simpleSystem::testDeserializeString : elapsed 0 : OK
 [exec] Zookeeper_simpleSystem::testFirstServerDown : elapsed 1001 : OK
 [exec] Zookeeper_simpleSystem::testNonexistentHost : elapsed 1034 : OK
 [exec] Zookeeper_simpleSystem::testNullData : elapsed 1032 : OK
 [exec] Zookeeper_simpleSystem::testIPV6 : elapsed 1005 : OK
 [exec] Zookeeper_simpleSystem::testCreate : elapsed 1016 : OK
 [exec] Zookeeper_simpleSystem::testPath : elapsed 1049 : OK
 [exec] Zookeeper_simpleSystem::testPathValidation : elapsed 1158 : OK
 [exec] Zookeeper_simpleSystem::testPing : elapsed 17642 : OK
 [exec] Zookeeper_simpleSystem::testAcl : elapsed 1016 : OK
 [exec] Zookeeper_simpleSystem::testChroot : elapsed 3081 : OK
 [exec] Zookeeper_simpleSystem::testAuth ZooKeeper server started ZooKeeper 
server started : elapsed 31095 : OK
 [exec] Zookeeper_simpleSystem::testHangingClient : elapsed 1046 : OK
 [exec] Zookeeper_simpleSystem::testWatcherAutoResetWithGlobal ZooKeeper 
server started ZooKeeper server started ZooKeeper server started : elapsed 
15679 : OK
 [exec] Zookeeper_simpleSystem::testWatcherAutoResetWithLocal ZooKeeper 
server started ZooKeeper server started ZooKeeper server started : elapsed 
15783 : OK
 [exec] Zookeeper_simpleSystem::testGetChildren2 : elapsed 1079 : OK
 [exec] Zookeeper_simpleSystem::testLastZxid : elapsed 4537 : OK
 [exec] Zookeeper_simpleSystem::testRemoveWatchers ZooKeeper server started 
: elapsed 4718 : OK
 [exec] *** Error in `./zktest-mt': free(): invalid pointer: 
0x2afe0818e000 ***
 [exec] /bin/bash: line 5: 30308 Aborted 
ZKROOT=/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/zookeeper-client/zookeeper-client-c/../..
 CLASSPATH=$CLASSPATH:$CLOVER_HOME/lib/clover.jar ${dir}$tst
 [exec] Zookeeper_readOnly::testReadOnly : elapsed 4132 : OK
 [exec] Zookeeper_logClientEnv::testLogClientEnv : elapsed 1 : OK
 [exec] OK (76)
 [exec] FAIL: zktest-mt
 [exec] ==
 [exec] 1 of 2 tests failed
 [exec] Please report to u...@zookeeper.apache.org
 [exec] ==
 [exec] make[1]: Leaving directory 
`/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build/test/test-cppunit'
 [exec] make[1]: *** [check-TESTS] Error 1
 [exec] make: *** [check-am] Error 2

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1550: The 
following error occurred while 

Roadmap to MetricsProvider

2018-10-06 Thread Enrico Olivelli
Hi guys,
as I am going to work on the Prometheus implementation, I should at least
add some minimal metrics to expose through the MetricsProvider.

In order to introduce MetricsProvider-based instrumentation, we should
eventually drop the existing instrumentation and replace it with the new
system.
The challenge is to do that without dropping the four-letter-words API
and without duplicating all the instrumentation points.

A minimal instrumentation, just to expose some useful values, is to add
this method to ZooKeeperServer.java:


protected void setupMetrics() {
    rootMetricsContext.registerGauge("outstanding_requests",
            () -> serverStats.getOutstandingRequests());
    rootMetricsContext.registerGauge("znode_count",
            () -> zkDb.getNodeCount());
    rootMetricsContext.registerGauge("watch_count",
            () -> zkDb.getDataTree().getWatchCount());
    rootMetricsContext.registerGauge("ephemerals_count",
            () -> zkDb.getDataTree().getEphemeralsCount());
    rootMetricsContext.registerGauge("approximate_data_size",
            () -> zkDb.getDataTree().cachedApproximateDataSize());
    rootMetricsContext.registerGauge("global_sessions",
            () -> zkDb.getSessionCount());
    rootMetricsContext.registerGauge("local_sessions",
            () -> sessionTracker.getLocalSessionCount());
}


This approach is not the one I expect for the long term, as each
subsystem (ZkDatabase, packet processor, ...) will have its own specific
instrumentation.
It can work in the very short term, and only for "gauges", not for
Summaries (with avg/min/max...) or Counters, which should be
collected in-place.
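The gauge/counter distinction above can be sketched with a toy context (`SketchMetricsContext` and its methods are hypothetical, not the actual MetricsProvider API): gauges are pulled through a supplier at scrape time, while counters have to be bumped in-place at each instrumentation point.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Toy metrics context (hypothetical names): a gauge is sampled on demand via
// its supplier, so it can be retrofitted from outside; a counter must be
// incremented at the point where the event actually happens.
public class SketchMetricsContext {
    private final Map<String, Supplier<Number>> gauges = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Pull model: the metrics backend calls the supplier at scrape time.
    public void registerGauge(String name, Supplier<Number> supplier) {
        gauges.put(name, supplier);
    }

    // Push model: the subsystem calls this in-place whenever the event occurs.
    public void addToCounter(String name, long delta) {
        counters.computeIfAbsent(name, k -> new LongAdder()).add(delta);
    }

    public long sampleGauge(String name) {
        return gauges.get(name).get().longValue();
    }

    public long counterValue(String name) {
        LongAdder adder = counters.get(name);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        SketchMetricsContext ctx = new SketchMetricsContext();
        ctx.registerGauge("znode_count", () -> 7);
        ctx.addToCounter("requests", 1);
        ctx.addToCounter("requests", 1);
        System.out.println(ctx.sampleGauge("znode_count")); // 7
        System.out.println(ctx.counterValue("requests"));   // 2
    }
}
```

This is why the gauge-only shortcut works now but counters and summaries require touching each subsystem.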

Do you have any suggestions?

Enrico


[GitHub] zookeeper pull request #632: [ZOOKEEPER-3150] Add tree digest check and veri...

2018-10-06 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/632#discussion_r223181405
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java ---
@@ -1521,4 +1562,179 @@ public boolean removeWatch(String path, WatcherType 
type, Watcher watcher) {
 public ReferenceCountedACLCache getReferenceCountedAclCache() {
 return aclCache;
 }
+
+/**
+ * Add the digest to the historical list, and update the latest zxid 
digest.
+ */
+private void logZxidDigest(long zxid, long digest) {
+ZxidDigest zxidDigest = new ZxidDigest(zxid, 
DigestCalculator.DIGEST_VERSION, digest);
+lastProcessedZxidDigest = zxidDigest;
+if (zxidDigest.zxid % 128 == 0) {
+synchronized (digestLog) {
+digestLog.add(zxidDigest);
+if (digestLog.size() > DIGEST_LOG_LIMIT) {
+digestLog.poll();
+}
+}
+}
+}
+
+/**
+ * Serializing the digest to snapshot, this is done after the data 
tree 
+ * is being serialized, so when we replay the txns and it hits this 
zxid 
+ * we know we should be in a non-fuzzy state, and have the same 
digest. 
+ *
+ * @param oa the output stream to write to 
+ * @return true if the digest is serialized successfully
+ */
+public Boolean serializeZxidDigest(OutputArchive oa) throws 
IOException {
+if (!DigestCalculator.digestEnabled()) {
+return false;
+}
+
+ZxidDigest zxidDigest = lastProcessedZxidDigest;
+if (zxidDigest == null) {
+// write an empty digest
+zxidDigest = new ZxidDigest();
+}
+zxidDigest.serialize(oa);
+return true;
+}
+
+/**
+ * Deserializing the zxid digest from the input stream and update the 
+ * digestFromLoadedSnapshot.
+ *
+ * @param ia the input stream to read from
+ * @return the true if it deserialized successfully
+ */
+public Boolean deserializeZxidDigest(InputArchive ia) throws 
IOException {
--- End diff --

Nit: boolean


---


[GitHub] zookeeper pull request #632: [ZOOKEEPER-3150] Add tree digest check and veri...

2018-10-06 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/632#discussion_r223181364
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java ---
@@ -1521,4 +1562,179 @@ public boolean removeWatch(String path, WatcherType 
type, Watcher watcher) {
 public ReferenceCountedACLCache getReferenceCountedAclCache() {
 return aclCache;
 }
+
+/**
+ * Add the digest to the historical list, and update the latest zxid 
digest.
+ */
+private void logZxidDigest(long zxid, long digest) {
+ZxidDigest zxidDigest = new ZxidDigest(zxid, 
DigestCalculator.DIGEST_VERSION, digest);
+lastProcessedZxidDigest = zxidDigest;
+if (zxidDigest.zxid % 128 == 0) {
+synchronized (digestLog) {
+digestLog.add(zxidDigest);
+if (digestLog.size() > DIGEST_LOG_LIMIT) {
+digestLog.poll();
+}
+}
+}
+}
+
+/**
+ * Serializing the digest to snapshot, this is done after the data 
tree 
+ * is being serialized, so when we replay the txns and it hits this 
zxid 
+ * we know we should be in a non-fuzzy state, and have the same 
digest. 
+ *
+ * @param oa the output stream to write to 
+ * @return true if the digest is serialized successfully
+ */
+public Boolean serializeZxidDigest(OutputArchive oa) throws 
IOException {
--- End diff --

Nit: boolean not Boolean


---


[GitHub] zookeeper pull request #632: [ZOOKEEPER-3150] Add tree digest check and veri...

2018-10-06 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/632#discussion_r223181383
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java ---
@@ -1521,4 +1562,179 @@ public boolean removeWatch(String path, WatcherType 
type, Watcher watcher) {
 public ReferenceCountedACLCache getReferenceCountedAclCache() {
 return aclCache;
 }
+
+/**
+ * Add the digest to the historical list, and update the latest zxid 
digest.
+ */
+private void logZxidDigest(long zxid, long digest) {
+ZxidDigest zxidDigest = new ZxidDigest(zxid, 
DigestCalculator.DIGEST_VERSION, digest);
+lastProcessedZxidDigest = zxidDigest;
+if (zxidDigest.zxid % 128 == 0) {
--- End diff --

Can you explain this magic value '128'? Maybe a comment would help.

Maybe I am missing something
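For context on the question above, the check records only every 128th zxid's digest into a bounded history. A standalone sketch of that sampling pattern (class and constant names are illustrative, and the limit is shrunk for demonstration):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the sampling in logZxidDigest (illustrative names): only every
// SAMPLE_INTERVAL-th zxid gets its digest recorded, and the oldest entry is
// evicted once the history exceeds LIMIT, so the log stays small while still
// covering a long zxid range.
public class DigestHistory {
    static final int SAMPLE_INTERVAL = 128; // the "magic value" under review
    static final int LIMIT = 2;             // tiny limit, for illustration only

    final Deque<long[]> log = new ArrayDeque<>(); // entries are {zxid, digest}

    void record(long zxid, long digest) {
        if (zxid % SAMPLE_INTERVAL == 0) {
            log.add(new long[] {zxid, digest});
            if (log.size() > LIMIT) {
                log.poll(); // drop the oldest sampled digest
            }
        }
    }

    public static void main(String[] args) {
        DigestHistory h = new DigestHistory();
        for (long zxid = 0; zxid <= 384; zxid++) {
            h.record(zxid, zxid * 31); // digest value is just a stand-in
        }
        // zxids 0, 128, 256, 384 were sampled; 0 and 128 were then evicted.
        System.out.println(h.log.size());    // 2
        System.out.println(h.log.peek()[0]); // 256
    }
}
```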


---


[jira] [Commented] (ZOOKEEPER-3109) Avoid long unavailable time due to voter changed mind when activating the leader during election

2018-10-06 Thread Lasaro Camargos (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640701#comment-16640701
 ] 

Lasaro Camargos commented on ZOOKEEPER-3109:


I believe I've seen this problem in 3.4.10: it caused a leader never to
be elected after the original leader got disconnected. Only after I
increased the maximum connection time was a new leader elected.
I've tried to reproduce the issue but wasn't successful.
Lásaro





> Avoid long unavailable time due to voter changed mind when activating the 
> leader during election
> 
>
> Key: ZOOKEEPER-3109
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3109
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum, server
>Affects Versions: 3.6.0
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Occasionally we'll find that it takes a long time to elect a leader, possibly 
> longer than 1 minute, depending on how large initLimit and tickTime are set.
>   
>  This exposes an issue in the leader election protocol. During leader election, 
> before a voter goes to the LEADING/FOLLOWING state, it will wait for a 
> finalizeWait time before changing its state. Depending on the order of 
> notifications, some voter might change its mind just after voting for a 
> server. If the server it was previously voting for holds a majority of votes 
> after counting this one, that server will go to the LEADING state. In some 
> corner cases, the leader may then time out waiting for the epoch ACK from a 
> majority, because of the voter that changed its mind. This usually happens when 
> there is an even number of servers in the ensemble (either because one of the 
> servers is down, or it is being restarted and takes a long time to restart). If 
> there are 5 servers in the ensemble, we'll find two of them in 
> LEADING/FOLLOWING state and another two in LOOKING state, but the LOOKING 
> servers cannot join the quorum, since they are waiting for a majority of 
> servers to be FOLLOWING the current leader before changing to FOLLOWING as 
> well.
>   
>  As far as we know, a voter will change its mind if it receives a vote from 
> another host which has just started and begun voting for itself, or if a 
> server takes a long time to shut down its previous ZK server and starts voting 
> for itself when entering the leader election process.
>   
>  Also, the follower may abandon the leader if the leader is not ready to 
> accept learner connections when the follower tries to connect to it.
>   
>  To solve this issue, there are multiple options: 
> 1. increase the finalizeWait time
> 2. smartly detect this state on the leader and quit earlier
>  
>  The 1st option is straightforward and easy to change, but it will cause 
> longer leader election times in common cases.
>   
>  The 2nd option is more complex, but it can efficiently solve the problem 
> without sacrificing performance in common cases. The leader remembers the 
> first majority of servers voting for it and checks whether any of them 
> changed their mind while it waits for the epoch ACK. The leader will wait for 
> some time before quitting the LEADING state, since a single changed voter may 
> not be a problem if a majority of voters still vote for it.
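The 2nd option's core check can be sketched roughly as follows; all names are hypothetical and this is not the actual patch, just the "remember the first majority, then re-count" idea:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the changed-mind detection (hypothetical names): the prospective
// leader remembers the voters that formed its first majority, and while
// waiting for the epoch ACK it checks whether enough of them changed their
// vote to cost it the majority.
public class ChangedMindCheck {
    static boolean stillHasMajority(long leaderId,
                                    Set<Long> firstMajority,
                                    Map<Long, Long> currentVotes,
                                    int ensembleSize) {
        int supporters = 0;
        for (long voter : firstMajority) {
            // Count only voters from the original majority that still vote
            // for this leader.
            if (currentVotes.getOrDefault(voter, -1L) == leaderId) {
                supporters++;
            }
        }
        return supporters > ensembleSize / 2;
    }

    public static void main(String[] args) {
        Set<Long> firstMajority = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Map<Long, Long> votes = new HashMap<>();
        votes.put(1L, 1L);
        votes.put(2L, 1L);
        votes.put(3L, 1L);
        System.out.println(stillHasMajority(1L, firstMajority, votes, 5)); // true
        votes.put(3L, 4L); // voter 3 changed its mind
        System.out.println(stillHasMajority(1L, firstMajority, votes, 5)); // false
    }
}
```

A single defector (3 of 5 minus 1) already drops the count to 2, which is no longer a majority, so the leader would quit LEADING early instead of timing out.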



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ZOOKEEPER-2844) Zookeeper auto purge process does not purge files

2018-10-06 Thread maoling (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640676#comment-16640676
 ] 

maoling edited comment on ZOOKEEPER-2844 at 10/6/18 10:44 AM:
--

[~astei...@varonis.com], [~timkrueger]
Could you please provide some more details about what happened on your Windows 
server?
Have you made any progress?


was (Author: maoling):
[~astei...@varonis.com][~timkrueger]
Could you plz provide some more clues?

> Zookeeper auto purge process does not purge files
> -
>
> Key: ZOOKEEPER-2844
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2844
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
> Environment: Windows Server 2008 R2
>Reporter: Avi Steiner
>Priority: Major
> Attachments: ZK.zip
>
>
> I'm using Zookeeper 3.4.6
> 
> The ZK log data folder keeps growing with transaction logs files (log.*).
> 
> I set the following in zoo.cfg:
> autopurge.purgeInterval=1
> autopurge.snapRetainCount=3
> dataDir=..\\data
> 
> Per ZK log, it reads those parameters:
> 
> 2017-07-13 10:36:21,266 [myid:] - INFO  [main:DatadirCleanupManager@78] - 
> autopurge.snapRetainCount set to 3
> 2017-07-13 10:36:21,266 [myid:] - INFO  [main:DatadirCleanupManager@79] - 
> autopurge.purgeInterval set to 1
> 
> It also says that cleanup process is running:
> 
> 2017-07-13 10:36:21,266 [myid:] - INFO  
> [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
> 2017-07-13 10:36:21,297 [myid:] - INFO  
> [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
> 
> But actually nothing is deleted.
> Every service restart, a new file is created.
> 
> The only parameter I managed to change is preAllocSize, which sets the 
> minimum size per file. The default is 64MB. I changed it to 10KB just for 
> testing, and I saw the effect as expected: new files were created at 10KB.
> I also tried to create a batch file that will run the following:
> java -cp 
> zookeeper-3.4.6.jar;lib/slf4j-api-1.6.1.jar;lib/slf4j-log4j12-1.6.1.jar;lib/log4j-1.2.16.jar;conf
>  org.apache.zookeeper.server.PurgeTxnLog .\data -n 3
> But it still doesn't do the job.
> Please advise.
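For reference, the retention rule autopurge is documented to apply can be sketched as below (a simplified stand-in, not the actual PurgeTxnLog code): keep the newest snapRetainCount snapshots, and only files older than the oldest retained snapshot become eligible for deletion. If no snapshot older than the retained set exists, nothing is deleted and the purge task legitimately completes without removing anything.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified sketch of the autopurge retention rule (illustrative, not the
// actual PurgeTxnLog implementation): keep the `retainCount` newest
// snapshots; any snapshot or txn log older than the oldest retained
// snapshot's zxid may be deleted.
public class PurgeSketch {
    // Returns the zxid threshold below which files may be deleted, or
    // Long.MIN_VALUE when there are not enough snapshots to purge anything.
    static long purgeThreshold(List<Long> snapshotZxids, int retainCount) {
        if (snapshotZxids.size() <= retainCount) {
            return Long.MIN_VALUE; // nothing is eligible for deletion
        }
        List<Long> sorted = new ArrayList<>(snapshotZxids);
        sorted.sort(Collections.reverseOrder());
        return sorted.get(retainCount - 1); // zxid of the oldest retained snapshot
    }

    public static void main(String[] args) {
        List<Long> snaps = new ArrayList<>(List.of(100L, 200L, 300L, 400L));
        // Keep 400, 300, 200; only files older than zxid 200 may go.
        System.out.println(purgeThreshold(snaps, 3)); // 200
    }
}
```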



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2844) Zookeeper auto purge process does not purge files

2018-10-06 Thread maoling (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640676#comment-16640676
 ] 

maoling commented on ZOOKEEPER-2844:


[~astei...@varonis.com] [~timkrueger]
Could you please provide some more details?




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Success: ZOOKEEPER- PreCommit Build #2376

2018-10-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 82.17 MB...]
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] 
 [exec] Error: No value specified for option "issue"
 [exec] Session logged out. Session was 
JSESSIONID=C89B2FC4BD5DFEE55F40E7CB2215FAF2.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 24 minutes 18 seconds
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-3125
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Adding one-line test results to commit status...
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting status of 1d1d50c3b7c8eac97d5d7ff83c9839e1d789d0cb to SUCCESS with url 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376/ and 
message: 'SUCCESS 
 1722 tests run, 3 skipped, 0 failed.'
Using context: Jenkins

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376/

Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
All tests passed

[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-10-06 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/zookeeper/pull/647
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376/



---


[GitHub] zookeeper issue #632: [ZOOKEEPER-3150] Add tree digest check and verify data...

2018-10-06 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/zookeeper/pull/632
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2375/



---


Failed: ZOOKEEPER- PreCommit Build #2375

2018-10-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2375/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 76.57 MB...]
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2375//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2375//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2375//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] 
 [exec] Error: No value specified for option "issue"
 [exec] Session logged out. Session was 
JSESSIONID=522F58EB8CA26E6301ABC0BF9D3AD120.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1953:
 exec returned: 1

Total time: 11 minutes 58 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-3150
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Adding one-line test results to commit status...
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting status of e040eae608452ff3fa73840152013b329ff95e7c to FAILURE with url 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2375/ and 
message: 'FAILURE
 1765 tests run, 1 skipped, 1 failed.'
Using context: Jenkins

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2375/

Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.zookeeper.server.quorum.FuzzySnapshotRelatedTest.testPZxidUpdatedWhenLoadingSnapshot

Error Message:
KeeperErrorCode = ConnectionLoss for /testPZxidUpdatedDuringTakingSnapshot

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /testPZxidUpdatedDuringTakingSnapshot
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2046)
at org.apache.zookeeper.server.quorum.FuzzySnapshotRelatedTest.compareStat(FuzzySnapshotRelatedTest.java:260)
at org.apache.zookeeper.server.quorum.FuzzySnapshotRelatedTest.testPZxidUpdatedWhenLoadingSnapshot(FuzzySnapshotRelatedTest.java:235)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)

[GitHub] zookeeper issue #632: [ZOOKEEPER-3150] Add tree digest check and verify data...

2018-10-06 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/632
  
This is a really useful feature, which has helped us find multiple data 
inconsistency issues, such as ZOOKEEPER-3144, ZOOKEEPER-3127, and ZOOKEEPER-3125. 

It can also help avoid introducing new inconsistency bugs into ZooKeeper in the 
future, so please take a look when you have time. I'll introduce the 2nd part 
after this gets reviewed and merged.

For performance, we saw only a very minor impact; I'll provide the 
micro-benchmark results.
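
The core idea behind such a tree digest check is to keep a running digest over all znode paths and data, so that two servers can cheaply compare whole-tree state. The sketch below is purely illustrative (the class and method names `TreeDigest`, `put`, `remove`, and `getDigest` are hypothetical, not the actual ZooKeeper API from this PR): by folding per-node hashes into the digest with addition, the digest is independent of update order and can be maintained incrementally, with removals subtracting the same value back out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an incrementally maintained, order-independent
// digest over a path -> data map, in the spirit of the tree digest check
// discussed for ZOOKEEPER-3150. Not the real ZooKeeper implementation.
class TreeDigest {
    private long digest = 0L;
    private final Map<String, Long> nodeHashes = new HashMap<>();

    // Hash a single node from its path and data; any stable hash works here.
    private static long nodeHash(String path, byte[] data) {
        long h = 1125899906842597L; // arbitrary large odd seed
        for (int i = 0; i < path.length(); i++) h = 31 * h + path.charAt(i);
        for (byte b : data) h = 31 * h + b;
        return h;
    }

    // Adding or updating a node folds its hash into the running digest.
    // Because the fold is addition, the final digest does not depend on
    // the order in which nodes were applied.
    public void put(String path, byte[] data) {
        remove(path); // drop any previous hash for this path first
        long h = nodeHash(path, data);
        nodeHashes.put(path, h);
        digest += h;
    }

    // Removing a node subtracts its previously recorded hash back out.
    public void remove(String path) {
        Long old = nodeHashes.remove(path);
        if (old != null) digest -= old;
    }

    public long getDigest() { return digest; }
}
```

Two replicas that applied the same set of updates, in any order, end up with the same digest; a divergence in any node's data shows up as a digest mismatch, which is what makes this kind of check useful for catching inconsistency bugs like the ones cited above.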


---


Success: ZOOKEEPER- PreCommit Build #2374

2018-10-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2374/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 79.81 MB...]
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2374//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2374//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2374//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] 
 [exec] Error: No value specified for option "issue"
 [exec] Session logged out. Session was JSESSIONID=000DA200F798C7F3CB313A1BB097C711.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file

BUILD SUCCESSFUL
Total time: 18 minutes 10 seconds
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-3114
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Adding one-line test results to commit status...
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting status of 34673d9889a33aab11dc8686c79501547dc40847 to SUCCESS with url https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2374/ and message: 'SUCCESS 
 1737 tests run, 1 skipped, 0 failed.'
Using context: Jenkins

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2374/

Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
All tests passed

[GitHub] zookeeper issue #632: [ZOOKEEPER-3150] Add tree digest check and verify data...

2018-10-06 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/zookeeper/pull/632
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2374/



---