ZooKeeper_branch34_jdk8 - Build # 1570 - Still Failing

2018-10-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_jdk8/1570/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 43.60 KB...]
[junit] Running org.apache.zookeeper.test.RestoreCommittedLogTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.749 sec
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedClientTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.99 sec
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedServerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.778 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailDesignatedClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.733 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailNotifyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.686 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.775 sec
[junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.691 sec
[junit] Running org.apache.zookeeper.test.SaslClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.09 sec
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.831 sec
[junit] Running org.apache.zookeeper.test.SessionTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.747 sec
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.987 sec
[junit] Running org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.993 sec
[junit] Running org.apache.zookeeper.test.StatTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.1 sec
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.769 sec
[junit] Running org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.801 sec
[junit] Running org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.603 sec
[junit] Running org.apache.zookeeper.test.UpgradeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.038 sec
[junit] Running org.apache.zookeeper.test.WatchedEventTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.116 sec
[junit] Running org.apache.zookeeper.test.WatcherFuncTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.677 sec
[junit] Running org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 28.662 sec
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.779 sec
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.859 sec
[junit] Running org.apache.jute.BinaryInputArchiveTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.094 sec

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk8/build.xml:1467: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk8/build.xml:1470: Tests failed!

Total time: 41 minutes 33 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
17 tests failed.
FAILED:  org.apache.zookeeper.ZooKeeperTest.testDeleteRecursiveAsync

Error Message:
Address already in use

Stack Trace:
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
at org.apache.z

ZooKeeper two issues review

2018-10-21 Thread 田毅群
Hi all,
I opened Jira issues to contribute code to ZooKeeper, and I was asked to follow 
the process for new issues, so I am first sending this email to describe my two issues.

First one:
Jira:  https://issues.apache.org/jira/browse/ZOOKEEPER-3167
Purpose:  Add an API to get the total count of all recursive sub-nodes of a node
Description:

1. In production environments, a node often has a large number of recursive 
sub-nodes, and we need to count how many there are in total. For example, we 
want to count all the sub-nodes of nodeA:

[Image: example tree of sub-nodes under nodeA]

2. Currently, the only option is the getChildren API, which returns the list of 
first-level sub-nodes (so we can only get the nodeB list directly). To find all 
recursive sub-nodes we have to call it again on every sub-node, which takes a lot of time.

3. On the server side, ZooKeeper stores nodes in a hash map whose key is the 
node's path. We can iterate over the map to get the total number of sub-nodes, 
at all levels, of a given node.
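To compare the two approaches, here is a minimal Java sketch. A flat `Map` keyed by full path stands in for the server's path-to-node map; `childrenOf` simulates what `getChildren(path, false)` returns (child names, one level deep), and `SubtreeCount` and its methods are hypothetical names, not ZooKeeper APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper class; none of these methods exist in ZooKeeper.
final class SubtreeCount {

    // Simulates getChildren(path, false) on a flat path-keyed map:
    // returns child names (not full paths), one level deep.
    static List<String> childrenOf(Map<String, ?> nodes, String path) {
        String prefix = path.equals("/") ? "/" : path + "/";
        List<String> children = new ArrayList<>();
        for (String key : nodes.keySet()) {
            if (key.startsWith(prefix)) {
                String rest = key.substring(prefix.length());
                if (!rest.isEmpty() && rest.indexOf('/') < 0) {
                    children.add(rest);
                }
            }
        }
        return children;
    }

    // Current client-side approach: one getChildren-style call per visited node.
    static int countByTraversal(String root, Function<String, List<String>> listChildren) {
        int count = 0;
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String path = pending.pop();
            for (String child : listChildren.apply(path)) {
                count++;
                pending.push(path.equals("/") ? "/" + child : path + "/" + child);
            }
        }
        return count;
    }

    // Proposed server-side approach: one pass over the map's keys.
    // Every descendant of root has a path that starts with root + "/".
    static int countByPrefixScan(String root, Map<String, ?> nodes) {
        String prefix = root.equals("/") ? "/" : root + "/";
        int count = 0;
        for (String path : nodes.keySet()) {
            if (path.startsWith(prefix) && !path.equals(root)) {
                count++;
            }
        }
        return count;
    }
}
```

With paths `/nodeA`, `/nodeA/b1`, `/nodeA/b2` and `/nodeA/b1/c1`, both methods count 3 descendants of `/nodeA`. The trade-off is that the traversal costs one client round trip per visited node, while the prefix scan is a single request but touches every node in the tree on the server.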


Second One:
Jira:  https://issues.apache.org/jira/browse/ZOOKEEPER-3168
Purpose:  Reduce session revalidation time after zxid rollover
Description:

1. Sometimes a ZooKeeper cluster receives a large number of client connections, 
sometimes more than 10,000. When the zxid rolls over, the clients reconnect and 
revalidate their sessions.

2. In ZooKeeper's design, when a follower receives session revalidation 
requests, it forwards them to the leader, which is responsible for session 
revalidation.

[Image: session revalidation flow from LearnerZooKeeperServer to LeaderZooKeeperServer]
When a LearnerZooKeeperServer receives a reconnection, it sends a revalidation 
request to the LeaderZooKeeperServer, which therefore comes under heavy load.

3. The leader has to handle a large number of requests in a short time. 
Statistics I collected with a tool show that some clients had to wait more than 
20 seconds, which is too long for some special clients, such as ResourceManager.

4. My proposed approach: when a zxid rollover happens, the leader records the 
exact time. When the re-election finishes, all servers learn the rollover time. 
When clients reconnect and revalidate their sessions, every server can then make 
the revalidation decision locally. This would take much of the load off the 
cluster, and all clients would wait less time.
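To make the proposal concrete, here is a Java sketch. The zxid helpers follow ZooKeeper's 64-bit zxid layout (leader epoch in the high 32 bits, per-epoch counter in the low 32 bits), where "rollover" means the counter running out. `canRevalidateLocally` is a hypothetical rule reading point 4 as: a server may accept a reconnecting session on its own if the reconnect happens within the session timeout of the recorded rollover time. All names are illustrative, and a real implementation would also have to confirm the session was live when the rollover was recorded.

```java
// Illustrative sketch for ZOOKEEPER-3168; not existing ZooKeeper code.
final class ZxidRollover {

    // High 32 bits: epoch of the leader that issued the zxid.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits: counter of proposals within that epoch.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    // The counter is exhausted when its 32 bits are all ones; a new epoch
    // (and therefore a new leader election) is needed to keep going.
    static boolean counterExhausted(long zxid) {
        return counterOf(zxid) == 0xffffffffL;
    }

    // Hypothetical rule from the proposal: a session that reconnects within
    // its timeout after the recorded rollover time is accepted locally,
    // without forwarding a revalidation request to the leader.
    static boolean canRevalidateLocally(long rolloverTimeMs, long reconnectTimeMs, int sessionTimeoutMs) {
        long elapsed = reconnectTimeMs - rolloverTimeMs;
        return elapsed >= 0 && elapsed < sessionTimeoutMs;
    }
}
```

Under this reading, only sessions that reconnect after their timeout window would still be forwarded to the leader, which is where the reduced load would come from.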


These are my two issues. Please help review whether the proposed solutions are 
right. Thank you very much.
田毅群
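Technology Product Center, Cloud Platform
iQIYI
QIYI.com, Inc.
Address: 6F, iQIYI Innovation Building, 365 Linhong Road, Changning District, Shanghai
Postal code: 201103
Mobile: +86 157 2140 1256
Email: tianyi...@qiyi.com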



ZooKeeper_branch35_jdk8 - Build # 1164 - Failure

2018-10-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk8/1164/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 63.83 KB...]
[junit] Running org.apache.zookeeper.server.quorum.ReconfigDuringLeaderSyncTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.221 sec, Thread: 3, Class: org.apache.zookeeper.server.quorum.ReconfigDuringLeaderSyncTest
[junit] Running org.apache.zookeeper.server.quorum.ReconfigFailureCasesTest in thread 3
[junit] Tests run: 30, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 117.431 sec, Thread: 4, Class: org.apache.zookeeper.server.quorum.QuorumRequestPipelineTest
[junit] Running org.apache.zookeeper.server.quorum.ReconfigLegacyTest in thread 4
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 63.131 sec, Thread: 3, Class: org.apache.zookeeper.server.quorum.ReconfigFailureCasesTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.331 sec, Thread: 4, Class: org.apache.zookeeper.server.quorum.ReconfigLegacyTest
[junit] Running org.apache.zookeeper.server.quorum.ReconfigRecoveryTest in thread 4
[junit] Running org.apache.zookeeper.server.quorum.ReconfigRollingRestartCompatibilityTest in thread 3
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 25.802 sec, Thread: 3, Class: org.apache.zookeeper.server.quorum.ReconfigRollingRestartCompatibilityTest
[junit] Running org.apache.zookeeper.server.quorum.RemotePeerBeanTest in thread 3
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.148 sec, Thread: 3, Class: org.apache.zookeeper.server.quorum.RemotePeerBeanTest
[junit] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 216.335 sec, Thread: 1, Class: org.apache.zookeeper.server.quorum.QuorumPeerMainTest
[junit] Running org.apache.zookeeper.server.quorum.StandaloneDisabledTest in thread 3
[junit] Running org.apache.zookeeper.server.quorum.StatCommandTest in thread 1
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.068 sec, Thread: 1, Class: org.apache.zookeeper.server.quorum.StatCommandTest
[junit] Running org.apache.zookeeper.server.quorum.StatResetCommandTest in thread 1
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.12 sec, Thread: 1, Class: org.apache.zookeeper.server.quorum.StatResetCommandTest
[junit] Running org.apache.zookeeper.server.quorum.UnifiedServerSocketTest in thread 1
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.451 sec, Thread: 1, Class: org.apache.zookeeper.server.quorum.UnifiedServerSocketTest
[junit] Running org.apache.zookeeper.server.quorum.WatchLeakTest in thread 1
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.922 sec, Thread: 1, Class: org.apache.zookeeper.server.quorum.WatchLeakTest
[junit] Running org.apache.zookeeper.server.quorum.Zab1_0Test in thread 1
[junit] Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.913 sec, Thread: 1, Class: org.apache.zookeeper.server.quorum.Zab1_0Test
[junit] Test org.apache.zookeeper.server.quorum.Zab1_0Test FAILED
[junit] Running org.apache.zookeeper.server.quorum.auth.MiniKdcTest in thread 1
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.343 sec, Thread: 1, Class: org.apache.zookeeper.server.quorum.auth.MiniKdcTest
[junit] Running org.apache.zookeeper.server.quorum.auth.QuorumAuthUpgradeTest in thread 1
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 99.429 sec, Thread: 3, Class: org.apache.zookeeper.server.quorum.StandaloneDisabledTest
[junit] Running org.apache.zookeeper.server.quorum.auth.QuorumDigestAuthTest in thread 3
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 43.646 sec, Thread: 1, Class: org.apache.zookeeper.server.quorum.auth.QuorumAuthUpgradeTest
[junit] Running org.apache.zookeeper.server.quorum.auth.QuorumKerberosAuthTest in thread 1
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.355 sec, Thread: 3, Class: org.apache.zookeeper.server.quorum.auth.QuorumDigestAuthTest
[junit] Running org.apache.zookeeper.server.quorum.auth.QuorumKerberosHostBasedAuthTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.43 sec, Thread: 1, Class: org.apache.zookeeper.server.quorum.auth.QuorumKerberosAuthTest
[junit] Running org.apache.zookeeper.server.util.SerializeUtilsTest in thread 1
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.573 sec, Thread: 1, Class: org.apache.zookeeper.server.util.SerializeUtilsTest
[junit] Running org.apache.zookeeper.server.util.VerifyingFil

[GitHub] zookeeper issue #665: [ZOOKEEPER-3163] Use session map in the Netty to impro...

2018-10-21 Thread maoling
Github user maoling commented on the issue:

https://github.com/apache/zookeeper/pull/665
  
- Very useful improvement. `closeSession` can really run into a performance 
issue when there are thousands of clients.
- This issue is very similar to 
[ZOOKEEPER-1669](https://issues.apache.org/jira/browse/ZOOKEEPER-1669) 
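The idea behind a session map can be sketched in Java: instead of scanning every open connection to find the one that owns a session id, the factory keeps a map from session id to connection. `SessionLookupSketch` and its nested `Connection` class are hypothetical stand-ins, not the classes touched by the PR.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; not ZooKeeper's NettyServerCnxnFactory.
final class SessionLookupSketch {

    // Stand-in for a server-side connection bound to one session.
    static final class Connection {
        final long sessionId;
        volatile boolean closed;

        Connection(long sessionId) {
            this.sessionId = sessionId;
        }

        void close() {
            closed = true;
        }
    }

    // Without a map: closeSession scans all connections, O(n) per call.
    static boolean closeByScan(List<Connection> connections, long sessionId) {
        for (Connection c : connections) {
            if (c.sessionId == sessionId) {
                c.close();
                return true;
            }
        }
        return false;
    }

    // With a session map: average O(1) lookup, safe for concurrent access.
    private final ConcurrentMap<Long, Connection> bySession = new ConcurrentHashMap<>();

    void register(Connection c) {
        bySession.put(c.sessionId, c);
    }

    boolean closeByMap(long sessionId) {
        Connection c = bySession.remove(sessionId);
        if (c == null) {
            return false;
        }
        c.close();
        return true;
    }
}
```

With thousands of clients, each scan walks the whole connection list, so closing many sessions at once becomes quadratic; the map keeps each close independent of the number of connections, at the cost of keeping the map in sync when connections are registered and removed.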


---


Failed: ZOOKEEPER- PreCommit Build #2481

2018-10-21 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2481/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 85.19 MB...]
 [exec] 
==
 [exec] 
 [exec] 
 [exec] 
 [exec] Error: No value specified for option "issue"
 [exec] Session logged out. Session was JSESSIONID=055C1F04B20977CCF0CDFC8D5C89136F.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1839: exec returned: 1

Total time: 21 minutes 44 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
[description-setter] Description set: ZOOKEEPER-3155
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Adding one-line test results to commit status...
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting status of 9680f29f998197d3d1b66771e83736a84d643722 to FAILURE with url 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2481/ and 
message: 'FAILURE
 1802 tests run, 2 skipped, 0 failed.'
Using context: Jenkins

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2481/

Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/



###
## FAILED TESTS (if any) 
##
All tests passed

[GitHub] zookeeper issue #671: ZOOKEEPER-3155: Remove Forrest XMLs and their build pr...

2018-10-21 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/zookeeper/pull/671
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2481/



---


[jira] [Resolved] (ZOOKEEPER-2082) Mistype of electionAlgo can fill out your disk in minutes

2018-10-21 Thread maoling (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maoling resolved ZOOKEEPER-2082.

Resolution: Fixed

> Mistype of electionAlgo can fill out your disk in minutes
> -
>
> Key: ZOOKEEPER-2082
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2082
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.4.6
> Environment: Cluster (multi-server) setup
>Reporter: Tianyin Xu
>Priority: Minor
>
> The parameter, electionAlgo, is supposed to be 0-3. However, when I mistyped 
> the value in my zoo.cfg (I'm stupid), ZK falls into an endless loop and starts 
> filling up the entire disk with millions of the following 2 lines...
> 2014-11-14 14:28:44,588 [myid:3] - INFO  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:QuorumPeer@714] - LOOKING
> 2014-11-14 14:28:44,588 [myid:3] - WARN  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:QuorumPeer@764] - Unexpected exception
> java.lang.NullPointerException
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
> The error is rooted in createElectionAlgorithm(), where an invalid setting 
> leads to a null Election object. Then the while loop in run() dereferences 
> that null pointer; the exception is caught but not handled well.
> I think we should check the electionAlg setting at the very beginning to 
> make sure it's valid, instead of using it at runtime and causing these 
> unfortunate problems.
> Let me know if you want a patch. I'd like to add the check in the 
> parseProperties() function in QuorumPeerConfig.java.
> Thanks!
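The parse-time check the reporter suggests can be sketched as follows. This is a hypothetical helper, not the committed fix: it reads the `electionAlg` key from a `java.util.Properties` object, assumes a default of 3 when the key is absent, uses the 0 to 3 range stated in the report, and throws `IllegalArgumentException` as a stand-in for whatever configuration exception the real parser raises.

```java
import java.util.Properties;

// Hypothetical validation helper; not QuorumPeerConfig's actual code.
final class ElectionAlgCheck {

    // Valid range as described in the issue report.
    static final int MIN_ELECTION_ALG = 0;
    static final int MAX_ELECTION_ALG = 3;

    // Rejects a bad value while the config is parsed, so the server fails
    // fast at startup instead of looping inside QuorumPeer.run().
    static int parseElectionAlg(Properties props) {
        String raw = props.getProperty("electionAlg", "3").trim(); // assumed default
        int value;
        try {
            value = Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("electionAlg must be an integer, got: " + raw, e);
        }
        if (value < MIN_ELECTION_ALG || value > MAX_ELECTION_ALG) {
            throw new IllegalArgumentException("electionAlg must be between "
                    + MIN_ELECTION_ALG + " and " + MAX_ELECTION_ALG + ", got: " + value);
        }
        return value;
    }
}
```

Failing at parse time turns a disk-filling runtime loop into a single, readable startup error.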



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper issue #671: ZOOKEEPER-3155: Remove Forrest XMLs and their build pr...

2018-10-21 Thread tamaashu
Github user tamaashu commented on the issue:

https://github.com/apache/zookeeper/pull/671
  
rebased after last commit


---


[jira] [Assigned] (ZOOKEEPER-2778) Potential server deadlock between follower sync with leader and follower receiving external connection requests.

2018-10-21 Thread maoling (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maoling reassigned ZOOKEEPER-2778:
--

Assignee: maoling  (was: Michael Han)

> Potential server deadlock between follower sync with leader and follower 
> receiving external connection requests.
> 
>
> Key: ZOOKEEPER-2778
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2778
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.5.3
>Reporter: Michael Han
>Assignee: maoling
>Priority: Critical
>
> It's possible to have a deadlock during the recovery phase. 
> Found this issue by analyzing thread dumps of the "flaky" ReconfigRecoveryTest 
> [1]. Here is a sample thread dump that illustrates the state of the 
> execution:
> {noformat}
> [junit]  java.lang.Thread.State: BLOCKED
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.getElectionAddress(QuorumPeer.java:686)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:265)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:445)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:369)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:642)
> [junit] 
> [junit]  java.lang.Thread.State: BLOCKED
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:472)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1438)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1471)
> [junit] at  
> org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:520)
> [junit] at  
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:88)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {noformat}
> The deadlock happens between the quorum peer thread, which is running the 
> follower that is syncing with the leader, and the listener of the qcm 
> (QuorumCnxManager) of the same quorum peer, which is receiving connections. 
> To finish syncing with the leader, the follower needs to synchronize on both 
> QV_LOCK and the qcm object it owns, while the receiver thread, to finish 
> setting up an incoming connection, needs to synchronize on both the qcm object 
> the quorum peer owns and the same QV_LOCK. The problem is that the two locks 
> are acquired in different orders, so depending on timing and the actual 
> execution order, each thread can end up holding one lock while waiting for 
> the other.
> [1] 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig
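The lock-ordering problem can be modelled in a few lines of Java. In this sketch two plain objects stand in for QV_LOCK and the qcm monitor, and the class and method names are hypothetical. Following the description, the sync path takes QV_LOCK and then qcm, while the original listener path took them in the reverse order; the sketch applies one standard remedy, a single global acquisition order, which is not necessarily the fix that was committed for this issue.

```java
// Minimal model of the ZOOKEEPER-2778 lock-ordering problem; names are hypothetical.
final class LockOrderSketch {
    private final Object qvLock = new Object();  // stand-in for QV_LOCK
    private final Object qcmLock = new Object(); // stand-in for the qcm monitor
    private int work;                            // guarded by both locks

    // Follower sync path: QV_LOCK first, then qcm.
    void syncWithLeader() {
        synchronized (qvLock) {
            synchronized (qcmLock) {
                work++;
            }
        }
    }

    // Listener path. Taking qcm first and QV_LOCK second (the inverted order)
    // can deadlock against syncWithLeader(); here it uses the same order.
    void receiveConnection() {
        synchronized (qvLock) {
            synchronized (qcmLock) {
                work++;
            }
        }
    }

    // Runs both paths concurrently; returns true if both finish in time.
    static boolean runConcurrently(int iterations, long timeoutMs) {
        LockOrderSketch s = new LockOrderSketch();
        Thread follower = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.syncWithLeader();
        });
        Thread listener = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.receiveConnection();
        });
        follower.setDaemon(true); // never keep the JVM alive if something hangs
        listener.setDaemon(true);
        follower.start();
        listener.start();
        try {
            follower.join(timeoutMs);
            listener.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        // Reading work after both threads have terminated is safe (join).
        return !follower.isAlive() && !listener.isAlive() && s.work == 2 * iterations;
    }
}
```

If `receiveConnection()` is changed to lock `qcmLock` before `qvLock`, the same harness can hang, with each thread blocked on the monitor the other holds, matching the two BLOCKED threads in the dump above.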



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)