[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739609#action_12739609 ]
Patrick Hunt commented on ZOOKEEPER-498:
----------------------------------------

Looks to me like 0 weight is still busted. FLEZeroWeightTest is actually failing on my machine, but it's reported as a success:

------------- Standard Error -----------------
Exception in thread "Thread-108" junit.framework.AssertionFailedError: Elected zero-weight server
    at junit.framework.Assert.fail(Assert.java:47)
    at org.apache.zookeeper.test.FLEZeroWeightTest$LEThread.run(FLEZeroWeightTest.java:138)
------------- ---------------- ---------------

This is probably because the test calls assert in a thread other than the main test thread, which JUnit does not track or know about.

One problem I see with these tests (the 0 weight test is the one I looked at): it doesn't have a client attempt to connect to the various servers as part of declaring success. Really we should only consider a test successful (i.e. assert that) if a client can connect to each server in the cluster and change/see changes. As part of fixing this we really need to do a sanity check by testing the various command lines and checking that a client can connect.

I'm not even sure FLENewEpochTest/FLETest/etc. are passing either. New epoch seems to just thrash...

Also, I tried 3 and 5 server quorums by hand from the command line with 0 weight, and they see issues similar to what Todd is seeing. This is happening for me on both the trunk and the 3.2 branch source.
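To illustrate the point about assertions made off the main test thread: one possible way to make such failures visible is to have each worker thread record what it throws and let the main thread check after joining. This is only a rough sketch, not the actual FLEZeroWeightTest code. The names ThreadFailureCapture, CheckedThread, and runAll are made up for illustration, it uses plain AssertionError instead of JUnit so it stands alone, and it assumes Java 8 or later for the lambdas.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ThreadFailureCapture {
    /** Hypothetical worker: runs a check and records the first Throwable it raises. */
    static final class CheckedThread extends Thread {
        private final Runnable check;
        private final AtomicReference<Throwable> failure;

        CheckedThread(Runnable check, AtomicReference<Throwable> failure) {
            this.check = check;
            this.failure = failure;
        }

        @Override
        public void run() {
            try {
                check.run();
            } catch (Throwable t) {
                // Keep only the first failure; later ones are dropped.
                failure.compareAndSet(null, t);
            }
        }
    }

    /** Starts one thread per check, joins them all, and returns the first failure or null. */
    public static Throwable runAll(Runnable... checks) throws InterruptedException {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread[] threads = new Thread[checks.length];
        for (int i = 0; i < checks.length; i++) {
            threads[i] = new CheckedThread(checks[i], failure);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return failure.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Throwable failure = runAll(
            () -> { /* a check that passes */ },
            () -> { throw new AssertionError("Elected zero-weight server"); });
        // In a real test, the main thread would fail the test here if failure != null.
        System.out.println(failure == null ? "passed" : "failed: " + failure.getMessage());
    }
}
```

In a JUnit test method, the main thread would call runAll (or the equivalent) and then fail the test itself when the returned Throwable is non-null, so the failure is raised on a thread the runner is watching.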

> Unending Leader Elections : WAN configuration
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-498
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection
>    Affects Versions: 3.2.0
>         Environment: Each machine:
> CentOS 5.2 64-bit
> 2GB ram
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed
> Network Topology:
> DC : central data center
> POD(N): remote data center
> Zookeeper Topology:
> Leaders may be elected only in DC (weight = 1)
> Only followers are elected in PODS (weight = 0)
>            Reporter: Todd Greenwood-Geer
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.2.1, 3.3.0
>
>         Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zoo.cfg
>
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a
> central DC group of ZK servers that have a voting weight = 1, and a group of
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to
> contain only followers. What we are seeing is a continuous cycling of
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended
> patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.