[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

Patrick Hunt (JIRA) Wed, 05 Aug 2009 10:01:38 -0700

    [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739609#action_12739609
 ]


Patrick Hunt commented on ZOOKEEPER-498:
----------------------------------------

It looks to me like zero weight is still broken. FLEZeroWeightTest is actually 
failing on my machine, but it is reported as a success:
------------- Standard Error -----------------
Exception in thread "Thread-108" junit.framework.AssertionFailedError: Elected zero-weight server
    at junit.framework.Assert.fail(Assert.java:47)
    at org.apache.zookeeper.test.FLEZeroWeightTest$LEThread.run(FLEZeroWeightTest.java:138)
------------- ---------------- ---------------

This is probably because the test calls assert in a thread other than the main 
test thread, which JUnit does not track or know about.
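
To illustrate the kind of fix this would need, here is a minimal sketch (not 
the actual test code) of capturing a failure raised on a worker thread and 
rethrowing it on the calling thread, where the test runner can see it. The 
class name ThreadFailureCapture and the method runAndPropagate are hypothetical 
names chosen for this example, and it uses only the JDK (with Java 8 lambdas) 
rather than JUnit's own classes.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of ZooKeeper or JUnit.
public class ThreadFailureCapture {

    /**
     * Runs body on a new thread, waits up to timeoutMs for it to finish,
     * and rethrows on the calling thread anything the body threw.
     */
    static void runAndPropagate(Runnable body, long timeoutMs) {
        // Holds the first Throwable raised on the worker thread, if any.
        AtomicReference<Throwable> failure = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            try {
                body.run();
            } catch (Throwable t) {
                // Record the failure instead of letting it die with the thread.
                failure.set(t);
            }
        });

        worker.start();
        try {
            worker.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("interrupted while waiting for worker", e);
        }

        if (worker.isAlive()) {
            throw new AssertionError(
                "worker thread did not finish within " + timeoutMs + " ms");
        }

        Throwable t = failure.get();
        if (t != null) {
            // Surface the worker's failure on the thread the runner tracks.
            throw new AssertionError("worker thread failed: " + t.getMessage(), t);
        }
    }
}
```

A test would wrap the body of each election thread this way (or have the 
thread store its failure in a field that the test method checks after join), 
so that an "Elected zero-weight server" failure fails the test instead of only 
appearing on standard error.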

One problem I see with these tests (the zero-weight test is the one I looked 
at): they don't have a client attempt to connect to the various servers as part 
of declaring success. Really we should only consider a test successful (i.e. 
assert that) if a client can connect to each server in the cluster and both 
make and see changes. As part of fixing this we really need to do a sanity 
check by testing the various command lines and checking that a client can 
connect.

I'm not even sure FLENewEpochTest, FLETest, etc. are passing either. The new 
epoch test seems to just thrash...

I also tried 3- and 5-server quorums by hand from the command line with zero 
weight, and they show issues similar to what Todd is seeing.

This is happening for me on both the trunk and the 3.2 branch source.

> Unending Leader Elections : WAN configuration
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-498
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection
>    Affects Versions: 3.2.0
>         Environment: Each machine:
> CentOS 5.2 64-bit
> 2GB ram
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
> Network Topology:
> DC : central data center
> POD(N): remote data center
> Zookeeper Topology:
> Leaders may be elected only in DC (weight = 1)
> Only followers are elected in PODS (weight = 0)
>            Reporter: Todd Greenwood-Geer
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.2.1, 3.3.0
>
>         Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zoo.cfg
>
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
> central DC group of ZK servers that have a voting weight = 1, and a group of 
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to 
> contain only followers. What we are seeing is a continuous cycling of 
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
> patches (473, 479, 481, 491), and now release 3.2.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
