[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742287#action_12742287 ]

Hudson commented on ZOOKEEPER-498:

Integrated in ZooKeeper-trunk #413 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/413/]). Unending Leader Elections : WAN configuration (flavio via mahadev)

Unending Leader Elections : WAN configuration
---------------------------------------------

Key: ZOOKEEPER-498
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
Project: Zookeeper
Issue Type: Bug
Components: leaderElection
Affects Versions: 3.2.0
Environment: Each machine:
    CentOS 5.2 64-bit
    2GB ram
    java version 1.6.0_13
    Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
    Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed
  Network Topology:
    DC : central data center
    POD(N): remote data center
  Zookeeper Topology:
    Leaders may be elected only in DC (weight = 1)
    Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Assignee: Flavio Paiva Junqueira
Priority: Critical
Fix For: 3.2.1, 3.3.0
Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zk498-test.tar.gz, zoo.cfg, ZOOKEEPER-498-3.2.patch, ZOOKEEPER-498.patch, ZOOKEEPER-498.patch, ZOOKEEPER-498.patch, ZOOKEEPER-498.patch

In a WAN configuration, ZooKeeper is endlessly electing, terminating, and re-electing a ZooKeeper leader. The WAN configuration involves two groups, a central DC group of ZK servers that have a voting weight = 1, and a group of servers in remote pods with a voting weight of 0.
What we expect to see is leaders elected only in the DC, and the pods to contain only followers. What we are seeing is a continuous cycling of leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended patches (473, 479, 481, 491), and now release 3.2.1.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741526#action_12741526 ]

Patrick Hunt commented on ZOOKEEPER-498:

Looks good to me. I tested out a number of quorum configs (2/3/4/5/7/9) with various weights/groups (granted, not exhaustive), and the quorum always formed. I was able to connect a client, and also tried stopping/starting servers to ensure they rejoin the quorum.

Also, the logging seems better: things that are not errors are no longer logged as errors.
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741529#action_12741529 ]

Patrick Hunt commented on ZOOKEEPER-498:

btw, my testing was on trunk. I'll give the branch a try too and report if I find any issues (otherwise assume it's ok).
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741565#action_12741565 ]

Mahadev konar commented on ZOOKEEPER-498:

flavio, can you upload a patch for 3.2 as well?
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741605#action_12741605 ]

Benjamin Reed commented on ZOOKEEPER-498:

+1 looks good. when setting the stop flags, you should really do an interrupt to wake up the wait, but that will cause a message to be printed to stdout. i'll open another jira to fix that.
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740848#action_12740848 ]

Hadoop QA commented on ZOOKEEPER-498:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12415916/ZOOKEEPER-498.patch
against trunk revision 802188.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/180/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/180/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/180/console

This message is automatically generated.
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740668#action_12740668 ]

Hadoop QA commented on ZOOKEEPER-498:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12415848/ZOOKEEPER-498.patch
against trunk revision 802108.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/177/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/177/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/177/console

This message is automatically generated.
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740720#action_12740720 ]

Patrick Hunt commented on ZOOKEEPER-498:

fwiw: I tested this out on trunk with a few different configurations and it seems to have fixed the issue(s) seen before!

9 servers : OK
9 servers: weight 1:1:1:1:1:0:0:0:0 : OK
9 servers: weight 1:1:1:1:1:0:0:0:0 3 groups 1:2:3:4:5 6:7 8:9 : OK
9 servers: weight 1:1:1:1:1:0:0:0:0 1 group 1:2:3:4:5:6:7:8:9 : OK
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740731#action_12740731 ]

Todd Greenwood-Geer commented on ZOOKEEPER-498:

Patrick,

Thanks for the update. I'm closely following the dev alias and appreciate the effort the ZK team is putting in. For the time being, I'll stick with 3.1.1 and solve our WAN issues with an ensemble synchronizer. I'm in the middle of writing that bit right now.

BTW - Should I succeed in convincing my company to allow me to open source various components that I've written (on top of zookeeper), what is the process for that?

-Todd
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740732#action_12740732 ]

Patrick Hunt commented on ZOOKEEPER-498:

I also tried this (todd's original configuration) and it also works with this patch:

9 servers: weight 1:1:1:1:1:0:0:0:0 2 groups 1:2:3:4:5 6:7:8:9 : OK

server 1 config:

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=./s1/data
    clientPort=2181
    electionAlg=3
    server.1=localhost:3181:4181
    server.2=localhost:3182:4182
    server.3=localhost:3183:4183
    server.4=localhost:3184:4184
    server.5=localhost:3185:4185
    server.6=localhost:3186:4186
    server.7=localhost:3187:4187
    server.8=localhost:3188:4188
    server.9=localhost:3189:4189
    weight.1=1
    weight.2=1
    weight.3=1
    weight.4=1
    weight.5=1
    weight.6=0
    weight.7=0
    weight.8=0
    weight.9=0
    group.1=1:2:3:4:5
    group.2=6:7:8:9
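To make the weight/group settings in the config above easier to reason about, here is a minimal Python sketch (not ZooKeeper's Java implementation) that parses the `weight.N` and `group.N` entries and checks whether a set of acknowledging servers would form a quorum. It assumes the hierarchical-quorum rule as commonly described for ZooKeeper: a strict majority of groups, ignoring groups whose total weight is zero, must each contribute a strict majority of their own weight. The function names `parse_quorum_config` and `contains_quorum`, and the default weight of 1 for unlisted servers, are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def parse_quorum_config(text: str) -> Tuple[Dict[int, int], Dict[int, Set[int]]]:
    """Extract weight.N and group.N entries from zoo.cfg-style text."""
    weights: Dict[int, int] = {}
    groups: Dict[int, Set[int]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith("weight."):
            weights[int(key[len("weight."):])] = int(value)
        elif key.startswith("group."):
            groups[int(key[len("group."):])] = {int(s) for s in value.split(":")}
    return weights, groups


def contains_quorum(acks: Set[int],
                    weights: Dict[int, int],
                    groups: Dict[int, Set[int]]) -> bool:
    """Assumed rule: a strict majority of non-zero-weight groups must each
    contribute a strict majority of their own total weight."""
    voting_groups = 0
    satisfied = 0
    for members in groups.values():
        # Assumption: a server without an explicit weight counts as 1.
        total = sum(weights.get(sid, 1) for sid in members)
        if total == 0:
            continue  # zero-weight groups never count toward the quorum
        voting_groups += 1
        acked = sum(weights.get(sid, 1) for sid in members & acks)
        if 2 * acked > total:
            satisfied += 1
    return voting_groups > 0 and 2 * satisfied > voting_groups


# The weight/group lines from the config quoted in this message.
CONFIG = """
weight.1=1
weight.2=1
weight.3=1
weight.4=1
weight.5=1
weight.6=0
weight.7=0
weight.8=0
weight.9=0
group.1=1:2:3:4:5
group.2=6:7:8:9
"""

WEIGHTS, GROUPS = parse_quorum_config(CONFIG)
```

Under these assumptions, any three of the DC servers 1 to 5 form a quorum, while the four zero-weight pod servers cannot form one on their own or make up for missing DC votes.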
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740736#action_12740736 ]

Patrick Hunt commented on ZOOKEEPER-498:

Todd, we appreciate your help and patience while we straighten this issue out. Thanks!
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740737#action_12740737 ]

Patrick Hunt commented on ZOOKEEPER-498:

Todd, the basic process from our end is that you should enter a JIRA and attach a patch. If you are contributing outside core, then you probably want to add your stuff to src/contrib.

http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740195#action_12740195 ]

Mahadev konar commented on ZOOKEEPER-498:

flavio, can you include the example in the forrest docs? It would be good for folks using it. It gets quite confusing when using flexible quorums. examples/docs should help.

thanks
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739609#action_12739609 ]

Patrick Hunt commented on ZOOKEEPER-498:

Looks to me like 0 weight is still busted. FLEZeroWeightTest is actually failing on my machine, however it's reported as success:

- Standard Error -
Exception in thread Thread-108 junit.framework.AssertionFailedError: Elected zero-weight server
    at junit.framework.Assert.fail(Assert.java:47)
    at org.apache.zookeeper.test.FLEZeroWeightTest$LEThread.run(FLEZeroWeightTest.java:138)
- ---

This is probably because the test is calling assert in a thread other than the main test thread, which junit will not track/know about.

One problem I see with these tests (the 0weight test I looked at): it doesn't have a client attempt to connect to the various servers as part of declaring success. Really we should only consider a test successful (i.e. assert that) if a client can connect to each server in the cluster and change/see changes. As part of fixing this we really need to do a sanity check by testing the various command lines and checking that a client can connect.

I'm not even sure FLEnewepochtest/fletest/etc... are passing either. New epoch seems to just thrash...

Also I tried 3 5 server quorums by hand from the command line with 0 weight and they see similar issues to what Todd is seeing. This is happening for me on both the trunk and 3.2 branch source.
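The failure mode Patrick describes, where an assertion fires in a worker thread and the test still reports success, is not specific to JUnit. The following Python sketch is an analogue of the usual remedy, not the actual ZooKeeper test code: each worker records its exception, and the main test thread joins the workers and re-raises the first failure. `CheckedThread`, `join_and_raise`, and `check_elected` are hypothetical names introduced for illustration.

```python
import threading


class CheckedThread(threading.Thread):
    """Thread that records any exception instead of losing it."""

    def __init__(self, target, *args):
        super().__init__()
        self._check_target = target
        self._check_args = args
        self.error = None

    def run(self):
        try:
            self._check_target(*self._check_args)
        except BaseException as exc:  # includes AssertionError
            self.error = exc


def join_and_raise(threads, timeout=5.0):
    """Join every worker, then re-raise the first recorded failure
    in the calling (main test) thread."""
    for t in threads:
        t.join(timeout)
        if t.is_alive():
            raise AssertionError(f"{t.name} did not finish in time")
    for t in threads:
        if t.error is not None:
            raise t.error


def check_elected(leader_id, weights):
    # Hypothetical stand-in for the election check made inside the test thread.
    assert weights[leader_id] > 0, "Elected zero-weight server"
```

A test would start one `CheckedThread` per simulated server and call `join_and_raise` before declaring success, so a failed check in any worker fails the test as a whole.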
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739787#action_12739787 ]

Patrick Hunt commented on ZOOKEEPER-498:

Please fix the following as well - incorrect logging levels are being used in quorum code, example:

2009-08-05 15:17:02,733 - ERROR [WorkerSender Thread:quorumcnxmana...@341] - There is a connection for server 1
2009-08-05 15:17:02,753 - ERROR [WorkerSender Thread:quorumcnxmana...@341] - There is a connection for server 2

this is INFO, not ERROR
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739789#action_12739789 ]

Patrick Hunt commented on ZOOKEEPER-498:

Todd, I did see an issue with your config. It's not:

group.1:1:2:3

rather it's:

group.1=1:2:3

(should be = not :)

Regardless, even after I fix this it's still not forming a cluster properly. We're still looking.
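The separator mistake described in the comment above (group.1:1:2:3 where group.1=1:2:3 was intended) is easy to miss by eye. The following is a minimal sketch of a check for it, not part of ZooKeeper: the function name find_bad_separator_lines and the sample fragment are hypothetical, and the check only looks at keys of the form group.N and weight.N, which are the hierarchical-quorum options discussed in this issue.

```python
import re

# Matches a group.N or weight.N key that is followed by ':' instead of '='.
_BAD_SEPARATOR = re.compile(r"^\s*(group|weight)\.\d+\s*:")


def find_bad_separator_lines(config_text):
    """Return (line_number, line) pairs where a group./weight. key uses ':'."""
    bad = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if _BAD_SEPARATOR.match(line):
            bad.append((number, line.strip()))
    return bad


# Hypothetical fragment (server ids and weights are made up for illustration).
# The second line repeats the mistake from the comment: ':' after the key.
SAMPLE = """\
group.1=1:2:3
group.2:4:5:6
weight.1=1
weight.4=0
"""

if __name__ == "__main__":
    for number, line in find_bad_separator_lines(SAMPLE):
        print(f"line {number}: expected '=' after the key, found ':' in {line!r}")
```

As the comment notes, correcting the separator alone did not make the cluster form properly, so a check like this only rules out one possible cause.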
[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739891#action_12739891 ]

Flavio Paiva Junqueira commented on ZOOKEEPER-498:

Pat, we have a description of how to configure this in the Cluster Options section of the Administrator's Guide. We are missing an example, which is in the source code as you point out.