[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918878#action_12918878 ]

Hudson commented on ZOOKEEPER-822:

Integrated in ZooKeeper-trunk #959 (See [https://hudson.apache.org/hudson/job/ZooKeeper-trunk/959/])
ZOOKEEPER-822. Leader election taking a long time to complete

> Leader election taking a long time to complete
> ---
>
> Key: ZOOKEEPER-822
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
> Project: Zookeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.3.0
> Reporter: Vishal K
> Assignee: Vishal K
> Priority: Blocker
> Fix For: 3.3.2, 3.4.0
>
> Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log,
> test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz,
> ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch,
> ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch,
> ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch,
> ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1
>
>
> Created a 3 node cluster.
> 1. Fail the ZK leader
> 2. Let leader election finish. Restart the leader and let it join the
> 3. Repeat
> After a few rounds leader election takes anywhere 25- 60 seconds to finish.
> Note- we didn't have any ZK clients and no new znodes were created.
> zoo.cfg is shown below:
> #Mon Jul 19 12:15:10 UTC 2010
> server.1=192.168.4.12\:2888\:3888
> server.0=192.168.4.11\:2888\:3888
> clientPort=2181
> dataDir=/var/zookeeper
> syncLimit=2
> server.2=192.168.4.13\:2888\:3888
> initLimit=5
> tickTime=2000
> I have attached logs from two nodes that took a long time to form the cluster
> after failing the leader. The leader was down anyways so logs from that node
> shouldn't matter.
> Look for "START HERE". Logs after that point should be of our interest.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915796#action_12915796 ]

Benjamin Reed commented on ZOOKEEPER-822:

looks good overall flavio. just a quick question: i notice that operations on senderWorkerMap in initiateConnection are not synchronized. senderWorkerMap is concurrent, but there could be a race between the get, put, and vsw.finish if initiateConnection is called concurrently for the same sid. right?

also you need to add a blurb to the config doc for the timeout system variable, which should be "zookeeper.cnxtimeout" so that it can be set from the configuration file.
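
The get/put/finish race raised in this comment can be illustrated with a small, self-contained sketch. It is not the ZooKeeper code: `WorkerSketch`, `register()` and `isRunning()` are hypothetical names, and the map stands in for senderWorkerMap. The idea is to let `put` return the previous worker atomically and to have `finish()` remove a map entry only when that entry still points at the worker being finished.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for a per-peer sender thread; not the ZooKeeper class. */
final class WorkerSketch {
    private final long sid;
    private final ConcurrentMap<Long, WorkerSketch> registry;
    private volatile boolean running = true;

    WorkerSketch(long sid, ConcurrentMap<Long, WorkerSketch> registry) {
        this.sid = sid;
        this.registry = registry;
    }

    /** Installs this worker for its sid and stops the worker it replaced, if any. */
    void register() {
        // put() returns the previous mapping atomically, so no separate get() is needed.
        WorkerSketch previous = registry.put(sid, this);
        if (previous != null && previous != this) {
            previous.finish();
        }
    }

    /** Stops this worker and removes its entry only if the entry still refers to it. */
    void finish() {
        running = false;
        // Two-argument remove is a no-op when a newer worker already owns the slot.
        registry.remove(sid, this);
    }

    boolean isRunning() {
        return running;
    }
}
```

With this shape, a late `finish()` from a replaced worker cannot evict the worker that replaced it, which is the failure mode discussed elsewhere in this thread.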
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913307#action_12913307 ]

Vishal K commented on ZOOKEEPER-822:

Hi Flavio,

+1. Looks good. I remember looking at the socket.connect() method, but I don't remember why I ruled it out in favor of the thread.

Minor point - there is a missing space before "error" in LOG.warn("Connection broken: for id " + sid + "my id = " + self.getId() + "error..).

Thank you.
-Vishal
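
For readers unfamiliar with the socket.connect() variant mentioned in this comment, here is a minimal sketch of a connect with an upper bound on the wait. The class and method names are hypothetical. The property name "zookeeper.cnxtimeout" comes from the review comment in this thread; the 5000 ms fallback is an assumption made for the example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper; illustrates a bounded connect, not the committed patch. */
final class BoundedConnectSketch {

    /** Timeout in milliseconds. The 5000 ms default is an assumption for this sketch. */
    static int connectTimeoutMs() {
        return Integer.getInteger("zookeeper.cnxtimeout", 5000);
    }

    /** Returns a connected socket, or null if the peer could not be reached in time. */
    static Socket connectOrNull(InetSocketAddress peer, int timeoutMs) {
        Socket socket = new Socket();
        try {
            // Throws SocketTimeoutException once timeoutMs elapses; 0 would wait indefinitely.
            socket.connect(peer, timeoutMs);
            return socket;
        } catch (IOException e) {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails on an unconnected socket.
            }
            return null;
        }
    }
}
```

A caller would treat a null result as "peer unreachable for now" and move on to the next peer instead of waiting for the operating system's TCP timeout.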
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912483#action_12912483 ]

Vishal K commented on ZOOKEEPER-822:

Hi Flavio,

Thanks. I will take a look at the patches.

-Vishal
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909592#action_12909592 ]

Mahadev konar commented on ZOOKEEPER-822:

vishal, I was expecting some commitment from you for making it use a selector :).
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909560#action_12909560 ]

Vishal K commented on ZOOKEEPER-822:

I agree with Mahadev.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909558#action_12909558 ]

Mahadev konar commented on ZOOKEEPER-822:

vishal, flavio,

If there is just one thread running at one point in time, then it's ok. Also, I am really worried about the code structure in LeaderElection.java. It's ok to have a temporary fix, but it would be great to see some commitment from someone on doing it right in 3.4.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909556#action_12909556 ]

Vishal K commented on ZOOKEEPER-822:

Hi Flavio,

As I mentioned earlier, this is a temporary patch until the selector-based approach (non-blocking IO) is ready. In general, what is the concern with the current fix? There will be only one thread running at a time. The thread just makes sure that we can bound the connection time. This patch is working well for us as a temporary fix. Apart from the overhead of starting a thread, I don't see anything wrong with the fix. Again, given that this bug is a blocker for us, we certainly cannot wait until the non-blocking implementation is done and released.

Thanks.
-Vishal
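
The "one helper thread that bounds the connection time" idea defended in this comment can be sketched as follows. This is an illustration under assumptions, not the attached patch: the class and method names are hypothetical, and on timeout the sketch closes the channel from the caller rather than interrupting the helper thread, which also unblocks a connect that is still in progress.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;

/** Hypothetical helper; one short-lived thread per attempt, caller waits at most timeoutMs. */
final class AsyncConnectSketch {

    /**
     * Returns a connected channel, or null if the attempt failed or exceeded timeoutMs.
     * timeoutMs must be positive: Thread.join(0) would wait without any bound.
     */
    static SocketChannel connectWithin(InetSocketAddress peer, long timeoutMs)
            throws IOException, InterruptedException {
        SocketChannel channel = SocketChannel.open();
        Thread connector = new Thread(() -> {
            try {
                channel.connect(peer); // blocking connect runs off the caller's thread
            } catch (IOException e) {
                // Refused, timed out at the TCP level, or closed by the caller below.
            }
        }, "async-connect-sketch");
        connector.setDaemon(true);
        connector.start();
        connector.join(timeoutMs);
        if (connector.isAlive() || !channel.isConnected()) {
            channel.close(); // idempotent; aborts a connect that is still blocked
            return null;
        }
        return channel;
    }
}
```

The cost is one thread start per attempt, which matches the overhead acknowledged in the comment above.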
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909434#action_12909434 ]

Flavio Junqueira commented on ZOOKEEPER-822:

Hi Vishal,

I have taken a look at your patch. As I said before, it sounds good to me to make SocketChannel non-blocking, but I don't much like the approach of creating one thread per connection attempt. Instead, I was thinking that we should try to use a selector. What do you think?
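
For comparison with the thread-per-attempt approach, the selector alternative suggested in this comment might look roughly like the sketch below. It is only an illustration with hypothetical names: it uses a throwaway selector for a single connect, whereas a real implementation would presumably share one selector across peers.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

/** Hypothetical helper; non-blocking connect bounded by Selector.select(timeout). */
final class SelectorConnectSketch {

    /**
     * Returns a connected, non-blocking channel, or null on failure or timeout.
     * timeoutMs must be positive: select(0) would block without any bound.
     */
    static SocketChannel connectWithin(InetSocketAddress peer, long timeoutMs) throws IOException {
        SocketChannel channel = SocketChannel.open();
        Selector selector = Selector.open();
        try {
            channel.configureBlocking(false);
            boolean connected = channel.connect(peer); // may complete at once on loopback
            if (!connected) {
                channel.register(selector, SelectionKey.OP_CONNECT);
                if (selector.select(timeoutMs) > 0) {
                    connected = channel.finishConnect(); // throws if the attempt failed
                }
            }
            if (connected) {
                return channel;
            }
            channel.close();
            return null;
        } catch (IOException e) {
            channel.close();
            return null;
        } finally {
            selector.close(); // cancels the key; the returned channel stays open
        }
    }
}
```

No extra thread is created here; the calling thread waits on the selector for at most the given timeout.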
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906872#action_12906872 ]

Flavio Junqueira commented on ZOOKEEPER-822:

I'll have a look at it, Vishal. Thanks for posting it.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906854#action_12906854 ]

Vishal K commented on ZOOKEEPER-822:

Hi Flavio,

> I think we need some time to converge on problems and fixes.

I don't think it would take a long time to converge. I think the patch that I attached is quite simple. After adding a new property for timeout we should be good to go.

> My understanding is that we want to have 3.3.2 out soon, and my feeling is
> that this is not a blocker for 3.3.2 given Vishal's description and our
> experience with the system so far, but it would be good to hear from Vishal.

From our earlier email exchanges I have a feeling that in most cases FLE was tested by restarting the ZooKeeper service (and not by rebooting/shutting down the host). I am a bit concerned that enough time may not have been spent in testing/reproducing this problem. In my opinion, this fix should go in 3.3.2. I know for sure that we won't be able to use the next release as is without this fix.

Thanks.
-Vishal
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906346#action_12906346 ]

Flavio Junqueira commented on ZOOKEEPER-822:

I think we need some time to converge on problems and fixes. My understanding is that we want to have 3.3.2 out soon, and my feeling is that this is not a blocker for 3.3.2 given Vishal's description and our experience with the system so far, but it would be good to hear from Vishal.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905836#action_12905836 ]

Flavio Junqueira commented on ZOOKEEPER-822:

{quote}
1. Blocking connects and accepts: You are right, when the node is down TCP timeouts rule.

a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnect(). AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne starts a timer. connectOne continues with normal operations if the connection is established before the timer expires; otherwise, when the timer expires, it interrupts the AsyncConnect() thread and returns. In this way, I can have an upper bound on the amount of time we need to wait for connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use Selector to do non-blocking connects/accepts. I am planning to do that later once we at least have a quick fix for the problem and consensus from others for the real fix (this problem is a big blocker for us). Note that it is OK to do blocking IO in SenderWorker and RecvWorker threads since they block IO to the respective peer.
{quote}

As I commented before, it might be ok to make it asynchronous, especially if we have a way of checking that there is an attempt to establish a connection in progress.

I'm also still intrigued about why this is a problem for you. I haven't seen any of this being a problem before, which of course doesn't mean we shouldn't fix it. It would be nice to understand what's special about your setup or if others have seen similar problems and I missed the reports.

{quote}
b) The blocking IO problem is not just restricted to connectOne(), but also in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection does blocking IO to get peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this is happening from the Listener. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connectOne(). Also the code has an inherent cycle. initiateConnection() and receiveConnection() will have to be very carefully synchronized; otherwise, we could run into deadlocks. This code is going to be difficult to maintain/modify.
{quote}

If I remember correctly, we currently synchronize connectOne and make all connection establishments through connectOne so that we make sure that we do one at a time. My understanding is that this should reduce the number of rounds of attempts to establish connections, perhaps at the cost of a longer delay in some runs.

{quote}
2. Buggy senderWorkerMap handling: The code that manages senderWorkerMap is very buggy. It is causing multiple election rounds. While debugging I found that sometimes after FLE a node will have its sendWorkerMap empty even if it has SenderWorker and RecvWorker threads for each peer.
{quote}

I don't think that having multiple rounds is bad; in fact, I think it is unavoidable using reasonable timeout values. The second part, however, sounds like a problem we should fix.

{quote}
a) The receiveConnection() method calls the finish() method, which removes an entry from the map. Additionally, the thread itself calls finish() which could remove the newly added entry from the map. In short, receiveConnection is causing the exact condition that you mentioned above.
{quote}

I thought that we were increasing the intervals between notifications, and if so I believe the case you mention above should not happen more than a few times. Now, to fix it, it sounds like we need to check that the finish call is removing the correct object in sendWorkerMap. That is, obj.finish() should remove obj and do nothing if the SendWorker object in sendWorkerMap is a different one. What do you think?

{quote}
b) Apart from the bug in finish(), receiveConnection is making an entry in senderWorkerMap at the wrong place. Here's the buggy code:

    SendWorker vsw = senderWorkerMap.get(sid);
    senderWorkerMap.put(sid, sw);
    if(vsw != null) vsw.finish();

It makes an entry for the new thread and then calls finish, which causes the new thread to be removed from the Map. The old thread will also get terminated since finish() will interrupt the thread.
{quote}

See my comment above. Perhaps I should wait to see your proposed modifications, but I wonder if it works to check that we are removing the correct SendWorker object.

{quote}
3. Race condition in receiveConnection and initiateConnection: In theory, two peers can keep disconnecting each other's connection. Example: T0: Peer 0 initia
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905528#action_12905528 ]

Vishal K commented on ZOOKEEPER-822:

Hi Flavio,

I was planning to send out a mail explaining the problems in the FLE implementation that I have found so far. For now, I will put the info here. We can create new JIRAs if needed. I am waiting to hear back from our legal department to resolve copyright issues so that I can share my fixes as well.

1. Blocking connects and accepts:

You are right, when the node is down TCP timeouts rule.

a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnect. AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne() starts a timer. connectOne() continues with normal operations if the connection is established before the timer expires; otherwise, when the timer expires, it interrupts the AsyncConnect thread and returns. In this way, I can have an upper bound on the amount of time we need to wait for connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use Selector to do non-blocking connects/accepts. I am planning to do that later once we at least have a quick fix for the problem and consensus from others for the real fix (this problem is a big blocker for us). Note that it is OK to do blocking IO in the SendWorker and RecvWorker threads, since each of them blocks only on IO to its respective peer.

b) The blocking IO problem is not restricted to connectOne(); it is also in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection() does blocking IO to get the peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this is happening from the Listener. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connectOne(). Also, the code has an inherent cycle. initiateConnection() and receiveConnection() will have to be very carefully synchronized; otherwise, we could run into deadlocks. This code is going to be difficult to maintain/modify.

2. Buggy senderWorkerMap handling:

The code that manages senderWorkerMap is very buggy. It is causing multiple election rounds. While debugging I found that sometimes after FLE a node will have its senderWorkerMap empty even if it has SendWorker and RecvWorker threads for each peer.

a) The receiveConnection() method calls the finish() method, which removes an entry from the map. Additionally, the thread itself calls finish(), which could remove the newly added entry from the map. In short, receiveConnection() is causing the exact condition that you mentioned above.

b) Apart from the bug in finish(), receiveConnection() is making an entry in senderWorkerMap at the wrong place. Here's the buggy code:

    SendWorker vsw = senderWorkerMap.get(sid);
    senderWorkerMap.put(sid, sw);
    if(vsw != null) vsw.finish();

It makes an entry for the new thread and then calls finish(), which causes the new thread to be removed from the map. The old thread will also get terminated since finish() will interrupt the thread.

3. Race condition in receiveConnection and initiateConnection:

*In theory*, two peers can keep disconnecting each other's connection. Example:

T0: Peer 0 initiates a connection (request 1)
T1: Peer 1 receives connection from peer 0
T2: Peer 1 calls receiveConnection()
T2: Peer 0 closes connection to Peer 1 because its ID is lower.
T3: Peer 0 re-initiates connection to Peer 1 from manager.toSend() (request 2)
T3: Peer 1 terminates older connection to peer 0
T4: Peer 1 calls connectOne() which starts new sendWorker threads for peer 0
T5: Peer 1 kills connection created in T3 because it receives another (request 2) connect request from 0

The problem here is that while Peer 0 is accepting a connection from Peer 1 it can also be initiating a connection to Peer 1. So if they hit the right frequencies they could sit in a connect/disconnect loop and cause multiple rounds of leader election.

I think the cause here is again blocking connects()/accepts(). A peer starts to take action (to kill existing threads and start new threads) as soon as a connection is established at the *TCP level*. That is, it does not give us any control to synchronize connects and accepts. We could use non-blocking connects and accepts. This will allow us to a) tell a thre
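
The timer-bounded connect described in 1(a) can be sketched roughly as follows. This is only an illustration of the idea, not the code from the patch: BoundedCall and its run() method are hypothetical names, and the blocking step is passed in as a plain Callable so that the sketch does not depend on ZooKeeper classes. The worker thread plays the role of AsyncConnect, and Future.get() with a timeout plays the role of the timer.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of ZooKeeper): runs a blocking step, such as
// a connect, on a worker thread and waits at most timeoutMs for its result.
final class BoundedCall {
    static <T> Optional<T> run(Callable<T> blockingStep, long timeoutMs) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "AsyncConnect");
            t.setDaemon(true); // a stuck connect must not keep the JVM alive
            return t;
        });
        Future<T> pending = worker.submit(blockingStep);
        try {
            // Upper bound on how long the caller waits for the blocking step.
            return Optional.ofNullable(pending.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            pending.cancel(true); // interrupt the worker, like the timer in 1(a)
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty(); // the step itself failed, e.g. connection refused
        } catch (InterruptedException e) {
            pending.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Optional.empty();
        } finally {
            worker.shutdownNow();
        }
    }
}
```

A caller would wrap the connect, for example BoundedCall.run(() -> SocketChannel.open(addr), 5000), and treat an empty result as "peer unreachable for now". Creating a thread per attempt is wasteful, which is one reason the comment prefers a Selector-based design as the real fix.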
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904684#action_12904684 ]

Flavio Junqueira commented on ZOOKEEPER-822:

Hi Vishal,

Good catches:

1- It sounds right that blocking the connection establishment might increase the time to election unnecessarily when the other party is not up. Here is my interpretation. If the machine is up but the zk server is not running, then we simply get a connection failure and move on. The same doesn't happen when the machine is down, since we need to wait for the connection establishment to time out;

2- It sounds right that a connection can be dropped erroneously due to a race, but I don't see in which case it can cause the election time to increase substantially, unless the race is triggered multiple times in a row. A server will try to connect upon every new notification, and a server only calls SendWorker.finish() in receiveNotification if it has a higher identifier. In this case, it creates a new connection immediately after, so it would need a previous connection being dropped right before to have the case you're describing;

3- Servers with higher identifiers decline connection requests from servers with lower identifiers; it is part of the protocol. Is this what you're referring to?
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902988#action_12902988 ]

Vishal K commented on ZOOKEEPER-822:

The fix for problems 1 and 2 above eliminates the bug. I will have a patch out soon.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900898#action_12900898 ]

Vishal K commented on ZOOKEEPER-822:

Correction:

2. SendWorker.run() calls finish() at the end. This could result in finish() getting called twice (e.g., once from receiveConnection() and once from run()), thus causing senderWorkerMap.remove(sid) to be called twice and removing an entry that should *not* be removed.
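
The double call to finish() can be reproduced with a small stand-alone model. The sketch below is a simplified assumption of how the map is used: WorkerMapDemo, Worker, finishUnsafe() and finishSafe() are hypothetical names, and the sockets and threads of the real SendWorker are left out. It contrasts an unconditional remove(sid) with the two-argument ConcurrentMap.remove(key, value), which only removes the entry if it still maps to the worker that is finishing. Whether the eventual patch takes this approach is not stated in the thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified stand-in for the senderWorkerMap handling discussed above.
final class WorkerMapDemo {
    static final ConcurrentMap<Long, Worker> senderWorkerMap = new ConcurrentHashMap<>();

    static final class Worker {
        final long sid;
        volatile boolean running = true;

        Worker(long sid) { this.sid = sid; }

        // Unconditional removal: a second call can delete a newer worker's entry.
        void finishUnsafe() {
            running = false;
            senderWorkerMap.remove(sid);
        }

        // Conditional removal: a no-op unless the entry still maps to this worker.
        void finishSafe() {
            running = false;
            senderWorkerMap.remove(sid, this);
        }
    }

    // Replays the sequence: the old worker is finished by the connection
    // handler, a new worker is registered for the same sid, and then the old
    // worker's own thread calls finish a second time on its way out.
    static boolean newWorkerSurvives(boolean safe) {
        senderWorkerMap.clear();
        Worker oldWorker = new Worker(1L);
        senderWorkerMap.put(1L, oldWorker);
        if (safe) oldWorker.finishSafe(); else oldWorker.finishUnsafe();

        Worker newWorker = new Worker(1L);
        senderWorkerMap.put(1L, newWorker);
        if (safe) oldWorker.finishSafe(); else oldWorker.finishUnsafe();

        return senderWorkerMap.get(1L) == newWorker;
    }
}
```

With the unconditional variant the new worker's entry is gone after the second call, which matches the empty map observed after FLE; with the conditional variant the second call leaves the new entry in place.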
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900401#action_12900401 ]

Vishal K commented on ZOOKEEPER-822:

Hi Flavio,

Ah! My trunk is quite old then. But I don't think it is necessary to run with the latest code for debugging this issue.

I have identified one problem in WorkerSender.process(). This function calls manager.toSend(), which calls connectOne(). connectOne() does a blocking connect (which takes on the order of minutes to return if a node is down). Thus, WorkerSender.run() will block and not send any successive notifications to other nodes. Let me know what you think.

I tried adding timeouts to connectOne(), but I am running into a similar issue somewhere else, so that didn't fix the problem.
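
For reference, a connect with an explicit upper bound can be written with the standard java.net.Socket API, as in the minimal sketch below. TimedConnect and tryConnect() are hypothetical names and this is not the change that was tried in the comment above. As that comment notes, bounding the connect alone did not resolve the problem, because other blocking calls on the same path remain.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not part of ZooKeeper): a blocking connect that gives
// up after timeoutMs instead of waiting for the operating system's TCP timeout.
final class TimedConnect {
    // Returns a connected socket, or null if the peer refused the connection
    // or did not answer within timeoutMs.
    static Socket tryConnect(InetSocketAddress peer, int timeoutMs) {
        Socket s = new Socket();
        try {
            // Throws SocketTimeoutException (an IOException) when the timeout expires.
            s.connect(peer, timeoutMs);
            return s;
        } catch (IOException e) {
            try {
                s.close();
            } catch (IOException ignored) {
                // nothing useful to do if closing a failed socket also fails
            }
            return null;
        }
    }
}
```

A sender loop could call tryConnect() for each peer and skip peers that return null, so that one unreachable machine delays the remaining notifications by at most timeoutMs.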
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900165#action_12900165 ]

Flavio Junqueira commented on ZOOKEEPER-822:

Vishal,

You don't seem to be using trunk code. The current trunk code would report notifications using the following format when report level info is enabled:

{noformat}
LOG.info("Notification: " + n.leader + " (n.leader), " + n.zxid + " (n.zxid), " + n.epoch + " (n.round), " + n.state + " (n.state), " + n.sid + " (n.sid), " + self.getPeerState() + " (my state)");
{noformat}

And I'm seeing the following in the excerpt above:

{noformat}
Notification: 0, 34359738368, 4, 0, LOOKING, LOOKING, 0
{noformat}

Also, it would be great if we could use loggraph to visualize what is going on in your situation.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900111#action_12900111 ]

Vishal K commented on ZOOKEEPER-822:

At line 852 in 10.17.119.101-zookeeper.log, WorkerSender finally finds something in sendqueue and starts sending the notification to server 1.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900108#action_12900108 ]

Vishal K commented on ZOOKEEPER-822:

I suspect that one of the nodes (10.17.119.101) is not sending the notification to the other node. sendNotifications() is called to send a notification to all peers. This function enters the notification in sendqueue. However, either the entry was not put in the queue (sendqueue.offer failed) or the thread that polls sendqueue did not wake up. I am not sure what the cause is yet.

I had added extra debug messages. Three messages are of main interest:

1. In sendNotifications(): print "IN FLE sending notification to server id = 1" for each server. Also print "proposedLeader, proposedZxid, logicalclock".

2. In FastLeaderElection.lookForLeader(): print "Updating proposal" before calling upgradeProposal if totalOrderPredicate(n.leader, n.zxid, proposedZxid) is true.

3. In WorkerSender.process(), log: LOG.info("WorkSender.process() QUEUEING m.state= " + m.state + " m.leader=" + m.leader + " m.sid=" + m.sid);

Supporting log entries from 10.17.119.101-zookeeper.log. I have added descriptions inline.

--
2010-08-18 14:53:56,451 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@475] - IN FLE sending notification to server id = 1
2010-08-18 14:53:56,451 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@476] - proposedLeader, proposedZxid, logicalclock 0343597383684
2010-08-18 14:53:56,452 - INFO [WorkerSender Thread:fastleaderelection$messenger$workersen...@352] - WorkSender.process() QUEUEING m.state= LOOKING m.leader=0 m.sid=1
2010-08-18 14:53:56,452 - DEBUG [WorkerSender Thread:quorumcnxmana...@347] - Opening channel to server 1
2010-08-18 14:53:56,453 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@475] - IN FLE sending notification to server id = 2
2010-08-18 14:53:56,453 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@476] - proposedLeader, proposedZxid, logicalclock 0343597383684
2010-08-18 14:53:56,453 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@690] - Notification: 1, 34359738368, 4, 0, LOOKING, LOOKING, 1
2010-08-18 14:53:56,454 - DEBUG [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@496] - id: 1, proposed id: 0, zxid: 34359738368, proposed zxid: 34359738368
2010-08-18 14:53:56,454 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@710] - Updating proposal
2010-08-18 14:53:56,454 - INFO [WorkerSender Thread:quorumcnxmana...@162] - Have smaller server identifier, so dropping the connection: (1, 0)
2010-08-18 14:53:56,455 - INFO [WorkerSender Thread:fastleaderelection$messenger$workersen...@352] - WorkSender.process() QUEUEING m.state= LOOKING m.leader=0 m.sid=2
2010-08-18 14:53:56,458 - DEBUG [WorkerSender Thread:quorumcnxmana...@347] - Opening channel to server 2
2010-08-18 14:53:56,458 - WARN [Thread-19:quorumcnxmanager$recvwor...@659] - Connection broken: java.io.IOException: Channel eof
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:631)
2010-08-18 14:53:56,459 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@475] - IN FLE sending notification to server id = 0
2010-08-18 14:53:56,460 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@476] - proposedLeader, proposedZxid, logicalclock 1343597383684

* The above line shows that this node (server 0) is going to vote for 1. See "proposedLeader, proposedZxid, logicalclock 1 34359738368 4". Forgot to add spaces in the message :-)

2010-08-18 14:53:56,460 - DEBUG [Thread-1:quorumcnxmanager$liste...@446] - Connection request /10.17.119.102:41597
2010-08-18 14:53:56,461 - DEBUG [Thread-1:quorumcnxmanager$liste...@449] - Connection request: 0
2010-08-18 14:53:56,461 - DEBUG [Thread-1:quorumcnxmanager$sendwor...@505] - Address of remote peer: 1
2010-08-18 14:53:56,461 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@475] - IN FLE sending notification to server id = 1
2010-08-18 14:53:56,462 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@476] - proposedLeader, proposedZxid, logicalclock 1343597383684

* Above, server 0 queued a notification to be sent to server 1. The notification says that it accepts 1 as the leader. But the notification never got sent. process() was not called at all from WorkerSender. It's almost as if the notification was never entered in sendqueue (in sendNotifications). *

2010-08-18 14:53:56,462 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@475] - IN FLE sending notification to server id = 2
2010-08-18 14:53:56,462 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@476] - proposedLeader, proposedZxid, logicalclock 1343597383684
2010-08-18 14:53:56,463 - DEBUG [QuorumPeer:/0:0:0:0:0:0:0:
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900010#action_12900010 ]

Flavio Junqueira commented on ZOOKEEPER-822:

Hi Vishal,

Thanks for reporting. Here are some quick comments:

Issue 1: I think that just the javadoc message is incorrect. We really just want to check that some process has received notifications.

Issue 2: The connection will eventually time out if not established, so setting a different value should not make a difference. The point about blocking connect is a good one. I think it is worthwhile creating a jira for it.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899983#action_12899983 ]

Vishal K commented on ZOOKEEPER-822:

While going through the code yesterday, I found two potential problems that I thought might be worth reporting in the context of this bug.

1. In FastLeaderElection.java

    /**
     * Check if all queues are empty, indicating that all messages have been delivered.
     */
    boolean haveDelivered() {
        for (ArrayBlockingQueue queue : queueSendMap.values()) {
            LOG.debug("Queue size: " + queue.size());
            if (queue.size() == 0)
                return true;
        }
        return false;
    }

The haveDelivered() function returns true as soon as it finds one empty queue, without checking whether the rest of the queues are empty.

2. The QuorumCnxManager.connectAll() function connects to one peer at a time, and it uses a blocking connect (SocketChannel.open). I added a timeout to the SocketChannel.open, and that did not fix the problem.
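
The difference between what the quoted javadoc says and what the quoted loop does can be made concrete with the sketch below. DeliveryCheck, anyQueueEmpty() and allQueuesEmpty() are hypothetical names used only for illustration. Note that Flavio's reply in this thread indicates that the "any queue is empty" behaviour is the intended one and that only the javadoc is inaccurate, so the second method shows the reading of the javadoc, not a proposed fix.

```java
import java.util.Collection;

// Stand-alone illustration of the two possible meanings of haveDelivered().
final class DeliveryCheck {
    // Mirrors the quoted loop: true as soon as one queue is empty.
    static boolean anyQueueEmpty(Collection<? extends Collection<?>> queues) {
        for (Collection<?> queue : queues) {
            if (queue.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    // What the javadoc describes: true only if every queue is empty.
    static boolean allQueuesEmpty(Collection<? extends Collection<?>> queues) {
        for (Collection<?> queue : queues) {
            if (!queue.isEmpty()) {
                return false;
            }
        }
        return true;
    }
}
```

With one drained queue and one queue that still holds a notification, the first check reports delivery and the second does not.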
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899156#action_12899156 ]

Vishal K commented on ZOOKEEPER-822:

Hi Ivan,

Can you describe your setup?

My setup info:
- 3 ESX boxes
- 1 SLES 11 VM on each
- Cluster of 3 nodes

I hit this problem consistently after rebooting the leader.

Thanks.
-Vishal
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896983#action_12896983 ]

Ivan Kelly commented on ZOOKEEPER-822:

Actually, ignore that. Read it wrong.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896980#action_12896980 ]

Ivan Kelly commented on ZOOKEEPER-822:

Could this be related? https://issues.apache.org/jira/browse/ZOOKEEPER-785
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896063#action_12896063 ]

Ivan Kelly commented on ZOOKEEPER-822:

The logs were of some help, but I don't understand what's happening. It looks like multiple nodes are claiming leadership at the same time, but that can't be right.

The FLE changes won't fix it, but they do log more information, so they may make it easier to see what is happening.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896061#action_12896061 ]

Vishal K commented on ZOOKEEPER-822:

Hi Ivan,

Were the logs of any help? It might be worth having 3 VMs and rebooting the leader instead of shutting down the interface. We have seen this on all of our dev clusters, although all the dev clusters are based on the same VM images, so it won't be fair to claim that the problem was seen on a different set of machines.

I will try with the latest trunk and let you know the result. What FLE changes do you think would have fixed this problem?

Thanks.
-Vishal
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896043#action_12896043 ]

Ivan Kelly commented on ZOOKEEPER-822:

I've tried to repro this with 3 zookeepers running on the same machine, and with 3 zookeepers running on virtual machines, and I cannot get it to repro. I was taking out the leader by shutting down the network interface. Have you been able to repro this on another set of machines other than the ones you first observed it on?

Also, could you try this with the latest trunk? Some improvements were made around the FLE which may shed some more light.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894576#action_12894576 ]

Vishal K commented on ZOOKEEPER-822:

I have attached new logs. I don't use ntp, but all the nodes should be at most a few seconds apart. I have marked the start and end of the faulty election. Look at zookeeper-192.168.10.3-log and search for "vishal".

Note - it is super easy to reproduce the bug. Create a 3 node cluster and reboot the leader (or shut down the network interface). You may need to repeat the test several times. If you do a clean shutdown of the leader (zkServer.sh stop), then you won't see this bug. I feel that something related to TCP timeouts or session management for the failed node is causing this problem.
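To make the TCP-timeout hypothesis concrete: a host that is powered off or has its interface shut down does not answer with a reset, so a connection attempt that sets no timeout of its own waits for the operating system's TCP timeout. The sketch below is not ZooKeeper's connection code, and this thread does not establish that unbounded connects are the cause. It only shows, with a hypothetical helper `can_connect`, how an explicit timeout bounds such an attempt.

```python
import socket


def can_connect(host, port, timeout_s):
    """Return True if a TCP connection is established within timeout_s seconds.

    Hypothetical helper for illustration. Without an explicit timeout, a
    connect to an unresponsive peer blocks until the OS gives up.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


# Demonstration against a local listener so the example is self-contained.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

print(can_connect("127.0.0.1", port, 1.0))  # peer is up
listener.close()
print(can_connect("127.0.0.1", port, 1.0))  # peer is gone
```

A clean process kill, as opposed to a reboot or power-off, leaves the host up to refuse or reset connections immediately, which would be consistent with the difference between the two failure modes reported here.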
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893121#action_12893121 ]

Ivan Kelly commented on ZOOKEEPER-822:

Currently the timestamp is used, as the zxid isn't always available in the message logs. But yes, zxid would be more desirable. Perhaps I can extract that from the context some other way.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892830#action_12892830 ]

Patrick Hunt commented on ZOOKEEPER-822:

Ivan, do the clocks need to be in sync? Perhaps you should use the xid (cxid) instead?
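The concern behind the clock-sync question can be shown with a toy example: when logs from several nodes are merged by local timestamp, a skewed clock can place an effect before its cause. The sketch below is not loggraph code. The `Event` type, the 5-second skew, and the event names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    node: str
    true_time: float  # seconds on an ideal global clock
    what: str


def merged_by_local_timestamp(events, skew):
    """Order events by the timestamp each node would log (true time + skew)."""
    return sorted(events, key=lambda e: e.true_time + skew[e.node])


events = [
    Event("node1", 10.0, "send notification"),
    Event("node2", 10.2, "receive notification"),
]

# Assumed skew: node2's clock runs 5 seconds behind node1's.
skew = {"node1": 0.0, "node2": -5.0}

order = [e.what for e in merged_by_local_timestamp(events, skew)]
print(order)  # the receive is listed before the send
```

An identifier that increases with the order of operations, rather than with wall-clock time, does not have this problem, which is the motivation for preferring it where it is available.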
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892761#action_12892761 ]

Ivan Kelly commented on ZOOKEEPER-822:

Hi Vishal,

The logs in zk_leader_election.tar.gz seem to be from different runs. node0 starts at 2010-07-22 17:33:54,166, node1 at 2010-07-22 22:21:11,979 and node2 at 2010-07-22 22:22:17,249. Are the clocks on the machines in sync?

-Ivan
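The gap Ivan points out can be quantified directly from the three start timestamps he quotes. The following is a small sketch, with a hypothetical helper `start_offsets`, that parses timestamps in the log's `YYYY-MM-DD HH:MM:SS,mmm` layout and reports each node's offset from the earliest one.

```python
from datetime import datetime

# Timestamp layout used in the quoted log lines (milliseconds after the comma).
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

# First log entry per node, as quoted in the comment above.
first_entries = {
    "node0": "2010-07-22 17:33:54,166",
    "node1": "2010-07-22 22:21:11,979",
    "node2": "2010-07-22 22:22:17,249",
}


def start_offsets(entries):
    """Return each node's start time in seconds relative to the earliest node."""
    parsed = {node: datetime.strptime(ts, LOG_TS_FORMAT) for node, ts in entries.items()}
    earliest = min(parsed.values())
    return {node: (t - earliest).total_seconds() for node, t in parsed.items()}


offsets = start_offsets(first_entries)
for node, seconds in sorted(offsets.items()):
    print(f"{node}: +{seconds:.3f} s")
```

node1 and node2 start about 65 seconds apart, while node0 starts nearly five hours earlier, which is why the logs look like they come from different runs or from machines with unsynchronized clocks.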
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891582#action_12891582 ]

Flavio Paiva Junqueira commented on ZOOKEEPER-822:

Thanks for the logs, Vishal. The jira discussing loggraph is ZOOKEEPER-773.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890199#action_12890199 ]

Flavio Paiva Junqueira commented on ZOOKEEPER-822:

Hi Vishal,

Do you think you can upload all three log files for a problematic run? We would like to put them on loggraph to visualize what's going on there. It sounds like it is somehow related to the VM reboots, though I don't know why yet.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890052#action_12890052 ]

Vishal K commented on ZOOKEEPER-822:

Hi Flavio,

I have ZooKeeper servers running in VMs. To kill a ZK server, I power off its VM. On the other hand, I tried several times killing the ZK process and restarting it, and I did not see any issues. So there is something about the reboot that is causing this problem (TCP session not getting cleaned up?).

I don't see many connection exceptions in the log. Once the leader election starts, we start seeing "Notification time out" messages. However, before this we do see that the connection was established (shown below):

2010-07-19 14:40:52,562 - DEBUG [WorkerSender Thread:quorumcnxmana...@366] - There is a connection already for server 0
2010-07-19 14:40:52,563 - DEBUG [WorkerSender Thread:quorumcnxmana...@346] - Opening channel to server 2

Do you still think this is a communication problem?
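The repeated "Notification time out" messages are worth relating to the observed delays. The sketch below assumes, and this thread does not confirm it, that the election loop doubles its wait each time no notification arrives, starting at 200 ms and capped at 60 seconds. Both constants and the function name are assumptions made for illustration, not values taken from the logs.

```python
def notification_timeouts(initial_ms=200, max_ms=60000, rounds=10):
    """Return successive wait times under a doubling backoff with a cap.

    The defaults are assumed values for illustration only.
    """
    waits = []
    current = initial_ms
    for _ in range(rounds):
        waits.append(current)
        current = min(current * 2, max_ms)
    return waits


waits = notification_timeouts()
elapsed = 0
for attempt, wait in enumerate(waits, start=1):
    elapsed += wait
    print(f"attempt {attempt}: wait {wait} ms, cumulative {elapsed} ms")
```

Under these assumptions, seven consecutive empty waits add up to about 25 seconds and eight to about 51 seconds, so a handful of lost or undelivered notifications would be enough to land in the reported 25-60 second range.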
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889906#action_12889906 ]

Flavio Paiva Junqueira commented on ZOOKEEPER-822:

Vishal, I can't reproduce your problem. I just tried twice to kill the leader and rejoin it 20 times each, and I can't see the problem you're mentioning. I wonder if there is anything special about your setup.

I can also see lots of connection-related exceptions in your logs. As a first cut, it sounds like this is preventing the servers from exchanging notifications, which would explain the delay.

Two minor comments: your log file for server 2 does not contain "START HERE", and each file duplicates every message.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889907#action_12889907 ]

Ivan Kelly commented on ZOOKEEPER-822:

Could you try putting the logs through loggraph (in zookeeper/src/contrib)? Perhaps a graphical view will give some insight?
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889902#action_12889902 ]

Vishal K commented on ZOOKEEPER-822:

I would like to add that the problem is highly reproducible.