[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471020#comment-16471020 ]

Patrick Hunt commented on ZOOKEEPER-1807:
-----------------------------------------

The test still fails occasionally, but at this point fairly rarely. I'm reclosing this given that the fix is in; however, if it shows up again, please open a new JIRA.

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Alexander Shraer
>            Priority: Blocker
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch,
> ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch,
> ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807-ver6.patch,
> ZOOKEEPER-1807-ver7.patch, ZOOKEEPER-1807.patch, notifications-loop.png
>
>
> Hey [~shralex],
> I noticed today that my Observers are spamming each other trying to open
> connections to the election port. I've got tons of these:
> {noformat}
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 9
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 10
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 6
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 12
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 14
> {noformat}
> and so on and so on, ad nauseam.
> Now, looking around, I found this inside FastLeaderElection.java from when you
> committed ZOOKEEPER-107:
> {noformat}
>     private void sendNotifications() {
> -        for (QuorumServer server : self.getVotingView().values()) {
> -            long sid = server.id;
> -
> +        for (long sid : self.getAllKnownServerIds()) {
> +            QuorumVerifier qv = self.getQuorumVerifier();
> {noformat}
> Is that really desired? I suspect that is what's causing Observers to try to
> connect to each other (as opposed to just connecting to participants). I'll
> give it a try now and let you know. (Also, we use observer ids that are > 0,
> and I saw some parts of the code that might not deal with that assumption,
> so it could be that too.)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
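To make the suspicion in the description concrete, here is a minimal sketch of the difference between the two loops in the quoted diff. It is not ZooKeeper code: the class `NotificationTargets`, the `Role` enum, `sampleView()` and the server ids are all hypothetical stand-ins, and the two helper methods only mimic what `getAllKnownServerIds()` and `getVotingView()` are assumed to return.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from ZooKeeper.
class NotificationTargets {
    enum Role { PARTICIPANT, OBSERVER }

    // Assumed ensemble: three voting participants and two observers.
    static Map<Long, Role> sampleView() {
        Map<Long, Role> view = new LinkedHashMap<>();
        view.put(1L, Role.PARTICIPANT);
        view.put(2L, Role.PARTICIPANT);
        view.put(3L, Role.PARTICIPANT);
        view.put(13L, Role.OBSERVER);
        view.put(14L, Role.OBSERVER);
        return view;
    }

    // Mimics iterating over every known server id (the "+" side of the diff).
    static List<Long> allKnownServerIds(Map<Long, Role> view) {
        return new ArrayList<>(view.keySet());
    }

    // Mimics iterating over the voting view only (the "-" side of the diff).
    static List<Long> votingServerIds(Map<Long, Role> view) {
        List<Long> ids = new ArrayList<>();
        for (Map.Entry<Long, Role> entry : view.entrySet()) {
            if (entry.getValue() == Role.PARTICIPANT) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<Long, Role> view = sampleView();
        // An observer that notifies all known ids also targets other observers.
        System.out.println("all known:   " + allKnownServerIds(view));
        System.out.println("voting only: " + votingServerIds(view));
    }
}
```

Under these assumptions, an observer looping over all known ids would also try to reach observers 13 and 14, which matches the symptom in the log excerpt; looping over the voting view would not.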
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491249#comment-14491249 ]

Alexander Shraer commented on ZOOKEEPER-1807:
---------------------------------------------

The test consistently passes locally for me, and I don't have access to the build machine, so I'm not sure how to debug this. Does it fail for anyone else? One interesting thing (although probably unrelated) is that "Processing stat command" appears 3000 times in the log. Are that many stat invocations expected?

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Alexander Shraer
>            Priority: Blocker
>             Fix For: 3.5.2, 3.6.0
>
>         Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch,
> ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch,
> ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807-ver6.patch,
> ZOOKEEPER-1807-ver7.patch, ZOOKEEPER-1807.patch, notifications-loop.png
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490322#comment-14490322 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---------------------------------------------------

Was the build failure transient? The build logs are gone...

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Alexander Shraer
>            Priority: Blocker
>             Fix For: 3.5.0, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch,
> ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch,
> ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807-ver6.patch,
> ZOOKEEPER-1807-ver7.patch, ZOOKEEPER-1807.patch, notifications-loop.png
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490726#comment-14490726 ]

Michi Mutsuzaki commented on ZOOKEEPER-1807:
--------------------------------------------

I haven't seen this failure for a while, but I'm not sure anything was done to fix it. How about moving this to 3.5.2?

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Alexander Shraer
>            Priority: Blocker
>             Fix For: 3.5.0, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch,
> ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch,
> ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807-ver6.patch,
> ZOOKEEPER-1807-ver7.patch, ZOOKEEPER-1807.patch, notifications-loop.png
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069024#comment-14069024 ]

Patrick Hunt commented on ZOOKEEPER-1807:
-----------------------------------------

[~fpj] and [~shralex], please take a look at the recent test failure in a test introduced by this JIRA:
https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk/2380/testReport/junit/org.apache.zookeeper.server.quorum/ReconfigRecoveryTest/testCurrentObserverIsParticipantInNewConfig/

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Alexander Shraer
>            Priority: Blocker
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch,
> ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch,
> ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807-ver6.patch,
> ZOOKEEPER-1807-ver7.patch, ZOOKEEPER-1807.patch, notifications-loop.png
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067486#comment-14067486 ]

Hudson commented on ZOOKEEPER-1807:
-----------------------------------

SUCCESS: Integrated in ZooKeeper-trunk #2378 (See [https://builds.apache.org/job/ZooKeeper-trunk/2378/])
ZOOKEEPER-1807. Observers spam each other creating connections to the election addr (Alex Shraer via fpj) (fpj: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611765)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerTestBase.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/ReconfigRecoveryTest.java

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Alexander Shraer
>            Priority: Blocker
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch,
> ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch,
> ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807-ver6.patch,
> ZOOKEEPER-1807-ver7.patch, ZOOKEEPER-1807.patch, notifications-loop.png
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066529#comment-14066529 ]

Flavio Junqueira commented on ZOOKEEPER-1807:
---------------------------------------------

Hey Alex, would you mind cleaning up the patch a bit? There are some formatting problems and stray whitespace, like:

{noformat}
-HashSet<Long> set = new HashSet<Long>();
+ SyncedLearnerTracker voteSet = new SyncedLearnerTracker();
+ voteSet.addQuorumVerifier(self.getQuorumVerifier());
+ if (self.getLastSeenQuorumVerifier() != null
+ && self.getLastSeenQuorumVerifier().getVersion() > self.getQuorumVerifier().getVersion()) {
+ voteSet.addQuorumVerifier(self.getLastSeenQuorumVerifier());
+ }
{noformat}

In this case the HashSet line is indented with more spaces than the others. There are also spurious spaces, such as after the closing curly brace above, and some whitespace-only lines throughout the patch.

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Alexander Shraer
>            Priority: Blocker
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch,
> ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch,
> ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807-ver6.patch,
> ZOOKEEPER-1807.patch, notifications-loop.png
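The patch fragment quoted in Flavio's comment registers the current quorum verifier and, when a newer one has been seen, that one as well. A rough sketch of that idea follows, assuming simple majority quorums. `DualConfigVoteTracker`, `MajorityConfig` and `forConfigs` are hypothetical names used only for illustration; this is not the `SyncedLearnerTracker` from the patch, and the server ids and versions are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not ZooKeeper's SyncedLearnerTracker.
class DualConfigVoteTracker {
    // Assumed stand-in for a quorum verifier: a versioned set of voters
    // in which any strict majority forms a quorum.
    static final class MajorityConfig {
        final long version;
        final Set<Long> voters;

        MajorityConfig(long version, Long... voters) {
            this.version = version;
            this.voters = new HashSet<>(Arrays.asList(voters));
        }

        boolean containsQuorum(Set<Long> acks) {
            int count = 0;
            for (long sid : acks) {
                if (voters.contains(sid)) {
                    count++;
                }
            }
            return count > voters.size() / 2;
        }
    }

    private final List<MajorityConfig> configs = new ArrayList<>();
    private final Set<Long> acks = new HashSet<>();

    void addConfig(MajorityConfig config) { configs.add(config); }

    void addAck(long sid) { acks.add(sid); }

    // True only if the acks form a quorum in every registered config.
    boolean hasAllQuorums() {
        for (MajorityConfig config : configs) {
            if (!config.containsQuorum(acks)) {
                return false;
            }
        }
        return true;
    }

    // Mirrors the shape of the quoted fragment: always track the current
    // config, and also the last seen one if it has a newer version.
    static DualConfigVoteTracker forConfigs(MajorityConfig current, MajorityConfig lastSeen) {
        DualConfigVoteTracker tracker = new DualConfigVoteTracker();
        tracker.addConfig(current);
        if (lastSeen != null && lastSeen.version > current.version) {
            tracker.addConfig(lastSeen);
        }
        return tracker;
    }

    public static void main(String[] args) {
        MajorityConfig current = new MajorityConfig(1, 1L, 2L, 3L);
        MajorityConfig lastSeen = new MajorityConfig(2, 3L, 4L, 5L);
        DualConfigVoteTracker tracker = forConfigs(current, lastSeen);

        tracker.addAck(1L);
        tracker.addAck(2L);
        System.out.println(tracker.hasAllQuorums()); // false: quorum in the current config only

        tracker.addAck(3L);
        tracker.addAck(4L);
        System.out.println(tracker.hasAllQuorums()); // true: quorum in both configs
    }
}
```

The design point the sketch tries to capture is that a set of votes counts only when it satisfies every tracked configuration, which a plain set of server ids (the removed `HashSet<Long>` line) cannot express on its own.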
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066706#comment-14066706 ]

Hadoop QA commented on ZOOKEEPER-1807:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12656544/ZOOKEEPER-1807-ver7.patch
against trunk revision 1611732.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 87 new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2199//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2199//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2199//console

This message is automatically generated.

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Alexander Shraer
>            Priority: Blocker
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch,
> ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch,
> ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807-ver6.patch,
> ZOOKEEPER-1807-ver7.patch, ZOOKEEPER-1807.patch, notifications-loop.png
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066724#comment-14066724 ]

Flavio Junqueira commented on ZOOKEEPER-1807:
---------------------------------------------

+1, thanks, [~shralex]. Committed revision 1611765.

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Alexander Shraer
>            Priority: Blocker
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch,
> ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch,
> ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807-ver6.patch,
> ZOOKEEPER-1807-ver7.patch, ZOOKEEPER-1807.patch, notifications-loop.png
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064812#comment-14064812 ]

Hadoop QA commented on ZOOKEEPER-1807:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12656256/ZOOKEEPER-1807-ver6.patch
against trunk revision 1611309.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 87 new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2196//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2196//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2196//console

This message is automatically generated.

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Alexander Shraer
>            Priority: Blocker
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch,
> ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch,
> ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807-ver6.patch,
> ZOOKEEPER-1807.patch, notifications-loop.png
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065005#comment-14065005 ]

Flavio Junqueira commented on ZOOKEEPER-1807:
---------------------------------------------

I checked QCM (QuorumCnxManager), and it indeed only keeps trying to connect while it is executing lookForLeader, so your observation is right. The approach you propose seems right to me.

I also checked the test failure: the test that failed is NioNettySuiteHammerTest. It timed out, so it is hard to say whether it is related to this patch, although I've seen this same test failure with another patch recently. I'm not sure what's causing it, so we might want to consider committing this patch and trying to figure out the hammer test problem separately. What do you think?

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Alexander Shraer
>            Priority: Blocker
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch,
> ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch,
> ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807-ver6.patch,
> ZOOKEEPER-1807.patch, notifications-loop.png
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065087#comment-14065087 ]

Alexander Shraer commented on ZOOKEEPER-1807:
---------------------------------------------

Thanks for checking this. I ran the failing test locally twice and it passes. I'm not sure what happened with the findbugs failure; it doesn't look related to the patch. Maybe a different version of findbugs started running? If you think it's OK, we could commit it.

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Alexander Shraer
>            Priority: Blocker
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch,
> ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch,
> ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807-ver6.patch,
> ZOOKEEPER-1807.patch, notifications-loop.png
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065543#comment-14065543 ]

Flavio Junqueira commented on ZOOKEEPER-1807:

We should get this in. The findbugs problem isn't related to this patch, and it is appearing in every patch that has been submitted in the past few days. I had a look at the findbugs report in any case and couldn't see anything related to this patch.
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062797#comment-14062797 ]

Alexander Shraer commented on ZOOKEEPER-1807:

[~fpj], I'm trying to test your theory that new servers will continue to ping old ones until they connect. This scenario (described in my previous message) comes up in testNextConfigAlreadyActive in ReconfigRecoveryTest, which fails with the latest patch. It seems that servers 2, 3, and 4 try to contact 0 and 1, but only once or twice, and then stop trying. Do you know why this could be happening, or where the retry logic is implemented? The log below is everything I get with respect to connection attempts, even if I wait longer.

3 Opening channel to server 0
2 Opening channel to server 0
2 Cannot open channel to 0 at election address localhost/127.0.0.1:11223
3 Cannot open channel to 0 at election address localhost/127.0.0.1:11223
3 Opening channel to server 1
2 Opening channel to server 1
3 Cannot open channel to 1 at election address localhost/127.0.0.1:11226
3 Opening channel to server 2
2 Cannot open channel to 1 at election address localhost/127.0.0.1:11226
3 Connected to server 2
2 Opening channel to server 3
2 Connected to server 3
4 Opening channel to server 0
4 Cannot open channel to 0 at election address localhost/127.0.0.1:11223
4 Opening channel to server 1
4 Cannot open channel to 1 at election address localhost/127.0.0.1:11226
4 Opening channel to server 2
4 Connected to server 2
2 Opening channel to server 4
2 Connected to server 4
3 Opening channel to server 4
3 Connected to server 4
2 Opening channel to server 0
4 Opening channel to server 3
4 Connected to server 3
2 Cannot open channel to 0 at election address localhost/127.0.0.1:11223
2 Opening channel to server 1
2 Cannot open channel to 1 at election address localhost/127.0.0.1:11226
4 Opening channel to server 3
4 Connected to server 3
2 Opening channel to server 0
2 Cannot open channel to 0 at election address localhost/127.0.0.1:11223
2 Opening channel to server 1
2 Cannot open channel to 1 at election address localhost/127.0.0.1:11226
3 Opening channel to server 0
3 Cannot open channel to 0 at election address localhost/127.0.0.1:11223
3 Opening channel to server 1
3 Cannot open channel to 1 at election address localhost/127.0.0.1:11226
0 Opening channel to server 1
0 Cannot open channel to 1 at election address localhost/127.0.0.1:11226
0 Opening channel to server 1
0 Cannot open channel to 1 at election address localhost/127.0.0.1:11226
1 Opening channel to server 0
1 Connected to server 0
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058735#comment-14058735 ]

Flavio Junqueira commented on ZOOKEEPER-1807:

Ping, I don't think we have converged here yet, have we?
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972376#comment-13972376 ]

Alexander Shraer commented on ZOOKEEPER-1807:

The failing test raises a couple of interesting issues...

Mainly, I think there is a race between the completion of FLE, where we only require a quorum of the old config, and the establishment of a new leader, where we'd need both old and new quorums if we're recovering from a failed reconfig. It looks like we should ensure that we at least have a quorum of the new config before ending FLE and moving to the next stage, where we actually need this quorum.

Here are two scenarios where this seems important. Suppose we have A, B in the old config and A, B, C, D, E in the new one. Suppose A and B rebooted during the reconfig and now have to recover (commit or join the new config).

Case 1 (the failing test): C, D, E committed the reconfig. If A and B don't establish a connection to C, D, E before completing FLE, they won't find out that the new config was committed and will continuously try and fail to complete the reconfig (they'll fail because they won't get a quorum of the new config). It's sort of OK since C, D, E are up and running, and possibly C, D, E will eventually contact A and B, but perhaps we should avoid this scenario anyway. By ensuring that A and B talk with a quorum of the new config during FLE, we guarantee that they switch to the new config and don't try to establish a leader in the old one.

Case 2: C, D, E haven't committed the new config and are actually trying to connect to A and B. If A and B complete FLE before hearing from C, D, E, they may again end up giving up and returning to FLE because they have no quorum of the new config.

So perhaps we should send the notifications to the new config too and enforce having a quorum of the new config before FLE is complete...
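The quorum argument in the comment above (A, B in the old config; A through E in the new one) can be illustrated with a minimal sketch. It assumes simple majority quorums and uses hypothetical names (`JointQuorumSketch`, `hasMajority`, `hasJointQuorum`) that are not part of ZooKeeper; it only shows why hearing from A and B satisfies the old config but not the new one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: hypothetical helper, not ZooKeeper's QuorumVerifier.
// Assumes plain majority quorums over a set of voter ids.
final class JointQuorumSketch {

    /** True if strictly more than half of voters appear in heardFrom. */
    static boolean hasMajority(Set<Long> voters, Set<Long> heardFrom) {
        int count = 0;
        for (long sid : voters) {
            if (heardFrom.contains(sid)) {
                count++;
            }
        }
        return 2 * count > voters.size();
    }

    /** True only if heardFrom contains a majority of BOTH configs. */
    static boolean hasJointQuorum(Set<Long> oldVoters, Set<Long> newVoters,
                                  Set<Long> heardFrom) {
        return hasMajority(oldVoters, heardFrom)
                && hasMajority(newVoters, heardFrom);
    }

    public static void main(String[] args) {
        Set<Long> oldVoters = new HashSet<>(Arrays.asList(1L, 2L));             // A, B
        Set<Long> newVoters = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L, 5L)); // A..E
        Set<Long> heardFrom = new HashSet<>(Arrays.asList(1L, 2L));             // only A and B talk

        System.out.println("old-config quorum: " + hasMajority(oldVoters, heardFrom));
        System.out.println("joint quorum:      " + hasJointQuorum(oldVoters, newVoters, heardFrom));
    }
}
```

With only A and B heard from, an election that checks the old config alone could terminate, while a later stage that needs both quorums would still be stuck; adding any one of C, D, E to `heardFrom` satisfies both checks.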
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972903#comment-13972903 ]

Flavio Junqueira commented on ZOOKEEPER-1807:

bq. possibly C D E will eventually contact A and B

QCM in C, D, E will keep trying to send notifications to A and B, no? If so, they will learn of the new config as soon as one of C, D, E connects to them. It doesn't seem so bad, and perhaps it is not worth delaying FLE by enforcing a quorum of both old and new configs. What do you think?
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972120#comment-13972120 ]

Hadoop QA commented on ZOOKEEPER-1807:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12640553/ZOOKEEPER-1807-ver5.patch
  against trunk revision 1587818.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2046//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2046//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2046//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968548#comment-13968548 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:

Maybe you looked at the initial comments, which implied that Observers shouldn't receive notifications? I think what we agreed on in the end is:

https://reviews.apache.org/r/15317/

which I think aligns with what you said.
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968562#comment-13968562 ]

Alexander Shraer commented on ZOOKEEPER-1807:

Hi Flavio, Raul,

Before ZK-107 the line was

    for (QuorumServer server : self.getVotingView().values()) {

This patch basically brings this back. So if I understand correctly, this wasn't sending notifications to observers before. But - everyone will send notifications to followers, and if a follower receives a message it will respond directly, even to an observer.

My reasoning is that FLE terminates once we have a quorum of the last committed config. So we could only possibly need votes from followers in the last committed config, not from observers. Observers may contact followers through the same logic and get updated, but this is not enforced by the termination rule of FLE.

In the attached test the observer finds out that he really is a follower (whose vote is needed).

Alex
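As a rough illustration of the point above, the sketch below counts how many observer-to-observer notifications a sender loop would generate under the two target policies discussed in this thread: every known server versus voting members only. The class and method names are hypothetical and the model is deliberately simplified (it ignores replies and connection reuse); it is not ZooKeeper code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a toy model of notification targets, not FastLeaderElection.
final class NotificationTargetsSketch {

    /**
     * Counts ordered (sender, target) pairs in which both ends are observers,
     * given the set of servers every sender notifies.
     */
    static int observerToObserverPairs(Set<Long> observers, Set<Long> targets) {
        int pairs = 0;
        for (long sender : observers) {
            for (long target : targets) {
                if (target != sender && observers.contains(target)) {
                    pairs++;
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Set<Long> voters = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> observers = new HashSet<>(Arrays.asList(10L, 11L, 12L));

        // Policy 1: notify every known server (voters and observers).
        Set<Long> allKnown = new HashSet<>(voters);
        allKnown.addAll(observers);

        System.out.println("all known servers: "
                + observerToObserverPairs(observers, allKnown));
        // Policy 2: notify voting members only.
        System.out.println("voters only:       "
                + observerToObserverPairs(observers, voters));
    }
}
```

Under the voters-only policy an observer still reaches every follower, and a follower can answer it directly, so the observer can learn the election outcome without ever opening a connection to another observer.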
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968579#comment-13968579 ]

Flavio Junqueira commented on ZOOKEEPER-1807:

I'm actually getting it from the patch:

{noformat}
self.getVotingView().keySet()
{noformat}
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968638#comment-13968638 ]

Alexander Shraer commented on ZOOKEEPER-1807:

Yeah, and I think observers will indeed get a notification - they will sendNotifications to followers and the followers will respond directly to them. I meant that the termPredicate only needs followers to elect a leader.
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968654#comment-13968654 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---------------------------------------------------

Sorry guys - I totally mixed things up. FWIW, this is what we've been using for a couple of months now:

{noformat}
diff --git a/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java b/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderEle
index 9876c3d..38ae999 100644
--- a/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java
+++ b/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java
@@ -582,7 +582,7 @@ public class FastLeaderElection implements Election {
      * Send notifications to all peers upon a change in our vote
      */
     private void sendNotifications() {
-        for (long sid : self.getAllKnownServerIds()) {
+        for (long sid : self.getCurrentAndNextConfigVoters()) {
             QuorumVerifier qv = self.getQuorumVerifier();
             ToSend notmsg = new ToSend(ToSend.mType.notification,
                     proposedLeader,
diff --git a/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java b/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
index 8926a82..06cf7d4 100644
--- a/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
+++ b/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
@@ -1112,12 +1112,12 @@ public class QuorumPeer extends Thread implements QuorumStats.Provider {
         return getQuorumVerifier().getObservingMembers();
     }

-    public synchronized Set<Long> getAllKnownServerIds(){
-        Set<Long> tmp = new HashSet<Long>(getQuorumVerifier().getAllMembers().keySet());
+    public synchronized Set<Long> getCurrentAndNextConfigVoters(){
+        Set<Long> voterIds = new HashSet<Long>(getQuorumVerifier().getVotingMembers().keySet());
         if (getLastSeenQuorumVerifier()!=null) {
-            tmp.addAll(getLastSeenQuorumVerifier().getAllMembers().keySet());
+            voterIds.addAll(getLastSeenQuorumVerifier().getVotingMembers().keySet());
         }
-        return tmp;
+        return voterIds;
     }

     /**
{noformat}

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
> Key: ZOOKEEPER-1807
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
> Project: ZooKeeper
> Issue Type: Bug
> Reporter: Raul Gutierrez Segales
> Assignee: Alexander Shraer
> Priority: Blocker
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch,
> ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch,
> ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807.patch, notifications-loop.png
>
>
> Hey [~shralex],
> I noticed today that my Observers are spamming each other trying to open
> connections to the election port. I've got tons of these:
> {noformat}
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 9
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 10
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 6
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 12
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 14
> {noformat}
> and so and so on ad nauseam.
> Now, looking around I found this inside FastLeaderElection.java from when you
> committed ZOOKEEPER-107:
> {noformat}
>      private void sendNotifications() {
> -        for (QuorumServer server : self.getVotingView().values()) {
> -            long sid = server.id;
> -
> +        for (long sid : self.getAllKnownServerIds()) {
> +            QuorumVerifier qv = self.getQuorumVerifier();
> {noformat}
> Is that really desired? I suspect that is what's causing Observers to try to
> connect to each other (as opposed as just connecting to participants). I'll
> give it a try now and let you know. (Also, we use observer ids that are > 0,
> and I saw some parts of the code that might not deal with that assumption -
> so it could be that too..).
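
For illustration, the effect of the patch in this comment (notify only the voters of the current config plus the voters of the last seen config, rather than every known server) can be sketched without any ZooKeeper classes. This is a minimal sketch, not the committed code: the `Role` enum, the map-based configs and the class name `VoterNotificationSketch` are hypothetical stand-ins for `QuorumVerifier` and `QuorumPeer`.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class VoterNotificationSketch {
    /** Hypothetical stand-in for a server's role in a quorum config. */
    enum Role { PARTICIPANT, OBSERVER }

    /** Ids of the voting members of one config (server id -> role). */
    static Set<Long> votersOf(Map<Long, Role> config) {
        Set<Long> ids = new HashSet<Long>();
        for (Map.Entry<Long, Role> e : config.entrySet()) {
            if (e.getValue() == Role.PARTICIPANT) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    /**
     * Union of the voters in the current config and, if one has been seen,
     * in the next config. Observers are left out, so they are never targets
     * of election notifications.
     */
    static Set<Long> currentAndNextConfigVoters(Map<Long, Role> current,
                                                Map<Long, Role> next) {
        Set<Long> voterIds = votersOf(current);
        if (next != null) {
            voterIds.addAll(votersOf(next));
        }
        return voterIds;
    }

    public static void main(String[] args) {
        Map<Long, Role> current = new TreeMap<Long, Role>();
        current.put(1L, Role.PARTICIPANT);
        current.put(2L, Role.PARTICIPANT);
        current.put(13L, Role.OBSERVER);
        current.put(14L, Role.OBSERVER);

        // Hypothetical next config in which observer 13 becomes a participant.
        Map<Long, Role> next = new TreeMap<Long, Role>(current);
        next.put(13L, Role.PARTICIPANT);

        System.out.println("no reconfig pending: "
                + currentAndNextConfigVoters(current, null));
        System.out.println("reconfig pending:    "
                + currentAndNextConfigVoters(current, next));
    }
}
```

Taking the union with the next config matters for the case where a current observer is a participant in the new config: it still has to receive notifications during the transition, while plain observers drop out of the loop.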
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968053#comment-13968053 ]

Benjamin Reed commented on ZOOKEEPER-1807:
------------------------------------------

+1 looks good to me. it would be nice if [~fpj] gave it a glance though :)
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942447#comment-13942447 ]

Michi Mutsuzaki commented on ZOOKEEPER-1807:
--------------------------------------------

I changed the priority to 'blocker'. We should get this fixed in 3.5.0.
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942527#comment-13942527 ]

Hadoop QA commented on ZOOKEEPER-1807:
--------------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12612698/ZOOKEEPER-1807-ver5.patch
  against trunk revision 1577756.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1971//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1971//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1971//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838518#comment-13838518 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---------------------------------------------------

Guys, can we get some input here? cc: [~thawan], [~fpj]. Happy to trade some review karma :)
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823924#comment-13823924 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---------------------------------------------------

I am happy to give the RB a shipit but I would prefer to have more feedback/reviews from [~thawan] and [~fpj] since they are more familiar with the internals of FLE.
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816152#comment-13816152 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---------------------------------------------------

[~shralex]: do you think that, perhaps, adding a comment elaborating a bit more on the rationale of notifications and the state of the new/old config would be worthwhile? I am thinking the comment should be along sendNotifications().
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816157#comment-13816157 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---------------------------------------------------

(could we get a reviewboard for this? some inline comments below)

For:

{noformat}
+        // start server 3 with new config
+        zk[2] = new ZooKeeper("127.0.0.1:" + ports[2][2], ClientBase.CONNECTION_TIMEOUT, this);
{noformat}

I think the zk[2] assignment goes before the comment.

For:

{noformat}
+        for (int i=2; i<3; i++) {
+            Assert.assertTrue("waiting for server " + i + " being up",
+                    ClientBase.waitForServerUp("127.0.0.1:" + ports[i][2],
+                            CONNECTION_TIMEOUT * 2));
+            ReconfigTest.testServerHasConfig(zk[i], allServersNext, null);
+        }
{noformat}

i<= 3? Or no loop if you only want it to loop one time I guess.

Also the ports assignment loop and the currentQuorumCfgSection creation are repeated in testObserverConvertedToParticipantDuringFLE and testCurrentObserverIsParticipantInNewConfig; mind DRY-ing this up a bit by putting those in private methods? (i.e.: generatePorts() and generateInitialConfig() or some such).
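
The helper extraction suggested in the review comment above (generatePorts() and generateInitialConfig()) might look roughly like the following. This is only a sketch under stated assumptions: the class name, the simple port counter, the `numParticipants` parameter and the use of `localhost` are hypothetical, and the actual tests would presumably obtain ports from the test framework rather than from a counter. The `server.N=host:port:port:role;host:clientPort` line layout follows the dynamic configuration format.

```java
class ReconfigTestHelpersSketch {
    // Hypothetical port source; real tests would not hard-code a base port.
    private static int nextPort = 11221;

    /** Allocates three distinct ports (quorum, election, client) per server. */
    static int[][] generatePorts(int numServers) {
        int[][] ports = new int[numServers][3];
        for (int i = 0; i < numServers; i++) {
            for (int j = 0; j < 3; j++) {
                ports[i][j] = nextPort++;
            }
        }
        return ports;
    }

    /**
     * Builds one "server.N=..." line per server. The first numParticipants
     * servers are participants and the remaining ones are observers.
     */
    static String generateInitialConfig(int[][] ports, int numParticipants) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ports.length; i++) {
            String role = i < numParticipants ? "participant" : "observer";
            sb.append("server.").append(i)
              .append("=localhost:").append(ports[i][0])
              .append(':').append(ports[i][1])
              .append(':').append(role)
              .append(";localhost:").append(ports[i][2])
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[][] ports = generatePorts(3);
        // Two participants and one observer, as one possible initial layout.
        System.out.print(generateInitialConfig(ports, 2));
    }
}
```

Both tests could then call the two helpers instead of repeating the port loop and the config string building.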
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816173#comment-13816173 ]

Hadoop QA commented on ZOOKEEPER-1807:
--------------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12612635/ZOOKEEPER-1807-ver3.patch
  against trunk revision 1539529.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1749//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1749//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1749//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816316#comment-13816316 ]

Hadoop QA commented on ZOOKEEPER-1807:
--------------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12612664/ZOOKEEPER-1807-ver4.patch
  against trunk revision 1539529.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1750//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1750//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1750//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816797#comment-13816797 ]

Hadoop QA commented on ZOOKEEPER-1807:
--------------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12612698/ZOOKEEPER-1807-ver5.patch
  against trunk revision 1539529.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1751//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1751//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1751//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815511#comment-13815511 ]

Alexander Shraer commented on ZOOKEEPER-1807:
---------------------------------------------

Seems like I've already found and solved the same QuorumPeer NPE bug in ZOOKEEPER-1783, so once that one is committed I'll update the patch on this JIRA.

Observers spam each other creating connections to the election addr
-------------------------------------------------------------------

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Alexander Shraer
Fix For: 3.5.0
Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch, ZOOKEEPER-1807.patch, notifications-loop.png

Hey [~shralex],

I noticed today that my Observers are spamming each other trying to open connections to the election port. I've got tons of these:

{noformat}
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 9
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 10
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 6
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 12
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 14
{noformat}

and so on and so on, ad nauseam.

Now, looking around I found this inside FastLeaderElection.java from when you committed ZOOKEEPER-107:

{noformat}
private void sendNotifications() {
-for (QuorumServer server : self.getVotingView().values()) {
-long sid = server.id;
-
+for (long sid : self.getAllKnownServerIds()) {
+QuorumVerifier qv = self.getQuorumVerifier();
{noformat}

Is that really desired? I suspect that is what's causing Observers to try to connect to each other (as opposed to just connecting to participants). I'll give it a try now and let you know.

(Also, we use observer ids that are > 0, and I saw some parts of the code that might not deal with that assumption - so it could be that too..).

--
This message was sent by Atlassian JIRA
(v6.1#6144)
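Editor's illustration of the diff quoted in the issue description: the loop in sendNotifications() switched from the voting view to all known server ids, which widens the set of notification targets to include observers. The sketch below is a toy model under that reading, not ZooKeeper code; the class NotificationTargets, its method names, and the example ids are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, not ZooKeeper code: server ids are plain longs and the
// two views are passed in explicitly instead of being read from QuorumPeer.
final class NotificationTargets {

    // Before the quoted change: only voting members are notification targets.
    static Set<Long> votingOnly(Set<Long> participants) {
        return new TreeSet<>(participants);
    }

    // After the quoted change: every known server, observers included.
    static Set<Long> allKnown(Set<Long> participants, Set<Long> observers) {
        Set<Long> targets = new TreeSet<>(participants);
        targets.addAll(observers);
        return targets;
    }

    public static void main(String[] args) {
        // Example ids are made up for illustration.
        Set<Long> participants = new TreeSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> observers = new TreeSet<>(Arrays.asList(6L, 9L, 10L, 12L, 13L, 14L));
        System.out.println("voting view only: " + votingOnly(participants));
        System.out.println("all known ids:    " + allKnown(participants, observers));
    }
}
```

With the second variant an observer such as myid=13 ends up with every other observer in its target set, which is consistent with the "There is a connection already for server N" lines in the report.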
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813944#comment-13813944 ]

Germán Blanco commented on ZOOKEEPER-1807:
------------------------------------------

My son was playing with the keyboard yesterday and he assigned this JIRA to me. I hope this is the worst part of the mess.

Observers spam each other creating connections to the election addr
-------------------------------------------------------------------

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Alexander Shraer
Fix For: 3.5.0
Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807.patch, notifications-loop.png
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814516#comment-13814516 ]

Hadoop QA commented on ZOOKEEPER-1807:
--------------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612295/ZOOKEEPER-1807-ver2.patch
against trunk revision 1538853.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1745//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1745//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1745//console

This message is automatically generated.

Observers spam each other creating connections to the election addr
-------------------------------------------------------------------

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Alexander Shraer
Fix For: 3.5.0
Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch, ZOOKEEPER-1807.patch, notifications-loop.png
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813071#comment-13813071 ]

Alexander Shraer commented on ZOOKEEPER-1807:
---------------------------------------------

Probably there's not going to be any more of a loop than for participants. If you think this is not acceptable for observers, it would be sufficient to reply only when the sending server has a bigger config version (the one in QuorumVerifier) than the potential receiver. Otherwise there's no benefit for the receiver in terms of learning about new configs.

Observers spam each other creating connections to the election addr
-------------------------------------------------------------------

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
Fix For: 3.5.0
Attachments: ZOOKEEPER-1807.patch
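Editor's illustration of the rule suggested in the comment above (reply only when the replying server's config version is bigger than the receiver's). This is a minimal sketch, not the patch attached to this JIRA: the class ConfigVersionGate and the method shouldReply are hypothetical, and the config versions are modeled as plain longs rather than read from a QuorumVerifier.

```java
// Hypothetical sketch: the two arguments stand in for the config versions
// held by the would-be replier and by the server that would receive the reply.
final class ConfigVersionGate {

    // Reply only if the replier's config is strictly newer; with an equal or
    // older config the receiver would learn nothing about new configs.
    static boolean shouldReply(long senderConfigVersion, long receiverConfigVersion) {
        return senderConfigVersion > receiverConfigVersion;
    }

    public static void main(String[] args) {
        // Version numbers are made up for illustration.
        System.out.println(shouldReply(5L, 3L)); // newer config on the replier
        System.out.println(shouldReply(3L, 3L)); // same config on both sides
    }
}
```

Under this rule two observers that already share the same config version would stay silent toward each other, which is the case the report complains about.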
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813077#comment-13813077 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---------------------------------------------------

Thanks for the quick comment, Alex. Yeah, it sounds to me like that might be acceptable. Again, for huge deployments it might be a bit of a concern, since you'll be putting extra pressure on the cluster after, say, a big network partition. Thoughts?

Cc: [~thawan], [~fpj].

Observers spam each other creating connections to the election addr
-------------------------------------------------------------------

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
Fix For: 3.5.0
Attachments: ZOOKEEPER-1807.patch
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813085#comment-13813085 ]

Hadoop QA commented on ZOOKEEPER-1807:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12611988/ZOOKEEPER-1807.patch
against trunk revision 1535491.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
    Please justify why no new tests are needed for this patch.
    Also please list what manual steps were performed to verify this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1740//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1740//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1740//console

This message is automatically generated.

Observers spam each other creating connections to the election addr
-------------------------------------------------------------------

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
Fix For: 3.5.0
Attachments: ZOOKEEPER-1807.patch
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813108#comment-13813108 ]

Hadoop QA commented on ZOOKEEPER-1807:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12611999/notifications-loop.png
against trunk revision 1535491.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
    Please justify why no new tests are needed for this patch.
    Also please list what manual steps were performed to verify this patch.

    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1741//console

This message is automatically generated.

Observers spam each other creating connections to the election addr
-------------------------------------------------------------------

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Germán Blanco
Fix For: 3.5.0
Attachments: ZOOKEEPER-1807.patch, notifications-loop.png
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813111#comment-13813111 ]

Thawan Kooburat commented on ZOOKEEPER-1807:
--------------------------------------------

I believe we have a rather different concern when using a large number of observers. In our internal deployment, we did a few hacks which essentially kill all observer-to-observer communication. Observers only observe the result of the election algorithm. We also add a random delay when an observer tries to reconnect, so that participants have a chance to synchronize with the leader and form the quorum before the observers take away the leader's bandwidth.

My understanding is that with our leader election algorithm, you need to broadcast your vote whenever your current vote changes, so this will generate a lot of messages during the initial phase of the algorithm. Also, the N x N communication needed by LE is not going to scale for large deployments.

For me, I don't think promoting an observer to participant is going to be a common case (only needed for DR purposes), so it would be acceptable to have an optional flag to disable that feature in order to reduce LE overhead with a large number of observers.

Observers spam each other creating connections to the election addr
-------------------------------------------------------------------

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Germán Blanco
Fix For: 3.5.0
Attachments: ZOOKEEPER-1807.patch, notifications-loop.png
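Editor's illustration of the two points in the comment above: the random delay before an observer reconnects, and the cost of N x N election communication. This is a rough sketch under stated assumptions, not the internal hack the comment refers to; the class ObserverReconnect, its method names, the delay bound, and the cluster sizes are all hypothetical.

```java
import java.util.Random;

// Hypothetical sketch; none of these names exist in ZooKeeper.
final class ObserverReconnect {

    // Uniform random delay in [0, maxDelayMs], so that observers do not all
    // reconnect at the same instant. The bound is an assumed tuning knob.
    static int jitterMs(Random rng, int maxDelayMs) {
        return rng.nextInt(maxDelayMs + 1);
    }

    // Number of server pairs if every server may talk to every other one.
    static long fullMeshLinks(long servers) {
        return servers * (servers - 1) / 2;
    }

    // Number of pairs if observers only talk to participants (assumed model of
    // "no observer-to-observer communication").
    static long participantOnlyLinks(long participants, long observers) {
        return fullMeshLinks(participants) + participants * observers;
    }

    public static void main(String[] args) {
        // Example sizes are made up: 5 participants and 100 observers.
        System.out.println("full mesh pairs:        " + fullMeshLinks(105));
        System.out.println("participant-only pairs: " + participantOnlyLinks(5, 100));
        System.out.println("reconnect delay (ms):   " + jitterMs(new Random(), 2000));
    }
}
```

For 5 participants and 100 observers the full mesh has 105 * 104 / 2 = 5460 pairs, against 10 + 500 = 510 when observers only talk to participants, which gives a feel for why the quadratic term matters in large observer deployments.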
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813287#comment-13813287 ]

Alexander Shraer commented on ZOOKEEPER-1807:
---------------------------------------------

This part is described in Section 3.2 of the paper: https://www.usenix.org/system/files/conference/atc12/atc12-final74.pdf

Of course the paper doesn't talk about FastLeaderElection and things like that, so the actual implementation needs to have comments. It does have them in many places, but here we should probably explain some more.

Observers spam each other creating connections to the election addr
-------------------------------------------------------------------

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Germán Blanco
Fix For: 3.5.0
Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807.patch, notifications-loop.png
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813294#comment-13813294 ]

Hadoop QA commented on ZOOKEEPER-1807:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612023/ZOOKEEPER-1807-alex.patch
against trunk revision 1535491.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
    Please justify why no new tests are needed for this patch.
    Also please list what manual steps were performed to verify this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1742//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1742//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1742//console

This message is automatically generated.

Observers spam each other creating connections to the election addr
-------------------------------------------------------------------

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Germán Blanco
Fix For: 3.5.0
Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807.patch, notifications-loop.png
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811902#comment-13811902 ]

Alexander Shraer commented on ZOOKEEPER-1807:
---------------------------------------------

Thanks Raul. This seems like a bit of an overkill - you're eliminating observer to observer responses. It would be better to understand what causes it to spin, and to send notifications at the normal rate, as for participants.

Observers spam each other creating connections to the election addr
-------------------------------------------------------------------

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
Fix For: 3.5.0
Attachments: ZOOKEEPER-1807.patch
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811993#comment-13811993 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---------------------------------------------------

Well, if we really need observer to observer responses, for reconfig purposes I presume, then should we be sending them to observers not in LOOKING state? See the conditions that apply when responding to participants in the lines below my patch.

But even with that being correct, it might be too much overhead for large Observer deployments. Should this be optional?

Observers spam each other creating connections to the election addr
-------------------------------------------------------------------

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
Fix For: 3.5.0
Attachments: ZOOKEEPER-1807.patch
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812073#comment-13812073 ]

Alexander Shraer commented on ZOOKEEPER-1807:
---------------------------------------------

What if we remove the if(!self.getVotingView().containsKey(response.sid)){ and always run the else code?

Observers spam each other creating connections to the election addr
-------------------------------------------------------------------

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
Fix For: 3.5.0
Attachments: ZOOKEEPER-1807.patch
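Editor's illustration of the proposal in the comment above (drop the voting-view check and always take the else path). The sketch is a simplified model, not the FastLeaderElection source. Two things are assumptions made only for illustration: that the branch guarded by the voting-view check always replies to a non-voting sender, and that the else path replies only to a sender that is still LOOKING. The class ReplyPolicy and its methods are hypothetical.

```java
// Hypothetical model of the reply decision; not taken from ZooKeeper.
final class ReplyPolicy {

    enum State { LOOKING, FOLLOWING, LEADING, OBSERVING }

    // Assumed stand-in for the "else" path: reply only to a LOOKING sender.
    static boolean stateRule(State senderState) {
        return senderState == State.LOOKING;
    }

    // With the voting-view check: a sender outside the voting view (for
    // example an observer) is assumed to always get a reply.
    static boolean withVotingViewCheck(boolean senderInVotingView, State senderState) {
        if (!senderInVotingView) {
            return true;
        }
        return stateRule(senderState);
    }

    // Proposal from the comment: no special case, every sender goes through
    // the same state rule.
    static boolean withoutVotingViewCheck(State senderState) {
        return stateRule(senderState);
    }

    public static void main(String[] args) {
        System.out.println(withVotingViewCheck(false, State.OBSERVING));
        System.out.println(withoutVotingViewCheck(State.OBSERVING));
    }
}
```

In this model an observer that has already settled into OBSERVING stops triggering replies once the check is removed, while a LOOKING sender is still answered.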
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812074#comment-13812074 ]

Alexander Shraer commented on ZOOKEEPER-1807:

Regarding overhead: if I understand the else code correctly, it will only send a message if one of them is LOOKING, so I'm not sure that the overhead is excessive.

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812100#comment-13812100 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:

Yeah - I think you are right. In this ZOOKEEPER-107 world in which observers can be promoted, etc., the initial if() doesn't make sense anymore. I'll submit a new patch so we can think about it a bit more.

With regard to the overhead and making all of this optional: if you have 100 observers restarted at once you'll have a large amount of notification traffic. But I guess it stays within tolerable limits.

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812123#comment-13812123 ]

Hadoop QA commented on ZOOKEEPER-1807:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12611781/ZOOKEEPER-1807.patch
against trunk revision 1535491.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1735//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1735//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1735//console

This message is automatically generated.

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811771#comment-13811771 ]

Alexander Shraer commented on ZOOKEEPER-1807:

Hi Raul,

ZK-107 allows changing server roles. In one config a server is an observer, in the next one it may be a follower. I haven't looked closely, but I think the intention was to talk with everyone you know to try to get the most up-to-date config information.

Instead of reverting this to the previous code, consider adding a check (regardless of whether this is an observer/participant server) that won't attempt to create a connection if one is already there to the same server with the same election address (election addresses may change from one view to the next).

The code should handle observer id 0; please file a JIRA if you find that there is a problem somewhere.

Thanks,
Alex

-- This message was sent by Atlassian JIRA (v6.1#6144)
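The check Alex suggests above (skip the connection attempt when one already exists to the same server at the same election address) could look roughly like the sketch below. This is only an illustration: `ElectionConnectionRegistry` and `shouldConnect` are hypothetical names, not part of QuorumCnxManager, and the host names and ports are placeholders.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; this is not the actual QuorumCnxManager code. */
public class ElectionConnectionRegistry {
    // sid -> election address of the connection we currently consider open
    private final Map<Long, InetSocketAddress> open = new HashMap<>();

    /**
     * Returns true if a new connection should be opened: either there is no
     * connection to this sid yet, or its election address has changed.
     */
    public synchronized boolean shouldConnect(long sid, InetSocketAddress electionAddr) {
        InetSocketAddress current = open.get(sid);
        if (electionAddr.equals(current)) {
            return false; // same server, same election address: reuse it
        }
        open.put(sid, electionAddr); // remember the (possibly new) address
        return true;
    }

    public static void main(String[] args) {
        ElectionConnectionRegistry registry = new ElectionConnectionRegistry();
        InetSocketAddress a = InetSocketAddress.createUnresolved("observer9", 3888);
        InetSocketAddress b = InetSocketAddress.createUnresolved("observer9", 3889);
        System.out.println(registry.shouldConnect(9L, a)); // first attempt
        System.out.println(registry.shouldConnect(9L, a)); // same address again
        System.out.println(registry.shouldConnect(9L, b)); // address changed in a new view
    }
}
```

Keying on the (sid, election address) pair rather than on sid alone is what lets a reconfiguration that moves a server to a new election address still trigger a fresh connection.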
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811782#comment-13811782 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:

Oh - fair enough. So I suspect QuorumCnxManager isn't doing the right thing then. Will take a look. Thanks for the quick reply!

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811798#comment-13811798 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:

Actually - my initial assessment was wrong (the spammy "There is a connection already for server ..." message confused me). I am seeing excess traffic between Observers through the election port, but it's not due to connection attempts. I'll come back with the actual messages.

Sorry if this isn't actually related to ZOOKEEPER-107, [~shralex].

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811800#comment-13811800 ]

Flavio Junqueira commented on ZOOKEEPER-1807:

It would be good to understand if this is a bug that affects the 3.4 branch as well and if it is a blocker, [~rgs].

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811802#comment-13811802 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:

Yes - absolutely [~fpj]. The amount of traffic that I am seeing between Observers through the election port is... scary. I am still trying to figure out what is going on. Will be back in a bit when I have a proper analysis.

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811838#comment-13811838 ]

Thawan Kooburat commented on ZOOKEEPER-1807:

In our internal deployment, the host list in zoo.cfg for each observer only has the participants and itself. This helps address this issue a bit, but obviously, in the 3.5 world, this won't work if you want to promote an observer to a participant.

-- This message was sent by Atlassian JIRA (v6.1#6144)
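For readers unfamiliar with the workaround Thawan describes above, an observer's zoo.cfg listing only the participants and itself might look roughly like the fragment below. The server ids, host names and ports are placeholders chosen for illustration; they are not taken from the deployment discussed in this thread.

```
# zoo.cfg on observer 13 (hypothetical hosts and ids)
peerType=observer

# voting participants
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

# this observer only; the other observers are deliberately left out
server.13=obs13.example.com:2888:3888:observer
```

Because the other observers are not listed, this observer has no election address for them, which is also why the approach cannot work once an observer may be promoted to a participant.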
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811849#comment-13811849 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:

OK - this seems to actually be related to ZOOKEEPER-107, [~shralex]. I added some debug logging and I see that the spam, to all Observers, is the notifications:
{noformat}
2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 9, peerEpoch = 130, configData = [B@5a0c0ce6
2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 12, peerEpoch = 130, configData = [B@4d22fe39
2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 6, peerEpoch = 130, configData = [B@346077bf
2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 13, peerEpoch = 130, configData = [B@2955b776
2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 11, peerEpoch = 130, configData = [B@3a7fb92d
2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 14, peerEpoch = 130, configData = [B@1756575c
2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 13, peerEpoch = 130, configData = [B@258164fc
{noformat}
As you can see, it's sending tons of notifications per second. Not good :)

With this diff in FastLeaderElection.java (i.e., a revert of part of your change):
{noformat}
private void sendNotifications() {
-for (long sid : self.getAllKnownServerIds()) {
+for (QuorumServer server : self.getVotingView().values()) {
+long sid = server.id;
{noformat}
observers, of course, don't get spammed. I am guessing some condition is failing for Observers that assumes the notifications are fresh and sends them repeatedly?

-- This message was sent by Atlassian JIRA (v6.1#6144)
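To make the effect of the diff above concrete, here is a small stand-alone sketch of the two target sets that `sendNotifications()` would iterate over. It does not use the real QuorumPeer API; `NotificationTargets`, `votingOnly` and `allKnown` are hypothetical names. The observer ids are the sids that appear in the logs above, while the participant ids 1 and 2 are assumed (the logs only show leader = 3).

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy comparison of notification targets; not ZooKeeper code. */
public class NotificationTargets {
    /** Targets when only voting members are notified (the reverted loop). */
    public static Set<Long> votingOnly(Set<Long> participants) {
        return new TreeSet<>(participants);
    }

    /** Targets when every known server id is notified (the ZOOKEEPER-107 loop). */
    public static Set<Long> allKnown(Set<Long> participants, Set<Long> observers) {
        Set<Long> all = new TreeSet<>(participants);
        all.addAll(observers);
        return all;
    }

    public static void main(String[] args) {
        Set<Long> participants = Set.of(1L, 2L, 3L);                        // 1 and 2 are assumed
        Set<Long> observers = Set.of(6L, 9L, 10L, 11L, 12L, 13L, 14L);      // sids seen in the logs
        System.out.println("voting view only: " + votingOnly(participants));
        System.out.println("all known ids:    " + allKnown(participants, observers));
    }
}
```

With the voting-view loop an observer never addresses another observer, which matches the observation that the spam disappears after the partial revert.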
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811850#comment-13811850 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:

[~fpj]: I think this is 3.5.0 specific since it goes away when reverting those bits from ZOOKEEPER-107 (there is a chance I am overlooking something, of course, and it's some other thing). But this is most likely a blocker for the 3.5.0 release.

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811851#comment-13811851 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:

[~thawan]: should omitting the Observers from zoo.cfg actually make any difference? If so we should document it somewhere (unless it already is). In my case, where I do explicitly enumerate them, I don't get observer-to-observer connections on the election port once I remove the bits I mentioned above in FLE (so it seems to me it isn't).

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811858#comment-13811858 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---

I think what's happening is that when we send the initial notifications to all members, as opposed to just voting members as it was before, we set off a self-replicating cascade of notifications. Each Observer gets the notification and then, by virtue of:

{noformat}
/*
 * If it is from a non-voting server (such as an observer or
 * a non-voting follower), respond right away.
 */
if(!self.getVotingView().containsKey(response.sid)){
    .
}
{noformat}

it replies back to each Observer, and so on. So it sounds to me that this needs to match what we have in sendNotifications and actually check response.sid against self.getAllKnownServerIds() to avoid the endless echoing of notifications that I am seeing.

Thoughts [~shralex], [~fpj]?

Observers spam each other creating connections to the election addr
---

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
Fix For: 3.5.0

Hey [~shralex],

I noticed today that my Observers are spamming each other trying to open connections to the election port. I've got tons of these:

{noformat}
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 9
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 10
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 6
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 12
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 14
{noformat}

and so on and so on ad nauseam.

Now, looking around I found this inside FastLeaderElection.java from when you committed ZOOKEEPER-107:

{noformat}
 private void sendNotifications() {
-for (QuorumServer server : self.getVotingView().values()) {
-long sid = server.id;
-
+for (long sid : self.getAllKnownServerIds()) {
+QuorumVerifier qv = self.getQuorumVerifier();
{noformat}

Is that really desired? I suspect that is what's causing Observers to try to connect to each other (as opposed to just connecting to participants). I'll give it a try now and let you know. (Also, we use observer ids that are 0, and I saw some parts of the code that might not deal with that assumption - so it could be that too..).

--
This message was sent by Atlassian JIRA
(v6.1#6144)
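The echo described in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions, not ZooKeeper code: the class NotificationEchoSketch, the simulate method, the message representation and the delivery cap are all hypothetical. The only thing borrowed from the thread is the rule that a notification from a server outside the voting view is answered right away.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Toy model of the notification echo; hypothetical names, not ZooKeeper's classes.
class NotificationEchoSketch {

    /**
     * Counts deliveries caused by one round of initial notifications from
     * `sender` to `initialTargets`. Stops at `cap` so that an endless echo
     * still terminates in this sketch.
     */
    static int simulate(Set<Long> voters, Set<Long> initialTargets, long sender, int cap) {
        // Each message is a {from, to} pair of server ids.
        Deque<long[]> queue = new ArrayDeque<>();
        for (long sid : initialTargets) {
            if (sid != sender) {
                queue.add(new long[] {sender, sid});
            }
        }
        int delivered = 0;
        while (!queue.isEmpty() && delivered < cap) {
            long[] msg = queue.poll();
            long from = msg[0];
            long to = msg[1];
            delivered++;
            // Mirrors the quoted check: a sender that is not in the voting
            // view gets an immediate reply from the receiver.
            if (!voters.contains(from)) {
                queue.add(new long[] {to, from});
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        Set<Long> voters = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        // Assumption for illustration: ids 13 and 14 are observers.
        Set<Long> allKnown = new HashSet<>(Arrays.asList(1L, 2L, 3L, 13L, 14L));

        // Observer 13 notifies only the voting view: replies come back and stop.
        System.out.println("voting view only: " + simulate(voters, voters, 13L, 1000));
        // Observer 13 notifies every known id: 13 and 14 keep answering each other.
        System.out.println("all known ids:    " + simulate(voters, allKnown, 13L, 1000));
    }
}
```

In this model the first case settles after 6 deliveries (three notifications and three replies from voters), while the second only stops at the cap of 1000, because each observer treats the other as a non-voting sender and replies every time.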
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811865#comment-13811865 ]

Hadoop QA commented on ZOOKEEPER-1807:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12611737/ZOOKEEPER-1807.patch
against trunk revision 1535491.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733//console

This message is automatically generated.

Observers spam each other creating connections to the election addr
---

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
Fix For: 3.5.0
Attachments: ZOOKEEPER-1807.patch

Hey [~shralex],

I noticed today that my Observers are spamming each other trying to open connections to the election port. I've got tons of these:

{noformat}
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 9
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 10
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 6
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 12
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 14
{noformat}

and so on and so on ad nauseam.

Now, looking around I found this inside FastLeaderElection.java from when you committed ZOOKEEPER-107:

{noformat}
 private void sendNotifications() {
-for (QuorumServer server : self.getVotingView().values()) {
-long sid = server.id;
-
+for (long sid : self.getAllKnownServerIds()) {
+QuorumVerifier qv = self.getQuorumVerifier();
{noformat}

Is that really desired? I suspect that is what's causing Observers to try to connect to each other (as opposed to just connecting to participants). I'll give it a try now and let you know. (Also, we use observer ids that are 0, and I saw some parts of the code that might not deal with that assumption - so it could be that too..).

--
This message was sent by Atlassian JIRA
(v6.1#6144)