[jira] [Commented] (ZOOKEEPER-1804) Stat the realtime tps of zookeepr server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811080#comment-13811080 ]

Leader Ni commented on ZOOKEEPER-1804:

patch2 sample:

$ echo rwps | nc localhost 2181
RealTime R/W Statistics:
getChildren2: 0.5994005994005994
createSession: 1.6983016983016983
closeSession: 0.999000999000999
setData: 110.18981018981019
setWatches: 129.17082917082917
getChildren: 68.83116883116884
delete: 19.980019980019982
create: 22.27772227772228
exists: 1806.2937062937062
getDate: 729.5704295704296

Stat the realtime tps of zookeepr server
Key: ZOOKEEPER-1804
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1804
Project: ZooKeeper
Issue Type: Improvement
Components: server
Reporter: Leader Ni
Assignee: Leader Ni
Attachments: ZOOKEEPER-1804.patch

Currently, when we assess whether ZooKeeper can support a given business scenario, we always go by the number of subscribers, that is, the number of clients. However, many clients may be connected to ZooKeeper while doing nothing, whereas others perform complex business logic. So we need to measure the real-time TPS of ZooKeeper.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
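The `rwps` command exists only in the patch attached to this issue, not in released ZooKeeper. A minimal sketch of a Java client for it follows, assuming the server answers like the standard four-letter words (the client writes the command, then reads until the server closes the socket) and that each statistic arrives on its own `name: value` line. The class and method names (`RwpsClient`, `fetch`, `parse`) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical client for the patch-only "rwps" four-letter word. */
final class RwpsClient {
    private RwpsClient() {}

    /** Sends a four-letter word and returns everything the server writes back. */
    static String fetch(String host, int port, String cmd) throws IOException {
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(host, port), 5_000);
            sock.setSoTimeout(5_000);
            OutputStream out = sock.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Half-close the write side so the server sees the end of the request.
            sock.shutdownOutput();
            InputStream in = sock.getInputStream();
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Parses "name: value" lines into op -> ops/second, skipping the header line. */
    static Map<String, Double> parse(String response) {
        Map<String, Double> rates = new LinkedHashMap<>();
        for (String line : response.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a statistic line
            }
            String value = line.substring(colon + 1).trim();
            if (value.isEmpty()) {
                continue; // e.g. the "RealTime R/W Statistics:" header
            }
            rates.put(line.substring(0, colon).trim(), Double.parseDouble(value));
        }
        return rates;
    }
}
```

Against a server running the patch, `RwpsClient.parse(RwpsClient.fetch("localhost", 2181, "rwps"))` would return the per-operation rates in the order the server printed them.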
ZooKeeper_branch33_solaris - Build # 694 - Failure
See https://builds.apache.org/job/ZooKeeper_branch33_solaris/694/

### LAST 60 LINES OF THE CONSOLE ###
Started by timer
Building remotely on solaris1 in workspace /export/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris
FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
	at hudson.remoting.Request.call(Request.java:174)
	at hudson.remoting.Channel.call(Channel.java:714)
	at hudson.EnvVars.getRemote(EnvVars.java:212)
	at hudson.model.Computer.getEnvironment(Computer.java:909)
	at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:40)
	at hudson.model.Job.getEnvironment(Job.java:374)
	at hudson.model.AbstractProject.getEnvironment(AbstractProject.java:354)
	at hudson.model.Run.getEnvironment(Run.java:2111)
	at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:911)
	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:776)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1411)
	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:657)
	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:562)
	at hudson.model.Run.execute(Run.java:1603)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:246)
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.Request.abort(Request.java:299)
	at hudson.remoting.Channel.terminate(Channel.java:774)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at hudson.remoting.Command.readFrom(Command.java:92)
	at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

### FAILED TESTS (if any) ###
No tests ran.
[jira] [Updated] (ZOOKEEPER-1804) Stat the realtime tps of zookeepr server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leader Ni updated ZOOKEEPER-1804:

Description:
Currently, when we assess whether ZooKeeper can support a given business scenario, we always go by the number of subscribers, that is, the number of clients. However, many clients may be connected to ZooKeeper while doing nothing, whereas others perform complex business logic. So we need to measure the real-time TPS of ZooKeeper.

[-Solution---]
Solution 1: If you only want to know the real-time rate of processed transactions, you can use the patch ZOOKEEPER-1804.patch.
Solution 2: If you also want to know how clients use ZooKeeper, and the real-time reads/writes per second of each ZooKeeper client, you can use the patch ZOOKEEPER-1804-2.patch.
Sample:
$ echo rwps | nc localhost 2181
RealTime R/W Statistics:
getChildren2: 0.5994005994005994
createSession: 1.6983016983016983
closeSession: 0.999000999000999
setData: 110.18981018981019
setWatches: 129.17082917082917
getChildren: 68.83116883116884
delete: 19.980019980019982
create: 22.27772227772228
exists: 1806.2937062937062
getDate: 729.5704295704296

was:
Currently, when we assess whether ZooKeeper can support a given business scenario, we always go by the number of subscribers, that is, the number of clients. However, many clients may be connected to ZooKeeper while doing nothing, whereas others perform complex business logic. So we need to measure the real-time TPS of ZooKeeper.
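Every figure in the sample output matches a whole-number count divided by 10.01 (for example, 6 / 10.01 = 0.5994005994... and 10 / 10.01 = 0.999000999...), which is consistent with per-operation counts averaged over a window of roughly 10 seconds, although the issue does not say how either patch computes them. A minimal sketch of that kind of counter follows; `OpRateCounter` and its methods are hypothetical and are not taken from ZOOKEEPER-1804.patch or ZOOKEEPER-1804-2.patch.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-operation rate counter (not the ZOOKEEPER-1804 patch code). */
final class OpRateCounter {
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();
    private long windowStartMillis;

    OpRateCounter(long nowMillis) {
        this.windowStartMillis = nowMillis;
    }

    /** Called once per processed request, e.g. record("exists"). */
    void record(String op) {
        counts.computeIfAbsent(op, k -> new LongAdder()).increment();
    }

    /**
     * Returns operations per second for every op seen so far, computed as
     * count * 1000 / elapsed milliseconds, and starts a new window.
     * Requests recorded while this runs may land in either window.
     */
    synchronized Map<String, Double> snapshotAndReset(long nowMillis) {
        long elapsedMillis = Math.max(1L, nowMillis - windowStartMillis);
        Map<String, Double> rates = new TreeMap<>();
        for (Map.Entry<String, LongAdder> e : counts.entrySet()) {
            rates.put(e.getKey(), e.getValue().sumThenReset() * 1000.0 / elapsedMillis);
        }
        windowStartMillis = nowMillis;
        return rates;
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a server would pass `System.currentTimeMillis()`. `LongAdder` is used because many request threads may increment the same counter at once. Per-client figures, as in Solution 2, could reuse the same structure keyed by session id and operation.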
[jira] [Commented] (ZOOKEEPER-1798) Fix race condition in testNormalObserverRun
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811090#comment-13811090 ]

Thawan Kooburat commented on ZOOKEEPER-1798:

Just for the record, this test is not known to be flaky in our internal Jenkins (which tests our internal branch). I am able to repro this on my Mac (Java 1.7.0_15, OSX 10.7.5).

When this happens, it looks like the txnlog doesn't have any valid content in it, so the zkdb that we load after shutting down the observer never has the txn that sets its znode to data2. I also modified the test to leave the data files around and tried to load them manually after the test failed. The txnlog loaded successfully with the right content.

I am thinking that the data flushed to disk by one thread is not visible to the other thread even after thread.join() is called in between. However, this really seems unlikely. When I ran the same test on our production host, I could not repro the issue (yet).

In Patrick's log, this is slightly different. The test failed at line 1105, which means that the first txn in the txnlog is read correctly, but not the second one.
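For context on that hypothesis: `join()` only returns after the writer thread has finished, and the Java Language Specification (section 17.4.5) additionally makes all actions in a thread happen-before another thread's successful return from `join()` on it. So bytes a writer thread has handed to the operating system before exiting should be readable by the joining thread. What this does not cover is data still held in a user-space buffer, such as a `BufferedOutputStream` that was never flushed. The minimal sketch below shows the pattern in isolation; the class and method names are hypothetical and it does not reproduce the ZooKeeper test.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical demo: write a file in one thread, join it, read the file in another. */
final class JoinVisibilityDemo {
    private JoinVisibilityDemo() {}

    static String roundTrip(String content) {
        try {
            Path file = Files.createTempFile("txnlog-demo", ".bin");
            try {
                AtomicReference<IOException> failure = new AtomicReference<>();
                Thread writer = new Thread(() -> {
                    try {
                        // Files.write opens, writes and closes the file, so nothing
                        // is left in a user-space buffer when the thread exits.
                        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
                    } catch (IOException e) {
                        failure.set(e);
                    }
                });
                writer.start();
                writer.join(); // the writer's actions happen-before join() returns
                if (failure.get() != null) {
                    throw failure.get();
                }
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining writer", e);
        }
    }
}
```

A passing run proves nothing about the flaky test; the sketch only illustrates the ordering guarantee the hypothesis would have to violate.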
Fix race condition in testNormalObserverRun
Key: ZOOKEEPER-1798
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1798
Project: ZooKeeper
Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Thawan Kooburat
Priority: Blocker
Fix For: 3.4.6, 3.5.0
Attachments: TEST-org.apache.zookeeper.server.quorum.Zab1_0Test.txt, ZOOKEEPER-1798-b3.4.patch, ZOOKEEPER-1798-b3.4.patch, ZOOKEEPER-1798-b3.4.patch, ZOOKEEPER-1798.patch, ZOOKEEPER-1798.patch

This is the output message:
{noformat}
Testcase: testNormalObserverRun took 4.221 sec
	FAILED
expected:data[2] but was:data[1]
junit.framework.AssertionFailedError: expected:data[2] but was:data[1]
	at org.apache.zookeeper.server.quorum.Zab1_0Test$8.converseWithObserver(Zab1_0Test.java:1118)
	at org.apache.zookeeper.server.quorum.Zab1_0Test.testObserverConversation(Zab1_0Test.java:546)
	at org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalObserverRun(Zab1_0Test.java:994)
{noformat}
[jira] [Commented] (ZOOKEEPER-1805) Don't care value in ZooKeeper election breaks rolling upgrades
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811092#comment-13811092 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1805:

+1.

Don't care value in ZooKeeper election breaks rolling upgrades
Key: ZOOKEEPER-1805
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1805
Project: ZooKeeper
Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1805-b3.4.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch

This issue was originally reported in ZOOKEEPER-1732.
ZooKeeper_branch34_solaris - Build # 696 - Still Failing
See https://builds.apache.org/job/ZooKeeper_branch34_solaris/696/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 156397 lines...]
[junit] 2013-11-01 07:56:17,827 [myid:] - INFO [Thread-4:NIOServerCnxn@997] - Closed socket connection for client /127.0.0.1:51448 (no session established for client)
[junit] 2013-11-01 07:56:17,827 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-11-01 07:56:17,828 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree
[junit] 2013-11-01 07:56:17,829 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-11-01 07:56:17,829 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port
[junit] 2013-11-01 07:56:17,829 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-11-01 07:56:17,829 [myid:] - INFO [main:ClientBase@421] - STOPPING server
[junit] 2013-11-01 07:56:17,830 [myid:] - INFO [main:ZooKeeperServer@441] - shutting down
[junit] 2013-11-01 07:56:17,830 [myid:] - INFO [main:SessionTrackerImpl@225] - Shutting down
[junit] 2013-11-01 07:56:17,830 [myid:] - INFO [main:PrepRequestProcessor@761] - Shutting down
[junit] 2013-11-01 07:56:17,830 [myid:] - INFO [main:SyncRequestProcessor@209] - Shutting down
[junit] 2013-11-01 07:56:17,830 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop!
[junit] 2013-11-01 07:56:17,830 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@187] - SyncRequestProcessor exited!
[junit] 2013-11-01 07:56:17,831 [myid:] - INFO [main:FinalRequestProcessor@415] - shutdown of request processor complete
[junit] 2013-11-01 07:56:17,831 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 07:56:17,831 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[]
[junit] 2013-11-01 07:56:17,832 [myid:] - INFO [main:ClientBase@414] - STARTING server
[junit] 2013-11-01 07:56:17,833 [myid:] - INFO [main:ZooKeeperServer@162] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch34_solaris/trunk/build/test/tmp/test2587819802444114277.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch34_solaris/trunk/build/test/tmp/test2587819802444114277.junit.dir/version-2
[junit] 2013-11-01 07:56:17,833 [myid:] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-11-01 07:56:17,835 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 07:56:17,836 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:51450
[junit] 2013-11-01 07:56:17,836 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@817] - Processing stat command from /127.0.0.1:51450
[junit] 2013-11-01 07:56:17,836 [myid:] - INFO [Thread-5:NIOServerCnxn$StatCommand@653] - Stat command output
[junit] 2013-11-01 07:56:17,837 [myid:] - INFO [Thread-5:NIOServerCnxn@997] - Closed socket connection for client /127.0.0.1:51450 (no session established for client)
[junit] 2013-11-01 07:56:17,837 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-11-01 07:56:17,838 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree
[junit] 2013-11-01 07:56:17,838 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-11-01 07:56:17,838 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port
[junit] 2013-11-01 07:56:17,838 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-11-01 07:56:17,838 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2013-11-01 07:56:17,839 [myid:] - INFO [main:ClientBase@451] - tearDown starting
[junit] 2013-11-01 07:56:17,921 [myid:] - INFO [main:ZooKeeper@684] - Session: 0x14212a9a76d closed
[junit] 2013-11-01 07:56:17,921 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@509] - EventThread shut down
[junit] 2013-11-01 07:56:17,921 [myid:] - INFO [main:ClientBase@421] - STOPPING server
[junit] 2013-11-01 07:56:17,923 [myid:] - INFO [main:ZooKeeperServer@441] - shutting down
[junit] 2013-11-01 07:56:17,923
ZooKeeper-trunk-solaris - Build # 718 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/718/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 222312 lines...]
[junit] 2013-11-01 09:04:58,692 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method
[junit] 2013-11-01 09:04:58,693 [myid:] - INFO [main:ZooKeeperServer@428] - shutting down
[junit] 2013-11-01 09:04:58,693 [myid:] - INFO [main:SessionTrackerImpl@183] - Shutting down
[junit] 2013-11-01 09:04:58,693 [myid:] - INFO [main:PrepRequestProcessor@972] - Shutting down
[junit] 2013-11-01 09:04:58,694 [myid:] - INFO [main:SyncRequestProcessor@190] - Shutting down
[junit] 2013-11-01 09:04:58,694 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@156] - PrepRequestProcessor exited loop!
[junit] 2013-11-01 09:04:58,694 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@168] - SyncRequestProcessor exited!
[junit] 2013-11-01 09:04:58,694 [myid:] - INFO [main:FinalRequestProcessor@442] - shutdown of request processor complete
[junit] 2013-11-01 09:04:58,695 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 09:04:58,695 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[]
[junit] 2013-11-01 09:04:58,696 [myid:] - INFO [main:ClientBase@414] - STARTING server
[junit] 2013-11-01 09:04:58,696 [myid:] - INFO [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test5069164875118837083.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test5069164875118837083.junit.dir/version-2
[junit] 2013-11-01 09:04:58,697 [myid:] - INFO [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 kB direct buffers.
[junit] 2013-11-01 09:04:58,697 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-11-01 09:04:58,698 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test5069164875118837083.junit.dir/version-2/snapshot.b
[junit] 2013-11-01 09:04:58,700 [myid:] - INFO [main:FileTxnSnapLog@297] - Snapshotting: 0xb to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test5069164875118837083.junit.dir/version-2/snapshot.b
[junit] 2013-11-01 09:04:58,702 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 09:04:58,702 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:62958
[junit] 2013-11-01 09:04:58,703 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@828] - Processing stat command from /127.0.0.1:62958
[junit] 2013-11-01 09:04:58,703 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@677] - Stat command output
[junit] 2013-11-01 09:04:58,704 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@999] - Closed socket connection for client /127.0.0.1:62958 (no session established for client)
[junit] 2013-11-01 09:04:58,704 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-11-01 09:04:58,705 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree
[junit] 2013-11-01 09:04:58,705 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-11-01 09:04:58,705 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port
[junit] 2013-11-01 09:04:58,706 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-11-01 09:04:58,706 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2013-11-01 09:04:58,706 [myid:] - INFO [main:ClientBase@451] - tearDown starting
[junit] 2013-11-01 09:04:58,777 [myid:] - INFO [main:ZooKeeper@777] - Session: 0x14212e88860 closed
[junit] 2013-11-01 09:04:58,778 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down
[junit] 2013-11-01 09:04:58,778 [myid:] - INFO [main:ClientBase@421] - STOPPING server
[junit] 2013-11-01 09:04:58,778 [myid:] - INFO
ZooKeeper-3.4-WinVS2008_java - Build # 340 - Still Failing
See https://builds.apache.org/job/ZooKeeper-3.4-WinVS2008_java/340/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 213188 lines...]
[junit] 2013-11-01 10:12:28,108 [myid:] - INFO [main:ZooKeeperServer@441] - shutting down
[junit] 2013-11-01 10:12:28,108 [myid:] - INFO [main:SessionTrackerImpl@225] - Shutting down
[junit] 2013-11-01 10:12:28,209 [myid:] - INFO [main:PrepRequestProcessor@761] - Shutting down
[junit] 2013-11-01 10:12:28,209 [myid:] - INFO [main:SyncRequestProcessor@209] - Shutting down
[junit] 2013-11-01 10:12:28,209 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop!
[junit] 2013-11-01 10:12:28,209 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@187] - SyncRequestProcessor exited!
[junit] 2013-11-01 10:12:28,210 [myid:] - INFO [main:FinalRequestProcessor@415] - shutdown of request processor complete
[junit] 2013-11-01 10:12:28,309 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 10:12:29,300 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[]
[junit] 2013-11-01 10:12:29,301 [myid:] - INFO [main:ClientBase@414] - STARTING server
[junit] 2013-11-01 10:12:29,301 [myid:] - INFO [main:ZooKeeperServer@162] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test1011140216037316854.junit.dir\version-2 snapdir f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test1011140216037316854.junit.dir\version-2
[junit] 2013-11-01 10:12:29,316 [myid:] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-11-01 10:12:29,320 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 10:12:29,321 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:65321
[junit] 2013-11-01 10:12:29,321 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@817] - Processing stat command from /127.0.0.1:65321
[junit] 2013-11-01 10:12:29,416 [myid:] - INFO [Thread-5:NIOServerCnxn$StatCommand@653] - Stat command output
[junit] 2013-11-01 10:12:29,418 [myid:] - INFO [Thread-5:NIOServerCnxn@997] - Closed socket connection for client /127.0.0.1:65321 (no session established for client)
[junit] 2013-11-01 10:12:29,418 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-11-01 10:12:29,420 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree
[junit] 2013-11-01 10:12:29,420 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-11-01 10:12:29,516 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port
[junit] 2013-11-01 10:12:29,516 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-11-01 10:12:29,516 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2013-11-01 10:12:29,516 [myid:] - INFO [main:ClientBase@451] - tearDown starting
[junit] 2013-11-01 10:12:29,815 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@968] - Opening socket connection to server 127.0.0.1/127.0.0.1:11221. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
[junit] 2013-11-01 10:12:29,816 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@849] - Socket connection established to 127.0.0.1/127.0.0.1:11221, initiating session
[junit] 2013-11-01 10:12:29,816 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:65327
[junit] 2013-11-01 10:12:29,820 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:ZooKeeperServer@861] - Client attempting to renew session 0x14213264b5c at /127.0.0.1:65327
[junit] 2013-11-01 10:12:29,821 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:ZooKeeperServer@617] - Established session 0x14213264b5c with negotiated timeout 3 for client /127.0.0.1:65327
[junit] 2013-11-01 10:12:29,821 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1228] - Session establishment complete on server 127.0.0.1/127.0.0.1:11221, sessionid = 0x14213264b5c, negotiated timeout = 3
[junit] 2013-11-01 10:12:29,920 [myid:] - INFO [ProcessThread(sid:0
ZooKeeper-trunk-WinVS2008_java - Build # 588 - Failure
See https://builds.apache.org/job/ZooKeeper-trunk-WinVS2008_java/588/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 283707 lines...]
[junit] 2013-11-01 10:26:48,001 [myid:] - INFO [SessionTracker:SessionTrackerImpl@134] - SessionTrackerImpl exited loop!
[junit] 2013-11-01 10:26:48,000 [myid:] - INFO [SessionTracker:SessionTrackerImpl@134] - SessionTrackerImpl exited loop!
[junit] 2013-11-01 10:26:48,093 [myid:] - INFO [main:FinalRequestProcessor@442] - shutdown of request processor complete
[junit] 2013-11-01 10:26:48,094 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 10:26:48,608 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1008] - Opening socket connection to server 127.0.0.1/127.0.0.1:11221. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
[junit] 2013-11-01 10:26:49,093 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[]
[junit] 2013-11-01 10:26:49,094 [myid:] - INFO [main:ClientBase@414] - STARTING server
[junit] 2013-11-01 10:26:49,094 [myid:] - INFO [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test4049243956898429553.junit.dir\version-2 snapdir f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test4049243956898429553.junit.dir\version-2
[junit] 2013-11-01 10:26:49,099 [myid:] - INFO [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 1 selector thread(s), 4 worker threads, and 64 kB direct buffers.
[junit] 2013-11-01 10:26:49,100 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-11-01 10:26:49,102 [myid:] - INFO [main:FileSnap@83] - Reading snapshot f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test4049243956898429553.junit.dir\version-2\snapshot.b
[junit] 2013-11-01 10:26:49,105 [myid:] - INFO [main:FileTxnSnapLog@297] - Snapshotting: 0xb to f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test4049243956898429553.junit.dir\version-2\snapshot.b
[junit] 2013-11-01 10:26:49,106 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-11-01 10:26:49,110 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:51549
[junit] 2013-11-01 10:26:49,111 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@828] - Processing stat command from /127.0.0.1:51549
[junit] 2013-11-01 10:26:49,112 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@677] - Stat command output
[junit] 2013-11-01 10:26:49,112 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@999] - Closed socket connection for client /127.0.0.1:51549 (no session established for client)
[junit] 2013-11-01 10:26:49,208 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-11-01 10:26:49,210 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree
[junit] 2013-11-01 10:26:49,210 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-11-01 10:26:49,210 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port
[junit] 2013-11-01 10:26:49,210 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-11-01 10:26:49,211 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2013-11-01 10:26:49,211 [myid:] - INFO [main:ClientBase@451] - tearDown starting
[junit] 2013-11-01 10:26:49,598 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@882] - Socket connection established to 127.0.0.1/127.0.0.1:11221, initiating session
[junit] 2013-11-01 10:26:49,598 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:51544
[junit] 2013-11-01 10:26:49,599 [myid:] - INFO [NIOWorkerThread-2:ZooKeeperServer@858] - Client attempting to renew session 0x142133367d1 at /127.0.0.1:51544
[junit] 2013-11-01 10:26:49,615 [myid:] - INFO [NIOWorkerThread-2:ZooKeeperServer@604] - Established session 0x142133367d1 with negotiated timeout 3 for client /127.0.0.1:51544
[junit] 2013-11-01 10:26:49,616 [myid:] - INFO
[jira] [Commented] (ZOOKEEPER-1805) Don't care value in ZooKeeper election breaks rolling upgrades
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811263#comment-13811263 ]

Flavio Junqueira commented on ZOOKEEPER-1805:

Thanks, guys.

bq. Only a very mild suggestion, if you wish you could remove the changes introduced by ZOOKEEPER-1732 in Leader.java, Learner.java and QuorumPeer.java.

I don't understand this comment. The change here simply detects that there is a mix of messages with and without don't care values, which must correspond to a rolling upgrade, so it ignores the corresponding fields and only verifies that the epoch is greater. If this is right, then the changes of ZOOKEEPER-1732 are still required when everyone is sending don't care values. Am I missing anything?
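Flavio's description of the check can be read as the following minimal sketch. Everything in it is an assumption for illustration: the `ElectionMsg` type, the `DONT_CARE` sentinel value, and the `fullCheck` parameter, which stands in for the comparison used when all servers send the same kind of message (the thread does not spell that comparison out). It is not the code in the ZOOKEEPER-1805 patches.

```java
import java.util.function.BiPredicate;

/** Hypothetical election message: an epoch plus a field that may hold "don't care". */
final class ElectionMsg {
    final long epoch;
    final long field;

    ElectionMsg(long epoch, long field) {
        this.epoch = epoch;
        this.field = field;
    }
}

final class RollingUpgradeCheck {
    /** Hypothetical sentinel meaning "don't care". */
    static final long DONT_CARE = -1L;

    private RollingUpgradeCheck() {}

    static boolean hasDontCare(ElectionMsg m) {
        return m.field == DONT_CARE;
    }

    /**
     * If exactly one of the two messages carries a don't-care value, treat it as a
     * rolling upgrade: ignore the field and only require a greater epoch.
     * Otherwise defer to the regular comparison supplied by the caller.
     */
    static boolean accept(ElectionMsg received, ElectionMsg current,
                          BiPredicate<ElectionMsg, ElectionMsg> fullCheck) {
        boolean mixed = hasDontCare(received) != hasDontCare(current);
        if (mixed) {
            return received.epoch > current.epoch;
        }
        return fullCheck.test(received, current);
    }
}
```

In this sketch, delegating the same-format case leaves the existing logic untouched when every server sends don't-care values, which corresponds to Flavio's point that the ZOOKEEPER-1732 changes are still needed in that situation.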
[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811314#comment-13811314 ]

Benjamin Reed commented on ZOOKEEPER-1742:

I propose that I open another issue to stop building these tests on the Mac, do a patch there, and then lower the priority of this issue. Does that sound OK?

make check doesn't work on macos
Key: ZOOKEEPER-1742
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1742
Project: ZooKeeper
Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Benjamin Reed
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742.patch

There are two problems I have spotted when running make check with the C client. First, it complains that the sleep call is not defined in two test files: tests/ZooKeeperQuorumServer.cc and tests/TestReconfigServer.cc. Including unistd.h fixes it. The second problem is with linker options: it complains that --wrap is not a valid option. I'm not sure how to deal with this one yet, since I'm not sure why we are using it.
[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811391#comment-13811391 ]

Flavio Junqueira commented on ZOOKEEPER-1742:

If we disable the tests on Mac OS, aren't we exposing ourselves to trouble? The tests are there for a reason, although, granted, I can't remember what these tests do, so it is not clear how critical it is that we run them on Mac OS. Just so that I understand: would it be hard to fix the tests so that they run on Mac OS? They run fine on Ubuntu now, is that right?
[jira] [Created] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
Raul Gutierrez Segales created ZOOKEEPER-1807: - Summary: Observers spam each other creating connections to the election addr Key: ZOOKEEPER-1807 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807 Project: ZooKeeper Issue Type: Bug Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Hey [~shralex], I noticed today that my Observers are spamming each other trying to open connections to the election port. I've got tons of these:
{noformat}
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 9
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 10
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 6
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 12
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 14
{noformat}
and so on, ad nauseam. Now, looking around, I found this inside FastLeaderElection.java from when you committed ZOOKEEPER-107:
{noformat}
 private void sendNotifications() {
-    for (QuorumServer server : self.getVotingView().values()) {
-        long sid = server.id;
-
+    for (long sid : self.getAllKnownServerIds()) {
+        QuorumVerifier qv = self.getQuorumVerifier();
{noformat}
Is that really desired? I suspect that is what's causing Observers to try to connect to each other (as opposed to just connecting to participants). I'll give it a try now and let you know. (Also, we use observer ids that are 0, and I saw some parts of the code that might not deal with that assumption, so it could be that too.) -- This message was sent by Atlassian JIRA (v6.1#6144)
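The diff in the description changes which server ids sendNotifications() iterates over. A minimal sketch of the two recipient sets, assuming a simplified configuration that maps each server id to a role; the `NotificationRecipients` class, its `Role` enum and both helpers are hypothetical stand-ins for `getVotingView()` and `getAllKnownServerIds()`, not ZooKeeper's real API:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of who receives election notifications; not ZooKeeper code.
class NotificationRecipients {
    enum Role { PARTICIPANT, OBSERVER }

    // Before ZOOKEEPER-107: iterate the voting view, i.e. participants only.
    static Set<Long> votingViewIds(Map<Long, Role> config) {
        Set<Long> sids = new TreeSet<>();
        for (Map.Entry<Long, Role> e : config.entrySet()) {
            if (e.getValue() == Role.PARTICIPANT) {
                sids.add(e.getKey());
            }
        }
        return sids;
    }

    // After ZOOKEEPER-107: every known server id, observers included.
    static Set<Long> allKnownServerIds(Map<Long, Role> config) {
        return new TreeSet<>(config.keySet());
    }

    public static void main(String[] args) {
        Map<Long, Role> config = new TreeMap<>();
        config.put(1L, Role.PARTICIPANT);
        config.put(2L, Role.PARTICIPANT);
        config.put(3L, Role.PARTICIPANT);
        config.put(9L, Role.OBSERVER);
        config.put(13L, Role.OBSERVER);
        System.out.println("voting view: " + votingViewIds(config));     // [1, 2, 3]
        System.out.println("all known:   " + allKnownServerIds(config)); // [1, 2, 3, 9, 13]
    }
}
```

In this model the observers (9 and 13) only appear in the second set, which is consistent with observers starting to receive election traffic from each other after the change.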
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811771#comment-13811771 ] Alexander Shraer commented on ZOOKEEPER-1807: - Hi Raul, ZK-107 allows changing server roles: in one config a server is an observer, in the next one it may be a follower. I haven't looked closely, but I think the intention was to talk with everyone you know to try to get the most up-to-date config information. Instead of reverting this to the previous code, consider adding a check (regardless of whether this is an observer or a participant) that won't attempt to create a connection if one already exists to the same server with the same election address (election addresses may change from one view to the next). The code should handle observer id 0; please file a JIRA if you find a problem somewhere. Thanks, Alex
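Alex's suggested check can be sketched as follows, assuming a simple registry keyed by server id that remembers the election address each open connection was made to. `ElectionConnectionRegistry`, `shouldConnect` and `recordConnection` are hypothetical names; QuorumCnxManager's real bookkeeping is different.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry of open election connections, keyed by server id.
class ElectionConnectionRegistry {
    // sid -> election address the current connection was opened to
    private final Map<Long, InetSocketAddress> open = new HashMap<>();

    // Connect only if there is no connection to this sid yet, or if the
    // server's election address changed in the new view (stale connection).
    synchronized boolean shouldConnect(long sid, InetSocketAddress electionAddr) {
        InetSocketAddress current = open.get(sid);
        return current == null || !current.equals(electionAddr);
    }

    // Remember the address once a connection has been established.
    synchronized void recordConnection(long sid, InetSocketAddress electionAddr) {
        open.put(sid, electionAddr);
    }

    public static void main(String[] args) {
        ElectionConnectionRegistry registry = new ElectionConnectionRegistry();
        // createUnresolved avoids a DNS lookup; equality then compares host name and port.
        InetSocketAddress v1 = InetSocketAddress.createUnresolved("observer9", 3888);
        InetSocketAddress v2 = InetSocketAddress.createUnresolved("observer9", 3889);

        System.out.println(registry.shouldConnect(9L, v1)); // true: nothing open yet
        registry.recordConnection(9L, v1);
        System.out.println(registry.shouldConnect(9L, v1)); // false: same sid, same address
        System.out.println(registry.shouldConnect(9L, v2)); // true: address changed between views
    }
}
```

Keying on the (sid, address) pair rather than the sid alone is what lets a reconfiguration that moves a server's election port trigger a fresh connection.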
[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811769#comment-13811769 ] Patrick Hunt commented on ZOOKEEPER-1742: - bq. They are running on Ubuntu fine now, is it right? Correct in my case (Ubuntu 13.10).
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811782#comment-13811782 ] Raul Gutierrez Segales commented on ZOOKEEPER-1807: --- Oh, fair enough. So I suspect QuorumCnxManager isn't doing the right thing then. Will take a look. Thanks for the quick reply!
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811798#comment-13811798 ] Raul Gutierrez Segales commented on ZOOKEEPER-1807: --- Actually, my initial assessment was wrong (the spammy "there is already a connection..." message confused me). I am seeing excess traffic between Observers through the election port, but it's not due to connection attempts. I'll come back with the actual messages. Sorry if this isn't actually related to ZOOKEEPER-107, [~shralex].
[jira] [Updated] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-1807: Fix Version/s: 3.5.0
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811800#comment-13811800 ] Flavio Junqueira commented on ZOOKEEPER-1807: - It would be good to understand whether this bug affects the 3.4 branch as well and whether it is a blocker, [~rgs].
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811802#comment-13811802 ] Raul Gutierrez Segales commented on ZOOKEEPER-1807: --- Yes, absolutely, [~fpj]. The amount of traffic that I am seeing between Observers through the election port is... scary. I am still trying to figure out what is going on. Will be back in a bit when I have a proper analysis.
[jira] [Commented] (ZOOKEEPER-1790) Deal with special ObserverId in QuorumCnxManager.receiveConnection
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811837#comment-13811837 ] Thawan Kooburat commented on ZOOKEEPER-1790: In our internal deployment, we use a negative sid for observers (actually, all of them use -1). This is probably not the intended usage, but it has worked so far. It would be nice to add a note to the 3.5 release notes if there is a change in the valid sid range. Deal with special ObserverId in QuorumCnxManager.receiveConnection -- Key: ZOOKEEPER-1790 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1790 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.6, 3.5.0 Reporter: Alexander Shraer Assignee: Alexander Shraer Fix For: 3.4.6, 3.5.0 QuorumCnxManager.receiveConnection assumes that a negative sid means that this is a 3.5.0 server, which has a different communication protocol. This doesn't account for the fact that ObserverId = -1 is a special id that may be used by observers and is also negative. This requires a fix to trunk and a separate fix to the 3.4 branch, where this function is different (see ZOOKEEPER-1633). -- This message was sent by Atlassian JIRA (v6.1#6144)
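The ambiguity in the issue description can be illustrated with a small classifier for the first 64-bit value a server reads from a new election connection. This is a sketch under assumptions: `FirstLongClassifier` is not ZooKeeper code, and the reserved marker value below is hypothetical (the real protocol constant may differ). It contrasts the "any negative value means a 3.5.0 peer" test described in the issue with one that special-cases `ObserverId = -1`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical classification of the first long read by a connection handler.
class FirstLongClassifier {
    static final long OBSERVER_ID = -1L;             // special observer id from the issue text
    static final long NEW_PROTOCOL_MARKER = -65536L; // hypothetical reserved marker value

    enum Kind { OLD_PROTOCOL_SID, NEW_PROTOCOL, UNKNOWN }

    // The assumption described in the issue: any negative value means a 3.5.0 peer.
    static Kind naive(long first) {
        return first < 0 ? Kind.NEW_PROTOCOL : Kind.OLD_PROTOCOL_SID;
    }

    // Treat ObserverId = -1 as an ordinary (old-protocol) sid and only the
    // reserved marker as the new handshake; flag any other negative value.
    static Kind fixed(long first) {
        if (first >= 0 || first == OBSERVER_ID) {
            return Kind.OLD_PROTOCOL_SID;
        }
        return first == NEW_PROTOCOL_MARKER ? Kind.NEW_PROTOCOL : Kind.UNKNOWN;
    }

    // Reads the first long off the wire, as a connection handler would.
    static long readFirst(byte[] wire) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            return in.readLong();
        }
    }

    // Encodes a long the way a connecting peer would write it.
    static byte[] encode(long value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(value);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        long observer = readFirst(encode(OBSERVER_ID));
        System.out.println(naive(observer)); // NEW_PROTOCOL: the misclassification
        System.out.println(fixed(observer)); // OLD_PROTOCOL_SID
    }
}
```

Reserving a single, specific negative value for the handshake (rather than the whole negative range) is what keeps -1 usable as an observer id; any other negative sid would still need a documented decision, which is the release-note concern raised above.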
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811838#comment-13811838 ] Thawan Kooburat commented on ZOOKEEPER-1807: In our internal deployment, the host list in zoo.cfg for each observer only has the participants and itself. This mitigates the issue somewhat, but obviously, in the 3.5 world, this won't work if you want to promote an observer to a participant.
[jira] [Commented] (ZOOKEEPER-1790) Deal with special ObserverId in QuorumCnxManager.receiveConnection
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811839#comment-13811839 ] Flavio Junqueira commented on ZOOKEEPER-1790: - I was wondering if this has anything to do with ZOOKEEPER-1807.
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811849#comment-13811849 ] Raul Gutierrez Segales commented on ZOOKEEPER-1807: --- Okay, this does seem to be related to ZOOKEEPER-107 after all, [~shralex]. I added some debug logging and I see that the spam sent to all Observers consists of notifications:
{noformat}
2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 9, peerEpoch = 130, configData = [B@5a0c0ce6
2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 12, peerEpoch = 130, configData = [B@4d22fe39
2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 6, peerEpoch = 130, configData = [B@346077bf
2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 13, peerEpoch = 130, configData = [B@2955b776
2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 11, peerEpoch = 130, configData = [B@3a7fb92d
2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 14, peerEpoch = 130, configData = [B@1756575c
2013-11-02 02:33:21,341 - INFO [WorkerSender[myid=13]] - will send: leader = 3, zxid = 558362464215, electionEpoch = 5, state = OBSERVING, sid = 13, peerEpoch = 130, configData = [B@258164fc
{noformat}
As you can see, it's sending tons of notifications per second. Not good :) With this diff in FastLeaderElection.java (i.e., a revert of part of your change):
{noformat}
 private void sendNotifications() {
-    for (long sid : self.getAllKnownServerIds()) {
+    for (QuorumServer server : self.getVotingView().values()) {
+        long sid = server.id;
{noformat}
Observers, of course, don't get spammed. My guess is that some condition is failing for Observers: it assumes the notifications are fresh and sends them repeatedly?
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811850#comment-13811850 ] Raul Gutierrez Segales commented on ZOOKEEPER-1807: --- [~fpj]: I think this is 3.5.0-specific, since it goes away when I revert those bits from ZOOKEEPER-107 (there is a chance I am overlooking something, of course, and it's something else). But this is most likely a blocker for the 3.5.0 release.
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811851#comment-13811851 ] Raul Gutierrez Segales commented on ZOOKEEPER-1807: --- [~thawan]: should omitting the Observers from zoo.cfg actually make any difference? If so, we should document it somewhere (unless it already is). In my case, where I do explicitly enumerate them, I don't get observer-to-observer connections on the election port once I remove the bits I mentioned above in FLE (so it seems to me it doesn't make a difference).
[jira] [Commented] (ZOOKEEPER-1790) Deal with special ObserverId in QuorumCnxManager.receiveConnection
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811852#comment-13811852 ] Raul Gutierrez Segales commented on ZOOKEEPER-1790: --- [~fpj]: I don't think it's related; my initial assessment was wrong. It isn't connection attempts that generate the extra traffic I am seeing but the notifications (as commented in ZOOKEEPER-1807).
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811858#comment-13811858 ] Raul Gutierrez Segales commented on ZOOKEEPER-1807: --- I think what's happening is that when we send the initial notifications to all members, as opposed to just voting members as it was before, we trigger off a self-replicating cascade of notifications. Each Observers gets the notification and then by virtue of: {noformat} /* * If it is from a non-voting server (such as an observer or * a non-voting follower), respond right away. */ if(!self.getVotingView().containsKey(response.sid)){ . } {noformat} it replies back to each Observer and so on. So sounds to me that this needs to match what we have in sendNotifications and actually check response.sid against self.getAllKnownServerIds() to avoid the endless echoing of notifications that I am seeing. Thoughts [~shralex], [~fpj] ? Observers spam each other creating connections to the election addr --- Key: ZOOKEEPER-1807 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807 Project: ZooKeeper Issue Type: Bug Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Fix For: 3.5.0 Hey [~shralex], I noticed today that my Observers are spamming each other trying to open connections to the election port. I've got tons of these: {noformat} 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 9 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 10 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 6 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 12 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 14 {noformat} and so and so on ad nauseam. 
Now, looking around, I found this inside FastLeaderElection.java from when you committed ZOOKEEPER-107:
{noformat}
 private void sendNotifications() {
-    for (QuorumServer server : self.getVotingView().values()) {
-        long sid = server.id;
-
+    for (long sid : self.getAllKnownServerIds()) {
+        QuorumVerifier qv = self.getQuorumVerifier();
{noformat}
Is that really desired? I suspect that is what's causing Observers to try to connect to each other (as opposed to just connecting to participants). I'll give it a try now and let you know. (Also, we use observer IDs of 0, and I saw some parts of the code that might not handle that case, so it could be that too.)
-- This message was sent by Atlassian JIRA (v6.1#6144)
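To make the cascade described in Raul's comment concrete, here is a toy Java model. It is a sketch under stated assumptions, not FastLeaderElection: the server IDs, the voting view, and the `ObserverEchoDemo` class are hypothetical, and a message counter with a cap stands in for the network. The starter notifies every known server (the post-ZOOKEEPER-107 behaviour of sendNotifications), and any receiver replies right away when the sender is not in the voting view, mirroring the quoted branch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

// Toy model (not ZooKeeper code) of "respond right away to non-voting senders"
// combined with "notify every known server".
final class ObserverEchoDemo {
    // One notification in flight.
    static final class Msg {
        final long from;
        final long to;
        Msg(long from, long to) { this.from = from; this.to = to; }
    }

    /**
     * The starter notifies every other known server. Any receiver replies
     * immediately when the sender is not in the voting view. Returns the number
     * of messages delivered, capped at maxMessages so an endless echo stops.
     */
    static int simulate(Set<Long> allKnown, Set<Long> votingView, long starter, int maxMessages) {
        Deque<Msg> inFlight = new ArrayDeque<>();
        for (long sid : allKnown) {
            if (sid != starter) {
                inFlight.add(new Msg(starter, sid));
            }
        }
        int delivered = 0;
        while (!inFlight.isEmpty() && delivered < maxMessages) {
            Msg m = inFlight.poll();
            delivered++;
            if (!votingView.contains(m.from)) {
                // Mirrors the quoted branch: non-voting sender => reply right away.
                inFlight.add(new Msg(m.to, m.from));
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        Set<Long> voters = Set.of(1L, 2L, 3L);
        Set<Long> all = Set.of(1L, 2L, 3L, 4L, 5L); // 4 and 5 are observers
        System.out.println("voter starts:    " + simulate(all, voters, 1L, 1000));
        System.out.println("observer starts: " + simulate(all, voters, 4L, 1000));
    }
}
```

With servers 1, 2 and 3 voting and 4 and 5 observing, the voter-initiated round delivers 4 messages and stops, while the observer-initiated round runs until the 1000-message cap because observers 4 and 5 keep answering each other. In this model that is the endless Observer-to-Observer echo the DEBUG log points at.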
[jira] [Updated] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raul Gutierrez Segales updated ZOOKEEPER-1807: -- Attachment: ZOOKEEPER-1807.patch The attached patch prevents sending replies back when we are an Observer. Since ZOOKEEPER-107, we send notifications to Observers because they can be promoted to Participants. But to avoid replicating replies forever (i.e., an Observer sends a notification, the receiving Observer then sends another one back, and so on), we should not send these replies when we are a LearnerType.OBSERVER.
Observers spam each other creating connections to the election addr --- Key: ZOOKEEPER-1807 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807 Project: ZooKeeper Issue Type: Bug Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Fix For: 3.5.0 Attachments: ZOOKEEPER-1807.patch
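The patch's idea, that an Observer may still broadcast its own notifications but never sends the immediate reply, can be sketched with a toy model. Everything here is hypothetical: `ObserverQuietFix`, `shouldReplyImmediately`, and the local `LearnerType` enum (a stand-in for the LearnerType.OBSERVER role named in the comment) are illustrative names, and the code shows the guard, not the contents of ZOOKEEPER-1807.patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

// Toy model (not ZooKeeper code) of the guard the patch describes: a server
// whose own role is OBSERVER never sends the immediate reply.
final class ObserverQuietFix {
    // Local stand-in for the learner roles; only the names mirror the patch text.
    enum LearnerType { PARTICIPANT, OBSERVER }

    // One notification in flight.
    static final class Msg {
        final long from;
        final long to;
        Msg(long from, long to) { this.from = from; this.to = to; }
    }

    // Hypothetical helper: reply right away only to non-voting senders, and
    // never when we are an observer ourselves.
    static boolean shouldReplyImmediately(LearnerType self, boolean senderIsVoting) {
        return !senderIsVoting && self != LearnerType.OBSERVER;
    }

    // The starter notifies every other known server; each delivery may trigger
    // one reply, subject to the guard. Returns messages delivered (capped).
    static int simulate(Set<Long> allKnown, Set<Long> votingView, long starter, int maxMessages) {
        Deque<Msg> inFlight = new ArrayDeque<>();
        for (long sid : allKnown) {
            if (sid != starter) {
                inFlight.add(new Msg(starter, sid));
            }
        }
        int delivered = 0;
        while (!inFlight.isEmpty() && delivered < maxMessages) {
            Msg m = inFlight.poll();
            delivered++;
            LearnerType receiver = votingView.contains(m.to)
                    ? LearnerType.PARTICIPANT : LearnerType.OBSERVER;
            if (shouldReplyImmediately(receiver, votingView.contains(m.from))) {
                inFlight.add(new Msg(m.to, m.from));
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        Set<Long> voters = Set.of(1L, 2L, 3L);
        Set<Long> all = Set.of(1L, 2L, 3L, 4L, 5L); // 4 and 5 are observers
        // Observer 4 notifies 1, 2, 3 and 5; only the three voters answer it.
        System.out.println("observer starts: " + simulate(all, voters, 4L, 1000));
    }
}
```

With observer 4 starting, only voters 1, 2 and 3 reply, so the round ends after 7 messages (4 notifications plus 3 replies) instead of echoing until the cap. In this model, voters still answer non-voting senders right away, so the Observer keeps receiving votes from the participants.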
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811865#comment-13811865 ] Hadoop QA commented on ZOOKEEPER-1807: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611737/ZOOKEEPER-1807.patch against trunk revision 1535491. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733//console This message is automatically generated.
Failed: ZOOKEEPER-1807 PreCommit Build #1733
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733/
### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 314178 lines...]
[exec]
[exec] -1 overall. Here are the results of testing the latest attachment
[exec] http://issues.apache.org/jira/secure/attachment/12611737/ZOOKEEPER-1807.patch
[exec] against trunk revision 1535491.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no new tests are needed for this patch.
[exec] Also please list what manual steps were performed to verify this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
[exec] +1 core tests. The patch passed core unit tests.
[exec]
[exec] +1 contrib tests. The patch passed contrib unit tests.
[exec]
[exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733//testReport/
[exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
[exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1733//console
[exec]
[exec] This message is automatically generated.
[exec]
[exec]
[exec] ==
[exec] ==
[exec] Adding comment to Jira.
[exec] ==
[exec] ==
[exec]
[exec]
[exec] Comment added.
[exec] 84362112158ddbbb07230db45919a0034a96 logged out
[exec]
[exec]
[exec] ==
[exec] ==
[exec] Finished build.
[exec] ==
[exec] ==
[exec]
[exec]
BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 1
Total time: 33 minutes 16 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1807
Email was triggered for: Failure
Sending email for trigger: Failure
### ## FAILED TESTS (if any) ##
All tests passed
[jira] [Created] (BOOKKEEPER-701) Improve exception handling of Bookkeeper threads
Rakesh R created BOOKKEEPER-701: --- Summary: Improve exception handling of Bookkeeper threads Key: BOOKKEEPER-701 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-701 Project: Bookkeeper Issue Type: Improvement Components: bookkeeper-auto-recovery, bookkeeper-client, bookkeeper-server Reporter: Rakesh R Assignee: Rakesh R
This JIRA discusses how to improve the exception handling of BookKeeper threads. As part of this, all BookKeeper threads need to be reviewed so that, whenever a thread hits an unhandled exception, BookKeeper will:
- log a loud error when the thread dies;
- exit if any critical thread dies.
Please have a look at BOOKKEEPER-700 for the initial discussion.
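One way to get the two behaviours listed in the issue is a per-thread `Thread.UncaughtExceptionHandler`. This is a hypothetical sketch: `ThreadDeathPolicy`, `handler`, `start`, and the injectable exit action are illustrative names, not BookKeeper APIs, and it uses `java.util.logging` only to stay dependency-free.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch only; class, method and thread names are hypothetical.
final class ThreadDeathPolicy {
    private static final Logger LOG = Logger.getLogger(ThreadDeathPolicy.class.getName());

    /**
     * Builds a handler that always logs a loud error and, if the thread is
     * critical, hands exitCode to exitAction (e.g. System::exit in production).
     */
    static Thread.UncaughtExceptionHandler handler(boolean critical, int exitCode, IntConsumer exitAction) {
        return (thread, error) -> {
            LOG.log(Level.SEVERE, "Thread " + thread.getName() + " died with an unhandled exception", error);
            if (critical) {
                exitAction.accept(exitCode);
            }
        };
    }

    /** Starts a named thread wired with the policy above (exit status 1). */
    static Thread start(String name, Runnable body, boolean critical, IntConsumer exitAction) {
        Thread t = new Thread(body, name);
        t.setUncaughtExceptionHandler(handler(critical, 1, exitAction));
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        // For the demo, record the exit request instead of terminating the JVM.
        AtomicInteger requestedExit = new AtomicInteger(-1);
        Thread t = start("demo-critical-thread",
                () -> { throw new IllegalStateException("boom"); },
                true, requestedExit::set);
        t.join(); // the handler runs on the dying thread before it terminates
        System.out.println("exit requested with status " + requestedExit.get());
    }
}
```

Injecting the exit action keeps the policy testable; a real deployment would pass `System::exit` for critical threads. Note that a handler only sees exceptions that escape `run()`: tasks submitted to an `ExecutorService` via `submit()` capture their exceptions in the returned `Future`, so those would need separate handling.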