ZooKeeper-trunk-solaris - Build # 710 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/710/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 224709 lines...]
[junit] 2013-10-24 09:05:36,859 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method
[junit] 2013-10-24 09:05:36,860 [myid:] - INFO [main:ZooKeeperServer@428] - shutting down
[junit] 2013-10-24 09:05:36,860 [myid:] - INFO [main:SessionTrackerImpl@183] - Shutting down
[junit] 2013-10-24 09:05:36,860 [myid:] - INFO [main:PrepRequestProcessor@972] - Shutting down
[junit] 2013-10-24 09:05:36,861 [myid:] - INFO [main:SyncRequestProcessor@190] - Shutting down
[junit] 2013-10-24 09:05:36,861 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@156] - PrepRequestProcessor exited loop!
[junit] 2013-10-24 09:05:36,861 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@168] - SyncRequestProcessor exited!
[junit] 2013-10-24 09:05:36,861 [myid:] - INFO [main:FinalRequestProcessor@442] - shutdown of request processor complete
[junit] 2013-10-24 09:05:36,862 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-10-24 09:05:36,862 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[]
[junit] 2013-10-24 09:05:36,863 [myid:] - INFO [main:ClientBase@414] - STARTING server
[junit] 2013-10-24 09:05:36,863 [myid:] - INFO [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test7565802057807854427.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test7565802057807854427.junit.dir/version-2
[junit] 2013-10-24 09:05:36,864 [myid:] - INFO [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 kB direct buffers.
[junit] 2013-10-24 09:05:36,864 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-10-24 09:05:36,865 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test7565802057807854427.junit.dir/version-2/snapshot.b
[junit] 2013-10-24 09:05:36,867 [myid:] - INFO [main:FileTxnSnapLog@297] - Snapshotting: 0xb to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test7565802057807854427.junit.dir/version-2/snapshot.b
[junit] 2013-10-24 09:05:36,869 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-10-24 09:05:36,869 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:54186
[junit] 2013-10-24 09:05:36,870 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@828] - Processing stat command from /127.0.0.1:54186
[junit] 2013-10-24 09:05:36,870 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@677] - Stat command output
[junit] 2013-10-24 09:05:36,871 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@999] - Closed socket connection for client /127.0.0.1:54186 (no session established for client)
[junit] 2013-10-24 09:05:36,871 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-10-24 09:05:36,873 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree
[junit] 2013-10-24 09:05:36,873 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-10-24 09:05:36,873 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port
[junit] 2013-10-24 09:05:36,873 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-10-24 09:05:36,873 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2013-10-24 09:05:36,874 [myid:] - INFO [main:ClientBase@451] - tearDown starting
[junit] 2013-10-24 09:05:36,953 [myid:] - INFO [main:ZooKeeper@777] - Session: 0x141e9b63cd5 closed
[junit] 2013-10-24 09:05:36,953 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down
[junit] 2013-10-24 09:05:36,953 [myid:] - INFO [main:ClientBase@421] - STOPPING server
[junit] 2013-10-24 09:05:36,953 [myid:] - INFO
ZooKeeper-3.4-WinVS2008_java - Build # 334 - Still Failing
See https://builds.apache.org/job/ZooKeeper-3.4-WinVS2008_java/334/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 210546 lines...]
[junit] 2013-10-24 11:08:51,387 [myid:] - INFO [main:FinalRequestProcessor@415] - shutdown of request processor complete
[junit] 2013-10-24 11:08:51,388 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-10-24 11:08:51,791 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@968] - Opening socket connection to server 127.0.0.1/127.0.0.1:11221. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
[junit] 2013-10-24 11:08:52,378 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[]
[junit] 2013-10-24 11:08:52,379 [myid:] - INFO [main:ClientBase@414] - STARTING server
[junit] 2013-10-24 11:08:52,379 [myid:] - INFO [main:ZooKeeperServer@162] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test5092380397019331526.junit.dir\version-2 snapdir f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test5092380397019331526.junit.dir\version-2
[junit] 2013-10-24 11:08:52,394 [myid:] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-10-24 11:08:52,398 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-10-24 11:08:52,399 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:58706
[junit] 2013-10-24 11:08:52,399 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@817] - Processing stat command from /127.0.0.1:58706
[junit] 2013-10-24 11:08:52,493 [myid:] - INFO [Thread-5:NIOServerCnxn$StatCommand@653] - Stat command output
[junit] 2013-10-24 11:08:52,495 [myid:] - INFO [Thread-5:NIOServerCnxn@997] - Closed socket connection for client /127.0.0.1:58706 (no session established for client)
[junit] 2013-10-24 11:08:52,495 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-10-24 11:08:52,497 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree
[junit] 2013-10-24 11:08:52,497 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-10-24 11:08:52,594 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port
[junit] 2013-10-24 11:08:52,594 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-10-24 11:08:52,594 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2013-10-24 11:08:52,595 [myid:] - INFO [main:ClientBase@451] - tearDown starting
[junit] 2013-10-24 11:08:52,786 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@849] - Socket connection established to 127.0.0.1/127.0.0.1:11221, initiating session
[junit] 2013-10-24 11:08:52,786 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:58702
[junit] 2013-10-24 11:08:52,786 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:ZooKeeperServer@861] - Client attempting to renew session 0x141e9f01a3f at /127.0.0.1:58702
[junit] 2013-10-24 11:08:52,795 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:ZooKeeperServer@617] - Established session 0x141e9f01a3f with negotiated timeout 3 for client /127.0.0.1:58702
[junit] 2013-10-24 11:08:52,795 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1228] - Session establishment complete on server 127.0.0.1/127.0.0.1:11221, sessionid = 0x141e9f01a3f, negotiated timeout = 3
[junit] 2013-10-24 11:08:52,796 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x141e9f01a3f
[junit] 2013-10-24 11:08:52,896 [myid:] - INFO [SyncThread:0:FileTxnLog@199] - Creating new log file: log.c
[junit] 2013-10-24 11:08:52,912 [myid:] - INFO [main:ZooKeeper@684] - Session: 0x141e9f01a3f closed
[junit] 2013-10-24 11:08:52,912 [myid:] - INFO [main:ClientBase@421] - STOPPING server
[junit] 2013-10-24 11:08:52,913 [myid:] - INFO [main:NIOServerCnxn@997] - Closed socket connection for client /127.0.0.1:58702 which had sessionid 0x141e9f01a3f
[junit] 2013-10-24 11:08:52,913 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@509] -
ZooKeeper-trunk-WinVS2008_java - Build # 582 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-WinVS2008_java/582/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 276238 lines...]
[junit] 2013-10-24 11:24:14,807 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@219] - accept thread exitted run method
[junit] 2013-10-24 11:24:14,989 [myid:] - INFO [main:ZooKeeperServer@428] - shutting down
[junit] 2013-10-24 11:24:14,989 [myid:] - INFO [main:SessionTrackerImpl@183] - Shutting down
[junit] 2013-10-24 11:24:14,989 [myid:] - INFO [main:PrepRequestProcessor@972] - Shutting down
[junit] 2013-10-24 11:24:14,990 [myid:] - INFO [main:SyncRequestProcessor@190] - Shutting down
[junit] 2013-10-24 11:24:14,990 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@156] - PrepRequestProcessor exited loop!
[junit] 2013-10-24 11:24:14,990 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@168] - SyncRequestProcessor exited!
[junit] 2013-10-24 11:24:15,089 [myid:] - INFO [main:FinalRequestProcessor@442] - shutdown of request processor complete
[junit] 2013-10-24 11:24:15,001 [myid:] - INFO [SessionTracker:SessionTrackerImpl@134] - SessionTrackerImpl exited loop!
[junit] 2013-10-24 11:24:15,000 [myid:] - INFO [SessionTracker:SessionTrackerImpl@134] - SessionTrackerImpl exited loop!
[junit] 2013-10-24 11:24:15,090 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-10-24 11:24:16,086 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[]
[junit] 2013-10-24 11:24:16,091 [myid:] - INFO [main:ZKTestCase$1@66] - FAILED testQuota
[junit] junit.framework.AssertionFailedError: expected:<0> but was:<1>
[junit]     at junit.framework.Assert.fail(Assert.java:50)
[junit]     at junit.framework.Assert.failNotEquals(Assert.java:287)
[junit]     at junit.framework.Assert.assertEquals(Assert.java:67)
[junit]     at junit.framework.Assert.assertEquals(Assert.java:199)
[junit]     at junit.framework.Assert.assertEquals(Assert.java:205)
[junit]     at org.apache.zookeeper.test.JMXEnv.ensureOnly(JMXEnv.java:138)
[junit]     at org.apache.zookeeper.test.ClientBase.startServer(ClientBase.java:417)
[junit]     at org.apache.zookeeper.test.ZooKeeperQuotaTest.testQuota(ZooKeeperQuotaTest.java:72)
[junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit]     at java.lang.reflect.Method.invoke(Method.java:597)
[junit]     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit]     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit]     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit]     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
[junit]     at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
[junit]     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
[junit]     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
[junit]     at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:52)
[junit]     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
[junit]     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:69)
[junit]     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:48)
[junit]     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
[junit]     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
[junit]     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
[junit]     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
[junit]     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
[junit]     at org.junit.runners.ParentRunner.run(ParentRunner.java:292)
[junit]     at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
[junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
[junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
[junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)
[junit] 2013-10-24 11:24:16,095 [myid:] - INFO [main:ZKTestCase$1@56] - FINISHED testQuota
[junit] Tests run: 1, Failures: 1, Errors:
[jira] [Updated] (ZOOKEEPER-1794) Add hash check to transaction history in quorum servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Germán Blanco updated ZOOKEEPER-1794:
-------------------------------------

Fix Version/s: (was: 3.4.6)

Add hash check to transaction history in quorum servers
-------------------------------------------------------

Key: ZOOKEEPER-1794
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1794
Project: ZooKeeper
Issue Type: Sub-task
Components: quorum
Reporter: Germán Blanco
Assignee: Germán Blanco
Fix For: 3.5.0
Original Estimate: 336h
Remaining Estimate: 336h

The goal of this task is to add a hash number to each transaction in the transaction history. This hash number will be the same on all members of the quorum, because members with the same transaction history compute the same result. As a consequence, no new information needs to be sent between members of the quorum during the broadcast phase. The leader will check the hash number when learners try to connect, and the hash will also be sent together with the snapshot during synchronisation. If the hash numbers do not match, synchronisation will be done with a snapshot in order to overwrite the conflicts in the transaction history.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
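The chained-hash idea in ZOOKEEPER-1794 might look roughly like the Java sketch below. It is not the patch for the issue. The class and method names (`TxnHistoryHash`, `append`, `digestAt`, `needsSnapshot`) are hypothetical. SHA-256 is an arbitrary choice of hash, and digests are kept per zxid in an in-memory map purely for illustration.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a running hash over a transaction history:
// digest(n) = SHA-256(digest(n-1) || zxid || txn bytes). Two servers that applied
// the same transactions in the same order hold the same digest at every zxid,
// without exchanging extra data during broadcast.
class TxnHistoryHash {
    private byte[] current = new byte[32];                 // digest of the empty history
    private final Map<Long, byte[]> digestByZxid = new HashMap<>();

    void append(long zxid, byte[] txn) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
        md.update(current);                                // chain in the previous digest
        md.update(ByteBuffer.allocate(Long.BYTES).putLong(zxid).array());
        md.update(txn);
        current = md.digest();
        digestByZxid.put(zxid, current);
    }

    byte[] digestAt(long zxid) {
        return digestByZxid.get(zxid);
    }

    // Leader-side check when a learner connects: a diff is only safe if the learner's
    // digest at its last zxid matches ours; otherwise fall back to a snapshot.
    boolean needsSnapshot(long learnerLastZxid, byte[] learnerDigest) {
        byte[] ours = digestByZxid.get(learnerLastZxid);
        return ours == null || !Arrays.equals(ours, learnerDigest);
    }
}
```

Each digest folds in the previous one, so one diverging transaction changes every later digest. A single value exchanged at connection time is therefore enough to detect a conflicting history.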
[jira] [Updated] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Germán Blanco updated ZOOKEEPER-1777:
-------------------------------------

Fix Version/s: (was: 3.4.6)

Missing ephemeral nodes in one of the members of the ensemble
-------------------------------------------------------------

Key: ZOOKEEPER-1777
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
Project: ZooKeeper
Issue Type: Bug
Components: quorum
Affects Versions: 3.4.5
Environment: Linux, Java 1.7
Reporter: Germán Blanco
Assignee: Germán Blanco
Priority: Critical
Fix For: 3.5.0
Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777-3.4.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz

In a 3-server ensemble, one of the followers doesn't see some of the ephemeral nodes that are present on the leader and on the other follower. The 8 nodes missing from the faulty follower were created at the end of epoch 1; the ensemble is running in epoch 2.
[jira] [Commented] (ZOOKEEPER-1557) jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804070#comment-13804070 ]

Hudson commented on ZOOKEEPER-1557:
-----------------------------------

SUCCESS: Integrated in ZooKeeper-trunk #2099 (See [https://builds.apache.org/job/ZooKeeper-trunk/2099/])
ZOOKEEPER-1557. jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch (Eugene Koontz via phunt) (phunt: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1535251)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/SaslAuthFailNotifyTest.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/SaslAuthFailTest.java

jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch
---------------------------------------------------------

Key: ZOOKEEPER-1557
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1557
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.5, 3.5.0
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Fix For: 3.4.6, 3.5.0
Attachments: jstack.out, SaslAuthFailTest.log, ZOOKEEPER-1557.patch, ZOOKEEPER-1557.patch

Failure of testBadSaslAuthNotifiesWatch on the jenkins jdk7 job:
https://builds.apache.org/job/ZooKeeper-trunk-jdk7/407/
Haven't seen this before.
[jira] [Commented] (ZOOKEEPER-1744) clientPortAddress breaks zkServer.sh status
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804072#comment-13804072 ]

Hudson commented on ZOOKEEPER-1744:
-----------------------------------

SUCCESS: Integrated in ZooKeeper-trunk #2099 (See [https://builds.apache.org/job/ZooKeeper-trunk/2099/])
ZOOKEEPER-1744. clientPortAddress breaks zkServer.sh status (Nick Ohanian via phunt) (phunt: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1535278)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/bin/zkServer.sh

clientPortAddress breaks zkServer.sh status
-------------------------------------------

Key: ZOOKEEPER-1744
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1744
Project: ZooKeeper
Issue Type: Bug
Components: scripts
Affects Versions: 3.4.5
Reporter: Nick Ohanian
Assignee: Nick Ohanian
Priority: Critical
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1744-br34.patch, ZOOKEEPER-1744.patch

When clientPortAddress is used in the config file (zoo.cfg), the status command of zkServer.sh runs a grep that matches both clientPort and clientPortAddress. This passes an extra argument to FourLetterWordMain, which then fails, so the status command incorrectly reports that it couldn't connect to the server.

Also, localhost is hardcoded as the target host for FourLetterWordMain. The clientPortAddress should be used if it is provided in the config file.
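The Java sketch below contrasts a prefix match with an exact key lookup. It only illustrates the logic; the actual fix for this issue is a change to the zkServer.sh shell script. `ClientPortLookup`, `prefixMatches`, and `statTarget` are hypothetical names, and the sketch assumes zoo.cfg is readable as `java.util.Properties`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class ClientPortLookup {
    // Mimics a naive grep for lines starting with the key: "clientPort" also
    // matches "clientPortAddress", so two values come back instead of one.
    static List<String> prefixMatches(String cfg, String key) {
        List<String> hits = new ArrayList<>();
        for (String line : cfg.split("\n")) {
            if (line.trim().startsWith(key)) {
                hits.add(line.trim());
            }
        }
        return hits;
    }

    // Exact-key lookup returning "host port" for a four-letter-word client,
    // preferring clientPortAddress and using localhost only as a fallback.
    static String statTarget(String cfg) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfg));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String port = props.getProperty("clientPort");
        String host = props.getProperty("clientPortAddress", "localhost");
        return host + " " + port;
    }
}
```

For a config containing both `clientPort=2181` and `clientPortAddress=10.0.0.5`, `prefixMatches(cfg, "clientPort")` returns two lines, while `statTarget(cfg)` yields `10.0.0.5 2181`.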
[jira] [Commented] (ZOOKEEPER-1499) clientPort config changes not backwards-compatible
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804071#comment-13804071 ]

Hudson commented on ZOOKEEPER-1499:
-----------------------------------

SUCCESS: Integrated in ZooKeeper-trunk #2099 (See [https://builds.apache.org/job/ZooKeeper-trunk/2099/])
ZOOKEEPER-1499. clientPort config changes not backwards-compatible (Alexander Shraer via phunt, breed) (phunt: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1535280)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/ReconfigTest.java

clientPort config changes not backwards-compatible
--------------------------------------------------

Key: ZOOKEEPER-1499
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1499
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.5.0
Reporter: Camille Fournier
Assignee: Alexander Shraer
Priority: Blocker
Fix For: 3.5.0
Attachments: ZOOKEEPER-1499.patch, ZOOKEEPER-1499-ver1.java, ZOOKEEPER-1499-ver2.java, ZOOKEEPER-1499-ver3.patch

With the new reconfig logic, clientPort=2181 in the zoo.cfg file is no longer read, and clients can't connect unless ;2181 is appended to the end of their server lines.
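One backwards-compatible resolution rule is to use the `;` client-port suffix when a server line has one and fall back to the legacy `clientPort` key otherwise. The Java sketch below assumes the 3.5 server-line shape `host:quorumPort:electionPort[:role][;[clientAddress:]clientPort]`. The helper name and the fallback rule are illustrative and not necessarily what the committed ZOOKEEPER-1499 change does.

```java
class ServerLineClientPort {
    // Hypothetical helper. serverLine is the value of a server.N entry, for example
    // "zk1:2888:3888;2181" or "zk1:2888:3888:participant;0.0.0.0:2181".
    // A ";" suffix wins; otherwise the legacy clientPort value keeps old zoo.cfg files working.
    static int clientPort(String serverLine, Integer legacyClientPort) {
        int semi = serverLine.indexOf(';');
        if (semi >= 0) {
            String client = serverLine.substring(semi + 1).trim();
            int colon = client.lastIndexOf(':');       // optional clientAddress before the port
            return Integer.parseInt(colon >= 0 ? client.substring(colon + 1) : client);
        }
        if (legacyClientPort == null) {
            throw new IllegalArgumentException("no client port in server line and no clientPort key");
        }
        return legacyClientPort;
    }
}
```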
ZooKeeper-trunk-jdk7 - Build # 691 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-jdk7/691/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 280720 lines...]
[junit] 2013-10-24 11:09:10,436 [myid:] - INFO [main:SessionTrackerImpl@183] - Shutting down
[junit] 2013-10-24 11:09:10,436 [myid:] - INFO [main:PrepRequestProcessor@972] - Shutting down
[junit] 2013-10-24 11:09:10,436 [myid:] - INFO [main:SyncRequestProcessor@190] - Shutting down
[junit] 2013-10-24 11:09:10,436 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@156] - PrepRequestProcessor exited loop!
[junit] 2013-10-24 11:09:10,436 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@168] - SyncRequestProcessor exited!
[junit] 2013-10-24 11:09:10,437 [myid:] - INFO [main:FinalRequestProcessor@442] - shutdown of request processor complete
[junit] 2013-10-24 11:09:10,437 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-10-24 11:09:10,437 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[]
[junit] 2013-10-24 11:09:10,439 [myid:] - INFO [main:ClientBase@414] - STARTING server
[junit] 2013-10-24 11:09:10,439 [myid:] - INFO [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test7919402043197407303.junit.dir/version-2 snapdir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test7919402043197407303.junit.dir/version-2
[junit] 2013-10-24 11:09:10,439 [myid:] - INFO [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 kB direct buffers.
[junit] 2013-10-24 11:09:10,440 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-10-24 11:09:10,440 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test7919402043197407303.junit.dir/version-2/snapshot.b
[junit] 2013-10-24 11:09:10,443 [myid:] - INFO [main:FileTxnSnapLog@297] - Snapshotting: 0xb to /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test7919402043197407303.junit.dir/version-2/snapshot.b
[junit] 2013-10-24 11:09:10,444 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-10-24 11:09:10,445 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:44052
[junit] 2013-10-24 11:09:10,445 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@828] - Processing stat command from /127.0.0.1:44052
[junit] 2013-10-24 11:09:10,446 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@677] - Stat command output
[junit] 2013-10-24 11:09:10,446 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@999] - Closed socket connection for client /127.0.0.1:44052 (no session established for client)
[junit] 2013-10-24 11:09:10,446 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-10-24 11:09:10,448 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree
[junit] 2013-10-24 11:09:10,448 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-10-24 11:09:10,448 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port
[junit] 2013-10-24 11:09:10,448 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-10-24 11:09:10,448 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2013-10-24 11:09:10,449 [myid:] - INFO [main:ClientBase@451] - tearDown starting
[junit] 2013-10-24 11:09:10,520 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down
[junit] 2013-10-24 11:09:10,520 [myid:] - INFO [main:ZooKeeper@777] - Session: 0x141ea275baf closed
[junit] 2013-10-24 11:09:10,521 [myid:] - INFO [main:ClientBase@421] - STOPPING server
[junit] 2013-10-24 11:09:10,521 [myid:] - INFO [ConnnectionExpirer:NIOServerCnxnFactory$ConnectionExpirerThread@583] - ConnnectionExpirerThread interrupted
[junit] 2013-10-24 11:09:10,521 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@219] - accept thread exitted run method
[junit] 2013-10-24 11:09:10,521 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run
[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804281#comment-13804281 ]

Flavio Junqueira commented on ZOOKEEPER-1742:
---------------------------------------------

That's odd... Do you think the test failures are related to your patch?

make check doesn't work on macos
--------------------------------

Key: ZOOKEEPER-1742
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1742
Project: ZooKeeper
Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Benjamin Reed
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742.patch

There are two problems I have spotted when running make check with the C client. First, it complains that the sleep call is not defined in two test files: tests/ZooKeeperQuorumServer.cc and tests/TestReconfigServer.cc. Including unistd.h fixes this. The second problem is with the linker options: it complains that --wrap is not a valid option. I'm not sure how to deal with this one yet, since I'm not sure why we are using it.
[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804293#comment-13804293 ]

Flavio Junqueira commented on ZOOKEEPER-1382:
---------------------------------------------

Given that the Mock* classes are only used in WatchLeakTest, I wonder whether it would be better to define them inside the WatchLeakTest class. Also, one test case is not running because its @Test annotation is commented out.

Zookeeper server holds onto dead/expired session ids in the watch data structures
---------------------------------------------------------------------------------

Key: ZOOKEEPER-1382
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1382_3.3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch

I've observed that the ZooKeeper server holds onto expired session ids in the watcher data structures. As a result, the wchp command reports session ids that cannot be found through cons/dump, and those expired session ids stay there, possibly until the server is restarted.

Here are snippets from the client and server logs that lead to this state, for one particular session id, 0x134485fd7bcb26f. There are 4 servers in the ZooKeeper cluster: 223, 224, 225 (leader), and 226. I'm using ZkClient to connect to the cluster.

From the application log:
application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application Session establishment complete on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, negotiated timeout = 6000
application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application] Client session timed out, have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing socket connection and attempting reconnect
application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] [main-SendThread(226.prod:12913)] [application] Unable to reconnect to ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket connection

On the leader zk, 225:
zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, timeout of 6000ms exceeded
zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination for sessionid: 0x134485fd7bcb26f

On the server the client was initially connected to, 223:
zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO [CommitProcessor:1:NIOServerCnxn@1580] - Established session 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f

Here are the log snippets from 226, the server the client reconnected to before getting the session expired event:
2012-01-27 09:52:38,190 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
2012-01-27 09:52:38,191 - INFO [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
2012-01-27 09:52:38,191 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:49367 which had sessionid 0x134485fd7bcb26f

wchp output from 226, taken on 01/30:
nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*wchp* | wc -l
3

wchp output from 223, taken on 01/30:
nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*wchp* | wc -l
0

cons output from 223 and 226, taken on 01/30:
nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*cons* | wc -l
0
nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*cons* | wc -l
0

So what seems to have happened is that the client was able to re-register the watches on the new server (226) after it got disconnected from 223, in spite of having an expired session id. In NIOServerCnxn, I saw that after suspecting that a session is expired, a server removes the cnxn and its watches from its internal data structures. But before that it allows more requests to be processed even if the session
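The invariant the ZOOKEEPER-1382 report points at can be modelled with a small watch table: no watch should be created for, or outlive, an expired session. This single-threaded Java sketch is a hypothetical illustration, not ZooKeeper's WatchManager or the ZOOKEEPER-1382 patch. `WatchTable` and its methods are made-up names, and `watchCount` stands in for grepping a session id in wchp output.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, single-threaded model of per-session watch bookkeeping.
class WatchTable {
    private final Set<Long> liveSessions = new HashSet<>();
    private final Map<String, Set<Long>> watchersByPath = new HashMap<>();
    private final Map<Long, Set<String>> pathsBySession = new HashMap<>();

    void sessionCreated(long sessionId) {
        liveSessions.add(sessionId);
    }

    // Registers a watch only for a live session; an expired or unknown session
    // (like the one renewing at 226 above) is refused instead of leaking an entry.
    boolean addWatch(long sessionId, String path) {
        if (!liveSessions.contains(sessionId)) {
            return false;
        }
        watchersByPath.computeIfAbsent(path, p -> new HashSet<>()).add(sessionId);
        pathsBySession.computeIfAbsent(sessionId, s -> new HashSet<>()).add(path);
        return true;
    }

    // Drops the session and every watch it registered, in both directions.
    void sessionClosed(long sessionId) {
        liveSessions.remove(sessionId);
        Set<String> paths = pathsBySession.remove(sessionId);
        if (paths == null) {
            return;
        }
        for (String path : paths) {
            Set<Long> watchers = watchersByPath.get(path);
            watchers.remove(sessionId);
            if (watchers.isEmpty()) {
                watchersByPath.remove(path);
            }
        }
    }

    // Number of paths this session is watching (analogous to counting it in wchp).
    int watchCount(long sessionId) {
        int n = 0;
        for (Set<Long> watchers : watchersByPath.values()) {
            if (watchers.contains(sessionId)) {
                n++;
            }
        }
        return n;
    }
}
```

A real server handles these calls concurrently, so the liveness check and the insertion would need to be atomic with respect to session closing.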
[jira] [Resolved] (ZOOKEEPER-1740) Zookeeper 3.3.4 loses ephemeral nodes under stress
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira resolved ZOOKEEPER-1740.
Resolution: Not A Problem

I'm marking this jira as not a problem based on my previous comment. If anyone disagrees, please reopen it.

Zookeeper 3.3.4 loses ephemeral nodes under stress
Key: ZOOKEEPER-1740
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1740
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.3.4
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
Fix For: 3.4.6, 3.5.0

ZooKeeper currently does not perform session expiration and ephemeral node deletion as a single atomic operation. In Kafka, a side effect of this behavior is that, in certain corner cases, ephemeral nodes can be lost even though the session has not expired. The following sequence of events can lead to lost ephemeral nodes:

1. The session expires on the client. The client assumes its ephemeral nodes have been deleted, so it establishes a new session with ZooKeeper and tries to re-create them.
2. When it tries to re-create an ephemeral node, however, ZooKeeper returns a NodeExists error code. This is legitimate during a session disconnect event, since zkclient automatically retries the operation and raises a NodeExists error. Also, by design, the Kafka server never has multiple ZooKeeper clients create the same ephemeral node, so it treats the NodeExists error as normal.
3. A few seconds later, however, ZooKeeper deletes that ephemeral node. From the client's perspective, the ephemeral node is gone even though the client holds a new, valid session.

This behavior is triggered by very long fsync operations on the ZooKeeper leader. When the leader wakes up from such a long fsync, it has several sessions to expire, which magnifies the gap between session expiration and ephemeral node deletion. In that gap, a ZooKeeper client can issue an ephemeral node creation that appears to have succeeded. The leader then deletes the node, and from the client's perspective the ephemeral node is permanently lost.

Thread from the ZooKeeper mailing list: http://zookeeper.markmail.org/search/?q=Zookeeper+3.3.4#query:Zookeeper%203.3.4%20date%3A201307%20+page:1+mid:zma242a2qgp6gxvx+state:results

To reproduce this behavior:

1. Bring up a ZooKeeper 3.3.4 cluster and use zkclient to create several sessions with ephemeral nodes on it. Make sure the session expiration callback is implemented and that it re-registers the ephemeral node.
2. Run the following script on the ZooKeeper leader, passing the leader's process ID as the first argument:

    while true
    do
      kill -STOP $1
      sleep 8
      kill -CONT $1
      sleep 60
    done

3. Run another script to check whether the ephemeral nodes exist. It shows that ZooKeeper loses the ephemeral nodes while the clients still hold valid sessions.

-- This message was sent by Atlassian JIRA (v6.1#6144)
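On the client side, step 2 can be guarded by checking who owns the existing node before treating NodeExists as success. With the real Java client, that means comparing `Stat.getEphemeralOwner()` against `ZooKeeper.getSessionId()`. The sketch below runs without a server: `FakeEnsemble` and `EphemeralRecreate` are hypothetical in-memory stand-ins that model only ephemeral nodes and their owning sessions. It illustrates the idea and is not the resolution of this issue, which was closed as Not A Problem.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory stand-in for an ensemble: it models only ephemeral
 * znodes and the session that owns each one.
 */
class FakeEnsemble {
    private final Map<String, Long> ephemeralOwners = new HashMap<>();

    /** Returns false, like a NodeExists error, if the path is already taken. */
    boolean createEphemeral(String path, long sessionId) {
        return ephemeralOwners.putIfAbsent(path, sessionId) == null;
    }

    /** Owning session of the node, or null if the node does not exist. */
    Long ephemeralOwner(String path) {
        return ephemeralOwners.get(path);
    }

    /** Delayed cleanup of an expired session's nodes (the non-atomic step). */
    void deleteEphemeralsOf(long sessionId) {
        ephemeralOwners.values().removeIf(owner -> owner == sessionId);
    }
}

/** Client-side re-creation that does not take NodeExists at face value. */
final class EphemeralRecreate {
    enum Outcome { CREATED, ALREADY_OURS, OWNED_BY_OTHER_SESSION }

    private EphemeralRecreate() {}

    static Outcome recreate(FakeEnsemble zk, String path, long currentSession) {
        while (true) {
            if (zk.createEphemeral(path, currentSession)) {
                return Outcome.CREATED;
            }
            Long owner = zk.ephemeralOwner(path);
            if (owner == null) {
                continue; // node vanished between the create and the check: retry
            }
            // With the real client: stat.getEphemeralOwner() == zk.getSessionId()
            return owner == currentSession
                    ? Outcome.ALREADY_OURS
                    : Outcome.OWNED_BY_OTHER_SESSION;
        }
    }
}
```

An `OWNED_BY_OTHER_SESSION` result tells the client that the pending expiration of the old session may still delete the node. The client can then, for example, wait for the deletion with a watch and re-create the node afterwards, instead of assuming that NodeExists means success.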
[jira] [Commented] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804344#comment-13804344 ]

Germán Blanco commented on ZOOKEEPER-1777:

I have received a different suggestion that has less impact. The idea is to reserve some bits of the zxid (e.g. 12 bits) for a sanity check. The zxid would then roll over more often, but the remaining space for zxid+epoch (51 bits) should still last for more than one hundred years. The leader would calculate the sanity value when it increments the zxid; it could be, for example, a random number. When a follower connects to a leader, or a client connects to a server, the leader or server only has to check whether this zxid appears in its transaction history. If it does not, a warning is logged and either a snapshot is sent to the follower or the client connection is closed. As far as I can see, this needs no change to any protocol or storage format, and the biggest impact will most likely be on the test cases. If this is made a configuration option, it will also be possible to avoid the failures in some of the test cases. Any comments or opinions?

Missing ephemeral nodes in one of the members of the ensemble
Key: ZOOKEEPER-1777
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
Project: ZooKeeper
Issue Type: Bug
Components: quorum
Affects Versions: 3.4.5
Environment: Linux, Java 1.7
Reporter: Germán Blanco
Assignee: Germán Blanco
Priority: Critical
Fix For: 3.5.0
Attachments: logs_trunk.tar.gz, snaps.tar, ZOOKEEPER-1777-3.4.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.patch, ZOOKEEPER-1777.tar.gz

In a 3-server ensemble, one of the followers does not see some of the ephemeral nodes that are present on the leader and on the other follower. The 8 nodes missing from the faulty follower were created at the end of epoch 1; the ensemble is running in epoch 2.
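The sketch below makes the proposal concrete by showing how a 12-bit sanity value could be packed into a zxid and checked against a server's history. The bit positions are an assumption. The comment only specifies 12 bits for the check and 51 bits for epoch+counter, which, together with the sign bit of a Java `long`, add up to 64. `SanityZxid` and its methods are hypothetical and not part of ZooKeeper. For comparison, today's zxid stores the epoch in the high 32 bits and the counter in the low 32 bits.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical zxid layout for the proposal: bit 63 stays 0 (zxids are signed
 * longs), bits 51-62 carry a 12-bit sanity value chosen by the leader, and the
 * low 51 bits carry epoch + counter.
 */
final class SanityZxid {
    static final int SANITY_BITS = 12;
    static final int BASE_BITS = 51;
    static final long BASE_MASK = (1L << BASE_BITS) - 1;
    static final long SANITY_MASK = (1L << SANITY_BITS) - 1;

    private SanityZxid() {}

    /** Combines an epoch+counter value with a sanity value. */
    static long pack(long base, int sanity) {
        if ((base & ~BASE_MASK) != 0) {
            throw new IllegalArgumentException("epoch+counter does not fit in 51 bits");
        }
        return ((sanity & SANITY_MASK) << BASE_BITS) | base;
    }

    /** The epoch+counter part of a packed zxid. */
    static long base(long zxid) {
        return zxid & BASE_MASK;
    }

    /** The sanity part of a packed zxid. */
    static int sanity(long zxid) {
        return (int) ((zxid >>> BASE_BITS) & SANITY_MASK);
    }

    /** The proposed check: a peer's last zxid must appear in our own history. */
    static boolean inHistory(Set<Long> history, long peerZxid) {
        return history.contains(peerZxid);
    }
}
```

If the sanity value is random, two servers that reached the same epoch and counter through different histories would carry the same 12-bit value only 1 time in 4096. In every other case, the full-zxid lookup fails and exposes the divergence.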
[jira] [Created] (ZOOKEEPER-1800) jenkins failure in testGetProposalFromTxn
Patrick Hunt created ZOOKEEPER-1800:

Summary: jenkins failure in testGetProposalFromTxn
Key: ZOOKEEPER-1800
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1800
Project: ZooKeeper
Issue Type: Bug
Components: tests
Affects Versions: 3.5.0
Reporter: Patrick Hunt
Assignee: Thawan Kooburat
Fix For: 3.5.0

https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-jdk7/691/testReport/junit/org.apache.zookeeper.test/GetProposalFromTxnTest/testGetProposalFromTxn/

The test was introduced in ZOOKEEPER-1413 and seems to have failed twice so far this month.
[jira] [Commented] (ZOOKEEPER-1742) make check doesn't work on macos
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804602#comment-13804602 ]

Germán Blanco commented on ZOOKEEPER-1742:

The test cases that fail on Linux must be related to the patch, since it is the only thing that changed and the test cases passed without it. They don't look directly related to the changes, though: at first sight, they don't look like failures related to the counters that are updated in the wrapped functions. So it is probably some side effect, or an error in my changes.

My guess is that the test cases that fail on Mac but not on Linux are related to the new environment. That could be the libraries, the compiler, the operating system or any other part that is new, and it could affect all macOS users or only the environment I used (Mac OS X 10.8.5, gcc 4.7.3). If anybody else has access to a Mac, we can check whether it fails in the same test cases; maybe that helps.

make check doesn't work on macos
Key: ZOOKEEPER-1742
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1742
Project: ZooKeeper
Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Benjamin Reed
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1742-3.4.patch, ZOOKEEPER-1742.patch

I have spotted two problems when running make check with the C client. First, it complains that the sleep call is not defined in two test files: tests/ZooKeeperQuorumServer.cc and tests/TestReconfigServer.cc. Including unistd.h fixes this. The second problem is with the linker options: it complains that --wrap is not a valid option. I'm not sure how to deal with this one yet, since I'm not sure why we are using it.
[jira] [Updated] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Germán Blanco updated ZOOKEEPER-1382:
Attachment: ZOOKEEPER-1382-branch-3.4.patch

Zookeeper server holds onto dead/expired session ids in the watch data structures
Key: ZOOKEEPER-1382
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1382_3.3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch

I've observed that the ZooKeeper server holds onto expired session IDs in the watcher data structures. As a result, the wchp command reports session IDs that cannot be found through cons/dump, and those expired session IDs stay there, possibly until the server is restarted.

Here are snippets from the client and server logs that led to this state, for one particular session ID, 0x134485fd7bcb26f. There are 4 servers in the ZooKeeper cluster (223, 224, 225 (leader), 226), and I'm using ZkClient to connect to the cluster.

From the application log:
application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application] Session establishment complete on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, negotiated timeout = 6000
application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application] Client session timed out, have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing socket connection and attempting reconnect
application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] [main-SendThread(226.prod:12913)] [application] Unable to reconnect to ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket connection

On the leader, 225:
zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, timeout of 6000ms exceeded
zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination for sessionid: 0x134485fd7bcb26f

On the server the client was initially connected to, 223:
zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO [CommitProcessor:1:NIOServerCnxn@1580] - Established session 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f

From 226, the server the client reconnected to before receiving the session expired event:
2012-01-27 09:52:38,190 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
2012-01-27 09:52:38,191 - INFO [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
2012-01-27 09:52:38,191 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:49367 which had sessionid 0x134485fd7bcb26f

wchp output from 226, taken on 01/30:
nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*wchp* | wc -l
3

wchp output from 223, taken on 01/30:
nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*wchp* | wc -l
0

cons output from 223 and 226, taken on 01/30:
nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*cons* | wc -l
0
nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*cons* | wc -l
0

So what seems to have happened is that, after it was disconnected from 223, the client was able to re-register its watches on the new server (226) in spite of having an expired session ID. In NIOServerCnxn, I saw that once a server suspects a session has expired, it removes the cnxn and its watches from its internal data structures. Before that, however, it lets more requests be processed even though the session has expired:

    // Now that the session is ready we can start receiving packets
    synchronized (this.factory) {
        sk.selector().wakeup();
        enableRecv();
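The invariant the reporter expects can be sketched as follows. Once a session is known to be expired, watch registrations from it must be refused, and closing the session must purge every watch it holds; otherwise wchp keeps listing the session. `WatchTableSketch` and its methods are hypothetical simplifications, not ZooKeeper's watch manager, and this is an illustration rather than the ZOOKEEPER-1382 patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified watch table: path -> IDs of the sessions watching it.
 * It only illustrates the invariant that an expired session must never remain
 * in (or re-enter) the table.
 */
class WatchTableSketch {
    private final Map<String, Set<Long>> watchers = new HashMap<>();
    private final Set<Long> liveSessions = new HashSet<>();

    void sessionEstablished(long sessionId) {
        liveSessions.add(sessionId);
    }

    /** Called when the session expires or its connection is closed. */
    void sessionClosed(long sessionId) {
        liveSessions.remove(sessionId);
        for (Set<Long> sessions : watchers.values()) {
            sessions.remove(sessionId);
        }
        watchers.values().removeIf(Set::isEmpty);
    }

    /** Guard: a (re-)registration from a session that is not live is refused. */
    boolean addWatch(String path, long sessionId) {
        if (!liveSessions.contains(sessionId)) {
            return false;
        }
        watchers.computeIfAbsent(path, p -> new HashSet<>()).add(sessionId);
        return true;
    }

    /** Number of paths watched by a session, a rough analogue of grepping wchp. */
    int watchCount(long sessionId) {
        int count = 0;
        for (Set<Long> sessions : watchers.values()) {
            if (sessions.contains(sessionId)) {
                count++;
            }
        }
        return count;
    }
}
```

In the logs above, 226 accepted the re-registration before it had concluded that the session was invalid. The guard in `addWatch` corresponds to performing the session check before any further requests from that connection are processed.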
[jira] [Updated] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Germán Blanco updated ZOOKEEPER-1382:
Attachment: ZOOKEEPER-1382.patch

At some point I commented that out. I thought the test case didn't make much sense, since it passed both with and without the change, and it is also absent from the 3.4 version. But this was hiding an error in the test cases, which is now fixed. I had to make a small modification in NIOServerCnxn.java to allow the test to work, because SocketChannel.isOpen is a final method and can't be mocked by Mockito. The mock classes can't be moved to a different package either; otherwise they would lose access to the protected members of that package.

Zookeeper server holds onto dead/expired session ids in the watch data structures
Key: ZOOKEEPER-1382
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1382_3.3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch
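Because `SocketChannel.isOpen()` is declared `final` (it is inherited from `AbstractInterruptibleChannel`), Mockito could not stub it, as the comment notes. One common workaround is to route the call through an overridable method so that a test can substitute its own answer. The comment does not say what the actual NIOServerCnxn change was; `CnxnSketch` and `isSocketOpen` below are hypothetical names used only to illustrate the seam.

```java
import java.nio.channels.SocketChannel;

/**
 * Hypothetical connection class. Tests cannot change the result of the final
 * SocketChannel.isOpen(), so the production code calls an overridable
 * wrapper instead.
 */
class CnxnSketch {
    private final SocketChannel sock;

    CnxnSketch(SocketChannel sock) {
        this.sock = sock;
    }

    /** Overridable seam around the final SocketChannel.isOpen(). */
    protected boolean isSocketOpen() {
        return sock != null && sock.isOpen();
    }

    /** Example of logic that depends on the socket state. */
    boolean shouldProcessRequests() {
        return isSocketOpen();
    }
}
```

A test in the same package can subclass `CnxnSketch` (or spy on it) and override `isSocketOpen()`. This is also why, as the comment says, such test classes need to stay in the same package to reach protected members.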
[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805018#comment-13805018 ]

Hadoop QA commented on ZOOKEEPER-1382:

+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610262/ZOOKEEPER-1382.patch against trunk revision 1535491.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 12 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1722//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1722//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1722//console

This message is automatically generated.

Zookeeper server holds onto dead/expired session ids in the watch data structures
Key: ZOOKEEPER-1382
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1382_3.3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch
Success: ZOOKEEPER-1382 PreCommit Build #1722
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1722/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 298819 lines...]
[exec] BUILD SUCCESSFUL
[exec] Total time: 0 seconds
[exec] +1 overall. Here are the results of testing the latest attachment
[exec] http://issues.apache.org/jira/secure/attachment/12610262/ZOOKEEPER-1382.patch
[exec] against trunk revision 1535491.
[exec] +1 @author. The patch does not contain any @author tags.
[exec] +1 tests included. The patch appears to include 12 new or modified tests.
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec] +1 core tests. The patch passed core unit tests.
[exec] +1 contrib tests. The patch passed contrib unit tests.
[exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1722//testReport/
[exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1722//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
[exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1722//console
[exec] This message is automatically generated.
[exec] ==
[exec] Adding comment to Jira.
[exec] ==
[exec] Comment added.
[exec] 0b80d8c2155af18cc3cbaefc90d5f34f81615c5a logged out
[exec] ==
[exec] Finished build.
[exec] ==
BUILD SUCCESSFUL
Total time: 33 minutes 16 seconds
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1382
Email was triggered for: Success
Sending email for trigger: Success

### ## FAILED TESTS (if any) ##
All tests passed
[jira] [Commented] (ZOOKEEPER-1459) Standalone ZooKeeperServer is not closing the transaction log files on shutdown
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805039#comment-13805039 ]

David Lao commented on ZOOKEEPER-1459:

Hi. The Oct 8 patch does not actually close the transaction log. The log needs to be closed explicitly through the FileTxnSnapLog handle that is passed into the ZooKeeperServer when the server is bootstrapped through runFromConfig. I'm attaching a patch, generated from the release-3.4.5 source, that fixes it. Please run a QA pass on it. Thanks.

Standalone ZooKeeperServer is not closing the transaction log files on shutdown
Key: ZOOKEEPER-1459
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1459
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.0
Reporter: Rakesh R
Assignee: Rakesh R
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch

When the standalone ZK server is shut down, it only clears the zkDatabase and does not close the transaction log streams. As a result, deleting the temporary files in the unit tests fails on Windows.

ZooKeeperServer.java
{noformat}
if (zkDb != null) {
    zkDb.clear();
}
{noformat}

Suggestion: close the zkDb as follows, which in turn takes care of the transaction logs:
{noformat}
if (zkDb != null) {
    zkDb.clear();
    try {
        zkDb.close();
    } catch (IOException ie) {
        LOG.warn("Error closing logs ", ie);
    }
}
{noformat}
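David's point is about ownership: the code that creates the FileTxnSnapLog (the runFromConfig path) should also close it. Here is a minimal sketch of that rule, which uses a `finally` block so the log is closed even if `shutdown()` throws. `SnapLogSketch`, `StandaloneServerSketch` and `RunFromConfigSketch` are hypothetical stand-ins, not ZooKeeper classes, and this is not the attached patch.

```java
import java.io.Closeable;

/** Hypothetical stand-in for FileTxnSnapLog; it only records whether it was closed. */
class SnapLogSketch implements Closeable {
    boolean closed;

    @Override
    public void close() { // narrower than Closeable: throws no IOException
        closed = true;
    }
}

/** Hypothetical standalone server whose shutdown() may fail. */
class StandaloneServerSketch {
    private final boolean failOnShutdown;

    StandaloneServerSketch(boolean failOnShutdown) {
        this.failOnShutdown = failOnShutdown;
    }

    void shutdown() {
        if (failOnShutdown) {
            throw new IllegalStateException("simulated shutdown failure");
        }
    }
}

final class RunFromConfigSketch {
    private RunFromConfigSketch() {}

    /** The caller that owns the log closes it, whether or not shutdown() succeeds. */
    static void run(SnapLogSketch txnLog, boolean failOnShutdown) {
        try {
            StandaloneServerSketch server = new StandaloneServerSketch(failOnShutdown);
            // In the real code the server would be started with txnLog and run here.
            server.shutdown();
        } finally {
            txnLog.close(); // runs on both the normal and the exceptional path
        }
    }
}
```

Because closing happens in `finally`, the exception from a failed shutdown still propagates to the caller, yet the file handles are released. On Windows, this is what allows the tests to delete their temporary files.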
[jira] [Updated] (ZOOKEEPER-1459) Standalone ZooKeeperServer is not closing the transaction log files on shutdown
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Lao updated ZOOKEEPER-1459:
Attachment: ZOOKEEPER-1459.patch

Standalone ZooKeeperServer is not closing the transaction log files on shutdown
Key: ZOOKEEPER-1459
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1459
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.0
Reporter: Rakesh R
Assignee: Rakesh R
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch
Failed: ZOOKEEPER-1459 PreCommit Build #1723
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1459
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1723/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 66 lines...]
[exec] ==
[exec] patching file b/src/java/main/org/apache/zookeeper/server/ZooKeeperServerMain.java
[exec] Hunk #1 FAILED at 40.
[exec] Hunk #2 FAILED at 100.
[exec] Hunk #3 FAILED at 123.
[exec] 3 out of 3 hunks FAILED -- saving rejects to file b/src/java/main/org/apache/zookeeper/server/ZooKeeperServerMain.java.rej
[exec] PATCH APPLICATION FAILED
[exec] -1 overall. Here are the results of testing the latest attachment
[exec] http://issues.apache.org/jira/secure/attachment/12610271/ZOOKEEPER-1459.patch
[exec] against trunk revision 1535491.
[exec] +1 @author. The patch does not contain any @author tags.
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no new tests are needed for this patch.
[exec] Also please list what manual steps were performed to verify this patch.
[exec] -1 patch. The patch command could not apply the patch.
[exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1723//console
[exec] This message is automatically generated.
[exec] ==
[exec] Adding comment to Jira.
[exec] ==
[exec] Comment added.
[exec] 1f127b318ec22406c9051db6161cdb4e26de9a3d logged out
[exec] ==
[exec] Finished build.
[exec] ==
BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 1
Total time: 42 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1459
Email was triggered for: Failure
Sending email for trigger: Failure

### ## FAILED TESTS (if any) ##
No tests ran.
[jira] [Commented] (ZOOKEEPER-1459) Standalone ZooKeeperServer is not closing the transaction log files on shutdown
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805045#comment-13805045 ]

Hadoop QA commented on ZOOKEEPER-1459:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12610271/ZOOKEEPER-1459.patch against trunk revision 1535491.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1723//console

This message is automatically generated.

Standalone ZooKeeperServer is not closing the transaction log files on shutdown
Key: ZOOKEEPER-1459
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1459
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.0
Reporter: Rakesh R
Assignee: Rakesh R
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch
[jira] [Commented] (ZOOKEEPER-1459) Standalone ZooKeeperServer is not closing the transaction log files on shutdown
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805061#comment-13805061 ]

Rakesh R commented on ZOOKEEPER-1459:

Hi [~davidlao], thanks for the review. I hope you are referring to the Oct 9th patch. I can see one case where it fails to close the snapLog: when zkServer.shutdown() throws an exception. Is that the case you are pointing at?

Standalone ZooKeeperServer is not closing the transaction log files on shutdown
Key: ZOOKEEPER-1459
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1459
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.0
Reporter: Rakesh R
Assignee: Rakesh R
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch