ZooKeeper-trunk-solaris - Build # 816 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/816/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 218399 lines...] [junit] 2014-02-10 09:04:50,362 [myid:] - INFO [main:SyncRequestProcessor@190] - Shutting down [junit] 2014-02-10 09:04:50,362 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@156] - PrepRequestProcessor exited loop! [junit] 2014-02-10 09:04:50,362 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@168] - SyncRequestProcessor exited! [junit] 2014-02-10 09:04:50,362 [myid:] - INFO [main:FinalRequestProcessor@454] - shutdown of request processor complete [junit] 2014-02-10 09:04:50,363 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2014-02-10 09:04:50,364 [myid:] - INFO [main:JMXEnv@142] - ensureOnly:[] [junit] 2014-02-10 09:04:50,365 [myid:] - INFO [main:ClientBase@443] - STARTING server [junit] 2014-02-10 09:04:50,365 [myid:] - INFO [main:ClientBase@364] - CREATING server instance 127.0.0.1:11221 [junit] 2014-02-10 09:04:50,366 [myid:] - INFO [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 kB direct buffers. 
[junit] 2014-02-10 09:04:50,366 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2014-02-10 09:04:50,367 [myid:] - INFO [main:ClientBase@339] - STARTING server instance 127.0.0.1:11221 [junit] 2014-02-10 09:04:50,367 [myid:] - INFO [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test1482293020184050001.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test1482293020184050001.junit.dir/version-2 [junit] 2014-02-10 09:04:50,368 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test1482293020184050001.junit.dir/version-2/snapshot.b [junit] 2014-02-10 09:04:50,371 [myid:] - INFO [main:FileTxnSnapLog@297] - Snapshotting: 0xb to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test1482293020184050001.junit.dir/version-2/snapshot.b [junit] 2014-02-10 09:04:50,373 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2014-02-10 09:04:50,373 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:49467 [junit] 2014-02-10 09:04:50,374 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@835] - Processing stat command from /127.0.0.1:49467 [junit] 2014-02-10 09:04:50,374 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@684] - Stat command output [junit] 2014-02-10 09:04:50,375 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@1006] - Closed socket connection for client /127.0.0.1:49467 (no session established for client) [junit] 2014-02-10 09:04:50,378 [myid:] - INFO 
[main:JMXEnv@224] - ensureParent:[InMemoryDataTree, StandaloneServer_port] [junit] 2014-02-10 09:04:50,379 [myid:] - INFO [main:JMXEnv@241] - expect:InMemoryDataTree [junit] 2014-02-10 09:04:50,380 [myid:] - INFO [main:JMXEnv@245] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] 2014-02-10 09:04:50,380 [myid:] - INFO [main:JMXEnv@241] - expect:StandaloneServer_port [junit] 2014-02-10 09:04:50,380 [myid:] - INFO [main:JMXEnv@245] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2014-02-10 09:04:50,380 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@55] - Memory used 13366 [junit] 2014-02-10 09:04:50,380 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@60] - Number of threads 24 [junit] 2014-02-10 09:04:50,381 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@65] - FINISHED TEST METHOD testQuota [junit] 2014-02-10 09:04:50,381 [myid:] - INFO [main:ClientBase@520] - tearDown starting [junit] 2014-02-10 09:04:50,449 [myid:] - INFO [main:ZooKeeper@954] - Session: 0x1441b0ab3f6 closed [junit] 2014-02-10 09:04:50,449 [myid:] - INFO [main:ClientBase@490] - STOPPING server [junit] 2014-02-10 09:04:50,449 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@533] - EventThread shut down [junit] 2014-02-10 09:04:50,449 [myid:] - INFO
ZooKeeper-3.4-WinVS2008_java - Build # 433 - Still Failing
See https://builds.apache.org/job/ZooKeeper-3.4-WinVS2008_java/433/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 174081 lines...] [junit] 2014-02-10 09:24:18,948 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@187] - SyncRequestProcessor exited! [junit] 2014-02-10 09:24:18,948 [myid:] - INFO [main:FinalRequestProcessor@415] - shutdown of request processor complete [junit] 2014-02-10 09:24:18,949 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2014-02-10 09:24:19,277 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@968] - Opening socket connection to server 127.0.0.1/127.0.0.1:11221. Will not attempt to authenticate using SASL (unknown error) [junit] 2014-02-10 09:24:19,970 [myid:] - INFO [main:JMXEnv@146] - ensureOnly:[] [junit] 2014-02-10 09:24:19,972 [myid:] - INFO [main:ClientBase@443] - STARTING server [junit] 2014-02-10 09:24:19,972 [myid:] - INFO [main:ClientBase@364] - CREATING server instance 127.0.0.1:11221 [junit] 2014-02-10 09:24:19,973 [myid:] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2014-02-10 09:24:19,973 [myid:] - INFO [main:ClientBase@339] - STARTING server instance 127.0.0.1:11221 [junit] 2014-02-10 09:24:19,973 [myid:] - INFO [main:ZooKeeperServer@162] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test4458843289248132841.junit.dir\version-2 snapdir f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test4458843289248132841.junit.dir\version-2 [junit] 2014-02-10 09:24:19,976 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2014-02-10 09:24:20,001 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:53534 [junit] 2014-02-10 09:24:20,001 [myid:] - INFO 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@827] - Processing stat command from /127.0.0.1:53534 [junit] 2014-02-10 09:24:20,002 [myid:] - INFO [Thread-4:NIOServerCnxn$StatCommand@663] - Stat command output [junit] 2014-02-10 09:24:20,002 [myid:] - INFO [Thread-4:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:53534 (no session established for client) [junit] 2014-02-10 09:24:20,003 [myid:] - INFO [main:JMXEnv@229] - ensureParent:[InMemoryDataTree, StandaloneServer_port] [junit] 2014-02-10 09:24:20,004 [myid:] - INFO [main:JMXEnv@246] - expect:InMemoryDataTree [junit] 2014-02-10 09:24:20,004 [myid:] - INFO [main:JMXEnv@250] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] 2014-02-10 09:24:20,005 [myid:] - INFO [main:JMXEnv@246] - expect:StandaloneServer_port [junit] 2014-02-10 09:24:20,005 [myid:] - INFO [main:JMXEnv@250] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2014-02-10 09:24:20,005 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@55] - Memory used 10710 [junit] 2014-02-10 09:24:20,005 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@60] - Number of threads 20 [junit] 2014-02-10 09:24:20,005 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@65] - FINISHED TEST METHOD testQuota [junit] 2014-02-10 09:24:20,006 [myid:] - INFO [main:ClientBase@520] - tearDown starting [junit] 2014-02-10 09:24:20,275 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@849] - Socket connection established to 127.0.0.1/127.0.0.1:11221, initiating session [junit] 2014-02-10 09:24:20,275 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:53531 [junit] 2014-02-10 09:24:20,276 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:ZooKeeperServer@861] - Client attempting to renew session 0x1441b1c8486 at 
/127.0.0.1:53531 [junit] 2014-02-10 09:24:20,276 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:ZooKeeperServer@617] - Established session 0x1441b1c8486 with negotiated timeout 3 for client /127.0.0.1:53531 [junit] 2014-02-10 09:24:20,276 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1228] - Session establishment complete on server 127.0.0.1/127.0.0.1:11221, sessionid = 0x1441b1c8486, negotiated timeout = 3 [junit] 2014-02-10 09:24:20,277 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x1441b1c8486 [junit] 2014-02-10 09:24:20,277 [myid:] -
[jira] [Commented] (ZOOKEEPER-1833) fix windows build
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896372#comment-13896372 ]

Flavio Junqueira commented on ZOOKEEPER-1833:
---------------------------------------------

ZOOKEEPER-1872 had a pretty good effect: there are far fewer failures and errors on build 433. Good job, [~rakeshr]!

fix windows build
-----------------

Key: ZOOKEEPER-1833
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1833
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.4.5
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
Priority: Blocker
Fix For: 3.4.6
Attachments: LeaderSessionTrackerTest-output.txt, TEST-org.apache.zookeeper.test.QuorumTest.zip, ZOOKEEPER-1833-b3.4.patch, ZOOKEEPER-1833.patch, ZOOKEEPER-1833.patch

A bunch of 3.4 tests are failing on Windows.

{noformat}
[junit] 2013-12-06 08:40:59,692 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testEarlyLeaderAbandonment
[junit] 2013-12-06 08:41:10,472 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testHighestZxidJoinLate
[junit] 2013-12-06 08:45:31,085 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testUpdatingEpoch
[junit] 2013-12-06 08:55:34,630 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testObserversHammer
[junit] 2013-12-06 08:55:59,889 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testAsyncExistsFailure_NoNode
[junit] 2013-12-06 08:56:00,571 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testAsyncGetACL
[junit] 2013-12-06 08:56:02,626 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testAsyncGetChildrenEmpty
[junit] 2013-12-06 08:56:03,491 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testAsyncGetChildrenSingle
[junit] 2013-12-06 08:56:11,276 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testAsyncGetChildrenTwo
[junit] 2013-12-06 08:56:13,878 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testAsyncGetChildrenFailure_NoNode
[junit] 2013-12-06 08:56:16,294 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testAsyncGetChildren2Empty
[junit] 2013-12-06 08:56:18,622 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testAsyncGetChildren2Single
[junit] 2013-12-06 08:56:21,224 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testAsyncGetChildren2Two
[junit] 2013-12-06 08:56:23,738 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testAsyncGetChildren2Failure_NoNode
[junit] 2013-12-06 08:56:26,058 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testAsyncGetData
[junit] 2013-12-06 08:56:28,482 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testAsyncGetDataFailure_NoNode
[junit] 2013-12-06 08:57:35,527 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testStartupFailureCreate
[junit] 2013-12-06 08:57:38,645 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testStartupFailureSet
[junit] 2013-12-06 08:57:41,261 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testStartupFailureSnapshot
[junit] 2013-12-06 08:59:22,222 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testClientWithWatcherObj
[junit] 2013-12-06 09:00:05,592 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testClientCleanup
[junit] 2013-12-06 09:01:24,113 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testBindByAddress
[junit] 2013-12-06 09:02:14,123 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testClientwithoutWatcherObj
[junit] 2013-12-06 09:05:56,461 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testZeroWeightQuorum
[junit] 2013-12-06 09:08:18,747 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testResyncByDiffAfterFollowerCrashes
[junit] 2013-12-06 09:09:42,271 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testFourLetterWords
[junit] 2013-12-06 09:14:03,770 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testLE
[junit] 2013-12-06 09:46:30,002 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testHierarchicalQuorum
[junit] 2013-12-06 09:50:26,912 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testHammerBasic
[junit] 2013-12-06 09:51:07,604 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testQuotaWithQuorum
[junit] 2013-12-06 09:52:41,515 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testNull
[junit] 2013-12-06 09:53:22,648 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testDeleteWithChildren
[junit] 2013-12-06 09:56:49,061 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testClientwithoutWatcherObj
[junit] 2013-12-06 09:58:27,705 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testGetView
[junit] 2013-12-06 09:59:07,856 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testViewContains
[junit] 2013-12-06 10:01:31,418 [myid:] - INFO [main:ZKTestCase$1@65] - FAILED testSessionMoved
[junit] 2013-12-06 10:04:50,542
[jira] [Commented] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896380#comment-13896380 ]

Germán Blanco commented on ZOOKEEPER-832:
-----------------------------------------

I can prepare a fix using the dbid as proposed by [~breed]. It will take some time (two or three weeks). The protocol between client and server needs to be modified.

Invalid session id causes infinite loop during automatic reconnect
------------------------------------------------------------------

Key: ZOOKEEPER-832
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.3.1, 3.4.5, 3.5.0
Environment: All
Reporter: Ryan Holmes
Assignee: Germán Blanco
Fix For: 3.4.7, 3.5.0
Attachments: ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch

Steps to reproduce:
1.) Connect to a standalone server using the Java client.
2.) Stop the server.
3.) Delete the contents of the data directory (i.e. the persisted session data).
4.) Start the server.

The client now automatically tries to reconnect but the server refuses the connection because the session id is invalid. The client and server are now in an infinite loop of attempted and rejected connections. While this situation represents a catastrophic failure and the current behavior is not incorrect, it appears that there is no way to detect this situation on the client and therefore no way to recover.

The suggested improvement is to send an event to the default watcher indicating that the current state is session invalid, similar to how the session expired state is handled.

Server log output (repeats indefinitely):
2010-08-05 11:48:08,283 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /127.0.0.1:63292
2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last zxid is 0x0 client must try another server
2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed socket connection for client /127.0.0.1:63292 (no session established for client)

Client log output (repeats indefinitely):
11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - Opening socket connection to server localhost/127.0.0.1:2181
11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 0x12a3ae4e893000a for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
    at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
    at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
    at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
    at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
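The reconnect loop in ZOOKEEPER-832, and the kind of signal the issue asks for, can be modelled in a few lines. This is only a toy sketch in Python, not ZooKeeper client or server code. The comment on the issue does not spell out how the dbid would be used, so the `Server` class, the `SESSION_INVALID` reply, the `"SessionInvalid"` watcher event, and the `max_attempts` cap are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical model: dbid identifies the on-disk database and changes
    # when the data directory is wiped; last_zxid is the newest txn known.
    dbid: int
    last_zxid: int

    def handle_connect(self, client_dbid, client_last_zxid):
        if client_dbid != self.dbid:
            # Assumed new reply: the session belongs to a database that no
            # longer exists, so retrying can never succeed.
            return "SESSION_INVALID"
        if client_last_zxid > self.last_zxid:
            # Behaviour seen in the server log above: the client has seen a
            # newer zxid than the server, so the request is refused.
            return "REFUSED"
        return "OK"


def reconnect(server, client_dbid, client_last_zxid, watcher, max_attempts=5):
    """Retry the connection, notifying `watcher` of the resulting state."""
    for _ in range(max_attempts):
        reply = server.handle_connect(client_dbid, client_last_zxid)
        if reply == "OK":
            watcher("SyncConnected")
            return reply
        if reply == "SESSION_INVALID":
            # The application can now recover, e.g. by opening a new session.
            watcher("SessionInvalid")
            return reply
    # Without a distinct reply the client only ever sees refusals; the real
    # client would keep looping, the cap here just keeps the sketch finite.
    return "REFUSED"
```

With a server whose data directory was wiped (`Server(dbid=2, last_zxid=0)`) and a client that remembers `dbid=1` and zxid `0x44`, the watcher receives a single `"SessionInvalid"` event instead of the client retrying forever.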
ZooKeeper-trunk-WinVS2008_java - Build # 679 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-WinVS2008_java/679/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 235168 lines...] [junit] 2014-02-10 10:30:38,559 [myid:] - INFO [main:ClientBase@364] - CREATING server instance 127.0.0.1:11221 [junit] 2014-02-10 10:30:38,560 [myid:] - INFO [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 1 selector thread(s), 4 worker threads, and 64 kB direct buffers. [junit] 2014-02-10 10:30:38,561 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2014-02-10 10:30:38,562 [myid:] - INFO [main:ClientBase@339] - STARTING server instance 127.0.0.1:11221 [junit] 2014-02-10 10:30:38,562 [myid:] - INFO [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test4956943973854301676.junit.dir\version-2 snapdir f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test4956943973854301676.junit.dir\version-2 [junit] 2014-02-10 10:30:38,563 [myid:] - INFO [main:FileSnap@83] - Reading snapshot f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test4956943973854301676.junit.dir\version-2\snapshot.b [junit] 2014-02-10 10:30:38,565 [myid:] - INFO [main:FileTxnSnapLog@297] - Snapshotting: 0xb to f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test4956943973854301676.junit.dir\version-2\snapshot.b [junit] 2014-02-10 10:30:38,567 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2014-02-10 10:30:38,568 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:53175 [junit] 2014-02-10 10:30:38,568 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@835] - Processing stat 
command from /127.0.0.1:53175 [junit] 2014-02-10 10:30:38,569 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@684] - Stat command output [junit] 2014-02-10 10:30:38,569 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@1006] - Closed socket connection for client /127.0.0.1:53175 (no session established for client) [junit] 2014-02-10 10:30:38,569 [myid:] - INFO [main:JMXEnv@224] - ensureParent:[InMemoryDataTree, StandaloneServer_port] [junit] 2014-02-10 10:30:38,571 [myid:] - INFO [main:JMXEnv@241] - expect:InMemoryDataTree [junit] 2014-02-10 10:30:38,571 [myid:] - INFO [main:JMXEnv@245] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] 2014-02-10 10:30:38,571 [myid:] - INFO [main:JMXEnv@241] - expect:StandaloneServer_port [junit] 2014-02-10 10:30:38,571 [myid:] - INFO [main:JMXEnv@245] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2014-02-10 10:30:38,572 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@55] - Memory used 13094 [junit] 2014-02-10 10:30:38,572 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@60] - Number of threads 24 [junit] 2014-02-10 10:30:38,572 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@65] - FINISHED TEST METHOD testQuota [junit] 2014-02-10 10:30:38,572 [myid:] - INFO [main:ClientBase@520] - tearDown starting [junit] 2014-02-10 10:30:38,611 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@968] - Socket connection established to 127.0.0.1/127.0.0.1:11221, initiating session [junit] 2014-02-10 10:30:38,611 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:53170 [junit] 2014-02-10 10:30:38,612 [myid:] - INFO [NIOWorkerThread-2:ZooKeeperServer@858] - Client attempting to renew session 0x1441b5939fa at /127.0.0.1:53170 [junit] 2014-02-10 10:30:38,613 [myid:] - INFO 
[NIOWorkerThread-2:ZooKeeperServer@604] - Established session 0x1441b5939fa with negotiated timeout 3 for client /127.0.0.1:53170 [junit] 2014-02-10 10:30:38,614 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1347] - Session establishment complete on server 127.0.0.1/127.0.0.1:11221, sessionid = 0x1441b5939fa, negotiated timeout = 3 [junit] 2014-02-10 10:30:38,660 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@680] - Processed session termination for sessionid: 0x1441b5939fa [junit] 2014-02-10 10:30:38,660 [myid:] - INFO [SyncThread:0:FileTxnLog@200] - Creating new log file: log.c [junit] 2014-02-10
ZooKeeper-trunk-jdk7 - Build # 781 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-jdk7/781/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 266392 lines...] [junit] 2014-02-10 11:19:35,720 [myid:] - INFO [main:FinalRequestProcessor@454] - shutdown of request processor complete [junit] 2014-02-10 11:19:35,721 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2014-02-10 11:19:35,721 [myid:] - INFO [main:JMXEnv@142] - ensureOnly:[] [junit] 2014-02-10 11:19:35,723 [myid:] - INFO [main:ClientBase@443] - STARTING server [junit] 2014-02-10 11:19:35,723 [myid:] - INFO [main:ClientBase@364] - CREATING server instance 127.0.0.1:11221 [junit] 2014-02-10 11:19:35,724 [myid:] - INFO [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 3 selector thread(s), 48 worker threads, and 64 kB direct buffers. [junit] 2014-02-10 11:19:35,724 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2014-02-10 11:19:35,725 [myid:] - INFO [main:ClientBase@339] - STARTING server instance 127.0.0.1:11221 [junit] 2014-02-10 11:19:35,725 [myid:] - INFO [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /x1/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test1324450919300373067.junit.dir/version-2 snapdir /x1/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test1324450919300373067.junit.dir/version-2 [junit] 2014-02-10 11:19:35,726 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /x1/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test1324450919300373067.junit.dir/version-2/snapshot.b [junit] 2014-02-10 11:19:35,729 [myid:] - INFO [main:FileTxnSnapLog@297] - Snapshotting: 0xb to /x1/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk7/trunk/build/test/tmp/test1324450919300373067.junit.dir/version-2/snapshot.b [junit] 2014-02-10 11:19:35,731 [myid:] - 
INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2014-02-10 11:19:35,731 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:49908 [junit] 2014-02-10 11:19:35,732 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@835] - Processing stat command from /127.0.0.1:49908 [junit] 2014-02-10 11:19:35,732 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@684] - Stat command output [junit] 2014-02-10 11:19:35,733 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@1006] - Closed socket connection for client /127.0.0.1:49908 (no session established for client) [junit] 2014-02-10 11:19:35,733 [myid:] - INFO [main:JMXEnv@224] - ensureParent:[InMemoryDataTree, StandaloneServer_port] [junit] 2014-02-10 11:19:35,735 [myid:] - INFO [main:JMXEnv@241] - expect:InMemoryDataTree [junit] 2014-02-10 11:19:35,735 [myid:] - INFO [main:JMXEnv@245] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] 2014-02-10 11:19:35,735 [myid:] - INFO [main:JMXEnv@241] - expect:StandaloneServer_port [junit] 2014-02-10 11:19:35,736 [myid:] - INFO [main:JMXEnv@245] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2014-02-10 11:19:35,736 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@55] - Memory used 18262 [junit] 2014-02-10 11:19:35,736 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@60] - Number of threads 25 [junit] 2014-02-10 11:19:35,737 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@65] - FINISHED TEST METHOD testQuota [junit] 2014-02-10 11:19:35,737 [myid:] - INFO [main:ClientBase@520] - tearDown starting [junit] 2014-02-10 11:19:35,790 [myid:] - INFO [main:ZooKeeper@954] - Session: 0x1441b861292 closed [junit] 2014-02-10 11:19:35,790 [myid:] - INFO [main:ClientBase@490] - STOPPING server [junit] 2014-02-10 11:19:35,790 [myid:] - 
INFO [main-EventThread:ClientCnxn$EventThread@533] - EventThread shut down [junit] 2014-02-10 11:19:35,795 [myid:] - INFO [ConnnectionExpirer:NIOServerCnxnFactory$ConnectionExpirerThread@583] - ConnnectionExpirerThread interrupted [junit] 2014-02-10 11:19:35,796 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@219] - accept thread exitted run method [junit] 2014-02-10 11:19:35,796 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-1:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method [junit] 2014-02-10 11:19:35,796 [myid:] - INFO
[jira] [Commented] (ZOOKEEPER-1573) Unable to load database due to missing parent node
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896791#comment-13896791 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1573:
---------------------------------------------------

FWIW, I like the idea of merging this and opening a new ticket to revisit the issue later on (potentially with a more robust approach).

Unable to load database due to missing parent node
--------------------------------------------------

Key: ZOOKEEPER-1573
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1573
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.3, 3.5.0
Reporter: Thawan Kooburat
Assignee: Vinayakumar B
Priority: Critical
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1573-3.4.patch, ZOOKEEPER-1573.patch, ZOOKEEPER-1573.patch, ZOOKEEPER-1573.patch, ZOOKEEPER-1573.patch

While replaying the txnlog on the data tree, the server has code to detect a missing parent node. This code block was last modified as part of ZOOKEEPER-1333. In our production, we found a case where this check returns a false positive.

The sequence of txns is as follows:

zxid 1: create /prefix/a
zxid 2: create /prefix/a/b
zxid 3: delete /prefix/a/b
zxid 4: delete /prefix/a

The server starts capturing a snapshot at zxid 1. However, by the time it traverses the data tree down to /prefix, txn 4 is already applied and /prefix has no children.

When the server restores from the snapshot, it processes the txnlog starting from zxid 2. This txn generates a missing parent error and the server refuses to start up.

The same check allowed me to discover the bug in ZOOKEEPER-1551, but I don't know if we have any option besides removing this check to solve this issue.
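The false positive described in ZOOKEEPER-1573 can be reproduced with a toy model. The sketch below is not ZooKeeper code. It assumes the data tree can be modelled as a Python set of paths, and the `replay` helper, the `MissingParent` exception, and the `strict` flag (standing in for the missing-parent check) are hypothetical names introduced for illustration.

```python
class MissingParent(Exception):
    """A create txn named a parent znode that is not in the tree."""


def parent_of(path):
    # "/prefix/a/b" -> "/prefix/a"; "/prefix" -> "/"
    return path.rsplit("/", 1)[0] or "/"


def replay(snapshot, txns, start_zxid, strict):
    """Replay txns with zxid >= start_zxid on top of a snapshot.

    strict=True models the missing-parent check; strict=False models the
    relaxed behaviour, where such a create is skipped.
    """
    tree = set(snapshot)
    for zxid, op, path in txns:
        if zxid < start_zxid:
            continue
        if op == "create":
            if parent_of(path) in tree:
                tree.add(path)
            elif strict:
                raise MissingParent(f"zxid {zxid}: no parent for {path}")
            # relaxed mode: skip the create and keep replaying
        elif op == "delete":
            # Deleting a node the snapshot no longer contains is harmless.
            tree.discard(path)
    return tree


# The txn sequence from the issue description.
TXNS = [
    (1, "create", "/prefix/a"),
    (2, "create", "/prefix/a/b"),
    (3, "delete", "/prefix/a/b"),
    (4, "delete", "/prefix/a"),
]

# Fuzzy snapshot: it started at zxid 1, but /prefix was serialized only
# after txn 4 had been applied, so /prefix has no children in it.
FUZZY_SNAPSHOT = {"/", "/prefix"}

if __name__ == "__main__":
    try:
        replay(FUZZY_SNAPSHOT, TXNS, start_zxid=2, strict=True)
    except MissingParent as err:
        print("strict replay failed:", err)
    relaxed = replay(FUZZY_SNAPSHOT, TXNS, start_zxid=2, strict=False)
    print("relaxed replay:", sorted(relaxed))
```

In this model the strict replay stops at zxid 2 because /prefix/a is absent from the snapshot, while the relaxed replay ends with only / and /prefix, which is the state the four txns produce when applied in order.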
[jira] [Created] (ZOOKEEPER-1879) improve the correctness checking of txn log replay
Patrick Hunt created ZOOKEEPER-1879:
------------------------------------

Summary: improve the correctness checking of txn log replay
Key: ZOOKEEPER-1879
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1879
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.4.6, 3.5.0
Reporter: Patrick Hunt
Fix For: 3.4.7, 3.5.0

In ZOOKEEPER-1573 we decided to fix an issue by relaxing some of the checking. Specifically when the sequence of txns is as follows:

* zxid 1: create /prefix/a
* zxid 2: create /prefix/a/b
* zxid 3: delete /prefix/a/b
* zxid 4: delete /prefix/a

the log may fail to replay.

We addressed this by relaxing a check, which is essentially invalid for this case, but is important in finding corruptions of the datastore.

We should add this check back with proper validation of correctness.
[jira] [Commented] (ZOOKEEPER-1573) Unable to load database due to missing parent node
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896985#comment-13896985 ]

Patrick Hunt commented on ZOOKEEPER-1573:
-----------------------------------------

Cool, because I was thinking something similar. I'm not 100% sure, though, that it's going to be as simple as we'd like (see ZOOKEEPER-1879). I'll take care of committing this in a bit. Thanks all!

Unable to load database due to missing parent node
--------------------------------------------------

Key: ZOOKEEPER-1573
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1573
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.3, 3.5.0
Reporter: Thawan Kooburat
Assignee: Vinayakumar B
Priority: Critical
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1573-3.4.patch, ZOOKEEPER-1573.patch, ZOOKEEPER-1573.patch, ZOOKEEPER-1573.patch, ZOOKEEPER-1573.patch

While replaying the txnlog on the data tree, the server has code to detect a missing parent node. This code block was last modified as part of ZOOKEEPER-1333. In our production, we found a case where this check returns a false positive.

The sequence of txns is as follows:

zxid 1: create /prefix/a
zxid 2: create /prefix/a/b
zxid 3: delete /prefix/a/b
zxid 4: delete /prefix/a

The server starts capturing a snapshot at zxid 1. However, by the time it traverses the data tree down to /prefix, txn 4 is already applied and /prefix has no children.

When the server restores from the snapshot, it processes the txnlog starting from zxid 2. This txn generates a missing parent error and the server refuses to start up.

The same check allowed me to discover the bug in ZOOKEEPER-1551, but I don't know if we have any option besides removing this check to solve this issue.
[jira] [Commented] (ZOOKEEPER-1879) improve the correctness checking of txn log replay
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896983#comment-13896983 ]

Patrick Hunt commented on ZOOKEEPER-1879:
-----------------------------------------

I've been thinking about this for a bit. I was hoping it would be simple and we could just add it to ZOOKEEPER-1573; however, I'm not sure. In particular, we would need to handle the case where the child/parent znodes were added/removed multiple times. A simple check is probably not going to be good enough. We should probably think about it a bit to make sure we handle all the various cases.

improve the correctness checking of txn log replay
--------------------------------------------------

Key: ZOOKEEPER-1879
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1879
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.4.6, 3.5.0
Reporter: Patrick Hunt
Fix For: 3.4.7, 3.5.0

In ZOOKEEPER-1573 we decided to fix an issue by relaxing some of the checking. Specifically when the sequence of txns is as follows:

* zxid 1: create /prefix/a
* zxid 2: create /prefix/a/b
* zxid 3: delete /prefix/a/b
* zxid 4: delete /prefix/a

the log may fail to replay.

We addressed this by relaxing a check, which is essentially invalid for this case, but is important in finding corruptions of the datastore.

We should add this check back with proper validation of correctness.
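To make the trade-off in ZOOKEEPER-1879 concrete, here is a toy sketch of one naive validation that could sit on top of a relaxed replay. It is not ZooKeeper code and not a fix proposed in the issue. It assumes the tree is modelled as a set of paths, and `replay_with_validation` and its "pending" bookkeeping are hypothetical. The rule it uses (a create skipped for a missing parent is benign only if a later txn deletes the same path) is an assumption, and the comment above suggests a simple check like this may well miss cases where znodes are added and removed multiple times.

```python
def parent_of(path):
    # "/prefix/a/b" -> "/prefix/a"; "/prefix" -> "/"
    return path.rsplit("/", 1)[0] or "/"


def replay_with_validation(snapshot, txns):
    """Replay txns leniently and report unresolved skipped creates.

    Returns (tree, suspicious). `suspicious` holds every path whose create
    was skipped because the parent was missing and which no later txn
    deleted. In this toy model such a path hints at a corrupted log rather
    than at the benign fuzzy-snapshot race.
    """
    tree = set(snapshot)
    pending = set()
    for _zxid, op, path in txns:
        if op == "create":
            if parent_of(path) in tree:
                tree.add(path)
                pending.discard(path)
            else:
                pending.add(path)  # skipped: parent missing
        elif op == "delete":
            tree.discard(path)
            pending.discard(path)  # a later delete resolves the skip
    return tree, pending


# Benign case from the issue: replay starts at zxid 2 on a fuzzy snapshot.
BENIGN = [
    (2, "create", "/prefix/a/b"),
    (3, "delete", "/prefix/a/b"),
    (4, "delete", "/prefix/a"),
]

# Hypothetical corrupted log: the parent is never created or deleted.
CORRUPT = [
    (2, "create", "/prefix/a/b"),
]

if __name__ == "__main__":
    snapshot = {"/", "/prefix"}
    print(replay_with_validation(snapshot, BENIGN))
    print(replay_with_validation(snapshot, CORRUPT))
```

The benign sequence leaves nothing flagged, while the corrupted log leaves /prefix/a/b flagged. A purely relaxed replay would accept both silently, which is why the check matters for finding datastore corruption.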
[jira] [Updated] (ZOOKEEPER-1811) The ZooKeeperSaslClient service name principal is hardcoded to zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1811:
-----------------------------------

Assignee: Harsh J

The ZooKeeperSaslClient service name principal is hardcoded to zookeeper
------------------------------------------------------------------------

Key: ZOOKEEPER-1811
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1811
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.4.5
Reporter: Harsh J
Assignee: Harsh J
Attachments: ZOOKEEPER-1811.patch

The ClientCnxn class in ZK instantiates the ZooKeeperSaslClient with a hardcoded service name of zookeeper. This causes all apps to fail in accessing ZK in a secure environment where the administrator has changed the principal name ZooKeeper runs as.

The service name should be configurable.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (ZOOKEEPER-1646) mt c client tests fail on Ubuntu Raring
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1646:
-----------------------------------

Component/s: tests

mt c client tests fail on Ubuntu Raring
---------------------------------------

Key: ZOOKEEPER-1646
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1646
Project: ZooKeeper
Issue Type: Bug
Components: c client, tests
Affects Versions: 3.4.5, 3.5.0
Environment: Ubuntu 13.04 (raring), glibc 2.17
Reporter: James Page
Assignee: Patrick Hunt
Priority: Blocker
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1646.patch

Misc tests fail in the c client binding under the current Ubuntu development release:

./zktest-mt
ZooKeeper server startedRunning
Zookeeper_clientretry::testRetry ZooKeeper server started ZooKeeper server started : elapsed 9315 : OK
Zookeeper_operations::testAsyncWatcher1 : assertion : elapsed 1054
Zookeeper_operations::testAsyncGetOperation : assertion : elapsed 1055
Zookeeper_operations::testOperationsAndDisconnectConcurrently1 : assertion : elapsed 1066
Zookeeper_operations::testOperationsAndDisconnectConcurrently2 : elapsed 0 : OK
Zookeeper_operations::testConcurrentOperations1 : assertion : elapsed 1055
Zookeeper_init::testBasic : elapsed 1 : OK
Zookeeper_init::testAddressResolution : elapsed 0 : OK
Zookeeper_init::testMultipleAddressResolution : elapsed 0 : OK
Zookeeper_init::testNullAddressString : elapsed 0 : OK
Zookeeper_init::testEmptyAddressString : elapsed 0 : OK
Zookeeper_init::testOneSpaceAddressString : elapsed 0 : OK
Zookeeper_init::testTwoSpacesAddressString : elapsed 0 : OK
Zookeeper_init::testInvalidAddressString1 : elapsed 0 : OK
Zookeeper_init::testInvalidAddressString2 : elapsed 175 : OK
Zookeeper_init::testNonexistentHost : elapsed 92 : OK
Zookeeper_init::testOutOfMemory_init : elapsed 0 : OK
Zookeeper_init::testOutOfMemory_getaddrs1 : elapsed 0 : OK
Zookeeper_init::testOutOfMemory_getaddrs2 : elapsed 1 : OK
Zookeeper_init::testPermuteAddrsList : elapsed 0 : OK
Zookeeper_close::testIOThreadStoppedOnExpire : assertion : elapsed 1056
Zookeeper_close::testCloseUnconnected : elapsed 0 : OK
Zookeeper_close::testCloseUnconnected1 : elapsed 91 : OK
Zookeeper_close::testCloseConnected1 : assertion : elapsed 1056
Zookeeper_close::testCloseFromWatcher1 : assertion : elapsed 1076
Zookeeper_simpleSystem::testAsyncWatcherAutoReset ZooKeeper server started : elapsed 12155 : OK
Zookeeper_simpleSystem::testDeserializeString : elapsed 0 : OK
Zookeeper_simpleSystem::testNullData : elapsed 1031 : OK
Zookeeper_simpleSystem::testIPV6 : elapsed 1005 : OK
Zookeeper_simpleSystem::testPath : elapsed 1024 : OK
Zookeeper_simpleSystem::testPathValidation : elapsed 1053 : OK
Zookeeper_simpleSystem::testPing : elapsed 17287 : OK
Zookeeper_simpleSystem::testAcl : elapsed 1019 : OK
Zookeeper_simpleSystem::testChroot : elapsed 3052 : OK
Zookeeper_simpleSystem::testAuth : assertion : elapsed 7010
Zookeeper_simpleSystem::testHangingClient : elapsed 1015 : OK
Zookeeper_simpleSystem::testWatcherAutoResetWithGlobal ZooKeeper server started ZooKeeper server started ZooKeeper server started : elapsed 20556 : OK
Zookeeper_simpleSystem::testWatcherAutoResetWithLocal ZooKeeper server started ZooKeeper server started ZooKeeper server started : elapsed 20563 : OK
Zookeeper_simpleSystem::testGetChildren2 : elapsed 1041 : OK
Zookeeper_multi::testCreate : elapsed 1017 : OK
Zookeeper_multi::testCreateDelete : elapsed 1007 : OK
Zookeeper_multi::testInvalidVersion : elapsed 1011 : OK
Zookeeper_multi::testNestedCreate : elapsed 1009 : OK
Zookeeper_multi::testSetData : elapsed 6019 : OK
Zookeeper_multi::testUpdateConflict : elapsed 1014 : OK
Zookeeper_multi::testDeleteUpdateConflict : elapsed 1007 : OK
Zookeeper_multi::testAsyncMulti : elapsed 2001 : OK
Zookeeper_multi::testMultiFail : elapsed 1006 : OK
Zookeeper_multi::testCheck : elapsed 1020 : OK
Zookeeper_multi::testWatch : elapsed 2013 : OK
Zookeeper_watchers::testDefaultSessionWatcher1zktest-mt: tests/ZKMocks.cc:271: SyncedBoolCondition DeliverWatchersWrapper::isDelivered() const: Assertion `i<1000' failed.
Aborted (core dumped)

It would appear that the zookeeper connection does not transition to connected within the required time; I increased the time allowed, but that made no difference.

Ubuntu raring has glibc 2.17; the test suite works fine on previous Ubuntu releases and this is the only difference that stood out.

Interestingly, cli_mt worked just fine connecting to the same zookeeper instance that the tests left lying around, so I'm assuming this is a test error rather than an actual bug.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (ZOOKEEPER-1557) jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1557:
-----------------------------------

Component/s: tests

jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch
---------------------------------------------------------

Key: ZOOKEEPER-1557
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1557
Project: ZooKeeper
Issue Type: Bug
Components: server, tests
Affects Versions: 3.4.5, 3.5.0
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Fix For: 3.4.6, 3.5.0
Attachments: SaslAuthFailTest.log, ZOOKEEPER-1557.patch, ZOOKEEPER-1557.patch, jstack.out

Failure of testBadSaslAuthNotifiesWatch on the jenkins jdk7 job:
https://builds.apache.org/job/ZooKeeper-trunk-jdk7/407/

haven't seen this before.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (ZOOKEEPER-1414) QuorumPeerMainTest.testQuorum, testBadPackets are failing intermittently
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1414:
-----------------------------------

Component/s: tests

QuorumPeerMainTest.testQuorum, testBadPackets are failing intermittently
------------------------------------------------------------------------

Key: ZOOKEEPER-1414
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1414
Project: ZooKeeper
Issue Type: Sub-task
Components: server, tests
Affects Versions: 3.4.3, 3.5.0
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Minor
Labels: test
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1414.patch

The QuorumPeerMainTest.testQuorum and testBadPackets testcases are failing intermittently due to a wrong ZKClient usage pattern.

Saw the following ConnectionLoss on the 3.4 version:
{noformat}
KeeperErrorCode = ConnectionLoss for /foo_q1
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /foo_q1
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:657)
    at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testBadPackets(QuorumPeerMainTest.java:212)
{noformat}

Since the ZooKeeper connection is established asynchronously through ClientCnxn, the client should wait for the 'KeeperState.SyncConnected' event before it starts issuing requests. These test cases do not wait for the connection, for example:
{noformat}
ZooKeeper zk = new ZooKeeper("127.0.0.1:" + CLIENT_PORT_QP1, ClientBase.CONNECTION_TIMEOUT, this);
zk.create("/foo_q1", "foobar1".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
{noformat}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
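The wait-before-use pattern the report asks for can be illustrated with a latch. This is a JDK-only stand-in so that it runs without a server: `FakeAsyncClient` and `ConnectedListener` are hypothetical and only imitate a client whose constructor returns before the session is established. With the real client, the callback would presumably be the Watcher passed to the constructor, counting down when it sees KeeperState.SyncConnected.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// JDK-only sketch of "wait for the connected event before the first request".
class ConnectionWaitSketch {
    // Hypothetical callback, standing in for a watcher receiving SyncConnected.
    interface ConnectedListener {
        void onConnected();
    }

    // Hypothetical client: the constructor returns at once and the
    // connection completes later on a background thread.
    static final class FakeAsyncClient {
        private volatile boolean connected = false;

        FakeAsyncClient(final ConnectedListener listener, final long delayMs) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    try {
                        Thread.sleep(delayMs);
                    } catch (InterruptedException e) {
                        return;
                    }
                    connected = true;
                    listener.onConnected();
                }
            });
            t.setDaemon(true);
            t.start();
        }

        // Fails the way the tests in the report fail when called too early.
        void create(String path) {
            if (!connected) {
                throw new IllegalStateException("ConnectionLoss for " + path);
            }
        }
    }

    // Builds the client, blocks until it reports connected (or the timeout
    // expires), and only then issues the first request.
    static boolean connectThenCreate(long delayMs, long timeoutMs) {
        final CountDownLatch connectedSignal = new CountDownLatch(1);
        FakeAsyncClient client = new FakeAsyncClient(new ConnectedListener() {
            public void onConnected() {
                connectedSignal.countDown();
            }
        }, delayMs);
        try {
            if (!connectedSignal.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // did not connect in time; do not issue requests
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        client.create("/foo_q1");
        return true;
    }
}
```

Calling `create` straight after the constructor, as the quoted test code does, would throw in this model whenever the background thread has not finished yet, which is the intermittent behaviour described.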
[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897335#comment-13897335 ]

Ted Yu commented on ZOOKEEPER-1861:
-----------------------------------

Further review on this would be appreciated.

ConcurrentHashMap isn't used properly in QuorumCnxManager
---------------------------------------------------------

Key: ZOOKEEPER-1861
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
Project: ZooKeeper
Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt

queueSendMap is a ConcurrentHashMap.
At line 210:
{code}
if (!queueSendMap.containsKey(sid)) {
    queueSendMap.put(sid, new ArrayBlockingQueue<ByteBuffer>(
            SEND_CAPACITY));
{code}
By the time control enters the if block, another thread may already have put an entry with the same sid into the ConcurrentHashMap.
putIfAbsent() should be used.
A similar issue occurs at line 307 as well.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
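A minimal sketch of the putIfAbsent() form suggested in the report. It is not the attached patch: `SendQueueSketch` and `queueFor` are hypothetical names, and SEND_CAPACITY is given a placeholder value here.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of atomic get-or-create on a ConcurrentHashMap (not the actual patch).
class SendQueueSketch {
    static final int SEND_CAPACITY = 1; // placeholder value for this sketch

    final ConcurrentHashMap<Long, ArrayBlockingQueue<ByteBuffer>> queueSendMap =
            new ConcurrentHashMap<Long, ArrayBlockingQueue<ByteBuffer>>();

    // Returns the one queue registered for sid, creating it if needed.
    // putIfAbsent is atomic, so two racing callers end up with the same queue.
    ArrayBlockingQueue<ByteBuffer> queueFor(Long sid) {
        ArrayBlockingQueue<ByteBuffer> fresh =
                new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
        ArrayBlockingQueue<ByteBuffer> existing = queueSendMap.putIfAbsent(sid, fresh);
        return existing != null ? existing : fresh;
    }
}
```

The trade-off is that a queue is allocated on every call, even when one is already registered; the losing allocation is simply discarded.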
[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897356#comment-13897356 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1861:
---------------------------------------------------

How are the first hunks different? They also do the allocation even though you might not need them as well, no?

I think that to avoid eagerly allocating and solve the concurrency issues you could put the check/set bits in a synchronized method and call that instead.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (ZOOKEEPER-1875) NullPointerException in ClientCnxn$EventThread.processEvent
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897419#comment-13897419 ]

Jerry He commented on ZOOKEEPER-1875:
-------------------------------------

Hi,
Are we good with the patch?

NullPointerException in ClientCnxn$EventThread.processEvent
-----------------------------------------------------------

Key: ZOOKEEPER-1875
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1875
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.4.5
Reporter: Jerry He
Priority: Minor
Fix For: 3.5.0
Attachments: ZOOKEEPER-1875-trunk.patch

We've been seeing NullPointerException while working on HBase:
{code}
14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/biadmin/hbase-trunk
14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=hdtest009:2181 sessionTimeout=9 watcher=null
14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Opening socket connection to server hdtest009/9.30.194.18:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Socket connection established to hdtest009/9.30.194.18:2181, initiating session
14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Session establishment complete on server hdtest009/9.30.194.18:2181, sessionid = 0x143986213e67e48, negotiated timeout = 6
14/01/30 22:15:25 ERROR zookeeper.ClientCnxn: Error while calling watcher
java.lang.NullPointerException
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
{code}

The reason is that the watcher is null in this part of the code:
{code}
private void processEvent(Object event) {
    try {
        if (event instanceof WatcherSetEventPair) {
            // each watcher will process the event
            WatcherSetEventPair pair = (WatcherSetEventPair) event;
            for (Watcher watcher : pair.watchers) {
                try {
                    watcher.process(pair.event);
                } catch (Throwable t) {
                    LOG.error("Error while calling watcher ", t);
                }
            }
{code}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
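One possible guard for the loop quoted in the report is to skip null watchers before dispatching. This is only a sketch under assumptions: it is not necessarily what ZOOKEEPER-1875-trunk.patch does, and the nested `Watcher` interface and the String event are stand-ins for the real ZooKeeper types so that the block compiles with the JDK alone.

```java
import java.util.List;

// Sketch of a null guard in the watcher dispatch loop (hypothetical names).
class NullWatcherSketch {
    // Stand-in for org.apache.zookeeper.Watcher.
    interface Watcher {
        void process(String event);
    }

    // Delivers the event to every non-null watcher and returns how many
    // watchers processed it without throwing.
    static int deliver(List<Watcher> watchers, String event) {
        int processed = 0;
        for (Watcher watcher : watchers) {
            if (watcher == null) {
                continue; // e.g. the client was created with watcher=null
            }
            try {
                watcher.process(event);
                processed++;
            } catch (Throwable t) {
                System.err.println("Error while calling watcher " + t);
            }
        }
        return processed;
    }
}
```

In the quoted code the NullPointerException is already caught by the surrounding catch (Throwable t), which is why it shows up as an ERROR log line rather than killing the event thread; a guard like this would only avoid the misleading error message.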
[jira] [Commented] (ZOOKEEPER-1875) NullPointerException in ClientCnxn$EventThread.processEvent
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897458#comment-13897458 ]

Camille Fournier commented on ZOOKEEPER-1875:
---------------------------------------------

No easy test possible?

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (ZOOKEEPER-1573) Unable to load database due to missing parent node
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897520#comment-13897520 ]

Vinayakumar B commented on ZOOKEEPER-1573:
------------------------------------------

Thanks all

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (ZOOKEEPER-1879) improve the correctness checking of txn log replay
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897577#comment-13897577 ]

Thawan Kooburat commented on ZOOKEEPER-1879:
--------------------------------------------

We can contribute our consistency checker as a contrib module. It is essentially a program that reads the entire data tree from 2 servers using the normal client API and compares them. It has a heuristic to ignore in-flight changes, so it never reports false positives. We use this program to make pair-wise comparisons between servers in each production ensemble.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
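The pair-wise comparison described in the comment could look roughly like the sketch below. Everything here is an assumption for illustration: the comment does not say how the checker works, `TreeCompareSketch` and `mismatches` are hypothetical names, each server's tree is modeled as a map from path to mzxid instead of being read through the client API, and the in-flight heuristic (ignore any node modified after a caller-chosen cutoff zxid) is invented for this example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy pair-wise data tree comparison (hypothetical, not the real checker).
class TreeCompareSketch {
    // a and b map each path to its mzxid (zxid of the last modification)
    // as seen on one server. Returns the paths that differ and are old
    // enough that the difference cannot be explained by in-flight changes.
    static Set<String> mismatches(Map<String, Long> a, Map<String, Long> b, long settledZxid) {
        Set<String> allPaths = new TreeSet<String>(a.keySet());
        allPaths.addAll(b.keySet());
        Set<String> bad = new TreeSet<String>();
        for (String path : allPaths) {
            Long za = a.get(path);
            Long zb = b.get(path);
            if (za != null && za.equals(zb)) {
                continue; // same version on both servers
            }
            // Assumed heuristic: a version newer than the cutoff may simply
            // not have reached the other server yet, so it is not reported.
            boolean inFlight = (za != null && za > settledZxid)
                    || (zb != null && zb > settledZxid);
            if (!inFlight) {
                bad.add(path);
            }
        }
        return bad;
    }
}
```

A node that exists on only one server, or has different settled versions on the two, is reported; a node touched after the cutoff is skipped and would be picked up by a later run.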
[jira] [Updated] (BOOKKEEPER-629) Support hostname based ledger metadata to help users to change IP with existing installation
[ https://issues.apache.org/jira/browse/BOOKKEEPER-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated BOOKKEEPER-629:
-------------------------------

Attachment: 10-BOOKKEEPER-629.patch

Thanks Sijie for the review and your time. Attached new patch addressing the comments.

Support hostname based ledger metadata to help users to change IP with existing installation
--------------------------------------------------------------------------------------------

Key: BOOKKEEPER-629
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-629
Project: Bookkeeper
Issue Type: Sub-task
Components: bookkeeper-auto-recovery, bookkeeper-client, bookkeeper-server
Affects Versions: 4.2.1
Reporter: Vinayakumar B
Assignee: Rakesh R
Fix For: 4.3.0
Attachments: 1-BOOKKEEPER-629.patch, 10-BOOKKEEPER-629.patch, 2-BOOKKEEPER-629.patch, 3-BOOKKEEPER-629.patch, 4-BOOKKEEPER-629.patch, 5-BOOKKEEPER-629.patch, 6-BOOKKEEPER-629.patch, 7-BOOKKEEPER-629.patch, 9-BOOKKEEPER-629.patch

Register the bookie with *hostname:port* and also store the bookie addresses as *hostname:port* in ledger metadata files instead of *ip:port*.

This will help users to change the machine IP if they want without losing their data.

Supporting hostname based installation/functionality is one of the important requirements of users.

Any thoughts?

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (BOOKKEEPER-629) Support hostname based ledger metadata to help users to change IP with existing installation
[ https://issues.apache.org/jira/browse/BOOKKEEPER-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13896798#comment-13896798 ]

Hadoop QA commented on BOOKKEEPER-629:
--------------------------------------

Testing JIRA BOOKKEEPER-629

Patch [10-BOOKKEEPER-629.patch|https://issues.apache.org/jira/secure/attachment/12628005/10-BOOKKEEPER-629.patch] downloaded at Mon Feb 10 17:31:33 UTC 2014

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 120
.{color:green}+1{color} the patch does adds/modifies 4 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 FINDBUGS{color}
.{color:green}+1{color} the patch does not seem to introduce new Findbugs warnings
{color:red}-1 TESTS{color}
.Tests run: 904
.Tests failed: 0
.Tests errors: 1
.The patch failed the following testcases:
.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at .
https://builds.apache.org/job/bookkeeper-trunk-precommit-build/588/

--
This message was sent by Atlassian JIRA (v6.1.5#6160)