ZooKeeper-trunk-WinVS2008 - Build # 754 - Failure
See https://builds.apache.org/job/ZooKeeper-trunk-WinVS2008/754/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 168 lines...] Build FAILED. f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008\trunk\src\c\zookeeper.sln (default target) (1) - (zookeeper target) - cl : Command line warning D9035: option 'Wp64' has been deprecated and will be removed in a future release .\src\zookeeper.c(502): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\zookeeper.c(750): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\zookeeper.c(1005): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\zookeeper.c(1016): warning C4996: 'strncpy': This function or variable may be unsafe. Consider using strncpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. .\src\zookeeper.c(1022): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\zookeeper.c(1083): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\zk_log.c(120): warning C4996: '_snprintf': This function or variable may be unsafe. Consider using _snprintf_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. .\src\zk_log.c(134): warning C4996: 'getpid': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _getpid. See online help for details. .\src\zk_log.c(160): warning C4996: 'vsnprintf': This function or variable may be unsafe. Consider using vsnprintf_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. .\src\zk_hashtable.c(189): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\zk_hashtable.c(213): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\mt_adaptor.c(498): warning C4244: '=' : conversion from 'time_t' to 'int32_t', possible loss of data f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008\trunk\src\c\zookeeper.sln (default target) (1) - (Cli target) - .\src\cli.c(124): warning C4996: 'fopen': This function or variable may be unsafe. Consider using fopen_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. .\src\cli.c(281): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\cli.c(282): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\cli.c(351): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\cli.c(371): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\cli.c(386): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\cli.c(397): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\cli.c(430): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\cli.c(433): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\cli.c(445): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\cli.c(458): warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _strdup. See online help for details. .\src\cli.c(551): warning C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. .\src\cli.c(556): warning C4996: 'fopen': This function or variable
ZooKeeper-trunk-WinVS2008 - Build # 755 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-WinVS2008/755/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 112 lines...] compile_jute_uptodate: compile_jute: [mkdir] Created dir: f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008\trunk\src\java\generated [mkdir] Created dir: f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008\trunk\src\c\generated [java] ../../zookeeper.jute Parsed Successfully [java] ../../zookeeper.jute Parsed Successfully [touch] Creating f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008\trunk\src\java\generated\.generated BUILD SUCCESSFUL Total time: 8 seconds [ZooKeeper-trunk-WinVS2008] $ cmd /c call C:\Users\hudson\AppData\Local\Temp\hudson6878150785352312376.bat f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008set ZOOKEEPER_HOME=f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008\trunk f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008msbuild trunk/src/c/zookeeper.sln /p:Configuration=Release Microsoft (R) Build Engine Version 3.5.30729.1 [Microsoft .NET Framework, Version 2.0.50727.4223] Copyright (C) Microsoft Corporation 2007. All rights reserved. Build started 3/3/2013 11:02:35 AM. Project f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008\trunk\src\c\zookeeper.sln on node 0 (default targets). Building solution configuration Release|Win32. cl : Command line warning D9035: option 'Wp64' has been deprecated and will be removed in a future release C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\time.inl(94): fatal error C1088: Cannot flush compiler intermediate file: 'C:\Users\hudson\AppData\Local\Temp\_CL_bbddcd05sy': No space left on device C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\time.inl(120): fatal error C1088: Cannot flush compiler intermediate file: 'C:\Users\hudson\AppData\Local\Temp\_CL_94c14deasy': No space left on device f:\hudson\hudson-slave\workspace\zookeeper-trunk-winvs2008\trunk\src\c\src\hashtable/hashtable_itr.h(39): fatal error C1088: Cannot flush compiler intermediate file: 'C:\Users\hudson\AppData\Local\Temp\_CL_082f04a3sy': No space left on device C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\time.inl(94): fatal error C1088: Cannot flush compiler intermediate file: 'C:\Users\hudson\AppData\Local\Temp\_CL_d298b055sy': No space left on device C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\time.inl(94): fatal error C1088: Cannot flush compiler intermediate file: 'C:\Users\hudson\AppData\Local\Temp\_CL_a7c8acacsy': No space left on device Done Building Project f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008\trunk\src\c\zookeeper.sln (default targets) -- FAILED. Build FAILED. f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008\trunk\src\c\zookeeper.sln (default target) (1) - (zookeeper target) - cl : Command line warning D9035: option 'Wp64' has been deprecated and will be removed in a future release f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008\trunk\src\c\zookeeper.sln (default target) (1) - (zookeeper target) - C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\time.inl(94): fatal error C1088: Cannot flush compiler intermediate file: 'C:\Users\hudson\AppData\Local\Temp\_CL_bbddcd05sy': No space left on device C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\time.inl(120): fatal error C1088: Cannot flush compiler intermediate file: 'C:\Users\hudson\AppData\Local\Temp\_CL_94c14deasy': No space left on device f:\hudson\hudson-slave\workspace\zookeeper-trunk-winvs2008\trunk\src\c\src\hashtable/hashtable_itr.h(39): fatal error C1088: Cannot flush compiler intermediate file: 'C:\Users\hudson\AppData\Local\Temp\_CL_082f04a3sy': No space left on device C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\time.inl(94): fatal error C1088: Cannot flush compiler intermediate file: 'C:\Users\hudson\AppData\Local\Temp\_CL_d298b055sy': No space left on device f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008\trunk\src\c\zookeeper.sln (default target) (1) - (Cli target) - C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\time.inl(94): fatal error C1088: Cannot flush compiler intermediate file: 'C:\Users\hudson\AppData\Local\Temp\_CL_a7c8acacsy': No space left on device 1 Warning(s) 5 Error(s) Time Elapsed 00:00:07.68 f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008exit 1 Build step 'Execute Windows batch command' marked build as failure Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Created] (ZOOKEEPER-1658) Support SRV records
Devin Bayer created ZOOKEEPER-1658: -- Summary: Support SRV records Key: ZOOKEEPER-1658 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1658 Project: ZooKeeper Issue Type: Improvement Reporter: Devin Bayer Priority: Minor We want to make client configuration easy so the quorum is just a single A rrset with multiple IP addresses. This isn't ideal because we need to hard-code the IPs off our zookeeper servers and they already have domain names. If zookeeper supported SRV, we could just do: _zookeeper.example.com. 86400 IN SRV 10 60 2181 worker1 _zookeeper.example.com. 86400 IN SRV 10 20 2181 worker2 _zookeeper.example.com. 86400 IN SRV 10 10 2181 worker3 and -Dhbase.zookeeper.quorum=example.com -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
ZooKeeper-trunk-solaris - Build # 486 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/486/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 165013 lines...] [junit] 2013-03-03 13:40:39,288 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@413] - selector thread exitted run method [junit] 2013-03-03 13:40:39,289 [myid:] - INFO [main:ZooKeeperServer@398] - shutting down [junit] 2013-03-03 13:40:39,289 [myid:] - INFO [main:SessionTrackerImpl@180] - Shutting down [junit] 2013-03-03 13:40:39,289 [myid:] - INFO [main:PrepRequestProcessor@804] - Shutting down [junit] 2013-03-03 13:40:39,289 [myid:] - INFO [main:SyncRequestProcessor@175] - Shutting down [junit] 2013-03-03 13:40:39,289 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@144] - PrepRequestProcessor exited loop! [junit] 2013-03-03 13:40:39,290 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@155] - SyncRequestProcessor exited! [junit] 2013-03-03 13:40:39,290 [myid:] - INFO [main:FinalRequestProcessor@421] - shutdown of request processor complete [junit] 2013-03-03 13:40:39,291 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-03-03 13:40:39,291 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[] [junit] 2013-03-03 13:40:39,292 [myid:] - INFO [main:ClientBase@414] - STARTING server [junit] 2013-03-03 13:40:39,292 [myid:] - INFO [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test444373612090522741.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test444373612090522741.junit.dir/version-2 [junit] 2013-03-03 13:40:39,293 [myid:] - INFO [main:NIOServerCnxnFactory@663] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 kB direct buffers. [junit] 2013-03-03 13:40:39,293 [myid:] - INFO [main:NIOServerCnxnFactory@676] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2013-03-03 13:40:39,294 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test444373612090522741.junit.dir/version-2/snapshot.b [junit] 2013-03-03 13:40:39,297 [myid:] - INFO [main:FileTxnSnapLog@270] - Snapshotting: 0xb to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test444373612090522741.junit.dir/version-2/snapshot.b [junit] 2013-03-03 13:40:39,298 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-03-03 13:40:39,299 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@289] - Accepted socket connection from /127.0.0.1:34520 [junit] 2013-03-03 13:40:39,300 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@829] - Processing stat command from /127.0.0.1:34520 [junit] 2013-03-03 13:40:39,304 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@678] - Stat command output [junit] 2013-03-03 13:40:39,304 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@1000] - Closed socket connection for client /127.0.0.1:34520 (no session established for client) [junit] 2013-03-03 13:40:39,304 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port] [junit] 2013-03-03 13:40:39,306 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree [junit] 2013-03-03 13:40:39,306 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] 2013-03-03 13:40:39,306 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port [junit] 2013-03-03 13:40:39,306 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2013-03-03 13:40:39,307 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota [junit] 2013-03-03 13:40:39,307 [myid:] - INFO [main:ClientBase@451] - tearDown starting [junit] 2013-03-03 13:40:39,369 [myid:] - INFO [main:ZooKeeper@744] - Session: 0x13d307b978b closed [junit] 2013-03-03 13:40:39,369 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down [junit] 2013-03-03 13:40:39,369 [myid:] - INFO [main:ClientBase@421] - STOPPING server [junit] 2013-03-03 13:40:39,370 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-1:NIOServerCnxnFactory$SelectorThread@413]
[jira] [Commented] (ZOOKEEPER-1633) Introduce a protocol version to connection initiation message
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13591824#comment-13591824 ] Flavio Junqueira commented on ZOOKEEPER-1633: - +1 for me too. does anyone else want to comment on this patch? Introduce a protocol version to connection initiation message - Key: ZOOKEEPER-1633 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1633 Project: ZooKeeper Issue Type: Bug Components: server Reporter: Alexander Shraer Assignee: Alexander Shraer Fix For: 3.4.6 Attachments: ZOOKEEPER-1633.patch, ZOOKEEPER-1633-v4.patch, ZOOKEEPER-1633-v4.patch, ZOOKEEPER-1633-ver2.patch, ZOOKEEPER-1633-ver3.patch Currently the first message a server sends to another server includes just one field - the server's id (long). This is in QuorumCnxManager.java. This makes changes to the information passed during this initial connection very difficult. This patch will change the first field of the message to be a protocol version (a negative number that can't be a server id). The second field will be the server id. The third field is number of bytes in the remainder of the message. A 3.4 server will read the first field as before, but if this is a negative number it will read the second field to find the server id, and then remove the remainder of the message from the stream. This will not affect 3.4 since 3.4 and earlier servers send just the server id (so the code in the patch will not run unless there is a server 3.4 trying to connect). This will, however, provide the necessary flexibility for future releases as well as an upgrade path from 3.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Morello updated ZOOKEEPER-1382: --- Attachment: ZOOKEEPER-1382.patch I managed to create a JUnit test case that demonstrate the watches leak, unfortunately i had to promote a few methods to the public scope. As Neha pointed out the unit test shows that if a client send a connection request with an expired session the NIOServerCnxn class still processes upcoming messages like OpCode.setWatches. Actually the fix simply consists to call cnxn.enableRecv() only if the session has been validated by the leader. Zookeeper server holds onto dead/expired session ids in the watch data structures - Key: ZOOKEEPER-1382 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.5 Reporter: Neha Narkhede Assignee: Neha Narkhede Priority: Critical Fix For: 3.4.6 Attachments: ZOOKEEPER-1382_3.3.4.patch, ZOOKEEPER-1382.patch I've observed that zookeeper server holds onto expired session ids in the watcher data structures. The result is the wchp command reports session ids that cannot be found through cons/dump and those expired session ids sit there maybe until the server is restarted. Here are snippets from the client and the server logs that lead to this state, for one particular session id 0x134485fd7bcb26f - There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 and I'm using ZkClient to connect to the cluster From the application log - application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application Session establishment complete on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, negotiated timeout = 6000 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application] Client session timed out, have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing socket connection and attempting reconnect application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] [main-SendThread(226.prod:12913)] [application] Unable to reconnect to ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket connection On the leader zk, 225 - zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, timeout of 6000ms exceeded zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination for sessionid: 0x134485fd7bcb26f On the server, the client was initially connected to, 223 - zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO [CommitProcessor:1:NIOServerCnxn@1580] - Established session 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f Here are the log snippets from 226, which is the server, the client reconnected to, before getting session expired event - 2012-01-27 09:52:38,190 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367 2012-01-27 09:52:38,191 - INFO [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired 2012-01-27 09:52:38,191 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:49367 which had sessionid 0x134485fd7bcb26f wchp output from 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*wchp* | wc -l 3 wchp output from 223, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*wchp* | wc -l 0 cons output from 223 and 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*cons* | wc -l 0 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*cons* | wc -l 0 So, what seems to have happened is that the client was able to re-register the watches on the new server (226), after it got disconnected from 223, inspite of having an expired session id. In NIOServerCnxn, I saw that after suspecting that a session is expired, a server removes the cnxn and its
Failed: ZOOKEEPER-1382 PreCommit Build #1419
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1382 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1419/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 71 lines...] [exec] Hunk #1 FAILED at 597. [exec] 1 out of 1 hunk FAILED -- saving rejects to file src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java.rej [exec] patching file src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java [exec] Hunk #1 FAILED at 206. [exec] 1 out of 1 hunk FAILED -- saving rejects to file src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java.rej [exec] patching file src/java/main/org/apache/zookeeper/ClientCnxn.java [exec] Hunk #1 succeeded at 231 (offset 4 lines). [exec] Hunk #2 succeeded at 258 (offset 4 lines). [exec] Hunk #3 succeeded at 277 (offset 4 lines). [exec] Hunk #4 succeeded at 304 (offset 4 lines). [exec] PATCH APPLICATION FAILED [exec] [exec] [exec] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12571815/ZOOKEEPER-1382.patch [exec] against trunk revision 1448007. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] -1 patch. The patch command could not apply the patch. [exec] [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1419//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 99f7cbcf806d15cce5ce7d0bf5780c5bb01beb7b logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 1 Total time: 43 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-1382 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13591848#comment-13591848 ] Hadoop QA commented on ZOOKEEPER-1382: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12571815/ZOOKEEPER-1382.patch against trunk revision 1448007. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1419//console This message is automatically generated. Zookeeper server holds onto dead/expired session ids in the watch data structures - Key: ZOOKEEPER-1382 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.5 Reporter: Neha Narkhede Assignee: Neha Narkhede Priority: Critical Fix For: 3.4.6 Attachments: ZOOKEEPER-1382_3.3.4.patch, ZOOKEEPER-1382.patch I've observed that zookeeper server holds onto expired session ids in the watcher data structures. The result is the wchp command reports session ids that cannot be found through cons/dump and those expired session ids sit there maybe until the server is restarted. Here are snippets from the client and the server logs that lead to this state, for one particular session id 0x134485fd7bcb26f - There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 and I'm using ZkClient to connect to the cluster From the application log - application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application Session establishment complete on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, negotiated timeout = 6000 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application] Client session timed out, have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing socket connection and attempting reconnect application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] [main-SendThread(226.prod:12913)] [application] Unable to reconnect to ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket connection On the leader zk, 225 - zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, timeout of 6000ms exceeded zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination for sessionid: 0x134485fd7bcb26f On the server, the client was initially connected to, 223 - zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO [CommitProcessor:1:NIOServerCnxn@1580] - Established session 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f Here are the log snippets from 226, which is the server, the client reconnected to, before getting session expired event - 2012-01-27 09:52:38,190 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367 2012-01-27 09:52:38,191 - INFO [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired 2012-01-27 09:52:38,191 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:49367 which had sessionid 0x134485fd7bcb26f wchp output from 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*wchp* | wc -l 3 wchp output from 223, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*wchp* | wc -l 0 cons output from 223 and 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*cons* | wc -l 0 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*cons* | wc -l 0 So, what seems to have happened is that the client was able to re-register the watches on the new server (226), after it got disconnected from 223, inspite of having an expired session id. In NIOServerCnxn, I saw that after
[jira] [Updated] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Morello updated ZOOKEEPER-1382: --- Attachment: (was: ZOOKEEPER-1382.patch) Zookeeper server holds onto dead/expired session ids in the watch data structures - Key: ZOOKEEPER-1382 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.5 Reporter: Neha Narkhede Assignee: Neha Narkhede Priority: Critical Fix For: 3.4.6 Attachments: ZOOKEEPER-1382_3.3.4.patch I've observed that zookeeper server holds onto expired session ids in the watcher data structures. The result is the wchp command reports session ids that cannot be found through cons/dump and those expired session ids sit there maybe until the server is restarted. Here are snippets from the client and the server logs that lead to this state, for one particular session id 0x134485fd7bcb26f - There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 and I'm using ZkClient to connect to the cluster From the application log - application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application Session establishment complete on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, negotiated timeout = 6000 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application] Client session timed out, have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing socket connection and attempting reconnect application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] [main-SendThread(226.prod:12913)] [application] Unable to reconnect to ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket connection On the leader zk, 225 - zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, timeout of 6000ms exceeded zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination for sessionid: 0x134485fd7bcb26f On the server, the client was initially connected to, 223 - zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO [CommitProcessor:1:NIOServerCnxn@1580] - Established session 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f Here are the log snippets from 226, which is the server, the client reconnected to, before getting session expired event - 2012-01-27 09:52:38,190 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367 2012-01-27 09:52:38,191 - INFO [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired 2012-01-27 09:52:38,191 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:49367 which had sessionid 0x134485fd7bcb26f wchp output from 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*wchp* | wc -l 3 wchp output from 223, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*wchp* | wc -l 0 cons output from 223 and 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*cons* | wc -l 0 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*cons* | wc -l 0 So, what seems to have happened is that the client was able to re-register the watches on the new server (226), after it got disconnected from 223, inspite of having an expired session id. In NIOServerCnxn, I saw that after suspecting that a session is expired, a server removes the cnxn and its watches from its internal data structures. But before that it allows more requests to be processed even if the session is expired - // Now that the session is ready we can start receiving packets synchronized (this.factory) { sk.selector().wakeup(); enableRecv(); } } catch (Exception e) { LOG.warn(Exception while establishing session, closing, e); close();
[jira] [Updated] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Morello updated ZOOKEEPER-1382: --- Attachment: ZOOKEEPER-1382-branch-3.4.patch Current patch is for branch 3.4 Sorry Zookeeper server holds onto dead/expired session ids in the watch data structures - Key: ZOOKEEPER-1382 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.5 Reporter: Neha Narkhede Assignee: Neha Narkhede Priority: Critical Fix For: 3.4.6 Attachments: ZOOKEEPER-1382_3.3.4.patch, ZOOKEEPER-1382-branch-3.4.patch I've observed that zookeeper server holds onto expired session ids in the watcher data structures. The result is the wchp command reports session ids that cannot be found through cons/dump and those expired session ids sit there maybe until the server is restarted. Here are snippets from the client and the server logs that lead to this state, for one particular session id 0x134485fd7bcb26f - There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 and I'm using ZkClient to connect to the cluster From the application log - application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application Session establishment complete on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, negotiated timeout = 6000 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application] Client session timed out, have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing socket connection and attempting reconnect application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] [main-SendThread(226.prod:12913)] [application] Unable to reconnect to ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket connection On the leader zk, 225 - zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, timeout of 6000ms exceeded zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination for sessionid: 0x134485fd7bcb26f On the server, the client was initially connected to, 223 - zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO [CommitProcessor:1:NIOServerCnxn@1580] - Established session 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f Here are the log snippets from 226, which is the server, the client reconnected to, before getting session expired event - 2012-01-27 09:52:38,190 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367 2012-01-27 09:52:38,191 - INFO [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired 2012-01-27 09:52:38,191 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:49367 which had sessionid 0x134485fd7bcb26f wchp output from 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*wchp* | wc -l 3 wchp output from 223, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*wchp* | wc -l 0 cons output from 223 and 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*cons* | wc -l 0 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*cons* | wc -l 0 So, what seems to have happened is that the client was able to re-register the watches on the new server (226), after it got disconnected from 223, inspite of having an expired session id. In NIOServerCnxn, I saw that after suspecting that a session is expired, a server removes the cnxn and its watches from its internal data structures. But before that it allows more requests to be processed even if the session is expired - // Now that the session is ready we can start receiving packets synchronized (this.factory) { sk.selector().wakeup(); enableRecv(); } } catch (Exception e) {
Failed: ZOOKEEPER-1382 PreCommit Build #1420
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1382 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1420/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 71 lines...] [exec] Hunk #1 FAILED at 597. [exec] 1 out of 1 hunk FAILED -- saving rejects to file src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java.rej [exec] patching file src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java [exec] Hunk #1 FAILED at 206. [exec] 1 out of 1 hunk FAILED -- saving rejects to file src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java.rej [exec] patching file src/java/main/org/apache/zookeeper/ClientCnxn.java [exec] Hunk #1 succeeded at 231 (offset 4 lines). [exec] Hunk #2 succeeded at 258 (offset 4 lines). [exec] Hunk #3 succeeded at 277 (offset 4 lines). [exec] Hunk #4 succeeded at 304 (offset 4 lines). [exec] PATCH APPLICATION FAILED [exec] [exec] [exec] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12571817/ZOOKEEPER-1382-branch-3.4.patch [exec] against trunk revision 1448007. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] -1 patch. The patch command could not apply the patch. [exec] [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1420//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] e372528e767e2b6757ed4e2f6bc7f26b87e7bb55 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 1 Total time: 42 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-1382 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13591854#comment-13591854 ] Hadoop QA commented on ZOOKEEPER-1382: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12571817/ZOOKEEPER-1382-branch-3.4.patch against trunk revision 1448007. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1420//console This message is automatically generated. Zookeeper server holds onto dead/expired session ids in the watch data structures - Key: ZOOKEEPER-1382 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.5 Reporter: Neha Narkhede Assignee: Neha Narkhede Priority: Critical Fix For: 3.4.6 Attachments: ZOOKEEPER-1382_3.3.4.patch, ZOOKEEPER-1382-branch-3.4.patch I've observed that zookeeper server holds onto expired session ids in the watcher data structures. The result is the wchp command reports session ids that cannot be found through cons/dump and those expired session ids sit there maybe until the server is restarted. Here are snippets from the client and the server logs that lead to this state, for one particular session id 0x134485fd7bcb26f - There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 and I'm using ZkClient to connect to the cluster From the application log - application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application Session establishment complete on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, negotiated timeout = 6000 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application] Client session timed out, have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing socket connection and attempting reconnect application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] [main-SendThread(226.prod:12913)] [application] Unable to reconnect to ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket connection On the leader zk, 225 - zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, timeout of 6000ms exceeded zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination for sessionid: 0x134485fd7bcb26f On the server, the client was initially connected to, 223 - zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO [CommitProcessor:1:NIOServerCnxn@1580] - Established session 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f Here are the log snippets from 226, which is the server, the client reconnected to, before getting session expired event - 2012-01-27 09:52:38,190 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367 2012-01-27 09:52:38,191 - INFO [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired 2012-01-27 09:52:38,191 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:49367 which had sessionid 0x134485fd7bcb26f wchp output from 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*wchp* | wc -l 3 wchp output from 223, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*wchp* | wc -l 0 cons output from 223 and 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*cons* | wc -l 0 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*cons* | wc -l 0 So, what seems to have happened is that the client was able to re-register the watches on the new server (226), after it got disconnected from 223, inspite of having an expired session id. In
ZooKeeper_branch33_solaris - Build # 460 - Failure
See https://builds.apache.org/job/ZooKeeper_branch33_solaris/460/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 96942 lines...] [junit] 2013-03-04 07:06:59,076 - INFO [main:ZooKeeperServer@154] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test6220841508599798586.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test6220841508599798586.junit.dir/version-2 [junit] 2013-03-04 07:06:59,077 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2013-03-04 07:06:59,079 - INFO [main:FileSnap@82] - Reading snapshot /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test6220841508599798586.junit.dir/version-2/snapshot.0 [junit] 2013-03-04 07:06:59,082 - INFO [main:FileTxnSnapLog@256] - Snapshotting: b [junit] 2013-03-04 07:06:59,084 - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-03-04 07:06:59,085 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:45594 [junit] 2013-03-04 07:06:59,086 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@1237] - Processing stat command from /127.0.0.1:45594 [junit] 2013-03-04 07:06:59,087 - INFO [Thread-4:NIOServerCnxn$StatCommand@1153] - Stat command output [junit] 2013-03-04 07:06:59,088 - INFO [Thread-4:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:45594 (no session established for client) [junit] ensureOnly:[InMemoryDataTree, StandaloneServer_port] [junit] expect:InMemoryDataTree [junit] found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] expect:StandaloneServer_port [junit] found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2013-03-04 07:06:59,090 - INFO [main:ClientBase@408] - STOPPING server [junit] 2013-03-04 07:06:59,091 - INFO [ProcessThread:-1:PrepRequestProcessor@128] - PrepRequestProcessor exited loop! [junit] 2013-03-04 07:06:59,091 - INFO [SyncThread:0:SyncRequestProcessor@151] - SyncRequestProcessor exited! [junit] 2013-03-04 07:06:59,092 - INFO [main:FinalRequestProcessor@370] - shutdown of request processor complete [junit] 2013-03-04 07:06:59,093 - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] ensureOnly:[] [junit] 2013-03-04 07:06:59,095 - INFO [main:ClientBase@401] - STARTING server [junit] 2013-03-04 07:06:59,095 - INFO [main:ZooKeeperServer@154] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test6220841508599798586.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test6220841508599798586.junit.dir/version-2 [junit] 2013-03-04 07:06:59,096 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2013-03-04 07:06:59,097 - INFO [main:FileSnap@82] - Reading snapshot /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test6220841508599798586.junit.dir/version-2/snapshot.b [junit] 2013-03-04 07:06:59,100 - INFO [main:FileTxnSnapLog@256] - Snapshotting: b [junit] 2013-03-04 07:06:59,102 - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-03-04 07:06:59,103 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:45596 [junit] 2013-03-04 07:06:59,103 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@1237] - Processing stat command from /127.0.0.1:45596 [junit] 2013-03-04 07:06:59,104 - INFO [Thread-5:NIOServerCnxn$StatCommand@1153] - Stat command output [junit] 2013-03-04 07:06:59,105 - INFO [Thread-5:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:45596 (no session established for client) [junit] ensureOnly:[InMemoryDataTree, StandaloneServer_port] [junit] expect:InMemoryDataTree [junit] found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] expect:StandaloneServer_port [junit] found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2013-03-04 07:06:59,107 - INFO
[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592041#comment-13592041 ] Mahadev konar commented on ZOOKEEPER-1382: -- Michael, Would you be able to upload a patch for trunk as well? Zookeeper server holds onto dead/expired session ids in the watch data structures - Key: ZOOKEEPER-1382 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.5 Reporter: Neha Narkhede Assignee: Neha Narkhede Priority: Critical Fix For: 3.4.6 Attachments: ZOOKEEPER-1382_3.3.4.patch, ZOOKEEPER-1382-branch-3.4.patch I've observed that zookeeper server holds onto expired session ids in the watcher data structures. The result is the wchp command reports session ids that cannot be found through cons/dump and those expired session ids sit there maybe until the server is restarted. Here are snippets from the client and the server logs that lead to this state, for one particular session id 0x134485fd7bcb26f - There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 and I'm using ZkClient to connect to the cluster From the application log - application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application Session establishment complete on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, negotiated timeout = 6000 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application] Client session timed out, have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing socket connection and attempting reconnect application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] [main-SendThread(226.prod:12913)] [application] Unable to reconnect to ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket connection On the leader zk, 225 - zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, timeout of 6000ms exceeded zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination for sessionid: 0x134485fd7bcb26f On the server, the client was initially connected to, 223 - zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO [CommitProcessor:1:NIOServerCnxn@1580] - Established session 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f Here are the log snippets from 226, which is the server, the client reconnected to, before getting session expired event - 2012-01-27 09:52:38,190 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367 2012-01-27 09:52:38,191 - INFO [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired 2012-01-27 09:52:38,191 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:49367 which had sessionid 0x134485fd7bcb26f wchp output from 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*wchp* | wc -l 3 wchp output from 223, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*wchp* | wc -l 0 cons output from 223 and 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*cons* | wc -l 0 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*cons* | wc -l 0 So, what seems to have happened is that the client was able to re-register the watches on the new server (226), after it got disconnected from 223, inspite of having an expired session id. In NIOServerCnxn, I saw that after suspecting that a session is expired, a server removes the cnxn and its watches from its internal data structures. But before that it allows more requests to be processed even if the session is expired - // Now that the session is ready we can start receiving packets synchronized (this.factory) { sk.selector().wakeup(); enableRecv(); } } catch
[jira] [Commented] (ZOOKEEPER-1551) Observer ignore txns that comes after snapshot and UPTODATE
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592043#comment-13592043 ] Mahadev konar commented on ZOOKEEPER-1551: -- [~fpj] would you be able to review the latest patch? Observer ignore txns that comes after snapshot and UPTODATE Key: ZOOKEEPER-1551 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1551 Project: ZooKeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.4.3 Reporter: Thawan Kooburat Assignee: Thawan Kooburat Priority: Blocker Fix For: 3.5.0, 3.4.6 Attachments: ZOOKEEPER-1551.patch, ZOOKEEPER-1551.patch In Learner.java, txns which comes after the learner has taken the snapshot (after NEWLEADER packet) are stored in packetsNotCommitted. The follower has special logic to apply these txns at the end of syncWithLeader() method. However, the observer will ignore these txns completely, causing data inconsistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ZOOKEEPER-1657) Increased CPU usage by unnecessary SASL checks
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-1657: - Fix Version/s: 3.4.6 3.5.0 Increased CPU usage by unnecessary SASL checks -- Key: ZOOKEEPER-1657 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1657 Project: ZooKeeper Issue Type: Bug Components: java client Affects Versions: 3.4.5 Reporter: Gunnar Wagenknecht Labels: performance Fix For: 3.5.0, 3.4.6 Attachments: ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, zookeeper-hotspot.png I did some profiling in one of our Java environments and found an interesting footprint in ZooKeeper. The SASL support seems to trigger a lot times on the client although it's not even in use. Is there a switch to disable SASL completely? The attached screenshot shows a 10-minute profiling session on one of our production Jetty servers. The Jetty server handles ~1k web requests per minute. The average response time per web request is a few milli seconds. The profiling was performed on a machine running for 24h. We noticed a significant CPU increase on our servers when deploying an update from ZooKeeper 3.3.2 to ZooKeeper 3.4.5. Thus, we started investigating. The screenshot shows that only 32% CPU time are spent in Jetty. In contrast, 65% are spend in ZooKeeper. A few notes/thoughts: * {{ClientCnxn$SendThread.clientTunneledAuthenticationInProgress}} seems to be the culprit * {{javax.security.auth.login.Configuration.getConfiguration}} seems to be called very often? * There is quite a bit reflection involved in {{java.security.AccessController.doPrivileged}} * No security manager is active in the JVM: I tend to place an if-check in the code before calling {{AccessController.doPrivileged}}. When no SM is installed, the runnable can be called directly which safes cycles. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira