Build failed in Hudson: ZooKeeper-trunk #241
See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/241/

[...truncated 55409 lines...]
[junit] 2009-02-26 11:54:30,349 - INFO [NIOServerCxn.Factory:33221:nioservercnxn$fact...@177] - NIOServerCnxn factory exited run method
[junit] 2009-02-26 11:54:30,349 - INFO [main:finalrequestproces...@268] - shutdown of request processor complete
[junit] 2009-02-26 11:54:30,349 - INFO [SyncThread:0:syncrequestproces...@119] - SyncRequestProcessor exited!
[junit] 2009-02-26 11:54:30,349 - INFO [ProcessThread:-1:preprequestproces...@111] - PrepRequestProcessor exited loop!
[junit] 2009-02-26 11:54:30,449 - INFO [main:clientb...@306] - STARTING server
[junit] 2009-02-26 11:54:30,449 - INFO [main:zookeeperser...@160] - Created server
[junit] 2009-02-26 11:54:30,450 - INFO [main:files...@71] - Reading snapshot http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/ws/trunk/build/test/tmp/test7339586621902816764.junit.dir/version-2/snapshot.0
[junit] 2009-02-26 11:54:30,451 - INFO [main:filetxnsnap...@198] - Snapshotting: 3
[junit] 2009-02-26 11:54:30,453 - INFO [NIOServerCxn.Factory:33221:nioserverc...@635] - Processing stat command from /127.0.0.1:45760
[junit] 2009-02-26 11:54:30,453 - WARN [NIOServerCxn.Factory:33221:nioserverc...@431] - Exception causing close of session 0x0 due to java.io.IOException: Responded to info probe
[junit] 2009-02-26 11:54:30,454 - INFO [NIOServerCxn.Factory:33221:nioserverc...@766] - closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/127.0.0.1:33221 remote=/127.0.0.1:45760]
[junit] 2009-02-26 11:54:32,157 - INFO [main-SendThread:clientcnxn$sendthr...@800] - Attempting connection to server /127.0.0.1:33221
[junit] 2009-02-26 11:54:32,157 - INFO [main-SendThread:clientcnxn$sendthr...@716] - Priming connection to java.nio.channels.SocketChannel[connected local=/127.0.0.1:45761 remote=/127.0.0.1:33221]
[junit] 2009-02-26 11:54:32,157 - INFO [main-SendThread:clientcnxn$sendthr...@868] - Server connection successful
[junit] 2009-02-26 11:54:32,158 - INFO [NIOServerCxn.Factory:33221:nioserverc...@517] - Connected to /127.0.0.1:45761 lastZxid 3
[junit] 2009-02-26 11:54:32,158 - INFO [NIOServerCxn.Factory:33221:nioserverc...@895] - Finished init of 0x11fb26f923a valid:true
[junit] 2009-02-26 11:54:32,158 - INFO [NIOServerCxn.Factory:33221:nioserverc...@545] - Renewing session 0x11fb26f923a
[junit] 2009-02-26 11:54:33,000 - INFO [SessionTracker:sessiontrackeri...@142] - SessionTrackerImpl exited loop!
[junit] 2009-02-26 11:54:33,000 - INFO [SessionTracker:sessiontrackeri...@142] - SessionTrackerImpl exited loop!
[junit] 2009-02-26 11:55:06,172 - INFO [main:clientb...@300] - STOPPING server
[junit] 2009-02-26 11:55:06,173 - INFO [main:nioserverc...@766] - closing session:0x11fb26f923a NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/127.0.0.1:33221 remote=/127.0.0.1:45761]
[junit] 2009-02-26 11:55:06,173 - WARN [main-SendThread:clientcnxn$sendthr...@898] - Exception closing session 0x11fb26f923a to sun.nio.ch.selectionkeyi...@4204
[junit] java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
[junit] at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:632)
[junit] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:876)
[junit] 2009-02-26 11:55:06,173 - INFO [NIOServerCxn.Factory:33221:nioservercnxn$fact...@177] - NIOServerCnxn factory exited run method
[junit] 2009-02-26 11:55:06,174 - INFO [main:finalrequestproces...@268] - shutdown of request processor complete
[junit] 2009-02-26 11:55:06,174 - INFO [ProcessThread:-1:preprequestproces...@111] - PrepRequestProcessor exited loop!
[junit] 2009-02-26 11:55:06,174 - INFO [SyncThread:0:syncrequestproces...@119] - SyncRequestProcessor exited!
[junit] 2009-02-26 11:55:06,273 - INFO [main:clientb...@306] - STARTING server
[junit] 2009-02-26 11:55:06,274 - INFO [main:zookeeperser...@160] - Created server
[junit] 2009-02-26 11:55:06,275 - INFO [main:files...@71] - Reading snapshot http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/ws/trunk/build/test/tmp/test7339586621902816764.junit.dir/version-2/snapshot.3
[junit] 2009-02-26 11:55:06,297 - INFO [main:filetxnsnap...@198] - Snapshotting: 5
[junit] 2009-02-26 11:55:06,299 - INFO [NIOServerCxn.Factory:33221:nioserverc...@635] - Processing stat command from /127.0.0.1:45763
[junit] 2009-02-26 11:55:06,300 - WARN [NIOServerCxn.Factory:33221:nioserverc...@431] - Exception causing close of session 0x0 due to java.io.IOException: Responded to info probe
[junit] 2009-02-26 11:55:06,300 - INFO [NIOServerCxn.Factory:33221:nioserverc...@766] - closing session:0x0 NIOServerCnxn:
[jira] Updated: (ZOOKEEPER-330) zookeeper standalone server does not startup with just a port and datadir.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Darroch updated ZOOKEEPER-330:
Attachment: ZOOKEEPER-330.patch

Sorry I didn't catch the problem with src/c/tests/zkServer.sh invoking ZooKeeperMain directly.

The only issue I had with your proposed patch here is that the original problem I encountered in ZOOKEEPER-326 returns -- if you start a standalone server using QuorumPeerMain, it again ignores the tickTime settings. This contradicts the information here:

http://hadoop.apache.org/zookeeper/docs/r3.1.0/zookeeperStarted.html#sc_InstallingSingleMode

It would also be nice, I think, for the standard zkServer.sh (which uses QuorumPeerMain) and the standard config file to work fully for standalone installations out of the box.

My revisions in the attached patch allow ZooKeeperMain to take a single config file as an alternate set of arguments, in which case it works like QuorumPeerMain. This seems to resolve my original issue, support the addition of a command-line tickTime argument as per your patch, and allow the src/c/tests/zkServer.sh script to work as-is.

zookeeper standalone server does not startup with just a port and datadir.

Key: ZOOKEEPER-330
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-330
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.1.1, 3.2.0
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
Fix For: 3.1.1, 3.2.0
Attachments: ZOOKEEPER-330.patch, ZOOKEEPER-330.patch

ZOOKEEPER-326 made a change to zookeeperservermain.java that broke the starting of zookeeperserver with just the port and datadir.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-319) add locking around auth info in zhandle_t
[ https://issues.apache.org/jira/browse/ZOOKEEPER-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Darroch updated ZOOKEEPER-319:
Attachment: ZOOKEEPER-319.patch

Good points -- see if this suits.

add locking around auth info in zhandle_t

Key: ZOOKEEPER-319
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-319
Project: Zookeeper
Issue Type: Bug
Components: c client
Affects Versions: 3.0.0, 3.0.1, 3.1.0
Reporter: Chris Darroch
Fix For: 3.1.1, 3.2.0
Attachments: ZOOKEEPER-319.patch, ZOOKEEPER-319.patch, ZOOKEEPER-319.patch

Looking over the zookeeper.c code it appears to me that the zoo_add_auth() function may be called at any time by the user in their main thread. This function alters the elements of the auth_info structure in the zhandle_t structure.

Meanwhile, the IO thread may read those elements at any time in such functions as send_auth_info() and auth_completion_func().

It seems important, then, to add a lock which prevents data being read by the IO thread while only partially changed by the user's thread. The attached patch adds such a lock.
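For readers following the ZOOKEEPER-319 discussion, the kind of locking described there might look roughly like the sketch below. This is not the attached patch. The demo_auth_info type and the demo_auth_* functions are hypothetical stand-ins for illustration only, and the real auth_info layout in zookeeper.c is not reproduced. The idea shown is simply that the user's thread swaps in new values while holding a mutex, and the IO thread copies them out under the same mutex, so it never sees a partially changed structure.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the auth data held in zhandle_t; the real
 * auth_info structure in zookeeper.c is not reproduced here. */
typedef struct {
    pthread_mutex_t lock;  /* guards scheme and cert together */
    char *scheme;          /* heap-allocated string, or NULL if unset */
    char *cert;            /* heap-allocated string, or NULL if unset */
} demo_auth_info;

/* Duplicate a string on the heap; returns NULL on allocation failure. */
static char *demo_dup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Prepare an empty structure; returns 0 on success. */
int demo_auth_init(demo_auth_info *a)
{
    a->scheme = NULL;
    a->cert = NULL;
    return pthread_mutex_init(&a->lock, NULL);
}

/* User thread: build the new values outside the lock, then swap both
 * fields while holding it so a reader never sees one old and one new. */
int demo_auth_set(demo_auth_info *a, const char *scheme, const char *cert)
{
    char *new_scheme = demo_dup(scheme);
    char *new_cert = demo_dup(cert);
    char *old_scheme, *old_cert;

    if (new_scheme == NULL || new_cert == NULL) {
        free(new_scheme);
        free(new_cert);
        return -1;
    }

    pthread_mutex_lock(&a->lock);
    old_scheme = a->scheme;
    old_cert = a->cert;
    a->scheme = new_scheme;
    a->cert = new_cert;
    pthread_mutex_unlock(&a->lock);

    free(old_scheme);
    free(old_cert);
    return 0;
}

/* IO thread: copy the current values out while holding the lock.
 * Returns 0 on success, -1 if nothing is set or a buffer is too small. */
int demo_auth_get(demo_auth_info *a, char *scheme, size_t scheme_len,
                  char *cert, size_t cert_len)
{
    int rc = -1;

    pthread_mutex_lock(&a->lock);
    if (a->scheme != NULL &&
        strlen(a->scheme) < scheme_len && strlen(a->cert) < cert_len) {
        strcpy(scheme, a->scheme);
        strcpy(cert, a->cert);
        rc = 0;
    }
    pthread_mutex_unlock(&a->lock);
    return rc;
}
```

Copying the data out under the lock, rather than handing the IO thread a pointer, keeps the critical section short and avoids the reader holding a pointer that the writer later frees. Whether the actual patch takes this approach is not stated in the thread.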
[jira] Commented: (ZOOKEEPER-318) remove locking in zk_hashtable.c or add locking in collect_keys()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677147#action_12677147 ]

Chris Darroch commented on ZOOKEEPER-318:

Well, my own take on things like this and ZOOKEEPER-262 is that it's always good to keep things clean and simple -- a good day of programming in my book is one that removes more lines of code than are created and yet keeps the same or better functionality.

Aside from any incremental performance gains, I think the big win with both of these patches is that they make the purpose of the code that much more apparent. A significant part of programming, I believe, is psychology. A programmer who comes across a package laced with pthread_mutex_lock() statements immediately makes two pretty reasonable assumptions: the code is used in a multi-threaded context, and it's MT-safe.

In this case, both assumptions are incorrect; the code isn't used in an MT context and if it were to be, collect_keys() appears to be lacking the necessary locks and I suspect it would be MT-unsafe. There could always be other subtle MT-related bugs which haven't been shaken out too, should one start using it in MT code.

Thus my own feeling is that it's better to simplify and remove these locks for a variety of reasons: it makes the code more self-documenting; easier to read, understand, and revise; and marginally faster.

Should the hashtables need to be used in an MT context in the future, the existing code can always be recovered quickly from SVN. If there's an explanatory note in the SVN log that mentions the collect_keys() issue, all the better; then whoever might need to do this work will be prompted to think that aspect through as well.

That's just my two cents, of course. :-)

remove locking in zk_hashtable.c or add locking in collect_keys()

Key: ZOOKEEPER-318
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-318
Project: Zookeeper
Issue Type: Bug
Components: c client
Affects Versions: 3.0.0, 3.0.1, 3.1.0
Reporter: Chris Darroch
Fix For: 3.2.0, 4.0.0
Attachments: ZOOKEEPER-318.patch

From a review of zk_hashtable.c it appears to me that all functions which manipulate the hashtables are called from the IO thread, and therefore any need for locking is obviated.

If I'm wrong about that, then I think at a minimum collect_keys() should acquire a lock in the same manner as collect_session_watchers(). Both iterate over hashtable contents (in the latter case using copy_table()).

However, from what I can see, the only function (besides the init/destroy functions used when creating a zhandle_t) called from the completion thread is deliverWatchers(), which simply iterates over a delivery list created from the hashtables by collectWatchers(). The activateWatcher() function contains comments which describe it being called by the completion thread, but in fact it is called by the IO thread in zookeeper_process().

I believe all calls to collectWatchers(), activateWatcher(), and collect_keys() are made by the IO thread in zookeeper_interest(), zookeeper_process(), check_events(), send_set_watches(), and handle_error(). Note that queue_session_event() is aliased as PROCESS_SESSION_EVENT, but appears only in handle_error() and check_events().

Also note that handle_error() is called only in zookeeper_process() and handle_socket_error_msg(), which is used only by the IO thread, so far as I can see.
[jira] Issue Comment Edited: (ZOOKEEPER-320) call auth completion in free_completions()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677143#action_12677143 ]

cdarroch edited comment on ZOOKEEPER-320 at 2/26/09 2:45 PM:

Updated with a NULL initialization as per the comment on [ZOOKEEPER-319#action_12676824].

Out of interest --- what compiler gives these errors? My gcc 4.1.2 with -Wall doesn't report any troubles.

was (Author: cdarroch):
Updated with a NULL initialization as per the comment on ZOOKEEPER-319#action_12676824

Out of interest --- what compiler gives these errors? My gcc 4.1.2 with -Wall doesn't report any troubles.

call auth completion in free_completions()

Key: ZOOKEEPER-320
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-320
Project: Zookeeper
Issue Type: Bug
Components: c client
Affects Versions: 3.0.0, 3.0.1, 3.1.0
Reporter: Chris Darroch
Assignee: Chris Darroch
Fix For: 3.1.1, 3.2.0
Attachments: ZOOKEEPER-320-319.patch, ZOOKEEPER-320-319.patch, ZOOKEEPER-320.patch

If a client calls zoo_add_auth() with an invalid scheme (e.g., foo) the ZooKeeper server will mark their session expired and close the connection. However, the C client has already returned a ZOK return code immediately after queuing the new auth data to be sent. If the client then waits for their auth completion function to be called, they can wait forever, as no session event is ever delivered to that completion function. All other completion functions are notified of session events by free_completions(), which is called by cleanup_bufs() in handle_error() in handle_socket_error_msg().

In actual fact, what can happen (about 50% of the time, for me) is that the next call by the IO thread to flush_send_queue() calls send() from within send_buffer(), and receives a SIGPIPE signal during this send() call. Because the ZooKeeper C API is a library, it properly does not catch that signal. If the user's code is not catching that signal either, they experience an abort caused by an untrapped signal. If they are ignoring the signal -- which is common in the context I'm working in, the Apache httpd server -- then flush_send_queue()'s error return code is EPIPE, which is logged by handle_socket_error_msg(), and all non-auth completion functions are notified of a session event. However, if the caller is waiting for their auth completion function, they wait forever while the IO thread tries repeatedly to reconnect and is rejected by the server as having an expired session.

So, first of all, it would be useful to document in the C API portion of the programmer's guide that trapping or ignoring SIGPIPE is important, as this signal may be generated by the C API.

Next, the two attached patches call the auth completion function, if any, in free_completions(), which fixes this problem for me. The second attached patch includes the auth lock/unlock functions, as per ZOOKEEPER-319.
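As a concrete illustration of the SIGPIPE advice in the ZOOKEEPER-320 description, an application could ignore the signal once at startup, before any sockets are used. The sketch below uses only the standard POSIX sigaction() call. It is not part of the ZooKeeper C API, and the ignore_sigpipe() name is hypothetical. With the signal ignored, a send() on a connection closed by the peer fails with EPIPE instead of terminating the process.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() under strict -std modes */

#include <signal.h>
#include <string.h>

/* Ignore SIGPIPE process-wide so that writing to a socket closed by the
 * peer returns an error (EPIPE) rather than killing the process.
 * Hypothetical helper name; returns 0 on success, -1 on failure. */
int ignore_sigpipe(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;      /* discard the signal */
    sigemptyset(&sa.sa_mask);     /* no additional signals blocked */
    return sigaction(SIGPIPE, &sa, NULL);
}
```

An application would call ignore_sigpipe() once early in main(). Note that ignoring the signal is process-wide, so it also affects any other code in the process that relies on SIGPIPE being delivered.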
[jira] Commented: (ZOOKEEPER-320) call auth completion in free_completions()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677169#action_12677169 ]

Mahadev konar commented on ZOOKEEPER-320:

my compiler is gcc 3.4.4

call auth completion in free_completions()

Key: ZOOKEEPER-320
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-320
Project: Zookeeper
Issue Type: Bug
Components: c client
Affects Versions: 3.0.0, 3.0.1, 3.1.0
Reporter: Chris Darroch
Assignee: Chris Darroch
Fix For: 3.1.1, 3.2.0
Attachments: ZOOKEEPER-320-319.patch, ZOOKEEPER-320-319.patch, ZOOKEEPER-320.patch

If a client calls zoo_add_auth() with an invalid scheme (e.g., foo) the ZooKeeper server will mark their session expired and close the connection. However, the C client has already returned a ZOK return code immediately after queuing the new auth data to be sent. If the client then waits for their auth completion function to be called, they can wait forever, as no session event is ever delivered to that completion function. All other completion functions are notified of session events by free_completions(), which is called by cleanup_bufs() in handle_error() in handle_socket_error_msg().

In actual fact, what can happen (about 50% of the time, for me) is that the next call by the IO thread to flush_send_queue() calls send() from within send_buffer(), and receives a SIGPIPE signal during this send() call. Because the ZooKeeper C API is a library, it properly does not catch that signal. If the user's code is not catching that signal either, they experience an abort caused by an untrapped signal. If they are ignoring the signal -- which is common in the context I'm working in, the Apache httpd server -- then flush_send_queue()'s error return code is EPIPE, which is logged by handle_socket_error_msg(), and all non-auth completion functions are notified of a session event. However, if the caller is waiting for their auth completion function, they wait forever while the IO thread tries repeatedly to reconnect and is rejected by the server as having an expired session.

So, first of all, it would be useful to document in the C API portion of the programmer's guide that trapping or ignoring SIGPIPE is important, as this signal may be generated by the C API.

Next, the two attached patches call the auth completion function, if any, in free_completions(), which fixes this problem for me. The second attached patch includes the auth lock/unlock functions, as per ZOOKEEPER-319.
[jira] Commented: (ZOOKEEPER-330) zookeeper standalone server does not startup with just a port and datadir.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677250#action_12677250 ]

Mahadev konar commented on ZOOKEEPER-330:

+1 for the patch. Good changes, Chris...

zookeeper standalone server does not startup with just a port and datadir.

Key: ZOOKEEPER-330
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-330
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.1.1, 3.2.0
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
Fix For: 3.1.1, 3.2.0
Attachments: ZOOKEEPER-330.patch, ZOOKEEPER-330.patch

ZOOKEEPER-326 made a change to zookeeperservermain.java that broke the starting of zookeeperserver with just the port and datadir.