Build failed in Hudson: ZooKeeper-trunk #241

2009-02-26 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/241/

--
[...truncated 55409 lines...]
[junit] 2009-02-26 11:54:30,349 - INFO  [NIOServerCxn.Factory:33221:nioservercnxn$fact...@177] - NIOServerCnxn factory exited run method
[junit] 2009-02-26 11:54:30,349 - INFO  [main:finalrequestproces...@268] - shutdown of request processor complete
[junit] 2009-02-26 11:54:30,349 - INFO  [SyncThread:0:syncrequestproces...@119] - SyncRequestProcessor exited!
[junit] 2009-02-26 11:54:30,349 - INFO  [ProcessThread:-1:preprequestproces...@111] - PrepRequestProcessor exited loop!
[junit] 2009-02-26 11:54:30,449 - INFO  [main:clientb...@306] - STARTING server
[junit] 2009-02-26 11:54:30,449 - INFO  [main:zookeeperser...@160] - Created server
[junit] 2009-02-26 11:54:30,450 - INFO  [main:files...@71] - Reading snapshot http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/ws/trunk/build/test/tmp/test7339586621902816764.junit.dir/version-2/snapshot.0
[junit] 2009-02-26 11:54:30,451 - INFO  [main:filetxnsnap...@198] - Snapshotting: 3
[junit] 2009-02-26 11:54:30,453 - INFO  [NIOServerCxn.Factory:33221:nioserverc...@635] - Processing stat command from /127.0.0.1:45760
[junit] 2009-02-26 11:54:30,453 - WARN  [NIOServerCxn.Factory:33221:nioserverc...@431] - Exception causing close of session 0x0 due to java.io.IOException: Responded to info probe
[junit] 2009-02-26 11:54:30,454 - INFO  [NIOServerCxn.Factory:33221:nioserverc...@766] - closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/127.0.0.1:33221 remote=/127.0.0.1:45760]
[junit] 2009-02-26 11:54:32,157 - INFO  [main-SendThread:clientcnxn$sendthr...@800] - Attempting connection to server /127.0.0.1:33221
[junit] 2009-02-26 11:54:32,157 - INFO  [main-SendThread:clientcnxn$sendthr...@716] - Priming connection to java.nio.channels.SocketChannel[connected local=/127.0.0.1:45761 remote=/127.0.0.1:33221]
[junit] 2009-02-26 11:54:32,157 - INFO  [main-SendThread:clientcnxn$sendthr...@868] - Server connection successful
[junit] 2009-02-26 11:54:32,158 - INFO  [NIOServerCxn.Factory:33221:nioserverc...@517] - Connected to /127.0.0.1:45761 lastZxid 3
[junit] 2009-02-26 11:54:32,158 - INFO  [NIOServerCxn.Factory:33221:nioserverc...@895] - Finished init of 0x11fb26f923a valid:true
[junit] 2009-02-26 11:54:32,158 - INFO  [NIOServerCxn.Factory:33221:nioserverc...@545] - Renewing session 0x11fb26f923a
[junit] 2009-02-26 11:54:33,000 - INFO  [SessionTracker:sessiontrackeri...@142] - SessionTrackerImpl exited loop!
[junit] 2009-02-26 11:54:33,000 - INFO  [SessionTracker:sessiontrackeri...@142] - SessionTrackerImpl exited loop!
[junit] 2009-02-26 11:55:06,172 - INFO  [main:clientb...@300] - STOPPING server
[junit] 2009-02-26 11:55:06,173 - INFO  [main:nioserverc...@766] - closing session:0x11fb26f923a NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/127.0.0.1:33221 remote=/127.0.0.1:45761]
[junit] 2009-02-26 11:55:06,173 - WARN  [main-SendThread:clientcnxn$sendthr...@898] - Exception closing session 0x11fb26f923a to sun.nio.ch.selectionkeyi...@4204
[junit] java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
[junit] at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:632)
[junit] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:876)
[junit] 2009-02-26 11:55:06,173 - INFO  [NIOServerCxn.Factory:33221:nioservercnxn$fact...@177] - NIOServerCnxn factory exited run method
[junit] 2009-02-26 11:55:06,174 - INFO  [main:finalrequestproces...@268] - shutdown of request processor complete
[junit] 2009-02-26 11:55:06,174 - INFO  [ProcessThread:-1:preprequestproces...@111] - PrepRequestProcessor exited loop!
[junit] 2009-02-26 11:55:06,174 - INFO  [SyncThread:0:syncrequestproces...@119] - SyncRequestProcessor exited!
[junit] 2009-02-26 11:55:06,273 - INFO  [main:clientb...@306] - STARTING server
[junit] 2009-02-26 11:55:06,274 - INFO  [main:zookeeperser...@160] - Created server
[junit] 2009-02-26 11:55:06,275 - INFO  [main:files...@71] - Reading snapshot http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/ws/trunk/build/test/tmp/test7339586621902816764.junit.dir/version-2/snapshot.3
[junit] 2009-02-26 11:55:06,297 - INFO  [main:filetxnsnap...@198] - Snapshotting: 5
[junit] 2009-02-26 11:55:06,299 - INFO  [NIOServerCxn.Factory:33221:nioserverc...@635] - Processing stat command from /127.0.0.1:45763
[junit] 2009-02-26 11:55:06,300 - WARN  [NIOServerCxn.Factory:33221:nioserverc...@431] - Exception causing close of session 0x0 due to java.io.IOException: Responded to info probe
[junit] 2009-02-26 11:55:06,300 - INFO  [NIOServerCxn.Factory:33221:nioserverc...@766] - closing session:0x0 NIOServerCnxn: 

[jira] Updated: (ZOOKEEPER-330) zookeeper standalone server does not startup with just a port and datadir.

2009-02-26 Thread Chris Darroch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Darroch updated ZOOKEEPER-330:


Attachment: ZOOKEEPER-330.patch

Sorry I didn't catch the problem with src/c/tests/zkServer.sh invoking 
ZooKeeperMain directly.

The only issue I had with your proposed patch here is that the original problem 
I encountered in ZOOKEEPER-326 returns -- if you start a standalone server 
using QuorumPeerMain, it ignores tickTime settings there again.  This 
contradicts the information here:

http://hadoop.apache.org/zookeeper/docs/r3.1.0/zookeeperStarted.html#sc_InstallingSingleMode

and I think it would also be nice for the standard zkServer.sh (which 
uses QuorumPeerMain) and standard config file to work fully for standalone 
installations out of the box.

My revisions in the attached patch allow ZooKeeperMain to take a single config 
file as an alternate set of arguments, in which case it works like 
QuorumPeerMain.  This seems to resolve my original issue, support the 
addition of a command-line tickTime argument as per your patch, and allow the 
src/c/tests/zkServer.sh script to work as-is.

 zookeeper standalone server does not startup with just a port and datadir.
 --

 Key: ZOOKEEPER-330
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-330
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1, 3.2.0
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.1.1, 3.2.0

 Attachments: ZOOKEEPER-330.patch, ZOOKEEPER-330.patch


 ZOOKEEPER-326 made a change to zookeeperservermain.java that broke the 
 starting of zookeeperserver with just the port and datadir.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-319) add locking around auth info in zhandle_t

2009-02-26 Thread Chris Darroch (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Darroch updated ZOOKEEPER-319:


Attachment: ZOOKEEPER-319.patch

Good points -- see if this suits.

 add locking around auth info in zhandle_t
 -

 Key: ZOOKEEPER-319
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-319
 Project: Zookeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.0.0, 3.0.1, 3.1.0
Reporter: Chris Darroch
 Fix For: 3.1.1, 3.2.0

 Attachments: ZOOKEEPER-319.patch, ZOOKEEPER-319.patch, 
 ZOOKEEPER-319.patch


 Looking over the zookeeper.c code it appears to me that the zoo_add_auth() 
 function may be called at any time by the user in their main thread.  This 
 function alters the elements of the auth_info structure in the zhandle_t 
 structure.
 Meanwhile, the IO thread may read those elements at any time in such 
 functions as send_auth_info() and auth_completion_func().  It seems 
 important, then, to add a lock which prevents data being read by the IO 
 thread while only partially changed by the user's thread.  The attached patch 
 adds such a lock.
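
 The idea described above can be sketched roughly as follows.  This is only 
 an illustration, not the attached patch: auth_store_t and the auth_store_*() 
 functions are hypothetical stand-ins for the auth fields in zhandle_t.  The 
 user thread swaps in new values while holding a mutex, and the IO thread 
 copies a consistent snapshot out under the same mutex, so it never sees a 
 half-updated structure.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the auth fields kept inside zhandle_t. */
typedef struct {
    pthread_mutex_t lock;
    char *scheme;      /* owned NUL-terminated copy, NULL when unset */
    char *cert;        /* owned copy of the credential bytes */
    size_t cert_len;
} auth_store_t;

/* Copy len bytes; always returns a non-NULL block unless malloc fails. */
static char *dup_bytes(const char *src, size_t len) {
    char *copy = malloc(len ? len : 1);
    if (copy != NULL && len > 0) memcpy(copy, src, len);
    return copy;
}

void auth_store_init(auth_store_t *s) {
    assert(s != NULL);
    pthread_mutex_init(&s->lock, NULL);
    s->scheme = NULL;
    s->cert = NULL;
    s->cert_len = 0;
}

/* User thread: build the new values first, then swap them in under the lock. */
int auth_store_set(auth_store_t *s, const char *scheme,
                   const char *cert, size_t cert_len) {
    char *new_scheme, *new_cert;
    assert(s != NULL && scheme != NULL);
    new_scheme = dup_bytes(scheme, strlen(scheme) + 1);
    new_cert = dup_bytes(cert, cert_len);
    if (new_scheme == NULL || new_cert == NULL) {
        free(new_scheme);
        free(new_cert);
        return -1;
    }
    pthread_mutex_lock(&s->lock);
    free(s->scheme);
    free(s->cert);
    s->scheme = new_scheme;
    s->cert = new_cert;
    s->cert_len = cert_len;
    pthread_mutex_unlock(&s->lock);
    return 0;
}

/* IO thread: copy out a consistent snapshot; the caller frees the copies.
 * Returns 0 on success, -1 if nothing is set or allocation fails. */
int auth_store_snapshot(auth_store_t *s, char **scheme,
                        char **cert, size_t *cert_len) {
    int rc = -1;
    assert(s != NULL && scheme != NULL && cert != NULL && cert_len != NULL);
    *scheme = NULL;
    *cert = NULL;
    *cert_len = 0;
    pthread_mutex_lock(&s->lock);
    if (s->scheme != NULL) {
        *scheme = dup_bytes(s->scheme, strlen(s->scheme) + 1);
        *cert = dup_bytes(s->cert, s->cert_len);
        if (*scheme != NULL && *cert != NULL) {
            *cert_len = s->cert_len;
            rc = 0;
        } else {
            free(*scheme);
            free(*cert);
            *scheme = NULL;
            *cert = NULL;
        }
    }
    pthread_mutex_unlock(&s->lock);
    return rc;
}
```

 Copying under the lock keeps the critical section short for the reader's 
 later work, at the cost of an allocation per read; a real client might 
 instead hold the lock for the duration of the send.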




[jira] Commented: (ZOOKEEPER-318) remove locking in zk_hashtable.c or add locking in collect_keys()

2009-02-26 Thread Chris Darroch (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677147#action_12677147
 ] 

Chris Darroch commented on ZOOKEEPER-318:
-

Well, my own take on things like this and ZOOKEEPER-262 is that it's always 
good to keep things clean and simple -- a good day of programming in my book is 
one that removes more lines of code than are created and yet keeps the same or 
better functionality.

Aside from any incremental performance gains, I think the big win with both of 
these patches is that they make the purpose of the code that much more 
apparent.  A significant part of programming, I believe, is psychology.  A 
programmer who comes across a package laced with pthread_mutex_lock() 
statements immediately makes two pretty reasonable assumptions: the code is 
used in a multi-threaded context, and it's MT-safe.

In this case, both assumptions are incorrect; the code isn't used in an MT 
context and if it were to be, collect_keys() appears to be lacking the 
necessary locks and I suspect it would be MT-unsafe.  There could always be 
other subtle MT-related bugs which haven't been shaken out too, should one 
start using it in MT code.

Thus my own feeling is that it's better to simplify and remove these locks for 
a variety of reasons: it makes the code more self-documenting; easier to read, 
understand, and revise; and marginally faster.

Should the hashtables need to be used in an MT context in the future, the 
existing code can always be recovered quickly from SVN.  If there's an 
explanatory note in the SVN log that mentions the collect_keys() issue, all the 
better; then whoever might need to do this work will be prompted to think that 
aspect through as well.

That's just my two cents, of course.  :-)

 remove locking in zk_hashtable.c or add locking in collect_keys()
 -

 Key: ZOOKEEPER-318
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-318
 Project: Zookeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.0.0, 3.0.1, 3.1.0
Reporter: Chris Darroch
 Fix For: 3.2.0, 4.0.0

 Attachments: ZOOKEEPER-318.patch


 From a review of zk_hashtable.c it appears to me that all functions which 
 manipulate the hashtables are called from the IO thread, and therefore any 
 need for locking is obviated.
 If I'm wrong about that, then I think at a minimum collect_keys() should 
 acquire a lock in the same manner as collect_session_watchers().  Both 
 iterate over hashtable contents (in the latter case using copy_table()).
 However, from what I can see, the only function (besides the init/destroy 
 functions used when creating a zhandle_t) called from the completion thread 
 is deliverWatchers(), which simply iterates over a delivery list created 
 from the hashtables by collectWatchers().  The activateWatcher() function 
 contains comments which describe it being called by the completion thread, 
 but in fact it is called by the IO thread in zookeeper_process().
 I believe all calls to collectWatchers(), activateWatcher(), and 
 collect_keys() are made by the IO thread in zookeeper_interest(), 
 zookeeper_process(), check_events(), send_set_watches(), and handle_error().  
 Note that queue_session_event() is aliased as PROCESS_SESSION_EVENT, but 
 appears only in handle_error() and check_events().
 Also note that handle_error() is called only in zookeeper_process() and 
 handle_socket_error_msg(), which is used only by the IO thread, so far as I 
 can see.
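
 For the alternative mentioned above, where collect_keys() takes a lock 
 while iterating, a minimal sketch might look like the following.  The 
 watch_table_t type and its functions are hypothetical; a small array stands 
 in for the real hashtable, and keys are borrowed pointers rather than owned 
 copies.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_PATHS 16

/* Hypothetical watcher table: a small array standing in for zk_hashtable. */
typedef struct {
    pthread_mutex_t lock;
    const char *paths[MAX_PATHS];  /* borrowed pointers, not owned */
    size_t count;
} watch_table_t;

void watch_table_init(watch_table_t *t) {
    assert(t != NULL);
    pthread_mutex_init(&t->lock, NULL);
    t->count = 0;
}

/* Insert a key under the lock; returns -1 when the table is full. */
int watch_table_add(watch_table_t *t, const char *path) {
    int rc = -1;
    assert(t != NULL && path != NULL);
    pthread_mutex_lock(&t->lock);
    if (t->count < MAX_PATHS) {
        t->paths[t->count++] = path;
        rc = 0;
    }
    pthread_mutex_unlock(&t->lock);
    return rc;
}

/* Copy up to out_cap keys into out while holding the lock, so a concurrent
 * add cannot change the table in the middle of the iteration.
 * Returns the number of keys copied. */
size_t watch_table_collect_keys(watch_table_t *t, const char **out,
                                size_t out_cap) {
    size_t n;
    assert(t != NULL && out != NULL);
    pthread_mutex_lock(&t->lock);
    n = t->count < out_cap ? t->count : out_cap;
    memcpy(out, t->paths, n * sizeof(*out));
    pthread_mutex_unlock(&t->lock);
    return n;
}
```

 If every caller really does run on the IO thread, as argued above, the 
 mutex calls in this sketch are pure overhead and could be dropped.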




[jira] Issue Comment Edited: (ZOOKEEPER-320) call auth completion in free_completions()

2009-02-26 Thread Chris Darroch (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677143#action_12677143
 ] 

cdarroch edited comment on ZOOKEEPER-320 at 2/26/09 2:45 PM:
--

Updated with a NULL initialization as per the comment on 
[ZOOKEEPER-319#action_12676824].

Out of interest --- what compiler gives these errors?  My gcc 4.1.2 with -Wall 
doesn't report any troubles.

  was (Author: cdarroch):
Updated with a NULL initialization as per the comment on 
ZOOKEEPER-319#action_12676824

Out of interest --- what compiler gives these errors?  My gcc 4.1.2 with -Wall 
doesn't report any troubles.
  
 call auth completion in free_completions()
 --

 Key: ZOOKEEPER-320
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-320
 Project: Zookeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.0.0, 3.0.1, 3.1.0
Reporter: Chris Darroch
Assignee: Chris Darroch
 Fix For: 3.1.1, 3.2.0

 Attachments: ZOOKEEPER-320-319.patch, ZOOKEEPER-320-319.patch, 
 ZOOKEEPER-320.patch


 If a client calls zoo_add_auth() with an invalid scheme (e.g., foo) the 
 ZooKeeper server will mark their session expired and close the connection.  
 However, the C client has returned immediately after queuing the new auth 
 data to be sent with a ZOK return code.
 If the client then waits for their auth completion function to be called, 
 they can wait forever, as no session event is ever delivered to that 
 completion function.  All other completion functions are notified of session 
 events by free_completions(), which is called by cleanup_bufs() in 
 handle_error() in handle_socket_error_msg().
 In actual fact, what can happen (about 50% of the time, for me) is that the 
 next call by the IO thread to flush_send_queue() calls send() from within 
 send_buffer(), and receives a SIGPIPE signal during this send() call.  
 Because the ZooKeeper C API is a library, it properly does not catch that 
 signal.  If the user's code is not catching that signal either, they 
 experience an abort caused by an untrapped signal.  If they are ignoring the 
 signal -- which is common in the context I'm working in, the Apache httpd server 
 -- then flush_send_queue()'s error return code is EPIPE, which is logged by 
 handle_socket_error_msg(), and all non-auth completion functions are notified 
 of a session event.  However, if the caller is waiting for their auth 
 completion function, they wait forever while the IO thread tries repeatedly 
 to reconnect and is rejected by the server as having an expired session.
 So, first of all, it would be useful to document in the C API portion of the 
 programmer's guide that trapping or ignoring SIGPIPE is important, as this 
 signal may be generated by the C API.
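
 As a small illustration of the SIGPIPE behaviour described above, the 
 following sketch ignores the signal in the application and then writes to a 
 pipe whose read end is closed.  It uses only POSIX calls and no ZooKeeper 
 functions; ignore_sigpipe() and write_to_closed_pipe() are names made up 
 for this example.  With the signal ignored, the write fails with EPIPE 
 instead of terminating the process.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Ignore SIGPIPE process-wide; returns 0 on success, -1 on failure. */
int ignore_sigpipe(void) {
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* Write one byte to a pipe whose read end has been closed.
 * Returns the errno reported by write(), 0 if the write unexpectedly
 * succeeded, or -1 if the pipe could not be created. */
int write_to_closed_pipe(void) {
    int fds[2];
    int err = 0;
    if (pipe(fds) != 0) return -1;
    close(fds[0]);                 /* no reader is left on the pipe */
    if (write(fds[1], "x", 1) < 0) err = errno;
    close(fds[1]);
    return err;
}
```

 Calling ignore_sigpipe() once at startup, before any ZooKeeper handle is 
 created, is one way for an application to follow the advice above; 
 installing a handler with sigaction() is another.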
 Next, the two attached patches call the auth completion function, if any, in 
 free_completions(), which fixes this problem for me.  The second attached 
 patch includes auth lock/unlock functions, as per ZOOKEEPER-319.
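
 The shape of the fix described above can be sketched as follows.  This is 
 not the code from the attached patches: completions_t, 
 fail_all_completions(), and store_rc() are hypothetical and much simpler 
 than the real structures in the C client.  The point is only that the 
 cleanup path delivers the error to the auth completion as well as to the 
 ordinary pending completions, and does so at most once.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8

typedef void (*completion_fn)(int rc, void *data);

/* Hypothetical, simplified view of a handle's outstanding completions. */
typedef struct {
    completion_fn pending[MAX_PENDING];
    void *pending_data[MAX_PENDING];
    size_t pending_count;
    completion_fn auth_completion;   /* NULL when no auth call is waiting */
    void *auth_data;
} completions_t;

/* Example callback: store the result code where the caller can see it. */
void store_rc(int rc, void *data) {
    *(int *)data = rc;
}

/* Deliver rc to every pending completion and then to the auth completion,
 * if one is registered.  Returns the number of callbacks invoked. */
size_t fail_all_completions(completions_t *c, int rc) {
    size_t i, called = 0;
    assert(c != NULL);
    for (i = 0; i < c->pending_count; i++) {
        c->pending[i](rc, c->pending_data[i]);
        called++;
    }
    c->pending_count = 0;
    if (c->auth_completion != NULL) {
        completion_fn fn = c->auth_completion;
        c->auth_completion = NULL;   /* make sure it fires at most once */
        fn(rc, c->auth_data);
        called++;
    }
    return called;
}
```

 Clearing the auth callback before invoking it means a second cleanup pass, 
 for instance during a later reconnect attempt, does not notify the caller 
 twice.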




[jira] Commented: (ZOOKEEPER-320) call auth completion in free_completions()

2009-02-26 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677169#action_12677169
 ] 

Mahadev konar commented on ZOOKEEPER-320:
-

my compiler is gcc 3.4.4





[jira] Commented: (ZOOKEEPER-330) zookeeper standalone server does not startup with just a port and datadir.

2009-02-26 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677250#action_12677250
 ] 

Mahadev konar commented on ZOOKEEPER-330:
-

+1 for the patch.  Good changes, Chris.
