Hi Chris,
  Just a note: if you want your patch reviewed, please mark the issue as
"Patch Available" in JIRA.

Here is a link to the process we follow.

http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute

mahadev


On 2/18/09 2:12 PM, "Chris Darroch (JIRA)" <j...@apache.org> wrote:

> 
>      [ https://issues.apache.org/jira/browse/ZOOKEEPER-320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> 
> Chris Darroch updated ZOOKEEPER-320:
> ------------------------------------
> 
>     Attachment:     (was: ZOOKEEPER-320-319.patch)
> 
>> call auth completion in free_completions()
>> ------------------------------------------
>> 
>>                 Key: ZOOKEEPER-320
>>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-320
>>             Project: Zookeeper
>>          Issue Type: Bug
>>          Components: c client
>>    Affects Versions: 3.0.0, 3.0.1, 3.1.0
>>            Reporter: Chris Darroch
>>             Fix For: 3.1.1, 3.2.0
>> 
>>         Attachments: ZOOKEEPER-320-319.patch, ZOOKEEPER-320.patch
>> 
>> 
>> If a client calls zoo_add_auth() with an invalid scheme (e.g., "foo") the
>> ZooKeeper server will mark their session expired and close the connection.
>> However, the C client has already returned ZOK to the caller, immediately
>> after queuing the new auth data to be sent.
>> If the client then waits for their auth completion function to be called,
>> they can wait forever, as no session event is ever delivered to that
>> completion function.  All other completion functions are notified of session
>> events by free_completions(), which is called by cleanup_bufs() in
>> handle_error() in handle_socket_error_msg().
>> In actual fact, what can happen (about 50% of the time, for me) is that the
>> next call by the IO thread to flush_send_queue() calls send() from within
>> send_buffer(), and receives a SIGPIPE signal during this send() call.
>> Because the ZooKeeper C API is a library, it properly does not catch that
>> signal.  If the user's code is not catching that signal either, they
>> experience an abort caused by an untrapped signal.  If they are ignoring the
>> signal -- which is common in the context I'm working in, the Apache httpd server
>> -- then flush_send_queue()'s error return code is EPIPE, which is logged by
>> handle_socket_error_msg(), and all non-auth completion functions are notified
>> of a session event.  However, if the caller is waiting for their auth
>> completion function, they wait forever while the IO thread tries repeatedly
>> to reconnect and is rejected by the server as having an expired session.
>> So, first of all, it would be useful to document in the C API portion of the
>> programmer's guide that trapping or ignoring SIGPIPE is important, as this
>> signal may be generated by the C API.
>> Next, the two attached patches call the auth completion function, if any, in
>> free_completions(), which fixes this problem for me.  The second attached
>> patch includes auth lock/unlock function, as per ZOOKEEPER-319.
