[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677144#action_12677144
 ] 

Chris Darroch commented on ZOOKEEPER-320:
-----------------------------------------

Also, please note my suggestion that the docs mention the need to catch, 
ignore, or otherwise be aware of SIGPIPE signals.

> call auth completion in free_completions()
> ------------------------------------------
>
>                 Key: ZOOKEEPER-320
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-320
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>    Affects Versions: 3.0.0, 3.0.1, 3.1.0
>            Reporter: Chris Darroch
>            Assignee: Chris Darroch
>             Fix For: 3.1.1, 3.2.0
>
>         Attachments: ZOOKEEPER-320-319.patch, ZOOKEEPER-320-319.patch, 
> ZOOKEEPER-320.patch
>
>
> If a client calls zoo_add_auth() with an invalid scheme (e.g., "foo") the 
> ZooKeeper server will mark their session expired and close the connection.  
> However, the C client has already returned ZOK immediately after queuing 
> the new auth data to be sent.
> If the client then waits for their auth completion function to be called, 
> they can wait forever, as no session event is ever delivered to that 
> completion function.  All other completion functions are notified of session 
> events by free_completions(), which is called by cleanup_bufs() in 
> handle_error() in handle_socket_error_msg().
> In actual fact, what can happen (about 50% of the time, for me) is that the 
> next call by the IO thread to flush_send_queue() calls send() from within 
> send_buffer(), and receives a SIGPIPE signal during this send() call.  
> Because the ZooKeeper C API is a library, it properly does not catch that 
> signal.  If the user's code is not catching that signal either, they 
> experience an abort caused by an untrapped signal.  If they are ignoring the 
> signal -- which is common in the context I'm working in, the Apache httpd server 
> -- then flush_send_queue()'s error return code is EPIPE, which is logged by 
> handle_socket_error_msg(), and all non-auth completion functions are notified 
> of a session event.  However, if the caller is waiting for their auth 
> completion function, they wait forever while the IO thread tries repeatedly 
> to reconnect and is rejected by the server as having an expired session.
> So, first of all, it would be useful to document in the C API portion of the 
> programmer's guide that trapping or ignoring SIGPIPE is important, as this 
> signal may be generated by the C API.
> Next, the two attached patches call the auth completion function, if any, in 
> free_completions(), which fixes this problem for me.  The second attached 
> patch includes auth lock/unlock functions, as per ZOOKEEPER-319.
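
The idea behind the fix described above can be sketched with a simplified 
model.  This is not the attached patch and does not use the real client 
structures: every type and function name below (sketch_handle, 
sketch_free_completions(), and so on) is hypothetical, and the error value is 
an arbitrary placeholder.  It only shows a cleanup routine that notifies the 
auth completion along with the ordinary pending completions.

```c
#include <stddef.h>

/* Placeholder error code for the sketch; not a real ZooKeeper constant. */
enum { SKETCH_SESSION_EXPIRED = -1 };

/* Hypothetical completion callback type. */
typedef void (*sketch_completion_fn)(int rc, void *data);

/* Hypothetical node in a list of pending request completions. */
struct sketch_completion {
    sketch_completion_fn fn;
    void *data;
    struct sketch_completion *next;
};

/* Hypothetical client handle holding pending completions and auth state. */
struct sketch_handle {
    struct sketch_completion *pending;  /* ordinary request completions */
    sketch_completion_fn auth_fn;       /* auth completion, NULL if none */
    void *auth_data;
};

/* Notify every pending completion of the failure, then the auth
 * completion if one is registered.  Nodes are caller-owned in this
 * sketch, so nothing is freed here. */
static void sketch_free_completions(struct sketch_handle *h, int reason)
{
    struct sketch_completion *c = h->pending;

    h->pending = NULL;
    while (c != NULL) {
        struct sketch_completion *next = c->next;
        if (c->fn != NULL)
            c->fn(reason, c->data);
        c = next;
    }

    if (h->auth_fn != NULL) {
        sketch_completion_fn fn = h->auth_fn;
        h->auth_fn = NULL;          /* clear first so it fires at most once */
        fn(reason, h->auth_data);
    }
}

/* Example callback: stores the result code in the int that data points to. */
static void sketch_record_rc(int rc, void *data)
{
    *(int *)data = rc;
}
```

With this shape, a caller blocked on the auth completion is woken with the 
same session event as every other waiter instead of waiting indefinitely.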

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
