Hi Chris,

Just to mention that if you want your patch reviewed, please mark it Patch Available.
Here is a link to the process we follow:
http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute

mahadev

On 2/18/09 2:12 PM, "Chris Darroch (JIRA)" <j...@apache.org> wrote:

>
> [ https://issues.apache.org/jira/browse/ZOOKEEPER-320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Chris Darroch updated ZOOKEEPER-320:
> ------------------------------------
>
>     Attachment:     (was: ZOOKEEPER-320-319.patch)
>
>> call auth completion in free_completions()
>> ------------------------------------------
>>
>>                 Key: ZOOKEEPER-320
>>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-320
>>             Project: Zookeeper
>>          Issue Type: Bug
>>          Components: c client
>>    Affects Versions: 3.0.0, 3.0.1, 3.1.0
>>            Reporter: Chris Darroch
>>             Fix For: 3.1.1, 3.2.0
>>
>>         Attachments: ZOOKEEPER-320-319.patch, ZOOKEEPER-320.patch
>>
>>
>> If a client calls zoo_add_auth() with an invalid scheme (e.g., "foo") the
>> ZooKeeper server will mark their session expired and close the connection.
>> However, the C client has returned immediately after queuing the new auth
>> data to be sent with a ZOK return code.
>> If the client then waits for their auth completion function to be called,
>> they can wait forever, as no session event is ever delivered to that
>> completion function. All other completion functions are notified of session
>> events by free_completions(), which is called by cleanup_bufs() in
>> handle_error() in handle_socket_error_msg().
>> In actual fact, what can happen (about 50% of the time, for me) is that the
>> next call by the IO thread to flush_send_queue() calls send() from within
>> send_buffer(), and receives a SIGPIPE signal during this send() call.
>> Because the ZooKeeper C API is a library, it properly does not catch that
>> signal. If the user's code is not catching that signal either, they
>> experience an abort caused by an untrapped signal. If they are ignoring the
>> signal -- which is common in context I'm working in, the Apache httpd server
>> -- then flush_send_queue()'s error return code is EPIPE, which is logged by
>> handle_socket_error_msg(), and all non-auth completion functions are notified
>> of a session event. However, if the caller is waiting for their auth
>> completion function, they wait forever while the IO thread tries repeatedly
>> to reconnect and is rejected by the server as having an expired session.
>> So, first of all, it would be useful to document in the C API portion of the
>> programmer's guide that trapping or ignoring SIGPIPE is important, as this
>> signal may be generated by the C API.
>> Next, the two attached patches call the auth completion function, if any, in
>> free_completions(), which fixes this problem for me. The second attached
>> patch includes auth lock/unlock function, as per ZOOKEEPER-319.
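
On the SIGPIPE point in the report: a minimal sketch of how an application embedding the C client might ignore SIGPIPE, so that a send() on a connection the server has closed fails with EPIPE instead of aborting the process. The helper name ignore_sigpipe() is hypothetical and is not part of the ZooKeeper C API; the sketch uses only POSIX sigaction().

```c
/* Request POSIX declarations (struct sigaction) under strict C modes. */
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>

/* Hypothetical helper (not part of the ZooKeeper C API): ignore SIGPIPE
 * process-wide so that writing to a closed socket returns -1 with
 * errno == EPIPE instead of delivering a fatal signal.
 * Returns 0 on success, -1 on failure (errno is set by sigaction). */
int ignore_sigpipe(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;   /* discard the signal */
    sigemptyset(&sa.sa_mask);  /* no extra signals blocked */
    return sigaction(SIGPIPE, &sa, NULL);
}
```

An application would presumably call this once at startup, before zookeeper_init(). Note that this only avoids the abort; as the report describes, the auth completion still needs to be invoked by the library for a waiting caller to be released.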