Re: [SSSD] [PATCH] Add common SIGCHLD handling.

Pavel Zuna Wed, 09 Nov 2011 06:32:53 -0800

On 11/01/2011 02:58 PM, Jakub Hrozek wrote:

On Tue, Nov 01, 2011 at 11:34:27AM +0100, Pavel Zuna wrote:

On 10/31/2011 04:29 PM, Jakub Hrozek wrote:

On Thu, Oct 27, 2011 at 02:03:25PM +0200, Pavel Zuna wrote:

On 10/25/2011 01:08 PM, Pavel Zuna wrote:

On 10/21/2011 06:44 PM, Stephen Gallagher wrote:

On Fri, 2011-10-21 at 15:28 +0200, Pavel Zuna wrote:

Base on the second proposal:


https://fedorahosted.org/sssd/wiki/DesignDocs/SigChld

There is some old SIGCHLD handling code in src/providers/child_common.[ch], that
should probably go away if this gets accepted. There was also a naming conflict
with the sss_child_ctx structure. This structure is only used internally by
functions defined src/providers/child_common.c. I renamed the original structure
to sss_child_ctx_old for now.



Nack (but a good start)

sss_child_destructor() cannot return the result of hash_delete(). Talloc
destructors may only return 0 on success or -1 on failure. However,
because destructor failure results in cancelling the free(), we should
just log an error here and return 0.

sss_child_register:
Don't initialize sss_child_ctx *tmp to the child_ctx passed in. We want
to leave child_ctx untouched until successful completion. Also, please
don't use the variable name "tmp". It's not descriptive. Call it
"child".

Instead of allocating *child_ctx at the beginning, allocate "child" on
the mem_ctx.

child = talloc_zero(mem_ctx, struct sss_child_ctx);

Then, only at the end of execution, right before returning EOK, do
*child_ctx = child;


In sss_child_register():

+    error = hash_enter(sigchld_ctx->children,&key,&value);
+    if (error != HASH_SUCCESS&&   error != HASH_ERROR_KEY_NOT_FOUND) {

I don't think hash_enter can return HASH_ERROR_KEY_NOT_FOUND (what would
it mean?) and even if it did, why ignore it?


sss_child_invoke_cb():
I'm wary of having the callback invocation free the child_ctx. It needs
to be documented well (so that consumers know they must not free it once
the callback has fired or they'll get a double-free).


This still needs fixing.


Forgot about that, sorry. Replaced the talloc_free with removing
child_ctx from the hash table only.


sss_child_handler():
You need to check whether waitpid() returned an error and handle that
appropriately. waitpid() will return -1 on error and set errno. If the
errno value is EINTR, you should retry (because waitpid() was
interrupted by a signal at the time). Other errors should generate a
DEBUG message.


I think you misunderstood how the error handling should be done:

+        if (pid == -1) {
+            if (errno == ECHILD) {
+                DEBUG(0, ("waitpid failed with ECHILD\n"));
+            } else if (errno == EINVAL) {
+                DEBUG(0, ("waitpid failed with EINVAL\n"));
+            }
+            return;
+        }

Instead of having a DEBUG() call per retval, it would be better to inform user
on why the child process exited, similarly to how child_sig_handler() does it
now.


I don't agree. The system is designed to inform providers that a
child has exited by invoking their callbacks. It's up to the
provider to decide what to do with the result and inform the user if
appropriate. This would also make sense only in the case of a
successful call (pid>= 0).


OK, then why the separate branches for ECHILD and EINVAL? Isn't it
easier to just print errno and strerror(errno) ?


Yeah, it's definitely better. Improved in current patch. :)


In case of EINTR, we need to repeat the call to waitpid, because it
was interrupted by a signal. In other cases (ECHILD and EINVAL) the
call is going to always fail, so there's won't be any callbacks to
invoke.


Yup.


Ok, I'm done with all of this. It was pretty straight forward, but...


Please separate any changes you make to the existing sss_child_ctx into
its own patch. I don't like just renaming it to _old, either. Please
find a more appropriate way to replace this old object (preferably by
converting it to use your new mechanism, also in a new patch).


... this, I'm trying to find a way to make the old code use the new mechanism. I
think it would be better to remove the old one completely and replace it with
the new one everywhere where it's used.

Pavel


New patch with the above mentioned issues fixed is attached to this email.

We agreed that we're going to replace all uses of the old system by
the new. Expect about 3 patches to follow on this one - one for each
provider using the old system.

Pavel


Thanks for taking the time to review this. New patch attached.

Pavel


Two more issues (sorry for not catching all this at once):

Please don't just use DEBUG(0) for all the messages. DEBUG(0) is
supposed to be used for fatal errors that cause the provider to quit
completely.

See https://fedorahosted.org/pipermail/sssd-devel/2011-June/006407.html
for explanation of the log levels. In general, this patch is going to be
included in 1.7 and onwards -- please use the new debug levels (see
src/util/util.h) defined as macros instead of integers.

Ok.

The second issue is that I'm getting a segfault when logging in via
Kerberos, the backtrace is pointing to tevent_common_check_signal().


Finally found what the problem was. I suspect there might be a bug in tevent.

The SEGFAULT was generated from tevent_common_check_signal as you pointed out. It happened when copying a portion of siginfo that my signal handler wasn't using.


Simply adding the SA_SIGINFO flag to tevent_add_signal solved the issue.

Updated patch attached.

Pavel

pzuna_sssd_0002_4-sigchld.patch
Description: application/mbox

_______________________________________________
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/sssd-devel

Re: [SSSD] [PATCH] Add common SIGCHLD handling.

Reply via email to