Re: [SSSD] [PATCH] Add common SIGCHLD handling.

Jakub Hrozek Tue, 01 Nov 2011 06:58:17 -0700

On Tue, Nov 01, 2011 at 11:34:27AM +0100, Pavel Zuna wrote:
> On 10/31/2011 04:29 PM, Jakub Hrozek wrote:
> >On Thu, Oct 27, 2011 at 02:03:25PM +0200, Pavel Zuna wrote:
> >>On 10/25/2011 01:08 PM, Pavel Zuna wrote:
> >>>On 10/21/2011 06:44 PM, Stephen Gallagher wrote:
> >>>>On Fri, 2011-10-21 at 15:28 +0200, Pavel Zuna wrote:
> >>>>>Base on the second proposal:
> >>>>>
> >>>>>https://fedorahosted.org/sssd/wiki/DesignDocs/SigChld
> >>>>>
> >>>>>There is some old SIGCHLD handling code in 
> >>>>>src/providers/child_common.[ch], that
> >>>>>should probably go away if this gets accepted. There was also a naming 
> >>>>>conflict
> >>>>>with the sss_child_ctx structure. This structure is only used internally 
> >>>>>by
> >>>>>functions defined src/providers/child_common.c. I renamed the original 
> >>>>>structure
> >>>>>to sss_child_ctx_old for now.
> >>>>
> >>>>
> >>>>Nack (but a good start)
> >>>>
> >>>>sss_child_destructor() cannot return the result of hash_delete(). Talloc
> >>>>destructors may only return 0 on success or -1 on failure. However,
> >>>>because destructor failure results in cancelling the free(), we should
> >>>>just log an error here and return 0.
> >>>>
> >>>>sss_child_register:
> >>>>Don't initialize sss_child_ctx *tmp to the child_ctx passed in. We want
> >>>>to leave child_ctx untouched until successful completion. Also, please
> >>>>don't use the variable name "tmp". It's not descriptive. Call it
> >>>>"child".
> >>>>
> >>>>Instead of allocating *child_ctx at the beginning, allocate "child" on
> >>>>the mem_ctx.
> >>>>
> >>>>child = talloc_zero(mem_ctx, struct sss_child_ctx);
> >>>>
> >>>>Then, only at the end of execution, right before returning EOK, do
> >>>>*child_ctx = child;
> >>>>
> >
> >In sss_child_register():
> >
> >+    error = hash_enter(sigchld_ctx->children,&key,&value);
> >+    if (error != HASH_SUCCESS&&  error != HASH_ERROR_KEY_NOT_FOUND) {
> >
> >I don't think hash_enter can return HASH_ERROR_KEY_NOT_FOUND (what would
> >it mean?) and even if it did, why ignore it?
> >
> >>>>
> >>>>sss_child_invoke_cb():
> >>>>I'm wary of having the callback invocation free the child_ctx. It needs
> >>>>to be documented well (so that consumers know they must not free it once
> >>>>the callback has fired or they'll get a double-free).
> >
> >This still needs fixing.
> >
> 
> Forgot about that, sorry. Replaced the talloc_free with removing
> child_ctx from the hash table only.
> 
> >>>>
> >>>>sss_child_handler():
> >>>>You need to check whether waitpid() returned an error and handle that
> >>>>appropriately. waitpid() will return -1 on error and set errno. If the
> >>>>errno value is EINTR, you should retry (because waitpid() was
> >>>>interrupted by a signal at the time). Other errors should generate a
> >>>>DEBUG message.
> >
> >I think you misunderstood how the error handling should be done:
> >
> >+        if (pid == -1) {
> >+            if (errno == ECHILD) {
> >+                DEBUG(0, ("waitpid failed with ECHILD\n"));
> >+            } else if (errno == EINVAL) {
> >+                DEBUG(0, ("waitpid failed with EINVAL\n"));
> >+            }
> >+            return;
> >+        }
> >
> >Instead of having a DEBUG() call per retval, it would be better to inform 
> >user
> >on why the child process exited, similarly to how child_sig_handler() does it
> >now.
> >
> 
> I don't agree. The system is designed to inform providers that a
> child has exited by invoking their callbacks. It's up to the
> provider to decide what to do with the result and inform the user if
> appropriate. This would also make sense only in the case of a
> successful call (pid >= 0).


OK, then why the separate branches for ECHILD and EINVAL? Isn't it
easier to just print errno and strerror(errno) ?

> 
> In case of EINTR, we need to repeat the call to waitpid, because it
> was interrupted by a signal. In other cases (ECHILD and EINVAL) the
> call is going to always fail, so there's won't be any callbacks to
> invoke.

Yup.

> 
> >>>>
> >>>
> >>>Ok, I'm done with all of this. It was pretty straight forward, but...
> >>>
> >>>>
> >>>>Please separate any changes you make to the existing sss_child_ctx into
> >>>>its own patch. I don't like just renaming it to _old, either. Please
> >>>>find a more appropriate way to replace this old object (preferably by
> >>>>converting it to use your new mechanism, also in a new patch).
> >>>>
> >>>
> >>>... this, I'm trying to find a way to make the old code use the new 
> >>>mechanism. I
> >>>think it would be better to remove the old one completely and replace it 
> >>>with
> >>>the new one everywhere where it's used.
> >>>
> >>>Pavel
> >>
> >>New patch with the above mentioned issues fixed is attached to this email.
> >>
> >>We agreed that we're going to replace all uses of the old system by
> >>the new. Expect about 3 patches to follow on this one - one for each
> >>provider using the old system.
> >>
> >>Pavel
> >
> 
> Thanks for taking the time to review this. New patch attached.
> 
> Pavel

Two more issues (sorry for not catching all this at once):

Please don't just use DEBUG(0) for all the messages. DEBUG(0) is
supposed to be used for fatal errors that cause the provider to quit
completely.

See https://fedorahosted.org/pipermail/sssd-devel/2011-June/006407.html
for explanation of the log levels. In general, this patch is going to be
included in 1.7 and onwards -- please use the new debug levels (see
src/util/util.h) defined as macros instead of integers.

The second issue is that I'm getting a segfault when logging in via
Kerberos, the backtrace is pointing to tevent_common_check_signal().
_______________________________________________
sssd-devel mailing list
[email protected]
https://fedorahosted.org/mailman/listinfo/sssd-devel

Re: [SSSD] [PATCH] Add common SIGCHLD handling.

Reply via email to