On Tue, Nov 01, 2011 at 11:34:27AM +0100, Pavel Zuna wrote: > On 10/31/2011 04:29 PM, Jakub Hrozek wrote: > >On Thu, Oct 27, 2011 at 02:03:25PM +0200, Pavel Zuna wrote: > >>On 10/25/2011 01:08 PM, Pavel Zuna wrote: > >>>On 10/21/2011 06:44 PM, Stephen Gallagher wrote: > >>>>On Fri, 2011-10-21 at 15:28 +0200, Pavel Zuna wrote: > >>>>>Base on the second proposal: > >>>>> > >>>>>https://fedorahosted.org/sssd/wiki/DesignDocs/SigChld > >>>>> > >>>>>There is some old SIGCHLD handling code in > >>>>>src/providers/child_common.[ch], that > >>>>>should probably go away if this gets accepted. There was also a naming > >>>>>conflict > >>>>>with the sss_child_ctx structure. This structure is only used internally > >>>>>by > >>>>>functions defined src/providers/child_common.c. I renamed the original > >>>>>structure > >>>>>to sss_child_ctx_old for now. > >>>> > >>>> > >>>>Nack (but a good start) > >>>> > >>>>sss_child_destructor() cannot return the result of hash_delete(). Talloc > >>>>destructors may only return 0 on success or -1 on failure. However, > >>>>because destructor failure results in cancelling the free(), we should > >>>>just log an error here and return 0. > >>>> > >>>>sss_child_register: > >>>>Don't initialize sss_child_ctx *tmp to the child_ctx passed in. We want > >>>>to leave child_ctx untouched until successful completion. Also, please > >>>>don't use the variable name "tmp". It's not descriptive. Call it > >>>>"child". > >>>> > >>>>Instead of allocating *child_ctx at the beginning, allocate "child" on > >>>>the mem_ctx. > >>>> > >>>>child = talloc_zero(mem_ctx, struct sss_child_ctx); > >>>> > >>>>Then, only at the end of execution, right before returning EOK, do > >>>>*child_ctx = child; > >>>> > > > >In sss_child_register(): > > > >+ error = hash_enter(sigchld_ctx->children,&key,&value); > >+ if (error != HASH_SUCCESS&& error != HASH_ERROR_KEY_NOT_FOUND) { > > > >I don't think hash_enter can return HASH_ERROR_KEY_NOT_FOUND (what would > >it mean?) and even if it did, why ignore it? > > > >>>> > >>>>sss_child_invoke_cb(): > >>>>I'm wary of having the callback invocation free the child_ctx. It needs > >>>>to be documented well (so that consumers know they must not free it once > >>>>the callback has fired or they'll get a double-free). > > > >This still needs fixing. > > > > Forgot about that, sorry. Replaced the talloc_free with removing > child_ctx from the hash table only. > > >>>> > >>>>sss_child_handler(): > >>>>You need to check whether waitpid() returned an error and handle that > >>>>appropriately. waitpid() will return -1 on error and set errno. If the > >>>>errno value is EINTR, you should retry (because waitpid() was > >>>>interrupted by a signal at the time). Other errors should generate a > >>>>DEBUG message. > > > >I think you misunderstood how the error handling should be done: > > > >+ if (pid == -1) { > >+ if (errno == ECHILD) { > >+ DEBUG(0, ("waitpid failed with ECHILD\n")); > >+ } else if (errno == EINVAL) { > >+ DEBUG(0, ("waitpid failed with EINVAL\n")); > >+ } > >+ return; > >+ } > > > >Instead of having a DEBUG() call per retval, it would be better to inform > >user > >on why the child process exited, similarly to how child_sig_handler() does it > >now. > > > > I don't agree. The system is designed to inform providers that a > child has exited by invoking their callbacks. It's up to the > provider to decide what to do with the result and inform the user if > appropriate. This would also make sense only in the case of a > successful call (pid >= 0).
OK, then why the separate branches for ECHILD and EINVAL? Isn't it easier to just print errno and strerror(errno) ? > > In case of EINTR, we need to repeat the call to waitpid, because it > was interrupted by a signal. In other cases (ECHILD and EINVAL) the > call is going to always fail, so there's won't be any callbacks to > invoke. Yup. > > >>>> > >>> > >>>Ok, I'm done with all of this. It was pretty straight forward, but... > >>> > >>>> > >>>>Please separate any changes you make to the existing sss_child_ctx into > >>>>its own patch. I don't like just renaming it to _old, either. Please > >>>>find a more appropriate way to replace this old object (preferably by > >>>>converting it to use your new mechanism, also in a new patch). > >>>> > >>> > >>>... this, I'm trying to find a way to make the old code use the new > >>>mechanism. I > >>>think it would be better to remove the old one completely and replace it > >>>with > >>>the new one everywhere where it's used. > >>> > >>>Pavel > >> > >>New patch with the above mentioned issues fixed is attached to this email. > >> > >>We agreed that we're going to replace all uses of the old system by > >>the new. Expect about 3 patches to follow on this one - one for each > >>provider using the old system. > >> > >>Pavel > > > > Thanks for taking the time to review this. New patch attached. > > Pavel Two more issues (sorry for not catching all this at once): Please don't just use DEBUG(0) for all the messages. DEBUG(0) is supposed to be used for fatal errors that cause the provider to quit completely. See https://fedorahosted.org/pipermail/sssd-devel/2011-June/006407.html for explanation of the log levels. In general, this patch is going to be included in 1.7 and onwards -- please use the new debug levels (see src/util/util.h) defined as macros instead of integers. The second issue is that I'm getting a segfault when logging in via Kerberos, the backtrace is pointing to tevent_common_check_signal(). _______________________________________________ sssd-devel mailing list [email protected] https://fedorahosted.org/mailman/listinfo/sssd-devel
