On 11/01/2011 02:58 PM, Jakub Hrozek wrote:
On Tue, Nov 01, 2011 at 11:34:27AM +0100, Pavel Zuna wrote:On 10/31/2011 04:29 PM, Jakub Hrozek wrote:On Thu, Oct 27, 2011 at 02:03:25PM +0200, Pavel Zuna wrote:On 10/25/2011 01:08 PM, Pavel Zuna wrote:On 10/21/2011 06:44 PM, Stephen Gallagher wrote:On Fri, 2011-10-21 at 15:28 +0200, Pavel Zuna wrote:Base on the second proposal:https://fedorahosted.org/sssd/wiki/DesignDocs/SigChld There is some old SIGCHLD handling code in src/providers/child_common.[ch], that should probably go away if this gets accepted. There was also a naming conflict with the sss_child_ctx structure. This structure is only used internally by functions defined src/providers/child_common.c. I renamed the original structure to sss_child_ctx_old for now.Nack (but a good start) sss_child_destructor() cannot return the result of hash_delete(). Talloc destructors may only return 0 on success or -1 on failure. However, because destructor failure results in cancelling the free(), we should just log an error here and return 0. sss_child_register: Don't initialize sss_child_ctx *tmp to the child_ctx passed in. We want to leave child_ctx untouched until successful completion. Also, please don't use the variable name "tmp". It's not descriptive. Call it "child". Instead of allocating *child_ctx at the beginning, allocate "child" on the mem_ctx. child = talloc_zero(mem_ctx, struct sss_child_ctx); Then, only at the end of execution, right before returning EOK, do *child_ctx = child;In sss_child_register(): + error = hash_enter(sigchld_ctx->children,&key,&value); + if (error != HASH_SUCCESS&& error != HASH_ERROR_KEY_NOT_FOUND) { I don't think hash_enter can return HASH_ERROR_KEY_NOT_FOUND (what would it mean?) and even if it did, why ignore it?sss_child_invoke_cb(): I'm wary of having the callback invocation free the child_ctx. It needs to be documented well (so that consumers know they must not free it once the callback has fired or they'll get a double-free).This still needs fixing.Forgot about that, sorry. Replaced the talloc_free with removing child_ctx from the hash table only.sss_child_handler(): You need to check whether waitpid() returned an error and handle that appropriately. waitpid() will return -1 on error and set errno. If the errno value is EINTR, you should retry (because waitpid() was interrupted by a signal at the time). Other errors should generate a DEBUG message.I think you misunderstood how the error handling should be done: + if (pid == -1) { + if (errno == ECHILD) { + DEBUG(0, ("waitpid failed with ECHILD\n")); + } else if (errno == EINVAL) { + DEBUG(0, ("waitpid failed with EINVAL\n")); + } + return; + } Instead of having a DEBUG() call per retval, it would be better to inform user on why the child process exited, similarly to how child_sig_handler() does it now.I don't agree. The system is designed to inform providers that a child has exited by invoking their callbacks. It's up to the provider to decide what to do with the result and inform the user if appropriate. This would also make sense only in the case of a successful call (pid>= 0).OK, then why the separate branches for ECHILD and EINVAL? Isn't it easier to just print errno and strerror(errno) ?
Yeah, it's definitely better. Improved in current patch. :)
In case of EINTR, we need to repeat the call to waitpid, because it was interrupted by a signal. In other cases (ECHILD and EINVAL) the call is going to always fail, so there's won't be any callbacks to invoke.Yup.Ok, I'm done with all of this. It was pretty straight forward, but...Please separate any changes you make to the existing sss_child_ctx into its own patch. I don't like just renaming it to _old, either. Please find a more appropriate way to replace this old object (preferably by converting it to use your new mechanism, also in a new patch).... this, I'm trying to find a way to make the old code use the new mechanism. I think it would be better to remove the old one completely and replace it with the new one everywhere where it's used. PavelNew patch with the above mentioned issues fixed is attached to this email. We agreed that we're going to replace all uses of the old system by the new. Expect about 3 patches to follow on this one - one for each provider using the old system. PavelThanks for taking the time to review this. New patch attached. PavelTwo more issues (sorry for not catching all this at once): Please don't just use DEBUG(0) for all the messages. DEBUG(0) is supposed to be used for fatal errors that cause the provider to quit completely. See https://fedorahosted.org/pipermail/sssd-devel/2011-June/006407.html for explanation of the log levels. In general, this patch is going to be included in 1.7 and onwards -- please use the new debug levels (see src/util/util.h) defined as macros instead of integers.
Ok.
The second issue is that I'm getting a segfault when logging in via Kerberos, the backtrace is pointing to tevent_common_check_signal().
Finally found what the problem was. I suspect there might be a bug in tevent.The SEGFAULT was generated from tevent_common_check_signal as you pointed out. It happened when copying a portion of siginfo that my signal handler wasn't using.
Simply adding the SA_SIGINFO flag to tevent_add_signal solved the issue. Updated patch attached. Pavel
pzuna_sssd_0002_4-sigchld.patch
Description: application/mbox
_______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/sssd-devel