On Wed, Apr 20, 2016 at 09:59:57AM +0200, Jakub Hrozek wrote: > On Tue, Apr 05, 2016 at 02:54:10PM -0400, Simo Sorce wrote: > > On Tue, 2016-04-05 at 12:57 -0400, Simo Sorce wrote: > > > Thanks, IIRC the int-instead of enum use is intentional, I will look > > > at the others. > > > > The last coverity/clang thing is a false positive, but I initialized > > reply to NULL anyway, I expect now it will start complaining of possible > > NULL dereference :-) > > > > Attached find patches that fixes all other issues (hopefully), one of > > them simply dropped an entire function as it turned out I wasn't using > > it. > > > > Simo. > > > > -- > > Simo Sorce * Red Hat, Inc * New York > > Ugh, it took me too long to get to this review properly. Sorry about > this.. > > Since this patchset is large, I will send replies in batches. > > > From e215e5534bb56f3887521443ce6c77d13ea3518d Mon Sep 17 00:00:00 2001 > > From: Simo Sorce <s...@redhat.com> > > Date: Tue, 12 Jan 2016 20:07:59 -0500 > > Subject: [PATCH 01/15] Util: Add watchdog helper > > ACK > > I know Pavel Brezina also reviewed this patch earlier in a separate thread, > so note to self: Add his RB :-) > > > From 0dff46755af6063ed4b0339020ae5bb686692de1 Mon Sep 17 00:00:00 2001 > > From: Simo Sorce <s...@redhat.com> > > Date: Tue, 12 Jan 2016 20:13:28 -0500 > > Subject: [PATCH 02/15] Server: Enable Watchdog in all daemons > > > > This allows the services to self monitor. > > > > Related: > > https://fedorahosted.org/sssd/ticket/2921 > > Is it intentional that we also enable the watchdog in monitor? I haven't > seen the sssd process being stuck and if it does, we probably have > bigger issues, so it's probably fine, I just need to remember to not > SIGSTOP sssd when testing anymore :) > > Otherwise ack.
Actually, more questions... Can you help me test this patch? I tried to inject sleep() into sssd_be code and the sleep was just interrupted by the SIGRT delivery. With SSSD, most of the time the process was stuck was because it was writing a lot of data with fsync()/fdatasync(). I can't find any information in the Linux fsync manpage on how fsync behaves wrt signals. openpub manpages indicate that fsync would return EINTR, which worries me a bit.. > > > From c576e37c188ded932996c9714b18a71251ef1d63 Mon Sep 17 00:00:00 2001 > > From: Simo Sorce <s...@redhat.com> > > Date: Wed, 13 Jan 2016 11:51:09 -0500 > > Subject: [PATCH 03/15] Monitor: Remove ping infrastructure > > Can we also remove the force_timeout option and the functions that use > it? Looking at monitor.c, they are only called from a single failure > situation which is ENOMEM, so I think it's actually a dead code, the > probability we ever call it is very low. > > If you agree with removing the force_timeout but you're busy, let me know > and I can prepare a patch atop yours. > > The only code that proved to be useful in the field of all this we are > about to remove is the diag_cmd(). We could use it to run pstack on the > 'stuck' child to learn where it was actually stuck. But I'm not sure > it's possible to implement it since the watchdog handler is a "plain" > signal handler and the diag_cmd() forks and execs... After reading man 7 signal, it seems that fork() and exec() are OK to be called in a signal handler, but the question above is more urgent IMO.. Also, is there a reason to just deprecate the ping() method and not remove it right away? > > That said, I still think it's worth removing the pings in favor of the > watchdog, the pings proved to be too fragile.. > > Maybe it would be better to file a ticket to check again if systemd > watchdog can be used in the future? > > To be continued.. > _______________________________________________ > sssd-devel mailing list > sssd-devel@lists.fedorahosted.org > https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org _______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org