On Wed, Apr 20, 2016 at 09:59:57AM +0200, Jakub Hrozek wrote:
> On Tue, Apr 05, 2016 at 02:54:10PM -0400, Simo Sorce wrote:
> > On Tue, 2016-04-05 at 12:57 -0400, Simo Sorce wrote:
> > > Thanks, IIRC the int-instead of enum use is intentional, I will look
> > > at the others.
> > 
> > The last coverity/clang thing is a false positive, but I initialized
> > reply to NULL anyway, I expect now it will start complaining of possible
> > NULL dereference :-)
> > 
> > Attached find patches that fixes all other issues (hopefully), one of
> > them simply dropped an entire function as it turned out I wasn't using
> > it.
> > 
> > Simo.
> > 
> > -- 
> > Simo Sorce * Red Hat, Inc * New York
> 
> Ugh, it took me too long to get to this review properly. Sorry about
> this..
> 
> Since this patchset is large, I will send replies in batches.
> 
> > From e215e5534bb56f3887521443ce6c77d13ea3518d Mon Sep 17 00:00:00 2001
> > From: Simo Sorce <s...@redhat.com>
> > Date: Tue, 12 Jan 2016 20:07:59 -0500
> > Subject: [PATCH 01/15] Util: Add watchdog helper
> 
> ACK
> 
> I know Pavel Brezina also reviewed this patch earlier in a separate thread,
> so note to self: Add his RB :-)
> 
> > From 0dff46755af6063ed4b0339020ae5bb686692de1 Mon Sep 17 00:00:00 2001
> > From: Simo Sorce <s...@redhat.com>
> > Date: Tue, 12 Jan 2016 20:13:28 -0500
> > Subject: [PATCH 02/15] Server: Enable Watchdog in all daemons
> > 
> > This allows the services to self monitor.
> > 
> > Related:
> > https://fedorahosted.org/sssd/ticket/2921
> 
> Is it intentional that we also enable the watchdog in monitor? I haven't
> seen the sssd process being stuck and if it does, we probably have
> bigger issues, so it's probably fine, I just need to remember to not
> SIGSTOP sssd when testing anymore :)
> 
> Otherwise ack.

Actually, more questions...

Can you help me test this patch? I tried to inject sleep() into sssd_be
code and the sleep was just interrupted by the SIGRT delivery. With SSSD,
most of the time the process was stuck was because it was writing a lot of
data with fsync()/fdatasync(). I can't find any information in the Linux
fsync manpage on how fsync behaves wrt signals. openpub manpages indicate
that fsync would return EINTR, which worries me a bit..

> 
> > From c576e37c188ded932996c9714b18a71251ef1d63 Mon Sep 17 00:00:00 2001
> > From: Simo Sorce <s...@redhat.com>
> > Date: Wed, 13 Jan 2016 11:51:09 -0500
> > Subject: [PATCH 03/15] Monitor: Remove ping infrastructure
> 
> Can we also remove the force_timeout option and the functions that use
> it? Looking at monitor.c, they are only called from a single failure
> situation which is ENOMEM, so I think it's actually a dead code, the
> probability we ever call it is very low.
> 
> If you agree with removing the force_timeout but you're busy, let me know
> and I can prepare a patch atop yours.
> 
> The only code that proved to be useful in the field of all this we are
> about to remove is the diag_cmd(). We could use it to run pstack on the
> 'stuck' child to learn where it was actually stuck. But I'm not sure
> it's possible to implement it since the watchdog handler is a "plain"
> signal handler and the diag_cmd() forks and execs...

After reading man 7 signal, it seems that fork() and exec() are OK to be
called in a signal handler, but the question above is more urgent IMO..

Also, is there a reason to just deprecate the ping() method and not
remove it right away?

> 
> That said, I still think it's worth removing the pings in favor of the
> watchdog, the pings proved to be too fragile..
> 
> Maybe it would be better to file a ticket to check again if systemd
> watchdog can be used in the future?
> 
> To be continued..
> _______________________________________________
> sssd-devel mailing list
> sssd-devel@lists.fedorahosted.org
> https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
_______________________________________________
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org

Reply via email to