On Wed, 2016-04-20 at 11:12 +0200, Jakub Hrozek wrote:
> On Wed, Apr 20, 2016 at 10:32:59AM +0200, Jakub Hrozek wrote:
> > > > From 0dff46755af6063ed4b0339020ae5bb686692de1 Mon Sep 17 00:00:00 2001
> > > > From: Simo Sorce <s...@redhat.com>
> > > > Date: Tue, 12 Jan 2016 20:13:28 -0500
> > > > Subject: [PATCH 02/15] Server: Enable Watchdog in all daemons
> > > >
> > > > This allows the services to self monitor.
> > > >
> > > > Related:
> > > > https://fedorahosted.org/sssd/ticket/2921
> > >
> > > Is it intentional that we also enable the watchdog in monitor? I haven't
> > > seen the sssd process being stuck and if it does, we probably have
> > > bigger issues, so it's probably fine, I just need to remember to not
> > > SIGSTOP sssd when testing anymore :)
> > >
> > > Otherwise ack.
> >
> > Actually, more questions...
> >
> > Can you help me test this patch? I tried to inject sleep() into sssd_be
> > code and the sleep was just interrupted by the SIGRT delivery. With SSSD,
> > most of the time the process was stuck was because it was writing a lot of
> > data with fsync()/fdatasync(). I can't find any information in the Linux
> > fsync manpage on how fsync behaves wrt signals. openpub manpages indicate
> > that fsync would return EINTR, which worries me a bit..
>
> Hmm, sorry, I was not being careful enough. man 7 signal also says:
> """
> The sleep(3) function is also never restarted if interrupted by a
> handler, but gives a success return: the number of seconds remaining to
> sleep.
> """
>
> so the sleep testcase was wrong even though CatchSignal uses SA_RESTART.
> But do you know how would write() or fsync() behave here? The signal
> manpage is a bit unclear to me as it talks about "slow" devices..
>
> Or can you think of some easy way to test this?
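
To illustrate the sleep(3) behaviour quoted from man 7 signal, here is a minimal sketch. It is not SSSD code: sleep_with_timer() and on_alarm() are hypothetical names, and it uses setitimer() with SIGALRM as a stand-in for the watchdog's real-time signal, which assumes a libc (such as glibc) where sleep() is not itself built on SIGALRM.

```c
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Set by the handler so a caller can confirm the signal arrived. */
volatile sig_atomic_t got_signal;

void on_alarm(int signo)
{
    (void)signo;
    got_signal = 1;
}

/*
 * Hypothetical test helper: install a SIGALRM handler with SA_RESTART,
 * arm a one-shot timer that fires after timer_usec microseconds
 * (must be below 1000000), then call sleep(seconds).
 * Returns whatever sleep() returned, i.e. the seconds left unslept.
 */
unsigned int sleep_with_timer(unsigned int seconds, long timer_usec)
{
    struct sigaction sa;
    struct itimerval it;
    int ret;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sa.sa_flags = SA_RESTART;   /* mirrors the SA_RESTART use mentioned above */
    sigemptyset(&sa.sa_mask);
    ret = sigaction(SIGALRM, &sa, NULL);
    assert(ret == 0);

    memset(&it, 0, sizeof(it));
    it.it_value.tv_usec = timer_usec;   /* one-shot: it_interval stays zero */
    ret = setitimer(ITIMER_REAL, &it, NULL);
    assert(ret == 0);

    got_signal = 0;
    /* Despite SA_RESTART, sleep() is not restarted after the handler runs. */
    return sleep(seconds);
}
```

Calling sleep_with_timer(2, 100000) should return a non-zero value well before two seconds have passed, which is why an injected sleep() does not simulate a stuck process.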

The fsync manpage here says:
"The call blocks until the device reports that the transfer has
completed."
It does not list EINTR as a possible error.

That said, I am a bit unclear about what you actually want to test.
Yes, interruptible calls can be interrupted by a signal; that is always
the case. If we have code that misbehaves when a syscall is interrupted,
we need to fix that code.

AFAIK, when we write() we always check the return value and retry on
EINTR.

Simo.

--
Simo Sorce * Red Hat, Inc * New York
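
For reference, the "check the return and retry on EINTR" pattern mentioned above can be sketched roughly as follows. This is only an illustration, not the actual SSSD code: write_all() and fsync_retry() are hypothetical names. The fsync() wrapper is included because POSIX lists EINTR for fsync() even though the Linux manpage does not, so retrying there should be harmless.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the whole buffer, retrying when a signal
 * interrupts the call and continuing after a short write.
 * Returns 0 on success, -1 with errno set on any other failure.
 */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;   /* interrupted before any data was written */
            }
            return -1;
        }
        /* A short write (possibly caused by a signal) is not an error. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: call fsync() again if it reports EINTR.
 * Returns the final fsync() result.
 */
int fsync_retry(int fd)
{
    int ret;

    do {
        ret = fsync(fd);
    } while (ret < 0 && errno == EINTR);

    return ret;
}
```

With wrappers like these, a watchdog signal arriving in the middle of a large write only causes an extra loop iteration instead of a failed operation.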