On Wed, 2016-04-20 at 11:12 +0200, Jakub Hrozek wrote:
> On Wed, Apr 20, 2016 at 10:32:59AM +0200, Jakub Hrozek wrote:
> > > > From 0dff46755af6063ed4b0339020ae5bb686692de1 Mon Sep 17 00:00:00 2001
> > > > From: Simo Sorce <s...@redhat.com>
> > > > Date: Tue, 12 Jan 2016 20:13:28 -0500
> > > > Subject: [PATCH 02/15] Server: Enable Watchdog in all daemons
> > > > 
> > > > This allows the services to self monitor.
> > > > 
> > > > Related:
> > > > https://fedorahosted.org/sssd/ticket/2921
> > > 
> > > Is it intentional that we also enable the watchdog in monitor? I haven't
> > > seen the sssd process being stuck and if it does, we probably have
> > > bigger issues, so it's probably fine, I just need to remember to not
> > > SIGSTOP sssd when testing anymore :)
> > > 
> > > Otherwise ack.
> > 
> > Actually, more questions...
> > 
> > Can you help me test this patch? I tried to inject sleep() into sssd_be
> > code and the sleep was just interrupted by the SIGRT delivery. With SSSD,
> > most of the time the process was stuck was because it was writing a lot of
> > data with fsync()/fdatasync(). I can't find any information in the Linux
> > fsync manpage on how fsync behaves wrt signals. openpub manpages indicate
> > that fsync would return EINTR, which worries me a bit..
> 
> Hmm, sorry, I was not being careful enough. man 7 signal also says:
> """
> The sleep(3) function is also never restarted if interrupted by a
> handler, but gives a success return: the number of seconds remaining to
> sleep.
> """
> 
> so the sleep testcase was wrong even though CatchSignal uses SA_RESTART.
> But do you know how write() or fsync() would behave here? The signal
> manpage is a bit unclear to me as it talks about "slow" devices..
> 
> Or can you think of some easy way to test this?

The fsync manpage here says:
        "The call blocks until the device reports that the transfer has
        completed."
        
It does not list EINTR as a possible error.

That said, I am a bit unclear on what you actually want to test.

Yes, interruptible calls can be interrupted by a signal; that's always
the case. If we have code that misbehaves when a syscall is interrupted,
we need to fix that code.

AFAIK, when we write() we always check the return value and retry on EINTR.
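
For reference, here is a minimal sketch of that check-and-retry pattern.
The helper name write_all_retry is hypothetical and this is not copied
from the SSSD tree; it only illustrates retrying on EINTR and continuing
after a short write, which is what a signal arriving mid-write can cause:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper (not SSSD's actual code): write all `len` bytes
 * to `fd`. Retries when write() is interrupted by a signal before any
 * data was transferred (EINTR) and continues after short writes.
 * Returns 0 on success, or the errno value of the failing write(). */
int write_all_retry(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n == -1) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal handler, try again */
            }
            return errno; /* any other error is reported to the caller */
        }
        done += (size_t)n; /* short write: loop to send the remainder */
    }
    return 0;
}
```

With a loop like this, a watchdog signal landing in the middle of a
write() should only cost an extra iteration rather than lose data.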

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York