URL: https://github.com/SSSD/sssd/pull/107 Title: #107: WATCHDOG: Avoid non async-signal-safe from the signal_handler
jhrozek commented: """
On Tue, Dec 13, 2016 at 07:06:58AM -0800, Simo Sorce wrote:
> On Tue, 2016-12-13 at 05:59 -0800, Jakub Hrozek wrote:
> > On Tue, Dec 13, 2016 at 02:44:44AM -0800, Simo Sorce wrote:
> > > On Tue, 2016-12-13 at 02:25 -0800, fidencio wrote:
> > > > Pavel,
> > > >
> > > > On Tue, Dec 13, 2016 at 11:20 AM, Pavel Březina <[email protected]> wrote:
> > > >
> > > > > There are two scenarios:
> > > > >
> > > > > 1. timeshift during system boot -- it is very common for it to be several hours
> > > > > 2. timeshift due to an NTP update after boot -- usually only a few seconds, not a big deal
> > > > >
> > > > > The problem with tevent timers is that if the clock shifts backwards, the timer remains too far in the future. This applies to all timers, not only the watchdog. A forward shift is not a problem; it just executes the timers immediately. Resetting the watchdog helps in that sssd is not killed, but we don't have any capability to reschedule all timed events, and we actually cannot tell that sssd will be functioning properly (dyndns, sudo refresh, enumeration, domain refresh, even the idle timer on socket activation)... all those operations that depend on time() would become unreliable.
> > > > >
> > > > > I think the best thing to do would be to restart the process (although the question is how this would affect the boot) and patch tevent to deal with timeshifts, either by using a monotonic clock or by detecting them and altering timers accordingly.
> > > >
> > > > In the latest version of the patch I've just called _exit(1) when the timeshift is detected.
> > > > About patching tevent, I've seen some old discussions about it and it doesn't seem a trivial thing to do. Would the patch, as it is right now, be acceptable, with the tevent work done later (yes, I'd add it to my queue and do it as soon as we have an agreement on doing this)?
> > >
> > > This is really a blunt tool (calling exit()), but until tevent can be fixed the only other option would be to use some wrapper to keep track of all existing timed events and cancel and restart them all if the clock changes abruptly.
> >
> > That's why I suggested signaling self to a tevent-driven signal handler, from where we can just set up the timer anew.
> >
> > If there is any other way to 'break out' of the POSIX signal handler into somewhere where we can call tevent/talloc (or unsafe calls in general) I'm all ears.
>
> I guess I need to understand better what exactly you want to do to be able to advise on something. I can think of a couple of options, none of them particularly elegant :)

OK, let me try to explain better.

A machine's clock drifts. An SSSD process then receives SIGRT in watchdog_handler() and detects that the time has drifted, so it avoids increasing the watchdog ticks counter; this is currently done in watchdog_detect_timeshift(). At that point, in the current master, we call teardown_watchdog() and setup_watchdog() to set up a new watchdog (the part that is based on tevent timers). This is unsafe to do in a signal handler because tevent calls malloc() and free(), among other functions that are not async-signal-safe.

What I'm trying to figure out is how to reset the watchdog once watchdog_detect_timeshift() detects that the time is out of sync. Otherwise the tevent timer that resets the ticks will not fire before the SSSD process receives enough SIGRT signals to get itself killed.

Does the question make sense now?
"""
See the full comment at https://github.com/SSSD/sssd/pull/107#issuecomment-266778681
sssd-devel mailing list -- [email protected]
