On Mon, Jan 16, 2012 at 04:46:58PM +1100, Andrew Beekhof wrote: > > Now we proceed to the next mainloop poll: > > > >> poll([{fd=7, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI}, {fd=5, > >> events=POLLIN|POLLPRI}], 3, -1 > > > > Note the -1 (infinity timeout!) > > > > So even though the trigger was (presumably) set, > > and the ->prepare() should have returned true, > > the mainloop waits forever for "something" to happen on those file > > descriptors. > > > > > > I suggest this: > > > > crm_trigger_prepare should set *timeout = 0, if trigger is set. > > > > Also think about this race: crm_trigger_prepare was already > > called, only then the signal came in... > > > > diff --git a/lib/common/mainloop.c b/lib/common/mainloop.c > > index 2e8b1d0..fd17b87 100644 > > --- a/lib/common/mainloop.c > > +++ b/lib/common/mainloop.c > > @@ -33,6 +33,13 @@ static gboolean > > crm_trigger_prepare(GSource * source, gint * timeout) > > { > > crm_trigger_t *trig = (crm_trigger_t *) source; > > + /* Do not delay signal processing by the mainloop poll stage */ > > + if (trig->trigger) > > + *timeout = 0; > > + /* To avoid races between signal delivery and the mainloop poll stage, > > + * make sure we always have a finite timeout. Unit: milliseconds. */ > > + else > > + *timeout = 5000; /* arbitrary */ > > > > return trig->trigger; > > } > > > > > > This scenario does not let the blocked IPC off the hook, though. > > That is still possible, both for blocking send and blocking receive, > > so that should probably be fixed as well, somehow. > > I'm not sure how likely this "stuck in blocking IPC" is, though. > > Interesting, are you sure you're in the right function though? > trigger and signal events don't have a file descriptor... wouldn't > these polls be for the IPC related sources and wouldn't they be > setting their own timeout?
http://developer.gnome.org/glib/2.30/glib-The-Main-Event-Loop.html#GSourceFuncs iiuc, mainloop does something similar to (oversimplified): timeout = -1; /* infinity */ for s in all GSource tmp_timeout = -1; s->prepare(s, &tmp_timeout) if (tmp_timeout >= 0 && tmp_timeout < timeout) timeout = tmp_timeout; poll(GSource fd set, n, timeout); for s in all GSource if s->check(s) s->dispatch(s, ...) And at some stage it also orders by priority, of course. Also compare with the comment above /* Sigh... */ in glue G_SIG_prepare(). BTW, the mentioned race between signal delivery and mainloop already doing the poll stage could potentially be solved by using cl_signal_set_interrupt(SIGTERM, 1), which would mean we can condense the prepare to if (trig->trigger) *timeout = 0; return trig->trigger; Glue (and heartbeat) code base is not that, let's say, involved, because someone had been paranoid. But because someone had been paranoid for a reason ;-) Cheers, -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. _______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org