Firstly, thanks for the in-depth reply! Some thoughts interleaved below...
Rocco Caputo wrote:
On Aug 22, 2006, at 14:06, Nick Williams wrote:
Rocco Caputo wrote:
Do you have a use case where it's impossible to do something under
the new behavior? I'm working under the assumption that a session
can always find a way to call sig(YOUR_SIGNAL_HERE => undef) when
it's ready to be destroyed.
"When it's ready to be destroyed" is the key. The new behaviour
means that sessions need to track that themselves and (in effect)
manage their own reference counting independent of POE's. Before,
they could just rely on POE to let them know via _stop. Now, _stop
is no longer called unless the session clears the sig first. This is
just a catch-22. And it leaves a race condition whereby the session
exists but has declared it no longer wants the signal.
Sessions already do need to manage their reference counts, at least
in the sense that they won't exit until they stop watching for
events. Signal events are just another kind of event, and the
semantics were a bit exceptional for some good reasons that don't
necessarily apply anymore.
If I understand the catch-22 correctly, it's that a session can't
clear its signal watchers from _stop because those watchers prevent
_stop from executing. If that's the case, I'd like to point out that
it's not very useful to clear any resources from _stop to begin
with. POE will automatically reclaim its resources from the session
after _stop returns, so any explicit POE cleanup in _stop is an
expensive no-op.
In my existing code, I'm not cleaning up POE resources - it's *my* resources I'm cleaning up in _stop. The point is that _stop no longer gets called because of the signal handlers, so I can no longer use _stop as a garbage-cleanup mechanism similar to DESTROY: by definition I now always have to know when the session is about to be destroyed (in order to remove the signal handlers first), which makes _stop itself superfluous.
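To make that concrete, here's roughly the pattern I mean (an untested sketch; the handler names and the log file are just stand-ins):

    use POE;

    POE::Session->create(
        inline_states => {
            _start => sub {
                my ($kernel, $heap) = @_[KERNEL, HEAP];
                # *My* resource, not POE's - something I want torn down
                # when the session goes away.
                open $heap->{log}, '>>', '/tmp/example.log' or die $!;
                # Handler "in case" a HUP ever arrives.
                $kernel->sig(HUP => 'got_hup');
                # The session's actual work.
                $kernel->delay(tick => 5);
            },
            tick    => sub { },   # the real work finishes here
            got_hup => sub { $_[KERNEL]->sig_handled() },
            _stop   => sub {
                # DESTROY-style cleanup of my own resources. Under the old
                # semantics this ran once the delay had fired; under the new
                # semantics the sig() watcher keeps the session alive, so we
                # never get here unless something clears the watcher first.
                close $_[HEAP]->{log};
            },
        },
    );
    POE::Kernel->run();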
To be fair regarding discussions on this list, Jonathan Steinert
announced the intent to make sig() hold sessions alive in his 19
October 2005 message titled "Nastiness, and wrapping up signal
reforms". I replied that day with:
Big change. I don't mind this; the old semantics of not holding
a reference count were tied to _signal, which delivered signals
without sessions explicitly asking for them. _signal is gone
now, so we can tie the explicit interest of sig() into a
reference count to keep the session alive.
Nobody else responded. 17 days later I replied with a public go-
ahead to make the change.
Yes, I realise that there was this discussion previously. However, speaking purely for myself, I didn't understand the impact of the change at the time, since I wasn't cognizant of the internals of session reference counting. Now that I've looked at it, I can't see how the new implementation makes sense.
I can understand how the implementation might be confusing. The
released versions since last December have flaws, especially
regarding reference counting. In fact I recently committed fixes for
them while portability testing some of Benjamin Smith's new tests.
My issue is that the bugzilla that Jonathan was attempting to
fix is just trivial to fix using existing POE mechanisms of aliases,
since there's an easy point at which you know you want to start the
persistence of the session, and there's a well-defined point at
which you can release the persistence. However, by making the
behaviour of persistence implicit within signals, there is simply no
way to achieve the opposite effect (automatic garbage collection).
The user (the application) must decide at which point it has no more
work to do and at that point it can then clear the signal. And only
then will POE do its garbage collection and call back to the
application. This just doesn't make sense. Especially when you
compare signals in POE with signals in other dispatchers. Having a
handler configured for a signal should not make that process
persistent.
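For comparison, the alias route I'm talking about looks roughly like this (sketch; the alias and event names are made up):

    use POE;

    POE::Session->create(
        inline_states => {
            _start => sub {
                my $kernel = $_[KERNEL];
                # Explicit start of persistence: "I have work outstanding".
                $kernel->alias_set('term_waiter');
                $kernel->sig(TERM => 'got_term');
            },
            got_term => sub {
                my $kernel = $_[KERNEL];
                $kernel->sig_handled();
                # Well-defined end of persistence:
                $kernel->sig('TERM');                  # clear the watcher
                $kernel->alias_remove('term_waiter');  # release the session
            },
            _stop => sub { },   # reached normally once the alias is gone
        },
    );
    POE::Kernel->run();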
The point that persistence shouldn't be tied to signals for
flexibility's sake is the start of a slippery slope. What would then
stop us from asserting that some timers should not imply
persistence? Input timeouts, for example. They don't contribute to
a session's lifespan since they're only relevant as long as there's
an I/O watcher. Why then should delay() keep sessions alive?
We're really talking semantics. A delay says, in effect (and is documented as), "call me back at time T". Time T will always arrive, by definition, so it makes sense with those semantics that the session should stay alive until at least time T. A signal, by contrast, might never happen.
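In code, the contrast I'm drawing is simply this (sketch):

    use POE;

    POE::Session->create(
        inline_states => {
            _start => sub {
                # "Call me back at time T" - T will arrive, so it makes
                # sense for this to keep the session alive until then.
                $_[KERNEL]->delay(timed_out => 30);

                # "Call me back IF this ever arrives" - it may never
                # arrive, so (to my mind) it shouldn't pin the session
                # on its own.
                $_[KERNEL]->sig(USR1 => 'got_usr1');
            },
            timed_out => sub { },
            got_usr1  => sub { $_[KERNEL]->sig_handled() },
        },
    );
    POE::Kernel->run();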
One solution might be to expand the Kernel's APIs for different
semantic variants of each watcher. The endgame for that strategy is
ugly at best.
And after reading your comments, I'll agree with you that many variants
of the API aren't a good idea.
Another solution might be to drop implicit reference counting
altogether. Every session would then need to explicitly hold
something as long as it wanted to live. That's the "alias" idea,
although it should probably be something with a stronger reference
count. Then there's the converse: No session will die unless it
explicitly asks to be killed. Each has a certain charm, although
both have their pitfalls.
I think none of the solutions in the previous paragraph will satisfy
a significant portion of POE's users, so I'd rather just not go there
at this point.
Agreed.
Your last point is a good one. Where possible, POE has used Unix as
a model for its behavior, and signal handlers don't contribute to a
process' lifetime. The new sig() semantics are therefore incongruous
with the base model. Before I trash them, though, I'd like to learn
more about the other side. Are the new sig() semantics necessary? I
seem to recall yes, but I don't remember the details. If Jonathan
Steinert doesn't explain it, I'll need to make time to grovel through
my logs (and the source) to refresh my memory.
The logs imply that the reason it was changed was to deal with a bugzilla entry about there being no simple way to set up a session to wait for UI_DESTROY. This is interesting, because given the earlier distinction between delays (time passing will always happen) and signals (which may never happen), we can see that UI_DESTROY is a very special signal that WILL ALWAYS happen (if requested), which is very different from the rest of the signals. The API was changed to deal with UI_DESTROY, without taking into account that it's a different beast.
And it's after 04.30 here (it's the only time I could make to answer
this message). I'm not committing to anything until after some sleep.
Again, much thanks for taking the time to put some notes onto the
mailing list for us IRC-deprived people :-).
Maybe I'm thinking about it wrong. To me, "explicit interest of sig()" does not mean that I (this session) want to stay around until that sig arrives; it means merely putting a handler in place IN CASE of that sig. It's a really, really important distinction, and the huge differentiator is that the latter behaviour - 'in-case-of' handlers - CANNOT be achieved in the new POE signal world without careful application coding and forgoing the benefits of POE's internal garbage collection.
As a compromise, I've also proposed implicitly that maybe we should
have a new function wait_for_sig() as well as just sig(), so that we
can make the difference in semantics explicit to users. I don't mind
which way around the functions and semantics are achieved, so long
as there is a way of doing this.
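Roughly what I'm picturing - and to be clear, wait_for_sig() is purely hypothetical, it doesn't exist in POE today:

    # Hypothetical sketch of the two semantics side by side.

    # "In case of": install a handler, but don't hold the session alive.
    $_[KERNEL]->sig(USR1 => 'got_usr1');

    # "Wait for" (proposed, not a real call): install a handler AND hold
    # a reference count until the signal arrives or the watcher is
    # explicitly cleared.
    $_[KERNEL]->wait_for_sig(UI_DESTROY => 'ui_gone');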
People can interpret it either way depending on requirements and
point of view. In the past I've had to explain why sig(USR1 =>
"event") won't keep a daemon alive.
I'm still opposed to methods for semantic variants, though. I'd
rather avoid having both semantics at once if that's possible.
Fair enough.
Nick.