On 17/01/2013 7:31 a.m., Alex Rousskov wrote:
On 01/15/2013 07:39 AM, Rainer Weikusat wrote:
An internal method named 'stillConnecting' is introduced which contains the check for an 'abort done by the parent' that is presently in ConnOpener::connect:

      // our parent Jobs signal abort by cancelling their callbacks.
      if (callback_ == NULL || callback_->canceled())
          return;

This method is called from both ::connect and ::timeout to determine
whether the state of the connect request is still undecided by the time
either method executes. It guards against the case where both
::connect and ::timeout are scheduled before either of them
actually runs, and detects when the 'other side' of a full
connection has already gone away or is in the process of going away
(in which case the already existing 'result state' should be kept).
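As I read the description, the helper amounts to something like the following. This is a condensed stand-alone sketch, not the real Squid classes: AsyncCall, ConnOpener and doneConnecting() are mocked here just to show the control flow.

```cpp
#include <cassert>
#include <memory>

// Minimal stand-in for Squid's AsyncCall (illustrative, not the real API).
struct AsyncCall {
    bool canceled_ = false;
    bool fired_ = false;
    void cancel() { canceled_ = true; }
    bool canceled() const { return canceled_; }
};
using AsyncCallPointer = std::shared_ptr<AsyncCall>;

class ConnOpener {
public:
    explicit ConnOpener(AsyncCallPointer cb) : callback_(cb) {}

    // the proposed helper: both connect() and timeout() bail out early
    // when the parent has signalled abort by cancelling callback_
    bool stillConnecting() const {
        return callback_ != nullptr && !callback_->canceled();
    }

    void connect() {
        if (!stillConnecting())
            return;          // parent aborted; keep the existing result state
        // ... attempt the TCP connect and possibly re-schedule ...
        ++attempts_;
    }

    void timeout() {
        if (!stillConnecting())
            return;
        doneConnecting();    // deliver the timeout result
    }

    void doneConnecting() {
        if (callback_) {
            callback_->fired_ = true;
            callback_ = nullptr;   // state cleared; doneAll() would now be true
        }
    }

    int attempts_ = 0;   // exposed only so the sketch is easy to exercise

private:
    AsyncCallPointer callback_;
};
```

With this shape, a cancelled callback turns both queued Calls into no-ops instead of letting them fire into a dead parent.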
The stillConnecting() mechanism should not be needed at all. The
doneAll() method tests should be what is used to determine whether
the Job is still running or not.
These two lines of code are presently at the top of
ConnOpener::connect. If it should really be ConnOpener::doneAll there,
then the new method is obviously redundant, but that's something I
don't know (since I didn't write the code containing the two slightly
different 'still something to do' checks, and if in doubt, I'd rather
assume that there was a reason for the difference).

Amos, the done() protection does not replace stillConnecting() because
the job does not know yet that it is done. The automatic done() check
happens _after_ the async call into the job, not before that call.
[Perhaps done() check should be performed before and after, but we
currently do it after because, in part, some callbacks have side effects
not covered by done() that require us to call them.]

There is a significant overlap between the ConnOpener doneAll() and the new stillConnecting().

What I was trying to point out is that we have three race conditions between timeout() and connect() Calls:

* connect() happens first and re-schedules itself while timeout() is queued.
  - this is the case we are now ignoring. I'm fine with that.

* connect() happens first and runs doneConnecting() while timeout() is queued.
 - in this case doneConnecting() will cancel the queued timeout() Call. The Dialer skips cancelled Calls, correct? So we have no need to check whether we are still connecting, because we must be for the timeout() Call to be non-cancelled. Therefore the use of stillConnecting() in timeout() is a waste of cycles.

* timeout() happens first and runs doneConnecting() while connect() is queued.
 - we are still connecting, because the connect() Call has not run.
 - we have no handle to the connect() AsyncCall stored, so we can't cancel it directly (bug?).
 - therefore the connect() Call, when it runs, must still check the doneAll() details (the Dialer did not do it beforehand, as you say). However, note that the earlier timeout() Call resulted in doneConnecting() clearing all the state, so doneAll() already returned true after that timeout().

So we have a choice of:
a) hold a pointer in the calls_ struct to the latest queued connect() Call, and cancel it in doneConnecting() or whenever an in-progress connect schedules a new one.
or
b) status quo: check for doneAll() explicitly inside the ConnOpener::connect() function and abort/return if it says the job should have exited already.
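Option (a) might look roughly like this. A hypothetical sketch only: the calls_ struct, scheduleConnect() and the mock AsyncCall below are illustrative stand-ins, not the real ConnOpener members.

```cpp
#include <cassert>
#include <memory>

// Mock of Squid's AsyncCall cancellation interface (illustrative only).
struct AsyncCall {
    bool canceled_ = false;
    void cancel() { canceled_ = true; }
    bool canceled() const { return canceled_; }
};
using CallPointer = std::shared_ptr<AsyncCall>;

class ConnOpener {
public:
    struct Calls {
        CallPointer connect_;   // latest queued connect() Call (new in option a)
        CallPointer timeout_;
    } calls_;

    // queue a (mock) connect retry, remembering the Call so it can be cancelled
    CallPointer scheduleConnect() {
        if (calls_.connect_)
            calls_.connect_->cancel();   // at most one live queued connect() Call
        calls_.connect_ = std::make_shared<AsyncCall>();
        return calls_.connect_;
    }

    // deliver the final result and cancel any still-queued Calls explicitly,
    // so the Dialer skips them and no stale Call reaches a dead job
    void doneConnecting() {
        if (calls_.connect_) {
            calls_.connect_->cancel();
            calls_.connect_ = nullptr;
        }
        if (calls_.timeout_) {
            calls_.timeout_->cancel();
            calls_.timeout_ = nullptr;
        }
    }
};
```

The point of this shape is that cancellation becomes symmetric: doneConnecting() already cancels the timeout() Call today, and with a stored pointer it can cancel the connect() Call the same way, removing the need for the in-method doneAll() re-check.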


Rainer, the stillConnecting() check should not be needed. Nothing should
break if the code requesting a connection has quit while we were
waiting. The bzr log message for adding that check is "Save some CPU
cycles scheduling useless calls". Personally, I suspect we might be
losing more cycles on this useless check than on an occasional useless
call, but the common case may be environment-dependent.

It prevents the connect() function from scheduling further connect() Calls after timeout() has already sent the callback and the related Job is dead or dying.


Amos, are you OK with us removing that optimization during this rewrite?

No. I'm not clear on what happens if the Job is supposedly exiting but one of the Calls keeps re-scheduling itself as an I/O handler. The need for actually adding the optimization in the first place was, IIRC, that we had a lot of Calls scheduled with disregard for the Job status, hogging memory the whole time.

I'm more in favour of this patch implementing option (a) above, which will make it clear that all Calls are cancelled explicitly and nothing bad can go wrong.


Amos
