Re: [PATCH] Move timeout handling from ConnOpener::connect to ConnOpener::timeout

Alex Rousskov Wed, 16 Jan 2013 10:32:00 -0800

On 01/15/2013 07:39 AM, Rainer Weikusat wrote:
>>> internal method named 'stillConnecting' is introduced which
>>> contains the check for an 'abort done by the parent' which is presently
>>> in ConnOpener::connect,
>>>
>>>      // our parent Jobs signal abort by cancelling their callbacks.
>>>      if (callback_ == NULL || callback_->canceled())
>>>          return;
>>>
>>> This method is called from both ::connect and ::timeout to determine
>>> if the state of the connect request is still undecided by the time
>>> either of both methods executes. This guards against the case that
>>> both ::connect and ::timeout are scheduled before one of them
>>> actually runs and detects when the 'other side' of a full
>>> connection has already gone away or is in the process of going away
>>> (and hence, the already existing 'result state' should be kept).
>>
>> The stillConnecting() mechanism should not be needed at all. The
>> doneAll() method tests should be what is used to determine whether
>> the Job is still running or not.
> 
> These two lines of code are presently at the top of
> ConnOpener::connect. If it should really be ConnOpener::doneAll there,
> than, the new method is obviously redundant, but that's something I
> don't know (since I didn't wrote the code containing the two slightly
> different 'still something to do' checks and if in doubt, I'd rather
> assume that there was a reason for that).



Amos, the done() protection does not replace stillConnecting() because
the job does not know yet that it is done. The automatic done() check
happens _after_ the async call into the job, not before that call.
[Perhaps done() check should be performed before and after, but we
currently do it after because, in part, some callbacks have side effects
not covered by done() that require us to call them.]

Rainer, the stillConnecting() check should not be needed. Nothing should
break if the code requesting a connection has quit while we were
waiting. The bzr log message for adding that check is "Save come CPU
cycles scheduling useless calls". Personally, I suspect we might be
losing more cycles on this useless check than on an occasional useless
call, but the common case may be environment-dependent.

Amos, are you OK with us removing that optimization during this rewrite?


> [...] the timeout code shouldn't generated
> any diagnostics about connections which timed out when they actually
> didn't.

They did, from the timeout code point of view. If there is a race
between I/O timeout and I/O readiness, the handler called first (and its
point of view) should win. Any other approach leads to needlessly
complex and, hence, buggy code (that we are dealing with here). Let's
make it simple.


Please note that stillConnecting() is misnamed. The function checks
whether somebody still needs us to connect, and not that we are still
trying to connect.

Rainer, please let me know when you are tired of working on this patch,
and I will polish it (if you want). I think you are on the right track,
but try to make it simpler, not more complex. Think less of what [order
of calls] might happen, but how a particular call handler should be
written to work correctly regardless of what happened before it. If you
need to open (or close) a connection from different handlers, move the
opening (or closing) code into a dedicated method.

Finally, keep in mind that the write handler is not called for a closing
connection. Thus, the write handler cannot have side-effects such as
deleting something (unless the swanSong() code deletes that something as
well but then you need to be careful so that one of those places does
not delete the same thing twice or dereferences a pointer to the deleted
memory). This is where leaks are coming from now and where the crashes
were coming from before we added leaks.


HTH,

Alex.

Re: [PATCH] Move timeout handling from ConnOpener::connect to ConnOpener::timeout

Reply via email to