On 01/15/2013 07:39 AM, Rainer Weikusat wrote: >>> internal method named 'stillConnecting' is introduced which >>> contains the check for an 'abort done by the parent' which is presently >>> in ConnOpener::connect, >>> >>> // our parent Jobs signal abort by cancelling their callbacks. >>> if (callback_ == NULL || callback_->canceled()) >>> return; >>> >>> This method is called from both ::connect and ::timeout to determine >>> if the state of the connect request is still undecided by the time >>> either of both methods executes. This guards against the case that >>> both ::connect and ::timeout are scheduled before one of them >>> actually runs and detects when the 'other side' of a full >>> connection has already gone away or is in the process of going away >>> (and hence, the already existing 'result state' should be kept). >> >> The stillConnecting() mechanism should not be needed at all. The >> doneAll() method tests should be what is used to determine whether >> the Job is still running or not. > > These two lines of code are presently at the top of > ConnOpener::connect. If it should really be ConnOpener::doneAll there, > than, the new method is obviously redundant, but that's something I > don't know (since I didn't wrote the code containing the two slightly > different 'still something to do' checks and if in doubt, I'd rather > assume that there was a reason for that).
Amos, the done() protection does not replace stillConnecting() because the job does not know yet that it is done. The automatic done() check happens _after_ the async call into the job, not before that call. [Perhaps done() check should be performed before and after, but we currently do it after because, in part, some callbacks have side effects not covered by done() that require us to call them.] Rainer, the stillConnecting() check should not be needed. Nothing should break if the code requesting a connection has quit while we were waiting. The bzr log message for adding that check is "Save come CPU cycles scheduling useless calls". Personally, I suspect we might be losing more cycles on this useless check than on an occasional useless call, but the common case may be environment-dependent. Amos, are you OK with us removing that optimization during this rewrite? > [...] the timeout code shouldn't generated > any diagnostics about connections which timed out when they actually > didn't. They did, from the timeout code point of view. If there is a race between I/O timeout and I/O readiness, the handler called first (and its point of view) should win. Any other approach leads to needlessly complex and, hence, buggy code (that we are dealing with here). Let's make it simple. Please note that stillConnecting() is misnamed. The function checks whether somebody still needs us to connect, and not that we are still trying to connect. Rainer, please let me know when you are tired of working on this patch, and I will polish it (if you want). I think you are on the right track, but try to make it simpler, not more complex. Think less of what [order of calls] might happen, but how a particular call handler should be written to work correctly regardless of what happened before it. If you need to open (or close) a connection from different handlers, move the opening (or closing) code into a dedicated method. Finally, keep in mind that the write handler is not called for a closing connection. Thus, the write handler cannot have side-effects such as deleting something (unless the swanSong() code deletes that something as well but then you need to be careful so that one of those places does not delete the same thing twice or dereferences a pointer to the deleted memory). This is where leaks are coming from now and where the crashes were coming from before we added leaks. HTH, Alex.
