On 25/01/2013 3:06 p.m., Alex Rousskov wrote:
Hello,
The attached patch fixes several ConnOpener problems by relying on
AsyncJob protections while maintaining a tighter grip on various I/O and
sleep states. It is in PREVIEW state because I would like to do more
testing, but it did pass basic tests, and I am not currently aware of
serious problems with the patch code.
I started with Rainer Weikusat's timeout polishing patch posted
yesterday, but all bugs are mine.
Here are some of the addressed problems:
* Connection descriptor was not closed when attempting to reconnect
after failures. We now properly close on failures, sleep with descriptor
closed, and then reopen.
* Timeout handler was not cleaned up properly in some cases, causing
memory leaks (of the handler Pointer) and possibly timeouts firing from a
still-active handler after the connection was passed to the initiator.
* Comm close handler was not cleaned up properly.
* Connection timeout was enforced for each connection attempt instead of
all attempts together.
and possibly other problems. The full extent of all side-effects of
mishandled race conditions and state conflicts is probably unknown.
TODO: Needs more testing, especially around corner cases.
* Does somebody need more specific callback cancellation reasons?
* Consider calling comm_close instead of direct write_data cleanup.
* Make connect_timeout documentation in squid.conf less ambiguous.
* Move prevalent conn_ debugging to the status() method?
* Polish Comm timeout handling to always reset .timeout on callback?
* Consider revising eventDelete() to delete between-I/O sleep timeout.
Feedback welcomed.
NP: This is way beyond the simple fixes Rainer was working on. The
changes here rely on code behaviour which will limit the patch to trunk
or 3.3. I was a bit borderline earlier on the size of Rainer's patches,
but this is going over the amount of change I'm comfortable porting to
the stable branch with a beta cycle coming to an end.
Auditing anyway:
* You are still making comments about what "Comm" should do (XXX: Comm
should!). ConnOpener *is* "Comm" at this point in the transaction. If
"Comm" needs to do anything then it is within *this* object's
responsibility to see that it happens. If there is a *simple* helper
function elsewhere in comm_*() or Comm:: or fd_*() which can help, so be
it, but this object *is* Comm and needs to perform the "Comm should do
X" operations as they relate to the state of opening an FD.
* It probably happened beforehand too, but it is much clearer now that
the sleep()/DelayedRetry mechanism leaks Pointer() just as the
InProgress mechanism does.
+++ IMHO: leave it leaking; the use-case is a rarity and we can update
the event API separately, and faster than we can fix all the callers to
work around it.
* Looking at your comment in there about comm_close() I see why we are
at such loggerheads about it.
+++ comm_close() does *not* replace the write_data cleanup.
- the write_data cleanup exists explicitly and solely to remove the
leaked Pointer()s, *nothing else*. The extra two lines of code are to
ensure that hack does not corrupt anything. Until those fd_table
pointers stop being dynamic, this code is non-optional.
- even if you called comm_close() you would need to perform the
write_data cleanup before calling it to prevent the leak.
* apart from the timeout unset, the relevant parts of the comm_close()
sequence are all in comm_close_complete(). Perhaps you should schedule
one of those calls instead of clearing the handlers and touching
fd_table synchronously.
++ but notice how that (and comm_close()) adds a second async delay
before the FD becomes available for re-use; the current synchronous
handling is a performance optimization. You would need to set up a
synchronous comm_close_complete() to achieve the same speed.
++ the other operations in the comm_close() function itself are all NOPs
here. No other component has been given the FD or conn_ to set any state
themselves.
* the naming of "ConnOpener::open()" is ambiguous, since it is not the
active open() operation in this Job. It is the getSocket() operation and
should be named as such.
Amos