On 25/01/2013 3:06 p.m., Alex Rousskov wrote:
Hello,

     The attached patch fixes several ConnOpener problems by relying on
AsyncJob protections while maintaining a tighter grip on various I/O and
sleep states. It is in PREVIEW state because I would like to do more
testing, but it did pass basic tests, and I am not currently aware of
serious problems with the patch code.

I started with Rainer Weikusat's timeout polishing patch posted
yesterday, but all bugs are mine.


Here are some of the addressed problems:

* Connection descriptor was not closed when attempting to reconnect
after failures. We now properly close on failures, sleep with descriptor
closed, and then reopen.

* Timeout handler was not cleaned up properly in some cases, causing
memory leaks (for the handler Pointer) and possibly timeouts that were
fired (for then-active handler), after the connection was passed to the
initiator.

* Comm close handler was not cleaned up properly.

* Connection timeout was enforced for each connection attempt instead of
all attempts together.

and possibly other problems. The full extent of the side-effects of
mishandled race conditions and state conflicts is unknown.


TODO: Needs more testing, especially around corner cases.
       Does somebody need more specific callback cancellation reasons?
       Consider calling comm_close instead of direct write_data cleanup.
       Make connect_timeout documentation in squid.conf less ambiguous.
       Move prevalent conn_ debugging to the status() method?
       Polish Comm timeout handling to always reset .timeout on callback?
       Consider revising eventDelete() to delete between-I/O sleep
       timeout.

Feedback welcomed.

NP: This is way beyond the simple fixes Rainer was working on. The changes here rely on code behaviour which will limit the patch to trunk or 3.3. I was already borderline on the size of Rainer's earlier patches, and this goes beyond the amount of change I'm comfortable porting to the stable branch with a beta cycle coming to an end.

Auditing anyway:

* You are still making comments about what "Comm" should do (XXX: Comm should!). ConnOpener *is* "Comm" at this point in the transaction. If "Comm" needs to do anything, then it is within *this* object's responsibility to see that it happens. If there is a *simple* helper function elsewhere in comm_*() or Comm:: or fd_*() which can help, so be it, but this object *is* Comm and needs to perform the "Comm should do X" operations related to the state of opening an FD.

* It was probably happening beforehand, but it is much clearer now that the sleep()/DelayedRetry mechanism leaks a Pointer(), just as the InProgress mechanism does.
  +++ IMHO: leave it leaking; the use-case is a rarity, and we can update the event API separately, and faster than we can fix all the callers to work around it.

* Looking at your comment in there about comm_close(), I see why we are at such loggerheads about it.
  +++ comm_close() does *not* replace the write_data cleanup.
- The write_data cleanup exists explicitly and solely to remove the leaked Pointer()s, *nothing else*. The extra two lines of code ensure that hack does not corrupt anything. Until those fd_table pointers stop being dynamic, this code is non-optional.
- Even if you called comm_close(), you would need to perform the write_data cleanup before calling it to prevent the leak.

* Apart from the timeout unset, the relevant parts of the comm_close() sequence are all in comm_close_complete(). Perhaps you should schedule one of those calls instead of clearing the handlers and calling fd_table() synchronously.
  ++ But notice how that (and comm_close()) adds a second async delay before the FD becomes available for re-use. Avoiding that delay is a performance optimization: you will need to set up a synchronous comm_close_complete() to achieve the same speed.
  ++ The other operations in the comm_close() function itself are all NOP here. No other component has been given the FD or conn_ to set any state themselves.

* The naming of "ConnOpener::open()" is ambiguous, since it is not the active open() operation in this Job. It is the getSocket() operation and should be named as such.

Amos
