On 29/01/2013 6:32 a.m., Alex Rousskov wrote:
Hello,

     The attached patch fixes several ConnOpener problems by relying on
AsyncJob protections while maintaining a tighter grip on various I/O and
sleep states. This patch is identical to the last ConnOpener patch
posted in PREVIEW state. I am now done with valgrind testing. No leaks
were detected, although I doubt my tests were comprehensive enough to
trigger all possible conditions where the unpatched code leaks.

I started with Rainer Weikusat's timeout polishing patch posted a few
days ago, but all bugs are mine.


Here are some of the addressed problems:

* Connection descriptor was not closed when attempting to reconnect
after failures. We now properly close on failures, sleep with descriptor
closed, and then reopen.

* Timeout handler was not cleaned up properly in some cases, causing
memory leaks (for the handler Pointer) and possibly timeouts that were
fired (for then-active handler) after the connection was passed to the
initiator.

* Comm close handler was not cleaned up properly.

* statCounter.syscalls.sock.closes counter was not updated on FD closure.

* Waiting pending accepts were not kicked on FD closure.

* Connection timeout was enforced for each connection attempt instead of
applying to all attempts taken together.

and possibly other problems. The full extent of all side-effects of
mishandled race conditions and state conflicts is probably unknown.


TODO (outside this project scope):
   Polish comm_close() to always reset "Select" state.
   Make connect_timeout documentation in squid.conf less ambiguous.
   Move prevalent conn_ debugging to the status() method?
   Polish Comm timeout handling to always reset .timeout on callback?
   Revise eventDelete() to delete between-I/O sleep timeout?


HTH,

Alex.

One performane optimization I can see:

* if the callback_ is != NULL but has been cancelled by the receiving Job. Your code now calls sendAnswer() and schedules the already cancelled Call. Previously this was tested for in swanSong() to simply NULL the callback_ pointer instead of scheduling a dead Call. NP: it would probably be a good idea to test for the callback_->cancelled() case and debugs() report it instead of scheduling.

The result of omitting this check saves one if() test per connection by adding the full overheads of memory, scheduling, and CPU for one AsyncCall queue operation per connection. ConnOpener operates on the server-side where multiple client-side Jobs may result in TIMEOUT or ABORTED. This cancellation case is a small but common case.


+1. Other than the above performance tweak this looks fine for commit.


Amos

Reply via email to