On 26/01/2013 6:35 a.m., Alex Rousskov wrote:
On 01/24/2013 09:35 PM, Amos Jeffries wrote:
On 25/01/2013 3:06 p.m., Alex Rousskov wrote:
NP: This is way beyond the simple fixes Ranier was working on. The
changes here relys on code behaviour which will limit the patch to trunk
or 3.3. I was a bit borderline on the earlier on the size of Raniers
patches, but this is going over the change amount I'm comfortable
porting to the stable branch with a beta cycle coming to an end.
I have not looked at v3.2 specifically during this project, so I do not
know which behaviors make my patch incompatible with that version. I
certainly do not insist that you port it to v3.2, but I am happy to help
with that if you decide that it is the best way forward.
* You are still making comments about what "Comm" should do (XXX: Comm
should!). ConnOpener *is* "Comm" at this point in the transaction.
Fixed comments by using specific Comm API names.
* It was probably done beforehand, but it much clearer that it happens
now that the sleep()/DelayedRetry mechanism leaks Pointer()
Where do you see a leak? Please note that our static DelayedRetry event
callback is always called, regardless of the job state/existence, so it
will always delete the Pointer. I clarified that point by adjusting the
comment in cancelSleep().
You commented earlier that the events were scheduled as AsyncCall queue.
You commented earlier that AsyncCall queue would not run a Call if the
Job were gone.
I connected those two statements to imply the DelayRetry event would not
happen. Same as is happening for InProgress.
Uhm, were you meaning these events are *always* run and the scheduling
happens in our wrapper which deletes the Pointer() ?
If so there is no leak. Otherwise there is.
* Looking at your comment in there about comm_close() I see why we are
at such loggerheads about it.
+++ comm_close() does *not* replace the write_data cleanup.
Fixed comments that were missing two facts:
1) ConnOpener's bypass of Comm::Write() API makes
COMMIO_FD_WRITECB(fd)->active() false, which means comm_close() will not
call and cleanup the write handler properly. Sorry for missing that
point. We should consider using Comm::Write eventually so that
comm_close can do this cleanup for us instead (and so that we can use
regular non-leaking job callbacks).
That does ring a memory bell. Sorry I can't recall specifics right now
though.
Good catch.
On the whole I am kind of against calling comm_close() for the sheer
number of conditionals it requires to effectively do a NOP operation. It
is nicer from the code readability and completeness viewpoint, but not
performance. Even a few dozen micro-seconds in the ConnOpener and
related stages can translate to a significatn proportion of a MEM_HIT
transaction time.
2) comm_close() is probably slightly broken itself because its
COMMIO_FD_WRITECB cleanup condition is probably wrong. It should not
matter whether there is an active callback when it comes to the
SetSelect cleanup call. That call should be done unconditionally, and
some SetSelect call implementations may optimize to clean only when
there _was_ a write handler at some point.
We have fixed a related bug #3048 (r10880) by adding those cleanup
calls, but it looks like we should have been even more aggressive and
make cleanup unconditional. This may explain why I see many "FD ready
but there is no handler" debug lines in some cache.logs! If I am
correct, fixing this may even speed things up a little as we would be
polling fewer FDs :-).
Happy to review the patch when you have it ready. :-)
* apart from the timeout unset the relevant parts of the comm_close()
sequence are all in comm_close_complete().
Not exactly. There is more potentially useful code in current
_comm_close(): source code address setting and the arguably less useful
comm_empty_os_read_buffers() call. And who knows what else will be added
or changed there! We should call comm_close() instead of trying to take
it apart and use what we think is currently relevant. IMO this is basic
API sanity policy: If we call comm_open, we must call comm_close.
Okay. However I am trying to work towards an overall goal of reducing
the things comm_close() does, by moving the cleanup of these things into
the Jobs which are setting them.
* the naming of "ConnOpener::open()" is ambiguous. Since it is not the
active open() operation in this Job. It is the getSocket() operation and
should be named as such.
Fixed by renaming open() to createSocket(). I wanted to emphasize that
this method creates something rather than just extracting something that
was there before.
Fair enough. I'm fine with that.
Adjusted patch attached. Besides method renaming and comment changes
discussed above, this patch
* Calls comm_close() as discussed.
* No longer resets COMM_SELECT_WRITE when keeping FD for the job
initiator, to mimic Comm::DoSelect() optimization. We still clear our
write handler, of course.
You still only delete and unset write_data, without clearing the rest of
the write state at the same time via Comm::SetSelect().
* "XXX: We are about to remove write_handler, " is now completely
false. You are about to clear write_data *only*. There are several code
paths leading to cleanFd(). One of which is called keepFd() and does not
result in that extra cleanup you are relying on.
What happens if the write handler is called before being replaced? ...
assert(ptr) crashes Squid.
What happens if I/O loops detect a write_handler set? ... they schedule
write(). then what?
Please do us all a favour and call Comm::SetSelect to clear the write
handler state *fully* at the point the hack takes place without relying
on half-hidden code elsewhere doing it now and forever. This type of
optimization is how many Squid bugs are born.
Amos