On 1/09/2012 5:05 a.m., Alex Rousskov wrote:
On 08/31/2012 09:01 AM, Alexander Komyagin wrote:
Alex, I figured it out, finally! The bug was in comm_connect_addr()
function (I suppose it is kernel-dependent though).
Consider following call trace:
1) Xaction starts ConnOpener in order to create new connection to ICAP
2) ConnOpener calls comm_connect_addr()
3) comm_connect_addr() initiates connection through connect() and
returns COMM_INPROGRESS
...since our ICAP service is too busy connection will eventually timeout
(connectTimeout_), so...
4) Comm::ConnOpener::timeout() is called
5) Comm::ConnOpener::timeout calls connect()
6) connect() calls comm_connect_addr()
7) comm_connect_addr() calls getsockopt(SOL_SOCKET, SO_ERROR) to check
current connection status
8) getsockopt(..) returns errno 0 <--- BUG
Actually, the bug is in step #5. If we timed out, we should not try to
connect again.
Step #8 seems correct -- from the OS point of view, internal Squid
timeout is not an error. If you fix step #5, step #8 will not happen though.
9) comm_connect_addr() returns COMM_OK so ConnOpener would think that
connection was successfully established and pass the connection back to
Xaction object, then we would have noteCommRead and noteCommWrote
exceptions and so on...
According to `man connect`, when connect() returns errno EINPROGRESS:
EINPROGRESS
The socket is nonblocking and the connection cannot be completed
immediately. It is possible to select(2) or poll(2) for
completion by selecting the socket for writing. After select(2)
indicates writability, use getsockopt(2) to read the SO_ERROR
option at level SOL_SOCKET to determine whether connect()
completed successfully (SO_ERROR is zero) or unsuccessfully
(SO_ERROR is one of the usual error codes listed here,
explaining the reason for the failure).
Actually ConnOpener uses SetSelect(..) in order to wait before calling
comm_connect_addr() (and connect() ) again, but timeout in fact breaks
the rule calling getsockopt() right after connect().
The above man page excerpt says to call getsockopt() after successful
select() notification. I think that is what we do now.
I do not think we are or should be calling connect() twice -- the OS
knows that we want to connect and does not need reminders. Note the
(!F->flags.called_connect) guard in comm.cc.
The "connect()" mentioned in step #5 and #6 is
Comm::ConnOpener::connect() which calls the comm.cc code.
If you are looking for connect-related bugs, I recommend fixing step #5
and also examining this hack:
x = connect(sock, AI->ai_addr, AI->ai_addrlen);
// XXX: ICAP code refuses callbacks during a pending comm_ call
// Async calls development will fix this.
if (x == 0) {
x = -1;
errno = EINPROGRESS;
}
I hope the above XXX code is no longer needed (and removing it will save
a lot of CPU cycles for everybody, not just ICAP!), but this needs to be
double checked using bzr logs to restore the exact reason behind that
hack and by examining the current ICAP code.
In my environment we use uClibc and rsbac kernel, but I suppose that the
problem may persist in other environment too.
I've attached two small patches. One - to avoid connection loss in case
client aborted the request before connection to the ICAP was established
(remember that "BUG: Orphaned..." stuff I asked you before?).
Another one is to actually fix the ICAP connection timeout (not only
ICAP actually) problem. Comments will be appreciated.
* For the conn-open-timeout patch:
Overall, I think this patch is fighting the side-effects of other bugs.
Most importantly, the timeout handler should abort the ConnOpener job on
the spot rather than go through one more select() try. I will comment on
specific changes below, just in case, but please keep this overall
problem in mind.
ConnOpener should be obeying squid.conf connect_retries option for
number of times it closes the socket, re-open and re-connect()s.
case COMM_INPROGRESS:
// check for timeout FIRST.
-if (squid_curtime - connectStart_ > connectTimeout_) {
+if (squid_curtime - connectStart_ >= connectTimeout_) {
debugs(5, 5, HERE << conn_ << ": * - ERR took too long already.");
Please revert this change. In theory, we can be called back again before
(squid_curtime - connectStart_) changes. Not a big deal, of course, (the
equality should be a very rare event) but the old code was more accurate
from logic/math point of view.
Also, if the debug message were to print by how much we were late,
printing zero would look strange.
If there is some hidden side-effect or dependency here that I missed,
please let me know (but note that I think this code must not be called
from the timeout handler -- as discussed above, the timeout handler
should abort instead of double checking whether there is a timeout).