Alex Rousskov <[email protected]> writes:
> On 01/13/2013 03:10 PM, Rainer Weikusat wrote:
>> Assuming that a client attempted to contact a HTTPS server which is
>> actually a 'port 443 blackhole' (meaning, attempts to connect to the
>> corresponding address and port 443 don't result in any kind of reply)
>> and this request was intercepted by a squid configured to do 'server
>> first' SSL bumping, the timeout squid enforces for the asynchronous
>> connect requests ultimately triggers an assert in forward.cc. This
>> happens because the ConnOpener::timeout method calls
>> ConnOpener::connect which - in turn - calls comm_connect_addr to
>> determine the status of the connection attempt. This routine uses
>> getsockopt/SOL_SOCKET/SO_ERROR to determine if the connect
>> succeeded. Because nothing was received from the remote endpoint, at
>> least on Linux, the result will be 'no error' which means a 'false
>> positive' 'connection successfully established' situation
>> occurs.
>
> Hi Rainer,
>
> Nice analysis, thank you! Have you seen the discussion about
> ConnOpener problems in the squid-dev thread called "ICAP connections
> under heavy loads"? (If you have not, please review -- it is mostly not
> about ICAP). I suspect the comprehensive solution sketched out there
> solves this problem as well.

Well, fixing the timeout handling because of the other problem it causes would also fix this one (and another nascent one I happen to be aware of: the only reason this doesn't bomb out for plain HTTP, too, is that the client side times out first).

But I've read through the discussion and I agree with the opinion that the 'squid vs tcp' race is a moot issue. Because the connect is asynchronous, it may succeed at any time after connect was called. Including a single, additional check for 'did it succeed in the meantime' is not going to solve the problem because the connect could succeed one microsecond after this check. Any timeout which is shorter than the connection establishment timeout enforced by the kernel will occasionally cause a spurious connection failure. Actually, even the kernel timeout will occasionally cause that, because the SYN-ACK could arrive immediately after the kernel has decided to give up, aka "the internet isn't reliable".

I also agree with the other opinion that the existing timeout handling code is heavily contorted and that the connect code should deal with connections and the timeout code with timeouts. This is especially true considering that the 'check for timeout' in connect is also done in checkTimeouts (comm.cc), and that the connect check in its present form will come to the conclusion 'no timeout' after the checkTimeouts code concluded 'timeout'.

In any case, I need a working solution for this now because my employer uses 3.3 for at least one customer. Since I really don't like the hack I did yesterday, I've coded what I consider to be a sensible approach to deal with this issue instead. Because my boss also requested that I make this available to the project, I'm going to send a third e-mail with the 2nd version of the patch.
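
To illustrate the separation argued for above, here is a minimal Python sketch. It is not the patch and not Squid code: the class `ConnectAttempt`, its method names, and the `read_so_error` callback (a stand-in for the getsockopt/SOL_SOCKET/SO_ERROR query) are all hypothetical, and it only models the control flow. It assumes, as described for Linux above, that the error query returns 0 while nothing has been received from the remote endpoint.

```python
class ConnectAttempt:
    """Hypothetical model of one asynchronous connect attempt (not Squid code)."""

    def __init__(self, read_so_error):
        # read_so_error is an assumed stand-in for
        # getsockopt(SOL_SOCKET, SO_ERROR); it returns 0 for 'no error'.
        self._read_so_error = read_so_error
        # None while pending, then "connected", "failed" or "timeout".
        self.result = None

    def on_writable(self):
        # Connect path: meant to run only after the socket was reported
        # writable, so a zero error value really means 'connected'.
        if self.result is None:
            err = self._read_so_error()
            self.result = "connected" if err == 0 else "failed"
        return self.result

    def on_timeout(self):
        # Timeout path: deals with the timeout only and never asks the
        # socket whether the connect happened to succeed.
        if self.result is None:
            self.result = "timeout"
        return self.result

    def on_timeout_rechecking(self):
        # Problematic variant: the timeout handler re-runs the connect
        # check. A still-pending connect reports 'no error', which is
        # misread as success.
        if self.result is None:
            err = self._read_so_error()
            self.result = "connected" if err == 0 else "failed"
        return self.result


if __name__ == "__main__":
    blackhole = lambda: 0  # nothing received yet: 'no error'
    print(ConnectAttempt(blackhole).on_timeout_rechecking())  # connected
    print(ConnectAttempt(blackhole).on_timeout())             # timeout
```

In this model the re-checking variant reports a 'port 443 blackhole' as connected, while the separated timeout handler reports a timeout regardless of what the error query would say.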
