Re: Comm::TcpAcceptor::doAccept fd limit handling is broken
Rainer Weikusat writes:

[...]

> It is possible to hit the 'fd limit' bug (with a single client) by
> running squid with a tight file descriptor limit (eg, 64) and trying
> hard enough. In order to make for easier debugging, I changed the
> TcpAcceptor/AcceptLimiter code to act as if only a single file
> descriptor was available for client connections

There's actually a 2nd way to hit this easily, but I didn't want to
mention it until I had a fix for it in my tree[*]: Configure a port for
'server first' SSL interception and make a direct connection to that.
This will cause the proxy to connect to itself in order to peek at the
server certificate [until out of memory, goto start of the sentence].

[*] That's based on maintaining a hashed database of the local
addresses of all outgoing connections and rejecting incoming
connections from any of these addresses. This may not be the smartest
way to deal with this situation, but it works. It is written in C,
though, as I generally use C for all additions I have to make to 'our
squid'.
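[Editor's note: for readers who want to see what the footnoted scheme
amounts to, here is a rough, illustrative C++ sketch. It is not the
poster's actual C implementation; the class and member names
(OutgoingEndpoints, remember(), forget(), isOurs()) are invented for
this example, and it handles IPv4 only for brevity.]

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <string>
#include <unordered_set>

class OutgoingEndpoints {
public:
    // Record the local address of a freshly connected outgoing socket.
    void remember(int fd) {
        std::string key;
        if (localKey(fd, key))
            endpoints.insert(key);
    }

    // Forget it again when that outgoing socket is closed.
    void forget(int fd) {
        std::string key;
        if (localKey(fd, key))
            endpoints.erase(key);
    }

    // Does the peer of an accepted socket match one of our own outgoing
    // local endpoints? If so, the "client" is this proxy itself.
    bool isOurs(const struct sockaddr_in &peer) const {
        return endpoints.count(toKey(peer)) > 0;
    }

private:
    static std::string toKey(const struct sockaddr_in &sa) {
        char buf[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &sa.sin_addr, buf, sizeof(buf));
        return std::string(buf) + ":" + std::to_string(ntohs(sa.sin_port));
    }

    static bool localKey(int fd, std::string &key) {
        struct sockaddr_in sa;
        socklen_t len = sizeof(sa);
        if (getsockname(fd, reinterpret_cast<struct sockaddr *>(&sa), &len) != 0
                || sa.sin_family != AF_INET)
            return false;
        key = toKey(sa);
        return true;
    }

    std::unordered_set<std::string> endpoints;
};

[In such a scheme the outgoing side would call remember() right after
connect() succeeds and forget() when that socket closes; the acceptor
would getpeername() each accepted fd and drop the connection when
isOurs() returns true, breaking the self-connection loop.]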
Re: Comm::TcpAcceptor::doAccept fd limit handling is broken
Alex Rousskov writes:

> On 07/16/2014 11:11 AM, Rainer Weikusat wrote:
>
>> void
>> Comm::TcpAcceptor::doAccept(int fd, void *data)
>> {
>>     try {
>>         debugs(5, 2, HERE << "New connection on FD " << fd);
>>
>>         Must(isOpen(fd));
>>         TcpAcceptor *afd = static_cast<TcpAcceptor*>(data);
>>
>>         if (!okToAccept()) {
>>             AcceptLimiter::Instance().defer(afd);
>>         } else {
>>             afd->acceptNext();
>>         }
>>         SetSelect(fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, afd, 0);
>
>> This is broken because it re-enables readable notifications even if no
>> connection was accepted.
>
> In other words, it re-enables a check for new connections when
>
> * we know that we currently cannot accept any new connections,
> * Squid is under stress already, and
> * our code does not handle multiple deferrals of the same TcpAcceptor
>   object well.

There's an important 4th point: The check is re-enabled when it is known
that it will immediately return true again, because the connection which
was not accepted is still pending (hence, the socket is still readable);
more details below.

[...]

>> More conservative change which fixes this for my test case
>>
>> --- mad-squid/src/comm/TcpAcceptor.cc  8 Jan 2013 17:36:44 -  1.1.1.2
>> +++ mad-squid/src/comm/TcpAcceptor.cc  16 Jul 2014 16:51:25 -
>> @@ -204,7 +204,6 @@
>>          } else {
>>              afd->acceptNext();
>>          }
>> -        SetSelect(fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, afd, 0);
>>
>>      } catch (const std::exception &e) {
>>          fatalf("FATAL: error while accepting new client connection: %s\n", e.what());
>> @@ -257,7 +256,8 @@
>>          notify(flag, newConnDetails);
>>          mustStop("Listener socket closed");
>>          return;
>> -    }
>> +    } else if (!isLimited)
>> +        SetSelect(conn->fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, this, 0);
>>
>>      debugs(5, 5, HERE << "Listener: " << conn <<
>>             " accepted new connection " << newConnDetails <<
>
> Looks good to me. AFAICT, the "else" keyword is not needed, and we
> should move the SetSelect() call _after_ logging the already accepted
> connection. Let's wait for other opinions though.

You're correct on this, as all other code paths end with a return.
Moving it after the debugs also seems like a good idea, because it means
that diagnostic output for things which happen because a connection was
accepted appears after the message that a connection was accepted.

[test code]

>> (code modified to accept only one connection at the same time):
>
> AFAICT, both official and patched code accept only one connection at a
> time -- our Select code does not support multiple listeners on a single
> FD.

There's a misunderstanding here: It is possible to hit the 'fd limit'
bug (with a single client) by running squid with a tight file descriptor
limit (eg, 64) and trying hard enough. In order to make for easier
debugging, I changed the TcpAcceptor/AcceptLimiter code to act as if
only a single file descriptor was available for client connections and
created two connections via netcat. What it should do in this case is
ignore the second connection until the first one is closed. Instead, it
went into an endless 'defer this connection' loop, as mentioned above,
because the pending 2nd connection meant the socket remained readable.
updated patch:

--- mad-squid/src/comm/TcpAcceptor.cc  8 Jan 2013 17:36:44 -  1.1.1.2
+++ mad-squid/src/comm/TcpAcceptor.cc  16 Jul 2014 20:36:35 -
@@ -204,7 +204,6 @@
         } else {
             afd->acceptNext();
         }
-        SetSelect(fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, afd, 0);
 
     } catch (const std::exception &e) {
         fatalf("FATAL: error while accepting new client connection: %s\n", e.what());
@@ -262,6 +261,8 @@
 
     debugs(5, 5, HERE << "Listener: " << conn <<
            " accepted new connection " << newConnDetails <<
            " handler Subscription: " << theCallSub);
+
+    if (!isLimited) SetSelect(conn->fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, this, 0);
     notify(flag, newConnDetails);
 }
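[Editor's note: the 'endless defer loop' described in the message above
comes from level-triggered readiness: a deferred-but-still-pending
connection keeps the listening socket readable, so re-arming the read
callback guarantees an immediate re-fire. The toy model below is not
Squid code and all names in it are invented for the illustration; it
contrasts the old always-re-arm behaviour with the patched behaviour of
re-arming only after a successful accept.]

#include <iostream>

struct Listener {
    int pending = 2;     // connections waiting in the kernel's accept queue
    int freeFds = 1;     // file descriptors still available for clients
    bool armed = true;   // is the read callback registered with select()?
    long deferrals = 0;  // how often we put ourselves on the deferred queue
};

static bool okToAccept(const Listener &l) { return l.freeFds > 0; }

// Returns true if the simulated proxy ever stops spinning.
static bool run(Listener l, bool rearmAlways, long maxIterations) {
    for (long i = 0; i < maxIterations; ++i) {
        if (!l.armed || l.pending == 0)
            return true;                 // select() no longer fires: no busy loop
        l.armed = false;                 // this readiness notification is consumed
        if (!okToAccept(l)) {
            ++l.deferrals;               // park ourselves on the deferred queue
            if (rearmAlways)
                l.armed = true;          // old code: socket still readable, fires again
        } else {
            --l.pending;                 // accept one connection
            --l.freeFds;
            l.armed = true;              // patched code: re-arm after a real accept
        }
    }
    std::cout << "still spinning after " << maxIterations
              << " iterations, deferrals=" << l.deferrals << "\n";
    return false;
}

int main() {
    std::cout << std::boolalpha;
    std::cout << "old behaviour settles: " << run(Listener(), true, 1000000) << "\n";
    std::cout << "patched behaviour settles: " << run(Listener(), false, 1000000) << "\n";
}

[With the always-re-arm flag set, the loop never settles and the
deferral counter grows without bound; without it, the listener simply
goes quiet until something (in Squid, the accept limiter) wakes it up
again once a descriptor is free.]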
Re: Comm::TcpAcceptor::doAccept fd limit handling is broken
On 07/16/2014 11:11 AM, Rainer Weikusat wrote:

> void
> Comm::TcpAcceptor::doAccept(int fd, void *data)
> {
>     try {
>         debugs(5, 2, HERE << "New connection on FD " << fd);
>
>         Must(isOpen(fd));
>         TcpAcceptor *afd = static_cast<TcpAcceptor*>(data);
>
>         if (!okToAccept()) {
>             AcceptLimiter::Instance().defer(afd);
>         } else {
>             afd->acceptNext();
>         }
>         SetSelect(fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, afd, 0);
>
> This is broken because it re-enables readable notifications even if no
> connection was accepted.

In other words, it re-enables a check for new connections when

* we know that we currently cannot accept any new connections,
* Squid is under stress already, and
* our code does not handle multiple deferrals of the same TcpAcceptor
  object well.

A triple whammy of a bug! And this looks like a very old bug, going back
to at least 2009 (I have not checked further). Fortunately, the bug
primarily affects misconfigured or DoSed Squids.

> Judging from what I've seen from the code, the deferred queue could
> probably be ripped completely.

Well, we need to restart listening when the number of descriptors goes
down again. And since there may be many different waiting listeners,
some kind of queue seems to be needed to store them, right? We are using
the wrong structure for that storage, but that is a separate issue.

> More conservative change which fixes this for my test case
>
> --- mad-squid/src/comm/TcpAcceptor.cc  8 Jan 2013 17:36:44 -  1.1.1.2
> +++ mad-squid/src/comm/TcpAcceptor.cc  16 Jul 2014 16:51:25 -
> @@ -204,7 +204,6 @@
>          } else {
>              afd->acceptNext();
>          }
> -        SetSelect(fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, afd, 0);
>
>      } catch (const std::exception &e) {
>          fatalf("FATAL: error while accepting new client connection: %s\n", e.what());
> @@ -257,7 +256,8 @@
>          notify(flag, newConnDetails);
>          mustStop("Listener socket closed");
>          return;
> -    }
> +    } else if (!isLimited)
> +        SetSelect(conn->fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, this, 0);
>
>      debugs(5, 5, HERE << "Listener: " << conn <<
>             " accepted new connection " << newConnDetails <<

Looks good to me. AFAICT, the "else" keyword is not needed, and we
should move the SetSelect() call _after_ logging the already accepted
connection. Let's wait for other opinions though.

> (code modified to accept only one connection at the same time):

AFAICT, both official and patched code accept only one connection at a
time -- our Select code does not support multiple listeners on a single
FD. Your change avoids queuing "billions" of pointers to the same
"disabled" listener in the deferred queue [that we do not handle well].

> NB^2: This is an attempt to be polite and helpful, not an invitation to
> use me for target practice.

Sorry to hear that -- polite and helpful are the best targets :-)!

Thanks a lot,

Alex.
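[Editor's note: as a side note on the 'some kind of queue' point above,
the queue-and-kick pattern being described might be sketched as follows.
This is illustrative only, not the actual Comm::AcceptLimiter code; the
class name DeferredListeners and its interface are invented here.]

#include <deque>
#include <functional>
#include <utility>

class DeferredListeners {
public:
    using Wakeup = std::function<void()>;   // e.g. re-register the read callback

    // Called instead of re-arming the read callback when okToAccept() fails.
    void defer(Wakeup w) { waiting.push_back(std::move(w)); }

    // Called whenever a client file descriptor is closed and capacity frees up.
    void kick() {
        if (waiting.empty())
            return;
        Wakeup w = std::move(waiting.front());
        waiting.pop_front();
        w();    // the listener re-enables select() interest and accepts again
    }

private:
    std::deque<Wakeup> waiting;             // FIFO of parked listeners
};

[A listener that cannot accept parks itself with defer() instead of
re-arming its read callback; when a descriptor is released somewhere,
kick() wakes the oldest waiter, which re-registers its select() interest
and tries to accept again, which is the restart-listening behaviour
referred to above.]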