Re: Comm::TcpAcceptor::doAccept fd limit handling is broken

2014-07-18 Thread Rainer Weikusat
Rainer Weikusat  writes:

[...]

> It is possible to hit the 'fd limit'
> bug (with a single client) by running squid with a tight file descriptor
> limit (e.g., 64) and trying hard enough. To make debugging easier, I
> changed the TcpAcceptor/AcceptLimiter code to act as if only a single
> file descriptor was available for client connections

There's actually a 2nd way to hit this easily, but I didn't want to
mention it until I had a fix for it in my tree[*]: Configure a port
for 'server first' SSL interception and make a direct connection to
that port. This causes the proxy to connect to itself in order to peek
at the server certificate [until out of memory, goto start of the
sentence].

[*] That fix is based on maintaining a hashed database of the local
addresses of all outgoing connections and rejecting incoming connections
from any of these addresses. This may not be the smartest way to deal
with the situation, but it works. It's written in C, though, as I
generally use C for all additions I have to make to 'our squid'.


Re: Comm::TcpAcceptor::doAccept fd limit handling is broken

2014-07-16 Thread Rainer Weikusat
Alex Rousskov  writes:
> On 07/16/2014 11:11 AM, Rainer Weikusat wrote:
>
>> void
>> Comm::TcpAcceptor::doAccept(int fd, void *data)
>> {
>>     try {
>>         debugs(5, 2, HERE << "New connection on FD " << fd);
>> 
>>         Must(isOpen(fd));
>>         TcpAcceptor *afd = static_cast<TcpAcceptor*>(data);
>> 
>>         if (!okToAccept()) {
>>             AcceptLimiter::Instance().defer(afd);
>>         } else {
>>             afd->acceptNext();
>>         }
>>         SetSelect(fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, afd, 0);
>
>
>> This is broken because it re-enables readable notifications even if no
>> connection was accepted. 
>
> In other words, it re-enables a check for new connections when
>
> * we know that we currently cannot accept any new connections,
> * Squid is under stress already, and
> * our code does not handle multiple deferrals of the same TcpAcceptor
> object well.

There's an important 4th point: the check is re-enabled even though it is
known to fire again immediately, because the connection which was not
accepted is still pending (hence, the socket is still readable); more
details below.

[...]

>> More conservative change which fixes this for my test case
>> 
>> 
>> --- mad-squid/src/comm/TcpAcceptor.cc   8 Jan 2013 17:36:44 -   1.1.1.2
>> +++ mad-squid/src/comm/TcpAcceptor.cc   16 Jul 2014 16:51:25 -
>> @@ -204,7 +204,6 @@
>>      } else {
>>          afd->acceptNext();
>>      }
>> -    SetSelect(fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, afd, 0);
>>  
>>      } catch (const std::exception &e) {
>>          fatalf("FATAL: error while accepting new client connection: %s\n", e.what());
>> @@ -257,7 +256,8 @@
>>      notify(flag, newConnDetails);
>>      mustStop("Listener socket closed");
>>      return;
>> -    }
>> +    } else if (!isLimited)
>> +        SetSelect(conn->fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, this, 0);
>>  
>>      debugs(5, 5, HERE << "Listener: " << conn <<
>>             " accepted new connection " << newConnDetails <<
>
> Looks good to me. AFAICT, the "else" keyword is not needed, and we
> should move the SetSelect() call _after_ logging the already accepted
> connection. Let's wait for other opinions though.

You're correct on this, as all other code paths end with a return. Moving
it after the debugs also seems to be a good idea, because it ensures that
diagnostic output for anything triggered by an accepted connection
appears after the message announcing that the connection was accepted.

[test code]
>> (code modified to accept only one connection at the same time):
>
> AFAICT, both official and patched code accept only one connection at a
> time -- our Select code does not support multiple listeners on a single
> FD.

There's a misunderstanding here: it is possible to hit the 'fd limit'
bug (with a single client) by running squid with a tight file descriptor
limit (e.g., 64) and trying hard enough. To make debugging easier, I
changed the TcpAcceptor/AcceptLimiter code to act as if only a single
file descriptor was available for client connections and created two
connections via netcat. What it should do in this case is ignore the
second connection until the first one is closed. Instead, it went into an
endless 'defer this connection' loop, as mentioned above, because the
pending 2nd connection meant the socket remained readable.

updated patch:

--- mad-squid/src/comm/TcpAcceptor.cc   8 Jan 2013 17:36:44 -   1.1.1.2
+++ mad-squid/src/comm/TcpAcceptor.cc   16 Jul 2014 20:36:35 -
@@ -204,7 +204,6 @@
     } else {
         afd->acceptNext();
     }
-    SetSelect(fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, afd, 0);
 
     } catch (const std::exception &e) {
         fatalf("FATAL: error while accepting new client connection: %s\n", e.what());
@@ -262,6 +261,8 @@
     debugs(5, 5, HERE << "Listener: " << conn <<
            " accepted new connection " << newConnDetails <<
            " handler Subscription: " << theCallSub);
+
+    if (!isLimited) SetSelect(conn->fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, this, 0);
     notify(flag, newConnDetails);
 }
 


Re: Comm::TcpAcceptor::doAccept fd limit handling is broken

2014-07-16 Thread Alex Rousskov
On 07/16/2014 11:11 AM, Rainer Weikusat wrote:

> void
> Comm::TcpAcceptor::doAccept(int fd, void *data)
> {
>     try {
>         debugs(5, 2, HERE << "New connection on FD " << fd);
> 
>         Must(isOpen(fd));
>         TcpAcceptor *afd = static_cast<TcpAcceptor*>(data);
> 
>         if (!okToAccept()) {
>             AcceptLimiter::Instance().defer(afd);
>         } else {
>             afd->acceptNext();
>         }
>         SetSelect(fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, afd, 0);


> This is broken because it re-enables readable notifications even if no
> connection was accepted. 

In other words, it re-enables a check for new connections when

* we know that we currently cannot accept any new connections,
* Squid is under stress already, and
* our code does not handle multiple deferrals of the same TcpAcceptor
object well.

A triple whammy of a bug! And this looks like a very old bug, going back
to at least 2009 (I have not checked further). Fortunately, the bug
primarily affects misconfigured or DoSed Squids.


> Judging from what I've seen
> from the code, the deferred queue could probably be ripped
> completely. 

Well, we need to restart listening when the number of descriptors goes
down again. And since there may be many different waiting listeners, some
kind of queue seems to be needed to store them, right? We are using the
wrong structure for that storage, but that is a separate issue.


> More conservative change which fixes this for my test case
> 
> 
> --- mad-squid/src/comm/TcpAcceptor.cc   8 Jan 2013 17:36:44 -   1.1.1.2
> +++ mad-squid/src/comm/TcpAcceptor.cc   16 Jul 2014 16:51:25 -
> @@ -204,7 +204,6 @@
>      } else {
>          afd->acceptNext();
>      }
> -    SetSelect(fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, afd, 0);
>  
>      } catch (const std::exception &e) {
>          fatalf("FATAL: error while accepting new client connection: %s\n", e.what());
> @@ -257,7 +256,8 @@
>      notify(flag, newConnDetails);
>      mustStop("Listener socket closed");
>      return;
> -    }
> +    } else if (!isLimited)
> +        SetSelect(conn->fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, this, 0);
>  
>      debugs(5, 5, HERE << "Listener: " << conn <<
>             " accepted new connection " << newConnDetails <<

Looks good to me. AFAICT, the "else" keyword is not needed, and we
should move the SetSelect() call _after_ logging the already accepted
connection. Let's wait for other opinions though.


> (code modified to accept only one connection at the same time):

AFAICT, both official and patched code accept only one connection at a
time -- our Select code does not support multiple listeners on a single
FD. Your change avoids queuing "billions" of pointers to the same
"disabled" listener in the deferred queue [that we do not handle well].


> NB^2: This is an attempt to be polite and helpful, not an invitation to
> use me for target practice.

Sorry to hear that -- polite and helpful are the best targets :-)!


Thanks a lot,

Alex.