Hello,

We are in the process of moving to Pound as our load balancer. It passed
functional testing and worked fine in front of internal users, but when we
put it in front of external users we had to roll back because of a gradual
slowdown in our site's response times. After approximately half an hour,
the site became unusable. We're struggling to identify the cause.

Our first thought was that the slowdown was caused by the cookie-based
sessions we use for one of our backend pools. In the half hour Pound was in
front of customers, we generated a lot of sessions. We ruled this out by
load testing: we created a similar number of sessions and the site remained
responsive.

In the course of load testing, we ran into issues with ulimit and the
maximum number of open files, as indicated by the Pound log. Reviewing the
logs from the external deployment, however, we don't see any corresponding
log messages, so we don't think this is the cause.
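
For reference, here is a rough sketch of how the open-file limit could be
compared against actual descriptor usage for a running process, rather than
relying on log messages alone. It assumes Linux with /proc mounted and
permission to read the target process's entries. The helper names
(parse_nofile_limit, fd_usage) are made up for illustration and are not part
of Pound.

```python
import os


def _to_int(value):
    """Convert a limits field to int, or None when it reads 'unlimited'."""
    return None if value == "unlimited" else int(value)


def parse_nofile_limit(limits_text):
    """Return (soft, hard) 'Max open files' limits from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: 'Max', 'open', 'files', soft limit, hard limit, units
            fields = line.split()
            return _to_int(fields[3]), _to_int(fields[4])
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a process (Linux only, assumption)."""
    with open(f"/proc/{pid}/limits") as fh:
        soft, _hard = parse_nofile_limit(fh.read())
    # Each entry in /proc/<pid>/fd is one open descriptor of that process.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, soft
```

Calling fd_usage with the PID of a Pound process at intervals would show
whether the descriptor count creeps toward the soft limit over the half
hour, even when nothing is logged.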

We don't think it is related to traffic volume. Our current load balancer
handles our traffic fine, memory and CPU utilisation of the Pound server
looked fine throughout the half hour, and we're only serving in the region
of 200-300 TCP requests per second.

We have Pound doing SSL decryption, and have recompiled it with
--with-maxbuf=16384 to work around an issue with some users having
extremely large HTTP headers (large cookies from Google Analytics).

The problem appears to have manifested as failed connections, with no HTTP
response being returned.
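
To make "failed connection" more precise, a small probe along these lines
could be run against the balancer during a trial. It separates requests that
got an HTTP status from those that got no response at all, and records how
long each took. This is only a sketch: the function name probe and the
default timeout are assumptions, and it deliberately bypasses any configured
proxy so the request goes straight to the target.

```python
import http.client
import time
import urllib.error
import urllib.request

# An empty ProxyHandler disables proxy auto-detection from the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=5.0):
    """Fetch url once and return (status, seconds).

    status is the HTTP status code, or None if no HTTP response arrived
    (connection refused, reset, timed out, or a malformed reply).
    """
    start = time.monotonic()
    try:
        with _opener.open(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered, just not with a 2xx/3xx
    except (OSError, http.client.HTTPException):
        status = None  # no usable HTTP response came back
    return status, time.monotonic() - start
```

Logging the result of probe once a second would give a timeline of when
responses start slowing down and when they stop arriving altogether.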

We're at a bit of a loss as to where to go next. Any advice would be warmly
received.

Jon
