Hello,

We are in the process of moving to Pound as our load balancer. It passed functional testing and worked fine in front of internal users, but when we put it in front of external users we had to roll back because of a gradual slowdown in our site's response times. After approximately half an hour, the site became unusable. We're struggling to identify the cause.

Our first thought was that the slowdown was caused by the cookie-based sessions we use for one of our backend pools. In the half hour it was in front of customers, we generated a lot of sessions. We ruled this out by load testing: we created a similar number of sessions and the site remained responsive.

In the course of load testing, we ran into issues with ulimit and the maximum number of open files, as indicated by the Pound log. Reviewing the logs from the external deployment, however, we don't see any corresponding log messages, so we don't think this is the cause.

We don't think it is related to the volume of traffic either. Our current load balancer handles our traffic fine, memory and CPU utilisation on the Pound server looked fine throughout the half hour, and we're only serving in the region of 200-300 TCP requests per second.

We have Pound doing SSL decryption, and have recompiled it using --with-maxbuf=16384 to overcome an issue with some users having extremely large HTTP headers (large cookies from Google Analytics).

The problem appears to have manifested as failed connections, with no HTTP response being returned.

We're at a bit of a loss as to where to go next. Any advice would be warmly received.

Jon
