Thanks for the suggestion. Keep-alive does seem to be the same for the load test and standard clients.
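For anyone wanting to repeat the same check later, here's a rough sketch of the kind of comparison we did (illustrative Python using the requests library; the URL is just a placeholder, not our actual site or load-test harness):

    import requests

    # requests.Session() keeps the underlying TCP connection open between
    # requests (HTTP keep-alive), much as a browser does; calling plain
    # requests.get() each time would instead open a fresh connection.
    session = requests.Session()
    for i in range(5):
        resp = session.get("https://example.com/", timeout=10)
        # If the server (or Pound) intends to close the connection it will
        # usually send "Connection: close"; otherwise the socket can be
        # reused for the next request.
        print(i, resp.status_code, resp.headers.get("Connection", "keep-alive"))

The point is just that the load-test client should reuse connections the way real browsers do, otherwise you end up measuring something different from production traffic.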
As an update (in case it's useful to anyone else in the future), I did eventually manage to recreate the problem under load testing (or at least a similar problem), and have improved, although I don't think eradicated, the problem by:

* rebuilding our RPM with gperftools available
* adding four more CPU cores to the VM (8 cores now)
* upping Threads to 1024

I'm still experimenting, and want to try tweaking Threads on 4 cores again, as the same workload is currently being served by a proprietary product on just 4 cores without any trouble - it would be disappointing if I can't match it with Pound! It's been difficult to confirm whether this is a real solution, because we're killing our backend (possibly due to the strange load distribution in my earlier message).

Jon

On 29 August 2014 15:47, David Martineau <[email protected]> wrote:

> We did a bit of coding recently on the 2.7c source for internal use. In
> the process we found out things get a bit odd when the browser sends a
> keep-alive directive. By odd I simply mean it was kept alive on subsequent
> requests several seconds later than I would have expected. You might want
> to review your load test and see if it is using the same keep-alive
> information that your normal clients are, and whether your load test
> framework matches browser behavior for keep-alive. That doesn't answer
> your question for Pound, but it might give you insight into the difference
> in results.
>
>
> On Fri, Aug 29, 2014 at 3:02 AM, Jonathan Roberts <[email protected]> wrote:
>
>> Hello,
>>
>> We are in the process of moving to Pound as our load balancer. While it
>> passed functional testing and worked fine in front of internal users, we
>> had to roll back when we put it in front of external users, due to a
>> gradual slowdown in our site's response times. After approximately half
>> an hour, the site became unusable. We're struggling to identify the cause.
>>
>> Our first thought was that the slowdown came from our use of cookie-based
>> sessions to one of our backend pools. In the half hour it was in front of
>> customers, we generated a lot of sessions. We ruled this out by load
>> testing it, creating a similar number of sessions, and the site remained
>> responsive.
>>
>> In the course of load testing, we ran into issues with ulimit and the
>> maximum number of open files, as indicated by the Pound log. Reviewing
>> the logs from the external deployment, however, we don't see any
>> corresponding log messages, so we don't think this is the cause.
>>
>> We don't think it can be related to the volume of traffic. Our current
>> load balancer handles our traffic fine, memory and CPU utilisation of the
>> Pound server throughout the half hour look fine, and we're only serving
>> in the region of 200-300 TCP requests per second.
>>
>> We have Pound doing SSL decryption, and have recompiled it using
>> --with-maxbuf=16384 to overcome an issue with some users having extremely
>> large HTTP headers (large cookies from Google Analytics).
>>
>> It looks like the problem manifested as failed connections, with no HTTP
>> response being returned.
>>
>> We're at a bit of a loss as to where to go next. Any advice would be
>> warmly received.
>>
>> Jon
>
>
> --
> David Martineau
> CTO
> ContractPal, Inc.
> 801.494.1861 x120
> [email protected]
