Thanks for the suggestion. Keep-alive does seem to be the same for the load
test and standard clients.

As an update (in case it's useful to anyone else in the future), I did
eventually manage to recreate the problem under load testing (or at least
a similar problem), and have improved, although I don't think eradicated,
the problem by:

 * rebuilding our RPM with gperftools available
 * adding four more CPU cores to the VM (8 cores now)
 * raising Threads to 1024 (rough config sketch below).
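
For anyone unfamiliar with the directive, Threads sits in the global
section of pound.cfg. A rough sketch of the relevant part of our config
(the user/group names, log level and anything else shown are illustrative
placeholders, not our real values):

    # /etc/pound.cfg - global section (illustrative)
    User      "pound"
    Group     "pound"
    LogLevel  1
    # Worker threads handling client connections; the default (128, if
    # I remember right) wasn't enough for us under load
    Threads   1024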

I'm still experimenting, and want to try tweaking Threads on 4 cores again,
as the same workload is currently being served by a proprietary product on
just 4 cores without any trouble - it would be disappointing if I can't
match that with Pound!

It's been difficult to confirm whether this is a real solution because
we're killing our backend (possibly due to the strange load distribution in
my earlier message).

Jon


On 29 August 2014 15:47, David Martineau <[email protected]>
wrote:

> We did a bit of coding recently on the 2.7c source for internal use.  In
> the process we found out things get a bit odd when the browser sends a
> keep-alive directive.  By odd I simply mean it was kept alive on subsequent
> requests several seconds later than I would have expected.  You might want
> to review your load test and see if it is using the same keep-alive
> information that your normal clients are and if your load test framework
> matches browser behavior for keep-alive.  That doesn't answer your question
> for Pound but it might give you insight into the difference in results.
>
>
> On Fri, Aug 29, 2014 at 3:02 AM, Jonathan Roberts <
> [email protected]> wrote:
>
>> Hello,
>>
>> We are in the process of moving to Pound as our load balancer. While it
>> passed functional testing and worked fine in front of internal users, when
>> we put it in front of external users we had to roll back due to a gradual
>> slowdown in our site's response times. After approximately half an hour,
>> the site became unusable. We're struggling to identify the cause.
>>
>> Our first thought was that the slowdown was caused by the cookie-based
>> sessions we use for one of our backend pools. In the half hour it was in
>> front of customers, we generated a lot of sessions. We ruled this out by
>> load testing: we created a similar number of sessions and the site
>> remained responsive.
>>
>> In the course of load testing, we ran into issues with ulimit and the
>> maximum number of open files, as indicated by the Pound log. Reviewing the
>> logs from the external deployment, however, we don't see any corresponding
>> messages, so we don't think this is the cause.
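>>
>> (The ulimit side of things we could deal with by raising the file
>> descriptor limit for the user Pound runs as - roughly like this; the
>> values and the "pound" user name are just an example of our setup:)
>>
>>     # check what limit the running Pound process actually got
>>     grep "Max open files" /proc/<pound-pid>/limits
>>
>>     # /etc/security/limits.conf - raise the limit for the pound user
>>     pound  soft  nofile  65536
>>     pound  hard  nofile  65536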
>>
>> We don't think it can be related to the volume of traffic. Our current
>> load balancer handles our traffic fine, memory and CPU utilisation of the
>> Pound server throughout the half hour look fine, and we're only serving in
>> the region of 200-300 TCP requests per second.
>>
>> We have Pound doing SSL decryption, and have recompiled it using
>> --with-maxbuf=16384 to overcome an issue with some users having extremely
>> large HTTP headers (large cookies from Google Analytics).
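>>
>> (For anyone wanting to reproduce that build, it was nothing more exotic
>> than passing the option through to configure - the exact steps below are
>> from memory and your packaging will differ:)
>>
>>     # rebuild with a larger per-request buffer
>>     # (the stock MAXBUF is 4096, if memory serves)
>>     ./configure --with-maxbuf=16384
>>     make
>>     make install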
>>
>> It looks like the problem manifested as failed connections, with no HTTP
>> response being returned.
>>
>> At a bit of a loss as to where to go next. Any advice would be warmly
>> received.
>>
>> Jon
>>
>
>
>
> --
> David Martineau
> CTO
> ContractPal, Inc.
> 801.494.1861 x120
> [email protected]
>
