Re: measuring haproxy performance impact

2009-03-06 Thread Willy Tarreau
On Fri, Mar 06, 2009 at 02:36:59PM -0800, Michael Fortson wrote:
> On Fri, Mar 6, 2009 at 1:46 PM, Willy Tarreau  wrote:
> > On Fri, Mar 06, 2009 at 01:00:38PM -0800, Michael Fortson wrote:
> >> Thanks Willy -- here's the sysctl -a |grep ^net output:
> >> http://pastie.org/409735
> >
> > after a quick check, I see two major things :
> >  - net.ipv4.tcp_max_syn_backlog = 1024
> >    => far too low, increase it to 10240 and check if it helps
> >
> >  - net.netfilter.nf_conntrack_max = 265535
> >  - net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
> >    => this proves that netfilter is indeed running on this machine
> >       and might be responsible for session drops. 265k tracked
> >       sessions is very low with such a long time_wait timeout; it
> >       limits you to about 2k sessions/s, including local connections
> >       on the loopback, etc...
> >
> > You should then increase nf_conntrack_max, set nf_conntrack_buckets to
> > about nf_conntrack_max/16, and reduce nf_conntrack_tcp_timeout_time_wait
> > to about 30 seconds.
> >
> >> Our outbound cap is 400 Mb
> >
> > OK so I think you're still far away from that.
> >
> > Regards,
> > Willy
> >
> >
> 
> Hmm; I did these (John is right, netfilter is down at the moment
> because I dropped iptables to help troubleshoot this),

What did you unload, precisely? You don't need any iptables rules
for conntrack to take effect.
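
As a quick sanity check (a minimal sketch; module names vary between
kernels, e.g. ip_conntrack on older 2.6 kernels vs nf_conntrack on newer
ones), something like this shows whether the tracking modules are still
loaded even with an empty ruleset:

    # list any netfilter/conntrack modules currently loaded
    lsmod | egrep 'conntrack|iptable|ip_tables'

    # an empty ruleset does NOT mean conntrack is inactive
    iptables -L -n

If nf_conntrack or ip_conntrack still shows up in lsmod, every packet is
still being tracked regardless of the (empty) rules.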

> so I guess the
> syn backlog is the only net change. No difference so far -- still
> seeing regular 3s responses.
> 
> It's weird, but I actually see better results testing mongrel than
> nginx; haproxy => mongrel heartbeat is more reliable than the haproxy
> => nginx request.

Is mongrel on another machine? You might be running out of some
resource on the local one, making it difficult to reach accept().
Unfortunately I don't see which one :-(

Have you checked with "dmesg" that you don't have network stack
errors or warnings of any kind?
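
For example (just a sketch of the kind of filtering that may help; the
exact messages depend on the driver and kernel):

    # look for driver errors, link flaps, conntrack or backlog complaints
    dmesg | egrep -i 'eth|link|drop|overflow|conntrack|syn'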

Willy




Re: measuring haproxy performance impact

2009-03-06 Thread Willy Tarreau
On Fri, Mar 06, 2009 at 05:20:48PM -0500, John Lauro wrote:
> >   - net.netfilter.nf_conntrack_max = 265535
> >   - net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
> >    => this proves that netfilter is indeed running on this machine
> >       and might be responsible for session drops. 265k tracked
> >       sessions is very low with such a long time_wait timeout; it
> >       limits you to about 2k sessions/s, including local connections
> >       on the loopback, etc...
> > 
> > You should then increase nf_conntrack_max, set nf_conntrack_buckets to
> > about nf_conntrack_max/16, and reduce nf_conntrack_tcp_timeout_time_wait
> > to about 30 seconds.
> > 
> 
> Minor nit...
> He has:  net.netfilter.nf_conntrack_count = 0
> which, if I am not mistaken, indicates that connection tracking is in the
> kernel but is not being used.

Or maybe it was checked while the machine was idle?

>  (No firewall rules are triggering it.)

You don't need firewall rules to trigger conntrack. Once the module is
loaded, it does its work. Some people even use it just to defragment
packets :-)
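
If you really want it out of the picture while testing, the usual options
(a sketch; module names and availability of the raw table depend on the
kernel build) are either to unload the modules entirely:

    # remove conntrack from the kernel (only works if nothing references it)
    modprobe -r nf_conntrack_ipv4 nf_conntrack

or to exempt the traffic from tracking via the raw table:

    # skip connection tracking for all traffic
    iptables -t raw -A PREROUTING -j NOTRACK
    iptables -t raw -A OUTPUT -j NOTRACK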

Regards,
Willy




Re: measuring haproxy performance impact

2009-03-06 Thread Michael Fortson
On Fri, Mar 6, 2009 at 1:46 PM, Willy Tarreau  wrote:
> On Fri, Mar 06, 2009 at 01:00:38PM -0800, Michael Fortson wrote:
>> Thanks Willy -- here's the sysctl -a |grep ^net output:
>> http://pastie.org/409735
>
> after a quick check, I see two major things :
>  - net.ipv4.tcp_max_syn_backlog = 1024
>    => far too low, increase it to 10240 and check if it helps
>
>  - net.netfilter.nf_conntrack_max = 265535
>  - net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
>    => this proves that netfilter is indeed running on this machine
>       and might be responsible for session drops. 265k tracked
>       sessions is very low with such a long time_wait timeout; it
>       limits you to about 2k sessions/s, including local connections
>       on the loopback, etc...
>
> You should then increase nf_conntrack_max, set nf_conntrack_buckets to
> about nf_conntrack_max/16, and reduce nf_conntrack_tcp_timeout_time_wait
> to about 30 seconds.
>
>> Our outbound cap is 400 Mb
>
> OK so I think you're still far away from that.
>
> Regards,
> Willy
>
>

Hmm; I did these (John is right, netfilter is down at the moment
because I dropped iptables to help troubleshoot this), so I guess the
syn backlog is the only net change. No difference so far -- still
seeing regular 3s responses.

It's weird, but I actually see better results testing mongrel than
nginx; haproxy => mongrel heartbeat is more reliable than the haproxy
=> nginx request.



RE: measuring haproxy performance impact

2009-03-06 Thread John Lauro
>   - net.netfilter.nf_conntrack_max = 265535
>   - net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
>    => this proves that netfilter is indeed running on this machine
>       and might be responsible for session drops. 265k tracked
>       sessions is very low with such a long time_wait timeout; it
>       limits you to about 2k sessions/s, including local connections
>       on the loopback, etc...
> 
> You should then increase nf_conntrack_max, set nf_conntrack_buckets to
> about nf_conntrack_max/16, and reduce nf_conntrack_tcp_timeout_time_wait
> to about 30 seconds.
> 

Minor nit...
He has:  net.netfilter.nf_conntrack_count = 0
which, if I am not mistaken, indicates that connection tracking is in the
kernel but is not being used.  (No firewall rules are triggering it.)
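
A quick way to see whether tracking is active at a given moment (a sketch;
the proc paths moved around between kernel versions, older ones use
ip_conntrack instead of nf_conntrack):

    # number of connections currently tracked
    cat /proc/sys/net/netfilter/nf_conntrack_count

    # the tracked entries themselves
    head /proc/net/nf_conntrack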






Re: measuring haproxy performance impact

2009-03-06 Thread Willy Tarreau
On Fri, Mar 06, 2009 at 01:00:38PM -0800, Michael Fortson wrote:
> Thanks Willy -- here's the sysctl -a |grep ^net output:
> http://pastie.org/409735

after a quick check, I see two major things :
  - net.ipv4.tcp_max_syn_backlog = 1024
    => far too low, increase it to 10240 and check if it helps

  - net.netfilter.nf_conntrack_max = 265535
  - net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
    => this proves that netfilter is indeed running on this machine
       and might be responsible for session drops. 265k tracked
       sessions is very low with such a long time_wait timeout; it
       limits you to about 2k sessions/s, including local connections
       on the loopback, etc...

You should then increase nf_conntrack_max, set nf_conntrack_buckets to
about nf_conntrack_max/16, and reduce nf_conntrack_tcp_timeout_time_wait
to about 30 seconds.
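
Something along these lines (a sketch only; the exact values are examples,
and on most kernels nf_conntrack_buckets is read-only through sysctl, so
the hash size has to be set through the module parameter instead):

    # larger SYN backlog
    sysctl -w net.ipv4.tcp_max_syn_backlog=10240

    # more conntrack entries, shorter time_wait tracking
    sysctl -w net.netfilter.nf_conntrack_max=1048576
    sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30

    # hash buckets ~ nf_conntrack_max/16
    echo 65536 > /sys/module/nf_conntrack/parameters/hashsize

    # make the sysctl part persistent across reboots
    echo 'net.ipv4.tcp_max_syn_backlog = 10240' >> /etc/sysctl.conf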

> Our outbound cap is 400 Mb

OK so I think you're still far away from that.

Regards,
Willy




Re: measuring haproxy performance impact

2009-03-06 Thread Michael Fortson
On Fri, Mar 6, 2009 at 12:53 PM, Willy Tarreau  wrote:
> On Fri, Mar 06, 2009 at 11:49:39AM -0800, Michael Fortson wrote:
>> Oops, looks like it's actually Gb -> Gb:
>> http://pastie.org/409653
>
> ah nice !
>
>> Here's a netstat -s:
>> http://pastie.org/409652
>
> Oh there are interesting things there :
>
>  - 513607 failed connection attempts
>    => let's assume it was for dead servers
>
>  - 34784881 segments retransmited
>    => this is huge, maybe your outgoing bandwidth is limited
>       by the provider, causing lots of drops ?
>
>  - 8325393 SYN cookies sent
>    => either you've been experiencing a SYN flood attack, or
>       one of your listening socket's backlog is extremely small
>
>  -  1235433 times the listen queue of a socket overflowed
>     1235433 SYNs to LISTEN sockets ignored
>     => up to 1.2 million times some client socket experienced
>        a drop, causing at least a 3 seconds delay to establish.
>     The errors your scripts detect certainly account for a small
>     part of those.
>
>  - 2962458 times recovered from packet loss due to SACK data
>    => many losses, related to second point above.
>
> Could you post the output of "sysctl -a |grep ^net" ? I think that
> > your TCP SYN backlog is very low. Your stats page indicates an average
> of about 300 sessions/s over the last 24 hours. If your external
> bandwidth is capped and causes drops, you can nearly saturate the
> default backlog of 1024 with 300 sessions/s each taking 3s to
> complete. If you're interested, the latest snapshot will report
> the number of sess/s in the stats.
>
>> Haproxy and nginx are currently on the same box. Mongrels are all on a
>> private network accessed through eth1 (public access is via eth0).
>
> OK.
>
>> stats page attached (backend "everything" is not currently in use;
>> it'll be a use-when-full option for fast_mongrels once we upgrade to
>> the next haproxy).
>
> According to the stats, your avg output bandwidth is around 10 Mbps.
> Would this match your external link ?
>
> Regards,
> Willy
>

Thanks Willy -- here's the sysctl -a |grep ^net output:
http://pastie.org/409735

Our outbound cap is 400 Mb



Re: measuring haproxy performance impact

2009-03-06 Thread Willy Tarreau
On Fri, Mar 06, 2009 at 11:49:39AM -0800, Michael Fortson wrote:
> Oops, looks like it's actually Gb -> Gb:
> http://pastie.org/409653

ah nice !

> Here's a netstat -s:
> http://pastie.org/409652

Oh there are interesting things there :

  - 513607 failed connection attempts
    => let's assume it was for dead servers

  - 34784881 segments retransmited
    => this is huge, maybe your outgoing bandwidth is limited
       by the provider, causing lots of drops ?

  - 8325393 SYN cookies sent
    => either you've been experiencing a SYN flood attack, or
       the backlog of one of your listening sockets is extremely small

  - 1235433 times the listen queue of a socket overflowed
    1235433 SYNs to LISTEN sockets ignored
    => up to 1.2 million times some client socket experienced
       a drop, causing at least a 3-second delay to establish.
       The errors your scripts detect certainly account for a small
       part of those.

  - 2962458 times recovered from packet loss due to SACK data
    => many losses, related to the second point above.
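
To watch whether these counters are still climbing while the 3s stalls
happen, something like this is usually enough (a sketch; the exact label
strings vary slightly between kernel versions):

    # snapshot the suspicious counters every 10 seconds
    watch -n 10 "netstat -s | egrep -i 'listen|syn|retrans|sack'"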

Could you post the output of "sysctl -a |grep ^net" ? I think that
your TCP SYN backlog is very low. Your stats page indicates an average
of about 300 sessions/s over the last 24 hours. If your external
bandwidth is capped and causes drops, you can nearly saturate the
default backlog of 1024 with 300 sessions/s each taking 3s to
complete. If you're interested, the latest snapshot will report
the number of sess/s in the stats.
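
The back-of-the-envelope math (a sketch; 300 sess/s is read off your stats
page, 3s is the retransmit delay) looks like this:

    # in-flight connections ~= arrival rate * time each one lingers
    rate=300   # new sessions per second
    delay=3    # seconds spent waiting on a SYN retransmit
    echo "~$((rate * delay)) pending vs backlog $(cat /proc/sys/net/ipv4/tcp_max_syn_backlog)"

With the default of 1024, roughly 900 pending connections leaves almost no
headroom, which is consistent with the overflow counters above.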

> Haproxy and nginx are currently on the same box. Mongrels are all on a
> private network accessed through eth1 (public access is via eth0).

OK.

> stats page attached (backend "everything" is not currently in use;
> it'll be a use-when-full option for fast_mongrels once we upgrade to
> the next haproxy).

According to the stats, your avg output bandwidth is around 10 Mbps.
Would this match your external link ?

Regards,
Willy




Re: measuring haproxy performance impact

2009-03-06 Thread Willy Tarreau
On Fri, Mar 06, 2009 at 11:23:02AM -0800, Michael Fortson wrote:
> On Fri, Mar 6, 2009 at 8:43 AM, Willy Tarreau  wrote:
> > Hi Michael,
> >
> > On Thu, Mar 05, 2009 at 01:04:06PM -0800, Michael Fortson wrote:
> >> I'm trying to understand why our proxied requests have a much greater
> >> chance of significant delay than non-proxied requests.
> >>
> >> The server is an 8-core (dual quad) Intel machine. Making requests
> >> directly to the nginx backend is just far more reliable. Here's the
> >> output of a shell script that continuously requests a blank 0k image
> >> file from nginx directly on its own port and spits out a timestamp
> >> if the delay isn't 0 or 1 seconds:
> >>
> >> Thu Mar 5 12:36:17 PST 2009
> >> beginning continuous test of nginx port 8080
> >> Thu Mar 5 12:38:06 PST 2009
> >> Nginx Time is 2 seconds
> >>
> >>
> >>
> >> Here's the same test running through haproxy, simultaneously:
> >>
> >> Thu Mar 5 12:36:27 PST 2009
> >> beginning continuous test of haproxy port 80
> >> Thu Mar 5 12:39:39 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:39:48 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:39:55 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:40:03 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:40:45 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:40:48 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:40:55 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:40:58 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:41:55 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:42:01 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:42:08 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:42:29 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:42:38 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:43:05 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:43:15 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:08 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:25 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:30 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:33 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:39 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:46 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:54 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:45:07 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:45:16 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:45:45 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:45:54 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:45:58 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:05 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:08 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:32 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:48 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:53 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:58 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:47:40 PST 2009
> >> Nginx Time is 3 seconds
> >
> > 3 seconds is typically a TCP retransmit. You have network losses somewhere
> > from/to your haproxy. Would you happen to be running on a gigabit port
> > connected to a 100 Mbps switch ? What type of NIC is this ? I've seen
> > many problems with Broadcom NetXtreme II (bnx2) NICs caused by buggy firmware,
> > but it seems to work fine for other people after a firmware upgrade.
> >
> >> My sanitized haproxy config is here (mongrel backend was omitted for 
> >> brevity) :
> >> http://pastie.org/408729
> >>
> >> Are the ACLs just too expensive?
> >
> > Not at all. Especially in your case. To reach 3 seconds of latency, you
> > would need hundreds of thousands of ACLs, so this is clearly unrelated
> > to your config.
> >
> >> Nginx is running with 4 processes, and the box shows mostly idle.
> >
> > ... which indicates that you aren't burning CPU cycles processing ACLs ;-)
> >
> > It is also possible that some TCP settings are too low for your load, but
> > I don't know what your load is. Above a few hundred to a few thousand
> > sessions per second, you will need to do some tuning, otherwise you can
> > end up with similar situations.
> >
> > Regards,
> > Willy
> >
> >
> 
> Hmm. I think it is gigabit connected to 100 Mb (all Dell rack-mount
> servers and switches).

OK, so please check with ethtool whether your port is running at half
or full duplex:

# ethtool eth0

Most often, 100 Mbps switch ports are forced to 100-full with autoneg
disabled, and the gigabit ports facing them fall back to half duplex
because they assume they are talking to hubs.
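
If it does report half duplex, the usual fix (a sketch; the interface name
and speed are examples, and forcing settings briefly drops the link) is to
match the switch port explicitly:

    # show current speed/duplex/autoneg state
    ethtool eth0

    # force the NIC to what the switch port is hard-set to
    ethtool -s eth0 speed 100 duplex full autoneg off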

> The nginx backend runs on the same machine as
> haproxy and is referenced via 127.0.0.1 -- does that still involve a
> real network port? Should I try the test all on localhost to isolate
> it from any networking retransmits?

Yes, if you can do that, that would be nice. If the issue persists,
we'll have to check the network stack tuning.

Re: measuring haproxy performance impact

2009-03-06 Thread Michael Fortson
On Fri, Mar 6, 2009 at 8:43 AM, Willy Tarreau  wrote:
> Hi Michael,
>
> On Thu, Mar 05, 2009 at 01:04:06PM -0800, Michael Fortson wrote:
>> I'm trying to understand why our proxied requests have a much greater
>> chance of significant delay than non-proxied requests.
>>
>> The server is an 8-core (dual quad) Intel machine. Making requests
>> directly to the nginx backend is just far more reliable. Here's the
>> output of a shell script that continuously requests a blank 0k image
>> file from nginx directly on its own port and spits out a timestamp
>> if the delay isn't 0 or 1 seconds:
>>
>> Thu Mar 5 12:36:17 PST 2009
>> beginning continuous test of nginx port 8080
>> Thu Mar 5 12:38:06 PST 2009
>> Nginx Time is 2 seconds
>>
>>
>>
>> Here's the same test running through haproxy, simultaneously:
>>
>> Thu Mar 5 12:36:27 PST 2009
>> beginning continuous test of haproxy port 80
>> Thu Mar 5 12:39:39 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:39:48 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:39:55 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:40:03 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:40:45 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:40:48 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:40:55 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:40:58 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:41:55 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:42:01 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:42:08 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:42:29 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:42:38 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:43:05 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:43:15 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:08 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:25 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:30 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:33 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:39 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:46 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:54 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:45:07 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:45:16 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:45:45 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:45:54 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:45:58 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:05 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:08 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:32 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:48 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:53 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:58 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:47:40 PST 2009
>> Nginx Time is 3 seconds
>
> 3 seconds is typically a TCP retransmit. You have network losses somewhere
> from/to your haproxy. Would you happen to be running on a gigabit port
> connected to a 100 Mbps switch ? What type of NIC is this ? I've seen
> many problems with Broadcom NetXtreme II (bnx2) NICs caused by buggy firmware,
> but it seems to work fine for other people after a firmware upgrade.
>
>> My sanitized haproxy config is here (mongrel backend was omitted for 
>> brevity) :
>> http://pastie.org/408729
>>
>> Are the ACLs just too expensive?
>
> Not at all. Especially in your case. To reach 3 seconds of latency, you would
> need hundreds of thousands of ACLs, so this is clearly unrelated to your 
> config.
>
>> Nginx is running with 4 processes, and the box shows mostly idle.
>
> ... which indicates that you aren't burning CPU cycles processing ACLs ;-)
>
> It is also possible that some TCP settings are too low for your load, but
> I don't know what your load is. Above a few hundred to a few thousand
> sessions per second, you will need to do some tuning, otherwise you can
> end up with similar situations.
>
> Regards,
> Willy
>
>

Hmm. I think it is gigabit connected to 100 Mb (all Dell rack-mount
servers and switches). The nginx backend runs on the same machine as
haproxy and is referenced via 127.0.0.1 -- does that still involve a
real network port? Should I try the test all on localhost to isolate
it from any networking retransmits?

Here's a peek at the stats page after about a day of running (this
should help demonstrate current loading)
http://pastie.org/409632
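
For the localhost run, a minimal loop in the same spirit as the original
test might look like this (a sketch; the blank-image path is a placeholder,
not the real one from the setup, and curl is assumed to be available):

    #!/bin/sh
    # hit the proxy (or nginx directly) once a second and report slow replies
    while true; do
        start=$(date +%s)
        curl -s -o /dev/null http://127.0.0.1:80/images/blank.gif
        elapsed=$(( $(date +%s) - start ))
        if [ "$elapsed" -gt 1 ]; then
            date
            echo "Nginx Time is $elapsed seconds"
        fi
        sleep 1
    done

Running one copy against 127.0.0.1:80 (haproxy) and one against
127.0.0.1:8080 (nginx) side by side would show whether the 3s stalls
survive with no physical network involved.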



Re: measuring haproxy performance impact

2009-03-06 Thread Willy Tarreau
Hi Michael,

On Thu, Mar 05, 2009 at 01:04:06PM -0800, Michael Fortson wrote:
> I'm trying to understand why our proxied requests have a much greater
> chance of significant delay than non-proxied requests.
> 
> The server is an 8-core (dual quad) Intel machine. Making requests
> directly to the nginx backend is just far more reliable. Here's the
> output of a shell script that continuously requests a blank 0k image
> file from nginx directly on its own port and spits out a timestamp
> if the delay isn't 0 or 1 seconds:
> 
> Thu Mar 5 12:36:17 PST 2009
> beginning continuous test of nginx port 8080
> Thu Mar 5 12:38:06 PST 2009
> Nginx Time is 2 seconds
> 
> 
> 
> Here's the same test running through haproxy, simultaneously:
> 
> Thu Mar 5 12:36:27 PST 2009
> beginning continuous test of haproxy port 80
> Thu Mar 5 12:39:39 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:39:48 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:39:55 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:40:03 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:40:45 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:40:48 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:40:55 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:40:58 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:41:55 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:42:01 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:42:08 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:42:29 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:42:38 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:43:05 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:43:15 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:44:08 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:44:25 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:44:30 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:44:33 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:44:39 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:44:46 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:44:54 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:45:07 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:45:16 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:45:45 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:45:54 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:45:58 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:46:05 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:46:08 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:46:32 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:46:48 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:46:53 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:46:58 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:47:40 PST 2009
> Nginx Time is 3 seconds

3 seconds is typically a TCP retransmit. You have network losses somewhere
from/to your haproxy. Would you happen to be running on a gigabit port
connected to a 100 Mbps switch ? What type of NIC is this ? I've seen
many problems with Broadcom NetXtreme II (bnx2) NICs caused by buggy firmware,
but it seems to work fine for other people after a firmware upgrade.
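
If you're not sure which driver and firmware the port uses, ethtool can
tell you (a sketch; eth0 is an assumption, use whichever interface carries
the traffic):

    # driver name, driver version and firmware version of the NIC
    ethtool -i eth0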

> My sanitized haproxy config is here (mongrel backend was omitted for brevity):
> http://pastie.org/408729
> 
> Are the ACLs just too expensive?

Not at all. Especially in your case. To reach 3 seconds of latency, you would
need hundreds of thousands of ACLs, so this is clearly unrelated to your config.

> Nginx is running with 4 processes, and the box shows mostly idle.

... which indicates that you aren't burning CPU cycles processing ACLs ;-)

It is also possible that some TCP settings are too low for your load, but
I don't know what your load is. Above a few hundred to a few thousand
sessions per second, you will need to do some tuning, otherwise you can
end up with similar situations.

Regards,
Willy