Re: measuring haproxy performance impact
On Fri, Mar 06, 2009 at 02:36:59PM -0800, Michael Fortson wrote:
> On Fri, Mar 6, 2009 at 1:46 PM, Willy Tarreau wrote:
> > On Fri, Mar 06, 2009 at 01:00:38PM -0800, Michael Fortson wrote:
> >> Thanks Willy -- here's the sysctl -a |grep ^net output:
> >> http://pastie.org/409735
> >
> > after a quick check, I see two major things :
> >   - net.ipv4.tcp_max_syn_backlog = 1024
> >     => far too low, increase it to 10240 and check if it helps
> >
> >   - net.netfilter.nf_conntrack_max = 265535
> >   - net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
> >     => this proves that netfilter is indeed running on this machine
> >        and might be responsible for session drops. 265k sessions is
> >        very low for the large time_wait. It limits to about 2k
> >        sessions/s, including local connections on loopback, etc...
> >
> > You should then increase nf_conntrack_max and nf_conntrack_buckets
> > to about nf_conntrack_max/16, and reduce nf_conntrack_tcp_timeout_time_wait
> > to about 30 seconds.
> >
> >> Our outbound cap is 400 Mb
> >
> > OK so I think you're still far away from that.
> >
> > Regards,
> > Willy
>
> Hmm; I did these (John is right, netfilter is down at the moment
> because I dropped iptables to help troubleshoot this),

What did you unload precisely ? You don't need any iptables rules for
the conntrack to take effect.

> so I guess the syn backlog is the only net change. No difference so
> far -- still seeing regular 3s responses.
>
> It's weird, but I actually see better results testing mongrel than
> nginx; haproxy => mongrel heartbeat is more reliable than the haproxy
> => nginx request.

mongrel is on another machine ? You might be running out of some
resource on the local one making it difficult to reach accept().
Unfortunately I don't see what :-(

Have you checked with "dmesg" that you don't have network stack errors
or any type of warning ?

Willy
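For reference, a quick way to verify both points from a shell; the
module names are the usual ones but can differ between kernels, so
treat this as a sketch rather than the exact commands used here:

    # Dropping the iptables *rules* does not unload conntrack; check
    # whether the modules are still loaded:
    lsmod | grep -i conntrack

    # If they are loaded and you really want them gone (typical module
    # names, adjust for your kernel):
    modprobe -r nf_conntrack_ipv4 nf_conntrack

    # Scan the kernel log for NIC or network stack complaints:
    dmesg | grep -iE 'eth|drop|flood|conntrack' | tail -50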
Re: measuring haproxy performance impact
On Fri, Mar 06, 2009 at 05:20:48PM -0500, John Lauro wrote:
> >   - net.netfilter.nf_conntrack_max = 265535
> >   - net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
> >     => this proves that netfilter is indeed running on this machine
> >        and might be responsible for session drops. 265k sessions is
> >        very low for the large time_wait. It limits to about 2k
> >        sessions/s, including local connections on loopback, etc...
> >
> > You should then increase nf_conntrack_max and nf_conntrack_buckets
> > to about nf_conntrack_max/16, and reduce
> > nf_conntrack_tcp_timeout_time_wait
> > to about 30 seconds.
>
> Minor nit...
> He has: net.netfilter.nf_conntrack_count = 0
> Which, if I am not mistaken, indicates that connection tracking,
> although in the kernel, is not being used.

or maybe it was checked while the machine was not being used ?

> (No firewall rules triggering it).

you don't need firewall rules to trigger conntrack. Once loaded, it
does its work. Some people even use it to defragment packets :-)

Regards,
Willy
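An easy way to rule out the "checked while idle" case is to sample the
counter while the box is actually taking traffic, for example:

    # Watch the conntrack table fill up (or not) under real load:
    watch -n1 'sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max'

    # Or read the counters directly:
    cat /proc/sys/net/netfilter/nf_conntrack_count
    cat /proc/sys/net/netfilter/nf_conntrack_max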
Re: measuring haproxy performance impact
On Fri, Mar 6, 2009 at 1:46 PM, Willy Tarreau wrote:
> On Fri, Mar 06, 2009 at 01:00:38PM -0800, Michael Fortson wrote:
>> Thanks Willy -- here's the sysctl -a |grep ^net output:
>> http://pastie.org/409735
>
> after a quick check, I see two major things :
>   - net.ipv4.tcp_max_syn_backlog = 1024
>     => far too low, increase it to 10240 and check if it helps
>
>   - net.netfilter.nf_conntrack_max = 265535
>   - net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
>     => this proves that netfilter is indeed running on this machine
>        and might be responsible for session drops. 265k sessions is
>        very low for the large time_wait. It limits to about 2k
>        sessions/s, including local connections on loopback, etc...
>
> You should then increase nf_conntrack_max and nf_conntrack_buckets
> to about nf_conntrack_max/16, and reduce nf_conntrack_tcp_timeout_time_wait
> to about 30 seconds.
>
>> Our outbound cap is 400 Mb
>
> OK so I think you're still far away from that.
>
> Regards,
> Willy

Hmm; I did these (John is right, netfilter is down at the moment
because I dropped iptables to help troubleshoot this), so I guess the
syn backlog is the only net change. No difference so far -- still
seeing regular 3s responses.

It's weird, but I actually see better results testing mongrel than
nginx; haproxy => mongrel heartbeat is more reliable than the haproxy
=> nginx request.
RE: measuring haproxy performance impact
>   - net.netfilter.nf_conntrack_max = 265535
>   - net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
>     => this proves that netfilter is indeed running on this machine
>        and might be responsible for session drops. 265k sessions is
>        very low for the large time_wait. It limits to about 2k
>        sessions/s, including local connections on loopback, etc...
>
> You should then increase nf_conntrack_max and nf_conntrack_buckets
> to about nf_conntrack_max/16, and reduce
> nf_conntrack_tcp_timeout_time_wait
> to about 30 seconds.

Minor nit...
He has: net.netfilter.nf_conntrack_count = 0
Which, if I am not mistaken, indicates that connection tracking,
although in the kernel, is not being used. (No firewall rules
triggering it).
Re: measuring haproxy performance impact
On Fri, Mar 06, 2009 at 01:00:38PM -0800, Michael Fortson wrote:
> Thanks Willy -- here's the sysctl -a |grep ^net output:
> http://pastie.org/409735

after a quick check, I see two major things :

  - net.ipv4.tcp_max_syn_backlog = 1024
    => far too low, increase it to 10240 and check if it helps

  - net.netfilter.nf_conntrack_max = 265535
  - net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
    => this proves that netfilter is indeed running on this machine
       and might be responsible for session drops. 265k sessions is
       very low for the large time_wait. It limits to about 2k
       sessions/s, including local connections on loopback, etc...

You should then increase nf_conntrack_max and nf_conntrack_buckets
to about nf_conntrack_max/16, and reduce nf_conntrack_tcp_timeout_time_wait
to about 30 seconds.

> Our outbound cap is 400 Mb

OK so I think you're still far away from that.

Regards,
Willy
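As a concrete sketch of those changes (the conntrack_max figure below
is illustrative only -- the thread does not give an exact number, and
the bucket count simply follows the max/16 rule above):

    sysctl -w net.ipv4.tcp_max_syn_backlog=10240
    sysctl -w net.netfilter.nf_conntrack_max=1048576
    sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30

    # The bucket count (hash size) is read-only through sysctl on many
    # kernels; it is usually set through the module parameter instead:
    echo 65536 > /sys/module/nf_conntrack/parameters/hashsize

    # To survive a reboot, put the equivalent lines in /etc/sysctl.conf:
    #   net.ipv4.tcp_max_syn_backlog = 10240
    #   net.netfilter.nf_conntrack_max = 1048576
    #   net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30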
Re: measuring haproxy performance impact
On Fri, Mar 6, 2009 at 12:53 PM, Willy Tarreau wrote:
> On Fri, Mar 06, 2009 at 11:49:39AM -0800, Michael Fortson wrote:
>> Oops, looks like it's actually Gb -> Gb:
>> http://pastie.org/409653
>
> ah nice !
>
>> Here's a netstat -s:
>> http://pastie.org/409652
>
> Oh there are interesting things there :
>
>   - 513607 failed connection attempts
>     => let's assume it was for dead servers
>
>   - 34784881 segments retransmitted
>     => this is huge, maybe your outgoing bandwidth is limited
>        by the provider, causing lots of drops ?
>
>   - 8325393 SYN cookies sent
>     => either you've been experiencing a SYN flood attack, or
>        one of your listening socket's backlog is extremely small
>
>   - 1235433 times the listen queue of a socket overflowed
>     1235433 SYNs to LISTEN sockets ignored
>     => up to 1.2 million times some client socket experienced
>        a drop, causing at least a 3 seconds delay to establish.
>        The errors your scripts detect certainly account for a small
>        part of those.
>
>   - 2962458 times recovered from packet loss due to SACK data
>     => many losses, related to second point above.
>
> Could you post the output of "sysctl -a |grep ^net" ? I think that
> your TCP syn backlog is very low. Your stats page indicates an average
> of about 300 sessions/s over the last 24 hours. If your external
> bandwidth is capped and causes drops, you can nearly saturate the
> default backlog of 1024 with 300 sessions/s each taking 3s to
> complete. If you're interested, the latest snapshot will report
> the number of sess/s in the stats.
>
>> Haproxy and nginx are currently on the same box. Mongrels are all on a
>> private network accessed through eth1 (public access is via eth0).
>
> OK.
>
>> stats page attached (backend "everything" is not currently in use;
>> it'll be a use-when-full option for fast_mongrels once we upgrade to
>> the next haproxy).
>
> According to the stats, your avg output bandwidth is around 10 Mbps.
> Would this match your external link ?
>
> Regards,
> Willy

Thanks Willy -- here's the sysctl -a |grep ^net output:
http://pastie.org/409735

Our outbound cap is 400 Mb
Re: measuring haproxy performance impact
On Fri, Mar 06, 2009 at 11:49:39AM -0800, Michael Fortson wrote:
> Oops, looks like it's actually Gb -> Gb:
> http://pastie.org/409653

ah nice !

> Here's a netstat -s:
> http://pastie.org/409652

Oh there are interesting things there :

  - 513607 failed connection attempts
    => let's assume it was for dead servers

  - 34784881 segments retransmitted
    => this is huge, maybe your outgoing bandwidth is limited
       by the provider, causing lots of drops ?

  - 8325393 SYN cookies sent
    => either you've been experiencing a SYN flood attack, or
       one of your listening socket's backlog is extremely small

  - 1235433 times the listen queue of a socket overflowed
    1235433 SYNs to LISTEN sockets ignored
    => up to 1.2 million times some client socket experienced
       a drop, causing at least a 3 seconds delay to establish.
       The errors your scripts detect certainly account for a small
       part of those.

  - 2962458 times recovered from packet loss due to SACK data
    => many losses, related to second point above.

Could you post the output of "sysctl -a |grep ^net" ? I think that
your TCP syn backlog is very low. Your stats page indicates an average
of about 300 sessions/s over the last 24 hours. If your external
bandwidth is capped and causes drops, you can nearly saturate the
default backlog of 1024 with 300 sessions/s each taking 3s to
complete. If you're interested, the latest snapshot will report
the number of sess/s in the stats.

> Haproxy and nginx are currently on the same box. Mongrels are all on a
> private network accessed through eth1 (public access is via eth0).

OK.

> stats page attached (backend "everything" is not currently in use;
> it'll be a use-when-full option for fast_mongrels once we upgrade to
> the next haproxy).

According to the stats, your avg output bandwidth is around 10 Mbps.
Would this match your external link ?

Regards,
Willy
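To see whether the listen-queue overflows keep happening after any
tuning, the relevant counters can simply be re-sampled; a rough sketch
(the exact netstat wording varies a little between distributions, and
the arithmetic just restates the estimate above):

    # These counters should stop climbing once the backlog is large enough:
    netstat -s | grep -iE 'listen|syn.?cookie'

    # Sizing logic behind the advice:
    #   300 sessions/s * 3 s to complete = ~900 pending connections,
    #   which is already close to the default backlog of 1024.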
Re: measuring haproxy performance impact
On Fri, Mar 06, 2009 at 11:23:02AM -0800, Michael Fortson wrote:
> On Fri, Mar 6, 2009 at 8:43 AM, Willy Tarreau wrote:
> > Hi Michael,
> >
> > On Thu, Mar 05, 2009 at 01:04:06PM -0800, Michael Fortson wrote:
> >> I'm trying to understand why our proxied requests have a much greater
> >> chance of significant delay than non-proxied requests.
> >>
> >> The server is an 8-core (dual quad) Intel machine. Making requests
> >> directly to the nginx backend is just far more reliable. Here's a
> >> shell script output that just continuously requests a blank 0k image
> >> file from nginx directly on its own port, and spits out a timestamp if
> >> the delay isn't 0 or 1 seconds:
> >>
> >> Thu Mar 5 12:36:17 PST 2009
> >> beginning continuous test of nginx port 8080
> >> Thu Mar 5 12:38:06 PST 2009
> >> Nginx Time is 2 seconds
> >>
> >> Here's the same test running through haproxy, simultaneously:
> >>
> >> Thu Mar 5 12:36:27 PST 2009
> >> beginning continuous test of haproxy port 80
> >> Thu Mar 5 12:39:39 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:39:48 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:39:55 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:40:03 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:40:45 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:40:48 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:40:55 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:40:58 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:41:55 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:42:01 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:42:08 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:42:29 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:42:38 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:43:05 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:43:15 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:08 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:25 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:30 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:33 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:39 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:46 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:44:54 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:45:07 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:45:16 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:45:45 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:45:54 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:45:58 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:05 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:08 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:32 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:48 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:53 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:46:58 PST 2009
> >> Nginx Time is 3 seconds
> >> Thu Mar 5 12:47:40 PST 2009
> >> Nginx Time is 3 seconds
> >
> > 3 seconds is typically a TCP retransmit. You have network losses somewhere
> > from/to your haproxy. Would you happen to be running on a gigabit port
> > connected to a 100 Mbps switch ? What type of NIC is this ? I've seen
> > many problems with broadcom netxtreme 2 (bnx2) caused by buggy firmwares,
> > but it seems to work fine for other people after a firmware upgrade.
> >
> >> My sanitized haproxy config is here (mongrel backend was omitted for
> >> brevity) :
> >> http://pastie.org/408729
> >>
> >> Are the ACLs just too expensive?
> >
> > Not at all. Especially in your case. To reach 3 seconds of latency, you
> > would need hundreds of thousands of ACLs, so this is clearly unrelated
> > to your config.
> >
> >> Nginx is running with 4 processes, and the box shows mostly idle.
> >
> > ... which indicates that you aren't burning CPU cycles processing ACLs ;-)
> >
> > It is also possible that some TCP settings are too low for your load, but
> > I don't know what your load is. Above a few hundreds-thousands of sessions
> > per second, you will need to do some tuning, otherwise you can end up with
> > similar situations.
> >
> > Regards,
> > Willy
>
> Hmm. I think it is gigabit connected to 100 Mb (all Dell rack-mount
> servers and switches).

OK so then please check with ethtool if your port is running in half
or full duplex :

  # ethtool eth0

Most often, 100 Mbps switches are forced to 100-full without autoneg,
and gig ports in front of them see them as half, thinking they are hubs.

> The nginx backend runs on the same machine as
> haproxy and is referenced via 127.0.0.1 -- does that still involve a
> real network port? Should I try the test all on localhost to isolate
> it from any networking retransmits?

Yes if you can do that, that would be nice. If the issue persists,
we'll have to check the network stack tuning, but that's gett
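As a concrete illustration of that duplex check (the interface name and
forced settings are examples only; the right fix depends on what the
switch port is actually configured to do):

    # Check negotiated speed/duplex on the server side of the link:
    ethtool eth0

    # If the switch port is hard-coded to 100-full, either re-enable
    # autoneg on the switch, or force the NIC to match:
    ethtool -s eth0 speed 100 duplex full autoneg off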
Re: measuring haproxy performance impact
On Fri, Mar 6, 2009 at 8:43 AM, Willy Tarreau wrote:
> Hi Michael,
>
> On Thu, Mar 05, 2009 at 01:04:06PM -0800, Michael Fortson wrote:
>> I'm trying to understand why our proxied requests have a much greater
>> chance of significant delay than non-proxied requests.
>>
>> The server is an 8-core (dual quad) Intel machine. Making requests
>> directly to the nginx backend is just far more reliable. Here's a
>> shell script output that just continuously requests a blank 0k image
>> file from nginx directly on its own port, and spits out a timestamp if
>> the delay isn't 0 or 1 seconds:
>>
>> Thu Mar 5 12:36:17 PST 2009
>> beginning continuous test of nginx port 8080
>> Thu Mar 5 12:38:06 PST 2009
>> Nginx Time is 2 seconds
>>
>> Here's the same test running through haproxy, simultaneously:
>>
>> Thu Mar 5 12:36:27 PST 2009
>> beginning continuous test of haproxy port 80
>> Thu Mar 5 12:39:39 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:39:48 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:39:55 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:40:03 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:40:45 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:40:48 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:40:55 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:40:58 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:41:55 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:42:01 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:42:08 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:42:29 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:42:38 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:43:05 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:43:15 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:08 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:25 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:30 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:33 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:39 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:46 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:44:54 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:45:07 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:45:16 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:45:45 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:45:54 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:45:58 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:05 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:08 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:32 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:48 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:53 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:46:58 PST 2009
>> Nginx Time is 3 seconds
>> Thu Mar 5 12:47:40 PST 2009
>> Nginx Time is 3 seconds
>
> 3 seconds is typically a TCP retransmit. You have network losses somewhere
> from/to your haproxy. Would you happen to be running on a gigabit port
> connected to a 100 Mbps switch ? What type of NIC is this ? I've seen
> many problems with broadcom netxtreme 2 (bnx2) caused by buggy firmwares,
> but it seems to work fine for other people after a firmware upgrade.
>
>> My sanitized haproxy config is here (mongrel backend was omitted for
>> brevity) :
>> http://pastie.org/408729
>>
>> Are the ACLs just too expensive?
>
> Not at all. Especially in your case. To reach 3 seconds of latency, you would
> need hundreds of thousands of ACLs, so this is clearly unrelated to your
> config.
>
>> Nginx is running with 4 processes, and the box shows mostly idle.
>
> ... which indicates that you aren't burning CPU cycles processing ACLs ;-)
>
> It is also possible that some TCP settings are too low for your load, but
> I don't know what your load is. Above a few hundreds-thousands of sessions
> per second, you will need to do some tuning, otherwise you can end up with
> similar situations.
>
> Regards,
> Willy

Hmm. I think it is gigabit connected to 100 Mb (all Dell rack-mount
servers and switches). The nginx backend runs on the same machine as
haproxy and is referenced via 127.0.0.1 -- does that still involve a
real network port? Should I try the test all on localhost to isolate
it from any networking retransmits?

Here's a peek at the stats page after about a day of running (this
should help demonstrate current loading)
http://pastie.org/409632
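If you do try the loopback-only comparison, something like the
following keeps the physical network out of the picture entirely (the
image path is a placeholder for the real 0k file; the ports are the
ones mentioned in the thread):

    # Direct to nginx vs. through haproxy, both over 127.0.0.1:
    curl -s -o /dev/null -w 'direct  %{time_total}s\n' http://127.0.0.1:8080/blank.gif
    curl -s -o /dev/null -w 'haproxy %{time_total}s\n' http://127.0.0.1:80/blank.gif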
Re: measuring haproxy performance impact
Hi Michael,

On Thu, Mar 05, 2009 at 01:04:06PM -0800, Michael Fortson wrote:
> I'm trying to understand why our proxied requests have a much greater
> chance of significant delay than non-proxied requests.
>
> The server is an 8-core (dual quad) Intel machine. Making requests
> directly to the nginx backend is just far more reliable. Here's a
> shell script output that just continuously requests a blank 0k image
> file from nginx directly on its own port, and spits out a timestamp if
> the delay isn't 0 or 1 seconds:
>
> Thu Mar 5 12:36:17 PST 2009
> beginning continuous test of nginx port 8080
> Thu Mar 5 12:38:06 PST 2009
> Nginx Time is 2 seconds
>
> Here's the same test running through haproxy, simultaneously:
>
> Thu Mar 5 12:36:27 PST 2009
> beginning continuous test of haproxy port 80
> Thu Mar 5 12:39:39 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:39:48 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:39:55 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:40:03 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:40:45 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:40:48 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:40:55 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:40:58 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:41:55 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:42:01 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:42:08 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:42:29 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:42:38 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:43:05 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:43:15 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:44:08 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:44:25 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:44:30 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:44:33 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:44:39 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:44:46 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:44:54 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:45:07 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:45:16 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:45:45 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:45:54 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:45:58 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:46:05 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:46:08 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:46:32 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:46:48 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:46:53 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:46:58 PST 2009
> Nginx Time is 3 seconds
> Thu Mar 5 12:47:40 PST 2009
> Nginx Time is 3 seconds

3 seconds is typically a TCP retransmit. You have network losses somewhere
from/to your haproxy. Would you happen to be running on a gigabit port
connected to a 100 Mbps switch ? What type of NIC is this ? I've seen
many problems with broadcom netxtreme 2 (bnx2) caused by buggy firmwares,
but it seems to work fine for other people after a firmware upgrade.

> My sanitized haproxy config is here (mongrel backend was omitted for
> brevity) :
> http://pastie.org/408729
>
> Are the ACLs just too expensive?

Not at all. Especially in your case. To reach 3 seconds of latency, you would
need hundreds of thousands of ACLs, so this is clearly unrelated to your
config.

> Nginx is running with 4 processes, and the box shows mostly idle.

... which indicates that you aren't burning CPU cycles processing ACLs ;-)

It is also possible that some TCP settings are too low for your load, but
I don't know what your load is. Above a few hundreds-thousands of sessions
per second, you will need to do some tuning, otherwise you can end up with
similar situations.

Regards,
Willy
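For anyone wanting to reproduce the measurement, here is a sketch of
what such a probe script might look like; the URL, port and message
wording are guesses, since the original script was not posted:

    #!/bin/sh
    # Fetch a blank image in a loop and log a timestamp whenever a
    # request takes more than one second (URL/port are placeholders).
    URL="http://127.0.0.1:80/blank.gif"
    date
    echo "beginning continuous test of haproxy port 80"
    while true; do
        START=$(date +%s)
        curl -s -o /dev/null "$URL"
        ELAPSED=$(( $(date +%s) - START ))
        if [ "$ELAPSED" -gt 1 ]; then
            date
            echo "Nginx Time is $ELAPSED seconds"
        fi
        sleep 1
    done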