Re: Sudden queueing to backends
Hi Sander,

On Tue, Mar 10, 2020 at 10:28:38AM +0100, Sander Klein wrote:
> Hi All,
>
> I'm looking into a strange issue I'm having, and I'm starting to think it
> is HAProxy related.
>
> I have a setup with HAProxy serving multiple frontends and multiple
> backends, which are Nginx servers with PHP-FPM. Sometimes, all of a sudden,
> the maxconn limit is hit and connections get queued to a backend server,
> and I don't have a clue why. The backend is not overloaded, no traffic is
> flowing, Nginx/PHP-FPM picks up other connections like the health checks
> from HAProxy or our monitoring server, PHP-FPM is not doing anything so
> there are no long-running processes, Nginx is doing nothing, but it does
> not receive any new connections from HAProxy. Sometimes this lasts for
> 1 second, but it can also last for as much as 30 seconds.
>
> It does not happen on all backend servers at once, just randomly on one
> server. So if I have defined a backend with 2 servers, it happens to only
> one at a time.

There could be two possible explanations for this.

The first one is that the backend server is sometimes slowing down, for
whatever reason, on the requests it is processing, resulting in maxconn being
reached on haproxy. This commonly happens on applications where one request is
much more expensive than most others. I've seen some sites use a dedicated
backend with a much lower maxconn for a search button, for example, because
they knew this search request could take multiple seconds; if enough of them
happen at the same time, the maxconn is reached and extra requests get queued
even though they could have been handled.

The other possibility is that some requests produce huge responses that take
a long time to be consumed by the clients. Until one of those responses
finishes being delivered, there are no more slots available on the server.

> I'm running HAProxy 2.0.13 on Debian Buster in a VM. I've tested with 'no
> option http-use-htx' and HAProxy 2.1.3, and I see the problem on both.
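For illustration, the dedicated-backend approach described above could be
sketched roughly like this (a minimal, hypothetical example; the backend
names, addresses, path and maxconn values are made up for this sketch and do
not come from the original thread):

```
# Route known-expensive requests to a backend whose server carries a
# deliberately low maxconn, so slow requests queue there instead of
# starving the main application's connection slots.
frontend www-in
    bind :80
    mode http
    acl is_search path_beg /search
    use_backend bk_search if is_search
    default_backend bk_app

backend bk_app
    mode http
    server app1 192.0.2.10:80 maxconn 100 check

backend bk_search
    mode http
    # only a couple of concurrent slow searches; extras wait in the queue
    timeout queue 10s
    server app1 192.0.2.10:80 maxconn 2 check
```

The point of the low maxconn is that the expensive requests compete only with
each other for their two slots, while ordinary traffic keeps flowing through
bk_app.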
> Backends are Nginx with PHP-FPM, only using HTTP/1.1 over port 80, also VMs.
>
> Today I disabled H2 on the frontends and now the problem seems to have
> disappeared. So it seems to be related to that part. But I'm not sure.
> How should I go on and debug this?

The best way to do it is to emit "show sess all", "show info", "show stat"
and "show fd" on the CLI when this happens. This will indicate whether there
are many connections still active to the server under trouble, and/or whether
there are many connections from a single source address, for example.

One thing that could happen with H2 is that it's trivial to send many requests
at once (100 by default), so if someone wants to have fun with your site once
they've found an expensive request, sending 100 of these doesn't take more
than a single TCP segment. Or, if you're serving heavy pages, a single H2
connection can request many objects at once, and if the network link to the
client is lossy, these can take a while to deliver over a single connection.
Since H2 is much faster than H1 on reliable networks, but much worse on lossy
networks, that could be an explanation.

Do you have lots of static files? If so, it might make sense to deliver them
from dedicated servers that are not subject to the very low maxconn. And if
such objects are small, you could also enable some caching to reduce the
number of connections to the servers when fetching them.

> The config looks a bit like this (very redacted and very, very much
> shortened): (...)

Looks pretty fine at first glance. I'm seeing "prefer-last-server"; it
*might* contribute to the problem if it's caused by sudden spikes, as it will
encourage requests from the same client to go to the same server, but that's
not necessarily the case.

Willy
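The CLI snapshot suggested above can be scripted so it is ready to fire the
moment queueing is observed. This is a sketch, assuming the admin-level stats
socket from the config later in this thread (/var/run/haproxy.stat) and that
socat is installed; adjust the socket path to your setup:

```shell
#!/bin/sh
# Dump a debugging snapshot from the HAProxy runtime CLI into one file.
# SOCK matches the "stats socket" line in the thread's config (assumption);
# each command's output is preceded by a "### <command> ###" header.
SOCK=/var/run/haproxy.stat
OUT=haproxy-snapshot-$(date +%s).txt

for cmd in "show sess all" "show info" "show stat" "show fd"; do
    printf '### %s ###\n' "$cmd" >> "$OUT"
    echo "$cmd" | socat stdio "unix-connect:$SOCK" >> "$OUT"
done

echo "snapshot written to $OUT"
```

Running it once while the queueing is in progress (and once while things are
healthy, for comparison) makes it much easier to spot the stuck sessions or
the single noisy client.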
Sudden queueing to backends
Hi All,

I'm looking into a strange issue I'm having, and I'm starting to think it is
HAProxy related.

I have a setup with HAProxy serving multiple frontends and multiple backends,
which are Nginx servers with PHP-FPM. Sometimes, all of a sudden, the maxconn
limit is hit and connections get queued to a backend server, and I don't have
a clue why. The backend is not overloaded, no traffic is flowing, Nginx/PHP-FPM
picks up other connections like the health checks from HAProxy or our
monitoring server, PHP-FPM is not doing anything so there are no long-running
processes, Nginx is doing nothing, but it does not receive any new connections
from HAProxy. Sometimes this lasts for 1 second, but it can also last for as
much as 30 seconds.

It does not happen on all backend servers at once, just randomly on one
server. So if I have defined a backend with 2 servers, it happens to only one
at a time.

I'm running HAProxy 2.0.13 on Debian Buster in a VM. I've tested with 'no
option http-use-htx' and HAProxy 2.1.3, and I see the problem on both.
Backends are Nginx with PHP-FPM, only using HTTP/1.1 over port 80, also VMs.

Today I disabled H2 on the frontends and now the problem seems to have
disappeared. So it seems to be related to that part. But I'm not sure. How
should I go on and debug this?
The config looks a bit like this (very redacted and very, very much
shortened):

global
    master-worker
    log /dev/log local0
    log /dev/log local1 notice
    daemon
    user haproxy
    group haproxy
    maxconn 32768
    spread-checks 3
    nbproc 1
    nbthread 4
    stats socket /var/run/haproxy.stat mode 666 level admin
    # ciphers generator (https://mozilla.github.io/server-side-tls/ssl-config-generator/)
    ssl-server-verify none
    ssl-default-bind-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
    ssl-default-bind-ciphersuites TLS_AES_256_GCM_SHA384:TLS_AES_128_GCM_SHA256:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-options no-tls-tickets ssl-min-ver TLSv1.2
    ssl-default-server-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
    ssl-default-server-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
    ssl-dh-param-file /etc/haproxy/ffdhe3072.pem

defaults
    log global
    timeout check 2s
    timeout client 60s
    timeout connect 10s
    timeout http-keep-alive 4s
    timeout http-request 15s
    timeout queue 30s
    timeout server 60s
    timeout tarpit 120s
    errorfile 400 /etc/haproxy/errors.loc/400.http
    errorfile 403 /etc/haproxy/errors.loc/403.http
    errorfile 500 /etc/haproxy/errors.loc/500.http
    errorfile 502 /etc/haproxy/errors.loc/502.http
    errorfile 503 /etc/haproxy/errors.loc/503.http
    errorfile 504 /etc/haproxy/errors.loc/504.http
    option http-use-htx

listen admin
    bind x.x.x.x:8080 ssl crt /some/path/ strict-sni alpn h2,http/1.1
    mode http
    stats enable
    stats uri /haproxy?stats
    stats auth username:password
    stats admin if TRUE
    stats refresh 5s

frontend cluster1-in
    # LB itself
    bind x.x.x.x:80 transparent
    bind aaa:aaa:aaa::a:80 transparent
    bind x.x.x.x:443 transparent ssl crt /some/path/
    bind aaa:aaa:aaa::a:443 transparent ssl crt /some/path/
    # Mass hosting VIP
    bind y.y.y.y:80 transparent
    bind aab:aab:aab::a:80 transparent
    bind y.y.y.y:443 transparent ssl crt /some/cert.pem crt /another/cert.pem crt /some/path/ strict-sni alpn h2,http/1.1
    bind aab:aab:aab::a:443 transparent ssl crt /some/cert.pem crt /another/cert.pem crt /some/path/ strict-sni alpn h2,http/1.1
    mode http
    maxconn 8192
    option httplog
    option dontlog-normal
    option http-ignore-probes
    option forwardfor
    capture request header Host len 64
    capture request header User-Agent len 16
    capture request header Content-Length len 10
    capture request header Referer len 256
    capture response header Content-Length len 10
    #
    # Some security stuff starts here
    #
    acl name src -f /some/file.txt
    http-request deny if name
    http-request del-header Proxy