Re: Theoretical limits for a HAProxy instance
Hi,

On Mon, 2022-12-12 at 09:47 +0100, Iago Alonso wrote:

Can you share haproxy -vv output ?

> HAProxy config:
> global
>     log /dev/log len 65535 local0 warning
>     chroot /var/lib/haproxy
>     stats socket /run/haproxy-admin.sock mode 660 level admin
>     user haproxy
>     group haproxy
>     daemon
>     maxconn 200
>     maxconnrate 2500
>     maxsslrate 2500

From your graphs (haproxy_process_current_ssl_rate /
haproxy_process_current_connection_rate) you might hit
maxconnrate/maxsslrate

-Jarno

--
Jarno Huuskonen
Re: Theoretical limits for a HAProxy instance
Hi,

We do hit our defined max ssl/conn rates, but given the spare resources, we
don't expect to suddenly return 5xx. Here's the output of `haproxy -vv` (I've
also added it to the post on discourse):

HAProxy version 2.6.6-274d1a4 2022/09/22 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2027.
Known bugs: http://www.haproxy.org/bugs/bugs-2.6.6.html
Running on: Linux 5.15.0-53-generic #59-Ubuntu SMP Mon Oct 17 18:53:30 UTC 2022 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1 USE_SYSTEMD=1 USE_PROMEX=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : +EPOLL -KQUEUE +NETFILTER +PCRE -PCRE_JIT -PCRE2 -PCRE2_JIT +POLL +THREAD +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -ENGINE +GETADDRINFO +OPENSSL -LUA +ACCEPT4 -CLOSEFROM +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT -QUIC +PROMEX -MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=32).
Built with OpenSSL version : OpenSSL 3.0.7 1 Nov 2022
Running on OpenSSL version : OpenSSL 3.0.7 1 Nov 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
OpenSSL providers loaded : default
Built with the Prometheus exporter as a service
Built with network namespace support.
Support for malloc_trim() is enabled.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.39 2016-06-14
Running on PCRE version : 8.39 2016-06-14
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 11.3.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter

Available filters :
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

On Mon, Dec 12, 2022 at 2:50 PM Jarno Huuskonen wrote:
>
> Hi,
>
> On Mon, 2022-12-12 at 09:47 +0100, Iago Alonso wrote:
>
> Can you share haproxy -vv output ?
> > HAProxy config:
> > global
> >     log /dev/log len 65535 local0 warning
> >     chroot /var/lib/haproxy
> >     stats socket /run/haproxy-admin.sock mode 660 level admin
> >     user haproxy
> >     group haproxy
> >     daemon
> >     maxconn 200
> >     maxconnrate 2500
> >     maxsslrate 2500
>
> From your graphs (haproxy_process_current_ssl_rate /
> haproxy_process_current_connection_rate) you might hit
> maxconnrate/maxsslrate
>
> -Jarno
>
> --
> Jarno Huuskonen
Re: Theoretical limits for a HAProxy instance
Hi,

Have you tried increasing the number of processes/threads? I don't see any
nbthread or nbproc in your config.

Check out https://www.haproxy.com/blog/multithreading-in-haproxy/

BR.,
Emerson

On Mon, Dec 12, 2022 at 02:49, Iago Alonso wrote:
> Hello,
>
> We are performing a lot of load tests, and we hit what we think is an
> artificial limit of some sort, or a parameter that we are not taking
> into account (HAProxy config setting, kernel parameter…). We are
> wondering if there's a known limit on what HAProxy is able to process,
> or if someone has experienced something similar, as we are thinking
> about moving to bigger servers, and we don't know if we will observe a
> big difference.
>
> When trying to perform the load test in production, we observe that we
> can sustain 200k connections, and 10k rps, with a load1 of about 10.
> The maxsslrate and maxsslconn are maxed out, but we handle the
> requests fine, and we don't return 5xx. Once we increase the load just
> a bit and hit 11k rps and about 205k connections, we start to return
> 5xx and we rapidly decrease the load, as these are tests against
> production.
>
> Production server specs:
> CPU: AMD Ryzen 7 3700X 8-Core Processor (16 threads)
> RAM: DDR4 64GB (2666 MT/s)
>
> When trying to perform a load test with synthetic tests using k6 as
> our load generator against staging, we are able to sustain 750k
> connections, with 20k rps. The load generator has a ramp-up time of
> 120s to achieve the 750k connections, as that's what we are trying to
> benchmark.
>
> Staging server specs:
> CPU: AMD Ryzen 5 3600 6-Core Processor (12 threads)
> RAM: DDR4 64GB (3200 MT/s)
>
> I've made a post about this on discourse, and I got the suggestion to
> post here. In said post, I've included screenshots of some of our
> Prometheus metrics.
> https://discourse.haproxy.org/t/theoretical-limits-for-a-haproxy-instance/8168
>
> Custom kernel parameters:
> net.ipv4.ip_local_port_range = "12768 60999"
> net.nf_conntrack_max = 5000000
> fs.nr_open = 500
>
> HAProxy config:
> global
>     log /dev/log len 65535 local0 warning
>     chroot /var/lib/haproxy
>     stats socket /run/haproxy-admin.sock mode 660 level admin
>     user haproxy
>     group haproxy
>     daemon
>     maxconn 200
>     maxconnrate 2500
>     maxsslrate 2500
>
> defaults
>     log global
>     option dontlognull
>     timeout connect 10s
>     timeout client 120s
>     timeout server 120s
>
> frontend stats
>     mode http
>     bind *:8404
>     http-request use-service prometheus-exporter if { path /metrics }
>     stats enable
>     stats uri /stats
>     stats refresh 10s
>
> frontend k8s-api
>     bind *:6443
>     mode tcp
>     option tcplog
>     timeout client 300s
>     default_backend k8s-api
>
> backend k8s-api
>     mode tcp
>     option tcp-check
>     timeout server 300s
>     balance leastconn
>     default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 500 maxqueue 256 weight 100
>     server master01 x.x.x.x:6443 check
>     server master02 x.x.x.x:6443 check
>     server master03 x.x.x.x:6443 check
>     retries 0
>
> frontend k8s-server
>     bind *:80
>     mode http
>     http-request add-header X-Forwarded-Proto http
>     http-request add-header X-Forwarded-Port 80
>     default_backend k8s-server
>
> backend k8s-server
>     mode http
>     balance leastconn
>     option forwardfor
>     default-server inter 10s downinter 5s rise 2 fall 2 check
>     server worker01a x.x.x.x:31551 maxconn 20
>     server worker02a x.x.x.x:31551 maxconn 20
>     server worker03a x.x.x.x:31551 maxconn 20
>     server worker04a x.x.x.x:31551 maxconn 20
>     server worker05a x.x.x.x:31551 maxconn 20
>     server worker06a x.x.x.x:31551 maxconn 20
>     server worker07a x.x.x.x:31551 maxconn 20
>     server worker08a x.x.x.x:31551 maxconn 20
>     server worker09a x.x.x.x:31551 maxconn 20
>     server worker10a x.x.x.x:31551 maxconn 20
>     server worker11a x.x.x.x:31551 maxconn 20
>     server worker12a x.x.x.x:31551 maxconn 20
>     server worker13a x.x.x.x:31551 maxconn 20
>     server worker14a x.x.x.x:31551 maxconn 20
>     server worker15a x.x.x.x:31551 maxconn 20
>     server worker16a x.x.x.x:31551 maxconn 20
>     server worker17a x.x.x.x:31551 maxconn 20
>     server worker18a x.x.x.x:31551 maxconn 20
>     server worker19a x.x.x.x:31551 maxconn 20
>     server worker20a x.x.x.x:31551 maxconn 20
>     server worker01an x.x.x.x:31551 maxconn 20
>     server worker02an x.x.x.x:31551 maxconn 20
>     server worker03an x.x.x.x:31551 maxconn 20
>     retries 0
>
> frontend k8s-server-https
>     bind *:443 ssl crt /etc/haproxy/certs/
>     mode http
>     http-request add-header X-Forwarded-Proto https
>     http-request add-header X-Forwarded-Port 443
>     http-request del-header X-SERVER-SNI
>     http-request set-header X-SERVER-SNI %[ssl_
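As a side note, the tuned limits listed above can be checked at runtime with standard Linux tooling; a minimal sketch (the net.netfilter.* keys are the canonical names for the nf_conntrack settings quoted in the message):

    # conntrack table ceiling and current usage
    sysctl net.netfilter.nf_conntrack_max net.netfilter.nf_conntrack_count
    # file-descriptor ceiling and ephemeral port range
    sysctl fs.nr_open fs.file-max net.ipv4.ip_local_port_range
    # quick snapshot of socket usage while the load test runs
    ss -s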
Re: Theoretical limits for a HAProxy instance
Hi,

On Tue, Dec 13, 2022 at 03:33:58PM +0100, Iago Alonso wrote:
> Hi,
>
> We do hit our defined max ssl/conn rates, but given the spare
> resources, we don't expect to suddenly return 5xx.

What bothers me is that once this limit is reached there's no more
connection accepted by haproxy so you should indeed not see such errors.
What element produces these errors ? Since you've enabled logging, can
you check them in your logs and figure whether they're sent by the server,
by haproxy or none (it might well be the load generator translating
connection timeouts to 5xx for user reporting).

If the errors are produced by haproxy, then their exact value and the
termination flags are very important as they'll indicate where the
problem is.

Another thing you'll observe in your logs are the server's response time.
It's possible that your request rate is getting very close to the server's
limits. Among other limitations the packet rate and network bandwidth
might represent a limit. For example, let's say you're requesting 10kB
objects. At 10k/s it's just less than a gigabit/s, at 11k/s it doesn't
pass anymore. When you look at the haproxy stats page, the cumulated live
network bandwidth is reported at the top, it might give you an indication
of a possible limit.

But in any case, stats on the status codes and termination codes in logs
would be extremely useful. Note that you can do this natively using halog
(I seem to remember it's halog -st and halog -tc, but please double-check,
and in any case, cut|sort|uniq -c always works).

Regards,
Willy
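A minimal sketch of the log analysis described above, assuming the logs end up in /var/log/haproxy.log (the halog flags are the ones cited in the message and are worth double-checking against `halog -h` for your version):

    # per-status-code and per-termination-code counts with halog
    halog -st < /var/log/haproxy.log
    halog -tc < /var/log/haproxy.log
    # generic fallback; adjust the field number to wherever the status
    # code sits in your log-format
    awk '{print $11}' /var/log/haproxy.log | sort | uniq -c | sort -rn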
Re: Theoretical limits for a HAProxy instance
Hi,

We are letting HAProxy pick what it thinks is best for our hardware. It
seems to always pick `nbproc = 1` (we only have one cpu socket), and
`nbthread = $(nproc)`.

On 14/12/22 3:15, Emerson Gomes wrote:
> Hi,
>
> Have you tried increasing the number of processes/threads? I don't see
> any nbthread or nbproc in your config.
>
> Check out https://www.haproxy.com/blog/multithreading-in-haproxy/
>
> BR.,
> Emerson
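For reference, a minimal sketch of what pinning the threading explicitly would look like, along the lines of Emerson's suggestion (nbthread and cpu-map are standard global directives; the values below only illustrate a 12-thread machine and are not taken from the original config):

    global
        # one process, 12 threads, each thread bound to one CPU
        nbthread 12
        cpu-map auto:1/1-12 0-11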
Re: Theoretical limits for a HAProxy instance
Hi,

We are not sure what element produces the errors; in the haproxy logs we
don't see them. What happens to the new connections when we hit the limit?
What server resource should be affected by it, if any?

We have our logs at `warning` level so we do not see the response time in
the haproxy logs. In the `haproxy_backend_response_time_average_seconds`
metric we see "normal" values (20ms average and some spikes at less than
80ms). We see a spike in the `haproxy_backend_connect_time_average_seconds`,
as well as in the `haproxy_server_connect_time_average_seconds` metrics when
the errors start.

We have more haproxies (that we were not stressing in the test) in front of
the same backends, serving without issues. We are using https to communicate
with the backends too, and we see that the task consuming the majority of
the resources is `ssl_sock_io_cb`.

Our traffic is not request rate intensive, but we do have a high amount of
concurrent connections. For example, at 200k connections, we have 10k rps,
and at 300k connections, we have 14k rps. In our tests, we never saturate
our network bandwidth; we hit at most 50% of the available bandwidth for
the server.

Regards.
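A minimal sketch of how per-task CPU usage such as `ssl_sock_io_cb` can be inspected through the runtime API, assuming the stats socket from the config above and a build where task profiling is available:

    # enable task profiling, then dump the per-task counters
    echo "set profiling tasks on" | socat stdio /run/haproxy-admin.sock
    echo "show profiling"         | socat stdio /run/haproxy-admin.sock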
Re: Theoretical limits for a HAProxy instance
Also, our connections are renewed constantly, with an ssl rate of about
3000 per second, same as our connection rate.
Re: Theoretical limits for a HAProxy instance
On Wed, Dec 14, 2022 at 12:50:10PM +0100, Iago Alonso wrote:
> Hi,
>
> We are not sure what element produces the errors; in the haproxy logs we
> don't see them.

Then they don't pass through haproxy nor do they come from haproxy.

> What happens to the new connections when we hit the limit?

If you're speaking about the maxconnrate/maxsessrate, the connection
acceptance is simply delayed until the rate becomes lower again.

> What server resource should be affected by it, if any?

None at all, actually the connection remains in the kernel's accept
queue. However this can have as a side effect that once too many are
queued, new ones are just rejected once the kernel queue is full, and
maybe the client translates this into a 5xx (this wouldn't shock me
as long as there is a way for the user to figure it out).

> We have our logs at `warning` level so we do not see the response time
> in the haproxy logs.

Ah, that's an important point. So in fact you almost don't have logs,
and maybe you're really having 5xx that you don't log. You could
perhaps just leave them at "info" level and add "option dontlog-normal"
that will eliminate logs for successful requests and only log the failed
ones.

> In the `haproxy_backend_response_time_average_seconds` we
> see "normal" values (20ms average and some spikes at less than 80ms).

OK, that's good.

> We see a spike in the `haproxy_backend_connect_time_average_seconds`, as
> well as in the `haproxy_server_connect_time_average_seconds` metrics when
> the errors start.

Interesting, then this could be related to either a conntrack filling on
one element in the chain (including both extremities), or a shortage of
source ports when dealing with too many connections. The cost of finding
a source port can sometimes become extreme in the system, I've personally
seen connect() taking up to 50 ms to succeed! In this case, just increasing
the source port range can be enough, just like explicitly setting source
port ranges in haproxy (one per "server" line). In this last case, it saves
the system from having to find one port, but you must be certain not to
collide with listening ports.

> We have more haproxies (that we were not stressing in the
> test) in front of the same backends, serving without issues. We are using
> https to communicate with the backends too, and we see that the task
> consuming the majority of the resources is `ssl_sock_io_cb`.

Ah that's pretty useful :-) It's very likely dealing with the handshake.
Could you please run "perf top" on this machine and list the 10 top-most
lines ? I'm interested in seeing if you're saturating on crypto functions
or locking functions (e.g. "native_queued_spin_lock_slowpath"), that would
indicate an abuse of mutexes.

By the way, are you running with OpenSSL 3.0 ? That one is absolutely
terrible and makes extreme abuse of mutexes and locks, to the point that
certain workloads were divided by 2-digit numbers between 1.1.1 and 3.0.
It took me one day to figure out that my load generator which was capping
at 400 conn/s was in fact suffering from an accidental build using 3.0
while in 1.1.1 the perf went back to 75000/s!

> Our traffic is not request rate intensive, but we do have a high amount of
> concurrent connections. For example, at 200k connections, we have 10k rps,
> and at 300k connections, we have 14k rps.

Are your connections kept alive long or regularly closed ? In the stats
page when you hover on the "Sessions/Total" line of a backend or server,
there is this (for example):

  Cum. sessions:      1253716
  New connections:    84396
  Reused connections: 1169320 (93%)

It says that for 1.25M requests, 84k connections were needed and that the
rest (93%) could reuse an existing connection. If your reuse rate is too
low, you're indeed dealing with new connections that can take some time
over SSL.

> In our tests, we never saturate our network bandwidth; we hit at most
> 50% of the available bandwidth for the server.

OK fine. It's perfectly normal not to saturate the network bandwidth using
small requests, as the cost of parsing & handling a request is orders of
magnitude higher than the cost of processing the equivalent packet. My
concern was that I had no idea about the response size, so it could have
been possible that a link got filled.

Regards,
Willy
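A minimal sketch of the two suggestions above, with placeholder values (the directives are standard HAProxy configuration; the source address and port range are illustrative only and must not collide with listening ports):

    global
        # keep info-level logging so failed requests are visible
        log /dev/log len 65535 local0 info

    defaults
        log global
        # drop log lines for successful requests, keep the failures
        option dontlog-normal

    backend k8s-server
        # explicit source port range per server, so the kernel does not
        # have to search for a free source port on every connect()
        server worker01a x.x.x.x:31551 maxconn 20 source 0.0.0.0:10000-60000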
Re: Theoretical limits for a HAProxy instance
On Wed, Dec 14, 2022 at 02:04:44PM +0100, Iago Alonso wrote:
> Also, our connections are renewed constantly, with an ssl rate of about
> 3000 per second, same as our connection rate.

Very useful, that answers one of my previous questions. Then please check
"perf top", it will be very instructive I think.

Willy
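A minimal sketch of the profiling command being requested, limited to the haproxy process (plain perf, no extra options assumed):

    # live view of the hottest user-space and kernel symbols in haproxy
    perf top -p "$(pgrep -d, haproxy)"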
Re: Theoretical limits for a HAProxy instance
Hi,

> Ah that's pretty useful :-) It's very likely dealing with the handshake.
> Could you please run "perf top" on this machine and list the 10 top-most
> lines ? I'm interested in seeing if you're saturating on crypto functions
> or locking functions (e.g. "native_queued_spin_lock_slowpath"), that would
> indicate an abuse of mutexes.

We are load testing with k6 (https://k6.io/), in a server with an
AMD Ryzen 5 3600 6-Core Processor (12 threads). We ran a test with 300k
connections, 15k rps, and a sslrate and connrate of about 4k. The server
had a `load1` of 12, and the `node_nf_conntrack_entries` were about 450k.
The `haproxy_server_connect_time_average_seconds` was between 0.5s and 2s,
and the `haproxy_backend_connect_time_average_seconds` was always about
0.2s lower.

We have these custom kernel parameters set:
net.nf_conntrack_max=5000000
fs.nr_open=500
net.ipv4.ip_local_port_range="12768 60999"

At that load, we were starting to see just a small number of 5xx, and in
the logs we could see that they were caused by connect timeouts:

XXX.XXX.XXX.XXX:15636 https-front~ https-back/redacted.worker 261/0/-1/-1/10260 503 0 - - sC-- 419484/419480/104950/6557/0 0/0 "GET / HTTP/1.1" 0//0/0/0 TLSv1.3/TLS_AES_256_GCM_SHA384

This is the output from `perf top`, after 40M samples, during the test:

  17.38%  libc.so.6  [.] pthread_rwlock_rdlock
  10.25%  libc.so.6  [.] pthread_rwlock_unlock
   2.32%  haproxy    [.] sha512_block_data_order_avx2
   1.64%  libc.so.6  [.] malloc
   1.62%  libc.so.6  [.] cfree
   1.51%  haproxy    [.] fe_mul
   1.43%  libc.so.6  [.] 0x000a4620
   1.29%  haproxy    [.] OPENSSL_LH_retrieve
   1.16%  haproxy    [.] x25519_fe64_sqr
   0.80%  haproxy    [.] bn_sqrx8x_internal
   0.75%  haproxy    [.] ossl_sa_doall_arg
   0.70%  haproxy    [.] ossl_lh_strcasehash
   0.67%  [kernel]   [k] __nf_conntrack_find_get
   0.67%  haproxy    [.] OPENSSL_cleanse
   0.66%  haproxy    [.] do_name
   0.56%  haproxy    [.] OPENSSL_LH_doall_arg
   0.53%  haproxy    [.] x25519_fe64_mul
   0.51%  haproxy    [.] fe_sq

> By the way, are you running with OpenSSL 3.0 ? That one is absolutely
> terrible and makes extreme abuse of mutexes and locks, to the point that
> certain workloads were divided by 2-digit numbers between 1.1.1 and 3.0.
> It took me one day to figure out that my load generator which was capping
> at 400 conn/s was in fact suffering from an accidental build using 3.0
> while in 1.1.1 the perf went back to 75000/s!

Yes, we are running OpenSSL 3.0.7. We recently upgraded to Ubuntu 20.04,
and we tested all the possible combinations between HAProxy 2.5.3 and 2.6.6,
compiled with OpenSSL 1.1.1f and 3.0.7, and we didn't see a noticeable
performance hit running tests similar to the one mentioned above. We are
planning to try again (next week), just stressing the sslrate, so we can
assess the performance hit a bit more.
Re: Theoretical limits for a HAProxy instance
On Fri, Dec 16, 2022 at 05:42:50PM +0100, Iago Alonso wrote:
> Hi,
>
> We are load testing with k6 (https://k6.io/), in a server with an
> AMD Ryzen 5 3600 6-Core Processor (12 threads). We ran a test with 300k
> connections, 15k rps, and a sslrate and connrate of about 4k. The server
> had a `load1` of 12, and the `node_nf_conntrack_entries` were about 450k.
> The `haproxy_server_connect_time_average_seconds` was between 0.5s and 2s,
> and the `haproxy_backend_connect_time_average_seconds` was always about
> 0.2s lower.
>
> We have these custom kernel parameters set:
> net.nf_conntrack_max=5000000
> fs.nr_open=500
> net.ipv4.ip_local_port_range="12768 60999"

Interesting, so you have conntrack enabled. With 5M conns, there's no
reason to fill your table. However, have you checked your servers' kernel
logs to see if you find any "conntrack table full" message, that might
be caused by too fast connection recycling ?

> At that load, we were starting to see just a small number of 5xx, and
> in the logs we could see that they were caused by connect timeouts:
> XXX.XXX.XXX.XXX:15636 https-front~ https-back/redacted.worker
> 261/0/-1/-1/10260 503 0 - - sC-- 419484/419480/104950/6557/0 0/0 "GET
> / HTTP/1.1" 0//0/0/0 TLSv1.3/TLS_AES_256_GCM_SHA384

Hmmm not good. Check if you have any "-m state --state INVALID -j DROP"
rule in your iptables whose counter increases during the tests. It might
be possible that certain outgoing connections are rejected due to fast
port recycling if they were closed by an RST that was lost before hitting
conntrack. Another possibility here is that the TCP connection was
established but that the SSL handshake didn't complete in time. This could
be caused precisely by the openssl 3 locking issues I was speaking about,
but I guess not at such numbers, and you'd very likely see the
native_queued_spin_lock_slowpath thing in perf top.

> This is the output from `perf top`, after 40M samples, during the test:
>
>   17.38%  libc.so.6  [.] pthread_rwlock_rdlock
>   10.25%  libc.so.6  [.] pthread_rwlock_unlock

With that said, excessive use of rwlock indicates that openssl is very
likely there, because that's about the only caller in the process. To
counter this abuse, we've developed a dirty counter-measure that consists
in providing our own rwlock implementation as a replacement for the
pthread one. That's really not beautiful but we have to deal with what we
have. You can enable it in 2.7 by adding USE_PTHREAD_EMULATION=1 to your
"make" command line. These locks are way less expensive but they will
never sleep (no syscall) so instead of idling when the machine is
congested, you'll see pl_wait_unlock_long() eating CPU in "perf top"
output. In our tests that roughly doubles the performance under high
congestion but it's hard to be that specific. In fact it mostly gets us
our lock callback back, after they were removed in 1.1.0.

> Yes, we are running OpenSSL 3.0.7. We recently upgraded to Ubuntu 20.04,
> and we tested all the possible combinations between HAProxy 2.5.3 and
> 2.6.6, compiled with OpenSSL 1.1.1f and 3.0.7, and we didn't see a
> noticeable performance hit running tests similar to the one mentioned
> above. We are planning to try again (next week), just stressing the
> sslrate, so we can assess the performance hit a bit more.

OK. Please give 2.7.x a try (we hope to have 2.7.1 ready soon with a few
fixes, though these should not impact your tests and 2.7.0 will be OK).
And please check whether USE_PTHREAD_EMULATION=1 improves your situation.
If so it will definitely confirm that you're endlessly waiting for some
threads stuck in SSL waiting on a mutex.

I would really like that we distinguish L4 from L6 in logs output to
indicate whether L4 was established and SSL not, or if it's blocking at
L4. The tricky part is that it requires adding new values for the
termination flags, and everyone got used a lot to the current ones...
But I think we'll have to do it anyway, at least to sort out such issues.

Regards,
Willy
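A minimal sketch of the checks and the rebuild described above (the make options other than USE_PTHREAD_EMULATION=1 are taken from the build options shown in the haproxy -vv output earlier in the thread):

    # look for conntrack pressure and INVALID-state drops during a test
    dmesg | grep -i conntrack
    iptables -vnL | grep -i invalid

    # rebuild 2.7 with the pthread rwlock emulation enabled
    make -j"$(nproc)" TARGET=linux-glibc USE_OPENSSL=1 USE_ZLIB=1 USE_PCRE=1 \
         USE_SYSTEMD=1 USE_PROMEX=1 USE_PTHREAD_EMULATION=1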
Re: Theoretical limits for a HAProxy instance
> Interesting, so you have conntrack enabled. With 5M conns, there's no
> reason to fill your table. However, have you checked your servers' kernel
> logs to see if you find any "conntrack table full" message, that might
> be caused by too fast connection recycling ?

We can't see any message mentioning conntrack.

> With that said, excessive use of rwlock indicates that openssl is
> very likely there, because that's about the only caller in the process.

Indeed, we tried compiling HAProxy 2.6.6 against OpenSSL 1.1.1s and the
`perf top` results look much better. We can't recall why we decided to
use OpenSSL v3, but it seems like we have to stick to v1.

   2.61%  haproxy    [.] sha512_block_data_order_avx2
   1.89%  haproxy    [.] fe_mul
   1.64%  haproxy    [.] x25519_fe64_sqr
   1.44%  libc.so.6  [.] malloc
   1.39%  [kernel]   [k] __nf_conntrack_find_get
   1.21%  libc.so.6  [.] cfree
   0.95%  [kernel]   [k] native_queued_spin_lock_slowpath.part.0
   0.94%  haproxy    [.] OPENSSL_cleanse
   0.78%  libc.so.6  [.] 0x000a3b46
   0.78%  [kernel]   [k] do_epoll_ctl
   0.75%  [kernel]   [k] ___slab_alloc
   0.73%  [kernel]   [k] psi_group_change

For reference, with a 12 core machine, we ran a test with 420k connections,
32k rps, and a sustained ssl rate of 5k, and the load1 was about 9 and the
idle was about 30. And as a side note, we didn't have http/2 enabled; after
doing so, we ran the same test, and the idle was about 55 and the load1 was
about 6.

> OK. Please give 2.7.x a try (we hope to have 2.7.1 ready soon with a few
> fixes, though these should not impact your tests and 2.7.0 will be OK).
> And please check whether USE_PTHREAD_EMULATION=1 improves your situation.
> If so it will definitely confirm that you're endlessly waiting for some
> threads stuck in SSL waiting on a mutex.

We've just tried 2.7.1 with USE_PTHREAD_EMULATION=1 (OpenSSL 3.0.7), and
the results are pretty much the same, no noticeable improvement.

> Hmmm not good. Check if you have any "-m state --state INVALID -j DROP"
> rule in your iptables whose counter increases during the tests. It might
> be possible that certain outgoing connections are rejected due to fast
> port recycling if they were closed by an RST that was lost before hitting
> conntrack. Another possibility here is that the TCP connection was
> established but that the SSL handshake didn't complete in time. This
> could be caused precisely by the openssl 3 locking issues I was speaking
> about, but I guess not at such numbers, and you'd very likely see the
> native_queued_spin_lock_slowpath thing in perf top.

We do have a `-A ufw-before-input -m conntrack --ctstate INVALID -j DROP`
rule, but the counter does not increase during the tests. Looks like
OpenSSL v3 is the issue.
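For reference, a minimal sketch of the two changes reported in this message: building against a separately installed OpenSSL 1.1.1 (SSL_INC/SSL_LIB are standard haproxy Makefile variables; the /opt path is a placeholder) and advertising HTTP/2 on the TLS frontend via ALPN:

    # build against OpenSSL 1.1.1s installed under /opt/openssl-1.1.1s
    make -j"$(nproc)" TARGET=linux-glibc USE_OPENSSL=1 \
         SSL_INC=/opt/openssl-1.1.1s/include SSL_LIB=/opt/openssl-1.1.1s/lib

    # haproxy.cfg: enable h2 negotiation on the TLS bind line
    frontend k8s-server-https
        bind *:443 ssl crt /etc/haproxy/certs/ alpn h2,http/1.1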
Re: Theoretical limits for a HAProxy instance
We are happy to report that after downgrading to OpenSSL 1.1.1s (from
3.0.7), our performance problems are solved, and it now looks like HAProxy
scales linearly with the available resources.

For reference, in a synthetic load test with a request payload of 2k, and
a 32-core server (128GB RAM) with 10Gb bandwidth available, we are able to
sustain 1.83M conn, 82K rps, and an sslrate of 22k, with a load average of
about 31 (idle time percent was about 6).
Re: Theoretical limits for a HAProxy instance
Hi Iago,

On Tue, Jan 24, 2023 at 04:45:54PM +0100, Iago Alonso wrote:
> We are happy to report that after downgrading to OpenSSL 1.1.1s (from
> 3.0.7), our performance problems are solved, and it now looks like
> HAProxy scales linearly with the available resources.

Excellent, thanks for this nice feedback! And welcome to the constantly
growing list of users condemned to use 1.1.1 forever :-/

> For reference, in a synthetic load test with a request payload of 2k,
> and a 32-core server (128GB RAM) with 10Gb bandwidth available, we are
> able to sustain 1.83M conn, 82K rps, and an sslrate of 22k, with a load
> average of about 31 (idle time percent was about 6).

These look like excellent numbers, which should hopefully give you plenty
of headroom.

Thanks for your feedback!
Willy