Re: Theoretical limits for a HAProxy instance

2022-12-12 Thread Jarno Huuskonen
Hi,

On Mon, 2022-12-12 at 09:47 +0100, Iago Alonso wrote:
> 

Can you share haproxy -vv output ?

> HAProxy config:
> global
>     log /dev/log len 65535 local0 warning
>     chroot /var/lib/haproxy
>     stats socket /run/haproxy-admin.sock mode 660 level admin
>     user haproxy
>     group haproxy
>     daemon
>     maxconn 200
>     maxconnrate 2500
>     maxsslrate 2500

From your graphs (haproxy_process_current_ssl_rate /
haproxy_process_current_connection_rate) you might hit
maxconnrate/maxsslrate
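
If you want to cross-check that while the test runs, the live counters and
the configured limits are visible on the stats socket. A rough sketch,
assuming socat is installed and the /run/haproxy-admin.sock path from your
config:

  echo "show info" | socat stdio /run/haproxy-admin.sock \
    | grep -E 'ConnRate|SslRate'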

-Jarno

-- 
Jarno Huuskonen


Re: Theoretical limits for a HAProxy instance

2022-12-13 Thread Iago Alonso
Hi,

We do hit our defined max ssl/conn rates, but given the spare
resources, we don't expect to suddenly return 5xx.

Here's the output of `haproxy -vv` (I've also added it to the post on
discourse):

HAProxy version 2.6.6-274d1a4 2022/09/22 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2027.
Known bugs: http://www.haproxy.org/bugs/bugs-2.6.6.html
Running on: Linux 5.15.0-53-generic #59-Ubuntu SMP Mon Oct 17 18:53:30
UTC 2022 x86_64
Build options :
  TARGET  = linux-glibc
  CPU = generic
  CC  = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement
-Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2
-Wduplicated-cond -Wnull-dereference -fwrapv
-Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare
-Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers
-Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1 USE_SYSTEMD=1 USE_PROMEX=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : +EPOLL -KQUEUE +NETFILTER +PCRE -PCRE_JIT -PCRE2
-PCRE2_JIT +POLL +THREAD +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY
+LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -ENGINE +GETADDRINFO
+OPENSSL -LUA +ACCEPT4 -CLOSEFROM +ZLIB -SLZ +CPU_AFFINITY +TFO +NS
+DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER
+PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT -QUIC +PROMEX
-MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=32).
Built with OpenSSL version : OpenSSL 3.0.7 1 Nov 2022
Running on OpenSSL version : OpenSSL 3.0.7 1 Nov 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
OpenSSL providers loaded : default
Built with the Prometheus exporter as a service
Built with network namespace support.
Support for malloc_trim() is enabled.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"),
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT
IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.39 2016-06-14
Running on PCRE version : 8.39 2016-06-14
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 11.3.0

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter
Available filters :
[CACHE] cache
[COMP] compression
[FCGI] fcgi-app
[SPOE] spoe
[TRACE] trace

On Mon, Dec 12, 2022 at 2:50 PM Jarno Huuskonen  wrote:
>
> Hi,
>
> On Mon, 2022-12-12 at 09:47 +0100, Iago Alonso wrote:
> >
>
> Can you share haproxy -vv output ?
>
> > HAProxy config:
> > global
> > log /dev/log len 65535 local0 warning
> > chroot /var/lib/haproxy
> > stats socket /run/haproxy-admin.sock mode 660 level admin
> > user haproxy
> > group haproxy
> > daemon
> > maxconn 200
> > maxconnrate 2500
> > maxsslrate 2500
>
> From your graphs (haproxy_process_current_ssl_rate /
> haproxy_process_current_connection_rate) you might hit
> maxconnrate/maxsslrate
>
> -Jarno
>
> --
> Jarno Huuskonen



Re: Theoretical limits for a HAProxy instance

2022-12-13 Thread Emerson Gomes
Hi,

Have you tried increasing the number of processes/threads?
I don't see any nbthread or nbproc in your config.

Check out https://www.haproxy.com/blog/multithreading-in-haproxy/
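
For example, something like this in the global section forces an explicit
thread count instead of relying on auto-detection (just a sketch; the value
should match your hardware threads):

global
    # e.g. 16 for a 16-thread CPU
    nbthread 16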

BR.,
Emerson


On Mon, Dec 12, 2022 at 02:49, Iago Alonso wrote:

> Hello,
>
> We are performing a lot of load tests, and we hit what we think is an
> artificial limit of some sort, or a parameter that we are not taking
> into account (HAProxy config setting, kernel parameter…). We are
> wondering if there’s a known limit on what HAProxy is able to process,
> or if someone has experienced something similar, as we are thinking
> about moving to bigger servers, and we don’t know if we will observe a
> big difference.
>
> When trying to perform the load test in production, we observe that we
> can sustain 200k connections, and 10k rps, with a load1 of about 10.
> The maxsslrate and maxsslconn are maxed out, but we handle the
> requests fine, and we don’t return 5xx. Once we increase the load just
> a bit and hit 11k rps and about 205k connections, we start to return
> 5xx and we rapidly decrease the load, as these are tests against
> production.
>
> Production server specs:
> CPU: AMD Ryzen 7 3700X 8-Core Processor (16 threads)
> RAM: DDR4 64GB (2666 MT/s)
>
> When trying to perform a load test with synthetic tests using k6 as
> our load generator against staging, we are able to sustain 750k
> connections, with 20k rps. The load generator has a ramp-up time of
> 120s to achieve the 750k connections, as that’s what we are trying to
> benchmark.
>
> Staging server specs:
> CPU: AMD Ryzen 5 3600 6-Core Processor (12 threads)
> RAM: DDR4 64GB (3200 MT/s)
>
> I've made a post about this on discourse, and I got the suggestion to
> post here. In said post, I've included screenshots of some of our
> Prometheus metrics.
>
> https://discourse.haproxy.org/t/theoretical-limits-for-a-haproxy-instance/8168
>
> Custom kernel parameters:
> net.ipv4.ip_local_port_range = "12768 60999"
> net.nf_conntrack_max = 5000000
> fs.nr_open = 5000000
>
> HAProxy config:
> global
> log /dev/log len 65535 local0 warning
> chroot /var/lib/haproxy
> stats socket /run/haproxy-admin.sock mode 660 level admin
> user haproxy
> group haproxy
> daemon
> maxconn 200
> maxconnrate 2500
> maxsslrate 2500
>
> defaults
> log global
> option  dontlognull
> timeout connect 10s
> timeout client  120s
> timeout server  120s
>
> frontend stats
> mode http
> bind *:8404
> http-request use-service prometheus-exporter if { path /metrics }
> stats enable
> stats uri /stats
> stats refresh 10s
>
> frontend k8s-api
> bind *:6443
> mode tcp
> option tcplog
> timeout client 300s
> default_backend k8s-api
>
> backend k8s-api
> mode tcp
> option tcp-check
> timeout server 300s
> balance leastconn
> default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s
> maxconn 500 maxqueue 256 weight 100
> server master01 x.x.x.x:6443 check
> server master02 x.x.x.x:6443 check
> server master03 x.x.x.x:6443 check
> retries 0
>
> frontend k8s-server
> bind *:80
> mode http
> http-request add-header X-Forwarded-Proto http
> http-request add-header X-Forwarded-Port 80
> default_backend k8s-server
>
> backend k8s-server
> mode http
> balance leastconn
> option forwardfor
> default-server inter 10s downinter 5s rise 2 fall 2 check
> server worker01a x.x.x.x:31551 maxconn 20
> server worker02a x.x.x.x:31551 maxconn 20
> server worker03a x.x.x.x:31551 maxconn 20
> server worker04a x.x.x.x:31551 maxconn 20
> server worker05a x.x.x.x:31551 maxconn 20
> server worker06a x.x.x.x:31551 maxconn 20
> server worker07a x.x.x.x:31551 maxconn 20
> server worker08a x.x.x.x:31551 maxconn 20
> server worker09a x.x.x.x:31551 maxconn 20
> server worker10a x.x.x.x:31551 maxconn 20
> server worker11a x.x.x.x:31551 maxconn 20
> server worker12a x.x.x.x:31551 maxconn 20
> server worker13a x.x.x.x:31551 maxconn 20
> server worker14a x.x.x.x:31551 maxconn 20
> server worker15a x.x.x.x:31551 maxconn 20
> server worker16a x.x.x.x:31551 maxconn 20
> server worker17a x.x.x.x:31551 maxconn 20
> server worker18a x.x.x.x:31551 maxconn 20
> server worker19a x.x.x.x:31551 maxconn 20
> server worker20a x.x.x.x:31551 maxconn 20
> server worker01an x.x.x.x:31551 maxconn 20
> server worker02an x.x.x.x:31551 maxconn 20
> server worker03an x.x.x.x:31551 maxconn 20
> retries 0
>
> frontend k8s-server-https
> bind *:443 ssl crt /etc/haproxy/certs/
> mode http
> http-request add-header X-Forwarded-Proto https
> http-request add-header X-Forwarded-Port 443
> http-request del-header X-SERVER-SNI
> http-request set-header X-SERVER-SNI %[ssl_

Re: Theoretical limits for a HAProxy instance

2022-12-13 Thread Willy Tarreau
Hi,

On Tue, Dec 13, 2022 at 03:33:58PM +0100, Iago Alonso wrote:
> Hi,
> 
> We do hit our defined max ssl/conn rates, but given the spare
> resources, we don't expect to suddenly return 5xx.

What bothers me is that once this limit is reached there's no more
connection accepted by haproxy so you should indeed not see such
errors. What element produces these errors ? Since you've enabled
logging, can you check them in your logs and figure whether they're
sent by the server, by haproxy or none (it might well be the load
generator translating connection timeouts to 5xx for user reporting).

If the errors are produced by haproxy, then their exact value and the
termination flags are very important as they'll indicate where the
problem is.

Another thing you'll observe in your logs are the server's response
time. It's possible that your request rate is getting very close to
the server's limits.

Among other limitations the packet rate and network bandwidth might
represent a limit. For example, let's say you're requesting 10kB
objects. At 10k/s it's just less than a gigabit/s, at 11k/s it doesn't
pass anymore.

When you look at the haproxy stats page, the cumulated live network
bandwidth is reported at the top, it might give you an indication of
a possible limit.

But in any case, stats on the status codes and termination codes in
logs would be extremely useful. Note that you can do this natively
using halog (I seem to remember it's halog -st and halog -tc, but
please double-check, and in any case, cut|sort|uniq -c always works).
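
For example, something along these lines (the halog flags are from memory,
so double-check them; the field number in the fallback depends on your log
format):

  # status code and termination code distributions from the request logs
  halog -st < /var/log/haproxy.log
  halog -tc < /var/log/haproxy.log
  # generic fallback: count status codes by hand (adjust the field number)
  cut -d' ' -f11 /var/log/haproxy.log | sort | uniq -c | sort -rn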

Regards,
Willy



Re: Theoretical limits for a HAProxy instance

2022-12-14 Thread Iago Alonso

Hi,

We are letting HAProxy pick what it thinks is best for our hardware. It
seems to always pick `nbproc = 1` (we only have one CPU socket), and
`nbthread = $(nproc)`.
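
That can be confirmed on the stats socket, e.g. (sketch, assuming socat and
our admin socket path; field names may vary slightly by version):

  echo "show info" | socat stdio /run/haproxy-admin.sock | grep -E 'Nbproc|Nbthread'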



On 14/12/22 3:15, Emerson Gomes wrote:

Hi,

Have you tried increasing the number of processes/threads?
I don't see any nbthread or nbproc in your config.

Check out https://www.haproxy.com/blog/multithreading-in-haproxy/

BR.,
Emerson


On Mon, Dec 12, 2022 at 02:49, Iago Alonso wrote:


Hello,

We are performing a lot of load tests, and we hit what we think is an
artificial limit of some sort, or a parameter that we are not taking
into account (HAProxy config setting, kernel parameter…). We are
wondering if there’s a known limit on what HAProxy is able to process,
or if someone has experienced something similar, as we are thinking
about moving to bigger servers, and we don’t know if we will observe a
big difference.

When trying to perform the load test in production, we observe that we
can sustain 200k connections, and 10k rps, with a load1 of about 10.
The maxsslrate and maxsslconn are maxed out, but we handle the
requests fine, and we don’t return 5xx. Once we increase the load just
a bit and hit 11k rps and about 205k connections, we start to return
5xx and we rapidly decrease the load, as these are tests against
production.

Production server specs:
CPU: AMD Ryzen 7 3700X 8-Core Processor (16 threads)
RAM: DDR4 64GB (2666 MT/s)

When trying to perform a load test with synthetic tests using k6 as
our load generator against staging, we are able to sustain 750k
connections, with 20k rps. The load generator has a ramp-up time of
120s to achieve the 750k connections, as that’s what we are trying to
benchmark.

Staging server specs:
CPU: AMD Ryzen 5 3600 6-Core Processor (12 threads)
RAM: DDR4 64GB (3200 MT/s)

I've made a post about this on discourse, and I got the suggestion to
post here. In said post, I've included screenshots of some of our
Prometheus metrics.

https://discourse.haproxy.org/t/theoretical-limits-for-a-haproxy-instance/8168

Custom kernel parameters:
net.ipv4.ip_local_port_range = "12768 60999"
net.nf_conntrack_max = 5000000
fs.nr_open = 5000000

HAProxy config:
global
    log /dev/log len 65535 local0 warning
    chroot /var/lib/haproxy
    stats socket /run/haproxy-admin.sock mode 660 level admin
    user haproxy
    group haproxy
    daemon
    maxconn 200
    maxconnrate 2500
    maxsslrate 2500

defaults
    log     global
    option  dontlognull
    timeout connect 10s
    timeout client  120s
    timeout server  120s

frontend stats
    mode http
    bind *:8404
    http-request use-service prometheus-exporter if { path /metrics }
    stats enable
    stats uri /stats
    stats refresh 10s

frontend k8s-api
    bind *:6443
    mode tcp
    option tcplog
    timeout client 300s
    default_backend k8s-api

backend k8s-api
    mode tcp
    option tcp-check
    timeout server 300s
    balance leastconn
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s
maxconn 500 maxqueue 256 weight 100
    server master01 x.x.x.x:6443 check
    server master02 x.x.x.x:6443 check
    server master03 x.x.x.x:6443 check
    retries 0

frontend k8s-server
    bind *:80
    mode http
    http-request add-header X-Forwarded-Proto http
    http-request add-header X-Forwarded-Port 80
    default_backend k8s-server

backend k8s-server
    mode http
    balance leastconn
    option forwardfor
    default-server inter 10s downinter 5s rise 2 fall 2 check
    server worker01a x.x.x.x:31551 maxconn 20
    server worker02a x.x.x.x:31551 maxconn 20
    server worker03a x.x.x.x:31551 maxconn 20
    server worker04a x.x.x.x:31551 maxconn 20
    server worker05a x.x.x.x:31551 maxconn 20
    server worker06a x.x.x.x:31551 maxconn 20
    server worker07a x.x.x.x:31551 maxconn 20
    server worker08a x.x.x.x:31551 maxconn 20
    server worker09a x.x.x.x:31551 maxconn 20
    server worker10a x.x.x.x:31551 maxconn 20
    server worker11a x.x.x.x:31551 maxconn 20
    server worker12a x.x.x.x:31551 maxconn 20
    server worker13a x.x.x.x:31551 maxconn 20
    server worker14a x.x.x.x:31551 maxconn 20
    server worker15a x.x.x.x:31551 maxconn 20
    server worker16a x.x.x.x:31551 maxconn 20
    server worker17a x.x.x.x:31551 maxconn 20
    server worker18a x.x.x.x:31551 maxconn 20
    server worker19a x.x.x.x:31551 maxconn 20
    server worker20a x.x.x.x:31551 maxconn 20
    server worker01an x.x.x.x:31551 maxconn 200

Re: Theoretical limits for a HAProxy instance

2022-12-14 Thread Iago Alonso

Hi,

We are not sure which element produces the errors; we don't see them in
the haproxy logs. What happens to the new connections when we hit the
limit? Which server resource should be affected by it, if any?


We have our logs at `warning` level, so we do not see the response times
in the haproxy logs. In `haproxy_backend_response_time_average_seconds`
we see "normal" values (20ms average and some spikes at less than 80ms).


We see a spike in the `haproxy_backend_connect_time_average_seconds`, as
well as in the `haproxy_server_connect_time_average_seconds` metrics,
when the errors start. We have more haproxies (that we were not
stressing in the test) in front of the same backends, serving without
issues. We are using HTTPS to communicate with the backends too,
and we see that the task consuming the majority of the resources is
`ssl_sock_io_cb`.


Our traffic is not request-rate intensive, but we do have a high number
of concurrent connections. For example, at 200k connections we have 10k
rps, and at 300k connections we have 14k rps.


In our tests we never saturate our network bandwidth; we hit at most 50%
of the server's available bandwidth.


Regards.


On 14/12/22 8:09, Willy Tarreau wrote:

Hi,

On Tue, Dec 13, 2022 at 03:33:58PM +0100, Iago Alonso wrote:

Hi,

We do hit our defined max ssl/conn rates, but given the spare
resources, we don't expect to suddenly return 5xx.

What bothers me is that once this limit is reached there's no more
connection accepted by haproxy so you should indeed not see such
errors. What element produces these errors ? Since you've enabled
logging, can you check them in your logs and figure whether they're
sent by the server, by haproxy or none (it might well be the load
generator translating connection timeouts to 5xx for user reporting).

If the errors are produced by haproxy, then their exact value and the
termination flags are very important as they'll indicate where the
problem is.

Another thing you'll observe in your logs are the server's response
time. It's possible that your request rate is getting very close to
the server's limits.

Among other limitations the packet rate and network bandwidth might
represent a limit. For example, let's say you're requesting 10kB
objects. At 10k/s it's just less than a gigabit/s, at 11k/s it doesn't
pass anymore.

When you look at the haproxy stats page, the cumulated live network
bandwidth is reported at the top, it might give you an indication of
a possible limit.

But in any case, stats on the status codes and termination codes in
logs would be extremely useful. Note that you can do this natively
using halog (I seem to remember it's halog -st and halog -tc, but
please double-check, and in any case, cut|sort|uniq -c always works).

Regards,
Willy




Re: Theoretical limits for a HAProxy instance

2022-12-14 Thread Iago Alonso
Also, our connections are renewed constantly, with an SSL rate of about
3000 per second, the same as our connection rate.


On 14/12/22 12:50, Iago Alonso wrote:

Hi,

We are not sure which element produces the errors; we don't see them in
the haproxy logs. What happens to the new connections when we hit the
limit? Which server resource should be affected by it, if any?


We have our logs at `warning` level, so we do not see the response times
in the haproxy logs. In `haproxy_backend_response_time_average_seconds`
we see "normal" values (20ms average and some spikes at less than 80ms).


We see a spike in the `haproxy_backend_connect_time_average_seconds`,
as well as in the `haproxy_server_connect_time_average_seconds`
metrics, when the errors start. We have more haproxies (that we were
not stressing in the test) in front of the same backends, serving
without issues. We are using HTTPS to communicate with the
backends too, and we see that the task consuming the majority of the
resources is `ssl_sock_io_cb`.


Our traffic is not request-rate intensive, but we do have a high
number of concurrent connections. For example, at 200k connections we
have 10k rps, and at 300k connections we have 14k rps.


In our tests we never saturate our network bandwidth; we hit at most
50% of the server's available bandwidth.


Regards.


On 14/12/22 8:09, Willy Tarreau wrote:

Hi,

On Tue, Dec 13, 2022 at 03:33:58PM +0100, Iago Alonso wrote:

Hi,

We do hit our defined max ssl/conn rates, but given the spare
resources, we don't expect to suddenly return 5xx.

What bothers me is that once this limit is reached there's no more
connection accepted by haproxy so you should indeed not see such
errors. What element produces these errors ? Since you've enabled
logging, can you check them in your logs and figure whether they're
sent by the server, by haproxy or none (it might well be the load
generator translating connection timeouts to 5xx for user reporting).

If the errors are produced by haproxy, then their exact value and the
termination flags are very important as they'll indicate where the
problem is.

Another thing you'll observe in your logs are the server's response
time. It's possible that your request rate is getting very close to
the server's limits.

Among other limitations the packet rate and network bandwidth might
represent a limit. For example, let's say you're requesting 10kB
objects. At 10k/s it's just less than a gigabit/s, at 11k/s it doesn't
pass anymore.

When you look at the haproxy stats page, the cumulated live network
bandwidth is reported at the top, it might give you an indication of
a possible limit.

But in any case, stats on the status codes and termination codes in
logs would be extremely useful. Note that you can do this natively
using halog (I seem to remember it's halog -st and halog -tc, but
please double-check, and in any case, cut|sort|uniq -c always works).

Regards,
Willy




Re: Theoretical limits for a HAProxy instance

2022-12-14 Thread Willy Tarreau
On Wed, Dec 14, 2022 at 12:50:10PM +0100, Iago Alonso wrote:
> Hi,
> 
> We are not sure what element produces the errors, in the haproxy logs we
> don't see them.

Then they don't pass through haproxy nor do they come from haproxy.

> What does it happen with the new connections when we hit the
> limit?

If you're speaking about the maxconnrate/maxsessrate, the connection
acceptance is simply delayed until the rate becomes lower again.

> What server resource should be affected by it, if any?

None at all, actually the connection remains in the kernel's accept
queue. However this can have as a side effect that once too many are
queued, new ones are just rejected once the kernel queue is full, and
maybe the client translates this into a 5xx (this wouldn't shock me
as long as there is a way for the user to figure it).
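
If you want to check for that on the haproxy box, the listen queue counters
are a good indicator. A rough sketch (counter names can vary a bit between
kernels):

  # accept/listen queue overflows and drops since boot
  nstat -az | grep -Ei 'ListenOverflows|ListenDrops'
  # for LISTEN sockets, Recv-Q shows how many connections are currently
  # sitting in the accept queue and Send-Q shows the backlog limit
  ss -ltn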

> We have our logs in `warning` level so we do not see the response time on
> the haproxy logs.

Ah, that's an important point. So in fact you almost don't have logs,
and maybe you're really having 5xx that you don't log. You could
perhaps just leave them at "info" level and add "option dontlog-normal"
that will eliminate logs for successful requests and only log the failed
ones.
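
Roughly like this, adapted to the global log line from your config (sketch
only):

global
    # keep everything up to "info" so request logs and their termination
    # flags are emitted at all
    log /dev/log len 65535 local0 info

defaults
    log global
    # ...but drop the logs of successful requests, keeping only failed ones
    option dontlog-normal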

> In the `haproxy_backend_response_time_average_seconds` we
> see "normal" values (20ms average and some spikes at less than 80ms).

OK, that's good.

> We see a spike in the `haproxy_backend_connect_time_average_seconds`, as
> well as in the `haproxy_server_connect_time_average_seconds` metrics when
> the errors start.

Interesting, then this could be related to either a conntrack filling on
one element in the chain (including both extremities), or a shortage of
source ports when dealing with too many connections. The cost of finding
a source port can sometimes become extreme in the system, I've personally
seen connect() taking up to 50 ms to succeed! In this case, just increasing
the source port range can be enough, just like explicitly setting source
port ranges in haproxy (one per "server" line). In this last case, it saves
the system from having to find one port, but you must be certain not to
collide with listening ports.
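
As a sketch of that last option, against one of the server lines you posted
(the range here is arbitrary; pick one that stays clear of your listening
ports):

backend k8s-server
    # an explicit source port range per server spares the kernel the search;
    # the range must not overlap the box's listening ports
    server worker01a x.x.x.x:31551 source 0.0.0.0:10000-59999

Or, to just widen the kernel's ephemeral range instead:

  sysctl -w net.ipv4.ip_local_port_range="10000 60999"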

> We have more haproxies (that we were not stressing in the
> test) in front of the same backends, serving without issues. We are using
> https between to communicate to the backends too, and we see that the task
> consuming the majority of the resources is `ssl_sock_io_cb`.

Ah that's pretty useful :-) It's very likely dealing with the handshake.
Could you please run "perf top" on this machine and list the 10 top-most
lines ? I'm interested in seeing if you're saturating on crypto functions
or locking functions (e.g. "native_queued_spin_lock_slowpath"), that would
indicate an abuse of mutexes. By the way, are you running with OpenSSL
3.0 ?  That one is absolutely terrible and makes extreme abuse of mutexes
and locks, to the point that certain workloads were divided by 2-digit
numbers between 1.1.1 and 3.0. It took me one day to figure out that my
load generator, which was capping at 400 conn/s, was in fact suffering from
an accidental build using 3.0, while with 1.1.1 the perf went back to 75000/s!
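
Something like this is enough, restricted to the haproxy process (assuming
pgrep is available):

  perf top -p "$(pgrep -d, haproxy)"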

> Our traffic is not request rate intensive, but we do have a high amount of
> concurrent connections. For example, at 200k connections, we have 10k rps,
> and at 300k connections, we have 14k rps.

Are your connections kept alive long or regularly closed ? In the stats page
when you hover on the "Sessions/Total" line of a backend or server, there is
this (for example):

  Cum. sessions:       1253716
  New connections:     84396
  Reused connections:  1169320 (93%)

It says that for 1.25M requests, 84k connections were needed and that the
rest (93%) could reuse an existing connection. If your reuse rate is too
low, you're indeed dealing with new connections that can take some time
over SSL.
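
If the reuse rate does turn out to be low, the backend-side reuse policy is
one knob to look at; a sketch only, keeping in mind that recent versions
already default to "safe":

backend k8s-server
    mode http
    # share idle server-side connections across requests more aggressively
    http-reuse always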

> In our tests, we never saturate our network bandwidth, we hit as much as 50%
> our available bandwidth for the server.

OK fine. It's perfectly normal not to saturate the network bandwidth using
small requests, as the cost of parsing & handling a request is orders of
magnitude higher than the cost of processing the equivalent packet. My
concern was that I had no idea about the response size, so it could have
been possible that a link got filled.

Regards,
Willy



Re: Theoretical limits for a HAProxy instance

2022-12-14 Thread Willy Tarreau
On Wed, Dec 14, 2022 at 02:04:44PM +0100, Iago Alonso wrote:
> Also, our connections are renewed constantly, with a ssl rate of about 3000
> per second, same as our connection rate.

Very useful, that answers one of my previous questions. Then please check
"perf top", it will be very instructive I think.

Willy



Re: Theoretical limits for a HAProxy instance

2022-12-16 Thread Iago Alonso
Hi,

> Ah that's pretty useful :-) It's very likely dealing with the handshake.
> Could you please run "perf top" on this machine and list the 10 top-most
> lines ? I'm interested in seeing if you're saturating on crypto functions
> or locking functions (e.g. "native_queued_spin_lock_slowpath"), that would
> indicate an abuse of mutexes.

We are load testing with k6 (https://k6.io/), on a server with an
AMD Ryzen 5 3600 6-Core Processor (12 threads). We ran a test
with 300k connections, 15k rps, and an sslrate and connrate of about 4k.
The server had a `load1` of 12, the `node_nf_conntrack_entries` were about
450k. The `haproxy_server_connect_time_average_seconds` was between
0.5s and 2s, and the `haproxy_backend_connect_time_average_seconds` was
always about 0.2s lower.

We have this custom kernel parameters set:
net.nf_conntrack_max=5000000
fs.nr_open=5000000
net.ipv4.ip_local_port_range=12768 60999

At that load, we were starting to see just a small number of 5xx, and in
the logs we could see that they were caused by connect timeouts.
XXX.XXX.XXX.XXX:15636 https-front~ https-back/redacted.worker
261/0/-1/-1/10260 503 0 - - sC-- 419484/419480/104950/6557/0 0/0 "GET
/ HTTP/1.1" 0//0/0/0 TLSv1.3/TLS_AES_256_GCM_SHA384

This is the output from `perf top`, after 40M samples, during the test

  17.38%  libc.so.6  [.] pthread_rwlock_rdlock
  10.25%  libc.so.6  [.] pthread_rwlock_unlock
   2.32%  haproxy    [.] sha512_block_data_order_avx2
   1.64%  libc.so.6  [.] malloc
   1.62%  libc.so.6  [.] cfree
   1.51%  haproxy    [.] fe_mul
   1.43%  libc.so.6  [.] 0x000a4620
   1.29%  haproxy    [.] OPENSSL_LH_retrieve
   1.16%  haproxy    [.] x25519_fe64_sqr
   0.80%  haproxy    [.] bn_sqrx8x_internal
   0.75%  haproxy    [.] ossl_sa_doall_arg
   0.70%  haproxy    [.] ossl_lh_strcasehash
   0.67%  [kernel]   [k] __nf_conntrack_find_get
   0.67%  haproxy    [.] OPENSSL_cleanse
   0.66%  haproxy    [.] do_name
   0.56%  haproxy    [.] OPENSSL_LH_doall_arg
   0.53%  haproxy    [.] x25519_fe64_mul
   0.51%  haproxy    [.] fe_sq

> By the way, are you running with OpenSSL
> 3.0 ?  That one is absolutely terrible and makes extreme abuse of mutexes
> and locks, to the point that certain workloads were divided by 2-digit
> numbers between 1.1.1 and 3.0. It took me one day to figure that my load
> generator which was capping at 400 conn/s was in fact suffering from an
> accidental build using 3.0 while in 1.1.1 the perf went back to 75000/s!

Yes, we are running OpenSSL 3.0.7. We recently upgraded to Ubuntu 20.04,
and we tested all the possible combinations of HAProxy 2.5.3 and 2.6.6
compiled with OpenSSL 1.1.1f and 3.0.7, and we didn't see a noticeable
performance hit running tests similar to the one mentioned above. We are
planning to try again next week, stressing just the sslrate, so we can
assess the performance hit a bit more.

On Wed, Dec 14, 2022 at 2:07 PM Willy Tarreau  wrote:
>
> On Wed, Dec 14, 2022 at 12:50:10PM +0100, Iago Alonso wrote:
> > Hi,
> >
> > We are not sure what element produces the errors, in the haproxy logs we
> > don't see them.
>
> Then they don't pass through haproxy nor do they come from haproxy.
>
> > What does it happen with the new connections when we hit the
> > limit?
>
> If you're speaking about the maxconnrate/maxsessrate, the connection
> acceptance is simply delayed until the rate becomes lower again.
>
> > What server resource should be affected by it, if any?
>
> None at all, actually the connection remains in the kernel's accept
> queue. However this can have as a side effect that once too many are
> queued, new ones are just rejected once the kernel queue is full, and
> maybe the client translates this into a 5xx (this wouldn't shock me
> as long as there is a way for the user to figure it).
>
> > We have our logs in `warning` level so we do not see the response time on
> > the haproxy logs.
>
> Ah, that's an important point. So in fact you almost don't have logs,
> and maybe you're really having 5xx that you don't log. You could
> perhaps just leave them at "info" level and add "option dontlog-normal"
> that will eliminate logs for successful requests and only log the failed
> ones.
>
> > In the `haproxy_backend_response_time_average_seconds` we
> > see "normal" values (20ms average and some spikes at less than 80ms).
>
> OK, that's good.
>
> > We see a spike in the `haproxy_backend_connect_time_average_seconds`, as
> > well as in the `haproxy_server_connect_time_average_seconds` metrics when
> > the errors start.
>
> Interesting, then this could be related to either a conntrack filling on
> one element in the chain (including both extremities), or

Re: Theoretical limits for a HAProxy instance

2022-12-16 Thread Willy Tarreau
On Fri, Dec 16, 2022 at 05:42:50PM +0100, Iago Alonso wrote:
> Hi,
> 
> > Ah that's pretty useful :-) It's very likely dealing with the handshake.
> > Could you please run "perf top" on this machine and list the 10 top-most
> > lines ? I'm interested in seeing if you're saturating on crypto functions
> > or locking functions (e.g. "native_queued_spin_lock_slowpath"), that would
> > indicate an abuse of mutexes.
> 
> We are load testing with k6 (https://k6.io/), in a server with an
> AMD Ryzen 5 3600 6-Core Processor (12 threads). We ran a test
> with 300k connections, 15k rps, and a sslrate and connrate of about 4k.
> The server had a `load1` of 12, the `node_nf_conntrack_entries` were about
> 450k. The `haproxy_server_connect_time_average_seconds` was between
> 0.5s and 2s, and the `haproxy_backend_connect_time_average_seconds` was
> always about 0.2s lower.
> 
> We have this custom kernel parameters set:
> net.nf_conntrack_max=5000000
> fs.nr_open=5000000
> net.ipv4.ip_local_port_range=12768 60999

Interesting, so you have conntrack enabled. With 5M conns, there's no
reason to fill your table. However, have you checked your servers' kernel
logs to see if you find any "conntrack table full" message, that might
be caused by too fast connection recycling ? 
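
For instance (sketch; the second command assumes the usual conntrack
sysctls are exposed):

  dmesg -T | grep -i conntrack
  sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max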

> At that load, we were starting to see just a small number of 5xx, and
> in the logs
> we could see that they were caused by connect timeout.
> XXX.XXX.XXX.XXX:15636 https-front~ https-back/redacted.worker
> 261/0/-1/-1/10260 503 0 - - sC-- 419484/419480/104950/6557/0 0/0 "GET
> / HTTP/1.1" 0//0/0/0 TLSv1.3/TLS_AES_256_GCM_SHA384

Hmmm not good. Check if you have any "-m state --state INVALID -j DROP"
rule in your iptables whose counter increases during the tests. It might
be possible that certain outgoing connections are rejected due to fast
port recycling if they were closed by an RST that was lost before hitting
conntrack. Another possibility here is that the TCP connection was
established but that the SSL handshake didn't complete in time. This
could be caused precisely by the openssl 3 locking issues I was speaking
about, but I guess not at such numbers, and you'd very likely see
the native_queued_spin_lock_slowpath thing in perf top.
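
A quick way to watch those counters while the test runs (sketch, needs
root):

  watch -n1 'iptables -vnL | grep -i invalid'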

> This is the output from `perf top`, after 40M samples, during the test
> 
>   17.38%  libc.so.6   [.] pthread_rwlock_rdlock
>   10.25%  libc.so.6   [.] pthread_rwlock_unlock

With that said, excessive use of rwlock indicates that openssl is
very likely there, because that's about the only caller in the process.

To counter this abuse, we've developed a dirty counter-measure that
consists in providing our own rwlock implementation as a replacement for
the pthread one. That's really not beautiful but we have to deal with
what we have. You can enable it in 2.7 by adding USE_PTHREAD_EMULATION=1
to your "make" command line. These locks are way less expensive but they
will never sleep (no syscall), so instead of idling when the machine is
congested, you'll see pl_wait_unlock_long() eating CPU in "perf top"
output. In our tests that roughly doubles the performance under high
congestion, but it's hard to be that specific. In fact it mostly gets
us our lock callbacks back, after they were removed in 1.1.0.
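
For reference, a build line along these lines should do it (the USE_* flags
are mirrored from your -vv output, adjust as needed):

  make -j"$(nproc)" TARGET=linux-glibc \
       USE_OPENSSL=1 USE_PCRE=1 USE_ZLIB=1 USE_SYSTEMD=1 USE_PROMEX=1 \
       USE_PTHREAD_EMULATION=1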

> Yes, we are running OpenSSL 3.0.7. We recently upgraded to Ubuntu 20.04,
> and we tested all the possible combinations between HAProxy 2.5.3 and 2.6.6,
> compiled with OpenSSL 1.1.1f and 3.0.7, and we didn't see a noticeable
> performance
> hit running similar tests as the previously mentioned one. We are
> planning to try again
> (next week) just stressing the sslrate so we can assess a bit more the
> performance
> hit.

OK. Please give 2.7.x a try (we hope to have 2.7.1 ready soon with a few
fixes, though these should not impact your tests and 2.7.0 will be OK).
And please check whether USE_PTHREAD_EMULATION=1 improves your situation.
If so it will definitely confirm that you're endlessly waiting for some
threads stuck in SSL waiting on a mutex.

I would really like that we distinguish L4 from L6 in logs output to
indicate whether L4 was established and SSL not or if it's blocking at
L4. The tricky part is that it requires adding new values for the
termination flags, and everyone got used a lot to the current ones...
But I think we'll have to do it anyway, at least to sort out such
issues.

Regards,
Willy



Re: Theoretical limits for a HAProxy instance

2022-12-21 Thread Iago Alonso

Interesting, so you have conntrack enabled. With 5M conns, there's no
reason to fill your table. However, have you checked your servers' kernel
logs to see if you find any "conntrack table full" message, that might
be caused by too fast connection recycling ?

We can't see any message mentioning conntrack.


With that said, excessive use of rwlock indicates that openssl is
very likely there, because that's about the only caller in the process.

Indeed, we tried compiling HAProxy 2.6.6 against OpenSSL 1.1.1s and the
`perf top` results look much better. We can't recall why we decided
to use OpenSSL v3, but it seems like we have to stick to v1.

   2.61%  haproxy    [.] sha512_block_data_order_avx2
   1.89%  haproxy    [.] fe_mul
   1.64%  haproxy    [.] x25519_fe64_sqr
   1.44%  libc.so.6  [.] malloc
   1.39%  [kernel]   [k] __nf_conntrack_find_get
   1.21%  libc.so.6  [.] cfree
   0.95%  [kernel]   [k] native_queued_spin_lock_slowpath.part.0
   0.94%  haproxy    [.] OPENSSL_cleanse
   0.78%  libc.so.6  [.] 0x000a3b46
   0.78%  [kernel]   [k] do_epoll_ctl
   0.75%  [kernel]   [k] ___slab_alloc
   0.73%  [kernel]   [k] psi_group_change

For reference, with a 12-core machine, we ran a test with 420k connections,
32k rps, and a sustained ssl rate of 5k, and the load1 was about 9 and
the idle was about 30.

And as a side note, we didn't have HTTP/2 enabled; after enabling it,
we ran the same test, and the idle was about 55 and the load1 was
about 6.
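
For anyone following along, enabling it on the TLS frontend boils down to
advertising h2 via ALPN on the bind line; a sketch against the config
posted earlier, not a literal copy of ours:

frontend k8s-server-https
    # advertise HTTP/2 to TLS clients via ALPN
    bind *:443 ssl crt /etc/haproxy/certs/ alpn h2,http/1.1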



OK. Please give 2.7.x a try (we hope to have 2.7.1 ready soon with a few
fixes, though these should not impact your tests and 2.7.0 will be OK).
And please check whether USE_PTHREAD_EMULATION=1 improves your situation.
If so it will definitely confirm that you're endlessly waiting for some
threads stuck in SSL waiting on a mutex.

We've just tried 2.7.1 with USE_PTHREAD_EMULATION=1 (OpenSSL 3.0.7), and
the results are pretty much the same, no noticeable improvement.


Hmmm not good. Check if you have any "-m state --state INVALID -j DROP"
rule in your iptables whose counter increases during the tests. It might
be possible that certain outgoing connections are rejected due to fast
port recycling if they were closed by an RST that was lost before hitting
conntrack. Another possibility here is that the TCP connection was
established but that the SSL handshake didn't complete in time. This
could be caused precisely by the openssl 3 locking issues I was speaking
about, but I guess not at such numbers, and you'd very likely see
the native_queue_spinlock_slowpath thing in perf top.

We do have an `-A ufw-before-input -m conntrack --ctstate INVALID -j DROP`
rule, but the counter does not increase during the tests. Looks like OpenSSL
v3 is the issue.


On 16/12/22 18:11, Willy Tarreau wrote:

On Fri, Dec 16, 2022 at 05:42:50PM +0100, Iago Alonso wrote:

Hi,


Ah that's pretty useful :-) It's very likely dealing with the handshake.
Could you please run "perf top" on this machine and list the 10 top-most
lines ? I'm interested in seeing if you're saturating on crypto functions
or locking functions (e.g. "native_queued_spin_lock_slowpath"), that would
indicate an abuse of mutexes.

We are load testing with k6 (https://k6.io/), in a server with an
AMD Ryzen 5 3600 6-Core Processor (12 threads). We ran a test
with 300k connections, 15k rps, and a sslrate and connrate of about 4k.
The server had a `load1` of 12, the `node_nf_conntrack_entries` were about
450k. The `haproxy_server_connect_time_average_seconds` was between
0.5s and 2s, and the `haproxy_backend_connect_time_average_seconds` was
always about 0.2s lower.

We have this custom kernel parameters set:
net.nf_conntrack_max=5000000
fs.nr_open=5000000
net.ipv4.ip_local_port_range=12768 60999

Interesting, so you have conntrack enabled. With 5M conns, there's no
reason to fill your table. However, have you checked your servers' kernel
logs to see if you find any "conntrack table full" message, that might
be caused by too fast connection recycling ?


At that load, we were starting to see just a small number of 5xx, and
in the logs
we could see that they were caused by connect timeout.
XXX.XXX.XXX.XXX:15636 https-front~ https-back/redacted.worker
261/0/-1/-1/10260 503 0 - - sC-- 419484/419480/104950/6557/0 0/0 "GET
/ HTTP/1.1" 0//0/0/0 TLSv1.3/TLS_AES_256_GCM_SHA384

Hmmm not good. Check if you have any "-m state --state INVALID -j DROP"
rule in your iptables whose counter increases during the tests. It might
be possible that certain outgoing connections are rejected due to fast
port recycling if they were closed by an RST that was lost before hitting
conntrack. Another possibility here is that the TCP connection was
established but that the SSL handshake didn't complete in time. This
c

Re: Theoretical limits for a HAProxy instance

2023-01-24 Thread Iago Alonso
We are happy to report that after downgrading to OpenSSL 1.1.1s (from
3.0.7), our performance problems are solved, and it now looks like
HAProxy scales linearly with the available resources.

For reference, in a synthetic load test with a request payload of 2k,
and a 32-core server (128GB RAM) with 10Gb bandwidth available, we are
able to sustain 1.83M conn, 82K rps, and a sslrate of 22k, with a load
average of about 31 (idle time percent was about 6)

On Wed, Dec 21, 2022 at 5:39 PM Iago Alonso  wrote:
>
> > Interesting, so you have conntrack enabled. With 5M conns, there's no
> > reason to fill your table. However, have you checked your servers' kernel
> > logs to see if you find any "conntrack table full" message, that might
> > be caused by too fast connection recycling ?
> We can't see any message mentioning conntrack.
>
> > With that said, excessive use of rwlock indicates that openssl is
> > very likely there, because that's about the only caller in the process.
> Indeed, we tried compiling HAProxy 2.6.6 against OpenSSL 1.1.1s and the
> `perf top` results look much better. We can't recall why we decided
> to use OpenSSL v3, but it seems like we have to stick to v1.
>
> 2.61%  haproxy[.] sha512_block_data_order_avx2
> 1.89%  haproxy[.] fe_mul
> 1.64%  haproxy[.] x25519_fe64_sqr
> 1.44%  libc.so.6[.] malloc
> 1.39%  [kernel] [k] __nf_conntrack_find_get
> 1.21%  libc.so.6[.] cfree
> 0.95%  [kernel]   [k] native_queued_spin_lock_slowpath.part.0
> 0.94%  haproxy[.] OPENSSL_cleanse
> 0.78%  libc.so.6[.] 0x000a3b46
> 0.78%  [kernel] [k] do_epoll_ctl
> 0.75%  [kernel] [k] ___slab_alloc
> 0.73%  [kernel] [k] psi_group_change
>
> For reference, with a 12 core machine, we ran a test with 420k connections,
>
> 32k rps, and a sustained ssl rate of 5k, and the load1 was about 9, and
>
> the idle was about 30.
>
> And as a side note, we didn't have http/2 enabled, and after doing
>
> so, we ran the same test, and the idle was about 55, and the load1 was
> about 6.
>
> > OK. Please give 2.7.x a try (we hope to have 2.7.1 ready soon with a few
> > fixes, though these should not impact your tests and 2.7.0 will be OK).
> > And please check whether USE_PTHREAD_EMULATION=1 improves your situation.
> > If so it will definitely confirm that you're endlessly waiting for some
> > threads stuck in SSL waiting on a mutex.
> We've just tried 2.7.1 with USE_PTHREAD_EMULATION=1 (OpenSSL 3.0.7), and
>
> the results are pretty much the same, no noticeable improvement.
>
> > Hmmm not good. Check if you have any "-m state --state INVALID -j DROP"
> > rule in your iptables whose counter increases during the tests. It might
> > be possible that certain outgoing connections are rejected due to fast
> > port recycling if they were closed by an RST that was lost before hitting
> > conntrack. Another possibility here is that the TCP connection was
> > established but that the SSL handshake didn't complete in time. This
> > could be caused precisely by the openssl 3 locking issues I was speaking
> > about, but I guess not at such numbers, and you'd very likely see
> > the native_queue_spinlock_slowpath thing in perf top.
> We do have a `-A ufw-before-input -m conntrack --ctstate INVALID -j DROP`
>
> but the counter does not increase during the tests. Looks like OpenSSL
>
> v3 is the issue.
>
>
> On 16/12/22 18:11, Willy Tarreau wrote:
> > On Fri, Dec 16, 2022 at 05:42:50PM +0100, Iago Alonso wrote:
> >> Hi,
> >>
> >>> Ah that's pretty useful :-) It's very likely dealing with the handshake.
> >>> Could you please run "perf top" on this machine and list the 10 top-most
> >>> lines ? I'm interested in seeing if you're saturating on crypto functions
> >>> or locking functions (e.g. "native_queued_spin_lock_slowpath"), that would
> >>> indicate an abuse of mutexes.
> >> We are load testing with k6 (https://k6.io/), in a server with an
> >> AMD Ryzen 5 3600 6-Core Processor (12 threads). We ran a test
> >> with 300k connections, 15k rps, and a sslrate and connrate of about 4k.
> >> The server had a `load1` of 12, the `node_nf_conntrack_entries` were about
> >> 450k. The `haproxy_server_connect_time_average_seconds` was between
> >> 0.5s and 2s, and the `haproxy_backend_connect_time_average_seconds` was
> >> always about 0.2s lower.
> >>
> >> We have this custom kernel parameters set:
> >> net.nf_conntrack_max=5000000
> >> fs.nr_open=5000000
> >> net.ipv4.ip_local_port_range=12768 60999
> > Interesting, so you have conntrack enabled. With 5M conns, there's no
> > reason to fill your table. However, have you checked your servers' kernel
> > logs to see if you find any "conntrack table full" message, that might
> > be caused by too fast connection recy

Re: Theoretical limits for a HAProxy instance

2023-01-24 Thread Willy Tarreau
Hi Iago,

On Tue, Jan 24, 2023 at 04:45:54PM +0100, Iago Alonso wrote:
> We are happy to report that after downgrading to OpenSSL 1.1.1s (from
> 3.0.7), our performance problems are solved, and now looks like
> HAProxy scales linearly with the available resources.

Excellent, thanks for this nice feedback! And welcome to the constantly
growing list of users condemned to use 1.1.1 forever :-/

> For reference, in a synthetic load test with a request payload of 2k,
> and a 32-core server (128GB RAM) with 10Gb bandwidth available, we are
> able to sustain 1.83M conn, 82K rps, and a sslrate of 22k, with a load
> average of about 31 (idle time percent was about 6)

This looks like excellent numbers, which should hopefully give you
plenty of headroom.

Thanks for your feedback!
Willy