Re: OpenSSL 1.1.1 vs 3.0 client cert verify "x509_strict" issues

2022-12-12 Thread Remi Tricot-Le Breton

Hello,

On 12/12/2022 16:45, Froehlich, Dominik wrote:


Hello HAProxy community!

We’ve recently updated from OpenSSL 1.1.1 to OpenSSL 3.0 for our 
HAProxy deployment.


We are now seeing some client certificates getting denied with these 
error messages:


“SSL client CA chain cannot be verified” / “error:0A86:SSL 
routines::certificate verify failed” 30/0A86


We found out that for this CA certificate, the error was

X509_V_ERR_MISSING_SUBJECT_KEY_IDENTIFIER

This error is only thrown if we run openssl verify with the 
“-x509_strict” option. The same call (even with the “-x509_strict” 
option) on OpenSSL 1.1.1 returned OK and verified.




Indeed, OpenSSL extended what the x509_strict option actually does in 
order to follow the requirements described in RFC 5280. OpenSSL's commit 
0e071fbce4 gives a detailed list of the extra checks performed when 
x509_strict is set.
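
If you want to reproduce this outside of HAProxy, something like the
following should show the difference (a quick sketch; ca.pem and
client.pem are placeholder file names):

  # verify with the strict RFC 5280 checks enabled
  openssl verify -x509_strict -CAfile ca.pem client.pem

  # check whether the CA certificate carries a Subject Key Identifier
  openssl x509 -in ca.pem -noout -ext subjectKeyIdentifier

The long-term fix is on the PKI side: re-issuing the CA certificate
with a "subjectKeyIdentifier = hash" extension (see the x509v3_config
documentation).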


As this was a bit surprising to us and we now have a customer who 
can’t use their client certificate anymore, we wanted to ask for some 
details on the OpenSSL verify check in HAProxy:


  * How does HAProxy call the “verify” command in OpenSSL?



HAProxy does not run the "verify" command as such: it registers a 
verify callback through the libssl API, and the actual certificate and 
certificate chain verification is performed inside OpenSSL. So any 
default behavior change in OpenSSL itself might have an impact on 
which certificates we reject or not.




  * Does HAProxy use the “x509_strict” option programmatically?
  * Is there a flag in HAProxy that would allow us to temporarily
disable the “strict” setting so that the customer has time to
update their PKI?



I have not yet tried to reproduce the problem you encountered, but you 
might have success with a proper crt-ignore-err and ca-ignore-err 
combination (on HAProxy's side). It does not disable strict checking 
per se, but it could allow you to accept certificates that would 
otherwise be rejected.
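
Something along those lines, for example (an untested sketch; the
paths are placeholders, and depending on your HAProxy version the
directives take numeric error IDs from OpenSSL's x509_vfy.h, constant
names, or "all"):

  frontend fe_mtls
      bind *:443 ssl crt /etc/haproxy/certs/server.pem ca-file /etc/haproxy/ca/ca.crt verify required ca-ignore-err all crt-ignore-err all

Note that "all" ignores any verification error on that part of the
chain, so it is best kept as a temporary workaround while the customer
updates their PKI.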




  * If there is no flag, we could temporarily patch out the code that
uses the flag; can you give us some pointers?

Thanks a lot for your help!

Dominik Froehlich, SAP



Hope this helps.

Rémi LB



Re: Reproducible CI build with OpenSSL and "latest" keyword

2022-12-12 Thread William Lallemand
On Mon, Dec 12, 2022 at 07:27:59PM +0500, Илья Шипицин wrote:
> I attached a patch.
> 

Thanks!

> Btw, we only build for the latest LibreSSL. Are we ok to skip LibreSSL for
> stable branches?
> 

In <= 2.5 we are still building with LibreSSL 3.5.3:
http://git.haproxy.org/?p=haproxy-2.5.git;a=blob;f=.github/matrix.py;hb=HEAD#l132

Ideally it would be better to still build LibreSSL in the stable branches.

In my opinion there should be at least one pinned version plus the
latest for this method to work, but if the latest is equal to an
already-built version it doesn't make sense to build it again.
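
Something like this could be the shape of it in matrix.py (a
hypothetical sketch; resolve_latest() stands in for whatever resolves
"latest" to a concrete version at generation time):

  # Hypothetical sketch: emit the "latest" entry only when it does not
  # duplicate a version that is already pinned in the matrix.
  def ssl_versions(pinned, resolve_latest):
      # pinned, e.g. ["OPENSSL_VERSION=1.1.1s", "OPENSSL_VERSION=3.0.7"]
      # resolve_latest() returns e.g. "OPENSSL_VERSION=3.0.7"
      latest = resolve_latest()
      if latest in pinned:
          # "latest" is already covered by a pinned build, skip it
          return pinned
      return pinned + [latest]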

-- 
William Lallemand



Re: Reproducible CI build with OpenSSL and "latest" keyword

2022-12-12 Thread Илья Шипицин
I attached a patch.

Btw, we only build for the latest LibreSSL. Are we ok to skip LibreSSL for
stable branches?

The remaining feature requests might be addressed later, I hope.

On Mon, 12 Dec 2022 at 13:03, William Lallemand wrote:

> On Mon, Dec 12, 2022 at 08:48:06AM +0100, William Lallemand wrote:
> > Hi Ilya !
> >
> > On Mon, Dec 12, 2022 at 10:56:11AM +0500, Илья Шипицин wrote:
> > > hello,
> > >
> > > I made some prototype of what I meant:
> > >
> > > https://github.com/chipitsine/haproxy/commit/c95955ecfd1a5b514c235b0f155bfa71178b51d5
> > >
> >
> > - We don't often use "dev" in our branches so we should build everything
> >   when it's not a stable branch.
> >
> > - We don't want to build "3.0" OR latest, in fact we only need to
> >   condition the "latest" build, because the other one will always be
> >   built.
> >
> >   So once the "3.1" is released we could add an entry for it to
> >   the file and "latest" will be another version. This way we could
> >   backport the "3.1" in previous branches if we want to support it.
> >
> > > I'm not sure how stable branches are named in private GitHub CI. If you
> > > can enlighten me, I'll try to adapt.
> > > Currently I did the following: if the branch name is either master or
> > > contains "dev", the "latest" semantic is chosen; fixed versions are used
> > > otherwise.
> > >
> >
> > The stable branches are named "haproxy-X.Y", so in my opinion we should
> > build the "latest" for anything which is not a stable branch.
> >
> > > also, I know that the same ci is used for
> > >
> > > https://github.com/haproxytech/quic-dev
> > >
> > >
> > > @Frederic Lecaille, which behaviour would you like for that repo?
> > > What is the branch naming convention?
> > >
> > The same as the master branch IMHO.
> >
> > Also, the problem is uglier than I thought: we are not testing 1.1.1
> > anymore since "ubuntu-latest" was upgraded to 22.04 a few weeks ago
> > without us noticing. "ssl=stock" is now a 3.0 branch. It broke all
> > stable branches below 2.6 because they need the deprecated SSL API.
> > I changed "ubuntu-latest" to "ubuntu-20.04" for those branches so it
> > works as before. I'm going to reintroduce "1.1.1" for master to 2.6 so
> > it is correctly tested again.
> >
> > In my opinion we need a similar mechanism for the distribution as for
> > the SSL libs. Maybe using "latest" only in dev branches and a fixed
> > version for stable branches will be enough.
> >
> > Regards,
> >
>
> Just thought about something: is it possible to have the versions in the
> job names? So we don't have surprises. For example the Ubuntu version
> which was resolved by "ubuntu-latest" and the SSL version of
> "ssl=stock": we could easily see the changes this way.
>
> --
> William Lallemand
>
From d3056da0e532914fca7ff0936be34d3df3e94602 Mon Sep 17 00:00:00 2001
From: Ilya Shipitsin 
Date: Mon, 12 Dec 2022 19:15:22 +0500
Subject: [PATCH] CI: split ssl lib selection based on git branch

When the *SSL_VERSION="latest" behaviour was introduced, it seemed fine
for development branches, but too intrusive for stable branches.

Let us limit the "latest" semantic to development builds only: if the
branch name contains "haproxy-", it is supposed to be a stable branch
and no latest OpenSSL should be taken.
---
 .github/matrix.py   | 10 --
 .github/workflows/vtest.yml |  2 +-
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/.github/matrix.py b/.github/matrix.py
index 98d0a1f2a..fd9491aee 100755
--- a/.github/matrix.py
+++ b/.github/matrix.py
@@ -15,12 +15,12 @@ import re
 from os import environ
 
 if len(sys.argv) == 2:
-    build_type = sys.argv[1]
+    ref_name = sys.argv[1]
 else:
-    print("Usage: {} <build_type>".format(sys.argv[0]), file=sys.stderr)
+    print("Usage: {} <ref_name>".format(sys.argv[0]), file=sys.stderr)
     sys.exit(1)
 
-print("Generating matrix for type '{}'.".format(build_type))
+print("Generating matrix for type '{}'.".format(ref_name))
 
 
 def clean_os(os):
@@ -129,11 +129,9 @@ for CC in ["gcc", "clang"]:
         "stock",
         "OPENSSL_VERSION=1.0.2u",
         "OPENSSL_VERSION=1.1.1s",
-        "OPENSSL_VERSION=latest",
-        "LIBRESSL_VERSION=latest",
         "QUICTLS=yes",
         #"BORINGSSL=yes",
-    ]:
+    ] + (["OPENSSL_VERSION=latest", "LIBRESSL_VERSION=latest"] if "haproxy-" not in ref_name else []):
         flags = ["USE_OPENSSL=1"]
         if ssl == "BORINGSSL=yes" or ssl == "QUICTLS=yes" or "LIBRESSL" in ssl:
             flags.append("USE_QUIC=1")
diff --git a/.github/workflows/vtest.yml b/.github/workflows/vtest.yml
index fb7b1d968..a7cdcc514 100644
--- a/.github/workflows/vtest.yml
+++ b/.github/workflows/vtest.yml
@@ -26,7 +26,7 @@ jobs:
       - uses: actions/checkout@v3
       - name: Generate Build Matrix
         id: set-matrix
-        run: python3 .github/matrix.py "${{ github.event_name }}"
+        run: python3 .github/matrix.py "${{ github.ref_name }}"
 
   # The Test job actually runs the tests.
   Test:
-- 
2.38.1



Re: Theoretical limits for a HAProxy instance

2022-12-12 Thread Jarno Huuskonen
Hi,

On Mon, 2022-12-12 at 09:47 +0100, Iago Alonso wrote:
> 

Can you share haproxy -vv output?

> HAProxy config:
> global
>     log /dev/log len 65535 local0 warning
>     chroot /var/lib/haproxy
>     stats socket /run/haproxy-admin.sock mode 660 level admin
>     user haproxy
>     group haproxy
>     daemon
>     maxconn 200
>     maxconnrate 2500
>     maxsslrate 2500

From your graphs (haproxy_process_current_ssl_rate /
haproxy_process_current_connection_rate) it looks like you might be
hitting maxconnrate/maxsslrate.
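
If that is the case, raising them for a test should move the plateau,
e.g. (illustrative values only):

  global
      maxconnrate 5000
      maxsslrate  5000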

-Jarno

-- 
Jarno Huuskonen


Vertical scaling of HAProxy instances

2022-12-12 Thread Iago Alonso
Hello,

We are trying to vertically scale our HAProxy instances, and we are
not getting the results that one would expect by upgrading the
hardware (assuming that the software can take advantage of the extra
resources).

We upgraded from machines with 16 threads to machines with 32 threads,
and we are only observing a 50% increase in the connections, rps, and
SSL rate we can sustain; we overload the server before we can reach
the rate one would expect.

I’ve recently posted about “Theoretical limits for a HAProxy
instance”, where I used the "Small" server as an example for the
limits we were observing; I am using the same metrics here. We
performed the same test on a bigger server with production traffic,
but raising the maxsslrate and maxconnrate from 2500 to 5000.

"Small" server specs:
CPU: AMD Ryzen 7 3700X 8-Core Processor (16 threads)
RAM: DDR4 64GB (2666 MT/s)

"Big" server specs:
CPU: AMD Ryzen 9 5950X 16-Core Processor (32 threads)
RAM: DDR4 128GB (2666 MT/s)

This is the post on Discourse where I shared some of our Prometheus
metrics:
https://discourse.haproxy.org/t/vertical-scaling-of-haproxy-instances/8190

We are wondering:
- Are these results expected?
- Does anyone with a similar setup/config get different results?

Thanks in advance.



Theoretical limits for a HAProxy instance

2022-12-12 Thread Iago Alonso
Hello,

We are performing a lot of load tests, and we hit what we think is an
artificial limit of some sort, or a parameter that we are not taking
into account (HAProxy config setting, kernel parameter…). We are
wondering if there’s a known limit on what HAProxy is able to process,
or if someone has experienced something similar, as we are thinking
about moving to bigger servers, and we don’t know if we will observe a
big difference.

When trying to perform the load test in production, we observe that we
can sustain 200k connections, and 10k rps, with a load1 of about 10.
The maxsslrate and maxsslconn are maxed out, but we handle the
requests fine, and we don’t return 5xx. Once we increase the load just
a bit and hit 11k rps and about 205k connections, we start to return
5xx and we rapidly decrease the load, as these are tests against
production.

Production server specs:
CPU: AMD Ryzen 7 3700X 8-Core Processor (16 threads)
RAM: DDR4 64GB (2666 MT/s)

When trying to perform a load test with synthetic tests using k6 as
our load generator against staging, we are able to sustain 750k
connections, with 20k rps. The load generator has a ramp-up time of
120s to achieve the 750k connections, as that’s what we are trying to
benchmark.

Staging server specs:
CPU: AMD Ryzen 5 3600 6-Core Processor (12 threads)
RAM: DDR4 64GB (3200 MT/s)

I've made a post about this on Discourse, and I got the suggestion to
post here. In said post, I've included screenshots of some of our
Prometheus metrics:
https://discourse.haproxy.org/t/theoretical-limits-for-a-haproxy-instance/8168

Custom kernel parameters:
net.ipv4.ip_local_port_range = "12768 60999"
net.nf_conntrack_max = 500
fs.nr_open = 500
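
These are applied with sysctl, for example:

  sysctl -w net.ipv4.ip_local_port_range="12768 60999"

or persisted under /etc/sysctl.d/ and loaded with "sysctl --system".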

HAProxy config:
global
log /dev/log len 65535 local0 warning
chroot /var/lib/haproxy
stats socket /run/haproxy-admin.sock mode 660 level admin
user haproxy
group haproxy
daemon
maxconn 200
maxconnrate 2500
maxsslrate 2500

defaults
log global
option  dontlognull
timeout connect 10s
timeout client  120s
timeout server  120s

frontend stats
mode http
bind *:8404
http-request use-service prometheus-exporter if { path /metrics }
stats enable
stats uri /stats
stats refresh 10s

frontend k8s-api
bind *:6443
mode tcp
option tcplog
timeout client 300s
default_backend k8s-api

backend k8s-api
mode tcp
option tcp-check
timeout server 300s
balance leastconn
default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s
maxconn 500 maxqueue 256 weight 100
server master01 x.x.x.x:6443 check
server master02 x.x.x.x:6443 check
server master03 x.x.x.x:6443 check
retries 0

frontend k8s-server
bind *:80
mode http
http-request add-header X-Forwarded-Proto http
http-request add-header X-Forwarded-Port 80
default_backend k8s-server

backend k8s-server
mode http
balance leastconn
option forwardfor
default-server inter 10s downinter 5s rise 2 fall 2 check
server worker01a x.x.x.x:31551 maxconn 20
server worker02a x.x.x.x:31551 maxconn 20
server worker03a x.x.x.x:31551 maxconn 20
server worker04a x.x.x.x:31551 maxconn 20
server worker05a x.x.x.x:31551 maxconn 20
server worker06a x.x.x.x:31551 maxconn 20
server worker07a x.x.x.x:31551 maxconn 20
server worker08a x.x.x.x:31551 maxconn 20
server worker09a x.x.x.x:31551 maxconn 20
server worker10a x.x.x.x:31551 maxconn 20
server worker11a x.x.x.x:31551 maxconn 20
server worker12a x.x.x.x:31551 maxconn 20
server worker13a x.x.x.x:31551 maxconn 20
server worker14a x.x.x.x:31551 maxconn 20
server worker15a x.x.x.x:31551 maxconn 20
server worker16a x.x.x.x:31551 maxconn 20
server worker17a x.x.x.x:31551 maxconn 20
server worker18a x.x.x.x:31551 maxconn 20
server worker19a x.x.x.x:31551 maxconn 20
server worker20a x.x.x.x:31551 maxconn 20
server worker01an x.x.x.x:31551 maxconn 20
server worker02an x.x.x.x:31551 maxconn 20
server worker03an x.x.x.x:31551 maxconn 20
retries 0

frontend k8s-server-https
bind *:443 ssl crt /etc/haproxy/certs/
mode http
http-request add-header X-Forwarded-Proto https
http-request add-header X-Forwarded-Port 443
http-request del-header X-SERVER-SNI
http-request set-header X-SERVER-SNI %[ssl_fc_sni] if { ssl_fc_sni
-m found }
http-request set-var(txn.fc_sni) hdr(X-SERVER-SNI) if {
hdr(X-SERVER-SNI) -m found }
http-request del-header X-SERVER-SNI
default_backend k8s-server-https

backend k8s-server-https
mode http
balance leastconn
option forwardfor
default-server inter 10s downinter 5s rise 2 fall 2  check no-check-ssl
server worker01a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt
sni var(txn.fc_sni) maxconn 20
server worker02a x.x.x.x:31445 ssl 
