Re: Followup on openssl 3.0 note seen in another thread

2023-05-29 Thread Shawn Heisey

On 5/29/23 20:38, Willy Tarreau wrote:

Have you verified that the CPU is saturated?


The CPU on the machine running the test settles at about 1800 percent 
for my test program.  That machine has 12 real cores, hyperthreaded.


The CPU on the frontend haproxy process is barely breathing hard.  I 
never saw it get above 150%.  That server has 24 real cores.


The CPU on the backend haproxy running on the raspberry pi hovers 
between 250 and 280%.  It's a 3B, so it has four CPU cores.


Those CPU values were gathered with the test program running 24 threads 
against quictls 1.1.1t.  With 200 threads, the CPU usage on all 3 
systems is even lower.


So I would say I am not saturating the CPU.  I need a different test 
methodology ... this Java program is not really doing much to haproxy.



With neither keep-alive nor TLS resume, you should see roughly 1000 connections
per second per core, and with TLS resume roughly 4000 conns/s per core. So
with 12 cores you should see 12000 or 48000 conns/s, depending on whether
you're doing full rekey or TLS resume.


It's doing whatever Apache's HttpClient does with Java's TLS.  I know 
it's not doing keepalive; I explicitly pass the Connection: close header. 
I do not know whether it uses TLS resume, and I do not know how to 
discover that info.
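One way to see whether a server is actually granting session resumption, independent of the Java client, is openssl's s_client; a sketch (the hostname is a placeholder, and this only shows what the haproxy side offers, not what HttpClient negotiates):

```shell
# -reconnect performs the initial handshake plus five reconnects and
# prints a "New" or "Reused" line for each session.  Counting the
# "Reused" lines shows whether resumption is happening at all.
openssl s_client -connect example.com:443 -reconnect < /dev/null 2>/dev/null \
  | grep -cE '^Reused,'
```

On the Java side, running the client with `-Djavax.net.debug=ssl:session` should log session creation and resumption, though the output is verbose.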


I'm not seeing anywhere near that connection rate.  Not even with an 
haproxy backend.



Hmmm, are you sure you didn't build the client with OpenSSL 3.0? I'm asking
because that was our first concern when we tested the perf on Intel's SPR
machine. No way to go beyond 400 conn/s, with haproxy totally idle and the
client at 100% on 48 cores... The cause was OpenSSL 3. Rebuilding under 1.1.1
jumped it to 74000, almost 200 times more!


The client is a Java program running on Java 11, with nothing configured 
to make it use anything but Java's built-in TLS.  It should not be using 
any version of OpenSSL.



https://asciinema.elyograg.org/haproxyssltest1.html


Hmmm host not found here.


Oops.  I did not get that name in my public DNS.  Fixed.  The run it 
shows is from earlier, before I set up a backend running haproxy.  That 
run is using 200 threads.  When it ends, it reports the connection rate 
at 244.69 per second.


Thanks,
Shawn



Re: Followup on openssl 3.0 note seen in another thread

2023-05-29 Thread Willy Tarreau
On Sat, May 27, 2023 at 02:56:39PM -0600, Shawn Heisey wrote:
> On 5/27/23 02:59, Willy Tarreau wrote:
> > The little difference makes me think you've sent your requests over
> > a keep-alive connection, which is fine, but which doesn't stress the
> > TLS stack anymore.
> 
> Yup.  It was using keepalive.  I turned keepalive off and repeated the
> tests.
> 
> I'm still not seeing a notable difference between the branches, so I have to
> wonder whether I need a completely different test.  Or whether I simply
> don't need to worry about it at all because my traffic needs are so small.

Have you verified that the CPU is saturated?

> Requests per second is down around 60 instead of 1200, and the request time
> percentile values went up.

At such a low performance it's unlikely that you could hurt the CPU at all,
I suspect the limiting factor is the load generator (or there's something
else).

> I've included two runs per branch here: 24
> threads, each doing 1000 requests.  The haproxy logs indicate the page I'm
> hitting returns 829 bytes, while the actual index.html is 1187 bytes.  I
> think gzip compression and the HTTP headers explain the difference.
> Without keepalive, the overall test takes a lot longer, which is not
> surprising.

With neither keep-alive nor TLS resume, you should see roughly 1000 connections
per second per core, and with TLS resume roughly 4000 conns/s per core. So
with 12 cores you should see 12000 or 48000 conns/s, depending on whether
you're doing full rekey or TLS resume.

Hmmm are you sure you didn't build the client with OpenSSL 3.0 ? I'm asking
because that was our first concern when we tested the perf on Intel's SPR
machine. No way to go beyond 400 conn/s, with haproxy totally idle and the
client at 100% on 48 cores... The cause was OpenSSL 3. Rebuilding under 1.1.1
jumped to 74000, almost 200 times more!

> The high percentiles are not encouraging.  7 seconds to get a web page under
> 1 kB, even with 1.1.1t?
> 
> This might be interesting to someone:
> 
> https://asciinema.elyograg.org/haproxyssltest1.html

Hmmm host not found here.

> I put the project in github.
> 
> https://github.com/elyograg/haproxytestssl

I'm seeing everything being done in doGet() but I have no idea about
the overhead of the allocations there, nor the cost of the lower layers.
Maybe there's even some DNS resolution involved, I don't know. That's
exactly what I don't like about such languages: they come with tons of
pre-defined functions to do whatever, but you have no idea how they do
it, so in the end you don't know what you're testing.

Please do me a favor and verify two things:
  - check the CPU usage using "top" on the haproxy machine during the
test
  - check the CPU usage using "top" on the load generator machine during
the test

Until you reach 100% on haproxy you're measuring something else. Please
do a comparative check using h1load from a machine having openssl 1.1.1
(e.g. ubuntu 20):

  git clone https://github.com/wtarreau/h1load/
  cd h1load
  make -j
  ./h1load -t $(nproc) -c 240 -r 1 --tls-reuse https://hostname/path

This will create 240 concurrent connections to the server, without
keep-alive (-r 1 = 1 request per connection), with TLS session
resume, and using as many threads as you have CPU cores. You'll
see the number of connections per second in the cps column, and the
number of requests per second in the rps column. On the left column
you'll see the instant number of connections, and on the right you'll
see the response time in milliseconds. And please do check that this
time the CPU is saturated either on haproxy or on the client. If you
have some network latency between the two, you may need to increase
the number of connections. You can drop "-r 1" if you want to test
with keep-alive. Or you can drop --tls-reuse if you want to test the
rekeying performance (for sites that take many new clients making
few requests). You can also limit the total number of requests using
"-n 24000" for example. It's best to make that number an integral
multiple of the number of connections; this is not mandatory, but it's
cleaner. Similarly, it's better if the number of connections (-c) is an
integral multiple of the number of threads (-t) so that each thread is
equally loaded.
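Putting those flags together, a capped run might look like this (a sketch; the URL is a placeholder, and the counts are chosen so each value divides the next evenly as suggested above):

```shell
THREADS=24      # -t: one thread per CPU core on the load generator
CONNS=240       # -c: concurrent connections, a multiple of THREADS
REQUESTS=24000  # -n: total request cap, a multiple of CONNS
# Both remainders should be 0 so the load is spread evenly:
echo $(( CONNS % THREADS )) $(( REQUESTS % CONNS ))   # prints: 0 0
# ./h1load -t "$THREADS" -c "$CONNS" -r 1 -n "$REQUESTS" --tls-reuse https://hostname/path
```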

Willy



Re: Followup on openssl 3.0 note seen in another thread

2023-05-29 Thread Shawn Heisey

On 5/29/23 01:43, Aleksandar Lazic wrote:

HAProxies FE => HAProxies BE => Destination Servers

Where the Destination Servers are also HAProxies that just return static 
content, or any other high-performance low-latency HTTPS server.

With such a setup, you can also test the client mode of OpenSSL.


Oops.  Mistype sent that message before I could finish it.

Interesting idea.

I set up haproxy on a raspberry pi and configured it to serve a static web 
page over https.  I'm running the same version of haproxy, built with the 
same version of quictls, on both the main server and the raspi.


https://raspi1.elyograg.org

Side note: compiling and installing quictls and haproxy is a lot slower 
on a raspberry pi than on a Dell server: 84 seconds on the Dell server 
and 2591 seconds on the pi.  Make gets 12 threads on the server, 2 on 
the pi ... I give it half the physical core count, with a minimum of 2.
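That job-count rule ("half the physical cores, but at least 2") is easy to express as a helper; a sketch, where `make_jobs` is my own name for it:

```shell
# jobs = physical_cores / 2, floored, but never fewer than 2.
make_jobs() {
  local cores=$1
  local jobs=$(( cores / 2 ))
  [ "$jobs" -lt 2 ] && jobs=2
  echo "$jobs"
}
make_jobs 24   # prints 12 (the Dell server)
make_jobs 4    # prints 2  (the pi)
# typical use: make -j"$(make_jobs N_PHYSICAL_CORES)"
```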


It took a while to get this info due to the slow compile speeds on the 
pi.  I wish build systems could give me an accurate estimate of how far 
done the build is.  The quictls one doesn't say ANYTHING.


The requests are taking more time in general.  This is due to another 
round trip (including TLS) from the server to the raspberry pi that did 
not occur before.  With the other URL, it was forwarding to Apache on 
the same server, port 81 without TLS.


I still wouldn't call it a smoking gun, but this test shows evidence of 
1.1 handling the concurrency better than 3.0.


1.1.1t:
20:31:21.177 [main] INFO  o.e.t.h.MainSSLTest Count 24000 310.31/s
20:31:21.177 [main] INFO  o.e.t.h.MainSSLTest 10th % 53 ms
20:31:21.178 [main] INFO  o.e.t.h.MainSSLTest 25th % 60 ms
20:31:21.178 [main] INFO  o.e.t.h.MainSSLTest Median 69 ms
20:31:21.178 [main] INFO  o.e.t.h.MainSSLTest 75th % 81 ms
20:31:21.178 [main] INFO  o.e.t.h.MainSSLTest 95th % 125 ms
20:31:21.178 [main] INFO  o.e.t.h.MainSSLTest 99th % 163 ms
20:31:21.178 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 633 ms

3.0.8:
19:22:12.281 [main] INFO  o.e.t.h.MainSSLTest Count 24000 290.48/s
19:22:12.281 [main] INFO  o.e.t.h.MainSSLTest 10th % 59 ms
19:22:12.281 [main] INFO  o.e.t.h.MainSSLTest 25th % 66 ms
19:22:12.282 [main] INFO  o.e.t.h.MainSSLTest Median 75 ms
19:22:12.282 [main] INFO  o.e.t.h.MainSSLTest 75th % 87 ms
19:22:12.282 [main] INFO  o.e.t.h.MainSSLTest 95th % 123 ms
19:22:12.282 [main] INFO  o.e.t.h.MainSSLTest 99th % 161 ms
19:22:12.282 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 1004 ms

3.1.0+locks:
The quictls compile failed on the pi.  So I couldn't test this one.  I 
suppose I could have done it without TLS, but I didn't do that.  Here's 
the log from the compile:


/usr/bin/ld: unknown architecture of input file `libcrypto.a(libdefault-lib-pbkdf2_fips.o)' is incompatible with aarch64 output
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:22146: fuzz/cmp-test] Error 1
make[1]: *** Waiting for unfinished jobs
/usr/bin/ld: unknown architecture of input file `libcrypto.a(libdefault-lib-pbkdf2_fips.o)' is incompatible with aarch64 output
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:22270: fuzz/punycode-test] Error 1
make: *** [Makefile:3278: build_sw] Error 2

I wonder why that happened.  1.1.1t and 3.0.8 compiled just fine.  All 
three work on x86_64.


I should set up my third server to serve the static page from haproxy. 
It's x86_64.  Maybe when I find all that free time I am looking for!


Slightly interesting detail, not sure what it means:  The backend for 
haproxy on the pi shows L6OK on the stats page instead of L7OK like all 
the other backends.


Thanks,
Shawn



Re: Followup on openssl 3.0 note seen in another thread

2023-05-29 Thread Shawn Heisey

On 5/29/23 19:52, Shawn Heisey wrote:

Interesting idea.


So sorry.  I was writing up the new reply and my fingers got confused 
for a moment; I accidentally hit Ctrl-Enter, which tells Thunderbird to 
send the message.  I will send a new, complete reply.




Re: Followup on openssl 3.0 note seen in another thread

2023-05-29 Thread Shawn Heisey

On 5/29/23 01:43, Aleksandar Lazic wrote:

HAProxies FE => HAProxies BE => Destination Servers

Where the Destination Servers are also HAProxies that just return static 
content, or any other high-performance low-latency HTTPS server.

With such a setup, you can also test the client mode of OpenSSL.


Interesting idea.

I set up haproxy on a raspberry pi and configured it to serve a static web 
page over https.  I'm running the same version of haproxy, built with the 
same version of quictls, on both the main server and the raspi.


https://raspi1.elyograg.org

Side note: compiling and installing quictls and haproxy is a lot slower 
on a raspberry pi than on a Dell server: 84 seconds on the Dell server 
and 2591 seconds on the pi.  Make gets 12 threads on the server, 2 on 
the pi ... I give it half the physical core count, with a minimum of 2.


It took a while to get this info due to the slow compile speeds on the 
pi.  I wish build systems could give me an accurate estimate of how far 
done the build is.  The quictls one doesn't say ANYTHING.


The requests are taking more time in general.  This is due to another 
round trip (including TLS) from the server to the raspberry pi that did 
not occur before.  With the other URL, it was forwarding to Apache on 
the same server, port 81 without TLS.


1.1.1t:

3.0.8:
19:22:12.281 [main] INFO  o.e.t.h.MainSSLTest Count 24000 290.48/s
19:22:12.281 [main] INFO  o.e.t.h.MainSSLTest 10th % 59 ms
19:22:12.281 [main] INFO  o.e.t.h.MainSSLTest 25th % 66 ms
19:22:12.282 [main] INFO  o.e.t.h.MainSSLTest Median 75 ms
19:22:12.282 [main] INFO  o.e.t.h.MainSSLTest 75th % 87 ms
19:22:12.282 [main] INFO  o.e.t.h.MainSSLTest 95th % 123 ms
19:22:12.282 [main] INFO  o.e.t.h.MainSSLTest 99th % 161 ms
19:22:12.282 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 1004 ms

3.1.0+locks:

Couldn't do this one.  Compile fails:





Re: Followup on openssl 3.0 note seen in another thread

2023-05-29 Thread Aleksandar Lazic

Hi Shawn.

On 2023-05-28 (Sun.) 05:30, Shawn Heisey wrote:

On 5/27/23 18:03, Shawn Heisey wrote:

On 5/27/23 14:56, Shawn Heisey wrote:
Yup.  It was using keepalive.  I turned keepalive off and repeated 
the tests.


I did the tests again with 200 threads.  The system running the tests 
has 12 hyperthreaded cores, so this definitely pushes its capabilities.


I had forgotten a crucial fact that means all my prior testing work was 
invalid:  Apache HttpClient 4.x defaults to a max simultaneous 
connection count of 2.  Not going to exercise concurrency with that!


I have increased that to 1024, my program's max thread count, and now 
the test is a LOT faster ... it's actually running 200 threads at the 
same time.  Two runs per branch here, one with 200 threads and one with 
24 threads.


Still no smoking gun showing 3.0 as the slowest of the bunch.  In fact, 
3.0 is giving the best results!  So my test method is still probably the 
wrong approach.


Maybe you can change the setup in this way:

HAProxies FE => HAProxies BE => Destination Servers

Where the Destination Servers are also HAProxies that just return static 
content, or any other high-performance low-latency HTTPS server.

With such a setup, you can also test the client mode of OpenSSL.

Regards
Alex


1.1.1t:
21:06:45.388 [main] INFO  o.e.t.h.MainSSLTest Count 200000 234.54/s
21:06:45.388 [main] INFO  o.e.t.h.MainSSLTest 10th % 54 ms
21:06:45.388 [main] INFO  o.e.t.h.MainSSLTest 25th % 94 ms
21:06:45.389 [main] INFO  o.e.t.h.MainSSLTest Median 188 ms
21:06:45.389 [main] INFO  o.e.t.h.MainSSLTest 75th % 991 ms
21:06:45.389 [main] INFO  o.e.t.h.MainSSLTest 95th % 3698 ms
21:06:45.389 [main] INFO  o.e.t.h.MainSSLTest 99th % 6924 ms
21:06:45.390 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 11983 ms
-
21:20:35.400 [main] INFO  o.e.t.h.MainSSLTest Count 24000 355.56/s
21:20:35.400 [main] INFO  o.e.t.h.MainSSLTest 10th % 40 ms
21:20:35.401 [main] INFO  o.e.t.h.MainSSLTest 25th % 46 ms
21:20:35.401 [main] INFO  o.e.t.h.MainSSLTest Median 57 ms
21:20:35.401 [main] INFO  o.e.t.h.MainSSLTest 75th % 71 ms
21:20:35.401 [main] INFO  o.e.t.h.MainSSLTest 95th % 126 ms
21:20:35.401 [main] INFO  o.e.t.h.MainSSLTest 99th % 168 ms
21:20:35.401 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 721 ms

3.0.8:
20:50:12.916 [main] INFO  o.e.t.h.MainSSLTest Count 200000 244.69/s
20:50:12.917 [main] INFO  o.e.t.h.MainSSLTest 10th % 56 ms
20:50:12.917 [main] INFO  o.e.t.h.MainSSLTest 25th % 93 ms
20:50:12.917 [main] INFO  o.e.t.h.MainSSLTest Median 197 ms
20:50:12.917 [main] INFO  o.e.t.h.MainSSLTest 75th % 949 ms
20:50:12.918 [main] INFO  o.e.t.h.MainSSLTest 95th % 3425 ms
20:50:12.918 [main] INFO  o.e.t.h.MainSSLTest 99th % 6679 ms
20:50:12.918 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 11582 ms
-
21:23:22.076 [main] INFO  o.e.t.h.MainSSLTest Count 24000 404.78/s
21:23:22.077 [main] INFO  o.e.t.h.MainSSLTest 10th % 40 ms
21:23:22.077 [main] INFO  o.e.t.h.MainSSLTest 25th % 45 ms
21:23:22.077 [main] INFO  o.e.t.h.MainSSLTest Median 53 ms
21:23:22.077 [main] INFO  o.e.t.h.MainSSLTest 75th % 63 ms
21:23:22.077 [main] INFO  o.e.t.h.MainSSLTest 95th % 90 ms
21:23:22.077 [main] INFO  o.e.t.h.MainSSLTest 99th % 121 ms
21:23:22.078 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 671 ms

3.1.0+locks:
20:33:32.805 [main] INFO  o.e.t.h.MainSSLTest Count 200000 238.02/s
20:33:32.806 [main] INFO  o.e.t.h.MainSSLTest 10th % 58 ms
20:33:32.806 [main] INFO  o.e.t.h.MainSSLTest 25th % 95 ms
20:33:32.806 [main] INFO  o.e.t.h.MainSSLTest Median 196 ms
20:33:32.806 [main] INFO  o.e.t.h.MainSSLTest 75th % 1001 ms
20:33:32.807 [main] INFO  o.e.t.h.MainSSLTest 95th % 3475 ms
20:33:32.807 [main] INFO  o.e.t.h.MainSSLTest 99th % 6288 ms
20:33:32.807 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 10700 ms
-
21:26:24.555 [main] INFO  o.e.t.h.MainSSLTest Count 24000 402.89/s
21:26:24.556 [main] INFO  o.e.t.h.MainSSLTest 10th % 39 ms
21:26:24.556 [main] INFO  o.e.t.h.MainSSLTest 25th % 45 ms
21:26:24.556 [main] INFO  o.e.t.h.MainSSLTest Median 52 ms
21:26:24.556 [main] INFO  o.e.t.h.MainSSLTest 75th % 64 ms
21:26:24.556 [main] INFO  o.e.t.h.MainSSLTest 95th % 93 ms
21:26:24.556 [main] INFO  o.e.t.h.MainSSLTest 99th % 127 ms
21:26:24.557 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 689 ms





Re: Followup on openssl 3.0 note seen in another thread

2023-05-27 Thread Shawn Heisey

On 5/27/23 18:03, Shawn Heisey wrote:

On 5/27/23 14:56, Shawn Heisey wrote:
Yup.  It was using keepalive.  I turned keepalive off and repeated the 
tests.


I did the tests again with 200 threads.  The system running the tests 
has 12 hyperthreaded cores, so this definitely pushes its capabilities.


I had forgotten a crucial fact that means all my prior testing work was 
invalid:  Apache HttpClient 4.x defaults to a max simultaneous 
connection count of 2.  Not going to exercise concurrency with that!


I have increased that to 1024, my program's max thread count, and now 
the test is a LOT faster ... it's actually running 200 threads at the 
same time.  Two runs per branch here, one with 200 threads and one with 
24 threads.


Still no smoking gun showing 3.0 as the slowest of the bunch.  In fact, 
3.0 is giving the best results!  So my test method is still probably the 
wrong approach.



1.1.1t:
21:06:45.388 [main] INFO  o.e.t.h.MainSSLTest Count 200000 234.54/s
21:06:45.388 [main] INFO  o.e.t.h.MainSSLTest 10th % 54 ms
21:06:45.388 [main] INFO  o.e.t.h.MainSSLTest 25th % 94 ms
21:06:45.389 [main] INFO  o.e.t.h.MainSSLTest Median 188 ms
21:06:45.389 [main] INFO  o.e.t.h.MainSSLTest 75th % 991 ms
21:06:45.389 [main] INFO  o.e.t.h.MainSSLTest 95th % 3698 ms
21:06:45.389 [main] INFO  o.e.t.h.MainSSLTest 99th % 6924 ms
21:06:45.390 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 11983 ms
-
21:20:35.400 [main] INFO  o.e.t.h.MainSSLTest Count 24000 355.56/s
21:20:35.400 [main] INFO  o.e.t.h.MainSSLTest 10th % 40 ms
21:20:35.401 [main] INFO  o.e.t.h.MainSSLTest 25th % 46 ms
21:20:35.401 [main] INFO  o.e.t.h.MainSSLTest Median 57 ms
21:20:35.401 [main] INFO  o.e.t.h.MainSSLTest 75th % 71 ms
21:20:35.401 [main] INFO  o.e.t.h.MainSSLTest 95th % 126 ms
21:20:35.401 [main] INFO  o.e.t.h.MainSSLTest 99th % 168 ms
21:20:35.401 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 721 ms

3.0.8:
20:50:12.916 [main] INFO  o.e.t.h.MainSSLTest Count 200000 244.69/s
20:50:12.917 [main] INFO  o.e.t.h.MainSSLTest 10th % 56 ms
20:50:12.917 [main] INFO  o.e.t.h.MainSSLTest 25th % 93 ms
20:50:12.917 [main] INFO  o.e.t.h.MainSSLTest Median 197 ms
20:50:12.917 [main] INFO  o.e.t.h.MainSSLTest 75th % 949 ms
20:50:12.918 [main] INFO  o.e.t.h.MainSSLTest 95th % 3425 ms
20:50:12.918 [main] INFO  o.e.t.h.MainSSLTest 99th % 6679 ms
20:50:12.918 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 11582 ms
-
21:23:22.076 [main] INFO  o.e.t.h.MainSSLTest Count 24000 404.78/s
21:23:22.077 [main] INFO  o.e.t.h.MainSSLTest 10th % 40 ms
21:23:22.077 [main] INFO  o.e.t.h.MainSSLTest 25th % 45 ms
21:23:22.077 [main] INFO  o.e.t.h.MainSSLTest Median 53 ms
21:23:22.077 [main] INFO  o.e.t.h.MainSSLTest 75th % 63 ms
21:23:22.077 [main] INFO  o.e.t.h.MainSSLTest 95th % 90 ms
21:23:22.077 [main] INFO  o.e.t.h.MainSSLTest 99th % 121 ms
21:23:22.078 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 671 ms

3.1.0+locks:
20:33:32.805 [main] INFO  o.e.t.h.MainSSLTest Count 200000 238.02/s
20:33:32.806 [main] INFO  o.e.t.h.MainSSLTest 10th % 58 ms
20:33:32.806 [main] INFO  o.e.t.h.MainSSLTest 25th % 95 ms
20:33:32.806 [main] INFO  o.e.t.h.MainSSLTest Median 196 ms
20:33:32.806 [main] INFO  o.e.t.h.MainSSLTest 75th % 1001 ms
20:33:32.807 [main] INFO  o.e.t.h.MainSSLTest 95th % 3475 ms
20:33:32.807 [main] INFO  o.e.t.h.MainSSLTest 99th % 6288 ms
20:33:32.807 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 10700 ms
-
21:26:24.555 [main] INFO  o.e.t.h.MainSSLTest Count 24000 402.89/s
21:26:24.556 [main] INFO  o.e.t.h.MainSSLTest 10th % 39 ms
21:26:24.556 [main] INFO  o.e.t.h.MainSSLTest 25th % 45 ms
21:26:24.556 [main] INFO  o.e.t.h.MainSSLTest Median 52 ms
21:26:24.556 [main] INFO  o.e.t.h.MainSSLTest 75th % 64 ms
21:26:24.556 [main] INFO  o.e.t.h.MainSSLTest 95th % 93 ms
21:26:24.556 [main] INFO  o.e.t.h.MainSSLTest 99th % 127 ms
21:26:24.557 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 689 ms



Re: Followup on openssl 3.0 note seen in another thread

2023-05-27 Thread Shawn Heisey

On 5/27/23 14:56, Shawn Heisey wrote:
Yup.  It was using keepalive.  I turned keepalive off and repeated the 
tests.


I did the tests again with 200 threads.  The system running the tests 
has 12 hyperthreaded cores, so this definitely pushes its capabilities.


The system running haproxy has 24 hyperthreaded cores.  There is no 
thread or process info in haproxy.cfg.


200 threads takes so long to run that I didn't do multiple runs per 
branch.  Any inconsistencies created by the fact that haproxy has just 
been restarted will hopefully be leveled out due to how long the run takes.


The request times for 200 threads vs. 24 threads show that the speed 
went down.  I think I have definitely saturated the test system, and 
hopefully also the haproxy server.  Still no smoking gun showing the 
lock problems in 3.0.  I had hoped that would be apparent.


1.1.1t:
15:52:18.666 [main] INFO  o.e.t.h.MainSSLTest Count 200000 56.82/s
15:52:18.668 [main] INFO  o.e.t.h.MainSSLTest 10th % 31 ms
15:52:18.668 [main] INFO  o.e.t.h.MainSSLTest 25th % 47 ms
15:52:18.668 [main] INFO  o.e.t.h.MainSSLTest Median 994 ms
15:52:18.669 [main] INFO  o.e.t.h.MainSSLTest 75th % 4953 ms
15:52:18.669 [main] INFO  o.e.t.h.MainSSLTest 95th % 14205 ms
15:52:18.669 [main] INFO  o.e.t.h.MainSSLTest 99th % 23581 ms
15:52:18.669 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 37396 ms

3.0.8:
16:59:03.645 [main] INFO  o.e.t.h.MainSSLTest Count 200000 58.34/s
16:59:03.647 [main] INFO  o.e.t.h.MainSSLTest 10th % 30 ms
16:59:03.648 [main] INFO  o.e.t.h.MainSSLTest 25th % 35 ms
16:59:03.648 [main] INFO  o.e.t.h.MainSSLTest Median 368 ms
16:59:03.648 [main] INFO  o.e.t.h.MainSSLTest 75th % 4606 ms
16:59:03.648 [main] INFO  o.e.t.h.MainSSLTest 95th % 14840 ms
16:59:03.649 [main] INFO  o.e.t.h.MainSSLTest 99th % 25561 ms
16:59:03.649 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 40826 ms

3.1.0+locks:
18:01:04.198 [main] INFO  o.e.t.h.MainSSLTest Count 200000 56.69/s
18:01:04.198 [main] INFO  o.e.t.h.MainSSLTest 10th % 31 ms
18:01:04.198 [main] INFO  o.e.t.h.MainSSLTest 25th % 39 ms
18:01:04.199 [main] INFO  o.e.t.h.MainSSLTest Median 455 ms
18:01:04.199 [main] INFO  o.e.t.h.MainSSLTest 75th % 4759 ms
18:01:04.199 [main] INFO  o.e.t.h.MainSSLTest 95th % 15071 ms
18:01:04.199 [main] INFO  o.e.t.h.MainSSLTest 99th % 25729 ms
18:01:04.200 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 41308 ms



Re: Followup on openssl 3.0 note seen in another thread

2023-05-27 Thread Shawn Heisey

On 5/27/23 02:59, Willy Tarreau wrote:

The little difference makes me think you've sent your requests over
a keep-alive connection, which is fine, but which doesn't stress the
TLS stack anymore.


Yup.  It was using keepalive.  I turned keepalive off and repeated the 
tests.


I'm still not seeing a notable difference between the branches, so I 
have to wonder whether I need a completely different test.  Or whether I 
simply don't need to worry about it at all because my traffic needs are 
so small.


Requests per second is down around 60 instead of 1200, and the request 
time percentile values went up.  I've included two runs per branch here: 
24 threads, each doing 1000 requests.  The haproxy logs indicate the 
page I'm hitting returns 829 bytes, while the actual index.html is 1187 
bytes.  I think gzip compression and the HTTP headers explain the 
difference.  Without keepalive, the overall test takes a lot longer, 
which is not surprising.


The high percentiles are not encouraging.  7 seconds to get a web page 
under 1 kB, even with 1.1.1t?


This might be interesting to someone:

https://asciinema.elyograg.org/haproxyssltest1.html

I put the project in github.

https://github.com/elyograg/haproxytestssl

quictls branch: OpenSSL_1_1_1t+quic
14:15:57.496 [main] INFO  o.e.t.h.MainSSLTest Count 24000 64.65/s
14:15:57.498 [main] INFO  o.e.t.h.MainSSLTest 10th % 28 ms
14:15:57.499 [main] INFO  o.e.t.h.MainSSLTest 25th % 28 ms
14:15:57.499 [main] INFO  o.e.t.h.MainSSLTest Median 31 ms
14:15:57.499 [main] INFO  o.e.t.h.MainSSLTest 75th % 65 ms
14:15:57.500 [main] INFO  o.e.t.h.MainSSLTest 95th % 2690 ms
14:15:57.500 [main] INFO  o.e.t.h.MainSSLTest 99th % 5058 ms
14:15:57.500 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 9342 ms
-
14:22:19.922 [main] INFO  o.e.t.h.MainSSLTest Count 24000 65.39/s
14:22:19.924 [main] INFO  o.e.t.h.MainSSLTest 10th % 28 ms
14:22:19.924 [main] INFO  o.e.t.h.MainSSLTest 25th % 28 ms
14:22:19.924 [main] INFO  o.e.t.h.MainSSLTest Median 31 ms
14:22:19.925 [main] INFO  o.e.t.h.MainSSLTest 75th % 62 ms
14:22:19.925 [main] INFO  o.e.t.h.MainSSLTest 95th % 2683 ms
14:22:19.925 [main] INFO  o.e.t.h.MainSSLTest 99th % 4978 ms
14:22:19.925 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 7291 ms

quictls branch: openssl-3.1.0+quic+locks
13:15:28.901 [main] INFO  o.e.t.h.MainSSLTest Count 24000 63.43/s
13:15:28.903 [main] INFO  o.e.t.h.MainSSLTest 10th % 29 ms
13:15:28.903 [main] INFO  o.e.t.h.MainSSLTest 25th % 29 ms
13:15:28.903 [main] INFO  o.e.t.h.MainSSLTest Median 32 ms
13:15:28.904 [main] INFO  o.e.t.h.MainSSLTest 75th % 66 ms
13:15:28.904 [main] INFO  o.e.t.h.MainSSLTest 95th % 2660 ms
13:15:28.904 [main] INFO  o.e.t.h.MainSSLTest 99th % 4879 ms
13:15:28.905 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 9241 ms
-
13:23:15.119 [main] INFO  o.e.t.h.MainSSLTest Count 24000 62.99/s
13:23:15.121 [main] INFO  o.e.t.h.MainSSLTest 10th % 29 ms
13:23:15.122 [main] INFO  o.e.t.h.MainSSLTest 25th % 29 ms
13:23:15.122 [main] INFO  o.e.t.h.MainSSLTest Median 32 ms
13:23:15.122 [main] INFO  o.e.t.h.MainSSLTest 75th % 61 ms
13:23:15.123 [main] INFO  o.e.t.h.MainSSLTest 95th % 2275 ms
13:23:15.123 [main] INFO  o.e.t.h.MainSSLTest 99th % 6189 ms
13:23:15.123 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 11406 ms

quictls branch: openssl-3.0.8+quic
13:34:25.780 [main] INFO  o.e.t.h.MainSSLTest Count 24000 64.57/s
13:34:25.783 [main] INFO  o.e.t.h.MainSSLTest 10th % 28 ms
13:34:25.783 [main] INFO  o.e.t.h.MainSSLTest 25th % 28 ms
13:34:25.783 [main] INFO  o.e.t.h.MainSSLTest Median 33 ms
13:34:25.783 [main] INFO  o.e.t.h.MainSSLTest 75th % 66 ms
13:34:25.784 [main] INFO  o.e.t.h.MainSSLTest 95th % 2642 ms
13:34:25.784 [main] INFO  o.e.t.h.MainSSLTest 99th % 4994 ms
13:34:25.784 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 7503 ms
-
14:08:33.750 [main] INFO  o.e.t.h.MainSSLTest Count 24000 63.06/s
14:08:33.753 [main] INFO  o.e.t.h.MainSSLTest 10th % 28 ms
14:08:33.753 [main] INFO  o.e.t.h.MainSSLTest 25th % 29 ms
14:08:33.754 [main] INFO  o.e.t.h.MainSSLTest Median 33 ms
14:08:33.754 [main] INFO  o.e.t.h.MainSSLTest 75th % 64 ms
14:08:33.754 [main] INFO  o.e.t.h.MainSSLTest 95th % 2904 ms
14:08:33.754 [main] INFO  o.e.t.h.MainSSLTest 99th % 5216 ms
14:08:33.755 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 8287 ms



Re: Followup on openssl 3.0 note seen in another thread

2023-05-27 Thread Willy Tarreau
Hi Shawn,

On Fri, May 26, 2023 at 11:17:15PM -0600, Shawn Heisey wrote:
> On 5/25/23 09:08, Willy Tarreau wrote:
> > The problem definitely is concurrency, so 1000 curl will show nothing
> > and will not even match production traffic. You'll need to use a load
> > generator that allows you to tweak the TLS resume support, like we do
> > with h1load's argument "--tls-reuse". Also I don't know how often the
> > recently modified locks are used per server connection and per client
> > connection, that's what the SSL guys want to know since they're not able
> > to test their changes.
> 
> I finally got a test program together.  After trying and failing with the
> Jetty HttpClient and Apache HttpClient version 5 (both options that would
> have let me do HTTP/2) I got a program together with Apache HttpClient
> version 4.  I had one version that shelled out to curl, but it ran about ten
> times slower.
> 
> I know lots of people are going to have bad things to say about writing a
> test in Java.  It's the only language where I already know how to write
> multi-threaded code.

:-)

> I would have to spend a bunch of time learning how to
> do that in another language.

For h2 there's h2load that is available but it doesn't allow you to close
and re-open connections.

> It fires up X threads, each of which make 1000 consecutive requests to the
> URL specified.  It records the time in milliseconds for each request, and
> when all the threads finish, prints out statistics.  These runs are with 24
> threads.  I ran it on a different system so that it would not affect CPU
> usage on the server running haproxy.  Here's the results:
> 
> quictls branch: OpenSSL_1_1_1t+quic
> 23:01:19.067 [main] INFO  o.e.t.h.MainSSLTest Count 24000 1228.69/s
> 23:01:19.069 [main] INFO  o.e.t.h.MainSSLTest Median 7562839 ns
> 23:01:19.069 [main] INFO  o.e.t.h.MainSSLTest 75th % 25138492 ns
> 23:01:19.070 [main] INFO  o.e.t.h.MainSSLTest 95th % 70603313 ns
> 23:01:19.070 [main] INFO  o.e.t.h.MainSSLTest 99th % 120502022 ns
> 23:01:19.070 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 355829439 ns
> 
> quictls branch: openssl-3.1.0+quic+locks
> 22:56:11.457 [main] INFO  o.e.t.h.MainSSLTest Count 24000 1267.96/s
> 22:56:11.459 [main] INFO  o.e.t.h.MainSSLTest Median 6827111 ns
> 22:56:11.459 [main] INFO  o.e.t.h.MainSSLTest 75th % 23239248 ns
> 22:56:11.460 [main] INFO  o.e.t.h.MainSSLTest 95th % 70625628 ns
> 22:56:11.460 [main] INFO  o.e.t.h.MainSSLTest 99th % 129494323 ns
> 22:56:11.460 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 307070582 ns
> 
> quictls branch: openssl-3.0.8+quic
> 22:59:12.614 [main] INFO  o.e.t.h.MainSSLTest Count 24000 1163.24/s
> 22:59:12.616 [main] INFO  o.e.t.h.MainSSLTest Median 6930268 ns
> 22:59:12.616 [main] INFO  o.e.t.h.MainSSLTest 75th % 26238752 ns
> 22:59:12.616 [main] INFO  o.e.t.h.MainSSLTest 95th % 75464869 ns
> 22:59:12.616 [main] INFO  o.e.t.h.MainSSLTest 99th % 132522508 ns
> 22:59:12.617 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 445411125 ns
> 
> The stats don't show any kind of smoking gun like I had hoped they would.
> Not a lot of difference there.
> 
> Differences in the requests per second are also not huge, but more in line
> with what I was expecting.  If I can believe those numbers, and I admit that
> this kind of micro-benchmark is not the most reliable way to test
> performance, it looks like 3.1.0 with the lock fixes is slightly faster than
> 1.1.1t. 24 threads might not be enough to really exercise the concurrency
> though.

The little difference makes me think you've sent your requests over
a keep-alive connection, which is fine, but which doesn't stress the
TLS stack anymore. Those suffering from TLS performance problems are
those with many connections where the sole fact of resuming a TLS
session (and even more creating a new one) takes a lot of time. But
if your requests all pass over established connections, the TLS stack
does nothing anymore, that's just trivial AES crypto that comes for
free nowadays.

I have updated the ticket there with my measurements. With 24 cores
I didn't measure a big difference in new sessions rate since the CPU
was dominated by asymmetric crypto (27.4k for 3.1 vs 30.5k for 1.1.1
and 35k for wolfSSL). However with resumed connections the difference
was more visible: 48.5k for 3.1, 49.9k for 3.1+locks, 106k for 1.1.1
and 124k for wolfSSL. And there, there's not that much contention
(around 15% CPU lost waiting for a lock), which tends to indicate that
it's mainly the excess usage of locks (even uncontended) or atomic ops
that divides the performance by 2-2.5.
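To illustrate the "even uncontended" point, here is a hedged single-threaded Java sketch (class name and iteration count are invented for illustration; OpenSSL itself is C, but the effect is the same): each lock/unlock pair costs an atomic read-modify-write plus memory fences even when no other thread competes for the lock.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOverhead {
    static final int ITERS = 2_000_000;

    // Increment guarded by a lock: every iteration pays an atomic CAS on
    // lock() plus a release store on unlock(), even with zero contention.
    static long lockedSum() {
        ReentrantLock lock = new ReentrantLock();
        long sum = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < ITERS; i++) {
            lock.lock();
            try {
                sum++;
            } finally {
                lock.unlock();
            }
        }
        System.out.printf("locked: %d ms%n", (System.nanoTime() - t0) / 1_000_000);
        return sum;
    }

    // The same work with no synchronization at all.
    static long plainSum() {
        long sum = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < ITERS; i++) {
            sum++;
        }
        System.out.printf("plain:  %d ms%n", (System.nanoTime() - t0) / 1_000_000);
        return sum;
    }

    public static void main(String[] args) {
        // Dozens of such pairs per TLS operation multiply this per-pair cost.
        System.out.println(plainSum());
        System.out.println(lockedSum());
    }
}
```

Run a few times so the JIT warms up; the locked loop is consistently several times slower than the plain one despite never being contended.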

For some users it means that if they currently need 4 LB to stay under
80% load in 1.1.1, they will need 8-9 with 3.1 under the same conditions.

Another point that I didn't measure there (because it's always a pain
to do) is the client mode, which is much more affected. It's less
dramatic in 3.1 than in 3.0 but still very impacted. This will affect
re-encrypted communications between haproxy and the origin servers.

Re: Followup on openssl 3.0 note seen in another thread

2023-05-26 Thread Shawn Heisey

On 5/25/23 09:08, Willy Tarreau wrote:

The problem definitely is concurrency, so 1000 curl will show nothing
and will not even match production traffic. You'll need to use a load
generator that allows you to tweak the TLS resume support, like we do
with h1load's argument "--tls-reuse". Also I don't know how often the
recently modified locks are used per server connection and per client
connection, that's what the SSL guys want to know since they're not able
to test their changes.


I finally got a test program together.  After trying and failing with 
the Jetty HttpClient and Apache HttpClient version 5 (both options that 
would have let me do HTTP/2) I got a program together with Apache 
HttpClient version 4.  I had one version that shelled out to curl, but 
it ran about ten times slower.


I know lots of people are going to have bad things to say about writing 
a test in Java.  It's the only language where I already know how to 
write multi-threaded code.  I would have to spend a bunch of time 
learning how to do that in another language.


It fires up X threads, each of which makes 1000 consecutive requests to 
the URL specified.  It records the time in nanoseconds for each 
request, and when all the threads finish, prints out statistics.  These 
runs are with 24 threads.  I ran it on a different system so that it 
would not affect CPU usage on the server running haproxy.  Here's the 
results:


quictls branch: OpenSSL_1_1_1t+quic
23:01:19.067 [main] INFO  o.e.t.h.MainSSLTest Count 24000 1228.69/s
23:01:19.069 [main] INFO  o.e.t.h.MainSSLTest Median 7562839 ns
23:01:19.069 [main] INFO  o.e.t.h.MainSSLTest 75th % 25138492 ns
23:01:19.070 [main] INFO  o.e.t.h.MainSSLTest 95th % 70603313 ns
23:01:19.070 [main] INFO  o.e.t.h.MainSSLTest 99th % 120502022 ns
23:01:19.070 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 355829439 ns

quictls branch: openssl-3.1.0+quic+locks
22:56:11.457 [main] INFO  o.e.t.h.MainSSLTest Count 24000 1267.96/s
22:56:11.459 [main] INFO  o.e.t.h.MainSSLTest Median 6827111 ns
22:56:11.459 [main] INFO  o.e.t.h.MainSSLTest 75th % 23239248 ns
22:56:11.460 [main] INFO  o.e.t.h.MainSSLTest 95th % 70625628 ns
22:56:11.460 [main] INFO  o.e.t.h.MainSSLTest 99th % 129494323 ns
22:56:11.460 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 307070582 ns

quictls branch: openssl-3.0.8+quic
22:59:12.614 [main] INFO  o.e.t.h.MainSSLTest Count 24000 1163.24/s
22:59:12.616 [main] INFO  o.e.t.h.MainSSLTest Median 6930268 ns
22:59:12.616 [main] INFO  o.e.t.h.MainSSLTest 75th % 26238752 ns
22:59:12.616 [main] INFO  o.e.t.h.MainSSLTest 95th % 75464869 ns
22:59:12.616 [main] INFO  o.e.t.h.MainSSLTest 99th % 132522508 ns
22:59:12.617 [main] INFO  o.e.t.h.MainSSLTest 99.9 % 445411125 ns

The stats don't show any kind of smoking gun like I had hoped they 
would.  Not a lot of difference there.


Differences in the requests per second are also not huge, but more in 
line with what I was expecting.  If I can believe those numbers, and I 
admit that this kind of micro-benchmark is not the most reliable way to 
test performance, it looks like 3.1.0 with the lock fixes is slightly 
faster than 1.1.1t. 24 threads might not be enough to really exercise 
the concurrency though.


I will poke at it a little more tomorrow, trying more threads.
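For reference, the general shape of such a test (a thread pool, per-request nanosecond timings, nearest-rank percentiles) can be sketched as below. This is a hedged illustration, not the actual MainSSLTest: the class and method names are invented, and the HTTPS request (which the real program issues via Apache HttpClient 4 with a "Connection: close" header) is stubbed out with a busy loop so the sketch is self-contained.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PercentileSketch {

    // Nearest-rank percentile over an already-sorted array of timings.
    static long percentile(long[] sorted, double pct) {
        int idx = (int) Math.ceil(pct / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, idx)];
    }

    // Stand-in for one HTTPS request; a real test would do an HTTP GET
    // with a "Connection: close" header here.
    static void doRequest() {
        long x = 0;
        for (int i = 0; i < 1000; i++) x += i;
        if (x == -1) System.out.println(x); // keep the loop from being elided
    }

    public static void main(String[] args) throws Exception {
        final int threads = 4, perThread = 1000; // the real test used 24 x 1000
        long[] nanos = new long[threads * perThread];
        AtomicInteger slot = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long wall0 = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    long t0 = System.nanoTime();
                    doRequest();
                    nanos[slot.getAndIncrement()] = System.nanoTime() - t0;
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        double secs = (System.nanoTime() - wall0) / 1e9;
        Arrays.sort(nanos);
        System.out.printf("Count %d %.2f/s%n", nanos.length, nanos.length / secs);
        System.out.printf("Median %d ns%n", percentile(nanos, 50));
        System.out.printf("75th %% %d ns%n", percentile(nanos, 75));
        System.out.printf("95th %% %d ns%n", percentile(nanos, 95));
        System.out.printf("99th %% %d ns%n", percentile(nanos, 99));
        System.out.printf("99.9 %% %d ns%n", percentile(nanos, 99.9));
    }
}
```

Nearest-rank is one plausible percentile method; an interpolating method would give slightly different tail values on the same data.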



Re: Followup on openssl 3.0 note seen in another thread

2023-05-25 Thread Илья Шипицин
Thu, 25 May 2023 at 17:11, Willy Tarreau :

> On Thu, May 25, 2023 at 07:33:11AM -0600, Shawn Heisey wrote:
> > On 3/11/23 22:52, Willy Tarreau wrote:
> > > According to the OpenSSL devs, 3.1 should be "4 times better than 3.0",
> > > so it could still remain 5-40 times worse than 1.1.1. I intend to run
> > > some tests soon on it on a large machine, but preparing tests takes a
> > > lot of time and my progress got delayed by the painful bug of last week.
> > > I'll share my findings anyway.
> >
> > Just noticed that quictls has a special branch for lock changes in 3.1.0:
> >
> > https://github.com/quictls/openssl/tree/openssl-3.1.0+quic+locks
>
> Yes, it was made so that the few of us who reported important issues can
> retest the impact of the changes. I hope to be able to run a test on a
> smaller machine soon.
>
> > I am not sure how to go about proper testing for performance on this.  I did
> > try a very basic "curl a URL 1000 times in bash" test back when 3.1.0 was
> > released, but that showed 3.0.8 and 3.1.0 were faster than 1.1.1, so
> > concurrency is likely required to see a problem.
>
> The problem definitely is concurrency, so 1000 sequential curls will show nothing
> and will not even match production traffic. You'll need to use a load
>

I do not think 1000 instances of curl are required.

I recall that when we evaluated arm64 servers, some fairly lightweight
comparative tests with profiling enabled were enough to compare "before"
and "after".

I may try JMeter next weekend.



> generator that allows you to tweak the TLS resume support, like we do
> with h1load's argument "--tls-reuse". Also I don't know how often the
> recently modified locks are used per server connection and per client
> connection, that's what the SSL guys want to know since they're not able
> to test their changes.
>
> The first test report *before* the changes was published here a month
> ago:
>
>
> https://github.com/openssl/openssl/issues/20286#issuecomment-1527869072
>
> And now we have to find time to setup a test platform to test this one
> in more or less similar conditions (or at least run a before/after).
>
> Do not hesitate to participate if you see you can provide results
> comparing the two quictls-3.1 branches, it will help already. It's even
> possible that these efforts do not bring anything yet, we don't know and
> that's what they want to know.
>
> Thanks,
> Willy
>
>


Re: Followup on openssl 3.0 note seen in another thread

2023-05-25 Thread Willy Tarreau
On Thu, May 25, 2023 at 07:33:11AM -0600, Shawn Heisey wrote:
> On 3/11/23 22:52, Willy Tarreau wrote:
> > According to the OpenSSL devs, 3.1 should be "4 times better than 3.0",
> > so it could still remain 5-40 times worse than 1.1.1. I intend to run
> > some tests soon on it on a large machine, but preparing tests takes a
> > lot of time and my progress got delayed by the painful bug of last week.
> > I'll share my findings anyway.
> 
> Just noticed that quictls has a special branch for lock changes in 3.1.0:
> 
> https://github.com/quictls/openssl/tree/openssl-3.1.0+quic+locks

Yes, it was made so that the few of us who reported important issues can
retest the impact of the changes. I hope to be able to run a test on a
smaller machine soon.

> I am not sure how to go about proper testing for performance on this.  I did
> try a very basic "curl a URL 1000 times in bash" test back when 3.1.0 was
> released, but that showed 3.0.8 and 3.1.0 were faster than 1.1.1, so
> concurrency is likely required to see a problem.

The problem definitely is concurrency, so 1000 sequential curls will show nothing
and will not even match production traffic. You'll need to use a load
generator that allows you to tweak the TLS resume support, like we do
with h1load's argument "--tls-reuse". Also I don't know how often the
recently modified locks are used per server connection and per client
connection, that's what the SSL guys want to know since they're not able
to test their changes.

The first test report *before* the changes was published here a month
ago:

 https://github.com/openssl/openssl/issues/20286#issuecomment-1527869072

And now we have to find time to setup a test platform to test this one
in more or less similar conditions (or at least run a before/after).

Do not hesitate to participate if you see you can provide results
comparing the two quictls-3.1 branches, it will help already. It's even
possible that these efforts do not bring anything yet, we don't know and
that's what they want to know.

Thanks,
Willy



Re: Followup on openssl 3.0 note seen in another thread

2023-05-25 Thread Shawn Heisey

On 3/11/23 22:52, Willy Tarreau wrote:

According to the OpenSSL devs, 3.1 should be "4 times better than 3.0",
so it could still remain 5-40 times worse than 1.1.1. I intend to run
some tests soon on it on a large machine, but preparing tests takes a
lot of time and my progress got delayed by the painful bug of last week.
I'll share my findings anyway.


Just noticed that quictls has a special branch for lock changes in 3.1.0:

https://github.com/quictls/openssl/tree/openssl-3.1.0+quic+locks

I am not sure how to go about proper testing for performance on this.  I 
did try a very basic "curl a URL 1000 times in bash" test back when 
3.1.0 was released, but that showed 3.0.8 and 3.1.0 were faster than 
1.1.1, so concurrency is likely required to see a problem.


Thanks,
Shawn



Re: Followup on openssl 3.0 note seen in another thread

2023-03-11 Thread Willy Tarreau
Hi Shawn,

On Sat, Mar 11, 2023 at 07:10:30PM -0700, Shawn Heisey wrote:
> On 12/14/22 07:15, Willy Tarreau wrote:
> > On Wed, Dec 14, 2022 at 07:01:59AM -0700, Shawn Heisey wrote:
> > > On 12/14/22 06:07, Willy Tarreau wrote:
> > > > By the way, are you running with OpenSSL
> > > > 3.0 ?  That one is absolutely terrible and makes extreme abuse of
> > > > mutexes and locks, to the point that certain workloads were divided
> > > > by 2-digit numbers between 1.1.1 and 3.0. It took me one day to
> > > > figure that my load generator which was capping at 400 conn/s was in
> > > > fact suffering from an accidental build using 3.0 while in 1.1.1
> > > > the perf went back to 75000/s!
> > > 
> > > Is this a current problem with the latest openssl built from source?
> > 
> > Yes and deeper than that actually, there's even a meta-issue to try to
> > reference the many reports for massive performance regressions on the
> > project:
> 
> A followup to my followup.  Time flies!
> 
> I was just reading on the openssl mailing list about what's coming in
> version 3.1.  The first release highlight is:
> 
> * Refactoring of the OSSL_LIB_CTX code to avoid excessive locking
> 
> Is anyone enough in tune with openssl happenings to know whether that fixes
> the issues that Willy was advising me about?  Or maybe improves the
> situation but doesn't fully resolve it?

According to the OpenSSL devs, 3.1 should be "4 times better than 3.0",
so it could still remain 5-40 times worse than 1.1.1. I intend to run
some tests soon on it on a large machine, but preparing tests takes a
lot of time and my progress got delayed by the painful bug of last week.
I'll share my findings anyway.

> I tried to figure this out for myself based on data in the CHANGES.md file,
> but didn't see anything that looked relevant to my very untrained eye.

Quite frankly I suspect it's the same for those who write that file as
well :-/

> Reading the code wouldn't help, as I am completely clueless when it comes to
> encryption code.

Same for me.

Cheers,
Willy



Re: Followup on openssl 3.0 note seen in another thread

2023-03-11 Thread Shawn Heisey

On 12/14/22 07:15, Willy Tarreau wrote:

On Wed, Dec 14, 2022 at 07:01:59AM -0700, Shawn Heisey wrote:

On 12/14/22 06:07, Willy Tarreau wrote:

By the way, are you running with OpenSSL
3.0 ?  That one is absolutely terrible and makes extreme abuse of
mutexes and locks, to the point that certain workloads were divided
by 2-digit numbers between 1.1.1 and 3.0. It took me one day to
figure that my load generator which was capping at 400 conn/s was in
fact suffering from an accidental build using 3.0 while in 1.1.1
the perf went back to 75000/s!


Is this a current problem with the latest openssl built from source?


Yes and deeper than that actually, there's even a meta-issue to try to
reference the many reports for massive performance regressions on the
project:


A followup to my followup.  Time flies!

I was just reading on the openssl mailing list about what's coming in 
version 3.1.  The first release highlight is:


* Refactoring of the OSSL_LIB_CTX code to avoid excessive locking

Is anyone enough in tune with openssl happenings to know whether that 
fixes the issues that Willy was advising me about?  Or maybe improves 
the situation but doesn't fully resolve it?


I tried to figure this out for myself based on data in the CHANGES.md 
file, but didn't see anything that looked relevant to my very untrained 
eye.  Reading the code wouldn't help, as I am completely clueless when 
it comes to encryption code.


Thanks,
Shawn



Re: Followup on openssl 3.0 note seen in another thread

2022-12-16 Thread Willy Tarreau
On Fri, Dec 16, 2022 at 06:58:33AM -0700, Shawn Heisey wrote:
> On 12/16/22 01:59, Shawn Heisey wrote:
> > On 12/16/22 00:26, Willy Tarreau wrote:
> >  > Both work for me using firefox (green flash after reload).
> > 
> > It wasn't working when I tested it.  I rebooted for a kernel upgrade and
> > it still wasn't working.
> > 
> > And then a while later I was poking around in my zabbix UI and saw the
> > green lightning bolt.  No idea what changed.  Glad it's working, but
> > problems that fix themselves annoy me because I usually never learn what
> > happened.
> 
> I think I know what happened.
> 
> I was having problems with my pacemaker cluster where it got very confused
> about the haproxy resource.  I had the haproxy service enabled at boot for
> both systems.  I have now disabled that in systemd so it's fully under the
> control of pacemaker.  I'm pretty sure that pacemaker was confused because
> it saw the service running on a system where it should have been disabled
> and pacemaker didn't start it ... and it decided that was unacceptable and
> basically broke the cluster.
> 
> So for a while I had the virtual IP resource on the "lesser" server and the
> haproxy resource on the main server.  But because I had haproxy enabled at
> boot time, it was actually running on both.  The haproxy config is the same
> between both systems, but the other server was still running a broken
> haproxy version.  Most of the backends are actually on the better server
> accessed by br0 IP address rather than localhost, so the broken haproxy was
> still sending them to the right place.  This also explains why I was not
> seeing traffic with tcpdump filtering on "udp port 443".  I have a ways to
> go before I've got true HA for my websites.  Setting up a database cluster
> is going to be challenging, I think.
> 
> I got pacemaker back in working order after I was done with my testing, so
> both resources were colocated on the better server and haproxy was not
> running on the other one.  I think you tried the URLs after I had fixed
> pacemaker, and when I saw it working on zabbix, that was also definitely
> after I fixed pacemaker.

Thanks for sharing your analysis. Indeed, everything makes sense now.

> On that UDP bind thing ... I now have three binds defined.  The virtual IP,
> the IP of the first server, and the IP of the second server.

As long as you don't have too many nodes, that's often the simplest thing
to do. It requires ip_nonlocal_bind=1 but that's extremely frequent where
haproxy runs.

Willy



Re: Followup on openssl 3.0 note seen in another thread

2022-12-16 Thread Shawn Heisey

On 12/16/22 01:59, Shawn Heisey wrote:

On 12/16/22 00:26, Willy Tarreau wrote:
 > Both work for me using firefox (green flash after reload).

It wasn't working when I tested it.  I rebooted for a kernel upgrade and 
it still wasn't working.


And then a while later I was poking around in my zabbix UI and saw the 
green lightning bolt.  No idea what changed.  Glad it's working, but 
problems that fix themselves annoy me because I usually never learn what 
happened.


I think I know what happened.

I was having problems with my pacemaker cluster where it got very 
confused about the haproxy resource.  I had the haproxy service enabled 
at boot for both systems.  I have now disabled that in systemd so it's 
fully under the control of pacemaker.  I'm pretty sure that pacemaker 
was confused because it saw the service running on a system where it 
should have been disabled and pacemaker didn't start it ... and it 
decided that was unacceptable and basically broke the cluster.


So for a while I had the virtual IP resource on the "lesser" server and 
the haproxy resource on the main server.  But because I had haproxy 
enabled at boot time, it was actually running on both.  The haproxy 
config is the same between both systems, but the other server was still 
running a broken haproxy version.  Most of the backends are actually on 
the better server accessed by br0 IP address rather than localhost, so 
the broken haproxy was still sending them to the right place.  This also 
explains why I was not seeing traffic with tcpdump filtering on "udp 
port 443".  I have a ways to go before I've got true HA for my websites. 
 Setting up a database cluster is going to be challenging, I think.


I got pacemaker back in working order after I was done with my testing, 
so both resources were colocated on the better server and haproxy was 
not running on the other one.  I think you tried the URLs after I had 
fixed pacemaker, and when I saw it working on zabbix, that was also 
definitely after I fixed pacemaker.


On that UDP bind thing ... I now have three binds defined.  The virtual 
IP, the IP of the first server, and the IP of the second server.


Thanks,
Shawn



Re: Followup on openssl 3.0 note seen in another thread

2022-12-16 Thread Shawn Heisey

On 12/16/22 00:26, Willy Tarreau wrote:
> Both work for me using firefox (green flash after reload).

It wasn't working when I tested it.  I rebooted for a kernel upgrade and 
it still wasn't working.


And then a while later I was poking around in my zabbix UI and saw the 
green lightning bolt.  No idea what changed.  Glad it's working, but 
problems that fix themselves annoy me because I usually never learn what 
happened.


> You indeed need to
> bind to both the native and the virtual IP addresses (you can have the
> two on the same "bind" line, delimited by comma).

That's the little bit of info that I needed.  Now it works the way I was 
expecting with both IP addresses.  I have a lot less experience with UDP 
than TCP, I wasn't aware of that gotcha.  It does make perfect sense now 
that it's been pointed out.


Thanks,
Shawn



Re: Followup on openssl 3.0 note seen in another thread

2022-12-16 Thread Shawn Heisey

On 12/16/22 00:01, Willy Tarreau wrote:

   - if you want to use QUIC, use quictls-1.1.1. Once you have to build
 something yourself, you definitely don't want to waste your time on
 the performance-crippled 3.0, and 1.1.1 will change less often than
 3.0 so that also means less package updates.

   - if you want to experiment with QUIC and help developers, running
 compatibility tests with the latest haproxy master and the latest
 WolfSSL master could be useful. I just don't know if the maintainers
 are ready to receive lots of uncoordinated reports yet, I'm aware
 that they're still in the process of fixing a few basic integration
 issues that will make things run much smoother soon. Similarly,
 LibreSSL's QUIC support is very recent (3.6) and few people seem to
 use LibreSSL, I don't know how well it's supported in distros these
 days. More tests on this one would probably be nice and may possibly
 encourage its support.


I'd say that I am somewhere in between these two.  Helping the devs is 
not an EXPLICIT goal, but I am already tinkering with this stuff for 
myself, so it's not a lot of extra effort to be involved here.  I think 
my setup can provide a little bit of useful data and another test 
environment.  Pursuing http3 has been fun.


Straying offtopic:

I find that being a useful member of open source communities is an 
awesome experience.  For this one I'm not as much use at the code level 
as I am for other communities.  My experience with C was a long time ago 
... it was one of my first languages.  I spend more time with Bash and 
Java than anything else these days.  Occasionally delve into Perl, which 
I really like.


On the subject of building things myself ... way back in the 90s I used 
to build all my own Linux kernels, enabling only what I needed, building 
it into the kernel directly, and optimizing for the specific CPU in the 
machine.  And I tended to build most of the software I used from source 
as well.


These days, some distros have figured out how to do all these things 
better than I ever could, so I mostly install from apt repos.  For 
really mainstream software, they keep up with recent versions pretty well.


For some software, haproxy being one of the most prominent, the distro 
packages are so far behind what's current that I pretty much have to 
build it myself if I want useful features.  I got started using haproxy 
with version 1.4, and quickly went to 1.5-dev because I was pursuing the 
best TLS setup I could get.  In those days I wasn't using source 
repositories, I would download tarballs from 1wt.eu.


Thanks,
Shawn



Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread Willy Tarreau
On Thu, Dec 15, 2022 at 08:40:59PM -0700, Shawn Heisey wrote:
> On 12/15/22 09:47, Shawn Heisey wrote:
> > The version of curl with http3 support is not available in any of the
> > distro repos for my Ubuntu machines, so I found a docker image with it.
> > That works in cases where a browser won't switch, but that's because it
> > never tries TCP, it goes straight to UDP.  The problem doesn't break H3,
> > it just breaks a browser's ability to transition from TCP to UDP.
> 
> With the provided patch, the h3 is working well on the machine with this
> URL:
> 
> https://http3test.elyograg.org/
> 
> But it doesn't work correctly on the machine with this URL:
> 
> https://admin.elyograg.org/

Both work for me using firefox (green flash after reload).

(...)
> TLDR:  I also have another oddity.  The basement server is part of a
> pacemaker cluster which starts a virtual IP and haproxy on one of the
> servers, with the server in question having the highest resource placement
> setting.  Two of the servers in the cluster are bare metal, the third is a
> VM running on a third machine, providing a tiebreaker vote so the cluster
> works properly without STONITH.  Settings prevent the resources from
> starting on the VM, and cause haproxy to always be co-located with the
> virtual IP.  I had to go with a VM because the third machine is running
> Ubuntu 22.10 and I couldn't form the cluster with different versions of
> pacemaker/corosync/pcsd on that machine compared to the other two.

OK that's indeed a significant difference.

> If I bind quic4@0.0.0.0:443 then UDP/443 requests to that virtual IP do not
> work.  But if I bind quic4@192.168.217.170:443 which is that virtual IP,
> then UDP/443 requests do work.

Expected (even though annoying). It's likely that responses are sent from
the native IP address instead of the virtual one. That's actually due to
QUIC relying on UDP, and UDP not being well supported by the good old BSD
socket API (you can't specify the address to send from). We have a work
around for this in 2.8-dev, which comes with other benefits but for now
it's better to limit it to setups with fewer than a few thousand QUIC
connections (yours very likely qualifies based on your explanation).
I remember we noted this limitation somewhere but can't find it anymore.
Maybe it was just in the announce message. At least we need to make it more
prominent (e.g. in the "bind" keyword documentation). You indeed need to
bind to both the native and the virtual IP addresses (you can have the
two on the same "bind" line, delimited by comma).
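To make that advice concrete, on such a setup the QUIC bind could list the virtual IP and the node's native IP on one comma-delimited line. A hedged sketch (the .171 native address and the certificate path are invented placeholders):

```
frontend fe-https
    # TCP listeners can use a wildcard bind, but QUIC/UDP replies must be
    # sent from the exact address the client targeted, so list each
    # address explicitly: virtual IP, then native IP, comma-delimited.
    bind quic4@192.168.217.170:443,quic4@192.168.217.171:443 ssl crt /etc/haproxy/site.pem alpn h3
```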

Hoping this helps,
Willy



Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread Willy Tarreau
On Thu, Dec 15, 2022 at 09:47:36AM -0700, Shawn Heisey wrote:
> Just got a look at the patch.  One line code fixes are awesome.

We all love them. Sometimes I even suspect we unconsciously create
such bugs to have the pleasure of contemplating these fixes :-)

Willy



Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread Willy Tarreau
On Fri, Dec 16, 2022 at 01:44:15AM -0500, John Lauro wrote:
> What exactly is needed to reproduce the poor performance issue with openssl
> 3?  I was able to test 20k req/sec with it using k6 to simulate 16k users
> over a WAN.  The k6 box did have OpenSSL 1.  Probably could have sustained
> more, but that's all I need right now.  Openssl v1 tested a little faster,
> but within 10%.  Wasn't trying to max out my tests as that should be over
> 4x the needed performance.

It mainly depends on the number of CPU cores. What's happening is that
in 1.1.0 they silently removed the support for the locking callbacks
(these are now ignored) and switched to pthread_mutex instead, without
realizing that in case of contention, syscalls would be emitted. Using
syscalls for tiny operations is already not good, but it got even worse
in the post-SPECTRE era. And in 3.0 they made lots of stuff much more
dynamic, with locks everywhere. I measured about 80 lock/unlock sequences
for a single request! The problem is that once the load becomes sufficient
for threads to compete on a lock, one of them goes into the system and
sleeps there. And that's when you start seeing
native_queued_spin_lock_slowpath() eat all your CPU. Worse, the time
wasted sleeping in the system is so huge compared to the tiny operations
that the lock aimed at protecting against, that this time is definitely
lost and the system can never recover from this loss because work continues
to accumulate. So you can observe good performance until it's too high, at
which point you have to significantly lower it to recover. The worst I've
seen was the client mode with performance going down from 74k cps to 400
cps on a 24-core machine, i.e. performance divided by almost 200!
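The lock-convoy collapse described above can be reproduced in miniature. The following is a hedged Java sketch (class name and sizes are invented for illustration; OpenSSL's locks are pthread mutexes in C, but ReentrantLock parks losing threads in the kernel the same way): several threads compete on one lock around a tiny critical section, and the time spent sleeping in the system dwarfs the work the lock protects.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class ContentionSketch {

    // N threads all hammer one lock around a tiny critical section. Under
    // contention, the loser of each race parks in the kernel (futex on
    // Linux), which is where the sys time and
    // native_queued_spin_lock_slowpath() samples come from in "perf top".
    static long run(int threads, int itersPerThread) {
        ReentrantLock lock = new ReentrantLock();
        long[] counter = {0};
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long t0 = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < itersPerThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;   // the "tiny operation" being protected
                    } finally {
                        lock.unlock();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.out.printf("%d threads: %d increments in %d ms%n",
                threads, counter[0], (System.nanoTime() - t0) / 1_000_000);
        return counter[0];
    }

    public static void main(String[] args) {
        run(1, 200_000);   // baseline: the lock is never contended
        run(8, 200_000);   // contended: compare wall time and sys time
    }
}
```

Running the contended case under "perf top" or "time" shows the system time dominating, even though the total work per thread is identical to the baseline.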

> Not doing H3, and the backends are send-proxy-v2.
> Default libs on Alma linux on arm.
> # rpm -qa | grep openssl
> openssl-pkcs11-0.4.11-7.el9.aarch64
> xmlsec1-openssl-1.2.29-9.el9.aarch64
> openssl-libs-3.0.1-43.el9_0.aarch64
> openssl-3.0.1-43.el9_0.aarch64
> openssl-devel-3.0.1-43.el9_0.aarch64
> 
> This is the first box I set up with EL9 and thus openssl-3.  Might it only
> be an issue when ssl is used to the backends?

That's where it has the highest effect, sadly, mostly with renegotiation.
If you intend to run at fewer than a few thousand connections per second
it could possibly be OK.

Emeric collected some numbers, and we'll soon post them (but bear with
us, it takes time to aggregate everything).

Also, I don't know if you're using HTTP on the backends, but if so, you
should normally mostly benefit from keep-alive and connection reuse.

If you want to reproduce these issues, make sure you disable http-reuse
(http-reuse never), and disable session resumption on the "server" lines
("no-ssl-reuse").

And never forget to run "perf top" on the machine to see where the CPU
is spent.
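Put together, a hedged sketch of a backend section that reproduces the worst case (backend name, server name and address are invented for illustration; "http-reuse never" and "no-ssl-reuse" are the real haproxy keywords named above):

```
backend be-tls-stress
    # Defeat idle-connection reuse so every request opens a fresh
    # server-side connection...
    http-reuse never
    # ...and disable TLS session resumption on the server line so each
    # of those connections performs a full handshake.
    server app1 192.0.2.10:443 ssl verify none no-ssl-reuse
```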

Willy



Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread Willy Tarreau
On Thu, Dec 15, 2022 at 11:39:16PM -0700, Shawn Heisey wrote:
> On 12/15/22 21:49, Willy Tarreau wrote:
> > There's currently a great momentum around WolfSSL that was already
> > adopted by Apache, Curl, and Ngtcp2 (which is the QUIC stack that
> > powers most HTTP/3-compatible agents). Its support on haproxy is
> > making fast progress thanks to the efforts on the two sides, and it's
> > pleasant to speak to people who care about performance.
> 
> What would be your recommendation right now for a quic-enabled library to
> use with haproxy?  Are there any choices better than quictls 1.1.1?
> Is wolfSSL support far enough along that I could build and try it and have
> some hope of success, or should I stick with quictls for now?

For now I'd say that quictls 1.1.1 is the best option. 1.1.x doesn't scale
very well but doesn't collapse under load like 3.0 at least. And admittedly,
support for openssl is proven by now. Other libs are either unmaintainable
(BoringSSL with no release cycle and whose API regularly breaks the build
in the middle of our stable branches), lagging a bit behind (LibreSSL has
not caught up with 1.1.1 on everything and is measurably slower), not
supported yet (GnuTLS), or only start to be supported by haproxy (WolfSSL).
Thus I'd suggest in this order:

  - if you don't want to use QUIC and have a small or personal site, use
your distro's package, even if it's 3.0, you're unlikely to notice
the performance problems.

  - if you don't want to use QUIC but have a moderate to large site, use
openssl 1.1.1, which is easily achieved by staying on the current LTS
distros that still provide it. This way you won't need to build and
maintain your own package.

  - if you want to use QUIC, use quictls-1.1.1. Once you have to build
something yourself, you definitely don't want to waste your time on
the performance-crippled 3.0, and 1.1.1 will change less often than
3.0 so that also means less package updates.

  - if you want to experiment with QUIC and help developers, running
compatibility tests with the latest haproxy master and the latest
WolfSSL master could be useful. I just don't know if the maintainers
are ready to receive lots of uncoordinated reports yet, I'm aware
that they're still in the process of fixing a few basic integration
issues that will make things run much smoother soon. Similarly,
LibreSSL's QUIC support is very recent (3.6) and few people seem to
use LibreSSL, I don't know how well it's supported in distros these
days. More tests on this one would probably be nice and may possibly
encourage its support.

> My websites
> certainly aren't anything mission-critical, but there are people that would
> be annoyed if I have problems.

That's a good reason for staying on quictls for now. That's what we're doing
on haproxy.org as well.

> Email is more important than the websites,
> and that's directly on the Internet in my AWS instance, not going through
> haproxy.

OK. This part should definitely not be touched under any circumstance.

Hoping this helps,
Willy



Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread John Lauro
What exactly is needed to reproduce the poor performance issue with openssl
3?  I was able to test 20k req/sec with it using k6 to simulate 16k users
over a WAN.  The k6 box did have OpenSSL 1.  Probably could have sustained
more, but that's all I need right now.  Openssl v1 tested a little faster,
but within 10%.  Wasn't trying to max out my tests as that should be over
4x the needed performance.

Not doing H3, and the backends are send-proxy-v2.
Default libs on Alma linux on arm.
# rpm -qa | grep openssl
openssl-pkcs11-0.4.11-7.el9.aarch64
xmlsec1-openssl-1.2.29-9.el9.aarch64
openssl-libs-3.0.1-43.el9_0.aarch64
openssl-3.0.1-43.el9_0.aarch64
openssl-devel-3.0.1-43.el9_0.aarch64

This is the first box I set up with EL9 and thus openssl-3.  Might it only
be an issue when ssl is used to the backends?

On Thu, Dec 15, 2022 at 11:50 PM Willy Tarreau  wrote:

> On Thu, Dec 15, 2022 at 08:58:29PM -0700, Shawn Heisey wrote:
> > I'm sure the performance issue has been brought to the attention of the
> > OpenSSL project ... what did they have to say about the likelihood and
> > timeline for providing a fix?
>
> They're still working on it for 3.1. 3.1-alpha is "less worse" than
> 3.0 but still far behind 1.1.1 in our tests.
>
> > Is there an article or bug filed I can read for more information?
>
> There's this issue that centralizes the status of the most important
> regression reports:
>
>   https://github.com/openssl/openssl/issues/17627#issuecomment-1060123659
>
> We've also planned to issue an article to summarize our observations
> about this before users are hit too hard, but it will take some
> time to collect all info and write it down. But it's definitely a big
> problem for users who upgrade to latest LTS distros that shipped 3.0
> without testing it (though I can't blame distros, it's not the package
> maintainers' job to run performance tests on what they maintain) :-(
>
> My personal feeling is that this disaster combined with the stubborn
> refusal to support the QUIC crypto API that is mandatory for any
> post-2021 HTTP agent basically means that OpenSSL is not part of the
> future of web environments and that it's urgent to find alternatives,
> just like all other projects are currently seeking. And with http-based
> products forced to abandon OpenSSL, it's unlikely that their performance
> issues will be relevant in the future so it should get even worse over
> time by lack of testing and exposure. It's sad, because before the QUIC
> drama, we hoped to spend some time helping them improve their performance
> by reducing the locking abuse. Now the project has gone too far in the
> wrong direction for anything to be doable anymore, and I doubt that
> anyone has the energy to fork 1.1.1 and restart from a mostly clean
> state. But anyway, a solution must be found for the next batch of LTS
> distros so that users can jump from 20.x to 24.x and skip 22.x.
>
> There's currently a great momentum around WolfSSL that was already
> adopted by Apache, Curl, and Ngtcp2 (which is the QUIC stack that
> powers most HTTP/3-compatible agents). Its support on haproxy is
> making fast progress thanks to the efforts on the two sides, and it's
> pleasant to speak to people who care about performance. I'd bet we'll
> find it packaged in a usable state long before OpenSSL finally changes
> their mind on QUIC and reaches distros in a usable state. That's a
> perfect (though sad) example of the impact of design by committee!
>
>https://www.openssl.org/policies/omc-bylaws.html#OMC
>https://en.wikipedia.org/wiki/Design_by_committee
>
> Everything was written...
> Willy
>
>


Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread Willy Tarreau
On Fri, Dec 16, 2022 at 07:29:23AM +0100, Vincent Bernat wrote:
> On 2022-12-16 05:49, Willy Tarreau wrote:
> > There's currently a great momentum around WolfSSL that was already
> > adopted by Apache, Curl, and Ngtcp2 (which is the QUIC stack that
> > powers most HTTP/3-compatible agents). Its support on haproxy is
> > making fast progress thanks to the efforts on the two sides, and it's
> > pleasant to speak to people who care about performance. I'd bet we'll
> > find it packaged in a usable state long before OpenSSL finally changes
> > their mind on QUIC and reaches distros in a usable state. That's a
> > perfect (though sad) example of the impact of design by committee!
> 
> It's currently packaged in Debian and Ubuntu. For Ubuntu, it is currently in
> universe (no security support). For Debian, there are discussions to not
> ship it in the next release due to security concerns, but this is being worked on.

That's great! I noticed that the lib comes with many build options, and I
guess that one difficult aspect will be to figure which ones to enable in
the packaged version. I guess that the various projects supporting it will
help them figure a reasonable set of default settings that suits everyone
(at least all packaged projects). This could constitute a potential solution
to have both QUIC support and performance back in future distros.

> I'll ask again later when its support is finished in HAProxy if we can
> switch to it for Debian/Ubuntu packages.

Great, thank you for your help! Most users don't realize how much the
success of certain protocol improvements depends on just a bunch of
people's willingness to improve the situation for end users ;-)

> Next Debian will be using OpenSSL 3.0.0. Ubuntu is using OpenSSL 3.0.0 since
> Jammy.

Good to know for Debian, thanks!
Willy



Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread Shawn Heisey

On 12/15/22 21:49, Willy Tarreau wrote:

There's currently a great momentum around WolfSSL that was already
adopted by Apache, Curl, and Ngtcp2 (which is the QUIC stack that
powers most HTTP/3-compatible agents). Its support on haproxy is
making fast progress thanks to the efforts on the two sides, and it's
pleasant to speak to people who care about performance.


What would be your recommendation right now for a quic-enabled library 
to use with haproxy?  Are there any choices better than quictls 1.1.1?


Is wolfSSL support far enough along that I could build and try it and 
have some hope of success, or should I stick with quictls for now?  My 
websites certainly aren't anything mission-critical, but there are 
people that would be annoyed if I have problems.  Email is more 
important than the websites, and that's directly on the Internet in my 
AWS instance, not going through haproxy.


Thanks,
Shawn



Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread Vincent Bernat

On 2022-12-16 05:49, Willy Tarreau wrote:

There's currently a great momentum around WolfSSL that was already
adopted by Apache, Curl, and Ngtcp2 (which is the QUIC stack that
powers most HTTP/3-compatible agents). Its support on haproxy is
making fast progress thanks to the efforts on the two sides, and it's
pleasant to speak to people who care about performance. I'd bet we'll
find it packaged in a usable state long before OpenSSL finally changes
their mind on QUIC and reaches distros in a usable state. That's a
perfect (though sad) example of the impact of design by committee!


It's currently packaged in Debian and Ubuntu. For Ubuntu, it is 
currently in universe (no security support). For Debian, there are 
discussions to not ship it in the next release due to security concerns, 
but this is being worked on.


I'll ask again later when its support is finished in HAProxy if we can 
switch to it for Debian/Ubuntu packages.


Next Debian will be using OpenSSL 3.0.0. Ubuntu is using OpenSSL 3.0.0 
since Jammy.




Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread Willy Tarreau
On Thu, Dec 15, 2022 at 08:58:29PM -0700, Shawn Heisey wrote:
> I'm sure the performance issue has been brought to the attention of the
> OpenSSL project ... what did they have to say about the likelihood and
> timeline for providing a fix?

They're still working on it for 3.1. 3.1-alpha is "less worse" than
3.0 but still far behind 1.1.1 in our tests.

> Is there an article or bug filed I can read for more information?

There's this issue that centralizes the status of the most important
regression reports:

  https://github.com/openssl/openssl/issues/17627#issuecomment-1060123659

We've also planned to issue an article to summarize our observations
about this before users are hit too hard, but it will take some
time to collect all info and write it down. But it's definitely a big
problem for users who upgrade to latest LTS distros that shipped 3.0
without testing it (though I can't blame distros, it's not the package
maintainers' job to run performance tests on what they maintain) :-(

My personal feeling is that this disaster combined with the stubborn
refusal to support the QUIC crypto API that is mandatory for any
post-2021 HTTP agent basically means that OpenSSL is not part of the
future of web environments and that it's urgent to find alternatives,
just like all other projects are currently seeking. And with http-based
products forced to abandon OpenSSL, it's unlikely that their performance
issues will be relevant in the future so it should get even worse over
time by lack of testing and exposure. It's sad, because before the QUIC
drama, we hoped to spend some time helping them improve their performance
by reducing the locking abuse. Now the project has gone too far in the
wrong direction for anything to be doable anymore, and I doubt that
anyone has the energy to fork 1.1.1 and restart from a mostly clean
state. But anyway, a solution must be found for the next batch of LTS
distros so that users can jump from 20.x to 24.x and skip 22.x.

There's currently a great momentum around WolfSSL that was already
adopted by Apache, Curl, and Ngtcp2 (which is the QUIC stack that
powers most HTTP/3-compatible agents). Its support on haproxy is
making fast progress thanks to the efforts on the two sides, and it's
pleasant to speak to people who care about performance. I'd bet we'll
find it packaged in a usable state long before OpenSSL finally changes
their mind on QUIC and reaches distros in a usable state. That's a
perfect (though sad) example of the impact of design by committee!

   https://www.openssl.org/policies/omc-bylaws.html#OMC
   https://en.wikipedia.org/wiki/Design_by_committee

Everything was written...
Willy



Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread Shawn Heisey

On 12/15/22 02:19, Willy Tarreau wrote:

I guess you'll get them only while the previous version remains maintained
(i.e. use a package from the previous LTS distro). But regardless you'll
also need to use executables linked with that version and that's where it
can become a pain.


When I upgraded my main server from Ubuntu 20 to Ubuntu 22, it still had 
openssl 1.1.x installed as an unmanaged package not part of any repo. 
Little by little I got my third-party APT repos updated to jammy.  The 
last holdout was Gitlab, and I got that resolved just a few days ago. 
Then I was able to remove the 1.1 package.


I'm sure the performance issue has been brought to the attention of the 
OpenSSL project ... what did they have to say about the likelihood and 
timeline for providing a fix?  Is there an article or bug filed I can 
read for more information?


Thanks,
Shawn



Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread Shawn Heisey

On 12/15/22 09:47, Shawn Heisey wrote:
The version of curl with http3 support is not available in any of the 
distro repos for my Ubuntu machines, so I found a docker image with it. 
That works in cases where a browser won't switch, but that's because it 
never tries TCP, it goes straight to UDP.  The problem doesn't break H3, 
it just breaks a browser's ability to transition from TCP to UDP.


With the provided patch, H3 is working well on the machine with this 
URL:


https://http3test.elyograg.org/

But it doesn't work correctly on the machine with this URL:

https://admin.elyograg.org/

Testing with the curl docker image works on both servers.  Testing with 
https://http3check.net also works with both servers.


The configs are not completely identical, but everything related to 
quic/h3 for those URLs is identical.  The only significant difference I 
have found so far between the two systems is that the one that works is 
Ubuntu 20.04 with edge kernel 5.15, and the one that doesn't work is 
Ubuntu 22.04 with edge kernel 6.0.


Both have quictls 1.1.1s compiled with exactly the same options, and the 
same haproxy 2.7 version with the same options -- up to date master with 
that one line patch.  They have different openssl versions, but haproxy 
should not be using that, it should just be using quictls.


The hardware is very different.  The one that works is an AWS t3a.large 
instance, 2 CPUs (linux reports AMD EPYC 7571) and 8GB RAM.  The one 
that doesn't work is a Dell R720xd in my basement with two of the 
following CPU, each with 12 cores, and 88GB RAM:


Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz

I have been through the configs working out minor differences, which 
resulted in changes to both configs.  Nothing new -- the AWS instance is 
still working, the basement server isn't.  The backends are the same 
except for the names and IP addresses.


H3 used to work on the basement machine, and I couldn't say when it 
stopped working.  I had seen the green lightning bolt on my zabbix 
install that runs on the basement machine, not sure when it disappeared. 
 I noticed it first on the AWS machine when I was switching quictls 
versions.  I usually update both servers haproxy together, so they 
probably stopped working about the same time.  The patched version works 
well on one, but not the other.


I downgraded the basement to 437fd289f2e32e56498d2d4da63852d483f284ef 
which should be the 2.7.0 release.  That didn't help, so maybe there is 
something else going on.


I believe that haproxy works intimately with kernel code ... could the 
difference of 5.15 and 6.0 (both with all of ubuntu's patches) be enough 
to explain this?


These are very much homegrown configs.  I cobbled together info from the 
documentation, info obtained on this mailing list, and random articles 
found with google.  I might be doing things substantially different than 
a true expert would.


This is how I configure quictls.  If this should be adjusted, I'm open 
to that.


-
CONFARGS="--prefix=/opt/quictls enable-tls1_3 no-idea no-mdc2 no-rc5 no-zlib no-ssl3 enable-unit-test no-ssl3-method enable-rfc3779 enable-cms no-capieng threads"


if [ "$(uname -i)" == "x86_64" ]; then
  CONFARGS="${CONFARGS} enable-ec_nistp_64_gcc_128"
fi
-
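For completeness, the matching haproxy build step usually points `SSL_INC`/`SSL_LIB` at that quictls prefix; a hedged sketch (the `/opt/quictls` paths are taken from the script above, everything else is just the conventional invocation from haproxy's INSTALL notes, not the poster's actual build script):

```
make -j"$(nproc)" TARGET=linux-glibc \
     USE_OPENSSL=1 USE_QUIC=1 \
     SSL_INC=/opt/quictls/include SSL_LIB=/opt/quictls/lib \
     LDFLAGS="-Wl,-rpath,/opt/quictls/lib"
```

The rpath keeps the resulting binary bound to the quictls libraries rather than the distro's OpenSSL 3, which is consistent with the "Built with OpenSSL version : OpenSSL 1.1.1s+quic" line in the -vv output below.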

And here is the latest haproxy -vv:

HAProxy version 2.7.0-e557ae-43 2022/12/14 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2024.
Known bugs: http://www.haproxy.org/bugs/bugs-2.7.0.html
Running on: Linux 5.15.0-1026-aws #30~20.04.2-Ubuntu SMP Fri Nov 25 
14:53:22 UTC 2022 x86_64

Build options :
  TARGET  = linux-glibc
  CPU = native
  CC  = cc
  CFLAGS  = -O2 -march=native -g -Wall -Wextra -Wundef 
-Wdeclaration-after-statement -Wfatal-errors -Wtype-limits 
-Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond 
-Wnull-dereference -fwrapv -Wno-address-of-packed-member 
-Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered 
-Wno-missing-field-initializers -Wno-cast-function-type 
-Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_ZLIB=1 USE_SYSTEMD=1 
USE_QUIC=1

  DEBUG   =

Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT -PCRE2 
+PCRE2_JIT +POLL +THREAD -PTHREAD_EMULATION +BACKTRACE -STATIC_PCRE 
-STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H 
-ENGINE +GETADDRINFO +OPENSSL -OPENSSL_WOLFSSL -LUA +ACCEPT4 -CLOSEFROM 
+ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL 
+SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT 
+QUIC -PROMEX -MEMORY_PROFILING +SHM_OPEN


Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, 
default=2).

Built with OpenSSL version : OpenSSL 1.1.1s+quic  1 Nov 2022
Running on OpenSSL version : OpenSSL 1.1.1s+quic  1 Nov 2022
OpenSSL library supports TLS extensions : yes
OpenSSL 

Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread Shawn Heisey

On 12/15/22 00:58, Amaury Denoyelle wrote:

I seem to be able to reach your website with H3 currently. Did you
revert to an older version ? Regarding this commit, it rejects requests
with invalid headers (with uppercase or non-HTTP tokens in the field
name). Have you tried with several browsers and with command-line
clients ?


Yes, once I found the problem commit, I reverted to the commit just 
prior, which is why you saw it working.


I had to use --3way to apply the patch from your other message to the 
2.8-dev master branch.  Got that built and deployed.  H3 works. 
Looking forward to the fix coming to 2.7.


I did try with firefox, chrome, and a special version of curl.

The version of curl with http3 support is not available in any of the 
distro repos for my Ubuntu machines, so I found a docker image with it. 
That works in cases where a browser won't switch, but that's because it 
never tries TCP, it goes straight to UDP.  The problem doesn't break H3, 
it just breaks a browser's ability to transition from TCP to UDP.
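For anyone reproducing this, an h3-capable curl skips the alt-svc upgrade dance entirely; a sketch using curl's HTTP/3 flag and the test URL from this thread:

```
curl --http3 -sI https://http3test.elyograg.org/
```

Because `--http3` sends the request straight over QUIC, a success here only proves the UDP side works; it says nothing about whether a browser will upgrade from TCP via alt-svc, which is exactly the distinction described above.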


With the commit just prior to the one that broke H3 in a browser, H3 is 
a lot more stable than it has been in the past.  Before, by clicking 
around between folders in my webmail, I could eventually (after maybe a 
dozen clicks) reach a point where the website becomes unresponsive until 
I shift-reload to get it back to H2 and then reload to have it switch to 
H3 again.  That did not happen with the newer commit.  Building with 
your patch also handles webmail flawlessly.


Looks like you meant that I was supposed to apply the patch to the 2.7 
master branch, not 2.8-dev.  It applied there without --3way, and that 
also fixes the problem.


Just got a look at the patch.  One-line code fixes are awesome.

Thanks,
Shawn



Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread Amaury Denoyelle
On Thu, Dec 15, 2022 at 09:20:01AM +0100, Amaury Denoyelle wrote:
> On Thu, Dec 15, 2022 at 09:03:18AM +0100, Amaury Denoyelle wrote:
> > On Thu, Dec 15, 2022 at 08:58:16AM +0100, Amaury Denoyelle wrote:
> > > On Wed, Dec 14, 2022 at 11:20:44PM -0700, Shawn Heisey wrote:
> > > > On 12/14/22 21:23, Илья Шипицин wrote:
> > > > > Can you try to bisect?
> > > > I had made some incorrect assumptions about what's needed to use
> > > > bisect.  With a little bit of research I figured it out and it was a
> > > > LOT easier than I had imagined.
> > > > > I suspect that it won't help, browsers tend to remember things in
> > > > > their own way
> > > > One thing I have learned in my testing is that doing shift-reload on
> > > > the page means it will never switch to h3.  So I use shift-reload
> > > > followed by a couple of regular reloads as a way of resetting what
> > > > the browser remembers.  That seems to work.
> > > > The bisect process only took a few runs to find the problem commit:
> > > > 3ca4223c5e1f18a19dc93b0b09ffdbd295554d46 is the first bad commit
> > > > commit 3ca4223c5e1f18a19dc93b0b09ffdbd295554d46
> > > > Author: Amaury Denoyelle 
> > > > Date:   Wed Dec 7 14:31:42 2022 +0100
> > > > BUG/MEDIUM: h3: reject request with invalid header name
> > > > [...]
> > > I seem to be able to reach your website with H3 currently. Did you
> > > revert to an older version ? Regarding this commit, it rejects requests
> > > with invalid headers (with uppercase or non-HTTP tokens in the field
> > > name). Have you tried with several browsers and with command-line
> > > clients ?
> > > I will look on my side to see if I missed something.
> > With a local instance of nextcloud I am able to reproduce a bug linked
> > to this commit which caused the deactivation of H3. I'm investigating on
> > it...
> The issue seems to be triggered by requests with a cookie header. Can you
> please apply the following patch on top of the master branch and confirm
> whether it resolves your issue? Thanks.
> [...]

I'm now confident in the fix, so I merged my patch. If you can, please
give the new master branch a try and tell me whether your issue is
resolved.

Thank you for your help on this issue, I really appreciate it!

-- 
Amaury Denoyelle



Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread Willy Tarreau
On Thu, Dec 15, 2022 at 08:56:13AM +0100, Vincent Bernat wrote:
> On 2022-12-14 15:15, Willy Tarreau wrote:
> > Possibly, yes. It's more efficient in every way from what we can see.
> > For users who build themselves (and with QUIC right now you don't have
> > a better choice), it should not change anything and will keep robustness.
> > For those relying on the distro's package, I don't know if it's possible
> > to install the previous distro's package side-by-side, but in any case
> > it can start to become a mess to deal with.
> 
> It's possible on Debian and I suspect this is the same for RedHat. However,
> you don't get security updates in this case.

I guess you'll get them only while the previous version remains maintained
(i.e. use a package from the previous LTS distro). But regardless you'll
also need to use executables linked with that version and that's where it
can become a pain.

Willy



Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread Amaury Denoyelle
On Thu, Dec 15, 2022 at 09:03:18AM +0100, Amaury Denoyelle wrote:
> On Thu, Dec 15, 2022 at 08:58:16AM +0100, Amaury Denoyelle wrote:
> > On Wed, Dec 14, 2022 at 11:20:44PM -0700, Shawn Heisey wrote:
> > > On 12/14/22 21:23, Илья Шипицин wrote:
> > > > Can you try to bisect?
> > > I had made some incorrect assumptions about what's needed to use
> > > bisect.  With a little bit of research I figured it out and it was a
> > > LOT easier than I had imagined.
> > > > I suspect that it won't help, browsers tend to remember things in
> > > > their own way
> > > One thing I have learned in my testing is that doing shift-reload on
> > > the page means it will never switch to h3.  So I use shift-reload
> > > followed by a couple of regular reloads as a way of resetting what
> > > the browser remembers.  That seems to work.
> > > The bisect process only took a few runs to find the problem commit:
> > > 3ca4223c5e1f18a19dc93b0b09ffdbd295554d46 is the first bad commit
> > > commit 3ca4223c5e1f18a19dc93b0b09ffdbd295554d46
> > > Author: Amaury Denoyelle 
> > > Date:   Wed Dec 7 14:31:42 2022 +0100
> > > BUG/MEDIUM: h3: reject request with invalid header name
> > > [...]
> > I seem to be able to reach your website with H3 currently. Did you
> > revert to an older version ? Regarding this commit, it rejects requests
> > with invalid headers (with uppercase or non-HTTP tokens in the field
> > name). Have you tried with several browsers and with command-line
> > clients ?
> > I will look on my side to see if I missed something.
> With a local instance of nextcloud I am able to reproduce a bug linked
> to this commit which caused the deactivation of H3. I'm investigating on
> it...

The issue seems to be triggered by requests with a cookie header. Can you
please apply the following patch on top of the master branch and confirm
whether it resolves your issue? Thanks.

-- 
Amaury Denoyelle
From 603a919c8b0cea75516571c27e427960e85fae72 Mon Sep 17 00:00:00 2001
From: Amaury Denoyelle 
Date: Thu, 15 Dec 2022 09:18:25 +0100
Subject: [PATCH] BUG/MEDIUM: h3: fix cookie header parsing

---
 src/h3.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/h3.c b/src/h3.c
index d24b3de5f..10d19e2cd 100644
--- a/src/h3.c
+++ b/src/h3.c
@@ -544,6 +544,7 @@ static ssize_t h3_headers_to_htx(struct qcs *qcs, const struct buffer *buf,
 
if (isteq(list[hdr_idx].n, ist("cookie"))) {
 			http_cookie_register(list, hdr_idx, &cookie, &last_cookie);
+   ++hdr_idx;
continue;
}
else if (isteq(list[hdr_idx].n, ist("content-length"))) {
-- 
2.39.0
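The one-line fix above advances `hdr_idx` before the early `continue`; without it, the loop would register the cookie and then re-examine the same header slot. A simplified Python illustration of the pattern (hypothetical names and data, not haproxy's actual structures):

```python
def split_headers(headers):
    """Partition a header list into cookie values and other headers.

    Mirrors the shape of the h3_headers_to_htx loop: an explicit index
    with an early `continue` that must advance the index itself.
    """
    cookies, others = [], []
    idx = 0
    while idx < len(headers):
        name, value = headers[idx]
        if name == "cookie":
            cookies.append(value)
            idx += 1  # the equivalent of the patch's `++hdr_idx`;
                      # omitting this re-reads the same header forever
            continue
        others.append((name, value))
        idx += 1
    return cookies, others
```

For example, `split_headers([("cookie", "a=1"), ("host", "x")])` returns `(["a=1"], [("host", "x")])`; drop the `idx += 1` before the `continue` and the same call never terminates, which matches the observed symptom of H3 dying on any request carrying a cookie.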



Re: Followup on openssl 3.0 note seen in another thread

2022-12-15 Thread Amaury Denoyelle
On Thu, Dec 15, 2022 at 08:58:16AM +0100, Amaury Denoyelle wrote:
> On Wed, Dec 14, 2022 at 11:20:44PM -0700, Shawn Heisey wrote:
> > On 12/14/22 21:23, Илья Шипицин wrote:
> > > Can you try to bisect?
> > I had made some incorrect assumptions about what's needed to use
> > bisect.  With a little bit of research I figured it out and it was a
> > LOT easier than I had imagined.
> > > I suspect that it won't help, browsers tend to remember things in
> > > their own way
> > One thing I have learned in my testing is that doing shift-reload on
> > the page means it will never switch to h3.  So I use shift-reload
> > followed by a couple of regular reloads as a way of resetting what
> > the browser remembers.  That seems to work.
> > The bisect process only took a few runs to find the problem commit:
> > 3ca4223c5e1f18a19dc93b0b09ffdbd295554d46 is the first bad commit
> > commit 3ca4223c5e1f18a19dc93b0b09ffdbd295554d46
> > Author: Amaury Denoyelle 
> > Date:   Wed Dec 7 14:31:42 2022 +0100
> > BUG/MEDIUM: h3: reject request with invalid header name
> > [...]
> I seem to be able to reach your website with H3 currently. Did you
> revert to an older version ? Regarding this commit, it rejects requests
> with invalid headers (with uppercase or non-HTTP tokens in the field
> name). Have you tried with several browsers and with command-line
> clients ?
> I will look on my side to see if I missed something.

With a local instance of nextcloud I am able to reproduce a bug linked
to this commit which caused the deactivation of H3. I'm investigating on
it...

-- 
Amaury Denoyelle



Re: Followup on openssl 3.0 note seen in another thread

2022-12-14 Thread Amaury Denoyelle
On Wed, Dec 14, 2022 at 11:20:44PM -0700, Shawn Heisey wrote:
> On 12/14/22 21:23, Илья Шипицин wrote:
> > Can you try to bisect?
> I had made some incorrect assumptions about what's needed to use
> bisect.  With a little bit of research I figured it out and it was a
> LOT easier than I had imagined.
> > I suspect that it won't help, browsers tend to remember things in
> > their own way
> One thing I have learned in my testing is that doing shift-reload on
> the page means it will never switch to h3.  So I use shift-reload
> followed by a couple of regular reloads as a way of resetting what
> the browser remembers.  That seems to work.
> The bisect process only took a few runs to find the problem commit:
> 3ca4223c5e1f18a19dc93b0b09ffdbd295554d46 is the first bad commit
> commit 3ca4223c5e1f18a19dc93b0b09ffdbd295554d46
> Author: Amaury Denoyelle 
> Date:   Wed Dec 7 14:31:42 2022 +0100
> BUG/MEDIUM: h3: reject request with invalid header name
> [...]

I seem to be able to reach your website with H3 currently. Did you
revert to an older version ? Regarding this commit, it rejects requests
with invalid headers (with uppercase or non-HTTP tokens in the field
name). Have you tried with several browsers and with command-line
clients ?

I will look on my side to see if I missed something.

-- 
Amaury Denoyelle



Re: Followup on openssl 3.0 note seen in another thread

2022-12-14 Thread Vincent Bernat

On 2022-12-14 15:15, Willy Tarreau wrote:

Possibly, yes. It's more efficient in every way from what we can see.
For users who build themselves (and with QUIC right now you don't have
a better choice), it should not change anything and will keep robustness.
For those relying on the distro's package, I don't know if it's possible
to install the previous distro's package side-by-side, but in any case
it can start to become a mess to deal with.


It's possible on Debian and I suspect this is the same for RedHat. 
However, you don't get security updates in this case.




Re: Followup on openssl 3.0 note seen in another thread

2022-12-14 Thread Shawn Heisey

On 12/14/22 21:23, Илья Шипицин wrote:

Can you try to bisect?


I had made some incorrect assumptions about what's needed to use bisect. 
 With a little bit of research I figured it out and it was a LOT easier 
than I had imagined.


I suspect that it won't help, browsers tend to remember things in their 
own way


One thing I have learned in my testing is that doing shift-reload on the 
page means it will never switch to h3.  So I use shift-reload followed 
by a couple of regular reloads as a way of resetting what the browser 
remembers.  That seems to work.


The bisect process only took a few runs to find the problem commit:

3ca4223c5e1f18a19dc93b0b09ffdbd295554d46 is the first bad commit
commit 3ca4223c5e1f18a19dc93b0b09ffdbd295554d46
Author: Amaury Denoyelle 
Date:   Wed Dec 7 14:31:42 2022 +0100

BUG/MEDIUM: h3: reject request with invalid header name

Reject request containing invalid header name. This concerns every
header containing uppercase letter or a non HTTP token such as a space.

For the moment, this kind of errors triggers a connection close. In the
future, it should be handled only with a stream reset. To reduce
backport surface, this will be implemented in another commit.

Thanks to Yuki Mogi from FFRI Security, Inc. for having reported this.

This must be backported up to 2.6.

(cherry picked from commit d6fb7a0e0f3a79afa1f4b6fc7b62053c3955dc4a)
Signed-off-by: Christopher Faulet 

 src/h3.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)
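The rule the commit enforces comes from the HTTP field-name grammar (the `token` characters of RFC 9110, restricted to lowercase since HTTP/3 forbids uppercase field names). A hedged sketch of such a check for regular field names, pseudo-headers like `:method` being handled separately (this is an illustration, not haproxy's actual code):

```python
# Punctuation permitted in an HTTP token per RFC 9110 section 5.6.2.
_TOKEN_EXTRAS = set("!#$%&'*+-.^_`|~")

def is_valid_h3_header_name(name: str) -> bool:
    """Return True if `name` is acceptable as an HTTP/3 field name:
    non-empty, and every character is a lowercase letter, a digit,
    or one of the RFC 9110 token punctuation characters."""
    if not name:
        return False
    return all(
        ("a" <= c <= "z") or ("0" <= c <= "9") or c in _TOKEN_EXTRAS
        for c in name
    )
```

Under this check, `cookie` and `content-length` pass, while `Cookie` (uppercase) and `x custom` (space is not a token character) are rejected, which is exactly the class of request the commit started refusing.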



Re: Followup on openssl 3.0 note seen in another thread

2022-12-14 Thread Willy Tarreau
On Thu, Dec 15, 2022 at 10:23:59AM +0600, Илья Шипицин wrote:
> Can you try to bisect?
> 
> I suspect that it won't help, browsers tend to remember things in their own
> way

That's often the problem we've been facing as well during tests. When a
browser decides that your QUIC implementation doesn't work, it seems to
store the info "somewhere" for "some time". That's extremely frustrating
because restarting usually doesn't change, and there doesn't seem to be
anything available to tell them "OK I finished fiddling with my setup,
please try again".

Willy



Re: Followup on openssl 3.0 note seen in another thread

2022-12-14 Thread Илья Шипицин
Can you try to bisect?

I suspect that it won't help, browsers tend to remember things in their own
way

On Thu, Dec 15, 2022, 9:09 AM Shawn Heisey  wrote:

> On 12/14/22 19:33, Shawn Heisey wrote:
> > With quictls 3.0.7 it was working.  I will try rebuilding and see
> > whether it still does.  There was probably an update to haproxy as well
> > as changing quictls -- my build script pulls the latest from the 2.7 git
> > repo.
>
> Rebuilding with quictls 3.0.7 didn't change the behavior -- browsers
> still don't switch to http3 as they did before, so the obvious conclusion
> is that something changed in haproxy.
>
> If you would like me to do anything to help troubleshoot, please let me
> know.
>
> This is the simplest test I have.  Reloading this page used to switch to
> http3:
>
> https://http3test.elyograg.org/
>
> I also built and installed the latest 2.8.0-dev version with quictls
> 1.1.1s.  It doesn't switch to h3 either.
>
> Thanks,
> Shawn
>
>


Re: Followup on openssl 3.0 note seen in another thread

2022-12-14 Thread Shawn Heisey

On 12/14/22 19:33, Shawn Heisey wrote:
With quictls 3.0.7 it was working.  I will try rebuilding and see 
whether it still does.  There was probably an update to haproxy as well 
as changing quictls -- my build script pulls the latest from the 2.7 git 
repo.


Rebuilding with quictls 3.0.7 didn't change the behavior -- browsers 
still don't switch to http3 as they did before, so the obvious conclusion 
is that something changed in haproxy.


If you would like me to do anything to help troubleshoot, please let me 
know.


This is the simplest test I have.  Reloading this page used to switch to 
http3:


https://http3test.elyograg.org/

I also built and installed the latest 2.8.0-dev version with quictls 
1.1.1s.  It doesn't switch to h3 either.


Thanks,
Shawn



Re: Followup on openssl 3.0 note seen in another thread

2022-12-14 Thread Shawn Heisey

On 12/14/22 07:15, Willy Tarreau wrote:

Should I switch to quictls 1.1.1 instead?

Possibly, yes


I did this, and now browsers do not switch to http3.  A direct request 
that forces http3 works, but browsers are not switching to it based on 
the alt-svc header.  Tried both firefox and chrome which have been 
successful for me in the past.
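For context, the upgrade path being tested here depends on the TCP listener advertising the QUIC one via alt-svc; in haproxy that typically looks like the following (a sketch with assumed names and certificate paths, not the poster's actual config):

```
frontend web
    bind :443 ssl crt /etc/haproxy/certs/example.pem alpn h2,http/1.1
    bind quic4@:443 ssl crt /etc/haproxy/certs/example.pem alpn h3
    http-response set-header alt-svc 'h3=":443"; ma=900'
    default_backend servers
```

When the alt-svc header is present in responses but the browser still stays on H2, the problem is usually on the UDP/QUIC path (or in the browser's cached opinion of it) rather than in this header, which matches the symptoms described in this thread.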


I grabbed a sniffer trace of UDP/443 when I ask for the page in firefox. 
 Here is a wireshark view of that when following the UDP stream:


https://www.dropbox.com/s/5sc8ylxt82mn0gf/h3_udp_capture_follow.png?dl=0

That certainly looks to me like a significant amount of two-way 
communication, but as it's encrypted, I have no idea what it might mean. 
 The browser's console reports that the connection is http/2.


With quictls 3.0.7 it was working.  I will try rebuilding and see 
whether it still does.  There was probably an update to haproxy as well 
as changing quictls -- my build script pulls the latest from the 2.7 git 
repo.


Output from haproxy -vv:

HAProxy version 2.7.0-e557ae-43 2022/12/14 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2024.
Known bugs: http://www.haproxy.org/bugs/bugs-2.7.0.html
Running on: Linux 5.15.0-1026-aws #30~20.04.2-Ubuntu SMP Fri Nov 25 
14:53:22 UTC 2022 x86_64

Build options :
  TARGET  = linux-glibc
  CPU = native
  CC  = cc
  CFLAGS  = -O2 -march=native -g -Wall -Wextra -Wundef 
-Wdeclaration-after-statement -Wfatal-errors -Wtype-limits 
-Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond 
-Wnull-dereference -fwrapv -Wno-address-of-packed-member 
-Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered 
-Wno-missing-field-initializers -Wno-cast-function-type 
-Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_ZLIB=1 USE_SYSTEMD=1 
USE_QUIC=1

  DEBUG   =

Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT -PCRE2 
+PCRE2_JIT +POLL +THREAD -PTHREAD_EMULATION +BACKTRACE -STATIC_PCRE 
-STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H 
-ENGINE +GETADDRINFO +OPENSSL -OPENSSL_WOLFSSL -LUA +ACCEPT4 -CLOSEFROM 
+ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL 
+SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT 
+QUIC -PROMEX -MEMORY_PROFILING +SHM_OPEN


Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, 
default=2).

Built with OpenSSL version : OpenSSL 1.1.1s+quic  1 Nov 2022
Running on OpenSSL version : OpenSSL 1.1.1s+quic  1 Nov 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with network namespace support.
Support for malloc_trim() is enabled.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT 
IPV6_TRANSPARENT IP_FREEBIND

Built with PCRE2 version : 10.34 2019-11-21
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 9.4.0

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
        quic : mode=HTTP  side=FE     mux=QUIC  flags=HTX|NO_UPG|FRAMED
          h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
        fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
   <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
          h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
   <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
        none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : none

Available filters :
[BWLIM] bwlim-in
[BWLIM] bwlim-out
[CACHE] cache
[COMP] compression
[FCGI] fcgi-app
[SPOE] spoe
[TRACE] trace

Thanks,
Shawn



Re: Followup on openssl 3.0 note seen in another thread

2022-12-14 Thread Shawn Heisey

On 12/14/22 12:06, Shawn Heisey wrote:
I built a gitlab CI config to test out changes to my build/install 
scripts.  I'm having some trouble with that where haproxy is not working 
right; I'll start a new thread.


Turned out that most of those problems were due to docker-related 
issues.  And then I discovered that in my tiny little test config for 
haproxy I had the bind line for udp/443 all wrong.


The following command may be of interest to anyone testing out 
http3/quic support.  It requires that you have docker installed.  On 
Ubuntu that can be installed with "apt install docker.io".


sudo docker run --add-host=host.docker.internal:host-gateway --rm 
ymuski/curl-http3 curl -v -m 4 -s -f -k 
"https://host.docker.internal/test_file" --http3 && echo GOOD


The curl options configure a 4 second absolute timeout, suppress the 
usual progress meter that curl shows, turn 4xx or 5xx response codes 
into a nonzero exit status, and disable certificate validation.  Perfect 
for a CI/CD pipeline.
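For reuse across several CI jobs, the same exit-status pattern can be 
wrapped in a small helper.  A sketch (the function name check_h3 is made 
up here; in real use the arguments would be the docker/curl command shown 
above):

```shell
#!/bin/sh
# check_h3 CMD...: run CMD quietly and print GOOD or FAIL based on its
# exit status -- the same idea as the "&& echo GOOD" one-liner above.
check_h3() {
    if "$@" >/dev/null 2>&1; then
        echo "GOOD"
    else
        # $? here is the probe's exit status (e.g. 28 for a curl timeout)
        echo "FAIL (exit $?)"
        return 1
    fi
}
```

A pipeline step can then run `check_h3 sudo docker run ... --http3` and 
fail the job on a nonzero return.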


Thanks,
Shawn



Re: Followup on openssl 3.0 note seen in another thread

2022-12-14 Thread Shawn Heisey

On 12/14/22 07:15, Willy Tarreau wrote:

Should I switch to quictls 1.1.1 instead?


Possibly, yes. It's more efficient in every way from what we can see.
For users who build themselves (and with QUIC right now you don't have
a better choice), it should not change anything and will keep robustness.
For those relying on the distro's package, I don't know if it's possible
to install the previous distro's package side-by-side, but in any case
it can start to become a mess to deal with.


As a bonus, 1.1.1s compiles noticeably faster than 3.0.7.  Install time 
seems about the same, but I figured out how to have the install skip the 
docs, which brought install time down to 3 seconds.


I built a gitlab CI config to test out changes to my build/install 
scripts.  I'm having some trouble with that where haproxy is not working 
right; I'll start a new thread.


Thanks,
Shawn



Re: Followup on openssl 3.0 note seen in another thread

2022-12-14 Thread Willy Tarreau
On Wed, Dec 14, 2022 at 07:01:59AM -0700, Shawn Heisey wrote:
> On 12/14/22 06:07, Willy Tarreau wrote:
> > By the way, are you running with OpenSSL
> > 3.0 ?  That one is absolutely terrible and makes extreme abuse of
> > mutexes and locks, to the point that certain workloads were divided
> > by 2-digit numbers between 1.1.1 and 3.0. It took me one day to
> > figure that my load generator which was caping at 400 conn/s was in
> > fact suffering from an accidental build using 3.0 while in 1.1.1
> > the perf went back to 75000/s!
> 
> Is this a current problem with the latest openssl built from source?

Yes and deeper than that actually, there's even a meta-issue to try to
reference the many reports for massive performance regressions on the
project:

  https://github.com/openssl/openssl/issues/17627#issuecomment-1060123659

> I'm
> running my 2.7.x installs with quictls 3.0.7, which aside from the QUIC
> support should be the same as openssl.

Due to new distros progressively moving to 3.0, it's getting more and
more exposed. And with 1.1.1 support ending soon, it's going to become
a huge problem for many high-performance users.

> 400 connections per second is a lot more than I need, but if it's that
> inefficient, seems like overall system performance would take a hit even if
> it's not completely saturated.  My primary server has dual E5-2697 v2 CPUs,
> but my mail server is a 2-CPU AWS instance.

Actually you're in the same situation as plenty of users who don't need
this level of performance and will not necessarily notice the problem
until they face a traffic spike and the machine collapses.

> Should I switch to quictls 1.1.1 instead?

Possibly, yes. It's more efficient in every way from what we can see.
For users who build themselves (and with QUIC right now you don't have
a better choice), it should not change anything and will keep robustness.
For those relying on the distro's package, I don't know if it's possible
to install the previous distro's package side-by-side, but in any case
it can start to become a mess to deal with.

But if you're running at low loads and ideally not exposed to the net,
it's unlikely that you'd notice it. What's really happening is that
in order to make it more dynamic they've apparently replaced lots of
constants with functions that run over lists under locks, so if you're
facing very low load, the overhead will remain minimal, but once the
load increases and multiple threads need to access the same elements,
contention happens.

To give you an idea, during a test I measured up to 80 calls to a
rwlock for a single HTTP request...  Mutexes are so expensive that
they should be avoided by all means in low-level functions, and in
the worst case should be limited to a single-digit count. Here it has
no chance to ever recover once a short traffic spike touches the machine.

Willy