Re: [RFC PATCH 1/1] MEDIUM: ssl: add alternative way to load certificates with io_uring

2020-05-11 Thread William Dauchy
On Mon, May 11, 2020 at 5:32 PM William Lallemand
 wrote:
> By removing, it just means:
> - In the case of a directory there is basicaly nothing to do but remove
>   the bundle code, everything will be loaded as usual, it will just be
>   done in separate SSL_CTX.
> - In the configuration, if you specify a bundle with "example.pem" to
>   load the .rsa and .ecdsa, I'll probably add a shim to emulate the
>   previous behavior, but they will be loaded in separate SSL_CTX.

understood, this will indeed simplify a lot of things. Thanks for the
clarification!
-- 
William



Re: [RFC PATCH 1/1] MEDIUM: ssl: add alternative way to load certificates with io_uring

2020-05-11 Thread William Lallemand
On Mon, May 11, 2020 at 04:34:07PM +0200, William Dauchy wrote:
> > I'm going to remove the support for multi-cert bundle once 2.2 is
> > released, so it will simplify a lot of things. I encourage people
> > writing new features to not support multi-cert bundles, more
> > particularly on the CLI.
> 
> So you mean that
> example.pem.rsa
> example.pem.ecdsa
> will be loaded separately as all the other certificates?
> I'm not 100% sure what you meant behind removing support of "multi-cert 
> bundle".

What we call "multi-cert bundle" is the ability to load a certificate
for each key type (RSA,DSA,ECDSA) in the same SSL_CTX.
It was a hack implemented this way for OpenSSL 1.0.2 because there
wasn't any callback available to chose a certificate type in this
version. It wasn't implemented by all other SSL libraries.

With OpenSSL 1.1.1, a client_hello callback was implemented, letting us
chose the certificate type ourselves instead of letting OpenSSL do it.
In my opinion this API is better.

The multi-cert bundles are tricky and need specific conditions
everywhere in the loading code. It complexify a lot of things. For
example the loading code needs to verify if all SNIs exists in all
certificate types, and generates the right SSL_CTX combinations, leading
to multiple SSL_CTX created if the SNI weren't matching.

By removing, it just means:
- In the case of a directory there is basicaly nothing to do but remove
  the bundle code, everything will be loaded as usual, it will just be
  done in separate SSL_CTX.
- In the configuration, if you specify a bundle with "example.pem" to
  load the .rsa and .ecdsa, I'll probably add a shim to emulate the
  previous behavior, but they will be loaded in separate SSL_CTX.

Regarding the steps during the connection, during the SNI lookup,
instead of having one sni_ctx node containing a SSL_CTX with all
certificates, you will have one sni_ctx for each certificate type and
the right one will be selected during lookup.

Note that you can configure more precisely the SSL, for each certificate
file, by loading separately the certs in a crt-list, instead of
configuring all certificates on the bind line.

-- 
William Lallemand



Re: [RFC PATCH 1/1] MEDIUM: ssl: add alternative way to load certificates with io_uring

2020-05-11 Thread William Dauchy
Hello William L.

Thank you for your answer.

On Mon, May 11, 2020 at 1:01 PM William Lallemand
 wrote:
> I'm wondering if we coud have the timing directly in the logs or on the
> CLI, because I'm actually also doing this kind of things, and people
> could probably optimize the loading if they could see the loading time.

I agree it is something which was missing while doing my tests.
Remove/re-add my timer to do my tests each time I was doing some
modifications; to a point where I noted to ask you if you had some
tooling I missed.
So that's totally something I could do in a separate patch! What is
your preferred timing function?

> Your patch is really interesting in my opinion, a lot of people could
> benefits of something like that. Regarding the reload part, I have some
> ideas to pass the ckch structures to the new processes without reloading
> the files one the filesystem, but I didn't implement anything yet.

ok; interesting.

> I'm currently reworking the ssl_sock.c and .h files, and splitting them
> in several files, so you will probably need to rebase your patch on top
> of this. At the moment I separated this in 3 parts: 1 part for the
> crt-list/directories, 1 part for the ckch, and 1 part for the
> SSL_CTX/bind. I'm not happy yet with what I have, but I'll do my best to
> push that as soon as possible in the git repository, I don't want the
> 2.2 to be released in the current state of the files.

good to know, and that's mostly why I avoided too much changes on the
file to avoid having too much rebase work while evolving my
functionality.

> I don't really know the io_uring API in details so I trust you on the
> implementation.

Yes, I've been using it for a few months now, so I'm starting to
better understand how to properly use it. There are indeed a few
different options where you could easily get trapped, and making the
code more complicated.
I however was planning to ask Jens for a quick review when I will have
something more feature complete.

> Regarding the name of the file, ssl_load.c, it is really generic, I
> don't mind having a more specific file only for loading the SSL
> certificates with io_uring.

Yes I totally agree; as it was more of an experimentation I will do a
second path on the naming and improve that. As I also started some
poll changes, it also helps me to better name things to make elements
more reusable.

> I'm wondering if we couldn't push directly the certificate in an X509
> struture instead of storing it in a intermediate buffer, and use the
> ckch instead of a specific cert_iobuf. That just some thoughts, I don't
> know io_uring enough.

I will look into that and it was also on my todo but for now I
considered it ok for a first implementation. It mostly depends if we
are able to properly allocate memory at the right time, while keeping
the code clear enough. For now I considered it as a possible
incremental improvement.
Thanks for the input.

> The crtlist files could also benefits from this, since they can contain
> a lot of certificates too.

Yes, indeed, for now I limited myself to the cases I was concerned
about to avoid too much testing. I will improve that. I should have
warned about it in my commit message.

> I'm seeing that you are using in ssl_load_preparetree() the
> SSL_SOCK_KEYTYPES_NAME array, which is only used for merging a
> certificate bundle. I know that this can be confusing in the code
> because the bundle implementation is crappy. But in a directory we
> don't use any filename extension to load a PEM, we load everything as a
> PEM file with the exception of .key, .ocsp, .sctl and .issuer, which are
> loaded separatly with their own functions. (The cert_exts[] array is
> used for that in ssl_sock.c)

ok, good to recall all files should be taken into account; for now I
blindly implemented my only known case. This will be part of the above
point to take into account all possible options.

> I'm going to remove the support for multi-cert bundle once 2.2 is
> released, so it will simplify a lot of things. I encourage people
> writing new features to not support multi-cert bundles, more
> particularly on the CLI.

So you mean that
example.pem.rsa
example.pem.ecdsa
will be loaded separately as all the other certificates?
I'm not 100% sure what you meant behind removing support of "multi-cert bundle".

> Unfortunately your patch is too late for 2.2, but I think it could be
> great for the next release!

Yes, and it was clear for me since the beginning, I would not have the
time to finish things early enough.

Thanks a lot for your feedbacks. I will try to combine the one from
Willy and yours for a future submission. I also hope we could find a
good middle point between something perfect and something we could
improve along the time.

--
William



Re: [RFC PATCH 1/1] MEDIUM: ssl: add alternative way to load certificates with io_uring

2020-05-11 Thread William Lallemand
On Fri, May 08, 2020 at 04:34:27PM +0200, William Dauchy wrote:
> experimental patch to make use of IO_URING to batch load certificates;
> this drastically reduces the number of syscall and might benefit to
> setup with large number of certificates.
> it uses liburing in order to batch operation as follow:
> for each certificate directory, we apply the same operations on each
> file:
>  - statx
>  - openat
>  - read
>  - close
> the results are stored in an ebtree. Then when we need to load them
> with the SSL lib, instead of providing a path, we provide a buffer to
> be consumed. The tree is freed after each directory.
> 
> for now it requires a quite large limit of file descriptors, as all
> operations types are done one after another; so the limit of fd should
> be set higher than the number of certificates you have to load. This
> part is probably going to evolve very soon as IO_URING plans are to be
> able to chain operations with a given pre-defined file descriptor.
> 
> on a setup with 25k certificates I was able to measure a minimum of 20%
> gain on the init time when the filesystem cache is empty. My testing is
> as follow; for each directory:
> - put a timer before `ssl_sock_load_cert_list_file`, including
>   `ssl_load_certiodir` in the case of io_uring case.
> - measure the time at the end of those operations, just after
>   `ssl_free_certiodir` in the case of io_uring.

I'm wondering if we coud have the timing directly in the logs or on the
CLI, because I'm actually also doing this kind of things, and people
could probably optimize the loading if they could see the loading time.

Your patch is really interesting in my opinion, a lot of people could
benefits of something like that. Regarding the reload part, I have some
ideas to pass the ckch structures to the new processes without reloading
the files one the filesystem, but I didn't implement anything yet.

I'm currently reworking the ssl_sock.c and .h files, and splitting them
in several files, so you will probably need to rebase your patch on top
of this. At the moment I separated this in 3 parts: 1 part for the
crt-list/directories, 1 part for the ckch, and 1 part for the
SSL_CTX/bind. I'm not happy yet with what I have, but I'll do my best to
push that as soon as possible in the git repository, I don't want the
2.2 to be released in the current state of the files.

I don't really know the io_uring API in details so I trust you on the
implementation. 

Regarding the name of the file, ssl_load.c, it is really generic, I
don't mind having a more specific file only for loading the SSL
certificates with io_uring.

I'm wondering if we couldn't push directly the certificate in an X509
struture instead of storing it in a intermediate buffer, and use the
ckch instead of a specific cert_iobuf. That just some thoughts, I don't
know io_uring enough.

The crtlist files could also benefits from this, since they can contain
a lot of certificates too.

I'm seeing that you are using in ssl_load_preparetree() the
SSL_SOCK_KEYTYPES_NAME array, which is only used for merging a
certificate bundle. I know that this can be confusing in the code
because the bundle implementation is crappy. But in a directory we
don't use any filename extension to load a PEM, we load everything as a
PEM file with the exception of .key, .ocsp, .sctl and .issuer, which are
loaded separatly with their own functions. (The cert_exts[] array is
used for that in ssl_sock.c)

I'm going to remove the support for multi-cert bundle once 2.2 is
released, so it will simplify a lot of things. I encourage people
writing new features to not support multi-cert bundles, more
particularly on the CLI.

Unfortunately your patch is too late for 2.2, but I think it could be
great for the next release!

Regards,

-- 
William Lallemand



[RFC PATCH 1/1] MEDIUM: ssl: add alternative way to load certificates with io_uring

2020-05-08 Thread William Dauchy
experimental patch to make use of IO_URING to batch load certificates;
this drastically reduces the number of syscall and might benefit to
setup with large number of certificates.
it uses liburing in order to batch operation as follow:
for each certificate directory, we apply the same operations on each
file:
 - statx
 - openat
 - read
 - close
the results are stored in an ebtree. Then when we need to load them
with the SSL lib, instead of providing a path, we provide a buffer to
be consumed. The tree is freed after each directory.

for now it requires a quite large limit of file descriptors, as all
operations types are done one after another; so the limit of fd should
be set higher than the number of certificates you have to load. This
part is probably going to evolve very soon as IO_URING plans are to be
able to chain operations with a given pre-defined file descriptor.

on a setup with 25k certificates I was able to measure a minimum of 20%
gain on the init time when the filesystem cache is empty. My testing is
as follow; for each directory:
- put a timer before `ssl_sock_load_cert_list_file`, including
  `ssl_load_certiodir` in the case of io_uring case.
- measure the time at the end of those operations, just after
  `ssl_free_certiodir` in the case of io_uring.

some results with my old ssd laptop, the average certificate init time
in ms, using 24686 certs:
▒▒   8554io_uring_empty_cache
▒8100io_uring_full_cache
▒▒  11395syscall_empty_cache
▒8087syscall_full_cache

The benefits are of course less obvious with following reloads on a
machine where you only have haproxy process managing its own memory, but
it can be interesting on a busier setup where the filesystem cache is
often invalidated. However a very quick overview of reload time on our
production seems to show that more than 50% of the time, the files are
not present in cache anymore while reloading (I hope I'm not too far
from the reality in my analysis); so I would expect this patch could
participate to flatten our reload time; I however did not tested it in a
production environment.

It remains linux-specific though; the operations used require a minimum
of kernel >= v5.6 even though some operations were supported in v5.4, I
did not change the code to adapt the behavior. All my tests were
performed on >= v5.7rc4 as the time of writing.

it introduces three build parameters:
- USE_IO_URING
- IO_URING_INC
- IO_URING_LIB

Signed-off-by: William Dauchy 
---
 Makefile |  10 +-
 doc/io_uring.txt |  18 +++
 include/proto/ssl_load.h |  23 
 include/types/ssl_load.h |  38 ++
 include/types/ssl_sock.h |   7 +
 src/ssl_load.c   | 274 +++
 src/ssl_sock.c   |  32 -
 7 files changed, 395 insertions(+), 7 deletions(-)
 create mode 100644 doc/io_uring.txt
 create mode 100644 include/proto/ssl_load.h
 create mode 100644 include/types/ssl_load.h
 create mode 100644 src/ssl_load.c

diff --git a/Makefile b/Makefile
index 1e4213989..afba381c0 100644
--- a/Makefile
+++ b/Makefile
@@ -52,6 +52,7 @@
 #   USE_SYSTEMD  : enable sd_notify() support.
 #   USE_OBSOLETE_LINKER  : use when the linker fails to emit 
__start_init/__stop_init
 #   USE_THREAD_DUMP  : use the more advanced thread state dump system. 
Automatic.
+#   USE_IO_URING : use IO_URING advanced async features
 #
 # Options can be forced by specifying "USE_xxx=1" or can be disabled by using
 # "USE_xxx=" (empty string). The list of enabled and disabled options for a
@@ -293,7 +294,8 @@ use_opts = USE_EPOLL USE_KQUEUE USE_NETFILTER   
  \
USE_GETADDRINFO USE_OPENSSL USE_LUA USE_FUTEX USE_ACCEPT4  \
USE_ZLIB USE_SLZ USE_CPU_AFFINITY USE_TFO USE_NS   \
USE_DL USE_RT USE_DEVICEATLAS USE_51DEGREES USE_WURFL USE_SYSTEMD  \
-   USE_OBSOLETE_LINKER USE_PRCTL USE_THREAD_DUMP USE_EVPORTS
+   USE_OBSOLETE_LINKER USE_PRCTL USE_THREAD_DUMP USE_EVPORTS  \
+   USE_IO_URING
 
  Target system options
 # Depending on the target platform, some options are set, as well as some
@@ -531,6 +533,12 @@ ifneq ($(USE_BACKTRACE),)
 OPTIONS_LDFLAGS += -Wl,$(if $(EXPORT_SYMBOL),$(EXPORT_SYMBOL),--export-dynamic)
 endif
 
+ifneq ($(USE_IO_URING),)
+OPTIONS_OBJS  += src/ssl_load.o
+OPTIONS_CFLAGS  += $(if $(IO_URING_INC),-I$(IO_URING_INC))
+OPTIONS_LDFLAGS += $(if $(IO_URING_LIB),-L$(IO_URING_LIB)) -luring
+endif
+
 ifneq ($(USE_OPENSSL),)
 # OpenSSL is packaged in various forms and with various dependencies.
 # In general -lssl is enough, but on some platforms, -lcrypto may be needed,
diff --git a/doc/io_uring.txt b/doc/io_uring.txt
new file mode 100644
index 0..aa290dd0d
--- /dev/null
+++ b/doc/io_uring.txt
@@ -0,0 +1,18 @@
+Linux IO_URING support for HAProxy
+==
+
+