Re: [RFC PATCH 1/1] MEDIUM: ssl: add alternative way to load certificates with io_uring
On Mon, May 11, 2020 at 5:32 PM William Lallemand wrote: > By removing, it just means: > - In the case of a directory there is basicaly nothing to do but remove > the bundle code, everything will be loaded as usual, it will just be > done in separate SSL_CTX. > - In the configuration, if you specify a bundle with "example.pem" to > load the .rsa and .ecdsa, I'll probably add a shim to emulate the > previous behavior, but they will be loaded in separate SSL_CTX. understood, this will indeed simplify a lot of things. Thanks for the clarification! -- William
Re: [RFC PATCH 1/1] MEDIUM: ssl: add alternative way to load certificates with io_uring
On Mon, May 11, 2020 at 04:34:07PM +0200, William Dauchy wrote: > > I'm going to remove the support for multi-cert bundle once 2.2 is > > released, so it will simplify a lot of things. I encourage people > > writing new features to not support multi-cert bundles, more > > particularly on the CLI. > > So you mean that > example.pem.rsa > example.pem.ecdsa > will be loaded separately as all the other certificates? > I'm not 100% sure what you meant behind removing support of "multi-cert > bundle". What we call "multi-cert bundle" is the ability to load a certificate for each key type (RSA,DSA,ECDSA) in the same SSL_CTX. It was a hack implemented this way for OpenSSL 1.0.2 because there wasn't any callback available to chose a certificate type in this version. It wasn't implemented by all other SSL libraries. With OpenSSL 1.1.1, a client_hello callback was implemented, letting us chose the certificate type ourselves instead of letting OpenSSL do it. In my opinion this API is better. The multi-cert bundles are tricky and need specific conditions everywhere in the loading code. It complexify a lot of things. For example the loading code needs to verify if all SNIs exists in all certificate types, and generates the right SSL_CTX combinations, leading to multiple SSL_CTX created if the SNI weren't matching. By removing, it just means: - In the case of a directory there is basicaly nothing to do but remove the bundle code, everything will be loaded as usual, it will just be done in separate SSL_CTX. - In the configuration, if you specify a bundle with "example.pem" to load the .rsa and .ecdsa, I'll probably add a shim to emulate the previous behavior, but they will be loaded in separate SSL_CTX. Regarding the steps during the connection, during the SNI lookup, instead of having one sni_ctx node containing a SSL_CTX with all certificates, you will have one sni_ctx for each certificate type and the right one will be selected during lookup. Note that you can configure more precisely the SSL, for each certificate file, by loading separately the certs in a crt-list, instead of configuring all certificates on the bind line. -- William Lallemand
Re: [RFC PATCH 1/1] MEDIUM: ssl: add alternative way to load certificates with io_uring
Hello William L. Thank you for your answer. On Mon, May 11, 2020 at 1:01 PM William Lallemand wrote: > I'm wondering if we coud have the timing directly in the logs or on the > CLI, because I'm actually also doing this kind of things, and people > could probably optimize the loading if they could see the loading time. I agree it is something which was missing while doing my tests. Remove/re-add my timer to do my tests each time I was doing some modifications; to a point where I noted to ask you if you had some tooling I missed. So that's totally something I could do in a separate patch! What is your preferred timing function? > Your patch is really interesting in my opinion, a lot of people could > benefits of something like that. Regarding the reload part, I have some > ideas to pass the ckch structures to the new processes without reloading > the files one the filesystem, but I didn't implement anything yet. ok; interesting. > I'm currently reworking the ssl_sock.c and .h files, and splitting them > in several files, so you will probably need to rebase your patch on top > of this. At the moment I separated this in 3 parts: 1 part for the > crt-list/directories, 1 part for the ckch, and 1 part for the > SSL_CTX/bind. I'm not happy yet with what I have, but I'll do my best to > push that as soon as possible in the git repository, I don't want the > 2.2 to be released in the current state of the files. good to know, and that's mostly why I avoided too much changes on the file to avoid having too much rebase work while evolving my functionality. > I don't really know the io_uring API in details so I trust you on the > implementation. Yes, I've been using it for a few months now, so I'm starting to better understand how to properly use it. There are indeed a few different options where you could easily get trapped, and making the code more complicated. I however was planning to ask Jens for a quick review when I will have something more feature complete. > Regarding the name of the file, ssl_load.c, it is really generic, I > don't mind having a more specific file only for loading the SSL > certificates with io_uring. Yes I totally agree; as it was more of an experimentation I will do a second path on the naming and improve that. As I also started some poll changes, it also helps me to better name things to make elements more reusable. > I'm wondering if we couldn't push directly the certificate in an X509 > struture instead of storing it in a intermediate buffer, and use the > ckch instead of a specific cert_iobuf. That just some thoughts, I don't > know io_uring enough. I will look into that and it was also on my todo but for now I considered it ok for a first implementation. It mostly depends if we are able to properly allocate memory at the right time, while keeping the code clear enough. For now I considered it as a possible incremental improvement. Thanks for the input. > The crtlist files could also benefits from this, since they can contain > a lot of certificates too. Yes, indeed, for now I limited myself to the cases I was concerned about to avoid too much testing. I will improve that. I should have warned about it in my commit message. > I'm seeing that you are using in ssl_load_preparetree() the > SSL_SOCK_KEYTYPES_NAME array, which is only used for merging a > certificate bundle. I know that this can be confusing in the code > because the bundle implementation is crappy. But in a directory we > don't use any filename extension to load a PEM, we load everything as a > PEM file with the exception of .key, .ocsp, .sctl and .issuer, which are > loaded separatly with their own functions. (The cert_exts[] array is > used for that in ssl_sock.c) ok, good to recall all files should be taken into account; for now I blindly implemented my only known case. This will be part of the above point to take into account all possible options. > I'm going to remove the support for multi-cert bundle once 2.2 is > released, so it will simplify a lot of things. I encourage people > writing new features to not support multi-cert bundles, more > particularly on the CLI. So you mean that example.pem.rsa example.pem.ecdsa will be loaded separately as all the other certificates? I'm not 100% sure what you meant behind removing support of "multi-cert bundle". > Unfortunately your patch is too late for 2.2, but I think it could be > great for the next release! Yes, and it was clear for me since the beginning, I would not have the time to finish things early enough. Thanks a lot for your feedbacks. I will try to combine the one from Willy and yours for a future submission. I also hope we could find a good middle point between something perfect and something we could improve along the time. -- William
Re: [RFC PATCH 1/1] MEDIUM: ssl: add alternative way to load certificates with io_uring
On Fri, May 08, 2020 at 04:34:27PM +0200, William Dauchy wrote: > experimental patch to make use of IO_URING to batch load certificates; > this drastically reduces the number of syscall and might benefit to > setup with large number of certificates. > it uses liburing in order to batch operation as follow: > for each certificate directory, we apply the same operations on each > file: > - statx > - openat > - read > - close > the results are stored in an ebtree. Then when we need to load them > with the SSL lib, instead of providing a path, we provide a buffer to > be consumed. The tree is freed after each directory. > > for now it requires a quite large limit of file descriptors, as all > operations types are done one after another; so the limit of fd should > be set higher than the number of certificates you have to load. This > part is probably going to evolve very soon as IO_URING plans are to be > able to chain operations with a given pre-defined file descriptor. > > on a setup with 25k certificates I was able to measure a minimum of 20% > gain on the init time when the filesystem cache is empty. My testing is > as follow; for each directory: > - put a timer before `ssl_sock_load_cert_list_file`, including > `ssl_load_certiodir` in the case of io_uring case. > - measure the time at the end of those operations, just after > `ssl_free_certiodir` in the case of io_uring. I'm wondering if we coud have the timing directly in the logs or on the CLI, because I'm actually also doing this kind of things, and people could probably optimize the loading if they could see the loading time. Your patch is really interesting in my opinion, a lot of people could benefits of something like that. Regarding the reload part, I have some ideas to pass the ckch structures to the new processes without reloading the files one the filesystem, but I didn't implement anything yet. I'm currently reworking the ssl_sock.c and .h files, and splitting them in several files, so you will probably need to rebase your patch on top of this. At the moment I separated this in 3 parts: 1 part for the crt-list/directories, 1 part for the ckch, and 1 part for the SSL_CTX/bind. I'm not happy yet with what I have, but I'll do my best to push that as soon as possible in the git repository, I don't want the 2.2 to be released in the current state of the files. I don't really know the io_uring API in details so I trust you on the implementation. Regarding the name of the file, ssl_load.c, it is really generic, I don't mind having a more specific file only for loading the SSL certificates with io_uring. I'm wondering if we couldn't push directly the certificate in an X509 struture instead of storing it in a intermediate buffer, and use the ckch instead of a specific cert_iobuf. That just some thoughts, I don't know io_uring enough. The crtlist files could also benefits from this, since they can contain a lot of certificates too. I'm seeing that you are using in ssl_load_preparetree() the SSL_SOCK_KEYTYPES_NAME array, which is only used for merging a certificate bundle. I know that this can be confusing in the code because the bundle implementation is crappy. But in a directory we don't use any filename extension to load a PEM, we load everything as a PEM file with the exception of .key, .ocsp, .sctl and .issuer, which are loaded separatly with their own functions. (The cert_exts[] array is used for that in ssl_sock.c) I'm going to remove the support for multi-cert bundle once 2.2 is released, so it will simplify a lot of things. I encourage people writing new features to not support multi-cert bundles, more particularly on the CLI. Unfortunately your patch is too late for 2.2, but I think it could be great for the next release! Regards, -- William Lallemand
[RFC PATCH 1/1] MEDIUM: ssl: add alternative way to load certificates with io_uring
experimental patch to make use of IO_URING to batch load certificates; this drastically reduces the number of syscall and might benefit to setup with large number of certificates. it uses liburing in order to batch operation as follow: for each certificate directory, we apply the same operations on each file: - statx - openat - read - close the results are stored in an ebtree. Then when we need to load them with the SSL lib, instead of providing a path, we provide a buffer to be consumed. The tree is freed after each directory. for now it requires a quite large limit of file descriptors, as all operations types are done one after another; so the limit of fd should be set higher than the number of certificates you have to load. This part is probably going to evolve very soon as IO_URING plans are to be able to chain operations with a given pre-defined file descriptor. on a setup with 25k certificates I was able to measure a minimum of 20% gain on the init time when the filesystem cache is empty. My testing is as follow; for each directory: - put a timer before `ssl_sock_load_cert_list_file`, including `ssl_load_certiodir` in the case of io_uring case. - measure the time at the end of those operations, just after `ssl_free_certiodir` in the case of io_uring. some results with my old ssd laptop, the average certificate init time in ms, using 24686 certs: ▒▒ 8554io_uring_empty_cache ▒8100io_uring_full_cache ▒▒ 11395syscall_empty_cache ▒8087syscall_full_cache The benefits are of course less obvious with following reloads on a machine where you only have haproxy process managing its own memory, but it can be interesting on a busier setup where the filesystem cache is often invalidated. However a very quick overview of reload time on our production seems to show that more than 50% of the time, the files are not present in cache anymore while reloading (I hope I'm not too far from the reality in my analysis); so I would expect this patch could participate to flatten our reload time; I however did not tested it in a production environment. It remains linux-specific though; the operations used require a minimum of kernel >= v5.6 even though some operations were supported in v5.4, I did not change the code to adapt the behavior. All my tests were performed on >= v5.7rc4 as the time of writing. it introduces three build parameters: - USE_IO_URING - IO_URING_INC - IO_URING_LIB Signed-off-by: William Dauchy --- Makefile | 10 +- doc/io_uring.txt | 18 +++ include/proto/ssl_load.h | 23 include/types/ssl_load.h | 38 ++ include/types/ssl_sock.h | 7 + src/ssl_load.c | 274 +++ src/ssl_sock.c | 32 - 7 files changed, 395 insertions(+), 7 deletions(-) create mode 100644 doc/io_uring.txt create mode 100644 include/proto/ssl_load.h create mode 100644 include/types/ssl_load.h create mode 100644 src/ssl_load.c diff --git a/Makefile b/Makefile index 1e4213989..afba381c0 100644 --- a/Makefile +++ b/Makefile @@ -52,6 +52,7 @@ # USE_SYSTEMD : enable sd_notify() support. # USE_OBSOLETE_LINKER : use when the linker fails to emit __start_init/__stop_init # USE_THREAD_DUMP : use the more advanced thread state dump system. Automatic. +# USE_IO_URING : use IO_URING advanced async features # # Options can be forced by specifying "USE_xxx=1" or can be disabled by using # "USE_xxx=" (empty string). The list of enabled and disabled options for a @@ -293,7 +294,8 @@ use_opts = USE_EPOLL USE_KQUEUE USE_NETFILTER \ USE_GETADDRINFO USE_OPENSSL USE_LUA USE_FUTEX USE_ACCEPT4 \ USE_ZLIB USE_SLZ USE_CPU_AFFINITY USE_TFO USE_NS \ USE_DL USE_RT USE_DEVICEATLAS USE_51DEGREES USE_WURFL USE_SYSTEMD \ - USE_OBSOLETE_LINKER USE_PRCTL USE_THREAD_DUMP USE_EVPORTS + USE_OBSOLETE_LINKER USE_PRCTL USE_THREAD_DUMP USE_EVPORTS \ + USE_IO_URING Target system options # Depending on the target platform, some options are set, as well as some @@ -531,6 +533,12 @@ ifneq ($(USE_BACKTRACE),) OPTIONS_LDFLAGS += -Wl,$(if $(EXPORT_SYMBOL),$(EXPORT_SYMBOL),--export-dynamic) endif +ifneq ($(USE_IO_URING),) +OPTIONS_OBJS += src/ssl_load.o +OPTIONS_CFLAGS += $(if $(IO_URING_INC),-I$(IO_URING_INC)) +OPTIONS_LDFLAGS += $(if $(IO_URING_LIB),-L$(IO_URING_LIB)) -luring +endif + ifneq ($(USE_OPENSSL),) # OpenSSL is packaged in various forms and with various dependencies. # In general -lssl is enough, but on some platforms, -lcrypto may be needed, diff --git a/doc/io_uring.txt b/doc/io_uring.txt new file mode 100644 index 0..aa290dd0d --- /dev/null +++ b/doc/io_uring.txt @@ -0,0 +1,18 @@ +Linux IO_URING support for HAProxy +== + +