Re: Thousands of SSL certificates stalls new logins during reload - problem with Dovecot config process

2022-09-05 Thread Arkadiusz Miśkiewicz

On 2.09.2022 14:44, Bartosz Kwitniewski wrote:

> Hello,
>
> I'm running a dovecot 2.3.19.1 server that has around 6000 SSL
> certificates in separate config files, each containing:
>
> local_name "domain" {
>      ssl_cert = ...
>      ssl_key = ...
> }
> When a new certificate is added, dovecot is reloaded (around 20 times a
> day). While dovecot is reloading, users are unable to log in for
> around 30 seconds.


Unfortunately, it has been known for ages that dovecot cannot handle 
thousands of certificates in a sane way.


There were some ideas which were never implemented:

https://dovecot.org/list/dovecot/2016-October/105858.html

( https://dovecot.org/list/dovecot/2016-October/105855.html )

--
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )



Re: Thousands of SSL certificates stalls new logins during reload - problem with Dovecot config process

2022-09-03 Thread spi


04.09.2022 01:01:16 Bartosz Kwitniewski:


> For now they are on the same machine, we have to write our own panel for 
> clients to get more freedom in backend choices. I was looking into HAProxy 
> for SSL termination, but it does not support STARTTLS.
>
> I'll try to look for a workaround next week, but I haven't used C for ages.
>
> Best regards,
> --
> Bartosz Kwitniewski

Nginx can be used as a reverse proxy in front of dovecot to terminate TLS and 
load-balance. It can also verify client certificates (allowing access only 
with a valid client certificate and routing to a backend based on that 
certificate).

Cheers,
spi


Re: Thousands of SSL certificates stalls new logins during reload - problem with Dovecot config process

2022-09-03 Thread Tom Hendrikx

Hi,

Isn't the easiest way to solve this to reconfigure the SSL cert update 
process so that dovecot is reloaded only once a day? An update to an 
existing SSL cert is rarely urgent: normally you can take your time and plan 
carefully. This situation looks to me like a case of using the default 
scripting that comes with standard small-scale LE certificates, which is 
simply not suitable for the large-scale setup of the OP.


Only adding new certs/domains might need to be done immediately, 
depending on the business case of the OP.
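
A rough sketch of what such batching could look like, in Python. This is only an illustration under assumptions: the class name `ReloadBatcher`, the one-hour interval in the usage note, and the idea of calling `tick()` from a periodic job are all made up here and are not taken from the OP's setup. Certificate changes only mark a reload as pending; the actual reload happens at most once per interval.

```python
import subprocess
import time
from typing import Callable, Optional


class ReloadBatcher:
    """Coalesce many reload requests into at most one reload per interval."""

    def __init__(self, min_interval: float, reload_fn: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval = min_interval      # seconds between reloads
        self.reload_fn = reload_fn            # what actually reloads the server
        self.clock = clock                    # injectable for testing
        self.pending = False
        self.last_reload: Optional[float] = None

    def request(self) -> None:
        """Record that a certificate changed; does not reload by itself."""
        self.pending = True

    def tick(self) -> bool:
        """Reload if a change is pending and the interval has elapsed."""
        if not self.pending:
            return False
        now = self.clock()
        if (self.last_reload is not None
                and now - self.last_reload < self.min_interval):
            return False
        self.reload_fn()
        self.last_reload = now
        self.pending = False
        return True


def doveadm_reload() -> None:
    # Asks the running dovecot master process to re-read its configuration.
    subprocess.run(["doveadm", "reload"], check=True)
```

With something like `ReloadBatcher(3600, doveadm_reload)`, renewals would call `request()` and a periodic job would call `tick()`. New domains that must go live immediately could bypass the batcher and reload directly.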


Kind regards,
Tom

On 02-09-2022 22:45, John Stoffel wrote:

>> "Bartosz" == Bartosz Kwitniewski writes:
>
>> Other services on that machine are able to handle such a number of
>> certificates during reloads:
>> - proftpd loads configs dynamically based on SNI domain
>> - exim loads certificates dynamically based on SNI domain
>> - LiteSpeed switches to a new process after loading the whole configuration
>
> Are you running all these services on one machine?  Maybe you could
> get an SSL termination device which terminates the SSL connections and
> then forwards them into the proper backend application?  This way only
> one system needs to be managed for certs, and only one (or two since I
> assume you have an HA pair :-) needs to then reload when new certs are
> inserted.
>
> If you could hack the proftpd cert code into dovecot, that might also
> be a way around it.  I haven't a clue how this works since I haven't
> looked at either code base.  It won't be simple, but I'm sure others
> would appreciate it.
>
> If it's critical, paying for the feature to be added is another
> option.
>
>> Best regards,
>> --
>> Bartosz Kwitniewski



>> On 02/09/2022 14:52, Felipe Gasper wrote:
>>> For hosting environments--where TLS certs can change hundreds of times in a
>>> matter of minutes--it would be a boon for Dovecot to load those certificates
>>> dynamically rather than all at once.
>>>
>>> Pure-FTPd implements a nice solution to this: a standalone service that
>>> fetches TLS certificates & keys. Documented here:
>>>
>>> https://github.com/jedisct1/pure-ftpd/blob/9d25440e5b5283fbeca94dd0595aa6672c3f8428/README.TLS#L161
>>>
>>> -FG
>>>
>>>> On Sep 2, 2022, at 08:44, Bartosz Kwitniewski wrote:
>>>>
>>>> Hello,
>>>>
>>>> I'm running a dovecot 2.3.19.1 server that has around 6000 SSL
>>>> certificates in separate config files, each containing:
>>>> local_name "domain" {
>>>>   ssl_cert = ...
>>>>   ssl_key = ...
>>>> }
>>>> When a new certificate is added, dovecot is reloaded (around 20 times a
>>>> day). While dovecot is reloading, users are unable to log in for around
>>>> 30 seconds.
>>>>
>>>> The main problem seems to be that during a reload the new config process
>>>> is immediately designated as the one serving config requests, and only
>>>> then starts parsing the config files, which takes around 20-30 seconds.
>>>> If it parsed the config files first and only then took over serving
>>>> config requests, that would probably solve the problem. Or perhaps there
>>>> is a better way to load new certificates, or a way to optimize this?
>>>>
>>>> There is another problem with the config process and shutdown_clients=no.
>>>> We do not want to disconnect users during a reload because, for example,
>>>> Thunderbird displays a popup saying that the server is shutting down.
>>>> When there are long-lasting IMAP connections from Google and other
>>>> services that aggregate e-mail, the old config process is not killed.
>>>> Because a config process with ~6000 certificates uses ~1 GB of RAM,
>>>> memory use can quickly rise to 20 GB as old config processes accumulate.
>>>> This is not a big issue, because we have created a task that kills old
>>>> processes, but there could be a built-in mechanism to solve that problem.
>>>>
>>>> I have created a minimal configuration and scripts to reproduce the
>>>> problem. Reproduction steps are below.
>>>>
>>>> (...)


Re: Thousands of SSL certificates stalls new logins during reload - problem with Dovecot config process

2022-09-02 Thread John Stoffel
> "Bartosz" == Bartosz Kwitniewski  writes:

> Other services on that machine are able to handle such a number of
> certificates during reloads:
> - proftpd loads configs dynamically based on SNI domain
> - exim loads certificates dynamically based on SNI domain
> - LiteSpeed switches to a new process after loading the whole configuration

Are you running all these services on one machine?  Maybe you could
get an SSL termination device which terminates the SSL connections and
then forwards them into the proper backend application?  This way only
one system needs to be managed for certs, and only one (or two since I
assume you have an HA pair :-) needs to then reload when new certs are
inserted.

If you could hack the proftpd cert code into dovecot, that might also
be a way around it.  I haven't a clue how this works since I haven't
looked at either code base.  It won't be simple, but I'm sure others
would appreciate it.

If it's critical, paying for the feature to be added is another
option.  


> Best regards,
> --
> Bartosz Kwitniewski

> On 02/09/2022 14:52, Felipe Gasper wrote:
>> For hosting environments--where TLS certs can change hundreds of times in a 
>> matter of minutes--it would be a boon for Dovecot to load those certificates 
>> dynamically rather than all at once.
>> 
>> Pure-FTPd implements a nice solution to this: a standalone service that 
>> fetches TLS certificates & keys. Documented here:
>> 
>> https://github.com/jedisct1/pure-ftpd/blob/9d25440e5b5283fbeca94dd0595aa6672c3f8428/README.TLS#L161
>> 
>> -FG
>> 
>> 
>>> On Sep 2, 2022, at 08:44, Bartosz Kwitniewski  wrote:
>>> 
>>> Hello,
>>> 
>>> I'm running a dovecot 2.3.19.1 server that has around 6000 SSL certificates 
>>> in separate config files, each containing:
>>> local_name "domain" {
>>> ssl_cert = ...
>>> ssl_key = ...
>>> }
>>> When a new certificate is added, dovecot is reloaded (around 20 times a
>>> day). While dovecot is reloading, users are unable to log in for around 30
>>> seconds.
>>> 
>>> The main problem seems to be that during a reload the new config process is
>>> immediately designated as the one serving config requests, and only then
>>> starts parsing the config files, which takes around 20-30 seconds. If it
>>> parsed the config files first and only then took over serving config
>>> requests, that would probably solve the problem. Or perhaps there is a
>>> better way to load new certificates, or a way to optimize this?
>>> 
>>> There is another problem with the config process and shutdown_clients=no.
>>> We do not want to disconnect users during a reload because, for example,
>>> Thunderbird displays a popup saying that the server is shutting down. When
>>> there are long-lasting IMAP connections from Google and other services that
>>> aggregate e-mail, the old config process is not killed. Because a config
>>> process with ~6000 certificates uses ~1 GB of RAM, memory use can quickly
>>> rise to 20 GB as old config processes accumulate. This is not a big issue,
>>> because we have created a task that kills old processes, but there could be
>>> a built-in mechanism to solve that problem.
>>> 
>>> I have created a minimal configuration and scripts to reproduce the
>>> problem. Reproduction steps are below.
>>> 
>>> (...)


Re: Thousands of SSL certificates stalls new logins during reload - problem with Dovecot config process

2022-09-02 Thread Felipe Gasper
For hosting environments--where TLS certs can change hundreds of times in a 
matter of minutes--it would be a boon for Dovecot to load those certificates 
dynamically rather than all at once.

Pure-FTPd implements a nice solution to this: a standalone service that fetches 
TLS certificates & keys. Documented here:

https://github.com/jedisct1/pure-ftpd/blob/9d25440e5b5283fbeca94dd0595aa6672c3f8428/README.TLS#L161
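
To make the idea of on-demand loading concrete, here is a small sketch in Python. It is not Dovecot or Pure-FTPd code. The class name `LazyCertStore`, the helper `load_from_disk` and the directory `/etc/ssl/hosted` are hypothetical. It relies on the standard library's `ssl.SSLContext.sni_callback` hook: a certificate is loaded the first time a client asks for that name and is cached afterwards, so nothing has to be parsed up front.

```python
import os
import ssl
from typing import Callable, Dict, Optional


class LazyCertStore:
    """Load a per-domain TLS context on first use and cache it."""

    def __init__(self,
                 load_context: Callable[[str], Optional[ssl.SSLContext]]) -> None:
        self._load = load_context
        self._cache: Dict[str, ssl.SSLContext] = {}

    def sni_callback(self, sock, server_name, default_ctx):
        """Signature expected by ssl.SSLContext.sni_callback."""
        if server_name is None:
            return None                      # no SNI sent: keep default cert
        ctx = self._cache.get(server_name)
        if ctx is None:
            ctx = self._load(server_name)
            if ctx is None:
                return None                  # unknown domain: keep default
            self._cache[server_name] = ctx
        sock.context = ctx                   # switch this connection's cert
        return None


def load_from_disk(server_name: str,
                   cert_dir: str = "/etc/ssl/hosted") -> Optional[ssl.SSLContext]:
    # Hypothetical layout: <cert_dir>/<domain>.crt and <cert_dir>/<domain>.key
    # A real version must validate server_name before using it in a path.
    cert = os.path.join(cert_dir, server_name + ".crt")
    key = os.path.join(cert_dir, server_name + ".key")
    if not (os.path.exists(cert) and os.path.exists(key)):
        return None
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(cert, key)
    return ctx
```

It would be wired up with `default_ctx.sni_callback = LazyCertStore(load_from_disk).sni_callback`. A renewed certificate would still need the cached entry dropped, which this sketch leaves out.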

-FG


> On Sep 2, 2022, at 08:44, Bartosz Kwitniewski  wrote:
> 
> Hello,
> 
> I'm running a dovecot 2.3.19.1 server that has around 6000 SSL certificates 
> in separate config files, each containing:
> local_name "domain" {
>   ssl_cert = ...
>   ssl_key = ...
> }
> When a new certificate is added, dovecot is reloaded (around 20 times a day). 
> While dovecot is reloading, users are unable to log in for around 30 
> seconds.
> 
> The main problem seems to be that during a reload the new config process is 
> immediately designated as the one serving config requests, and only then 
> starts parsing the config files, which takes around 20-30 seconds. If it 
> parsed the config files first and only then took over serving config 
> requests, that would probably solve the problem. Or perhaps there is a 
> better way to load new certificates, or a way to optimize this?
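
The parse-first, take-over-afterwards ordering proposed in the quoted paragraph can be sketched in a few lines of Python. This does not describe how Dovecot's config process is implemented; `ConfigServer` and the `parse` callable are hypothetical names. The slow parse runs without holding the lock, and only the final pointer swap is serialized, so requests arriving during a reload are still answered from the old configuration.

```python
import threading
from typing import Callable, Dict

Config = Dict[str, str]


class ConfigServer:
    """Serve the current config; swap in a new one only after parsing it."""

    def __init__(self, parse: Callable[[], Config]) -> None:
        self._parse = parse
        self._lock = threading.Lock()
        self._current = parse()          # initial load at startup

    def get(self) -> Config:
        """Answer a config request from whatever is currently active."""
        with self._lock:
            return self._current

    def reload(self) -> None:
        # Slow part (tens of seconds in the report) runs without the lock,
        # so get() keeps returning the old config in the meantime.
        new_config = self._parse()
        # Fast part: the swap itself is just a reference assignment.
        with self._lock:
            self._current = new_config
```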
> 
> There is another problem with the config process and shutdown_clients=no. 
> We do not want to disconnect users during a reload because, for example, 
> Thunderbird displays a popup saying that the server is shutting down. When 
> there are long-lasting IMAP connections from Google and other services that 
> aggregate e-mail, the old config process is not killed. Because a config 
> process with ~6000 certificates uses ~1 GB of RAM, memory use can quickly 
> rise to 20 GB as old config processes accumulate. This is not a big issue, 
> because we have created a task that kills old processes, but there could be 
> a built-in mechanism to solve that problem.
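
The cleanup task itself is not shown in the report, so the following Python fragment is only a guess at its core logic. It assumes that config processes can be recognised by a process title starting with `dovecot/config` and that every such process except the most recently started one is stale. `Proc` and `stale_config_pids` are hypothetical names. Gathering the process list and sending signals are deliberately left out.

```python
from typing import Iterable, List, NamedTuple


class Proc(NamedTuple):
    pid: int
    start_time: float   # e.g. seconds since the epoch
    cmdline: str        # process title as shown by ps


def stale_config_pids(procs: Iterable[Proc],
                      title: str = "dovecot/config") -> List[int]:
    """Return PIDs of all config processes except the newest one."""
    configs = sorted(
        (p for p in procs if p.cmdline.startswith(title)),
        key=lambda p: p.start_time,
    )
    # Everything before the last entry was started earlier than the
    # currently active config process and is a cleanup candidate.
    return [p.pid for p in configs[:-1]]
```

Whether a given old process is really safe to terminate is a separate question that this selection step does not answer.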
> 
> I have created a minimal configuration and scripts to reproduce the problem. 
> Reproduction steps are below.
> 
> Configuration (doveadm config -n):
> ==
> # 2.3.19.1 (9b53102964): /etc/dovecot/dovecot.conf
> # OS: Linux 4.18.0-372.9.1.1.lve.el8.x86_64 x86_64 CloudLinux release 8.6 (Leonid Kadenyuk)
> # Hostname: -
> auth_mechanisms = plain login
> default_client_limit = 12288
> default_process_limit = 2048
> default_vsz_limit = 1 G
> first_valid_uid = 1000
> mail_location = maildir:~/mail
> mbox_write_locks = fcntl
> namespace inbox {
>  inbox = yes
>  location =
>  mailbox Drafts {
>    special_use = \Drafts
>  }
>  mailbox Sent {
>    special_use = \Sent
>  }
>  mailbox "Sent Messages" {
>    special_use = \Sent
>  }
>  mailbox Trash {
>    special_use = \Trash
>  }
>  prefix =
> }
> passdb {
>  args = scheme=CRYPT username_format=%u /etc/dovecot/users
>  driver = passwd-file
> }
> service auth-worker {
>  user = $default_internal_user
> }
> service auth {
>  user = $default_internal_user
> }
> service config {
>  vsz_limit = 4 G
> }
> service imap-login {
>  process_min_avail = 20
>  service_count = 1
> }
> service imap {
>  process_limit = 4096
> }
> service pop3-login {
>  process_min_avail = 20
>  service_count = 1
> }
> service pop3 {
>  process_limit = 512
> }
> shutdown_clients = no
> ssl_cert =
> ssl_cipher_list = ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:DHE-DSS-AES128-GCM-SHA256:DHE-DSS-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-DSS-AES128-SHA256:DHE-DSS-AES256-SHA256:DHE-DSS-AES128-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-RSA-AES128-SHA:ECDH-ECDSA-AES128-GCM-SHA256:ECDH-ECDSA-AES256-GCM-SHA384:ECDH-RSA-AES128-GCM-SHA256:ECDH-RSA-AES256-GCM-SHA384:ECDH-ECDSA-AES128-SHA256:ECDH-ECDSA-AES256-SHA384:ECDH-ECDSA-AES128-SHA:ECDH-RSA-AES128-SHA256:ECDH-RSA-AES256-SHA384:ECDH-RSA-AES128-SHA:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-CHACHA20-POLY1305
> ssl_dh =
> ssl_key =
> ssl_options = no_compression
> ssl_prefer_server_ciphers = yes
> userdb {
>  args = username_format=%u /etc/dovecot/users
>  driver = passwd-file
> }
> verbose_proctitle = yes
> protocol imap {
>  mail_max_userip_connections = 2048
> }
> protocol pop3 {
>  mail_max_userip_connections = 2048
> }
> local_name domain4587.example.org mail.domain4587.example.org {
>  ssl_cert =
>  ssl_key =
> }
> local_name domain4588.example.org mail.domain4588.example.org {
>  ssl_cert =
>  ssl_key =
> }
> (...) 5000 certificates go here
> ==
> 
> /etc/dovecot/users:
> ==
> u...@domain1.example.org:{SHA512-CRYPT}$6$...:1000:1000::/home/test/mail/domain1.example.org/user::
> u...@domain2.example.org:{SHA512-CRYPT}$6$...:1000:1000::/home/test/mail/domain2.example.org/user::
> (...) 5000 users go here, but one should be enough for testing.
> ==
>
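
The reproduction scripts themselves are not included in what is quoted here. As a stand-in, the following Python sketch generates `local_name` blocks of the same shape as the ones in the configuration above. The certificate directory `/etc/dovecot/certs` and the `.crt`/`.key` file naming are assumptions made for illustration, and the certificate files would have to be created separately.

```python
def local_name_block(domain: str, cert_dir: str = "/etc/dovecot/certs") -> str:
    """Return one local_name block covering <domain> and mail.<domain>."""
    # In dovecot 2.3 config syntax, a leading "<" means "read the value
    # from this file"; the paths used here are hypothetical.
    return (
        f"local_name {domain} mail.{domain} {{\n"
        f"  ssl_cert = <{cert_dir}/{domain}.crt\n"
        f"  ssl_key = <{cert_dir}/{domain}.key\n"
        f"}}\n"
    )


def generate(count: int, start: int = 1) -> str:
    """Return `count` blocks for domain<start>.example.org and upwards."""
    return "".join(
        local_name_block(f"domain{i}.example.org")
        for i in range(start, start + count)
    )


if __name__ == "__main__":
    # Print 5000 blocks; redirect into a file included from dovecot.conf.
    print(generate(5000), end="")
```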