bug#46961: Nginx and certbot cervices don't play well togther
On Wed, Jan 31 2024, Carlo Zancanaro wrote: >>> + (< attempt 12)) ; 12 * 10 seconds = 2 minutes >> ^-- >> This comment is not true because certbot takes time to execute (around 15s >> on my vm). I don't think there is a need to be that precise. > > I haven't extracted/named the max-attempts value, but I have removed the > comments that imply that the time frame is bounded. Ok >> Also could you update the example in the docs? > > I have removed the %certbot-deploy-hook in the example in the manual. > >> ... However, we could add a nginx-service-type and a >> dhcp-client-service-type so that people have an idea of what the minimal >> config is, maybe like I did in my first review: >> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=46961#23. > > I have not added this. I understand the desire, but I'm wary of providing an > example that's "too involved". The current example demonstrates a minimal > config of certbot itself. I think you are looking to include an example of a > minimal system that hosts a website using certbot provided certificates. I > don't know where an example like that belongs, but I'm not yet convinced it > belongs in the certbot service documentation. Sounds good Pushed, thank you! Clément
bug#46961: Nginx and certbot cervices don't play well togther
On Wed, Jan 31 2024, Clément Lassieur wrote: Removing guix-devel. I've also removed Brice. On Tue, Jan 30 2024, Carlo Zancanaro wrote: (format #t "Acquiring or renewing certificate: ~a~%" name) Here we could add ‘(force-output)’, because otherwise those logs arrive after the certbot logs, and it's hard to understand anything. Done. + ;; If we have a connection error, then bail early + ;; with exit code 2. We don't expect this to + ;; resolve within the timespan of this script. Could we have a (log + force-output) here too? (I imagine within a ‘begin’) Done. + ;; If we have any other type of error, then continue + ;; but exit with a failing status code in the end. and here? Done. And maybe a log also in case the command succeeds. (So that would mean to replace ‘unless’ with ‘if’). Done. + (< attempt 12)) ; 12 * 10 seconds = 2 minutes ^-- This comment is not true because certbot takes time to execute (around 15s on my vm). I don't think there is a need to be that precise. I haven't extracted/named the max-attempts value, but I have removed the comments that imply that the time frame is bounded. Also could you update the example in the docs? I have removed the %certbot-deploy-hook in the example in the manual. ... However, we could add a nginx-service-type and a dhcp-client-service-type so that people have an idea of what the minimal config is, maybe like I did in my first review: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=46961#23. I have not added this. I understand the desire, but I'm wary of providing an example that's "too involved". The current example demonstrates a minimal config of certbot itself. I think you are looking to include an example of a minimal system that hosts a website using certbot provided certificates. I don't know where an example like that belongs, but I'm not yet convinced it belongs in the certbot service documentation. Carlo
bug#46961: Nginx and certbot cervices don't play well togther
Removing guix-devel. On Tue, Jan 30 2024, Carlo Zancanaro wrote: > +(define (file-contains? file string) > + (string-contains (call-with-input-file file > + get-string-all) > + string)) > + > +(define (connection-error?) > + (file-contains? "/var/log/letsencrypt/letsencrypt.log" > + "Failed to establish a new connection")) > + > +(let ((script-code 0)) >(for-each > (match-lambda > ((name . command) >(begin > (format #t "Acquiring or renewing certificate: ~a~%" > name) Here we could add ‘(force-output)’, because otherwise those logs arrive after the certbot logs, and it's hard to understand anything. > -(set! code (or (apply system* command) code) > - '#$commands) code))) > +(unless (zero? (status:exit-val (apply system* command))) > + ;; Certbot errors are always exit code 1, but we'd like > + ;; to separate connection errors from other error > types. > + (if (connection-error?) > + ;; If we have a connection error, then bail early > + ;; with exit code 2. We don't expect this to > + ;; resolve within the timespan of this script. Could we have a (log + force-output) here too? (I imagine within a ‘begin’) > + (exit 2) > + ;; If we have any other type of error, then > continue > + ;; but exit with a failing status code in the end. and here? > + (set! script-code 1)) And maybe a log also in case the command succeeds. (So that would mean to replace ‘unless’ with ‘if’). > + '#$commands) > + (exit script-code > > + (let loop ((attempt 0)) > + (let ((code (status:exit-val > + (system* #$(certbot-command config) > + (cond > +((and (= code 2) ; Exit code 2 means connection > error > + (< attempt 12)) ; 12 * 10 seconds = 2 minutes ^-- This comment is not true because certbot takes time to execute (around 15s on my vm). I don't think there is a need to be that precise. Maybe you can just add in in the let form, as in (let ((code ...) (max-attempts 12)). > + (sleep 10) > + (loop (1+ attempt))) > +((zero? code) > + ;; Success! > + #t) > +(else > + ;; Failure. > + #f)) Also could you update the example in the docs? >From the doc: >> @defvar certbot-service-type >> A service type for the @code{certbot} Let's Encrypt client. Its value >> must be a @code{certbot-configuration} record as in this example: >> >> @lisp >> (define %certbot-deploy-hook >> (program-file "certbot-deploy-hook.scm" >> (with-imported-modules '((gnu services herd)) >> #~(begin >> (use-modules (gnu services herd)) >> (with-shepherd-action 'nginx ('reload) result result) ^ This part isn't useful anymore. However, we could add a nginx-service-type and a dhcp-client-service-type so that people have an idea of what the minimal config is, maybe like I did in my first review: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=46961#23. >> (service certbot-service-type >> (certbot-configuration >> (email "foo@@example.net") >> (certificates >>(list >> (certificate-configuration >> (domains '("example.net" "www.example.net")) >> (deploy-hook %certbot-deploy-hook)) >> (certificate-configuration >> (domains '("bar.example.net"))) >> @end lisp We are almost there, thanks! Clément
bug#46961: Nginx and certbot cervices don't play well togther
I removed guix-devel, not sure we need to spam it. On Tue, Jan 30 2024, Carlo Zancanaro wrote: > +(define %default-deploy-hook > + (program-file > + "reload-nginx.scm" > + (with-imported-modules '((gnu services herd)) > + #~(begin > + (use-modules (gnu services herd)) > + (with-shepherd-action 'nginx ('reload) result result) > + > (define-record-type* >certificate-configuration make-certificate-configuration >certificate-configuration? > @@ -65,7 +74,7 @@ (define-record-type* >(cleanup-hookcertificate-cleanup-hook > (default #f)) >(deploy-hook certificate-configuration-deploy-hook > - (default #f)) > + (default %default-deploy-hook)) >(start-self-signed? certificate-configuration-start-self-signed? > (default #t))) I'd reload within ‘certbot-deploy-hook’, between ‘rename-file’ and “(if deploy-hook-script” so that people don't get surprised, when they use a deploy-hook for unrelated reasons, that the nginx doesn't reload anymore. Plus, reloading nginx is harmless.
bug#46961: Nginx and certbot cervices don't play well togther
On Tue, Jan 30 2024, Carlo Zancanaro wrote: >>> + ;; Due to the way certbot runs, we need to >>> + ;; create the self-signed certificates in the >>> + ;; archive folder and symlink them into the live >>> + ;; folder. This mimics what certbot does well >>> + ;; enough to make acquiring new certificates >>> + ;; work. >> >> In another mail you say it doesn't work as well as you thought it did? >> What doesn't work? > > This comment doesn't describe the code any more. In my first attempt I > was trying to generate certificates in /etc/letsencrypt/live/ and get > certbot to write over them when it ran. Unfortunately, it refused to do > so. I then tried writing to /etc/letsencrypt/archive/ and symlinking > into /etc/letsencrypt/live/ (which is what this comment describes), but > that also failed. Certbot refuses to write over any existing files when > fetching a certificate. Oh I read the comment too quickly, I thought it was describing the /etc/certs moving. I suppose you will update it so to reflect the actual state? What you did (using /etc/certs, and symlinking stuff in /etc/letsencrypt) is a good idea I think, and it's excellent that it's backward compatible! > It looks like other acme clients might be happier to overwrite existing > files, but changing away from certbot seemed like more work than adding > a deploy hook to do what we need. Indeed! > I'll follow up with a v2 of this patch when I get a chance. Thanks! > Carlo
bug#46961: Nginx and certbot cervices don't play well togther
Hi Clément, Thanks for taking the time to review my change. I've responded inline below. On Mon, Jan 29 2024, Clément Lassieur wrote: > This is great, thank you! I tested it, it worked. Could you please > just make sure lines fit within 80 columns? Yep, no worries. > Would it make sense now to run ‘update-certificates’ at end of the > activation stuff? We can't run it during activation, because nginx won't have started yet. However, I am planning a follow-up to add a one-shot service to run certbot after nginx starts. I'll see if I can add it to this series, but if I run into any issues I'll leave it for later. > And would it make sense to reload nginx after ‘update-certificates’ is > run? This would be a sensible default. There is an example in the manual of configuring certbot to reload nginx, so this should be straightforward to add. > gnu/services/certbot.scm:203:26: warning: "subjectAltName=~{DNS:~a~^,~}": > unsupported format option ~{, use (ice-9 format) instead Ha! I import (ice-9 format), but within the gexp (and then I don't use it, whoops!). Must be a leftover from a previous iteration. I'll fix this up. >> + ;; Due to the way certbot runs, we need to >> + ;; create the self-signed certificates in the >> + ;; archive folder and symlink them into the live >> + ;; folder. This mimics what certbot does well >> + ;; enough to make acquiring new certificates >> + ;; work. > > In another mail you say it doesn't work as well as you thought it did? > What doesn't work? This comment doesn't describe the code any more. In my first attempt I was trying to generate certificates in /etc/letsencrypt/live/ and get certbot to write over them when it ran. Unfortunately, it refused to do so. I then tried writing to /etc/letsencrypt/archive/ and symlinking into /etc/letsencrypt/live/ (which is what this comment describes), but that also failed. Certbot refuses to write over any existing files when fetching a certificate. It looks like other acme clients might be happier to overwrite existing files, but changing away from certbot seemed like more work than adding a deploy hook to do what we need. I'll follow up with a v2 of this patch when I get a chance. Carlo
bug#46961: Nginx and certbot cervices don't play well togther
Also, I forgot, I think it would be great to have somewhere in the doc an example of minimal config.scm that works. I know we can't do proper testing because we depend of certbot service but that would make it easier for a lot of people to test it. Maybe such example is already in the docs and I haven't seen it though. Here is the one I used: --8<---cut here---start->8--- (use-modules (gnu) (gnu tests)) (use-package-modules web) (use-service-modules certbot networking web) (operating-system (inherit %simple-os) (services (cons* (service dhcp-client-service-type) (service nginx-service-type (nginx-configuration (server-blocks (list (nginx-server-configuration (listen '("443 ssl")) (server-name '("test.lassieur.org")) (ssl-certificate "/etc/certs/test.lassieur.org/fullchain.pem") (ssl-certificate-key "/etc/certs/test.lassieur.org/privkey.pem")) (service certbot-service-type (certbot-configuration (certificates (list (certificate-configuration (domains '("test.lassieur.org"))) %base-services))) --8<---cut here---end--->8---
bug#46961: Nginx and certbot cervices don't play well togther
Hi Carlo, On Wed, Jan 24 2024, Carlo Zancanaro wrote: > * gnu/services/certbot.scm (): Add > start-self-signed? field. > (generate-certificate-gexp): New procedure. > (certbot-activation): Generate self-signed certificates when > start-self-signed? is #t. > * doc/guix.texi (Certificate services): Document start-self-signed?. > --- > doc/guix.texi| 6 + > gnu/services/certbot.scm | 56 +--- > 2 files changed, 59 insertions(+), 3 deletions(-) This is great, thank you! I tested it, it worked. Could you please just make sure lines fit within 80 columns? And there is a warning during compilation, pasted below. Would it make sense now to run ‘update-certificates’ at end of the activation stuff? And would it make sense to reload nginx after ‘update-certificates’ is run? Clément > diff --git a/doc/guix.texi b/doc/guix.texi > index 2d43ab9a65..15b256d0a3 100644 > --- a/doc/guix.texi > +++ b/doc/guix.texi > @@ -32664,6 +32664,12 @@ Certificate Services > contain a space-delimited list of renewed certificate domains (for > example, @samp{"example.com www.example.com"}. > > +@item @code{start-self-signed?} (default: @code{#t}) > +Whether to generate an initial self-signed certificate during system > +activation. This option is particularly useful to allow @code{nginx} to > +start before @code{certbot} has run, because @code{certbot} relies on > +@code{nginx} running to perform HTTP challenges. > + > @end table > @end deftp > > diff --git a/gnu/services/certbot.scm b/gnu/services/certbot.scm > index 58e709f8a4..bb321a1b50 100644 > --- a/gnu/services/certbot.scm > +++ b/gnu/services/certbot.scm > @@ -64,7 +64,9 @@ (define-record-type* >(cleanup-hookcertificate-cleanup-hook > (default #f)) >(deploy-hook certificate-configuration-deploy-hook > - (default #f))) > + (default #f)) > + (start-self-signed? certificate-configuration-start-self-signed? > + (default #t))) > > (define-record-type* >certbot-configuration make-certbot-configuration > @@ -91,7 +93,10 @@ (define-record-type* > (define (certbot-deploy-hook name deploy-hook-script) >"Returns a gexp which creates symlinks for privkey.pem and fullchain.pem > from /etc/certs/NAME to /etc/letsenctypt/live/NAME. If DEPLOY-HOOK-SCRIPT is > -not #f then it is run after the symlinks have been created." > +not #f then it is run after the symlinks have been created. This wrapping is > +necessary for certificates with start-self-signed? set to #t, as it will > +overwrite the initial self-signed certificates upon the first successful > +deploy." >(program-file > (string-append name "-deploy-hook") > (with-imported-modules '((guix build utils)) > @@ -108,7 +113,8 @@ (define (certbot-deploy-hook name deploy-hook-script) > "/etc/letsencrypt/live/" name "/fullchain.pem") >#$(string-append "/etc/certs/" name "/fullchain.pem.new")) > > - ;; Rename over the top of the old ones, if there are any. > + ;; Rename over the top of the old ones, just in case they were the > + ;; original self-signed certificates. > (rename-file #$(string-append "/etc/certs/" name "/privkey.pem.new") >#$(string-append "/etc/certs/" name "/privkey.pem")) > (rename-file #$(string-append "/etc/certs/" name > "/fullchain.pem.new") > @@ -182,6 +188,44 @@ (define (certbot-renewal-jobs config) > #~(job '(next-minute-from (next-hour '(0 12)) (list (random 60))) >#$(certbot-command config > > +(define (generate-certificate-gexp certbot-cert-directory rsa-key-size) > + (match-lambda > +(($ name (primary-domain other-domains ...) > challenge > +csr authentication-hook > +cleanup-hook deploy-hook) > + (let (;; Arbitrary default subject, with just the > + ;; right domain filled in. These values don't > + ;; have any real significance. > + (subject (string-append "/C=US/ST=Oregon/L=Portland/O=Company > Name/OU=Org/CN=" > + primary-domain)) > + (alt-names (if (null? other-domains) > + #f > + (format #f "subjectAltName=~{DNS:~a~^,~}" > other-domains))) gnu/services/certbot.scm:203:26: warning: "subjectAltName=~{DNS:~a~^,~}": unsupported format option ~{, use (ice-9 format) instead > + (directory (string-append "/etc/certs/" (or name > primary-domain > + #~(begin > + (use-modules (ice-9 format)) > + (when (not (file-exists? #$directory)) > + ;; Due to the way certbot runs, we need to > + ;; create the self-signed certificates in the > + ;; archive folder and symlink them into the live > +
bug#46961: Nginx and certbot cervices don't play well togther
Hello Guix, After an suggestion from Tobias to give a try at forcing HTTPS for Guix's websites on berlin, I had a go at it but it was more complex that what I was expecting. Looking deeper at nginx and certbot services it appear both services don't play that well together, requering a inital dance when deploying a new HTTPS virtual server. As explained in #36389¹ you need to: « - run system configuration with just the certbot service - use certbot to generate your initial certificates - reconfigure with additional nginx server configuration, pointing to the SSL certificates created by certbot » Indeed, with an operating-system continaing the following services it's impossible to sart Nginx and Certbot at once as one would expect: --8<---cut here---start->8--- (service nginx-service-type) (service php-fpm-service-type) (service certbot-service-type (certbot-configuration (certificates (list (certificate-configuration (domains '("test.sama.re")) (deploy-hook (program-file "nginx-deploy-hook" #~(let ((pid (call-with-input-file "/var/run/nginx.pid" read))) (kill pid SIGHUP) (cat-avatar-generator-service #:configuration (nginx-server-configuration (listen '("443 ssl")) (server-name '("test.sama.re")) (ssl-certificate "/etc/letsencrypt/live/test.sama.re/fullchain.pem") (ssl-certificate-key "/etc/letsencrypt/live/test.sama.re/privkey.pem"))) --8<---cut here---end--->8--- Here is the error from reconfiguring the system: --8<---cut here---start->8--- # guix system reconfigure /etc/config.sm [...] building /gnu/store/55cq2ja4i5489s55viv9fh50032d1ziy-switch-to-system.scm.drv... making '/gnu/store/p2rkcmrnpls5py7x2iappf2qcbxwlb95-system' the current system... setting up setuid programs in '/run/setuid-programs'... populating /etc from /gnu/store/k2kb8hsq3q0dhhad4a9pjh4kx32mn4g0-etc... /var/lib/certbot/renew-certificates may need to be run creating nginx log directory '/var/log/nginx' creating nginx run directory '/var/run/nginx' creating nginx temp directories '/var/run/nginx/{client_body,proxy,fastcgi,uwsgi,scgi}_temp' nginx: [emerg] cannot load certificate "/etc/letsencrypt/live/test.sama.re/fullchain.pem": BIO_new_file() failed (SSL: error:02001002:system library:fopen:No such file or directory:fopen('/etc/letsencrypt/live/test.sama.re/fullchain.pem','r') error:2006D080:BIO routines:BIO_new_file:no such file) nginx: configuration file /gnu/store/chpw631djay2w39x7agg8zz53iayy4zy-nginx.conf test failed `/gnu/store/jyxc290q7jyhhpalski0h13h8z9zvnka-openssh-authorized-keys/bricewge' -> `/etc/ssh/authorized_keys.d/bricewge' The following derivation will be built: /gnu/store/qlzbrmpx6wnhzqcpqi9yrbb6xva82kvr-install-bootloader.scm.drv building /gnu/store/qlzbrmpx6wnhzqcpqi9yrbb6xva82kvr-install-bootloader.scm.drv... guix system: bootloader successfully installed on '/dev/sda' The following derivation will be built: /gnu/store/ikak44inrnz3b3dx8j8csdakgqafbijn-upgrade-shepherd-services.scm.drv building /gnu/store/ikak44inrnz3b3dx8j8csdakgqafbijn-upgrade-shepherd-services.scm.drv... shepherd: Removing service 'dbus-system'... shepherd: Service dbus-system has been stopped. shepherd: Done. shepherd: Service host-name has been started. shepherd: Service user-homes has been started. shepherd: Service host-name has been started. shepherd: Service term-auto could not be started. shepherd: Service php-fpm has been started. guix system: warning: exception caught while executing 'start' on service 'nginx': Throw to key `%exception' with args `("#< program: \"/gnu/store/hn1mvgafkpf5knrnzvwpgpdlzmq553al-nginx-1.19.6/sbin/nginx\" arguments: (\"-c\" \"/gnu/store/chpw631djay2w39x7agg8zz53iayy4zy-nginx.conf\" \"-p\" \"/var/run/nginx\") exit-status: 1 term-signal: #f stop-signal: #f>")'. guix system: warning: some services could not be upgraded hint: To allow changes to all the system services to take effect, you will need to reboot. --8<---cut here---end--->8--- What happen is Nginx won't start because the certficate related files present in it's configuration doesn't exist and we can't get a Let's Encrypt certificate from a HTTP-01 challenge without that web server running. NixOS broke that chicken and egg problem by generating a self-signed certificate first, after that starting nginx, then requesting a valid Lets' Encrypt certificate and finally reloading Nginx. That way we end up with a Nginx server using Let's Encrypt certificate with no more that a simple system reconfiguration. Note that, the initial self-signed certificate will need to be at the path were certbot will put it's