If started with --no-restart-ike-daemon, ovs-monitor-ipsec doesn't clear the NSS database. This is not a problem if the certificates do not change while the monitor is down, because completely duplicate entries cannot be added to the NSS database. However, if the monitor is stopped, then certificates change on disk and then the monitor is started back, it will add new tunnel certificates alongside the old ones and will fail to add the new CA certificate. So, we'll end up with multiple certificates for the same tunnel and the outdated CA certificate. This will not allow creating new connections as we'll not be able to verify certificates of the new CA:
# certutil -L -d sql:/var/lib/ipsec/nss Certificate Nickname Trust Attributes SSL,S/MIME,JAR/XPI ovs_certkey_c04c352b u,u,u ovs_cert_cacert CT,, ovs_certkey_c04c352b u,u,u ovs_certkey_c04c352b u,u,u ovs_certkey_c04c352b u,u,u ovs_certkey_c04c352b u,u,u ovs_certkey_c04c352b u,u,u ovs_certkey_c04c352b u,u,u pluto: "ovn-c04c35-0-out-1" #459: processing decrypted IKE_AUTH request containing SK{IDi,CERT,CERTREQ,IDr,AUTH,SA, TSi,TSr,N(USE_TRANSPORT_MODE)} pluto: "ovn-c04c35-0-out-1" #459: NSS: ERROR: IPsec certificate CN=c04c352b,OU=kind,O=ovnkubernetes,C=US invalid: SEC_ERROR_UNKNOWN_ISSUER: Peer's Certificate issuer is not recognized. pluto: "ovn-c04c35-0-out-1" #459: NSS: end certificate invalid Fix that by always checking certificates in the NSS database before importing the new one. If they do not match, then remove the old one from the NSS and add the new one. We have to call deletion multiple times in order to remove all the potential duplicates from previous runs. This will be useful on upgrade, but also may save us if one of the deletions ever fail for any reason and we'll end up with a duplicate entry anyway. One alternative might be to always clear the database, even if the --no-restart-ike-daemon option is set, but there is a chance that we'll refresh and ask to re-read secrets before we got all the tunnel information from the database. That may affect dataplane. Even if this is really not possible, the logic seems too far apart to rely on. Also, Libreswan 4.6 seems to have some bug that prevents re-adding deleted connections if we removed and re-add the same certificate (newer versions don't have this issue), so it's better if we do not touch certificates that didn't actually change if we're not restarting the IKE daemon. The clearing may seem redundant now, but it may still be useful to clean up certificates for tunnels that disappeared while the monitor was down. Approach taken in this change doesn't cover this case. Test is added to check the described scenario. The 'on_exit' command is converted to obtain the monitor PID at exit, since we're now killing one monitor and starting another. Fixes: fe5ff26a49f6 ("ovs-monitor-ipsec: Add option to not restart IKE daemon.") Reported-at: https://issues.redhat.com/browse/FDP-1473 Signed-off-by: Ilya Maximets <i.maxim...@ovn.org> --- Version 2: * Changed the approach to compare existing NSS entry with the current certificate instead of always removing and re-adding. This allows to avoid a weird bug in Libreswan 4.6 where deleted connection cannot be re-added if the certificate is removed and re-added. This new version should also be slightly faster since we're most of the time calling one external command (cert dump) instead of two (delete + add). * Tested with Libreswan 4.6, 4.9, 4.15, 5.1 and 5.2. ipsec/ovs-monitor-ipsec.in | 80 ++++++++++++++++++++++++--- tests/system-ipsec.at | 107 ++++++++++++++++++++++++++++++++++++- 2 files changed, 180 insertions(+), 7 deletions(-) diff --git a/ipsec/ovs-monitor-ipsec.in b/ipsec/ovs-monitor-ipsec.in index 5058c0908..7d594620d 100755 --- a/ipsec/ovs-monitor-ipsec.in +++ b/ipsec/ovs-monitor-ipsec.in @@ -77,7 +77,7 @@ RECONCILIATION_INTERVAL = 15 # seconds TIMEOUT_EXPIRED = 137 # Exit code for a SIGKILL (128 + 9). -def run_command(args, description=None): +def run_command(args, description=None, warn_on_failure=True): """ This function runs the process args[0] with args[1:] arguments and returns a tuple: return-code, stdout, stderr. """ @@ -99,7 +99,7 @@ def run_command(args, description=None): proc.kill() ret = TIMEOUT_EXPIRED - if proc.returncode or perr: + if (proc.returncode or perr) and warn_on_failure: vlog.warn("Failed to %s; exit code: %d" % (description, proc.returncode)) vlog.warn("cmdline: %s" % proc.args) @@ -920,18 +920,85 @@ conn prevent_unencrypted_vxlan elif name.startswith(self.CERTKEY_PREFIX): self._nss_delete_cert_and_key(name) + def _nss_get_cert(self, name): + """Obtains ASCII-formatted (PEM) certificate from the NSS database + by name. If multiple certificates have the same name, all will be + returned together.""" + + ret, pout, perr = run_command( + ['certutil', '-L', '-d', self.IPSEC_D, '-a', '-n', name], + "get certificate %s from NSS" % name, warn_on_failure=False) + return None if ret else pout + + def _pem_get_certs(self, pem_text): + """Returns a set of all base64-encoded certificates in the 'pem_text'. + Can be used to unify slightly different formats between different files + and NSS database dumps.""" + + pattern = r"--*BEGIN CERTIFICATE--*(.*?)--*END CERTIFICATE--*" + matches = re.findall(pattern, pem_text, re.DOTALL) + return {re.sub(r'[^A-Za-z0-9+/=]', '', body) for body in matches} + + def _cert_check_and_remove_obsolete(self, cert, name): + """Looks up the certificate in the NSS database, removes all the + certificates that have the same 'name' but do not match the 'cert'. + Returns False if the 'cert' is not in the database in the end, + True otherwise.""" + + certs = set() + with open(cert) as f: + certs = self._pem_get_certs(f.read()) + + while True: + current = self._nss_get_cert(name) + if current is None: + # There are no certificates in the database. + return False + + current_certs = self._pem_get_certs(current) + if certs == current_certs: + # The certificates are up to date. + return True + + vlog.info( + "Mismatched certificate %s in the NSS database, removing." + % name + ) + + # Delete one certificate and try again. + res = 0 + if name.startswith(self.CERT_PREFIX): + res = self._nss_delete_cert(name) + else: + res = self._nss_delete_cert_and_key(name) + + if res != 0: + # We failed to remove the certificate. Assume that the + # one we need is not there. + return False + def _nss_import_cert(self, cert, name, cert_type): """Cert_type is 'CT,,' for the CA certificate and 'P,P,P' for the normal certificate.""" + + if self._cert_check_and_remove_obsolete(cert, name): + # Already in the database. + return + run_command(['certutil', '-A', '-a', '-i', cert, '-d', self.IPSEC_D, '-n', name, '-t', cert_type], "import certificate %s into NSS" % name) def _nss_delete_cert(self, name): - run_command(['certutil', '-D', '-d', self.IPSEC_D, '-n', name], - "delete certificate %s from NSS" % name) + return run_command(['certutil', '-D', '-d', self.IPSEC_D, '-n', name], + "delete certificate %s from NSS" % name)[0] def _nss_import_cert_and_key(self, cert, key, name): + if self._cert_check_and_remove_obsolete(cert, name): + # Already in the database. And if the cert is the same, the + # key must be the same as well. + return + # Avoid deleting other files path = os.path.abspath('/tmp/%s.p12' % name) if not path.startswith('/tmp/'): @@ -952,8 +1019,9 @@ conn prevent_unencrypted_vxlan def _nss_delete_cert_and_key(self, name): # Delete certificate and private key - run_command(['certutil', '-F', '-d', self.IPSEC_D, '-n', name], - "delete certificate and private key for %s" % name) + return run_command( + ['certutil', '-F', '-d', self.IPSEC_D, '-n', name], + "delete certificate and private key for %s" % name)[0] class IPsecTunnel(object): diff --git a/tests/system-ipsec.at b/tests/system-ipsec.at index 3136f8c03..18d7285b1 100644 --- a/tests/system-ipsec.at +++ b/tests/system-ipsec.at @@ -84,7 +84,7 @@ m4_define([IPSEC_ADD_NODE], --ipsec-ctl=$ovs_base/$1/pluto.ctl \ m4_if([$6], [], [], [$6]) \ --no-restart-ike-daemon --detach ], [0], [], [stderr]) - on_exit "kill `cat $ovs_base/$1/ovs-monitor-ipsec.pid`" + on_exit 'kill $(cat $ovs_base/$1/ovs-monitor-ipsec.pid)' dnl Set up OVS bridge NS_CHECK_EXEC([$1], @@ -753,6 +753,111 @@ AT_CHECK([head -n $(grep -c ':' left/sa.before) left/sa.after \ OVS_TRAFFIC_VSWITCHD_STOP() AT_CLEANUP +AT_SETUP([IPsec -- Libreswan - certificate update while the daemon is down]) +AT_KEYWORDS([ipsec libreswan geneve ca-cert]) +dnl Note: Geneve test may not work on older kernels due to CVE-2020-25645 +dnl https://bugzilla.redhat.com/show_bug.cgi?id=1883988 + +CHECK_LIBRESWAN() +OVS_TRAFFIC_VSWITCHD_START() +IPSEC_SETUP_UNDERLAY() + +dnl Set up dummy hosts. +IPSEC_ADD_NODE_LEFT(10.1.1.1, 10.1.1.2) +IPSEC_ADD_NODE_RIGHT(10.1.1.2, 10.1.1.1) + +dnl Create and set ca-signed certs. +AT_CHECK([ovs-pki -b --dir=${ovs_base} -l ${ovs_base}/ovs-pki.log \ + --force init], [0], [ignore], [ignore]) +AT_CHECK([ovs-pki -b --dir=${ovs_base} -l ${ovs_base}/ovs-pki.log \ + req+sign -u left], [0], [ignore], [ignore]) +AT_CHECK([ovs-pki -b --dir=${ovs_base} -l ${ovs_base}/ovs-pki.log \ + req+sign -u right], [0], [ignore], [ignore]) + +OVS_VSCTL_LEFT(set Open_vSwitch . \ + other_config:ca_cert=${ovs_base}/switchca/cacert.pem \ + other_config:certificate=${ovs_base}/left-cert.pem \ + other_config:private_key=${ovs_base}/left-privkey.pem) +OVS_VSCTL_RIGHT(set Open_vSwitch . \ + other_config:ca_cert=${ovs_base}/switchca/cacert.pem \ + other_config:certificate=${ovs_base}/right-cert.pem \ + other_config:private_key=${ovs_base}/right-privkey.pem) + +dnl Set up IPsec tunnel on 'left' host. +IPSEC_ADD_TUNNEL_LEFT([geneve], + [options:remote_ip=10.1.1.2 options:remote_name=right]) + +dnl Set up IPsec tunnel on 'right' host. +IPSEC_ADD_TUNNEL_RIGHT([geneve], + [options:remote_ip=10.1.1.1 options:remote_name=left]) +CHECK_ESP_TRAFFIC + +dnl Get the numbers of all the current SAs. +IPSEC_SA_LIST([left], [left/sa.before]) +IPSEC_SA_LIST([right], [right/sa.before]) + +dnl Kill the ovs-monitor-ipsec on the left host. +AT_CHECK([kill $(cat ${ovs_base}/left/ovs-monitor-ipsec.pid)]) +OVS_WAIT_WHILE([kill -0 $(cat ${ovs_base}/left/ovs-monitor-ipsec.pid)]) + +dnl Re-generate the certificates. +AT_CHECK([rm -rf ${ovs_base}/switchca *.pem]) +AT_CHECK([ovs-pki -b --dir=${ovs_base} -l ${ovs_base}/ovs-pki.log \ + --force init], [0], [ignore], [ignore]) +AT_CHECK([ovs-pki -b --dir=${ovs_base} -l ${ovs_base}/ovs-pki.log \ + req+sign -u left], [0], [ignore], [ignore]) +AT_CHECK([ovs-pki -b --dir=${ovs_base} -l ${ovs_base}/ovs-pki.log \ + req+sign -u right], [0], [ignore], [ignore]) + +dnl Re-start ovs-monitor-ipsec with --no-restart-ike-daemon. +NS_CHECK_EXEC([left], [ovs-monitor-ipsec unix:${OVS_RUNDIR}/left/db.sock \ + --pidfile=${OVS_RUNDIR}/left/ovs-monitor-ipsec.pid \ + --ike-daemon=libreswan \ + --ipsec-conf=$ovs_base/left/ipsec.conf \ + --ipsec-d=$ovs_base/left/ipsec.d \ + --ipsec-secrets=$ovs_base/left/secrets \ + --log-file=$ovs_base/left/ovs-monitor-ipsec-2.log \ + --ipsec-ctl=$ovs_base/left/pluto.ctl \ + --no-restart-ike-daemon --detach ], [0], [], [stderr]) + +OVS_WAIT_UNTIL([grep -q 'Connections for all(1) configured tunnels are Up.' \ + $ovs_base/left/ovs-monitor-ipsec-2.log]) + +dnl Check that the original left-right tunnel still works. +NS_CHECK_EXEC([left], [ping -q -c 3 -i 0.1 -W 2 192.0.0.2 | FORMAT_PING], [0], + [3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([right], [ping -q -c 3 -i 0.1 -W 2 192.0.0.1 | FORMAT_PING], [0], + [3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Check that ovs-monitor-ipsec didn't touch the original tunnel. +IPSEC_SA_LIST([left], [left/sa.after]) +IPSEC_SA_LIST([right], [right/sa.after]) +AT_CHECK([diff -u left/sa.before left/sa.after]) +AT_CHECK([diff -u right/sa.before right/sa.after]) + +dnl Check that there are no extra certs in the NSS database. +AT_CHECK([certutil -L -d $ovs_base/left/ipsec.d | grep -c ovs_cert], [0], [2 +]) + +dnl Check that loaded certs are the new ones. Using openssl for comparison +dnl to ensure the same format (the file on disk and the dump from the NSS +dnl database have slightly different formats). +AT_CHECK([certutil -L -d $ovs_base/left/ipsec.d \ + -a -n ovs_cert_cacert], [0], [stdout]) +AT_CHECK_UNQUOTED([openssl x509 -in stdout -text -noout], [0], + [$(openssl x509 -in switchca/cacert.pem -text -noout) +]) +AT_CHECK([certutil -L -d $ovs_base/left/ipsec.d \ + -a -n ovs_certkey_left], [0], [stdout]) +AT_CHECK_UNQUOTED([openssl x509 -in stdout -text -noout], [0], + [$(openssl x509 -in left-cert.pem -text -noout) +]) + +OVS_TRAFFIC_VSWITCHD_STOP() +AT_CLEANUP + AT_SETUP([IPsec -- Libreswan NxN geneve tunnels + reconciliation]) AT_KEYWORDS([ipsec libreswan scale reconciliation]) dnl Note: Geneve test may not work on older kernels due to CVE-2020-25645 -- 2.49.0 _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev