If started with --no-restart-ike-daemon, ovs-monitor-ipsec doesn't
clear the NSS database.  This is not a problem if the certificates do
not change while the monitor is down, because completely duplicate
entries cannot be added to the NSS database.  However, if the monitor
is stopped, then certificates change on disk and then the monitor is
started back, it will add new tunnel certificates alongside the old
ones and will fail to add the new CA certificate.  So, we'll end up
with multiple certificates for the same tunnel and the outdated CA
certificate.  This will not allow creating new connections as we'll
not be able to verify certificates of the new CA:

  # certutil -L -d sql:/var/lib/ipsec/nss

  Certificate Nickname             Trust Attributes
                                   SSL,S/MIME,JAR/XPI

  ovs_certkey_c04c352b             u,u,u
  ovs_cert_cacert                  CT,,
  ovs_certkey_c04c352b             u,u,u
  ovs_certkey_c04c352b             u,u,u
  ovs_certkey_c04c352b             u,u,u
  ovs_certkey_c04c352b             u,u,u
  ovs_certkey_c04c352b             u,u,u
  ovs_certkey_c04c352b             u,u,u

  pluto: "ovn-c04c35-0-out-1" #459: processing decrypted
   IKE_AUTH request containing SK{IDi,CERT,CERTREQ,IDr,AUTH,SA,
   TSi,TSr,N(USE_TRANSPORT_MODE)}
  pluto: "ovn-c04c35-0-out-1" #459: NSS: ERROR:
   IPsec certificate CN=c04c352b,OU=kind,O=ovnkubernetes,C=US invalid:
   SEC_ERROR_UNKNOWN_ISSUER: Peer's Certificate issuer is not recognized.
  pluto: "ovn-c04c35-0-out-1" #459: NSS: end certificate invalid

Fix that by always trying to remove the certificate from the NSS
database before importing the new one.  This doesn't affect Libreswan,
because it will not access the database until we ask it to re-read
the secrets.

We have to call deletion multiple times in order to remove all the
potential duplicates from previous runs.  This will be useful on
upgrade, but also may save us if one of the deletions ever fail for
any reason and we'll end up with a duplicate entry anyway.

One alternative might be to always clear the database, even if the
--no-restart-ike-daemon option is set, but there is a chance that
we'll refresh and ask to re-read secrets before we got all the tunnel
information from the database.  That may affect dataplane.  Even if
this is really not possible, the logic seems too far apart to rely on.

The clearing may seem redundant now, but it may still be useful to
clean up certificates for tunnels that disappeared while the monitor
was down.  Approach taken in this change doesn't cover this case.

Test is added to check the described scenario.  The 'on_exit' command
is converted to obtain the monitor PID at exit, since we're now killing
one monitor and starting another.

Fixes: fe5ff26a49f6 ("ovs-monitor-ipsec: Add option to not restart IKE daemon.")
Reported-at: https://issues.redhat.com/browse/FDP-1473
Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
---
 ipsec/ovs-monitor-ipsec.in |  31 ++++++++---
 tests/system-ipsec.at      | 107 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 129 insertions(+), 9 deletions(-)

diff --git a/ipsec/ovs-monitor-ipsec.in b/ipsec/ovs-monitor-ipsec.in
index 5058c0908..2ce08bae9 100755
--- a/ipsec/ovs-monitor-ipsec.in
+++ b/ipsec/ovs-monitor-ipsec.in
@@ -77,7 +77,7 @@ RECONCILIATION_INTERVAL = 15  # seconds
 TIMEOUT_EXPIRED = 137  # Exit code for a SIGKILL (128 + 9).
 
 
-def run_command(args, description=None):
+def run_command(args, description=None, warn_on_failure=True):
     """ This function runs the process args[0] with args[1:] arguments
     and returns a tuple: return-code, stdout, stderr. """
 
@@ -99,7 +99,7 @@ def run_command(args, description=None):
         proc.kill()
         ret = TIMEOUT_EXPIRED
 
-    if proc.returncode or perr:
+    if (proc.returncode or perr) and warn_on_failure:
         vlog.warn("Failed to %s; exit code: %d"
                   % (description, proc.returncode))
         vlog.warn("cmdline: %s" % proc.args)
@@ -923,13 +923,21 @@ conn prevent_unencrypted_vxlan
     def _nss_import_cert(self, cert, name, cert_type):
         """Cert_type is 'CT,,' for the CA certificate and 'P,P,P' for the
         normal certificate."""
+
+        # Make sure there is no certificate with the same nickname already.
+        # Need to run in a loop, because there can be multiple certs with
+        # the same nickname and each call will remove only one of them.
+        while self._nss_delete_cert(name, warn_on_failure=False) == 0:
+            pass
+
         run_command(['certutil', '-A', '-a', '-i', cert, '-d', self.IPSEC_D,
                      '-n', name, '-t', cert_type],
                     "import certificate %s into NSS" % name)
 
-    def _nss_delete_cert(self, name):
-        run_command(['certutil', '-D', '-d', self.IPSEC_D, '-n', name],
-                    "delete certificate %s from NSS" % name)
+    def _nss_delete_cert(self, name, warn_on_failure=True):
+        return run_command(['certutil', '-D', '-d', self.IPSEC_D, '-n', name],
+                           "delete certificate %s from NSS" % name,
+                           warn_on_failure=warn_on_failure)[0]
 
     def _nss_import_cert_and_key(self, cert, key, name):
         # Avoid deleting other files
@@ -945,15 +953,22 @@ conn prevent_unencrypted_vxlan
                        "create p12 file from pem files")[0]:
             return
 
+        # Make sure there is no certificate with the same nickname already.
+        # Need to run in a loop, because there can be multiple certs with
+        # the same nickname and each call will remove only one of them.
+        while self._nss_delete_cert_and_key(name, warn_on_failure=False) == 0:
+            pass
+
         # Load p12 file to the database
         run_command(['pk12util', '-i', path, '-d', self.IPSEC_D, '-W', ''],
                     "load p12 file to the NSS database")
         os.remove(path)
 
-    def _nss_delete_cert_and_key(self, name):
+    def _nss_delete_cert_and_key(self, name, warn_on_failure=True):
         # Delete certificate and private key
-        run_command(['certutil', '-F', '-d', self.IPSEC_D, '-n', name],
-                    "delete certificate and private key for %s" % name)
+        return run_command(['certutil', '-F', '-d', self.IPSEC_D, '-n', name],
+                           "delete certificate and private key for %s" % name,
+                           warn_on_failure=warn_on_failure)[0]
 
 
 class IPsecTunnel(object):
diff --git a/tests/system-ipsec.at b/tests/system-ipsec.at
index 3136f8c03..18d7285b1 100644
--- a/tests/system-ipsec.at
+++ b/tests/system-ipsec.at
@@ -84,7 +84,7 @@ m4_define([IPSEC_ADD_NODE],
         --ipsec-ctl=$ovs_base/$1/pluto.ctl \
         m4_if([$6], [], [], [$6]) \
         --no-restart-ike-daemon --detach ], [0], [], [stderr])
-  on_exit "kill `cat $ovs_base/$1/ovs-monitor-ipsec.pid`"
+  on_exit 'kill $(cat $ovs_base/$1/ovs-monitor-ipsec.pid)'
 
   dnl Set up OVS bridge
   NS_CHECK_EXEC([$1],
@@ -753,6 +753,111 @@ AT_CHECK([head -n $(grep -c ':' left/sa.before) 
left/sa.after \
 OVS_TRAFFIC_VSWITCHD_STOP()
 AT_CLEANUP
 
+AT_SETUP([IPsec -- Libreswan - certificate update while the daemon is down])
+AT_KEYWORDS([ipsec libreswan geneve ca-cert])
+dnl Note: Geneve test may not work on older kernels due to CVE-2020-25645
+dnl https://bugzilla.redhat.com/show_bug.cgi?id=1883988
+
+CHECK_LIBRESWAN()
+OVS_TRAFFIC_VSWITCHD_START()
+IPSEC_SETUP_UNDERLAY()
+
+dnl Set up dummy hosts.
+IPSEC_ADD_NODE_LEFT(10.1.1.1, 10.1.1.2)
+IPSEC_ADD_NODE_RIGHT(10.1.1.2, 10.1.1.1)
+
+dnl Create and set ca-signed certs.
+AT_CHECK([ovs-pki -b --dir=${ovs_base} -l ${ovs_base}/ovs-pki.log \
+            --force init], [0], [ignore], [ignore])
+AT_CHECK([ovs-pki -b --dir=${ovs_base} -l ${ovs_base}/ovs-pki.log \
+            req+sign -u left], [0], [ignore], [ignore])
+AT_CHECK([ovs-pki -b --dir=${ovs_base} -l ${ovs_base}/ovs-pki.log \
+            req+sign -u right], [0], [ignore], [ignore])
+
+OVS_VSCTL_LEFT(set Open_vSwitch . \
+      other_config:ca_cert=${ovs_base}/switchca/cacert.pem \
+      other_config:certificate=${ovs_base}/left-cert.pem \
+      other_config:private_key=${ovs_base}/left-privkey.pem)
+OVS_VSCTL_RIGHT(set Open_vSwitch . \
+      other_config:ca_cert=${ovs_base}/switchca/cacert.pem \
+      other_config:certificate=${ovs_base}/right-cert.pem \
+      other_config:private_key=${ovs_base}/right-privkey.pem)
+
+dnl Set up IPsec tunnel on 'left' host.
+IPSEC_ADD_TUNNEL_LEFT([geneve],
+                      [options:remote_ip=10.1.1.2 options:remote_name=right])
+
+dnl Set up IPsec tunnel on 'right' host.
+IPSEC_ADD_TUNNEL_RIGHT([geneve],
+                       [options:remote_ip=10.1.1.1 options:remote_name=left])
+CHECK_ESP_TRAFFIC
+
+dnl Get the numbers of all the current SAs.
+IPSEC_SA_LIST([left], [left/sa.before])
+IPSEC_SA_LIST([right], [right/sa.before])
+
+dnl Kill the ovs-monitor-ipsec on the left host.
+AT_CHECK([kill $(cat ${ovs_base}/left/ovs-monitor-ipsec.pid)])
+OVS_WAIT_WHILE([kill -0 $(cat ${ovs_base}/left/ovs-monitor-ipsec.pid)])
+
+dnl Re-generate the certificates.
+AT_CHECK([rm -rf ${ovs_base}/switchca *.pem])
+AT_CHECK([ovs-pki -b --dir=${ovs_base} -l ${ovs_base}/ovs-pki.log \
+            --force init], [0], [ignore], [ignore])
+AT_CHECK([ovs-pki -b --dir=${ovs_base} -l ${ovs_base}/ovs-pki.log \
+            req+sign -u left], [0], [ignore], [ignore])
+AT_CHECK([ovs-pki -b --dir=${ovs_base} -l ${ovs_base}/ovs-pki.log \
+            req+sign -u right], [0], [ignore], [ignore])
+
+dnl Re-start ovs-monitor-ipsec with --no-restart-ike-daemon.
+NS_CHECK_EXEC([left], [ovs-monitor-ipsec unix:${OVS_RUNDIR}/left/db.sock \
+    --pidfile=${OVS_RUNDIR}/left/ovs-monitor-ipsec.pid \
+    --ike-daemon=libreswan \
+    --ipsec-conf=$ovs_base/left/ipsec.conf \
+    --ipsec-d=$ovs_base/left/ipsec.d \
+    --ipsec-secrets=$ovs_base/left/secrets \
+    --log-file=$ovs_base/left/ovs-monitor-ipsec-2.log \
+    --ipsec-ctl=$ovs_base/left/pluto.ctl \
+    --no-restart-ike-daemon --detach ], [0], [], [stderr])
+
+OVS_WAIT_UNTIL([grep -q 'Connections for all(1) configured tunnels are Up.' \
+                    $ovs_base/left/ovs-monitor-ipsec-2.log])
+
+dnl Check that the original left-right tunnel still works.
+NS_CHECK_EXEC([left], [ping -q -c 3 -i 0.1 -W 2 192.0.0.2 | FORMAT_PING], [0],
+              [3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([right], [ping -q -c 3 -i 0.1 -W 2 192.0.0.1 | FORMAT_PING], [0],
+              [3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Check that ovs-monitor-ipsec didn't touch the original tunnel.
+IPSEC_SA_LIST([left], [left/sa.after])
+IPSEC_SA_LIST([right], [right/sa.after])
+AT_CHECK([diff -u left/sa.before left/sa.after])
+AT_CHECK([diff -u right/sa.before right/sa.after])
+
+dnl Check that there are no extra certs in the NSS database.
+AT_CHECK([certutil -L -d $ovs_base/left/ipsec.d | grep -c ovs_cert], [0], [2
+])
+
+dnl Check that loaded certs are the new ones.  Using openssl for comparison
+dnl to ensure the same format (the file on disk and the dump from the NSS
+dnl database have slightly different formats).
+AT_CHECK([certutil -L -d $ovs_base/left/ipsec.d \
+            -a -n ovs_cert_cacert], [0], [stdout])
+AT_CHECK_UNQUOTED([openssl x509 -in stdout -text -noout], [0],
+                  [$(openssl x509 -in switchca/cacert.pem -text -noout)
+])
+AT_CHECK([certutil -L -d $ovs_base/left/ipsec.d \
+            -a -n ovs_certkey_left], [0], [stdout])
+AT_CHECK_UNQUOTED([openssl x509 -in stdout -text -noout], [0],
+                  [$(openssl x509 -in left-cert.pem -text -noout)
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP()
+AT_CLEANUP
+
 AT_SETUP([IPsec -- Libreswan NxN geneve tunnels + reconciliation])
 AT_KEYWORDS([ipsec libreswan scale reconciliation])
 dnl Note: Geneve test may not work on older kernels due to CVE-2020-25645
-- 
2.49.0

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to