When ovn-ic is paused, it releases the ISB lock if it holds it, but it
does not cancel an outstanding (contended) lock request.
As a result, a paused instance can still acquire the ISB lock after
another AZ releases it, even though the paused instance cannot use it.
When the paused instance then handles the "locked" notification, it
might not unlock, because the lock state is checked before
ovsdb_idl_run().
It then goes back to sleep, still holding the ISB lock, until something
else wakes it up.

Fix this by also checking ovsdb_idl_is_lock_contended() before
releasing, matching the pattern already used for the SB lock.
Fixes: 052a298bb90e ("ovn-ic: Use dual IC-SB connections to prevent constraint violations.")
Signed-off-by: Xavier Simonart <[email protected]>
---
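Note for reviewers: the race described above can be sketched with a toy
model of one client's view of the lock. This is only an illustration
under simplifying assumptions. The struct and the toy_* and pause_*
helpers are hypothetical and are not part of the OVSDB IDL; only the
has_lock/contended distinction mirrors ovsdb_idl_has_lock() and
ovsdb_idl_is_lock_contended().

```c
#include <stdbool.h>

/* Hypothetical toy model of one client's view of a single OVSDB lock.
 * It is not the IDL implementation. */
struct toy_lock {
    bool requested;   /* A lock request is registered with the server. */
    bool has_lock;    /* The server has granted the lock to this client. */
    bool contended;   /* Requested, but another client owns the lock. */
};

/* Models dropping the lock and any pending request
 * (what ovsdb_idl_set_lock(idl, NULL) is used for in the patch). */
static void
toy_unlock(struct toy_lock *l)
{
    l->requested = false;
    l->has_lock = false;
    l->contended = false;
}

/* Models the previous owner releasing the lock: the server sends
 * "locked" to a client that still has a request registered. */
static void
toy_owner_released(struct toy_lock *l)
{
    if (l->requested) {
        l->has_lock = true;
        l->contended = false;
    }
}

/* Pause handling before the fix: only an owned lock is dropped. */
static void
pause_old(struct toy_lock *l)
{
    if (l->has_lock) {
        toy_unlock(l);
    }
}

/* Pause handling after the fix: a pending request is dropped as well. */
static void
pause_new(struct toy_lock *l)
{
    if (l->has_lock || l->contended) {
        toy_unlock(l);
    }
}

/* az2 is waiting for the lock while az1 owns it.  az2 is paused, then
 * az1 releases the lock.  Returns whether paused az2 now owns it. */
static bool
paused_instance_gets_lock(void (*pause_fn)(struct toy_lock *))
{
    struct toy_lock az2 = { .requested = true, .has_lock = false,
                            .contended = true };

    pause_fn(&az2);
    toy_owner_released(&az2);
    return az2.has_lock;
}
```

In this model, paused_instance_gets_lock(pause_old) returns true (the
paused instance ends up owning the lock), while
paused_instance_gets_lock(pause_new) returns false.
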
ic/ovn-ic.c | 3 ++-
tests/ovn-ic.at | 39 +++++++++++++++++++++++++++++++++++++++
2 files changed, 41 insertions(+), 1 deletion(-)
diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
index b197ceeab..4f7e33da3 100644
--- a/ic/ovn-ic.c
+++ b/ic/ovn-ic.c
@@ -4062,7 +4062,8 @@ main(int argc, char *argv[])
* copy will be out of sync.
* - but we don't want to create any txns.
* */
- if (ovsdb_idl_has_lock(ovnisb_idl_loop.idl)) {
+ if (ovsdb_idl_has_lock(ovnisb_idl_loop.idl) ||
+ ovsdb_idl_is_lock_contended(ovnisb_idl_loop.idl)) {
VLOG_INFO("This ovn-ic instance is now paused. "
"Removing IC-SB lock.");
ovsdb_idl_set_lock(ovnisb_idl_loop.idl, NULL);
diff --git a/tests/ovn-ic.at b/tests/ovn-ic.at
index 0826632e9..9b6a58acd 100644
--- a/tests/ovn-ic.at
+++ b/tests/ovn-ic.at
@@ -5061,3 +5061,42 @@ OVN_CLEANUP_IC([az1], [az2])
AT_CLEANUP
])
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ovn-ic - pause])
+ovn_init_ic_db
+net_add n1
+
+check ovn-ic-nbctl ts-add ts1
+ovn_start az1
+OVS_WAIT_UNTIL([test "x$(as az1 ovn-appctl -t ic/ovn-ic status)" = "xStatus: active"])
+OVS_WAIT_UNTIL([grep -q "OVN ISB lock acquired" az1/ic/ovn-ic.log])
+
+ovn_start az2
+
+AS_BOX([az2 paused])
+check as az2 ovn-appctl -t ic/ovn-ic pause
+OVS_WAIT_UNTIL([test "x$(as az2 ovn-appctl -t ic/ovn-ic status)" = "xStatus: paused"])
+n1_lock_notif=$(grep -c 'send notification, method="locked"' ovn-ic-sb/ovsdb-server.log)
+
+AS_BOX([az1 paused])
+check as az1 ovn-appctl -t ic/ovn-ic pause
+OVS_WAIT_UNTIL([test "x$(as az1 ovn-appctl -t ic/ovn-ic status)" = "xStatus: paused"])
+n2_lock_notif=$(grep -c 'send notification, method="locked"' ovn-ic-sb/ovsdb-server.log)
+
+# Pausing az1 should not cause az2 to own the lock: az2 is paused.
+echo "$n1_lock_notif before and $n2_lock_notif after pausing az1"
+AT_CHECK([test $n1_lock_notif -eq $n2_lock_notif])
+
+n1_lock_acquired=$(grep -c 'OVN ISB lock acquired' az1/ic/ovn-ic.log)
+AS_BOX([az1 resumed])
+check as az1 ovn-appctl -t ic/ovn-ic resume
+OVS_WAIT_UNTIL([test $n1_lock_acquired -ne $(grep -c 'OVN ISB lock acquired' az1/ic/ovn-ic.log)])
+n2_lock_acquired=$(grep -c 'OVN ISB lock acquired' az1/ic/ovn-ic.log)
+echo "$n1_lock_acquired before and $n2_lock_acquired after resuming az1"
+
+OVN_CLEANUP_IC([az1], [az2])
+
+AT_CLEANUP
+])
+
--
2.47.1