A potential race condition happened with the following 3 threads:
* PMD thread replaced the old_ukey and transitioned the state to
  UKEY_DELETED.
* RCU thread is freeing the old_ukey mutex.
* While the revalidator thread is trying to lock the old_ukey mutex.

We added some timestamp to udpif_key state transition and
the udpif_key mutex free action, as well as the sweep try lock for
the same udpif_key. When the ovs-vswitchd goes core, the udpif_key
try_lock mutex had always a bit later than the udpif_key mutex free
action. For instance [3]:
ukey_destroy_time = 13217289156490
sweep_now = 13217289156568

The second time is 78us behind the first time.

Then vswitchd process aborts at the revalidator thread try_lock of
ukey->mutex because of the NULL pointer.

This patch adds the try_lock for the ukeys' basket udpif_key map to
avoid the PMD and revalidator access to the same map for replacing the
udpif_key and transitioning the udpif_key state.

More details can be found at:
[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052973.html
[2] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052949.html
[3] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052993.html

Signed-off-by: LIU Yulong <i...@liuyulong.me>
---
v2: - Updated commit message to make 0-day Robot happy.
v3: - Updated commit message to make checkpatch.py happy.
    - Add some investigation details.
---
---
 ofproto/ofproto-dpif-upcall.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 9a5c5c29c..ef13f820a 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -2974,6 +2974,10 @@ revalidator_sweep__(struct revalidator *revalidator, 
bool purge)
         struct umap *umap = &udpif->ukeys[i];
         size_t n_ops = 0;
 
+        if (ovs_mutex_trylock(&umap->mutex)) {
+            continue;
+        }
+
         CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
             enum ukey_state ukey_state;
 
@@ -3013,9 +3017,7 @@ revalidator_sweep__(struct revalidator *revalidator, bool 
purge)
             if (ukey_state == UKEY_EVICTED) {
                 /* The common flow deletion case involves deletion of the flow
                  * during the dump phase and ukey deletion here. */
-                ovs_mutex_lock(&umap->mutex);
                 ukey_delete(umap, ukey);
-                ovs_mutex_unlock(&umap->mutex);
             }
 
             if (n_ops == REVALIDATE_MAX_BATCH) {
@@ -3025,6 +3027,7 @@ revalidator_sweep__(struct revalidator *revalidator, bool 
purge)
                 n_ops = 0;
             }
         }
+        ovs_mutex_unlock(&umap->mutex);
 
         if (n_ops) {
             push_ukey_ops(udpif, umap, ops, n_ops);
-- 
2.27.0

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to