A potential race condition happened with the following 3 threads: * PMD thread replaced the old_ukey and transitioned the state to UKEY_DELETED. * RCU thread is freeing the old_ukey mutex. * While the revalidator thread is trying to lock the old_ukey mutex.
We added some timestamp to udpif_key state transition and the udpif_key mutex free action, as well as the sweep try lock for the same udpif_key. When the ovs-vswitchd goes core, the udpif_key try_lock mutex had always a bit later than the udpif_key mutex free action. For instance [3]: ukey_destroy_time = 13217289156490 sweep_now = 13217289156568 The second time is 78us behind the first time. Then vswitchd process aborts at the revalidator thread try_lock of ukey->mutex because of the NULL pointer. This patch adds the try_lock for the ukeys' basket udpif_key map to avoid the PMD and revalidator access to the same map for replacing the udpif_key and transitioning the udpif_key state. More details can be found at: [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052973.html [2] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052949.html [3] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052993.html Signed-off-by: LIU Yulong <i...@liuyulong.me> --- v2: - Updated commit message to make 0-day Robot happy. v3: - Updated commit message to make checkpatch.py happy. - Add some investigation details. --- --- ofproto/ofproto-dpif-upcall.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 9a5c5c29c..ef13f820a 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -2974,6 +2974,10 @@ revalidator_sweep__(struct revalidator *revalidator, bool purge) struct umap *umap = &udpif->ukeys[i]; size_t n_ops = 0; + if (ovs_mutex_trylock(&umap->mutex)) { + continue; + } + CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) { enum ukey_state ukey_state; @@ -3013,9 +3017,7 @@ revalidator_sweep__(struct revalidator *revalidator, bool purge) if (ukey_state == UKEY_EVICTED) { /* The common flow deletion case involves deletion of the flow * during the dump phase and ukey deletion here. */ - ovs_mutex_lock(&umap->mutex); ukey_delete(umap, ukey); - ovs_mutex_unlock(&umap->mutex); } if (n_ops == REVALIDATE_MAX_BATCH) { @@ -3025,6 +3027,7 @@ revalidator_sweep__(struct revalidator *revalidator, bool purge) n_ops = 0; } } + ovs_mutex_unlock(&umap->mutex); if (n_ops) { push_ukey_ops(udpif, umap, ops, n_ops); -- 2.27.0 _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev