On 15 Mar 2024, at 11:04, LIU Yulong wrote:
> A potential race condition happened with the following 3 threads:
> * PMD thread replaced the old_ukey and transitioned the state to
> UKEY_DELETED.
> * RCU thread is freeing the old_ukey mutex.
> * While the revalidator thread is trying to lock the old_ukey mutex.
>
> We added some timestamp to udpif_key state transition and
> the udpif_key mutex free action, as well as the sweep try lock for
> the same udpif_key. When the ovs-vswitchd goes core, the udpif_key
> try_lock mutex had always a bit later than the udpif_key mutex free
> action. For instance [3]:
> ukey_destroy_time = 13217289156490
> sweep_now = 13217289156568
>
> The second time is 78us behind the first time.
>
> Then vswitchd process aborts at the revalidator thread try_lock of
> ukey->mutex because of the NULL pointer.
>
> This patch adds the try_lock for the ukeys' basket udpif_key map to
> avoid the PMD and revalidator access to the same map for replacing the
> udpif_key and transitioning the udpif_key state.
>
> More details can be found at:
> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052973.html
> [2]
> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052949.html
> [3] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052993.html
>
> Signed-off-by: LIU Yulong <i...@liuyulong.me>
Hi LIU,
I've examined a lot of code, but I can't seem to figure out what event could be
causing your problem. I also don't understand why your fix would prevent the
problem from occurring (aside from possibly avoiding some quiescent state).
Here's some of my analysis.
Assume PMD is doing try_ukey_replace():
1930 try_ukey_replace(struct umap *umap, struct udpif_key *old_ukey,
1931 struct udpif_key *new_ukey)
1932 OVS_REQUIRES(umap->mutex)
1933 OVS_TRY_LOCK(true, new_ukey->mutex)
1934 {
1935 bool replaced = false;
1936
1937 if (!ovs_mutex_trylock(&old_ukey->mutex)) {
1938 if (old_ukey->state == UKEY_EVICTED) {
1939 /* The flow was deleted during the current revalidator dump,
1940 * but its ukey won't be fully cleaned up until the sweep
phase.
1941 * In the mean time, we are receiving upcalls for this
traffic.
1942 * Expedite the (new) flow install by replacing the ukey. */
1943 ovs_mutex_lock(&new_ukey->mutex);
1944 cmap_replace(&umap->cmap, &old_ukey->cmap_node,
1945 &new_ukey->cmap_node, new_ukey->hash);
1946 new_ukey->dump_seq = old_ukey->dump_seq;
1947 ovsrcu_postpone(ukey_delete__, old_ukey);
1948 transition_ukey(old_ukey, UKEY_DELETED);
1949 transition_ukey(new_ukey, UKEY_VISIBLE);
1950 replaced = true;
1951 }
1952 ovs_mutex_unlock(&old_ukey->mutex);
1953 }
...
In this code analysis, it's evident that the ukey deletion doesn't occur
immediately but rather after completing an RCU quiescent period.
The function revalidator_sweep__() doesn't include any ovsrcu_quiesce() calls
within the CMAP loop; instead, it executes it when processing the next slice.
See the code snippet below.
2983 static void
2984 revalidator_sweep__(struct revalidator *revalidator, bool purge)
2985 {
....
2995
2996 for (int i = slice; i < N_UMAPS; i += udpif->n_revalidators) {
....
3004
3005 CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
3006 enum flow_del_reason del_reason = FDR_NONE;
3007 enum ukey_state ukey_state;
3008
3009 /* Handler threads could be holding a ukey lock while it
installs a
3010 * new flow, so don't hang around waiting for access to it. */
3011 if (ovs_mutex_trylock(&ukey->mutex)) {
3012 continue;
3013 }
3037 reval_op_init(&ops[n_ops++], result, udpif, ukey,
&recircs,
3038 &odp_actions);
3039 }
3040 OVS_USDT_PROBE(revalidator_sweep__, flow_sweep_result,
udpif,
3041 ukey, result, del_reason);
3042 }
3043 ovs_mutex_unlock(&ukey->mutex);
....
3045 if (ukey_state == UKEY_EVICTED) {
3046 /* The common flow deletion case involves deletion of the
flow
3047 * during the dump phase and ukey deletion here. */
3048 ovs_mutex_lock(&umap->mutex);
3049 ukey_delete(umap, ukey);
3050 ovs_mutex_unlock(&umap->mutex);
3051 }
3052
3053 if (n_ops == REVALIDATE_MAX_BATCH) {
3054 /* Update/delete missed flows and clean up corresponding
ukeys
3055 * if necessary. */
3056 push_ukey_ops(udpif, umap, ops, n_ops);
3057 n_ops = 0;
3058 }
3059 }
....
3066 ovsrcu_quiesce();
^^ Here we call it.
3067 }
3068 }
So maybe you could try to remove the ovsrcu_quiesce() above, as this seems to
be the only hint why we could go trough quiescent state (maybe some compiler
optimization)?
If this is not solving the problem, we might need to come up with some more
debugging. Let me know and I can try to put some debug code together.
//Eelco
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev