On 15 Mar 2024, at 11:04, LIU Yulong wrote:

> A potential race condition happened with the following 3 threads:
> * PMD thread replaced the old_ukey and transitioned the state to
>   UKEY_DELETED.
> * RCU thread is freeing the old_ukey mutex.
> * While the revalidator thread is trying to lock the old_ukey mutex.
>
> We added some timestamp to udpif_key state transition and
> the udpif_key mutex free action, as well as the sweep try lock for
> the same udpif_key. When the ovs-vswitchd goes core, the udpif_key
> try_lock mutex had always a bit later than the udpif_key mutex free
> action. For instance [3]:
> ukey_destroy_time = 13217289156490
> sweep_now = 13217289156568
>
> The second time is 78us behind the first time.
>
> Then vswitchd process aborts at the revalidator thread try_lock of
> ukey->mutex because of the NULL pointer.
>
> This patch adds the try_lock for the ukeys' basket udpif_key map to
> avoid the PMD and revalidator access to the same map for replacing the
> udpif_key and transitioning the udpif_key state.
>
> More details can be found at:
> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052973.html
> [2] 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052949.html
> [3] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052993.html
>
> Signed-off-by: LIU Yulong <i...@liuyulong.me>

Hi LIU,

I've examined a lot of code, but I can't seem to figure out what event could be 
causing your problem. I also don't understand why your fix would prevent the 
problem from occurring (aside from possibly avoiding some quiescent state). 
Here's some of my analysis.

Assume PMD is doing try_ukey_replace():

  1930  try_ukey_replace(struct umap *umap, struct udpif_key *old_ukey,
  1931                   struct udpif_key *new_ukey)
  1932      OVS_REQUIRES(umap->mutex)
  1933      OVS_TRY_LOCK(true, new_ukey->mutex)
  1934  {
  1935      bool replaced = false;
  1936
  1937      if (!ovs_mutex_trylock(&old_ukey->mutex)) {
  1938          if (old_ukey->state == UKEY_EVICTED) {
  1939              /* The flow was deleted during the current revalidator dump,
  1940               * but its ukey won't be fully cleaned up until the sweep 
phase.
  1941               * In the mean time, we are receiving upcalls for this 
traffic.
  1942               * Expedite the (new) flow install by replacing the ukey. */
  1943              ovs_mutex_lock(&new_ukey->mutex);
  1944              cmap_replace(&umap->cmap, &old_ukey->cmap_node,
  1945                           &new_ukey->cmap_node, new_ukey->hash);
  1946              new_ukey->dump_seq = old_ukey->dump_seq;
  1947              ovsrcu_postpone(ukey_delete__, old_ukey);
  1948              transition_ukey(old_ukey, UKEY_DELETED);
  1949              transition_ukey(new_ukey, UKEY_VISIBLE);
  1950              replaced = true;
  1951          }
  1952          ovs_mutex_unlock(&old_ukey->mutex);
  1953      }
  ...

In this code analysis, it's evident that the ukey deletion doesn't occur 
immediately but rather after completing an RCU quiescent period.
The function revalidator_sweep__() doesn't include any ovsrcu_quiesce() calls 
within the CMAP loop; instead, it executes it when processing the next slice. 
See the code snippet below.

2983  static void
2984  revalidator_sweep__(struct revalidator *revalidator, bool purge)
2985  {
....
2995
2996      for (int i = slice; i < N_UMAPS; i += udpif->n_revalidators) {
....
3004
3005          CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
3006              enum flow_del_reason del_reason = FDR_NONE;
3007              enum ukey_state ukey_state;
3008
3009              /* Handler threads could be holding a ukey lock while it 
installs a
3010               * new flow, so don't hang around waiting for access to it. */
3011              if (ovs_mutex_trylock(&ukey->mutex)) {
3012                  continue;
3013              }
3037                      reval_op_init(&ops[n_ops++], result, udpif, ukey, 
&recircs,
3038                                    &odp_actions);
3039                  }

3040                  OVS_USDT_PROBE(revalidator_sweep__, flow_sweep_result, 
udpif,
3041                                 ukey, result, del_reason);
3042              }
3043              ovs_mutex_unlock(&ukey->mutex);
....
3045              if (ukey_state == UKEY_EVICTED) {
3046                  /* The common flow deletion case involves deletion of the 
flow
3047                   * during the dump phase and ukey deletion here. */
3048                  ovs_mutex_lock(&umap->mutex);
3049                  ukey_delete(umap, ukey);
3050                  ovs_mutex_unlock(&umap->mutex);
3051              }
3052
3053              if (n_ops == REVALIDATE_MAX_BATCH) {
3054                  /* Update/delete missed flows and clean up corresponding 
ukeys
3055                   * if necessary. */
3056                  push_ukey_ops(udpif, umap, ops, n_ops);
3057                  n_ops = 0;
3058              }
3059          }
....
3066          ovsrcu_quiesce();
              ^^ Here we call it.
3067      }
3068  }

So maybe you could try to remove the ovsrcu_quiesce() above, as this seems to 
be the only hint why we could go trough quiescent state (maybe some compiler 
optimization)?

If this is not solving the problem, we might need to come up with some more 
debugging. Let me know and I can try to put some debug code together.

//Eelco

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to