On 11 May 2026, at 17:54, Ilya Maximets wrote:

> A simplified ofproto destruction flow looks like this:
>
>   ofproto_destroy()
>
>     ofproto_flush__()
>       ofproto_class->flush(ofproto);
>         udpif_flush()
>           udpif_stop_threads(udpif, true)
>             revalidator_purge()
>               ...
>               ovsrcu_postpone(ukey_delete__, ukey)
>           udpif_start_threads()
>
>     ofproto_class->destruct(p, del)
>       xlate_txn_start();
>       xlate_remove_ofproto(ofproto);
>       xlate_txn_commit();
>       ...
>       recirc_free_ofproto()
>       ...
>       ovsrcu_barrier()
>       close_dpif_backer(ofproto->backer, del);
>
> The recirc_free_ofproto() function is used to check if we leaked any
> recirculation IDs, but it doesn't really wait for everything to be
> completed and reports false positives.  There are two races here in
> particular:
>
> 1. Flush frees recirculation IDs in an RCU-postponed ukey_delete__().
> There is one ovsrcu_synchronize() call between udpif_flush() and the
> recirc_free_ofproto() inside xlate_txn_commit(), but synchronization
> only waits for all threads to enter quiescent state, it doesn't wait
> for the postponed callbacks to be completed.  So, when the
> recirc_free_ofproto() is called, some IDs can still be not freed.
>
> 2. udpif_flush() starts the threads back at the end, so they can still
> add new ukeys in the short window between the flush and the
> destruction.  In this case some new IDs can be allocated after the
> flush and the leak check will find them.
>
> We can solve both races by moving the leak check down to the end of
> the ofproto_class->destruct().  We already have the ovsrcu_barrier()
> there that actually waits for all the RCU-postponed cleanups.  And the
> close_dpif_backer() will take care of all the ukeys created after the
> flush in a non-postponed fashion.
>
> It's hard to pinpoint the exact commit to blame, but citing one that
> definitely played a role here.
>
> Fixes: e672ff9b4d22 ("ofproto-dpif: Restore metadata and registers on 
> recirculation.")
> Signed-off-by: Ilya Maximets <[email protected]>

Thanks, Ilya, for the patch and the detailed explanation. The
change looks good to me; however, I do not have a replicator to
verify it, so it would be good if Dumitru has one and could give it
a try.

Cheers,

Eelco

Acked-by: Eelco Chaudron <[email protected]>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to