On 11 May 2026, at 17:54, Ilya Maximets wrote:
> A simplified ofproto destruction flow looks like this:
>
> ofproto_destroy()
>
> ofproto_flush__()
> ofproto_class->flush(ofproto);
> udpif_flush()
> udpif_stop_threads(udpif, true)
> revalidator_purge()
> ...
> ovsrcu_postpone(ukey_delete__, ukey)
> udpif_start_threads()
>
> ofproto_class->destruct(p, del)
> xlate_txn_start();
> xlate_remove_ofproto(ofproto);
> xlate_txn_commit();
> ...
> recirc_free_ofproto()
> ...
> ovsrcu_barrier()
> close_dpif_backer(ofproto->backer, del);
>
> The recirc_free_ofproto() function is used to check if we leaked any
> recirculation IDs, but it doesn't really wait for everything to be
> completed and reports false positives. There are two races here in
> particular:
>
> 1. Flush frees recirculation IDs in an RCU-postponed ukey_delete__().
> There is one ovsrcu_synchronize() call between udpif_flush() and the
> recirc_free_ofproto() inside xlate_txn_commit(), but synchronization
> only waits for all threads to enter quiescent state, it doesn't wait
> for the postponed callbacks to be completed. So, when the
> recirc_free_ofproto() is called, some IDs can still be not freed.
>
> 2. udpif_flush() starts the threads back at the end, so they can still
> add new ukeys in the short window between the flush and the
> destruction. In this case some new IDs can be allocated after the
> flush and the leak check will find them.
>
> We can solve both races by moving the leak check down to the end of
> the ofproto_class->destruct(). We already have the ovsrcu_barrier()
> there that actually waits for all the RCU-postponed cleanups. And the
> close_dpif_backer() will take care of all the ukeys created after the
> flush in a non-postponed fashion.
>
> It's hard to pinpoint the exact commit to blame, but citing one that
> definitely played a role here.
>
> Fixes: e672ff9b4d22 ("ofproto-dpif: Restore metadata and registers on
> recirculation.")
> Signed-off-by: Ilya Maximets <[email protected]>
Thanks, Ilya, for the patch and the detailed explanation. The
change looks good to me; however, I do not have a replicator to
verify it, so it would be good if Dumitru has one and could give it
a try.
Cheers,
Eelco
Acked-by: Eelco Chaudron <[email protected]>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev