On 2/16/22 16:18, Gaëtan Rivet wrote: > On Wed, Feb 16, 2022, at 14:15, Ilya Maximets wrote: >> On 2/4/22 17:12, Gaetan Rivet wrote: >>> A race condition has been identified during datapath destruction, >>> along with the port offload flushes issued. >>> >>> This series addresses these race conditions, cleaning up the >>> port deletion process. >>> >>> The last patch also cleans up the offload structure. >>> It is not strictly necessary like the first two fixes, >>> so I put it last. It can wait until after the code-freeze >>> to be integrated. >>> >>> I tested for a few hours without ASAN enabled without seeing >>> issues. ASAN has been executed as part of the github CI: >>> https://github.com/grivet/ovs/actions/runs/1795624401 >>> It is however not too relevant, as no offloads are inserted during CI. >>> >>> The following patch was used to fix an unrelated CI issue: >>> https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ >>> >>> I also ran datapath creation + deletion loop with ASAN on an offload >>> test setup, but the execution was excruciatingly slow and could >>> not progress much. It reached datapath deletion without panicking >>> and no crash was seen, even though I had to interrupt the test after >>> a few hours. >>> >>> Gaetan Rivet (2): >>> dpif-netdev: Move port flush after datapath reconfiguration >>> dpif-netdev: Use dp_netdev reference in offload threads >>> >>> Sriharsha Basavapatna (1): >>> dpif-netdev: Fix a race condition in deletion of offloaded flows >>> >>> lib/dpif-netdev.c | 71 ++++++++++++++++++++++++++--------------------- >>> 1 file changed, 40 insertions(+), 31 deletions(-) >> >> >> Thanks! I applied the series to master and 2.17. >> >> I was thinking about backports and I think that it should be OK to >> backport patch #2 (always enqueue deletions), because it doesn't >> introduce any new issues. >> >> But I'm not sure about the first patch. On 2.17 flush acts like >> a barrier, so no offload requests are in the offload queue after >> the port removal from PMD thread and flush from the offload thread. >> However, there is no such synchronization mechanism on 2.16. >> Flush is performed from the main thread and some offload requests >> could still be in the offload queue. And if dp is destroyed, >> processing of those offload requests will cause use-after-free. >> In other words, while patch #1 can reduce the race window for flows >> remaining in the hardware after the port deletion, it doesn't solve >> the use-after-free problem. So, we need some other solution. >> Backporting the barrier machinery doesn't seem like a good option. >> >> Referencing the dp on creation of the PMD thread or on creation >> of the offload item might be a solution. But this will change the >> time dp will actually be destroyed, so needs a careful consideration. >> >> We may also not fix the UAF problem on 2.16 taking into account that >> issue is happening only during deletion of the datapath, so should not >> be very severe and 2.16 is not an LTS branch. >> >> Thoughts? >> >> Best regards, Ilya Maximets. > > I agree that the first patch is only correct as long as flush serves as > a synchronization point. Without the barrier, nothing ensures that no further > offload reqs exists in the queue. > > Managing dp reference and implementing an offload barrier are two solutions > that have the potential to impact beyond the datapath destruction, potentially > introducing more severe bugs. > > Another potential solution might be offload request cancellation, but > it is more complex than the barrier one, so even less doable on the branch. > > I will keep trying to find another simple solution that could be used, > but so far I agree that the safest thing would be to leave it as-is > (or just reduce the race window with patch 1 & 2).
OK. I did some tests on 2.16 and the code seems to work fine, so I updated the comments and applied first 2 patches to branch-2.16. Best regards, Ilya Maximets. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
