On 2/16/22 16:18, Gaëtan Rivet wrote:
> On Wed, Feb 16, 2022, at 14:15, Ilya Maximets wrote:
>> On 2/4/22 17:12, Gaetan Rivet wrote:
>>> A race condition has been identified during datapath destruction,
>>> along with the port offload flushes issued.
>>>
>>> This series addresses these race conditions, cleaning up the
>>> port deletion process.
>>>
>>> The last patch also cleans up the offload structure.
>>> It is not strictly necessary like the first two fixes,
>>> so I put it last. It can wait until after the code-freeze
>>> to be integrated.
>>>
>>> I tested for a few hours without ASAN enabled without seeing
>>> issues. ASAN has been executed as part of the github CI:
>>> https://github.com/grivet/ovs/actions/runs/1795624401
>>> It is however not too relevant, as no offloads are inserted during CI.
>>>
>>> The following patch was used to fix an unrelated CI issue:
>>> https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
>>>
>>> I also ran datapath creation + deletion loop with ASAN on an offload
>>> test setup, but the execution was excruciatingly slow and could
>>> not progress much. It reached datapath deletion without panicking
>>> and no crash was seen, even though I had to interrupt the test after
>>> a few hours.
>>>
>>> Gaetan Rivet (2):
>>>   dpif-netdev: Move port flush after datapath reconfiguration
>>>   dpif-netdev: Use dp_netdev reference in offload threads
>>>
>>> Sriharsha Basavapatna (1):
>>>   dpif-netdev: Fix a race condition in deletion of offloaded flows
>>>
>>>  lib/dpif-netdev.c | 71 ++++++++++++++++++++++++++---------------------
>>>  1 file changed, 40 insertions(+), 31 deletions(-)
>>
>>
>> Thanks!  I applied the series to master and 2.17.
>>
>> I was thinking about backports and I think that it should be OK to
>> backport patch #2 (always enqueue deletions), because it doesn't
>> introduce any new issues.
>>
>> But I'm not sure about the first patch.  On 2.17 flush acts like
>> a barrier, so no offload requests are in the offload queue after
>> the port removal from PMD thread and flush from the offload thread.
>> However, there is no such synchronization mechanism on 2.16.
>> Flush is performed from the main thread and some offload requests
>> could still be in the offload queue.  And if dp is destroyed,
>> processing of those offload requests will cause use-after-free.
>> In other words, while patch #1 can reduce the race window for flows
>> remaining in the hardware after the port deletion, it doesn't solve
>> the use-after-free problem.  So, we need some other solution.
>> Backporting the barrier machinery doesn't seem like a good option.
>>
>> Referencing the dp on creation of the PMD thread or on creation
>> of the offload item might be a solution. But this will change the
>> time dp will actually be destroyed, so needs a careful consideration.
>>
>> We may also not fix the UAF problem on 2.16 taking into account that
>> issue is happening only during deletion of the datapath, so should not
>> be very severe and 2.16 is not an LTS branch.
>>
>> Thoughts?
>>
>> Best regards, Ilya Maximets.
> 
> I agree that the first patch is only correct as long as flush serves as
> a synchronization point. Without the barrier, nothing ensures that no further
> offload reqs exists in the queue.
> 
> Managing dp reference and implementing an offload barrier are two solutions
> that have the potential to impact beyond the datapath destruction, potentially
> introducing more severe bugs.
> 
> Another potential solution might be offload request cancellation, but
> it is more complex than the barrier one, so even less doable on the branch.
> 
> I will keep trying to find another simple solution that could be used,
> but so far I agree that the safest thing would be to leave it as-is
> (or just reduce the race window with patch 1 & 2).

OK.  I did some tests on 2.16 and the code seems to work fine, so I
updated the comments and applied first 2 patches to branch-2.16.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to