On 2/1/22 12:38, Gaëtan Rivet wrote:
> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
>> On 1/31/22 11:42, Gaëtan Rivet wrote:
>>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>>>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>>>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>>>>> offloaded flow, if a mark has been assigned to the flow. But if
>>>>> this occurs in the window in which the offload thread completes
>>>>> offloading the flow and assigns a mark to the flow, then we miss
>>>>> deleting the flow. This problem has been observed while adding
>>>>> and deleting flows in a loop. To fix this, always enqueue flow
>>>>> deletion regardless of the flow->mark being set.
>>>>>
>>>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>>>>> Signed-off-by: Sriharsha Basavapatna <[email protected]>
>>>>> ---
>>>>>  lib/dpif-netdev.c | 4 +---
>>>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>> index e28e0b554..22c5f19a8 100644
>>>>> --- a/lib/dpif-netdev.c
>>>>> +++ b/lib/dpif-netdev.c
>>>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct 
>>>>> dp_netdev_pmd_thread *pmd,
>>>>>      dp_netdev_simple_match_remove(pmd, flow);
>>>>>      cmap_remove(&pmd->flow_table, node, 
>>>>> dp_netdev_flow_hash(&flow->ufid));
>>>>>      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>>>>> -    if (flow->mark != INVALID_FLOW_MARK) {
>>>>> -        queue_netdev_flow_del(pmd, flow);
>>>>> -    }
>>>>> +    queue_netdev_flow_del(pmd, flow);
>>>>>      flow->dead = true;
>>>>>  
>>>>>      dp_netdev_flow_unref(flow);
>>>>>
>>>>
>>>> Thanks for the patch, but it looks like that it's not that simple...
>>>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>>>> on flow disassociation if the dpif already destroyed:
>>>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>>>
>>>> I think that the problem is that offload thread holds the ref count
>>>> for PMD thread, but PMD thread structure doesn't hold the ref count
>>>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>>>> will be used while PMD thread is destroyed.  I guess, we should fix
>>>> that.  OTOH, I'm not sure if offload thread actually needs a reference
>>>> to dp_netdev_pmd structure.  If I didn't miss something, it only
>>>> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
>>>> should hold and ref the dp pointer instead?
>>>>
>>>> What do you think?  Gaetan?
>>>>
>>>
>>> Hi Ilya,
>>>
>>> I've been looking into this issue, you are right that the offload threads 
>>> don't
>>> actually need the pmd pointer, only the dp one.
>>>
>>> The problem I see with a proper reference taking on dp to protect against 
>>> this
>>> UAF, is that it is prohibitively slow to manage those dp references, and it 
>>> relies on
>>> a global lock.
>>>
>>> I have tried to find an alternative (the flush op could be tagged to mark 
>>> that
>>> it was issued during a dp_netdev_free), but I always end-up with a 
>>> race-condition,
>>> even if the window is reduced.
>> Hmm.  Is it with flow API enabled or disabled?
>> With flow api enabled we should have a flush operation that should
>> in theory wait for all the previous offloading operations to be
>> completed.  If it's disabled though, flow deletion will initialize
>> the offload thread and enqueue the operation while flush will not
>> actually be requested.
> 
> Yes, on your second point, with the API disabled the deletion path relied on
> the test flow->mark != INVALID to know whether a flow was offload or not, and 
> did not
> care then to check the API status.
> The API state should now be checked as well for the deletion path.
> 
>>
>> I missed before that we have flush on that path.  Maybe it's enough
>> to just add a !netdev_is_flow_api_enabled() check to the deletion path?
>> Or am I missing something else?
>>
> 
> After the flush, there will be remaining ops in the queue that would trigger 
> the UAF.

What are these ops?

> The api_enabled() check on del emission is not sufficient.
> During the flush, with a special tag telling that it is part of deleting the 
> whole
> datapath, the API could be marked disabled, and the api_enabled() could be
> checked again in the offload thread, before any dereference of dp.
> 
> I didn't consider it as the API was not meant to be disabled then possible 
> re-enabled
> after a datapath re-creation potentially. That could work, with some changes 
> on
> the api_enabled() flag. The current flag should hold the user intent as set 
> in the config,
> a second flag should hold the API liveness as set related to the dp_netdev 
> status.
> That second flag would be the one read during the checks, and written on open 
> and after
> a flush, before getting past the barrier. All offload threads should just 
> discard
> offload requests if this liveness flag is false.
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to