On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote:
> On 2/1/22 12:38, Gaëtan Rivet wrote:
>> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
>>> On 1/31/22 11:42, Gaëtan Rivet wrote:
>>>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>>>>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>>>>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>>>>>> offloaded flow, if a mark has been assigned to the flow. But if
>>>>>> this occurs in the window in which the offload thread completes
>>>>>> offloading the flow and assigns a mark to the flow, then we miss
>>>>>> deleting the flow. This problem has been observed while adding
>>>>>> and deleting flows in a loop. To fix this, always enqueue flow
>>>>>> deletion regardless of the flow->mark being set.
>>>>>>
>>>>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>>>>>> Signed-off-by: Sriharsha Basavapatna <[email protected]>
>>>>>> ---
>>>>>> lib/dpif-netdev.c | 4 +---
>>>>>> 1 file changed, 1 insertion(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>> index e28e0b554..22c5f19a8 100644
>>>>>> --- a/lib/dpif-netdev.c
>>>>>> +++ b/lib/dpif-netdev.c
>>>>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct
>>>>>> dp_netdev_pmd_thread *pmd,
>>>>>> dp_netdev_simple_match_remove(pmd, flow);
>>>>>> cmap_remove(&pmd->flow_table, node,
>>>>>> dp_netdev_flow_hash(&flow->ufid));
>>>>>> ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>>>>>> - if (flow->mark != INVALID_FLOW_MARK) {
>>>>>> - queue_netdev_flow_del(pmd, flow);
>>>>>> - }
>>>>>> + queue_netdev_flow_del(pmd, flow);
>>>>>> flow->dead = true;
>>>>>>
>>>>>> dp_netdev_flow_unref(flow);
>>>>>>
>>>>>
>>>>> Thanks for the patch, but it looks like that it's not that simple...
>>>>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>>>>> on flow disassociation if the dpif already destroyed:
>>>>> https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>>>>
>>>>> I think that the problem is that offload thread holds the ref count
>>>>> for PMD thread, but PMD thread structure doesn't hold the ref count
>>>>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>>>>> will be used while PMD thread is destroyed. I guess, we should fix
>>>>> that. OTOH, I'm not sure if offload thread actually needs a reference
>>>>> to dp_netdev_pmd structure. If I didn't miss something, it only
>>>>> uses pmd pointer to access pmd->dp. So, maybe, dp_offload_thread_item
>>>>> should hold and ref the dp pointer instead?
>>>>>
>>>>> What do you think? Gaetan?
>>>>>
>>>>
>>>> Hi Ilya,
>>>>
>>>> I've been looking into this issue, you are right that the offload threads
>>>> don't
>>>> actually need the pmd pointer, only the dp one.
>>>>
>>>> The problem I see with a proper reference taking on dp to protect against
>>>> this
>>>> UAF, is that it is prohibitively slow to manage those dp references, and
>>>> it relies on
>>>> a global lock.
>>>>
>>>> I have tried to find an alternative (the flush op could be tagged to mark
>>>> that
>>>> it was issued during a dp_netdev_free), but I always end-up with a
>>>> race-condition,
>>>> even if the window is reduced.
>>> Hmm. Is it with flow API enabled or disabled?
>>> With flow api enabled we should have a flush operation that should
>>> in theory wait for all the previous offloading operations to be
>>> completed. If it's disabled though, flow deletion will initialize
>>> the offload thread and enqueue the operation while flush will not
>>> actually be requested.
>>
>> Yes, on your second point, with the API disabled the deletion path relied on
>> the test flow->mark != INVALID to know whether a flow was offload or not,
>> and did not
>> care then to check the API status.
>> The API state should now be checked as well for the deletion path.
>>
>>>
>>> I missed before that we have flush on that path. Maybe it's enough
>>> to just add a !netdev_is_flow_api_enabled() check to the deletion path?
>>> Or am I missing something else?
>>>
>>
>> After the flush, there will be remaining ops in the queue that would trigger
>> the UAF.
>
> What are these ops?
>
Any add/mod/del can be found there.
Flushes are scheduled by the main thread in response to user commands.
While it executes, PMDs et revalidators are running, potentially enqueuing
offload ops. The only restriction on those ops is that it's impossible to have
multiple
flush in the queue: only one can exist at a time.
--
Gaetan Rivet
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev