On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
> On 1/31/22 11:42, Gaëtan Rivet wrote:
>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>>>> offloaded flow, if a mark has been assigned to the flow. But if
>>>> this occurs in the window in which the offload thread completes
>>>> offloading the flow and assigns a mark to the flow, then we miss
>>>> deleting the flow. This problem has been observed while adding
>>>> and deleting flows in a loop. To fix this, always enqueue flow
>>>> deletion regardless of the flow->mark being set.
>>>>
>>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>>>> Signed-off-by: Sriharsha Basavapatna <[email protected]>
>>>> ---
>>>> lib/dpif-netdev.c | 4 +---
>>>> 1 file changed, 1 insertion(+), 3 deletions(-)
>>>>
>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>> index e28e0b554..22c5f19a8 100644
>>>> --- a/lib/dpif-netdev.c
>>>> +++ b/lib/dpif-netdev.c
>>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct
>>>> dp_netdev_pmd_thread *pmd,
>>>> dp_netdev_simple_match_remove(pmd, flow);
>>>> cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>>>> ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>>>> - if (flow->mark != INVALID_FLOW_MARK) {
>>>> - queue_netdev_flow_del(pmd, flow);
>>>> - }
>>>> + queue_netdev_flow_del(pmd, flow);
>>>> flow->dead = true;
>>>>
>>>> dp_netdev_flow_unref(flow);
>>>>
>>>
>>> Thanks for the patch, but it looks like that it's not that simple...
>>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>>> on flow disassociation if the dpif already destroyed:
>>> https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>>
>>> I think that the problem is that offload thread holds the ref count
>>> for PMD thread, but PMD thread structure doesn't hold the ref count
>>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>>> will be used while PMD thread is destroyed. I guess, we should fix
>>> that. OTOH, I'm not sure if offload thread actually needs a reference
>>> to dp_netdev_pmd structure. If I didn't miss something, it only
>>> uses pmd pointer to access pmd->dp. So, maybe, dp_offload_thread_item
>>> should hold and ref the dp pointer instead?
>>>
>>> What do you think? Gaetan?
>>>
>>
>> Hi Ilya,
>>
>> I've been looking into this issue, you are right that the offload threads
>> don't
>> actually need the pmd pointer, only the dp one.
>>
>> The problem I see with a proper reference taking on dp to protect against
>> this
>> UAF, is that it is prohibitively slow to manage those dp references, and it
>> relies on
>> a global lock.
>>
>> I have tried to find an alternative (the flush op could be tagged to mark
>> that
>> it was issued during a dp_netdev_free), but I always end-up with a
>> race-condition,
>> even if the window is reduced.
> Hmm. Is it with flow API enabled or disabled?
> With flow api enabled we should have a flush operation that should
> in theory wait for all the previous offloading operations to be
> completed. If it's disabled though, flow deletion will initialize
> the offload thread and enqueue the operation while flush will not
> actually be requested.
Yes, on your second point, with the API disabled the deletion path relied on
the test flow->mark != INVALID to know whether a flow was offload or not, and
did not
care then to check the API status.
The API state should now be checked as well for the deletion path.
>
> I missed before that we have flush on that path. Maybe it's enough
> to just add a !netdev_is_flow_api_enabled() check to the deletion path?
> Or am I missing something else?
>
After the flush, there will be remaining ops in the queue that would trigger
the UAF.
The api_enabled() check on del emission is not sufficient.
During the flush, with a special tag telling that it is part of deleting the
whole
datapath, the API could be marked disabled, and the api_enabled() could be
checked again in the offload thread, before any dereference of dp.
I didn't consider it as the API was not meant to be disabled then possible
re-enabled
after a datapath re-creation potentially. That could work, with some changes on
the api_enabled() flag. The current flag should hold the user intent as set in
the config,
a second flag should hold the API liveness as set related to the dp_netdev
status.
That second flag would be the one read during the checks, and written on open
and after
a flush, before getting past the barrier. All offload threads should just
discard
offload requests if this liveness flag is false.
--
Gaetan Rivet
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev