On 2/1/22 14:38, Gaëtan Rivet wrote:
> On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote:
>> On 2/1/22 12:38, Gaëtan Rivet wrote:
>>> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
>>>> On 1/31/22 11:42, Gaëtan Rivet wrote:
>>>>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>>>>>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>>>>>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>>>>>>> offloaded flow, if a mark has been assigned to the flow. But if
>>>>>>> this occurs in the window in which the offload thread completes
>>>>>>> offloading the flow and assigns a mark to the flow, then we miss
>>>>>>> deleting the flow. This problem has been observed while adding
>>>>>>> and deleting flows in a loop. To fix this, always enqueue flow
>>>>>>> deletion regardless of the flow->mark being set.
>>>>>>>
>>>>>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>>>>>>> Signed-off-by: Sriharsha Basavapatna 
>>>>>>> <[email protected]>
>>>>>>> ---
>>>>>>>  lib/dpif-netdev.c | 4 +---
>>>>>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>>> index e28e0b554..22c5f19a8 100644
>>>>>>> --- a/lib/dpif-netdev.c
>>>>>>> +++ b/lib/dpif-netdev.c
>>>>>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct 
>>>>>>> dp_netdev_pmd_thread *pmd,
>>>>>>>      dp_netdev_simple_match_remove(pmd, flow);
>>>>>>>      cmap_remove(&pmd->flow_table, node, 
>>>>>>> dp_netdev_flow_hash(&flow->ufid));
>>>>>>>      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>>>>>>> -    if (flow->mark != INVALID_FLOW_MARK) {
>>>>>>> -        queue_netdev_flow_del(pmd, flow);
>>>>>>> -    }
>>>>>>> +    queue_netdev_flow_del(pmd, flow);
>>>>>>>      flow->dead = true;
>>>>>>>  
>>>>>>>      dp_netdev_flow_unref(flow);
>>>>>>>
>>>>>>
>>>>>> Thanks for the patch, but it looks like that it's not that simple...
>>>>>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>>>>>> on flow disassociation if the dpif already destroyed:
>>>>>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>>>>>
>>>>>> I think that the problem is that offload thread holds the ref count
>>>>>> for PMD thread, but PMD thread structure doesn't hold the ref count
>>>>>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>>>>>> will be used while PMD thread is destroyed.  I guess, we should fix
>>>>>> that.  OTOH, I'm not sure if offload thread actually needs a reference
>>>>>> to dp_netdev_pmd structure.  If I didn't miss something, it only
>>>>>> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
>>>>>> should hold and ref the dp pointer instead?
>>>>>>
>>>>>> What do you think?  Gaetan?
>>>>>>
>>>>>
>>>>> Hi Ilya,
>>>>>
>>>>> I've been looking into this issue, you are right that the offload threads 
>>>>> don't
>>>>> actually need the pmd pointer, only the dp one.
>>>>>
>>>>> The problem I see with a proper reference taking on dp to protect against 
>>>>> this
>>>>> UAF, is that it is prohibitively slow to manage those dp references, and 
>>>>> it relies on
>>>>> a global lock.
>>>>>
>>>>> I have tried to find an alternative (the flush op could be tagged to mark 
>>>>> that
>>>>> it was issued during a dp_netdev_free), but I always end-up with a 
>>>>> race-condition,
>>>>> even if the window is reduced.
>>>> Hmm.  Is it with flow API enabled or disabled?
>>>> With flow api enabled we should have a flush operation that should
>>>> in theory wait for all the previous offloading operations to be
>>>> completed.  If it's disabled though, flow deletion will initialize
>>>> the offload thread and enqueue the operation while flush will not
>>>> actually be requested.
>>>
>>> Yes, on your second point, with the API disabled the deletion path relied on
>>> the test flow->mark != INVALID to know whether a flow was offload or not, 
>>> and did not
>>> care then to check the API status.
>>> The API state should now be checked as well for the deletion path.
>>>
>>>>
>>>> I missed before that we have flush on that path.  Maybe it's enough
>>>> to just add a !netdev_is_flow_api_enabled() check to the deletion path?
>>>> Or am I missing something else?
>>>>
>>>
>>> After the flush, there will be remaining ops in the queue that would 
>>> trigger the UAF.
>>
>> What are these ops?
>>
> 
> Any add/mod/del can be found there.
> 
> Flushes are scheduled by the main thread in response to user commands.
> While it executes, PMDs et revalidators are running, potentially enqueuing
> offload ops. The only restriction on those ops is that it's impossible to 
> have multiple
> flush in the queue: only one can exist at a time.
> 

I think, revalidators should already be stopped at the point of datapath
deletion.  Don't they?

For the PMDs, I see that they are still running and that sounds like a
bug, because we're still offloading things after the flush.  So, some
of the flows are still in hardware after the port removal / some userspace
structures still exist for flows of the deleted datapath.

Can we do something like this (just an untested concept):

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e28e0b554..4b51ac652 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2313,13 +2313,14 @@ static void
 do_del_port(struct dp_netdev *dp, struct dp_netdev_port *port)
     OVS_REQ_WRLOCK(dp->port_rwlock)
 {
-    dp_netdev_offload_flush(dp, port);
-    netdev_uninit_flow_api(port->netdev);
     hmap_remove(&dp->ports, &port->node);
     seq_change(dp->port_seq);
 
     reconfigure_datapath(dp);
 
+    dp_netdev_offload_flush(dp, port);
+    netdev_uninit_flow_api(port->netdev);
+
     port_destroy(port);
 }
 
---
?

In other words, we should flush flows after removing the port from
a PMD thread.  This way, after removing the last port of a datapath,
there should be no more offloading operations enqueued after the
last flush.  Of course, assuming that revalidators are stopped at
this point, which is what I hope for.
Does that make sense?

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to