> On 30 Oct 2024, at 1:37 AM, Numan Siddique <[email protected]> wrote:
> 
> !-------------------------------------------------------------------|
>  CAUTION: External Email
> 
> |-------------------------------------------------------------------!
> 
> On Wed, Oct 23, 2024 at 11:14 PM Numan Siddique <[email protected]> wrote:
>> 
>> On Mon, Oct 21, 2024 at 12:01 PM Ales Musil <[email protected]> wrote:
>>> 
>>> On Mon, Oct 21, 2024 at 5:21 PM Naveen Yerramneni <
>>> [email protected]> wrote:
>>> 
>>>> Issue:
>>>> =====
>>>> OVN controller main() is caching the "sbrec_mac_binding/sbrec_fdb"
>>>> pointers in "mac_cache_mac_binding->mac_bindings/fdbs" which are
>>>> used to update timestamp in mac_cache_mb_stats_run()/
>>>> mac_cache_fdb_stats_run(). There is a condition where cached
>>>> "sbrec_mac_binding/sbrec_fdb" pointers gets freed without
>>>> removing the refs from "mac_cache_mac_binding->mac_bindings/fdbs"
>>>> which is leading to crash due to "use after free".
>>>> 
>>>> This condition can occur due to following reason.
>>>>  - en_runtime_data is input to en_mac_cache.
>>>>  - en_runtime_data went for full recompute but engine_run()
>>>>    is called with recompute_allowed as False due to
>>>>    ofctrl_has_backlog() returned True. This caused to avoid
>>>>    the recompute in the current engine run and engine_run_aborted
>>>>    is set to True. If recompute_allowed is False and
>>>>    engine_run_aborted is True then, inc processing engine is
>>>>    skipped.
>>>>  - If there are any mac_binding/fdb deletes in the same run
>>>>    then, entries gets deleted during ovsdb_idl_loop_run() and
>>>>    rows are freed in ovsdb_idl_track_clear() but references
>>>>    to them are present in mac_cache_data->mac_bindings/fdbs.
>>>> 
>>>> Fix:
>>>> ===
>>>> Avoid statctrl_run() when engine recompute is not allowed.
>>>> 
>>>> Signed-off-by: Naveen Yerramneni <[email protected]>
>> 
>> Thanks for the fix.
>> 
>> However this doesn't seem to be the right fix.
>> 
>> The actual issue is because "mac_cache_data" is initialized before the
>> start of the main loop
>> using "engine_get_internal_data(&en_mac_cache)".
>> 
>> Later this data is used inside the main loop when calling
>> "statctrl_run(mac_cache_data).
>> 
>> The issue will be fixed if mac_cache_data is retrieved from the
>> en_mac_cache inside
>> the while loop  as  ->  mac_cache_data = engine_get_data(&en_mac_cache);
>> 
>> In the scenario you mentioned above when SB mac binding rows are
>> deleted and the engine run was
>> aborted,  the engine node "en_mac_cache" state will be  "Stale" and
>> the function engine_get_data() will return NULL,
>> thereby statctrl_run() will not be called.
>> 
>> Can you please fix it this way ?
> 
> I went ahead and applied this patch to main and backported till 23.09
> with the below changes
> on top of this patch.
> 

Hi Numan,

Sorry for the delay.
Thanks for the fix, change looks good to me.

Thanks,
Naveen



> 
> ---------------------------
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index d821d11529..e87274121c 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -5752,9 +5752,8 @@ main(int argc, char *argv[])
>                         }
>                     }
> 
> -                    if (mac_cache_data && recompute_allowed) {
> -                        /* Run only when engine recompute is allowed
> -                         * since mac_binding/FDB rows are cached */
> +                    mac_cache_data = engine_get_data(&en_mac_cache);
> +                    if (mac_cache_data) {
>                         statctrl_run(ovnsb_idl_txn, mac_cache_data);
>                     }
> ----------------------------------------
>> 
>> Thanks
>> Numan
>> 
>> 
>> 
>>>> ---
>>>> v2:
>>>>  - Addressed review comments from Ales.
>>>>  - Updated the description.
>>>> ---
>>>> controller/ovn-controller.c | 59 +++++++++++++++++++------------------
>>>> 1 file changed, 30 insertions(+), 29 deletions(-)
>>>> 
>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>> index c48667887..88430a111 100644
>>>> --- a/controller/ovn-controller.c
>>>> +++ b/controller/ovn-controller.c
>>>> @@ -5536,34 +5536,33 @@ main(int argc, char *argv[])
>>>> 
>>>>                     stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
>>>>                                     time_msec());
>>>> -                    if (ovnsb_idl_txn) {
>>>> -                        if (ofctrl_has_backlog()) {
>>>> -                            /* When there are in-flight messages pending
>>>> to
>>>> -                             * ovs-vswitchd, we should hold on
>>>> recomputing so
>>>> -                             * that the previous flow installations won't
>>>> be
>>>> -                             * delayed.  However, we still want to try if
>>>> -                             * recompute is not needed and we can quickly
>>>> -                             * incrementally process the new changes, to
>>>> avoid
>>>> -                             * unnecessarily forced recomputes later on.
>>>> This
>>>> -                             * is because the OVSDB change tracker cannot
>>>> -                             * preserve tracked changes across
>>>> iterations.  If
>>>> -                             * change tracking is improved, we can simply
>>>> skip
>>>> -                             * this round of engine_run and continue
>>>> processing
>>>> -                             * acculated changes incrementally later when
>>>> -                             * ofctrl_has_backlog() returns false. */
>>>> -                            engine_run(false);
>>>> -                        } else {
>>>> -                            engine_run(true);
>>>> -                        }
>>>> -                    } else {
>>>> -                        /* Even if there's no SB DB transaction available,
>>>> -                         * try to run the engine so that we can handle any
>>>> -                         * incremental changes that don't require a
>>>> recompute.
>>>> -                         * If a recompute is required, the engine will
>>>> cancel,
>>>> -                         * triggerring a full run in the next iteration.
>>>> -                         */
>>>> -                        engine_run(false);
>>>> -                    }
>>>> +
>>>> +                    /* Recompute is not allowed in following cases: */
>>>> +                    /* 1. No ovnsb_idl_txn  */
>>>> +                    /* Even if there's no SB DB transaction available,
>>>> +                    * try to run the engine so that we can handle any
>>>> +                    * incremental changes that don't require a recompute.
>>>> +                    * If a recompute is required, the engine will cancel,
>>>> +                    * triggerring a full run in the next iteration.
>>>> +                    */
>>>> +                    /* 2. ofctrl_has_backlog */
>>>> +                    /* When there are in-flight messages pending to
>>>> +                     * ovs-vswitchd, we should hold on recomputing so
>>>> +                     * that the previous flow installations won't be
>>>> +                     * delayed.  However, we still want to try if
>>>> +                     * recompute is not needed and we can quickly
>>>> +                     * incrementally process the new changes, to avoid
>>>> +                     * unnecessarily forced recomputes later on.  This
>>>> +                     * is because the OVSDB change tracker cannot
>>>> +                     * preserve tracked changes across iterations.  If
>>>> +                     * change tracking is improved, we can simply skip
>>>> +                     * this round of engine_run and continue processing
>>>> +                     * acculated changes incrementally later when
>>>> +                     * ofctrl_has_backlog() returns false. */
>>>> +
>>>> +                    bool recompute_allowed = (ovnsb_idl_txn &&
>>>> +                                              !ofctrl_has_backlog());
>>>> +                    engine_run(recompute_allowed);
>>>>                     stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
>>>>                                    time_msec());
>>>>                     if (engine_has_updated()) {
>>>> @@ -5685,7 +5684,9 @@ main(int argc, char *argv[])
>>>>                         }
>>>>                     }
>>>> 
>>>> -                    if (mac_cache_data) {
>>>> +                    if (mac_cache_data && recompute_allowed) {
>>>> +                        /* Run only when engine recompute is allowed
>>>> +                         * since mac_binding/FDB rows are cached */
>>>>                         statctrl_run(ovnsb_idl_txn, mac_cache_data);
>>>>                     }
>>>> 
>>>> --
>>>> 2.36.6
>>>> 
>>>> _______________________________________________
>>>> dev mailing list
>>>> [email protected]
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=CkZytTZ7UtWOMHjuINFR08uWFgn88IkIh4-vZVGTag9-o1c7PyOdn1J87Fv6qogB&s=YSwteyImhl2dRMkuk-LjVkD5jEE_ItEoLVyJxlKLGHQ&e=
>>>> 
>>>> 
>>> Looks good to me, thanks.
>>> 
>>> Acked-by: Ales Musil <[email protected]>
>>> 
>>> --
>>> 
>>> Ales Musil
>>> 
>>> Senior Software Engineer - OVN Core
>>> 
>>> Red Hat EMEA 
>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.redhat.com&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=CkZytTZ7UtWOMHjuINFR08uWFgn88IkIh4-vZVGTag9-o1c7PyOdn1J87Fv6qogB&s=u9UtLVzjeh1Z9tUqDp71Gsapcm69xcXPzLvrKWNBHAc&e=
>>>  >
>>> 
>>> [email protected]
>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__red.ht_sig&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=CkZytTZ7UtWOMHjuINFR08uWFgn88IkIh4-vZVGTag9-o1c7PyOdn1J87Fv6qogB&s=S7KxWAv57N1RXgYcdfdHe8wb__Hl6qMARlXaxvwwpDk&e=
>>>  >
>>> _______________________________________________
>>> dev mailing list
>>> [email protected]
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=2PQjSDR7A28z1kXE1ptSm6X36oL_nCq1XxeEt7FkLmA&m=CkZytTZ7UtWOMHjuINFR08uWFgn88IkIh4-vZVGTag9-o1c7PyOdn1J87Fv6qogB&s=YSwteyImhl2dRMkuk-LjVkD5jEE_ItEoLVyJxlKLGHQ&e=


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to