On Wed, Oct 23, 2024 at 11:14 PM Numan Siddique <[email protected]> wrote:
>
> On Mon, Oct 21, 2024 at 12:01 PM Ales Musil <[email protected]> wrote:
> >
> > On Mon, Oct 21, 2024 at 5:21 PM Naveen Yerramneni <
> > [email protected]> wrote:
> >
> > > Issue:
> > > =====
> > > OVN controller main() is caching the "sbrec_mac_binding/sbrec_fdb"
> > > pointers in "mac_cache_mac_binding->mac_bindings/fdbs" which are
> > > used to update timestamp in mac_cache_mb_stats_run()/
> > > mac_cache_fdb_stats_run(). There is a condition where cached
> > > "sbrec_mac_binding/sbrec_fdb" pointers gets freed without
> > > removing the refs from "mac_cache_mac_binding->mac_bindings/fdbs"
> > > which is leading to crash due to "use after free".
> > >
> > > This condition can occur due to following reason.
> > >   - en_runtime_data is input to en_mac_cache.
> > >   - en_runtime_data went for full recompute but engine_run()
> > >     is called with recompute_allowed as False due to
> > >     ofctrl_has_backlog() returned True. This caused to avoid
> > >     the recompute in the current engine run and engine_run_aborted
> > >     is set to True. If recompute_allowed is False and
> > >     engine_run_aborted is True then, inc processing engine is
> > >     skipped.
> > >   - If there are any mac_binding/fdb deletes in the same run
> > >     then, entries gets deleted during ovsdb_idl_loop_run() and
> > >     rows are freed in ovsdb_idl_track_clear() but references
> > >     to them are present in mac_cache_data->mac_bindings/fdbs.
> > >
> > > Fix:
> > > ===
> > > Avoid statctrl_run() when engine recompute is not allowed.
> > >
> > > Signed-off-by: Naveen Yerramneni <[email protected]>
>
> Thanks for the fix.
>
> However this doesn't seem to be the right fix.
>
> The actual issue is because "mac_cache_data" is initialized before the
> start of the main loop
> using "engine_get_internal_data(&en_mac_cache)".
>
> Later this data is used inside the main loop when calling
> "statctrl_run(mac_cache_data).
>
> The issue will be fixed if mac_cache_data is retrieved from the
> en_mac_cache inside
> the while loop  as  ->  mac_cache_data = engine_get_data(&en_mac_cache);
>
> In the scenario you mentioned above when SB mac binding rows are
> deleted and the engine run was
> aborted,  the engine node "en_mac_cache" state will be  "Stale" and
> the function engine_get_data() will return NULL,
> thereby statctrl_run() will not be called.
>
> Can you please fix it this way ?

I went ahead and applied this patch to main and backported till 23.09
with the below changes
on top of this patch.


---------------------------
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index d821d11529..e87274121c 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -5752,9 +5752,8 @@ main(int argc, char *argv[])
                         }
                     }

-                    if (mac_cache_data && recompute_allowed) {
-                        /* Run only when engine recompute is allowed
-                         * since mac_binding/FDB rows are cached */
+                    mac_cache_data = engine_get_data(&en_mac_cache);
+                    if (mac_cache_data) {
                         statctrl_run(ovnsb_idl_txn, mac_cache_data);
                     }
----------------------------------------
>
> Thanks
> Numan
>
>
>
> > > ---
> > > v2:
> > >   - Addressed review comments from Ales.
> > >   - Updated the description.
> > > ---
> > >  controller/ovn-controller.c | 59 +++++++++++++++++++------------------
> > >  1 file changed, 30 insertions(+), 29 deletions(-)
> > >
> > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > > index c48667887..88430a111 100644
> > > --- a/controller/ovn-controller.c
> > > +++ b/controller/ovn-controller.c
> > > @@ -5536,34 +5536,33 @@ main(int argc, char *argv[])
> > >
> > >                      stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
> > >                                      time_msec());
> > > -                    if (ovnsb_idl_txn) {
> > > -                        if (ofctrl_has_backlog()) {
> > > -                            /* When there are in-flight messages pending
> > > to
> > > -                             * ovs-vswitchd, we should hold on
> > > recomputing so
> > > -                             * that the previous flow installations won't
> > > be
> > > -                             * delayed.  However, we still want to try if
> > > -                             * recompute is not needed and we can quickly
> > > -                             * incrementally process the new changes, to
> > > avoid
> > > -                             * unnecessarily forced recomputes later on.
> > > This
> > > -                             * is because the OVSDB change tracker cannot
> > > -                             * preserve tracked changes across
> > > iterations.  If
> > > -                             * change tracking is improved, we can simply
> > > skip
> > > -                             * this round of engine_run and continue
> > > processing
> > > -                             * acculated changes incrementally later when
> > > -                             * ofctrl_has_backlog() returns false. */
> > > -                            engine_run(false);
> > > -                        } else {
> > > -                            engine_run(true);
> > > -                        }
> > > -                    } else {
> > > -                        /* Even if there's no SB DB transaction 
> > > available,
> > > -                         * try to run the engine so that we can handle 
> > > any
> > > -                         * incremental changes that don't require a
> > > recompute.
> > > -                         * If a recompute is required, the engine will
> > > cancel,
> > > -                         * triggerring a full run in the next iteration.
> > > -                         */
> > > -                        engine_run(false);
> > > -                    }
> > > +
> > > +                    /* Recompute is not allowed in following cases: */
> > > +                    /* 1. No ovnsb_idl_txn  */
> > > +                    /* Even if there's no SB DB transaction available,
> > > +                    * try to run the engine so that we can handle any
> > > +                    * incremental changes that don't require a recompute.
> > > +                    * If a recompute is required, the engine will cancel,
> > > +                    * triggerring a full run in the next iteration.
> > > +                    */
> > > +                    /* 2. ofctrl_has_backlog */
> > > +                    /* When there are in-flight messages pending to
> > > +                     * ovs-vswitchd, we should hold on recomputing so
> > > +                     * that the previous flow installations won't be
> > > +                     * delayed.  However, we still want to try if
> > > +                     * recompute is not needed and we can quickly
> > > +                     * incrementally process the new changes, to avoid
> > > +                     * unnecessarily forced recomputes later on.  This
> > > +                     * is because the OVSDB change tracker cannot
> > > +                     * preserve tracked changes across iterations.  If
> > > +                     * change tracking is improved, we can simply skip
> > > +                     * this round of engine_run and continue processing
> > > +                     * acculated changes incrementally later when
> > > +                     * ofctrl_has_backlog() returns false. */
> > > +
> > > +                    bool recompute_allowed = (ovnsb_idl_txn &&
> > > +                                              !ofctrl_has_backlog());
> > > +                    engine_run(recompute_allowed);
> > >                      stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
> > >                                     time_msec());
> > >                      if (engine_has_updated()) {
> > > @@ -5685,7 +5684,9 @@ main(int argc, char *argv[])
> > >                          }
> > >                      }
> > >
> > > -                    if (mac_cache_data) {
> > > +                    if (mac_cache_data && recompute_allowed) {
> > > +                        /* Run only when engine recompute is allowed
> > > +                         * since mac_binding/FDB rows are cached */
> > >                          statctrl_run(ovnsb_idl_txn, mac_cache_data);
> > >                      }
> > >
> > > --
> > > 2.36.6
> > >
> > > _______________________________________________
> > > dev mailing list
> > > [email protected]
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> > >
> > Looks good to me, thanks.
> >
> > Acked-by: Ales Musil <[email protected]>
> >
> > --
> >
> > Ales Musil
> >
> > Senior Software Engineer - OVN Core
> >
> > Red Hat EMEA <https://www.redhat.com>
> >
> > [email protected]
> > <https://red.ht/sig>
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to