On Wed, Apr 28, 2021 at 3:10 PM Han Zhou <[email protected]> wrote:
>
> On Wed, Apr 28, 2021 at 3:34 AM Numan Siddique <[email protected]> wrote:
> >
> > On Wed, Apr 28, 2021 at 3:31 AM Han Zhou <[email protected]> wrote:
> > >
> > > Thanks Numan for working on this issue!
> > >
> > > On Mon, Apr 26, 2021 at 3:53 AM <[email protected]> wrote:
> > > >
> > > > From: Numan Siddique <[email protected]>
> > > >
> > > > Presently we do ovs_assert(del_f->installed_flows) in
> > > > merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are equal
> > > > and if 'del_f' is not installed.  But there are a couple of scenarios
> > > > this can happen:
> > > >
> > > > 1. A flow 'F1' was added to the desired flows, but ofctrl_put()
> couldn't
> > > > install the flow (because of the rconn queue is full) and an SB update
> > > > caused 'F1' to be removed and re-added as 'F2'.
> > > >
> > >
> > > This seems not possible, because del_f->installed_flow is set whenever a
> > > "installed_flow" is added to the installed_flows table, regardless of
> > > whether it is sent to OVS or not.
> >
> > This can happen if ofctrl_can_put() returns false in which case
> > ovn-controller will
> > not call ofctrl_put() at all. There is a theoretical  possibility here
> for sure.
> >
> > Let me know if you think otherwise.
> >
>
> Oh, sorry that I misunderstood. You are right that ofctrl_put() may return
> immediately without installing the desired flow F1, but if that happens,
> F1->installed_flow should be NULL, and then it is the same case as in
> scenario 2) below: when F1 is deleted, it would be destroyed in
> track_flow_del(). So we should never end up with a tracked deleted flow
> with installed_flow being NULL, right? The logic is, in theory, when a
> "desired but not yet installed" flow is being deleted, we should simply
> destroy it and remove it from tracking. Maybe we should check if there are
> other possibilities.

You're right.  F1 should be destroyed in track_flow_del().

I got access to the core dump and I can see below details of "f" and "del_f".

Please let me know if this rings any bell

-----
(gdb) frame 6
#6  0x00005569459bfbaa in merge_tracked_flows
(flow_table=0x5569470c5b40) at
/usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283
283             hmap_expand_at(hmap, where);
(gdb) print del_f
$1 = (struct desired_flow *) 0x556948c670b0
(gdb) print *del_f
$2 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow =
0x55694909a540, mask = 0x55694909a560}, flows = {0x55694909a540,
0x55694909a560}}, tun_md = 0x0}, hash = 2338012780, ofpacts =
0x556948d96460, ofpacts_len = 432, cookie = 1418516517},
  match_hmap_node = {hash = 2338012780, next = 0x0}, list_node = {prev
= 0x556948c67100, next = 0x556948c67100}, references = {prev =
0x556948c67110, next = 0x556948c67110}, installed_flow = 0x0,
installed_ref_list_node = {prev = 0x5569495bcd90,
    next = 0x5569495bcd90}, track_list_node = {prev = 0x5569470c5b88,
next = 0x556948d2bfc8}, is_deleted = true}


(gdb) print f
$7 = (struct desired_flow *) 0x556948d2bf40

(gdb) print *f
$6 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow =
0x556948eaf1a0, mask = 0x556948eaf1c0}, flows = {0x556948eaf1a0,
0x556948eaf1c0}}, tun_md = 0x0}, hash = 2338012780, ofpacts =
0x5569472a15e0, ofpacts_len = 432, cookie = 1418516517},
  match_hmap_node = {hash = 2338012780, next = 0x556948da8070},
list_node = {prev = 0x556948d2bf90, next = 0x556948d2bf90}, references
= {prev = 0x556947b35980, next = 0x556947b35980}, installed_flow =
0x0, installed_ref_list_node = {prev = 0x556948d2bfb8,
    next = 0x556948d2bfb8}, track_list_node = {prev = 0x556948c67138,
next = 0x556948e68658}, is_deleted = false}

------

Looks like there are some issues with physical flow handling.  Seems
to me an installed flow F1 is deleted and in the same loop, F2 is
added
to the tracked flow which has the same match, actions and cookie.

Table id is 37 which is OFTABLE_REMOTE_OUTPUT.  The SB resource can be
either port binding or multicast group.

Thanks for the comments so far. I will continue debugging.  This patch
can be dropped.

Numan

>
> > We observed this assertion during the upgrades of openshift to an ovn
> version
> > and there were a lot of "unreasonable timeout" warnings.
> >
> > The updated OVN version had the commit [1] which led me to think that
> > commit [1] can also cause this.
> >
> > >
> > > > 2. In a single engine run, a flow 'F1' was added to the desired flow,
> > > > removed from the desired flow and re-added as 'F2'.  This could
> > > > happen after the commit [1].
> > >
> > > In theory this should not happen either, because if F1 was added and
> then
> > > removed before it was installed, it would have been removed in
> > > track_flow_del():
> > >
> > >         if (!f->installed_flow) {
> > >             /* If it is not installed yet, simply destroy it. */
> > >             desired_flow_destroy(f);
> > >             return;
> > >         }
> > >
> > > In fact this is what the comment is supposed to say, but the comment
> had a
> > > typo (my bad):
> > >
> > >             /* del_f must have been installed, otherwise it should have
> been
> > >             * removed during track_flow_add_or_modify. */
> > >
> > > s/track_flow_add_or_modify/track_flow_del
> >
> > This typo really confused me :). From your comments the scenario 2 seems
> > not possible.
> >
> > But I still think scenario 1 can happen if ofctrl_can_put() returns FALSE.
> > What do you think ?
> >
> > Thanks
> > Numan
> >
> > >
> > > In practice, if commit [1] did trigger this bt, then I'd look deeper
> into
> > > the logic, but I think removing the assert may just hide the problem.
> > >
> > > >
> > > > The likelihood of (2) seems to be higher with datapath groups enabled.
> > > >
> > > > The assertion - ovs_assert(del_f->installed_flows) is observed in
> > > > few deployments after the commit [1].  This patch addressed this issue
> > > > by removing that assertion.  Removing this assertion should not have
> > > > any side effects as the flow 'F2' will be installed.
> > > >
> > > > This patch is proposed based on the code inspection and the
> possibility
> > > > that the above mentioned scenarios can happen.  The patch doesn't have
> > > > any test cases to reproduce these scenarios.
> > > >
> > > > [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood removing
> flows
> > > with conj actions.")
> > > >
> > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502
> > > > Signed-off-by: Numan Siddique <[email protected]>
> > > > ---
> > > >  controller/ofctrl.c | 26 ++++++++++++++++++++++----
> > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > > > index c29c3d180..9e960b034 100644
> > > > --- a/controller/ofctrl.c
> > > > +++ b/controller/ofctrl.c
> > > > @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct
> ovn_desired_flow_table
> > > *flow_table)
> > > >                  continue;
> > > >              }
> > > >
> > > > -            /* del_f must have been installed, otherwise it should
> have
> > > been
> > > > -             * removed during track_flow_add_or_modify. */
> > > > -            ovs_assert(del_f->installed_flow);
> > > > +            if (!del_f->installed_flow) {
> > > > +                /* del_f must have been installed, otherwise it
> should
> > > have
> > > > +                 * been removed during track_flow_add_or_modify.
> > > > +                 *
> > > > +                 * But however there are a couple of scenarios this
> may
> > > not
> > > > +                 * happen.
> > > > +                 *
> > > > +                 * Scenario 1:  A flow was added to the desired
> flows,
> > > but
> > > > +                 *              ofctrl_put() couldn't install the
> flow
> > > and
> > > > +                 *              an SB update caused the 'del_f' to be
> > > removed
> > > > +                 *              and re-added as 'f'.
> > > > +                 *
> > > > +                 * Scenario 2:  In a  single engine run, a flow
> 'del_f'
> > > was
> > > > +                 *              added to the desired flow, removed
> from
> > > the
> > > > +                 *              desired flow and re-added as 'f'.
> This
> > > could
> > > > +                 *              happen after the commit c6c61b4e3462.
> > > > +                 *
> > > > +                 * Treat both the scenarios as valid scenarios and
> just
> > > remove
> > > > +                 * 'del_f' from the hmap - deleted_flows.
> > > > +                 * update_installed_flows_by_track() will install
> 'f'.
> > > > +                 */
> > > >
> > > > -            if (!f->installed_flow) {
> > > > +            } else if (!f->installed_flow) {
> > > >                  /* f is not installed yet. */
> > > >                  replace_installed_to_desired(del_f->installed_flow,
> > > del_f, f);
> > > >              } else {
> > > > --
> > > > 2.29.2
> > > >
> > > > _______________________________________________
> > > > dev mailing list
> > > > [email protected]
> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > _______________________________________________
> > > dev mailing list
> > > [email protected]
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to