On Wed, May 5, 2021 at 2:25 AM Han Zhou <[email protected]> wrote:
>
> On Thu, Apr 29, 2021 at 9:25 AM Numan Siddique <[email protected]> wrote:
> >
> > On Wed, Apr 28, 2021 at 3:10 PM Han Zhou <[email protected]> wrote:
> > >
> > > On Wed, Apr 28, 2021 at 3:34 AM Numan Siddique <[email protected]> wrote:
> > > >
> > > > On Wed, Apr 28, 2021 at 3:31 AM Han Zhou <[email protected]> wrote:
> > > > >
> > > > > Thanks Numan for working on this issue!
> > > > >
> > > > > On Mon, Apr 26, 2021 at 3:53 AM <[email protected]> wrote:
> > > > > >
> > > > > > From: Numan Siddique <[email protected]>
> > > > > >
> > > > > > Presently we do ovs_assert(del_f->installed_flows) in
> > > > > > merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are
> equal
> > > > > > and if 'del_f' is not installed. But there are a couple of
> scenarios
> > > > > > this can happen:
> > > > > >
> > > > > > 1. A flow 'F1' was added to the desired flows, but ofctrl_put()
> > > couldn't
> > > > > > install the flow (because of the rconn queue is full) and an SB
> update
> > > > > > caused 'F1' to be removed and re-added as 'F2'.
> > > > > >
> > > > >
> > > > > This seems not possible, because del_f->installed_flow is set
> whenever a
> > > > > "installed_flow" is added to the installed_flows table, regardless
> of
> > > > > whether it is sent to OVS or not.
> > > >
> > > > This can happen if ofctrl_can_put() returns false in which case
> > > > ovn-controller will
> > > > not call ofctrl_put() at all. There is a theoretical possibility here
> > > for sure.
> > > >
> > > > Let me know if you think otherwise.
> > > >
> > >
> > > Oh, sorry that I misunderstood. You are right that ofctrl_put() may
> return
> > > immediately without installing the desired flow F1, but if that happens,
> > > F1->installed_flow should be NULL, and then it is the same case as in
> > > scenario 2) below: when F1 is deleted, it would be destroyed in
> > > track_flow_del(). So we should never end up with a tracked deleted flow
> > > with installed_flow being NULL, right? The logic is, in theory, when a
> > > "desired but not yet installed" flow is being deleted, we should simply
> > > destroy it and remove it from tracking. Maybe we should check if there
> are
> > > other possibilities.
> >
> > You're right. F1 should be destroyed in track_flow_del().
> >
> > I got access to the core dump and I can see below details of "f" and
> "del_f".
> >
> > Please let me know if this rings any bell
> >
> > -----
> > (gdb) frame 6
> > #6 0x00005569459bfbaa in merge_tracked_flows
> > (flow_table=0x5569470c5b40) at
> >
> /usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283
> > 283 hmap_expand_at(hmap, where);
> > (gdb) print del_f
> > $1 = (struct desired_flow *) 0x556948c670b0
> > (gdb) print *del_f
> > $2 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow =
> > 0x55694909a540, mask = 0x55694909a560}, flows = {0x55694909a540,
> > 0x55694909a560}}, tun_md = 0x0}, hash = 2338012780, ofpacts =
> > 0x556948d96460, ofpacts_len = 432, cookie = 1418516517},
> > match_hmap_node = {hash = 2338012780, next = 0x0}, list_node = {prev
> > = 0x556948c67100, next = 0x556948c67100}, references = {prev =
> > 0x556948c67110, next = 0x556948c67110}, installed_flow = 0x0,
> > installed_ref_list_node = {prev = 0x5569495bcd90,
> > next = 0x5569495bcd90}, track_list_node = {prev = 0x5569470c5b88,
> > next = 0x556948d2bfc8}, is_deleted = true}
> >
> Looking at the content of the del_f, the installed_ref_list_node is not
> pointing to the field of the flow itself, which means at least it was in
> some installed_flow's desired_refs list, which means it was installed at
> least at some point before. Now that the "installed_flow" is NULL, it is
> possible that the desired flow was installed but then unlinked from the
> installed flow. But by reviewing the code I didn't find any path that
> "unlink_installed_to_desired()" or "replace_installed_to_desired()" could
> happen to this flow before this point. If it is unlinked it should also be
> removed from the track list and also from the deleted_flows hmap and
> destroyed so it should never be found again. Of course it is also possible
> that this desired_flow data structure is corrupted so the fields
> nstalled_ref_list_node and installed_flow just have inconsistent data. I
> will keep looking into the code. At the same time, is this reproducible? We
> could add some more logs to help debugging if this happens again.
>
In one of the environments, his issue was seen when upgrading an OCP cluster.
and there were some failures related to memory and OOM had kicked in.
I am wondering if the issue is related to OOM. In order to address the
OOM issues,
few kernel patches were required. After addressing these issues, ovn-controller
assert is not seen after that.
Thanks for looking into the backtraces.
Numan
> >
> > (gdb) print f
> > $7 = (struct desired_flow *) 0x556948d2bf40
> >
> > (gdb) print *f
> > $6 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow =
> > 0x556948eaf1a0, mask = 0x556948eaf1c0}, flows = {0x556948eaf1a0,
> > 0x556948eaf1c0}}, tun_md = 0x0}, hash = 2338012780, ofpacts =
> > 0x5569472a15e0, ofpacts_len = 432, cookie = 1418516517},
> > match_hmap_node = {hash = 2338012780, next = 0x556948da8070},
> > list_node = {prev = 0x556948d2bf90, next = 0x556948d2bf90}, references
> > = {prev = 0x556947b35980, next = 0x556947b35980}, installed_flow =
> > 0x0, installed_ref_list_node = {prev = 0x556948d2bfb8,
> > next = 0x556948d2bfb8}, track_list_node = {prev = 0x556948c67138,
> > next = 0x556948e68658}, is_deleted = false}
> >
> > ------
> >
> > Looks like there are some issues with physical flow handling. Seems
> > to me an installed flow F1 is deleted and in the same loop, F2 is
> > added
> > to the tracked flow which has the same match, actions and cookie.
> >
> > Table id is 37 which is OFTABLE_REMOTE_OUTPUT. The SB resource can be
> > either port binding or multicast group.
> >
> > Thanks for the comments so far. I will continue debugging. This patch
> > can be dropped.
> >
> > Numan
> >
> > >
> > > > We observed this assertion during the upgrades of openshift to an ovn
> > > version
> > > > and there were a lot of "unreasonable timeout" warnings.
> > > >
> > > > The updated OVN version had the commit [1] which led me to think that
> > > > commit [1] can also cause this.
> > > >
> > > > >
> > > > > > 2. In a single engine run, a flow 'F1' was added to the desired
> flow,
> > > > > > removed from the desired flow and re-added as 'F2'. This could
> > > > > > happen after the commit [1].
> > > > >
> > > > > In theory this should not happen either, because if F1 was added and
> > > then
> > > > > removed before it was installed, it would have been removed in
> > > > > track_flow_del():
> > > > >
> > > > > if (!f->installed_flow) {
> > > > > /* If it is not installed yet, simply destroy it. */
> > > > > desired_flow_destroy(f);
> > > > > return;
> > > > > }
> > > > >
> > > > > In fact this is what the comment is supposed to say, but the comment
> > > had a
> > > > > typo (my bad):
> > > > >
> > > > > /* del_f must have been installed, otherwise it should
> have
> > > been
> > > > > * removed during track_flow_add_or_modify. */
> > > > >
> > > > > s/track_flow_add_or_modify/track_flow_del
> > > >
> > > > This typo really confused me :). From your comments the scenario 2
> seems
> > > > not possible.
> > > >
> > > > But I still think scenario 1 can happen if ofctrl_can_put() returns
> FALSE.
> > > > What do you think ?
> > > >
> > > > Thanks
> > > > Numan
> > > >
> > > > >
> > > > > In practice, if commit [1] did trigger this bt, then I'd look deeper
> > > into
> > > > > the logic, but I think removing the assert may just hide the
> problem.
> > > > >
> > > > > >
> > > > > > The likelihood of (2) seems to be higher with datapath groups
> enabled.
> > > > > >
> > > > > > The assertion - ovs_assert(del_f->installed_flows) is observed in
> > > > > > few deployments after the commit [1]. This patch addressed this
> issue
> > > > > > by removing that assertion. Removing this assertion should not
> have
> > > > > > any side effects as the flow 'F2' will be installed.
> > > > > >
> > > > > > This patch is proposed based on the code inspection and the
> > > possibility
> > > > > > that the above mentioned scenarios can happen. The patch doesn't
> have
> > > > > > any test cases to reproduce these scenarios.
> > > > > >
> > > > > > [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood
> removing
> > > flows
> > > > > with conj actions.")
> > > > > >
> > > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502
> > > > > > Signed-off-by: Numan Siddique <[email protected]>
> > > > > > ---
> > > > > > controller/ofctrl.c | 26 ++++++++++++++++++++++----
> > > > > > 1 file changed, 22 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > > > > > index c29c3d180..9e960b034 100644
> > > > > > --- a/controller/ofctrl.c
> > > > > > +++ b/controller/ofctrl.c
> > > > > > @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct
> > > ovn_desired_flow_table
> > > > > *flow_table)
> > > > > > continue;
> > > > > > }
> > > > > >
> > > > > > - /* del_f must have been installed, otherwise it
> should
> > > have
> > > > > been
> > > > > > - * removed during track_flow_add_or_modify. */
> > > > > > - ovs_assert(del_f->installed_flow);
> > > > > > + if (!del_f->installed_flow) {
> > > > > > + /* del_f must have been installed, otherwise it
> > > should
> > > > > have
> > > > > > + * been removed during track_flow_add_or_modify.
> > > > > > + *
> > > > > > + * But however there are a couple of scenarios
> this
> > > may
> > > > > not
> > > > > > + * happen.
> > > > > > + *
> > > > > > + * Scenario 1: A flow was added to the desired
> > > flows,
> > > > > but
> > > > > > + * ofctrl_put() couldn't install the
> > > flow
> > > > > and
> > > > > > + * an SB update caused the 'del_f'
> to be
> > > > > removed
> > > > > > + * and re-added as 'f'.
> > > > > > + *
> > > > > > + * Scenario 2: In a single engine run, a flow
> > > 'del_f'
> > > > > was
> > > > > > + * added to the desired flow,
> removed
> > > from
> > > > > the
> > > > > > + * desired flow and re-added as 'f'.
> > > This
> > > > > could
> > > > > > + * happen after the commit
> c6c61b4e3462.
> > > > > > + *
> > > > > > + * Treat both the scenarios as valid scenarios
> and
> > > just
> > > > > remove
> > > > > > + * 'del_f' from the hmap - deleted_flows.
> > > > > > + * update_installed_flows_by_track() will install
> > > 'f'.
> > > > > > + */
> > > > > >
> > > > > > - if (!f->installed_flow) {
> > > > > > + } else if (!f->installed_flow) {
> > > > > > /* f is not installed yet. */
> > > > > >
> replace_installed_to_desired(del_f->installed_flow,
> > > > > del_f, f);
> > > > > > } else {
> > > > > > --
> > > > > > 2.29.2
> > > > > >
> > > > > > _______________________________________________
> > > > > > dev mailing list
> > > > > > [email protected]
> > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > > > _______________________________________________
> > > > > dev mailing list
> > > > > [email protected]
> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > > >
> > > _______________________________________________
> > > dev mailing list
> > > [email protected]
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev