On Tue, Feb 16, 2021 at 4:41 PM Ilya Maximets <[email protected]> wrote:
>
> On 2/16/21 8:02 AM, [email protected] wrote:
> > From: Numan Siddique <[email protected]>
>
> Hi, Numan.
> Thanks for the fix!  The logic seems correct, but I'd suggest to
> implement the fix a bit differently just to avoid unnecessary
> complications.  This code is already very hard to understand.
>
> Test would be nice, but it could be a separate change.  BTW, this
> module is kind of isolated, so it should be possible to write real
> unit test.
>
> Comments inline.
>
> Best regards, Ilya Maximets.

Thanks a lot for the reviews.

I have addressed them and sent v2. Request to please take a look.

I have added a test case. I will work on adding unit test cases as a
follow up patch.

Thanks
Numan

>
> >
> > In one of the scaled deployments, ovn-controller is asserting with the
> > below stack trace
> >
> > ****************
> > (gdb) bt
> >    #0  0x00007fa6aed4970f in raise () from /lib64/libc.so.6
> >    #1  0x00007fa6aed33b25 in abort () from /lib64/libc.so.6
> >    #2  0x000055d46594a714 in ovs_abort_valist (err_no=err_no@entry=0, 
> > format=format@entry=0x55d465a2a190 "%s: assertion %s failed in %s()", 
> > args=args@entry=0x7ffd24307ba0) at lib/util.c:419
> >    #3  0x000055d465952504 in vlog_abort_valist (module_=<optimized out>, 
> > message=0x55d465a2a190 "%s: assertion %s failed in %s()", 
> > args=args@entry=0x7ffd24307ba0) at lib/vlog.c:1249
> >    #4  0x000055d4659525aa in vlog_abort (module=module@entry=0x55d465ce6880 
> > <this_module>, message=message@entry=0x55d465a2a190 "%s: assertion %s 
> > failed in %s()") at lib/vlog.c:1263
> >    #5  0x000055d46594a42b in ovs_assert_failure 
> > (where=where@entry=0x55d465a05529 "controller/ofctrl.c:1198", 
> > function=function@entry=0x55d465a05ca0 <__func__.33798> 
> > "flood_remove_flows_for_sb_uuid",
> >     condition=condition@entry=0x55d465a05a80 
> > "ovs_list_is_empty(&f->list_node)") at lib/util.c:86
> >    #6  0x000055d465877aa2 in flood_remove_flows_for_sb_uuid 
> > (flow_table=flow_table@entry=0x55d46688d080, 
> > sb_uuid=sb_uuid@entry=0x55d53dbec538, 
> > flood_remove_nodes=flood_remove_nodes@entry=0x7ffd24307ed0) at 
> > controller/ofctrl.c:1205
> >    #7  0x000055d465877c2e in flood_remove_flows_for_sb_uuid 
> > (flow_table=flow_table@entry=0x55d46688d080, 
> > sb_uuid=sb_uuid@entry=0x55d546553898, 
> > flood_remove_nodes=flood_remove_nodes@entry=0x7ffd24307ed0) at 
> > controller/ofctrl.c:1230
> >    #8  0x000055d465877c2e in flood_remove_flows_for_sb_uuid 
> > (flow_table=flow_table@entry=0x55d46688d080, 
> > sb_uuid=sb_uuid@entry=0x55d55eafebf0, 
> > flood_remove_nodes=flood_remove_nodes@entry=0x7ffd24307ed0) at 
> > controller/ofctrl.c:1230
> >    #9  0x000055d465877dc2 in ofctrl_flood_remove_flows 
> > (flow_table=0x55d46688d080, 
> > flood_remove_nodes=flood_remove_nodes@entry=0x7ffd24307ed0) at 
> > controller/ofctrl.c:1250
> >    #10 0x000055d465872b24 in lflow_handle_changed_ref 
> > (ref_type=ref_type@entry=REF_TYPE_PORTGROUP, 
> > ref_name=ref_name@entry=0x55d49375a050 
> > "5564_pg_6415729e_58ec_4b8b_bc99_2ceef5c44bac", 
> > l_ctx_in=l_ctx_in@entry=0x7ffd24307fd0,
> >     l_ctx_out=l_ctx_out@entry=0x7ffd24307f90, 
> > changed=changed@entry=0x7ffd24307f8f) at controller/lflow.c:612
> >    #11 0x000055d46588f2f8 in _flow_output_resource_ref_handler 
> > (node=<optimized out>, data=<optimized out>, ref_type=REF_TYPE_PORTGROUP) 
> > at controller/ovn-controller.c:2181
> >    #12 0x000055d4658a8163 in engine_compute (recompute_allowed=<optimized 
> > out>, node=<optimized out>) at lib/inc-proc-eng.c:306
> >    #13 engine_run_node (recompute_allowed=true, node=0x7ffd2430d4b0) at 
> > lib/inc-proc-eng.c:352
> >    #14 engine_run (recompute_allowed=recompute_allowed@entry=true) at 
> > lib/inc-proc-eng.c:377
> >    #15 0x000055d46586613d in main (argc=<optimized out>, argv=<optimized 
> > out>) at controller/ovn-controller.c:2794
> >
> > ***************
>
> Thanks for providing the backtrace.  It's hard to read though and
> it contains too much unnecessary information for a commit message.
>
> I'd say we can strip it like this to be more readable:
>
> ***
>  (gdb) bt
>    0  raise () from /lib64/libc.so.6
>    1  abort () from /lib64/libc.so.6
>    2  ovs_abort_valist ("%s: assertion %s failed in %s()") at lib/util.c:419
>    3  vlog_abort_valist ("%s: assertion %s failed in %s()") at lib/vlog.c:1249
>    4  vlog_abort ("%s: assertion %s failed in %s()") at lib/vlog.c:1263
>    5  ovs_assert_failure (where="controller/ofctrl.c:1198",
>                           function="flood_remove_flows_for_sb_uuid",
>                           condition="ovs_list_is_empty(&f->list_node)") at 
> lib/util.c:86
>    6  flood_remove_flows_for_sb_uuid (sb_uuid=...538,
>         flood_remove_nodes=...ed0) at controller/ofctrl.c:1205
>    7  flood_remove_flows_for_sb_uuid (sb_uuid=...898,
>         flood_remove_nodes=...ed0) at controller/ofctrl.c:1230
>    8  flood_remove_flows_for_sb_uuid (sb_uuid=...bf0,
>         flood_remove_nodes=...ed0) at controller/ofctrl.c:1230
>    9  ofctrl_flood_remove_flows (flood_remove_nodes=...ed0) at 
> controller/ofctrl.c:1250
>    10 lflow_handle_changed_ref (ref_type=REF_TYPE_PORTGROUP,
>         ref_name= "5564_pg_64...bac") at controller/lflow.c:612
>    11 _flow_output_resource_ref_handler (ref_type=REF_TYPE_PORTGROUP)
>         at controller/ovn-controller.c:2181
>    12 engine_compute () at lib/inc-proc-eng.c:306
>    13 engine_run_node (recompute_allowed=true) at lib/inc-proc-eng.c:352
>    14 engine_run (recompute_allowed=true) at lib/inc-proc-eng.c:377
>    15 main () at controller/ovn-controller.c:2794
> ***
>
> >
> > This assertion is seen when a port group gets updated and it is referenced 
> > by many
> > logical flows (with conj actions).  The function
> > ofctrl_flood_remove_flows(), calls flood_remove_flows_for_sb_uuid() for
> > each sb uuid in the hmap - flood_remove_nodes using HMAP_FOR_EACH 
> > (flood_remove_nodes).
> > flood_remove_flows_for_sb_uuid() also takes the hmap
> > 'flood_remove_nodes' as an argument and it inserts few items into it
> > when it has to call itself recursively.  When an item is inserted, its 
> > possible that
> > the hmap may get expanded.  And if this happens, the HMAP_FOR_EACH ()
> > skips few entries causing some of the desired flows not getting cleared.
>
> Line wrapping might be a bit better in above text, I think. :)
> Sorry for nitpicking.
>
> >
> > Later when ofctrl_add_or_append_flow() is called, there would be
> > multiple  'struct sb_flow_ref' references for the same desired flow.
> > And this causes the above assertion later when the same port group gets
> > updated.
> >
> > This patch fixes this issue by cloning the hmap 'flood_remove_nodes' and
> > using it to iterate the flood remove nodes.
>
> We're only copying this hmap for iteration purposes, so there is no real
> reason to copy it as a hmap.  Simple array will save memory and will
> be much faster.  And it will be much more readable, IMHO.
>
> Something like this (not tested):
>
> ---
> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> index 9d62e1260..a2cdbb507 100644
> --- a/controller/ofctrl.c
> +++ b/controller/ofctrl.c
> @@ -1247,11 +1247,24 @@ ofctrl_flood_remove_flows(struct 
> ovn_desired_flow_table *flow_table,
>                            struct hmap *flood_remove_nodes)
>  {
>      struct ofctrl_flood_remove_node *ofrn;
> +    struct uuid *sb_uuids;
> +    int i, n = 0;
> +
> +    /* flood_remove_flows_for_sb_uuid() will modify the 'flood_remove_nodes'
> +     * hash map by inserting new items, so we can't use it for iteration.
> +     * Copying. */
> +    sb_uuids = xmalloc(hmap_count(flood_remove_nodes) * sizeof *sb_uuids);
>      HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) {
> -        flood_remove_flows_for_sb_uuid(flow_table, &ofrn->sb_uuid,
> +        sb_uuids[n++] = ofrn->sb_uuid;
> +    }
> +
> +    for (i = 0; i < n; i++) {
> +        flood_remove_flows_for_sb_uuid(flow_table, &sb_uuids[i],
>                                         flood_remove_nodes);
>      }
>
> +    free(sb_uuids);
> +
>      /* remove any related group and meter info */
>      HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) {
>          ovn_extend_table_remove_desired(groups, &ofrn->sb_uuid);
> ---
>
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1928012
> > CC: Han Zhou <[email protected]>
> > CC: Dumitru Ceara <[email protected]>
> > Signed-off-by: Numan Siddique <[email protected]>
> > ---
> >  controller/ofctrl.c | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > index 9d62e1260..42443acb2 100644
> > --- a/controller/ofctrl.c
> > +++ b/controller/ofctrl.c
> > @@ -1247,10 +1247,28 @@ ofctrl_flood_remove_flows(struct 
> > ovn_desired_flow_table *flow_table,
> >                            struct hmap *flood_remove_nodes)
> >  {
> >      struct ofctrl_flood_remove_node *ofrn;
> > +
> > +    /* flood_remove_flows_for_sb_uuid() modifies the hmap by inserting
> > +     * few entries when it calls itself recursively.  Clone the
> > +     * hmap 'flood_remove_nodes' before calling.  Inserting an item to
> > +     * hmap may expand it. */
> > +    struct hmap flood_remove_uuids = HMAP_INITIALIZER(&flood_remove_uuids);
> >      HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) {
> > +        ofctrl_flood_remove_add_node(&flood_remove_uuids, &ofrn->sb_uuid);
> > +    }
> > +
> > +    /* Iterate using the cloned hmap - 'flood_remove_uuids', but pass
> > +     * the hmap 'flood_remove_nodes' provided by the caller.
> > +     * flood_remove_flows_for_sb_uuid() will delete the other lflows
> > +     * referenced by the sb_uuid,  which needs to be re-added later
> > +     * if those sb_uuids were not deleted. The caller
> > +     * (in lflow.c) will add re-add lflows which are not deleted. */
>
> These comments seems very complex and overloaded with information.  It makes
> understanding of what is going on not easy.  I'd replace them with something
> simple, e.g. the comment that I have in me example above.
>
> > +    HMAP_FOR_EACH_POP (ofrn, hmap_node, &flood_remove_uuids) {
> >          flood_remove_flows_for_sb_uuid(flow_table, &ofrn->sb_uuid,
> >                                         flood_remove_nodes);
> > +        free(ofrn);
> >      }
> > +    hmap_destroy(&flood_remove_uuids);
> >
> >      /* remove any related group and meter info */
> >      HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) {
> >
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to