On 2/16/21 12:11 PM, Ilya Maximets wrote: > On 2/16/21 8:02 AM, [email protected] wrote: >> From: Numan Siddique <[email protected]> > > Hi, Numan. > Thanks for the fix! The logic seems correct, but I'd suggest to > implement the fix a bit differently just to avoid unnecessary > complications. This code is already very hard to understand. > > Test would be nice, but it could be a separate change. BTW, this > module is kind of isolated, so it should be possible to write real > unit test. > > Comments inline. > > Best regards, Ilya Maximets. > >> >> In one of the scaled deployments, ovn-controller is asserting with the >> below stack trace >> >> **************** >> (gdb) bt >> #0 0x00007fa6aed4970f in raise () from /lib64/libc.so.6 >> #1 0x00007fa6aed33b25 in abort () from /lib64/libc.so.6 >> #2 0x000055d46594a714 in ovs_abort_valist (err_no=err_no@entry=0, >> format=format@entry=0x55d465a2a190 "%s: assertion %s failed in %s()", >> args=args@entry=0x7ffd24307ba0) at lib/util.c:419 >> #3 0x000055d465952504 in vlog_abort_valist (module_=<optimized out>, >> message=0x55d465a2a190 "%s: assertion %s failed in %s()", >> args=args@entry=0x7ffd24307ba0) at lib/vlog.c:1249 >> #4 0x000055d4659525aa in vlog_abort (module=module@entry=0x55d465ce6880 >> <this_module>, message=message@entry=0x55d465a2a190 "%s: assertion %s failed >> in %s()") at lib/vlog.c:1263 >> #5 0x000055d46594a42b in ovs_assert_failure >> (where=where@entry=0x55d465a05529 "controller/ofctrl.c:1198", >> function=function@entry=0x55d465a05ca0 <__func__.33798> >> "flood_remove_flows_for_sb_uuid", >> condition=condition@entry=0x55d465a05a80 >> "ovs_list_is_empty(&f->list_node)") at lib/util.c:86 >> #6 0x000055d465877aa2 in flood_remove_flows_for_sb_uuid >> (flow_table=flow_table@entry=0x55d46688d080, >> sb_uuid=sb_uuid@entry=0x55d53dbec538, >> flood_remove_nodes=flood_remove_nodes@entry=0x7ffd24307ed0) at >> controller/ofctrl.c:1205 >> #7 0x000055d465877c2e in flood_remove_flows_for_sb_uuid >> (flow_table=flow_table@entry=0x55d46688d080, >> sb_uuid=sb_uuid@entry=0x55d546553898, >> flood_remove_nodes=flood_remove_nodes@entry=0x7ffd24307ed0) at >> controller/ofctrl.c:1230 >> #8 0x000055d465877c2e in flood_remove_flows_for_sb_uuid >> (flow_table=flow_table@entry=0x55d46688d080, >> sb_uuid=sb_uuid@entry=0x55d55eafebf0, >> flood_remove_nodes=flood_remove_nodes@entry=0x7ffd24307ed0) at >> controller/ofctrl.c:1230 >> #9 0x000055d465877dc2 in ofctrl_flood_remove_flows >> (flow_table=0x55d46688d080, >> flood_remove_nodes=flood_remove_nodes@entry=0x7ffd24307ed0) at >> controller/ofctrl.c:1250 >> #10 0x000055d465872b24 in lflow_handle_changed_ref >> (ref_type=ref_type@entry=REF_TYPE_PORTGROUP, >> ref_name=ref_name@entry=0x55d49375a050 >> "5564_pg_6415729e_58ec_4b8b_bc99_2ceef5c44bac", >> l_ctx_in=l_ctx_in@entry=0x7ffd24307fd0, >> l_ctx_out=l_ctx_out@entry=0x7ffd24307f90, >> changed=changed@entry=0x7ffd24307f8f) at controller/lflow.c:612 >> #11 0x000055d46588f2f8 in _flow_output_resource_ref_handler >> (node=<optimized out>, data=<optimized out>, ref_type=REF_TYPE_PORTGROUP) at >> controller/ovn-controller.c:2181 >> #12 0x000055d4658a8163 in engine_compute (recompute_allowed=<optimized >> out>, node=<optimized out>) at lib/inc-proc-eng.c:306 >> #13 engine_run_node (recompute_allowed=true, node=0x7ffd2430d4b0) at >> lib/inc-proc-eng.c:352 >> #14 engine_run (recompute_allowed=recompute_allowed@entry=true) at >> lib/inc-proc-eng.c:377 >> #15 0x000055d46586613d in main (argc=<optimized out>, argv=<optimized >> out>) at controller/ovn-controller.c:2794 >> >> *************** > > Thanks for providing the backtrace. It's hard to read though and > it contains too much unnecessary information for a commit message. > > I'd say we can strip it like this to be more readable: > > *** > (gdb) bt > 0 raise () from /lib64/libc.so.6 > 1 abort () from /lib64/libc.so.6 > 2 ovs_abort_valist ("%s: assertion %s failed in %s()") at lib/util.c:419 > 3 vlog_abort_valist ("%s: assertion %s failed in %s()") at lib/vlog.c:1249 > 4 vlog_abort ("%s: assertion %s failed in %s()") at lib/vlog.c:1263 > 5 ovs_assert_failure (where="controller/ofctrl.c:1198", > function="flood_remove_flows_for_sb_uuid", > condition="ovs_list_is_empty(&f->list_node)") at > lib/util.c:86 > 6 flood_remove_flows_for_sb_uuid (sb_uuid=...538, > flood_remove_nodes=...ed0) at controller/ofctrl.c:1205 > 7 flood_remove_flows_for_sb_uuid (sb_uuid=...898, > flood_remove_nodes=...ed0) at controller/ofctrl.c:1230 > 8 flood_remove_flows_for_sb_uuid (sb_uuid=...bf0, > flood_remove_nodes=...ed0) at controller/ofctrl.c:1230 > 9 ofctrl_flood_remove_flows (flood_remove_nodes=...ed0) at > controller/ofctrl.c:1250 > 10 lflow_handle_changed_ref (ref_type=REF_TYPE_PORTGROUP, > ref_name= "5564_pg_64...bac") at controller/lflow.c:612 > 11 _flow_output_resource_ref_handler (ref_type=REF_TYPE_PORTGROUP) > at controller/ovn-controller.c:2181 > 12 engine_compute () at lib/inc-proc-eng.c:306 > 13 engine_run_node (recompute_allowed=true) at lib/inc-proc-eng.c:352 > 14 engine_run (recompute_allowed=true) at lib/inc-proc-eng.c:377 > 15 main () at controller/ovn-controller.c:2794 > *** > >> >> This assertion is seen when a port group gets updated and it is referenced >> by many >> logical flows (with conj actions). The function >> ofctrl_flood_remove_flows(), calls flood_remove_flows_for_sb_uuid() for >> each sb uuid in the hmap - flood_remove_nodes using HMAP_FOR_EACH >> (flood_remove_nodes). >> flood_remove_flows_for_sb_uuid() also takes the hmap >> 'flood_remove_nodes' as an argument and it inserts few items into it >> when it has to call itself recursively. When an item is inserted, its >> possible that >> the hmap may get expanded. And if this happens, the HMAP_FOR_EACH () >> skips few entries causing some of the desired flows not getting cleared. > > Line wrapping might be a bit better in above text, I think. :) > Sorry for nitpicking. > >> >> Later when ofctrl_add_or_append_flow() is called, there would be >> multiple 'struct sb_flow_ref' references for the same desired flow. >> And this causes the above assertion later when the same port group gets >> updated. >> >> This patch fixes this issue by cloning the hmap 'flood_remove_nodes' and >> using it to iterate the flood remove nodes. > > We're only copying this hmap for iteration purposes, so there is no real > reason to copy it as a hmap. Simple array will save memory and will > be much faster. And it will be much more readable, IMHO. > > Something like this (not tested): > > --- > diff --git a/controller/ofctrl.c b/controller/ofctrl.c > index 9d62e1260..a2cdbb507 100644 > --- a/controller/ofctrl.c > +++ b/controller/ofctrl.c > @@ -1247,11 +1247,24 @@ ofctrl_flood_remove_flows(struct > ovn_desired_flow_table *flow_table, > struct hmap *flood_remove_nodes) > { > struct ofctrl_flood_remove_node *ofrn; > + struct uuid *sb_uuids; > + int i, n = 0; > + > + /* flood_remove_flows_for_sb_uuid() will modify the 'flood_remove_nodes' > + * hash map by inserting new items, so we can't use it for iteration. > + * Copying. */ > + sb_uuids = xmalloc(hmap_count(flood_remove_nodes) * sizeof *sb_uuids); > HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) { > - flood_remove_flows_for_sb_uuid(flow_table, &ofrn->sb_uuid, > + sb_uuids[n++] = ofrn->sb_uuid; > + } > + > + for (i = 0; i < n; i++) { > + flood_remove_flows_for_sb_uuid(flow_table, &sb_uuids[i], > flood_remove_nodes); > } > > + free(sb_uuids); > + > /* remove any related group and meter info */ > HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) { > ovn_extend_table_remove_desired(groups, &ofrn->sb_uuid); > --- > >> >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1928012 >> CC: Han Zhou <[email protected]> >> CC: Dumitru Ceara <[email protected]>
^^ This is not a correct email. Got bounced. :) CC: correct one. >> Signed-off-by: Numan Siddique <[email protected]> >> --- >> controller/ofctrl.c | 18 ++++++++++++++++++ >> 1 file changed, 18 insertions(+) >> >> diff --git a/controller/ofctrl.c b/controller/ofctrl.c >> index 9d62e1260..42443acb2 100644 >> --- a/controller/ofctrl.c >> +++ b/controller/ofctrl.c >> @@ -1247,10 +1247,28 @@ ofctrl_flood_remove_flows(struct >> ovn_desired_flow_table *flow_table, >> struct hmap *flood_remove_nodes) >> { >> struct ofctrl_flood_remove_node *ofrn; >> + >> + /* flood_remove_flows_for_sb_uuid() modifies the hmap by inserting >> + * few entries when it calls itself recursively. Clone the >> + * hmap 'flood_remove_nodes' before calling. Inserting an item to >> + * hmap may expand it. */ >> + struct hmap flood_remove_uuids = HMAP_INITIALIZER(&flood_remove_uuids); >> HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) { >> + ofctrl_flood_remove_add_node(&flood_remove_uuids, &ofrn->sb_uuid); >> + } >> + >> + /* Iterate using the cloned hmap - 'flood_remove_uuids', but pass >> + * the hmap 'flood_remove_nodes' provided by the caller. >> + * flood_remove_flows_for_sb_uuid() will delete the other lflows >> + * referenced by the sb_uuid, which needs to be re-added later >> + * if those sb_uuids were not deleted. The caller >> + * (in lflow.c) will add re-add lflows which are not deleted. */ > > These comments seems very complex and overloaded with information. It makes > understanding of what is going on not easy. I'd replace them with something > simple, e.g. the comment that I have in me example above. > >> + HMAP_FOR_EACH_POP (ofrn, hmap_node, &flood_remove_uuids) { >> flood_remove_flows_for_sb_uuid(flow_table, &ofrn->sb_uuid, >> flood_remove_nodes); >> + free(ofrn); >> } >> + hmap_destroy(&flood_remove_uuids); >> >> /* remove any related group and meter info */ >> HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) { >> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
