From: Numan Siddique <[email protected]>

In one of the scaled deployments, ovn-controller is asserting with the
below stack trace

****************
(gdb) bt
   #0  0x00007fa6aed4970f in raise () from /lib64/libc.so.6
   #1  0x00007fa6aed33b25 in abort () from /lib64/libc.so.6
   #2  0x000055d46594a714 in ovs_abort_valist (err_no=err_no@entry=0, 
format=format@entry=0x55d465a2a190 "%s: assertion %s failed in %s()", 
args=args@entry=0x7ffd24307ba0) at lib/util.c:419
   #3  0x000055d465952504 in vlog_abort_valist (module_=<optimized out>, 
message=0x55d465a2a190 "%s: assertion %s failed in %s()", 
args=args@entry=0x7ffd24307ba0) at lib/vlog.c:1249
   #4  0x000055d4659525aa in vlog_abort (module=module@entry=0x55d465ce6880 
<this_module>, message=message@entry=0x55d465a2a190 "%s: assertion %s failed in 
%s()") at lib/vlog.c:1263
   #5  0x000055d46594a42b in ovs_assert_failure 
(where=where@entry=0x55d465a05529 "controller/ofctrl.c:1198", 
function=function@entry=0x55d465a05ca0 <__func__.33798> 
"flood_remove_flows_for_sb_uuid",
    condition=condition@entry=0x55d465a05a80 
"ovs_list_is_empty(&f->list_node)") at lib/util.c:86
   #6  0x000055d465877aa2 in flood_remove_flows_for_sb_uuid 
(flow_table=flow_table@entry=0x55d46688d080, 
sb_uuid=sb_uuid@entry=0x55d53dbec538, 
flood_remove_nodes=flood_remove_nodes@entry=0x7ffd24307ed0) at 
controller/ofctrl.c:1205
   #7  0x000055d465877c2e in flood_remove_flows_for_sb_uuid 
(flow_table=flow_table@entry=0x55d46688d080, 
sb_uuid=sb_uuid@entry=0x55d546553898, 
flood_remove_nodes=flood_remove_nodes@entry=0x7ffd24307ed0) at 
controller/ofctrl.c:1230
   #8  0x000055d465877c2e in flood_remove_flows_for_sb_uuid 
(flow_table=flow_table@entry=0x55d46688d080, 
sb_uuid=sb_uuid@entry=0x55d55eafebf0, 
flood_remove_nodes=flood_remove_nodes@entry=0x7ffd24307ed0) at 
controller/ofctrl.c:1230
   #9  0x000055d465877dc2 in ofctrl_flood_remove_flows 
(flow_table=0x55d46688d080, 
flood_remove_nodes=flood_remove_nodes@entry=0x7ffd24307ed0) at 
controller/ofctrl.c:1250
   #10 0x000055d465872b24 in lflow_handle_changed_ref 
(ref_type=ref_type@entry=REF_TYPE_PORTGROUP, 
ref_name=ref_name@entry=0x55d49375a050 
"5564_pg_6415729e_58ec_4b8b_bc99_2ceef5c44bac", 
l_ctx_in=l_ctx_in@entry=0x7ffd24307fd0,
    l_ctx_out=l_ctx_out@entry=0x7ffd24307f90, 
changed=changed@entry=0x7ffd24307f8f) at controller/lflow.c:612
   #11 0x000055d46588f2f8 in _flow_output_resource_ref_handler (node=<optimized 
out>, data=<optimized out>, ref_type=REF_TYPE_PORTGROUP) at 
controller/ovn-controller.c:2181
   #12 0x000055d4658a8163 in engine_compute (recompute_allowed=<optimized out>, 
node=<optimized out>) at lib/inc-proc-eng.c:306
   #13 engine_run_node (recompute_allowed=true, node=0x7ffd2430d4b0) at 
lib/inc-proc-eng.c:352
   #14 engine_run (recompute_allowed=recompute_allowed@entry=true) at 
lib/inc-proc-eng.c:377
   #15 0x000055d46586613d in main (argc=<optimized out>, argv=<optimized out>) 
at controller/ovn-controller.c:2794

***************

This assertion is seen when a port group gets updated and it is referenced by 
many
logical flows (with conj actions).  The function
ofctrl_flood_remove_flows(), calls flood_remove_flows_for_sb_uuid() for
each sb uuid in the hmap - flood_remove_nodes using HMAP_FOR_EACH 
(flood_remove_nodes).
flood_remove_flows_for_sb_uuid() also takes the hmap
'flood_remove_nodes' as an argument and it inserts few items into it
when it has to call itself recursively.  When an item is inserted, its possible 
that
the hmap may get expanded.  And if this happens, the HMAP_FOR_EACH ()
skips few entries causing some of the desired flows not getting cleared.

Later when ofctrl_add_or_append_flow() is called, there would be
multiple  'struct sb_flow_ref' references for the same desired flow.
And this causes the above assertion later when the same port group gets
updated.

This patch fixes this issue by cloning the hmap 'flood_remove_nodes' and
using it to iterate the flood remove nodes.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1928012
CC: Han Zhou <[email protected]>
CC: Dumitru Ceara <[email protected]>
Signed-off-by: Numan Siddique <[email protected]>
---
 controller/ofctrl.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index 9d62e1260..42443acb2 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -1247,10 +1247,28 @@ ofctrl_flood_remove_flows(struct ovn_desired_flow_table 
*flow_table,
                           struct hmap *flood_remove_nodes)
 {
     struct ofctrl_flood_remove_node *ofrn;
+
+    /* flood_remove_flows_for_sb_uuid() modifies the hmap by inserting
+     * few entries when it calls itself recursively.  Clone the
+     * hmap 'flood_remove_nodes' before calling.  Inserting an item to
+     * hmap may expand it. */
+    struct hmap flood_remove_uuids = HMAP_INITIALIZER(&flood_remove_uuids);
     HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) {
+        ofctrl_flood_remove_add_node(&flood_remove_uuids, &ofrn->sb_uuid);
+    }
+
+    /* Iterate using the cloned hmap - 'flood_remove_uuids', but pass
+     * the hmap 'flood_remove_nodes' provided by the caller.
+     * flood_remove_flows_for_sb_uuid() will delete the other lflows
+     * referenced by the sb_uuid,  which needs to be re-added later
+     * if those sb_uuids were not deleted. The caller
+     * (in lflow.c) will add re-add lflows which are not deleted. */
+    HMAP_FOR_EACH_POP (ofrn, hmap_node, &flood_remove_uuids) {
         flood_remove_flows_for_sb_uuid(flow_table, &ofrn->sb_uuid,
                                        flood_remove_nodes);
+        free(ofrn);
     }
+    hmap_destroy(&flood_remove_uuids);
 
     /* remove any related group and meter info */
     HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) {
-- 
2.29.2

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to