On 2/22/21 7:48 PM, [email protected] wrote:
From: Numan Siddique <[email protected]>

In one of the scaled deployments, ovn-controller is asserting with the
below stack trace

     ***
      (gdb) bt
        0  raise () from /lib64/libc.so.6
        1  abort () from /lib64/libc.so.6
        2  ovs_abort_valist ("%s: assertion %s failed in %s()") at 
lib/util.c:419
        3  vlog_abort_valist ("%s: assertion %s failed in %s()") at 
lib/vlog.c:1249
        4  vlog_abort ("%s: assertion %s failed in %s()") at lib/vlog.c:1263
        5  ovs_assert_failure (where="controller/ofctrl.c:1216",
                               function="flood_remove_flows_for_sb_uuid",
                               condition="ovs_list_is_empty(&f->list_node)") at 
lib/util.c:86
        6  flood_remove_flows_for_sb_uuid (sb_uuid=...134,
             flood_remove_nodes=...ef0) at controller/ofctrl.c:1216
        9  ofctrl_flood_remove_flows (flood_remove_nodes=...ef0) at 
controller/ofctrl.c:1280
        10 lflow_handle_changed_ref (ref_type=REF_TYPE_PORTGROUP,
             ref_name= "5564_pg_14...aac") at controller/lflow.c:522
        11 _flow_output_resource_ref_handler (ref_type=REF_TYPE_PORTGROUP)
             at controller/ovn-controller.c:2220
        12 engine_compute () at lib/inc-proc-eng.c:306
        13 engine_run_node (recompute_allowed=true) at lib/inc-proc-eng.c:352
        14 engine_run (recompute_allowed=true) at lib/inc-proc-eng.c:377
        15 main () at controller/ovn-controller.c:2887
     ***

The reason for this assert is different from the assertion bug fixed in
the commit 858d1dd716("ofctrl: Fix the assert seen when flood removing
flows.")

The above assertion is seen if lflow.c calls ofctrl_add_or_append_flow()
twice for the same flow uuid 'S'.  When this function is called for
the first time, a new desired flow 'f' is created and 'f->references'
will have [(S, f)].  When it is called for the 2nd time, the list
'f->references' is updated as [(S, f), (S,f)].  Later when
flood_remove_flows_for_sb_uuid() is called for 'S', the above assertion
is seen.

This scenarion can happen when ovn-controller receives an update message
from SB ovsdb-server in which a new logical flow 'F' belonging to a
datapath 'D' is added which would result in conjunction flows.  If this
datapath 'D' is added to 'local_datapaths' in the same loop, then
logical flow 'F' is first processed in lflow_add_flows_for_datapath()
and the second time in lflow_handle_changed_flows().

This issue can be fixed in 2 ways
1. Flood remove all the tracked flows in lflow_handle_changed_flows()
    first and then process the modified or newly added logical flows.

2. In the function ofctrl_add_or_append_flow(), check if there is
    already a reference for the flow uuid 'S' in the existing desired
    flows and only append the actions if it doesn't exist.

This patch has taken the approach (1) to ensure correctness of flows.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1929978
Signed-off-by: Numan Siddique <[email protected]>
---

Looks good to me, thanks!

Acked-by: Dumitru Ceara <[email protected]>


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to