On Tue, Feb 23, 2021 at 4:56 PM Dumitru Ceara <[email protected]> wrote: > > On 2/22/21 7:48 PM, [email protected] wrote: > > From: Numan Siddique <[email protected]> > > > > In one of the scaled deployments, ovn-controller is asserting with the > > below stack trace > > > > *** > > (gdb) bt > > 0 raise () from /lib64/libc.so.6 > > 1 abort () from /lib64/libc.so.6 > > 2 ovs_abort_valist ("%s: assertion %s failed in %s()") at > > lib/util.c:419 > > 3 vlog_abort_valist ("%s: assertion %s failed in %s()") at > > lib/vlog.c:1249 > > 4 vlog_abort ("%s: assertion %s failed in %s()") at lib/vlog.c:1263 > > 5 ovs_assert_failure (where="controller/ofctrl.c:1216", > > function="flood_remove_flows_for_sb_uuid", > > > > condition="ovs_list_is_empty(&f->list_node)") at lib/util.c:86 > > 6 flood_remove_flows_for_sb_uuid (sb_uuid=...134, > > flood_remove_nodes=...ef0) at controller/ofctrl.c:1216 > > 9 ofctrl_flood_remove_flows (flood_remove_nodes=...ef0) at > > controller/ofctrl.c:1280 > > 10 lflow_handle_changed_ref (ref_type=REF_TYPE_PORTGROUP, > > ref_name= "5564_pg_14...aac") at controller/lflow.c:522 > > 11 _flow_output_resource_ref_handler (ref_type=REF_TYPE_PORTGROUP) > > at controller/ovn-controller.c:2220 > > 12 engine_compute () at lib/inc-proc-eng.c:306 > > 13 engine_run_node (recompute_allowed=true) at > > lib/inc-proc-eng.c:352 > > 14 engine_run (recompute_allowed=true) at lib/inc-proc-eng.c:377 > > 15 main () at controller/ovn-controller.c:2887 > > *** > > > > The reason for this assert is different from the assertion bug fixed in > > the commit 858d1dd716("ofctrl: Fix the assert seen when flood removing > > flows.") > > > > The above assertion is seen if lflow.c calls ofctrl_add_or_append_flow() > > twice for the same flow uuid 'S'. When this function is called for > > the first time, a new desired flow 'f' is created and 'f->references' > > will have [(S, f)]. When it is called for the 2nd time, the list > > 'f->references' is updated as [(S, f), (S,f)]. Later when > > flood_remove_flows_for_sb_uuid() is called for 'S', the above assertion > > is seen. > > > > This scenarion can happen when ovn-controller receives an update message > > from SB ovsdb-server in which a new logical flow 'F' belonging to a > > datapath 'D' is added which would result in conjunction flows. If this > > datapath 'D' is added to 'local_datapaths' in the same loop, then > > logical flow 'F' is first processed in lflow_add_flows_for_datapath() > > and the second time in lflow_handle_changed_flows(). > > > > This issue can be fixed in 2 ways > > 1. Flood remove all the tracked flows in lflow_handle_changed_flows() > > first and then process the modified or newly added logical flows. > > > > 2. In the function ofctrl_add_or_append_flow(), check if there is > > already a reference for the flow uuid 'S' in the existing desired > > flows and only append the actions if it doesn't exist. > > > > This patch has taken the approach (1) to ensure correctness of flows. > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1929978 > > Signed-off-by: Numan Siddique <[email protected]> > > --- > > Looks good to me, thanks! > > Acked-by: Dumitru Ceara <[email protected]>
Thanks Dumitru. I applied this patch to the main branch and backported upto branch-20.09. Thanks Numan > > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
