On 3/14/25 11:05 AM, Ales Musil wrote: > On Thu, Mar 13, 2025 at 9:33 AM Xavier Simonart <xsimo...@redhat.com> wrote: > >> - Add ability to ignore some specific datapath or tables when comparing >> flows. >> This is necessary as, as of today, ovn does not properly remove datapath >> from list of local datapaths when some prots are removed. >> - Properly check ovn-nb/ovn-nb.sock when AZ is present. >> - Ignore flow differences in OFTABLE_MAC_BINDING, OFTABLE_MAC_LOOKUP >> and OFTABLE_MAC_CACHE_USE. >> >> Signed-off-by: Xavier Simonart <xsimo...@redhat.com> >> >> --- >> v2: - avoid using bash arrays. >> - use diff instead of comm -3 for flow diff as more readable. >> - avoid cat file | grep ... > file. >> - ignore flow differences in OFTABLE_MAC_BINDING, OFTABLE_MAC_LOOKUP >> and OFTABLE_MAC_CACHE_USE. >> v3: - Properly check ovn-nb/ovn-nb.sock when AZ is present in >> CHECK_RELATED_PORTS. >> - Add possibility to ignore some tables when comparing I+P and >> recompute flows. >> - When multiple hv were cleaned, v1 & v2 resulted in >> checking_recompute for >> hv1, cleanup hv1, checking_recompute for hv2, cleanup hv2. >> However, cleaning up hv1 usually results in chassis removal in hv2 >> and >> multiple node recomputes in hv2. We now first check_recomputes on >> all hv, >> then clean the hv. This highlighted a few more tests or ovn >> I+P/recompute >> differences on hv2, hv3... >> v4: - Fix those I+P/recompute differences in this patch (was erroneoulsy >> fixed in >> next patch of the serie). >> --- >> > > Hi Xavier,
Hi Xavier, Ales, > thank you for the patch. > > I'm slightly afraid that we are increasing complexity of tests in general, > now everyone will need to understand what is ok to ignore and why. > Not sure what to do about it though so I won't block the series, but > any idea how to improve it is welcome I suppose. > CI flakes are awful. I agree that this is a bit more complex but let's see how it goes. I only backported the bug fix patches from this series to stable branches. However, I suggest that if the CI turns out to be stable on main for a while we should also backport the test patches too. > tests/ovn-macros.at | 95 ++++++++++++++++++++++++++++++++++----------- >> tests/ovn.at | 42 +++++++++++++++----- >> 2 files changed, 106 insertions(+), 31 deletions(-) >> >> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at >> index 6a568e743..cdfa503da 100644 >> --- a/tests/ovn-macros.at >> +++ b/tests/ovn-macros.at >> @@ -196,7 +196,7 @@ m4_define([DUMP_FLOWS], [ >> sbox=$1 >> output_file=$2 >> as $sbox >> - ovs-ofctl dump-flows br-int | >> + ovs-ofctl dump-flows br-int | grep -v "NXST_FLOW reply" | >> sed 's/cookie=0x[[^,]]*/cookie=xx/g' | >> sed 's/duration=[[^,]]*/duration=xx/g' | >> sed 's/idle_age=[[^,]]*/idle_age=xx/g' | >> @@ -221,7 +221,7 @@ m4_define([CHECK_FLOWS_AFTER_RECOMPUTE], [ >> sbox=$2 >> # Make sure I+P has finalized his job before getting flows and >> comparing them after recompte. >> # Some tests have northd and ovn-nb ovsdb stopped, so avoid ovn-nbctl >> for those. >> - if [[ -e ovn-nb/ovn-nb.sock ]] && [[ -e northd/ovn-northd.pid ]]; then >> + if [[ -e ${AZ_DIR}ovn-nb/ovn-nb.sock ]] && [[ -e >> ${AZ_DIR}northd/ovn-northd.pid ]]; then >> # Do wait twice to handle some potential race conditions >> check ovn-nbctl --wait=hv sync >> check ovn-nbctl --wait=hv sync >> @@ -230,15 +230,15 @@ m4_define([CHECK_FLOWS_AFTER_RECOMPUTE], [ >> as $sbox >> if test "$hv" != "vtep"; then >> # Get flows before and after recompute >> - DUMP_FLOWS([$sbox], [flows-$hv-1]) >> + DUMP_FLOWS([$sbox], [flows-$hv-before]) >> >> check ovn-appctl -t ovn-controller recompute >> # The recompute might cause some sb changes. Let controller catch >> up. >> - if [[ -e ovn-nb/ovn-nb.sock ]] && [[ -e northd/ovn-northd.pid ]]; >> then >> + if [[ -e ${AZ_DIR}ovn-nb/ovn-nb.sock ]] && [[ -e >> ${AZ_DIR}northd/ovn-northd.pid ]]; then >> check ovn-nbctl --wait=hv sync >> fi >> - DUMP_FLOWS([$sbox], [flows-$hv-2]) >> - diff flows-$hv-1 flows-$hv-2 > flow-diff >> + DUMP_FLOWS([$sbox], [flows-$hv-after]) >> + diff flows-$hv-before flows-$hv-after > flow-diff >> AT_CHECK([wc -l < flow-diff], [0], [0 >> ]) >> fi >> @@ -251,7 +251,7 @@ m4_define([CHECK_RELATED_PORTS_AFTER_RECOMPUTE], [ >> AT_CAPTURE_FILE([related-ports-diff]) >> # Make sure I+P has finalized the job before getting flows and >> comparing them after recompte. >> # Some tests have northd and ovn-nb ovsdb stopped, so avoid ovn-nbctl >> for those. >> - if [[ -e ovn-nb/ovn-nb.sock ]] && [[ -e northd/ovn-northd.pid ]]; then >> + if [[ -e ${AZ_DIR}ovn-nb/ovn-nb.sock ]] && [[ -e >> ${AZ_DIR}northd/ovn-northd.pid ]]; then >> # Do wait twice to handle some potential race conditions >> check ovn-nbctl --wait=hv sync >> check ovn-nbctl --wait=hv sync >> @@ -262,7 +262,7 @@ m4_define([CHECK_RELATED_PORTS_AFTER_RECOMPUTE], [ >> DUMP_RELATED_PORTS([$sbox], [related-ports-$hv-before]) >> check ovn-appctl -t ovn-controller recompute >> # The recompute might cause some sb changes. Let controller catch >> up. >> - if [[ -e ovn-nb/ovn-nb.sock ]] && [[ -e northd/ovn-northd.pid ]]; >> then >> + if [[ -e ${AZ_DIR}ovn-nb/ovn-nb.sock ]] && [[ -e >> ${AZ_DIR}northd/ovn-northd.pid ]]; then >> check ovn-nbctl --wait=hv sync >> fi >> DUMP_RELATED_PORTS([$sbox], [related-ports-$hv-after]) >> @@ -275,14 +275,20 @@ m4_define([CHECK_RELATED_PORTS_AFTER_RECOMPUTE], [ >> fi >> ]) >> >> +# CHECK_AFTER_RECOMPUTE(hv, sbox, related_ports, ignored_dp) >> +# Check related ports (ignoring $related_ports) and FLOWS (ignoring flows >> from $ignored_dp and from $ignored_tables) >> +# by comparing IP flows and flows after recompute. >> +# We also ignore Flows from tables OFTABLE_MAC_BINDING, >> OFTABLE_MAC_LOOKUP, OFTABLE_MAC_CACHE_USE. >> m4_define([CHECK_AFTER_RECOMPUTE], [ >> hv=$1 >> sbox=$2 >> related_ports=$3 >> + ignored_dp=$4 >> + ignored_tables=$5 >> AT_CAPTURE_FILE([related-ports-diff]) >> # Make sure I+P has finalized his job before getting flows and >> comparing them after recompte. >> # Some tests have northd and ovn-nb ovsdb stopped, so avoid ovn-nbctl >> for those. >> - if [[ -e ovn-nb/ovn-nb.sock ]] && [[ -e northd/ovn-northd.pid ]]; then >> + if [[ -e ${AZ_DIR}ovn-nb/ovn-nb.sock ]] && [[ -e >> ${AZ_DIR}northd/ovn-northd.pid ]]; then >> # Do wait twice to handle some potential race conditions >> check ovn-nbctl --wait=hv sync >> check ovn-nbctl --wait=hv sync >> @@ -290,25 +296,59 @@ m4_define([CHECK_AFTER_RECOMPUTE], [ >> >> as $sbox >> if test "$hv" != "vtep"; then >> - DUMP_RELATED_PORTS([$sbox], [related-ports-$hv-1]) >> - DUMP_FLOWS([$sbox], [flows-$hv-1]) >> + DUMP_RELATED_PORTS([$sbox], [related-ports-$hv-before]) >> + DUMP_FLOWS([$sbox], [flows-$hv-before]) >> check ovn-appctl -t ovn-controller recompute >> + echo "Checking after recompute for $sbox" >> # The recompute might cause some sb changes. Let controller catch >> up. >> - if [[ -e ovn-nb/ovn-nb.sock ]] && [[ -e northd/ovn-northd.pid ]]; >> then >> + if [[ -e ${AZ_DIR}ovn-nb/ovn-nb.sock ]] && [[ -e >> ${AZ_DIR}northd/ovn-northd.pid ]]; then >> check ovn-nbctl --wait=hv sync >> fi >> - DUMP_RELATED_PORTS([$sbox], [related-ports-$hv-2]) >> - DUMP_FLOWS([$sbox], [flows-$hv-2]) >> - # Compare and store differences before and after recompute >> - comm -3 related-ports-$hv-1 related-ports-$hv-2 > >> related-ports-diff-$hv >> + DUMP_RELATED_PORTS([$sbox], [related-ports-$hv-after]) >> + DUMP_FLOWS([$sbox], [flows-$hv-after]) >> + >> + # Compare and store related_ports differences before and after >> recompute >> + comm -3 related-ports-$hv-before related-ports-$hv-after > >> related-ports-diff-$hv >> # Ignore some differences. >> + if [[ -n "$related_ports" ]]; then >> + echo "Ignoring related ports $related_ports" >> + fi >> echo "$related_ports" | comm -2 -3 related-ports-diff-$hv - > >> related-ports-diff >> AT_CHECK([wc -l < related-ports-diff], [0], [0 >> ]) >> - diff flows-$hv-1 flows-$hv-2 > flow-diff >> - AT_CHECK([wc -l < flow-diff], [0], [0 >> + >> + # Compare and store flow differences before and after recompute >> + diff -u flows-$hv-before flows-$hv-after | grep "^+ \|^- " > >> flow-diff-$hv >> + if [[ -n "$related_ports" ]]; then >> + tag=$(ovn-sbctl --bare --columns tag list port_binding >> $related_ports) >> + if [[ -n "$tag" ]]; then >> + echo "Ignoring tag $tag" >> + cat flow-diff-$hv | grep -v "dl_vlan=$tag" > tmp; mv tmp >> flow-diff-$hv >> + fi >> + fi >> + >> + dps=$(echo "$ignored_dp" | tr ',' ' ') >> + for dp in $dps >> + do >> + if [[ -n "${dp}" ]]; then >> + key=$(printf '0x%x' $(ovn-sbctl --bare --columns tunnel_key >> list datapath $dp)) >> + if [[ -n "${key}" ]]; then >> + echo "Ignoring $dp i.e. tunnel_key $key" >> + cat flow-diff-$hv | grep -v "metadata=$key" | grep -v >> "load:${key}->OXM_OF_METADATA" > tmp; mv tmp flow-diff-$hv >> + fi >> + fi >> + done >> + >> + tables=$(echo "$ignored_tables" | tr ',' ' ') >> + for table in $tables >> + do >> + echo "Ignoring table $table" >> + cat flow-diff-$hv | grep -v "table=$table" > tmp; mv tmp >> flow-diff-$hv >> + done >> + >> + cat flow-diff-$hv | grep -v "table=OFTABLE_MAC_BINDING" | grep -v >> "table=OFTABLE_MAC_LOOKUP" | grep -v "table=OFTABLE_MAC_CACHE_USE" > tmp; >> mv tmp flow-diff-$hv >> + AT_CHECK([wc -l < flow-diff-$hv], [0], [0 >> ]) >> - diff flows-$hv-1 flows-$hv-2 > flow-diff >> fi >> ]) >> >> @@ -322,8 +362,11 @@ m4_define([OVN_CLEANUP_CONTROLLER],[ >> hv=$1 >> sbox=$2 >> related_ports=$3 >> + no_recompute_check=$4 >> + if test "$no_recompute_check" != "True"; then >> + CHECK_RELATED_PORTS_AFTER_RECOMPUTE([$hv], [$sbox], >> [$related_ports]) >> + fi >> >> - CHECK_RELATED_PORTS_AFTER_RECOMPUTE([$hv], [$sbox], [$related_ports]) >> if test "$hv" = "vtep"; then >> OVS_APP_EXIT_AND_WAIT([ovn-controller-vtep]) >> OVS_APP_EXIT_AND_WAIT([ovs-vtep]) >> @@ -342,8 +385,10 @@ m4_define([OVN_CLEANUP_SBOX],[ >> sbox=$1 >> error=$2 >> related_ports=$3 >> + no_recompute_check=$4 >> echo "$sbox: clean up sandbox" >> - OVN_CLEANUP_CONTROLLER([$sbox], [$sbox], [$related_ports]) >> + as $sbox >> + OVN_CLEANUP_CONTROLLER([$sbox], [$sbox], [$related_ports], >> [$no_recompute_check]) >> OVN_CLEANUP_VSWITCH([$sbox]) >> >> # Check for errors in logs. Ignore following errors: >> @@ -372,7 +417,12 @@ m4_define([OVN_CLEANUP],[ >> sbox=$(echo "sbox_and_error" |sed -n '1p') >> error=$(echo "sbox_and_error" | grep '/') >> related_ports=$(echo "sbox_and_error" | sed -n '2,$p' | grep -v >> '/') >> - OVN_CLEANUP_SBOX([$sbox], [$error], [$related_ports]) >> + CHECK_RELATED_PORTS_AFTER_RECOMPUTE([$sbox], [$sbox], >> [$related_ports]) >> + ]) >> + m4_foreach([sbox_and_error], [$@], [ >> + sbox=$(echo "sbox_and_error" |sed -n '1p') >> + error=$(echo "sbox_and_error" | grep '/') >> + OVN_CLEANUP_SBOX([$sbox], [$error], [], ["True"]) >> ]) >> >> echo >> @@ -753,6 +803,7 @@ start_virtual_controller() { >> # ovn_setenv AZ >> ovn_setenv () { >> local d=$ovs_base/$1 >> + AS_VAR_SET([AZ_DIR], ["$d"/]); export $var >> AS_VAR_SET([OVN_NB_DB], [unix:"$d"/ovn-nb/ovn-nb.sock]); export $var >> AS_VAR_SET([OVN_SB_DB], [unix:"$d"/ovn-sb/ovn-sb.sock]); export $var >> } >> diff --git a/tests/ovn.at b/tests/ovn.at >> index f4abfb345..3d159d02b 100644 >> --- a/tests/ovn.at >> +++ b/tests/ovn.at >> @@ -9925,7 +9925,10 @@ n_arp=$(tcpdump -c 10 -ner hv1/snoopvif-tx.pcap arp >> | wc -l) >> test "$n_arp" = 10 >> ]) >> >> -OVN_CLEANUP([hv1],[hv2]) >> +# lr0 & ls0 should not be local datapath on hv2 anymore, and hence >> +# ln_port should not be a related_port on hv2. >> +OVN_CLEANUP([hv1],[hv2 >> +ln_port]) >> >> AT_CLEANUP >> ]) >> @@ -25162,12 +25165,13 @@ m4_define([DVR_N_S_ARP_HANDLING], >> as hv4 ovs-appctl fdb/show br-phys >> >> # Expect some potential transaction errors for MAC_Binding >> - # Since router-to-underlay moved to hv3, ls-underlay is not local to >> hv1 anymore and ln3 should not be a related_port. >> + # Since router-to-underlay moved to hv3, ls-underlay is not local to >> hv1 or hv2 anymore. >> + # As ls-underlay is not local, ln3 should not be a related_port. >> OVN_CLEANUP([hv1 >> /transaction error/d >> ln3],[hv2 >> /transaction error/d >> - ],[hv3 >> +ln3],[hv3 >> /transaction error/d >> ],[hv4]) >> >> @@ -26352,7 +26356,10 @@ wait_row_count Service_Monitor 1 >> wait_row_count Service_Monitor 0 status=offline >> wait_row_count Service_Monitor 0 status=online >> >> -OVN_CLEANUP([hv1], [hv2]) >> +# lr0_public is on hv1. Hence lr0 & public should not be local datapaths >> on hv2, and ln_public not a related port for hv2. >> +OVN_CLEANUP([hv1], [hv2 >> +ln-public]) >> + >> AT_CLEANUP >> ]) >> >> @@ -26550,7 +26557,10 @@ wait_row_count Service_Monitor 1 >> wait_row_count Service_Monitor 0 status=offline >> wait_row_count Service_Monitor 0 status=online >> >> -OVN_CLEANUP([hv1], [hv2]) >> +# lr0_public is on hv1. Hence lr0 & public should not be local datapaths >> on hv2, and ln_public not a related port for hv2. >> +OVN_CLEANUP([hv1], [hv2 >> +ln-public]) >> + >> AT_CLEANUP >> ]) >> >> @@ -26669,7 +26679,10 @@ check_row_count Service_Monitor 0 >> >> AT_CHECK([test 1 = `grep -c "SCTP load balancers do not currently support >> health checks" northd/ovn-northd.log`]) >> >> -OVN_CLEANUP([hv1],[hv2]) >> +# lr0_public is on hv1. Hence lr0 & public should not be local datapaths >> on hv2, and ln_public not a related port for hv2. >> +OVN_CLEANUP([hv1], [hv2 >> +ln-public]) >> + >> AT_CLEANUP >> ]) >> >> @@ -34368,13 +34381,22 @@ test_ip vif11 f00000000010 06ac10010001 $sip >> $dip vif-north3 >> OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv5/vif-north3-tx.pcap], >> [vif-north3.expected]) >> >> # Since DR-S1, DR-S2 and DR-S3 are gw-chassis on resp. hv2, hv3 and hv4, >> -# S1, S2 and S3 should not be local datapaths on hv1, and their localnets >> +# S1, S2 and S3 should not be local datapaths on the other hvs, and their >> localnets >> # should not be related ports. >> OVN_CLEANUP([hv1 >> ln1 >> ln2 >> ln3 >> -],[hv2],[hv3],[hv4],[hv5]) >> +],[hv2 >> +ln2 >> +ln3],[hv3 >> +ln1 >> +ln3],[hv4 >> +ln1 >> +ln2],[hv5 >> +ln1 >> +ln2 >> +ln3]) >> AT_CLEANUP >> ]) >> >> @@ -40499,7 +40521,9 @@ OVS_WAIT_UNTIL([test $(as hv2 ovs-ofctl dump-flows >> br-int table=OFTABLE_OUTPUT_L >> check ovn-nbctl --wait=hv lsp-del multi >> OVS_WAIT_UNTIL([test $(as hv2 ovs-ofctl dump-flows br-int >> table=OFTABLE_OUTPUT_LARGE_PKT_DETECT | grep -c check_pkt_larger) -eq 0]) >> >> -OVN_CLEANUP([hv1],[hv2]) >> +# ls (and it's localnet ln) should not be local datapath to hv2 as multi >> was deleted. Still local datapath to hv1 due to lsp1. >> +OVN_CLEANUP([hv1],[hv2 >> +ln]) >> >> AT_CLEANUP >> ]) >> -- >> 2.47.1 >> >> _______________________________________________ >> dev mailing list >> d...@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> > Acked-by: Ales Musil <amu...@redhat.com> > Applied to main. Regards, Dumitru _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev