On 3/14/25 11:05 AM, Ales Musil wrote:
> On Thu, Mar 13, 2025 at 9:33 AM Xavier Simonart <xsimo...@redhat.com> wrote:
> 
>> - Add ability to ignore some specific datapath or tables when comparing
>> flows.
>>   This is necessary as, as of today, ovn does not properly remove datapath
>>   from list of local datapaths when some prots are removed.
>> - Properly check ovn-nb/ovn-nb.sock when AZ is present.
>> - Ignore flow differences in OFTABLE_MAC_BINDING, OFTABLE_MAC_LOOKUP
>>   and OFTABLE_MAC_CACHE_USE.
>>
>> Signed-off-by: Xavier Simonart <xsimo...@redhat.com>
>>
>> ---
>> v2: - avoid using bash arrays.
>>     - use diff instead of comm -3 for flow diff as more readable.
>>     - avoid cat file | grep ... > file.
>>     - ignore flow differences in OFTABLE_MAC_BINDING, OFTABLE_MAC_LOOKUP
>>       and OFTABLE_MAC_CACHE_USE.
>> v3: - Properly check ovn-nb/ovn-nb.sock when AZ is present in
>> CHECK_RELATED_PORTS.
>>     - Add possibility to ignore some tables when comparing I+P and
>> recompute flows.
>>     - When multiple hv were cleaned, v1 & v2 resulted in
>> checking_recompute for
>>       hv1, cleanup hv1, checking_recompute for hv2, cleanup hv2.
>>       However, cleaning up hv1 usually results in chassis removal in hv2
>> and
>>       multiple node recomputes in hv2. We now first check_recomputes on
>> all hv,
>>       then clean the hv. This highlighted a few more tests or ovn
>> I+P/recompute
>>       differences on hv2, hv3...
>> v4: - Fix those I+P/recompute differences in this patch (was erroneoulsy
>> fixed in
>>       next patch of the serie).
>> ---
>>
> 
> Hi Xavier,

Hi Xavier, Ales,

> thank you for the patch.
> 
> I'm slightly afraid that we are increasing complexity of tests in general,
> now everyone will need to understand what is ok to ignore and why.
> Not sure what to do about it though so I won't block the series, but
> any idea how to improve it is welcome I suppose.
> 

CI flakes are awful.  I agree that this is a bit more complex but let's
see how it goes.  I only backported the bug fix patches from this series
to stable branches.  However, I suggest that if the CI turns out to be
stable on main for a while we should also backport the test patches too.

>  tests/ovn-macros.at | 95 ++++++++++++++++++++++++++++++++++-----------
>>  tests/ovn.at        | 42 +++++++++++++++-----
>>  2 files changed, 106 insertions(+), 31 deletions(-)
>>
>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
>> index 6a568e743..cdfa503da 100644
>> --- a/tests/ovn-macros.at
>> +++ b/tests/ovn-macros.at
>> @@ -196,7 +196,7 @@ m4_define([DUMP_FLOWS], [
>>      sbox=$1
>>      output_file=$2
>>      as $sbox
>> -    ovs-ofctl dump-flows br-int |
>> +    ovs-ofctl dump-flows br-int  | grep -v "NXST_FLOW reply" |
>>            sed 's/cookie=0x[[^,]]*/cookie=xx/g' |
>>            sed 's/duration=[[^,]]*/duration=xx/g' |
>>            sed 's/idle_age=[[^,]]*/idle_age=xx/g' |
>> @@ -221,7 +221,7 @@ m4_define([CHECK_FLOWS_AFTER_RECOMPUTE], [
>>      sbox=$2
>>      # Make sure I+P has finalized his job before getting flows and
>> comparing them after recompte.
>>      # Some tests have northd and ovn-nb ovsdb stopped, so avoid ovn-nbctl
>> for those.
>> -    if [[ -e ovn-nb/ovn-nb.sock ]] && [[ -e northd/ovn-northd.pid ]]; then
>> +    if [[ -e ${AZ_DIR}ovn-nb/ovn-nb.sock ]] && [[ -e
>> ${AZ_DIR}northd/ovn-northd.pid ]]; then
>>          # Do wait twice to handle some potential race conditions
>>          check ovn-nbctl --wait=hv sync
>>          check ovn-nbctl --wait=hv sync
>> @@ -230,15 +230,15 @@ m4_define([CHECK_FLOWS_AFTER_RECOMPUTE], [
>>      as $sbox
>>      if test "$hv" != "vtep"; then
>>        # Get flows before and after recompute
>> -      DUMP_FLOWS([$sbox], [flows-$hv-1])
>> +      DUMP_FLOWS([$sbox], [flows-$hv-before])
>>
>>        check ovn-appctl -t ovn-controller recompute
>>        # The recompute might cause some sb changes. Let controller catch
>> up.
>> -      if [[ -e ovn-nb/ovn-nb.sock ]] && [[ -e northd/ovn-northd.pid ]];
>> then
>> +      if [[ -e ${AZ_DIR}ovn-nb/ovn-nb.sock ]] && [[ -e
>> ${AZ_DIR}northd/ovn-northd.pid ]]; then
>>            check ovn-nbctl --wait=hv sync
>>        fi
>> -      DUMP_FLOWS([$sbox], [flows-$hv-2])
>> -      diff flows-$hv-1 flows-$hv-2 > flow-diff
>> +      DUMP_FLOWS([$sbox], [flows-$hv-after])
>> +      diff flows-$hv-before flows-$hv-after > flow-diff
>>        AT_CHECK([wc -l < flow-diff], [0], [0
>>  ])
>>      fi
>> @@ -251,7 +251,7 @@ m4_define([CHECK_RELATED_PORTS_AFTER_RECOMPUTE], [
>>      AT_CAPTURE_FILE([related-ports-diff])
>>      # Make sure I+P has finalized the job before getting flows and
>> comparing them after recompte.
>>      # Some tests have northd and ovn-nb ovsdb stopped, so avoid ovn-nbctl
>> for those.
>> -    if [[ -e ovn-nb/ovn-nb.sock ]] && [[ -e northd/ovn-northd.pid ]]; then
>> +    if [[ -e ${AZ_DIR}ovn-nb/ovn-nb.sock ]] && [[ -e
>> ${AZ_DIR}northd/ovn-northd.pid ]]; then
>>          # Do wait twice to handle some potential race conditions
>>          check ovn-nbctl --wait=hv sync
>>          check ovn-nbctl --wait=hv sync
>> @@ -262,7 +262,7 @@ m4_define([CHECK_RELATED_PORTS_AFTER_RECOMPUTE], [
>>        DUMP_RELATED_PORTS([$sbox], [related-ports-$hv-before])
>>        check ovn-appctl -t ovn-controller recompute
>>        # The recompute might cause some sb changes. Let controller catch
>> up.
>> -      if [[ -e ovn-nb/ovn-nb.sock ]] && [[ -e northd/ovn-northd.pid ]];
>> then
>> +      if [[ -e ${AZ_DIR}ovn-nb/ovn-nb.sock ]] && [[ -e
>> ${AZ_DIR}northd/ovn-northd.pid ]]; then
>>            check ovn-nbctl --wait=hv sync
>>        fi
>>        DUMP_RELATED_PORTS([$sbox], [related-ports-$hv-after])
>> @@ -275,14 +275,20 @@ m4_define([CHECK_RELATED_PORTS_AFTER_RECOMPUTE], [
>>      fi
>>  ])
>>
>> +# CHECK_AFTER_RECOMPUTE(hv, sbox, related_ports, ignored_dp)
>> +# Check related ports (ignoring $related_ports) and FLOWS (ignoring flows
>> from $ignored_dp and from $ignored_tables)
>> +# by comparing IP flows and flows after recompute.
>> +# We also ignore Flows from tables OFTABLE_MAC_BINDING,
>> OFTABLE_MAC_LOOKUP, OFTABLE_MAC_CACHE_USE.
>>  m4_define([CHECK_AFTER_RECOMPUTE], [
>>      hv=$1
>>      sbox=$2
>>      related_ports=$3
>> +    ignored_dp=$4
>> +    ignored_tables=$5
>>      AT_CAPTURE_FILE([related-ports-diff])
>>      # Make sure I+P has finalized his job before getting flows and
>> comparing them after recompte.
>>      # Some tests have northd and ovn-nb ovsdb stopped, so avoid ovn-nbctl
>> for those.
>> -    if [[ -e ovn-nb/ovn-nb.sock ]] && [[ -e northd/ovn-northd.pid ]]; then
>> +    if [[ -e ${AZ_DIR}ovn-nb/ovn-nb.sock ]] && [[ -e
>> ${AZ_DIR}northd/ovn-northd.pid ]]; then
>>          # Do wait twice to handle some potential race conditions
>>          check ovn-nbctl --wait=hv sync
>>          check ovn-nbctl --wait=hv sync
>> @@ -290,25 +296,59 @@ m4_define([CHECK_AFTER_RECOMPUTE], [
>>
>>      as $sbox
>>      if test "$hv" != "vtep"; then
>> -      DUMP_RELATED_PORTS([$sbox], [related-ports-$hv-1])
>> -      DUMP_FLOWS([$sbox], [flows-$hv-1])
>> +      DUMP_RELATED_PORTS([$sbox], [related-ports-$hv-before])
>> +      DUMP_FLOWS([$sbox], [flows-$hv-before])
>>        check ovn-appctl -t ovn-controller recompute
>> +      echo "Checking after recompute for $sbox"
>>        # The recompute might cause some sb changes. Let controller catch
>> up.
>> -      if [[ -e ovn-nb/ovn-nb.sock ]] && [[ -e northd/ovn-northd.pid ]];
>> then
>> +      if [[ -e ${AZ_DIR}ovn-nb/ovn-nb.sock ]] && [[ -e
>> ${AZ_DIR}northd/ovn-northd.pid ]]; then
>>            check ovn-nbctl --wait=hv sync
>>        fi
>> -      DUMP_RELATED_PORTS([$sbox], [related-ports-$hv-2])
>> -      DUMP_FLOWS([$sbox], [flows-$hv-2])
>> -      # Compare and store differences before and after recompute
>> -      comm -3 related-ports-$hv-1 related-ports-$hv-2 >
>> related-ports-diff-$hv
>> +      DUMP_RELATED_PORTS([$sbox], [related-ports-$hv-after])
>> +      DUMP_FLOWS([$sbox], [flows-$hv-after])
>> +
>> +      # Compare and store related_ports differences before and after
>> recompute
>> +      comm -3 related-ports-$hv-before related-ports-$hv-after >
>> related-ports-diff-$hv
>>        # Ignore some differences.
>> +      if [[ -n "$related_ports" ]]; then
>> +        echo "Ignoring related ports $related_ports"
>> +      fi
>>        echo "$related_ports" | comm -2 -3 related-ports-diff-$hv - >
>> related-ports-diff
>>        AT_CHECK([wc -l < related-ports-diff], [0], [0
>>  ])
>> -      diff flows-$hv-1 flows-$hv-2 > flow-diff
>> -      AT_CHECK([wc -l < flow-diff], [0], [0
>> +
>> +      # Compare and store flow differences before and after recompute
>> +      diff -u flows-$hv-before flows-$hv-after | grep "^+ \|^- " >
>> flow-diff-$hv
>> +      if [[ -n "$related_ports" ]]; then
>> +        tag=$(ovn-sbctl --bare --columns tag list port_binding
>> $related_ports)
>> +        if [[ -n "$tag" ]]; then
>> +          echo "Ignoring tag $tag"
>> +          cat flow-diff-$hv | grep -v "dl_vlan=$tag" > tmp; mv tmp
>> flow-diff-$hv
>> +        fi
>> +      fi
>> +
>> +      dps=$(echo "$ignored_dp" | tr ',' ' ')
>> +      for dp in $dps
>> +      do
>> +        if [[ -n "${dp}" ]]; then
>> +          key=$(printf '0x%x' $(ovn-sbctl --bare --columns tunnel_key
>> list datapath $dp))
>> +          if [[ -n "${key}" ]]; then
>> +            echo "Ignoring $dp i.e. tunnel_key $key"
>> +            cat flow-diff-$hv | grep -v "metadata=$key" | grep -v
>> "load:${key}->OXM_OF_METADATA" > tmp; mv tmp flow-diff-$hv
>> +          fi
>> +        fi
>> +      done
>> +
>> +      tables=$(echo "$ignored_tables" | tr ',' ' ')
>> +      for table in $tables
>> +      do
>> +        echo "Ignoring table $table"
>> +        cat flow-diff-$hv | grep -v "table=$table" > tmp; mv tmp
>> flow-diff-$hv
>> +      done
>> +
>> +      cat flow-diff-$hv | grep -v "table=OFTABLE_MAC_BINDING" | grep -v
>> "table=OFTABLE_MAC_LOOKUP" | grep -v "table=OFTABLE_MAC_CACHE_USE" > tmp;
>> mv tmp flow-diff-$hv
>> +      AT_CHECK([wc -l < flow-diff-$hv], [0], [0
>>  ])
>> -      diff flows-$hv-1 flows-$hv-2 > flow-diff
>>      fi
>>  ])
>>
>> @@ -322,8 +362,11 @@ m4_define([OVN_CLEANUP_CONTROLLER],[
>>      hv=$1
>>      sbox=$2
>>      related_ports=$3
>> +    no_recompute_check=$4
>> +    if test "$no_recompute_check" != "True"; then
>> +        CHECK_RELATED_PORTS_AFTER_RECOMPUTE([$hv], [$sbox],
>> [$related_ports])
>> +    fi
>>
>> -    CHECK_RELATED_PORTS_AFTER_RECOMPUTE([$hv], [$sbox], [$related_ports])
>>      if test "$hv" = "vtep"; then
>>          OVS_APP_EXIT_AND_WAIT([ovn-controller-vtep])
>>          OVS_APP_EXIT_AND_WAIT([ovs-vtep])
>> @@ -342,8 +385,10 @@ m4_define([OVN_CLEANUP_SBOX],[
>>      sbox=$1
>>      error=$2
>>      related_ports=$3
>> +    no_recompute_check=$4
>>      echo "$sbox: clean up sandbox"
>> -    OVN_CLEANUP_CONTROLLER([$sbox], [$sbox], [$related_ports])
>> +    as $sbox
>> +    OVN_CLEANUP_CONTROLLER([$sbox], [$sbox], [$related_ports],
>> [$no_recompute_check])
>>      OVN_CLEANUP_VSWITCH([$sbox])
>>
>>      # Check for errors in logs. Ignore following errors:
>> @@ -372,7 +417,12 @@ m4_define([OVN_CLEANUP],[
>>          sbox=$(echo "sbox_and_error" |sed -n '1p')
>>          error=$(echo "sbox_and_error"  | grep '/')
>>          related_ports=$(echo "sbox_and_error" | sed -n '2,$p' | grep -v
>> '/')
>> -        OVN_CLEANUP_SBOX([$sbox], [$error], [$related_ports])
>> +        CHECK_RELATED_PORTS_AFTER_RECOMPUTE([$sbox], [$sbox],
>> [$related_ports])
>> +    ])
>> +    m4_foreach([sbox_and_error], [$@], [
>> +        sbox=$(echo "sbox_and_error" |sed -n '1p')
>> +        error=$(echo "sbox_and_error"  | grep '/')
>> +        OVN_CLEANUP_SBOX([$sbox], [$error], [], ["True"])
>>      ])
>>
>>      echo
>> @@ -753,6 +803,7 @@ start_virtual_controller() {
>>  # ovn_setenv AZ
>>  ovn_setenv () {
>>      local d=$ovs_base/$1
>> +    AS_VAR_SET([AZ_DIR], ["$d"/]); export $var
>>      AS_VAR_SET([OVN_NB_DB], [unix:"$d"/ovn-nb/ovn-nb.sock]); export $var
>>      AS_VAR_SET([OVN_SB_DB], [unix:"$d"/ovn-sb/ovn-sb.sock]); export $var
>>  }
>> diff --git a/tests/ovn.at b/tests/ovn.at
>> index f4abfb345..3d159d02b 100644
>> --- a/tests/ovn.at
>> +++ b/tests/ovn.at
>> @@ -9925,7 +9925,10 @@ n_arp=$(tcpdump -c 10 -ner hv1/snoopvif-tx.pcap arp
>> | wc -l)
>>  test "$n_arp" = 10
>>  ])
>>
>> -OVN_CLEANUP([hv1],[hv2])
>> +# lr0 & ls0 should not be local datapath on hv2 anymore, and hence
>> +# ln_port should not be a related_port on hv2.
>> +OVN_CLEANUP([hv1],[hv2
>> +ln_port])
>>
>>  AT_CLEANUP
>>  ])
>> @@ -25162,12 +25165,13 @@ m4_define([DVR_N_S_ARP_HANDLING],
>>     as hv4 ovs-appctl fdb/show br-phys
>>
>>     # Expect some potential transaction errors for MAC_Binding
>> -   # Since router-to-underlay moved to hv3, ls-underlay is not local to
>> hv1 anymore and ln3 should not be a related_port.
>> +   # Since router-to-underlay moved to hv3, ls-underlay is not local to
>> hv1 or hv2 anymore.
>> +   # As ls-underlay is not local, ln3 should not be a related_port.
>>     OVN_CLEANUP([hv1
>>     /transaction error/d
>>  ln3],[hv2
>>     /transaction error/d
>> -   ],[hv3
>> +ln3],[hv3
>>     /transaction error/d
>>     ],[hv4])
>>
>> @@ -26352,7 +26356,10 @@ wait_row_count Service_Monitor 1
>>  wait_row_count Service_Monitor 0 status=offline
>>  wait_row_count Service_Monitor 0 status=online
>>
>> -OVN_CLEANUP([hv1], [hv2])
>> +# lr0_public is on hv1. Hence lr0 & public should not be local datapaths
>> on hv2, and ln_public not a related port for hv2.
>> +OVN_CLEANUP([hv1], [hv2
>> +ln-public])
>> +
>>  AT_CLEANUP
>>  ])
>>
>> @@ -26550,7 +26557,10 @@ wait_row_count Service_Monitor 1
>>  wait_row_count Service_Monitor 0 status=offline
>>  wait_row_count Service_Monitor 0 status=online
>>
>> -OVN_CLEANUP([hv1], [hv2])
>> +# lr0_public is on hv1. Hence lr0 & public should not be local datapaths
>> on hv2, and ln_public not a related port for hv2.
>> +OVN_CLEANUP([hv1], [hv2
>> +ln-public])
>> +
>>  AT_CLEANUP
>>  ])
>>
>> @@ -26669,7 +26679,10 @@ check_row_count Service_Monitor 0
>>
>>  AT_CHECK([test 1 = `grep -c "SCTP load balancers do not currently support
>> health checks" northd/ovn-northd.log`])
>>
>> -OVN_CLEANUP([hv1],[hv2])
>> +# lr0_public is on hv1. Hence lr0 & public should not be local datapaths
>> on hv2, and ln_public not a related port for hv2.
>> +OVN_CLEANUP([hv1], [hv2
>> +ln-public])
>> +
>>  AT_CLEANUP
>>  ])
>>
>> @@ -34368,13 +34381,22 @@ test_ip vif11 f00000000010 06ac10010001 $sip
>> $dip vif-north3
>>  OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv5/vif-north3-tx.pcap],
>> [vif-north3.expected])
>>
>>  # Since DR-S1, DR-S2 and DR-S3 are gw-chassis on resp. hv2, hv3 and hv4,
>> -# S1, S2 and S3 should not be local datapaths on hv1, and their localnets
>> +# S1, S2 and S3 should not be local datapaths on the other hvs, and their
>> localnets
>>  # should not be related ports.
>>  OVN_CLEANUP([hv1
>>  ln1
>>  ln2
>>  ln3
>> -],[hv2],[hv3],[hv4],[hv5])
>> +],[hv2
>> +ln2
>> +ln3],[hv3
>> +ln1
>> +ln3],[hv4
>> +ln1
>> +ln2],[hv5
>> +ln1
>> +ln2
>> +ln3])
>>  AT_CLEANUP
>>  ])
>>
>> @@ -40499,7 +40521,9 @@ OVS_WAIT_UNTIL([test $(as hv2 ovs-ofctl dump-flows
>> br-int table=OFTABLE_OUTPUT_L
>>  check ovn-nbctl --wait=hv lsp-del multi
>>  OVS_WAIT_UNTIL([test $(as hv2 ovs-ofctl dump-flows br-int
>> table=OFTABLE_OUTPUT_LARGE_PKT_DETECT | grep -c check_pkt_larger) -eq 0])
>>
>> -OVN_CLEANUP([hv1],[hv2])
>> +# ls (and it's localnet ln) should not be local datapath to hv2 as multi
>> was deleted. Still local datapath to hv1 due to lsp1.
>> +OVN_CLEANUP([hv1],[hv2
>> +ln])
>>
>>  AT_CLEANUP
>>  ])
>> --
>> 2.47.1
>>
>> _______________________________________________
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
> Acked-by: Ales Musil <amu...@redhat.com>
> 

Applied to main.

Regards,
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to