On 6/17/26 6:07 PM, Xavier Simonart wrote:
> Hi Ilya,
> 
> Thanks for looking into this.
> 
> On Tue, Jun 16, 2026 at 7:48 PM Ilya Maximets <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     On 6/16/26 10:54 AM, Xavier Simonart via dev wrote:
>     > Starting a scapy server might take time on slow/busy systems, and cause
>     > some tests to fail in a flaky way when started in the middle of the 
> test.
>     >
>     > This caused some tests to be flaky such as:
>     > - "IP packet buffering"
>     > - "virtual port claim race condition"
>     >
>     > Note that macros such as send_garp do not check anymore to skip the
>     > test if HAVE_SCAPY is not set as it is done through OVN_SCAPY_REQUIRED.
>     > OVN_SCAPY_REQUIRED is now enforced: a test calling send_garp, send_na or
>     > fmt_pkt will always fail if OVN_SCAPY_REQUIRED is not set.
>     >
>     > Signed-off-by: Xavier Simonart <[email protected] 
> <mailto:[email protected]>>
>     > ---
>     >  tests/ovn-ic.at <http://ovn-ic.at>     |   2 +-
>     >  tests/ovn-macros.at <http://ovn-macros.at> |  12 ++---
>     >  tests/ovn-util.at <http://ovn-util.at>   |   1 -
>     >  tests/ovn.at <http://ovn.at>        | 128 
> +++++++++++++++++++++++---------------------
>     >  tests/system-ovn.at <http://system-ovn.at> |   6 +--
>     >  5 files changed, 76 insertions(+), 73 deletions(-)
>     >
>     > diff --git a/tests/ovn-ic.at <http://ovn-ic.at> b/tests/ovn-ic.at 
> <http://ovn-ic.at>
>     > index 68d78d9e4..35a531d89 100644
>     > --- a/tests/ovn-ic.at <http://ovn-ic.at>
>     > +++ b/tests/ovn-ic.at <http://ovn-ic.at>
>     > @@ -2872,7 +2872,7 @@ AT_CLEANUP
>     >  OVN_FOR_EACH_NORTHD([
>     >  AT_SETUP([spine-leaf: 3 AZs, 3 HVs, 3 LSs, connected via transit spine 
> switch])
>     >  AT_KEYWORDS([spine leaf])
>     > -AT_SKIP_IF([test $HAVE_SCAPY = no])
>     > +OVN_SCAPY_REQUIRED
>     > 
>     >  ovn_init_ic_db
>     > 
>     > diff --git a/tests/ovn-macros.at <http://ovn-macros.at> 
> b/tests/ovn-macros.at <http://ovn-macros.at>
>     > index 8744ff6b3..dd91af3b5 100644
>     > --- a/tests/ovn-macros.at <http://ovn-macros.at>
>     > +++ b/tests/ovn-macros.at <http://ovn-macros.at>
>     > @@ -536,6 +536,11 @@ m4_define([OVN_CLEANUP_IC],[
>     >      fi
>     >  ])
>     > 
>     > +m4_define([OVN_SCAPY_REQUIRED],[
> 
>     I wonder if CHECK_SCAPY might be a better name, i.e. more in line
>     with some other stuff like CHECK_CONNTRACK.  WDYT?
> 
> Yes, it is more aligned with it, will change in v2 and use CHECK_SCAPY. 
> 
> 
>     > +    AT_SKIP_IF([test $HAVE_SCAPY = no])
>     > +    start_scapy_server
>     > +])
>     > +
>     >  m4_divert_push([PREPARE_TESTS])
>     > 
>     >  # ovn_init_db DATABASE [AZ]
>     > @@ -1294,7 +1299,6 @@ hex_to_mac() {
>     >  }
>     > 
>     >  send_garp() {
>     > -    AT_SKIP_IF([test $HAVE_SCAPY = no])
>     >      local hv=$1 inport=$2 op=$3 eth_src=$4 eth_dst=$5 spa=$6 tpa=$7
>     > 
>     >      local packet=$(fmt_pkt "Ether(dst='${eth_dst}', src='${eth_src}')/ 
> \
>     > @@ -1380,10 +1384,6 @@ ovn_trace_client() {
>     >  #
>     >  fmt_pkt() {
>     >      ctlfile=$ovs_base/scapy.ctl
>     > -    if [[ ! -S $ctlfile ]]; then
>     > -        start_scapy_server
>     > -    fi
>     > -    while [[ ! -S $ctlfile ]]; do sleep 0.1; done
>     >      ovs-appctl -t $ctlfile payload "$1"
>     >  }
>     > 
>     > @@ -1396,6 +1396,7 @@ start_scapy_server() {
>     >      flock -n $lockfile "$top_srcdir"/tests/scapy-server.py \
>     >          --pidfile=$pidfile --unixctl=$ctlfile --log-file=$logfile 
> --detach \
>     >      && on_exit "test -e \"$pidfile\" && ovs-appctl -t $ctlfile exit"
>     > +    while [[ ! -S $ctlfile ]]; do sleep 0.1; done
> 
>     Maybe we can use something like:
> 
>     OVS_WAIT_UNTIL([ovs-appctl -t $ctlfile list-commands])
> 
>     ?
> 
>     I'm mostly concerned that the loop is unbounded, even though that was the
>     case before as well.  But this also should be a more reliable check as it
>     waits until the daemon is actually fully functional and is able to process
>     josnrpc requests.
> 
> I was trying to reduce the size of the change (also because I never saw a 
> failure related to that),
> but I agree it makes sense. I will also update this in v2.
> 
> 
>     If we do that in the fmt_pkt(), maybe we'd also not need the SKIP_IF 
> changes?
>     Though changing the SKIP_IF into a simpler macro seems a little nicer 
> indeed.
> 
> Not sure I understand this: if we want to start scapy server at the beginning 
> of the test,
> (what this patch is about) then we need to change all those AT_SKIP_IF([test 
> $HAVE_SCAPY = no])
> to a macro which starts it. 
> Starting Scapy at the beginning of the test helps on slow systems where 
> starting
> scapy can take quite some time, longer than some ovn-controller timeout.

Maybe worth clarifying that in the commit message.  I was under impression
that the flakiness was due to occasional pkt_fmt failures, not because of
other random activities in ovn-controller.  Though if that's the case,
I suppose, the tests could still fail on slow systems.  Do we need some ways
to control the ovn-controller better?  e.g. time/stop ?

> 
> I'll send v2 updating the macro name, bounding the loop, and add an ovs-appctl
> (probably ovs-appctl version as I do not think scapy-server.py supports 
> list-commands).

Indeed, python jsonrpc server doesn't have a list-commands in the default
set of commands...

> 
> 
>     Best regards, Ilya Maximets.
> 
> Thanks
> Xavier 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to