On 6/16/26 5:46 PM, Xavier Simonart via dev wrote:
> As in [0], multiple load balancing system tests are randomly failing
> as they check that, after 10 or 20 requests sent to the load balancer,
> every backend has been reached at least once. Statistically, such a
> check is bound to fail from time to time.
> [1] fixed such issues, but there are new occurrences.
> If after 10 requests we did not get the expected distribution, we
> send 10 more requests. We do that up to 30 times.

Hi, Xavier.  Are you sure this is what is happening here?
The chance that all 20 requests are sent to the same backend
is supposed to be about 1 in 2^20, which is very small, so
it should not really happen in practice.  Maybe there is a
different reason here after all?  How frequently do you see
the test failures?
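For reference, the arithmetic above can be checked with a short Python sketch. It assumes two backends and that each request independently picks one of them uniformly at random. That is a simplification: OVN picks backends through an OpenFlow select group driven by a packet hash, so the model only approximates reality when the hash inputs (e.g. source ports) differ between requests. Under this model, all 20 requests landing on one particular backend has probability 1 in 2^20, and "some backend is never reached" (either one) is twice that, 1 in 2^19. The function names are illustrative only.

```python
from fractions import Fraction
from math import comb


def p_all_to_specific(k: int, n: int) -> Fraction:
    """Probability that all n requests hit one particular backend out of k."""
    return Fraction(1, k) ** n


def p_backend_missed(k: int, n: int) -> Fraction:
    """Probability that at least one of k backends receives none of n requests.

    Assumes uniform, independent backend selection per request.
    Inclusion-exclusion over the number j of missed backends:
    sum_{j=1}^{k-1} (-1)^(j+1) * C(k, j) * ((k - j) / k)^n
    """
    total = Fraction(0)
    for j in range(1, k):
        total += (-1) ** (j + 1) * comb(k, j) * Fraction(k - j, k) ** n
    return total


if __name__ == "__main__":
    # Two backends (assumed); 10 and 20 requests per check.
    for n in (10, 20):
        specific = p_all_to_specific(2, n)
        missed = p_backend_missed(2, n)
        print(f"n={n}: one specific backend gets all = {specific}, "
              f"some backend missed = {missed} (~{float(missed):.3g})")
```

Under the same model, a 10-request check misses a backend with probability 1 in 512, so 10-request checks would be expected to be noticeably flakier than 20-request ones.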

> 
> [0] https://github.com/ovsrobot/ovn/actions/runs/27547031217/job/81423590350
> [1] c906da4f1dea: tests: Fixed load balancing system-tests
> 
> Fixes: 40a686e8e70f ("Add IPv6 support for lb health-check")
> Fixes: 33cfa4655fd7 ("tests: Move SCTP test from kernel only to general OVN system tests.")
> Fixes: da5529438342 ("northd: Do not drop ip traffic with destination vip expressed via template vars.")
> Signed-off-by: Xavier Simonart <[email protected]>
> ---
>  tests/system-ovn.at | 84 +++++++++++++++++++++------------------------
>  1 file changed, 39 insertions(+), 45 deletions(-)
> 
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index 35df0ec2f..2cadbc6a7 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -5143,15 +5143,15 @@ OVS_WAIT_UNTIL(
>  )
>  
>  # From sw0-p2 send traffic to vip - 2001::a
> -for i in `seq 1 20`; do
> -    echo Request $i
> -    ovn-sbctl list service_monitor
> -    NS_CHECK_EXEC([sw0-p2], [wget http://[[2001::a]] -t 5 -T 1 --retry-connrefused -v -o wget$i.log])
> -done
> +OVS_WAIT_FOR_OUTPUT([
> +    for i in `seq 1 20`; do
> +        ovn-sbctl list service_monitor >> service_monitor.log
> +        NS_EXEC([sw0-p2], [wget http://[[2001::a]] -t 5 -T 1 --retry-connrefused -v -o wget$i.log])

I don't think replacing NS_CHECK_EXEC with a plain NS_EXEC
is a good change.  As explained in commit:
  b087f2556514 ("tests: system-ovn: Fix force SNAT IP in load-balancer template test.")
It will take forever for this test to fail if there is an actual
issue in the pipeline and the packets are not delivered / conntrack
entries are not created: a failing wget no longer fails the check,
so OVS_WAIT_FOR_OUTPUT just re-runs the whole loop, with every wget
going through all of its retries and timeouts.  It will take about
2.5 hours for the test to actually fail, IIUC.  We should not have that.
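The "about 2.5 hours" figure can be reproduced with a rough Python sketch. All inputs are assumptions, not measurements: wget with -t 5 -T 1 gives each attempt about 1 s and, with its default --waitretry of 10 s, waits 1, 2, 3 and 4 s between the five attempts; the loop sends 20 requests per pass; and OVS_WAIT_FOR_OUTPUT re-runs the body about 30 times, as the commit message says. The helper names are hypothetical.

```python
# Rough worst-case wall-clock estimate when every wget in the loop fails.
# All constants are assumptions taken from the patch and wget defaults.

WGET_TRIES = 5          # wget -t 5
WGET_TIMEOUT_S = 1      # wget -T 1: each attempt gives up after ~1 s
WGET_WAITRETRY_S = 10   # assumed wget default cap for linear retry backoff
REQUESTS_PER_PASS = 20  # for i in `seq 1 20`
WAIT_PASSES = 30        # assumed number of times the wait loop re-runs the body


def failing_wget_seconds(tries: int, timeout_s: int) -> int:
    """Time spent by one wget that never succeeds: every attempt times out,
    plus linear backoff of 1 s, 2 s, ... (capped) between attempts."""
    attempts = tries * timeout_s
    backoff = sum(min(i, WGET_WAITRETRY_S) for i in range(1, tries))
    return attempts + backoff


def worst_case_seconds(passes: int, requests: int, per_request_s: int) -> int:
    """Every pass runs every request to completion before the next check."""
    return passes * requests * per_request_s


if __name__ == "__main__":
    per_wget = failing_wget_seconds(WGET_TRIES, WGET_TIMEOUT_S)
    total = worst_case_seconds(WAIT_PASSES, REQUESTS_PER_PASS, per_wget)
    print(f"one failing wget: ~{per_wget} s")
    print(f"NS_CHECK_EXEC: first failing wget fails the check after ~{per_wget} s")
    print(f"NS_EXEC inside the wait loop: ~{total} s = {total / 3600:.1f} h")
```

The point of the comparison is the multiplier: with NS_CHECK_EXEC the cost of a real failure is one wget's retries, while inside the wait loop it is multiplied by both the request count and the number of passes.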

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
