Hi Ilya,

Thanks for the review.

On Tue, Jun 16, 2026 at 6:30 PM Ilya Maximets <[email protected]> wrote:

> On 6/16/26 5:46 PM, Xavier Simonart via dev wrote:
> > As in [0], multiple load balancing system tests are randomly failing from
> > time to time as they check that, after 10 or 20 requests sent to load
> > balancer, all backends are at least reached once. Statistically, this is
> > failing from time to time.
> > [1] fixed such issues, but there are new occurrences.
> > If after 10 requests we did not get the expected distribution, we
> > send 10 more requests. We do that up to 30 times.
>
> Hi, Xavier.  Are you sure this is what is happening here?
> The chance that all 20 requests are sent to the same backend
> supposed to be 1 to 2^20, which is a very small chance and
> so it should not really happen in practice.  Maybe there is
> a different reason here after all?  How frequently you see
> the test failures?
>

It did not happen when we sent 20 requests, but it did occur in a test where
we "only" send 10 requests (see the fix around line 17733), and we can see
what's happening in the tcpdumps. I then changed all occurrences of that same
pattern. However, I agree that with 20 requests the probability becomes
really low.
With 10 requests it happens more often than we might think: we have roughly
two patches a day, ovs-robot runs 4x system-tests (gcc, clang, userspace,
dpdk), and we have roughly 40 occurrences of this pattern in system tests.
So we run through this ~300 times per day...

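As a rough sanity check of these numbers, here is a small sketch. It assumes two backends per load balancer and that each request picks a backend uniformly and independently; neither assumption is stated in this thread, and some tests may use more backends. The function name and the runs-per-day constant are only illustrative. The constant also treats all ~40 occurrences as if they sent the same number of requests, so the 10-request figure is an upper bound.

```python
from math import comb


def p_missed_backend(requests: int, backends: int = 2) -> float:
    """Probability that at least one backend gets no request.

    Assumes each request selects a backend uniformly and independently
    (an assumption for this sketch). Uses inclusion-exclusion over the
    number j of backends that are missed; the j == backends term is
    zero for requests > 0, so it is skipped.
    """
    return sum(
        (-1) ** (j + 1) * comb(backends, j) * (1 - j / backends) ** requests
        for j in range(1, backends)
    )


# Rough figure from the mail: patches/day * robot configs * pattern occurrences.
RUNS_PER_DAY = 2 * 4 * 40

for n in (10, 20):
    p = p_missed_backend(n)
    per_year = RUNS_PER_DAY * p * 365
    print(f"{n} requests: p ~ 1/{round(1 / p)}, "
          f"~{per_year:.2f} expected failures per year")
```

With two backends the probability reduces to 2 * (1/2)^n, since either backend can be the one that is never reached: about 1/512 for 10 requests and about 1/524288 for 20 requests.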
> >
> > [0] https://github.com/ovsrobot/ovn/actions/runs/27547031217/job/81423590350
> > [1] c906da4f1dea: tests: Fixed load balancing system-tests
> >
> > Fixes: 40a686e8e70f ("Add IPv6 support for lb health-check")
> > Fixes: 33cfa4655fd7 ("tests: Move SCTP test from kernel only to general OVN system tests.")
> > Fixes: da5529438342 ("northd: Do not drop ip traffic with destination vip expressed via template vars.")
> > Signed-off-by: Xavier Simonart <[email protected]>
> > ---
> >  tests/system-ovn.at | 84 +++++++++++++++++++++------------------------
> >  1 file changed, 39 insertions(+), 45 deletions(-)
> >
> > diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> > index 35df0ec2f..2cadbc6a7 100644
> > --- a/tests/system-ovn.at
> > +++ b/tests/system-ovn.at
> > @@ -5143,15 +5143,15 @@ OVS_WAIT_UNTIL(
> >  )
> >
> >  # From sw0-p2 send traffic to vip - 2001::a
> > -for i in `seq 1 20`; do
> > -    echo Request $i
> > -    ovn-sbctl list service_monitor
> > -    NS_CHECK_EXEC([sw0-p2], [wget http://[[2001::a]] -t 5 -T 1 --retry-connrefused -v -o wget$i.log])
> > -done
> > +OVS_WAIT_FOR_OUTPUT([
> > +    for i in `seq 1 20`; do
> > +        ovn-sbctl list service_monitor >> service_monitor.log
> > +        NS_EXEC([sw0-p2], [wget http://[[2001::a]] -t 5 -T 1 --retry-connrefused -v -o wget$i.log])
>
> I don't think this is a good change to replace NS_CHECK_EXEC
> with a simple NS_EXEC.  As explained in commit:
>   b087f2556514 ("tests: system-ovn: Fix force SNAT IP in load-balancer template test.")
> It will take forever for this test to fail if there is an actual
> issue in the pipeline and the packets are not delivered / conntrack
> entries are not created.  It will take about 2.5 hours for the test
> to actually fail, IIUC.  We should not have that.
>

I do not think that we can run NS_EXEC within OVS_WAIT_FOR_OUTPUT.
So, instead I could simply ensure that we send 20 requests (i.e. only change
the test which sends 10 for now).
This should be enough to reduce the number of failures to less than one per
year, and we can keep NS_CHECK_EXEC. I'll send v2.

>
> Best regards, Ilya Maximets.
>

Thanks
Xavier
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
