Re: [ovs-dev] [PATCH] tests: bfd: Fix waiting time after re-enabling BFD in decay test.

Eelco Chaudron via dev Tue, 09 Jun 2026 00:54:34 -0700

On 8 Jun 2026, at 20:42, Ilya Maximets wrote:

> The BFD decay test disables BFD on one of the ports, making both sides
> go Down.  Then it re-enables BFD and expects them to be Up within
> 1.5 seconds.  This seems reasonable given the 300-500 ms configured
> timings.  However, while not in the Up state, the minimal transmission
> time is increased to be at least 1,000,000 microseconds, according to
> RFC 5880 Section 6.8.3:
>
>    When bfd.SessionState is not Up, the system MUST set
>    bfd.DesiredMinTxInterval to a value of not less than one second
>    (1,000,000 microseconds).  This is intended to ensure that the
>    bandwidth consumed by BFD sessions that are not Up is negligible,
>    particularly in the case where a neighbor may not be running BFD.
>
> And this is correctly implemented in the bfd_min_tx() function.
>
> Since both sides are not Up, it takes at least two round trips for the
> states to converge.  There is a 25% randomness baked into the messages,
> so it is at least 750 ms per message, i.e., at least 1500 ms total, if
> we're very lucky.
>
> There is extra overhead in the test due to execution of the unixctl
> commands, actual packet processing, and the time it takes to execute
> the next checks.  That seems to push the timing a little and make the
> overall wait of just 1500 ms enough for the test to pass.  However,
> if the randomness is not in our favor, it may not be enough.  Ideally,
> we need at least 2000 ms, or better 2500 ms, to be sure that all
> exchanges are complete and the states are properly set.  To be safe,
> it might even be better to use 3500 ms.
>
> 3500 ms should not be enough to trigger decay, as state changes reset
> the decay timer.  So, increasing the wait times this way should not
> affect the later checks.
>
> Without this change, the BFD decay test fails on my laptop in ~3% of
> the cases.  With this change, I was not able to reproduce the failure
> after 1500 iterations.
>
> We see occasional failures of this test in our CI, but they are mostly
> covered by the automatic re-check.  It's rare to see the test fail
> twice in a row to trigger the full CI failure, but it definitely does
> happen from time to time.  The failures tend to be more frequent on
> different architectures like arm or s390.  This test has been flaky for
> as long as I can remember working on OVS.
>
> I'm not sure if this change covers all the failures of this particular
> test, but it definitely covers a lot of them.
>
> Fixes: c1c4e8c76912 ("bfd: Implement BFD decay.")
> Signed-off-by: Ilya Maximets <[email protected]>

Thanks for fixing this, Ilya. The change looks good to me.
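
For reference, the timing bounds from the commit message can be reproduced
with a few lines of arithmetic. This is only a sketch in Python, not OVS
code: the constant and function names are made up for illustration, and it
assumes exactly two control messages are needed to converge, each sent after
a 1000 ms interval that jitter may shorten by up to 25%. Test overhead
(unixctl commands, packet processing) is deliberately left out.

```python
# Illustrative arithmetic only; all names below are hypothetical.
MIN_TX_NOT_UP_MS = 1000    # RFC 5880 Section 6.8.3: >= 1 s while not Up
MAX_JITTER = 0.25          # interval may be shortened by up to 25%
MESSAGES_TO_CONVERGE = 2   # assumption taken from the commit message


def convergence_bounds_ms(min_tx_ms=MIN_TX_NOT_UP_MS,
                          jitter=MAX_JITTER,
                          messages=MESSAGES_TO_CONVERGE):
    """Return (luckiest, unluckiest) wait in ms, ignoring test overhead."""
    luckiest = messages * min_tx_ms * (1 - jitter)   # every interval shortened
    unluckiest = messages * min_tx_ms                # no interval shortened
    return luckiest, unluckiest


luckiest, unluckiest = convergence_bounds_ms()
print(f"luckiest: {luckiest:.0f} ms, unluckiest: {unluckiest:.0f} ms")
```

Under these assumptions the old 1500 ms wait only matches the luckiest case,
while the proposed 3500 ms leaves margin beyond the 2000 ms case.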

Acked-by: Eelco Chaudron <[email protected]>
