On Thu, May 1, 2025 at 4:41 PM Dumitru Ceara <dce...@redhat.com> wrote:
>
> On 4/30/25 5:37 PM, Numan Siddique wrote:
> > On Wed, Apr 9, 2025 at 12:58 PM Mark Michelson via dev
> > <ovs-dev@openvswitch.org> wrote:
> >>
> >> Thanks, Frode,
> >>
> >> Acked-by: Mark Michelson <mmich...@redhat.com>
> >
> > Thanks.
> >
> > Applied to main.
> >
> > Numan
> >
>
> Hi Numan, Mark, Frode,
>
> Sorry, I should've probably mentioned this explicitly on the patch but
> this uncovers some other underlying issue (didn't debug further yet) and
> causes one of our tests to fail often in CI:
>
> https://github.com/ovn-org/ovn/actions/runs/14758505553/job/41433037753#step:10:5297
>
> Also briefly discussed during the IRC meeting on April 3rd:
> https://libera.irclog.whitequark.org/openvswitch/2025-04-03#37982200;
>
> I wonder if we should revert 27c0dc6b7b22 ("tests: Set inactivity_probe
> for ovn-remote.") until we figure out the problem with the flaky test.
>
> What do you guys think?

I tried to debug the issue, but couldn't reproduce it locally when
running the test in a loop.

From the CI logs, it looks like the appctl command to exit
ovn-controller timed out:

    ovs-appctl --timeout=10 -t ovn-controller exit

I'm just confused why increasing the probe interval is causing this
issue. Since the probe interval is now 15 seconds, is the IDL blocking
for some reason, and is that why ovs-appctl times out?

After commit 2a12cda890a ("controller, northd: Wait for cleanup before
replying to exit"), we wait for the cleanup to finish before exiting.
It looks like either the IDL or the cleanup is taking more than
10 seconds, and that's why ovs-appctl exit times out.

I'm fine with reverting this patch. But should we also debug whether
there is a bug in ovn-controller during exit?

Thanks
Numan

>
> Regards,
> Dumitru
>
> >>
> >> On 3/17/25 13:41, Frode Nordahl wrote:
> >>> Common macros set up a SSL or TCP connection for communication
> >>> between ovn-controller and SB DB in the test environment.
> >>>
> >>> This connection is subject to the default inactivity_probe value
> >>> of 5000ms.
> >>>
> >>> On slow systems, this may not be enough. Consequently set the
> >>> inactivity_probe to 15000ms on both ends of the connection.
> >>>
> >>> Reported-at: https://launchpad.net/bugs/2103444
> >>> Signed-off-by: Frode Nordahl <fnord...@ubuntu.com>
> >>> ---
> >>> tests/ovn-macros.at | 3 +++
> >>> 1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
> >>> index 1ef511c25..573353e3d 100644
> >>> --- a/tests/ovn-macros.at
> >>> +++ b/tests/ovn-macros.at
> >>> @@ -613,12 +613,14 @@ ovn_start () {
> >>> ovn-sbctl \
> >>> -- --id=@c create connection \
> >>> target=\"ptcp:0:127.0.0.1\" \
> >>> + inactivity_probe=15000 \
> >>> -- add SB_Global . connections @c
> >>> elif test X$HAVE_OPENSSL = Xyes; then
> >>> # Create the SB DB pssl+RBAC connection.
> >>> ovn-sbctl \
> >>> -- --id=@c create connection \
> >>> target=\"pssl:0:127.0.0.1\" role=ovn-controller \
> >>> + inactivity_probe=15000 \
> >>> -- add SB_Global . connections @c
> >>> local d=$ovs_base
> >>> if test -n "$AZ"; then
> >>> @@ -743,6 +745,7 @@ ovn_az_attach() {
> >>> -- set Open_vSwitch . external-ids:hostname=$sandbox \
> >>> -- set Open_vSwitch . external-ids:system-id=$systemid \
> >>> -- set Open_vSwitch . external-ids:ovn-remote=$ovn_remote \
> >>> + -- set Open_vSwitch . external-ids:ovn-remote-probe-interval=15000 \
> >>> -- set Open_vSwitch . external-ids:ovn-encap-type=$encap \
> >>> -- set Open_vSwitch . external-ids:ovn-encap-ip=$ip \
> >>> -- --may-exist add-br br-int \
> >>
> >> _______________________________________________
> >> dev mailing list
> >> d...@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
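
One way to check the hypothesis in the reply, that the exit request needs
longer than the 10-second appctl timeout, would be to time the command
directly. The following is only a minimal sketch: timed_run is a
hypothetical helper, not something in tests/ovn-macros.at, and the example
invocation assumes an ovn-controller is already running in the sandbox.

```shell
#!/bin/sh
# Hypothetical helper (not part of the OVN test suite): run a command,
# print how long it took, and succeed only if the command itself
# succeeded and finished in under LIMIT seconds.
timed_run() {
    limit=$1
    shift
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    elapsed=$((end - start))
    echo "rc=$rc elapsed=${elapsed}s limit=${limit}s"
    # Fail if the command failed or ran for the full limit or longer.
    [ "$rc" -eq 0 ] && [ "$elapsed" -lt "$limit" ]
}

# Example, assuming a running ovn-controller in the test sandbox:
#   timed_run 10 ovs-appctl --timeout=10 -t ovn-controller exit
```

Running this in a loop with and without the 15000ms probe settings from the
patch might show whether the exit path slows down only with the longer
probe interval. The resolution is whole seconds, which should be enough for
a 10-second threshold.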