On 02.05.2025 01:59, Numan Siddique wrote:
On Thu, May 1, 2025 at 7:44 PM Numan Siddique <num...@ovn.org> wrote:

On Thu, May 1, 2025 at 4:41 PM Dumitru Ceara <dce...@redhat.com> wrote:

On 4/30/25 5:37 PM, Numan Siddique wrote:
On Wed, Apr 9, 2025 at 12:58 PM Mark Michelson via dev
<ovs-dev@openvswitch.org> wrote:

Thanks, Frode,

Acked-by: Mark Michelson <mmich...@redhat.com>

Thanks.

Applied to main.

Numan


Hi Numan, Mark, Frode,

Sorry, I should've probably mentioned this explicitly on the patch but
this uncovers some other underlying issue (didn't debug further yet) and
causes one of our tests to fail often in CI:

https://github.com/ovn-org/ovn/actions/runs/14758505553/job/41433037753#step:10:5297

Also briefly discussed during the IRC meeting on April 3rd:
https://libera.irclog.whitequark.org/openvswitch/2025-04-03#37982200;

Indeed we did. I never got around to looking further after that and probably
should have updated the patch status on patchwork.

I wonder if we should revert 27c0dc6b7b22 ("tests: Set inactivity_probe
for ovn-remote.") until we figure out the problem with the flaky test.

What do you guys think?

I tried to debug the issue, but couldn't reproduce it locally when
running the test in a loop.
From the CI logs, it looks like the appctl command to exit
ovn-controller timed out.

ovs-appctl --timeout=10 -t ovn-controller exit

I'm just confused why increasing the probe interval is causing this issue.
Since the probe interval is now 15 seconds, is the IDL blocking for
some reason, and is that why ovs-appctl times out?

After commit 2a12cda890a ("controller, northd: Wait for cleanup
before replying to exit"), we wait for the cleanup
to finish before exiting. It looks like either the IDL or the cleanup is
taking more than 10 seconds, and that's why ovs-appctl exit times out.

I'm fine with reverting this patch. But should we also debug whether there is a
bug in ovn-controller during exit?

I've submitted a patch to revert -
https://patchwork.ozlabs.org/project/ovn/patch/20250501235530.140417-1-num...@ovn.org/

When I last looked at the one test that failed with the change ("ovn-controller -
Chassis other_config"), it appeared to me that the test relies on some unsafe
assumptions about how the controller reacts to a change of system-id and how RBAC
rules are enforced.

Never got to the bottom of it though. I have another upstream issue to 
investigate this morning, so I'll try to have another look at this, and failing 
to find any resolution I'm happy to have it reverted until we do.

--
Frode Nordahl

Numan


Thanks
Numan


Regards,
Dumitru


On 3/17/25 13:41, Frode Nordahl wrote:
Common macros set up a SSL or TCP connection for communication
between ovn-controller and SB DB in the test environment.

This connection is subject to the default inactivity_probe value
of 5000ms.

On slow systems, this may not be enough.  Consequently set the
inactivity_probe to 15000ms on both ends of the connection.

Reported-at: https://launchpad.net/bugs/2103444
Signed-off-by: Frode Nordahl <fnord...@ubuntu.com>
---
   tests/ovn-macros.at | 3 +++
   1 file changed, 3 insertions(+)

diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
index 1ef511c25..573353e3d 100644
--- a/tests/ovn-macros.at
+++ b/tests/ovn-macros.at
@@ -613,12 +613,14 @@ ovn_start () {
           ovn-sbctl \
               -- --id=@c create connection \
                   target=\"ptcp:0:127.0.0.1\" \
+                inactivity_probe=15000 \
               -- add SB_Global . connections @c
       elif test X$HAVE_OPENSSL = Xyes; then
           # Create the SB DB pssl+RBAC connection.
           ovn-sbctl \
               -- --id=@c create connection \
                   target=\"pssl:0:127.0.0.1\" role=ovn-controller \
+                inactivity_probe=15000 \
               -- add SB_Global . connections @c
           local d=$ovs_base
           if test -n "$AZ"; then
@@ -743,6 +745,7 @@ ovn_az_attach() {
           -- set Open_vSwitch . external-ids:hostname=$sandbox \
           -- set Open_vSwitch . external-ids:system-id=$systemid \
           -- set Open_vSwitch . external-ids:ovn-remote=$ovn_remote \
+        -- set Open_vSwitch . external-ids:ovn-remote-probe-interval=15000 \
           -- set Open_vSwitch . external-ids:ovn-encap-type=$encap \
           -- set Open_vSwitch . external-ids:ovn-encap-ip=$ip \
           -- --may-exist add-br br-int \

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev