On 28 Apr 2025, at 11:57, Daniel Niasoff via discuss wrote:
> Hi,
>
> We are deploying OpenStack 2024.2 using kolla on Ubuntu Noble. Using
> OVN as the network overlay.
>
> We have an issue where, when we enable QoS on routers and networks, the
> openvswitch_vswitchd processes start hanging. We haven't tried with just
> one or the other, but it shouldn't be possible to bring down a whole
> cluster with a bit of config.
>
> This occurred with OpenStack 2023.2 on Jammy as well in the past, so that
> would have been an older version of Open vSwitch, and I have even tried
> with Open vSwitch 3.5.0.
>
> We are just using simple ingress/egress limits of 1000/1000 for a single
> network and 500/500 for a single router.
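>
> (Roughly the equivalent of the following Neutron QoS commands; the policy
> and network names are illustrative and the 1000/1000 and 500/500 figures
> are assumed to be kbps:)
>
>   openstack network qos policy create bw-limit-net
>   openstack network qos rule create --type bandwidth-limit \
>       --max-kbps 1000 --ingress bw-limit-net
>   openstack network qos rule create --type bandwidth-limit \
>       --max-kbps 1000 --egress bw-limit-net
>   openstack network set --qos-policy bw-limit-net <network>
>   # plus a similar 500/500 policy attached to the router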
>
> Here are the logs
>
> 2025-03-19T09:37:24.752Z|409501|connmgr|INFO|br-int<->unix#1: 8 flow_mods
> 43 s ago (6 adds, 2 deletes)
> 2025-03-19T09:38:39.945Z|410047|connmgr|INFO|br-int<->unix#1: 10 flow_mods
> in the 2 s starting 10 s ago (2 adds, 8 deletes)
> 2025-03-19T09:44:19.786Z|412166|connmgr|INFO|br-int<->unix#1: 4 flow_mods
> 10 s ago (2 adds, 2 deletes)
> 2025-03-19T09:45:19.786Z|412576|connmgr|INFO|br-int<->unix#1: 8 flow_mods
> in the 6 s starting 33 s ago (6 adds, 2 deletes)
> 2025-03-19T09:54:07.996Z|415871|connmgr|INFO|br-int<->unix#1: 8 flow_mods
> in the 1 s starting 10 s ago (2 adds, 6 deletes)
> 2025-03-19T09:54:52.517Z|416385|bridge|INFO|bridge br-int: deleted
> interface tap66d9c2a6-95 on port 101
> 2025-03-19T09:55:07.996Z|416743|connmgr|INFO|br-int<->unix#1: 331 flow_mods
> in the 8 s starting 23 s ago (21 adds, 310 deletes)
> 2025-03-19T09:56:07.996Z|417114|connmgr|INFO|br-int<->unix#1: 1 flow_mods
> 56 s ago (1 adds)
> 2025-03-19T09:56:54.831Z|417448|bridge|INFO|bridge br-int: added interface
> tapc19e70a1-68 on port 102
> 2025-03-19T09:56:54.860Z|417540|netdev_linux|WARN|tapc19e70a1-68: removing
> policing failed: No such device
> 2025-03-19T09:57:07.996Z|417902|connmgr|INFO|br-int<->unix#1: 207 flow_mods
> in the 1 s starting 13 s ago (197 adds, 10 deletes)
> 2025-03-19T10:00:12.730Z|419178|connmgr|INFO|br-int<->unix#1: 94 flow_mods
> 10 s ago (85 adds, 9 deletes)
> 2025-03-19T10:01:12.730Z|419549|connmgr|INFO|br-int<->unix#1: 6 flow_mods
> 37 s ago (4 adds, 2 deletes)
> 2025-03-19T10:05:54.525Z|421308|connmgr|INFO|br-int<->unix#1: 1 flow_mods
> 10 s ago (1 adds)
> 2025-03-19T10:06:54.526Z|421710|connmgr|INFO|br-int<->unix#1: 1 flow_mods
> 52 s ago (1 deletes)
> 2025-03-19T10:08:52.756Z|422418|connmgr|INFO|br-int<->unix#1: 1 flow_mods
> 10 s ago (1 adds)
> 2025-03-19T11:18:15.953Z|448775|connmgr|INFO|br-int<->unix#1: 176 flow_mods
> in the 8 s starting 10 s ago (31 adds, 145 deletes)
> 2025-03-19T11:31:30.570Z|453640|connmgr|INFO|br-int<->unix#1: 1 flow_mods
> 10 s ago (1 adds)
> 2025-03-19T11:32:30.570Z|454015|connmgr|INFO|br-int<->unix#1: 1 flow_mods
> 58 s ago (1 adds)
> 2025-03-19T11:35:09.140Z|539360|ovs_rcu(urcu9)|WARN|blocked 1000 ms waiting
> for handler1 to quiesce
> 2025-03-19T11:35:09.140Z|455059|ovs_rcu|WARN|blocked 1000 ms waiting for
> handler1 to quiesce
> 2025-03-19T11:35:10.140Z|539409|ovs_rcu(urcu9)|WARN|blocked 2000 ms waiting
> for handler1 to quiesce
> 2025-03-19T11:35:10.141Z|455106|ovs_rcu|WARN|blocked 2000 ms waiting for
> handler1 to quiesce
> 2025-03-19T11:35:12.140Z|539497|ovs_rcu(urcu9)|WARN|blocked 4001 ms waiting
> for handler1 to quiesce
> 2025-03-19T11:35:12.141Z|455192|ovs_rcu|WARN|blocked 4000 ms waiting for
> handler1 to quiesce
> 2025-03-19T11:35:16.140Z|539687|ovs_rcu(urcu9)|WARN|blocked 8000 ms waiting
> for handler1 to quiesce
> 2025-03-19T11:35:16.141Z|455387|ovs_rcu|WARN|blocked 8000 ms waiting for
> handler1 to quiesce
> 2025-03-19T11:35:24.139Z|540106|ovs_rcu(urcu9)|WARN|blocked 16000 ms
> waiting for handler1 to quiesce
> 2025-03-19T11:35:24.140Z|455837|ovs_rcu|WARN|blocked 16000 ms waiting for
> handler1 to quiesce
> 2025-03-19T11:35:40.139Z|541019|ovs_rcu(urcu9)|WARN|blocked 32000 ms
> waiting for handler1 to quiesce
> 2025-03-19T11:35:40.140Z|456773|ovs_rcu|WARN|blocked 32000 ms waiting for
> handler1 to quiesce
> 2025-03-19T11:36:12.139Z|542611|ovs_rcu(urcu9)|WARN|blocked 64000 ms
> waiting for handler1 to quiesce
> 2025-03-19T11:36:12.140Z|458417|ovs_rcu|WARN|blocked 64000 ms waiting for
> handler1 to quiesce
> 2025-03-19T11:37:16.140Z|545667|ovs_rcu(urcu9)|WARN|blocked 128000 ms
> waiting for handler1 to quiesce
> 2025-03-19T11:37:16.141Z|461499|ovs_rcu|WARN|blocked 128000 ms waiting for
> handler1 to quiesce
> 2025-03-19T11:39:24.139Z|551954|ovs_rcu(urcu9)|WARN|blocked 256000 ms
> waiting for handler1 to quiesce
> 2025-03-19T11:39:24.140Z|467913|ovs_rcu|WARN|blocked 256000 ms waiting for
> handler1 to quiesce
> 2025-03-19T11:43:40.140Z|564156|ovs_rcu(urcu9)|WARN|blocked 512000 ms
> waiting for handler1 to quiesce
> 2025-03-19T11:43:40.141Z|480412|ovs_rcu|WARN|blocked 512000 ms waiting for
> handler1 to quiesce
> 2025-03-19T11:50:04.648Z|00001|vlog|INFO|opened log file
> /var/log/kolla/openvswitch/ovs-vswitchd.log
Looks like the handler thread is blocked or busy. If you attach through GDB,
you might be able to see the call stack. Or try 'cat /proc/[pid]/syscall' or
'cat /proc/[pid]/status' to determine which syscall it is stuck in. It might
also be that the process is blocked in the kernel.
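
For example, something along these lines (a rough sketch; look up the thread
ID of "handler1" first, and run it wherever the ovs-vswitchd process is
visible, e.g. inside the openvswitch_vswitchd container or on the host):

  # list ovs-vswitchd threads and find the TID of "handler1"
  ps -T -p $(pidof ovs-vswitchd) -o spid,comm

  # which syscall is that thread in, and what state is it in?
  cat /proc/$(pidof ovs-vswitchd)/task/<tid>/syscall
  cat /proc/$(pidof ovs-vswitchd)/task/<tid>/status
  # kernel-side stack of the thread (needs root)
  cat /proc/$(pidof ovs-vswitchd)/task/<tid>/stack

  # user-space backtraces of all threads via GDB
  gdb -p $(pidof ovs-vswitchd) -batch -ex 'thread apply all bt'
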
I hope these pointers help get you started.
//Eelco
> Any ideas?
>
> Thanks
>
> Daniel
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss