Hello OVS folks, We are seeing a strange behavior with ovs-vswitchd when we upgraded ovn-controller to 24.09 from 23.09.
After ovn-controller is upgraded, ovs-vswitchd is blocked for a long time and because of which ovn-controller can't program the flows. The compute node was hosting just a few VMs. Once the VMs were migrated to other node, ovs-vswitchd and ovn-controller are back to normal. Below are some of the log messages seen in ovs-vswitchd.log. ------------------------------------------------- 2025-06-18T05:32:17.770Z|434224|timeval|WARN|Unreasonably long 59741ms poll interval (3942ms user, 55611ms system) 2025-06-18T05:32:17.770Z|434225|timeval|WARN|faults: 29 minor, 0 major 2025-06-18T05:32:17.770Z|434226|timeval|WARN|context switches: 0 voluntary, 95 involuntary 2025-06-18T05:32:17.770Z|434227|coverage|INFO|Skipping details of duplicate event coverage for hash=81106859 2025-06-18T05:32:17.770Z|434228|poll_loop|INFO|wakeup due to [POLLIN] on fd 91 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:274 (99% CPU usage) 2025-06-18T05:32:17.770Z|434229|poll_loop|INFO|wakeup due to [POLLIN] on fd 95 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:157 (99% CPU usage) 2025-06-18T05:32:17.770Z|434230|poll_loop|INFO|wakeup due to [POLLOUT][POLLHUP] on fd 7 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:153 (99% CPU usage) 2025-06-18T05:32:17.770Z|434231|poll_loop|INFO|wakeup due to [POLLIN] on fd 107 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:157 (99% CPU usage) 2025-06-18T05:32:17.774Z|11877|ofproto_dpif_upcall(handler28)|INFO|Dropped 45 log messages in last 60 seconds (most recently, 60 seconds ago) due to excessive rate 2025-06-18T05:32:17.774Z|07880|ofproto_dpif_upcall(revalidator40)|WARN|Dropped 36 log messages in last 60 seconds (most recently, 60 seconds ago) due to excessive rate 2025-06-18T05:32:17.774Z|11878|ofproto_dpif_upcall(handler28)|INFO|received packet on unassociated datapath port 7 (no OpenFlow port for datapath port 7) 2025-06-18T05:32:17.774Z|07881|ofproto_dpif_upcall(revalidator40)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:d1e6759c-66bb-46ae-be16-2f6cbbbc2839 2025-06-18T05:32:18.275Z|08136|ofproto_dpif_upcall(revalidator46)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:cf11e102-9fc6-42b3-aadd-6e5a1a3603a8 2025-06-18T05:32:18.275Z|08137|ofproto_dpif_upcall(revalidator46)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:2e069616-631a-491c-ad14-c0460b3ccad2 2025-06-18T05:32:18.275Z|08138|ofproto_dpif_upcall(revalidator46)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:ef4f40b5-625e-47ee-9d26-09b816b497bf 2025-06-18T05:32:18.275Z|08139|ofproto_dpif_upcall(revalidator46)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:f60d9bbc-4880-41c4-b799-e501b5d9f0a7 2025-06-18T05:32:18.275Z|08140|ofproto_dpif_upcall(revalidator46)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:aec47146-a9ec-42fa-97a9-6896017c865a 2025-06-18T05:32:18.275Z|08141|ofproto_dpif_upcall(revalidator46)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:e63cac09-4159-4187-8dff-64b432019bb3 2025-06-18T05:32:18.275Z|08142|ofproto_dpif_upcall(revalidator46)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:f6984063-81ad-432b-96d9-22dadd9f8edb 2025-06-18T05:32:18.275Z|08143|ofproto_dpif_upcall(revalidator46)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:609d6c3f-7093-4dba-bdb9-5607add1fb8a 2025-06-18T05:32:18.275Z|08144|ofproto_dpif_upcall(revalidator46)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:81887285-1a37-47d5-959a-85fa6a3547cc 2025-06-18T05:32:18.780Z|466262|ovs_rcu(urcu4)|WARN|blocked 1000 ms waiting for main to quiesce 2025-06-18T05:32:19.779Z|466263|ovs_rcu(urcu4)|WARN|blocked 2000 ms waiting for main to quiesce 2025-06-18T05:32:21.779Z|466264|ovs_rcu(urcu4)|WARN|blocked 4000 ms waiting for main to quiesce 2025-06-18T05:32:25.780Z|466265|ovs_rcu(urcu4)|WARN|blocked 8001 ms waiting for main to quiesce 2025-06-18T05:32:26.092Z|353399|ofproto_dpif_xlate(handler17)|WARN|Dropped 3 log messages in last 105 seconds (most recently, 86 seconds ago) due to excessive rate 2025-06-18T05:32:26.092Z|353400|ofproto_dpif_xlate(handler17)|WARN|over 4096 resubmit actions on bridge br-int while processing arp,in_port=4,dl_vlan=120,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=00:00:5e:00:01:04,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=100.100.100.1,arp_tpa=100.100.100.1,arp_op=1,arp_sha=00:00:5e:00:01:04,arp_tha=00:00:00:00:00:00 2025-06-18T05:32:33.781Z|466266|ovs_rcu(urcu4)|WARN|blocked 16001 ms waiting for main to quiesce 2025-06-18T05:32:49.780Z|466267|ovs_rcu(urcu4)|WARN|blocked 32000 ms waiting for main to quiesce 2025-06-18T05:33:06.782Z|434232|rconn|WARN|br-int<->unix#963254: connection dropped (Broken pipe) 2025-06-18T05:33:06.783Z|434233|timeval|WARN|Unreasonably long 49013ms poll interval (3146ms user, 45722ms system) ... ... 2025-06-18T05:32:33.781Z|466266|ovs_rcu(urcu4)|WARN|blocked 16001 ms waiting for main to quiesce 2025-06-18T05:32:49.780Z|466267|ovs_rcu(urcu4)|WARN|blocked 32000 ms waiting for main to quiesce 2025-06-18T05:33:06.782Z|434232|rconn|WARN|br-int<->unix#963254: connection dropped (Broken pipe) 2025-06-18T05:33:06.783Z|434233|timeval|WARN|Unreasonably long 49013ms poll interval (3146ms user, 45722ms system) 2025-06-18T05:33:06.783Z|434234|timeval|WARN|faults: 3 minor, 0 major 2025-06-18T05:33:06.783Z|434235|timeval|WARN|context switches: 2 voluntary, 38 involuntary 2025-06-18T05:33:06.783Z|434236|coverage|INFO|Skipping details of duplicate event coverage for hash=81106859 2025-06-18T05:33:06.783Z|434237|poll_loop|INFO|wakeup due to [POLLIN] on fd 91 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:274 (99% CPU usage) 2025-06-18T05:33:06.783Z|434238|poll_loop|INFO|wakeup due to [POLLIN] on fd 95 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:157 (99% CPU usage) 2025-06-18T05:33:06.783Z|434239|poll_loop|INFO|wakeup due to [POLLOUT] on fd 104 (/var/run/openvswitch/ovs-vswitchd.12738.ctl<->) at ../lib/stream-fd.c:153 (99% CPU usage) 2025-06-18T05:33:06.783Z|434240|poll_loop|INFO|wakeup due to [POLLOUT][POLLHUP] on fd 7 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:153 (99% CPU usage) 2025-06-18T05:33:06.783Z|434241|poll_loop|INFO|wakeup due to [POLLIN] on fd 107 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:157 (99% CPU usage) 2025-06-18T05:33:06.787Z|12945|ofproto_dpif_upcall(handler35)|INFO|Dropped 49 log messages in last 49 seconds (most recently, 49 seconds ago) due to excessive rate 2025-06-18T05:33:06.787Z|12946|ofproto_dpif_upcall(handler35)|INFO|received packet on unassociated datapath port 7 (no OpenFlow port for datapath port 7) 2025-06-18T05:33:06.791Z|434242|rconn|WARN|br-int<->unix#963255: connection dropped (Broken pipe) ... ------------------------------------ We took core dump when its blocked, and we see the below stack trace #0 0x00007f48203205bf in recvmsg () from /lib64/libc.so.6 #1 0x000056195f0be300 in nl_sock_recv__ (sock=0x5619680045a0, buf=buf@entry=0x7fff259e9070, nsid=nsid@entry=0x0, wait=wait@entry=false) at ../lib/netlink-socket.c:712 #2 0x000056195f0c22d9 in nl_dump_refill (buffer=<optimized out>, dump=<optimized out>) at ../lib/netlink-socket.c:1182 #3 nl_dump_next (dump=dump@entry=0x7fff259e90b0, reply=reply@entry=0x7fff259e9030, buffer=buffer@entry=0x7fff259e9070) at ../lib/netlink-socket.c:1265 #4 0x000056195f0c2a5e in nl_ct_flush_zone (flush_zone=2472) at ../lib/netlink-conntrack.c:312 #5 0x000056195ef2b970 in handle_nxt_ct_flush_zone (oh=<optimized out>, ofconn=0x56197640d880) at ../ofproto/ofproto.c:948 #6 handle_single_part_openflow (type=OFPTYPE_CT_FLUSH_ZONE, oh=<optimized out>, ofconn=0x56197640d880) at ../ofproto/ofproto.c:8870 #7 handle_openflow (ofconn=0x56197640d880, msgs=0x7fff259e93f0) at ../ofproto/ofproto.c:8936 #8 0x000056195ef72106 in ofconn_run (handle_openflow=0x56195ef2b480 <handle_openflow>, ofconn=0x56197640d880) at ../ofproto/connmgr.c:1329 #9 connmgr_run (mgr=0x5619680b10d0, handle_openflow=0x56195ef2b480 <handle_openflow>) at ../ofproto/connmgr.c:356 #10 0x000056195ef278ee in ofproto_run (p=<optimized out>) at ../ofproto/ofproto.c:1967 #11 0x000056195ef0c43c in bridge_run__ () at ../vswitchd/bridge.c:3241 #12 0x000056195ef1993e in bridge_run () at ../vswitchd/bridge.c:3300 #13 0x000056195e818de6 in main (argc=<optimized out>, argv=<optimized out>) at ../vswitchd/ovs-vswitchd.c:130 Looks like something is wrong in the kernel. Any pointers or suggestions would be great. Thanks Numan _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev