Hi Folks,
Whilst launching a large number (150+) of VMs on a hypervisor we occasionally
see errors like
"libvirt: error : internal error: Unable to add port vnet114 to OVS
bridge br-int"
in /var/log/messages, for example:
Feb 25 04:13:15 pcacn001 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl
--timeout=5 -- --may-exist add-port br-int vnet87 -- set Interface vnet87
"external-ids:attached-mac=\"00:13:97:42:ff:5d\"" -- set Interface vnet87
"external-ids:iface-id=\"b652aef0-b4c2-11ed-9fba-a8698c05eb74\"" -- set
Interface vnet87
"external-ids:vm-id=\"ba70d0b3-9b47-48e8-9ed4-a8bf04bae901\"" -- set
Interface vnet87 external-ids:iface-status=active
Feb 25 04:13:20 pcacn001 journal: internal error: Child process (ovs-vsctl
--timeout=5 -- --may-exist add-port br-int vnet87 -- set Interface vnet87
'external-ids:attached-mac="00:13:97:42:ff:5d"' -- set Interface vnet87
'external-ids:iface-id="b652aef0-b4c2-11ed-9fba-a8698c05eb74"' -- set
Interface vnet87 'external-ids:vm-id="ba70d0b3-9b47-48e8-9ed4-a8bf04bae901"'
-- set Interface vnet87 external-ids:iface-status=active) unexpected fatal
signal 14: 2023-02-25T04:13:20Z|00002|fatal_signal|WARN|terminating with
signal 14 (Alarm clock)
Feb 25 04:13:20 pcacn001 journal: internal error: Unable to add port vnet87
to OVS bridge br-int
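For reference, the failing command can be retried by hand with a longer
timeout, to check whether the 5 s budget is simply too small while the
daemon is this busy. A minimal sketch, reusing vnet87 from the log above
and trimming the external-ids to one setting:

ovs-vsctl --timeout=30 -- --may-exist add-port br-int vnet87 -- set Interface vnet87 external-ids:iface-status=active

(--timeout here is the same flag that libvirt passes as --timeout=5 in
the log above.)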
There is nothing suspicious in the ovn-controller log, but the ovsdb-server
and ovs-vswitchd logs for this time period do contain WARNs:
ovsdb-server.log
================
2023-02-25T04:13:08.896Z|108756|jsonrpc|WARN|unix#84848: receive error:
Connection reset by peer
2023-02-25T04:13:08.896Z|108757|reconnect|WARN|unix#84848: connection
dropped (Connection reset by peer)
2023-02-25T04:13:08.910Z|108758|reconnect|WARN|unix#84850: connection
dropped (Connection reset by peer)
2023-02-25T04:13:08.918Z|108759|reconnect|WARN|unix#84851: connection
dropped (Connection reset by peer)
2023-02-25T04:13:08.925Z|108760|reconnect|WARN|unix#84852: connection
dropped (Connection reset by peer)
2023-02-25T04:13:08.932Z|108761|reconnect|WARN|unix#84853: connection
dropped (Connection reset by peer)
2023-02-25T04:13:08.938Z|108762|reconnect|WARN|unix#84854: connection
dropped (Connection reset by peer)
2023-02-25T04:13:09.238Z|108763|reconnect|WARN|unix#84855: connection
dropped (Connection reset by peer)
2023-02-25T04:13:09.277Z|108764|reconnect|WARN|unix#84856: connection
dropped (Connection reset by peer)
2023-02-25T04:13:09.284Z|108765|reconnect|WARN|unix#84857: connection
dropped (Connection reset by peer)
2023-02-25T04:13:09.300Z|108766|reconnect|WARN|unix#84859: connection
dropped (Connection reset by peer)
2023-02-25T04:13:09.308Z|108767|reconnect|WARN|unix#84860: connection
dropped (Connection reset by peer)
2023-02-25T04:13:09.323Z|108768|reconnect|WARN|unix#84862: connection
dropped (Connection reset by peer)
2023-02-25T04:14:46.220Z|108769|jsonrpc|WARN|Dropped 11 log messages in
last 97 seconds (most recently, 97 seconds ago) due to excessive rate
2023-02-25T04:14:46.220Z|108770|jsonrpc|WARN|unix#84870: receive error:
Connection reset by peer
ovs-vswitchd.log
================
2023-02-25T04:13:00.849Z|00258|ovs_rcu(urcu9)|WARN|blocked 1000 ms
waiting for main to quiesce
2023-02-25T04:13:01.694Z|04014|timeval|WARN|Unreasonably long 1910ms
poll interval (902ms user, 1007ms system)
2023-02-25T04:13:01.694Z|04015|timeval|WARN|faults: 21 minor, 0 major
2023-02-25T04:13:01.694Z|04016|timeval|WARN|context switches: 0
voluntary, 2 involuntary
2023-02-25T04:13:01.694Z|04017|coverage|INFO|Dropped 12 log messages in
last 65 seconds (most recently, 40 seconds ago) due to excessive rate
2023-02-25T04:13:01.694Z|04018|coverage|INFO|Skipping details of
duplicate event coverage for hash=ce7b9f83
2023-02-25T04:13:01.722Z|04019|bridge|INFO|bridge br-int: added
interface vnet86 on port 204
2023-02-25T04:13:02.695Z|00259|ovs_rcu(urcu9)|WARN|blocked 1000 ms
waiting for main to quiesce
2023-02-25T04:13:03.676Z|04020|timeval|WARN|Unreasonably long 1982ms
poll interval (895ms user, 1084ms system)
2023-02-25T04:13:03.676Z|04021|timeval|WARN|faults: 1 minor, 0 major
2023-02-25T04:13:03.676Z|04022|timeval|WARN|context switches: 0
voluntary, 5 involuntary
2023-02-25T04:13:04.676Z|00260|ovs_rcu(urcu9)|WARN|blocked 1000 ms
waiting for main to quiesce
2023-02-25T04:13:05.658Z|04023|timeval|WARN|Unreasonably long 1982ms
poll interval (922ms user, 1059ms system)
2023-02-25T04:13:05.658Z|04024|timeval|WARN|faults: 18 minor, 0 major
2023-02-25T04:13:05.658Z|04025|timeval|WARN|context switches: 2
voluntary, 5 involuntary
2023-02-25T04:13:05.658Z|04026|poll_loop|INFO|wakeup due to [POLLIN] on
fd 15 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (101% CPU
usage)
2023-02-25T04:13:06.672Z|00261|ovs_rcu(urcu9)|WARN|blocked 1000 ms
waiting for main to quiesce
2023-02-25T04:13:07.616Z|04027|timeval|WARN|Unreasonably long 1958ms
poll interval (916ms user, 1025ms system)
2023-02-25T04:13:07.616Z|04028|timeval|WARN|context switches: 3
voluntary, 6 involuntary
2023-02-25T04:13:07.616Z|04029|poll_loop|INFO|wakeup due to [POLLIN] on
fd 15 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (101% CPU
usage)
2023-02-25T04:13:08.616Z|00262|ovs_rcu(urcu9)|WARN|blocked 1000 ms
waiting for main to quiesce
2023-02-25T04:13:09.536Z|04030|timeval|WARN|Unreasonably long 1920ms
poll interval (919ms user, 999ms system)
2023-02-25T04:13:09.536Z|04031|timeval|WARN|faults: 2 minor, 0 major
2023-02-25T04:13:09.536Z|04032|timeval|WARN|context switches: 4
voluntary, 3 involuntary
2023-02-25T04:13:09.536Z|04033|poll_loop|INFO|wakeup due to [POLLIN] on
fd 14 (NETLINK_ROUTE<->NETLINK_ROUTE) at lib/netlink-socket.c:1401 (101%
CPU usage)
2023-02-25T04:13:09.537Z|04034|poll_loop|INFO|wakeup due to [POLLIN] on
fd 22 (FIFO pipe:[52276]) at vswitchd/bridge.c:423 (101% CPU usage)
2023-02-25T04:13:09.537Z|04035|poll_loop|INFO|wakeup due to [POLLIN] on
fd 529 (/var/run/openvswitch/br-int.mgmt<->) at lib/stream-fd.c:157
(101% CPU usage)
2023-02-25T04:13:10.538Z|00263|ovs_rcu(urcu9)|WARN|blocked 1001 ms
waiting for main to quiesce
2023-02-25T04:13:11.512Z|04036|timeval|WARN|Unreasonably long 1976ms
poll interval (918ms user, 1023ms system)
2023-02-25T04:13:11.512Z|04037|timeval|WARN|faults: 3 minor, 0 major
2023-02-25T04:13:11.512Z|04038|timeval|WARN|context switches: 8
voluntary, 5 involuntary
2023-02-25T04:13:11.512Z|04039|poll_loop|INFO|wakeup due to [POLLIN] on
fd 529 (/var/run/openvswitch/br-int.mgmt<->) at lib/stream-fd.c:157
(101% CPU usage)
2023-02-25T04:13:11.522Z|04040|poll_loop|INFO|wakeup due to 0-ms timeout
at ofproto/ofproto-dpif.c:1976 (101% CPU usage)
2023-02-25T04:13:11.972Z|04041|poll_loop|INFO|Dropped 1 log messages in
last 1 seconds (most recently, 1 seconds ago) due to excessive rate
2023-02-25T04:13:11.972Z|04042|poll_loop|INFO|wakeup due to [POLLIN] on
fd 22 (FIFO pipe:[52276]) at vswitchd/bridge.c:423 (101% CPU usage)
2023-02-25T04:13:16.472Z|00264|ovs_rcu(urcu9)|WARN|blocked 1000 ms
waiting for main to quiesce
2023-02-25T04:13:17.472Z|00265|ovs_rcu(urcu9)|WARN|blocked 2000 ms
waiting for main to quiesce
2023-02-25T04:13:18.405Z|04043|connmgr|INFO|br-int<->unix#1: 60
flow_mods in the 16 s starting 25 s ago (58 adds, 2 deletes)
2023-02-25T04:13:18.487Z|04044|timeval|WARN|Unreasonably long 3303ms
poll interval (1569ms user, 1731ms system)
2023-02-25T04:13:18.487Z|04045|timeval|WARN|faults: 11 minor, 0 major
2023-02-25T04:13:18.487Z|04046|timeval|WARN|context switches: 1
voluntary, 8 involuntary
2023-02-25T04:13:18.487Z|04047|poll_loop|INFO|Dropped 10 log messages in
last 6 seconds (most recently, 3 seconds ago) due to excessive rate
2023-02-25T04:13:18.487Z|04048|poll_loop|INFO|wakeup due to [POLLIN] on
fd 14 (NETLINK_ROUTE<->NETLINK_ROUTE) at lib/netlink-socket.c:1401 (57%
CPU usage)
2023-02-25T04:13:18.518Z|04049|bridge|INFO|bridge br-int: added
interface vnet87 on port 205
2023-02-25T04:13:19.487Z|00266|ovs_rcu(urcu9)|WARN|blocked 1000 ms
waiting for main to quiesce
2023-02-25T04:13:20.475Z|04050|ofproto|WARN|br-int: cannot get STP
status on nonexistent port 205
2023-02-25T04:13:20.475Z|04051|ofproto|WARN|br-int: cannot get RSTP
status on nonexistent port 205
2023-02-25T04:13:20.547Z|04052|timeval|WARN|Unreasonably long 2060ms
poll interval (910ms user, 1092ms system)
2023-02-25T04:13:20.548Z|04053|timeval|WARN|faults: 9 minor, 0 major
2023-02-25T04:13:20.548Z|04054|timeval|WARN|context switches: 8
voluntary, 4 involuntary
2023-02-25T04:13:20.578Z|04055|bridge|INFO|bridge br-int: deleted
interface vnet87 on port 205
2023-02-25T04:13:20.581Z|04056|bridge|WARN|could not open network device
vnet87 (No such device)
2023-02-25T04:13:21.550Z|00267|ovs_rcu(urcu9)|WARN|blocked 1000 ms
waiting for main to quiesce
2023-02-25T04:13:22.509Z|04057|timeval|WARN|Unreasonably long 1961ms
poll interval (908ms user, 1048ms system)
2023-02-25T04:13:22.509Z|04058|timeval|WARN|faults: 1 minor, 0 major
2023-02-25T04:13:22.509Z|04059|timeval|WARN|context switches: 6
voluntary, 7 involuntary
2023-02-25T04:13:24.026Z|04060|poll_loop|INFO|Dropped 12 log messages in
last 6 seconds (most recently, 1 seconds ago) due to excessive rate
Any idea what's going on here? Should the ovs-vsctl timeout be longer?
Who is sending signal 14 to it, and why?
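My working assumption, which I would be glad to have confirmed: nobody
external sends it. Signal 14 is SIGALRM ("Alarm clock" in the log), and
ovs-vsctl appears to arm an alarm for its --timeout and terminate itself
when the transaction has not completed in time. A hypothetical way to
demonstrate this on a test host is to briefly pause ovsdb-server so that
no transaction can complete:

# pause ovsdb-server so the next ovs-vsctl transaction cannot finish
kill -STOP $(pidof ovsdb-server)
# with a 2 s timeout this should die the same way; exit status 142 = 128 + SIGALRM
ovs-vsctl --timeout=2 list-ports br-int; echo "exit: $?"
kill -CONT $(pidof ovsdb-server)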
What does this ovs-vswitchd WARN mean:
2023-02-25T04:13:02.695Z|00259|ovs_rcu(urcu9)|WARN|blocked 1000 ms
waiting for main to quiesce
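As an aside, would something like the following be the right way to see
what the main thread is spending its time on? I am guessing at the
relevant ovs-appctl commands here:

ovs-appctl coverage/show
ovs-appctl memory/show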
It seems that the port is eventually added even though the ovs-vsctl
command failed:
ovs-vsctl list-ports br-int | grep vnet87
vnet87
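Presumably the external-ids from the failed call can be checked the same
way; assuming I have the get syntax right:

ovs-vsctl get Interface vnet87 external_ids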
Thanks
Brendan