Hi,
We've found that ovs-vswitchd is using 100% CPU. We checked the log, and
there seems to be an issue with NETLINK_ROUTE.
# OVS Config
Bridge "vmbr0"
    Port "vmbr0"
        Interface "vmbr0"
            type: internal
    Port "vlan3005"
        tag: 3005
        Interface "vlan3005"
            type: internal
    Port "vlan30"
        tag: 30
        Interface "vlan30"
            type: internal
    Port "bond0"
        Interface "enp1s0f0"
        Interface "enp1s0f1"
    Port "vlan3702"
        tag: 3702
        Interface "vlan3702"
            type: internal
    Port "vlan3502"
        tag: 3502
        Interface "vlan3502"
            type: internal
Bridge "vmbr1"
    Port "vmbr1"
        Interface "vmbr1"
            type: internal
ovs_version: "2.9.2"
# Log
2018-07-21T22:12:03.941Z|05606|netlink_notifier|WARN|netlink receive buffer overflowed
2018-07-21T22:12:04.995Z|01446|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
2018-07-21T22:12:05.995Z|01447|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
2018-07-21T22:12:07.711Z|05607|timeval|WARN|Unreasonably long 3787ms poll interval (1502ms user, 1706ms system)
2018-07-21T22:12:07.711Z|05608|timeval|WARN|context switches: 2824 voluntary, 15 involuntary
2018-07-21T22:12:07.711Z|05609|poll_loop|INFO|wakeup due to [POLLIN] on fd 12 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (86% CPU usage)
2018-07-21T22:12:08.798Z|01448|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
2018-07-21T22:12:09.798Z|01449|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
2018-07-21T22:12:11.068Z|05610|timeval|WARN|Unreasonably long 3357ms poll interval (1715ms user, 1591ms system)
2018-07-21T22:12:11.068Z|05611|timeval|WARN|context switches: 3770 voluntary, 9 involuntary
2018-07-21T22:12:12.218Z|01450|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
2018-07-21T22:12:13.218Z|01451|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
2018-07-21T22:12:14.475Z|05612|timeval|WARN|Unreasonably long 3407ms poll interval (1609ms user, 1718ms system)
2018-07-21T22:12:14.475Z|05613|timeval|WARN|faults: 2095 minor, 0 major
2018-07-21T22:12:14.475Z|05614|timeval|WARN|context switches: 3485 voluntary, 7 involuntary
2018-07-21T22:12:14.475Z|05615|poll_loop|INFO|Dropped 2 log messages in last 7 seconds (most recently, 3 seconds ago) due to excessive rate
2018-07-21T22:12:14.475Z|05616|poll_loop|INFO|wakeup due to [POLLIN] on fd 14 (NETLINK_ROUTE<->NETLINK_ROUTE) at lib/netlink-socket.c:1331 (97% CPU usage)
2018-07-21T22:12:15.706Z|01452|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
2018-07-21T22:12:16.707Z|01453|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
2018-07-21T22:12:18.004Z|05617|timeval|WARN|Unreasonably long 3529ms poll interval (1763ms user, 1663ms system)
2018-07-21T22:12:18.004Z|05618|timeval|WARN|context switches: 3232 voluntary, 9 involuntary
ovs-vswitchd spends a long time waiting for NETLINK_ROUTE responses.
We receive a full Internet routing table via BGP and import it into the kernel.
There seems to be a performance issue when the kernel holds about 700k route
prefixes.
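
In case it helps anyone compare against their own setup, here is a rough sketch
for totaling the "Unreasonably long ... poll interval" warnings in a log like
the one above. It assumes the timeval message format shown in this mail; the
function name summarize_poll_intervals is just something made up for this
example and is not part of OVS.

```python
import re

# Matches the timeval warning as it appears in the log above, e.g.
# "...|timeval|WARN|Unreasonably long 3787ms poll interval (1502ms user, 1706ms system)"
# Assumption: other OVS versions may word this message differently.
POLL_RE = re.compile(
    r"Unreasonably long (\d+)ms poll interval \((\d+)ms user, (\d+)ms system\)"
)


def summarize_poll_intervals(log_text):
    """Return (count, total_ms, user_ms, system_ms) over all long poll intervals."""
    # Collapse all whitespace runs to single spaces so that entries which were
    # hard-wrapped (for example by a mail client) still match the pattern.
    flat = " ".join(log_text.split())
    matches = POLL_RE.findall(flat)
    total_ms = sum(int(m[0]) for m in matches)
    user_ms = sum(int(m[1]) for m in matches)
    system_ms = sum(int(m[2]) for m in matches)
    return len(matches), total_ms, user_ms, system_ms
```

Feeding it the contents of the ovs-vswitchd log (wherever your distribution
keeps it) gives the number of stalled iterations and how the time splits
between user and system CPU, which may help show whether the stalls track the
size of the routing table.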
----
Jason Huang