For netdev_linux_update_via_netlink(), hint to the kernel that
we do not need it to gather netlink internal stats when we want
to update the netlink flags, as those stats are not rendered
within OVS.
Background:
ovs-vswitchd can spend quite a bit of time blocked by the kernel
during netlink calls, especially on systems with many cores. This
time is dominated by the kernel-side internal stats gathering
mechanism in netlink, specifically:
inet6_fill_link_af
inet6_fill_ifla6_attrs
__snmp6_fill_stats64
Since Linux 4.4, a netlink request can carry a hint
(RTEXT_FILTER_SKIP_STATS in the IFLA_EXT_MASK attribute) that tells
the kernel not to trigger the IPv6 stats gathering mechanism, which
greatly reduces the amount of time ovs-vswitchd spends on CPU.
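For reference, here is a minimal standalone sketch of what such a request looks like on the wire, assuming raw <linux/rtnetlink.h> macros instead of OVS's nl_msg_*() helpers. The names build_getlink_skip_stats(), add_attr() and find_attr() are hypothetical and exist only for illustration; the sketch just builds an RTM_GETLINK request (device looked up by IFLA_IFNAME) with IFLA_EXT_MASK set to RTEXT_FILTER_SKIP_STATS, and does not open a socket or send anything.

```c
#include <sys/socket.h>
#include <linux/if_link.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Older kernel headers may lack this flag; the value matches the patch. */
#ifndef RTEXT_FILTER_SKIP_STATS
#define RTEXT_FILTER_SKIP_STATS (1 << 3)
#endif

/* Hypothetical helper: append an rtattr of 'type' with 'len' bytes of
 * payload to 'nlh'.  Returns 0 on success, -1 if 'cap' bytes would be
 * exceeded. */
static int
add_attr(struct nlmsghdr *nlh, size_t cap, unsigned short type,
         const void *data, size_t len)
{
    size_t off = NLMSG_ALIGN(nlh->nlmsg_len);
    size_t attr_len = RTA_LENGTH(len);

    if (off + RTA_ALIGN(attr_len) > cap) {
        return -1;
    }
    struct rtattr *rta = (struct rtattr *) ((char *) nlh + off);
    rta->rta_type = type;
    rta->rta_len = (unsigned short) attr_len;
    memcpy(RTA_DATA(rta), data, len);
    nlh->nlmsg_len = (uint32_t) (off + RTA_ALIGN(attr_len));
    return 0;
}

/* Hypothetical helper: build an RTM_GETLINK request for 'ifname' that
 * carries IFLA_EXT_MASK = RTEXT_FILTER_SKIP_STATS.  'buf' must be 4-byte
 * aligned.  Returns the message length, or -1 if 'cap' is too small. */
static int
build_getlink_skip_stats(void *buf, size_t cap, const char *ifname)
{
    size_t hdr_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    uint32_t mask = RTEXT_FILTER_SKIP_STATS;

    if (cap < hdr_len) {
        return -1;
    }
    memset(buf, 0, cap);            /* Zero header, ifinfomsg, padding. */

    struct nlmsghdr *nlh = buf;
    nlh->nlmsg_len = (uint32_t) hdr_len;
    nlh->nlmsg_type = RTM_GETLINK;
    nlh->nlmsg_flags = NLM_F_REQUEST;

    struct ifinfomsg *ifi = NLMSG_DATA(nlh);
    ifi->ifi_family = AF_UNSPEC;    /* Device is identified by name. */

    if (add_attr(nlh, cap, IFLA_IFNAME, ifname, strlen(ifname) + 1)
        || add_attr(nlh, cap, IFLA_EXT_MASK, &mask, sizeof mask)) {
        return -1;
    }
    return (int) nlh->nlmsg_len;
}

/* Hypothetical helper: return the first attribute of 'type' in an
 * RTM_GETLINK message, or NULL if it is absent. */
static const struct rtattr *
find_attr(const struct nlmsghdr *nlh, unsigned short type)
{
    assert(nlh->nlmsg_type == RTM_GETLINK);

    const char *base = (const char *) NLMSG_DATA(nlh);
    const struct rtattr *rta = (const struct rtattr *)
        (base + NLMSG_ALIGN(sizeof(struct ifinfomsg)));
    int rem = (int) nlh->nlmsg_len
              - (int) NLMSG_LENGTH(sizeof(struct ifinfomsg));

    for (; RTA_OK(rta, rem); rta = RTA_NEXT(rta, rem)) {
        if (rta->rta_type == type) {
            return rta;
        }
    }
    return NULL;
}
```

Within OVS the patch gets the same attribute with a single nl_msg_put_u32() call on the existing request, so a sketch like this is only useful for inspecting the wire format outside OVS; actually sending it would still need a NETLINK_ROUTE socket and a receive loop, both omitted here.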
Testing and Results:
Tested by booting 320 VMs on a patched version of OVS 2.14.2,
measuring OVS utilization with perf record, and visualizing the
samples as a flamegraph. Calls under bridge_run() appear to be hit
the hardest by this issue.
Before bridge_run() == 11.3% of samples
After bridge_run() == 3.4% of samples
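Expressed as a relative change (plain arithmetic on the two sample shares above, not an additional measurement), bridge_run()'s share of samples drops by roughly 70%, or about 3.3x:

```latex
\frac{11.3\% - 3.4\%}{11.3\%} = \frac{7.9}{11.3} \approx 0.70,
\qquad
\frac{11.3\%}{3.4\%} \approx 3.3
```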
Note that at least two observed netlink calls under bridge_run()
are still heavy on kernel stats gathering after this patch:
Call 1:
bridge_run -> netdev_run -> route_table_run -> route_table_reset ->
ovs_router_insert -> ovs_router_insert__ -> get_src_addr ->
netdev_get_addr_list -> netdev_linux_get_addr_list -> getifaddrs
Since the actual netlink call comes from getifaddrs() in glibc,
fixing it would likely involve either duplicating glibc code in the
OVS source or patching glibc.
Call 2:
bridge_run -> iface_refresh_stats -> netdev_get_stats ->
netdev_linux_get_stats -> get_stats_via_netlink
This does use netlink-based stats; however, it isn't immediately
clear whether dropping the stats from inet6_fill_link_af would
impact anything. Given that this call is more intermittent, it's
of lesser concern.
Signed-off-by: Jon Kohler <[email protected]>
Acked-by: Greg Smith <[email protected]>
---
lib/netdev-linux.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 2766b3f2bf..f0246d3b2b 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -247,6 +247,12 @@ enum {
     VALID_NUMA_ID = 1 << 8,
 };
 
+/* Linux 4.4 introduced the ability to skip the internal stats gathering
+ * that netlink does via an external filter mask that can be passed into
+ * a netlink request.
+ */
+#define RTEXT_FILTER_SKIP_STATS (1 << 3)
+
 /* Use one for the packet buffer and another for the aux buffer to receive
  * TSO packets. */
 #define IOV_STD_SIZE 1
@@ -6418,6 +6424,9 @@ netdev_linux_update_via_netlink(struct netdev_linux *netdev)
     if (netdev_linux_netnsid_is_remote(netdev)) {
         nl_msg_put_u32(&request, IFLA_IF_NETNSID, netdev->netnsid);
     }
+
+    nl_msg_put_u32(&request, IFLA_EXT_MASK, RTEXT_FILTER_SKIP_STATS);
+
     error = nl_transact(NETLINK_ROUTE, &request, &reply);
     ofpbuf_uninit(&request);
     if (error) {
--
2.30.1 (Apple Git-130)
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev