Any takers? I’m hoping I’ve got the right mailing list; I did see the thread show up on the mailing list website.
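In case it helps review, here is a minimal standalone sketch of the kind of request the patch quoted below ends up sending: an RTM_GETLINK message carrying IFLA_EXT_MASK set to RTEXT_FILTER_SKIP_STATS. This is not OVS code. The helper name build_getlink_skip_stats() is hypothetical, it assumes the Linux UAPI headers are available, and it assumes the caller passes a buffer that is at least 4-byte aligned. In OVS itself the attribute is appended with nl_msg_put_u32() as shown in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_link.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Older kernel headers may lack this flag; the value matches the patch. */
#ifndef RTEXT_FILTER_SKIP_STATS
#define RTEXT_FILTER_SKIP_STATS (1 << 3)
#endif

/* Hypothetical helper (not part of OVS): builds an RTM_GETLINK request for
 * 'ifindex' into 'buf', with an IFLA_EXT_MASK attribute asking the kernel
 * to skip stats gathering.  'buf' must be at least 4-byte aligned.
 * Returns the message length, or 0 if 'size' is too small. */
static size_t
build_getlink_skip_stats(void *buf, size_t size, int ifindex)
{
    const uint32_t ext_mask = RTEXT_FILTER_SKIP_STATS;
    const size_t hdr_len = NLMSG_ALIGN(NLMSG_LENGTH(sizeof(struct ifinfomsg)));
    const size_t total = hdr_len + RTA_SPACE(sizeof ext_mask);
    struct nlmsghdr *nlh;
    struct ifinfomsg *ifi;
    struct rtattr *rta;

    assert(buf != NULL);
    if (size < total) {
        return 0;
    }
    memset(buf, 0, total);

    /* Netlink header: a plain (non-dump) link query. */
    nlh = buf;
    nlh->nlmsg_len = total;
    nlh->nlmsg_type = RTM_GETLINK;
    nlh->nlmsg_flags = NLM_F_REQUEST;

    /* Fixed-size link header selecting the interface by index. */
    ifi = NLMSG_DATA(nlh);
    ifi->ifi_family = AF_UNSPEC;
    ifi->ifi_index = ifindex;

    /* The one attribute this patch adds: the extended filter mask. */
    rta = (struct rtattr *) ((char *) buf + hdr_len);
    rta->rta_type = IFLA_EXT_MASK;
    rta->rta_len = RTA_LENGTH(sizeof ext_mask);
    memcpy(RTA_DATA(rta), &ext_mask, sizeof ext_mask);

    return total;
}
```

The resulting buffer could then be sent on an AF_NETLINK socket opened with NETLINK_ROUTE. Going by the patch description, a 4.4+ kernel should then answer without walking the per-CPU IPv6 stats, though I have only measured that through OVS and not with this sketch.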
> On May 31, 2022, at 10:03 AM, Jon Kohler <[email protected]> wrote:
>
>> On May 26, 2022, at 9:11 PM, Jon Kohler <[email protected]> wrote:
>>
>> For netdev_linux_update_via_netlink(), hint to the kernel that
>> we do not need it to gather netlink internal stats when we want
>> to update the netlink flags, as those stats are not rendered
>> within OVS.
>>
>> Background:
>> ovs-vswitchd can spend quite a bit of time blocked by the kernel
>> during netlink calls, especially on systems with many cores. This
>> time is dominated by the kernel-side internal stats gathering
>> mechanism in netlink, specifically:
>>   inet6_fill_link_af
>>     inet6_fill_ifla6_attrs
>>       __snmp6_fill_stats64
>>
>> In Linux 4.4+, there exists a hint for netlink requests to not
>> trigger the ipv6 stats gathering mechanism, which greatly reduces
>> the amount of time that ovs-vswitchd is on CPU.
>>
>> Testing and Results:
>> Tested booting 320 VMs and measuring OVS utilization with perf
>> record, then visualized into a flamegraph using a patched version
>> of ovs 2.14.2. Calls under bridge_run() seem to get hit the worst
>> by this issue.
>>
>> Before bridge_run() == 11.3% of samples
>> After  bridge_run() ==  3.4% of samples
>>
>> Note that there are at least two observed netlink calls under
>> bridge_run that are still kernel stats heavy after this patch:
>>
>> Call 1:
>> bridge_run -> netdev_run -> route_table_run -> route_table_reset ->
>> ovs_router_insert -> ovs_router_insert__ -> get_src_addr ->
>> netdev_get_addr_list -> netdev_linux_get_addr_list -> getifaddrs
>>
>> Since the actual netlink call is coming from getifaddrs() in glibc,
>> fixing would likely involve either duplicating glibc code in ovs
>> source or patching glibc.
>>
>> Call 2:
>> bridge_run -> iface_refresh_stats -> netdev_get_stats ->
>> netdev_linux_get_stats -> get_stats_via_netlink
>>
>> This does use netlink based stats; however, it isn't immediately
>> clear if just dropping the stats from inet6_fill_link_af would
>> impact anything or not. Given this call is more intermittent, it's
>> of lesser concern.
>>
>> Signed-off-by: Jon Kohler <[email protected]>
>> Acked-by: Greg Smith <[email protected]>
>
> Gentle bump
>
>> ---
>>  lib/netdev-linux.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
>> index 2766b3f2bf..f0246d3b2b 100644
>> --- a/lib/netdev-linux.c
>> +++ b/lib/netdev-linux.c
>> @@ -247,6 +247,12 @@ enum {
>>      VALID_NUMA_ID = 1 << 8,
>>  };
>>
>> +/* Linux 4.4 introduced the ability to skip the internal stats gathering
>> + * that netlink does via an external filter mask that can be passed into
>> + * a netlink request.
>> + */
>> +#define RTEXT_FILTER_SKIP_STATS (1 << 3)
>> +
>>  /* Use one for the packet buffer and another for the aux buffer to receive
>>   * TSO packets. */
>>  #define IOV_STD_SIZE 1
>> @@ -6418,6 +6424,9 @@ netdev_linux_update_via_netlink(struct netdev_linux *netdev)
>>      if (netdev_linux_netnsid_is_remote(netdev)) {
>>          nl_msg_put_u32(&request, IFLA_IF_NETNSID, netdev->netnsid);
>>      }
>> +
>> +    nl_msg_put_u32(&request, IFLA_EXT_MASK, RTEXT_FILTER_SKIP_STATS);
>> +
>>      error = nl_transact(NETLINK_ROUTE, &request, &reply);
>>      ofpbuf_uninit(&request);
>>      if (error) {
>> --
>> 2.30.1 (Apple Git-130)

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
