Any takers? I’m hoping I’ve got the right mailing list; I did see
the thread show up on the mailing list website.

> On May 31, 2022, at 10:03 AM, Jon Kohler <[email protected]> wrote:
> 
> 
> 
>> On May 26, 2022, at 9:11 PM, Jon Kohler <[email protected]> wrote:
>> 
>> For netdev_linux_update_via_netlink(), hint to the kernel that
>> we do not need it to gather netlink internal stats when we want
>> to update the netlink flags, as those stats are not rendered
>> within OVS.
>> 
>> Background:
>> ovs-vswitchd can spend quite a bit of time blocked by the kernel
>> during netlink calls, especially on systems with many cores. This
>> time is dominated by the kernel-side internal stats gathering
>> mechanism in netlink, specifically:
>> inet6_fill_link_af
>>   inet6_fill_ifla6_attrs
>>     __snmp6_fill_stats64
>> 
>> In Linux 4.4+, there exists a hint for netlink requests to not
>> trigger the ipv6 stats gathering mechanism, which greatly reduces
>> the amount of time that ovs-vswitchd is on CPU.
>> 
>> Testing and Results:
>> Tested by booting 320 VMs on a patched build of OVS 2.14.2,
>> measuring OVS utilization with perf record, and visualizing the
>> result as a flamegraph. Calls under bridge_run() seem to be hit
>> the worst by this issue.
>> 
>> Before: bridge_run() == 11.3% of samples
>> After:  bridge_run() == 3.4% of samples
>> 
>> Note that there are at least two observed netlink calls under
>> bridge_run that are still kernel stats heavy after this patch:
>> 
>> Call 1:
>> bridge_run -> netdev_run -> route_table_run -> route_table_reset ->
>>   ovs_router_insert -> ovs_router_insert__ -> get_src_addr ->
>>     netdev_get_addr_list -> netdev_linux_get_addr_list -> getifaddrs
>> 
>> Since the actual netlink call is coming from getifaddrs() in glibc,
>> fixing it would likely involve either duplicating glibc code in the
>> OVS source or patching glibc.
>> 
>> Call 2:
>> bridge_run -> iface_refresh_stats -> netdev_get_stats ->
>>   netdev_linux_get_stats -> get_stats_via_netlink
>> 
>> This does use netlink-based stats; however, it isn't immediately
>> clear whether dropping the stats from inet6_fill_link_af would
>> impact anything. Given that this call is more intermittent, it's
>> of lesser concern.
>> 
>> Signed-off-by: Jon Kohler <[email protected]>
>> Acked-by: Greg Smith <[email protected]>
> 
> Gentle bump



> 
>> ---
>> lib/netdev-linux.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>> 
>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
>> index 2766b3f2bf..f0246d3b2b 100644
>> --- a/lib/netdev-linux.c
>> +++ b/lib/netdev-linux.c
>> @@ -247,6 +247,12 @@ enum {
>>    VALID_NUMA_ID           = 1 << 8,
>> };
>> 
>> +/* Linux 4.4 introduced the ability to skip the internal stats gathering
>> + * that netlink does via an external filter mask that can be passed into
>> + * a netlink request.
>> + */
>> +#define     RTEXT_FILTER_SKIP_STATS (1 << 3)
>> +
>> /* Use one for the packet buffer and another for the aux buffer to receive
>> * TSO packets. */
>> #define IOV_STD_SIZE 1
>> @@ -6418,6 +6424,9 @@ netdev_linux_update_via_netlink(struct netdev_linux *netdev)
>>    if (netdev_linux_netnsid_is_remote(netdev)) {
>>        nl_msg_put_u32(&request, IFLA_IF_NETNSID, netdev->netnsid);
>>    }
>> +
>> +    nl_msg_put_u32(&request, IFLA_EXT_MASK, RTEXT_FILTER_SKIP_STATS);
>> +
>>    error = nl_transact(NETLINK_ROUTE, &request, &reply);
>>    ofpbuf_uninit(&request);
>>    if (error) {
>> -- 
>> 2.30.1 (Apple Git-130)
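
For anyone who wants to see what the hint looks like on the wire without the OVS helpers, here is a minimal standalone sketch (not part of the patch). It builds an RTM_GETLINK request in a caller-supplied buffer and appends IFLA_EXT_MASK with RTEXT_FILTER_SKIP_STATS using only the Linux UAPI macros. The helper name build_getlink_request() and its signature are made up for illustration, the #ifndef guard is an assumption for builds against pre-4.4 headers, and the sketch only constructs the message; it does not open a socket or send anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Assumption: headers older than Linux 4.4 do not provide this bit. */
#ifndef RTEXT_FILTER_SKIP_STATS
#define RTEXT_FILTER_SKIP_STATS (1 << 3)
#endif

/* Hypothetical helper, for illustration only.
 *
 * Writes an RTM_GETLINK request for 'ifindex' into 'buf' and appends an
 * IFLA_EXT_MASK attribute carrying 'ext_mask'.  'buf' must be suitably
 * aligned for struct nlmsghdr.  Returns the total message length. */
static size_t
build_getlink_request(void *buf, size_t size, int ifindex, uint32_t ext_mask)
{
    struct nlmsghdr *nlh = buf;
    struct ifinfomsg *ifi;
    struct rtattr *rta;
    size_t needed = NLMSG_SPACE(sizeof *ifi) + RTA_SPACE(sizeof ext_mask);

    assert(size >= needed);
    memset(buf, 0, needed);

    /* Netlink header followed by the fixed ifinfomsg payload. */
    nlh->nlmsg_len = NLMSG_LENGTH(sizeof *ifi);
    nlh->nlmsg_type = RTM_GETLINK;
    nlh->nlmsg_flags = NLM_F_REQUEST;

    ifi = NLMSG_DATA(nlh);
    ifi->ifi_family = AF_UNSPEC;
    ifi->ifi_index = ifindex;

    /* The attribute starts at the next aligned offset after the payload. */
    rta = (struct rtattr *) ((char *) nlh + NLMSG_ALIGN(nlh->nlmsg_len));
    rta->rta_type = IFLA_EXT_MASK;
    rta->rta_len = RTA_LENGTH(sizeof ext_mask);
    memcpy(RTA_DATA(rta), &ext_mask, sizeof ext_mask);

    nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
    return nlh->nlmsg_len;
}
```

Passing RTEXT_FILTER_SKIP_STATS as ext_mask gives the same attribute that the nl_msg_put_u32() call in the patch adds to the request.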

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
