This series refactors netdev-linux to use event-driven netlink notifications
instead of polling for device state updates, significantly improving performance
under RTNL lock contention.
## Background
The RTNL mutex is used to serialize rtnetlink requests in the Linux kernel.
Its widespread use in many network configuration paths makes it a problem which
the kernel community is well aware of.
When there is a lot of network configuration activity, such as when many
interfaces are being created or deleted, especially interfaces that interact
with HW resources such as SR-IOV VFs, contention on the RTNL mutex can make
rtnetlink requests quite slow.
The impact of RTNL contention on OVS's main thread can be high, making the
entire loop take several seconds (even minutes!) to complete, affecting other
periodic tasks such as OVSDB updates, OpenFlow flow programming, etc.
After analyzing what requests were being sent by OVS, it was observed that most
of them came from netdev-linux state checking mechanisms. While some state is
cached (such as the MTU or MAC address), the netdev's flags are not, and they
are checked often.
On the other hand, Netlink provides a reliable notification mechanism via
multicast groups that allows userspace to receive asynchronous updates when
device state changes, eliminating the need for polling in most cases.
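
As an illustration only (this is not code from the series, which is written
in C), the sketch below shows what such a notification carries. A link
notification delivered on the RTNLGRP_LINK multicast group is an RTM_NEWLINK
message whose payload starts with a struct ifinfomsg holding the interface
index and its current flags. The helper names parse_link_event() and
build_link_event() are hypothetical; the message is built by hand here so the
example does not need a netlink socket.

```python
import struct

# Constants from <linux/rtnetlink.h> and <linux/if.h>.
RTM_NEWLINK = 16
IFF_UP = 0x1
IFF_RUNNING = 0x40

# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid.
NLMSGHDR = struct.Struct("=IHHII")
# struct ifinfomsg: ifi_family, pad, ifi_type, ifi_index, ifi_flags, ifi_change.
IFINFOMSG = struct.Struct("=BxHiII")


def parse_link_event(buf):
    """Return (ifindex, flags) for one RTM_NEWLINK message, else None."""
    min_len = NLMSGHDR.size + IFINFOMSG.size
    if len(buf) < min_len:
        return None
    msg_len, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(buf, 0)
    if msg_type != RTM_NEWLINK or msg_len < min_len or msg_len > len(buf):
        return None
    _family, _type, ifindex, flags, _change = IFINFOMSG.unpack_from(
        buf, NLMSGHDR.size)
    return ifindex, flags


def build_link_event(ifindex, flags):
    """Hypothetical helper: build a minimal RTM_NEWLINK message for testing."""
    payload = IFINFOMSG.pack(0, 0, ifindex, flags, 0xFFFFFFFF)
    header = NLMSGHDR.pack(NLMSGHDR.size + len(payload), RTM_NEWLINK, 0, 0, 0)
    return header + payload


msg = build_link_event(7, IFF_UP | IFF_RUNNING)
ifindex, flags = parse_link_event(msg)
print(ifindex, bool(flags & IFF_UP), bool(flags & IFF_RUNNING))
```

In a real listener the bytes would come from a NETLINK_ROUTE socket subscribed
to the link multicast group, and the attributes following struct ifinfomsg
(name, MTU, address, ...) would be parsed as well.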
## Approach
The series aims to change two aspects of netdev-linux operations:
A - Cache the netdev's flags
B - Switch the main netdev-linux loop run to event-driven reusing the existing
rtnetlink event infrastructure.
In order to accomplish A (commit 3), some bug fixing and refactoring are done
first (commits 1-2).
In order to accomplish B (commit 6), some refactoring is done to enhance the
existing rtnetlink notifier infrastructure (commits 4-5).
Finally, there are some extra consolidation and cleanups (commits 7-9).
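
The effect of combining A and B can be sketched roughly as follows. This is a
simplified model and not the netdev-linux implementation: FakeKernel,
CachedNetdev and on_link_event() are hypothetical names, and the query counter
stands in for rtnetlink requests that may have to wait on the RTNL mutex.

```python
IFF_UP = 0x1  # From <linux/if.h>.


class FakeKernel:
    """Stand-in for the kernel; counts how many flag queries are issued."""

    def __init__(self):
        self.flags = {}
        self.queries = 0

    def get_flags(self, name):
        self.queries += 1  # Models one request that may block on RTNL.
        return self.flags[name]


class CachedNetdev:
    """Device whose flags are cached and refreshed by notifications."""

    def __init__(self, name, kernel):
        self.name = name
        self.kernel = kernel
        self.flags = None  # None means the cache is not valid yet.

    def get_flags(self):
        # Only query the kernel when nothing is cached.
        if self.flags is None:
            self.flags = self.kernel.get_flags(self.name)
        return self.flags

    def on_link_event(self, new_flags):
        # A notification already carries the new flags: no query needed.
        self.flags = new_flags


kernel = FakeKernel()
kernel.flags["veth0"] = IFF_UP
dev = CachedNetdev("veth0", kernel)

for _ in range(1000):          # Many main-loop iterations reading the flags.
    dev.get_flags()
dev.on_link_event(0)           # Link goes down; state arrives via an event.
print(kernel.queries, dev.get_flags())
```

In this model a thousand reads cost a single query, and the later state change
costs none, whereas an uncached design would issue one query per read.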
## Testing and results
In order to test this series, I have written a small script that churns
(deletes and recreates) some OVS ports (veths) in the way an SDN would do.
I varied the number of interfaces to churn from 10 to 100.
In order to simulate RTNL mutex contention I used delay-kfunc [1] to
introduce latency to 'rtnl_lock'. The following table shows the time it
takes to complete the test:
==============================================================================
N ifaces  RTNL delay (μs)  Main (s)        Series (s)      Delta (%)
------------------------------------------------------------------------------
10        0                0.275(0.008)    0.234(0.014)    -14.9%
10        50               0.269(0.009)    0.249(0.012)    -7.3%
10        100              0.278(0.011)    0.266(0.007)    -4.6%
10        500              0.423(0.039)    0.395(0.046)    -6.7%
10        1000             0.695(0.060)    0.586(0.045)    -15.6%
10        5000             1.855(0.099)    1.818(0.041)    -2.0%
10        10000            4.361(0.074)    3.106(0.111)    -28.8%
20        0                0.485(0.014)    0.424(0.019)    -12.6%
20        50               0.478(0.018)    0.472(0.015)    -1.3%
20        100              0.504(0.018)    0.493(0.020)    -2.3%
20        500              0.716(0.022)    0.678(0.031)    -5.3%
20        1000             0.994(0.026)    0.926(0.083)    -6.9%
20        5000             3.313(0.133)    2.851(0.039)    -13.9%
20        10000            6.803(0.093)    4.875(0.117)    -28.3%
30        0                0.716(0.024)    0.645(0.033)    -10.0%
30        50               0.723(0.018)    0.692(0.019)    -4.2%
30        100              0.744(0.024)    0.745(0.031)    +0.1%
30        500              0.981(0.031)    0.997(0.034)    +1.6%
30        1000             1.328(0.046)    1.222(0.040)    -8.0%
30        5000             4.838(0.059)    3.865(0.079)    -20.1%
30        10000            9.146(0.110)    6.653(0.110)    -27.3%
40        0                0.974(0.042)    0.864(0.065)    -11.3%
40        50               0.963(0.032)    0.961(0.044)    -0.2%
40        100              0.997(0.040)    1.004(0.043)    +0.7%
40        500              1.397(0.105)    1.359(0.035)    -2.7%
40        1000             1.990(0.107)    1.805(0.096)    -9.3%
40        5000             7.240(1.751)    4.967(0.587)    -31.4%
40        10000            11.657(0.131)   8.289(0.308)    -28.9%
50        0                1.340(0.111)    1.253(0.167)    -6.5%
50        50               1.410(0.196)    1.274(0.059)    -9.7%
50        100              1.411(0.108)    1.329(0.111)    -5.8%
50        500              1.788(0.060)    1.779(0.079)    -0.5%
50        1000             2.656(0.220)    2.446(0.097)    -7.9%
50        5000             11.532(0.132)   8.216(0.094)    -28.8%
50        10000            22.685(1.157)   14.098(0.186)   -37.8%
60        0                1.760(0.249)    1.738(0.333)    -1.3%
60        50               1.945(0.283)    1.851(0.305)    -4.8%
60        100              1.777(0.340)    1.613(0.116)    -9.2%
60        500              2.525(0.184)    2.330(0.125)    -7.7%
60        1000             3.497(0.327)    3.247(0.174)    -7.2%
60        5000             14.390(0.172)   10.093(0.138)   -29.9%
60        10000            27.980(0.545)   17.383(0.211)   -37.9%
80        0                3.977(0.767)    3.632(0.651)    -8.7%
80        50               3.550(0.667)    3.294(0.645)    -7.2%
80        100              3.854(0.679)    3.182(0.763)    -17.4%
80        500              4.571(0.685)    3.998(0.619)    -12.5%
80        1000             6.445(0.490)    4.955(0.281)    -23.1%
80        5000             27.107(0.331)   17.348(0.197)   -36.0%
80        10000            54.738(0.971)   31.525(1.116)   -42.4%
100       0                8.509(2.392)    7.452(0.138)    -12.4%
100       50               7.730(0.552)    7.278(1.877)    -5.8%
100       100              8.084(2.648)    7.342(1.124)    -9.2%
100       500              7.543(0.551)    6.851(0.997)    -9.2%
100       1000             10.784(0.782)   7.990(0.651)    -25.9%
100       5000             36.393(0.626)   25.800(0.363)   -29.1%
100       10000            72.916(2.488)   45.648(1.929)   -37.4%
==============================================================================
Notes about the above results:
- Values are shown as "{mean}({std})".
- I did not perform any kind of tuning or CPU isolation on the test server.
- delay-kfunc does not always introduce the exact same delay so there is some
source of variance there as well.
- Beyond 200 interfaces, limitations of the test script itself make the results
rather unreliable.
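
For reference, the Delta column appears to be the relative change of the mean
run time with the series applied versus main. That formula is inferred from
the table values rather than stated anywhere, and delta_pct() is just a
hypothetical helper name; a minimal sketch:

```python
def delta_pct(main_s, series_s):
    """Relative change of the series run time versus main, in percent."""
    return round((series_s - main_s) / main_s * 100, 1)


# First and last rows of the table (mean values only).
print(delta_pct(0.275, 0.234))    # 10 ifaces, no delay
print(delta_pct(72.916, 45.648))  # 100 ifaces, 10000 us delay
```

Negative values mean the series completed the test faster than main.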
All in all, a pretty consistent improvement is observed which grows with the
number of interfaces that we churn and with the amount of external RTNL
pressure.
## Future work
This is part of a larger effort to improve robustness against RTNL contention.
I plan to work on more optimizations in future series.
[1] https://github.com/xdp-project/bpf-examples/tree/main/delay-kfunc
Adrian Moreno (9):
netdev-linux: Fix IFLA_IF_NETNSID value.
netdev_linux: Refactor netdev flag update.
netdev-linux: Cache netdev flags.
netlink-notifier: Drain socket on overflow.
netlink-notifier: Include nsid in callbacks.
netdev-linux: Use rtnetlink to update state.
netdev-linux: Consolidate RTM_GETLINK parsing.
linux-netdev: Check status when reading stats.
netdev-linux: Consolidate netlink updates.
acinclude.m4 | 8 +
lib/if-notifier.c | 5 +-
lib/netdev-afxdp.c | 2 +-
lib/netdev-linux-private.h | 5 +-
lib/netdev-linux.c | 417 +++++++++++++++------------------
lib/netdev-linux.h | 3 +
lib/netlink-notifier.c | 23 +-
lib/netlink-notifier.h | 14 +-
lib/netnsid.h | 1 +
lib/route-table.c | 17 +-
lib/route-table.h | 2 +-
lib/rtnetlink.c | 32 ++-
lib/rtnetlink.h | 52 +++-
lib/tnl-ports.c | 2 +-
tests/system-interface.at | 1 +
tests/system-route.at | 12 +-
tests/system-tap.at | 5 +-
tests/test-lib-route-table.c | 20 +-
tests/test-netlink-conntrack.c | 7 +-
19 files changed, 358 insertions(+), 270 deletions(-)
--
2.52.0