NOTE: This change depends on a kernel change that is currently under
review on the netdev mailing list (see [1]).

Previously the kernel did not provide a netlink interface to flush/list
only conntrack entries matching a specific zone. With [1] it is now
possible to flush and list conntrack entries filtered by zone. Older
kernels that do not yet support this feature ignore the filter. For the
list request that means returning all entries (which we can then filter
in userspace as before). For the flush request that means deleting all
conntrack entries.

This significantly improves the performance of flushing conntrack zones
when the conntrack table is large. Since flushing a conntrack zone is
normally triggered via an OpenFlow command, it blocks the main OVS thread
and thereby also blocks new flows from being applied. The main benefit
can already be achieved by using the existing logic with the additional
filter based on the zone (90-95% speedup). Using the new logic to flush
directly by zone brings an additional 10-15% on top of that (more numbers
below).

In combination with OVN, the creation of a Logical_Router (which causes
the flushing of a ct zone) could block other operations, e.g. the
failover of Logical_Routers (as they cause new flows to be created).
This is visible from a user perspective as an ovn-controller that is idle
(as it waits for vswitchd) and vswitchd reporting:
"blocked 1000 ms waiting for main to quiesce" (potentially with ever
increasing times).

The following performance tests were run in a qemu vm with 500,000
conntrack entries distributed evenly over 500 ct zones using
`ovstest test-netlink-conntrack flush zone=<zoneid>`.

With this patch and the respective kernel patch applied, but
OVS_NETLINK_CONNTRAK_FLUSH_ZONE_SUPPORTED unset:

----------------------------------------------------------------------------------------------------------------
                                Min (s)  Median (s)  90%ile (s)  99%ile (s)  Max (s)  Mean (s)  Total (s)  Count
----------------------------------------------------------------------------------------------------------------
flush zone with 1000 entries      0.309       0.372       0.393       0.467    0.516     0.374     93.597    250
flush zone with no entry          0.265       0.305       0.333       0.352    0.393     0.307     76.770    250
----------------------------------------------------------------------------------------------------------------

With this patch and the respective kernel patch applied, and
OVS_NETLINK_CONNTRAK_FLUSH_ZONE_SUPPORTED set:

----------------------------------------------------------------------------------------------------------------
                                Min (s)  Median (s)  90%ile (s)  99%ile (s)  Max (s)  Mean (s)  Total (s)  Count
----------------------------------------------------------------------------------------------------------------
flush zone with 1000 entries      0.256       0.323       0.341       0.367    0.389     0.322     80.729    250
flush zone with no entry          0.225       0.265       0.317       0.336    0.351     0.274     68.659    250
----------------------------------------------------------------------------------------------------------------

Before this patch and/or without the respective kernel patch:

----------------------------------------------------------------------------------------------------------------
                                Min (s)  Median (s)  90%ile (s)  99%ile (s)  Max (s)  Mean (s)  Total (s)  Count
----------------------------------------------------------------------------------------------------------------
flush zone with 1000 entries      2.499       4.990       5.209       6.435    7.150     5.008   1252.158    250
flush zone with no entry          4.120       4.572       4.783       5.156    5.364     4.559   1139.786    250
----------------------------------------------------------------------------------------------------------------

[1]: https://lore.kernel.org/netdev/zvegfp2x-wx6d...@sit-sdelap4051.int.lidl.net/T/#u
---
 lib/netlink-conntrack.c | 49 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/lib/netlink-conntrack.c b/lib/netlink-conntrack.c
index 492bfcffb..32be0d122 100644
--- a/lib/netlink-conntrack.c
+++ b/lib/netlink-conntrack.c
@@ -141,6 +141,9 @@ nl_ct_dump_start(struct nl_ct_dump_state **statep, const uint16_t *zone,
 
     nl_msg_put_nfgenmsg(&state->buf, 0, AF_UNSPEC, NFNL_SUBSYS_CTNETLINK,
                         IPCTNL_MSG_CT_GET, NLM_F_REQUEST);
+    if (zone) {
+        nl_msg_put_be16(&state->buf, CTA_ZONE, htons(*zone));
+    }
     nl_dump_start(&state->dump, NETLINK_NETFILTER, &state->buf);
     ofpbuf_clear(&state->buf);
 
@@ -283,23 +286,65 @@ nl_ct_flush_zone(uint16_t flush_zone)
     return err;
 }
 #else
+
+static bool netlink_flush_supports_zone(void) {
+    static bool valid, supported = false;
+    if (!valid) {
+        char *env = getenv("OVS_NETLINK_CONNTRAK_FLUSH_ZONE_SUPPORTED");
+        if (env && env[0]) {
+            if (env[0] == 'T' || env[0] == 't') {
+                supported = true;
+            }
+        }
+        valid = true;
+    }
+    return supported;
+}
+
 int
 nl_ct_flush_zone(uint16_t flush_zone)
 {
-    /* Apparently, there's no netlink interface to flush a specific zone.
+    /* In older kernels, there was no netlink interface to flush a specific
+     * conntrack zone.
      * This code dumps every connection, checks the zone and eventually
      * delete the entry.
+     * In newer kernels there is the option to specify a zone for filtering
+     * during dumps. Older kernels ignore this option. We set it here in the
+     * hope we only get relevant entries back, but fall back to filtering here
+     * to keep compatibility.
      *
-     * This is race-prone, but it is better than using shell scripts. */
+     * This is race-prone, but it is better than using shell scripts.
+     *
+     * Additionally newer kernels also support flushing a zone without listing
+     * it first. However it is not easily possible to discover if the kernel
+     * supports this feature or if it will flush the complete conntrack table.
+     * We therefore rely on an environment variable, allowing the user to
+     * provide us this information. In the future we can use kernel version
+     * numbers. */
 
     struct nl_dump dump;
     struct ofpbuf buf, reply, delete;
+    int err;
+
+    if (netlink_flush_supports_zone()) {
+        ofpbuf_init(&buf, NL_DUMP_BUFSIZE);
+
+        nl_msg_put_nfgenmsg(&buf, 0, AF_UNSPEC, NFNL_SUBSYS_CTNETLINK,
+                            IPCTNL_MSG_CT_DELETE, NLM_F_REQUEST);
+        nl_msg_put_be16(&buf, CTA_ZONE, htons(flush_zone));
+
+        err = nl_transact(NETLINK_NETFILTER, &buf, NULL);
+        ofpbuf_uninit(&buf);
+
+        return err;
+    }
 
     ofpbuf_init(&buf, NL_DUMP_BUFSIZE);
     ofpbuf_init(&delete, NL_DUMP_BUFSIZE);
 
     nl_msg_put_nfgenmsg(&buf, 0, AF_UNSPEC, NFNL_SUBSYS_CTNETLINK,
                         IPCTNL_MSG_CT_GET, NLM_F_REQUEST);
+    nl_msg_put_be16(&buf, CTA_ZONE, htons(flush_zone));
     nl_dump_start(&dump, NETLINK_NETFILTER, &buf);
     ofpbuf_clear(&buf);
 
-- 
2.42.0
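
For readers who want to check which settings of the environment variable
enable the direct flush path, the rule used by
netlink_flush_supports_zone() can be isolated as in the sketch below. The
helper names value_enables_zone_flush() and env_enables_zone_flush() are
hypothetical and not part of the patch. Unlike the patch, the sketch does
not cache the result in a static variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the patch: applies the same rule as
 * netlink_flush_supports_zone() to an arbitrary string. The value counts
 * as "supported" only if its first character is 'T' or 't'. */
static bool
value_enables_zone_flush(const char *value)
{
    return value != NULL && (value[0] == 'T' || value[0] == 't');
}

/* Hypothetical wrapper: reads the variable on every call, whereas the
 * patch looks it up once and remembers the answer. */
static bool
env_enables_zone_flush(void)
{
    return value_enables_zone_flush(
        getenv("OVS_NETLINK_CONNTRAK_FLUSH_ZONE_SUPPORTED"));
}
```

Under this rule, values such as "true", "TRUE" or "t" select the direct
flush, while an unset or empty variable, "1" or "yes" keep the
dump-and-delete path with the zone filter.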