On 3/21/24 14:28, Aaron Conole wrote:
> Ilya Maximets <[email protected]> writes:
>
>> Currently, ovs-vswitchd is subscribed to all the routing changes in the
>> kernel. On each change, it marks the internal routing table cache as
>> invalid, then resets it and dumps all the routes from the kernel from
>> scratch. The reason for that is that kernel routing updates are not
>> reliable, in the sense that it's hard to tell which route is getting
>> removed or modified. A userspace application has to track the order in
>> which route entries are dumped from the kernel. Updates can get lost
>> or even duplicated, and the kernel doesn't provide a good mechanism to
>> distinguish one route from another. To my knowledge, dumping all the
>> routes from the kernel after each change is the only way to keep the
>> cache consistent. Some more info can be found in the following
>> never-addressed issues:
>> https://bugzilla.redhat.com/1337860
>> https://bugzilla.redhat.com/1337855
>>
>> It seems to be believed that NetworkManager "mostly" does incremental
>> updates right. But it is still not completely correct, it will re-dump
>> the whole table in certain cases, and it takes a huge amount of very
>> complicated code to do the accounting and route comparisons.
>>
>> Going back to ovs-vswitchd, it currently dumps routes from all the
>> routing tables. If it gets conflicting routes from multiple tables,
>> the cache will not be useful. The routing cache in userspace is
>> primarily used for checking the egress port for tunneled traffic and,
>> this way, also for detecting link state changes for a tunnel port.
>> For the userspace datapath, it is used for the actual routing of the
>> packet after it is sent to a native tunnel.
>> With the kernel datapath, we don't really have a mechanism to know
>> which routing table will actually be used by the kernel after
>> encapsulation, so our lookups in the cache may be incorrect because
>> of this as well.
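The invalidate-and-re-dump approach described above can be sketched in C as follows. This is a minimal illustration, not the OVS code: struct route_cache and the route_cache_*() functions are hypothetical names, and the real netlink dump is replaced by a counter. It shows that a burst of notifications between two main loop iterations collapses into a single full dump, while any notification at all still forces one, which is what makes a busy routing table expensive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the userspace route cache.  Real code would
 * hold the routes themselves; only the bookkeeping is modeled here. */
struct route_cache {
    bool valid;        /* Cleared by every route change notification. */
    unsigned n_dumps;  /* Number of full dumps performed so far. */
};

/* Placeholder for "flush the cache and dump every route from the kernel
 * again".  This sketch only counts how often it is called. */
static void
route_cache_full_dump(struct route_cache *cache)
{
    cache->n_dumps++;
}

/* Called for every route change notification.  The message contents are
 * not used for incremental updates; the cache is only marked stale. */
static void
route_cache_on_change(struct route_cache *cache)
{
    cache->valid = false;
}

/* Called once per main loop iteration.  Any number of notifications
 * received since the previous call collapse into a single re-dump. */
static void
route_cache_run(struct route_cache *cache)
{
    if (!cache->valid) {
        route_cache_full_dump(cache);
        cache->valid = true;
    }
}
```

For example, three notifications arriving between two route_cache_run() calls trigger exactly one re-dump, and a further call with no new notifications triggers none.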
>>
>> So, unless all the relevant routes are in the standard tables, the
>> lookup in the userspace route cache is unreliable.
>>
>> Luckily, most setups do not use any complicated routing in
>> non-standard tables that OVS has to be aware of.
>>
>> It is possible, but unlikely, that the standard routing tables are
>> completely empty while some other custom table is not, and all the OVS
>> tunnel traffic is directed to that table. That would be the only
>> scenario where dumping non-standard tables would make sense. But this
>> kind of setup will likely need a way to tell OVS from which table the
>> routes should be taken, or we'll need to dump routing rules and keep a
>> separate cache for each table, so we can first match on rules and then
>> look up the correct routes in a specific table. I'm not sure that
>> implementing all of that is justified.
>>
>> For now, stop considering routes from non-standard tables, to avoid
>> mixing different tables together and to avoid wasting CPU resources.
>>
>> This fixes high CPU usage in ovs-vswitchd in case a BGP daemon that
>> uses its own custom routing table is running on the same host and in
>> the same network namespace as OVS.
>>
>> Unfortunately, there seems to be no way to tell the kernel to send
>> updates only for particular tables. So, we'll still receive and parse
>> all of them, but in most cases they will not result in a full cache
>> invalidation.
>>
>> Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps,
>> so we can make use of it and dump only the standard tables when we get
>> a relevant route update. NETLINK_GET_STRICT_CHK has to be enabled on
>> the socket for filtering to work. There is no reason not to enable it
>> by default, if supported. It is not used outside of NETLINK_ROUTE.
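Ignoring notifications for non-standard tables can be sketched as follows. Two assumptions, since the mail does not spell them out: "standard" is taken to mean the default, main and local tables (RT_TABLE_DEFAULT, RT_TABLE_MAIN and RT_TABLE_LOCAL from <linux/rtnetlink.h>), and the helper names are hypothetical, not OVS functions. The table is read from the 32-bit RTA_TABLE attribute when present, because the 8-bit rtm_table field cannot hold IDs above 255.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Assumption: "standard" means the default, main and local tables. */
static bool
is_standard_table(uint32_t table_id)
{
    return table_id == RT_TABLE_DEFAULT
           || table_id == RT_TABLE_MAIN
           || table_id == RT_TABLE_LOCAL;
}

/* Returns the table ID of a route message.  rtm_table is only 8 bits
 * wide, so the kernel reports IDs above 255 as RT_TABLE_COMPAT there;
 * the RTA_TABLE attribute, if present, takes precedence. */
static uint32_t
route_msg_table_id(const struct nlmsghdr *nlh)
{
    assert(nlh->nlmsg_len >= NLMSG_LENGTH(sizeof(struct rtmsg)));

    const struct rtmsg *rtm = NLMSG_DATA(nlh);
    uint32_t table_id = rtm->rtm_table;
    int len = RTM_PAYLOAD(nlh);   /* Bytes of attributes after rtmsg. */

    for (const struct rtattr *rta = RTM_RTA(rtm); RTA_OK(rta, len);
         rta = RTA_NEXT(rta, len)) {
        if (rta->rta_type == RTA_TABLE
            && RTA_PAYLOAD(rta) >= sizeof(uint32_t)) {
            memcpy(&table_id, RTA_DATA(rta), sizeof table_id);
        }
    }
    return table_id;
}

/* Hypothetical policy: only route changes in standard tables invalidate
 * the cache.  Updates for custom tables (e.g. one managed by a BGP
 * daemon) are still received and parsed, but otherwise ignored. */
static bool
route_msg_needs_invalidation(const struct nlmsghdr *nlh)
{
    if (nlh->nlmsg_type != RTM_NEWROUTE && nlh->nlmsg_type != RTM_DELROUTE) {
        return false;
    }
    return is_standard_table(route_msg_table_id(nlh));
}
```

With this check in front of the cache invalidation, a stream of updates to a custom table no longer triggers full re-dumps.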
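The kernel-side filtering mentioned above can be sketched as two steps: enabling NETLINK_GET_STRICT_CHK on the NETLINK_ROUTE socket, then sending one RTM_GETROUTE dump request per standard table. This is a sketch under assumptions, not the OVS implementation: struct route_dump_req and the helper names are hypothetical, and the fallback #define values are copied from the Linux and glibc headers for builds against headers that predate 4.20. On older kernels the setsockopt() call is expected to fail and the dump would not be filtered, so replies should still be checked for their table ID.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_GET_STRICT_CHK
#define NETLINK_GET_STRICT_CHK 12   /* Added in Linux 4.20. */
#endif

/* Hypothetical fixed-layout RTM_GETROUTE dump request: netlink header,
 * rtmsg, and one RTA_TABLE attribute carrying the 32-bit table ID. */
struct route_dump_req {
    struct nlmsghdr nlh;
    struct rtmsg rtm;
    struct rtattr rta;
    uint32_t table_id;
};

/* Enables strict checking on a NETLINK_ROUTE socket.  Returns 0 on
 * success or -1 with errno set (typically ENOPROTOOPT on kernels older
 * than 4.20), in which case dumps are simply not filtered. */
static int
enable_strict_check(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_NETLINK, NETLINK_GET_STRICT_CHK,
                      &one, sizeof one);
}

/* Fills 'req' with a dump request for a single routing table. */
static void
build_route_dump_req(struct route_dump_req *req, unsigned char family,
                     uint32_t table_id, uint32_t seq)
{
    memset(req, 0, sizeof *req);
    req->nlh.nlmsg_len = sizeof *req;
    req->nlh.nlmsg_type = RTM_GETROUTE;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->nlh.nlmsg_seq = seq;

    /* With strict checking, header fields such as rtm_dst_len and
     * rtm_scope must stay zero, so only family and table are set. */
    req->rtm.rtm_family = family;
    req->rtm.rtm_table = table_id < 256 ? table_id : RT_TABLE_UNSPEC;

    req->rta.rta_type = RTA_TABLE;
    req->rta.rta_len = RTA_LENGTH(sizeof req->table_id);
    req->table_id = table_id;
}
```

Creating the socket with socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE), sending the request and reading replies until NLMSG_DONE are left out; they are the same as for an unfiltered dump.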
>>
>> Fixes: f0e167f0dbad ("route-table: Handle route updates more robustly.")
>> Fixes: ea83a2fcd0d3 ("lib: Show tunnel egress interface in ovsdb")
>> Reported-at: https://github.com/openvswitch/ovs-issues/issues/185
>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html
>> Signed-off-by: Ilya Maximets <[email protected]>
>> ---
>
> Thanks!
>
> Acked-by: Aaron Conole <[email protected]>
>
Thanks! Applied and backported down to 2.17.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
