On 3/21/24 14:28, Aaron Conole wrote:
> Ilya Maximets <[email protected]> writes:
> 
>> Currently, ovs-vswitchd is subscribed to all the routing changes in the
>> kernel.  On each change, it marks the internal routing table cache as
>> invalid, then resets it and dumps all the routes from the kernel from
>> scratch.  The reason for that is kernel routing updates not being
>> reliable in a sense that it's hard to tell which route is getting
>> removed or modified.  Userspace application has to track the order in
>> which route entries are dumped from the kernel.  Updates can get lost
>> or even duplicated and the kernel doesn't provide a good mechanism to
>> distinguish one route from another.  To my knowledge, dumping all the
>> routes from a kernel after each change is the only way to keep the
>> cache consistent.  Some more info can be found in the following never
>> addressed issues:
>>   https://bugzilla.redhat.com/1337860
>>   https://bugzilla.redhat.com/1337855
>>
>> It seems to be believed that NetworkManager "mostly" does incremental
>> updates right.  But it is still not completely correct, will re-dump
>> the whole table in certain cases, and it takes a huge amount of very
>> complicated code to do the accounting and route comparisons.
>>
>> Going back to ovs-vswitchd, it currently dumps routes from all the
>> routing tables.  If it will get conflicting routes from multiple
>> tables, the cache will not be useful.  The routing cache in userspace
>> is primarily used for checking the egress port for tunneled traffic
>> and this way also detecting link state changes for a tunnel port.
>> For userspace datapath it is used for actual routing of the packet
>> after sending to a native tunnel.
>> With kernel datapath we don't really have a mechanism to know which
>> routing table will actually be used by the kernel after encapsulation,
>> so our lookups on a cache may be incorrect because of this as well.
>>
>> So, unless all the relevant routes are in the standard tables, the
>> lookup in userspace route cache is unreliable.
>>
>> Luckily, most setups are not using any complicated routing in
>> non-standard tables that OVS has to be aware of.
>>
>> It is possible, but unlikely, that standard routing tables are
>> completely empty while some other custom table is not, and all the OVS
>> tunnel traffic is directed to that table.  That would be the only
>> scenario where dumping non-standard tables would make sense.  But it
>> seems like this kind of setup will likely need a way to tell OVS from
>> which table the routes should be taken, or we'll need to dump routing
>> rules and keep a separate cache for each table, so we can first match
>> on rules and then lookup correct routes in a specific table.  I'm not
>> sure if trying to implement all that is justified.
>>
>> For now, stop considering routes from non-standard tables to avoid
>> mixing different tables together and also wasting CPU resources.
>>
>> This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is
>> running on a same host and in a same network namespace with OVS using
>> its own custom routing table.
>>
>> Unfortunately, there seems to be no way to tell the kernel to send
>> updates only for particular tables.  So, we'll still receive and parse
>> all of them.  But they will not result in a full cache invalidation in
>> most cases.
>>
>> Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps.
>> So, we can make use of it and dump only standard tables when we get a
>> relevant route update.  NETLINK_GET_STRICT_CHK has to be enabled on
>> the socket for filtering to work.  There is no reason to not enable it
>> by default, if supported.  It is not used outside of NETLINK_ROUTE.
>>
>> Fixes: f0e167f0dbad ("route-table: Handle route updates more robustly.")
>> Fixes: ea83a2fcd0d3 ("lib: Show tunnel egress interface in ovsdb")
>> Reported-at: https://github.com/openvswitch/ovs-issues/issues/185
>> Reported-at: 
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html
>> Signed-off-by: Ilya Maximets <[email protected]>
>> ---
> 
> Thanks!
> 
> Acked-by: Aaron Conole <[email protected]>
> 

Thanks!  Applied and backported down to 2.17.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to