Re: [ovs-dev] [PATCH net] net: openvswitch: remove misbehaving actions length check

Ilya Maximets Mon, 10 Mar 2025 05:25:36 -0700

On 3/8/25 21:03, Aaron Conole wrote:
> Ilya Maximets <i.maxim...@ovn.org> writes:
> 
>> The actions length check is unreliable and produces different results
>> depending on the initial length of the provided netlink attribute and
>> the composition of the actual actions inside of it.  For example, a
>> user can add 4088 empty clone() actions without triggering -EMSGSIZE,
>> on attempt to add 4089 such actions the operation will fail with the
>> -EMSGSIZE verdict.  However, if another 16 KB of other actions will
>> be *appended* to the previous 4089 clone() actions, the check passes
>> and the flow is successfully installed into the openvswitch datapath.
>>
>> The reason for a such a weird behavior is the way memory is allocated.
>> When ovs_flow_cmd_new() is invoked, it calls ovs_nla_copy_actions(),
>> that in turn calls nla_alloc_flow_actions() with either the actual
>> length of the user-provided actions or the MAX_ACTIONS_BUFSIZE.  The
>> function adds the size of the sw_flow_actions structure and then the
>> actually allocated memory is rounded up to the closest power of two.
>>
>> So, if the user-provided actions are larger than MAX_ACTIONS_BUFSIZE,
>> then MAX_ACTIONS_BUFSIZE + sizeof(*sfa) rounded up is 32K + 24 -> 64K.
>> Later, while copying individual actions, we look at ksize(), which is
>> 64K, so this way the MAX_ACTIONS_BUFSIZE check is not actually
>> triggered and the user can easily allocate almost 64 KB of actions.
>>
>> However, when the initial size is less than MAX_ACTIONS_BUFSIZE, but
>> the actions contain ones that require size increase while copying
>> (such as clone() or sample()), then the limit check will be performed
>> during the reserve_sfa_size() and the user will not be allowed to
>> create actions that yield more than 32 KB internally.
>>
>> This is one part of the problem.  The other part is that it's not
>> actually possible for the userspace application to know beforehand
>> if the particular set of actions will be rejected or not.
>>
>> Certain actions require more space in the internal representation,
>> e.g. an empty clone() takes 4 bytes in the action list passed in by
>> the user, but it takes 12 bytes in the internal representation due
>> to an extra nested attribute, and some actions require less space in
>> the internal representations, e.g. set(tunnel(..)) normally takes
>> 64+ bytes in the action list provided by the user, but only needs to
>> store a single pointer in the internal implementation, since all the
>> data is stored in the tunnel_info structure instead.
>>
>> And the action size limit is applied to the internal representation,
>> not to the action list passed by the user.  So, it's not possible for
>> the userpsace application to predict if the certain combination of
>> actions will be rejected or not, because it is not possible for it to
>> calculate how much space these actions will take in the internal
>> representation without knowing kernel internals.
>>
>> All that is causing random failures in ovs-vswitchd in userspace and
>> inability to handle certain traffic patterns as a result.  For example,
>> it is reported that adding a bit more than a 1100 VMs in an OpenStack
>> setup breaks the network due to OVS not being able to handle ARP
>> traffic anymore in some cases (it tries to install a proper datapath
>> flow, but the kernel rejects it with -EMSGSIZE, even though the action
>> list isn't actually that large.)
>>
>> Kernel behavior must be consistent and predictable in order for the
>> userspace application to use it in a reasonable way.  ovs-vswitchd has
>> a mechanism to re-direct parts of the traffic and partially handle it
>> in userspace if the required action list is oversized, but that doesn't
>> work properly if we can't actually tell if the action list is oversized
>> or not.
>>
>> Solution for this is to check the size of the user-provided actions
>> instead of the internal representation.  This commit just removes the
>> check from the internal part because there is already an implicit size
>> check imposed by the netlink protocol.  The attribute can't be larger
>> than 64 KB.  Realistically, we could reduce the limit to 32 KB, but
>> we'll be risking to break some existing setups that rely on the fact
>> that it's possible to create nearly 64 KB action lists today.
>>
>> Vast majority of flows in real setups are below 100-ish bytes.  So
>> removal of the limit will not change real memory consumption on the
>> system.  The absolutely worst case scenario is if someone adds a flow
>> with 64 KB of empty clone() actions.  That will yield a 192 KB in the
>> internal representation consuming 256 KB block of memory.  However,
>> that list of actions is not meaningful and also a no-op.  Real world
>> very large action lists (that can occur for a rare cases of BUM
>> traffic handling) are unlikely to contain a large number of clones and
>> will likely have a lot of tunnel attributes making the internal
>> representation comparable in size to the original action list.
>> So, it should be fine to just remove the limit.
>>
>> Commit in the 'Fixes' tag is the first one that introduced the
>> difference between internal representation and the user-provided action
>> lists, but there were many more afterwards that lead to the situation
>> we have today.
>>
>> Fixes: 7d5437c709de ("openvswitch: Add tunneling interface.")
>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>> ---
> 
> Thanks for the detailed explanation.  Do you think it's useful to
> check with selftest:
> 
>    # python3 ./ovs-dpctl.py add-dp flbr
>    # python3 ./ovs-dpctl.py add-flow flbr \
>      "in_port(0),eth(),eth_type(0x806),arp()" \
>      $(echo 'print("clone(),"*4089)' | python3)
> 
> I think a limit test is probably a good thing to have anyway (although
> after this commit we will rely on netlink limits).


I had a similar thought, but as you said, this commit will remove the limit
so we'll not really be testing OVS code at this point.  So, I thought it
may be better to not include such a test for easier backporting to older
kernels (given the Fixes tag goes far back).   But I agree that it's a good
thing in general to have tests that cover maximum size cases, so maybe we
can add something like this to net-next instead, once the fix is accepted?

Note: 4089 is too small for such a test, it should be somewhere around 16K.

> 
> 
> Reviewed-by: Aaron Conole <acon...@redhat.com>
> 

Thanks for review!

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net] net: openvswitch: remove misbehaving actions length check

Reply via email to