On 3/28/25 14:34, Ilya Maximets wrote: > On 3/28/25 14:26, Ilya Maximets wrote: >> On 3/28/25 14:14, Aaron Conole wrote: >>> Ilya Maximets <i.maxim...@ovn.org> writes: >>> >>>> On 3/25/25 16:38, Aaron Conole via dev wrote: >>>>> Daniel Ding <danieldin...@gmail.com> writes: >>>>> >>>>>> From: Daniel Ding <danieldin...@gmail.com> >>>>>> >>>>>> In order to collect meters since the last meter action >>>>>> to the current action in the dpif_execute_helper_cb, >>>>>> the next entry is simply obtained using nl_attr_next, and >>>>>> without considering the nested nl. This will cause memory >>>>>> access to be overflowed. >>>>>> >>>>>> Gdb debug looks like this: >>>>>> >>>>>> (gdb) print a >>>>>> $6 = (const struct nlattr *) 0xaaaad175a69c >>>>>> (gdb) n >>>>>> 1206 a = nl_attr_next(a); >>>>>> (gdb) n >>>>>> 1207 } while (a != action && >>>>>> (gdb) n >>>>>> 1206 a = nl_attr_next(a); >>>>>> (gdb) print a >>>>>> $7 = (const struct nlattr *) 0xaaaad175a69c >>>>>> (gdb) print *a >>>>>> $8 = {nla_len = 0, nla_type = 0} >>>>>> (gdb) >>>>>> >>>>>> Fixes: 076caa2fb077 ("ofproto: Meter translation.") >>>>>> Signed-off-by: Daniel Ding <danieldin...@gmail.com> >>>>>> --- >>>>>> lib/dpif.c | 43 ++++++++++++++++--------------------------- >>>>>> 1 file changed, 16 insertions(+), 27 deletions(-) >>>>>> >>>>>> diff --git a/lib/dpif.c b/lib/dpif.c >>>>>> index d07241f1e..05dfd2919 100644 >>>>>> --- a/lib/dpif.c >>>>>> +++ b/lib/dpif.c >>>>>> @@ -1158,11 +1158,14 @@ dpif_flow_dump_next(struct dpif_flow_dump_thread >>>>>> *thread, >>>>>> return n; >>>>>> } >>>>>> >>>>>> +enum { EXECUTE_HELPER_MAX_METER = 32 }; >>>>>> + >>>>> >>>>> Could we also prevent this with something like: >>>>> >>>>> while (a != action && nl_attr_is_valid(a) && >>>>> nl_attr_type(a) != OVS_ACTION_ATTR_METER) >>>> >>>> I think nl_attr_is_valid(a) may still be accessing the memory beyond >>>> allocated. >>>> If we want to fix the iteration this way, we should just use >>>> NL_ATTR_FOR_EACH >>>> and break at the right time. It will perform all the necessary length >>>> checks. >>>> >>>> However, as rightly pointed out, this code can't handle nested attributes >>>> and >>>> so it's not correct in general. >>> >>> It might be possible to rework this so that we have something like >>> (psuedo-code): >>> >>> bool nla_is_nested(const struct nlattr *nla) { >>> return nla->nla_type & NLA_F_NESTED; >> >> Note: we're not using the NLA_F_NESTED flag. And can't actually use it >> without breaking compatibility with the kernel. >> >>> } >>> >>> int validate_and_assist(actions, len) { >>> int ret = 0, left; >>> struct nlattr *a; >>> >>> NL_ATTR_FOR_EACH (a, left, actions, len) { >>> if (ret) break; >>> if (nla_is_nested(a)) { >>> ret = validate_and_assist(a, a->nla_len); >>> continue; >>> } >>> ... normal assist ... >>> } >>> return ret; >>> } >>> >>> Then we handle nested attributes, and I think it is a bit easier to >>> read. >>> >>> Maybe we could instead of doing recursively, do it as iterative by >>> breaking things up a bit differently: >>> >>> bool nla_is_nested(const struct nlattr *nla) { >>> return nla->nla_type & NLA_F_NESTED; >>> } >>> >>> int assist_attr(attr) { >>> ... normal assist ... >>> } >>> >>> int assist_actions(actions, len) { >>> int ret = 0, left; >>> struct nlattr *a; >>> >>> NL_ATTR_FOR_EACH (a, left, actions, len) { >>> if (nla_is_nested(a)) { >>> struct nlattr *b; >>> int bleft; >>> NL_NESTED_FOR_EACH(b, bleft, a) { >>> assist_attr(b); >>> } >>> } else { >>> assist_attr(b); >>> } >>> } >>> } >>> >>> Something like an above set of approachs at least will let us handle >>> nested attributes and could be a bit more pleasant to read. >>> >> >> Basically, what we need is a depth-first search for the current action >> starting from the first meter we saw. > > Hmm, actually, no. Because the last meter we saw might have been in > a different branch of execution and isn't relevant to the current > action at all. So, we have to start form the very beginning of the > actions, or somehow traverse this tree backwards from the current > action to the beginning collecting the meters and stopping if we hit > another action that requires assistance, because all the meters before > that would be already executed.
So, the algorithm should look like: 1. Start at the very beginning of the action list. 2. Find a full list of actions on the path from the beginning to the current action using DFS. 3. Find the last action on the path that required datapath assistance. 4. Add all the meters from the path between that action and the current one. It would be better if we could properly accumulate meters while executing other actions, but it's tricky because different actions are executed in different parts of the code and odp-execute doesn't know how the aux structure works. We may work around that by adding another callback for odp-execute to use whenever we go into or out of nested actions, so dpif can keep track of the nesting and push/pop meters accordingly. Maybe that's a better approach. At least, that will be more performant. > >> Storing all the meters we encounter >> on the path we found (push and pop actions when we recursively go into >> nested actions and return). >> >>>>> >>>>> I haven't done a deep review, just wondering because from a quick >>>>> glance, it seems that this change technically imposes limits on the >>>>> action list we are helping with, and then silently drops them after >>>>> that. >>>> >>>> It may be fine to limit the number of meters we will execute for one >>>> packet, >>>> 32 meters is kind of a lot for one packet. This execution path is not >>>> great >>>> to begin with. But yes, we could have a growing dynamic array or use a >>>> list >>>> instead to avoid missing some of the meters. >>>> >>>> A bigger problem here, however, is that we're accumulating those meters, >>>> but we're not removing them when we go back from the recursive execution of >>>> the clone or other nested action, so meters accumulated for one branch of >>>> execution will also be executed for the other branch, e.g. in >>>> actions:clone(meter(1),<action-with-help>),meter(2),<action-with-help> >>>> In this scenario we will accumulate and execute both meters for the last >>>> action, while only the second meter should actually be executed. >>>> >>>>> >>>>> Also, seems you could include a test case with the fix. >>>> >>>> Yes, we should have a test case for this issue. >>>> >>>> Best regards, Ilya Maximets. >>> >> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev