On 6/6/23 18:08, Frode Nordahl wrote: > On Tue, Jun 6, 2023 at 5:46 PM Ilya Maximets <i.maxim...@ovn.org> wrote: >> >> On 6/6/23 11:38, Frode Nordahl wrote: >>> The tc module combines the use of the `tc_transact` helper >>> function for communication with the in-kernel tc infrastructure >>> with assertions on the reply data by `ofpbuf_at_assert` on the >>> received data prior to further processing. >>> >>> With the presence of bugs on the kernel side, we need to treat >>> the kernel as an unreliable service provider and replace assertions >>> on the reply from it with checks to avoid a fatal crash of OVS. >>> >>> For the record, the symptom of the crash is this in the log: >>> EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= >>> b->size failed in ofpbuf_at_assert() >>> >>> And an excerpt of the backtrace looks like this: >>> 0x0000561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, >>> offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194 >>> tc_replace_flower (id=<optimized out>, flower=<optimized out>) at >>> ../lib/tc.c:3223 >>> 0x0000561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, >>> match=<optimized out>, actions=<optimized out>, actions_len=<optimized out>, >>> ufid=<optimized out>, info=<optimized out>, stats=<optimized out>) at >>> ../lib/netdev-offload-tc.c:2096 >>> 0x0000561dac117541 in netdev_flow_put (stats=<optimized out>, >>> info=0x7fb65b7ba780, ufid=<optimized out>, act_len=<optimized out>, >>> actions=<optimized out>, >>> match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257 >>> parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at >>> ../lib/dpif-netlink.c:2297 >>> try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at >>> ../lib/dpif-netlink.c:2384 >>> >>> Reported-At: https://launchpad.net/bugs/2018500 >>> Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police >>> action") >>> Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for >>> tc-offload") >>> Fixes: f98e418fbdb6 ("tc: Add tc flower functions") >>> Fixes: c1c9c9c4b636 ("Implement QoS framework.") >>> Signed-off-by: Frode Nordahl <frode.nord...@canonical.com> >>> --- >>> lib/netdev-linux.c | 32 +++++++++++++++++++++--------- >>> lib/tc.c | 49 ++++++++++++++++++++++++++++++++-------------- >>> 2 files changed, 57 insertions(+), 24 deletions(-) >>> >>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c >>> index 36620199e..1efc837ac 100644 >>> --- a/lib/netdev-linux.c >>> +++ b/lib/netdev-linux.c >>> @@ -2714,8 +2714,15 @@ tc_add_matchall_policer(struct netdev *netdev, >>> uint32_t kbits_rate, >>> >>> err = tc_transact(&request, &reply); >>> if (!err) { >>> - struct tcmsg *tc = >>> - ofpbuf_at_assert(reply, NLMSG_HDRLEN, sizeof *tc); >>> + struct ofpbuf b = ofpbuf_const_initializer(reply->data, >>> reply->size); >>> + struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg); >>> + struct tcmsg *tc = ofpbuf_try_pull(&b, sizeof *tc); >>> + >>> + if (!nlmsg || !tc) { >>> + VLOG_ERR_RL(&rl, "Failed to add match all policer, " >>> + "malformed reply"); >> >> We will leak a 'reply' here if this condition ever happens. >> But I'm still not sure what is the point of having and parsing >> a reply for this request at all. If it didn't fail, then we >> can assume it succeeded? > > Oh shoot, what can I say, iteration fatigue, all on me of course, will fix. > >> We can't assume that cls_flower is working correctly, it has >> issues. Even more on older kernels. That's why we request >> to echo everythig back and comparing that kernel configured >> what we asked. I hope, we can trust at least some of the >> interfaces and don't need to re-check everything. >> >> Are there any known issues with the matchall classifier? >> If not, I'd suggest we just don't ask for reply. An error >> code should be enough. > > As for if we can trust the kernel infrastructure here, I really can't say. > I'd prefer to replicate the existing behavior tbh, if you look at the end > of the function, the existing code even asserts that the reply is long > enough before deleting it: > https://github.com/openvswitch/ovs/blob/64cdc290ef441bc3b4c2cddc230311ba58bc31b3/lib/netdev-linux.c#L2718
For the sake of keeping it simple, let's keep the current logic, OK. We can try to figure out why this check was added later. Best regards, Ilya Maximets. _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev