Re: [ovs-dev] [PATCH v6] tc: fix crash on malformed reply from kernel.
On 6/6/23 18:08, Frode Nordahl wrote:
> On Tue, Jun 6, 2023 at 5:46 PM Ilya Maximets wrote:
>>
>> On 6/6/23 11:38, Frode Nordahl wrote:
>>> The tc module combines the use of the `tc_transact` helper
>>> function for communication with the in-kernel tc infrastructure
>>> with assertions on the reply data by `ofpbuf_at_assert` on the
>>> received data prior to further processing.
>>>
>>> With the presence of bugs on the kernel side, we need to treat
>>> the kernel as an unreliable service provider and replace assertions
>>> on the reply from it with checks to avoid a fatal crash of OVS.
>>>
>>> For the record, the symptom of the crash is this in the log:
>>> EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <=
>>> b->size failed in ofpbuf_at_assert()
>>>
>>> And an excerpt of the backtrace looks like this:
>>> 0x561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20,
>>> offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194
>>> tc_replace_flower (id=, flower=) at
>>> ../lib/tc.c:3223
>>> 0x561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840,
>>> match=, actions=, actions_len=,
>>> ufid=, info=, stats=) at
>>> ../lib/netdev-offload-tc.c:2096
>>> 0x561dac117541 in netdev_flow_put (stats=,
>>> info=0x7fb65b7ba780, ufid=, act_len=,
>>> actions=,
>>> match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257
>>> parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at
>>> ../lib/dpif-netlink.c:2297
>>> try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at
>>> ../lib/dpif-netlink.c:2384
>>>
>>> Reported-At: https://launchpad.net/bugs/2018500
>>> Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police
>>> action")
>>> Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for
>>> tc-offload")
>>> Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
>>> Fixes: c1c9c9c4b636 ("Implement QoS framework.")
>>> Signed-off-by: Frode Nordahl
>>> ---
>>> lib/netdev-linux.c | 32 +-
>>> lib/tc.c | 49 --
>>> 2 files changed, 57 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
>>> index 36620199e..1efc837ac 100644
>>> --- a/lib/netdev-linux.c
>>> +++ b/lib/netdev-linux.c
>>> @@ -2714,8 +2714,15 @@ tc_add_matchall_policer(struct netdev *netdev,
>>> uint32_t kbits_rate,
>>>
>>>      err = tc_transact(&request, &reply);
>>>      if (!err) {
>>> -        struct tcmsg *tc =
>>> -            ofpbuf_at_assert(reply, NLMSG_HDRLEN, sizeof *tc);
>>> +        struct ofpbuf b = ofpbuf_const_initializer(reply->data,
>>> reply->size);
>>> +        struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
>>> +        struct tcmsg *tc = ofpbuf_try_pull(&b, sizeof *tc);
>>> +
>>> +        if (!nlmsg || !tc) {
>>> +            VLOG_ERR_RL(&rl, "Failed to add match all policer, "
>>> +                        "malformed reply");
>>
>> We will leak a 'reply' here if this condition ever happens.
>> But I'm still not sure what is the point of having and parsing
>> a reply for this request at all. If it didn't fail, then we
>> can assume it succeeded?
>
> Oh shoot, what can I say, iteration fatigue, all on me of course, will fix.
>
>> We can't assume that cls_flower is working correctly, it has
>> issues. Even more on older kernels. That's why we request
>> to echo everything back and check that the kernel configured
>> what we asked. I hope we can trust at least some of the
>> interfaces and don't need to re-check everything.
>>
>> Are there any known issues with the matchall classifier?
>> If not, I'd suggest we just don't ask for a reply. An error
>> code should be enough.
>
> As for whether we can trust the kernel infrastructure here, I really can't say.
> I'd prefer to replicate the existing behavior tbh. If you look at the end
> of the function, the existing code even asserts that the reply is long
> enough before deleting it:
> https://github.com/openvswitch/ovs/blob/64cdc290ef441bc3b4c2cddc230311ba58bc31b3/lib/netdev-linux.c#L2718

For the sake of keeping it simple, let's keep the current logic, OK.
We can try to figure out why this check was added later.

Best regards, Ilya Maximets.
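To illustrate the leak raised in this thread (returning EPROTO before the reply buffer is freed), here is a minimal, self-contained sketch of the bounded-pull pattern with cleanup on every path. It deliberately does not use the OVS API: `struct buf`, `buf_try_pull`, `check_reply`, `hdr_a`, and `hdr_b` are hypothetical stand-ins for `struct ofpbuf`, `ofpbuf_try_pull`, and the netlink headers, and the 16- and 20-byte sizes only mirror the offset=16, size=20 values seen in the backtrace.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ofpbuf: a read cursor over bytes. */
struct buf {
    const unsigned char *data;
    size_t size;
};

/* Returns a pointer to the first 'n' bytes and advances the cursor,
 * or returns NULL (leaving 'b' unchanged) if fewer than 'n' bytes remain.
 * This is the checked alternative to asserting on the length. */
static const void *
buf_try_pull(struct buf *b, size_t n)
{
    if (n > b->size) {
        return NULL;
    }
    const void *p = b->data;
    b->data += n;
    b->size -= n;
    return p;
}

/* Hypothetical fixed-size headers; the sizes are illustrative only. */
struct hdr_a { unsigned char bytes[16]; };
struct hdr_b { unsigned char bytes[20]; };

/* Validates a heap-allocated reply.  Takes ownership of 'reply' and
 * frees it on both the success and the error path, so a malformed
 * reply is reported as EPROTO without leaking the buffer. */
static int
check_reply(unsigned char *reply, size_t reply_size)
{
    struct buf b = { reply, reply_size };
    const struct hdr_a *a = buf_try_pull(&b, sizeof *a);
    const struct hdr_b *t = buf_try_pull(&b, sizeof *t);
    int error = (!a || !t) ? EPROTO : 0;

    free(reply);    /* Single cleanup point for every outcome. */
    return error;
}
```

In the patch under review, one way to get the same effect would be to call ofpbuf_delete(reply) before the early return, or to set the error code and fall through to the existing ofpbuf_delete(reply).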
Re: [ovs-dev] [PATCH v6] tc: fix crash on malformed reply from kernel.
On Tue, Jun 6, 2023 at 5:46 PM Ilya Maximets wrote:
>
> On 6/6/23 11:38, Frode Nordahl wrote:
> > The tc module combines the use of the `tc_transact` helper
> > function for communication with the in-kernel tc infrastructure
> > with assertions on the reply data by `ofpbuf_at_assert` on the
> > received data prior to further processing.
> >
> > With the presence of bugs on the kernel side, we need to treat
> > the kernel as an unreliable service provider and replace assertions
> > on the reply from it with checks to avoid a fatal crash of OVS.
> >
> > For the record, the symptom of the crash is this in the log:
> > EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <=
> > b->size failed in ofpbuf_at_assert()
> >
> > And an excerpt of the backtrace looks like this:
> > 0x561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20,
> > offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194
> > tc_replace_flower (id=, flower=) at
> > ../lib/tc.c:3223
> > 0x561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840,
> > match=, actions=, actions_len=,
> > ufid=, info=, stats=) at
> > ../lib/netdev-offload-tc.c:2096
> > 0x561dac117541 in netdev_flow_put (stats=,
> > info=0x7fb65b7ba780, ufid=, act_len=,
> > actions=,
> > match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257
> > parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at
> > ../lib/dpif-netlink.c:2297
> > try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at
> > ../lib/dpif-netlink.c:2384
> >
> > Reported-At: https://launchpad.net/bugs/2018500
> > Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police
> > action")
> > Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for
> > tc-offload")
> > Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
> > Fixes: c1c9c9c4b636 ("Implement QoS framework.")
> > Signed-off-by: Frode Nordahl
> > ---
> > lib/netdev-linux.c | 32 +-
> > lib/tc.c | 49 --
> > 2 files changed, 57 insertions(+), 24 deletions(-)
> >
> > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> > index 36620199e..1efc837ac 100644
> > --- a/lib/netdev-linux.c
> > +++ b/lib/netdev-linux.c
> > @@ -2714,8 +2714,15 @@ tc_add_matchall_policer(struct netdev *netdev,
> > uint32_t kbits_rate,
> >
> >      err = tc_transact(&request, &reply);
> >      if (!err) {
> > -        struct tcmsg *tc =
> > -            ofpbuf_at_assert(reply, NLMSG_HDRLEN, sizeof *tc);
> > +        struct ofpbuf b = ofpbuf_const_initializer(reply->data,
> > reply->size);
> > +        struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
> > +        struct tcmsg *tc = ofpbuf_try_pull(&b, sizeof *tc);
> > +
> > +        if (!nlmsg || !tc) {
> > +            VLOG_ERR_RL(&rl, "Failed to add match all policer, "
> > +                        "malformed reply");
>
> We will leak a 'reply' here if this condition ever happens.
> But I'm still not sure what is the point of having and parsing
> a reply for this request at all. If it didn't fail, then we
> can assume it succeeded?

Oh shoot, what can I say, iteration fatigue, all on me of course, will fix.

> We can't assume that cls_flower is working correctly, it has
> issues. Even more on older kernels. That's why we request
> to echo everything back and check that the kernel configured
> what we asked. I hope we can trust at least some of the
> interfaces and don't need to re-check everything.
>
> Are there any known issues with the matchall classifier?
> If not, I'd suggest we just don't ask for a reply. An error
> code should be enough.

As for whether we can trust the kernel infrastructure here, I really can't say.
I'd prefer to replicate the existing behavior tbh. If you look at the end
of the function, the existing code even asserts that the reply is long
enough before deleting it:
https://github.com/openvswitch/ovs/blob/64cdc290ef441bc3b4c2cddc230311ba58bc31b3/lib/netdev-linux.c#L2718

> > +            return EPROTO;
> > +        }
> >          ofpbuf_delete(reply);
> >      }
> >
> > @@ -5744,26 +5751,27 @@ static int
> >  tc_update_policer_action_stats(struct ofpbuf *msg,
> >                                 struct ofputil_meter_stats *stats)
> >  {
> > +    struct ofpbuf b = ofpbuf_const_initializer(msg->data, msg->size);
> > +    struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
> > +    struct tcamsg *tca = ofpbuf_try_pull(&b, sizeof *tca);
> >      struct ovs_flow_stats stats_dropped;
> >      struct ovs_flow_stats stats_hw;
> >      struct ovs_flow_stats stats_sw;
> >      const struct nlattr *act;
> >      struct nlattr *prio;
> > -    struct tcamsg *tca;
> >      int error = 0;
> >
> >      if (!stats) {
> >          goto exit;
> >      }
> >
> > -    if (NLMSG_HDRLEN + sizeof *tca > msg->size) {
> > +    if (!nlmsg || !tca) {
> >          VLOG_ERR_RL(&rl, "Failed to get action stats, size error");
> >
Re: [ovs-dev] [PATCH v6] tc: fix crash on malformed reply from kernel.
On 6/6/23 11:38, Frode Nordahl wrote:
> The tc module combines the use of the `tc_transact` helper
> function for communication with the in-kernel tc infrastructure
> with assertions on the reply data by `ofpbuf_at_assert` on the
> received data prior to further processing.
>
> With the presence of bugs on the kernel side, we need to treat
> the kernel as an unreliable service provider and replace assertions
> on the reply from it with checks to avoid a fatal crash of OVS.
>
> For the record, the symptom of the crash is this in the log:
> EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size
> failed in ofpbuf_at_assert()
>
> And an excerpt of the backtrace looks like this:
> 0x561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20,
> offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194
> tc_replace_flower (id=, flower=) at
> ../lib/tc.c:3223
> 0x561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840,
> match=, actions=, actions_len=,
> ufid=, info=, stats=) at
> ../lib/netdev-offload-tc.c:2096
> 0x561dac117541 in netdev_flow_put (stats=,
> info=0x7fb65b7ba780, ufid=, act_len=,
> actions=,
> match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257
> parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at
> ../lib/dpif-netlink.c:2297
> try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at
> ../lib/dpif-netlink.c:2384
>
> Reported-At: https://launchpad.net/bugs/2018500
> Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police
> action")
> Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for
> tc-offload")
> Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
> Fixes: c1c9c9c4b636 ("Implement QoS framework.")
> Signed-off-by: Frode Nordahl
> ---
> lib/netdev-linux.c | 32 +-
> lib/tc.c | 49 --
> 2 files changed, 57 insertions(+), 24 deletions(-)
>
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index 36620199e..1efc837ac 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -2714,8 +2714,15 @@ tc_add_matchall_policer(struct netdev *netdev,
> uint32_t kbits_rate,
>
>      err = tc_transact(&request, &reply);
>      if (!err) {
> -        struct tcmsg *tc =
> -            ofpbuf_at_assert(reply, NLMSG_HDRLEN, sizeof *tc);
> +        struct ofpbuf b = ofpbuf_const_initializer(reply->data, reply->size);
> +        struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
> +        struct tcmsg *tc = ofpbuf_try_pull(&b, sizeof *tc);
> +
> +        if (!nlmsg || !tc) {
> +            VLOG_ERR_RL(&rl, "Failed to add match all policer, "
> +                        "malformed reply");

We will leak a 'reply' here if this condition ever happens.
But I'm still not sure what is the point of having and parsing
a reply for this request at all. If it didn't fail, then we
can assume it succeeded?

We can't assume that cls_flower is working correctly, it has
issues. Even more on older kernels. That's why we request
to echo everything back and check that the kernel configured
what we asked. I hope we can trust at least some of the
interfaces and don't need to re-check everything.

Are there any known issues with the matchall classifier?
If not, I'd suggest we just don't ask for a reply. An error
code should be enough.

> +            return EPROTO;
> +        }
>          ofpbuf_delete(reply);
>      }
>
> @@ -5744,26 +5751,27 @@ static int
>  tc_update_policer_action_stats(struct ofpbuf *msg,
>                                 struct ofputil_meter_stats *stats)
>  {
> +    struct ofpbuf b = ofpbuf_const_initializer(msg->data, msg->size);
> +    struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
> +    struct tcamsg *tca = ofpbuf_try_pull(&b, sizeof *tca);
>      struct ovs_flow_stats stats_dropped;
>      struct ovs_flow_stats stats_hw;
>      struct ovs_flow_stats stats_sw;
>      const struct nlattr *act;
>      struct nlattr *prio;
> -    struct tcamsg *tca;
>      int error = 0;
>
>      if (!stats) {
>          goto exit;
>      }
>
> -    if (NLMSG_HDRLEN + sizeof *tca > msg->size) {
> +    if (!nlmsg || !tca) {
>          VLOG_ERR_RL(&rl, "Failed to get action stats, size error");
>          error = EPROTO;
>          goto exit;
>      }
>
> -    tca = ofpbuf_at_assert(msg, NLMSG_HDRLEN, sizeof *tca);
> -    act = nl_attr_find(msg, NLMSG_HDRLEN + sizeof *tca, TCA_ACT_TAB);
> +    act = nl_attr_find(&b, 0, TCA_ACT_TAB);
>      if (!act) {
>          VLOG_ERR_RL(&rl, "Failed to get action stats, can't find attribute");
>          error = EPROTO;
> @@ -6028,20 +6036,26 @@ static int
>  tc_parse_class(const struct ofpbuf *msg, unsigned int *handlep,
>                 struct nlattr **options, struct netdev_queue_stats *stats)
>  {
> +    struct ofpbuf b = ofpbuf_const_initializer(msg->data, msg->size);
> +    struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, size
[ovs-dev] [PATCH v6] tc: fix crash on malformed reply from kernel.
The tc module combines the use of the `tc_transact` helper
function for communication with the in-kernel tc infrastructure
with assertions on the reply data by `ofpbuf_at_assert` on the
received data prior to further processing.

With the presence of bugs on the kernel side, we need to treat
the kernel as an unreliable service provider and replace assertions
on the reply from it with checks to avoid a fatal crash of OVS.

For the record, the symptom of the crash is this in the log:
EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size failed in ofpbuf_at_assert()

And an excerpt of the backtrace looks like this:
0x561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194
tc_replace_flower (id=, flower=) at ../lib/tc.c:3223
0x561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, match=, actions=, actions_len=, ufid=, info=, stats=) at ../lib/netdev-offload-tc.c:2096
0x561dac117541 in netdev_flow_put (stats=, info=0x7fb65b7ba780, ufid=, act_len=, actions=, match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257
parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2297
try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2384

Reported-At: https://launchpad.net/bugs/2018500
Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police action")
Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for tc-offload")
Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
Fixes: c1c9c9c4b636 ("Implement QoS framework.")
Signed-off-by: Frode Nordahl
---
 lib/netdev-linux.c | 32 +-
 lib/tc.c | 49 --
 2 files changed, 57 insertions(+), 24 deletions(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 36620199e..1efc837ac 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -2714,8 +2714,15 @@ tc_add_matchall_policer(struct netdev *netdev, uint32_t kbits_rate,

     err = tc_transact(&request, &reply);
     if (!err) {
-        struct tcmsg *tc =
-            ofpbuf_at_assert(reply, NLMSG_HDRLEN, sizeof *tc);
+        struct ofpbuf b = ofpbuf_const_initializer(reply->data, reply->size);
+        struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
+        struct tcmsg *tc = ofpbuf_try_pull(&b, sizeof *tc);
+
+        if (!nlmsg || !tc) {
+            VLOG_ERR_RL(&rl, "Failed to add match all policer, "
+                        "malformed reply");
+            return EPROTO;
+        }
         ofpbuf_delete(reply);
     }

@@ -5744,26 +5751,27 @@ static int
 tc_update_policer_action_stats(struct ofpbuf *msg,
                                struct ofputil_meter_stats *stats)
 {
+    struct ofpbuf b = ofpbuf_const_initializer(msg->data, msg->size);
+    struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
+    struct tcamsg *tca = ofpbuf_try_pull(&b, sizeof *tca);
     struct ovs_flow_stats stats_dropped;
     struct ovs_flow_stats stats_hw;
     struct ovs_flow_stats stats_sw;
     const struct nlattr *act;
     struct nlattr *prio;
-    struct tcamsg *tca;
     int error = 0;

     if (!stats) {
         goto exit;
     }

-    if (NLMSG_HDRLEN + sizeof *tca > msg->size) {
+    if (!nlmsg || !tca) {
         VLOG_ERR_RL(&rl, "Failed to get action stats, size error");
         error = EPROTO;
         goto exit;
     }

-    tca = ofpbuf_at_assert(msg, NLMSG_HDRLEN, sizeof *tca);
-    act = nl_attr_find(msg, NLMSG_HDRLEN + sizeof *tca, TCA_ACT_TAB);
+    act = nl_attr_find(&b, 0, TCA_ACT_TAB);
     if (!act) {
         VLOG_ERR_RL(&rl, "Failed to get action stats, can't find attribute");
         error = EPROTO;
@@ -6028,20 +6036,26 @@ static int
 tc_parse_class(const struct ofpbuf *msg, unsigned int *handlep,
                struct nlattr **options, struct netdev_queue_stats *stats)
 {
+    struct ofpbuf b = ofpbuf_const_initializer(msg->data, msg->size);
+    struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
+    struct tcmsg *tc = ofpbuf_try_pull(&b, sizeof *tc);
     static const struct nl_policy tca_policy[] = {
         [TCA_OPTIONS] = { .type = NL_A_NESTED, .optional = false },
         [TCA_STATS2] = { .type = NL_A_NESTED, .optional = false },
     };
     struct nlattr *ta[ARRAY_SIZE(tca_policy)];

-    if (!nl_policy_parse(msg, NLMSG_HDRLEN + sizeof(struct tcmsg),
-                         tca_policy, ta, ARRAY_SIZE(ta))) {
+    if (!nlmsg || !tc) {
+        VLOG_ERR_RL(&rl, "failed to parse class message, malformed reply");
+        goto error;
+    }
+
+    if (!nl_policy_parse(&b, 0, tca_policy, ta, ARRAY_SIZE(ta))) {
         VLOG_WARN_RL(&rl, "failed to parse class message");
         goto error;
     }

     if (handlep) {
-        struct tcmsg *tc = ofpbuf_at_assert(msg, NLMSG_HDRLEN, sizeof *tc);
         *handlep = tc->tcm_handle;
     }

diff --git