Re: [ovs-dev] [PATCH v6] tc: fix crash on malformed reply from kernel.

2023-06-06 Thread Ilya Maximets
On 6/6/23 18:08, Frode Nordahl wrote:
> On Tue, Jun 6, 2023 at 5:46 PM Ilya Maximets  wrote:
>>
>> On 6/6/23 11:38, Frode Nordahl wrote:
>>> The tc module combines the use of the `tc_transact` helper
>>> function for communication with the in-kernel tc infrastructure
>>> with assertions on the reply data by `ofpbuf_at_assert` on the
>>> received data prior to further processing.
>>>
>>> With the presence of bugs on the kernel side, we need to treat
>>> the kernel as an unreliable service provider and replace assertions
>>> on the reply from it with checks to avoid a fatal crash of OVS.
>>>
>>> For the record, the symptom of the crash is this in the log:
>>> EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= 
>>> b->size failed in ofpbuf_at_assert()
>>>
>>> And an excerpt of the backtrace looks like this:
>>> 0x561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, 
>>> offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194
>>> tc_replace_flower (id=, flower=) at 
>>> ../lib/tc.c:3223
>>> 0x561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, 
>>> match=, actions=, actions_len=,
>>> ufid=, info=, stats=) at 
>>> ../lib/netdev-offload-tc.c:2096
>>> 0x561dac117541 in netdev_flow_put (stats=, 
>>> info=0x7fb65b7ba780, ufid=, act_len=, 
>>> actions=,
>>> match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257
>>> parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at 
>>> ../lib/dpif-netlink.c:2297
>>> try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at 
>>> ../lib/dpif-netlink.c:2384
>>>
>>> Reported-At: https://launchpad.net/bugs/2018500
>>> Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police 
>>> action")
>>> Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for 
>>> tc-offload")
>>> Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
>>> Fixes: c1c9c9c4b636 ("Implement QoS framework.")
>>> Signed-off-by: Frode Nordahl 
>>> ---
>>>  lib/netdev-linux.c | 32 +-
>>>  lib/tc.c   | 49 --
>>>  2 files changed, 57 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
>>> index 36620199e..1efc837ac 100644
>>> --- a/lib/netdev-linux.c
>>> +++ b/lib/netdev-linux.c
>>> @@ -2714,8 +2714,15 @@ tc_add_matchall_policer(struct netdev *netdev, 
>>> uint32_t kbits_rate,
>>>
>>>  err = tc_transact(&request, &reply);
>>>  if (!err) {
>>> -struct tcmsg *tc =
>>> -ofpbuf_at_assert(reply, NLMSG_HDRLEN, sizeof *tc);
>>> +struct ofpbuf b = ofpbuf_const_initializer(reply->data, 
>>> reply->size);
>>> +struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
>>> +struct tcmsg *tc = ofpbuf_try_pull(&b, sizeof *tc);
>>> +
>>> +if (!nlmsg || !tc) {
>>> +VLOG_ERR_RL(&rl, "Failed to add match all policer, "
>>> +"malformed reply");
>>
>> We will leak a 'reply' here if this condition ever happens.
>> But I'm still not sure what is the point of having and parsing
>> a reply for this request at all.  If it didn't fail, then we
>> can assume it succeeded?
> 
> Oh shoot, what can I say, iteration fatigue, all on me of course, will fix.
> 
>> We can't assume that cls_flower is working correctly, it has
>> issues.  Even more on older kernels.  That's why we request
>> to echo everythig back and comparing that kernel configured
>> what we asked.  I hope, we can trust at least some of the
>> interfaces and don't need to re-check everything.
>>
>> Are there any known issues with the matchall classifier?
>> If not, I'd suggest we just don't ask for reply.  An error
>> code should be enough.
> 
> As for if we can trust the kernel infrastructure here, I really can't say.
> I'd prefer to replicate the existing behavior tbh, if you look at the end
> of the function, the existing code even asserts that the reply is long
> enough before deleting it:
> https://github.com/openvswitch/ovs/blob/64cdc290ef441bc3b4c2cddc230311ba58bc31b3/lib/netdev-linux.c#L2718

For the sake of keeping it simple, let's keep the current logic, OK.
We can try to figure out why this check was added later.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6] tc: fix crash on malformed reply from kernel.

2023-06-06 Thread Frode Nordahl
On Tue, Jun 6, 2023 at 5:46 PM Ilya Maximets  wrote:
>
> On 6/6/23 11:38, Frode Nordahl wrote:
> > The tc module combines the use of the `tc_transact` helper
> > function for communication with the in-kernel tc infrastructure
> > with assertions on the reply data by `ofpbuf_at_assert` on the
> > received data prior to further processing.
> >
> > With the presence of bugs on the kernel side, we need to treat
> > the kernel as an unreliable service provider and replace assertions
> > on the reply from it with checks to avoid a fatal crash of OVS.
> >
> > For the record, the symptom of the crash is this in the log:
> > EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= 
> > b->size failed in ofpbuf_at_assert()
> >
> > And an excerpt of the backtrace looks like this:
> > 0x561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, 
> > offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194
> > tc_replace_flower (id=, flower=) at 
> > ../lib/tc.c:3223
> > 0x561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, 
> > match=, actions=, actions_len=,
> > ufid=, info=, stats=) at 
> > ../lib/netdev-offload-tc.c:2096
> > 0x561dac117541 in netdev_flow_put (stats=, 
> > info=0x7fb65b7ba780, ufid=, act_len=, 
> > actions=,
> > match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257
> > parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at 
> > ../lib/dpif-netlink.c:2297
> > try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at 
> > ../lib/dpif-netlink.c:2384
> >
> > Reported-At: https://launchpad.net/bugs/2018500
> > Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police 
> > action")
> > Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for 
> > tc-offload")
> > Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
> > Fixes: c1c9c9c4b636 ("Implement QoS framework.")
> > Signed-off-by: Frode Nordahl 
> > ---
> >  lib/netdev-linux.c | 32 +-
> >  lib/tc.c   | 49 --
> >  2 files changed, 57 insertions(+), 24 deletions(-)
> >
> > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> > index 36620199e..1efc837ac 100644
> > --- a/lib/netdev-linux.c
> > +++ b/lib/netdev-linux.c
> > @@ -2714,8 +2714,15 @@ tc_add_matchall_policer(struct netdev *netdev, 
> > uint32_t kbits_rate,
> >
> >  err = tc_transact(&request, &reply);
> >  if (!err) {
> > -struct tcmsg *tc =
> > -ofpbuf_at_assert(reply, NLMSG_HDRLEN, sizeof *tc);
> > +struct ofpbuf b = ofpbuf_const_initializer(reply->data, 
> > reply->size);
> > +struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
> > +struct tcmsg *tc = ofpbuf_try_pull(&b, sizeof *tc);
> > +
> > +if (!nlmsg || !tc) {
> > +VLOG_ERR_RL(&rl, "Failed to add match all policer, "
> > +"malformed reply");
>
> We will leak a 'reply' here if this condition ever happens.
> But I'm still not sure what is the point of having and parsing
> a reply for this request at all.  If it didn't fail, then we
> can assume it succeeded?

Oh shoot, what can I say, iteration fatigue, all on me of course, will fix.

> We can't assume that cls_flower is working correctly, it has
> issues.  Even more on older kernels.  That's why we request
> to echo everythig back and comparing that kernel configured
> what we asked.  I hope, we can trust at least some of the
> interfaces and don't need to re-check everything.
>
> Are there any known issues with the matchall classifier?
> If not, I'd suggest we just don't ask for reply.  An error
> code should be enough.

As for if we can trust the kernel infrastructure here, I really can't say.
I'd prefer to replicate the existing behavior tbh, if you look at the end
of the function, the existing code even asserts that the reply is long
enough before deleting it:
https://github.com/openvswitch/ovs/blob/64cdc290ef441bc3b4c2cddc230311ba58bc31b3/lib/netdev-linux.c#L2718

> > +return EPROTO;
> > +}
> >  ofpbuf_delete(reply);
> >  }
> >
> > @@ -5744,26 +5751,27 @@ static int
> >  tc_update_policer_action_stats(struct ofpbuf *msg,
> > struct ofputil_meter_stats *stats)
> >  {
> > +struct ofpbuf b = ofpbuf_const_initializer(msg->data, msg->size);
> > +struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
> > +struct tcamsg *tca = ofpbuf_try_pull(&b, sizeof *tca);
> >  struct ovs_flow_stats stats_dropped;
> >  struct ovs_flow_stats stats_hw;
> >  struct ovs_flow_stats stats_sw;
> >  const struct nlattr *act;
> >  struct nlattr *prio;
> > -struct tcamsg *tca;
> >  int error = 0;
> >
> >  if (!stats) {
> >  goto exit;
> >  }
> >
> > -if (NLMSG_HDRLEN + sizeof *tca > msg->size) {
> > +if (!nlmsg || !tca) {
> >  VLOG_ERR_RL(&rl, "Failed to get action stats, size error");
> >

Re: [ovs-dev] [PATCH v6] tc: fix crash on malformed reply from kernel.

2023-06-06 Thread Ilya Maximets
On 6/6/23 11:38, Frode Nordahl wrote:
> The tc module combines the use of the `tc_transact` helper
> function for communication with the in-kernel tc infrastructure
> with assertions on the reply data by `ofpbuf_at_assert` on the
> received data prior to further processing.
> 
> With the presence of bugs on the kernel side, we need to treat
> the kernel as an unreliable service provider and replace assertions
> on the reply from it with checks to avoid a fatal crash of OVS.
> 
> For the record, the symptom of the crash is this in the log:
> EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size 
> failed in ofpbuf_at_assert()
> 
> And an excerpt of the backtrace looks like this:
> 0x561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, 
> offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194
> tc_replace_flower (id=, flower=) at 
> ../lib/tc.c:3223
> 0x561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, 
> match=, actions=, actions_len=,
> ufid=, info=, stats=) at 
> ../lib/netdev-offload-tc.c:2096
> 0x561dac117541 in netdev_flow_put (stats=, 
> info=0x7fb65b7ba780, ufid=, act_len=, 
> actions=,
> match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257
> parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at 
> ../lib/dpif-netlink.c:2297
> try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at 
> ../lib/dpif-netlink.c:2384
> 
> Reported-At: https://launchpad.net/bugs/2018500
> Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police 
> action")
> Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for 
> tc-offload")
> Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
> Fixes: c1c9c9c4b636 ("Implement QoS framework.")
> Signed-off-by: Frode Nordahl 
> ---
>  lib/netdev-linux.c | 32 +-
>  lib/tc.c   | 49 --
>  2 files changed, 57 insertions(+), 24 deletions(-)
> 
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index 36620199e..1efc837ac 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -2714,8 +2714,15 @@ tc_add_matchall_policer(struct netdev *netdev, 
> uint32_t kbits_rate,
>  
>  err = tc_transact(&request, &reply);
>  if (!err) {
> -struct tcmsg *tc =
> -ofpbuf_at_assert(reply, NLMSG_HDRLEN, sizeof *tc);
> +struct ofpbuf b = ofpbuf_const_initializer(reply->data, reply->size);
> +struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
> +struct tcmsg *tc = ofpbuf_try_pull(&b, sizeof *tc);
> +
> +if (!nlmsg || !tc) {
> +VLOG_ERR_RL(&rl, "Failed to add match all policer, "
> +"malformed reply");

We will leak a 'reply' here if this condition ever happens.
But I'm still not sure what is the point of having and parsing
a reply for this request at all.  If it didn't fail, then we
can assume it succeeded?

We can't assume that cls_flower is working correctly, it has
issues.  Even more on older kernels.  That's why we request
to echo everythig back and comparing that kernel configured
what we asked.  I hope, we can trust at least some of the
interfaces and don't need to re-check everything.

Are there any known issues with the matchall classifier?
If not, I'd suggest we just don't ask for reply.  An error
code should be enough.

> +return EPROTO;
> +}
>  ofpbuf_delete(reply);
>  }
>  
> @@ -5744,26 +5751,27 @@ static int
>  tc_update_policer_action_stats(struct ofpbuf *msg,
> struct ofputil_meter_stats *stats)
>  {
> +struct ofpbuf b = ofpbuf_const_initializer(msg->data, msg->size);
> +struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
> +struct tcamsg *tca = ofpbuf_try_pull(&b, sizeof *tca);
>  struct ovs_flow_stats stats_dropped;
>  struct ovs_flow_stats stats_hw;
>  struct ovs_flow_stats stats_sw;
>  const struct nlattr *act;
>  struct nlattr *prio;
> -struct tcamsg *tca;
>  int error = 0;
>  
>  if (!stats) {
>  goto exit;
>  }
>  
> -if (NLMSG_HDRLEN + sizeof *tca > msg->size) {
> +if (!nlmsg || !tca) {
>  VLOG_ERR_RL(&rl, "Failed to get action stats, size error");
>  error = EPROTO;
>  goto exit;
>  }
>  
> -tca = ofpbuf_at_assert(msg, NLMSG_HDRLEN, sizeof *tca);
> -act = nl_attr_find(msg, NLMSG_HDRLEN + sizeof *tca, TCA_ACT_TAB);
> +act = nl_attr_find(&b, 0, TCA_ACT_TAB);
>  if (!act) {
>  VLOG_ERR_RL(&rl, "Failed to get action stats, can't find attribute");
>  error = EPROTO;
> @@ -6028,20 +6036,26 @@ static int
>  tc_parse_class(const struct ofpbuf *msg, unsigned int *handlep,
> struct nlattr **options, struct netdev_queue_stats *stats)
>  {
> +struct ofpbuf b = ofpbuf_const_initializer(msg->data, msg->size);
> +struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, size

[ovs-dev] [PATCH v6] tc: fix crash on malformed reply from kernel.

2023-06-06 Thread Frode Nordahl
The tc module combines the use of the `tc_transact` helper
function for communication with the in-kernel tc infrastructure
with assertions on the reply data by `ofpbuf_at_assert` on the
received data prior to further processing.

With the presence of bugs on the kernel side, we need to treat
the kernel as an unreliable service provider and replace assertions
on the reply from it with checks to avoid a fatal crash of OVS.

For the record, the symptom of the crash is this in the log:
EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size 
failed in ofpbuf_at_assert()

And an excerpt of the backtrace looks like this:
0x561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, 
offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194
tc_replace_flower (id=, flower=) at 
../lib/tc.c:3223
0x561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, 
match=, actions=, actions_len=,
ufid=, info=, stats=) at 
../lib/netdev-offload-tc.c:2096
0x561dac117541 in netdev_flow_put (stats=, 
info=0x7fb65b7ba780, ufid=, act_len=, 
actions=,
match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257
parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at 
../lib/dpif-netlink.c:2297
try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at 
../lib/dpif-netlink.c:2384

Reported-At: https://launchpad.net/bugs/2018500
Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police 
action")
Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for tc-offload")
Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
Fixes: c1c9c9c4b636 ("Implement QoS framework.")
Signed-off-by: Frode Nordahl 
---
 lib/netdev-linux.c | 32 +-
 lib/tc.c   | 49 --
 2 files changed, 57 insertions(+), 24 deletions(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 36620199e..1efc837ac 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -2714,8 +2714,15 @@ tc_add_matchall_policer(struct netdev *netdev, uint32_t 
kbits_rate,
 
 err = tc_transact(&request, &reply);
 if (!err) {
-struct tcmsg *tc =
-ofpbuf_at_assert(reply, NLMSG_HDRLEN, sizeof *tc);
+struct ofpbuf b = ofpbuf_const_initializer(reply->data, reply->size);
+struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
+struct tcmsg *tc = ofpbuf_try_pull(&b, sizeof *tc);
+
+if (!nlmsg || !tc) {
+VLOG_ERR_RL(&rl, "Failed to add match all policer, "
+"malformed reply");
+return EPROTO;
+}
 ofpbuf_delete(reply);
 }
 
@@ -5744,26 +5751,27 @@ static int
 tc_update_policer_action_stats(struct ofpbuf *msg,
struct ofputil_meter_stats *stats)
 {
+struct ofpbuf b = ofpbuf_const_initializer(msg->data, msg->size);
+struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
+struct tcamsg *tca = ofpbuf_try_pull(&b, sizeof *tca);
 struct ovs_flow_stats stats_dropped;
 struct ovs_flow_stats stats_hw;
 struct ovs_flow_stats stats_sw;
 const struct nlattr *act;
 struct nlattr *prio;
-struct tcamsg *tca;
 int error = 0;
 
 if (!stats) {
 goto exit;
 }
 
-if (NLMSG_HDRLEN + sizeof *tca > msg->size) {
+if (!nlmsg || !tca) {
 VLOG_ERR_RL(&rl, "Failed to get action stats, size error");
 error = EPROTO;
 goto exit;
 }
 
-tca = ofpbuf_at_assert(msg, NLMSG_HDRLEN, sizeof *tca);
-act = nl_attr_find(msg, NLMSG_HDRLEN + sizeof *tca, TCA_ACT_TAB);
+act = nl_attr_find(&b, 0, TCA_ACT_TAB);
 if (!act) {
 VLOG_ERR_RL(&rl, "Failed to get action stats, can't find attribute");
 error = EPROTO;
@@ -6028,20 +6036,26 @@ static int
 tc_parse_class(const struct ofpbuf *msg, unsigned int *handlep,
struct nlattr **options, struct netdev_queue_stats *stats)
 {
+struct ofpbuf b = ofpbuf_const_initializer(msg->data, msg->size);
+struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg);
+struct tcmsg *tc = ofpbuf_try_pull(&b, sizeof *tc);
 static const struct nl_policy tca_policy[] = {
 [TCA_OPTIONS] = { .type = NL_A_NESTED, .optional = false },
 [TCA_STATS2] = { .type = NL_A_NESTED, .optional = false },
 };
 struct nlattr *ta[ARRAY_SIZE(tca_policy)];
 
-if (!nl_policy_parse(msg, NLMSG_HDRLEN + sizeof(struct tcmsg),
- tca_policy, ta, ARRAY_SIZE(ta))) {
+if (!nlmsg || !tc) {
+VLOG_ERR_RL(&rl, "failed to parse class message, malformed reply");
+goto error;
+}
+
+if (!nl_policy_parse(&b, 0, tca_policy, ta, ARRAY_SIZE(ta))) {
 VLOG_WARN_RL(&rl, "failed to parse class message");
 goto error;
 }
 
 if (handlep) {
-struct tcmsg *tc = ofpbuf_at_assert(msg, NLMSG_HDRLEN, sizeof *tc);
 *handlep = tc->tcm_handle;
 }
 
diff --git