Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs

2016-02-19 Thread Jamal Hadi Salim

On 16-02-18 10:24 AM, John Fastabend wrote:

On 16-02-18 04:14 AM, Jamal Hadi Salim wrote:

On 16-02-17 06:07 PM, John Fastabend wrote:

[...]






IMO, it would be better at this early stage to enforce the correct
behavior for future generations.
To follow the netlink semantics which a lot of people are already
trained to think in.

Current netlink behavior is supposed to be:

1) NEW ==> "Create".
Ambigous - could mean a)"create if it doesnt exist" or b) "fail if it
exists otherwise create"
Unfortunately different parts of the kernel often assume some
default from either #a or #b.



But this is already handled by the core cls_api.c code. We never
get to u32_change if the flags are not correct.

Look at the block right above the op call into the classifiers
change() code in cls_api.c. Starting at line 287.




Indeed that would cover s/ware filters but not h/ware. That will
depend on what hardware can do. Really,
all you need is to propagate those flags. Your
driver ndo can ignore them. I know i will need them.

Alternatively: If we say all filters are going to be stored in
s/ware as well i.e the tri-state we talked about then the cls_api
checks will work as well. But when you are talking millions
of filters (such as i deal with) - that may become impractical
(and then you are going to have all kind of clever things to find
whether an EXCLUSIVE will work or not etc).

cheers,
jamal



Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs

2016-02-19 Thread Or Gerlitz
On Thu, Feb 18, 2016 at 11:23 AM, Amir Vadai"  wrote:
> On Wed, Feb 17, 2016 at 03:07:23PM -0800, John Fastabend wrote:

>> Actually thinking about this a bit more I wrote this thinking
>> that there existed some hardware that actually cared if it was
>> a new rule or an existing rule. For me it doesn't matter I do
>> the same thing in the new/replace cases I just write into the
>> slot on the hardware table and if it happens to have something
>> in it well its overwritten e.g. "replaced". This works because
>> the cls_u32 layer protects us from doing something unexpected.

>> I'm wondering (mostly asking the mlx folks) is there hardware
>> out there that cares to make this distinction between new and
>> replace? Otherwise I can just drop new and always use replace.
>> Or vice versa which is the case in its current form.

> I don't see a need for such a distinction in mlx hardware.

If you (say) have the same match and different action for a given rule
we have HW API to modify existing steering entry under which we would
end up calling the FW once instead of twice (delete old, add new).

Or.


Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs

2016-02-18 Thread Simon Horman
On Thu, Feb 18, 2016 at 11:23:35AM +0200, Amir Vadai" wrote:
> On Wed, Feb 17, 2016 at 03:07:23PM -0800, John Fastabend wrote:
> > [...]
> > 
> > >>
> > >>> +static void u32_replace_hw_hnode(struct tcf_proto *tp, struct
> > >>> tc_u_hnode *h)
> > >>> +{
> > >>> +struct net_device *dev = tp->q->dev_queue->dev;
> > >>> +struct tc_cls_u32_offload u32_offload = {0};
> > >>> +struct tc_to_netdev offload;
> > >>> +
> > >>> +offload.type = TC_SETUP_CLSU32;
> > >>> +offload.cls_u32 = _offload;
> > >>> +
> > >>> +if (dev->netdev_ops->ndo_setup_tc) {
> > >>> +offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
> > >>
> > >> TC_CLSU32_REPLACE_HNODE?
> > >>
> > > 
> > > Yep I made this change and will send out v4.
> > > 
> > > [...]
> > > 
> > >>
> > 
> > Actually thinking about this a bit more I wrote this thinking
> > that there existed some hardware that actually cared if it was
> > a new rule or an existing rule. For me it doesn't matter I do
> > the same thing in the new/replace cases I just write into the
> > slot on the hardware table and if it happens to have something
> > in it well its overwritten e.g. "replaced". This works because
> > the cls_u32 layer protects us from doing something unexpected.
> > 
> > I'm wondering (mostly asking the mlx folks) is there hardware
> > out there that cares to make this distinction between new and
> > replace? Otherwise I can just drop new and always use replace.
> > Or vice versa which is the case in its current form.
> I don't see a need for such a distinction in mlx hardware.

FWIW, I think it is unlikely such a distinction would
be needed for Netronome hardware.


Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs

2016-02-18 Thread John Fastabend
On 16-02-18 04:14 AM, Jamal Hadi Salim wrote:
> On 16-02-17 06:07 PM, John Fastabend wrote:
>> [...]
>>
> 
>> Actually thinking about this a bit more I wrote this thinking
>> that there existed some hardware that actually cared if it was
>> a new rule or an existing rule. For me it doesn't matter I do
>> the same thing in the new/replace cases I just write into the
>> slot on the hardware table and if it happens to have something
>> in it well its overwritten e.g. "replaced". This works because
>> the cls_u32 layer protects us from doing something unexpected.
>>
> 
> You are describing create-or-update which is a reasonable default
> BUT: counting on the user to specify the htid+bktid+nodeid
> for every filter and knowing what that means is prone to mistakes
> when for example (using your big hammer approach right now) they
> dont specify the handle and the kernel creates one for them.
> 
> IMO, it would be better at this early stage to enforce the correct
> behavior for future generations.
> To follow the netlink semantics which a lot of people are already
> trained to think in.
> 
> Current netlink behavior is supposed to be:
> 
> 1) NEW ==> "Create".
> Ambigous - could mean a)"create if it doesnt exist" or b) "fail if it
> exists otherwise create"
> Unfortunately different parts of the kernel often assume some
> default from either #a or #b.
> 

But this is already handled by the core cls_api.c code. We never
get to u32_change if the flags are not correct.

Look at the block right above the op call into the classifiers
change() code in cls_api.c. Starting at line 287.

> 2) NEW|REPLACE flag ==> "Create if it doesnt exist and replace
> if it exists"
> 
> 3)NEW|EXCLUSIVE ==> "Create if it doesnt exist and fail if it
> exists"
> 
> 4)NEW|APPEND ==> "just fscking create; i dont care if it exists".
> 
> IOW, just add the flag field which is intepreted from whatever
> the user explicitly asks for. And reject say what the hardware
> doesnt support.

I think I agree with what your saying but I'm fairly sure this is
working as you describe.

> I have worked with tcams where we support #3. It is a bit inefficient
> because you have to check if a rule exists first. And i have worked
> in cases where #1 is assumed to mean #2 and at times #4. It is better
> user experience to be explicit.

Agreed, and I think it is :)

> 
> cheers,
> jamal



Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs

2016-02-18 Thread Jamal Hadi Salim

On 16-02-17 06:07 PM, John Fastabend wrote:

[...]




Actually thinking about this a bit more I wrote this thinking
that there existed some hardware that actually cared if it was
a new rule or an existing rule. For me it doesn't matter I do
the same thing in the new/replace cases I just write into the
slot on the hardware table and if it happens to have something
in it well its overwritten e.g. "replaced". This works because
the cls_u32 layer protects us from doing something unexpected.



You are describing create-or-update which is a reasonable default
BUT: counting on the user to specify the htid+bktid+nodeid
for every filter and knowing what that means is prone to mistakes
when for example (using your big hammer approach right now) they
dont specify the handle and the kernel creates one for them.

IMO, it would be better at this early stage to enforce the correct
behavior for future generations.
To follow the netlink semantics which a lot of people are already
trained to think in.

Current netlink behavior is supposed to be:

1) NEW ==> "Create".
Ambigous - could mean a)"create if it doesnt exist" or b) "fail if it 
exists otherwise create"

Unfortunately different parts of the kernel often assume some
default from either #a or #b.

2) NEW|REPLACE flag ==> "Create if it doesnt exist and replace
if it exists"

3)NEW|EXCLUSIVE ==> "Create if it doesnt exist and fail if it
exists"

4)NEW|APPEND ==> "just fscking create; i dont care if it exists".

IOW, just add the flag field which is intepreted from whatever
the user explicitly asks for. And reject say what the hardware
doesnt support.
I have worked with tcams where we support #3. It is a bit inefficient
because you have to check if a rule exists first. And i have worked
in cases where #1 is assumed to mean #2 and at times #4. It is better
user experience to be explicit.

cheers,
jamal


Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs

2016-02-18 Thread Amir Vadai"
On Wed, Feb 17, 2016 at 03:07:23PM -0800, John Fastabend wrote:
> [...]
> 
> >>
> >>> +static void u32_replace_hw_hnode(struct tcf_proto *tp, struct
> >>> tc_u_hnode *h)
> >>> +{
> >>> +struct net_device *dev = tp->q->dev_queue->dev;
> >>> +struct tc_cls_u32_offload u32_offload = {0};
> >>> +struct tc_to_netdev offload;
> >>> +
> >>> +offload.type = TC_SETUP_CLSU32;
> >>> +offload.cls_u32 = _offload;
> >>> +
> >>> +if (dev->netdev_ops->ndo_setup_tc) {
> >>> +offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
> >>
> >> TC_CLSU32_REPLACE_HNODE?
> >>
> > 
> > Yep I made this change and will send out v4.
> > 
> > [...]
> > 
> >>
> 
> Actually thinking about this a bit more I wrote this thinking
> that there existed some hardware that actually cared if it was
> a new rule or an existing rule. For me it doesn't matter I do
> the same thing in the new/replace cases I just write into the
> slot on the hardware table and if it happens to have something
> in it well its overwritten e.g. "replaced". This works because
> the cls_u32 layer protects us from doing something unexpected.
> 
> I'm wondering (mostly asking the mlx folks) is there hardware
> out there that cares to make this distinction between new and
> replace? Otherwise I can just drop new and always use replace.
> Or vice versa which is the case in its current form.
I don't see a need for such a distinction in mlx hardware.

Thanks,
Amir.

> 
> Thanks,
> John
> 


Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs

2016-02-17 Thread John Fastabend
[...]

>>
>>> +static void u32_replace_hw_hnode(struct tcf_proto *tp, struct
>>> tc_u_hnode *h)
>>> +{
>>> +struct net_device *dev = tp->q->dev_queue->dev;
>>> +struct tc_cls_u32_offload u32_offload = {0};
>>> +struct tc_to_netdev offload;
>>> +
>>> +offload.type = TC_SETUP_CLSU32;
>>> +offload.cls_u32 = _offload;
>>> +
>>> +if (dev->netdev_ops->ndo_setup_tc) {
>>> +offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
>>
>> TC_CLSU32_REPLACE_HNODE?
>>
> 
> Yep I made this change and will send out v4.
> 
> [...]
> 
>>

Actually thinking about this a bit more I wrote this thinking
that there existed some hardware that actually cared if it was
a new rule or an existing rule. For me it doesn't matter I do
the same thing in the new/replace cases I just write into the
slot on the hardware table and if it happens to have something
in it well its overwritten e.g. "replaced". This works because
the cls_u32 layer protects us from doing something unexpected.

I'm wondering (mostly asking the mlx folks) is there hardware
out there that cares to make this distinction between new and
replace? Otherwise I can just drop new and always use replace.
Or vice versa which is the case in its current form.

Thanks,
John



Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs

2016-02-17 Thread John Fastabend
On 16-02-17 02:59 AM, Jamal Hadi Salim wrote:
> On 16-02-17 12:17 AM, John Fastabend wrote:
>> This patch allows netdev drivers to consume cls_u32 offloads via
>> the ndo_setup_tc ndo op.
>>
>> This works aligns with how network drivers have been doing qdisc
>> offloads for mqprio.
>>
> 
> This one i have comments on.
> 
>> Signed-off-by: John Fastabend 
>> ---
>>   include/linux/netdevice.h |6 ++-
>>   include/net/pkt_cls.h |   34 +++
>>   net/sched/cls_u32.c   |   99
>> -

[...]

>>   #endif /* CONFIG_NET_CLS_IND */
>>
>> +struct tc_cls_u32_knode {
>> +struct tcf_exts *exts;
>> +u8 fshift;
>> +u32 handle;
>> +u32 val;
>> +u32 mask;
>> +u32 link_handle;
>> +struct tc_u32_sel *sel;
>> +};
>>
> 
> Swapping sel and fshift would give better struct alignment.
> 

Makes sense I went ahead and did this.

>> +struct tc_cls_u32_hnode {
>> +u32 handle;
>> +u32 prio;
>> +unsigned int divisor;
>> +};
> 
> 
> Assuming in the future "prio" would be moved to something that is more
> generic classifier specific?
> 

Sure in the future it can be moved up into a generic struct if
it becomes useful there.

>> +enum tc_clsu32_command {
>> +TC_CLSU32_NEW_KNODE,
>> +TC_CLSU32_REPLACE_KNODE,
>> +TC_CLSU32_DELETE_KNODE,
>> +TC_CLSU32_NEW_HNODE,
>> +TC_CLSU32_REPLACE_HNODE,
>> +TC_CLSU32_DELETE_HNODE,
>> +};
>> +
> 
> It seems to me commands should be generic which speak
> Netlinkism. A REPLACE is just a flag to NEW. You dont need
> a NEW_XXX for every object. switchdev got this right.
> If you use cmd + flags  then you can have all kinds of
> netlink semantics that relay user intent from user space. Example:
> Exclusivity where user says "create if it doesnt exist but dont replace
> if it does".
> At minimal add "flags" there.
> Maybe not this release - but it makes sense to move "command" into
> tc_to_netdev; a u16 cmd + u16 flags.
> 

Yep next set of patches add the specific hw/sw/both semantics
and specific error handling strategies. For this series we just
get the simplest one.

>> +struct tc_cls_u32_offload {
>> +/* knode values */
>> +enum tc_clsu32_command command;
>> +union {
>> +struct tc_cls_u32_knode knode;
>> +struct tc_cls_u32_hnode hnode;
>> +};
>> +};
>> +
>>   #endif
>> diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
>> index 4fbb674..d54bc94 100644
> 
>> +static void u32_replace_hw_hnode(struct tcf_proto *tp, struct
>> tc_u_hnode *h)
>> +{
>> +struct net_device *dev = tp->q->dev_queue->dev;
>> +struct tc_cls_u32_offload u32_offload = {0};
>> +struct tc_to_netdev offload;
>> +
>> +offload.type = TC_SETUP_CLSU32;
>> +offload.cls_u32 = _offload;
>> +
>> +if (dev->netdev_ops->ndo_setup_tc) {
>> +offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
> 
> TC_CLSU32_REPLACE_HNODE?
> 

Yep I made this change and will send out v4.

[...]

> 
> 
> You are unconditionally calling the _hw_ api. For someone not using _hw_
> offloads, there are a few more instructions. Maybe just do the
> dev->netdev_ops->ndo_setup_tc first?
> 

My simple add/rem stress test didn't seem to make any difference.
I think there is so much other stuff going on here this is in the
noise. I'll take a pass at optimizing this later for all
cases not just the hw loading ones.

> 
> And to Or's point: How do i distinguish s/w from h/w?

Next series handles this for now just enable the simplest case.

> 
> cheers,
> jamal
> 



Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs

2016-02-17 Thread Jamal Hadi Salim

On 16-02-17 12:17 AM, John Fastabend wrote:

This patch allows netdev drivers to consume cls_u32 offloads via
the ndo_setup_tc ndo op.

This works aligns with how network drivers have been doing qdisc
offloads for mqprio.



This one i have comments on.


Signed-off-by: John Fastabend 
---
  include/linux/netdevice.h |6 ++-
  include/net/pkt_cls.h |   34 +++
  net/sched/cls_u32.c   |   99 -
  3 files changed, 136 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e396060..47671ce0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -779,17 +779,21 @@ static inline bool netdev_phys_item_id_same(struct 
netdev_phys_item_id *a,
  typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
   struct sk_buff *skb);

-/* This structure holds attributes of qdisc and classifiers
+/* These structures hold the attributes of qdisc and classifiers
   * that are being passed to the netdevice through the setup_tc op.
   */
  enum {
TC_SETUP_MQPRIO,
+   TC_SETUP_CLSU32,
  };

+struct tc_cls_u32_offload;
+
  struct tc_to_netdev {
unsigned int type;
union {
u8 tc;
+   struct tc_cls_u32_offload *cls_u32;
};
  };


>

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index bc49967..59789ca 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -358,4 +358,38 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
  }
  #endif /* CONFIG_NET_CLS_IND */

+struct tc_cls_u32_knode {
+   struct tcf_exts *exts;
+   u8 fshift;
+   u32 handle;
+   u32 val;
+   u32 mask;
+   u32 link_handle;
+   struct tc_u32_sel *sel;
+};



Swapping sel and fshift would give better struct alignment.


+struct tc_cls_u32_hnode {
+   u32 handle;
+   u32 prio;
+   unsigned int divisor;
+};



Assuming in the future "prio" would be moved to something that is more
generic classifier specific?


+enum tc_clsu32_command {
+   TC_CLSU32_NEW_KNODE,
+   TC_CLSU32_REPLACE_KNODE,
+   TC_CLSU32_DELETE_KNODE,
+   TC_CLSU32_NEW_HNODE,
+   TC_CLSU32_REPLACE_HNODE,
+   TC_CLSU32_DELETE_HNODE,
+};
+


It seems to me commands should be generic which speak
Netlinkism. A REPLACE is just a flag to NEW. You dont need
a NEW_XXX for every object. switchdev got this right.
If you use cmd + flags  then you can have all kinds of
netlink semantics that relay user intent from user space. Example:
Exclusivity where user says "create if it doesnt exist but dont replace
if it does".
At minimal add "flags" there.
Maybe not this release - but it makes sense to move "command" into 
tc_to_netdev; a u16 cmd + u16 flags.



+struct tc_cls_u32_offload {
+   /* knode values */
+   enum tc_clsu32_command command;
+   union {
+   struct tc_cls_u32_knode knode;
+   struct tc_cls_u32_hnode hnode;
+   };
+};
+
  #endif
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 4fbb674..d54bc94 100644



+static void u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
+{
+   struct net_device *dev = tp->q->dev_queue->dev;
+   struct tc_cls_u32_offload u32_offload = {0};
+   struct tc_to_netdev offload;
+
+   offload.type = TC_SETUP_CLSU32;
+   offload.cls_u32 = _offload;
+
+   if (dev->netdev_ops->ndo_setup_tc) {
+   offload.cls_u32->command = TC_CLSU32_NEW_HNODE;


TC_CLSU32_REPLACE_HNODE?


+   offload.cls_u32->hnode.divisor = h->divisor;
+   offload.cls_u32->hnode.handle = h->handle;
+   offload.cls_u32->hnode.prio = h->prio;
+
+   dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+ tp->protocol, );
+   }
+}




  static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
  {
struct tc_u_knode *n;
@@ -434,6 +522,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct 
tc_u_hnode *ht)
RCU_INIT_POINTER(ht->ht[h],
 rtnl_dereference(n->next));
tcf_unbind_filter(tp, >res);
+   u32_remove_hw_knode(tp, n->handle);
call_rcu(>rcu, u32_delete_key_freepf_rcu);
}
}
@@ -454,6 +543,7 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct 
tc_u_hnode *ht)
 phn;
 hn = >next, phn = rtnl_dereference(*hn)) {
if (phn == ht) {
+   u32_clear_hw_hnode(tp, ht);
RCU_INIT_POINTER(*hn, ht->next);
kfree_rcu(ht, rcu);
return 0;
@@ -540,8 +630,10 @@ static int u32_delete(struct tcf_proto *tp, unsigned long 
arg)
if (ht == NULL)
return 0;

-   if 

Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs

2016-02-16 Thread Jiri Pirko
Wed, Feb 17, 2016 at 06:17:09AM CET, john.fastab...@gmail.com wrote:
>This patch allows netdev drivers to consume cls_u32 offloads via
>the ndo_setup_tc ndo op.
>
>This works aligns with how network drivers have been doing qdisc
>offloads for mqprio.
>
>Signed-off-by: John Fastabend 

Acked-by: Jiri Pirko 


[net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs

2016-02-16 Thread John Fastabend
This patch allows netdev drivers to consume cls_u32 offloads via
the ndo_setup_tc ndo op.

This works aligns with how network drivers have been doing qdisc
offloads for mqprio.

Signed-off-by: John Fastabend 
---
 include/linux/netdevice.h |6 ++-
 include/net/pkt_cls.h |   34 +++
 net/sched/cls_u32.c   |   99 -
 3 files changed, 136 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e396060..47671ce0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -779,17 +779,21 @@ static inline bool netdev_phys_item_id_same(struct 
netdev_phys_item_id *a,
 typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
   struct sk_buff *skb);
 
-/* This structure holds attributes of qdisc and classifiers
+/* These structures hold the attributes of qdisc and classifiers
  * that are being passed to the netdevice through the setup_tc op.
  */
 enum {
TC_SETUP_MQPRIO,
+   TC_SETUP_CLSU32,
 };
 
+struct tc_cls_u32_offload;
+
 struct tc_to_netdev {
unsigned int type;
union {
u8 tc;
+   struct tc_cls_u32_offload *cls_u32;
};
 };
 
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index bc49967..59789ca 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -358,4 +358,38 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 }
 #endif /* CONFIG_NET_CLS_IND */
 
+struct tc_cls_u32_knode {
+   struct tcf_exts *exts;
+   u8 fshift;
+   u32 handle;
+   u32 val;
+   u32 mask;
+   u32 link_handle;
+   struct tc_u32_sel *sel;
+};
+
+struct tc_cls_u32_hnode {
+   u32 handle;
+   u32 prio;
+   unsigned int divisor;
+};
+
+enum tc_clsu32_command {
+   TC_CLSU32_NEW_KNODE,
+   TC_CLSU32_REPLACE_KNODE,
+   TC_CLSU32_DELETE_KNODE,
+   TC_CLSU32_NEW_HNODE,
+   TC_CLSU32_REPLACE_HNODE,
+   TC_CLSU32_DELETE_HNODE,
+};
+
+struct tc_cls_u32_offload {
+   /* knode values */
+   enum tc_clsu32_command command;
+   union {
+   struct tc_cls_u32_knode knode;
+   struct tc_cls_u32_hnode hnode;
+   };
+};
+
 #endif
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 4fbb674..d54bc94 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct tc_u_knode {
struct tc_u_knode __rcu *next;
@@ -424,6 +425,93 @@ static int u32_delete_key(struct tcf_proto *tp, struct 
tc_u_knode *key)
return 0;
 }
 
+static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle)
+{
+   struct net_device *dev = tp->q->dev_queue->dev;
+   struct tc_cls_u32_offload u32_offload = {0};
+   struct tc_to_netdev offload;
+
+   offload.type = TC_SETUP_CLSU32;
+   offload.cls_u32 = _offload;
+
+   if (dev->netdev_ops->ndo_setup_tc) {
+   offload.cls_u32->command = TC_CLSU32_DELETE_KNODE;
+   offload.cls_u32->knode.handle = handle;
+   dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+ tp->protocol, );
+   }
+}
+
+static void u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
+{
+   struct net_device *dev = tp->q->dev_queue->dev;
+   struct tc_cls_u32_offload u32_offload = {0};
+   struct tc_to_netdev offload;
+
+   offload.type = TC_SETUP_CLSU32;
+   offload.cls_u32 = _offload;
+
+   if (dev->netdev_ops->ndo_setup_tc) {
+   offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
+   offload.cls_u32->hnode.divisor = h->divisor;
+   offload.cls_u32->hnode.handle = h->handle;
+   offload.cls_u32->hnode.prio = h->prio;
+
+   dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+ tp->protocol, );
+   }
+}
+
+static void u32_clear_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
+{
+   struct net_device *dev = tp->q->dev_queue->dev;
+   struct tc_cls_u32_offload u32_offload = {0};
+   struct tc_to_netdev offload;
+
+   offload.type = TC_SETUP_CLSU32;
+   offload.cls_u32 = _offload;
+
+   if (dev->netdev_ops->ndo_setup_tc) {
+   offload.cls_u32->command = TC_CLSU32_DELETE_HNODE;
+   offload.cls_u32->hnode.divisor = h->divisor;
+   offload.cls_u32->hnode.handle = h->handle;
+   offload.cls_u32->hnode.prio = h->prio;
+
+   dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+ tp->protocol, );
+   }
+}
+
+static void u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n)
+{
+   struct net_device *dev = tp->q->dev_queue->dev;
+   struct tc_cls_u32_offload u32_offload = {0};
+   struct tc_to_netdev offload;
+
+