Re: [PATCH net-next 0/7] nfp: flower vxlan tunnel offload

2017-09-27 Thread Jiri Pirko
Tue, Sep 26, 2017 at 04:50:10PM CEST, pab...@redhat.com wrote:
>On Tue, 2017-09-26 at 17:17 +0300, Or Gerlitz wrote:
>> On Tue, Sep 26, 2017 at 3:51 PM, Jiri Benc  wrote:
>> > On Tue, 26 Sep 2017 15:41:37 +0300, Or Gerlitz wrote:
>> > > Please note that the way the rule is being set to the HW driver is by 
>> > > delegation
>> > > done in flower, see these commits (specifically "Add offload support
>> > > using egress Hardware device")
>> > 
>> > It's very well possible the bug is somewhere in net/sched.
>> 
>> maybe before/instead you call it a bug, take a look on the design
>> there and maybe
>> tell us how to possibly do that otherwise?
>
>The problem, AFAICT, is in the API between flower and NIC implementing
>the offload, because in the above example the kernel will call the
>offload hook with exactly the same arguments with the 'bad' rule and
>the 'good' one - but the 'bad' rule should never match any packets.
>
>I think that can be fixed changing the flower code to invoke the
>offload hook for filters with tunnel-based match only if the device
>specified in such match has the appropriate type, e.g. given that
>currently only vxlan is supported with something like the code below
>(very rough and untested, just to give the idea):
>
>Cheers,
>
>Paolo
>
>---
>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>index d230cb4c8094..ff8476e56d4e 100644
>--- a/net/sched/cls_flower.c
>+++ b/net/sched/cls_flower.c
>@@ -243,10 +243,11 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
>				struct fl_flow_key *mask,
>				struct cls_fl_filter *f)
> {
>-	struct net_device *dev = tp->q->dev_queue->dev;
>+	struct net_device *ingress_dev, *dev = tp->q->dev_queue->dev;
>	struct tc_cls_flower_offload cls_flower = {};
>	int err;
>
>+	ingress_dev = dev;
>	if (!tc_can_offload(dev)) {
>		if (tcf_exts_get_dev(dev, &f->exts, &f->hw_dev) ||
>		    (f->hw_dev && !tc_can_offload(f->hw_dev))) {
>@@ -259,6 +260,12 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
>		f->hw_dev = dev;
>	}
>
>+	if ((dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ENC_KEYID) ||
>+	     dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ENC_PORTS) ||
>+	     /* ... list all the other tunnel-based keys ... */
>+	     ) && strcmp(ingress_dev->rtnl_link_ops->kind, "vxlan"))
>+		return tc_skip_sw(f->flags) ? -EINVAL : 0;

These kinds of hooks give me nightmares. The code is screwed up enough
as it is. I'm currently working on a conversion to callbacks. This
part is handled in:
https://github.com/jpirko/linux_mlxsw/commits/jiri_devel_egdevcb


Re: [PATCH net-next 0/7] nfp: flower vxlan tunnel offload

2017-09-27 Thread Simon Horman
On Tue, Sep 26, 2017 at 09:29:03PM -0700, David Miller wrote:
> From: Simon Horman 
> Date: Mon, 25 Sep 2017 12:23:34 +0200
> 
> > From: Simon Horman 
> > 
> > John says:
> > 
> > This patch set allows offloading of TC flower match and set tunnel fields
> > to the NFP. The initial focus is on VXLAN traffic. Due to the current
> > state of the NFP firmware, only VXLAN traffic on well known port 4789 is
> > handled. The match and action fields must explicitly set this value to be
> > supported. Tunnel end point information is also offloaded to the NFP for
> > both encapsulation and decapsulation. The NFP expects 3 separate data sets
> > to be supplied.
>  ...
> 
> Series applied, thanks.
> 
> I see there is some discussion about ipv6 flow dissector key handling
> and ND keepalives, but those should be addressable in follow-on changes.

Thanks Dave,

I'll make sure that discussion is brought to a satisfactory conclusion.


Re: [PATCH net-next 0/7] nfp: flower vxlan tunnel offload

2017-09-26 Thread David Miller
From: Simon Horman 
Date: Mon, 25 Sep 2017 12:23:34 +0200

> From: Simon Horman 
> 
> John says:
> 
> This patch set allows offloading of TC flower match and set tunnel fields
> to the NFP. The initial focus is on VXLAN traffic. Due to the current
> state of the NFP firmware, only VXLAN traffic on well known port 4789 is
> handled. The match and action fields must explicitly set this value to be
> supported. Tunnel end point information is also offloaded to the NFP for
> both encapsulation and decapsulation. The NFP expects 3 separate data sets
> to be supplied.
 ...

Series applied, thanks.

I see there is some discussion about ipv6 flow dissector key handling
and ND keepalives, but those should be addressable in follow-on changes.

Thanks.


Re: [PATCH net-next 0/7] nfp: flower vxlan tunnel offload

2017-09-26 Thread Paolo Abeni
On Tue, 2017-09-26 at 17:17 +0300, Or Gerlitz wrote:
> On Tue, Sep 26, 2017 at 3:51 PM, Jiri Benc  wrote:
> > On Tue, 26 Sep 2017 15:41:37 +0300, Or Gerlitz wrote:
> > > Please note that the way the rule is being set to the HW driver is by 
> > > delegation
> > > done in flower, see these commits (specifically "Add offload support
> > > using egress Hardware device")
> > 
> > It's very well possible the bug is somewhere in net/sched.
> 
> maybe before/instead you call it a bug, take a look on the design
> there and maybe
> tell us how to possibly do that otherwise?

The problem, AFAICT, is in the API between flower and the NIC implementing
the offload: in the above example the kernel calls the offload hook with
exactly the same arguments for both the 'bad' rule and the 'good' one -
but the 'bad' rule should never match any packets.

I think that can be fixed by changing the flower code to invoke the
offload hook for filters with a tunnel-based match only if the device
specified in such a match has the appropriate type. Given that
currently only vxlan is supported, something like the code below would
do (very rough and untested, just to give the idea):

Cheers,

Paolo

---
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index d230cb4c8094..ff8476e56d4e 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -243,10 +243,11 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
				struct fl_flow_key *mask,
				struct cls_fl_filter *f)
 {
-	struct net_device *dev = tp->q->dev_queue->dev;
+	struct net_device *ingress_dev, *dev = tp->q->dev_queue->dev;
 	struct tc_cls_flower_offload cls_flower = {};
 	int err;
 
+	ingress_dev = dev;
 	if (!tc_can_offload(dev)) {
 		if (tcf_exts_get_dev(dev, &f->exts, &f->hw_dev) ||
 		    (f->hw_dev && !tc_can_offload(f->hw_dev))) {
@@ -259,6 +260,12 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 		f->hw_dev = dev;
 	}
 
+	if ((dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ENC_KEYID) ||
+	     dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ENC_PORTS) ||
+	     /* ... list all the other tunnel-based keys ... */
+	     ) && strcmp(ingress_dev->rtnl_link_ops->kind, "vxlan"))
+		return tc_skip_sw(f->flags) ? -EINVAL : 0;
+
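Paolo's sketch above can be modelled in a few lines of plain C to show the intended behaviour. Everything below (`struct model_filter`, `maybe_offload`, the flag constant) is an illustrative stand-in, not kernel API: a filter that matches on tunnel metadata is only handed to hardware when the ingress device is a vxlan netdev; otherwise a skip_sw filter fails with -EINVAL, while any other filter silently stays software-only.

```c
/* Userspace model of the proposed gate. All names are illustrative
 * stand-ins, not the kernel's API. */
#include <string.h>

#define SKIP_SW (1u << 1)          /* models TCA_CLS_FLAGS_SKIP_SW */
#define MODEL_EINVAL (-22)

struct model_filter {
	int uses_tunnel_key;       /* matches enc_key_id, enc ports, ... */
	unsigned int flags;
};

/* Returns <0 on error, 0 when the offload is silently skipped,
 * 1 when the filter may be pushed to hardware. */
static int maybe_offload(const struct model_filter *f,
			 const char *ingress_kind)
{
	if (f->uses_tunnel_key &&
	    (!ingress_kind || strcmp(ingress_kind, "vxlan") != 0))
		return (f->flags & SKIP_SW) ? MODEL_EINVAL : 0;
	return 1; /* proceed with the hardware offload call */
}
```

Note the NULL check: a physical netdev such as eth0 has no rtnl_link_ops kind at all, which the rough diff above would still need to handle.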




Re: [PATCH net-next 0/7] nfp: flower vxlan tunnel offload

2017-09-26 Thread Jiri Benc
On Tue, 26 Sep 2017 17:17:02 +0300, Or Gerlitz wrote:
> maybe before/instead you call it a bug,

But it is a bug. When offloaded, the rules must not behave differently.
That's the fundamental thing about offloading. Here, the rules behave
differently when offloaded and when not. That's a bug.

> take a look on the design there and maybe
> tell us how to possibly do that otherwise?

I don't know the design. It's the responsibility of those who implement
the offloading to do it in a way that is consistent with the
software path. That has always been the case.

This needs to be fixed. If it can't be fixed, the feature needs to be
reverted. It's not that Linux has to make use of every single offload
supported by hardware. If the offloading cannot be fit into how Linux
works, then the offload can't be supported. There are in fact many
precedents.

 Jiri


Re: [PATCH net-next 0/7] nfp: flower vxlan tunnel offload

2017-09-26 Thread Or Gerlitz
On Tue, Sep 26, 2017 at 3:51 PM, Jiri Benc  wrote:
> On Tue, 26 Sep 2017 15:41:37 +0300, Or Gerlitz wrote:
>> Please note that the way the rule is being set to the HW driver is by 
>> delegation
>> done in flower, see these commits (specifically "Add offload support
>> using egress Hardware device")
>
> It's very well possible the bug is somewhere in net/sched.

Maybe before/instead of calling it a bug, take a look at the design
there and tell us how it could possibly be done otherwise?


Re: [PATCH net-next 0/7] nfp: flower vxlan tunnel offload

2017-09-26 Thread Jiri Benc
On Tue, 26 Sep 2017 15:41:37 +0300, Or Gerlitz wrote:
> Please note that the way the rule is being set to the HW driver is by 
> delegation
> done in flower, see these commits (specifically "Add offload support
> using egress Hardware device")

It's very well possible the bug is somewhere in net/sched.

> What is the bug in your view?

If you replace skip_sw with skip_hw, the rules have to work
identically. In software, decapsulated packets appear on the vxlan0
interface, not on the eth0 interface. As a consequence, the second
example must not match on such packets. Those packets do not appear on
eth0 with software only path. eth0 sees encapsulated packets only. It's
vxlan0 that sees decapsulated packets with attached dst_metadata and
that's the only interface where the flower filter in the example can
match.

Hardware offloaded path must behave identically to the software path.

 Jiri


Re: [PATCH net-next 0/7] nfp: flower vxlan tunnel offload

2017-09-26 Thread Or Gerlitz
On Tue, Sep 26, 2017 at 1:15 PM, Jiri Benc  wrote:
> On Mon, 25 Sep 2017 19:04:53 +0200, Simon Horman wrote:
>> The MAC addresses are extracted from the netdevs already loaded in the
>> kernel and are monitored for any changes. The IP addresses are slightly
>> different in that they are extracted from the rules themselves. We make the
>> assumption that, if a packet is decapsulated at the end point and a match
>> is attempted on the IP address,
>
> You lost me here, I'm afraid. What do you mean by "match"?
>
>> that this IP address should be recognised
>> in the kernel. That being the case, the same traffic pattern should be
>> witnessed if the skip_hw flag is applied.
>
> Just to be really sure that this works correctly, can you confirm that
> this will match the packet:
>
> ip link add vxlan0 type vxlan dstport 4789 dev eth0 external
> ip link set dev vxlan0 up
> tc qdisc add dev vxlan0 ingress
> ethtool -K eth0 hw-tc-offload on
> tc filter add dev vxlan0 protocol ip parent ffff: flower enc_key_id 102 \
>    enc_dst_port 4789 src_ip 3.4.5.6 skip_sw action [...]
>
> while this one will NOT match:

What exactly do you mean by "will not match"?

> ip link add vxlan0 type vxlan dstport 4789 dev eth0 external
> ip link set dev vxlan0 up
> tc qdisc add dev eth0 ingress
> ethtool -K eth0 hw-tc-offload on
> tc filter add dev eth0 protocol ip parent ffff: flower enc_key_id 102 \
>    enc_dst_port 4789 src_ip 3.4.5.6 skip_sw action [...]

Please note that the way the rule is set in the HW driver is by delegation
done in flower; see these commits (specifically "Add offload support
using egress Hardware device"):

a6e1693 net/sched: cls_flower: Set the filter Hardware device for all use-cases
7091d8c net/sched: cls_flower: Add offload support using egress Hardware device
255cb30 net/sched: act_mirred: Add new tc_action_ops get_dev()

Since the egress port is not a HW port netdev but rather a SW virtual
tunnel netdev, we have some logic in the kernel to delegate the rule
programming to the HW via the HW netdev.

Okay? If not, please elaborate.

> We found that with mlx5, the second one actually matches, too. Which is
> a very serious bug. (Adding Paolo who found this. And adding a few more
> Mellanox guys to be aware of the bug.)

What is the bug in your view?

Or.


Re: [PATCH net-next 0/7] nfp: flower vxlan tunnel offload

2017-09-26 Thread Jiri Benc
On Mon, 25 Sep 2017 19:04:53 +0200, Simon Horman wrote:
> The MAC addresses are extracted from the netdevs already loaded in the
> kernel and are monitored for any changes. The IP addresses are slightly
> different in that they are extracted from the rules themselves. We make the
> assumption that, if a packet is decapsulated at the end point and a match
> is attempted on the IP address,

You lost me here, I'm afraid. What do you mean by "match"?

> that this IP address should be recognised
> in the kernel. That being the case, the same traffic pattern should be
> witnessed if the skip_hw flag is applied.

Just to be really sure that this works correctly, can you confirm that
this will match the packet:

ip link add vxlan0 type vxlan dstport 4789 dev eth0 external
ip link set dev vxlan0 up
tc qdisc add dev vxlan0 ingress
ethtool -K eth0 hw-tc-offload on
tc filter add dev vxlan0 protocol ip parent ffff: flower enc_key_id 102 \
   enc_dst_port 4789 src_ip 3.4.5.6 skip_sw action [...]

while this one will NOT match:

ip link add vxlan0 type vxlan dstport 4789 dev eth0 external
ip link set dev vxlan0 up
tc qdisc add dev eth0 ingress
ethtool -K eth0 hw-tc-offload on
tc filter add dev eth0 protocol ip parent ffff: flower enc_key_id 102 \
   enc_dst_port 4789 src_ip 3.4.5.6 skip_sw action [...]

We found that with mlx5, the second one actually matches, too. Which is
a very serious bug. (Adding Paolo who found this. And adding a few more
Mellanox guys to be aware of the bug.)

 Jiri


Re: [PATCH net-next 0/7] nfp: flower vxlan tunnel offload

2017-09-25 Thread Simon Horman
On Mon, Sep 25, 2017 at 06:25:03PM +0300, Or Gerlitz wrote:
> On Mon, Sep 25, 2017 at 1:23 PM, Simon Horman
>  wrote:
> > From: Simon Horman 
> >
> > John says:
> >
> > This patch set allows offloading of TC flower match and set tunnel fields
> > to the NFP. The initial focus is on VXLAN traffic. Due to the current
> > state of the NFP firmware, only VXLAN traffic on well known port 4789 is
> > handled. The match and action fields must explicitly set this value to be
> > supported. Tunnel end point information is also offloaded to the NFP for
> > both encapsulation and decapsulation. The NFP expects 3 separate data sets
> > to be supplied.
> 
> > For decapsulation, 2 separate lists exist; a list of MAC addresses
> > referenced by an index comprised of the port number, and a list of IP
> > addresses. These IP addresses are not connected to a MAC or port.
> 
> Do these IP addresses exist on the host kernel SW stack? can the same
> set of TC rules be fully functional and generate the same traffic
> pattern when set to run in SW (skip_hw)?

Hi Or,

I asked John (now CCed) about this and his response was:

The MAC addresses are extracted from the netdevs already loaded in the
kernel and are monitored for any changes. The IP addresses are slightly
different in that they are extracted from the rules themselves. We make the
assumption that, if a packet is decapsulated at the end point and a match
is attempted on the IP address, this IP address should be recognised
in the kernel. That being the case, the same traffic pattern should be
witnessed if the skip_hw flag is applied.


Re: [PATCH net-next 0/7] nfp: flower vxlan tunnel offload

2017-09-25 Thread Or Gerlitz
On Mon, Sep 25, 2017 at 1:23 PM, Simon Horman
 wrote:
> From: Simon Horman 
>
> John says:
>
> This patch set allows offloading of TC flower match and set tunnel fields
> to the NFP. The initial focus is on VXLAN traffic. Due to the current
> state of the NFP firmware, only VXLAN traffic on well known port 4789 is
> handled. The match and action fields must explicitly set this value to be
> supported. Tunnel end point information is also offloaded to the NFP for
> both encapsulation and decapsulation. The NFP expects 3 separate data sets
> to be supplied.

> For decapsulation, 2 separate lists exist; a list of MAC addresses
> referenced by an index comprised of the port number, and a list of IP
> addresses. These IP addresses are not connected to a MAC or port.

Do these IP addresses exist on the host kernel SW stack? Can the same
set of TC rules be fully functional and generate the same traffic
pattern when set to run in SW (skip_hw)?


> The MAC
> addresses can be written as a block or one at a time (because they have an
> index, previous values can be overwritten) while the IP addresses are
> always written as a list of all the available IPs. Because the MAC address
> used as a tunnel end point may be associated with a physical port or may
> be a virtual netdev like an OVS bridge, we do not know which addresses
> should be offloaded. For this reason, all MAC addresses of active netdevs
> are offloaded to the NFP. A notifier checks for changes to any currently
> offloaded MACs or any new netdevs that may occur. For IP addresses, the
> tunnel end point used in the rules is known as the destination IP address
> must be specified in the flower classifier rule. When a new IP address
> appears in a rule, the IP address is offloaded. The IP is removed from the
> offloaded list when all rules matching on that IP are deleted.
>
> For encapsulation, a next hop table is updated on the NFP that contains
> the source/dest IPs, MACs and egress port. These are written individually
> when requested. If the NFP tries to encapsulate a packet but does not know
> the next hop, then it sends a request to the host. The host carries out a
> route lookup and populates the given entry on the NFP table. A notifier
> also exists to check for any links changing or going down in the kernel
> next hop table. If an offloaded next hop entry is removed from the kernel
> then it is also removed on the NFP.
>
> The NFP periodically sends a message to the host telling it which tunnel
> ports have packets egressing the system. The host uses this information to
> update the used value in the neighbour entry. This means that, rather than
> expire when it times out, the kernel will send an ARP to check if the link
> is still live. From an NFP perspective, this means that valid entries will
> not be removed from its next hop table.
>
> John Hurley (7):
>   nfp: add helper to get flower cmsg length
>   nfp: compile flower vxlan tunnel metadata match fields
>   nfp: compile flower vxlan tunnel set actions
>   nfp: offload flower vxlan endpoint MAC addresses
>   nfp: offload vxlan IPv4 endpoints of flower rules
>   nfp: flower vxlan neighbour offload
>   nfp: flower vxlan neighbour keep-alive
>
>  drivers/net/ethernet/netronome/nfp/Makefile|   3 +-
>  drivers/net/ethernet/netronome/nfp/flower/action.c | 169 -
>  drivers/net/ethernet/netronome/nfp/flower/cmsg.c   |  16 +-
>  drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |  87 ++-
>  drivers/net/ethernet/netronome/nfp/flower/main.c   |  13 +
>  drivers/net/ethernet/netronome/nfp/flower/main.h   |  35 +
>  drivers/net/ethernet/netronome/nfp/flower/match.c  |  75 +-
>  .../net/ethernet/netronome/nfp/flower/metadata.c   |   2 +-
>  .../net/ethernet/netronome/nfp/flower/offload.c|  74 +-
>  .../ethernet/netronome/nfp/flower/tunnel_conf.c| 811 +
>  10 files changed, 1243 insertions(+), 42 deletions(-)
>  create mode 100644 drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
>
> --
> 2.1.4
>


Re: [PATCH net-next 0/7] nfp: flower vxlan tunnel offload

2017-09-25 Thread Jakub Kicinski
On Mon, 25 Sep 2017 12:23:34 +0200, Simon Horman wrote:
> From: Simon Horman 
> 
> John says:
> 
> This patch set allows offloading of TC flower match and set tunnel fields
> to the NFP. The initial focus is on VXLAN traffic. Due to the current
> state of the NFP firmware, only VXLAN traffic on well known port 4789 is
> handled. The match and action fields must explicitly set this value to be
> supported. Tunnel end point information is also offloaded to the NFP for
> both encapsulation and decapsulation. The NFP expects 3 separate data sets
> to be supplied.
...

Acked-by: Jakub Kicinski 


[PATCH net-next 0/7] nfp: flower vxlan tunnel offload

2017-09-25 Thread Simon Horman
From: Simon Horman 

John says:

This patch set allows offloading of TC flower match and set tunnel fields
to the NFP. The initial focus is on VXLAN traffic. Due to the current
state of the NFP firmware, only VXLAN traffic on well known port 4789 is
handled. The match and action fields must explicitly set this value to be
supported. Tunnel end point information is also offloaded to the NFP for
both encapsulation and decapsulation. The NFP expects 3 separate data sets
to be supplied.

For decapsulation, 2 separate lists exist: a list of MAC addresses
referenced by an index comprised of the port number, and a list of IP
addresses. These IP addresses are not connected to a MAC or port. The MAC
addresses can be written as a block or one at a time (because they have an
index, previous values can be overwritten) while the IP addresses are
always written as a list of all the available IPs. Because the MAC address
used as a tunnel end point may be associated with a physical port or may
be a virtual netdev like an OVS bridge, we do not know which addresses
should be offloaded. For this reason, all MAC addresses of active netdevs
are offloaded to the NFP. A notifier checks for changes to any currently
offloaded MACs or any new netdevs that may occur. For IP addresses, the
tunnel end point used in the rules is known, as the destination IP address
must be specified in the flower classifier rule. When a new IP address
appears in a rule, the IP address is offloaded. The IP is removed from the
offloaded list when all rules matching on that IP are deleted.
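The asymmetry between the two decapsulation data sets can be sketched as a toy userspace model (struct and function names below are made up for illustration; the real code lives in the driver's tunnel_conf.c): a MAC entry carries a port-derived index and can be overwritten one slot at a time, while the IP list has no index and is always pushed to the firmware in full.

```c
/* Toy model of the two decap data sets; names are illustrative only. */
#include <string.h>

#define MODEL_PORTS 8
#define MODEL_IPS   16

struct decap_model {
	unsigned char mac[MODEL_PORTS][6]; /* indexed by port number */
	unsigned int ip[MODEL_IPS];        /* no index: rewritten in full */
	int n_ips;
};

/* A MAC entry has an index, so a single slot can be updated in place. */
static void model_set_mac(struct decap_model *t, int port,
			  const unsigned char *mac)
{
	memcpy(t->mac[port], mac, 6);
}

/* IPs carry no index, so every update replaces the whole list. */
static void model_write_ips(struct decap_model *t,
			    const unsigned int *ips, int n)
{
	memcpy(t->ip, ips, n * sizeof(*ips));
	t->n_ips = n;
}
```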

For encapsulation, a next hop table is updated on the NFP that contains
the source/dest IPs, MACs and egress port. These are written individually
when requested. If the NFP tries to encapsulate a packet but does not know
the next hop, then it sends a request to the host. The host carries out a
route lookup and populates the given entry on the NFP table. A notifier
also exists to check for any links changing or going down in the kernel
next hop table. If an offloaded next hop entry is removed from the kernel
then it is also removed on the NFP.
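The miss-and-populate flow described above can be modelled in a few lines of C. Everything here (the table size, `model_encap`, the lookup stub) is an illustrative sketch of the protocol, not the driver's implementation: a miss in the device-side table triggers a host route lookup, which writes the resolved entry back so the next packet hits.

```c
/* Toy model of the encap next-hop flow: a miss in the device table
 * triggers a host route lookup that populates the entry. */
#define NH_SLOTS 4

struct nh_model {
	unsigned int dst_ip;
	unsigned char dst_mac[6];
	int valid;
};

static struct nh_model nh_table[NH_SLOTS];

/* Stand-in for the host-side route/neighbour lookup. */
static void host_route_lookup(unsigned int dst_ip, struct nh_model *e)
{
	e->dst_ip = dst_ip;
	e->dst_mac[0] = 0x02; /* pretend resolved MAC */
	e->valid = 1;
}

/* Returns 1 on a table hit, 0 when the host had to be asked first. */
static int model_encap(unsigned int dst_ip)
{
	int i, free_slot = -1;

	for (i = 0; i < NH_SLOTS; i++) {
		if (nh_table[i].valid && nh_table[i].dst_ip == dst_ip)
			return 1;                 /* hit: encapsulate */
		if (!nh_table[i].valid && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -1; /* table full; a real driver would have to evict */
	/* miss: "send a request to the host", which populates the slot */
	host_route_lookup(dst_ip, &nh_table[free_slot]);
	return 0;
}
```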

The NFP periodically sends a message to the host telling it which tunnel
ports have packets egressing the system. The host uses this information to
update the 'used' value in the neighbour entry. This means that, rather
than letting the entry expire when it times out, the kernel will send an
ARP to check if the link is still live. From an NFP perspective, this
means that valid entries will
not be removed from its next hop table.
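The keep-alive effect can likewise be sketched in C. The names and the timeout policy below are assumptions made for illustration only; the point is the state machine: a neighbour entry whose 'used' time keeps getting bumped by the NFP's egress reports is probed (ARP) when stale rather than expired outright.

```c
/* Toy model of the keep-alive: the NFP reports tunnels with egress
 * traffic and the host bumps the neighbour entry's 'used' time. */
#define MODEL_TIMEOUT 30

struct neigh_model {
	long used; /* last time traffic was seen */
};

/* Called when the NFP reports the tunnel port as active. */
static void model_tunnel_active(struct neigh_model *n, long now)
{
	n->used = now;
}

/* 0 = keep, 1 = probe with ARP, 2 = expire. An entry that saw recent
 * traffic is probed rather than dropped. */
static int model_neigh_decision(const struct neigh_model *n, long now)
{
	if (now - n->used < MODEL_TIMEOUT)
		return 0;
	if (now - n->used < 2 * MODEL_TIMEOUT)
		return 1;
	return 2;
}
```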

John Hurley (7):
  nfp: add helper to get flower cmsg length
  nfp: compile flower vxlan tunnel metadata match fields
  nfp: compile flower vxlan tunnel set actions
  nfp: offload flower vxlan endpoint MAC addresses
  nfp: offload vxlan IPv4 endpoints of flower rules
  nfp: flower vxlan neighbour offload
  nfp: flower vxlan neighbour keep-alive

 drivers/net/ethernet/netronome/nfp/Makefile|   3 +-
 drivers/net/ethernet/netronome/nfp/flower/action.c | 169 -
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c   |  16 +-
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |  87 ++-
 drivers/net/ethernet/netronome/nfp/flower/main.c   |  13 +
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  35 +
 drivers/net/ethernet/netronome/nfp/flower/match.c  |  75 +-
 .../net/ethernet/netronome/nfp/flower/metadata.c   |   2 +-
 .../net/ethernet/netronome/nfp/flower/offload.c|  74 +-
 .../ethernet/netronome/nfp/flower/tunnel_conf.c| 811 +
 10 files changed, 1243 insertions(+), 42 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c

-- 
2.1.4