Re: [PATCH net-next 1/5] vxlan: implement GPE in L2 mode

2016-03-01 Thread Jiri Benc
On Mon, 29 Feb 2016 09:13:50 -0800, Tom Herbert wrote:
> As defined now, GPB can't be used with VXLAN-GPE at all, but when I
> read your patch it looks very much like GPB is being checked and
> allowed in the VXLAN-GPE path. The fact that "if (vs->flags &
> VXLAN_F_GBP)" always fails for VXLAN-GPE packets because of
> configuration constraints is not at all obvious, and really this just
> results in an unnecessary conditional that gives the same answer for
> every single VXLAN-GPE packet which we've already checked for just a
> few lines above. At least the check for GPB could be moved to an else
> block of " if (vs->flags & VXLAN_F_GPE)", this alone improves clarity
> and eliminates an unnecessary conditional in the VXLAN-GPE path.

The problem here is ordering. GPE needs to be called before
iptunnel_pull_header, while GBP needs to be called after udp_tun_rx_dst
(and hence after iptunnel_pull_header).

I agree that it's a check that's done for every packet and would be
nice to get rid of. On the other hand, the amount of processing in the
rx path of vxlan is so huge that it hardly matters. Yes, we should work
on overall vxlan performance and it's something I'm actually looking
into.

As for not being obvious that the GBP processing can't happen, I see
your point. I'll add a comment that explains this to the code in v2.

Thanks,

 Jiri


Re: [PATCH net-next 1/5] vxlan: implement GPE in L2 mode

2016-02-29 Thread Tom Herbert
On Mon, Feb 29, 2016 at 2:23 AM, Jiri Benc  wrote:
> On Sat, 27 Feb 2016 12:54:52 -0800, Tom Herbert wrote:
>> Yes, but RCO has not been specified for VXLAN-GPE either
>
> As far as I can see, RCO will just work with VXLAN-GPE. But I have no
> problem disallowing them to be set together, if you prefer that.
>
>> so the patch
>> does not correctly refuse setting those two together. Inevitably
>> though, those and other extensions will defined for VXLAN-GPE and new
>> ones for VXLAN. Again, the protocols are fundamentally incompatible,
>> so instead of trying to enforce each valid combination at
>> configuration
>
> We need to do the checking in either case. If we accepted unsupported
> combinations and then just silently ignored them, we'd be in troubles
> later when such combination becomes defined/supported. There would be
> no way for the userspace tools to detect whether a particular kernel
> supports the combination or not.
>
> So, we need to check for supported combination of options during
> configuration anyway.
>
> And when we have that, I don't really see the reason for doing that
> kind of code duplication that you suggest.
>
>> or performing multiple checks for flavor each time we
>> look at a packet, it seems easier to split the parsing with at most
>> one check for the protocol variant. For instance in
>> vxlan_udp_encap_recv just do:
>>
>> if (vs->flags & VXLAN_F_GPE)
>>if (!vxlan_parse_gpe_hdr(, skb, vs->flags))
>>goto drop;
>> else
>>if (!vxlan_parse_gpe(, skb, vs->flags))
>>goto drop;
>
> Most of the code of these two functions will be identical. To
> consolidate that as much as possible, you'll end up with what I have or
> something very similar.
>
>> And then move REMCSUM and GPB and other protocol specific checks to
>> the right function.
>
> And when RCO is defined for GPE, we copy the code? Doesn't make sense,
> sorry.
>
> If you look at the code in the current net-next (and the code after
> this patchset), the extension handling has been made generic and each
> extension gets its own handler function, leading to clean separation in
> the code. There's no reason to split the vxlan_rcv into two functions
> doing the same things but with slightly different calls to extensions.
>
They may or may not be "slightly different"; if they are the same
(like RCO for VXLAN-GPE uses the low order bits in VNI) then a common
backend function can be called.

As defined now, GPB can't be used with VXLAN-GPE at all, but when I
read your patch it looks very much like GPB is being checked and
allowed in the VXLAN-GPE path. The fact that "if (vs->flags &
VXLAN_F_GBP)" always fails for VXLAN-GPE packets because of
configuration constraints is not at all obvious, and really this just
results in an unnecessary conditional that gives the same answer for
every single VXLAN-GPE packet which we've already checked for just a
few lines above. At least the check for GPB could be moved to an else
block of " if (vs->flags & VXLAN_F_GPE)", this alone improves clarity
and eliminates an unnecessary conditional in the VXLAN-GPE path.

>  Jiri


Re: [PATCH net-next 1/5] vxlan: implement GPE in L2 mode

2016-02-29 Thread Jiri Benc
On Sat, 27 Feb 2016 12:54:52 -0800, Tom Herbert wrote:
> Yes, but RCO has not been specified for VXLAN-GPE either

As far as I can see, RCO will just work with VXLAN-GPE. But I have no
problem disallowing them to be set together, if you prefer that.

> so the patch
> does not correctly refuse setting those two together. Inevitably
> though, those and other extensions will defined for VXLAN-GPE and new
> ones for VXLAN. Again, the protocols are fundamentally incompatible,
> so instead of trying to enforce each valid combination at
> configuration

We need to do the checking in either case. If we accepted unsupported
combinations and then just silently ignored them, we'd be in troubles
later when such combination becomes defined/supported. There would be
no way for the userspace tools to detect whether a particular kernel
supports the combination or not.

So, we need to check for supported combination of options during
configuration anyway.

And when we have that, I don't really see the reason for doing that
kind of code duplication that you suggest.

> or performing multiple checks for flavor each time we
> look at a packet, it seems easier to split the parsing with at most
> one check for the protocol variant. For instance in
> vxlan_udp_encap_recv just do:
> 
> if (vs->flags & VXLAN_F_GPE)
>if (!vxlan_parse_gpe_hdr(, skb, vs->flags))
>goto drop;
> else
>if (!vxlan_parse_gpe(, skb, vs->flags))
>goto drop;

Most of the code of these two functions will be identical. To
consolidate that as much as possible, you'll end up with what I have or
something very similar.

> And then move REMCSUM and GPB and other protocol specific checks to
> the right function.

And when RCO is defined for GPE, we copy the code? Doesn't make sense,
sorry.

If you look at the code in the current net-next (and the code after
this patchset), the extension handling has been made generic and each
extension gets its own handler function, leading to clean separation in
the code. There's no reason to split the vxlan_rcv into two functions
doing the same things but with slightly different calls to extensions.

 Jiri


Re: [PATCH net-next 1/5] vxlan: implement GPE in L2 mode

2016-02-27 Thread Tom Herbert
On Sat, Feb 27, 2016 at 12:54 PM, Tom Herbert  wrote:
> On Sat, Feb 27, 2016 at 11:31 AM, Jiri Benc  wrote:
>> On Fri, 26 Feb 2016 15:51:29 -0800, Tom Herbert wrote:
>>> I don't think this is right. VXLAN-GPE is a separate protocol than
>>> VXLAN, they are not compatible on the wire and don't share flags or
>>> fields (for instance GPB uses bits in VXLAN that hold the next
>>> protocol in VXLAN-GPE). Neither is there a VXLAN_F_GPE flag defined in
>>> VXLAN to differentiate the two. So VXLAN-GPE would be used on a
>>> different port
>>
>> Yes, and that's exactly what this patchset does. If there's the
>> VXLAN_F_GPE flag defined while creating the interface, the used UDP
>> port defaults to the VXLAN-GPE UDP port (but can be overriden) and the
>> driver expects that all packets received are VXLAN-GPE.
>>
>> Note also that you can't define both GPE and GBP together, because as
>> you noted, they're not compatible. The driver correctly refuses such
>> combination.
>>
> Yes, but RCO has not been specified for VXLAN-GPE either so the patch
> does not correctly refuse setting those two together. Inevitably
> though, those and other extensions will defined for VXLAN-GPE and new
> ones for VXLAN. Again, the protocols are fundamentally incompatible,
> so instead of trying to enforce each valid combination at
> configuration or performing multiple checks for flavor each time we
> look at a packet, it seems easier to split the parsing with at most
> one check for the protocol variant. For instance in
> vxlan_udp_encap_recv just do:
>
> if (vs->flags & VXLAN_F_GPE)
>if (!vxlan_parse_gpe_hdr(, skb, vs->flags))
>goto drop;
> else
>if (!vxlan_parse_gpe(, skb, vs->flags))
>goto drop;
>

I meant

if (vs->flags & VXLAN_F_GPE)
   if (!vxlan_parse_gpe_hdr(, skb, vs->flags))
   goto drop;
else
   if (!vxlan_parse_hdr(, skb, vs->flags))
   goto drop;

>
> And then move REMCSUM and GPB and other protocol specific checks to
> the right function.
>
> Tom


Re: [PATCH net-next 1/5] vxlan: implement GPE in L2 mode

2016-02-27 Thread Tom Herbert
On Sat, Feb 27, 2016 at 11:31 AM, Jiri Benc  wrote:
> On Fri, 26 Feb 2016 15:51:29 -0800, Tom Herbert wrote:
>> I don't think this is right. VXLAN-GPE is a separate protocol than
>> VXLAN, they are not compatible on the wire and don't share flags or
>> fields (for instance GPB uses bits in VXLAN that hold the next
>> protocol in VXLAN-GPE). Neither is there a VXLAN_F_GPE flag defined in
>> VXLAN to differentiate the two. So VXLAN-GPE would be used on a
>> different port
>
> Yes, and that's exactly what this patchset does. If there's the
> VXLAN_F_GPE flag defined while creating the interface, the used UDP
> port defaults to the VXLAN-GPE UDP port (but can be overriden) and the
> driver expects that all packets received are VXLAN-GPE.
>
> Note also that you can't define both GPE and GBP together, because as
> you noted, they're not compatible. The driver correctly refuses such
> combination.
>
Yes, but RCO has not been specified for VXLAN-GPE either so the patch
does not correctly refuse setting those two together. Inevitably
though, those and other extensions will defined for VXLAN-GPE and new
ones for VXLAN. Again, the protocols are fundamentally incompatible,
so instead of trying to enforce each valid combination at
configuration or performing multiple checks for flavor each time we
look at a packet, it seems easier to split the parsing with at most
one check for the protocol variant. For instance in
vxlan_udp_encap_recv just do:

if (vs->flags & VXLAN_F_GPE)
   if (!vxlan_parse_gpe_hdr(, skb, vs->flags))
   goto drop;
else
   if (!vxlan_parse_gpe(, skb, vs->flags))
   goto drop;

And then move REMCSUM and GPB and other protocol specific checks to
the right function.

Tom


Re: [PATCH net-next 1/5] vxlan: implement GPE in L2 mode

2016-02-27 Thread Jiri Benc
On Fri, 26 Feb 2016 15:51:29 -0800, Tom Herbert wrote:
> I don't think this is right. VXLAN-GPE is a separate protocol than
> VXLAN, they are not compatible on the wire and don't share flags or
> fields (for instance GPB uses bits in VXLAN that hold the next
> protocol in VXLAN-GPE). Neither is there a VXLAN_F_GPE flag defined in
> VXLAN to differentiate the two. So VXLAN-GPE would be used on a
> different port

Yes, and that's exactly what this patchset does. If there's the
VXLAN_F_GPE flag defined while creating the interface, the used UDP
port defaults to the VXLAN-GPE UDP port (but can be overriden) and the
driver expects that all packets received are VXLAN-GPE.

Note also that you can't define both GPE and GBP together, because as
you noted, they're not compatible. The driver correctly refuses such
combination.

> and probably needs its own rcv functions.

I don't see the need for code duplication. This patchset does exactly
what you described and reuses the code, as most of it is really the
same for all VXLAN modes. I also made sure this is as clean as possible
in the driver which was the reason for the previous 4 cleanup patchsets.

 Jiri


Re: [PATCH net-next 1/5] vxlan: implement GPE in L2 mode

2016-02-26 Thread Tom Herbert
On Thu, Feb 25, 2016 at 11:48 PM, Jiri Benc  wrote:
> Implement VXLAN-GPE. Only L2 mode (i.e. encapsulated Ethernet frame) is
> supported by this patch.
>
> L3 mode will be added by subsequent patches.
>
> Signed-off-by: Jiri Benc 
> ---
>  drivers/net/vxlan.c  | 68 
> ++--
>  include/net/vxlan.h  | 62 +++-
>  include/uapi/linux/if_link.h |  8 ++
>  3 files changed, 135 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 775ddb48388d..c7844bae339d 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1192,6 +1192,33 @@ out:
> unparsed->vx_flags &= ~VXLAN_GBP_USED_BITS;
>  }
>
> +static bool vxlan_parse_gpe_hdr(struct vxlanhdr *unparsed,
> +   struct sk_buff *skb, u32 vxflags)
> +{
> +   struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)unparsed;
> +
> +   /* Need to have Next Protocol set for interfaces in GPE mode. */
> +   if (!gpe->np_applied)
> +   return false;
> +   /* "The initial version is 0. If a receiver does not support the
> +* version indicated it MUST drop the packet.
> +*/
> +   if (gpe->version != 0)
> +   return false;
> +   /* "When the O bit is set to 1, the packet is an OAM packet and OAM
> +* processing MUST occur." However, we don't implement OAM
> +* processing, thus drop the packet.
> +*/
> +   if (gpe->oam_flag)
> +   return false;
> +
> +   if (gpe->next_protocol != VXLAN_GPE_NP_ETHERNET)
> +   return false;
> +
> +   unparsed->vx_flags &= ~VXLAN_GPE_USED_BITS;
> +   return true;
> +}
> +
>  static bool vxlan_set_mac(struct vxlan_dev *vxlan,
>   struct vxlan_sock *vs,
>   struct sk_buff *skb)
> @@ -1307,6 +1334,9 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff 
> *skb)
> /* For backwards compatibility, only allow reserved fields to be
>  * used by VXLAN extensions if explicitly requested.
>  */
> +   if (vs->flags & VXLAN_F_GPE)
> +   if (!vxlan_parse_gpe_hdr(, skb, vs->flags))
> +   goto drop;

I don't think this is right. VXLAN-GPE is a separate protocol than
VXLAN, they are not compatible on the wire and don't share flags or
fields (for instance GPB uses bits in VXLAN that hold the next
protocol in VXLAN-GPE). Neither is there a VXLAN_F_GPE flag defined in
VXLAN to differentiate the two. So VXLAN-GPE would be used on a
different port and probably needs its own rcv functions.

Tom

> if (vs->flags & VXLAN_F_REMCSUM_RX)
> if (!vxlan_remcsum(, skb, vs->flags))
> goto drop;
> @@ -1685,6 +1715,14 @@ static void vxlan_build_gbp_hdr(struct vxlanhdr *vxh, 
> u32 vxflags,
> gbp->policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
>  }
>
> +static void vxlan_build_gpe_hdr(struct vxlanhdr *vxh, u32 vxflags)
> +{
> +   struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)vxh;
> +
> +   gpe->np_applied = 1;
> +   gpe->next_protocol = VXLAN_GPE_NP_ETHERNET;
> +}
> +
>  static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
>int iphdr_len, __be32 vni,
>struct vxlan_metadata *md, u32 vxflags,
> @@ -1744,6 +1782,8 @@ static int vxlan_build_skb(struct sk_buff *skb, struct 
> dst_entry *dst,
>
> if (vxflags & VXLAN_F_GBP)
> vxlan_build_gbp_hdr(vxh, vxflags, md);
> +   if (vxflags & VXLAN_F_GPE)
> +   vxlan_build_gpe_hdr(vxh, vxflags);
>
> skb_set_inner_protocol(skb, htons(ETH_P_TEB));
> return 0;
> @@ -2515,6 +2555,7 @@ static const struct nla_policy 
> vxlan_policy[IFLA_VXLAN_MAX + 1] = {
> [IFLA_VXLAN_REMCSUM_RX] = { .type = NLA_U8 },
> [IFLA_VXLAN_GBP]= { .type = NLA_FLAG, },
> [IFLA_VXLAN_REMCSUM_NOPARTIAL]  = { .type = NLA_FLAG },
> +   [IFLA_VXLAN_GPE_MODE]   = { .type = NLA_U8, },
>  };
>
>  static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
> @@ -2714,6 +2755,10 @@ static int vxlan_dev_configure(struct net *src_net, 
> struct net_device *dev,
> __be16 default_port = vxlan->cfg.dst_port;
> struct net_device *lowerdev = NULL;
>
> +   if (((conf->flags & VXLAN_F_LEARN) && (conf->flags & VXLAN_F_GPE)) ||
> +   ((conf->flags & VXLAN_F_GBP) && (conf->flags & VXLAN_F_GPE)))
> +   return -EINVAL;
> +
> vxlan->net = src_net;
>
> dst->remote_vni = conf->vni;
> @@ -2770,8 +2815,12 @@ static int vxlan_dev_configure(struct net *src_net, 
> struct net_device *dev,
> dev->needed_headroom = needed_headroom;
>
> memcpy(>cfg, conf, sizeof(*conf));
> -   if (!vxlan->cfg.dst_port)
> -   vxlan->cfg.dst_port = 

[PATCH net-next 1/5] vxlan: implement GPE in L2 mode

2016-02-25 Thread Jiri Benc
Implement VXLAN-GPE. Only L2 mode (i.e. encapsulated Ethernet frame) is
supported by this patch.

L3 mode will be added by subsequent patches.

Signed-off-by: Jiri Benc 
---
 drivers/net/vxlan.c  | 68 ++--
 include/net/vxlan.h  | 62 +++-
 include/uapi/linux/if_link.h |  8 ++
 3 files changed, 135 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 775ddb48388d..c7844bae339d 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1192,6 +1192,33 @@ out:
unparsed->vx_flags &= ~VXLAN_GBP_USED_BITS;
 }
 
+static bool vxlan_parse_gpe_hdr(struct vxlanhdr *unparsed,
+   struct sk_buff *skb, u32 vxflags)
+{
+   struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)unparsed;
+
+   /* Need to have Next Protocol set for interfaces in GPE mode. */
+   if (!gpe->np_applied)
+   return false;
+   /* "The initial version is 0. If a receiver does not support the
+* version indicated it MUST drop the packet.
+*/
+   if (gpe->version != 0)
+   return false;
+   /* "When the O bit is set to 1, the packet is an OAM packet and OAM
+* processing MUST occur." However, we don't implement OAM
+* processing, thus drop the packet.
+*/
+   if (gpe->oam_flag)
+   return false;
+
+   if (gpe->next_protocol != VXLAN_GPE_NP_ETHERNET)
+   return false;
+
+   unparsed->vx_flags &= ~VXLAN_GPE_USED_BITS;
+   return true;
+}
+
 static bool vxlan_set_mac(struct vxlan_dev *vxlan,
  struct vxlan_sock *vs,
  struct sk_buff *skb)
@@ -1307,6 +1334,9 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
/* For backwards compatibility, only allow reserved fields to be
 * used by VXLAN extensions if explicitly requested.
 */
+   if (vs->flags & VXLAN_F_GPE)
+   if (!vxlan_parse_gpe_hdr(, skb, vs->flags))
+   goto drop;
if (vs->flags & VXLAN_F_REMCSUM_RX)
if (!vxlan_remcsum(, skb, vs->flags))
goto drop;
@@ -1685,6 +1715,14 @@ static void vxlan_build_gbp_hdr(struct vxlanhdr *vxh, 
u32 vxflags,
gbp->policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
 }
 
+static void vxlan_build_gpe_hdr(struct vxlanhdr *vxh, u32 vxflags)
+{
+   struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)vxh;
+
+   gpe->np_applied = 1;
+   gpe->next_protocol = VXLAN_GPE_NP_ETHERNET;
+}
+
 static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
   int iphdr_len, __be32 vni,
   struct vxlan_metadata *md, u32 vxflags,
@@ -1744,6 +1782,8 @@ static int vxlan_build_skb(struct sk_buff *skb, struct 
dst_entry *dst,
 
if (vxflags & VXLAN_F_GBP)
vxlan_build_gbp_hdr(vxh, vxflags, md);
+   if (vxflags & VXLAN_F_GPE)
+   vxlan_build_gpe_hdr(vxh, vxflags);
 
skb_set_inner_protocol(skb, htons(ETH_P_TEB));
return 0;
@@ -2515,6 +2555,7 @@ static const struct nla_policy 
vxlan_policy[IFLA_VXLAN_MAX + 1] = {
[IFLA_VXLAN_REMCSUM_RX] = { .type = NLA_U8 },
[IFLA_VXLAN_GBP]= { .type = NLA_FLAG, },
[IFLA_VXLAN_REMCSUM_NOPARTIAL]  = { .type = NLA_FLAG },
+   [IFLA_VXLAN_GPE_MODE]   = { .type = NLA_U8, },
 };
 
 static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -2714,6 +2755,10 @@ static int vxlan_dev_configure(struct net *src_net, 
struct net_device *dev,
__be16 default_port = vxlan->cfg.dst_port;
struct net_device *lowerdev = NULL;
 
+   if (((conf->flags & VXLAN_F_LEARN) && (conf->flags & VXLAN_F_GPE)) ||
+   ((conf->flags & VXLAN_F_GBP) && (conf->flags & VXLAN_F_GPE)))
+   return -EINVAL;
+
vxlan->net = src_net;
 
dst->remote_vni = conf->vni;
@@ -2770,8 +2815,12 @@ static int vxlan_dev_configure(struct net *src_net, 
struct net_device *dev,
dev->needed_headroom = needed_headroom;
 
memcpy(>cfg, conf, sizeof(*conf));
-   if (!vxlan->cfg.dst_port)
-   vxlan->cfg.dst_port = default_port;
+   if (!vxlan->cfg.dst_port) {
+   if (conf->flags & VXLAN_F_GPE)
+   vxlan->cfg.dst_port = 4790; /* IANA assigned VXLAN-GPE 
port */
+   else
+   vxlan->cfg.dst_port = default_port;
+   }
vxlan->flags |= conf->flags;
 
if (!vxlan->cfg.age_interval)
@@ -2941,6 +2990,13 @@ static int vxlan_newlink(struct net *src_net, struct 
net_device *dev,
if (data[IFLA_VXLAN_REMCSUM_NOPARTIAL])
conf.flags |= VXLAN_F_REMCSUM_NOPARTIAL;
 
+   if (data[IFLA_VXLAN_GPE_MODE]) {
+   u8 mode = nla_get_u8(data[IFLA_VXLAN_GPE_MODE]);
+
+   if (mode > 0