Re: [ovs-dev] [PATCH] netdev: Retry getting interfaces on inconsistent dumps from kernel

2018-07-12 Thread Jiri Benc
On Thu, 12 Jul 2018 08:18:43 +0200, Daniel Alvarez wrote:
> This patch in glibc [0] is fixing a bug where we may be getting
> inconsistent dumps from the kernel when listing interfaces due to
> a race condition.
> 
> This could happen if we try to retrieve them while interfaces are
> being added/removed from the system at the same time.
> For systems running against old glibc versions, this patch is retrying
> the operation up to 3 times and then proceeding by logging a
> warning.
> 
> Note that 3 times should be enough to not delay the operation much
> and since it's unlikely that we hit the race condition 3 times in
> a row. Still, if this happened, this patch is not changing the
> current behavior.
> 
> [0] 
> https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c1f86a33ca32e26a9d6e29fc961e5ecb5e2e5eb4
> 
> Signed-off-by: Daniel Alvarez 
> Signed-off-by: Jiri Benc 

Co-authored-by: Jiri Benc 

is better in this case :-)

Thanks, Daniel, for following up on this!

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif: Ensure ERSPAN GRE support

2018-06-07 Thread Jiri Benc
On Fri, 1 Jun 2018 11:23:12 -0700, William Tu wrote:
> Looking at the dpif_netlink_rtnl_probe_oot_tunnels(), since now we
> added ERSPAN feature, instead of probing geneve module,
> we should probe ip_gre module with a nlattr of ERSPAN (ex: HWID).
> If it does not return -ENOSUPPORT, then use the upstream ip_gre module.

That doesn't make sense. That function is used solely to determine
whether the out of (kernel) tree modules are loaded. If they are, ovs
prefers them.

Note that the features do not come into play here: if there are out of
tree modules present, they are used. It's very well possible that the
running kernel has more features but it's irrelevant.

It doesn't matter in which way the code determines whether there are
out of tree modules loaded or not. Probing for geneve thus works just
fine and there's no need to change it.

> And the next added feature should change this function to determine
> whether to use compat mode or not.  Do I understand it right?

No. The out of tree modules are preferred, irrespective of the
supported features.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif: Ensure ERSPAN GRE support

2018-06-01 Thread Jiri Benc
On Fri, 1 Jun 2018 09:15:33 -0700, Gregory Rose wrote:
> Since ERSPAN over gre/ip_gre was added to the Linux 4.16 kernel the 
> compat interface is needed
> for kernels up to 4.15 so that we can support ERSPAN.  If the built-in 
> gre/ip_gre kernel modules
> don't have the ERSPAN support in them then we have to use the compat 
> interface.

That's very wrong. The compat interface should not be used with
upstream kernel (except perhaps for very very very old kernels). We
converted the API to the standard rtnetlink for good reasons. New
features are not supported using the compat API. You are potentially
breaking future distribution kernels by reverting to an obsolete and
deprecated API.

You'll have to find a different way to do what you need. Eric described
pretty nicely a way to achieve that and how the fallbacks work, please
re-read his emails and modify the code accordingly.

> The target for USE_UPSTREAM_TUNNEL is moved to 4.16 now.  That's when 
> ERSPAN becomes
> fully supported.  Going forward the ERSPAN feature is the determinant 
> for whether gr/ip_gre
> compat mode is used or not.

And with the next added feature to the kernel, that next feature will be
what determines whether the compat mode will be used? And then next and
so on? This doesn't work. ERSPAN must not be the decision factor.
Instead, rtnetlink must be tried first and if and only if it fails,
compat mode can be used.

Please go read what Eric described about reading the value back.

As for the patch,

Nacked-by: Jiri Benc 

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath: compat: Fix ndo_size in RHEL 7.5 backport

2018-05-18 Thread Jiri Benc
On Thu, 17 May 2018 12:39:51 -0700, Yi-Hung Wei wrote:
> If 'ndo_size' is not set in 'struct net_device_ops', RHEL kernel will not
> make use of functions in 'struct net_device_ops_extended'.
> 
> Fixes: 39ca338374ab ("datapath: compat: Fix build on RHEL 7.5")
> Reported-by: Jiri Benc <jb...@redhat.com>
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/347070.html
> Signed-off-by: Yi-Hung Wei <yihung@gmail.com>

Reviewed-by: Jiri Benc <jb...@redhat.com>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] datapath: compat: Fix build on RHEL 7.5

2018-05-17 Thread Jiri Benc
On Fri, 11 May 2018 10:32:12 -0700, Yi-Hung Wei wrote:
> --- a/datapath/linux/compat/geneve.c
> +++ b/datapath/linux/compat/geneve.c
> @@ -1271,7 +1271,11 @@ static const struct net_device_ops geneve_netdev_ops = 
> {
>   .ndo_stop   = geneve_stop,
>   .ndo_start_xmit = geneve_dev_xmit,
>   .ndo_get_stats64= ip_tunnel_get_stats64,
> +#ifdef   HAVE_RHEL7_MAX_MTU
> + .extended.ndo_change_mtu = geneve_change_mtu,

Note that this will never be called by the RHEL kernel unless you
also set .ndo_size to sizeof(struct net_device_ops).

In other words, you've just effectively set .ndo_change_mtu to NULL.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] odp-util: Fix ofproto/trace with odp flow

2018-03-14 Thread Jiri Benc
On Tue, 13 Mar 2018 09:36:23 -0700, Darrell Ball wrote:
> [Darrell] There was no suggestion otherwise in general. We were discussing
> for one the difference b/w kernel and userspace DP for handling packet type
> aware support. Specifically, the difference is here is as fundamental as it
> gets -
> eth vs non-eth, The discussed support for ofproto/trace is also in that
> context;
> see Jan's response.

Ah, right, sorry.

So, the problem is that the two datapaths express the same thing
(packet type) differently, right? I think the solution would then be to
insert some kind of translation. ovs-dpctl dump-flows for the kernel
probably should output the flow in a format that ofproto/trace expects
and that matches the user space datapath. (And reverse that in the
other direction.)

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] odp-util: Fix ofproto/trace with odp flow

2018-03-13 Thread Jiri Benc
Darell, please fix your email client configuration to conform to
RFC 3676 (https://tools.ietf.org/html/rfc3676#section-4.5). Your
replies are unreadable. In particular, the text you're replying to must
be quoted by ">" character, clearly separating your reply from the
original email.

Also, wrap your lines at ~78 characters.

If your email client does not support this, it is severely broken and
you'll need to use a different one.

On Tue, 13 Mar 2018 02:35:57 +, Darrell Ball wrote:
> [Darrell] I guess part of the value of ofproto/trace is that it
> matches as close as possible to other injection sources.

Kernel datapath API has nothing to do with OpenFlow. In particular,
using output from ovs-dpctl dump-flows in a place where OpenFlow is
required could not be expected to work.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpif-netlink: don't allocate per port netlink sockets

2018-02-07 Thread Jiri Benc
On Tue, 23 Jan 2018 09:50:00 -0800, Ben Pfaff wrote:
> Per-port sockets help OVS to improve fairness, so that a single busy
> port can't monopolize all slow-path resources.  We can't just throw that
> away.  If there's a problem with too many netlink sockets, let's come up
> with a solution, but it can't be to eliminate fairness entirely.

The main problem with this is the sockets are dedicated to particular
ports. With high number of ports, the number of sockets raises to such
high value that it stops helping performance anyway. There's not much
point of having thousands opened sockets per CPU, we'll not be able to
utilize them anyway.

Having dedicated sockets for each port is wasteful, especially if one
port is very busy with upcalls and other ports are mostly idle. (And if
many ports are busy with many upcalls, we're in performance troubles
anyway.) It would make sense to share the sockets between the ports.

I'd like to understand more about the thoughts behind the fairness. Why
should not a single busy port use all slow-path resources in case other
ports are idle?

Thanks,

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net v3] openvswitch: Fix pop_vlan action for double tagged frames

2017-12-20 Thread Jiri Benc
On Wed, 20 Dec 2017 15:09:22 -0500, Eric Garver wrote:
> skb_vlan_pop() expects skb->protocol to be a valid TPID for double
> tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
> shift the true ethertype into position for us.
> 
> Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
> Signed-off-by: Eric Garver <e...@erig.me>

Thanks!

Reviewed-by: Jiri Benc <jb...@redhat.com>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net v2] openvswitch: Fix pop_vlan action for double tagged frames

2017-12-20 Thread Jiri Benc
On Wed, 20 Dec 2017 10:39:32 -0500, Eric Garver wrote:
> + if (is_flow_key_valid(key) && key->eth.vlan.tci && key->eth.cvlan.tci)

Maybe (key->eth.vlan.tci & htons(VLAN_TAG_PRESENT)) for consistency
with the rest of the code? But it's just nitpicking.

The real problem here is when a double tagged packet leaves the ovs
bridge, it won't have the skb->protocol that the kernel expects: it
will be ethertype of the payload, while my understanding is it should
be the inner tpid, right?

This patch fixes that nicely for the pop vlan case. But what about
other cases? It seems to me that we need to add the logic to
key_extract.

Thanks!

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net] openvswitch: Fix pop_vlan action for double tagged frames

2017-12-19 Thread Jiri Benc
On Tue, 19 Dec 2017 13:57:53 -0500, Eric Garver wrote:
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -559,8 +559,9 @@ static int parse_nsh(struct sk_buff *skb, struct 
> sw_flow_key *key)
>   *  of a correct length, otherwise the same as skb->network_header.
>   *  For other key->eth.type values it is left untouched.
>   *
> - *- skb->protocol: the type of the data starting at skb->network_header.
> - *  Equals to key->eth.type.
> + *- skb->protocol: For Ethernet, the ethertype or VLAN TPID.
> + *  For non-Ethernet, the type of the data starting at 
> skb->network_header
> + *  (also equal to key->eth.type).
>   */
>  static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>  {
> @@ -579,6 +580,7 @@ static int key_extract(struct sk_buff *skb, struct 
> sw_flow_key *key)
>   return -EINVAL;
>  
>   skb_reset_network_header(skb);
> + key->eth.type = skb->protocol;
>   } else {
>   eth = eth_hdr(skb);
>   ether_addr_copy(key->eth.src, eth->h_source);
> @@ -592,15 +594,14 @@ static int key_extract(struct sk_buff *skb, struct 
> sw_flow_key *key)
>   if (unlikely(parse_vlan(skb, key)))
>   return -ENOMEM;
>  
> - skb->protocol = parse_ethertype(skb);
> - if (unlikely(skb->protocol == htons(0)))
> + key->eth.type = parse_ethertype(skb);
> + if (unlikely(key->eth.type == htons(0)))
>   return -ENOMEM;
>  
>   skb_reset_network_header(skb);
>   __skb_push(skb, skb->data - skb_mac_header(skb));
>   }
>   skb_reset_mac_len(skb);
> - key->eth.type = skb->protocol;
>  
>   /* Network layer. */
>   if (key->eth.type == htons(ETH_P_IP)) {

Unfortunately, this does not work. key_extract must set skb->protocol
even for Ethernet frames that come from a mixed L2/L3 tunnel. Such
packets will have key->mac_proto set to MAC_PROTO_ETHERNET and
skb->protocol set to ETH_P_TEB (see key_extract_mac_proto). In
key_extract, skb->protocol has to be correctly set to the dissected
value.

Which means that we have to check for the existence of inner vlan tag
(by checking key->eth.cvlan.tci or, perhaps better, by returning it
from parse_vlan) and set skb->protocol accordingly.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/8] netlink: provide network namespace id from a msg.

2017-12-04 Thread Jiri Benc
On Mon, 4 Dec 2017 11:44:19 -0200, Flavio Leitner wrote:
> It is only positive numbers if you review the kernel sources today,
> but the API is exposing signed 32bit, so not sure what can happen
> in the future if the kernel decides to expand the range.

I think you can assume that it will always be a positive number.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v15] openvswitch: enable NSH support

2017-11-06 Thread Jiri Benc
On Sat, 4 Nov 2017 07:29:46 -0700, Pravin Shelar wrote:
> > +int nsh_push(struct sk_buff *skb, const struct nshhdr *pushed_nh)
> > +{
> > +   struct nshhdr *nh;
> > +   size_t length = nsh_hdr_len(pushed_nh);
> > +   u8 next_proto;
> > +
> > +   if (skb->mac_len) {
> > +   next_proto = TUN_P_ETHERNET;
> > +   } else {
> > +   next_proto = tun_p_from_eth_p(skb->protocol);
> > +   if (!next_proto)
> > +   return -EAFNOSUPPORT;  
> check for supported protocols can be moved to flow install validation
> in __ovs_nla_copy_actions().

You mean the check for !next_proto? It needs to be present for
correctness of nsh_push. This function has to be self contained, it
will be used by more callers than opevswitch, namely tc.

It's actually not so much a check for "supported protocols", it's
rather a check of return value of a function that converts ethertype to
a 1 byte tunnel type. Blindly using a result of a function that may
return error would be wrong. Openvswitch is free to perform additional
checks but this needs to stay.

> > +int nsh_pop(struct sk_buff *skb)
> > +{
> > +   struct nshhdr *nh;
> > +   size_t length;
> > +   __be16 inner_proto;
> > +
> > +   if (!pskb_may_pull(skb, NSH_BASE_HDR_LEN))
> > +   return -ENOMEM;
> > +   nh = (struct nshhdr *)(skb->data);
> > +   length = nsh_hdr_len(nh);
> > +   inner_proto = tun_p_to_eth_p(nh->np);  
> same as above, this check can be moved to flow install 
> __ovs_nla_copy_actions().

There's no check here.

> > +   if (!pskb_may_pull(skb, length))
> > +   return -ENOMEM;
> > +
> > +   if (!inner_proto)
> > +   return -EAFNOSUPPORT;

Did you mean this one instead? Then see above, this has to stay.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v15] openvswitch: enable NSH support

2017-11-01 Thread Jiri Benc
On Wed,  1 Nov 2017 12:03:01 +0800, Yi Yang wrote:
> v14->v15
>  - Check size in nsh_hdr_from_nlattr
>  - Fixed four small issues pointed out By Jiri and Eric

Acked-by: Jiri Benc <jb...@redhat.com>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v14] openvswitch: enable NSH support

2017-10-31 Thread Jiri Benc
On Mon, 30 Oct 2017 09:29:34 +0800, Yi Yang wrote:
> +static int set_nsh(struct sk_buff *skb, struct sw_flow_key *flow_key,
> +const struct nlattr *a)
> +{
> + struct nshhdr *nh;
> + size_t length;
> + int err;
> + u8 flags;
> + u8 ttl;
> + int i;
> +
> + struct ovs_key_nsh key;
> + struct ovs_key_nsh mask;
> +
> + err = nsh_key_from_nlattr(a, , );
> + if (err)
> + return err;
> +
> + /* Make sure the NSH base header is there */
> + if (!pskb_may_pull(skb, NSH_BASE_HDR_LEN))

This should be skb_network_offset(skb) + NSH_BASE_HDR_LEN.

> +size_t ovs_nsh_key_attr_size(void)
> +{
> + /* Whenever adding new OVS_NSH_KEY_ FIELDS, we should consider
> +  * updating this function.
> +  */
> + return  nla_total_size(NSH_BASE_HDR_LEN) /* OVS_NSH_KEY_ATTR_BASE */
> + /* OVS_NSH_KEY_ATTR_MD1 and OVS_NSH_KEY_ATTR_MD2 are
> +  * mutually exclusive, so the bigger one can cover
> +  * the small one.
> +  *
> +  * OVS_NSH_KEY_ATTR_MD2
> +  */

A nit, not important but since you'll need to respin anyway: the last
line in the comment above seems to be a left over from some previous
version of the comment. This should be enough:

/* OVS_NSH_KEY_ATTR_MD1 and OVS_NSH_KEY_ATTR_MD2 are
 * mutually exclusive, so the bigger one can cover
 * the small one.
 */

Or maybe I misunderstood what you meant.

> +int nsh_hdr_from_nlattr(const struct nlattr *attr,
> + struct nshhdr *nh, size_t size)
> +{
> + struct nlattr *a;
> + int rem;
> + u8 flags = 0;
> + u8 ttl = 0;
> + int mdlen = 0;
> +
> + /* validate_nsh has check this, so we needn't do duplicate check here
> +  */
> + nla_for_each_nested(a, attr, rem) {
> + int type = nla_type(a);
> +
> + switch (type) {
> + case OVS_NSH_KEY_ATTR_BASE: {
> + const struct ovs_nsh_key_base *base = nla_data(a);
> +
> + flags = base->flags;
> + ttl = base->ttl;
> + nh->np = base->np;
> + nh->mdtype = base->mdtype;
> + nh->path_hdr = base->path_hdr;
> + break;
> + }
> + case OVS_NSH_KEY_ATTR_MD1:
> + mdlen = nla_len(a);
> + memcpy(>md1, nla_data(a), mdlen);

The check for 'size' disappeared from here somehow.

> + break;
> +
> + case OVS_NSH_KEY_ATTR_MD2:
> + mdlen = nla_len(a);
> + memcpy(>md2, nla_data(a), mdlen);

And here.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v14] openvswitch: enable NSH support

2017-10-31 Thread Jiri Benc
On Tue, 31 Oct 2017 15:57:41 -0400, Eric Garver wrote:
> On Mon, Oct 30, 2017 at 09:29:34AM +0800, Yi Yang wrote:
> > +   if (WARN_ON(is_push_nsh && is_mask))
> > +   return -EINVAL;
> 
> OVS_NLERR() is probably more appropriate.

No, not here. If this happens, it's a bug in the kernel and WARN_ON is
what we need. This is not triggerable from user space and user space
has no way to fix it if this happens.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v12] openvswitch: enable NSH support

2017-10-20 Thread Jiri Benc
On Fri, 20 Oct 2017 05:53:12 +0800, Yang, Yi wrote:
> For push_nsh, my typical use scinario is push_nsh then set then output
> to vxlangpe port.

Okay.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v12] openvswitch: enable NSH support

2017-10-19 Thread Jiri Benc
On Thu, 19 Oct 2017 21:12:15 +0800, Yang, Yi wrote:
> flow_key in set_nsh is got from netlink message which is set by
> commit_nsh in user space, here is code.

Isn't this the 'key' local variable that you're talking about, while I'm
referring to the 'flow_key' parameter?

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v12] openvswitch: enable NSH support

2017-10-19 Thread Jiri Benc
On Thu, 19 Oct 2017 19:40:53 +0800, Yang, Yi wrote:
> Actually mdtype can't be set, only push_nsh can set mdtype, so set_nsh
> won't have mdtype flow key, we can't get it from flow_key in set_nsh,
> only ttl, flags and path_hdr can be set in set_nsh as you can see in code.
> I understand your concern is calling skb_ensure_writable twice, so
> changing the first one to pskb_may_pull is more appropriate for set_nsh.

Isn't set_nsh called only after the packet was already dissected (i.e.
parse_nsh called)? The dissected fields should be available in flow_key.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v12] openvswitch: enable NSH support

2017-10-18 Thread Jiri Benc
On Mon, 16 Oct 2017 21:53:29 +0800, Yi Yang wrote:
> +static int set_nsh(struct sk_buff *skb, struct sw_flow_key *flow_key,
> +const struct nlattr *a)
> +{
> + struct nshhdr *nh;
> + size_t length;
> + int err;
> + u8 flags;
> + u8 ttl;
> + int i;
> +
> + struct ovs_key_nsh key;
> + struct ovs_key_nsh mask;
> +
> + err = nsh_key_from_nlattr(a, , );
> + if (err)
> + return err;
> +
> + /* Make sure the NSH base header is there */
> + err = skb_ensure_writable(skb, skb_network_offset(skb) +
> +NSH_BASE_HDR_LEN);
> + if (unlikely(err))
> + return err;
> +
> + nh = nsh_hdr(skb);
> + length = nsh_hdr_len(nh);
> +
> + /* Make sure the whole NSH header is there */
> + err = skb_ensure_writable(skb, skb_network_offset(skb) +
> +length);

Calling skb_ensure_writable twice is an unnecessary waste in the fast
path. If anything, the first call should be changed to pskb_may_pull.

But what we really should do here is to move the header only once. We
know how much data we're going to write, we have everything stored in
the key and can calculate it from there.

length = NSH_BASE_HDR_LEN;
switch (flow_key->nsh.base.mdtype) {
case NSH_MD_TYPE1:
length += sizeof(struct ovs_nsh_key_md1);
break;
case NSH_MD_TYPE2:
length += whatever_way_we_store_the_tlvs_in_flow_key;
break;
}
err = skb_ensure_writable(skb, skb_network_offset(skb) + length);

However, set_nsh does not support MD type 2, thus the second case is a
dead code. In both switches in this function. As such, it should be
removed and added only when MD type 2 is introduced. I'd still keep the
overall logic to ease the future addition, though. This boils down to:

length = NSH_BASE_HDR_LEN;
/* Assume MD type 1. This function cannot be called for anything
 * else currently. When MD type 2 is added, the line below will
 * have to be turned into a switch on flow_key->nsh.base.mdtype.
 */
length += sizeof(struct ovs_nsh_key_md1);
err = skb_ensure_writable(skb, skb_network_offset(skb) + length);
...
flow_key->nsh.base.path_hdr = nh->path_hdr;
/* Only MD type 1, see the comment above. */
for (i = 0; i < NSH_MD1_CONTEXT_SIZE; i++) {
...

Please verify I'm not missing something.

It seems we also rely on the user space checking first whether the
packet is indeed compatible with the pushed key/mask. Most importantly,
that it's of the same mdtype and (in the future) that the MD type 2
TLVs being written actually fit. Seems this is done the same way in the
other set_* actions, thus fine with me.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v11] openvswitch: enable NSH support

2017-10-02 Thread Jiri Benc
On Fri, 29 Sep 2017 15:03:30 +0800, Yi Yang wrote:
> --- a/include/net/nsh.h
> +++ b/include/net/nsh.h
> @@ -304,4 +304,7 @@ static inline void nsh_set_flags_ttl_len(struct nshhdr 
> *nsh, u8 flags,
>   NSH_FLAGS_MASK | NSH_TTL_MASK | NSH_LEN_MASK);
>  }
>  
> +int skb_push_nsh(struct sk_buff *skb, const struct nshhdr *nh);

[...]
> +int skb_push_nsh(struct sk_buff *skb, const struct nshhdr *src_nsh_hdr)

This is minor but since this patch will need a respin anyway, please
name the variables in the forward declaration and here the same.

> +int skb_pop_nsh(struct sk_buff *skb)
> +{
> + int err;
> + struct nshhdr *nsh_hdr = (struct nshhdr *)(skb->data);

Bad name of the variable, clashes with the nsh_hdr function. I pointed
that out already.

> + size_t length;
> + __be16 inner_proto;
> +
> + err = skb_ensure_writable(skb, skb_network_offset(skb) +
> +sizeof(struct nshhdr));

You assume that the skb starts at the NSH header, thus the
skb_network_offset is completely unnecessary and introduces just
another assumption on the caller. Also, the sizeof(struct nshhdr) is
wrong: there's no guarantee that the header is not smaller or larger
than that.

More importantly though, why do you need skb_ensure_writable? You don't
write into the header. pkskb_may_pull is enough.

if (!pskb_may_pull(skb, NSH_BASE_HDR_LEN))
return -ENOMEM;
length = nsh_hdr_len(nsh_hdr);
if (!pskb_may_pull(skb, length))
return -ENOMEM;

> +static int set_nsh(struct sk_buff *skb, struct sw_flow_key *flow_key,
> +const struct nlattr *a)
> +{
> + struct nshhdr *nh;
> + int err;
> + u8 flags;
> + u8 ttl;
> + int i;
> +
> + struct ovs_key_nsh key;
> + struct ovs_key_nsh mask;
> +
> + err = nsh_key_from_nlattr(a, , );
> + if (err)
> + return err;
> +
> + err = skb_ensure_writable(skb, skb_network_offset(skb) +
> +sizeof(struct nshhdr));

I missed this before: this is wrong, too. You need to use the real
header size, not sizeof(struct nshhdr). It should be computable from
the flow key.

> + case OVS_ACTION_ATTR_PUSH_NSH: {
> + u8 buffer[NSH_HDR_MAX_LEN];
> + struct nshhdr *nh = (struct nshhdr *)buffer;
> +
> + nsh_hdr_from_nlattr(nla_data(a), nh,
> + NSH_HDR_MAX_LEN);
> + err = push_nsh(skb, key, (const struct nshhdr *)nh);

Is the cast to const really needed? It looks suspicious. If you added it
because a compiler complained, it's even more suspicious.

> +static int parse_nsh(struct sk_buff *skb, struct sw_flow_key *key)
> +{
> + struct nshhdr *nh;
> + unsigned int nh_ofs = skb_network_offset(skb);
> + u8 version, length;
> + int err;
> +
> + err = check_header(skb, nh_ofs + NSH_BASE_HDR_LEN);
> + if (unlikely(err))
> + return err;
> +
> + nh = nsh_hdr(skb);
> + version = nsh_get_ver(nh);
> + length = nsh_hdr_len(nh);
> +
> + if (version != 0)
> + return -EINVAL;
> +
> + err = check_header(skb, nh_ofs + length);
> + if (unlikely(err))
> + return err;
> +
> + nh = (struct nshhdr *)skb_network_header(skb);

I really really really hate this. This is the third time I'm telling
you to use the nsh_hdr function. Every time, you change only part of
the places. And this one I even explicitly pointed out in the previous
review.

I'm not supposed to look at my previous review to verify that you
addressed everything. That's your responsibility. Yet I need to do it
because every time, some of my comments remain unaddressed.

> +int nsh_hdr_from_nlattr(const struct nlattr *attr,
> + struct nshhdr *nh, size_t size)
> +{
> + struct nlattr *a;
> + int rem;
> + u8 flags = 0;
> + u8 ttl = 0;
> + int mdlen = 0;
> +
> + /* validate_nsh has check this, so we needn't do duplicate check here
> +  */
> + nla_for_each_nested(a, attr, rem) {
> + int type = nla_type(a);
> +
> + switch (type) {
> + case OVS_NSH_KEY_ATTR_BASE: {
> + const struct ovs_nsh_key_base *base = nla_data(a);
> +
> + flags = base->flags;
> + ttl = base->ttl;
> + nh->np = base->np;
> + nh->mdtype = base->mdtype;
> + nh->path_hdr = base->path_hdr;
> + break;
> + }
> + case OVS_NSH_KEY_ATTR_MD1: {
> + const struct ovs_nsh_key_md1 *md1 = nla_data(a);
> +
> + mdlen = nla_len(a);
> + memcpy(>md1, md1, mdlen);
> + break;

Looks better. Why not simplify it even more?

case OVS_NSH_KEY_ATTR_MD1:

Re: [ovs-dev] [PATCH net-next v10] openvswitch: enable NSH support

2017-09-26 Thread Jiri Benc
On Tue, 26 Sep 2017 21:52:41 +0800, Yang, Yi wrote:
> > +   return ((ret != 0) ? false : true);
> 
> But I don't think this is a problematic line from my understanding,

Why not:

return ((ret != 0 == true) ? false : true) == true;

?

Sigh. This is equal to:

return !ret;

which you should use.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v10] openvswitch: enable NSH support

2017-09-26 Thread Jiri Benc
On Tue, 26 Sep 2017 12:47:16 +0800, Yi Yang wrote:
> + return ((ret != 0) ? false : true);

I'm not going to review this version but this caught my eye - I pointed
out this silly construct in my review of v9. I can understand that
working late in the night and rewriting the code back and forth, one
could end up with such construct and overlook it at the final
self-review before submission. Happens to everyone.

But leaving this after a review pointed it out means you're not paying
enough attention to your work. Even the fact that you sent v10 so
shortly after v9 means you did not spend the needed time on reflecting
on the review and that you did not properly test the new version. And
I told you exactly this before.

Honestly, I'm starting to be tired with reviewing your patch again and
again and pointing out silly mistakes like this one and repeating basic
things to you again and again.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v10] openvswitch: enable NSH support

2017-09-26 Thread Jiri Benc
On Tue, 26 Sep 2017 12:47:16 +0800, Yi Yang wrote:
> v9->v10
>  - Change struct ovs_key_nsh to
>struct ovs_nsh_key_base base;
>__be32 context[NSH_MD1_CONTEXT_SIZE];
>  - Fix new comments for v9

NAK, we haven't finished the discussion for v9 yet. It's not
appropriate to send a new version until there's a conclusion (or at
least until the discussion dies).

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v9] openvswitch: enable NSH support

2017-09-26 Thread Jiri Benc
On Tue, 26 Sep 2017 12:55:39 +0800, Yang, Yi wrote:
> After push_nsh, the packet won't be recirculated to flow pipeline, so
> key->eth.type must be set explicitly here, but for pop_nsh, the packet
> will be recirculated to flow pipeline, it will be reparsed, so
> key->eth.type will be set in packet parse function, we needn't handle it
> in pop_nsh.

This seems to be a very different approach than what we currently have.
Looking at the code, the requirement after "destructive" actions such
as pushing or popping headers is to recirculate.

Setting key->eth.type to satisfy conditions in the output path without
updating the rest of the key looks very hacky and fragile to me. There
might be other conditions and dependencies that are not obvious.
I don't think the code was written with such code path in mind.

I'd like to hear what Pravin thinks about this.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v9] openvswitch: enable NSH support

2017-09-25 Thread Jiri Benc
On Mon, 25 Sep 2017 22:16:09 +0800, Yi Yang wrote:
> + skb->protocol = htons(ETH_P_NSH);
> + skb_reset_mac_header(skb);
> + skb_reset_mac_len(skb);
> + skb_reset_network_header(skb);

The last two lines are swapped. Network header needs to be reset before
mac_len.

> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(skb_push_nsh);
> +
> +int skb_pop_nsh(struct sk_buff *skb)
> +{
> + struct nshhdr *nsh_hdr = (struct nshhdr *)(skb->data);
> + size_t length;
> + u16 inner_proto;

__be16 inner_proto.

> +static int push_nsh(struct sk_buff *skb, struct sw_flow_key *key,
> + const struct nshhdr *src_nsh_hdr)
> +{
> + int err;
> +
> + err = skb_push_nsh(skb, src_nsh_hdr);
> + if (err)
> + return err;
> +
> + key->eth.type = htons(ETH_P_NSH);

I wonder why you have this assignment here. The key is invalidated,
thus nothing should rely on key->eth.type. However, looking at the code
and ovs_fragment in particular, I'm not sure that's the case. Could you
please explain why it is needed? And why the reverse of it is not
needed in pop_nsh?

> +
> + /* safe right before invalidate_flow_key */
> + key->mac_proto = MAC_PROTO_NONE;
> + invalidate_flow_key(key);
> + return 0;
> +}
[...]
> +static int set_nsh(struct sk_buff *skb, struct sw_flow_key *flow_key,
> +const struct nlattr *a)
> +{
> + struct nshhdr *nsh_hdr;
> + int err;
> + u8 flags;
> + u8 ttl;
> + int i;
> +
> + struct ovs_key_nsh key;
> + struct ovs_key_nsh mask;
> +
> + err = nsh_key_from_nlattr(a, , );
> + if (err)
> + return err;
> +
> + err = skb_ensure_writable(skb, skb_network_offset(skb) +
> +   sizeof(struct nshhdr));

Whitespace nit: the sizeof should be aligned to skb_network_offset.

> + if (unlikely(err))
> + return err;
> +
> + nsh_hdr = (struct nshhdr *)skb_network_header(skb);

Please use the nsh_hdr function (I'm sure I already requested that in
the previous review?). It means renaming the nsh_hdr variable.

> @@ -1210,6 +1307,21 @@ static int do_execute_actions(struct datapath *dp, 
> struct sk_buff *skb,
>   case OVS_ACTION_ATTR_POP_ETH:
>   err = pop_eth(skb, key);
>   break;
> +
> + case OVS_ACTION_ATTR_PUSH_NSH: {
> + u8 buffer[NSH_HDR_MAX_LEN];
> + struct nshhdr *nsh_hdr = (struct nshhdr *)buffer;
> + const struct nshhdr *src_nsh_hdr = nsh_hdr;
> +
> + nsh_hdr_from_nlattr(nla_data(a), nsh_hdr,
> + NSH_HDR_MAX_LEN);
> + err = push_nsh(skb, key, src_nsh_hdr);

Is the src_nsh_hdr variable really necessary? Cannot be nsh_hdr passed
directly to push_nsh, relying on automatic retyping to const?

By the way, nsh_hdr is a poor name for a variable, it clashes with the
nsh_hdr function. For clarity, please rename the variables in the whole
patch to something else.

> +static int parse_nsh(struct sk_buff *skb, struct sw_flow_key *key)
> +{
> + struct nshhdr *nsh_hdr;
> + unsigned int nh_ofs = skb_network_offset(skb);
> + u8 version, length;
> + int err;
> +
> + err = check_header(skb, nh_ofs + NSH_BASE_HDR_LEN);
> + if (unlikely(err))
> + return err;
> +
> + nsh_hdr = (struct nshhdr *)skb_network_header(skb);

Again, rename the variable and use the nsh_hdr function.

> + version = nsh_get_ver(nsh_hdr);
> + length = nsh_hdr_len(nsh_hdr);
> +
> + if (version != 0)
> + return -EINVAL;
> +
> + err = check_header(skb, nh_ofs + length);
> + if (unlikely(err))
> + return err;
> +
> + nsh_hdr = (struct nshhdr *)skb_network_header(skb);

Ditto.

> +struct ovs_key_nsh {
> + u8 flags;
> + u8 ttl;
> + u8 mdtype;
> + u8 np;
> + __be32 path_hdr;
> + __be32 context[NSH_MD1_CONTEXT_SIZE];
> +};

This should be:

struct ovs_key_nsh {
struct ovs_nsh_key_base base;
__be32 context[NSH_MD1_CONTEXT_SIZE];
};

to capture the real dependency. Implicitly depending on ovs_key_nsh
starting with the same fields as ovs_nsh_key_base will only lead to bugs
in the future.

> +int nsh_hdr_from_nlattr(const struct nlattr *attr,
> + struct nshhdr *nsh, size_t size)
> +{
> + struct nlattr *a;
> + int rem;
> + u8 flags = 0;
> + u8 ttl = 0;
> + int mdlen = 0;
> +
> + /* validate_nsh has check this, so we needn't do duplicate check here
> +  */
> + nla_for_each_nested(a, attr, rem) {
> + int type = nla_type(a);
> +
> + switch (type) {
> + case OVS_NSH_KEY_ATTR_BASE: {
> + const struct ovs_nsh_key_base *base =
> + (struct ovs_nsh_key_base *)nla_data(a);

It's not necessary to retype void * explicitly. Just assign it.

> + 

Re: [ovs-dev] [PATCH net-next v9] openvswitch: enable NSH support

2017-09-14 Thread Jiri Benc
On Thu, 14 Sep 2017 16:37:59 +0800, Yi Yang wrote:
> OVS master and 2.8 branch has merged NSH userspace
> patch series, this patch is to enable NSH support
> in kernel data path in order that OVS can support
> NSH in compat mode by porting this.

http://vger.kernel.org/~davem/net-next.html
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-05 Thread Jiri Benc
On Mon, 4 Sep 2017 14:45:50 +, Jan Scheurich wrote:
> So what is the correct layout for MASKED_SET action with nested fields?
> 1. All nested values, followed by all nested masks, or
> 2. For each nested field value followed by mask?
> 
> I guess alternative 1, but just to be sure.

It's 2. Alternative 1 breaks netlink assumptions, is ugly to implement
and probably impossible to be properly validated.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-05 Thread Jiri Benc
On Tue, 5 Sep 2017 14:37:05 +0800, Yang, Yi wrote:
> I checked source code but can't find where we can prepopulate it and how
> we can deliver the prepopulated header to push_nsh, can you expand it in
> order that I can know how to do this in code level?

I already said that it's not implemented yet and can be done as a
future performance enhancement.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-05 Thread Jiri Benc
On Tue, 5 Sep 2017 13:51:45 +0800, Yang, Yi wrote:
> But if we follow your way, how does nla_for_each_nested handle such
> pattern?
> 
> attribute1
> attribute1_mask
> attribute2
> attribute2_mask
> attribute3
> attribute3_mask

Uh? That will just work. Note that nla len contains the size of the
whole attribute, i.e. includes both fields.

> I don't think this can increase performance and readability.

Do you realize you're stating that not copying data around in hot path
can't increase performance? ;-)

> if we use one function to handle both attributes and masks, I can't
> see any substantial difference between two ways as far as the
> performance is concerned.

Except that what you did is so unexpected to netlink that you had to go
out of your way to parse it. Those two memcpys speak for themselves.

> If we consider OVS_KEY_ATTR_NSH as a big attribute, current way is
> precisely following current design pattern

Do you have an idea how nested netlink attributes work? It's very well
defined. What you're doing is breaking the definition. You can't do
that.

The (non-nested) attributes contain header followed by data. The data
is a single value or it is a struct. In ovs for masked actions,
attributes contain a struct of two fields (value and mask). The struct
access is open coded but it's still a struct.

Now, nested attributes are very well defined: it's a nla header
followed by a stream of attributes. There's no requirement on the order
of the attributes. Do you see how you're breaking this assumption? You
expect that each attribute is present exactly twice, once among first
half of attributes, once among second half of attributes. You don't
check that this weird assumption is adhered to. You don't even check
that the point where you split the data is between attributes! You may
very well be splitting an attribute in half and interpreting its data
as nla headers.

Consider your proposal NACKed from my side.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-04 Thread Jiri Benc
On Mon, 4 Sep 2017 14:07:45 +, Jan Scheurich wrote:
> Then perhaps I misunderstood your comment. I thought you didn't like that the 
> SET_MASKED action wrapped OVS_KEY_ATTR_NSH which in itself was nested.
> I was aiming to avoid this by lifting the two components of the NSH header 
> components to the top level:
> OVS_NSH_ATTR_BASE_HEADER --> OVS_KEY_ATTR_NSH_BASE_HEADER
> OVS_NSH_ATTR_MD1_CONTEXT --> OVS_KEY_ATTR_NSH_MD1_CONTEXT

No, this should be a nested attr.

I objected to the way value+mask combo is handled.

> See above. The two would be separate values in the same enum, i.e. distinct.

This is not how netlink attrs should be designed.

> Not sure I have the full picture here. Are you saying that the tc tool uses 
> the same
> Netlink API to the kernel as OVS? What would be the back-end for tc? Also the
> openvswitch kernel module? 

It does not use the same API but it makes sense for these two to share
common code. Plus hw offloading in ovs is done using tc.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-04 Thread Jiri Benc
On Mon, 4 Sep 2017 12:57:15 +, Jan Scheurich wrote:
> So is what you are suggesting the following?
> 
> For matching:
> OVS_KEY_ATTR_NSH_BASE_HEADER  mandatory
> OVS_KEY_ATTR_NSH_MD1_CONTEXT  optional, in case MDTYPE == MD1

This needs to be:

OVS_KEY_ATTR_NSH
OVS_KEY_ATTR_NSH_BASE_HEADER
OVS_KEY_ATTR_NSH_MD1_CONTEXT

> For setting:
> OVS_ACTION_ATTR_SET_MASKED
> -OVS_KEY_ATTR_NSH_BASE_HEADER when setting any base header 
> fields
> OVS_ACTION_ATTR_SET_MASKED
> -OVS_KEY_ATTR_NSH_MD1_CONTEXT when setting any MD1 context data fields

This needs to be:

OVS_ACTION_ATTR_SET_MASKED
OVS_KEY_ATTR_NSH
OVS_KEY_ATTR_NSH_BASE_HEADER
OVS_KEY_ATTR_NSH_MD1_CONTEXT

In your example, there's no way to distinguish OVS_KEY_ATTR_ENCAP and
OVS_KEY_ATTR_NSH_BASE_HEADER, they are both "1".

> Should we consider to stick to that simple and efficient approach? As far 
> As I can see it will be generic for the foreseeable future.

I'm not sure. From the kernel point of view, we want the same
functionality offered by the openvswitch module and by a tc action.

In theory, it can be the tc tool that constructs the header from the
command line but there's no precedent. Dunno.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-04 Thread Jiri Benc
On Wed, 30 Aug 2017 20:39:12 +0800, Yi Yang wrote:
> +enum ovs_nsh_key_attr {
> + OVS_NSH_KEY_ATTR_BASE,  /* struct ovs_nsh_key_base. */
> + OVS_NSH_KEY_ATTR_MD1,   /* struct ovs_nsh_key_md1. */
> + OVS_NSH_KEY_ATTR_MD2,   /* variable-length octets for MD type 2. */
> + __OVS_NSH_KEY_ATTR_MAX
> +};

One more thing: 0 is reserved, the first attr must be
OVS_NSH_KEY_ATTR_UNSPEC.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-04 Thread Jiri Benc
On Mon, 4 Sep 2017 20:09:07 +0800, Yang, Yi wrote:
> So we must do many changes if we want to break this assumption.

We may do as many changes as we want to. This is uAPI we're talking
about and we need to get it right since the beginning. Sure, it may
mean that some user space programs need some changes in order to make
use of the new features. That happens every day.

I also don't understand where's the problem. It's very easy to check
for NLA_F_NESTED and generically act based on that in the function you
quoted. Just call a different function than format_odp_key_attr to
handle ovs_nsh_key_attr attributes in case the nested flag is set and
the attribute is OVS_KEY_ATTR_NSH and you're done. You'll need such
function anyway, it's not much different code size wise to call it from
format_odp_key_attr or from format_odp_action.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-04 Thread Jiri Benc
On Mon, 4 Sep 2017 16:00:05 +0800, Yang, Yi wrote:
> I think we have had similiar discussion about this, the issue is we have
> no way to handle both MD type 1 and MD type 2 by using a common flow key 
> struct.
> 
> So we have to do so, there is miniflow to handle such issue in
> userspace, but kernel data path didn't provide such mechanism. I think
> every newly-added key will result in the same space-wasting issue for
> kernel data path, isn't it?

Yes. And it would be surprising if it didn't have an effect on
performance.

I don't care that much as this is a result of openvswitch module design
decisions but it still would be good to reach a consensus before
implementing uAPI that might not be needed in the end.

> OVS_ATTR_KEY_NSH is the first case to use nested attributes for set
> action, so no reference code for this, set action has two use cases, one
> has mask but the other hasn't, so it will be easier an nested attribute is
> handled as a whole, in user space, nsh_key_from_nlattr is used in
> several places.

I very much disagree. What you're doing is very poor design as can be
seen from the code where you have to do weird gymnastics to implement
that. Compare that to a much simple case where each attribute carries
its own value and mask.

> If we change it per your proposal, there will be a lot
> of rework,

That's not an argument. We care about doing things right, we don't do
things hastily and with as little effort as possible.

> we have to provide two functions to handle this, one is for
> the case with mask, the other is for the case without mask.

I don't see that. The function is called from a single place only.

> How can we do so? I see batch process code in user space implementation,
> but kernel data path didn't such infrastructure,

You're right that there's no infrastructure. I somewhat agree that it
might be out of scope of this patch and it can be optimized later.
That's why I also included other suggestions to speed this up until we
implement better parsing: namely, verify correctness once (at the time
it is set from user space) and just expect things to be correct in the
fast path.

> how can we know next push_nsh uses the same nsh header as previous
> one?

We store the prepopulated header together with the action.

> > Shouldn't we reject the packet, then? Pretending that something was parsed
> > correctly while in fact it was not, does not look correct.
> 
> For MD type 2, we don't implement metadata parsing, but it can still
> work. I'm not sure what you mean here, how do we reject a packet here?

Okay, we can match on mdtype and thus can find out about this type of
packets. No problem, then.

> > These checks should be done only once when the action is configured, not for
> > each packet.
> 
> I don't know how to implement a batch processing in kernel data path, it
> seems there isn't such code for reference.

The checks should be done somewhere in __ovs_nla_copy_action. Which
seems to be done even in your patch by nsh_key_put_from_nlattr
(I didn't get that far during the review last week. The names of
those nsh_(key|hdr)_*_nlattr are so similar that it's impossible to
tell what they do without looking at the call sites - something you
should also consider to improve). That makes it even more wrong: you
appear to check everything twice, first on configuration and then for
every packet.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-08-31 Thread Jiri Benc
On Wed, 30 Aug 2017 20:39:12 +0800, Yi Yang wrote:
> --- a/net/nsh/nsh.c
> +++ b/net/nsh/nsh.c
> @@ -14,6 +14,47 @@
>  #include 
>  #include 
>  
> +int skb_push_nsh(struct sk_buff *skb, const struct nshhdr *nsh_src, bool 
> is_eth)
> +{
> + struct nshhdr *nsh;
> + size_t length = nsh_hdr_len(nsh_src);
> + u8 next_proto;
> +
> + if (is_eth) {
> + next_proto = TUN_P_ETHERNET;

I don't like the is_eth parameter. It should be clear from the skb being
passed to the function, specifically from skb->mac_len.

> + } else {
> + next_proto = tun_p_from_eth_p(skb->protocol);
> + if (!next_proto)
> + return -ENOTSUPP;

EAFNOSUPPORT is more suitable here.

> + }
> +
> + /* Add the NSH header */
> + if (skb_cow_head(skb, length) < 0)
> + return -ENOMEM;
> +
> + /* Add the NSH header */
> + if (skb_cow_head(skb, length) < 0)
> + return -ENOMEM;

Duplicate statement.

> +
> + if (!skb->inner_protocol) {
> + skb_set_inner_network_header(skb, skb->mac_len);
> + skb_set_inner_protocol(skb, skb->protocol);

It doesn't make sense to set either of those. Nothing uses the fields for
NSH.

> + }
> +
> + skb_push(skb, length);
> + nsh = (struct nshhdr *)(skb->data);

Use nsh_hdr.

> + memcpy(nsh, nsh_src, length);
> + nsh->np = next_proto;
> + nsh->mdtype &= NSH_MDTYPE_MASK;

The "& NSH_MDTYPE_MASK" operation does not make sense. Just trust the caller
to provide a meaningful value. It's its job to verify the parameters.

> + skb->protocol = htons(ETH_P_NSH);
> + skb_reset_mac_header(skb);
> + skb_reset_mac_len(skb);

You have to reset also the network header.

> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(skb_push_nsh);
> +
>  static struct sk_buff *nsh_gso_segment(struct sk_buff *skb,
>  netdev_features_t features)
>  {
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index a54a556..e969fad 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -38,11 +38,13 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "datapath.h"
>  #include "flow.h"
>  #include "conntrack.h"
>  #include "vport.h"
> +#include "flow_netlink.h"
>  
>  struct deferred_action {
>   struct sk_buff *skb;
> @@ -380,6 +382,57 @@ static int push_eth(struct sk_buff *skb, struct 
> sw_flow_key *key,
>   return 0;
>  }
>  
> +static int push_nsh(struct sk_buff *skb, struct sw_flow_key *key,
> + const struct nshhdr *nsh_src)
> +{
> + bool is_eth = false;
> + int ret;
> +
> + if (key->mac_proto == MAC_PROTO_ETHERNET)
> + is_eth = true;
> +
> + ret = skb_push_nsh(skb, nsh_src, is_eth);
> + if (ret != 0)

The usual idiom is "if (ret)". And for consistency with the rest of the
file, the name of the variable should be "err".

> + return ret;
> +
> + key->eth.type = htons(ETH_P_NSH);
> +
> + /* safe right before invalidate_flow_key */
> + key->mac_proto = MAC_PROTO_NONE;
> + invalidate_flow_key(key);
> + return 0;
> +}
> +
> +static int pop_nsh(struct sk_buff *skb, struct sw_flow_key *key)
> +{
> + struct nshhdr *nsh = (struct nshhdr *)(skb->data);
> + size_t length;
> + u16 inner_proto;
> +
> + if (ovs_key_mac_proto(key) != MAC_PROTO_NONE ||
> + skb->protocol != htons(ETH_P_NSH)) {
> + return -EINVAL;
> + }
> +
> + inner_proto = tun_p_to_eth_p(nsh->np);
> + if (!inner_proto)
> + return -ENOTSUPP;
> +
> + length = nsh_hdr_len(nsh);
> + skb_pull(skb, length);
> + skb_reset_mac_header(skb);
> + skb_reset_mac_len(skb);
> + skb->protocol = inner_proto;

Please factor this out to skb_pop_nsh, similarly to what you did with
skb_push_nsh.

> +
> + /* safe right before invalidate_flow_key */
> + if (inner_proto == htons(ETH_P_TEB))
> + key->mac_proto = MAC_PROTO_ETHERNET;
> + else
> + key->mac_proto = MAC_PROTO_NONE;
> + invalidate_flow_key(key);
> + return 0;
> +}
> +
>  static void update_ip_l4_checksum(struct sk_buff *skb, struct iphdr *nh,
> __be32 addr, __be32 new_addr)
>  {
> @@ -602,6 +655,53 @@ static int set_ipv6(struct sk_buff *skb, struct 
> sw_flow_key *flow_key,
>   return 0;
>  }
>  
> +static int set_nsh(struct sk_buff *skb, struct sw_flow_key *flow_key,
> +const struct ovs_key_nsh *key,
> +const struct ovs_key_nsh *mask)
> +{
> + struct nshhdr *nsh;
> + int err;
> + u8 flags;
> + u8 ttl;
> + int i;
> +
> + err = skb_ensure_writable(skb, skb_network_offset(skb) +
> +   sizeof(struct nshhdr));
> + if (unlikely(err))
> + return err;
> +
> + nsh = (struct nshhdr *)skb_network_header(skb);

nsh_hdr

> +
> + flags = nsh_get_flags(nsh);
> 

Re: [ovs-dev] [PATCH] NEWS: Mark NSH support as experimental in 2.8.

2017-08-29 Thread Jiri Benc
On Tue, 29 Aug 2017 10:50:11 -0700, Ben Pfaff wrote:
> This feature landed late in 2.8 and the NSH wire protocol itself is not
> completely stable.

Acked-by: Jiri Benc <jb...@redhat.com>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH

2017-08-25 Thread Jiri Benc
On Fri, 25 Aug 2017 18:25:14 +0200, Jiri Benc wrote:
> While looking at this, I realized that GSO for VXLAN-GPE is broken,
> too. Let me fix it by implementing what I described above which will
> make your patch much easier.

Okay, it's not really broken and we don't need that complexity. At
least not immediately. Hw offloading in the VXLAN-GPE case probably does
not work correctly and would benefit from that change but that's a
different beast to tackle at a different time. Software segmentation
works fine for VXLAN-GPE.

There should not be much problems with NSH segmentation, either, if we
carefully store and set mac_header, mac_len and skb->protocol around
calls to skb_mac_gso_segment. Note that with zero mac_len (and correct
skb->protocol), skb_mac_gso_segment behaves in the same way that you
tried to achieve with find_gso_segment_by_type, which is thus completely
unnecessary.

More on Monday.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH

2017-08-25 Thread Jiri Benc
On Fri, 25 Aug 2017 22:20:04 +0800, Yi Yang wrote:
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -766,7 +766,7 @@ struct sk_buff {
>   __u8ndisc_nodetype:2;
>  #endif
>   __u8ipvs_property:1;
> - __u8inner_protocol_type:1;
> + __u8inner_protocol_type:2;

Adding anything to sk_buff is pretty much forbidden. You can't add more
bytes to it and there are no more free bits to use, either.

Luckily, we still have one byte hole next to inner_ipproto that we can
use. What is needed is renaming of ENCAP_TYPE_IPPROTO to ENCAP_TYPE_L3
and storing the L3 type in the unused byte. It's not beautiful (would
be better to use ethertype than a custom enum) but it will work.

While looking at this, I realized that GSO for VXLAN-GPE is broken,
too. Let me fix it by implementing what I described above which will
make your patch much easier.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v6 1/3] net: add NSH header structures and helpers

2017-08-25 Thread Jiri Benc
On Fri, 25 Aug 2017 22:20:03 +0800, Yi Yang wrote:
> --- a/include/uapi/linux/if_ether.h
> +++ b/include/uapi/linux/if_ether.h
> @@ -138,6 +138,7 @@
>  #define ETH_P_IEEE802154 0x00F6  /* IEEE802.15.4 frame   
> */
>  #define ETH_P_CAIF   0x00F7  /* ST-Ericsson CAIF protocol*/
>  #define ETH_P_XDSA   0x00F8  /* Multiplexed DSA protocol */
> +#define ETH_P_NSH0x894F  /* Ethertype for NSH. */

This is in a wrong section. It belongs to Ethernet protocols a few lines
higher in the file (after ETH_P_HSR).

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v4] openvswitch: enable NSH support

2017-08-23 Thread Jiri Benc
On Tue, 22 Aug 2017 17:38:26 +0800, Yang, Yi wrote:
> I checked GSO support code, we have two cases to handle, one case is to
> output NSH packet to VxLAN-gpe port, the other case is to output NSH packet
> to Ethernet port (tap, veth, vhost, physical eth, etc.), I don't think
> it is a good way to do GSO support in OVS module, because we can't know
> the final output port at this point, if the final output port is
> VxLAN-gpe port, it will still face GSO issue, but I didn't see vxlan
> module handles this, maybe udp tunnel did this. I think a NSH packet can
> be split into two packets, one has NSH header, another one doesn't,
> so I'm not sure how vxlan module can avoid this situation.

As I said before, by either implementing GSO support for NSH or by
segmenting in software before pushing the NSH header. In the first
case, the packet will be software segmented before being pushed to the
hardware, in the second case, it will be software segmented before
pushing the NSH header.

> For the second case, we can add a nsh GSO module in net/nsh/nsh_gso.c
> (as MPLS did in net/mpls/mpls_gso.c), that is the best way to handle
> this, what do you think about it?

Yes, that's one of the two options that we've been talking about. It's
the better and the more efficient one.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v4] openvswitch: enable NSH support

2017-08-21 Thread Jiri Benc
On Mon, 21 Aug 2017 10:10:38 +, Jan Scheurich wrote:
> If I understand correctly, this is a default definition that can be
> overridden by drivers so that we still cannot rely on the Ethernet
> payload always being 32-bit-aligned. 

Not by drivers, by architectures. Only architectures that can
efficiently handle unaligned access (or on which the cost of handling
unaligned access is lower than the cost of unaligned DMA access) set
NET_IP_ALIGN to 0.

> Furthermore, there seem to be machine architectures that cannot
> handle misaligned 32-bit access at all (not even with a performance
> penalty).

Those leave NET_IP_ALIGN to 2.

> Or why else does OVS user space code take so great pain to model
> possible misalignment and provide/use safe access functions?

I don't know how the ovs user space deals with packet allocation. In
the kernel, the network header is aligned in a way that it allows
efficient 32bit access.

> Does Linux kernel code generally ignore this risk?

Given the fact that IPv4 addresses are 32bit, are accessed as such and
one can't say that IPv4 implementation on Linux is non-functional, the
answer is obvious :-)

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v4] openvswitch: enable NSH support

2017-08-21 Thread Jiri Benc
On Mon, 21 Aug 2017 17:15:42 +0800, Yang, Yi wrote:
> The issue is it is used union in
> 
> struct nsh_hdr {
> ovs_be16 ver_flags_ttl_len;
> uint8_t md_type;
> uint8_t next_proto;
> ovs_16aligned_be32 path_hdr;
> union {
> struct nsh_md1_ctx md1;
> struct nsh_md2_tlv md2;
> };
> };

This should work (modulo the non-kernel type names, of course). Did you
mean to put [] after md2?

> in Linux kernel build, it complained it, I changed it to

What was the error message?

> struct nsh_hdr {
> ovs_be16 ver_flags_ttl_len;
> uint8_t md_type;
> uint8_t next_proto;
> ovs_16aligned_be32 path_hdr;
> union {
> struct nsh_md1_ctx md1;
> struct nsh_md2_tlv md2[0];
> };
> };

I wouldn't use this. First, zero length array is a GCC extension. It
would indeed be better not to use that in uAPI. Second, I wouldn't even
use a flexible array member here, see my reply to Jan for the reasons.

Note that I commented on struct nsh_md2_tlv having __u8[] as the last
member which IMHO makes good sense. I'm not entirely sure what C99 says
about flexible array member being part of a struct inside union inside
a struct, though. GCC seems to cope with that just fine but AFAIK it
has some extension over the C standard wrt. flexible array members.

> I don't know how we can support this, is it a must-have thing?

What would happen if you get a GSO packet? Ports of an ovs bridge claim
GSO support, thus they may get a GSO packet. You have to handle it one
way or the other: either software segment the packet before pushing the
header, or implement proper GSO support for NSH.

> But struct nsh_hdr had different struct from struct ovs_key_nsh, we
> have no way to make them completely same, do you mean we should use the
> same name if they are same fields and represent the same thing?

Yes.

Thanks,

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v4] openvswitch: enable NSH support

2017-08-18 Thread Jiri Benc
On Fri, 18 Aug 2017 15:26:01 +0200, Jiri Benc wrote:
> Out of time for today, will continue the review next week. Again, feel
> free to send a new version meanwhile or wait for the rest of the
> review, whatever works better for you.

One more thing: before you send a new version, be sure and double check
that you addressed *all* of the feedback. In this version, you still
have not addressed a few things I gave feedback on in the previous
version. This wastes everyone's time.

It doesn't mean I'm always right, of course, but you either explain why
I'm wrong or change the code. It's generally not acceptable to ignore
feedback.

Thank you,

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v4] openvswitch: enable NSH support

2017-08-18 Thread Jiri Benc
On Fri, 18 Aug 2017 15:24:31 +0800, Yi Yang wrote:
> +struct nsh_md2_tlv {
> + __be16 md_class;
> + u8 type;
> + u8 length;
> + /* Followed by variable-length data. */
> +};

What was wrong with the u8[] field that was present at the end of the
struct in the previous version of the patch?

> +#define NSH_M_TYPE2_MAX_LEN 256

This is defined twice, please delete this define and keep the one lower
in the file.

> +#define NSH_DST_PORT4790 /* UDP Port for NSH on VXLAN. */

This is a VXLAN-GPE port, it has nothing to do with NSH (except that
VXLAN-GPE can contain a NSH packet). It's also unused. Please remove it.

> +/* NSH Metadata Length. */
> +#define NSH_M_TYPE1_MDLEN 16

This is unused and it seems it's not much useful anyway,
sizeof(struct nsh_md1_ctx) provides the same value. Please remove this
define.

> +#define NSH_MD1_CTX(nsh_hdr_ptr) (&(nsh_hdr_ptr)->md1)
> +
> +#define NSH_MD2_CTX(nsh_hdr_ptr) (&(nsh_hdr_ptr)->md2)

Please remove these two. They are unused and would just obscure things
anyway.

> +static inline struct nsh_md1_ctx *nsh_md1_ctx(struct nsh_hdr *nsh)
> +{
> + return >md1;
> +}
> +
> +static inline struct nsh_md2_tlv *nsh_md2_ctx(struct nsh_hdr *nsh)
> +{
> + return >md2;
> +}

And remove these too, for the same reason. Just use nsh->md1 when you
need the metadata, there's no reason for these helper functions. They
just obscure things.

> +static inline void nsh_set_flags_and_ttl(struct nsh_hdr *nsh, u8 flags, u8 
> ttl)
> +{
> + nsh->ver_flags_ttl_len
> + = htons((ntohs(nsh->ver_flags_ttl_len)
> + & ~(NSH_FLAGS_MASK | NSH_TTL_MASK))
> + | ((flags << NSH_FLAGS_SHIFT) & NSH_FLAGS_MASK)
> + | ((ttl << NSH_TTL_SHIFT) & NSH_TTL_MASK));
> +}
> +
> +static inline void nsh_set_flags_ttl_len(struct nsh_hdr *nsh, u8 flags,
> +  u8 ttl, u8 len)
> +{
> + nsh->ver_flags_ttl_len
> + = htons((ntohs(nsh->ver_flags_ttl_len)
> + & ~(NSH_FLAGS_MASK | NSH_TTL_MASK | NSH_LEN_MASK))
> + | ((flags << NSH_FLAGS_SHIFT) & NSH_FLAGS_MASK)
> + | ((ttl << NSH_TTL_SHIFT) & NSH_TTL_MASK)
> + | ((len << NSH_LEN_SHIFT) & NSH_LEN_MASK));
> +}

Okay. Could those two perhaps use a common function?

static inline void __nsh_set_flags(struct nsh_hdr *nsh, u16 value, u16 mask)
{
nsh->ver_flags_ttl_len = nsh->ver_flags_ttl_len & ~htons(mask)
| htons(value);
}

static inline void nsh_set_flags_and_ttl(struct nsh_hdr *nsh, u8 flags, u8 ttl)
{
__nsh_set_flags(nsh, flags << NSH_FLAGS_SHIFT | ttl << NSH_TTL_SHIFT,
NSH_FLAGS_MASK | NSH_TTL_MASK);
}

etc.

> +static int push_nsh(struct sk_buff *skb, struct sw_flow_key *key,
> + const struct nsh_hdr *nsh_src)
> +{
[...]
> + if (!skb->inner_protocol)
> + skb_set_inner_protocol(skb, skb->protocol);

I was wondering about this during the reviews of the previous versions.
Now I've given this more thought but I still don't see it - why is the
inner_protocol set here?

> + case OVS_KEY_ATTR_NSH: {
> + struct ovs_key_nsh nsh;
> + struct ovs_key_nsh nsh_mask;
> + size_t size = nla_len(a) / 2;
> + struct nlattr attr[1 + size / sizeof(struct nlattr) + 1];
> + struct nlattr mask[1 + size / sizeof(struct nlattr) + 1];
> +
> + attr->nla_type = nla_type(a);
> + mask->nla_type = attr->nla_type;
> + attr->nla_len = NLA_HDRLEN + size;
> + mask->nla_len = attr->nla_len;
> + memcpy(attr + 1, (char *)(a + 1), size);
> + memcpy(mask + 1, (char *)(a + 1) + size, size);

No, please. See my reply to the previous version for how to do this in
a less hacky way.

> + case OVS_ACTION_ATTR_PUSH_NSH: {
> + u8 buffer[256];
> + struct nsh_hdr *nsh_hdr = (struct nsh_hdr *)buffer;
> + const struct nsh_hdr *nsh_src = nsh_hdr;
> +
> + nsh_hdr_from_nlattr(nla_data(a), nsh_hdr);

This is very dangerous security wise. You have to protect against
buffer overflow, one way or other. The current code may not overflow
(I have not checked that, though) but a future addition may break the
assumption without being obvious it's a problem.

Note that the previous version had exactly the same problem but it was
hidden and I didn't notice it. Which means that getting rid of that
push_nsh_para struct was a very good thing, the code is more clean and
more obvious now.

> +static int parse_nsh(struct sk_buff *skb, struct sw_flow_key *key)
> +{
> + struct nsh_hdr *nsh = (struct nsh_hdr *)skb_network_header(skb);
> + u8 version, length;
> + int err;
> +
> + err = check_header(skb, NSH_BASE_HDR_LEN);
> + if (unlikely(err))
> +   

Re: [ovs-dev] [PATCH v3] openvswitch: enable NSH support

2017-08-16 Thread Jiri Benc
On Wed, 16 Aug 2017 17:31:30 +0800, Yang, Yi wrote:
> On Wed, Aug 16, 2017 at 11:19:21AM +0200, Jiri Benc wrote:
> > > --- a/include/uapi/linux/openvswitch.h
> > > +++ b/include/uapi/linux/openvswitch.h
> > [...]
> > > +#define NSH_MD1_CONTEXT_SIZE 4
> > 
> > Please move this to nsh.h and use it there, too, instead of the open
> > coded 4.
> 
> ovs code is very ugly, it will convert array[4] in
> datapath/linux/compat/include/linux/openvswitch.h to other struct, I
> have to change context[4] to such format :-), we can use 4 here for
> Linux kernel.

Oh, right, this is uAPI and nsh.h is kernel internal. My suggestion was
nonsense, let's keep it as it was in your patch.

> > > + case OVS_KEY_ATTR_NSH: {
> > > + struct ovs_key_nsh nsh;
> > > + struct ovs_key_nsh nsh_mask;
> > > + size_t size = nla_len(a) / 2;
> > > + struct nlattr attr[1 + DIV_ROUND_UP(sizeof(struct ovs_key_ipv6)
> > > + , sizeof(struct nlattr))];
> > > + struct nlattr mask[1 + DIV_ROUND_UP(sizeof(struct ovs_key_ipv6)
> > > + , sizeof(struct nlattr))];
> > > +
> > > + attr->nla_type = nla_type(a);
> > > + mask->nla_type = attr->nla_type;
> > > + attr->nla_len = NLA_HDRLEN + size;
> > > + mask->nla_len = attr->nla_len;
> > > + memcpy(attr + 1, (char *)(a + 1), size);
> > > + memcpy(mask + 1, (char *)(a + 1) + size, size);
> > 
> > This is too hacky. Please find a better way to handle this.
> > 
> > One option is to create a struct with struct nlattr as the first member
> > followed by a char buffer. Still not nice but at least it's clear
> > what's the intent.
> 
> The issue is nested attributes only can use this way, nested attribute
> for SET_MASKED is very special, we have to handle it specially.

I'm not sure you understood what I meant. Let me explain in code:

struct {
struct nlattr attr;
struct ovs_key_ipv6 data;
} attr, mask;

attr->attr.nla_type = nla_type(a);
attr->attr.nl_len = NLA_HDRLEN + size;
memcpy(>data, a + 1, size);

It's much less hacky and doing the same thing.

I'm not sure we don't need verification of size not overflowing the
available buffer. Is it checked beforehand elsewhere?

I also spotted one more bug: the 'mask' variable is not used anywhere.
The second call of nsh_key_from_nlattr should use mask, not attr.

> > > + key->nsh.path_hdr = nsh->path_hdr;
> > > + switch (key->nsh.mdtype) {
> > > + case NSH_M_TYPE1:
> > > + if ((length << 2) != NSH_M_TYPE1_LEN)
> > 
> > Why length << 2?
> 
> len in NSH header is number of 4 octets, so need to multiply 4.

Should nsh_get_len take care of that, then?

Thanks,

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support

2017-08-14 Thread Jiri Benc
On Mon, 14 Aug 2017 10:35:42 +, Jan Scheurich wrote:
> Is it worth to speculate on how a hypothetical future NSH version
> (with a different Version value in the Base header) might look like?

Absolutely. This is uAPI we're talking about and once merged, it's set
in stone. Whatever we come up with should allow future extensibility.

> If this should ever arise, we could introduce a new push_nsh_v2
> action.

Which would mean we failed with the design. We would have to maintain
two parallel APIs forever. This is not how the design should be made.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support

2017-08-14 Thread Jiri Benc
On Sun, 13 Aug 2017 21:13:57 +, Jan Scheurich wrote:
> Jiri, I am not too familiar with conventions on the OVS netlink
> interface regarding the handling of variable length fields. What is
> the benefit of structuring the push_nsh action into
> 
> OVS_ACTION_ATTR_PUSH_NSH
> +-- OVS_ACTION_ATTR_NSH_BASE_HEADER
> +-- OVS_ACTION_ATTR_NSH_MD
> 
> instead of grouping the base header fields and the variable length MD
> into a single monolithic attribute OVS_ACTION_ATTR_PUSH_NSH? Is the
> main concern a potential future need to add additional parameters to
> the push_nsh datapath action? Are there examples for such structured
> actions other than OVS_ACTION_ATTR_CT where the need is obvious
> because it is very polymorphic?

This is about having the design clean. What would you do if you had two
variable length fields? Would you still put them in a single structure
and have a length field in the structure, too? That would be wrong, we
have length in the attribute header. I doubt you would do that. Which
indicates that putting variable length fields to an attribute with
anything other is wrong.

Also, look at how ugly the code would be. You'd have to subtract the
base header length from the attribute length to get the variable data
length. That's not nice at all.

Think about the netlink in the way that by default, every field should
have its own attribute. Structures are included only for performance
reasons where certain fields are always passed in a message and having
them in separate attributes would be impractical and waste of space.
Going out of your way to include the context data in the structure thus
doesn't make sense.

> BTW:  The name OVS_ACTION_ATTR_NSH_BASE_HEADER is misleading because
> in the NSH draft the term base header is used for the first 32-bit
> word, whereas here it includes also the 32-bit Service Path header.

An excellent comment. This means that it's very well possible that
future NSH versions may not include SP header or may have it of a
different size. Maybe we should consider putting it into a separate
attribute, too? Not sure it is needed, though, I don't think it's
likely to happen.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support

2017-08-11 Thread Jiri Benc
On Fri, 11 Aug 2017 10:09:36 +, Jan Scheurich wrote:
> Unless someone can explain to me why the datapath should understand the
> internal structure/format of metadata in push_nsh, I would strongly
> vote to keep the metadata as variable length octet sequence in the
> non-structured OVS_ACTION_ATTR_PUSH_NSH

Could be but it still needs to be in a different attribute and not in
the ovs_action_push_nsh structure.

Separate attributes for MD1/MD2 has the advantage of easier validation:
with a separate MD1 type attribute, the size check is easier. With an
unstructured MD attribute, we'd need to look into the
OVS_ACTION_ATTR_NSH_BASE_HEADER attribute for mdtype and then validate
the unstructured MD attribute size manually. Not a big deal, though.
I don't have strong opinion here.

But I do have strong opinion that MD needs to go into a separate
attribute, whether there are separate attributes for MD1/2 or not.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support

2017-08-11 Thread Jiri Benc
On Fri, 11 Aug 2017 09:24:25 +, Yang, Yi Y wrote:
> So far, we're not clear how we can support MD type 2 better, as I
> explained before, we need to reuse tun_metadata in struct flow_tnl
> which is the thing Geneve is using. Geneve predefined 64 keys for
> this from tun_metadata0 to tun_metadata63, we will reuse it for MD
> type 2. But you know NSH is not tunnel, so it has to be changed to
> support both Geneve and NSH. Anyway, they won't be part of
> ovs_key_nsh.

Please do not top post.

The context field does not apply to MD type 2. It looks wrong for the
context field to be included in netlink attribute for anything other
than MD type 1. Perhaps it needs to be put into a separate attribute,
too?

Note that I'm talking only about the uAPI. Internally, ovs can use
struct ovs_key_nsh that is MD type 1 only, there's no problem changing
that later. But for the user space interface, this needs to be clean.
This can be solved for example this way:

In include/uapi/linux/openvswitch.h:

struct ovs_key_nsh_base {
__u8 flags;
__u8 mdtype;
__u8 np;
__u8 pad;
__be32 path_hdr;
};

+ one more netlink attribute carrying MD type 1 info. Will probably
require to change OVS_KEY_ATTR_NSH to a nested attribute etc.

In net/openvswitch/flow.h (or perhaps a different header would be more
appropriate?):

struct ovs_key_nsh {
struct ovs_key_nsh_base base;
__be32 context[4];
};

Plus needed conversions between OVS_KEY_ATTR_NSH and struct ovs_key_nsh
when interfacing between the kernel and user space.

That way, we can have MD type 1 support only for now while still being
allowed to redesign things in whatever way later.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support

2017-08-11 Thread Jiri Benc
On Fri, 11 Aug 2017 16:47:23 +0800, Yang, Yi wrote:
> is "__be32 context[4]" ok?

Yes, that looks better.

> So define three new netlink attributes
> 
> OVS_ACTION_ATTR_NSH_BASE_HEADER
> OVS_ACTION_ATTR_NSH_MD1_DATA
> OVS_ACTION_ATTR_NSH_MD2_DATA
> 
> OVS_ACTION_ATTR_PUSH_NSH is nested netlink attribute, it will nest
> OVS_ACTION_ATTR_NSH_BASE_HEADER and OVS_ACTION_ATTR_NSH_MD1_DATA for MD
> type 1, it will nest OVS_ACTION_ATTR_NSH_BASE_HEADER and
> OVS_ACTION_ATTR_NSH_MD2_DATA for MD type 2. I'll compeletely remove struct
> ovs_action_push_nsh, is it ok?

Yes, that's the way to do it.

What should be done with struct ovs_key_nsh? Even with "c" renamed to
"context", it's still MD type 1 only structure. What is the plan for
MD type 2 support wrt. this structure?

Thanks,

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support

2017-08-11 Thread Jiri Benc
On Thu, 10 Aug 2017 21:21:15 +0800, Yi Yang wrote:
> OVS master and 2.8 branch has merged NSH userspace
> patch series, this patch is to enable NSH support
> in kernel data path in order that OVS can support
> NSH in 2.8 release in compat mode by porting this.

Please include changelog when posting a new version of a patch.

> +static inline u16
> +nsh_hdr_len(const struct nsh_hdr *nsh)

Single line, please. And all other instances of this in nsh.h, too.

> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -333,6 +333,7 @@ enum ovs_key_attr {
>   OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */
>   OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct ovs_key_ct_tuple_ipv4 */
>   OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct ovs_key_ct_tuple_ipv6 */
> + OVS_KEY_ATTR_NSH,  /* struct ovs_key_nsh */
>  
>  #ifdef __KERNEL__
>   OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
> @@ -491,6 +492,15 @@ struct ovs_key_ct_tuple_ipv6 {
>   __u8   ipv6_proto;
>  };
>  
> +struct ovs_key_nsh {
> + __u8 flags;
> + __u8 mdtype;
> + __u8 np;
> + __u8 pad;
> + __be32 path_hdr;
> + __be32 c[4];

I still don't like the "c" name. Please change it to something
descriptive. This is uAPI and can't be changed later.

And I still don't see my comment about this not being extensible for
MD type 2 addressed. Please understand this is uAPI and it is set in
stone once it is merged into the kernel. It's very important we get
this right since the beginning.

> +struct ovs_action_push_nsh {
> + __u8 flags;
> + __u8 mdtype;
> + __u8 mdlen;
> + __u8 np;
> + __be32 path_hdr;
> + __u8 metadata[];
> +};

This is not how netlink attributes work. Please reread Ben Pfaff's
explanation on how this needs to be structured (Message-ID:
<20170809180912.gu6...@ovn.org>) and rework the patch. I 100% agree
with what he wrote, his proposal is very clean and matches how netlink
is designed.

> @@ -835,6 +866,8 @@ enum ovs_action_attr {
>   OVS_ACTION_ATTR_TRUNC,/* u32 struct ovs_action_trunc. */
>   OVS_ACTION_ATTR_PUSH_ETH, /* struct ovs_action_push_eth. */
>   OVS_ACTION_ATTR_POP_ETH,  /* No argument. */
> + OVS_ACTION_ATTR_PUSH_NSH, /* struct ovs_action_push_nsh. */
> + OVS_ACTION_ATTR_POP_NSH,  /* No argument. */

Thank you for changing this to push/pop, it looks much cleaner now.

> +static int parse_nsh(struct sk_buff *skb, struct sw_flow_key *key)
> +{
> + struct nsh_hdr *nsh = (struct nsh_hdr *)skb_network_header(skb);
> + u16 ver_flags_len;
> + u8 version, length;
> + u32 path_hdr;
> + int i;
> +
> + memset(>nsh, 0, sizeof(struct ovs_key_nsh));
> + ver_flags_len = ntohs(nsh->ver_flags_len);
> + version = (ver_flags_len & NSH_VER_MASK) >> NSH_VER_SHIFT;
> + length = (ver_flags_len & NSH_LEN_MASK) >> NSH_LEN_SHIFT;

A nit: the operation to get/set version, length and flags from the NSH
header seems to be repeated enough to warrant helper functions in
include/net/nsh.h. Something like:

static inline u8 nsh_get_version(const struct nsh_hdr *nsh)
{
return (ntohs(nsh->ver_flags_len) & NSH_VER_MASK) >> NSH_VER_SHIFT;
}

etc.

Not a blocker, though, it may be done later if needed.

> @@ -76,9 +77,11 @@ static bool actions_may_change_flow(const struct nlattr 
> *actions)
>  
>   case OVS_ACTION_ATTR_CT:
>   case OVS_ACTION_ATTR_HASH:
> + case OVS_ACTION_ATTR_POP_NSH:
>   case OVS_ACTION_ATTR_POP_ETH:
>   case OVS_ACTION_ATTR_POP_MPLS:
>   case OVS_ACTION_ATTR_POP_VLAN:
> + case OVS_ACTION_ATTR_PUSH_NSH:
>   case OVS_ACTION_ATTR_PUSH_ETH:
>   case OVS_ACTION_ATTR_PUSH_MPLS:
>   case OVS_ACTION_ATTR_PUSH_VLAN:

Alphabetical order, please.

Thanks,

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next] openvswitch: add NSH support

2017-08-08 Thread Jiri Benc
On Tue,  8 Aug 2017 12:59:40 +0800, Yi Yang wrote:
> +struct ovs_key_nsh {
> + __u8 flags;
> + __u8 mdtype;
> + __u8 np;
> + __u8 pad;
> + __be32 path_hdr;
> + __be32 c[4];

"c" is a very poor name. Please rename it to something that expresses
what this field contains.

Also, this looks like MD type 1 only. How are those fields going to work
with MD type 2? I don't think MD type 2 implementation is necessary in
this patch but I'd like to know how this is going to work - it's uAPI
and thus set in stone once this is merged. The uAPI needs to be
designed with future use in mind.

> +#define OVS_ENCAP_NSH_MAX_MD_LEN 16
> +/*
> + * struct ovs_action_encap_nsh - %OVS_ACTION_ATTR_ENCAP_NSH
> + * @flags: NSH header flags.
> + * @mdtype: NSH metadata type.
> + * @mdlen: Length of NSH metadata in bytes.
> + * @np: NSH next_protocol: Inner packet type.
> + * @path_hdr: NSH service path id and service index.
> + * @metadata: NSH metadata for MD type 1 or 2
> + */
> +struct ovs_action_encap_nsh {
> + __u8 flags;
> + __u8 mdtype;
> + __u8 mdlen;
> + __u8 np;
> + __be32 path_hdr;
> + __u8 metadata[OVS_ENCAP_NSH_MAX_MD_LEN];

This is wrong. The metadata size is set to a fixed size by this and
cannot be ever extended, or at least not easily. Netlink has attributes
for exactly these cases and that's what needs to be used here.

> @@ -835,6 +866,8 @@ enum ovs_action_attr {
>   OVS_ACTION_ATTR_TRUNC,/* u32 struct ovs_action_trunc. */
>   OVS_ACTION_ATTR_PUSH_ETH, /* struct ovs_action_push_eth. */
>   OVS_ACTION_ATTR_POP_ETH,  /* No argument. */
> + OVS_ACTION_ATTR_ENCAP_NSH,/* struct ovs_action_encap_nsh. */
> + OVS_ACTION_ATTR_DECAP_NSH,/* No argument. */

Use "push" and "pop", not "encap" and "decap" to be consistent with the
naming in the rest of the file. We use encap and decap for tunnel
operations. This code does not use lwtunnels, thus push and pop is more
appropriate.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v3 1/6] vxlan: refactor verification and application of configuration

2017-06-23 Thread Jiri Benc
This patchset looks good overall (would send my Acked-by for most of
this but I'm late).

On Mon, 19 Jun 2017 10:03:55 +0200, Matthias Schiffer wrote:
> Log messages in these
> functions are removed, as it is generally unexpected to find error output
> for netlink requests in the kernel log. Userspace should be able to handle
> errors based on the error codes returned via netlink just fine.

However, this is not really true. It's impossible to find out what went
wrong when you use e.g. iproute2 to configure a vxlan link.

We really need to convert the kernel log messages to the extended
netlink errors. Since you removed them prematurely, could you please
work on that?

Thanks,

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 2/7] userspace: Support for push_eth and pop_eth actions

2017-05-04 Thread Jiri Benc
On Thu, 4 May 2017 12:53:25 -0700, Joe Stringer wrote:
> On 4 May 2017 at 12:51, Joe Stringer  wrote:
> > Really? Previously the OVS usage of netlink always validated that
> > netlink attributes are exactly as long as expected and that there's no
> > trailing data.
> 
> With a glance at __ovs_nla_copy_actions(), this is also true for
> OVS_ACTION_ATTR_PUSH_ETH.

Right, sorry. I completely forgot about the special netlink handling of
ovs.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 2/7] userspace: Support for push_eth and pop_eth actions

2017-05-04 Thread Jiri Benc
On Thu, 4 May 2017 10:02:22 +, Zoltán Balogh wrote:
> I'm afraid the kernel would not accept that OVS_ACTION_ATTR_PUSH_ETH
> action in this case.
> 
> Jiri, could you confirm, please?

It would accept it and it would ignore the eth_type field. The kernel
does not verify whether netlink attributes are longer than they should
be. It just accepts such attributes and silently ignores the excessive
data.

I would discourage doing this, though. The data structures between the
kernel and the user space should match.

(As a side note, we're currently having a discussion that this
treatment of excessively long attributes should be changed. Of course
this would require the user space asking for such new behavior by
setsockopt to keep compatibility with existing programs.)

> We could remove the eth_type from struct ovs_action_push_eth, then
> put an additional "set action" after putting "push_eth" in the
> odp_put_push_eth_action() function in order to set the ethertype of
> the packet.

This seems weird, though. Under which circumstances does ovs not know
what ethertype the packet has? The kernel knows about it (via
skb->protocol) all the time, with or without the Ethernet header. No
flow matching could work otherwise.

> That way we would split the push_eth action into a simpler push_eth
> and a set_field. However this would lead to modify
> odp_execute_set_action()as well, since changing of ethertype with
> set_field is not allowed:

Disallowing changing of ethertype is a sensible thing to do.

> I guess something similar should be done at kernel side too, since
> the kernel module would not accept set_field ethertype either.

It does not need it. The kernel knows what the ethertype is.

> Another option would be to convert OVS_ACTION_ATTR_PUSH_ETH action
> argument between userspace and kernel.
> 
> Do you have any other proposal?
> 
> Btw, the original comment message of struct ovs_action_push_eth
> indicates eth_type as a second member.

Thanks, this is a (documentation) bug. A leftover from one of the
previous versions that nobody noticed. I'll remove it upstream once
net-next reopens.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Sync on PTAP, EXT-382 and NSH

2017-01-24 Thread Jiri Benc
On Wed, 18 Jan 2017 09:53:18 +, Jan Scheurich wrote:
> Please be invited to the next sync meeting.

Sorry, won't make the meeting today. Too busy with DevConf.cz
preparations.

> Actions Points:
> AP-1 (Jarno): Coordinate review of Yi's backported net-next patches
> AP-2 (Jiri) Check the ability of the kernel datapath to match on the
> presence of Ethernet header w/o matching on Ethernet addresses.
> Formating of such DP flows?

The OVS_KEY_ATTR_ETHERNET netlink attribute contains source and
destination MAC address. It can be masked.

If the OVS_KEY_ATTR_ETHERNET attribute is present, only Ethernet frames
will match. The MAC addresses may be masked (wildcarded). The
OVS_KEY_ATTR_ETHERTYPE attribute indicates the ethertype to match and
cannot be masked (it's always an exact match). If it's not present,
ETH_P_802_2 is assumed.

If the OVS_KEY_ATTR_ETHERNET in not present, the OVS_KEY_ATTR_ETHERTYPE
attribute is mandatory and indicates the ethertype to match. Again, it
cannot be masked.

To answer the question, it's currently possible to match on Ethernet
header without matching on Ethernet addresses but not without matching
on ethertype. I think it can be relaxed with some work if needed.

> AP-3 (Jiri) Provide an update on the status and ETA of the RTNETLINK API 
> patches

RFC patchset posted last week:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327749.html

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH] datapath: allow tunnels to be created with rtnetlink

2017-01-10 Thread Jiri Benc
On Tue, 10 Jan 2017 15:44:07 +0100, Jiri Benc wrote:
> create an ovs_geneve interface using rtnetlink

This can be done just once. The pseudocode at startup thus would be:

create an ovs_geneve interface using rtnetlink
if successful {
delete the created interface
set out_of_tree flag
}

And interface creation:

if not out_of_tree {
create lwtunnel geneve interface using rtnetlink
check parameters of the created interface
if it is lwtunnel {
done, exit
} else {
delete the created interface
}
}
create geneve interface using genetlink
if successful {
done
} else {
fail
}

Is that what you want? If so, should the out_of_tree flag be queried
and set per tunnel type, or just based globally on ovs_geneve (or a
different kind)?

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Sync on PTAP, EXT-382 and NSH

2017-01-09 Thread Jiri Benc
On Mon, 9 Jan 2017 11:15:22 +, Yang, Yi Y wrote:
> We need to let users have one NSH version available before your
> proposal is implemented. I support your proposal, but I have no way
> to do anything helpful before your implementation for generic
> encap/decap and packet_type are available.

Implementing something that we know will be obsolete in a relatively
short time does not make much sense. It's painful but the focus here is
long term maintainability, not short time gain.

> I don't know what other guys are thinking about this, it seems only
> you and I are caring this :)

The truth is very few people care about NSH in ovs. There is a lot of
people who *claim* they care and that they need it implemented ASAP but
when it comes to doing real work, it's only you, Ericsson and partially
Red Hat. Which, in my opinion, pretty much reflects the urgency. If
people did really care, they would invest into this. Hopefully this
will change now with Jan taking lead in this.

I like Jan's proposal.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Sync on PTAP, EXT-382 and NSH

2016-12-19 Thread Jiri Benc
Hi Jan,

On Sun, 18 Dec 2016 14:44:03 +, Jan Scheurich wrote:
> I would like to call for a final sync meeting before the Christmas break.

you managed to hit almost the only slot in this week when I have a
conflict with another meeting. I'm not available on Tuesday starting at
the time you chose.

Moving the meeting to an earlier time slot or to a different day
(except for today) would work for me.

Thanks,

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 17/27] OVS: remove assumptions about VLAN_TAG_PRESENT bit

2016-12-13 Thread Jiri Benc
On Tue, 13 Dec 2016 01:12:38 +0100 (CET), Michał Mirosław wrote:
> @@ -850,20 +848,11 @@ static int validate_vlan_from_nlattrs(const struct 
> sw_flow_match *match,
>   return -EINVAL;
>   }
>  
> - if (a[OVS_KEY_ATTR_VLAN])
> - tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);
> -
> - if (!(tci & htons(VLAN_TAG_PRESENT))) {
> - if (tci) {
> - OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT 
> bit set.",
> -   (inner) ? "C-VLAN" : "VLAN");
> - return -EINVAL;
> - } else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
> - /* Corner case for truncated VLAN header. */
> - OVS_NLERR(log, "Truncated %s header has non-zero encap 
> attribute.",
> -   (inner) ? "C-VLAN" : "VLAN");
> - return -EINVAL;
> - }
> + if (!a[OVS_KEY_ATTR_VLAN] && nla_len(a[OVS_KEY_ATTR_ENCAP])) {
> + /* Corner case for truncated VLAN header. */
> + OVS_NLERR(log, "Truncated %s header has non-zero encap 
> attribute.",
> + (inner) ? "C-VLAN" : "VLAN");
> + return -EINVAL;

This looks wrong. The zero value of nla_get_be16(a[OVS_KEY_ATTR_VLAN])
together with empty a[OVS_KEY_ATTR_ENCAP] means truncated VLAN header
and this needs to be preserved.

In other words, you need to check also !nla_get_be16(a[OVS_KEY_ATTR_VLAN])
here.


Overall, this whole patch looks very suspicious from the very
beginning. The xors used are strong indication that something is not
right.

And indeed, you're breaking uAPI compatibility. Previously, the
VLAG_TAG_PRESENT bit set in OVS_KEY_ATTR_VLAN caused the flow to match
on packets with VLAN tag present. After this patch, it causes the flow
to match only on those packets that have a certain value of
VLAN_CFI_MASK in their VLAN tag (I haven't bother deciphering what
value is checked after all these xors, it's as well possible that it
will also match on packets _without_ any VLAN header).

You have to preserve the old meaning of VLAN_TAG_PRESENT in
OVS_KEY_ATTR_VLAN. When doing flow lookups, VLAN_TAG_PRESENT must match
on and only on packets that have VLAN tag present, irrespective of their
VLAN_CFI_MASK.

If you want to introduce support for lookups on VLAN_CFI_MASK, you'll
have to do that by introducing a new netlink attribute.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH] datapath: allow tunnels to be created with rtnetlink

2016-12-09 Thread Jiri Benc
On Thu, 8 Dec 2016 22:49:56 -0800, Pravin Shelar wrote:
> OVS out of tree kernel module is using compat tunnel code upto kernel
> 4.5 kernel even thought LWT is available in these kernels. This is due
> to missing features on these kernel which are backported to OVS
> module. In future we could bump up requirements of kernel again.
> Therefore I think we could add compat code for GPE given it is not
> that complicated.

What I'm concerned about is not so much the code in the out of tree
module but the added code that would have to try genetlink for
VXLAN-GPE. I hoped to use rtnetlink only for this; adding genetlink
would required quite a bit of code (also because it will have to be
tried only for the out of tree module but never for the in kernel one;
this differs from what we'll do for VXLAN itself).

We'd end up with three different handling for different features:

(1) genetlink+rtnetlink for both out of tree and in tree module (for
e.g. plain VXLAN)
(2) genetlink+rtnetlink for out of tree module, rtnetlink for in three
module (for VXLAN-GPE)
(3) rtnetlink for both out of tree and in tree module (new VXLAN
features added in the future)

I hoped to avoid this. Especially given that (2) covers only very
limited time period.

But if you insist...

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next] openvswitch: fix VxLAN-gpe port can't be created in ovs compat mode

2016-12-08 Thread Jiri Benc
On Thu, 8 Dec 2016 18:57:51 +0800, Yang, Yi wrote:
> So ovs out of tree modules need to adapt to upstream kernel, any
> kernel-related changes must be accepted by Linux kernel at first.

I'm perfectly aware of that and I'm saying that your patch is
unacceptable for upstream kernel. This is a long standing policy of the
kernel: there's no way you can get a patch into the kernel to
accommodate an out of tree kernel module. The policy is there for good
reasons and as paradoxical as it may sound, it benefits the projects
that employ out of tree modules in the long run.

If Open vSwitch wants to carry a non-upstream patch, it's its choice
and we can have that discussion but that's not something to discuss on
netdev@vger nor propose for net-next.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH] datapath: allow tunnels to be created with rtnetlink

2016-12-08 Thread Jiri Benc
On Wed, 7 Dec 2016 16:35:59 -0800, Pravin Shelar wrote:
> In compat mode, OVS tunnel devices are not used in same way as LWT,
> since OVS do support kernel version that does not have core LWT
> support. Therefore we have to use legacy vport APIs to create these
> ports.

I see. Yes, that's unfortunate.

> There might be a way to configure the device, once it is
> created, using rtnetlink API but would complicate the code. So I think
> in such cases like GPE we could to add code to the legacy code.

Could we just support the newest shiniest features only with lwtunnel
capable kernel? Kernel 4.3 is out for more than a year already, that's
a long time. And several more months will pass before this is available
in an Open vSwitch release.

What about:
- always preferring the out of tree module (whatever capabilities it
  has)
- first try rtnetlink
- if it fails, try genetlink
- if it fails (but the out of tree module is there), just don't
  bother with kernel
- then try the in kernel module
- first rtnetlink
- if it fails then genetlink

This way, we would accommodate most of the stuff. With old kernels,
VXLAN-GPE wouldn't be available even with the out of tree module (but
I think that's it, I can't think of any other feature unavailable at
this point; of course, more may be added in the future).

Is this really that bad? It's (relatively) simply to implement, the out
of tree module does not diverge from the in kernel one too much, and I
don't think the requirement for lwtunnels capable kernel for the newest
features is unreasonable.

Thanks,

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next] openvswitch: fix VxLAN-gpe port can't be created in ovs compat mode

2016-12-08 Thread Jiri Benc
On Thu,  8 Dec 2016 16:20:10 +0800, Yi Yang wrote:
> In ovs compat mode, ovs won't use LWT in current kernel, this is to
> make sure ovs can work on the old kernels, Linux kernel v4.7 includes
> VxLAN-gpe support but many Linux distributions' kernels are odler than
> v4.7, this fix will ensure that ovs can create VxLAN-gpe port correctly
> on old kernels, it has been verified on Ubuntu 16.04 x86_64 with Linux
> kernel 4.4.0-53-generic.

NAK. We do have a way to configure this and that's rtnetlink. Open
vSwitch should use that to configure tunnels. Out of tree modules are
on their own. Upstream kernel does not accommodate out of tree modules.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH] datapath: allow tunnels to be created with rtnetlink

2016-12-07 Thread Jiri Benc
On Wed, 7 Dec 2016 09:03:39 +0800, Yang, Yi wrote:
> But ovs userspace use genetlink to create vxlan tunnel port when we
> build ovs to use its own compat modules, this is case 3 Pravin, how
> do you think we can handle this?

Where's the problem? I don't see any.

> Now the dilemma is we can't create vxlangpe port although it has been
> in ovs, I know it is ok if we use upstream tunnel code, but in many
> cases, Linux distribution didn't include recent kernels, we have to
> use ovs' compat modules, Pravin's comments also addressed this.

There's no problem in using VXLAN-GPE in the out of tree driver using
rtnetlink. As Eric wrote, we're working on this.

> So anybody will change current genetlink in ovs to rtnetlink? I will
> try to change it if nobody will fix it. Another issue, I think ovs
> will send rtnetlink message to kernel, right? Can ovs handle
> rtnetlink message if we use ovs compat modules instead of upstream
> kernel modules?

I don't see any problem.

We know how to handle all of the cases. Eric took over the work from
Thadeu and he has been working on the patches. Nobody prevents you from
submitting your own patches; but please do your own research in such
case first. By asking tons of question you're not speeding things up,
as we'll be spending time explaining instead of writing and testing the
code.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH] datapath: allow tunnels to be created with rtnetlink

2016-12-06 Thread Jiri Benc
On Tue, 6 Dec 2016 07:17:20 +, Yang, Yi Y wrote:
> So my advice about this is we can push patch [2] to Linux net-next
> first, then apply patch series
> https://mail.openvswitch.org/pipermail/ovs-dev/2016-June/316879.html
> from Cascardo and apply [1], that can cover all the cases Pravin
> mentioned.

There's no reason to add this to the genetlink interface when we're
going to switch to rtnetlink for tunnel configuration.

NACKed-by: Jiri Benc <jb...@redhat.com>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Packet type-aware pipeline, L3 tunneling, EXT-382 and NSH: Minutes call 2016-11-29

2016-12-01 Thread Jiri Benc
On Wed, 30 Nov 2016 22:52:38 +, Jan Scheurich wrote:
> The corresponding section in the document is already updated with the
> results of the discussion. So please have look. There's also a short
> summary of the main results at the end of the mail.

Thanks for writing that down. I see several shortcomings in the
proposal and several things make zero sense to me as proposed (probably
EXT-382 fault). I commented on that in the document.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v12 rebased 1/3] userspace: add support for pop_eth and push_eth actions

2016-11-29 Thread Jiri Benc
On Mon, 28 Nov 2016 16:56:27 -0800, Ben Pfaff wrote:
> I know that this isn't intended to be applied yet, according to the
> cover letter, but it doesn't build:
> 
> ../lib/odp-util.c:5504:15: error: no member named 'base_layer' in 'struct 
> flow'
> ../lib/odp-util.c:5504:40: error: no member named 'base_layer' in 'struct 
> flow'
> ../lib/odp-util.c:5505:19: error: no member named 'base_layer' in 'struct 
> flow'
> ../lib/odp-util.c:5505:33: error: use of undeclared identifier 'LAYER_2'
> ../lib/odp-util.c:5512:20: error: no member named 'base_layer' in 'struct 
> flow'
> ../lib/odp-util.c:5512:40: error: no member named 'base_layer' in 'struct 
> flow'
> 
> and furthermore I can't any mention of base_layer in the OVS history so
> I'm not sure what's going on here.

It's in patch 2, seems that the order is mixed up. Sorry for that.
Please apply all three patches together for testing.

We'll be taking a different approach with this patchset, implementing
EXT-382 instead.

Thanks,

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v13 4/8] openvswitch: support MPLS push and pop for L3 packets

2016-11-10 Thread Jiri Benc
Update Ethernet header only if there is one.

Signed-off-by: Jiri Benc <jb...@redhat.com>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
---
 net/openvswitch/actions.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index dc8bb97e2258..064cbcb7b0c5 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -187,7 +187,8 @@ static int push_mpls(struct sk_buff *skb, struct 
sw_flow_key *key,
 
skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
 
-   update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
+   if (ovs_key_mac_proto(key) == MAC_PROTO_ETHERNET)
+   update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
skb->protocol = mpls->mpls_ethertype;
 
invalidate_flow_key(key);
@@ -197,7 +198,6 @@ static int push_mpls(struct sk_buff *skb, struct 
sw_flow_key *key,
 static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
const __be16 ethertype)
 {
-   struct ethhdr *hdr;
int err;
 
err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN);
@@ -213,11 +213,15 @@ static int pop_mpls(struct sk_buff *skb, struct 
sw_flow_key *key,
skb_reset_mac_header(skb);
skb_set_network_header(skb, skb->mac_len);
 
-   /* mpls_hdr() is used to locate the ethertype field correctly in the
-* presence of VLAN tags.
-*/
-   hdr = (struct ethhdr *)((void *)mpls_hdr(skb) - ETH_HLEN);
-   update_ethertype(skb, hdr, ethertype);
+   if (ovs_key_mac_proto(key) == MAC_PROTO_ETHERNET) {
+   struct ethhdr *hdr;
+
+   /* mpls_hdr() is used to locate the ethertype field correctly 
in the
+* presence of VLAN tags.
+*/
+   hdr = (struct ethhdr *)((void *)mpls_hdr(skb) - ETH_HLEN);
+   update_ethertype(skb, hdr, ethertype);
+   }
if (eth_p_mpls(skb->protocol))
skb->protocol = ethertype;
 
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v13 3/8] openvswitch: pass mac_proto to ovs_vport_send

2016-11-10 Thread Jiri Benc
We'll need it to alter packets sent to ARPHRD_NONE interfaces.

Change do_output() to use the actual L2 header size of the packet when
deciding on the minimum cutlen. The assumption here is that what matters is
not the output interface hard_header_len but rather the L2 header of the
particular packet. For example, ARPHRD_NONE tunnels that encapsulate
Ethernet should get at least the Ethernet header.

Signed-off-by: Jiri Benc <jb...@redhat.com>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
---
 net/openvswitch/actions.c | 29 +
 net/openvswitch/vport.c   |  2 +-
 net/openvswitch/vport.h   |  2 +-
 3 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 44144f914920..dc8bb97e2258 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -66,6 +66,7 @@ struct ovs_frag_data {
u16 vlan_tci;
__be16 vlan_proto;
unsigned int l2_len;
+   u8 mac_proto;
u8 l2_data[MAX_L2_LEN];
 };
 
@@ -673,7 +674,7 @@ static int ovs_vport_output(struct net *net, struct sock 
*sk, struct sk_buff *sk
skb_reset_mac_len(skb);
}
 
-   ovs_vport_send(vport, skb);
+   ovs_vport_send(vport, skb, data->mac_proto);
return 0;
 }
 
@@ -692,7 +693,7 @@ static int ovs_vport_output(struct net *net, struct sock 
*sk, struct sk_buff *sk
  * ovs_vport_output(), which is called once per fragmented packet.
  */
 static void prepare_frag(struct vport *vport, struct sk_buff *skb,
-u16 orig_network_offset)
+u16 orig_network_offset, u8 mac_proto)
 {
unsigned int hlen = skb_network_offset(skb);
struct ovs_frag_data *data;
@@ -705,6 +706,7 @@ static void prepare_frag(struct vport *vport, struct 
sk_buff *skb,
data->network_offset = orig_network_offset;
data->vlan_tci = skb->vlan_tci;
data->vlan_proto = skb->vlan_proto;
+   data->mac_proto = mac_proto;
data->l2_len = hlen;
memcpy(>l2_data, skb->data, hlen);
 
@@ -713,7 +715,8 @@ static void prepare_frag(struct vport *vport, struct 
sk_buff *skb,
 }
 
 static void ovs_fragment(struct net *net, struct vport *vport,
-struct sk_buff *skb, u16 mru, __be16 ethertype)
+struct sk_buff *skb, u16 mru,
+struct sw_flow_key *key)
 {
u16 orig_network_offset = 0;
 
@@ -727,11 +730,12 @@ static void ovs_fragment(struct net *net, struct vport 
*vport,
goto err;
}
 
-   if (ethertype == htons(ETH_P_IP)) {
+   if (key->eth.type == htons(ETH_P_IP)) {
struct dst_entry ovs_dst;
unsigned long orig_dst;
 
-   prepare_frag(vport, skb, orig_network_offset);
+   prepare_frag(vport, skb, orig_network_offset,
+ovs_key_mac_proto(key));
dst_init(_dst, _dst_ops, NULL, 1,
 DST_OBSOLETE_NONE, DST_NOCOUNT);
ovs_dst.dev = vport->dev;
@@ -742,7 +746,7 @@ static void ovs_fragment(struct net *net, struct vport 
*vport,
 
ip_do_fragment(net, skb->sk, skb, ovs_vport_output);
refdst_drop(orig_dst);
-   } else if (ethertype == htons(ETH_P_IPV6)) {
+   } else if (key->eth.type == htons(ETH_P_IPV6)) {
const struct nf_ipv6_ops *v6ops = nf_get_ipv6_ops();
unsigned long orig_dst;
struct rt6_info ovs_rt;
@@ -751,7 +755,8 @@ static void ovs_fragment(struct net *net, struct vport 
*vport,
goto err;
}
 
-   prepare_frag(vport, skb, orig_network_offset);
+   prepare_frag(vport, skb, orig_network_offset,
+ovs_key_mac_proto(key));
memset(_rt, 0, sizeof(ovs_rt));
dst_init(_rt.dst, _dst_ops, NULL, 1,
 DST_OBSOLETE_NONE, DST_NOCOUNT);
@@ -765,7 +770,7 @@ static void ovs_fragment(struct net *net, struct vport 
*vport,
refdst_drop(orig_dst);
} else {
WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.",
- ovs_vport_name(vport), ntohs(ethertype), mru,
+ ovs_vport_name(vport), ntohs(key->eth.type), mru,
  vport->dev->mtu);
goto err;
}
@@ -785,19 +790,19 @@ static void do_output(struct datapath *dp, struct sk_buff 
*skb, int out_port,
u32 cutlen = OVS_CB(skb)->cutlen;
 
if (unlikely(cutlen > 0)) {
-   if (skb->len - cutlen > ETH_HLEN)
+   if (skb->len - cutlen > ovs_mac_header_len(key))
pskb_trim(skb, skb->len - cutlen);
else
- 

[ovs-dev] [PATCH net-next v13 1/8] openvswitch: use hard_header_len instead of hardcoded ETH_HLEN

2016-11-10 Thread Jiri Benc
On tx, use hard_header_len while deciding whether to refragment or drop the
packet. That way, all combinations are calculated correctly:

* L2 packet going to L2 interface (the L2 header len is subtracted),
* L2 packet going to L3 interface (the L2 header is included in the packet
  lenght),
* L3 packet going to L3 interface.

Signed-off-by: Jiri Benc <jb...@redhat.com>
Acked-by: Pravin B Shelar <pshe...@ovn.org>
---
 net/openvswitch/actions.c |  3 ++-
 net/openvswitch/vport.c   | 10 ++
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 1105c4e29c62..49af167105d3 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -791,7 +791,8 @@ static void do_output(struct datapath *dp, struct sk_buff 
*skb, int out_port,
pskb_trim(skb, ETH_HLEN);
}
 
-   if (likely(!mru || (skb->len <= mru + ETH_HLEN))) {
+   if (likely(!mru ||
+  (skb->len <= mru + vport->dev->hard_header_len))) {
ovs_vport_send(vport, skb);
} else if (mru <= vport->dev->mtu) {
struct net *net = read_pnet(>net);
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 9bb85b35a1fb..3e131f6868f2 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -464,9 +464,10 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff 
*skb,
return 0;
 }
 
-static unsigned int packet_length(const struct sk_buff *skb)
+static unsigned int packet_length(const struct sk_buff *skb,
+ struct net_device *dev)
 {
-   unsigned int length = skb->len - ETH_HLEN;
+   unsigned int length = skb->len - dev->hard_header_len;
 
if (!skb_vlan_tag_present(skb) &&
eth_type_vlan(skb->protocol))
@@ -484,10 +485,11 @@ void ovs_vport_send(struct vport *vport, struct sk_buff 
*skb)
 {
int mtu = vport->dev->mtu;
 
-   if (unlikely(packet_length(skb) > mtu && !skb_is_gso(skb))) {
+   if (unlikely(packet_length(skb, vport->dev) > mtu &&
+!skb_is_gso(skb))) {
net_warn_ratelimited("%s: dropped over-mtu packet: %d > %d\n",
 vport->dev->name,
-packet_length(skb), mtu);
+packet_length(skb, vport->dev), mtu);
vport->dev->stats.tx_errors++;
goto drop;
}
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev