Re: [ovs-dev] [PATCH] netdev: Retry getting interfaces on inconsistent dumps from kernel
On Thu, 12 Jul 2018 08:18:43 +0200, Daniel Alvarez wrote: > This patch in glibc [0] is fixing a bug where we may be getting > inconsistent dumps from the kernel when listing interfaces due to > a race condition. > > This could happen if we try to retrieve them while interfaces are > being added/removed from the system at the same time. > For systems running against old glibc versions, this patch is retrying > the operation up to 3 times and then proceeding by logging a > warning. > > Note that 3 times should be enough to not delay the operation much > and since it's unlikely that we hit the race condition 3 times in > a row. Still, if this happened, this patch is not changing the > current behavior. > > [0] > https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c1f86a33ca32e26a9d6e29fc961e5ecb5e2e5eb4 > > Signed-off-by: Daniel Alvarez > Signed-off-by: Jiri Benc Co-authored-by: Jiri Benc is better in this case :-) Thanks, Daniel, for following up on this! Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpif: Ensure ERSPAN GRE support
On Fri, 1 Jun 2018 11:23:12 -0700, William Tu wrote: > Looking at the dpif_netlink_rtnl_probe_oot_tunnels(), since now we > added ERSPAN feature, instead of probing geneve module, > we should probe ip_gre module with a nlattr of ERSPAN (ex: HWID). > If it does not return -ENOSUPPORT, then use the upstream ip_gre module. That doesn't make sense. That function is used solely to determine whether the out of (kernel) tree modules are loaded. If they are, ovs prefers them. Note that the features do not come into play here: if there are out of tree modules present, they are used. It's very well possible that the running kernel has more features but it's irrelevant. It doesn't matter in which way the code determines whether there are out of tree modules loaded or not. Probing for geneve thus works just fine and there's no need to change it. > And the next added feature should change this function to determine > whether to use compat mode or not. Do I understand it right? No. The out of tree modules are preferred, irrespective of the supported features. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpif: Ensure ERSPAN GRE support
On Fri, 1 Jun 2018 09:15:33 -0700, Gregory Rose wrote: > Since ERSPAN over gre/ip_gre was added to the Linux 4.16 kernel the > compat interface is needed > for kernels up to 4.15 so that we can support ERSPAN. If the built-in > gre/ip_gre kernel modules > don't have the ERSPAN support in them then we have to use the compat > interface. That's very wrong. The compat interface should not be used with upstream kernel (except perhaps for very very very old kernels). We converted the API to the standard rtnetlink for good reasons. New features are not supported using the compat API. You are potentially breaking future distribution kernels by reverting to an obsolete and deprecated API. You'll have to find a different way to do what you need. Eric described pretty nicely a way to achieve that and how the fallbacks work, please re-read his emails and modify the code accordingly. > The target for USE_UPSTREAM_TUNNEL is moved to 4.16 now. That's when > ERSPAN becomes > fully supported. Going forward the ERSPAN feature is the determinant > for whether gr/ip_gre > compat mode is used or not. And with the next added feature to the kernel, that next feature will be what determines whether the compat mode will be used? And then next and so on? This doesn't work. ERSPAN must not be the decision factor. Instead, rtnetlink must be tried first and if and only if it fails, compat mode can be used. Please go read what Eric described about reading the value back. As for the patch, Nacked-by: Jiri Benc Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] datapath: compat: Fix ndo_size in RHEL 7.5 backport
On Thu, 17 May 2018 12:39:51 -0700, Yi-Hung Wei wrote: > If 'ndo_size' is not set in 'struct net_device_ops', RHEL kernel will not > make use of functions in 'struct net_device_ops_extended'. > > Fixes: 39ca338374ab ("datapath: compat: Fix build on RHEL 7.5") > Reported-by: Jiri Benc <jb...@redhat.com> > Reported-at: > https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/347070.html > Signed-off-by: Yi-Hung Wei <yihung@gmail.com> Reviewed-by: Jiri Benc <jb...@redhat.com> ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] datapath: compat: Fix build on RHEL 7.5
On Fri, 11 May 2018 10:32:12 -0700, Yi-Hung Wei wrote: > --- a/datapath/linux/compat/geneve.c > +++ b/datapath/linux/compat/geneve.c > @@ -1271,7 +1271,11 @@ static const struct net_device_ops geneve_netdev_ops = > { > .ndo_stop = geneve_stop, > .ndo_start_xmit = geneve_dev_xmit, > .ndo_get_stats64= ip_tunnel_get_stats64, > +#ifdef HAVE_RHEL7_MAX_MTU > + .extended.ndo_change_mtu = geneve_change_mtu, Note that this will never be called by the RHEL kernel unless you also set .ndo_size to sizeof(struct net_device_ops). In other words, you've just effectively set .ndo_change_mtu to NULL. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] odp-util: Fix ofproto/trace with odp flow
On Tue, 13 Mar 2018 09:36:23 -0700, Darrell Ball wrote: > [Darrell] There was no suggestion otherwise in general. We were discussing > for one the difference b/w kernel and userspace DP for handling packet type > aware support. Specifically, the difference is here is as fundamental as it > gets - > eth vs non-eth, The discussed support for ofproto/trace is also in that > context; > see Jan's response. Ah, right, sorry. So, the problem is that the two datapaths express the same thing (packet type) differently, right? I think the solution would then be to insert some kind of translation. ovs-dpctl dump-flows for the kernel probably should output the flow in a format that ofproto/trace expects and that matches the user space datapath. (And reverse that in the other direction.) Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] odp-util: Fix ofproto/trace with odp flow
Darell, please fix your email client configuration to conform to RFC 3676 (https://tools.ietf.org/html/rfc3676#section-4.5). Your replies are unreadable. In particular, the text you're replying to must be quoted by ">" character, clearly separating your reply from the original email. Also, wrap your lines at ~78 characters. If your email client does not support this, it is severely broken and you'll need to use a different one. On Tue, 13 Mar 2018 02:35:57 +, Darrell Ball wrote: > [Darrell] I guess part of the value of ofproto/trace is that it > matches as close as possible to other injection sources. Kernel datapath API has nothing to do with OpenFlow. In particular, using output from ovs-dpctl dump-flows in a place where OpenFlow is required could not be expected to work. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC] dpif-netlink: don't allocate per port netlink sockets
On Tue, 23 Jan 2018 09:50:00 -0800, Ben Pfaff wrote: > Per-port sockets help OVS to improve fairness, so that a single busy > port can't monopolize all slow-path resources. We can't just throw that > away. If there's a problem with too many netlink sockets, let's come up > with a solution, but it can't be to eliminate fairness entirely. The main problem with this is the sockets are dedicated to particular ports. With high number of ports, the number of sockets raises to such high value that it stops helping performance anyway. There's not much point of having thousands opened sockets per CPU, we'll not be able to utilize them anyway. Having dedicated sockets for each port is wasteful, especially if one port is very busy with upcalls and other ports are mostly idle. (And if many ports are busy with many upcalls, we're in performance troubles anyway.) It would make sense to share the sockets between the ports. I'd like to understand more about the thoughts behind the fairness. Why should not a single busy port use all slow-path resources in case other ports are idle? Thanks, Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net v3] openvswitch: Fix pop_vlan action for double tagged frames
On Wed, 20 Dec 2017 15:09:22 -0500, Eric Garver wrote: > skb_vlan_pop() expects skb->protocol to be a valid TPID for double > tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop() > shift the true ethertype into position for us. > > Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets") > Signed-off-by: Eric Garver <e...@erig.me> Thanks! Reviewed-by: Jiri Benc <jb...@redhat.com> ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net v2] openvswitch: Fix pop_vlan action for double tagged frames
On Wed, 20 Dec 2017 10:39:32 -0500, Eric Garver wrote: > + if (is_flow_key_valid(key) && key->eth.vlan.tci && key->eth.cvlan.tci) Maybe (key->eth.vlan.tci & htons(VLAN_TAG_PRESENT)) for consistency with the rest of the code? But it's just nitpicking. The real problem here is when a double tagged packet leaves the ovs bridge, it won't have the skb->protocol that the kernel expects: it will be ethertype of the payload, while my understanding is it should be the inner tpid, right? This patch fixes that nicely for the pop vlan case. But what about other cases? It seems to me that we need to add the logic to key_extract. Thanks! Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net] openvswitch: Fix pop_vlan action for double tagged frames
On Tue, 19 Dec 2017 13:57:53 -0500, Eric Garver wrote: > --- a/net/openvswitch/flow.c > +++ b/net/openvswitch/flow.c > @@ -559,8 +559,9 @@ static int parse_nsh(struct sk_buff *skb, struct > sw_flow_key *key) > * of a correct length, otherwise the same as skb->network_header. > * For other key->eth.type values it is left untouched. > * > - *- skb->protocol: the type of the data starting at skb->network_header. > - * Equals to key->eth.type. > + *- skb->protocol: For Ethernet, the ethertype or VLAN TPID. > + * For non-Ethernet, the type of the data starting at > skb->network_header > + * (also equal to key->eth.type). > */ > static int key_extract(struct sk_buff *skb, struct sw_flow_key *key) > { > @@ -579,6 +580,7 @@ static int key_extract(struct sk_buff *skb, struct > sw_flow_key *key) > return -EINVAL; > > skb_reset_network_header(skb); > + key->eth.type = skb->protocol; > } else { > eth = eth_hdr(skb); > ether_addr_copy(key->eth.src, eth->h_source); > @@ -592,15 +594,14 @@ static int key_extract(struct sk_buff *skb, struct > sw_flow_key *key) > if (unlikely(parse_vlan(skb, key))) > return -ENOMEM; > > - skb->protocol = parse_ethertype(skb); > - if (unlikely(skb->protocol == htons(0))) > + key->eth.type = parse_ethertype(skb); > + if (unlikely(key->eth.type == htons(0))) > return -ENOMEM; > > skb_reset_network_header(skb); > __skb_push(skb, skb->data - skb_mac_header(skb)); > } > skb_reset_mac_len(skb); > - key->eth.type = skb->protocol; > > /* Network layer. */ > if (key->eth.type == htons(ETH_P_IP)) { Unfortunately, this does not work. key_extract must set skb->protocol even for Ethernet frames that come from a mixed L2/L3 tunnel. Such packets will have key->mac_proto set to MAC_PROTO_ETHERNET and skb->protocol set to ETH_P_TEB (see key_extract_mac_proto). In key_extract, skb->protocol has to be correctly set to the dissected value. Which means that we have to check for the existence of inner vlan tag (by checking key->eth.cvlan.tci or, perhaps better, by returning it from parse_vlan) and set skb->protocol accordingly. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 1/8] netlink: provide network namespace id from a msg.
On Mon, 4 Dec 2017 11:44:19 -0200, Flavio Leitner wrote: > It is only positive numbers if you review the kernel sources today, > but the API is exposing signed 32bit, so not sure what can happen > in the future if the kernel decides to expand the range. I think you can assume that it will always be a positive number. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v15] openvswitch: enable NSH support
On Sat, 4 Nov 2017 07:29:46 -0700, Pravin Shelar wrote: > > +int nsh_push(struct sk_buff *skb, const struct nshhdr *pushed_nh) > > +{ > > + struct nshhdr *nh; > > + size_t length = nsh_hdr_len(pushed_nh); > > + u8 next_proto; > > + > > + if (skb->mac_len) { > > + next_proto = TUN_P_ETHERNET; > > + } else { > > + next_proto = tun_p_from_eth_p(skb->protocol); > > + if (!next_proto) > > + return -EAFNOSUPPORT; > check for supported protocols can be moved to flow install validation > in __ovs_nla_copy_actions(). You mean the check for !next_proto? It needs to be present for correctness of nsh_push. This function has to be self contained, it will be used by more callers than opevswitch, namely tc. It's actually not so much a check for "supported protocols", it's rather a check of return value of a function that converts ethertype to a 1 byte tunnel type. Blindly using a result of a function that may return error would be wrong. Openvswitch is free to perform additional checks but this needs to stay. > > +int nsh_pop(struct sk_buff *skb) > > +{ > > + struct nshhdr *nh; > > + size_t length; > > + __be16 inner_proto; > > + > > + if (!pskb_may_pull(skb, NSH_BASE_HDR_LEN)) > > + return -ENOMEM; > > + nh = (struct nshhdr *)(skb->data); > > + length = nsh_hdr_len(nh); > > + inner_proto = tun_p_to_eth_p(nh->np); > same as above, this check can be moved to flow install > __ovs_nla_copy_actions(). There's no check here. > > + if (!pskb_may_pull(skb, length)) > > + return -ENOMEM; > > + > > + if (!inner_proto) > > + return -EAFNOSUPPORT; Did you mean this one instead? Then see above, this has to stay. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v15] openvswitch: enable NSH support
On Wed, 1 Nov 2017 12:03:01 +0800, Yi Yang wrote: > v14->v15 > - Check size in nsh_hdr_from_nlattr > - Fixed four small issues pointed out By Jiri and Eric Acked-by: Jiri Benc <jb...@redhat.com> ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v14] openvswitch: enable NSH support
On Mon, 30 Oct 2017 09:29:34 +0800, Yi Yang wrote: > +static int set_nsh(struct sk_buff *skb, struct sw_flow_key *flow_key, > +const struct nlattr *a) > +{ > + struct nshhdr *nh; > + size_t length; > + int err; > + u8 flags; > + u8 ttl; > + int i; > + > + struct ovs_key_nsh key; > + struct ovs_key_nsh mask; > + > + err = nsh_key_from_nlattr(a, , ); > + if (err) > + return err; > + > + /* Make sure the NSH base header is there */ > + if (!pskb_may_pull(skb, NSH_BASE_HDR_LEN)) This should be skb_network_offset(skb) + NSH_BASE_HDR_LEN. > +size_t ovs_nsh_key_attr_size(void) > +{ > + /* Whenever adding new OVS_NSH_KEY_ FIELDS, we should consider > + * updating this function. > + */ > + return nla_total_size(NSH_BASE_HDR_LEN) /* OVS_NSH_KEY_ATTR_BASE */ > + /* OVS_NSH_KEY_ATTR_MD1 and OVS_NSH_KEY_ATTR_MD2 are > + * mutually exclusive, so the bigger one can cover > + * the small one. > + * > + * OVS_NSH_KEY_ATTR_MD2 > + */ A nit, not important but since you'll need to respin anyway: the last line in the comment above seems to be a left over from some previous version of the comment. This should be enough: /* OVS_NSH_KEY_ATTR_MD1 and OVS_NSH_KEY_ATTR_MD2 are * mutually exclusive, so the bigger one can cover * the small one. */ Or maybe I misunderstood what you meant. > +int nsh_hdr_from_nlattr(const struct nlattr *attr, > + struct nshhdr *nh, size_t size) > +{ > + struct nlattr *a; > + int rem; > + u8 flags = 0; > + u8 ttl = 0; > + int mdlen = 0; > + > + /* validate_nsh has check this, so we needn't do duplicate check here > + */ > + nla_for_each_nested(a, attr, rem) { > + int type = nla_type(a); > + > + switch (type) { > + case OVS_NSH_KEY_ATTR_BASE: { > + const struct ovs_nsh_key_base *base = nla_data(a); > + > + flags = base->flags; > + ttl = base->ttl; > + nh->np = base->np; > + nh->mdtype = base->mdtype; > + nh->path_hdr = base->path_hdr; > + break; > + } > + case OVS_NSH_KEY_ATTR_MD1: > + mdlen = nla_len(a); > + memcpy(>md1, nla_data(a), mdlen); The check for 'size' disappeared from here somehow. > + break; > + > + case OVS_NSH_KEY_ATTR_MD2: > + mdlen = nla_len(a); > + memcpy(>md2, nla_data(a), mdlen); And here. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v14] openvswitch: enable NSH support
On Tue, 31 Oct 2017 15:57:41 -0400, Eric Garver wrote: > On Mon, Oct 30, 2017 at 09:29:34AM +0800, Yi Yang wrote: > > + if (WARN_ON(is_push_nsh && is_mask)) > > + return -EINVAL; > > OVS_NLERR() is probably more appropriate. No, not here. If this happens, it's a bug in the kernel and WARN_ON is what we need. This is not triggerable from user space and user space has no way to fix it if this happens. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v12] openvswitch: enable NSH support
On Fri, 20 Oct 2017 05:53:12 +0800, Yang, Yi wrote: > For push_nsh, my typical use scinario is push_nsh then set then output > to vxlangpe port. Okay. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v12] openvswitch: enable NSH support
On Thu, 19 Oct 2017 21:12:15 +0800, Yang, Yi wrote: > flow_key in set_nsh is got from netlink message which is set by > commit_nsh in user space, here is code. Isn't this the 'key' local variable that you're talking about, while I'm referring to the 'flow_key' parameter? Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v12] openvswitch: enable NSH support
On Thu, 19 Oct 2017 19:40:53 +0800, Yang, Yi wrote: > Actually mdtype can't be set, only push_nsh can set mdtype, so set_nsh > won't have mdtype flow key, we can't get it from flow_key in set_nsh, > only ttl, flags and path_hdr can be set in set_nsh as you can see in code. > I understand your concern is calling skb_ensure_writable twice, so > changing the first one to pskb_may_pull is more appropriate for set_nsh. Isn't set_nsh called only after the packet was already dissected (i.e. parse_nsh called)? The dissected fields should be available in flow_key. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v12] openvswitch: enable NSH support
On Mon, 16 Oct 2017 21:53:29 +0800, Yi Yang wrote: > +static int set_nsh(struct sk_buff *skb, struct sw_flow_key *flow_key, > +const struct nlattr *a) > +{ > + struct nshhdr *nh; > + size_t length; > + int err; > + u8 flags; > + u8 ttl; > + int i; > + > + struct ovs_key_nsh key; > + struct ovs_key_nsh mask; > + > + err = nsh_key_from_nlattr(a, , ); > + if (err) > + return err; > + > + /* Make sure the NSH base header is there */ > + err = skb_ensure_writable(skb, skb_network_offset(skb) + > +NSH_BASE_HDR_LEN); > + if (unlikely(err)) > + return err; > + > + nh = nsh_hdr(skb); > + length = nsh_hdr_len(nh); > + > + /* Make sure the whole NSH header is there */ > + err = skb_ensure_writable(skb, skb_network_offset(skb) + > +length); Calling skb_ensure_writable twice is an unnecessary waste in the fast path. If anything, the first call should be changed to pskb_may_pull. But what we really should do here is to move the header only once. We know how much data we're going to write, we have everything stored in the key and can calculate it from there. length = NSH_BASE_HDR_LEN; switch (flow_key->nsh.base.mdtype) { case NSH_MD_TYPE1: length += sizeof(struct ovs_nsh_key_md1); break; case NSH_MD_TYPE2: length += whatever_way_we_store_the_tlvs_in_flow_key; break; } err = skb_ensure_writable(skb, skb_network_offset(skb) + length); However, set_nsh does not support MD type 2, thus the second case is a dead code. In both switches in this function. As such, it should be removed and added only when MD type 2 is introduced. I'd still keep the overall logic to ease the future addition, though. This boils down to: length = NSH_BASE_HDR_LEN; /* Assume MD type 1. This function cannot be called for anything * else currently. When MD type 2 is added, the line below will * have to be turned into a switch on flow_key->nsh.base.mdtype. */ length += sizeof(struct ovs_nsh_key_md1); err = skb_ensure_writable(skb, skb_network_offset(skb) + length); ... flow_key->nsh.base.path_hdr = nh->path_hdr; /* Only MD type 1, see the comment above. */ for (i = 0; i < NSH_MD1_CONTEXT_SIZE; i++) { ... Please verify I'm not missing something. It seems we also rely on the user space checking first whether the packet is indeed compatible with the pushed key/mask. Most importantly, that it's of the same mdtype and (in the future) that the MD type 2 TLVs being written actually fit. Seems this is done the same way in the other set_* actions, thus fine with me. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v11] openvswitch: enable NSH support
On Fri, 29 Sep 2017 15:03:30 +0800, Yi Yang wrote: > --- a/include/net/nsh.h > +++ b/include/net/nsh.h > @@ -304,4 +304,7 @@ static inline void nsh_set_flags_ttl_len(struct nshhdr > *nsh, u8 flags, > NSH_FLAGS_MASK | NSH_TTL_MASK | NSH_LEN_MASK); > } > > +int skb_push_nsh(struct sk_buff *skb, const struct nshhdr *nh); [...] > +int skb_push_nsh(struct sk_buff *skb, const struct nshhdr *src_nsh_hdr) This is minor but since this patch will need a respin anyway, please name the variables in the forward declaration and here the same. > +int skb_pop_nsh(struct sk_buff *skb) > +{ > + int err; > + struct nshhdr *nsh_hdr = (struct nshhdr *)(skb->data); Bad name of the variable, clashes with the nsh_hdr function. I pointed that out already. > + size_t length; > + __be16 inner_proto; > + > + err = skb_ensure_writable(skb, skb_network_offset(skb) + > +sizeof(struct nshhdr)); You assume that the skb starts at the NSH header, thus the skb_network_offset is completely unnecessary and introduces just another assumption on the caller. Also, the sizeof(struct nshhdr) is wrong: there's no guarantee that the header is not smaller or larger than that. More importantly though, why do you need skb_ensure_writable? You don't write into the header. pkskb_may_pull is enough. if (!pskb_may_pull(skb, NSH_BASE_HDR_LEN)) return -ENOMEM; length = nsh_hdr_len(nsh_hdr); if (!pskb_may_pull(skb, length)) return -ENOMEM; > +static int set_nsh(struct sk_buff *skb, struct sw_flow_key *flow_key, > +const struct nlattr *a) > +{ > + struct nshhdr *nh; > + int err; > + u8 flags; > + u8 ttl; > + int i; > + > + struct ovs_key_nsh key; > + struct ovs_key_nsh mask; > + > + err = nsh_key_from_nlattr(a, , ); > + if (err) > + return err; > + > + err = skb_ensure_writable(skb, skb_network_offset(skb) + > +sizeof(struct nshhdr)); I missed this before: this is wrong, too. You need to use the real header size, not sizeof(struct nshhdr). It should be computable from the flow key. > + case OVS_ACTION_ATTR_PUSH_NSH: { > + u8 buffer[NSH_HDR_MAX_LEN]; > + struct nshhdr *nh = (struct nshhdr *)buffer; > + > + nsh_hdr_from_nlattr(nla_data(a), nh, > + NSH_HDR_MAX_LEN); > + err = push_nsh(skb, key, (const struct nshhdr *)nh); Is the cast to const really needed? It looks suspicious. If you added it because a compiler complained, it's even more suspicious. > +static int parse_nsh(struct sk_buff *skb, struct sw_flow_key *key) > +{ > + struct nshhdr *nh; > + unsigned int nh_ofs = skb_network_offset(skb); > + u8 version, length; > + int err; > + > + err = check_header(skb, nh_ofs + NSH_BASE_HDR_LEN); > + if (unlikely(err)) > + return err; > + > + nh = nsh_hdr(skb); > + version = nsh_get_ver(nh); > + length = nsh_hdr_len(nh); > + > + if (version != 0) > + return -EINVAL; > + > + err = check_header(skb, nh_ofs + length); > + if (unlikely(err)) > + return err; > + > + nh = (struct nshhdr *)skb_network_header(skb); I really really really hate this. This is the third time I'm telling you to use the nsh_hdr function. Every time, you change only part of the places. And this one I even explicitly pointed out in the previous review. I'm not supposed to look at my previous review to verify that you addressed everything. That's your responsibility. Yet I need to do it because every time, some of my comments remain unaddressed. > +int nsh_hdr_from_nlattr(const struct nlattr *attr, > + struct nshhdr *nh, size_t size) > +{ > + struct nlattr *a; > + int rem; > + u8 flags = 0; > + u8 ttl = 0; > + int mdlen = 0; > + > + /* validate_nsh has check this, so we needn't do duplicate check here > + */ > + nla_for_each_nested(a, attr, rem) { > + int type = nla_type(a); > + > + switch (type) { > + case OVS_NSH_KEY_ATTR_BASE: { > + const struct ovs_nsh_key_base *base = nla_data(a); > + > + flags = base->flags; > + ttl = base->ttl; > + nh->np = base->np; > + nh->mdtype = base->mdtype; > + nh->path_hdr = base->path_hdr; > + break; > + } > + case OVS_NSH_KEY_ATTR_MD1: { > + const struct ovs_nsh_key_md1 *md1 = nla_data(a); > + > + mdlen = nla_len(a); > + memcpy(>md1, md1, mdlen); > + break; Looks better. Why not simplify it even more? case OVS_NSH_KEY_ATTR_MD1:
Re: [ovs-dev] [PATCH net-next v10] openvswitch: enable NSH support
On Tue, 26 Sep 2017 21:52:41 +0800, Yang, Yi wrote: > > + return ((ret != 0) ? false : true); > > But I don't think this is a problematic line from my understanding, Why not: return ((ret != 0 == true) ? false : true) == true; ? Sigh. This is equal to: return !ret; which you should use. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v10] openvswitch: enable NSH support
On Tue, 26 Sep 2017 12:47:16 +0800, Yi Yang wrote: > + return ((ret != 0) ? false : true); I'm not going to review this version but this caught my eye - I pointed out this silly construct in my review of v9. I can understand that working late in the night and rewriting the code back and forth, one could end up with such construct and overlook it at the final self-review before submission. Happens to everyone. But leaving this after a review pointed it out means you're not paying enough attention to your work. Even the fact that you sent v10 so shortly after v9 means you did not spend the needed time on reflecting on the review and that you did not properly test the new version. And I told you exactly this before. Honestly, I'm starting to be tired with reviewing your patch again and again and pointing out silly mistakes like this one and repeating basic things to you again and again. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v10] openvswitch: enable NSH support
On Tue, 26 Sep 2017 12:47:16 +0800, Yi Yang wrote: > v9->v10 > - Change struct ovs_key_nsh to >struct ovs_nsh_key_base base; >__be32 context[NSH_MD1_CONTEXT_SIZE]; > - Fix new comments for v9 NAK, we haven't finished the discussion for v9 yet. It's not appropriate to send a new version until there's a conclusion (or at least until the discussion dies). Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v9] openvswitch: enable NSH support
On Tue, 26 Sep 2017 12:55:39 +0800, Yang, Yi wrote: > After push_nsh, the packet won't be recirculated to flow pipeline, so > key->eth.type must be set explicitly here, but for pop_nsh, the packet > will be recirculated to flow pipeline, it will be reparsed, so > key->eth.type will be set in packet parse function, we needn't handle it > in pop_nsh. This seems to be a very different approach than what we currently have. Looking at the code, the requirement after "destructive" actions such as pushing or popping headers is to recirculate. Setting key->eth.type to satisfy conditions in the output path without updating the rest of the key looks very hacky and fragile to me. There might be other conditions and dependencies that are not obvious. I don't think the code was written with such code path in mind. I'd like to hear what Pravin thinks about this. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v9] openvswitch: enable NSH support
On Mon, 25 Sep 2017 22:16:09 +0800, Yi Yang wrote: > + skb->protocol = htons(ETH_P_NSH); > + skb_reset_mac_header(skb); > + skb_reset_mac_len(skb); > + skb_reset_network_header(skb); The last two lines are swapped. Network header needs to be reset before mac_len. > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(skb_push_nsh); > + > +int skb_pop_nsh(struct sk_buff *skb) > +{ > + struct nshhdr *nsh_hdr = (struct nshhdr *)(skb->data); > + size_t length; > + u16 inner_proto; __be16 inner_proto. > +static int push_nsh(struct sk_buff *skb, struct sw_flow_key *key, > + const struct nshhdr *src_nsh_hdr) > +{ > + int err; > + > + err = skb_push_nsh(skb, src_nsh_hdr); > + if (err) > + return err; > + > + key->eth.type = htons(ETH_P_NSH); I wonder why you have this assignment here. The key is invalidated, thus nothing should rely on key->eth.type. However, looking at the code and ovs_fragment in particular, I'm not sure that's the case. Could you please explain why it is needed? And why the reverse of it is not needed in pop_nsh? > + > + /* safe right before invalidate_flow_key */ > + key->mac_proto = MAC_PROTO_NONE; > + invalidate_flow_key(key); > + return 0; > +} [...] > +static int set_nsh(struct sk_buff *skb, struct sw_flow_key *flow_key, > +const struct nlattr *a) > +{ > + struct nshhdr *nsh_hdr; > + int err; > + u8 flags; > + u8 ttl; > + int i; > + > + struct ovs_key_nsh key; > + struct ovs_key_nsh mask; > + > + err = nsh_key_from_nlattr(a, , ); > + if (err) > + return err; > + > + err = skb_ensure_writable(skb, skb_network_offset(skb) + > + sizeof(struct nshhdr)); Whitespace nit: the sizeof should be aligned to skb_network_offset. > + if (unlikely(err)) > + return err; > + > + nsh_hdr = (struct nshhdr *)skb_network_header(skb); Please use the nsh_hdr function (I'm sure I already requested that in the previous review?). It means renaming the nsh_hdr variable. > @@ -1210,6 +1307,21 @@ static int do_execute_actions(struct datapath *dp, > struct sk_buff *skb, > case OVS_ACTION_ATTR_POP_ETH: > err = pop_eth(skb, key); > break; > + > + case OVS_ACTION_ATTR_PUSH_NSH: { > + u8 buffer[NSH_HDR_MAX_LEN]; > + struct nshhdr *nsh_hdr = (struct nshhdr *)buffer; > + const struct nshhdr *src_nsh_hdr = nsh_hdr; > + > + nsh_hdr_from_nlattr(nla_data(a), nsh_hdr, > + NSH_HDR_MAX_LEN); > + err = push_nsh(skb, key, src_nsh_hdr); Is the src_nsh_hdr variable really necessary? Cannot be nsh_hdr passed directly to push_nsh, relying on automatic retyping to const? By the way, nsh_hdr is a poor name for a variable, it clashes with the nsh_hdr function. For clarity, please rename the variables in the whole patch to something else. > +static int parse_nsh(struct sk_buff *skb, struct sw_flow_key *key) > +{ > + struct nshhdr *nsh_hdr; > + unsigned int nh_ofs = skb_network_offset(skb); > + u8 version, length; > + int err; > + > + err = check_header(skb, nh_ofs + NSH_BASE_HDR_LEN); > + if (unlikely(err)) > + return err; > + > + nsh_hdr = (struct nshhdr *)skb_network_header(skb); Again, rename the variable and use the nsh_hdr function. > + version = nsh_get_ver(nsh_hdr); > + length = nsh_hdr_len(nsh_hdr); > + > + if (version != 0) > + return -EINVAL; > + > + err = check_header(skb, nh_ofs + length); > + if (unlikely(err)) > + return err; > + > + nsh_hdr = (struct nshhdr *)skb_network_header(skb); Ditto. > +struct ovs_key_nsh { > + u8 flags; > + u8 ttl; > + u8 mdtype; > + u8 np; > + __be32 path_hdr; > + __be32 context[NSH_MD1_CONTEXT_SIZE]; > +}; This should be: struct ovs_key_nsh { struct ovs_nsh_key_base base; __be32 context[NSH_MD1_CONTEXT_SIZE]; }; to capture the real dependency. Implicitly depending on ovs_key_nsh starting with the same fields as ovs_nsh_key_base will only lead to bugs in the future. > +int nsh_hdr_from_nlattr(const struct nlattr *attr, > + struct nshhdr *nsh, size_t size) > +{ > + struct nlattr *a; > + int rem; > + u8 flags = 0; > + u8 ttl = 0; > + int mdlen = 0; > + > + /* validate_nsh has check this, so we needn't do duplicate check here > + */ > + nla_for_each_nested(a, attr, rem) { > + int type = nla_type(a); > + > + switch (type) { > + case OVS_NSH_KEY_ATTR_BASE: { > + const struct ovs_nsh_key_base *base = > + (struct ovs_nsh_key_base *)nla_data(a); It's not necessary to retype void * explicitly. Just assign it. > +
Re: [ovs-dev] [PATCH net-next v9] openvswitch: enable NSH support
On Thu, 14 Sep 2017 16:37:59 +0800, Yi Yang wrote: > OVS master and 2.8 branch has merged NSH userspace > patch series, this patch is to enable NSH support > in kernel data path in order that OVS can support > NSH in compat mode by porting this. http://vger.kernel.org/~davem/net-next.html ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support
On Mon, 4 Sep 2017 14:45:50 +, Jan Scheurich wrote: > So what is the correct layout for MASKED_SET action with nested fields? > 1. All nested values, followed by all nested masks, or > 2. For each nested field value followed by mask? > > I guess alternative 1, but just to be sure. It's 2. Alternative 1 breaks netlink assumptions, is ugly to implement and probably impossible to be properly validated. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support
On Tue, 5 Sep 2017 14:37:05 +0800, Yang, Yi wrote: > I checked source code but can't find where we can prepopulate it and how > we can deliver the prepopulated header to push_nsh, can you expand it in > order that I can know how to do this in code level? I already said that it's not implemented yet and can be done as a future performance enhancement. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support
On Tue, 5 Sep 2017 13:51:45 +0800, Yang, Yi wrote: > But if we follow your way, how does nla_for_each_nested handle such > pattern? > > attribute1 > attribute1_mask > attribute2 > attribute2_mask > attribute3 > attribute3_mask Uh? That will just work. Note that nla len contains the size of the whole attribute, i.e. includes both fields. > I don't think this can increase performance and readability. Do you realize you're stating that not copying data around in hot path can't increase performance? ;-) > if we use one function to handle both attributes and masks, I can't > see any substantial difference between two ways as far as the > performance is concerned. Except that what you did is so unexpected to netlink that you had to go out of your way to parse it. Those two memcpys speak for themselves. > If we consider OVS_KEY_ATTR_NSH as a big attribute, current way is > precisely following current design pattern Do you have an idea how nested netlink attributes work? It's very well defined. What you're doing is breaking the definition. You can't do that. The (non-nested) attributes contain header followed by data. The data is a single value or it is a struct. In ovs for masked actions, attributes contain a struct of two fields (value and mask). The struct access is open coded but it's still a struct. Now, nested attributes are very well defined: it's a nla header followed by a stream of attributes. There's no requirement on the order of the attributes. Do you see how you're breaking this assumption? You expect that each attribute is present exactly twice, once among first half of attributes, once among second half of attributes. You don't check that this weird assumption is adhered to. You don't even check that the point where you split the data is between attributes! You may very well be splitting an attribute in half and interpreting its data as nla headers. Consider your proposal NACKed from my side. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support
On Mon, 4 Sep 2017 14:07:45 +, Jan Scheurich wrote: > Then perhaps I misunderstood your comment. I thought you didn't like that the > SET_MASKED action wrapped OVS_KEY_ATTR_NSH which in itself was nested. > I was aiming to avoid this by lifting the two components of the NSH header > components to the top level: > OVS_NSH_ATTR_BASE_HEADER --> OVS_KEY_ATTR_NSH_BASE_HEADER > OVS_NSH_ATTR_MD1_CONTEXT --> OVS_KEY_ATTR_NSH_MD1_CONTEXT No, this should be a nested attr. I objected to the way value+mask combo is handled. > See above. The two would be separate values in the same enum, i.e. distinct. This is not how netlink attrs should be designed. > Not sure I have the full picture here. Are you saying that the tc tool uses > the same > Netlink API to the kernel as OVS? What would be the back-end for tc? Also the > openvswitch kernel module? It does not use the same API but it makes sense for these two to share common code. Plus hw offloading in ovs is done using tc. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support
On Mon, 4 Sep 2017 12:57:15 +, Jan Scheurich wrote: > So is what you are suggesting the following? > > For matching: > OVS_KEY_ATTR_NSH_BASE_HEADER mandatory > OVS_KEY_ATTR_NSH_MD1_CONTEXT optional, in case MDTYPE == MD1 This needs to be: OVS_KEY_ATTR_NSH OVS_KEY_ATTR_NSH_BASE_HEADER OVS_KEY_ATTR_NSH_MD1_CONTEXT > For setting: > OVS_ACTION_ATTR_SET_MASKED > -OVS_KEY_ATTR_NSH_BASE_HEADER when setting any base header > fields > OVS_ACTION_ATTR_SET_MASKED > -OVS_KEY_ATTR_NSH_MD1_CONTEXT when setting any MD1 context data fields This needs to be: OVS_ACTION_ATTR_SET_MASKED OVS_KEY_ATTR_NSH OVS_KEY_ATTR_NSH_BASE_HEADER OVS_KEY_ATTR_NSH_MD1_CONTEXT In your example, there's no way to distinguish OVS_KEY_ATTR_ENCAP and OVS_KEY_ATTR_NSH_BASE_HEADER, they are both "1". > Should we consider to stick to that simple and efficient approach? As far > As I can see it will be generic for the foreseeable future. I'm not sure. From the kernel point of view, we want the same functionality offered by the openvswitch module and by a tc action. In theory, it can be the tc tool that constructs the header from the command line but there's no precedent. Dunno. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support
On Wed, 30 Aug 2017 20:39:12 +0800, Yi Yang wrote: > +enum ovs_nsh_key_attr { > + OVS_NSH_KEY_ATTR_BASE, /* struct ovs_nsh_key_base. */ > + OVS_NSH_KEY_ATTR_MD1, /* struct ovs_nsh_key_md1. */ > + OVS_NSH_KEY_ATTR_MD2, /* variable-length octets for MD type 2. */ > + __OVS_NSH_KEY_ATTR_MAX > +}; One more thing: 0 is reserved, the first attr must be OVS_NSH_KEY_ATTR_UNSPEC. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support
On Mon, 4 Sep 2017 20:09:07 +0800, Yang, Yi wrote: > So we must do many changes if we want to break this assumption. We may do as many changes as we want to. This is uAPI we're talking about and we need to get it right since the beginning. Sure, it may mean that some user space programs need some changes in order to make use of the new features. That happens every day. I also don't understand where's the problem. It's very easy to check for NLA_F_NESTED and generically act based on that in the function you quoted. Just call a different function than format_odp_key_attr to handle ovs_nsh_key_attr attributes in case the nested flag is set and the attribute is OVS_KEY_ATTR_NSH and you're done. You'll need such function anyway, it's not much different code size wise to call it from format_odp_key_attr or from format_odp_action. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support
On Mon, 4 Sep 2017 16:00:05 +0800, Yang, Yi wrote: > I think we have had similiar discussion about this, the issue is we have > no way to handle both MD type 1 and MD type 2 by using a common flow key > struct. > > So we have to do so, there is miniflow to handle such issue in > userspace, but kernel data path didn't provide such mechanism. I think > every newly-added key will result in the same space-wasting issue for > kernel data path, isn't it? Yes. And it would be surprising if it didn't have an effect on performance. I don't care that much as this is a result of openvswitch module design decisions but it still would be good to reach a consensus before implementing uAPI that might not be needed in the end. > OVS_ATTR_KEY_NSH is the first case to use nested attributes for set > action, so no reference code for this, set action has two use cases, one > has mask but the other hasn't, so it will be easier an nested attribute is > handled as a whole, in user space, nsh_key_from_nlattr is used in > several places. I very much disagree. What you're doing is very poor design as can be seen from the code where you have to do weird gymnastics to implement that. Compare that to a much simple case where each attribute carries its own value and mask. > If we change it per your proposal, there will be a lot > of rework, That's not an argument. We care about doing things right, we don't do things hastily and with as little effort as possible. > we have to provide two functions to handle this, one is for > the case with mask, the other is for the case without mask. I don't see that. The function is called from a single place only. > How can we do so? I see batch process code in user space implementation, > but kernel data path didn't such infrastructure, You're right that there's no infrastructure. I somewhat agree that it might be out of scope of this patch and it can be optimized later. That's why I also included other suggestions to speed this up until we implement better parsing: namely, verify correctness once (at the time it is set from user space) and just expect things to be correct in the fast path. > how can we know next push_nsh uses the same nsh header as previous > one? We store the prepopulated header together with the action. > > Shouldn't we reject the packet, then? Pretending that something was parsed > > correctly while in fact it was not, does not look correct. > > For MD type 2, we don't implement metadata parsing, but it can still > work. I'm not sure what you mean here, how do we reject a packet here? Okay, we can match on mdtype and thus can find out about this type of packets. No problem, then. > > These checks should be done only once when the action is configured, not for > > each packet. > > I don't know how to implement a batch processing in kernel data path, it > seems there isn't such code for reference. The checks should be done somewhere in __ovs_nla_copy_action. Which seems to be done even in your patch by nsh_key_put_from_nlattr (I didn't get that far during the review last week. The names of those nsh_(key|hdr)_*_nlattr are so similar that it's impossible to tell what they do without looking at the call sites - something you should also consider to improve). That makes it even more wrong: you appear to check everything twice, first on configuration and then for every packet. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support
On Wed, 30 Aug 2017 20:39:12 +0800, Yi Yang wrote: > --- a/net/nsh/nsh.c > +++ b/net/nsh/nsh.c > @@ -14,6 +14,47 @@ > #include > #include > > +int skb_push_nsh(struct sk_buff *skb, const struct nshhdr *nsh_src, bool > is_eth) > +{ > + struct nshhdr *nsh; > + size_t length = nsh_hdr_len(nsh_src); > + u8 next_proto; > + > + if (is_eth) { > + next_proto = TUN_P_ETHERNET; I don't like the is_eth parameter. It should be clear from the skb being passed to the function, specifically from skb->mac_len. > + } else { > + next_proto = tun_p_from_eth_p(skb->protocol); > + if (!next_proto) > + return -ENOTSUPP; EAFNOSUPPORT is more suitable here. > + } > + > + /* Add the NSH header */ > + if (skb_cow_head(skb, length) < 0) > + return -ENOMEM; > + > + /* Add the NSH header */ > + if (skb_cow_head(skb, length) < 0) > + return -ENOMEM; Duplicate statement. > + > + if (!skb->inner_protocol) { > + skb_set_inner_network_header(skb, skb->mac_len); > + skb_set_inner_protocol(skb, skb->protocol); It doesn't make sense to set either of those. Nothing uses the fields for NSH. > + } > + > + skb_push(skb, length); > + nsh = (struct nshhdr *)(skb->data); Use nsh_hdr. > + memcpy(nsh, nsh_src, length); > + nsh->np = next_proto; > + nsh->mdtype &= NSH_MDTYPE_MASK; The "& NSH_MDTYPE_MASK" operation does not make sense. Just trust the caller to provide a meaningful value. It's its job to verify the parameters. > + skb->protocol = htons(ETH_P_NSH); > + skb_reset_mac_header(skb); > + skb_reset_mac_len(skb); You have to reset also the network header. > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(skb_push_nsh); > + > static struct sk_buff *nsh_gso_segment(struct sk_buff *skb, > netdev_features_t features) > { > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c > index a54a556..e969fad 100644 > --- a/net/openvswitch/actions.c > +++ b/net/openvswitch/actions.c > @@ -38,11 +38,13 @@ > #include > #include > #include > +#include > > #include "datapath.h" > #include "flow.h" > #include "conntrack.h" > #include "vport.h" > +#include "flow_netlink.h" > > struct deferred_action { > struct sk_buff *skb; > @@ -380,6 +382,57 @@ static int push_eth(struct sk_buff *skb, struct > sw_flow_key *key, > return 0; > } > > +static int push_nsh(struct sk_buff *skb, struct sw_flow_key *key, > + const struct nshhdr *nsh_src) > +{ > + bool is_eth = false; > + int ret; > + > + if (key->mac_proto == MAC_PROTO_ETHERNET) > + is_eth = true; > + > + ret = skb_push_nsh(skb, nsh_src, is_eth); > + if (ret != 0) The usual idiom is "if (ret)". And for consistency with the rest of the file, the name of the variable should be "err". > + return ret; > + > + key->eth.type = htons(ETH_P_NSH); > + > + /* safe right before invalidate_flow_key */ > + key->mac_proto = MAC_PROTO_NONE; > + invalidate_flow_key(key); > + return 0; > +} > + > +static int pop_nsh(struct sk_buff *skb, struct sw_flow_key *key) > +{ > + struct nshhdr *nsh = (struct nshhdr *)(skb->data); > + size_t length; > + u16 inner_proto; > + > + if (ovs_key_mac_proto(key) != MAC_PROTO_NONE || > + skb->protocol != htons(ETH_P_NSH)) { > + return -EINVAL; > + } > + > + inner_proto = tun_p_to_eth_p(nsh->np); > + if (!inner_proto) > + return -ENOTSUPP; > + > + length = nsh_hdr_len(nsh); > + skb_pull(skb, length); > + skb_reset_mac_header(skb); > + skb_reset_mac_len(skb); > + skb->protocol = inner_proto; Please factor this out to skb_pop_nsh, similarly to what you did with skb_push_nsh. > + > + /* safe right before invalidate_flow_key */ > + if (inner_proto == htons(ETH_P_TEB)) > + key->mac_proto = MAC_PROTO_ETHERNET; > + else > + key->mac_proto = MAC_PROTO_NONE; > + invalidate_flow_key(key); > + return 0; > +} > + > static void update_ip_l4_checksum(struct sk_buff *skb, struct iphdr *nh, > __be32 addr, __be32 new_addr) > { > @@ -602,6 +655,53 @@ static int set_ipv6(struct sk_buff *skb, struct > sw_flow_key *flow_key, > return 0; > } > > +static int set_nsh(struct sk_buff *skb, struct sw_flow_key *flow_key, > +const struct ovs_key_nsh *key, > +const struct ovs_key_nsh *mask) > +{ > + struct nshhdr *nsh; > + int err; > + u8 flags; > + u8 ttl; > + int i; > + > + err = skb_ensure_writable(skb, skb_network_offset(skb) + > + sizeof(struct nshhdr)); > + if (unlikely(err)) > + return err; > + > + nsh = (struct nshhdr *)skb_network_header(skb); nsh_hdr > + > + flags = nsh_get_flags(nsh); >
Re: [ovs-dev] [PATCH] NEWS: Mark NSH support as experimental in 2.8.
On Tue, 29 Aug 2017 10:50:11 -0700, Ben Pfaff wrote: > This feature landed late in 2.8 and the NSH wire protocol itself is not > completely stable. Acked-by: Jiri Benc <jb...@redhat.com> ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH
On Fri, 25 Aug 2017 18:25:14 +0200, Jiri Benc wrote: > While looking at this, I realized that GSO for VXLAN-GPE is broken, > too. Let me fix it by implementing what I described above which will > make your patch much easier. Okay, it's not really broken and we don't need that complexity. At least not immediately. Hw offloading in the VXLAN-GPE case probably does not work correctly and would benefit from that change but that's a different beast to tackle at a different time. Software segmentation works fine for VXLAN-GPE. There should not be much problems with NSH segmentation, either, if we carefully store and set mac_header, mac_len and skb->protocol around calls to skb_mac_gso_segment. Note that with zero mac_len (and correct skb->protocol), skb_mac_gso_segment behaves in the same way that you tried to achieve with find_gso_segment_by_type, which is thus completely unnecessary. More on Monday. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH
On Fri, 25 Aug 2017 22:20:04 +0800, Yi Yang wrote: > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -766,7 +766,7 @@ struct sk_buff { > __u8ndisc_nodetype:2; > #endif > __u8ipvs_property:1; > - __u8inner_protocol_type:1; > + __u8inner_protocol_type:2; Adding anything to sk_buff is pretty much forbidden. You can't add more bytes to it and there are no more free bits to use, either. Luckily, we still have one byte hole next to inner_ipproto that we can use. What is needed is renaming of ENCAP_TYPE_IPPROTO to ENCAP_TYPE_L3 and storing the L3 type in the unused byte. It's not beautiful (would be better to use ethertype than a custom enum) but it will work. While looking at this, I realized that GSO for VXLAN-GPE is broken, too. Let me fix it by implementing what I described above which will make your patch much easier. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v6 1/3] net: add NSH header structures and helpers
On Fri, 25 Aug 2017 22:20:03 +0800, Yi Yang wrote: > --- a/include/uapi/linux/if_ether.h > +++ b/include/uapi/linux/if_ether.h > @@ -138,6 +138,7 @@ > #define ETH_P_IEEE802154 0x00F6 /* IEEE802.15.4 frame > */ > #define ETH_P_CAIF 0x00F7 /* ST-Ericsson CAIF protocol*/ > #define ETH_P_XDSA 0x00F8 /* Multiplexed DSA protocol */ > +#define ETH_P_NSH0x894F /* Ethertype for NSH. */ This is in a wrong section. It belongs to Ethernet protocols a few lines higher in the file (after ETH_P_HSR). Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v4] openvswitch: enable NSH support
On Tue, 22 Aug 2017 17:38:26 +0800, Yang, Yi wrote: > I checked GSO support code, we have two cases to handle, one case is to > output NSH packet to VxLAN-gpe port, the other case is to output NSH packet > to Ethernet port (tap, veth, vhost, physical eth, etc.), I don't think > it is a good way to do GSO support in OVS module, because we can't know > the final output port at this point, if the final output port is > VxLAN-gpe port, it will still face GSO issue, but I didn't see vxlan > module handles this, maybe udp tunnel did this. I think a NSH packet can > be split into two packets, one has NSH header, another one doesn't, > so I'm not sure how vxlan module can avoid this situation. As I said before, by either implementing GSO support for NSH or by segmenting in software before pushing the NSH header. In the first case, the packet will be software segmented before being pushed to the hardware, in the second case, it will be software segmented before pushing the NSH header. > For the second case, we can add a nsh GSO module in net/nsh/nsh_gso.c > (as MPLS did in net/mpls/mpls_gso.c), that is the best way to handle > this, what do you think about it? Yes, that's one of the two options that we've been talking about. It's the better and the more efficient one. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v4] openvswitch: enable NSH support
On Mon, 21 Aug 2017 10:10:38 +, Jan Scheurich wrote: > If I understand correctly, this is a default definition that can be > overridden by drivers so that we still cannot rely on the Ethernet > payload always being 32-bit-aligned. Not by drivers, by architectures. Only architectures that can efficiently handle unaligned access (or on which the cost of handling unaligned access is lower than the cost of unaligned DMA access) set NET_IP_ALIGN to 0. > Furthermore, there seem to be machine architectures that cannot > handle misaligned 32-bit access at all (not even with a performance > penalty). Those leave NET_IP_ALIGN to 2. > Or why else does OVS user space code take so great pain to model > possible misalignment and provide/use safe access functions? I don't know how the ovs user space deals with packet allocation. In the kernel, the network header is aligned in a way that it allows efficient 32bit access. > Does Linux kernel code generally ignore this risk? Given the fact that IPv4 addresses are 32bit, are accessed as such and one can't say that IPv4 implementation on Linux is non-functional, the answer is obvious :-) Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v4] openvswitch: enable NSH support
On Mon, 21 Aug 2017 17:15:42 +0800, Yang, Yi wrote: > The issue is it is used union in > > struct nsh_hdr { > ovs_be16 ver_flags_ttl_len; > uint8_t md_type; > uint8_t next_proto; > ovs_16aligned_be32 path_hdr; > union { > struct nsh_md1_ctx md1; > struct nsh_md2_tlv md2; > }; > }; This should work (modulo the non-kernel type names, of course). Did you mean to put [] after md2? > in Linux kernel build, it complained it, I changed it to What was the error message? > struct nsh_hdr { > ovs_be16 ver_flags_ttl_len; > uint8_t md_type; > uint8_t next_proto; > ovs_16aligned_be32 path_hdr; > union { > struct nsh_md1_ctx md1; > struct nsh_md2_tlv md2[0]; > }; > }; I wouldn't use this. First, zero length array is a GCC extension. It would indeed be better not to use that in uAPI. Second, I wouldn't even use a flexible array member here, see my reply to Jan for the reasons. Note that I commented on struct nsh_md2_tlv having __u8[] as the last member which IMHO makes good sense. I'm not entirely sure what C99 says about flexible array member being part of a struct inside union inside a struct, though. GCC seems to cope with that just fine but AFAIK it has some extension over the C standard wrt. flexible array members. > I don't know how we can support this, is it a must-have thing? What would happen if you get a GSO packet? Ports of an ovs bridge claim GSO support, thus they may get a GSO packet. You have to handle it one way or the other: either software segment the packet before pushing the header, or implement proper GSO support for NSH. > But struct nsh_hdr had different struct from struct ovs_key_nsh, we > have no way to make them completely same, do you mean we should use the > same name if they are same fields and represent the same thing? Yes. Thanks, Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v4] openvswitch: enable NSH support
On Fri, 18 Aug 2017 15:26:01 +0200, Jiri Benc wrote: > Out of time for today, will continue the review next week. Again, feel > free to send a new version meanwhile or wait for the rest of the > review, whatever works better for you. One more thing: before you send a new version, be sure and double check that you addressed *all* of the feedback. In this version, you still have not addressed a few things I gave feedback on in the previous version. This wastes everyone's time. It doesn't mean I'm always right, of course, but you either explain why I'm wrong or change the code. It's generally not acceptable to ignore feedback. Thank you, Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v4] openvswitch: enable NSH support
On Fri, 18 Aug 2017 15:24:31 +0800, Yi Yang wrote: > +struct nsh_md2_tlv { > + __be16 md_class; > + u8 type; > + u8 length; > + /* Followed by variable-length data. */ > +}; What was wrong with the u8[] field that was present at the end of the struct in the previous version of the patch? > +#define NSH_M_TYPE2_MAX_LEN 256 This is defined twice, please delete this define and keep the one lower in the file. > +#define NSH_DST_PORT4790 /* UDP Port for NSH on VXLAN. */ This is a VXLAN-GPE port, it has nothing to do with NSH (except that VXLAN-GPE can contain a NSH packet). It's also unused. Please remove it. > +/* NSH Metadata Length. */ > +#define NSH_M_TYPE1_MDLEN 16 This is unused and it seems it's not much useful anyway, sizeof(struct nsh_md1_ctx) provides the same value. Please remove this define. > +#define NSH_MD1_CTX(nsh_hdr_ptr) (&(nsh_hdr_ptr)->md1) > + > +#define NSH_MD2_CTX(nsh_hdr_ptr) (&(nsh_hdr_ptr)->md2) Please remove these two. They are unused and would just obscure things anyway. > +static inline struct nsh_md1_ctx *nsh_md1_ctx(struct nsh_hdr *nsh) > +{ > + return >md1; > +} > + > +static inline struct nsh_md2_tlv *nsh_md2_ctx(struct nsh_hdr *nsh) > +{ > + return >md2; > +} And remove these too, for the same reason. Just use nsh->md1 when you need the metadata, there's no reason for these helper functions. They just obscure things. > +static inline void nsh_set_flags_and_ttl(struct nsh_hdr *nsh, u8 flags, u8 > ttl) > +{ > + nsh->ver_flags_ttl_len > + = htons((ntohs(nsh->ver_flags_ttl_len) > + & ~(NSH_FLAGS_MASK | NSH_TTL_MASK)) > + | ((flags << NSH_FLAGS_SHIFT) & NSH_FLAGS_MASK) > + | ((ttl << NSH_TTL_SHIFT) & NSH_TTL_MASK)); > +} > + > +static inline void nsh_set_flags_ttl_len(struct nsh_hdr *nsh, u8 flags, > + u8 ttl, u8 len) > +{ > + nsh->ver_flags_ttl_len > + = htons((ntohs(nsh->ver_flags_ttl_len) > + & ~(NSH_FLAGS_MASK | NSH_TTL_MASK | NSH_LEN_MASK)) > + | ((flags << NSH_FLAGS_SHIFT) & NSH_FLAGS_MASK) > + | ((ttl << NSH_TTL_SHIFT) & NSH_TTL_MASK) > + | ((len << NSH_LEN_SHIFT) & NSH_LEN_MASK)); > +} Okay. Could those two perhaps use a common function? static inline void __nsh_set_flags(struct nsh_hdr *nsh, u16 value, u16 mask) { nsh->ver_flags_ttl_len = nsh->ver_flags_ttl_len & ~htons(mask) | htons(value); } static inline void nsh_set_flags_and_ttl(struct nsh_hdr *nsh, u8 flags, u8 ttl) { __nsh_set_flags(nsh, flags << NSH_FLAGS_SHIFT | ttl << NSH_TTL_SHIFT, NSH_FLAGS_MASK | NSH_TTL_MASK); } etc. > +static int push_nsh(struct sk_buff *skb, struct sw_flow_key *key, > + const struct nsh_hdr *nsh_src) > +{ [...] > + if (!skb->inner_protocol) > + skb_set_inner_protocol(skb, skb->protocol); I was wondering about this during the reviews of the previous versions. Now I've given this more thought but I still don't see it - why is the inner_protocol set here? > + case OVS_KEY_ATTR_NSH: { > + struct ovs_key_nsh nsh; > + struct ovs_key_nsh nsh_mask; > + size_t size = nla_len(a) / 2; > + struct nlattr attr[1 + size / sizeof(struct nlattr) + 1]; > + struct nlattr mask[1 + size / sizeof(struct nlattr) + 1]; > + > + attr->nla_type = nla_type(a); > + mask->nla_type = attr->nla_type; > + attr->nla_len = NLA_HDRLEN + size; > + mask->nla_len = attr->nla_len; > + memcpy(attr + 1, (char *)(a + 1), size); > + memcpy(mask + 1, (char *)(a + 1) + size, size); No, please. See my reply to the previous version for how to do this in a less hacky way. > + case OVS_ACTION_ATTR_PUSH_NSH: { > + u8 buffer[256]; > + struct nsh_hdr *nsh_hdr = (struct nsh_hdr *)buffer; > + const struct nsh_hdr *nsh_src = nsh_hdr; > + > + nsh_hdr_from_nlattr(nla_data(a), nsh_hdr); This is very dangerous security wise. You have to protect against buffer overflow, one way or other. The current code may not overflow (I have not checked that, though) but a future addition may break the assumption without being obvious it's a problem. Note that the previous version had exactly the same problem but it was hidden and I didn't notice it. Which means that getting rid of that push_nsh_para struct was a very good thing, the code is more clean and more obvious now. > +static int parse_nsh(struct sk_buff *skb, struct sw_flow_key *key) > +{ > + struct nsh_hdr *nsh = (struct nsh_hdr *)skb_network_header(skb); > + u8 version, length; > + int err; > + > + err = check_header(skb, NSH_BASE_HDR_LEN); > + if (unlikely(err)) > +
Re: [ovs-dev] [PATCH v3] openvswitch: enable NSH support
On Wed, 16 Aug 2017 17:31:30 +0800, Yang, Yi wrote: > On Wed, Aug 16, 2017 at 11:19:21AM +0200, Jiri Benc wrote: > > > --- a/include/uapi/linux/openvswitch.h > > > +++ b/include/uapi/linux/openvswitch.h > > [...] > > > +#define NSH_MD1_CONTEXT_SIZE 4 > > > > Please move this to nsh.h and use it there, too, instead of the open > > coded 4. > > ovs code is very ugly, it will convert array[4] in > datapath/linux/compat/include/linux/openvswitch.h to other struct, I > have to change context[4] to such format :-), we can use 4 here for > Linux kernel. Oh, right, this is uAPI and nsh.h is kernel internal. My suggestion was nonsense, let's keep it as it was in your patch. > > > + case OVS_KEY_ATTR_NSH: { > > > + struct ovs_key_nsh nsh; > > > + struct ovs_key_nsh nsh_mask; > > > + size_t size = nla_len(a) / 2; > > > + struct nlattr attr[1 + DIV_ROUND_UP(sizeof(struct ovs_key_ipv6) > > > + , sizeof(struct nlattr))]; > > > + struct nlattr mask[1 + DIV_ROUND_UP(sizeof(struct ovs_key_ipv6) > > > + , sizeof(struct nlattr))]; > > > + > > > + attr->nla_type = nla_type(a); > > > + mask->nla_type = attr->nla_type; > > > + attr->nla_len = NLA_HDRLEN + size; > > > + mask->nla_len = attr->nla_len; > > > + memcpy(attr + 1, (char *)(a + 1), size); > > > + memcpy(mask + 1, (char *)(a + 1) + size, size); > > > > This is too hacky. Please find a better way to handle this. > > > > One option is to create a struct with struct nlattr as the first member > > followed by a char buffer. Still not nice but at least it's clear > > what's the intent. > > The issue is nested attributes only can use this way, nested attribute > for SET_MASKED is very special, we have to handle it specially. I'm not sure you understood what I meant. Let me explain in code: struct { struct nlattr attr; struct ovs_key_ipv6 data; } attr, mask; attr->attr.nla_type = nla_type(a); attr->attr.nl_len = NLA_HDRLEN + size; memcpy(>data, a + 1, size); It's much less hacky and doing the same thing. I'm not sure we don't need verification of size not overflowing the available buffer. Is it checked beforehand elsewhere? I also spotted one more bug: the 'mask' variable is not used anywhere. The second call of nsh_key_from_nlattr should use mask, not attr. > > > + key->nsh.path_hdr = nsh->path_hdr; > > > + switch (key->nsh.mdtype) { > > > + case NSH_M_TYPE1: > > > + if ((length << 2) != NSH_M_TYPE1_LEN) > > > > Why length << 2? > > len in NSH header is number of 4 octets, so need to multiply 4. Should nsh_get_len take care of that, then? Thanks, Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support
On Mon, 14 Aug 2017 10:35:42 +, Jan Scheurich wrote: > Is it worth to speculate on how a hypothetical future NSH version > (with a different Version value in the Base header) might look like? Absolutely. This is uAPI we're talking about and once merged, it's set in stone. Whatever we come up with should allow future extensibility. > If this should ever arise, we could introduce a new push_nsh_v2 > action. Which would mean we failed with the design. We would have to maintain two parallel APIs forever. This is not how the design should be made. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support
On Sun, 13 Aug 2017 21:13:57 +, Jan Scheurich wrote: > Jiri, I am not too familiar with conventions on the OVS netlink > interface regarding the handling of variable length fields. What is > the benefit of structuring the push_nsh action into > > OVS_ACTION_ATTR_PUSH_NSH > +-- OVS_ACTION_ATTR_NSH_BASE_HEADER > +-- OVS_ACTION_ATTR_NSH_MD > > instead of grouping the base header fields and the variable length MD > into a single monolithic attribute OVS_ACTION_ATTR_PUSH_NSH? Is the > main concern a potential future need to add additional parameters to > the push_nsh datapath action? Are there examples for such structured > actions other than OVS_ACTION_ATTR_CT where the need is obvious > because it is very polymorphic? This is about having the design clean. What would you do if you had two variable length fields? Would you still put them in a single structure and have a length field in the structure, too? That would be wrong, we have length in the attribute header. I doubt you would do that. Which indicates that putting variable length fields to an attribute with anything other is wrong. Also, look at how ugly the code would be. You'd have to subtract the base header length from the attribute length to get the variable data length. That's not nice at all. Think about the netlink in the way that by default, every field should have its own attribute. Structures are included only for performance reasons where certain fields are always passed in a message and having them in separate attributes would be impractical and waste of space. Going out of your way to include the context data in the structure thus doesn't make sense. > BTW: The name OVS_ACTION_ATTR_NSH_BASE_HEADER is misleading because > in the NSH draft the term base header is used for the first 32-bit > word, whereas here it includes also the 32-bit Service Path header. An excellent comment. This means that it's very well possible that future NSH versions may not include SP header or may have it of a different size. Maybe we should consider putting it into a separate attribute, too? Not sure it is needed, though, I don't think it's likely to happen. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support
On Fri, 11 Aug 2017 10:09:36 +, Jan Scheurich wrote: > Unless someone can explain to me why the datapath should understand the > internal structure/format of metadata in push_nsh, I would strongly > vote to keep the metadata as variable length octet sequence in the > non-structured OVS_ACTION_ATTR_PUSH_NSH Could be but it still needs to be in a different attribute and not in the ovs_action_push_nsh structure. Separate attributes for MD1/MD2 has the advantage of easier validation: with a separate MD1 type attribute, the size check is easier. With an unstructured MD attribute, we'd need to look into the OVS_ACTION_ATTR_NSH_BASE_HEADER attribute for mdtype and then validate the unstructured MD attribute size manually. Not a big deal, though. I don't have strong opinion here. But I do have strong opinion that MD needs to go into a separate attribute, whether there are separate attributes for MD1/2 or not. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support
On Fri, 11 Aug 2017 09:24:25 +, Yang, Yi Y wrote: > So far, we're not clear how we can support MD type 2 better, as I > explained before, we need to reuse tun_metadata in struct flow_tnl > which is the thing Geneve is using. Geneve predefined 64 keys for > this from tun_metadata0 to tun_metadata63, we will reuse it for MD > type 2. But you know NSH is not tunnel, so it has to be changed to > support both Geneve and NSH. Anyway, they won't be part of > ovs_key_nsh. Please do not top post. The context field does not apply to MD type 2. It looks wrong for the context field to be included in netlink attribute for anything other than MD type 1. Perhaps it needs to be put into a separate attribute, too? Note that I'm talking only about the uAPI. Internally, ovs can use struct ovs_key_nsh that is MD type 1 only, there's no problem changing that later. But for the user space interface, this needs to be clean. This can be solved for example this way: In include/uapi/linux/openvswitch.h: struct ovs_key_nsh_base { __u8 flags; __u8 mdtype; __u8 np; __u8 pad; __be32 path_hdr; }; + one more netlink attribute carrying MD type 1 info. Will probably require to change OVS_KEY_ATTR_NSH to a nested attribute etc. In net/openvswitch/flow.h (or perhaps a different header would be more appropriate?): struct ovs_key_nsh { struct ovs_key_nsh_base base; __be32 context[4]; }; Plus needed conversions between OVS_KEY_ATTR_NSH and struct ovs_key_nsh when interfacing between the kernel and user space. That way, we can have MD type 1 support only for now while still being allowed to redesign things in whatever way later. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support
On Fri, 11 Aug 2017 16:47:23 +0800, Yang, Yi wrote: > is "__be32 context[4]" ok? Yes, that looks better. > So define three new netlink attributes > > OVS_ACTION_ATTR_NSH_BASE_HEADER > OVS_ACTION_ATTR_NSH_MD1_DATA > OVS_ACTION_ATTR_NSH_MD2_DATA > > OVS_ACTION_ATTR_PUSH_NSH is nested netlink attribute, it will nest > OVS_ACTION_ATTR_NSH_BASE_HEADER and OVS_ACTION_ATTR_NSH_MD1_DATA for MD > type 1, it will nest OVS_ACTION_ATTR_NSH_BASE_HEADER and > OVS_ACTION_ATTR_NSH_MD2_DATA for MD type 2. I'll compeletely remove struct > ovs_action_push_nsh, is it ok? Yes, that's the way to do it. What should be done with struct ovs_key_nsh? Even with "c" renamed to "context", it's still MD type 1 only structure. What is the plan for MD type 2 support wrt. this structure? Thanks, Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support
On Thu, 10 Aug 2017 21:21:15 +0800, Yi Yang wrote: > OVS master and 2.8 branch has merged NSH userspace > patch series, this patch is to enable NSH support > in kernel data path in order that OVS can support > NSH in 2.8 release in compat mode by porting this. Please include changelog when posting a new version of a patch. > +static inline u16 > +nsh_hdr_len(const struct nsh_hdr *nsh) Single line, please. And all other instances of this in nsh.h, too. > --- a/include/uapi/linux/openvswitch.h > +++ b/include/uapi/linux/openvswitch.h > @@ -333,6 +333,7 @@ enum ovs_key_attr { > OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */ > OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4, /* struct ovs_key_ct_tuple_ipv4 */ > OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6, /* struct ovs_key_ct_tuple_ipv6 */ > + OVS_KEY_ATTR_NSH, /* struct ovs_key_nsh */ > > #ifdef __KERNEL__ > OVS_KEY_ATTR_TUNNEL_INFO, /* struct ip_tunnel_info */ > @@ -491,6 +492,15 @@ struct ovs_key_ct_tuple_ipv6 { > __u8 ipv6_proto; > }; > > +struct ovs_key_nsh { > + __u8 flags; > + __u8 mdtype; > + __u8 np; > + __u8 pad; > + __be32 path_hdr; > + __be32 c[4]; I still don't like the "c" name. Please change it to something descriptive. This is uAPI and can't be changed later. And I still don't see my comment about this not being extensible for MD type 2 addressed. Please understand this is uAPI and it is set in stone once it is merged into the kernel. It's very important we get this right since the beginning. > +struct ovs_action_push_nsh { > + __u8 flags; > + __u8 mdtype; > + __u8 mdlen; > + __u8 np; > + __be32 path_hdr; > + __u8 metadata[]; > +}; This is not how netlink attributes work. Please reread Ben Pfaff's explanation on how this needs to be structured (Message-ID: <20170809180912.gu6...@ovn.org>) and rework the patch. I 100% agree with what he wrote, his proposal is very clean and matches how netlink is designed. > @@ -835,6 +866,8 @@ enum ovs_action_attr { > OVS_ACTION_ATTR_TRUNC,/* u32 struct ovs_action_trunc. */ > OVS_ACTION_ATTR_PUSH_ETH, /* struct ovs_action_push_eth. */ > OVS_ACTION_ATTR_POP_ETH, /* No argument. */ > + OVS_ACTION_ATTR_PUSH_NSH, /* struct ovs_action_push_nsh. */ > + OVS_ACTION_ATTR_POP_NSH, /* No argument. */ Thank you for changing this to push/pop, it looks much cleaner now. > +static int parse_nsh(struct sk_buff *skb, struct sw_flow_key *key) > +{ > + struct nsh_hdr *nsh = (struct nsh_hdr *)skb_network_header(skb); > + u16 ver_flags_len; > + u8 version, length; > + u32 path_hdr; > + int i; > + > + memset(>nsh, 0, sizeof(struct ovs_key_nsh)); > + ver_flags_len = ntohs(nsh->ver_flags_len); > + version = (ver_flags_len & NSH_VER_MASK) >> NSH_VER_SHIFT; > + length = (ver_flags_len & NSH_LEN_MASK) >> NSH_LEN_SHIFT; A nit: the operation to get/set version, length and flags from the NSH header seems to be repeated enough to warrant helper functions in include/net/nsh.h. Something like: static inline u8 nsh_get_version(const struct nsh_hdr *nsh) { return (ntohs(nsh->ver_flags_len) & NSH_VER_MASK) >> NSH_VER_SHIFT; } etc. Not a blocker, though, it may be done later if needed. > @@ -76,9 +77,11 @@ static bool actions_may_change_flow(const struct nlattr > *actions) > > case OVS_ACTION_ATTR_CT: > case OVS_ACTION_ATTR_HASH: > + case OVS_ACTION_ATTR_POP_NSH: > case OVS_ACTION_ATTR_POP_ETH: > case OVS_ACTION_ATTR_POP_MPLS: > case OVS_ACTION_ATTR_POP_VLAN: > + case OVS_ACTION_ATTR_PUSH_NSH: > case OVS_ACTION_ATTR_PUSH_ETH: > case OVS_ACTION_ATTR_PUSH_MPLS: > case OVS_ACTION_ATTR_PUSH_VLAN: Alphabetical order, please. Thanks, Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next] openvswitch: add NSH support
On Tue, 8 Aug 2017 12:59:40 +0800, Yi Yang wrote: > +struct ovs_key_nsh { > + __u8 flags; > + __u8 mdtype; > + __u8 np; > + __u8 pad; > + __be32 path_hdr; > + __be32 c[4]; "c" is a very poor name. Please rename it to something that expresses what this field contains. Also, this looks like MD type 1 only. How are those fields going to work with MD type 2? I don't think MD type 2 implementation is necessary in this patch but I'd like to know how this is going to work - it's uAPI and thus set in stone once this is merged. The uAPI needs to be designed with future use in mind. > +#define OVS_ENCAP_NSH_MAX_MD_LEN 16 > +/* > + * struct ovs_action_encap_nsh - %OVS_ACTION_ATTR_ENCAP_NSH > + * @flags: NSH header flags. > + * @mdtype: NSH metadata type. > + * @mdlen: Length of NSH metadata in bytes. > + * @np: NSH next_protocol: Inner packet type. > + * @path_hdr: NSH service path id and service index. > + * @metadata: NSH metadata for MD type 1 or 2 > + */ > +struct ovs_action_encap_nsh { > + __u8 flags; > + __u8 mdtype; > + __u8 mdlen; > + __u8 np; > + __be32 path_hdr; > + __u8 metadata[OVS_ENCAP_NSH_MAX_MD_LEN]; This is wrong. The metadata size is set to a fixed size by this and cannot be ever extended, or at least not easily. Netlink has attributes for exactly these cases and that's what needs to be used here. > @@ -835,6 +866,8 @@ enum ovs_action_attr { > OVS_ACTION_ATTR_TRUNC,/* u32 struct ovs_action_trunc. */ > OVS_ACTION_ATTR_PUSH_ETH, /* struct ovs_action_push_eth. */ > OVS_ACTION_ATTR_POP_ETH, /* No argument. */ > + OVS_ACTION_ATTR_ENCAP_NSH,/* struct ovs_action_encap_nsh. */ > + OVS_ACTION_ATTR_DECAP_NSH,/* No argument. */ Use "push" and "pop", not "encap" and "decap" to be consistent with the naming in the rest of the file. We use encap and decap for tunnel operations. This code does not use lwtunnels, thus push and pop is more appropriate. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v3 1/6] vxlan: refactor verification and application of configuration
This patchset looks good overall (would send my Acked-by for most of this but I'm late). On Mon, 19 Jun 2017 10:03:55 +0200, Matthias Schiffer wrote: > Log messages in these > functions are removed, as it is generally unexpected to find error output > for netlink requests in the kernel log. Userspace should be able to handle > errors based on the error codes returned via netlink just fine. However, this is not really true. It's impossible to find out what went wrong when you use e.g. iproute2 to configure a vxlan link. We really need to convert the kernel log messages to the extended netlink errors. Since you removed them prematurely, could you please work on that? Thanks, Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4 2/7] userspace: Support for push_eth and pop_eth actions
On Thu, 4 May 2017 12:53:25 -0700, Joe Stringer wrote: > On 4 May 2017 at 12:51, Joe Stringerwrote: > > Really? Previously the OVS usage of netlink always validated that > > netlink attributes are exactly as long as expected and that there's no > > trailing data. > > With a glance at __ovs_nla_copy_actions(), this is also true for > OVS_ACTION_ATTR_PUSH_ETH. Right, sorry. I completely forgot about the special netlink handling of ovs. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4 2/7] userspace: Support for push_eth and pop_eth actions
On Thu, 4 May 2017 10:02:22 +, Zoltán Balogh wrote: > I'm afraid the kernel would not accept that OVS_ACTION_ATTR_PUSH_ETH > action in this case. > > Jiri, could you confirm, please? It would accept it and it would ignore the eth_type field. The kernel does not verify whether netlink attributes are longer than they should be. It just accepts such attributes and silently ignores the excessive data. I would discourage doing this, though. The data structures between the kernel and the user space should match. (As a side note, we're currently having a discussion that this treatment of excessively long attributes should be changed. Of course this would require the user space asking for such new behavior by setsockopt to keep compatibility with existing programs.) > We could remove the eth_type from struct ovs_action_push_eth, then > put an additional "set action" after putting "push_eth" in the > odp_put_push_eth_action() function in order to set the ethertype of > the packet. This seems weird, though. Under which circumstances does ovs not know what ethertype the packet has? The kernel knows about it (via skb->protocol) all the time, with or without the Ethernet header. No flow matching could work otherwise. > That way we would split the push_eth action into a simpler push_eth > and a set_field. However this would lead to modify > odp_execute_set_action()as well, since changing of ethertype with > set_field is not allowed: Disallowing changing of ethertype is a sensible thing to do. > I guess something similar should be done at kernel side too, since > the kernel module would not accept set_field ethertype either. It does not need it. The kernel knows what the ethertype is. > Another option would be to convert OVS_ACTION_ATTR_PUSH_ETH action > argument between userspace and kernel. > > Do you have any other proposal? > > Btw, the original comment message of struct ovs_action_push_eth > indicates eth_type as a second member. Thanks, this is a (documentation) bug. A leftover from one of the previous versions that nobody noticed. I'll remove it upstream once net-next reopens. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] Sync on PTAP, EXT-382 and NSH
On Wed, 18 Jan 2017 09:53:18 +, Jan Scheurich wrote: > Please be invited to the next sync meeting. Sorry, won't make the meeting today. Too busy with DevConf.cz preparations. > Actions Points: > AP-1 (Jarno): Coordinate review of Yi's backported net-next patches > AP-2 (Jiri) Check the ability of the kernel datapath to match on the > presence of Ethernet header w/o matching on Ethernet addresses. > Formating of such DP flows? The OVS_KEY_ATTR_ETHERNET netlink attribute contains source and destination MAC address. It can be masked. If the OVS_KEY_ATTR_ETHERNET attribute is present, only Ethernet frames will match. The MAC addresses may be masked (wildcarded). The OVS_KEY_ATTR_ETHERTYPE attribute indicates the ethertype to match and cannot be masked (it's always an exact match). If it's not present, ETH_P_802_2 is assumed. If the OVS_KEY_ATTR_ETHERNET in not present, the OVS_KEY_ATTR_ETHERTYPE attribute is mandatory and indicates the ethertype to match. Again, it cannot be masked. To answer the question, it's currently possible to match on Ethernet header without matching on Ethernet addresses but not without matching on ethertype. I think it can be relaxed with some work if needed. > AP-3 (Jiri) Provide an update on the status and ETA of the RTNETLINK API > patches RFC patchset posted last week: https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327749.html Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC PATCH] datapath: allow tunnels to be created with rtnetlink
On Tue, 10 Jan 2017 15:44:07 +0100, Jiri Benc wrote: > create an ovs_geneve interface using rtnetlink This can be done just once. The pseudocode at startup thus would be: create an ovs_geneve interface using rtnetlink if successful { delete the created interface set out_of_tree flag } And interface creation: if not out_of_tree { create lwtunnel geneve interface using rtnetlink check parameters of the created interface if it is lwtunnel { done, exit } else { delete the created interface } } create geneve interface using genetlink if successful { done } else { fail } Is that what you want? If so, should the out_of_tree flag be queried and set per tunnel type, or just based globally on ovs_geneve (or a different kind)? Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] Sync on PTAP, EXT-382 and NSH
On Mon, 9 Jan 2017 11:15:22 +, Yang, Yi Y wrote: > We need to let users have one NSH version available before your > proposal is implemented. I support your proposal, but I have no way > to do anything helpful before your implementation for generic > encap/decap and packet_type are available. Implementing something that we know will be obsolete in a relatively short time does not make much sense. It's painful but the focus here is long term maintainability, not short time gain. > I don't know what other guys are thinking about this, it seems only > you and I are caring this :) The truth is very few people care about NSH in ovs. There is a lot of people who *claim* they care and that they need it implemented ASAP but when it comes to doing real work, it's only you, Ericsson and partially Red Hat. Which, in my opinion, pretty much reflects the urgency. If people did really care, they would invest into this. Hopefully this will change now with Jan taking lead in this. I like Jan's proposal. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] Sync on PTAP, EXT-382 and NSH
Hi Jan, On Sun, 18 Dec 2016 14:44:03 +, Jan Scheurich wrote: > I would like to call for a final sync meeting before the Christmas break. you managed to hit almost the only slot in this week when I have a conflict with another meeting. I'm not available on Tuesday starting at the time you chose. Moving the meeting to an earlier time slot or to a different day (except for today) would work for me. Thanks, Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next 17/27] OVS: remove assumptions about VLAN_TAG_PRESENT bit
On Tue, 13 Dec 2016 01:12:38 +0100 (CET), Michał Mirosław wrote: > @@ -850,20 +848,11 @@ static int validate_vlan_from_nlattrs(const struct > sw_flow_match *match, > return -EINVAL; > } > > - if (a[OVS_KEY_ATTR_VLAN]) > - tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]); > - > - if (!(tci & htons(VLAN_TAG_PRESENT))) { > - if (tci) { > - OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT > bit set.", > - (inner) ? "C-VLAN" : "VLAN"); > - return -EINVAL; > - } else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) { > - /* Corner case for truncated VLAN header. */ > - OVS_NLERR(log, "Truncated %s header has non-zero encap > attribute.", > - (inner) ? "C-VLAN" : "VLAN"); > - return -EINVAL; > - } > + if (!a[OVS_KEY_ATTR_VLAN] && nla_len(a[OVS_KEY_ATTR_ENCAP])) { > + /* Corner case for truncated VLAN header. */ > + OVS_NLERR(log, "Truncated %s header has non-zero encap > attribute.", > + (inner) ? "C-VLAN" : "VLAN"); > + return -EINVAL; This looks wrong. The zero value of nla_get_be16(a[OVS_KEY_ATTR_VLAN]) together with empty a[OVS_KEY_ATTR_ENCAP] means truncated VLAN header and this needs to be preserved. In other words, you need to check also !nla_get_be16(a[OVS_KEY_ATTR_VLAN]) here. Overall, this whole patch looks very suspicious from the very beginning. The xors used are strong indication that something is not right. And indeed, you're breaking uAPI compatibility. Previously, the VLAG_TAG_PRESENT bit set in OVS_KEY_ATTR_VLAN caused the flow to match on packets with VLAN tag present. After this patch, it causes the flow to match only on those packets that have a certain value of VLAN_CFI_MASK in their VLAN tag (I haven't bother deciphering what value is checked after all these xors, it's as well possible that it will also match on packets _without_ any VLAN header). You have to preserve the old meaning of VLAN_TAG_PRESENT in OVS_KEY_ATTR_VLAN. When doing flow lookups, VLAN_TAG_PRESENT must match on and only on packets that have VLAN tag present, irrespective of their VLAN_CFI_MASK. If you want to introduce support for lookups on VLAN_CFI_MASK, you'll have to do that by introducing a new netlink attribute. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC PATCH] datapath: allow tunnels to be created with rtnetlink
On Thu, 8 Dec 2016 22:49:56 -0800, Pravin Shelar wrote: > OVS out of tree kernel module is using compat tunnel code upto kernel > 4.5 kernel even thought LWT is available in these kernels. This is due > to missing features on these kernel which are backported to OVS > module. In future we could bump up requirements of kernel again. > Therefore I think we could add compat code for GPE given it is not > that complicated. What I'm concerned about is not so much the code in the out of tree module but the added code that would have to try genetlink for VXLAN-GPE. I hoped to use rtnetlink only for this; adding genetlink would required quite a bit of code (also because it will have to be tried only for the out of tree module but never for the in kernel one; this differs from what we'll do for VXLAN itself). We'd end up with three different handling for different features: (1) genetlink+rtnetlink for both out of tree and in tree module (for e.g. plain VXLAN) (2) genetlink+rtnetlink for out of tree module, rtnetlink for in three module (for VXLAN-GPE) (3) rtnetlink for both out of tree and in tree module (new VXLAN features added in the future) I hoped to avoid this. Especially given that (2) covers only very limited time period. But if you insist... Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next] openvswitch: fix VxLAN-gpe port can't be created in ovs compat mode
On Thu, 8 Dec 2016 18:57:51 +0800, Yang, Yi wrote: > So ovs out of tree modules need to adapt to upstream kernel, any > kernel-related changes must be accepted by Linux kernel at first. I'm perfectly aware of that and I'm saying that your patch is unacceptable for upstream kernel. This is a long standing policy of the kernel: there's no way you can get a patch into the kernel to accommodate an out of tree kernel module. The policy is there for good reasons and as paradoxical as it may sound, it benefits the projects that employ out of tree modules in the long run. If Open vSwitch wants to carry a non-upstream patch, it's its choice and we can have that discussion but that's not something to discuss on netdev@vger nor propose for net-next. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC PATCH] datapath: allow tunnels to be created with rtnetlink
On Wed, 7 Dec 2016 16:35:59 -0800, Pravin Shelar wrote: > In compat mode, OVS tunnel devices are not used in same way as LWT, > since OVS do support kernel version that does not have core LWT > support. Therefore we have to use legacy vport APIs to create these > ports. I see. Yes, that's unfortunate. > There might be a way to configure the device, once it is > created, using rtnetlink API but would complicate the code. So I think > in such cases like GPE we could to add code to the legacy code. Could we just support the newest shiniest features only with lwtunnel capable kernel? Kernel 4.3 is out for more than a year already, that's a long time. And several more months will pass before this is available in an Open vSwitch release. What about: - always preferring the out of tree module (whatever capabilities it has) - first try rtnetlink - if it fails, try genetlink - if it fails (but the out of tree module is there), just don't bother with kernel - then try the in kernel module - first rtnetlink - if it fails then genetlink This way, we would accommodate most of the stuff. With old kernels, VXLAN-GPE wouldn't be available even with the out of tree module (but I think that's it, I can't think of any other feature unavailable at this point; of course, more may be added in the future). Is this really that bad? It's (relatively) simply to implement, the out of tree module does not diverge from the in kernel one too much, and I don't think the requirement for lwtunnels capable kernel for the newest features is unreasonable. Thanks, Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next] openvswitch: fix VxLAN-gpe port can't be created in ovs compat mode
On Thu, 8 Dec 2016 16:20:10 +0800, Yi Yang wrote: > In ovs compat mode, ovs won't use LWT in current kernel, this is to > make sure ovs can work on the old kernels, Linux kernel v4.7 includes > VxLAN-gpe support but many Linux distributions' kernels are odler than > v4.7, this fix will ensure that ovs can create VxLAN-gpe port correctly > on old kernels, it has been verified on Ubuntu 16.04 x86_64 with Linux > kernel 4.4.0-53-generic. NAK. We do have a way to configure this and that's rtnetlink. Open vSwitch should use that to configure tunnels. Out of tree modules are on their own. Upstream kernel does not accommodate out of tree modules. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC PATCH] datapath: allow tunnels to be created with rtnetlink
On Wed, 7 Dec 2016 09:03:39 +0800, Yang, Yi wrote: > But ovs userspace use genetlink to create vxlan tunnel port when we > build ovs to use its own compat modules, this is case 3 Pravin, how > do you think we can handle this? Where's the problem? I don't see any. > Now the dilemma is we can't create vxlangpe port although it has been > in ovs, I know it is ok if we use upstream tunnel code, but in many > cases, Linux distribution didn't include recent kernels, we have to > use ovs' compat modules, Pravin's comments also addressed this. There's no problem in using VXLAN-GPE in the out of tree driver using rtnetlink. As Eric wrote, we're working on this. > So anybody will change current genetlink in ovs to rtnetlink? I will > try to change it if nobody will fix it. Another issue, I think ovs > will send rtnetlink message to kernel, right? Can ovs handle > rtnetlink message if we use ovs compat modules instead of upstream > kernel modules? I don't see any problem. We know how to handle all of the cases. Eric took over the work from Thadeu and he has been working on the patches. Nobody prevents you from submitting your own patches; but please do your own research in such case first. By asking tons of question you're not speeding things up, as we'll be spending time explaining instead of writing and testing the code. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC PATCH] datapath: allow tunnels to be created with rtnetlink
On Tue, 6 Dec 2016 07:17:20 +, Yang, Yi Y wrote: > So my advice about this is we can push patch [2] to Linux net-next > first, then apply patch series > https://mail.openvswitch.org/pipermail/ovs-dev/2016-June/316879.html > from Cascardo and apply [1], that can cover all the cases Pravin > mentioned. There's no reason to add this to the genetlink interface when we're going to switch to rtnetlink for tunnel configuration. NACKed-by: Jiri Benc <jb...@redhat.com> ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] Packet type-aware pipeline, L3 tunneling, EXT-382 and NSH: Minutes call 2016-11-29
On Wed, 30 Nov 2016 22:52:38 +, Jan Scheurich wrote: > The corresponding section in the document is already updated with the > results of the discussion. So please have look. There's also a short > summary of the main results at the end of the mail. Thanks for writing that down. I see several shortcomings in the proposal and several things make zero sense to me as proposed (probably EXT-382 fault). I commented on that in the document. Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v12 rebased 1/3] userspace: add support for pop_eth and push_eth actions
On Mon, 28 Nov 2016 16:56:27 -0800, Ben Pfaff wrote: > I know that this isn't intended to be applied yet, according to the > cover letter, but it doesn't build: > > ../lib/odp-util.c:5504:15: error: no member named 'base_layer' in 'struct > flow' > ../lib/odp-util.c:5504:40: error: no member named 'base_layer' in 'struct > flow' > ../lib/odp-util.c:5505:19: error: no member named 'base_layer' in 'struct > flow' > ../lib/odp-util.c:5505:33: error: use of undeclared identifier 'LAYER_2' > ../lib/odp-util.c:5512:20: error: no member named 'base_layer' in 'struct > flow' > ../lib/odp-util.c:5512:40: error: no member named 'base_layer' in 'struct > flow' > > and furthermore I can't any mention of base_layer in the OVS history so > I'm not sure what's going on here. It's in patch 2, seems that the order is mixed up. Sorry for that. Please apply all three patches together for testing. We'll be taking a different approach with this patchset, implementing EXT-382 instead. Thanks, Jiri ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v13 4/8] openvswitch: support MPLS push and pop for L3 packets
Update Ethernet header only if there is one. Signed-off-by: Jiri Benc <jb...@redhat.com> Acked-by: Pravin B Shelar <pshe...@ovn.org> --- net/openvswitch/actions.c | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index dc8bb97e2258..064cbcb7b0c5 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -187,7 +187,8 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key, skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN); - update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype); + if (ovs_key_mac_proto(key) == MAC_PROTO_ETHERNET) + update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype); skb->protocol = mpls->mpls_ethertype; invalidate_flow_key(key); @@ -197,7 +198,6 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key, static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key, const __be16 ethertype) { - struct ethhdr *hdr; int err; err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN); @@ -213,11 +213,15 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key, skb_reset_mac_header(skb); skb_set_network_header(skb, skb->mac_len); - /* mpls_hdr() is used to locate the ethertype field correctly in the -* presence of VLAN tags. -*/ - hdr = (struct ethhdr *)((void *)mpls_hdr(skb) - ETH_HLEN); - update_ethertype(skb, hdr, ethertype); + if (ovs_key_mac_proto(key) == MAC_PROTO_ETHERNET) { + struct ethhdr *hdr; + + /* mpls_hdr() is used to locate the ethertype field correctly in the +* presence of VLAN tags. +*/ + hdr = (struct ethhdr *)((void *)mpls_hdr(skb) - ETH_HLEN); + update_ethertype(skb, hdr, ethertype); + } if (eth_p_mpls(skb->protocol)) skb->protocol = ethertype; -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v13 3/8] openvswitch: pass mac_proto to ovs_vport_send
We'll need it to alter packets sent to ARPHRD_NONE interfaces. Change do_output() to use the actual L2 header size of the packet when deciding on the minimum cutlen. The assumption here is that what matters is not the output interface hard_header_len but rather the L2 header of the particular packet. For example, ARPHRD_NONE tunnels that encapsulate Ethernet should get at least the Ethernet header. Signed-off-by: Jiri Benc <jb...@redhat.com> Acked-by: Pravin B Shelar <pshe...@ovn.org> --- net/openvswitch/actions.c | 29 + net/openvswitch/vport.c | 2 +- net/openvswitch/vport.h | 2 +- 3 files changed, 19 insertions(+), 14 deletions(-) diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 44144f914920..dc8bb97e2258 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -66,6 +66,7 @@ struct ovs_frag_data { u16 vlan_tci; __be16 vlan_proto; unsigned int l2_len; + u8 mac_proto; u8 l2_data[MAX_L2_LEN]; }; @@ -673,7 +674,7 @@ static int ovs_vport_output(struct net *net, struct sock *sk, struct sk_buff *sk skb_reset_mac_len(skb); } - ovs_vport_send(vport, skb); + ovs_vport_send(vport, skb, data->mac_proto); return 0; } @@ -692,7 +693,7 @@ static int ovs_vport_output(struct net *net, struct sock *sk, struct sk_buff *sk * ovs_vport_output(), which is called once per fragmented packet. */ static void prepare_frag(struct vport *vport, struct sk_buff *skb, -u16 orig_network_offset) +u16 orig_network_offset, u8 mac_proto) { unsigned int hlen = skb_network_offset(skb); struct ovs_frag_data *data; @@ -705,6 +706,7 @@ static void prepare_frag(struct vport *vport, struct sk_buff *skb, data->network_offset = orig_network_offset; data->vlan_tci = skb->vlan_tci; data->vlan_proto = skb->vlan_proto; + data->mac_proto = mac_proto; data->l2_len = hlen; memcpy(>l2_data, skb->data, hlen); @@ -713,7 +715,8 @@ static void prepare_frag(struct vport *vport, struct sk_buff *skb, } static void ovs_fragment(struct net *net, struct vport *vport, -struct sk_buff *skb, u16 mru, __be16 ethertype) +struct sk_buff *skb, u16 mru, +struct sw_flow_key *key) { u16 orig_network_offset = 0; @@ -727,11 +730,12 @@ static void ovs_fragment(struct net *net, struct vport *vport, goto err; } - if (ethertype == htons(ETH_P_IP)) { + if (key->eth.type == htons(ETH_P_IP)) { struct dst_entry ovs_dst; unsigned long orig_dst; - prepare_frag(vport, skb, orig_network_offset); + prepare_frag(vport, skb, orig_network_offset, +ovs_key_mac_proto(key)); dst_init(_dst, _dst_ops, NULL, 1, DST_OBSOLETE_NONE, DST_NOCOUNT); ovs_dst.dev = vport->dev; @@ -742,7 +746,7 @@ static void ovs_fragment(struct net *net, struct vport *vport, ip_do_fragment(net, skb->sk, skb, ovs_vport_output); refdst_drop(orig_dst); - } else if (ethertype == htons(ETH_P_IPV6)) { + } else if (key->eth.type == htons(ETH_P_IPV6)) { const struct nf_ipv6_ops *v6ops = nf_get_ipv6_ops(); unsigned long orig_dst; struct rt6_info ovs_rt; @@ -751,7 +755,8 @@ static void ovs_fragment(struct net *net, struct vport *vport, goto err; } - prepare_frag(vport, skb, orig_network_offset); + prepare_frag(vport, skb, orig_network_offset, +ovs_key_mac_proto(key)); memset(_rt, 0, sizeof(ovs_rt)); dst_init(_rt.dst, _dst_ops, NULL, 1, DST_OBSOLETE_NONE, DST_NOCOUNT); @@ -765,7 +770,7 @@ static void ovs_fragment(struct net *net, struct vport *vport, refdst_drop(orig_dst); } else { WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.", - ovs_vport_name(vport), ntohs(ethertype), mru, + ovs_vport_name(vport), ntohs(key->eth.type), mru, vport->dev->mtu); goto err; } @@ -785,19 +790,19 @@ static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port, u32 cutlen = OVS_CB(skb)->cutlen; if (unlikely(cutlen > 0)) { - if (skb->len - cutlen > ETH_HLEN) + if (skb->len - cutlen > ovs_mac_header_len(key)) pskb_trim(skb, skb->len - cutlen); else -
[ovs-dev] [PATCH net-next v13 1/8] openvswitch: use hard_header_len instead of hardcoded ETH_HLEN
On tx, use hard_header_len while deciding whether to refragment or drop the packet. That way, all combinations are calculated correctly: * L2 packet going to L2 interface (the L2 header len is subtracted), * L2 packet going to L3 interface (the L2 header is included in the packet lenght), * L3 packet going to L3 interface. Signed-off-by: Jiri Benc <jb...@redhat.com> Acked-by: Pravin B Shelar <pshe...@ovn.org> --- net/openvswitch/actions.c | 3 ++- net/openvswitch/vport.c | 10 ++ 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 1105c4e29c62..49af167105d3 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -791,7 +791,8 @@ static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port, pskb_trim(skb, ETH_HLEN); } - if (likely(!mru || (skb->len <= mru + ETH_HLEN))) { + if (likely(!mru || + (skb->len <= mru + vport->dev->hard_header_len))) { ovs_vport_send(vport, skb); } else if (mru <= vport->dev->mtu) { struct net *net = read_pnet(>net); diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c index 9bb85b35a1fb..3e131f6868f2 100644 --- a/net/openvswitch/vport.c +++ b/net/openvswitch/vport.c @@ -464,9 +464,10 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb, return 0; } -static unsigned int packet_length(const struct sk_buff *skb) +static unsigned int packet_length(const struct sk_buff *skb, + struct net_device *dev) { - unsigned int length = skb->len - ETH_HLEN; + unsigned int length = skb->len - dev->hard_header_len; if (!skb_vlan_tag_present(skb) && eth_type_vlan(skb->protocol)) @@ -484,10 +485,11 @@ void ovs_vport_send(struct vport *vport, struct sk_buff *skb) { int mtu = vport->dev->mtu; - if (unlikely(packet_length(skb) > mtu && !skb_is_gso(skb))) { + if (unlikely(packet_length(skb, vport->dev) > mtu && +!skb_is_gso(skb))) { net_warn_ratelimited("%s: dropped over-mtu packet: %d > %d\n", vport->dev->name, -packet_length(skb), mtu); +packet_length(skb, vport->dev), mtu); vport->dev->stats.tx_errors++; goto drop; } -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev