Re: [ovs-dev] [ovs-dev, RFC] ovn: Revised support for service function chaining
On Mon, Mar 13, 2017 at 1:28 PM, John McDowall < jmcdow...@paloaltonetworks.com> wrote: > This patch set is an alternative implementation of service function > chaining (SFC) for OVS/OVN. The major change from the previous patch is > that the overloading of the ACL stage in ovn-northd.c has been removed > and replaced with additional logic in the CHAIN stage. > > This was done to improve modularity of the code as it was felt that > overloading the ACL stage did not add a lot and made it hard to compose > reusable SFCs. > > The new approach can still use the matching logic in OVN as a match > argument has been added. This has not been fully implemented yet as it > may need some error checking to ensure the match does not conflicit > with the lport in the chain and the bi-directional case. > > The other major change is the logic for the final destination of the > flows exiting the service chain. This has been simplified such that the > rules in the chain stage determine when the flow is exiting the port-chain > and then the flow just follows the normal path to the src or dst of the > flow. > > Areas to review > > 1) Logic for delivering flow after it leaves the chain. I think it is now > general and should work across subnets etc. > This revision does address one of my concerns with the previous revision, specifically the previous proposal to specify a 'last_hop_port' at the end of the logical port chain. However, it does not address my other major concern about the behavior at the end of the service chain, the interaction with the way OVN forwards traffic to the outside world. Repeating my comments from the last revision: In OVN without SFC, traffic is forwarded through a logical switch to a logical switch port of type 'localnet' that effectively resides on all hypervisors, that then forwards to the physical L2 network. When a VM residing on that same logical switch originates traffic destined for the outside world, it is directed to the local instance of the type 'localnet' logical switch port on the VM's hypervisor. The physical L2 network learns that the VM's MAC address resides on that hypervisor, and forwards all traffic to that MAC address to that specific hypervisor. Typically the physical L2 network first learns the VM's MAC address due to a gratuitous ARP packet sent by the ovn-controller on the VM's hypervisor. In this SFC proposal, at the end of the logical port chain, traffic is forwarded directly to the destination. This might work for destinations attached directly to OVN, but it would break forwarding to the physical L2 network: 1. If the last port pair group in the logical port chain contains more than one port pair spread across more than one chassis, then this would completely break upstream L2 learning. Traffic would have to be forwarded to a common point before being sent upstream. 2. What if the SFC classifier granularity is finer than per source VM, and the multiple logical port chains (that one VM's traffic can be redirected to) end at different chassis? 3. The gratuitous ARP packets generated by OVN for VIFs residing on the same logical switch as the 'localnet' port are sent from each VIF's chassis. This may be different than the chassis at the end of the logical port chain. There are some topologies where this SFC proposal will work even for traffic destined for the outside world: if there are no VIFs on the same logical switch as the 'localnet' port, and either a gateway router is used or a distributed router with only centralized NAT rules is used. If a distributed router is specified with distributed NAT rules, then point 3 above gets even worse, since ARP replies for a distributed NAT external IP address are restricted to the chassis where the corresponding VIF resides. My preference is to return to the originating hypervisor at the end of the logical port chain. With such an approach, L2 and L3 forwarding are unaffected. However, there is a question how to determine from a packet at the end of the logical port chain, what is the originating hypervisor? The best answer is to use an encapsulation such as Geneve or NSH that can carry context information. Another alternative is to do a lookup of the source address when the chain was entered in the "exit-lport" direction. An example to clarify the problem: Single Logical Switch LS1. VM1 on HV1 with MAC1, IP1. Logical Port Chain LPC1 with one Logical Port Pair Group LPPG1. LPPG1 has two Logical Port Pairs LPP1 and LPP2, residing on HV2 and HV3 respectively. When VM1 comes up, HV1 will send a gratuitous ARP packet out the localnet interface on HV1 to the physical L2 network, using MAC1 and IP1. The physical L2 network will learn that MAC1 is on HV1. VM1 sends a packet to the outside world. The packet gets redirected down LPC1 to LPP1 on HV2. Using the logic below, the packet will go through a normal L2 destination lookup on HV2, so it will be sent out the localnet interface on HV2 to the physical L2 network.
[ovs-dev] [PATCH branch-2.7 25/25] lib: Indicate if netlink message had labels.
Conntrack update events include labels only if they have changed. Record the presence of labels in the netlink message to OVS internal representation, so that the user may keep the old labels when an update does not modify them. Fixes: 6830a0c0e6bf ("netlink-conntrack: New module.") Signed-off-by: Jarno RajahalmeAcked-by: Joe Stringer --- lib/ct-dpif.h | 1 + lib/netlink-conntrack.c | 1 + 2 files changed, 2 insertions(+) diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h index 5da3c2c..e8e159a 100644 --- a/lib/ct-dpif.h +++ b/lib/ct-dpif.h @@ -163,6 +163,7 @@ struct ct_dpif_entry { struct ct_dpif_protoinfo protoinfo; ovs_u128 labels; +bool have_labels; uint32_t status; /* Timeout for this entry in seconds */ uint32_t timeout; diff --git a/lib/netlink-conntrack.c b/lib/netlink-conntrack.c index aab5b1f..8b82db2 100644 --- a/lib/netlink-conntrack.c +++ b/lib/netlink-conntrack.c @@ -780,6 +780,7 @@ nl_ct_attrs_to_ct_dpif_entry(struct ct_dpif_entry *entry, entry->mark = ntohl(nl_attr_get_be32(attrs[CTA_MARK])); } if (attrs[CTA_LABELS]) { +entry->have_labels = true; memcpy(>labels, nl_attr_get(attrs[CTA_LABELS]), MIN(sizeof entry->labels, nl_attr_get_size(attrs[CTA_LABELS]))); } -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 19/25] datapath: Simplify labels length logic.
Upstream commit: commit b87cec3814ccc7f6afb0a1378ee7e5110d07cdd3 Author: Jarno RajahalmeDate: Thu Feb 9 11:21:56 2017 -0800 openvswitch: Simplify labels length logic. Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128 distinct labels"), the size of conntrack labels extension has fixed to 128 bits, so we do not need to check for labels sizes shorter than 128 at run-time. This patch simplifies labels length logic accordingly, but allows the conntrack labels size to be increased in the future without breaking the build. In the event of conntrack labels increasing in size OVS would still be able to deal with the 128 first label bits. Suggested-by: Joe Stringer Signed-off-by: Jarno Rajahalme Acked-by: Pravin B Shelar Acked-by: Joe Stringer Signed-off-by: David S. Miller Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer --- datapath/conntrack.c | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/datapath/conntrack.c b/datapath/conntrack.c index b5c80be..2d095b8 100644 --- a/datapath/conntrack.c +++ b/datapath/conntrack.c @@ -145,22 +145,20 @@ static size_t ovs_ct_get_labels_len(struct nf_conn_labels *cl) #endif } +/* Guard against conntrack labels max size shrinking below 128 bits. */ +#if NF_CT_LABELS_MAX_SIZE < 16 +#error NF_CT_LABELS_MAX_SIZE must be at least 16 bytes +#endif + static void ovs_ct_get_labels(const struct nf_conn *ct, struct ovs_key_ct_labels *labels) { struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL; - if (cl) { - size_t len = ovs_ct_get_labels_len(cl); - - if (len > OVS_CT_LABELS_LEN) - len = OVS_CT_LABELS_LEN; - else if (len < OVS_CT_LABELS_LEN) - memset(labels, 0, OVS_CT_LABELS_LEN); - memcpy(labels, cl->bits, len); - } else { + if (cl) + memcpy(labels, cl->bits, OVS_CT_LABELS_LEN); + else memset(labels, 0, OVS_CT_LABELS_LEN); - } } static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state, -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 23/25] nx-match: Fix oxm decode.
From: Yi-Hung Weidecode_nx_packet_in2() may be used by the switch to parse NXT_RESUME messages, where we need exact match on the oxm header. Therefore, change oxm_decode_loose() to oxm_decode() that takes an extra argument to indicate whether we want strict or loose match. Fixes: 7befb20d0f70 ("ofp-util: Ignore unknown fields in ofputil_decode_packet_in2()") Signed-off-by: Yi-Hung Wei Signed-off-by: Joe Stringer Signed-off-by: Jarno Rajahalme --- lib/nx-match.c | 10 +- lib/nx-match.h | 4 ++-- lib/ofp-util.c | 5 ++--- 3 files changed, 9 insertions(+), 10 deletions(-) diff --git a/lib/nx-match.c b/lib/nx-match.c index 2e62e99..43672cb 100644 --- a/lib/nx-match.c +++ b/lib/nx-match.c @@ -683,14 +683,14 @@ oxm_pull_match_loose(struct ofpbuf *b, const struct tun_table *tun_table, * * Returns 0 if successful, otherwise an OpenFlow error code. * - * Encountering unknown OXM headers or missing field prerequisites are not - * considered as error conditions. + * If 'loose' is true, encountering unknown OXM headers or missing field + * prerequisites are not considered as error conditions. */ enum ofperr -oxm_decode_match_loose(const void *oxm, size_t oxm_len, - const struct tun_table *tun_table, struct match *match) +oxm_decode_match(const void *oxm, size_t oxm_len, bool loose, + const struct tun_table *tun_table, struct match *match) { -return nx_pull_raw(oxm, oxm_len, false, match, NULL, NULL, tun_table); +return nx_pull_raw(oxm, oxm_len, !loose, match, NULL, NULL, tun_table); } /* Verify an array of OXM TLVs treating value of each TLV as a mask, diff --git a/lib/nx-match.h b/lib/nx-match.h index cee9e65..e103dd5 100644 --- a/lib/nx-match.h +++ b/lib/nx-match.h @@ -61,8 +61,8 @@ enum ofperr oxm_pull_match(struct ofpbuf *, const struct tun_table *, struct match *); enum ofperr oxm_pull_match_loose(struct ofpbuf *, const struct tun_table *, struct match *); -enum ofperr oxm_decode_match_loose(const void *, size_t, - const struct tun_table *, struct match *); +enum ofperr oxm_decode_match(const void *, size_t, bool, + const struct tun_table *, struct match *); enum ofperr oxm_pull_field_array(const void *, size_t fields_len, struct field_array *); diff --git a/lib/ofp-util.c b/lib/ofp-util.c index 9e8d4d2..d315337 100644 --- a/lib/ofp-util.c +++ b/lib/ofp-util.c @@ -3397,9 +3397,8 @@ decode_nx_packet_in2(const struct ofp_header *oh, bool loose, } case NXPINT_METADATA: -error = oxm_decode_match_loose(payload.msg, - ofpbuf_msgsize(), - tun_table, >flow_metadata); +error = oxm_decode_match(payload.msg, ofpbuf_msgsize(), + loose, tun_table, >flow_metadata); break; case NXPINT_USERDATA: -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 22/25] ofp-util: Ignore unknown fields in ofputil_decode_packet_in2().
The decoder of packet_in messages should not fail on encountering unknown metadata fields. This allows the switch to add new features without breaking controllers. The controllers should, however, copy the metadata fields from the packet_int to packet_out so that the switch gets back the full metadata. OVN is already doing this. Signed-off-by: Jarno RajahalmeAcked-by: Joe Stringer --- lib/nx-match.c | 25 - lib/nx-match.h | 4 ++-- lib/ofp-util.c | 5 +++-- 3 files changed, 21 insertions(+), 13 deletions(-) diff --git a/lib/nx-match.c b/lib/nx-match.c index 91401e2..2e62e99 100644 --- a/lib/nx-match.c +++ b/lib/nx-match.c @@ -504,6 +504,9 @@ nx_pull_match_entry(struct ofpbuf *b, bool allow_cookie, return 0; } +/* Prerequisites will only be checked when 'strict' is 'true'. This allows + * decoding conntrack original direction 5-tuple IP addresses without the + * ethertype being present, when decoding metadata only. */ static enum ofperr nx_pull_raw(const uint8_t *p, unsigned int match_len, bool strict, struct match *match, ovs_be64 *cookie, ovs_be64 *cookie_mask, @@ -539,7 +542,7 @@ nx_pull_raw(const uint8_t *p, unsigned int match_len, bool strict, *cookie = value.be64; *cookie_mask = mask.be64; } -} else if (!mf_are_prereqs_ok(field, >flow, NULL)) { +} else if (strict && !mf_are_prereqs_ok(field, >flow, NULL)) { error = OFPERR_OFPBMC_BAD_PREREQ; } else if (!mf_is_all_wild(field, >wc)) { error = OFPERR_OFPBMC_DUP_FIELD; @@ -607,7 +610,8 @@ nx_pull_match(struct ofpbuf *b, unsigned int match_len, struct match *match, } /* Behaves the same as nx_pull_match(), but skips over unknown NXM headers, - * instead of failing with an error. */ + * instead of failing with an error, and does not check for field + * prerequisities. */ enum ofperr nx_pull_match_loose(struct ofpbuf *b, unsigned int match_len, struct match *match, @@ -664,8 +668,9 @@ oxm_pull_match(struct ofpbuf *b, const struct tun_table *tun_table, return oxm_pull_match__(b, true, tun_table, match); } -/* Behaves the same as oxm_pull_match() with one exception. Skips over unknown - * OXM headers instead of failing with an error when they are encountered. */ +/* Behaves the same as oxm_pull_match() with two exceptions. Skips over + * unknown OXM headers instead of failing with an error when they are + * encountered, and does not check for field prerequisities. */ enum ofperr oxm_pull_match_loose(struct ofpbuf *b, const struct tun_table *tun_table, struct match *match) @@ -676,14 +681,16 @@ oxm_pull_match_loose(struct ofpbuf *b, const struct tun_table *tun_table, /* Parses the OXM match description in the 'oxm_len' bytes in 'oxm'. Stores * the result in 'match'. * - * Fails with an error when encountering unknown OXM headers. + * Returns 0 if successful, otherwise an OpenFlow error code. * - * Returns 0 if successful, otherwise an OpenFlow error code. */ + * Encountering unknown OXM headers or missing field prerequisites are not + * considered as error conditions. + */ enum ofperr -oxm_decode_match(const void *oxm, size_t oxm_len, - const struct tun_table *tun_table, struct match *match) +oxm_decode_match_loose(const void *oxm, size_t oxm_len, + const struct tun_table *tun_table, struct match *match) { -return nx_pull_raw(oxm, oxm_len, true, match, NULL, NULL, tun_table); +return nx_pull_raw(oxm, oxm_len, false, match, NULL, NULL, tun_table); } /* Verify an array of OXM TLVs treating value of each TLV as a mask, diff --git a/lib/nx-match.h b/lib/nx-match.h index 5dca24a..cee9e65 100644 --- a/lib/nx-match.h +++ b/lib/nx-match.h @@ -61,8 +61,8 @@ enum ofperr oxm_pull_match(struct ofpbuf *, const struct tun_table *, struct match *); enum ofperr oxm_pull_match_loose(struct ofpbuf *, const struct tun_table *, struct match *); -enum ofperr oxm_decode_match(const void *, size_t, const struct tun_table *, - struct match *); +enum ofperr oxm_decode_match_loose(const void *, size_t, + const struct tun_table *, struct match *); enum ofperr oxm_pull_field_array(const void *, size_t fields_len, struct field_array *); diff --git a/lib/ofp-util.c b/lib/ofp-util.c index 0c9343e..9e8d4d2 100644 --- a/lib/ofp-util.c +++ b/lib/ofp-util.c @@ -3397,8 +3397,9 @@ decode_nx_packet_in2(const struct ofp_header *oh, bool loose, } case NXPINT_METADATA: -error = oxm_decode_match(payload.msg, ofpbuf_msgsize(), - tun_table, >flow_metadata); +error = oxm_decode_match_loose(payload.msg, +
[ovs-dev] [PATCH branch-2.7 21/25] datapath: Inherit master's labels.
Upstream commit: commit 09aa98ad496d6b11a698b258bc64d7f64c55d682 Author: Jarno RajahalmeDate: Thu Feb 9 11:21:58 2017 -0800 openvswitch: Inherit master's labels. We avoid calling into nf_conntrack_in() for expected connections, as that would remove the expectation that we want to stick around until we are ready to commit the connection. Instead, we do a lookup in the expectation table directly. However, after a successful expectation lookup we have set the flow key label field from the master connection, whereas nf_conntrack_in() does not do this. This leads to master's labels being inherited after an expectation lookup, but those labels not being inherited after the corresponding conntrack action with a commit flag. This patch resolves the problem by changing the commit code path to also inherit the master's labels to the expected connection. Resolving this conflict in favor of inheriting the labels allows more information be passed from the master connection to related connections, which would otherwise be much harder if the 32 bits in the connmark are not enough. Labels can still be set explicitly, so this change only affects the default values of the labels in presense of a master connection. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Jarno Rajahalme Acked-by: Pravin B Shelar Acked-by: Joe Stringer Signed-off-by: David S. Miller Fixes: a94ebc39996b ("datapath: Add conntrack action") Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer --- datapath/conntrack.c | 45 +++-- 1 file changed, 31 insertions(+), 14 deletions(-) diff --git a/datapath/conntrack.c b/datapath/conntrack.c index a56fe07..9428eb2 100644 --- a/datapath/conntrack.c +++ b/datapath/conntrack.c @@ -80,6 +80,8 @@ struct ovs_conntrack_info { #endif }; +static bool labels_nonzero(const struct ovs_key_ct_labels *labels); + static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info); static u16 key_to_nfproto(const struct sw_flow_key *key) @@ -275,18 +277,32 @@ static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key, const struct ovs_key_ct_labels *labels, const struct ovs_key_ct_labels *mask) { - struct nf_conn_labels *cl; - u32 *dst; - int i; + struct nf_conn_labels *cl, *master_cl; + bool have_mask = labels_nonzero(mask); + + /* Inherit master's labels to the related connection? */ + master_cl = ct->master ? nf_ct_labels_find(ct->master) : NULL; + + if (!master_cl && !have_mask) + return 0; /* Nothing to do. */ cl = ovs_ct_get_conn_labels(ct); if (!cl) return -ENOSPC; - dst = (u32 *)cl->bits; - for (i = 0; i < OVS_CT_LABELS_LEN_32; i++) - dst[i] = (dst[i] & ~mask->ct_labels_32[i]) | - (labels->ct_labels_32[i] & mask->ct_labels_32[i]); + /* Inherit the master's labels, if any. */ + if (master_cl) + *cl = *master_cl; + + if (have_mask) { + u32 *dst = (u32 *)cl->bits; + int i; + + for (i = 0; i < OVS_CT_LABELS_LEN_32; i++) + dst[i] = (dst[i] & ~mask->ct_labels_32[i]) | + (labels->ct_labels_32[i] +& mask->ct_labels_32[i]); + } /* Labels are included in the IPCTNL_MSG_CT_NEW event only if the * IPCT_LABEL bit it set in the event cache. @@ -943,13 +959,14 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key, if (err) return err; } - if (labels_nonzero(>labels.mask)) { - if (!nf_ct_is_confirmed(ct)) - err = ovs_ct_init_labels(ct, key, >labels.value, ->labels.mask); - else - err = ovs_ct_set_labels(ct, key, >labels.value, - >labels.mask); + if (!nf_ct_is_confirmed(ct)) { + err = ovs_ct_init_labels(ct, key, >labels.value, +>labels.mask); + if (err) + return err; + } else if (labels_nonzero(>labels.mask)) { + err = ovs_ct_set_labels(ct, key, >labels.value, + >labels.mask); if (err) return err; } -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 07/25] datapath: remove unnecessary EXPORT_SYMBOLs
From: Jiri BencUpstream commit: commit 76e4cc7731a1e0c07e202999b9834f9d9be66de4 Author: Jiri Benc Date: Wed Oct 19 11:26:37 2016 +0200 openvswitch: remove unnecessary EXPORT_SYMBOLs Some symbols exported to other modules are really used only by openvswitch.ko. Remove the exports. Tested by loading all 4 openvswitch modules, nothing breaks. Signed-off-by: Jiri Benc Acked-by: Pravin B Shelar Signed-off-by: David S. Miller Signed-off-by: Jarno Rajahalme Signed-off-by: Joe Stringer --- datapath/datapath.c | 2 -- datapath/vport-netdev.c | 1 - datapath/vport.c| 1 - 3 files changed, 4 deletions(-) diff --git a/datapath/datapath.c b/datapath/datapath.c index be433ba..64cd781 100644 --- a/datapath/datapath.c +++ b/datapath/datapath.c @@ -62,7 +62,6 @@ #include "vport-netdev.h" int ovs_net_id __read_mostly; -EXPORT_SYMBOL_GPL(ovs_net_id); static struct genl_family dp_packet_genl_family; static struct genl_family dp_flow_genl_family; @@ -135,7 +134,6 @@ int lockdep_ovsl_is_held(void) else return 1; } -EXPORT_SYMBOL_GPL(lockdep_ovsl_is_held); #endif static int queue_gso_packets(struct datapath *dp, struct sk_buff *, diff --git a/datapath/vport-netdev.c b/datapath/vport-netdev.c index 970f7d3..fd97246 100644 --- a/datapath/vport-netdev.c +++ b/datapath/vport-netdev.c @@ -167,7 +167,6 @@ void ovs_netdev_detach_dev(struct vport *vport) netdev_master_upper_dev_get(vport->dev)); dev_set_promiscuity(vport->dev, -1); } -EXPORT_SYMBOL_GPL(ovs_netdev_detach_dev); static void netdev_destroy(struct vport *vport) { diff --git a/datapath/vport.c b/datapath/vport.c index 7ac4632..9c8c0f1 100644 --- a/datapath/vport.c +++ b/datapath/vport.c @@ -507,7 +507,6 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb, ovs_dp_process_packet(skb, ); return 0; } -EXPORT_SYMBOL_GPL(ovs_vport_receive); static unsigned int packet_length(const struct sk_buff *skb) { -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 20/25] datapath: Refactor labels initialization.
Upstream commit: Refactoring conntrack labels initialization makes changes in later patches easier to review. Signed-off-by: Jarno RajahalmeAcked-by: Pravin B Shelar Acked-by: Joe Stringer Signed-off-by: David S. Miller Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer --- datapath/conntrack.c | 120 +++ 1 file changed, 64 insertions(+), 56 deletions(-) diff --git a/datapath/conntrack.c b/datapath/conntrack.c index 2d095b8..a56fe07 100644 --- a/datapath/conntrack.c +++ b/datapath/conntrack.c @@ -136,15 +136,6 @@ static u32 ovs_ct_get_mark(const struct nf_conn *ct) #endif } -static size_t ovs_ct_get_labels_len(struct nf_conn_labels *cl) -{ -#ifdef HAVE_NF_CONN_LABELS_WITH_WORDS - return cl->words * sizeof(long); -#else - return sizeof(cl->bits); -#endif -} - /* Guard against conntrack labels max size shrinking below 128 bits. */ #if NF_CT_LABELS_MAX_SIZE < 16 #error NF_CT_LABELS_MAX_SIZE must be at least 16 bytes @@ -243,19 +234,12 @@ int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb) return 0; } -static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key, +static int ovs_ct_set_mark(struct nf_conn *ct, struct sw_flow_key *key, u32 ct_mark, u32 mask) { #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) - enum ip_conntrack_info ctinfo; - struct nf_conn *ct; u32 new_mark; - /* The connection could be invalid, in which case set_mark is no-op. */ - ct = nf_ct_get(skb, ); - if (!ct) - return 0; - new_mark = ct_mark | (ct->mark & ~(mask)); if (ct->mark != new_mark) { ct->mark = new_mark; @@ -270,18 +254,9 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key, #endif } -static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key, -const struct ovs_key_ct_labels *labels, -const struct ovs_key_ct_labels *mask) +static struct nf_conn_labels *ovs_ct_get_conn_labels(struct nf_conn *ct) { - enum ip_conntrack_info ctinfo; struct nf_conn_labels *cl; - struct nf_conn *ct; - - /* The connection could be invalid, in which case set_label is no-op.*/ - ct = nf_ct_get(skb, ); - if (!ct) - return 0; cl = nf_ct_labels_find(ct); if (!cl) { @@ -289,37 +264,59 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key, cl = nf_ct_labels_find(ct); } - if (!cl || ovs_ct_get_labels_len(cl) < OVS_CT_LABELS_LEN) + return cl; +} + +/* Initialize labels for a new, yet to be committed conntrack entry. Note that + * since the new connection is not yet confirmed, and thus no-one else has + * access to it's labels, we simply write them over. + */ +static int ovs_ct_init_labels(struct nf_conn *ct, struct sw_flow_key *key, + const struct ovs_key_ct_labels *labels, + const struct ovs_key_ct_labels *mask) +{ + struct nf_conn_labels *cl; + u32 *dst; + int i; + + cl = ovs_ct_get_conn_labels(ct); + if (!cl) return -ENOSPC; - if (nf_ct_is_confirmed(ct)) { - /* Triggers a change event, which makes sense only for -* confirmed connections. -*/ - int err = nf_connlabels_replace(ct, labels->ct_labels_32, - mask->ct_labels_32, - OVS_CT_LABELS_LEN_32); - if (err) - return err; - } else { - u32 *dst = (u32 *)cl->bits; - const u32 *msk = mask->ct_labels_32; - const u32 *lbl = labels->ct_labels_32; - int i; + dst = (u32 *)cl->bits; + for (i = 0; i < OVS_CT_LABELS_LEN_32; i++) + dst[i] = (dst[i] & ~mask->ct_labels_32[i]) | + (labels->ct_labels_32[i] & mask->ct_labels_32[i]); - /* No-one else has access to the non-confirmed entry, copy -* labels over, keeping any bits we are not explicitly setting. -*/ - for (i = 0; i < OVS_CT_LABELS_LEN_32; i++) - dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]); + /* Labels are included in the IPCTNL_MSG_CT_NEW event only if the +* IPCT_LABEL bit it set in the event cache. +*/ + nf_conntrack_event_cache(IPCT_LABEL, ct); - /* Labels are included in the IPCTNL_MSG_CT_NEW event only if -* the IPCT_LABEL bit it set in the event cache. -*/ - nf_conntrack_event_cache(IPCT_LABEL, ct); - } +
[ovs-dev] [PATCH branch-2.7 18/25] datapath: Unionize ovs_key_ct_label with a u32 array.
Upstream commit: commit cb80d58fae76d8ea93555149b2b16e19b89a1f4f Author: Jarno RajahalmeDate: Thu Feb 9 11:21:55 2017 -0800 openvswitch: Unionize ovs_key_ct_label with a u32 array. Make the array of labels in struct ovs_key_ct_label an union, adding a u32 array of the same byte size as the existing u8 array. It is faster to loop through the labels 32 bits at the time, which is also the alignment of netlink attributes. Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer Acked-by: Pravin B Shelar Signed-off-by: David S. Miller Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer --- datapath/conntrack.c | 15 --- datapath/linux/compat/include/linux/openvswitch.h | 8 ++-- 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/datapath/conntrack.c b/datapath/conntrack.c index cee47b7..b5c80be 100644 --- a/datapath/conntrack.c +++ b/datapath/conntrack.c @@ -298,20 +298,21 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key, /* Triggers a change event, which makes sense only for * confirmed connections. */ - int err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask, - OVS_CT_LABELS_LEN / sizeof(u32)); + int err = nf_connlabels_replace(ct, labels->ct_labels_32, + mask->ct_labels_32, + OVS_CT_LABELS_LEN_32); if (err) return err; } else { u32 *dst = (u32 *)cl->bits; - const u32 *msk = (const u32 *)mask->ct_labels; - const u32 *lbl = (const u32 *)labels->ct_labels; + const u32 *msk = mask->ct_labels_32; + const u32 *lbl = labels->ct_labels_32; int i; /* No-one else has access to the non-confirmed entry, copy * labels over, keeping any bits we are not explicitly setting. */ - for (i = 0; i < OVS_CT_LABELS_LEN / sizeof(u32); i++) + for (i = 0; i < OVS_CT_LABELS_LEN_32; i++) dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]); /* Labels are included in the IPCTNL_MSG_CT_NEW event only if @@ -912,8 +913,8 @@ static bool labels_nonzero(const struct ovs_key_ct_labels *labels) { size_t i; - for (i = 0; i < sizeof(*labels); i++) - if (labels->ct_labels[i]) + for (i = 0; i < OVS_CT_LABELS_LEN_32; i++) + if (labels->ct_labels_32[i]) return true; return false; diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h index 12260d8..76a19ed 100644 --- a/datapath/linux/compat/include/linux/openvswitch.h +++ b/datapath/linux/compat/include/linux/openvswitch.h @@ -472,9 +472,13 @@ struct ovs_key_nd { __u8nd_tll[ETH_ALEN]; }; -#define OVS_CT_LABELS_LEN 16 +#define OVS_CT_LABELS_LEN_32 4 +#define OVS_CT_LABELS_LEN (OVS_CT_LABELS_LEN_32 * sizeof(__u32)) struct ovs_key_ct_labels { - __u8ct_labels[OVS_CT_LABELS_LEN]; + union { + __u8ct_labels[OVS_CT_LABELS_LEN]; + __u32 ct_labels_32[OVS_CT_LABELS_LEN_32]; + }; }; /* OVS_KEY_ATTR_CT_STATE flags */ -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 11/25] datapath: make ndo_get_stats64 a void function
From: stephen hemmingerUpstream commit: commit bc1f44709cf27fb2a5766cadafe7e2ad5e9cb221 Author: stephen hemminger Date: Fri Jan 6 19:12:52 2017 -0800 net: make ndo_get_stats64 a void function The network device operation for reading statistics is only called in one place, and it ignores the return value. Having a structure return value is potentially confusing because some future driver could incorrectly assume that the return value was used. Fix all drivers with ndo_get_stats64 to have a void function. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller This seems to be fine for all prior Linux versions as well. Signed-off-by: Jarno Rajahalme Signed-off-by: Joe Stringer --- datapath/vport-internal_dev.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/datapath/vport-internal_dev.c b/datapath/vport-internal_dev.c index 2988e8c..2fa9ab9 100644 --- a/datapath/vport-internal_dev.c +++ b/datapath/vport-internal_dev.c @@ -117,7 +117,7 @@ static void internal_dev_destructor(struct net_device *dev) free_netdev(dev); } -static struct rtnl_link_stats64 * +static void internal_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats) { int i; @@ -145,8 +145,6 @@ internal_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats) stats->tx_bytes += local_stats.tx_bytes; stats->tx_packets += local_stats.tx_packets; } - - return stats; } #ifdef HAVE_IFF_PHONY_HEADROOM @@ -164,7 +162,7 @@ static const struct net_device_ops internal_dev_netdev_ops = { #ifndef HAVE_NET_DEVICE_WITH_MAX_MTU .ndo_change_mtu = internal_dev_change_mtu, #endif - .ndo_get_stats64 = internal_get_stats, + .ndo_get_stats64 = (void *)internal_get_stats, #ifdef HAVE_IFF_PHONY_HEADROOM #ifndef HAVE_NET_DEVICE_OPS_WITH_EXTENDED .ndo_set_rx_headroom = internal_set_rx_headroom, -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 06/25] datapath: remove unused functions
From: Jiri BencUpstream commit: commit f33eb0cf9984f79e8643eaac888e4b6a06a8e221 Author: Jiri Benc Date: Wed Oct 19 11:26:36 2016 +0200 openvswitch: remove unused functions ovs_vport_deferred_free is not used anywhere. It's the only caller of free_vport_rcu thus this one can be removed, too. Signed-off-by: Jiri Benc Acked-by: Pravin B Shelar Signed-off-by: David S. Miller Signed-off-by: Jarno Rajahalme Signed-off-by: Jarno Rajahalme Signed-off-by: Joe Stringer --- datapath/vport.c | 16 datapath/vport.h | 1 - 2 files changed, 17 deletions(-) diff --git a/datapath/vport.c b/datapath/vport.c index c29f0b0..7ac4632 100644 --- a/datapath/vport.c +++ b/datapath/vport.c @@ -509,22 +509,6 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb, } EXPORT_SYMBOL_GPL(ovs_vport_receive); -static void free_vport_rcu(struct rcu_head *rcu) -{ - struct vport *vport = container_of(rcu, struct vport, rcu); - - ovs_vport_free(vport); -} - -void ovs_vport_deferred_free(struct vport *vport) -{ - if (!vport) - return; - - call_rcu(>rcu, free_vport_rcu); -} -EXPORT_SYMBOL_GPL(ovs_vport_deferred_free); - static unsigned int packet_length(const struct sk_buff *skb) { unsigned int length = skb->len - ETH_HLEN; diff --git a/datapath/vport.h b/datapath/vport.h index 47995be..d14908f 100644 --- a/datapath/vport.h +++ b/datapath/vport.h @@ -152,7 +152,6 @@ struct vport_ops { struct vport *ovs_vport_alloc(int priv_size, const struct vport_ops *, const struct vport_parms *); void ovs_vport_free(struct vport *); -void ovs_vport_deferred_free(struct vport *vport); #define VPORT_ALIGN 8 -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 16/25] datapath: Use inverted tuple in ovs_ct_find_existing() if NATted.
Upstream commit: commit 9ff464db50e437eef131f719cc2e9902eea9c607 Author: Jarno RajahalmeDate: Thu Feb 9 11:21:53 2017 -0800 openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted. The conntrack lookup for existing connections fails to invert the packet 5-tuple for NATted packets, and therefore fails to find the existing conntrack entry. Conntrack only stores 5-tuples for incoming packets, and there are various situations where a lookup on a packet that has already been transformed by NAT needs to be made. Looking up an existing conntrack entry upon executing packet received from the userspace is one of them. This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple for the conntrack lookup whenever the packet has already been transformed by conntrack from its input form as evidenced by one of the NAT flags being set in the conntrack state metadata. Fixes: 05752523e565 ("openvswitch: Interface with NAT.") Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer Acked-by: Pravin B Shelar Signed-off-by: David S. Miller This patch also adds a test case to OVS system tests to verify the behavior. The following is a more thorough explanation of what is going on: When we have evidence that an existing conntrack entry could exist, we must invert the tuple if NAT has already been applied, as the current packet headers do not match any tuple stored in conntrack. For example, if a packet from private address X to a public address B is source-NATted to A, the conntrack entry will have the following tuples (ignoring the protocol and port numbers) after the conntrack entry is committed: Original direction tuple: (X,B) Reply direction tuple: (B,A) Now, if a reply packet is already transformed back to the private address space (e.g., with a CT(nat) action), the tuple corresponding to the current packet headers is: Current packet tuple: (B,X) This does not match either of the conntrack tuples above. Normally this does not matter, as the conntrack lookup was already done using the tuple (B,A), but if the current packet does not match any flow in the OVS datapath, the packet is sent to userspace via an upcall, during which the packet's skb is freed, and the conntrack entry pointer in the skb is lost. When the packet is reintroduced to the datapath, any further conntrack action will need to perform a new conntrack lookup to find the entry again. Prior to this patch this second lookup failed. The datapath flow setup corresponding to the upcall can succeed, however, allowing all further packets in the reply direction to re-use the conntrack entry pointer in the skb, so typically the lookup failure only causes a packet drop. The solution is to invert the tuple derived from the current packet headers in case the conntrack state stored in the packet metadata indicates that the packet has been transformed by NAT: Inverted tuple: (X,B) With this the conntrack entry can be found, matching the original direction tuple. This same logic also works for the original direction packets: Current packet tuple (after reverse NAT): (A,B) Inverted tuple: (B,A) While the current packet tuple (A,B) does not match either of the conntrack tuples, the inverted one (B,A) does match the reply direction tuple. Since the inverted tuple matches the reverse direction tuple the direction of the packet must be reversed as well. Fixes: c5f6c06b58d6 ("datapath: Interface with NAT.") Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer --- datapath/conntrack.c| 24 +-- tests/system-traffic.at | 52 + 2 files changed, 74 insertions(+), 2 deletions(-) diff --git a/datapath/conntrack.c b/datapath/conntrack.c index 9e34af9..08e5eab 100644 --- a/datapath/conntrack.c +++ b/datapath/conntrack.c @@ -471,7 +471,7 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h) */ static struct nf_conn * ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone, -u8 l3num, struct sk_buff *skb) +u8 l3num, struct sk_buff *skb, bool natted) { struct nf_conntrack_l3proto *l3proto; struct nf_conntrack_l4proto *l4proto; @@ -494,6 +494,17 @@ ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone, return NULL; } + /* Must invert the tuple if skb has been transformed by NAT. */ + if (natted) { + struct nf_conntrack_tuple inverse; + + if (!nf_ct_invert_tuple(, , l3proto, l4proto)) { + pr_debug("ovs_ct_find_existing: Inversion failed!\n"); + return NULL; + } + tuple = inverse; + } + /* look for tuple match */ h =
[ovs-dev] [PATCH branch-2.7 17/25] datapath: Do not trigger events for unconfirmed connections.
Upstream commit: commit 193e30967897f3a8b6f9f137ac30571d832c2c5c Author: Jarno RajahalmeDate: Thu Feb 9 11:21:54 2017 -0800 openvswitch: Do not trigger events for unconfirmed connections. Receiving change events before the 'new' event for the connection has been received can be confusing. Avoid triggering change events for setting conntrack mark or labels before the conntrack entry has been confirmed. Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark") Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label") Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer Acked-by: Pravin B Shelar Signed-off-by: David S. Miller Upstream commit: commit 2317c6b51e4249dbfa093e1b88cab0a9f0564b7f Author: Jarno Rajahalme Date: Fri Feb 17 18:11:58 2017 -0800 openvswitch: Set event bit after initializing labels. Connlabels are included in conntrack netlink event messages only if the IPCT_LABEL bit is set in the event cache (see ctnetlink_conntrack_event()). Set it after initializing labels for a new connection. Found upon further system testing, where it was noticed that labels were missing from the conntrack events. Fixes: 193e30967897 ("openvswitch: Do not trigger events for unconfirmed con nections.") Signed-off-by: Jarno Rajahalme Acked-by: Pravin B Shelar Signed-off-by: David S. Miller Fixes: 372ce9737d2b ("datapath: Allow matching on conntrack mark") Fixes: 038e34abaa31 ("datapath: Allow matching on conntrack label") Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer --- datapath/conntrack.c | 33 +++-- 1 file changed, 27 insertions(+), 6 deletions(-) diff --git a/datapath/conntrack.c b/datapath/conntrack.c index 08e5eab..cee47b7 100644 --- a/datapath/conntrack.c +++ b/datapath/conntrack.c @@ -261,7 +261,8 @@ static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key, new_mark = ct_mark | (ct->mark & ~(mask)); if (ct->mark != new_mark) { ct->mark = new_mark; - nf_conntrack_event_cache(IPCT_MARK, ct); + if (nf_ct_is_confirmed(ct)) + nf_conntrack_event_cache(IPCT_MARK, ct); key->ct.mark = new_mark; } @@ -278,7 +279,6 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key, enum ip_conntrack_info ctinfo; struct nf_conn_labels *cl; struct nf_conn *ct; - int err; /* The connection could be invalid, in which case set_label is no-op.*/ ct = nf_ct_get(skb, ); @@ -294,10 +294,31 @@ static int ovs_ct_set_labels(struct sk_buff *skb, struct sw_flow_key *key, if (!cl || ovs_ct_get_labels_len(cl) < OVS_CT_LABELS_LEN) return -ENOSPC; - err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask, - OVS_CT_LABELS_LEN / sizeof(u32)); - if (err) - return err; + if (nf_ct_is_confirmed(ct)) { + /* Triggers a change event, which makes sense only for +* confirmed connections. +*/ + int err = nf_connlabels_replace(ct, (u32 *)labels, (u32 *)mask, + OVS_CT_LABELS_LEN / sizeof(u32)); + if (err) + return err; + } else { + u32 *dst = (u32 *)cl->bits; + const u32 *msk = (const u32 *)mask->ct_labels; + const u32 *lbl = (const u32 *)labels->ct_labels; + int i; + + /* No-one else has access to the non-confirmed entry, copy +* labels over, keeping any bits we are not explicitly setting. +*/ + for (i = 0; i < OVS_CT_LABELS_LEN / sizeof(u32); i++) + dst[i] = (dst[i] & ~msk[i]) | (lbl[i] & msk[i]); + + /* Labels are included in the IPCTNL_MSG_CT_NEW event only if +* the IPCT_LABEL bit it set in the event cache. +*/ + nf_conntrack_event_cache(IPCT_LABEL, ct); + } ovs_ct_get_labels(ct, >ct.labels); return 0; -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 15/25] datapath: Fix comments for skb->_nfct
Upstream commit: commit 5e17da634a21b1200853fe82ba67d6571f2beabe Author: Jarno RajahalmeDate: Thu Feb 9 11:21:52 2017 -0800 openvswitch: Fix comments for skb->_nfct Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that they are combined into '_nfct'. Signed-off-by: Jarno Rajahalme Acked-by: Pravin B Shelar Acked-by: Joe Stringer Signed-off-by: David S. Miller Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer --- datapath/conntrack.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/datapath/conntrack.c b/datapath/conntrack.c index 6b8414c..9e34af9 100644 --- a/datapath/conntrack.c +++ b/datapath/conntrack.c @@ -173,7 +173,7 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state, ovs_ct_get_labels(ct, >ct.labels); } -/* Update 'key' based on skb->nfct. If 'post_ct' is true, then OVS has +/* Update 'key' based on skb->_nfct. If 'post_ct' is true, then OVS has * previously sent the packet to conntrack via the ct action. If * 'keep_nat_flags' is true, the existing NAT flags retained, else they are * initialized from the connection status. @@ -462,12 +462,12 @@ ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h) /* Find an existing connection which this packet belongs to without * re-attributing statistics or modifying the connection state. This allows an - * skb->nfct lost due to an upcall to be recovered during actions execution. + * skb->_nfct lost due to an upcall to be recovered during actions execution. * * Must be called with rcu_read_lock. * - * On success, populates skb->nfct and skb->nfctinfo, and returns the - * connection. Returns NULL if there is no existing entry. + * On success, populates skb->_nfct and returns the connection. Returns NULL + * if there is no existing entry. */ static struct nf_conn * ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone, @@ -505,7 +505,7 @@ ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone, return ct; } -/* Determine whether skb->nfct is equal to the result of conntrack lookup. */ +/* Determine whether skb->_nfct is equal to the result of conntrack lookup. */ static bool skb_nfct_cached(struct net *net, const struct sw_flow_key *key, const struct ovs_conntrack_info *info, @@ -516,7 +516,7 @@ static bool skb_nfct_cached(struct net *net, ct = nf_ct_get(skb, ); /* If no ct, check if we have evidence that an existing conntrack entry -* might be found for this skb. This happens when we lose a skb->nfct +* might be found for this skb. This happens when we lose a skb->_nfct * due to an upcall. If the connection was not confirmed, it is not * cached and needs to be run through conntrack again. */ @@ -740,7 +740,7 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key *key, /* Pass 'skb' through conntrack in 'net', using zone configured in 'info', if * not done already. Update key with new CT state after passing the packet * through conntrack. - * Note that if the packet is deemed invalid by conntrack, skb->nfct will be + * Note that if the packet is deemed invalid by conntrack, skb->_nfct will be * set to NULL and 0 will be returned. */ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key, -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 14/25] datapath: add and use nf_ct_set helper
From: Florian WestphalUpstream commit: commit c74454fadd5ea6fc866ffe2c417a0dba56b2bf1c Author: Florian Westphal Date: Mon Jan 23 18:21:57 2017 +0100 netfilter: add and use nf_ct_set helper Add a helper to assign a nf_conn entry and the ctinfo bits to an sk_buff. This avoids changing code in followup patch that merges skb->nfct and skb->nfctinfo into skb->_nfct. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer --- acinclude.m4 | 2 ++ datapath/conntrack.c | 6 ++ datapath/linux/compat/include/net/netfilter/nf_conntrack.h | 8 3 files changed, 12 insertions(+), 4 deletions(-) diff --git a/acinclude.m4 b/acinclude.m4 index ea5dfe9..751dc63 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -531,6 +531,8 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ OVS_FIND_PARAM_IFELSE([$KSRC/include/net/netfilter/nf_conntrack.h], [nf_ct_get_tuplepr], [struct.net], [OVS_DEFINE([HAVE_NF_CT_GET_TUPLEPR_TAKES_STRUCT_NET])]) + OVS_GREP_IFELSE([$KSRC/include/net/netfilter/nf_conntrack.h], + [nf_ct_set]) OVS_GREP_IFELSE([$KSRC/include/net/netfilter/nf_conntrack_zones.h], [nf_ct_zone_init]) OVS_GREP_IFELSE([$KSRC/include/net/netfilter/nf_conntrack_labels.h], diff --git a/datapath/conntrack.c b/datapath/conntrack.c index ec354c3..6b8414c 100644 --- a/datapath/conntrack.c +++ b/datapath/conntrack.c @@ -501,8 +501,7 @@ ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone, ct = nf_ct_tuplehash_to_ctrack(h); - skb->nfct = >ct_general; - skb->nfctinfo = ovs_ct_get_info(h); + nf_ct_set(skb, ct, ovs_ct_get_info(h)); return ct; } @@ -766,8 +765,7 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key, if (skb_nfct(skb)) nf_conntrack_put(skb_nfct(skb)); nf_conntrack_get(>ct_general); - skb->nfct = >ct_general; - skb->nfctinfo = IP_CT_NEW; + nf_ct_set(skb, tmpl, IP_CT_NEW); } err = nf_conntrack_in(net, info->family, diff --git a/datapath/linux/compat/include/net/netfilter/nf_conntrack.h b/datapath/linux/compat/include/net/netfilter/nf_conntrack.h index e02e20b..bb40b0f 100644 --- a/datapath/linux/compat/include/net/netfilter/nf_conntrack.h +++ b/datapath/linux/compat/include/net/netfilter/nf_conntrack.h @@ -14,4 +14,12 @@ static inline bool rpl_nf_ct_get_tuplepr(const struct sk_buff *skb, #define nf_ct_get_tuplepr rpl_nf_ct_get_tuplepr #endif +#ifndef HAVE_NF_CT_SET +static inline void +nf_ct_set(struct sk_buff *skb, struct nf_conn *ct, enum ip_conntrack_info info) +{ + skb->nfct = >ct_general; + skb->nfctinfo = info; +} +#endif #endif /* _NF_CONNTRACK_WRAPPER_H */ -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 13/25] datapath: add and use skb_nfct helper
From: Florian WestphalUpstream commit: commit cb9c68363efb6d1f950ec55fb06e031ee70db5fc Author: Florian Westphal Date: Mon Jan 23 18:21:56 2017 +0100 skbuff: add and use skb_nfct helper Followup patch renames skb->nfct and changes its type so add a helper to avoid intrusive rename change later. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer --- acinclude.m4 | 1 + datapath/conntrack.c | 6 +++--- datapath/linux/compat/include/linux/skbuff.h | 11 +++ 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/acinclude.m4 b/acinclude.m4 index b34c0fd..ea5dfe9 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -606,6 +606,7 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_clear_hash_if_not_l4]) OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_postpush_rcsum]) OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [lco_csum]) + OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_nfct]) OVS_GREP_IFELSE([$KSRC/include/linux/types.h], [bool], [OVS_DEFINE([HAVE_BOOL_TYPE])]) diff --git a/datapath/conntrack.c b/datapath/conntrack.c index 36db32a..ec354c3 100644 --- a/datapath/conntrack.c +++ b/datapath/conntrack.c @@ -763,8 +763,8 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key, /* Associate skb with specified zone. */ if (tmpl) { - if (skb->nfct) - nf_conntrack_put(skb->nfct); + if (skb_nfct(skb)) + nf_conntrack_put(skb_nfct(skb)); nf_conntrack_get(>ct_general); skb->nfct = >ct_general; skb->nfctinfo = IP_CT_NEW; @@ -861,7 +861,7 @@ static int ovs_ct_lookup(struct net *net, struct sw_flow_key *key, if (err) return err; - ct = (struct nf_conn *)skb->nfct; + ct = (struct nf_conn *)skb_nfct(skb); if (ct) nf_ct_deliver_cached_events(ct); } diff --git a/datapath/linux/compat/include/linux/skbuff.h b/datapath/linux/compat/include/linux/skbuff.h index a2cbd78..943d5f8 100644 --- a/datapath/linux/compat/include/linux/skbuff.h +++ b/datapath/linux/compat/include/linux/skbuff.h @@ -371,4 +371,15 @@ static inline __wsum lco_csum(struct sk_buff *skb) return csum_partial(l4_hdr, csum_start - l4_hdr, partial); } #endif + +#ifndef HAVE_SKB_NFCT +static inline struct nf_conntrack *skb_nfct(const struct sk_buff *skb) +{ +#if IS_ENABLED(CONFIG_NF_CONNTRACK) + return skb->nfct; +#else + return NULL; +#endif +} +#endif #endif -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 12/25] datapath: Allow compiling against Linux 4.10
OVS in-tree datapath compiles against Linux 4.10 kernel, so allow it. Signed-off-by: Jarno RajahalmeAcked-by: Joe Stringer --- acinclude.m4 | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/acinclude.m4 b/acinclude.m4 index 052a18f..b34c0fd 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -134,10 +134,10 @@ AC_DEFUN([OVS_CHECK_LINUX], [ AC_MSG_RESULT([$kversion]) if test "$version" -ge 4; then - if test "$version" = 4 && test "$patchlevel" -le 9; then + if test "$version" = 4 && test "$patchlevel" -le 10; then : # Linux 4.x else - AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version newer than 4.9.x is not supported (please refer to the FAQ for advice)]) + AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version newer than 4.10.x is not supported (please refer to the FAQ for advice)]) fi elif test "$version" = 3 && test "$patchlevel" -ge 10; then : # Linux 3.x -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 09/25] datapath: handle NF_REPEAT from nf_conntrack_in()
From: Pablo Neira AyusoUpstream commit: commit 08733a0cb7decce40bbbd0331a0449465f13c444 Author: Pablo Neira Ayuso Date: Thu Nov 3 10:56:43 2016 +0100 netfilter: handle NF_REPEAT from nf_conntrack_in() NF_REPEAT is only needed from nf_conntrack_in() under a very specific case required by the TCP protocol tracker, we can handle this case without returning to the core hook path. Handling of NF_REPEAT from the nf_reinject() is left untouched. Signed-off-by: Pablo Neira Ayuso [Committer notes] Shift the functionality into the compat code, protected by v4.10 version check. This allows the datapath/conntrack.c to match upstream. Signed-off-by: Jarno Rajahalme Signed-off-by: Joe Stringer --- datapath/conntrack.c| 8 ++-- .../include/net/netfilter/nf_conntrack_core.h | 21 + 2 files changed, 23 insertions(+), 6 deletions(-) diff --git a/datapath/conntrack.c b/datapath/conntrack.c index a0c5443..36db32a 100644 --- a/datapath/conntrack.c +++ b/datapath/conntrack.c @@ -770,12 +770,8 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key, skb->nfctinfo = IP_CT_NEW; } - /* Repeat if requested, see nf_iterate(). */ - do { - err = nf_conntrack_in(net, info->family, - NF_INET_PRE_ROUTING, skb); - } while (err == NF_REPEAT); - + err = nf_conntrack_in(net, info->family, + NF_INET_PRE_ROUTING, skb); if (err != NF_ACCEPT) return -ENOENT; diff --git a/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h b/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h index 09a53c3..16b57a6 100644 --- a/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h +++ b/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h @@ -67,4 +67,25 @@ static inline bool rpl_nf_ct_get_tuple(const struct sk_buff *skb, #define nf_ct_get_tuple rpl_nf_ct_get_tuple #endif /* HAVE_NF_CT_GET_TUPLEPR_TAKES_STRUCT_NET */ +/* Commit 08733a0cb7de ("netfilter: handle NF_REPEAT from nf_conntrack_in()") + * introduced behavioural changes to this function which cannot be detected + * in the headers. Unconditionally backport to kernels older than the one which + * contains this commit. */ +#if LINUX_VERSION_CODE < KERNEL_VERSION(4,10,0) +static unsigned int rpl_nf_conntrack_in(struct net *net, u_int8_t pf, + unsigned int hooknum, + struct sk_buff *skb) +{ + int err; + + /* Repeat if requested, see nf_iterate(). */ + do { + err = nf_conntrack_in(net, pf, hooknum, skb); + } while (err == NF_REPEAT); + + return err; +} +#define nf_conntrack_in rpl_nf_conntrack_in +#endif /* < 4.10 */ + #endif /* _NF_CONNTRACK_CORE_WRAPPER_H */ -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 10/25] datapath: netns: make struct pernet_operations::id unsigned int.
From: Alexey DobriyanUpstream commit: commit c7d03a00b56fc23c3a01a8353789ad257363e281 Author: Alexey Dobriyan Date: Thu Nov 17 04:58:21 2016 +0300 netns: make struct pernet_operations::id unsigned int Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long *p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] callg MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void *net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 38863959 +73 tipc_link_build_proto_msg 10961140 +44 mac80211_hwsim_new_radio27762808 +32 tipc_mon_rcv10321058 +26 svcauth_gss_legacy_init 14131429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 12471260 +13 nfsd4_setclientid_confirm782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 10851063 -22 nfs_get_client 10771050 -27 tcf_bpf_init11061076 -30 nfsd4_encode_fattr 59975930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: Alexey Dobriyan Signed-off-by: David S. Miller [Committer notes] It looks like changing the type of this doesn't affect the build on older kernels, so we can just make the change. I didn't go through all of the compat code to update the net_id variables there as none of that code should be enabled on kernels with this patch. Upstream: c7d03a00b56f ("netns: make struct pernet_operations::id unsigned int") Signed-off-by: Joe Stringer Acked-by: Jarno Rajahalme Signed-off-by: Jarno Rajahalme --- datapath/datapath.c | 2 +- datapath/datapath.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/datapath/datapath.c b/datapath/datapath.c index 64cd781..78fcc1c 100644 --- a/datapath/datapath.c +++ b/datapath/datapath.c @@ -61,7 +61,7 @@ #include "vport-internal_dev.h" #include "vport-netdev.h" -int ovs_net_id __read_mostly; +unsigned int ovs_net_id __read_mostly; static struct genl_family dp_packet_genl_family; static struct genl_family dp_flow_genl_family; diff --git a/datapath/datapath.h b/datapath/datapath.h index 22bbaac..5e5c1c4 100644 --- a/datapath/datapath.h +++ b/datapath/datapath.h @@ -145,7 +145,7 @@ struct ovs_net { bool xt_label; }; -extern
[ovs-dev] [PATCH branch-2.7 08/25] datapath: use core MTU range checking in core net infra
From: Jarod WilsonUpstream commit: commit 61e84623ace35ce48975e8f90bbbac7557c43d61 Author: Jarod Wilson Date: Fri Oct 7 22:04:33 2016 -0400 net: centralize net_device min/max MTU checking While looking into an MTU issue with sfc, I started noticing that almost every NIC driver with an ndo_change_mtu function implemented almost exactly the same range checks, and in many cases, that was the only practical thing their ndo_change_mtu function was doing. Quite a few drivers have either 68, 64, 60 or 46 as their minimum MTU value checked, and then various sizes from 1500 to 65535 for their maximum MTU value. We can remove a whole lot of redundant code here if we simple store min_mtu and max_mtu in net_device, and check against those in net/core/dev.c's dev_set_mtu(). In theory, there should be zero functional change with this patch, it just puts the infrastructure in place. Subsequent patches will attempt to start using said infrastructure, with theoretically zero change in functionality. CC: net...@vger.kernel.org Signed-off-by: Jarod Wilson Signed-off-by: David S. Miller Upstream commit: commit 91572088e3fdbf4fe31cf397926d8b890fdb3237 Author: Jarod Wilson Date: Thu Oct 20 13:55:20 2016 -0400 net: use core MTU range checking in core net infra ... openvswitch: - set min/max_mtu, remove internal_dev_change_mtu - note: max_mtu wasn't checked previously, it's been set to 65535, which is the largest possible size supported ... Signed-off-by: Jarod Wilson Signed-off-by: David S. Miller Signed-off-by: Jarno Rajahalme Upstream commit: commit 425df17ce3a26d98f76e2b6b0af2acf4aeb0b026 Author: Jarno Rajahalme Date: Tue Feb 14 21:16:28 2017 -0800 openvswitch: Set internal device max mtu to ETH_MAX_MTU. Commit 91572088e3fd ("net: use core MTU range checking in core net infra") changed the openvswitch internal device to use the core net infra for controlling the MTU range, but failed to actually set the max_mtu as described in the commit message, which now defaults to ETH_DATA_LEN. This patch fixes this by setting max_mtu to ETH_MAX_MTU after ether_setup() call. Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra") Signed-off-by: Jarno Rajahalme Signed-off-by: David S. Miller This backport detects the new max_mtu field in the struct netdevice and uses the upstream code if it exists, and local backport code if not. The latter case is amended with bounds checks with new upstream macros ETH_MIN_MTU and ETH_MAX_MTU and the corresponding error messages from the upstream commit. Signed-off-by: Jarno Rajahalme Signed-off-by: Joe Stringer --- acinclude.m4 | 2 ++ datapath/linux/compat/include/linux/if_ether.h | 8 datapath/vport-internal_dev.c | 22 +++--- 3 files changed, 29 insertions(+), 3 deletions(-) diff --git a/acinclude.m4 b/acinclude.m4 index e8b64b5..052a18f 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -510,6 +510,8 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ OVS_FIND_PARAM_IFELSE([$KSRC/include/linux/netdevice.h], [netdev_master_upper_dev_link], [upper_priv], [OVS_DEFINE([HAVE_NETDEV_MASTER_UPPER_DEV_LINK_PRIV])]) + OVS_FIND_FIELD_IFELSE([$KSRC/include/linux/netdevice.h], [net_device], +[max_mtu]) OVS_GREP_IFELSE([$KSRC/include/linux/netfilter.h], [nf_hook_state]) OVS_GREP_IFELSE([$KSRC/include/linux/netfilter.h], [nf_register_net_hook]) diff --git a/datapath/linux/compat/include/linux/if_ether.h b/datapath/linux/compat/include/linux/if_ether.h index ac0f1ed..5eb99bc 100644 --- a/datapath/linux/compat/include/linux/if_ether.h +++ b/datapath/linux/compat/include/linux/if_ether.h @@ -3,6 +3,14 @@ #include_next +#ifndef ETH_MIN_MTU +#define ETH_MIN_MTU68 /* Min IPv4 MTU per RFC791 */ +#endif + +#ifndef ETH_MAX_MTU +#define ETH_MAX_MTU0xU /* 65535, same as IP_MAX_MTU*/ +#endif + #ifndef ETH_P_802_3_MIN #define ETH_P_802_3_MIN0x0600 #endif diff --git a/datapath/vport-internal_dev.c b/datapath/vport-internal_dev.c index f0542a5..2988e8c 100644 --- a/datapath/vport-internal_dev.c +++ b/datapath/vport-internal_dev.c @@ -89,14 +89,25 @@ static const struct ethtool_ops internal_dev_ethtool_ops = { .get_link = ethtool_op_get_link, }; -static int internal_dev_change_mtu(struct net_device *netdev, int new_mtu) +#ifndef HAVE_NET_DEVICE_WITH_MAX_MTU +static int internal_dev_change_mtu(struct net_device *dev, int new_mtu) { -
[ovs-dev] [PATCH branch-2.7 05/25] datapath: add NETIF_F_HW_VLAN_STAG_TX to internal dev.
From: Jiri BencUpstream commit: commit 3145c037e74926dea9241a3f68ada6f294b0119a Author: Jiri Benc Date: Mon Oct 10 17:02:44 2016 +0200 openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev The internal device does support 802.1AD offloading since 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes"). Signed-off-by: Jiri Benc Acked-by: Pravin B Shelar Acked-by: Eric Garver Signed-off-by: David S. Miller Upstream: 3145c037e749 ("openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev") Signed-off-by: Joe Stringer Acked-by: Jarno Rajahalme Signed-off-by: Jarno Rajahalme --- datapath/vport-internal_dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/datapath/vport-internal_dev.c b/datapath/vport-internal_dev.c index cc01c9c..f0542a5 100644 --- a/datapath/vport-internal_dev.c +++ b/datapath/vport-internal_dev.c @@ -188,7 +188,7 @@ static void do_setup(struct net_device *netdev) netdev->vlan_features = netdev->features; netdev->hw_enc_features = netdev->features; - netdev->features |= NETIF_F_HW_VLAN_CTAG_TX; + netdev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX; netdev->hw_features = netdev->features & ~NETIF_F_LLTX; eth_hw_addr_random(netdev); -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 03/25] datapath: Fix Frame-size larger than 1024 bytes warning.
From: pravin shelarUpstream commit: commit 190aa3e77880a05332ea1ccb382a51285d57adb5 Author: pravin shelar Date: Mon Sep 19 13:50:59 2016 -0700 openvswitch: Fix Frame-size larger than 1024 bytes warning. There is no need to declare separate key on stack, we can just use sw_flow->key to store the key directly. This commit fixes following warning: net/openvswitch/datapath.c: In function ‘ovs_flow_cmd_new’: net/openvswitch/datapath.c:1080:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=] Signed-off-by: Pravin B Shelar Signed-off-by: David S. Miller Signed-off-by: Jarno Rajahalme Signed-off-by: Joe Stringer --- datapath/datapath.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/datapath/datapath.c b/datapath/datapath.c index ce2364a..e4089ef 100644 --- a/datapath/datapath.c +++ b/datapath/datapath.c @@ -939,7 +939,6 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info) struct sw_flow_mask mask; struct sk_buff *reply; struct datapath *dp; - struct sw_flow_key key; struct sw_flow_actions *acts; struct sw_flow_match match; u32 ufid_flags = ovs_nla_get_ufid_flags(a[OVS_FLOW_ATTR_UFID_FLAGS]); @@ -967,20 +966,24 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info) } /* Extract key. */ - ovs_match_init(, , ); + ovs_match_init(, _flow->key, ); error = ovs_nla_get_match(net, , a[OVS_FLOW_ATTR_KEY], a[OVS_FLOW_ATTR_MASK], log); if (error) goto err_kfree_flow; - ovs_flow_mask_key(_flow->key, , true, ); - /* Extract flow identifier. */ error = ovs_nla_get_identifier(_flow->id, a[OVS_FLOW_ATTR_UFID], - , log); + _flow->key, log); if (error) goto err_kfree_flow; + /* unmasked key is needed to match when ufid is not used. */ + if (ovs_identifier_is_key(_flow->id)) + match.key = new_flow->id.unmasked_key; + + ovs_flow_mask_key(_flow->key, _flow->key, true, ); + /* Validate actions. */ error = ovs_nla_copy_actions(net, a[OVS_FLOW_ATTR_ACTIONS], _flow->key, , log); @@ -1007,7 +1010,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info) if (ovs_identifier_is_ufid(_flow->id)) flow = ovs_flow_tbl_lookup_ufid(>table, _flow->id); if (!flow) - flow = ovs_flow_tbl_lookup(>table, ); + flow = ovs_flow_tbl_lookup(>table, _flow->key); if (likely(!flow)) { rcu_assign_pointer(new_flow->sf_acts, acts); -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 02/25] datapath: use percpu flow stats
From: Thadeu Lima de Souza CascardoUpstream commit: commit db74a3335e0f645e3139c80bcfc90feb01d8e304 Author: Thadeu Lima de Souza Cascardo Date: Thu Sep 15 19:11:53 2016 -0300 openvswitch: use percpu flow stats Instead of using flow stats per NUMA node, use it per CPU. When using megaflows, the stats lock can be a bottleneck in scalability. On a E5-2690 12-core system, usual throughput went from ~4Mpps to ~15Mpps when forwarding between two 40GbE ports with a single flow configured on the datapath. This has been tested on a system with possible CPUs 0-7,16-23. After module removal, there were no corruption on the slab cache. Signed-off-by: Thadeu Lima de Souza Cascardo Cc: pravin shelar Acked-by: Pravin B Shelar Signed-off-by: David S. Miller Signed-off-by: Jarno Rajahalme Signed-off-by: Joe Stringer --- datapath/flow.c | 42 ++ datapath/flow.h | 4 ++-- datapath/flow_table.c | 26 +- 3 files changed, 33 insertions(+), 39 deletions(-) diff --git a/datapath/flow.c b/datapath/flow.c index 6d56644..58b0e13 100644 --- a/datapath/flow.c +++ b/datapath/flow.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include @@ -71,32 +72,33 @@ void ovs_flow_stats_update(struct sw_flow *flow, __be16 tcp_flags, { struct flow_stats *stats; int node = numa_node_id(); + int cpu = smp_processor_id(); int len = skb->len + (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0); - stats = rcu_dereference(flow->stats[node]); + stats = rcu_dereference(flow->stats[cpu]); - /* Check if already have node-specific stats. */ + /* Check if already have CPU-specific stats. */ if (likely(stats)) { spin_lock(>lock); /* Mark if we write on the pre-allocated stats. */ - if (node == 0 && unlikely(flow->stats_last_writer != node)) - flow->stats_last_writer = node; + if (cpu == 0 && unlikely(flow->stats_last_writer != cpu)) + flow->stats_last_writer = cpu; } else { stats = rcu_dereference(flow->stats[0]); /* Pre-allocated. */ spin_lock(>lock); - /* If the current NUMA-node is the only writer on the + /* If the current CPU is the only writer on the * pre-allocated stats keep using them. */ - if (unlikely(flow->stats_last_writer != node)) { + if (unlikely(flow->stats_last_writer != cpu)) { /* A previous locker may have already allocated the -* stats, so we need to check again. If node-specific +* stats, so we need to check again. If CPU-specific * stats were already allocated, we update the pre- * allocated stats as we have already locked them. */ - if (likely(flow->stats_last_writer != NUMA_NO_NODE) - && likely(!rcu_access_pointer(flow->stats[node]))) { - /* Try to allocate node-specific stats. */ + if (likely(flow->stats_last_writer != -1) && + likely(!rcu_access_pointer(flow->stats[cpu]))) { + /* Try to allocate CPU-specific stats. */ struct flow_stats *new_stats; new_stats = @@ -113,12 +115,12 @@ void ovs_flow_stats_update(struct sw_flow *flow, __be16 tcp_flags, new_stats->tcp_flags = tcp_flags; spin_lock_init(_stats->lock); - rcu_assign_pointer(flow->stats[node], + rcu_assign_pointer(flow->stats[cpu], new_stats); goto unlock; } } - flow->stats_last_writer = node; + flow->stats_last_writer = cpu; } } @@ -135,15 +137,15 @@ void ovs_flow_stats_get(const struct sw_flow *flow, struct ovs_flow_stats *ovs_stats, unsigned long *used, __be16 *tcp_flags) { - int node; + int cpu; *used = 0; *tcp_flags = 0; memset(ovs_stats, 0, sizeof(*ovs_stats)); - /* We open code this to make sure node 0 is always considered */ - for (node = 0; node < MAX_NUMNODES; node = next_node(node,
[ovs-dev] [PATCH branch-2.7 01/25] datapath: fix flow stats accounting when node 0 is not possible
From: Thadeu Lima de Souza CascardoUpstream commit: commit 40773966ccf1985a1b2bb570a03cbeaf1cbd4e00 Author: Thadeu Lima de Souza Cascardo Date: Thu Sep 15 19:11:52 2016 -0300 openvswitch: fix flow stats accounting when node 0 is not possible On a system with only node 1 as possible, all statistics is going to be accounted on node 0 as it will have a single writer. However, when getting and clearing the statistics, node 0 is not going to be considered, as it's not a possible node. Tested that statistics are not zero on a system with only node 1 possible. Also compile-tested with CONFIG_NUMA off. Signed-off-by: Thadeu Lima de Souza Cascardo Acked-by: Pravin B Shelar Signed-off-by: David S. Miller This patch contained a memory leak that is fixed in this backport. The next patch silently fixed that in upstream, too. Signed-off-by: Jarno Rajahalme Signed-off-by: Joe Stringer --- datapath/flow.c | 6 -- datapath/flow_table.c | 3 ++- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/datapath/flow.c b/datapath/flow.c index 390286c..6d56644 100644 --- a/datapath/flow.c +++ b/datapath/flow.c @@ -141,7 +141,8 @@ void ovs_flow_stats_get(const struct sw_flow *flow, *tcp_flags = 0; memset(ovs_stats, 0, sizeof(*ovs_stats)); - for_each_node(node) { + /* We open code this to make sure node 0 is always considered */ + for (node = 0; node < MAX_NUMNODES; node = next_node(node, node_possible_map)) { struct flow_stats *stats = rcu_dereference_ovsl(flow->stats[node]); if (stats) { @@ -164,7 +165,8 @@ void ovs_flow_stats_clear(struct sw_flow *flow) { int node; - for_each_node(node) { + /* We open code this to make sure node 0 is always considered */ + for (node = 0; node < MAX_NUMNODES; node = next_node(node, node_possible_map)) { struct flow_stats *stats = ovsl_dereference(flow->stats[node]); if (stats) { diff --git a/datapath/flow_table.c b/datapath/flow_table.c index d4204e5..3829b92 100644 --- a/datapath/flow_table.c +++ b/datapath/flow_table.c @@ -154,7 +154,8 @@ static void flow_free(struct sw_flow *flow) kfree(flow->id.unmasked_key); if (flow->sf_acts) ovs_nla_free_flow_actions((struct sw_flow_actions __force *)flow->sf_acts); - for_each_node(node) + /* We open code this to make sure node 0 is always considered */ + for (node = 0; node < MAX_NUMNODES; node = next_node(node, node_possible_map)) if (flow->stats[node]) kmem_cache_free(flow_stats_cache, rcu_dereference_raw(flow->stats[node])); -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH branch-2.7 00/25] Backports for branch-2.7
These are (mostly) datapath backports from master that fix existing features in branch-2.7 and/or make the datapath compilable with later Linux kernel code. Alexey Dobriyan (1): datapath: netns: make struct pernet_operations::id unsigned int. Florian Westphal (2): datapath: add and use skb_nfct helper datapath: add and use nf_ct_set helper Jarno Rajahalme (11): datapath: Allow compiling against Linux 4.10 datapath: Fix comments for skb->_nfct datapath: Use inverted tuple in ovs_ct_find_existing() if NATted. datapath: Do not trigger events for unconfirmed connections. datapath: Unionize ovs_key_ct_label with a u32 array. datapath: Simplify labels length logic. datapath: Refactor labels initialization. datapath: Inherit master's labels. ofp-util: Ignore unknown fields in ofputil_decode_packet_in2(). compat: nf_ct_delete compat. lib: Indicate if netlink message had labels. Jarod Wilson (1): datapath: use core MTU range checking in core net infra Jiri Benc (3): datapath: add NETIF_F_HW_VLAN_STAG_TX to internal dev. datapath: remove unused functions datapath: remove unnecessary EXPORT_SYMBOLs Pablo Neira Ayuso (1): datapath: handle NF_REPEAT from nf_conntrack_in() Thadeu Lima de Souza Cascardo (2): datapath: fix flow stats accounting when node 0 is not possible datapath: use percpu flow stats Yi-Hung Wei (1): nx-match: Fix oxm decode. pravin shelar (2): datapath: Fix Frame-size larger than 1024 bytes warning. datapath: avoid resetting flow key while installing new flow. stephen hemminger (1): datapath: make ndo_get_stats64 a void function acinclude.m4 | 15 +- datapath/conntrack.c | 195 ++--- datapath/datapath.c| 25 +-- datapath/datapath.h| 2 +- datapath/flow.c| 42 ++--- datapath/flow.h| 4 +- datapath/flow_netlink.c| 6 +- datapath/flow_netlink.h| 3 +- datapath/flow_table.c | 25 +-- datapath/linux/compat/include/linux/if_ether.h | 8 + datapath/linux/compat/include/linux/openvswitch.h | 8 +- datapath/linux/compat/include/linux/skbuff.h | 11 ++ .../compat/include/net/netfilter/nf_conntrack.h| 8 + .../include/net/netfilter/nf_conntrack_core.h | 58 ++ datapath/vport-internal_dev.c | 30 +++- datapath/vport-netdev.c| 1 - datapath/vport.c | 17 -- datapath/vport.h | 1 - lib/ct-dpif.h | 1 + lib/netlink-conntrack.c| 1 + lib/nx-match.c | 23 ++- lib/nx-match.h | 4 +- lib/ofp-util.c | 2 +- tests/system-traffic.at| 52 ++ 24 files changed, 378 insertions(+), 164 deletions(-) -- 2.1.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [branch-2.7 1/4] nx-match: Fix oxm decode.
IMO we should also backport the patch (“ofp-util: Ignore unknown fields in ofputil_decode_packet_in2().”) this patch fixed. jarno > On Mar 15, 2017, at 4:01 PM, Joe Stringerwrote: > > From: Yi-Hung Wei > > decode_nx_packet_in2() may be used by the switch to parse NXT_RESUME > messages, where we need exact match on the oxm header. It's also used by > OVN to parse NXT_PACKET_IN2 messages. For the switch, strict > prerequisites should be applied but for the controller, this should not > be the case. Pass the 'loose' parameter down to oxm_decode() to apply > these restrictions correctly based on which code is performing decode. > > Signed-off-by: Yi-Hung Wei > Signed-off-by: Joe Stringer > --- > lib/nx-match.c | 8 +--- > lib/nx-match.h | 4 ++-- > lib/ofp-util.c | 2 +- > 3 files changed, 8 insertions(+), 6 deletions(-) > > diff --git a/lib/nx-match.c b/lib/nx-match.c > index 91401e2201c6..c258869eec80 100644 > --- a/lib/nx-match.c > +++ b/lib/nx-match.c > @@ -678,12 +678,14 @@ oxm_pull_match_loose(struct ofpbuf *b, const struct > tun_table *tun_table, > * > * Fails with an error when encountering unknown OXM headers. > * > - * Returns 0 if successful, otherwise an OpenFlow error code. */ > + * If 'loose' is true, encountering unknown OXM headers or missing field > + * prerequisites are not considered as error conditions. > + */ > enum ofperr > -oxm_decode_match(const void *oxm, size_t oxm_len, > +oxm_decode_match(const void *oxm, size_t oxm_len, bool loose, > const struct tun_table *tun_table, struct match *match) > { > -return nx_pull_raw(oxm, oxm_len, true, match, NULL, NULL, tun_table); > +return nx_pull_raw(oxm, oxm_len, !loose, match, NULL, NULL, tun_table); > } > > /* Verify an array of OXM TLVs treating value of each TLV as a mask, > diff --git a/lib/nx-match.h b/lib/nx-match.h > index 5dca24a01a49..e103dd5fa74d 100644 > --- a/lib/nx-match.h > +++ b/lib/nx-match.h > @@ -61,8 +61,8 @@ enum ofperr oxm_pull_match(struct ofpbuf *, const struct > tun_table *, >struct match *); > enum ofperr oxm_pull_match_loose(struct ofpbuf *, const struct tun_table *, > struct match *); > -enum ofperr oxm_decode_match(const void *, size_t, const struct tun_table *, > - struct match *); > +enum ofperr oxm_decode_match(const void *, size_t, bool, > + const struct tun_table *, struct match *); > enum ofperr oxm_pull_field_array(const void *, size_t fields_len, > struct field_array *); > > diff --git a/lib/ofp-util.c b/lib/ofp-util.c > index 0c9343ec400b..d3153370f2e6 100644 > --- a/lib/ofp-util.c > +++ b/lib/ofp-util.c > @@ -3398,7 +3398,7 @@ decode_nx_packet_in2(const struct ofp_header *oh, bool > loose, > > case NXPINT_METADATA: > error = oxm_decode_match(payload.msg, ofpbuf_msgsize(), > - tun_table, >flow_metadata); > + loose, tun_table, >flow_metadata); > break; > > case NXPINT_USERDATA: > -- > 2.11.1 > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [branch-2.7 3/4] ofproto: Add ref counting for variable length mf_fields.
From: Yi-Hung WeiCurrently, a controller may potentially trigger a segmentation fault if it accidentally removes a TLV mapping that is still used by an active flow. To resolve this issue, in this patch, we maintain reference counting for each dynamically allocated variable length mf_fields, so that vswitchd can use this information to properly remove a TLV mapping, and to return an error if the controller tries to remove a TLV mapping that is still used by any active flow. To keep track of the usage of tun_metadata for each flow, two 'uint64_t' bitmaps are introduce for the flow match and flow action respectively. We use 'uint64_t' as a bitmap since the 64 geneve TLV tunnel metadata are the only available variable length mf_fields for now. We shall adopt general bitmap when more variable length mf_fields are introduced. The bitmaps are configured during the flow decoding process, and vswitchd use these bitmaps to increase or decrease the ref counting when the flow is created or deleted. VMWare-BZ: #1768370 Fixes: 04f48a68c428 ("ofp-actions: Fix variable length meta-flow OXMs.") Suggested-by: Jarno Rajahalme Suggested-by: Joe Stringer Signed-off-by: Yi-Hung Wei Signed-off-by: Joe Stringer --- build-aux/extract-ofp-actions | 9 +- include/openvswitch/ofp-actions.h | 2 + include/openvswitch/ofp-errors.h | 4 + include/openvswitch/ofp-util.h| 1 + lib/learn.c | 5 + lib/meta-flow.c | 228 -- lib/ofp-actions.c | 208 +- lib/ofp-util.c| 21 ++-- lib/vl-mff-map.h | 17 ++- ofproto/ofproto-provider.h| 4 + ofproto/ofproto.c | 33 +- ovn/controller/pinctrl.c | 6 +- tests/tunnel.at | 76 - utilities/ovs-ofctl.c | 2 +- 14 files changed, 479 insertions(+), 137 deletions(-) diff --git a/build-aux/extract-ofp-actions b/build-aux/extract-ofp-actions index 184447b99422..0062ab881dd5 100755 --- a/build-aux/extract-ofp-actions +++ b/build-aux/extract-ofp-actions @@ -322,7 +322,8 @@ def extract_ofp_actions(fn, definitions): static enum ofperr ofpact_decode(const struct ofp_action_header *a, enum ofp_raw_action_type raw, enum ofp_version version, uint64_t arg, - const struct vl_mff_map *vl_mff_map, struct ofpbuf *out) + const struct vl_mff_map *vl_mff_map, + uint64_t *tlv_bitmap, struct ofpbuf *out) { switch (raw) {\ """ @@ -343,7 +344,7 @@ ofpact_decode(const struct ofp_action_header *a, enum ofp_raw_action_type raw, else: arg = "arg" if arg_vl_mff_map: -print "return decode_%s(%s, version, vl_mff_map, out);" % (enum, arg) +print "return decode_%s(%s, version, vl_mff_map, tlv_bitmap, out);" % (enum, arg) else: print "return decode_%s(%s, version, out);" % (enum, arg) print @@ -365,7 +366,7 @@ ofpact_decode(const struct ofp_action_header *a, enum ofp_raw_action_type raw, else: prototype += "%s, enum ofp_version, " % base_argtype if arg_vl_mff_map: -prototype += 'const struct vl_mff_map *, ' +prototype += 'const struct vl_mff_map *, uint64_t *, ' prototype += "struct ofpbuf *);" print prototype @@ -374,7 +375,7 @@ static enum ofperr ofpact_decode(const struct ofp_action_header *, enum ofp_raw_action_type raw, enum ofp_version version, uint64_t arg, const struct vl_mff_map *vl_mff_map, - struct ofpbuf *out); + uint64_t *tlv_bitmap, struct ofpbuf *out); """ if __name__ == '__main__': diff --git a/include/openvswitch/ofp-actions.h b/include/openvswitch/ofp-actions.h index 88f573dcd74e..214dfed3f3bd 100644 --- a/include/openvswitch/ofp-actions.h +++ b/include/openvswitch/ofp-actions.h @@ -946,12 +946,14 @@ enum ofperr ofpacts_pull_openflow_actions(struct ofpbuf *openflow, unsigned int actions_len, enum ofp_version version, const struct vl_mff_map *, + uint64_t *ofpacts_tlv_bitmap, struct ofpbuf *ofpacts); enum ofperr ofpacts_pull_openflow_instructions(struct ofpbuf *openflow, unsigned int instructions_len, enum ofp_version version,
[ovs-dev] [branch-2.7 4/4] ofproto: Move tun_table and vl_mff_map deletion.
From: Yi-Hung WeiIn this patch, we move the tun_table and vl_mff_map deletion in ofproto_destory__() to be in the following order. 1. Delete all the flows. 2. Delete vl_mff_map. 3. Delete tun_table. The rationale behind this order is that a flow may use a variable length mf_field, and a variable length mf_field is defined by a TLV mapping in tun_table. Signed-off-by: Yi-Hung Wei Signed-off-by: Joe Stringer --- ofproto/ofproto.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c index b00a4af5e5c7..ceb020778d81 100644 --- a/ofproto/ofproto.c +++ b/ofproto/ofproto.c @@ -1574,14 +1574,6 @@ ofproto_destroy__(struct ofproto *ofproto) cmap_destroy(>groups); hmap_remove(_ofprotos, >hmap_node); -tun_metadata_free(ovsrcu_get_protected(struct tun_table *, - >metadata_tab)); - -ovs_mutex_lock(>vl_mff_map.mutex); -mf_vl_mff_map_clear(>vl_mff_map, true); -ovs_mutex_unlock(>vl_mff_map.mutex); -cmap_destroy(>vl_mff_map.cmap); -ovs_mutex_destroy(>vl_mff_map.mutex); free(ofproto->name); free(ofproto->type); @@ -1600,6 +1592,14 @@ ofproto_destroy__(struct ofproto *ofproto) } free(ofproto->tables); +ovs_mutex_lock(>vl_mff_map.mutex); +mf_vl_mff_map_clear(>vl_mff_map, true); +ovs_mutex_unlock(>vl_mff_map.mutex); +cmap_destroy(>vl_mff_map.cmap); +ovs_mutex_destroy(>vl_mff_map.mutex); +tun_metadata_free(ovsrcu_get_protected(struct tun_table *, + >metadata_tab)); + ovs_assert(hindex_is_empty(>cookies)); hindex_destroy(>cookies); -- 2.11.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [branch-2.7 2/4] nx-match: Use vl_mff_map to parse match field.
From: Yi-Hung Weivl_mff_map is introduced in commit 04f48a68c428 ("ofp-actions: Fix variable length meta-flow OXMs") to account variable length mf_field, and it is used to decode variable length mf_field in ofp_action. In this patch, vl_mff_map is further used to decode the variable length match field as well. Signed-off-by: Yi-Hung Wei Signed-off-by: Joe Stringer --- include/openvswitch/ofp-util.h | 6 ++-- lib/learning-switch.c | 2 +- lib/nx-match.c | 46 lib/nx-match.h | 7 ++-- lib/ofp-print.c| 4 +-- lib/ofp-util.c | 80 +- ofproto/ofproto.c | 11 +++--- ovn/controller/pinctrl.c | 2 +- tests/ofproto.at | 15 +--- utilities/ovs-ofctl.c | 13 +++ 10 files changed, 123 insertions(+), 63 deletions(-) diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/ofp-util.h index 0c3a10aa4264..e73a942a3e15 100644 --- a/include/openvswitch/ofp-util.h +++ b/include/openvswitch/ofp-util.h @@ -222,7 +222,7 @@ void ofputil_match_to_ofp10_match(const struct match *, struct ofp10_match *); /* Work with ofp11_match. */ enum ofperr ofputil_pull_ofp11_match(struct ofpbuf *, const struct tun_table *, - struct match *, + const struct vl_mff_map *, struct match *, uint16_t *padded_match_len); enum ofperr ofputil_pull_ofp11_mask(struct ofpbuf *, struct match *, struct mf_bitmap *bm); @@ -352,7 +352,7 @@ struct ofputil_flow_stats_request { enum ofperr ofputil_decode_flow_stats_request( struct ofputil_flow_stats_request *, const struct ofp_header *, -const struct tun_table *); +const struct tun_table *, const struct vl_mff_map *); struct ofpbuf *ofputil_encode_flow_stats_request( const struct ofputil_flow_stats_request *, enum ofputil_protocol); @@ -457,6 +457,7 @@ void ofputil_packet_in_destroy(struct ofputil_packet_in *); enum ofperr ofputil_decode_packet_in(const struct ofp_header *, bool loose, const struct tun_table *, + const struct vl_mff_map *, struct ofputil_packet_in *, size_t *total_len, uint32_t *buffer_id, struct ofpbuf *continuation); @@ -509,6 +510,7 @@ struct ofpbuf *ofputil_encode_packet_in_private( enum ofperr ofputil_decode_packet_in_private( const struct ofp_header *, bool loose, const struct tun_table *, +const struct vl_mff_map *, struct ofputil_packet_in_private *, size_t *total_len, uint32_t *buffer_id); diff --git a/lib/learning-switch.c b/lib/learning-switch.c index bc757f46dd7a..77155d04fcc0 100644 --- a/lib/learning-switch.c +++ b/lib/learning-switch.c @@ -523,7 +523,7 @@ process_packet_in(struct lswitch *sw, const struct ofp_header *oh) struct dp_packet pkt; struct flow flow; -error = ofputil_decode_packet_in(oh, true, NULL, , NULL, +error = ofputil_decode_packet_in(oh, true, NULL, NULL, , NULL, _id, NULL); if (error) { VLOG_WARN_RL(, "failed to decode packet-in: %s", diff --git a/lib/nx-match.c b/lib/nx-match.c index c258869eec80..124cb71eb7c8 100644 --- a/lib/nx-match.c +++ b/lib/nx-match.c @@ -480,13 +480,14 @@ nx_pull_header(struct ofpbuf *b, const struct vl_mff_map *vl_mff_map, static enum ofperr nx_pull_match_entry(struct ofpbuf *b, bool allow_cookie, +const struct vl_mff_map *vl_mff_map, const struct mf_field **field, union mf_value *value, union mf_value *mask) { enum ofperr error; uint64_t header; -error = nx_pull_entry__(b, allow_cookie, NULL, , field, value, +error = nx_pull_entry__(b, allow_cookie, vl_mff_map, , field, value, mask); if (error) { return error; @@ -507,7 +508,8 @@ nx_pull_match_entry(struct ofpbuf *b, bool allow_cookie, static enum ofperr nx_pull_raw(const uint8_t *p, unsigned int match_len, bool strict, struct match *match, ovs_be64 *cookie, ovs_be64 *cookie_mask, -const struct tun_table *tun_table) +const struct tun_table *tun_table, +const struct vl_mff_map *vl_mff_map) { ovs_assert((cookie != NULL) == (cookie_mask != NULL)); @@ -525,7 +527,8 @@ nx_pull_raw(const uint8_t *p, unsigned int match_len, bool strict, union mf_value mask; enum ofperr error; -error = nx_pull_match_entry(, cookie != NULL, , , ); +error = nx_pull_match_entry(, cookie != NULL, vl_mff_map, , +, ); if
[ovs-dev] [branch-2.7 1/4] nx-match: Fix oxm decode.
From: Yi-Hung Weidecode_nx_packet_in2() may be used by the switch to parse NXT_RESUME messages, where we need exact match on the oxm header. It's also used by OVN to parse NXT_PACKET_IN2 messages. For the switch, strict prerequisites should be applied but for the controller, this should not be the case. Pass the 'loose' parameter down to oxm_decode() to apply these restrictions correctly based on which code is performing decode. Signed-off-by: Yi-Hung Wei Signed-off-by: Joe Stringer --- lib/nx-match.c | 8 +--- lib/nx-match.h | 4 ++-- lib/ofp-util.c | 2 +- 3 files changed, 8 insertions(+), 6 deletions(-) diff --git a/lib/nx-match.c b/lib/nx-match.c index 91401e2201c6..c258869eec80 100644 --- a/lib/nx-match.c +++ b/lib/nx-match.c @@ -678,12 +678,14 @@ oxm_pull_match_loose(struct ofpbuf *b, const struct tun_table *tun_table, * * Fails with an error when encountering unknown OXM headers. * - * Returns 0 if successful, otherwise an OpenFlow error code. */ + * If 'loose' is true, encountering unknown OXM headers or missing field + * prerequisites are not considered as error conditions. + */ enum ofperr -oxm_decode_match(const void *oxm, size_t oxm_len, +oxm_decode_match(const void *oxm, size_t oxm_len, bool loose, const struct tun_table *tun_table, struct match *match) { -return nx_pull_raw(oxm, oxm_len, true, match, NULL, NULL, tun_table); +return nx_pull_raw(oxm, oxm_len, !loose, match, NULL, NULL, tun_table); } /* Verify an array of OXM TLVs treating value of each TLV as a mask, diff --git a/lib/nx-match.h b/lib/nx-match.h index 5dca24a01a49..e103dd5fa74d 100644 --- a/lib/nx-match.h +++ b/lib/nx-match.h @@ -61,8 +61,8 @@ enum ofperr oxm_pull_match(struct ofpbuf *, const struct tun_table *, struct match *); enum ofperr oxm_pull_match_loose(struct ofpbuf *, const struct tun_table *, struct match *); -enum ofperr oxm_decode_match(const void *, size_t, const struct tun_table *, - struct match *); +enum ofperr oxm_decode_match(const void *, size_t, bool, + const struct tun_table *, struct match *); enum ofperr oxm_pull_field_array(const void *, size_t fields_len, struct field_array *); diff --git a/lib/ofp-util.c b/lib/ofp-util.c index 0c9343ec400b..d3153370f2e6 100644 --- a/lib/ofp-util.c +++ b/lib/ofp-util.c @@ -3398,7 +3398,7 @@ decode_nx_packet_in2(const struct ofp_header *oh, bool loose, case NXPINT_METADATA: error = oxm_decode_match(payload.msg, ofpbuf_msgsize(), - tun_table, >flow_metadata); + loose, tun_table, >flow_metadata); break; case NXPINT_USERDATA: -- 2.11.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [branch-2.7 0/4] Backport of variable length metaflow field fixes.
Commit 04f48a68c428 ("ofp-actions: Fix variable length meta-flow OXMs."), on branch-2.7 as 9554b03d6ab7, attempted to address incorrect encode and decode of variable length metaflow fields where the OXM/NXM encoding of the variable length fields would incorrectly serialize the length. The patch addresses this by introducing a new per-bridge structure that adds additional metaflow fields for the variable-length fields on demand when the TLVs are configured by a controller. Unfortunately, in the original patch there was nothing ensuring that flows referring to variable length fields would retain valid field references when controllers reconfigure the TLVs. In practice, this could lead to a crash of ovs-vswitchd by configuring a TLV field, adding a flow which refers to it, removing the TLV field, then running some traffic that hit the configured flow. This series looks to remedy the situation by reference counting the variable length fields and preventing a controller from reconfiguring TLV fields when there are active flows whose match or actions refer to the field. This series was applied to master, but given the size of the change and the minor changes necessary to apply to branch-2.7, I would feel more confident in backporting it if there was an extra round of review to ensure that nothing was missed when this series was first applied to master. Yi-Hung Wei (4): nx-match: Fix oxm decode. nx-match: Use vl_mff_map to parse match field. ofproto: Add ref counting for variable length mf_fields. ofproto: Move tun_table and vl_mff_map deletion. build-aux/extract-ofp-actions | 9 +- include/openvswitch/ofp-actions.h | 2 + include/openvswitch/ofp-errors.h | 4 + include/openvswitch/ofp-util.h| 7 +- lib/learn.c | 5 + lib/learning-switch.c | 2 +- lib/meta-flow.c | 228 -- lib/nx-match.c| 52 ++--- lib/nx-match.h| 9 +- lib/ofp-actions.c | 208 +- lib/ofp-print.c | 4 +- lib/ofp-util.c| 101 +++-- lib/vl-mff-map.h | 17 ++- ofproto/ofproto-provider.h| 4 + ofproto/ofproto.c | 58 +++--- ovn/controller/pinctrl.c | 8 +- tests/ofproto.at | 15 ++- tests/tunnel.at | 76 - utilities/ovs-ofctl.c | 15 +-- 19 files changed, 614 insertions(+), 210 deletions(-) -- 2.11.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 4/4] ofproto: Move tun_table and vl_mff_map deletion.
On 13 March 2017 at 11:28, Yi-Hung Weiwrote: > In this patch, we move the tun_table and vl_mff_map deletion in > ofproto_destory__() to be in the following order. > 1. Delete all the flows. > 2. Delete vl_mff_map. > 3. Delete tun_table. > The rationale behind this order is that a flow may use a variable length > mf_field, and a variable length mf_field is defined by a TLV mapping > in tun_table. > > Signed-off-by: Yi-Hung Wei Thanks, applied to master. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 3/4] ofproto: Add ref counting for variable length mf_fields.
On 13 March 2017 at 11:28, Yi-Hung Weiwrote: > Currently, a controller may potentially trigger a segmentation fault if it > accidentally removes a TLV mapping that is still used by an active flow. > To resolve this issue, in this patch, we maintain reference counting for each > dynamically allocated variable length mf_fields, so that vswitchd can use this > information to properly remove a TLV mapping, and to return an error if the > controller tries to remove a TLV mapping that is still used by any active > flow. > > To keep track of the usage of tun_metadata for each flow, two 'uint64_t' > bitmaps are introduce for the flow match and flow action respectively. We use > 'uint64_t' as a bitmap since the 64 geneve TLV tunnel metadata are the only > available variable length mf_fields for now. We shall adopt general bitmap > when > more variable length mf_fields are introduced. The bitmaps are configured > during the flow decoding process, and vswitchd use these bitmaps to increase > or > decrease the ref counting when the flow is created or deleted. > > VMWare-BZ: #1768370 > Fixes: 04f48a68c428 ("ofp-actions: Fix variable length meta-flow OXMs.") > Suggested-by: Jarno Rajahalme > Suggested-by: Joe Stringer > Signed-off-by: Yi-Hung Wei Thanks, applied to master. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 2/4] nx-match: Use vl_mff_map to parse match field.
On 13 March 2017 at 11:28, Yi-Hung Weiwrote: > vl_mff_map is introduced in commit 04f48a68c428 ("ofp-actions: Fix variable > length meta-flow OXMs") to account variable length mf_field, and it is used > to decode variable length mf_field in ofp_action. In this patch, vl_mff_map > is further used to decode the variable length match field as well. > > Signed-off-by: Yi-Hung Wei Thanks, I made some minor style changes and applied this to master ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 1/4] nx-match: Fix oxm decode.
On 13 March 2017 at 11:27, Yi-Hung Weiwrote: > decode_nx_packet_in2() may be used by the switch to parse NXT_RESUME messages, > where we need exact match on the oxm header. Therefore, change > oxm_decode_loose() to oxm_decode() that takes an extra argument to indicate > whether > we want strict or loose match. > > Fixes: 7befb20d0f70 ("ofp-util: Ignore unknown fields in > ofputil_decode_packet_in2()") > Signed-off-by: Yi-Hung Wei Thanks, applied to master. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 1/7] Add support for 802.1ad (QinQ tunneling)
On Wed, Mar 1, 2017 at 2:47 PM, Eric Garverwrote: > Flow key handling changes: > - Add VLAN header array in struct flow, to record multiple 802.1q VLAN >headers. > - Add dpif multi-VLAN capability probing. If datapath supports >multi-VLAN, increase the maximum depth of nested OVS_KEY_ATTR_ENCAP. > > Refactor VLAN handling in dpif-xlate: > - Introduce 'xvlan' to track VLAN stack during flow processing. > - Input and output VLAN translation according to the xbundle type. > > Push VLAN action support: > - Allow ethertype 0x88a8 in VLAN headers and push_vlan action. > - Support push_vlan on dot1q packets. > > Use other_config:vlan-limit in table Open_vSwitch to limit maximum VLANs > that can be matched. This allows us to preserve backwards compatibility. > > Add test cases for VLAN depth limit, Multi-VLAN actions and QinQ VLAN > handling > > Co-authored-by: Thomas F Herbert > Signed-off-by: Thomas F Herbert > Co-authored-by: Xiao Liang > Signed-off-by: Xiao Liang > Signed-off-by: Eric Garver Thanks for working on this. And sorry it took a while to get those reviewed. The patch does not apply cleanly to master. Do you mind rebase and repost? I suppose you will need to this anyways for them to be pushed. I have a few comments for this patch, I will review this other patches more carefully after rebase. git am complained about extra while space. I think it is about the line below ofproto_set_vlan_limit() > > v2.7.0 - xx xxx > - > diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h > index df80dfe46199..0ed02e08fd2d 100644 > --- a/include/openvswitch/flow.h > +++ b/include/openvswitch/flow.h > @@ -23,7 +23,7 @@ > /* This sequence number should be incremented whenever anything involving > flows > * or the wildcarding of flows changes. This will cause build assertion > * failures in places which likely need to be updated. */ > -#define FLOW_WC_SEQ 36 > +#define FLOW_WC_SEQ 37 > > /* Number of Open vSwitch extension 32-bit registers. */ > #define FLOW_N_REGS 16 > @@ -61,6 +61,12 @@ const char *flow_tun_flag_to_string(uint32_t flags); > /* Maximum number of supported MPLS labels. */ > #define FLOW_MAX_MPLS_LABELS 3 > > +/* Maximum number of supported VLAN headers. */ > +#define FLOW_MAX_VLAN_HEADERS 2 > + > +/* Legacy maximum VLAN headers */ > +#define LEGACY_MAX_VLAN_HEADERS 1 > + > /* > * A flow in the network. > * > @@ -103,7 +109,8 @@ struct flow { > struct eth_addr dl_dst; /* Ethernet destination address. */ > struct eth_addr dl_src; /* Ethernet source address. */ > ovs_be16 dl_type; /* Ethernet frame type. */ > -ovs_be16 vlan_tci; /* If 802.1Q, TCI | VLAN_CFI; otherwise 0. */ > +uint8_t pad2[2];/* Pad to 64 bits. */ Do we need 4 more bytes for 64 bit alignment? > +void > +flow_pop_vlan(struct flow *flow, struct flow_wildcards *wc) > +{ > +int n = flow_count_vlan_headers(flow); > +if (n == 0) { > +return; > +} Can n be 0? or should this be an assert instead? > +if (wc) { > +memset(>masks.vlans[1], 0xff, > + sizeof(union flow_vlan_hdr) * (n - 1)); > +} Should we also set the mask for wc->masks.vlans[0]? > +memmove(>vlans[0], >vlans[1], > +sizeof(union flow_vlan_hdr) * (n - 1)); > +memset(>vlans[n - 1], 0, sizeof(union flow_vlan_hdr)); > +} > + > +void > +flow_push_vlan_uninit(struct flow *flow, struct flow_wildcards *wc) > +{ > +int n = flow_count_vlan_headers(flow); Do we always know there is room in the flow? > +if (wc) { > +memset(wc->masks.vlans, 0xff, sizeof(union flow_vlan_hdr) * n); > +} Should we set mask for the vlan fields that is about to be pushed? > +memmove(>vlans[1], >vlans[0], > +sizeof(union flow_vlan_hdr) * (FLOW_MAX_VLAN_HEADERS - 1)); > +memset(>vlans[0], 0, sizeof(union flow_vlan_hdr)); The function syas "_uninit", I don't think the last memset() is needed. > } > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] Document OVN support in ovs-sandbox.
On Wed, Mar 15, 2017 at 4:37 AM, Numan Siddiquewrote: > On Tue, Mar 14, 2017 at 1:56 AM, Russell Bryant wrote: >> >> A previous commit removed the original ovs-sandbox based OVN tutorial >> because it became too outdated and difficult to maintain. However, >> the use of ovs-sandbox for basic OVN development and testing is incredibly >> useful, so we should provide at least basic documentation on how to use >> it. >> >> This commit introduces a new and shorter document that shows how to use >> OVN >> in ovs-sandbox. It provides a single sample configuration, as well as a >> sample ovn-trace command. >> >> Signed-off-by: Russell Bryant > > > Nice to have the basic documentation back. > > Acked-by: Numan Siddique Thanks, I applied this to master. -- Russell Bryant ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC] lib/automake.mk: remove runtime directories
Markos Chandraswrites: > Hi Aaron, > > On 03/09/2017 03:35 PM, Aaron Conole wrote: >> The Open vSwitch run, log, and DB directories are installed as part of the >> normal `make install` process. However, this means they are created with >> user and group ownership that may conflict with the desired user. For >> example, running `make install` as root will install those files as >> root:root, whereas the runtime user desired may be openvswitch:openvswitch. >> >> Since these directories are automatically created as part of the ovs-ctl >> command, and with the correct user:group permissions, it makes sense to >> delay creation until these directories are actually required. >> >> Signed-off-by: Aaron Conole > > It looks reasonable to me. Thanks! > > Reviewed-by: Markos Chandras Thanks Markos. I'm planning on submitting this as a full PATCH once I've double checked that it doesn't break any of my existing test scripts and rpm builds. -Aaron ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] OVN: Compromised Chassis Mitigation
On Wed, Mar 15, 2017 at 7:18 AM, Lance Richardsonwrote: > > From: "Mickey Spiegel" > > To: "Lance Richardson" > > Cc: "Russell Bryant" , "devovs" > > Sent: Tuesday, March 14, 2017 3:06:53 PM > > Subject: Re: [ovs-dev] OVN: Compromised Chassis Mitigation > > > > Hi Mickey, > > Thanks for the excellent feedback. Here's the latest pass: > Thanks for driving this. All of this looks good to me :-) I hope others can provide some feedback on the question at the bottom. Mickey > 1) Add a new column, "role", of type "string" to the remote connection >table. If set, role-based access control is applied to transactions >on these connections using "role" as the index to the "RBAC_Roles" >table. If not set, role-based access control is not applied (e.g. >local unix: remotes between northd and ovsdb will not require RBAC >and will therefore not set the "role" column). > >For connections having role-based ACLs enabled, a reliable client ID >is required. This will require the use of SSL and client certificates >with CN field containing the client ID. > > 2) Add a new table, "RBAC_Roles", which is indexed by a role name >and contains two columns: > "name":Name of role, type string. Corresponds to "role" > column in remote connection table. > "permissions": A map of string (table name) to UUID (row in the > "RBAC_Permissions" table). > >The purpose of this table is to select a row in the RBAC_Permissions >table based on the transaction client's "role" and the name of a >table to be modified by an operation within a transaction. Having >this level of indirection allows new roles and access controls to >be created and managed dynamically, without having to update code >or schemas. > > 3) Add a new table, "RBAC_Permissions" which is initialized to contain one >row for each table in the schema that can be modified by ovn-controller. >Each row contains: > > - An "authorization" column containing a set of "string" type, where > each string is the name of a column (or column:key) whose contents > are to be compared against the ID of client attempting the > transaction > (CN field from client certificate). If this set is empty, all IDs > are > considered to be authorized. If this set contains more than one > string, > at least one must contain the client ID in order for the action to > be > considered authorized. > > - An "insert_delete" column of type boolean. If true, insertions > are allowed by any client and deletions are allowed for rows > meeting the authorization requirement. > > - An "update" column of type "set of strings". Each string is the > name of a column (or column:key) for which modification is allowed > in rows meeting the authorization requirement. > > For the current implementation of the OVN_Southbound schema, these tables > would have the following contents: > > RBAC_Roles: > name:"controller" > permissions: "Chassis": , > "Encap": , > "Port_Binding": RBAC_Permissions>, > "MAC_Binding": RBAC_Permissions> > > RBAC_Permissions: >Chassis row: > authorization: "chassis" > insert_delete: "true" > update:"nb_cfg", "external_ids", "encaps", > "vtep_logical_switches" > Modification of these columns is allowed for rows > which in > which the authorization check passes. > >Encap row: > authorization: "chassis" New column containing CN ID of row creator. > insert_delete: "true" > update:"type", "options", "ip" > >Port_Binding row: > authorization: "" All chassis are authorized. In a recent > live migration proposal, this column would contain > "options:chassis" and "options:migration- > destination". > insert_delete: "false" > update:"chassis" > >MAC_Binding row: > authorization: "" All chassis are allowed to update/delete any row. > Long- term plan would be to eliminate this table. > insert_delete: "true" > update: "logical_port", "ip", "mac", "datapath" > > 4) Implementation logic: >- Extract CN ID from client certificate and store with session state > when remote connection is established. >- Modify ovsdb_execute_{insert,delete,update,mutate} and related > functions > to pass the client session state (need to know whether RBAC is enabled > for the session and client ID for the session). >- Modify ovsdb_execute_{insert,delete,update,mutate} to apply access > controls. >- Logging, statistics. >- Unit tests. >- Documentation. > >
Re: [ovs-dev] [ dev-openvswitch ] sflow configuration
On Wed, Mar 15, 2017 at 02:52:19PM +0100, nicolas prochazka wrote: > Hello, > How can i configure agentSubId in sflow configuration . It doesn't look like OVS supports configuring the subid. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] OVN: Compromised Chassis Mitigation
> From: "Mickey Spiegel"> To: "Lance Richardson" > Cc: "Russell Bryant" , "devovs" > Sent: Tuesday, March 14, 2017 3:06:53 PM > Subject: Re: [ovs-dev] OVN: Compromised Chassis Mitigation > Hi Mickey, Thanks for the excellent feedback. Here's the latest pass: 1) Add a new column, "role", of type "string" to the remote connection table. If set, role-based access control is applied to transactions on these connections using "role" as the index to the "RBAC_Roles" table. If not set, role-based access control is not applied (e.g. local unix: remotes between northd and ovsdb will not require RBAC and will therefore not set the "role" column). For connections having role-based ACLs enabled, a reliable client ID is required. This will require the use of SSL and client certificates with CN field containing the client ID. 2) Add a new table, "RBAC_Roles", which is indexed by a role name and contains two columns: "name":Name of role, type string. Corresponds to "role" column in remote connection table. "permissions": A map of string (table name) to UUID (row in the "RBAC_Permissions" table). The purpose of this table is to select a row in the RBAC_Permissions table based on the transaction client's "role" and the name of a table to be modified by an operation within a transaction. Having this level of indirection allows new roles and access controls to be created and managed dynamically, without having to update code or schemas. 3) Add a new table, "RBAC_Permissions" which is initialized to contain one row for each table in the schema that can be modified by ovn-controller. Each row contains: - An "authorization" column containing a set of "string" type, where each string is the name of a column (or column:key) whose contents are to be compared against the ID of client attempting the transaction (CN field from client certificate). If this set is empty, all IDs are considered to be authorized. If this set contains more than one string, at least one must contain the client ID in order for the action to be considered authorized. - An "insert_delete" column of type boolean. If true, insertions are allowed by any client and deletions are allowed for rows meeting the authorization requirement. - An "update" column of type "set of strings". Each string is the name of a column (or column:key) for which modification is allowed in rows meeting the authorization requirement. For the current implementation of the OVN_Southbound schema, these tables would have the following contents: RBAC_Roles: name:"controller" permissions: "Chassis": , "Encap": , "Port_Binding":, "MAC_Binding": RBAC_Permissions: Chassis row: authorization: "chassis" insert_delete: "true" update:"nb_cfg", "external_ids", "encaps", "vtep_logical_switches" Modification of these columns is allowed for rows which in which the authorization check passes. Encap row: authorization: "chassis" New column containing CN ID of row creator. insert_delete: "true" update:"type", "options", "ip" Port_Binding row: authorization: "" All chassis are authorized. In a recent live migration proposal, this column would contain "options:chassis" and "options:migration-destination". insert_delete: "false" update:"chassis" MAC_Binding row: authorization: "" All chassis are allowed to update/delete any row. Long- term plan would be to eliminate this table. insert_delete: "true" update: "logical_port", "ip", "mac", "datapath" 4) Implementation logic: - Extract CN ID from client certificate and store with session state when remote connection is established. - Modify ovsdb_execute_{insert,delete,update,mutate} and related functions to pass the client session state (need to know whether RBAC is enabled for the session and client ID for the session). - Modify ovsdb_execute_{insert,delete,update,mutate} to apply access controls. - Logging, statistics. - Unit tests. - Documentation. Open questions: 1) Is "list of columns/column:keys" with implied boolean OR sufficient, or do we need something more powerful (e.g. a mini-language supporting comparison/boolean/set/... operations). ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [ dev-openvswitch ] sflow configuration
Hello, How can i configure agentSubId in sflow configuration . [nicolas-kabylake-8259511d-ffc3-3191-1428-f44d30684975]ovs-vsctl list sflow _uuid : 72794c6b-4b70-46b5-887f-31e622684c38 agent : neoEtap external_ids: {} header : 128 polling : 10 sampling: 100 targets : ["10.10.0.71:6343", "172.16.10.3:6343"] [nicolas-kabylake-8259511d-ffc3-3191-1428-f44d30684975]cat /data/ovs.sh #!/bin/bash COLLECTOR_IP=[\"172.16.10.3:6343\",\"10.10.0.71:6343\"] COLLECTOR_PORT=7001 AGENT=neoEtap HEADER=128 SAMPLING=100 POLLING=10 BRIDGE_NAME=switch0 ovs-vsctl -- --id=@sflow1 create sflow agent=${AGENT} \ target=${COLLECTOR_IP} header=${HEADER} \ sampling=${SAMPLING} polling=${POLLING} \ -- set bridge ${BRIDGE_NAME} sflow=@sflow1 agentSubId does not seems in sflow table, however we can see it in sflow code from ovs. Regards, Nicolas Prochazka ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] Call DR JAMES ADAMS No + 27 748 044 206 IRREVOCABLE PAYMENT ORDER VIA ATM CARD
___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] netdev-dpdk: add dpdk interface strip_vlan option
dpdk-strip-vlan option specifies whether strip vlan for the dpdk interface. Signed-off-by: Zang MingJie--- lib/netdev-dpdk.c| 23 ++- vswitchd/vswitch.xml | 7 +++ 2 files changed, 29 insertions(+), 1 deletion(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index ddc651bf9..ea49adf3e 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -373,6 +373,7 @@ struct netdev_dpdk { int requested_n_rxq; int requested_rxq_size; int requested_txq_size; +bool requested_strip_vlan; /* Number of rx/tx descriptors for physical devices */ int rxq_size; @@ -395,6 +396,8 @@ struct netdev_dpdk { /* DPDK-ETH hardware offload features, * from the enum set 'dpdk_hw_ol_features' */ uint32_t hw_ol_features; + +bool strip_vlan; }; struct netdev_rxq_dpdk { @@ -646,6 +649,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq) conf.rxmode.jumbo_frame = 0; conf.rxmode.max_rx_pkt_len = 0; } +conf.rxmode.hw_vlan_strip = dev->strip_vlan; conf.rxmode.hw_ip_checksum = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) != 0; /* A device may report more queues than it makes available (this has @@ -1133,6 +1137,19 @@ netdev_dpdk_process_devargs(const char *devargs, char **errp) } static void +dpdk_set_strip_vlan_config(struct netdev_dpdk *dev, const struct smap *args) +OVS_REQUIRES(dev->mutex) +{ +bool strip_vlan; + +strip_vlan = smap_get_bool(args, "dpdk-strip-vlan", false); +if (strip_vlan != dev->requested_strip_vlan) { +dev->requested_strip_vlan = strip_vlan; +netdev_request_reconfigure(>up); +} +} + +static void dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args) OVS_REQUIRES(dev->mutex) { @@ -1182,6 +1199,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, ovs_mutex_lock(>mutex); dpdk_set_rxq_config(dev, args); +dpdk_set_strip_vlan_config(dev, args); dpdk_process_queue_size(netdev, args, "n_rxq_desc", NIC_PORT_DEFAULT_RXQ_SIZE, @@ -3123,7 +3141,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev) && dev->mtu == dev->requested_mtu && dev->rxq_size == dev->requested_rxq_size && dev->txq_size == dev->requested_txq_size -&& dev->socket_id == dev->requested_socket_id) { +&& dev->socket_id == dev->requested_socket_id +&& dev->strip_vlan == dev->requested_strip_vlan) { /* Reconfiguration is unnecessary */ goto out; @@ -3145,6 +3164,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev) dev->rxq_size = dev->requested_rxq_size; dev->txq_size = dev->requested_txq_size; +dev->strip_vlan = dev->requested_strip_vlan; + rte_free(dev->tx_q); err = dpdk_eth_dev_init(dev); dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq); diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index a91be59b0..5c0083188 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -2360,6 +2360,13 @@ + + + Specifies whether strip vlan for the dpdk interface. It is useful + when using ixgbevf interface with a vlan filter. + + + -- 2.11.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] Document OVN support in ovs-sandbox.
On Tue, Mar 14, 2017 at 1:56 AM, Russell Bryantwrote: > A previous commit removed the original ovs-sandbox based OVN tutorial > because it became too outdated and difficult to maintain. However, > the use of ovs-sandbox for basic OVN development and testing is incredibly > useful, so we should provide at least basic documentation on how to use it. > > This commit introduces a new and shorter document that shows how to use OVN > in ovs-sandbox. It provides a single sample configuration, as well as a > sample ovn-trace command. > > Signed-off-by: Russell Bryant > Nice to have the basic documentation back. Acked-by: Numan Siddique --- > Documentation/automake.mk | 1 + > Documentation/index.rst | 1 + > Documentation/tutorials/index.rst | 1 + > Documentation/tutorials/ovn-sandbox.rst | 178 > > tutorial/automake.mk| 3 +- > tutorial/ovn-setup.sh | 27 + > 6 files changed, 210 insertions(+), 1 deletion(-) > create mode 100644 Documentation/tutorials/ovn-sandbox.rst > create mode 100755 tutorial/ovn-setup.sh > > diff --git a/Documentation/automake.mk b/Documentation/automake.mk > index 22a614b..c4076aa 100644 > --- a/Documentation/automake.mk > +++ b/Documentation/automake.mk > @@ -24,6 +24,7 @@ EXTRA_DIST += \ > Documentation/intro/install/xenserver.rst \ > Documentation/tutorials/index.rst \ > Documentation/tutorials/ovs-advanced.rst \ > + Documentation/tutorials/ovn-sandbox.rst \ > Documentation/topics/index.rst \ > Documentation/topics/bonding.rst \ > Documentation/topics/datapath.rst \ > diff --git a/Documentation/index.rst b/Documentation/index.rst > index 9f692b1..6970dce 100644 > --- a/Documentation/index.rst > +++ b/Documentation/index.rst > @@ -61,6 +61,7 @@ vSwitch? Start here. >:doc:`intro/install/dpdk` > > - **Tutorials:** :doc:`tutorials/ovs-advanced` | > + :doc:`tutorials/ovn-sandbox` > > Deeper Dive > --- > diff --git a/Documentation/tutorials/index.rst b/Documentation/tutorials/ > index.rst > index 8a7e6ee..4ba99c0 100644 > --- a/Documentation/tutorials/index.rst > +++ b/Documentation/tutorials/index.rst > @@ -40,3 +40,4 @@ vSwitch. > :maxdepth: 2 > > ovs-advanced > + ovn-sandbox > diff --git a/Documentation/tutorials/ovn-sandbox.rst > b/Documentation/tutorials/ovn-sandbox.rst > new file mode 100644 > index 000..d47fcf5 > --- /dev/null > +++ b/Documentation/tutorials/ovn-sandbox.rst > @@ -0,0 +1,178 @@ > +.. > + Licensed under the Apache License, Version 2.0 (the "License"); you > may > + not use this file except in compliance with the License. You may > obtain > + a copy of the License at > + > + http://www.apache.org/licenses/LICENSE-2.0 > + > + Unless required by applicable law or agreed to in writing, software > + distributed under the License is distributed on an "AS IS" BASIS, > WITHOUT > + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > See the > + License for the specific language governing permissions and > limitations > + under the License. > + > + Convention for heading levels in Open vSwitch documentation: > + > + === Heading 0 (reserved for the title in a document) > + --- Heading 1 > + ~~~ Heading 2 > + +++ Heading 3 > + ''' Heading 4 > + > + Avoid deeper levels because they do not render well. > + > +=== > +OVN Sandbox > +=== > + > +This tutorial shows you how to explore features using ``ovs-sandbox`` as a > +simulated test environment. It's assumed that you have an understanding > of OVS > +before going through this tutorial. Detail about OVN is covered in > +ovn-architecture_, but this tutorial lets you quickly see it in action. > + > +Getting Started > +--- > + > +For some general information about ``ovs-sandbox``, see the "Getting > Started" > +section of the tutorial_. > + > +``ovs-sandbox`` does not include OVN support by default. To enable OVN, > you > +must pass the ``--ovn`` flag. For example, if running it straight from > the ovs > +git tree you would run:: > + > +$ make sandbox SANDBOXFLAGS="--ovn" > + > +Running the sandbox with OVN enabled does the following additional steps > to the > +environment: > + > +1. Creates the ``OVN_Northbound`` and ``OVN_Southbound`` databases as > described in > + `ovn-nb(5)`_ and `ovn-sb(5)`_. > + > +2. Creates a backup server for ``OVN_Southbond`` database. Sandbox launch > + screen provides the instructions on accessing the backup database. > However > + access to the backup server is not required to go through the tutorial. > + > +3. Creates the ``hardware_vtep`` database as described in `vtep(5)`_. > + > +4. Runs the `ovn-northd(8)`_, `ovn-controller(8)`_, and > + `ovn-controller-vtep(8)`_
Re: [ovs-dev] [PATCH] ofproto: Add appctl command to show Datapath features
On Tue, Mar 14, 2017 at 6:19 PM, Joe Stringerwrote: > On 13 March 2017 at 14:21, Andy Zhou wrote: >> Exporting Datapath runtime detected features can be useful for >> both debugging and for writing system unit testing easier. >> >> Signed-off-by: Andy Zhou > > Can we perform some kind of build-time check how many fields are in > 'support' and ensure that this function is extended when people add > new probes? > That would be nice. I don't have any concrete at the moment. > Looks otherwise trivial and straightforward (and useful!), thanks. > Couple of minor comments below. > > Acked-by: Joe Stringer Thanks for the review. Pushed. > >> --- >> ofproto/ofproto-dpif.c | 53 >> +- >> 1 file changed, 52 insertions(+), 1 deletion(-) >> >> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c >> index df779c2..af70ab3 100644 >> --- a/ofproto/ofproto-dpif.c >> +++ b/ofproto/ofproto-dpif.c >> @@ -4891,6 +4891,38 @@ ofproto_unixctl_dpif_dump_dps(struct unixctl_conn >> *conn, int argc OVS_UNUSED, >> } >> >> static void >> +show_dp_feature_b(struct ds *ds, const char *feature, bool b) >> +{ >> +ds_put_format(ds, "%s: %s\n", feature, b ? "Yes" : "No"); >> +} >> + >> +static void >> +show_dp_feature_s(struct ds *ds, const char *feature, size_t s) >> +{ >> +ds_put_format(ds, "%s: %"PRIuSIZE"\n", feature, s); >> +} >> + >> +static void >> +dpif_show_support(const struct dpif_backer_support *support, struct ds *ds) >> +{ >> +show_dp_feature_b(ds, "Variable length userdata", >> + support->variable_length_userdata); >> +show_dp_feature_b(ds, "Masked set action", support->masked_set_action); >> +show_dp_feature_b(ds, "Tunnel push pop", support->tnl_push_pop); >> +show_dp_feature_b(ds, "Ufid", support->ufid); >> +show_dp_feature_b(ds, "Trunc action", support->trunc); >> +show_dp_feature_b(ds, "Clone action", support->clone); >> +show_dp_feature_s(ds, "Max MPLS depth",support->odp.max_mpls_depth); >> +show_dp_feature_b(ds, "Recirc",support->odp.recirc); >> +show_dp_feature_b(ds, "CT state", support->odp.ct_state); >> +show_dp_feature_b(ds, "CT zone", support->odp.ct_zone); >> +show_dp_feature_b(ds, "CT mark", support->odp.ct_mark); >> +show_dp_feature_b(ds, "CT label", support->odp.ct_label); >> +show_dp_feature_b(ds, "CT State NAT", support->odp.ct_state_nat); >> +show_dp_feature_s(ds, "Max sample nesting",support->sample_nesting); > > Needs an extra whitespace on this last line, before support->sample_nesting > O.K. Fixed. >> +} >> + >> +static void >> dpif_show_backer(const struct dpif_backer *backer, struct ds *ds) >> { >> const struct shash_node **ofprotos; >> @@ -4899,7 +4931,6 @@ dpif_show_backer(const struct dpif_backer *backer, >> struct ds *ds) >> size_t i; >> >> dpif_get_dp_stats(backer->dpif, _stats); >> - >> ds_put_format(ds, "%s: hit:%"PRIu64" missed:%"PRIu64"\n", >>dpif_name(backer->dpif), dp_stats.n_hit, >> dp_stats.n_missed); >> >> @@ -5119,6 +5150,24 @@ disable_datapath_clone(struct unixctl_conn *conn >> OVS_UNUSED, >> } >> >> static void >> +ofproto_unixctl_dpif_show_dp_features(struct unixctl_conn *conn, >> + int argc, const char *argv[], >> + void *aux OVS_UNUSED) >> +{ >> +struct ds ds = DS_EMPTY_INITIALIZER; >> +const char *br = argv[argc -1]; >> +struct ofproto_dpif *ofproto = ofproto_dpif_lookup(br); >> + >> +if (!ofproto) { >> +unixctl_command_reply_error(conn, "no such bridge"); >> +return; >> +} >> + >> +dpif_show_support(>backer->support, ); >> +unixctl_command_reply(conn, ds_cstr()); >> +} >> + >> +static void >> ofproto_unixctl_init(void) >> { >> static bool registered; >> @@ -5139,6 +5188,8 @@ ofproto_unixctl_init(void) >> ofproto_unixctl_dpif_dump_dps, NULL); >> unixctl_command_register("dpif/show", "", 0, 0, >> ofproto_unixctl_dpif_show, >> NULL); >> +unixctl_command_register("dpif/show-dp-features", "bridge", 1, 1, >> + ofproto_unixctl_dpif_show_dp_features, NULL); >> unixctl_command_register("dpif/dump-flows", "[-m] bridge", 1, 2, >> ofproto_unixctl_dpif_dump_flows, NULL); >> >> -- >> 1.8.3.1 >> >> ___ >> dev mailing list >> d...@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ovn-northd: Skip icmp4 packets destined for router ports from conntrack
Adding Joe and Jarno to CC. On Tue, Mar 14, 2017 at 9:01 PM, Lance Richardsonwrote: > > > - Original Message - > > From: "Numan Siddique" > > To: "Russell Bryant" > > Cc: "ovs dev" > > Sent: Tuesday, March 14, 2017 11:21:33 AM > > Subject: Re: [ovs-dev] [PATCH] ovn-northd: Skip icmp4 packets destined > for router ports from conntrack > > > > On Tue, Mar 14, 2017 at 12:57 AM, Russell Bryant > wrote: > > > > > On Fri, Mar 10, 2017 at 4:48 PM, Russell Bryant > wrote: > > > > On Fri, Mar 10, 2017 at 2:35 PM, Russell Bryant > wrote: > > > >> On Thu, Mar 9, 2017 at 11:52 PM, Numan Siddique < > nusid...@redhat.com> > > > wrote: > > > >> I don't think it's a Neutron issue. > > > >> > > > >> I see the conntrack entry remaining in the UNREPLIED state, even in > > > >> the working case where there's not an ACL dropping the reply. > > > >> > > > >> icmp 1 29 src=10.0.0.10 dst=10.0.0.1 type=8 code=0 id=25857 > > > >> [UNREPLIED] src=10.0.0.1 dst=10.0.0.10 type=0 code=0 id=25857 mark=0 > > > >> zone=8 use=1 > > > >> > > > >> If I ping a different address (something past the logical router), I > > > >> see a proper conntrack entry that's not in the UNREPLIED state. > > > >> > > > >> I wonder if there's something about how we are generating the ICMP > > > >> response from the logical router that's making conntrack not > properly > > > >> associate it with the request? > > > > > > > > I checked into this and there's no meaningful difference in how we > > > > form the ICMP reply. > > > > > > > > I'm guessing this has to do with the request and reply both going > > > > through conntrack as a part of processing the same packet in the OVS > > > > data path. That's not behaving how we would expect. I'll keep > > > > looking next week to try to identify the root cause here, but I would > > > > appreciate any insight others may have about the behavior we should > > > > expect in this scenario. > > > > > > I'm able to reproduce this outside of OpenStack. > > > > > > https://gist.github.com/russellb/4ab0a9641f12f8ac66fdd6822ee7789e > > > > > > This creates an OVN setup with two logical switches connected by a > > > logical router. > > > > > > switch d47412f9-e64a-4734-be26-80ee71ded805 (sw1) > > > port sw1-port1 > > > addresses: ["50:54:00:00:00:03 11.0.0.2"] > > > switch a1730d73-ccab-4f00-b748-3cafb683e9b8 (sw0) > > > port sw0-port2 > > > addresses: ["50:54:00:00:00:02 192.168.0.3"] > > > port sw0-port1 > > > addresses: ["50:54:00:00:00:01 192.168.0.2"] > > > router 193f68cd-4a93-4a04-ad3b-3ddf7b5c66f3 (lr0) > > > > > > The ping commands at the end demonstrate the problem. My expectation > > > is that the very last ping command should be successful, but is not > > > due to the issue we're seeing here. > > > > > > > > > > This issue seems to be fixed with the below code diff. From my > observation, > > I could see that when the router datapath generates the icmp response and > > when the reply packet hits the ovs_ct_execute function in datapath > > - the "sw_flow_key - key" param has old conntrack values > > - in the function "__ovs_ct_lookup", skb_nfct_cached(net, key, info, > skb) > > returns false. > > - and the state is not set with the flag- OVS_CS_F_ESTABLISHED. > > > > > > I want to check, if this is the right fix and get your feedback before > > sending the upstream patch to net-next. > > > > Hi Numan, > > Hopefully folks more familiar with this code than me will weigh in, but > your patch seems to make sense. > > Since this is a bug fix, if you do wind up submitting it upstream it > should go to "net" instead of "net-next" (with a "Fixes:" tag). > > Regards, > >Lance > > Thanks Lance for your comments. > > > > >< > > diff --git a/datapath/flow.c b/datapath/flow.c > > index 2bc1ad0..2c59acd 100644 > > --- a/datapath/flow.c > > +++ b/datapath/flow.c > > @@ -845,6 +845,8 @@ int ovs_flow_key_extract_userspace(struct net *net, > > const struct nlattr *attr, > > if (err) > > return err; > > > > + ovs_ct_fill_key(skb, key); > > + > > /* Check that we have conntrack original direction tuple metadata > > only > > * for packets for which it makes sense. Otherwise the key may > be > > * corrupted due to overlapping key fields. > > > > >< > > > > Thanks > > Numan > > > > > > > > > > > > > -- > > > Russell Bryant > > > > > ___ > > dev mailing list > > d...@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev