[ovs-dev] [PATCH v4] Adding support for PMD auto load balancing

2019-01-10 Thread Nitin Katiyar
Port rx queues that have not been statically assigned to PMDs are currently
assigned based on periodically sampled load measurements.
The assignment is performed at specific instances – port addition, port
deletion, upon reassignment request via CLI etc.

Due to change in traffic pattern over time it can cause uneven load among
the PMDs and thus resulting in lower overall throughout.

This patch enables the support of auto load balancing of PMDs based on
measured load of RX queues. Each PMD measures the processing load for each
of its associated queues every 10 seconds. If the aggregated PMD load reaches
95% for 6 consecutive intervals then PMD considers itself to be overloaded.

If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
performed by OVS main thread. The dry-run does NOT change the existing
queue to PMD assignments.

If the resultant mapping of dry-run indicates an improved distribution
of the load then the actual reassignment will be performed.

The automatic rebalancing will be disabled by default and has to be
enabled via configuration option. The interval (in minutes) between
two consecutive rebalancing can also be configured via CLI, default
is 1 min.

Following example commands can be used to set the auto-lb params:
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebal-interval="5"

Co-authored-by: Rohith Basavaraja 
Co-authored-by: Venkatesan Pradeep 
Signed-off-by: Rohith Basavaraja 
Signed-off-by: Venkatesan Pradeep 
Signed-off-by: Nitin Katiyar 
---
 Documentation/topics/dpdk/pmd.rst |  41 +
 NEWS  |   1 +
 lib/dpif-netdev.c | 378 ++
 vswitchd/vswitch.xml  |  41 +
 4 files changed, 461 insertions(+)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index dd9172d..c273b40 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -183,3 +183,44 @@ or can be triggered by using::
In addition, the output of ``pmd-rxq-show`` was modified to include
Rx queue utilization of the PMD as a percentage. Prior to this, tracking of
stats was not available.
+
+Automatic assignment of Port/Rx Queue to PMD Threads (experimental)
+---
+
+Cycle or utilization based allocation of Rx queues to PMDs gives efficient
+load distribution but it is not adaptive to change in traffic pattern occuring
+over the time. This causes uneven load among the PMDs which results in overall
+lower throughput.
+
+To address this automatic load balancing of PMDs can be set by::
+
+$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
+
+If pmd-auto-lb is set to true AND cycle based assignment is enabled then auto
+load balancing of PMDs is enabled provided there are 2 or more non-isolated
+PMDs and at least one of these PMDs is polling more than one RX queue.
+
+Once auto load balancing is set, each non-isolated PMD measures the processing
+load for each of its associated queues every 10 seconds. If the aggregated PMD
+load reaches 95% for 6 consecutive intervals then PMD considers itself to be
+overloaded.
+
+If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
+performed by OVS main thread. The dry-run does NOT change the existing queue
+to PMD assignments.
+
+If the resultant mapping of dry-run indicates an improved distribution of the
+load then the actual reassignment will be performed.
+
+The minimum time between 2 consecutive PMD auto load balancing iterations can
+also be configured by::
+
+$ ovs-vsctl set open_vswitch .\
+other_config:pmd-auto-lb-rebal-interval=""
+
+where  is a value in minutes. The default interval is 1 minute
+and setting it to 0 will also result in default value i.e. 1 min.
+
+A user can use this option to avoid frequent trigger of auto load balancing of
+PMDs. For e.g. set this (in min) such that it occurs once in few hours or a day
+or a week.
diff --git a/NEWS b/NEWS
index 2de844f..0e9fcb1 100644
--- a/NEWS
+++ b/NEWS
@@ -23,6 +23,7 @@ Post-v2.10.0
  * Add option for simple round-robin based Rxq to PMD assignment.
It can be set with pmd-rxq-assign.
  * Add support for DPDK 18.11
+ * Add support for Auto load balancing of PMDs (experimental)
- Add 'symmetric_l3' hash function.
- OVS now honors 'updelay' and 'downdelay' for bonds with LACP configured.
- ovs-vswitchd:
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1564db9..81e9cbf 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -80,6 +80,12 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
+/* Auto Load Balancing Defaults */
+#define ALB_ACCEPTABLE_IMPROVEMENT   25
+#define ALB_PMD_LOAD_THRESHOLD   95
+#define ALB_PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
+#define MIN_TO_MSEC  6
+
 #define FLOW_DUMP_MA

Re: [ovs-dev] odp-util: Do not rewrite fields with the same values as matched

2019-01-10 Thread 0-day Robot
Bleep bloop.  Greetings Eli Britstein, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Inappropriate bracing around statement
#49 FILE: lib/odp-util.c:7113:
if (size == 0)

ERROR: Inappropriate bracing around statement
#52 FILE: lib/odp-util.c:7116:
if (memcmp(pkey0, pkey1, size) == 0)

ERROR: Inappropriate bracing around statement
#54 FILE: lib/odp-util.c:7118:
else

WARNING: Line is 90 characters long (recommended limit is 79)
#77 FILE: lib/odp-util.c:7174:
#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_ethernet, name), 
sizeof(type)},

WARNING: Line is 86 characters long (recommended limit is 79)
#100 FILE: lib/odp-util.c:7317:
#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_ipv4, name), 
sizeof(type)},

WARNING: Line is 86 characters long (recommended limit is 79)
#122 FILE: lib/odp-util.c:7378:
#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_ipv6, name), 
sizeof(type)},

WARNING: Line is 85 characters long (recommended limit is 79)
#144 FILE: lib/odp-util.c:7438:
#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_arp, name), 
sizeof(type)},

WARNING: Line is 86 characters long (recommended limit is 79)
#165 FILE: lib/odp-util.c:7479:
#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_icmp, name), 
sizeof(type)},

WARNING: Line is 84 characters long (recommended limit is 79)
#188 FILE: lib/odp-util.c:7532:
#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_nd, name), 
sizeof(type)},

WARNING: Line is 85 characters long (recommended limit is 79)
#209 FILE: lib/odp-util.c:7751:
#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_tcp, name), 
sizeof(type)},

Lines checked: 345, Warnings: 7, Errors: 3


Please check this out.  If you feel there has been an error, please email 
acon...@bytheb.org

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] datapath: Declare ovs key structures using macros

2019-01-10 Thread 0-day Robot
Bleep bloop.  Greetings Eli Britstein, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 122 characters long (recommended limit is 79)
#25 FILE: build-aux/extract-odp-netlink-h:45:
s/OVS_KEY_FIELD_ARR(__u8,[[:space:]]*\([a-zA-Z0-9_]*\),[[:space:]]*ETH_ALEN[[:space:]]*/OVS_KEY_FIELD(struct
 eth_addr, \1/

WARNING: Line is 134 characters long (recommended limit is 79)
#33 FILE: build-aux/extract-odp-netlink-h:53:
s/OVS_KEY_FIELD_ARR(__be32,[[:space:]]*\([a-zA-Z0-9_]\{2,\}\),[[:space:]]*[[:space:]]*4[[:space:]]*/OVS_KEY_FIELD(struct
 in6_addr, \1/

Lines checked: 187, Warnings: 2, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@bytheb.org

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH V2 2/2] odp-util: Do not rewrite fields with the same values as matched

2019-01-10 Thread Eli Britstein
To improve performance and avoid wasting resources for HW offloaded
flows, do not rewrite fields that are matched with the same value.

Signed-off-by: Eli Britstein 
Reviewed-by: Roi Dayan 
---
 lib/odp-util.c| 109 +-
 tests/mpls-xlate.at   |   2 +-
 tests/ofproto-dpif.at |  14 +++
 3 files changed, 107 insertions(+), 18 deletions(-)

diff --git a/lib/odp-util.c b/lib/odp-util.c
index 0491bed38..15aebd046 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -7086,12 +7086,49 @@ commit_odp_tunnel_action(const struct flow *flow, 
struct flow *base,
 }
 }
 
+struct ovs_key_field_properties {
+int offset;
+int size;
+};
+
+/* Compare the keys similary to memcmp, but each field separately.
+ * The offsets and sizes of each field is provided by field_properties
+ * argument.
+ * For fields with the same value, zero out their mask part in order not to
+ * rewrite them as it's unnecessary */
+static bool
+keycmp_mask(const void *key0, const void *key1,
+struct ovs_key_field_properties *field_properties, void *mask)
+{
+int field;
+bool differ = false;
+
+for (field = 0 ; ; field++) {
+int size = field_properties[field].size;
+int offset = field_properties[field].offset;
+char *pkey0 = ((char *)key0) + offset;
+char *pkey1 = ((char *)key1) + offset;
+char *pmask = ((char *)mask) + offset;
+
+if (size == 0)
+break;
+
+if (memcmp(pkey0, pkey1, size) == 0)
+memset(pmask, 0, size);
+else
+differ = true;
+}
+
+return differ;
+}
+
 static bool
 commit(enum ovs_key_attr attr, bool use_masked_set,
const void *key, void *base, void *mask, size_t size,
+   struct ovs_key_field_properties *field_properties,
struct ofpbuf *odp_actions)
 {
-if (memcmp(key, base, size)) {
+if (keycmp_mask(key, base, field_properties, mask)) {
 bool fully_masked = odp_mask_is_exact(attr, mask, size);
 
 if (use_masked_set && !fully_masked) {
@@ -7133,6 +7170,12 @@ commit_set_ether_action(const struct flow *flow, struct 
flow *base_flow,
 bool use_masked)
 {
 struct ovs_key_ethernet key, base, mask;
+struct ovs_key_field_properties ovs_key_ethernet_field_properties[] = {
+#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_ethernet, name), 
sizeof(type)},
+OVS_KEY_ETHERNET_FIELDS
+{0, 0}
+#undef OVS_KEY_FIELD
+};
 
 if (flow->packet_type != htonl(PT_ETH)) {
 return;
@@ -7143,7 +7186,8 @@ commit_set_ether_action(const struct flow *flow, struct 
flow *base_flow,
 get_ethernet_key(&wc->masks, &mask);
 
 if (commit(OVS_KEY_ATTR_ETHERNET, use_masked,
-   &key, &base, &mask, sizeof key, odp_actions)) {
+   &key, &base, &mask, sizeof key,
+   ovs_key_ethernet_field_properties, odp_actions)) {
 put_ethernet_key(&base, base_flow);
 put_ethernet_key(&mask, &wc->masks);
 }
@@ -7269,6 +7313,12 @@ commit_set_ipv4_action(const struct flow *flow, struct 
flow *base_flow,
bool use_masked)
 {
 struct ovs_key_ipv4 key, mask, base;
+struct ovs_key_field_properties ovs_key_ipv4_field_properties[] = {
+#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_ipv4, name), 
sizeof(type)},
+OVS_KEY_IPV4_FIELDS
+{0, 0}
+#undef OVS_KEY_FIELD
+};
 
 /* Check that nw_proto and nw_frag remain unchanged. */
 ovs_assert(flow->nw_proto == base_flow->nw_proto &&
@@ -7286,7 +7336,7 @@ commit_set_ipv4_action(const struct flow *flow, struct 
flow *base_flow,
 }
 
 if (commit(OVS_KEY_ATTR_IPV4, use_masked, &key, &base, &mask, sizeof key,
-   odp_actions)) {
+   ovs_key_ipv4_field_properties, odp_actions)) {
 put_ipv4_key(&base, base_flow, false);
 if (mask.ipv4_proto != 0) { /* Mask was changed by commit(). */
 put_ipv4_key(&mask, &wc->masks, true);
@@ -7324,6 +7374,12 @@ commit_set_ipv6_action(const struct flow *flow, struct 
flow *base_flow,
bool use_masked)
 {
 struct ovs_key_ipv6 key, mask, base;
+struct ovs_key_field_properties ovs_key_ipv6_field_properties[] = {
+#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_ipv6, name), 
sizeof(type)},
+OVS_KEY_IPV6_FIELDS
+{0, 0}
+#undef OVS_KEY_FIELD
+};
 
 /* Check that nw_proto and nw_frag remain unchanged. */
 ovs_assert(flow->nw_proto == base_flow->nw_proto &&
@@ -7342,7 +7398,7 @@ commit_set_ipv6_action(const struct flow *flow, struct 
flow *base_flow,
 }
 
 if (commit(OVS_KEY_ATTR_IPV6, use_masked, &key, &base, &mask, sizeof key,
-   odp_actions)) {
+   ovs_key_ipv6_field_properties, odp_actions)) {
 put_ipv6_key(&base, base_flow, false);
 if (mask.ipv6_proto != 0) { /* Mask was changed by commit(). */
 put_ipv

[ovs-dev] [PATCH V2 1/2] datapath: Declare ovs key structures using macros

2019-01-10 Thread Eli Britstein
Declare ovs key structures using macros as a pre-step of retrieving
field information, with no functional change.

Signed-off-by: Eli Britstein 
Reviewed-by: Roi Dayan 
---
 build-aux/extract-odp-netlink-h   |   3 +-
 datapath/linux/compat/include/linux/openvswitch.h | 102 +++---
 2 files changed, 71 insertions(+), 34 deletions(-)

diff --git a/build-aux/extract-odp-netlink-h b/build-aux/extract-odp-netlink-h
index bc1cc35a7..c413fdbf4 100755
--- a/build-aux/extract-odp-netlink-h
+++ b/build-aux/extract-odp-netlink-h
@@ -42,7 +42,7 @@ $i\
 s,,"openvswitch/types.h"\
 #include ,
 s,#.*,,
-s/__u8[[:space:]]*\([a-zA-Z0-9_]*\)[[:space:]]*\[[[:space:]]*ETH_ALEN[[:space:]]*\]/struct
 eth_addr \1/
+s/OVS_KEY_FIELD_ARR(__u8,[[:space:]]*\([a-zA-Z0-9_]*\),[[:space:]]*ETH_ALEN[[:space:]]*/OVS_KEY_FIELD(struct
 eth_addr, \1/
 
 # Transform IPv6 addresses from an array to struct in6_addr
 #
@@ -50,6 +50,7 @@ 
s/__u8[[:space:]]*\([a-zA-Z0-9_]*\)[[:space:]]*\[[[:space:]]*ETH_ALEN[[:space:]]
 # one character because struct ovs_key_nsh has a member "__be32 c[4];"
 # that is not an IPv6 address.
 
s/__be32[[:space:]]*\([a-zA-Z0-9_]\{2,\}\)[[:space:]]*\[[[:space:]]*4[[:space:]]*\]/struct
 in6_addr \1/
+s/OVS_KEY_FIELD_ARR(__be32,[[:space:]]*\([a-zA-Z0-9_]\{2,\}\),[[:space:]]*[[:space:]]*4[[:space:]]*/OVS_KEY_FIELD(struct
 in6_addr, \1/
 
 # Transform most Linux-specific __u types into C99 uint_t types,
 # and most Linux-specific __be into Open vSwitch ovs_be,
diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 9b087f1b0..3af82a9b5 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -422,73 +422,109 @@ enum ovs_frag_type {
 
 #define OVS_FRAG_TYPE_MAX (__OVS_FRAG_TYPE_MAX - 1)
 
+#define OVS_KEY_FIELD_ARR(type, name, elements) type name[elements];
+#define OVS_KEY_FIELD(type, name) type name;
+
+#define OVS_KEY_ETHERNET_FIELDS \
+OVS_KEY_FIELD_ARR(__u8, eth_src, ETH_ALEN) \
+OVS_KEY_FIELD_ARR(__u8, eth_dst, ETH_ALEN)
+
 struct ovs_key_ethernet {
-   __u8 eth_src[ETH_ALEN];
-   __u8 eth_dst[ETH_ALEN];
+OVS_KEY_ETHERNET_FIELDS
 };
 
 struct ovs_key_mpls {
__be32 mpls_lse;
 };
 
+#define OVS_KEY_IPV4_FIELDS \
+OVS_KEY_FIELD(__be32, ipv4_src) \
+OVS_KEY_FIELD(__be32, ipv4_dst) \
+OVS_KEY_FIELD(__u8, ipv4_proto) \
+OVS_KEY_FIELD(__u8, ipv4_tos) \
+OVS_KEY_FIELD(__u8, ipv4_ttl) \
+OVS_KEY_FIELD(__u8, ipv4_frag /* One of OVS_FRAG_TYPE_*. */)
+
 struct ovs_key_ipv4 {
-   __be32 ipv4_src;
-   __be32 ipv4_dst;
-   __u8   ipv4_proto;
-   __u8   ipv4_tos;
-   __u8   ipv4_ttl;
-   __u8   ipv4_frag;   /* One of OVS_FRAG_TYPE_*. */
+OVS_KEY_IPV4_FIELDS
 };
 
+#define OVS_KEY_IPV6_FIELDS \
+OVS_KEY_FIELD_ARR(__be32, ipv6_src, 4) \
+OVS_KEY_FIELD_ARR(__be32, ipv6_dst, 4) \
+OVS_KEY_FIELD(__be32, ipv6_label /* 20-bits in least-significant bits. */) 
\
+OVS_KEY_FIELD(__u8, ipv6_proto) \
+OVS_KEY_FIELD(__u8, ipv6_tclass) \
+OVS_KEY_FIELD(__u8, ipv6_hlimit) \
+OVS_KEY_FIELD(__u8, ipv6_frag /* One of OVS_FRAG_TYPE_*. */)
+
 struct ovs_key_ipv6 {
-   __be32 ipv6_src[4];
-   __be32 ipv6_dst[4];
-   __be32 ipv6_label;  /* 20-bits in least-significant bits. */
-   __u8   ipv6_proto;
-   __u8   ipv6_tclass;
-   __u8   ipv6_hlimit;
-   __u8   ipv6_frag;   /* One of OVS_FRAG_TYPE_*. */
+OVS_KEY_IPV6_FIELDS
 };
 
+#define OVS_KEY_TCP_FIELDS \
+OVS_KEY_FIELD(__be16, tcp_src) \
+OVS_KEY_FIELD(__be16, tcp_dst)
+
 struct ovs_key_tcp {
-   __be16 tcp_src;
-   __be16 tcp_dst;
+OVS_KEY_TCP_FIELDS
 };
 
+#define OVS_KEY_UDP_FIELDS \
+OVS_KEY_FIELD(__be16, udp_src) \
+OVS_KEY_FIELD(__be16, udp_dst)
+
 struct ovs_key_udp {
-   __be16 udp_src;
-   __be16 udp_dst;
+OVS_KEY_UDP_FIELDS
 };
 
+#define OVS_KEY_SCTP_FIELDS \
+OVS_KEY_FIELD(__be16, sctp_src) \
+OVS_KEY_FIELD(__be16, sctp_dst)
+
 struct ovs_key_sctp {
-   __be16 sctp_src;
-   __be16 sctp_dst;
+OVS_KEY_SCTP_FIELDS
 };
 
+#define OVS_KEY_ICMP_FIELDS \
+OVS_KEY_FIELD(__u8, icmp_type) \
+OVS_KEY_FIELD(__u8, icmp_code)
+
 struct ovs_key_icmp {
-   __u8 icmp_type;
-   __u8 icmp_code;
+OVS_KEY_ICMP_FIELDS
 };
 
+#define OVS_KEY_ICMPV6_FIELDS \
+OVS_KEY_FIELD(__u8, icmpv6_type) \
+OVS_KEY_FIELD(__u8, icmpv6_code)
+
 struct ovs_key_icmpv6 {
-   __u8 icmpv6_type;
-   __u8 icmpv6_code;
+OVS_KEY_ICMPV6_FIELDS
 };
 
+#define OVS_KEY_ARP_FIELDS \
+OVS_KEY_FIELD(__be32, arp_sip) \
+OVS_KEY_FIELD(__be32, arp_tip) \
+OVS_KEY_FIELD(__be16, arp_op) \
+OVS_KEY_FIELD_ARR(__u8, arp_sha, ETH_ALEN) \
+OVS_KEY_FIELD_ARR(__u8, arp_tha, ETH_ALEN)
+
 struct ovs_key_arp {
-   __be32 arp_sip;
-   __be32 arp_tip;
-   __be16 arp_op;
-   __u8   arp_sha[ETH_ALEN];
-   __u8   arp

[ovs-dev] [PATCH V2 0/2] Do not rewrite fields with the same values as

2019-01-10 Thread Eli Britstein
Hi

This patch set avoids unnecessary rewrite actions to fields with the
same values as matched on.

Patch 1 is a pre-step by defining ovs key structs using macros

Patch 2 avoids the unnecessary rewrites and adapts the tests accordingly


Travis link:
https://travis-ci.org/roidayan/ovs/builds/477699710
Appvoyer link:
https://ci.appveyor.com/project/roidayan/ovs/builds/21515073

Thanks,
Eli


Eli Britstein (2):
  datapath: Declare ovs key structures using macros
  odp-util: Do not rewrite fields with the same values as matched

 build-aux/extract-odp-netlink-h   |   3 +-
 datapath/linux/compat/include/linux/openvswitch.h | 102 +---
 lib/odp-util.c| 109 --
 tests/mpls-xlate.at   |   2 +-
 tests/ofproto-dpif.at |  14 +--
 5 files changed, 178 insertions(+), 52 deletions(-)

-- 
2.14.5

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 6/8] ofproto: Handle multipart requests with multiple parts.

2019-01-10 Thread Justin Pettit


> On Jan 10, 2019, at 9:13 PM, Ben Pfaff  wrote:
> 
>> On Fri, Jan 11, 2019 at 05:09:45AM +, Justin Pettit wrote:
>> 
 On Jan 10, 2019, at 9:02 PM, Ben Pfaff  wrote:
 
> On Fri, Jan 11, 2019 at 01:56:01AM +, Justin Pettit wrote:
> 
> On Jan 10, 2019, at 4:03 PM, Ben Pfaff  wrote:
> 
> Thanks.  I applied this to master.
> 
> Now, what about patches 7 and 8?
 
 I like to build up the suspense.
 
 Just kidding.  I hope to get them out tonight.
>>> 
>>> I've heard that kind of optimism before.
>> 
>> Optimism is the faith that leads to achievement. Nothing can be done
>> without hope and confidence. - Helen Keller
> 
> I don't think Helen Keller ever built a distributed system.

Of course not with that attitude!


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 6/8] ofproto: Handle multipart requests with multiple parts.

2019-01-10 Thread Ben Pfaff
On Fri, Jan 11, 2019 at 05:09:45AM +, Justin Pettit wrote:
> 
> > On Jan 10, 2019, at 9:02 PM, Ben Pfaff  wrote:
> > 
> >> On Fri, Jan 11, 2019 at 01:56:01AM +, Justin Pettit wrote:
> >> 
> >>> On Jan 10, 2019, at 4:03 PM, Ben Pfaff  wrote:
> >>> 
> >>> Thanks.  I applied this to master.
> >>> 
> >>> Now, what about patches 7 and 8?
> >> 
> >> I like to build up the suspense.
> >> 
> >> Just kidding.  I hope to get them out tonight.
> > 
> > I've heard that kind of optimism before.
> 
> Optimism is the faith that leads to achievement. Nothing can be done
> without hope and confidence. - Helen Keller

I don't think Helen Keller ever built a distributed system.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] odp-util: Do not rewrite fields with the same values as matched

2019-01-10 Thread Eli Britstein
You are right. I'll fix it.

On 1/10/2019 9:35 PM, Yifeng Sun wrote:
Hi,

When I try to understand this patch, I feel there may be some issue in this 
loop below.
It looks like each loop is doing the same thing. Can you please take a look?

+for (i = 0; i < size; i++)
+if (memcmp(pkey0, pkey1, size) == 0)
+memset(pmask, 0, size);
+else
+differ = true;

Thanks,
Yifeng

On Thu, Jan 10, 2019 at 12:52 AM Eli Britstein 
mailto:el...@mellanox.com>> wrote:
To improve performance and avoid wasting resources for HW offloaded
flows, do not rewrite fields that are matched with the same value.

Signed-off-by: Eli Britstein mailto:el...@mellanox.com>>
Reviewed-by: Roi Dayan mailto:r...@mellanox.com>>
---
 lib/odp-util.c| 110 +-
 
tests/mpls-xlate.at
   |   2 +-
 
tests/ofproto-dpif.at
 |  14 +++
 3 files changed, 108 insertions(+), 18 deletions(-)

diff --git a/lib/odp-util.c b/lib/odp-util.c
index 0491bed38..7e916f9d9 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -7086,12 +7086,50 @@ commit_odp_tunnel_action(const struct flow *flow, 
struct flow *base,
 }
 }

+struct ovs_key_field_properties {
+int offset;
+int size;
+};
+
+/* Compare the keys similary to memcmp, but each field separately.
+ * The offsets and sizes of each field is provided by field_properties
+ * argument.
+ * For fields with the same value, zero out their mask part in order not to
+ * rewrite them as it's unnecessary */
+static bool
+keycmp_mask(const void *key0, const void *key1,
+struct ovs_key_field_properties *field_properties, void *mask)
+{
+int field, i;
+bool differ = false;
+
+for (field = 0 ; ; field++) {
+int size = field_properties[field].size;
+int offset = field_properties[field].offset;
+char *pkey0 = ((char *)key0) + offset;
+char *pkey1 = ((char *)key1) + offset;
+char *pmask = ((char *)mask) + offset;
+
+if (size == 0)
+break;
+
+for (i = 0; i < size; i++)
+if (memcmp(pkey0, pkey1, size) == 0)
+memset(pmask, 0, size);
+else
+differ = true;
+}
+
+return differ;
+}
+
 static bool
 commit(enum ovs_key_attr attr, bool use_masked_set,
const void *key, void *base, void *mask, size_t size,
+   struct ovs_key_field_properties *field_properties,
struct ofpbuf *odp_actions)
 {
-if (memcmp(key, base, size)) {
+if (keycmp_mask(key, base, field_properties, mask)) {
 bool fully_masked = odp_mask_is_exact(attr, mask, size);

 if (use_masked_set && !fully_masked) {
@@ -7133,6 +7171,12 @@ commit_set_ether_action(const struct flow *flow, struct 
flow *base_flow,
 bool use_masked)
 {
 struct ovs_key_ethernet key, base, mask;
+struct ovs_key_field_properties ovs_key_ethernet_field_properties[] = {
+#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_ethernet, name), 
sizeof(type)},
+OVS_KEY_ETHERNET_FIELDS
+{0, 0}
+#undef OVS_KEY_FIELD
+};

 if (flow->packet_type != htonl(PT_ETH)) {
 return;
@@ -7143,7 +7187,8 @@ commit_set_ether_action(const struct flow *flow, struct 
flow *base_flow,
 get_ethernet_key(&wc->masks, &mask);

 if (commit(OVS_KEY_ATTR_ETHERNET, use_masked,
-   &key, &base, &mask, sizeof key, odp_actions)) {
+   &key, &base, &mask, sizeof key,
+   ovs_key_ethernet_field_properties, odp_actions)) {
 put_ethernet_key(&base, base_flow);
 put_ethernet_key(&mask, &wc->masks);
 }
@@ -7269,6 +7314,12 @@ commit_set_ipv4_action(const struct flow *flow, struct 
flow *base_flow,
bool use_masked)
 {
 struct ovs_key_ipv4 key, mask, base;
+struct ovs_key_field_properties ovs_key_ipv4_field_properties[] = {
+#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_ipv4, name), 
sizeof(type)},
+OVS_KEY_IPV4_FIELDS
+{0, 0}
+#undef OVS_KEY_FIELD
+};

 /* Check that nw_proto and nw_frag remain unchanged. */
 ovs_assert(flow->nw_proto == base_flow->nw_proto &&
@@ -7286,7 +7337,7 @@ commit_set_ipv4_action(const struct flow *flow, struct 
flow *base_flow,
 }

 if (commit(OVS_KEY_ATTR_IPV4, use_masked, &key, &base, &mask, sizeof key,
-   odp_actions)) 

Re: [ovs-dev] [PATCH 6/8] ofproto: Handle multipart requests with multiple parts.

2019-01-10 Thread Justin Pettit


> On Jan 10, 2019, at 9:02 PM, Ben Pfaff  wrote:
> 
>> On Fri, Jan 11, 2019 at 01:56:01AM +, Justin Pettit wrote:
>> 
>>> On Jan 10, 2019, at 4:03 PM, Ben Pfaff  wrote:
>>> 
>>> Thanks.  I applied this to master.
>>> 
>>> Now, what about patches 7 and 8?
>> 
>> I like to build up the suspense.
>> 
>> Just kidding.  I hope to get them out tonight.
> 
> I've heard that kind of optimism before.

Optimism is the faith that leads to achievement. Nothing can be done without 
hope and confidence. - Helen Keller

--Justin


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 6/8] ofproto: Handle multipart requests with multiple parts.

2019-01-10 Thread Ben Pfaff
On Fri, Jan 11, 2019 at 01:56:01AM +, Justin Pettit wrote:
> 
> > On Jan 10, 2019, at 4:03 PM, Ben Pfaff  wrote:
> > 
> > Thanks.  I applied this to master.
> > 
> > Now, what about patches 7 and 8?
> 
> I like to build up the suspense.
> 
> Just kidding.  I hope to get them out tonight.

I've heard that kind of optimism before.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 6/8] ofproto: Handle multipart requests with multiple parts.

2019-01-10 Thread Justin Pettit


> On Jan 10, 2019, at 4:03 PM, Ben Pfaff  wrote:
> 
> Thanks.  I applied this to master.
> 
> Now, what about patches 7 and 8?

I like to build up the suspense.

Just kidding.  I hope to get them out tonight.

--Justin


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v13 06/11] dp-packet: Add support for data "linearization".

2019-01-10 Thread Darrell Ball
thanks; not a full review

On Wed, Jan 9, 2019 at 10:06 AM Tiago Lam  wrote:

> Previous commits have added support to the dp_packet API to handle
> multi-segmented packets, where data is not stored contiguously in
> memory. However, in some cases, it is inevitable and data must be
> provided contiguously. Examples of such cases are when performing csums
> over the entire packet data, or when write()'ing to a file descriptor
> (for a tap interface, for example). For such cases, the dp_packet API
> has been extended to provide a way to transform a multi-segmented
> DPBUF_DPDK packet into a DPBUF_MALLOC system packet (at the expense of a
> copy of memory). If the packet's data is already stored in memory
> contigously then there's no need to convert the packet.
>
> Thus, the main use cases that were assuming that a dp_packet's data is
> always held contiguously in memory were changed to make use of the new
> "linear functions" in the dp_packet API when there's a need to traverse
> the entire's packet data. Per the example above, when the packet's data
> needs to be write() to the tap's file descriptor, or when the conntrack
> module needs to verify a packet's checksum, the data is now linearized.
>
> Additionally, the miniflow_extract() function has been modified to check
> if the respective packet headers don't span across multiple mbufs. This
> requirement is needed to guarantee that callers can assume headers are
> always in contiguous memory.
>
> Signed-off-by: Tiago Lam 
> ---
>  lib/bfd.c |   3 +-
>  lib/conntrack-private.h   |   4 +-
>  lib/conntrack.c   |  43 +--
>  lib/crc32c.c  |  35 --
>  lib/crc32c.h  |   2 +
>  lib/dp-packet.c   |  18 +++
>  lib/dp-packet.h   | 267
> ++
>  lib/dpif-netdev.c |  13 +-
>  lib/dpif-netlink.c|   5 +
>  lib/dpif.c|   9 ++
>  lib/flow.c|  97 ---
>  lib/flow.h|   2 +-
>  lib/mcast-snooping.c  |   2 +
>  lib/netdev-bsd.c  |   5 +
>  lib/netdev-dummy.c|  13 +-
>  lib/netdev-linux.c|  13 +-
>  lib/netdev-native-tnl.c   |  26 ++--
>  lib/odp-execute.c |  12 +-
>  lib/ovs-lldp.c|   3 +-
>  lib/packets.c |  85 --
>  lib/packets.h |   7 ++
>  ofproto/ofproto-dpif-upcall.c |  20 +++-
>  ofproto/ofproto-dpif-xlate.c  |  32 -
>  tests/test-rstp.c |   8 +-
>  tests/test-stp.c  |   8 +-
>  25 files changed, 579 insertions(+), 153 deletions(-)
>
> diff --git a/lib/bfd.c b/lib/bfd.c
> index cc8c685..12e076a 100644
> --- a/lib/bfd.c
> +++ b/lib/bfd.c
> @@ -721,7 +721,8 @@ bfd_process_packet(struct bfd *bfd, const struct flow
> *flow,
>  if (!msg) {
>  VLOG_INFO_RL(&rl, "%s: Received too-short BFD control message
> (only "
>   "%"PRIdPTR" bytes long, at least %d required).",
> - bfd->name, (uint8_t *) dp_packet_tail(p) - l7,
> + bfd->name, dp_packet_size(p) -
> + (l7 - (uint8_t *) dp_packet_data(p)),
>   BFD_PACKET_LEN);
>  goto out;
>  }
> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
> index a344801..1be2df1 100644
> --- a/lib/conntrack-private.h
> +++ b/lib/conntrack-private.h
> @@ -159,8 +159,8 @@ tcp_payload_length(struct dp_packet *pkt)
>  {
>  const char *tcp_payload = dp_packet_get_tcp_payload(pkt);
>  if (tcp_payload) {
> -return ((char *) dp_packet_tail(pkt) - dp_packet_l2_pad_size(pkt)
> -- tcp_payload);
> +return dp_packet_l4_size(pkt) -
> +   (tcp_payload - (char *) dp_packet_l4(pkt));
>

why was this change needed ?


>  } else {
>  return 0;
>  }
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 6f6021a..0dd2dcc 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -636,12 +636,22 @@ reverse_pat_packet(struct dp_packet *pkt, const
> struct conn *conn)
>  static void
>  reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn)
>  {
> -char *tail = dp_packet_tail(pkt);
> -char pad = dp_packet_l2_pad_size(pkt);
> +char *tail;
> +char pad;
>  struct conn_key inner_key;
>  const char *inner_l4 = NULL;
> -uint16_t orig_l3_ofs = pkt->l3_ofs;
> -uint16_t orig_l4_ofs = pkt->l4_ofs;
> +uint16_t orig_l3_ofs;
> +uint16_t orig_l4_ofs;
> +
> +/* We need the whole packet to parse the packet below */
> +if (!dp_packet_is_linear(pkt)) {
> +dp_packet_linearize(pkt);
> +}
> +
> +tail = dp_packet_tail(pkt);
> +pad = dp_packet_l2_pad_size(pkt);
> +orig_l3_ofs = pkt->l3_ofs;
> +orig_l4_ofs = pkt->l4_ofs;


>  if (conn->key.dl_type == htons(ETH_TYPE_IP)) {
>  struct ip_header *nh = dp_packet_l3(pkt);
> @

[ovs-dev] [PATCH] datapath-windows: Add support for 'OVS_KEY_ATTR_ENCAP' key attribute.

2019-01-10 Thread Anand Kumar
Add a new structure in l2 header to accomodate vlan header,
based of commit "d7efce7beff25052bd9083419200e1a47f0d6066
datapath: 802.1AD Flow handling, actions, vlan parsing, netlink attributes"

Also reset vlan header in flow key, after deleting vlan tag from nbl

With this change a sample vlan flow would look like,
eth(src=0a:ea:8a:24:03:86,dst=0a:cd:fa:4d:15:5c),in_port(3),eth_type(0x8100),
vlan(vid=2239,pcp=0),encap(eth_type(0x0800),ipv4(src=13.12.11.149,dst=13.12.11.107,
proto=1,tos=0,ttl=128,frag=no),icmp(type=8,code=0))

Signed-off-by: Anand Kumar 
---
 datapath-windows/ovsext/Actions.c|   3 +
 datapath-windows/ovsext/DpInternal.h |  12 +++-
 datapath-windows/ovsext/Flow.c   | 126 +++
 datapath-windows/ovsext/User.c   |  19 ++
 4 files changed, 144 insertions(+), 16 deletions(-)

diff --git a/datapath-windows/ovsext/Actions.c 
b/datapath-windows/ovsext/Actions.c
index 6922f05..5c9b5c3 100644
--- a/datapath-windows/ovsext/Actions.c
+++ b/datapath-windows/ovsext/Actions.c
@@ -2057,6 +2057,9 @@ OvsDoExecuteActions(POVS_SWITCH_CONTEXT switchContext,
 goto dropit;
 }
 }
+/* Reset vlan header info in flowkey. */
+key->l2.vlanKey.vlanTci = 0;
+key->l2.vlanKey.vlanTpid = 0;
 break;
 }
 
diff --git a/datapath-windows/ovsext/DpInternal.h 
b/datapath-windows/ovsext/DpInternal.h
index 3e351b7..58e7ed8 100644
--- a/datapath-windows/ovsext/DpInternal.h
+++ b/datapath-windows/ovsext/DpInternal.h
@@ -112,6 +112,11 @@ typedef struct Icmp6Key {
 struct in6_addr ndTarget;/* IPv6 neighbor discovery (ND) target. */
 } Icmp6Key; /* Size of 72 byte. */
 
+typedef struct VlanKey {
+ovs_be16 vlanTci;/* If 802.1Q, TCI | VLAN_CFI; otherwise 0. */
+ovs_be16 vlanTpid;   /* Vlan type. Generally 802.1q or 802.1ad.*/
+} VlanKey;
+
 typedef struct L2Key {
 uint32_t inPort; /* Port number of input port. */
 union {
@@ -123,9 +128,10 @@ typedef struct L2Key {
 };
 uint8_t dlSrc[6];/* Ethernet source address. */
 uint8_t dlDst[6];/* Ethernet destination address. */
-ovs_be16 vlanTci;/* If 802.1Q, TCI | VLAN_CFI; otherwise 0. */
 ovs_be16 dlType; /* Ethernet frame type. */
-} L2Key; /* Size of 24 byte. */
+struct VlanKey vlanKey;  /* VLAN header. */
+uint16_t pad[3]; /* Padding 6 bytes. */
+} L2Key; /* Size of 32 byte. */
 
 /* Number of packet attributes required to store OVS tunnel key. */
 #define NUM_PKT_ATTR_REQUIRED 35
@@ -182,7 +188,7 @@ typedef struct MplsKey {
 
 typedef __declspec(align(8)) struct OvsFlowKey {
 OvsIPv4TunnelKey tunKey; /* 280 bytes */
-L2Key l2;/* 24 bytes */
+L2Key l2;/* 32 bytes */
 union {
 /* These headers are mutually exclusive. */
 IpKey ipKey; /* size 16 */
diff --git a/datapath-windows/ovsext/Flow.c b/datapath-windows/ovsext/Flow.c
index f880987..7994786 100644
--- a/datapath-windows/ovsext/Flow.c
+++ b/datapath-windows/ovsext/Flow.c
@@ -115,7 +115,7 @@ const NL_POLICY nlFlowKeyPolicy[] = {
 [OVS_KEY_ATTR_PRIORITY] = {.type = NL_A_UNSPEC, .minLen = 4,
.maxLen = 4, .optional = TRUE},
 [OVS_KEY_ATTR_IN_PORT] = {.type = NL_A_UNSPEC, .minLen = 4,
-  .maxLen = 4, .optional = FALSE},
+  .maxLen = 4, .optional = TRUE},
 [OVS_KEY_ATTR_ETHERNET] = {.type = NL_A_UNSPEC,
.minLen = sizeof(struct ovs_key_ethernet),
.maxLen = sizeof(struct ovs_key_ethernet),
@@ -457,6 +457,7 @@ _FlowNlGetCmdHandler(POVS_USER_PARAMS_CONTEXT usrParamsCtx,
 NL_BUFFER nlBuf;
 PNL_ATTR keyAttrs[__OVS_KEY_ATTR_MAX];
 PNL_ATTR tunnelAttrs[__OVS_TUNNEL_KEY_ATTR_MAX];
+PNL_ATTR encapAttrs[__OVS_KEY_ATTR_MAX];
 
 NlBufInit(&nlBuf, usrParamsCtx->outputBuffer,
   usrParamsCtx->outputLength);
@@ -464,6 +465,7 @@ _FlowNlGetCmdHandler(POVS_USER_PARAMS_CONTEXT usrParamsCtx,
 RtlZeroMemory(&getOutput, sizeof(OvsFlowGetOutput));
 UINT32 keyAttrOffset = 0;
 UINT32 tunnelKeyAttrOffset = 0;
+UINT32 encapOffset = 0;
 BOOLEAN ok;
 NL_ERROR nlError = NL_ERROR_SUCCESS;
 
@@ -503,6 +505,23 @@ _FlowNlGetCmdHandler(POVS_USER_PARAMS_CONTEXT usrParamsCtx,
 goto done;
 }
 
+if (keyAttrs[OVS_KEY_ATTR_ENCAP]) {
+encapOffset = (UINT32)((PCHAR) (keyAttrs[OVS_KEY_ATTR_ENCAP])
+  - (PCHAR)nlMsgHdr);
+
+if ((NlAttrParseNested(nlMsgHdr, encapOffset,
+   NlAttrLen(keyAttrs[OVS_KEY_ATTR_ENCAP]),
+   nlFlowKeyPolicy,
+   ARRAY_SIZE(nlFlowKeyPolicy),
+   encapAttrs, ARRAY_SIZE(encapAttrs)))
+  

[ovs-dev] [RFC PATCH v1 3/3] OVN ACL: Allow a user to input ct.label value for an acl

2019-01-10 Thread Ankur Sharma
This patch allows user to associate a value with acl,
which will be assigned to ct.label of the corresponding
connection tracking entry.

This value can be used to map a ct entry with corresponding
OVN ACL or higher level constructs like security group.

signed-off-by: Ankur Sharma 
---
 ovn/northd/ovn-northd.c   | 37 --
 ovn/ovn-nb.ovsschema  |  5 +++--
 ovn/ovn-nb.xml|  9 
 ovn/utilities/ovn-nbctl.c | 24 +++-
 tests/ovn-nbctl.at| 12 --
 tests/ovn.at  | 57 +++
 6 files changed, 132 insertions(+), 12 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index a3249c4..2088d51 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -166,12 +166,13 @@ enum ovn_stage {
 #define OVN_ACL_PRI_OFFSET 1000
 
 /* Register definitions specific to switches. */
-#define REGBIT_CONNTRACK_DEFRAG  "reg0[0]"
-#define REGBIT_CONNTRACK_COMMIT  "reg0[1]"
-#define REGBIT_CONNTRACK_NAT "reg0[2]"
-#define REGBIT_DHCP_OPTS_RESULT  "reg0[3]"
-#define REGBIT_DNS_LOOKUP_RESULT "reg0[4]"
-#define REGBIT_ND_RA_OPTS_RESULT "reg0[5]"
+#define REGBIT_CONNTRACK_DEFRAG "reg0[0]"
+#define REGBIT_CONNTRACK_COMMIT "reg0[1]"
+#define REGBIT_CONNTRACK_NAT"reg0[2]"
+#define REGBIT_CONNTRACK_SET_LABEL  "reg0[3]"
+#define REGBIT_DHCP_OPTS_RESULT "reg0[4]"
+#define REGBIT_DNS_LOOKUP_RESULT"reg0[5]"
+#define REGBIT_ND_RA_OPTS_RESULT"reg0[6]"
 
 /* Register definitions for switches and routers. */
 #define REGBIT_NAT_REDIRECT "reg9[0]"
@@ -3554,7 +3555,14 @@ consider_acl(struct hmap *lflows, struct ovn_datapath 
*od,
   " || (!ct.new && ct.est && !ct.rpl "
"&& ct_mark.blocked == 1)) "
   "&& (%s)", acl->match);
-ds_put_cstr(&actions, REGBIT_CONNTRACK_COMMIT" = 1; ");
+if (acl->label) {
+   ds_put_format(&actions, REGBIT_CONNTRACK_COMMIT" = 1; "
+ ""REGBIT_CONNTRACK_SET_LABEL" = 1; "
+ "xxreg1 = %s; ", acl->label);
+} else {
+   ds_put_cstr(&actions, REGBIT_CONNTRACK_COMMIT" = 1; ");
+}
+
 build_acl_log(&actions, acl);
 ds_put_cstr(&actions, "next;");
 ovn_lflow_add_with_hint(lflows, od, stage,
@@ -3988,6 +3996,21 @@ build_stateful(struct ovn_datapath *od, struct hmap 
*lflows)
 ovn_lflow_add(lflows, od, S_SWITCH_IN_STATEFUL, 0, "1", "next;");
 ovn_lflow_add(lflows, od, S_SWITCH_OUT_STATEFUL, 0, "1", "next;");
 
+/* If REGBIT_CONNTRACK_COMMIT is set as 1 and
+ * REGBIT_CONNTRACK_SET_LABEL is set to 1, then the packets should be
+ * committed to conntrack.
+ * We always set ct_mark.blocked to 0 here as
+ * any packet that makes it this far is part of a connection we
+ * want to allow to continue. */
+ovn_lflow_add(lflows, od, S_SWITCH_IN_STATEFUL, 150,
+  REGBIT_CONNTRACK_COMMIT" == 1 && "
+  ""REGBIT_CONNTRACK_SET_LABEL" == 1",
+  "ct_commit(ct_mark=0/1, ct_label=xxreg1); next;");
+ovn_lflow_add(lflows, od, S_SWITCH_OUT_STATEFUL, 150,
+  REGBIT_CONNTRACK_COMMIT" == 1 && "
+  ""REGBIT_CONNTRACK_SET_LABEL" == 1",
+  "ct_commit(ct_mark=0/1, ct_label=xxreg1); next;");
+
 /* If REGBIT_CONNTRACK_COMMIT is set as 1, then the packets should be
  * committed to conntrack. We always set ct_mark.blocked to 0 here as
  * any packet that makes it this far is part of a connection we
diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema
index f3683df..bd2f1f8 100644
--- a/ovn/ovn-nb.ovsschema
+++ b/ovn/ovn-nb.ovsschema
@@ -1,7 +1,7 @@
 {
 "name": "OVN_Northbound",
-"version": "5.14.0",
-"cksum": "3600467067 20513",
+"version": "5.14.1",
+"cksum": "2042385824 20587",
 "tables": {
 "NB_Global": {
 "columns": {
@@ -171,6 +171,7 @@
 "debug"]]},
   "min": 0, "max": 1}},
 "meter": {"type": {"key": "string", "min": 0, "max": 1}},
+"label": {"type": {"key": "string", "min": 0, "max": 1}},
 "external_ids": {
 "type": {"key": "string", "value": "string",
  "min": 0, "max": "unlimited"}}},
diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index 6d6fb05..6e624c8 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -1219,6 +1219,15 @@
 default, log messages are not rate-limited.
 
   
+
+  
+
+Associates an identifier with the ACL.
+Same value will be written to corresponding connection
+tracker entry. Value should be in hex, for example: 0x1234.
+
+  
+

[ovs-dev] [RFC PATCH v1 2/3] OVN ACL: Allow ct_mark and ct_label values to be set from register as well

2019-01-10 Thread Ankur Sharma
OVN allows only an integer (or masked integer) to be assigned to
ct_mark and ct_label.

This patch, enhances the parser code to allow ct_mark and ct_label
to be assigned from 32 bit registers (MFF_REG0 - MFF_REG15) and  128
bit registers (MFF_XXREG0 - MFF_XXREG3) respectively.

signed-off-by: Ankur Sharma 
---
 include/ovn/actions.h |  3 +++
 ovn/lib/actions.c | 73 ++-
 2 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 1c0c67c..58b96a1 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -24,6 +24,7 @@
 #include "openvswitch/dynamic-string.h"
 #include "openvswitch/hmap.h"
 #include "openvswitch/uuid.h"
+#include "openvswitch/meta-flow.h"
 #include "util.h"
 
 struct expr;
@@ -196,8 +197,10 @@ struct ovnact_ct_next {
 /* OVNACT_CT_COMMIT. */
 struct ovnact_ct_commit {
 struct ovnact ovnact;
+bool is_ct_mark_reg, is_ct_label_reg; /* If the value is from a register */
 uint32_t ct_mark, ct_mark_mask;
 ovs_be128 ct_label, ct_label_mask;
+enum mf_field_id ct_mark_reg, ct_label_reg;
 };
 
 /* OVNACT_CT_DNAT, OVNACT_CT_SNAT. */
diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
index 7b7a894..c0c8da7 100644
--- a/ovn/lib/actions.c
+++ b/ovn/lib/actions.c
@@ -627,8 +627,28 @@ parse_ct_commit_arg(struct action_context *ctx,
 } else if (ctx->lexer->token.type == LEX_T_MASKED_INTEGER) {
 cc->ct_mark = ntohll(ctx->lexer->token.value.integer);
 cc->ct_mark_mask = ntohll(ctx->lexer->token.mask.integer);
+} else if (ctx->lexer->token.type == LEX_T_ID) {
+
+   cc->ct_mark_mask = UINT32_MAX;
+
+   const struct mf_field *mf = mf_from_name(ctx->lexer->token.s);
+   if (mf) {
+
+  if (mf->id >= MFF_REG0 && mf->id <= MFF_REG15) {
+ cc->is_ct_mark_reg = true;
+ cc->ct_mark_reg = mf->id;
+  } else {
+ lexer_syntax_error(ctx->lexer, "input: %s,  not a 32 bit "
+"register", mf->name);
+ return;
+  }
+   } else {
+  lexer_syntax_error(ctx->lexer, "invalid field name: %s",
+ ctx->lexer->token.s);
+  return;
+   }
 } else {
-lexer_syntax_error(ctx->lexer, "expecting integer");
+lexer_syntax_error(ctx->lexer, "invalid token type");
 return;
 }
 lexer_get(ctx->lexer);
@@ -642,9 +662,28 @@ parse_ct_commit_arg(struct action_context *ctx,
 } else if (ctx->lexer->token.type == LEX_T_MASKED_INTEGER) {
 cc->ct_label = ctx->lexer->token.value.be128_int;
 cc->ct_label_mask = ctx->lexer->token.mask.be128_int;
+} else if (ctx->lexer->token.type == LEX_T_ID) {
+
+   cc->ct_label_mask = OVS_BE128_MAX;
+   const struct mf_field *mf = mf_from_name(ctx->lexer->token.s);
+   if (mf) {
+  if (mf->id >= MFF_XXREG0 && mf->id <= MFF_XXREG3) {
+ cc->is_ct_label_reg = true;
+ cc->ct_label_reg = mf->id;
+  } else {
+ lexer_syntax_error(ctx->lexer, "input: %s,  not a 128 bit "
+"register", mf->name);
+ return;
+  }
+   } else {
+  lexer_syntax_error(ctx->lexer, "invalid field name: %s",
+ ctx->lexer->token.s);
+  return;
+   }
+
 } else {
-lexer_syntax_error(ctx->lexer, "expecting integer");
-return;
+   lexer_syntax_error(ctx->lexer, "invalid token type");
+   return;
 }
 lexer_get(ctx->lexer);
 } else {
@@ -713,14 +752,36 @@ encode_CT_COMMIT(const struct ovnact_ct_commit *cc,
 ofpbuf_pull(ofpacts, set_field_offset);
 
 if (cc->ct_mark_mask) {
-const ovs_be32 value = htonl(cc->ct_mark);
-const ovs_be32 mask = htonl(cc->ct_mark_mask);
-ofpact_put_set_field(ofpacts, mf_from_id(MFF_CT_MARK), &value, &mask);
+   if (cc->is_ct_mark_reg) {
+  struct ofpact_reg_move *move = ofpact_put_REG_MOVE(ofpacts);
+
+  move->src.field = mf_from_id(cc->ct_mark_reg);
+  move->src.ofs = 0;
+  move->src.n_bits = 32;
+  move->dst.field = mf_from_id(MFF_CT_MARK);
+  move->dst.ofs = 0;
+  move->dst.n_bits = 32;
+   } else {
+  const ovs_be32 value = htonl(cc->ct_mark);
+  const ovs_be32 mask = htonl(cc->ct_mark_mask);
+  ofpact_put_set_field(ofpacts, mf_from_id(MFF_CT_MARK), &value, 
&mask);
+   }
 }
 
 if (!ovs_be128_is_zero(cc->ct_label_mask)) {
+   if (cc->is_ct_label_reg) {
+  struct ofpact_reg_move *move = ofpact_put_REG_MOVE(ofpacts);
+
+  move->src.field = mf_from_id(cc->ct_label_reg);
+  move->src.ofs = 0;
+  

[ovs-dev] [RFC PATCH v1 1/3] OVN ACL: Replace the usage of ct_label with ct_mark

2019-01-10 Thread Ankur Sharma
OVN ACL implementation used ct_label to indicate if a previosuly
allowed connection shoudl not be allowed anymore and vice versa.

However, ct_label is a 128 bit value and we should rather leverage
on ct_mark which is a 32 bit value.

Using ct_mark for this purpose, allows us to use ct_label for storing
other values like, identifier for corresponidng OVN ACL/Security group etc.

signed-off-by: Ankur Sharma 
---
 ovn/lib/logical-fields.c|  1 +
 ovn/northd/ovn-northd.8.xml | 14 ++---
 ovn/northd/ovn-northd.c | 48 ++---
 tests/ovn.at|  9 +
 4 files changed, 37 insertions(+), 35 deletions(-)

diff --git a/ovn/lib/logical-fields.c b/ovn/lib/logical-fields.c
index a8b5e3c..5165a3a 100644
--- a/ovn/lib/logical-fields.c
+++ b/ovn/lib/logical-fields.c
@@ -108,6 +108,7 @@ ovn_init_symtab(struct shash *symtab)
 
 /* Connection tracking state. */
 expr_symtab_add_field(symtab, "ct_mark", MFF_CT_MARK, NULL, false);
+expr_symtab_add_subfield(symtab, "ct_mark.blocked", NULL, "ct_mark[0]");
 
 expr_symtab_add_field(symtab, "ct_label", MFF_CT_LABEL, NULL, false);
 expr_symtab_add_subfield(symtab, "ct_label.blocked", NULL, "ct_label[0]");
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 392a5ef..4fef4c4 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -286,14 +286,14 @@
   
   
 allow-related ACLs translate into logical
-flows with the ct_commit(ct_label=0/1); next; actions
+flows with the ct_commit(ct_mark=0/1); next; actions
 for new connections and reg0[1] = 1; next; for existing
 connections.
   
   
 Other ACLs translate to drop; for new or untracked
-connections and ct_commit(ct_label=1/1); for known
-connections.  Setting ct_label marks a connection
+connections and ct_commit(ct_mark=1/1); for known
+connections.  Setting ct_mark marks a connection
 as one that was previously allowed, but should no longer be
 allowed due to a policy change.
   
@@ -319,12 +319,12 @@
 A priority-65535 flow that allows any traffic in the reply
 direction for a connection that has been committed to the
 connection tracker (i.e., established flows), as long as
-the committed flow does not have ct_label.blocked set.
+the committed flow does not have ct_mark.blocked set.
 We only handle traffic in the reply direction here because
 we want all packets going in the request direction to still
 go through the flows that implement the currently defined
 policy based on ACLs.  If a connection is no longer allowed by
-policy, ct_label.blocked will get set and packets in the
+policy, ct_mark.blocked will get set and packets in the
 reply direction will no longer be allowed, either.
   
 
@@ -332,7 +332,7 @@
 A priority-65535 flow that allows any traffic that is considered
 related to a committed flow in the connection tracker (e.g., an
 ICMP Port Unreachable from a non-listening UDP port), as long
-as the committed flow does not have ct_label.blocked set.
+as the committed flow does not have ct_mark.blocked set.
   
 
   
@@ -342,7 +342,7 @@
 
   
 A priority-65535 flow that drops all trafic in the reply direction
-with ct_label.blocked set meaning that the connection
+with ct_mark.blocked set meaning that the connection
 should no longer be allowed due to a policy change.  Packets
 in the request direction are skipped here to let a newly created
 ACL re-allow this connection.
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 3fd8a87..a3249c4 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -3546,13 +3546,13 @@ consider_acl(struct hmap *lflows, struct ovn_datapath 
*od,
  * It's also possible that a known connection was marked for
  * deletion after a policy was deleted, but the policy was
  * re-added while that connection is still known.  We catch
- * that case here and un-set ct_label.blocked (which will be done
+ * that case here and un-set ct_mark.blocked (which will be done
  * by ct_commit in the "stateful" stage) to indicate that the
  * connection should be allowed to resume.
  */
 ds_put_format(&match, "((ct.new && !ct.est)"
   " || (!ct.new && ct.est && !ct.rpl "
-   "&& ct_label.blocked == 1)) "
+   "&& ct_mark.blocked == 1)) "
   "&& (%s)", acl->match);
 ds_put_cstr(&actions, REGBIT_CONNTRACK_COMMIT" = 1; ");
 build_acl_log(&actions, acl);
@@ -3573,7 +3573,7 @@ conside

[ovs-dev] [RFC PATCH v1 0/3] Associate identifier with OVN ACL connection tracking entry

2019-01-10 Thread Ankur Sharma
What:

a. Goal is to be able to associate some identifier with a connection tracking
entry.

b. This identifier can be used to map OVN ACL which added this entry or
higher level constructs like openstack security group etc.

c. There are 2 connection tracking fields which can be used for it.
ct.mark (32 bits) and ct.label (128 bits).

d. Patch intends to use ct.label, as this is a longer field and
hence would be put to a better use, if it stores the identifier.

Why:

a. Adding an identifier would help in debugging.
b. Now, we can map a connection tracking entry to corresponding
   acl, security group etc.

How:

Following is the sequence of changes:

Patch 1:
i.  Current implementation uses a bit ct.label to handle policy update cases,
where we use a bit in ct.label to indicate that reply traffic should
be dropped now.
ii. Swap the usage of ct.label in current implementation with ct.mark.

Patch 2:
i. Add support in parser to allow ct.label and mark to be set from registers
as well (as of now only integer/masked integer is allowed).

Patch 3:
i. Add a new column (named 'label') to Table ACL in northbound schema.
ii. ovn-northd changes to enhance logical flows to set ct.label to acl->label.
For example:
table=4 (ls_out_acl ),  action=(reg0[1] = 1; reg0[3] = 1; xxreg1 = 
0x1234; next;)
.
.
.
table=7 (ls_out_stateful), ... match=(reg0[1] == 1 && reg0[3] == 1),
   action=(ct_commit(ct_mark=0/1, 
ct_label=xxreg1); next;)


Ankur Sharma (3):
  OVN ACL: Replace the usage of ct_label with ct_mark
  OVN ACL: Allow ct_mark and ct_label values to be set from register as
well
  OVN ACL: Allow a user to input ct.label value for an acl

 include/ovn/actions.h   |  3 ++
 ovn/lib/actions.c   | 73 ++
 ovn/lib/logical-fields.c|  1 +
 ovn/northd/ovn-northd.8.xml | 14 
 ovn/northd/ovn-northd.c | 85 -
 ovn/ovn-nb.ovsschema|  5 +--
 ovn/ovn-nb.xml  |  9 +
 ovn/utilities/ovn-nbctl.c   | 24 -
 tests/ovn-nbctl.at  | 12 +--
 tests/ovn.at| 66 ---
 10 files changed, 239 insertions(+), 53 deletions(-)

-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 6/8] ofproto: Handle multipart requests with multiple parts.

2019-01-10 Thread Ben Pfaff
On Thu, Jan 10, 2019 at 07:22:11PM +, Justin Pettit wrote:
> 
> > On Aug 30, 2018, at 1:00 PM, Ben Pfaff  wrote:
> > 
> > OpenFlow has a concept of multipart messages, that is, messages that can be
> > broken into multiple pieces that are sent separately.  Before OpenFlow 1.3,
> > only replies could actually have multiple pieces.  OpenFlow 1.3 introduced
> > the idea that requests could have multiple pieces.  This is only useful for
> > multipart requests that take an array as part of the request, which amounts
> > to only flow monitoring requests and table features requests.  So far, OVS
> > hasn't implemented the multipart versions of these (it just reports an
> > error).  This commit introduces the necessary infastructure to implement
> > them properly.
> > 
> > Signed-off-by: Ben Pfaff 
> 
> Acked-by: Justin Pettit 

Thanks.  I applied this to master.

Now, what about patches 7 and 8?

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/3] nroff: Increase width for .IP used for ordered lists.

2019-01-10 Thread Ben Pfaff
Thank you for the review.  I applied this series to master.

On Thu, Jan 10, 2019 at 03:09:44PM -0500, Mark Michelson wrote:
> Hi Ben,
> 
> Excellent new document. This will be great to have in the code.
> 
> For the series
> Acked-by: Mark Michelson 
> 
> On 11/9/18 12:39 AM, Ben Pfaff wrote:
> > The ordered lists that a .25in width produced looked OK in PostScript
> > or PDF output, but in text output every list item spanned two lines,
> > like this:
> > 
> > 1.
> >   First list item.
> > 2.
> >   Second list item.
> > 
> > With this change, they appear normally:
> > 
> > 1. First list item.
> > 2. Second list item.
> > 
> > Signed-off-by: Ben Pfaff 
> > ---
> >   python/build/nroff.py | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/python/build/nroff.py b/python/build/nroff.py
> > index 73353061..17ff69bfe47a 100644
> > --- a/python/build/nroff.py
> > +++ b/python/build/nroff.py
> > @@ -316,7 +316,7 @@ def block_xml_to_nroff(nodes, para='.PP'):
> >   if node.tagName == 'ul':
> >   s += ".IP \\(bu\n"
> >   else:
> > -s += ".IP %d. .25in\n" % i
> > +s += ".IP %d. .4in\n" % i
> >   s += block_xml_to_nroff(li_node.childNodes, ".IP")
> >   elif li_node.nodeType == node.COMMENT_NODE:
> >   pass
> > 
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/2] vconn: Allow timout configuration for blocking connection.

2019-01-10 Thread Ben Pfaff
On Wed, Jan 09, 2019 at 08:30:17PM +0300, Ilya Maximets wrote:
> On some systems in case where remote is not responding, socket could
> remain in SYN_SENT state for a really long time without errors waiting
> for connection. This leads to situations where vconn connection hangs
> for a few minutes waiting for connection to the DOWN remote.
> 
> For example, this situation emulated by "refuse-connection" vconn
> testcase. This leads to test failures because Alarm signal arrives much
> faster than ETIMEDOUT from the socket:
> 
>   ./vconn.at:21: ovstest test-vconn refuse-connection tcp
>   Alarm clock
>   stderr:
>   |socket_util|INFO|0:127.0.0.1: listening on port 63812
>   |poll_loop|DBG|wakeup due to 0-ms timeout
>   |poll_loop|DBG|wakeup due to 10155-ms timeout
>   |fatal_signal|WARN|terminating with signal 14 (Alarm clock)
>   ./vconn.at:21: exit code was 142, expected 0
>   vconn.at:21: 535. tcp vconn - refuse connection (vconn.at:21): FAILED
> 
> This patch allowes to specify timeout value for vconn blocking
> connections. If the connection takes more time, socket will be closed
> with ETIMEDOUT error code. Negative value could be used to wait
> infinitely.
> 
> Signed-off-by: Ilya Maximets 

Thanks, applied to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/2] stream: Allow timeout configuration for open_block.

2019-01-10 Thread Ben Pfaff
On Wed, Jan 09, 2019 at 08:30:16PM +0300, Ilya Maximets wrote:
> On some systems in case where remote is not responding, socket could
> remain in SYN_SENT state for a really long time without errors waiting
> for connection. This leads to situations where open_blok() hangs for
> a few minutes waiting for connection to the DOWN remote.
> 
> For example, our "multiple remotes" idl tests hangs waiting for
> connection to the WRONG_PORT on FreeBSD in CirrusCI environment.
> This leads to test failures because Alarm signal arrives much faster
> than ETIMEDOUT from the socket.
> 
> This patch allowes to specify timeout value for 'open_block' function.
> If the connection takes more time, socket will be closed with
> ETIMEDOUT error code. Negative value or None in python could be
> used to wait infinitely.
> 
> Signed-off-by: Ilya Maximets 

Thanks, applied to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] poll-loop: Set poll loop initial deadline to LLONG_MAX.

2019-01-10 Thread Ben Pfaff
This is consistent with the re-initialization value that poll_block() uses.
It is better than 0 because the monotonic clock can have a negative value,
even though that is rare and pathological.

Found by inspection.

Signed-off-by: Ben Pfaff 
---
 lib/poll-loop.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/poll-loop.c b/lib/poll-loop.c
index ec5c8197eb37..4e751ff2c710 100644
--- a/lib/poll-loop.c
+++ b/lib/poll-loop.c
@@ -415,6 +415,7 @@ poll_loop(void)
 loop = pthread_getspecific(key);
 if (!loop) {
 loop = xzalloc(sizeof *loop);
+loop->timeout_when = LLONG_MAX;
 hmap_init(&loop->poll_nodes);
 xpthread_setspecific(key, loop);
 }
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 2/3] python: Disable flake8 warning W504 "line break after binary operator".

2019-01-10 Thread Ben Pfaff
OVS Python uses this style all over and I don't see a reason to avoid it.

Signed-off-by: Ben Pfaff 
---
 Makefile.am | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Makefile.am b/Makefile.am
index 8408509c9a83..ff1f94b4841f 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -392,6 +392,7 @@ ALL_LOCAL += flake8-check
 #   E131 continuation line unaligned for hanging indent
 #   E722 do not use bare except, specify exception instead
 #   W503 line break before binary operator
+#   W504 line break after binary operator
 # F*** -- warnings native to flake8
 #   F811 redefinition of unused  from line  (only from flake8 v2.0)
 # D*** -- warnings from flake8-docstrings plugin
@@ -401,7 +402,7 @@ ALL_LOCAL += flake8-check
 #   H233 Python 3.x incompatible use of print operator
 #   H238 old style class declaration, use new style (inherit from `object`)
 FLAKE8_SELECT = H231,H232,H233,H238
-FLAKE8_IGNORE = E121,E123,E125,E126,E127,E128,E129,E131,E722,W503,F811,D,H,I
+FLAKE8_IGNORE = 
E121,E123,E125,E126,E127,E128,E129,E131,E722,W503,W504,F811,D,H,I
 flake8-check: $(FLAKE8_PYFILES)
$(FLAKE8_WERROR)$(AM_V_GEN) \
  src='$^' && \
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 3/3] python: Avoid flake8 warning for unused variables.

2019-01-10 Thread Ben Pfaff
Signed-off-by: Ben Pfaff 
---
 python/ovs/socket_util.py | 2 +-
 python/ovs/stream.py  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/ovs/socket_util.py b/python/ovs/socket_util.py
index 2596ddefdb8c..8f9d31825c92 100644
--- a/python/ovs/socket_util.py
+++ b/python/ovs/socket_util.py
@@ -90,7 +90,7 @@ def make_unix_socket(style, nonblock, bind_path, 
connect_path, short=False):
 
 try:
 os.fchmod(sock.fileno(), 0o700)
-except OSError as e:
+except OSError:
 pass
 if connect_path is not None:
 try:
diff --git a/python/ovs/stream.py b/python/ovs/stream.py
index cdfcc399e512..e948427734a8 100644
--- a/python/ovs/stream.py
+++ b/python/ovs/stream.py
@@ -174,7 +174,7 @@ class Stream(object):
 try:
 winutils.set_pipe_mode(npipe,
win32pipe.PIPE_READMODE_BYTE)
-except pywintypes.error as e:
+except pywintypes.error:
 return errno.ENOENT, None
 except pywintypes.error as e:
 if e.winerror == winutils.winerror.ERROR_PIPE_BUSY:
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 0/3] flake8 fixes

2019-01-10 Thread Ben Pfaff
This is needed to make OVS work with flake8 3.6.

v1->v2:
   - Added a few more fixes to patch 1.
   - Patch 3 is new.

Ben Pfaff (3):
  python: Fix invalid escape sequences.
  python: Disable flake8 warning W504 "line break after binary
operator".
  python: Avoid flake8 warning for unused variables.

 Makefile.am|  3 ++-
 python/build/nroff.py  |  4 ++--
 python/ovs/db/schema.py|  2 +-
 python/ovs/json.py |  4 ++--
 python/ovs/socket_util.py  |  2 +-
 python/ovs/stream.py   |  2 +-
 python/ovs/unixctl/__init__.py |  2 +-
 python/ovs/util.py |  2 +-
 python/ovs/vlog.py |  2 +-
 tests/test-ovsdb.py|  2 +-
 utilities/checkpatch.py| 24 
 utilities/ovs-dev.py   |  2 +-
 12 files changed, 26 insertions(+), 25 deletions(-)

-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 1/3] python: Fix invalid escape sequences.

2019-01-10 Thread Ben Pfaff
It appears that Python silently treats invalid escape sequences in
strings as literals, e.g. "\." is the same as "\\.".  Newer versions of
checkpatch complain, and it does seem reasonable to me to fix these.

Signed-off-by: Ben Pfaff 
---
 python/build/nroff.py  |  4 ++--
 python/ovs/db/schema.py|  2 +-
 python/ovs/json.py |  4 ++--
 python/ovs/unixctl/__init__.py |  2 +-
 python/ovs/util.py |  2 +-
 python/ovs/vlog.py |  2 +-
 tests/test-ovsdb.py|  2 +-
 utilities/checkpatch.py| 24 
 utilities/ovs-dev.py   |  2 +-
 9 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/python/build/nroff.py b/python/build/nroff.py
index 73353061..74a3f08ca40e 100644
--- a/python/build/nroff.py
+++ b/python/build/nroff.py
@@ -274,7 +274,7 @@ def diagram_to_nroff(nodes, para):
 text_s = '.br\n'.join(["\\fL%s\n" % s for s in text if s != ""])
 return para + """
 .\\" check if in troff mode (TTY)
-.if t \{
+.if t \\{
 .PS
 boxht = .2
 textht = 1/6
@@ -283,7 +283,7 @@ fillval = .2
 .PE
 \\}
 .\\" check if in nroff mode:
-.if n \{
+.if n \\{
 .nf
 """ + text_s + """\
 .fi
diff --git a/python/ovs/db/schema.py b/python/ovs/db/schema.py
index 55c8ae7f3531..44b030757ef7 100644
--- a/python/ovs/db/schema.py
+++ b/python/ovs/db/schema.py
@@ -73,7 +73,7 @@ class DbSchema(object):
 parser.finish()
 
 if (version is not None and
-not re.match('[0-9]+\.[0-9]+\.[0-9]+$', version)):
+not re.match(r'[0-9]+\.[0-9]+\.[0-9]+$', version)):
 raise error.Error('schema version "%s" not in format x.y.z'
   % version)
 
diff --git a/python/ovs/json.py b/python/ovs/json.py
index e84063fc2ed1..94c8e30f365d 100644
--- a/python/ovs/json.py
+++ b/python/ovs/json.py
@@ -177,7 +177,7 @@ class Parser(object):
 return False
 
 __number_re = re.compile("(-)?(0|[1-9][0-9]*)"
-"(?:\.([0-9]+))?(?:[eE]([-+]?[0-9]+))?$")
+r"(?:\.([0-9]+))?(?:[eE]([-+]?[0-9]+))?$")
 
 def __lex_finish_number(self):
 s = self.buffer
@@ -234,7 +234,7 @@ class Parser(object):
 self.__error("leading zeros not allowed")
 elif re.match("-([^0-9]|$)", s):
 self.__error("'-' must be followed by digit")
-elif re.match("-?(0|[1-9][0-9]*)\.([^0-9]|$)", s):
+elif re.match(r"-?(0|[1-9][0-9]*)\.([^0-9]|$)", s):
 self.__error("decimal point must be followed by digit")
 elif re.search("e[-+]?([^0-9]|$)", s):
 self.__error("exponent must contain at least one digit")
diff --git a/python/ovs/unixctl/__init__.py b/python/ovs/unixctl/__init__.py
index 025da2a90dac..c2e5aca8d6ff 100644
--- a/python/ovs/unixctl/__init__.py
+++ b/python/ovs/unixctl/__init__.py
@@ -73,7 +73,7 @@ def command_register(name, usage, min_args, max_args, 
callback, aux):
 def socket_name_from_target(target):
 assert isinstance(target, strtypes)
 
-""" On Windows an absolute path contains ':' ( i.e: C:\ ) """
+""" On Windows an absolute path contains ':' ( i.e: C:\\ ) """
 if target.startswith('/') or target.find(':') > -1:
 return 0, target
 
diff --git a/python/ovs/util.py b/python/ovs/util.py
index 411ac99c85a4..3dba022f8e27 100644
--- a/python/ovs/util.py
+++ b/python/ovs/util.py
@@ -31,7 +31,7 @@ def abs_file_name(dir_, file_name):
 This differs from os.path.abspath() in that it will never change the
 meaning of a file name.
 
-On Windows an absolute path contains ':' ( i.e: C:\ ) """
+On Windows an absolute path contains ':' ( i.e: C:\\ ) """
 if file_name.startswith('/') or file_name.find(':') > -1:
 return file_name
 else:
diff --git a/python/ovs/vlog.py b/python/ovs/vlog.py
index e3cf76dcd9e2..5b478fc541f7 100644
--- a/python/ovs/vlog.py
+++ b/python/ovs/vlog.py
@@ -142,7 +142,7 @@ class Vlog(object):
 return re.sub(match, replace, tmp)
 
 def _format_time(self, tmp):
-date_regex = re.compile('(%(0?[1-9]?[dD])(\{(.*)\})?)')
+date_regex = re.compile(r'(%(0?[1-9]?[dD])(\{(.*)\})?)')
 match = date_regex.search(tmp)
 
 if match is None:
diff --git a/tests/test-ovsdb.py b/tests/test-ovsdb.py
index c03476c7ffaf..01c1f379c5e7 100644
--- a/tests/test-ovsdb.py
+++ b/tests/test-ovsdb.py
@@ -167,7 +167,7 @@ def get_simple_printable_row_string(row, columns):
 s += "%s=%s " % (column, value)
 s = s.strip()
 s = re.sub('""|,|u?\'', "", s)
-s = re.sub('UUID\(([^)]+)\)', r'\1', s)
+s = re.sub(r'UUID\(([^)]+)\)', r'\1', s)
 s = re.sub('False', 'false', s)
 s = re.sub('True', 'true', s)
 s = re.sub(r'(ba)=([^[][^ ]*) ', r'\1=[\2] ', s)
diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
index 41676adab375..d8bd34b1f19b 100755
--- a/utilities/checkpatch.py
+++ b/utilities/checkpatch.py
@@ -518,38 +518,38 @@ checks = [
  'check': lambda x: trailing_whitespa

Re: [ovs-dev] [PATCH] connmgr: Do not send asynchronous messages to rconns lacking protocols.

2019-01-10 Thread Justin Pettit


> On Dec 12, 2018, at 12:28 PM, Ben Pfaff  wrote:
> 
> There are corner cases in which an rconn might not have a defined OpenFlow
> protocol or version.  These happen at connection startup, before the
> protocol version has been negotiated, and can also happen when a connection
> is being shut down.  It's desirable to avoid these situations entirely,
> but so far we haven't managed to do this.  This commit avoids trying to
> send messages to such connection, which is what really tends to get OVS in
> trouble since there's no way to construct an OpenFlow message without
> knowing what version of OpenFlow to use (with a few exceptions that don't
> matter here).
> 
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-December/047876.html
> Reported-by: Josh Bailey 
> Signed-off-by: Ben Pfaff 

Acked-by: Justin Pettit 

--Justin


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 2/2] python: Disable flake8 warning W504 "line break after binary operator".

2019-01-10 Thread Ben Pfaff
OVS Python uses this style all over and I don't see a reason to avoid it.

Signed-off-by: Ben Pfaff 
---
 Makefile.am | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Makefile.am b/Makefile.am
index 8408509c9a83..ff1f94b4841f 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -392,6 +392,7 @@ ALL_LOCAL += flake8-check
 #   E131 continuation line unaligned for hanging indent
 #   E722 do not use bare except, specify exception instead
 #   W503 line break before binary operator
+#   W504 line break after binary operator
 # F*** -- warnings native to flake8
 #   F811 redefinition of unused  from line  (only from flake8 v2.0)
 # D*** -- warnings from flake8-docstrings plugin
@@ -401,7 +402,7 @@ ALL_LOCAL += flake8-check
 #   H233 Python 3.x incompatible use of print operator
 #   H238 old style class declaration, use new style (inherit from `object`)
 FLAKE8_SELECT = H231,H232,H233,H238
-FLAKE8_IGNORE = E121,E123,E125,E126,E127,E128,E129,E131,E722,W503,F811,D,H,I
+FLAKE8_IGNORE = 
E121,E123,E125,E126,E127,E128,E129,E131,E722,W503,W504,F811,D,H,I
 flake8-check: $(FLAKE8_PYFILES)
$(FLAKE8_WERROR)$(AM_V_GEN) \
  src='$^' && \
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/2] python: Fix invalid escape sequences.

2019-01-10 Thread Ben Pfaff
It appears that Python silently treats invalid escape sequences in
strings as literals, e.g. "\." is the same as "\\.".  Newer versions of
checkpatch complain, and it does seem reasonable to me to fix these.

Signed-off-by: Ben Pfaff 
---
 python/build/nroff.py   |  4 ++--
 tests/test-ovsdb.py |  2 +-
 utilities/checkpatch.py | 24 
 utilities/ovs-dev.py|  2 +-
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/python/build/nroff.py b/python/build/nroff.py
index 73353061..74a3f08ca40e 100644
--- a/python/build/nroff.py
+++ b/python/build/nroff.py
@@ -274,7 +274,7 @@ def diagram_to_nroff(nodes, para):
 text_s = '.br\n'.join(["\\fL%s\n" % s for s in text if s != ""])
 return para + """
 .\\" check if in troff mode (TTY)
-.if t \{
+.if t \\{
 .PS
 boxht = .2
 textht = 1/6
@@ -283,7 +283,7 @@ fillval = .2
 .PE
 \\}
 .\\" check if in nroff mode:
-.if n \{
+.if n \\{
 .nf
 """ + text_s + """\
 .fi
diff --git a/tests/test-ovsdb.py b/tests/test-ovsdb.py
index 2d11122b..1d7c023da27d 100644
--- a/tests/test-ovsdb.py
+++ b/tests/test-ovsdb.py
@@ -167,7 +167,7 @@ def get_simple_printable_row_string(row, columns):
 s += "%s=%s " % (column, value)
 s = s.strip()
 s = re.sub('""|,|u?\'', "", s)
-s = re.sub('UUID\(([^)]+)\)', r'\1', s)
+s = re.sub(r'UUID\(([^)]+)\)', r'\1', s)
 s = re.sub('False', 'false', s)
 s = re.sub('True', 'true', s)
 s = re.sub(r'(ba)=([^[][^ ]*) ', r'\1=[\2] ', s)
diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
index 41676adab375..d8bd34b1f19b 100755
--- a/utilities/checkpatch.py
+++ b/utilities/checkpatch.py
@@ -518,38 +518,38 @@ checks = [
  'check': lambda x: trailing_whitespace_or_crlf(x),
  'print': lambda: print_warning("Line has trailing whitespace")},
 
-{'regex': '(\.c|\.h)(\.in)?$', 'match_name': None,
+{'regex': r'(\.c|\.h)(\.in)?$', 'match_name': None,
  'prereq': lambda x: not is_comment_line(x),
  'check': lambda x: not if_and_for_whitespace_checks(x),
  'print': lambda: print_error("Improper whitespace around control block")},
 
-{'regex': '(\.c|\.h)(\.in)?$', 'match_name': None,
+{'regex': r'(\.c|\.h)(\.in)?$', 'match_name': None,
  'prereq': lambda x: not is_comment_line(x),
  'check': lambda x: not if_and_for_end_with_bracket_check(x),
  'print': lambda: print_error("Inappropriate bracing around statement")},
 
-{'regex': '(\.c|\.h)(\.in)?$', 'match_name': None,
+{'regex': r'(\.c|\.h)(\.in)?$', 'match_name': None,
  'prereq': lambda x: not is_comment_line(x),
  'check': lambda x: pointer_whitespace_check(x),
  'print':
  lambda: print_error("Inappropriate spacing in pointer declaration")},
 
-{'regex': '(\.c|\.h)(\.in)?$', 'match_name': None,
+{'regex': r'(\.c|\.h)(\.in)?$', 'match_name': None,
  'prereq': lambda x: not is_comment_line(x),
  'check': lambda x: trailing_operator(x),
  'print':
  lambda: print_error("Line has '?' or ':' operator at end of line")},
 
-{'regex': '(\.c|\.h)(\.in)?$', 'match_name': None,
+{'regex': r'(\.c|\.h)(\.in)?$', 'match_name': None,
  'prereq': lambda x: has_comment(x),
  'check': lambda x: has_xxx_mark(x),
  'print': lambda: print_warning("Comment with 'xxx' marker")},
 
-{'regex': '(\.c|\.h)(\.in)?$', 'match_name': None,
+{'regex': r'(\.c|\.h)(\.in)?$', 'match_name': None,
  'prereq': lambda x: has_comment(x),
  'check': lambda x: check_comment_spelling(x)},
 
-{'regex': '(\.c|\.h)(\.in)?$', 'match_name': None,
+{'regex': r'(\.c|\.h)(\.in)?$', 'match_name': None,
  'check': lambda x: empty_return_with_brace(x),
  'interim_line': True,
  'print':
@@ -584,7 +584,7 @@ std_functions = [
 ('error', 'Use ovs_error() in place of error()'),
 ]
 checks += [
-{'regex': '(\.c|\.h)(\.in)?$',
+{'regex': r'(\.c|\.h)(\.in)?$',
  'match_name': None,
  'prereq': lambda x: not is_comment_line(x),
  'check': regex_function_factory(function_name),
@@ -601,11 +601,11 @@ infix_operators = \
 [re.escape(op) for op in ['%', '<<', '>>', '<=', '>=', '==', '!=',
 '^', '|', '&&', '||', '?:', '=', '+=', '-=', '*=', '/=', '%=',
 '&=', '^=', '|=', '<<=', '>>=']] \
-+ ['[^<" ]<[^=" ]', '[^->" ]>[^=" ]', '[^ !()/"]\*[^/]', '[^ !&()"]&',
-   '[^" +(]\+[^"+;]', '[^" -(]-[^"->;]', '[^" <>=!^|+\-*/%&]=[^"=]',
++ ['[^<" ]<[^=" ]', '[^->" ]>[^=" ]', r'[^ !()/"]\*[^/]', '[^ !&()"]&',
+   r'[^" +(]\+[^"+;]', '[^" -(]-[^"->;]', r'[^" <>=!^|+\-*/%&]=[^"=]',
'[^* ]/[^* ]']
 checks += [
-{'regex': '(\.c|\.h)(\.in)?$', 'match_name': None,
+{'regex': r'(\.c|\.h)(\.in)?$', 'match_name': None,
  'prereq': lambda x: not is_comment_line(x),
  'check': regex_operator_factory(operator),
  'print': lambda: print_warning("Line lacks whitespace around operator")}
@@ -694,7 +694,7 @@ def ovs_checkpatch_parse(text, filename, author=None, 

Re: [ovs-dev] [PATCH v13 03/11] dp-packet: Handle multi-seg mbufs in helper funcs.

2019-01-10 Thread Lam, Tiago
On 10/01/2019 11:17, Ian Stokes wrote:
> On 1/9/2019 6:05 PM, Tiago Lam wrote:
>> Most helper functions in dp-packet assume that the data held by a
>> dp_packet is contiguous, and perform operations such as pointer
>> arithmetic under that assumption. However, with the introduction of
>> multi-segment mbufs, where data is non-contiguous, such assumptions are
>> no longer possible. Some examples of Such helper functions are
>> dp_packet_tail(), dp_packet_tailroom(), dp_packet_end(),
>> dp_packet_get_allocated() and dp_packet_at().
>>
>> Thus, instead of assuming contiguous data in dp_packet, they  now
>> iterate over the (non-contiguous) data in mbufs to perform their
>> calculations.
>>
>> Finally, dp_packet_use__() has also been modified to perform the
>> initialisation of the packet (and setting the source) before continuing
>> to set its size and data length, which now depends on the type of
>> packet.
>>
> 
> Hi Tiago, thanks for the patch, while reviewing a later patch in the 
> series I noticed there are 2 or 3 functions changed in this patch that 
> are changed/removed altogether at a later point. Maybe this was because 
> of a legacy approach and so they were not needed any more?
> 
> It would be good to avoid introducing these changes if they are going to 
> be removed in a later patch unless they are necessary.
> 
> I've the functions in particular  below if you can take a look and confirm.

Thanks for the analysis, Ian.

This is indeed because of different approaches that were taken and then
merged to the linearization patch without much thought on changes made
to earlier patches. I'll remove those pieces from this patch.

Just another reply below.

> 
>> Co-authored-by: Mark Kavanagh 
>>
>> Signed-off-by: Mark Kavanagh 
>> Signed-off-by: Tiago Lam 
>> Acked-by: Eelco Chaudron 
>> Acked-by: Flavio Leitner 
>> ---
>>   lib/dp-packet.c |   4 +-
>>   lib/dp-packet.h | 192 
>> +++-
>>   2 files changed, 178 insertions(+), 18 deletions(-)
>>
>> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>> index 93b0e9c..f54119d 100644
>> --- a/lib/dp-packet.c
>> +++ b/lib/dp-packet.c
>> @@ -41,11 +41,11 @@ static void
>>   dp_packet_use__(struct dp_packet *b, void *base, size_t allocated,
>>enum dp_packet_source source)
>>   {
>> +dp_packet_init__(b, allocated, source);
>> +
>>   dp_packet_set_base(b, base);
>>   dp_packet_set_data(b, base);
>>   dp_packet_set_size(b, 0);
>> -
>> -dp_packet_init__(b, allocated, source);
>>   }
>>   
>>   /* Initializes 'b' as an empty dp_packet that contains the 'allocated' 
>> bytes of
>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
>> index 72a8043..8947477 100644
>> --- a/lib/dp-packet.h
>> +++ b/lib/dp-packet.h
>> @@ -133,6 +133,10 @@ static inline void *dp_packet_at(const struct dp_packet 
>> *, size_t offset,
>>size_t size);
>>   static inline void *dp_packet_at_assert(const struct dp_packet *,
>>   size_t offset, size_t size);
>> +#ifdef DPDK_NETDEV
>> +static inline const struct rte_mbuf *
>> +dp_packet_mbuf_from_offset(const struct dp_packet *b, size_t *offset);
>> +#endif
>>   static inline void *dp_packet_tail(const struct dp_packet *);
>>   static inline void *dp_packet_end(const struct dp_packet *);
>>   
>> @@ -185,9 +189,25 @@ dp_packet_delete(struct dp_packet *b)
>>   static inline void *
>>   dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
>>   {
>> -return offset + size <= dp_packet_size(b)
>> -   ? (char *) dp_packet_data(b) + offset
>> -   : NULL;
>> +if (offset + size > dp_packet_size(b)) {
>> +return NULL;
>> +}
>> +
>> +#ifdef DPDK_NETDEV
>> +if (b->source == DPBUF_DPDK) {
>> +struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
>> +
>> +while (buf && offset > buf->data_len) {
>> +offset -= buf->data_len;
>> +
>> +buf = buf->next;
>> +}
>> +
>> +return buf ? rte_pktmbuf_mtod_offset(buf, char *, offset) : NULL;
>> +}
>> +#endif
>> +
>> +return (char *) dp_packet_data(b) + offset;
>>   }
>>   
>>   /* Returns a pointer to byte 'offset' in 'b', which must contain at least
>> @@ -196,13 +216,23 @@ static inline void *
>>   dp_packet_at_assert(const struct dp_packet *b, size_t offset, size_t size)
>>   {
>>   ovs_assert(offset + size <= dp_packet_size(b));
>> -return ((char *) dp_packet_data(b)) + offset;
>> +return dp_packet_at(b, offset, size);
>>   }
>>   
>>   /* Returns a pointer to byte following the last byte of data in use in 
>> 'b'. */
>>   static inline void *
>>   dp_packet_tail(const struct dp_packet *b)
>>   {
>> +#ifdef DPDK_NETDEV
>> +if (b->source == DPBUF_DPDK) {
>> +struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
>> +/* Find last segment where data ends, meaning the tail of the 
>> chained
>> +  

Re: [ovs-dev] [PATCH v13 08/11] netdev-dpdk: support multi-segment jumbo frames

2019-01-10 Thread Lam, Tiago
Hi Ilya,

On 10/01/2019 11:51, Ilya Maximets wrote:
> On 09.01.2019 21:05, Tiago Lam wrote:
>> From: Mark Kavanagh 
>>

[snip]

>>  
>> @@ -971,6 +1002,24 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int 
>> n_rxq, int n_txq)
>>  }
>>  }
>>  
>> +/* Multi-segment-mbuf-specific setup. */
>> +if (dpdk_multi_segment_mbufs) {
>> +if (info.tx_offload_capa & DEV_TX_OFFLOAD_MULTI_SEGS) {
>> +/* Enable multi-seg mbufs. DPDK PMDs typically attempt to use
>> + * simple or vectorized transmit functions, neither of which are
>> + * compatible with multi-segment mbufs. */
>> +conf.txmode.offloads |= DEV_TX_OFFLOAD_MULTI_SEGS;
>> +} else {
>> +VLOG_WARN("Interface %s doesn't support multi-segment mbufs",
>> +  dev->up.name);
>> +conf.txmode.offloads &= ~DEV_TX_OFFLOAD_MULTI_SEGS;
> 
> Simple warning is not enough.
> There are few PMDs that does not support segmented packets and does not
> expect them. Sending segmented packets to such ports could cause a crash,
> memory leaks or any other unexpected behaviour and they will simply not
> work, at first.
> We need a fallback solution for this case.
> 
> Best regards, Ilya Maximets.
> 

Thanks, that makes sense. I guess we can have the fallback of failing to
initialize the device as a last resort, and point out that such an
interface can't be used with multi-segs. As discussed in yesterday's
community call, we plan to mark this experimental at first to understand
such cases.

An aside, as part of the TSO work I've started to integrate with
`rte_eth_tx_prepare()` in DPDK. I can see in the code that it wouldn't
help in this case, but I would expect that such cases would be handled
by this function in the future.

What do you think?

Thanks,
Tiago.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v13 08/11] netdev-dpdk: support multi-segment jumbo frames

2019-01-10 Thread Ilya Maximets
On 09.01.2019 21:05, Tiago Lam wrote:
> From: Mark Kavanagh 
> 
> Currently, jumbo frame support for OvS-DPDK is implemented by
> increasing the size of mbufs within a mempool, such that each mbuf
> within the pool is large enough to contain an entire jumbo frame of
> a user-defined size. Typically, for each user-defined MTU,
> 'requested_mtu', a new mempool is created, containing mbufs of size
> ~requested_mtu.
> 
> With the multi-segment approach, a port uses a single mempool,
> (containing standard/default-sized mbufs of ~2k bytes), irrespective
> of the user-requested MTU value. To accommodate jumbo frames, mbufs
> are chained together, where each mbuf in the chain stores a portion of
> the jumbo frame. Each mbuf in the chain is termed a segment, hence the
> name.
> 
> == Enabling multi-segment mbufs ==
> Multi-segment and single-segment mbufs are mutually exclusive, and the
> user must decide on which approach to adopt on init. The introduction
> of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This
> is a global boolean value, which determines how jumbo frames are
> represented across all DPDK ports. In the absence of a user-supplied
> value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
> mbufs must be explicitly enabled / single-segment mbufs remain the
> default.
> 
> Setting the field is identical to setting existing DPDK-specific OVSDB
> fields:
> 
> ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
> ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
> ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
> ==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
> 
> Co-authored-by: Tiago Lam 
> 
> Signed-off-by: Mark Kavanagh 
> Signed-off-by: Tiago Lam 
> Acked-by: Eelco Chaudron 
> ---
>  Documentation/topics/dpdk/jumbo-frames.rst | 73 
> ++
>  Documentation/topics/dpdk/memory.rst   | 36 +++
>  NEWS   |  1 +
>  lib/dpdk.c |  8 
>  lib/netdev-dpdk.c  | 66 +++
>  lib/netdev-dpdk.h  |  1 +
>  vswitchd/vswitch.xml   | 22 +
>  7 files changed, 199 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/jumbo-frames.rst 
> b/Documentation/topics/dpdk/jumbo-frames.rst
> index 00360b4..9804bbb 100644
> --- a/Documentation/topics/dpdk/jumbo-frames.rst
> +++ b/Documentation/topics/dpdk/jumbo-frames.rst
> @@ -71,3 +71,76 @@ Jumbo frame support has been validated against 9728B 
> frames, which is the
>  largest frame size supported by Fortville NIC using the DPDK i40e driver, but
>  larger frames and other DPDK NIC drivers may be supported. These cases are
>  common for use cases involving East-West traffic only.
> +
> +---
> +Multi-segment mbufs
> +---
> +
> +Instead of increasing the size of mbufs within a mempool, such that each mbuf
> +within the pool is large enough to contain an entire jumbo frame of a
> +user-defined size, mbufs can be chained together instead. In this approach 
> each
> +mbuf in the chain stores a portion of the jumbo frame, by default ~2K bytes,
> +irrespective of the user-requested MTU value. Since each mbuf in the chain is
> +termed a segment, this approach is named "multi-segment mbufs".
> +
> +This approach may bring more flexibility in use cases where the maximum 
> packet
> +length may be hard to guess. For example, in cases where packets originate 
> from
> +sources marked for offload (such as TSO), each packet may be larger than the
> +MTU, and as such, when forwarding it to a DPDK port a single mbuf may not be
> +enough to hold all of the packet's data.
> +
> +Multi-segment and single-segment mbufs are mutually exclusive, and the user
> +must decide on which approach to adopt on initialisation. If multi-segment
> +mbufs is to be enabled, it can be done so with the following command::
> +
> +$ ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
> +
> +Single-segment mbufs still remain the default when using OvS-DPDK, and the
> +above option `dpdk-multi-seg-mbufs` must be explicitly set to `true` if
> +multi-segment mbufs are to be used.
> +
> +~
> +Performance notes
> +~
> +
> +When using multi-segment mbufs some PMDs may not support vectorized Tx
> +functions, due to its non-contiguous nature. As a result this can hit
> +performance for smaller packet sizes. For example, on a setup sending 64B
> +packets at line rate, a decrease of ~20% has been observed. The performance
> +impact stops being noticeable for larger packet sizes, although the exact 
> size
> +will depend on each PMD, and vary between architectures.
> +
> +Tests performed with the i40e PMD driver only showed this limitation for 64B
> +packets, and the same rate was observed when comparing multi-segment mbuf

Re: [ovs-dev] [PATCH v13 03/11] dp-packet: Handle multi-seg mbufs in helper funcs.

2019-01-10 Thread Ian Stokes

On 1/9/2019 6:05 PM, Tiago Lam wrote:

Most helper functions in dp-packet assume that the data held by a
dp_packet is contiguous, and perform operations such as pointer
arithmetic under that assumption. However, with the introduction of
multi-segment mbufs, where data is non-contiguous, such assumptions are
no longer possible. Some examples of Such helper functions are
dp_packet_tail(), dp_packet_tailroom(), dp_packet_end(),
dp_packet_get_allocated() and dp_packet_at().

Thus, instead of assuming contiguous data in dp_packet, they  now
iterate over the (non-contiguous) data in mbufs to perform their
calculations.

Finally, dp_packet_use__() has also been modified to perform the
initialisation of the packet (and setting the source) before continuing
to set its size and data length, which now depends on the type of
packet.



Hi Tiago, thanks for the patch, while reviewing a later patch in the 
series I noticed there are 2 or 3 functions changed in this patch that 
are changed/removed altogether at a later point. Maybe this was because 
of a legacy approach and so they were not needed any more?


It would be good to avoid introducing these changes if they are going to 
be removed in a later patch unless they are necessary.


I've the functions in particular  below if you can take a look and confirm.


Co-authored-by: Mark Kavanagh 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
Acked-by: Flavio Leitner 
---
  lib/dp-packet.c |   4 +-
  lib/dp-packet.h | 192 +++-
  2 files changed, 178 insertions(+), 18 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 93b0e9c..f54119d 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -41,11 +41,11 @@ static void
  dp_packet_use__(struct dp_packet *b, void *base, size_t allocated,
   enum dp_packet_source source)
  {
+dp_packet_init__(b, allocated, source);
+
  dp_packet_set_base(b, base);
  dp_packet_set_data(b, base);
  dp_packet_set_size(b, 0);
-
-dp_packet_init__(b, allocated, source);
  }
  
  /* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 72a8043..8947477 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -133,6 +133,10 @@ static inline void *dp_packet_at(const struct dp_packet *, 
size_t offset,
   size_t size);
  static inline void *dp_packet_at_assert(const struct dp_packet *,
  size_t offset, size_t size);
+#ifdef DPDK_NETDEV
+static inline const struct rte_mbuf *
+dp_packet_mbuf_from_offset(const struct dp_packet *b, size_t *offset);
+#endif
  static inline void *dp_packet_tail(const struct dp_packet *);
  static inline void *dp_packet_end(const struct dp_packet *);
  
@@ -185,9 +189,25 @@ dp_packet_delete(struct dp_packet *b)

  static inline void *
  dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
  {
-return offset + size <= dp_packet_size(b)
-   ? (char *) dp_packet_data(b) + offset
-   : NULL;
+if (offset + size > dp_packet_size(b)) {
+return NULL;
+}
+
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+
+while (buf && offset > buf->data_len) {
+offset -= buf->data_len;
+
+buf = buf->next;
+}
+
+return buf ? rte_pktmbuf_mtod_offset(buf, char *, offset) : NULL;
+}
+#endif
+
+return (char *) dp_packet_data(b) + offset;
  }
  
  /* Returns a pointer to byte 'offset' in 'b', which must contain at least

@@ -196,13 +216,23 @@ static inline void *
  dp_packet_at_assert(const struct dp_packet *b, size_t offset, size_t size)
  {
  ovs_assert(offset + size <= dp_packet_size(b));
-return ((char *) dp_packet_data(b)) + offset;
+return dp_packet_at(b, offset, size);
  }
  
  /* Returns a pointer to byte following the last byte of data in use in 'b'. */

  static inline void *
  dp_packet_tail(const struct dp_packet *b)
  {
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+/* Find last segment where data ends, meaning the tail of the chained
+ *  mbufs must be there */
+buf = rte_pktmbuf_lastseg(buf);
+
+return rte_pktmbuf_mtod_offset(buf, void *, buf->data_len);
+}
+#endif
  return (char *) dp_packet_data(b) + dp_packet_size(b);
  }
  
@@ -211,6 +241,15 @@ dp_packet_tail(const struct dp_packet *b)

  static inline void *
  dp_packet_end(const struct dp_packet *b)
  {
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &(b->mbuf));
+
+buf = rte_pktmbuf_lastseg(buf);
+
+return (char *) buf->buf_addr + buf->buf_len;
+}
+#endif
  return (char *) dp_packet_base(b) + dp_pack

Re: [ovs-dev] [PATCH v4] netdev-dpdk: support port representors

2019-01-10 Thread Ilya Maximets
On 08.01.2019 12:31, Ophir Munk wrote:
> Dpdk port representors were introduced in dpdk versions 18.xx.
> Prior to port representors there was a one-to-one relationship
> between an rte device (e.g. PCI bus) and an eth device (referenced as
> dpdk port id in OVS). With port representors the relationship becomes
> one-to-many rte device to eth devices.
> For example in [3] there are two devices (representors) using the same
> PCI physical address :08:00.0: ":08:00.0,representor=[3]" and
> ":08:00.0,representor=[5]".
> This commit handles the new one-to-many relationship. For example,
> when one of the device port representors in [3] is closed - the PCI bus
> cannot be detached until the other device port representor is closed as
> well. OVS remains backward compatible by supporting dpdk legacy PCI
> ports which do not include port representors.
> Dpdk port representors related commits are listed in [1]. Dpdk port
> representors documentation appears in [2]. A sample configuration
> which uses two representors ports (the output of "ovs-vsctl show"
> command) is shown in [3].
> 
> [1]
> e0cb96204b71 ("net/i40e: add support for representor ports")
> cf80ba6e2038 ("net/ixgbe: add support for representor ports")
> 26c08b979d26 ("net/mlx5: add port representor awareness")
> 
> [2]
> doc/guides/prog_guide/switch_representation.rst
> 
> [3]
> Bridge "ovs_br0"
> Port "ovs_br0"
> Interface "ovs_br0"
> type: internal
> Port "port-rep3"
> Interface "port-rep3"
> type: dpdk
> options: {dpdk-devargs=":08:00.0,representor=[3]"}
> Port "port-rep5"
> Interface "port-rep5"
> type: dpdk
> options: {dpdk-devargs=":08:00.0,representor=[5]"}
> ovs_version: "2.10.90"
> 
> Signed-off-by: Ophir Munk 
> ---
> v1 (hwol branch):
> 1. rebase on top of Kevin's patch
> dpdk: Update to use DPDK 18.11.[ovs-dev,v7,dpdk-latest,1/1] dpdk: Update to 
> use DPDK 18.11.
> https://patchwork.ozlabs.org/patch/1005535/
> 2. skipping count of sibling ports in case the sibling port state is 
> RTE_ETH_DEV_UNUSED 
> 
> v2 (switching to master branch):
> 1. Update based on review comments: 
> https://patchwork.ozlabs.org/patch/1011457/#2051417
> 2. Send patch on master branch
> 
> v3:
> Add a FIXME comment regarding the direct access to DPDK rte_eth_devices array
> 
> v4:
> 1. Add FIXME comment to every direct access to DPDK rte_eth_devices array
> 2. Do not probe unconditionally during add-port. Instead see if a device has 
> already
> been probed, and if so skip the rte_dev_probe(devargs)
> 
> 
> 
>  lib/netdev-dpdk.c | 136 
> ++
>  1 file changed, 106 insertions(+), 30 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 320422b..b6e903f 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1218,6 +1218,26 @@ dpdk_dev_parse_name(const char dev_name[], const char 
> prefix[],
>  }
>  }
>  
> +/* Get the number of OVS interfaces which have the same DPDK
> + * rte device (e.g. same pci bus address).
> + * FIXME: avoid direct access to DPDK internal array rte_eth_devices.
> + */
> +static int
> +netdev_dpdk_get_num_ports(struct rte_device *device)
> +OVS_REQUIRES(dpdk_mutex)
> +{
> +struct netdev_dpdk *dev;
> +int count = 0;
> +
> +LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> +if (rte_eth_devices[dev->port_id].device == device
> +&& rte_eth_devices[dev->port_id].state != RTE_ETH_DEV_UNUSED) {
> +count++;
> +}
> +}
> +return count;
> +}
> +
>  static int
>  vhost_common_construct(struct netdev *netdev)
>  OVS_REQUIRES(dpdk_mutex)
> @@ -1353,20 +1373,25 @@ static void
>  netdev_dpdk_destruct(struct netdev *netdev)
>  {
>  struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> -struct rte_eth_dev_info dev_info;
> +struct rte_device *rte_dev;
>  
>  ovs_mutex_lock(&dpdk_mutex);
>  
>  rte_eth_dev_stop(dev->port_id);
>  dev->started = false;
> -

Please, keep the empty line.

>  if (dev->attached) {
> +/* Remove the port eth device. */
>  rte_eth_dev_close(dev->port_id);
> -rte_eth_dev_info_get(dev->port_id, &dev_info);
> -if (dev_info.device && !rte_dev_remove(dev_info.device)) {
> -VLOG_INFO("Device '%s' has been detached", dev->devargs);

Please keep this log message for the successful case.

> -} else {
> -VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> +VLOG_INFO("Device '%s' has been removed", dev->devargs);
> +/* If this is the last port_id using this rte device
> + * remove this rte device and all its eth devices.
> + * FIXME: avoid direct access to DPDK internal array rte_eth_devices.
> + */
> +rte_dev = rte_eth_devices[dev->port_id].device;
> +if (netdev_dpdk_get_num_ports(rte_dev) == 1) {

rte_eth_dev_close() sets the R

Re: [ovs-dev] [PATCH v13 06/11] dp-packet: Add support for data "linearization".

2019-01-10 Thread Lam, Tiago
On 10/01/2019 09:12, David Marchand wrote:
> On Thu, Jan 10, 2019 at 10:11 AM David Marchand
> mailto:david.march...@redhat.com>> wrote:
> 
> This part triggers build (non fatal o_O) warnings on fedora 28.
> 
> 
> Oops, important.
> This is when building without dpdk support.

Hi David,

Thanks for pointing this out.

I don't seem to be able to trigger this locally, can you point out what
gcc version you're using (or clang)? The Travis [1] and the ML bot
didn't complain either, so this might be specific to certain versions.
But I'll gladly take a look if I'm to replicate - although it warning
seems fairly straightforward to fix.

Tiago.

[1] https://travis-ci.org/istokes/ovs/builds/477475924

> 
> 
> -- 
> David Marchand
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] compat: Fixup ipv6 fragmentation on 4.9.135+ kernels

2019-01-10 Thread Greg Rose
Upstream commit 648700f76b03 ("inet: frags: use rhashtables...") changed
how ipv6 fragmentation is implemented.  This patch was backported to
the upstream stable 4.9.x kernel starting at 4.9.135.

This patch creates the compatibility layer changes required to both
compile and also operate correctly with ipv6 fragmentation on these
kernels. Check if the inet_frags 'rnd' field is present to key on
whether the upstream patch is present.  Also update Travis to the
latest 4.9 kernel release so that this patch is compile tested.

Passes Travis:
https://travis-ci.org/gvrose8192/ovs-experimental/builds/478033409

Cc: William Tu 
Cc: Yi-Hung Wei 
Cc: Yifeng Sun 
Signed-off-by: Greg Rose 
---
 .travis.yml|  2 +-
 acinclude.m4   |  3 ++
 datapath/linux/compat/nf_conntrack_reasm.c | 54 --
 3 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index a2ef8bd..4d82e2e 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -39,7 +39,7 @@ env:
   - KERNEL=4.16.18
   - KERNEL=4.15.18
   - KERNEL=4.14.63
-  - KERNEL=4.9.120
+  - KERNEL=4.9.149
   - KERNEL=4.4.148
   - KERNEL=3.19.8
   - KERNEL=3.16.57
diff --git a/acinclude.m4 b/acinclude.m4
index 8b43d2b..f038fd4 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -915,6 +915,9 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
   OVS_FIND_PARAM_IFELSE([$KSRC/include/net/ip_tunnels.h],
 [ip_tunnel_info_opts_set], [flags],
 [OVS_DEFINE([HAVE_IP_TUNNEL_INFO_OPTS_SET_FLAGS])])
+  OVS_FIND_FIELD_IFELSE([$KSRC/include/net/inet_frag.h], [inet_frags],
+[rnd],
+[OVS_DEFINE([HAVE_INET_FRAGS_RND])])
 
   if cmp -s datapath/linux/kcompat.h.new \
 datapath/linux/kcompat.h >/dev/null 2>&1; then
diff --git a/datapath/linux/compat/nf_conntrack_reasm.c 
b/datapath/linux/compat/nf_conntrack_reasm.c
index ce13112..9d77d98 100644
--- a/datapath/linux/compat/nf_conntrack_reasm.c
+++ b/datapath/linux/compat/nf_conntrack_reasm.c
@@ -99,6 +99,7 @@ static inline u8 ip6_frag_ecn(const struct ipv6hdr *ipv6h)
return 1 << (ipv6_get_dsfield(ipv6h) & INET_ECN_MASK);
 }
 
+#ifdef HAVE_INET_FRAGS_RND
 static unsigned int nf_hash_frag(__be32 id, const struct in6_addr *saddr,
 const struct in6_addr *daddr)
 {
@@ -125,6 +126,7 @@ static unsigned int nf_hashfn(struct inet_frag_queue *q)
return nf_hash_frag(nq->id, &nq->saddr, &nq->daddr);
 }
 
+#endif /* HAVE_INET_FRAGS_RND */
 static void nf_ct_frag6_expire(unsigned long data)
 {
struct frag_queue *fq;
@@ -133,9 +135,14 @@ static void nf_ct_frag6_expire(unsigned long data)
fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
net = get_net_from_netns_frags6(fq->q.net);
 
+#ifdef HAVE_INET_FRAGS_RND
ip6_expire_frag_queue(net, fq, &nf_frags);
+#else
+   ip6_expire_frag_queue(net, fq);
+#endif
 }
 
+#ifdef HAVE_INET_FRAGS_RND
 /* Creation primitives. */
 static inline struct frag_queue *fq_find(struct net *net, __be32 id,
 u32 user, struct in6_addr *src,
@@ -168,7 +175,27 @@ static inline struct frag_queue *fq_find(struct net *net, 
__be32 id,
}
return container_of(q, struct frag_queue, q);
 }
+#else
+static struct frag_queue *fq_find(struct net *net, __be32 id, u32 user,
+ const struct ipv6hdr *hdr, int iif)
+{
+   struct frag_v6_compare_key key = {
+   .id = id,
+   .saddr = hdr->saddr,
+   .daddr = hdr->daddr,
+   .user = user,
+   .iif = iif,
+   };
+   struct inet_frag_queue *q;
+
+   q = inet_frag_find(&net->nf_frag.frags, &key);
+   if (!q)
+   return NULL;
+
+   return container_of(q, struct frag_queue, q);
+}
 
+#endif  /* HAVE_INET_FRAGS_RND */
 
 static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
 const struct frag_hdr *fhdr, int nhoff)
@@ -317,7 +344,11 @@ found:
return 0;
 
 discard_fq:
+#ifdef HAVE_INET_FRAGS_RND
inet_frag_kill(&fq->q, &nf_frags);
+#else
+   inet_frag_kill(&fq->q);
+#endif
 err:
return -1;
 }
@@ -339,7 +370,11 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff 
*prev,  struct net_devic
intpayload_len;
u8 ecn;
 
+#ifdef HAVE_INET_FRAGS_RND
inet_frag_kill(&fq->q, &nf_frags);
+#else
+   inet_frag_kill(&fq->q);
+#endif
 
WARN_ON(head == NULL);
WARN_ON(NFCT_FRAG6_CB(head)->offset != 0);
@@ -561,8 +596,13 @@ int rpl_nf_ct_frag6_gather(struct net *net, struct sk_buff 
*skb, u32 user)
 #endif
 
skb_orphan(skb);
+#ifdef HAVE_INET_FRAGS_RND
fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr,
 ip6_frag_ecn(hdr));
+#else
+   fq = fq_find(net, fhdr->identification, us

Re: [ovs-dev] [PATCH v2 1/2] conntrack: fix tcp seq adjustments when mangling commands

2019-01-10 Thread Darrell Ball
On Wed, Jan 9, 2019 at 12:53 AM David Marchand 
wrote:

> Hello,
>
> On Wed, Jan 9, 2019 at 4:51 AM Darrell Ball  wrote:
>
>> On Tue, Jan 8, 2019 at 2:57 PM Darrell Ball  wrote:
>>
>>> On Tue, Jan 8, 2019 at 2:13 AM David Marchand 
>>> wrote:
>>>

 Let me take an example with the test CHECK_FTP_SNAT_POST_RECIRC() +
 address 10.1.1.240.

 When dealing with the first mangle operation (client sending its first
 "PORT xxx" for the first "ls" command), the conn tcp seq skew value is at 
 0.
 repl_ftp_v4_addr() returns 2 since it replaced 10.1.1.1 with 10.1.1.240.
 The tcp seq numbers is kept untouched in the "if (do_seq_skew_adj &&
 seq_skew != 0)" block later in this function.

 When dealing with the second mangle operation (second "PORT xxx"), the
 current conn tcp skew value is at 2.
 repl_ftp_v4_addr() returns again 2 since it replaced 10.1.1.1 with
 10.1.1.240.
 The tcp seq number is updated by 2 in the "if (do_seq_skew_adj &&
 seq_skew != 0)" block later in this function.

 Now, you could argue that the code you propose can work so far, but if
 we add a new command again in this session, the tcp seq numbers sequence is
 broken.
 The command packet tcp seq number would be updated by the
 repl_ftp_v4_addr() return value 2 and not by the accumulated value which
 would be 4 for it to work.

>>>
>>> @@ -3170,9 +3178,8 @@ repl_ftp_v6_addr(struct dp_packet *pkt, struct
>>> ct_addr v6_addr_rep,
>>>
>>>  static void
>>>  handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
>>> -   struct dp_packet *pkt,
>>> -   const struct conn *conn_for_expectation,
>>> -   long long now, enum ftp_ctl_pkt ftp_ctl, bool nat)
>>> +   struct dp_packet *pkt, const struct conn *ec, long long
>>> now,
>>> +   enum ftp_ctl_pkt ftp_ctl, bool nat)
>>>  {
>>>  struct ip_header *l3_hdr = dp_packet_l3(pkt);
>>>  ovs_be32 v4_addr_rep = 0;
>>> @@ -3187,24 +3194,22 @@ handle_ftp_ctl(struct conntrack *ct, const
>>> struct conn_lookup_ctx *ctx,
>>>  return;
>>>  }
>>>
>>> -if (!nat || !conn_for_expectation->seq_skew) {
>>> +if (!nat || !ec->seq_skew) {
>>>  do_seq_skew_adj = false;
>>>  }
>>>
>>>
>>
> How about removing this boolean ?
> See below.
>
>
>>  struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt);
>>>  int64_t seq_skew = 0;
>>>
>>> -if (ftp_ctl == CT_FTP_CTL_OTHER) {
>>> -seq_skew = conn_for_expectation->seq_skew;
>>> -} else if (ftp_ctl == CT_FTP_CTL_INTEREST) {
>>> +if (ftp_ctl == CT_FTP_CTL_INTEREST) {
>>>  enum ftp_ctl_pkt rc;
>>>  if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) {
>>> -rc = process_ftp_ctl_v6(ct, pkt, conn_for_expectation,
>>> +rc = process_ftp_ctl_v6(ct, pkt, ec,
>>>  &v6_addr_rep, &ftp_data_start,
>>>  &addr_offset_from_ftp_data_start,
>>>  &addr_size, &mode);
>>>  } else {
>>> -rc = process_ftp_ctl_v4(ct, pkt, conn_for_expectation,
>>> +rc = process_ftp_ctl_v4(ct, pkt, ec,
>>>  &v4_addr_rep, &ftp_data_start,
>>>  &addr_offset_from_ftp_data_start);
>>>  }
>>> @@ -3214,65 +3219,67 @@ handle_ftp_ctl(struct conntrack *ct, const
>>> struct conn_lookup_ctx *ctx,
>>>  pkt->md.ct_state |= CS_TRACKED | CS_INVALID;
>>>  return;
>>>  } else if (rc == CT_FTP_CTL_INTEREST) {
>>> +
>>>  uint16_t ip_len;
>>>
>>>  if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) {
>>> -seq_skew = repl_ftp_v6_addr(pkt, v6_addr_rep,
>>> ftp_data_start,
>>> -
>>> addr_offset_from_ftp_data_start,
>>> -addr_size, mode);
>>> +if (nat) {
>>> +seq_skew = repl_ftp_v6_addr(pkt, v6_addr_rep,
>>> +   ftp_data_start,
>>> +   addr_offset_from_ftp_data_start,
>>> +   addr_size, mode);
>>> +}
>>> +
>>>  if (seq_skew) {
>>> -ip_len = ntohs(nh6->ip6_ctlun.ip6_un1.ip6_un1_plen);
>>> -ip_len += seq_skew;
>>> +ip_len = ntohs(nh6->ip6_ctlun.ip6_un1.ip6_un1_plen)
>>> +
>>> +seq_skew;
>>>  nh6->ip6_ctlun.ip6_un1.ip6_un1_plen = htons(ip_len);
>>> -conn_seq_skew_set(ct, &conn_for_expectation->key,
>>> now,
>>> -  seq_skew, ctx->reply);
>>>  }
>>>  } else {
>>> -seq_skew = repl_ftp_v4_addr(pkt, v4_addr_rep,
>>> ftp_data_start,
>>> -
>>> addr_offset_from_ftp_data_start);
>>> -   

Re: [ovs-dev] [patch v1] conntrack: Fix FTP seq_skew boundary adjustments.

2019-01-10 Thread Darrell Ball
On Thu, Jan 10, 2019 at 1:03 AM David Marchand 
wrote:

> Hello,
>
> On Wed, Jan 9, 2019 at 4:44 AM Darrell Ball  wrote:
>
>> Fixes: bd5e81a0e596 ("Userspace Datapath: Add ALG infra and FTP.")
>> Signed-off-by: Darrell Ball 
>> ---
>>
>> Backport to 2.8.
>>
>>  lib/conntrack.c | 10 --
>>  1 file changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> index 6f6021a..e992b77 100644
>> --- a/lib/conntrack.c
>> +++ b/lib/conntrack.c
>> @@ -3255,10 +3255,9 @@ handle_ftp_ctl(struct conntrack *ct, const struct
>> conn_lookup_ctx *ctx,
>>  uint32_t tcp_ack = ntohl(get_16aligned_be32(&th->tcp_ack));
>>
>>  if ((seq_skew > 0) && (tcp_ack < seq_skew)) {
>> -/* Should not be possible; will be marked invalid. */
>> -tcp_ack = 0;
>> +tcp_ack = UINT32_MAX - (seq_skew - tcp_ack - 1);
>>  } else if ((seq_skew < 0) && (UINT32_MAX - tcp_ack <
>> -seq_skew)) {
>> -tcp_ack = (-seq_skew) - (UINT32_MAX - tcp_ack);
>> +tcp_ack = (-seq_skew) - (UINT32_MAX - tcp_ack) - 1;
>>  } else {
>>  tcp_ack -= seq_skew;
>>  }
>> @@ -3267,10 +3266,9 @@ handle_ftp_ctl(struct conntrack *ct, const struct
>> conn_lookup_ctx *ctx,
>>  } else {
>>  uint32_t tcp_seq = ntohl(get_16aligned_be32(&th->tcp_seq));
>>  if ((seq_skew > 0) && (UINT32_MAX - tcp_seq < seq_skew)) {
>> -tcp_seq = seq_skew - (UINT32_MAX - tcp_seq);
>> +tcp_seq = seq_skew - (UINT32_MAX - tcp_seq) - 1;
>>  } else if ((seq_skew < 0) && (tcp_seq < -seq_skew)) {
>> -/* Should not be possible; will be marked invalid. */
>> -tcp_seq = 0;
>> +tcp_seq = UINT32_MAX - ((-seq_skew) - tcp_seq - 1);
>>  } else {
>>  tcp_seq += seq_skew;
>>  }
>> --
>> 1.9.1
>>
>>
> Ah, now that I think about it, I had seen some packets with ack == 0 but
> did not reproduce (it was in my todolist).
> Good catch.
>
> Just wondering, can't we rely integer promotion then truncation on 32bits ?
> My test program tends to show it works.
>

I believe a compiler is free to convert the operands to int64 and AFAIK
overflow/underflow is undefined for signed variables.


>
>

>

> Something like:
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index eb36353..04ddf5e 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -3329,32 +3329,13 @@ handle_ftp_ctl(struct conntrack *ct, const struct
> conn_lookup_ctx *ctx,
>
>  if (nat && ec->seq_skew != 0) {
>  if (ctx->reply != ec->seq_skew_dir) {
> -
>  uint32_t tcp_ack = ntohl(get_16aligned_be32(&th->tcp_ack));
>
> -if ((ec->seq_skew > 0) && (tcp_ack < ec->seq_skew)) {
> -/* Should not be possible; will be marked invalid. */
> -tcp_ack = 0;
> -} else if ((ec->seq_skew < 0) &&
> -   (UINT32_MAX - tcp_ack < -ec->seq_skew)) {
> -tcp_ack = (-ec->seq_skew) - (UINT32_MAX - tcp_ack);
> -} else {
> -tcp_ack -= ec->seq_skew;
> -}
> -ovs_be32 new_tcp_ack = htonl(tcp_ack);
> -put_16aligned_be32(&th->tcp_ack, new_tcp_ack);
> +put_16aligned_be32(&th->tcp_ack, htonl(tcp_ack -
> ec->seq_skew));
>  } else {
>  uint32_t tcp_seq = ntohl(get_16aligned_be32(&th->tcp_seq));
> -if ((ec->seq_skew > 0) && (UINT32_MAX - tcp_seq <
> ec->seq_skew)) {
> -tcp_seq = ec->seq_skew - (UINT32_MAX - tcp_seq);
> -} else if ((ec->seq_skew < 0) && (tcp_seq < -ec->seq_skew)) {
> -/* Should not be possible; will be marked invalid. */
> -tcp_seq = 0;
> -} else {
> -tcp_seq += ec->seq_skew;
> -}
> -ovs_be32 new_tcp_seq = htonl(tcp_seq);
> -put_16aligned_be32(&th->tcp_seq, new_tcp_seq);
> +
> +put_16aligned_be32(&th->tcp_seq, htonl(tcp_seq +
> ec->seq_skew));
>  }
>  }
>
>
> --
> David Marchand
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4] ovn: Support a new Logical_Switch_Port.type - 'external'

2019-01-10 Thread nusiddiq
From: Numan Siddique 

In the case of OpenStack + OVN, when the VMs are booted on
hypervisors supporting SR-IOV nics, there are no OVS ports
for these VMs. When these VMs sends DHCPv4, DHPCv6 or IPv6
Router Solicitation requests, the local ovn-controller
cannot reply to these packets. OpenStack Neutron dhcp agent
service needs to be run to serve these requests.

With the new logical port type - 'external', OVN itself can
handle these requests avoiding the need to deploy any
external services like neutron dhcp agent.

To make use of this feature, CMS has to
 - create a logical port for such VMs
 - set the type to 'external'
 - set requested-chassis="" in the options
   column.
 - create a localnet port for the logical switch
 - configure the ovn-bridge-mappings option in the OVS db.

When the ovn-controller running in that 'chassis', detects
the Port_Binding row, it adds the necessary DHCPv4/v6 OF
flows. Since the packet enters the logical switch pipeline
via the localnet port, the inport register (reg14) is set
to the tunnel key of localnet port in the match conditions.

In case the chassis goes down for some reason, it is the
responsibility of CMS to change the 'requested-chassis'
option to some other active chassis, so that it can serve
these requests.

When the VM with the external port, sends an ARP request for
the router ips, only the chassis which has claimed the port,
will reply to the ARP requests. Rest of the chassis on
receiving these packets drop them in the ingress switch
datapath stage - S_SWITCH_IN_EXTERNAL_PORT which is just
before S_SWITCH_IN_L2_LKUP.

This would guarantee that only the chassis which has claimed
the external ports will run the router datapath pipeline.

Signed-off-by: Numan Siddique 
---

v3 -> v4
--
  * Updated the documention as per Han Zhou's suggestion.

v2 -> v3
---
  * Rebased 

 ovn/controller/binding.c|  15 +-
 ovn/controller/lflow.c  |  41 ++-
 ovn/controller/lflow.h  |   2 +
 ovn/controller/lport.c  |  26 ++
 ovn/controller/lport.h  |   5 +
 ovn/controller/ovn-controller.c |   6 +
 ovn/lib/ovn-util.c  |   1 +
 ovn/northd/ovn-northd.8.xml |  52 +++-
 ovn/northd/ovn-northd.c | 123 ++--
 ovn/ovn-architecture.7.xml  |  76 +
 ovn/ovn-nb.xml  |  44 +++
 tests/ovn.at| 530 +++-
 12 files changed, 889 insertions(+), 32 deletions(-)

diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
index 021ecddcf..ee396c93d 100644
--- a/ovn/controller/binding.c
+++ b/ovn/controller/binding.c
@@ -471,13 +471,26 @@ consider_local_datapath(struct ovsdb_idl_txn 
*ovnsb_idl_txn,
  * for them. */
 sset_add(local_lports, binding_rec->logical_port);
 our_chassis = false;
+} else if (!strcmp(binding_rec->type, "external")) {
+const char *chassis_id = smap_get(&binding_rec->options,
+  "requested-chassis");
+our_chassis = chassis_id && (
+!strcmp(chassis_id, chassis_rec->name) ||
+!strcmp(chassis_id, chassis_rec->hostname));
+if (our_chassis) {
+add_local_datapath(sbrec_datapath_binding_by_key,
+   sbrec_port_binding_by_datapath,
+   sbrec_port_binding_by_name,
+   binding_rec->datapath, true, local_datapaths);
+}
 }
 
 if (our_chassis
 || !strcmp(binding_rec->type, "patch")
 || !strcmp(binding_rec->type, "localport")
 || !strcmp(binding_rec->type, "vtep")
-|| !strcmp(binding_rec->type, "localnet")) {
+|| !strcmp(binding_rec->type, "localnet")
+|| !strcmp(binding_rec->type, "external")) {
 update_local_lport_ids(local_lport_ids, binding_rec);
 }
 
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index 8db81927e..98e8ed3b9 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -52,7 +52,10 @@ lflow_init(void)
 struct lookup_port_aux {
 struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath;
 struct ovsdb_idl_index *sbrec_port_binding_by_name;
+struct ovsdb_idl_index *sbrec_port_binding_by_type;
+struct ovsdb_idl_index *sbrec_datapath_binding_by_key;
 const struct sbrec_datapath_binding *dp;
+const struct sbrec_chassis *chassis;
 };
 
 struct condition_aux {
@@ -66,6 +69,8 @@ static void consider_logical_flow(
 struct ovsdb_idl_index *sbrec_chassis_by_name,
 struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath,
 struct ovsdb_idl_index *sbrec_port_binding_by_name,
+struct ovsdb_idl_index *sbrec_port_binding_by_type,
+struct ovsdb_idl_index *sbrec_datapath_binding_by_key,
 const struct sbrec_logical_flow *,
 const struct hmap *local_datapaths,
 const struct sbrec_chassis *,
@@ -89,8 +94,24 @@ lookup_port_cb(const void *aux_, const char *port_name, 
unsigned int *

Re: [ovs-dev] [PATCH 1/3] nroff: Increase width for .IP used for ordered lists.

2019-01-10 Thread Mark Michelson

Hi Ben,

Excellent new document. This will be great to have in the code.

For the series
Acked-by: Mark Michelson 

On 11/9/18 12:39 AM, Ben Pfaff wrote:

The ordered lists that a .25in width produced looked OK in PostScript
or PDF output, but in text output every list item spanned two lines,
like this:

1.
  First list item.
2.
  Second list item.

With this change, they appear normally:

1. First list item.
2. Second list item.

Signed-off-by: Ben Pfaff 
---
  python/build/nroff.py | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/build/nroff.py b/python/build/nroff.py
index 73353061..17ff69bfe47a 100644
--- a/python/build/nroff.py
+++ b/python/build/nroff.py
@@ -316,7 +316,7 @@ def block_xml_to_nroff(nodes, para='.PP'):
  if node.tagName == 'ul':
  s += ".IP \\(bu\n"
  else:
-s += ".IP %d. .25in\n" % i
+s += ".IP %d. .4in\n" % i
  s += block_xml_to_nroff(li_node.childNodes, ".IP")
  elif li_node.nodeType == node.COMMENT_NODE:
  pass



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] ovn: Support a new Logical_Switch_Port.type - 'external'

2019-01-10 Thread Numan Siddique
On Fri, Jan 11, 2019 at 12:09 AM Numan Siddique  wrote:

>
>
> On Fri, Jan 11, 2019 at 12:07 AM Han Zhou  wrote:
>
>>
>>
>>
>> On Thu, Dec 20, 2018 at 5:34 AM Numan Siddique 
>> wrote:
>> >
>> >
>> >
>> > On Wed, Dec 19, 2018 at 4:23 PM Miguel Angel Ajo Pelayo <
>> majop...@redhat.com> wrote:
>> >>
>> >> Oh, if requested chassis matches the chasis of the SRIOV instance,
>> then we don't need HA. Ignore my comments regarding multiple chassis.
>> >>
>> >> And yes it's my understanding that the traffic will bounce to the host
>> interface probably through the internal switch embedded in the SRIOV card,
>> or, if the host uses a separate interface then via de Tor switch.
>> >>
>> >>
>> >> On Wed, Dec 19, 2018, 1:54 AM Han Zhou > >>>
>> >>> Hi Numan, I haven't finished reviewing the whole patch yet, but I
>> have some
>> >>> questions inlined.
>> >>>
>> >>>
>> >>> > +
>> >>> > +
>> >>> > +  
>> >>> > +
>> >>> > +  For each router port IP address A which
>> belongs
>> >>> to the
>> >>> > +  logical switch, A priority-100 flow is added which
>> matches
>> >>> > +  REGBIT_EXT_PORT_NOT_RESIDENT && arp.tpa ==
>> >>> A
>> >>> > +  && arp.op == 1 (ARP request to the router
>> >>> > +  IP) with the action to drop the packet.
>> >>> > +
>> >>> > +
>> >>> > +
>> >>> > +  These flows guarantees that the ARP/NS request to the
>> router IP
>> >>> > +  address from the external ports is responded by only the
>> >>> chassis
>> >>> > +  which has claimed these external ports. All the other
>> chassis,
>> >>> > +  drops these packets.
>> >>> > +
>> >>>
>> >>> Could you explain more about how this solves
>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1613384
>> >>> For my understanding, the logical router port MAC would still flap on
>> the
>> >>> physical switch ports, if there are multiple external ports and their
>> >>> "requested-chassis" are set to different chassis. Or does this suggest
>> >>> specifying a single chassis as "requested-chassis" for all
>> external-ports?
>> >>>
>> >
>> > You are right. If there are multiple external ports and their
>> "requested-chassis" are set to
>> > different chassis  you would still see MAC flap issue. I would say CMS
>> should handle
>> > this situation and set the same "requested-chassis" for all external
>> ports.
>> >
>>
>> OK. So would it be better to document this consideration?
>>
>
> Yeah. Sounds good to me. I will update the patch with the documentation.
>
>
Please note that, CMS configures the source MAC to use when ovn-controller
sends the DHCP response (with the put_dhcp_opts action).
So we will see MAC flap issue for this MAC configured and not for the
logical
router port MAC.

Thanks
Numan



> Thanks
> Numan
>
>
>>
>> >
>> >>>
>> >>> > diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
>> >>> > index 3936e6016..37b97a0d9 100644
>> >>> > --- a/ovn/ovn-architecture.7.xml
>> >>> > +++ b/ovn/ovn-architecture.7.xml
>> >>> > @@ -1678,6 +1678,72 @@
>> >>> >  
>> >>> >
>> >>> >
>> >>> > +  Native OVN services for external logical ports
>> >>> > +
>> >>> > +  
>> >>> > +To support OVN native services (like DHCP/IPv6 RA/DNS lookup)
>> to the
>> >>> > +cloud resources which are external, OVN supports
>> >>> external
>> >>> > +logical ports.
>> >>> > +  
>> >>> > +
>> >>> > +  
>> >>> > +Below are some of the use cases where external
>> ports
>> >>> can be
>> >>> > +used.
>> >>> > +  
>> >>> > +
>> >>> > +  
>> >>> > +
>> >>> > +  VMs connected to SR-IOV nics - Traffic from these VMs by
>> passes the
>> >>> > +  kernel stack and local ovn-controller do not
>> bind
>> >>> these
>> >>> > +  ports and cannot serve the native services.
>> >>> > +
>> >>>
>> >>> Would the broadcast traffic (e.g. DHCP discover) sent out from SR-IOV
>> come
>> >>> back to the local host? Is it free to select either the local host or
>> any
>> >>> other hosts as the "requested-chassis"? Or does this suggest that we
>> have
>> >>> to use a different chassis other than the local host as the
>> >>> "requested-chassis"?
>> >>>
>> >
>> > As Miguel mentioned the traffic may be received by the compute host
>> where the SR-IOV VM
>> > is present if the host NIC is connected to the same switch.
>> >
>> > It is free to select any chassis as long as that chassis can receive
>> the broadcast packet.
>> >
>> >
>> >
>> >>>
>> >>> I will review in more detail later.
>> >
>> >
>> > Thanks
>> > Numan
>> >
>> >>>
>> >>>
>> >>> Thanks,
>> >>> Han
>> >>> ___
>> >>> dev mailing list
>> >>> d...@openvswitch.org
>> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] odp-util: Do not rewrite fields with the same values as matched

2019-01-10 Thread Yifeng Sun
Hi,

When I try to understand this patch, I feel there may be some issue in this
loop below.
It looks like each loop is doing the same thing. Can you please take a look?

+for (i = 0; i < size; i++)
+if (memcmp(pkey0, pkey1, size) == 0)
+memset(pmask, 0, size);
+else
+differ = true;

Thanks,
Yifeng

On Thu, Jan 10, 2019 at 12:52 AM Eli Britstein  wrote:

> To improve performance and avoid wasting resources for HW offloaded
> flows, do not rewrite fields that are matched with the same value.
>
> Signed-off-by: Eli Britstein 
> Reviewed-by: Roi Dayan 
> ---
>  lib/odp-util.c| 110
> +-
>  tests/mpls-xlate.at   |   2 +-
>  tests/ofproto-dpif.at |  14 +++
>  3 files changed, 108 insertions(+), 18 deletions(-)
>
> diff --git a/lib/odp-util.c b/lib/odp-util.c
> index 0491bed38..7e916f9d9 100644
> --- a/lib/odp-util.c
> +++ b/lib/odp-util.c
> @@ -7086,12 +7086,50 @@ commit_odp_tunnel_action(const struct flow *flow,
> struct flow *base,
>  }
>  }
>
> +struct ovs_key_field_properties {
> +int offset;
> +int size;
> +};
> +
> +/* Compare the keys similary to memcmp, but each field separately.
> + * The offsets and sizes of each field is provided by field_properties
> + * argument.
> + * For fields with the same value, zero out their mask part in order not
> to
> + * rewrite them as it's unnecessary */
> +static bool
> +keycmp_mask(const void *key0, const void *key1,
> +struct ovs_key_field_properties *field_properties, void *mask)
> +{
> +int field, i;
> +bool differ = false;
> +
> +for (field = 0 ; ; field++) {
> +int size = field_properties[field].size;
> +int offset = field_properties[field].offset;
> +char *pkey0 = ((char *)key0) + offset;
> +char *pkey1 = ((char *)key1) + offset;
> +char *pmask = ((char *)mask) + offset;
> +
> +if (size == 0)
> +break;
> +
> +for (i = 0; i < size; i++)
> +if (memcmp(pkey0, pkey1, size) == 0)
> +memset(pmask, 0, size);
> +else
> +differ = true;
> +}
> +
> +return differ;
> +}
> +
>  static bool
>  commit(enum ovs_key_attr attr, bool use_masked_set,
> const void *key, void *base, void *mask, size_t size,
> +   struct ovs_key_field_properties *field_properties,
> struct ofpbuf *odp_actions)
>  {
> -if (memcmp(key, base, size)) {
> +if (keycmp_mask(key, base, field_properties, mask)) {
>  bool fully_masked = odp_mask_is_exact(attr, mask, size);
>
>  if (use_masked_set && !fully_masked) {
> @@ -7133,6 +7171,12 @@ commit_set_ether_action(const struct flow *flow,
> struct flow *base_flow,
>  bool use_masked)
>  {
>  struct ovs_key_ethernet key, base, mask;
> +struct ovs_key_field_properties ovs_key_ethernet_field_properties[] =
> {
> +#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_ethernet,
> name), sizeof(type)},
> +OVS_KEY_ETHERNET_FIELDS
> +{0, 0}
> +#undef OVS_KEY_FIELD
> +};
>
>  if (flow->packet_type != htonl(PT_ETH)) {
>  return;
> @@ -7143,7 +7187,8 @@ commit_set_ether_action(const struct flow *flow,
> struct flow *base_flow,
>  get_ethernet_key(&wc->masks, &mask);
>
>  if (commit(OVS_KEY_ATTR_ETHERNET, use_masked,
> -   &key, &base, &mask, sizeof key, odp_actions)) {
> +   &key, &base, &mask, sizeof key,
> +   ovs_key_ethernet_field_properties, odp_actions)) {
>  put_ethernet_key(&base, base_flow);
>  put_ethernet_key(&mask, &wc->masks);
>  }
> @@ -7269,6 +7314,12 @@ commit_set_ipv4_action(const struct flow *flow,
> struct flow *base_flow,
> bool use_masked)
>  {
>  struct ovs_key_ipv4 key, mask, base;
> +struct ovs_key_field_properties ovs_key_ipv4_field_properties[] = {
> +#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_ipv4, name),
> sizeof(type)},
> +OVS_KEY_IPV4_FIELDS
> +{0, 0}
> +#undef OVS_KEY_FIELD
> +};
>
>  /* Check that nw_proto and nw_frag remain unchanged. */
>  ovs_assert(flow->nw_proto == base_flow->nw_proto &&
> @@ -7286,7 +7337,7 @@ commit_set_ipv4_action(const struct flow *flow,
> struct flow *base_flow,
>  }
>
>  if (commit(OVS_KEY_ATTR_IPV4, use_masked, &key, &base, &mask, sizeof
> key,
> -   odp_actions)) {
> +   ovs_key_ipv4_field_properties, odp_actions)) {
>  put_ipv4_key(&base, base_flow, false);
>  if (mask.ipv4_proto != 0) { /* Mask was changed by commit(). */
>  put_ipv4_key(&mask, &wc->masks, true);
> @@ -7324,6 +7375,12 @@ commit_set_ipv6_action(const struct flow *flow,
> struct flow *base_flow,
> bool use_masked)
>  {
>  struct ovs_key_ipv6 key, mask, base;
> +struct ovs_key_field_properties ovs_key_ipv6_field_properti

Re: [ovs-dev] [PATCH 6/8] ofproto: Handle multipart requests with multiple parts.

2019-01-10 Thread Justin Pettit


> On Aug 30, 2018, at 1:00 PM, Ben Pfaff  wrote:
> 
> OpenFlow has a concept of multipart messages, that is, messages that can be
> broken into multiple pieces that are sent separately.  Before OpenFlow 1.3,
> only replies could actually have multiple pieces.  OpenFlow 1.3 introduced
> the idea that requests could have multiple pieces.  This is only useful for
> multipart requests that take an array as part of the request, which amounts
> to only flow monitoring requests and table features requests.  So far, OVS
> hasn't implemented the multipart versions of these (it just reports an
> error).  This commit introduces the necessary infastructure to implement
> them properly.
> 
> Signed-off-by: Ben Pfaff 

Acked-by: Justin Pettit 

--Justin


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/1] utilities: gdb debug commands fix typos

2019-01-10 Thread Ben Pfaff
Thanks Andreas and Eelco, I applied this to master.

On Wed, Jan 02, 2019 at 01:32:53PM +0100, Eelco Chaudron wrote:
> Thanks Andreas,
> 
> Acked-by: Eelco Chaudron 
> 
> On 1 Jan 2019, at 15:34, Andreas Karis wrote:
> 
> > Fix minor typos in ovs_gdb debug script.
> > 
> > Signed-off-by: Andreas Karis 
> > ---
> >  utilities/gdb/ovs_gdb.py | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/utilities/gdb/ovs_gdb.py b/utilities/gdb/ovs_gdb.py
> > index cb9778c69..fab69cc8e 100644
> > --- a/utilities/gdb/ovs_gdb.py
> > +++ b/utilities/gdb/ovs_gdb.py
> > @@ -122,7 +122,7 @@ def get_global_variable(name):
> >  var = gdb.lookup_symbol(name)[0]
> >  if var is None or not var.is_variable:
> >  print("Can't find {} global variable, are you sure "
> > -  "your debugging OVS?".format(name))
> > +  "you are debugging OVS?".format(name))
> >  return None
> >  return gdb.parse_and_eval(name)
> > 
> > @@ -424,7 +424,7 @@ class CmdDumpBridgePorts(gdb.Command):
> >  def display_single_port(port, indent=0):
> >  indent = " " * indent
> >  port = port.cast(gdb.lookup_type('struct port').pointer())
> > -print("{}(struct port *) {}: name = {}, brige = (struct bridge
> > *) {}".
> > +print("{}(struct port *) {}: name = {}, bridge = (struct bridge
> > *) {}".
> >format(indent, port, port['name'].string(),
> >   port['bridge']))
> > 
> > -- 
> > 2.17.2
> > 
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 1/1] utilities: Update gdb script so it works with all python versions

2019-01-10 Thread Ben Pfaff
On Tue, Jan 08, 2019 at 04:52:45PM +0800, solomon wrote:
> 
> Eelco Chaudron wrote:
> > Newer versions of Python require a different iterator function. This
> > change will make the iterator classes work with all Python versions.
> > 
> > Adds a fix for python3 as it does not support the long() type.
> > The fix guaranties the script still works on Python 2.7.
> > 
> > The uKey walker is rather slow on python3, so added a spinner to
> > indicate we are still busy processing entries.
> > 
> > Fix functions using the iterkeys() function on dictionaries.
> > 
> > Signed-off-by: Eelco Chaudron 
> 
> It's good to me.
> Tested-by: solomon 

Thanks Eelco and solomon.  I applied this to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5] ovs:group: support to insert bucket with weight value for select type

2019-01-10 Thread Ben Pfaff
On Sun, Jan 06, 2019 at 05:17:34PM +0800, solomon wrote:
> After creating a group with hash select type,then  we need to insert a new 
> bucket with weight,

Thanks for the patch.  I applied this to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] ovn: Support a new Logical_Switch_Port.type - 'external'

2019-01-10 Thread Numan Siddique
On Fri, Jan 11, 2019 at 12:07 AM Han Zhou  wrote:

>
>
>
> On Thu, Dec 20, 2018 at 5:34 AM Numan Siddique 
> wrote:
> >
> >
> >
> > On Wed, Dec 19, 2018 at 4:23 PM Miguel Angel Ajo Pelayo <
> majop...@redhat.com> wrote:
> >>
> >> Oh, if requested chassis matches the chasis of the SRIOV instance, then
> we don't need HA. Ignore my comments regarding multiple chassis.
> >>
> >> And yes it's my understanding that the traffic will bounce to the host
> interface probably through the internal switch embedded in the SRIOV card,
> or, if the host uses a separate interface then via de Tor switch.
> >>
> >>
> >> On Wed, Dec 19, 2018, 1:54 AM Han Zhou  >>>
> >>> Hi Numan, I haven't finished reviewing the whole patch yet, but I have
> some
> >>> questions inlined.
> >>>
> >>>
> >>> > +
> >>> > +
> >>> > +  
> >>> > +
> >>> > +  For each router port IP address A which
> belongs
> >>> to the
> >>> > +  logical switch, A priority-100 flow is added which matches
> >>> > +  REGBIT_EXT_PORT_NOT_RESIDENT && arp.tpa ==
> >>> A
> >>> > +  && arp.op == 1 (ARP request to the router
> >>> > +  IP) with the action to drop the packet.
> >>> > +
> >>> > +
> >>> > +
> >>> > +  These flows guarantees that the ARP/NS request to the
> router IP
> >>> > +  address from the external ports is responded by only the
> >>> chassis
> >>> > +  which has claimed these external ports. All the other
> chassis,
> >>> > +  drops these packets.
> >>> > +
> >>>
> >>> Could you explain more about how this solves
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1613384
> >>> For my understanding, the logical router port MAC would still flap on
> the
> >>> physical switch ports, if there are multiple external ports and their
> >>> "requested-chassis" are set to different chassis. Or does this suggest
> >>> specifying a single chassis as "requested-chassis" for all
> external-ports?
> >>>
> >
> > You are right. If there are multiple external ports and their
> "requested-chassis" are set to
> > different chassis  you would still see MAC flap issue. I would say CMS
> should handle
> > this situation and set the same "requested-chassis" for all external
> ports.
> >
>
> OK. So would it be better to document this consideration?
>

Yeah. Sounds good to me. I will update the patch with the documentation.

Thanks
Numan


>
> >
> >>>
> >>> > diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> >>> > index 3936e6016..37b97a0d9 100644
> >>> > --- a/ovn/ovn-architecture.7.xml
> >>> > +++ b/ovn/ovn-architecture.7.xml
> >>> > @@ -1678,6 +1678,72 @@
> >>> >  
> >>> >
> >>> >
> >>> > +  Native OVN services for external logical ports
> >>> > +
> >>> > +  
> >>> > +To support OVN native services (like DHCP/IPv6 RA/DNS lookup)
> to the
> >>> > +cloud resources which are external, OVN supports
> >>> external
> >>> > +logical ports.
> >>> > +  
> >>> > +
> >>> > +  
> >>> > +Below are some of the use cases where external
> ports
> >>> can be
> >>> > +used.
> >>> > +  
> >>> > +
> >>> > +  
> >>> > +
> >>> > +  VMs connected to SR-IOV nics - Traffic from these VMs by
> passes the
> >>> > +  kernel stack and local ovn-controller do not bind
> >>> these
> >>> > +  ports and cannot serve the native services.
> >>> > +
> >>>
> >>> Would the broadcast traffic (e.g. DHCP discover) sent out from SR-IOV
> come
> >>> back to the local host? Is it free to select either the local host or
> any
> >>> other hosts as the "requested-chassis"? Or does this suggest that we
> have
> >>> to use a different chassis other than the local host as the
> >>> "requested-chassis"?
> >>>
> >
> > As Miguel mentioned the traffic may be received by the compute host
> where the SR-IOV VM
> > is present if the host NIC is connected to the same switch.
> >
> > It is free to select any chassis as long as that chassis can receive the
> broadcast packet.
> >
> >
> >
> >>>
> >>> I will review in more detail later.
> >
> >
> > Thanks
> > Numan
> >
> >>>
> >>>
> >>> Thanks,
> >>> Han
> >>> ___
> >>> dev mailing list
> >>> d...@openvswitch.org
> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] ovn: Support a new Logical_Switch_Port.type - 'external'

2019-01-10 Thread Han Zhou
On Thu, Dec 20, 2018 at 5:34 AM Numan Siddique  wrote:
>
>
>
> On Wed, Dec 19, 2018 at 4:23 PM Miguel Angel Ajo Pelayo <
majop...@redhat.com> wrote:
>>
>> Oh, if requested chassis matches the chasis of the SRIOV instance, then
we don't need HA. Ignore my comments regarding multiple chassis.
>>
>> And yes it's my understanding that the traffic will bounce to the host
interface probably through the internal switch embedded in the SRIOV card,
or, if the host uses a separate interface then via de Tor switch.
>>
>>
>> On Wed, Dec 19, 2018, 1:54 AM Han Zhou >>
>>> Hi Numan, I haven't finished reviewing the whole patch yet, but I have
some
>>> questions inlined.
>>>
>>>
>>> > +
>>> > +
>>> > +  
>>> > +
>>> > +  For each router port IP address A which
belongs
>>> to the
>>> > +  logical switch, A priority-100 flow is added which matches
>>> > +  REGBIT_EXT_PORT_NOT_RESIDENT && arp.tpa ==
>>> A
>>> > +  && arp.op == 1 (ARP request to the router
>>> > +  IP) with the action to drop the packet.
>>> > +
>>> > +
>>> > +
>>> > +  These flows guarantees that the ARP/NS request to the
router IP
>>> > +  address from the external ports is responded by only the
>>> chassis
>>> > +  which has claimed these external ports. All the other
chassis,
>>> > +  drops these packets.
>>> > +
>>>
>>> Could you explain more about how this solves
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1613384
>>> For my understanding, the logical router port MAC would still flap on
the
>>> physical switch ports, if there are multiple external ports and their
>>> "requested-chassis" are set to different chassis. Or does this suggest
>>> specifying a single chassis as "requested-chassis" for all
external-ports?
>>>
>
> You are right. If there are multiple external ports and their
"requested-chassis" are set to
> different chassis  you would still see MAC flap issue. I would say CMS
should handle
> this situation and set the same "requested-chassis" for all external
ports.
>

OK. So would it be better to document this consideration?

>
>>>
>>> > diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
>>> > index 3936e6016..37b97a0d9 100644
>>> > --- a/ovn/ovn-architecture.7.xml
>>> > +++ b/ovn/ovn-architecture.7.xml
>>> > @@ -1678,6 +1678,72 @@
>>> >  
>>> >
>>> >
>>> > +  Native OVN services for external logical ports
>>> > +
>>> > +  
>>> > +To support OVN native services (like DHCP/IPv6 RA/DNS lookup) to
the
>>> > +cloud resources which are external, OVN supports
>>> external
>>> > +logical ports.
>>> > +  
>>> > +
>>> > +  
>>> > +Below are some of the use cases where external ports
>>> can be
>>> > +used.
>>> > +  
>>> > +
>>> > +  
>>> > +
>>> > +  VMs connected to SR-IOV nics - Traffic from these VMs by
passes the
>>> > +  kernel stack and local ovn-controller do not bind
>>> these
>>> > +  ports and cannot serve the native services.
>>> > +
>>>
>>> Would the broadcast traffic (e.g. DHCP discover) sent out from SR-IOV
come
>>> back to the local host? Is it free to select either the local host or
any
>>> other hosts as the "requested-chassis"? Or does this suggest that we
have
>>> to use a different chassis other than the local host as the
>>> "requested-chassis"?
>>>
>
> As Miguel mentioned the traffic may be received by the compute host where
the SR-IOV VM
> is present if the host NIC is connected to the same switch.
>
> It is free to select any chassis as long as that chassis can receive the
broadcast packet.
>
>
>
>>>
>>> I will review in more detail later.
>
>
> Thanks
> Numan
>
>>>
>>>
>>> Thanks,
>>> Han
>>> ___
>>> dev mailing list
>>> d...@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] "soft freeze" for 2.11

2019-01-10 Thread Ben Pfaff
We're now in the "soft freeze" stage for OVS 2.11 development.
release-process.txt describes this stage as:

1. "Soft freeze" of the master branch.

   During the freeze, we ask committers to refrain from applying patches that
   add new features unless those patches were already being publicly discussed
   and reviewed before the freeze began.  Bug fixes are welcome at any time.
   Please propose and discuss exceptions on ovs-dev.

The next release stage is to branch.  By the schedule, we should do that
on Jan. 15.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/8] vconn: Allow timout configuration for blocking connection.

2019-01-10 Thread Ben Pfaff
On Thu, Jan 10, 2019 at 11:44:05AM +0300, Ilya Maximets wrote:
> On 09.01.2019 23:27, Ben Pfaff wrote:
> > On Wed, Jan 09, 2019 at 08:28:54PM +0300, Ilya Maximets wrote:
> >> On 27.12.2018 20:36, Ben Pfaff wrote:
> >>> On Wed, Dec 26, 2018 at 06:23:56PM +0300, Ilya Maximets wrote:
>  On some systems in case where remote is not responding, socket could
>  remain in SYN_SENT state for a really long time without errors waiting
>  for connection. This leads to situations where vconn connection hangs
>  for a few minutes waiting for connection to the DOWN remote.
> 
>  For example, this situation emulated by "refuse-connection" vconn
>  testcase. This leads to test failures because Alarm signal arrives much
>  faster than ETIMEDOUT from the socket:
> 
>    ./vconn.at:21: ovstest test-vconn refuse-connection tcp
>    Alarm clock
>    stderr:
>    |socket_util|INFO|0:127.0.0.1: listening on port 63812
>    |poll_loop|DBG|wakeup due to 0-ms timeout
>    |poll_loop|DBG|wakeup due to 10155-ms timeout
>    |fatal_signal|WARN|terminating with signal 14 (Alarm clock)
>    ./vconn.at:21: exit code was 142, expected 0
>    vconn.at:21: 535. tcp vconn - refuse connection (vconn.at:21): FAILED
> 
>  This patch allowes to specify timeout value for vconn blocking
>  connections. If the connection takes more time, socket will be closed
>  with ETIMEDOUT error code. Negative value could be used to wait
>  infinitely.
> 
>  Signed-off-by: Ilya Maximets 
> >>>
> >>> Same comments as patch 2.
> >>>
> >>> Are the timeouts only useful for the test cases?  I wonder whether just
> >>> calling alarm(10); at the beginning of the test programs would be just
> >>> as helpful.  On the other hand, it would make using a debugger on those
> >>> programs harder.
> >>
> >> I guess, we have alarms in all the test programs.
> >> The issue here is that some test apps like 'test_refuse_connection' treats
> >> connection failure as a success. But on some systems, wrong connections 
> >> hangs
> >> for a really long time and alarm kills the test application. In this case
> >> we can't say for sure if the test failed or not, i.e. if it was expected
> >> connection failure or other random issue that forced the application to 
> >> hang.
> >>
> >> stream connection tests even worse, because they are trying to sequentially
> >> establish connection to one of 3 different remotes while only one of them 
> >> is
> >> correct. And it will never try to connect to correct one if the blocking
> >> connection to wrong port will hang for a few minutes. It'll be simply 
> >> killed
> >> by alarm.
> > 
> > It should be possible to tell what caused the test program to exit by
> > testing the the exit status.  When a program exits due to a signal, a
> > Bourne-compatible shell sets $? to 128 plus the signal number.  Usually,
> > it's good enough just to know that the process died with an unusual exit
> > status, but you can get the particular signal name back with "kill -l
> > $?", e.g. on Linux "kill -l 142" prints "ALRM".  This behavior is
> > specified by POSIX so it should be portable.
> 
> Yes, we can detect that app was killed by alarm, but we can't say if it was
> expected hang while connecting to the wrong port or it was just too long
> execution due to random environment issue or a bug.
> 
> Let's look at "multiple remotes" test cases. Their workflow is following:
> 
>   1. alarm(10)
>   2. Initialize idl with multiple remotes.
>   2. RPC: Try to connect to WRONG_PORT_1. Fail expected.
>   3. RPC: Try to connect to right port.
>   4. Perform some ovsdb transactions.
>   5. Check result.
> 
> Step 2 always hangs in CirrusCI environment and app dies there by alarm.
> We can't treat this as success because we didn't check anything useful.

Oh, I see, we have some tests where the expected behavior is to hang,
but we want to make sure that it's for the right reason.  I didn't
properly understand that.  I'll take a look at the new patches.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] netdev-dpdk: Consider packets marked for TSO.

2019-01-10 Thread 0-day Robot
Bleep bloop.  Greetings Tiago Lam, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
fatal: sha1 information is lacking or useless (lib/netdev-dpdk.c).
Repository lacks necessary blobs to fall back on 3-way merge.
Cannot fall back to three-way merge.
Patch failed at 0001 netdev-dpdk: Consider packets marked for TSO.
The copy of the patch that failed is found in:
   
/var/lib/jenkins/jobs/upstream_build_from_pw/workspace/.git/rebase-apply/patch
When you have resolved this problem, run "git am --resolved".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Please check this out.  If you feel there has been an error, please email 
acon...@bytheb.org

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC v3 5/5] ovn: Generate ICMPv4 packet in router pipeline for larger packets

2019-01-10 Thread nusiddiq
From: Numan Siddique 

Signed-off-by: Numan Siddique 
---
 ovn/northd/ovn-northd.8.xml |  83 +-
 ovn/northd/ovn-northd.c |  89 ++-
 tests/ovn.at| 165 
 3 files changed, 331 insertions(+), 6 deletions(-)

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 392a5efc9..a3922e8e1 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -2002,7 +2002,86 @@ next;
   
 
 
-Ingress Table 9: Gateway Redirect
+Ingress Table 9: Check packet length
+
+
+  For distributed logical routers with distributed gateway port configured
+  with options:gateway_mtu to a valid integer value, this
+  table adds a priority-50 logical flow with the match
+  ip4 && outport == GW_PORT where
+  GW_PORT is the distributed gateway router port and applies the
+  action check_pkt_larger and advances the packet to the
+  next table.
+
+
+
+REGBIT_PKT_LARGER = check_pkt_larger(L); next;
+
+
+
+  where L is the packet length to check for. If the packet
+  is larger than L, it stores 1 in the register bit
+  REGBIT_PKT_LARGER. The value of
+  L is taken from  column of
+   row.
+
+
+
+  This table adds one priority-0 fallback flow that matches all packets
+  and advances to the next table.
+
+
+Ingress Table 10: Handle larger packets
+
+
+  For distributed logical routers with distributed gateway port configured
+  with options:gateway_mtu to a valid integer value, this
+  table adds the following priority-50 logical flow for each
+  logical router port with the match ip4 &&
+  inport == LRP && outport == GW_PORT
+  && REGBIT_PKT_LARGER, where LRP is the logical
+  router port and GW_PORT is the distributed gateway router port
+  and applies the following action
+
+
+
+icmp4 {
+icmp4.type = 3; /* Destination Unreachable. */
+icmp4.code = 4;  /* Frag Needed and DF was Set. */
+icmp4.frag_mtu = M;
+eth.dst = E;
+ip4.dst = ip4.src;
+ip4.src = I;
+ip.ttl = 255;
+REGBIT_EGRESS_LOOPBACK = 1;
+next(pipeline=ingress, table=0);
+};
+
+
+
+  
+Where M is the fragment MTU whose value is taken from
+ column of
+ row.
+  
+
+  
+E is the Ethernet address of the logical router port.
+  
+
+  
+I is the IPv4 address of the logical router port.
+  
+
+
+
+  This table adds one priority-0 fallback flow that matches all packets
+  and advances to the next table.
+
+
+Ingress Table 11: Gateway Redirect
 
 
   For distributed logical routers where one of the logical router
@@ -2059,7 +2138,7 @@ next;
   
 
 
-Ingress Table 10: ARP Request
+Ingress Table 12: ARP Request
 
 
   In the common case where the Ethernet destination has been resolved, this
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index ebf829278..9f0621c41 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -143,8 +143,10 @@ enum ovn_stage {
 PIPELINE_STAGE(ROUTER, IN,  ND_RA_RESPONSE, 6, "lr_in_nd_ra_response") \
 PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING, 7, "lr_in_ip_routing")   \
 PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE,8, "lr_in_arp_resolve")  \
-PIPELINE_STAGE(ROUTER, IN,  GW_REDIRECT,9, "lr_in_gw_redirect")  \
-PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST,10, "lr_in_arp_request")  \
+PIPELINE_STAGE(ROUTER, IN,  CHK_PKT_LEN   , 9, "lr_in_chk_pkt_len")   \
+PIPELINE_STAGE(ROUTER, IN,  LARGER_PKTS,10,"lr_in_larger_pkts")   \
+PIPELINE_STAGE(ROUTER, IN,  GW_REDIRECT,11, "lr_in_gw_redirect")  \
+PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST,12, "lr_in_arp_request")  \
   \
 /* Logical router egress stages. */   \
 PIPELINE_STAGE(ROUTER, OUT, UNDNAT,0, "lr_out_undnat")\
@@ -180,6 +182,8 @@ enum ovn_stage {
  * logical router dropping packets with source IP address equals
  * one of the logical router's own IP addresses. */
 #define REGBIT_EGRESS_LOOPBACK  "reg9[1]"
+/* Register to store the result of check_pkt_larger action. */
+#define REGBIT_PKT_LARGER"reg9[2]"
 
 /* Returns an "enum ovn_stage" built from the arguments. */
 static enum ovn_stage
@@ -6574,7 +6578,84 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
*ports,
   "get_nd(outport, xxreg0); next;");
 }
 
-/* Logical router ingress table 9: Gateway redirect.
+/* Local router ingress table 9: Check packet length.
+ *
+ * Any IPv4 packet with outport set to the distributed gateway
+ * router port, check the packet length and store the result in the
+ * 'REGBIT_PKT_LARGER' register bit.
+ *
+ * Local router ingress table 10: Hand

[ovs-dev] [RFC v3 4/5] ovn: Support OVS action 'check_pkt_larger' in OVN

2019-01-10 Thread nusiddiq
From: Numan Siddique 

Previous commit added a new OVS action 'check_pkt_larger'. This
patch supports that action in OVN. The syntax to use this would be

reg0[0] = check_pkt_larger(LEN)

Upcoming commit will make use of this action in ovn-northd and
will generate an ICMP4 packet if the packet length is greater than
the specified length.

Signed-off-by: Numan Siddique 
---
 include/ovn/actions.h | 10 +++-
 ovn/controller/pinctrl.c  | 15 +++
 ovn/lib/actions.c | 53 +++
 ovn/ovn-sb.xml| 21 
 ovn/utilities/ovn-trace.c |  4 ++-
 tests/ovn.at  | 10 
 6 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 095b60df0..6571e2aee 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -81,7 +81,8 @@ struct ovn_extend_table;
 OVNACT(PUT_ND_RA_OPTS,ovnact_put_opts)\
 OVNACT(ND_NS, ovnact_nest)\
 OVNACT(SET_METER, ovnact_set_meter)   \
-OVNACT(OVNFIELD_LOAD, ovnact_load)
+OVNACT(OVNFIELD_LOAD, ovnact_load)\
+OVNACT(CHECK_PKT_LARGER,  ovnact_check_pkt_larger)
 
 /* enum ovnact_type, with a member OVNACT_ for each action. */
 enum OVS_PACKED_ENUM ovnact_type {
@@ -311,6 +312,13 @@ struct ovnact_set_meter {
 uint64_t burst;  /* burst rate field, in kbps. */
 };
 
+/* OVNACT_CHECK_IP_PKT_LARGER. */
+struct ovnact_check_pkt_larger {
+struct ovnact ovnact;
+uint16_t pkt_len;
+struct expr_field dst;  /* 1-bit destination field. */
+};
+
 /* Internal use by the helpers below. */
 void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
 void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index 59bfe33cc..ba6d4a24e 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -452,6 +452,8 @@ pinctrl_handle_icmp(const struct flow *ip_flow, struct 
dp_packet *pkt_in,
 packet_set_icmp(&packet, ICMP4_DST_UNREACH, 1);
 uint8_t *n_ovnfield_acts = ofpbuf_try_pull(userdata,
sizeof *n_ovnfield_acts);
+bool include_orig_ip_datagram = false;
+
 if (n_ovnfield_acts && *n_ovnfield_acts) {
 for (uint8_t i = 0; i < *n_ovnfield_acts; i++) {
  struct ovnfield_act_header *oah =
@@ -465,12 +467,25 @@ pinctrl_handle_icmp(const struct flow *ip_flow, struct 
dp_packet *pkt_in,
  ovs_be16 *mtu = ofpbuf_try_pull(userdata, sizeof *mtu);
  if (mtu) {
  ih->icmp_fields.frag.mtu = *mtu;
+ include_orig_ip_datagram = true;
  }
  break;
  }
  }
 }
 }
+
+if (include_orig_ip_datagram) {
+size_t orig_ip_datagram_len = sizeof(struct ip_header) + 8;
+uint8_t *data = dp_packet_put_zeros(&packet, orig_ip_datagram_len);
+memcpy(data, dp_packet_l3(pkt_in), orig_ip_datagram_len);
+nh->ip_tot_len += htons(orig_ip_datagram_len);
+ih->icmp_csum = 0;
+ih->icmp_csum = csum(ih, sizeof *ih + orig_ip_datagram_len);
+nh->ip_csum = 0;
+nh->ip_csum = csum(nh, sizeof *nh);
+
+}
 } else {
 struct ip6_hdr *nh = dp_packet_put_zeros(&packet, sizeof *nh);
 struct icmp6_error_header *ih;
diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
index a2000667e..ea9120b79 100644
--- a/ovn/lib/actions.c
+++ b/ovn/lib/actions.c
@@ -2366,6 +2366,55 @@ encode_OVNFIELD_LOAD(const struct ovnact_load *load,
 }
 }
 
+static void
+parse_check_pkt_larger(struct action_context *ctx,
+   const struct expr_field *dst,
+   struct ovnact_check_pkt_larger *cipl)
+{
+int pkt_len;
+
+lexer_get(ctx->lexer); /* Skip check_pkt_len. */
+if (!lexer_force_match(ctx->lexer, LEX_T_LPAREN)
+|| !lexer_get_int(ctx->lexer, &pkt_len)
+|| !lexer_force_match(ctx->lexer, LEX_T_RPAREN)) {
+return;
+}
+
+/* Validate that the destination is a 1-bit, modifiable field. */
+char *error = expr_type_check(dst, 1, true);
+if (error) {
+lexer_error(ctx->lexer, "%s", error);
+free(error);
+return;
+}
+cipl->dst = *dst;
+cipl->pkt_len = pkt_len;
+}
+
+static void
+format_CHECK_PKT_LARGER(const struct ovnact_check_pkt_larger *cipl,
+struct ds *s)
+{
+expr_field_format(&cipl->dst, s);
+ds_put_format(s, " = check_pkt_larger(%d);", cipl->pkt_len);
+}
+
+static void
+encode_CHECK_PKT_LARGER(const struct ovnact_check_pkt_larger *cipl,
+const struct ovnact_encode_params *ep OVS_UNUSED,
+struct ofpbuf *ofpacts)
+{
+struct

[ovs-dev] [RFC v3 3/5] ovn: Add a new OVN field icmp4.frag_mtu

2019-01-10 Thread nusiddiq
From: Numan Siddique 

In order to support OVN specific fields (which are not yet
supported in OpenvSwitch to set or modify values) a generic
OVN field support is added in this patch. These OVN fields are
expected to be used as nested OVN actions inside OVN actions
like icmp4, icmp6 etc. This patch adds only one field for now
 - icmp4.frag_mtu. It should be fairly straightforward to
add similar fields in the near future.

This field is expected to be used as an inner field with in
the 'icmp4' action.

Eg.
"icmp4 {"eth.dst <-> eth.src; "
"icmp4.type = 3; /* Destination Unreachable */ "
"icmp4.code = 4; /* Fragmentation Needed */ "
 icmp4.frag_mtu = 1442;
 ...
 "next; };",

pinctrl module of ovn-controller will set the specified value
in the the low-order 16 bits of the ICMP4 header field that is
labelled "unused" in the ICMP specification as defined in the RFC 1191.

Upcoming patch will use it to send an icmp4 packet if the
source IPv4 packet destined to go via external gateway needs to
be fragmented.

Signed-off-by: Numan Siddique 
---
 include/ovn/actions.h |  25 +++-
 include/ovn/automake.mk   |   3 +-
 include/ovn/expr.h|   5 +
 {ovn/lib => include/ovn}/logical-fields.h |  41 +++
 ovn/controller/lflow.c|   1 +
 ovn/controller/lflow.h|   2 +-
 ovn/controller/pinctrl.c  |  23 +++-
 ovn/lib/actions.c | 135 +-
 ovn/lib/automake.mk   |   1 -
 ovn/lib/expr.c|  17 ++-
 ovn/lib/logical-fields.c  |  36 +-
 ovn/northd/ovn-northd.c   |   2 +-
 ovn/ovn-sb.xml|  11 ++
 ovn/utilities/ovn-trace.c |   5 +-
 tests/ovn.at  |  18 ++-
 tests/test-ovn.c  |   2 +-
 16 files changed, 282 insertions(+), 45 deletions(-)
 rename {ovn/lib => include/ovn}/logical-fields.h (71%)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 1c0c67ce6..095b60df0 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -80,7 +80,8 @@ struct ovn_extend_table;
 OVNACT(LOG,   ovnact_log) \
 OVNACT(PUT_ND_RA_OPTS,ovnact_put_opts)\
 OVNACT(ND_NS, ovnact_nest)\
-OVNACT(SET_METER, ovnact_set_meter)
+OVNACT(SET_METER, ovnact_set_meter)   \
+OVNACT(OVNFIELD_LOAD, ovnact_load)
 
 /* enum ovnact_type, with a member OVNACT_ for each action. */
 enum OVS_PACKED_ENUM ovnact_type {
@@ -142,6 +143,20 @@ ovnact_end(const struct ovnact *ovnacts, size_t 
ovnacts_len)
 #define OVNACT_FOR_EACH(POS, OVNACTS, OVNACTS_LEN)  \
 for ((POS) = (OVNACTS); (POS) < ovnact_end(OVNACTS, OVNACTS_LEN);  \
  (POS) = ovnact_next(POS))
+
+static inline int
+ovnacts_count(const struct ovnact *ovnacts, size_t ovnacts_len)
+{
+uint8_t n_ovnacts = 0;
+if (ovnacts) {
+const struct ovnact *a;
+
+OVNACT_FOR_EACH (a, ovnacts, ovnacts_len) {
+n_ovnacts++;
+}
+}
+return n_ovnacts;
+}
 
 /* Action structure for each OVNACT_*. */
 
@@ -229,6 +244,8 @@ struct ovnact_nest {
 struct ovnact ovnact;
 struct ovnact *nested;
 size_t nested_len;
+struct ovnact *nested_ovnfields;
+size_t nested_ovnfields_len;
 };
 
 /* OVNACT_GET_ARP, OVNACT_GET_ND. */
@@ -461,6 +478,12 @@ struct action_header {
 };
 BUILD_ASSERT_DECL(sizeof(struct action_header) == 8);
 
+OVS_PACKED(
+struct ovnfield_act_header {
+ovs_be16 id; /* one of enum ovnfield_id. */
+ovs_be16 len; /* Length of the ovnfield data. */
+});
+
 struct ovnact_parse_params {
 /* A table of "struct expr_symbol"s to support (as one would provide to
  * expr_parse()). */
diff --git a/include/ovn/automake.mk b/include/ovn/automake.mk
index d2924c2f6..54b0e2c0e 100644
--- a/include/ovn/automake.mk
+++ b/include/ovn/automake.mk
@@ -2,4 +2,5 @@ ovnincludedir = $(includedir)/ovn
 ovninclude_HEADERS = \
include/ovn/actions.h \
include/ovn/expr.h \
-   include/ovn/lex.h
+   include/ovn/lex.h  \
+   include/ovn/logical-fields.h
diff --git a/include/ovn/expr.h b/include/ovn/expr.h
index 3995e62f0..a68fd6e1c 100644
--- a/include/ovn/expr.h
+++ b/include/ovn/expr.h
@@ -58,6 +58,7 @@
 #include "openvswitch/list.h"
 #include "openvswitch/match.h"
 #include "openvswitch/meta-flow.h"
+#include "logical-fields.h"
 
 struct ds;
 struct expr;
@@ -244,6 +245,7 @@ struct expr_symbol {
 int width;
 
 const struct mf_field *field; /* Fields only, otherwise NULL. */
+const struct ovn_field *ovn_field;  /* OVN Fields only, otherwise NULL. */
 const struct expr_symbol *parent; /* Subfields only, otherwise NULL. */
 int parent_ofs;   /* Subfields only, otherwise 0. */
 char *predicate

[ovs-dev] [RFC v3 2/5] Add a new OVS action check_pkt_larger

2019-01-10 Thread nusiddiq
From: Numan Siddique 

This patch adds a new action 'check_pkt_larger' which checks if the
packet is larger than the given size and stores the result in the
destination register.

Usage: check_pkt_larger:len->REGISTER
Eg. match=...,actions=check_pkt_larger:1442->NXM_NX_REG0[0],next;

This patch makes use of the new datapath action - 'check_pkt_len'
which is still WIP. At the start of ovs-vswitchd, datapath is probed
for this action. If the datapath action is present, then 'check_pkt_larger'
makes use of this datapath action.

Datapath action 'check_pkt_len' takes these nlattrs
  * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for
  * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER (optional) - Nested actions
to apply if the packet length is greater than the specified 'pkt_len'
  * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL (optional) - Nested
actions to apply if the packet length is lesser or equal to the
specified 'pkt_len'.

Let's say we have these flows added to an OVS bridge br-int

table=0, priority=100 
in_port=1,ip,actions=check_pkt_larger:100->NXM_NX_REG0[0],resubmit(,1)
table=1, priority=200,in_port=1,ip,reg0=0x1/0x1 actions=output:3
table=1, priority=100,in_port=1,ip,actions=output:4

Suppose if a packet is received from in_port=1, the packet is sent
to userspace (MISS_UPCALL). The action 'check_pkt_larger' will be translated as
  - check_pkt_len( > 100 ? (3) : (4))

datapath will check the packet length and if the packet length is greater than 
100,
it will be output to port 3, else it will be output to port 4.

In case, datapath doesn't support 'check_pkt_len' action, the OVS action
'check_pkt_larger' sets SLOW_ACTION so that datapath flow is not added.

This OVS action is intended to be used by OVN to check the packet length
and generate an ICMP packet with type 3, code 4 and next hop mtu
in the logical router pipeline if the MTU of the physical interface
is lesser than the packet length. More information can be found here [1]

TODO:
 - Add test cases.
 - Add documentation

Request to suggest a better name for the action in case 'check_pkt_larger'
seems odd.

[1] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html

Reported-at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html
Suggested-by: Ben Pfaff 
Signed-off-by: Numan Siddique 
CC: Ben Pfaff 
---
 .../linux/compat/include/linux/openvswitch.h  |  30 +++--
 include/openvswitch/ofp-actions.h |  18 +++
 lib/dpif-netdev.c |   1 +
 lib/dpif.c|   1 +
 lib/odp-execute.c |  72 
 lib/odp-util.c|  36 ++
 lib/ofp-actions.c |  98 +++-
 lib/ofp-parse.c   |  10 ++
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  | 109 ++
 ofproto/ofproto-dpif.c|  47 
 ofproto/ofproto-dpif.h|   5 +-
 13 files changed, 418 insertions(+), 11 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 9b087f1b0..26dd69dcd 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -855,6 +855,24 @@ enum ovs_nat_attr {
 
 #define OVS_NAT_ATTR_MAX (__OVS_NAT_ATTR_MAX - 1)
 
+/*
+ * enum ovs_check_pkt_len_attr - Attributes for %OVS_ACTION_ATTR_CHECK_PKT_LEN.
+ *
+ * @OVS_CHECK_PKT_LEN_ATTR_PKT_LEN: u16 Packet length to check for.
+ * @OVS_CHECK_PKT_LEN_ATTR_USERSPACE_COND: u8 comparison condition to send
+ * the packet to userspace. One of OVS_CHECK_PKT_LEN_COND_*.
+ * @OVS_CHECK_PKT_LEN_ATTR_USERPACE - Nested OVS_USERSPACE_ATTR_* actions.
+ */
+enum ovs_check_pkt_len_attr {
+OVS_CHECK_PKT_LEN_ATTR_UNSPEC,
+OVS_CHECK_PKT_LEN_ATTR_PKT_LEN,
+OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER,
+OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL,
+__OVS_CHECK_PKT_LEN_ATTR_MAX,
+};
+
+#define OVS_CHECK_PKT_LEN_ATTR_MAX (__OVS_CHECK_PKT_LEN_ATTR_MAX - 1)
+
 /**
  * enum ovs_action_attr - Action types.
  *
@@ -909,8 +927,6 @@ enum ovs_nat_attr {
  * tunnel header.
  * @OVS_ACTION_ATTR_METER: Run packet through a meter, which may drop the
  * packet, or modify the packet (e.g., change the DSCP field).
- * @OVS_ACTION_ATTR_CLONE: make a copy of the packet and execute a list of
- * actions without affecting the original packet and key.
  */
 
 enum ovs_action_attr {
@@ -936,12 +952,13 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_CT_CLEAR, /* No argument. */
OVS_ACTION_ATTR_PUSH_NSH, /* Nested OVS_NSH_KEY_ATTR_*. */
OVS_ACTION_ATTR_POP_NSH,  /* No argument. */
-   OVS_ACTION_ATTR_METER,/* u32 meter number. */
-   OVS_ACTION_A

[ovs-dev] [RFC v3 1/5] datapath: Add a new action check_pkt_len

2019-01-10 Thread nusiddiq
From: Numan Siddique 

[Please note, this patch is submitted as RFC in ovs-dev ML to
get feedback before submitting to netdev ML]

This patch adds a new action which checks the packet length and
executes a set of actions if the packet length is greater than
the specified length or executes another set of actions if the
packet length is lesser or equal to.

This action takes below nlattrs
  * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for

  * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER (optional) - Nested actions
to apply if the packet length is greater than the specified 'pkt_len'

  * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL (optional) - Nested
actions to apply if the packet length is lesser or equal to the
specified 'pkt_len'.

The main use case for adding this action is to solve the packet
drops because of MTU mismatch in OVN virtual networking solution.
When a VM (which belongs to a logical switch of OVN) sends a packet
destined to go via the gateway router and if the nic which provides
external connectivity, has a lesser MTU, OVS drops the packet
if the packet length is greater than this MTU.

With the help of this action, OVN will check the packet length
and if it is greater than the MTU size, it will generate an
ICMP packet (type 3, code 4) and includes the next hop mtu in it
so that the sender can fragment the packets.

Reported-at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html
Suggested-by: Ben Pfaff 
Signed-off-by: Numan Siddique 
CC: Ben Pfaff 
CC: Justin Pettit 
CC: Pravin B Shelar 
---
 include/uapi/linux/openvswitch.h |  25 -
 net/openvswitch/actions.c|  55 +-
 net/openvswitch/flow_netlink.c   | 175 +++
 3 files changed, 253 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index dbe0cbe4f1b7..c395baffdd42 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -798,6 +798,27 @@ struct ovs_action_push_eth {
struct ovs_key_ethernet addresses;
 };
 
+/*
+ * enum ovs_check_pkt_len_attr - Attributes for %OVS_ACTION_ATTR_CHECK_PKT_LEN.
+ *
+ * @OVS_CHECK_PKT_LEN_ATTR_PKT_LEN: u16 Packet length to check for.
+ * @OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER: Nested OVS_ACTION_ATTR_*
+ * actions to apply if the packer length is greater than the specified
+ * length in the attr - OVS_CHECK_PKT_LEN_ATTR_PKT_LEN.
+ * @OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL - Nested OVS_ACTION_ATTR_*
+ * actions to apply if the packer length is lesser or equal to the specified
+ * length in the attr - OVS_CHECK_PKT_LEN_ATTR_PKT_LEN.
+ */
+enum ovs_check_pkt_len_attr {
+   OVS_CHECK_PKT_LEN_ATTR_UNSPEC,
+   OVS_CHECK_PKT_LEN_ATTR_PKT_LEN,
+   OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER,
+   OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL,
+   __OVS_CHECK_PKT_LEN_ATTR_MAX,
+};
+
+#define OVS_CHECK_PKT_LEN_ATTR_MAX (__OVS_CHECK_PKT_LEN_ATTR_MAX - 1)
+
 /**
  * enum ovs_action_attr - Action types.
  *
@@ -842,7 +863,8 @@ struct ovs_action_push_eth {
  * packet, or modify the packet (e.g., change the DSCP field).
  * @OVS_ACTION_ATTR_CLONE: make a copy of the packet and execute a list of
  * actions without affecting the original packet and key.
- *
+ * @OVS_ACTION_ATTR_CHECK_PKT_LEN: Check the length of the packet and
+ * execute a set of actions as specified in OVS_CHECK_PKT_LEN_ATTR_*.
  * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
  * type may not be changed.
@@ -875,6 +897,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_PUSH_NSH, /* Nested OVS_NSH_KEY_ATTR_*. */
OVS_ACTION_ATTR_POP_NSH,  /* No argument. */
OVS_ACTION_ATTR_METER,/* u32 meter ID. */
+   OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */
OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*.  */
 
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index e47ebbbe71b8..9551c07eae92 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -169,6 +169,10 @@ static int clone_execute(struct datapath *dp, struct 
sk_buff *skb,
 const struct nlattr *actions, int len,
 bool last, bool clone_flow_key);
 
+static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
+ struct sw_flow_key *key,
+ const struct nlattr *attr, int len);
+
 static void update_ethertype(struct sk_buff *skb, struct ethhdr *hdr,
 __be16 ethertype)
 {
@@ -1213,6 +1217,46 @@ static int execute_recirc(struct datapath *dp, struct 
sk_buff *skb,
return clone_execute(dp, skb, key, recirc_id, NULL, 0, last, true);
 }
 
+static int execute_check_pkt_le

[ovs-dev] [RFC v3 0/5] Address MTU issue for larger packets in OVN

2019-01-10 Thread nusiddiq
From: Numan Siddique 

This is an RFC series to address the MTU issues for OVN reported
here [1].

To address this issue, a new OVS action - check_pkt_larger is added.
A new datapath action is also added - check_pkt_len.

The datapath patch is submitted here to get feedback before submitting
to the kernel net-dev ML.

v2 --> v3
-
In v2, the check_pkt_len implementation in odp-execute.c (in p2) was missing,
which is added in v3.



v1 of RFC included patches for datapath and OVS. This v2 series also
includes the corresponding OVN changes.

In v1, check_pkt_len datapath action was implemented as 
  - check_pkt_len(pkt_len, condition, userspace action if condition matches)
   Eg.
check_pkt_len( > 1500 ? userspace(pid=,slow_path(check_pkt_len))
If the packet length is greater than 1500 bytes, send it to userspace.

In v2 a different approach is taken
  - check_pkt_len(> pkt_len ? (set of action to apply) : (another set of
actions to apply))
Eg. check_pkt_len(> 1500 ? (2): (3))
If the packet length is greater than 1500, output to port 2, else
output to port 3.

ovs-vswitchd is modified accordingly.

The ovs action - check_pkt_larger is unchanged. It looks like
  - check_pkt_larger(pkt_len)->REGISTER_BIT.
Eg.  check_pkt_larger(1500)->NXM_NX_REG0[0]
If the packet is larger than 1500 bytes, REG0[0] will be set to 1.

More details in the patches.

If the approach seems reasonable, I will submit the datapath patch first
to net-dev ML first.

[1] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html

Datapath patch
-
Numan Siddique (1):
 datapath: Add a new action check_pkt_len

 include/uapi/linux/openvswitch.h |  25 -
 net/openvswitch/actions.c|  55 +-
 net/openvswitch/flow_netlink.c   | 175 +++
 3 files changed, 253 insertions(+), 2 deletions(-)


OVS patches
---
Numan Siddique (4):
  Add a new OVS action check_pkt_larger
  ovn: Add a new OVN field icmp4.frag_mtu
  ovn: Support OVS action 'check_pkt_larger' in OVN
  ovn: Generate ICMPv4 packet in router pipeline for larger packets

 .../linux/compat/include/linux/openvswitch.h  |  30 ++-
 include/openvswitch/ofp-actions.h |  18 ++
 include/ovn/actions.h |  33 ++-
 include/ovn/automake.mk   |   3 +-
 include/ovn/expr.h|   5 +
 {ovn/lib => include/ovn}/logical-fields.h |  41 
 lib/dpif-netdev.c |   1 +
 lib/dpif.c|   1 +
 lib/odp-execute.c |   4 +
 lib/odp-util.c|  36 
 lib/ofp-actions.c |  98 -
 lib/ofp-parse.c   |  10 +
 ofproto/ofproto-dpif-ipfix.c  |   1 +
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-xlate.c  | 108 ++
 ofproto/ofproto-dpif.c|  47 +
 ofproto/ofproto-dpif.h|   5 +-
 ovn/controller/lflow.c|   1 +
 ovn/controller/lflow.h|   2 +-
 ovn/controller/pinctrl.c  |  38 +++-
 ovn/lib/actions.c | 188 ++---
 ovn/lib/automake.mk   |   1 -
 ovn/lib/expr.c|  17 +-
 ovn/lib/logical-fields.c  |  36 +++-
 ovn/northd/ovn-northd.8.xml   |  83 +++-
 ovn/northd/ovn-northd.c   |  91 -
 ovn/ovn-sb.xml|  32 +++
 ovn/utilities/ovn-trace.c |   9 +-
 tests/ovn.at  | 193 +-
 tests/test-ovn.c  |   2 +-
 30 files changed, 1072 insertions(+), 63 deletions(-)
 rename {ovn/lib => include/ovn}/logical-fields.h (71%)

-- 
2.20.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 3/3] netdev-dpdk: Enable TSO when using multi-seg mbufs

2019-01-10 Thread Tiago Lam
TCP Segmentation Offload (TSO) is a feature which enables the TCP/IP
network stack to delegate segmentation of a TCP segment to the hardware
NIC, thus saving compute resources. This may improve performance
significantly for TCP workload in virtualized environments.

While a previous commit already added the necesary logic to netdev-dpdk
to deal with packets marked for TSO, this set of changes enables TSO by
default when using multi-segment mbufs.

Thus, to enable TSO on the physical DPDK interfaces, only the following
command needs to be issued before starting OvS:
ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Co-authored-by: Mark Kavanagh 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
---
 Documentation/automake.mk   |   1 +
 Documentation/topics/dpdk/index.rst |   1 +
 Documentation/topics/dpdk/tso.rst   | 111 
 NEWS|   1 +
 lib/netdev-dpdk.c   |  63 +---
 5 files changed, 168 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/topics/dpdk/tso.rst

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 082438e..a20deb8 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -39,6 +39,7 @@ DOC_SOURCE = \
Documentation/topics/dpdk/index.rst \
Documentation/topics/dpdk/bridge.rst \
Documentation/topics/dpdk/jumbo-frames.rst \
+   Documentation/topics/dpdk/tso.rst \
Documentation/topics/dpdk/memory.rst \
Documentation/topics/dpdk/pdump.rst \
Documentation/topics/dpdk/phy.rst \
diff --git a/Documentation/topics/dpdk/index.rst 
b/Documentation/topics/dpdk/index.rst
index cf24a7b..eb2a04d 100644
--- a/Documentation/topics/dpdk/index.rst
+++ b/Documentation/topics/dpdk/index.rst
@@ -40,4 +40,5 @@ The DPDK Datapath
/topics/dpdk/qos
/topics/dpdk/pdump
/topics/dpdk/jumbo-frames
+   /topics/dpdk/tso
/topics/dpdk/memory
diff --git a/Documentation/topics/dpdk/tso.rst 
b/Documentation/topics/dpdk/tso.rst
new file mode 100644
index 000..503354f
--- /dev/null
+++ b/Documentation/topics/dpdk/tso.rst
@@ -0,0 +1,111 @@
+..
+  Copyright 2018, Red Hat, Inc.
+
+  Licensed under the Apache License, Version 2.0 (the "License"); you may
+  not use this file except in compliance with the License. You may obtain
+  a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+  License for the specific language governing permissions and limitations
+  under the License.
+
+  Convention for heading levels in Open vSwitch documentation:
+
+  ===  Heading 0 (reserved for the title in a document)
+  ---  Heading 1
+  ~~~  Heading 2
+  +++  Heading 3
+  '''  Heading 4
+
+  Avoid deeper levels because they do not render well.
+
+===
+TSO
+===
+
+.. versionadded:: 2.11.0
+
+TCP Segmentation Offload (TSO) is a mechanism which allows a TCP/IP stack to
+offload the TCP segmentation into hardware, thus saving the cycles that would
+be required to perform this same segmentation in Software.
+
+TCP Segmentation Offload (TSO) enables a network stack to delegate segmentation
+of an oversized TCP segment to the underlying physical NIC. Offload of frame
+segmentation achieves computational savings in the core, freeing up CPU cycles
+for more useful work.
+
+A common use case for TSO is when using virtualization, where traffic that's
+coming in from a VM can offload the TCP segmentation, thus avoiding the
+fragmentation in Software. Additionally, if the traffic is headed to a VM
+within the same host further optimization can be expected. As the traffic never
+leaves the machine, no MTU needs to be accounted for, and thus no segmentation
+and checksum calculations are required, which saves yet more cycles. Only when
+the traffic actually leaves the host the segmentation needs to happen, in which
+case it will be performed by the egress NIC.
+
+When using TSO with DPDK, the implementation relies on the multi-segment mbufs
+feature, described in :doc:`/topics/dpdk/jumbo-frames`, where each mbuf
+contains ~2KiB of the entire packet's data and is linked to the next mbuf that
+contains the next portion of data.
+
+Enabling TSO
+
+Once multi-segment mbufs is enabled, TSO will be enabled by default, if there's
+support for it in the underlying physical NICs attached to OvS-DPDK.
+
+When using :doc:`vHost User ports `, TSO may be enabled in one of
+two ways, as follows.
+
+`TSO` is enabled in OvS by the DPDK vHost User backend; when a new guest
+connection is established, `TSO` is thus advertised to the guest as an
+available feature:
+
+  1. QEMU Command Line P

[ovs-dev] [PATCH v2 2/3] netdev-dpdk: Consider packets marked for TSO.

2019-01-10 Thread Tiago Lam
Previously, TSO was being explicity disabled on vhost interfaces,
meaning the guests wouldn't have TSO support negotiated in. With TSO
negotiated and enabled, packets are now marked for TSO, through the
PKT_TX_TCP_SEG flag.

In order to deal with this type of packets, a new function,
netdev_dpdk_prep_tso_packet(), has been introduced, with the main
purpose of setting correctly the l2, l3 and l4 length members of the
mbuf struct, and the appropriate ol_flags. This function supports TSO
both in IPv4 and IPv6.

netdev_dpdk_prep_tso_packet() is then only called when packets are
marked with the PKT_TX_TCP_SEG flag, meaning they have been marked for
TSO, and when the packet will be traversing the NIC.

Additionally, if a packet is marked for TSO but the egress netdev
doesn't support it, the packet is dropped.

Co-authored-by: Mark Kavanagh 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
---
 lib/dp-packet.h|  14 +++
 lib/netdev-bsd.c   |  11 -
 lib/netdev-dpdk.c  | 121 ++---
 lib/netdev-dummy.c |  11 -
 lib/netdev-linux.c |  15 +++
 5 files changed, 146 insertions(+), 26 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 970aaf2..c384416 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -104,6 +104,8 @@ static inline void dp_packet_set_size(struct dp_packet *, 
uint32_t);
 static inline uint16_t dp_packet_get_allocated(const struct dp_packet *);
 static inline void dp_packet_set_allocated(struct dp_packet *, uint16_t);
 
+static inline bool dp_packet_is_tso(struct dp_packet *b);
+
 void *dp_packet_resize_l2(struct dp_packet *, int increment);
 void *dp_packet_resize_l2_5(struct dp_packet *, int increment);
 static inline void *dp_packet_eth(const struct dp_packet *);
@@ -761,6 +763,12 @@ dp_packet_set_allocated(struct dp_packet *b, uint16_t s)
 b->mbuf.buf_len = s;
 }
 
+static inline bool
+dp_packet_is_tso(struct dp_packet *b)
+{
+return (b->mbuf.ol_flags & PKT_TX_TCP_SEG) ? true : false;
+}
+
 static inline void
 dp_packet_copy_mbuf_flags(struct dp_packet *dst, const struct dp_packet *src)
 {
@@ -972,6 +980,12 @@ dp_packet_get_allocated(const struct dp_packet *b)
 return b->allocated_;
 }
 
+static inline bool
+dp_packet_is_tso(struct dp_packet *b)
+{
+return false;
+}
+
 static inline void
 dp_packet_set_allocated(struct dp_packet *b, uint16_t s)
 {
diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c
index 278c8a9..5e8c5cc 100644
--- a/lib/netdev-bsd.c
+++ b/lib/netdev-bsd.c
@@ -700,13 +700,22 @@ netdev_bsd_send(struct netdev *netdev_, int qid 
OVS_UNUSED,
 }
 
 DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+size_t size = dp_packet_size(packet);
+
+/* TSO not supported in BSD netdev */
+if (dp_packet_is_tso(packet)) {
+VLOG_WARN_RL(&rl, "%s: No TSO enabled on port, TSO packet dropped "
+ "%" PRIu32 " ", name, size);
+
+continue;
+}
+
 /* We need the whole data to send the packet on the device */
 if (!dp_packet_is_linear(packet)) {
 dp_packet_linearize(packet);
 }
 
 const void *data = dp_packet_data(packet);
-size_t size = dp_packet_size(packet);
 
 while (!error) {
 ssize_t retval;
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 77d04fc..ad7223a 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1375,14 +1375,16 @@ netdev_dpdk_vhost_construct(struct netdev *netdev)
 goto out;
 }
 
-err = rte_vhost_driver_disable_features(dev->vhost_id,
-1ULL << VIRTIO_NET_F_HOST_TSO4
-| 1ULL << VIRTIO_NET_F_HOST_TSO6
-| 1ULL << VIRTIO_NET_F_CSUM);
-if (err) {
-VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user "
- "port: %s\n", name);
-goto out;
+if (!dpdk_multi_segment_mbufs) {
+err = rte_vhost_driver_disable_features(dev->vhost_id,
+1ULL << VIRTIO_NET_F_HOST_TSO4
+| 1ULL << VIRTIO_NET_F_HOST_TSO6
+| 1ULL << VIRTIO_NET_F_CSUM);
+if (err) {
+VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user "
+ "client port: %s\n", dev->up.name);
+goto out;
+}
 }
 
 err = rte_vhost_driver_start(dev->vhost_id);
@@ -2027,6 +2029,44 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
 rte_free(rx);
 }
 
+/* Should only be called if PKT_TX_TCP_SEG is set in ol_flags.
+ * Furthermore, it also sets the PKT_TX_TCP_CKSUM and PKT_TX_IP_CKSUM flags,
+ * and PKT_TX_IPV4 and PKT_TX_IPV6 in case the packet is IPv4 or IPv6,
+ * respectiveoly. */
+static void
+netdev_dpdk_prep_tso_packet(struct rte_mbuf *mbuf, int mtu)
+{
+struct dp_packet *pkt;
+struct tcp_header *th;
+
+pkt = CONTAINER_OF(mbuf, s

[ovs-dev] [PATCH v2 1/3] netdev-dpdk: Validate packets burst before Tx.

2019-01-10 Thread Tiago Lam
Given that multi-segment mbufs might be sent between interfaces that
support different capabilities, and may even support different layouts
of mbufs, outgoing packets should be validated before sent on the egress
interface. Thus, netdev_dpdk_eth_tx_burst() now calls DPDK's
rte_eth_tx_prepare() function, if and only multi-segments is enbaled, in
order to validate the following (taken from the DPDK documentation), on
a device specific manner:
- Check if packet meets devices requirements for tx offloads.
- Check limitations about number of segments.
- Check additional requirements when debug is enabled.
- Update and/or reset required checksums when tx offload is set for
packet.

Signed-off-by: Tiago Lam 
---
 lib/netdev-dpdk.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index d6114ee..77d04fc 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2029,6 +2029,10 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
 
 /* Tries to transmit 'pkts' to txq 'qid' of device 'dev'.  Takes ownership of
  * 'pkts', even in case of failure.
+ * In case multi-segment mbufs / TSO is being used, it also prepares. In such
+ * cases, only the prepared packets will be sent to Tx burst, meaning that if
+ * an invalid packet appears in 'pkts'[3] only the validated packets in indices
+ * 0, 1 and 2 will be sent.
  *
  * Returns the number of packets that weren't transmitted. */
 static inline int
@@ -2036,11 +2040,24 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int 
qid,
  struct rte_mbuf **pkts, int cnt)
 {
 uint32_t nb_tx = 0;
+uint16_t nb_prep = cnt;
 
-while (nb_tx != cnt) {
+if (dpdk_multi_segment_mbufs) {
+/* Validate the burst of packets for Tx. */
+nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
+if (nb_prep != cnt) {
+VLOG_WARN_RL(&rl, "%s: Preparing packet tx burst failed (%u/%u "
+ "packets valid): %s", dev->up.name, nb_prep, cnt,
+ rte_strerror(rte_errno));
+}
+}
+
+/* Tx the validated burst of packets only. */
+while (nb_tx != nb_prep) {
 uint32_t ret;
 
-ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx, cnt - nb_tx);
+ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx,
+   nb_prep - nb_tx);
 if (!ret) {
 break;
 }
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 0/3] dpdk: Add support for TSO

2019-01-10 Thread Tiago Lam
Enabling TSO offload allows a host stack to delegate the segmentation of
oversized TCP packets to the underlying physical NIC, if supported. In the case
of a VM this means that the segmentation of the packets is not performed by the
guest kernel, but by the host NIC itself. In turn, since the TSO calculations
and checksums are being performed in hardware, this alleviates the CPU load on
the host system. In inter VM communication this might account to significant
savings, and higher throughput, even more so if the VMs are running on the same
host.

Thus, although inter VM communication is already possible as is, there's a
sacrifice in terms of CPU, which may affect the overall throughput.

This series adds support for TSO in OvS-DPDK, by making use of the TSO
offloading feature already supported by DPDK vhost backend, having the
following scenarios in mind:
- Inter VM communication on the same host;
- Inter VM communication on different hosts;
- The same two use cases above, but on a VLAN network.

The work is based on [1]; It has been rebased to run on top of the
multi-segment mbufs work (v13) [2] and re-worked to use DPDK v18.11.

[1] https://patchwork.ozlabs.org/patch/749564/
[2] https://mail.openvswitch.org/pipermail/ovs-dev/2019-January/354950.html

Considerations:
- As mentioned above, this series depends on the multi-segment mbuf series
  (v13) and can't be applied on master as is;
- The `rte_eth_tx_prepare()` API in DPDK is marked experimental, and although
  I'm not getting any errors / warnings while compiling, do shout if get into
  trouble while testing;
- I'm due to send v3 in the next few days, but sending v2 now to enable early
  testing;

Tiago Lam (3):
  netdev-dpdk: Validate packets burst before Tx.
  netdev-dpdk: Consider packets marked for TSO.
  netdev-dpdk: Enable TSO when using multi-seg mbufs

 Documentation/automake.mk   |   1 +
 Documentation/topics/dpdk/index.rst |   1 +
 Documentation/topics/dpdk/tso.rst   | 111 
 NEWS|   1 +
 lib/dp-packet.h |  14 +++
 lib/netdev-bsd.c|  11 +-
 lib/netdev-dpdk.c   | 203 ++--
 lib/netdev-dummy.c  |  11 +-
 lib/netdev-linux.c  |  15 +++
 9 files changed, 332 insertions(+), 36 deletions(-)
 create mode 100644 Documentation/topics/dpdk/tso.rst

-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS-DPDK public meeting

2019-01-10 Thread Kevin Traynor
Next meeting Jan 23 1700 UTC

Minutes for 9th January.
Attendees: David, Asaf, Flavio, Ian, Tiago, Ophir, Johann, Aaron,
Frikkie, Simon, Pieter, Kevin.

===
GENERAL
===
- OVS 2.11
-- Release Dates
http://docs.openvswitch.org/en/latest/internals/release-process/?highlight=release#release-scheduling

- Meeting Frequency
-- Will move meeting to bi-weekly for next few until 2.11 is released

- Backports
-- Some discussion about what gets backported into older branches
-- There was a request (Asaf) for backport of TSO into older releases
-- Counter argument is that it is a new experimental feature so could
de-stabilze the code



PERFORMANCE/FEATURES

- Port Representors
-- Looks almost ready for merge
-- Giving it another day for comments

- MSEG
-- Has some acks
- TSO
-- Incremental on top of MSEG
-- Final few edits being made by Tiago
-- These 2 would have experimental tag initially

- PMD Auto Balance
-- v4 coming from Nitin
-- Kevin has been testing


On 07/25/2017 02:25 PM, Kevin Traynor wrote:
> Hi All,
> 
> The OVS-DPDK public meetings have moved to Wednesday's at the same time.
> Everyone is welcome, it's a chance to share status/plans etc.
> 
> It's scheduled for every 2 weeks starting 26th July. Meeting time is
> currently @ 4pm UTC.
> 
> You can connect through Bluejeans or through dial in:
> 
> https://bluejeans.com/139318596
> 
> US: +1.408.740.7256 
> UK: +44.203.608.5256 
> Germany: +49.32.221.091256 
> Ireland: +353.1.697.1256 
> Other numbers at https://www.bluejeans.com/numbers
> Meeting ID: 139318596
> 
> thanks,
> Kevin.
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Private Lender Bentex Funding Group Ltd.

2019-01-10 Thread Mrs Rose Larsson. via dev



Private Lender Bentex Funding Group Ltd.
Greetings to you by (BFGL).
We are a France-Paris based investment company known as Bentex Funding Group 
Ltd working on expanding its portfolio globally and financing projects.

We would be happy to fund and invest with you in any profitable project if you 
have any viable project we can finance by making mutual investment with you. If 
you are interested, kindly contact us on:avitinvestmentauthori...@gmail.com for 
more details.
Looking forward hearing from you soonest.
Yours truly,
Mrs Rose Larsson.
(Personal Assistant)
Bentex Funding Group Ltd(BFGL)
509 Rue Jacques Coeur,75008 Paris-France
Paris-France.Bentex Funding Group Ltd (BFGL)
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4] netdev-dpdk: support port representors

2019-01-10 Thread Thomas Monjalon
10/01/2019 15:23, Ilya Maximets:
> On 10.01.2019 16:42, Thomas Monjalon wrote:
> > Hi,
> > 
> > 10/01/2019 12:32, Ilya Maximets:
> >> On 08.01.2019 12:31, Ophir Munk wrote:
> >>>  if (dev->attached) {
> >>> +/* Remove the port eth device. */
> >>>  rte_eth_dev_close(dev->port_id);
> >>> -rte_eth_dev_info_get(dev->port_id, &dev_info);
> >>> -if (dev_info.device && !rte_dev_remove(dev_info.device)) {
> >>> -VLOG_INFO("Device '%s' has been detached", dev->devargs);
> >>
> >> Please keep this log message for the successful case.
> >>
> >>> -} else {
> >>> -VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> >>> +VLOG_INFO("Device '%s' has been removed", dev->devargs);
> >>> +/* If this is the last port_id using this rte device
> >>> + * remove this rte device and all its eth devices.
> >>> + * FIXME: avoid direct access to DPDK internal array 
> >>> rte_eth_devices.
> >>> + */
> >>> +rte_dev = rte_eth_devices[dev->port_id].device;
> >>> +if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
> >>
> >> rte_eth_dev_close() sets the RTE_ETH_DEV_UNUSED state for PMDs with
> >> RTE_ETH_DEV_CLOSE_REMOVE flag.
> >> The function netdev_dpdk_get_num_ports() skips unused ports.
> >> The result should not be equal to 1. We need the other way to detect
> >> the siblings or close the port after calculating them.
> >> In this case we'll probably do not need to have check for UNUSED state
> >> inside the _get_num_ports().
> > 
> > Maybe I don't understand correctly.
> > Why do you want to count closed sibling ports?
> 
> Proposed version of 'netdev_dpdk_get_num_ports' iterates over
> 'dpdk_list' which contains only ports that currently opened by OVS.
> So, the only port that could have UNUSED state in that list is the
> port that we're just closed here. If we'll count number of siblings
> before closing the port where should be no UNUSED ports in that list.
> 
> The issue I tried to raise here is that rte_eth_dev_close() changed the
> state only for PMDs with RTE_ETH_DEV_CLOSE_REMOVE. So, the result
> of 'netdev_dpdk_get_num_ports' will be different for different PMDs.

Yes, that's true.
You can manage it two ways:
- you rely on OVS list of open ports
- you remove the device completely (as before)
  for PMDs not supporting RTE_ETH_DEV_CLOSE_REMOVE

> >>> +if (rte_dev_remove(rte_dev) < 0) {
> >>> +VLOG_ERR("Device '%s' can not be detached", 
> >>> dev->devargs);
> >>> +}
> >>>  }
> >>>  }
> > 
> > [...]
> >> Also, do we still need netdev_dpdk_get_port_by_mac() if we're using
> >> devargs iterator ?
> > 
> > The devargs iterator manages "mac=" property in DPDK 18.11.
> 
> Good.
> 
> > 
> > [...]
> >>> -rte_eth_dev_close(port_id);
> >>> +/* FIXME: avoid direct access to DPDK internal array 
> >>> rte_eth_devices. */
> >>> +rte_dev = rte_eth_devices[port_id].device;
> >>> +if (netdev_dpdk_get_num_ports(rte_dev)) {
> >>
> >> Probably same here.
> >> DPDK does not set RTE_ETH_DEV_UNUSED state for PMDs without 
> >> RTE_ETH_DEV_CLOSE_REMOVE flag.
> > 
> > If RTE_ETH_DEV_CLOSE_REMOVE is not set, you should call rte_dev_remove()
> > just after rte_eth_dev_close().
> 
> And what if we call rte_dev_remove() after rte_eth_dev_close() for
> PMD with RTE_ETH_DEV_CLOSE_REMOVE ?

Then you release the full device, as before with the old "detach" API.
It makes no difference for single-port device.
But it is bad for multi-port devices (which should already implement
RTE_ETH_DEV_CLOSE_REMOVE).

> >>> +response = xasprintf("Device '%s' is being shared with other "
> >>> + "interfaces. Remove them before detaching.",
> >>> + argv[1]);
> >>> +goto error;
> >>> +}
> >>>  
> >>> -rte_eth_dev_info_get(port_id, &dev_info);
> >>> -if (!dev_info.device || rte_dev_remove(dev_info.device)) {
> >>> -response = xasprintf("Device '%s' can not be detached", argv[1]);
> >>> +rte_eth_dev_close(port_id);
> >>> +if (rte_dev_remove(rte_dev) < 0) {
> >>> +response = xasprintf("Device '%s' can not be removed", argv[1]);



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4] netdev-dpdk: support port representors

2019-01-10 Thread Ilya Maximets
On 10.01.2019 16:42, Thomas Monjalon wrote:
> Hi,
> 
> 10/01/2019 12:32, Ilya Maximets:
>> On 08.01.2019 12:31, Ophir Munk wrote:
>>>  if (dev->attached) {
>>> +/* Remove the port eth device. */
>>>  rte_eth_dev_close(dev->port_id);
>>> -rte_eth_dev_info_get(dev->port_id, &dev_info);
>>> -if (dev_info.device && !rte_dev_remove(dev_info.device)) {
>>> -VLOG_INFO("Device '%s' has been detached", dev->devargs);
>>
>> Please keep this log message for the successful case.
>>
>>> -} else {
>>> -VLOG_ERR("Device '%s' can not be detached", dev->devargs);
>>> +VLOG_INFO("Device '%s' has been removed", dev->devargs);
>>> +/* If this is the last port_id using this rte device
>>> + * remove this rte device and all its eth devices.
>>> + * FIXME: avoid direct access to DPDK internal array 
>>> rte_eth_devices.
>>> + */
>>> +rte_dev = rte_eth_devices[dev->port_id].device;
>>> +if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
>>
>> rte_eth_dev_close() sets the RTE_ETH_DEV_UNUSED state for PMDs with
>> RTE_ETH_DEV_CLOSE_REMOVE flag.
>> The function netdev_dpdk_get_num_ports() skips unused ports.
>> The result should not be equal to 1. We need the other way to detect
>> the siblings or close the port after calculating them.
>> In this case we'll probably do not need to have check for UNUSED state
>> inside the _get_num_ports().
> 
> Maybe I don't understand correctly.
> Why do you want to count closed sibling ports?

Proposed version of 'netdev_dpdk_get_num_ports' iterates over
'dpdk_list' which contains only ports that currently opened by OVS.
So, the only port that could have UNUSED state in that list is the
port that we're just closed here. If we'll count number of siblings
before closing the port where should be no UNUSED ports in that list.

The issue I tried to raise here is that rte_eth_dev_close() changed the
state only for PMDs with RTE_ETH_DEV_CLOSE_REMOVE. So, the result
of 'netdev_dpdk_get_num_ports' will be different for different PMDs.

> 
>>> +if (rte_dev_remove(rte_dev) < 0) {
>>> +VLOG_ERR("Device '%s' can not be detached", dev->devargs);
>>> +}
>>>  }
>>>  }
> 
> [...]
>> Also, do we still need netdev_dpdk_get_port_by_mac() if we're using
>> devargs iterator ?
> 
> The devargs iterator manages "mac=" property in DPDK 18.11.

Good.

> 
> [...]
>>> -rte_eth_dev_close(port_id);
>>> +/* FIXME: avoid direct access to DPDK internal array rte_eth_devices. 
>>> */
>>> +rte_dev = rte_eth_devices[port_id].device;
>>> +if (netdev_dpdk_get_num_ports(rte_dev)) {
>>
>> Probably same here.
>> DPDK does not set RTE_ETH_DEV_UNUSED state for PMDs without 
>> RTE_ETH_DEV_CLOSE_REMOVE flag.
> 
> If RTE_ETH_DEV_CLOSE_REMOVE is not set, you should call rte_dev_remove()
> just after rte_eth_dev_close().

And what if we call rte_dev_remove() after rte_eth_dev_close() for
PMD with RTE_ETH_DEV_CLOSE_REMOVE ?

> 
>>> +response = xasprintf("Device '%s' is being shared with other "
>>> + "interfaces. Remove them before detaching.",
>>> + argv[1]);
>>> +goto error;
>>> +}
>>>  
>>> -rte_eth_dev_info_get(port_id, &dev_info);
>>> -if (!dev_info.device || rte_dev_remove(dev_info.device)) {
>>> -response = xasprintf("Device '%s' can not be detached", argv[1]);
>>> +rte_eth_dev_close(port_id);
>>> +if (rte_dev_remove(rte_dev) < 0) {
>>> +response = xasprintf("Device '%s' can not be removed", argv[1]);
> 
> 
> 
> 
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4] netdev-dpdk: support port representors

2019-01-10 Thread Thomas Monjalon
Hi,

10/01/2019 12:32, Ilya Maximets:
> On 08.01.2019 12:31, Ophir Munk wrote:
> >  if (dev->attached) {
> > +/* Remove the port eth device. */
> >  rte_eth_dev_close(dev->port_id);
> > -rte_eth_dev_info_get(dev->port_id, &dev_info);
> > -if (dev_info.device && !rte_dev_remove(dev_info.device)) {
> > -VLOG_INFO("Device '%s' has been detached", dev->devargs);
> 
> Please keep this log message for the successful case.
> 
> > -} else {
> > -VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> > +VLOG_INFO("Device '%s' has been removed", dev->devargs);
> > +/* If this is the last port_id using this rte device
> > + * remove this rte device and all its eth devices.
> > + * FIXME: avoid direct access to DPDK internal array 
> > rte_eth_devices.
> > + */
> > +rte_dev = rte_eth_devices[dev->port_id].device;
> > +if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
> 
> rte_eth_dev_close() sets the RTE_ETH_DEV_UNUSED state for PMDs with
> RTE_ETH_DEV_CLOSE_REMOVE flag.
> The function netdev_dpdk_get_num_ports() skips unused ports.
> The result should not be equal to 1. We need the other way to detect
> the siblings or close the port after calculating them.
> In this case we'll probably do not need to have check for UNUSED state
> inside the _get_num_ports().

Maybe I don't understand correctly.
Why do you want to count closed sibling ports?

> > +if (rte_dev_remove(rte_dev) < 0) {
> > +VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> > +}
> >  }
> >  }

[...]
> Also, do we still need netdev_dpdk_get_port_by_mac() if we're using
> devargs iterator ?

The devargs iterator manages "mac=" property in DPDK 18.11.

[...]
> > -rte_eth_dev_close(port_id);
> > +/* FIXME: avoid direct access to DPDK internal array rte_eth_devices. 
> > */
> > +rte_dev = rte_eth_devices[port_id].device;
> > +if (netdev_dpdk_get_num_ports(rte_dev)) {
> 
> Probably same here.
> DPDK does not set RTE_ETH_DEV_UNUSED state for PMDs without 
> RTE_ETH_DEV_CLOSE_REMOVE flag.

If RTE_ETH_DEV_CLOSE_REMOVE is not set, you should call rte_dev_remove()
just after rte_eth_dev_close().

> > +response = xasprintf("Device '%s' is being shared with other "
> > + "interfaces. Remove them before detaching.",
> > + argv[1]);
> > +goto error;
> > +}
> >  
> > -rte_eth_dev_info_get(port_id, &dev_info);
> > -if (!dev_info.device || rte_dev_remove(dev_info.device)) {
> > -response = xasprintf("Device '%s' can not be detached", argv[1]);
> > +rte_eth_dev_close(port_id);
> > +if (rte_dev_remove(rte_dev) < 0) {
> > +response = xasprintf("Device '%s' can not be removed", argv[1]);



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v13 06/11] dp-packet: Add support for data "linearization".

2019-01-10 Thread Lam, Tiago
On 10/01/2019 09:41, David Marchand wrote:
> 
> On Thu, Jan 10, 2019 at 10:32 AM Lam, Tiago  > wrote:
> 
> On 10/01/2019 09:12, David Marchand wrote:
> > On Thu, Jan 10, 2019 at 10:11 AM David Marchand
> > mailto:david.march...@redhat.com> 
> >> wrote:
> >
> >     This part triggers build (non fatal o_O) warnings on fedora 28.
> >
> >
> > Oops, important.
> > This is when building without dpdk support.
> 
> I don't seem to be able to trigger this locally, can you point out what
> gcc version you're using (or clang)? The Travis [1] and the ML bot
> didn't complain either, so this might be specific to certain versions.
> But I'll gladly take a look if I'm to replicate - although it warning
> seems fairly straightforward to fix.
> 
> 
> Sure:
> # gcc --version
> gcc (GCC) 8.2.1 20181105 (Red Hat 8.2.1-5)

Thanks, I'll include the below snippet in the next revision which,
I believe, should fix it.

Tiago.

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 970aaf2..3f3b7e0 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -402,13 +402,11 @@ dp_packet_read_data(const struct dp_packet *b, size_t 
offset, size_t size,
 /* Non-DPDK dp_packets should always hit the above condition */
 ovs_assert(1);
 #endif
+} else {/* Copy all data. */
+*ptr = buf;
+dp_packet_copy_from_offset(b, offset, size, buf);
 }

-/* Copy all data */
-
-*ptr = buf;
-dp_packet_copy_from_offset(b, offset, size, buf);
-
 return 0;
 }
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v13 06/11] dp-packet: Add support for data "linearization".

2019-01-10 Thread David Marchand
On Thu, Jan 10, 2019 at 10:32 AM Lam, Tiago  wrote:

> On 10/01/2019 09:12, David Marchand wrote:
> > On Thu, Jan 10, 2019 at 10:11 AM David Marchand
> > mailto:david.march...@redhat.com>> wrote:
> >
> > This part triggers build (non fatal o_O) warnings on fedora 28.
> >
> >
> > Oops, important.
> > This is when building without dpdk support.
>
> I don't seem to be able to trigger this locally, can you point out what
> gcc version you're using (or clang)? The Travis [1] and the ML bot
> didn't complain either, so this might be specific to certain versions.
> But I'll gladly take a look if I'm to replicate - although it warning
> seems fairly straightforward to fix.
>

Sure:
# gcc --version
gcc (GCC) 8.2.1 20181105 (Red Hat 8.2.1-5)


-- 
David Marchand
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v13 06/11] dp-packet: Add support for data "linearization".

2019-01-10 Thread David Marchand
On Thu, Jan 10, 2019 at 10:11 AM David Marchand 
wrote:

> This part triggers build (non fatal o_O) warnings on fedora 28.
>

Oops, important.
This is when building without dpdk support.


-- 
David Marchand
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v13 06/11] dp-packet: Add support for data "linearization".

2019-01-10 Thread David Marchand
Hello Tiago,

On Thu, Jan 10, 2019 at 1:14 AM Tiago Lam  wrote:

> Previous commits have added support to the dp_packet API to handle
> multi-segmented packets, where data is not stored contiguously in
> memory. However, in some cases, it is inevitable and data must be
> provided contiguously. Examples of such cases are when performing csums
> over the entire packet data, or when write()'ing to a file descriptor
> (for a tap interface, for example). For such cases, the dp_packet API
> has been extended to provide a way to transform a multi-segmented
> DPBUF_DPDK packet into a DPBUF_MALLOC system packet (at the expense of a
> copy of memory). If the packet's data is already stored in memory
> contigously then there's no need to convert the packet.
>
> Thus, the main use cases that were assuming that a dp_packet's data is
> always held contiguously in memory were changed to make use of the new
> "linear functions" in the dp_packet API when there's a need to traverse
> the entire's packet data. Per the example above, when the packet's data
> needs to be write() to the tap's file descriptor, or when the conntrack
> module needs to verify a packet's checksum, the data is now linearized.
>
> Additionally, the miniflow_extract() function has been modified to check
> if the respective packet headers don't span across multiple mbufs. This
> requirement is needed to guarantee that callers can assume headers are
> always in contiguous memory.
>
> Signed-off-by: Tiago Lam 
> ---
>
>
[snip]


> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index a003546..970aaf2 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
>
> [snip]


> @@ -356,6 +379,39 @@ dp_packet_try_pull(struct dp_packet *b, size_t size)
>  ? dp_packet_pull(b, size) : NULL;
>  }
>
> +/* Reads 'size' bytes from 'offset' in 'b', linearly, to 'ptr', if 'buf'
> is
> + * NULL. Otherwise, if a 'buf' is provided, it must have 'size' bytes,
> and the
> + * data will be copied there, iff it is found to be non-linear. */
> +static inline ssize_t
> +dp_packet_read_data(const struct dp_packet *b, size_t offset, size_t size,
> +void **ptr, void *buf) {
> +/* Zero copy */
> +if ((*ptr = dp_packet_at(b, offset, size)) != NULL) {
> +return 0;
> +}
> +
> +/* Copy available linear data */
> +if (buf == NULL) {
> +#ifdef DPDK_NETDEV
> +size_t mofs = offset;
> +const struct rte_mbuf *mbuf = dp_packet_mbuf_from_offset(b,
> &mofs);
> +*ptr = dp_packet_at(b, offset, mbuf->data_len - mofs);
> +
> +return size - (mbuf->data_len - mofs);
> +#else
> +/* Non-DPDK dp_packets should always hit the above condition */
> +ovs_assert(1);
> +#endif
> +}
> +
> +/* Copy all data */
> +
> +*ptr = buf;
> +dp_packet_copy_from_offset(b, offset, size, buf);
> +
> +return 0;
> +}
> +
>  static inline bool
>  dp_packet_is_eth(const struct dp_packet *b)
>  {
>

This part triggers build (non fatal o_O) warnings on fedora 28.
I did not take time to analyse this yet, tell me if you can have a look.


In file included from /vagrant/lib/packets.c:35:
In function ‘dp_packet_copy_from_offset’,
inlined from ‘dp_packet_read_data.part.16.constprop’ at
/vagrant/lib/dp-packet.h:410:5,
inlined from ‘dp_packet_read_data.constprop’ at
/vagrant/lib/dp-packet.h:386:1,
inlined from ‘packet_csum_continue’ at /vagrant/lib/packets.c:1705:15:
/vagrant/lib/dp-packet.h:1058:5: warning: argument 1 null where non-null
expected [-Wnonnull]
 memcpy(buf, (char *)dp_packet_data(b) + offset, size);
 ^
In file included from ./lib/string.h:20,
 from /vagrant/lib/packets.h:23,
 from /vagrant/lib/packets.c:18:
/vagrant/lib/packets.c: In function ‘packet_csum_continue’:
/usr/include/string.h:42:14: note: in a call to function ‘memcpy’ declared
here
 extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
  ^~
In file included from /vagrant/lib/packets.c:35:
In function ‘dp_packet_copy_from_offset’,
inlined from ‘dp_packet_read_data.part.16.constprop’ at
/vagrant/lib/dp-packet.h:410:5,
inlined from ‘dp_packet_read_data.constprop’ at
/vagrant/lib/dp-packet.h:386:1,
inlined from ‘packet_crc32c’ at /vagrant/lib/packets.c:1732:15:
/vagrant/lib/dp-packet.h:1058:5: warning: argument 1 null where non-null
expected [-Wnonnull]
 memcpy(buf, (char *)dp_packet_data(b) + offset, size);
 ^
In file included from ./lib/string.h:20,
 from /vagrant/lib/packets.h:23,
 from /vagrant/lib/packets.c:18:
/vagrant/lib/packets.c: In function ‘packet_crc32c’:
/usr/include/string.h:42:14: note: in a call to function ‘memcpy’ declared
here
 extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
  ^~


-- 
David Marchand
_

Re: [ovs-dev] [patch v1] conntrack: Fix FTP seq_skew boundary adjustments.

2019-01-10 Thread David Marchand
Hello,

On Wed, Jan 9, 2019 at 4:44 AM Darrell Ball  wrote:

> Fixes: bd5e81a0e596 ("Userspace Datapath: Add ALG infra and FTP.")
> Signed-off-by: Darrell Ball 
> ---
>
> Backport to 2.8.
>
>  lib/conntrack.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 6f6021a..e992b77 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -3255,10 +3255,9 @@ handle_ftp_ctl(struct conntrack *ct, const struct
> conn_lookup_ctx *ctx,
>  uint32_t tcp_ack = ntohl(get_16aligned_be32(&th->tcp_ack));
>
>  if ((seq_skew > 0) && (tcp_ack < seq_skew)) {
> -/* Should not be possible; will be marked invalid. */
> -tcp_ack = 0;
> +tcp_ack = UINT32_MAX - (seq_skew - tcp_ack - 1);
>  } else if ((seq_skew < 0) && (UINT32_MAX - tcp_ack <
> -seq_skew)) {
> -tcp_ack = (-seq_skew) - (UINT32_MAX - tcp_ack);
> +tcp_ack = (-seq_skew) - (UINT32_MAX - tcp_ack) - 1;
>  } else {
>  tcp_ack -= seq_skew;
>  }
> @@ -3267,10 +3266,9 @@ handle_ftp_ctl(struct conntrack *ct, const struct
> conn_lookup_ctx *ctx,
>  } else {
>  uint32_t tcp_seq = ntohl(get_16aligned_be32(&th->tcp_seq));
>  if ((seq_skew > 0) && (UINT32_MAX - tcp_seq < seq_skew)) {
> -tcp_seq = seq_skew - (UINT32_MAX - tcp_seq);
> +tcp_seq = seq_skew - (UINT32_MAX - tcp_seq) - 1;
>  } else if ((seq_skew < 0) && (tcp_seq < -seq_skew)) {
> -/* Should not be possible; will be marked invalid. */
> -tcp_seq = 0;
> +tcp_seq = UINT32_MAX - ((-seq_skew) - tcp_seq - 1);
>  } else {
>  tcp_seq += seq_skew;
>  }
> --
> 1.9.1
>
>
Ah, now that I think about it, I had seen some packets with ack == 0 but
did not reproduce (it was in my todolist).
Good catch.

Just wondering, can't we rely integer promotion then truncation on 32bits ?
My test program tends to show it works.

Something like:

diff --git a/lib/conntrack.c b/lib/conntrack.c
index eb36353..04ddf5e 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -3329,32 +3329,13 @@ handle_ftp_ctl(struct conntrack *ct, const struct
conn_lookup_ctx *ctx,

 if (nat && ec->seq_skew != 0) {
 if (ctx->reply != ec->seq_skew_dir) {
-
 uint32_t tcp_ack = ntohl(get_16aligned_be32(&th->tcp_ack));

-if ((ec->seq_skew > 0) && (tcp_ack < ec->seq_skew)) {
-/* Should not be possible; will be marked invalid. */
-tcp_ack = 0;
-} else if ((ec->seq_skew < 0) &&
-   (UINT32_MAX - tcp_ack < -ec->seq_skew)) {
-tcp_ack = (-ec->seq_skew) - (UINT32_MAX - tcp_ack);
-} else {
-tcp_ack -= ec->seq_skew;
-}
-ovs_be32 new_tcp_ack = htonl(tcp_ack);
-put_16aligned_be32(&th->tcp_ack, new_tcp_ack);
+put_16aligned_be32(&th->tcp_ack, htonl(tcp_ack -
ec->seq_skew));
 } else {
 uint32_t tcp_seq = ntohl(get_16aligned_be32(&th->tcp_seq));
-if ((ec->seq_skew > 0) && (UINT32_MAX - tcp_seq <
ec->seq_skew)) {
-tcp_seq = ec->seq_skew - (UINT32_MAX - tcp_seq);
-} else if ((ec->seq_skew < 0) && (tcp_seq < -ec->seq_skew)) {
-/* Should not be possible; will be marked invalid. */
-tcp_seq = 0;
-} else {
-tcp_seq += ec->seq_skew;
-}
-ovs_be32 new_tcp_seq = htonl(tcp_seq);
-put_16aligned_be32(&th->tcp_seq, new_tcp_seq);
+
+put_16aligned_be32(&th->tcp_seq, htonl(tcp_seq +
ec->seq_skew));
 }
 }


-- 
David Marchand
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 2/2] odp-util: Do not rewrite fields with the same values as matched

2019-01-10 Thread Eli Britstein
To improve performance and avoid wasting resources for HW offloaded
flows, do not rewrite fields that are matched with the same value.

Signed-off-by: Eli Britstein 
Reviewed-by: Roi Dayan 
---
 lib/odp-util.c| 110 +-
 tests/mpls-xlate.at   |   2 +-
 tests/ofproto-dpif.at |  14 +++
 3 files changed, 108 insertions(+), 18 deletions(-)

diff --git a/lib/odp-util.c b/lib/odp-util.c
index 0491bed38..7e916f9d9 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -7086,12 +7086,50 @@ commit_odp_tunnel_action(const struct flow *flow, 
struct flow *base,
 }
 }
 
+struct ovs_key_field_properties {
+int offset;
+int size;
+};
+
+/* Compare the keys similary to memcmp, but each field separately.
+ * The offsets and sizes of each field is provided by field_properties
+ * argument.
+ * For fields with the same value, zero out their mask part in order not to
+ * rewrite them as it's unnecessary */
+static bool
+keycmp_mask(const void *key0, const void *key1,
+struct ovs_key_field_properties *field_properties, void *mask)
+{
+int field, i;
+bool differ = false;
+
+for (field = 0 ; ; field++) {
+int size = field_properties[field].size;
+int offset = field_properties[field].offset;
+char *pkey0 = ((char *)key0) + offset;
+char *pkey1 = ((char *)key1) + offset;
+char *pmask = ((char *)mask) + offset;
+
+if (size == 0)
+break;
+
+for (i = 0; i < size; i++)
+if (memcmp(pkey0, pkey1, size) == 0)
+memset(pmask, 0, size);
+else
+differ = true;
+}
+
+return differ;
+}
+
 static bool
 commit(enum ovs_key_attr attr, bool use_masked_set,
const void *key, void *base, void *mask, size_t size,
+   struct ovs_key_field_properties *field_properties,
struct ofpbuf *odp_actions)
 {
-if (memcmp(key, base, size)) {
+if (keycmp_mask(key, base, field_properties, mask)) {
 bool fully_masked = odp_mask_is_exact(attr, mask, size);
 
 if (use_masked_set && !fully_masked) {
@@ -7133,6 +7171,12 @@ commit_set_ether_action(const struct flow *flow, struct 
flow *base_flow,
 bool use_masked)
 {
 struct ovs_key_ethernet key, base, mask;
+struct ovs_key_field_properties ovs_key_ethernet_field_properties[] = {
+#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_ethernet, name), 
sizeof(type)},
+OVS_KEY_ETHERNET_FIELDS
+{0, 0}
+#undef OVS_KEY_FIELD
+};
 
 if (flow->packet_type != htonl(PT_ETH)) {
 return;
@@ -7143,7 +7187,8 @@ commit_set_ether_action(const struct flow *flow, struct 
flow *base_flow,
 get_ethernet_key(&wc->masks, &mask);
 
 if (commit(OVS_KEY_ATTR_ETHERNET, use_masked,
-   &key, &base, &mask, sizeof key, odp_actions)) {
+   &key, &base, &mask, sizeof key,
+   ovs_key_ethernet_field_properties, odp_actions)) {
 put_ethernet_key(&base, base_flow);
 put_ethernet_key(&mask, &wc->masks);
 }
@@ -7269,6 +7314,12 @@ commit_set_ipv4_action(const struct flow *flow, struct 
flow *base_flow,
bool use_masked)
 {
 struct ovs_key_ipv4 key, mask, base;
+struct ovs_key_field_properties ovs_key_ipv4_field_properties[] = {
+#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_ipv4, name), 
sizeof(type)},
+OVS_KEY_IPV4_FIELDS
+{0, 0}
+#undef OVS_KEY_FIELD
+};
 
 /* Check that nw_proto and nw_frag remain unchanged. */
 ovs_assert(flow->nw_proto == base_flow->nw_proto &&
@@ -7286,7 +7337,7 @@ commit_set_ipv4_action(const struct flow *flow, struct 
flow *base_flow,
 }
 
 if (commit(OVS_KEY_ATTR_IPV4, use_masked, &key, &base, &mask, sizeof key,
-   odp_actions)) {
+   ovs_key_ipv4_field_properties, odp_actions)) {
 put_ipv4_key(&base, base_flow, false);
 if (mask.ipv4_proto != 0) { /* Mask was changed by commit(). */
 put_ipv4_key(&mask, &wc->masks, true);
@@ -7324,6 +7375,12 @@ commit_set_ipv6_action(const struct flow *flow, struct 
flow *base_flow,
bool use_masked)
 {
 struct ovs_key_ipv6 key, mask, base;
+struct ovs_key_field_properties ovs_key_ipv6_field_properties[] = {
+#define OVS_KEY_FIELD(type, name) {offsetof(struct ovs_key_ipv6, name), 
sizeof(type)},
+OVS_KEY_IPV6_FIELDS
+{0, 0}
+#undef OVS_KEY_FIELD
+};
 
 /* Check that nw_proto and nw_frag remain unchanged. */
 ovs_assert(flow->nw_proto == base_flow->nw_proto &&
@@ -7342,7 +7399,7 @@ commit_set_ipv6_action(const struct flow *flow, struct 
flow *base_flow,
 }
 
 if (commit(OVS_KEY_ATTR_IPV6, use_masked, &key, &base, &mask, sizeof key,
-   odp_actions)) {
+   ovs_key_ipv6_field_properties, odp_actions)) {
 put_ipv6_key(&base, base_flow, false);
 if (mask.ipv6_proto != 0) { /

[ovs-dev] [PATCH 1/2] datapath: Declare ovs key structures using macros

2019-01-10 Thread Eli Britstein
Declare ovs key structures using macros as a pre-step of retrieving
field information, with no functional change.

Signed-off-by: Eli Britstein 
Reviewed-by: Roi Dayan 
---
 build-aux/extract-odp-netlink-h   |   3 +-
 datapath/linux/compat/include/linux/openvswitch.h | 102 +++---
 2 files changed, 71 insertions(+), 34 deletions(-)

diff --git a/build-aux/extract-odp-netlink-h b/build-aux/extract-odp-netlink-h
index bc1cc35a7..c413fdbf4 100755
--- a/build-aux/extract-odp-netlink-h
+++ b/build-aux/extract-odp-netlink-h
@@ -42,7 +42,7 @@ $i\
 s,,"openvswitch/types.h"\
 #include ,
 s,#.*,,
-s/__u8[[:space:]]*\([a-zA-Z0-9_]*\)[[:space:]]*\[[[:space:]]*ETH_ALEN[[:space:]]*\]/struct
 eth_addr \1/
+s/OVS_KEY_FIELD_ARR(__u8,[[:space:]]*\([a-zA-Z0-9_]*\),[[:space:]]*ETH_ALEN[[:space:]]*/OVS_KEY_FIELD(struct
 eth_addr, \1/
 
 # Transform IPv6 addresses from an array to struct in6_addr
 #
@@ -50,6 +50,7 @@ 
s/__u8[[:space:]]*\([a-zA-Z0-9_]*\)[[:space:]]*\[[[:space:]]*ETH_ALEN[[:space:]]
 # one character because struct ovs_key_nsh has a member "__be32 c[4];"
 # that is not an IPv6 address.
 
s/__be32[[:space:]]*\([a-zA-Z0-9_]\{2,\}\)[[:space:]]*\[[[:space:]]*4[[:space:]]*\]/struct
 in6_addr \1/
+s/OVS_KEY_FIELD_ARR(__be32,[[:space:]]*\([a-zA-Z0-9_]\{2,\}\),[[:space:]]*[[:space:]]*4[[:space:]]*/OVS_KEY_FIELD(struct
 in6_addr, \1/
 
 # Transform most Linux-specific __u types into C99 uint_t types,
 # and most Linux-specific __be into Open vSwitch ovs_be,
diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 9b087f1b0..3af82a9b5 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -422,73 +422,109 @@ enum ovs_frag_type {
 
 #define OVS_FRAG_TYPE_MAX (__OVS_FRAG_TYPE_MAX - 1)
 
+#define OVS_KEY_FIELD_ARR(type, name, elements) type name[elements];
+#define OVS_KEY_FIELD(type, name) type name;
+
+#define OVS_KEY_ETHERNET_FIELDS \
+OVS_KEY_FIELD_ARR(__u8, eth_src, ETH_ALEN) \
+OVS_KEY_FIELD_ARR(__u8, eth_dst, ETH_ALEN)
+
 struct ovs_key_ethernet {
-   __u8 eth_src[ETH_ALEN];
-   __u8 eth_dst[ETH_ALEN];
+OVS_KEY_ETHERNET_FIELDS
 };
 
 struct ovs_key_mpls {
__be32 mpls_lse;
 };
 
+#define OVS_KEY_IPV4_FIELDS \
+OVS_KEY_FIELD(__be32, ipv4_src) \
+OVS_KEY_FIELD(__be32, ipv4_dst) \
+OVS_KEY_FIELD(__u8, ipv4_proto) \
+OVS_KEY_FIELD(__u8, ipv4_tos) \
+OVS_KEY_FIELD(__u8, ipv4_ttl) \
+OVS_KEY_FIELD(__u8, ipv4_frag /* One of OVS_FRAG_TYPE_*. */)
+
 struct ovs_key_ipv4 {
-   __be32 ipv4_src;
-   __be32 ipv4_dst;
-   __u8   ipv4_proto;
-   __u8   ipv4_tos;
-   __u8   ipv4_ttl;
-   __u8   ipv4_frag;   /* One of OVS_FRAG_TYPE_*. */
+OVS_KEY_IPV4_FIELDS
 };
 
+#define OVS_KEY_IPV6_FIELDS \
+OVS_KEY_FIELD_ARR(__be32, ipv6_src, 4) \
+OVS_KEY_FIELD_ARR(__be32, ipv6_dst, 4) \
+OVS_KEY_FIELD(__be32, ipv6_label /* 20-bits in least-significant bits. */) 
\
+OVS_KEY_FIELD(__u8, ipv6_proto) \
+OVS_KEY_FIELD(__u8, ipv6_tclass) \
+OVS_KEY_FIELD(__u8, ipv6_hlimit) \
+OVS_KEY_FIELD(__u8, ipv6_frag /* One of OVS_FRAG_TYPE_*. */)
+
 struct ovs_key_ipv6 {
-   __be32 ipv6_src[4];
-   __be32 ipv6_dst[4];
-   __be32 ipv6_label;  /* 20-bits in least-significant bits. */
-   __u8   ipv6_proto;
-   __u8   ipv6_tclass;
-   __u8   ipv6_hlimit;
-   __u8   ipv6_frag;   /* One of OVS_FRAG_TYPE_*. */
+OVS_KEY_IPV6_FIELDS
 };
 
+#define OVS_KEY_TCP_FIELDS \
+OVS_KEY_FIELD(__be16, tcp_src) \
+OVS_KEY_FIELD(__be16, tcp_dst)
+
 struct ovs_key_tcp {
-   __be16 tcp_src;
-   __be16 tcp_dst;
+OVS_KEY_TCP_FIELDS
 };
 
+#define OVS_KEY_UDP_FIELDS \
+OVS_KEY_FIELD(__be16, udp_src) \
+OVS_KEY_FIELD(__be16, udp_dst)
+
 struct ovs_key_udp {
-   __be16 udp_src;
-   __be16 udp_dst;
+OVS_KEY_UDP_FIELDS
 };
 
+#define OVS_KEY_SCTP_FIELDS \
+OVS_KEY_FIELD(__be16, sctp_src) \
+OVS_KEY_FIELD(__be16, sctp_dst)
+
 struct ovs_key_sctp {
-   __be16 sctp_src;
-   __be16 sctp_dst;
+OVS_KEY_SCTP_FIELDS
 };
 
+#define OVS_KEY_ICMP_FIELDS \
+OVS_KEY_FIELD(__u8, icmp_type) \
+OVS_KEY_FIELD(__u8, icmp_code)
+
 struct ovs_key_icmp {
-   __u8 icmp_type;
-   __u8 icmp_code;
+OVS_KEY_ICMP_FIELDS
 };
 
+#define OVS_KEY_ICMPV6_FIELDS \
+OVS_KEY_FIELD(__u8, icmpv6_type) \
+OVS_KEY_FIELD(__u8, icmpv6_code)
+
 struct ovs_key_icmpv6 {
-   __u8 icmpv6_type;
-   __u8 icmpv6_code;
+OVS_KEY_ICMPV6_FIELDS
 };
 
+#define OVS_KEY_ARP_FIELDS \
+OVS_KEY_FIELD(__be32, arp_sip) \
+OVS_KEY_FIELD(__be32, arp_tip) \
+OVS_KEY_FIELD(__be16, arp_op) \
+OVS_KEY_FIELD_ARR(__u8, arp_sha, ETH_ALEN) \
+OVS_KEY_FIELD_ARR(__u8, arp_tha, ETH_ALEN)
+
 struct ovs_key_arp {
-   __be32 arp_sip;
-   __be32 arp_tip;
-   __be16 arp_op;
-   __u8   arp_sha[ETH_ALEN];
-   __u8   arp

[ovs-dev] [PATCH 0/2] Do not rewrite fields with the same values as

2019-01-10 Thread Eli Britstein
Hi

This patch set avoids unnecessary rewrite actions to fields with the
same values as matched on.

Patch 1 is a pre-step by defining ovs key structs using macros

Patch 2 avoids the unnecessary rewrites and adapts the tests accordingly


Travis link:
https://travis-ci.org/roidayan/ovs/builds/477699710
Appvoyer link:
https://ci.appveyor.com/project/roidayan/ovs/builds/21515073

Thanks,
Eli


Eli Britstein (2):
  datapath: Declare ovs key structures using macros
  odp-util: Do not rewrite fields with the same values as matched

 build-aux/extract-odp-netlink-h   |   3 +-
 datapath/linux/compat/include/linux/openvswitch.h | 102 +---
 lib/odp-util.c| 110 --
 tests/mpls-xlate.at   |   2 +-
 tests/ofproto-dpif.at |  14 +--
 5 files changed, 179 insertions(+), 52 deletions(-)

-- 
2.14.5

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/8] vconn: Allow timout configuration for blocking connection.

2019-01-10 Thread Ilya Maximets
On 09.01.2019 23:27, Ben Pfaff wrote:
> On Wed, Jan 09, 2019 at 08:28:54PM +0300, Ilya Maximets wrote:
>> On 27.12.2018 20:36, Ben Pfaff wrote:
>>> On Wed, Dec 26, 2018 at 06:23:56PM +0300, Ilya Maximets wrote:
 On some systems in case where remote is not responding, socket could
 remain in SYN_SENT state for a really long time without errors waiting
 for connection. This leads to situations where vconn connection hangs
 for a few minutes waiting for connection to the DOWN remote.

 For example, this situation emulated by "refuse-connection" vconn
 testcase. This leads to test failures because Alarm signal arrives much
 faster than ETIMEDOUT from the socket:

   ./vconn.at:21: ovstest test-vconn refuse-connection tcp
   Alarm clock
   stderr:
   |socket_util|INFO|0:127.0.0.1: listening on port 63812
   |poll_loop|DBG|wakeup due to 0-ms timeout
   |poll_loop|DBG|wakeup due to 10155-ms timeout
   |fatal_signal|WARN|terminating with signal 14 (Alarm clock)
   ./vconn.at:21: exit code was 142, expected 0
   vconn.at:21: 535. tcp vconn - refuse connection (vconn.at:21): FAILED

 This patch allowes to specify timeout value for vconn blocking
 connections. If the connection takes more time, socket will be closed
 with ETIMEDOUT error code. Negative value could be used to wait
 infinitely.

 Signed-off-by: Ilya Maximets 
>>>
>>> Same comments as patch 2.
>>>
>>> Are the timeouts only useful for the test cases?  I wonder whether just
>>> calling alarm(10); at the beginning of the test programs would be just
>>> as helpful.  On the other hand, it would make using a debugger on those
>>> programs harder.
>>
>> I guess, we have alarms in all the test programs.
>> The issue here is that some test apps like 'test_refuse_connection' treats
>> connection failure as a success. But on some systems, wrong connections hangs
>> for a really long time and alarm kills the test application. In this case
>> we can't say for sure if the test failed or not, i.e. if it was expected
>> connection failure or other random issue that forced the application to hang.
>>
>> stream connection tests even worse, because they are trying to sequentially
>> establish connection to one of 3 different remotes while only one of them is
>> correct. And it will never try to connect to correct one if the blocking
>> connection to wrong port will hang for a few minutes. It'll be simply killed
>> by alarm.
> 
> It should be possible to tell what caused the test program to exit by
> testing the the exit status.  When a program exits due to a signal, a
> Bourne-compatible shell sets $? to 128 plus the signal number.  Usually,
> it's good enough just to know that the process died with an unusual exit
> status, but you can get the particular signal name back with "kill -l
> $?", e.g. on Linux "kill -l 142" prints "ALRM".  This behavior is
> specified by POSIX so it should be portable.

Yes, we can detect that app was killed by alarm, but we can't say if it was
expected hang while connecting to the wrong port or it was just too long
execution due to random environment issue or a bug.

Let's look at "multiple remotes" test cases. Their workflow is following:

  1. alarm(10)
  2. Initialize idl with multiple remotes.
  2. RPC: Try to connect to WRONG_PORT_1. Fail expected.
  3. RPC: Try to connect to right port.
  4. Perform some ovsdb transactions.
  5. Check result.

Step 2 always hangs in CirrusCI environment and app dies there by alarm.
We can't treat this as success because we didn't check anything useful.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] docs: Fix table title for VM MQ config in dpdk howto.

2019-01-10 Thread Ian Stokes

On 8/31/2018 3:27 PM, Stephen Finucane wrote:

On Thu, 2018-08-30 at 17:36 +0100, Cian Ferriter wrote:

Found this when searching "BIOS Settings" for use with DPDK.

CC: Stephen Finucane 
Fixes: c50938a24031 ("doc: Convert INSTALL.DPDK-ADVANCED to rST")
Signed-off-by: Cian Ferriter 


Oops.

Acked-by: Stephen Finucane 


---
  Documentation/howto/dpdk.rst |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index ab3d576..6397d25 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -284,7 +284,7 @@ devices to bridge ``br0``. Once complete, follow the below 
steps:
 We must configure with appropriate software versions to ensure this feature
 is supported.
  
-   .. list-table:: Recommended BIOS Settings

+   .. list-table:: VM Configuration
:header-rows: 1
  
* - Setting




Apologies for the delay with this, fell through the cracks. I've applied 
this to master and backported from 2.10 to 2.7


Thanks
Ian

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] I would like to have an important discussion with you,

2019-01-10 Thread Klaus Dieter
I would like to have an important discussion with you,
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev