[ovs-dev] [PATCH v2] debian: Move libovn out from package libopenvswitch.

2018-08-24 Thread Han Zhou
From: Han Zhou 

Since we are packaging OVN and OVS components separately, libovn
shouldn't belong to OVS, so move it to ovn-common. Also, remove
it from libopenvswitch-dev.

Signed-off-by: Han Zhou 
---
 debian/libopenvswitch-dev.install | 12 ++--
 debian/libopenvswitch.install |  6 +-
 debian/ovn-common.install |  1 +
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/debian/libopenvswitch-dev.install 
b/debian/libopenvswitch-dev.install
index ca3d22c..bda5a2c 100644
--- a/debian/libopenvswitch-dev.install
+++ b/debian/libopenvswitch-dev.install
@@ -1,5 +1,13 @@
-usr/lib/*/lib*.so
-usr/lib/*/lib*.a
+usr/lib/*/libopenvswitch*.so
+usr/lib/*/libofproto*.so
+usr/lib/*/libovsdb*.so
+usr/lib/*/libsflow*.so
+usr/lib/*/libvtep*.so
+usr/lib/*/libopenvswitch*.a
+usr/lib/*/libofproto*.a
+usr/lib/*/libovsdb*.a
+usr/lib/*/libsflow*.a
+usr/lib/*/libvtep*.a
 usr/lib/*/pkgconfig
 include/*.h usr/include/openvswitch
 include/openflow/*.h usr/include/openvswitch/openflow
diff --git a/debian/libopenvswitch.install b/debian/libopenvswitch.install
index 3ddde58..ce9dc40 100644
--- a/debian/libopenvswitch.install
+++ b/debian/libopenvswitch.install
@@ -1 +1,5 @@
-usr/lib/*/lib*.so.*
+usr/lib/*/libopenvswitch*.so.*
+usr/lib/*/libofproto*.so.*
+usr/lib/*/libovsdb*.so.*
+usr/lib/*/libsflow*.so.*
+usr/lib/*/libvtep*.so.*
diff --git a/debian/ovn-common.install b/debian/ovn-common.install
index e95ee6c..90484d2 100644
--- a/debian/ovn-common.install
+++ b/debian/ovn-common.install
@@ -4,3 +4,4 @@ usr/bin/ovn-trace
 usr/bin/ovn-detrace
 usr/share/openvswitch/scripts/ovn-ctl
 usr/share/openvswitch/scripts/ovndb-servers.ocf
+usr/lib/*/libovn*.so.*
-- 
2.1.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] debian: Move libovn out from package libopenvswitch.

2018-08-24 Thread Han Zhou
On Fri, Aug 24, 2018 at 5:56 PM 0-day Robot  wrote:
>
> Bleep bloop.  Greetings Han Zhou, I am a robot and I have tried out your
patch.
> Thanks for your contribution.
>
> I encountered some error that I wasn't expecting.  See the details below.
>
>
> checkpatch:
> ERROR: Author Han Zhou  needs to sign off.
> WARNING: Unexpected sign-offs from developers who are not authors or
co-authors or committers: Han Zhou 
> Lines checked: 60, Warnings: 1, Errors: 1
>
>
> Please check this out.  If you feel there has been an error, please email
acon...@bytheb.org
>
> Thanks,
> 0-day Robot

Sorry, it seems I messed up the other way this time. I will send v2 trying
to making the signed-off aligned with author ;)
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] debian: Move libovn out from package libopenvswitch.

2018-08-24 Thread 0-day Robot
Bleep bloop.  Greetings Han Zhou, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Han Zhou  needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Han Zhou 
Lines checked: 60, Warnings: 1, Errors: 1


Please check this out.  If you feel there has been an error, please email 
acon...@bytheb.org

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/4] tests: Add regression tests for all the bugs found by oss-fuzz so far.

2018-08-24 Thread Ben Pfaff
This should actually be at the end of the series because the series
fixes some of the bugs that it tests for.  Other than ordering, the
series is correct and ready for review.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] debian: Move libovn out from package libopenvswitch.

2018-08-24 Thread Han Zhou
From: Han Zhou 

Since we are packaging OVN and OVS components separately, libovn
shouldn't belong to OVS, so move it to ovn-common. Also, remove
it from libopenvswitch-dev.

Signed-off-by: Han Zhou 
---
 debian/libopenvswitch-dev.install | 12 ++--
 debian/libopenvswitch.install |  6 +-
 debian/ovn-common.install |  1 +
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/debian/libopenvswitch-dev.install 
b/debian/libopenvswitch-dev.install
index ca3d22c..bda5a2c 100644
--- a/debian/libopenvswitch-dev.install
+++ b/debian/libopenvswitch-dev.install
@@ -1,5 +1,13 @@
-usr/lib/*/lib*.so
-usr/lib/*/lib*.a
+usr/lib/*/libopenvswitch*.so
+usr/lib/*/libofproto*.so
+usr/lib/*/libovsdb*.so
+usr/lib/*/libsflow*.so
+usr/lib/*/libvtep*.so
+usr/lib/*/libopenvswitch*.a
+usr/lib/*/libofproto*.a
+usr/lib/*/libovsdb*.a
+usr/lib/*/libsflow*.a
+usr/lib/*/libvtep*.a
 usr/lib/*/pkgconfig
 include/*.h usr/include/openvswitch
 include/openflow/*.h usr/include/openvswitch/openflow
diff --git a/debian/libopenvswitch.install b/debian/libopenvswitch.install
index 3ddde58..ce9dc40 100644
--- a/debian/libopenvswitch.install
+++ b/debian/libopenvswitch.install
@@ -1 +1,5 @@
-usr/lib/*/lib*.so.*
+usr/lib/*/libopenvswitch*.so.*
+usr/lib/*/libofproto*.so.*
+usr/lib/*/libovsdb*.so.*
+usr/lib/*/libsflow*.so.*
+usr/lib/*/libvtep*.so.*
diff --git a/debian/ovn-common.install b/debian/ovn-common.install
index e95ee6c..90484d2 100644
--- a/debian/ovn-common.install
+++ b/debian/ovn-common.install
@@ -4,3 +4,4 @@ usr/bin/ovn-trace
 usr/bin/ovn-detrace
 usr/share/openvswitch/scripts/ovn-ctl
 usr/share/openvswitch/scripts/ovndb-servers.ocf
+usr/lib/*/libovn*.so.*
-- 
2.1.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath-windows: Add support to configure ct zone limits

2018-08-24 Thread Sairam Venugopal
Hi Anand,

Thanks for the patch. See comments inline. 

Thanks,
Sairam

On 8/21/18, 2:58 PM, "ovs-dev-boun...@openvswitch.org on behalf of Anand 
Kumar"  
wrote:

This patch implements limiting conntrack entries
per zone using dpctl commands.

Example:
ovs-appctl dpctl/ct-set-limits default=5 zone=1,limit=2 zone=1,limit=3
ovs-appctl dpct/ct-del-limits zone=4
ovs-appctl dpct/ct-get-limits zone=1,2,3

- Also update the netlink-socket.c to support netlink family
  'OVS_WIN_NL_CTLIMIT_FAMILY_ID' for conntrack zone limit.

Signed-off-by: Anand Kumar 
---
 datapath-windows/include/OvsDpInterfaceExt.h |   1 +
 datapath-windows/ovsext/Conntrack.c  | 163 
++-
 datapath-windows/ovsext/Conntrack.h  |  12 ++
 datapath-windows/ovsext/Datapath.c   |  34 +-
 lib/netlink-socket.c |   5 +
 5 files changed, 212 insertions(+), 3 deletions(-)

diff --git a/datapath-windows/include/OvsDpInterfaceExt.h 
b/datapath-windows/include/OvsDpInterfaceExt.h
index db91c3e..5fd8000 100644
--- a/datapath-windows/include/OvsDpInterfaceExt.h
+++ b/datapath-windows/include/OvsDpInterfaceExt.h
@@ -72,6 +72,7 @@
  */
 
 #define OVS_WIN_NL_CT_FAMILY_ID  (NLMSG_MIN_TYPE + 7)
+#define OVS_WIN_NL_CTLIMIT_FAMILY_ID (NLMSG_MIN_TYPE + 8)
 
 #define OVS_WIN_NL_INVALID_MCGRP_ID  0
 #define OVS_WIN_NL_MCGRP_START_ID100
diff --git a/datapath-windows/ovsext/Conntrack.c 
b/datapath-windows/ovsext/Conntrack.c
index dd16602..b806cd7 100644
--- a/datapath-windows/ovsext/Conntrack.c
+++ b/datapath-windows/ovsext/Conntrack.c
@@ -34,6 +34,8 @@ static OVS_CT_THREAD_CTX ctThreadCtx;
 static PNDIS_RW_LOCK_EX *ovsCtBucketLock = NULL;
 extern POVS_SWITCH_CONTEXT gOvsSwitchContext;
 static ULONG ctTotalEntries;
+static POVS_CT_ZONE_INFO zoneInfo = NULL;
+static ULONG defaultCtLimit;
 
 static __inline OvsCtFlush(UINT16 zone, struct ovs_key_ct_tuple_ipv4 
*tuple);
 static __inline NDIS_STATUS
@@ -99,6 +101,19 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
 if (status != STATUS_SUCCESS) {
 OvsCleanupConntrack();
 }
+

Sai: Can you move the following prior to the OvsNatInit or handle 
OvsCleanupConntrack()?
Also, shouldn't zoneInfo have a lock for manipulation?

+zoneInfo = OvsAllocateMemoryWithTag(sizeof(OVS_CT_ZONE_INFO) *
+(UINT16_MAX + 1), OVS_CT_POOL_TAG);
+if (zoneInfo == NULL) {
+status = STATUS_INSUFFICIENT_RESOURCES;
+goto freeBucketLock;
+}
+
+defaultCtLimit = CT_MAX_ENTRIES;
+for (int i = 0; i <= UINT16_MAX; i++) {
+zoneInfo[i].entries = 0;
+zoneInfo[i].limit = defaultCtLimit;
+}
 return STATUS_SUCCESS;
 
 freeBucketLock:
@@ -149,6 +164,22 @@ OvsCleanupConntrack(VOID)
 OvsFreeMemoryWithTag(ovsCtBucketLock, OVS_CT_POOL_TAG);
 ovsCtBucketLock = NULL;
 OvsNatCleanup();
+if (zoneInfo) {
+OvsFreeMemoryWithTag(zoneInfo, OVS_CT_POOL_TAG);
+}
+}
+
+VOID


Sai: can we set zone to UINT16 instead of int?

+OvsCtSetZoneLimit(int zone, ULONG value) {
+   if (zone == -1) {
+/* Set default limit for all zones. */
+defaultCtLimit = value;
+for (UINT32 i = 0; i <= UINT16_MAX; i++) {
+zoneInfo[i].limit = value;
+}
+} else {
+zoneInfo[(UINT16)zone].limit = value;
+}
 }
 
 /*
@@ -263,6 +294,7 @@ OvsCtAddEntry(POVS_CT_ENTRY entry,
>link);
 
 NdisInterlockedIncrement((PLONG));
+zoneInfo[ctx->key.zone].entries++;
 NdisReleaseRWLock(ovsCtBucketLock[bucketIdx], );
 return TRUE;
 }
@@ -437,6 +469,7 @@ OvsCtEntryDelete(POVS_CT_ENTRY entry, BOOLEAN 
forceDelete)
 if (entry->natInfo.natAction) {
 OvsNatDeleteKey(>key);
 }
+zoneInfo[entry->key.zone].entries--;
 OvsPostCtEventEntry(entry, OVS_EVENT_CT_DELETE);
 RemoveEntryList(>link);
 OVS_RELEASE_SPIN_LOCK(&(entry->lock), irql);
@@ -877,12 +910,16 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx,
  );
 
 } else {
-if (commit && ctTotalEntries >= CT_MAX_ENTRIES) {
+if (commit && (ctTotalEntries >= CT_MAX_ENTRIES ||
+zoneInfo[ctx.key.zone].entries >= 
zoneInfo[ctx.key.zone].limit)) {
 /* Don't proceed with processing if the max limit has been hit.
  * This blocks only new entries from being created and doesn't
  * affect existing connections.
  */
-

[ovs-dev] [PATCH 4/4] ofp-actions: Re-fix error path for parsing OpenFlow actions.

2018-08-24 Thread Ben Pfaff
A previous commit attempted to fix the error path when the actions nested
within clone provoked an error.  However, this commit just introduced a new
problem in another case, since it made ofpacts_pull_openflow_actions__()
restore a previously valid pointer to data that might have been
reallocated.

This commit takes another approach.  Instead of trying to restore anything
at all, it just defines ofpacts_pull_openflow_actions__() to clear the
output buffer when there's an error.  It seems that this is less error
prone.  Most of the callers don't care; this commit fixes up the ones that
do.

Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9975
Fixes: 20cdd1dbd546 ("ofp-actions: Avoid assertion failure for clone(ct(...bad 
actions...)).")
Signed-off-by: Ben Pfaff 
---
 lib/ofp-actions.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index 6adb55d23b02..a80a4a308dba 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -5888,6 +5888,9 @@ decode_NXAST_RAW_CLONE(const struct ext_action_header 
*eah,
 ofp_version,
 1u << 
OVSINST_OFPIT11_APPLY_ACTIONS,
 out, 0, vl_mff_map, tlv_bitmap);
+if (error) {
+return error;
+}
 clone = ofpbuf_push_uninit(out, sizeof *clone);
 out->header = >ofpact;
 ofpact_finish_CLONE(out, );
@@ -6479,7 +6482,7 @@ decode_NXAST_RAW_CT(const struct nx_action_conntrack *nac,
 out, OFPACT_CT, vl_mff_map,
 tlv_bitmap);
 if (error) {
-goto out;
+return error;
 }
 
 conntrack = ofpbuf_push_uninit(out, sizeof(*conntrack));
@@ -7500,8 +7503,6 @@ ofpacts_pull_openflow_actions__(struct ofpbuf *openflow,
 uint64_t *ofpacts_tlv_bitmap)
 {
 const struct ofp_action_header *actions;
-void *orig_data = ofpacts->data;
-size_t orig_size = ofpacts->size;
 enum ofperr error;
 
 if (actions_len % OFP_ACTION_ALIGN != 0) {
@@ -7525,21 +7526,15 @@ ofpacts_pull_openflow_actions__(struct ofpbuf *openflow,
outer_action);
 }
 if (error) {
-/* Back out changes to 'ofpacts'.  (Normally, decoding would only
- * append to 'ofpacts', so that one would expect only to need to
- * restore 'ofpacts->size', but some action parsing temporarily pulls
- * off data from the start of 'ofpacts' and may not properly re-push it
- * on error paths.) */
-ofpacts->data = orig_data;
-ofpacts->size = orig_size;
+ofpbuf_clear(ofpacts);
 }
 return error;
 }
 
 /* Attempts to convert 'actions_len' bytes of OpenFlow actions from the front
  * of 'openflow' into ofpacts.  On success, appends the converted actions to
- * 'ofpacts'; on failure, 'ofpacts' is unchanged (but might be reallocated) .
- * Returns 0 if successful, otherwise an OpenFlow error.
+ * 'ofpacts'; on failure, clears 'ofpacts'.  Returns 0 if successful, otherwise
+ * an OpenFlow error.
  *
  * Actions are processed according to their OpenFlow version which
  * is provided in the 'version' parameter.
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 3/4] ofp-port: Fix leak on error path in parse_intel_port_custom_property().

2018-08-24 Thread Ben Pfaff
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9972
Signed-off-by: Ben Pfaff 
---
 lib/ofp-port.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/ofp-port.c b/lib/ofp-port.c
index 8d882a14b4df..f19beb64a04c 100644
--- a/lib/ofp-port.c
+++ b/lib/ofp-port.c
@@ -1618,6 +1618,7 @@ parse_intel_port_custom_property(struct ofpbuf *payload,
 uint8_t *name_len = ofpbuf_try_pull(payload, sizeof *name_len);
 char *name = name_len ? ofpbuf_try_pull(payload, *name_len) : NULL;
 if (!name_len || !name) {
+free(ops->custom_stats.counters);
 return OFPERR_OFPBPC_BAD_LEN;
 }
 
@@ -1628,6 +1629,7 @@ parse_intel_port_custom_property(struct ofpbuf *payload,
 /* Counter value. */
 ovs_be64 *value = ofpbuf_try_pull(payload, sizeof *value);
 if (!value) {
+free(ops->custom_stats.counters);
 return OFPERR_OFPBPC_BAD_LEN;
 }
 c->value = ntohll(get_unaligned_be64(value));
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/4] tests: Add regression tests for all the bugs found by oss-fuzz so far.

2018-08-24 Thread Ben Pfaff
This will make it harder for bugs found by oss-fuzz to reappear.

Signed-off-by: Ben Pfaff 
---
 tests/automake.mk  |  33 +
 tests/fuzz-regression-list.at  |  23 ++
 tests/fuzz-regression.at   |  21 +
 .../flow_extract_fuzzer-5112775280951296   | Bin 0 -> 54 bytes
 .../flow_extract_fuzzer-5457710546944000   | Bin 0 -> 227 bytes
 .../json_parser_fuzzer-4790908707930112| Bin 0 -> 43 bytes
 .../ofp_print_fuzzer-4584019764183040  | Bin 0 -> 128 bytes
 .../ofp_print_fuzzer-4730143510626304  | Bin 0 -> 12 bytes
 .../ofp_print_fuzzer-4854119633256448  | Bin 0 -> 48 bytes
 .../ofp_print_fuzzer-5070973479944192  | Bin 0 -> 112 bytes
 .../ofp_print_fuzzer-5072291707748352  | Bin 0 -> 112 bytes
 .../ofp_print_fuzzer-5147430386401280  | Bin 0 -> 128 bytes
 .../ofp_print_fuzzer-5168455220199424  | Bin 0 -> 80 bytes
 .../ofp_print_fuzzer-5190507327127552  | Bin 0 -> 113 bytes
 .../ofp_print_fuzzer-5204186701496320  | Bin 0 -> 80 bytes
 .../ofp_print_fuzzer-5394482341085184  | Bin 0 -> 128 bytes
 .../ofp_print_fuzzer-5395207246839808  | Bin 0 -> 156 bytes
 .../ofp_print_fuzzer-564745581120  | Bin 0 -> 48 bytes
 .../ofp_print_fuzzer-5674119268925440  | Bin 0 -> 96 bytes
 .../ofp_print_fuzzer-5674419757252608  | Bin 0 -> 112 bytes
 .../ofp_print_fuzzer-5677588436484096  | Bin 0 -> 128 bytes
 .../ofp_print_fuzzer-5706562554298368  | Bin 0 -> 72 bytes
 .../ofp_print_fuzzer-5722747668791296  | Bin 0 -> 120 bytes
 .../ofp_print_fuzzer-6285128790704128  | Bin 0 -> 128 bytes
 .../ofp_print_fuzzer-6470117922701312  | Bin 0 -> 256 bytes
 .../ofp_print_fuzzer-6502620041576448  | Bin 0 -> 128 bytes
 tests/testsuite.at |   1 +
 27 files changed, 78 insertions(+)
 create mode 100644 tests/fuzz-regression-list.at
 create mode 100644 tests/fuzz-regression.at
 create mode 100644 tests/fuzz-regression/flow_extract_fuzzer-5112775280951296
 create mode 100644 tests/fuzz-regression/flow_extract_fuzzer-5457710546944000
 create mode 100644 tests/fuzz-regression/json_parser_fuzzer-4790908707930112
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-4584019764183040
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-4730143510626304
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-4854119633256448
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-5070973479944192
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-5072291707748352
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-5147430386401280
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-5168455220199424
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-5190507327127552
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-5204186701496320
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-5394482341085184
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-5395207246839808
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-564745581120
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-5674119268925440
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-5674419757252608
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-5677588436484096
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-5706562554298368
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-5722747668791296
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-6285128790704128
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-6470117922701312
 create mode 100644 tests/fuzz-regression/ofp_print_fuzzer-6502620041576448

diff --git a/tests/automake.mk b/tests/automake.mk
index b29a37ec840a..d5d53cdd18ae 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -37,6 +37,8 @@ TESTSUITE_AT = \
tests/ofp-util.at \
tests/ofp-errors.at \
tests/ovs-ofctl.at \
+   tests/fuzz-regression.at \
+   tests/fuzz-regression-list.at \
tests/odp.at \
tests/mpls-xlate.at \
tests/multipath.at \
@@ -108,6 +110,37 @@ TESTSUITE_AT = \
tests/packet-type-aware.at \
tests/nsh.at
 
+EXTRA_DIST += $(FUZZ_REGRESSION_TESTS)
+FUZZ_REGRESSION_TESTS = \
+   tests/fuzz-regression/flow_extract_fuzzer-5112775280951296 \
+   tests/fuzz-regression/flow_extract_fuzzer-5457710546944000 \
+   tests/fuzz-regression/json_parser_fuzzer-4790908707930112 \
+   tests/fuzz-regression/ofp_print_fuzzer-4584019764183040 \
+   tests/fuzz-regression/ofp_print_fuzzer-4730143510626304 \
+   tests/fuzz-regression/ofp_print_fuzzer-4854119633256448 \
+   tests/fuzz-regression/ofp_print_fuzzer-5070973479944192 \
+  

[ovs-dev] [PATCH 2/4] nx-match: Avoid double-free on some error paths.

2018-08-24 Thread Ben Pfaff
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9966
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9968
Fixes: f1eb32b9641c ("ofp-group: Fix memory leak in error cases parsing group 
requests.")
Signed-off-by: Ben Pfaff 
---
 lib/nx-match.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/nx-match.c b/lib/nx-match.c
index 3b030833cb2c..8f98032195af 100644
--- a/lib/nx-match.c
+++ b/lib/nx-match.c
@@ -763,6 +763,7 @@ oxm_pull_field_array(const void *fields_data, size_t 
fields_len,
 ofperr_to_string(error));
 
 free(fa->values);
+fa->values = NULL;
 return error;
 }
 }
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ovn: Fix the issue in IPv6 Neigh Solicitation responder for router IPs

2018-08-24 Thread Han Zhou
On Fri, Aug 24, 2018 at 12:27 PM  wrote:
>
> From: Numan Siddique 
>
> Commit [1] added a new action 'nd_na_router' to set the router bit
> in the 'flags' field of the Neighbour Adv packet for router IPs.
> This action was used in the router pipeline. But the logical switch
> pipeline also adds the Neighbour Adv flows for router IPs but with
> 'nd_na' action (which the commit [1] didn't handle).
>
> This patch fixes this by changing the action to 'nd_na_router' for
> router IPs.
>
> Without this patch, the IPv6 functionality is broken.
>
> [1] - "c9756229ed: ovn: Set proper Neighbour Adv flag when replying
> for NS request for router IP"
>
> Signed-off-by: Numan Siddique 
> ---
>  ovn/northd/ovn-northd.8.xml | 24 ++--
>  ovn/northd/ovn-northd.c |  4 +++-
>  tests/ovn.at| 15 ++-
>  3 files changed, 39 insertions(+), 4 deletions(-)
>
> v1 -> v2
> ---
> Addressed the review comments from Han Zhou by using the nd_na_router
> action for router IPs in the logical switch pipeline.
>
>
> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> index e5ff2b661..7352c6764 100644
> --- a/ovn/northd/ovn-northd.8.xml
> +++ b/ovn/northd/ovn-northd.8.xml
> @@ -549,8 +549,8 @@ output;
>  
>Priority-50 flows that match IPv6 ND neighbor solicitations to
>each known IP address A (and A's
> -  solicited node address) of every logical switch port, and
> -  respond with neighbor advertisements directly with
> +  solicited node address) of every logical switch port except of
type
> +  router, and respond with neighbor advertisements directly with
>corresponding Ethernet address E:
>  
>
> @@ -566,6 +566,26 @@ nd_na {
>  };
>  
>
> +
> +  Priority-50 flows that match IPv6 ND neighbor solicitations to
> +  each known IP address A (and A's
> +  solicited node address) of logical switch port of type router,
and
> +  respond with neighbor advertisements directly with
> +  corresponding Ethernet address E:
> +
> +
> +
> +nd_na_router {
> +eth.src = E;
> +ip6.src = A;
> +nd.target = A;
> +nd.tll = E;
> +outport = inport;
> +flags.loopback = 1;
> +output;
> +};
> +
> +
>  
>These flows are omitted for logical ports (other than router
ports or
>localport ports) that are down.
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index 1d020a739..72e25181d 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -4163,7 +4163,7 @@ build_lswitch_flows(struct hmap *datapaths, struct
hmap *ports,
>
>  ds_clear();
>  ds_put_format(,
> -"nd_na { "
> +"%s { "
>  "eth.src = %s; "
>  "ip6.src = %s; "
>  "nd.target = %s; "
> @@ -4172,6 +4172,8 @@ build_lswitch_flows(struct hmap *datapaths, struct
hmap *ports,
>  "flags.loopback = 1; "
>  "output; "
>  "};",
> +!strcmp(op->nbsp->type, "router") ?
> +"nd_na_router" : "nd_na",
>  op->lsp_addrs[i].ea_s,
>  op->lsp_addrs[i].ipv6_addrs[j].addr_s,
>  op->lsp_addrs[i].ipv6_addrs[j].addr_s,
> diff --git a/tests/ovn.at b/tests/ovn.at
> index c5d054c21..e10a7f9ba 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -9530,7 +9530,7 @@ ovn-nbctl lr-add lr0_ip6
>  ovn-nbctl lrp-add lr0_ip6 lrp0_ip6 00:00:00:00:af:01
aef0:0:0:0:0:0:0:0/64
>  ovn-nbctl lsp-add sw0_ip6 lrp0_ip6-attachment
>  ovn-nbctl lsp-set-type lrp0_ip6-attachment router
> -ovn-nbctl lsp-set-addresses lrp0_ip6-attachment 00:00:00:00:af:01
> +ovn-nbctl lsp-set-addresses lrp0_ip6-attachment router
>  ovn-nbctl lsp-set-options lrp0_ip6-attachment router-port=lrp0_ip6
>  ovn-nbctl set logical_router_port lrp0_ip6
ipv6_ra_configs:address_mode=slaac
>
> @@ -9563,6 +9563,19 @@ ovs-vsctl -- add-port br-int hv1-vif1 -- \
>  ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
>
>  OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up sw0_ip6-port1` = xup])
> +
> +# There should be 2 Neighbor Advertisement flows for the router port
> +# aef0:: ip address in logical switch pipeline with action nd_na_router.
> +AT_CHECK([ovn-sbctl dump-flows sw0_ip6 | grep ls_in_arp_rsp | \
> +grep "nd_na_router" | wc -l], [0], [2
> +])
> +
> +# There should be 4 Neighbor Advertisement flows with action nd_na_router
> +# in the router pipeline for the router lr0_ip6.
> +AT_CHECK([ovn-sbctl dump-flows lr0_ip6 | grep nd_na_router | \
> +wc -l], [0], [4
> +])
> +
>  cr_uuid=`ovn-sbctl find port_binding logical_port=cr-ip6_public | grep
_uuid | cut -f2 -d ":"`
>
>  # There is only one chassis.
> --
> 2.17.1

[ovs-dev] [PATCH V2] ofproto-dpif: Check for EBUSY as well

2018-08-24 Thread Greg Rose
Guru reported that we can't create more than one geneve tunnel.
Sometimes a driver will return EBUSY as well as EEXIST for some
duplicate configurations.  Check for EBUSY too.

Fixes: 7521e0cf9e ("ofproto-dpif: Let the dpif report when a ...")
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-August/047214.html
Reported-by: Guru Shetty 
Signed-off-by: Greg Rose 

---

V2 - Add Reported-at tag as requested by Ben
---
 ofproto/ofproto-dpif.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index e3abda5..0a0c69a 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -3687,7 +3687,7 @@ port_add(struct ofproto *ofproto_, struct netdev *netdev)
 
 odp_port_t port_no = ODPP_NONE;
 int error = dpif_port_add(ofproto->backer->dpif, netdev, _no);
-if (error != EEXIST) {
+if (error != EEXIST && error != EBUSY) {
 if (error) {
 return error;
 }
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto-dpif-trace: Make -generate send packets to controller again.

2018-08-24 Thread Justin Pettit


> On Aug 24, 2018, at 12:25 PM, Ben Pfaff  wrote:
> 
> +/* Copies ODP actions from 'in' (with length 'size') to 'out', dropping
> + * OVS_ACTION_ATTR_OUTPUT along the way. */
> +static void
> +prune_output_actions(const struct ofpbuf *in, struct ofpbuf *out)

Do you think it's worth clarifying that it's 'in->size'?  Otherwise it sounds 
like a separate argument.

> +/* Executes all of the datapath actions, except for any 
> OVS_ACTION_ATTR_OUTPUT
> + * actions, in 'actions' on 'packet', which has the given 'flow', on 'dpif'.
> + * The actions have slow path reason 'slow' (if any).  Appends any error
> + * message to 'output'.
> + *
> + * This is mainly useful to execute actions to send a packet to an OpenFlow
> + * controller. */

Does this now generate NetFlow and sFlow events?  If so, do you think it's 
worth mentioning here and in the commit message?

Thanks for fixing this!

Acked-by: Justin Pettit 

--Justin


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] oss-fuzz: Move oss-fuzz test harnesses and fuzzer configs to ovs source repo

2018-08-24 Thread Ben Pfaff
Thanks.  Applied to master.

On Wed, Aug 22, 2018 at 03:09:26PM +0200, Bhargava Shastry wrote:
> Sorry, I forgot to attach the patch itself. Here it is (attached).
> 
> On 08/22/2018 02:37 PM, Bhargava Shastry wrote:
> > Hi Ben,
> > 
> > The patch looks good to me. I have signed it here:
> > 
> > https://github.com/bshastry/ovs/tree/try-patch-v3
> > 
> > that was rebased on top off latest master.
> > 
> > Regards,
> > Bhargava
> > 
> > On 08/22/2018 01:01 AM, Ben Pfaff wrote:
> >> I sent a v2:
> >> https://patchwork.ozlabs.org/patch/960749/
> >>
> >> If you like it, I need a Signed-off-by from you.
> >>
> >> Q: What's a Signed-off-by and how do I provide one?
> >>
> >> A: Free and open source software projects usually require a 
> >> contributor to
> >> provide some assurance that they're entitled to contribute the code 
> >> that
> >> they provide.  Some projects, for example, do this with a Contributor
> >> License Agreement (CLA) or a copyright assignment that is signed on 
> >> paper
> >> or electronically.
> >>
> >> For this purpose, Open vSwitch has adopted something called the 
> >> Developer's
> >> Certificate of Origin (DCO), which is also used by the Linux kernel and
> >> originated there.  Informally stated, agreeing to the DCO is the
> >> developer's way of attesting that a particular commit that they are
> >> contributing is one that they are allowed to contribute.  You should 
> >> visit
> >> https://developercertificate.org/ to read the full statement of the 
> >> DCO,
> >> which is less than 200 words long.
> >>
> >> To certify compliance with the Developer's Certificate of Origin for a
> >> particular commit, just add the following line to the end of your 
> >> commit
> >> message, properly substituting your name and email address:
> >>
> >> Signed-off-by: Firstname Lastname 
> >>
> >> Git has special support for adding a Signed-off-by line to a commit
> >> message: when you run "git commit", just add the -s option, as in "git
> >> commit -s".  If you use the "git citool" GUI for commits, you can add a
> >> Signed-off-by line to the commit message by pressing Control+S.  Other 
> >> Git
> >> user interfaces may provide similar support.
> >>
> >> Thanks,
> >>
> >> Ben.
> >>
> >> On Fri, Aug 03, 2018 at 10:51:39AM +0200, Bhargava Shastry wrote:
> >>> Hello,
> >>>
> >>> Gentle reminder to check if the proposed patch works :-)
> >>>
> >>> Thanks,
> >>> Bhargava
> >>>
> >>> On 07/30/2018 11:58 AM, Bhargava Shastry wrote:
>  Any updates on the proposed patch? :-)
> 
>  On 07/16/2018 02:07 PM, Bhargava Shastry wrote:
> > Update: I fixed these errors in the attached patch that supersedes the
> > patch here (https://patchwork.ozlabs.org/patch/942118/)
> >
> > The major change is that I add the following line for each fuzz target
> > binary in the tests/oss-fuzz/automake.mk file:
> >
> > e.g.,
> > tests_oss_fuzz_flow_extract_target_LDFLAGS = $(LIB_FUZZING_ENGINE) \
> > -lc++
> >
> > Regards,
> > Bhargava
> >
> > On 07/16/2018 11:45 AM, Bhargava Shastry wrote:
> >> Oops, here's the link failure log:
> >>
> >> ```
> >> /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../x86_64-linux-gnu/crt1.o: 
> >> In
> >> function `_start':
> >> (.text+0x20): undefined reference to `main'
> >> clang-7: error: linker command failed with exit code 1 (use -v to see
> >> invocation)
> >> Makefile:4159: recipe for target 'tests/oss-fuzz/flow_extract_target' 
> >> failed
> >> make: *** [tests/oss-fuzz/flow_extract_target] Error 1
> >> fuzzers build failed.
> >> ```
> >>
> >> The main symbol is provided by libfuzzer (clang++ -lFuzzingEngine)
> >>
> >> Regards,
> >> Bhargava
> >>
> >> On 07/16/2018 11:36 AM, Bhargava Shastry wrote:
> >>> Hi Ben,
> >>>
>  Never mind that one, I failed to check in some of that.
> 
>  I sent it formally:
>  https://patchwork.ozlabs.org/patch/942118/
> >>>
> >>> Thanks for the patch. This fixes the previous error. Now, there are 
> >>> some
> >>> new errors during the compilation/linking process. I think most of 
> >>> this
> >>> can be fixed if I figure out how automake works. In a nutshell, here's
> >>> the problem:
> >>>
> >>> - oss-fuzz provides compilation flags that can be plugged in like so
> >>> ```
> >>> CC=clang
> >>> CXX=clang++
> >>> CFLAGS=-O1 -fno-omit-frame-pointer -gline-tables-only
> >>> -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -fsanitize=address
> >>> -fsanitize-address-use-after-scope -fsanitize=fuzzer-no-link
> >>> CXXFLAGS=-O1 -fno-omit-frame-pointer -gline-tables-only
> >>> -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -fsanitize=address
> >>> 

Re: [ovs-dev] [PATCH] ofproto-dpif: Check for EBUSY as well

2018-08-24 Thread Gregory Rose



On 8/24/2018 12:39 PM, Ben Pfaff wrote:

On Wed, Aug 22, 2018 at 02:38:13PM -0700, Greg Rose wrote:

Sometimes a driver will return EBUSY as well as EEXIST for some
duplicate configurations.  Check for EBUSY too.

Fixes: 7521e0cf9e ("ofproto-dpif: Let the dpif report when a ...")
Reported by: Guru Shetty 
Signed-off-by: Greg Rose 

This fixes a larger issue, right?  If so please describe that in the
commit message or at least add a Reported-at: to the original report.


Sure, will do.  V2 upcoming.

- Greg



Thanks,

Ben.


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] netdev: Clean up class initialization.

2018-08-24 Thread Ben Pfaff
On Wed, Aug 22, 2018 at 12:52:47PM +, Stokes, Ian wrote:
> > The macros are hard to read.  This makes it a little more readable.
> > 
> > Signed-off-by: Ben Pfaff 
> > ---
> > v1->v2: Refactor DPDK_FLOW_OFFLOAD_API as Ian requested.
> 
> Thanks Ben, LGTM, will add this to this week's pull request for master unless 
> there is any other input.

Thanks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto-dpif: Check for EBUSY as well

2018-08-24 Thread Ben Pfaff
On Wed, Aug 22, 2018 at 02:38:13PM -0700, Greg Rose wrote:
> Sometimes a driver will return EBUSY as well as EEXIST for some
> duplicate configurations.  Check for EBUSY too.
> 
> Fixes: 7521e0cf9e ("ofproto-dpif: Let the dpif report when a ...")
> Reported by: Guru Shetty 
> Signed-off-by: Greg Rose 

This fixes a larger issue, right?  If so please describe that in the
commit message or at least add a Reported-at: to the original report.

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] ovn: Fix the issue in IPv6 Neigh Solicitation responder for router IPs

2018-08-24 Thread nusiddiq
From: Numan Siddique 

Commit [1] added a new action 'nd_na_router' to set the router bit
in the 'flags' field of the Neighbour Adv packet for router IPs.
This action was used in the router pipeline. But the logical switch
pipeline also adds the Neighbour Adv flows for router IPs but with
'nd_na' action (which the commit [1] didn't handle).

This patch fixes this by changing the action to 'nd_na_router' for
router IPs.

Without this patch, the IPv6 functionality is broken.

[1] - "c9756229ed: ovn: Set proper Neighbour Adv flag when replying
for NS request for router IP"

Signed-off-by: Numan Siddique 
---
 ovn/northd/ovn-northd.8.xml | 24 ++--
 ovn/northd/ovn-northd.c |  4 +++-
 tests/ovn.at| 15 ++-
 3 files changed, 39 insertions(+), 4 deletions(-)

v1 -> v2
---
Addressed the review comments from Han Zhou by using the nd_na_router
action for router IPs in the logical switch pipeline.


diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index e5ff2b661..7352c6764 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -549,8 +549,8 @@ output;
 
   Priority-50 flows that match IPv6 ND neighbor solicitations to
   each known IP address A (and A's
-  solicited node address) of every logical switch port, and
-  respond with neighbor advertisements directly with
+  solicited node address) of every logical switch port except of type
+  router, and respond with neighbor advertisements directly with
   corresponding Ethernet address E:
 
 
@@ -566,6 +566,26 @@ nd_na {
 };
 
 
+
+  Priority-50 flows that match IPv6 ND neighbor solicitations to
+  each known IP address A (and A's
+  solicited node address) of logical switch port of type router, and
+  respond with neighbor advertisements directly with
+  corresponding Ethernet address E:
+
+
+
+nd_na_router {
+eth.src = E;
+ip6.src = A;
+nd.target = A;
+nd.tll = E;
+outport = inport;
+flags.loopback = 1;
+output;
+};
+
+
 
   These flows are omitted for logical ports (other than router ports or
   localport ports) that are down.
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 1d020a739..72e25181d 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -4163,7 +4163,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap 
*ports,
 
 ds_clear();
 ds_put_format(,
-"nd_na { "
+"%s { "
 "eth.src = %s; "
 "ip6.src = %s; "
 "nd.target = %s; "
@@ -4172,6 +4172,8 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap 
*ports,
 "flags.loopback = 1; "
 "output; "
 "};",
+!strcmp(op->nbsp->type, "router") ?
+"nd_na_router" : "nd_na",
 op->lsp_addrs[i].ea_s,
 op->lsp_addrs[i].ipv6_addrs[j].addr_s,
 op->lsp_addrs[i].ipv6_addrs[j].addr_s,
diff --git a/tests/ovn.at b/tests/ovn.at
index c5d054c21..e10a7f9ba 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -9530,7 +9530,7 @@ ovn-nbctl lr-add lr0_ip6
 ovn-nbctl lrp-add lr0_ip6 lrp0_ip6 00:00:00:00:af:01 aef0:0:0:0:0:0:0:0/64
 ovn-nbctl lsp-add sw0_ip6 lrp0_ip6-attachment
 ovn-nbctl lsp-set-type lrp0_ip6-attachment router
-ovn-nbctl lsp-set-addresses lrp0_ip6-attachment 00:00:00:00:af:01
+ovn-nbctl lsp-set-addresses lrp0_ip6-attachment router
 ovn-nbctl lsp-set-options lrp0_ip6-attachment router-port=lrp0_ip6
 ovn-nbctl set logical_router_port lrp0_ip6 ipv6_ra_configs:address_mode=slaac
 
@@ -9563,6 +9563,19 @@ ovs-vsctl -- add-port br-int hv1-vif1 -- \
 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
 
 OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up sw0_ip6-port1` = xup])
+
+# There should be 2 Neighbor Advertisement flows for the router port
+# aef0:: ip address in logical switch pipeline with action nd_na_router.
+AT_CHECK([ovn-sbctl dump-flows sw0_ip6 | grep ls_in_arp_rsp | \
+grep "nd_na_router" | wc -l], [0], [2
+])
+
+# There should be 4 Neighbor Advertisement flows with action nd_na_router
+# in the router pipeline for the router lr0_ip6.
+AT_CHECK([ovn-sbctl dump-flows lr0_ip6 | grep nd_na_router | \
+wc -l], [0], [4
+])
+
 cr_uuid=`ovn-sbctl find port_binding logical_port=cr-ip6_public | grep _uuid | 
cut -f2 -d ":"`
 
 # There is only one chassis.
-- 
2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ofproto-dpif-trace: Make -generate send packets to controller again.

2018-08-24 Thread Ben Pfaff
Prior to the OVS 2.9 development cycle, any flow that sent a packet to a
controller required that the flow be slow-pathed.  In some cases this led
to poor performance, so OVS 2.9 made controller actions fast-pathable.  As
a side effect of the change, "ovs-appctl ofproto/trace -generate" no longer
sent packets to the controller.  This usually didn't matter but it broke
the Faucet tutorial, which relied on this behavior.  This commit
reintroduces the original behavior and thus should fix the tutorial.

CC: Justin Pettit 
Fixes: d39ec23de384 ("ofproto-dpif: Don't slow-path controller actions.")
Reported-by: macman31 
Reported-at: https://github.com/openvswitch/ovs-issues/issues/145
Reported-by: Brad Cowie 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-August/047234.html
Signed-off-by: Ben Pfaff 
---
 ofproto/ofproto-dpif-trace.c | 91 +---
 ofproto/ofproto-unixctl.man  |  6 +--
 2 files changed, 80 insertions(+), 17 deletions(-)

diff --git a/ofproto/ofproto-dpif-trace.c b/ofproto/ofproto-dpif-trace.c
index e8fe0c6c17e9..2110fc713f0d 100644
--- a/ofproto/ofproto-dpif-trace.c
+++ b/ofproto/ofproto-dpif-trace.c
@@ -573,6 +573,74 @@ exit:
 free_ct_states(_ct_states);
 }
 
+static void
+explain_slow_path(enum slow_path_reason slow, struct ds *output)
+{
+ds_put_cstr(output, "\nThis flow is handled by the userspace "
+"slow path because it:");
+for (; slow; slow = zero_rightmost_1bit(slow)) {
+enum slow_path_reason bit = rightmost_1bit(slow);
+ds_put_format(output, "\n  - %s.",
+  slow_path_reason_to_explanation(bit));
+}
+}
+
+/* Copies ODP actions from 'in' (with length 'size') to 'out', dropping
+ * OVS_ACTION_ATTR_OUTPUT along the way. */
+static void
+prune_output_actions(const struct ofpbuf *in, struct ofpbuf *out)
+{
+const struct nlattr *a;
+unsigned int left;
+NL_ATTR_FOR_EACH (a, left, in->data, in->size) {
+if (a->nla_type == OVS_ACTION_ATTR_CLONE) {
+struct ofpbuf in_nested;
+nl_attr_get_nested(a, _nested);
+
+size_t ofs = nl_msg_start_nested(out, OVS_ACTION_ATTR_CLONE);
+prune_output_actions(_nested, out);
+nl_msg_end_nested(out, ofs);
+} else if (a->nla_type != OVS_ACTION_ATTR_OUTPUT) {
+ofpbuf_put(out, a, NLA_ALIGN(a->nla_len));
+}
+}
+}
+
+/* Executes all of the datapath actions, except for any OVS_ACTION_ATTR_OUTPUT
+ * actions, in 'actions' on 'packet', which has the given 'flow', on 'dpif'.
+ * The actions have slow path reason 'slow' (if any).  Appends any error
+ * message to 'output'.
+ *
+ * This is mainly useful to execute actions to send a packet to an OpenFlow
+ * controller. */
+static void
+execute_actions_except_outputs(struct dpif *dpif,
+   const struct dp_packet *packet,
+   const struct flow *flow,
+   const struct ofpbuf *actions,
+   enum slow_path_reason slow,
+   struct ds *output)
+{
+struct ofpbuf pruned_actions;
+ofpbuf_init(_actions, 0);
+prune_output_actions(actions, _actions);
+
+struct dpif_execute execute = {
+.actions = pruned_actions.data,
+.actions_len = pruned_actions.size,
+.needs_help = (slow & SLOW_ACTION) != 0,
+.flow = flow,
+.packet = dp_packet_clone_with_headroom(packet, 2),
+};
+int error = dpif_execute(dpif, );
+if (error) {
+ds_put_format(output, "\nAction execution failed (%s)\n.",
+  ovs_strerror(error));
+}
+dp_packet_delete(execute.packet);
+ofpbuf_uninit(_actions);
+}
+
 static void
 ofproto_trace__(struct ofproto_dpif *ofproto, const struct flow *flow,
 const struct dp_packet *packet, struct ovs_list *recirc_queue,
@@ -627,23 +695,18 @@ ofproto_trace__(struct ofproto_dpif *ofproto, const 
struct flow *flow,
 ds_put_format(output,
   "\nTranslation failed (%s), packet is dropped.\n",
   xlate_strerror(error));
-} else if (xout.slow) {
-enum slow_path_reason slow;
-
-ds_put_cstr(output, "\nThis flow is handled by the userspace "
-"slow path because it:");
-
-slow = xout.slow;
-while (slow) {
-enum slow_path_reason bit = rightmost_1bit(slow);
-
-ds_put_format(output, "\n  - %s.",
-  slow_path_reason_to_explanation(bit));
-
-slow &= ~bit;
+} else {
+if (xout.slow) {
+explain_slow_path(xout.slow, output);
+}
+if (packet) {
+execute_actions_except_outputs(ofproto->backer->dpif, packet,
+   _flow, _actions,
+   xout.slow, output);
 

Re: [ovs-dev] [PATCH] Do not add IPv6 Neigh Adv flows for router ips in switch pipeline

2018-08-24 Thread Han Zhou
On Fri, Aug 24, 2018 at 11:17 AM Numan Siddique  wrote:
>
>
>
> On Fri, Aug 24, 2018 at 11:19 PM Han Zhou  wrote:
>>
>>
>>
>> On Thu, Aug 23, 2018 at 12:10 PM  wrote:
>> >
>> > From: Numan Siddique 
>> >
>> > Commit [1] added a new action 'nd_na_router' to set the router bit
>> > in the 'flags' field of the Neighbour Adv packet in the router
>> > pipeline. But the logical switch pipeline was also adding the
>> > Neighbour Adv flows for router IPs with 'nd_na' action, which the
>> > commit [1] didn't handle.
>> >
>> > This patch fixes this.
>> >
>> > [1] - "c9756229ed: ovn: Set proper Neighbour Adv flag when replying
>> > for NS request for router IP"
>> >
>> > Signed-off-by: Numan Siddique 
>> > ---
>>
>> Hi Numan, thanks for the fix. This patch fixes the issue by skipping
adding the flow in LS pipeline, but this would make the ND responder not
effective. It was supposed to suppress the broadcast packets by directly
respond from the LS pipeline, now that the flow is skipped, it will end up
flooding the broadcast to all ports on the LS. Would it be better to add a
higher priority flow in same table in the LS pipeline to match an extra
condition, the router IP, for the ND packet and use nd_na_router action
there? This way we can avoid flooding when VM sending ND for router IP and
at the same time respond correctly with the router bit set. What do you
think?
>
>
> Hi Han,
> Your suggestion makes sense. I think we can use the same priority and use
"nd_nd_router" if the logical switch port is of type router.

Yes, forget about the "extra condition" I mentioned. IP is already there,
my bad :)
>
>>
>>
>> Please also find one minor comment inlined.
>>
>> Thanks,
>> Han
>>
>> >  ovn/northd/ovn-northd.c | 70 ++---
>> >  tests/ovn.at| 24 +-
>> >  2 files changed, 61 insertions(+), 33 deletions(-)
>> >
>> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> > index 1d020a739..acc0d586e 100644
>> > --- a/ovn/northd/ovn-northd.c
>> > +++ b/ovn/northd/ovn-northd.c
>> > @@ -4150,40 +4150,46 @@ build_lswitch_flows(struct hmap *datapaths,
struct hmap *ports,
>> >ds_cstr(), "next;");
>> >  }
>> >
>> > -/* For ND solicitations, we need to listen for both the
>> > - * unicast IPv6 address and its all-nodes multicast
address,
>> > - * but always respond with the unicast IPv6 address. */
>> > -for (size_t j = 0; j < op->lsp_addrs[i].n_ipv6_addrs;
j++) {
>> > -ds_clear();
>> > -ds_put_format(,
>> > -"nd_ns && ip6.dst == {%s, %s} && nd.target ==
%s",
>> > -op->lsp_addrs[i].ipv6_addrs[j].addr_s,
>> > -op->lsp_addrs[i].ipv6_addrs[j].sn_addr_s,
>> > -op->lsp_addrs[i].ipv6_addrs[j].addr_s);
>> > +/* Skip adding Neighbor Adv flows for ports with router
peers.
>> > + * Router pipeline takes care of adding those with
nd_na_router
>> > + * action.
>> > + */
>> > +if (!op->peer) {
>>
>> I am not sure if it is more straightforward to use: if
(!strcmp(op->nbsp->type, "router"))
>
>
> I thought about it. I will change it to strcmp. Seems more clearer.
>
> Thanks for the review.
>
> Numan
>
>>
>>
>> > +/* For ND solicitations, we need to listen for both
the
>> > + * unicast IPv6 address and its all-nodes multicast
address,
>> > + * but always respond with the unicast IPv6 address.
*/
>> > +for (size_t j = 0; j < op->lsp_addrs[i].n_ipv6_addrs;
j++) {
>> > +ds_clear();
>> > +ds_put_format(,
>> > +"nd_ns && ip6.dst == {%s, %s} &&
nd.target == %s",
>> > +op->lsp_addrs[i].ipv6_addrs[j].addr_s,
>> > +op->lsp_addrs[i].ipv6_addrs[j].sn_addr_s,
>> > +op->lsp_addrs[i].ipv6_addrs[j].addr_s);
>> >
>> > -ds_clear();
>> > -ds_put_format(,
>> > -"nd_na { "
>> > -"eth.src = %s; "
>> > -"ip6.src = %s; "
>> > -"nd.target = %s; "
>> > -"nd.tll = %s; "
>> > -"outport = inport; "
>> > -"flags.loopback = 1; "
>> > -"output; "
>> > -"};",
>> > -op->lsp_addrs[i].ea_s,
>> > -op->lsp_addrs[i].ipv6_addrs[j].addr_s,
>> > -op->lsp_addrs[i].ipv6_addrs[j].addr_s,
>> > -op->lsp_addrs[i].ea_s);
>> > -ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP,
50,
>> > -  ds_cstr(), ds_cstr());
>> > +ds_clear();
>> > +

Re: [ovs-dev] [PATCH v3 1/1] netdev-vport: reject concomitant incompatible tunnels

2018-08-24 Thread Eelco Chaudron



On 24 Aug 2018, at 19:49, Ben Pfaff wrote:


On Fri, Aug 24, 2018 at 11:38:05AM +0200, Eelco Chaudron wrote:


Hi Ben some of this is existing code, however, which version of 
Sparse are

you using, as I’m not getting any error?


I appear to be using:

$ sparse --version
v0.5.0-44-g40791b9

via git://git.kernel.org/pub/scm/devel/sparse/chrisl/sparse.  It looks
like v0.5.2 exists, so I should probably upgrade.


I was using 5.2, but I think it’s a problem in combination with my GCC 
version.

Will try figure this out next week…


What version do you have?

However, this might all be obsoleted by the changes that should go in 
in
Ian Stokes' next pull request, which should include the following 
patch

to make the code here a little less awful:
https://patchwork.ozlabs.org/patch/957990/


I’ll wait for this to go in before sending a v4.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Do not add IPv6 Neigh Adv flows for router ips in switch pipeline

2018-08-24 Thread Numan Siddique
On Fri, Aug 24, 2018 at 11:19 PM Han Zhou  wrote:

>
>
> On Thu, Aug 23, 2018 at 12:10 PM  wrote:
> >
> > From: Numan Siddique 
> >
> > Commit [1] added a new action 'nd_na_router' to set the router bit
> > in the 'flags' field of the Neighbour Adv packet in the router
> > pipeline. But the logical switch pipeline was also adding the
> > Neighbour Adv flows for router IPs with 'nd_na' action, which the
> > commit [1] didn't handle.
> >
> > This patch fixes this.
> >
> > [1] - "c9756229ed: ovn: Set proper Neighbour Adv flag when replying
> > for NS request for router IP"
> >
> > Signed-off-by: Numan Siddique 
> > ---
>
> Hi Numan, thanks for the fix. This patch fixes the issue by skipping
> adding the flow in LS pipeline, but this would make the ND responder not
> effective. It was supposed to suppress the broadcast packets by directly
> respond from the LS pipeline, now that the flow is skipped, it will end up
> flooding the broadcast to all ports on the LS. Would it be better to add a
> higher priority flow in same table in the LS pipeline to match an extra
> condition, the router IP, for the ND packet and use nd_na_router action
> there? This way we can avoid flooding when VM sending ND for router IP and
> at the same time respond correctly with the router bit set. What do you
> think?
>

Hi Han,
Your suggestion makes sense. I think we can use the same priority and use
"nd_nd_router" if the logical switch port is of type router.


>
> Please also find one minor comment inlined.
>
> Thanks,
> Han
>
> >  ovn/northd/ovn-northd.c | 70 ++---
> >  tests/ovn.at| 24 +-
> >  2 files changed, 61 insertions(+), 33 deletions(-)
> >
> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > index 1d020a739..acc0d586e 100644
> > --- a/ovn/northd/ovn-northd.c
> > +++ b/ovn/northd/ovn-northd.c
> > @@ -4150,40 +4150,46 @@ build_lswitch_flows(struct hmap *datapaths,
> struct hmap *ports,
> >ds_cstr(), "next;");
> >  }
> >
> > -/* For ND solicitations, we need to listen for both the
> > - * unicast IPv6 address and its all-nodes multicast address,
> > - * but always respond with the unicast IPv6 address. */
> > -for (size_t j = 0; j < op->lsp_addrs[i].n_ipv6_addrs; j++) {
> > -ds_clear();
> > -ds_put_format(,
> > -"nd_ns && ip6.dst == {%s, %s} && nd.target ==
> %s",
> > -op->lsp_addrs[i].ipv6_addrs[j].addr_s,
> > -op->lsp_addrs[i].ipv6_addrs[j].sn_addr_s,
> > -op->lsp_addrs[i].ipv6_addrs[j].addr_s);
> > +/* Skip adding Neighbor Adv flows for ports with router
> peers.
> > + * Router pipeline takes care of adding those with
> nd_na_router
> > + * action.
> > + */
> > +if (!op->peer) {
>
> I am not sure if it is more straightforward to use: if
> (!strcmp(op->nbsp->type, "router"))
>

I thought about it. I will change it to strcmp. Seems more clearer.

Thanks for the review.

Numan


>
> > +/* For ND solicitations, we need to listen for both the
> > + * unicast IPv6 address and its all-nodes multicast
> address,
> > + * but always respond with the unicast IPv6 address. */
> > +for (size_t j = 0; j < op->lsp_addrs[i].n_ipv6_addrs;
> j++) {
> > +ds_clear();
> > +ds_put_format(,
> > +"nd_ns && ip6.dst == {%s, %s} && nd.target
> == %s",
> > +op->lsp_addrs[i].ipv6_addrs[j].addr_s,
> > +op->lsp_addrs[i].ipv6_addrs[j].sn_addr_s,
> > +op->lsp_addrs[i].ipv6_addrs[j].addr_s);
> >
> > -ds_clear();
> > -ds_put_format(,
> > -"nd_na { "
> > -"eth.src = %s; "
> > -"ip6.src = %s; "
> > -"nd.target = %s; "
> > -"nd.tll = %s; "
> > -"outport = inport; "
> > -"flags.loopback = 1; "
> > -"output; "
> > -"};",
> > -op->lsp_addrs[i].ea_s,
> > -op->lsp_addrs[i].ipv6_addrs[j].addr_s,
> > -op->lsp_addrs[i].ipv6_addrs[j].addr_s,
> > -op->lsp_addrs[i].ea_s);
> > -ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP,
> 50,
> > -  ds_cstr(), ds_cstr());
> > +ds_clear();
> > +ds_put_format(,
> > +"nd_na { "
> > +"eth.src = %s; "
> > +"ip6.src = %s; "
> > +"nd.target = 

Re: [ovs-dev] [PATCH 2/2] tests: Extend tests of simulated packets to kernel 4.17.x

2018-08-24 Thread Ben Pfaff
On Thu, Aug 23, 2018 at 10:18:05PM -0700, Gregory Rose wrote:
> On 8/21/2018 7:42 AM, Yifeng Sun wrote:
> >GRE-related tests are skipped on ubuntu-18.04.1 because the
> >vanilla `ip` will fail to set dev's mac address. This bug
> >is described in this link:
> >
> >https://bugzilla.redhat.com/show_bug.cgi?id=1550097
> >
> >This patch enables GRE tests to run even if the buggy `ip`
> >is being used.
> >
> >Signed-off-by: Yifeng Sun 
> 
> Looks good, thanks!
> 
> Reviewed-by: Greg Rose 
> Tested-by: Greg Rose 

Thanks Yifeng (and Greg).  I applied this to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 1/1] netdev-vport: reject concomitant incompatible tunnels

2018-08-24 Thread Ben Pfaff
On Fri, Aug 24, 2018 at 11:38:05AM +0200, Eelco Chaudron wrote:
> 
> 
> On 10 Jul 2018, at 23:45, Ben Pfaff wrote:
> 
> >On Fri, Jun 01, 2018 at 07:03:55PM +0200, Eelco Chaudron wrote:
> >>This patch will make sure VXLAN tunnels with and without the group
> >>based policy (GBP) option enabled can not coexist on the same
> >>destination UDP port.
> >>
> >>In theory, VXLAN tunnel with and without GBP enables can be
> >>multiplexed on the same UDP port as long as different VNI's are
> >>used. However currently OVS does not support this, hence this patch to
> >>check for this condition.
> >>
> >>Signed-off-by: Eelco Chaudron 
> >
> >Thanks for the patch, and sorry that I'm so slow.
> >
> >Does this support the case where, in a single database transaction, a
> >GBP VXLAN tunnel is removed and a non-GBP VXLAN tunnel is created that
> >would otherwise interfere with one another (and the converse case)?  If
> >so, could that be included in the test?
> >
> >"sparse" doesn't like the initialization strategy:
> >
> >../lib/netdev-vport.c:1207:9: error: Using plain integer as NULL pointer
> >../lib/netdev-vport.c:1211:9: error: Using plain integer as NULL pointer
> >../lib/netdev-vport.c:1215:9: error: Using plain integer as NULL pointer
> >../lib/netdev-vport.c:1219:9: error: Using plain integer as NULL pointer
> >../lib/netdev-vport.c:1220:9: error: Using plain integer as NULL pointer
> >../lib/netdev-vport.c:1221:9: error: Using plain integer as NULL pointer
> >../lib/netdev-vport.c:1225:9: error: Using plain integer as NULL pointer
> >../lib/netdev-vport.c:1229:9: error: Using plain integer as NULL pointer
> >../lib/netdev-vport.c:1261:13: error: Using plain integer as NULL pointer
> >
> 
> Hi Ben some of this is existing code, however, which version of Sparse are
> you using, as I’m not getting any error?

I appear to be using:

$ sparse --version
v0.5.0-44-g40791b9

via git://git.kernel.org/pub/scm/devel/sparse/chrisl/sparse.  It looks
like v0.5.2 exists, so I should probably upgrade.

What version do you have?

However, this might all be obsoleted by the changes that should go in in
Ian Stokes' next pull request, which should include the following patch
to make the code here a little less awful:
https://patchwork.ozlabs.org/patch/957990/
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Do not add IPv6 Neigh Adv flows for router ips in switch pipeline

2018-08-24 Thread Han Zhou
On Thu, Aug 23, 2018 at 12:10 PM  wrote:
>
> From: Numan Siddique 
>
> Commit [1] added a new action 'nd_na_router' to set the router bit
> in the 'flags' field of the Neighbour Adv packet in the router
> pipeline. But the logical switch pipeline was also adding the
> Neighbour Adv flows for router IPs with 'nd_na' action, which the
> commit [1] didn't handle.
>
> This patch fixes this.
>
> [1] - "c9756229ed: ovn: Set proper Neighbour Adv flag when replying
> for NS request for router IP"
>
> Signed-off-by: Numan Siddique 
> ---

Hi Numan, thanks for the fix. This patch fixes the issue by skipping adding
the flow in LS pipeline, but this would make the ND responder not
effective. It was supposed to suppress the broadcast packets by directly
respond from the LS pipeline, now that the flow is skipped, it will end up
flooding the broadcast to all ports on the LS. Would it be better to add a
higher priority flow in same table in the LS pipeline to match an extra
condition, the router IP, for the ND packet and use nd_na_router action
there? This way we can avoid flooding when VM sending ND for router IP and
at the same time respond correctly with the router bit set. What do you
think?

Please also find one minor comment inlined.

Thanks,
Han

>  ovn/northd/ovn-northd.c | 70 ++---
>  tests/ovn.at| 24 +-
>  2 files changed, 61 insertions(+), 33 deletions(-)
>
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index 1d020a739..acc0d586e 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -4150,40 +4150,46 @@ build_lswitch_flows(struct hmap *datapaths,
struct hmap *ports,
>ds_cstr(), "next;");
>  }
>
> -/* For ND solicitations, we need to listen for both the
> - * unicast IPv6 address and its all-nodes multicast address,
> - * but always respond with the unicast IPv6 address. */
> -for (size_t j = 0; j < op->lsp_addrs[i].n_ipv6_addrs; j++) {
> -ds_clear();
> -ds_put_format(,
> -"nd_ns && ip6.dst == {%s, %s} && nd.target ==
%s",
> -op->lsp_addrs[i].ipv6_addrs[j].addr_s,
> -op->lsp_addrs[i].ipv6_addrs[j].sn_addr_s,
> -op->lsp_addrs[i].ipv6_addrs[j].addr_s);
> +/* Skip adding Neighbor Adv flows for ports with router
peers.
> + * Router pipeline takes care of adding those with
nd_na_router
> + * action.
> + */
> +if (!op->peer) {

I am not sure if it is more straightforward to use: if
(!strcmp(op->nbsp->type, "router"))

> +/* For ND solicitations, we need to listen for both the
> + * unicast IPv6 address and its all-nodes multicast
address,
> + * but always respond with the unicast IPv6 address. */
> +for (size_t j = 0; j < op->lsp_addrs[i].n_ipv6_addrs;
j++) {
> +ds_clear();
> +ds_put_format(,
> +"nd_ns && ip6.dst == {%s, %s} && nd.target
== %s",
> +op->lsp_addrs[i].ipv6_addrs[j].addr_s,
> +op->lsp_addrs[i].ipv6_addrs[j].sn_addr_s,
> +op->lsp_addrs[i].ipv6_addrs[j].addr_s);
>
> -ds_clear();
> -ds_put_format(,
> -"nd_na { "
> -"eth.src = %s; "
> -"ip6.src = %s; "
> -"nd.target = %s; "
> -"nd.tll = %s; "
> -"outport = inport; "
> -"flags.loopback = 1; "
> -"output; "
> -"};",
> -op->lsp_addrs[i].ea_s,
> -op->lsp_addrs[i].ipv6_addrs[j].addr_s,
> -op->lsp_addrs[i].ipv6_addrs[j].addr_s,
> -op->lsp_addrs[i].ea_s);
> -ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 50,
> -  ds_cstr(), ds_cstr());
> +ds_clear();
> +ds_put_format(,
> +"nd_na { "
> +"eth.src = %s; "
> +"ip6.src = %s; "
> +"nd.target = %s; "
> +"nd.tll = %s; "
> +"outport = inport; "
> +"flags.loopback = 1; "
> +"output; "
> +"};",
> +op->lsp_addrs[i].ea_s,
> +op->lsp_addrs[i].ipv6_addrs[j].addr_s,
> +op->lsp_addrs[i].ipv6_addrs[j].addr_s,
> +op->lsp_addrs[i].ea_s);
> +

[ovs-dev] [PATCH 6/6] netdev-dummy: Initialize new dummy ports as "up".

2018-08-24 Thread Ben Pfaff
Dummy ports started out down and hardly any of the tests ever brought them
up.  This led to some odd test results and caused problems for testing with
controllers that didn't bother with ports that were down, like recent
versions of Faucet.  There doesn't seem to be a big reason for them to be
down by default, so this commit changes them to be up by default.  It also
updates the tests to match the new behavior.

Reported-by: Brad Cowie 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-August/047234.html
Signed-off-by: Ben Pfaff 
---
 lib/netdev-dummy.c|   2 +-
 tests/ofproto-dpif.at |  20 ++---
 tests/ofproto.at  | 206 +++---
 3 files changed, 124 insertions(+), 104 deletions(-)

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index d4984674fb53..e7c2e3375b5f 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -683,7 +683,7 @@ netdev_dummy_construct(struct netdev *netdev_)
 netdev->hwaddr.ea[4] = n >> 8;
 netdev->hwaddr.ea[5] = n;
 netdev->mtu = 1500;
-netdev->flags = 0;
+netdev->flags = NETDEV_UP;
 netdev->ifindex = -EOPNOTSUPP;
 netdev->requested_n_rxq = netdev_->n_rxq;
 netdev->requested_n_txq = netdev_->n_txq;
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 362c58db437b..0953bffbaea6 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -713,7 +713,7 @@ AT_CHECK([ovs-ofctl -O OpenFlow12 add-group br0 
'group_id=1234,type=ff,bucket=wa
 AT_CHECK([ovs-ofctl -O OpenFlow12 add-flow br0 'ip 
actions=write_actions(group:1234)'])
 AT_CHECK([ovs-appctl ofproto/trace br0 
'in_port=1,dl_src=50:54:00:00:00:05,dl_dst=50:54:00:00:00:07,dl_type=0x0800,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=1,nw_tos=0,nw_ttl=128,icmp_type=8,icmp_code=0'],
 [0], [stdout])
 AT_CHECK([tail -1 stdout], [0],
-  [Datapath actions: drop
+  [Datapath actions: 10
 ])
 OVS_VSWITCHD_STOP
 AT_CLEANUP
@@ -2395,13 +2395,13 @@ OVS_APP_EXIT_AND_WAIT(ovs-ofctl)
 
 AT_CHECK([strip_metadata < ofctl_monitor.log], [0], [dnl
 NXT_PACKET_IN (xid=0x0): cookie=0xd total_len=58 in_port=1 (via action) 
data_len=58 (unbuffered)
-tcp,vlan_tci=0x,dl_src=60:66:66:66:00:06,dl_dst=50:54:00:00:00:07,nw_src=192.168.255.255,nw_dst=192.168.0.2,nw_tos=0,nw_ecn=0,nw_ttl=255,tp_src=80,tp_dst=0,tcp_flags=0
 tcp_csum:7745
+tcp,vlan_tci=0x,dl_src=60:66:66:66:00:06,dl_dst=50:54:00:00:00:07,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_tos=0,nw_ecn=0,nw_ttl=255,tp_src=80,tp_dst=0,tcp_flags=0
 tcp_csum:7744
 dnl
 NXT_PACKET_IN (xid=0x0): cookie=0xd total_len=58 in_port=1 (via action) 
data_len=58 (unbuffered)
-tcp,vlan_tci=0x,dl_src=60:66:66:66:00:06,dl_dst=50:54:00:00:00:07,nw_src=192.168.255.255,nw_dst=192.168.0.2,nw_tos=0,nw_ecn=0,nw_ttl=255,tp_src=80,tp_dst=0,tcp_flags=0
 tcp_csum:7745
+tcp,vlan_tci=0x,dl_src=60:66:66:66:00:06,dl_dst=50:54:00:00:00:07,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_tos=0,nw_ecn=0,nw_ttl=255,tp_src=80,tp_dst=0,tcp_flags=0
 tcp_csum:7744
 dnl
 NXT_PACKET_IN (xid=0x0): cookie=0xd total_len=58 in_port=1 (via action) 
data_len=58 (unbuffered)
-tcp,vlan_tci=0x,dl_src=60:66:66:66:00:06,dl_dst=50:54:00:00:00:07,nw_src=192.168.255.255,nw_dst=192.168.0.2,nw_tos=0,nw_ecn=0,nw_ttl=255,tp_src=80,tp_dst=0,tcp_flags=0
 tcp_csum:7745
+tcp,vlan_tci=0x,dl_src=60:66:66:66:00:06,dl_dst=50:54:00:00:00:07,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_tos=0,nw_ecn=0,nw_ttl=255,tp_src=80,tp_dst=0,tcp_flags=0
 tcp_csum:7744
 ])
 
 AT_CHECK([ovs-appctl time/warp 5000], [0], [ignore])
@@ -6228,7 +6228,7 @@ IFCOUNTERS
 type=6
 ifspeed=1
 direction=0
-status=0
+status=3
 in_octets=202
 in_unicasts=3
 in_multicasts=4294967295
@@ -6251,7 +6251,7 @@ IFCOUNTERS
 type=6
 ifspeed=1
 direction=0
-status=0
+status=3
 in_octets=148
 in_unicasts=2
 in_multicasts=4294967295
@@ -6274,7 +6274,7 @@ IFCOUNTERS
 type=6
 ifspeed=1
 direction=0
-status=0
+status=3
 in_octets=0
 in_unicasts=0
 in_multicasts=4294967295
@@ -6297,7 +6297,7 @@ IFCOUNTERS
 type=6
 ifspeed=1
 direction=0
-status=0
+status=3
 in_octets=0
 in_unicasts=0
 in_multicasts=4294967295
@@ -6320,7 +6320,7 @@ IFCOUNTERS
 type=6
 ifspeed=1
 direction=0
-status=0
+status=3
 in_octets=202
 in_unicasts=3
 in_multicasts=4294967295
@@ -6343,7 +6343,7 @@ IFCOUNTERS
 type=6
 ifspeed=1
 direction=0
-status=0
+status=3
 in_octets=148
 in_unicasts=2
 in_multicasts=4294967295
diff --git a/tests/ofproto.at b/tests/ofproto.at
index 9819bc577bdd..c92339a34fef 100644
--- a/tests/ofproto.at
+++ b/tests/ofproto.at
@@ -39,8 +39,8 @@ n_tables:254, n_buffers:0

[ovs-dev] [PATCH 5/6] ofproto: Consistently force off OFPPS_LIVE if port or link is down.

2018-08-24 Thread Ben Pfaff
It doesn't make sense for a port that is down to be "live" from OpenFlow's
point of view, but this could happen in OVS.

Signed-off-by: Ben Pfaff 
---
 ofproto/ofproto-dpif.c | 26 ++
 ofproto/ofproto-provider.h |  1 +
 ofproto/ofproto.c  | 43 +--
 3 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 868c728c0c14..1faf996b83c8 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -2003,16 +2003,6 @@ port_modified(struct ofport *port_)
 bfd_set_netdev(port->bfd, netdev);
 }
 
-/* Set liveness, unless the link is administratively or
- * operationally down or link monitoring false */
-if (!(port->up.pp.config & OFPUTIL_PC_PORT_DOWN) &&
-!(port->up.pp.state & OFPUTIL_PS_LINK_DOWN) &&
-port->up.may_enable) {
-port->up.pp.state |= OFPUTIL_PS_LIVE;
-} else {
-port->up.pp.state &= ~OFPUTIL_PS_LIVE;
-}
-
 ofproto_dpif_monitor_port_update(port, port->bfd, port->cfm,
  port->lldp, >up.pp.hw_addr);
 
@@ -2047,6 +2037,7 @@ port_reconfigured(struct ofport *port_, enum 
ofputil_port_config old_config)
 bundle_update(port->bundle);
 }
 }
+port_run(port);
 }
 
 static int
@@ -3609,7 +3600,7 @@ port_run(struct ofport_dpif *ofport)
 
 bool enable = may_enable_port(ofport);
 if (ofport->up.may_enable != enable) {
-ofport->up.may_enable = enable;
+ofproto_port_set_enable(>up, enable);
 
 struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofport->up.ofproto);
 ofproto->backer->need_revalidate = REV_PORT_TOGGLED;
@@ -3617,19 +3608,6 @@ port_run(struct ofport_dpif *ofport)
 if (ofport->rstp_port) {
 rstp_port_set_mac_operational(ofport->rstp_port, enable);
 }
-
-/* Propagate liveness, unless the link is administratively or
- * operationally down. */
-if (!(ofport->up.pp.config & OFPUTIL_PC_PORT_DOWN) &&
-!(ofport->up.pp.state & OFPUTIL_PS_LINK_DOWN)) {
-enum ofputil_port_state of_state = ofport->up.pp.state;
-if (enable) {
-of_state |= OFPUTIL_PS_LIVE;
-} else {
-of_state &= ~OFPUTIL_PS_LIVE;
-}
-ofproto_port_set_state(>up, of_state);
-}
 }
 }
 
diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
index 28627cbd3902..4fd8cb14ed40 100644
--- a/ofproto/ofproto-provider.h
+++ b/ofproto/ofproto-provider.h
@@ -165,6 +165,7 @@ struct ofport {
 bool may_enable;/* May be live (OFPPS_LIVE) if link is up. */
 };
 
+void ofproto_port_set_enable(struct ofport *, bool enable);
 void ofproto_port_set_state(struct ofport *, enum ofputil_port_state);
 
 /* OpenFlow table flags:
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 94e8b9576b06..9594427a2fa6 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -2457,11 +2457,34 @@ ofport_remove_with_name(struct ofproto *ofproto, const 
char *name)
 }
 }
 
+static enum ofputil_port_state
+normalize_state(enum ofputil_port_config config,
+enum ofputil_port_state state,
+bool may_enable)
+{
+return (config & OFPUTIL_PC_PORT_DOWN
+|| state & OFPUTIL_PS_LINK_DOWN
+|| !may_enable
+? state & ~OFPUTIL_PS_LIVE
+: state | OFPUTIL_PS_LIVE);
+}
+
+void
+ofproto_port_set_enable(struct ofport *port, bool enable)
+{
+if (enable != port->may_enable) {
+port->may_enable = enable;
+ofproto_port_set_state(port, normalize_state(port->pp.config,
+ port->pp.state,
+ port->may_enable));
+}
+}
 
 /* Update OpenFlow 'state' in 'port' and notify controller. */
 void
 ofproto_port_set_state(struct ofport *port, enum ofputil_port_state state)
 {
+state = normalize_state(port->pp.config, state, port->may_enable);
 if (port->pp.state != state) {
 struct ofputil_phy_port old_pp = port->pp;
 port->pp.state = state;
@@ -2611,16 +2634,18 @@ update_port(struct ofproto *ofproto, const char *name)
 port = ofproto_get_port(ofproto, ofproto_port.ofp_port);
 if (port && !strcmp(netdev_get_name(port->netdev), name)) {
 struct netdev *old_netdev = port->netdev;
-struct ofputil_phy_port old_pp = port->pp;
 
 /* ofport_open() only sets OFPUTIL_PC_PORT_DOWN and
- * OFPUTIL_PS_LINK_DOWN.  Keep the other config and state bits. */
+ * OFPUTIL_PS_LINK_DOWN.  Keep the other config and state bits (but
+ * a port that is down cannot be live). */
 pp.config |= port->pp.config & ~OFPUTIL_PC_PORT_DOWN;
 pp.state |= port->pp.state & ~OFPUTIL_PS_LINK_DOWN;
+pp.state = 

[ovs-dev] [PATCH 4/6] ofproto-dpif: Refactor port_run().

2018-08-24 Thread Ben Pfaff
This makes port_run() easier to understand but should not change its
behavior.

Signed-off-by: Ben Pfaff 
---
 ofproto/ofproto-dpif.c | 62 ++
 1 file changed, 32 insertions(+), 30 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 643c2cf325f0..868c728c0c14 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -3569,45 +3569,49 @@ ofport_update_peer(struct ofport_dpif *ofport)
 free(peer_name);
 }
 
-static void
-port_run(struct ofport_dpif *ofport)
+static bool
+may_enable_port(struct ofport_dpif *ofport)
 {
-long long int carrier_seq = netdev_get_carrier_resets(ofport->up.netdev);
-bool carrier_changed = carrier_seq != ofport->carrier_seq;
-bool enable = netdev_get_carrier(ofport->up.netdev);
-bool cfm_enable = false;
-bool bfd_enable = false;
-
-ofport->carrier_seq = carrier_seq;
-
-if (ofport->cfm) {
-int cfm_opup = cfm_get_opup(ofport->cfm);
-
-cfm_enable = !cfm_get_fault(ofport->cfm);
-
-if (cfm_opup >= 0) {
-cfm_enable = cfm_enable && cfm_opup;
-}
+/* Carrier must be up. */
+if (!netdev_get_carrier(ofport->up.netdev)) {
+return false;
 }
 
-if (ofport->bfd) {
-bfd_enable = bfd_forwarding(ofport->bfd);
+/* If CFM or BFD is enabled, then at least one of them must report that the
+ * port is up. */
+if ((ofport->bfd || ofport->cfm)
+&& !(ofport->cfm
+ && !cfm_get_fault(ofport->cfm)
+ && cfm_get_opup(ofport->cfm) != 0)
+&& !(ofport->bfd
+ && bfd_forwarding(ofport->bfd))) {
+return false;
 }
 
-if (ofport->bfd || ofport->cfm) {
-enable = enable && (cfm_enable || bfd_enable);
+/* If LACP is enabled, it must report that the link is enabled. */
+if (ofport->bundle
+&& !lacp_slave_may_enable(ofport->bundle->lacp, ofport)) {
+return false;
 }
 
-if (ofport->bundle) {
-enable = enable && lacp_slave_may_enable(ofport->bundle->lacp, ofport);
-if (carrier_changed) {
-lacp_slave_carrier_changed(ofport->bundle->lacp, ofport);
-}
+return true;
+}
+
+static void
+port_run(struct ofport_dpif *ofport)
+{
+long long int carrier_seq = netdev_get_carrier_resets(ofport->up.netdev);
+bool carrier_changed = carrier_seq != ofport->carrier_seq;
+ofport->carrier_seq = carrier_seq;
+if (carrier_changed && ofport->bundle) {
+lacp_slave_carrier_changed(ofport->bundle->lacp, ofport);
 }
 
+bool enable = may_enable_port(ofport);
 if (ofport->up.may_enable != enable) {
-struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofport->up.ofproto);
+ofport->up.may_enable = enable;
 
+struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofport->up.ofproto);
 ofproto->backer->need_revalidate = REV_PORT_TOGGLED;
 
 if (ofport->rstp_port) {
@@ -3627,8 +3631,6 @@ port_run(struct ofport_dpif *ofport)
 ofproto_port_set_state(>up, of_state);
 }
 }
-
-ofport->up.may_enable = enable;
 }
 
 static int
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 3/6] ofproto: Move may_enable from ofport_dpif to ofport.

2018-08-24 Thread Ben Pfaff
This concept of whether a port is suitable to be "live" in the sense of the
OpenFlow OFPPS_LIVE bit is a generic one that can be handled at the ofproto
layer instead of needing to be part of ofproto-dpif.

An upcoming commit will make more use of this at the ofproto layer.

Signed-off-by: Ben Pfaff 
---
 ofproto/ofproto-dpif.c | 14 ++
 ofproto/ofproto-provider.h |  1 +
 ofproto/ofproto.c  |  1 +
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index e3abda571d31..643c2cf325f0 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -137,7 +137,6 @@ struct ofport_dpif {
 struct cfm *cfm;/* Connectivity Fault Management, if any. */
 struct bfd *bfd;/* BFD, if any. */
 struct lldp *lldp;  /* lldp, if any. */
-bool may_enable;/* May be enabled in bonds. */
 bool is_tunnel; /* This port is a tunnel. */
 long long int carrier_seq;  /* Carrier status changes. */
 struct ofport_dpif *peer;   /* Peer if patch port. */
@@ -479,7 +478,7 @@ type_run(const char *type)
  ofport->rstp_port, ofport->qdscp,
  ofport->n_qdscp, ofport->up.pp.config,
  ofport->up.pp.state, ofport->is_tunnel,
- ofport->may_enable);
+ ofport->up.may_enable);
 }
 }
 xlate_txn_commit();
@@ -1848,7 +1847,6 @@ port_construct(struct ofport *port_)
 port->cfm = NULL;
 port->bfd = NULL;
 port->lldp = NULL;
-port->may_enable = false;
 port->stp_port = NULL;
 port->stp_state = STP_DISABLED;
 port->rstp_port = NULL;
@@ -2009,7 +2007,7 @@ port_modified(struct ofport *port_)
  * operationally down or link monitoring false */
 if (!(port->up.pp.config & OFPUTIL_PC_PORT_DOWN) &&
 !(port->up.pp.state & OFPUTIL_PS_LINK_DOWN) &&
-port->may_enable) {
+port->up.may_enable) {
 port->up.pp.state |= OFPUTIL_PS_LIVE;
 } else {
 port->up.pp.state &= ~OFPUTIL_PS_LIVE;
@@ -2792,7 +2790,7 @@ set_rstp_port(struct ofport *ofport_,
   ofport, netdev_get_name(ofport->up.netdev));
 update_rstp_port_state(ofport);
 /* Synchronize operational status. */
-rstp_port_set_mac_operational(rp, ofport->may_enable);
+rstp_port_set_mac_operational(rp, ofport->up.may_enable);
 }
 
 static void
@@ -3341,7 +3339,7 @@ bundle_run(struct ofbundle *bundle)
 struct ofport_dpif *port;
 
 LIST_FOR_EACH (port, bundle_node, >ports) {
-bond_slave_set_may_enable(bundle->bond, port, port->may_enable);
+bond_slave_set_may_enable(bundle->bond, port, port->up.may_enable);
 }
 
 if (bond_run(bundle->bond, lacp_status(bundle->lacp))) {
@@ -3607,7 +3605,7 @@ port_run(struct ofport_dpif *ofport)
 }
 }
 
-if (ofport->may_enable != enable) {
+if (ofport->up.may_enable != enable) {
 struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofport->up.ofproto);
 
 ofproto->backer->need_revalidate = REV_PORT_TOGGLED;
@@ -3630,7 +3628,7 @@ port_run(struct ofport_dpif *ofport)
 }
 }
 
-ofport->may_enable = enable;
+ofport->up.may_enable = enable;
 }
 
 static int
diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
index 2b77b8993ada..28627cbd3902 100644
--- a/ofproto/ofproto-provider.h
+++ b/ofproto/ofproto-provider.h
@@ -162,6 +162,7 @@ struct ofport {
 uint64_t change_seq;
 long long int created;  /* Time created, in msec. */
 int mtu;
+bool may_enable;/* May be live (OFPPS_LIVE) if link is up. */
 };
 
 void ofproto_port_set_state(struct ofport *, enum ofputil_port_state);
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 8a8a8494ca94..94e8b9576b06 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -2403,6 +2403,7 @@ ofport_install(struct ofproto *p,
 ofport->pp = *pp;
 ofport->ofp_port = pp->port_no;
 ofport->created = time_msec();
+ofport->may_enable = false;
 
 /* Add port to 'p'. */
 hmap_insert(>ports, >hmap_node,
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 2/6] ofproto: Refactor update_port().

2018-08-24 Thread Ben Pfaff
update_port() worked a little too hard to avoid copying and comparing some
bits in the ofputil_phy_port.  This seems like a simpler approach all
around.  It should behave the same way.

Signed-off-by: Ben Pfaff 
---
 ofproto/ofproto.c | 34 +-
 1 file changed, 9 insertions(+), 25 deletions(-)

diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 95235f21232c..8a8a8494ca94 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -2362,9 +2362,8 @@ ofport_open(struct ofproto *ofproto,
 return netdev;
 }
 
-/* Returns true if most fields of 'a' and 'b' are equal.  Differences in name,
- * port number, and 'config' bits other than OFPUTIL_PC_PORT_DOWN are
- * disregarded. */
+/* Returns true if most fields of 'a' and 'b' are equal.  Differences in name
+ * and port number are disregarded. */
 static bool
 ofport_equal(const struct ofputil_phy_port *a,
  const struct ofputil_phy_port *b)
@@ -2372,7 +2371,7 @@ ofport_equal(const struct ofputil_phy_port *a,
 return (eth_addr_equals(a->hw_addr, b->hw_addr)
 && eth_addr64_equals(a->hw_addr64, b->hw_addr64)
 && a->state == b->state
-&& !((a->config ^ b->config) & OFPUTIL_PC_PORT_DOWN)
+&& a->config == b->config
 && a->curr == b->curr
 && a->advertised == b->advertised
 && a->supported == b->supported
@@ -2457,26 +2456,6 @@ ofport_remove_with_name(struct ofproto *ofproto, const 
char *name)
 }
 }
 
-/* Updates 'port' with new 'pp' description.
- *
- * Does not handle a name or port number change.  The caller must implement
- * such a change as a delete followed by an add.  */
-static void
-ofport_modified(struct ofport *port, struct ofputil_phy_port *pp)
-{
-port->pp.hw_addr = pp->hw_addr;
-port->pp.hw_addr64 = pp->hw_addr64;
-port->pp.config = ((port->pp.config & ~OFPUTIL_PC_PORT_DOWN)
-| (pp->config & OFPUTIL_PC_PORT_DOWN));
-port->pp.state = ((port->pp.state & ~OFPUTIL_PS_LINK_DOWN)
-  | (pp->state & OFPUTIL_PS_LINK_DOWN));
-port->pp.curr = pp->curr;
-port->pp.advertised = pp->advertised;
-port->pp.supported = pp->supported;
-port->pp.peer = pp->peer;
-port->pp.curr_speed = pp->curr_speed;
-port->pp.max_speed = pp->max_speed;
-}
 
 /* Update OpenFlow 'state' in 'port' and notify controller. */
 void
@@ -2633,10 +2612,15 @@ update_port(struct ofproto *ofproto, const char *name)
 struct netdev *old_netdev = port->netdev;
 struct ofputil_phy_port old_pp = port->pp;
 
+/* ofport_open() only sets OFPUTIL_PC_PORT_DOWN and
+ * OFPUTIL_PS_LINK_DOWN.  Keep the other config and state bits. */
+pp.config |= port->pp.config & ~OFPUTIL_PC_PORT_DOWN;
+pp.state |= port->pp.state & ~OFPUTIL_PS_LINK_DOWN;
+
 /* 'name' hasn't changed location.  Any properties changed? */
 bool port_changed = !ofport_equal(>pp, );
 if (port_changed) {
-ofport_modified(port, );
+port->pp = pp;
 }
 
 update_mtu(ofproto, port);
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/6] connmgr: Suppress duplicate port status notifications.

2018-08-24 Thread Ben Pfaff
When the status of a port changes, ofproto calls into connmgr to notify
controllers.  Sometimes, particular changes are only visible to controllers
running specific versions of OpenFlow.  Until now, OVS would send those
controllers duplicate port status notifications.  This is unnecessary and
somewhat confusing.  This commit eliminates it.

This commit updates one of the tests not to expect duplicate notifications.

Signed-off-by: Ben Pfaff 
---
 ofproto/connmgr.c | 38 --
 ofproto/connmgr.h |  4 +++-
 ofproto/ofproto.c | 18 ++
 tests/lacp.at |  3 ---
 4 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
index f78b4c5ff411..50e0b8f991b5 100644
--- a/ofproto/connmgr.c
+++ b/ofproto/connmgr.c
@@ -1609,24 +1609,28 @@ ofconn_send(const struct ofconn *ofconn, struct ofpbuf 
*msg,
 
 /* Sending asynchronous messages. */
 
-/* Sends an OFPT_PORT_STATUS message with 'opp' and 'reason' to appropriate
+/* Sends an OFPT_PORT_STATUS message with 'new_pp' and 'reason' to appropriate
  * controllers managed by 'mgr'.  For messages caused by a controller
  * OFPT_PORT_MOD, specify 'source' as the controller connection that sent the
- * request; otherwise, specify 'source' as NULL. */
+ * request; otherwise, specify 'source' as NULL.
+ *
+ * If 'reason' is OFPPR_MODIFY and 'old_pp' is nonnull, then messages are
+ * suppressed in the case where the change would not be visible to a particular
+ * controller.  For example, OpenFlow 1.0 does not have the OFPPS_LIVE flag, so
+ * this would suppress a change solely to that flag from being sent to an
+ * OpenFlow 1.0 controller. */
 void
 connmgr_send_port_status(struct connmgr *mgr, struct ofconn *source,
- const struct ofputil_phy_port *pp, uint8_t reason)
+ const struct ofputil_phy_port *old_pp,
+ const struct ofputil_phy_port *new_pp,
+ uint8_t reason)
 {
 /* XXX Should limit the number of queued port status change messages. */
-struct ofputil_port_status ps;
-struct ofconn *ofconn;
+struct ofputil_port_status new_ps = { reason, *new_pp };
 
-ps.reason = reason;
-ps.desc = *pp;
+struct ofconn *ofconn;
 LIST_FOR_EACH (ofconn, node, >all_conns) {
 if (ofconn_receives_async_msg(ofconn, OAM_PORT_STATUS, reason)) {
-struct ofpbuf *msg;
-
 /* Before 1.5, OpenFlow specified that OFPT_PORT_MOD should not
  * generate OFPT_PORT_STATUS messages.  That requirement was a
  * relic of how OpenFlow originally supported a single controller,
@@ -1651,7 +1655,21 @@ connmgr_send_port_status(struct connmgr *mgr, struct 
ofconn *source,
 continue;
 }
 
-msg = ofputil_encode_port_status(, ofconn_get_protocol(ofconn));
+enum ofputil_protocol protocol = ofconn_get_protocol(ofconn);
+struct ofpbuf *msg = ofputil_encode_port_status(_ps, protocol);
+if (reason == OFPPR_MODIFY && old_pp) {
+struct ofputil_port_status old_ps = { reason, *old_pp };
+struct ofpbuf *old_msg = ofputil_encode_port_status(_ps,
+protocol);
+bool suppress = ofpbuf_equal(msg, old_msg);
+ofpbuf_delete(old_msg);
+
+if (suppress) {
+ofpbuf_delete(msg);
+continue;
+}
+}
+
 ofconn_send(ofconn, msg, NULL);
 }
 }
diff --git a/ofproto/connmgr.h b/ofproto/connmgr.h
index eb3be16686c5..4a22f1c26611 100644
--- a/ofproto/connmgr.h
+++ b/ofproto/connmgr.h
@@ -147,7 +147,9 @@ void ofconn_report_flow_mod(struct ofconn *, enum 
ofp_flow_mod_command);
 /* Sending asynchronous messages. */
 bool connmgr_wants_packet_in_on_miss(struct connmgr *mgr);
 void connmgr_send_port_status(struct connmgr *, struct ofconn *source,
-  const struct ofputil_phy_port *, uint8_t reason);
+  const struct ofputil_phy_port *old_pp,
+  const struct ofputil_phy_port *new_pp,
+  uint8_t reason);
 void connmgr_send_flow_removed(struct connmgr *,
const struct ofputil_flow_removed *)
 OVS_REQUIRES(ofproto_mutex);
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 9552a585d096..95235f21232c 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -2417,7 +2417,7 @@ ofport_install(struct ofproto *p,
 if (error) {
 goto error;
 }
-connmgr_send_port_status(p->connmgr, NULL, pp, OFPPR_ADD);
+connmgr_send_port_status(p->connmgr, NULL, NULL, pp, OFPPR_ADD);
 return 0;
 
 error:
@@ -2438,7 +2438,7 @@ ofport_remove(struct ofport *ofport)
 struct ofproto *p = ofport->ofproto;
 bool 

[ovs-dev] [PATCH v9 14/14] dpdk-tests: End-to-end tests for multi-seg mbufs.

2018-08-24 Thread Tiago Lam
The following tests are added to the DPDK testsuite to add some
coverage for the multi-segment mbufs:
- Check that multi-segment mbufs are disabled by default;
- Check that providing `other_config:dpdk-multi-seg-mbufs=true` indeed
  enables mbufs;
- Using a DPDK port, send a random packet out and check that `ofctl
  dump-flows` shows the correct amount of packets and bytes sent.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 tests/system-dpdk.at | 65 
 1 file changed, 65 insertions(+)

diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
index 3d21b01..af8de8c 100644
--- a/tests/system-dpdk.at
+++ b/tests/system-dpdk.at
@@ -71,3 +71,68 @@ OVS_VSWITCHD_STOP("/does not exist. The Open vSwitch kernel 
module is probably n
 ")
 AT_CLEANUP
 dnl --
+
+AT_SETUP([Jumbo frames - Multi-segment disabled by default])
+OVS_DPDK_START()
+
+AT_CHECK([grep "multi-segment mbufs enabled" ovs-vswitchd.log], [1], [])
+OVS_VSWITCHD_STOP("/Global register is changed during/d
+/EAL: No free hugepages reported in hugepages-1048576kB/d
+")
+AT_CLEANUP
+
+AT_SETUP([Jumbo frames - Multi-segment enabled])
+OVS_DPDK_START([dpdk-multi-seg-mbufs=true])
+AT_CHECK([grep "multi-segment mbufs enabled" ovs-vswitchd.log], [], [stdout])
+OVS_VSWITCHD_STOP("/Global register is changed during/d
+/EAL: No free hugepages reported in hugepages-1048576kB/d
+")
+AT_CLEANUP
+
+AT_SETUP([Jumbo frames - Multi-segment mbufs Tx])
+OVS_DPDK_PRE_CHECK()
+OVS_DPDK_START([per-port-memory=true dpdk-multi-seg-mbufs=true])
+
+dnl Add userspace bridge and attach it to OVS
+AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
+AT_CHECK([ovs-vsctl add-port br10 dpdk0 \
+-- set Interface dpdk0 type=dpdk options:dpdk-devargs=$(cat PCI_ADDR) \
+-- set Interface dpdk0 mtu_request=9000], [], [stdout], [stderr])
+
+AT_CHECK([ovs-vsctl show], [], [stdout])
+
+dnl Add flows to send packets out from the 'dpdk0' port
+AT_CHECK([
+ovs-ofctl del-flows br10
+ovs-ofctl add-flow br10 in_port=LOCAL,actions=output:dpdk0
+], [], [stdout])
+
+AT_CHECK([ovs-ofctl dump-flows br10], [], [stdout])
+
+dnl Send packet out, of the 'dpdk0' port
+AT_CHECK([
+ARP_HEADER="09000B0009000A000806000108000604000100010A\
+0100020A02"
+dnl Build a random hex string to append to the ARP_HEADER
+RANDOM_BODY=$(printf '0102030405%.0s' {1..1750})
+dnl 8792B ARP packet
+RANDOM_ARP="$ARP_HEADER$RANDOM_BODY"
+
+ovs-ofctl packet-out br10 "packet=$RANDOM_ARP,action=resubmit:LOCAL"
+], [], [stdout])
+
+AT_CHECK([ovs-ofctl dump-flows br10], [0], [stdout])
+
+dnl Confirm the single packet as been sent with correct size
+AT_CHECK([ovs-ofctl dump-flows br10 | ofctl_strip | grep in_port], [0], [dnl
+ n_packets=1, n_bytes=8792, in_port=LOCAL actions=output:1
+])
+
+dnl Clean up
+OVS_VSWITCHD_STOP("/does not exist. The Open vSwitch kernel module is probably 
not loaded./d
+/Failed to enable flow control/d
+/failed to connect to \/tmp\/dpdkvhostclient0: No such file or directory/d
+/Global register is changed during/d
+/EAL: No free hugepages reported in hugepages-1048576kB/d
+")
+AT_CLEANUP
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v9 13/14] dpdk-tests: Accept other configs in OVS_DPDK_START

2018-08-24 Thread Tiago Lam
As it stands, OVS_DPDK_START() won't allow other configs to be set
before starting the ovs-vswitchd daemon. This is a problem since some
configs, such as the "dpdk-multi-seg-mbufs=true" for enabling the
multi-segment mbufs, need to be set prior to start OvS.

To support other options, OVS_DPDK_START() has been modified to accept
extra configs in the form "$config_name=$config_value". It then uses
ovs-vsctl to set the configs.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 tests/system-dpdk-macros.at | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tests/system-dpdk-macros.at b/tests/system-dpdk-macros.at
index 0762ee0..7c65834 100644
--- a/tests/system-dpdk-macros.at
+++ b/tests/system-dpdk-macros.at
@@ -21,7 +21,7 @@ m4_define([OVS_DPDK_PRE_CHECK],
 ])
 
 
-# OVS_DPDK_START()
+# OVS_DPDK_START([other-conf-args])
 #
 # Create an empty database and start ovsdb-server. Add special configuration
 # dpdk-init to enable DPDK functionality. Start ovs-vswitchd connected to that
@@ -48,6 +48,10 @@ m4_define([OVS_DPDK_START],
AT_CHECK([lscpu], [], [stdout])
AT_CHECK([cat stdout | grep "NUMA node(s)" | awk '{c=1; while (c++<$(3)) 
{printf "1024,"}; print "1024"}' > SOCKET_MEM])
AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . 
other_config:dpdk-socket-mem="$(cat SOCKET_MEM)"])
+   dnl Iterate through $other-conf-args list and include them
+   m4_foreach_w(opt, $1, [
+   AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:opt])
+   ])
 
dnl Start ovs-vswitchd.
AT_CHECK([ovs-vswitchd --detach --no-chdir --pidfile --log-file -vvconn 
-vofproto_dpif -vunixctl], [0], [stdout], [stderr])
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v9 12/14] dpdk-tests: Add unit-tests for multi-seg mbufs.

2018-08-24 Thread Tiago Lam
In order to create a minimal environment that allows the tests to get
mbufs from an existing mempool, the following approach is taken:
- EAL is initialised (by using the main dpdk_init()) and a (very) small
  mempool is instantiated (mimicking the logic in dpdk_mp_create()).
  This mempool instance is global and used by all the tests;
- Packets are then allocated from the instantiated mempool, and tested
  on, by running some operations on them and manipulating data.

The tests introduced focus on testing DPDK dp_packets (where
source=DPBUF_DPDK), linked with a single or multiple mbufs, across
several operations, such as:
- dp_packet_put();
- dp_packet_shift();
- dp_packet_reserve();
- dp_packet_push_uninit();
- dp_packet_clear();
- dp_packet_equal();
- dp_packet_linear_data();
- And as a consequence of some of these, dp_packet_put_uninit() and
  dp_packet_resize__().

Finally, this has also been integrated with the new DPDK testsuite.
Thus, when running `$sudo make check-dpdk` one will also be running
these tests.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 tests/automake.mk  |  10 +-
 tests/dpdk-packet-mbufs.at |   7 +
 tests/system-dpdk-testsuite.at |   1 +
 tests/test-dpdk-mbufs.c| 619 +
 4 files changed, 636 insertions(+), 1 deletion(-)
 create mode 100644 tests/dpdk-packet-mbufs.at
 create mode 100644 tests/test-dpdk-mbufs.c

diff --git a/tests/automake.mk b/tests/automake.mk
index 49ceb41..f484f69 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -135,7 +135,8 @@ SYSTEM_DPDK_TESTSUITE_AT = \
tests/system-common-macros.at \
tests/system-dpdk-macros.at \
tests/system-dpdk-testsuite.at \
-   tests/system-dpdk.at
+   tests/system-dpdk.at \
+   tests/dpdk-packet-mbufs.at
 
 check_SCRIPTS += tests/atlocal
 
@@ -392,6 +393,10 @@ tests_ovstest_SOURCES = \
tests/test-vconn.c \
tests/test-aa.c \
tests/test-stopwatch.c
+if DPDK_NETDEV
+tests_ovstest_SOURCES += \
+   tests/test-dpdk-mbufs.c
+endif
 
 if !WIN32
 tests_ovstest_SOURCES += \
@@ -404,6 +409,9 @@ tests_ovstest_SOURCES += \
 endif
 
 tests_ovstest_LDADD = lib/libopenvswitch.la ovn/lib/libovn.la
+if DPDK_NETDEV
+tests_ovstest_LDFLAGS = $(AM_LDFLAGS) $(DPDK_vswitchd_LDFLAGS)
+endif
 
 noinst_PROGRAMS += tests/test-strtok_r
 tests_test_strtok_r_SOURCES = tests/test-strtok_r.c
diff --git a/tests/dpdk-packet-mbufs.at b/tests/dpdk-packet-mbufs.at
new file mode 100644
index 000..f28e4fc
--- /dev/null
+++ b/tests/dpdk-packet-mbufs.at
@@ -0,0 +1,7 @@
+AT_BANNER([OVS-DPDK dp_packet unit tests])
+
+AT_SETUP([OVS-DPDK dp_packet - mbufs allocation])
+AT_KEYWORDS([dp_packet, multi-seg, mbufs])
+AT_CHECK(ovstest test-dpdk-packet, [], [ignore], [ignore])
+
+AT_CLEANUP
diff --git a/tests/system-dpdk-testsuite.at b/tests/system-dpdk-testsuite.at
index 382f09e..f5edf58 100644
--- a/tests/system-dpdk-testsuite.at
+++ b/tests/system-dpdk-testsuite.at
@@ -23,3 +23,4 @@ m4_include([tests/system-common-macros.at])
 m4_include([tests/system-dpdk-macros.at])
 
 m4_include([tests/system-dpdk.at])
+m4_include([tests/dpdk-packet-mbufs.at])
diff --git a/tests/test-dpdk-mbufs.c b/tests/test-dpdk-mbufs.c
new file mode 100644
index 000..19081a3
--- /dev/null
+++ b/tests/test-dpdk-mbufs.c
@@ -0,0 +1,619 @@
+/*
+ * Copyright (c) 2018 Intel Corporation
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "dp-packet.h"
+#include "ovstest.h"
+#include "dpdk.h"
+#include "smap.h"
+
+#define N_MBUFS 1024
+#define MBUF_DATA_LEN 2048
+
+static int num_tests = 0;
+
+/* Global var to hold a mempool instance, "test-mp", used in all of the tests
+ * below. This instance is instantiated in dpdk_setup_eal_with_mp(). */
+static struct rte_mempool *mp;
+
+/* Test data used to fill the packets with data. Note that this isn't a string
+ * that repsents a valid packet, by any means. The pattern is generated in set_
+ * testing_pattern_str() and the sole purpose is to verify the data remains the
+ * same after inserting and operating on multi-segment mbufs. */
+static char *test_str;
+
+/* Asserts a dp_packet that holds a single mbuf, where:
+ * - nb_segs must be 1;
+ * - pkt_len must be equal to data_len which in turn must equal the provided
+ *   'pkt_len';
+ * - data_off must start at the provided 'data_ofs';
+ * - next must be NULL. 

[ovs-dev] [PATCH v9 11/14] netdev-dpdk: support multi-segment jumbo frames

2018-08-24 Thread Tiago Lam
From: Mark Kavanagh 

Currently, jumbo frame support for OvS-DPDK is implemented by
increasing the size of mbufs within a mempool, such that each mbuf
within the pool is large enough to contain an entire jumbo frame of
a user-defined size. Typically, for each user-defined MTU,
'requested_mtu', a new mempool is created, containing mbufs of size
~requested_mtu.

With the multi-segment approach, a port uses a single mempool,
(containing standard/default-sized mbufs of ~2k bytes), irrespective
of the user-requested MTU value. To accommodate jumbo frames, mbufs
are chained together, where each mbuf in the chain stores a portion of
the jumbo frame. Each mbuf in the chain is termed a segment, hence the
name.

== Enabling multi-segment mbufs ==
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide on which approach to adopt on init. The introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This
is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Co-authored-by: Tiago Lam 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 Documentation/topics/dpdk/jumbo-frames.rst | 67 ++
 Documentation/topics/dpdk/memory.rst   | 36 
 NEWS   |  1 +
 lib/dpdk.c |  8 
 lib/netdev-dpdk.c  | 66 +
 lib/netdev-dpdk.h  |  2 +
 vswitchd/vswitch.xml   | 22 ++
 7 files changed, 194 insertions(+), 8 deletions(-)

diff --git a/Documentation/topics/dpdk/jumbo-frames.rst 
b/Documentation/topics/dpdk/jumbo-frames.rst
index 00360b4..07bf3ca 100644
--- a/Documentation/topics/dpdk/jumbo-frames.rst
+++ b/Documentation/topics/dpdk/jumbo-frames.rst
@@ -71,3 +71,70 @@ Jumbo frame support has been validated against 9728B frames, 
which is the
 largest frame size supported by Fortville NIC using the DPDK i40e driver, but
 larger frames and other DPDK NIC drivers may be supported. These cases are
 common for use cases involving East-West traffic only.
+
+---
+Multi-segment mbufs
+---
+
+Instead of increasing the size of mbufs within a mempool, such that each mbuf
+within the pool is large enough to contain an entire jumbo frame of a
+user-defined size, mbufs can be chained together instead. In this approach each
+mbuf in the chain stores a portion of the jumbo frame, by default ~2K bytes,
+irrespective of the user-requested MTU value. Since each mbuf in the chain is
+termed a segment, this approach is named "multi-segment mbufs".
+
+This approach may bring more flexibility in use cases where the maximum packet
+length may be hard to guess. For example, in cases where packets originate from
+sources marked for oflload (such as TSO), each packet may be larger than the
+MTU, and as such, when forwarding it to a DPDK port a single mbuf may not be
+enough to hold all of the packet's data.
+
+Multi-segment and single-segment mbufs are mutually exclusive, and the user
+must decide on which approach to adopt on initialisation. If multi-segment
+mbufs is to be enabled, it can be done so with the following command::
+
+$ ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
+
+Single-segment mbufs still remain the default when using OvS-DPDK, and the
+above option `dpdk-multi-seg-mbufs` must be explicitly set to `true` if
+multi-segment mbufs are to be used.
+
+~
+Performance notes
+~
+
+When using multi-segment mbufs some PMDs may not support vectorized Tx
+functions, due to its non-contiguous nature. As a result this can hit
+performance for smaller packet sizes. For example, on a setup sending 64B
+packets at line rate, a decrease of ~20% has been observed. The performance
+impact stops being noticeable for larger packet sizes, although the exact size
+will between PMDs, and depending on the architecture one's using.
+
+Tests performed with the i40e PMD driver only showed this limitation for 64B
+packets, and the same rate was observed when comparing multi-segment mbufs and
+single-segment mbuf for 128B packets. In other words, the 20% drop in
+performance was not observed for packets >= 128B during this test case.
+
+Because of this, multi-segment mbufs is not advised to be used with smaller
+packet 

[ovs-dev] [PATCH v9 10/14] netdev-dpdk: copy large packet to multi-seg. mbufs

2018-08-24 Thread Tiago Lam
From: Mark Kavanagh 

Currently, packets are only copied to a single segment in the function
dpdk_do_tx_copy(). This could be an issue in the case of jumbo frames,
particularly when multi-segment mbufs are involved.

This patch calculates the number of segments needed by a packet and
copies the data to each segment.

A new function, dpdk_buf_alloc(), has also been introduced as a wrapper
around the nonpmd_mp_mutex to serialise allocations from a non-pmd
context.

Co-authored-by: Michael Qiu 
Co-authored-by: Tiago Lam 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Michael Qiu 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/netdev-dpdk.c | 84 +--
 1 file changed, 75 insertions(+), 9 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index e005d00..5ab1af3 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -552,6 +552,27 @@ dpdk_rte_mzalloc(size_t sz)
 return rte_zmalloc(OVS_VPORT_DPDK, sz, OVS_CACHE_LINE_SIZE);
 }
 
+static struct rte_mbuf *
+dpdk_buf_alloc(struct rte_mempool *mp)
+{
+struct rte_mbuf *mbuf = NULL;
+
+/* If non-pmd we need to lock on nonpmd_mp_mutex mutex */
+if (!dpdk_thread_is_pmd()) {
+ovs_mutex_lock(_mp_mutex);
+
+mbuf = rte_pktmbuf_alloc(mp);
+
+ovs_mutex_unlock(_mp_mutex);
+
+return mbuf;
+}
+
+mbuf = rte_pktmbuf_alloc(mp);
+
+return mbuf;
+}
+
 void
 free_dpdk_buf(struct dp_packet *packet)
 {
@@ -2316,6 +2337,49 @@ out:
 }
 }
 
+static int
+dpdk_copy_dp_packet_to_mbuf(struct dp_packet *packet, struct rte_mbuf **head,
+struct rte_mempool *mp)
+{
+struct rte_mbuf *mbuf, *fmbuf;
+uint32_t size = dp_packet_size(packet);
+uint16_t max_data_len;
+uint32_t nb_segs = 0;
+
+/* Allocate first mbuf to know the size of data available */
+fmbuf = mbuf = *head = dpdk_buf_alloc(mp);
+if (OVS_UNLIKELY(!mbuf)) {
+return ENOMEM;
+}
+
+/* All new allocated mbuf's max data len is the same */
+max_data_len = mbuf->buf_len - mbuf->data_off;
+
+/* Calculate # of output mbufs. */
+nb_segs = size / max_data_len;
+if (size % max_data_len) {
+nb_segs = nb_segs + 1;
+}
+
+/* Allocate additional mbufs, less the one alredy allocated above */
+for (int i = 1; i < nb_segs; i++) {
+mbuf->next = dpdk_buf_alloc(mp);
+if (!mbuf->next) {
+free_dpdk_buf(CONTAINER_OF(fmbuf, struct dp_packet, mbuf));
+fmbuf = NULL;
+return ENOMEM;
+}
+mbuf = mbuf->next;
+}
+
+fmbuf->nb_segs = nb_segs;
+fmbuf->pkt_len = size;
+
+dp_packet_mbuf_write(fmbuf, 0, size, dp_packet_linear_data(packet));
+
+return 0;
+}
+
 /* Tx function. Transmit packets indefinitely */
 static void
 dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
@@ -2332,6 +2396,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct 
dp_packet_batch *batch)
 struct rte_mbuf *pkts[PKT_ARRAY_SIZE];
 uint32_t cnt = batch_cnt;
 uint32_t dropped = 0;
+uint32_t i;
 
 if (dev->type != DPDK_DEV_VHOST) {
 /* Check if QoS has been configured for this netdev. */
@@ -2342,28 +2407,29 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct 
dp_packet_batch *batch)
 
 uint32_t txcnt = 0;
 
-for (uint32_t i = 0; i < cnt; i++) {
+for (i = 0; i < cnt; i++) {
 struct dp_packet *packet = batch->packets[i];
 uint32_t size = dp_packet_size(packet);
+int err = 0;
 
 if (OVS_UNLIKELY(size > dev->max_packet_len)) {
 VLOG_WARN_RL(, "Too big size %u max_packet_len %d",
  size, dev->max_packet_len);
-
 dropped++;
 continue;
 }
 
-pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
-if (OVS_UNLIKELY(!pkts[txcnt])) {
+err = dpdk_copy_dp_packet_to_mbuf(packet, [txcnt],
+  dev->dpdk_mp->mp);
+if (err != 0) {
+if (err == ENOMEM) {
+VLOG_ERR_RL(, "Failed to alloc mbufs! %u packets dropped",
+cnt - i);
+}
+
 dropped += cnt - i;
 break;
 }
-
-/* We have to do a copy for now */
-memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *),
-   dp_packet_data(packet), size);
-dp_packet_set_size((struct dp_packet *)pkts[txcnt], size);
 dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet);
 
 txcnt++;
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v9 09/14] dp-packet: Add support for data "linearization".

2018-08-24 Thread Tiago Lam
Previous commits have added support to the dp_packet API to handle
multi-segmented packets, where data is not stored contiguously in
memory. However, in some cases, it is inevitable and data must be
provided contiguously. Examples of such cases are when performing csums
over the entire packet data, or when write()'ing to a file descriptor
(for a tap interface, for example). For such cases, the dp_packet API
has been extended to provide a way to transform a multi-segmented
DPBUF_DPDK packet into a DPBUF_MALLOC system packet (at the expense of a
copy of memory). If the packet's data is already stored in memory
contigously then there's no need to convert the packet.

Additionally, the main use cases that were assuming that a dp_packet's
data is always held contiguously in memory were changed to make use of
the new "linear functions" in the dp_packet API when there's a need to
traverse the entire's packet data. Per the example above, when the
packet's data needs to be write() to the tap's file descriptor, or when
the conntrack module needs to verify a packet's checksum, the data is
now linearized.

Signed-off-by: Tiago Lam 
---
 lib/bfd.c |  3 +-
 lib/conntrack.c   | 17 +
 lib/dp-packet.c   | 18 +
 lib/dp-packet.h   | 89 +++
 lib/dpif-netlink.c|  2 +-
 lib/dpif.c|  2 +-
 lib/netdev-bsd.c  |  2 +-
 lib/netdev-dummy.c|  5 ++-
 lib/netdev-linux.c|  5 ++-
 lib/netdev-native-tnl.c   | 10 -
 lib/odp-execute.c |  2 +-
 lib/ofp-print.c   |  2 +-
 lib/ovs-lldp.c|  3 +-
 lib/packets.c |  3 +-
 ofproto/ofproto-dpif-sflow.c  |  2 +-
 ofproto/ofproto-dpif-upcall.c |  2 +-
 ofproto/ofproto-dpif-xlate.c  | 12 --
 17 files changed, 145 insertions(+), 34 deletions(-)

diff --git a/lib/bfd.c b/lib/bfd.c
index 5308262..d50d2da 100644
--- a/lib/bfd.c
+++ b/lib/bfd.c
@@ -722,7 +722,8 @@ bfd_process_packet(struct bfd *bfd, const struct flow *flow,
 if (!msg) {
 VLOG_INFO_RL(, "%s: Received too-short BFD control message (only "
  "%"PRIdPTR" bytes long, at least %d required).",
- bfd->name, (uint8_t *) dp_packet_tail(p) - l7,
+ bfd->name, dp_packet_size(p) -
+ (l7 - (uint8_t *) dp_packet_data(p)),
  BFD_PACKET_LEN);
 goto out;
 }
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 974f985..15d1ed2 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -636,6 +636,8 @@ reverse_pat_packet(struct dp_packet *pkt, const struct conn 
*conn)
 static void
 reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn)
 {
+void *l3 = dp_packet_linear_ofs(pkt, pkt->l3_ofs);
+void *l4 = dp_packet_linear_ofs(pkt, pkt->l4_ofs);
 char *tail = dp_packet_tail(pkt);
 char pad = dp_packet_l2_pad_size(pkt);
 struct conn_key inner_key;
@@ -644,8 +646,8 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn 
*conn)
 uint16_t orig_l4_ofs = pkt->l4_ofs;
 
 if (conn->key.dl_type == htons(ETH_TYPE_IP)) {
-struct ip_header *nh = dp_packet_l3(pkt);
-struct icmp_header *icmp = dp_packet_l4(pkt);
+struct ip_header *nh = l3;
+struct icmp_header *icmp = l4;
 struct ip_header *inner_l3 = (struct ip_header *) (icmp + 1);
 extract_l3_ipv4(_key, inner_l3, tail - ((char *)inner_l3) - pad,
 _l4, false);
@@ -664,8 +666,8 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn 
*conn)
 icmp->icmp_csum = 0;
 icmp->icmp_csum = csum(icmp, tail - (char *) icmp - pad);
 } else {
-struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt);
-struct icmp6_error_header *icmp6 = dp_packet_l4(pkt);
+struct ovs_16aligned_ip6_hdr *nh6 = l3;
+struct icmp6_error_header *icmp6 = l4;
 struct ovs_16aligned_ip6_hdr *inner_l3_6 =
 (struct ovs_16aligned_ip6_hdr *) (icmp6 + 1);
 extract_l3_ipv6(_key, inner_l3_6,
@@ -1320,6 +1322,7 @@ conntrack_execute(struct conntrack *ct, struct 
dp_packet_batch *pkt_batch,
 write_ct_md(packet, zone, NULL, NULL, NULL);
 continue;
 }
+
 process_one(ct, packet, , zone, force, commit, now, setmark,
 setlabel, nat_action_info, tp_src, tp_dst, helper);
 }
@@ -1902,8 +1905,8 @@ conn_key_extract(struct conntrack *ct, struct dp_packet 
*pkt, ovs_be16 dl_type,
  struct conn_lookup_ctx *ctx, uint16_t zone)
 {
 const struct eth_header *l2 = dp_packet_eth(pkt);
-const struct ip_header *l3 = dp_packet_l3(pkt);
-const char *l4 = dp_packet_l4(pkt);
+const struct ip_header *l3 = dp_packet_linear_ofs(pkt, pkt->l3_ofs);
+const char *l4 = dp_packet_linear_ofs(pkt, pkt->l4_ofs);
 
 memset(ctx, 0, sizeof *ctx);
 
@@ 

[ovs-dev] [PATCH v9 08/14] dp-packet: copy data from multi-seg. DPDK mbuf

2018-08-24 Thread Tiago Lam
From: Michael Qiu 

When doing packet clone, if packet source is from DPDK driver,
multi-segment must be considered, and copy the segment's data one by
one.

Also, lots of DPDK mbuf's info is missed during a copy, like packet
type, ol_flags, etc.  That information is very important for DPDK to do
packets processing.

Co-authored-by: Mark Kavanagh 
Co-authored-by: Tiago Lam 

Signed-off-by: Michael Qiu 
Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.c   | 69 ++-
 lib/dp-packet.h   |  3 +++
 lib/netdev-dpdk.c |  1 +
 3 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 167bf43..806640b 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -48,6 +48,22 @@ dp_packet_use__(struct dp_packet *b, void *base, size_t 
allocated,
 dp_packet_set_size(b, 0);
 }
 
+#ifdef DPDK_NETDEV
+void
+dp_packet_copy_mbuf_flags(struct dp_packet *dst, const struct dp_packet *src)
+{
+ovs_assert(dst != NULL && src != NULL);
+struct rte_mbuf *buf_dst = &(dst->mbuf);
+struct rte_mbuf buf_src = src->mbuf;
+
+buf_dst->ol_flags = buf_src.ol_flags;
+buf_dst->packet_type = buf_src.packet_type;
+buf_dst->tx_offload = buf_src.tx_offload;
+}
+#else
+#define dp_packet_copy_mbuf_flags(arg1, arg2)
+#endif
+
 /* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
  * memory starting at 'base'.  'base' should be the first byte of a region
  * obtained from malloc().  It will be freed (with free()) if 'b' is resized or
@@ -158,6 +174,44 @@ dp_packet_clone(const struct dp_packet *buffer)
 return dp_packet_clone_with_headroom(buffer, 0);
 }
 
+#ifdef DPDK_NETDEV
+struct dp_packet *
+dp_packet_clone_with_headroom(const struct dp_packet *b, size_t headroom) {
+struct dp_packet *new_buffer;
+uint32_t pkt_len = dp_packet_size(b);
+
+/* copy multi-seg data */
+if (b->source == DPBUF_DPDK && !rte_pktmbuf_is_contiguous(>mbuf)) {
+void *dst = NULL;
+struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, >mbuf);
+
+new_buffer = dp_packet_new_with_headroom(pkt_len, headroom);
+dst = dp_packet_data(new_buffer);
+dp_packet_set_size(new_buffer, pkt_len);
+
+if (!rte_pktmbuf_read(mbuf, 0, pkt_len, dst)) {
+return NULL;
+}
+} else {
+new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(b),
+dp_packet_size(b),
+headroom);
+}
+
+/* Copy the following fields into the returned buffer: l2_pad_size,
+ * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
+memcpy(_buffer->l2_pad_size, >l2_pad_size,
+   sizeof(struct dp_packet) -
+   offsetof(struct dp_packet, l2_pad_size));
+
+dp_packet_copy_mbuf_flags(new_buffer, b);
+if (dp_packet_rss_valid(new_buffer)) {
+new_buffer->mbuf.hash.rss = b->mbuf.hash.rss;
+}
+
+return new_buffer;
+}
+#else
 /* Creates and returns a new dp_packet whose data are copied from 'buffer'.
  * The returned dp_packet will additionally have 'headroom' bytes of
  * headroom. */
@@ -165,32 +219,25 @@ struct dp_packet *
 dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
 {
 struct dp_packet *new_buffer;
+uint32_t pkt_len = dp_packet_size(buffer);
 
 new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer),
- dp_packet_size(buffer),
- headroom);
+ pkt_len, headroom);
+
 /* Copy the following fields into the returned buffer: l2_pad_size,
  * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
 memcpy(_buffer->l2_pad_size, >l2_pad_size,
 sizeof(struct dp_packet) -
 offsetof(struct dp_packet, l2_pad_size));
 
-#ifdef DPDK_NETDEV
-new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags;
-#else
 new_buffer->rss_hash_valid = buffer->rss_hash_valid;
-#endif
-
 if (dp_packet_rss_valid(new_buffer)) {
-#ifdef DPDK_NETDEV
-new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss;
-#else
 new_buffer->rss_hash = buffer->rss_hash;
-#endif
 }
 
 return new_buffer;
 }
+#endif
 
 /* Creates and returns a new dp_packet that initially contains a copy of the
  * 'size' bytes of data starting at 'data' with no headroom or tailroom. */
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 3a99044..022e420 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -124,6 +124,9 @@ void dp_packet_init_dpdk(struct dp_packet *);
 void dp_packet_init(struct dp_packet *, size_t);
 void dp_packet_uninit(struct dp_packet *);
 
+void dp_packet_copy_mbuf_flags(struct dp_packet *dst,
+   const struct dp_packet *src);
+
 

[ovs-dev] [PATCH v9 07/14] dp-packet: Handle multi-seg mubfs in shift() func.

2018-08-24 Thread Tiago Lam
In its current implementation dp_packet_shift() is also unaware of
multi-seg mbufs (that holds data in memory non-contiguously) and assumes
that data exists contiguously in memory, memmove'ing data to perform the
shift.

To add support for multi-seg mbuds a new set of functions was
introduced, dp_packet_mbuf_shift() and dp_packet_mbuf_write(). These
functions are used by dp_packet_shift(), when handling multi-seg mbufs,
to shift and write data within a chain of mbufs.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.c | 100 
 lib/dp-packet.h |  10 ++
 2 files changed, 110 insertions(+)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 2aaeaae..167bf43 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -294,6 +294,100 @@ dp_packet_prealloc_headroom(struct dp_packet *b, size_t 
size)
 }
 }
 
+#ifdef DPDK_NETDEV
+/* Write len data bytes in a mbuf at specified offset.
+ *
+ * 'mbuf', pointer to the destination mbuf where 'ofs' is, and the mbuf where
+ * the data will first be written.
+ * 'ofs', the offset within the provided 'mbuf' where 'data' is to be written.
+ * 'len', the size of the to be written 'data'.
+ * 'data', pointer to the to be written bytes.
+ *
+ * XXX: This function is the counterpart of the `rte_pktmbuf_read()` function
+ * available with DPDK, in the rte_mbuf.h */
+void
+dp_packet_mbuf_write(struct rte_mbuf *mbuf, int16_t ofs, uint32_t len,
+ const void *data)
+{
+char *dst_addr;
+uint16_t data_len;
+int len_copy;
+while (mbuf) {
+if (len == 0) {
+break;
+}
+
+dst_addr = rte_pktmbuf_mtod_offset(mbuf, char *, ofs);
+data_len = MBUF_BUF_END(mbuf->buf_addr, mbuf->buf_len) - dst_addr;
+
+len_copy = MIN(len, data_len);
+/* We don't know if 'data' is the result of a rte_pktmbuf_read() call,
+ * in which case we may end up writing to the same region of memory we
+ * are reading from and overlapping. Hence the use of memmove() here */
+memmove(dst_addr, data, len_copy);
+
+data = ((char *) data) + len_copy;
+len -= len_copy;
+ofs = 0;
+
+mbuf->data_len = len_copy;
+mbuf = mbuf->next;
+}
+}
+
+static void
+dp_packet_mbuf_shift_(struct rte_mbuf *dbuf, int16_t dst_ofs,
+  const struct rte_mbuf *sbuf, uint16_t src_ofs, int len)
+{
+char *rd = xmalloc(sizeof(*rd) * len);
+const char *wd = rte_pktmbuf_read(sbuf, src_ofs, len, rd);
+
+ovs_assert(wd);
+
+dp_packet_mbuf_write(dbuf, dst_ofs, len, wd);
+
+free(rd);
+}
+
+/* Similarly to dp_packet_shift(), shifts the data within the mbufs of a
+ * dp_packet of DPBUF_DPDK source by 'delta' bytes.
+ * Caller must make sure of the following conditions:
+ * - When shifting left, delta can't be bigger than the data_len available in
+ *   the last mbuf;
+ * - When shifting right, delta can't be bigger than the space available in the
+ *   first mbuf (buf_len - data_off).
+ * Both these conditions guarantee that a shift operation doesn't fall outside
+ * the bounds of the existing mbufs, so that the first and last mbufs (when
+ * using multi-segment mbufs), remain the same. */
+static void
+dp_packet_mbuf_shift(struct dp_packet *b, int delta)
+{
+uint16_t src_ofs;
+int16_t dst_ofs;
+
+struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, >mbuf);
+struct rte_mbuf *tmbuf = rte_pktmbuf_lastseg(mbuf);
+
+if (delta < 0) {
+ovs_assert(-delta <= tmbuf->data_len);
+} else {
+ovs_assert(delta < (mbuf->buf_len - mbuf->data_off));
+}
+
+/* Set the destination and source offsets to copy to */
+dst_ofs = delta;
+src_ofs = 0;
+
+/* Shift data from src mbuf and offset to dst mbuf and offset */
+dp_packet_mbuf_shift_(mbuf, dst_ofs, mbuf, src_ofs,
+  rte_pktmbuf_pkt_len(mbuf));
+
+/* Update mbufs' properties, and if using multi-segment mbufs, first and
+ * last mbuf's data_len also needs to be adjusted */
+mbuf->data_off = mbuf->data_off + dst_ofs;
+}
+#endif
+
 /* Shifts all of the data within the allocated space in 'b' by 'delta' bytes.
  * For example, a 'delta' of 1 would cause each byte of data to move one byte
  * forward (from address 'p' to 'p+1'), and a 'delta' of -1 would cause each
@@ -306,6 +400,12 @@ dp_packet_shift(struct dp_packet *b, int delta)
: true);
 
 if (delta != 0) {
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+dp_packet_mbuf_shift(b, delta);
+return;
+}
+#endif
 char *dst = (char *) dp_packet_data(b) + delta;
 memmove(dst, dp_packet_data(b), dp_packet_size(b));
 dp_packet_set_data(b, dst);
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 48be19b..3a99044 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -80,6 +80,11 @@ struct dp_packet {
 };
 };
 
+#ifdef 

[ovs-dev] [PATCH v9 06/14] dp-packet: Handle multi-seg mbufs in helper funcs.

2018-08-24 Thread Tiago Lam
Most helper functions in dp-packet assume that the data held by a
dp_packet is contiguous, and perform operations such as pointer
arithmetic under that assumption. However, with the introduction of
multi-segment mbufs, where data is non-contiguous, such assumptions are
no longer possible. Some examples of Such helper functions are
dp_packet_tail(), dp_packet_tailroom(), dp_packet_end(),
dp_packet_get_allocated() and dp_packet_at().

Thus, instead of assuming contiguous data in dp_packet, they  now
iterate over the (non-contiguous) data in mbufs to perform their
calculations.

Finally, dp_packet_use__() has also been modified to perform the
initialisation of the packet (and setting the source) before continuing
to set its size and data length, which now depends on the type of
packet.

Co-authored-by: Mark Kavanagh 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.c |   4 +-
 lib/dp-packet.h | 150 +++-
 2 files changed, 140 insertions(+), 14 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 782e7c2..2aaeaae 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -41,11 +41,11 @@ static void
 dp_packet_use__(struct dp_packet *b, void *base, size_t allocated,
  enum dp_packet_source source)
 {
+dp_packet_init__(b, allocated, source);
+
 dp_packet_set_base(b, base);
 dp_packet_set_data(b, base);
 dp_packet_set_size(b, 0);
-
-dp_packet_init__(b, allocated, source);
 }
 
 /* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index d2803af..48be19b 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -185,9 +185,25 @@ dp_packet_delete(struct dp_packet *b)
 static inline void *
 dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
 {
-return offset + size <= dp_packet_size(b)
-   ? (char *) dp_packet_data(b) + offset
-   : NULL;
+if (offset + size > dp_packet_size(b)) {
+return NULL;
+}
+
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, >mbuf);
+
+while (buf && offset > buf->data_len) {
+offset -= buf->data_len;
+
+buf = buf->next;
+}
+
+return buf ? rte_pktmbuf_mtod_offset(buf, char *, offset) : NULL;
+}
+#endif
+
+return (char *) dp_packet_data(b) + offset;
 }
 
 /* Returns a pointer to byte 'offset' in 'b', which must contain at least
@@ -196,13 +212,23 @@ static inline void *
 dp_packet_at_assert(const struct dp_packet *b, size_t offset, size_t size)
 {
 ovs_assert(offset + size <= dp_packet_size(b));
-return ((char *) dp_packet_data(b)) + offset;
+return dp_packet_at(b, offset, size);
 }
 
 /* Returns a pointer to byte following the last byte of data in use in 'b'. */
 static inline void *
 dp_packet_tail(const struct dp_packet *b)
 {
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, >mbuf);
+/* Find last segment where data ends, meaning the tail of the chained
+ *  mbufs must be there */
+buf = rte_pktmbuf_lastseg(buf);
+
+return rte_pktmbuf_mtod_offset(buf, void *, buf->data_len);
+}
+#endif
 return (char *) dp_packet_data(b) + dp_packet_size(b);
 }
 
@@ -211,6 +237,15 @@ dp_packet_tail(const struct dp_packet *b)
 static inline void *
 dp_packet_end(const struct dp_packet *b)
 {
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &(b->mbuf));
+
+buf = rte_pktmbuf_lastseg(buf);
+
+return (char *) buf->buf_addr + buf->buf_len;
+}
+#endif
 return (char *) dp_packet_base(b) + dp_packet_get_allocated(b);
 }
 
@@ -236,6 +271,15 @@ dp_packet_tailroom(const struct dp_packet *b)
 static inline void
 dp_packet_clear(struct dp_packet *b)
 {
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+/* sets pkt_len and data_len to zero and frees unused mbufs */
+dp_packet_set_size(b, 0);
+rte_pktmbuf_reset(>mbuf);
+
+return;
+}
+#endif
 dp_packet_set_data(b, dp_packet_base(b));
 dp_packet_set_size(b, 0);
 }
@@ -252,24 +296,38 @@ dp_packet_pull(struct dp_packet *b, size_t size)
 return data;
 }
 
+#ifdef DPDK_NETDEV
+/* Similar to dp_packet_try_pull() but doesn't actually pull any data, only
+ * checks if it could and returns true or false accordingly.
+ *
+ * Valid for dp_packets carrying mbufs only. */
+static inline bool
+dp_packet_mbuf_may_pull(const struct dp_packet *b, size_t size) {
+if (size > b->mbuf.data_len) {
+return false;
+}
+
+return true;
+}
+#endif
+
 /* If 'b' has at least 'size' bytes of data, removes that many bytes from the
  * head end of 'b' and returns the first byte removed.  Otherwise, returns a
  * null pointer without modifying 

[ovs-dev] [PATCH v9 05/14] dp-packet: Fix data_len handling multi-seg mbufs.

2018-08-24 Thread Tiago Lam
When a dp_packet is from a DPDK source, and it contains multi-segment
mbufs, the data_len is not equal to the packet size, pkt_len. Instead,
the data_len of each mbuf in the chain should be considered while
distributing the new (provided) size.

To account for the above dp_packet_set_size() has been changed so that,
in the multi-segment mbufs case, only the data_len on the last mbuf of
the chain and the total size of the packet, pkt_len, are changed. The
data_len on the intermediate mbufs preceeding the last mbuf is not
changed by dp_packet_set_size(). Furthermore, in some cases
dp_packet_set_size() may be used to set a smaller size than the current
packet size, thus effectively trimming the end of the packet. In the
multi-segment mbufs case this may lead to lingering mbufs that may need
freeing.

__dp_packet_set_data() now also updates an mbufs' data_len after setting
the data offset. This is so that both fields are always in sync for each
mbuf in a chain.

Co-authored-by: Michael Qiu 
Co-authored-by: Mark Kavanagh 
Co-authored-by: Przemyslaw Lal 
Co-authored-by: Marcin Ksiadz 
Co-authored-by: Yuanhan Liu 

Signed-off-by: Michael Qiu 
Signed-off-by: Mark Kavanagh 
Signed-off-by: Przemyslaw Lal 
Signed-off-by: Marcin Ksiadz 
Signed-off-by: Yuanhan Liu 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.h | 76 -
 1 file changed, 64 insertions(+), 12 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 6376039..d2803af 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -429,17 +429,49 @@ dp_packet_size(const struct dp_packet *b)
 static inline void
 dp_packet_set_size(struct dp_packet *b, uint32_t v)
 {
-/* netdev-dpdk does not currently support segmentation; consequently, for
- * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may
- * be used interchangably.
- *
- * On the datapath, it is expected that the size of packets
- * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
- * loss of accuracy in assigning 'v' to 'data_len'.
- */
-b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
-b->mbuf.pkt_len = v; /* Total length of all segments linked to
-  * this segment. */
+if (b->source == DPBUF_DPDK) {
+struct rte_mbuf *mbuf = >mbuf;
+uint16_t new_len = v;
+uint16_t data_len;
+uint16_t nb_segs = 0;
+uint16_t pkt_len = 0;
+
+/* Trim 'v' length bytes from the end of the chained buffers, freeing
+   any buffers that may be left floating */
+while (mbuf) {
+data_len = MIN(new_len, mbuf->data_len);
+mbuf->data_len = data_len;
+
+if (new_len - data_len <= 0) {
+/* Free the rest of chained mbufs */
+free_dpdk_buf(CONTAINER_OF(mbuf->next, struct dp_packet,
+   mbuf));
+mbuf->next = NULL;
+} else if (!mbuf->next) {
+/* Don't assign more than what we have available */
+mbuf->data_len = MIN(new_len,
+ mbuf->buf_len - mbuf->data_off);
+}
+
+new_len -= data_len;
+nb_segs += 1;
+pkt_len += mbuf->data_len;
+mbuf = mbuf->next;
+}
+
+/* pkt_len != v would effectively mean that pkt_len < than 'v' (as
+ * being bigger is logically impossible). Being < than 'v' would mean
+ * the 'v' provided was bigger than the available room, which is the
+ * responsibility of the caller to make sure there is enough room */
+ovs_assert(pkt_len == v);
+
+b->mbuf.nb_segs = nb_segs;
+b->mbuf.pkt_len = pkt_len;
+} else {
+b->mbuf.data_len = v;
+/* Total length of all segments linked to this segment. */
+b->mbuf.pkt_len = v;
+}
 }
 
 static inline uint16_t
@@ -451,7 +483,27 @@ __packet_data(const struct dp_packet *b)
 static inline void
 __packet_set_data(struct dp_packet *b, uint16_t v)
 {
-b->mbuf.data_off = v;
+if (b->source == DPBUF_DPDK) {
+/* Moving data_off away from the first mbuf in the chain is not a
+ * possibility using DPBUF_DPDK dp_packets */
+ovs_assert(v == UINT16_MAX || v <= b->mbuf.buf_len);
+
+uint16_t prev_ofs = b->mbuf.data_off;
+b->mbuf.data_off = v;
+int16_t ofs_diff = prev_ofs - b->mbuf.data_off;
+
+/* When dealing with DPDK mbufs, keep data_off and data_len in sync.
+ * Thus, update data_len if the length changes with the move of
+ * data_off. However, if data_len is 0, there's no data to move and
+ * data_len should remain 0. */
+
+if (b->mbuf.data_len != 0) {
+b->mbuf.data_len = MIN(b->mbuf.data_len + ofs_diff,
+   b->mbuf.buf_len - 

[ovs-dev] [PATCH v9 04/14] netdev-dpdk: Serialise non-pmds mbufs' alloc/free.

2018-08-24 Thread Tiago Lam
A new mutex, 'nonpmd_mp_mutex', has been introduced to serialise
allocation and free operations by non-pmd threads on a given mempool.

free_dpdk_buf() has been modified to make use of the introduced mutex.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/netdev-dpdk.c | 33 ++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ebd55e9..aee8e20 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -322,6 +322,16 @@ static struct ovs_mutex dpdk_mp_mutex 
OVS_ACQ_AFTER(dpdk_mutex)
 static struct ovs_list dpdk_mp_list OVS_GUARDED_BY(dpdk_mp_mutex)
 = OVS_LIST_INITIALIZER(_mp_list);
 
+/* This mutex must be used by non pmd threads when allocating or freeing
+ * mbufs through mempools, when outside of the `non_pmd_mutex` mutex, in struct
+ * dp_netdev.
+ * The reason, as pointed out in the "Known Issues" section in DPDK's EAL docs,
+ * is that the implementation on which mempool is based off is non-preemptable.
+ * Since non-pmds may end up not being pinned this could lead to the preemption
+ * between non-pmds performing operations on the same mempool, which could lead
+ * to memory corruption. */
+static struct ovs_mutex nonpmd_mp_mutex = OVS_MUTEX_INITIALIZER;
+
 struct dpdk_mp {
  struct rte_mempool *mp;
  int mtu;
@@ -492,6 +502,8 @@ struct netdev_rxq_dpdk {
 dpdk_port_t port_id;
 };
 
+static bool dpdk_thread_is_pmd(void);
+
 static void netdev_dpdk_destruct(struct netdev *netdev);
 static void netdev_dpdk_vhost_destruct(struct netdev *netdev);
 
@@ -525,6 +537,12 @@ dpdk_buf_size(int mtu)
  NETDEV_DPDK_MBUF_ALIGN);
 }
 
+static bool
+dpdk_thread_is_pmd(void)
+{
+ return rte_lcore_id() != NON_PMD_CORE_ID;
+}
+
 /* Allocates an area of 'sz' bytes from DPDK.  The memory is zero'ed.
  *
  * Unlike xmalloc(), this function can return NULL on failure. */
@@ -535,11 +553,20 @@ dpdk_rte_mzalloc(size_t sz)
 }
 
 void
-free_dpdk_buf(struct dp_packet *p)
+free_dpdk_buf(struct dp_packet *packet)
 {
-struct rte_mbuf *pkt = (struct rte_mbuf *) p;
+/* If non-pmd we need to lock on nonpmd_mp_mutex mutex */
+if (!dpdk_thread_is_pmd()) {
+ovs_mutex_lock(_mp_mutex);
+
+rte_pktmbuf_free(>mbuf);
+
+ovs_mutex_unlock(_mp_mutex);
+
+return;
+}
 
-rte_pktmbuf_free(pkt);
+rte_pktmbuf_free(>mbuf);
 }
 
 static void
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v9 03/14] dp-packet: Fix allocated size on DPDK init.

2018-08-24 Thread Tiago Lam
When enabled with DPDK OvS deals with two types of packets, the ones
coming from the mempool and the ones locally created by OvS - which are
copied to mempool mbufs before output. In the latter, the space is
allocated from the system, while in the former the mbufs are allocated
from a mempool, which takes care of initialising them appropriately.

In the current implementation, during mempool's initialisation of mbufs,
dp_packet_set_allocated() is called from dp_packet_init_dpdk() without
considering that the allocated space, in the case of multi-segment
mbufs, might be greater than a single mbuf.  Furthermore, given that
dp_packet_init_dpdk() is on the code path that's called upon mempool's
initialisation, a call to dp_packet_set_allocated() is redundant, since
mempool takes care of initialising it.

To fix this, dp_packet_set_allocated() is no longer called after
initialisation of a mempool, only in dp_packet_init__(), which is still
called by OvS when initialising locally created packets.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.c   | 3 +--
 lib/dp-packet.h   | 2 +-
 lib/netdev-dpdk.c | 2 +-
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 443c225..782e7c2 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -99,9 +99,8 @@ dp_packet_use_const(struct dp_packet *b, const void *data, 
size_t size)
  * buffer.  Here, non-transient ovs dp-packet fields are initialized for
  * packets that are part of a DPDK memory pool. */
 void
-dp_packet_init_dpdk(struct dp_packet *b, size_t allocated)
+dp_packet_init_dpdk(struct dp_packet *b)
 {
-dp_packet_set_allocated(b, allocated);
 b->source = DPBUF_DPDK;
 }
 
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index b948fe1..6376039 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -114,7 +114,7 @@ void dp_packet_use(struct dp_packet *, void *, size_t);
 void dp_packet_use_stub(struct dp_packet *, void *, size_t);
 void dp_packet_use_const(struct dp_packet *, const void *, size_t);
 
-void dp_packet_init_dpdk(struct dp_packet *, size_t allocated);
+void dp_packet_init_dpdk(struct dp_packet *);
 
 void dp_packet_init(struct dp_packet *, size_t);
 void dp_packet_uninit(struct dp_packet *);
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0cd9ff6..ebd55e9 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -550,7 +550,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp OVS_UNUSED,
 {
 struct rte_mbuf *pkt = _p;
 
-dp_packet_init_dpdk((struct dp_packet *) pkt, pkt->buf_len);
+dp_packet_init_dpdk((struct dp_packet *) pkt);
 }
 
 static int
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v9 01/14] netdev-dpdk: fix mbuf sizing

2018-08-24 Thread Tiago Lam
From: Mark Kavanagh 

There are numerous factors that must be considered when calculating
the size of an mbuf:
- the data portion of the mbuf must be sized in accordance With Rx
  buffer alignment (typically 1024B). So, for example, in order to
  successfully receive and capture a 1500B packet, mbufs with a
  data portion of size 2048B must be used.
- in OvS, the elements that comprise an mbuf are:
  * the dp packet, which includes a struct rte mbuf (704B)
  * RTE_PKTMBUF_HEADROOM (128B)
  * packet data (aligned to 1k, as previously described)
  * RTE_PKTMBUF_TAILROOM (typically 0)

Some PMDs require that the total mbuf size (i.e. the total sum of all
of the above-listed components' lengths) is cache-aligned. To satisfy
this requirement, it may be necessary to round up the total mbuf size
with respect to cacheline size. In doing so, it's possible that the
dp_packet's data portion is inadvertently increased in size, such that
it no longer adheres to Rx buffer alignment. Consequently, the
following property of the mbuf no longer holds true:

mbuf.data_len == mbuf.buf_len - mbuf.data_off

This creates a problem in the case of multi-segment mbufs, where that
assumption is assumed to be true for all but the final segment in an
mbuf chain. Resolve this issue by adjusting the size of the mbuf's
private data portion, as opposed to the packet data portion when
aligning mbuf size to cachelines.

Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
CC: Santosh Shukla 
Signed-off-by: Mark Kavanagh 
Acked-by: Santosh Shukla 
Acked-by: Eelco Chaudron 
---
 lib/netdev-dpdk.c | 56 +--
 1 file changed, 38 insertions(+), 18 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ac02a09..0cd9ff6 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -88,10 +88,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 
20);
 #define MTU_TO_MAX_FRAME_LEN(mtu)   ((mtu) + ETHER_HDR_MAX_LEN)
 #define FRAME_LEN_TO_MTU(frame_len) ((frame_len)\
  - ETHER_HDR_LEN - ETHER_CRC_LEN)
-#define MBUF_SIZE(mtu)  ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
- + sizeof(struct dp_packet) \
- + RTE_PKTMBUF_HEADROOM),   \
- RTE_CACHE_LINE_SIZE)
 #define NETDEV_DPDK_MBUF_ALIGN  1024
 #define NETDEV_DPDK_MAX_PKT_LEN 9728
 
@@ -637,7 +633,11 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool 
per_port_mp)
 char mp_name[RTE_MEMPOOL_NAMESIZE];
 const char *netdev_name = netdev_get_name(>up);
 int socket_id = dev->requested_socket_id;
-uint32_t n_mbufs;
+uint32_t n_mbufs = 0;
+uint32_t mbuf_size = 0;
+uint32_t aligned_mbuf_size = 0;
+uint32_t mbuf_priv_data_len = 0;
+uint32_t pkt_size = 0;
 uint32_t hash = hash_string(netdev_name, 0);
 struct dpdk_mp *dmp = NULL;
 int ret;
@@ -650,6 +650,9 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool 
per_port_mp)
 dmp->mtu = mtu;
 dmp->refcount = 1;
 
+/* Get the size of each mbuf, based on the MTU */
+mbuf_size = dpdk_buf_size(dev->requested_mtu);
+
 n_mbufs = dpdk_calculate_mbufs(dev, mtu, per_port_mp);
 
 do {
@@ -661,8 +664,8 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool 
per_port_mp)
  * so this is not an issue for tasks such as debugging.
  */
 ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
-   "ovs%08x%02d%05d%07u",
-   hash, socket_id, mtu, n_mbufs);
+   "ovs%08x%02d%05d%07u",
+hash, socket_id, mtu, n_mbufs);
 if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
 VLOG_DBG("snprintf returned %d. "
  "Failed to generate a mempool name for \"%s\". "
@@ -671,17 +674,34 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool 
per_port_mp)
 break;
 }
 
-VLOG_DBG("Port %s: Requesting a mempool of %u mbufs "
-  "on socket %d for %d Rx and %d Tx queues.",
-  netdev_name, n_mbufs, socket_id,
-  dev->requested_n_rxq, dev->requested_n_txq);
-
-dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs,
-  MP_CACHE_SZ,
-  sizeof (struct dp_packet)
-  - sizeof (struct rte_mbuf),
-  MBUF_SIZE(mtu)
-  - sizeof(struct dp_packet),
+VLOG_DBG("Port %s: Requesting a mempool of %u mbufs of size %u "
+  "on socket %d for %d Rx and %d Tx queues, "
+  "cache line size of %u",
+  netdev_name, n_mbufs, 

[ovs-dev] [PATCH v9 00/14] Support multi-segment mbufs

2018-08-24 Thread Tiago Lam
Overview

This patchset introduces support for multi-segment mbufs to OvS-DPDK.
Multi-segment mbufs are typically used when the size of an mbuf is
insufficient to contain the entirety of a packet's data. Instead, the
data is split across numerous mbufs, each carrying a portion, or
'segment', of the packet data. Mbufs are chained via their 'next'
attribute (an mbuf pointer).

The main motivation behind the support for multi-segment mbufs is to
later introduce TSO (use case i. below) / GRO in OvS-DPDK, which is
planned to be introduced after this series.

Use Cases
=
i.  Handling oversized (guest-originated) frames, which are marked
for hardware accelration/offload (TSO, for example).

Packets which originate from a non-DPDK source may be marked for
offload; as such, they may be larger than the permitted ingress
interface's MTU, and may be stored in an oversized dp-packet. In
order to transmit such packets over a DPDK port, their contents
must be copied to a DPDK mbuf (via dpdk_do_tx_copy). However, in
its current implementation, that function only copies data into
a single mbuf; if the space available in the mbuf is exhausted,
but not all packet data has been copied, then it is lost.
Similarly, when cloning a DPDK mbuf, it must be considered
whether that mbuf contains multiple segments. Both issues are
resolved within this patchset.

ii. Handling jumbo frames.

While OvS already supports jumbo frames, it does so by increasing
mbuf size, such that the entirety of a jumbo frame may be handled
in a single mbuf. This is certainly the preferred, and most
performant approach (and remains the default).

Enabling multi-segment mbufs

Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide on which approach to adopt on init. The introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this.

This is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Performance notes (based on v8, 1st non-RFC)
=
In order to test for regressions in performance, tests were run on top
of master 88125d6 and v8 of this patchset, both with the multi-segment
mbufs option enabled and disabled.

VSperf was used to run the phy2phy_cont and pvp_cont tests with varying
packet sizes of 64B, 1500B and 7000B, on a 10Gbps interface.

Test | Size | Master | Multi-seg disabled | Multi-seg enabled
-
p2p  |  64  | ~22.7  |  ~22.65|   ~18.3
p2p  | 1500 |  ~1.6  |~1.6|~1.6
p2p  | 7000 | ~0.36  |   ~0.36|   ~0.36
pvp  |  64  |  ~6.7  |~6.7|~6.3
pvp  | 1500 |  ~1.6  |~1.6|~1.6
pvp  | 7000 | ~0.36  |   ~0.36|   ~0.36

Packet size is in bytes, while all packet rates are reported in mpps
(aggregated).

No noticeable regression has been observed (certainly everything is
within the ± 5% margin of existing performance), aside from the 64B
packet size case when multi-segment mbuf is enabled. This is
expected, however, because of how Tx vectoriszed functions are
incompatible with multi-segment mbufs on some PMDs. The PMD under
use during these tests was the i40e (on a Intel X710 NIC), which
indeed doesn't support vectorized Tx functions with multi-segment
mbufs.

---
v9: - Rebase on master e4e2009 ("tunnel, tests: Sort flow output in ERSPAN
  v1/v2 metadata");
- Simplify patch 09/14. The functions introduced in packets.c were dropped
  so the code in netdev-native-tnl.c remains largely the same. These can be
  introduced at a later time, if needed (maybe when csum across segmented
  data is introduced);

v8: - Rebase on master 1dd218a ("ovsdb-idl: Fix recently introduced Python 3
  tests.");
- Address Ian's comment:
  - Fix sparse warnings on patch 07/14 and 12/14 by allocating memory
dynamically.
- Address Ilya's comments:
  - netdev_linux_tap_batch_send() and udp_extract_tnl_md() now linearize
the data before hand, beforing write()'ing or performing the checksums
in the data;
  - Some other cases have been found and adapted; The new patch 09/14
introduced in the series is where the "linearization" logic is
introduced and, as a consequence, some users of the dp_packet API,

Re: [ovs-dev] [PATCH 2/2] vswitch.xml: Fix type of dpdk-init key.

2018-08-24 Thread Kevin Traynor
On 08/24/2018 04:18 PM, Ilya Maximets wrote:
> This adds available modes to the man page.
> 
> CC: Kevin Traynor 
> Fixes: 6d947d508a51 ("vswitch.xml: Update dpdk-init documentation.")
> Signed-off-by: Ilya Maximets 

Thanks for catching here and in roundrobin patch.

Acked-by: Kevin Traynor 

> ---
>  vswitchd/vswitch.xml | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 570c1619f..fefcf741d 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -225,7 +225,8 @@
>
>  
> -  type='{"type": "string"}'>
> +  type='{"type": "string",
> + "enum": ["set", ["false", "true", "try"]]}'>
>  
>Set this value to true or try to enable
>runtime support for DPDK ports. The vswitch must have compile-time
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 2/2] vswitch.xml: Fix type of dpdk-init key.

2018-08-24 Thread Ilya Maximets
This adds available modes to the man page.

CC: Kevin Traynor 
Fixes: 6d947d508a51 ("vswitch.xml: Update dpdk-init documentation.")
Signed-off-by: Ilya Maximets 
---
 vswitchd/vswitch.xml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 570c1619f..fefcf741d 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -225,7 +225,8 @@
   
 
   
+  type='{"type": "string",
+ "enum": ["set", ["false", "true", "try"]]}'>
 
   Set this value to true or try to enable
   runtime support for DPDK ports. The vswitch must have compile-time
-- 
2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/2] vswitch.xml: Fix key type and description style of tc-policy.

2018-08-24 Thread Ilya Maximets
The set of supported values specified.
Style fixed to look good in man page. Fixed indents.

CC: Paul Blakey 
Fixes: 691d20cbdcf3 ("other-config: Add tc-policy switch to
  control tc flower flag")
Signed-off-by: Ilya Maximets 
---
 vswitchd/vswitch.xml | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index a858e8ef4..570c1619f 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -201,16 +201,23 @@
   
 
   
-
-Specified the policy used with HW offloading.
-Options:
-none- Add software rule and offload rule to 
HW.
-skip_sw - Offload rule to HW only.
-skip_hw - Add software rule without offloading 
rule to HW.
-
-
-This is only relevant if HW offloading is enabled (hw-offload).
+  type='{"type": "string",
+ "enum": ["set", ["none", "skip_sw", "skip_hw"]]}'>
+
+  Specified the policy used with HW offloading.
+  Options:
+  
+none
+Add software rule and offload rule to HW.
+skip_sw
+Offload rule to HW only.
+skip_hw
+Add software rule without offloading rule to HW.
+  
+
+
+  This is only relevant if
+   is enabled.
 
 
   The default value is none.
-- 
2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/2] dpif-netdev: Add round-robin based rxq to pmd assignment.

2018-08-24 Thread Ilya Maximets
I'd suggest following incremental with mostly style fixes:

-
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e886acd3f..d39ac3779 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4272,14 +4272,12 @@ rr_numa_list_populate(struct dp_netdev *dp, struct 
rr_numa_list *rr)
 /*
  * Returns the next pmd from the numa node.
  *
- * If updown is 'true' it will alternate between
- * selecting the next pmd in either an up or down
- * walk, switching between up/down when the first
- * or last core is reached. e.g. 1,2,3,3,2,1,1,2...
+ * If 'updown' is 'true' it will alternate between selecting the next pmd in
+ * either an up or down walk, switching between up/down when the first or last
+ * core is reached. e.g. 1,2,3,3,2,1,1,2...
  *
- * If updown is 'false' it will select the next pmd
- * wrapping around when last core reached.
- * e.g. 1,2,3,1,2,3,1,2...
+ * If 'updown' is 'false' it will select the next pmd wrapping around when last
+ * core reached. e.g. 1,2,3,1,2,3,1,2...
  */
 static struct dp_netdev_pmd_thread *
 rr_numa_get_pmd(struct rr_numa *numa, bool updown)
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index a858e8ef4..2439b249d 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -425,14 +425,18 @@
   
 
   
-
-Specifies how RX queues will be automatically assigned to CPU
-cores. Options:
-cycles - Rxqs will be sorted by order of measured
-processing cycles before being assigned to CPU cores.
-roundrobin  - Rxqs will be round-robined across
-CPU cores.
+  type='{"type": "string",
+ "enum": ["set", ["cycles", "roundrobin"]]}'>
+
+  Specifies how RX queues will be automatically assigned to CPU cores.
+  Options:
+  
+cycles
+Rxqs will be sorted by order of measured processing cycles
+before being assigned to CPU cores.
+roundrobin
+Rxqs will be round-robined across CPU cores.
+  
 
 
   The default value is cycles.
-

One more comment inline.

Best regards, Ilya Maximets.

On 23.08.2018 20:07, Kevin Traynor wrote:
> Prior to OVS 2.9 automatic assignment of Rxqs to PMDs
> (i.e. CPUs) was done by round-robin.
> 
> That was changed in OVS 2.9 to ordering the Rxqs based on
> their measured processing cycles. This was to assign the
> busiest Rxqs to different PMDs, improving aggregate
> throughput.
> 
> For the most part the new scheme should be better, but
> there could be situations where a user prefers a simple
> round-robin scheme because Rxqs from a single port are
> more likely to be spread across multiple PMDs, and/or
> traffic is very bursty/unpredictable.
> 
> Add 'pmd-rxq-assign' config to allow a user to select
> round-robin based assignment.
> 
> Signed-off-by: Kevin Traynor 
> ---
> 
> V2:
> - simplified nextpmd change (Eelco)
> - removed confusing doc sentence (Eelco)
> - fixed commit msg (Ilya)
> - made change in pmd-rxq-assign value also perform re-assignment (Ilya)
> - renamed to roundrobin mode (Ilya)
> - moved vswitch.xml changes to right config section (Ilya)
> - comment/log updates
> - moved NEWS update to separate patch as it's been changing on master
> 
>  Documentation/topics/dpdk/pmd.rst | 33 +---
>  lib/dpif-netdev.c | 80 
> +--
>  tests/pmd.at  | 12 +-
>  vswitchd/vswitch.xml  | 20 ++
>  4 files changed, 116 insertions(+), 29 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/pmd.rst 
> b/Documentation/topics/dpdk/pmd.rst
> index 5f0671e..dd9172d 100644
> --- a/Documentation/topics/dpdk/pmd.rst
> +++ b/Documentation/topics/dpdk/pmd.rst
> @@ -113,10 +113,15 @@ means that this thread will only poll the *pinned* Rx 
> queues.
>  
>  If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to 
> PMDs
> -(cores) automatically. Where known, the processing cycles that have been 
> stored
> -for each Rx queue will be used to assign Rx queue to PMDs based on a round
> -robin of the sorted Rx queues. For example, take the following example, where
> -there are five Rx queues and three cores - 3, 7, and 8 - available and the
> -measured usage of core cycles per Rx queue over the last interval is seen to
> -be:
> +(cores) automatically.
> +
> +The algorithm used to automatically assign Rxqs to PMDs can be set by::
> +
> +$ ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=
> +
> +By default, ``cycles`` assignment is used where the Rxqs will be ordered by
> +their measured processing cycles, and then be evenly assigned in descending
> +order to PMDs based on an up/down walk of the PMDs. For example, where there
> +are five Rx queues and three cores - 

[ovs-dev] Physical device and virtual device in OVS

2018-08-24 Thread rakesh kumar
1.How OVS handles the packet coming from Physical device or virtual device 
means how it differentiate from both . Please give your views.

Regards
Rakesh Kumar
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 1/1] netdev-vport: reject concomitant incompatible tunnels

2018-08-24 Thread Eelco Chaudron



On 24 Aug 2018, at 11:38, Eelco Chaudron wrote:


On 10 Jul 2018, at 23:45, Ben Pfaff wrote:


On Fri, Jun 01, 2018 at 07:03:55PM +0200, Eelco Chaudron wrote:

"sparse" doesn't like the initialization strategy:

../lib/netdev-vport.c:1207:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1211:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1215:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1219:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1220:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1221:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1225:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1229:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1261:13: error: Using plain integer as NULL 
pointer




Hi Ben some of this is existing code, however, which version of Sparse 
are you using, as I’m not getting any error?




Ignore, as I can see them trough Travis with the same version. Must be 
my setup :(


Guess it’s time for weekend…

//Eelco
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 1/1] netdev-vport: reject concomitant incompatible tunnels

2018-08-24 Thread Eelco Chaudron



On 10 Jul 2018, at 23:45, Ben Pfaff wrote:


On Fri, Jun 01, 2018 at 07:03:55PM +0200, Eelco Chaudron wrote:

This patch will make sure VXLAN tunnels with and without the group
based policy (GBP) option enabled can not coexist on the same
destination UDP port.

In theory, VXLAN tunnel with and without GBP enables can be
multiplexed on the same UDP port as long as different VNI's are
used. However currently OVS does not support this, hence this patch 
to

check for this condition.

Signed-off-by: Eelco Chaudron 


Thanks for the patch, and sorry that I'm so slow.

Does this support the case where, in a single database transaction, a
GBP VXLAN tunnel is removed and a non-GBP VXLAN tunnel is created that
would otherwise interfere with one another (and the converse case)?  
If

so, could that be included in the test?

"sparse" doesn't like the initialization strategy:

../lib/netdev-vport.c:1207:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1211:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1215:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1219:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1220:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1221:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1225:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1229:9: error: Using plain integer as NULL 
pointer
../lib/netdev-vport.c:1261:13: error: Using plain integer as NULL 
pointer




Hi Ben some of this is existing code, however, which version of Sparse 
are you using, as I’m not getting any error?



Thanks,

Ben.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/2] porting: Support for kernel 4.16.x & 4.17.x

2018-08-24 Thread Yifeng Sun
Thanks Greg for your review and testing.

Yifeng

On Thu, Aug 23, 2018 at 10:17 PM, Gregory Rose  wrote:

> On 8/21/2018 7:42 AM, Yifeng Sun wrote:
>
> Add support for kernel version up to 4.17.x. On Travis, build passed
> for all kernel versions. And no new test fails are introduced by this
> patch.
>
> Cleaned up file datapath/linux/compat/include/net/ip6_fib.h which
> has no effect to kernel module but brings complexity to porting.
>
> Signed-off-by: Yifeng Sun  
>
>
> LGTM.
>
> Travis is happy:
>
> https://travis-ci.org/gvrose8192/ovs-experimental/builds/419847640
>
> I ran check-kmod on a 4.16.13 kernel and got one transient error but
> everything else looks good there
> too.
>
> ## - ##
> ## Test results. ##
> ## - ##
>
> ERROR: 116 tests were run,
> 1 failed unexpectedly.
> 14 tests were skipped.
> ## -- ##
> ## system-kmod-testsuite.log was created. ##
> ## -- ##
>
> Please send `tests/system-kmod-testsuite.log' and all information you
> think might help:
>
>To:  
>Subject: [openvswitch 2.10.90] system-kmod-testsuite: 109 failed
>
> Nice work, thanks!!!
>
> Reviewed-by: Greg Rose  
> Tested-by: Greg Rose  
>
>
> ---
>  .travis.yml  |  2 ++
>  Documentation/faq/releases.rst   |  2 +-
>  NEWS |  2 +-
>  acinclude.m4 |  6 ++--
>  datapath/linux/Modules.mk|  1 -
>  datapath/linux/compat/include/net/dst_metadata.h | 14 ++---
>  datapath/linux/compat/include/net/ip6_fib.h  | 40 
> 
>  7 files changed, 15 insertions(+), 52 deletions(-)
>  delete mode 100644 datapath/linux/compat/include/net/ip6_fib.h
>
> diff --git a/.travis.yml b/.travis.yml
> index 998b33d..21447b5 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -35,6 +35,8 @@ env:
>- BUILD_ENV="-m32" OPTS="--disable-ssl"
>- KERNEL=3.16.54 DPDK=1
>- KERNEL=3.16.54 DPDK=1 OPTS="--enable-shared"
> +  - KERNEL=4.17.14
> +  - KERNEL=4.16.18
>- KERNEL=4.15.18
>- KERNEL=4.14.63
>- KERNEL=4.9.120
> diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
> index e64fa22..41d41e3 100644
> --- a/Documentation/faq/releases.rst
> +++ b/Documentation/faq/releases.rst
> @@ -67,7 +67,7 @@ Q: What Linux kernel versions does each Open vSwitch 
> release work with?
>  2.7.x3.10 to 4.9
>  2.8.x3.10 to 4.12
>  2.9.x3.10 to 4.13
> -2.10.x   3.10 to 4.15
> +2.10.x   3.10 to 4.17
>   ==
>
>  Open vSwitch userspace should also work with the Linux kernel module 
> built
> diff --git a/NEWS b/NEWS
> index 8987f9a..5bab26d 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -4,7 +4,7 @@ Post-v2.10.0
>   as the default syslog method.
> - The environment variable OVS_CTL_TIMEOUT, if set, is now used
>   as the default timeout for control utilities.
> -
> +   - Support for the kernel versions 4.16.x and 4.17.x.
>
>  v2.10.0 - xx xxx 
>  -
> diff --git a/acinclude.m4 b/acinclude.m4
> index a6a0e9a..ab141bd 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -151,10 +151,10 @@ AC_DEFUN([OVS_CHECK_LINUX], [
>  AC_MSG_RESULT([$kversion])
>
>  if test "$version" -ge 4; then
> -   if test "$version" = 4 && test "$patchlevel" -le 15; then
> +   if test "$version" = 4 && test "$patchlevel" -le 17; then
>: # Linux 4.x
> else
> -  AC_ERROR([Linux kernel in $KBUILD is version $kversion, but 
> version newer than 4.15.x is not supported (please refer to the FAQ for 
> advice)])
> +  AC_ERROR([Linux kernel in $KBUILD is version $kversion, but 
> version newer than 4.17.x is not supported (please refer to the FAQ for 
> advice)])
> fi
>  elif test "$version" = 3 && test "$patchlevel" -ge 10; then
> : # Linux 3.x
> @@ -803,8 +803,6 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
>[OVS_DEFINE(HAVE_NF_CONNTRACK_HELPER_PUT)])
>
> OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h],:space:]]]SKB_GSO_UDP[[[:space:,
>[OVS_DEFINE([HAVE_SKB_GSO_UDP])])
> -  OVS_GREP_IFELSE([$KSRC/include/net/dst.h],[DST_NOCACHE],
> -  [OVS_DEFINE([HAVE_DST_NOCACHE])])
>OVS_FIND_FIELD_IFELSE([$KSRC/include/net/rtnetlink.h], [rtnl_link_ops],
>  [extack],
>[OVS_DEFINE([HAVE_EXT_ACK_IN_RTNL_LINKOPS])])
> diff --git a/datapath/linux/Modules.mk b/datapath/linux/Modules.mk
> index 2fec650..b06ca15 100644
> --- a/datapath/linux/Modules.mk
> +++ b/datapath/linux/Modules.mk
> @@ -82,7 +82,6 @@ openvswitch_headers += \
>   linux/compat/include/net/inetpeer.h \
>   linux/compat/include/net/ip.h \
>   linux/compat/include/net/ip_tunnels.h \
> - linux/compat/include/net/ip6_fib.h \
>   

Re: [ovs-dev] [PATCH v2 1/2] dpif-netdev: Add round-robin based rxq to pmd assignment.

2018-08-24 Thread Eelco Chaudron




On 23 Aug 2018, at 19:07, Kevin Traynor wrote:


Prior to OVS 2.9 automatic assignment of Rxqs to PMDs
(i.e. CPUs) was done by round-robin.

That was changed in OVS 2.9 to ordering the Rxqs based on
their measured processing cycles. This was to assign the
busiest Rxqs to different PMDs, improving aggregate
throughput.

For the most part the new scheme should be better, but
there could be situations where a user prefers a simple
round-robin scheme because Rxqs from a single port are
more likely to be spread across multiple PMDs, and/or
traffic is very bursty/unpredictable.

Add 'pmd-rxq-assign' config to allow a user to select
round-robin based assignment.


Thanks for doing a v2, changes look fine to me.

Acked-by: Eelco Chaudron 


Signed-off-by: Kevin Traynor 
---

V2:
- simplified nextpmd change (Eelco)
- removed confusing doc sentence (Eelco)
- fixed commit msg (Ilya)
- made change in pmd-rxq-assign value also perform re-assignment 
(Ilya)

- renamed to roundrobin mode (Ilya)
- moved vswitch.xml changes to right config section (Ilya)
- comment/log updates
- moved NEWS update to separate patch as it's been changing on master

 Documentation/topics/dpdk/pmd.rst | 33 +---
 lib/dpif-netdev.c | 80 
+--

 tests/pmd.at  | 12 +-
 vswitchd/vswitch.xml  | 20 ++
 4 files changed, 116 insertions(+), 29 deletions(-)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst

index 5f0671e..dd9172d 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -113,10 +113,15 @@ means that this thread will only poll the 
*pinned* Rx queues.


 If ``pmd-rxq-affinity`` is not set for Rx queues, they will be 
assigned to PMDs
-(cores) automatically. Where known, the processing cycles that have 
been stored
-for each Rx queue will be used to assign Rx queue to PMDs based on a 
round
-robin of the sorted Rx queues. For example, take the following 
example, where
-there are five Rx queues and three cores - 3, 7, and 8 - available 
and the
-measured usage of core cycles per Rx queue over the last interval is 
seen to

-be:
+(cores) automatically.
+
+The algorithm used to automatically assign Rxqs to PMDs can be set 
by::

+
+$ ovs-vsctl set Open_vSwitch . 
other_config:pmd-rxq-assign=

+
+By default, ``cycles`` assignment is used where the Rxqs will be 
ordered by
+their measured processing cycles, and then be evenly assigned in 
descending
+order to PMDs based on an up/down walk of the PMDs. For example, 
where there
+are five Rx queues and three cores - 3, 7, and 8 - available and the 
measured
+usage of core cycles per Rx queue over the last interval is seen to 
be:


 - Queue #0: 30%
@@ -132,4 +137,20 @@ The Rx queues will be assigned to the cores in 
the following order::

 Core 8: Q3 (60%) | Q0 (30%)

+Alternatively, ``roundrobin`` assignment can be used, where the Rxqs 
are
+assigned to PMDs in a round-robined fashion. This algorithm was used 
by
+default prior to OVS 2.9. For example, given the following ports and 
queues:

+
+- Port #0 Queue #0 (P0Q0)
+- Port #0 Queue #1 (P0Q1)
+- Port #1 Queue #0 (P1Q0)
+- Port #1 Queue #1 (P1Q1)
+- Port #1 Queue #2 (P1Q2)
+
+The Rx queues may be assigned to the cores in the following order::
+
+Core 3: P0Q0 | P1Q1
+Core 7: P0Q1 | P1Q2
+Core 8: P1Q0 |
+
 To see the current measured usage history of PMD core cycles for each 
Rx

 queue::
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 7f836bb..e4ccd33 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -342,4 +342,6 @@ struct dp_netdev {
 struct id_pool *tx_qid_pool;
 struct ovs_mutex tx_qid_pool_mutex;
+/* Use measured cycles for rxq to pmd assignment. */
+bool pmd_rxq_assign_cyc;

 /* Protects the access of the 'struct dp_netdev_pmd_thread'
@@ -1493,4 +1495,5 @@ create_dp_netdev(const char *name, const struct 
dpif_class *class,


 cmap_init(>poll_threads);
+dp->pmd_rxq_assign_cyc = true;

 ovs_mutex_init(>tx_qid_pool_mutex);
@@ -3717,4 +3720,6 @@ dpif_netdev_set_config(struct dpif *dpif, const 
struct smap *other_config)

 struct dp_netdev *dp = get_dp_netdev(dpif);
 const char *cmask = smap_get(other_config, "pmd-cpu-mask");
+const char *pmd_rxq_assign = smap_get_def(other_config, 
"pmd-rxq-assign",

+ "cycles");
 unsigned long long insert_prob =
 smap_get_ullong(other_config, "emc-insert-inv-prob",
@@ -3779,4 +3784,13 @@ dpif_netdev_set_config(struct dpif *dpif, const 
struct smap *other_config)

 }
 }
+
+bool pmd_rxq_assign_cyc = !strcmp(pmd_rxq_assign, "cycles");
+if (pmd_rxq_assign_cyc || !strcmp(pmd_rxq_assign, "roundrobin")) 
{

+if (dp->pmd_rxq_assign_cyc != pmd_rxq_assign_cyc) {
+dp->pmd_rxq_assign_cyc = pmd_rxq_assign_cyc;
+VLOG_INFO("Rxq to PMD assignment mode: \'%s\'.",