Re: [ovs-dev] [PATCH V5 1/2] netdev-dpdk: DPDK v17.11 upgrade

2017-12-07 Thread Tiwei Bie
On Thu, Dec 07, 2017 at 05:45:36PM +, Mark Kavanagh wrote:
> This commit adds support for DPDK v17.11:
> - minor updates to accomodate DPDK API changes
> - update references to DPDK version in Documentation
> - update DPDK version in travis' linux-build script
> - document DPDK v17.11 virtio driver bug
> 
> Signed-off-by: Mark Kavanagh 
> Acked-by: Maxime Coquelin 
> Acked-by: Ciara Loftus 
> Acked-by: Jan Scheurich 
> Tested-by: Jan Scheurich 
> Tested-by: Guoshuai Li 
> ---
[...]
>  
> -.. _DPDK release notes: 
> http://dpdk.org/doc/guides/rel_notes/release_17_05.html
> +.. _DPDK release notes: 
> http://dpdk.org/doc/guides/rel_notes/release_17_11.html
> +
> +- The DPDK v17.11 virtio net driver contains a bug that prevents guest DPDK
> +  applications from receiving packets. This only occurs when guest-bound 
> traffic
> +  is live before a DPDK application is started within the guest, and where 
> two
> +  or more forwarding cores are used. As such, it is not recommended for 
> guests
> +  which execute DPDK applications to upgrade to DPDK v17.11.
> +
> +  Note that this issue does not occur when guest network devices are 
> controlled
> +  by the guest kernel.
> +
> +  Details regarding the virtio driver bug are available on the `DPDK mailing
> +  list`_.
> +
> +.. _DPDK mailing list: 
> http://dpdk.org/ml/archives/dev/2017-December/082801.html
>  
>  Reporting Bugs
>  --
> diff --git a/Documentation/topics/dpdk/ring.rst 
> b/Documentation/topics/dpdk/ring.rst
> index ad9d7a5..8d0ede8 100644
> --- a/Documentation/topics/dpdk/ring.rst
> +++ b/Documentation/topics/dpdk/ring.rst
> @@ -77,4 +77,4 @@ DPDK. However, this functionality was removed because:
>  - :doc:`vhost-user interfaces ` are the defacto DPDK-based path 
> to
>guests
>  
> -.. _DPDK documentation: 
> https://dpdk.readthedocs.io/en/v17.05/prog_guide/ring_lib.html
> +.. _DPDK documentation: 
> https://dpdk.readthedocs.io/en/v17.11/prog_guide/ring_lib.html
> diff --git a/Documentation/topics/dpdk/vhost-user.rst 
> b/Documentation/topics/dpdk/vhost-user.rst
> index 74ac06e..1a5de24 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -287,6 +287,11 @@ application in the VM.
>  
>Support for DPDK in the guest requires QEMU >= 2.2
>  
> +.. important::
> +
> +  DPDK v17.11 virtio PMD contains a bug that affects testpmd/DPDK guest
> +  applications. As such, guests should not upgrade to 17.11.
> +

I'm not very familiar with the use case related to ovs.
The bug you mentioned only affects the vector Rx of virtio
PMD, while the vector Rx won't be enabled by default, as
it has the most limitations among the three Rx functions
in virtio PMD, e.g. it's not able to handle the mergeable
rxbuf (VIRTIO_NET_F_MRG_RXBUF) which is enabled by default
and commonly used. But I'm not sure whether the vector Rx
of virtio PMD is commonly used in ovs related cases.

If it's really necessary to document a PMD bug of DPDK in
ovs, I think we should provide more accurate info in this
warning, e.g. warn users not to use vector Rx when using
virtio PMD instead of warning them not to use DPDK 17.11.

Thanks,
Tiwei
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH v3 2/8] lib/dp-packet: init specific mbuf fields to 0

2017-12-07 Thread Michael Qiu

2017/11/22 2:29, Mark Kavanagh :


dp_packets are created using xmalloc(); in the case of OvS-DPDK, it's
possible the the resultant mbuf portion of the dp_packet contains
random data. For some mbuf fields, specifically those related to
multi-segment mbufs and/or offload features, random values may cause
unexpected behaviour, should the dp_packet's contents be later copied
to a DPDK mbuf. It is critical therefore, that these fields should be
initialized to 0.

This patch ensures that the following mbuf fields are initialized to 0,
on creation of a new dp_packet:
- ol_flags
- nb_segs
- tx_offload
- packet_type

Adapted from an idea by Michael Qiu :
https://patchwork.ozlabs.org/patch/777570/

Signed-off-by: Mark Kavanagh 


Acked-by: Michael Qiu 


---
  lib/dp-packet.h | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index b4b721c..7aa440f 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -626,13 +626,13 @@ dp_packet_mbuf_rss_flag_reset(struct dp_packet *p 
OVS_UNUSED)
  
  /* This initialization is needed for packets that do not come

   * from DPDK interfaces, when vswitchd is built with --with-dpdk.
- * The DPDK rte library will still otherwise manage the mbuf.
- * We only need to initialize the mbuf ol_flags. */
+ * The DPDK rte library will still otherwise manage the mbuf. */
  static inline void
  dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED)
  {
  #ifdef DPDK_NETDEV
-p->mbuf.ol_flags = 0;
+struct rte_mbuf *mbuf = &(p->mbuf);
+mbuf->ol_flags = mbuf->nb_segs = mbuf->tx_offload = mbuf->packet_type = 0;
  #endif
  }
  


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Bonjour,

2017-12-07 Thread Mme Ginette BOURGEOIS
 Bonjour,
Je vous prie de me confirmer l'accusé de réception du mail, si j'ai 
pris contact avec vous ,c'est concernant un projet humanitaire que je 
tiens à réaliser dans votre pays car je tiens a mettre à votre 
disposition une somme de Un Million Cinq Cent Mille Euros que je 
dispose au niveau de la Banque UBA BENIN pour la réalisation d'un 
projet  humanitaire. Pour ce fait œuvre humanitaire constituera a aidé 
les plus âgées et pauvre ainsi que les orphelins et les non abrités en 
leur construisant des maisons, mosquées, écoles. Etc. En effet je 
suis une femme en hospitalisation car je suis souffrante d'une ischémie 
en phase terminale de son apogée. Je suis stérile et je n'ai pas eu 
d'enfants .Raison pour laquelle la providence m'oriente vers vous. 
Espérant vous lire afin de vous donner plus d'amples d'explication sur 
ce projet.
Mme Ginette
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 13/13] log: Add async commit support.

2017-12-07 Thread Ben Pfaff
The OVSDB log code has always had the ability to commit the log to disk and
wait for the commit to finish.  This patch introduces a new feature that
allows the client to start a commit in the background and then to determine
asynchronously that the commit has completed.  This will be especially
useful later for the distributed database feature.

Signed-off-by: Ben Pfaff 
---
 ovsdb/file.c   |   4 +-
 ovsdb/log.c| 152 +++--
 ovsdb/log.h|   7 ++-
 ovsdb/ovsdb-tool.c |   2 +-
 tests/test-ovsdb.c |   2 +-
 5 files changed, 158 insertions(+), 9 deletions(-)

diff --git a/ovsdb/file.c b/ovsdb/file.c
index 094fa0ce207a..40fecdf59af5 100644
--- a/ovsdb/file.c
+++ b/ovsdb/file.c
@@ -651,7 +651,7 @@ ovsdb_file_compact(struct ovsdb_file *file)
 
 /* Commit the old version, so that we can be assured that we'll eventually
  * have either the old or the new version. */
-error = ovsdb_log_commit(file->log);
+error = ovsdb_log_commit_block(file->log);
 if (error) {
 goto exit;
 }
@@ -854,7 +854,7 @@ ovsdb_file_txn_commit(struct json *json, const char 
*comment,
 }
 
 if (durable) {
-error = ovsdb_log_commit(log);
+error = ovsdb_log_commit_block(log);
 if (error) {
 return ovsdb_wrap_error(error, "committing transaction failed");
 }
diff --git a/ovsdb/log.c b/ovsdb/log.c
index e364667ac673..e2c58d1cc8b1 100644
--- a/ovsdb/log.c
+++ b/ovsdb/log.c
@@ -24,12 +24,17 @@
 #include 
 #include 
 
+#include "lockfile.h"
 #include "openvswitch/dynamic-string.h"
 #include "openvswitch/json.h"
 #include "openvswitch/vlog.h"
-#include "lockfile.h"
-#include "ovsdb.h"
+#include "ovs-atomic.h"
+#include "ovs-rcu.h"
+#include "ovs-thread.h"
 #include "ovsdb-error.h"
+#include "ovsdb.h"
+#include "openvswitch/poll-loop.h"
+#include "seq.h"
 #include "sha1.h"
 #include "socket-util.h"
 #include "transaction.h"
@@ -78,6 +83,7 @@ struct ovsdb_log {
 struct lockfile *lockfile;
 FILE *stream;
 off_t base;
+struct afsync *afsync;
 };
 
 /* Whether the OS supports renaming open files.
@@ -95,6 +101,9 @@ static bool parse_header(char *header, const char **magicp,
  uint8_t sha1[SHA1_DIGEST_SIZE]);
 static bool is_magic_ok(const char *needle, const char *haystack);
 
+static struct afsync *afsync_create(int fd, uint64_t initial_ticket);
+static uint64_t afsync_destroy(struct afsync *);
+
 /* Attempts to open 'name' with the specified 'open_mode'.  On success, stores
  * the new log into '*filep' and returns NULL; otherwise returns NULL and
  * stores NULL into '*filep'.
@@ -261,6 +270,7 @@ ovsdb_log_open(const char *name, const char *magic,
 file->prev_offset = 0;
 file->offset = 0;
 file->base = 0;
+file->afsync = NULL;
 *filep = file;
 return NULL;
 
@@ -300,6 +310,7 @@ ovsdb_log_close(struct ovsdb_log *file)
 {
 if (file) {
 ovsdb_error_destroy(file->error);
+afsync_destroy(file->afsync);
 free(file->name);
 free(file->display_name);
 free(file->magic);
@@ -606,8 +617,10 @@ ovsdb_log_write(struct ovsdb_log *file, const struct json 
*json)
 return NULL;
 }
 
+/* Attempts to commit 'file' to disk.  Waits for the commit to succeed or fail.
+ * Returns NULL if successful, otherwise the error that occurred. */
 struct ovsdb_error *
-ovsdb_log_commit(struct ovsdb_log *file)
+ovsdb_log_commit_block(struct ovsdb_log *file)
 {
 if (file->stream && fsync(fileno(file->stream))) {
 return ovsdb_io_error(errno, "%s: fsync failed", file->display_name);
@@ -712,7 +725,7 @@ ovsdb_rename(const char *old, const char *new)
 struct ovsdb_error * OVS_WARN_UNUSED_RESULT
 ovsdb_log_replace_commit(struct ovsdb_log *old, struct ovsdb_log *new)
 {
-struct ovsdb_error *error = ovsdb_log_commit(new);
+struct ovsdb_error *error = ovsdb_log_commit_block(new);
 if (error) {
 ovsdb_log_replace_abort(new);
 return error;
@@ -784,6 +797,10 @@ ovsdb_log_replace_commit(struct ovsdb_log *old, struct 
ovsdb_log *new)
 ovsdb_error_destroy(old->error);
 old->error = NULL;
 /* prev_offset only matters for OVSDB_LOG_READ. */
+if (old->afsync) {
+uint64_t ticket = afsync_destroy(old->afsync);
+old->afsync = afsync_create(fileno(old->stream), ticket + 1);
+}
 old->offset = new->offset;
 /* Keep old->name. */
 free(old->magic);
@@ -816,3 +833,130 @@ ovsdb_log_disable_renaming_open_files(void)
 {
 rename_open_files = false;
 }
+
+struct afsync {
+pthread_t thread;
+atomic_uint64_t cur, next;
+struct seq *request, *complete;
+int fd;
+};
+
+static void *
+afsync_thread(void *afsync_)
+{
+struct afsync *afsync = afsync_;
+uint64_t cur = 0;
+for (;;) {
+ovsrcu_quiesce_start();
+
+uint64_t request_seq = seq_read(afsync->request);
+
+uint64_t next;
+

[ovs-dev] [PATCH 12/13] log: Use absolute name of log file.

2017-12-07 Thread Ben Pfaff
In ovsdb-server, the OVSDB log code is used to open the databases specified
on the command line before ovsdb-server daemonizes itself.  Afterward, it
is occasionally necessary for ovsdb-server to reference those files by name
again.  When that happens, if daemonization changed the current directory
to the root, any relative names are no longer valid and those references
will fail.  Until now, this was handled at a higher level in ovsdb-server,
but in the future it will be convenient to handle it in the log code
itself.  This commit prepares for that by making the log code take the
absolute name of log files itself.

Signed-off-by: Ben Pfaff 
---
 ovsdb/log.c | 93 +++--
 1 file changed, 60 insertions(+), 33 deletions(-)

diff --git a/ovsdb/log.c b/ovsdb/log.c
index 27f4b328d451..e364667ac673 100644
--- a/ovsdb/log.c
+++ b/ovsdb/log.c
@@ -72,7 +72,8 @@ struct ovsdb_log {
 
 off_t prev_offset;
 off_t offset;
-char *name;
+char *name; /* Absolute name of file. */
+char *display_name; /* For use in log messages, etc. */
 char *magic;
 struct lockfile *lockfile;
 FILE *stream;
@@ -124,8 +125,22 @@ ovsdb_log_open(const char *name, const char *magic,
 if (open_mode == OVSDB_LOG_CREATE_EXCL || open_mode == OVSDB_LOG_CREATE) {
 ovs_assert(!strchr(magic, '|'));
 }
+
 *filep = NULL;
 
+/* Get the absolute name of the file because we might need to access it by
+ * name again later after the process has changed directory (e.g. because
+ * daemonize() chdirs to "/").
+ *
+ * We save the user-provided name of the file for use in log messages, to
+ * reduce user confusion. */
+char *abs_name = abs_file_name(NULL, name);
+if (!abs_name) {
+error = ovsdb_io_error(0, "could not determine current "
+  "working directory");
+goto error;
+}
+
 ovs_assert(locking == -1 || locking == false || locking == true);
 if (locking < 0) {
 locking = open_mode != OVSDB_LOG_READ_ONLY;
@@ -238,7 +253,8 @@ ovsdb_log_open(const char *name, const char *magic,
 struct ovsdb_log *file = xmalloc(sizeof *file);
 file->state = OVSDB_LOG_READ;
 file->error = NULL;
-file->name = xstrdup(name);
+file->name = abs_name;
+file->display_name = xstrdup(name);
 file->magic = xstrdup(actual_magic);
 file->lockfile = lockfile;
 file->stream = stream;
@@ -253,6 +269,7 @@ error_fclose:
 error_unlock:
 lockfile_unlock(lockfile);
 error:
+free(abs_name);
 return error;
 }
 
@@ -284,6 +301,7 @@ ovsdb_log_close(struct ovsdb_log *file)
 if (file) {
 ovsdb_error_destroy(file->error);
 free(file->name);
+free(file->display_name);
 free(file->magic);
 if (file->stream) {
 fclose(file->stream);
@@ -359,8 +377,9 @@ parse_body(struct ovsdb_log *file, off_t offset, unsigned 
long int length,
 json_parser_abort(parser);
 return ovsdb_io_error(ferror(file->stream) ? errno : EOF,
   "%s: error reading %lu bytes "
-  "starting at offset %lld", file->name,
-  length, (long long int) offset);
+  "starting at offset %lld",
+  file->display_name, length,
+  (long long int) offset);
 }
 sha1_update(, input, chunk);
 json_parser_feed(parser, input, chunk);
@@ -412,7 +431,7 @@ ovsdb_log_read(struct ovsdb_log *file, struct json **jsonp)
 if (feof(file->stream)) {
 return NULL;
 }
-error = ovsdb_io_error(errno, "%s: read failed", file->name);
+error = ovsdb_io_error(errno, "%s: read failed", file->display_name);
 goto error;
 }
 off_t data_offset = file->offset + strlen(header);
@@ -422,7 +441,8 @@ ovsdb_log_read(struct ovsdb_log *file, struct json **jsonp)
 || strcmp(magic, file->magic)) {
 error = ovsdb_syntax_error(NULL, NULL, "%s: parse error at offset "
"%lld in header line \"%.*s\"",
-   file->name, (long long int) file->offset,
+   file->display_name,
+   (long long int) file->offset,
(int) strcspn(header, "\n"), header);
 goto error;
 }
@@ -436,7 +456,7 @@ ovsdb_log_read(struct ovsdb_log *file, struct json **jsonp)
 error = ovsdb_syntax_error(NULL, NULL, "%s: %lu bytes starting at "
"offset %lld have SHA-1 hash "SHA1_FMT" "
"but should have hash "SHA1_FMT,
-   file->name, data_length,
+   file->display_name, 

[ovs-dev] [PATCH 11/13] log: Replace ovsdb_log_get_offset() by a more abstract mechanism.

2017-12-07 Thread Ben Pfaff
Upcoming support for clustered databases will need to provide a more
abstract way to determine when a given file should be compacted, so this
changes the standalone database support to use this mechanism in advance.

Signed-off-by: Ben Pfaff 
---
 ovsdb/file.c | 24 
 ovsdb/log.c  | 26 +++---
 ovsdb/log.h  | 22 +-
 3 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/ovsdb/file.c b/ovsdb/file.c
index fca07c3d722d..094fa0ce207a 100644
--- a/ovsdb/file.c
+++ b/ovsdb/file.c
@@ -73,7 +73,6 @@ static struct ovsdb_error *ovsdb_file_create(struct ovsdb *,
  struct ovsdb_log *,
  const char *file_name,
  unsigned int n_transactions,
- off_t snapshot_size,
  struct ovsdb_file **filep);
 
 /* Opens database 'file_name' and stores a pointer to the new database in
@@ -209,7 +208,6 @@ ovsdb_file_open__(const char *file_name,
  *
  * The schema precedes the snapshot in the log; we could compensate for its
  * size, but it's just not that important. */
-off_t snapshot_size = 0;
 unsigned int n_transactions = 0;
 while ((error = ovsdb_log_read(log, )) == NULL && json) {
 struct ovsdb_txn *txn;
@@ -230,7 +228,7 @@ ovsdb_file_open__(const char *file_name,
 }
 
 if (n_transactions == 1) {
-snapshot_size = ovsdb_log_get_offset(log);
+ovsdb_log_mark_base(log);
 }
 }
 if (error) {
@@ -247,8 +245,7 @@ ovsdb_file_open__(const char *file_name,
 if (!read_only) {
 struct ovsdb_file *file;
 
-error = ovsdb_file_create(db, log, file_name, n_transactions,
-  snapshot_size, );
+error = ovsdb_file_create(db, log, file_name, n_transactions, );
 if (error) {
 goto error;
 }
@@ -515,7 +512,6 @@ struct ovsdb_file {
 long long int last_compact;
 long long int next_compact;
 unsigned int n_transactions;
-off_t snapshot_size;
 };
 
 static const struct ovsdb_replica_class ovsdb_file_class;
@@ -523,8 +519,7 @@ static const struct ovsdb_replica_class ovsdb_file_class;
 static struct ovsdb_error *
 ovsdb_file_create(struct ovsdb *db, struct ovsdb_log *log,
   const char *file_name,
-  unsigned int n_transactions, off_t snapshot_size,
-  struct ovsdb_file **filep)
+  unsigned int n_transactions, struct ovsdb_file **filep)
 {
 struct ovsdb_file *file;
 char *deref_name;
@@ -548,7 +543,6 @@ ovsdb_file_create(struct ovsdb *db, struct ovsdb_log *log,
 file->file_name = abs_name;
 file->last_compact = time_msec();
 file->next_compact = file->last_compact + COMPACT_MIN_MSEC;
-file->snapshot_size = snapshot_size;
 file->n_transactions = n_transactions;
 ovsdb_add_replica(db, >replica);
 
@@ -601,12 +595,9 @@ ovsdb_file_commit(struct ovsdb_replica *replica,
  * tried), and if there are at least 100 transactions in the database, and
  * if the database is at least 10 MB, and the database is at least 4x the
  * size of the previous snapshot, then compact the database. */
-off_t log_size = ovsdb_log_get_offset(file->log);
 if (time_msec() >= file->next_compact
 && file->n_transactions >= 100
-&& log_size >= 10 * 1024 * 1024
-&& log_size / 4 >= file->snapshot_size)
-{
+&& ovsdb_log_has_grown(file->log)) {
 error = ovsdb_file_compact(file);
 if (error) {
 char *s = ovsdb_error_to_string(error);
@@ -653,10 +644,9 @@ ovsdb_file_compact(struct ovsdb_file *file)
 int retval;
 
 comment = xasprintf("compacting database online "
-"(%.3f seconds old, %u transactions, %llu bytes)",
+"(%.3f seconds old, %u transactions)",
 (time_wall_msec() - file->last_compact) / 1000.0,
-file->n_transactions,
-(unsigned long long) ovsdb_log_get_offset(file->log));
+file->n_transactions);
 VLOG_INFO("%s: %s", file->file_name, comment);
 
 /* Commit the old version, so that we can be assured that we'll eventually
@@ -686,6 +676,7 @@ ovsdb_file_compact(struct ovsdb_file *file)
 if (error) {
 goto exit;
 }
+ovsdb_log_mark_base(new_log);
 
 /* Replace original file by the temporary file.
  *
@@ -740,6 +731,7 @@ ovsdb_file_compact(struct ovsdb_file *file)
 ovs_fatal(0, "error reading database");
 }
 json_destroy(json);
+ovsdb_log_mark_base(file->log);
 }
 
 /* Success! */
diff --git a/ovsdb/log.c b/ovsdb/log.c
index dd641884a2ee..27f4b328d451 100644
--- a/ovsdb/log.c
+++ b/ovsdb/log.c
@@ -76,6 

[ovs-dev] [PATCH 09/13] log: Support using /dev/stdin for opening logs read-only.

2017-12-07 Thread Ben Pfaff
On Unix-like systems, usually /dev/stdin opens a duplicate of fd 0, and
this will be convenient in a few places later on.  This commit makes this
support universal.

Signed-off-by: Ben Pfaff 
---
 ovsdb/log.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/ovsdb/log.c b/ovsdb/log.c
index 1769e12266d1..d25f766474dc 100644
--- a/ovsdb/log.c
+++ b/ovsdb/log.c
@@ -166,7 +166,13 @@ ovsdb_log_open(const char *name, const char *magic,
 #ifdef _WIN32
 flags = flags | O_BINARY;
 #endif
-fd = open(name, flags, 0666);
+/* Special case for /dev/stdin to make it work even if the operating system
+ * doesn't support it under that name. */
+if (!strcmp(name, "/dev/stdin") && open_mode == OVSDB_LOG_READ_ONLY) {
+fd = dup(STDIN_FILENO);
+} else {
+fd = open(name, flags, 0666);
+}
 if (fd < 0) {
 const char *op = (open_mode == OVSDB_LOG_CREATE_EXCL ? "create"
 : open_mode == OVSDB_LOG_CREATE ? "create or open"
-- 
2.10.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 10/13] log: Support multiple magic.

2017-12-07 Thread Ben Pfaff
Some OVSDB tools will want to open files that might be standalone or
clustered databases, and so it's better if ovsdb_log_open() can accept more
than one valid "magic".  This commit makes that possible.

Signed-off-by: Ben Pfaff 
---
 ovsdb/log.c| 147 +
 ovsdb/log.h|   6 ++-
 tests/ovsdb-log.at |   2 +-
 3 files changed, 108 insertions(+), 47 deletions(-)

diff --git a/ovsdb/log.c b/ovsdb/log.c
index d25f766474dc..dd641884a2ee 100644
--- a/ovsdb/log.c
+++ b/ovsdb/log.c
@@ -88,13 +88,19 @@ static bool rename_open_files = false;
 static bool rename_open_files = true;
 #endif
 
+static bool parse_header(char *header, const char **magicp,
+ unsigned long int *length,
+ uint8_t sha1[SHA1_DIGEST_SIZE]);
+static bool is_magic_ok(const char *needle, const char *haystack);
+
 /* Attempts to open 'name' with the specified 'open_mode'.  On success, stores
  * the new log into '*filep' and returns NULL; otherwise returns NULL and
  * stores NULL into '*filep'.
  *
  * 'magic' is a short text string put at the beginning of every record and used
  * to distinguish one kind of log file from another.  For a conventional OVSDB
- * log file, use OVSDB_MAGIC.
+ * log file, use OVSDB_MAGIC.  To accept more than one magic string, separate
+ * them with "|", e.g. "MAGIC 1|MAGIC 2".
  *
  * Whether the file will be locked using lockfile_lock() depends on 'locking':
  * use true to lock it, false not to lock it, or -1 to lock it only if
@@ -107,12 +113,16 @@ ovsdb_log_open(const char *name, const char *magic,
 {
 struct lockfile *lockfile;
 struct ovsdb_error *error;
-struct ovsdb_log *file;
 struct stat s;
 FILE *stream;
 int flags;
 int fd;
 
+/* If we can create a new file, we need to know what kind of magic to
+ * use, so there must be only one kind. */
+if (open_mode == OVSDB_LOG_CREATE_EXCL || open_mode == OVSDB_LOG_CREATE) {
+ovs_assert(!strchr(magic, '|'));
+}
 *filep = NULL;
 
 ovs_assert(locking == -1 || locking == false || locking == true);
@@ -181,43 +191,54 @@ ovsdb_log_open(const char *name, const char *magic,
 goto error_unlock;
 }
 
-if (!fstat(fd, )) {
-if (s.st_size == 0) {
-/* It's (probably) a new file so fsync() its parent directory to
- * ensure that its directory entry is committed to disk. */
-fsync_parent_dir(name);
-} else if (s.st_size >= strlen(magic) && S_ISREG(s.st_mode)) {
-/* Try to read the magic from the first log record.  If it's not
- * the magic we expect, this is the wrong kind of file, so reject
- * it immediately. */
-size_t magic_len = strlen(magic);
-char *buf = xzalloc(magic_len + 1);
-bool err = (read(fd, buf, magic_len) == magic_len
-&& strcmp(buf, magic));
-free(buf);
-if (err) {
-error = ovsdb_error(NULL, "%s: bad magic (unexpected "
-"kind of file)", name);
-goto error_close;
-}
-if (lseek(fd, 0, SEEK_SET)) {
-error = ovsdb_io_error(errno, "%s: seek failed", name);
-goto error_close;
-}
-}
-}
-
 stream = fdopen(fd, open_mode == OVSDB_LOG_READ_ONLY ? "rb" : "w+b");
 if (!stream) {
 error = ovsdb_io_error(errno, "%s: fdopen failed", name);
-goto error_close;
+close(fd);
+goto error_unlock;
+}
+
+/* Read the magic from the first log record. */
+char header[128];
+const char *actual_magic;
+if (!fgets(header, sizeof header, stream)) {
+if (ferror(stream)) {
+error = ovsdb_io_error(errno, "%s: read error", name);
+goto error_fclose;
+}
+
+/* We need to be able to report what kind of file this is but we can't
+ * if it's empty and we accept more than one. */
+if (strchr(magic, '|')) {
+error = ovsdb_error(NULL, "%s: unexpected end of file", name);
+goto error_fclose;
+}
+actual_magic = magic;
+
+/* It's an empty file and therefore probably a new file, so fsync()
+ * its parent directory to ensure that its directory entry is
+ * committed to disk. */
+fsync_parent_dir(name);
+} else {
+unsigned long int length;
+uint8_t sha1[SHA1_DIGEST_SIZE];
+if (!parse_header(header, _magic, , sha1)
+|| !is_magic_ok(actual_magic, magic)) {
+error = ovsdb_error(NULL, "%s: unexpected file format", name);
+goto error_fclose;
+}
+}
+
+if (fseek(stream, 0, SEEK_SET)) {
+error = ovsdb_io_error(errno, "%s: seek failed", name);
+goto error_fclose;
 }
 
-file = xmalloc(sizeof *file);
+struct 

[ovs-dev] [PATCH 08/13] log: New function ovsdb_log_compose_record().

2017-12-07 Thread Ben Pfaff
This will acquire a new user later on.

Signed-off-by: Ben Pfaff 
---
 ovsdb/log.c | 55 ---
 ovsdb/log.h |  4 
 2 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/ovsdb/log.c b/ovsdb/log.c
index 22cd9fe89599..1769e12266d1 100644
--- a/ovsdb/log.c
+++ b/ovsdb/log.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 
+#include "openvswitch/dynamic-string.h"
 #include "openvswitch/json.h"
 #include "openvswitch/vlog.h"
 #include "lockfile.h"
@@ -419,18 +420,32 @@ ovsdb_log_unread(struct ovsdb_log *file)
 file->offset = file->prev_offset;
 }
 
+void
+ovsdb_log_compose_record(const struct json *json,
+ const char *magic, struct ds *header, struct ds *data)
+{
+ovs_assert(json->type == JSON_OBJECT || json->type == JSON_ARRAY);
+ovs_assert(!header->length);
+ovs_assert(!data->length);
+
+/* Compose content.  Add a new-line (replacing the null terminator) to make
+ * the file easier to read, even though it has no semantic value.  */
+json_to_ds(json, 0, data);
+ds_put_char(data, '\n');
+
+/* Compose header. */
+uint8_t sha1[SHA1_DIGEST_SIZE];
+sha1_bytes(data->string, data->length, sha1);
+ds_put_format(header, "%s %"PRIuSIZE" "SHA1_FMT"\n",
+  magic, data->length, SHA1_ARGS(sha1));
+}
+
 /* Writes log record 'json' to 'file'.  Returns NULL if successful or an error
  * (which the caller must eventually destroy) on failure. */
 struct ovsdb_error *
 ovsdb_log_write(struct ovsdb_log *file, const struct json *json)
 {
-uint8_t sha1[SHA1_DIGEST_SIZE];
 struct ovsdb_error *error;
-char *json_string;
-char header[128];
-size_t length;
-
-json_string = NULL;
 
 switch (file->state) {
 case OVSDB_LOG_WRITE:
@@ -463,22 +478,18 @@ ovsdb_log_write(struct ovsdb_log *file, const struct json 
*json)
 goto error;
 }
 
-/* Compose content.  Add a new-line (replacing the null terminator) to make
- * the file easier to read, even though it has no semantic value.  */
-json_string = json_to_string(json, 0);
-length = strlen(json_string) + 1;
-json_string[length - 1] = '\n';
-
-/* Compose header. */
-sha1_bytes(json_string, length, sha1);
-snprintf(header, sizeof header, "%s %"PRIuSIZE" "SHA1_FMT"\n",
- file->magic, length, SHA1_ARGS(sha1));
+struct ds header = DS_EMPTY_INITIALIZER;
+struct ds data = DS_EMPTY_INITIALIZER;
+ovsdb_log_compose_record(json, file->magic, , );
+size_t total_length = header.length + data.length;
 
 /* Write. */
-if (fwrite(header, strlen(header), 1, file->stream) != 1
-|| fwrite(json_string, length, 1, file->stream) != 1
-|| fflush(file->stream))
-{
+bool ok = (fwrite(header.string, header.length, 1, file->stream) == 1
+   && fwrite(data.string, data.length, 1, file->stream) == 1
+   && fflush(file->stream) == 0);
+ds_destroy();
+ds_destroy();
+if (!ok) {
 error = ovsdb_io_error(errno, "%s: write failed", file->name);
 
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5);
@@ -492,14 +503,12 @@ ovsdb_log_write(struct ovsdb_log *file, const struct json 
*json)
 goto error;
 }
 
-file->offset += strlen(header) + length;
-free(json_string);
+file->offset += total_length;
 return NULL;
 
 error:
 file->state = OVSDB_LOG_WRITE_ERROR;
 file->error = ovsdb_error_clone(error);
-free(json_string);
 return error;
 }
 
diff --git a/ovsdb/log.h b/ovsdb/log.h
index 41734e73b351..cc8469c01e57 100644
--- a/ovsdb/log.h
+++ b/ovsdb/log.h
@@ -19,6 +19,7 @@
 #include 
 #include "compiler.h"
 
+struct ds;
 struct json;
 struct ovsdb_log;
 
@@ -42,6 +43,9 @@ struct ovsdb_error *ovsdb_log_read(struct ovsdb_log *, struct 
json **)
 OVS_WARN_UNUSED_RESULT;
 void ovsdb_log_unread(struct ovsdb_log *);
 
+void ovsdb_log_compose_record(const struct json *, const char *magic,
+  struct ds *header, struct ds *data);
+
 struct ovsdb_error *ovsdb_log_write(struct ovsdb_log *, const struct json *)
 OVS_WARN_UNUSED_RESULT;
 struct ovsdb_error *ovsdb_log_commit(struct ovsdb_log *)
-- 
2.10.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 07/13] log: New functions for replacing a log's contents.

2017-12-07 Thread Ben Pfaff
These functions will acquire users in future commits.

This new functionality made the internal state machine of the log hard
enough to understand that I thought that it was best to make it explicit
with a 'state' variable, so this commit introduces one.

This commit duplicates code and logic from ovsdb_rename() and
ovsdb_compact() in ovsdb/file.c.  A future commit removes this duplication.

Signed-off-by: Ben Pfaff 
---
 ovsdb/log.c| 294 -
 ovsdb/log.h|  14 +++
 tests/ovsdb-log.at |  74 ++
 tests/test-ovsdb.c |  58 +--
 4 files changed, 404 insertions(+), 36 deletions(-)

diff --git a/ovsdb/log.c b/ovsdb/log.c
index 6a1a4484ad34..22cd9fe89599 100644
--- a/ovsdb/log.c
+++ b/ovsdb/log.c
@@ -36,23 +36,57 @@
 
 VLOG_DEFINE_THIS_MODULE(ovsdb_log);
 
-enum ovsdb_log_mode {
-OVSDB_LOG_READ,
-OVSDB_LOG_WRITE
+/* State in a log's state machine.
+ *
+ * OVSDB_LOG_READ is the initial state for a newly opened log.  Log records may
+ * be read in this state only.  Reaching end of file does not cause a state
+ * transition.  A read error transitions to OVSDB_LOG_READ_ERROR.
+ *
+ * OVSDB_LOG_READ_ERROR prevents further reads from succeeding; they will
+ * report the same error as before.  A log write transitions away to
+ * OVSDB_LOG_WRITE or OVSDB_LOG_WRITE_ERROR.
+ *
+ * OVSDB_LOG_WRITE is the state following a call to ovsdb_log_write(), when all
+ * goes well.  Any state other than OVSDB_LOG_BROKEN may transition to this
+ * state.  A write error transitions to OVSDB_LOG_WRITE_ERROR.
+ *
+ * OVSDB_LOG_WRITE_ERROR is the state following a write error.  Further writes
+ * retry and might transition back to OVSDB_LOG_WRITE.
+ *
+ * OVSDB_LOG_BROKEN is the state following a call to ovsdb_log_replace() or
+ * ovsdb_log_replace_commit(), if it fails in a spectacular enough way that no
+ * further reads or writes can succeed.  This is a terminal state.
+ */
+enum ovsdb_log_state {
+OVSDB_LOG_READ, /* Ready to read. */
+OVSDB_LOG_READ_ERROR,   /* Read failed, see 'error' for details. */
+OVSDB_LOG_WRITE,/* Ready to write. */
+OVSDB_LOG_WRITE_ERROR,  /* Write failed, see 'error' for details. */
+OVSDB_LOG_BROKEN,   /* Disk on fire, see 'error' for details. */
 };
 
 struct ovsdb_log {
+enum ovsdb_log_state state;
+struct ovsdb_error *error;
+
 off_t prev_offset;
 off_t offset;
 char *name;
 char *magic;
 struct lockfile *lockfile;
 FILE *stream;
-struct ovsdb_error *read_error;
-bool write_error;
-enum ovsdb_log_mode mode;
 };
 
+/* Whether the OS supports renaming open files.
+ *
+ * (Making this a variable makes it easier to test both strategies on Unix-like
+ * systems.) */
+#ifdef _WIN32
+static bool rename_open_files = false;
+#else
+static bool rename_open_files = true;
+#endif
+
 /* Attempts to open 'name' with the specified 'open_mode'.  On success, stores
  * the new log into '*filep' and returns NULL; otherwise returns NULL and
  * stores NULL into '*filep'.
@@ -173,15 +207,14 @@ ovsdb_log_open(const char *name, const char *magic,
 }
 
 file = xmalloc(sizeof *file);
+file->state = OVSDB_LOG_READ;
+file->error = NULL;
 file->name = xstrdup(name);
 file->magic = xstrdup(magic);
 file->lockfile = lockfile;
 file->stream = stream;
 file->prev_offset = 0;
 file->offset = 0;
-file->read_error = NULL;
-file->write_error = false;
-file->mode = OVSDB_LOG_READ;
 *filep = file;
 return NULL;
 
@@ -197,10 +230,12 @@ void
 ovsdb_log_close(struct ovsdb_log *file)
 {
 if (file) {
+ovsdb_error_destroy(file->error);
 free(file->name);
-fclose(file->stream);
+if (file->stream) {
+fclose(file->stream);
+}
 lockfile_unlock(file->lockfile);
-ovsdb_error_destroy(file->read_error);
 free(file);
 }
 }
@@ -283,6 +318,20 @@ parse_body(struct ovsdb_log *file, off_t offset, unsigned 
long int length,
 struct ovsdb_error *
 ovsdb_log_read(struct ovsdb_log *file, struct json **jsonp)
 {
+*jsonp = NULL;
+switch (file->state) {
+case OVSDB_LOG_READ:
+break;
+
+case OVSDB_LOG_READ_ERROR:
+case OVSDB_LOG_WRITE_ERROR:
+case OVSDB_LOG_BROKEN:
+return ovsdb_error_clone(file->error);
+
+case OVSDB_LOG_WRITE:
+return NULL;
+}
+
 uint8_t expected_sha1[SHA1_DIGEST_SIZE];
 uint8_t actual_sha1[SHA1_DIGEST_SIZE];
 struct ovsdb_error *error;
@@ -291,20 +340,13 @@ ovsdb_log_read(struct ovsdb_log *file, struct json 
**jsonp)
 struct json *json;
 char header[128];
 
-*jsonp = json = NULL;
-
-if (file->read_error) {
-return ovsdb_error_clone(file->read_error);
-} else if (file->mode == OVSDB_LOG_WRITE) {
-return NULL;
-}
+json = NULL;
 
 if (!fgets(header, sizeof header, file->stream)) {
 

[ovs-dev] [PATCH 05/13] log: Add new open mode OVSDB_LOG_CREATE_EXCL.

2017-12-07 Thread Ben Pfaff
Until now, OVSDB_LOG_CREATE implied EXCL, but this commit breaks them
apart.

Signed-off-by: Ben Pfaff 
---
 ovsdb/file.c   |  2 +-
 ovsdb/log.c| 23 ++-
 ovsdb/log.h|  3 ++-
 ovsdb/ovsdb-tool.c |  2 +-
 tests/ovsdb-log.at |  7 ++-
 tests/test-ovsdb.c |  2 ++
 6 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/ovsdb/file.c b/ovsdb/file.c
index 16461a75bfe5..fca07c3d722d 100644
--- a/ovsdb/file.c
+++ b/ovsdb/file.c
@@ -441,7 +441,7 @@ ovsdb_file_save_copy__(const char *file_name, int locking,
 struct json *json;
 
 error = ovsdb_log_open(file_name, OVSDB_MAGIC,
-   OVSDB_LOG_CREATE, locking, );
+   OVSDB_LOG_CREATE_EXCL, locking, );
 if (error) {
 return error;
 }
diff --git a/ovsdb/log.c b/ovsdb/log.c
index f3c6e22ae212..f95075034b3d 100644
--- a/ovsdb/log.c
+++ b/ovsdb/log.c
@@ -95,11 +95,16 @@ ovsdb_log_open(const char *name, const char *magic,
 lockfile = NULL;
 }
 
-if (open_mode == OVSDB_LOG_READ_ONLY) {
+switch (open_mode) {
+case OVSDB_LOG_READ_ONLY:
 flags = O_RDONLY;
-} else if (open_mode == OVSDB_LOG_READ_WRITE) {
+break;
+
+case OVSDB_LOG_READ_WRITE:
 flags = O_RDWR;
-} else if (open_mode == OVSDB_LOG_CREATE) {
+break;
+
+case OVSDB_LOG_CREATE_EXCL:
 #ifndef _WIN32
 if (stat(name, ) == -1 && errno == ENOENT
 && lstat(name, ) == 0 && S_ISLNK(s.st_mode)) {
@@ -114,7 +119,13 @@ ovsdb_log_open(const char *name, const char *magic,
 #else
 flags = O_RDWR | O_CREAT | O_EXCL;
 #endif
-} else {
+break;
+
+case OVSDB_LOG_CREATE:
+flags = O_RDWR | O_CREAT;
+break;
+
+default:
 OVS_NOT_REACHED();
 }
 #ifdef _WIN32
@@ -122,7 +133,9 @@ ovsdb_log_open(const char *name, const char *magic,
 #endif
 fd = open(name, flags, 0666);
 if (fd < 0) {
-const char *op = open_mode == OVSDB_LOG_CREATE ? "create" : "open";
+const char *op = (open_mode == OVSDB_LOG_CREATE_EXCL ? "create"
+: open_mode == OVSDB_LOG_CREATE ? "create or open"
+: "open");
 error = ovsdb_io_error(errno, "%s: %s failed", name, op);
 goto error_unlock;
 }
diff --git a/ovsdb/log.h b/ovsdb/log.h
index fd495e518dd0..08c6a7783094 100644
--- a/ovsdb/log.h
+++ b/ovsdb/log.h
@@ -26,7 +26,8 @@ struct ovsdb_log;
 enum ovsdb_log_open_mode {
 OVSDB_LOG_READ_ONLY,/* Open existing file, read-only. */
 OVSDB_LOG_READ_WRITE,   /* Open existing file, read/write. */
-OVSDB_LOG_CREATE/* Create new file, read/write. */
+OVSDB_LOG_CREATE_EXCL,  /* Create new file, read/write. */
+OVSDB_LOG_CREATE/* Create or open file, read/write. */
 };
 
 #define OVSDB_MAGIC "OVSDB JSON"
diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c
index 45b3f7348c3d..da6b7142a4fa 100644
--- a/ovsdb/ovsdb-tool.c
+++ b/ovsdb/ovsdb-tool.c
@@ -218,7 +218,7 @@ do_create(struct ovs_cmdl_context *ctx)
 
 /* Create database file. */
 check_ovsdb_error(ovsdb_log_open(db_file_name, OVSDB_MAGIC,
- OVSDB_LOG_CREATE, -1, ));
+ OVSDB_LOG_CREATE_EXCL, -1, ));
 check_ovsdb_error(ovsdb_log_write(log, json));
 check_ovsdb_error(ovsdb_log_commit(log));
 ovsdb_log_close(log);
diff --git a/tests/ovsdb-log.at b/tests/ovsdb-log.at
index 29c0c5913c15..826e334efddd 100644
--- a/tests/ovsdb-log.at
+++ b/tests/ovsdb-log.at
@@ -46,9 +46,14 @@ AT_CHECK(
 file: read: {"x":1}
 ]], [ignore])
 AT_CHECK(
-  [test-ovsdb log-io file create read], [1],
+  [test-ovsdb log-io file create-excl read], [1],
   [], [test-ovsdb: I/O error: file: create failed (File exists)
 ])
+AT_CHECK(
+  [test-ovsdb log-io file create read], [0],
+  [file: open successful
+file: read: {"x":1}
+])
 AT_CHECK([test -f .file.~lock~])
 AT_CLEANUP
 
diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
index 4d6a894e3e83..e06ecad4f368 100644
--- a/tests/test-ovsdb.c
+++ b/tests/test-ovsdb.c
@@ -323,6 +323,8 @@ do_log_io(struct ovs_cmdl_context *ctx)
 mode = OVSDB_LOG_READ_WRITE;
 } else if (!strcmp(mode_string, "create")) {
 mode = OVSDB_LOG_CREATE;
+} else if (!strcmp(mode_string, "create-excl")) {
+mode = OVSDB_LOG_CREATE_EXCL;
 } else {
 ovs_fatal(0, "unknown log-io open mode \"%s\"", mode_string);
 }
-- 
2.10.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 06/13] log: Make reading in writing mode less of an error.

2017-12-07 Thread Ben Pfaff
Clients are intended to use the OVSDB log code by reading zero or more
records and then writing zero or more records, but not reading after any
write has occurred.  Until now, ovsdb_log_read() has signaled the latter
behavior as an error.  Upcoming changes to OVSDB are going to make it an
expected behavior in some cases, so this commit changes it so that it
just becomes an empty read.

Signed-off-by: Ben Pfaff 
---
 ovsdb/log.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ovsdb/log.c b/ovsdb/log.c
index f95075034b3d..6a1a4484ad34 100644
--- a/ovsdb/log.c
+++ b/ovsdb/log.c
@@ -296,7 +296,7 @@ ovsdb_log_read(struct ovsdb_log *file, struct json **jsonp)
 if (file->read_error) {
 return ovsdb_error_clone(file->read_error);
 } else if (file->mode == OVSDB_LOG_WRITE) {
-return OVSDB_BUG("reading file in write mode");
+return NULL;
 }
 
 if (!fgets(header, sizeof header, file->stream)) {
-- 
2.10.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 04/13] log: Log write errors.

2017-12-07 Thread Ben Pfaff
This saves all the callers from logging them separately.

Signed-off-by: Ben Pfaff 
---
 ovsdb/log.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/ovsdb/log.c b/ovsdb/log.c
index 1a372d7b73ee..f3c6e22ae212 100644
--- a/ovsdb/log.c
+++ b/ovsdb/log.c
@@ -25,6 +25,7 @@
 #include 
 
 #include "openvswitch/json.h"
+#include "openvswitch/vlog.h"
 #include "lockfile.h"
 #include "ovsdb.h"
 #include "ovsdb-error.h"
@@ -33,6 +34,8 @@
 #include "transaction.h"
 #include "util.h"
 
+VLOG_DEFINE_THIS_MODULE(ovsdb_log);
+
 enum ovsdb_log_mode {
 OVSDB_LOG_READ,
 OVSDB_LOG_WRITE
@@ -411,6 +414,10 @@ ovsdb_log_write(struct ovsdb_log *file, const struct json 
*json)
 {
 error = ovsdb_io_error(errno, "%s: write failed", file->name);
 
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5);
+VLOG_WARN_RL(, "%s: write failed (%s)",
+ file->name, ovs_strerror(errno));
+
 /* Remove any partially written data, ignoring errors since there is
  * nothing further we can do. */
 ignore(ftruncate(fileno(file->stream), file->offset));
-- 
2.10.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 03/13] log: Make json parameter to ovsdb_log_write() const.

2017-12-07 Thread Ben Pfaff
Signed-off-by: Ben Pfaff 
---
 ovsdb/log.c | 4 +++-
 ovsdb/log.h | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/ovsdb/log.c b/ovsdb/log.c
index d9fedd9ded6c..1a372d7b73ee 100644
--- a/ovsdb/log.c
+++ b/ovsdb/log.c
@@ -360,8 +360,10 @@ ovsdb_log_unread(struct ovsdb_log *file)
 file->offset = file->prev_offset;
 }
 
+/* Writes log record 'json' to 'file'.  Returns NULL if successful or an error
+ * (which the caller must eventually destroy) on failure. */
 struct ovsdb_error *
-ovsdb_log_write(struct ovsdb_log *file, struct json *json)
+ovsdb_log_write(struct ovsdb_log *file, const struct json *json)
 {
 uint8_t sha1[SHA1_DIGEST_SIZE];
 struct ovsdb_error *error;
diff --git a/ovsdb/log.h b/ovsdb/log.h
index fbcea1f2b991..fd495e518dd0 100644
--- a/ovsdb/log.h
+++ b/ovsdb/log.h
@@ -41,7 +41,7 @@ struct ovsdb_error *ovsdb_log_read(struct ovsdb_log *, struct 
json **)
 OVS_WARN_UNUSED_RESULT;
 void ovsdb_log_unread(struct ovsdb_log *);
 
-struct ovsdb_error *ovsdb_log_write(struct ovsdb_log *, struct json *)
+struct ovsdb_error *ovsdb_log_write(struct ovsdb_log *, const struct json *)
 OVS_WARN_UNUSED_RESULT;
 struct ovsdb_error *ovsdb_log_commit(struct ovsdb_log *)
 OVS_WARN_UNUSED_RESULT;
-- 
2.10.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 02/13] log: Require log entries to be JSON objects.

2017-12-07 Thread Ben Pfaff
The current and upcoming users of the OVSDB logging module only use
JSON objects as records.  This commit simplifies the users slightly
by allowing them to always assume that the records are JSON objects.

Unfortunately this resulted in a large number of updates to tests,
which didn't always use JSON objects.

Signed-off-by: Ben Pfaff 
---
 ovsdb/log.c|  17 +
 tests/ovsdb-log.at | 212 ++---
 2 files changed, 123 insertions(+), 106 deletions(-)

diff --git a/ovsdb/log.c b/ovsdb/log.c
index e6f66e668fe5..d9fedd9ded6c 100644
--- a/ovsdb/log.c
+++ b/ovsdb/log.c
@@ -254,6 +254,16 @@ parse_body(struct ovsdb_log *file, off_t offset, unsigned 
long int length,
 return NULL;
 }
 
+/* Attempts to read a log record from 'file'.
+ *
+ * If successful, returns NULL and stores in '*jsonp' the JSON object that the
+ * record contains.  The caller owns the data and must eventually free it (with
+ * json_destroy()).
+ *
+ * If a read error occurs, returns the error and stores NULL in '*jsonp'.
+ *
+ * If the read reaches end of file, returns NULL and stores NULL in
+ * '*jsonp'. */
 struct ovsdb_error *
 ovsdb_log_read(struct ovsdb_log *file, struct json **jsonp)
 {
@@ -315,6 +325,13 @@ ovsdb_log_read(struct ovsdb_log *file, struct json **jsonp)
json->u.string);
 goto error;
 }
+if (json->type != JSON_OBJECT) {
+error = ovsdb_syntax_error(NULL, NULL, "%s: %lu bytes starting at "
+   "offset %lld are not a JSON object",
+   file->name, data_length,
+   (long long int) data_offset);
+goto error;
+}
 
 file->prev_offset = file->offset;
 file->offset = data_offset + data_length;
diff --git a/tests/ovsdb-log.at b/tests/ovsdb-log.at
index c8efaaec1a50..29c0c5913c15 100644
--- a/tests/ovsdb-log.at
+++ b/tests/ovsdb-log.at
@@ -19,14 +19,14 @@ AT_SETUP([write one, reread])
 AT_KEYWORDS([ovsdb log])
 AT_CAPTURE_FILE([file])
 AT_CHECK(
-  [[test-ovsdb log-io file create 'write:[0]']], [0], 
+  [[test-ovsdb log-io file create 'write:{"x":0}']], [0],
   [[file: open successful
-file: write:[0] successful
+file: write:{"x":0} successful
 ]], [ignore])
 AT_CHECK(
   [test-ovsdb log-io file read-only read read], [0], 
   [[file: open successful
-file: read: [0]
+file: read: {"x":0}
 file: read: end of log
 ]], [ignore])
 AT_CHECK([test -f .file.~lock~])
@@ -36,14 +36,14 @@ AT_SETUP([check that create fails if file exists])
 AT_KEYWORDS([ovsdb log])
 AT_CAPTURE_FILE([file])
 AT_CHECK(
-  [[test-ovsdb log-io file create 'write:[1]']], [0], 
+  [[test-ovsdb log-io file create 'write:{"x":1}']], [0],
   [[file: open successful
-file: write:[1] successful
+file: write:{"x":1} successful
 ]], [ignore])
 AT_CHECK(
   [test-ovsdb log-io file read-only read], [0], 
   [[file: open successful
-file: read: [1]
+file: read: {"x":1}
 ]], [ignore])
 AT_CHECK(
   [test-ovsdb log-io file create read], [1],
@@ -56,18 +56,18 @@ AT_SETUP([write one, reread])
 AT_KEYWORDS([ovsdb log])
 AT_CAPTURE_FILE([file])
 AT_CHECK(
-  [[test-ovsdb log-io file create 'write:[0]' 'write:[1]' 'write:[2]']], [0], 
+  [[test-ovsdb log-io file create 'write:{"x":0}' 'write:{"x":1}' 
'write:{"x":2}']], [0],
   [[file: open successful
-file: write:[0] successful
-file: write:[1] successful
-file: write:[2] successful
+file: write:{"x":0} successful
+file: write:{"x":1} successful
+file: write:{"x":2} successful
 ]], [ignore])
 AT_CHECK(
   [test-ovsdb log-io file read-only read read read read], [0], 
   [[file: open successful
-file: read: [0]
-file: read: [1]
-file: read: [2]
+file: read: {"x":0}
+file: read: {"x":1}
+file: read: {"x":2}
 file: read: end of log
 ]], [ignore])
 AT_CHECK([test -f .file.~lock~])
@@ -79,18 +79,18 @@ AT_CAPTURE_FILE([file])
 # Sometimes you just need more magic:
 # http://www.catb.org/jargon/html/magic-story.html
 AT_CHECK(
-  [[test-ovsdb --magic="MORE MAGIC" log-io file create 'write:[0]' 'write:[1]' 
'write:[2]']], [0],
+  [[test-ovsdb --magic="MORE_MAGIC" log-io file create 'write:{"x":0}' 
'write:{"x":1}' 'write:{"x":2}']], [0],
   [[file: open successful
-file: write:[0] successful
-file: write:[1] successful
-file: write:[2] successful
+file: write:{"x":0} successful
+file: write:{"x":1} successful
+file: write:{"x":2} successful
 ]], [ignore])
 AT_CHECK(
-  [test-ovsdb --magic="MORE MAGIC" log-io file read-only read read read read], 
[0],
+  [test-ovsdb --magic="MORE_MAGIC" log-io file read-only read read read read], 
[0],
   [[file: open successful
-file: read: [0]
-file: read: [1]
-file: read: [2]
+file: read: {"x":0}
+file: read: {"x":1}
+file: read: {"x":2}
 file: read: end of log
 ]], [ignore])
 AT_CHECK(
@@ -104,27 +104,27 @@ AT_SETUP([write one, reread, append])
 AT_KEYWORDS([ovsdb log])
 AT_CAPTURE_FILE([file])
 AT_CHECK(
-  [[test-ovsdb log-io file create 'write:[0]' 'write:[1]' 

[ovs-dev] [PATCH 01/13] log: Allow client to specify magic.

2017-12-07 Thread Ben Pfaff
Until now, the logging code in ovsdb has only supported a single file
format, for OVSDB standalone database files.  Upcoming commits will add
support for another, incompatible format, which uses a different magic
string for identification.  This commit allows the logging code to
support both formats.

Signed-off-by: Ben Pfaff 
---
 ovsdb/file.c   |  5 +++--
 ovsdb/log.c| 54 --
 ovsdb/log.h|  5 -
 ovsdb/ovsdb-tool.c | 10 +-
 tests/ovsdb-log.at | 27 +++
 tests/test-ovsdb.c | 17 ++---
 6 files changed, 93 insertions(+), 25 deletions(-)

diff --git a/ovsdb/file.c b/ovsdb/file.c
index 6a406da2a838..16461a75bfe5 100644
--- a/ovsdb/file.c
+++ b/ovsdb/file.c
@@ -129,7 +129,7 @@ ovsdb_file_open_log(const char *file_name, enum 
ovsdb_log_open_mode open_mode,
 
 ovs_assert(logp || schemap);
 
-error = ovsdb_log_open(file_name, open_mode, -1, );
+error = ovsdb_log_open(file_name, OVSDB_MAGIC, open_mode, -1, );
 if (error) {
 goto error;
 }
@@ -440,7 +440,8 @@ ovsdb_file_save_copy__(const char *file_name, int locking,
 struct ovsdb_log *log;
 struct json *json;
 
-error = ovsdb_log_open(file_name, OVSDB_LOG_CREATE, locking, );
+error = ovsdb_log_open(file_name, OVSDB_MAGIC,
+   OVSDB_LOG_CREATE, locking, );
 if (error) {
 return error;
 }
diff --git a/ovsdb/log.c b/ovsdb/log.c
index 380f5e93d464..e6f66e668fe5 100644
--- a/ovsdb/log.c
+++ b/ovsdb/log.c
@@ -1,4 +1,4 @@
-/* Copyright (c) 2009, 2010, 2011, 2012, 2013 Nicira, Inc.
+/* Copyright (c) 2009, 2010, 2011, 2012, 2013, 2017 Nicira, Inc.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -42,6 +42,7 @@ struct ovsdb_log {
 off_t prev_offset;
 off_t offset;
 char *name;
+char *magic;
 struct lockfile *lockfile;
 FILE *stream;
 struct ovsdb_error *read_error;
@@ -53,12 +54,17 @@ struct ovsdb_log {
  * the new log into '*filep' and returns NULL; otherwise returns NULL and
  * stores NULL into '*filep'.
  *
+ * 'magic' is a short text string put at the beginning of every record and used
+ * to distinguish one kind of log file from another.  For a conventional OVSDB
+ * log file, use OVSDB_MAGIC.
+ *
  * Whether the file will be locked using lockfile_lock() depends on 'locking':
  * use true to lock it, false not to lock it, or -1 to lock it only if
  * 'open_mode' is a mode that allows writing.
  */
 struct ovsdb_error *
-ovsdb_log_open(const char *name, enum ovsdb_log_open_mode open_mode,
+ovsdb_log_open(const char *name, const char *magic,
+   enum ovsdb_log_open_mode open_mode,
int locking, struct ovsdb_log **filep)
 {
 struct lockfile *lockfile;
@@ -118,10 +124,30 @@ ovsdb_log_open(const char *name, enum ovsdb_log_open_mode 
open_mode,
 goto error_unlock;
 }
 
-if (!fstat(fd, ) && s.st_size == 0) {
-/* It's (probably) a new file so fsync() its parent directory to ensure
- * that its directory entry is committed to disk. */
-fsync_parent_dir(name);
+if (!fstat(fd, )) {
+if (s.st_size == 0) {
+/* It's (probably) a new file so fsync() its parent directory to
+ * ensure that its directory entry is committed to disk. */
+fsync_parent_dir(name);
+} else if (s.st_size >= strlen(magic) && S_ISREG(s.st_mode)) {
+/* Try to read the magic from the first log record.  If it's not
+ * the magic we expect, this is the wrong kind of file, so reject
+ * it immediately. */
+size_t magic_len = strlen(magic);
+char *buf = xzalloc(magic_len + 1);
+bool err = (read(fd, buf, magic_len) == magic_len
+&& strcmp(buf, magic));
+free(buf);
+if (err) {
+error = ovsdb_error(NULL, "%s: bad magic (unexpected "
+"kind of file)", name);
+goto error_close;
+}
+if (lseek(fd, 0, SEEK_SET)) {
+error = ovsdb_io_error(errno, "%s: seek failed", name);
+goto error_close;
+}
+}
 }
 
 stream = fdopen(fd, open_mode == OVSDB_LOG_READ_ONLY ? "rb" : "w+b");
@@ -132,6 +158,7 @@ ovsdb_log_open(const char *name, enum ovsdb_log_open_mode 
open_mode,
 
 file = xmalloc(sizeof *file);
 file->name = xstrdup(name);
+file->magic = xstrdup(magic);
 file->lockfile = lockfile;
 file->stream = stream;
 file->prev_offset = 0;
@@ -162,21 +189,20 @@ ovsdb_log_close(struct ovsdb_log *file)
 }
 }
 
-static const char magic[] = "OVSDB JSON ";
-
 static bool
-parse_header(char *header, unsigned long int *length,
+parse_header(const char *magic, char *header, unsigned long int 

[ovs-dev] [PATCH 00/13] OVSDB log enhancements

2017-12-07 Thread Ben Pfaff
These commits are prerequisites for the distributed database project
currently underway.

Ben Pfaff (13):
  log: Allow client to specify magic.
  log: Require log entries to be JSON objects.
  log: Make json parameter to ovsdb_log_write() const.
  log: Log write errors.
  log: Add new open mode OVSDB_LOG_CREATE_EXCL.
  log: Make reading in writing mode less of an error.
  log: New functions for replacing a log's contents.
  log: New function ovsdb_log_compose_record().
  log: Support using /dev/stdin for opening logs read-only.
  log: Support multiple magic.
  log: Replace ovsdb_log_get_offset() by a more abstract mechanism.
  log: Use absolute name of log file.
  log: Add async commit support.

 ovsdb/file.c   |  33 +--
 ovsdb/log.c| 788 +
 ovsdb/log.h|  61 -
 ovsdb/ovsdb-tool.c |  12 +-
 tests/ovsdb-log.at | 304 ++---
 tests/test-ovsdb.c |  75 -
 6 files changed, 1018 insertions(+), 255 deletions(-)

-- 
2.10.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath-windows: Correct endianness for deleting zone.

2017-12-07 Thread Sairam Venugopal
We can hold off on back-porting this. This change mainly affects the delete by 
5-tuple API.

https://github.com/openvswitch/ovs/commit/817a76577fec3f03310d7d3a5a10df01340ee8ad

Unless we plan to backport that patch, we should hold off on backporting this 
as well.

Thanks,
Sairam



On 12/6/17, 9:41 AM, "aserd...@ovn.org"  wrote:

>> -Original Message-
>> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
>> boun...@openvswitch.org] On Behalf Of Justin Pettit
>> Sent: Tuesday, December 5, 2017 9:30 AM
>> To: d...@openvswitch.org; Alin Gabriel Serdean ;
>> Sairam Venugopal 
>> Subject: [ovs-dev] [PATCH] datapath-windows: Correct endianness for
>> deleting zone.
>> 
>> The zone Netlink attribute is supposed to be in network-byte order, but
>the
>> Windows code for deleting conntrack entries was treating it as host-byte
>> order.
>> 
>> Found by inspection.
>> 
>> Signed-off-by: Justin Pettit 
>
>Thanks for the patch, Justin.  I tested the patch, and everything looks
>good.
>I want to hold on the acknowledge until all of us are on the same page.
>
>This will make the windows datapath consistent with future patches from the
>userspace.
>For consistency I would like to backport the patch all the way to
>branch-2.6.
>
>What do you think Sai?
>
>Thanks,
>Alin.
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath-windows: Correct endianness for deleting zone.

2017-12-07 Thread Justin Pettit


> On Dec 7, 2017, at 2:57 PM, Sairam Venugopal  wrote:
> 
> Thanks for the patch. Discussed about the patch offline. 
> 
> This is required since we have now introduced a newer api for deleting 
> 5-tuples
> and this breaks the assumption of zone-id not being in the right byte order.
> 
> Acked-by: Sairam Venugopal 

Thanks.  I pushed this to master.

--Justin


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath-windows: Correct endianness for deleting zone.

2017-12-07 Thread Sairam Venugopal
Thanks for the patch. Discussed about the patch offline. 

This is required since we have now introduced a newer api for deleting 5-tuples
and this breaks the assumption of zone-id not being in the right byte order.

Acked-by: Sairam Venugopal 





On 12/4/17, 11:30 PM, "Justin Pettit"  wrote:

>The zone Netlink attribute is supposed to be in network-byte order, but
>the Windows code for deleting conntrack entries was treating it as
>host-byte order.
>
>Found by inspection.
>
>Signed-off-by: Justin Pettit 
>---
> datapath-windows/ovsext/Conntrack.c | 2 +-
> lib/netlink-conntrack.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/datapath-windows/ovsext/Conntrack.c 
>b/datapath-windows/ovsext/Conntrack.c
>index 3203411a8b7a..edc0ec9c5324 100644
>--- a/datapath-windows/ovsext/Conntrack.c
>+++ b/datapath-windows/ovsext/Conntrack.c
>@@ -1076,7 +1076,7 @@ OvsCtDeleteCmdHandler(POVS_USER_PARAMS_CONTEXT 
>usrParamsCtx,
> }
> 
> if (ctAttrs[CTA_ZONE]) {
>-zone = NlAttrGetU16(ctAttrs[CTA_ZONE]);
>+zone = ntohs(NlAttrGetU16(ctAttrs[CTA_ZONE]));
> }
> 
> status = OvsCtFlush(zone);
>diff --git a/lib/netlink-conntrack.c b/lib/netlink-conntrack.c
>index 1e1bb2f79d1d..3c651b6c856a 100644
>--- a/lib/netlink-conntrack.c
>+++ b/lib/netlink-conntrack.c
>@@ -251,7 +251,7 @@ nl_ct_flush_zone(uint16_t flush_zone)
> 
> nl_msg_put_nfgenmsg(, 0, AF_UNSPEC, NFNL_SUBSYS_CTNETLINK,
> IPCTNL_MSG_CT_DELETE, NLM_F_REQUEST);
>-nl_msg_put_be16(, CTA_ZONE, flush_zone);
>+nl_msg_put_be16(, CTA_ZONE, htons(flush_zone));
> 
> err = nl_transact(NETLINK_NETFILTER, , NULL);
> ofpbuf_uninit();
>-- 
>2.7.4
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto: Keep inserting buckets into a group from changing group type.

2017-12-07 Thread Ben Pfaff
On Thu, Dec 07, 2017 at 02:03:56PM -0800, Ben Pfaff wrote:
> The "insert buckets" and "delete buckets" operations on a group should not
> change the group's type or properties, but the implementation did this by
> mistake.  This fixes the problem.
> 
> Reported-by: shivani dommeti 
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2017-December/045830.html
> Signed-off-by: Ben Pfaff 

This isn't well specified by OpenFlow so I filed EXT-570 in the ONF's
JIRA as a request for clarification.  (I expect no response.)
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ofproto: Keep inserting buckets into a group from changing group type.

2017-12-07 Thread Ben Pfaff
The "insert buckets" and "delete buckets" operations on a group should not
change the group's type or properties, but the implementation did this by
mistake.  This fixes the problem.

Reported-by: shivani dommeti 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2017-December/045830.html
Signed-off-by: Ben Pfaff 
---
 AUTHORS.rst   |  1 +
 ofproto/ofproto.c | 17 +
 tests/ofproto.at  | 25 +
 3 files changed, 43 insertions(+)

diff --git a/AUTHORS.rst b/AUTHORS.rst
index 075b2f72d47d..979cfd7bce41 100644
--- a/AUTHORS.rst
+++ b/AUTHORS.rst
@@ -589,6 +589,7 @@ likunyunkunyu...@hotmail.com
 meishengxin meisheng...@huawei.com
 neeraj mehtamehtaneera...@gmail.com
 rahim entezari  rahim.entez...@gmail.com
+shivani dommeti shivani.domm...@gmail.com
 weizj   34965...@qq.com
 俊 赵   zhaoju...@outlook.com
 冯全树(Crab)fqs...@126.com
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 82c2bb27d348..84eb18e901ed 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -6905,6 +6905,13 @@ handle_queue_get_config_request(struct ofconn *ofconn,
 return error;
 }
 
+/* Allocates, initializes, and constructs a new group in 'ofproto', obtaining
+ * all the attributes for it from 'gm', and stores a pointer to it in
+ * '*ofgroup'.  Makes the new group visible from the flow table starting from
+ * 'version'.
+ *
+ * Returns 0 if successful, otherwise an error code.  If there is an error then
+ * '*ofgroup' is indeterminate upon return. */
 static enum ofperr
 init_group(struct ofproto *ofproto, const struct ofputil_group_mod *gm,
ovs_version_t version, struct ofgroup **ofgroup)
@@ -7113,6 +7120,16 @@ modify_group_start(struct ofproto *ofproto, struct 
ofproto_group_mod *ogm)
 return OFPERR_OFPGMFC_UNKNOWN_GROUP;
 }
 
+/* Inserting or deleting a bucket should not change the group's type or
+ * properties, so change the group mod so that these aspects match the old
+ * group.  (See EXT-570.)  */
+if (ogm->gm.command == OFPGC15_INSERT_BUCKET ||
+ogm->gm.command == OFPGC15_REMOVE_BUCKET) {
+ogm->gm.type = old_group->type;
+ofputil_group_properties_destroy(>gm.props);
+ofputil_group_properties_copy(>gm.props, _group->props);
+}
+
 if (old_group->type != ogm->gm.type
 && (ofproto->n_groups[ogm->gm.type]
 >= ofproto->ogf.max_groups[ogm->gm.type])) {
diff --git a/tests/ofproto.at b/tests/ofproto.at
index c19a3d10daed..74a30cbefe3b 100644
--- a/tests/ofproto.at
+++ b/tests/ofproto.at
@@ -621,7 +621,32 @@ OFPST_GROUP_DESC reply (OF1.5):
  
group_id=1234,type=all,bucket=bucket_id:0,actions=output:0,bucket=bucket_id:1,actions=output:1,bucket=bucket_id:10,actions=output:10,bucket=bucket_id:11,actions=output:11,bucket=bucket_id:12,actions=output:12,bucket=bucket_id:13,actions=output:13,bucket=bucket_id:14,actions=output:14,bucket=bucket_id:15,actions=output:15,bucket=bucket_id:20,actions=output:20,bucket=bucket_id:21,actions=output:21
 ])
 
+# Delete groups.
+AT_CHECK([ovs-ofctl -O OpenFlow15 del-groups br0])
+
+# Add "fast_failover" group, then insert a bucket into it and make
+# sure that the type of the group doesn't change.  (There was a bug
+# that caused this to happen.)
+AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn add-group br0 group_id=1234,type=ff])
+AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn dump-groups br0], [0], [stdout])
+AT_CHECK([strip_xids < stdout], [0], [dnl
+OFPST_GROUP_DESC reply (OF1.5):
+ group_id=1234,type=ff
+])
+AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn insert-buckets br0 
group_id=1234,command_bucket_id=first,bucket=bucket_id:20,actions=output:20,bucket=bucket_id:21,actions=output:21])
+AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn dump-groups br0], [0], [stdout])
+AT_CHECK([strip_xids < stdout], [0], [dnl
+OFPST_GROUP_DESC reply (OF1.5):
+ 
group_id=1234,type=ff,bucket=bucket_id:20,actions=output:20,bucket=bucket_id:21,actions=output:21
+])
+
 # Negative tests.
+AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn insert-buckets br0 
group_id=1234,command_bucket_id=123,type=indirect,bucket=bucket_id:0,actions=output:0,bucket=bucket_id:1,actions=output:1],
 [1], [],
+  [ovs-ofctl: type is not needed
+])
+AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn insert-buckets br0 
group_id=1234,command_bucket_id=123,selection_method=dp_hash,type=indirect,bucket=bucket_id:0,actions=output:0,bucket=bucket_id:1,actions=output:1],
 [1], [],
+  [ovs-ofctl: selection method is not needed
+])
 AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn insert-buckets br0 
group_id=1234,command_bucket_id=0xff01,bucket=bucket_id:0,actions=output:0,bucket=bucket_id:1,actions=output:1],
 [1], [],
   [ovs-ofctl: invalid command bucket id 4294967041
 ])
-- 
2.10.2

___
dev 

Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Change assertion to log message.

2017-12-07 Thread Gregory Rose

On 12/7/2017 1:01 PM, Ben Pfaff wrote:

Until now, compose_output_action__() has asserted that a packet output to
a patch port is not to be truncated.  This commit changes this to an error
that will be included in trace output, for two reasons.  First, this sounds
like only a minor problem to me which doesn't warrant killing the process.
Second, it will be easier to track down the actual problem (if any) if we
can get a trace instead of a segfault.

Reported-by: Kevin Lin 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2017-December/045832.html
Signed-off-by: Ben Pfaff 
---
  ofproto/ofproto-dpif-xlate.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index fcced344ed8a..60eaf18038ca 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3701,7 +3701,9 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t 
ofp_port,
  }
  
  if (xport->peer) {

-   ovs_assert(!truncate);
+   if (truncate) {
+   xlate_report_error(ctx, "Cannot truncate output to patch port");
+   }
 patch_port_output(ctx, xport, xport->peer);
 return;
  }


I agree that is a better approach.

Reviewed-by: Greg Rose 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 0/2] Support conntrack flush by ct 5-tuple

2017-12-07 Thread Justin Pettit


> On Dec 7, 2017, at 10:40 AM, Yi-Hung Wei  wrote:
> 
> This patch series add support to flush a connection tracking entry
> specified by a 5-tuple.

Thanks for the patches.  I pushed the series to the master branch.

--Justin


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Argumentación jurídica para las áreas contable y financiera

2017-12-07 Thread Cómo ganar casos difíciles en materia fiscal
Conozca, interprete y aplique la ley a su favor

Argumentación jurídica para las áreas contable y financiera
18 de Diciembre - L.C. y L.D. Jacobo Meraz Sotelo 9am-6pm

Es de suma importancia que los empresarios y los encargados de las áreas 
contables y administrativas conozcan la interpretación y argumentación de las 
normas jurídicas tributarias a efecto de hacer valer sus derechos ante actos de 
autoridad hacendaria, para proteger con eficacia la libertad y el patrimonio de 
los contribuyentes, y el funcionamiento de sus negocios, sobre todo, con las 
modificaciones en tema de comprobantes fiscales vigentes a partir del 1 de 
diciembre. 

BENEFICIOS DE ASISTIR: 

- Conocerá el marco jurídico normativo, así como los derechos fundamentales.
- Entenderá conceptos y métodos de interpretación de la ley fiscal. 
- Aprenderá a preparar argumentos los cuales aplicará mediante un caso práctico.
- ¡Mucha más información valiosa! 

¿Requiere la información a la Brevedad? responda este email con la palabra: 
Contable + nombre - teléfono - correo.


centro telefónico:018002120744


 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ofproto-dpif-xlate: Change assertion to log message.

2017-12-07 Thread Ben Pfaff
Until now, compose_output_action__() has asserted that a packet output to
a patch port is not to be truncated.  This commit changes this to an error
that will be included in trace output, for two reasons.  First, this sounds
like only a minor problem to me which doesn't warrant killing the process.
Second, it will be easier to track down the actual problem (if any) if we
can get a trace instead of a segfault.

Reported-by: Kevin Lin 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2017-December/045832.html
Signed-off-by: Ben Pfaff 
---
 ofproto/ofproto-dpif-xlate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index fcced344ed8a..60eaf18038ca 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3701,7 +3701,9 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t 
ofp_port,
 }
 
 if (xport->peer) {
-   ovs_assert(!truncate);
+   if (truncate) {
+   xlate_report_error(ctx, "Cannot truncate output to patch port");
+   }
patch_port_output(ctx, xport, xport->peer);
return;
 }
-- 
2.10.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH branch-2.8 v1] OVN: Add external_ids to NAT and Logical_Router_Static_Route tables.

2017-12-07 Thread Russell Bryant
On Wed, Dec 6, 2017 at 4:59 AM,   wrote:
> From: Lucas Alvares Gomes 
>
> The external_ids column is missing from the NAT and
> Logical_Router_Static_Route tables.
>
> As discussed at [0] the change to the schema for this backport should
> leave the version number unmodified.
>
> [0]
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-December/341630.html
>
> Signed-off-by: Lucas Alvares Gomes 

Thanks, Lucas.  I have applied this backport to branch-2.8.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6 1/7] dpif-netdev: Refactor PMD thread structure for further extension.

2017-12-07 Thread Bodireddy, Bhanuprakash
>On 07/12/17 14:28, Ilya Maximets wrote:
>> Thanks for review, comments inline.
>>
>> On 07.12.2017 15:49, Eelco Chaudron wrote:
>>> On 01/12/17 16:44, Ilya Maximets wrote:
 This is preparation for 'struct dp_netdev_pmd_thread' modification
 in upcoming commits. Needed to avoid reordering and regrouping while
 replacing old and adding new members.

>>> Should this be part of the TX batching set? Anyway, I'm ok if it's
>>> not stalling the approval :)
>> Unfortunately yes, because members reordered and regrouped just to
>> include new members: pmd->ctx and pmd->n_output_batches. This could
>> not be a standalone change because adding of different members will
>> require different regrouping/ reordering. I moved this change to a
>> separate patch to not do this twice while adding each member in patches
>2/7 and 6/7.
>>
>> Anyway, as I mentioned in cover letter, I still prefer reverting of
>> the padding at all by this patch:
>>
>> https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html

I understand that with PADDED_MEMBERS macro it was slightly tricky to extend or 
reorganize the structure and so suggested 'pahole'.
But I see that the problem hasn't gone and still there are some strong opinions 
on reverting the earlier effort.

I  don’t mind reverting the patch but would be nice if the changes to this 
structure are made keeping alignment in mind.

- Bhanuprakash.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH RFC 2/5] configure: Include -mprefetchwt1 explicitly.

2017-12-07 Thread Bodireddy, Bhanuprakash
>> >> If CPU just skips this instruction we will lost all the prefetching
>> >> optimizations because all the calls will be replaced by non-existent
>'prefetchwt1'.
>> >
>> > [Bhanu] I would be worried if core generates an exception treating
>> > it as illegal instruction. Instead pipeline units treat this as NOP
>> > if it
>> doesn't support it.
>> > So the micro optimizations doesn't really do any thing on the processors
>that doesn't support it.
>>
>> This could be an issue. If someday we'll have real performance
>> optimization based on OPCH_HTW prefetch, we will have prefetchwt1 on
>> system that supports it and NOP on others even if they have usual
>> prefetchw which could provide performance improvement too.

[Bhanu]  Adding the below information only for future reference, (going to 
point to this thread in the commit log)

On systems that has *only* prefetchw and no prefetchwt1 instruction.
 OPCH_LTW-   prefetchw 
 OPCH_MTW  -   prefetchw
 OPCH_HTW   -prefetchw
 OPCH_NTW   -prefetchw

On systems that supports both prefetchw and prefetchwt1,
 OPCH_LTW-   prefetchwt1
 OPCH_MTW  -   prefetchwt1
 OPCH_HTW   -prefetchw
 OPCH_NTW   -prefetchwt1

So OPCH_HTW would always be prefetchw and LTW/MTW/HTW  might turn in to NOPs on 
processors that support prefetchw alone.
(when compiled with CFLAGS = -march=native -mprefetchwt1)

>>
>> As I understand, checking of '-mprefetchwt1' is equal to checking
>> compiler version. It doesn't check anything about supporting of this
>instruction in CPU.
>> This could end up with non-working performance optimizations and even
>> degradation on systems that supports usual prefetches but not
>> prefetchwt1 (useless NOPs degrades performance if they are on a hot
>path).
>>
>> IMHO, This compiler option should be passed only if CPU really supports it.
>> I guess, the maximum that we can do is add a note into performance
>> optimization guide that '-mprefetchwt1' could be passed via CFLAGS if
>> user sure that it supported by target CPU.
>
>That is my thinking as well. The people/organizations building OVS packages
>for deployment have the responsibility to specify the minimum requirements
>on the target architecture and feed that into the compiler using CFLAGS. That
>may well be leaning towards the lower end of capabilities to maximize
>compatibility and sacrifice some performance on high-end CPUs.
>
>The specialized prefetch macros should be mapped to the best available
>target instructions by the compiler and/or conditional compile directives
>based on the CFLAGS architecture settings.
>
>We would gather all these target-specific compiler optimization guidelines in
>the advanced DPDK documentation of OVS.
>
>Of course developers or benchmark testers are free to use -march=native or
>similar at their discretion in their local test beds for best possible 
>performance.

If the general view is get rid of this flag at compilation and only to document 
this, I am happy with this and can update the documentation.
But I still think we are being too defensive here and with few NOPs performance 
impact isn't even noticeable. 

- Bhanuprakash.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 7/8] net: ovs: remove unused hardirq.h

2017-12-07 Thread Pravin Shelar
On Fri, Nov 17, 2017 at 3:02 PM, Yang Shi  wrote:
> Preempt counter APIs have been split out, currently, hardirq.h just
> includes irq_enter/exit APIs which are not used by openvswitch at all.
>
> So, remove the unused hardirq.h.
>
> Signed-off-by: Yang Shi 
> Cc: Pravin Shelar 
> Cc: "David S. Miller" 
> Cc: d...@openvswitch.org

Acked-by: Pravin B Shelar 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 7/8] net: ovs: remove unused hardirq.h

2017-12-07 Thread Yang Shi

Hi folks,

Any comment on this one?

Thanks,
Yang


On 11/17/17 5:48 PM, Yang Shi wrote:
It looks the email address of Pravin in MAINTAINERS file is obsolete, 
sent to the right address.


Yang


On 11/17/17 3:02 PM, Yang Shi wrote:

Preempt counter APIs have been split out, currently, hardirq.h just
includes irq_enter/exit APIs which are not used by openvswitch at all.

So, remove the unused hardirq.h.

Signed-off-by: Yang Shi 
Cc: Pravin Shelar 
Cc: "David S. Miller" 
Cc: d...@openvswitch.org
---
  net/openvswitch/vport-internal_dev.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/net/openvswitch/vport-internal_dev.c 
b/net/openvswitch/vport-internal_dev.c

index 04a3128..2f47c65 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -16,7 +16,6 @@
   * 02110-1301, USA
   */
-#include 
  #include 
  #include 
  #include 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpctl: Fix comment describing get_one_dp().

2017-12-07 Thread Justin Pettit

> On Dec 7, 2017, at 8:21 AM, Ben Pfaff  wrote:
> 
>> On Wed, Dec 06, 2017 at 10:19:23PM -0800, Justin Pettit wrote:
>> Signed-off-by: Justin Pettit 
>> ---
>> lib/dpctl.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/lib/dpctl.c b/lib/dpctl.c
>> index 06cfbc4abf21..5e064ec9c100 100644
>> --- a/lib/dpctl.c
>> +++ b/lib/dpctl.c
>> @@ -129,8 +129,8 @@ if_up(struct netdev *netdev)
>> }
>> 
>> /* Retrieve the name of the datapath if exactly one exists.  The caller
>> - * is responsible for freeing the returned string.  If there is not one
>> - * datapath, aborts with an error message. */
>> + * is responsible for freeing the returned string.  If a single datapath
>> + * name cannot be determined, returns NULL. */
>> static char *
>> get_one_dp(struct dpctl_params *dpctl_p)
> 
> Acked-by: Ben Pfaff 

Thanks. I pushed this to master. 

—Justin



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 2/2] dpctl: Support flush conntrack by conntrack 5-tuple

2017-12-07 Thread Yi-Hung Wei
With this patch, "flush-conntrack" in ovs-dpctl and ovs-appctl accept
a conntrack 5-tuple to delete the conntrack entry specified by the 5-tuple.
For example, user can use the following command to flush a conntrack entry
in zone 5.

$ ovs-dpctl flush-conntrack zone=5 \

'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1'
$ ovs-appctl dpctl/flush-conntrack zone=5 \

'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1'

VMWare-BZ: #1983178
Signed-off-by: Yi-Hung Wei 
---
 NEWS |   2 +
 lib/ct-dpif.c| 108 +++
 lib/ct-dpif.h|   1 +
 lib/dpctl.c  |  76 +--
 lib/dpctl.man|  20 ++--
 tests/system-kmod-macros.at  |   8 +++
 tests/system-traffic.at  |  65 +++
 tests/system-userspace-macros.at |  10 
 utilities/ovs-dpctl.c|   4 +-
 9 files changed, 272 insertions(+), 22 deletions(-)

diff --git a/NEWS b/NEWS
index 427c8f83d8b2..188a0757a797 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,8 @@ Post-v2.8.0
  * ovn-ctl: New commands run_nb_ovsdb and run_sb_ovsdb.
- Linux kernel 4.13
  * Add support for compiling OVS with the latest Linux 4.13 kernel
+   - "flush-conntrack" in ovs-dpctl and ovs-appctl now accept a 5-tuple to
+ delete a specific connection tracking entry.
 
 v2.8.0 - 31 Aug 2017
 
diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index cee4791565fb..239c848d3735 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -20,6 +20,7 @@
 #include 
 
 #include "ct-dpif.h"
+#include "openvswitch/ofp-parse.h"
 #include "openvswitch/vlog.h"
 
 VLOG_DEFINE_THIS_MODULE(ct_dpif);
@@ -435,3 +436,110 @@ ct_dpif_format_tcp_stat(struct ds * ds, int tcp_state, 
int conn_per_state)
 ds_put_cstr(ds, "]");
 ds_put_format(ds, "=%u", conn_per_state);
 }
+
+/* Parses a specification of a conntrack 5-tuple from 's' into 'tuple'.
+ * Returns true on success.  Otherwise, returns false and puts the error
+ * message in 'ds'. */
+bool
+ct_dpif_parse_tuple(struct ct_dpif_tuple *tuple, const char *s, struct ds *ds)
+{
+char *pos, *key, *value, *copy;
+memset(tuple, 0, sizeof *tuple);
+
+pos = copy = xstrdup(s);
+while (ofputil_parse_key_value(, , )) {
+if (!*value) {
+ds_put_format(ds, "field %s missing value", key);
+goto error;
+}
+
+if (!strcmp(key, "ct_nw_src") || !strcmp(key, "ct_nw_dst")) {
+if (tuple->l3_type && tuple->l3_type != AF_INET) {
+ds_put_cstr(ds, "L3 type set multiple times");
+goto error;
+} else {
+tuple->l3_type = AF_INET;
+}
+if (!ip_parse(value, key[6] == 's' ? >src.ip :
+ >dst.ip)) {
+goto error_with_msg;
+}
+} else if (!strcmp(key, "ct_ipv6_src") ||
+   !strcmp(key, "ct_ipv6_dst")) {
+if (tuple->l3_type && tuple->l3_type != AF_INET6) {
+ds_put_cstr(ds, "L3 type set multiple times");
+goto error;
+} else {
+tuple->l3_type = AF_INET6;
+}
+if (!ipv6_parse(value, key[8] == 's' ? >src.in6 :
+   >dst.in6)) {
+goto error_with_msg;
+}
+} else if (!strcmp(key, "ct_nw_proto")) {
+char *err = str_to_u8(value, key, >ip_proto);
+if (err) {
+free(err);
+goto error_with_msg;
+}
+} else if (!strcmp(key, "ct_tp_src") || !strcmp(key,"ct_tp_dst")) {
+uint16_t port;
+char *err = str_to_u16(value, key, );
+if (err) {
+free(err);
+goto error_with_msg;
+}
+if (key[6] == 's') {
+tuple->src_port = htons(port);
+} else {
+tuple->dst_port = htons(port);
+}
+} else if (!strcmp(key, "icmp_type") || !strcmp(key, "icmp_code") ||
+   !strcmp(key, "icmp_id") ) {
+if (tuple->ip_proto != IPPROTO_ICMP &&
+tuple->ip_proto != IPPROTO_ICMPV6) {
+ds_put_cstr(ds, "invalid L4 fields");
+goto error;
+}
+uint16_t icmp_id;
+char *err;
+if (key[5] == 't') {
+err = str_to_u8(value, key, >icmp_type);
+} else if (key[5] == 'c') {
+err = str_to_u8(value, key, >icmp_code);
+} else {
+err = str_to_u16(value, key, _id);
+tuple->icmp_id = htons(icmp_id);
+}
+if (err) {
+free(err);
+goto error_with_msg;
+}

[ovs-dev] [PATCH v3 1/2] ct-dpif, dpif-netlink: Support conntrack flush by ct 5-tuple

2017-12-07 Thread Yi-Hung Wei
This patch adds support of flushing a conntrack entry specified by the
conntrack 5-tuple, and provides the implementation in dpif-netlink.
The implementation of dpif-netlink in the linux datapath utilizes the
NFNL_SUBSYS_CTNETLINK netlink subsystem to delete a conntrack entry in
nf_conntrack.  Future patches will add support for the userspace and
Windows datapaths.

VMWare-BZ: #1983178
Signed-off-by: Yi-Hung Wei 
---
 lib/ct-dpif.c   | 24 +---
 lib/ct-dpif.h   |  3 +-
 lib/dpctl.c |  2 +-
 lib/dpif-netdev.c   |  6 ++-
 lib/dpif-netlink.c  |  7 +++-
 lib/dpif-provider.h | 16 ++--
 lib/netlink-conntrack.c | 97 +
 lib/netlink-conntrack.h |  1 +
 ofproto/ofproto-dpif.c  |  2 +-
 tests/ovs-ofctl.at  |  2 +-
 10 files changed, 144 insertions(+), 16 deletions(-)

diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index c79e69e23970..cee4791565fb 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -110,20 +110,32 @@ ct_dpif_dump_done(struct ct_dpif_dump_state *dump)
 : EOPNOTSUPP);
 }
 
-/* Flush the entries in the connection tracker used by 'dpif'.
+/* Flush the entries in the connection tracker used by 'dpif'.  The
+ * arguments have the following behavior:
  *
- * If 'zone' is not NULL, flush only the entries in '*zone'. */
+ *   - If both 'zone' and 'tuple' are NULL, flush all the conntrack entries.
+ *   - If 'zone' is not NULL, and 'tuple' is NULL, flush all the conntrack
+ * entries in '*zone'.
+ *   - If 'tuple' is not NULL, flush the conntrack entry specified by 'tuple'
+ * in '*zone'. If 'zone' is NULL, use the default zone (zone 0). */
 int
-ct_dpif_flush(struct dpif *dpif, const uint16_t *zone)
+ct_dpif_flush(struct dpif *dpif, const uint16_t *zone,
+  const struct ct_dpif_tuple *tuple)
 {
-if (zone) {
-VLOG_DBG("%s: ct_flush: %"PRIu16, dpif_name(dpif), *zone);
+if (tuple) {
+struct ds ds = DS_EMPTY_INITIALIZER;
+ct_dpif_format_tuple(, tuple);
+VLOG_DBG("%s: ct_flush: %s in zone %d", dpif_name(dpif), ds_cstr(),
+zone ? *zone : 0);
+ds_destroy();
+} else if (zone) {
+VLOG_DBG("%s: ct_flush: zone %"PRIu16, dpif_name(dpif), *zone);
 } else {
 VLOG_DBG("%s: ct_flush: ", dpif_name(dpif));
 }
 
 return (dpif->dpif_class->ct_flush
-? dpif->dpif_class->ct_flush(dpif, zone)
+? dpif->dpif_class->ct_flush(dpif, zone, tuple)
 : EOPNOTSUPP);
 }
 
diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
index d5f966175bc0..ef019050c78e 100644
--- a/lib/ct-dpif.h
+++ b/lib/ct-dpif.h
@@ -195,7 +195,8 @@ int ct_dpif_dump_start(struct dpif *, struct 
ct_dpif_dump_state **,
const uint16_t *zone, int *);
 int ct_dpif_dump_next(struct ct_dpif_dump_state *, struct ct_dpif_entry *);
 int ct_dpif_dump_done(struct ct_dpif_dump_state *);
-int ct_dpif_flush(struct dpif *, const uint16_t *zone);
+int ct_dpif_flush(struct dpif *, const uint16_t *zone,
+  const struct ct_dpif_tuple *);
 void ct_dpif_entry_uninit(struct ct_dpif_entry *);
 void ct_dpif_format_entry(const struct ct_dpif_entry *, struct ds *,
   bool verbose, bool print_stats);
diff --git a/lib/dpctl.c b/lib/dpctl.c
index 6520788aa428..7fc0e3afab37 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -1353,7 +1353,7 @@ dpctl_flush_conntrack(int argc, const char *argv[],
 return error;
 }
 
-error = ct_dpif_flush(dpif, pzone);
+error = ct_dpif_flush(dpif, pzone, NULL);
 
 dpif_close(dpif);
 return error;
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 0a62630c2712..b1ef9a6a5992 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -5734,10 +5734,14 @@ dpif_netdev_ct_dump_done(struct dpif *dpif OVS_UNUSED,
 }
 
 static int
-dpif_netdev_ct_flush(struct dpif *dpif, const uint16_t *zone)
+dpif_netdev_ct_flush(struct dpif *dpif, const uint16_t *zone,
+ const struct ct_dpif_tuple *tuple)
 {
 struct dp_netdev *dp = get_dp_netdev(dpif);
 
+if (tuple) {
+return EOPNOTSUPP;
+}
 return conntrack_flush(>conntrack, zone);
 }
 
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index c265909f8587..3f76128999d1 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2892,9 +2892,12 @@ dpif_netlink_ct_dump_done(struct dpif *dpif OVS_UNUSED,
 }
 
 static int
-dpif_netlink_ct_flush(struct dpif *dpif OVS_UNUSED, const uint16_t *zone)
+dpif_netlink_ct_flush(struct dpif *dpif OVS_UNUSED, const uint16_t *zone,
+  const struct ct_dpif_tuple *tuple)
 {
-if (zone) {
+if (tuple) {
+return nl_ct_flush_tuple(tuple, zone ? *zone : 0);
+} else if (zone) {
 return nl_ct_flush_zone(*zone);
 } else {
 return nl_ct_flush();
diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
index 

[ovs-dev] [PATCH v3 0/2] Support conntrack flush by ct 5-tuple

2017-12-07 Thread Yi-Hung Wei
This patch series add support to flush a connection tracking entry
specified by a 5-tuple.

v2 -> v3:
 * Rebase to master.
 * Fold in Justin's review comments.
 * Pull out dpif-netdev implementation.
   Will fix the conntack expectation issue, and submit a follow up patch.
 * Skip the ct flush by 5-tuple test on userspace datapath.

v1 -> v2:
 * Incoporate suggestions from Justin.
 * Add dpif-netdev implementation.
 * Improve error reporting logic in dpctl.c.
 * Support ICMP 5-tuple.
 * Rebase to master.


Yi-Hung Wei (2):
  ct-dpif,dpif-netlink: Support conntrack flush by ct 5-tuple
  dpctl: Support flush conntrack by conntrack 5-tuple

 NEWS |   2 +
 lib/ct-dpif.c| 132 +--
 lib/ct-dpif.h|   4 +-
 lib/dpctl.c  |  76 +-
 lib/dpctl.man|  20 --
 lib/dpif-netdev.c|   6 +-
 lib/dpif-netlink.c   |   7 ++-
 lib/dpif-provider.h  |  16 -
 lib/netlink-conntrack.c  |  97 
 lib/netlink-conntrack.h  |   1 +
 ofproto/ofproto-dpif.c   |   2 +-
 tests/ovs-ofctl.at   |   2 +-
 tests/system-kmod-macros.at  |   8 +++
 tests/system-traffic.at  |  65 +++
 tests/system-userspace-macros.at |  10 +++
 utilities/ovs-dpctl.c|   4 +-
 16 files changed, 415 insertions(+), 37 deletions(-)

-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 3/3] dpctl: Support flush conntrack by conntrack 5-tuple

2017-12-07 Thread Yi-Hung Wei
> I don't think the parentheses are necessary around "ct-tuple".
>
> I've attached an incremental patch to this message.  Can you take a look?  If 
> you agree with the changes, I'll commit this patch and the first from this 
> series.
>
> Thanks,
>
> --Justin

Thanks for the review. I think all the comments are valuable, I fold them in v3.

As you mentioned in the review comment on the second patch, I will
pull out the second patch first, and re-submit it separately once I
fix the conntrack expectation issue. Moreover, since the userspace
datapath does not support flush by ct-tuple for now, I add a macro to
skip the system traffic test on the userspace datapath. The diff after
fold in your comments are as following.

Thanks,

-Yi-Hung

===
diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
index a7c6808ad1b4..34db21a564ae 100644
--- a/tests/system-kmod-macros.at
+++ b/tests/system-kmod-macros.at
@@ -96,3 +96,11 @@ m4_define([CHECK_CONNTRACK_LOCAL_STACK])
 # always supports NAT, so no check is needed.
 #
 m4_define([CHECK_CONNTRACK_NAT])
+
+# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE()
+#
+# Perform requirements checks for running ovs-dpctl flush-conntrack by
+# conntrack 5-tuple test. The kernel datapath does support this
+# feature. Will remove this check after both kernel and userspace datapath
+# support it.
+m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE])
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 4c638118d234..56aae69538cf 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -834,6 +834,7 @@ AT_CLEANUP

 AT_SETUP([conntrack - ct flush by 5-tuple])
 CHECK_CONNTRACK()
+CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE()
 OVS_TRAFFIC_VSWITCHD_START()

 ADD_NAMESPACES(at_ns0, at_ns1)
diff --git a/tests/system-userspace-macros.at b/tests/system-userspace-macros.at
index d3d27bb2b8f2..f22061298985 100644
--- a/tests/system-userspace-macros.at
+++ b/tests/system-userspace-macros.at
@@ -99,3 +99,13 @@ m4_define([CHECK_CONNTRACK_LOCAL_STACK],
 # datapath supports NAT.
 #
 m4_define([CHECK_CONNTRACK_NAT])
+
+# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE()
+#
+# Perform requirements checks for running ovs-dpctl flush-conntrack by
+# conntrack 5-tuple test. The userspace datapath does not support
+# this feature yet.
+m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE],
+[
+AT_SKIP_IF([:])
+])
===
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] Adding configuration option to whitelist DPDK physical ports.

2017-12-07 Thread Mooney, Sean K


> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Chandran, Sugesh
> Sent: Thursday, December 7, 2017 5:07 PM
> To: O Mahony, Billy ; d...@openvswitch.org;
> b...@ovn.org
> Subject: Re: [ovs-dev] [PATCH 2/2] Adding configuration option to
> whitelist DPDK physical ports.
> 
> 
> 
> Regards
> _Sugesh
> 
> > -Original Message-
> > From: O Mahony, Billy
> > Sent: Thursday, December 7, 2017 11:47 AM
> > To: Chandran, Sugesh ;
> d...@openvswitch.org;
> > b...@ovn.org
> > Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to
> > whitelist DPDK physical ports.
> >
> > Hi Sugesh,
> >
> > > -Original Message-
> > > From: Chandran, Sugesh
> > > Sent: Wednesday, December 6, 2017 6:23 PM
> > > To: O Mahony, Billy ;
> d...@openvswitch.org;
> > > b...@ovn.org
> > > Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to
> > > whitelist DPDK physical ports.
> > >
> > > Thank you Billy for the review.
> > > Please find below my reply.
> > >
> > > Regards
> > > _Sugesh
> > >
> > >
> > > > -Original Message-
> > > > From: O Mahony, Billy
> > > > Sent: Wednesday, December 6, 2017 5:31 PM
> > > > To: Chandran, Sugesh ;
> > > > d...@openvswitch.org; b...@ovn.org
> > > > Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to
> > > > whitelist DPDK physical ports.
> > > >
> > > > Hi Sugesh,
> > > >
> > > > This is definitely a very useful feature. I'm looking forward to
> > > > running trex on the same DUT as my ovs-dpdk.
[Mooney, Sean K]  you can all ready to this you just need to set the whitelist 
In other_config:dpdk-extra just repeat "-w $address" for each device.
To have two dpdk primary processes on the same system you will also need to 
change
The hugepage prfix used be dpdk which you can also do via the dpdk-extra option.

After this patch we will still be able to specify the whitelist using
other_config:dpdk-extra correct? If not this may break ovs-dpdk support
in openstack installers. I ported our whitelist code in networking-ovs-dpdk to 
use dpdk-extra when
when we moved the dpdk params to the db and I also added it to kolla.
im pretty sure tripple0 and fule also do the same.

> > > >
> > > > However I'd suggest adding an sscanf or some such to verify that
> > > > the domain is also specified for each whitelist member. And
> either
> > > > add the default of '' or complain loudly if the domain is
> absent.
> > > [Sugesh] Will throw an error in that case then .
> > >
> > > >
> > > > Currently (without this patch) you must specify the domain when
> > > > adding
> > ports:
> > > >Vsctl add-port ... options:dpdk-devargs=:05:00.0 Or else
> an
> > > > error such as 'Cannot find unplugged device (05:00.0)'  is
> reported.
> > > >
> > > > And with the patch if you include the domain in the other_config
> (e.g.
> > > > other_config:dpdk-whitelist-pci-ids=":05:00.0") everything
> > > > works just as before.
> > > >
> > > > However with the patch if you add the whitelist *without* a
> domain e.g.
> > > > ovs-vsctl --no-wait set Open_vSwitch .
> > > > other_config:dpdk-whitelist-pci- ids="05:00.0"
> > > >
> > > > There is no immediate error. However later when doing add-port if
> > > > you include the domain (current required practice) you will get
> an error.
> > > > If you omit the domain all is well.
> > > [Sugesh] It looks to me, the dpdk-devargs need the PCI id with the
> ''.
> > > But to bind and PCI scan its not necessary.
> > > So to keep it consistent, I would add check for PCI-ID in whitelist
> > > config too, and throw error incase pci-id are mentioned wrong(means
> > > without
> > ''.
[Mooney, Sean K] don't assume it is '' it is only '' if the pci device 
is
A child of the first pci root usally connected to socket 0 unless you server is 
old
Enough to still use a FSB instead of qpi/dmi.

> > > Does it looks OK to you?
> >
> > [[BO'M]] I think the error is the right thing to do. It would be
> > tempting to insert the default '' if the domain is omitted but
> > then you would have a confusing inconsistency in that it would be ok
> > to omit the domain in one place (whitelist) but not in the other
> (add-port).
> >
> [Sugesh] Thank you Billy,Will add the check and error out accordingly.
> Will release the next patch version by incorporating the comments, once
> you have complete the review of second part of the patch too.
[Mooney, Sean K] it should be noted that the dpdk devbing script will only work 
without
The domain if its unambiguious. On a multi socket system with the right or 
wrong depending
on your perspective placement of physical pci devise it possible to get 
colltions in which
case dpdk-devbind will not be able to correctly determing what device to manage 
so you
should always use the full pci address.

> > > >
> > > > It's a 

[ovs-dev] [PATCH V5 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Mark Kavanagh
DPDK v17.11 introduces support for the vHost IOMMU feature.
This is a security feature, which restricts the vhost memory
that a virtio device may access.

This feature also enables the vhost REPLY_ACK protocol, the
implementation of which is known to work in newer versions of
QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 -
v2.9.0, inclusive). As such, the feature is disabled by default
in (and should remain so), for the aforementioned older QEMU
verions. Starting with QEMU v2.9.1, vhost-iommu-support can
safely be enabled, even without having an IOMMU device, with
no performance penalty.

This patch adds a new global config option, vhost-iommu-support,
that controls enablement of the vhost IOMMU feature:

ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true

This value defaults to false; to enable IOMMU support, this field
should be set to true when setting other global parameters on init
(such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch daemon.

Signed-off-by: Mark Kavanagh 
---
 Documentation/topics/dpdk/vhost-user.rst | 28 
 NEWS |  1 +
 lib/dpdk-stub.c  |  6 ++
 lib/dpdk.c   | 12 
 lib/dpdk.h   |  3 +++
 lib/netdev-dpdk.c| 14 ++
 vswitchd/vswitch.xml | 15 +++
 7 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/Documentation/topics/dpdk/vhost-user.rst 
b/Documentation/topics/dpdk/vhost-user.rst
index 1a5de24..8b1c671 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -273,6 +273,34 @@ One benefit of using this mode is the ability for vHost 
ports to 'reconnect' in
 event of the switch crashing or being brought down. Once it is brought back up,
 the vHost ports will reconnect automatically and normal service will resume.
 
+vhost-user-client IOMMU Support
+~~~
+
+vhost IOMMU is a feature which restricts the vhost memory that a virtio device
+can access, and as such is useful in deployments in which security is a
+concern.
+
+IOMMU support may be enabled via a global config value,
+```vhost-iommu-support```. Setting this to true enables vhost IOMMU support for
+all vhost ports when/where available::
+
+$ ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
+
+The default value is false.
+
+.. important::
+
+Changing this value requires restarting the daemon.
+
+.. important::
+
+Enabling the IOMMU feature also enables the vhost user reply-ack protocol;
+this is known to work on QEMU v2.10.0, but is buggy on older versions
+(2.7.0 - 2.9.0, inclusive). Consequently, the IOMMU feaure is disabled by
+default (and should remain so if using the aforementioned versions of
+QEMU). Starting with QEMU v2.9.1, vhost-iommu-support can safely be
+enabled, even without having an IOMMU device, with no performance penalty.
+
 .. _dpdk-testpmd:
 
 DPDK in the Guest
diff --git a/NEWS b/NEWS
index d4a1c9a..99c47d8 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,7 @@ Post-v2.8.0
  * Add support for compiling OVS with the latest Linux 4.13 kernel
- DPDK:
  * Add support for DPDK v17.11
+ * Add support for vHost IOMMU
 
 v2.8.0 - 31 Aug 2017
 
diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
index daef729..3602180 100644
--- a/lib/dpdk-stub.c
+++ b/lib/dpdk-stub.c
@@ -48,3 +48,9 @@ dpdk_get_vhost_sock_dir(void)
 {
 return NULL;
 }
+
+bool
+dpdk_vhost_iommu_enabled(void)
+{
+return false;
+}
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 8da6c32..6710d10 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -41,6 +41,7 @@ VLOG_DEFINE_THIS_MODULE(dpdk);
 static FILE *log_stream = NULL;   /* Stream for DPDK log redirection */
 
 static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
+static bool vhost_iommu_enabled = false; /* Status of vHost IOMMU support */
 
 static int
 process_vhost_flags(char *flag, const char *default_val, int size,
@@ -345,6 +346,11 @@ dpdk_init__(const struct smap *ovs_other_config)
 vhost_sock_dir = sock_dir_subcomponent;
 }
 
+vhost_iommu_enabled = smap_get_bool(ovs_other_config,
+"vhost-iommu-support", false);
+VLOG_INFO("IOMMU support for vhost-user-client %s.",
+   vhost_iommu_enabled ? "enabled" : "disabled");
+
 argv = grow_argv(, 0, 1);
 argc = 1;
 argv[0] = xstrdup(ovs_get_program_name());
@@ -482,6 +488,12 @@ dpdk_get_vhost_sock_dir(void)
 return vhost_sock_dir;
 }
 
+bool
+dpdk_vhost_iommu_enabled(void)
+{
+return vhost_iommu_enabled;
+}
+
 void
 dpdk_set_lcore_id(unsigned cpu)
 {
diff --git a/lib/dpdk.h b/lib/dpdk.h
index 673a1f1..dc58d96 100644
--- a/lib/dpdk.h

[ovs-dev] [PATCH V5 1/2] netdev-dpdk: DPDK v17.11 upgrade

2017-12-07 Thread Mark Kavanagh
This commit adds support for DPDK v17.11:
- minor updates to accomodate DPDK API changes
- update references to DPDK version in Documentation
- update DPDK version in travis' linux-build script
- document DPDK v17.11 virtio driver bug

Signed-off-by: Mark Kavanagh 
Acked-by: Maxime Coquelin 
Acked-by: Ciara Loftus 
Acked-by: Jan Scheurich 
Tested-by: Jan Scheurich 
Tested-by: Guoshuai Li 
---
 .travis/linux-build.sh   |  2 +-
 Documentation/faq/releases.rst   |  1 +
 Documentation/intro/install/dpdk.rst | 24 +++-
 Documentation/topics/dpdk/ring.rst   |  2 +-
 Documentation/topics/dpdk/vhost-user.rst |  5 +
 NEWS |  2 ++
 lib/netdev-dpdk.c|  5 +++--
 7 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
index 4d6459f..ed28ee4 100755
--- a/.travis/linux-build.sh
+++ b/.travis/linux-build.sh
@@ -81,7 +81,7 @@ fi
 
 if [ "$DPDK" ]; then
 if [ -z "$DPDK_VER" ]; then
-DPDK_VER="17.05.2"
+DPDK_VER="17.11"
 fi
 install_dpdk $DPDK_VER
 if [ "$CC" = "clang" ]; then
diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index d903b06..62a1957 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -164,6 +164,7 @@ Q: What DPDK version does each Open vSwitch release work 
with?
 2.6.x16.07.2
 2.7.x16.11.3
 2.8.x17.05.2
+2.9.x17.11
  ===
 
 Q: I get an error like this when I configure Open vSwitch:
diff --git a/Documentation/intro/install/dpdk.rst 
b/Documentation/intro/install/dpdk.rst
index bb69ae5..8c036dc 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -40,7 +40,7 @@ Build requirements
 In addition to the requirements described in :doc:`general`, building Open
 vSwitch with DPDK will require the following:
 
-- DPDK 17.05.2
+- DPDK 17.11
 
 - A `DPDK supported NIC`_
 
@@ -69,9 +69,9 @@ Install DPDK
 #. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::
 
$ cd /usr/src/
-   $ wget http://fast.dpdk.org/rel/dpdk-17.05.2.tar.xz
-   $ tar xf dpdk-17.05.2.tar.xz
-   $ export DPDK_DIR=/usr/src/dpdk-stable-17.05.2
+   $ wget http://fast.dpdk.org/rel/dpdk-17.11.tar.xz
+   $ tar xf dpdk-17.11.tar.xz
+   $ export DPDK_DIR=/usr/src/dpdk-17.11
$ cd $DPDK_DIR
 
 #. (Optional) Configure DPDK as a shared library
@@ -583,7 +583,21 @@ Limitations
   The latest list of validated firmware versions can be found in the `DPDK
   release notes`_.
 
-.. _DPDK release notes: http://dpdk.org/doc/guides/rel_notes/release_17_05.html
+.. _DPDK release notes: http://dpdk.org/doc/guides/rel_notes/release_17_11.html
+
+- The DPDK v17.11 virtio net driver contains a bug that prevents guest DPDK
+  applications from receiving packets. This only occurs when guest-bound 
traffic
+  is live before a DPDK application is started within the guest, and where two
+  or more forwarding cores are used. As such, it is not recommended for guests
+  which execute DPDK applications to upgrade to DPDK v17.11.
+
+  Note that this issue does not occur when guest network devices are controlled
+  by the guest kernel.
+
+  Details regarding the virtio driver bug are available on the `DPDK mailing
+  list`_.
+
+.. _DPDK mailing list: 
http://dpdk.org/ml/archives/dev/2017-December/082801.html
 
 Reporting Bugs
 --
diff --git a/Documentation/topics/dpdk/ring.rst 
b/Documentation/topics/dpdk/ring.rst
index ad9d7a5..8d0ede8 100644
--- a/Documentation/topics/dpdk/ring.rst
+++ b/Documentation/topics/dpdk/ring.rst
@@ -77,4 +77,4 @@ DPDK. However, this functionality was removed because:
 - :doc:`vhost-user interfaces ` are the defacto DPDK-based path to
   guests
 
-.. _DPDK documentation: 
https://dpdk.readthedocs.io/en/v17.05/prog_guide/ring_lib.html
+.. _DPDK documentation: 
https://dpdk.readthedocs.io/en/v17.11/prog_guide/ring_lib.html
diff --git a/Documentation/topics/dpdk/vhost-user.rst 
b/Documentation/topics/dpdk/vhost-user.rst
index 74ac06e..1a5de24 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -287,6 +287,11 @@ application in the VM.
 
   Support for DPDK in the guest requires QEMU >= 2.2
 
+.. important::
+
+  DPDK v17.11 virtio PMD contains a bug that affects testpmd/DPDK guest
+  applications. As such, guests should not upgrade to 17.11.
+
 To begin, instantiate a guest as described in :ref:`dpdk-vhost-user` or
 :ref:`dpdk-vhost-user-client`. Once started, connect to the VM, download the
 DPDK sources to VM and build DPDK::
diff --git a/NEWS b/NEWS
index 427c8f8..d4a1c9a 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,8 @@ 

[ovs-dev] [PATCH V5 0/2] DPDK v17.11 Support

2017-12-07 Thread Mark Kavanagh
This patchset adds support for DPDK v17.11:
- the first patch introduces minor code updates to accomodate DPDK API
  changes, and also updates Documentation and travis scripts.
- the second patch adds a new global configuration option,
  vhost-iommu-support; this is required in order to take advantage of
  the vHost IOMMU feature introduced in DPDK v17.11.

---

v5:  - document DPDK v17.11 virtio PMD bug in guest
 - minor typographical corrections in documentation 

v4:  - rebase to HEAD of master
 - clarify that IOMMU support applies exclusively to vhost user
   client ports.
 - reword caveats regarding modifying vhost-iommu-support at runtime
 - use smap_get_bool() to parse vhost-iommu-support, instead of
   process_vhost_flags().
 - add stub implementation of dpdk_vhost_iommu_enabled().
 - move stdbool.h #include outside of DPDK compiler guards.
 - remove mention of vhost IOMMU mode from
   netdev_dpdk_vhost_client_configure() log.

v3:  - erroneous; disregard.

v2:  - refactor vHost IOMMU enablement mechanism (use a global config
   value, instead of the previous per-port approach).

Mark Kavanagh (2):
  netdev-dpdk: DPDK v17.11 upgrade
  netdev-dpdk: vHost IOMMU support

 .travis/linux-build.sh   |  2 +-
 Documentation/faq/releases.rst   |  1 +
 Documentation/intro/install/dpdk.rst | 24 ++-
 Documentation/topics/dpdk/ring.rst   |  2 +-
 Documentation/topics/dpdk/vhost-user.rst | 33 
 NEWS |  3 +++
 lib/dpdk-stub.c  |  6 ++
 lib/dpdk.c   | 12 
 lib/dpdk.h   |  3 +++
 lib/netdev-dpdk.c| 19 --
 vswitchd/vswitch.xml | 15 +++
 11 files changed, 107 insertions(+), 13 deletions(-)

-- 
1.9.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH V2 3/6] datapath: conntrack: make protocol tracker pointers const

2017-12-07 Thread Greg Rose
Upstream commit:
commit b3480fe059ac9121b5714205b4ddae14b59ef4be
Author: Florian Westphal 
Date:   Sat Aug 12 00:57:08 2017 +0200

netfilter: conntrack: make protocol tracker pointers const

Doesn't change generated code, but will make it easier to eventually
make the actual trackers themselvers const.

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 

Signed-off-by: Greg Rose 
---
 datapath/conntrack.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index d517a87..c97e3e4 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -613,8 +613,8 @@ static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
 u8 l3num, struct sk_buff *skb, bool natted)
 {
-   struct nf_conntrack_l3proto *l3proto;
-   struct nf_conntrack_l4proto *l4proto;
+   const struct nf_conntrack_l3proto *l3proto;
+   const struct nf_conntrack_l4proto *l4proto;
struct nf_conntrack_tuple tuple;
struct nf_conntrack_tuple_hash *h;
struct nf_conn *ct;
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] Adding configuration option to whitelist DPDK physical ports.

2017-12-07 Thread Chandran, Sugesh


Regards
_Sugesh

> -Original Message-
> From: O Mahony, Billy
> Sent: Thursday, December 7, 2017 11:47 AM
> To: Chandran, Sugesh ; d...@openvswitch.org;
> b...@ovn.org
> Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to whitelist
> DPDK physical ports.
> 
> Hi Sugesh,
> 
> > -Original Message-
> > From: Chandran, Sugesh
> > Sent: Wednesday, December 6, 2017 6:23 PM
> > To: O Mahony, Billy ; d...@openvswitch.org;
> > b...@ovn.org
> > Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to
> > whitelist DPDK physical ports.
> >
> > Thank you Billy for the review.
> > Please find below my reply.
> >
> > Regards
> > _Sugesh
> >
> >
> > > -Original Message-
> > > From: O Mahony, Billy
> > > Sent: Wednesday, December 6, 2017 5:31 PM
> > > To: Chandran, Sugesh ;
> > > d...@openvswitch.org; b...@ovn.org
> > > Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to
> > > whitelist DPDK physical ports.
> > >
> > > Hi Sugesh,
> > >
> > > This is definitely a very useful feature. I'm looking forward to
> > > running trex on the same DUT as my ovs-dpdk.
> > >
> > > However I'd suggest adding an sscanf or some such to verify that the
> > > domain is also specified for each whitelist member. And either add
> > > the default of '' or complain loudly if the domain is absent.
> > [Sugesh] Will throw an error in that case then .
> >
> > >
> > > Currently (without this patch) you must specify the domain when adding
> ports:
> > >Vsctl add-port ... options:dpdk-devargs=:05:00.0 Or else an
> > > error such as 'Cannot find unplugged device (05:00.0)'  is reported.
> > >
> > > And with the patch if you include the domain in the other_config (e.g.
> > > other_config:dpdk-whitelist-pci-ids=":05:00.0") everything works
> > > just as before.
> > >
> > > However with the patch if you add the whitelist *without* a domain e.g.
> > >   ovs-vsctl --no-wait set Open_vSwitch .
> > > other_config:dpdk-whitelist-pci- ids="05:00.0"
> > >
> > > There is no immediate error. However later when doing add-port if
> > > you include the domain (current required practice) you will get an error.
> > > If you omit the domain all is well.
> > [Sugesh] It looks to me, the dpdk-devargs need the PCI id with the ''.
> > But to bind and PCI scan its not necessary.
> > So to keep it consistent, I would add check for PCI-ID in whitelist
> > config too, and throw error incase pci-id are mentioned wrong(means without
> ''.
> > Does it looks OK to you?
> 
> [[BO'M]] I think the error is the right thing to do. It would be tempting to 
> insert
> the default '' if the domain is omitted but then you would have a 
> confusing
> inconsistency in that it would be ok to omit the domain in one place 
> (whitelist)
> but not in the other (add-port).
> 
[Sugesh] Thank you Billy,Will add the check and error out accordingly.
Will release the next patch version by incorporating the comments, once you 
have complete the
review of second part of the patch too.
> > >
> > > It's a little bit strange as regardless of domain or no domain in
> > > the other_config the PCI probe always reports the NIC as expected:
> > > 2017-12-06T16:55:27Z|00013|dpdk|INFO|EAL: PCI device
> > > :05:00.0 on NUMA socket -1
> > > 2017-12-06T16:55:27Z|00014|dpdk|WARN|EAL:   Invalid NUMA socket,
> > > default to 0
> > > 2017-12-06T16:55:27Z|00015|dpdk|INFO|EAL:   probe driver: 8086:1572
> > > net_i40e
> > >
> > > I'll be using the other patch in this series "isolate rte-mempool
> > > allocation" over the next few days so I'll review that in due course.
> > >
> > > Thanks,
> > > Billy.
> > >
> > > > -Original Message-
> > > > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > > > boun...@openvswitch.org] On Behalf Of Sugesh Chandran
> > > > Sent: Friday, November 10, 2017 1:29 AM
> > > > To: d...@openvswitch.org; b...@ovn.org
> > > > Subject: [ovs-dev] [PATCH 2/2] Adding configuration option to
> > > > whitelist DPDK physical ports.
> > > >
> > > > Adding a OVS configuration option to whitelist DPDK physical ports.
> > > > By default running multiple instances of DPDK on a single platform
> > > > cannot use physical ports at the same time even though they are 
> > > > distinct.
> > > >
> > > > The eal init scans all the ports that are bound to DPDK and
> > > > initialize the drivers accordingly. This happens for every DPDK process 
> > > > init.
> > > > On a multi instance deployment usecase, it causes issues for using
> > > > physical NIC ports.
> > > > Consider a two DPDK process that are running on a single platform,
> > > > the second DPDK primary process will try to initialize the drivers
> > > > for all the physical ports even though it may be used in first DPDK 
> > > > process.
> > > >
> > > > To avoid this situation user can whitelist the ports for each DPDK
> application.
> > > > Whitelisting of 

Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Mooney, Sean K


> -Original Message-
> From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
> Sent: Thursday, December 7, 2017 4:07 PM
> To: Mooney, Sean K ; Kevin Traynor
> ; Kavanagh, Mark B ;
> Ilya Maximets ; Stokes, Ian
> ; d...@openvswitch.org
> Cc: maxime.coque...@redhat.com; Fischetti, Antonio
> ; Bie, Tiwei ;
> Mcnamara, John ; Guoshuai Li
> ; Loftus, Ciara ; Yuanhan Liu
> 
> Subject: RE: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
> 
> > > Would the virtio PMD bug in DPDK 17.11 in the guest actually be
> > > mitigated by running DPDK 17.05 or a fixed 17.11.1 as vhostuser
> > > backend inside OVS on the host?
> 
> > [Mooney, Sean K] from talking to mark about this eairlier I don’t
> belive so.
> > I think if you used 17.11 testpmd in the guest with kernel ovs you
> > should get the same behavior e.g. it does not forward packet. The
> > guest should not be able to know with certainty what vhost backend is
> in use on the host.
> 
> If that is the case why should OVS even consider documenting that there
> are known problems when running a DPDK 17.11 application in a guest,
> irrespective of the OVS version or datapath?
> 
> If anything, this is a critical DPDK bug and DPDK need to warn their
> virtio PMD users not to use 17.11 until this bug is fixed in a 17.11.1
> maintenance release.
> 
> The real issue I see is that OVS-DPDK may also be deployed in in a VM
> with virtio ports, and that kind of deployment will indeed be broken
> with DPDK 17.11. Can we accept and document that as a limitation in OVS
> 2.9 that will be fixed in OVS 2.9.1 as soon as DPDK 17.11.1 is out?
[Mooney, Sean K] that is a concern for me also. I run ovs-dpdk in a vm on a 
daily
Basis but I typical configure the vm with e1000 virtual nics but that then 
requires
Me to use kernel ovs on the host. I started using this partly to work around old
virtio pmd bugs that are have since been fixed.

I ran ovs-dpdk in a vm  on top of ovs-dpdk with vhost-user nics via openstack 
for years
However using Virtio-net-pci devices for ovs-dpdk requires special handeling
Due to some limitations in the choice of pf driver.

Without a virtualized iommu in a guest its not possible to use the vfio-pci
Driver with virtio-net-interfaces. qemu/kvm did not support
vIOMMUs in the past so this has only recently become an option but I have not
validated that it works myself.

Uio support is compiled out by default by Ubuntu,fedora,centos and presumable 
rhel.
That mean even on distros that package ovs-dpdk to run it in a vm you have to 
build
From source.

Igb_uio was previously known to be buggy with vitio-net-pci devices and dpdk 
had advised
against using it as you could not reliably unbind the virtio interface form 
igb_uio and
back to the kernel driver.

So igb_uio worked but you had to reboot the vm to get the kernel dirver working 
again
which was annoying for my development and ci usecases.hence my use of e1000 
vNICs
This may have been fixed.

Uio_pci_generic works with virtio-net-pci devices as far as I know but unlike
Vfio-pci or igb_uio its memory access is over privaldged so its not a great 
choice
If you care about you applications security. It did however avoid the buggy 
igb_uio
Rebinding issue and it does not need an iommu to work so has been the most 
reliable
If least secure option for nested ovs-dpdk.

Since this bug is in the dpdk virtio pmd and not the pf driver however it will 
effect
Each one equally so I think the onus remain with the vnf vendors that build on 
ovs and dpdk 
to validate their dependencies before choosing to rev the ovs and/or dpdk 
versions they use.
Having a warning about this particular issue in the ovs docs however may help 
some of them
Avoid this trap and it will give us something I can refer people to between now 
and the dpdk 17.11.1
Release.

> 
> Or should we as a temporary solution apply a patch to DPDK 17.11 to fix
> this when building OVS?
[Mooney, Sean K] dpdk is not built via the ovs make file we just link to it so
I don’t think that is a good option for upstream ovs. That said I have patching 
support
In my networking-ovs-dpdk devstack plugin for just this reason. If people are 
building
Ovs form source I think just specifying which dpdk commit fixes the issue 
should be
Enough for them to be able to work around this issue until 17.11.1 is released.
> 
> BR, Jan
> 
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH V2 6/6] travis: Update kernel test list from kernel.org

2017-12-07 Thread Greg Rose
From: Greg Rose 

Signed-off-by: Greg Rose 
---
 .travis.yml | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index f217840..8ca6d75 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -25,17 +25,16 @@ sudo: false
 
 env:
   - OPTS="--disable-ssl"
-  - TESTSUITE=1 KERNEL=3.16.47
+  - TESTSUITE=1 KERNEL=3.16.51
   - TESTSUITE=1 OPTS="--enable-shared"
   - BUILD_ENV="-m32" OPTS="--disable-ssl"
-  - KERNEL=3.16.47 DPDK=1
-  - KERNEL=3.16.47 DPDK=1 OPTS="--enable-shared"
-  - KERNEL=4.13
-  - KERNEL=4.12.11
-  - KERNEL=4.9.48
-  - KERNEL=4.4.87
-  - KERNEL=4.1.43
-  - KERNEL=3.10.107
+  - KERNEL=3.16.51 DPDK=1
+  - KERNEL=3.16.51 DPDK=1 OPTS="--enable-shared"
+  - KERNEL=4.14.3
+  - KERNEL=4.9.66
+  - KERNEL=4.4.103
+  - KERNEL=4.1.46
+  - KERNEL=3.10.108
   - TESTSUITE=1 LIBS=-ljemalloc
 
 matrix:
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH V2 5/6] acinclude.m4: Enable Linux 4.14

2017-12-07 Thread Greg Rose
From: Greg Rose 

Signed-off-by: Greg Rose 
---
 acinclude.m4 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index 6511a24..b815c58 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -143,10 +143,10 @@ AC_DEFUN([OVS_CHECK_LINUX], [
 AC_MSG_RESULT([$kversion])
 
 if test "$version" -ge 4; then
-   if test "$version" = 4 && test "$patchlevel" -le 13; then
+   if test "$version" = 4 && test "$patchlevel" -le 14; then
   : # Linux 4.x
else
-  AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version 
newer than 4.13.x is not supported (please refer to the FAQ for advice)])
+  AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version 
newer than 4.14.x is not supported (please refer to the FAQ for advice)])
fi
 elif test "$version" = 3 && test "$patchlevel" -ge 10; then
: # Linux 3.x
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH V2 4/6] datapath: Fix SKB_GSO_UDP usage

2017-12-07 Thread Greg Rose
From: Greg Rose 

Using SKB_GSO_UDP breaks the compilation on Linux 4.14. Check for
the HAVE_SKB_GSO_UDP compiler #define.

Signed-off-by: Greg Rose 
---
 datapath/datapath.c |  9 ++---
 datapath/linux/compat/stt.c | 11 ++-
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/datapath/datapath.c b/datapath/datapath.c
index 1780819..a3fdd8f 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -339,8 +339,10 @@ static int queue_gso_packets(struct datapath *dp, struct 
sk_buff *skb,
 const struct dp_upcall_info *upcall_info,
 uint32_t cutlen)
 {
+#ifdef HAVE_SKB_GSO_UDP
unsigned short gso_type = skb_shinfo(skb)->gso_type;
struct sw_flow_key later_key;
+#endif
struct sk_buff *segs, *nskb;
struct ovs_skb_cb ovs_cb;
int err;
@@ -352,7 +354,7 @@ static int queue_gso_packets(struct datapath *dp, struct 
sk_buff *skb,
return PTR_ERR(segs);
if (segs == NULL)
return -EINVAL;
-
+#ifdef HAVE_SKB_GSO_UDP
if (gso_type & SKB_GSO_UDP) {
/* The initial flow key extracted by ovs_flow_key_extract()
 * in this case is for a first fragment, so we need to
@@ -361,14 +363,15 @@ static int queue_gso_packets(struct datapath *dp, struct 
sk_buff *skb,
later_key = *key;
later_key.ip.frag = OVS_FRAG_TYPE_LATER;
}
-
+#endif
/* Queue all of the segments. */
skb = segs;
do {
*OVS_CB(skb) = ovs_cb;
+#ifdef HAVE_SKB_GSO_UDP
if (gso_type & SKB_GSO_UDP && skb != segs)
key = _key;
-
+#endif
err = queue_userspace_packet(dp, skb, key, upcall_info, cutlen);
if (err)
break;
diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
index 37d5f4b..66a97f2 100644
--- a/datapath/linux/compat/stt.c
+++ b/datapath/linux/compat/stt.c
@@ -81,8 +81,13 @@ struct stt_dev {
 #define STT_PROTO_TCP  BIT(3)
 #define STT_PROTO_TYPES(STT_PROTO_IPV4 | STT_PROTO_TCP)
 
+#ifdef HAVE_SKB_GSO_UDP
 #define SUPPORTED_GSO_TYPES (SKB_GSO_TCPV4 | SKB_GSO_UDP | SKB_GSO_DODGY | \
 SKB_GSO_TCPV6)
+#else
+#define SUPPORTED_GSO_TYPES (SKB_GSO_TCPV4 | SKB_GSO_DODGY | \
+SKB_GSO_TCPV6)
+#endif
 
 /* The length and offset of a fragment are encoded in the sequence number.
  * STT_SEQ_LEN_SHIFT is the left shift needed to store the length.
@@ -1310,7 +1315,7 @@ static bool validate_checksum(struct sk_buff *skb)
 static bool set_offloads(struct sk_buff *skb)
 {
struct stthdr *stth = stt_hdr(skb);
-   unsigned short gso_type;
+   unsigned short gso_type = 0;
int l3_header_size;
int l4_header_size;
u16 csum_offset;
@@ -1351,7 +1356,9 @@ static bool set_offloads(struct sk_buff *skb)
case STT_PROTO_IPV4:
/* UDP/IPv4 */
csum_offset = offsetof(struct udphdr, check);
+#ifdef HAVE_SKB_GSO_UDP
gso_type = SKB_GSO_UDP;
+#endif
l3_header_size = sizeof(struct iphdr);
l4_header_size = sizeof(struct udphdr);
skb->protocol = htons(ETH_P_IP);
@@ -1359,7 +1366,9 @@ static bool set_offloads(struct sk_buff *skb)
default:
/* UDP/IPv6 */
csum_offset = offsetof(struct udphdr, check);
+#ifdef HAVE_SKB_GSO_UDP
gso_type = SKB_GSO_UDP;
+#endif
l3_header_size = sizeof(struct ipv6hdr);
l4_header_size = sizeof(struct udphdr);
skb->protocol = htons(ETH_P_IPV6);
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH V2 2/6] compat: Do not include headers when not compiling

2017-12-07 Thread Greg Rose
From: Greg Rose 

If the entire file is not going to be compiled because OVS is using
upstream tunnel support then also don't bother pulling in the headers.

Signed-off-by: Greg Rose 
---
 datapath/linux/compat/ip_gre.c| 2 +-
 datapath/linux/compat/ip_output.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/datapath/linux/compat/ip_gre.c b/datapath/linux/compat/ip_gre.c
index 03c5435..2579093 100644
--- a/datapath/linux/compat/ip_gre.c
+++ b/datapath/linux/compat/ip_gre.c
@@ -12,6 +12,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#ifndef USE_UPSTREAM_TUNNEL
 #include 
 #include 
 #include 
@@ -52,7 +53,6 @@
 #include 
 #include 
 
-#ifndef USE_UPSTREAM_TUNNEL
 #if IS_ENABLED(CONFIG_IPV6)
 #include 
 #include 
diff --git a/datapath/linux/compat/ip_output.c 
b/datapath/linux/compat/ip_output.c
index edca340..e2f869f 100644
--- a/datapath/linux/compat/ip_output.c
+++ b/datapath/linux/compat/ip_output.c
@@ -45,6 +45,7 @@
  * Hirokazu Takahashi: sendfile() on UDP works now.
  */
 
+#ifndef HAVE_CORRECT_MRU_HANDLING
 #include 
 #include 
 #include 
@@ -82,7 +83,6 @@
 #include 
 #include 
 
-#ifndef HAVE_CORRECT_MRU_HANDLING
 static inline void rpl_ip_options_fragment(struct sk_buff *skb)
 {
unsigned char *optptr = skb_network_header(skb) + sizeof(struct iphdr);
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH V2 1/6] datapath: Fix netdev_master_upper_dev_link for 4.14

2017-12-07 Thread Greg Rose
From: Greg Rose 

An extended netlink ack has been added for 4.14 - add compat layer
changes so that it compiles for all kernels up to and including
4.14.

Signed-off-by: Greg Rose 
---
 acinclude.m4|  3 +++
 datapath/linux/compat/include/linux/netdevice.h | 15 ++-
 datapath/vport-netdev.c |  9 -
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index 1179a40..6511a24 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -785,6 +785,9 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
   OVS_FIND_FIELD_IFELSE([$KSRC/include/linux/netfilter.h], [nf_hook_ops],
 [list],
 [OVS_DEFINE([HAVE_LIST_IN_NF_HOOK_OPS])])
+  OVS_FIND_PARAM_IFELSE([$KSRC/include/linux/netdevice.h],
+[netdev_master_upper_dev_link], [extack],
+[OVS_DEFINE([HAVE_UPPER_DEV_LINK_EXTACK])])
 
   if cmp -s datapath/linux/kcompat.h.new \
 datapath/linux/kcompat.h >/dev/null 2>&1; then
diff --git a/datapath/linux/compat/include/linux/netdevice.h 
b/datapath/linux/compat/include/linux/netdevice.h
index 3c3cf42..c460332 100644
--- a/datapath/linux/compat/include/linux/netdevice.h
+++ b/datapath/linux/compat/include/linux/netdevice.h
@@ -101,13 +101,26 @@ static inline bool netif_needs_gso(struct sk_buff *skb,
 #ifndef HAVE_NETDEV_MASTER_UPPER_DEV_LINK_RH
 static inline int rpl_netdev_master_upper_dev_link(struct net_device *dev,
   struct net_device *upper_dev,
-  void *upper_priv, void 
*upper_info)
+  void *upper_priv,
+  void *upper_info, void *extack)
 {
return netdev_master_upper_dev_link(dev, upper_dev);
 }
 #define netdev_master_upper_dev_link rpl_netdev_master_upper_dev_link
 
 #endif
+#else
+#ifndef HAVE_UPPER_DEV_LINK_EXTACK
+static inline int rpl_netdev_master_upper_dev_link(struct net_device *dev,
+  struct net_device *upper_dev,
+  void *upper_priv,
+  void *upper_info, void *extack)
+{
+   return netdev_master_upper_dev_link(dev, upper_dev, upper_priv,
+   upper_info);
+}
+#define netdev_master_upper_dev_link rpl_netdev_master_upper_dev_link
+#endif
 #endif
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(3,16,0)
diff --git a/datapath/vport-netdev.c b/datapath/vport-netdev.c
index 697c442..e2d8eaf 100644
--- a/datapath/vport-netdev.c
+++ b/datapath/vport-netdev.c
@@ -112,8 +112,15 @@ struct vport *ovs_netdev_link(struct vport *vport, const 
char *name)
}
 
rtnl_lock();
+#ifdef HAVE_NETDEV_MASTER_UPPER_DEV_LINK_RH
err = netdev_master_upper_dev_link(vport->dev,
-  get_dpdev(vport->dp), NULL, NULL);
+  get_dpdev(vport->dp),
+  NULL, NULL);
+#else
+   err = netdev_master_upper_dev_link(vport->dev,
+  get_dpdev(vport->dp),
+  NULL, NULL, NULL);
+#endif
if (err)
goto error_unlock;
 
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 0/6] Enable OVS on Linux 4.14 kernel

2017-12-07 Thread Greg Rose
Various fixes and compat layer changes required to enable building
OVS for the upstream Linux 4.14 kernel.

The constant changing of the netdev_master_upper_dev_link parameters
is a real headache.  I couldn't think of any cleaner way to do it
than the approach I used but I welcome suggestions on how to make
that code less ugly - because it's ten miles of bad road ugly.
But it does compile and pass basic checks on all of the currently
supported kernels.

There's more fixes for SKB_GSO_UDP - I'm not sure why these don't
show up for the 4.13 or previous kernels but I think it has to do
with the recent change to make sure the SKB_GSO_UDP was searched
as a whole word and thus exposed more fractures.

I updated .travis.yml to use all the most recent supported LTS
and stable kernels from kernel.org.

Greg Rose (6):
  datapath: Fix netdev_master_upper_dev_link for 4.14
  compat: Do not include headers when not compiling
  datapath: Constify struct nf_conntrack_l4proto
  datapath: Fix SKB_GSO_UDP usage
  acinclude.m4: Enable Linux 4.14
  travis: Update kernel test list from kernel.org

 .travis.yml | 17 -
 acinclude.m4|  7 +--
 datapath/conntrack.c|  2 +-
 datapath/datapath.c |  9 ++---
 datapath/linux/compat/include/linux/netdevice.h | 15 ++-
 datapath/linux/compat/ip_gre.c  |  2 +-
 datapath/linux/compat/ip_output.c   |  2 +-
 datapath/linux/compat/stt.c | 11 ++-
 datapath/vport-netdev.c |  9 -
 9 files changed, 54 insertions(+), 20 deletions(-)

-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpctl: Fix comment describing get_one_dp().

2017-12-07 Thread Ben Pfaff
On Wed, Dec 06, 2017 at 10:19:23PM -0800, Justin Pettit wrote:
> Signed-off-by: Justin Pettit 
> ---
>  lib/dpctl.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/dpctl.c b/lib/dpctl.c
> index 06cfbc4abf21..5e064ec9c100 100644
> --- a/lib/dpctl.c
> +++ b/lib/dpctl.c
> @@ -129,8 +129,8 @@ if_up(struct netdev *netdev)
>  }
>  
>  /* Retrieve the name of the datapath if exactly one exists.  The caller
> - * is responsible for freeing the returned string.  If there is not one
> - * datapath, aborts with an error message. */
> + * is responsible for freeing the returned string.  If a single datapath
> + * name cannot be determined, returns NULL. */
>  static char *
>  get_one_dp(struct dpctl_params *dpctl_p)

Acked-by: Ben Pfaff 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Jan Scheurich
> > Would the virtio PMD bug in DPDK 17.11 in the guest actually be
> > mitigated by running DPDK 17.05 or a fixed 17.11.1 as vhostuser backend
> > inside OVS on the host?

> [Mooney, Sean K] from talking to mark about this eairlier I don’t belive so.
> I think if you used 17.11 testpmd in the guest with kernel ovs you should
> get the same behavior e.g. it does not forward packet. The guest should not
> be able to know with certainty what vhost backend is in use on the host.

If that is the case why should OVS even consider documenting that there are 
known problems when running a DPDK 17.11 application in a guest, irrespective 
of the OVS version or datapath?

If anything, this is a critical DPDK bug and DPDK need to warn their virtio PMD 
users not to use 17.11 until this bug is fixed in a 17.11.1 maintenance release.

The real issue I see is that OVS-DPDK may also be deployed in in a VM with 
virtio ports, and that kind of deployment will indeed be broken with DPDK 
17.11. Can we accept and document that as a limitation in OVS 2.9 that will be 
fixed in OVS 2.9.1 as soon as DPDK 17.11.1 is out?

Or should we as a temporary solution apply a patch to DPDK 17.11 to fix this 
when building OVS?

BR, Jan



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6 1/7] dpif-netdev: Refactor PMD thread structure for further extension.

2017-12-07 Thread Eelco Chaudron

On 07/12/17 14:48, Jan Scheurich wrote:

-Original Message-
From: Ilya Maximets [mailto:i.maxim...@samsung.com]
Sent: Thursday, 07 December, 2017 14:28

This is preparation for 'struct dp_netdev_pmd_thread' modification
in upcoming commits. Needed to avoid reordering and regrouping while
replacing old and adding new members.


Should this be part of the TX batching set? Anyway, I'm ok if it's not stalling 
the approval :)

Unfortunately yes, because members reordered and regrouped just to include
new members: pmd->ctx and pmd->n_output_batches. This could not be a standalone
change because adding of different members will require different regrouping/
reordering. I moved this change to a separate patch to not do this twice while
adding each member in patches 2/7 and 6/7.

Anyway, as I mentioned in cover letter, I still prefer reverting of the padding
at all by this patch:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html


I think we should not spent time designing and reviewing these kind of patches 
that are made necessary by the introduction of commit  
a807c15796ddc43ba1ffb2a6b0bd2ad4e2b73941.

As far as I can see there was never a single review on the original patch:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339402.html
I wonder how it got merged into master in the first place.

I strongly support Ilya's revert patch for that commit:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html

Let's do that quickly to remove some of the obstacles to merging important 
patches in time for OVS 2.9.

BR, Jan


Looking at the above, and going over the patches, I also agree undoing 
the commit is the right thing to do.


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Kavanagh, Mark B
>From: Kevin Traynor [mailto:ktray...@redhat.com]
>Sent: Thursday, December 7, 2017 3:29 PM
>To: Jan Scheurich ; Kavanagh, Mark B
>; Ilya Maximets ; Stokes,
>Ian ; d...@openvswitch.org
>Cc: maxime.coque...@redhat.com; Mooney, Sean K ;
>Fischetti, Antonio ; Bie, Tiwei
>; Mcnamara, John ; Guoshuai Li
>; Loftus, Ciara ; Yuanhan Liu
>
>Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>
>On 12/07/2017 03:09 PM, Jan Scheurich wrote:
 I think the point that both yourself and Sean has made is completely
>valid, which puts option a) back on the table.

>>>
>>> a) Sounds ok to me. I think an early DPDK17.11.1 before OVS 2.9 would be
>>> good in addition though. It is nicer that an OVS 2.9 user doesn't have
>>> to know they can't use the latest DPDK in the guest.
>>>
>>
>> Would the virtio PMD bug in DPDK 17.11 in the guest actually be mitigated by
>running DPDK 17.05 or a fixed 17.11.1 as vhostuser backend inside OVS on the
>host?
>>
>> If not, I would prefer if we decoupled the DPDK life cycle of OVS and DPDK
>applications in the guest. Guests should update their DPDK version if  it
>contains a critical bug.
>>
>
>I don't think there is any documented coupling between host and guest
>DPDK versions. I doubt anyone tests lots of combinations but hopefully
>virtio provides the necessary means to run multiple combos. I think it's
>reasonable in this case to document a warning for an OVS user about a
>known bad combination that is likely to be selected (i.e. latest
>upstream/releases).
>
>Kevin.
>
>> BR, Jan
>>

Hi Jan,

DPDK v17.11 in the host is fine; the observed issue is present in the guest's 
virtio driver in DPDK v17.11 (which is not present in v17.05.2).

As Kevin mentioned, I don't think there is any explicit coupling - or 
expectation of same - regarding the combination of DPDK versions used in guest 
and host; however, in this case I think it's certainly sensible to document the 
issue presented by the inclusion of 17.11 DPDK in the guest (at least when DPDK 
applications are used therein).

Thanks,
Mark


>>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Mooney, Sean K


> -Original Message-
> From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
> Sent: Thursday, December 7, 2017 3:09 PM
> To: Kevin Traynor ; Kavanagh, Mark B
> ; Ilya Maximets ;
> Stokes, Ian ; d...@openvswitch.org
> Cc: maxime.coque...@redhat.com; Mooney, Sean K
> ; Fischetti, Antonio
> ; Bie, Tiwei ;
> Mcnamara, John ; Guoshuai Li
> ; Loftus, Ciara ; Yuanhan Liu
> 
> Subject: RE: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
> 
> > > I think the point that both yourself and Sean has made is
> completely valid, which puts option a) back on the table.
> > >
> >
> > a) Sounds ok to me. I think an early DPDK17.11.1 before OVS 2.9 would
> > be good in addition though. It is nicer that an OVS 2.9 user doesn't
> > have to know they can't use the latest DPDK in the guest.
> >
> 
> Would the virtio PMD bug in DPDK 17.11 in the guest actually be
> mitigated by running DPDK 17.05 or a fixed 17.11.1 as vhostuser backend
> inside OVS on the host?
[Mooney, Sean K] from talking to mark about this eairlier I don’t belive so.
I think if you used 17.11 testpmd in the guest with kernel ovs you should
get the same behavior e.g. it does not forward packet. The guest should not
be able to know with certainty what vhost backend is in use on the host.
> 
> If not, I would prefer if we decoupled the DPDK life cycle of OVS and
> DPDK applications in the guest. Guests should update their DPDK version
> if  it contains a critical bug.
> 
> BR, Jan
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Kevin Traynor
On 12/07/2017 03:09 PM, Jan Scheurich wrote:
>>> I think the point that both yourself and Sean has made is completely valid, 
>>> which puts option a) back on the table.
>>>
>>
>> a) Sounds ok to me. I think an early DPDK17.11.1 before OVS 2.9 would be
>> good in addition though. It is nicer that an OVS 2.9 user doesn't have
>> to know they can't use the latest DPDK in the guest.
>>
> 
> Would the virtio PMD bug in DPDK 17.11 in the guest actually be mitigated by 
> running DPDK 17.05 or a fixed 17.11.1 as vhostuser backend inside OVS on the 
> host?
> 
> If not, I would prefer if we decoupled the DPDK life cycle of OVS and DPDK 
> applications in the guest. Guests should update their DPDK version if  it 
> contains a critical bug.
> 

I don't think there is any documented coupling between host and guest
DPDK versions. I doubt anyone tests lots of combinations but hopefully
virtio provides the necessary means to run multiple combos. I think it's
reasonable in this case to document a warning for an OVS user about a
known bad combination that is likely to be selected (i.e. latest
upstream/releases).

Kevin.

> BR, Jan
> 
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Jan Scheurich
> > I think the point that both yourself and Sean has made is completely valid, 
> > which puts option a) back on the table.
> >
> 
> a) Sounds ok to me. I think an early DPDK17.11.1 before OVS 2.9 would be
> good in addition though. It is nicer that an OVS 2.9 user doesn't have
> to know they can't use the latest DPDK in the guest.
> 

Would the virtio PMD bug in DPDK 17.11 in the guest actually be mitigated by 
running DPDK 17.05 or a fixed 17.11.1 as vhostuser backend inside OVS on the 
host?

If not, I would prefer if we decoupled the DPDK life cycle of OVS and DPDK 
applications in the guest. Guests should update their DPDK version if  it 
contains a critical bug.

BR, Jan


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Kevin Traynor
On 12/04/2017 11:15 AM, Mark Kavanagh wrote:
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index c145e1a..6c6df50 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -344,6 +344,21 @@
>  
>
>  
> +   +  type='{"type": "boolean"}'>
> +
> +  vHost IOMMU is a  security feature, which restricts the vhost 
> memory

very minor nit if you are still reworking anyway, there's some
additional spaces above

> +  that a virtio device may access. vHost IOMMU support is disabled by
> +  default, due to a bug in QEMU implementations of the vhost 
> REPLY_ACK
> +  protocol, (on which vHost IOMMU relies) prior to v2.9.1. Setting 
> this
> +  value to true enables vHost IOMMU support for vHost 
> User
> +  Client ports in OvS-DPDK, starting from DPDK v17.11.
> +
> +
> +  Changing this value requires restarting the daemon.
> +
> +  
> +
>type='{"type": "integer", "minInteger": 1}'>
>  

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Kevin Traynor
On 12/07/2017 12:27 PM, Kavanagh, Mark B wrote:
>> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
>> Sent: Thursday, December 7, 2017 12:19 PM
>> To: Kavanagh, Mark B ; Stokes, Ian
>> ; d...@openvswitch.org
>> Cc: maxime.coque...@redhat.com; jan.scheur...@ericsson.com; Mooney, Sean K
>> ; Fischetti, Antonio ;
>> Bie, Tiwei ; Mcnamara, John ;
>> Guoshuai Li ; ktray...@redhat.com; Loftus, Ciara
>> ; Yuanhan Liu 
>> Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>>
>> On 07.12.2017 14:32, Kavanagh, Mark B wrote:
>>> Yuanhan's old email address was used in the previous mail - updated
>> correctly here.
>>> -Mark
>>>
 -Original Message-
 From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
>> boun...@openvswitch.org]
 On Behalf Of Kavanagh, Mark B
 Sent: Thursday, December 7, 2017 11:24 AM
 To: Stokes, Ian ; d...@openvswitch.org
 Cc: Liu, Yuanhan ; Bie, Tiwei ;
 Mcnamara, John ; maxime.coque...@redhat.com;
 i.maxim...@samsung.com
 Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

 Hi folks,

 Thanks for all of your respective reviews and testing of this patchset.

 It seems, however, that an issue in DPDK v17.11 has come to light that
>> affects
 guests which use testpmd.
 Specifically, traffic does not reach a guest when traffic is live prior to
 kicking off the testpmd app within said guest, and at least two forwarding
 cores are used. [1]

 This is explained in additional detail is [2], an extract of which is
 reproduced below:

"the vector Rx could be broken if backend has consumed all the avail
 descs before the
 device is started. Because in current implementation, the vector Rx 
 will
 return immediately
 without refilling the avail ring if the used ring is empty. So we have
 to refill
 the avail ring after flushing the elements in the used ring."

 This issue was initially uncovered by Antonio Fischetti, as part of the
>> 17.11
 patchset validation, and has since been localized to DPDK, rather than OvS.
 As a result, it seems now that we should not move to 17.11, but seek an
>> out-
 of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on this -
 should we:
a) move to 17.11 now, note the issue above in the 'errata' section of 
 the
 documentation, and move to 17.11.1 when it becomes available in February of
 next year
b) request the early release of the 17.11.1 stable branch, which
 incorporates a fix for this issue (the possibility, and availability, of
>> which
 are both TBD).

 Thanks in advance,
 Mark
>>
>> Hmm. Isn't it a guest driver issue? Do we need to care so much about running
>> OVS inside the VM? If I assumed right, if we're running OVS not inside the 
>> VM,
>> there will be no issues on the OVS side. The issue happens if guest
>> application
>> uses DPDK 17.11, but it will happen regardless of the DPDK version on the 
>> host
>> side.
>>
>> So, the only bad scenario is running OVS inside the VM with virtio-net PMD
>> driver.
>> My question is: Do we need to care about this scenario so much?
>> If the answer is "not really" then we can use variant a).
>> But if it's very important thing we need to support than b) will be
>> preferable.
>>
>> Am I missing something?
> 
> Hey Ilya,
> 
> I've just had the exact same conversation with Sean Mooney locally.
> 
> I think the point that both yourself and Sean has made is completely valid, 
> which puts option a) back on the table.
> 

a) Sounds ok to me. I think an early DPDK17.11.1 before OVS 2.9 would be
good in addition though. It is nicer that an OVS 2.9 user doesn't have
to know they can't use the latest DPDK in the guest.

Other than the outstanding documentation, the patches LGTM.

> Antonio has agreed to test that this works; once he confirms I can push v5 of 
> the 17.11 patchet, and simply document the guest DPDK issue.
> 
> Thanks again,
> Mark
> 
> 
>>
>> Best regards, Ilya Maximets.
>>
>>

 [1] http://dpdk.org/ml/archives/dev/2017-December/082801.html
 [2] http://dpdk.org/ml/archives/dev/2017-December/082874.html


> -Original Message-
> From: Stokes, Ian
> Sent: Thursday, December 7, 2017 8:43 AM
> To: Kavanagh, Mark B ; d...@openvswitch.org
> Cc: ktray...@redhat.com; maxime.coque...@redhat.com;
>> i.maxim...@samsung.com;
> jan.scheur...@ericsson.com; Mooney, Sean K 
> Subject: RE: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>

Re: [ovs-dev] [PATCH RFC 2/5] configure: Include -mprefetchwt1 explicitly.

2017-12-07 Thread Ilya Maximets
On 05.12.2017 19:19, Bodireddy, Bhanuprakash wrote:
> [...]
>> int main()
>> {
>>int c;
>>
>>__builtin_prefetch(, 1, 1);
>>c = 8;
>>
>>return c;
>> }
>>
>> on my old Ivy Bridge i7-3770 CPU. It does not support even 'prefetchw':
>>
>>  PREFETCHWT1  = false
>>  3DNow! PREFETCH/PREFETCHW instructions = false
>>
>> Results:
> 
> [Bhanu] I  found https://gcc.godbolt.org/ the other day and its handy to 
> generate code for different targets and compilers.
> 
>> $ gcc 1.c
>> $ objdump -S ./a.out | grep prefetch -A2 -B2
>>  40055b:   31 c0   xor%eax,%eax
>>  40055d:   48 8d 45 f4 lea-0xc(%rbp),%rax
>>  400561:   0f 18 18prefetcht2 (%rax)
>>  400564:   c7 45 f4 08 00 00 00movl   $0x8,-0xc(%rbp)
>>  40056b:   8b 45 f4mov-0xc(%rbp),%eax
> 
> [Bhanu] Expected and compiler generates prefetcht2.
> 
>>
>> $ gcc 1.c -march=native
>> $ objdump -S ./a.out | grep prefetch -A2 -B2
>>  40055b:   31 c0   xor%eax,%eax
>>  40055d:   48 8d 45 f4 lea-0xc(%rbp),%rax
>>  400561:   0f 18 18prefetcht2 (%rax)
>>  400564:   c7 45 f4 08 00 00 00movl   $0x8,-0xc(%rbp)
>>  40056b:   8b 45 f4mov-0xc(%rbp),%eax
> 
> [Bhanu] Though march=native is specified the processor doesn't  have it and 
> still prefetchnt2 is generated by compiler.
> 
>> $ gcc 1.c -march=native -mprefetchwt1
>> $ objdump -S ./a.out | grep prefetch -A2 -B2
>>  40055b:   31 c0   xor%eax,%eax
>>  40055d:   48 8d 45 f4 lea-0xc(%rbp),%rax
>>  400561:   0f 0d 10prefetchwt1 (%rax)
>>  400564:   c7 45 f4 08 00 00 00movl   $0x8,-0xc(%rbp)
>>  40056b:   8b 45 f4mov-0xc(%rbp),%eax
> 
> [Bhanu] The compiler inserts prefetchwt1 instruction as we asked it to do.
> 
>>
>> So, it inserts this instruction even if I have no such instruction in CPU.
> 
> [Bhanu] 
> Though the compiler generates this, as the instruction isn't available on the 
> processor it just become a multi byte NO-Operation(NOP).
> On processors(Intel) that doesn't have prefetchw or 3D Now feature(AMD)  it 
> decodes in to NOP.
> http://ref.x86asm.net/coder64.html#x0F0D
>   - Click on '0D' in two-byte opcode index - (16.  0F0D NOP)
>-  More information on this can be found in Intel SW 
> developers manual (Combined Volumes)
> 
>> More interesting is that program still works without any issues.
>> I assume that CPU just skips that instruction or executes something else.
> 
> [Bhanu] This is what is mostly expected. On processors that supports 
> prefetchwt1 it executes and others it just becomes a NOP.
> 
>>
>> So, it's really strange and it's unclear what CPU really executes in case 
>> where
>> we have 'prefetchwt1' in code but not supported by CPU.
> 
> [Bhanu] It’s decoded in to NOP may be by pipeline decoding units.
> 
>>
>> If CPU just skips this instruction we will lost all the prefetching 
>> optimizations
>> because all the calls will be replaced by non-existent 'prefetchwt1'.
> 
> [Bhanu] I would be worried if core generates an exception treating it as 
> illegal instruction. Instead pipeline units treat this as NOP if it doesn't 
> support it.
> So the micro optimizations doesn't really do any thing on the processors that 
> doesn't support it.

This could be an issue. If someday we'll have real performance optimization
based on OPCH_HTW prefetch, we will have prefetchwt1 on system that supports
it and NOP on others even if they have usual prefetchw which could provide
performance improvement too.

As I understand, checking of '-mprefetchwt1' is equal to checking compiler
version. It doesn't check anything about supporting of this instruction in CPU.
This could end up with non-working performance optimizations and even
degradation on systems that supports usual prefetches but not prefetchwt1
(useless NOPs degrades performance if they are on a hot path).

IMHO, This compiler option should be passed only if CPU really supports it.
I guess, the maximum that we can do is add a note into performance optimization
guide that '-mprefetchwt1' could be passed via CFLAGS if user sure that it
supported by target CPU.

> 
>>
>> How can we be sure that 'prefetchwt1' was really executed?
> 
> [Bhanu] I don’t know how we can see this unless we can peek in to Instruction 
> queues & Decoders of the pipeline :(.
> 
> - Bhanuprakash.
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH RFC 5/5] dpif-netdev: Prefetch the cacheline having the cycle stats.

2017-12-07 Thread Ilya Maximets
On 05.12.2017 18:11, Bodireddy, Bhanuprakash wrote:
>>
>>> Prefetch the cacheline having the cycle stats so that we can speed up
>>> the cycles_count_start() and cycles_count_intermediate().
>>
>> Do you have any performance results?
> 
> I don’t have nos. for this patch alone. I was testing the overall throughput 
> along with other patches (that were *not* part of this RFC series) to verify 
> performance improvements. I will include in commit log when I do for 
> individual patches. 
> 
> BTW, I usually look at  the % of total instructions getting retired, cycles 
> spent in front and back-end for the functions to see if prefetching does 
> improve/degrade performance.
> 
> - Bhanuprakash.
> 
>>
>>>
>>> Signed-off-by: Bhanuprakash Bodireddy >> intel.com>
>>> ---
>>>  lib/dpif-netdev.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
>>> b74b5d7..ab13d83 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -576,7 +576,7 @@ struct dp_netdev_pmd_thread {
>>>  struct ovs_mutex flow_mutex;
>>>  /* 8 pad bytes. */
>>>  );
>>> -PADDED_MEMBERS(CACHE_LINE_SIZE,
>>> +PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE,
>> cachelineC,
>>>  struct cmap flow_table OVS_GUARDED; /* Flow table. */
>>>
>>>  /* One classifier per in_port polled by the pmd */ @@ -4082,6
>>> +4082,7 @@ reload:
>>>  lc = UINT_MAX;
>>>  }
>>>
>>> +OVS_PREFETCH_CACHE(>cachelineC, OPCH_HTW);

How does prefetch just before the infinite loop should improve performance?
I didn't test that, but IMHO, this should have zero impact.

>>>  cycles_count_start(pmd);
>>>  for (;;) {
>>>  for (i = 0; i < poll_cnt; i++) {
>>> --
>>> 2.4.11
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6 1/7] dpif-netdev: Refactor PMD thread structure for further extension.

2017-12-07 Thread Jan Scheurich
> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Thursday, 07 December, 2017 14:28

> >> This is preparation for 'struct dp_netdev_pmd_thread' modification
> >> in upcoming commits. Needed to avoid reordering and regrouping while
> >> replacing old and adding new members.
> >>
> > Should this be part of the TX batching set? Anyway, I'm ok if it's not 
> > stalling the approval :)
> 
> Unfortunately yes, because members reordered and regrouped just to include
> new members: pmd->ctx and pmd->n_output_batches. This could not be a 
> standalone
> change because adding of different members will require different regrouping/
> reordering. I moved this change to a separate patch to not do this twice while
> adding each member in patches 2/7 and 6/7.
> 
> Anyway, as I mentioned in cover letter, I still prefer reverting of the 
> padding
> at all by this patch:
>   https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html
> 

I think we should not spent time designing and reviewing these kind of patches 
that are made necessary by the introduction of commit  
a807c15796ddc43ba1ffb2a6b0bd2ad4e2b73941.

As far as I can see there was never a single review on the original patch:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339402.html
I wonder how it got merged into master in the first place.

I strongly support Ilya's revert patch for that commit:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html

Let's do that quickly to remove some of the obstacles to merging important 
patches in time for OVS 2.9.

BR, Jan
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6 1/7] dpif-netdev: Refactor PMD thread structure for further extension.

2017-12-07 Thread Ilya Maximets
Thanks for review, comments inline.

On 07.12.2017 15:49, Eelco Chaudron wrote:
> On 01/12/17 16:44, Ilya Maximets wrote:
>> This is preparation for 'struct dp_netdev_pmd_thread' modification
>> in upcoming commits. Needed to avoid reordering and regrouping while
>> replacing old and adding new members.
>>
> Should this be part of the TX batching set? Anyway, I'm ok if it's not 
> stalling the approval :)

Unfortunately yes, because members reordered and regrouped just to include
new members: pmd->ctx and pmd->n_output_batches. This could not be a standalone
change because adding of different members will require different regrouping/
reordering. I moved this change to a separate patch to not do this twice while
adding each member in patches 2/7 and 6/7.

Anyway, as I mentioned in cover letter, I still prefer reverting of the padding
at all by this patch:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html

> 
> See some comments below on the use of remaining padding...
>> Signed-off-by: Bhanuprakash Bodireddy 
>> Co-authored-by: Bhanuprakash Bodireddy 
>> Signed-off-by: Ilya Maximets 
>> ---
>>   lib/dpif-netdev.c | 65 
>> ++-
>>   1 file changed, 40 insertions(+), 25 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 0a62630..7a7c6ce 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -549,29 +549,22 @@ struct tx_port {
>>   struct dp_netdev_pmd_thread {
>>   PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline0,
>>   struct dp_netdev *dp;
>> -    struct cmap_node node;  /* In 'dp->poll_threads'. */
>> -    pthread_cond_t cond;    /* For synchronizing pmd thread
>> -   reload. */
>> +    struct cmap_node node; /* In 'dp->poll_threads'. */
>> +    pthread_cond_t cond;   /* For synchronizing pmd thread reload. 
>> */
>>   );
>>     PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1,
>>   struct ovs_mutex cond_mutex;    /* Mutex for condition variable. */
>>   pthread_t thread;
>> -    unsigned core_id;   /* CPU core id of this pmd thread. 
>> */
>> -    int numa_id;    /* numa node id of this pmd thread. 
>> */
>>   );
>>     /* Per thread exact-match cache.  Note, the instance for cpu core
>>    * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly
>>    * need to be protected by 'non_pmd_mutex'.  Every other instance
>>    * will only be accessed by its own pmd thread. */
>> -    OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache;
>> -    struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. 
>> */
>> -
>> -    /* Queue id used by this pmd thread to send packets on all netdevs if
>> - * XPS disabled for this netdev. All static_tx_qid's are unique and less
>> - * than 'cmap_count(dp->poll_threads)'. */
>> -    uint32_t static_tx_qid;
>> +    PADDED_MEMBERS(CACHE_LINE_SIZE,
>> +    OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache;
>> +    );
>>     /* Flow-Table and classifiers
>>    *
>> @@ -579,7 +572,10 @@ struct dp_netdev_pmd_thread {
>>    * changes to 'classifiers' must be made while still holding the
>>    * 'flow_mutex'.
>>    */
>> -    struct ovs_mutex flow_mutex;
>> +    PADDED_MEMBERS(CACHE_LINE_SIZE,
>> +    struct ovs_mutex flow_mutex;
>> +    /* 8 pad bytes. */
> Do we really want to add pad bytes left in this structure? They can easily be 
> wrong is something changes elsewhere?
> Guessing from some the differences you might have other patches applied?
> Using the pahole tool I think the 8 here should be 16?
> 
>     union {
>         struct {
>             struct ovs_mutex flow_mutex; /* 0    48 */
>         }; /* 48 */
>         uint8_t    pad12[64];    /*  64 */
>     };   /* 0    64 */

I left 8 bytes here because we may have different size of ovs_mutex on different
systems. The biggest value I saw was 56 bytes on my ARMv8 platform.

> 
>> +    );
>>   PADDED_MEMBERS(CACHE_LINE_SIZE,
>>   struct cmap flow_table OVS_GUARDED; /* Flow table. */
>>   @@ -596,35 +592,50 @@ struct dp_netdev_pmd_thread {
>>     /* Used to count cycles. See 'cycles_counter_end()'. */
>>   unsigned long long last_cycles;
>> -    struct latch exit_latch;    /* For terminating the pmd thread. 
>> */
>> -    );
>>   +    /* 8 pad bytes. */
>> +    );
>>   PADDED_MEMBERS(CACHE_LINE_SIZE,
>>   /* Statistics. */
>>   struct dp_netdev_pmd_stats stats;
>> +    /* 8 pad bytes. */
> Should this not be 24?

Yes, you're right. It'll be 8 after applying of the last patch of the series.
I can keep 24 in this 

Re: [ovs-dev] [PATCH v6 0/7] Output packet batching.

2017-12-07 Thread Eelco Chaudron

Hi All,

I reviewed the code for this V6, however I did not do the testing as 
before as my setup is torn down at the moment.
Some small comments on "[PATCH v6 1/7] dpif-netdev: Refactor PMD thread 
structure for further extension." see reply on that email.


Other than that I would like to ack the series:

Acked-by: Eelco Chaudron 


Guess the only thing missing is proper documentation for this change.

Thanks,

Eelco

On 01/12/17 16:44, Ilya Maximets wrote:

This patch-set inspired by [1] from Bhanuprakash Bodireddy.
Implementation of [1] looks very complex and introduces many pitfalls [2]
for later code modifications like possible packet stucks.

This version targeted to make simple and flexible output packet batching on
higher level without introducing and even simplifying netdev layer.

Basic testing of 'PVP with OVS bonding on phy ports' scenario shows
significant performance improvement.

Test results for time-based batching for v3:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-September/338247.html

Test results for v4:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339624.html

[1] [PATCH v4 0/5] netdev-dpdk: Use intermediate queue during packet 
transmission.
 https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337019.html

[2] For example:
 https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337133.html

Version 6:
* Rebased on current master:
  - Added new patch to refactor dp_netdev_pmd_thread structure
according to following suggestion:

https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341230.html

  NOTE: I still prefer reverting of the padding related patch.
Rebase done to not block acepting of this series.
Revert patch and discussion here:

https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html

* Added comment about pmd_thread_ctx_time_update() usage.

Version 5:
* pmd_thread_ctx_time_update() calls moved to different places to
  call them only from dp_netdev_process_rxq_port() and main
  polling functions:
pmd_thread_main, dpif_netdev_run and dpif_netdev_execute.
  All other functions should use cached time from pmd->ctx.now.
  It's guaranteed to be updated at least once per polling cycle.
* 'may_steal' patch returned to version from v3 because
  'may_steal' in qos is a completely different variable. This
  patch only removes 'may_steal' from netdev API.
* 2 more usec functions added to timeval to have complete public API.
* Checking of 'output_cnt' turned to assertion.

Version 4:
* Rebased on current master.
* Rebased on top of "Keep latest measured time for PMD thread."
  (Jan Scheurich)
* Microsecond resolution related patches integrated.
* Time-based batching without RFC tag.
* 'output_time' renamed to 'flush_time'. (Jan Scheurich)
* 'flush_time' update moved to 'dp_netdev_pmd_flush_output_on_port'.
  (Jan Scheurich)
* 'output-max-latency' renamed to 'tx-flush-interval'.
* Added patch for output batching statistics.

Version 3:

* Rebased on current master.
* Time based RFC: fixed assert on n_output_batches <= 0.

Version 2:

* Rebased on current master.
* Added time based batching RFC patch.
* Fixed mixing packets with different sources in same batch.


Ilya Maximets (7):
   dpif-netdev: Refactor PMD thread structure for further extension.
   dpif-netdev: Keep latest measured time for PMD thread.
   dpif-netdev: Output packet batching.
   netdev: Remove unused may_steal.
   netdev: Remove useless cutlen.
   dpif-netdev: Time based output batching.
   dpif-netdev: Count sent packets and batches.

  lib/dpif-netdev.c | 412 +-
  lib/netdev-bsd.c  |   6 +-
  lib/netdev-dpdk.c |  30 ++--
  lib/netdev-dummy.c|   6 +-
  lib/netdev-linux.c|   8 +-
  lib/netdev-provider.h |   7 +-
  lib/netdev.c  |  12 +-
  lib/netdev.h  |   2 +-
  vswitchd/vswitch.xml  |  16 ++
  9 files changed, 349 insertions(+), 150 deletions(-)



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6 1/7] dpif-netdev: Refactor PMD thread structure for further extension.

2017-12-07 Thread Eelco Chaudron

On 01/12/17 16:44, Ilya Maximets wrote:

This is preparation for 'struct dp_netdev_pmd_thread' modification
in upcoming commits. Needed to avoid reordering and regrouping while
replacing old and adding new members.

Should this be part of the TX batching set? Anyway, I'm ok if it's not 
stalling the approval :)


See some comments below on the use of remaining padding...

Signed-off-by: Bhanuprakash Bodireddy 
Co-authored-by: Bhanuprakash Bodireddy 
Signed-off-by: Ilya Maximets 
---
  lib/dpif-netdev.c | 65 ++-
  1 file changed, 40 insertions(+), 25 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 0a62630..7a7c6ce 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -549,29 +549,22 @@ struct tx_port {
  struct dp_netdev_pmd_thread {
  PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline0,
  struct dp_netdev *dp;
-struct cmap_node node;  /* In 'dp->poll_threads'. */
-pthread_cond_t cond;/* For synchronizing pmd thread
-   reload. */
+struct cmap_node node; /* In 'dp->poll_threads'. */
+pthread_cond_t cond;   /* For synchronizing pmd thread reload. */
  );
  
  PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1,

  struct ovs_mutex cond_mutex;/* Mutex for condition variable. */
  pthread_t thread;
-unsigned core_id;   /* CPU core id of this pmd thread. */
-int numa_id;/* numa node id of this pmd thread. */
  );
  
  /* Per thread exact-match cache.  Note, the instance for cpu core

   * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly
   * need to be protected by 'non_pmd_mutex'.  Every other instance
   * will only be accessed by its own pmd thread. */
-OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache;
-struct ovs_refcount ref_cnt;/* Every reference must be refcount'ed. */
-
-/* Queue id used by this pmd thread to send packets on all netdevs if
- * XPS disabled for this netdev. All static_tx_qid's are unique and less
- * than 'cmap_count(dp->poll_threads)'. */
-uint32_t static_tx_qid;
+PADDED_MEMBERS(CACHE_LINE_SIZE,
+OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache;
+);
  
  /* Flow-Table and classifiers

   *
@@ -579,7 +572,10 @@ struct dp_netdev_pmd_thread {
   * changes to 'classifiers' must be made while still holding the
   * 'flow_mutex'.
   */
-struct ovs_mutex flow_mutex;
+PADDED_MEMBERS(CACHE_LINE_SIZE,
+struct ovs_mutex flow_mutex;
+/* 8 pad bytes. */
Do we really want to add pad bytes left in this structure? They can 
easily be wrong is something changes elsewhere?

Guessing from some the differences you might have other patches applied?
Using the pahole tool I think the 8 here should be 16?

    union {
        struct {
            struct ovs_mutex flow_mutex; /* 0    48 */
        }; /* 48 */
        uint8_t    pad12[64];    /*  64 */
    };   /* 0    64 */


+);
  PADDED_MEMBERS(CACHE_LINE_SIZE,
  struct cmap flow_table OVS_GUARDED; /* Flow table. */
  
@@ -596,35 +592,50 @@ struct dp_netdev_pmd_thread {
  
  /* Used to count cycles. See 'cycles_counter_end()'. */

  unsigned long long last_cycles;
-struct latch exit_latch;/* For terminating the pmd thread. */
-);
  
+/* 8 pad bytes. */

+);
  PADDED_MEMBERS(CACHE_LINE_SIZE,
  /* Statistics. */
  struct dp_netdev_pmd_stats stats;
+/* 8 pad bytes. */

Should this not be 24?

    union {
        struct {
            struct dp_netdev_pmd_stats stats; /* 0 40 */
        };    /* 40 */
        uint8_t    pad14[64]; /* 64 */
    };    /* 0 64 */


+);
  
+PADDED_MEMBERS(CACHE_LINE_SIZE,

+struct latch exit_latch; /* For terminating the pmd thread. */
  struct seq *reload_seq;
  uint64_t last_reload_seq;
-atomic_bool reload; /* Do we need to reload ports? */
-bool isolated;
-
+atomic_bool reload;  /* Do we need to reload ports? */
  /* Set to true if the pmd thread needs to be reloaded. */
  bool need_reload;
-/* 5 pad bytes. */
+bool isolated;
+
+struct ovs_refcount ref_cnt; /* Every reference must be refcount'ed. */
+
+/* Queue id used by this pmd thread to send packets on all netdevs if
+ * XPS disabled for this netdev. All static_tx_qid's are unique and
+ * less than 'cmap_count(dp->poll_threads)'. */
+uint32_t static_tx_qid;
+
+

Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Fischetti, Antonio
I just checked there's no issue if we use dpdk v17.11 on the host
and dpdk 17.05.2 in the guest.

Antonio

> -Original Message-
> From: Kavanagh, Mark B
> Sent: Thursday, December 7, 2017 12:28 PM
> To: Ilya Maximets ; Stokes, Ian
> ; d...@openvswitch.org
> Cc: maxime.coque...@redhat.com; jan.scheur...@ericsson.com; Mooney, Sean
> K ; Fischetti, Antonio
> ; Bie, Tiwei ;
> Mcnamara, John ; Guoshuai Li
> ; ktray...@redhat.com; Loftus, Ciara
> ; Yuanhan Liu 
> Subject: RE: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
> 
> >From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> >Sent: Thursday, December 7, 2017 12:19 PM
> >To: Kavanagh, Mark B ; Stokes, Ian
> >; d...@openvswitch.org
> >Cc: maxime.coque...@redhat.com; jan.scheur...@ericsson.com; Mooney,
> Sean K
> >; Fischetti, Antonio
> ;
> >Bie, Tiwei ; Mcnamara, John
> ;
> >Guoshuai Li ; ktray...@redhat.com; Loftus, Ciara
> >; Yuanhan Liu 
> >Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
> >
> >On 07.12.2017 14:32, Kavanagh, Mark B wrote:
> >> Yuanhan's old email address was used in the previous mail - updated
> >correctly here.
> >> -Mark
> >>
> >>> -Original Message-
> >>> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> >boun...@openvswitch.org]
> >>> On Behalf Of Kavanagh, Mark B
> >>> Sent: Thursday, December 7, 2017 11:24 AM
> >>> To: Stokes, Ian ; d...@openvswitch.org
> >>> Cc: Liu, Yuanhan ; Bie, Tiwei
> ;
> >>> Mcnamara, John ;
> maxime.coque...@redhat.com;
> >>> i.maxim...@samsung.com
> >>> Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU
> support
> >>>
> >>> Hi folks,
> >>>
> >>> Thanks for all of your respective reviews and testing of this
> patchset.
> >>>
> >>> It seems, however, that an issue in DPDK v17.11 has come to light
> that
> >affects
> >>> guests which use testpmd.
> >>> Specifically, traffic does not reach a guest when traffic is live
> prior to
> >>> kicking off the testpmd app within said guest, and at least two
> forwarding
> >>> cores are used. [1]
> >>>
> >>> This is explained in additional detail is [2], an extract of which
> is
> >>> reproduced below:
> >>>
> >>>   "the vector Rx could be broken if backend has consumed all the
> avail
> >>> descs before the
> >>>device is started. Because in current implementation, the vector
> Rx will
> >>> return immediately
> >>>without refilling the avail ring if the used ring is empty. So we
> have
> >>> to refill
> >>>the avail ring after flushing the elements in the used ring."
> >>>
> >>> This issue was initially uncovered by Antonio Fischetti, as part of
> the
> >17.11
> >>> patchset validation, and has since been localized to DPDK, rather
> than OvS.
> >>> As a result, it seems now that we should not move to 17.11, but seek
> an
> >out-
> >>> of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on
> this -
> >>> should we:
> >>>   a) move to 17.11 now, note the issue above in the 'errata' section
> of the
> >>> documentation, and move to 17.11.1 when it becomes available in
> February of
> >>> next year
> >>>   b) request the early release of the 17.11.1 stable branch, which
> >>> incorporates a fix for this issue (the possibility, and
> availability, of
> >which
> >>> are both TBD).
> >>>
> >>> Thanks in advance,
> >>> Mark
> >
> >Hmm. Isn't it a guest driver issue? Do we need to care so much about
> running
> >OVS inside the VM? If I assumed right, if we're running OVS not inside
> the VM,
> >there will be no issues on the OVS side. The issue happens if guest
> >application
> >uses DPDK 17.11, but it will happen regardless of the DPDK version on
> the host
> >side.
> >
> >So, the only bad scenario is running OVS inside the VM with virtio-net
> PMD
> >driver.
> >My question is: Do we need to care about this scenario so much?
> >If the answer is "not really" then we can use variant a).
> >But if it's very important thing we need to support than b) will be
> >preferable.
> >
> >Am I missing something?
> 
> Hey Ilya,
> 
> I've just had the exact same conversation with Sean Mooney locally.
> 
> I think the point that both yourself and Sean has made is completely
> valid, which puts option a) back on the table.
> 
> Antonio has agreed to test that this works; once he confirms I can push
> v5 of the 17.11 patchet, and simply document the guest DPDK issue.
> 
> Thanks again,
> Mark
> 
> 
> >
> >Best regards, Ilya Maximets.
> >
> >
> >>>
> >>> [1] 

Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Kavanagh, Mark B
>From: Ilya Maximets [mailto:i.maxim...@samsung.com]
>Sent: Thursday, December 7, 2017 12:19 PM
>To: Kavanagh, Mark B ; Stokes, Ian
>; d...@openvswitch.org
>Cc: maxime.coque...@redhat.com; jan.scheur...@ericsson.com; Mooney, Sean K
>; Fischetti, Antonio ;
>Bie, Tiwei ; Mcnamara, John ;
>Guoshuai Li ; ktray...@redhat.com; Loftus, Ciara
>; Yuanhan Liu 
>Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>
>On 07.12.2017 14:32, Kavanagh, Mark B wrote:
>> Yuanhan's old email address was used in the previous mail - updated
>correctly here.
>> -Mark
>>
>>> -Original Message-
>>> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
>boun...@openvswitch.org]
>>> On Behalf Of Kavanagh, Mark B
>>> Sent: Thursday, December 7, 2017 11:24 AM
>>> To: Stokes, Ian ; d...@openvswitch.org
>>> Cc: Liu, Yuanhan ; Bie, Tiwei ;
>>> Mcnamara, John ; maxime.coque...@redhat.com;
>>> i.maxim...@samsung.com
>>> Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>>>
>>> Hi folks,
>>>
>>> Thanks for all of your respective reviews and testing of this patchset.
>>>
>>> It seems, however, that an issue in DPDK v17.11 has come to light that
>affects
>>> guests which use testpmd.
>>> Specifically, traffic does not reach a guest when traffic is live prior to
>>> kicking off the testpmd app within said guest, and at least two forwarding
>>> cores are used. [1]
>>>
>>> This is explained in additional detail is [2], an extract of which is
>>> reproduced below:
>>>
>>> "the vector Rx could be broken if backend has consumed all the avail
>>> descs before the
>>>  device is started. Because in current implementation, the vector Rx 
>>> will
>>> return immediately
>>>  without refilling the avail ring if the used ring is empty. So we have
>>> to refill
>>>  the avail ring after flushing the elements in the used ring."
>>>
>>> This issue was initially uncovered by Antonio Fischetti, as part of the
>17.11
>>> patchset validation, and has since been localized to DPDK, rather than OvS.
>>> As a result, it seems now that we should not move to 17.11, but seek an
>out-
>>> of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on this -
>>> should we:
>>> a) move to 17.11 now, note the issue above in the 'errata' section of 
>>> the
>>> documentation, and move to 17.11.1 when it becomes available in February of
>>> next year
>>> b) request the early release of the 17.11.1 stable branch, which
>>> incorporates a fix for this issue (the possibility, and availability, of
>which
>>> are both TBD).
>>>
>>> Thanks in advance,
>>> Mark
>
>Hmm. Isn't it a guest driver issue? Do we need to care so much about running
>OVS inside the VM? If I assumed right, if we're running OVS not inside the VM,
>there will be no issues on the OVS side. The issue happens if guest
>application
>uses DPDK 17.11, but it will happen regardless of the DPDK version on the host
>side.
>
>So, the only bad scenario is running OVS inside the VM with virtio-net PMD
>driver.
>My question is: Do we need to care about this scenario so much?
>If the answer is "not really" then we can use variant a).
>But if it's very important thing we need to support than b) will be
>preferable.
>
>Am I missing something?

Hey Ilya,

I've just had the exact same conversation with Sean Mooney locally.

I think the point that both yourself and Sean has made is completely valid, 
which puts option a) back on the table.

Antonio has agreed to test that this works; once he confirms I can push v5 of 
the 17.11 patchet, and simply document the guest DPDK issue.

Thanks again,
Mark


>
>Best regards, Ilya Maximets.
>
>
>>>
>>> [1] http://dpdk.org/ml/archives/dev/2017-December/082801.html
>>> [2] http://dpdk.org/ml/archives/dev/2017-December/082874.html
>>>
>>>
 -Original Message-
 From: Stokes, Ian
 Sent: Thursday, December 7, 2017 8:43 AM
 To: Kavanagh, Mark B ; d...@openvswitch.org
 Cc: ktray...@redhat.com; maxime.coque...@redhat.com;
>i.maxim...@samsung.com;
 jan.scheur...@ericsson.com; Mooney, Sean K 
 Subject: RE: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

> DPDK v17.11 introduces support for the vHost IOMMU feature.
> This is a security feature, which restricts the vhost memory that a
>virtio
> device may access.

 Hi all,

 There were a few requests for changes in the v4 of this patch and it's an
 important aspect of the DPDK 17.11 support for OVS.

 I'd like to get this series in to the pull request this week. If people
>have
 flagged issues for the latest revision I'd appreciate 

Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Ilya Maximets
On 07.12.2017 14:32, Kavanagh, Mark B wrote:
> Yuanhan's old email address was used in the previous mail - updated correctly 
> here.
> -Mark
> 
>> -Original Message-
>> From: ovs-dev-boun...@openvswitch.org 
>> [mailto:ovs-dev-boun...@openvswitch.org]
>> On Behalf Of Kavanagh, Mark B
>> Sent: Thursday, December 7, 2017 11:24 AM
>> To: Stokes, Ian ; d...@openvswitch.org
>> Cc: Liu, Yuanhan ; Bie, Tiwei ;
>> Mcnamara, John ; maxime.coque...@redhat.com;
>> i.maxim...@samsung.com
>> Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>>
>> Hi folks,
>>
>> Thanks for all of your respective reviews and testing of this patchset.
>>
>> It seems, however, that an issue in DPDK v17.11 has come to light that 
>> affects
>> guests which use testpmd.
>> Specifically, traffic does not reach a guest when traffic is live prior to
>> kicking off the testpmd app within said guest, and at least two forwarding
>> cores are used. [1]
>>
>> This is explained in additional detail is [2], an extract of which is
>> reproduced below:
>>
>>  "the vector Rx could be broken if backend has consumed all the avail
>> descs before the
>>   device is started. Because in current implementation, the vector Rx 
>> will
>> return immediately
>>   without refilling the avail ring if the used ring is empty. So we have
>> to refill
>>   the avail ring after flushing the elements in the used ring."
>>
>> This issue was initially uncovered by Antonio Fischetti, as part of the 17.11
>> patchset validation, and has since been localized to DPDK, rather than OvS.
>> As a result, it seems now that we should not move to 17.11, but seek an out-
>> of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on this -
>> should we:
>>  a) move to 17.11 now, note the issue above in the 'errata' section of 
>> the
>> documentation, and move to 17.11.1 when it becomes available in February of
>> next year
>>  b) request the early release of the 17.11.1 stable branch, which
>> incorporates a fix for this issue (the possibility, and availability, of 
>> which
>> are both TBD).
>>
>> Thanks in advance,
>> Mark

Hmm. Isn't it a guest driver issue? Do we need to care so much about running
OVS inside the VM? If I assumed right, if we're running OVS not inside the VM,
there will be no issues on the OVS side. The issue happens if guest application
uses DPDK 17.11, but it will happen regardless of the DPDK version on the host
side.

So, the only bad scenario is running OVS inside the VM with virtio-net PMD 
driver.
My question is: Do we need to care about this scenario so much?
If the answer is "not really" then we can use variant a).
But if it's very important thing we need to support than b) will be preferable.

Am I missing something?

Best regards, Ilya Maximets.


>>
>> [1] http://dpdk.org/ml/archives/dev/2017-December/082801.html
>> [2] http://dpdk.org/ml/archives/dev/2017-December/082874.html
>>
>>
>>> -Original Message-
>>> From: Stokes, Ian
>>> Sent: Thursday, December 7, 2017 8:43 AM
>>> To: Kavanagh, Mark B ; d...@openvswitch.org
>>> Cc: ktray...@redhat.com; maxime.coque...@redhat.com; i.maxim...@samsung.com;
>>> jan.scheur...@ericsson.com; Mooney, Sean K 
>>> Subject: RE: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>>>
 DPDK v17.11 introduces support for the vHost IOMMU feature.
 This is a security feature, which restricts the vhost memory that a virtio
 device may access.
>>>
>>> Hi all,
>>>
>>> There were a few requests for changes in the v4 of this patch and it's an
>>> important aspect of the DPDK 17.11 support for OVS.
>>>
>>> I'd like to get this series in to the pull request this week. If people have
>>> flagged issues for the latest revision I'd appreciate if they could review
>> the
>>> latest revision and flag any new issues that need to be raised.
>>>
>>> Ian
>>>

 This feature also enables the vhost REPLY_ACK protocol, the implementation
 of which is known to work in newer versions of QEMU (i.e. v2.10.0), but is
 buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the feature
 is disabled by default in (and should remain so), for the aforementioned
 older QEMU verions. Starting with QEMU v2.9.1, vhost-iommu-support can
 safely be enabled, even without having an IOMMU device, with no
 performance penalty.

 This patch adds a new global config option, vhost-iommu-support, that
 controls enablement of the vhost IOMMU feature:

 ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true

 This value defaults to false; to enable IOMMU support, this field should
 be set to true when setting other global parameters on init (such as
 "dpdk-socket-mem", for example). Changing the value at runtime is not
 supported, and requires 

Re: [ovs-dev] [PATCH 2/2] Adding configuration option to whitelist DPDK physical ports.

2017-12-07 Thread O Mahony, Billy
Hi Sugesh,

> -Original Message-
> From: Chandran, Sugesh
> Sent: Wednesday, December 6, 2017 6:23 PM
> To: O Mahony, Billy ; d...@openvswitch.org;
> b...@ovn.org
> Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to whitelist
> DPDK physical ports.
> 
> Thank you Billy for the review.
> Please find below my reply.
> 
> Regards
> _Sugesh
> 
> 
> > -Original Message-
> > From: O Mahony, Billy
> > Sent: Wednesday, December 6, 2017 5:31 PM
> > To: Chandran, Sugesh ; d...@openvswitch.org;
> > b...@ovn.org
> > Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to
> > whitelist DPDK physical ports.
> >
> > Hi Sugesh,
> >
> > This is definitely a very useful feature. I'm looking forward to
> > running trex on the same DUT as my ovs-dpdk.
> >
> > However I'd suggest adding an sscanf or some such to verify that the
> > domain is also specified for each whitelist member. And either add the
> > default of '' or complain loudly if the domain is absent.
> [Sugesh] Will throw an error in that case then .
> 
> >
> > Currently (without this patch) you must specify the domain when adding 
> > ports:
> >Vsctl add-port ... options:dpdk-devargs=:05:00.0 Or else an
> > error such as 'Cannot find unplugged device (05:00.0)'  is reported.
> >
> > And with the patch if you include the domain in the other_config (e.g.
> > other_config:dpdk-whitelist-pci-ids=":05:00.0") everything works
> > just as before.
> >
> > However with the patch if you add the whitelist *without* a domain e.g.
> > ovs-vsctl --no-wait set Open_vSwitch .
> > other_config:dpdk-whitelist-pci- ids="05:00.0"
> >
> > There is no immediate error. However later when doing add-port if you
> > include the domain (current required practice) you will get an error.
> > If you omit the domain all is well.
> [Sugesh] It looks to me, the dpdk-devargs need the PCI id with the ''.
> But to bind and PCI scan its not necessary.
> So to keep it consistent, I would add check for PCI-ID in whitelist config 
> too, and
> throw error incase pci-id are mentioned wrong(means without ''.
> Does it looks OK to you?

[[BO'M]] I think the error is the right thing to do. It would be tempting to 
insert the default '' if the domain is omitted but then you would have a 
confusing inconsistency in that it would be ok to omit the domain in one place 
(whitelist) but not in the other (add-port).

> >
> > It's a little bit strange as regardless of domain or no domain in the
> > other_config the PCI probe always reports the NIC as expected:
> > 2017-12-06T16:55:27Z|00013|dpdk|INFO|EAL: PCI device :05:00.0
> > on NUMA socket -1
> > 2017-12-06T16:55:27Z|00014|dpdk|WARN|EAL:   Invalid NUMA socket,
> > default to 0
> > 2017-12-06T16:55:27Z|00015|dpdk|INFO|EAL:   probe driver: 8086:1572
> > net_i40e
> >
> > I'll be using the other patch in this series "isolate rte-mempool
> > allocation" over the next few days so I'll review that in due course.
> >
> > Thanks,
> > Billy.
> >
> > > -Original Message-
> > > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > > boun...@openvswitch.org] On Behalf Of Sugesh Chandran
> > > Sent: Friday, November 10, 2017 1:29 AM
> > > To: d...@openvswitch.org; b...@ovn.org
> > > Subject: [ovs-dev] [PATCH 2/2] Adding configuration option to
> > > whitelist DPDK physical ports.
> > >
> > > Adding a OVS configuration option to whitelist DPDK physical ports.
> > > By default running multiple instances of DPDK on a single platform
> > > cannot use physical ports at the same time even though they are distinct.
> > >
> > > The eal init scans all the ports that are bound to DPDK and
> > > initialize the drivers accordingly. This happens for every DPDK process 
> > > init.
> > > On a multi instance deployment usecase, it causes issues for using
> > > physical NIC ports.
> > > Consider a two DPDK process that are running on a single platform,
> > > the second DPDK primary process will try to initialize the drivers
> > > for all the physical ports even though it may be used in first DPDK 
> > > process.
> > >
> > > To avoid this situation user can whitelist the ports for each DPDK 
> > > application.
> > > Whitelisting of ports/PCI-ID in a DPDK process will limit the
> > > eal-init only on those ports.
> > >
> > > To whitelist two physical ports ":06:00.0" and ":06:00.1",
> > > the configuration option in OVS would be
> > >   ovs-vsctl  set Open_vSwitch . other_config:dpdk-whitelist-pci-
> > > ids=":06:00.0,:06:00.1"
> > >
> > > To update the whitelist ports, OVS daemon has to be restarted.
> > >
> > > Signed-off-by: Sugesh Chandran 
> > > ---
> > >  lib/dpdk.c   | 29 +
> > >  vswitchd/vswitch.xml | 21 +
> > >  2 files changed, 50 insertions(+)
> > >
> > > diff --git a/lib/dpdk.c b/lib/dpdk.c index 9d187c7..0f11977 100644
> > > --- 

Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Stokes, Ian
> Hi folks,
> 
> Thanks for all of your respective reviews and testing of this patchset.
> 
> It seems, however, that an issue in DPDK v17.11 has come to light that
> affects guests which use testpmd.
> Specifically, traffic does not reach a guest when traffic is live prior to
> kicking off the testpmd app within said guest, and at least two forwarding
> cores are used. [1]
> 
> This is explained in additional detail is [2], an extract of which is
> reproduced below:
> 
>   "the vector Rx could be broken if backend has consumed all the avail
> descs before the
>device is started. Because in current implementation, the vector Rx
> will return immediately
>without refilling the avail ring if the used ring is empty. So we
> have to refill
>the avail ring after flushing the elements in the used ring."
> 

Thanks for flagging this Mark.

> This issue was initially uncovered by Antonio Fischetti, as part of the
> 17.11 patchset validation, and has since been localized to DPDK, rather
> than OvS.
> As a result, it seems now that we should not move to 17.11, but seek an
> out-of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on
> this - should we:
>   a) move to 17.11 now, note the issue above in the 'errata' section of
> the documentation, and move to 17.11.1 when it becomes available in
> February of next year

I'm not sure this would work, this seems like a pretty common use case, it 
would be quite an important errata for users to be aware of.

>   b) request the early release of the 17.11.1 stable branch, which
> incorporates a fix for this issue (the possibility, and availability, of
> which are both TBD).

+1 for the above option. I'd like to see this issue resolved before moving but 
I'm open to others input.

Thanks
Ian
> 
> Thanks in advance,
> Mark
> 
> [1] http://dpdk.org/ml/archives/dev/2017-December/082801.html
> [2] http://dpdk.org/ml/archives/dev/2017-December/082874.html
> 
> 
> >-Original Message-
> >From: Stokes, Ian
> >Sent: Thursday, December 7, 2017 8:43 AM
> >To: Kavanagh, Mark B ; d...@openvswitch.org
> >Cc: ktray...@redhat.com; maxime.coque...@redhat.com;
> i.maxim...@samsung.com;
> >jan.scheur...@ericsson.com; Mooney, Sean K 
> >Subject: RE: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
> >
> >> DPDK v17.11 introduces support for the vHost IOMMU feature.
> >> This is a security feature, which restricts the vhost memory that a
> virtio
> >> device may access.
> >
> >Hi all,
> >
> >There were a few requests for changes in the v4 of this patch and it's an
> >important aspect of the DPDK 17.11 support for OVS.
> >
> >I'd like to get this series in to the pull request this week. If people
> have
> >flagged issues for the latest revision I'd appreciate if they could
> review the
> >latest revision and flag any new issues that need to be raised.
> >
> >Ian
> >
> >>
> >> This feature also enables the vhost REPLY_ACK protocol, the
> implementation
> >> of which is known to work in newer versions of QEMU (i.e. v2.10.0), but
> is
> >> buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the
> feature
> >> is disabled by default in (and should remain so), for the
> aforementioned
> >> older QEMU verions. Starting with QEMU v2.9.1, vhost-iommu-support can
> >> safely be enabled, even without having an IOMMU device, with no
> >> performance penalty.
> >>
> >> This patch adds a new global config option, vhost-iommu-support, that
> >> controls enablement of the vhost IOMMU feature:
> >>
> >> ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
> >>
> >> This value defaults to false; to enable IOMMU support, this field
> should
> >> be set to true when setting other global parameters on init (such as
> >> "dpdk-socket-mem", for example). Changing the value at runtime is not
> >> supported, and requires restarting the vswitch daemon.
> >>
> >> Signed-off-by: Mark Kavanagh 
> >>
> >> ---
> >>
> >> v4:  - rebase to HEAD of master
> >>  - clarify that IOMMU support applies exclusively to vhost user
> >>client ports.
> >>  - reword caveats regarding modifying vhost-iommu-support at
> runtime
> >>  - use smap_get_bool() to parse vhost-iommu-support, instead of
> >>process_vhost_flags().
> >>  - add stub implementation of dpdk_vhost_iommu_enabled().
> >>  - move stdbool.h #include outside of DPDK compiler guards.
> >>  - remove mention of vhost IOMMU mode from
> >>netdev_dpdk_vhost_client_configure() log.
> >>
> >> v3:  - erroneous; disregard.
> >>
> >> v2:  - rebase to HEAD of master
> >>  - refactor vHost IOMMU enablement mechanism (use a global
> >>config option, instead of the previous per-port approach).
> >> ---
> >>  Documentation/topics/dpdk/vhost-user.rst | 27
> +++
> >>  NEWS |  1 +
> >>  lib/dpdk-stub.c  

Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Kavanagh, Mark B

>From: Ilya Maximets [mailto:i.maxim...@samsung.com]
>Sent: Thursday, December 7, 2017 9:49 AM
>To: Kavanagh, Mark B ; d...@openvswitch.org
>Cc: ktray...@redhat.com; maxime.coque...@redhat.com;
>jan.scheur...@ericsson.com; Mooney, Sean K ; Stokes,
>Ian 
>Subject: Re: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>
>Looks good to me in general, but I prefer if you'll fix following
>checkpatch warnings:

Thanks for your comments Ilya - I'll address same in the next version of this 
patchset, depending on the outcome of this: 
https://mail.openvswitch.org/pipermail/ovs-dev/2017-December/341701.html 

Best regards,
Mark

>
>== Checking 2e9587f27add ("netdev-dpdk: vHost IOMMU support") ==
>WARNING: Line length is >79-characters long
>#52 FILE: Documentation/topics/dpdk/vhost-user.rst:280:
>can access, and as such is useful in deployments in which security is a
>concern.
>
>WARNING: Line length is >79-characters long
>#54 FILE: Documentation/topics/dpdk/vhost-user.rst:282:
>IOMMU support may be enabled via a global config value, ```vhost-iommu-
>support```.
>
>WARNING: Line length is >79-characters long
>#71 FILE: Documentation/topics/dpdk/vhost-user.rst:299:
>default (and should remain so if using the aforementioned versions of
>QEMU).
>
>One more comment inline.
>
>Best regards, Ilya Maximets.
>
>On 04.12.2017 14:15, Mark Kavanagh wrote:
>> DPDK v17.11 introduces support for the vHost IOMMU feature.
>> This is a security feature, which restricts the vhost memory
>> that a virtio device may access.
>>
>> This feature also enables the vhost REPLY_ACK protocol, the
>> implementation of which is known to work in newer versions of
>> QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 -
>> v2.9.0, inclusive). As such, the feature is disabled by default
>> in (and should remain so), for the aforementioned older QEMU
>> verions. Starting with QEMU v2.9.1, vhost-iommu-support can
>> safely be enabled, even without having an IOMMU device, with
>> no performance penalty.
>>
>> This patch adds a new global config option, vhost-iommu-support,
>> that controls enablement of the vhost IOMMU feature:
>>
>> ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
>>
>> This value defaults to false; to enable IOMMU support, this field
>> should be set to true when setting other global parameters on init
>> (such as "dpdk-socket-mem", for example). Changing the value at
>> runtime is not supported, and requires restarting the vswitch daemon.
>>
>> Signed-off-by: Mark Kavanagh 
>>
>> ---
>>
>> v4:  - rebase to HEAD of master
>>  - clarify that IOMMU support applies exclusively to vhost user
>>client ports.
>>  - reword caveats regarding modifying vhost-iommu-support at runtime
>>  - use smap_get_bool() to parse vhost-iommu-support, instead of
>>process_vhost_flags().
>>  - add stub implementation of dpdk_vhost_iommu_enabled().
>>  - move stdbool.h #include outside of DPDK compiler guards.
>>  - remove mention of vhost IOMMU mode from
>>netdev_dpdk_vhost_client_configure() log.
>>
>> v3:  - erroneous; disregard.
>>
>> v2:  - rebase to HEAD of master
>>  - refactor vHost IOMMU enablement mechanism (use a global
>>config option, instead of the previous per-port approach).
>> ---
>>  Documentation/topics/dpdk/vhost-user.rst | 27 +++
>>  NEWS |  1 +
>>  lib/dpdk-stub.c  |  6 ++
>>  lib/dpdk.c   | 12 
>>  lib/dpdk.h   |  3 +++
>>  lib/netdev-dpdk.c| 14 ++
>>  vswitchd/vswitch.xml | 15 +++
>>  7 files changed, 74 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/topics/dpdk/vhost-user.rst
>b/Documentation/topics/dpdk/vhost-user.rst
>> index 5347995..ffef653 100644
>> --- a/Documentation/topics/dpdk/vhost-user.rst
>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>> @@ -273,6 +273,33 @@ One benefit of using this mode is the ability for vHost
>ports to 'reconnect' in
>>  event of the switch crashing or being brought down. Once it is brought back
>up,
>>  the vHost ports will reconnect automatically and normal service will
>resume.
>>
>> +vhost-user-client IOMMU Support
>> +~~~
>> +
>> +vhost IOMMU is a feature which restricts the vhost memory that a virtio
>device
>> +can access, and as such is useful in deployments in which security is a
>concern.
>> +
>> +IOMMU support may be enabled via a global config value, ```vhost-iommu-
>support```.
>> +Setting this to true enables vhost IOMMU support for all vhost ports
>when/where
>> +available::
>> +
>> +$ ovs-vsctl set Open_vSwitch.other_config:vhost-iommu-support=true
>
>Few spaces missing in above command.
>
>> +
>> +The default 

Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Kavanagh, Mark B
Yuanhan's old email address was used in the previous mail - updated correctly 
here.
-Mark

>-Original Message-
>From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
>On Behalf Of Kavanagh, Mark B
>Sent: Thursday, December 7, 2017 11:24 AM
>To: Stokes, Ian ; d...@openvswitch.org
>Cc: Liu, Yuanhan ; Bie, Tiwei ;
>Mcnamara, John ; maxime.coque...@redhat.com;
>i.maxim...@samsung.com
>Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>
>Hi folks,
>
>Thanks for all of your respective reviews and testing of this patchset.
>
>It seems, however, that an issue in DPDK v17.11 has come to light that affects
>guests which use testpmd.
>Specifically, traffic does not reach a guest when traffic is live prior to
>kicking off the testpmd app within said guest, and at least two forwarding
>cores are used. [1]
>
>This is explained in additional detail is [2], an extract of which is
>reproduced below:
>
>   "the vector Rx could be broken if backend has consumed all the avail
>descs before the
>device is started. Because in current implementation, the vector Rx 
> will
>return immediately
>without refilling the avail ring if the used ring is empty. So we have
>to refill
>the avail ring after flushing the elements in the used ring."
>
>This issue was initially uncovered by Antonio Fischetti, as part of the 17.11
>patchset validation, and has since been localized to DPDK, rather than OvS.
>As a result, it seems now that we should not move to 17.11, but seek an out-
>of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on this -
>should we:
>   a) move to 17.11 now, note the issue above in the 'errata' section of 
> the
>documentation, and move to 17.11.1 when it becomes available in February of
>next year
>   b) request the early release of the 17.11.1 stable branch, which
>incorporates a fix for this issue (the possibility, and availability, of which
>are both TBD).
>
>Thanks in advance,
>Mark
>
>[1] http://dpdk.org/ml/archives/dev/2017-December/082801.html
>[2] http://dpdk.org/ml/archives/dev/2017-December/082874.html
>
>
>>-Original Message-
>>From: Stokes, Ian
>>Sent: Thursday, December 7, 2017 8:43 AM
>>To: Kavanagh, Mark B ; d...@openvswitch.org
>>Cc: ktray...@redhat.com; maxime.coque...@redhat.com; i.maxim...@samsung.com;
>>jan.scheur...@ericsson.com; Mooney, Sean K 
>>Subject: RE: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>>
>>> DPDK v17.11 introduces support for the vHost IOMMU feature.
>>> This is a security feature, which restricts the vhost memory that a virtio
>>> device may access.
>>
>>Hi all,
>>
>>There were a few requests for changes in the v4 of this patch and it's an
>>important aspect of the DPDK 17.11 support for OVS.
>>
>>I'd like to get this series in to the pull request this week. If people have
>>flagged issues for the latest revision I'd appreciate if they could review
>the
>>latest revision and flag any new issues that need to be raised.
>>
>>Ian
>>
>>>
>>> This feature also enables the vhost REPLY_ACK protocol, the implementation
>>> of which is known to work in newer versions of QEMU (i.e. v2.10.0), but is
>>> buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the feature
>>> is disabled by default in (and should remain so), for the aforementioned
>>> older QEMU verions. Starting with QEMU v2.9.1, vhost-iommu-support can
>>> safely be enabled, even without having an IOMMU device, with no
>>> performance penalty.
>>>
>>> This patch adds a new global config option, vhost-iommu-support, that
>>> controls enablement of the vhost IOMMU feature:
>>>
>>> ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
>>>
>>> This value defaults to false; to enable IOMMU support, this field should
>>> be set to true when setting other global parameters on init (such as
>>> "dpdk-socket-mem", for example). Changing the value at runtime is not
>>> supported, and requires restarting the vswitch daemon.
>>>
>>> Signed-off-by: Mark Kavanagh 
>>>
>>> ---
>>>
>>> v4:  - rebase to HEAD of master
>>>  - clarify that IOMMU support applies exclusively to vhost user
>>>client ports.
>>>  - reword caveats regarding modifying vhost-iommu-support at runtime
>>>  - use smap_get_bool() to parse vhost-iommu-support, instead of
>>>process_vhost_flags().
>>>  - add stub implementation of dpdk_vhost_iommu_enabled().
>>>  - move stdbool.h #include outside of DPDK compiler guards.
>>>  - remove mention of vhost IOMMU mode from
>>>netdev_dpdk_vhost_client_configure() log.
>>>
>>> v3:  - erroneous; disregard.
>>>
>>> v2:  - rebase to HEAD of master
>>>  - refactor vHost IOMMU enablement mechanism (use a global
>>>config option, instead of the previous per-port 

Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Kavanagh, Mark B
Hi folks,

Thanks for all of your respective reviews and testing of this patchset.

It seems, however, that an issue in DPDK v17.11 has come to light that affects 
guests which use testpmd. 
Specifically, traffic does not reach a guest when traffic is live prior to 
kicking off the testpmd app within said guest, and at least two forwarding 
cores are used. [1]

This is explained in additional detail is [2], an extract of which is 
reproduced below:

"the vector Rx could be broken if backend has consumed all the avail 
descs before the
 device is started. Because in current implementation, the vector Rx 
will return immediately
 without refilling the avail ring if the used ring is empty. So we have 
to refill
 the avail ring after flushing the elements in the used ring."

This issue was initially uncovered by Antonio Fischetti, as part of the 17.11 
patchset validation, and has since been localized to DPDK, rather than OvS.
As a result, it seems now that we should not move to 17.11, but seek an 
out-of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on this - 
should we:
a) move to 17.11 now, note the issue above in the 'errata' section of 
the documentation, and move to 17.11.1 when it becomes available in February of 
next year
b) request the early release of the 17.11.1 stable branch, which 
incorporates a fix for this issue (the possibility, and availability, of which 
are both TBD).

Thanks in advance,
Mark

[1] http://dpdk.org/ml/archives/dev/2017-December/082801.html 
[2] http://dpdk.org/ml/archives/dev/2017-December/082874.html 


>-Original Message-
>From: Stokes, Ian
>Sent: Thursday, December 7, 2017 8:43 AM
>To: Kavanagh, Mark B ; d...@openvswitch.org
>Cc: ktray...@redhat.com; maxime.coque...@redhat.com; i.maxim...@samsung.com;
>jan.scheur...@ericsson.com; Mooney, Sean K 
>Subject: RE: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>
>> DPDK v17.11 introduces support for the vHost IOMMU feature.
>> This is a security feature, which restricts the vhost memory that a virtio
>> device may access.
>
>Hi all,
>
>There were a few requests for changes in the v4 of this patch and it's an
>important aspect of the DPDK 17.11 support for OVS.
>
>I'd like to get this series in to the pull request this week. If people have
>flagged issues for the latest revision I'd appreciate if they could review the
>latest revision and flag any new issues that need to be raised.
>
>Ian
>
>>
>> This feature also enables the vhost REPLY_ACK protocol, the implementation
>> of which is known to work in newer versions of QEMU (i.e. v2.10.0), but is
>> buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the feature
>> is disabled by default in (and should remain so), for the aforementioned
>> older QEMU verions. Starting with QEMU v2.9.1, vhost-iommu-support can
>> safely be enabled, even without having an IOMMU device, with no
>> performance penalty.
>>
>> This patch adds a new global config option, vhost-iommu-support, that
>> controls enablement of the vhost IOMMU feature:
>>
>> ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
>>
>> This value defaults to false; to enable IOMMU support, this field should
>> be set to true when setting other global parameters on init (such as
>> "dpdk-socket-mem", for example). Changing the value at runtime is not
>> supported, and requires restarting the vswitch daemon.
>>
>> Signed-off-by: Mark Kavanagh 
>>
>> ---
>>
>> v4:  - rebase to HEAD of master
>>  - clarify that IOMMU support applies exclusively to vhost user
>>client ports.
>>  - reword caveats regarding modifying vhost-iommu-support at runtime
>>  - use smap_get_bool() to parse vhost-iommu-support, instead of
>>process_vhost_flags().
>>  - add stub implementation of dpdk_vhost_iommu_enabled().
>>  - move stdbool.h #include outside of DPDK compiler guards.
>>  - remove mention of vhost IOMMU mode from
>>netdev_dpdk_vhost_client_configure() log.
>>
>> v3:  - erroneous; disregard.
>>
>> v2:  - rebase to HEAD of master
>>  - refactor vHost IOMMU enablement mechanism (use a global
>>config option, instead of the previous per-port approach).
>> ---
>>  Documentation/topics/dpdk/vhost-user.rst | 27 +++
>>  NEWS |  1 +
>>  lib/dpdk-stub.c  |  6 ++
>>  lib/dpdk.c   | 12 
>>  lib/dpdk.h   |  3 +++
>>  lib/netdev-dpdk.c| 14 ++
>>  vswitchd/vswitch.xml | 15 +++
>>  7 files changed, 74 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/topics/dpdk/vhost-user.rst
>> b/Documentation/topics/dpdk/vhost-user.rst
>> index 5347995..ffef653 100644
>> --- 

Re: [ovs-dev] [PATCH v6 0/7] Output packet batching.

2017-12-07 Thread Ilya Maximets
Thanks a lot for testing the series.
>From my side I want to share testing results for my primary target of this
patch set: PVP with DPDK guest and bonded physical ports in OVS.

Setup looks like:

netdev@ovs-netdev:
br-virt:
vhuost-user-1 (dpdkvhostuserclient, 1 rx and 1 tx queue)
bond-patch (peer=virt-patch)
br-bond:
ens1f0 (dpdk)
ens1f1 (dpdk)
virt-patch (peer=bond-patch)

pmd thread numa_id 0 core_id 1:
isolated : true
port: ens1f0queue-id: 0
port: ens1f1queue-id: 0
pmd thread numa_id 0 core_id 2:
isolated : true
port: vhuost-user-1queue-id: 0

Bridge "br-bond"
Port "bond"
Interface "ens1f1"
Interface "ens1f0"

Port "bond" is the balance-tcp OVS bond port.

testpmd DPDK application works in guest in macswap mode.

Packet size: 512B

Results are in Mpps averaged over three 20 second runs.


Results:

 master  patchpatch  
FlowsMppstx-flush-interval=0  tx-flush-interval=50   

83.891   4.236   + 8.8%   4.210+ 8.2%
256  2.612   3.268   +25.1%   3.077+17.8%
1024 2.509   3.144   +25.3%   2.935+16.9%
8192 2.031   2.384   +17.3%   2.379+17.1%
1048576  1.950   2.292   +17.5%   2.287+17.2%


The patch-set provides constant significant improvement for all numbers of flows
in that testing scenario up to 25% for medium number of flows. Time based 
batching
shows less performance improvement (significantly less for medium numbers of 
flows)
and not preferred in that case.

Significant performance improvement achieved due to reduced number of send 
operations.
With patches applied OVS makes only ~2 calls to netdev_send() per input batch 
(one call
per port in bonding) instead of sending almost each packet separately.

Best regards, Ilya Maximets.

On 05.12.2017 20:26, Jan Scheurich wrote:
> We have now repeated our earlier iperf3 tests for this patch series.
> 
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-September/338247.html
> 
>  
> 
> We use an iperf3 server as representative for a typical IO-intensive kernel 
> application. The iperf3 server executes in a VM with 2 vCPUs where both 
> virtio interrupts and iperf process are pinned to the same vCPU for best 
> performance. We run two iperf3 clients in parallel on a different server to 
> avoid the client to become the bottleneck when enabling tx batching.
> 
>  
> 
> OVS   tx-flush-     iperf3  Avg. PMD    PMD   Iperf   ping -f
> 
> version   interval  Gbps    cycles/pkt  util      CPU load    avg rtt 
>  
> 
> --
> 
> master    - 7.24    1778    46.5%  99.7%  23 us
> 
> Patch v6  0 7.18    1873    47.7% 100.0%  29 us
> 
> Patch v6 50 8.99    1108    36.3%  99.7%  38 us
> 
> Patch v6    100           -   88 us
> 
>  
> 
> In all cases the vCPU capacity of the of the server VM handing the virtio 
> interrupts and the iperf3 server thread is the bottleneck. The TCP throughput 
> is throttled by packets being dropped on Tx to the vhostuser port of the 
> server VM. The Linux kernel is not fast enough to handle the interrupts and 
> poll the incoming packets.
> 
>  
> 
> As expected the tx batching patch alone with tx-flush-interval=0 does not 
> provide any benefit as it doesn’t reduce the virtio interrupt rate.
> 
>  
> 
> Setting the tx-flush-interval to 50 microseconds immediately improves the 
> throughput: The PMD utilization drops from 47% to 36% due to the reduced rate 
> of write calls to the virtio kick fd. (I believe the more pronounced drop in 
> processing cycles/pkt is an artifact of the patch. The cycles used for 
> delayed tx to vhostuser are no longer counted as packet processing cost. To 
> be checked in the individual patch review.)
> 
>  
> 
> More importantly, the iperf3 server VM can now receive 8.99 instead of 7.24 
> Gbit/s, an increase by 24%. I am sure that 10G line rate could be reached 
> with vhost multi-queue in the server VM.
> 
>  
> 
> Compared to the v4 version of the patches, the impact on latency is now 
> reduced a lot. Packets with an inter-arrival time larger than the configured 
> tx-flush-interval are not affected at all. For a 50 us tx-flush-interval this 
> means packet flows with a packet rate of up to 20 Kpps!
> 
>  
> 
> Hence the average RTT reported by “ping -f” experience only a small increases 
> from 23 us on master to 38 us with tx-flush-interval=50. Only when increasing 
> tx-flush-interval well beyond the intrinsic average inter-arrival time, it 
> translates directly into increased latency.
> 
>  
> 
> Conclusion: Time-based tx batching fulfills the expectations for 
> 

Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Ilya Maximets
Looks good to me in general, but I prefer if you'll fix following
checkpatch warnings:

== Checking 2e9587f27add ("netdev-dpdk: vHost IOMMU support") ==
WARNING: Line length is >79-characters long
#52 FILE: Documentation/topics/dpdk/vhost-user.rst:280:
can access, and as such is useful in deployments in which security is a concern.

WARNING: Line length is >79-characters long
#54 FILE: Documentation/topics/dpdk/vhost-user.rst:282:
IOMMU support may be enabled via a global config value, 
```vhost-iommu-support```.

WARNING: Line length is >79-characters long
#71 FILE: Documentation/topics/dpdk/vhost-user.rst:299:
default (and should remain so if using the aforementioned versions of QEMU).

One more comment inline.

Best regards, Ilya Maximets.

On 04.12.2017 14:15, Mark Kavanagh wrote:
> DPDK v17.11 introduces support for the vHost IOMMU feature.
> This is a security feature, which restricts the vhost memory
> that a virtio device may access.
> 
> This feature also enables the vhost REPLY_ACK protocol, the
> implementation of which is known to work in newer versions of
> QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 -
> v2.9.0, inclusive). As such, the feature is disabled by default
> in (and should remain so), for the aforementioned older QEMU
> verions. Starting with QEMU v2.9.1, vhost-iommu-support can
> safely be enabled, even without having an IOMMU device, with
> no performance penalty.
> 
> This patch adds a new global config option, vhost-iommu-support,
> that controls enablement of the vhost IOMMU feature:
> 
> ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
> 
> This value defaults to false; to enable IOMMU support, this field
> should be set to true when setting other global parameters on init
> (such as "dpdk-socket-mem", for example). Changing the value at
> runtime is not supported, and requires restarting the vswitch daemon.
> 
> Signed-off-by: Mark Kavanagh 
> 
> ---
> 
> v4:  - rebase to HEAD of master
>  - clarify that IOMMU support applies exclusively to vhost user
>client ports.
>  - reword caveats regarding modifying vhost-iommu-support at runtime
>  - use smap_get_bool() to parse vhost-iommu-support, instead of
>process_vhost_flags().
>  - add stub implementation of dpdk_vhost_iommu_enabled().
>  - move stdbool.h #include outside of DPDK compiler guards.
>  - remove mention of vhost IOMMU mode from
>netdev_dpdk_vhost_client_configure() log.
> 
> v3:  - erroneous; disregard.
> 
> v2:  - rebase to HEAD of master
>  - refactor vHost IOMMU enablement mechanism (use a global
>config option, instead of the previous per-port approach).
> ---
>  Documentation/topics/dpdk/vhost-user.rst | 27 +++
>  NEWS |  1 +
>  lib/dpdk-stub.c  |  6 ++
>  lib/dpdk.c   | 12 
>  lib/dpdk.h   |  3 +++
>  lib/netdev-dpdk.c| 14 ++
>  vswitchd/vswitch.xml | 15 +++
>  7 files changed, 74 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/vhost-user.rst 
> b/Documentation/topics/dpdk/vhost-user.rst
> index 5347995..ffef653 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -273,6 +273,33 @@ One benefit of using this mode is the ability for vHost 
> ports to 'reconnect' in
>  event of the switch crashing or being brought down. Once it is brought back 
> up,
>  the vHost ports will reconnect automatically and normal service will resume.
>  
> +vhost-user-client IOMMU Support
> +~~~
> +
> +vhost IOMMU is a feature which restricts the vhost memory that a virtio 
> device
> +can access, and as such is useful in deployments in which security is a 
> concern.
> +
> +IOMMU support may be enabled via a global config value, 
> ```vhost-iommu-support```.
> +Setting this to true enables vhost IOMMU support for all vhost ports 
> when/where
> +available::
> +
> +$ ovs-vsctl set Open_vSwitch.other_config:vhost-iommu-support=true

Few spaces missing in above command.

> +
> +The default value is false.
> +
> +.. important::
> +
> +Changing this value requires restarting the daemon.
> +
> +.. important::
> +
> +Enabling the IOMMU feature also enables the vhost user reply-ack 
> protocol;
> +this is known to work on QEMU v2.10.0, but is buggy on older versions
> +(2.7.0 - 2.9.0, inclusive). Consequently, the IOMMU feaure is disabled by
> +default (and should remain so if using the aforementioned versions of 
> QEMU).
> +Starting with QEMU v2.9.1, vhost-iommu-support can safely be enabled, 
> even
> +without having an IOMMU device, with no performance penalty.
> +
>  .. _dpdk-testpmd:
>  
>  DPDK in the Guest
> diff --git a/NEWS b/NEWS
> index 

Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support

2017-12-07 Thread Stokes, Ian
> DPDK v17.11 introduces support for the vHost IOMMU feature.
> This is a security feature, which restricts the vhost memory that a virtio
> device may access.

Hi all,

There were a few requests for changes in the v4 of this patch and it's an 
important aspect of the DPDK 17.11 support for OVS.

I'd like to get this series in to the pull request this week. If people have 
flagged issues for the latest revision I'd appreciate if they could review the 
latest revision and flag any new issues that need to be raised.

Ian

> 
> This feature also enables the vhost REPLY_ACK protocol, the implementation
> of which is known to work in newer versions of QEMU (i.e. v2.10.0), but is
> buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the feature
> is disabled by default in (and should remain so), for the aforementioned
> older QEMU verions. Starting with QEMU v2.9.1, vhost-iommu-support can
> safely be enabled, even without having an IOMMU device, with no
> performance penalty.
> 
> This patch adds a new global config option, vhost-iommu-support, that
> controls enablement of the vhost IOMMU feature:
> 
> ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
> 
> This value defaults to false; to enable IOMMU support, this field should
> be set to true when setting other global parameters on init (such as
> "dpdk-socket-mem", for example). Changing the value at runtime is not
> supported, and requires restarting the vswitch daemon.
> 
> Signed-off-by: Mark Kavanagh 
> 
> ---
> 
> v4:  - rebase to HEAD of master
>  - clarify that IOMMU support applies exclusively to vhost user
>client ports.
>  - reword caveats regarding modifying vhost-iommu-support at runtime
>  - use smap_get_bool() to parse vhost-iommu-support, instead of
>process_vhost_flags().
>  - add stub implementation of dpdk_vhost_iommu_enabled().
>  - move stdbool.h #include outside of DPDK compiler guards.
>  - remove mention of vhost IOMMU mode from
>netdev_dpdk_vhost_client_configure() log.
> 
> v3:  - erroneous; disregard.
> 
> v2:  - rebase to HEAD of master
>  - refactor vHost IOMMU enablement mechanism (use a global
>config option, instead of the previous per-port approach).
> ---
>  Documentation/topics/dpdk/vhost-user.rst | 27 +++
>  NEWS |  1 +
>  lib/dpdk-stub.c  |  6 ++
>  lib/dpdk.c   | 12 
>  lib/dpdk.h   |  3 +++
>  lib/netdev-dpdk.c| 14 ++
>  vswitchd/vswitch.xml | 15 +++
>  7 files changed, 74 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/vhost-user.rst
> b/Documentation/topics/dpdk/vhost-user.rst
> index 5347995..ffef653 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -273,6 +273,33 @@ One benefit of using this mode is the ability for
> vHost ports to 'reconnect' in  event of the switch crashing or being
> brought down. Once it is brought back up,  the vHost ports will reconnect
> automatically and normal service will resume.
> 
> +vhost-user-client IOMMU Support
> +~~~
> +
> +vhost IOMMU is a feature which restricts the vhost memory that a virtio
> +device can access, and as such is useful in deployments in which security
> is a concern.
> +
> +IOMMU support may be enabled via a global config value, ```vhost-iommu-
> support```.
> +Setting this to true enables vhost IOMMU support for all vhost ports
> +when/where
> +available::
> +
> +$ ovs-vsctl set Open_vSwitch.other_config:vhost-iommu-support=true
> +
> +The default value is false.
> +
> +.. important::
> +
> +Changing this value requires restarting the daemon.
> +
> +.. important::
> +
> +Enabling the IOMMU feature also enables the vhost user reply-ack
> protocol;
> +this is known to work on QEMU v2.10.0, but is buggy on older versions
> +(2.7.0 - 2.9.0, inclusive). Consequently, the IOMMU feaure is
> disabled by
> +default (and should remain so if using the aforementioned versions of
> QEMU).
> +Starting with QEMU v2.9.1, vhost-iommu-support can safely be enabled,
> even
> +without having an IOMMU device, with no performance penalty.
> +
>  .. _dpdk-testpmd:
> 
>  DPDK in the Guest
> diff --git a/NEWS b/NEWS
> index d4a1c9a..99c47d8 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -15,6 +15,7 @@ Post-v2.8.0
>   * Add support for compiling OVS with the latest Linux 4.13 kernel
> - DPDK:
>   * Add support for DPDK v17.11
> + * Add support for vHost IOMMU
> 
>  v2.8.0 - 31 Aug 2017
>  
> diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c index daef729..3602180
> 100644
> --- a/lib/dpdk-stub.c
> +++ b/lib/dpdk-stub.c
> @@ -48,3 +48,9 @@ dpdk_get_vhost_sock_dir(void)  {
>  return