Re: [ovs-dev] [PATCH V5 1/2] netdev-dpdk: DPDK v17.11 upgrade
On Thu, Dec 07, 2017 at 05:45:36PM +, Mark Kavanagh wrote: > This commit adds support for DPDK v17.11: > - minor updates to accomodate DPDK API changes > - update references to DPDK version in Documentation > - update DPDK version in travis' linux-build script > - document DPDK v17.11 virtio driver bug > > Signed-off-by: Mark Kavanagh> Acked-by: Maxime Coquelin > Acked-by: Ciara Loftus > Acked-by: Jan Scheurich > Tested-by: Jan Scheurich > Tested-by: Guoshuai Li > --- [...] > > -.. _DPDK release notes: > http://dpdk.org/doc/guides/rel_notes/release_17_05.html > +.. _DPDK release notes: > http://dpdk.org/doc/guides/rel_notes/release_17_11.html > + > +- The DPDK v17.11 virtio net driver contains a bug that prevents guest DPDK > + applications from receiving packets. This only occurs when guest-bound > traffic > + is live before a DPDK application is started within the guest, and where > two > + or more forwarding cores are used. As such, it is not recommended for > guests > + which execute DPDK applications to upgrade to DPDK v17.11. > + > + Note that this issue does not occur when guest network devices are > controlled > + by the guest kernel. > + > + Details regarding the virtio driver bug are available on the `DPDK mailing > + list`_. > + > +.. _DPDK mailing list: > http://dpdk.org/ml/archives/dev/2017-December/082801.html > > Reporting Bugs > -- > diff --git a/Documentation/topics/dpdk/ring.rst > b/Documentation/topics/dpdk/ring.rst > index ad9d7a5..8d0ede8 100644 > --- a/Documentation/topics/dpdk/ring.rst > +++ b/Documentation/topics/dpdk/ring.rst > @@ -77,4 +77,4 @@ DPDK. However, this functionality was removed because: > - :doc:`vhost-user interfaces ` are the defacto DPDK-based path > to >guests > > -.. _DPDK documentation: > https://dpdk.readthedocs.io/en/v17.05/prog_guide/ring_lib.html > +.. _DPDK documentation: > https://dpdk.readthedocs.io/en/v17.11/prog_guide/ring_lib.html > diff --git a/Documentation/topics/dpdk/vhost-user.rst > b/Documentation/topics/dpdk/vhost-user.rst > index 74ac06e..1a5de24 100644 > --- a/Documentation/topics/dpdk/vhost-user.rst > +++ b/Documentation/topics/dpdk/vhost-user.rst > @@ -287,6 +287,11 @@ application in the VM. > >Support for DPDK in the guest requires QEMU >= 2.2 > > +.. important:: > + > + DPDK v17.11 virtio PMD contains a bug that affects testpmd/DPDK guest > + applications. As such, guests should not upgrade to 17.11. > + I'm not very familiar with the use case related to ovs. The bug you mentioned only affects the vector Rx of virtio PMD, while the vector Rx won't be enabled by default, as it has the most limitations among the three Rx functions in virtio PMD, e.g. it's not able to handle the mergeable rxbuf (VIRTIO_NET_F_MRG_RXBUF) which is enabled by default and commonly used. But I'm not sure whether the vector Rx of virtio PMD is commonly used in ovs related cases. If it's really necessary to document a PMD bug of DPDK in ovs, I think we should provide more accurate info in this warning, e.g. warn users not to use vector Rx when using virtio PMD instead of warning them not to use DPDK 17.11. Thanks, Tiwei ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC PATCH v3 2/8] lib/dp-packet: init specific mbuf fields to 0
2017/11/22 2:29, Mark Kavanagh : dp_packets are created using xmalloc(); in the case of OvS-DPDK, it's possible the the resultant mbuf portion of the dp_packet contains random data. For some mbuf fields, specifically those related to multi-segment mbufs and/or offload features, random values may cause unexpected behaviour, should the dp_packet's contents be later copied to a DPDK mbuf. It is critical therefore, that these fields should be initialized to 0. This patch ensures that the following mbuf fields are initialized to 0, on creation of a new dp_packet: - ol_flags - nb_segs - tx_offload - packet_type Adapted from an idea by Michael Qiu: https://patchwork.ozlabs.org/patch/777570/ Signed-off-by: Mark Kavanagh Acked-by: Michael Qiu --- lib/dp-packet.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index b4b721c..7aa440f 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -626,13 +626,13 @@ dp_packet_mbuf_rss_flag_reset(struct dp_packet *p OVS_UNUSED) /* This initialization is needed for packets that do not come * from DPDK interfaces, when vswitchd is built with --with-dpdk. - * The DPDK rte library will still otherwise manage the mbuf. - * We only need to initialize the mbuf ol_flags. */ + * The DPDK rte library will still otherwise manage the mbuf. */ static inline void dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED) { #ifdef DPDK_NETDEV -p->mbuf.ol_flags = 0; +struct rte_mbuf *mbuf = &(p->mbuf); +mbuf->ol_flags = mbuf->nb_segs = mbuf->tx_offload = mbuf->packet_type = 0; #endif } ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] Bonjour,
Bonjour, Je vous prie de me confirmer l'accusé de réception du mail, si j'ai pris contact avec vous ,c'est concernant un projet humanitaire que je tiens à réaliser dans votre pays car je tiens a mettre à votre disposition une somme de Un Million Cinq Cent Mille Euros que je dispose au niveau de la Banque UBA BENIN pour la réalisation d'un projet humanitaire. Pour ce fait œuvre humanitaire constituera a aidé les plus âgées et pauvre ainsi que les orphelins et les non abrités en leur construisant des maisons, mosquées, écoles. Etc. En effet je suis une femme en hospitalisation car je suis souffrante d'une ischémie en phase terminale de son apogée. Je suis stérile et je n'ai pas eu d'enfants .Raison pour laquelle la providence m'oriente vers vous. Espérant vous lire afin de vous donner plus d'amples d'explication sur ce projet. Mme Ginette ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 13/13] log: Add async commit support.
The OVSDB log code has always had the ability to commit the log to disk and wait for the commit to finish. This patch introduces a new feature that allows the client to start a commit in the background and then to determine asynchronously that the commit has completed. This will be especially useful later for the distributed database feature. Signed-off-by: Ben Pfaff--- ovsdb/file.c | 4 +- ovsdb/log.c| 152 +++-- ovsdb/log.h| 7 ++- ovsdb/ovsdb-tool.c | 2 +- tests/test-ovsdb.c | 2 +- 5 files changed, 158 insertions(+), 9 deletions(-) diff --git a/ovsdb/file.c b/ovsdb/file.c index 094fa0ce207a..40fecdf59af5 100644 --- a/ovsdb/file.c +++ b/ovsdb/file.c @@ -651,7 +651,7 @@ ovsdb_file_compact(struct ovsdb_file *file) /* Commit the old version, so that we can be assured that we'll eventually * have either the old or the new version. */ -error = ovsdb_log_commit(file->log); +error = ovsdb_log_commit_block(file->log); if (error) { goto exit; } @@ -854,7 +854,7 @@ ovsdb_file_txn_commit(struct json *json, const char *comment, } if (durable) { -error = ovsdb_log_commit(log); +error = ovsdb_log_commit_block(log); if (error) { return ovsdb_wrap_error(error, "committing transaction failed"); } diff --git a/ovsdb/log.c b/ovsdb/log.c index e364667ac673..e2c58d1cc8b1 100644 --- a/ovsdb/log.c +++ b/ovsdb/log.c @@ -24,12 +24,17 @@ #include #include +#include "lockfile.h" #include "openvswitch/dynamic-string.h" #include "openvswitch/json.h" #include "openvswitch/vlog.h" -#include "lockfile.h" -#include "ovsdb.h" +#include "ovs-atomic.h" +#include "ovs-rcu.h" +#include "ovs-thread.h" #include "ovsdb-error.h" +#include "ovsdb.h" +#include "openvswitch/poll-loop.h" +#include "seq.h" #include "sha1.h" #include "socket-util.h" #include "transaction.h" @@ -78,6 +83,7 @@ struct ovsdb_log { struct lockfile *lockfile; FILE *stream; off_t base; +struct afsync *afsync; }; /* Whether the OS supports renaming open files. @@ -95,6 +101,9 @@ static bool parse_header(char *header, const char **magicp, uint8_t sha1[SHA1_DIGEST_SIZE]); static bool is_magic_ok(const char *needle, const char *haystack); +static struct afsync *afsync_create(int fd, uint64_t initial_ticket); +static uint64_t afsync_destroy(struct afsync *); + /* Attempts to open 'name' with the specified 'open_mode'. On success, stores * the new log into '*filep' and returns NULL; otherwise returns NULL and * stores NULL into '*filep'. @@ -261,6 +270,7 @@ ovsdb_log_open(const char *name, const char *magic, file->prev_offset = 0; file->offset = 0; file->base = 0; +file->afsync = NULL; *filep = file; return NULL; @@ -300,6 +310,7 @@ ovsdb_log_close(struct ovsdb_log *file) { if (file) { ovsdb_error_destroy(file->error); +afsync_destroy(file->afsync); free(file->name); free(file->display_name); free(file->magic); @@ -606,8 +617,10 @@ ovsdb_log_write(struct ovsdb_log *file, const struct json *json) return NULL; } +/* Attempts to commit 'file' to disk. Waits for the commit to succeed or fail. + * Returns NULL if successful, otherwise the error that occurred. */ struct ovsdb_error * -ovsdb_log_commit(struct ovsdb_log *file) +ovsdb_log_commit_block(struct ovsdb_log *file) { if (file->stream && fsync(fileno(file->stream))) { return ovsdb_io_error(errno, "%s: fsync failed", file->display_name); @@ -712,7 +725,7 @@ ovsdb_rename(const char *old, const char *new) struct ovsdb_error * OVS_WARN_UNUSED_RESULT ovsdb_log_replace_commit(struct ovsdb_log *old, struct ovsdb_log *new) { -struct ovsdb_error *error = ovsdb_log_commit(new); +struct ovsdb_error *error = ovsdb_log_commit_block(new); if (error) { ovsdb_log_replace_abort(new); return error; @@ -784,6 +797,10 @@ ovsdb_log_replace_commit(struct ovsdb_log *old, struct ovsdb_log *new) ovsdb_error_destroy(old->error); old->error = NULL; /* prev_offset only matters for OVSDB_LOG_READ. */ +if (old->afsync) { +uint64_t ticket = afsync_destroy(old->afsync); +old->afsync = afsync_create(fileno(old->stream), ticket + 1); +} old->offset = new->offset; /* Keep old->name. */ free(old->magic); @@ -816,3 +833,130 @@ ovsdb_log_disable_renaming_open_files(void) { rename_open_files = false; } + +struct afsync { +pthread_t thread; +atomic_uint64_t cur, next; +struct seq *request, *complete; +int fd; +}; + +static void * +afsync_thread(void *afsync_) +{ +struct afsync *afsync = afsync_; +uint64_t cur = 0; +for (;;) { +ovsrcu_quiesce_start(); + +uint64_t request_seq = seq_read(afsync->request); + +uint64_t next; +
[ovs-dev] [PATCH 12/13] log: Use absolute name of log file.
In ovsdb-server, the OVSDB log code is used to open the databases specified on the command line before ovsdb-server daemonizes itself. Afterward, it is occasionally necessary for ovsdb-server to reference those files by name again. When that happens, if daemonization changed the current directory to the root, any relative names are no longer valid and those references will fail. Until now, this was handled at a higher level in ovsdb-server, but in the future it will be convenient to handle it in the log code itself. This commit prepares for that by making the log code take the absolute name of log files itself. Signed-off-by: Ben Pfaff--- ovsdb/log.c | 93 +++-- 1 file changed, 60 insertions(+), 33 deletions(-) diff --git a/ovsdb/log.c b/ovsdb/log.c index 27f4b328d451..e364667ac673 100644 --- a/ovsdb/log.c +++ b/ovsdb/log.c @@ -72,7 +72,8 @@ struct ovsdb_log { off_t prev_offset; off_t offset; -char *name; +char *name; /* Absolute name of file. */ +char *display_name; /* For use in log messages, etc. */ char *magic; struct lockfile *lockfile; FILE *stream; @@ -124,8 +125,22 @@ ovsdb_log_open(const char *name, const char *magic, if (open_mode == OVSDB_LOG_CREATE_EXCL || open_mode == OVSDB_LOG_CREATE) { ovs_assert(!strchr(magic, '|')); } + *filep = NULL; +/* Get the absolute name of the file because we might need to access it by + * name again later after the process has changed directory (e.g. because + * daemonize() chdirs to "/"). + * + * We save the user-provided name of the file for use in log messages, to + * reduce user confusion. */ +char *abs_name = abs_file_name(NULL, name); +if (!abs_name) { +error = ovsdb_io_error(0, "could not determine current " + "working directory"); +goto error; +} + ovs_assert(locking == -1 || locking == false || locking == true); if (locking < 0) { locking = open_mode != OVSDB_LOG_READ_ONLY; @@ -238,7 +253,8 @@ ovsdb_log_open(const char *name, const char *magic, struct ovsdb_log *file = xmalloc(sizeof *file); file->state = OVSDB_LOG_READ; file->error = NULL; -file->name = xstrdup(name); +file->name = abs_name; +file->display_name = xstrdup(name); file->magic = xstrdup(actual_magic); file->lockfile = lockfile; file->stream = stream; @@ -253,6 +269,7 @@ error_fclose: error_unlock: lockfile_unlock(lockfile); error: +free(abs_name); return error; } @@ -284,6 +301,7 @@ ovsdb_log_close(struct ovsdb_log *file) if (file) { ovsdb_error_destroy(file->error); free(file->name); +free(file->display_name); free(file->magic); if (file->stream) { fclose(file->stream); @@ -359,8 +377,9 @@ parse_body(struct ovsdb_log *file, off_t offset, unsigned long int length, json_parser_abort(parser); return ovsdb_io_error(ferror(file->stream) ? errno : EOF, "%s: error reading %lu bytes " - "starting at offset %lld", file->name, - length, (long long int) offset); + "starting at offset %lld", + file->display_name, length, + (long long int) offset); } sha1_update(, input, chunk); json_parser_feed(parser, input, chunk); @@ -412,7 +431,7 @@ ovsdb_log_read(struct ovsdb_log *file, struct json **jsonp) if (feof(file->stream)) { return NULL; } -error = ovsdb_io_error(errno, "%s: read failed", file->name); +error = ovsdb_io_error(errno, "%s: read failed", file->display_name); goto error; } off_t data_offset = file->offset + strlen(header); @@ -422,7 +441,8 @@ ovsdb_log_read(struct ovsdb_log *file, struct json **jsonp) || strcmp(magic, file->magic)) { error = ovsdb_syntax_error(NULL, NULL, "%s: parse error at offset " "%lld in header line \"%.*s\"", - file->name, (long long int) file->offset, + file->display_name, + (long long int) file->offset, (int) strcspn(header, "\n"), header); goto error; } @@ -436,7 +456,7 @@ ovsdb_log_read(struct ovsdb_log *file, struct json **jsonp) error = ovsdb_syntax_error(NULL, NULL, "%s: %lu bytes starting at " "offset %lld have SHA-1 hash "SHA1_FMT" " "but should have hash "SHA1_FMT, - file->name, data_length, + file->display_name,
[ovs-dev] [PATCH 11/13] log: Replace ovsdb_log_get_offset() by a more abstract mechanism.
Upcoming support for clustered databases will need to provide a more abstract way to determine when a given file should be compacted, so this changes the standalone database support to use this mechanism in advance. Signed-off-by: Ben Pfaff--- ovsdb/file.c | 24 ovsdb/log.c | 26 +++--- ovsdb/log.h | 22 +- 3 files changed, 48 insertions(+), 24 deletions(-) diff --git a/ovsdb/file.c b/ovsdb/file.c index fca07c3d722d..094fa0ce207a 100644 --- a/ovsdb/file.c +++ b/ovsdb/file.c @@ -73,7 +73,6 @@ static struct ovsdb_error *ovsdb_file_create(struct ovsdb *, struct ovsdb_log *, const char *file_name, unsigned int n_transactions, - off_t snapshot_size, struct ovsdb_file **filep); /* Opens database 'file_name' and stores a pointer to the new database in @@ -209,7 +208,6 @@ ovsdb_file_open__(const char *file_name, * * The schema precedes the snapshot in the log; we could compensate for its * size, but it's just not that important. */ -off_t snapshot_size = 0; unsigned int n_transactions = 0; while ((error = ovsdb_log_read(log, )) == NULL && json) { struct ovsdb_txn *txn; @@ -230,7 +228,7 @@ ovsdb_file_open__(const char *file_name, } if (n_transactions == 1) { -snapshot_size = ovsdb_log_get_offset(log); +ovsdb_log_mark_base(log); } } if (error) { @@ -247,8 +245,7 @@ ovsdb_file_open__(const char *file_name, if (!read_only) { struct ovsdb_file *file; -error = ovsdb_file_create(db, log, file_name, n_transactions, - snapshot_size, ); +error = ovsdb_file_create(db, log, file_name, n_transactions, ); if (error) { goto error; } @@ -515,7 +512,6 @@ struct ovsdb_file { long long int last_compact; long long int next_compact; unsigned int n_transactions; -off_t snapshot_size; }; static const struct ovsdb_replica_class ovsdb_file_class; @@ -523,8 +519,7 @@ static const struct ovsdb_replica_class ovsdb_file_class; static struct ovsdb_error * ovsdb_file_create(struct ovsdb *db, struct ovsdb_log *log, const char *file_name, - unsigned int n_transactions, off_t snapshot_size, - struct ovsdb_file **filep) + unsigned int n_transactions, struct ovsdb_file **filep) { struct ovsdb_file *file; char *deref_name; @@ -548,7 +543,6 @@ ovsdb_file_create(struct ovsdb *db, struct ovsdb_log *log, file->file_name = abs_name; file->last_compact = time_msec(); file->next_compact = file->last_compact + COMPACT_MIN_MSEC; -file->snapshot_size = snapshot_size; file->n_transactions = n_transactions; ovsdb_add_replica(db, >replica); @@ -601,12 +595,9 @@ ovsdb_file_commit(struct ovsdb_replica *replica, * tried), and if there are at least 100 transactions in the database, and * if the database is at least 10 MB, and the database is at least 4x the * size of the previous snapshot, then compact the database. */ -off_t log_size = ovsdb_log_get_offset(file->log); if (time_msec() >= file->next_compact && file->n_transactions >= 100 -&& log_size >= 10 * 1024 * 1024 -&& log_size / 4 >= file->snapshot_size) -{ +&& ovsdb_log_has_grown(file->log)) { error = ovsdb_file_compact(file); if (error) { char *s = ovsdb_error_to_string(error); @@ -653,10 +644,9 @@ ovsdb_file_compact(struct ovsdb_file *file) int retval; comment = xasprintf("compacting database online " -"(%.3f seconds old, %u transactions, %llu bytes)", +"(%.3f seconds old, %u transactions)", (time_wall_msec() - file->last_compact) / 1000.0, -file->n_transactions, -(unsigned long long) ovsdb_log_get_offset(file->log)); +file->n_transactions); VLOG_INFO("%s: %s", file->file_name, comment); /* Commit the old version, so that we can be assured that we'll eventually @@ -686,6 +676,7 @@ ovsdb_file_compact(struct ovsdb_file *file) if (error) { goto exit; } +ovsdb_log_mark_base(new_log); /* Replace original file by the temporary file. * @@ -740,6 +731,7 @@ ovsdb_file_compact(struct ovsdb_file *file) ovs_fatal(0, "error reading database"); } json_destroy(json); +ovsdb_log_mark_base(file->log); } /* Success! */ diff --git a/ovsdb/log.c b/ovsdb/log.c index dd641884a2ee..27f4b328d451 100644 --- a/ovsdb/log.c +++ b/ovsdb/log.c @@ -76,6
[ovs-dev] [PATCH 09/13] log: Support using /dev/stdin for opening logs read-only.
On Unix-like systems, usually /dev/stdin opens a duplicate of fd 0, and this will be convenient in a few places later on. This commit makes this support universal. Signed-off-by: Ben Pfaff--- ovsdb/log.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/ovsdb/log.c b/ovsdb/log.c index 1769e12266d1..d25f766474dc 100644 --- a/ovsdb/log.c +++ b/ovsdb/log.c @@ -166,7 +166,13 @@ ovsdb_log_open(const char *name, const char *magic, #ifdef _WIN32 flags = flags | O_BINARY; #endif -fd = open(name, flags, 0666); +/* Special case for /dev/stdin to make it work even if the operating system + * doesn't support it under that name. */ +if (!strcmp(name, "/dev/stdin") && open_mode == OVSDB_LOG_READ_ONLY) { +fd = dup(STDIN_FILENO); +} else { +fd = open(name, flags, 0666); +} if (fd < 0) { const char *op = (open_mode == OVSDB_LOG_CREATE_EXCL ? "create" : open_mode == OVSDB_LOG_CREATE ? "create or open" -- 2.10.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 10/13] log: Support multiple magic.
Some OVSDB tools will want to open files that might be standalone or clustered databases, and so it's better if ovsdb_log_open() can accept more than one valid "magic". This commit makes that possible. Signed-off-by: Ben Pfaff--- ovsdb/log.c| 147 + ovsdb/log.h| 6 ++- tests/ovsdb-log.at | 2 +- 3 files changed, 108 insertions(+), 47 deletions(-) diff --git a/ovsdb/log.c b/ovsdb/log.c index d25f766474dc..dd641884a2ee 100644 --- a/ovsdb/log.c +++ b/ovsdb/log.c @@ -88,13 +88,19 @@ static bool rename_open_files = false; static bool rename_open_files = true; #endif +static bool parse_header(char *header, const char **magicp, + unsigned long int *length, + uint8_t sha1[SHA1_DIGEST_SIZE]); +static bool is_magic_ok(const char *needle, const char *haystack); + /* Attempts to open 'name' with the specified 'open_mode'. On success, stores * the new log into '*filep' and returns NULL; otherwise returns NULL and * stores NULL into '*filep'. * * 'magic' is a short text string put at the beginning of every record and used * to distinguish one kind of log file from another. For a conventional OVSDB - * log file, use OVSDB_MAGIC. + * log file, use OVSDB_MAGIC. To accept more than one magic string, separate + * them with "|", e.g. "MAGIC 1|MAGIC 2". * * Whether the file will be locked using lockfile_lock() depends on 'locking': * use true to lock it, false not to lock it, or -1 to lock it only if @@ -107,12 +113,16 @@ ovsdb_log_open(const char *name, const char *magic, { struct lockfile *lockfile; struct ovsdb_error *error; -struct ovsdb_log *file; struct stat s; FILE *stream; int flags; int fd; +/* If we can create a new file, we need to know what kind of magic to + * use, so there must be only one kind. */ +if (open_mode == OVSDB_LOG_CREATE_EXCL || open_mode == OVSDB_LOG_CREATE) { +ovs_assert(!strchr(magic, '|')); +} *filep = NULL; ovs_assert(locking == -1 || locking == false || locking == true); @@ -181,43 +191,54 @@ ovsdb_log_open(const char *name, const char *magic, goto error_unlock; } -if (!fstat(fd, )) { -if (s.st_size == 0) { -/* It's (probably) a new file so fsync() its parent directory to - * ensure that its directory entry is committed to disk. */ -fsync_parent_dir(name); -} else if (s.st_size >= strlen(magic) && S_ISREG(s.st_mode)) { -/* Try to read the magic from the first log record. If it's not - * the magic we expect, this is the wrong kind of file, so reject - * it immediately. */ -size_t magic_len = strlen(magic); -char *buf = xzalloc(magic_len + 1); -bool err = (read(fd, buf, magic_len) == magic_len -&& strcmp(buf, magic)); -free(buf); -if (err) { -error = ovsdb_error(NULL, "%s: bad magic (unexpected " -"kind of file)", name); -goto error_close; -} -if (lseek(fd, 0, SEEK_SET)) { -error = ovsdb_io_error(errno, "%s: seek failed", name); -goto error_close; -} -} -} - stream = fdopen(fd, open_mode == OVSDB_LOG_READ_ONLY ? "rb" : "w+b"); if (!stream) { error = ovsdb_io_error(errno, "%s: fdopen failed", name); -goto error_close; +close(fd); +goto error_unlock; +} + +/* Read the magic from the first log record. */ +char header[128]; +const char *actual_magic; +if (!fgets(header, sizeof header, stream)) { +if (ferror(stream)) { +error = ovsdb_io_error(errno, "%s: read error", name); +goto error_fclose; +} + +/* We need to be able to report what kind of file this is but we can't + * if it's empty and we accept more than one. */ +if (strchr(magic, '|')) { +error = ovsdb_error(NULL, "%s: unexpected end of file", name); +goto error_fclose; +} +actual_magic = magic; + +/* It's an empty file and therefore probably a new file, so fsync() + * its parent directory to ensure that its directory entry is + * committed to disk. */ +fsync_parent_dir(name); +} else { +unsigned long int length; +uint8_t sha1[SHA1_DIGEST_SIZE]; +if (!parse_header(header, _magic, , sha1) +|| !is_magic_ok(actual_magic, magic)) { +error = ovsdb_error(NULL, "%s: unexpected file format", name); +goto error_fclose; +} +} + +if (fseek(stream, 0, SEEK_SET)) { +error = ovsdb_io_error(errno, "%s: seek failed", name); +goto error_fclose; } -file = xmalloc(sizeof *file); +struct
[ovs-dev] [PATCH 08/13] log: New function ovsdb_log_compose_record().
This will acquire a new user later on. Signed-off-by: Ben Pfaff--- ovsdb/log.c | 55 --- ovsdb/log.h | 4 2 files changed, 36 insertions(+), 23 deletions(-) diff --git a/ovsdb/log.c b/ovsdb/log.c index 22cd9fe89599..1769e12266d1 100644 --- a/ovsdb/log.c +++ b/ovsdb/log.c @@ -24,6 +24,7 @@ #include #include +#include "openvswitch/dynamic-string.h" #include "openvswitch/json.h" #include "openvswitch/vlog.h" #include "lockfile.h" @@ -419,18 +420,32 @@ ovsdb_log_unread(struct ovsdb_log *file) file->offset = file->prev_offset; } +void +ovsdb_log_compose_record(const struct json *json, + const char *magic, struct ds *header, struct ds *data) +{ +ovs_assert(json->type == JSON_OBJECT || json->type == JSON_ARRAY); +ovs_assert(!header->length); +ovs_assert(!data->length); + +/* Compose content. Add a new-line (replacing the null terminator) to make + * the file easier to read, even though it has no semantic value. */ +json_to_ds(json, 0, data); +ds_put_char(data, '\n'); + +/* Compose header. */ +uint8_t sha1[SHA1_DIGEST_SIZE]; +sha1_bytes(data->string, data->length, sha1); +ds_put_format(header, "%s %"PRIuSIZE" "SHA1_FMT"\n", + magic, data->length, SHA1_ARGS(sha1)); +} + /* Writes log record 'json' to 'file'. Returns NULL if successful or an error * (which the caller must eventually destroy) on failure. */ struct ovsdb_error * ovsdb_log_write(struct ovsdb_log *file, const struct json *json) { -uint8_t sha1[SHA1_DIGEST_SIZE]; struct ovsdb_error *error; -char *json_string; -char header[128]; -size_t length; - -json_string = NULL; switch (file->state) { case OVSDB_LOG_WRITE: @@ -463,22 +478,18 @@ ovsdb_log_write(struct ovsdb_log *file, const struct json *json) goto error; } -/* Compose content. Add a new-line (replacing the null terminator) to make - * the file easier to read, even though it has no semantic value. */ -json_string = json_to_string(json, 0); -length = strlen(json_string) + 1; -json_string[length - 1] = '\n'; - -/* Compose header. */ -sha1_bytes(json_string, length, sha1); -snprintf(header, sizeof header, "%s %"PRIuSIZE" "SHA1_FMT"\n", - file->magic, length, SHA1_ARGS(sha1)); +struct ds header = DS_EMPTY_INITIALIZER; +struct ds data = DS_EMPTY_INITIALIZER; +ovsdb_log_compose_record(json, file->magic, , ); +size_t total_length = header.length + data.length; /* Write. */ -if (fwrite(header, strlen(header), 1, file->stream) != 1 -|| fwrite(json_string, length, 1, file->stream) != 1 -|| fflush(file->stream)) -{ +bool ok = (fwrite(header.string, header.length, 1, file->stream) == 1 + && fwrite(data.string, data.length, 1, file->stream) == 1 + && fflush(file->stream) == 0); +ds_destroy(); +ds_destroy(); +if (!ok) { error = ovsdb_io_error(errno, "%s: write failed", file->name); static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); @@ -492,14 +503,12 @@ ovsdb_log_write(struct ovsdb_log *file, const struct json *json) goto error; } -file->offset += strlen(header) + length; -free(json_string); +file->offset += total_length; return NULL; error: file->state = OVSDB_LOG_WRITE_ERROR; file->error = ovsdb_error_clone(error); -free(json_string); return error; } diff --git a/ovsdb/log.h b/ovsdb/log.h index 41734e73b351..cc8469c01e57 100644 --- a/ovsdb/log.h +++ b/ovsdb/log.h @@ -19,6 +19,7 @@ #include #include "compiler.h" +struct ds; struct json; struct ovsdb_log; @@ -42,6 +43,9 @@ struct ovsdb_error *ovsdb_log_read(struct ovsdb_log *, struct json **) OVS_WARN_UNUSED_RESULT; void ovsdb_log_unread(struct ovsdb_log *); +void ovsdb_log_compose_record(const struct json *, const char *magic, + struct ds *header, struct ds *data); + struct ovsdb_error *ovsdb_log_write(struct ovsdb_log *, const struct json *) OVS_WARN_UNUSED_RESULT; struct ovsdb_error *ovsdb_log_commit(struct ovsdb_log *) -- 2.10.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 07/13] log: New functions for replacing a log's contents.
These functions will acquire users in future commits. This new functionality made the internal state machine of the log hard enough to understand that I thought that it was best to make it explicit with a 'state' variable, so this commit introduces one. This commit duplicates code and logic from ovsdb_rename() and ovsdb_compact() in ovsdb/file.c. A future commit removes this duplication. Signed-off-by: Ben Pfaff--- ovsdb/log.c| 294 - ovsdb/log.h| 14 +++ tests/ovsdb-log.at | 74 ++ tests/test-ovsdb.c | 58 +-- 4 files changed, 404 insertions(+), 36 deletions(-) diff --git a/ovsdb/log.c b/ovsdb/log.c index 6a1a4484ad34..22cd9fe89599 100644 --- a/ovsdb/log.c +++ b/ovsdb/log.c @@ -36,23 +36,57 @@ VLOG_DEFINE_THIS_MODULE(ovsdb_log); -enum ovsdb_log_mode { -OVSDB_LOG_READ, -OVSDB_LOG_WRITE +/* State in a log's state machine. + * + * OVSDB_LOG_READ is the initial state for a newly opened log. Log records may + * be read in this state only. Reaching end of file does not cause a state + * transition. A read error transitions to OVSDB_LOG_READ_ERROR. + * + * OVSDB_LOG_READ_ERROR prevents further reads from succeeding; they will + * report the same error as before. A log write transitions away to + * OVSDB_LOG_WRITE or OVSDB_LOG_WRITE_ERROR. + * + * OVSDB_LOG_WRITE is the state following a call to ovsdb_log_write(), when all + * goes well. Any state other than OVSDB_LOG_BROKEN may transition to this + * state. A write error transitions to OVSDB_LOG_WRITE_ERROR. + * + * OVSDB_LOG_WRITE_ERROR is the state following a write error. Further writes + * retry and might transition back to OVSDB_LOG_WRITE. + * + * OVSDB_LOG_BROKEN is the state following a call to ovsdb_log_replace() or + * ovsdb_log_replace_commit(), if it fails in a spectacular enough way that no + * further reads or writes can succeed. This is a terminal state. + */ +enum ovsdb_log_state { +OVSDB_LOG_READ, /* Ready to read. */ +OVSDB_LOG_READ_ERROR, /* Read failed, see 'error' for details. */ +OVSDB_LOG_WRITE,/* Ready to write. */ +OVSDB_LOG_WRITE_ERROR, /* Write failed, see 'error' for details. */ +OVSDB_LOG_BROKEN, /* Disk on fire, see 'error' for details. */ }; struct ovsdb_log { +enum ovsdb_log_state state; +struct ovsdb_error *error; + off_t prev_offset; off_t offset; char *name; char *magic; struct lockfile *lockfile; FILE *stream; -struct ovsdb_error *read_error; -bool write_error; -enum ovsdb_log_mode mode; }; +/* Whether the OS supports renaming open files. + * + * (Making this a variable makes it easier to test both strategies on Unix-like + * systems.) */ +#ifdef _WIN32 +static bool rename_open_files = false; +#else +static bool rename_open_files = true; +#endif + /* Attempts to open 'name' with the specified 'open_mode'. On success, stores * the new log into '*filep' and returns NULL; otherwise returns NULL and * stores NULL into '*filep'. @@ -173,15 +207,14 @@ ovsdb_log_open(const char *name, const char *magic, } file = xmalloc(sizeof *file); +file->state = OVSDB_LOG_READ; +file->error = NULL; file->name = xstrdup(name); file->magic = xstrdup(magic); file->lockfile = lockfile; file->stream = stream; file->prev_offset = 0; file->offset = 0; -file->read_error = NULL; -file->write_error = false; -file->mode = OVSDB_LOG_READ; *filep = file; return NULL; @@ -197,10 +230,12 @@ void ovsdb_log_close(struct ovsdb_log *file) { if (file) { +ovsdb_error_destroy(file->error); free(file->name); -fclose(file->stream); +if (file->stream) { +fclose(file->stream); +} lockfile_unlock(file->lockfile); -ovsdb_error_destroy(file->read_error); free(file); } } @@ -283,6 +318,20 @@ parse_body(struct ovsdb_log *file, off_t offset, unsigned long int length, struct ovsdb_error * ovsdb_log_read(struct ovsdb_log *file, struct json **jsonp) { +*jsonp = NULL; +switch (file->state) { +case OVSDB_LOG_READ: +break; + +case OVSDB_LOG_READ_ERROR: +case OVSDB_LOG_WRITE_ERROR: +case OVSDB_LOG_BROKEN: +return ovsdb_error_clone(file->error); + +case OVSDB_LOG_WRITE: +return NULL; +} + uint8_t expected_sha1[SHA1_DIGEST_SIZE]; uint8_t actual_sha1[SHA1_DIGEST_SIZE]; struct ovsdb_error *error; @@ -291,20 +340,13 @@ ovsdb_log_read(struct ovsdb_log *file, struct json **jsonp) struct json *json; char header[128]; -*jsonp = json = NULL; - -if (file->read_error) { -return ovsdb_error_clone(file->read_error); -} else if (file->mode == OVSDB_LOG_WRITE) { -return NULL; -} +json = NULL; if (!fgets(header, sizeof header, file->stream)) {
[ovs-dev] [PATCH 05/13] log: Add new open mode OVSDB_LOG_CREATE_EXCL.
Until now, OVSDB_LOG_CREATE implied EXCL, but this commit breaks them apart. Signed-off-by: Ben Pfaff--- ovsdb/file.c | 2 +- ovsdb/log.c| 23 ++- ovsdb/log.h| 3 ++- ovsdb/ovsdb-tool.c | 2 +- tests/ovsdb-log.at | 7 ++- tests/test-ovsdb.c | 2 ++ 6 files changed, 30 insertions(+), 9 deletions(-) diff --git a/ovsdb/file.c b/ovsdb/file.c index 16461a75bfe5..fca07c3d722d 100644 --- a/ovsdb/file.c +++ b/ovsdb/file.c @@ -441,7 +441,7 @@ ovsdb_file_save_copy__(const char *file_name, int locking, struct json *json; error = ovsdb_log_open(file_name, OVSDB_MAGIC, - OVSDB_LOG_CREATE, locking, ); + OVSDB_LOG_CREATE_EXCL, locking, ); if (error) { return error; } diff --git a/ovsdb/log.c b/ovsdb/log.c index f3c6e22ae212..f95075034b3d 100644 --- a/ovsdb/log.c +++ b/ovsdb/log.c @@ -95,11 +95,16 @@ ovsdb_log_open(const char *name, const char *magic, lockfile = NULL; } -if (open_mode == OVSDB_LOG_READ_ONLY) { +switch (open_mode) { +case OVSDB_LOG_READ_ONLY: flags = O_RDONLY; -} else if (open_mode == OVSDB_LOG_READ_WRITE) { +break; + +case OVSDB_LOG_READ_WRITE: flags = O_RDWR; -} else if (open_mode == OVSDB_LOG_CREATE) { +break; + +case OVSDB_LOG_CREATE_EXCL: #ifndef _WIN32 if (stat(name, ) == -1 && errno == ENOENT && lstat(name, ) == 0 && S_ISLNK(s.st_mode)) { @@ -114,7 +119,13 @@ ovsdb_log_open(const char *name, const char *magic, #else flags = O_RDWR | O_CREAT | O_EXCL; #endif -} else { +break; + +case OVSDB_LOG_CREATE: +flags = O_RDWR | O_CREAT; +break; + +default: OVS_NOT_REACHED(); } #ifdef _WIN32 @@ -122,7 +133,9 @@ ovsdb_log_open(const char *name, const char *magic, #endif fd = open(name, flags, 0666); if (fd < 0) { -const char *op = open_mode == OVSDB_LOG_CREATE ? "create" : "open"; +const char *op = (open_mode == OVSDB_LOG_CREATE_EXCL ? "create" +: open_mode == OVSDB_LOG_CREATE ? "create or open" +: "open"); error = ovsdb_io_error(errno, "%s: %s failed", name, op); goto error_unlock; } diff --git a/ovsdb/log.h b/ovsdb/log.h index fd495e518dd0..08c6a7783094 100644 --- a/ovsdb/log.h +++ b/ovsdb/log.h @@ -26,7 +26,8 @@ struct ovsdb_log; enum ovsdb_log_open_mode { OVSDB_LOG_READ_ONLY,/* Open existing file, read-only. */ OVSDB_LOG_READ_WRITE, /* Open existing file, read/write. */ -OVSDB_LOG_CREATE/* Create new file, read/write. */ +OVSDB_LOG_CREATE_EXCL, /* Create new file, read/write. */ +OVSDB_LOG_CREATE/* Create or open file, read/write. */ }; #define OVSDB_MAGIC "OVSDB JSON" diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c index 45b3f7348c3d..da6b7142a4fa 100644 --- a/ovsdb/ovsdb-tool.c +++ b/ovsdb/ovsdb-tool.c @@ -218,7 +218,7 @@ do_create(struct ovs_cmdl_context *ctx) /* Create database file. */ check_ovsdb_error(ovsdb_log_open(db_file_name, OVSDB_MAGIC, - OVSDB_LOG_CREATE, -1, )); + OVSDB_LOG_CREATE_EXCL, -1, )); check_ovsdb_error(ovsdb_log_write(log, json)); check_ovsdb_error(ovsdb_log_commit(log)); ovsdb_log_close(log); diff --git a/tests/ovsdb-log.at b/tests/ovsdb-log.at index 29c0c5913c15..826e334efddd 100644 --- a/tests/ovsdb-log.at +++ b/tests/ovsdb-log.at @@ -46,9 +46,14 @@ AT_CHECK( file: read: {"x":1} ]], [ignore]) AT_CHECK( - [test-ovsdb log-io file create read], [1], + [test-ovsdb log-io file create-excl read], [1], [], [test-ovsdb: I/O error: file: create failed (File exists) ]) +AT_CHECK( + [test-ovsdb log-io file create read], [0], + [file: open successful +file: read: {"x":1} +]) AT_CHECK([test -f .file.~lock~]) AT_CLEANUP diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c index 4d6a894e3e83..e06ecad4f368 100644 --- a/tests/test-ovsdb.c +++ b/tests/test-ovsdb.c @@ -323,6 +323,8 @@ do_log_io(struct ovs_cmdl_context *ctx) mode = OVSDB_LOG_READ_WRITE; } else if (!strcmp(mode_string, "create")) { mode = OVSDB_LOG_CREATE; +} else if (!strcmp(mode_string, "create-excl")) { +mode = OVSDB_LOG_CREATE_EXCL; } else { ovs_fatal(0, "unknown log-io open mode \"%s\"", mode_string); } -- 2.10.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 06/13] log: Make reading in writing mode less of an error.
Clients are intended to use the OVSDB log code by reading zero or more records and then writing zero or more records, but not reading after any write has occurred. Until now, ovsdb_log_read() has signaled the latter behavior as an error. Upcoming changes to OVSDB are going to make it an expected behavior in some cases, so this commit changes it so that it just becomes an empty read. Signed-off-by: Ben Pfaff--- ovsdb/log.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovsdb/log.c b/ovsdb/log.c index f95075034b3d..6a1a4484ad34 100644 --- a/ovsdb/log.c +++ b/ovsdb/log.c @@ -296,7 +296,7 @@ ovsdb_log_read(struct ovsdb_log *file, struct json **jsonp) if (file->read_error) { return ovsdb_error_clone(file->read_error); } else if (file->mode == OVSDB_LOG_WRITE) { -return OVSDB_BUG("reading file in write mode"); +return NULL; } if (!fgets(header, sizeof header, file->stream)) { -- 2.10.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 04/13] log: Log write errors.
This saves all the callers from logging them separately. Signed-off-by: Ben Pfaff--- ovsdb/log.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/ovsdb/log.c b/ovsdb/log.c index 1a372d7b73ee..f3c6e22ae212 100644 --- a/ovsdb/log.c +++ b/ovsdb/log.c @@ -25,6 +25,7 @@ #include #include "openvswitch/json.h" +#include "openvswitch/vlog.h" #include "lockfile.h" #include "ovsdb.h" #include "ovsdb-error.h" @@ -33,6 +34,8 @@ #include "transaction.h" #include "util.h" +VLOG_DEFINE_THIS_MODULE(ovsdb_log); + enum ovsdb_log_mode { OVSDB_LOG_READ, OVSDB_LOG_WRITE @@ -411,6 +414,10 @@ ovsdb_log_write(struct ovsdb_log *file, const struct json *json) { error = ovsdb_io_error(errno, "%s: write failed", file->name); +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); +VLOG_WARN_RL(, "%s: write failed (%s)", + file->name, ovs_strerror(errno)); + /* Remove any partially written data, ignoring errors since there is * nothing further we can do. */ ignore(ftruncate(fileno(file->stream), file->offset)); -- 2.10.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 03/13] log: Make json parameter to ovsdb_log_write() const.
Signed-off-by: Ben Pfaff--- ovsdb/log.c | 4 +++- ovsdb/log.h | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/ovsdb/log.c b/ovsdb/log.c index d9fedd9ded6c..1a372d7b73ee 100644 --- a/ovsdb/log.c +++ b/ovsdb/log.c @@ -360,8 +360,10 @@ ovsdb_log_unread(struct ovsdb_log *file) file->offset = file->prev_offset; } +/* Writes log record 'json' to 'file'. Returns NULL if successful or an error + * (which the caller must eventually destroy) on failure. */ struct ovsdb_error * -ovsdb_log_write(struct ovsdb_log *file, struct json *json) +ovsdb_log_write(struct ovsdb_log *file, const struct json *json) { uint8_t sha1[SHA1_DIGEST_SIZE]; struct ovsdb_error *error; diff --git a/ovsdb/log.h b/ovsdb/log.h index fbcea1f2b991..fd495e518dd0 100644 --- a/ovsdb/log.h +++ b/ovsdb/log.h @@ -41,7 +41,7 @@ struct ovsdb_error *ovsdb_log_read(struct ovsdb_log *, struct json **) OVS_WARN_UNUSED_RESULT; void ovsdb_log_unread(struct ovsdb_log *); -struct ovsdb_error *ovsdb_log_write(struct ovsdb_log *, struct json *) +struct ovsdb_error *ovsdb_log_write(struct ovsdb_log *, const struct json *) OVS_WARN_UNUSED_RESULT; struct ovsdb_error *ovsdb_log_commit(struct ovsdb_log *) OVS_WARN_UNUSED_RESULT; -- 2.10.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 02/13] log: Require log entries to be JSON objects.
The current and upcoming users of the OVSDB logging module only use JSON objects as records. This commit simplifies the users slightly by allowing them to always assume that the records are JSON objects. Unfortunately this resulted in a large number of updates to tests, which didn't always use JSON objects. Signed-off-by: Ben Pfaff--- ovsdb/log.c| 17 + tests/ovsdb-log.at | 212 ++--- 2 files changed, 123 insertions(+), 106 deletions(-) diff --git a/ovsdb/log.c b/ovsdb/log.c index e6f66e668fe5..d9fedd9ded6c 100644 --- a/ovsdb/log.c +++ b/ovsdb/log.c @@ -254,6 +254,16 @@ parse_body(struct ovsdb_log *file, off_t offset, unsigned long int length, return NULL; } +/* Attempts to read a log record from 'file'. + * + * If successful, returns NULL and stores in '*jsonp' the JSON object that the + * record contains. The caller owns the data and must eventually free it (with + * json_destroy()). + * + * If a read error occurs, returns the error and stores NULL in '*jsonp'. + * + * If the read reaches end of file, returns NULL and stores NULL in + * '*jsonp'. */ struct ovsdb_error * ovsdb_log_read(struct ovsdb_log *file, struct json **jsonp) { @@ -315,6 +325,13 @@ ovsdb_log_read(struct ovsdb_log *file, struct json **jsonp) json->u.string); goto error; } +if (json->type != JSON_OBJECT) { +error = ovsdb_syntax_error(NULL, NULL, "%s: %lu bytes starting at " + "offset %lld are not a JSON object", + file->name, data_length, + (long long int) data_offset); +goto error; +} file->prev_offset = file->offset; file->offset = data_offset + data_length; diff --git a/tests/ovsdb-log.at b/tests/ovsdb-log.at index c8efaaec1a50..29c0c5913c15 100644 --- a/tests/ovsdb-log.at +++ b/tests/ovsdb-log.at @@ -19,14 +19,14 @@ AT_SETUP([write one, reread]) AT_KEYWORDS([ovsdb log]) AT_CAPTURE_FILE([file]) AT_CHECK( - [[test-ovsdb log-io file create 'write:[0]']], [0], + [[test-ovsdb log-io file create 'write:{"x":0}']], [0], [[file: open successful -file: write:[0] successful +file: write:{"x":0} successful ]], [ignore]) AT_CHECK( [test-ovsdb log-io file read-only read read], [0], [[file: open successful -file: read: [0] +file: read: {"x":0} file: read: end of log ]], [ignore]) AT_CHECK([test -f .file.~lock~]) @@ -36,14 +36,14 @@ AT_SETUP([check that create fails if file exists]) AT_KEYWORDS([ovsdb log]) AT_CAPTURE_FILE([file]) AT_CHECK( - [[test-ovsdb log-io file create 'write:[1]']], [0], + [[test-ovsdb log-io file create 'write:{"x":1}']], [0], [[file: open successful -file: write:[1] successful +file: write:{"x":1} successful ]], [ignore]) AT_CHECK( [test-ovsdb log-io file read-only read], [0], [[file: open successful -file: read: [1] +file: read: {"x":1} ]], [ignore]) AT_CHECK( [test-ovsdb log-io file create read], [1], @@ -56,18 +56,18 @@ AT_SETUP([write one, reread]) AT_KEYWORDS([ovsdb log]) AT_CAPTURE_FILE([file]) AT_CHECK( - [[test-ovsdb log-io file create 'write:[0]' 'write:[1]' 'write:[2]']], [0], + [[test-ovsdb log-io file create 'write:{"x":0}' 'write:{"x":1}' 'write:{"x":2}']], [0], [[file: open successful -file: write:[0] successful -file: write:[1] successful -file: write:[2] successful +file: write:{"x":0} successful +file: write:{"x":1} successful +file: write:{"x":2} successful ]], [ignore]) AT_CHECK( [test-ovsdb log-io file read-only read read read read], [0], [[file: open successful -file: read: [0] -file: read: [1] -file: read: [2] +file: read: {"x":0} +file: read: {"x":1} +file: read: {"x":2} file: read: end of log ]], [ignore]) AT_CHECK([test -f .file.~lock~]) @@ -79,18 +79,18 @@ AT_CAPTURE_FILE([file]) # Sometimes you just need more magic: # http://www.catb.org/jargon/html/magic-story.html AT_CHECK( - [[test-ovsdb --magic="MORE MAGIC" log-io file create 'write:[0]' 'write:[1]' 'write:[2]']], [0], + [[test-ovsdb --magic="MORE_MAGIC" log-io file create 'write:{"x":0}' 'write:{"x":1}' 'write:{"x":2}']], [0], [[file: open successful -file: write:[0] successful -file: write:[1] successful -file: write:[2] successful +file: write:{"x":0} successful +file: write:{"x":1} successful +file: write:{"x":2} successful ]], [ignore]) AT_CHECK( - [test-ovsdb --magic="MORE MAGIC" log-io file read-only read read read read], [0], + [test-ovsdb --magic="MORE_MAGIC" log-io file read-only read read read read], [0], [[file: open successful -file: read: [0] -file: read: [1] -file: read: [2] +file: read: {"x":0} +file: read: {"x":1} +file: read: {"x":2} file: read: end of log ]], [ignore]) AT_CHECK( @@ -104,27 +104,27 @@ AT_SETUP([write one, reread, append]) AT_KEYWORDS([ovsdb log]) AT_CAPTURE_FILE([file]) AT_CHECK( - [[test-ovsdb log-io file create 'write:[0]' 'write:[1]'
[ovs-dev] [PATCH 01/13] log: Allow client to specify magic.
Until now, the logging code in ovsdb has only supported a single file format, for OVSDB standalone database files. Upcoming commits will add support for another, incompatible format, which uses a different magic string for identification. This commit allows the logging code to support both formats. Signed-off-by: Ben Pfaff--- ovsdb/file.c | 5 +++-- ovsdb/log.c| 54 -- ovsdb/log.h| 5 - ovsdb/ovsdb-tool.c | 10 +- tests/ovsdb-log.at | 27 +++ tests/test-ovsdb.c | 17 ++--- 6 files changed, 93 insertions(+), 25 deletions(-) diff --git a/ovsdb/file.c b/ovsdb/file.c index 6a406da2a838..16461a75bfe5 100644 --- a/ovsdb/file.c +++ b/ovsdb/file.c @@ -129,7 +129,7 @@ ovsdb_file_open_log(const char *file_name, enum ovsdb_log_open_mode open_mode, ovs_assert(logp || schemap); -error = ovsdb_log_open(file_name, open_mode, -1, ); +error = ovsdb_log_open(file_name, OVSDB_MAGIC, open_mode, -1, ); if (error) { goto error; } @@ -440,7 +440,8 @@ ovsdb_file_save_copy__(const char *file_name, int locking, struct ovsdb_log *log; struct json *json; -error = ovsdb_log_open(file_name, OVSDB_LOG_CREATE, locking, ); +error = ovsdb_log_open(file_name, OVSDB_MAGIC, + OVSDB_LOG_CREATE, locking, ); if (error) { return error; } diff --git a/ovsdb/log.c b/ovsdb/log.c index 380f5e93d464..e6f66e668fe5 100644 --- a/ovsdb/log.c +++ b/ovsdb/log.c @@ -1,4 +1,4 @@ -/* Copyright (c) 2009, 2010, 2011, 2012, 2013 Nicira, Inc. +/* Copyright (c) 2009, 2010, 2011, 2012, 2013, 2017 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -42,6 +42,7 @@ struct ovsdb_log { off_t prev_offset; off_t offset; char *name; +char *magic; struct lockfile *lockfile; FILE *stream; struct ovsdb_error *read_error; @@ -53,12 +54,17 @@ struct ovsdb_log { * the new log into '*filep' and returns NULL; otherwise returns NULL and * stores NULL into '*filep'. * + * 'magic' is a short text string put at the beginning of every record and used + * to distinguish one kind of log file from another. For a conventional OVSDB + * log file, use OVSDB_MAGIC. + * * Whether the file will be locked using lockfile_lock() depends on 'locking': * use true to lock it, false not to lock it, or -1 to lock it only if * 'open_mode' is a mode that allows writing. */ struct ovsdb_error * -ovsdb_log_open(const char *name, enum ovsdb_log_open_mode open_mode, +ovsdb_log_open(const char *name, const char *magic, + enum ovsdb_log_open_mode open_mode, int locking, struct ovsdb_log **filep) { struct lockfile *lockfile; @@ -118,10 +124,30 @@ ovsdb_log_open(const char *name, enum ovsdb_log_open_mode open_mode, goto error_unlock; } -if (!fstat(fd, ) && s.st_size == 0) { -/* It's (probably) a new file so fsync() its parent directory to ensure - * that its directory entry is committed to disk. */ -fsync_parent_dir(name); +if (!fstat(fd, )) { +if (s.st_size == 0) { +/* It's (probably) a new file so fsync() its parent directory to + * ensure that its directory entry is committed to disk. */ +fsync_parent_dir(name); +} else if (s.st_size >= strlen(magic) && S_ISREG(s.st_mode)) { +/* Try to read the magic from the first log record. If it's not + * the magic we expect, this is the wrong kind of file, so reject + * it immediately. */ +size_t magic_len = strlen(magic); +char *buf = xzalloc(magic_len + 1); +bool err = (read(fd, buf, magic_len) == magic_len +&& strcmp(buf, magic)); +free(buf); +if (err) { +error = ovsdb_error(NULL, "%s: bad magic (unexpected " +"kind of file)", name); +goto error_close; +} +if (lseek(fd, 0, SEEK_SET)) { +error = ovsdb_io_error(errno, "%s: seek failed", name); +goto error_close; +} +} } stream = fdopen(fd, open_mode == OVSDB_LOG_READ_ONLY ? "rb" : "w+b"); @@ -132,6 +158,7 @@ ovsdb_log_open(const char *name, enum ovsdb_log_open_mode open_mode, file = xmalloc(sizeof *file); file->name = xstrdup(name); +file->magic = xstrdup(magic); file->lockfile = lockfile; file->stream = stream; file->prev_offset = 0; @@ -162,21 +189,20 @@ ovsdb_log_close(struct ovsdb_log *file) } } -static const char magic[] = "OVSDB JSON "; - static bool -parse_header(char *header, unsigned long int *length, +parse_header(const char *magic, char *header, unsigned long int
[ovs-dev] [PATCH 00/13] OVSDB log enhancements
These commits are prerequisites for the distributed database project currently underway. Ben Pfaff (13): log: Allow client to specify magic. log: Require log entries to be JSON objects. log: Make json parameter to ovsdb_log_write() const. log: Log write errors. log: Add new open mode OVSDB_LOG_CREATE_EXCL. log: Make reading in writing mode less of an error. log: New functions for replacing a log's contents. log: New function ovsdb_log_compose_record(). log: Support using /dev/stdin for opening logs read-only. log: Support multiple magic. log: Replace ovsdb_log_get_offset() by a more abstract mechanism. log: Use absolute name of log file. log: Add async commit support. ovsdb/file.c | 33 +-- ovsdb/log.c| 788 + ovsdb/log.h| 61 - ovsdb/ovsdb-tool.c | 12 +- tests/ovsdb-log.at | 304 ++--- tests/test-ovsdb.c | 75 - 6 files changed, 1018 insertions(+), 255 deletions(-) -- 2.10.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] datapath-windows: Correct endianness for deleting zone.
We can hold off on back-porting this. This change mainly affects the delete by 5-tuple API. https://github.com/openvswitch/ovs/commit/817a76577fec3f03310d7d3a5a10df01340ee8ad Unless we plan to backport that patch, we should hold off on backporting this as well. Thanks, Sairam On 12/6/17, 9:41 AM, "aserd...@ovn.org"wrote: >> -Original Message- >> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- >> boun...@openvswitch.org] On Behalf Of Justin Pettit >> Sent: Tuesday, December 5, 2017 9:30 AM >> To: d...@openvswitch.org; Alin Gabriel Serdean ; >> Sairam Venugopal >> Subject: [ovs-dev] [PATCH] datapath-windows: Correct endianness for >> deleting zone. >> >> The zone Netlink attribute is supposed to be in network-byte order, but >the >> Windows code for deleting conntrack entries was treating it as host-byte >> order. >> >> Found by inspection. >> >> Signed-off-by: Justin Pettit > >Thanks for the patch, Justin. I tested the patch, and everything looks >good. >I want to hold on the acknowledge until all of us are on the same page. > >This will make the windows datapath consistent with future patches from the >userspace. >For consistency I would like to backport the patch all the way to >branch-2.6. > >What do you think Sai? > >Thanks, >Alin. > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] datapath-windows: Correct endianness for deleting zone.
> On Dec 7, 2017, at 2:57 PM, Sairam Venugopalwrote: > > Thanks for the patch. Discussed about the patch offline. > > This is required since we have now introduced a newer api for deleting > 5-tuples > and this breaks the assumption of zone-id not being in the right byte order. > > Acked-by: Sairam Venugopal Thanks. I pushed this to master. --Justin ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] datapath-windows: Correct endianness for deleting zone.
Thanks for the patch. Discussed about the patch offline. This is required since we have now introduced a newer api for deleting 5-tuples and this breaks the assumption of zone-id not being in the right byte order. Acked-by: Sairam VenugopalOn 12/4/17, 11:30 PM, "Justin Pettit" wrote: >The zone Netlink attribute is supposed to be in network-byte order, but >the Windows code for deleting conntrack entries was treating it as >host-byte order. > >Found by inspection. > >Signed-off-by: Justin Pettit >--- > datapath-windows/ovsext/Conntrack.c | 2 +- > lib/netlink-conntrack.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > >diff --git a/datapath-windows/ovsext/Conntrack.c >b/datapath-windows/ovsext/Conntrack.c >index 3203411a8b7a..edc0ec9c5324 100644 >--- a/datapath-windows/ovsext/Conntrack.c >+++ b/datapath-windows/ovsext/Conntrack.c >@@ -1076,7 +1076,7 @@ OvsCtDeleteCmdHandler(POVS_USER_PARAMS_CONTEXT >usrParamsCtx, > } > > if (ctAttrs[CTA_ZONE]) { >-zone = NlAttrGetU16(ctAttrs[CTA_ZONE]); >+zone = ntohs(NlAttrGetU16(ctAttrs[CTA_ZONE])); > } > > status = OvsCtFlush(zone); >diff --git a/lib/netlink-conntrack.c b/lib/netlink-conntrack.c >index 1e1bb2f79d1d..3c651b6c856a 100644 >--- a/lib/netlink-conntrack.c >+++ b/lib/netlink-conntrack.c >@@ -251,7 +251,7 @@ nl_ct_flush_zone(uint16_t flush_zone) > > nl_msg_put_nfgenmsg(, 0, AF_UNSPEC, NFNL_SUBSYS_CTNETLINK, > IPCTNL_MSG_CT_DELETE, NLM_F_REQUEST); >-nl_msg_put_be16(, CTA_ZONE, flush_zone); >+nl_msg_put_be16(, CTA_ZONE, htons(flush_zone)); > > err = nl_transact(NETLINK_NETFILTER, , NULL); > ofpbuf_uninit(); >-- >2.7.4 > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ofproto: Keep inserting buckets into a group from changing group type.
On Thu, Dec 07, 2017 at 02:03:56PM -0800, Ben Pfaff wrote: > The "insert buckets" and "delete buckets" operations on a group should not > change the group's type or properties, but the implementation did this by > mistake. This fixes the problem. > > Reported-by: shivani dommeti> Reported-at: > https://mail.openvswitch.org/pipermail/ovs-discuss/2017-December/045830.html > Signed-off-by: Ben Pfaff This isn't well specified by OpenFlow so I filed EXT-570 in the ONF's JIRA as a request for clarification. (I expect no response.) ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] ofproto: Keep inserting buckets into a group from changing group type.
The "insert buckets" and "delete buckets" operations on a group should not change the group's type or properties, but the implementation did this by mistake. This fixes the problem. Reported-by: shivani dommetiReported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-December/045830.html Signed-off-by: Ben Pfaff --- AUTHORS.rst | 1 + ofproto/ofproto.c | 17 + tests/ofproto.at | 25 + 3 files changed, 43 insertions(+) diff --git a/AUTHORS.rst b/AUTHORS.rst index 075b2f72d47d..979cfd7bce41 100644 --- a/AUTHORS.rst +++ b/AUTHORS.rst @@ -589,6 +589,7 @@ likunyunkunyu...@hotmail.com meishengxin meisheng...@huawei.com neeraj mehtamehtaneera...@gmail.com rahim entezari rahim.entez...@gmail.com +shivani dommeti shivani.domm...@gmail.com weizj 34965...@qq.com 俊 赵 zhaoju...@outlook.com 冯全树(Crab)fqs...@126.com diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c index 82c2bb27d348..84eb18e901ed 100644 --- a/ofproto/ofproto.c +++ b/ofproto/ofproto.c @@ -6905,6 +6905,13 @@ handle_queue_get_config_request(struct ofconn *ofconn, return error; } +/* Allocates, initializes, and constructs a new group in 'ofproto', obtaining + * all the attributes for it from 'gm', and stores a pointer to it in + * '*ofgroup'. Makes the new group visible from the flow table starting from + * 'version'. + * + * Returns 0 if successful, otherwise an error code. If there is an error then + * '*ofgroup' is indeterminate upon return. */ static enum ofperr init_group(struct ofproto *ofproto, const struct ofputil_group_mod *gm, ovs_version_t version, struct ofgroup **ofgroup) @@ -7113,6 +7120,16 @@ modify_group_start(struct ofproto *ofproto, struct ofproto_group_mod *ogm) return OFPERR_OFPGMFC_UNKNOWN_GROUP; } +/* Inserting or deleting a bucket should not change the group's type or + * properties, so change the group mod so that these aspects match the old + * group. (See EXT-570.) */ +if (ogm->gm.command == OFPGC15_INSERT_BUCKET || +ogm->gm.command == OFPGC15_REMOVE_BUCKET) { +ogm->gm.type = old_group->type; +ofputil_group_properties_destroy(>gm.props); +ofputil_group_properties_copy(>gm.props, _group->props); +} + if (old_group->type != ogm->gm.type && (ofproto->n_groups[ogm->gm.type] >= ofproto->ogf.max_groups[ogm->gm.type])) { diff --git a/tests/ofproto.at b/tests/ofproto.at index c19a3d10daed..74a30cbefe3b 100644 --- a/tests/ofproto.at +++ b/tests/ofproto.at @@ -621,7 +621,32 @@ OFPST_GROUP_DESC reply (OF1.5): group_id=1234,type=all,bucket=bucket_id:0,actions=output:0,bucket=bucket_id:1,actions=output:1,bucket=bucket_id:10,actions=output:10,bucket=bucket_id:11,actions=output:11,bucket=bucket_id:12,actions=output:12,bucket=bucket_id:13,actions=output:13,bucket=bucket_id:14,actions=output:14,bucket=bucket_id:15,actions=output:15,bucket=bucket_id:20,actions=output:20,bucket=bucket_id:21,actions=output:21 ]) +# Delete groups. +AT_CHECK([ovs-ofctl -O OpenFlow15 del-groups br0]) + +# Add "fast_failover" group, then insert a bucket into it and make +# sure that the type of the group doesn't change. (There was a bug +# that caused this to happen.) +AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn add-group br0 group_id=1234,type=ff]) +AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn dump-groups br0], [0], [stdout]) +AT_CHECK([strip_xids < stdout], [0], [dnl +OFPST_GROUP_DESC reply (OF1.5): + group_id=1234,type=ff +]) +AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn insert-buckets br0 group_id=1234,command_bucket_id=first,bucket=bucket_id:20,actions=output:20,bucket=bucket_id:21,actions=output:21]) +AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn dump-groups br0], [0], [stdout]) +AT_CHECK([strip_xids < stdout], [0], [dnl +OFPST_GROUP_DESC reply (OF1.5): + group_id=1234,type=ff,bucket=bucket_id:20,actions=output:20,bucket=bucket_id:21,actions=output:21 +]) + # Negative tests. +AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn insert-buckets br0 group_id=1234,command_bucket_id=123,type=indirect,bucket=bucket_id:0,actions=output:0,bucket=bucket_id:1,actions=output:1], [1], [], + [ovs-ofctl: type is not needed +]) +AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn insert-buckets br0 group_id=1234,command_bucket_id=123,selection_method=dp_hash,type=indirect,bucket=bucket_id:0,actions=output:0,bucket=bucket_id:1,actions=output:1], [1], [], + [ovs-ofctl: selection method is not needed +]) AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn insert-buckets br0 group_id=1234,command_bucket_id=0xff01,bucket=bucket_id:0,actions=output:0,bucket=bucket_id:1,actions=output:1], [1], [], [ovs-ofctl: invalid command bucket id 4294967041 ]) -- 2.10.2 ___ dev
Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Change assertion to log message.
On 12/7/2017 1:01 PM, Ben Pfaff wrote: Until now, compose_output_action__() has asserted that a packet output to a patch port is not to be truncated. This commit changes this to an error that will be included in trace output, for two reasons. First, this sounds like only a minor problem to me which doesn't warrant killing the process. Second, it will be easier to track down the actual problem (if any) if we can get a trace instead of a segfault. Reported-by: Kevin LinReported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-December/045832.html Signed-off-by: Ben Pfaff --- ofproto/ofproto-dpif-xlate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index fcced344ed8a..60eaf18038ca 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -3701,7 +3701,9 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port, } if (xport->peer) { - ovs_assert(!truncate); + if (truncate) { + xlate_report_error(ctx, "Cannot truncate output to patch port"); + } patch_port_output(ctx, xport, xport->peer); return; } I agree that is a better approach. Reviewed-by: Greg Rose ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 0/2] Support conntrack flush by ct 5-tuple
> On Dec 7, 2017, at 10:40 AM, Yi-Hung Weiwrote: > > This patch series add support to flush a connection tracking entry > specified by a 5-tuple. Thanks for the patches. I pushed the series to the master branch. --Justin ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] Argumentación jurídica para las áreas contable y financiera
Conozca, interprete y aplique la ley a su favor Argumentación jurídica para las áreas contable y financiera 18 de Diciembre - L.C. y L.D. Jacobo Meraz Sotelo 9am-6pm Es de suma importancia que los empresarios y los encargados de las áreas contables y administrativas conozcan la interpretación y argumentación de las normas jurídicas tributarias a efecto de hacer valer sus derechos ante actos de autoridad hacendaria, para proteger con eficacia la libertad y el patrimonio de los contribuyentes, y el funcionamiento de sus negocios, sobre todo, con las modificaciones en tema de comprobantes fiscales vigentes a partir del 1 de diciembre. BENEFICIOS DE ASISTIR: - Conocerá el marco jurídico normativo, así como los derechos fundamentales. - Entenderá conceptos y métodos de interpretación de la ley fiscal. - Aprenderá a preparar argumentos los cuales aplicará mediante un caso práctico. - ¡Mucha más información valiosa! ¿Requiere la información a la Brevedad? responda este email con la palabra: Contable + nombre - teléfono - correo. centro telefónico:018002120744 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] ofproto-dpif-xlate: Change assertion to log message.
Until now, compose_output_action__() has asserted that a packet output to a patch port is not to be truncated. This commit changes this to an error that will be included in trace output, for two reasons. First, this sounds like only a minor problem to me which doesn't warrant killing the process. Second, it will be easier to track down the actual problem (if any) if we can get a trace instead of a segfault. Reported-by: Kevin LinReported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-December/045832.html Signed-off-by: Ben Pfaff --- ofproto/ofproto-dpif-xlate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index fcced344ed8a..60eaf18038ca 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -3701,7 +3701,9 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port, } if (xport->peer) { - ovs_assert(!truncate); + if (truncate) { + xlate_report_error(ctx, "Cannot truncate output to patch port"); + } patch_port_output(ctx, xport, xport->peer); return; } -- 2.10.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH branch-2.8 v1] OVN: Add external_ids to NAT and Logical_Router_Static_Route tables.
On Wed, Dec 6, 2017 at 4:59 AM,wrote: > From: Lucas Alvares Gomes > > The external_ids column is missing from the NAT and > Logical_Router_Static_Route tables. > > As discussed at [0] the change to the schema for this backport should > leave the version number unmodified. > > [0] > https://mail.openvswitch.org/pipermail/ovs-dev/2017-December/341630.html > > Signed-off-by: Lucas Alvares Gomes Thanks, Lucas. I have applied this backport to branch-2.8. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v6 1/7] dpif-netdev: Refactor PMD thread structure for further extension.
>On 07/12/17 14:28, Ilya Maximets wrote: >> Thanks for review, comments inline. >> >> On 07.12.2017 15:49, Eelco Chaudron wrote: >>> On 01/12/17 16:44, Ilya Maximets wrote: This is preparation for 'struct dp_netdev_pmd_thread' modification in upcoming commits. Needed to avoid reordering and regrouping while replacing old and adding new members. >>> Should this be part of the TX batching set? Anyway, I'm ok if it's >>> not stalling the approval :) >> Unfortunately yes, because members reordered and regrouped just to >> include new members: pmd->ctx and pmd->n_output_batches. This could >> not be a standalone change because adding of different members will >> require different regrouping/ reordering. I moved this change to a >> separate patch to not do this twice while adding each member in patches >2/7 and 6/7. >> >> Anyway, as I mentioned in cover letter, I still prefer reverting of >> the padding at all by this patch: >> >> https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html I understand that with PADDED_MEMBERS macro it was slightly tricky to extend or reorganize the structure and so suggested 'pahole'. But I see that the problem hasn't gone and still there are some strong opinions on reverting the earlier effort. I don’t mind reverting the patch but would be nice if the changes to this structure are made keeping alignment in mind. - Bhanuprakash. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH RFC 2/5] configure: Include -mprefetchwt1 explicitly.
>> >> If CPU just skips this instruction we will lost all the prefetching >> >> optimizations because all the calls will be replaced by non-existent >'prefetchwt1'. >> > >> > [Bhanu] I would be worried if core generates an exception treating >> > it as illegal instruction. Instead pipeline units treat this as NOP >> > if it >> doesn't support it. >> > So the micro optimizations doesn't really do any thing on the processors >that doesn't support it. >> >> This could be an issue. If someday we'll have real performance >> optimization based on OPCH_HTW prefetch, we will have prefetchwt1 on >> system that supports it and NOP on others even if they have usual >> prefetchw which could provide performance improvement too. [Bhanu] Adding the below information only for future reference, (going to point to this thread in the commit log) On systems that has *only* prefetchw and no prefetchwt1 instruction. OPCH_LTW- prefetchw OPCH_MTW - prefetchw OPCH_HTW -prefetchw OPCH_NTW -prefetchw On systems that supports both prefetchw and prefetchwt1, OPCH_LTW- prefetchwt1 OPCH_MTW - prefetchwt1 OPCH_HTW -prefetchw OPCH_NTW -prefetchwt1 So OPCH_HTW would always be prefetchw and LTW/MTW/HTW might turn in to NOPs on processors that support prefetchw alone. (when compiled with CFLAGS = -march=native -mprefetchwt1) >> >> As I understand, checking of '-mprefetchwt1' is equal to checking >> compiler version. It doesn't check anything about supporting of this >instruction in CPU. >> This could end up with non-working performance optimizations and even >> degradation on systems that supports usual prefetches but not >> prefetchwt1 (useless NOPs degrades performance if they are on a hot >path). >> >> IMHO, This compiler option should be passed only if CPU really supports it. >> I guess, the maximum that we can do is add a note into performance >> optimization guide that '-mprefetchwt1' could be passed via CFLAGS if >> user sure that it supported by target CPU. > >That is my thinking as well. The people/organizations building OVS packages >for deployment have the responsibility to specify the minimum requirements >on the target architecture and feed that into the compiler using CFLAGS. That >may well be leaning towards the lower end of capabilities to maximize >compatibility and sacrifice some performance on high-end CPUs. > >The specialized prefetch macros should be mapped to the best available >target instructions by the compiler and/or conditional compile directives >based on the CFLAGS architecture settings. > >We would gather all these target-specific compiler optimization guidelines in >the advanced DPDK documentation of OVS. > >Of course developers or benchmark testers are free to use -march=native or >similar at their discretion in their local test beds for best possible >performance. If the general view is get rid of this flag at compilation and only to document this, I am happy with this and can update the documentation. But I still think we are being too defensive here and with few NOPs performance impact isn't even noticeable. - Bhanuprakash. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 7/8] net: ovs: remove unused hardirq.h
On Fri, Nov 17, 2017 at 3:02 PM, Yang Shiwrote: > Preempt counter APIs have been split out, currently, hardirq.h just > includes irq_enter/exit APIs which are not used by openvswitch at all. > > So, remove the unused hardirq.h. > > Signed-off-by: Yang Shi > Cc: Pravin Shelar > Cc: "David S. Miller" > Cc: d...@openvswitch.org Acked-by: Pravin B Shelar ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 7/8] net: ovs: remove unused hardirq.h
Hi folks, Any comment on this one? Thanks, Yang On 11/17/17 5:48 PM, Yang Shi wrote: It looks the email address of Pravin in MAINTAINERS file is obsolete, sent to the right address. Yang On 11/17/17 3:02 PM, Yang Shi wrote: Preempt counter APIs have been split out, currently, hardirq.h just includes irq_enter/exit APIs which are not used by openvswitch at all. So, remove the unused hardirq.h. Signed-off-by: Yang ShiCc: Pravin Shelar Cc: "David S. Miller" Cc: d...@openvswitch.org --- net/openvswitch/vport-internal_dev.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c index 04a3128..2f47c65 100644 --- a/net/openvswitch/vport-internal_dev.c +++ b/net/openvswitch/vport-internal_dev.c @@ -16,7 +16,6 @@ * 02110-1301, USA */ -#include #include #include #include ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpctl: Fix comment describing get_one_dp().
> On Dec 7, 2017, at 8:21 AM, Ben Pfaffwrote: > >> On Wed, Dec 06, 2017 at 10:19:23PM -0800, Justin Pettit wrote: >> Signed-off-by: Justin Pettit >> --- >> lib/dpctl.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/lib/dpctl.c b/lib/dpctl.c >> index 06cfbc4abf21..5e064ec9c100 100644 >> --- a/lib/dpctl.c >> +++ b/lib/dpctl.c >> @@ -129,8 +129,8 @@ if_up(struct netdev *netdev) >> } >> >> /* Retrieve the name of the datapath if exactly one exists. The caller >> - * is responsible for freeing the returned string. If there is not one >> - * datapath, aborts with an error message. */ >> + * is responsible for freeing the returned string. If a single datapath >> + * name cannot be determined, returns NULL. */ >> static char * >> get_one_dp(struct dpctl_params *dpctl_p) > > Acked-by: Ben Pfaff Thanks. I pushed this to master. —Justin ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 2/2] dpctl: Support flush conntrack by conntrack 5-tuple
With this patch, "flush-conntrack" in ovs-dpctl and ovs-appctl accept a conntrack 5-tuple to delete the conntrack entry specified by the 5-tuple. For example, user can use the following command to flush a conntrack entry in zone 5. $ ovs-dpctl flush-conntrack zone=5 \ 'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1' $ ovs-appctl dpctl/flush-conntrack zone=5 \ 'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1' VMWare-BZ: #1983178 Signed-off-by: Yi-Hung Wei--- NEWS | 2 + lib/ct-dpif.c| 108 +++ lib/ct-dpif.h| 1 + lib/dpctl.c | 76 +-- lib/dpctl.man| 20 ++-- tests/system-kmod-macros.at | 8 +++ tests/system-traffic.at | 65 +++ tests/system-userspace-macros.at | 10 utilities/ovs-dpctl.c| 4 +- 9 files changed, 272 insertions(+), 22 deletions(-) diff --git a/NEWS b/NEWS index 427c8f83d8b2..188a0757a797 100644 --- a/NEWS +++ b/NEWS @@ -13,6 +13,8 @@ Post-v2.8.0 * ovn-ctl: New commands run_nb_ovsdb and run_sb_ovsdb. - Linux kernel 4.13 * Add support for compiling OVS with the latest Linux 4.13 kernel + - "flush-conntrack" in ovs-dpctl and ovs-appctl now accept a 5-tuple to + delete a specific connection tracking entry. v2.8.0 - 31 Aug 2017 diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c index cee4791565fb..239c848d3735 100644 --- a/lib/ct-dpif.c +++ b/lib/ct-dpif.c @@ -20,6 +20,7 @@ #include #include "ct-dpif.h" +#include "openvswitch/ofp-parse.h" #include "openvswitch/vlog.h" VLOG_DEFINE_THIS_MODULE(ct_dpif); @@ -435,3 +436,110 @@ ct_dpif_format_tcp_stat(struct ds * ds, int tcp_state, int conn_per_state) ds_put_cstr(ds, "]"); ds_put_format(ds, "=%u", conn_per_state); } + +/* Parses a specification of a conntrack 5-tuple from 's' into 'tuple'. + * Returns true on success. Otherwise, returns false and puts the error + * message in 'ds'. */ +bool +ct_dpif_parse_tuple(struct ct_dpif_tuple *tuple, const char *s, struct ds *ds) +{ +char *pos, *key, *value, *copy; +memset(tuple, 0, sizeof *tuple); + +pos = copy = xstrdup(s); +while (ofputil_parse_key_value(, , )) { +if (!*value) { +ds_put_format(ds, "field %s missing value", key); +goto error; +} + +if (!strcmp(key, "ct_nw_src") || !strcmp(key, "ct_nw_dst")) { +if (tuple->l3_type && tuple->l3_type != AF_INET) { +ds_put_cstr(ds, "L3 type set multiple times"); +goto error; +} else { +tuple->l3_type = AF_INET; +} +if (!ip_parse(value, key[6] == 's' ? >src.ip : + >dst.ip)) { +goto error_with_msg; +} +} else if (!strcmp(key, "ct_ipv6_src") || + !strcmp(key, "ct_ipv6_dst")) { +if (tuple->l3_type && tuple->l3_type != AF_INET6) { +ds_put_cstr(ds, "L3 type set multiple times"); +goto error; +} else { +tuple->l3_type = AF_INET6; +} +if (!ipv6_parse(value, key[8] == 's' ? >src.in6 : + >dst.in6)) { +goto error_with_msg; +} +} else if (!strcmp(key, "ct_nw_proto")) { +char *err = str_to_u8(value, key, >ip_proto); +if (err) { +free(err); +goto error_with_msg; +} +} else if (!strcmp(key, "ct_tp_src") || !strcmp(key,"ct_tp_dst")) { +uint16_t port; +char *err = str_to_u16(value, key, ); +if (err) { +free(err); +goto error_with_msg; +} +if (key[6] == 's') { +tuple->src_port = htons(port); +} else { +tuple->dst_port = htons(port); +} +} else if (!strcmp(key, "icmp_type") || !strcmp(key, "icmp_code") || + !strcmp(key, "icmp_id") ) { +if (tuple->ip_proto != IPPROTO_ICMP && +tuple->ip_proto != IPPROTO_ICMPV6) { +ds_put_cstr(ds, "invalid L4 fields"); +goto error; +} +uint16_t icmp_id; +char *err; +if (key[5] == 't') { +err = str_to_u8(value, key, >icmp_type); +} else if (key[5] == 'c') { +err = str_to_u8(value, key, >icmp_code); +} else { +err = str_to_u16(value, key, _id); +tuple->icmp_id = htons(icmp_id); +} +if (err) { +free(err); +goto error_with_msg; +}
[ovs-dev] [PATCH v3 1/2] ct-dpif, dpif-netlink: Support conntrack flush by ct 5-tuple
This patch adds support of flushing a conntrack entry specified by the conntrack 5-tuple, and provides the implementation in dpif-netlink. The implementation of dpif-netlink in the linux datapath utilizes the NFNL_SUBSYS_CTNETLINK netlink subsystem to delete a conntrack entry in nf_conntrack. Future patches will add support for the userspace and Windows datapaths. VMWare-BZ: #1983178 Signed-off-by: Yi-Hung Wei--- lib/ct-dpif.c | 24 +--- lib/ct-dpif.h | 3 +- lib/dpctl.c | 2 +- lib/dpif-netdev.c | 6 ++- lib/dpif-netlink.c | 7 +++- lib/dpif-provider.h | 16 ++-- lib/netlink-conntrack.c | 97 + lib/netlink-conntrack.h | 1 + ofproto/ofproto-dpif.c | 2 +- tests/ovs-ofctl.at | 2 +- 10 files changed, 144 insertions(+), 16 deletions(-) diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c index c79e69e23970..cee4791565fb 100644 --- a/lib/ct-dpif.c +++ b/lib/ct-dpif.c @@ -110,20 +110,32 @@ ct_dpif_dump_done(struct ct_dpif_dump_state *dump) : EOPNOTSUPP); } -/* Flush the entries in the connection tracker used by 'dpif'. +/* Flush the entries in the connection tracker used by 'dpif'. The + * arguments have the following behavior: * - * If 'zone' is not NULL, flush only the entries in '*zone'. */ + * - If both 'zone' and 'tuple' are NULL, flush all the conntrack entries. + * - If 'zone' is not NULL, and 'tuple' is NULL, flush all the conntrack + * entries in '*zone'. + * - If 'tuple' is not NULL, flush the conntrack entry specified by 'tuple' + * in '*zone'. If 'zone' is NULL, use the default zone (zone 0). */ int -ct_dpif_flush(struct dpif *dpif, const uint16_t *zone) +ct_dpif_flush(struct dpif *dpif, const uint16_t *zone, + const struct ct_dpif_tuple *tuple) { -if (zone) { -VLOG_DBG("%s: ct_flush: %"PRIu16, dpif_name(dpif), *zone); +if (tuple) { +struct ds ds = DS_EMPTY_INITIALIZER; +ct_dpif_format_tuple(, tuple); +VLOG_DBG("%s: ct_flush: %s in zone %d", dpif_name(dpif), ds_cstr(), +zone ? *zone : 0); +ds_destroy(); +} else if (zone) { +VLOG_DBG("%s: ct_flush: zone %"PRIu16, dpif_name(dpif), *zone); } else { VLOG_DBG("%s: ct_flush: ", dpif_name(dpif)); } return (dpif->dpif_class->ct_flush -? dpif->dpif_class->ct_flush(dpif, zone) +? dpif->dpif_class->ct_flush(dpif, zone, tuple) : EOPNOTSUPP); } diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h index d5f966175bc0..ef019050c78e 100644 --- a/lib/ct-dpif.h +++ b/lib/ct-dpif.h @@ -195,7 +195,8 @@ int ct_dpif_dump_start(struct dpif *, struct ct_dpif_dump_state **, const uint16_t *zone, int *); int ct_dpif_dump_next(struct ct_dpif_dump_state *, struct ct_dpif_entry *); int ct_dpif_dump_done(struct ct_dpif_dump_state *); -int ct_dpif_flush(struct dpif *, const uint16_t *zone); +int ct_dpif_flush(struct dpif *, const uint16_t *zone, + const struct ct_dpif_tuple *); void ct_dpif_entry_uninit(struct ct_dpif_entry *); void ct_dpif_format_entry(const struct ct_dpif_entry *, struct ds *, bool verbose, bool print_stats); diff --git a/lib/dpctl.c b/lib/dpctl.c index 6520788aa428..7fc0e3afab37 100644 --- a/lib/dpctl.c +++ b/lib/dpctl.c @@ -1353,7 +1353,7 @@ dpctl_flush_conntrack(int argc, const char *argv[], return error; } -error = ct_dpif_flush(dpif, pzone); +error = ct_dpif_flush(dpif, pzone, NULL); dpif_close(dpif); return error; diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 0a62630c2712..b1ef9a6a5992 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -5734,10 +5734,14 @@ dpif_netdev_ct_dump_done(struct dpif *dpif OVS_UNUSED, } static int -dpif_netdev_ct_flush(struct dpif *dpif, const uint16_t *zone) +dpif_netdev_ct_flush(struct dpif *dpif, const uint16_t *zone, + const struct ct_dpif_tuple *tuple) { struct dp_netdev *dp = get_dp_netdev(dpif); +if (tuple) { +return EOPNOTSUPP; +} return conntrack_flush(>conntrack, zone); } diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c index c265909f8587..3f76128999d1 100644 --- a/lib/dpif-netlink.c +++ b/lib/dpif-netlink.c @@ -2892,9 +2892,12 @@ dpif_netlink_ct_dump_done(struct dpif *dpif OVS_UNUSED, } static int -dpif_netlink_ct_flush(struct dpif *dpif OVS_UNUSED, const uint16_t *zone) +dpif_netlink_ct_flush(struct dpif *dpif OVS_UNUSED, const uint16_t *zone, + const struct ct_dpif_tuple *tuple) { -if (zone) { +if (tuple) { +return nl_ct_flush_tuple(tuple, zone ? *zone : 0); +} else if (zone) { return nl_ct_flush_zone(*zone); } else { return nl_ct_flush(); diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h index
[ovs-dev] [PATCH v3 0/2] Support conntrack flush by ct 5-tuple
This patch series add support to flush a connection tracking entry specified by a 5-tuple. v2 -> v3: * Rebase to master. * Fold in Justin's review comments. * Pull out dpif-netdev implementation. Will fix the conntack expectation issue, and submit a follow up patch. * Skip the ct flush by 5-tuple test on userspace datapath. v1 -> v2: * Incoporate suggestions from Justin. * Add dpif-netdev implementation. * Improve error reporting logic in dpctl.c. * Support ICMP 5-tuple. * Rebase to master. Yi-Hung Wei (2): ct-dpif,dpif-netlink: Support conntrack flush by ct 5-tuple dpctl: Support flush conntrack by conntrack 5-tuple NEWS | 2 + lib/ct-dpif.c| 132 +-- lib/ct-dpif.h| 4 +- lib/dpctl.c | 76 +- lib/dpctl.man| 20 -- lib/dpif-netdev.c| 6 +- lib/dpif-netlink.c | 7 ++- lib/dpif-provider.h | 16 - lib/netlink-conntrack.c | 97 lib/netlink-conntrack.h | 1 + ofproto/ofproto-dpif.c | 2 +- tests/ovs-ofctl.at | 2 +- tests/system-kmod-macros.at | 8 +++ tests/system-traffic.at | 65 +++ tests/system-userspace-macros.at | 10 +++ utilities/ovs-dpctl.c| 4 +- 16 files changed, 415 insertions(+), 37 deletions(-) -- 2.7.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 3/3] dpctl: Support flush conntrack by conntrack 5-tuple
> I don't think the parentheses are necessary around "ct-tuple". > > I've attached an incremental patch to this message. Can you take a look? If > you agree with the changes, I'll commit this patch and the first from this > series. > > Thanks, > > --Justin Thanks for the review. I think all the comments are valuable, I fold them in v3. As you mentioned in the review comment on the second patch, I will pull out the second patch first, and re-submit it separately once I fix the conntrack expectation issue. Moreover, since the userspace datapath does not support flush by ct-tuple for now, I add a macro to skip the system traffic test on the userspace datapath. The diff after fold in your comments are as following. Thanks, -Yi-Hung === diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at index a7c6808ad1b4..34db21a564ae 100644 --- a/tests/system-kmod-macros.at +++ b/tests/system-kmod-macros.at @@ -96,3 +96,11 @@ m4_define([CHECK_CONNTRACK_LOCAL_STACK]) # always supports NAT, so no check is needed. # m4_define([CHECK_CONNTRACK_NAT]) + +# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE() +# +# Perform requirements checks for running ovs-dpctl flush-conntrack by +# conntrack 5-tuple test. The kernel datapath does support this +# feature. Will remove this check after both kernel and userspace datapath +# support it. +m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE]) diff --git a/tests/system-traffic.at b/tests/system-traffic.at index 4c638118d234..56aae69538cf 100644 --- a/tests/system-traffic.at +++ b/tests/system-traffic.at @@ -834,6 +834,7 @@ AT_CLEANUP AT_SETUP([conntrack - ct flush by 5-tuple]) CHECK_CONNTRACK() +CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE() OVS_TRAFFIC_VSWITCHD_START() ADD_NAMESPACES(at_ns0, at_ns1) diff --git a/tests/system-userspace-macros.at b/tests/system-userspace-macros.at index d3d27bb2b8f2..f22061298985 100644 --- a/tests/system-userspace-macros.at +++ b/tests/system-userspace-macros.at @@ -99,3 +99,13 @@ m4_define([CHECK_CONNTRACK_LOCAL_STACK], # datapath supports NAT. # m4_define([CHECK_CONNTRACK_NAT]) + +# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE() +# +# Perform requirements checks for running ovs-dpctl flush-conntrack by +# conntrack 5-tuple test. The userspace datapath does not support +# this feature yet. +m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE], +[ +AT_SKIP_IF([:]) +]) === ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 2/2] Adding configuration option to whitelist DPDK physical ports.
> -Original Message- > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- > boun...@openvswitch.org] On Behalf Of Chandran, Sugesh > Sent: Thursday, December 7, 2017 5:07 PM > To: O Mahony, Billy; d...@openvswitch.org; > b...@ovn.org > Subject: Re: [ovs-dev] [PATCH 2/2] Adding configuration option to > whitelist DPDK physical ports. > > > > Regards > _Sugesh > > > -Original Message- > > From: O Mahony, Billy > > Sent: Thursday, December 7, 2017 11:47 AM > > To: Chandran, Sugesh ; > d...@openvswitch.org; > > b...@ovn.org > > Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to > > whitelist DPDK physical ports. > > > > Hi Sugesh, > > > > > -Original Message- > > > From: Chandran, Sugesh > > > Sent: Wednesday, December 6, 2017 6:23 PM > > > To: O Mahony, Billy ; > d...@openvswitch.org; > > > b...@ovn.org > > > Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to > > > whitelist DPDK physical ports. > > > > > > Thank you Billy for the review. > > > Please find below my reply. > > > > > > Regards > > > _Sugesh > > > > > > > > > > -Original Message- > > > > From: O Mahony, Billy > > > > Sent: Wednesday, December 6, 2017 5:31 PM > > > > To: Chandran, Sugesh ; > > > > d...@openvswitch.org; b...@ovn.org > > > > Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to > > > > whitelist DPDK physical ports. > > > > > > > > Hi Sugesh, > > > > > > > > This is definitely a very useful feature. I'm looking forward to > > > > running trex on the same DUT as my ovs-dpdk. [Mooney, Sean K] you can all ready to this you just need to set the whitelist In other_config:dpdk-extra just repeat "-w $address" for each device. To have two dpdk primary processes on the same system you will also need to change The hugepage prfix used be dpdk which you can also do via the dpdk-extra option. After this patch we will still be able to specify the whitelist using other_config:dpdk-extra correct? If not this may break ovs-dpdk support in openstack installers. I ported our whitelist code in networking-ovs-dpdk to use dpdk-extra when when we moved the dpdk params to the db and I also added it to kolla. im pretty sure tripple0 and fule also do the same. > > > > > > > > However I'd suggest adding an sscanf or some such to verify that > > > > the domain is also specified for each whitelist member. And > either > > > > add the default of '' or complain loudly if the domain is > absent. > > > [Sugesh] Will throw an error in that case then . > > > > > > > > > > > Currently (without this patch) you must specify the domain when > > > > adding > > ports: > > > >Vsctl add-port ... options:dpdk-devargs=:05:00.0 Or else > an > > > > error such as 'Cannot find unplugged device (05:00.0)' is > reported. > > > > > > > > And with the patch if you include the domain in the other_config > (e.g. > > > > other_config:dpdk-whitelist-pci-ids=":05:00.0") everything > > > > works just as before. > > > > > > > > However with the patch if you add the whitelist *without* a > domain e.g. > > > > ovs-vsctl --no-wait set Open_vSwitch . > > > > other_config:dpdk-whitelist-pci- ids="05:00.0" > > > > > > > > There is no immediate error. However later when doing add-port if > > > > you include the domain (current required practice) you will get > an error. > > > > If you omit the domain all is well. > > > [Sugesh] It looks to me, the dpdk-devargs need the PCI id with the > ''. > > > But to bind and PCI scan its not necessary. > > > So to keep it consistent, I would add check for PCI-ID in whitelist > > > config too, and throw error incase pci-id are mentioned wrong(means > > > without > > ''. [Mooney, Sean K] don't assume it is '' it is only '' if the pci device is A child of the first pci root usally connected to socket 0 unless you server is old Enough to still use a FSB instead of qpi/dmi. > > > Does it looks OK to you? > > > > [[BO'M]] I think the error is the right thing to do. It would be > > tempting to insert the default '' if the domain is omitted but > > then you would have a confusing inconsistency in that it would be ok > > to omit the domain in one place (whitelist) but not in the other > (add-port). > > > [Sugesh] Thank you Billy,Will add the check and error out accordingly. > Will release the next patch version by incorporating the comments, once > you have complete the review of second part of the patch too. [Mooney, Sean K] it should be noted that the dpdk devbing script will only work without The domain if its unambiguious. On a multi socket system with the right or wrong depending on your perspective placement of physical pci devise it possible to get colltions in which case dpdk-devbind will not be able to correctly determing what device to manage so you should always use the full pci address. > > > > > > > > It's a
[ovs-dev] [PATCH V5 2/2] netdev-dpdk: vHost IOMMU support
DPDK v17.11 introduces support for the vHost IOMMU feature. This is a security feature, which restricts the vhost memory that a virtio device may access. This feature also enables the vhost REPLY_ACK protocol, the implementation of which is known to work in newer versions of QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the feature is disabled by default in (and should remain so), for the aforementioned older QEMU verions. Starting with QEMU v2.9.1, vhost-iommu-support can safely be enabled, even without having an IOMMU device, with no performance penalty. This patch adds a new global config option, vhost-iommu-support, that controls enablement of the vhost IOMMU feature: ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true This value defaults to false; to enable IOMMU support, this field should be set to true when setting other global parameters on init (such as "dpdk-socket-mem", for example). Changing the value at runtime is not supported, and requires restarting the vswitch daemon. Signed-off-by: Mark Kavanagh--- Documentation/topics/dpdk/vhost-user.rst | 28 NEWS | 1 + lib/dpdk-stub.c | 6 ++ lib/dpdk.c | 12 lib/dpdk.h | 3 +++ lib/netdev-dpdk.c| 14 ++ vswitchd/vswitch.xml | 15 +++ 7 files changed, 75 insertions(+), 4 deletions(-) diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst index 1a5de24..8b1c671 100644 --- a/Documentation/topics/dpdk/vhost-user.rst +++ b/Documentation/topics/dpdk/vhost-user.rst @@ -273,6 +273,34 @@ One benefit of using this mode is the ability for vHost ports to 'reconnect' in event of the switch crashing or being brought down. Once it is brought back up, the vHost ports will reconnect automatically and normal service will resume. +vhost-user-client IOMMU Support +~~~ + +vhost IOMMU is a feature which restricts the vhost memory that a virtio device +can access, and as such is useful in deployments in which security is a +concern. + +IOMMU support may be enabled via a global config value, +```vhost-iommu-support```. Setting this to true enables vhost IOMMU support for +all vhost ports when/where available:: + +$ ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true + +The default value is false. + +.. important:: + +Changing this value requires restarting the daemon. + +.. important:: + +Enabling the IOMMU feature also enables the vhost user reply-ack protocol; +this is known to work on QEMU v2.10.0, but is buggy on older versions +(2.7.0 - 2.9.0, inclusive). Consequently, the IOMMU feaure is disabled by +default (and should remain so if using the aforementioned versions of +QEMU). Starting with QEMU v2.9.1, vhost-iommu-support can safely be +enabled, even without having an IOMMU device, with no performance penalty. + .. _dpdk-testpmd: DPDK in the Guest diff --git a/NEWS b/NEWS index d4a1c9a..99c47d8 100644 --- a/NEWS +++ b/NEWS @@ -15,6 +15,7 @@ Post-v2.8.0 * Add support for compiling OVS with the latest Linux 4.13 kernel - DPDK: * Add support for DPDK v17.11 + * Add support for vHost IOMMU v2.8.0 - 31 Aug 2017 diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c index daef729..3602180 100644 --- a/lib/dpdk-stub.c +++ b/lib/dpdk-stub.c @@ -48,3 +48,9 @@ dpdk_get_vhost_sock_dir(void) { return NULL; } + +bool +dpdk_vhost_iommu_enabled(void) +{ +return false; +} diff --git a/lib/dpdk.c b/lib/dpdk.c index 8da6c32..6710d10 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -41,6 +41,7 @@ VLOG_DEFINE_THIS_MODULE(dpdk); static FILE *log_stream = NULL; /* Stream for DPDK log redirection */ static char *vhost_sock_dir = NULL; /* Location of vhost-user sockets */ +static bool vhost_iommu_enabled = false; /* Status of vHost IOMMU support */ static int process_vhost_flags(char *flag, const char *default_val, int size, @@ -345,6 +346,11 @@ dpdk_init__(const struct smap *ovs_other_config) vhost_sock_dir = sock_dir_subcomponent; } +vhost_iommu_enabled = smap_get_bool(ovs_other_config, +"vhost-iommu-support", false); +VLOG_INFO("IOMMU support for vhost-user-client %s.", + vhost_iommu_enabled ? "enabled" : "disabled"); + argv = grow_argv(, 0, 1); argc = 1; argv[0] = xstrdup(ovs_get_program_name()); @@ -482,6 +488,12 @@ dpdk_get_vhost_sock_dir(void) return vhost_sock_dir; } +bool +dpdk_vhost_iommu_enabled(void) +{ +return vhost_iommu_enabled; +} + void dpdk_set_lcore_id(unsigned cpu) { diff --git a/lib/dpdk.h b/lib/dpdk.h index 673a1f1..dc58d96 100644 --- a/lib/dpdk.h
[ovs-dev] [PATCH V5 1/2] netdev-dpdk: DPDK v17.11 upgrade
This commit adds support for DPDK v17.11: - minor updates to accomodate DPDK API changes - update references to DPDK version in Documentation - update DPDK version in travis' linux-build script - document DPDK v17.11 virtio driver bug Signed-off-by: Mark KavanaghAcked-by: Maxime Coquelin Acked-by: Ciara Loftus Acked-by: Jan Scheurich Tested-by: Jan Scheurich Tested-by: Guoshuai Li --- .travis/linux-build.sh | 2 +- Documentation/faq/releases.rst | 1 + Documentation/intro/install/dpdk.rst | 24 +++- Documentation/topics/dpdk/ring.rst | 2 +- Documentation/topics/dpdk/vhost-user.rst | 5 + NEWS | 2 ++ lib/netdev-dpdk.c| 5 +++-- 7 files changed, 32 insertions(+), 9 deletions(-) diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh index 4d6459f..ed28ee4 100755 --- a/.travis/linux-build.sh +++ b/.travis/linux-build.sh @@ -81,7 +81,7 @@ fi if [ "$DPDK" ]; then if [ -z "$DPDK_VER" ]; then -DPDK_VER="17.05.2" +DPDK_VER="17.11" fi install_dpdk $DPDK_VER if [ "$CC" = "clang" ]; then diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst index d903b06..62a1957 100644 --- a/Documentation/faq/releases.rst +++ b/Documentation/faq/releases.rst @@ -164,6 +164,7 @@ Q: What DPDK version does each Open vSwitch release work with? 2.6.x16.07.2 2.7.x16.11.3 2.8.x17.05.2 +2.9.x17.11 === Q: I get an error like this when I configure Open vSwitch: diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst index bb69ae5..8c036dc 100644 --- a/Documentation/intro/install/dpdk.rst +++ b/Documentation/intro/install/dpdk.rst @@ -40,7 +40,7 @@ Build requirements In addition to the requirements described in :doc:`general`, building Open vSwitch with DPDK will require the following: -- DPDK 17.05.2 +- DPDK 17.11 - A `DPDK supported NIC`_ @@ -69,9 +69,9 @@ Install DPDK #. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``:: $ cd /usr/src/ - $ wget http://fast.dpdk.org/rel/dpdk-17.05.2.tar.xz - $ tar xf dpdk-17.05.2.tar.xz - $ export DPDK_DIR=/usr/src/dpdk-stable-17.05.2 + $ wget http://fast.dpdk.org/rel/dpdk-17.11.tar.xz + $ tar xf dpdk-17.11.tar.xz + $ export DPDK_DIR=/usr/src/dpdk-17.11 $ cd $DPDK_DIR #. (Optional) Configure DPDK as a shared library @@ -583,7 +583,21 @@ Limitations The latest list of validated firmware versions can be found in the `DPDK release notes`_. -.. _DPDK release notes: http://dpdk.org/doc/guides/rel_notes/release_17_05.html +.. _DPDK release notes: http://dpdk.org/doc/guides/rel_notes/release_17_11.html + +- The DPDK v17.11 virtio net driver contains a bug that prevents guest DPDK + applications from receiving packets. This only occurs when guest-bound traffic + is live before a DPDK application is started within the guest, and where two + or more forwarding cores are used. As such, it is not recommended for guests + which execute DPDK applications to upgrade to DPDK v17.11. + + Note that this issue does not occur when guest network devices are controlled + by the guest kernel. + + Details regarding the virtio driver bug are available on the `DPDK mailing + list`_. + +.. _DPDK mailing list: http://dpdk.org/ml/archives/dev/2017-December/082801.html Reporting Bugs -- diff --git a/Documentation/topics/dpdk/ring.rst b/Documentation/topics/dpdk/ring.rst index ad9d7a5..8d0ede8 100644 --- a/Documentation/topics/dpdk/ring.rst +++ b/Documentation/topics/dpdk/ring.rst @@ -77,4 +77,4 @@ DPDK. However, this functionality was removed because: - :doc:`vhost-user interfaces ` are the defacto DPDK-based path to guests -.. _DPDK documentation: https://dpdk.readthedocs.io/en/v17.05/prog_guide/ring_lib.html +.. _DPDK documentation: https://dpdk.readthedocs.io/en/v17.11/prog_guide/ring_lib.html diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst index 74ac06e..1a5de24 100644 --- a/Documentation/topics/dpdk/vhost-user.rst +++ b/Documentation/topics/dpdk/vhost-user.rst @@ -287,6 +287,11 @@ application in the VM. Support for DPDK in the guest requires QEMU >= 2.2 +.. important:: + + DPDK v17.11 virtio PMD contains a bug that affects testpmd/DPDK guest + applications. As such, guests should not upgrade to 17.11. + To begin, instantiate a guest as described in :ref:`dpdk-vhost-user` or :ref:`dpdk-vhost-user-client`. Once started, connect to the VM, download the DPDK sources to VM and build DPDK:: diff --git a/NEWS b/NEWS index 427c8f8..d4a1c9a 100644 --- a/NEWS +++ b/NEWS @@ -13,6 +13,8 @@
[ovs-dev] [PATCH V5 0/2] DPDK v17.11 Support
This patchset adds support for DPDK v17.11: - the first patch introduces minor code updates to accomodate DPDK API changes, and also updates Documentation and travis scripts. - the second patch adds a new global configuration option, vhost-iommu-support; this is required in order to take advantage of the vHost IOMMU feature introduced in DPDK v17.11. --- v5: - document DPDK v17.11 virtio PMD bug in guest - minor typographical corrections in documentation v4: - rebase to HEAD of master - clarify that IOMMU support applies exclusively to vhost user client ports. - reword caveats regarding modifying vhost-iommu-support at runtime - use smap_get_bool() to parse vhost-iommu-support, instead of process_vhost_flags(). - add stub implementation of dpdk_vhost_iommu_enabled(). - move stdbool.h #include outside of DPDK compiler guards. - remove mention of vhost IOMMU mode from netdev_dpdk_vhost_client_configure() log. v3: - erroneous; disregard. v2: - refactor vHost IOMMU enablement mechanism (use a global config value, instead of the previous per-port approach). Mark Kavanagh (2): netdev-dpdk: DPDK v17.11 upgrade netdev-dpdk: vHost IOMMU support .travis/linux-build.sh | 2 +- Documentation/faq/releases.rst | 1 + Documentation/intro/install/dpdk.rst | 24 ++- Documentation/topics/dpdk/ring.rst | 2 +- Documentation/topics/dpdk/vhost-user.rst | 33 NEWS | 3 +++ lib/dpdk-stub.c | 6 ++ lib/dpdk.c | 12 lib/dpdk.h | 3 +++ lib/netdev-dpdk.c| 19 -- vswitchd/vswitch.xml | 15 +++ 11 files changed, 107 insertions(+), 13 deletions(-) -- 1.9.3 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH V2 3/6] datapath: conntrack: make protocol tracker pointers const
Upstream commit: commit b3480fe059ac9121b5714205b4ddae14b59ef4be Author: Florian WestphalDate: Sat Aug 12 00:57:08 2017 +0200 netfilter: conntrack: make protocol tracker pointers const Doesn't change generated code, but will make it easier to eventually make the actual trackers themselvers const. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Rose --- datapath/conntrack.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/datapath/conntrack.c b/datapath/conntrack.c index d517a87..c97e3e4 100644 --- a/datapath/conntrack.c +++ b/datapath/conntrack.c @@ -613,8 +613,8 @@ static struct nf_conn * ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone, u8 l3num, struct sk_buff *skb, bool natted) { - struct nf_conntrack_l3proto *l3proto; - struct nf_conntrack_l4proto *l4proto; + const struct nf_conntrack_l3proto *l3proto; + const struct nf_conntrack_l4proto *l4proto; struct nf_conntrack_tuple tuple; struct nf_conntrack_tuple_hash *h; struct nf_conn *ct; -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 2/2] Adding configuration option to whitelist DPDK physical ports.
Regards _Sugesh > -Original Message- > From: O Mahony, Billy > Sent: Thursday, December 7, 2017 11:47 AM > To: Chandran, Sugesh; d...@openvswitch.org; > b...@ovn.org > Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to whitelist > DPDK physical ports. > > Hi Sugesh, > > > -Original Message- > > From: Chandran, Sugesh > > Sent: Wednesday, December 6, 2017 6:23 PM > > To: O Mahony, Billy ; d...@openvswitch.org; > > b...@ovn.org > > Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to > > whitelist DPDK physical ports. > > > > Thank you Billy for the review. > > Please find below my reply. > > > > Regards > > _Sugesh > > > > > > > -Original Message- > > > From: O Mahony, Billy > > > Sent: Wednesday, December 6, 2017 5:31 PM > > > To: Chandran, Sugesh ; > > > d...@openvswitch.org; b...@ovn.org > > > Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to > > > whitelist DPDK physical ports. > > > > > > Hi Sugesh, > > > > > > This is definitely a very useful feature. I'm looking forward to > > > running trex on the same DUT as my ovs-dpdk. > > > > > > However I'd suggest adding an sscanf or some such to verify that the > > > domain is also specified for each whitelist member. And either add > > > the default of '' or complain loudly if the domain is absent. > > [Sugesh] Will throw an error in that case then . > > > > > > > > Currently (without this patch) you must specify the domain when adding > ports: > > >Vsctl add-port ... options:dpdk-devargs=:05:00.0 Or else an > > > error such as 'Cannot find unplugged device (05:00.0)' is reported. > > > > > > And with the patch if you include the domain in the other_config (e.g. > > > other_config:dpdk-whitelist-pci-ids=":05:00.0") everything works > > > just as before. > > > > > > However with the patch if you add the whitelist *without* a domain e.g. > > > ovs-vsctl --no-wait set Open_vSwitch . > > > other_config:dpdk-whitelist-pci- ids="05:00.0" > > > > > > There is no immediate error. However later when doing add-port if > > > you include the domain (current required practice) you will get an error. > > > If you omit the domain all is well. > > [Sugesh] It looks to me, the dpdk-devargs need the PCI id with the ''. > > But to bind and PCI scan its not necessary. > > So to keep it consistent, I would add check for PCI-ID in whitelist > > config too, and throw error incase pci-id are mentioned wrong(means without > ''. > > Does it looks OK to you? > > [[BO'M]] I think the error is the right thing to do. It would be tempting to > insert > the default '' if the domain is omitted but then you would have a > confusing > inconsistency in that it would be ok to omit the domain in one place > (whitelist) > but not in the other (add-port). > [Sugesh] Thank you Billy,Will add the check and error out accordingly. Will release the next patch version by incorporating the comments, once you have complete the review of second part of the patch too. > > > > > > It's a little bit strange as regardless of domain or no domain in > > > the other_config the PCI probe always reports the NIC as expected: > > > 2017-12-06T16:55:27Z|00013|dpdk|INFO|EAL: PCI device > > > :05:00.0 on NUMA socket -1 > > > 2017-12-06T16:55:27Z|00014|dpdk|WARN|EAL: Invalid NUMA socket, > > > default to 0 > > > 2017-12-06T16:55:27Z|00015|dpdk|INFO|EAL: probe driver: 8086:1572 > > > net_i40e > > > > > > I'll be using the other patch in this series "isolate rte-mempool > > > allocation" over the next few days so I'll review that in due course. > > > > > > Thanks, > > > Billy. > > > > > > > -Original Message- > > > > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- > > > > boun...@openvswitch.org] On Behalf Of Sugesh Chandran > > > > Sent: Friday, November 10, 2017 1:29 AM > > > > To: d...@openvswitch.org; b...@ovn.org > > > > Subject: [ovs-dev] [PATCH 2/2] Adding configuration option to > > > > whitelist DPDK physical ports. > > > > > > > > Adding a OVS configuration option to whitelist DPDK physical ports. > > > > By default running multiple instances of DPDK on a single platform > > > > cannot use physical ports at the same time even though they are > > > > distinct. > > > > > > > > The eal init scans all the ports that are bound to DPDK and > > > > initialize the drivers accordingly. This happens for every DPDK process > > > > init. > > > > On a multi instance deployment usecase, it causes issues for using > > > > physical NIC ports. > > > > Consider a two DPDK process that are running on a single platform, > > > > the second DPDK primary process will try to initialize the drivers > > > > for all the physical ports even though it may be used in first DPDK > > > > process. > > > > > > > > To avoid this situation user can whitelist the ports for each DPDK > application. > > > > Whitelisting of
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
> -Original Message- > From: Jan Scheurich [mailto:jan.scheur...@ericsson.com] > Sent: Thursday, December 7, 2017 4:07 PM > To: Mooney, Sean K; Kevin Traynor > ; Kavanagh, Mark B ; > Ilya Maximets ; Stokes, Ian > ; d...@openvswitch.org > Cc: maxime.coque...@redhat.com; Fischetti, Antonio > ; Bie, Tiwei ; > Mcnamara, John ; Guoshuai Li > ; Loftus, Ciara ; Yuanhan Liu > > Subject: RE: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support > > > > Would the virtio PMD bug in DPDK 17.11 in the guest actually be > > > mitigated by running DPDK 17.05 or a fixed 17.11.1 as vhostuser > > > backend inside OVS on the host? > > > [Mooney, Sean K] from talking to mark about this eairlier I don’t > belive so. > > I think if you used 17.11 testpmd in the guest with kernel ovs you > > should get the same behavior e.g. it does not forward packet. The > > guest should not be able to know with certainty what vhost backend is > in use on the host. > > If that is the case why should OVS even consider documenting that there > are known problems when running a DPDK 17.11 application in a guest, > irrespective of the OVS version or datapath? > > If anything, this is a critical DPDK bug and DPDK need to warn their > virtio PMD users not to use 17.11 until this bug is fixed in a 17.11.1 > maintenance release. > > The real issue I see is that OVS-DPDK may also be deployed in in a VM > with virtio ports, and that kind of deployment will indeed be broken > with DPDK 17.11. Can we accept and document that as a limitation in OVS > 2.9 that will be fixed in OVS 2.9.1 as soon as DPDK 17.11.1 is out? [Mooney, Sean K] that is a concern for me also. I run ovs-dpdk in a vm on a daily Basis but I typical configure the vm with e1000 virtual nics but that then requires Me to use kernel ovs on the host. I started using this partly to work around old virtio pmd bugs that are have since been fixed. I ran ovs-dpdk in a vm on top of ovs-dpdk with vhost-user nics via openstack for years However using Virtio-net-pci devices for ovs-dpdk requires special handeling Due to some limitations in the choice of pf driver. Without a virtualized iommu in a guest its not possible to use the vfio-pci Driver with virtio-net-interfaces. qemu/kvm did not support vIOMMUs in the past so this has only recently become an option but I have not validated that it works myself. Uio support is compiled out by default by Ubuntu,fedora,centos and presumable rhel. That mean even on distros that package ovs-dpdk to run it in a vm you have to build From source. Igb_uio was previously known to be buggy with vitio-net-pci devices and dpdk had advised against using it as you could not reliably unbind the virtio interface form igb_uio and back to the kernel driver. So igb_uio worked but you had to reboot the vm to get the kernel dirver working again which was annoying for my development and ci usecases.hence my use of e1000 vNICs This may have been fixed. Uio_pci_generic works with virtio-net-pci devices as far as I know but unlike Vfio-pci or igb_uio its memory access is over privaldged so its not a great choice If you care about you applications security. It did however avoid the buggy igb_uio Rebinding issue and it does not need an iommu to work so has been the most reliable If least secure option for nested ovs-dpdk. Since this bug is in the dpdk virtio pmd and not the pf driver however it will effect Each one equally so I think the onus remain with the vnf vendors that build on ovs and dpdk to validate their dependencies before choosing to rev the ovs and/or dpdk versions they use. Having a warning about this particular issue in the ovs docs however may help some of them Avoid this trap and it will give us something I can refer people to between now and the dpdk 17.11.1 Release. > > Or should we as a temporary solution apply a patch to DPDK 17.11 to fix > this when building OVS? [Mooney, Sean K] dpdk is not built via the ovs make file we just link to it so I don’t think that is a good option for upstream ovs. That said I have patching support In my networking-ovs-dpdk devstack plugin for just this reason. If people are building Ovs form source I think just specifying which dpdk commit fixes the issue should be Enough for them to be able to work around this issue until 17.11.1 is released. > > BR, Jan > > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH V2 6/6] travis: Update kernel test list from kernel.org
From: Greg RoseSigned-off-by: Greg Rose --- .travis.yml | 17 - 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/.travis.yml b/.travis.yml index f217840..8ca6d75 100644 --- a/.travis.yml +++ b/.travis.yml @@ -25,17 +25,16 @@ sudo: false env: - OPTS="--disable-ssl" - - TESTSUITE=1 KERNEL=3.16.47 + - TESTSUITE=1 KERNEL=3.16.51 - TESTSUITE=1 OPTS="--enable-shared" - BUILD_ENV="-m32" OPTS="--disable-ssl" - - KERNEL=3.16.47 DPDK=1 - - KERNEL=3.16.47 DPDK=1 OPTS="--enable-shared" - - KERNEL=4.13 - - KERNEL=4.12.11 - - KERNEL=4.9.48 - - KERNEL=4.4.87 - - KERNEL=4.1.43 - - KERNEL=3.10.107 + - KERNEL=3.16.51 DPDK=1 + - KERNEL=3.16.51 DPDK=1 OPTS="--enable-shared" + - KERNEL=4.14.3 + - KERNEL=4.9.66 + - KERNEL=4.4.103 + - KERNEL=4.1.46 + - KERNEL=3.10.108 - TESTSUITE=1 LIBS=-ljemalloc matrix: -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH V2 5/6] acinclude.m4: Enable Linux 4.14
From: Greg RoseSigned-off-by: Greg Rose --- acinclude.m4 | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/acinclude.m4 b/acinclude.m4 index 6511a24..b815c58 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -143,10 +143,10 @@ AC_DEFUN([OVS_CHECK_LINUX], [ AC_MSG_RESULT([$kversion]) if test "$version" -ge 4; then - if test "$version" = 4 && test "$patchlevel" -le 13; then + if test "$version" = 4 && test "$patchlevel" -le 14; then : # Linux 4.x else - AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version newer than 4.13.x is not supported (please refer to the FAQ for advice)]) + AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version newer than 4.14.x is not supported (please refer to the FAQ for advice)]) fi elif test "$version" = 3 && test "$patchlevel" -ge 10; then : # Linux 3.x -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH V2 4/6] datapath: Fix SKB_GSO_UDP usage
From: Greg RoseUsing SKB_GSO_UDP breaks the compilation on Linux 4.14. Check for the HAVE_SKB_GSO_UDP compiler #define. Signed-off-by: Greg Rose --- datapath/datapath.c | 9 ++--- datapath/linux/compat/stt.c | 11 ++- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/datapath/datapath.c b/datapath/datapath.c index 1780819..a3fdd8f 100644 --- a/datapath/datapath.c +++ b/datapath/datapath.c @@ -339,8 +339,10 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb, const struct dp_upcall_info *upcall_info, uint32_t cutlen) { +#ifdef HAVE_SKB_GSO_UDP unsigned short gso_type = skb_shinfo(skb)->gso_type; struct sw_flow_key later_key; +#endif struct sk_buff *segs, *nskb; struct ovs_skb_cb ovs_cb; int err; @@ -352,7 +354,7 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb, return PTR_ERR(segs); if (segs == NULL) return -EINVAL; - +#ifdef HAVE_SKB_GSO_UDP if (gso_type & SKB_GSO_UDP) { /* The initial flow key extracted by ovs_flow_key_extract() * in this case is for a first fragment, so we need to @@ -361,14 +363,15 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb, later_key = *key; later_key.ip.frag = OVS_FRAG_TYPE_LATER; } - +#endif /* Queue all of the segments. */ skb = segs; do { *OVS_CB(skb) = ovs_cb; +#ifdef HAVE_SKB_GSO_UDP if (gso_type & SKB_GSO_UDP && skb != segs) key = _key; - +#endif err = queue_userspace_packet(dp, skb, key, upcall_info, cutlen); if (err) break; diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c index 37d5f4b..66a97f2 100644 --- a/datapath/linux/compat/stt.c +++ b/datapath/linux/compat/stt.c @@ -81,8 +81,13 @@ struct stt_dev { #define STT_PROTO_TCP BIT(3) #define STT_PROTO_TYPES(STT_PROTO_IPV4 | STT_PROTO_TCP) +#ifdef HAVE_SKB_GSO_UDP #define SUPPORTED_GSO_TYPES (SKB_GSO_TCPV4 | SKB_GSO_UDP | SKB_GSO_DODGY | \ SKB_GSO_TCPV6) +#else +#define SUPPORTED_GSO_TYPES (SKB_GSO_TCPV4 | SKB_GSO_DODGY | \ +SKB_GSO_TCPV6) +#endif /* The length and offset of a fragment are encoded in the sequence number. * STT_SEQ_LEN_SHIFT is the left shift needed to store the length. @@ -1310,7 +1315,7 @@ static bool validate_checksum(struct sk_buff *skb) static bool set_offloads(struct sk_buff *skb) { struct stthdr *stth = stt_hdr(skb); - unsigned short gso_type; + unsigned short gso_type = 0; int l3_header_size; int l4_header_size; u16 csum_offset; @@ -1351,7 +1356,9 @@ static bool set_offloads(struct sk_buff *skb) case STT_PROTO_IPV4: /* UDP/IPv4 */ csum_offset = offsetof(struct udphdr, check); +#ifdef HAVE_SKB_GSO_UDP gso_type = SKB_GSO_UDP; +#endif l3_header_size = sizeof(struct iphdr); l4_header_size = sizeof(struct udphdr); skb->protocol = htons(ETH_P_IP); @@ -1359,7 +1366,9 @@ static bool set_offloads(struct sk_buff *skb) default: /* UDP/IPv6 */ csum_offset = offsetof(struct udphdr, check); +#ifdef HAVE_SKB_GSO_UDP gso_type = SKB_GSO_UDP; +#endif l3_header_size = sizeof(struct ipv6hdr); l4_header_size = sizeof(struct udphdr); skb->protocol = htons(ETH_P_IPV6); -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH V2 2/6] compat: Do not include headers when not compiling
From: Greg RoseIf the entire file is not going to be compiled because OVS is using upstream tunnel support then also don't bother pulling in the headers. Signed-off-by: Greg Rose --- datapath/linux/compat/ip_gre.c| 2 +- datapath/linux/compat/ip_output.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/datapath/linux/compat/ip_gre.c b/datapath/linux/compat/ip_gre.c index 03c5435..2579093 100644 --- a/datapath/linux/compat/ip_gre.c +++ b/datapath/linux/compat/ip_gre.c @@ -12,6 +12,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#ifndef USE_UPSTREAM_TUNNEL #include #include #include @@ -52,7 +53,6 @@ #include #include -#ifndef USE_UPSTREAM_TUNNEL #if IS_ENABLED(CONFIG_IPV6) #include #include diff --git a/datapath/linux/compat/ip_output.c b/datapath/linux/compat/ip_output.c index edca340..e2f869f 100644 --- a/datapath/linux/compat/ip_output.c +++ b/datapath/linux/compat/ip_output.c @@ -45,6 +45,7 @@ * Hirokazu Takahashi: sendfile() on UDP works now. */ +#ifndef HAVE_CORRECT_MRU_HANDLING #include #include #include @@ -82,7 +83,6 @@ #include #include -#ifndef HAVE_CORRECT_MRU_HANDLING static inline void rpl_ip_options_fragment(struct sk_buff *skb) { unsigned char *optptr = skb_network_header(skb) + sizeof(struct iphdr); -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH V2 1/6] datapath: Fix netdev_master_upper_dev_link for 4.14
From: Greg RoseAn extended netlink ack has been added for 4.14 - add compat layer changes so that it compiles for all kernels up to and including 4.14. Signed-off-by: Greg Rose --- acinclude.m4| 3 +++ datapath/linux/compat/include/linux/netdevice.h | 15 ++- datapath/vport-netdev.c | 9 - 3 files changed, 25 insertions(+), 2 deletions(-) diff --git a/acinclude.m4 b/acinclude.m4 index 1179a40..6511a24 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -785,6 +785,9 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ OVS_FIND_FIELD_IFELSE([$KSRC/include/linux/netfilter.h], [nf_hook_ops], [list], [OVS_DEFINE([HAVE_LIST_IN_NF_HOOK_OPS])]) + OVS_FIND_PARAM_IFELSE([$KSRC/include/linux/netdevice.h], +[netdev_master_upper_dev_link], [extack], +[OVS_DEFINE([HAVE_UPPER_DEV_LINK_EXTACK])]) if cmp -s datapath/linux/kcompat.h.new \ datapath/linux/kcompat.h >/dev/null 2>&1; then diff --git a/datapath/linux/compat/include/linux/netdevice.h b/datapath/linux/compat/include/linux/netdevice.h index 3c3cf42..c460332 100644 --- a/datapath/linux/compat/include/linux/netdevice.h +++ b/datapath/linux/compat/include/linux/netdevice.h @@ -101,13 +101,26 @@ static inline bool netif_needs_gso(struct sk_buff *skb, #ifndef HAVE_NETDEV_MASTER_UPPER_DEV_LINK_RH static inline int rpl_netdev_master_upper_dev_link(struct net_device *dev, struct net_device *upper_dev, - void *upper_priv, void *upper_info) + void *upper_priv, + void *upper_info, void *extack) { return netdev_master_upper_dev_link(dev, upper_dev); } #define netdev_master_upper_dev_link rpl_netdev_master_upper_dev_link #endif +#else +#ifndef HAVE_UPPER_DEV_LINK_EXTACK +static inline int rpl_netdev_master_upper_dev_link(struct net_device *dev, + struct net_device *upper_dev, + void *upper_priv, + void *upper_info, void *extack) +{ + return netdev_master_upper_dev_link(dev, upper_dev, upper_priv, + upper_info); +} +#define netdev_master_upper_dev_link rpl_netdev_master_upper_dev_link +#endif #endif #if LINUX_VERSION_CODE < KERNEL_VERSION(3,16,0) diff --git a/datapath/vport-netdev.c b/datapath/vport-netdev.c index 697c442..e2d8eaf 100644 --- a/datapath/vport-netdev.c +++ b/datapath/vport-netdev.c @@ -112,8 +112,15 @@ struct vport *ovs_netdev_link(struct vport *vport, const char *name) } rtnl_lock(); +#ifdef HAVE_NETDEV_MASTER_UPPER_DEV_LINK_RH err = netdev_master_upper_dev_link(vport->dev, - get_dpdev(vport->dp), NULL, NULL); + get_dpdev(vport->dp), + NULL, NULL); +#else + err = netdev_master_upper_dev_link(vport->dev, + get_dpdev(vport->dp), + NULL, NULL, NULL); +#endif if (err) goto error_unlock; -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 0/6] Enable OVS on Linux 4.14 kernel
Various fixes and compat layer changes required to enable building OVS for the upstream Linux 4.14 kernel. The constant changing of the netdev_master_upper_dev_link parameters is a real headache. I couldn't think of any cleaner way to do it than the approach I used but I welcome suggestions on how to make that code less ugly - because it's ten miles of bad road ugly. But it does compile and pass basic checks on all of the currently supported kernels. There's more fixes for SKB_GSO_UDP - I'm not sure why these don't show up for the 4.13 or previous kernels but I think it has to do with the recent change to make sure the SKB_GSO_UDP was searched as a whole word and thus exposed more fractures. I updated .travis.yml to use all the most recent supported LTS and stable kernels from kernel.org. Greg Rose (6): datapath: Fix netdev_master_upper_dev_link for 4.14 compat: Do not include headers when not compiling datapath: Constify struct nf_conntrack_l4proto datapath: Fix SKB_GSO_UDP usage acinclude.m4: Enable Linux 4.14 travis: Update kernel test list from kernel.org .travis.yml | 17 - acinclude.m4| 7 +-- datapath/conntrack.c| 2 +- datapath/datapath.c | 9 ++--- datapath/linux/compat/include/linux/netdevice.h | 15 ++- datapath/linux/compat/ip_gre.c | 2 +- datapath/linux/compat/ip_output.c | 2 +- datapath/linux/compat/stt.c | 11 ++- datapath/vport-netdev.c | 9 - 9 files changed, 54 insertions(+), 20 deletions(-) -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpctl: Fix comment describing get_one_dp().
On Wed, Dec 06, 2017 at 10:19:23PM -0800, Justin Pettit wrote: > Signed-off-by: Justin Pettit> --- > lib/dpctl.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/lib/dpctl.c b/lib/dpctl.c > index 06cfbc4abf21..5e064ec9c100 100644 > --- a/lib/dpctl.c > +++ b/lib/dpctl.c > @@ -129,8 +129,8 @@ if_up(struct netdev *netdev) > } > > /* Retrieve the name of the datapath if exactly one exists. The caller > - * is responsible for freeing the returned string. If there is not one > - * datapath, aborts with an error message. */ > + * is responsible for freeing the returned string. If a single datapath > + * name cannot be determined, returns NULL. */ > static char * > get_one_dp(struct dpctl_params *dpctl_p) Acked-by: Ben Pfaff ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
> > Would the virtio PMD bug in DPDK 17.11 in the guest actually be > > mitigated by running DPDK 17.05 or a fixed 17.11.1 as vhostuser backend > > inside OVS on the host? > [Mooney, Sean K] from talking to mark about this eairlier I don’t belive so. > I think if you used 17.11 testpmd in the guest with kernel ovs you should > get the same behavior e.g. it does not forward packet. The guest should not > be able to know with certainty what vhost backend is in use on the host. If that is the case why should OVS even consider documenting that there are known problems when running a DPDK 17.11 application in a guest, irrespective of the OVS version or datapath? If anything, this is a critical DPDK bug and DPDK need to warn their virtio PMD users not to use 17.11 until this bug is fixed in a 17.11.1 maintenance release. The real issue I see is that OVS-DPDK may also be deployed in in a VM with virtio ports, and that kind of deployment will indeed be broken with DPDK 17.11. Can we accept and document that as a limitation in OVS 2.9 that will be fixed in OVS 2.9.1 as soon as DPDK 17.11.1 is out? Or should we as a temporary solution apply a patch to DPDK 17.11 to fix this when building OVS? BR, Jan ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v6 1/7] dpif-netdev: Refactor PMD thread structure for further extension.
On 07/12/17 14:48, Jan Scheurich wrote: -Original Message- From: Ilya Maximets [mailto:i.maxim...@samsung.com] Sent: Thursday, 07 December, 2017 14:28 This is preparation for 'struct dp_netdev_pmd_thread' modification in upcoming commits. Needed to avoid reordering and regrouping while replacing old and adding new members. Should this be part of the TX batching set? Anyway, I'm ok if it's not stalling the approval :) Unfortunately yes, because members reordered and regrouped just to include new members: pmd->ctx and pmd->n_output_batches. This could not be a standalone change because adding of different members will require different regrouping/ reordering. I moved this change to a separate patch to not do this twice while adding each member in patches 2/7 and 6/7. Anyway, as I mentioned in cover letter, I still prefer reverting of the padding at all by this patch: https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html I think we should not spent time designing and reviewing these kind of patches that are made necessary by the introduction of commit a807c15796ddc43ba1ffb2a6b0bd2ad4e2b73941. As far as I can see there was never a single review on the original patch: https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339402.html I wonder how it got merged into master in the first place. I strongly support Ilya's revert patch for that commit: https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html Let's do that quickly to remove some of the obstacles to merging important patches in time for OVS 2.9. BR, Jan Looking at the above, and going over the patches, I also agree undoing the commit is the right thing to do. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>From: Kevin Traynor [mailto:ktray...@redhat.com] >Sent: Thursday, December 7, 2017 3:29 PM >To: Jan Scheurich; Kavanagh, Mark B > ; Ilya Maximets ; Stokes, >Ian ; d...@openvswitch.org >Cc: maxime.coque...@redhat.com; Mooney, Sean K ; >Fischetti, Antonio ; Bie, Tiwei > ; Mcnamara, John ; Guoshuai Li > ; Loftus, Ciara ; Yuanhan Liu > >Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support > >On 12/07/2017 03:09 PM, Jan Scheurich wrote: I think the point that both yourself and Sean has made is completely >valid, which puts option a) back on the table. >>> >>> a) Sounds ok to me. I think an early DPDK17.11.1 before OVS 2.9 would be >>> good in addition though. It is nicer that an OVS 2.9 user doesn't have >>> to know they can't use the latest DPDK in the guest. >>> >> >> Would the virtio PMD bug in DPDK 17.11 in the guest actually be mitigated by >running DPDK 17.05 or a fixed 17.11.1 as vhostuser backend inside OVS on the >host? >> >> If not, I would prefer if we decoupled the DPDK life cycle of OVS and DPDK >applications in the guest. Guests should update their DPDK version if it >contains a critical bug. >> > >I don't think there is any documented coupling between host and guest >DPDK versions. I doubt anyone tests lots of combinations but hopefully >virtio provides the necessary means to run multiple combos. I think it's >reasonable in this case to document a warning for an OVS user about a >known bad combination that is likely to be selected (i.e. latest >upstream/releases). > >Kevin. > >> BR, Jan >> Hi Jan, DPDK v17.11 in the host is fine; the observed issue is present in the guest's virtio driver in DPDK v17.11 (which is not present in v17.05.2). As Kevin mentioned, I don't think there is any explicit coupling - or expectation of same - regarding the combination of DPDK versions used in guest and host; however, in this case I think it's certainly sensible to document the issue presented by the inclusion of 17.11 DPDK in the guest (at least when DPDK applications are used therein). Thanks, Mark >> ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
> -Original Message- > From: Jan Scheurich [mailto:jan.scheur...@ericsson.com] > Sent: Thursday, December 7, 2017 3:09 PM > To: Kevin Traynor; Kavanagh, Mark B > ; Ilya Maximets ; > Stokes, Ian ; d...@openvswitch.org > Cc: maxime.coque...@redhat.com; Mooney, Sean K > ; Fischetti, Antonio > ; Bie, Tiwei ; > Mcnamara, John ; Guoshuai Li > ; Loftus, Ciara ; Yuanhan Liu > > Subject: RE: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support > > > > I think the point that both yourself and Sean has made is > completely valid, which puts option a) back on the table. > > > > > > > a) Sounds ok to me. I think an early DPDK17.11.1 before OVS 2.9 would > > be good in addition though. It is nicer that an OVS 2.9 user doesn't > > have to know they can't use the latest DPDK in the guest. > > > > Would the virtio PMD bug in DPDK 17.11 in the guest actually be > mitigated by running DPDK 17.05 or a fixed 17.11.1 as vhostuser backend > inside OVS on the host? [Mooney, Sean K] from talking to mark about this eairlier I don’t belive so. I think if you used 17.11 testpmd in the guest with kernel ovs you should get the same behavior e.g. it does not forward packet. The guest should not be able to know with certainty what vhost backend is in use on the host. > > If not, I would prefer if we decoupled the DPDK life cycle of OVS and > DPDK applications in the guest. Guests should update their DPDK version > if it contains a critical bug. > > BR, Jan > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
On 12/07/2017 03:09 PM, Jan Scheurich wrote: >>> I think the point that both yourself and Sean has made is completely valid, >>> which puts option a) back on the table. >>> >> >> a) Sounds ok to me. I think an early DPDK17.11.1 before OVS 2.9 would be >> good in addition though. It is nicer that an OVS 2.9 user doesn't have >> to know they can't use the latest DPDK in the guest. >> > > Would the virtio PMD bug in DPDK 17.11 in the guest actually be mitigated by > running DPDK 17.05 or a fixed 17.11.1 as vhostuser backend inside OVS on the > host? > > If not, I would prefer if we decoupled the DPDK life cycle of OVS and DPDK > applications in the guest. Guests should update their DPDK version if it > contains a critical bug. > I don't think there is any documented coupling between host and guest DPDK versions. I doubt anyone tests lots of combinations but hopefully virtio provides the necessary means to run multiple combos. I think it's reasonable in this case to document a warning for an OVS user about a known bad combination that is likely to be selected (i.e. latest upstream/releases). Kevin. > BR, Jan > > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
> > I think the point that both yourself and Sean has made is completely valid, > > which puts option a) back on the table. > > > > a) Sounds ok to me. I think an early DPDK17.11.1 before OVS 2.9 would be > good in addition though. It is nicer that an OVS 2.9 user doesn't have > to know they can't use the latest DPDK in the guest. > Would the virtio PMD bug in DPDK 17.11 in the guest actually be mitigated by running DPDK 17.05 or a fixed 17.11.1 as vhostuser backend inside OVS on the host? If not, I would prefer if we decoupled the DPDK life cycle of OVS and DPDK applications in the guest. Guests should update their DPDK version if it contains a critical bug. BR, Jan ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
On 12/04/2017 11:15 AM, Mark Kavanagh wrote: > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > index c145e1a..6c6df50 100644 > --- a/vswitchd/vswitch.xml > +++ b/vswitchd/vswitch.xml > @@ -344,6 +344,21 @@ > > > > + + type='{"type": "boolean"}'> > + > + vHost IOMMU is a security feature, which restricts the vhost > memory very minor nit if you are still reworking anyway, there's some additional spaces above > + that a virtio device may access. vHost IOMMU support is disabled by > + default, due to a bug in QEMU implementations of the vhost > REPLY_ACK > + protocol, (on which vHost IOMMU relies) prior to v2.9.1. Setting > this > + value to true enables vHost IOMMU support for vHost > User > + Client ports in OvS-DPDK, starting from DPDK v17.11. > + > + > + Changing this value requires restarting the daemon. > + > + > + >type='{"type": "integer", "minInteger": 1}'> > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
On 12/07/2017 12:27 PM, Kavanagh, Mark B wrote: >> From: Ilya Maximets [mailto:i.maxim...@samsung.com] >> Sent: Thursday, December 7, 2017 12:19 PM >> To: Kavanagh, Mark B; Stokes, Ian >> ; d...@openvswitch.org >> Cc: maxime.coque...@redhat.com; jan.scheur...@ericsson.com; Mooney, Sean K >> ; Fischetti, Antonio ; >> Bie, Tiwei ; Mcnamara, John ; >> Guoshuai Li ; ktray...@redhat.com; Loftus, Ciara >> ; Yuanhan Liu >> Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support >> >> On 07.12.2017 14:32, Kavanagh, Mark B wrote: >>> Yuanhan's old email address was used in the previous mail - updated >> correctly here. >>> -Mark >>> -Original Message- From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- >> boun...@openvswitch.org] On Behalf Of Kavanagh, Mark B Sent: Thursday, December 7, 2017 11:24 AM To: Stokes, Ian ; d...@openvswitch.org Cc: Liu, Yuanhan ; Bie, Tiwei ; Mcnamara, John ; maxime.coque...@redhat.com; i.maxim...@samsung.com Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support Hi folks, Thanks for all of your respective reviews and testing of this patchset. It seems, however, that an issue in DPDK v17.11 has come to light that >> affects guests which use testpmd. Specifically, traffic does not reach a guest when traffic is live prior to kicking off the testpmd app within said guest, and at least two forwarding cores are used. [1] This is explained in additional detail is [2], an extract of which is reproduced below: "the vector Rx could be broken if backend has consumed all the avail descs before the device is started. Because in current implementation, the vector Rx will return immediately without refilling the avail ring if the used ring is empty. So we have to refill the avail ring after flushing the elements in the used ring." This issue was initially uncovered by Antonio Fischetti, as part of the >> 17.11 patchset validation, and has since been localized to DPDK, rather than OvS. As a result, it seems now that we should not move to 17.11, but seek an >> out- of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on this - should we: a) move to 17.11 now, note the issue above in the 'errata' section of the documentation, and move to 17.11.1 when it becomes available in February of next year b) request the early release of the 17.11.1 stable branch, which incorporates a fix for this issue (the possibility, and availability, of >> which are both TBD). Thanks in advance, Mark >> >> Hmm. Isn't it a guest driver issue? Do we need to care so much about running >> OVS inside the VM? If I assumed right, if we're running OVS not inside the >> VM, >> there will be no issues on the OVS side. The issue happens if guest >> application >> uses DPDK 17.11, but it will happen regardless of the DPDK version on the >> host >> side. >> >> So, the only bad scenario is running OVS inside the VM with virtio-net PMD >> driver. >> My question is: Do we need to care about this scenario so much? >> If the answer is "not really" then we can use variant a). >> But if it's very important thing we need to support than b) will be >> preferable. >> >> Am I missing something? > > Hey Ilya, > > I've just had the exact same conversation with Sean Mooney locally. > > I think the point that both yourself and Sean has made is completely valid, > which puts option a) back on the table. > a) Sounds ok to me. I think an early DPDK17.11.1 before OVS 2.9 would be good in addition though. It is nicer that an OVS 2.9 user doesn't have to know they can't use the latest DPDK in the guest. Other than the outstanding documentation, the patches LGTM. > Antonio has agreed to test that this works; once he confirms I can push v5 of > the 17.11 patchet, and simply document the guest DPDK issue. > > Thanks again, > Mark > > >> >> Best regards, Ilya Maximets. >> >> [1] http://dpdk.org/ml/archives/dev/2017-December/082801.html [2] http://dpdk.org/ml/archives/dev/2017-December/082874.html > -Original Message- > From: Stokes, Ian > Sent: Thursday, December 7, 2017 8:43 AM > To: Kavanagh, Mark B ; d...@openvswitch.org > Cc: ktray...@redhat.com; maxime.coque...@redhat.com; >> i.maxim...@samsung.com; > jan.scheur...@ericsson.com; Mooney, Sean K > Subject: RE: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support >
Re: [ovs-dev] [PATCH RFC 2/5] configure: Include -mprefetchwt1 explicitly.
On 05.12.2017 19:19, Bodireddy, Bhanuprakash wrote: > [...] >> int main() >> { >>int c; >> >>__builtin_prefetch(, 1, 1); >>c = 8; >> >>return c; >> } >> >> on my old Ivy Bridge i7-3770 CPU. It does not support even 'prefetchw': >> >> PREFETCHWT1 = false >> 3DNow! PREFETCH/PREFETCHW instructions = false >> >> Results: > > [Bhanu] I found https://gcc.godbolt.org/ the other day and its handy to > generate code for different targets and compilers. > >> $ gcc 1.c >> $ objdump -S ./a.out | grep prefetch -A2 -B2 >> 40055b: 31 c0 xor%eax,%eax >> 40055d: 48 8d 45 f4 lea-0xc(%rbp),%rax >> 400561: 0f 18 18prefetcht2 (%rax) >> 400564: c7 45 f4 08 00 00 00movl $0x8,-0xc(%rbp) >> 40056b: 8b 45 f4mov-0xc(%rbp),%eax > > [Bhanu] Expected and compiler generates prefetcht2. > >> >> $ gcc 1.c -march=native >> $ objdump -S ./a.out | grep prefetch -A2 -B2 >> 40055b: 31 c0 xor%eax,%eax >> 40055d: 48 8d 45 f4 lea-0xc(%rbp),%rax >> 400561: 0f 18 18prefetcht2 (%rax) >> 400564: c7 45 f4 08 00 00 00movl $0x8,-0xc(%rbp) >> 40056b: 8b 45 f4mov-0xc(%rbp),%eax > > [Bhanu] Though march=native is specified the processor doesn't have it and > still prefetchnt2 is generated by compiler. > >> $ gcc 1.c -march=native -mprefetchwt1 >> $ objdump -S ./a.out | grep prefetch -A2 -B2 >> 40055b: 31 c0 xor%eax,%eax >> 40055d: 48 8d 45 f4 lea-0xc(%rbp),%rax >> 400561: 0f 0d 10prefetchwt1 (%rax) >> 400564: c7 45 f4 08 00 00 00movl $0x8,-0xc(%rbp) >> 40056b: 8b 45 f4mov-0xc(%rbp),%eax > > [Bhanu] The compiler inserts prefetchwt1 instruction as we asked it to do. > >> >> So, it inserts this instruction even if I have no such instruction in CPU. > > [Bhanu] > Though the compiler generates this, as the instruction isn't available on the > processor it just become a multi byte NO-Operation(NOP). > On processors(Intel) that doesn't have prefetchw or 3D Now feature(AMD) it > decodes in to NOP. > http://ref.x86asm.net/coder64.html#x0F0D > - Click on '0D' in two-byte opcode index - (16. 0F0D NOP) >- More information on this can be found in Intel SW > developers manual (Combined Volumes) > >> More interesting is that program still works without any issues. >> I assume that CPU just skips that instruction or executes something else. > > [Bhanu] This is what is mostly expected. On processors that supports > prefetchwt1 it executes and others it just becomes a NOP. > >> >> So, it's really strange and it's unclear what CPU really executes in case >> where >> we have 'prefetchwt1' in code but not supported by CPU. > > [Bhanu] It’s decoded in to NOP may be by pipeline decoding units. > >> >> If CPU just skips this instruction we will lost all the prefetching >> optimizations >> because all the calls will be replaced by non-existent 'prefetchwt1'. > > [Bhanu] I would be worried if core generates an exception treating it as > illegal instruction. Instead pipeline units treat this as NOP if it doesn't > support it. > So the micro optimizations doesn't really do any thing on the processors that > doesn't support it. This could be an issue. If someday we'll have real performance optimization based on OPCH_HTW prefetch, we will have prefetchwt1 on system that supports it and NOP on others even if they have usual prefetchw which could provide performance improvement too. As I understand, checking of '-mprefetchwt1' is equal to checking compiler version. It doesn't check anything about supporting of this instruction in CPU. This could end up with non-working performance optimizations and even degradation on systems that supports usual prefetches but not prefetchwt1 (useless NOPs degrades performance if they are on a hot path). IMHO, This compiler option should be passed only if CPU really supports it. I guess, the maximum that we can do is add a note into performance optimization guide that '-mprefetchwt1' could be passed via CFLAGS if user sure that it supported by target CPU. > >> >> How can we be sure that 'prefetchwt1' was really executed? > > [Bhanu] I don’t know how we can see this unless we can peek in to Instruction > queues & Decoders of the pipeline :(. > > - Bhanuprakash. > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH RFC 5/5] dpif-netdev: Prefetch the cacheline having the cycle stats.
On 05.12.2017 18:11, Bodireddy, Bhanuprakash wrote: >> >>> Prefetch the cacheline having the cycle stats so that we can speed up >>> the cycles_count_start() and cycles_count_intermediate(). >> >> Do you have any performance results? > > I don’t have nos. for this patch alone. I was testing the overall throughput > along with other patches (that were *not* part of this RFC series) to verify > performance improvements. I will include in commit log when I do for > individual patches. > > BTW, I usually look at the % of total instructions getting retired, cycles > spent in front and back-end for the functions to see if prefetching does > improve/degrade performance. > > - Bhanuprakash. > >> >>> >>> Signed-off-by: Bhanuprakash Bodireddy >> intel.com> >>> --- >>> lib/dpif-netdev.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index >>> b74b5d7..ab13d83 100644 >>> --- a/lib/dpif-netdev.c >>> +++ b/lib/dpif-netdev.c >>> @@ -576,7 +576,7 @@ struct dp_netdev_pmd_thread { >>> struct ovs_mutex flow_mutex; >>> /* 8 pad bytes. */ >>> ); >>> -PADDED_MEMBERS(CACHE_LINE_SIZE, >>> +PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, >> cachelineC, >>> struct cmap flow_table OVS_GUARDED; /* Flow table. */ >>> >>> /* One classifier per in_port polled by the pmd */ @@ -4082,6 >>> +4082,7 @@ reload: >>> lc = UINT_MAX; >>> } >>> >>> +OVS_PREFETCH_CACHE(>cachelineC, OPCH_HTW); How does prefetch just before the infinite loop should improve performance? I didn't test that, but IMHO, this should have zero impact. >>> cycles_count_start(pmd); >>> for (;;) { >>> for (i = 0; i < poll_cnt; i++) { >>> -- >>> 2.4.11 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v6 1/7] dpif-netdev: Refactor PMD thread structure for further extension.
> -Original Message- > From: Ilya Maximets [mailto:i.maxim...@samsung.com] > Sent: Thursday, 07 December, 2017 14:28 > >> This is preparation for 'struct dp_netdev_pmd_thread' modification > >> in upcoming commits. Needed to avoid reordering and regrouping while > >> replacing old and adding new members. > >> > > Should this be part of the TX batching set? Anyway, I'm ok if it's not > > stalling the approval :) > > Unfortunately yes, because members reordered and regrouped just to include > new members: pmd->ctx and pmd->n_output_batches. This could not be a > standalone > change because adding of different members will require different regrouping/ > reordering. I moved this change to a separate patch to not do this twice while > adding each member in patches 2/7 and 6/7. > > Anyway, as I mentioned in cover letter, I still prefer reverting of the > padding > at all by this patch: > https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html > I think we should not spent time designing and reviewing these kind of patches that are made necessary by the introduction of commit a807c15796ddc43ba1ffb2a6b0bd2ad4e2b73941. As far as I can see there was never a single review on the original patch: https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339402.html I wonder how it got merged into master in the first place. I strongly support Ilya's revert patch for that commit: https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html Let's do that quickly to remove some of the obstacles to merging important patches in time for OVS 2.9. BR, Jan ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v6 1/7] dpif-netdev: Refactor PMD thread structure for further extension.
Thanks for review, comments inline. On 07.12.2017 15:49, Eelco Chaudron wrote: > On 01/12/17 16:44, Ilya Maximets wrote: >> This is preparation for 'struct dp_netdev_pmd_thread' modification >> in upcoming commits. Needed to avoid reordering and regrouping while >> replacing old and adding new members. >> > Should this be part of the TX batching set? Anyway, I'm ok if it's not > stalling the approval :) Unfortunately yes, because members reordered and regrouped just to include new members: pmd->ctx and pmd->n_output_batches. This could not be a standalone change because adding of different members will require different regrouping/ reordering. I moved this change to a separate patch to not do this twice while adding each member in patches 2/7 and 6/7. Anyway, as I mentioned in cover letter, I still prefer reverting of the padding at all by this patch: https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html > > See some comments below on the use of remaining padding... >> Signed-off-by: Bhanuprakash Bodireddy>> Co-authored-by: Bhanuprakash Bodireddy >> Signed-off-by: Ilya Maximets >> --- >> lib/dpif-netdev.c | 65 >> ++- >> 1 file changed, 40 insertions(+), 25 deletions(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index 0a62630..7a7c6ce 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -549,29 +549,22 @@ struct tx_port { >> struct dp_netdev_pmd_thread { >> PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline0, >> struct dp_netdev *dp; >> - struct cmap_node node; /* In 'dp->poll_threads'. */ >> - pthread_cond_t cond; /* For synchronizing pmd thread >> - reload. */ >> + struct cmap_node node; /* In 'dp->poll_threads'. */ >> + pthread_cond_t cond; /* For synchronizing pmd thread reload. >> */ >> ); >> PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1, >> struct ovs_mutex cond_mutex; /* Mutex for condition variable. */ >> pthread_t thread; >> - unsigned core_id; /* CPU core id of this pmd thread. >> */ >> - int numa_id; /* numa node id of this pmd thread. >> */ >> ); >> /* Per thread exact-match cache. Note, the instance for cpu core >> * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly >> * need to be protected by 'non_pmd_mutex'. Every other instance >> * will only be accessed by its own pmd thread. */ >> - OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache; >> - struct ovs_refcount ref_cnt; /* Every reference must be refcount'ed. >> */ >> - >> - /* Queue id used by this pmd thread to send packets on all netdevs if >> - * XPS disabled for this netdev. All static_tx_qid's are unique and less >> - * than 'cmap_count(dp->poll_threads)'. */ >> - uint32_t static_tx_qid; >> + PADDED_MEMBERS(CACHE_LINE_SIZE, >> + OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache; >> + ); >> /* Flow-Table and classifiers >> * >> @@ -579,7 +572,10 @@ struct dp_netdev_pmd_thread { >> * changes to 'classifiers' must be made while still holding the >> * 'flow_mutex'. >> */ >> - struct ovs_mutex flow_mutex; >> + PADDED_MEMBERS(CACHE_LINE_SIZE, >> + struct ovs_mutex flow_mutex; >> + /* 8 pad bytes. */ > Do we really want to add pad bytes left in this structure? They can easily be > wrong is something changes elsewhere? > Guessing from some the differences you might have other patches applied? > Using the pahole tool I think the 8 here should be 16? > > union { > struct { > struct ovs_mutex flow_mutex; /* 0 48 */ > }; /* 48 */ > uint8_t pad12[64]; /* 64 */ > }; /* 0 64 */ I left 8 bytes here because we may have different size of ovs_mutex on different systems. The biggest value I saw was 56 bytes on my ARMv8 platform. > >> + ); >> PADDED_MEMBERS(CACHE_LINE_SIZE, >> struct cmap flow_table OVS_GUARDED; /* Flow table. */ >> @@ -596,35 +592,50 @@ struct dp_netdev_pmd_thread { >> /* Used to count cycles. See 'cycles_counter_end()'. */ >> unsigned long long last_cycles; >> - struct latch exit_latch; /* For terminating the pmd thread. >> */ >> - ); >> + /* 8 pad bytes. */ >> + ); >> PADDED_MEMBERS(CACHE_LINE_SIZE, >> /* Statistics. */ >> struct dp_netdev_pmd_stats stats; >> + /* 8 pad bytes. */ > Should this not be 24? Yes, you're right. It'll be 8 after applying of the last patch of the series. I can keep 24 in this
Re: [ovs-dev] [PATCH v6 0/7] Output packet batching.
Hi All, I reviewed the code for this V6, however I did not do the testing as before as my setup is torn down at the moment. Some small comments on "[PATCH v6 1/7] dpif-netdev: Refactor PMD thread structure for further extension." see reply on that email. Other than that I would like to ack the series: Acked-by: Eelco ChaudronGuess the only thing missing is proper documentation for this change. Thanks, Eelco On 01/12/17 16:44, Ilya Maximets wrote: This patch-set inspired by [1] from Bhanuprakash Bodireddy. Implementation of [1] looks very complex and introduces many pitfalls [2] for later code modifications like possible packet stucks. This version targeted to make simple and flexible output packet batching on higher level without introducing and even simplifying netdev layer. Basic testing of 'PVP with OVS bonding on phy ports' scenario shows significant performance improvement. Test results for time-based batching for v3: https://mail.openvswitch.org/pipermail/ovs-dev/2017-September/338247.html Test results for v4: https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339624.html [1] [PATCH v4 0/5] netdev-dpdk: Use intermediate queue during packet transmission. https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337019.html [2] For example: https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337133.html Version 6: * Rebased on current master: - Added new patch to refactor dp_netdev_pmd_thread structure according to following suggestion: https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341230.html NOTE: I still prefer reverting of the padding related patch. Rebase done to not block acepting of this series. Revert patch and discussion here: https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341153.html * Added comment about pmd_thread_ctx_time_update() usage. Version 5: * pmd_thread_ctx_time_update() calls moved to different places to call them only from dp_netdev_process_rxq_port() and main polling functions: pmd_thread_main, dpif_netdev_run and dpif_netdev_execute. All other functions should use cached time from pmd->ctx.now. It's guaranteed to be updated at least once per polling cycle. * 'may_steal' patch returned to version from v3 because 'may_steal' in qos is a completely different variable. This patch only removes 'may_steal' from netdev API. * 2 more usec functions added to timeval to have complete public API. * Checking of 'output_cnt' turned to assertion. Version 4: * Rebased on current master. * Rebased on top of "Keep latest measured time for PMD thread." (Jan Scheurich) * Microsecond resolution related patches integrated. * Time-based batching without RFC tag. * 'output_time' renamed to 'flush_time'. (Jan Scheurich) * 'flush_time' update moved to 'dp_netdev_pmd_flush_output_on_port'. (Jan Scheurich) * 'output-max-latency' renamed to 'tx-flush-interval'. * Added patch for output batching statistics. Version 3: * Rebased on current master. * Time based RFC: fixed assert on n_output_batches <= 0. Version 2: * Rebased on current master. * Added time based batching RFC patch. * Fixed mixing packets with different sources in same batch. Ilya Maximets (7): dpif-netdev: Refactor PMD thread structure for further extension. dpif-netdev: Keep latest measured time for PMD thread. dpif-netdev: Output packet batching. netdev: Remove unused may_steal. netdev: Remove useless cutlen. dpif-netdev: Time based output batching. dpif-netdev: Count sent packets and batches. lib/dpif-netdev.c | 412 +- lib/netdev-bsd.c | 6 +- lib/netdev-dpdk.c | 30 ++-- lib/netdev-dummy.c| 6 +- lib/netdev-linux.c| 8 +- lib/netdev-provider.h | 7 +- lib/netdev.c | 12 +- lib/netdev.h | 2 +- vswitchd/vswitch.xml | 16 ++ 9 files changed, 349 insertions(+), 150 deletions(-) ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v6 1/7] dpif-netdev: Refactor PMD thread structure for further extension.
On 01/12/17 16:44, Ilya Maximets wrote: This is preparation for 'struct dp_netdev_pmd_thread' modification in upcoming commits. Needed to avoid reordering and regrouping while replacing old and adding new members. Should this be part of the TX batching set? Anyway, I'm ok if it's not stalling the approval :) See some comments below on the use of remaining padding... Signed-off-by: Bhanuprakash BodireddyCo-authored-by: Bhanuprakash Bodireddy Signed-off-by: Ilya Maximets --- lib/dpif-netdev.c | 65 ++- 1 file changed, 40 insertions(+), 25 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 0a62630..7a7c6ce 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -549,29 +549,22 @@ struct tx_port { struct dp_netdev_pmd_thread { PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline0, struct dp_netdev *dp; -struct cmap_node node; /* In 'dp->poll_threads'. */ -pthread_cond_t cond;/* For synchronizing pmd thread - reload. */ +struct cmap_node node; /* In 'dp->poll_threads'. */ +pthread_cond_t cond; /* For synchronizing pmd thread reload. */ ); PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1, struct ovs_mutex cond_mutex;/* Mutex for condition variable. */ pthread_t thread; -unsigned core_id; /* CPU core id of this pmd thread. */ -int numa_id;/* numa node id of this pmd thread. */ ); /* Per thread exact-match cache. Note, the instance for cpu core * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly * need to be protected by 'non_pmd_mutex'. Every other instance * will only be accessed by its own pmd thread. */ -OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache; -struct ovs_refcount ref_cnt;/* Every reference must be refcount'ed. */ - -/* Queue id used by this pmd thread to send packets on all netdevs if - * XPS disabled for this netdev. All static_tx_qid's are unique and less - * than 'cmap_count(dp->poll_threads)'. */ -uint32_t static_tx_qid; +PADDED_MEMBERS(CACHE_LINE_SIZE, +OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache; +); /* Flow-Table and classifiers * @@ -579,7 +572,10 @@ struct dp_netdev_pmd_thread { * changes to 'classifiers' must be made while still holding the * 'flow_mutex'. */ -struct ovs_mutex flow_mutex; +PADDED_MEMBERS(CACHE_LINE_SIZE, +struct ovs_mutex flow_mutex; +/* 8 pad bytes. */ Do we really want to add pad bytes left in this structure? They can easily be wrong is something changes elsewhere? Guessing from some the differences you might have other patches applied? Using the pahole tool I think the 8 here should be 16? union { struct { struct ovs_mutex flow_mutex; /* 0 48 */ }; /* 48 */ uint8_t pad12[64]; /* 64 */ }; /* 0 64 */ +); PADDED_MEMBERS(CACHE_LINE_SIZE, struct cmap flow_table OVS_GUARDED; /* Flow table. */ @@ -596,35 +592,50 @@ struct dp_netdev_pmd_thread { /* Used to count cycles. See 'cycles_counter_end()'. */ unsigned long long last_cycles; -struct latch exit_latch;/* For terminating the pmd thread. */ -); +/* 8 pad bytes. */ +); PADDED_MEMBERS(CACHE_LINE_SIZE, /* Statistics. */ struct dp_netdev_pmd_stats stats; +/* 8 pad bytes. */ Should this not be 24? union { struct { struct dp_netdev_pmd_stats stats; /* 0 40 */ }; /* 40 */ uint8_t pad14[64]; /* 64 */ }; /* 0 64 */ +); +PADDED_MEMBERS(CACHE_LINE_SIZE, +struct latch exit_latch; /* For terminating the pmd thread. */ struct seq *reload_seq; uint64_t last_reload_seq; -atomic_bool reload; /* Do we need to reload ports? */ -bool isolated; - +atomic_bool reload; /* Do we need to reload ports? */ /* Set to true if the pmd thread needs to be reloaded. */ bool need_reload; -/* 5 pad bytes. */ +bool isolated; + +struct ovs_refcount ref_cnt; /* Every reference must be refcount'ed. */ + +/* Queue id used by this pmd thread to send packets on all netdevs if + * XPS disabled for this netdev. All static_tx_qid's are unique and + * less than 'cmap_count(dp->poll_threads)'. */ +uint32_t static_tx_qid; + +
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
I just checked there's no issue if we use dpdk v17.11 on the host and dpdk 17.05.2 in the guest. Antonio > -Original Message- > From: Kavanagh, Mark B > Sent: Thursday, December 7, 2017 12:28 PM > To: Ilya Maximets; Stokes, Ian > ; d...@openvswitch.org > Cc: maxime.coque...@redhat.com; jan.scheur...@ericsson.com; Mooney, Sean > K ; Fischetti, Antonio > ; Bie, Tiwei ; > Mcnamara, John ; Guoshuai Li > ; ktray...@redhat.com; Loftus, Ciara > ; Yuanhan Liu > Subject: RE: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support > > >From: Ilya Maximets [mailto:i.maxim...@samsung.com] > >Sent: Thursday, December 7, 2017 12:19 PM > >To: Kavanagh, Mark B ; Stokes, Ian > > ; d...@openvswitch.org > >Cc: maxime.coque...@redhat.com; jan.scheur...@ericsson.com; Mooney, > Sean K > > ; Fischetti, Antonio > ; > >Bie, Tiwei ; Mcnamara, John > ; > >Guoshuai Li ; ktray...@redhat.com; Loftus, Ciara > > ; Yuanhan Liu > >Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support > > > >On 07.12.2017 14:32, Kavanagh, Mark B wrote: > >> Yuanhan's old email address was used in the previous mail - updated > >correctly here. > >> -Mark > >> > >>> -Original Message- > >>> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- > >boun...@openvswitch.org] > >>> On Behalf Of Kavanagh, Mark B > >>> Sent: Thursday, December 7, 2017 11:24 AM > >>> To: Stokes, Ian ; d...@openvswitch.org > >>> Cc: Liu, Yuanhan ; Bie, Tiwei > ; > >>> Mcnamara, John ; > maxime.coque...@redhat.com; > >>> i.maxim...@samsung.com > >>> Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU > support > >>> > >>> Hi folks, > >>> > >>> Thanks for all of your respective reviews and testing of this > patchset. > >>> > >>> It seems, however, that an issue in DPDK v17.11 has come to light > that > >affects > >>> guests which use testpmd. > >>> Specifically, traffic does not reach a guest when traffic is live > prior to > >>> kicking off the testpmd app within said guest, and at least two > forwarding > >>> cores are used. [1] > >>> > >>> This is explained in additional detail is [2], an extract of which > is > >>> reproduced below: > >>> > >>> "the vector Rx could be broken if backend has consumed all the > avail > >>> descs before the > >>>device is started. Because in current implementation, the vector > Rx will > >>> return immediately > >>>without refilling the avail ring if the used ring is empty. So we > have > >>> to refill > >>>the avail ring after flushing the elements in the used ring." > >>> > >>> This issue was initially uncovered by Antonio Fischetti, as part of > the > >17.11 > >>> patchset validation, and has since been localized to DPDK, rather > than OvS. > >>> As a result, it seems now that we should not move to 17.11, but seek > an > >out- > >>> of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on > this - > >>> should we: > >>> a) move to 17.11 now, note the issue above in the 'errata' section > of the > >>> documentation, and move to 17.11.1 when it becomes available in > February of > >>> next year > >>> b) request the early release of the 17.11.1 stable branch, which > >>> incorporates a fix for this issue (the possibility, and > availability, of > >which > >>> are both TBD). > >>> > >>> Thanks in advance, > >>> Mark > > > >Hmm. Isn't it a guest driver issue? Do we need to care so much about > running > >OVS inside the VM? If I assumed right, if we're running OVS not inside > the VM, > >there will be no issues on the OVS side. The issue happens if guest > >application > >uses DPDK 17.11, but it will happen regardless of the DPDK version on > the host > >side. > > > >So, the only bad scenario is running OVS inside the VM with virtio-net > PMD > >driver. > >My question is: Do we need to care about this scenario so much? > >If the answer is "not really" then we can use variant a). > >But if it's very important thing we need to support than b) will be > >preferable. > > > >Am I missing something? > > Hey Ilya, > > I've just had the exact same conversation with Sean Mooney locally. > > I think the point that both yourself and Sean has made is completely > valid, which puts option a) back on the table. > > Antonio has agreed to test that this works; once he confirms I can push > v5 of the 17.11 patchet, and simply document the guest DPDK issue. > > Thanks again, > Mark > > > > > >Best regards, Ilya Maximets. > > > > > >>> > >>> [1]
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>From: Ilya Maximets [mailto:i.maxim...@samsung.com] >Sent: Thursday, December 7, 2017 12:19 PM >To: Kavanagh, Mark B; Stokes, Ian > ; d...@openvswitch.org >Cc: maxime.coque...@redhat.com; jan.scheur...@ericsson.com; Mooney, Sean K > ; Fischetti, Antonio ; >Bie, Tiwei ; Mcnamara, John ; >Guoshuai Li ; ktray...@redhat.com; Loftus, Ciara > ; Yuanhan Liu >Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support > >On 07.12.2017 14:32, Kavanagh, Mark B wrote: >> Yuanhan's old email address was used in the previous mail - updated >correctly here. >> -Mark >> >>> -Original Message- >>> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- >boun...@openvswitch.org] >>> On Behalf Of Kavanagh, Mark B >>> Sent: Thursday, December 7, 2017 11:24 AM >>> To: Stokes, Ian ; d...@openvswitch.org >>> Cc: Liu, Yuanhan ; Bie, Tiwei ; >>> Mcnamara, John ; maxime.coque...@redhat.com; >>> i.maxim...@samsung.com >>> Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support >>> >>> Hi folks, >>> >>> Thanks for all of your respective reviews and testing of this patchset. >>> >>> It seems, however, that an issue in DPDK v17.11 has come to light that >affects >>> guests which use testpmd. >>> Specifically, traffic does not reach a guest when traffic is live prior to >>> kicking off the testpmd app within said guest, and at least two forwarding >>> cores are used. [1] >>> >>> This is explained in additional detail is [2], an extract of which is >>> reproduced below: >>> >>> "the vector Rx could be broken if backend has consumed all the avail >>> descs before the >>> device is started. Because in current implementation, the vector Rx >>> will >>> return immediately >>> without refilling the avail ring if the used ring is empty. So we have >>> to refill >>> the avail ring after flushing the elements in the used ring." >>> >>> This issue was initially uncovered by Antonio Fischetti, as part of the >17.11 >>> patchset validation, and has since been localized to DPDK, rather than OvS. >>> As a result, it seems now that we should not move to 17.11, but seek an >out- >>> of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on this - >>> should we: >>> a) move to 17.11 now, note the issue above in the 'errata' section of >>> the >>> documentation, and move to 17.11.1 when it becomes available in February of >>> next year >>> b) request the early release of the 17.11.1 stable branch, which >>> incorporates a fix for this issue (the possibility, and availability, of >which >>> are both TBD). >>> >>> Thanks in advance, >>> Mark > >Hmm. Isn't it a guest driver issue? Do we need to care so much about running >OVS inside the VM? If I assumed right, if we're running OVS not inside the VM, >there will be no issues on the OVS side. The issue happens if guest >application >uses DPDK 17.11, but it will happen regardless of the DPDK version on the host >side. > >So, the only bad scenario is running OVS inside the VM with virtio-net PMD >driver. >My question is: Do we need to care about this scenario so much? >If the answer is "not really" then we can use variant a). >But if it's very important thing we need to support than b) will be >preferable. > >Am I missing something? Hey Ilya, I've just had the exact same conversation with Sean Mooney locally. I think the point that both yourself and Sean has made is completely valid, which puts option a) back on the table. Antonio has agreed to test that this works; once he confirms I can push v5 of the 17.11 patchet, and simply document the guest DPDK issue. Thanks again, Mark > >Best regards, Ilya Maximets. > > >>> >>> [1] http://dpdk.org/ml/archives/dev/2017-December/082801.html >>> [2] http://dpdk.org/ml/archives/dev/2017-December/082874.html >>> >>> -Original Message- From: Stokes, Ian Sent: Thursday, December 7, 2017 8:43 AM To: Kavanagh, Mark B ; d...@openvswitch.org Cc: ktray...@redhat.com; maxime.coque...@redhat.com; >i.maxim...@samsung.com; jan.scheur...@ericsson.com; Mooney, Sean K Subject: RE: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support > DPDK v17.11 introduces support for the vHost IOMMU feature. > This is a security feature, which restricts the vhost memory that a >virtio > device may access. Hi all, There were a few requests for changes in the v4 of this patch and it's an important aspect of the DPDK 17.11 support for OVS. I'd like to get this series in to the pull request this week. If people >have flagged issues for the latest revision I'd appreciate
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
On 07.12.2017 14:32, Kavanagh, Mark B wrote: > Yuanhan's old email address was used in the previous mail - updated correctly > here. > -Mark > >> -Original Message- >> From: ovs-dev-boun...@openvswitch.org >> [mailto:ovs-dev-boun...@openvswitch.org] >> On Behalf Of Kavanagh, Mark B >> Sent: Thursday, December 7, 2017 11:24 AM >> To: Stokes, Ian; d...@openvswitch.org >> Cc: Liu, Yuanhan ; Bie, Tiwei ; >> Mcnamara, John ; maxime.coque...@redhat.com; >> i.maxim...@samsung.com >> Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support >> >> Hi folks, >> >> Thanks for all of your respective reviews and testing of this patchset. >> >> It seems, however, that an issue in DPDK v17.11 has come to light that >> affects >> guests which use testpmd. >> Specifically, traffic does not reach a guest when traffic is live prior to >> kicking off the testpmd app within said guest, and at least two forwarding >> cores are used. [1] >> >> This is explained in additional detail is [2], an extract of which is >> reproduced below: >> >> "the vector Rx could be broken if backend has consumed all the avail >> descs before the >> device is started. Because in current implementation, the vector Rx >> will >> return immediately >> without refilling the avail ring if the used ring is empty. So we have >> to refill >> the avail ring after flushing the elements in the used ring." >> >> This issue was initially uncovered by Antonio Fischetti, as part of the 17.11 >> patchset validation, and has since been localized to DPDK, rather than OvS. >> As a result, it seems now that we should not move to 17.11, but seek an out- >> of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on this - >> should we: >> a) move to 17.11 now, note the issue above in the 'errata' section of >> the >> documentation, and move to 17.11.1 when it becomes available in February of >> next year >> b) request the early release of the 17.11.1 stable branch, which >> incorporates a fix for this issue (the possibility, and availability, of >> which >> are both TBD). >> >> Thanks in advance, >> Mark Hmm. Isn't it a guest driver issue? Do we need to care so much about running OVS inside the VM? If I assumed right, if we're running OVS not inside the VM, there will be no issues on the OVS side. The issue happens if guest application uses DPDK 17.11, but it will happen regardless of the DPDK version on the host side. So, the only bad scenario is running OVS inside the VM with virtio-net PMD driver. My question is: Do we need to care about this scenario so much? If the answer is "not really" then we can use variant a). But if it's very important thing we need to support than b) will be preferable. Am I missing something? Best regards, Ilya Maximets. >> >> [1] http://dpdk.org/ml/archives/dev/2017-December/082801.html >> [2] http://dpdk.org/ml/archives/dev/2017-December/082874.html >> >> >>> -Original Message- >>> From: Stokes, Ian >>> Sent: Thursday, December 7, 2017 8:43 AM >>> To: Kavanagh, Mark B ; d...@openvswitch.org >>> Cc: ktray...@redhat.com; maxime.coque...@redhat.com; i.maxim...@samsung.com; >>> jan.scheur...@ericsson.com; Mooney, Sean K >>> Subject: RE: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support >>> DPDK v17.11 introduces support for the vHost IOMMU feature. This is a security feature, which restricts the vhost memory that a virtio device may access. >>> >>> Hi all, >>> >>> There were a few requests for changes in the v4 of this patch and it's an >>> important aspect of the DPDK 17.11 support for OVS. >>> >>> I'd like to get this series in to the pull request this week. If people have >>> flagged issues for the latest revision I'd appreciate if they could review >> the >>> latest revision and flag any new issues that need to be raised. >>> >>> Ian >>> This feature also enables the vhost REPLY_ACK protocol, the implementation of which is known to work in newer versions of QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the feature is disabled by default in (and should remain so), for the aforementioned older QEMU verions. Starting with QEMU v2.9.1, vhost-iommu-support can safely be enabled, even without having an IOMMU device, with no performance penalty. This patch adds a new global config option, vhost-iommu-support, that controls enablement of the vhost IOMMU feature: ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true This value defaults to false; to enable IOMMU support, this field should be set to true when setting other global parameters on init (such as "dpdk-socket-mem", for example). Changing the value at runtime is not supported, and requires
Re: [ovs-dev] [PATCH 2/2] Adding configuration option to whitelist DPDK physical ports.
Hi Sugesh, > -Original Message- > From: Chandran, Sugesh > Sent: Wednesday, December 6, 2017 6:23 PM > To: O Mahony, Billy; d...@openvswitch.org; > b...@ovn.org > Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to whitelist > DPDK physical ports. > > Thank you Billy for the review. > Please find below my reply. > > Regards > _Sugesh > > > > -Original Message- > > From: O Mahony, Billy > > Sent: Wednesday, December 6, 2017 5:31 PM > > To: Chandran, Sugesh ; d...@openvswitch.org; > > b...@ovn.org > > Subject: RE: [ovs-dev] [PATCH 2/2] Adding configuration option to > > whitelist DPDK physical ports. > > > > Hi Sugesh, > > > > This is definitely a very useful feature. I'm looking forward to > > running trex on the same DUT as my ovs-dpdk. > > > > However I'd suggest adding an sscanf or some such to verify that the > > domain is also specified for each whitelist member. And either add the > > default of '' or complain loudly if the domain is absent. > [Sugesh] Will throw an error in that case then . > > > > > Currently (without this patch) you must specify the domain when adding > > ports: > >Vsctl add-port ... options:dpdk-devargs=:05:00.0 Or else an > > error such as 'Cannot find unplugged device (05:00.0)' is reported. > > > > And with the patch if you include the domain in the other_config (e.g. > > other_config:dpdk-whitelist-pci-ids=":05:00.0") everything works > > just as before. > > > > However with the patch if you add the whitelist *without* a domain e.g. > > ovs-vsctl --no-wait set Open_vSwitch . > > other_config:dpdk-whitelist-pci- ids="05:00.0" > > > > There is no immediate error. However later when doing add-port if you > > include the domain (current required practice) you will get an error. > > If you omit the domain all is well. > [Sugesh] It looks to me, the dpdk-devargs need the PCI id with the ''. > But to bind and PCI scan its not necessary. > So to keep it consistent, I would add check for PCI-ID in whitelist config > too, and > throw error incase pci-id are mentioned wrong(means without ''. > Does it looks OK to you? [[BO'M]] I think the error is the right thing to do. It would be tempting to insert the default '' if the domain is omitted but then you would have a confusing inconsistency in that it would be ok to omit the domain in one place (whitelist) but not in the other (add-port). > > > > It's a little bit strange as regardless of domain or no domain in the > > other_config the PCI probe always reports the NIC as expected: > > 2017-12-06T16:55:27Z|00013|dpdk|INFO|EAL: PCI device :05:00.0 > > on NUMA socket -1 > > 2017-12-06T16:55:27Z|00014|dpdk|WARN|EAL: Invalid NUMA socket, > > default to 0 > > 2017-12-06T16:55:27Z|00015|dpdk|INFO|EAL: probe driver: 8086:1572 > > net_i40e > > > > I'll be using the other patch in this series "isolate rte-mempool > > allocation" over the next few days so I'll review that in due course. > > > > Thanks, > > Billy. > > > > > -Original Message- > > > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- > > > boun...@openvswitch.org] On Behalf Of Sugesh Chandran > > > Sent: Friday, November 10, 2017 1:29 AM > > > To: d...@openvswitch.org; b...@ovn.org > > > Subject: [ovs-dev] [PATCH 2/2] Adding configuration option to > > > whitelist DPDK physical ports. > > > > > > Adding a OVS configuration option to whitelist DPDK physical ports. > > > By default running multiple instances of DPDK on a single platform > > > cannot use physical ports at the same time even though they are distinct. > > > > > > The eal init scans all the ports that are bound to DPDK and > > > initialize the drivers accordingly. This happens for every DPDK process > > > init. > > > On a multi instance deployment usecase, it causes issues for using > > > physical NIC ports. > > > Consider a two DPDK process that are running on a single platform, > > > the second DPDK primary process will try to initialize the drivers > > > for all the physical ports even though it may be used in first DPDK > > > process. > > > > > > To avoid this situation user can whitelist the ports for each DPDK > > > application. > > > Whitelisting of ports/PCI-ID in a DPDK process will limit the > > > eal-init only on those ports. > > > > > > To whitelist two physical ports ":06:00.0" and ":06:00.1", > > > the configuration option in OVS would be > > > ovs-vsctl set Open_vSwitch . other_config:dpdk-whitelist-pci- > > > ids=":06:00.0,:06:00.1" > > > > > > To update the whitelist ports, OVS daemon has to be restarted. > > > > > > Signed-off-by: Sugesh Chandran > > > --- > > > lib/dpdk.c | 29 + > > > vswitchd/vswitch.xml | 21 + > > > 2 files changed, 50 insertions(+) > > > > > > diff --git a/lib/dpdk.c b/lib/dpdk.c index 9d187c7..0f11977 100644 > > > ---
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
> Hi folks, > > Thanks for all of your respective reviews and testing of this patchset. > > It seems, however, that an issue in DPDK v17.11 has come to light that > affects guests which use testpmd. > Specifically, traffic does not reach a guest when traffic is live prior to > kicking off the testpmd app within said guest, and at least two forwarding > cores are used. [1] > > This is explained in additional detail is [2], an extract of which is > reproduced below: > > "the vector Rx could be broken if backend has consumed all the avail > descs before the >device is started. Because in current implementation, the vector Rx > will return immediately >without refilling the avail ring if the used ring is empty. So we > have to refill >the avail ring after flushing the elements in the used ring." > Thanks for flagging this Mark. > This issue was initially uncovered by Antonio Fischetti, as part of the > 17.11 patchset validation, and has since been localized to DPDK, rather > than OvS. > As a result, it seems now that we should not move to 17.11, but seek an > out-of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on > this - should we: > a) move to 17.11 now, note the issue above in the 'errata' section of > the documentation, and move to 17.11.1 when it becomes available in > February of next year I'm not sure this would work, this seems like a pretty common use case, it would be quite an important errata for users to be aware of. > b) request the early release of the 17.11.1 stable branch, which > incorporates a fix for this issue (the possibility, and availability, of > which are both TBD). +1 for the above option. I'd like to see this issue resolved before moving but I'm open to others input. Thanks Ian > > Thanks in advance, > Mark > > [1] http://dpdk.org/ml/archives/dev/2017-December/082801.html > [2] http://dpdk.org/ml/archives/dev/2017-December/082874.html > > > >-Original Message- > >From: Stokes, Ian > >Sent: Thursday, December 7, 2017 8:43 AM > >To: Kavanagh, Mark B; d...@openvswitch.org > >Cc: ktray...@redhat.com; maxime.coque...@redhat.com; > i.maxim...@samsung.com; > >jan.scheur...@ericsson.com; Mooney, Sean K > >Subject: RE: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support > > > >> DPDK v17.11 introduces support for the vHost IOMMU feature. > >> This is a security feature, which restricts the vhost memory that a > virtio > >> device may access. > > > >Hi all, > > > >There were a few requests for changes in the v4 of this patch and it's an > >important aspect of the DPDK 17.11 support for OVS. > > > >I'd like to get this series in to the pull request this week. If people > have > >flagged issues for the latest revision I'd appreciate if they could > review the > >latest revision and flag any new issues that need to be raised. > > > >Ian > > > >> > >> This feature also enables the vhost REPLY_ACK protocol, the > implementation > >> of which is known to work in newer versions of QEMU (i.e. v2.10.0), but > is > >> buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the > feature > >> is disabled by default in (and should remain so), for the > aforementioned > >> older QEMU verions. Starting with QEMU v2.9.1, vhost-iommu-support can > >> safely be enabled, even without having an IOMMU device, with no > >> performance penalty. > >> > >> This patch adds a new global config option, vhost-iommu-support, that > >> controls enablement of the vhost IOMMU feature: > >> > >> ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true > >> > >> This value defaults to false; to enable IOMMU support, this field > should > >> be set to true when setting other global parameters on init (such as > >> "dpdk-socket-mem", for example). Changing the value at runtime is not > >> supported, and requires restarting the vswitch daemon. > >> > >> Signed-off-by: Mark Kavanagh > >> > >> --- > >> > >> v4: - rebase to HEAD of master > >> - clarify that IOMMU support applies exclusively to vhost user > >>client ports. > >> - reword caveats regarding modifying vhost-iommu-support at > runtime > >> - use smap_get_bool() to parse vhost-iommu-support, instead of > >>process_vhost_flags(). > >> - add stub implementation of dpdk_vhost_iommu_enabled(). > >> - move stdbool.h #include outside of DPDK compiler guards. > >> - remove mention of vhost IOMMU mode from > >>netdev_dpdk_vhost_client_configure() log. > >> > >> v3: - erroneous; disregard. > >> > >> v2: - rebase to HEAD of master > >> - refactor vHost IOMMU enablement mechanism (use a global > >>config option, instead of the previous per-port approach). > >> --- > >> Documentation/topics/dpdk/vhost-user.rst | 27 > +++ > >> NEWS | 1 + > >> lib/dpdk-stub.c
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
>From: Ilya Maximets [mailto:i.maxim...@samsung.com] >Sent: Thursday, December 7, 2017 9:49 AM >To: Kavanagh, Mark B; d...@openvswitch.org >Cc: ktray...@redhat.com; maxime.coque...@redhat.com; >jan.scheur...@ericsson.com; Mooney, Sean K ; Stokes, >Ian >Subject: Re: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support > >Looks good to me in general, but I prefer if you'll fix following >checkpatch warnings: Thanks for your comments Ilya - I'll address same in the next version of this patchset, depending on the outcome of this: https://mail.openvswitch.org/pipermail/ovs-dev/2017-December/341701.html Best regards, Mark > >== Checking 2e9587f27add ("netdev-dpdk: vHost IOMMU support") == >WARNING: Line length is >79-characters long >#52 FILE: Documentation/topics/dpdk/vhost-user.rst:280: >can access, and as such is useful in deployments in which security is a >concern. > >WARNING: Line length is >79-characters long >#54 FILE: Documentation/topics/dpdk/vhost-user.rst:282: >IOMMU support may be enabled via a global config value, ```vhost-iommu- >support```. > >WARNING: Line length is >79-characters long >#71 FILE: Documentation/topics/dpdk/vhost-user.rst:299: >default (and should remain so if using the aforementioned versions of >QEMU). > >One more comment inline. > >Best regards, Ilya Maximets. > >On 04.12.2017 14:15, Mark Kavanagh wrote: >> DPDK v17.11 introduces support for the vHost IOMMU feature. >> This is a security feature, which restricts the vhost memory >> that a virtio device may access. >> >> This feature also enables the vhost REPLY_ACK protocol, the >> implementation of which is known to work in newer versions of >> QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 - >> v2.9.0, inclusive). As such, the feature is disabled by default >> in (and should remain so), for the aforementioned older QEMU >> verions. Starting with QEMU v2.9.1, vhost-iommu-support can >> safely be enabled, even without having an IOMMU device, with >> no performance penalty. >> >> This patch adds a new global config option, vhost-iommu-support, >> that controls enablement of the vhost IOMMU feature: >> >> ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true >> >> This value defaults to false; to enable IOMMU support, this field >> should be set to true when setting other global parameters on init >> (such as "dpdk-socket-mem", for example). Changing the value at >> runtime is not supported, and requires restarting the vswitch daemon. >> >> Signed-off-by: Mark Kavanagh >> >> --- >> >> v4: - rebase to HEAD of master >> - clarify that IOMMU support applies exclusively to vhost user >>client ports. >> - reword caveats regarding modifying vhost-iommu-support at runtime >> - use smap_get_bool() to parse vhost-iommu-support, instead of >>process_vhost_flags(). >> - add stub implementation of dpdk_vhost_iommu_enabled(). >> - move stdbool.h #include outside of DPDK compiler guards. >> - remove mention of vhost IOMMU mode from >>netdev_dpdk_vhost_client_configure() log. >> >> v3: - erroneous; disregard. >> >> v2: - rebase to HEAD of master >> - refactor vHost IOMMU enablement mechanism (use a global >>config option, instead of the previous per-port approach). >> --- >> Documentation/topics/dpdk/vhost-user.rst | 27 +++ >> NEWS | 1 + >> lib/dpdk-stub.c | 6 ++ >> lib/dpdk.c | 12 >> lib/dpdk.h | 3 +++ >> lib/netdev-dpdk.c| 14 ++ >> vswitchd/vswitch.xml | 15 +++ >> 7 files changed, 74 insertions(+), 4 deletions(-) >> >> diff --git a/Documentation/topics/dpdk/vhost-user.rst >b/Documentation/topics/dpdk/vhost-user.rst >> index 5347995..ffef653 100644 >> --- a/Documentation/topics/dpdk/vhost-user.rst >> +++ b/Documentation/topics/dpdk/vhost-user.rst >> @@ -273,6 +273,33 @@ One benefit of using this mode is the ability for vHost >ports to 'reconnect' in >> event of the switch crashing or being brought down. Once it is brought back >up, >> the vHost ports will reconnect automatically and normal service will >resume. >> >> +vhost-user-client IOMMU Support >> +~~~ >> + >> +vhost IOMMU is a feature which restricts the vhost memory that a virtio >device >> +can access, and as such is useful in deployments in which security is a >concern. >> + >> +IOMMU support may be enabled via a global config value, ```vhost-iommu- >support```. >> +Setting this to true enables vhost IOMMU support for all vhost ports >when/where >> +available:: >> + >> +$ ovs-vsctl set Open_vSwitch.other_config:vhost-iommu-support=true > >Few spaces missing in above command. > >> + >> +The default
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
Yuanhan's old email address was used in the previous mail - updated correctly here. -Mark >-Original Message- >From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org] >On Behalf Of Kavanagh, Mark B >Sent: Thursday, December 7, 2017 11:24 AM >To: Stokes, Ian; d...@openvswitch.org >Cc: Liu, Yuanhan ; Bie, Tiwei ; >Mcnamara, John ; maxime.coque...@redhat.com; >i.maxim...@samsung.com >Subject: Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support > >Hi folks, > >Thanks for all of your respective reviews and testing of this patchset. > >It seems, however, that an issue in DPDK v17.11 has come to light that affects >guests which use testpmd. >Specifically, traffic does not reach a guest when traffic is live prior to >kicking off the testpmd app within said guest, and at least two forwarding >cores are used. [1] > >This is explained in additional detail is [2], an extract of which is >reproduced below: > > "the vector Rx could be broken if backend has consumed all the avail >descs before the >device is started. Because in current implementation, the vector Rx > will >return immediately >without refilling the avail ring if the used ring is empty. So we have >to refill >the avail ring after flushing the elements in the used ring." > >This issue was initially uncovered by Antonio Fischetti, as part of the 17.11 >patchset validation, and has since been localized to DPDK, rather than OvS. >As a result, it seems now that we should not move to 17.11, but seek an out- >of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on this - >should we: > a) move to 17.11 now, note the issue above in the 'errata' section of > the >documentation, and move to 17.11.1 when it becomes available in February of >next year > b) request the early release of the 17.11.1 stable branch, which >incorporates a fix for this issue (the possibility, and availability, of which >are both TBD). > >Thanks in advance, >Mark > >[1] http://dpdk.org/ml/archives/dev/2017-December/082801.html >[2] http://dpdk.org/ml/archives/dev/2017-December/082874.html > > >>-Original Message- >>From: Stokes, Ian >>Sent: Thursday, December 7, 2017 8:43 AM >>To: Kavanagh, Mark B ; d...@openvswitch.org >>Cc: ktray...@redhat.com; maxime.coque...@redhat.com; i.maxim...@samsung.com; >>jan.scheur...@ericsson.com; Mooney, Sean K >>Subject: RE: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support >> >>> DPDK v17.11 introduces support for the vHost IOMMU feature. >>> This is a security feature, which restricts the vhost memory that a virtio >>> device may access. >> >>Hi all, >> >>There were a few requests for changes in the v4 of this patch and it's an >>important aspect of the DPDK 17.11 support for OVS. >> >>I'd like to get this series in to the pull request this week. If people have >>flagged issues for the latest revision I'd appreciate if they could review >the >>latest revision and flag any new issues that need to be raised. >> >>Ian >> >>> >>> This feature also enables the vhost REPLY_ACK protocol, the implementation >>> of which is known to work in newer versions of QEMU (i.e. v2.10.0), but is >>> buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the feature >>> is disabled by default in (and should remain so), for the aforementioned >>> older QEMU verions. Starting with QEMU v2.9.1, vhost-iommu-support can >>> safely be enabled, even without having an IOMMU device, with no >>> performance penalty. >>> >>> This patch adds a new global config option, vhost-iommu-support, that >>> controls enablement of the vhost IOMMU feature: >>> >>> ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true >>> >>> This value defaults to false; to enable IOMMU support, this field should >>> be set to true when setting other global parameters on init (such as >>> "dpdk-socket-mem", for example). Changing the value at runtime is not >>> supported, and requires restarting the vswitch daemon. >>> >>> Signed-off-by: Mark Kavanagh >>> >>> --- >>> >>> v4: - rebase to HEAD of master >>> - clarify that IOMMU support applies exclusively to vhost user >>>client ports. >>> - reword caveats regarding modifying vhost-iommu-support at runtime >>> - use smap_get_bool() to parse vhost-iommu-support, instead of >>>process_vhost_flags(). >>> - add stub implementation of dpdk_vhost_iommu_enabled(). >>> - move stdbool.h #include outside of DPDK compiler guards. >>> - remove mention of vhost IOMMU mode from >>>netdev_dpdk_vhost_client_configure() log. >>> >>> v3: - erroneous; disregard. >>> >>> v2: - rebase to HEAD of master >>> - refactor vHost IOMMU enablement mechanism (use a global >>>config option, instead of the previous per-port
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
Hi folks, Thanks for all of your respective reviews and testing of this patchset. It seems, however, that an issue in DPDK v17.11 has come to light that affects guests which use testpmd. Specifically, traffic does not reach a guest when traffic is live prior to kicking off the testpmd app within said guest, and at least two forwarding cores are used. [1] This is explained in additional detail is [2], an extract of which is reproduced below: "the vector Rx could be broken if backend has consumed all the avail descs before the device is started. Because in current implementation, the vector Rx will return immediately without refilling the avail ring if the used ring is empty. So we have to refill the avail ring after flushing the elements in the used ring." This issue was initially uncovered by Antonio Fischetti, as part of the 17.11 patchset validation, and has since been localized to DPDK, rather than OvS. As a result, it seems now that we should not move to 17.11, but seek an out-of-tree 17.11.1 stable/bugfix release. I'm interested in opinions on this - should we: a) move to 17.11 now, note the issue above in the 'errata' section of the documentation, and move to 17.11.1 when it becomes available in February of next year b) request the early release of the 17.11.1 stable branch, which incorporates a fix for this issue (the possibility, and availability, of which are both TBD). Thanks in advance, Mark [1] http://dpdk.org/ml/archives/dev/2017-December/082801.html [2] http://dpdk.org/ml/archives/dev/2017-December/082874.html >-Original Message- >From: Stokes, Ian >Sent: Thursday, December 7, 2017 8:43 AM >To: Kavanagh, Mark B; d...@openvswitch.org >Cc: ktray...@redhat.com; maxime.coque...@redhat.com; i.maxim...@samsung.com; >jan.scheur...@ericsson.com; Mooney, Sean K >Subject: RE: [ovs-dev][PATCH V4 2/2] netdev-dpdk: vHost IOMMU support > >> DPDK v17.11 introduces support for the vHost IOMMU feature. >> This is a security feature, which restricts the vhost memory that a virtio >> device may access. > >Hi all, > >There were a few requests for changes in the v4 of this patch and it's an >important aspect of the DPDK 17.11 support for OVS. > >I'd like to get this series in to the pull request this week. If people have >flagged issues for the latest revision I'd appreciate if they could review the >latest revision and flag any new issues that need to be raised. > >Ian > >> >> This feature also enables the vhost REPLY_ACK protocol, the implementation >> of which is known to work in newer versions of QEMU (i.e. v2.10.0), but is >> buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the feature >> is disabled by default in (and should remain so), for the aforementioned >> older QEMU verions. Starting with QEMU v2.9.1, vhost-iommu-support can >> safely be enabled, even without having an IOMMU device, with no >> performance penalty. >> >> This patch adds a new global config option, vhost-iommu-support, that >> controls enablement of the vhost IOMMU feature: >> >> ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true >> >> This value defaults to false; to enable IOMMU support, this field should >> be set to true when setting other global parameters on init (such as >> "dpdk-socket-mem", for example). Changing the value at runtime is not >> supported, and requires restarting the vswitch daemon. >> >> Signed-off-by: Mark Kavanagh >> >> --- >> >> v4: - rebase to HEAD of master >> - clarify that IOMMU support applies exclusively to vhost user >>client ports. >> - reword caveats regarding modifying vhost-iommu-support at runtime >> - use smap_get_bool() to parse vhost-iommu-support, instead of >>process_vhost_flags(). >> - add stub implementation of dpdk_vhost_iommu_enabled(). >> - move stdbool.h #include outside of DPDK compiler guards. >> - remove mention of vhost IOMMU mode from >>netdev_dpdk_vhost_client_configure() log. >> >> v3: - erroneous; disregard. >> >> v2: - rebase to HEAD of master >> - refactor vHost IOMMU enablement mechanism (use a global >>config option, instead of the previous per-port approach). >> --- >> Documentation/topics/dpdk/vhost-user.rst | 27 +++ >> NEWS | 1 + >> lib/dpdk-stub.c | 6 ++ >> lib/dpdk.c | 12 >> lib/dpdk.h | 3 +++ >> lib/netdev-dpdk.c| 14 ++ >> vswitchd/vswitch.xml | 15 +++ >> 7 files changed, 74 insertions(+), 4 deletions(-) >> >> diff --git a/Documentation/topics/dpdk/vhost-user.rst >> b/Documentation/topics/dpdk/vhost-user.rst >> index 5347995..ffef653 100644 >> ---
Re: [ovs-dev] [PATCH v6 0/7] Output packet batching.
Thanks a lot for testing the series. >From my side I want to share testing results for my primary target of this patch set: PVP with DPDK guest and bonded physical ports in OVS. Setup looks like: netdev@ovs-netdev: br-virt: vhuost-user-1 (dpdkvhostuserclient, 1 rx and 1 tx queue) bond-patch (peer=virt-patch) br-bond: ens1f0 (dpdk) ens1f1 (dpdk) virt-patch (peer=bond-patch) pmd thread numa_id 0 core_id 1: isolated : true port: ens1f0queue-id: 0 port: ens1f1queue-id: 0 pmd thread numa_id 0 core_id 2: isolated : true port: vhuost-user-1queue-id: 0 Bridge "br-bond" Port "bond" Interface "ens1f1" Interface "ens1f0" Port "bond" is the balance-tcp OVS bond port. testpmd DPDK application works in guest in macswap mode. Packet size: 512B Results are in Mpps averaged over three 20 second runs. Results: master patchpatch FlowsMppstx-flush-interval=0 tx-flush-interval=50 83.891 4.236 + 8.8% 4.210+ 8.2% 256 2.612 3.268 +25.1% 3.077+17.8% 1024 2.509 3.144 +25.3% 2.935+16.9% 8192 2.031 2.384 +17.3% 2.379+17.1% 1048576 1.950 2.292 +17.5% 2.287+17.2% The patch-set provides constant significant improvement for all numbers of flows in that testing scenario up to 25% for medium number of flows. Time based batching shows less performance improvement (significantly less for medium numbers of flows) and not preferred in that case. Significant performance improvement achieved due to reduced number of send operations. With patches applied OVS makes only ~2 calls to netdev_send() per input batch (one call per port in bonding) instead of sending almost each packet separately. Best regards, Ilya Maximets. On 05.12.2017 20:26, Jan Scheurich wrote: > We have now repeated our earlier iperf3 tests for this patch series. > > https://mail.openvswitch.org/pipermail/ovs-dev/2017-September/338247.html > > > > We use an iperf3 server as representative for a typical IO-intensive kernel > application. The iperf3 server executes in a VM with 2 vCPUs where both > virtio interrupts and iperf process are pinned to the same vCPU for best > performance. We run two iperf3 clients in parallel on a different server to > avoid the client to become the bottleneck when enabling tx batching. > > > > OVS tx-flush- iperf3 Avg. PMD PMD Iperf ping -f > > version interval Gbps cycles/pkt util CPU load avg rtt > > > -- > > master - 7.24 1778 46.5% 99.7% 23 us > > Patch v6 0 7.18 1873 47.7% 100.0% 29 us > > Patch v6 50 8.99 1108 36.3% 99.7% 38 us > > Patch v6 100 - 88 us > > > > In all cases the vCPU capacity of the of the server VM handing the virtio > interrupts and the iperf3 server thread is the bottleneck. The TCP throughput > is throttled by packets being dropped on Tx to the vhostuser port of the > server VM. The Linux kernel is not fast enough to handle the interrupts and > poll the incoming packets. > > > > As expected the tx batching patch alone with tx-flush-interval=0 does not > provide any benefit as it doesn’t reduce the virtio interrupt rate. > > > > Setting the tx-flush-interval to 50 microseconds immediately improves the > throughput: The PMD utilization drops from 47% to 36% due to the reduced rate > of write calls to the virtio kick fd. (I believe the more pronounced drop in > processing cycles/pkt is an artifact of the patch. The cycles used for > delayed tx to vhostuser are no longer counted as packet processing cost. To > be checked in the individual patch review.) > > > > More importantly, the iperf3 server VM can now receive 8.99 instead of 7.24 > Gbit/s, an increase by 24%. I am sure that 10G line rate could be reached > with vhost multi-queue in the server VM. > > > > Compared to the v4 version of the patches, the impact on latency is now > reduced a lot. Packets with an inter-arrival time larger than the configured > tx-flush-interval are not affected at all. For a 50 us tx-flush-interval this > means packet flows with a packet rate of up to 20 Kpps! > > > > Hence the average RTT reported by “ping -f” experience only a small increases > from 23 us on master to 38 us with tx-flush-interval=50. Only when increasing > tx-flush-interval well beyond the intrinsic average inter-arrival time, it > translates directly into increased latency. > > > > Conclusion: Time-based tx batching fulfills the expectations for >
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
Looks good to me in general, but I prefer if you'll fix following checkpatch warnings: == Checking 2e9587f27add ("netdev-dpdk: vHost IOMMU support") == WARNING: Line length is >79-characters long #52 FILE: Documentation/topics/dpdk/vhost-user.rst:280: can access, and as such is useful in deployments in which security is a concern. WARNING: Line length is >79-characters long #54 FILE: Documentation/topics/dpdk/vhost-user.rst:282: IOMMU support may be enabled via a global config value, ```vhost-iommu-support```. WARNING: Line length is >79-characters long #71 FILE: Documentation/topics/dpdk/vhost-user.rst:299: default (and should remain so if using the aforementioned versions of QEMU). One more comment inline. Best regards, Ilya Maximets. On 04.12.2017 14:15, Mark Kavanagh wrote: > DPDK v17.11 introduces support for the vHost IOMMU feature. > This is a security feature, which restricts the vhost memory > that a virtio device may access. > > This feature also enables the vhost REPLY_ACK protocol, the > implementation of which is known to work in newer versions of > QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 - > v2.9.0, inclusive). As such, the feature is disabled by default > in (and should remain so), for the aforementioned older QEMU > verions. Starting with QEMU v2.9.1, vhost-iommu-support can > safely be enabled, even without having an IOMMU device, with > no performance penalty. > > This patch adds a new global config option, vhost-iommu-support, > that controls enablement of the vhost IOMMU feature: > > ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true > > This value defaults to false; to enable IOMMU support, this field > should be set to true when setting other global parameters on init > (such as "dpdk-socket-mem", for example). Changing the value at > runtime is not supported, and requires restarting the vswitch daemon. > > Signed-off-by: Mark Kavanagh> > --- > > v4: - rebase to HEAD of master > - clarify that IOMMU support applies exclusively to vhost user >client ports. > - reword caveats regarding modifying vhost-iommu-support at runtime > - use smap_get_bool() to parse vhost-iommu-support, instead of >process_vhost_flags(). > - add stub implementation of dpdk_vhost_iommu_enabled(). > - move stdbool.h #include outside of DPDK compiler guards. > - remove mention of vhost IOMMU mode from >netdev_dpdk_vhost_client_configure() log. > > v3: - erroneous; disregard. > > v2: - rebase to HEAD of master > - refactor vHost IOMMU enablement mechanism (use a global >config option, instead of the previous per-port approach). > --- > Documentation/topics/dpdk/vhost-user.rst | 27 +++ > NEWS | 1 + > lib/dpdk-stub.c | 6 ++ > lib/dpdk.c | 12 > lib/dpdk.h | 3 +++ > lib/netdev-dpdk.c| 14 ++ > vswitchd/vswitch.xml | 15 +++ > 7 files changed, 74 insertions(+), 4 deletions(-) > > diff --git a/Documentation/topics/dpdk/vhost-user.rst > b/Documentation/topics/dpdk/vhost-user.rst > index 5347995..ffef653 100644 > --- a/Documentation/topics/dpdk/vhost-user.rst > +++ b/Documentation/topics/dpdk/vhost-user.rst > @@ -273,6 +273,33 @@ One benefit of using this mode is the ability for vHost > ports to 'reconnect' in > event of the switch crashing or being brought down. Once it is brought back > up, > the vHost ports will reconnect automatically and normal service will resume. > > +vhost-user-client IOMMU Support > +~~~ > + > +vhost IOMMU is a feature which restricts the vhost memory that a virtio > device > +can access, and as such is useful in deployments in which security is a > concern. > + > +IOMMU support may be enabled via a global config value, > ```vhost-iommu-support```. > +Setting this to true enables vhost IOMMU support for all vhost ports > when/where > +available:: > + > +$ ovs-vsctl set Open_vSwitch.other_config:vhost-iommu-support=true Few spaces missing in above command. > + > +The default value is false. > + > +.. important:: > + > +Changing this value requires restarting the daemon. > + > +.. important:: > + > +Enabling the IOMMU feature also enables the vhost user reply-ack > protocol; > +this is known to work on QEMU v2.10.0, but is buggy on older versions > +(2.7.0 - 2.9.0, inclusive). Consequently, the IOMMU feaure is disabled by > +default (and should remain so if using the aforementioned versions of > QEMU). > +Starting with QEMU v2.9.1, vhost-iommu-support can safely be enabled, > even > +without having an IOMMU device, with no performance penalty. > + > .. _dpdk-testpmd: > > DPDK in the Guest > diff --git a/NEWS b/NEWS > index
Re: [ovs-dev] [PATCH V4 2/2] netdev-dpdk: vHost IOMMU support
> DPDK v17.11 introduces support for the vHost IOMMU feature. > This is a security feature, which restricts the vhost memory that a virtio > device may access. Hi all, There were a few requests for changes in the v4 of this patch and it's an important aspect of the DPDK 17.11 support for OVS. I'd like to get this series in to the pull request this week. If people have flagged issues for the latest revision I'd appreciate if they could review the latest revision and flag any new issues that need to be raised. Ian > > This feature also enables the vhost REPLY_ACK protocol, the implementation > of which is known to work in newer versions of QEMU (i.e. v2.10.0), but is > buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the feature > is disabled by default in (and should remain so), for the aforementioned > older QEMU verions. Starting with QEMU v2.9.1, vhost-iommu-support can > safely be enabled, even without having an IOMMU device, with no > performance penalty. > > This patch adds a new global config option, vhost-iommu-support, that > controls enablement of the vhost IOMMU feature: > > ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true > > This value defaults to false; to enable IOMMU support, this field should > be set to true when setting other global parameters on init (such as > "dpdk-socket-mem", for example). Changing the value at runtime is not > supported, and requires restarting the vswitch daemon. > > Signed-off-by: Mark Kavanagh> > --- > > v4: - rebase to HEAD of master > - clarify that IOMMU support applies exclusively to vhost user >client ports. > - reword caveats regarding modifying vhost-iommu-support at runtime > - use smap_get_bool() to parse vhost-iommu-support, instead of >process_vhost_flags(). > - add stub implementation of dpdk_vhost_iommu_enabled(). > - move stdbool.h #include outside of DPDK compiler guards. > - remove mention of vhost IOMMU mode from >netdev_dpdk_vhost_client_configure() log. > > v3: - erroneous; disregard. > > v2: - rebase to HEAD of master > - refactor vHost IOMMU enablement mechanism (use a global >config option, instead of the previous per-port approach). > --- > Documentation/topics/dpdk/vhost-user.rst | 27 +++ > NEWS | 1 + > lib/dpdk-stub.c | 6 ++ > lib/dpdk.c | 12 > lib/dpdk.h | 3 +++ > lib/netdev-dpdk.c| 14 ++ > vswitchd/vswitch.xml | 15 +++ > 7 files changed, 74 insertions(+), 4 deletions(-) > > diff --git a/Documentation/topics/dpdk/vhost-user.rst > b/Documentation/topics/dpdk/vhost-user.rst > index 5347995..ffef653 100644 > --- a/Documentation/topics/dpdk/vhost-user.rst > +++ b/Documentation/topics/dpdk/vhost-user.rst > @@ -273,6 +273,33 @@ One benefit of using this mode is the ability for > vHost ports to 'reconnect' in event of the switch crashing or being > brought down. Once it is brought back up, the vHost ports will reconnect > automatically and normal service will resume. > > +vhost-user-client IOMMU Support > +~~~ > + > +vhost IOMMU is a feature which restricts the vhost memory that a virtio > +device can access, and as such is useful in deployments in which security > is a concern. > + > +IOMMU support may be enabled via a global config value, ```vhost-iommu- > support```. > +Setting this to true enables vhost IOMMU support for all vhost ports > +when/where > +available:: > + > +$ ovs-vsctl set Open_vSwitch.other_config:vhost-iommu-support=true > + > +The default value is false. > + > +.. important:: > + > +Changing this value requires restarting the daemon. > + > +.. important:: > + > +Enabling the IOMMU feature also enables the vhost user reply-ack > protocol; > +this is known to work on QEMU v2.10.0, but is buggy on older versions > +(2.7.0 - 2.9.0, inclusive). Consequently, the IOMMU feaure is > disabled by > +default (and should remain so if using the aforementioned versions of > QEMU). > +Starting with QEMU v2.9.1, vhost-iommu-support can safely be enabled, > even > +without having an IOMMU device, with no performance penalty. > + > .. _dpdk-testpmd: > > DPDK in the Guest > diff --git a/NEWS b/NEWS > index d4a1c9a..99c47d8 100644 > --- a/NEWS > +++ b/NEWS > @@ -15,6 +15,7 @@ Post-v2.8.0 > * Add support for compiling OVS with the latest Linux 4.13 kernel > - DPDK: > * Add support for DPDK v17.11 > + * Add support for vHost IOMMU > > v2.8.0 - 31 Aug 2017 > > diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c index daef729..3602180 > 100644 > --- a/lib/dpdk-stub.c > +++ b/lib/dpdk-stub.c > @@ -48,3 +48,9 @@ dpdk_get_vhost_sock_dir(void) { > return