Re: [ovs-dev] [PATCH] datapath-windows: Do not modify port field for ICMP during SNAT/DNAT

2017-08-10 Thread Shashank Ram


From: ovs-dev-boun...@openvswitch.org  on 
behalf of Anand Kumar 
Sent: Thursday, August 10, 2017 8:59 PM
To: d...@openvswitch.org
Subject: [ovs-dev] [PATCH] datapath-windows: Do not modify port field for   
ICMP during SNAT/DNAT

During SNAT/DNAT, we should not update the port field of the ct_endpoint
struct, as ICMP packets do not carry port information. Since port and
icmp_id overlap in the ct_endpoint struct, icmp_id gets changed.
As a result, the NAT lookup fails to find a matching entry.

This patch addresses the issue by leaving the icmp_id field unmodified
during SNAT/DNAT for ICMP traffic.

The current NAT module doesn't take the ICMP type/id/code into account
during lookups. Fix this to make it consistent with the rest of the
conntrack module.

Signed-off-by: Anand Kumar 
___

Acked-by: Shashank Ram 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] datapath-windows: Do not modify port field for ICMP during SNAT/DNAT

2017-08-10 Thread Anand Kumar
During SNAT/DNAT, we should not update the port field of the ct_endpoint
struct, as ICMP packets do not carry port information. Since port and
icmp_id overlap in the ct_endpoint struct, icmp_id gets changed.
As a result, the NAT lookup fails to find a matching entry.

This patch addresses the issue by leaving the icmp_id field unmodified
during SNAT/DNAT for ICMP traffic.

The current NAT module doesn't take the ICMP type/id/code into account
during lookups. Fix this to make it consistent with the rest of the
conntrack module.

Signed-off-by: Anand Kumar 
---
 datapath-windows/ovsext/Conntrack-nat.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack-nat.c 
b/datapath-windows/ovsext/Conntrack-nat.c
index ae6b92c..eb6f9db 100644
--- a/datapath-windows/ovsext/Conntrack-nat.c
+++ b/datapath-windows/ovsext/Conntrack-nat.c
@@ -22,6 +22,12 @@ OvsHashNatKey(const OVS_CT_KEY *key)
 HASH_ADD(src.port);
 HASH_ADD(dst.port);
 HASH_ADD(zone);
+/* icmp_id and port overlap in the union */
+HASH_ADD(src.icmp_type);
+HASH_ADD(dst.icmp_type);
+HASH_ADD(src.icmp_code);
+HASH_ADD(dst.icmp_code);
+
 #undef HASH_ADD
 return hash;
 }
@@ -44,6 +50,12 @@ OvsNatKeyAreSame(const OVS_CT_KEY *key1, const OVS_CT_KEY 
*key2)
 FIELD_COMPARE(src.port);
 FIELD_COMPARE(dst.port);
 FIELD_COMPARE(zone);
+FIELD_COMPARE(src.icmp_id);
+FIELD_COMPARE(dst.icmp_id);
+FIELD_COMPARE(src.icmp_type);
+FIELD_COMPARE(dst.icmp_type);
+FIELD_COMPARE(src.icmp_code);
+FIELD_COMPARE(dst.icmp_code);
 return TRUE;
 #undef FIELD_COMPARE
 }
@@ -253,6 +265,7 @@ OvsNatAddEntry(OVS_NAT_ENTRY* entry)
  * Update an Conntrack entry with NAT information. Translated address and
  * port will be generated and write back to the conntrack entry as a
  * result.
+ * Note: For ICMP, only address is translated.
  *
  */
 BOOLEAN
@@ -271,7 +284,7 @@ OvsNatTranslateCtEntry(OVS_CT_ENTRY *entry)
 BOOLEAN allPortsTried;
 BOOLEAN originalPortsTried;
 struct ct_addr firstAddr;
-
+
 uint32_t hash = OvsNatHashRange(entry, 0);
 
 if ((entry->natInfo.natAction & NAT_ACTION_SRC) &&
@@ -310,10 +323,14 @@ OvsNatTranslateCtEntry(OVS_CT_ENTRY *entry)
 for (;;) {
 if (entry->natInfo.natAction & NAT_ACTION_SRC) {
 entry->rev_key.dst.addr = ctAddr;
-entry->rev_key.dst.port = htons(port);
+if (entry->rev_key.nw_proto != IPPROTO_ICMP) {
+entry->rev_key.dst.port = htons(port);
+}
 } else {
 entry->rev_key.src.addr = ctAddr;
-entry->rev_key.src.port = htons(port);
+if (entry->rev_key.nw_proto != IPPROTO_ICMP) {
+entry->rev_key.src.port = htons(port);
+}
 }
 
 OVS_NAT_ENTRY *natEntry = OvsNatLookup(&entry->rev_key, TRUE);
-- 
2.9.3.windows.1



Re: [ovs-dev] [PATCHv2 1/4] ovsdb-idl: Avoid class declaration.

2017-08-10 Thread Ben Pfaff
On Thu, Aug 10, 2017 at 02:31:43PM -0700, Joe Stringer wrote:
> On 10 August 2017 at 11:41, Ben Pfaff  wrote:
> > On Thu, Aug 10, 2017 at 01:01:32PM +0800, Gao Zhenyu wrote:
> >> Besides of that, I see many places consume the table class.
> >> Do you mind to make a macro helps to fetch the class?
> >> Like:
> >>
> >> #define OVSDB_GET_TABLE_CLASS(row) \
> >> ((row)->table->class)
> >
> > Ugh.
> 
> Just in case the context isn't obvious... ;-)
> 
> I think that in general, we try not to obscure what's going on in OVS
> code. This particular suggestion makes it less clear when reading the
> code exactly where the class comes from, because the pointer
> dereference is then hidden behind the macro. It also makes lines
> referencing the class longer, and some of these lines are already
> fairly long.

That's a much better way of explaining what I meant.  Thanks, Joe.


[ovs-dev] How to make OVS working with IVSHMEM ?

2017-08-10 Thread Furong


Hello, I have encountered a problem running OVS with IVSHMEM.

I've followed the INSTALL.DPDK.md guide of ovs-2.5.0 to use IVSHMEM.

Firstly, I started ovs-vswitchd using "./sbin/ovs-vswitchd --dpdk -c 0x1
-n 4 --proc-type=primary -- --pidfile --detach", and I added dpdk
ports(dpdk0,dpdk1) and dpdk ring(dpdkr0) to ovs.

Then, I started ./openvswitch-2.5.0/tests/test-dpdkr using "./test-dpdkr
-c 1 -n 4 --proc-type=secondary -- -n 0", but the EAL initialization of
this program got stuck after printing "EAL: Detected 16 lcore(s)" to stdout.

Does anybody have successful experience running OVS with IVSHMEM?
Please share it with me. Many thanks!




Re: [ovs-dev] [PATCH v4 5/5] dpif-netdev: Flush the packets in intermediate queue.

2017-08-10 Thread Darrell Ball
Hi Bhanu

Given that you ultimately intend changes beyond those in this patch, would it
make sense to fold the follow-up series (at least the key elements) into this
series, essentially expanding on patch 5?

Thanks Darrell

-Original Message-
From:  on behalf of Bhanuprakash Bodireddy 

Date: Tuesday, August 8, 2017 at 10:06 AM
To: "d...@openvswitch.org" 
Subject: [ovs-dev] [PATCH v4 5/5] dpif-netdev: Flush the packets in 
intermediate queue.

Under low rate traffic conditions, there can be two issues.
  (1) Packets can potentially get stuck in the intermediate queue.
  (2) Packet latency can increase significantly due to
      buffering in the intermediate queue.

This commit handles issue (1) by flushing the tx port queues using
dp_netdev_flush_txq_ports() as part of the PMD packet processing loop.
It also addresses issue (2) by flushing the tx queues after every
rxq port is processed. This reduces latency without impacting the
forwarding throughput.

   MASTER
   ------
   Pkt size   min(ns)   avg(ns)   max(ns)
    512        4,631     5,022    309,914
   1024        5,545     5,749    104,294
   1280        5,978     6,159     45,306
   1518        6,419     6,774    946,850

   MASTER + COMMIT
   ---------------
   Pkt size   min(ns)   avg(ns)   max(ns)
    512        4,711     5,064    182,477
   1024        5,601     5,888    701,654
   1280        6,018     6,491    533,037
   1518        6,467     6,734    312,471

PMDs can be torn down and spawned at runtime, so the rxq and txq
mapping of the PMD threads can change. In a few cases packets can get
stuck in the queue due to reconfiguration, and this commit helps flush
the queues.

Suggested-by: Eelco Chaudron 
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-April/331039.html
Signed-off-by: Bhanuprakash Bodireddy 
Signed-off-by: Antonio Fischetti 
Co-authored-by: Antonio Fischetti 
Signed-off-by: Markus Magnusson 
Co-authored-by: Markus Magnusson 
Acked-by: Eelco Chaudron 
---
 lib/dpif-netdev.c | 52 +++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e2cd931..bfb9650 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -340,6 +340,7 @@ enum pmd_cycles_counter_type {
 };
 
 #define XPS_TIMEOUT_MS 500LL
+#define LAST_USED_QID_NONE -1
 
 /* Contained by struct dp_netdev_port's 'rxqs' member.  */
 struct dp_netdev_rxq {
@@ -500,7 +501,13 @@ struct rxq_poll {
 struct tx_port {
 struct dp_netdev_port *port;
 int qid;
-long long last_used;
+int last_used_qid;/* Last queue id where packets got
+ enqueued. */
+long long last_used;  /* In case XPS is enabled, it contains the
+   * timestamp of the last time the port was
+   * used by the thread to send data.  After
+   * XPS_TIMEOUT_MS elapses the qid will be
+   * marked as -1. */
 struct hmap_node node;
 };
 
@@ -3101,6 +3108,25 @@ cycles_count_intermediate(struct 
dp_netdev_pmd_thread *pmd,
 non_atomic_ullong_add(&pmd->cycles.n[type], interval);
 }
 
+static void
+dp_netdev_flush_txq_ports(struct dp_netdev_pmd_thread *pmd)
+{
+struct tx_port *cached_tx_port;
+int tx_qid;
+
+HMAP_FOR_EACH (cached_tx_port, node, &pmd->send_port_cache) {
+tx_qid = cached_tx_port->last_used_qid;
+
+if (tx_qid != LAST_USED_QID_NONE) {
+netdev_txq_flush(cached_tx_port->port->netdev, tx_qid,
+ cached_tx_port->port->dynamic_txqs);
+
+/* Queue flushed and mark it empty. */
+cached_tx_port->last_used_qid = LAST_USED_QID_NONE;
+}
+}
+}
+
 static int
 dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
struct netdev_rxq *rx,
@@ -3667,6 +3693,9 @@ dpif_netdev_run(struct dpif *dpif)
 dp_netdev_process_rxq_port(non_pmd,

[ovs-dev] DPDK Merge Repo Location

2017-08-10 Thread Darrell Ball
Hi All

As discussed in the fortnightly DPDK meeting, I am using a repo for DPDK patch 
merging.
The repo is here: https://github.com/darball/ovs/
Branch is dpdk_merge - based on ovs master.

Nothing for this week, as reviews are not completed yet for some features.

Thanks Darrell


Re: [ovs-dev] [PATCH v4 6/6] dpif-netdev: Add ovs-appctl dpif-netdev/pmd-rxq-rebalance.

2017-08-10 Thread Greg Rose

On 08/09/2017 08:45 AM, Kevin Traynor wrote:

The processing cycles consumed by rxqs are used to improve the balance
of how rxqs are assigned to pmds. Currently some reconfiguration
is needed to perform a reassignment.

Add an ovs-appctl command to perform a new assignment in order
to balance based on the latest rxq processing cycle information.

Note: Jan requested this for testing purposes.

Suggested-by: Jan Scheurich 
Signed-off-by: Kevin Traynor 
---
  Documentation/howto/dpdk.rst |  5 -
  lib/dpif-netdev.c| 35 +++
  vswitchd/ovs-vswitchd.8.in   |  2 ++
  3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 493e215..5d009bd 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -140,5 +140,8 @@ Core 7: Q4 (70%) | Q5 (10%)
  core 8: Q3 (60%) | Q0 (30%)

-Rxq to pmds assignment takes place whenever there are configuration changes.
+Rxq to pmds assignment takes place whenever there are configuration changes
+or can be triggered by using::
+
+$ ovs-appctl dpif-netdev/pmd-rxq-rebalance

  QoS
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index b0f4010..8857489 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -708,4 +708,6 @@ static inline bool emc_entry_alive(struct emc_entry *ce);
  static void emc_clear_entry(struct emc_entry *ce);

+static void dp_netdev_request_reconfigure(struct dp_netdev *dp);
+
  static void
  emc_cache_init(struct emc_cache *flow_cache)
@@ -1001,4 +1003,34 @@ sorted_poll_thread_list(struct dp_netdev *dp,

  static void
+dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc,
+  const char *argv[], void *aux OVS_UNUSED)
+{
+struct ds reply = DS_EMPTY_INITIALIZER;
+struct dp_netdev *dp = NULL;
+
+ovs_mutex_lock(&dp_netdev_mutex);
+
+if (argc == 2) {
+dp = shash_find_data(&dp_netdevs, argv[1]);
+} else if (shash_count(&dp_netdevs) == 1) {
+/* There's only one datapath */
+dp = shash_first(&dp_netdevs)->data;
+}
+
+if (!dp) {
+ovs_mutex_unlock(&dp_netdev_mutex);
+unixctl_command_reply_error(conn,
+"please specify an existing datapath");
+return;
+}
+
+dp_netdev_request_reconfigure(dp);
+ovs_mutex_unlock(&dp_netdev_mutex);
+ds_put_cstr(&reply, "pmd rxq rebalance requested.\n");
+unixctl_command_reply(conn, ds_cstr(&reply));
+ds_destroy(&reply);
+}
+
+static void
  dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
   void *aux)
@@ -1078,4 +1110,7 @@ dpif_netdev_init(void)
   0, 1, dpif_netdev_pmd_info,
   (void *)&poll_aux);
+unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
+ 0, 1, dpif_netdev_pmd_rebalance,
+ NULL);
  return 0;
  }
diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in
index dfd209e..c18baf6 100644
--- a/vswitchd/ovs-vswitchd.8.in
+++ b/vswitchd/ovs-vswitchd.8.in
@@ -281,4 +281,6 @@ bridge statistics, only the values shown by the above 
command.
  For each pmd thread of the datapath \fIdp\fR shows list of queue-ids with
  port names, which this thread polls.
+.IP "\fBdpif-netdev/pmd-rxq-rebalance\fR [\fIdp\fR]"
+Reassigns rxqs to pmds in the datapath \fIdp\fR based on their current usage.
  .
  .so ofproto/ofproto-dpif-unixctl.man


Tested-by: Greg Rose 
Reviewed-by: Greg Rose 



Re: [ovs-dev] [PATCH v4 5/6] dpif-netdev: Change pmd selection order.

2017-08-10 Thread Greg Rose

On 08/09/2017 08:45 AM, Kevin Traynor wrote:

Up to this point rxqs are sorted by the processing cycles they
consumed and assigned to pmds in a round robin manner.

Ian pointed out that on wrap around the most loaded pmd will be
the next one to be assigned an additional rxq and that it would be
better to reverse the pmd order when wraparound occurs.

In other words, change from assigning by rr to assigning in a forward
and reverse cycle through pmds.

Also, now that the algorithm has been finalised, document an example.

Suggested-by: Ian Stokes 
Signed-off-by: Kevin Traynor 
---
  Documentation/howto/dpdk.rst | 16 
  lib/dpif-netdev.c| 21 -
  tests/pmd.at |  2 +-
  3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 44737e4..493e215 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -124,4 +124,20 @@ will be used where known to assign rxqs with the highest 
consumption of
  processing cycles to different pmds.

+For example, in the case where there are 5 rxqs and 3 cores (e.g. 3,7,8)
+available, and the measured usage of core cycles per rxq over the last
+interval is seen to be:
+
+- Queue #0: 30%
+- Queue #1: 80%
+- Queue #3: 60%
+- Queue #4: 70%
+- Queue #5: 10%
+
+The rxqs will be assigned to cores 3,7,8 in the following order:
+
+Core 3: Q1 (80%) |
+Core 7: Q4 (70%) | Q5 (10%)
+core 8: Q3 (60%) | Q0 (30%)
+
  Rxq to pmds assignment takes place whenever there are configuration changes.

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index b4663ab..b0f4010 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3251,4 +3251,5 @@ struct rr_numa {

  int cur_index;
+bool idx_inc;
  };

@@ -3307,4 +3308,7 @@ rr_numa_list_populate(struct dp_netdev *dp, struct 
rr_numa_list *rr)
  numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa->pmds);
  numa->pmds[numa->n_pmds - 1] = pmd;
+/* At least one pmd so initialise curr_idx and idx_inc. */
+numa->cur_index = 0;
+numa->idx_inc = true;
  }
  }
@@ -3313,5 +3317,20 @@ static struct dp_netdev_pmd_thread *
  rr_numa_get_pmd(struct rr_numa *numa)
  {
-return numa->pmds[numa->cur_index++ % numa->n_pmds];
+int numa_idx = numa->cur_index;
+
+if (numa->idx_inc == true) {
+if (numa->cur_index == numa->n_pmds-1) {
+numa->idx_inc = false;
+} else {
+numa->cur_index++;
+}
+} else {
+if (numa->cur_index == 0) {
+numa->idx_inc = true;
+} else {
+numa->cur_index--;
+}
+}
+return numa->pmds[numa_idx];
  }

diff --git a/tests/pmd.at b/tests/pmd.at
index b6732ea..e39a23a 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -54,5 +54,5 @@ m4_define([CHECK_PMD_THREADS_CREATED], [

  m4_define([SED_NUMA_CORE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id 
\)[[0-9]]*:/\1\2:/"])
-m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id 
\)[[0-9]]*:/\1\2:/;s/\(queue-id: \)0 2 4 6/\1/;s/\(queue-id: \)1 3 5 
7/\1/"])
+m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id 
\)[[0-9]]*:/\1\2:/;s/\(queue-id: \)1 2 5 6/\1/;s/\(queue-id: \)0 3 4 
7/\1/"])
  m4_define([DUMMY_NUMA], [--dummy-numa="0,0,0,0"])



Tested-by: Greg Rose 
Reviewed-by: Greg Rose 


Re: [ovs-dev] [PATCH v4 4/6] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.

2017-08-10 Thread Greg Rose

On 08/09/2017 08:45 AM, Kevin Traynor wrote:

Previously rxqs were assigned to pmds by round robin in
port/queue order.

Now that we have the processing cycles used for existing rxqs,
use that information to try and produce a better balanced
distribution of rxqs across pmds. i.e. given multiple pmds, the
rxqs which have consumed the largest amount of processing cycles
will be placed on different pmds.

The rxqs are sorted by their processing cycles and assigned (in
sorted order) round robin across pmds.

Signed-off-by: Kevin Traynor 
---
  Documentation/howto/dpdk.rst |   7 +++
  lib/dpif-netdev.c| 105 ++-
  2 files changed, 81 insertions(+), 31 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index d7f6610..44737e4 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -119,4 +119,11 @@ After that PMD threads on cores where RX queues was pinned 
will become
thread.

+If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
+automatically. The processing cycles that have been required for each rxq
+will be used where known to assign rxqs with the highest consumption of
+processing cycles to different pmds.
+
+Rxq to pmds assignment takes place whenever there are configuration changes.
+
  QoS
  ---
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e344063..b4663ab 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3328,8 +3328,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
  }

+/* Sort Rx Queues by the processing cycles they are consuming. */
+static int
+rxq_cycle_sort(const void *a, const void *b)
+{
+struct dp_netdev_rxq * qa;
+struct dp_netdev_rxq * qb;
+
+qa = *(struct dp_netdev_rxq **) a;
+qb = *(struct dp_netdev_rxq **) b;
+
+if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >=
+dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) {
+return -1;
+}
+
+return 1;
+}
+
  /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
   * queues and marks the pmds as isolated.  Otherwise, assign non isolated
   * pmds to unpinned queues.
   *
+ * If 'pinned' is false queues will be sorted by processing cycles they are
+ * consuming and then assigned to pmds in round robin order.
+ *
   * The function doesn't touch the pmd threads, it just stores the assignment
   * in the 'pmd' member of each rxq. */
@@ -3340,18 +3361,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) 
OVS_REQUIRES(dp->port_mutex)
  struct rr_numa_list rr;
  struct rr_numa *non_local_numa = NULL;
-
-rr_numa_list_populate(dp, &rr);
+struct dp_netdev_rxq ** rxqs = NULL;
+int i, n_rxqs = 0;
+struct rr_numa *numa = NULL;
+int numa_id;

   HMAP_FOR_EACH (port, node, &dp->ports) {
-struct rr_numa *numa;
-int numa_id;
-
  if (!netdev_is_pmd(port->netdev)) {
  continue;
  }

-numa_id = netdev_get_numa_id(port->netdev);
-numa = rr_numa_list_lookup(&rr, numa_id);
-
  for (int qid = 0; qid < port->n_rxq; qid++) {
   struct dp_netdev_rxq *q = &port->rxqs[qid];
@@ -3371,34 +3388,60 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) 
OVS_REQUIRES(dp->port_mutex)
  }
  } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
-if (!numa) {
-/* There are no pmds on the queue's local NUMA node.
-   Round-robin on the NUMA nodes that do have pmds. */
-non_local_numa = rr_numa_list_next(&rr, non_local_numa);
-if (!non_local_numa) {
-VLOG_ERR("There is no available (non-isolated) pmd "
- "thread for port \'%s\' queue %d. This queue "
- "will not be polled. Is pmd-cpu-mask set to "
- "zero? Or are all PMDs isolated to other "
- "queues?", netdev_get_name(port->netdev),
- qid);
-continue;
-}
-q->pmd = rr_numa_get_pmd(non_local_numa);
-VLOG_WARN("There's no available (non-isolated) pmd thread "
-  "on numa node %d. Queue %d on port \'%s\' will "
-  "be assigned to the pmd on core %d "
-  "(numa node %d). Expect reduced performance.",
-  numa_id, qid, netdev_get_name(port->netdev),
-  q->pmd->core_id, q->pmd->numa_id);
+if (n_rxqs == 0) {
+rxqs = xmalloc(sizeof *rxqs);
  } else {
-/* Assign queue to the next (round-robin) PMD on it's local
-   NUMA node. */
-q->pmd = rr_numa_get_pmd(numa);
+  

Re: [ovs-dev] [PATCH v4 3/6] dpif-netdev: Count the rxq processing cycles for an rxq.

2017-08-10 Thread Greg Rose

On 08/09/2017 08:45 AM, Kevin Traynor wrote:

Count the cycles used for processing an rxq during the
pmd rxq interval. As this is an in flight counter and
pmds run independently, also store the total cycles used
during the last full interval.

Signed-off-by: Kevin Traynor 
---
  lib/dpif-netdev.c | 65 ++-
  1 file changed, 60 insertions(+), 5 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 41f16b2..e344063 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -182,4 +182,8 @@ struct emc_cache {
  #define DPCLS_OPTIMIZATION_INTERVAL 1000

+/* Time in ms of the interval in which rxq processing cycles used in
+ * rxq to pmd assignments is measured and stored. */
+#define PMD_RXQ_INTERVAL 1000
+
  struct dpcls {
  struct cmap_node node;  /* Within dp_netdev_pmd_thread.classifiers */
@@ -558,4 +562,6 @@ struct dp_netdev_pmd_thread {
  /* Periodically sort subtable vectors according to hit frequencies */
  long long int next_optimization;
+/* Periodically store the processing cycles used for each rxq. */
+long long int rxq_interval;

  /* Statistics. */
@@ -684,5 +690,13 @@ static void pmd_load_cached_ports(struct 
dp_netdev_pmd_thread *pmd)
  OVS_REQUIRES(pmd->port_mutex);
  static inline void
-dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd);
+dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
+   struct polled_queue *poll_list, int poll_cnt);
+static void
+dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx,
+ enum rxq_cycles_counter_type type,
+ unsigned long long cycles);
+static uint64_t
+dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
+ enum rxq_cycles_counter_type type);
  static void
  dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
@@ -3114,4 +3128,21 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread 
*pmd,
  }

+static void
+dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx,
+ enum rxq_cycles_counter_type type,
+ unsigned long long cycles)
+{
+   atomic_store_relaxed(&rx->cycles[type], cycles);
+}
+
+static uint64_t
+dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
+ enum rxq_cycles_counter_type type)
+{
+unsigned long long tmp;
+atomic_read_relaxed(&rx->cycles[type], &tmp);
+return tmp;
+}
+
  static int
  dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
@@ -3158,5 +3189,5 @@ port_reconfigure(struct dp_netdev_port *port)
  {
  struct netdev *netdev = port->netdev;
-int i, err;
+int i, err, last_nrxq;

  port->need_reconfigure = false;
@@ -3167,4 +3198,5 @@ port_reconfigure(struct dp_netdev_port *port)
  port->rxqs[i].rx = NULL;
  }
+last_nrxq = port->n_rxq;
  port->n_rxq = 0;

@@ -3187,4 +3219,9 @@ port_reconfigure(struct dp_netdev_port *port)
  for (i = 0; i < netdev_n_rxq(netdev); i++) {
  port->rxqs[i].port = port;
+if (i >= last_nrxq) {
+/* Only reset cycle stats for new queues */
+dp_netdev_rxq_set_cycles(&port->rxqs[i], RXQ_CYCLES_PROC_CURR, 0);
+dp_netdev_rxq_set_cycles(&port->rxqs[i], RXQ_CYCLES_PROC_LAST, 0);
+}
  err = netdev_rxq_open(netdev, >rxqs[i].rx, i);
  if (err) {
@@ -3867,5 +3904,5 @@ reload:
  dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
 poll_list[i].port_no);
-cycles_count_intermediate(pmd, NULL,
+cycles_count_intermediate(pmd, poll_list[i].rxq,
process_packets ? PMD_CYCLES_PROCESSING
: PMD_CYCLES_IDLE);
@@ -3878,5 +3915,5 @@ reload:

  coverage_try_clear();
-dp_netdev_pmd_try_optimize(pmd);
+dp_netdev_pmd_try_optimize(pmd, poll_list, poll_cnt);
  if (!ovsrcu_try_quiesce()) {
 emc_cache_slow_sweep(&pmd->flow_cache);
@@ -4322,4 +4359,5 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, 
struct dp_netdev *dp,
   cmap_init(&pmd->classifiers);
  pmd->next_optimization = time_msec() + DPCLS_OPTIMIZATION_INTERVAL;
+pmd->rxq_interval = time_msec() + PMD_RXQ_INTERVAL;
   hmap_init(&pmd->poll_list);
   hmap_init(&pmd->tx_ports);
@@ -5759,8 +5797,25 @@ dpcls_sort_subtable_vector(struct dpcls *cls)

  static inline void
-dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd)
+dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
+   struct polled_queue *poll_list, int poll_cnt)
  {
  struct dpcls *cls;
  long long int now = time_msec();
+int i;
+uint64_t rxq_cyc_curr;
+
+if (now > pmd->rxq_interval) {
+/* Get the cycles that were used to process each queue and store. */
+for (i = 0; i < poll_cnt; 

Re: [ovs-dev] [PATCH v4 2/6] dpif-netdev: Add rxq processing cycle counters.

2017-08-10 Thread Greg Rose

On 08/09/2017 08:45 AM, Kevin Traynor wrote:

Add two counters to dp_netdev_rxq which will be used for storing the
processing cycles of an rxq. Processing cycles will be stored in reference
to a defined interval. One counter is used for storing cycles during the
current in progress interval, while the other is used to store the cycles
of the last fully complete interval.

cycles_count_intermediate was used to count cycles for a pmd. With some small
additions we can also use it to count the cycles used for processing an rxq.

Signed-off-by: Kevin Traynor 
---
  lib/dpif-netdev.c | 18 +++---
  1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f35c079..41f16b2 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -340,4 +340,11 @@ enum pmd_cycles_counter_type {
  };

+enum rxq_cycles_counter_type {
+RXQ_CYCLES_PROC_CURR,  /* Cycles spent successfully polling and
+  processing polled packets */
+RXQ_CYCLES_PROC_LAST,
+RXQ_N_CYCLES
+};
+
  #define XPS_TIMEOUT_MS 500LL

@@ -351,4 +358,5 @@ struct dp_netdev_rxq {
particular core. */
  struct dp_netdev_pmd_thread *pmd;  /* pmd thread that polls this queue. */
+atomic_ullong cycles[RXQ_N_CYCLES];
  };

@@ -677,5 +685,4 @@ static void pmd_load_cached_ports(struct 
dp_netdev_pmd_thread *pmd)
  static inline void
  dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd);
-
  static void
  dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
@@ -3092,4 +3099,5 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
  static inline void
  cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
+  struct dp_netdev_rxq *rxq,
enum pmd_cycles_counter_type type)
  OVS_NO_THREAD_SAFETY_ANALYSIS
@@ -3100,4 +3108,8 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread 
*pmd,

   non_atomic_ullong_add(&pmd->cycles.n[type], interval);
+if (rxq && (type == PMD_CYCLES_PROCESSING)) {
+/* Add to the amount of current processing cycles. */
+non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval);
+}
  }

@@ -3668,5 +3680,5 @@ dpif_netdev_run(struct dpif *dpif)
 port->rxqs[i].rx,
 port->port_no);
-cycles_count_intermediate(non_pmd, process_packets ?
+cycles_count_intermediate(non_pmd, NULL, process_packets ?
 PMD_CYCLES_PROCESSING
   : PMD_CYCLES_IDLE);
@@ -3855,5 +3867,5 @@ reload:
  dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
 poll_list[i].port_no);
-cycles_count_intermediate(pmd,
+cycles_count_intermediate(pmd, NULL,
process_packets ? PMD_CYCLES_PROCESSING
: PMD_CYCLES_IDLE);


Tested-by: Greg Rose 
Reviewed-by: Greg Rose 


Re: [ovs-dev] [PATCH v4 1/6] dpif-netdev: Change polled_queue to use dp_netdev_rxq.

2017-08-10 Thread Greg Rose

On 08/09/2017 08:45 AM, Kevin Traynor wrote:

Soon we will want to store processing cycle counts in the dp_netdev_rxq,
so use that as a basis for the polled_queue that pmd_thread_main uses.

Signed-off-by: Kevin Traynor 
---
  lib/dpif-netdev.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e2cd931..f35c079 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -486,5 +486,5 @@ struct dp_netdev_pmd_cycles {

  struct polled_queue {
-struct netdev_rxq *rx;
+struct dp_netdev_rxq *rxq;
  odp_port_t port_no;
  };
@@ -3799,5 +3799,5 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread 
*pmd,
  i = 0;
   HMAP_FOR_EACH (poll, node, &pmd->poll_list) {
-poll_list[i].rx = poll->rxq->rx;
+poll_list[i].rxq = poll->rxq;
  poll_list[i].port_no = poll->rxq->port->port_no;
  i++;
@@ -3837,6 +3837,6 @@ reload:
  for (i = 0; i < poll_cnt; i++) {
 VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
-pmd->core_id, netdev_rxq_get_name(poll_list[i].rx),
-netdev_rxq_get_queue_id(poll_list[i].rx));
+pmd->core_id, netdev_rxq_get_name(poll_list[i].rxq->rx),
+netdev_rxq_get_queue_id(poll_list[i].rxq->rx));
  }

@@ -3853,5 +3853,5 @@ reload:
  for (i = 0; i < poll_cnt; i++) {
  process_packets =
-dp_netdev_process_rxq_port(pmd, poll_list[i].rx,
+dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
 poll_list[i].port_no);
  cycles_count_intermediate(pmd,


Tested-by: Greg Rose 
Reviewed-by: Greg Rose 


Re: [ovs-dev] AdvancedMD Users List

2017-08-10 Thread Kristina
Hi,

 

I understand your inbox is a busy place and sometimes it's natural to miss
few emails. I quickly wanted to check whether you had a chance to review my
email which I sent you.

Please let me know how you wish to proceed with this.

Thank you and look forward to hearing from you.

   

Regards, 

Kristina Jones

 

From: Kristina [mailto:krist...@zoomleads.us] 
Sent: Wednesday, August 09, 2017 4:28 PM
To: 'd...@openvswitch.org'
Subject: AdvancedMD Users List

 

Hi,

 

I was curious to know if you would be interested in AdvancedMD Users List
2017?

 

Let me know your target criteria and we will revert back to you with further
details.

 

Target Users:

Geography:

 

Looking forward to continued success with you.

 

Regards,

Kristina Jones | Marketing Analyst

 

 

If you have any other unique requirement please feel free to share your
requirement with us as we can help you with on-demand list customized as per
your requirement.

 

 

To remove from this mailing: reply with subject line as "leave out"



Re: [ovs-dev] [PATCHv2 1/4] ovsdb-idl: Avoid class declaration.

2017-08-10 Thread Joe Stringer
On 10 August 2017 at 11:41, Ben Pfaff  wrote:
> On Thu, Aug 10, 2017 at 01:01:32PM +0800, Gao Zhenyu wrote:
>> How about:
>> struct ovsdb_idl_table {
>> ...
>> const struct ovsdb_idl_table_class *table_class
>> 
>> }
>>
>> struct ovsdb_idl {
>> 
>>  const struct ovsdb_idl_class *idl_class;
>> 
>
> Why make it longer?
>
>> Besides of that, I see many places consume the table class.
>> Do you mind to make a macro helps to fetch the class?
>> Like:
>>
>> #define OVSDB_GET_TABLE_CLASS(row) \
>> ((row)->table->class)
>
> Ugh.

Just in case the context isn't obvious... ;-)

I think that in general, we try not to obscure what's going on in OVS
code. This particular suggestion makes it less clear when reading the
code exactly where the class comes from, because the pointer
dereference is then hidden behind the macro. It also makes lines
referencing the class longer, and some of these lines are already
fairly long.
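Joe's point can be made concrete with a toy sketch (all struct and helper names here are invented for illustration, not from the OVS tree): the macro compiles to the same access, but it hides the two-level pointer dereference that the explicit form keeps visible at every call site.

```c
#include <assert.h>

struct idl_class { int id; };
struct idl_table { const struct idl_class *class_; };
struct idl_row   { const struct idl_table *table; };

/* The proposed helper: one token at the call site, but the reader no
 * longer sees that two pointers are chased to reach the class. */
#define GET_TABLE_CLASS(row) ((row)->table->class_)

/* The explicit form preferred in the thread: the provenance of the
 * class is visible wherever it is used. */
static const struct idl_class *
row_table_class(const struct idl_row *row)
{
    return row->table->class_;
}
```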


Re: [ovs-dev] [PATCH] sandbox: Add ports to br-int in ovn-setup.

2017-08-10 Thread Russell Bryant
On Thu, Aug 10, 2017 at 4:43 PM, Ben Pfaff  wrote:
> On Thu, Aug 10, 2017 at 04:20:23PM -0400, Russell Bryant wrote:
>> ovs-sandbox comes with a script to quickly set up a simple
>> OVN configuration, ovn-setup.sh.  This script set up config in the OVN
>> northbound database, but didn't create the corresponding ports on
>> br-int.  Add that to save another step in provisioning this simple
>> environment.
>>
>> Add "ovn-sbctl show" output as well, to follow the existing "ovn-nbctl
>> show" output.
>>
>> Signed-off-by: Russell Bryant 
>
> I haven't used this script, and I didn't test it, but it seems
> reasonable to me.  Feel free to ask or wait for a more thorough review,
> if you prefer.
>
> Acked-by: Ben Pfaff 

Thanks.  It works for me.  I'm not too worried so I applied this to master.

FWIW, here's what it gives you, 2 logical switches connected by a
router with a port on each switch.


=== ovn-nbctl show ===

switch 34bfbfec-423e-4a8c-9aef-31e1a847c601 (sw1)
    port sw1-port1
        addresses: ["50:54:00:00:00:03 11.0.0.2"]
    port lrp1-attachment
        type: router
        addresses: ["00:00:00:00:ff:02"]
        router-port: lrp1
switch f33fce42-66dc-4439-9070-c2b09623133d (sw0)
    port sw0-port1
        addresses: ["50:54:00:00:00:01 192.168.0.2"]
    port lrp0-attachment
        type: router
        addresses: ["00:00:00:00:ff:01"]
        router-port: lrp0
router 84e3352f-8fed-4b3b-99d4-92ccd2e4310f (lr0)
    port lrp1
        mac: "00:00:00:00:ff:02"
        networks: ["11.0.0.1/24"]
    port lrp0
        mac: "00:00:00:00:ff:01"
        networks: ["192.168.0.1/24"]

=== ovn-sbctl show ===

Chassis "chassis-1"
    hostname: sandbox
    Encap geneve
        ip: "127.0.0.1"
        options: {csum="true"}
    Port_Binding "sw0-port1"
    Port_Binding "sw1-port1"


Re: [ovs-dev] [PATCH] travis: parallel builds and tests

2017-08-10 Thread Ben Pfaff
On Thu, Aug 10, 2017 at 04:41:19PM -0400, Lance Richardson wrote:
> Some recent travis builds have failed due to having exceeded the
> per-job time limit of 50 minutes. This change enables parallel
> builds and parallel test execution in order to reduce overall
> execution time, and will hopefully allow this class of build
> failures to be avoided.
> 
> Since the travis build environment is provisioned with two CPUs,
> use -j2 for builds and -j4 for tests. Testing in a cloned repository
> shows slightly more than a 50% reduction in overall test time.
> 
> Signed-off-by: Lance Richardson 

Thanks, applied to master.

If this makes the master build finish reliably then I'll gladly apply it
to branch-2.8 also.


Re: [ovs-dev] [PATCH] netdev-dpdk: include dpdk PCI header directly

2017-08-10 Thread Ben Pfaff
On Thu, Aug 10, 2017 at 03:33:55PM -0400, Aaron Conole wrote:
> Ben Pfaff  writes:
> 
> > On Wed, Aug 09, 2017 at 04:00:53PM -0400, Aaron Conole wrote:
> >> As part of a devargs rework in DPDK, the PCI header file was removed, and
> >> needs to be directly included.  This isn't required to build with 17.05 or
> >> earlier, but will be required should a future update happen.
> >> 
> >> Signed-off-by: Aaron Conole 
> >
> > Just to be sure, this patch doesn't cause a build failure with 17.05 or
> > earlier, right?  If it does not, then it's a no-brainer to apply it.
> 
> Correct.

Great.

I applied this to master and branch-2.8.


Re: [ovs-dev] [PATCH] sandbox: Add ports to br-int in ovn-setup.

2017-08-10 Thread Ben Pfaff
On Thu, Aug 10, 2017 at 04:20:23PM -0400, Russell Bryant wrote:
> ovs-sandbox comes with a script to quickly set up a simple
> OVN configuration, ovn-setup.sh.  This script set up config in the OVN
> northbound database, but didn't create the corresponding ports on
> br-int.  Add that to save another step in provisioning this simple
> environment.
> 
> Add "ovn-sbctl show" output as well, to follow the existing "ovn-nbctl
> show" output.
> 
> Signed-off-by: Russell Bryant 

I haven't used this script, and I didn't test it, but it seems
reasonable to me.  Feel free to ask or wait for a more thorough review,
if you prefer.

Acked-by: Ben Pfaff 


Re: [ovs-dev] [patch_v6] dp-packet: Reset DPDK hwol flags on init.

2017-08-10 Thread Ben Pfaff
On Thu, Aug 10, 2017 at 01:22:16PM -0700, Darrell Ball wrote:
> Reset the DPDK hwol flags in dp_packet_init_.  The new hwol bad checksum
> flag is uninitialized for non-dpdk ports and this is noticed as test
> failures using netdev-dummy ports, when built with the --with-dpdk
> flag set. Hence, in this case, packets may be falsely marked as having a
> bad checksum. The existing APIs are simplified at the same time by
> making them specific to either DPDK or otherwise; they also now
> manage a single field.
> 
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-August/045081.html
> Fixes: 7451af618e0d ("dp-packet : Update DPDK rx checksum validation functions.")
> CC: Sugesh Chandran 
> Signed-off-by: Darrell Ball 

Thanks a lot!  I applied this to master and branch-2.8.


[ovs-dev] [PATCH] travis: parallel builds and tests

2017-08-10 Thread Lance Richardson
Some recent travis builds have failed due to having exceeded the
per-job time limit of 50 minutes. This change enables parallel
builds and parallel test execution in order to reduce overall
execution time, and will hopefully allow this class of build
failures to be avoided.

Since the travis build environment is provisioned with two CPUs,
use -j2 for builds and -j4 for tests. Testing in a cloned repository
shows slightly more than a 50% reduction in overall test time.

Signed-off-by: Lance Richardson 
---
 .travis/linux-build.sh | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
index d6f610ea1..bc7c7087d 100755
--- a/.travis/linux-build.sh
+++ b/.travis/linux-build.sh
@@ -102,16 +102,16 @@ if [ "$KERNEL" ] && [ ! "$TESTSUITE" ] && [ ! "$DPDK" ]; then
 fi
 
 if [ "$CC" = "clang" ]; then
-make CFLAGS="$CFLAGS -Wno-error=unused-command-line-argument"
+make -j2 CFLAGS="$CFLAGS -Wno-error=unused-command-line-argument"
 elif [[ $BUILD_ENV =~ "-m32" ]]; then
 # Disable sparse for 32bit builds on 64bit machine
-make CFLAGS="$CFLAGS $BUILD_ENV"
+make -j2 CFLAGS="$CFLAGS $BUILD_ENV"
 else
-make CFLAGS="$CFLAGS $BUILD_ENV $SPARSE_FLAGS" C=1
+make -j2 CFLAGS="$CFLAGS $BUILD_ENV $SPARSE_FLAGS" C=1
 fi
 
 if [ "$TESTSUITE" ] && [ "$CC" != "clang" ]; then
-if ! make distcheck RECHECK=yes; then
+if ! make distcheck TESTSUITEFLAGS=-j4 RECHECK=yes; then
 # testsuite.log is necessary for debugging.
 cat */_build/tests/testsuite.log
 exit 1
-- 
2.13.4



Re: [ovs-dev] [PATCHv2 4/4] ovsdb-idl: Rename 'old' to 'old_datum'.

2017-08-10 Thread Ben Pfaff
On Wed, Aug 09, 2017 at 03:27:42PM -0700, Joe Stringer wrote:
> Now that the 'new' datum is named 'new_datum', be more consistent by
> renaming 'old' to 'old_datum' to match.
> 
> Signed-off-by: Joe Stringer 
> ---
> v2: New patch.

Oh, it's an additional patch.  OK ;-)

Acked-by: Ben Pfaff 


Re: [ovs-dev] [PATCHv2 3/4] ovsdb-idl: Avoid new expression.

2017-08-10 Thread Ben Pfaff
On Wed, Aug 09, 2017 at 03:27:41PM -0700, Joe Stringer wrote:
> In C++, 'new' is a keyword. If this is used as the name for a field,
> then C++ compilers can get confused about the context and fail to
> compile references to such fields. Rename the field to 'new_datum' to
> avoid this issue.
> 
> Signed-off-by: Joe Stringer 
> ---
> v2: Rebase.
> Rename 'new_' to 'new_datum'.
> Update comments above 'struct ovsdb_idl_row'.

Should we rename 'old' to 'old_datum' for consistency?

Acked-by: Ben Pfaff 


[ovs-dev] [patch_v6] dp-packet: Reset DPDK hwol flags on init.

2017-08-10 Thread Darrell Ball
Reset the DPDK hwol flags in dp_packet_init_.  The new hwol bad checksum
flag is uninitialized for non-dpdk ports and this is noticed as test
failures using netdev-dummy ports, when built with the --with-dpdk
flag set. Hence, in this case, packets may be falsely marked as having a
bad checksum. The existing APIs are simplified at the same time by
making them specific to either DPDK or otherwise; they also now
manage a single field.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-August/045081.html
Fixes: 7451af618e0d ("dp-packet : Update DPDK rx checksum validation functions.")
CC: Sugesh Chandran 
Signed-off-by: Darrell Ball 
---

v5->v6: Refactor some existing apis and incorporate
review comments.

v3->v5: Update the commit message with more context.

v2->v3: Use existed API to reset both the DPDK HWOL flags.

v1->v2: Fix build failure for without --with-dpdk.

 lib/dp-packet.c   |  1 +
 lib/dp-packet.h   | 24 +---
 lib/netdev-dpdk.c |  2 +-
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 67aa406..c1f43f3 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -31,6 +31,7 @@ dp_packet_init__(struct dp_packet *b, size_t allocated, enum dp_packet_source so
 dp_packet_reset_offsets(b);
 pkt_metadata_init(&b->md, 0);
 dp_packet_rss_invalidate(b);
+dp_packet_mbuf_init(b);
 dp_packet_reset_cutlen(b);
 /* By default assume the packet type to be Ethernet. */
 b->packet_type = htonl(PT_ETH);
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 8f52fee..bb3f9db 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -605,12 +605,30 @@ dp_packet_rss_valid(struct dp_packet *p)
 }
 
 static inline void
-dp_packet_rss_invalidate(struct dp_packet *p)
+dp_packet_rss_invalidate(struct dp_packet *p OVS_UNUSED)
+{
+#ifndef DPDK_NETDEV
+p->rss_hash_valid = false;
+#endif
+}
+
+static inline void
+dp_packet_mbuf_rss_flag_reset(struct dp_packet *p OVS_UNUSED)
 {
 #ifdef DPDK_NETDEV
 p->mbuf.ol_flags &= ~PKT_RX_RSS_HASH;
-#else
-p->rss_hash_valid = false;
+#endif
+}
+
+/* This initialization is needed for packets that do not come
+ * from DPDK interfaces, when vswitchd is built with --with-dpdk.
+ * The DPDK rte library will still otherwise manage the mbuf.
+ * We only need to initialize the mbuf ol_flags. */
+static inline void
+dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED)
+{
+#ifdef DPDK_NETDEV
+p->mbuf.ol_flags = 0;
 #endif
 }
 
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 1d82bca..9fd98af 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2916,7 +2916,7 @@ netdev_dpdk_ring_send(struct netdev *netdev, int qid,
  * modified by the consumer of the ring and return into the datapath
  * without recalculating the RSS hash. */
 for (i = 0; i < batch->count; i++) {
-dp_packet_rss_invalidate(batch->packets[i]);
+dp_packet_mbuf_rss_flag_reset(batch->packets[i]);
 }
 
 netdev_dpdk_send__(dev, qid, batch, may_steal, concurrent_txq);
-- 
1.9.1



[ovs-dev] [PATCH] sandbox: Add ports to br-int in ovn-setup.

2017-08-10 Thread Russell Bryant
ovs-sandbox comes with a script to quickly set up a simple
OVN configuration, ovn-setup.sh.  This script set up config in the OVN
northbound database, but didn't create the corresponding ports on
br-int.  Add that to save another step in provisioning this simple
environment.

Add "ovn-sbctl show" output as well, to follow the existing "ovn-nbctl
show" output.

Signed-off-by: Russell Bryant 
---
 tutorial/ovn-setup.sh | 8 
 1 file changed, 8 insertions(+)

diff --git a/tutorial/ovn-setup.sh b/tutorial/ovn-setup.sh
index dc4c813b1..943ca58ed 100755
--- a/tutorial/ovn-setup.sh
+++ b/tutorial/ovn-setup.sh
@@ -23,5 +23,13 @@ ovn-nbctl lsp-set-type lrp1-attachment router
 ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02
 ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1
 
+ovs-vsctl add-port br-int p1 -- \
+set Interface p1 external_ids:iface-id=sw0-port1
+ovs-vsctl add-port br-int p2 -- \
+set Interface p2 external_ids:iface-id=sw1-port1
+
 # View a summary of the configuration
+printf "\n=== ovn-nbctl show ===\n\n"
 ovn-nbctl show
+printf "\n=== ovn-sbctl show ===\n\n"
+ovn-sbctl show
-- 
2.13.4



Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL checksum flags on init.

2017-08-10 Thread Aaron Conole
Darrell Ball  writes:

> -Original Message-
> From: Aaron Conole 
> Date: Thursday, August 10, 2017 at 10:13 AM
> To: Darrell Ball 
> Cc: ovs dev 
> Subject: Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL checksum flags 
> on init.
>
> Darrell Ball  writes:
> 
> > -Original Message-
> > From: Aaron Conole 
> > Date: Wednesday, August 9, 2017 at 12:51 PM
> > To: Darrell Ball 
> > Cc: Joe Stringer , Darrell Ball , ovs
> > dev 
> > Subject: Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL
> > checksum flags on init.
> >
> > Darrell Ball  writes:
> > 
> > > Thanks Joe
> > > I forgot to add your Tested-by to V5; I have been testing this myself;
> > > but let me know if you would like it added – I can send a V6.
> > 
> > It will automatically be added by patchwork.  It is sufficient to
> > download (ex: https://patchwork.ozlabs.org/patch/799499/mbox/)
> > 
> > I usually use patchwork when working on a series - if I download it from
> > patchwork, I can be confident that all the tags are applied properly, so
> > I won't forget.  Plus, all the discussion happens there, so I can
> > quickly browse it.
> >
> > Thanks Aaron
> > In this case, the Tested-by was applied to V4 and I sent a V5 which did
> > not have the Tested-by.
> > The easiest solution would have been to ask Joe to respond to V5 –
> > well, it is moot now anyways with V6 on the way.
> 
> Yep, that happens.
> 
> > The full list is available at:
> > 
> > https://patchwork.ozlabs.org/project/openvswitch/list/
> >
> > Thanks, I am on this page most of the day (
> > 
> > It would actually be cool to have a few more admins to troll patchwork
> > and do things like ping for status (ex:
> > https://patchwork.ozlabs.org/patch/719492/)
> >
> > I don’t have any context on this patch.
> > Aaron, would you like to resurrect this patch with a rebase? It is RHEL
> > related.
> 
> It doesn't need a rebase; it applies cleanly.  The version of patchwork
> running doesn't recognize my ack, so someone will have to add it
> manually (or omit it... doesn't matter too much, since Ben ack'd as
> well).
> 
> But it would be cool to get it applied;  maybe even to get back as far
> as branch-2.7 (I think it was posted in time for 2.7 branch).
>
>
> One thing is I don’t know if Daniele had a chance to test this, although it 
> looks simple enough.
> Did you get a chance to test it ? if so, maybe a tested-by  ?

I believe I tested it at the time I acked it.  I'll set up a test for it
tomorrow, though.

> > is this still needed?  it
> > does seem to mostly apply).  Then we could make sure we don't miss
> > things.
> > 
> > > Darrell
> > >
> > > -Original Message-
> > > From:  on behalf of Joe
> > > Stringer 
> > > Date: Tuesday, August 8, 2017 at 4:51 PM
> > > To: Darrell Ball 
> > > Cc: ovs dev 
> > > Subject: Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL 
> checksum
> > > flags on init.
> > >
> > > On 8 August 2017 at 16:39, Darrell Ball  
> wrote:
> > > > Reset the DPDK HWOL checksum flags in dp_packet_init_.
> > > > The new HWOL bad checksum flag is uninitialized on non-dpdk 
> ports and
> > > > this is noticed as test failures using netdev-dummy
> > > > ports where the bad
> > > > checksum flag is checked.
> > > >
> > > > Fixes: 7451af618e0d ("dp-packet : Update DPDK rx checksum
> > > > validation functions.")
> > > > CC: Sugesh Chandran 
> > > > Signed-off-by: Darrell Ball 
> > > > ---
> > > 
> > > Tested-by: Joe Stringer 
> > > Tested-at:

Re: [ovs-dev] [PATCH] netdev-dpdk: include dpdk PCI header directly

2017-08-10 Thread Aaron Conole
Ben Pfaff  writes:

> On Wed, Aug 09, 2017 at 04:00:53PM -0400, Aaron Conole wrote:
>> As part of a devargs rework in DPDK, the PCI header file was removed, and
>> needs to be directly included.  This isn't required to build with 17.05 or
>> earlier, but will be required should a future update happen.
>> 
>> Signed-off-by: Aaron Conole 
>
> Just to be sure, this patch doesn't cause a build failure with 17.05 or
> earlier, right?  If it does not, then it's a no-brainer to apply it.

Correct.

> Thanks,
>
> Ben.


Re: [ovs-dev] [PATCH] ovn-controller: Refactor function of consider_port_binding

2017-08-10 Thread Ben Pfaff
On Fri, Aug 04, 2017 at 10:31:14AM +0800, wang.qia...@zte.com.cn wrote:
> The function of consider_port_binding is redundant. This patch splits the
> function into sub-functions by port type.
> 
> Signed-off-by: wang qianyu 

Can you explain how it is redundant, and how this patch reduces it?

This patch makes two identical copies of the code for handling L3
gateway ports and patch ports, as patch_port_handle() and
l3gateway_port_handle(), so it's hard for me to see what redundancy it
eliminates.


Re: [ovs-dev] OVN question: Transmission of link-local IPv6 multicast

2017-08-10 Thread Mark Michelson
On Thu, Aug 10, 2017 at 1:14 PM Ben Pfaff  wrote:

> On Thu, Aug 10, 2017 at 02:41:30PM +, Mark Michelson wrote:
> > I'm curious about the current behavior when ovn-controller originates a
> > packet and transmits it over a link-local IPv6 multicast address. Will
> the
> > packet be delivered only to nodes on the same chassis, or can the packet
> > also reach on-link nodes on other chassis?
>
> When an OVN logical switch receives an Ethernet broadcast or multicast
> packet from a given port, it forwards it to all of its other logical
> switch ports, regardless of chassis.  There are special cases for ARP
> and IPv6 ND, to avoid forwarding them across chassis for known IP
> addresses.  I don't think that we have any special cases for IPv6
> link-local multicast.  Limiting these within a chassis would probably
> require new mechanisms, because logical flows, by design, do not respect
> physical boundaries.
>
>
Interesting you mention IPv6 ND here. RAs fall under the umbrella of IPv6
ND. So depending on how IPv6 ND is detected, it may be that RAs will be
limited from being sent to other chassis. The interesting bit here is the
"for known IP addresses" part. I'll take a closer look at the flows to see
what that means. If the all-nodes link-local multicast address is one of
these known IP addresses, then things might just work without having to do
much extra.


> > The reason I ask is that I'm working towards expanding OVN's IPv6
> neighbor
> > discovery capabilities, and the first step on that path is being able to
> > send periodic router advertisements (RAs). In a traditional physical
> > network, each interface on a router may send periodic RAs to its
> link-local
> > all-nodes address (ff02::1). This way all on-link nodes receive the RA,
> and
> > the world is a happy place.
> >
> > OVN is not a traditional physical router, though :)
> >
> > With OVN, each hypervisor has an ovn-controller instance running,
> blurring
> > the lines about what exactly constitutes a "router interface". From an
> > ovn-nb point of view, a logical router port constitutes a router
> interface.
> > However, in practice, I'm not sure if this holds true once you get down
> to
> > the ovn-controller level. Let's say that a single link (via a logical
> > switch) is spread across three hypervisors. If the ovn-controller on each
> > of those hypervisors sends an RA to the link-local all-nodes address,
> would
> > that result in nodes on each hypervisor receiving three RAs? Or would
> each
> > node only receive one RA since it only receives the RA from the local
> > ovn-controller?
>
> I believe that this would result in 3x advertisements.
>

That's what I was afraid of. This could end up being a bit spammy if all
ovn-controllers are sending RAs. But as I mentioned earlier, this may
actually be taken care of since RAs are part of IPv6 ND.


>
> > I suspect the answer to this all comes down to how openflow flows get
> > installed on each OVS instance. What I'm curious about is the current
> > behavior of OVN in this regard.
>
> I guess that it would be more efficient for each chassis to individually
> generate these router advertisements and ensure that they do not
> propagate beyond the physical chassis boundary?  I can think of ways to
> do that, especially if ovn-controller were generating them.  This is
> because when ovn-controller generates packets, it can tag them with
> additional metadata that indicate the purpose.  For example, to allow it
> to send a chassis-limited packet, we could implement a new metadata bit
> that suppresses output through tunnels, and ovn-controller could set
> that bit.
>

Cool, thanks for the pointer. If the current IPv6 ND-related rules don't
already do what I need, I'll look into the metadata idea.
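The chassis-limited metadata bit Ben describes can be pictured with a tiny model (none of these names exist in OVN; this only mirrors the idea in the thread): a packet generated with a "local only" flag is flooded to every port except tunnels, so it never leaves the chassis.

```c
#include <assert.h>

/* Illustrative model of the proposed metadata bit: when a packet is
 * generated by ovn-controller with MD_LOCAL_ONLY set, flooding skips
 * tunnel ports so the packet never crosses a chassis boundary. */

enum port_kind { PORT_LOCAL, PORT_TUNNEL };

#define MD_LOCAL_ONLY 0x1

struct port {
    enum port_kind kind;
    int rx;                 /* Packets delivered to this port. */
};

static void flood(struct port *ports, int n, unsigned int md_flags)
{
    for (int i = 0; i < n; i++) {
        if ((md_flags & MD_LOCAL_ONLY) && ports[i].kind == PORT_TUNNEL) {
            continue;       /* Suppress output through tunnels. */
        }
        ports[i].rx++;
    }
}
```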


>
> > P.S. Someone please tell me if ovs-discuss would have been a more
> suitable
> > list for this question. I figured since it was related to a feature I was
> > developing that this would be the proper list, though.
>
> Seems like a legit dev discussion to me.
>


Re: [ovs-dev] [PATCH v4 2/5] netdev-dpdk: Add netdev_dpdk_vhost_txq_flush function.

2017-08-10 Thread Bodireddy, Bhanuprakash
>>
  } else {
 +/* If the queue is disabled in the guest, the corresponding qid
 + * map shall be set to OVS_VHOST_QUEUE_DISABLED(-2).
 + *
 + * The packets that were queued in 'qid' could be potentially
 + * stuck and needs to be dropped.
 + *
 + * XXX: The queues may be already disabled in the guest so
 + * flush function in this case only helps in updating stats
 + * and freeing memory.
 + */
 +netdev_dpdk_vhost_txq_flush(&dev->up, qid, 0);
  dev->tx_q[qid].map = OVS_VHOST_QUEUE_DISABLED;
  }
  netdev_dpdk_remap_txqs(dev);
>>
>> 'netdev_dpdk_remap_txqs()', actually, is able to change mapping for
>> all the disabled in guest queues. So, we need to flush all of them
>> while remapping somewhere inside the function.
>> One other thing is that there is a race window between flush and
>> mapping update where another process able to enqueue more packets in
>> just flushed queue. The order of operations should be changed, or both
>> of them should be done under the same tx_lock. I think, it's required
>> to make tx_q[].map field atomic to fix the race condition, because
>> send function takes the 'map' and then locks the corresponding queue.
>> It wasn't an issue before, because packets in case of race was just
>> dropped on attempt to send to disabled queue, but with this patch
>> applied they will be enqueued to the intermediate queue and stuck there.
>
>Making 'map' atomic will not help. To solve the race we should make 'reading
>of map + enqueue' an atomic operation by some spinlock.
>Like this:
>
>vhost_send:
>
>qid = qid % netdev->n_txq;
>rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
>
>mapped_qid = dev->tx_q[qid].map;
>
>if (qid != mapped_qid) {
>rte_spinlock_lock(&dev->tx_q[mapped_qid].tx_lock);
>}
>
>tx_enqueue(mapped_qid, pkts, cnt);
>
>if (qid != mapped_qid) {
>rte_spinlock_unlock(&dev->tx_q[mapped_qid].tx_lock);
>}
>
>rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
>
>
>txq remapping inside 'netdev_dpdk_remap_txqs()' or
>'vring_state_changed()':
>
>qid - queue we need to remap.
>new_qid - queue we need to remap to.
>
>rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
>
>mapped_qid = dev->tx_q[qid].map;
>if (qid != mapped_qid) {
>rte_spinlock_lock(&dev->tx_q[mapped_qid].tx_lock);
>}
>
>tx_flush(mapped_qid)
>
>if (qid != mapped_qid) {
>rte_spinlock_unlock(&dev->tx_q[mapped_qid].tx_lock);
>}
>
>dev->tx_q[qid].map = new_qid;
>
>rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
>
>
>Above schema should work without races, but looks kind of ugly and requires
>taking of additional spinlock on each send.
>
>P.S. Sorry for talking with myself. Just want to share my thoughts.

Hi Ilya,

Can you please review the below changes based on what you suggested above?
As the problem only happens when the queues are enabled/disabled in the guest,
I did some preliminary testing with the below changes by sending some traffic
into the VM and enabling and disabling the queues inside the guest at the
same time.

Vhost_send()
-
qid = qid % netdev->n_txq;

/* Acquire tx_lock before reading tx_q[qid].map and enqueueing packets.
 * tx_q[].map gets updated in vring_state_changed() when vrings are
 * enabled/disabled in the guest. */
rte_spinlock_lock(&dev->tx_q[qid].tx_lock);

mapped_qid = dev->tx_q[qid].map;
if (OVS_UNLIKELY(qid != mapped_qid)) {
rte_spinlock_lock(&dev->tx_q[mapped_qid].tx_lock);
}

if (OVS_UNLIKELY(!is_vhost_running(dev) || mapped_qid < 0
 || !(dev->flags & NETDEV_UP))) {
rte_spinlock_lock(&dev->stats_lock);
dev->stats.tx_dropped += cnt;
rte_spinlock_unlock(&dev->stats_lock);

for (i = 0; i < total_pkts; i++) {
dp_packet_delete(pkts[i]);
}

if (OVS_UNLIKELY(qid != mapped_qid)) {
rte_spinlock_unlock(&dev->tx_q[mapped_qid].tx_lock);
}
rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);

return;
}

cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt);
/* Check has QoS has been configured for the netdev */
cnt = netdev_dpdk_qos_run(dev, cur_pkts, cnt);
dropped = total_pkts - cnt;

int idx = 0;
struct dpdk_tx_queue *txq = &dev->tx_q[mapped_qid];
while (idx < cnt) {
txq->vhost_burst_pkts[txq->vhost_pkt_cnt++] = pkts[idx++];

if (txq->vhost_pkt_cnt >= 
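The locking discipline discussed in this thread can be modeled in a small self-contained sketch (plain pthread mutexes stand in for rte_spinlock_t, and all names are illustrative, not the actual netdev-dpdk code): the sender locks the requested qid, reads the mapping under that lock, and also locks the mapped queue before enqueueing, while a remap flushes the old target and updates the map under the same locks, closing the race window.

```c
#include <assert.h>
#include <pthread.h>

#define N_TXQ 4

/* Minimal model of the per-queue state from the thread: 'map' may
 * redirect a qid to another enabled queue. */
struct txq {
    pthread_mutex_t lock;
    int map;            /* Queue this qid currently maps to. */
    int queued;         /* Packets sitting in the intermediate queue. */
};

static struct txq tx_q[N_TXQ];

static void txq_init(void)
{
    for (int i = 0; i < N_TXQ; i++) {
        pthread_mutex_init(&tx_q[i].lock, NULL);
        tx_q[i].map = i;
        tx_q[i].queued = 0;
    }
}

/* Sender: lock the requested qid, read the mapping under that lock,
 * then also lock the mapped queue before enqueueing. */
static void send_pkts(int qid, int cnt)
{
    pthread_mutex_lock(&tx_q[qid].lock);
    int mapped = tx_q[qid].map;
    if (mapped != qid) {
        pthread_mutex_lock(&tx_q[mapped].lock);
    }
    tx_q[mapped].queued += cnt;
    if (mapped != qid) {
        pthread_mutex_unlock(&tx_q[mapped].lock);
    }
    pthread_mutex_unlock(&tx_q[qid].lock);
}

/* Remap: flush the currently mapped queue and update the map while
 * the locks are held, so no packets can be stranded in the old queue. */
static void remap(int qid, int new_qid)
{
    pthread_mutex_lock(&tx_q[qid].lock);
    int mapped = tx_q[qid].map;
    if (mapped != qid) {
        pthread_mutex_lock(&tx_q[mapped].lock);
    }
    tx_q[mapped].queued = 0;            /* stand-in for tx_flush() */
    if (mapped != qid) {
        pthread_mutex_unlock(&tx_q[mapped].lock);
    }
    tx_q[qid].map = new_qid;
    pthread_mutex_unlock(&tx_q[qid].lock);
}
```

As noted in the thread, this buys correctness at the cost of an extra lock acquisition on every send when the mapping is indirect.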

Re: [ovs-dev] [PATCHv2 2/4] ovsdb-idl: Avoid mutable type specifier.

2017-08-10 Thread Ben Pfaff
I like that.

On Thu, Aug 10, 2017 at 11:54:11AM +0800, Gao Zhenyu wrote:
> How about mutable --> is_mutable ?
> 
> 
> Thanks
> Zhenyu Gao
> 
> 2017-08-10 6:27 GMT+08:00 Joe Stringer :
> 
> > In C++, 'mutable' is a keyword. If this is used as the name for a field,
> > then C++ compilers can get confused about the context and fail to
> > compile references to such fields. Rename the field to 'mutable_' to
> > avoid this issue.
> >
> > Signed-off-by: Joe Stringer 
> > ---
> > v2: Rebase.
> > ---
> >  lib/ovsdb-idl-provider.h | 2 +-
> >  lib/ovsdb-idl.c  | 2 +-
> >  ovsdb/ovsdb-idlc.in  | 2 +-
> >  3 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/ovsdb-idl-provider.h b/lib/ovsdb-idl-provider.h
> > index a2eb8cac67d7..aa1fa9390572 100644
> > --- a/lib/ovsdb-idl-provider.h
> > +++ b/lib/ovsdb-idl-provider.h
> > @@ -89,7 +89,7 @@ struct ovsdb_idl_row {
> >  struct ovsdb_idl_column {
> >  char *name;
> >  struct ovsdb_type type;
> > -bool mutable;
> > +bool mutable_;
> >  void (*parse)(struct ovsdb_idl_row *, const struct ovsdb_datum *);
> >  void (*unparse)(struct ovsdb_idl_row *);
> >  };
> > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> > index 227aa5fbfcb2..d474075fa990 100644
> > --- a/lib/ovsdb-idl.c
> > +++ b/lib/ovsdb-idl.c
> > @@ -2871,7 +2871,7 @@ bool
> >  ovsdb_idl_is_mutable(const struct ovsdb_idl_row *row,
> >   const struct ovsdb_idl_column *column)
> >  {
> > -return column->mutable || (row->new && !row->old);
> > +return column->mutable_ || (row->new && !row->old);
> >  }
> >
> >  /* Returns false if 'row' was obtained from the IDL, true if it was
> > initialized
> > diff --git a/ovsdb/ovsdb-idlc.in b/ovsdb/ovsdb-idlc.in
> > index f065ef1c68fe..36b7d2700bb6 100755
> > --- a/ovsdb/ovsdb-idlc.in
> > +++ b/ovsdb/ovsdb-idlc.in
> > @@ -1268,7 +1268,7 @@ void
> >   .type = {
> >  %(type)s
> >   },
> > - .mutable = %(mutable)s,
> > + .mutable_ = %(mutable)s,
> >   .parse = %(s)s_parse_%(c)s,
> >   .unparse = %(s)s_unparse_%(c)s,
> >  },\n""" % {'P': prefix.upper(),
> > --
> > 2.13.3
> >


Re: [ovs-dev] [PATCHv2 1/4] ovsdb-idl: Avoid class declaration.

2017-08-10 Thread Ben Pfaff
On Thu, Aug 10, 2017 at 01:01:32PM +0800, Gao Zhenyu wrote:
> How about:
> struct ovsdb_idl_table {
> ...
> const struct ovsdb_idl_table_class *table_class
> 
> }
> 
> struct ovsdb_idl {
> 
>  const struct ovsdb_idl_class *idl_class;
> 

Why make it longer?

> Besides of that, I see many places consume the table class.
> Do you mind to make a macro helps to fetch the class?
> Like:
> 
> #define OVSDB_GET_TABLE_CLASS(row) \
> ((row)->table->class)

Ugh.


Re: [ovs-dev] [PATCH] netdev-dpdk: include dpdk PCI header directly

2017-08-10 Thread Ben Pfaff
On Wed, Aug 09, 2017 at 04:00:53PM -0400, Aaron Conole wrote:
> As part of a devargs rework in DPDK, the PCI header file was removed, and
> needs to be directly included.  This isn't required to build with 17.05 or
> earlier, but will be required should a future update happen.
> 
> Signed-off-by: Aaron Conole 

Just to be sure, this patch doesn't cause a build failure with 17.05 or
earlier, right?  If it does not, then it's a no-brainer to apply it.

Thanks,

Ben.


Re: [ovs-dev] ovsdb-server replication: Possible to exclude syncing specific column of a table??

2017-08-10 Thread Ben Pfaff
OK.

I see what you're talking about.  It's not a bug, then.  OVSDB is just
doing its job ensuring referential integrity.

Please feel free to submit a patch to implement the feature that you
want.
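The check that fires here can be modeled in a few lines of C (table layout, names, and UUIDs are illustrative only): a replicated Global row carries a UUID reference into the Manager table, and if Manager rows were excluded from the sync, that reference cannot be resolved, which is the violation the log below reports.

```c
#include <assert.h>
#include <string.h>

#define MAX_ROWS 4

/* Toy model of a table as a set of row UUIDs. */
struct table {
    const char *rows[MAX_ROWS];
    int n;
};

static int row_exists(const struct table *t, const char *uuid)
{
    for (int i = 0; i < t->n; i++) {
        if (!strcmp(t->rows[i], uuid)) {
            return 1;
        }
    }
    return 0;
}

/* A Global row referencing 'manager_uuid' may only be committed if the
 * referenced Manager row exists; otherwise the transaction fails with a
 * referential integrity violation. */
static int global_insert_ok(const struct table *manager,
                            const char *manager_uuid)
{
    return row_exists(manager, manager_uuid);
}
```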

On Thu, Aug 10, 2017 at 06:07:09AM +0530, Arunkumar Rg wrote:
> Hi Ben,
> 
> Thanks for your response!
> 
> I have reported this to b...@openvswitch.org.
> 
> Below is what I have shared with b...@openvswitch.org.
> 
> Issue summary:
> --
> On setting, "ovsdb-server/set-sync-exclude-tables" for certain tables,
> ovsdb-server replication does not work.
> i.e. the ovsdb-server which connects to the active ovsdb-server doesn't
> become 'backup'.
> 
> Expected:
> -
> ovsdb-server which connects to active ovsdb-server should become 'backup'
> and receive sync for all the tables except for the tables specified in the
> ctl command "ovsdb-server/set-sync-exclude-tables".
> 
> Reproduction steps:
> ---
> ["db:Hardware_vtep"---OVSDB-SERVER(Active)---(ptcp:3.1.1.2:7000)]---[OVSDB-SERVER(Backup)---"db:Hardware_vtep"]
> 
> 1. Start two ovsdb-server instances with db schema as "Hardware_vtep". Lets
> call them instance-1 and instance-2.
> 
> ps -ef | grep ovsdb-server
> root  2704  2093  0 19:56 ?  00:00:03 ovsdb-server
> /opt/ovsdb/db/vtep.db --remote=punix:/opt/ovsdb/db/db.sock
> --unixctl=/opt/ovsdb/ovsdb-server.ctl
> 
> 2. At instance-1: Add an entry in 'Manager' table.
> vtep-ctl --db=unix:/opt/ovsdb/db/db.sock set-manager ssl:10.133.130.126:6640
> 
> ovsdb-client dump unix:/opt/ovsdb/db/db.sock   <<--- showing only the
> tables that are relevant to this issue.
> Global table
> _uuid    managers
> other_config switches
> 
> --  
> 4121ba6c-a729-45b1-8f69-806b32584d04
> [73affd9e-07f0-422f-a394-47ae2a1ab499] {}
> 
> Manager table
> _uuid    inactivity_probe    is_connected
> max_backoff other_config status  target
>   
> ---  --- -
> 73affd9e-07f0-422f-a394-47ae2a1ab499 []   false []
>  {}  "ssl:10.133.130.126:6640"
> 
> 
> 3. Now do ovsdb replication related configs:
> 3.1. At instance-1:
> ovs-appctl -t /opt/ovsdb/ovsdb-server.ctl ovsdb-server/add-remote
> ptcp:7000:3.1.1.2
> 
> 3.2 At instance-2:
> ovs-appctl -t /opt/ovsdb/ovsdb-server.ctl
> ovsdb-server/set-active-ovsdb-server tcp:3.1.1.2:7000
> ovs-appctl -t /opt/ovsdb/ovsdb-server.ctl
> ovsdb-server/set-sync-exclude-tables hardware_vtep:Manager
> ovs-appctl -t /opt/ovsdb/ovsdb-server.ctl
> ovsdb-server/connect-active-ovsdb-server
> 
> 4. Issue reproduced now!!
> Instance-2 will not become 'Backup' and it'll remain in 'Active' state.
> 
> This seems to be due to the following error, while processing the
> notification for 'Global' table:
> 
> "Aug  9 12:26:38 Pollux daemon.err ovsdb-server:
> ovs|00070|ovsdb_error|ERR|unexpected ovsdb error: referential integrity
> violation: Table Global column managers row
> 4121ba6c-a729-45b1-8f69-806b32584d04 references nonexistent row
> 5dbedec1-8221-435f-89b3-503867d0e987 in t"
> 
> Note: In the hardware_vtep schema, 'Global' table has a reference to
> 'Manager' table.
> 
> Additional inputs:
> --
> bash-3.2# ovsdb-server --version
> ovsdb-server (Open vSwitch) 2.7.0
> 
> Linux version 4.1.14
> 
> Thanks,
> Arun.
> 
> 
> On Wed, Aug 9, 2017 at 9:27 PM, Ben Pfaff  wrote:
> 
> > On Wed, Aug 09, 2017 at 05:34:16PM +0530, Arunkumar Rg wrote:
> > > Hi All,
> > >
> > > I need a clarification on ovsdb-server replication's
> > > set-sync-exclude-tables.
> > >
> > > *Is there a way, by which we can say exclude syncing specific column of a
> > > table??*
> > >
> > > The use case I'm looking at is:
> > >
> > >1. The replication is configured for DB "hardware_vtep".
> > >2. I'm trying to use the "ovsdb-server/set-sync-exclude-tables"
> > unixctl
> > >command for table "hardware_vtep:Manager" on the backup ovsdb-server.
> > >3. After this the backup ovsdb-server gets a replica (notification) from
> > >the active's db except for 'Manager' table.
> > >4. At this point, while processing the notification for 'Global'
> > table,
> > >I see the below error:
> > >5. "Aug  9 12:26:38 Pollux daemon.err ovsdb-server:
> > >ovs|00070|ovsdb_error|ERR|unexpected ovsdb error: referential
> > integrity
> > >violation: Table Global column managers row
> > >4121ba6c-a729-45b1-8f69-806b32584d04 references nonexistent row
> > >5dbedec1-8221-435f-89b3-503867d0e987 in t""
> > >6. I think this error could be avoided if we can say exclude syncing
> > >'Manager' column in 'Global' table also.
> >
> > It sounds like a bug.  Probably, you 

Re: [ovs-dev] [PATCH] redhat: add vfio udev rules

2017-08-10 Thread Russell Bryant
On Wed, Aug 9, 2017 at 4:36 PM, Aaron Conole  wrote:
> This commit builds on the non-root ovs work and adds a udev rule which will
> automatically set the group permissions of vfio devices.
>
> Signed-off-by: Aaron Conole 
> ---
> Systemd folks say that this is not something that should be a part of systemd,
> but should be part of openvswitch.  I can make a case for either, but it seems
> it's probably an uphill battle to get it in through systemd.  This should be
> applied on branch-2.8, as it _could_ be considered a "fix" for
> commit e3e738a3d058 ("redhat: allow dpdk to also run as non-root user").
>
>  rhel/automake.mk| 1 +
>  rhel/openvswitch-fedora.spec.in | 8 
>  rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 +
>  3 files changed, 10 insertions(+)
>  create mode 100644 rhel/usr_lib_udev_rules.d_91-vfio.rules

Thanks, I added a fixes header and applied this to master and branch-2.8.


Re: [ovs-dev] [PATCH v5 1/2] OF support and translation of generic encap and decap

2017-08-10 Thread Ben Pfaff
I'm inclined to make the action name specific to the header,
e.g. encap_nsh, decap_nsh.  There doesn't have to be a one-to-one
correspondence between syntax and OpenFlow encoding.

On Thu, Aug 10, 2017 at 03:25:20PM +, Jan Scheurich wrote:
> The generic code today in function parse_ENCAP() uses the encap header string 
> also as the string for the property class. I am afraid that implementers of 
> subsequent encap headers might not realize that this is a temporary 
> implementation shortcut that should have been generalized.
> 
> The minimum we should add now is a comment explaining this, such as, for 
> example:
> 
> diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
> index bfc8a80..84522e0 100644
> --- a/lib/ofp-actions.c
> +++ b/lib/ofp-actions.c
> @@ -4134,9 +4134,19 @@ parse_ed_props(const uint16_t prop_class, char **arg, 
> int *n_props, struct ofpbu
>  return NULL;
>  }
> 
> -/* The string representation of the encap action is
> - * encap(header_type(prop=,tlv(,,),...))
> - */
> +/* The generic string representation of the encap action is
> + * encap()
> + * encap((=,(),...))
> + *
> + * TODO: The current implementation only supports the simple case that all
> + * encap parameters for a given encap header type are in a single property
> + * class, identified by the same keyword as the encap header type.
> + * To represent different property classes allowed by the OF action, the
> + * syntax should be generalized as follows:
> + *
> + * encap((=,(),...),
> + *   (=,(),...),...)
> +*/
> 
>  static char * OVS_WARN_UNUSED_RESULT
>  parse_ENCAP(char *arg,
> 
> BR, Jan
> 
> > -Original Message-
> > From: Ben Pfaff [mailto:b...@ovn.org]
> > Sent: Tuesday, 08 August, 2017 21:36
> > To: Jan Scheurich 
> > Cc: Yi Yang ; d...@openvswitch.org; Zoltán Balogh 
> > 
> > Subject: Re: [PATCH v5 1/2] OF support and translation of generic encap and 
> > decap
> > 
> > We can add the additional syntax at the same time we add something that
> > has the need for it.
> > 
> > On Tue, Aug 08, 2017 at 03:52:32PM +, Jan Scheurich wrote:
> > > I know this comment is late as the patch has been merged already, but I 
> > > just returned from vacation and only found today:
> > >
> > > I agree that the new simplified ovs-ofctl syntax for encap properties 
> > > looks quite neat for the NSH use case at hand:
> > >
> > > encap(nsh(md_type=1))
> > > encap(nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210)))
> > >
> > > But unfortunately we have fallen into the trap of oversimplifying. The 
> > > generic encap action (drafted in EXT-382) distinguishes between
> > the encapsulation packet type (in this case (1,0x894f) for NSH) and the 
> > encap property class (in this case 0x4 for NSH), which may be
> > different for every property in an encap action.
> > >
> > > Using the "nsh" keyword both as shorthand for the packet type and the 
> > > property class selector works here because the NSH encap action
> > only supports NSH properties so far, and in many simple applications of the 
> > generic encap action there may well be 1-1 relation between
> > the encap packet type and the class of all supported properties. But the 
> > ONF draft specifically decouples the packet type from property
> > classes for a richer set of use cases.
> > >
> > > For example there could be a more powerful generic encap operation to add 
> > > multiple protocol layers in one action. In this case the
> > encap properties would likely belong to several property classes. Or an 
> > encapsulation header re-uses a general encap property already
> > defined elsewhere, which should not be duplicated.
> > >
> > > In order not to restrict the generality of the ovs-ofctl syntax, I'd 
> > > suggest to separate packet type and the property class as follows:
> > >
> > > encap()
> > > encap((,(=,()),...)
> > >
> > > For NSH the syntax would look:
> > >
> > > encap(nsh,nsh(md_type=1))
> > > encap(nsh,nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210)))
> > >
> > > To be user friendly we can also allow the above more concise syntax in 
> > > the special case of a 1-1 relation between packet type and
> > property class. In that case we could re-add the generalization of the 
> > syntax after the OVS 2.8 release without breaking backward
> > compatibility.
> > >
> > > BR, Jan
> > >
> > > > -Original Message-
> > > > From: Yi Yang [mailto:yi.y.y...@intel.com]
> > > > Sent: Wednesday, 02 August, 2017 10:04
> > > > To: d...@openvswitch.org
> > > > Cc: b...@ovn.org; Jan Scheurich ; Yi Yang 
> > > > ; Zoltán Balogh
> > > > 
> > > > Subject: [PATCH v5 1/2] OF support and translation of generic encap and 
> > > > decap
> > > >
> > > > From: Jan Scheurich 
> > > >
> > > > This commit adds support for the OpenFlow actions generic 

Re: [ovs-dev] OVN question: Transmission of link-local IPv6 multicast

2017-08-10 Thread Ben Pfaff
On Thu, Aug 10, 2017 at 02:41:30PM +, Mark Michelson wrote:
> I'm curious about the current behavior when ovn-controller originates a
> packet and transmits it over a link-local IPv6 multicast address. Will the
> packet be delivered only to nodes on the same chassis, or can the packet
> also reach on-link nodes on other chassis?

When an OVN logical switch receives an Ethernet broadcast or multicast
packet from a given port, it forwards it to all of its other logical
switch ports, regardless of chassis.  There are special cases for ARP
and IPv6 ND, to avoid forwarding them across chassis for known IP
addresses.  I don't think that we have any special cases for IPv6
link-local multicast.  Limiting these within a chassis would probably
require new mechanisms, because logical flows, by design, do not respect
physical boundaries.

> The reason I ask is that I'm working towards expanding OVN's IPv6 neighbor
> discovery capabilities, and the first step on that path is being able to
> send periodic router advertisements (RAs). In a traditional physical
> network, each interface on a router may send periodic RAs to its link-local
> all-nodes address (ff02::1). This way all on-link nodes receive the RA, and
> the world is a happy place.
> 
> OVN is not a traditional physical router, though :)
> 
> With OVN, each hypervisor has an ovn-controller instance running, blurring
> the lines about what exactly constitutes a "router interface". From an
> ovn-nb point of view, a logical router port constitutes a router interface.
> However, in practice, I'm not sure if this holds true once you get down to
> the ovn-controller level. Let's say that a single link (via a logical
> switch) is spread across three hypervisors. If the ovn-controller on each
> of those hypervisors sends an RA to the link-local all-nodes address, would
> that result in nodes on each hypervisor receiving three RAs? Or would each
> node only receive one RA since it only receives the RA from the local
> ovn-controller?

I believe that this would result in 3x advertisements.

> I suspect the answer to this all comes down to how openflow flows get
> installed on each OVS instance. What I'm curious about is the current
> behavior of OVN in this regard.

I guess that it would be more efficient for each chassis to individually
generate these router advertisements and ensure that they do not
propagate beyond the physical chassis boundary?  I can think of ways to
do that, especially if ovn-controller were generating them.  This is
because when ovn-controller generates packets, it can tag them with
additional metadata that indicate the purpose.  For example, to allow it
to send a chassis-limited packet, we could implement a new metadata bit
that suppresses output through tunnels, and ovn-controller could set
that bit.

> P.S. Someone please tell me if ovs-discuss would have been a more suitable
> list for this question. I figured since it was related to a feature I was
> developing that this would be the proper list, though.

Seems like a legit dev discussion to me.


Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL checksum flags on init.

2017-08-10 Thread Darrell Ball


-Original Message-
From: Aaron Conole 
Date: Thursday, August 10, 2017 at 10:13 AM
To: Darrell Ball 
Cc: ovs dev 
Subject: Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL checksum flags on 
init.

Darrell Ball  writes:

> -Original Message-
> From: Aaron Conole 
> Date: Wednesday, August 9, 2017 at 12:51 PM
> To: Darrell Ball 
> Cc: Joe Stringer , Darrell Ball , ovs
> dev 
> Subject: Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL checksum 
flags on init.
>
> Darrell Ball  writes:
> 
> > Thanks Joe
> > I forgot to add your Tested-by to V5; I have been testing this 
myself;
> > but let me know if you would like it added – I can send a V6.
> 
> It will automatically be added by patchwork.  It is sufficient to
> download (ex: 
https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_patch_799499_mbox=DwIFaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=LK8s59ajLx5MbmH5Ng8SG1F1nAgKZJ_0JBs7phbl6C8=HRG5wrQrEXCl_AA8uAGbroVSvYGUIQjQkF8HrMRvNYI=
 )
> 
> I usually use patchwork when working on a series - if I download it 
from
> patchwork, I can be confident that all the tags are applied properly, 
so
> I won't forget.  Plus, all the discussion happens there, so I can
> quickly browse it.
>
> Thanks Aaron
> In this case, the Tested-by was applied to V4 and I sent a V5 which did 
not have the Tested-by.
> The easiest solution would have been to ask Joe to respond to V5 – well, 
it is moot now anyways with V6 on the way.

Yep, that happens.

> The full list is available at:
> 
https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_project_openvswitch_list_=DwIFaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=LK8s59ajLx5MbmH5Ng8SG1F1nAgKZJ_0JBs7phbl6C8=rqaCq5Jr-Vy-cfQ_t5px-zGtdb55CQn9cIAZCC1PvfE=
 
>
> Thanks, I am on this page most of the day (
> 
> It would actually be cool to have a few more admins to troll patchwork
> and do things like ping for status (ex:
> 
https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_patch_719492_=DwIFaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=LK8s59ajLx5MbmH5Ng8SG1F1nAgKZJ_0JBs7phbl6C8=y0RPF47IRVw8BujWHzB9oc7IUihgLuCenLfteZ2vUr4=
  
>
> I don’t have any context on this patch.
> Aaron, would you like to resurrect this patch with rebase ?; it is RHEL 
related.

It doesn't need a rebase; it applies cleanly.  The version of patchwork
running doesn't recognize my ack, so someone will have to add it
manually (or omit it... doesn't matter too much, since Ben ack'd as
well).

But it would be cool to get it applied;  maybe even to get back as far
as branch-2.7 (I think it was posted in time for 2.7 branch).


One thing is I don’t know if Daniele had a chance to test this, although it 
looks simple enough.
Did you get a chance to test it? If so, maybe a Tested-by?





> is this still needed?  it
> does seem to mostly apply).  Then we could make sure we don't miss
> things.
> 
> > Darrell
> >
> > -Original Message-
> > From:  on behalf of Joe Stringer 

> > Date: Tuesday, August 8, 2017 at 4:51 PM
> > To: Darrell Ball 
> > Cc: ovs dev 
> > Subject: Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL 
checksum
> > flags on init.
> >
> > On 8 August 2017 at 16:39, Darrell Ball  
wrote:
> > > Reset the DPDK HWOL checksum flags in dp_packet_init_.
> > > The new HWOL bad checksum flag is uninitialized on non-dpdk 
ports and
> > > this is noticed as test failures using netdev-dummy ports 
where the bad
> > > checksum flag is checked.
> > >
> > > Fixes: 7451af618e0d ("dp-packet : Update DPDK rx checksum
> > > validation functions.")
> > > CC: Sugesh Chandran 
> > > Signed-off-by: Darrell Ball 
> > > ---
> > 
> > Tested-by: Joe Stringer 
> > Tested-at: 
https://urldefense.proofpoint.com/v2/url?u=https-3A__travis-2Dci.org_joestringer_openvswitch_jobs_262464859=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=jXM8RI10bFYBt8zzW6xxX0MRb4YxJ_uCcOdGTE0sfJo=d664GG1tgARiN5uEJxebCnczlUnMDtHdWluYN0dRc5g=
 
> > 
> > I'll let those more familiar with this code provide the review.
> > 

Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL checksum flags on init.

2017-08-10 Thread Aaron Conole
Darrell Ball  writes:

> -Original Message-
> From: Aaron Conole 
> Date: Wednesday, August 9, 2017 at 12:51 PM
> To: Darrell Ball 
> Cc: Joe Stringer , Darrell Ball , ovs
> dev 
> Subject: Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL checksum flags 
> on init.
>
> Darrell Ball  writes:
> 
> > Thanks Joe
> > I forgot to add your Tested-by to V5; I have been testing this myself;
> > but let me know if you would like it added – I can send a V6.
> 
> It will automatically be added by patchwork.  It is sufficient to
> download (ex: 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_patch_799499_mbox=DwIFaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=LK8s59ajLx5MbmH5Ng8SG1F1nAgKZJ_0JBs7phbl6C8=HRG5wrQrEXCl_AA8uAGbroVSvYGUIQjQkF8HrMRvNYI=
>  )
> 
> I usually use patchwork when working on a series - if I download it from
> patchwork, I can be confident that all the tags are applied properly, so
> I won't forget.  Plus, all the discussion happens there, so I can
> quickly browse it.
>
> Thanks Aaron
> In this case, the Tested-by was applied to V4 and I sent a V5 which did not 
> have the Tested-by.
> The easiest solution would have been to ask Joe to respond to V5 – well, it 
> is moot now anyways with V6 on the way.

Yep, that happens.

> The full list is available at:
> 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_project_openvswitch_list_=DwIFaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=LK8s59ajLx5MbmH5Ng8SG1F1nAgKZJ_0JBs7phbl6C8=rqaCq5Jr-Vy-cfQ_t5px-zGtdb55CQn9cIAZCC1PvfE=
>  
>
> Thanks, I am on this page most of the day (
> 
> It would actually be cool to have a few more admins to troll patchwork
> and do things like ping for status (ex:
> 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_patch_719492_=DwIFaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=LK8s59ajLx5MbmH5Ng8SG1F1nAgKZJ_0JBs7phbl6C8=y0RPF47IRVw8BujWHzB9oc7IUihgLuCenLfteZ2vUr4=
>   
>
> I don’t have any context on this patch.
> Aaron, would you like to resurrect this patch with rebase ?; it is RHEL 
> related.

It doesn't need a rebase; it applies cleanly.  The version of patchwork
running doesn't recognize my ack, so someone will have to add it
manually (or omit it... doesn't matter too much, since Ben ack'd as
well).

But it would be cool to get it applied;  maybe even to get back as far
as branch-2.7 (I think it was posted in time for 2.7 branch).

> is this still needed?  it
> does seem to mostly apply).  Then we could make sure we don't miss
> things.
> 
> > Darrell
> >
> > -Original Message-
> > From:  on behalf of Joe Stringer 
> 
> > Date: Tuesday, August 8, 2017 at 4:51 PM
> > To: Darrell Ball 
> > Cc: ovs dev 
> > Subject: Re: [ovs-dev] [patch_v4] dp-packet: Reset DPDK HWOL checksum
> > flags on init.
> >
> > On 8 August 2017 at 16:39, Darrell Ball  wrote:
> > > Reset the DPDK HWOL checksum flags in dp_packet_init_.
> > > The new HWOL bad checksum flag is uninitialized on non-dpdk ports 
> and
> > > this is noticed as test failures using netdev-dummy ports where 
> the bad
> > > checksum flag is checked.
> > >
> > > Fixes: 7451af618e0d ("dp-packet : Update DPDK rx checksum
> > > validation functions.")
> > > CC: Sugesh Chandran 
> > > Signed-off-by: Darrell Ball 
> > > ---
> > 
> > Tested-by: Joe Stringer 
> > Tested-at: 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__travis-2Dci.org_joestringer_openvswitch_jobs_262464859=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=jXM8RI10bFYBt8zzW6xxX0MRb4YxJ_uCcOdGTE0sfJo=d664GG1tgARiN5uEJxebCnczlUnMDtHdWluYN0dRc5g=
>  
> > 
> > I'll let those more familiar with this code provide the review.

Re: [ovs-dev] OVS-DPDK public meeting

2017-08-10 Thread Kevin Traynor
9th August 2017

ATTENDEES: Aaron C, Antonio F, Ciara L, Darrell B,
Olga A, Sugesh C, Mark K, Michael L, Johan T, Peter S, Ian S, Kevin T,
Yipeng W, Billy O'M, Finn C, Georg S, Jan S, Simon H, Ben P, Ori, (may
have missed some)

===
GENERAL
===
- OVS 2.8
-- OVS 2.8 branched
-- Looking at end of August for release
-- Bug fixes at this point

- DPDK 17.05.2
-- Discussion on whether this would be out in time for OVS 2.8
-- Probably too late for validation with OVS 2.8

- DPDK 17.08
-- Released this week

- Staging branch
-- Darrell gave us an update about this
-- No serious problems reported so far
-- Plan to submit merge requests on regular basis (~1 week)
-- Request for usual "applied" email reply to merged patches

- Feature proposals for OVS 2.9
-- Ian will send list of features Intel are planning to work on

- Conferences fyi
-- DPDK Userspace - September 26-27 Dublin
-- DPDK Summit - November 14-15 San Jose
-- OVS Conference - TBC, may align with DPDK summit
-- Openstack Summit - November 6-8 Sydney


PERFORMANCE/FEATURES

- Tx batching
-- Won't make OVS 2.8
-- Bhanu mentioned "Additional changes" on ML. Request to ensure all
required patches are submitted.

- Cuckoo distributor
-- Antonio requested some testing for this.


HARDWARE OFFLOAD

- Mellanox will submit patches based on collaboration with Napatech
- Suggestion/Request to try and make higher layers follow same model as
tc offload
- Request for Joe to review


On 07/25/2017 02:25 PM, Kevin Traynor wrote:
> Hi All,
> 
> The OVS-DPDK public meetings have moved to Wednesdays at the same time.
> Everyone is welcome, it's a chance to share status/plans etc.
> 
> It's scheduled for every 2 weeks starting 26th July. Meeting time is
> currently @ 4pm UTC.
> 
> You can connect through Bluejeans or through dial in:
> 
> https://bluejeans.com/139318596
> 
> US: +1.408.740.7256 
> UK: +44.203.608.5256 
> Germany: +49.32.221.091256 
> Ireland: +353.1.697.1256 
> Other numbers at https://www.bluejeans.com/numbers
> Meeting ID: 139318596
> 
> thanks,
> Kevin.
> 



Re: [ovs-dev] [PATCHv2 1/2] netdev: Refactor destruction of netdev_ports.

2017-08-10 Thread Greg Rose

On 08/08/2017 11:23 AM, Joe Stringer wrote:

An upcoming patch will reuse this from elsewhere.

Signed-off-by: Joe Stringer 
---
  lib/netdev.c | 14 ++
  1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/lib/netdev.c b/lib/netdev.c
index 7e9896b82928..3e8b211857d7 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -2255,6 +2255,15 @@ netdev_ports_get(odp_port_t port_no, const struct 
dpif_class *dpif_class)
  return ret;
  }

+static void
+netdev_port_data_destroy(struct port_to_netdev_data *data)
+{
+dpif_port_destroy(&data->dpif_port);
+netdev_close(data->netdev); /* unref and possibly close */
+hmap_remove(&port_to_netdev, &data->node);
+free(data);
+}
+
  int
  netdev_ports_remove(odp_port_t port_no, const struct dpif_class *dpif_class)
  {
@@ -2266,10 +2275,7 @@ netdev_ports_remove(odp_port_t port_no, const struct 
dpif_class *dpif_class)
  data = netdev_ports_lookup(port_no, dpif_class);

  if (data) {
-dpif_port_destroy(&data->dpif_port);
-netdev_close(data->netdev); /* unref and possibly close */
-hmap_remove(&port_to_netdev, &data->node);
-free(data);
+netdev_port_data_destroy(data);
  ret = 0;
  }



LGTM

Reviewed-by: Greg Rose 



[ovs-dev] [PATCH 2/2] dpif-netdev: Per-port conditional EMC insert.

2017-08-10 Thread Ilya Maximets
Conditional EMC insert helps a lot in scenarios with high numbers
of parallel flows, but in the current implementation this option affects
all the threads and ports at once. There are scenarios where we have
different numbers of flows on different ports. For example, if one
of the VMs encapsulates traffic using additional headers, it will
receive a large number of flows but only a few flows will come out of
this VM. In this scenario it's much faster to use the EMC instead of
the classifier for traffic from the VM, but it's better to disable the
EMC for the traffic which flows toward the VM.

To handle the above issue, 'emc-insert-inv-prob' was converted to a
per-port option. The default value and behaviour are kept as is.

For example, the following command sets the insertion probability for
packets that came from port 'dpdk0' to ~1/20, i.e. ~5%:

  ovs-vsctl set interface dpdk0 other_config:emc-insert-inv-prob=20

Signed-off-by: Ilya Maximets 
---
 Documentation/howto/dpdk.rst |   4 +-
 NEWS |   2 +-
 lib/dpif-netdev.c| 106 +--
 tests/pmd.at |   7 ++-
 vswitchd/vswitch.xml |  42 ++---
 5 files changed, 106 insertions(+), 55 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index d7f6610..c620961 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -389,9 +389,9 @@ EMC Insertion Probability
 -
 By default 1 in every 100 flows are inserted into the Exact Match Cache (EMC).
 It is possible to change this insertion probability by setting the
-``emc-insert-inv-prob`` option::
+``emc-insert-inv-prob`` option for the desired interface::
 
-$ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=N
+$ ovs-vsctl set interface  other_config:emc-insert-inv-prob=N
 
 where:
 
diff --git a/NEWS b/NEWS
index 66eb936..a7bfdaf 100644
--- a/NEWS
+++ b/NEWS
@@ -1,6 +1,6 @@
 Post-v2.8.0
 
-   - Nothing yet.
+   - EMC insertion probability turned to per-port other_config.
 
 v2.8.0 - xx xxx 
 -
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 6d42393..94e7bc4 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -266,9 +266,6 @@ struct dp_netdev {
 struct ovs_mutex meter_locks[N_METER_LOCKS];
 struct dp_meter *meters[MAX_METERS]; /* Meter bands. */
 
-/* Probability of EMC insertions is a factor of 'emc_insert_min'.*/
-OVS_ALIGNED_VAR(CACHE_LINE_SIZE) atomic_uint32_t emc_insert_min;
-
 /* Protects access to ofproto-dpif-upcall interface during revalidator
  * thread synchronization. */
 struct fat_rwlock upcall_rwlock;
@@ -364,6 +361,8 @@ struct dp_netdev_port {
 unsigned n_rxq; /* Number of elements in 'rxqs' */
 unsigned *txq_used; /* Number of threads that use each tx queue. */
 struct ovs_mutex txq_used_mutex;
+uint32_t emc_insert_min;/* Probability of EMC insertions is a factor
+ * of 'emc_insert_min'. */
 char *type; /* Port type as requested by user. */
 char *rxq_affinity_list;/* Requested affinity of rx queues. */
 };
@@ -487,6 +486,7 @@ struct dp_netdev_pmd_cycles {
 struct polled_queue {
 struct netdev_rxq *rx;
 odp_port_t port_no;
+uint32_t emc_insert_min;
 };
 
 /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
@@ -508,6 +508,8 @@ struct tx_port {
 struct dp_netdev_pmd_thread_ctx {
 /* Latest measured time. */
 long long now;
+/* EMC insertion probability context for the current processing cycle. */
+uint32_t emc_insert_min;
 /* Used to count cycles. See 'cycles_count_end()' */
 unsigned long long last_cycles;
 };
@@ -1202,8 +1204,6 @@ create_dp_netdev(const char *name, const struct 
dpif_class *class,
 
conntrack_init(&dp->conntrack);
 
-atomic_init(&dp->emc_insert_min, DEFAULT_EM_FLOW_INSERT_MIN);
-
cmap_init(&dp->poll_threads);
 
ovs_mutex_init(&dp->tx_qid_pool_mutex);
@@ -1485,6 +1485,7 @@ port_create(const char *devname, const char *type,
 port->netdev = netdev;
 port->type = xstrdup(type);
 port->sf = sf;
+port->emc_insert_min = DEFAULT_EM_FLOW_INSERT_MIN;
 port->need_reconfigure = true;
 ovs_mutex_init(>txq_used_mutex);
 
@@ -2104,8 +2105,7 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd,
  * default the value is UINT32_MAX / 100 which yields an insertion
  * probability of 1/100 ie. 1% */
 
-uint32_t min;
-atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
+uint32_t min = pmd->ctx.emc_insert_min;
 
 if (min && random_uint32() <= min) {
 emc_insert(>flow_cache, key, flow);
@@ -2914,10 +2914,6 @@ dpif_netdev_set_config(struct dpif *dpif, const struct 
smap *other_config)
 {
 struct dp_netdev *dp = get_dp_netdev(dpif);
 const char *cmask = smap_get(other_config, "pmd-cpu-mask");
-unsigned long long 

[ovs-dev] [PATCH 1/2] dpif-netdev: Keep latest measured time for PMD thread.

2017-08-10 Thread Ilya Maximets
In the current implementation the 'now' variable is updated once on
each receive cycle and passed through the whole datapath via function
arguments. It is better to keep this variable inside the PMD thread
structure so it can be read at any time. Such a solution saves stack
memory and simplifies possible modifications to the current logic.

This patch introduces a new structure, 'dp_netdev_pmd_thread_ctx',
contained by 'struct dp_netdev_pmd_thread', to store any processing
context of this PMD thread. For now, only the time and cycle counters
are moved to that structure. It can be extended in the future.

Signed-off-by: Ilya Maximets 
---

Will be used in the next patch to avoid passing the probability
through the whole datapath.

 lib/dpif-netdev.c | 117 ++
 1 file changed, 65 insertions(+), 52 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e2cd931..6d42393 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -504,6 +504,16 @@ struct tx_port {
 struct hmap_node node;
 };
 
+/* Contained by struct dp_netdev_pmd_thread's 'ctx' member. */
+struct dp_netdev_pmd_thread_ctx {
+/* Latest measured time. */
+long long now;
+/* Used to count cycles. See 'cycles_count_end()' */
+unsigned long long last_cycles;
+};
+
+static void pmd_thread_ctx_time_update(struct dp_netdev_pmd_thread *);
+
 /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
  * the performance overhead of interrupt processing.  Therefore netdev can
  * not implement rx-wait for these devices.  dpif-netdev needs to poll
@@ -556,8 +566,8 @@ struct dp_netdev_pmd_thread {
 /* Cycles counters */
 struct dp_netdev_pmd_cycles cycles;
 
-/* Used to count cicles. See 'cycles_counter_end()' */
-unsigned long long last_cycles;
+/* Current context of the PMD thread. */
+struct dp_netdev_pmd_thread_ctx ctx;
 
 struct latch exit_latch;/* For terminating the pmd thread. */
 struct seq *reload_seq;
@@ -630,8 +640,7 @@ static void dp_netdev_execute_actions(struct 
dp_netdev_pmd_thread *pmd,
   struct dp_packet_batch *,
   bool may_steal, const struct flow *flow,
   const struct nlattr *actions,
-  size_t actions_len,
-  long long now);
+  size_t actions_len);
 static void dp_netdev_input(struct dp_netdev_pmd_thread *,
 struct dp_packet_batch *, odp_port_t port_no);
 static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
@@ -679,9 +688,9 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread 
*pmd);
 
 static void
 dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
-   long long now, bool purge);
+   bool purge);
 static int dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd,
-  struct tx_port *tx, long long now);
+  struct tx_port *tx);
 
 static inline bool emc_entry_alive(struct emc_entry *ce);
 static void emc_clear_entry(struct emc_entry *ce);
@@ -723,6 +732,12 @@ emc_cache_slow_sweep(struct emc_cache *flow_cache)
 flow_cache->sweep_idx = (flow_cache->sweep_idx + 1) & EM_FLOW_HASH_MASK;
 }
 
+static inline void
+pmd_thread_ctx_time_update(struct dp_netdev_pmd_thread *pmd)
+{
+pmd->ctx.now = time_msec();
+}
+
 /* Returns true if 'dpif' is a netdev or dummy dpif, false otherwise. */
 bool
 dpif_is_netdev(const struct dpif *dpif)
@@ -2839,6 +2854,9 @@ dpif_netdev_execute(struct dpif *dpif, struct 
dpif_execute *execute)
 ovs_mutex_lock(>non_pmd_mutex);
 }
 
+/* Update current time in PMD context. */
+pmd_thread_ctx_time_update(pmd);
+
 /* The action processing expects the RSS hash to be valid, because
  * it's always initialized at the beginning of datapath processing.
  * In this case, though, 'execute->packet' may not have gone through
@@ -2851,8 +2869,7 @@ dpif_netdev_execute(struct dpif *dpif, struct 
dpif_execute *execute)
 
 dp_packet_batch_init_packet(, execute->packet);
 dp_netdev_execute_actions(pmd, , false, execute->flow,
-  execute->actions, execute->actions_len,
-  time_msec());
+  execute->actions, execute->actions_len);
 
 if (pmd->core_id == NON_PMD_CORE_ID) {
 ovs_mutex_unlock(>non_pmd_mutex);
@@ -3073,7 +3090,7 @@ cycles_count_start(struct dp_netdev_pmd_thread *pmd)
 OVS_ACQUIRES(_counter_fake_mutex)
 OVS_NO_THREAD_SAFETY_ANALYSIS
 {
-pmd->last_cycles = cycles_counter();
+pmd->ctx.last_cycles = cycles_counter();
 }
 
 /* Stop counting cycles and add them to the counter 'type' */
@@ -3083,7 +3100,7 @@ cycles_count_end(struct 

[ovs-dev] [PATCH 0/2] Per-port EMC insertion probability.

2017-08-10 Thread Ilya Maximets
Ilya Maximets (2):
  dpif-netdev: Keep latest measured time for PMD thread.
  dpif-netdev: Per-port conditional EMC insert.

 Documentation/howto/dpdk.rst |   4 +-
 NEWS |   2 +-
 lib/dpif-netdev.c| 223 ++-
 tests/pmd.at |   7 +-
 vswitchd/vswitch.xml |  42 
 5 files changed, 171 insertions(+), 107 deletions(-)

-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch_v5] dp-packet: Reset DPDK HWOL checksum flags on init.

2017-08-10 Thread Darrell Ball


-Original Message-
From: Darrell Ball 
Date: Wednesday, August 9, 2017 at 1:38 PM
To: "Chandran, Sugesh" , Ben Pfaff 
Cc: "d...@openvswitch.org" 
Subject: Re: [ovs-dev] [patch_v5] dp-packet: Reset DPDK HWOL checksum flags on 
init.



-Original Message-
From: "Chandran, Sugesh" 
Date: Wednesday, August 9, 2017 at 12:55 PM
To: Darrell Ball , Ben Pfaff 
Cc: "d...@openvswitch.org" 
Subject: RE: [ovs-dev] [patch_v5] dp-packet: Reset DPDK HWOL checksum flags 
on init.



Regards
_Sugesh


> > > Correct, I reused reset_dp_packet_checksum_ol_flags() to do the
> > > initialization as well.
> > > I could also have created a separate function.
> > >
> > > In case a DPDK dev is used, those flags will be managed by DPDK.
> > >
> > > That sounds like a bug in itself--is there a missing call to
> > > initialize the mbuf somewhere?
> > >
> > > Are you suggesting to initialize the whole mbuf for each packet?
> >
> > The issue that I'm raising is that it's unusual to take an
> > uninitialized, indeterminate field, and then initialize it by clearing a
> > few bits.  It's much more conventional to initialize it by setting it to
> > zero, like this:
> >
> > p->mbuf.ol_flags = 0;
> >
> >
> > That is better; I will create a separate function then.
> > I will resend.
> > Thanks
> [Sugesh] I also agree with Ben here.
> Currently OVS uses only the checksum offload flags from the mbuf (as far
> as I am aware).
> But there are other flag bits that may get used in the future, like TSO.
> So it's better to initialize the mbuf properly before using it.
>
> Here is the mbuf reset function in DPDK that gets called when an existing
> piece of memory is mapped to an mbuf.
> I believe only the ol_flags are relevant for now in OVS.
>
> There is no higher cost associated with initializing all the ol_flags vs
> some flags, so that is fine.
> It will be done twice in the case of a packet received from a dpdk
> device, but it is a small cost.
> I was more concerned about the dual responsibility conflict when packets
> are received from a DPDK device, and this is why I wanted to limit the
> scope of the flag management in OVS; it should be ok, though.
> Hence, I mentioned that I will initialize all the ol_flags.
>
> JTBC, are you suggesting to initialize all the fields below?
> This would mean that when packets are received from a DPDK dev, both the
> rte library and OVS would separately initialize all the fields below,
> meaning it would be done twice for each packet.
>
[Sugesh] No, we don't have to initialize all the fields. But we can have
a generic function to init all mbuf fields that are relevant in OVS.
For now it's only ol_flags. In the future this function must be updated
when we use more fields from rte_mbuf.

We are on the same page then.
I plan for the function to have a generic name, so this is fine.


Sorry, I really didn't get the case you mentioned above, where the init
gets called twice.
What I can see in the code is that, during the DPDK mempool init,
dp_packet_init is called for rte_mbuf initialization. This happens at
port initialization.

On packet reception, there won't be any call to dp_packet_init.

I meant the memory related to packets that are received, not the
reception time itself.
However, the memory will be reused for subsequent packets, right? You
can't use a piece of memory once and never use it again; that would be a
memory leak. This means each mbuf will need to be initialized again and
again.

I did some background investigation around this, Sugesh.
I originally expected that an init function pointer for the OVS-only
portion of dp-packet would be passed to and saved by the DPDK library, to
be used to opaquely initialize the priv memory (which is OVS common
memory in this case) when rte calls for it.
The function pointer is used, but only for the init of the DPDK memory
pools (which confirms what you mentioned), and not later when mbuf memory
is reused for subsequent packets. Some OVS common memory (i.e. the
non-mbuf portion) of dp-packet must still be initialized for every
packet; the transient fields and seem to be 

[ovs-dev] [PATCH RFC v3 4/4] dpif-netdev: Time based output batching.

2017-08-10 Thread Ilya Maximets
This allows collecting packets from more than one RX burst
and sending them together, with a configurable maximum latency.

'other_config:output-max-latency' can be used to configure the
time that a packet may wait in the output batch before sending.

Signed-off-by: Ilya Maximets 
---
Notes:

* This is an RFC and should not be used for performance testing.
* Millisecond granularity is used for now. Can be easily switched
  to use microseconds instead.

 lib/dpif-netdev.c| 121 ++-
 vswitchd/vswitch.xml |  15 +++
 2 files changed, 115 insertions(+), 21 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index dcf55f3..0d78ae4 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -85,6 +85,9 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 #define MAX_RECIRC_DEPTH 5
 DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0)
 
+/* Use instant packet send by default. */
+#define DEFAULT_OUTPUT_MAX_LATENCY 0
+
 /* Configuration parameters. */
 enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow table. */
 enum { MAX_METERS = 65536 };/* Maximum number of meters. */
@@ -262,6 +265,9 @@ struct dp_netdev {
 struct hmap ports;
 struct seq *port_seq;   /* Incremented whenever a port changes. */
 
+/* The time that a packet can wait in output batch for sending. */
+atomic_uint32_t output_max_latency;
+
 /* Meters. */
 struct ovs_mutex meter_locks[N_METER_LOCKS];
 struct dp_meter *meters[MAX_METERS]; /* Meter bands. */
@@ -502,6 +508,7 @@ struct tx_port {
 int qid;
 long long last_used;
 struct hmap_node node;
+long long output_time;
 struct dp_packet_batch output_pkts;
 };
 
@@ -574,6 +581,9 @@ struct dp_netdev_pmd_thread {
  * than 'cmap_count(dp->poll_threads)'. */
 uint32_t static_tx_qid;
 
+/* Number of filled output batches. */
+int n_output_batches;
+
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
 struct hmap poll_list OVS_GUARDED;
@@ -669,9 +679,9 @@ static void dp_netdev_add_rxq_to_pmd(struct 
dp_netdev_pmd_thread *pmd,
 static void dp_netdev_del_rxq_from_pmd(struct dp_netdev_pmd_thread *pmd,
struct rxq_poll *poll)
 OVS_REQUIRES(pmd->port_mutex);
-static void
+static int
 dp_netdev_pmd_flush_output_packets(struct dp_netdev_pmd_thread *pmd,
-   long long now);
+   long long now, bool force);
 static void reconfigure_datapath(struct dp_netdev *dp)
 OVS_REQUIRES(dp->port_mutex);
 static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd);
@@ -1193,6 +1203,7 @@ create_dp_netdev(const char *name, const struct 
dpif_class *class,
 conntrack_init(>conntrack);
 
 atomic_init(>emc_insert_min, DEFAULT_EM_FLOW_INSERT_MIN);
+atomic_init(>output_max_latency, DEFAULT_OUTPUT_MAX_LATENCY);
 
 cmap_init(>poll_threads);
 
@@ -2858,7 +2869,7 @@ dpif_netdev_execute(struct dpif *dpif, struct 
dpif_execute *execute)
 dp_packet_batch_init_packet(, execute->packet);
 dp_netdev_execute_actions(pmd, , false, execute->flow,
   execute->actions, execute->actions_len, now);
-dp_netdev_pmd_flush_output_packets(pmd, now);
+dp_netdev_pmd_flush_output_packets(pmd, now, true);
 
 if (pmd->core_id == NON_PMD_CORE_ID) {
 ovs_mutex_unlock(>non_pmd_mutex);
@@ -2907,6 +2918,16 @@ dpif_netdev_set_config(struct dpif *dpif, const struct 
smap *other_config)
 smap_get_ullong(other_config, "emc-insert-inv-prob",
 DEFAULT_EM_FLOW_INSERT_INV_PROB);
 uint32_t insert_min, cur_min;
+uint32_t output_max_latency, cur_max_latency;
+
+output_max_latency = smap_get_int(other_config, "output-max-latency",
+  DEFAULT_OUTPUT_MAX_LATENCY);
+atomic_read_relaxed(>output_max_latency, _max_latency);
+if (output_max_latency != cur_max_latency) {
+atomic_store_relaxed(>output_max_latency, output_max_latency);
+VLOG_INFO("Output maximum latency set to %"PRIu32" ms",
+  output_max_latency);
+}
 
 if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
 free(dp->pmd_cmask);
@@ -3107,11 +3128,12 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread 
*pmd,
 non_atomic_ullong_add(>cycles.n[type], interval);
 }
 
-static void
+static int
 dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd,
struct tx_port *p, long long now)
 {
 int tx_qid;
+int output_cnt;
 bool dynamic_txqs;
 
 dynamic_txqs = p->port->dynamic_txqs;
@@ -3121,21 +3143,39 @@ dp_netdev_pmd_flush_output_on_port(struct 
dp_netdev_pmd_thread *pmd,
 tx_qid = pmd->static_tx_qid;
 }
 
+output_cnt = dp_packet_batch_size(>output_pkts);
 

[ovs-dev] [PATCH v3 3/4] netdev-dpdk: Remove useless cutlen.

2017-08-10 Thread Ilya Maximets
Cutlen already applied while processing OVS_ACTION_ATTR_OUTPUT.

Signed-off-by: Ilya Maximets 
---
 lib/netdev-dpdk.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8e3158f..ddcc574 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1819,8 +1819,6 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct 
dp_packet_batch *batch)
 int newcnt = 0;
 int i;
 
-dp_packet_batch_apply_cutlen(batch);
-
 for (i = 0; i < batch->count; i++) {
 int size = dp_packet_size(batch->packets[i]);
 
@@ -1879,7 +1877,6 @@ netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
 dpdk_do_tx_copy(netdev, qid, batch);
 dp_packet_delete_batch(batch, true);
 } else {
-dp_packet_batch_apply_cutlen(batch);
 __netdev_dpdk_vhost_send(netdev, qid, batch->packets, batch->count);
 }
 return 0;
@@ -1910,8 +1907,6 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
 int cnt = batch->count;
 struct rte_mbuf **pkts = (struct rte_mbuf **) batch->packets;
 
-dp_packet_batch_apply_cutlen(batch);
-
 cnt = netdev_dpdk_filter_packet_len(dev, pkts, cnt);
 cnt = netdev_dpdk_qos_run(dev, pkts, cnt);
 dropped = batch->count - cnt;
-- 
2.7.4



[ovs-dev] [PATCH v3 1/4] dpif-netdev: Output packet batching.

2017-08-10 Thread Ilya Maximets
While processing an incoming batch of packets, they are scattered
across many per-flow batches and sent separately.

This becomes an issue when more than a few flows are in use.

For example, with balanced-tcp OvS bonding on 2 ports there will be
256 datapath internal flows for each dp_hash pattern. A single
received batch is then scattered across all 256 per-flow batches,
and send is invoked for each packet separately. This behaviour
greatly degrades the overall performance of netdev_send because it
cannot take advantage of vectorized transmit functions.
However, half of the datapath flows (with 2 ports in bonding) share
the same output action. This means that we can collect those packets
back in a single place and send them at once with a single call to
netdev_send. This patch introduces a per-port batch for output
packets for that purpose.

'output_pkts' batch is thread local and located in send port cache.

Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 104 ++
 1 file changed, 82 insertions(+), 22 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e2cd931..a2a25be 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -502,6 +502,7 @@ struct tx_port {
 int qid;
 long long last_used;
 struct hmap_node node;
+struct dp_packet_batch output_pkts;
 };
 
 /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
@@ -633,9 +634,10 @@ static void dp_netdev_execute_actions(struct 
dp_netdev_pmd_thread *pmd,
   size_t actions_len,
   long long now);
 static void dp_netdev_input(struct dp_netdev_pmd_thread *,
-struct dp_packet_batch *, odp_port_t port_no);
+struct dp_packet_batch *, odp_port_t port_no,
+long long now);
 static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
-  struct dp_packet_batch *);
+  struct dp_packet_batch *, long long now);
 
 static void dp_netdev_disable_upcall(struct dp_netdev *);
 static void dp_netdev_pmd_reload_done(struct dp_netdev_pmd_thread *pmd);
@@ -667,6 +669,9 @@ static void dp_netdev_add_rxq_to_pmd(struct 
dp_netdev_pmd_thread *pmd,
 static void dp_netdev_del_rxq_from_pmd(struct dp_netdev_pmd_thread *pmd,
struct rxq_poll *poll)
 OVS_REQUIRES(pmd->port_mutex);
+static void
+dp_netdev_pmd_flush_output_packets(struct dp_netdev_pmd_thread *pmd,
+   long long now);
 static void reconfigure_datapath(struct dp_netdev *dp)
 OVS_REQUIRES(dp->port_mutex);
 static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd);
@@ -2809,6 +2814,7 @@ dpif_netdev_execute(struct dpif *dpif, struct 
dpif_execute *execute)
 struct dp_netdev *dp = get_dp_netdev(dpif);
 struct dp_netdev_pmd_thread *pmd;
 struct dp_packet_batch pp;
+long long now = time_msec();
 
 if (dp_packet_size(execute->packet) < ETH_HEADER_LEN ||
 dp_packet_size(execute->packet) > UINT16_MAX) {
@@ -2851,8 +2857,8 @@ dpif_netdev_execute(struct dpif *dpif, struct 
dpif_execute *execute)
 
 dp_packet_batch_init_packet(, execute->packet);
 dp_netdev_execute_actions(pmd, , false, execute->flow,
-  execute->actions, execute->actions_len,
-  time_msec());
+  execute->actions, execute->actions_len, now);
+dp_netdev_pmd_flush_output_packets(pmd, now);
 
 if (pmd->core_id == NON_PMD_CORE_ID) {
 ovs_mutex_unlock(>non_pmd_mutex);
@@ -3101,6 +3107,37 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread 
*pmd,
 non_atomic_ullong_add(>cycles.n[type], interval);
 }
 
+static void
+dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd,
+   struct tx_port *p, long long now)
+{
+int tx_qid;
+bool dynamic_txqs;
+
+dynamic_txqs = p->port->dynamic_txqs;
+if (dynamic_txqs) {
+tx_qid = dpif_netdev_xps_get_tx_qid(pmd, p, now);
+} else {
+tx_qid = pmd->static_tx_qid;
+}
+
+netdev_send(p->port->netdev, tx_qid, >output_pkts, true, dynamic_txqs);
+dp_packet_batch_init(>output_pkts);
+}
+
+static void
+dp_netdev_pmd_flush_output_packets(struct dp_netdev_pmd_thread *pmd,
+   long long now)
+{
+struct tx_port *p;
+
+HMAP_FOR_EACH (p, node, >send_port_cache) {
+if (!dp_packet_batch_is_empty(>output_pkts)) {
+dp_netdev_pmd_flush_output_on_port(pmd, p, now);
+}
+}
+}
+
 static int
 dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
struct netdev_rxq *rx,
@@ -3113,10 +3150,13 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread 
*pmd,
 dp_packet_batch_init();
 

[ovs-dev] [PATCH v3 0/4] Output packet batching.

2017-08-10 Thread Ilya Maximets
This patch set is inspired by [1] from Bhanuprakash Bodireddy.
The implementation in [1] looks very complex and introduces many
pitfalls [2] for later code modifications, such as packets possibly
getting stuck.

This version aims to provide simple and flexible output packet batching
at a higher level, simplifying the netdev layer rather than extending it.

The patch set consists of 3 patches. All of the functionality is
introduced in the first patch; the other two are just cleanups of
netdevs so that they do not do unnecessary work.

The 4th patch is just an RFC with a possible time-based implementation
and should not be considered for performance testing.

Basic testing of the 'PVP with OVS bonding on phy ports' scenario shows
a significant performance improvement.
More accurate and intensive testing is required.

[1] [PATCH v4 0/5] netdev-dpdk: Use intermediate queue during packet 
transmission.
https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337019.html

[2] For example:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337133.html

Version 3:

* Rebased on current master.
* Time based RFC: fixed assert on n_output_batches <= 0.

Version 2:

* Rebased on current master.
* Added time based batching RFC patch.
* Fixed mixing packets with different sources in same batch.

Ilya Maximets (4):
  dpif-netdev: Output packet batching.
  netdev: Remove unused may_steal.
  netdev-dpdk: Remove useless cutlen.
  dpif-netdev: Time based output batching.

 lib/dpif-netdev.c | 197 ++
 lib/netdev-bsd.c  |   4 +-
 lib/netdev-dpdk.c |  30 +++-
 lib/netdev-dummy.c|   4 +-
 lib/netdev-linux.c|   4 +-
 lib/netdev-provider.h |   7 +-
 lib/netdev.c  |  12 +--
 lib/netdev.h  |   2 +-
 vswitchd/vswitch.xml  |  15 
 9 files changed, 208 insertions(+), 67 deletions(-)

-- 
2.7.4



Re: [ovs-dev] [PATCH v5 1/2] OF support and translation of generic encap and decap

2017-08-10 Thread Jan Scheurich
The generic code today in function parse_ENCAP() uses the encap header string
also as the string for the property class. I am afraid that implementers of
subsequent encap headers might not realize that this is a temporary
implementation shortcut that should have been generalized.

The minimum we should add now is a comment explaining this, for example:

diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index bfc8a80..84522e0 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -4134,9 +4134,19 @@ parse_ed_props(const uint16_t prop_class, char **arg, 
int *n_props, struct ofpbu
 return NULL;
 }

-/* The string representation of the encap action is
- * encap(header_type(prop=,tlv(,,),...))
- */
+/* The generic string representation of the encap action is
+ * encap()
+ * encap((=,(),...))
+ *
+ * TODO: The current implementation only supports the simple case that all
+ * encap parameters for a given encap header type are in a single property
+ * class, identified by the same keyword as the encap header type.
+ * To represent different property classes allowed by the OF action, the
+ * syntax should be generalized as follows:
+ *
+ * encap((=,(),...),
+ *   (=,(),...),...)
+*/

 static char * OVS_WARN_UNUSED_RESULT
 parse_ENCAP(char *arg,

BR, Jan

> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Tuesday, 08 August, 2017 21:36
> To: Jan Scheurich 
> Cc: Yi Yang ; d...@openvswitch.org; Zoltán Balogh 
> 
> Subject: Re: [PATCH v5 1/2] OF support and translation of generic encap and 
> decap
> 
> We can add the additional syntax at the same time we add something that
> has the need for it.
> 
> On Tue, Aug 08, 2017 at 03:52:32PM +, Jan Scheurich wrote:
> > I know this comment is late as the patch has been merged already, but I 
> > just returned from vacation and only found today:
> >
> > I agree that the new simplified ovs-ofctl syntax for encap properties looks 
> > quite neat for the NSH use case at hand:
> >
> > encap(nsh(md_type=1))
> > encap(nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210)))
> >
> > But unfortunately we have fallen into the trap of oversimplifying. The 
> > generic encap action (drafted in EXT-382) distinguishes between
> the encapsulation packet type (in this case (1,0x894f) for NSH) and the encap 
> property class (in this case 0x4 for NSH), which may be
> different for every property in an encap action.
> >
> > Using the "nsh" keyword both as shorthand for the packet type and the 
> > property class selector works here because the NSH encap action
> only supports NSH properties so far, and in many simple applications of the 
> generic encap action there may well be 1-1 relation between
> the encap packet type and the class of all supported properties. But the ONF 
> draft specifically decouples the packet type from property
> classes for a richer set of use cases.
> >
> > For example there could be a more powerful generic encap operation to add 
> > multiple protocol layers in one action. In this case the
> encap properties would likely belong to several property classes. Or an 
> encapsulation header re-uses a general encap property already
> defined elsewhere, which should not be duplicated.
> >
> > In order not to restrict the generality of the ovs-ofctl syntax, I'd 
> > suggest to separate packet type and the property class as follows:
> >
> > encap()
> > encap((,(=,()),...)
> >
> > For NSH the syntax would look:
> >
> > encap(nsh,nsh(md_type=1))
> > encap(nsh,nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210)))
> >
> > To be user friendly we can also allow the above more concise syntax in the 
> > special case of a 1-1 relation between packet type and
> property class. In that case we could re-add the generalization of the syntax 
> after the OVS 2.8 release without breaking backward
> compatibility.
> >
> > BR, Jan
> >
> > > -Original Message-
> > > From: Yi Yang [mailto:yi.y.y...@intel.com]
> > > Sent: Wednesday, 02 August, 2017 10:04
> > > To: d...@openvswitch.org
> > > Cc: b...@ovn.org; Jan Scheurich ; Yi Yang 
> > > ; Zoltán Balogh
> > > 
> > > Subject: [PATCH v5 1/2] OF support and translation of generic encap and 
> > > decap
> > >
> > > From: Jan Scheurich 
> > >
> > > This commit adds support for the OpenFlow actions generic encap
> > > and decap (as specified in ONF EXT-382) to the OVS control plane.
> > >
> > > CLI syntax for encap action with properties:
> > >   encap()
> > >   encap((=,(,,),...))
> > >
> > > For example:
> > >   encap(ethernet)
> > >   encap(nsh(md_type=1))
> > >   
> > > encap(nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210)))
> > >
> > > CLI syntax for decap action:
> > >   decap()
> > >   decap(packet_type(ns=,type=))
> > >
> > > For 



[ovs-dev] OVN question: Transmission of link-local IPv6 multicast

2017-08-10 Thread Mark Michelson
Hi,

I'm curious about the current behavior when ovn-controller originates a
packet and transmits it over a link-local IPv6 multicast address. Will the
packet be delivered only to nodes on the same chassis, or can the packet
also reach on-link nodes on other chassis?

The reason I ask is that I'm working towards expanding OVN's IPv6 neighbor
discovery capabilities, and the first step on that path is being able to
send periodic router advertisements (RAs). In a traditional physical
network, each interface on a router may send periodic RAs to its link-local
all-nodes address (ff02::1). This way all on-link nodes receive the RA, and
the world is a happy place.

OVN is not a traditional physical router, though :)

With OVN, each hypervisor has an ovn-controller instance running, blurring
the lines about what exactly constitutes a "router interface". From an
ovn-nb point of view, a logical router port constitutes a router interface.
However, in practice, I'm not sure if this holds true once you get down to
the ovn-controller level. Let's say that a single link (via a logical
switch) is spread across three hypervisors. If the ovn-controller on each
of those hypervisors sends an RA to the link-local all-nodes address, would
that result in nodes on each hypervisor receiving three RAs? Or would each
node only receive one RA since it only receives the RA from the local
ovn-controller?

I suspect the answer to this all comes down to how openflow flows get
installed on each OVS instance. What I'm curious about is the current
behavior of OVN in this regard.

Thanks,
Mark Michelson

P.S. Someone please tell me if ovs-discuss would have been a more suitable
list for this question. I figured since it was related to a feature I was
developing that this would be the proper list, though.


Re: [ovs-dev] [PATCH] netdev-dpdk: include dpdk PCI header directly

2017-08-10 Thread Loftus, Ciara
> 
> On 08/09/2017 10:00 PM, Aaron Conole wrote:
> > As part of a devargs rework in DPDK, the PCI header file was removed, and
> > needs to be directly included.  This isn't required to build with 17.05 or
> > earlier, but will be required should a future update happen.
> >
> > Signed-off-by: Aaron Conole 
> 
> Tried to build with dpdk 17.08
> 
> Tested-By: Timothy Redaelli 

+1 for forwards compatibility

Acked-by: Ciara Loftus 



Re: [ovs-dev] [PATCH] netdev-dpdk: include dpdk PCI header directly

2017-08-10 Thread Timothy M. Redaelli
On 08/09/2017 10:00 PM, Aaron Conole wrote:
> As part of a devargs rework in DPDK, the PCI header file was removed, and
> needs to be directly included.  This isn't required to build with 17.05 or
> earlier, but will be required should a future update happen.
> 
> Signed-off-by: Aaron Conole 

Tried to build with dpdk 17.08

Tested-By: Timothy Redaelli 


[ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support

2017-08-10 Thread Yi Yang
The OVS master and 2.8 branches have merged the NSH userspace
patch series. This patch enables NSH support in the kernel
datapath so that OVS can support NSH in the 2.8 release in
compat mode by porting this change.

Signed-off-by: Yi Yang 
---
 drivers/net/vxlan.c  |   7 ++
 include/net/nsh.h| 126 ++
 include/uapi/linux/openvswitch.h |  33 
 net/openvswitch/actions.c| 165 +++
 net/openvswitch/flow.c   |  41 ++
 net/openvswitch/flow.h   |   1 +
 net/openvswitch/flow_netlink.c   |  54 -
 7 files changed, 426 insertions(+), 1 deletion(-)
 create mode 100644 include/net/nsh.h

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index dbca067..843714c 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include 
@@ -1267,6 +1268,9 @@ static bool vxlan_parse_gpe_hdr(struct vxlanhdr *unparsed,
case VXLAN_GPE_NP_IPV6:
*protocol = htons(ETH_P_IPV6);
break;
+   case VXLAN_GPE_NP_NSH:
+   *protocol = htons(ETH_P_NSH);
+   break;
case VXLAN_GPE_NP_ETHERNET:
*protocol = htons(ETH_P_TEB);
break;
@@ -1806,6 +1810,9 @@ static int vxlan_build_gpe_hdr(struct vxlanhdr *vxh, u32 
vxflags,
case htons(ETH_P_IPV6):
gpe->next_protocol = VXLAN_GPE_NP_IPV6;
return 0;
+   case htons(ETH_P_NSH):
+   gpe->next_protocol = VXLAN_GPE_NP_NSH;
+   return 0;
case htons(ETH_P_TEB):
gpe->next_protocol = VXLAN_GPE_NP_ETHERNET;
return 0;
diff --git a/include/net/nsh.h b/include/net/nsh.h
new file mode 100644
index 000..96477a1
--- /dev/null
+++ b/include/net/nsh.h
@@ -0,0 +1,126 @@
+#ifndef __NET_NSH_H
+#define __NET_NSH_H 1
+
+
+/*
+ * Network Service Header:
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |Ver|O|C|R|R|R|R|R|R|Length   |   MD Type   |  Next Proto   |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |Service Path ID| Service Index |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |   |
+ * ~   Mandatory/Optional Context Header   ~
+ * |   |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * Ver = The version field is used to ensure backward compatibility
+ *   going forward with future NSH updates.  It MUST be set to 0x0
+ *   by the sender, in this first revision of NSH.
+ *
+ * O = OAM. when set to 0x1 indicates that this packet is an operations
+ * and management (OAM) packet.  The receiving SFF and SFs nodes
+ * MUST examine the payload and take appropriate action.
+ *
+ * C = context. Indicates that a critical metadata TLV is present.
+ *
+ * Length : total length, in 4-byte words, of NSH including the Base
+ *  Header, the Service Path Header and the optional variable
+ *  TLVs.
+ * MD Type: indicates the format of NSH beyond the mandatory Base Header
+ *  and the Service Path Header.
+ *
+ * Next Protocol: indicates the protocol type of the original packet. A
+ *  new IANA registry will be created for protocol type.
+ *
+ * Service Path Identifier (SPI): identifies a service path.
+ *  Participating nodes MUST use this identifier for Service
+ *  Function Path selection.
+ *
+ * Service Index (SI): provides location within the SFP.
+ *
+ * [0] https://tools.ietf.org/html/draft-ietf-sfc-nsh-13
+ */
+
+/**
+ * struct nsh_md1_ctx - Keeps track of NSH context data
+ * @nshc<1-4>: NSH Contexts.
+ */
+struct nsh_md1_ctx {
+   __be32 c[4];
+};
+
+struct nsh_md2_tlv {
+   __be16 md_class;
+   u8 type;
+   u8 length;
+   u8 md_value[];
+};
+
+struct nsh_hdr {
+   __be16 ver_flags_len;
+   u8 md_type;
+   u8 next_proto;
+   __be32 path_hdr;
+   union {
+   struct nsh_md1_ctx md1;
+   struct nsh_md2_tlv md2[0];
+   };
+};
+
+/* Masking NSH header fields. */
+#define NSH_VER_MASK   0xc000
+#define NSH_VER_SHIFT  14
+#define NSH_FLAGS_MASK 0x3fc0
+#define NSH_FLAGS_SHIFT6
+#define NSH_LEN_MASK   0x003f
+#define NSH_LEN_SHIFT  0
+
+#define NSH_SPI_MASK   0xffffff00
+#define NSH_SPI_SHIFT  8
+#define NSH_SI_MASK    0x000000ff
+#define NSH_SI_SHIFT   0
+
+#define NSH_DST_PORT   4790 /* UDP Port for NSH on VXLAN. */
+#define ETH_P_NSH   0x894F   /* Ethertype for NSH. */
+
+/* NSH Base Header Next Protocol. */
+#define NSH_P_IPV4 0x01
+#define NSH_P_IPV6 0x02
+#define NSH_P_ETHERNET 0x03
+#define NSH_P_NSH