Re: [ovs-dev] [PATCH v2 1/2] netdev-dpdk: Use intermediate queue during packet transmission.

2017-01-17 Thread Ilya Maximets
Not a complete review. This code is full of races.
See details inline.

Best regards, Ilya Maximets.

On 17.01.2017 18:37, antonio.fische...@intel.com wrote:
> This patch implements the intermediate Tx queues on 'dpdk' type ports.
> 
> Test results:
>  * In the worst-case scenario, with few packets per batch, a significant
>    bottleneck is observed in the netdev_dpdk_eth_send() function due to
>    expensive MMIO writes.
> 
>  * Also it's observed that the CPI (cycles per instruction) rate for the
>    function stood between 3.15 and 4.1, which is significantly higher than
>    the acceptable limit of 1.0 for HPC applications and the theoretical
>    limit of 0.25 (as the backend pipeline can retire 4 micro-operations
>    per cycle).
> 
>  * With this patch, CPI for netdev_dpdk_eth_send() is at 0.55 and the overall
>throughput improved significantly.
> 
> 
> Signed-off-by: Antonio Fischetti 
> Signed-off-by: Bhanuprakash Bodireddy 
> Co-authored-by: Bhanuprakash Bodireddy 
> Signed-off-by: Markus Magnusson 
> Co-authored-by: Markus Magnusson 
> ---
>  lib/dpif-netdev.c | 53 +++--
>  lib/netdev-bsd.c  |  1 +
>  lib/netdev-dpdk.c | 82 
> ++-
>  lib/netdev-dummy.c|  1 +
>  lib/netdev-linux.c|  1 +
>  lib/netdev-provider.h |  8 +
>  lib/netdev-vport.c|  3 +-
>  lib/netdev.c  |  9 ++
>  lib/netdev.h  |  1 +
>  9 files changed, 149 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 3901129..58ac429 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -289,6 +289,8 @@ struct dp_netdev_rxq {
>  struct dp_netdev_pmd_thread *pmd;  /* pmd thread that will poll this 
> queue. */
>  };
>  
> +#define LAST_USED_QID_NONE -1
> +
>  /* A port in a netdev-based datapath. */
>  struct dp_netdev_port {
>  odp_port_t port_no;
> @@ -303,6 +305,8 @@ struct dp_netdev_port {
>  char *type; /* Port type as requested by user. */
>  char *rxq_affinity_list;/* Requested affinity of rx queues. */
>  bool need_reconfigure;  /* True if we should reconfigure netdev. */
> +int last_used_qid;  /* Last queue id where packets could be
> +   enqueued. */
>  };
>  
>  /* Contained by struct dp_netdev_flow's 'stats' member.  */
> @@ -619,6 +623,9 @@ static int dpif_netdev_xps_get_tx_qid(const struct 
> dp_netdev_pmd_thread *pmd,
>  static inline bool emc_entry_alive(struct emc_entry *ce);
>  static void emc_clear_entry(struct emc_entry *ce);
>  
> +static struct tx_port *pmd_send_port_cache_lookup
> +(const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
> +
>  static void
>  emc_cache_init(struct emc_cache *flow_cache)
>  {
> @@ -3507,15 +3514,19 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread 
> *pmd,
>  return i;
>  }
>  
> +enum { DRAIN_TSC = 2ULL };
> +
>  static void *
>  pmd_thread_main(void *f_)
>  {
>  struct dp_netdev_pmd_thread *pmd = f_;
> -unsigned int lc = 0;
> +unsigned int lc = 0, lc_drain = 0;
>  struct polled_queue *poll_list;
>  bool exiting;
>  int poll_cnt;
>  int i;
> +uint64_t prev = 0, now = 0;
> +struct tx_port *tx_port;
>  
>  poll_list = NULL;
>  
> @@ -3548,6 +3559,26 @@ reload:
> poll_list[i].port_no);
>  }
>  
> +#define MAX_LOOP_TO_DRAIN 128
> +if (lc_drain++ > MAX_LOOP_TO_DRAIN) {
> +lc_drain = 0;
> +prev = now;
> +now = pmd->last_cycles;
> +if ((now - prev) > DRAIN_TSC) {
> +HMAP_FOR_EACH (tx_port, node, &pmd->tx_ports) {

'pmd->tx_ports' must be protected by 'pmd->port_mutex'. Also, it can be changed
while the pmd is still working. I think you wanted something like
'pmd->send_port_cache'.
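
For illustration, a minimal sketch of that suggestion: 'send_port_cache' is
the pmd-local map that only the owning thread touches, so the drain loop
needs no lock (this is a sketch, not the actual fix; 'drain_one_port' is a
hypothetical helper):

    struct tx_port *tx_port;

    HMAP_FOR_EACH (tx_port, node, &pmd->send_port_cache) {
        /* 'send_port_cache' is private to this pmd thread, so no
         * 'port_mutex' is needed here. */
        drain_one_port(tx_port);
    }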

> +if (tx_port->port->last_used_qid != LAST_USED_QID_NONE) {
> +/* This queue may contain some buffered packets waiting
> + * to be sent out. */
> +netdev_txq_drain(tx_port->port->netdev,
> +tx_port->port->last_used_qid,
> +tx_port->port->dynamic_txqs);
> +/* Mark it as empty. */
> +tx_port->port->last_used_qid = LAST_USED_QID_NONE;

'port' is a pointer to the common structure --> 'port->last_used_qid' will be
concurrently updated by all threads --> total mess.
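
One way out, sketched here only as an illustration: keep the per-thread state
in 'struct tx_port', which each pmd thread owns privately, instead of in the
shared 'struct dp_netdev_port' (the 'last_used_qid' member below is the
hypothetical per-thread copy):

    struct tx_port {
        struct dp_netdev_port *port;
        int qid;
        int last_used_qid;    /* Hypothetical: only the owning pmd thread
                               * reads or writes this field. */
        struct hmap_node node;
    };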

> +}
> +}
> +}
> +}
> +
>  if (lc++ > 1024) {
>  bool reload;
>  
> @@ -3883,6 +3914,7 @@ dp_netdev_add_port_tx_to_pmd(struct 
> dp_netdev_pmd_thread *pmd,
>  
>  tx->port = port;
>  tx->qid = -1;
> +

[ovs-dev] [PATCH] configuration.rst: Update the example of DPDK port's configuration

2017-01-17 Thread Binbin Xu
After the hotplug of DPDK ports, a valid dpdk-devargs must be
specified. Otherwise, the DPDK device won't be available.

Signed-off-by: Binbin Xu 
---
 Documentation/faq/configuration.rst | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/Documentation/faq/configuration.rst 
b/Documentation/faq/configuration.rst
index c03d069..8bd0e11 100644
--- a/Documentation/faq/configuration.rst
+++ b/Documentation/faq/configuration.rst
@@ -107,12 +107,11 @@ Q: How do I configure a DPDK port as an access port?
 startup when other_config:dpdk-init is set to 'true'.
 
 Secondly, when adding a DPDK port, unlike a system port, the type for the
-interface must be specified. For example::
+interface and valid dpdk-devargs must be specified. For example::
 
 $ ovs-vsctl add-br br0
-$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
-
-Finally, it is required that DPDK port names begin with ``dpdk``.
+$ ovs-vsctl add-port br0 myportname -- set Interface myportname \
type=dpdk options:dpdk-devargs=0000:06:00.0
 
 Refer to :doc:`/intro/install/dpdk` for more information on enabling and
 using DPDK with Open vSwitch.
-- 
1.8.3.1




[ovs-dev] [PATCH v2 0/2] netdev-dpdk: Use intermediate queue during packet transmission.

2017-01-17 Thread antonio . fischetti
After packet classification, packets are queued into batches depending
on the matching netdev flow. Thereafter each batch is processed to execute
the related actions. This becomes particularly inefficient if there
are few packets in each batch as rte_eth_tx_burst() incurs expensive MMIO
writes.

This patchset adds back the intermediate queue implementation. Packets are
queued and burst when their count is >= NETDEV_MAX_BURST. Also, the drain
logic is refactored to handle packets lingering in the tx queues. Testing
shows significant performance gains with this implementation.
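
For readers new to the idea, a self-contained sketch of the enqueue/burst
logic described above (structure and names are illustrative, not the patch's
actual code):

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define NETDEV_MAX_BURST 32

    struct txq {
        uint16_t port_id;
        uint16_t qid;
        uint16_t count;
        struct rte_mbuf *buf[NETDEV_MAX_BURST];
    };

    /* Queue one packet; burst only once a full batch has accumulated, so
     * the expensive MMIO doorbell write is amortized over many packets. */
    static void
    txq_enqueue(struct txq *q, struct rte_mbuf *pkt)
    {
        q->buf[q->count++] = pkt;
        if (q->count >= NETDEV_MAX_BURST) {
            uint16_t sent = rte_eth_tx_burst(q->port_id, q->qid,
                                             q->buf, q->count);
            while (sent < q->count) {
                /* Free whatever the NIC refused to queue. */
                rte_pktmbuf_free(q->buf[sent++]);
            }
            q->count = 0;
        }
    }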

Fixes: b59cc14e032d ("netdev-dpdk: Use instant sending instead of
queueing of packets.")

V1 => V2:
  * xps_get_tx_qid() is no longer called twice. The last used qid is stored so
    the drain function will flush the right queue even when XPS is enabled.

  * netdev_txq_drain() is called unconditionally and not just for dpdk ports.

  * txq_drain() takes the 'tx_lock' for the queue in case of dynamic tx
    queues; see the sketch after this list.

  * Restored counting of dropped packets.

  * Changed scheduling of drain function.

  * Updated comments in netdev-provider.h

  * Fixed a comment in dp-packet.h.
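
As referenced above, a sketch of what the locked drain path could look like
(member names such as 'burst_pkts' and 'count' are illustrative assumptions,
not necessarily the patch's actual fields):

    static void
    netdev_dpdk_txq_drain(struct netdev_dpdk *dev, int qid, bool concurrent_txq)
    {
        struct dpdk_tx_queue *txq = &dev->tx_q[qid];

        if (txq->count) {
            if (concurrent_txq) {
                /* With dynamic tx queues several pmd threads may share the
                 * queue, so the per-queue lock must be taken. */
                rte_spinlock_lock(&txq->tx_lock);
            }
            netdev_dpdk_eth_tx_burst(dev, qid, txq->burst_pkts, txq->count);
            txq->count = 0;
            if (concurrent_txq) {
                rte_spinlock_unlock(&txq->tx_lock);
            }
        }
    }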

Fischetti, Antonio (2):
  netdev-dpdk: Use intermediate queue during packet transmission.
  dp-packet: Fix comment on DPDK buffer data.

 lib/dp-packet.c   |  4 +--
 lib/dp-packet.h   | 10 +++
 lib/dpif-netdev.c | 53 +++--
 lib/netdev-bsd.c  |  1 +
 lib/netdev-dpdk.c | 82 ++-
 lib/netdev-dummy.c|  1 +
 lib/netdev-linux.c|  1 +
 lib/netdev-provider.h |  8 +
 lib/netdev-vport.c|  3 +-
 lib/netdev.c  |  9 ++
 lib/netdev.h  |  1 +
 11 files changed, 156 insertions(+), 17 deletions(-)

-- 
2.4.11



[ovs-dev] [PATCH v2 1/2] netdev-dpdk: Use intermediate queue during packet transmission.

2017-01-17 Thread antonio . fischetti
This patch implements the intermediate Tx queues on 'dpdk' type ports.

Test results:
 * In the worst-case scenario, with few packets per batch, a significant
   bottleneck is observed in the netdev_dpdk_eth_send() function due to
   expensive MMIO writes.

 * Also it's observed that the CPI (cycles per instruction) rate for the
   function stood between 3.15 and 4.1, which is significantly higher than
   the acceptable limit of 1.0 for HPC applications and the theoretical limit
   of 0.25 (as the backend pipeline can retire 4 micro-operations per cycle).

 * With this patch, CPI for netdev_dpdk_eth_send() is at 0.55 and the overall
   throughput improved significantly.


Signed-off-by: Antonio Fischetti 
Signed-off-by: Bhanuprakash Bodireddy 
Co-authored-by: Bhanuprakash Bodireddy 
Signed-off-by: Markus Magnusson 
Co-authored-by: Markus Magnusson 
---
 lib/dpif-netdev.c | 53 +++--
 lib/netdev-bsd.c  |  1 +
 lib/netdev-dpdk.c | 82 ++-
 lib/netdev-dummy.c|  1 +
 lib/netdev-linux.c|  1 +
 lib/netdev-provider.h |  8 +
 lib/netdev-vport.c|  3 +-
 lib/netdev.c  |  9 ++
 lib/netdev.h  |  1 +
 9 files changed, 149 insertions(+), 10 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 3901129..58ac429 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -289,6 +289,8 @@ struct dp_netdev_rxq {
 struct dp_netdev_pmd_thread *pmd;  /* pmd thread that will poll this 
queue. */
 };
 
+#define LAST_USED_QID_NONE -1
+
 /* A port in a netdev-based datapath. */
 struct dp_netdev_port {
 odp_port_t port_no;
@@ -303,6 +305,8 @@ struct dp_netdev_port {
 char *type; /* Port type as requested by user. */
 char *rxq_affinity_list;/* Requested affinity of rx queues. */
 bool need_reconfigure;  /* True if we should reconfigure netdev. */
+int last_used_qid;  /* Last queue id where packets could be
+   enqueued. */
 };
 
 /* Contained by struct dp_netdev_flow's 'stats' member.  */
@@ -619,6 +623,9 @@ static int dpif_netdev_xps_get_tx_qid(const struct 
dp_netdev_pmd_thread *pmd,
 static inline bool emc_entry_alive(struct emc_entry *ce);
 static void emc_clear_entry(struct emc_entry *ce);
 
+static struct tx_port *pmd_send_port_cache_lookup
+(const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
+
 static void
 emc_cache_init(struct emc_cache *flow_cache)
 {
@@ -3507,15 +3514,19 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread 
*pmd,
 return i;
 }
 
+enum { DRAIN_TSC = 2ULL };
+
 static void *
 pmd_thread_main(void *f_)
 {
 struct dp_netdev_pmd_thread *pmd = f_;
-unsigned int lc = 0;
+unsigned int lc = 0, lc_drain = 0;
 struct polled_queue *poll_list;
 bool exiting;
 int poll_cnt;
 int i;
+uint64_t prev = 0, now = 0;
+struct tx_port *tx_port;
 
 poll_list = NULL;
 
@@ -3548,6 +3559,26 @@ reload:
poll_list[i].port_no);
 }
 
+#define MAX_LOOP_TO_DRAIN 128
+if (lc_drain++ > MAX_LOOP_TO_DRAIN) {
+lc_drain = 0;
+prev = now;
+now = pmd->last_cycles;
+if ((now - prev) > DRAIN_TSC) {
+HMAP_FOR_EACH (tx_port, node, &pmd->tx_ports) {
+if (tx_port->port->last_used_qid != LAST_USED_QID_NONE) {
+/* This queue may contain some buffered packets waiting
+ * to be sent out. */
+netdev_txq_drain(tx_port->port->netdev,
+tx_port->port->last_used_qid,
+tx_port->port->dynamic_txqs);
+/* Mark it as empty. */
+tx_port->port->last_used_qid = LAST_USED_QID_NONE;
+}
+}
+}
+}
+
 if (lc++ > 1024) {
 bool reload;
 
@@ -3883,6 +3914,7 @@ dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread 
*pmd,
 
 tx->port = port;
 tx->qid = -1;
+port->last_used_qid = LAST_USED_QID_NONE;
 
 hmap_insert(&pmd->tx_ports, &tx->node, hash_port_no(tx->port->port_no));
 pmd->need_reload = true;
@@ -4538,7 +4570,24 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
 } else {
 tx_qid = pmd->static_tx_qid;
 }
+//TODO Add UNLIKELY to the 1st condition?
+/* Is the current qid the same as the last one we used? */
+if ((p->port->last_used_qid != LAST_USED_QID_NONE) &&
+(p->port->last_used_qid != tx_qid)) {
+/* The current assigned queue was changed, we need to drain
+ * packets from the previous queue. */
+

Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-01-17 Thread Ben Pfaff
It would be more helpful to have a simple reproduction case.

Why haven't you tried a newer version from branch-2.5?

On Tue, Jan 17, 2017 at 07:59:05AM -0800, Vidyasagara Guntaka wrote:
> Hi Ben,
> 
> Here is more debug information related to this incident (still using
> version 2.5.0):
> 
> Summary :
> 
> We think that there is some race condition involved in processing OF 
> Controller connections and Packet miss processing in ovs-vswitchd.
> 
> Reasoning :
> 
> Please consider the following GDB Debug Session:
> 
> Breakpoint 1, ofconn_set_protocol (ofconn=0x16d5810, 
> protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999
> (gdb) f 2
> #2  0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at 
> ofproto/connmgr.c:1613
> 1613  enum ofputil_protocol protocol = ofconn_get_protocol(ofconn);
> (gdb) p *ofconn
> $2 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, 
> next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, 
> band = OFPROTO_IN_BAND, enable_async_msgs = true, 
>   role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, 
> packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, 
> schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, 
>   controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, 
> 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = 
> 0, n_modify = 0, 
>   first_op = -9223372036854775808, last_op = -9223372036854775808, 
> next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, 
> monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, 
> n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev 
> = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = 
> {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}}
> (gdb) bt
> #0  ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD) at 
> ofproto/connmgr.c:999
> #1  0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at 
> ofproto/connmgr.c:982
> #2  0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at 
> ofproto/connmgr.c:1613
> #3  0x00435261 in rule_dpif_lookup_from_table (ofproto=0x16a6880, 
> version=323, flow=0x7f2ace7f86e8, wc=0x7f2ace7f84b0, stats=0x0, 
> table_id=0x7f2ace7f7eda "", in_port=28, may_packet_in=true, 
> honor_table_miss=true) at ofproto/ofproto-dpif.c:3973
> #4  0x00457ecf in xlate_actions (xin=0x7f2ace7f86e0, 
> xout=0x7f2ace7f8010) at ofproto/ofproto-dpif-xlate.c:5188
> #5  0x004481b1 in revalidate_ukey (udpif=0x16a7300, 
> ukey=0x7f2ab80060e0, stats=0x7f2ace7f94e0, odp_actions=0x7f2ace7f8a40, 
> reval_seq=585728, recircs=0x7f2ace7f8a30)
> at ofproto/ofproto-dpif-upcall.c:1866
> #6  0x00448fb2 in revalidate (revalidator=0x1691990) at 
> ofproto/ofproto-dpif-upcall.c:2186
> #7  0x0044593e in udpif_revalidator (arg=0x1691990) at 
> ofproto/ofproto-dpif-upcall.c:862
> #8  0x0050b93d in ovsthread_wrapper (aux_=0x16f4560) at 
> lib/ovs-thread.c:340
> #9  0x7f2ad75c2184 in start_thread () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
> #10 0x7f2ad6de137d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> (gdb) f 1
> #1  0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at 
> ofproto/connmgr.c:982
> 982   ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn),
> (gdb) l
> 977   {
> 978   if (ofconn->protocol == OFPUTIL_P_NONE &&
> 979   rconn_is_connected(ofconn->rconn)) {
> 980   int version = rconn_get_version(ofconn->rconn);
> 981   if (version > 0) {
> 982   ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn),
> 983   
> ofputil_protocol_from_ofp_version(version));
> 984   }
> 985   }
> 986   
> (gdb) p *ofconn
> $3 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, 
> next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, 
> band = OFPROTO_IN_BAND, enable_async_msgs = true, 
>   role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, 
> packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, 
> schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, 
>   controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, 
> 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = 
> 0, n_modify = 0, 
>   first_op = -9223372036854775808, last_op = -9223372036854775808, 
> next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, 
> monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, 
> n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev 
> = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = 
> {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}}
> (gdb) p ofconn
> $4 = (const struct ofconn *) 0x16d5810
> (gdb) c
> Continuing.
> [Thread 0x7f2ad79f5980 (LWP 20165) exited]
> 
> 

Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-01-17 Thread Vidyasagara Guntaka via dev
This issue happened on our in-use systems, and we were trying to find a way
to move forward while avoiding it, so that we do not have to upgrade OVS on
thousands of our hypervisors and cause downtime. Our debugging did help us
avoid the issue for now: after installing an explicit rule to drop packets
when there is no match, the issue is not seen over many hours of test runs.
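
Such a catch-all drop rule can be expressed, for example, as (illustrative;
the actual bridge name depends on the deployment):

    $ ovs-ofctl add-flow br0 priority=0,actions=drop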

We will definitely run this test with latest version.  But, will need more
time since we are busy with our release related activities.

Regards,
Sagar.

On Tue, Jan 17, 2017 at 8:42 AM, Ben Pfaff  wrote:

> It would be more helpful to have a simple reproduction case.
>
> Why haven't you tried a newer version from branch-2.5?
>
> On Tue, Jan 17, 2017 at 07:59:05AM -0800, Vidyasagara Guntaka wrote:
> > Hi Ben,
> >
> > Here is more debug information related to this incident (still using
> > version 2.5.0):
> >
> > Summary :
> >
> > We think that there is some race condition involved in processing OF
> Controller connections and Packet miss processing in ovs-vswitchd.
> >
> > Reasoning :
> >
> > Please consider the following GDB Debug Session:
> >
> > Breakpoint 1, ofconn_set_protocol (ofconn=0x16d5810,
> protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999
> > (gdb) f 2
> > #2  0x0045f586 in connmgr_wants_packet_in_on_miss
> (mgr=0x16a6de0) at ofproto/connmgr.c:1613
> > 1613  enum ofputil_protocol protocol =
> ofconn_get_protocol(ofconn);
> > (gdb) p *ofconn
> > $2 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash =
> 0, next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type =
> OFCONN_SERVICE, band = OFPROTO_IN_BAND, enable_async_msgs = true,
> >   role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID,
> packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170,
> schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0,
> >   controller_id = 0, reply_counter = 0x1673190, master_async_config =
> {3, 7, 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0,
> n_delete = 0, n_modify = 0,
> >   first_op = -9223372036854775808, last_op = -9223372036854775808,
> next_op_report = 9223372036854775807, op_backoff = -9223372036854775808,
> monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0,
> > n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates =
> {prev = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles =
> {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}}
> > (gdb) bt
> > #0  ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD)
> at ofproto/connmgr.c:999
> > #1  0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at
> ofproto/connmgr.c:982
> > #2  0x0045f586 in connmgr_wants_packet_in_on_miss
> (mgr=0x16a6de0) at ofproto/connmgr.c:1613
> > #3  0x00435261 in rule_dpif_lookup_from_table
> (ofproto=0x16a6880, version=323, flow=0x7f2ace7f86e8, wc=0x7f2ace7f84b0,
> stats=0x0, table_id=0x7f2ace7f7eda "", in_port=28, may_packet_in=true,
> > honor_table_miss=true) at ofproto/ofproto-dpif.c:3973
> > #4  0x00457ecf in xlate_actions (xin=0x7f2ace7f86e0,
> xout=0x7f2ace7f8010) at ofproto/ofproto-dpif-xlate.c:5188
> > #5  0x004481b1 in revalidate_ukey (udpif=0x16a7300,
> ukey=0x7f2ab80060e0, stats=0x7f2ace7f94e0, odp_actions=0x7f2ace7f8a40,
> reval_seq=585728, recircs=0x7f2ace7f8a30)
> > at ofproto/ofproto-dpif-upcall.c:1866
> > #6  0x00448fb2 in revalidate (revalidator=0x1691990) at
> ofproto/ofproto-dpif-upcall.c:2186
> > #7  0x0044593e in udpif_revalidator (arg=0x1691990) at
> ofproto/ofproto-dpif-upcall.c:862
> > #8  0x0050b93d in ovsthread_wrapper (aux_=0x16f4560) at
> lib/ovs-thread.c:340
> > #9  0x7f2ad75c2184 in start_thread () from /lib/x86_64-linux-gnu/
> libpthread.so.0
> > #10 0x7f2ad6de137d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> > (gdb) f 1
> > #1  0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at
> ofproto/connmgr.c:982
> > 982   ofconn_set_protocol(CONST_CAST(struct ofconn *,
> ofconn),
> > (gdb) l
> > 977   {
> > 978   if (ofconn->protocol == OFPUTIL_P_NONE &&
> > 979   rconn_is_connected(ofconn->rconn)) {
> > 980   int version = rconn_get_version(ofconn->rconn);
> > 981   if (version > 0) {
> > 982   ofconn_set_protocol(CONST_CAST(struct ofconn *,
> ofconn),
> > 983   ofputil_protocol_from_ofp_
> version(version));
> > 984   }
> > 985   }
> > 986
> > (gdb) p *ofconn
> > $3 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash =
> 0, next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type =
> OFCONN_SERVICE, band = OFPROTO_IN_BAND, enable_async_msgs = true,
> >   role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID,
> packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170,
> schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0,
> >   controller_id = 0, reply_counter = 

Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during packet transmission.

2017-01-17 Thread Fischetti, Antonio
Will respin a v2 patch in which we have addressed your comments.
So please let's continue our discussion on v2.

Thanks,
Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Fischetti, Antonio
> Sent: Thursday, January 12, 2017 5:40 PM
> To: Ilya Maximets ; Bodireddy, Bhanuprakash
> ; Aaron Conole 
> Cc: d...@openvswitch.org; Daniele Di Proietto 
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during
> packet transmission.
> 
> Hi Ilya,
> thanks for your detailed feedback. I've added a couple of replies
> inline about when the instant send happens and the may_steal param.
> 
> 
> Regards,
> Antonio
> 
> > -Original Message-
> > From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> > Sent: Tuesday, December 20, 2016 12:47 PM
> > To: Bodireddy, Bhanuprakash ; Aaron
> > Conole 
> > Cc: d...@openvswitch.org; Daniele Di Proietto ;
> > Thadeu Lima de Souza Cascardo ; Fischetti, Antonio
> > ; Markus Magnusson
> > 
> > Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue
> during
> > packet transmission.
> >
> > On 20.12.2016 15:19, Bodireddy, Bhanuprakash wrote:
> > >> -Original Message-
> > >> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> > >> Sent: Tuesday, December 20, 2016 8:09 AM
> > >> To: Bodireddy, Bhanuprakash ; Aaron
> > >> Conole 
> > >> Cc: d...@openvswitch.org; Daniele Di Proietto
> ;
> > >> Thadeu Lima de Souza Cascardo ; Fischetti,
> Antonio
> > >> ; Markus Magnusson
> > >> 
> > >> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue
> > during
> > >> packet transmission.
> > >>
> > >> On 19.12.2016 21:05, Bodireddy, Bhanuprakash wrote:
> > >>> Thanks Ilya and Aaron for reviewing this patch and providing your
> > >> comments, my reply inline.
> > >>>
> >  -Original Message-
> >  From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> >  Sent: Monday, December 19, 2016 8:41 AM
> >  To: Aaron Conole ; Bodireddy, Bhanuprakash
> >  
> >  Cc: d...@openvswitch.org; Daniele Di Proietto
> >  ; Thadeu Lima de Souza Cascardo
> >  ; Fischetti, Antonio
> >  ; Markus Magnusson
> >  
> >  Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue
> >  during packet transmission.
> > 
> >  Hi,
> > 
> >  I didn't test this patch yet. Bhanu, could you please describe your
> >  test scenario and performance results in more details.
> > >>>
> > >>> During the recent performance analysis improvements for classifier,
> we
> > >> found that bottleneck was also observed at flow batching.
> > >>> This was due to fewer packets in batch. To reproduce this, a simple
> > P2P test
> > >> case can be used with 30 IXIA streams and matching IP flow rules.
> > >>> Eg:  For an IXIA stream with src Ip: 2.2.2.1, dst tip: 5.5.5.1
> > >>> ovs-ofctl add-flow br0
> > >>> dl_type=0x0800,nw_src=2.2.2.1,actions=output:2
> > >>>
> > >>> For an IXIA stream with src Ip: 4.4.4.1, dst tip: 15.15.15.1
> > >>> ovs-ofctl add-flow br0
> > >>> dl_type=0x0800,nw_src=4.4.4.1,actions=output:2
> > >>>
> > >>> This leaves fewer packets in batches and
> > packet_batch_per_flow_execute()
> > >> shall be invoked for every batch.
> > >>> With 30 flows, I see the throughput drops to ~7.0 Mpps in PHY2PHY
> case
> > for
> > >> 64 byte udp packets.
> > >>>
> >  It'll be nice if you provide throughput and latency measurement
> >  results for different scenarios and packet sizes. Latency is
> > important here.
> > >>> We are yet to do latency measurements in this case. With 30 IXIA
> > >>> streams comprising of 64 byte udp packets there was an throughput
> > >>> improvement of 30% in P2P case and 13-15% in PVP case(single queue).
> > we
> > >> will try to get the latency stats with and without this patch.
> > >>>
> > 
> >  About the patch itself:
> > 
> >   1. 'drain' called only for PMD threads. This will lead to
> >  broken connection with non-PMD ports.
> > >>> I see that non-PMD ports are handled with vswitchd thread.
> > >>> Tested PVP loopback case with tap ports and found to be working as
> > >>> expected. Can you let me know the specific case you are referring
> here
> > so
> > >> that I can verify if the patch breaks it.
> > >>
> > >> I meant something like this:
> > >>
> > >>
> > >> 

[ovs-dev] Lograr el éxito no es casualidad

2017-01-17 Thread Alta Dirección

Global Executive MBA

Companies require more effective professionals in management positions, with
global and local thinking, with the capacity to motivate and to manage
intangibles, prepared to evaluate and track the strategy to follow in order
to reach objectives.

This program is designed so that participants acquire knowledge, competencies
and skills that allow effective and successful management in their companies.

Do you want to develop the most relevant management competencies?

Request information.

Dual degree from: IOE Business School and Universidad Rey Juan Carlos.

E-Learning, 100% online. Multimedia syllabus. Tutored programs.

Quality guarantee: this program has passed a strict certification process by
the Universidad Rey Juan Carlos (Madrid).

If you do not wish to receive further advertising about master's programs or
training courses, you can unsubscribe your email address from this
distribution list. Thank you in advance for your attention. Best regards.


[ovs-dev] [PATCH v2] dpif-netdev: Change definitions of 'idle' & 'processing' cycles

2017-01-17 Thread Ciara Loftus
Instead of counting all polling cycles as processing cycles, only count
the cycles where packets were received from the polling.

Signed-off-by: Georg Schmuecking 
Signed-off-by: Ciara Loftus 
Co-authored-by: Ciara Loftus 
---
v2:
- Rebase
---
 lib/dpif-netdev.c | 57 ++-
 1 file changed, 44 insertions(+), 13 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 3901129..3854c79 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -272,7 +272,10 @@ enum dp_stat_type {
 
 enum pmd_cycles_counter_type {
 PMD_CYCLES_POLLING, /* Cycles spent polling NICs. */
-PMD_CYCLES_PROCESSING,  /* Cycles spent processing packets */
+PMD_CYCLES_IDLE,/* Cycles spent idle or unsuccessful polling */
+PMD_CYCLES_PROCESSING,  /* Cycles spent successfully polling and
+ * processing polled packets */
+
 PMD_N_CYCLES
 };
 
@@ -747,10 +750,10 @@ pmd_info_show_stats(struct ds *reply,
 }
 
 ds_put_format(reply,
-  "\tpolling cycles:%"PRIu64" (%.02f%%)\n"
+  "\tidle cycles:%"PRIu64" (%.02f%%)\n"
   "\tprocessing cycles:%"PRIu64" (%.02f%%)\n",
-  cycles[PMD_CYCLES_POLLING],
-  cycles[PMD_CYCLES_POLLING] / (double)total_cycles * 100,
+  cycles[PMD_CYCLES_IDLE],
+  cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100,
   cycles[PMD_CYCLES_PROCESSING],
   cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100);
 
@@ -2892,30 +2895,43 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
 non_atomic_ullong_add(&pmd->cycles.n[type], interval);
 }
 
-static void
+/* Calculate the intermediate cycle result and add to the counter 'type' */
+static inline void
+cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
+  enum pmd_cycles_counter_type type)
+OVS_NO_THREAD_SAFETY_ANALYSIS
+{
+unsigned long long new_cycles = cycles_counter();
+unsigned long long interval = new_cycles - pmd->last_cycles;
+pmd->last_cycles = new_cycles;
+
+non_atomic_ullong_add(&pmd->cycles.n[type], interval);
+}
+
+static int
 dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
struct netdev_rxq *rx,
odp_port_t port_no)
 {
 struct dp_packet_batch batch;
 int error;
+int batch_cnt = 0;
 
 dp_packet_batch_init(&batch);
-cycles_count_start(pmd);
 error = netdev_rxq_recv(rx, &batch);
-cycles_count_end(pmd, PMD_CYCLES_POLLING);
 if (!error) {
 *recirc_depth_get() = 0;
 
-cycles_count_start(pmd);
+batch_cnt = batch.count;
 dp_netdev_input(pmd, &batch, port_no);
-cycles_count_end(pmd, PMD_CYCLES_PROCESSING);
 } else if (error != EAGAIN && error != EOPNOTSUPP) {
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 
 VLOG_ERR_RL(&rl, "error receiving data from %s: %s",
 netdev_rxq_get_name(rx), ovs_strerror(error));
 }
+
+return batch_cnt;
 }
 
 static struct tx_port *
@@ -3377,21 +3393,27 @@ dpif_netdev_run(struct dpif *dpif)
 struct dp_netdev *dp = get_dp_netdev(dpif);
 struct dp_netdev_pmd_thread *non_pmd;
 uint64_t new_tnl_seq;
+int process_packets = 0;
 
 ovs_mutex_lock(&dp->port_mutex);
 non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
 if (non_pmd) {
 ovs_mutex_lock(&dp->non_pmd_mutex);
+cycles_count_start(non_pmd);
 HMAP_FOR_EACH (port, node, &dp->ports) {
 if (!netdev_is_pmd(port->netdev)) {
 int i;
 
 for (i = 0; i < port->n_rxq; i++) {
-dp_netdev_process_rxq_port(non_pmd, port->rxqs[i].rx,
-   port->port_no);
+process_packets +=
+dp_netdev_process_rxq_port(non_pmd,
+   port->rxqs[i].rx,
+   port->port_no);
 }
 }
 }
+cycles_count_end(non_pmd, process_packets ? PMD_CYCLES_PROCESSING
+  : PMD_CYCLES_IDLE);
 dpif_netdev_xps_revalidate_pmd(non_pmd, time_msec(), false);
 ovs_mutex_unlock(&dp->non_pmd_mutex);
 
@@ -3516,6 +3538,7 @@ pmd_thread_main(void *f_)
 bool exiting;
 int poll_cnt;
 int i;
+int process_packets = 0;
 
 poll_list = NULL;
 
@@ -3542,10 +3565,12 @@ reload:
 lc = UINT_MAX;
 }
 
+cycles_count_start(pmd);
 for (;;) {
 for (i = 0; i < poll_cnt; i++) {
-dp_netdev_process_rxq_port(pmd, poll_list[i].rx,
-   poll_list[i].port_no);
+process_packets +=
+

Re: [ovs-dev] [PATCH v10 1/8] ovn: document logical routers and logical patch ports in ovn-architecture

2017-01-17 Thread Ben Pfaff
On Tue, Jan 17, 2017 at 01:45:02AM -0800, Mickey Spiegel wrote:
> This patch adds a description of logical routers and logical patch ports,
> including gateway routers, to ovn/ovn-architecture.7.xml.
> 
> Signed-off-by: Mickey Spiegel 

Thanks for working on the documentation!  I applied this to master,
folding in just one change (because I wanted to emphasize that the
behavior isn't different, just the implementation):

diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index d92f878..c0ceb65 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -1096,8 +1096,8 @@
 
   
 When the packet reaches table 65, the logical egress port is a logical
-patch port.  The behavior at table 65 differs depending on the OVS
-version:
+patch port.  The implementation in table 65 differs depending on the OVS
+version, although the observed behavior is meant to be the same:
   
 
   

Thanks,

Ben.


Re: [ovs-dev] [PATCH v2] dpif-netdev: Change definitions of 'idle' & 'processing' cycles

2017-01-17 Thread Kevin Traynor
On 01/17/2017 05:43 PM, Ciara Loftus wrote:
> Instead of counting all polling cycles as processing cycles, only count
> the cycles where packets were received from the polling.

This makes these stats much clearer. One minor comment below, other than
that

Acked-by: Kevin Traynor 

> 
> Signed-off-by: Georg Schmuecking 
> Signed-off-by: Ciara Loftus 
> Co-authored-by: Ciara Loftus 
> ---
> v2:
> - Rebase
> ---
>  lib/dpif-netdev.c | 57 
> ++-
>  1 file changed, 44 insertions(+), 13 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 3901129..3854c79 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -272,7 +272,10 @@ enum dp_stat_type {
>  
>  enum pmd_cycles_counter_type {
>  PMD_CYCLES_POLLING, /* Cycles spent polling NICs. */

this is not used anymore and can be removed

> -PMD_CYCLES_PROCESSING,  /* Cycles spent processing packets */
> +PMD_CYCLES_IDLE,/* Cycles spent idle or unsuccessful polling 
> */
> +PMD_CYCLES_PROCESSING,  /* Cycles spent successfully polling and
> + * processing polled packets */
> +
>  PMD_N_CYCLES
>  };
>  
> @@ -747,10 +750,10 @@ pmd_info_show_stats(struct ds *reply,
>  }
>  
>  ds_put_format(reply,
> -  "\tpolling cycles:%"PRIu64" (%.02f%%)\n"
> +  "\tidle cycles:%"PRIu64" (%.02f%%)\n"
>"\tprocessing cycles:%"PRIu64" (%.02f%%)\n",
> -  cycles[PMD_CYCLES_POLLING],
> -  cycles[PMD_CYCLES_POLLING] / (double)total_cycles * 100,
> +  cycles[PMD_CYCLES_IDLE],
> +  cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100,
>cycles[PMD_CYCLES_PROCESSING],
>cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 
> 100);
>  
> @@ -2892,30 +2895,43 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
>  non_atomic_ullong_add(&pmd->cycles.n[type], interval);
>  }
>  
> -static void
> +/* Calculate the intermediate cycle result and add to the counter 'type' */
> +static inline void
> +cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
> +  enum pmd_cycles_counter_type type)
> +OVS_NO_THREAD_SAFETY_ANALYSIS
> +{
> +unsigned long long new_cycles = cycles_counter();
> +unsigned long long interval = new_cycles - pmd->last_cycles;
> +pmd->last_cycles = new_cycles;
> +
> +non_atomic_ullong_add(&pmd->cycles.n[type], interval);
> +}
> +
> +static int
>  dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
> struct netdev_rxq *rx,
> odp_port_t port_no)
>  {
>  struct dp_packet_batch batch;
>  int error;
> +int batch_cnt = 0;
>  
>  dp_packet_batch_init(&batch);
> -cycles_count_start(pmd);
>  error = netdev_rxq_recv(rx, &batch);
> -cycles_count_end(pmd, PMD_CYCLES_POLLING);
>  if (!error) {
>  *recirc_depth_get() = 0;
>  
> -cycles_count_start(pmd);
> +batch_cnt = batch.count;
>  dp_netdev_input(pmd, &batch, port_no);
> -cycles_count_end(pmd, PMD_CYCLES_PROCESSING);
>  } else if (error != EAGAIN && error != EOPNOTSUPP) {
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
>  
>  VLOG_ERR_RL(&rl, "error receiving data from %s: %s",
>  netdev_rxq_get_name(rx), ovs_strerror(error));
>  }
> +
> +return batch_cnt;
>  }
>  
>  static struct tx_port *
> @@ -3377,21 +3393,27 @@ dpif_netdev_run(struct dpif *dpif)
>  struct dp_netdev *dp = get_dp_netdev(dpif);
>  struct dp_netdev_pmd_thread *non_pmd;
>  uint64_t new_tnl_seq;
> +int process_packets = 0;
>  
>  ovs_mutex_lock(&dp->port_mutex);
>  non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
>  if (non_pmd) {
>  ovs_mutex_lock(&dp->non_pmd_mutex);
> +cycles_count_start(non_pmd);
>  HMAP_FOR_EACH (port, node, &dp->ports) {
>  if (!netdev_is_pmd(port->netdev)) {
>  int i;
>  
>  for (i = 0; i < port->n_rxq; i++) {
> -dp_netdev_process_rxq_port(non_pmd, port->rxqs[i].rx,
> -   port->port_no);
> +process_packets +=
> +dp_netdev_process_rxq_port(non_pmd,
> +   port->rxqs[i].rx,
> +   port->port_no);
>  }
>  }
>  }
> +cycles_count_end(non_pmd, process_packets ? PMD_CYCLES_PROCESSING
> +  : PMD_CYCLES_IDLE);
>  dpif_netdev_xps_revalidate_pmd(non_pmd, time_msec(), false);
>  ovs_mutex_unlock(&dp->non_pmd_mutex);
>  
> @@ -3516,6 

Re: [ovs-dev] [PATCH] dp-packet: Fix comment on DPDK buffer data.

2017-01-17 Thread Fischetti, Antonio
Self-NACK. Will include this change with another patch.

Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com
> Sent: Wednesday, January 11, 2017 5:11 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH] dp-packet: Fix comment on DPDK buffer data.
> 
> Small fix on the comments about the DPDK buffer data to refer
> to dp_packet_init_dpdk() function.
> 
> Signed-off-by: Antonio Fischetti 
> ---
>  lib/dp-packet.c | 4 ++--
>  lib/dp-packet.h | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> index 793b54f..14ab15e 100644
> --- a/lib/dp-packet.c
> +++ b/lib/dp-packet.c
> @@ -91,8 +91,8 @@ dp_packet_use_const(struct dp_packet *b, const void
> *data, size_t size)
> 
>  /* Initializes 'b' as an empty dp_packet that contains the 'allocated'
> bytes of
>   * memory starting at 'base'.  DPDK allocated dp_packet and *data is
> allocated
> - * from one continous memory region, so in memory data start right after
> - * dp_packet.  Therefore there is special method to free this type of
> + * from one continuous memory region, so in memory data start right after
> + * dp_packet.  Therefore there is a special method to free this type of
>   * buffer.  dp_packet base, data and size are initialized by dpdk rcv()
> so no
>   * need to initialize those fields. */
>  void
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index cf7d247..b941540 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -39,7 +39,7 @@ enum OVS_PACKED_ENUM dp_packet_source {
>  DPBUF_STACK,   /* Un-movable stack space or static
> buffer. */
>  DPBUF_STUB,/* Starts on stack, may expand into heap.
> */
>  DPBUF_DPDK,/* buffer data is from DPDK allocated
> memory.
> -* ref to build_dp_packet() in netdev-
> dpdk. */
> +* Ref to dp_packet_init_dpdk() in dp-
> packet. */
>  };
> 
>  #define DP_PACKET_CONTEXT_SIZE 64
> --
> 2.4.11
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-01-17 Thread Vidyasagara Guntaka via dev
Hi Ben,

Here is more debug information related to this incident (still using version
2.5.0):

Summary :

We think that there is some race condition involved in processing OF Controller 
connections and Packet miss processing in ovs-vswitchd.

Reasoning :

Please consider the following GDB Debug Session:

Breakpoint 1, ofconn_set_protocol (ofconn=0x16d5810, 
protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999
(gdb) f 2
#2  0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at 
ofproto/connmgr.c:1613
1613    enum ofputil_protocol protocol = ofconn_get_protocol(ofconn);
(gdb) p *ofconn
$2 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, next 
= 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, band = 
OFPROTO_IN_BAND, enable_async_msgs = true, 
  role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, 
packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, schedulers 
= {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, 
  controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, 7, 
0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = 0, 
n_modify = 0, 
  first_op = -9223372036854775808, last_op = -9223372036854775808, 
next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, 
monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, 
n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev = 
0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = {buckets = 
0x16d5938, one = 0x0, mask = 0, n = 0}}
(gdb) bt
#0  ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD) at 
ofproto/connmgr.c:999
#1  0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at 
ofproto/connmgr.c:982
#2  0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at 
ofproto/connmgr.c:1613
#3  0x00435261 in rule_dpif_lookup_from_table (ofproto=0x16a6880, 
version=323, flow=0x7f2ace7f86e8, wc=0x7f2ace7f84b0, stats=0x0, 
table_id=0x7f2ace7f7eda "", in_port=28, may_packet_in=true, 
honor_table_miss=true) at ofproto/ofproto-dpif.c:3973
#4  0x00457ecf in xlate_actions (xin=0x7f2ace7f86e0, 
xout=0x7f2ace7f8010) at ofproto/ofproto-dpif-xlate.c:5188
#5  0x004481b1 in revalidate_ukey (udpif=0x16a7300, 
ukey=0x7f2ab80060e0, stats=0x7f2ace7f94e0, odp_actions=0x7f2ace7f8a40, 
reval_seq=585728, recircs=0x7f2ace7f8a30)
at ofproto/ofproto-dpif-upcall.c:1866
#6  0x00448fb2 in revalidate (revalidator=0x1691990) at 
ofproto/ofproto-dpif-upcall.c:2186
#7  0x0044593e in udpif_revalidator (arg=0x1691990) at 
ofproto/ofproto-dpif-upcall.c:862
#8  0x0050b93d in ovsthread_wrapper (aux_=0x16f4560) at 
lib/ovs-thread.c:340
#9  0x7f2ad75c2184 in start_thread () from 
/lib/x86_64-linux-gnu/libpthread.so.0
#10 0x7f2ad6de137d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) f 1
#1  0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at 
ofproto/connmgr.c:982
982 ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn),
(gdb) l
977 {
978 if (ofconn->protocol == OFPUTIL_P_NONE &&
979 rconn_is_connected(ofconn->rconn)) {
980 int version = rconn_get_version(ofconn->rconn);
981 if (version > 0) {
982 ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn),
983 
ofputil_protocol_from_ofp_version(version));
984 }
985 }
986 
(gdb) p *ofconn
$3 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, next 
= 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, band = 
OFPROTO_IN_BAND, enable_async_msgs = true, 
  role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, 
packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, schedulers 
= {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, 
  controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, 7, 
0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = 0, 
n_modify = 0, 
  first_op = -9223372036854775808, last_op = -9223372036854775808, 
next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, 
monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, 
n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev = 
0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = {buckets = 
0x16d5938, one = 0x0, mask = 0, n = 0}}
(gdb) p ofconn
$4 = (const struct ofconn *) 0x16d5810
(gdb) c
Continuing.
[Thread 0x7f2ad79f5980 (LWP 20165) exited]

From the above GDB session, ovs-vswitchd is in the middle of processing a
packet miss that was read from the datapath.
The breakpoint was set inside ofconn_set_protocol so that we hit it if the
protocol was already set to something other than OFPUTIL_P_NONE and is now
being set to OFPUTIL_P_OF10_STD.
Yes, we modified the code in ofconn_set_protocol with this if 

Re: [ovs-dev] [PATCH 3/6] datapath-windows: Add support for OVS_TUNNEL_KEY_ATTR_TP_DST

2017-01-17 Thread Sairam Venugopal
We should consolidate this logic into one function for retrieving the tunKey
from the attributes. This can be done separately.
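
A sketch of that consolidation (the helper name is hypothetical; both the
set-action path and the flow-put path would call it instead of duplicating
the per-attribute handling):

    static VOID
    OvsTunnelAttrTpDstToTunKey(PNL_ATTR attr, OvsIPv4TunnelKey *tunKey)
    {
        /* NlAttrGetBe16() already returns network byte order. */
        tunKey->dst_port = NlAttrGetBe16(attr);
    }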

Acked-by: Sairam Venugopal 





On 1/10/17, 8:48 AM, "ovs-dev-boun...@openvswitch.org on behalf of Alin 
Serdean"  wrote:

>Add support for netlink attribute OVS_TUNNEL_KEY_ATTR_TP_DST get/set
>flow functions.
>
>Signed-off-by: Alin Gabriel Serdean 
>---
> datapath-windows/ovsext/Flow.c | 8 
> 1 file changed, 8 insertions(+)
>
>diff --git a/datapath-windows/ovsext/Flow.c b/datapath-windows/ovsext/Flow.c
>index 749d83a..96ff9fa 100644
>--- a/datapath-windows/ovsext/Flow.c
>+++ b/datapath-windows/ovsext/Flow.c
>@@ -1737,6 +1737,9 @@ OvsTunnelAttrToIPv4TunnelKey(PNL_ATTR attr,
> case OVS_TUNNEL_KEY_ATTR_OAM:
> tunKey->flags |= OVS_TNL_F_OAM;
> break;
>+case OVS_TUNNEL_KEY_ATTR_TP_DST:
>+tunKey->dst_port = NlAttrGetBe16(a);
>+break;
> case OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS:
> if (hasOpt) {
> /* Duplicate options attribute is not allowed. */
>@@ -1812,6 +1815,11 @@ MapTunAttrToFlowPut(PNL_ATTR *keyAttrs,
> destKey->tunKey.flags |= OVS_TNL_F_OAM;
> }
> 
>+if (tunAttrs[OVS_TUNNEL_KEY_ATTR_TP_DST]) {
>+destKey->tunKey.dst_port =
>+NlAttrGetU16(tunAttrs[OVS_TUNNEL_KEY_ATTR_TP_DST]);
>+}
>+
> if (tunAttrs[OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS]) {
> NTSTATUS status = OvsTunnelAttrToGeneveOptions(
>   tunAttrs[OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS],
>-- 
>2.10.2.windows.1
>___
>dev mailing list
>d...@openvswitch.org
>https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=Z6vowHUOjP5ysP_g372c49Nqc1vEKqHKNBkR5Q5Z7uo=b5wDdE2vUUc40ZLOT7Qh6Xa7oJn8L5PUF1xaK4QXlRc=wV7RC6p0TO50RU-QknKEC1uCMcEjT0QBCechGsJCnWc=
> 


Re: [ovs-dev] [PATCH 2/6] datapath-windows: Allow tunnel action to modify destination port

2017-01-17 Thread Sairam Venugopal
Patches 2 and 3 could've been combined into the same patch since they are
both about dst_port.

Acked-by: Sairam Venugopal 





On 1/10/17, 8:48 AM, "ovs-dev-boun...@openvswitch.org on behalf of Alin 
Serdean"  wrote:

>'OvsTunnelAttrToIPv4TunnelKey' modifies 'tunKey' with the received netlink
>attributes(i.e. OVS_TUNNEL_KEY_ATTR_IPV4_DST).
>
>Change the order of the value assignment to reflect the values received via
>userspace.
>
>Signed-off-by: Alin Gabriel Serdean 
>---
> datapath-windows/ovsext/Actions.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/datapath-windows/ovsext/Actions.c 
>b/datapath-windows/ovsext/Actions.c
>index b4a286b..1530d8b 100644
>--- a/datapath-windows/ovsext/Actions.c
>+++ b/datapath-windows/ovsext/Actions.c
>@@ -1492,11 +1492,11 @@ OvsExecuteSetAction(OvsForwardingContext *ovsFwdCtx,
> case OVS_KEY_ATTR_TUNNEL:
> {
> OvsIPv4TunnelKey tunKey;
>+tunKey.flow_hash = (uint16)(hash ? *hash : OvsHashFlow(key));
>+tunKey.dst_port = key->ipKey.l4.tpDst;
> NTSTATUS convertStatus = OvsTunnelAttrToIPv4TunnelKey((PNL_ATTR)a, &tunKey);
> status = SUCCEEDED(convertStatus) ? NDIS_STATUS_SUCCESS : 
> NDIS_STATUS_FAILURE;
> ASSERT(status == NDIS_STATUS_SUCCESS);
>-tunKey.flow_hash = (uint16)(hash ? *hash : OvsHashFlow(key));
>-tunKey.dst_port = key->ipKey.l4.tpDst;
> RtlCopyMemory(&ovsFwdCtx->tunKey, &tunKey, sizeof ovsFwdCtx->tunKey);
> break;
> }
>-- 
>2.10.2.windows.1
>___
>dev mailing list
>d...@openvswitch.org
>https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=Z6vowHUOjP5ysP_g372c49Nqc1vEKqHKNBkR5Q5Z7uo=ZijUW-XONuy4-0Gaus241SbH5Y3jgJ6h3MNUavcwlRc=1XFYfOzat_zlBGT85Z2Z7b3sEH8rtGJQKCGCjXsLtag=
> 


Re: [ovs-dev] [PATCH 4/6] datapath-windows: VXLAN Check for flow destination port

2017-01-17 Thread Sairam Venugopal
Do we need to convert the tunKey->dst_port to network byte order?



On 1/10/17, 8:48 AM, "ovs-dev-boun...@openvswitch.org on behalf of Alin 
Serdean"  wrote:

>Change the UDP destination port(VXLAN header) to check if it was set by
>the userspace, use it if it was set.
>If the userspace did not specify a destination port, use the configured
>vport destination port.
>
>Signed-off-by: Alin Gabriel Serdean 
>---
> datapath-windows/ovsext/Vxlan.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/datapath-windows/ovsext/Vxlan.c b/datapath-windows/ovsext/Vxlan.c
>index 949e069..84c2f2f 100644
>--- a/datapath-windows/ovsext/Vxlan.c
>+++ b/datapath-windows/ovsext/Vxlan.c
>@@ -289,7 +289,8 @@ OvsDoEncapVxlan(POVS_VPORT_ENTRY vport,
> /* UDP header */
> udpHdr = (UDPHdr *)((PCHAR)ipHdr + sizeof *ipHdr);
> udpHdr->source = htons(tunKey->flow_hash | MAXINT16);
>-udpHdr->dest = htons(vportVxlan->dstPort);
>+udpHdr->dest = tunKey->dst_port ? tunKey->dst_port :
>+  htons(vportVxlan->dstPort);
> udpHdr->len = htons(NET_BUFFER_DATA_LENGTH(curNb) - headRoom +
> sizeof *udpHdr + sizeof *vxlanHdr);
> 
>-- 
>2.10.2.windows.1
>___
>dev mailing list
>d...@openvswitch.org
>https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=Z6vowHUOjP5ysP_g372c49Nqc1vEKqHKNBkR5Q5Z7uo=5_pKhBH-PUBhzkQB9iiJQHwEXF4_z3jQTCICjBvpj1M=LmmZoSZ1s-GGDQBO7ydyZsKXCOqTZ_QAp99ZVHO633M=
> 


Re: [ovs-dev] [PATCH v2 15/17] datapath: enable vxlangpe creation in compat mode

2017-01-17 Thread Yang, Yi Y
Makes sense, will change it in v3.

-Original Message-
From: Jiri Benc [mailto:jb...@redhat.com] 
Sent: Wednesday, January 18, 2017 12:37 AM
To: Yang, Yi Y 
Cc: d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v2 15/17] datapath: enable vxlangpe creation in 
compat mode

On Wed, 28 Dec 2016 20:26:28 +0800, Yi Yang wrote:
> --- a/datapath/linux/compat/include/linux/openvswitch.h
> +++ b/datapath/linux/compat/include/linux/openvswitch.h
> @@ -291,6 +291,7 @@ enum ovs_vport_attr {  enum {
>   OVS_VXLAN_EXT_UNSPEC,
>   OVS_VXLAN_EXT_GBP,  /* Flag or __u32 */
> + OVS_VXLAN_EXT_GPE,  /* Flag or __u32 */

Please leave several free slots between OVS_VXLAN_EXT_GBP and
OVS_VXLAN_EXT_GPE. Although it's not expected that we add more values to this
enum upstream, we should not rule it out completely. Having a buffer here
helps prevent nasty problems with attribute conflicts between the upstream
kernel and the compat code.
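
Sketched, the suggestion looks like this (the exact value reserved for the
gap is an illustrative assumption):

    enum {
        OVS_VXLAN_EXT_UNSPEC,
        OVS_VXLAN_EXT_GBP,      /* Flag or __u32 */
        /* Slots 2-7 left free in case upstream adds attributes. */
        OVS_VXLAN_EXT_GPE = 8,  /* Flag or __u32 */
    };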

 Jiri


Re: [ovs-dev] [PATCH v2] dpif-netdev: Change definitions of 'idle' & 'processing' cycles

2017-01-17 Thread Daniele Di Proietto
2017-01-17 11:43 GMT-08:00 Kevin Traynor :
> On 01/17/2017 05:43 PM, Ciara Loftus wrote:
>> Instead of counting all polling cycles as processing cycles, only count
>> the cycles where packets were received from the polling.
>
> This makes these stats much clearer. One minor comment below, other than
> that
>
> Acked-by: Kevin Traynor 
>
>>
>> Signed-off-by: Georg Schmuecking 
>> Signed-off-by: Ciara Loftus 
>> Co-authored-by: Ciara Loftus 

Minor: the co-authored-by tag should be different from the main author.

This makes it easier to understand how busy a pmd thread is, a valid question
that a sysadmin might have.

The counters were originally introduced to help developers understand how
cycles are spent between the drivers (netdev rx) and datapath processing
(dpif).  Do you think it's ok to lose this type of information?  Perhaps it
is, since a developer can also use a profiler; I'm not sure.

Maybe we could keep 'last_cycles' as it is and introduce a separate counter
to get the idle/busy ratio.  I'm not 100% sure this is the best way.

What do you guys think?
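
One possible shape for that, sketched here purely as an illustration (it
keeps the old driver-vs-datapath split for developers and adds the idle/busy
split on top):

    enum pmd_cycles_counter_type {
        PMD_CYCLES_POLLING,     /* Spent in netdev_rxq_recv(), any outcome. */
        PMD_CYCLES_PROCESSING,  /* Spent in dp_netdev_input(). */
        PMD_CYCLES_IDLE,        /* Polling iterations with no packets. */
        PMD_N_CYCLES
    };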

Thanks,

Daniele

>> ---
>> v2:
>> - Rebase
>> ---
>>  lib/dpif-netdev.c | 57 
>> ++-
>>  1 file changed, 44 insertions(+), 13 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 3901129..3854c79 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -272,7 +272,10 @@ enum dp_stat_type {
>>
>>  enum pmd_cycles_counter_type {
>>  PMD_CYCLES_POLLING, /* Cycles spent polling NICs. */
>
> this is not used anymore and can be removed
>
>> -PMD_CYCLES_PROCESSING,  /* Cycles spent processing packets */
>> +PMD_CYCLES_IDLE,/* Cycles spent idle or unsuccessful 
>> polling */
>> +PMD_CYCLES_PROCESSING,  /* Cycles spent successfully polling and
>> + * processing polled packets */
>> +
>>  PMD_N_CYCLES
>>  };
>>
>> @@ -747,10 +750,10 @@ pmd_info_show_stats(struct ds *reply,
>>  }
>>
>>  ds_put_format(reply,
>> -  "\tpolling cycles:%"PRIu64" (%.02f%%)\n"
>> +  "\tidle cycles:%"PRIu64" (%.02f%%)\n"
>>"\tprocessing cycles:%"PRIu64" (%.02f%%)\n",
>> -  cycles[PMD_CYCLES_POLLING],
>> -  cycles[PMD_CYCLES_POLLING] / (double)total_cycles * 100,
>> +  cycles[PMD_CYCLES_IDLE],
>> +  cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100,
>>cycles[PMD_CYCLES_PROCESSING],
>>cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 
>> 100);
>>
>> @@ -2892,30 +2895,43 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
>>  non_atomic_ullong_add(&pmd->cycles.n[type], interval);
>>  }
>>
>> -static void
>> +/* Calculate the intermediate cycle result and add to the counter 'type' */
>> +static inline void
>> +cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
>> +  enum pmd_cycles_counter_type type)

I'd add an OVS_REQUIRES(&cycles_counter_fake_mutex)

>> +OVS_NO_THREAD_SAFETY_ANALYSIS
>> +{
>> +unsigned long long new_cycles = cycles_counter();
>> +unsigned long long interval = new_cycles - pmd->last_cycles;
>> +pmd->last_cycles = new_cycles;
>> +
>> +non_atomic_ullong_add(&pmd->cycles.n[type], interval);
>> +}
>> +
>> +static int
>>  dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
>> struct netdev_rxq *rx,
>> odp_port_t port_no)
>>  {
>>  struct dp_packet_batch batch;
>>  int error;
>> +int batch_cnt = 0;
>>
>>  dp_packet_batch_init(&batch);
>> -cycles_count_start(pmd);
>>  error = netdev_rxq_recv(rx, &batch);
>> -cycles_count_end(pmd, PMD_CYCLES_POLLING);
>>  if (!error) {
>>  *recirc_depth_get() = 0;
>>
>> -cycles_count_start(pmd);
>> +batch_cnt = batch.count;
>>  dp_netdev_input(pmd, &batch, port_no);
>> -cycles_count_end(pmd, PMD_CYCLES_PROCESSING);
>>  } else if (error != EAGAIN && error != EOPNOTSUPP) {
>>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
>>
>>  VLOG_ERR_RL(&rl, "error receiving data from %s: %s",
>>  netdev_rxq_get_name(rx), ovs_strerror(error));
>>  }
>> +
>> +return batch_cnt;
>>  }
>>
>>  static struct tx_port *
>> @@ -3377,21 +3393,27 @@ dpif_netdev_run(struct dpif *dpif)
>>  struct dp_netdev *dp = get_dp_netdev(dpif);
>>  struct dp_netdev_pmd_thread *non_pmd;
>>  uint64_t new_tnl_seq;
>> +int process_packets = 0;
>>
>>  ovs_mutex_lock(&dp->port_mutex);
>>  non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
>>  if (non_pmd) {
>>  ovs_mutex_lock(&dp->non_pmd_mutex);
>> +cycles_count_start(non_pmd);
>>  HMAP_FOR_EACH (port, node, &dp->ports) {
>>

[ovs-dev] [RFC] ofproto/bond: operational vs administratively disabled bond interface

2017-01-17 Thread Eelco Chaudron

Currently OVS does not distinguish between a bond slave being operationally
disabled, i.e. the link being down, and administratively disabled.

Take the example where the administrator disables a link in a bond,
"ovs-appctl bond/disable-slave bond0 enp129s0f0": it is automatically
enabled again simply because the link is up.

I would like to change this behavior so that a slave disabled through appctl
is no longer used until it is explicitly enabled again via appctl; a sketch
follows below.

Any comments? If not, I will go ahead and prepare a patch set.
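
A minimal sketch of the proposed semantics (field and function names are
hypothetical, not the actual bond code): remember the administrative state
separately, and only let link-up re-enable a slave that the administrator
has not disabled.

    #include <stdbool.h>

    struct bond_slave {
        bool enabled;        /* Operational: may carry traffic right now. */
        bool admin_enabled;  /* Changed only by bond/enable-slave and
                              * bond/disable-slave. */
    };

    static void
    bond_slave_link_status_update(struct bond_slave *slave, bool carrier)
    {
        /* Link state alone can no longer re-enable the slave. */
        slave->enabled = carrier && slave->admin_enabled;
    }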

Cheers,

Eelco



[ovs-dev] [PATCH v10 4/8] ovn: add egress_loopback action

2017-01-17 Thread Mickey Spiegel
This patch adds an action that loops a clone of the packet back to the
beginning of the ingress pipeline with logical inport equal to the value
of the current logical outport.  The following actions are executed on
the clone:

clears the connection tracking state
in_port = 0
inport = outport
outport = 0
flags = 0
reg0 ... reg9 = 0
nested actions from inside "{ ... }"
for example "reg9[1] = 1" to indicate that egress loopback has
occurred
executes the ingress pipeline as a subroutine

This action is expected to be executed in the egress pipeline.  No
changes are made to the logical datapath or to the connection tracking
zones, which will continue to be correct when carrying out loopback
from the egress pipeline to the ingress pipeline.
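
As a concrete, illustrative example (exact flow syntax aside), a logical flow
in the egress pipeline could carry the action:

    egress_loopback { reg9[1] = 1; };

after which the clone re-enters the ingress pipeline with inport set to the
old outport and reg9[1] marking that egress loopback has occurred.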

This capability is needed in order to implement some of the east/west
distributed NAT flows.

Signed-off-by: Mickey Spiegel 
---
 include/ovn/actions.h |  5 +++-
 ovn/controller/lflow.c|  1 +
 ovn/lib/actions.c | 71 +--
 ovn/ovn-sb.xml| 35 +++
 ovn/utilities/ovn-trace.c | 49 
 tests/ovn.at  |  4 +++
 tests/test-ovn.c  |  1 +
 7 files changed, 163 insertions(+), 3 deletions(-)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 0bf6145..5f9d203 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -68,7 +68,8 @@ struct simap;
 OVNACT(PUT_ND,ovnact_put_mac_bind)  \
 OVNACT(PUT_DHCPV4_OPTS, ovnact_put_dhcp_opts)   \
 OVNACT(PUT_DHCPV6_OPTS, ovnact_put_dhcp_opts)   \
-OVNACT(SET_QUEUE,   ovnact_set_queue)
+OVNACT(SET_QUEUE,   ovnact_set_queue)   \
+OVNACT(EGRESS_LOOPBACK, ovnact_nest)
 
 /* enum ovnact_type, with a member OVNACT_ for each action. */
 enum OVS_PACKED_ENUM ovnact_type {
@@ -444,6 +445,8 @@ struct ovnact_encode_params {
 uint8_t output_ptable;  /* OpenFlow table for 'output' to resubmit. */
 uint8_t mac_bind_ptable;/* OpenFlow table for 'get_arp'/'get_nd' to
resubmit. */
+uint8_t ingress_ptable; /* OpenFlow table for 'egress_loopback' to
+   resubmit. */
 };
 
 void ovnacts_encode(const struct ovnact[], size_t ovnacts_len,
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index 3d7633e..c41368f 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -226,6 +226,7 @@ consider_logical_flow(const struct lport_index *lports,
 .first_ptable = first_ptable,
 .output_ptable = output_ptable,
 .mac_bind_ptable = OFTABLE_MAC_BINDING,
+.ingress_ptable = OFTABLE_LOG_INGRESS_PIPELINE,
 };
 ovnacts_encode(ovnacts.data, ovnacts.size, &ep, &ofpacts);
 ovnacts_free(ovnacts.data, ovnacts.size);
diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
index 686ecc5..dda675b 100644
--- a/ovn/lib/actions.c
+++ b/ovn/lib/actions.c
@@ -169,6 +169,21 @@ put_load(uint64_t value, enum mf_field_id dst, int ofs, 
int n_bits,
 bitwise_copy(&n_value, 8, 0, sf->value, sf->field->n_bytes, ofs, n_bits);
 bitwise_one(ofpact_set_field_mask(sf), sf->field->n_bytes, ofs, n_bits);
 }
+
+static void
+put_move(enum mf_field_id src, int src_ofs,
+ enum mf_field_id dst, int dst_ofs,
+ int n_bits,
+ struct ofpbuf *ofpacts)
+{
+struct ofpact_reg_move *move = ofpact_put_REG_MOVE(ofpacts);
+move->src.field = mf_from_id(src);
+move->src.ofs = src_ofs;
+move->src.n_bits = n_bits;
+move->dst.field = mf_from_id(dst);
+move->dst.ofs = dst_ofs;
+move->dst.n_bits = n_bits;
+}
 
 /* Context maintained during ovnacts_parse(). */
 struct action_context {
@@ -1021,7 +1036,10 @@ free_CT_LB(struct ovnact_ct_lb *ct_lb)
 }
 
 /* Implements the "arp" and "nd_na" actions, which execute nested actions on a
- * packet derived from the one being processed. */
+ * packet derived from the one being processed.  Also implements the
+ * "egress_loopback" action, which executes nested actions after clearing
+ * registers and connection state, then loops the packet back to the
+ * beginning of the ingress pipeline. */
 static void
 parse_nested_action(struct action_context *ctx, enum ovnact_type type,
 const char *prereq)
@@ -1055,7 +1073,9 @@ parse_nested_action(struct action_context *ctx, enum 
ovnact_type type,
 return;
 }
 
-add_prerequisite(ctx, prereq);
+if (prereq) {
+add_prerequisite(ctx, prereq);
+}
 
 struct ovnact_nest *on = ovnact_put(ctx->ovnacts, type, sizeof *on);
 on->nested_len = nested.size;
@@ -1075,6 +1095,12 @@ parse_ND_NA(struct action_context *ctx)
 }
 
 static void
+parse_EGRESS_LOOPBACK(struct action_context *ctx)
+{
+parse_nested_action(ctx, OVNACT_EGRESS_LOOPBACK, NULL);
+}
+
+static void
 format_nested_action(const struct ovnact_nest *on, const char *name,

[ovs-dev] [PATCH v10 8/8] ovn: ovn-nbctl commands for distributed NAT

2017-01-17 Thread Mickey Spiegel
This patch adds the new optional arguments "logical_port" and
"external_mac" to lr-nat-add, and displays that information in
lr-nat-list.

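Example usage (all names and addresses below are made up):

    ovn-nbctl lr-nat-add lr0 dnat_and_snat 172.16.1.10 10.0.0.4 \
        lp0 00:00:00:00:01:02

where "lp0" is the existing logical switch port on which 10.0.0.4 resides
and 00:00:00:00:01:02 is the Ethernet address to be used for 172.16.1.10.
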
Signed-off-by: Mickey Spiegel 
---
 ovn/utilities/ovn-nbctl.8.xml | 27 +++---
 ovn/utilities/ovn-nbctl.c | 54 +--
 tests/ovn-nbctl.at| 47 +
 tests/system-ovn.at   | 30 +---
 4 files changed, 119 insertions(+), 39 deletions(-)

diff --git a/ovn/utilities/ovn-nbctl.8.xml b/ovn/utilities/ovn-nbctl.8.xml
index 4911c6a..c408484 100644
--- a/ovn/utilities/ovn-nbctl.8.xml
+++ b/ovn/utilities/ovn-nbctl.8.xml
@@ -444,7 +444,7 @@
 NAT Commands
 
 
-  [--may-exist] lr-nat-add router 
type external_ip logical_ip
+  [--may-exist] lr-nat-add router 
type external_ip logical_ip 
[logical_port external_mac]
   
 
   Adds the specified NAT to router.
@@ -453,6 +453,13 @@
   The external_ip is an IPv4 address.
   The logical_ip is an IPv4 network (e.g 192.168.1.0/24)
   or an IPv4 address.
+  The logical_port and external_mac are only
+  accepted when router is a distributed router (rather
+  than a gateway router) and type is
+  dnat_and_snat.
+  The logical_port is the name of an existing logical
+  switch port where the logical_ip resides.
+  The external_mac is an Ethernet address.
 
 
   When type is dnat, the externally
@@ -475,8 +482,22 @@
   the IP address in external_ip.
 
 
-  It is an error if a NAT already exists,
-  unless --may-exist is specified.
+  When the logical_port and external_mac
+  are specified, the NAT rule will be programmed on the chassis
+  where the logical_port resides.  This includes
+  ARP replies for the external_ip, which return the
+  value of external_mac.  All packets transmitted
+  with source IP address equal to external_ip will
+  be sent using the external_mac.
+
+
+  It is an error if a NAT already exists with the same values
+  of router, type, external_ip,
+  and logical_ip, unless --may-exist is
+  specified.  When --may-exist,
+  logical_port, and external_mac are all
+  specified, the existing values of logical_port and
+  external_mac are overwritten.
 
   
 
diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
index 4397daf..661f7de 100644
--- a/ovn/utilities/ovn-nbctl.c
+++ b/ovn/utilities/ovn-nbctl.c
@@ -384,7 +384,7 @@ Route commands:\n\
   lr-route-list ROUTER  print routes for ROUTER\n\
 \n\
 NAT commands:\n\
-  lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP\n\
+  lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP [LOGICAL_PORT EXTERNAL_MAC]\n\
 add a NAT to ROUTER\n\
   lr-nat-del ROUTER [TYPE [IP]]\n\
 remove NATs from ROUTER\n\
@@ -2233,6 +2233,30 @@ nbctl_lr_nat_add(struct ctl_context *ctx)
 new_logical_ip = normalize_ipv4_prefix(ipv4, plen);
 }
 
+const char *logical_port;
+const char *external_mac;
+if (ctx->argc == 6) {
+ctl_fatal("lr-nat-add with logical_port "
+  "must also specify external_mac.");
+} else if (ctx->argc == 7) {
+if (strcmp(nat_type, "dnat_and_snat")) {
+ctl_fatal("logical_port and external_mac are only valid when "
+  "type is \"dnat_and_snat\".");
+}
+
+logical_port = ctx->argv[5];
+lsp_by_name_or_uuid(ctx, logical_port, true);
+
+external_mac = ctx->argv[6];
+struct eth_addr ea;
+if (!eth_addr_from_string(external_mac, &ea)) {
+ctl_fatal("invalid mac address %s.", external_mac);
+}
+} else {
+logical_port = NULL;
+external_mac = NULL;
+}
+
 bool may_exist = shash_find(&ctx->options, "--may-exist") != NULL;
 int is_snat = !strcmp("snat", nat_type);
 for (size_t i = 0; i < lr->n_nat; i++) {
@@ -2243,6 +2267,10 @@ nbctl_lr_nat_add(struct ctl_context *ctx)
 if (!strcmp(is_snat ? external_ip : new_logical_ip,
 is_snat ? nat->external_ip : nat->logical_ip)) {
 if (may_exist) {
+nbrec_nat_verify_logical_port(nat);
+nbrec_nat_verify_external_mac(nat);
+nbrec_nat_set_logical_port(nat, logical_port);
+nbrec_nat_set_external_mac(nat, external_mac);
 free(new_logical_ip);
 return;
 }
@@ -2265,6 +2293,10 @@ nbctl_lr_nat_add(struct ctl_context *ctx)
 nbrec_nat_set_type(nat, nat_type);
 nbrec_nat_set_external_ip(nat, external_ip);
 

[ovs-dev] [PATCH v10 6/8] ovn: avoid snat recirc only on gateway routers

2017-01-17 Thread Mickey Spiegel
Currently, for performance reasons on gateway routers, ct_snat
that does not specify an IP address does not immediately trigger
recirculation.  On gateway routers, ct_snat that does not specify
an IP address happens in the UNSNAT pipeline stage, which is
followed by the DNAT pipeline stage that triggers recirculation
for all packets.  This DNAT pipeline stage recirculation takes
care of the recirculation needs of UNSNAT as well as other cases
such as UNDNAT.

On distributed routers, UNDNAT is handled in the egress pipeline
stage, separately from DNAT in the ingress pipeline stages.  The
DNAT pipeline stage only triggers recirculation for some packets.
Due to this difference in design, UNSNAT needs to trigger its own
recirculation.

This patch restricts the logic that avoids recirculation for
ct_snat, so that it only applies to datapaths representing
gateway routers.

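For example, on a gateway router the UNSNAT stage can keep using a bare
"ct_snat;" followed by "next;", relying on the DNAT stage to recirculate
every packet, while on a distributed router the same "ct_snat;" now
triggers its own recirculation, just as "ct_snat(IP);" always has.
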
Signed-off-by: Mickey Spiegel 
---
 include/ovn/actions.h  |  3 +++
 ovn/controller/lflow.c | 10 ++
 ovn/lib/actions.c  | 15 +--
 ovn/ovn-sb.xml | 23 +++
 tests/ovn.at   |  2 +-
 5 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 5f9d203..810b901 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -418,6 +418,9 @@ struct ovnact_encode_params {
 /* 'true' if the flow is for a switch. */
 bool is_switch;
 
+/* 'true' if the flow is for a gateway router. */
+bool is_gateway_router;
+
 /* A map from a port name to its connection tracking zone. */
 const struct simap *ct_zones;
 
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index c41368f..b011109 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -107,6 +107,15 @@ is_switch(const struct sbrec_datapath_binding *ldp)
 
 }
 
+static bool
+is_gateway_router(const struct sbrec_datapath_binding *ldp,
+  const struct hmap *local_datapaths)
+{
+struct local_datapath *ld =
+get_local_datapath(local_datapaths, ldp->tunnel_key);
+return ld ? ld->has_local_l3gateway : false;
+}
+
 /* Adds the logical flows from the Logical_Flow table to flow tables. */
 static void
 add_logical_flows(struct controller_ctx *ctx, const struct lport_index *lports,
@@ -220,6 +229,7 @@ consider_logical_flow(const struct lport_index *lports,
 .lookup_port = lookup_port_cb,
 .aux = &aux,
 .is_switch = is_switch(ldp),
+.is_gateway_router = is_gateway_router(ldp, local_datapaths),
 .ct_zones = ct_zones,
 .group_table = group_table,
 
diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
index dda675b..5c57ab7 100644
--- a/ovn/lib/actions.c
+++ b/ovn/lib/actions.c
@@ -803,12 +803,15 @@ encode_ct_nat(const struct ovnact_ct_nat *cn,
 ct = ofpacts->header;
 if (cn->ip) {
 ct->flags |= NX_CT_F_COMMIT;
-} else if (snat) {
-/* XXX: For performance reasons, we try to prevent additional
- * recirculations.  So far, ct_snat which is used in a gateway router
- * does not need a recirculation. ct_snat(IP) does need a
- * recirculation.  Should we consider a method to let the actions
- * specify whether an action needs recirculation if there more use
+} else if (snat && ep->is_gateway_router) {
+/* For performance reasons, we try to prevent additional
+ * recirculations.  ct_snat which is used in a gateway router
+ * does not need a recirculation.  ct_snat(IP) does need a
+ * recirculation.  ct_snat in a distributed router needs
+ * recirculation regardless of whether an IP address is
+ * specified.
+ * XXX Should we consider a method to let the actions specify
+ * whether an action needs recirculation if there are more use
  * cases?. */
 ct->recirc_table = NX_CT_RECIRC_NONE;
 }
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index a7c29c3..8fe0e2b 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -1122,11 +1122,26 @@
 
   
 ct_snat sends the packet through the SNAT zone to
-unSNAT any packet that was SNATed in the opposite direction.  If
-the packet needs to be sent to the next tables, then it should be
-followed by a next; action.  The next tables will not
-see the changes in the packet caused by the connection tracker.
+unSNAT any packet that was SNATed in the opposite direction.  The
+behavior on gateway routers differs from the behavior on a
+distributed router:
   
+  
+
+  On a gateway router, if the packet needs to be sent to the next
+  tables, then it should be followed by a next;
+  action.  The next tables will not see the changes in the packet
+  caused by the connection tracker.
+
+
+  

[ovs-dev] [PATCH v10 5/8] ovn: move load balancing flows after NAT flows

2017-01-17 Thread Mickey Spiegel
This will make it easy for distributed NAT to reuse some of the
existing code for NAT flows, while leaving load balancing and defrag
as functionality specific to gateway routers.  There is no intent to
change any functionality in this patch.

Signed-off-by: Mickey Spiegel 
Acked-by: Gurucharan Shetty 
---
 ovn/northd/ovn-northd.c | 140 
 1 file changed, 70 insertions(+), 70 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 87c80d1..219a69c 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -4099,76 +4099,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
*ports,
 const char *lb_force_snat_ip = get_force_snat_ip(od, "lb",
 &snat_ip);
 
-/* A set to hold all ips that need defragmentation and tracking. */
-struct sset all_ips = SSET_INITIALIZER(&all_ips);
-
-for (int i = 0; i < od->nbr->n_load_balancer; i++) {
-struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
-struct smap *vips = &lb->vips;
-struct smap_node *node;
-
-SMAP_FOR_EACH (node, vips) {
-uint16_t port = 0;
-
-/* node->key contains IP:port or just IP. */
-char *ip_address = NULL;
-ip_address_and_port_from_lb_key(node->key, &ip_address, &port);
-if (!ip_address) {
-continue;
-}
-
-if (!sset_contains(&all_ips, ip_address)) {
-sset_add(&all_ips, ip_address);
-}
-
-/* Higher priority rules are added for load-balancing in DNAT
- * table.  For every match (on a VIP[:port]), we add two flows
- * via add_router_lb_flow().  One flow is for specific matching
- * on ct.new with an action of "ct_lb($targets);".  The other
- * flow is for ct.est with an action of "ct_dnat;". */
-ds_clear(&actions);
-ds_put_format(&actions, "ct_lb(%s);", node->value);
-
-ds_clear(&match);
-ds_put_format(&match, "ip && ip4.dst == %s",
-  ip_address);
-free(ip_address);
-
-if (port) {
-if (lb->protocol && !strcmp(lb->protocol, "udp")) {
-ds_put_format(, " && udp && udp.dst == %d",
-  port);
-} else {
-ds_put_format(, " && tcp && tcp.dst == %d",
-  port);
-}
-add_router_lb_flow(lflows, od, &match, &actions, 120,
-   lb_force_snat_ip);
-} else {
-add_router_lb_flow(lflows, od, &match, &actions, 110,
-   lb_force_snat_ip);
-}
-}
-}
-
-/* If there are any load balancing rules, we should send the
- * packet to conntrack for defragmentation and tracking.  This helps
- * with two things.
- *
- * 1. With tracking, we can send only new connections to pick a
- *DNAT ip address from a group.
- * 2. If there are L4 ports in load balancing rules, we need the
- *defragmentation to match on L4 ports. */
-const char *ip_address;
-SSET_FOR_EACH(ip_address, &all_ips) {
-ds_clear(&match);
-ds_put_format(&match, "ip && ip4.dst == %s", ip_address);
-ovn_lflow_add(lflows, od, S_ROUTER_IN_DEFRAG,
-  100, ds_cstr(&match), "ct_next;");
-}
-
-sset_destroy(&all_ips);
-
 for (int i = 0; i < od->nbr->n_nat; i++) {
 const struct nbrec_nat *nat;
 
@@ -4323,6 +4253,76 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
*ports,
 * routing in the openflow pipeline. */
 ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
   "ip", "flags.loopback = 1; ct_dnat;");
+
+/* A set to hold all ips that need defragmentation and tracking. */
+struct sset all_ips = SSET_INITIALIZER(&all_ips);
+
+for (int i = 0; i < od->nbr->n_load_balancer; i++) {
+struct nbrec_load_balancer *lb = od->nbr->load_balancer[i];
+struct smap *vips = &lb->vips;
+struct smap_node *node;
+
+SMAP_FOR_EACH (node, vips) {
+uint16_t port = 0;
+
+/* node->key contains IP:port or just IP. */
+char *ip_address = NULL;
+ip_address_and_port_from_lb_key(node->key, &ip_address, &port);
+if (!ip_address) {
+continue;
+}
+
+if (!sset_contains(&all_ips, ip_address)) {
+sset_add(&all_ips, ip_address);
+}
+
+/* Higher priority rules are 

Re: [ovs-dev] [PATCH] ovn-controller: Provide the option to set Encap.options:csum

2017-01-17 Thread Numan Siddique
On Mon, Jan 16, 2017 at 11:46 PM, Russell Bryant  wrote:

>
>
> On Mon, Jan 16, 2017 at 12:01 PM, Ben Pfaff  wrote:
>
>> On Mon, Jan 16, 2017 at 09:28:28AM -0500, Russell Bryant wrote:
>> > On Sat, Jan 14, 2017 at 11:29 AM, Ben Pfaff  wrote:
>> >
>> > > On Sat, Jan 14, 2017 at 07:37:53PM +0530, Numan Siddique wrote:
>> > > > On Sat, Jan 14, 2017 at 3:18 AM, Ben Pfaff  wrote:
>> > > >
>> > > > > On Tue, Jan 10, 2017 at 11:34:42AM +0530, Numan Siddique wrote:
>> > > > > > ovn-controller by default enables UDP checksums for geneve
>> > > > > > tunnels. With this patch user can set the desired value in
>> > > > > > Open_vSwitch.external_ids:ovn_encap_csum.
>> > > > > >
>> > > > > > Signed-off-by: Numan Siddique 
>> > > > >
>> > > > > I don't see technical problems with this, but I also don't know
>> > > > > why a user would want to disable checksums.  Can you send a v2
>> > > > > that adds this rationale to the documentation and to the commit
>> > > > > message?
>> > > > >
>> > > >
>> > > > Thanks for the review. Sure I will do that. The reason for this
>> > > > patch is - we are seeing significant performance increase (more
>> > > > than 100%) in our testing when tunnel checksum is disabled.
>> > > >
>> > > > The lab servers have NICs with geneve offload support
>> > > > (tx-udp_tnl-segmentation).
>> > >
>> > > OK, that sounds like a good reason to document.
>> >
>> >
>> > In particular, it looks like the NICs we have, Intel X710, will do TCP
>> > Segmentation Offload (TSO) with geneve or vxlan, but only if udp
>> > checksums are turned off.  Once they're on, TCP throughput gets cut to
>> > less than half.
>> >
>> > This is going to be painful to document well if it's hardware dependent.
>> > I'm not sure what the better default is, since checksums should actually
>> > improve performance for NICs without geneve offload support.
>>
>> Well, it's at least helpful to document that performance is the reason,
>> and that offloads factor into the issue.  Otherwise readers will have no
>> idea why they'd want to turn off checksums.  At worst, users can
>> benchmark both cases in their environments.
>>
>
> Totally agreed.  I was just adding some more detail about what we've seen,
> and then reflecting on how we might want to document this new option for
> users.
>
>
I have submitted v2 of the patch with updated documentation -
https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327660.html

Thanks
Numan

> --
> Russell Bryant
>


[ovs-dev] [PATCH v10 3/8] ovn: Introduce distributed gateway port and "chassisredirect" port binding

2017-01-17 Thread Mickey Spiegel
Currently OVN distributed logical routers achieve reachability to
physical networks by passing through a "join" logical switch to a
centralized gateway router, which then connects to another logical
switch that has a localnet port connecting to the physical network.

This patch adds logical port and port binding abstractions that allow
an OVN distributed logical router to connect directly to a logical
switch that has a localnet port connecting to the physical network.
In this patch, this logical router port is called a "distributed
gateway port".

The primary design goal of distributed gateway ports is to allow as
much traffic as possible to be handled locally on the hypervisor
where a VM or container resides.  Whenever possible, packets from
the VM or container to the outside world should be processed
completely on that VM's or container's hypervisor, eventually
traversing a localnet port instance on that hypervisor to the
physical network.  Whenever possible, packets from the outside
world to a VM or container should be directed through the physical
network directly to the VM's or container's hypervisor, where the
packet will enter the integration bridge through a localnet port.

However, due to the implications of the use of L2 learning in the
physical network, as well as the need to support advanced features
such as one-to-many NAT (aka IP masquerading), where multiple
logical IP addresses spread across multiple chassis are mapped to
one external IP address, it will be necessary to handle some of the
logical router processing on a specific chassis in a centralized
manner.  For this reason, the user must associate a chassis with
each distributed gateway port.

In order to allow for the distributed processing of some packets,
distributed gateway ports need to be logical patch ports that
effectively reside on every hypervisor, rather than "l3gateway"
ports that are bound to a particular chassis.  However, the flows
associated with distributed gateway ports often need to be
associated with physical locations.  This is implemented in this
patch (and subsequent patches) by adding "is_chassis_resident()"
match conditions to several logical router flows.

While most of the physical location dependent aspects of distributed
gateway ports can be handled by restricting some flows to specific
chassis, one additional mechanism is required.  When a packet
leaves the ingress pipeline and the logical egress port is the
distributed gateway port, one of two different sets of actions is
required at table 32:
- If the packet can be handled locally on the sender's hypervisor
  (e.g. one-to-one NAT traffic), then the packet should just be
  resubmitted locally to table 33, in the normal manner for
  distributed logical patch ports.
- However, if the packet needs to be handled on the chassis
  associated with the distributed gateway port (e.g. one-to-many
  SNAT traffic or non-NAT traffic), then table 32 must send the
  packet on a tunnel port to that chassis.
In order to trigger the second set of actions, the
"chassisredirect" type of southbound port_binding is introduced.
Setting the logical egress port to the type "chassisredirect"
logical port is simply a way to indicate that although the packet
is destined for the distributed gateway port, it needs to be
redirected to a different chassis.  At table 32, packets with this
logical egress port are sent to a specific chassis, in the same
way that table 32 directs packets whose logical egress port is a
VIF or a type "l3gateway" port to different chassis.  Once the
packet arrives at that chassis, table 33 resets the logical egress
port to the value representing the distributed gateway port.  For
each distributed gateway port, there is one type "chassisredirect"
port, in addition to the distributed logical patch port
representing the distributed gateway port.

A "chassisredirect" port represents a particular instance, bound
to a specific chassis, of an otherwise distributed port.  A
"chassisredirect" port is associated with a chassis in the same
manner as a "l3gateway" port.  However, unlike "l3gateway" ports,
"chassisredirect" ports have no associated IP or MAC addresses,
and "chassisredirect" ports should never be used as the "inport".
Any pipeline stages that depend on port specific IP or MAC addresses
should be carried out in the context of the distributed gateway
port's logical patch port.

Although the abstraction represented by the "chassisredirect" port
binding is generalized, in this patch the "chassisredirect" port binding
is only created for NB logical router ports that specify the new
"redirect-chassis" option.  There is no explicit notion of a
"chassisredirect" port in the NB database.  The expectation is when
capabilities are implemented that take advantage of "chassisredirect"
ports (e.g. distributed gateway ports), flows specifying a
"chassisredirect" port as the outport will be added as part of that
capability.

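As a concrete sketch of the expected usage (names below are made up; only
the "redirect-chassis" option itself comes from this patch):

    ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13 172.16.1.1/24
    ovn-nbctl set Logical_Router_Port lr0-public \
        options:redirect-chassis=gw-chassis-1

which results in a southbound "chassisredirect" Port_Binding for
lr0-public, bound to chassis gw-chassis-1.
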
Signed-off-by: Mickey Spiegel 

[ovs-dev] [PATCH v10 2/8] ovn: add is_chassis_resident match expression component

2017-01-17 Thread Mickey Spiegel
This patch introduces a new match expression component
is_chassis_resident().  Unlike match expression comparisons,
is_chassis_resident is not pushed down to OpenFlow.  It is a
conditional that is evaluated in the controller during expr_simplify(),
when it is replaced by a boolean expression.  The is_chassis_resident
conditional evaluates to "true" when the specified string identifies a
port name that is resident on this controller chassis, i.e., the
corresponding southbound database Port_Binding has a chassis column
that matches this chassis.  Otherwise it evaluates to "false".

This allows higher level features to specify flows that are only
installed on some chassis rather than on all chassis with the
corresponding datapath.

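For example (hypothetical port names), a logical flow match such as

    outport == "lr0-public" && is_chassis_resident("cr-lr0-public")

has its is_chassis_resident() conditional replaced by "true" or "false"
during expr_simplify(), so the resulting OpenFlow matches are only
installed on the chassis where that port binding resides.
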
Suggested-by: Ben Pfaff 
Signed-off-by: Mickey Spiegel 
Acked-by: Ben Pfaff 
---
 include/ovn/expr.h  |  22 +-
 ovn/controller/lflow.c  |  31 ++--
 ovn/controller/lflow.h  |   5 +-
 ovn/controller/ovn-controller.c |   5 +-
 ovn/lib/expr.c  | 160 ++--
 ovn/ovn-sb.xml  |  14 
 ovn/utilities/ovn-trace.c   |  21 +-
 tests/ovn.at|  14 
 tests/test-ovn.c|  24 +-
 9 files changed, 279 insertions(+), 17 deletions(-)

diff --git a/include/ovn/expr.h b/include/ovn/expr.h
index 2169a8c..711713e 100644
--- a/include/ovn/expr.h
+++ b/include/ovn/expr.h
@@ -292,6 +292,15 @@ enum expr_type {
 EXPR_T_AND, /* Logical AND of 2 or more subexpressions. */
 EXPR_T_OR,  /* Logical OR of 2 or more subexpressions. */
 EXPR_T_BOOLEAN, /* True or false constant. */
+EXPR_T_CONDITION,   /* Conditional to be evaluated in the
+ * controller during expr_simplify(),
+ * prior to constructing OpenFlow matches. */
+};
+
+/* Expression condition type. */
+enum expr_cond_type {
+EXPR_COND_CHASSIS_RESIDENT, /* Check if specified logical port name is
+ * resident on the controller chassis. */
 };
 
 /* Relational operator. */
@@ -349,6 +358,14 @@ struct expr {
 
 /* EXPR_T_BOOLEAN. */
 bool boolean;
+
+/* EXPR_T_CONDITION. */
+struct {
+enum expr_cond_type type;
+bool not;
+/* XXX Should arguments for conditions be generic? */
+char *string;
+} cond;
 };
 };
 
@@ -375,7 +392,10 @@ void expr_destroy(struct expr *);
 
 struct expr *expr_annotate(struct expr *, const struct shash *symtab,
char **errorp);
-struct expr *expr_simplify(struct expr *);
+struct expr *expr_simplify(struct expr *,
+   bool (*is_chassis_resident)(const void *c_aux,
+   const char *port_name),
+   const void *c_aux);
 struct expr *expr_normalize(struct expr *);
 
 bool expr_honors_invariants(const struct expr *);
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index 71d8c59..3d7633e 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -50,12 +50,18 @@ struct lookup_port_aux {
 const struct sbrec_datapath_binding *dp;
 };
 
+struct condition_aux {
+const struct lport_index *lports;
+const struct sbrec_chassis *chassis;
+};
+
 static void consider_logical_flow(const struct lport_index *lports,
   const struct mcgroup_index *mcgroups,
   const struct sbrec_logical_flow *lflow,
   const struct hmap *local_datapaths,
   struct group_table *group_table,
   const struct simap *ct_zones,
+  const struct sbrec_chassis *chassis,
   struct hmap *dhcp_opts,
   struct hmap *dhcpv6_opts,
   uint32_t *conj_id_ofs,
@@ -85,6 +91,16 @@ lookup_port_cb(const void *aux_, const char *port_name, 
unsigned int *portp)
 }
 
 static bool
+is_chassis_resident_cb(const void *c_aux_, const char *port_name)
+{
+const struct condition_aux *c_aux = c_aux_;
+
+const struct sbrec_port_binding *pb
+= lport_lookup_by_name(c_aux->lports, port_name);
+return pb && pb->chassis && pb->chassis == c_aux->chassis;
+}
+
+static bool
 is_switch(const struct sbrec_datapath_binding *ldp)
 {
 return smap_get(&ldp->external_ids, "logical-switch") != NULL;
@@ -98,6 +114,7 @@ add_logical_flows(struct controller_ctx *ctx, const struct 
lport_index *lports,
   const struct hmap *local_datapaths,
   struct group_table *group_table,
   const struct simap *ct_zones,
+  const struct sbrec_chassis *chassis,
  

[ovs-dev] [PATCH v10 1/8] ovn: document logical routers and logical patch ports in ovn-architecture

2017-01-17 Thread Mickey Spiegel
This patch adds a description of logical routers and logical patch ports,
including gateway routers, to ovn/ovn-architecture.7.xml.

Signed-off-by: Mickey Spiegel 
---
 ovn/ovn-architecture.7.xml | 148 ++---
 1 file changed, 140 insertions(+), 8 deletions(-)

diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index b049e51..d92f878 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -381,6 +381,36 @@
   switch.  Logical switches and routers are both implemented as logical
   datapaths.
 
+
+
+  
+Logical ports represent the points of connectivity in and
+out of logical switches and logical routers.  Some common types of
+logical ports are:
+  
+
+  
+
+  Logical ports representing VIFs.
+
+
+
+  Localnet ports represent the points of connectivity
+  between logical switches and the physical network.  They are
+  implemented as OVS patch ports between the integration bridge
+  and the separate Open vSwitch bridge that underlay physical
+  ports attach to.
+
+
+
+  Logical patch ports represent the points of
+  connectivity between logical switches and logical routers, and
+  in some cases between peer logical routers.  There is a pair of
+  logical patch ports at each such point of connectivity, one on
+  each side.
+
+  
+
   
 
   Life Cycle of a VIF
@@ -1040,17 +1070,119 @@
 is a container nested with a VM, then before sending the packet the
 actions push on a VLAN header with an appropriate VLAN ID.
   
-
-  
-If the logical egress port is a logical patch port, then table 65
-outputs to an OVS patch port that represents the logical patch port.
-The packet re-enters the OpenFlow flow table from the OVS patch port's
-peer in table 0, which identifies the logical datapath and logical
-input port based on the OVS patch port's OpenFlow port number.
-  
 
   
 
+  Logical Routers and Logical Patch Ports
+
+  
+Typically logical routers and logical patch ports do not have a
+physical location and effectively reside on every hypervisor.  This is
+the case for logical patch ports between logical routers and logical
+switches behind those logical routers, to which VMs (and VIFs) attach.
+  
+
+  
+Consider a packet sent from one virtual machine or container to another
+VM or container that resides on a different subnet.  The packet will
+traverse tables 0 to 65 as described in the previous section
+Architectural Physical Life Cycle of a Packet, using the
+logical datapath representing the logical switch that the sender is
+attached to.  At table 32, the packet will use the fallback flow that
+resubmits locally to table 33 on the same hypervisor.  In this case,
+all of the processing from table 0 to table 65 occurs on the hypervisor
+where the sender resides.
+  
+
+  
+When the packet reaches table 65, the logical egress port is a logical
+patch port.  The behavior at table 65 differs depending on the OVS
+version:
+  
+
+  
+
+  In OVS versions 2.6 and earlier, table 65 outputs to an OVS patch
+  port that represents the logical patch port.  The packet re-enters
+  the OpenFlow flow table from the OVS patch port's peer in table 0,
+  which identifies the logical datapath and logical input port based
+  on the OVS patch port's OpenFlow port number.
+
+
+
+  In OVS versions 2.7 and later, the packet is cloned and resubmitted
+  directly to OpenFlow flow table 16, setting the logical ingress
+  port to the peer logical patch port, and using the peer logical
+  patch port's logical datapath (that represents the logical router).
+
+  
+
+  
+The packet re-enters the ingress pipeline in order to traverse tables
+16 to 65 again, this time using the logical datapath representing the
+logical router.  The processing continues as described in the previous
+section Architectural Physical Life Cycle of a Packet.
+When the packet reaches table 65, the logical egress port will once
+again be a logical patch port.  In the same manner as described above,
+this logical patch port will cause the packet to be resubmitted to
+OpenFlow tables 16 to 65, this time using the logical datapath
+representing the logical switch that the destination VM or container
+is attached to.
+  
+
+  
+The packet traverses tables 16 to 65 a third and final time.  If the
+destination VM or container resides on a remote hypervisor, then table
+32 will send the packet on a tunnel port from the sender's hypervisor
+to the remote hypervisor.  Finally table 65 will output the packet
+directly to the destination VM or 

Re: [ovs-dev] [PATCH] vlan.rst: Strip leftover HTML.

2017-01-17 Thread Russell Bryant
On Mon, Jan 16, 2017 at 5:14 PM, Ben Pfaff  wrote:

> On Mon, Jan 16, 2017 at 04:46:58PM -0500, Russell Bryant wrote:
> > String a couple of closing HTML tags that were left over from when this
> > doc
>
> s/String/Strip/
>

Oops, fixed.


>
> > was converted from the web site to RST.
> >
> > Signed-off-by: Russell Bryant 
>
> Acked-by: Ben Pfaff 
>

Thanks, applied to master.

-- 
Russell Bryant


Re: [ovs-dev] [PATCH] netdev-dpdk: Create separate memory pool for each port

2017-01-17 Thread Robert Wojciechowicz
On Sun, Jan 15, 2017 at 06:14:47PM +, Stokes, Ian wrote:
> > Since it's possible to delete memory pool in DPDK we can try to estimate
> > better required memory size when port is reconfigured, e.g. with different
> > number of rx queues.
> 
> Thanks for the patch.
> 
> It will need to be rebased for it to apply cleanly to master.

Yes, this patch has been waiting here for quite some time. I need to rebase
it and send it once again.

> I've applied the changes manually just to test and from what I can see there 
> are no issues.
> It also resolves issue reported with having 64 queues for dpdk physical ports 
> which causes ovs to crash.

Thanks for testing

> 
> A few other comments below.
> > 
> > Signed-off-by: Robert Wojciechowicz 
> > ---
> >  lib/netdev-dpdk.c | 85 +++---
> > -
> >  1 file changed, 41 insertions(+), 44 deletions(-)
> > 
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 7c1523e..4c9e1e8
> > 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
[...]
> > 
> >  static struct dpdk_mp *
> > -dpdk_mp_get(int socket_id, int mtu)
> > +dpdk_mp_get(struct netdev_dpdk *dev, int mtu)
> >  {
> >  struct dpdk_mp *dmp;
> > 
> >  ovs_mutex_lock(&dpdk_mp_mutex);
> > -LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) {
> > -if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
> > -dmp->refcount++;
> > -goto out;
> > -}
> > -}
> > -
> As you remove the use of dpdk_mp_list above, I wonder if there is a need to 
> maintain the global list?
> With mempools now set on a per netdev basis, perhaps the global list could be 
> removed also. Thoughts?
> 

I was thinking about that when I prepared this patch. In this implementation
a memory pool will always be used by only one port, so I could get rid of
`refcount` and `dpdk_mp_list`. I didn't do that because I didn't want to make
sweeping changes to the implementation without knowing whether this idea of
separate memory pools makes sense to the community. Now that I have positive
feedback from you, it may be worth working on this patch further.

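For what it's worth, the core of the idea is small (a sketch only, not the
actual patch; names and sizing below are approximate):

    /* One mempool per port/MTU, named uniquely so that reconfiguration
     * can create the new pool before the old one is released. */
    char mp_name[RTE_MEMPOOL_NAMESIZE];
    snprintf(mp_name, sizeof mp_name, "ovs_mp_%s_%d", dev_name, mtu);

    /* 'n_mbufs' estimated from the requested rxq/txq configuration. */
    struct rte_mempool *mp
        = rte_pktmbuf_pool_create(mp_name, n_mbufs,
                                  RTE_MEMPOOL_CACHE_MAX_SIZE, 0,
                                  frame_len + RTE_PKTMBUF_HEADROOM,
                                  socket_id);
    if (mp) {
        rte_mempool_free(old_mp);   /* Possible since DPDK 16.07. */
    }
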
Br,
Robert