date:20180123

Re: [ovs-dev] 答复: Re: [PATCH] The dependency between ovndb_servers-master andVirtualIP is wrong

2018-01-23 Thread Numan Siddique

Hi xurong,

Please see below for some comments.



On Wed, Jan 24, 2018 at 8:36 AM,  wrote:

> Hello,Liguoshuai:
>
> In my cluster environment, HA manages a lot of services, and VIP is not
> just for ovsdb.
>
> When I move ovsdb out of HA, VIP can't start, and other services are so
> abnormal, such as neutron-server.
>
>
>
>
> 原始邮件
> *发件人：* ;
> *收件人：*赵静静00067370;徐熔00037997;
> *抄送人：* ; ; ; <
> d...@openvswitch.org>; ;
> *日 期 ：*2018年01月24日 10:41
> *主 题 ：**Re: [ovs-dev] [PATCH] The dependency between ovndb_servers-master
> andVirtualIP is wrong*
> Hello, xurong, zhaojingjing:
>
> At that time, in order to avoid the problem of losing after the
> active/slave switchover,
> the slave node was promote prior to the start of the VIP.
> https://github.com/openvswitch/ovs/commit/a38f532320936147e6831153828af5
> bcba2fba54#diff-fb92408f16ddd415d9906e3976e60f47L255
>
> After that, the problem of data loss was completely solved in ovsdb,
> https://github.com/openvswitch/ovs/commit/05ac209a5d3a5e85896f58d16b244e
> 6b2a4cf2d0#diff-6878621809e8ea9210d85a69a5e06367
>
> I think it is possible to change this sequence again.
>
> Are there any new problems detected?
>
> >>> From: zhaojingjing0067370 
> >>>
> >>> ---
> >>>   Documentation/topics/integration.rst | 4 ++--
> >>>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/Documentation/topics/integration.rst b/
> Documentation/topics/integration.rst
> >>> index 0447faf..d129e21 100644
> >>> --- a/Documentation/topics/integration.rst
> >>> +++ b/Documentation/topics/integration.rst
> >>> @@ -255,6 +255,6 @@ with the active server::
> >>>
> >>>   $ pcs resource create VirtualIP ocf:
> heartbeat:IPaddr2 ip=x.x.x.x \
> >>>   op monitor interval=30s
> >>> -$ pcs constraint order promote ovndb_servers-
> master then VirtualIP
> >>> -$ pcs constraint colocation add VirtualIP with
> master ovndb_servers-master \
> >>> +$ pcs constraint order start VirtualIP then
> promote ovndb_servers-master
>

I think this makes sense.



> >>> +$ pcs constraint colocation add master ovndb_
> servers-master with VirtualIP \
> >>>   score=INFINITY
>

In OpenStack tripleo installations, colocation constraint is set as
"VirtualIP with ovndb_servers-master".
Are you seeing issues with the below commands ?

 $ pcs constraint order start VirtualIP then promote ovndb_servers-master
 $ pcs constraint colocation add VirtualIP with master ovndb_servers-master

I would suggest to only change the "order" command and leave the colocation
command AS IS in your patch.

This is the pcs constraint show command in my tripleo environment.


Ordering Constraints:
  start ip-192.168.24.12 then start haproxy-bundle (kind:Optional)
  start ip-10.0.0.6 then start haproxy-bundle (kind:Optional)
  start ip-172.16.2.4 then start haproxy-bundle (kind:Optional)
  start ip-172.16.2.10 then start haproxy-bundle (kind:Optional)
  start ip-172.16.1.5 then start haproxy-bundle (kind:Optional)
  start ip-172.16.3.11 then start haproxy-bundle (kind:Optional)
Colocation Constraints:
  ip-192.168.24.12 with haproxy-bundle (score:INFINITY)
  ip-10.0.0.6 with haproxy-bundle (score:INFINITY)
  ip-172.16.2.4 with haproxy-bundle (score:INFINITY)
  ip-172.16.2.10 with haproxy-bundle (score:INFINITY)
  ip-172.16.1.5 with haproxy-bundle (score:INFINITY)
  ip-172.16.3.11 with haproxy-bundle (score:INFINITY)
  ip-172.16.2.10 with ovn-dbs-bundle (score:INFINITY) (rsc-role:Started)
(with-rsc-role:Master)
Ticket Constraints:
**

Thanks
Numan


>>> --
> >>> 1.8.3.1
> >>>
> >>> ___
> >>> dev mailing list
> >>> d...@openvswitch.org
> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> >
>
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v10] netdev-dpdk: Add support for vHost dequeue zero copy (experimental)

2018-01-23 Thread Ilya Maximets

On 23.01.2018 19:55, Ciara Loftus wrote:
> Zero copy is disabled by default. To enable it, set the 'dq-zero-copy'
> option to 'true' when configuring the Interface:
> 
> ovs-vsctl set Interface dpdkvhostuserclient0
> options:vhost-server-path=/tmp/dpdkvhostuserclient0
> options:dq-zero-copy=true
> 
> When packets from a vHost device with zero copy enabled are destined for
> a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port
> must be set to a smaller value. 128 is recommended. This can be achieved
> like so:
> 
> ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> 
> Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send
> to should not exceed 128. Due to this requirement, the feature is
> considered 'experimental'.
> 
> Testing of the patch showed a 15% improvement when switching 512B
> packets between vHost devices on different VMs on the same host when
> zero copy was enabled on the transmitting device.
> 
> Signed-off-by: Ciara Loftus 
> ---
> v10:
> * Rebase
> * Fix vhost flags
> 
>  Documentation/intro/install/dpdk.rst |  2 +
>  Documentation/topics/dpdk/vhost-user.rst | 73 
> 
>  NEWS |  1 +
>  lib/netdev-dpdk.c| 17 
>  vswitchd/vswitch.xml | 11 +
>  5 files changed, 104 insertions(+)
> 
> diff --git a/Documentation/intro/install/dpdk.rst 
> b/Documentation/intro/install/dpdk.rst
> index 040e62e..93411aa 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -518,6 +518,8 @@ The above command sets the number of rx queues for DPDK 
> physical interface.
>  The rx queues are assigned to pmd threads on the same NUMA node in a
>  round-robin fashion.
>  
> +.. _dpdk-queues-sizes:
> +
>  DPDK Physical Port Queue Sizes
>  ~~~
>  
> diff --git a/Documentation/topics/dpdk/vhost-user.rst 
> b/Documentation/topics/dpdk/vhost-user.rst
> index 8447e2d..95517a6 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -458,3 +458,76 @@ Sample XML
>  
>  
>  .. _QEMU documentation: 
> http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
> +
> +vhost-user Dequeue Zero Copy (experimental)
> +---
> +
> +Normally when dequeuing a packet from a vHost User device, a memcpy operation
> +must be used to copy that packet from guest address space to host address
> +space. This memcpy can be removed by enabling dequeue zero-copy like so::
> +
> +$ ovs-vsctl add-port br0 dpdkvhostuserclient0 -- set Interface \
> +dpdkvhostuserclient0 type=dpdkvhostuserclient \
> +options:vhost-server-path=/tmp/dpdkvhostclient0 \
> +options:dq-zero-copy=true
> +
> +With this feature enabled, a reference (pointer) to the packet is passed to
> +the host, instead of a copy of the packet. Removing this memcpy can give a
> +performance improvement for some use cases, for example switching large 
> packets
> +between different VMs. However additional packet loss may be observed.
> +
> +Note that the feature is disabled by default and must be explicitly enabled
> +by setting the ``dq-zero-copy`` option to ``true`` while specifying the
> +``vhost-server-path`` option as above. If you wish to split out the command
> +into multiple commands as below, ensure ``dq-zero-copy`` is set before
> +``vhost-server-path``::
> +
> +$ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> +$ ovs-vsctl set Interface dpdkvhostuserclient0 \
> +options:vhost-server-path=/tmp/dpdkvhostclient0
> +
> +The feature is only available to ``dpdkvhostuserclient`` port types.
> +
> +A limitation exists whereby if packets from a vHost port with
> +``dq-zero-copy=true`` are destined for a ``dpdk`` type port, the number of tx
> +descriptors (``n_txq_desc``) for that port must be reduced to a smaller 
> number,
> +128 being the recommended value. This can be achieved by issuing the 
> following
> +command::
> +
> +$ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> +
> +Note: The sum of the tx descriptors of all ``dpdk`` ports the VM will send to
> +should not exceed 128. For example, in case of a bond over two physical ports
> +in balance-tcp mode, one must divide 128 by the number of links in the bond.
> +
> +Refer to :ref:`dpdk-queues-sizes` for more information.
> +
> +The reason for this limitation is due to how the zero copy functionality is
> +implemented. The vHost device's 'tx used vring', a virtio structure used for
> +tracking used ie. sent descriptors, will only be updated when the NIC frees
> +the corresponding mbuf. If we don't free the mbufs frequently enough, that
> +vring will be starved and packets will no longer be processed. One way to
> +ensure we don't encounter this scenario, is to

[ovs-dev] [PATCH v3 2/3] datapath-windows: Add a global level RW lock for NAT

2018-01-23 Thread Anand Kumar

Currently NAT module relies on the existing conntrack lock.
This patch provides a basic lock implementation for NAT module
in conntrack.

Signed-off-by: Anand Kumar 
Acked-by: Alin Gabriel Serdean 
---
 datapath-windows/ovsext/Conntrack.c | 36 ++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack.c 
b/datapath-windows/ovsext/Conntrack.c
index 3cde836..7d56a50 100644
--- a/datapath-windows/ovsext/Conntrack.c
+++ b/datapath-windows/ovsext/Conntrack.c
@@ -32,6 +32,7 @@ KSTART_ROUTINE OvsConntrackEntryCleaner;
 static PLIST_ENTRY ovsConntrackTable;
 static OVS_CT_THREAD_CTX ctThreadCtx;
 static PNDIS_RW_LOCK_EX ovsConntrackLockObj;
+static PNDIS_RW_LOCK_EX ovsCtNatLockObj;
 extern POVS_SWITCH_CONTEXT gOvsSwitchContext;
 static LONG ctTotalEntries;
 
@@ -58,6 +59,13 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
 return STATUS_INSUFFICIENT_RESOURCES;
 }
 
+ovsCtNatLockObj = NdisAllocateRWLock(context->NdisFilterHandle);
+if (ovsCtNatLockObj == NULL) {
+NdisFreeRWLock(ovsConntrackLockObj);
+ovsConntrackLockObj = NULL;
+return STATUS_INSUFFICIENT_RESOURCES;
+}
+
 /* Init the Hash Buffer */
 ovsConntrackTable = OvsAllocateMemoryWithTag(sizeof(LIST_ENTRY)
  * CT_HASH_TABLE_SIZE,
@@ -65,6 +73,8 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
 if (ovsConntrackTable == NULL) {
 NdisFreeRWLock(ovsConntrackLockObj);
 ovsConntrackLockObj = NULL;
+NdisFreeRWLock(ovsCtNatLockObj);
+ovsCtNatLockObj = NULL;
 return STATUS_INSUFFICIENT_RESOURCES;
 }
 
@@ -82,6 +92,9 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
 NdisFreeRWLock(ovsConntrackLockObj);
 ovsConntrackLockObj = NULL;
 
+NdisFreeRWLock(ovsCtNatLockObj);
+ovsCtNatLockObj = NULL;
+
 OvsFreeMemoryWithTag(ovsConntrackTable, OVS_CT_POOL_TAG);
 ovsConntrackTable = NULL;
 
@@ -111,7 +124,7 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
 VOID
 OvsCleanupConntrack(VOID)
 {
-LOCK_STATE_EX lockState;
+LOCK_STATE_EX lockState, lockStateNat;
 NdisAcquireRWLockWrite(ovsConntrackLockObj, , 0);
 ctThreadCtx.exit = 1;
 KeSetEvent(, 0, FALSE);
@@ -131,7 +144,11 @@ OvsCleanupConntrack(VOID)
 
 NdisFreeRWLock(ovsConntrackLockObj);
 ovsConntrackLockObj = NULL;
+NdisAcquireRWLockWrite(ovsCtNatLockObj, , 0);
 OvsNatCleanup();
+NdisReleaseRWLock(ovsCtNatLockObj, );
+NdisFreeRWLock(ovsCtNatLockObj);
+ovsCtNatLockObj = NULL;
 }
 
 static __inline VOID
@@ -197,15 +214,19 @@ OvsCtAddEntry(POVS_CT_ENTRY entry, 
OvsConntrackKeyLookupCtx *ctx,
 if (natInfo == NULL) {
 entry->natInfo.natAction = NAT_ACTION_NONE;
 } else {
+LOCK_STATE_EX lockStateNat;
+NdisAcquireRWLockWrite(ovsCtNatLockObj, , 0);
 if (OvsIsForwardNat(natInfo->natAction)) {
 entry->natInfo = *natInfo;
 if (!OvsNatTranslateCtEntry(entry)) {
+NdisReleaseRWLock(ovsCtNatLockObj, );
 return FALSE;
 }
 ctx->hash = OvsHashCtKey(>key);
 } else {
 entry->natInfo.natAction = natInfo->natAction;
 }
+NdisReleaseRWLock(ovsCtNatLockObj, );
 }
 
 entry->timestampStart = now;
@@ -358,7 +379,10 @@ OvsCtEntryDelete(POVS_CT_ENTRY entry, BOOLEAN forceDelete)
 }
 if (forceDelete || OvsCtEntryExpired(entry)) {
 if (entry->natInfo.natAction) {
+LOCK_STATE_EX lockStateNat;
+NdisAcquireRWLockWrite(ovsCtNatLockObj, , 0);
 OvsNatDeleteKey(>key);
+NdisReleaseRWLock(ovsCtNatLockObj, );
 }
 OvsPostCtEventEntry(entry, OVS_EVENT_CT_DELETE);
 RemoveEntryList(>link);
@@ -560,7 +584,10 @@ OvsCtSetupLookupCtx(OvsFlowKey *flowKey,
 return NDIS_STATUS_INVALID_PACKET;
 }
 
+LOCK_STATE_EX lockStateNat;
+NdisAcquireRWLockRead(ovsCtNatLockObj, , 0);
 natEntry = OvsNatLookup(>key, TRUE);
+NdisReleaseRWLock(ovsCtNatLockObj, );
 if (natEntry) {
 /* Translate address first for reverse NAT */
 ctx->key = natEntry->ctEntry->key;
@@ -813,8 +840,11 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx,
  */
 if (natInfo->natAction != NAT_ACTION_NONE)
 {
+LOCK_STATE_EX lockStateNat;
+NdisAcquireRWLockWrite(ovsCtNatLockObj, , 0);
 OvsNatPacket(fwdCtx, entry, entry->natInfo.natAction,
  key, ctx.reply);
+NdisReleaseRWLock(ovsCtNatLockObj, );
 }
 
 OvsCtSetMarkLabel(key, entry, mark, labels, );
@@ -1052,7 +1082,7 @@ OvsCtFlush(UINT16 zone, struct ovs_key_ct_tuple_ipv4 
*tuple)
 PLIST_ENTRY link, next;
 POVS_CT_ENTRY entry;
 
-LOCK_STATE_EX lockState;
+LOCK_STATE_EX lockState, lockStateNat;
 NdisAcquireRWLockWrite(ovsConntrackLockObj, ,

[ovs-dev] [PATCH v3 3/3] datapath-windows: Optimize conntrack lock implementation.

2018-01-23 Thread Anand Kumar

Currently, there is one global lock for conntrack module, which protects
conntrack entries and conntrack table. All the NAT operations are
performed holding this lock.

This becomes inefficient, as the number of conntrack entries grow.
With new implementation, we will have two PNDIS_RW_LOCK_EX locks in
conntrack.

1. ovsCtBucketLock - one rw lock per bucket of the conntrack table,
which is shared by all the ct entries that belong to the same bucket.
2. lock -  a rw lock in OVS_CT_ENTRY structure that protects the members
of conntrack entry.

Also, OVS_CT_ENTRY structure will have a lock reference(bucketLockRef)
to the corresponding OvsCtBucketLock of conntrack table.
We need this reference to retrieve ovsCtBucketLock from ct entry
for delete operation.

Signed-off-by: Anand Kumar 
---
v1->v2: Address potential memory leak in conntrack initialization.
v2->v3: Fix invalid memory access after deleting ct entry.
---
 datapath-windows/ovsext/Conntrack-nat.c |   6 +
 datapath-windows/ovsext/Conntrack.c | 233 
 datapath-windows/ovsext/Conntrack.h |   3 +
 3 files changed, 157 insertions(+), 85 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack-nat.c 
b/datapath-windows/ovsext/Conntrack-nat.c
index 7975770..316c946 100644
--- a/datapath-windows/ovsext/Conntrack-nat.c
+++ b/datapath-windows/ovsext/Conntrack-nat.c
@@ -167,12 +167,16 @@ OvsNatPacket(OvsForwardingContext *ovsFwdCtx,
 {
 UINT32 natFlag;
 const struct ct_endpoint* endpoint;
+LOCK_STATE_EX lockState;
+/* XXX: Move conntrack locks out of NAT after implementing lock in NAT. */
+NdisAcquireRWLockRead(entry->lock, , 0);
 /* When it is NAT, only entry->rev_key contains NATTED address;
When it is unNAT, only entry->key contains the UNNATTED address;*/
 const OVS_CT_KEY *ctKey = reverse ? >key : >rev_key;
 BOOLEAN isSrcNat;
 
 if (!(natAction & (NAT_ACTION_SRC | NAT_ACTION_DST))) {
+NdisReleaseRWLock(entry->lock, );
 return;
 }
 isSrcNat = (((natAction & NAT_ACTION_SRC) && !reverse) ||
@@ -202,6 +206,7 @@ OvsNatPacket(OvsForwardingContext *ovsFwdCtx,
 }
 } else if (ctKey->dl_type == htons(ETH_TYPE_IPV6)){
 // XXX: IPv6 packet not supported yet.
+NdisReleaseRWLock(entry->lock, );
 return;
 }
 if (natAction & (NAT_ACTION_SRC_PORT | NAT_ACTION_DST_PORT)) {
@@ -215,6 +220,7 @@ OvsNatPacket(OvsForwardingContext *ovsFwdCtx,
 }
 }
 }
+NdisReleaseRWLock(entry->lock, );
 }
 
 
diff --git a/datapath-windows/ovsext/Conntrack.c 
b/datapath-windows/ovsext/Conntrack.c
index 7d56a50..7f75eb2 100644
--- a/datapath-windows/ovsext/Conntrack.c
+++ b/datapath-windows/ovsext/Conntrack.c
@@ -31,7 +31,7 @@
 KSTART_ROUTINE OvsConntrackEntryCleaner;
 static PLIST_ENTRY ovsConntrackTable;
 static OVS_CT_THREAD_CTX ctThreadCtx;
-static PNDIS_RW_LOCK_EX ovsConntrackLockObj;
+static PNDIS_RW_LOCK_EX *ovsCtBucketLock;
 static PNDIS_RW_LOCK_EX ovsCtNatLockObj;
 extern POVS_SWITCH_CONTEXT gOvsSwitchContext;
 static LONG ctTotalEntries;
@@ -49,20 +49,14 @@ MapNlToCtTuple(POVS_MESSAGE msgIn, PNL_ATTR attr,
 NTSTATUS
 OvsInitConntrack(POVS_SWITCH_CONTEXT context)
 {
-NTSTATUS status;
+NTSTATUS status = STATUS_SUCCESS;
 HANDLE threadHandle = NULL;
 ctTotalEntries = 0;
+UINT32 numBucketLocks = CT_HASH_TABLE_SIZE;
 
 /* Init the sync-lock */
-ovsConntrackLockObj = NdisAllocateRWLock(context->NdisFilterHandle);
-if (ovsConntrackLockObj == NULL) {
-return STATUS_INSUFFICIENT_RESOURCES;
-}
-
 ovsCtNatLockObj = NdisAllocateRWLock(context->NdisFilterHandle);
 if (ovsCtNatLockObj == NULL) {
-NdisFreeRWLock(ovsConntrackLockObj);
-ovsConntrackLockObj = NULL;
 return STATUS_INSUFFICIENT_RESOURCES;
 }
 
@@ -71,15 +65,27 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
  * CT_HASH_TABLE_SIZE,
  OVS_CT_POOL_TAG);
 if (ovsConntrackTable == NULL) {
-NdisFreeRWLock(ovsConntrackLockObj);
-ovsConntrackLockObj = NULL;
 NdisFreeRWLock(ovsCtNatLockObj);
 ovsCtNatLockObj = NULL;
 return STATUS_INSUFFICIENT_RESOURCES;
 }
 
-for (int i = 0; i < CT_HASH_TABLE_SIZE; i++) {
+ovsCtBucketLock = OvsAllocateMemoryWithTag(sizeof(PNDIS_RW_LOCK_EX)
+   * CT_HASH_TABLE_SIZE,
+   OVS_CT_POOL_TAG);
+if (ovsCtBucketLock == NULL) {
+status = STATUS_INSUFFICIENT_RESOURCES;
+goto freeTable;
+}
+
+for (UINT32 i = 0; i < CT_HASH_TABLE_SIZE; i++) {
 InitializeListHead([i]);
+ovsCtBucketLock[i] = NdisAllocateRWLock(context->NdisFilterHandle);
+if (ovsCtBucketLock[i] == NULL) {
+status = STATUS_INSUFFICIENT_RESOURCES;
+numBucketLocks = i;
+

[ovs-dev] [PATCH v3 1/3] datapath-windows: Refactor conntrack code.

2018-01-23 Thread Anand Kumar

Some of the functions and  code are refactored
so that new conntrack lock can be implemented

Signed-off-by: Anand Kumar 
Acked-by: Alin Gabriel Serdean 
---
 datapath-windows/ovsext/Conntrack-nat.c |  11 +-
 datapath-windows/ovsext/Conntrack.c | 174 ++--
 datapath-windows/ovsext/Conntrack.h |   4 -
 3 files changed, 103 insertions(+), 86 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack-nat.c 
b/datapath-windows/ovsext/Conntrack-nat.c
index c778f12..7975770 100644
--- a/datapath-windows/ovsext/Conntrack-nat.c
+++ b/datapath-windows/ovsext/Conntrack-nat.c
@@ -93,26 +93,23 @@ NTSTATUS OvsNatInit()
 sizeof(LIST_ENTRY) * NAT_HASH_TABLE_SIZE,
 OVS_CT_POOL_TAG);
 if (ovsNatTable == NULL) {
-goto failNoMem;
+return STATUS_INSUFFICIENT_RESOURCES;
 }
 
 ovsUnNatTable = OvsAllocateMemoryWithTag(
 sizeof(LIST_ENTRY) * NAT_HASH_TABLE_SIZE,
 OVS_CT_POOL_TAG);
 if (ovsUnNatTable == NULL) {
-goto freeNatTable;
+OvsFreeMemoryWithTag(ovsNatTable, OVS_CT_POOL_TAG);
+return STATUS_INSUFFICIENT_RESOURCES;
 }
 
 for (int i = 0; i < NAT_HASH_TABLE_SIZE; i++) {
 InitializeListHead([i]);
 InitializeListHead([i]);
 }
-return STATUS_SUCCESS;
 
-freeNatTable:
-OvsFreeMemoryWithTag(ovsNatTable, OVS_CT_POOL_TAG);
-failNoMem:
-return STATUS_INSUFFICIENT_RESOURCES;
+return STATUS_SUCCESS;
 }
 
 /*
diff --git a/datapath-windows/ovsext/Conntrack.c 
b/datapath-windows/ovsext/Conntrack.c
index 169ec4f..3cde836 100644
--- a/datapath-windows/ovsext/Conntrack.c
+++ b/datapath-windows/ovsext/Conntrack.c
@@ -33,7 +33,7 @@ static PLIST_ENTRY ovsConntrackTable;
 static OVS_CT_THREAD_CTX ctThreadCtx;
 static PNDIS_RW_LOCK_EX ovsConntrackLockObj;
 extern POVS_SWITCH_CONTEXT gOvsSwitchContext;
-static UINT64 ctTotalEntries;
+static LONG ctTotalEntries;
 
 static __inline OvsCtFlush(UINT16 zone, struct ovs_key_ct_tuple_ipv4 *tuple);
 static __inline NDIS_STATUS
@@ -212,7 +212,7 @@ OvsCtAddEntry(POVS_CT_ENTRY entry, OvsConntrackKeyLookupCtx 
*ctx,
 InsertHeadList([ctx->hash & CT_HASH_TABLE_MASK],
>link);
 
-ctTotalEntries++;
+InterlockedIncrement((LONG volatile *));
 return TRUE;
 }
 
@@ -235,11 +235,6 @@ OvsCtEntryCreate(OvsForwardingContext *fwdCtx,
 *entryCreated = FALSE;
 state |= OVS_CS_F_NEW;
 
-parentEntry = OvsCtRelatedLookup(ctx->key, currentTime);
-if (parentEntry != NULL) {
-state |= OVS_CS_F_RELATED;
-}
-
 switch (ipProto) {
 case IPPROTO_TCP:
 {
@@ -283,6 +278,11 @@ OvsCtEntryCreate(OvsForwardingContext *fwdCtx,
 break;
 }
 
+parentEntry = OvsCtRelatedLookup(ctx->key, currentTime);
+if (parentEntry != NULL && state != OVS_CS_F_INVALID) {
+state |= OVS_CS_F_RELATED;
+}
+
 if (state != OVS_CS_F_INVALID && commit) {
 if (entry) {
 entry->parent = parentEntry;
@@ -315,6 +315,7 @@ OvsCtUpdateEntry(OVS_CT_ENTRY* entry,
  BOOLEAN reply,
  UINT64 now)
 {
+CT_UPDATE_RES status;
 switch (ipProto) {
 case IPPROTO_TCP:
 {
@@ -322,32 +323,23 @@ OvsCtUpdateEntry(OVS_CT_ENTRY* entry,
 const TCPHdr *tcp;
 tcp = OvsGetTcp(nbl, l4Offset, );
 if (!tcp) {
-return CT_UPDATE_INVALID;
+status =  CT_UPDATE_INVALID;
+break;
 }
-return OvsConntrackUpdateTcpEntry(entry, tcp, nbl, reply, now);
+status =  OvsConntrackUpdateTcpEntry(entry, tcp, nbl, reply, now);
+break;
 }
 case IPPROTO_ICMP:
-return OvsConntrackUpdateIcmpEntry(entry, reply, now);
+status =  OvsConntrackUpdateIcmpEntry(entry, reply, now);
+break;
 case IPPROTO_UDP:
-return OvsConntrackUpdateOtherEntry(entry, reply, now);
+status =  OvsConntrackUpdateOtherEntry(entry, reply, now);
+break;
 default:
-return CT_UPDATE_INVALID;
-}
-}
-
-static __inline VOID
-OvsCtEntryDelete(POVS_CT_ENTRY entry)
-{
-if (entry == NULL) {
-return;
-}
-if (entry->natInfo.natAction) {
-OvsNatDeleteKey(>key);
+status =  CT_UPDATE_INVALID;
+break;
 }
-OvsPostCtEventEntry(entry, OVS_EVENT_CT_DELETE);
-RemoveEntryList(>link);
-OvsFreeMemoryWithTag(entry, OVS_CT_POOL_TAG);
-ctTotalEntries--;
+return status;
 }
 
 static __inline BOOLEAN
@@ -358,6 +350,24 @@ OvsCtEntryExpired(POVS_CT_ENTRY entry)
 return entry->expiration < currentTime;
 }
 
+static __inline VOID
+OvsCtEntryDelete(POVS_CT_ENTRY entry, BOOLEAN forceDelete)
+{
+if (entry == NULL) {
+return;
+}
+if (forceDelete || OvsCtEntryExpired(entry)) {
+if (entry->natInfo.natAction) {
+OvsNatDeleteKey(>key);
+}
+OvsPostCtEventEntry(entry, OVS_EVENT_CT_DELETE);
+RemoveEntryList(>link);
+

[ovs-dev] [PATCH v3 0/3] datapath-windows: New lock implementation in conntrack

2018-01-23 Thread Anand Kumar

This patch series replaces existing one RW lock implemenation in
conntrack with two RW locks in conntrack and one RW lock in NAT.
---
v1->v2:
 - Patch 3, address review comments
v2->v3:
 - Patch 3, fix invalid memory access after deleting ct entry
---
Anand Kumar (3):
  datapath-windows: Refactor conntrack code.
  datapath-windows: Add a global level RW lock for NAT
  datapath-windows: Optimize conntrack lock implementation.

 datapath-windows/ovsext/Conntrack-nat.c |  17 +-
 datapath-windows/ovsext/Conntrack.c | 413 
 datapath-windows/ovsext/Conntrack.h |   7 +-
 3 files changed, 279 insertions(+), 158 deletions(-)

-- 
2.9.3.windows.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 1/2] skbuff: Add skb_network_trim

2018-01-23 Thread Pravin Shelar

On Mon, Jan 22, 2018 at 6:42 PM, Ed Swierk  wrote:
> IPv4 and IPv6 packets may arrive with lower-layer padding that is not
> included in the L3 length. For example, a short IPv4 packet may have
> up to 6 bytes of padding following the IP payload when received on an
> Ethernet device with a minimum packet length of 64 bytes.
>
> Higher-layer processing functions in netfilter (e.g. nf_ip_checksum(),
> and help() in nf_conntrack_ftp) assume skb->len reflects the length of
> the L3 header and payload, rather than referring back to
> ip_hdr->tot_len or ipv6_hdr->payload_len, and get confused by
> lower-layer padding.
>
> In the normal IPv4 receive path, ip_rcv() trims the packet to
> ip_hdr->tot_len before invoking netfilter hooks. In the IPv6 receive
> path, ip6_rcv() does the same using ipv6_hdr->payload_len. Similarly
> in the br_netfilter receive path, br_validate_ipv4() and
> br_validate_ipv6() trim the packet to the L3 length before invoking
> netfilter hooks.
>
> Currently the openvswitch receive path does not perform such trimming,
> which will be fixed by the next patch. In preparation, this patch adds
> a generic skb_network_trim() function.
>
> Signed-off-by: Ed Swierk 
> ---
>  include/linux/skbuff.h |  2 ++
>  net/core/skbuff.c  | 44 
>  2 files changed, 46 insertions(+)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 051e093..0478645 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -4018,6 +4018,8 @@ struct sk_buff *skb_checksum_trimmed(struct sk_buff 
> *skb,
>  unsigned int transport_len,
>  __sum16(*skb_chkf)(struct sk_buff *skb));
>
> +int skb_network_trim(struct sk_buff *skb);
> +
>  /**
>   * skb_head_is_locked - Determine if the skb->head is locked down
>   * @skb: skb to check
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 15fa5ba..cef3d1e 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4743,6 +4743,50 @@ struct sk_buff *skb_checksum_trimmed(struct sk_buff 
> *skb,
>  }
>  EXPORT_SYMBOL(skb_checksum_trimmed);
>
> +/**
> + * skb_network_trim - trim skb to length specified by the network header
> + * @skb: the skb to trim
> + *
> + * Trims the skb to the length specified by the IP/IPv6 header,
> + * removing any trailing lower-layer padding. This prepares the skb
> + * for higher-layer processing that assumes skb->len excludes padding.
> + *
> + * Leaves the skb alone if the protocol is not IP or IPv6. Frees the
> + * skb on error.
> + *
> + * Caller needs to pull the skb to the network header.
> + */
> +int skb_network_trim(struct sk_buff *skb)
> +{
> +   unsigned int len;
> +   int err = -ENOMEM;
> +
> +   switch (skb->protocol) {
> +   case htons(ETH_P_IP):
> +   if (!pskb_may_pull(skb, sizeof(struct iphdr)))
> +   goto out;
Since you are going to move this to OVS specific code, can you remove
this skb-pull, which is not required in OVS code path.

> +   len = ntohs(ip_hdr(skb)->tot_len);
> +   break;
> +   case htons(ETH_P_IPV6):
> +   if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
> +   goto out;
> +   len = sizeof(struct ipv6hdr)
> +   + ntohs(ipv6_hdr(skb)->payload_len);
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ofproto-dpif: Delete system tunnel interface when remove ovs bridge

2018-01-23 Thread Ben Pfaff

On Thu, Oct 26, 2017 at 10:24:46AM -0400, Eric Garver wrote:
> On Wed, Oct 25, 2017 at 11:41:27AM +0800, ju...@redhat.com wrote:
> > When there is only one bridge,create tunnel in the bridge,
> > then delete the bridge directly. the system tunnel interface
> > still in.
> > 
> > Cause of only one bridge, backer->refcount values 1, when
> > delete bridge, "close_dpif_backer" will delete the backer,
> > so type_run will return directly, doesn't delete the interface.
> > This patch delete the system interface before free the backer.
> 
> I'll add a bit more explanation..
> 
> This occurs when a tunnel is created with rtnetlink. With the compat API
> the tunnel is created via the vport tunnel interface, so it can be
> implicitly cleaned up by the kernel when the dp is closed or the module
> unloaded.  But with rtnetlink the kernel module is not involved with the
> tunnel device creation (it's added to OVS as a netdev vport), so
> userspace needs to explicitly clean up the tunnel backers - type_run
> can't garbage collect them if the dpif is already deleted.

I guess this has been due to be applied for a long time!  I am sorry
that I missed it.

The code looks OK but I don't yet understand the consequences.  What
problem does this patch solve?

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] netdev-linux: dev_stats should be initialized before used

2018-01-23 Thread Ben Pfaff

On Sat, Oct 28, 2017 at 04:40:37PM +0800, zhongbaisong wrote:
> From: zhongbasiong 
> 
> struct netdev_stats dev_stats was used without initialized.
> As result, the output of 'ovs-vsctl list interface' has some random
> values.
> 
> Signed-off-by: zhongbasiong 
> ---
>  lib/netdev-linux.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index 2ff3e2b..09f174b 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -1726,6 +1726,7 @@ netdev_linux_get_stats(const struct netdev *netdev_,
>  int error;
>  
>  ovs_mutex_lock(>mutex);
> +memset(_stats, 0, sizeof(struct netdev_stats));
>  get_stats_via_vport(netdev_, stats);
>  error = get_stats_via_netlink(netdev_, _stats);
>  if (error) {
> @@ -1777,6 +1778,7 @@ netdev_tap_get_stats(const struct netdev *netdev_, 
> struct netdev_stats *stats)
>  int error;
>  
>  ovs_mutex_lock(>mutex);
> +memset(_stats, 0, sizeof(struct netdev_stats));
>  get_stats_via_vport(netdev_, stats);
>  error = get_stats_via_netlink(netdev_, _stats);
>  if (error) {

Can you help me to understand this better?  The first statement in
get_stats_via_netlink() is
memset(stats, 0xFF, sizeof(struct netdev_stats));
which would seem to overwrite the zeroing in any case.

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] Question about Rx mergeable buffer

2018-01-23 Thread 王志克

Hi,

I have question about RX merge feature.
Below mentions that set mrg_rxbuf=off can improve performance.  So question 1:
How much would it be affected for throughput?

*
Rx Mergeable 
Buffers¶

Rx mergeable buffers is a virtio feature that allows chaining of multiple 
virtio descriptors to handle large packet sizes. Large packets are handled by 
reserving and chaining multiple free descriptors together. Mergeable buffer 
support is negotiated between the virtio driver and virtio device and is 
supported by the DPDK vhost library. This behavior is supported and enabled by 
default, however in the case where the user knows that rx mergeable buffers are 
not needed i.e. jumbo frames are not needed, it can be forced off by adding 
mrg_rxbuf=off to the QEMU command line options. By not reserving multiple 
chains of descriptors it will make more individual virtio descriptors available 
for rx to the guest using dpdkvhost ports and this can improve performance.
***

http://docs.openvswitch.org/en/latest/howto/dpdk/?highlight=mrg_rxbuf

It mentions mrg_rxbuf must be set to on in order to support Jumbo frames.

So question 2: I am wondering if I set mtu to 2000, should mrg_rxbuf be also a 
MUST? Is there a threshold to switch the configuration?


Some additional configuration is needed to take advantage of jumbo frames with 
vHost ports:
mergeable buffers must be enabled for vHost ports, as demonstrated in the QEMU 
command line snippet below:”.
**

BR,
Zhike
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] The dependency between ovndb_servers-master and VirtualIP is wrong

2018-01-23 Thread Guoshuai Li


Hello, xurong, zhaojingjing:

At that time, in order to avoid the problem of losing after the 
active/slave switchover,

the slave node was promote prior to the start of the VIP.
https://github.com/openvswitch/ovs/commit/a38f532320936147e6831153828af5bcba2fba54#diff-fb92408f16ddd415d9906e3976e60f47L255

After that, the problem of data loss was completely solved in ovsdb,
https://github.com/openvswitch/ovs/commit/05ac209a5d3a5e85896f58d16b244e6b2a4cf2d0#diff-6878621809e8ea9210d85a69a5e06367

I think it is possible to change this sequence again.

Are there any new problems detected?


From: zhaojingjing0067370 

---
  Documentation/topics/integration.rst | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/topics/integration.rst 
b/Documentation/topics/integration.rst
index 0447faf..d129e21 100644
--- a/Documentation/topics/integration.rst
+++ b/Documentation/topics/integration.rst
@@ -255,6 +255,6 @@ with the active server::

  $ pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=x.x.x.x \
  op monitor interval=30s
-$ pcs constraint order promote ovndb_servers-master then VirtualIP
-$ pcs constraint colocation add VirtualIP with master ovndb_servers-master 
\
+$ pcs constraint order start VirtualIP then promote ovndb_servers-master
+$ pcs constraint colocation add master ovndb_servers-master with VirtualIP 
\
  score=INFINITY
--
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev





___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] checkpatch.py: Add check for "xxx" in comments.

2018-01-23 Thread Justin Pettit

"xxx" is often used to indicate items that the developer wanted to look
at again before committing.  Flag those as a warning.

Signed-off-by: Justin Pettit 
---
 utilities/checkpatch.py | 16 
 1 file changed, 16 insertions(+)

diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
index 33feb6bddca1..42777e6fcb15 100755
--- a/utilities/checkpatch.py
+++ b/utilities/checkpatch.py
@@ -1,5 +1,6 @@
 #!/usr/bin/env python
 # Copyright (c) 2016, 2017 Red Hat, Inc.
+# Copyright (c) 2018 Nicira, Inc.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -95,9 +96,11 @@ __regex_ends_with_bracket = \
 re.compile(r'[^\s]\) {(\s+/\*[\s\Sa-zA-Z0-9\.,\?\*/+-]*)?$')
 __regex_ptr_declaration_missing_whitespace = re.compile(r'[a-zA-Z0-9]\*[^*]')
 __regex_is_comment_line = re.compile(r'^\s*(/\*|\*\s)')
+__regex_has_comment = re.compile(r'.*(/\*|\*\s)')
 __regex_trailing_operator = re.compile(r'^[^ ]* [^ ]*[?:]$')
 __regex_conditional_else_bracing = re.compile(r'^\s*else\s*{?$')
 __regex_conditional_else_bracing2 = re.compile(r'^\s*}\selse\s*$')
+__regex_has_xxx_mark = re.compile(r'.*xxx.*', re.IGNORECASE)
 
 skip_leading_whitespace_check = False
 skip_trailing_whitespace_check = False
@@ -213,11 +216,19 @@ def is_comment_line(line):
 """Returns TRUE if the current line is part of a block comment."""
 return __regex_is_comment_line.match(line) is not None
 
+def has_comment(line):
+"""Returns TRUE if the current line contains a comment or is part of
+   a block comment."""
+return __regex_has_comment.match(line) is not None
 
 def trailing_operator(line):
 """Returns TRUE if the current line ends with an operatorsuch as ? or :"""
 return __regex_trailing_operator.match(line) is not None
 
+def has_xxx_mark(line):
+"""Returns TRUE if the current line contains 'xxx'."""
+return __regex_has_xxx_mark.match(line) is not None
+
 
 checks = [
 {'regex': None,
@@ -257,6 +268,11 @@ checks = [
  'check': lambda x: trailing_operator(x),
  'print':
  lambda: print_error("Line has '?' or ':' operator at end of line")},
+
+{'regex': '(\.c|\.h)(\.in)?$', 'match_name': None,
+ 'prereq': lambda x: has_comment(x),
+ 'check': lambda x: has_xxx_mark(x),
+ 'print': lambda: print_warning("Comment with 'xxx' marker")},
 ]
 
 
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v1] ofproto-dpif-xlate: Fix incorrect handling of return value.

2018-01-23 Thread huanglili

From: Lili Huang 

The value cookie_offset should be 'size_t' type.

Signed-off-by: Lili Huang 
---
 ofproto/ofproto-dpif-xlate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 40c04cc..a54ade1 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -2961,7 +2961,7 @@ compose_sample_action(struct xlate_ctx *ctx,
 ctx->xbridge, ctx->xin->flow.in_port.ofp_port);
 uint32_t pid = dpif_port_get_pid(ctx->xbridge->dpif, odp_port,
  flow_hash_5tuple(>xin->flow, 0));
-int cookie_offset = odp_put_userspace_action(pid, cookie, sizeof *cookie,
+size_t cookie_offset = odp_put_userspace_action(pid, cookie, sizeof 
*cookie,
  tunnel_out_port,
  include_actions,
  ctx->odp_actions);
-- 
1.9.5.msysgit.1


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] 答复: [PATCH] Ofproto: fix the bug wrong datapath flow with same in_port and output port was generated

2018-01-23 Thread Lilijun (Jerry)

Yes, I also found this four related functions just again.  

I will do some test with your patch, thanks.

-邮件原件-
发件人: Huanle Han [mailto:hanxue...@gmail.com] 
发送时间: 2018年1月23日 12:33
收件人: Ben Pfaff ; Lilijun (Jerry) 
抄送:  ; Liuyongan 
; Zhoujingbin 
主题: Re: [ovs-dev] [PATCH] Ofproto: fix the bug wrong datapath flow with same 
in_port and output port was generated

I have done a similar fix few months ago. The commit tries to avoid an 
inconsistent result caused by using different xcfg pointers.
https://github.com/openvswitch/ovs/pull/210/commits/ec75e8d9bb0e16690447b40b39d2fd0fb0f7aae2

I think a few more place fixes needed for this patch:
  update_mcast_snooping_table()
  xlate_normal_mcast_send_group()
  xlate_normal_mcast_send_mrouters()
  xlate_normal_mcast_send_fports()

> Thank you for the analysis and the fix!  I applied this to master and 
> backported as far as branch-2.8.
>
> On Fri, Jan 19, 2018 at 08:12:30AM +, Lilijun (Jerry) wrote:
>> Hi all,
>>
>> In my test, the new datapath flow which has the same in_port and actions 
>> output port was found using ovs-appctl dpctl/dump-flows.
>> Then the mac address will move from one port to another and back it again in 
>> the physical switch. This problem result in the VM's traffic become abnormal.
>>
>> My test key steps:
>> 1) There are three VM using ovs bridge and intel 82599 nics as uplink 
>> port, deployed in different hosts connecting to the same physical switch. 
>> They can be named using VM-A, VM-B and VM-C, Host-A, Host-B, Host-C.
>> 2) VM-A send many unicast packets to VM-B, and VM-B also send unicast 
>> packets to VM-A.
>> 3) VM-C ping VM-A continuously, and do ovs port add/delete testing in 
>> Host-C ovs bridge.
>> 4) In some abormal scence, the physical switch clear all the mac-entry 
>> on each ports. Then Host-C ovs bridge's uplink port will receive two 
>> direction packets(VM-A to VM-B, and VM-B to VM-A).
>>
>> The expected result is that this two direction packets should be droppd in 
>> the uplink port. Because the dst port of this packets is the uplink port 
>> which is also the src port by looking ovs bridge's mac-entry table learned 
>> by ovs NORMAL rules.
>> But the truth is some packets being sent back to uplink port and physical 
>> switch. And then VM-A's mac was moved to the physical switch port of Host-C 
>> from the port of Host-A, as a reulst, VM-C ping VM-A failed at this time.
>> When this problem occurs, the abnormal ovs datapath's flow "in_port(2) 
>> actions:2" was found by executing the command "ovs-appctl dpctl/dump-flows".
>>
>> Currently, xlate_normal() uses xbundle pointer compare to verify the 
>> packet's dst port whether is same with its input port. This implemention may 
>> be wrong while calling xlate_txn_start/xlate_txn_commit in type_run() at the 
>> same time, because xcfg/xbridge/xbundle object was reallocated and copied 
>> just before we lookup the dst mac_port and mac_xbundle. Then mac_xbundle and 
>> in_xbundle are same related with the uplink port but not same object pointer.
>>
>> And we can fix this bug by adding ofbundle check conditions shown in my 
>> patch.
>>
>> Signed-off-by: Lilijun 
>>
>>
>> diff --git a/ofproto/ofproto-dpif-xlate.c 
>> b/ofproto/ofproto-dpif-xlate.c index d94e9dc..b7a8b55 100644
>> --- a/ofproto/ofproto-dpif-xlate.c
>> +++ b/ofproto/ofproto-dpif-xlate.c
>> @@ -2604,7 +2604,9 @@ xlate_normal_mcast_send_rports(struct xlate_ctx *ctx,
>>  xcfg = ovsrcu_get(struct xlate_cfg *, );
>>  LIST_FOR_EACH(rport, node, >rport_list) {
>>  mcast_xbundle = xbundle_lookup(xcfg, rport->port);
>> -if (mcast_xbundle && mcast_xbundle != in_xbundle) {
>> +if (mcast_xbundle
>> +&& mcast_xbundle != in_xbundle
>> +&& mcast_xbundle->ofbundle != in_xbundle->ofbundle) {
>>  xlate_report(ctx, OFT_DETAIL,
>>   "forwarding report to mcast flagged port");
>>  output_normal(ctx, mcast_xbundle, xvlan); @@ -2626,6 
>> +2628,7 @@ xlate_normal_flood(struct xlate_ctx *ctx, struct xbundle 
>> *in_xbundle,
>>  LIST_FOR_EACH (xbundle, list_node, >xbridge->xbundles) {
>>  if (xbundle != in_xbundle
>> +&& xbundle->ofbundle != in_xbundle->ofbundle
>>  && xbundle_includes_vlan(xbundle, xvlan)
>>  && xbundle->floodable
>>  && !xbundle_mirror_out(ctx->xbridge, xbundle)) { @@ 
>> -2833,7 +2836,9 @@ xlate_normal(struct xlate_ctx *ctx)
>>  if (mac_port) {
>>  struct xlate_cfg *xcfg = ovsrcu_get(struct xlate_cfg *, );
>>  struct xbundle *mac_xbundle = xbundle_lookup(xcfg, mac_port);
>> -if (mac_xbundle && mac_xbundle != in_xbundle) {
>> +if (mac_xbundle
>> +&& mac_xbundle != in_xbundle
>> +&& mac_xbundle->ofbundle

Re: [ovs-dev] [PATCH v5 1/2] ovn-controller: Add extend_table instead of group_table to expand meter.

2018-01-23 Thread Ben Pfaff

On Wed, Dec 06, 2017 at 12:32:39PM +0800, Guoshuai Li wrote:
> The structure and function of the group table and meter table are similar,
> refactoring code is used to extend for add the meter table.
> The following function as lib: table init/destroy/clear,
> install contents from desired, remove contents from existing,
> Move the contents of desired to existing.
> 
> Signed-off-by: Guoshuai Li 

Thanks a lot for working on this.  I mostly like the new approach.  I do
have one nontrivial request, which is to remove the use of the callback
functions.  Usually callback functions make code harder to write and
harder to understand because of how they break up code in unnatural
ways.  In this case, I think that we can just substitute iterator
macros, which I generally find easier on all accounts.

I posted a 3-patch series that shows the direction that I'm thinking
about.  The first patch is just your patch #1.  The second patch is an
example of what I mean about using an iterator instead of a callback
function.  I only removed the 'create' callback, not the 'remove'
callback, but I'd want both to be removed for the final commit.  The
third patch is just trivia to make the style better match the OVS coding
style.

The series is here:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=25039

Are you willing to revise the series in this direction?

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH 2/3] extend-table: Use iterator macro instead of create callback.

2018-01-23 Thread Ben Pfaff

This is my suggestion as a way to make the use of extend-table code simpler
and more transparent.  This is intended to be squashed into the previous
commit.

Signed-off-by: Ben Pfaff 
---
 ovn/controller/ofctrl.c | 47 ---
 ovn/lib/extend-table.c  | 18 +-
 ovn/lib/extend-table.h  | 17 +++--
 3 files changed, 32 insertions(+), 50 deletions(-)

diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
index 94ee69c1c3ed..c81c40c6e646 100644
--- a/ovn/controller/ofctrl.c
+++ b/ovn/controller/ofctrl.c
@@ -69,7 +69,6 @@ static char *ovn_flow_to_string(const struct ovn_flow *);
 static void ovn_flow_log(const struct ovn_flow *, const char *action);
 static void ovn_flow_destroy(struct ovn_flow *);
 
-static void add_group(uint32_t table_id, struct ds *ds, struct ovs_list *msgs);
 static void delete_group(uint32_t table_id, struct ovs_list *msgs);
 
 /* OpenFlow connection to the switch. */
@@ -162,7 +161,6 @@ ofctrl_init(struct ovn_extend_table *group_table)
 ovs_list_init(_updates);
 ovn_init_symtab();
 groups = group_table;
-groups->create = add_group;
 groups->remove = delete_group;
 }
 
@@ -767,30 +765,6 @@ add_group_mod(const struct ofputil_group_mod *gm, struct 
ovs_list *msgs)
 }
 
 static void
-add_group(uint32_t table_id, struct ds *ds, struct ovs_list *msgs) {
-/* Create and install new group. */
-struct ofputil_group_mod gm;
-enum ofputil_protocol usable_protocols;
-char *error;
-struct ds group_string = DS_EMPTY_INITIALIZER;
-ds_put_format(_string, "group_id=%"PRIu32",%s",
-  table_id, ds_cstr(ds));
-
-error = parse_ofp_group_mod_str(, OFPGC11_ADD, ds_cstr(_string),
-NULL, _protocols);
-if (!error) {
-add_group_mod(, msgs);
-} else {
-static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
-VLOG_ERR_RL(, "new group %s %s", error,
-ds_cstr(_string));
-free(error);
-}
-ds_destroy(_string);
-ofputil_uninit_group_mod();
-}
-
-static void
 delete_group(uint32_t table_id, struct ovs_list *msgs) {
 /* Delete the group. */
 struct ofputil_group_mod gm;
@@ -878,7 +852,26 @@ ofctrl_put(struct hmap *flow_table, struct shash 
*pending_ct_zones,
 }
 }
 
-ovn_extend_table_install_desired(groups, );
+struct ovn_extend_table_info *desired;
+EXTEND_TABLE_FOR_EACH_UNINSTALLED (desired, groups) {
+char *group_string = xasprintf("group_id=%"PRIu32",%s",
+   desired->table_id,
+   ds_cstr(>info));
+
+struct ofputil_group_mod gm;
+enum ofputil_protocol usable_protocols;
+char *error = parse_ofp_group_mod_str(, OFPGC11_ADD, group_string,
+  NULL, _protocols);
+if (!error) {
+add_group_mod(, );
+} else {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+VLOG_ERR_RL(, "new group %s %s", error, group_string);
+free(error);
+}
+free(group_string);
+ofputil_uninit_group_mod();
+}
 
 /* Iterate through all of the installed flows.  If any of them are no
  * longer desired, delete them; if any of them should have different
diff --git a/ovn/lib/extend-table.c b/ovn/lib/extend-table.c
index dfd1a9413cfc..26391b6bd264 100644
--- a/ovn/lib/extend-table.c
+++ b/ovn/lib/extend-table.c
@@ -53,7 +53,7 @@ void ovn_extend_table_destroy(struct ovn_extend_table *table)
 
 /* Finds and returns a group_info in 'existing' whose key is identical
  * to 'target''s key, or NULL if there is none. */
-static struct ovn_extend_table_info *
+struct ovn_extend_table_info *
 ovn_extend_table_lookup(struct hmap *exisiting,
 const struct ovn_extend_table_info *target)
 {
@@ -88,22 +88,6 @@ ovn_extend_table_clear(struct ovn_extend_table *table, bool 
existing)
 }
 
 void
-ovn_extend_table_install_desired(struct ovn_extend_table *table,
- struct ovs_list *msgs)
-{
-struct ovn_extend_table_info *desired;
-
-/* Iterate through all the desired tables. If there are new ones,
- * add them to the switch. */
-HMAP_FOR_EACH (desired, hmap_node, >desired) {
-if (!ovn_extend_table_lookup(>existing, desired)) {
-/* Create and install new group. */
-table->create(desired->table_id, >info, msgs);
-}
-}
-}
-
-void
 ovn_extend_table_remove_existing(struct ovn_extend_table *table,
  struct ovs_list *msgs)
 {
diff --git a/ovn/lib/extend-table.h b/ovn/lib/extend-table.h
index ededba59dc01..67ac5d77436e 100644
--- a/ovn/lib/extend-table.h
+++ b/ovn/lib/extend-table.h
@@ -33,9 +33,6 @@ struct ovn_extend_table {
 struct hmap desired;
 struct hmap existing;
 
-/* Used

[ovs-dev] [PATCH 3/3] extend-table: Minor style fixups.

2018-01-23 Thread Ben Pfaff

This is intended to be squashed with the previous commit.

Signed-off-by: Ben Pfaff 
---
 ovn/controller/ofctrl.c |  3 ++-
 ovn/lib/extend-table.c  |  6 --
 ovn/lib/extend-table.h  | 14 +++---
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
index c81c40c6e646..61bbf54bbdf2 100644
--- a/ovn/controller/ofctrl.c
+++ b/ovn/controller/ofctrl.c
@@ -765,7 +765,8 @@ add_group_mod(const struct ofputil_group_mod *gm, struct 
ovs_list *msgs)
 }
 
 static void
-delete_group(uint32_t table_id, struct ovs_list *msgs) {
+delete_group(uint32_t table_id, struct ovs_list *msgs)
+{
 /* Delete the group. */
 struct ofputil_group_mod gm;
 enum ofputil_protocol usable_protocols;
diff --git a/ovn/lib/extend-table.c b/ovn/lib/extend-table.c
index 26391b6bd264..22ad88ec2dcd 100644
--- a/ovn/lib/extend-table.c
+++ b/ovn/lib/extend-table.c
@@ -22,7 +22,8 @@
 
 VLOG_DEFINE_THIS_MODULE(extend_table);
 
-void ovn_extend_table_init(struct ovn_extend_table *table)
+void
+ovn_extend_table_init(struct ovn_extend_table *table)
 {
 table->table_ids = bitmap_allocate(MAX_EXT_TABLE_ID);
 bitmap_set1(table->table_ids, 0); /* table id 0 is invalid. */
@@ -30,7 +31,8 @@ void ovn_extend_table_init(struct ovn_extend_table *table)
 hmap_init(>existing);
 }
 
-void ovn_extend_table_destroy(struct ovn_extend_table *table)
+void
+ovn_extend_table_destroy(struct ovn_extend_table *table)
 {
 bitmap_free(table->table_ids);
 
diff --git a/ovn/lib/extend-table.h b/ovn/lib/extend-table.h
index 67ac5d77436e..361819e8c258 100644
--- a/ovn/lib/extend-table.h
+++ b/ovn/lib/extend-table.h
@@ -45,23 +45,23 @@ struct ovn_extend_table_info {
  * ovn_extend_table's 'table_ids' bitmap. */
 };
 
-void ovn_extend_table_init(struct ovn_extend_table *table);
+void ovn_extend_table_init(struct ovn_extend_table *);
 
-void ovn_extend_table_destroy(struct ovn_extend_table *table);
+void ovn_extend_table_destroy(struct ovn_extend_table *);
 
-void ovn_extend_table_remove_existing(struct ovn_extend_table *table,
+void ovn_extend_table_remove_existing(struct ovn_extend_table *,
   struct ovs_list *msgs);
 
 struct ovn_extend_table_info *ovn_extend_table_lookup(
 struct hmap *, const struct ovn_extend_table_info *);
 
 /* Move the contents of desired to existing. */
-void ovn_extend_table_move(struct ovn_extend_table *table);
+void ovn_extend_table_move(struct ovn_extend_table *);
 
-void ovn_extend_table_clear(struct ovn_extend_table *table, bool existing);
+void ovn_extend_table_clear(struct ovn_extend_table *, bool existing);
 
-uint32_t ovn_extend_table_assign_id(struct ovn_extend_table *table,
-struct ds *ds);
+uint32_t ovn_extend_table_assign_id(struct ovn_extend_table *,
+struct ds *);
 
 /* Iterates 'DESIRED' through all of the 'ovn_extend_table_info's in
  * 'TABLE'->desired that are not in 'TABLE'->existing.  (The loop body
-- 
2.10.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH 1/3] ovn-controller: Add extend_table instead of group_table to expand meter.

2018-01-23 Thread Ben Pfaff

From: Guoshuai Li 

The structure and function of the group table and meter table are similar,
refactoring code is used to extend for add the meter table.
The following function as lib: table init/destroy/clear,
install contents from desired, remove contents from existing,
Move the contents of desired to existing.

Signed-off-by: Guoshuai Li 
Signed-off-by: Ben Pfaff 
---
 include/ovn/actions.h   |  21 +
 ovn/controller/lflow.c  |   8 +-
 ovn/controller/lflow.h  |   4 +-
 ovn/controller/ofctrl.c | 180 +---
 ovn/controller/ofctrl.h |   7 +-
 ovn/controller/ovn-controller.c |  20 +---
 ovn/lib/actions.c   |  58 ++--
 ovn/lib/automake.mk |   2 +
 ovn/lib/extend-table.c  | 198 
 ovn/lib/extend-table.h  |  69 ++
 tests/test-ovn.c|   8 +-
 11 files changed, 356 insertions(+), 219 deletions(-)
 create mode 100644 ovn/lib/extend-table.c
 create mode 100644 ovn/lib/extend-table.h

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 85a484ffac20..ea90dbb2a69a 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -31,6 +31,7 @@ struct lexer;
 struct ofpbuf;
 struct shash;
 struct simap;
+struct ovn_extend_table;
 
 /* List of OVN logical actions.
  *
@@ -338,24 +339,6 @@ void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t 
len);
 OVNACTS
 #undef OVNACT
 
-#define MAX_OVN_GROUPS 65535
-
-struct group_table {
-unsigned long *group_ids;  /* Used as a bitmap with value set
-* for allocated group ids in either
-* desired_groups or existing_groups. */
-struct hmap desired_groups;
-struct hmap existing_groups;
-};
-
-struct group_info {
-struct hmap_node hmap_node;
-struct ds group;
-uint32_t group_id;
-bool new_group_id;  /* 'True' if 'group_id' was reserved from
- * group_table's 'group_ids' bitmap. */
-};
-
 enum action_opcode {
 /* "arp { ...actions... }".
  *
@@ -505,7 +488,7 @@ struct ovnact_encode_params {
 bool is_gateway_router;
 
 /* A struct to figure out the group_id for group actions. */
-struct group_table *group_table;
+struct ovn_extend_table *group_table;
 
 /* OVN maps each logical flow table (ltable), one-to-one, onto a physical
  * OpenFlow flow table (ptable).  A number of parameters describe this
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index a62ec6ebe09f..3d990c49c195 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -61,7 +61,7 @@ static void consider_logical_flow(struct controller_ctx *ctx,
   const struct chassis_index *chassis_index,
   const struct sbrec_logical_flow *lflow,
   const struct hmap *local_datapaths,
-  struct group_table *group_table,
+  struct ovn_extend_table *group_table,
   const struct sbrec_chassis *chassis,
   struct hmap *dhcp_opts,
   struct hmap *dhcpv6_opts,
@@ -143,7 +143,7 @@ static void
 add_logical_flows(struct controller_ctx *ctx,
   const struct chassis_index *chassis_index,
   const struct hmap *local_datapaths,
-  struct group_table *group_table,
+  struct ovn_extend_table *group_table,
   const struct sbrec_chassis *chassis,
   const struct shash *addr_sets,
   struct hmap *flow_table,
@@ -190,7 +190,7 @@ consider_logical_flow(struct controller_ctx *ctx,
   const struct chassis_index *chassis_index,
   const struct sbrec_logical_flow *lflow,
   const struct hmap *local_datapaths,
-  struct group_table *group_table,
+  struct ovn_extend_table *group_table,
   const struct sbrec_chassis *chassis,
   struct hmap *dhcp_opts,
   struct hmap *dhcpv6_opts,
@@ -434,7 +434,7 @@ lflow_run(struct controller_ctx *ctx,
   const struct sbrec_chassis *chassis,
   const struct chassis_index *chassis_index,
   const struct hmap *local_datapaths,
-  struct group_table *group_table,
+  struct ovn_extend_table *group_table,
   const struct shash *addr_sets,
   struct hmap *flow_table,
   struct sset *active_tunnels,
diff --git a/ovn/controller/lflow.h b/ovn/controller/lflow.h
index bfb7415e2b13..087b0ed8da35 100644
--- a/ovn/controller/lflow.h
+++ b/ovn/controller/lflow.h
@@ -37,7 +37,7 @@
 
 struct chassis_index;
 struct controller_ctx;
-struct

[ovs-dev] Cómo vender en tiempos difíciles

2018-01-23 Thread tipos, ventajas e inconvenientes

Temario e Inscripciones:

Respondiendo por este medio "Ventas", Y su  o marcando al:
045 + 5515546630 


Cómo vender en tiempos difíciles
Febrero 02 - webinar Interactivo
 

Temas a tratar:

Herramientas de automotivación y autocontrol.
Vías de contacto: tipos, ventajas e inconvenientes. 
El sondeo: técnicas de preguntas. Tipos y su manejo eficaz.
Como delizar al cliente para que nos mantenga siempre como su primera opción.



Desarrollar y fomentar en el participante nuevas actitudes, comportamientos y 
hábitos que refuercen, junto con su entusiasmo y su compromiso con la empresa, 
las habilidades de venta necesarias para influir más positivamente en sus 
actuales y/o potenciales clientes y generar, en consecuencia, a pesar de las 
dificultades del mercado de hoy, mejores resultados comerciales.
 
Inscríbase Ahora, el cupo es limitado
 Atte. el equipo comercial



















¿Siente que recibe demasiados comunicados de nuestra parte en su cuenta 
d...@openvswitch.org ? Favor de enviar la palabra Calendario y reciba 
únicamente nuestro calendario mensual. Si no esta interesado en seguir 
recibiendo ofertas de nuestra empresa, responda con la palabra Eliminar



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 1/2] skbuff: Add skb_network_trim

2018-01-23 Thread David Miller

From: Ed Swierk 
Date: Mon, 22 Jan 2018 18:42:18 -0800

> IPv4 and IPv6 packets may arrive with lower-layer padding that is not
> included in the L3 length. For example, a short IPv4 packet may have
> up to 6 bytes of padding following the IP payload when received on an
> Ethernet device with a minimum packet length of 64 bytes.
> 
> Higher-layer processing functions in netfilter (e.g. nf_ip_checksum(),
> and help() in nf_conntrack_ftp) assume skb->len reflects the length of
> the L3 header and payload, rather than referring back to
> ip_hdr->tot_len or ipv6_hdr->payload_len, and get confused by
> lower-layer padding.
> 
> In the normal IPv4 receive path, ip_rcv() trims the packet to
> ip_hdr->tot_len before invoking netfilter hooks. In the IPv6 receive
> path, ip6_rcv() does the same using ipv6_hdr->payload_len. Similarly
> in the br_netfilter receive path, br_validate_ipv4() and
> br_validate_ipv6() trim the packet to the L3 length before invoking
> netfilter hooks.
> 
> Currently the openvswitch receive path does not perform such trimming,
> which will be fixed by the next patch. In preparation, this patch adds
> a generic skb_network_trim() function.
> 
> Signed-off-by: Ed Swierk 

Unfortunately this helper needs some more work.

It really doesn't belong in generic skbuff.c code, as it adds a whole
bunch of protocol specific header accesses and interpretation.

Also, there is only a very specific context in which it can be used
(receive) and this is not clear at all from the name.

To be quite honest, since there will be no other users of this helper
right now, just put it into the OVS conntrack.c code.

Thank you.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] Kreditangebot (Loan Offer)

2018-01-23 Thread s410016


ATTN !!!

 Sehr geehrter Herr / Frau, Wir sind eine private zertifizierte
Kreditfinanzierungsgesellschaft, die Darlehen an Leute anbietet, die einen
Kredit benötigen. Schuldenkonsolidierung, Business oder Personal benötigt
Darlehen von 3.000,00 Euro bis 100.000.000,00 Euro mit niedrigen Zinssatz
von 3% Kreditzins pro Jahr. Wir betrachten Ihren Fall unabhängig von Ihrer
schlechten Kredit-Geschichte, dieses Darlehen und Finanzen wird Ihnen
helfen, Ihre finanziellen Probleme aufzubauen. Alle Arten von Krediten sind
zugänglich, wenn Sie daran interessiert sind, einen garantierten Barkredit
mit GARANTIERTER GENEHMIGUNG zu erhalten. Je schneller Sie antworten, desto
schneller erhalten Sie das dringend benötigte Bargeld. Jede Form?
Kontaktieren Sie uns jetzt für Ihre dringende
Kreditbearbeitung.  info.climaxfinan...@gmail.com

ALLE ANTWORTEN: info.climaxfinan...@gmail.com
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] docs: Fix formatting in fedora.rst

2018-01-23 Thread Ben Pfaff

On Tue, Jan 23, 2018 at 02:21:31PM -0800, Yi-Hung Wei wrote:
> Fix rst formatting in fedora.rst so that the commands look correctly
> on the web.
> 
> Signed-off-by: Yi-Hung Wei 

Thanks, applied to master and branch-2.9.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] LACP: Check active partner sys id

2018-01-23 Thread Ben Pfaff

On Wed, Dec 06, 2017 at 10:36:33AM +, Róbert Mulik wrote:
> A reboot of one switch in an MC-LAG bond makes all bond links
> to go down, causing a total connectivity loss for 3 seconds.
> 
> Packet capture shows that spurious LACP PDUs are sent to OVS with
> a different MAC address (partner system id) during the final
> stages of the MC-LAG switch reboot. The current implementation
> doesn't care about the partner sys_id (MAC address).

Thanks for the patch and especially for the careful diagnosis!  I
applied this to master and branch-2.9.  Maybe I should apply it to
earlier branches as a bug fix (would love to hear opinions on that).

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] docs: Fix formatting in fedora.rst

2018-01-23 Thread Gregory Rose


On 1/23/2018 2:21 PM, Yi-Hung Wei wrote:

Fix rst formatting in fedora.rst so that the commands look correctly
on the web.

Signed-off-by: Yi-Hung Wei 
---
  Documentation/intro/install/fedora.rst | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/intro/install/fedora.rst 
b/Documentation/intro/install/fedora.rst
index ff9326fbc9c6..5e29e37bf5e6 100644
--- a/Documentation/intro/install/fedora.rst
+++ b/Documentation/intro/install/fedora.rst
@@ -64,10 +64,10 @@ The command below will create a temporary SPEC file::
  
  And to install specific dependencies, use the corresponding tool below.

  For some of the dependencies on RHEL you may need to add two additional
-repositories to help yum-builddep.
+repositories to help yum-builddep, e.g.::
  
-subscription-manager repos --enable=rhel-7-server-extras-rpms

-subscription-manager repos --enable=rhel-7-server-optional-rpms
+$ subscription-manager repos --enable=rhel-7-server-extras-rpms
+$ subscription-manager repos --enable=rhel-7-server-optional-rpms
  
  DNF::
  


Thanks Yi-Hung!

LGTM

Reviewed-by: Greg Rose 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] Incoming Messages Blocked

2018-01-23 Thread Webmail Admin

 Dear (d...@openvswitch.org) 
 Your (5) incoming email(s) pending 
Use the link below to retrieve your e-mails. 
 Update Your E-mail (d...@openvswitch.org) Regards,
Email Service Team. © 2011 - 2018 Administrator. All Rights Reserved.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] Kreditangebot (Loan Offer)

2018-01-23 Thread s410016


ATTN !!!

 Sehr geehrter Herr / Frau, Wir sind eine private zertifizierte
Kreditfinanzierungsgesellschaft, die Darlehen an Leute anbietet, die einen
Kredit benötigen. Schuldenkonsolidierung, Business oder Personal benötigt
Darlehen von 3.000,00 Euro bis 100.000.000,00 Euro mit niedrigen Zinssatz
von 3% Kreditzins pro Jahr. Wir betrachten Ihren Fall unabhängig von Ihrer
schlechten Kredit-Geschichte, dieses Darlehen und Finanzen wird Ihnen
helfen, Ihre finanziellen Probleme aufzubauen. Alle Arten von Krediten sind
zugänglich, wenn Sie daran interessiert sind, einen garantierten Barkredit
mit GARANTIERTER GENEHMIGUNG zu erhalten. Je schneller Sie antworten, desto
schneller erhalten Sie das dringend benötigte Bargeld. Jede Form? 
Kontaktieren Sie uns jetzt für Ihre dringende Kreditbearbeitung.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] docs: Fix formatting in fedora.rst

2018-01-23 Thread Yi-Hung Wei

Fix rst formatting in fedora.rst so that the commands look correctly
on the web.

Signed-off-by: Yi-Hung Wei 
---
 Documentation/intro/install/fedora.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/intro/install/fedora.rst 
b/Documentation/intro/install/fedora.rst
index ff9326fbc9c6..5e29e37bf5e6 100644
--- a/Documentation/intro/install/fedora.rst
+++ b/Documentation/intro/install/fedora.rst
@@ -64,10 +64,10 @@ The command below will create a temporary SPEC file::
 
 And to install specific dependencies, use the corresponding tool below.
 For some of the dependencies on RHEL you may need to add two additional
-repositories to help yum-builddep.
+repositories to help yum-builddep, e.g.::
 
-subscription-manager repos --enable=rhel-7-server-extras-rpms
-subscription-manager repos --enable=rhel-7-server-optional-rpms
+$ subscription-manager repos --enable=rhel-7-server-extras-rpms
+$ subscription-manager repos --enable=rhel-7-server-optional-rpms
 
 DNF::
 
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] pólizas que las aseguradoras evitan pagar

2018-01-23 Thread Información valiosa para evitar pérdidas

Sepa cómo hacer cumplir sus derechos como asegurado 

Lo que todo el mundo debe saber sobre las pólizas que las aseguradoras evitan 
pagar.
26 de enero- Mtra. Yanira Domínguez - 9am-3pm

Las aseguradoras pueden rechazar el pago de una reclamación por muchas causas. 
En algunas ocasiones el siniestro podría no estar cubierto bajo los términos de 
la póliza. En otros casos, la prima podría no estar pagada, según el esquema de 
pagos por el que hayamos optado. 


BENEFICIOS DE ASISTIR: 

-Dominará los conceptos básicos y partes del contrato de seguro.
 -Conocerá la documentación relacionada con el contrato de seguro.
 -Podrá saber cuáles son las razones más comunes de rechazo y obtendrá 
recomendaciones para evitarlo. 
-Obtendrá valiosa información sobre cómo hacer valer los seguros contratados. 

¿Requiere la información a la Brevedad? responda este email con la palabra: 
Pólizas + 
Nombre()
Teléfono()
Correo()


centro telefónico:018002120744


 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] The dependency between ovndb_servers-master and VirtualIP is wrong

2018-01-23 Thread Russell Bryant

Adding Numan Siddique, as well.  Numan, can you take a look at this?

On Tue, Jan 23, 2018 at 2:17 PM, Ben Pfaff  wrote:
> Thank you for the patch!  With this message, I'm adding a few people who
> might be capable of a review to the thread (I'm certainly not).
>
> Thanks,
>
> Ben.
>
> On Thu, Jan 18, 2018 at 05:37:35PM +0800, xurong00037997 wrote:
>> From: zhaojingjing0067370 
>>
>> ---
>>  Documentation/topics/integration.rst | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/topics/integration.rst 
>> b/Documentation/topics/integration.rst
>> index 0447faf..d129e21 100644
>> --- a/Documentation/topics/integration.rst
>> +++ b/Documentation/topics/integration.rst
>> @@ -255,6 +255,6 @@ with the active server::
>>
>>  $ pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=x.x.x.x \
>>  op monitor interval=30s
>> -$ pcs constraint order promote ovndb_servers-master then VirtualIP
>> -$ pcs constraint colocation add VirtualIP with master 
>> ovndb_servers-master \
>> +$ pcs constraint order start VirtualIP then promote ovndb_servers-master
>> +$ pcs constraint colocation add master ovndb_servers-master with 
>> VirtualIP \
>>  score=INFINITY
>> --
>> 1.8.3.1
>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev



-- 
Russell Bryant
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH V2] compat:inet_frag.h: Check for frag_percpu_counter_batch

2018-01-23 Thread Gregory Rose


On 1/23/2018 1:07 PM, Justin Pettit wrote:

On Jan 23, 2018, at 12:01 PM, Gregory Rose  wrote:

On 1/23/2018 12:00 PM, Justin Pettit wrote:

On Jan 5, 2018, at 11:30 AM, Greg Rose  wrote:

+#ifdef frag_percpu_counter_batch
...
+#else /* frag_percpu_counter_batch */

This is kind of a nit, but I would have thought this "#else" comment would be 
"!frag_percpu_counter_batch", since that's the case when it's not defined.  However, I'm 
not sure how it's handled usually in the kernel.  Looking through our compat code, I see examples 
of it being done both ways, though.  Thoughts?

--Justin



It's a valid nit.

I can send V2 or you can fix it on push.  Which do you prefer?

I went ahead and made the change.  I pushed it to master and branch-2.9.

--Justin




Thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] OVN: Add support for ARP renewal and expiration.

2018-01-23 Thread Ben Pfaff

It seems that I screwed up and failed to ever review this patch.  I
apologize.  In the meantime, it has fallen behind master and needs a
rebase.  Can you do that?  Thank you very much.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH V2] compat:inet_frag.h: Check for frag_percpu_counter_batch

2018-01-23 Thread Justin Pettit



> On Jan 23, 2018, at 12:34 PM, Ben Pfaff  wrote:
> 
> On Tue, Jan 23, 2018 at 12:00:20PM -0800, Justin Pettit wrote:
>> 
>> 
>>> On Jan 5, 2018, at 11:30 AM, Greg Rose  wrote:
>>> 
>>> +#ifdef frag_percpu_counter_batch
>>> ...
>>> +#else /* frag_percpu_counter_batch */
>> 
>> This is kind of a nit, but I would have thought this "#else" comment
>> would be "!frag_percpu_counter_batch", since that's the case when it's
>> not defined.  However, I'm not sure how it's handled usually in the
>> kernel.  Looking through our compat code, I see examples of it being
>> done both ways, though.  Thoughts?
> 
> This is one of those weird places where style differs.  GNU coding
> standards would mandate "!frag_percpu_counter_batch" here, and that's
> what I prefer myself, but I've seen the opposite convention in (if I
> recall correctly) some BSD code.

Interesting.  I decided to add the "!", since I think it makes it clearer.

--Justin


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH V2] compat:inet_frag.h: Check for frag_percpu_counter_batch

2018-01-23 Thread Justin Pettit


> On Jan 23, 2018, at 12:01 PM, Gregory Rose  wrote:
> 
> On 1/23/2018 12:00 PM, Justin Pettit wrote:
>> 
>>> On Jan 5, 2018, at 11:30 AM, Greg Rose  wrote:
>>> 
>>> +#ifdef frag_percpu_counter_batch
>>> ...
>>> +#else /* frag_percpu_counter_batch */
>> This is kind of a nit, but I would have thought this "#else" comment would 
>> be "!frag_percpu_counter_batch", since that's the case when it's not 
>> defined.  However, I'm not sure how it's handled usually in the kernel.  
>> Looking through our compat code, I see examples of it being done both ways, 
>> though.  Thoughts?
>> 
>> --Justin
>> 
>> 
> It's a valid nit.
> 
> I can send V2 or you can fix it on push.  Which do you prefer?

I went ahead and made the change.  I pushed it to master and branch-2.9.

--Justin



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ovn.at: Fix IPv6 periodic RA test on Windows

2018-01-23 Thread aserdean

Thanks for the review!

I have a "\n" which isn't OK. Also, even with that changed I get the
following:
+++ /c/_2018/january/23/ovs/tests/testsuite.dir/at-groups/2378/stdout
2018-01-23 22:58:49 +0200
@@ -1 +1 @@
-ff4001010001050105dc03044080fff
faef003043080000
0fd0f
+ff8001010001050105dc03044080fff
faef003043080000
0fd0f
I will look into it and send another revision!

Thanks,
Alin.

> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Tuesday, January 23, 2018 2:20 AM
> To: Alin Gabriel Serdean 
> Cc: d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] ovn.at: Fix IPv6 periodic RA test on
Windows
> 
> On Sun, Jan 14, 2018 at 07:05:16PM +0200, Alin Gabriel Serdean wrote:
> > One issue with this test is that MSYS mangles the shorter form of the
IPv6
> address.
> > To solve this, we switch to the longer notation of it.
> >
> > Another issue is that `printf` command does not add the leading `0` to
the
> packet.
> > We switch to a more platform independent `awk` substitution.
> >
> > Co-authored-by: Mark Michelson 
> > Signed-off-by: Mark Michelson 
> > Signed-off-by: Alin Gabriel Serdean 
> 
> Acked-by: Ben Pfaff 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ovsdb-client.at: Fix ovsdb-client backup test on Win

2018-01-23 Thread aserdean

(facepalm). Thanks for pointing me to the right direction. I will look into
it.

Thanks,
Alin.

> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Tuesday, January 23, 2018 2:24 AM
> To: Alin Gabriel Serdean 
> Cc: d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] ovsdb-client.at: Fix ovsdb-client backup
test
> on Win
> 
> On Sun, Jan 14, 2018 at 08:26:47PM +0200, Alin Gabriel Serdean wrote:
> > The test:
> > 1948. ovsdb-client.at:15: testing ovsdb-client backup and restore
> > fails on Windows with:
> > --- /dev/null   2018-01-14 20:09:57 +0200
> > +++ /c/_2018/january/14/ovs/tests/testsuite.dir/at-groups/1948/stderr
> > @@ -0,0 +1,3 @@
> > +ovsdb-server: ovsdb error: backup: unexpected file format
> > +ovsdb-server: Failed to read from child (The pipe has been ended.
> > +)
> > ./ovsdb-client.at:111: exit code was 1, expected 0
> >
> > The root cause is that when redirecting output defaults to the Windows
> > line
> > endings(CRLF):
> > $ file db
> > db: ASCII text, with very long lines
> > $ file backup
> > backup: ASCII text, with very long lines, with CRLF line terminators
> >
> > Add a `dos2unix` command to convert to the line endings expected by
> > ovsdb-server.
> >
> > Signed-off-by: Alin Gabriel Serdean 
> 
> Thank you for the fix.
> 
> Hmm, maybe this is a bug in "ovsdb-client backup".  Maybe it should do
> something like:
> 
> #ifdef _WIN32
> fflush(stdout);
> _setmode(STDOUT_FILENO, _O_BINARY); #endif
> 
> and then the test wouldn't have to change at all?  What do you think?
> 
> I don't know whether anything similar is needed for stdin for
"ovsdb-client
> restore".

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] Fw: Tr:Bonjour

2018-01-23 Thread Maryse

Bonjour , 
Je suis Maryse , célibataire sans enfant j aimerais si possible discuter avec 
vous , faire votre connaissance . 
maryse.fo...@gmail.com.
 
Bises









___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [RFC 3/3] OVN: add acl reject rule support using icmp4 action

2018-01-23 Thread Ben Pfaff

On Wed, Jan 10, 2018 at 06:59:01PM +0100, Lorenzo Bianconi wrote:
> Whenever the acl reject rule is hit send back an ICMPv4 destination
> unreachable packet and do not handle reject rule as drop one
> 
> Signed-off-by: Lorenzo Bianconi 

It's nice to finally get this right!  Thank you.

I wonder about the treatment for TCP connections.  A connection attempt
to a TCP port that is not listening ordinarily yields a TCP RST
response.  I do not know whether an ICMP reply is acceptable.  Do you
have any thoughts on that?

I think that this should add an item to NEWS that describes the new
feature.

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [RFC 2/3] OVN: add ovn-trace support to icmp4 action

2018-01-23 Thread Ben Pfaff

On Wed, Jan 10, 2018 at 06:59:00PM +0100, Lorenzo Bianconi wrote:
> Signed-off-by: Lorenzo Bianconi 

I recommend squashing this into patch 1.

I'd use "icmp4" as the node name, instead of "icmp", for clarity.

Thanks a lot!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [RFC 1/3] OVN: add icmp4{} action support

2018-01-23 Thread Ben Pfaff

On Wed, Jan 10, 2018 at 06:58:59PM +0100, Lorenzo Bianconi wrote:
> icmp4 action is used to replace the IPv4 packet been processed with
> an ICMPv4 packet initialized based on incoming IPv4 one.
> Ethernet and IPv4 fields not listed are not changed:
> - ip.proto = 1
> - ip.frag = 0
> - icmp4.type = 3
> - icmp4.code = 1
> Prerequisite: ip4
> 
> Signed-off-by: Lorenzo Bianconi 

Thanks a lot for working on this!  I'd really like to have more complete
support for this, for the OVN router.

This patch should update ovn-sb.xml to reflect the new details and that
the action is actually implemented.

"sparse" reports the following:

../ovn/controller/pinctrl.c:251:21: error: incorrect type in assignment 
(different base types)
../ovn/controller/pinctrl.c:251:21:expected restricted ovs_be16 
[usertype] ip_frag_off
../ovn/controller/pinctrl.c:251:21:got int

and I think it's right, htons() should be used in this assignment:

nh->ip_frag_off = 0x40;

There's also this new warning:

../ovn/utilities/ovn-trace.c: In function ‘trace_actions’:
../ovn/utilities/ovn-trace.c:1762:9: error: enumeration value
‘OVNACT_ICMP’ not handled in switch [-Werror=switch]

That's fixed in patch 2/3, but probably it's better to squash patch 2
into 1.

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH V2] compat:inet_frag.h: Check for frag_percpu_counter_batch

2018-01-23 Thread Ben Pfaff

On Tue, Jan 23, 2018 at 12:00:20PM -0800, Justin Pettit wrote:
> 
> 
> > On Jan 5, 2018, at 11:30 AM, Greg Rose  wrote:
> > 
> > +#ifdef frag_percpu_counter_batch
> > ...
> > +#else /* frag_percpu_counter_batch */
> 
> This is kind of a nit, but I would have thought this "#else" comment
> would be "!frag_percpu_counter_batch", since that's the case when it's
> not defined.  However, I'm not sure how it's handled usually in the
> kernel.  Looking through our compat code, I see examples of it being
> done both ways, though.  Thoughts?

This is one of those weird places where style differs.  GNU coding
standards would mandate "!frag_percpu_counter_batch" here, and that's
what I prefer myself, but I've seen the opposite convention in (if I
recall correctly) some BSD code.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4] netdev-linux: do not send packets to down tap ifaces.

2018-01-23 Thread Ben Pfaff

On Wed, Jan 17, 2018 at 10:09:58PM -0200, Flavio Leitner wrote:
> Today OVS pushes packets to the TAP interface ignoring its
> current state. That works because the kernel will return -EIO
> when it's not UP and OVS will just ignore that as it is not
> an OVS issue.
> 
> However, it causes a huge impact when broadcasts happen when
> using userspace datapath accelerated with DPDK (e.g.: action
> NORMAL).  This patch improves the situation by checking the
> TAP's interface state before issueing any syscall.
> 
> However, there might be use-cases moving interfaces to other
> networking namespaces and in that case, OVS can't retrieve
> the iface state (sets it to DOWN). That would stop the traffic
> breaking the use-case. This patch relies on netlink notifications
> to find out if the device is local or not. When it's local, the
> device state is checked otherwise it will behave as before.
> 
> Signed-off-by: Flavio Leitner 

I applied this to master yesterday.  (It looks like I forgot to send an
email about it at the time.)  Thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [RFC 0/2] 'Graceful restart' of OvS

2018-01-23 Thread Ben Pfaff

What I'd really like to start from is a high-level description of how an
upgrade would take place.  This patch set covers one low-level part of
that upgrade, but (as you recognize in your description) there is a
bigger set of issues.  There have to be handoffs at multiple levels
(datapath, controller connections, database connections, ...) and I'd
really like to think the whole thing through.

I guess the other part that I'd like to think through is, what is the
actual goal?  It's one thing to not lose packet flows but we also need
to make sure that the new ovs-vswitchd gets the same OpenFlow flows,
etc. and that its internal state (MAC tables etc.) get populated from
the old ovs-vswitchd's state, otherwise when the new one takes over
there will be blips due to that change.

The other aspect I'd like to think about is downgrades.  One would like
to believe that every upgrade goes perfectly, but of course it's not
true, and users may be more reluctant to upgrade if they believe that
reverting to the previous version is disruptive.  I am not sure that
downgrades are more difficult, in most ways, but at least they should be
considered.

On Fri, Jan 12, 2018 at 02:19:33PM -0500, Aaron Conole wrote:
> IMPORTANT:  Please remember this is simply a strawman to frame a discussion
> around a concept called 'graceful restart.'  More to be explained.
> 
> Now that 2.9 work is frozen and the tree will be forked off, I assumed
> more extreme and/or interesting ideas might be welcome.  As such, here's
> something fairly small-ish that provides an interesting behavior called
> 'Graceful Restart.'  The idea is that when the OvS userspace is being
> upgraded, we can leave the existing flows installed in the datapath allowing
> existing flows to continue.  Once the new versions of the daemons take over,
> the standard dump/sweep operations of the revalidator threads will resume
> and "Everything Will Just Work(tm)."
> 
> Of course, there are some important corner cases and side effects that
> need to be thought out.  I've listed the ones I know of here (no particular
> order, though):
> 
> 
> 1. Only the active datapath flows (those installed in the kernel datapath
>at the time of 'reload') will remain while the daemons are down.  This
>means *any* new traffic (possibly even new connections between the same
>endpoints) will fail to pass.  This even means a ping between endpoints
>could start failing (ie: if neighbor entries expire, no ARP/ND can pass
>and the neighbor will not be resolved causing send failures - unless
>those flows are luckily still in the kernel datapath).
> 
>1a.  This also means that some protocol exchanges might *seem* to
> work on first glance, but won't actually proceed.  I'm thinking
> cases where pings are used as 'keep alives.'  That's no different
> than existing system.  What will be different is the user expectation.
> The expectation with a "graceful" restart may be that no such failures
> would exist.
> 
> 2. This is a strong knob that a user may accidentally trigger.  If they do,
>flows will *NEVER* die from the kernel datapath while the daemons are
>running.  This might be acceptable to keep around.  After all, it isn't
>a persistent database entry or anything.  The flag only exists for the
>lifetime of the userspace process (so a restart can also be an effect
>which 'clears' the behavior).  I'm not sure if this would be acceptable.
> 
> 3. Traffic will pass with no userspace knowledge for a time.  I think this
>is okay - after all if the OvS daemon is killed flows will stick around.
>However, this behavior would go from "well, sometimes it could happen," to
>"we plan and/or expect such to happen."
> 
> 4. This only covers the kernel datapath.  Userspace datapath implementations
>will still lose the entire datapath during restart.
> 
> 
> There probably exists a better/more efficient/more functionally appropriate
> way of achieving the desired effect.  This is simply to spawn some discussion
> in the upstream community to see if there's a way to achieve this "graceful 
> restart" effect (ie: not losing existing packet flow) during planned
> outages (upgrades, reloads, etc.)
> 
> Since the implementation is subject to complete and total change, I haven't
> written any documentation for this feature yet.  I'm saving that work for
> another spin after getting some feedback.  There may be other opportunity,
> for instance, to integrate with something like ovs-ctl for a system-agnostic
> implementation.
> 
> Aaron Conole (2):
>   datapath: prevent deletion of flows / datapaths
>   rhel: tell ovsctl to freeze the datapath
> 
>  lib/dpctl.c| 27 +
>  lib/dpif-netdev.c  |  2 +
>  lib/dpif-netlink.c | 65 
> --
>  lib/dpif-provider.h

Re: [ovs-dev] [PATCH V2] compat:inet_frag.h: Check for frag_percpu_counter_batch

2018-01-23 Thread Gregory Rose


On 1/23/2018 12:00 PM, Justin Pettit wrote:



On Jan 5, 2018, at 11:30 AM, Greg Rose  wrote:

+#ifdef frag_percpu_counter_batch
...
+#else /* frag_percpu_counter_batch */

This is kind of a nit, but I would have thought this "#else" comment would be 
"!frag_percpu_counter_batch", since that's the case when it's not defined.  However, I'm 
not sure how it's handled usually in the kernel.  Looking through our compat code, I see examples 
of it being done both ways, though.  Thoughts?

--Justin



It's a valid nit.

I can send V2 or you can fix it on push.  Which do you prefer?

Thanks,

- Greg
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH V2] compat:inet_frag.h: Check for frag_percpu_counter_batch

2018-01-23 Thread Justin Pettit

> On Jan 5, 2018, at 11:30 AM, Greg Rose  wrote:
> 
> +#ifdef frag_percpu_counter_batch
> ...
> +#else /* frag_percpu_counter_batch */

This is kind of a nit, but I would have thought this "#else" comment would be 
"!frag_percpu_counter_batch", since that's the case when it's not defined.  
However, I'm not sure how it's handled usually in the kernel.  Looking through 
our compat code, I see examples of it being done both ways, though.  Thoughts?

--Justin

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] Show total_ports_on_switch when displaying logical_switch:

2018-01-23 Thread Ali Gin

Sure its just a helpful information and not a major design change too.
Hence, it wouldn't impact to have this change. :)


On Tue, Jan 23, 2018 at 11:03 AM, Ben Pfaff  wrote:

> On Tue, Jan 23, 2018 at 10:45:44AM -0800, Ali Gin wrote:
> > > On 01/19/2018 11:08 PM, amgin...@gmail.com wrote:
> > > >From: aginwala 
> > > >
> > > >e.g. when running ovn-nbctl show ls, it's good to have total ports
> that
> > are
> > > >attached to the switch.
> > > >
> > > >Signed-off-by: Aliasgar Ginwala 
> > > >---
> > > >  ovn/utilities/ovn-nbctl.c |  1 +
> > > >  tests/ovn-nbctl.at| 10 ++
> > > >  2 files changed, 7 insertions(+), 4 deletions(-)
> > > >
> > > >diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
> > > >index c9aa2fe..0f1e952 100644
> > > >--- a/ovn/utilities/ovn-nbctl.c
> > > >+++ b/ovn/utilities/ovn-nbctl.c
> > > >@@ -682,6 +682,7 @@ print_ls(const struct nbrec_logical_switch *ls,
> > struct ds *s)
> > > >  ds_put_format(s, "router-port: %s\n",
> router_port);
> > > >  }
> > > >  }
> > > >+ds_put_format(s, "total_ports_on_switch: %u\n", ls->n_ports);
> > > >  }
> > >
> > > I think the wording here is a bit strange when compared to other output
> > from
> > > `ovn-nbctl show`. I think something like "total ports" instead of
> > > "total_ports_on_switch" would be more consistent.
> > >
> > > I also think that if you're adding this for logical switches, you
> should
> > do
> > > the same for logical routers.
> >
> > >>In addition, I don't understand why this is valuable information to
> > show.
> >
> > I was keeping it total_ports but that was confusing as it seems all
> ports.
> > Hence I chose total_ports_per_switch. I can add the same for routers
> too.We
> > added this because when we have many ports on a switch it's easy to say
> > from count that we have n ports on the switch. During scale test we found
> > this information useful because by looking at the total_ports_on_switch
> we
> > can also assure that the count on the chassis matches the count on the
> > controller as a quick scan too.
> > Let me know further.
>
> To me, this sounds like a pretty specialized use for something that
> would go in the output for every user all the time.  I think that you
> could get something similar with "ovn-nbctl lsp-list  | wc -l".
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH 00/12] Backport upstream Linux OVS patches

2018-01-23 Thread Justin Pettit

Enough time has passed that this doesn't apply cleanly to master either.  Would 
you mind rebasing these, too?

Thanks,

--Justin


> On Dec 11, 2017, at 1:50 PM, Greg Rose  wrote:
> 
> The following patches are available in the current Linux upstream
> git repository:
> 
>  183dea5 openvswitch: do not propagate headroom updates to internal port
>  311af51 openvswitch: use ktime_get_ts64() instead of ktime_get_ts()
>  67c8d22 openvswitch: fix the incorrect flow action alloc size
>  2734166 net: openvswitch: datapath: fix data type in queue_gso_packets
>  0c19f846 net: accept UFO datagrams from tuntap and packet
>  b74912a openvswitch: meter: fix NULL pointer dereference in 
> ovs_meter_cmd_reply_start
>  6dc14dc openvswitch: Using kfree_rcu() to simplify the code
>  06c2351 openvswitch: Make local function ovs_nsh_key_attr_size() static
>  8a860c2 openvswitch: Fix return value check in ovs_meter_cmd_features()
>  cd8a6c3 openvswitch: Add meter action support
>  96fbc13 openvswitch: Add meter infrastructure
>  9602c01 openvswitch: export get_dp() API.
>  b2d0f5d openvswitch: enable NSH support
>  9354d45 openvswitch: reliable interface indentification in port dumps
>  2a17178 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
>  b244131 License cleanup: add SPDX GPL-2.0 license identifier to files with 
> no license
>  279badc openvswitch: conntrack: mark expected switch fall-through
>  b822696 openvswitch: add ct_clear action
>  ceaa001 openvswitch: Add erspan tunnel support.
>  42ab19e net: Add extack to upper device linking
>  5829e62 openvswitch: Fix an error handling path in 
> 'ovs_nla_init_match_and_action()'
> 
> This patch series backports all of those patches except these four:
>  279badc openvswitch: conntrack: mark expected switch fall-through
>  b822696 openvswitch: add ct_clear action
>  ceaa001 openvswitch: Add erspan tunnel support.
>  b2d0f5d openvswitch: enable NSH support
> 
> Upstream patch 279badc isn't necessary since a patch for it was recently
> independently added.
> 
> Upstream patches b2d0f5d, b822696 and ceaa001 require user space
> changes to allow OVS to build.  I will work with the authors of
> those patches to get backports and required user space changes
> posted separately.
> 
> Andy Zhou has sent me additional patches for the user space side of
> the meter patches.  In this case the kernel datapath meter patches
> do not require the user space code to compile correctly so we can
> separate the application of the kernel datapath patches and the
> user space patches.  I will update and post Andy's user space side
> meter patches in the near future.
> 
> The remaining patches are addressed in this patch series as indicated
> below.
> 
> Andy Zhou (3):
>  datapath: export get_dp() API
>  datapath: Add meter netlink definitions
>  datapath: Add meter infrastructure
> 
> Arnd Bergmann (1):
>  datapath: use ktime_get_ts64() instead of ktime_get_ts()
> 
> Christophe JAILLET (1):
>  datapath:  Fix an error handling path in
>'ovs_nla_init_match_and_action()
> 
> Gustavo A. R. Silva (2):
>  datapath: meter: fix NULL pointer dereference in
>ovs_meter_cmd_reply_start
>  datapath: fix data type in queue_gso_packets
> 
> Jiri Benc (1):
>  datapath: reliable interface indentification in port dumps
> 
> Paolo Abeni (1):
>  datapath: do not propagate headroom updates to internal port
> 
> Wei Yongjun (2):
>  datapath: Fix return value check in ovs_meter_cmd_features()
>  datapath: Using kfree_rcu() to simplify the code
> 
> zhangliping (1):
>  datapath: fix the incorrect flow action alloc size
> 
> acinclude.m4  |   4 +-
> datapath/Modules.mk   |   6 +-
> datapath/datapath.c   |  97 ++--
> datapath/datapath.h   |  38 +-
> datapath/dp_notify.c  |   3 +-
> datapath/flow.c   |   6 +-
> datapath/flow_netlink.c   |  16 +-
> datapath/linux/compat/include/linux/netdevice.h   |  19 -
> datapath/linux/compat/include/linux/openvswitch.h |  53 ++
> datapath/linux/compat/include/net/netlink.h   |   9 +
> datapath/meter.c  | 597 ++
> datapath/meter.h  |  54 ++
> datapath/vport-internal_dev.c |  19 +-
> 13 files changed, 821 insertions(+), 100 deletions(-)
> create mode 100644 datapath/meter.c
> create mode 100644 datapath/meter.h
> 
> -- 
> 1.8.3.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] The dependency between ovndb_servers-master and VirtualIP is wrong

2018-01-23 Thread Ben Pfaff

Thank you for the patch!  With this message, I'm adding a few people who
might be capable of a review to the thread (I'm certainly not).

Thanks,

Ben.

On Thu, Jan 18, 2018 at 05:37:35PM +0800, xurong00037997 wrote:
> From: zhaojingjing0067370 
> 
> ---
>  Documentation/topics/integration.rst | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/topics/integration.rst 
> b/Documentation/topics/integration.rst
> index 0447faf..d129e21 100644
> --- a/Documentation/topics/integration.rst
> +++ b/Documentation/topics/integration.rst
> @@ -255,6 +255,6 @@ with the active server::
>  
>  $ pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=x.x.x.x \
>  op monitor interval=30s
> -$ pcs constraint order promote ovndb_servers-master then VirtualIP
> -$ pcs constraint colocation add VirtualIP with master 
> ovndb_servers-master \
> +$ pcs constraint order start VirtualIP then promote ovndb_servers-master
> +$ pcs constraint colocation add master ovndb_servers-master with 
> VirtualIP \
>  score=INFINITY
> -- 
> 1.8.3.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH 0/2] avoid using xport_lookup() in case of recirculation

2018-01-23 Thread Ben Pfaff

On Fri, Jan 12, 2018 at 02:34:09PM +0100, Zoltan Balogh wrote:
> The main goal of this series is to avoid invocation of xlate_lookup() in case
> of recirculation (except recirc due to bond), because it can return pointer to
> a wrong xport.
> For instance, if L3 packet with MPLS label is received on a L3 tunnel port and
> pop_mpls + resubmit actions are performed, then first packet_type is changed
> due to pushing a dummy ethernet header, MPLS label is removed, then resubmit
> action is processed. This triggers recirculation, where xport_lookup() fails
> due to former change of packet_type. 
> 
> The series introduces UUID for xport and stores the UUID of first xport packet
> was received on in frozen state in case of recirculation. So, when upcall is
> processed due to recirculation then xport can be found by using the saved UUID
> and xlate_lookup() should not be invoked.

Thanks for the bug fix!  I applied this series to master and branch-2.9.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] Show total_ports_on_switch when displaying logical_switch:

2018-01-23 Thread Ben Pfaff

On Tue, Jan 23, 2018 at 10:45:44AM -0800, Ali Gin wrote:
> > On 01/19/2018 11:08 PM, amgin...@gmail.com wrote:
> > >From: aginwala 
> > >
> > >e.g. when running ovn-nbctl show ls, it's good to have total ports that
> are
> > >attached to the switch.
> > >
> > >Signed-off-by: Aliasgar Ginwala 
> > >---
> > >  ovn/utilities/ovn-nbctl.c |  1 +
> > >  tests/ovn-nbctl.at| 10 ++
> > >  2 files changed, 7 insertions(+), 4 deletions(-)
> > >
> > >diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
> > >index c9aa2fe..0f1e952 100644
> > >--- a/ovn/utilities/ovn-nbctl.c
> > >+++ b/ovn/utilities/ovn-nbctl.c
> > >@@ -682,6 +682,7 @@ print_ls(const struct nbrec_logical_switch *ls,
> struct ds *s)
> > >  ds_put_format(s, "router-port: %s\n", router_port);
> > >  }
> > >  }
> > >+ds_put_format(s, "total_ports_on_switch: %u\n", ls->n_ports);
> > >  }
> >
> > I think the wording here is a bit strange when compared to other output
> from
> > `ovn-nbctl show`. I think something like "total ports" instead of
> > "total_ports_on_switch" would be more consistent.
> >
> > I also think that if you're adding this for logical switches, you should
> do
> > the same for logical routers.
> 
> >>In addition, I don't understand why this is valuable information to
> show.
> 
> I was keeping it total_ports but that was confusing as it seems all ports.
> Hence I chose total_ports_per_switch. I can add the same for routers too.We
> added this because when we have many ports on a switch it's easy to say
> from count that we have n ports on the switch. During scale test we found
> this information useful because by looking at the total_ports_on_switch we
> can also assure that the count on the chassis matches the count on the
> controller as a quick scan too.
> Let me know further.

To me, this sounds like a pretty specialized use for something that
would go in the output for every user all the time.  I think that you
could get something similar with "ovn-nbctl lsp-list  | wc -l".
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 3/3] xlate: call tnl_neigh_snoop() from terminate_native_tunnel()

2018-01-23 Thread Ben Pfaff

On Tue, Jan 09, 2018 at 07:54:33PM +0100, Zoltan Balogh wrote:
> Currenlty, OVS snoops any ARP or ND packets in any bridge and populates
> the tunnel neighbor cache with the retreived data. For instance, when
> ARP reply originated by a tenant is received in an overlay bridge, the
> ARP message is snooped and tunnel neighbor cache is filled with tenant
> data, however only tunnel neighbor data should be stored there.
> 
> This patch moves tunnel neighbor snooping from do_xlate_actions() to
> terminate_native_tunnel() where native tunnel is terminated, in order to
> keep ARP neighbor cache clean, i.e. only packets comming from a native
> tunnel are snooped.
> 
> By applying the patch, only ARP and Neighbor Advertisement messages
> addressing a tunnel endpoint (LOCAL port of underlay bridge) are
> snooped. In order to achieve this, IP addresses of the bridge are
> retrieved and then stored in xbridge by calling xlate_xbridge_set().
> These data is then matched against the data extracted from an ARP or
> Neighbor Advertisement message in is_neighbor_reply_correct() which is
> invoked from terminate_native_tunnel().
> 
> Signed-off-by: Zoltan Balogh 

Thanks a lot!

It appears to me that this patch makes the following test consistently
fail.  When I revert the test, it passes for me again.  Can you take a
look?

Thanks,

Ben.

# -*- compilation -*-
2370. ovn.at:8136: testing ovn -- /32 router IP address ...
creating ovn-sb database
creating ovn-nb database
starting ovn-northd
starting backup ovn-northd
adding simulator 'main'
adding simulator 'hv1'
adding simulator 'hv2'
../../tests/ovn.at:8198: ovn_populate_arp__
stdout:
OK
OK

checking packets in hv2/vif1-tx.pcap against expected:
expected 1 packets, only received 0
../../tests/ovn.at:8230: sort $rcv_text
--- expout  2018-01-23 10:59:47.231797273 -0800
+++ /home/blp/nicira/ovs/_build/tests/testsuite.dir/at-groups/2370/stdout   
2018-01-23 10:59:47.231797273 -0800
@@ -1 +0,0 @@
-f0010204000102040800451c3f110100c0a801020a0200350008
2370. ovn.at:8136: 2370. ovn -- /32 router IP address (ovn.at:8136): FAILED 
(ovn.at:8230)


> ---
>  include/sparse/netinet/in.h   |  10 +++
>  ofproto/ofproto-dpif-xlate.c  | 147 
> --
>  tests/tunnel-push-pop-ipv6.at |  68 ++-
>  tests/tunnel-push-pop.at  |  67 ++-
>  4 files changed, 282 insertions(+), 10 deletions(-)
> 
> diff --git a/include/sparse/netinet/in.h b/include/sparse/netinet/in.h
> index 6abdb2331..eea41bd7f 100644
> --- a/include/sparse/netinet/in.h
> +++ b/include/sparse/netinet/in.h
> @@ -123,6 +123,16 @@ struct sockaddr_in6 {
>   (X)->s6_addr[10] == 0xff &&\
>   (X)->s6_addr[11] == 0xff)
>  
> +#define IN6_IS_ADDR_MC_LINKLOCAL(a) \
> +(((const uint8_t *) (a))[0] == 0xff &&  \
> + (((const uint8_t *) (a))[1] & 0xf) == 0x2)
> +
> +# define IN6_ARE_ADDR_EQUAL(a,b)  \
> +const uint32_t *) (a))[0] == ((const uint32_t *) (b))[0]) &&  \
> + (((const uint32_t *) (a))[1] == ((const uint32_t *) (b))[1]) &&  \
> + (((const uint32_t *) (a))[2] == ((const uint32_t *) (b))[2]) &&  \
> + (((const uint32_t *) (a))[3] == ((const uint32_t *) (b))[3]))
> +
>  #define INET_ADDRSTRLEN 16
>  #define INET6_ADDRSTRLEN 46
>  
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 2328ec1da..16bc0cd96 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -90,6 +90,16 @@ VLOG_DEFINE_THIS_MODULE(ofproto_dpif_xlate);
>   * recursive or not. */
>  #define MAX_RESUBMITS (MAX_DEPTH * MAX_DEPTH)
>  
> +/* The structure holds an array of IP addresses assigned to a bridge and the
> + * number of elements in the array. These data are mutable and are evaluated
> + * when ARP or Neighbor Advertisement packets received on a native tunnel
> + * port are xlated. So 'ref_cnt' and RCU are used for synchronization. */
> +struct xbridge_addr {
> +struct in6_addr *addr;/* Array of IP addresses of xbridge. */
> +int n_addr;   /* Number of IP addresses. */
> +struct ovs_refcount ref_cnt;
> +};
> +
>  struct xbridge {
>  struct hmap_node hmap_node;   /* Node in global 'xbridges' map. */
>  struct ofproto_dpif *ofproto; /* Key in global 'xbridges' map. */
> @@ -113,6 +123,8 @@ struct xbridge {
>  
>  /* Datapath feature support. */
>  struct dpif_backer_support support;
> +
> +struct xbridge_addr *addr;
>  };
>  
>  struct xbundle {
> @@ -576,7 +588,8 @@ static void xlate_xbridge_set(struct xbridge *, struct 
> dpif *,
>const struct dpif_ipfix *,
>const struct netflow *,
>bool forward_bpdu, bool has_in_band,
> -  const struct

Re: [ovs-dev] [PATCH] Show total_ports_on_switch when displaying logical_switch:

2018-01-23 Thread Ali Gin

> On 01/19/2018 11:08 PM, amgin...@gmail.com wrote:
> >From: aginwala 
> >
> >e.g. when running ovn-nbctl show ls, it's good to have total ports that
are
> >attached to the switch.
> >
> >Signed-off-by: Aliasgar Ginwala 
> >---
> >  ovn/utilities/ovn-nbctl.c |  1 +
> >  tests/ovn-nbctl.at| 10 ++
> >  2 files changed, 7 insertions(+), 4 deletions(-)
> >
> >diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
> >index c9aa2fe..0f1e952 100644
> >--- a/ovn/utilities/ovn-nbctl.c
> >+++ b/ovn/utilities/ovn-nbctl.c
> >@@ -682,6 +682,7 @@ print_ls(const struct nbrec_logical_switch *ls,
struct ds *s)
> >  ds_put_format(s, "router-port: %s\n", router_port);
> >  }
> >  }
> >+ds_put_format(s, "total_ports_on_switch: %u\n", ls->n_ports);
> >  }
>
> I think the wording here is a bit strange when compared to other output
from
> `ovn-nbctl show`. I think something like "total ports" instead of
> "total_ports_on_switch" would be more consistent.
>
> I also think that if you're adding this for logical switches, you should
do
> the same for logical routers.

>>In addition, I don't understand why this is valuable information to
show.

I was keeping it total_ports but that was confusing as it seems all ports.
Hence I chose total_ports_per_switch. I can add the same for routers too.We
added this because when we have many ports on a switch it's easy to say
from count that we have n ports on the switch. During scale test we found
this information useful because by looking at the total_ports_on_switch we
can also assure that the count on the chassis matches the count on the
controller as a quick scan too.
Let me know further.

On Tue, Jan 23, 2018 at 9:27 AM, Ben Pfaff  wrote:

> On Tue, Jan 23, 2018 at 09:46:26AM -0600, Mark Michelson wrote:
> > On 01/19/2018 11:08 PM, amgin...@gmail.com wrote:
> > >From: aginwala 
> > >
> > >e.g. when running ovn-nbctl show ls, it's good to have total ports that
> are
> > >attached to the switch.
> > >
> > >Signed-off-by: Aliasgar Ginwala 
> > >---
> > >  ovn/utilities/ovn-nbctl.c |  1 +
> > >  tests/ovn-nbctl.at| 10 ++
> > >  2 files changed, 7 insertions(+), 4 deletions(-)
> > >
> > >diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
> > >index c9aa2fe..0f1e952 100644
> > >--- a/ovn/utilities/ovn-nbctl.c
> > >+++ b/ovn/utilities/ovn-nbctl.c
> > >@@ -682,6 +682,7 @@ print_ls(const struct nbrec_logical_switch *ls,
> struct ds *s)
> > >  ds_put_format(s, "router-port: %s\n",
> router_port);
> > >  }
> > >  }
> > >+ds_put_format(s, "total_ports_on_switch: %u\n", ls->n_ports);
> > >  }
> >
> > I think the wording here is a bit strange when compared to other output
> from
> > `ovn-nbctl show`. I think something like "total ports" instead of
> > "total_ports_on_switch" would be more consistent.
> >
> > I also think that if you're adding this for logical switches, you should
> do
> > the same for logical routers.
>
> In addition, I don't understand why this is valuable information to
> show.
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] mbuf pool sizing

2018-01-23 Thread Kevin Traynor

On 01/23/2018 11:42 AM, Kevin Traynor wrote:
> On 01/17/2018 07:48 PM, Venkatesan Pradeep wrote:
>> Hi,
>>
>> Assuming that all ports use the same MTU,  in OVS2.8 and earlier, a single 
>> mempool of 256K buffers (MAX_NB_MBUF = 4096 * 64) will be created and shared 
>> by all the ports
>>
>> With the OVS2.9 mempool patches, we have port specific allocation and the 
>> number of mbufs created for each port is based on the following formula 
>> (with a lower limit of MIN_NB_MBUF = 4096*4)
>>n_mbufs = dev->requested_n_rxq * dev->requested_rxq_size
>>   + dev->requested_n_txq * dev->requested_txq_size
>>   + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
>>   + MIN_NB_MBUF;
>>
>> Using minimal value (1) for n_rxq and n_rxq and default value (2048) for 
>> requested_rxq_size and requested_txq_size, the above translates to
>>   n_mbufs = 1*2048 + 1*2048 + 1*32 + 4096*4  = 20512
>>
>> Assuming all ports have the same MTU, this means that approximately 13 ports 
>> in OVS2.9 will consume as much memory as the single mempool shared by all 
>> ports in OVS2.8 (256*1024 / 20512) .
>>
>> When a node is upgraded from OVS2.8 to OVS2.9 it is quite possible that the 
>> memory set aside for OVS may be insufficient. I'm not sure if this aspect 
>> has been discussed previously and wanted to bring this up for discussion.
>>
> 
> Hi Pradeep, I don't think it has been discussed. I guess the thinking
> was that with a giant shared mempool, it was over provisioning when
> there was a few ports, and in the case where there was a lot of ports
> there could be some starvation at run time. It also meant if you had a
> mix of different MTU's you had multiple giant shared mempools and could
> run out of memory very quickly at config or run time then also.
> 
> So I can see the argument for having a mempool per port, as it is more
> fine grained and if you are going to run short of memory, it will at
> least be at config time. The problem is if you give some over provision
> to each port and you have a lot of ports, you hit the situation you are
> seeing.
> 
> I think some amount of over provision per port is needed because you
> don't want to be cutting it so fine that you run into memory issues at
> run time about local mbuf caches on cores running out, or even if
> someone used dpdk rings to send the mbuf somewhere else for a time.
> There may be other corner cases too. Perhaps as compromise the min size
> could be reduce from 4096*4 to 4096*2 or 4096.
> 
> Thoughts?
> 

I just sent a compile tested only RFC here
https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343581.html

> Kevin.
> 
>> Regards,
>>
>> Pradeep
>>
>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [RFC] netdev-dpdk: Update amount of mbufs requested.

2018-01-23 Thread Kevin Traynor

As each DPDK port now has its own mempool, it means
depending on the amount of ports and their configuration
we can now require a lot more memory than previously needed.

Reduce the amount of extra mbufs requested for each port and
set as a minimum the amount of mbufs that are needed when the
queues, caches and inflight buffers associated with that port
are full.

CC: Antonio Fischetti 
CC: Robert Wojciechowicz 
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Reported-by: Venkatesan Pradeep 
Signed-off-by: Kevin Traynor 
---
 lib/netdev-dpdk.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ac2e38e..0c72ab9 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -92,9 +92,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 
20);
 #define NETDEV_DPDK_MAX_PKT_LEN 9728
 
-/* Min number of packets in the mempool.  OVS tries to allocate a mempool with
- * roughly estimated number of mbufs: if this fails (because the system doesn't
- * have enough hugepages) we keep halving the number until the allocation
- * succeeds or we reach MIN_NB_MBUF */
-#define MIN_NB_MBUF  (4096 * 4)
+ /*
+  * Amount of additional packets requested with the minimum for port mempool.
+  */
+#define NB_MBUF_ADD  (4096)
 #define MP_CACHE_SZ  RTE_MEMPOOL_CACHE_MAX_SIZE
 
@@ -518,5 +517,5 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
 const char *netdev_name = netdev_get_name(>up);
 int socket_id = dev->requested_socket_id;
-uint32_t n_mbufs;
+uint32_t n_mbufs, min_mbufs;
 uint32_t hash = hash_string(netdev_name, 0);
 struct rte_mempool *mp = NULL;
@@ -529,8 +528,9 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
  * + 
  */
-n_mbufs = dev->requested_n_rxq * dev->requested_rxq_size
-  + dev->requested_n_txq * dev->requested_txq_size
-  + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
-  + MIN_NB_MBUF;
+min_mbufs = dev->requested_n_rxq * dev->requested_rxq_size
++ dev->requested_n_txq * dev->requested_txq_size
++ MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
++ MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * MP_CACHE_SZ;
+n_mbufs = min_mbufs + NB_MBUF_ADD;
 
 ovs_mutex_lock(_mp_mutex);
@@ -579,5 +579,6 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
  mp_name, n_mbufs);
 }
-} while (!mp && rte_errno == ENOMEM && (n_mbufs /= 2) >= MIN_NB_MBUF);
+n_mbufs = n_mbufs == min_mbufs ? 0 : MAX(min_mbufs, n_mbufs / 2);
+} while (!mp && rte_errno == ENOMEM && n_mbufs);
 
 ovs_mutex_unlock(_mp_mutex);
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 1/3] ofproto-dpif-sflow: propagate actions within clone

2018-01-23 Thread Ben Pfaff

On Tue, Jan 09, 2018 at 07:54:31PM +0100, Zoltan Balogh wrote:
> By avoiding Tx recirculation and embedding tnl_push action within clone,
> the tunnel metadata is not propagated when sflow is read. Unless clone
> action is handled in the dpif_sflow_read_actions() function as well.
> This commit resolves this issue.
> 
> In addition, some sflow data needs to be stored and restored in
> ofproto-dpif-xlate when native_tunnel_output() is invoked. Otherwise the
> output action of underlay bridge is getting counted too when sFlow is set
> on the overlay bridge.
> Both bugs are connected to sflows and were introduced by the commit in
> the "Fixes:" tag below.
> 
> Signed-off-by: Zoltan Balogh 
> CC: Sugesh Chandran 
> Fixes: 7c12dfc527a5 ("tunneling: Avoid datapath-recirc by combining
> recirc actions at xlate.")

Thanks a lot for the fix.

It took me a while to properly understand the commit message, so I
rewrote it a bit:

commit 283d866294b128610acde63027283ce738d97b6c
Author: Zoltan Balogh 
Date:   Tue Jan 9 19:54:31 2018 +0100

ofproto-dpif-sflow: Recursively examine actions inside clone.

Until now, dpif_sflow_read_actions() has ignored actions inside clone.
This means that sflow missed tnl_push actions inside clone, which OVS
now uses to avoid tx recirculation.  This commit fixes the problem
by making dpif_sflow_read_actions() recursively process actions inside
clone.

In addition, some sflow data needs to be stored and restored in
ofproto-dpif-xlate when native_tunnel_output() is invoked. Otherwise the
output action of underlay bridge is getting counted too when sFlow is set
on the overlay bridge.

Both bugs are connected to sflows and were introduced by the commit in
the "Fixes:" tag below.

Signed-off-by: Zoltan Balogh 
CC: Sugesh Chandran 
Fixes: 7c12dfc527a5 ("tunneling: Avoid datapath-recirc by combining recirc 
actions at xlate.")
Signed-off-by: Ben Pfaff 

With those changes, I applied this to master and branch-2.9.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2] bridge: Fix custom stats' counters leak.

2018-01-23 Thread Ben Pfaff

On Tue, Jan 23, 2018 at 08:55:06AM +0300, Ilya Maximets wrote:
> The caller takes ownership over allocated array of counters.
> And it must free them.
> 
> CC: Michal Weglicki 
> Fixes: 971f4b394c6e ("netdev: Custom statistics.")
> Signed-off-by: Ilya Maximets 

Thanks, applied to master and branch-2.9.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH OVS 1/1] tc flower: reorder tunnel encap/decap actions

2018-01-23 Thread Simon Horman

2018/01/23 19:02 "Ben Pfaff" :

Simon, do you want to apply this?  I see that you reviewed it.


Sure, will do.

On Tue, Jan 23, 2018 at 02:08:42PM +, John Hurley wrote:
> The tc_flower conversion struct does not consider the order of actions.
> If an OvS rule matches on a tunnel (decap) and outputs to a new tunnel,
> the netlink conversion to TC will add the set tunnel key action before the
> unset, leading to an incorrect TC rule. This patch reorders the netlink
> generation to ensure a decap is done before an encap if both exist.
>
> Signed-off-by: John Hurley 
> Reviewed-by: Simon Horman 
> ---
>  lib/tc.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/lib/tc.c b/lib/tc.c
> index bfe20ce..914465a 100644
> --- a/lib/tc.c
> +++ b/lib/tc.c
> @@ -1379,6 +1379,11 @@ nl_msg_put_flower_acts(struct ofpbuf *request,
struct tc_flower *flower)
>  nl_msg_end_nested(request, act_offset);
>  }
>  }
> +if (flower->tunnel.tunnel) {
> +act_offset = nl_msg_start_nested(request, act_index++);
> +nl_msg_put_act_tunnel_key_release(request);
> +nl_msg_end_nested(request, act_offset);
> +}
>  if (flower->set.set) {
>  act_offset = nl_msg_start_nested(request, act_index++);
>  nl_msg_put_act_tunnel_key_set(request, flower->set.id,
> @@ -1389,11 +1394,6 @@ nl_msg_put_flower_acts(struct ofpbuf *request,
struct tc_flower *flower)
>flower->set.tp_dst);
>  nl_msg_end_nested(request, act_offset);
>  }
> -if (flower->tunnel.tunnel) {
> -act_offset = nl_msg_start_nested(request, act_index++);
> -nl_msg_put_act_tunnel_key_release(request);
> -nl_msg_end_nested(request, act_offset);
> -}
>  if (flower->vlan_pop) {
>  act_offset = nl_msg_start_nested(request, act_index++);
>  nl_msg_put_act_pop_vlan(request);
> --
> 1.9.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH OVS 1/1] tc flower: reorder tunnel encap/decap actions

2018-01-23 Thread Ben Pfaff

Simon, do you want to apply this?  I see that you reviewed it.

On Tue, Jan 23, 2018 at 02:08:42PM +, John Hurley wrote:
> The tc_flower conversion struct does not consider the order of actions.
> If an OvS rule matches on a tunnel (decap) and outputs to a new tunnel,
> the netlink conversion to TC will add the set tunnel key action before the
> unset, leading to an incorrect TC rule. This patch reorders the netlink
> generation to ensure a decap is done before an encap if both exist.
> 
> Signed-off-by: John Hurley 
> Reviewed-by: Simon Horman 
> ---
>  lib/tc.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/tc.c b/lib/tc.c
> index bfe20ce..914465a 100644
> --- a/lib/tc.c
> +++ b/lib/tc.c
> @@ -1379,6 +1379,11 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct 
> tc_flower *flower)
>  nl_msg_end_nested(request, act_offset);
>  }
>  }
> +if (flower->tunnel.tunnel) {
> +act_offset = nl_msg_start_nested(request, act_index++);
> +nl_msg_put_act_tunnel_key_release(request);
> +nl_msg_end_nested(request, act_offset);
> +}
>  if (flower->set.set) {
>  act_offset = nl_msg_start_nested(request, act_index++);
>  nl_msg_put_act_tunnel_key_set(request, flower->set.id,
> @@ -1389,11 +1394,6 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct 
> tc_flower *flower)
>flower->set.tp_dst);
>  nl_msg_end_nested(request, act_offset);
>  }
> -if (flower->tunnel.tunnel) {
> -act_offset = nl_msg_start_nested(request, act_index++);
> -nl_msg_put_act_tunnel_key_release(request);
> -nl_msg_end_nested(request, act_offset);
> -}
>  if (flower->vlan_pop) {
>  act_offset = nl_msg_start_nested(request, act_index++);
>  nl_msg_put_act_pop_vlan(request);
> -- 
> 1.9.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2] ovn-controller: add new external_id 'ovn-cms-options' to Chassis table

2018-01-23 Thread Ben Pfaff

I agree that this seems harmless and potentially useful, so I applied it
to master and branch-2.9.

On Tue, Jan 23, 2018 at 03:47:23PM +, Miguel Angel Ajo Pelayo wrote:
> Awesome it, this would be very helpful to signal chassis details to the cms
> in a consistent way.
> 
> It'd be very helpful if we can get this on 2.9, and although I know it's a
> feature it's tiny enough and independent enough which may not introduce any
> regression to OVN.
> 
> Best,
> Miguel Ángel
> 
> On Tue, Jan 23, 2018, 4:14 PM Daniel Alvarez  wrote:
> 
> > This patch makes ovn-controller sets the external_ids key
> > 'ovn-cms-options' to its own Chassis table entry copying its
> > contents from the same external_ids key in the local OpenvSwitch
> > database.
> >
> > The idea behind this patch is to allow setting general options
> > from the CMS Plugin to a particular chassis.
> >
> > A good example of an use case is when we want to schedule a router
> > on a chassis from OpenStack. In this case, we may want to exclude
> > some nodes because they are more likely to be restarted for
> > maintenance operations or they simply won't have external connectivity.
> > This way, if the CMS/deployment tool would set the external_ids
> > as:
> >
> > ovs-vsctl set open . external_ids:ovn-cms-options="enable-chassis-as-gw"
> >
> > Then ovn-controller will write the options to the Chassis table in
> > southbound database. This value can be later read by the CMS in order
> > to decide which Chassis are eligible to schedule a router on.
> >
> > Similarly, this new key would allow to specify additional options to
> > be consumed by the CMS.
> >
> > Signed-off-by: Daniel Alvarez 
> > ---
> >  ovn/controller/chassis.c| 13 -
> >  ovn/controller/ovn-controller.8.xml |  7 +++
> >  ovn/ovn-sb.xml  |  8 
> >  3 files changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/ovn/controller/chassis.c b/ovn/controller/chassis.c
> > index 521b04c..6b5286a 100644
> > --- a/ovn/controller/chassis.c
> > +++ b/ovn/controller/chassis.c
> > @@ -66,6 +66,12 @@ get_bridge_mappings(const struct smap *ext_ids)
> >  return smap_get_def(ext_ids, "ovn-bridge-mappings", "");
> >  }
> >
> > +static const char *
> > +get_cms_options(const struct smap *ext_ids)
> > +{
> > +return smap_get_def(ext_ids, "ovn-cms-options", "");
> > +}
> > +
> >  /* Returns this chassis's Chassis record, if it is available and is
> > currently
> >   * amenable to a transaction. */
> >  const struct sbrec_chassis *
> > @@ -119,6 +125,7 @@ chassis_run(struct controller_ctx *ctx, const char
> > *chassis_id,
> >  const char *bridge_mappings = get_bridge_mappings(>external_ids);
> >  const char *datapath_type =
> >  br_int && br_int->datapath_type ? br_int->datapath_type : "";
> > +const char *cms_options = get_cms_options(>external_ids);
> >
> >  struct ds iface_types = DS_EMPTY_INITIALIZER;
> >  ds_put_cstr(_types, "");
> > @@ -144,16 +151,20 @@ chassis_run(struct controller_ctx *ctx, const char
> > *chassis_id,
> >  = smap_get_def(_rec->external_ids, "datapath-type",
> > "");
> >  const char *chassis_iface_types
> >  = smap_get_def(_rec->external_ids, "iface-types", "");
> > +const char *chassis_cms_options
> > += get_cms_options(_rec->external_ids);
> >
> >  /* If any of the external-ids should change, update them. */
> >  if (strcmp(bridge_mappings, chassis_bridge_mappings) ||
> >  strcmp(datapath_type, chassis_datapath_type) ||
> > -strcmp(iface_types_str, chassis_iface_types)) {
> > +strcmp(iface_types_str, chassis_iface_types) ||
> > +strcmp(cms_options, chassis_cms_options)) {
> >  struct smap new_ids;
> >  smap_clone(_ids, _rec->external_ids);
> >  smap_replace(_ids, "ovn-bridge-mappings",
> > bridge_mappings);
> >  smap_replace(_ids, "datapath-type", datapath_type);
> >  smap_replace(_ids, "iface-types", iface_types_str);
> > +smap_replace(_ids, "ovn-cms-options", cms_options);
> >  sbrec_chassis_verify_external_ids(chassis_rec);
> >  sbrec_chassis_set_external_ids(chassis_rec, _ids);
> >  smap_destroy(_ids);
> > diff --git a/ovn/controller/ovn-controller.8.xml
> > b/ovn/controller/ovn-controller.8.xml
> > index 2af3db5..0df59ac 100644
> > --- a/ovn/controller/ovn-controller.8.xml
> > +++ b/ovn/controller/ovn-controller.8.xml
> > @@ -150,6 +150,13 @@
> >  network interface card, enabling encapsulation checksum may incur
> >  performance loss. In such cases, encapsulation checksums can be
> > disabled.
> >
> > +
> > +  external_ids:ovn-cms-options
> > +  
> > +A list of options that will be consumed by the CMS Plugin and
> > which
> > +specific to this particular chassis. An example

Re: [ovs-dev] [RFC] dpif-netlink: don't allocate per port netlink sockets

2018-01-23 Thread Ben Pfaff

On Mon, Jan 22, 2018 at 11:20:53PM +0100, Matteo Croce wrote:
> When using the kernel datapath OVS allocates a pool of sockets to handle
> netlink events. The number of sockets is: ports * n-handler-threads.
> n-handler-threads is user configurable but defaults to the number of cores.
> 
> On machines with lot of CPUs and ports, the number of sockets easily hits
> the process file descriptor limit, which is 65536, and will cause
> ovs-vswitchd to abort.
> 
> Change the number of allocated sockets to just n-handler-threads which,
> if set slighty greater than the number of cores, is enough to handle to
> handler the netlink events.
> 
> Replace the struct dpif_channel array with a single dpif_channel instance
> and edit the code accordingly. By doing so, the port idx information is lost
> in upcall event code, so the call to report_loss() must be commented, as it
> already is in the Windows code path.
> 
> Signed-off-by: Matteo Croce 

Per-port sockets help OVS to improve fairness, so that a single busy
port can't monopolize all slow-path resources.  We can't just throw that
away.  If there's a problem with too many netlink sockets, let's come up
with a solution, but it can't be to eliminate fairness entirely.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] bfd: Send BFD packets with DSCP CS6

2018-01-23 Thread Ben Pfaff

Thanks Raymond and Venkatesan.  That is exactly the kind of information
I wanted.  I incorporated all of your references into the commit message
and applied this to master and branch-2.9.

On Tue, Jan 23, 2018 at 05:35:11AM +, Venkatesan Pradeep wrote:
> Hi Raymond,
> 
> Thanks for providing the references.
> 
> @Ben - we had previously discussed the proposed change on the ovs-dev mailing 
> list:
>https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339784.html
>https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339785.html
> 
> Thanks,
> 
> Pradeep
> 
> -Original Message-
> From: ovs-dev-boun...@openvswitch.org 
> [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Raymond Burkholder
> Sent: Tuesday, January 23, 2018 7:53 AM
> To: ovs-dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] bfd: Send BFD packets with DSCP CS6
> 
> On 01/22/2018 03:36 PM, Ben Pfaff wrote:
> > Can you provide a reference to something that helps me understand why 
> > the new value is the "correct" one?  This is not my area of expertise 
> > and all I learn from the commit message is an assertion that the new 
> > value is better.
> 
> here is a short declaration:
> 
> http://www.ciscopress.com/articles/article.asp?p=357102=4
> 
> A long dissertation:
> 
> https://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/WAN_and_MAN/QoS_SRND/QoS-SRND-Book/QoSIntro.html
> 
> But in a nutshell:
> 
> Network engineers create various queue/drop policies based upon precedence.  
> Routing protocols are considered high priority/high precedence.  During link 
> saturation events, packets will get dropped. 
> By creating an egress policy where packets marked by CS6 are allowed 
> front-of-the-queue status, one can be sure that hello's from the various 
> protocols arrive when they need to, without delay and without loss.  On the 
> other hand, if the hellos are dropped as part of normal traffic operations, 
> then traffic routing will flap, leading to further congestion and drops.
> 
> CS6 is a 'well known' marker to network engineers. In many vendor's gear, it 
> is automatically assigned to routing protocol packets.
> 
> Since OVS does not perform queuing, and leaves that to the kernel edge 
> operations, the queue policies can be used to ensure timely egress of the BFD 
> packets during high utilization events.
> 
> 
> > 
> > Thanks,
> > 
> > Ben.
> > 
> > On Mon, Dec 25, 2017 at 04:59:22PM +, Venkatesan Pradeep wrote:
> >> Send BFD packets with TOS value equivalent to DSCP CS6 so that the 
> >> network can apply the right QoS for those packets. This can help 
> >> avoid BFD flaps due to network congestion.
> >>
> >> Signed-off-by: Venkatesan Pradeep 
> >> ---
> >>   lib/bfd.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/lib/bfd.c b/lib/bfd.c
> >> index 40cd0be..8d291bb 100644
> >> --- a/lib/bfd.c
> >> +++ b/lib/bfd.c
> >> @@ -612,7 +612,7 @@ bfd_put_packet(struct bfd *bfd, struct dp_packet *p,
> >>   ip->ip_ihl_ver = IP_IHL_VER(5, 4);
> >>   ip->ip_tot_len = htons(sizeof *ip + sizeof *udp + sizeof *msg);
> >>   ip->ip_ttl = MAXTTL;
> >> -ip->ip_tos = IPTOS_LOWDELAY | IPTOS_THROUGHPUT;
> >> +ip->ip_tos = IPTOS_PREC_INTERNETCONTROL;
> >>   ip->ip_proto = IPPROTO_UDP;
> >>   put_16aligned_be32(>ip_src, bfd->ip_src);
> >>   put_16aligned_be32(>ip_dst, bfd->ip_dst); 
> >> ___
> >> dev mailing list
> >> d...@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > 
> 
> --
> Raymond Burkholder
> r...@oneunified.net
> https://blog.raymond.burkholder.net
> 
> --
> This message has been scanned for viruses and dangerous content by 
> MailScanner, and is believed to be clean.
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v2 3/3] datapath-windows: Optimize conntrack lock implementation.

2018-01-23 Thread Anand Kumar

Currently, there is one global lock for conntrack module, which protects
conntrack entries and conntrack table. All the NAT operations are
performed holding this lock.

This becomes inefficient, as the number of conntrack entries grow.
With new implementation, we will have two PNDIS_RW_LOCK_EX locks in
conntrack.

1. ovsCtBucketLock - one rw lock per bucket of the conntrack table,
which is shared by all the ct entries that belong to the same bucket.
2. lock -  a rw lock in OVS_CT_ENTRY structure that protects the members
of conntrack entry.

Also, OVS_CT_ENTRY structure will have a lock reference(bucketLockRef)
to the corresponding OvsCtBucketLock of conntrack table.
We need this reference to retrieve ovsCtBucketLock from ct entry
for delete operation.

Signed-off-by: Anand Kumar 
---
v1->v2: Address potential memory leak in conntrack initialization.
---
 datapath-windows/ovsext/Conntrack-nat.c |   6 +
 datapath-windows/ovsext/Conntrack.c | 230 
 datapath-windows/ovsext/Conntrack.h |   3 +
 3 files changed, 154 insertions(+), 85 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack-nat.c 
b/datapath-windows/ovsext/Conntrack-nat.c
index 7975770..316c946 100644
--- a/datapath-windows/ovsext/Conntrack-nat.c
+++ b/datapath-windows/ovsext/Conntrack-nat.c
@@ -167,12 +167,16 @@ OvsNatPacket(OvsForwardingContext *ovsFwdCtx,
 {
 UINT32 natFlag;
 const struct ct_endpoint* endpoint;
+LOCK_STATE_EX lockState;
+/* XXX: Move conntrack locks out of NAT after implementing lock in NAT. */
+NdisAcquireRWLockRead(entry->lock, , 0);
 /* When it is NAT, only entry->rev_key contains NATTED address;
When it is unNAT, only entry->key contains the UNNATTED address;*/
 const OVS_CT_KEY *ctKey = reverse ? >key : >rev_key;
 BOOLEAN isSrcNat;
 
 if (!(natAction & (NAT_ACTION_SRC | NAT_ACTION_DST))) {
+NdisReleaseRWLock(entry->lock, );
 return;
 }
 isSrcNat = (((natAction & NAT_ACTION_SRC) && !reverse) ||
@@ -202,6 +206,7 @@ OvsNatPacket(OvsForwardingContext *ovsFwdCtx,
 }
 } else if (ctKey->dl_type == htons(ETH_TYPE_IPV6)){
 // XXX: IPv6 packet not supported yet.
+NdisReleaseRWLock(entry->lock, );
 return;
 }
 if (natAction & (NAT_ACTION_SRC_PORT | NAT_ACTION_DST_PORT)) {
@@ -215,6 +220,7 @@ OvsNatPacket(OvsForwardingContext *ovsFwdCtx,
 }
 }
 }
+NdisReleaseRWLock(entry->lock, );
 }
 
 
diff --git a/datapath-windows/ovsext/Conntrack.c 
b/datapath-windows/ovsext/Conntrack.c
index 7d56a50..f0ef5c5 100644
--- a/datapath-windows/ovsext/Conntrack.c
+++ b/datapath-windows/ovsext/Conntrack.c
@@ -31,7 +31,7 @@
 KSTART_ROUTINE OvsConntrackEntryCleaner;
 static PLIST_ENTRY ovsConntrackTable;
 static OVS_CT_THREAD_CTX ctThreadCtx;
-static PNDIS_RW_LOCK_EX ovsConntrackLockObj;
+static PNDIS_RW_LOCK_EX *ovsCtBucketLock;
 static PNDIS_RW_LOCK_EX ovsCtNatLockObj;
 extern POVS_SWITCH_CONTEXT gOvsSwitchContext;
 static LONG ctTotalEntries;
@@ -49,20 +49,14 @@ MapNlToCtTuple(POVS_MESSAGE msgIn, PNL_ATTR attr,
 NTSTATUS
 OvsInitConntrack(POVS_SWITCH_CONTEXT context)
 {
-NTSTATUS status;
+NTSTATUS status = STATUS_SUCCESS;
 HANDLE threadHandle = NULL;
 ctTotalEntries = 0;
+UINT32 numBucketLocks = CT_HASH_TABLE_SIZE;
 
 /* Init the sync-lock */
-ovsConntrackLockObj = NdisAllocateRWLock(context->NdisFilterHandle);
-if (ovsConntrackLockObj == NULL) {
-return STATUS_INSUFFICIENT_RESOURCES;
-}
-
 ovsCtNatLockObj = NdisAllocateRWLock(context->NdisFilterHandle);
 if (ovsCtNatLockObj == NULL) {
-NdisFreeRWLock(ovsConntrackLockObj);
-ovsConntrackLockObj = NULL;
 return STATUS_INSUFFICIENT_RESOURCES;
 }
 
@@ -71,15 +65,27 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
  * CT_HASH_TABLE_SIZE,
  OVS_CT_POOL_TAG);
 if (ovsConntrackTable == NULL) {
-NdisFreeRWLock(ovsConntrackLockObj);
-ovsConntrackLockObj = NULL;
 NdisFreeRWLock(ovsCtNatLockObj);
 ovsCtNatLockObj = NULL;
 return STATUS_INSUFFICIENT_RESOURCES;
 }
 
-for (int i = 0; i < CT_HASH_TABLE_SIZE; i++) {
+ovsCtBucketLock = OvsAllocateMemoryWithTag(sizeof(PNDIS_RW_LOCK_EX)
+   * CT_HASH_TABLE_SIZE,
+   OVS_CT_POOL_TAG);
+if (ovsCtBucketLock == NULL) {
+status = STATUS_INSUFFICIENT_RESOURCES;
+goto freeTable;
+}
+
+for (UINT32 i = 0; i < CT_HASH_TABLE_SIZE; i++) {
 InitializeListHead([i]);
+ovsCtBucketLock[i] = NdisAllocateRWLock(context->NdisFilterHandle);
+if (ovsCtBucketLock[i] == NULL) {
+status = STATUS_INSUFFICIENT_RESOURCES;
+numBucketLocks = i;
+goto freeBucketLock;
+}
 }
 
 /*

[ovs-dev] [PATCH v2 2/3] datapath-windows: Add a global level RW lock for NAT

2018-01-23 Thread Anand Kumar

Currently NAT module relies on the existing conntrack lock.
This patch provides a basic lock implementation for NAT module
in conntrack.

Signed-off-by: Anand Kumar 
Acked-by: Alin Gabriel Serdean 
---
 datapath-windows/ovsext/Conntrack.c | 36 ++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack.c 
b/datapath-windows/ovsext/Conntrack.c
index 3cde836..7d56a50 100644
--- a/datapath-windows/ovsext/Conntrack.c
+++ b/datapath-windows/ovsext/Conntrack.c
@@ -32,6 +32,7 @@ KSTART_ROUTINE OvsConntrackEntryCleaner;
 static PLIST_ENTRY ovsConntrackTable;
 static OVS_CT_THREAD_CTX ctThreadCtx;
 static PNDIS_RW_LOCK_EX ovsConntrackLockObj;
+static PNDIS_RW_LOCK_EX ovsCtNatLockObj;
 extern POVS_SWITCH_CONTEXT gOvsSwitchContext;
 static LONG ctTotalEntries;
 
@@ -58,6 +59,13 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
 return STATUS_INSUFFICIENT_RESOURCES;
 }
 
+ovsCtNatLockObj = NdisAllocateRWLock(context->NdisFilterHandle);
+if (ovsCtNatLockObj == NULL) {
+NdisFreeRWLock(ovsConntrackLockObj);
+ovsConntrackLockObj = NULL;
+return STATUS_INSUFFICIENT_RESOURCES;
+}
+
 /* Init the Hash Buffer */
 ovsConntrackTable = OvsAllocateMemoryWithTag(sizeof(LIST_ENTRY)
  * CT_HASH_TABLE_SIZE,
@@ -65,6 +73,8 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
 if (ovsConntrackTable == NULL) {
 NdisFreeRWLock(ovsConntrackLockObj);
 ovsConntrackLockObj = NULL;
+NdisFreeRWLock(ovsCtNatLockObj);
+ovsCtNatLockObj = NULL;
 return STATUS_INSUFFICIENT_RESOURCES;
 }
 
@@ -82,6 +92,9 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
 NdisFreeRWLock(ovsConntrackLockObj);
 ovsConntrackLockObj = NULL;
 
+NdisFreeRWLock(ovsCtNatLockObj);
+ovsCtNatLockObj = NULL;
+
 OvsFreeMemoryWithTag(ovsConntrackTable, OVS_CT_POOL_TAG);
 ovsConntrackTable = NULL;
 
@@ -111,7 +124,7 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
 VOID
 OvsCleanupConntrack(VOID)
 {
-LOCK_STATE_EX lockState;
+LOCK_STATE_EX lockState, lockStateNat;
 NdisAcquireRWLockWrite(ovsConntrackLockObj, , 0);
 ctThreadCtx.exit = 1;
 KeSetEvent(, 0, FALSE);
@@ -131,7 +144,11 @@ OvsCleanupConntrack(VOID)
 
 NdisFreeRWLock(ovsConntrackLockObj);
 ovsConntrackLockObj = NULL;
+NdisAcquireRWLockWrite(ovsCtNatLockObj, , 0);
 OvsNatCleanup();
+NdisReleaseRWLock(ovsCtNatLockObj, );
+NdisFreeRWLock(ovsCtNatLockObj);
+ovsCtNatLockObj = NULL;
 }
 
 static __inline VOID
@@ -197,15 +214,19 @@ OvsCtAddEntry(POVS_CT_ENTRY entry, 
OvsConntrackKeyLookupCtx *ctx,
 if (natInfo == NULL) {
 entry->natInfo.natAction = NAT_ACTION_NONE;
 } else {
+LOCK_STATE_EX lockStateNat;
+NdisAcquireRWLockWrite(ovsCtNatLockObj, , 0);
 if (OvsIsForwardNat(natInfo->natAction)) {
 entry->natInfo = *natInfo;
 if (!OvsNatTranslateCtEntry(entry)) {
+NdisReleaseRWLock(ovsCtNatLockObj, );
 return FALSE;
 }
 ctx->hash = OvsHashCtKey(>key);
 } else {
 entry->natInfo.natAction = natInfo->natAction;
 }
+NdisReleaseRWLock(ovsCtNatLockObj, );
 }
 
 entry->timestampStart = now;
@@ -358,7 +379,10 @@ OvsCtEntryDelete(POVS_CT_ENTRY entry, BOOLEAN forceDelete)
 }
 if (forceDelete || OvsCtEntryExpired(entry)) {
 if (entry->natInfo.natAction) {
+LOCK_STATE_EX lockStateNat;
+NdisAcquireRWLockWrite(ovsCtNatLockObj, , 0);
 OvsNatDeleteKey(>key);
+NdisReleaseRWLock(ovsCtNatLockObj, );
 }
 OvsPostCtEventEntry(entry, OVS_EVENT_CT_DELETE);
 RemoveEntryList(>link);
@@ -560,7 +584,10 @@ OvsCtSetupLookupCtx(OvsFlowKey *flowKey,
 return NDIS_STATUS_INVALID_PACKET;
 }
 
+LOCK_STATE_EX lockStateNat;
+NdisAcquireRWLockRead(ovsCtNatLockObj, , 0);
 natEntry = OvsNatLookup(>key, TRUE);
+NdisReleaseRWLock(ovsCtNatLockObj, );
 if (natEntry) {
 /* Translate address first for reverse NAT */
 ctx->key = natEntry->ctEntry->key;
@@ -813,8 +840,11 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx,
  */
 if (natInfo->natAction != NAT_ACTION_NONE)
 {
+LOCK_STATE_EX lockStateNat;
+NdisAcquireRWLockWrite(ovsCtNatLockObj, , 0);
 OvsNatPacket(fwdCtx, entry, entry->natInfo.natAction,
  key, ctx.reply);
+NdisReleaseRWLock(ovsCtNatLockObj, );
 }
 
 OvsCtSetMarkLabel(key, entry, mark, labels, );
@@ -1052,7 +1082,7 @@ OvsCtFlush(UINT16 zone, struct ovs_key_ct_tuple_ipv4 
*tuple)
 PLIST_ENTRY link, next;
 POVS_CT_ENTRY entry;
 
-LOCK_STATE_EX lockState;
+LOCK_STATE_EX lockState, lockStateNat;
 NdisAcquireRWLockWrite(ovsConntrackLockObj, ,

[ovs-dev] [PATCH v2 1/3] datapath-windows: Refactor conntrack code.

2018-01-23 Thread Anand Kumar

Some of the functions and  code are refactored
so that new conntrack lock can be implemented

Signed-off-by: Anand Kumar 
Acked-by: Alin Gabriel Serdean 
---
 datapath-windows/ovsext/Conntrack-nat.c |  11 +-
 datapath-windows/ovsext/Conntrack.c | 174 ++--
 datapath-windows/ovsext/Conntrack.h |   4 -
 3 files changed, 103 insertions(+), 86 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack-nat.c 
b/datapath-windows/ovsext/Conntrack-nat.c
index c778f12..7975770 100644
--- a/datapath-windows/ovsext/Conntrack-nat.c
+++ b/datapath-windows/ovsext/Conntrack-nat.c
@@ -93,26 +93,23 @@ NTSTATUS OvsNatInit()
 sizeof(LIST_ENTRY) * NAT_HASH_TABLE_SIZE,
 OVS_CT_POOL_TAG);
 if (ovsNatTable == NULL) {
-goto failNoMem;
+return STATUS_INSUFFICIENT_RESOURCES;
 }
 
 ovsUnNatTable = OvsAllocateMemoryWithTag(
 sizeof(LIST_ENTRY) * NAT_HASH_TABLE_SIZE,
 OVS_CT_POOL_TAG);
 if (ovsUnNatTable == NULL) {
-goto freeNatTable;
+OvsFreeMemoryWithTag(ovsNatTable, OVS_CT_POOL_TAG);
+return STATUS_INSUFFICIENT_RESOURCES;
 }
 
 for (int i = 0; i < NAT_HASH_TABLE_SIZE; i++) {
 InitializeListHead([i]);
 InitializeListHead([i]);
 }
-return STATUS_SUCCESS;
 
-freeNatTable:
-OvsFreeMemoryWithTag(ovsNatTable, OVS_CT_POOL_TAG);
-failNoMem:
-return STATUS_INSUFFICIENT_RESOURCES;
+return STATUS_SUCCESS;
 }
 
 /*
diff --git a/datapath-windows/ovsext/Conntrack.c 
b/datapath-windows/ovsext/Conntrack.c
index 169ec4f..3cde836 100644
--- a/datapath-windows/ovsext/Conntrack.c
+++ b/datapath-windows/ovsext/Conntrack.c
@@ -33,7 +33,7 @@ static PLIST_ENTRY ovsConntrackTable;
 static OVS_CT_THREAD_CTX ctThreadCtx;
 static PNDIS_RW_LOCK_EX ovsConntrackLockObj;
 extern POVS_SWITCH_CONTEXT gOvsSwitchContext;
-static UINT64 ctTotalEntries;
+static LONG ctTotalEntries;
 
 static __inline OvsCtFlush(UINT16 zone, struct ovs_key_ct_tuple_ipv4 *tuple);
 static __inline NDIS_STATUS
@@ -212,7 +212,7 @@ OvsCtAddEntry(POVS_CT_ENTRY entry, OvsConntrackKeyLookupCtx 
*ctx,
 InsertHeadList([ctx->hash & CT_HASH_TABLE_MASK],
>link);
 
-ctTotalEntries++;
+InterlockedIncrement((LONG volatile *));
 return TRUE;
 }
 
@@ -235,11 +235,6 @@ OvsCtEntryCreate(OvsForwardingContext *fwdCtx,
 *entryCreated = FALSE;
 state |= OVS_CS_F_NEW;
 
-parentEntry = OvsCtRelatedLookup(ctx->key, currentTime);
-if (parentEntry != NULL) {
-state |= OVS_CS_F_RELATED;
-}
-
 switch (ipProto) {
 case IPPROTO_TCP:
 {
@@ -283,6 +278,11 @@ OvsCtEntryCreate(OvsForwardingContext *fwdCtx,
 break;
 }
 
+parentEntry = OvsCtRelatedLookup(ctx->key, currentTime);
+if (parentEntry != NULL && state != OVS_CS_F_INVALID) {
+state |= OVS_CS_F_RELATED;
+}
+
 if (state != OVS_CS_F_INVALID && commit) {
 if (entry) {
 entry->parent = parentEntry;
@@ -315,6 +315,7 @@ OvsCtUpdateEntry(OVS_CT_ENTRY* entry,
  BOOLEAN reply,
  UINT64 now)
 {
+CT_UPDATE_RES status;
 switch (ipProto) {
 case IPPROTO_TCP:
 {
@@ -322,32 +323,23 @@ OvsCtUpdateEntry(OVS_CT_ENTRY* entry,
 const TCPHdr *tcp;
 tcp = OvsGetTcp(nbl, l4Offset, );
 if (!tcp) {
-return CT_UPDATE_INVALID;
+status =  CT_UPDATE_INVALID;
+break;
 }
-return OvsConntrackUpdateTcpEntry(entry, tcp, nbl, reply, now);
+status =  OvsConntrackUpdateTcpEntry(entry, tcp, nbl, reply, now);
+break;
 }
 case IPPROTO_ICMP:
-return OvsConntrackUpdateIcmpEntry(entry, reply, now);
+status =  OvsConntrackUpdateIcmpEntry(entry, reply, now);
+break;
 case IPPROTO_UDP:
-return OvsConntrackUpdateOtherEntry(entry, reply, now);
+status =  OvsConntrackUpdateOtherEntry(entry, reply, now);
+break;
 default:
-return CT_UPDATE_INVALID;
-}
-}
-
-static __inline VOID
-OvsCtEntryDelete(POVS_CT_ENTRY entry)
-{
-if (entry == NULL) {
-return;
-}
-if (entry->natInfo.natAction) {
-OvsNatDeleteKey(>key);
+status =  CT_UPDATE_INVALID;
+break;
 }
-OvsPostCtEventEntry(entry, OVS_EVENT_CT_DELETE);
-RemoveEntryList(>link);
-OvsFreeMemoryWithTag(entry, OVS_CT_POOL_TAG);
-ctTotalEntries--;
+return status;
 }
 
 static __inline BOOLEAN
@@ -358,6 +350,24 @@ OvsCtEntryExpired(POVS_CT_ENTRY entry)
 return entry->expiration < currentTime;
 }
 
+static __inline VOID
+OvsCtEntryDelete(POVS_CT_ENTRY entry, BOOLEAN forceDelete)
+{
+if (entry == NULL) {
+return;
+}
+if (forceDelete || OvsCtEntryExpired(entry)) {
+if (entry->natInfo.natAction) {
+OvsNatDeleteKey(>key);
+}
+OvsPostCtEventEntry(entry, OVS_EVENT_CT_DELETE);
+RemoveEntryList(>link);
+

[ovs-dev] [PATCH v2 0/3] datapath-windows: New lock implementation in conntrack

2018-01-23 Thread Anand Kumar

This patch series replaces existing one RW lock implemenation in
conntrack with two RW locks in conntrack and one RW lock in NAT.
---
v1->v2:
 - Patch 3, address review comments
---
Anand Kumar (3):
  datapath-windows: Refactor conntrack code.
  datapath-windows: Add a global level RW lock for NAT
  datapath-windows: Optimize conntrack lock implementation.

 datapath-windows/ovsext/Conntrack-nat.c |  17 +-
 datapath-windows/ovsext/Conntrack.c | 410 
 datapath-windows/ovsext/Conntrack.h |   7 +-
 3 files changed, 276 insertions(+), 158 deletions(-)

-- 
2.9.3.windows.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix incorrect handling of return values.

2018-01-23 Thread Ben Pfaff

On Tue, Jan 23, 2018 at 08:02:55PM +0800, huanglili wrote:
> From: Lili Huang 
> 
> The value cookie_offset should be 'size_t' type.
> 
> Change-Id: Id68b83d26b1235048cbcfd21429ad6915a7a2b36
> Signed-off-by: Lili Huang 

Thanks for the patch.

I get an error trying to apply this:

Applying: ofproto-dpif-xlate: Fix incorrect handling of return values.
error: patch failed: ofproto/ofproto-dpif-xlate.c:2900
error: ofproto/ofproto-dpif-xlate.c: patch does not apply
Patch failed at 0001 ofproto-dpif-xlate: Fix incorrect handling of return 
values.
The copy of the patch that failed is found in: 
/home/blp/nicira/ovs.git/worktrees/ovs/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] Show total_ports_on_switch when displaying logical_switch:

2018-01-23 Thread Ben Pfaff

On Tue, Jan 23, 2018 at 09:46:26AM -0600, Mark Michelson wrote:
> On 01/19/2018 11:08 PM, amgin...@gmail.com wrote:
> >From: aginwala 
> >
> >e.g. when running ovn-nbctl show ls, it's good to have total ports that are
> >attached to the switch.
> >
> >Signed-off-by: Aliasgar Ginwala 
> >---
> >  ovn/utilities/ovn-nbctl.c |  1 +
> >  tests/ovn-nbctl.at| 10 ++
> >  2 files changed, 7 insertions(+), 4 deletions(-)
> >
> >diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
> >index c9aa2fe..0f1e952 100644
> >--- a/ovn/utilities/ovn-nbctl.c
> >+++ b/ovn/utilities/ovn-nbctl.c
> >@@ -682,6 +682,7 @@ print_ls(const struct nbrec_logical_switch *ls, struct 
> >ds *s)
> >  ds_put_format(s, "router-port: %s\n", router_port);
> >  }
> >  }
> >+ds_put_format(s, "total_ports_on_switch: %u\n", ls->n_ports);
> >  }
> 
> I think the wording here is a bit strange when compared to other output from
> `ovn-nbctl show`. I think something like "total ports" instead of
> "total_ports_on_switch" would be more consistent.
> 
> I also think that if you're adding this for logical switches, you should do
> the same for logical routers.

In addition, I don't understand why this is valuable information to
show.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] netdev-dpdk: Fix memory leak in netdev_dpdk_configure_xstats().

2018-01-23 Thread Stokes, Ian

> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Tuesday, January 23, 2018 7:51 AM
> To: ovs-dev@openvswitch.org
> Cc: Heetae Ahn ; Ben Pfaff ;
> Stokes, Ian ; Ilya Maximets
> ; Weglicki, MichalX 
> Subject: [PATCH] netdev-dpdk: Fix memory leak in
> netdev_dpdk_configure_xstats().
> 

Thanks Ilya, applied to DPDK_MERGE and DPDK_MERGE_2_9.

Regards
Ian
> CC: Michal Weglicki 
> Fixes: 971f4b394c6e ("netdev: Custom statistics.")
> Signed-off-by: Ilya Maximets 
> ---
>  lib/netdev-dpdk.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index a37c8df..bdce304
> 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1274,6 +1274,8 @@ netdev_dpdk_configure_xstats(struct netdev_dpdk
> *dev)
>  VLOG_WARN("Can't get XSTATS IDs for port: %"PRIu8,
>dev->port_id);
>  }
> +
> +free(rte_xstats);
>  }
>  }
>  } else {
> --
> 2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] netdev-dpdk: Fix xstats leak on port destruction.

2018-01-23 Thread Stokes, Ian

> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Tuesday, January 23, 2018 7:52 AM
> To: ovs-dev@openvswitch.org
> Cc: Heetae Ahn ; Ben Pfaff ;
> Stokes, Ian ; Ilya Maximets
> ; Weglicki, MichalX 
> Subject: [PATCH] netdev-dpdk: Fix xstats leak on port destruction.
> 

Thanks Ilya, applied to DPDK_MERGE and DPDK_MERGE_2_9.

Regards
Ian

> CC: Michal Weglicki 
> Fixes: 971f4b394c6e ("netdev: Custom statistics.")
> Signed-off-by: Ilya Maximets 
> ---
>  lib/netdev-dpdk.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index bdce304..50a94d1
> 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -452,6 +452,8 @@ struct netdev_rxq_dpdk {  static void
> netdev_dpdk_destruct(struct netdev *netdev);  static void
> netdev_dpdk_vhost_destruct(struct netdev *netdev);
> 
> +static void netdev_dpdk_clear_xstats(struct netdev_dpdk *dev);
> +
>  int netdev_dpdk_get_vid(const struct netdev_dpdk *dev);
> 
>  struct ingress_policer *
> @@ -1104,6 +1106,7 @@ netdev_dpdk_destruct(struct netdev *netdev)
>  }
>  }
> 
> +netdev_dpdk_clear_xstats(dev);
>  free(dev->devargs);
>  common_destruct(dev);
> 
> @@ -1169,7 +1172,7 @@ netdev_dpdk_dealloc(struct netdev *netdev)  }
> 
>  static void
> -netdev_dpdk_clear_xstats(struct netdev_dpdk *dev) OVS_REQUIRES(dev-
> >mutex)
> +netdev_dpdk_clear_xstats(struct netdev_dpdk *dev)
>  {
>  /* If statistics are already allocated, we have to
>   * reconfigure, as port_id could have been changed. */
> --
> 2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v9] netdev-dpdk: Add support for vHost dequeue zero copy (experimental)

2018-01-23 Thread Loftus, Ciara

> 
> On 19.01.2018 20:19, Ciara Loftus wrote:
> > Zero copy is disabled by default. To enable it, set the 'dq-zero-copy'
> > option to 'true' when configuring the Interface:
> >
> > ovs-vsctl set Interface dpdkvhostuserclient0
> > options:vhost-server-path=/tmp/dpdkvhostuserclient0
> > options:dq-zero-copy=true
> >
> > When packets from a vHost device with zero copy enabled are destined for
> > a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port
> > must be set to a smaller value. 128 is recommended. This can be achieved
> > like so:
> >
> > ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> >
> > Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send
> > to should not exceed 128. Due to this requirement, the feature is
> > considered 'experimental'.
> >
> > Testing of the patch showed a 15% improvement when switching 512B
> > packets between vHost devices on different VMs on the same host when
> > zero copy was enabled on the transmitting device.
> >
> > Signed-off-by: Ciara Loftus 
> > ---
> > v9:
> > * Rebase
> > * Fix docs issue
> > * Move variable asignment inside mutex
> > * Reset dq-zero-copy value if vhost_driver_register fails
> >
> >  Documentation/intro/install/dpdk.rst |  2 +
> >  Documentation/topics/dpdk/vhost-user.rst | 73
> 
> >  NEWS |  1 +
> >  lib/netdev-dpdk.c| 11 -
> >  vswitchd/vswitch.xml | 11 +
> >  5 files changed, 97 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/intro/install/dpdk.rst
> b/Documentation/intro/install/dpdk.rst
> > index 3fecb5c..087eb88 100644
> > --- a/Documentation/intro/install/dpdk.rst
> > +++ b/Documentation/intro/install/dpdk.rst
> > @@ -518,6 +518,8 @@ The above command sets the number of rx queues
> for DPDK physical interface.
> >  The rx queues are assigned to pmd threads on the same NUMA node in a
> >  round-robin fashion.
> >
> > +.. _dpdk-queues-sizes:
> > +
> >  DPDK Physical Port Queue Sizes
> >  ~~~
> >
> > diff --git a/Documentation/topics/dpdk/vhost-user.rst
> b/Documentation/topics/dpdk/vhost-user.rst
> > index 8447e2d..95517a6 100644
> > --- a/Documentation/topics/dpdk/vhost-user.rst
> > +++ b/Documentation/topics/dpdk/vhost-user.rst
> > @@ -458,3 +458,76 @@ Sample XML
> >  
> >
> >  .. _QEMU documentation: http://git.qemu-
> project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-
> user.txt;h=7890d7169;hb=HEAD
> > +
> > +vhost-user Dequeue Zero Copy (experimental)
> > +---
> > +
> > +Normally when dequeuing a packet from a vHost User device, a memcpy
> operation
> > +must be used to copy that packet from guest address space to host
> address
> > +space. This memcpy can be removed by enabling dequeue zero-copy like
> so::
> > +
> > +$ ovs-vsctl add-port br0 dpdkvhostuserclient0 -- set Interface \
> > +dpdkvhostuserclient0 type=dpdkvhostuserclient \
> > +options:vhost-server-path=/tmp/dpdkvhostclient0 \
> > +options:dq-zero-copy=true
> > +
> > +With this feature enabled, a reference (pointer) to the packet is passed to
> > +the host, instead of a copy of the packet. Removing this memcpy can give
> a
> > +performance improvement for some use cases, for example switching
> large packets
> > +between different VMs. However additional packet loss may be
> observed.
> > +
> > +Note that the feature is disabled by default and must be explicitly enabled
> > +by setting the ``dq-zero-copy`` option to ``true`` while specifying the
> > +``vhost-server-path`` option as above. If you wish to split out the
> command
> > +into multiple commands as below, ensure ``dq-zero-copy`` is set before
> > +``vhost-server-path``::
> > +
> > +$ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-
> copy=true
> > +$ ovs-vsctl set Interface dpdkvhostuserclient0 \
> > +options:vhost-server-path=/tmp/dpdkvhostclient0
> > +
> > +The feature is only available to ``dpdkvhostuserclient`` port types.
> > +
> > +A limitation exists whereby if packets from a vHost port with
> > +``dq-zero-copy=true`` are destined for a ``dpdk`` type port, the number of
> tx
> > +descriptors (``n_txq_desc``) for that port must be reduced to a smaller
> number,
> > +128 being the recommended value. This can be achieved by issuing the
> following
> > +command::
> > +
> > +$ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> > +
> > +Note: The sum of the tx descriptors of all ``dpdk`` ports the VM will send
> to
> > +should not exceed 128. For example, in case of a bond over two physical
> ports
> > +in balance-tcp mode, one must divide 128 by the number of links in the
> bond.
> > +
> > +Refer to :ref:`dpdk-queues-sizes` for more information.
> > +
> > +The reason for this limitation is due to how the zero copy functionality is
> > +implemented. The vHost device's 'tx

[ovs-dev] [PATCH v10] netdev-dpdk: Add support for vHost dequeue zero copy (experimental)

2018-01-23 Thread Ciara Loftus

Zero copy is disabled by default. To enable it, set the 'dq-zero-copy'
option to 'true' when configuring the Interface:

ovs-vsctl set Interface dpdkvhostuserclient0
options:vhost-server-path=/tmp/dpdkvhostuserclient0
options:dq-zero-copy=true

When packets from a vHost device with zero copy enabled are destined for
a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port
must be set to a smaller value. 128 is recommended. This can be achieved
like so:

ovs-vsctl set Interface dpdkport options:n_txq_desc=128

Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send
to should not exceed 128. Due to this requirement, the feature is
considered 'experimental'.

Testing of the patch showed a 15% improvement when switching 512B
packets between vHost devices on different VMs on the same host when
zero copy was enabled on the transmitting device.

Signed-off-by: Ciara Loftus 
---
v10:
* Rebase
* Fix vhost flags

 Documentation/intro/install/dpdk.rst |  2 +
 Documentation/topics/dpdk/vhost-user.rst | 73 
 NEWS |  1 +
 lib/netdev-dpdk.c| 17 
 vswitchd/vswitch.xml | 11 +
 5 files changed, 104 insertions(+)

diff --git a/Documentation/intro/install/dpdk.rst 
b/Documentation/intro/install/dpdk.rst
index 040e62e..93411aa 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -518,6 +518,8 @@ The above command sets the number of rx queues for DPDK 
physical interface.
 The rx queues are assigned to pmd threads on the same NUMA node in a
 round-robin fashion.
 
+.. _dpdk-queues-sizes:
+
 DPDK Physical Port Queue Sizes
 ~~~
 
diff --git a/Documentation/topics/dpdk/vhost-user.rst 
b/Documentation/topics/dpdk/vhost-user.rst
index 8447e2d..95517a6 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -458,3 +458,76 @@ Sample XML
 
 
 .. _QEMU documentation: 
http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
+
+vhost-user Dequeue Zero Copy (experimental)
+---
+
+Normally when dequeuing a packet from a vHost User device, a memcpy operation
+must be used to copy that packet from guest address space to host address
+space. This memcpy can be removed by enabling dequeue zero-copy like so::
+
+$ ovs-vsctl add-port br0 dpdkvhostuserclient0 -- set Interface \
+dpdkvhostuserclient0 type=dpdkvhostuserclient \
+options:vhost-server-path=/tmp/dpdkvhostclient0 \
+options:dq-zero-copy=true
+
+With this feature enabled, a reference (pointer) to the packet is passed to
+the host, instead of a copy of the packet. Removing this memcpy can give a
+performance improvement for some use cases, for example switching large packets
+between different VMs. However additional packet loss may be observed.
+
+Note that the feature is disabled by default and must be explicitly enabled
+by setting the ``dq-zero-copy`` option to ``true`` while specifying the
+``vhost-server-path`` option as above. If you wish to split out the command
+into multiple commands as below, ensure ``dq-zero-copy`` is set before
+``vhost-server-path``::
+
+$ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
+$ ovs-vsctl set Interface dpdkvhostuserclient0 \
+options:vhost-server-path=/tmp/dpdkvhostclient0
+
+The feature is only available to ``dpdkvhostuserclient`` port types.
+
+A limitation exists whereby if packets from a vHost port with
+``dq-zero-copy=true`` are destined for a ``dpdk`` type port, the number of tx
+descriptors (``n_txq_desc``) for that port must be reduced to a smaller number,
+128 being the recommended value. This can be achieved by issuing the following
+command::
+
+$ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
+
+Note: The sum of the tx descriptors of all ``dpdk`` ports the VM will send to
+should not exceed 128. For example, in case of a bond over two physical ports
+in balance-tcp mode, one must divide 128 by the number of links in the bond.
+
+Refer to :ref:`dpdk-queues-sizes` for more information.
+
+The reason for this limitation is due to how the zero copy functionality is
+implemented. The vHost device's 'tx used vring', a virtio structure used for
+tracking used ie. sent descriptors, will only be updated when the NIC frees
+the corresponding mbuf. If we don't free the mbufs frequently enough, that
+vring will be starved and packets will no longer be processed. One way to
+ensure we don't encounter this scenario, is to configure ``n_txq_desc`` to a
+small enough number such that the 'mbuf free threshold' for the NIC will be hit
+more often and thus free mbufs more frequently. The value of 128 is suggested,
+but values of 64 and 256 have been tested and verified to work too, with
+differing

[ovs-dev] Test

2018-01-23 Thread Nandan Kulkarni

Hi there,

This email is to test OVS dev ML.

Regards,
Nandan Kulkarni
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2] ovn-controller: add new external_id 'ovn-cms-options' to Chassis table

2018-01-23 Thread Miguel Angel Ajo Pelayo

Awesome it, this would be very helpful to signal chassis details to the cms
in a consistent way.

It'd be very helpful if we can get this on 2.9, and although I know it's a
feature it's tiny enough and independent enough which may not introduce any
regression to OVN.

Best,
Miguel Ángel

On Tue, Jan 23, 2018, 4:14 PM Daniel Alvarez  wrote:

> This patch makes ovn-controller sets the external_ids key
> 'ovn-cms-options' to its own Chassis table entry copying its
> contents from the same external_ids key in the local OpenvSwitch
> database.
>
> The idea behind this patch is to allow setting general options
> from the CMS Plugin to a particular chassis.
>
> A good example of an use case is when we want to schedule a router
> on a chassis from OpenStack. In this case, we may want to exclude
> some nodes because they are more likely to be restarted for
> maintenance operations or they simply won't have external connectivity.
> This way, if the CMS/deployment tool would set the external_ids
> as:
>
> ovs-vsctl set open . external_ids:ovn-cms-options="enable-chassis-as-gw"
>
> Then ovn-controller will write the options to the Chassis table in
> southbound database. This value can be later read by the CMS in order
> to decide which Chassis are eligible to schedule a router on.
>
> Similarly, this new key would allow to specify additional options to
> be consumed by the CMS.
>
> Signed-off-by: Daniel Alvarez 
> ---
>  ovn/controller/chassis.c| 13 -
>  ovn/controller/ovn-controller.8.xml |  7 +++
>  ovn/ovn-sb.xml  |  8 
>  3 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/ovn/controller/chassis.c b/ovn/controller/chassis.c
> index 521b04c..6b5286a 100644
> --- a/ovn/controller/chassis.c
> +++ b/ovn/controller/chassis.c
> @@ -66,6 +66,12 @@ get_bridge_mappings(const struct smap *ext_ids)
>  return smap_get_def(ext_ids, "ovn-bridge-mappings", "");
>  }
>
> +static const char *
> +get_cms_options(const struct smap *ext_ids)
> +{
> +return smap_get_def(ext_ids, "ovn-cms-options", "");
> +}
> +
>  /* Returns this chassis's Chassis record, if it is available and is
> currently
>   * amenable to a transaction. */
>  const struct sbrec_chassis *
> @@ -119,6 +125,7 @@ chassis_run(struct controller_ctx *ctx, const char
> *chassis_id,
>  const char *bridge_mappings = get_bridge_mappings(>external_ids);
>  const char *datapath_type =
>  br_int && br_int->datapath_type ? br_int->datapath_type : "";
> +const char *cms_options = get_cms_options(>external_ids);
>
>  struct ds iface_types = DS_EMPTY_INITIALIZER;
>  ds_put_cstr(_types, "");
> @@ -144,16 +151,20 @@ chassis_run(struct controller_ctx *ctx, const char
> *chassis_id,
>  = smap_get_def(_rec->external_ids, "datapath-type",
> "");
>  const char *chassis_iface_types
>  = smap_get_def(_rec->external_ids, "iface-types", "");
> +const char *chassis_cms_options
> += get_cms_options(_rec->external_ids);
>
>  /* If any of the external-ids should change, update them. */
>  if (strcmp(bridge_mappings, chassis_bridge_mappings) ||
>  strcmp(datapath_type, chassis_datapath_type) ||
> -strcmp(iface_types_str, chassis_iface_types)) {
> +strcmp(iface_types_str, chassis_iface_types) ||
> +strcmp(cms_options, chassis_cms_options)) {
>  struct smap new_ids;
>  smap_clone(_ids, _rec->external_ids);
>  smap_replace(_ids, "ovn-bridge-mappings",
> bridge_mappings);
>  smap_replace(_ids, "datapath-type", datapath_type);
>  smap_replace(_ids, "iface-types", iface_types_str);
> +smap_replace(_ids, "ovn-cms-options", cms_options);
>  sbrec_chassis_verify_external_ids(chassis_rec);
>  sbrec_chassis_set_external_ids(chassis_rec, _ids);
>  smap_destroy(_ids);
> diff --git a/ovn/controller/ovn-controller.8.xml
> b/ovn/controller/ovn-controller.8.xml
> index 2af3db5..0df59ac 100644
> --- a/ovn/controller/ovn-controller.8.xml
> +++ b/ovn/controller/ovn-controller.8.xml
> @@ -150,6 +150,13 @@
>  network interface card, enabling encapsulation checksum may incur
>  performance loss. In such cases, encapsulation checksums can be
> disabled.
>
> +
> +  external_ids:ovn-cms-options
> +  
> +A list of options that will be consumed by the CMS Plugin and
> which
> +specific to this particular chassis. An example would be:
> +cms_option1,cms_option2:foo.
> +  
>  
>
>  
> diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
> index abc241e..4a75135 100644
> --- a/ovn/ovn-sb.xml
> +++ b/ovn/ovn-sb.xml
> @@ -241,6 +241,14 @@
>read-only. See ovn-controller(8) for more information.
>  
>
> +
> +  ovn-controller populates this key with the set of
> options
> +

Re: [ovs-dev] [PATCH] Show total_ports_on_switch when displaying logical_switch:

2018-01-23 Thread Mark Michelson


On 01/19/2018 11:08 PM, amgin...@gmail.com wrote:

From: aginwala 

e.g. when running ovn-nbctl show ls, it's good to have total ports that are
attached to the switch.

Signed-off-by: Aliasgar Ginwala 
---
  ovn/utilities/ovn-nbctl.c |  1 +
  tests/ovn-nbctl.at| 10 ++
  2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
index c9aa2fe..0f1e952 100644
--- a/ovn/utilities/ovn-nbctl.c
+++ b/ovn/utilities/ovn-nbctl.c
@@ -682,6 +682,7 @@ print_ls(const struct nbrec_logical_switch *ls, struct ds 
*s)
  ds_put_format(s, "router-port: %s\n", router_port);
  }
  }
+ds_put_format(s, "total_ports_on_switch: %u\n", ls->n_ports);
  }


I think the wording here is a bit strange when compared to other output 
from `ovn-nbctl show`. I think something like "total ports" instead of 
"total_ports_on_switch" would be more consistent.


I also think that if you're adding this for logical switches, you should 
do the same for logical routers.


  
  static void

diff --git a/tests/ovn-nbctl.at b/tests/ovn-nbctl.at
index 5ac4a6d..5fbaffd 100644
--- a/tests/ovn-nbctl.at
+++ b/tests/ovn-nbctl.at
@@ -43,15 +43,17 @@ AT_CHECK([ovn-nbctl ls-list | uuidfilt], [0], [dnl
  q
  AT_CHECK([ovn-nbctl show ls0])
  AT_CHECK([ovn-nbctl ls-add ls0])
-AT_CHECK([ovn-nbctl show ls0 | uuidfilt], [0],
-  [switch <0> (ls0)
+AT_CHECK([ovn-nbctl show ls0 | uuidfilt], [0], [dnl
+switch <0> (ls0)
+total_ports_on_switch: 0
  ])
  AT_CHECK([ovn-nbctl ls-add ls0], [1], [],
[ovn-nbctl: ls0: a switch with this name already exists
  ])
  AT_CHECK([ovn-nbctl --may-exist ls-add ls0])
-AT_CHECK([ovn-nbctl show ls0 | uuidfilt], [0],
-  [switch <0> (ls0)
+AT_CHECK([ovn-nbctl show ls0 | uuidfilt], [0], [dnl
+switch <0> (ls0)
+total_ports_on_switch: 0
  ])
  AT_CHECK([ovn-nbctl --add-duplicate ls-add ls0])
  AT_CHECK([ovn-nbctl --may-exist --add-duplicate ls-add ls0], [1], [],



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4] vswitchd: show DPDK version

2018-01-23 Thread Stokes, Ian

> On 01/15/2018 06:21 PM, Matteo Croce wrote:
> > Show DPDK version if Open vSwitch is compiled with DPDK support.
> > Version can be retrieved with `ovs-switchd --version` or from OVS logs.
> > Small change in ovs-ctl to avoid breakage on output change.
> >
> > Signed-off-by: Matteo Croce 
> 
> LGTM
> 
> Acked-by: Kevin Traynor 

Thanks all for the work put into this.

I've fixed the commit typo and merged it to DPDK_MERGE, I also think it can be 
merged to DPDK_MERGE_2_9 for the ovs 2.9 release.

Thanks
Ian
> 
> > ---
> > v3:
> >  print version in OVS logs too
> >  don't save the version in the DB
> >  use dpdk-stub.c instead of #ifdef
> >
> > v4:
> >  fix only patch subject, I left ovs-vsctl from the older v1/v2, sorry
> >
> >  lib/dpdk-stub.c | 5 +
> >  lib/dpdk.c  | 8 
> >  lib/dpdk.h  | 1 +
> >  utilities/ovs-ctl.in| 2 +-
> >  vswitchd/ovs-vswitchd.c | 1 +
> >  5 files changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c index
> > 36021807c..041cd0cbb 100644
> > --- a/lib/dpdk-stub.c
> > +++ b/lib/dpdk-stub.c
> > @@ -54,3 +54,8 @@ dpdk_vhost_iommu_enabled(void)  {
> >  return false;
> >  }
> > +
> > +void
> > +print_dpdk_version(void)
> > +{
> > +}
> > diff --git a/lib/dpdk.c b/lib/dpdk.c
> > index 6710d10fc..3f5a55fc1 100644
> > --- a/lib/dpdk.c
> > +++ b/lib/dpdk.c
> > @@ -24,6 +24,7 @@
> >
> >  #include 
> >  #include 
> > +#include 
> >  #ifdef DPDK_PDUMP
> >  #include 
> >  #include 
> > @@ -471,6 +472,7 @@ dpdk_init(const struct smap *ovs_other_config)
> >  static struct ovsthread_once once_enable =
> > OVSTHREAD_ONCE_INITIALIZER;
> >
> >  if (ovsthread_once_start(_enable)) {
> > +VLOG_INFO("Using %s", rte_version());
> >  VLOG_INFO("DPDK Enabled - initializing...");
> >  dpdk_init__(ovs_other_config);
> >  enabled = true;
> > @@ -501,3 +503,9 @@ dpdk_set_lcore_id(unsigned cpu)
> >  ovs_assert(cpu != NON_PMD_CORE_ID);
> >  RTE_PER_LCORE(_lcore_id) = cpu;
> >  }
> > +
> > +void
> > +print_dpdk_version(void)
> > +{
> > +puts(rte_version());
> > +}
> > diff --git a/lib/dpdk.h b/lib/dpdk.h
> > index dc58d968a..b04153591 100644
> > --- a/lib/dpdk.h
> > +++ b/lib/dpdk.h
> > @@ -38,5 +38,6 @@ void dpdk_init(const struct smap *ovs_other_config);
> > void dpdk_set_lcore_id(unsigned cpu);  const char
> > *dpdk_get_vhost_sock_dir(void);  bool dpdk_vhost_iommu_enabled(void);
> > +void print_dpdk_version(void);
> >
> >  #endif /* dpdk.h */
> > diff --git a/utilities/ovs-ctl.in b/utilities/ovs-ctl.in index
> > 1df56c4a5..ef06dd967 100755
> > --- a/utilities/ovs-ctl.in
> > +++ b/utilities/ovs-ctl.in
> > @@ -72,7 +72,7 @@ set_hostname () {
> >  set_system_ids () {
> >  set ovs_vsctl set Open_vSwitch .
> >
> > -OVS_VERSION=`ovs-vswitchd --version | sed 's/.*) //;1q'`
> > +OVS_VERSION=`ovs-vswitchd --version | awk '/Open vSwitch/{print
> > + $NF}'`
> >  set "$@" ovs-version="$OVS_VERSION"
> >
> >  case $SYSTEM_ID in
> > diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c index
> > d5e07c037..a5e8f4d39 100644
> > --- a/vswitchd/ovs-vswitchd.c
> > +++ b/vswitchd/ovs-vswitchd.c
> > @@ -187,6 +187,7 @@ parse_options(int argc, char *argv[], char
> > **unixctl_pathp)
> >
> >  case 'V':
> >  ovs_print_version(0, 0);
> > +print_dpdk_version();
> >  exit(EXIT_SUCCESS);
> >
> >  case OPT_MLOCKALL:
> >
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v2] ovn-controller: add new external_id 'ovn-cms-options' to Chassis table

2018-01-23 Thread Daniel Alvarez

This patch makes ovn-controller sets the external_ids key
'ovn-cms-options' to its own Chassis table entry copying its
contents from the same external_ids key in the local OpenvSwitch
database.

The idea behind this patch is to allow setting general options
from the CMS Plugin to a particular chassis.

A good example of an use case is when we want to schedule a router
on a chassis from OpenStack. In this case, we may want to exclude
some nodes because they are more likely to be restarted for
maintenance operations or they simply won't have external connectivity.
This way, if the CMS/deployment tool would set the external_ids
as:

ovs-vsctl set open . external_ids:ovn-cms-options="enable-chassis-as-gw"

Then ovn-controller will write the options to the Chassis table in
southbound database. This value can be later read by the CMS in order
to decide which Chassis are eligible to schedule a router on.

Similarly, this new key would allow to specify additional options to
be consumed by the CMS.

Signed-off-by: Daniel Alvarez 
---
 ovn/controller/chassis.c| 13 -
 ovn/controller/ovn-controller.8.xml |  7 +++
 ovn/ovn-sb.xml  |  8 
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/ovn/controller/chassis.c b/ovn/controller/chassis.c
index 521b04c..6b5286a 100644
--- a/ovn/controller/chassis.c
+++ b/ovn/controller/chassis.c
@@ -66,6 +66,12 @@ get_bridge_mappings(const struct smap *ext_ids)
 return smap_get_def(ext_ids, "ovn-bridge-mappings", "");
 }
 
+static const char *
+get_cms_options(const struct smap *ext_ids)
+{
+return smap_get_def(ext_ids, "ovn-cms-options", "");
+}
+
 /* Returns this chassis's Chassis record, if it is available and is currently
  * amenable to a transaction. */
 const struct sbrec_chassis *
@@ -119,6 +125,7 @@ chassis_run(struct controller_ctx *ctx, const char 
*chassis_id,
 const char *bridge_mappings = get_bridge_mappings(>external_ids);
 const char *datapath_type =
 br_int && br_int->datapath_type ? br_int->datapath_type : "";
+const char *cms_options = get_cms_options(>external_ids);
 
 struct ds iface_types = DS_EMPTY_INITIALIZER;
 ds_put_cstr(_types, "");
@@ -144,16 +151,20 @@ chassis_run(struct controller_ctx *ctx, const char 
*chassis_id,
 = smap_get_def(_rec->external_ids, "datapath-type", "");
 const char *chassis_iface_types
 = smap_get_def(_rec->external_ids, "iface-types", "");
+const char *chassis_cms_options
+= get_cms_options(_rec->external_ids);
 
 /* If any of the external-ids should change, update them. */
 if (strcmp(bridge_mappings, chassis_bridge_mappings) ||
 strcmp(datapath_type, chassis_datapath_type) ||
-strcmp(iface_types_str, chassis_iface_types)) {
+strcmp(iface_types_str, chassis_iface_types) ||
+strcmp(cms_options, chassis_cms_options)) {
 struct smap new_ids;
 smap_clone(_ids, _rec->external_ids);
 smap_replace(_ids, "ovn-bridge-mappings", bridge_mappings);
 smap_replace(_ids, "datapath-type", datapath_type);
 smap_replace(_ids, "iface-types", iface_types_str);
+smap_replace(_ids, "ovn-cms-options", cms_options);
 sbrec_chassis_verify_external_ids(chassis_rec);
 sbrec_chassis_set_external_ids(chassis_rec, _ids);
 smap_destroy(_ids);
diff --git a/ovn/controller/ovn-controller.8.xml 
b/ovn/controller/ovn-controller.8.xml
index 2af3db5..0df59ac 100644
--- a/ovn/controller/ovn-controller.8.xml
+++ b/ovn/controller/ovn-controller.8.xml
@@ -150,6 +150,13 @@
 network interface card, enabling encapsulation checksum may incur
 performance loss. In such cases, encapsulation checksums can be 
disabled.
   
+
+  external_ids:ovn-cms-options
+  
+A list of options that will be consumed by the CMS Plugin and which
+specific to this particular chassis. An example would be:
+cms_option1,cms_option2:foo.
+  
 
 
 
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index abc241e..4a75135 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -241,6 +241,14 @@
   read-only. See ovn-controller(8) for more information.
 
 
+
+  ovn-controller populates this key with the set of options
+  configured in the  column of the Open_vSwitch
+  database's  table.
+  See ovn-controller(8) for more information.
+
+
 
   The overall purpose of these columns is described under Common
   Columns at the beginning of this document.
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] ovn-controller: add new external_id 'ovn-cms-options' to Chassis table

2018-01-23 Thread Daniel Alvarez

This patch makes ovn-controller sets the external_ids key
'ovn-cms-options' to its own Chassis table entry copying its
contents from the same external_ids key in the local OpenvSwitch
database.

The idea behind this patch is to allow setting general options
from the CMS Plugin to a particular chassis.

A good example of an use case is when we want to schedule a router
on a chassis from OpenStack. In this case, we may want to exclude
some nodes because they are more likely to be restarted for
maintenance operations or they simply won't have external connectivity.
This way, if the CMS/deployment tool would set the external_ids
as:

ovs-vsctl set open . external_ids:ovn-cms-options="enable-chassis-as-gw"

Then ovn-controller will write the options to the Chassis table in
southbound database. This value can be later read by the CMS in order
to decide which Chassis are eligible to schedule a router on.

Similarly, this new key would allow to specify additional options to
be consumed by the CMS.
---
 ovn/controller/chassis.c| 13 -
 ovn/controller/ovn-controller.8.xml |  7 +++
 ovn/ovn-sb.xml  |  8 
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/ovn/controller/chassis.c b/ovn/controller/chassis.c
index 521b04c..6b5286a 100644
--- a/ovn/controller/chassis.c
+++ b/ovn/controller/chassis.c
@@ -66,6 +66,12 @@ get_bridge_mappings(const struct smap *ext_ids)
 return smap_get_def(ext_ids, "ovn-bridge-mappings", "");
 }
 
+static const char *
+get_cms_options(const struct smap *ext_ids)
+{
+return smap_get_def(ext_ids, "ovn-cms-options", "");
+}
+
 /* Returns this chassis's Chassis record, if it is available and is currently
  * amenable to a transaction. */
 const struct sbrec_chassis *
@@ -119,6 +125,7 @@ chassis_run(struct controller_ctx *ctx, const char 
*chassis_id,
 const char *bridge_mappings = get_bridge_mappings(>external_ids);
 const char *datapath_type =
 br_int && br_int->datapath_type ? br_int->datapath_type : "";
+const char *cms_options = get_cms_options(>external_ids);
 
 struct ds iface_types = DS_EMPTY_INITIALIZER;
 ds_put_cstr(_types, "");
@@ -144,16 +151,20 @@ chassis_run(struct controller_ctx *ctx, const char 
*chassis_id,
 = smap_get_def(_rec->external_ids, "datapath-type", "");
 const char *chassis_iface_types
 = smap_get_def(_rec->external_ids, "iface-types", "");
+const char *chassis_cms_options
+= get_cms_options(_rec->external_ids);
 
 /* If any of the external-ids should change, update them. */
 if (strcmp(bridge_mappings, chassis_bridge_mappings) ||
 strcmp(datapath_type, chassis_datapath_type) ||
-strcmp(iface_types_str, chassis_iface_types)) {
+strcmp(iface_types_str, chassis_iface_types) ||
+strcmp(cms_options, chassis_cms_options)) {
 struct smap new_ids;
 smap_clone(_ids, _rec->external_ids);
 smap_replace(_ids, "ovn-bridge-mappings", bridge_mappings);
 smap_replace(_ids, "datapath-type", datapath_type);
 smap_replace(_ids, "iface-types", iface_types_str);
+smap_replace(_ids, "ovn-cms-options", cms_options);
 sbrec_chassis_verify_external_ids(chassis_rec);
 sbrec_chassis_set_external_ids(chassis_rec, _ids);
 smap_destroy(_ids);
diff --git a/ovn/controller/ovn-controller.8.xml 
b/ovn/controller/ovn-controller.8.xml
index 2af3db5..0df59ac 100644
--- a/ovn/controller/ovn-controller.8.xml
+++ b/ovn/controller/ovn-controller.8.xml
@@ -150,6 +150,13 @@
 network interface card, enabling encapsulation checksum may incur
 performance loss. In such cases, encapsulation checksums can be 
disabled.
   
+
+  external_ids:ovn-cms-options
+  
+A list of options that will be consumed by the CMS Plugin and which
+specific to this particular chassis. An example would be:
+cms_option1,cms_option2:foo.
+  
 
 
 
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index abc241e..4a75135 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -241,6 +241,14 @@
   read-only. See ovn-controller(8) for more information.
 
 
+
+  ovn-controller populates this key with the set of options
+  configured in the  column of the Open_vSwitch
+  database's  table.
+  See ovn-controller(8) for more information.
+
+
 
   The overall purpose of these columns is described under Common
   Columns at the beginning of this document.
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH OVS 1/1] tc flower: reorder tunnel encap/decap actions

2018-01-23 Thread John Hurley

The tc_flower conversion struct does not consider the order of actions.
If an OvS rule matches on a tunnel (decap) and outputs to a new tunnel,
the netlink conversion to TC will add the set tunnel key action before the
unset, leading to an incorrect TC rule. This patch reorders the netlink
generation to ensure a decap is done before an encap if both exist.

Signed-off-by: John Hurley 
Reviewed-by: Simon Horman 
---
 lib/tc.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/tc.c b/lib/tc.c
index bfe20ce..914465a 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -1379,6 +1379,11 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct 
tc_flower *flower)
 nl_msg_end_nested(request, act_offset);
 }
 }
+if (flower->tunnel.tunnel) {
+act_offset = nl_msg_start_nested(request, act_index++);
+nl_msg_put_act_tunnel_key_release(request);
+nl_msg_end_nested(request, act_offset);
+}
 if (flower->set.set) {
 act_offset = nl_msg_start_nested(request, act_index++);
 nl_msg_put_act_tunnel_key_set(request, flower->set.id,
@@ -1389,11 +1394,6 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct 
tc_flower *flower)
   flower->set.tp_dst);
 nl_msg_end_nested(request, act_offset);
 }
-if (flower->tunnel.tunnel) {
-act_offset = nl_msg_start_nested(request, act_index++);
-nl_msg_put_act_tunnel_key_release(request);
-nl_msg_end_nested(request, act_offset);
-}
 if (flower->vlan_pop) {
 act_offset = nl_msg_start_nested(request, act_index++);
 nl_msg_put_act_pop_vlan(request);
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2] bridge: Fix custom stats' counters leak.

2018-01-23 Thread Mark Michelson


On 01/22/2018 11:55 PM, Ilya Maximets wrote:

The caller takes ownership over allocated array of counters.
And it must free them.

CC: Michal Weglicki 
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets 


Acked-by: Mark Michelson 


---
  vswitchd/bridge.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index d80da1c..02d97de 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -2432,6 +2432,7 @@ iface_refresh_stats(struct iface *iface)
  
  free(values);

  free(keys);
+netdev_free_custom_stats_counters(_stats);
  }
  
  static void




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v9] netdev-dpdk: Add support for vHost dequeue zero copy (experimental)

2018-01-23 Thread Ilya Maximets

On 19.01.2018 20:19, Ciara Loftus wrote:
> Zero copy is disabled by default. To enable it, set the 'dq-zero-copy'
> option to 'true' when configuring the Interface:
> 
> ovs-vsctl set Interface dpdkvhostuserclient0
> options:vhost-server-path=/tmp/dpdkvhostuserclient0
> options:dq-zero-copy=true
> 
> When packets from a vHost device with zero copy enabled are destined for
> a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port
> must be set to a smaller value. 128 is recommended. This can be achieved
> like so:
> 
> ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> 
> Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send
> to should not exceed 128. Due to this requirement, the feature is
> considered 'experimental'.
> 
> Testing of the patch showed a 15% improvement when switching 512B
> packets between vHost devices on different VMs on the same host when
> zero copy was enabled on the transmitting device.
> 
> Signed-off-by: Ciara Loftus 
> ---
> v9:
> * Rebase
> * Fix docs issue
> * Move variable asignment inside mutex
> * Reset dq-zero-copy value if vhost_driver_register fails
> 
>  Documentation/intro/install/dpdk.rst |  2 +
>  Documentation/topics/dpdk/vhost-user.rst | 73 
> 
>  NEWS |  1 +
>  lib/netdev-dpdk.c| 11 -
>  vswitchd/vswitch.xml | 11 +
>  5 files changed, 97 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/intro/install/dpdk.rst 
> b/Documentation/intro/install/dpdk.rst
> index 3fecb5c..087eb88 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -518,6 +518,8 @@ The above command sets the number of rx queues for DPDK 
> physical interface.
>  The rx queues are assigned to pmd threads on the same NUMA node in a
>  round-robin fashion.
>  
> +.. _dpdk-queues-sizes:
> +
>  DPDK Physical Port Queue Sizes
>  ~~~
>  
> diff --git a/Documentation/topics/dpdk/vhost-user.rst 
> b/Documentation/topics/dpdk/vhost-user.rst
> index 8447e2d..95517a6 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -458,3 +458,76 @@ Sample XML
>  
>  
>  .. _QEMU documentation: 
> http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
> +
> +vhost-user Dequeue Zero Copy (experimental)
> +---
> +
> +Normally when dequeuing a packet from a vHost User device, a memcpy operation
> +must be used to copy that packet from guest address space to host address
> +space. This memcpy can be removed by enabling dequeue zero-copy like so::
> +
> +$ ovs-vsctl add-port br0 dpdkvhostuserclient0 -- set Interface \
> +dpdkvhostuserclient0 type=dpdkvhostuserclient \
> +options:vhost-server-path=/tmp/dpdkvhostclient0 \
> +options:dq-zero-copy=true
> +
> +With this feature enabled, a reference (pointer) to the packet is passed to
> +the host, instead of a copy of the packet. Removing this memcpy can give a
> +performance improvement for some use cases, for example switching large 
> packets
> +between different VMs. However additional packet loss may be observed.
> +
> +Note that the feature is disabled by default and must be explicitly enabled
> +by setting the ``dq-zero-copy`` option to ``true`` while specifying the
> +``vhost-server-path`` option as above. If you wish to split out the command
> +into multiple commands as below, ensure ``dq-zero-copy`` is set before
> +``vhost-server-path``::
> +
> +$ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> +$ ovs-vsctl set Interface dpdkvhostuserclient0 \
> +options:vhost-server-path=/tmp/dpdkvhostclient0
> +
> +The feature is only available to ``dpdkvhostuserclient`` port types.
> +
> +A limitation exists whereby if packets from a vHost port with
> +``dq-zero-copy=true`` are destined for a ``dpdk`` type port, the number of tx
> +descriptors (``n_txq_desc``) for that port must be reduced to a smaller 
> number,
> +128 being the recommended value. This can be achieved by issuing the 
> following
> +command::
> +
> +$ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> +
> +Note: The sum of the tx descriptors of all ``dpdk`` ports the VM will send to
> +should not exceed 128. For example, in case of a bond over two physical ports
> +in balance-tcp mode, one must divide 128 by the number of links in the bond.
> +
> +Refer to :ref:`dpdk-queues-sizes` for more information.
> +
> +The reason for this limitation is due to how the zero copy functionality is
> +implemented. The vHost device's 'tx used vring', a virtio structure used for
> +tracking used ie. sent descriptors, will only be updated when the NIC frees
> +the corresponding mbuf. If we don't free the mbufs frequently enough, that
> +vring will be

Re: [ovs-dev] [PATCH v3] netdev-dpdk: fix port addition for ports sharing same PCI id

2018-01-23 Thread Stokes, Ian



> -Original Message-
> From: Yuanhan Liu [mailto:y...@fridaylinux.org]
> Sent: Friday, January 19, 2018 2:48 PM
> To: Stokes, Ian 
> Cc: d...@openvswitch.org; Shahaf Shuler ; Loftus,
> Ciara ; Thomas Monjalon ;
> Kevin Traynor ; Ilya Maximets
> (i.maxim...@samsung.com) ; Olga Shern
> 
> Subject: Re: [PATCH v3] netdev-dpdk: fix port addition for ports sharing
> same PCI id
> 
> On Fri, Jan 19, 2018 at 11:03:52AM +, Stokes, Ian wrote:
> > > -Original Message-
> > > From: Yuanhan Liu [mailto:y...@fridaylinux.org]
> > > Sent: Wednesday, January 10, 2018 3:05 AM
> > > To: d...@openvswitch.org
> > > Cc: Stokes, Ian ; Shahaf Shuler
> > > ; Yuanhan Liu ; Loftus,
> > > Ciara ; Thomas Monjalon
> > > ; Kevin Traynor 
> > > Subject: [PATCH v3] netdev-dpdk: fix port addition for ports sharing
> > > same PCI id
> > >
> > > Some NICs have only one PCI address associated with multiple ports.
> > > This patch extends the dpdk-devargs option's format to cater for such
> devices.
> > >
> > > To achieve that, this patch uses a new syntax that will be adapted
> > > and implemented in future DPDK release (likely, v18.05):
> > > http://dpdk.org/ml/archives/dev/2017-December/084234.html
> > >
> > > And since it's the DPDK duty to parse the (complete and full) syntax
> > > and this patch is more likely to serve as an intermediate
> > > workaround, here I take a simpler and shorter syntax from it (note
> > > it's allowed to have only one category being provided):
> > > class=eth,mac=00:11:22:33:44:55:66
> > >
> > > Also, old compatibility is kept. Users can still go on with using
> > > the PCI id to add a port (if that's enough for them). Meaning, this
> > > patch will not break anything.
> > >
> > > This patch is basically based on the one from Ciara:
> > > https://mail.openvswitch.org/pipermail/ovs-dev/2017-
> > > October/339496.html
> > >
> > > Cc: Loftus Ciara 
> > > Cc: Thomas Monjalon 
> > > Cc: Kevin Traynor 
> > > Signed-off-by: Yuanhan Liu 
> >
> > Thanks for the work on this Yuanhan, it's a valuable feature/bug fix.
> >
> > At this point I think this aligns to the DPDK model?
> 
> Yes, I think so.
> 
> >
> > If so, and  if there are no other comments on this patchset then I plan
> to merge this to DPDK merge over the weekend.
> >
> 
> Thank you for taking care of it.

Thanks for all the work on this, I've pushed this to DPDK_MERGE and will also 
push it to DPDK_MERGE_2_9 for it to be part of the 2.9 release.

Ian
> 
>   --yliu

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] netdev-dpdk: Refactor custom stats.

2018-01-23 Thread Ilya Maximets

This code is overcomplicated and completely unreadable. And a
bunch of recently fixed memory leaks confirms that statement.

Main concerns that were fixed:
* Too big nesting level.
* Useless checks like pointer checking after xmalloc.
* Misleading comments.
* Bad names of the variables.

As a bonus, size of the code significantly reduced.

Signed-off-by: Ilya Maximets 
---

This patch made on top of memory leaks' fixes:
* https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343537.html
* https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343538.html

 lib/netdev-dpdk.c | 215 --
 1 file changed, 80 insertions(+), 135 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 50a94d1..e834c7e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -437,10 +437,9 @@ struct netdev_dpdk {
 
 PADDED_MEMBERS(CACHE_LINE_SIZE,
 /* Names of all XSTATS counters */
-struct rte_eth_xstat_name *rte_xstats_names;
-int rte_xstats_names_size;
-int rte_xstats_ids_size;
-uint64_t *rte_xstats_ids;
+struct rte_eth_xstat_name *xstats_names;
+uint64_t *xstats_ids;
+int xstats_count;
 );
 };
 
@@ -901,6 +900,8 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
 dev->vhost_reconfigured = false;
 dev->attached = false;
 
+dev->xstats_count = 0;
+
 ovsrcu_init(>qos_conf, NULL);
 
 ovsrcu_init(>ingress_policer, NULL);
@@ -925,13 +926,6 @@ common_construct(struct netdev *netdev, dpdk_port_t 
port_no,
 ovs_list_push_back(_list, >list_node);
 
 netdev_request_reconfigure(netdev);
-
-dev->rte_xstats_names = NULL;
-dev->rte_xstats_names_size = 0;
-
-dev->rte_xstats_ids = NULL;
-dev->rte_xstats_ids_size = 0;
-
 return 0;
 }
 
@@ -1174,123 +1168,83 @@ netdev_dpdk_dealloc(struct netdev *netdev)
 static void
 netdev_dpdk_clear_xstats(struct netdev_dpdk *dev)
 {
-/* If statistics are already allocated, we have to
- * reconfigure, as port_id could have been changed. */
-if (dev->rte_xstats_names) {
-free(dev->rte_xstats_names);
-dev->rte_xstats_names = NULL;
-dev->rte_xstats_names_size = 0;
-}
-if (dev->rte_xstats_ids) {
-free(dev->rte_xstats_ids);
-dev->rte_xstats_ids = NULL;
-dev->rte_xstats_ids_size = 0;
-}
-}
-
-static const char*
-netdev_dpdk_get_xstat_name(struct netdev_dpdk *dev, uint64_t id)
-{
-if (id >= dev->rte_xstats_names_size) {
-return "UNKNOWN";
+if (dev->xstats_count) {
+free(dev->xstats_ids);
+free(dev->xstats_names);
+dev->xstats_count = 0;
 }
-return dev->rte_xstats_names[id].name;
 }
 
 static bool
 netdev_dpdk_configure_xstats(struct netdev_dpdk *dev)
 OVS_REQUIRES(dev->mutex)
 {
-int rte_xstats_len;
-bool ret;
-struct rte_eth_xstat *rte_xstats;
-uint64_t id;
-int xstats_no;
-const char *name;
-
-/* Retrieving all XSTATS names. If something will go wrong
- * or amount of counters will be equal 0, rte_xstats_names
- * buffer will be marked as NULL, and any further xstats
- * query won't be performed (e.g. during netdev_dpdk_get_stats
- * execution). */
-
-ret = false;
-rte_xstats = NULL;
-
-if (dev->rte_xstats_names == NULL || dev->rte_xstats_ids == NULL) {
-dev->rte_xstats_names_size =
-rte_eth_xstats_get_names(dev->port_id, NULL, 0);
-
-if (dev->rte_xstats_names_size < 0) {
-VLOG_WARN("Cannot get XSTATS for port: %"PRIu8, dev->port_id);
-dev->rte_xstats_names_size = 0;
-} else {
-/* Reserve memory for xstats names and values */
-dev->rte_xstats_names = xcalloc(dev->rte_xstats_names_size,
-sizeof *dev->rte_xstats_names);
-
-if (dev->rte_xstats_names) {
-/* Retreive xstats names */
-rte_xstats_len =
-rte_eth_xstats_get_names(dev->port_id,
- dev->rte_xstats_names,
- dev->rte_xstats_names_size);
-
-if (rte_xstats_len < 0) {
-VLOG_WARN("Cannot get XSTATS names for port: %"PRIu8,
-   dev->port_id);
-goto out;
-} else if (rte_xstats_len != dev->rte_xstats_names_size) {
-VLOG_WARN("XSTATS size doesn't match for port: %"PRIu8,
-  dev->port_id);
-goto out;
-}
-
-dev->rte_xstats_ids = xcalloc(dev->rte_xstats_names_size,
-  sizeof(uint64_t));
-
-/* We have to calculate number of counters */
-rte_xstats = xmalloc(rte_xstats_len * sizeof *rte_xstats);
-

[ovs-dev] Updated Rackspace Users List

2018-01-23 Thread clara . stevens

style="color:rgb(31,78,121)">Hi,style="color:rgb(31,78,121)"> 


We have  
the new updated Rackspace Users lists with emails and complete  
contact information
for your business leads.style="color:rgb(31,78,121)"> 


We also  
have other technology users like: AWS, Azure, Heroku, Amazon EC2, Amazon  
S3,

OpenStack, Salesforce, Joyent, VMware, NetSuite, SoftLayer, Salesforce CRM,
Office365, Microsoft Dynamics AX, Sage and many more…..style="color:rgb(31,78,121)"> 


style="color:rgb(31,78,121)">Information
Fields: Name, Title, Email,  
Phone, Company Name, Physical Address, City, State,
Zip Code, Country, Web Address, Employee Size, Revenue Size and  
Industry. 


We also  
provide IT Decision Makers, Sales and Marketing
Decision Makers, C-level Titles and Decision Makers from all  
Departments. 


style="color:rgb(31,78,121)">Note:style="color:rgb(31,78,121)"> If Rackspace
Users is not relevant to you please reply back with your Target Market, we  
have
all types of target market available.style="color:rgb(31,78,121)"> 


style="color:rgb(31,78,121)">Regards, style="color:rgb(31,78,121)">Clara  
Stevensstyle="color:rgb(31,78,121)">Marketing Executivestyle="color:rgb(31,78,121)"> 


To opt out  
please response REMOVE
powered by GSM. Free mail merge and  
email marketing software for Gmail.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] ofproto-dpif-xlate: Fix incorrect handling of return values.

2018-01-23 Thread huanglili

From: Lili Huang 

The value cookie_offset should be 'size_t' type.

Change-Id: Id68b83d26b1235048cbcfd21429ad6915a7a2b36
Signed-off-by: Lili Huang 
---
 ofproto/ofproto-dpif-xlate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index d94e9dc..e1e6194 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -2900,7 +2900,7 @@ compose_sample_action(struct xlate_ctx *ctx,
 ctx->xbridge, ctx->xin->flow.in_port.ofp_port);
 uint32_t pid = dpif_port_get_pid(ctx->xbridge->dpif, odp_port,
  flow_hash_5tuple(>xin->flow, 0));
-int cookie_offset = odp_put_userspace_action(pid, cookie, cookie_size,
+size_t cookie_offset = odp_put_userspace_action(pid, cookie, cookie_size,
  tunnel_out_port,
  include_actions,
  ctx->odp_actions);
-- 
1.9.5.msysgit.1


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [dpdk-dev] About : Enable optional dequeue zero copy for vHost User

2018-01-23 Thread Chen, Junjie J

Hi
Which version of dpdk you are using? I have some fixes for dequeue zero copy 
and now in 18.02-rc1, you can try 18.02-r1.

Cheers
JJ

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Loftus, Ciara
> Sent: Tuesday, January 23, 2018 6:46 PM
> To: liyang07 
> Cc: d...@dpdk.org; d...@openvswitch.org
> Subject: Re: [dpdk-dev] About : Enable optional dequeue zero copy for vHost
> User
> 
> Hi,
> 
> For the meantime this feature is proposed as ‘experimental’ for OVS DPDK.
> Unless you are transmitting to a NIC, you don’t need to set the n_txq_desc.
> My testing has been only with a DPDK driver in the guest. Have you tried that
> option?
> 
> Thanks,
> Ciara
> 
> From: liyang07 [mailto:liyan...@corp.netease.com]
> Sent: Wednesday, January 17, 2018 10:41 AM
> To: Loftus, Ciara 
> Cc: d...@dpdk.org
> Subject: About : Enable optional dequeue zero copy for vHost User
> 
> 
> Hi Ciara,
> 
> I am tesing the function of "vHost dequeue zero copy" for vm2vm on a
> host, and I have some problems:
> 
> 1. The networking is OK before run iperf, I can ping successed from vm1
> to vm2，but after run iperf, the networking between vm1 and vm2 is down;(I
> think n_txq_desc cause the problem)
> 
> 2. I know the limitation about n_txq_desc, but I cannot set the
> n_txq_desc for dpdkport while the vms on the same host, because there is
> no dpdkports work fow my testing;
> 
> 
> 
> Thus, how can I resolve it, thanks.
> 
> 
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] mbuf pool sizing

2018-01-23 Thread Kevin Traynor

On 01/17/2018 07:48 PM, Venkatesan Pradeep wrote:
> Hi,
> 
> Assuming that all ports use the same MTU,  in OVS2.8 and earlier, a single 
> mempool of 256K buffers (MAX_NB_MBUF = 4096 * 64) will be created and shared 
> by all the ports
> 
> With the OVS2.9 mempool patches, we have port specific allocation and the 
> number of mbufs created for each port is based on the following formula (with 
> a lower limit of MIN_NB_MBUF = 4096*4)
>n_mbufs = dev->requested_n_rxq * dev->requested_rxq_size
>   + dev->requested_n_txq * dev->requested_txq_size
>   + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
>   + MIN_NB_MBUF;
> 
> Using minimal value (1) for n_rxq and n_rxq and default value (2048) for 
> requested_rxq_size and requested_txq_size, the above translates to
>   n_mbufs = 1*2048 + 1*2048 + 1*32 + 4096*4  = 20512
> 
> Assuming all ports have the same MTU, this means that approximately 13 ports 
> in OVS2.9 will consume as much memory as the single mempool shared by all 
> ports in OVS2.8 (256*1024 / 20512) .
> 
> When a node is upgraded from OVS2.8 to OVS2.9 it is quite possible that the 
> memory set aside for OVS may be insufficient. I'm not sure if this aspect has 
> been discussed previously and wanted to bring this up for discussion.
> 

Hi Pradeep, I don't think it has been discussed. I guess the thinking
was that with a giant shared mempool, it was over provisioning when
there was a few ports, and in the case where there was a lot of ports
there could be some starvation at run time. It also meant if you had a
mix of different MTU's you had multiple giant shared mempools and could
run out of memory very quickly at config or run time then also.

So I can see the argument for having a mempool per port, as it is more
fine grained and if you are going to run short of memory, it will at
least be at config time. The problem is if you give some over provision
to each port and you have a lot of ports, you hit the situation you are
seeing.

I think some amount of over provision per port is needed because you
don't want to be cutting it so fine that you run into memory issues at
run time about local mbuf caches on cores running out, or even if
someone used dpdk rings to send the mbuf somewhere else for a time.
There may be other corner cases too. Perhaps as compromise the min size
could be reduce from 4096*4 to 4096*2 or 4096.

Thoughts?

Kevin.

> Regards,
> 
> Pradeep
> 
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4 1/1] netdev-dpdk: Fix requested MTU size validation.

2018-01-23 Thread Stokes, Ian

> -Original Message-
> From: Stokes, Ian
> Sent: Monday, January 22, 2018 6:27 PM
> To: d...@openvswitch.org
> Cc: Stokes, Ian ; Kavanagh, Mark B
> 
> Subject: [PATCH v4 1/1] netdev-dpdk: Fix requested MTU size validation.
> 

Hi all, thanks for the work reviewing this,

I've applied it to the DPDK merge branch, I'll also apply it for backporting on 
OVS 2.9, 2.8 and 2.7 as the issue is present in those. We can also apply it to 
2.6 but will have to submit a new patch as the documentation format changed 
between 2.6 and 2.7.

Thanks
Ian
> This commit replaces MTU_TO_FRAME_LEN(mtu) with MTU_TO_MAX_FRAME_LEN(mtu)
> in netdev_dpdk_set_mtu(), in order to determine if the total length of the
> L2 frame with an MTU of ’mtu’ exceeds NETDEV_DPDK_MAX_PKT_LEN.
> 
> When setting an MTU we first check if the requested total frame length
> (which includes associated L2 overhead) will exceed the maximum frame
> length supported in netdev_dpdk_set_mtu(). The frame length is calculated
> by MTU_TO_FRAME_LEN  as MTU + ETHER_HEADER + ETHER_CRC. The MTU for the
> device will be set at a later stage in dpdk_eth_dev_init() using
> rte_eth_dev_set_mtu(mtu).
> 
> However when using rte_eth_dev_set_mtu(mtu) the calculation used to check
> that the frame does not exceed the max frame length for that device varies
> between DPDK device drivers. For example ixgbe driver calculates the frame
> length for a given MTU as
> 
> mtu + ETHER_HDR_LEN + ETHER_CRC_LEN
> 
> i40e driver calculates it as
> 
> mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE * 2
> 
> em driver calculates it as
> 
> mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE
> 
> Currently it is possible to set an MTU for a netdev_dpdk device that
> exceeds the upper limit MTU for that devices DPDK driver. This leads to a
> segfault.
> This is because the frame length comparison as is, does not take into
> account the addition of the vlan tag overhead expected in the drivers. The
> netdev_dpdk_set_mtu() call will incorrectly succeed but the subsequent
> dpdk_eth_dev_init() will fail before the queues have been created for the
> DPDK device. This coupled with assumptions regarding reconfiguration
> requirements for the netdev will lead to a segfault when the rxq is polled
> for this device.
> 
> A simple way to avoid this is by using MTU_TO_MAX_FRAME_LEN(mtu) when
> validating a requested MTU in netdev_dpdk_set_mtu().
> MTU_TO_MAX_FRAME_LEN(mtu) is equivalent to the following:
> 
> mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN)
> 
> By using MTU_TO_MAX_FRAME_LEN at the netdev_dpdk_set_mtu() stage, OvS now
> takes into account the maximum L2 overhead that a DPDK driver could allow
> for in its frame size calculation. This allows OVS to flag an error rather
> than the DPDK driver if the frame length exceeds the max DPDK frame
> length. OVS can fail gracefully at this point and use the default MTU of
> 1500 to continue to configure the port.
> 
> Note: this fix is a work around, a better approach would be if DPDK
> devices could report the maximum MTU value that can be requested on a per
> device basis. This capability however is not currently available. A
> downside of this patch is that the MTU upper limit will be reduced by 8
> bytes for DPDK devices that do not need to account for vlan tags in the
> frame length driver calculations e.g. ixgbe devices upper MTU limit is
> reduced from the OVS point of view from 9710 to 9702.
> 
> CC: Mark Kavanagh 
> Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames")
> Signed-off-by: Ian Stokes 
> Co-authored-by: Mark Kavanagh 
> Signed-off-by: Mark Kavanagh 
> Acked-by: Flavio Leitner 
> 
> ---
> v3->v4
> * Add co-authored tags for Mark Kavanagh for commit message rework.
> * Specify 'L2 frame length' throughout commit message, documentation and
>   code instead of 'framse size'.
> 
> v2->v3
> * Remove RFC status.
> * Remove extra white space in documentation.
> * Correct dpdk_eth_dev_init() & rte_eth_dev_set_mtu() in commit message
>   and comments.
> * Use l2 frame calculation in limitation notes.
> * Misc minor grammar corrections.
> * Use 'default' instead of 'fallback' in commit message regarding MTU
>   1500.
> * Rephrase opening comment in netdev-dpdk.c set mtu code.
> * Add acked from Flavio Leitner.
> 
> v1->v2
> * Add note to limitations in DPDK documentation regarding MTU.
> * Correct MTU calculation in commit message.
> * Flag that we provision 8 bytes (2 x vlan header) by using
>   MTU_TO_MAX_FRAME_LEN in commit message and code comments.
> ---
>  Documentation/intro/install/dpdk.rst |   12 
>  lib/netdev-dpdk.c|   13 -
>  2 files changed, 24 insertions(+), 1 deletions(-)
> 
> diff --git a/Documentation/intro/install/dpdk.rst
> b/Documentation/intro/install/dpdk.rst
> index

Re: [ovs-dev] [PATCH v4] vswitchd: show DPDK version

2018-01-23 Thread Kevin Traynor

On 01/15/2018 06:21 PM, Matteo Croce wrote:
> Show DPDK version if Open vSwitch is compiled with DPDK support.
> Version can be retrieved with `ovs-switchd --version` or from OVS logs.
> Small change in ovs-ctl to avoid breakage on output change.
> 
> Signed-off-by: Matteo Croce 

LGTM

Acked-by: Kevin Traynor 

> ---
> v3:
>  print version in OVS logs too
>  don't save the version in the DB
>  use dpdk-stub.c instead of #ifdef
> 
> v4:
>  fix only patch subject, I left ovs-vsctl from the older v1/v2, sorry
> 
>  lib/dpdk-stub.c | 5 +
>  lib/dpdk.c  | 8 
>  lib/dpdk.h  | 1 +
>  utilities/ovs-ctl.in| 2 +-
>  vswitchd/ovs-vswitchd.c | 1 +
>  5 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
> index 36021807c..041cd0cbb 100644
> --- a/lib/dpdk-stub.c
> +++ b/lib/dpdk-stub.c
> @@ -54,3 +54,8 @@ dpdk_vhost_iommu_enabled(void)
>  {
>  return false;
>  }
> +
> +void
> +print_dpdk_version(void)
> +{
> +}
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index 6710d10fc..3f5a55fc1 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -24,6 +24,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #ifdef DPDK_PDUMP
>  #include 
>  #include 
> @@ -471,6 +472,7 @@ dpdk_init(const struct smap *ovs_other_config)
>  static struct ovsthread_once once_enable = 
> OVSTHREAD_ONCE_INITIALIZER;
>  
>  if (ovsthread_once_start(_enable)) {
> +VLOG_INFO("Using %s", rte_version());
>  VLOG_INFO("DPDK Enabled - initializing...");
>  dpdk_init__(ovs_other_config);
>  enabled = true;
> @@ -501,3 +503,9 @@ dpdk_set_lcore_id(unsigned cpu)
>  ovs_assert(cpu != NON_PMD_CORE_ID);
>  RTE_PER_LCORE(_lcore_id) = cpu;
>  }
> +
> +void
> +print_dpdk_version(void)
> +{
> +puts(rte_version());
> +}
> diff --git a/lib/dpdk.h b/lib/dpdk.h
> index dc58d968a..b04153591 100644
> --- a/lib/dpdk.h
> +++ b/lib/dpdk.h
> @@ -38,5 +38,6 @@ void dpdk_init(const struct smap *ovs_other_config);
>  void dpdk_set_lcore_id(unsigned cpu);
>  const char *dpdk_get_vhost_sock_dir(void);
>  bool dpdk_vhost_iommu_enabled(void);
> +void print_dpdk_version(void);
>  
>  #endif /* dpdk.h */
> diff --git a/utilities/ovs-ctl.in b/utilities/ovs-ctl.in
> index 1df56c4a5..ef06dd967 100755
> --- a/utilities/ovs-ctl.in
> +++ b/utilities/ovs-ctl.in
> @@ -72,7 +72,7 @@ set_hostname () {
>  set_system_ids () {
>  set ovs_vsctl set Open_vSwitch .
>  
> -OVS_VERSION=`ovs-vswitchd --version | sed 's/.*) //;1q'`
> +OVS_VERSION=`ovs-vswitchd --version | awk '/Open vSwitch/{print $NF}'`
>  set "$@" ovs-version="$OVS_VERSION"
>  
>  case $SYSTEM_ID in
> diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
> index d5e07c037..a5e8f4d39 100644
> --- a/vswitchd/ovs-vswitchd.c
> +++ b/vswitchd/ovs-vswitchd.c
> @@ -187,6 +187,7 @@ parse_options(int argc, char *argv[], char 
> **unixctl_pathp)
>  
>  case 'V':
>  ovs_print_version(0, 0);
> +print_dpdk_version();
>  exit(EXIT_SUCCESS);
>  
>  case OPT_MLOCKALL:
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] About : Enable optional dequeue zero copy for vHost User

2018-01-23 Thread Loftus, Ciara

Hi,

For the meantime this feature is proposed as ‘experimental’ for OVS DPDK.
Unless you are transmitting to a NIC, you don’t need to set the n_txq_desc.
My testing has been only with a DPDK driver in the guest. Have you tried that 
option?

Thanks,
Ciara

From: liyang07 [mailto:liyan...@corp.netease.com]
Sent: Wednesday, January 17, 2018 10:41 AM
To: Loftus, Ciara 
Cc: d...@dpdk.org
Subject: About : Enable optional dequeue zero copy for vHost User


Hi Ciara,

I am tesing the function of "vHost dequeue zero copy" for vm2vm on a host, 
and I have some problems:

1. The networking is OK before run iperf, I can ping successed from vm1 to 
vm2，but after run iperf, the networking between vm1 and vm2 is down;(I think 
n_txq_desc cause the problem)

2. I know the limitation about n_txq_desc, but I cannot set the n_txq_desc 
for dpdkport while the vms on the same host, because there is no dpdkports work 
fow my testing;



Thus, how can I resolve it, thanks.




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v9] netdev-dpdk: Add support for vHost dequeue zero copy (experimental)

2018-01-23 Thread Stokes, Ian

> -Original Message-
> From: Loftus, Ciara
> Sent: Friday, January 19, 2018 5:19 PM
> To: d...@openvswitch.org
> Cc: Loftus, Ciara ; Stokes, Ian
> ; jan.scheur...@ericsson.com; ktray...@redhat.com;
> i.maxim...@samsung.com
> Subject: [PATCH v9] netdev-dpdk: Add support for vHost dequeue zero copy
> (experimental)
> 

Hi All, any final comments on this?

If all issues have been addressed then I'll merge this to DPDK merge today.

Thanks
Ian

> Zero copy is disabled by default. To enable it, set the 'dq-zero-copy'
> option to 'true' when configuring the Interface:
> 
> ovs-vsctl set Interface dpdkvhostuserclient0
> options:vhost-server-path=/tmp/dpdkvhostuserclient0
> options:dq-zero-copy=true
> 
> When packets from a vHost device with zero copy enabled are destined for a
> single 'dpdk' port, the number of tx descriptors on that 'dpdk' port must
> be set to a smaller value. 128 is recommended. This can be achieved like
> so:
> 
> ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> 
> Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send
> to should not exceed 128. Due to this requirement, the feature is
> considered 'experimental'.
> 
> Testing of the patch showed a 15% improvement when switching 512B packets
> between vHost devices on different VMs on the same host when zero copy was
> enabled on the transmitting device.
> 
> Signed-off-by: Ciara Loftus 
> ---
> v9:
> * Rebase
> * Fix docs issue
> * Move variable asignment inside mutex
> * Reset dq-zero-copy value if vhost_driver_register fails
> 
>  Documentation/intro/install/dpdk.rst |  2 +
>  Documentation/topics/dpdk/vhost-user.rst | 73
> 
>  NEWS |  1 +
>  lib/netdev-dpdk.c| 11 -
>  vswitchd/vswitch.xml | 11 +
>  5 files changed, 97 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/intro/install/dpdk.rst
> b/Documentation/intro/install/dpdk.rst
> index 3fecb5c..087eb88 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -518,6 +518,8 @@ The above command sets the number of rx queues for
> DPDK physical interface.
>  The rx queues are assigned to pmd threads on the same NUMA node in a
> round-robin fashion.
> 
> +.. _dpdk-queues-sizes:
> +
>  DPDK Physical Port Queue Sizes
>  ~~~
> 
> diff --git a/Documentation/topics/dpdk/vhost-user.rst
> b/Documentation/topics/dpdk/vhost-user.rst
> index 8447e2d..95517a6 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -458,3 +458,76 @@ Sample XML
>  
> 
>  .. _QEMU documentation: http://git.qemu-
> project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-
> user.txt;h=7890d7169;hb=HEAD
> +
> +vhost-user Dequeue Zero Copy (experimental)
> +---
> +
> +Normally when dequeuing a packet from a vHost User device, a memcpy
> +operation must be used to copy that packet from guest address space to
> +host address space. This memcpy can be removed by enabling dequeue zero-
> copy like so::
> +
> +$ ovs-vsctl add-port br0 dpdkvhostuserclient0 -- set Interface \
> +dpdkvhostuserclient0 type=dpdkvhostuserclient \
> +options:vhost-server-path=/tmp/dpdkvhostclient0 \
> +options:dq-zero-copy=true
> +
> +With this feature enabled, a reference (pointer) to the packet is
> +passed to the host, instead of a copy of the packet. Removing this
> +memcpy can give a performance improvement for some use cases, for
> +example switching large packets between different VMs. However additional
> packet loss may be observed.
> +
> +Note that the feature is disabled by default and must be explicitly
> +enabled by setting the ``dq-zero-copy`` option to ``true`` while
> +specifying the ``vhost-server-path`` option as above. If you wish to
> +split out the command into multiple commands as below, ensure
> +``dq-zero-copy`` is set before
> +``vhost-server-path``::
> +
> +$ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-
> copy=true
> +$ ovs-vsctl set Interface dpdkvhostuserclient0 \
> +options:vhost-server-path=/tmp/dpdkvhostclient0
> +
> +The feature is only available to ``dpdkvhostuserclient`` port types.
> +
> +A limitation exists whereby if packets from a vHost port with
> +``dq-zero-copy=true`` are destined for a ``dpdk`` type port, the number
> +of tx descriptors (``n_txq_desc``) for that port must be reduced to a
> +smaller number,
> +128 being the recommended value. This can be achieved by issuing the
> +following
> +command::
> +
> +$ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> +
> +Note: The sum of the tx descriptors of all ``dpdk`` ports the VM will
> +send to should not exceed 128. For example, in case of a bond over two
> +physical ports in balance-tcp

[ovs-dev] OVS+DPDK: deadlock and race condtion, which leads OVS deadlock or crash

2018-01-23 Thread 王志克

Hi Yuanha and all,

Thanks for review. I change the title and let more people involved.

The issues can be reproduced easily with OVS. 
The steps are below:
1) virsh start vm
2) ovs-vsctl add port
3) delete the port via ovs-vsctl del-port.
4) shutdown VM via vrish destroy.

I run a script to repeatly do such job for 2 VM, and it needs about 1 hour to 
reproduce it. 

However, I found there are more issues (I can reproduce it):

1) see below code
static void
vhost_user_read_cb(int connfd, void *dat, int *remove)
{
struct vhost_user_connection *conn = dat;
struct vhost_user_socket *vsocket = conn->vsocket;
int ret;

ret = vhost_user_msg_handler(conn->vid, connfd);
if (ret < 0) {
close(connfd);
*remove = 1;
vhost_destroy_device(conn->vid);

pthread_mutex_lock(>conn_mutex);
TAILQ_REMOVE(>conn_list, conn, next);
pthread_mutex_unlock(>conn_mutex);

free(conn);
 -> [Zhike Wang] we can see that the 
vsocket does not exist in conn_list or reconn_list at this short time slot. So 
in this time, if rte_vhost_driver_unregister() is called, vsocket is freed. So 
below vhost_user_create_client() may use raw pointer, and lead to crash.

if (vsocket->reconnect)
vhost_user_create_client(vsocket);
}
}

2) regarding the above issue, I think we need one mutex to lock conn_list and 
reconn_list, so the vsocket can in either conn_list or reconn_list, not both or 
none. So I use vhost_user.mutex() 
diff --git a/dpdk-16.11.3/lib/librte_vhost/fd_man.c 
b/dpdk-16.11.3/lib/librte_vhost/fd_man.c
@@ -233,6 +234,7 @@
poll(pfdset->rwfds, numfds, 1000 /* millisecs */);

need_shrink = 0;
+pthread_mutex_lock(_user.mutex);
for (i = 0; i < numfds; i++) {
pthread_mutex_lock(>fd_mutex);

@@ -264,7 +266,10 @@
rcb(fd, dat, );
if (wcb && pfd->revents & (POLLOUT | FDPOLLERR))
wcb(fd, dat, );
+
+   pthread_mutex_lock(>fd_mutex);
pfdentry->busy = 0;
+   pthread_mutex_unlock(>fd_mutex);
/*
 * fdset_del needs to check busy flag.
 * We don't allow fdset_del to be called in callback
@@ -285,5 +290,6 @@

if (need_shrink)
fdset_shrink(pfdset);
+pthread_mutex_unlock(_user.mutex);


@@ -390,6 +395,8 @@ struct vhost_user_reconnect_list {
struct vhost_user_reconnect *reconn, *next;

while (1) {
+
+   pthread_mutex_lock(_user.mutex);
pthread_mutex_lock(_list.mutex);

/*
@@ -417,11 +424,18 @@ struct vhost_user_reconnect_list {
"%s: connected\n", reconn->vsocket->path);
vhost_user_add_connection(reconn->fd, reconn->vsocket);
 remove_fd:

pthread_mutex_unlock(_list.mutex);
+   pthread_mutex_unlock(_user.mutex);
sleep(1);
}

Then I met a deadlock (another thread enters rte_vhost_driver_unregister, and 
waiting for vhost_user.mutex.) 
(gdb) bt
#0  0x7febfaf8ebcd in poll () from /lib64/libc.so.6
#1  0x7fec029b6836 in poll (__timeout=, __nfds=2, 
__fds=0x7febe80008c0) at /usr/include/bits/poll2.h:46
#2  time_poll (pollfds=pollfds@entry=0x7febe80008c0, n_pollfds=2, 
handles=handles@entry=0x0, timeout_when=4098990819, 
elapsed=elapsed@entry=0x7febf7450290) at lib/timeval.c:305
#3  0x7fec0299afea in poll_block () at lib/poll-loop.c:364
#4  0x7fec0297da25 in ovsrcu_synchronize () at lib/ovs-rcu.c:231
#5  0x7fec029e9477 in destroy_device (vid=) at 
lib/netdev-dpdk.c:2651
#6  0x7fec0195db04 in vhost_destroy_device (vid=) at 
/lib/librte_vhost/vhost.c:277
#7  0x7fec0195cbdc in vhost_user_read_cb (connfd=, 
dat=0x7febc80008c0, remove=0x7febf7451420) at /lib/librte_vhost/socket.c:269
#8  0x7fec0195c200 in fdset_event_dispatch 
(pfdset=pfdset@entry=0x7fec01b70180 ) at 
/lib/librte_vhost/fd_man.c:266
#9  0x7fec0195d647 in rte_vhost_driver_session_start () at 
/lib/librte_vhost/socket.c:714
#10 0x7fec029e661b in start_vhost_loop (dummy=) at 
lib/netdev-dpdk.c:2736
#11 0x7fec0297f796 in ovsthread_wrapper (aux_=) at 
lib/ovs-thread.c:348
#12 0x7febfb9b6dc5 in start_thread () from /lib64/libpthread.so.0
#13 0x7febfaf9921d in clone () from /lib64/libc.so.6
(gdb) n


So now I believe it is hard to fix the issues insides the modules, and I would 
like to present identified issues, and want to hear your proposal or fix.

Br,
Zhike Wang



-Original Message-
From: Yuanhan Liu [mailto:y...@fridaylinux.org] 
Sent: Thursday, January 18, 2018 10:04 PM

Re: [ovs-dev] [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow

2018-01-23 Thread Stokes, Ian

> Hi,
> Just added then Napatech NIC supported to the list.
> 

Thanks Finn. I'll be reviewing this series today.

Thanks
Ian

> Finn
> 
> ---
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 40f9d9649..442848870 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -727,3 +727,20 @@ devices to bridge ``br0``. Once complete, follow the
> below steps:
> Check traffic on multiple queues::
> 
> $ cat /proc/interrupts | grep virtio
> +
> +.. _dpdk-flow-hardware-offload:
> +
> +Flow Hardware Offload (Experimental)
> +
> +
> +The flow hardware offload is disabled by default and can be enabled by::
> +
> +$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> +
> +So far only partial flow offload is implemented. Moreover, it only
> +works with PMD drivers have the rte flow action "MARK + RSS" support.
> +
> +The validated NICs are:
> +
> +- Mellanox (ConnectX-4, ConnectX-4 Lx, ConnectX-5)
> +- Napatech (NT200B01)
> diff --git a/NEWS b/NEWS
> index d7c83c21e..b9f9bd415 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -44,6 +44,7 @@ v2.9.0 - xx xxx 
> - DPDK:
>   * Add support for DPDK v17.11
>   * Add support for vHost IOMMU
> + * Add experimental flow hardware offload support
>   * New debug appctl command 'netdev-dpdk/get-mempool-info'.
>   * All the netdev-dpdk appctl commands described in ovs-vswitchd man
> page.
>   * Custom statistics:
> 
> 
> 
> 
> >-Original Message-
> >From: Yuanhan Liu [mailto:y...@fridaylinux.org]
> >Sent: 17. januar 2018 11:02
> >To: Stokes, Ian 
> >Cc: d...@openvswitch.org; Finn Christensen ; Darrell
> >Ball ; Chandran, Sugesh ;
> >Simon Horman 
> >Subject: Re: [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow
> >
> >On Mon, Jan 15, 2018 at 05:28:18PM +, Stokes, Ian wrote:
> >> > Hi,
> >> >
> >> > Here is a joint work from Mellanox and Napatech, to enable the flow
> >> > hw offload with the DPDK generic flow interface (rte_flow).
> >> >
> >> > The basic idea is to associate the flow with a mark id (a unit32_t
> >> > number).
> >> > Later, we then get the flow directly from the mark id, which could
> >> > bypass some heavy CPU operations, including but not limiting to
> >> > mini flow extract, emc lookup, dpcls lookup, etc.
> >> >
> >> > The association is done with CMAP in patch 1. The CPU workload
> >> > bypassing is done in patch 2. The flow offload is done in patch 3,
> >> > which mainly does two things:
> >> >
> >> > - translate the ovs match to DPDK rte flow patterns
> >> > - bind those patterns with a RSS + MARK action.
> >> >
> >> > Patch 5 makes the offload work happen in another thread, for
> >> > leaving the datapath as light as possible.
> >> >
> >> > A PHY-PHY forwarding with 1000 mega flows (udp,tp_src=1000-1999)
> >> > and
> >> > 1 million streams (tp_src=1000-1999, tp_dst=2000-2999) show more
> >> > than 260% performance boost.
> >> >
> >> > Note that it's disabled by default, which can be enabled by:
> >> >
> >> > $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> >>
> >> Hi Yuanhan,
> >>
> >> Thanks for working on this, I'll be looking at this over the coming
> >> week so
> >don't consider this a full review.
> >>
> >> Just a general comment, at first glance there doesn't seem to be any
> >documentation in the patchset?
> >
> >Right, my bad.
> >
> >> I would expect a patch for the DPDK section of the OVS docs at
> >> minimum
> >detailing:
> >>
> >> (i) HWOL requirements (Any specific SW/drivers required or DPDK
> >libs/PMDs etc. that have to be enabled for HWOL).
> >> (ii) HWOL Usage (Enablement and disablement as shown above).
> >> (iii) List of Validated HW devices (As HWOL functionality will vary
> >> between
> >devices it would be good to provide some known 'verified' cards to use
> with it.
> >At this stage we don't have to have every card that it will work with
> >it but I'd like to see a list of NIC Models that this has been validated
> on to date).
> >> (iv) Any Known limitations.
> >>
> >> You'll also have to add an entry to the NEWS document to flag that
> >> HWOL
> >has been introduced to OVS with DPDK.
> >>
> >> As discussed previously on the community call, at this point the
> >> feature
> >should be marked experimental in both NEWS and the documentation, this
> >is just to signify that it is subject to change as more capabilities
> >such as full offload is added over time.
> >
> >For not flooding the mailing list, here I listed the changes I made.
> >Please help review it. Also, please let me know whether I should send
> >out a new version (with the doc change) soon or I should wait for you
> full review.
> >
> >Regarding the validated NICs, probably Finn could add the nics from
> Napatech.
> >
> >---
> >diff --git a/Documentation/howto/dpdk.rst
>

Re: [ovs-dev] [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow

2018-01-23 Thread Finn Christensen

Hi,
Just added then Napatech NIC supported to the list.

Finn

---
diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 40f9d9649..442848870 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -727,3 +727,20 @@ devices to bridge ``br0``. Once complete, follow the below 
steps:
Check traffic on multiple queues::
 
$ cat /proc/interrupts | grep virtio
+
+.. _dpdk-flow-hardware-offload:
+
+Flow Hardware Offload (Experimental)
+
+
+The flow hardware offload is disabled by default and can be enabled by::
+
+$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
+
+So far only partial flow offload is implemented. Moreover, it only
+works with PMD drivers have the rte flow action "MARK + RSS" support.
+
+The validated NICs are:
+
+- Mellanox (ConnectX-4, ConnectX-4 Lx, ConnectX-5)
+- Napatech (NT200B01)
diff --git a/NEWS b/NEWS
index d7c83c21e..b9f9bd415 100644
--- a/NEWS
+++ b/NEWS
@@ -44,6 +44,7 @@ v2.9.0 - xx xxx 
- DPDK:
  * Add support for DPDK v17.11
  * Add support for vHost IOMMU
+ * Add experimental flow hardware offload support
  * New debug appctl command 'netdev-dpdk/get-mempool-info'.
  * All the netdev-dpdk appctl commands described in ovs-vswitchd man page.
  * Custom statistics:




>-Original Message-
>From: Yuanhan Liu [mailto:y...@fridaylinux.org]
>Sent: 17. januar 2018 11:02
>To: Stokes, Ian 
>Cc: d...@openvswitch.org; Finn Christensen ; Darrell Ball
>; Chandran, Sugesh ;
>Simon Horman 
>Subject: Re: [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow
>
>On Mon, Jan 15, 2018 at 05:28:18PM +, Stokes, Ian wrote:
>> > Hi,
>> >
>> > Here is a joint work from Mellanox and Napatech, to enable the flow
>> > hw offload with the DPDK generic flow interface (rte_flow).
>> >
>> > The basic idea is to associate the flow with a mark id (a unit32_t
>> > number).
>> > Later, we then get the flow directly from the mark id, which could
>> > bypass some heavy CPU operations, including but not limiting to mini
>> > flow extract, emc lookup, dpcls lookup, etc.
>> >
>> > The association is done with CMAP in patch 1. The CPU workload
>> > bypassing is done in patch 2. The flow offload is done in patch 3,
>> > which mainly does two things:
>> >
>> > - translate the ovs match to DPDK rte flow patterns
>> > - bind those patterns with a RSS + MARK action.
>> >
>> > Patch 5 makes the offload work happen in another thread, for leaving
>> > the datapath as light as possible.
>> >
>> > A PHY-PHY forwarding with 1000 mega flows (udp,tp_src=1000-1999) and
>> > 1 million streams (tp_src=1000-1999, tp_dst=2000-2999) show more
>> > than 260% performance boost.
>> >
>> > Note that it's disabled by default, which can be enabled by:
>> >
>> > $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>>
>> Hi Yuanhan,
>>
>> Thanks for working on this, I'll be looking at this over the coming week so
>don't consider this a full review.
>>
>> Just a general comment, at first glance there doesn't seem to be any
>documentation in the patchset?
>
>Right, my bad.
>
>> I would expect a patch for the DPDK section of the OVS docs at minimum
>detailing:
>>
>> (i) HWOL requirements (Any specific SW/drivers required or DPDK
>libs/PMDs etc. that have to be enabled for HWOL).
>> (ii) HWOL Usage (Enablement and disablement as shown above).
>> (iii) List of Validated HW devices (As HWOL functionality will vary between
>devices it would be good to provide some known 'verified' cards to use with it.
>At this stage we don't have to have every card that it will work with it but 
>I'd
>like to see a list of NIC Models that this has been validated on to date).
>> (iv) Any Known limitations.
>>
>> You'll also have to add an entry to the NEWS document to flag that HWOL
>has been introduced to OVS with DPDK.
>>
>> As discussed previously on the community call, at this point the feature
>should be marked experimental in both NEWS and the documentation, this is
>just to signify that it is subject to change as more capabilities such as full
>offload is added over time.
>
>For not flooding the mailing list, here I listed the changes I made.
>Please help review it. Also, please let me know whether I should send out a
>new version (with the doc change) soon or I should wait for you full review.
>
>Regarding the validated NICs, probably Finn could add the nics from Napatech.
>
>---
>diff --git a/Documentation/howto/dpdk.rst
>b/Documentation/howto/dpdk.rst index d123819..20a4190 100644
>--- a/Documentation/howto/dpdk.rst
>+++ b/Documentation/howto/dpdk.rst
>@@ -709,3 +709,19 @@ devices to bridge ``br0``. Once complete, follow the
>below steps:
>Check traffic on multiple queues::
>
>$ cat /proc/interrupts | grep virtio
>+
>+.. _dpdk-flow-hardware-offload:
>+
>+Flow

Re: [ovs-dev] [PATCH V3 0/6] Enable OVS on Linux 4.14 kernel

2018-01-23 Thread Justin Pettit

Hi, Greg.  I haven't looked at these in detail, but it doesn't look like NEWS 
or the FAQ supported kernel list was updated.  Would you mind adding those and 
rebasing against the current master?  If you can do that, I'll take a closer 
look and get them merged into master and branch-2.9.

Thanks,

--Justin


> On Dec 8, 2017, at 1:58 PM, Greg Rose  wrote:
> 
> Various fixes and compat layer changes required to enable building
> OVS for the upstream Linux 4.14 kernel.
> 
> The constant changing of the netdev_master_upper_dev_link parameters
> is a real headache.  I couldn't think of any cleaner way to do it
> than the approach I used but I welcome suggestions on how to make
> that code less ugly - because it's ten miles of bad road ugly.
> But it does compile and pass basic checks on all of the currently
> supported kernels.
> 
> There's more fixes for SKB_GSO_UDP - I'm not sure why these don't
> show up for the 4.13 or previous kernels but I think it has to do
> with the recent change to make sure the SKB_GSO_UDP was searched
> as a whole word and thus exposed more fractures.
> 
> I updated .travis.yml to use all the most recent supported LTS
> and stable kernels from kernel.org.
> 
> V2 - Pull in upstream patch for conntrack protocol pointers
> V3 - Fix authors
> 
> Greg Rose (6):
>  datapath: Fix netdev_master_upper_dev_link for 4.14
>  compat: Do not include headers when not compiling
>  datapath: conntrack: make protocol tracker pointers const
>  datapath: Fix SKB_GSO_UDP usage
>  acinclude.m4: Enable Linux 4.14
>  travis: Update kernel test list from kernel.org
> 
> .travis.yml | 17 -
> acinclude.m4|  7 +--
> datapath/conntrack.c|  2 +-
> datapath/datapath.c |  9 ++---
> datapath/linux/compat/include/linux/netdevice.h | 15 ++-
> datapath/linux/compat/ip_gre.c  |  2 +-
> datapath/linux/compat/ip_output.c   |  2 +-
> datapath/linux/compat/stt.c | 11 ++-
> datapath/vport-netdev.c |  9 -
> 9 files changed, 54 insertions(+), 20 deletions(-)
> 
> -- 
> 1.8.3.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

93 matches

Mail list logo