Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-05 Thread Darrell Ball
You could use pmd-rxq-affinity for the queues you want serviced locally and
let the others go remote.
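
For example, something along these lines (the interface name and the
queue/core numbers are purely illustrative):

    $ ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:3,1:7"

That pins rxq 0 of dpdk0 to the pmd on core 3 and rxq 1 to the pmd on core 7;
queues that are not listed are then distributed among the remaining
non-isolated pmds, which may be on the other NUMA node.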

On 9/5/17, 8:14 PM, "王志克"  wrote:

It is a bit different from my expectation.



I have separate CPUs and pmds for each NUMA node. However, the physical NIC 
only sits on NUMA socket 0, so only some of the CPUs and pmds (the ones on the 
same NUMA node) can poll the physical NIC. Since I have multiple rx queues, I 
hope some queues can be polled by pmds on the same node and the others by pmds 
on the non-local NUMA node. In this way, more pmds contribute to polling the 
physical NIC, so a performance improvement is expected for the total rx 
traffic.



Br,

Wang Zhike



-Original Message-

From: Darrell Ball [mailto:db...@vmware.com] 

Sent: Wednesday, September 06, 2017 10:47 AM

To: 王志克; ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org

Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical 
port



This same-NUMA-node limitation was already removed, although the same NUMA 
node is still preferred for performance reasons.



commit c37813fdb030b4270d05ad61943754f67021a50d

Author: Billy O'Mahony 

Date:   Tue Aug 1 14:38:43 2017 -0700



dpif-netdev: Assign ports to pmds on non-local numa node.



Previously if there is no available (non-isolated) pmd on the numa node

for a port then the port is not polled at all. This can result in a

non-operational system until such time as nics are physically

repositioned. It is preferable to operate with a pmd on the 'wrong' numa

node albeit with lower performance. Local pmds are still chosen when

available.



Signed-off-by: Billy O'Mahony 

Signed-off-by: Ilya Maximets 

Co-authored-by: Ilya Maximets 





The sentence “The rx queues are assigned to pmd threads on the same NUMA 
node in a round-robin fashion.”



under



DPDK Physical Port Rx Queues



should be removed since it is outdated in a couple of ways and there is 
other correct documentation on the same page

and also here http://docs.openvswitch.org/en/latest/howto/dpdk/



Maybe you could submit a patch ?



Thanks Darrell





On 9/5/17, 7:18 PM, "ovs-dev-boun...@openvswitch.org on behalf of 王志克" 
 wrote:



Hi All,







I read the doc below about pmd assignment for physical ports. I think the 
limitation “on the same NUMA node” may not be efficient.








http://docs.openvswitch.org/en/latest/intro/install/dpdk/



DPDK Physical Port Rx Queues







$ ovs-vsctl set Interface <interface> options:n_rxq=<N>







The above command sets the number of rx queues for a DPDK physical 
interface. The rx queues are assigned to pmd threads on the same NUMA node in 
a round-robin fashion.



Consider below case:







One host has one PCI NIC on NUMA node 0 and 4 VMs spread across NUMA 
nodes 0 and 1. There are multiple rx queues configured on the physical NIC. 
We configured 4 pmds (two cpus from NUMA node 0 and two cpus from node 1). Since 
the physical NIC sits on NUMA node 0, only pmds on the same NUMA node can poll 
its rxqs. As a result, only two cpus can be used for polling the physical NIC.
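
(For reference, the four pmds in this test were created with something like 
the following; the mask value and the core-to-node mapping are only 
illustrative and depend on the host topology:

    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x33

where 0x33 would select cores 0 and 1 on node 0 and cores 4 and 5 on node 1 
in this hypothetical layout.)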







If we compare with OVS kernel mode, there is no such limitation.







So the question is: should we remove the “same NUMA node” limitation for 
physical port rx queues? Or do we have other options to improve performance 
in this case?


Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-05 Thread 王志克
It is a bit different from my expectation.

I have separate CPUs and pmds for each NUMA node. However, the physical NIC only 
sits on NUMA socket 0, so only some of the CPUs and pmds (the ones on the same 
NUMA node) can poll the physical NIC. Since I have multiple rx queues, I hope 
some queues can be polled by pmds on the same node and the others by pmds on 
the non-local NUMA node. In this way, more pmds contribute to polling the 
physical NIC, so a performance improvement is expected for the total rx traffic.

Br,
Wang Zhike

-Original Message-
From: Darrell Ball [mailto:db...@vmware.com] 
Sent: Wednesday, September 06, 2017 10:47 AM
To: 王志克; ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

This same-NUMA-node limitation was already removed, although the same NUMA node 
is still preferred for performance reasons.

commit c37813fdb030b4270d05ad61943754f67021a50d
Author: Billy O'Mahony 
Date:   Tue Aug 1 14:38:43 2017 -0700

dpif-netdev: Assign ports to pmds on non-local numa node.

Previously if there is no available (non-isolated) pmd on the numa node
for a port then the port is not polled at all. This can result in a
non-operational system until such time as nics are physically
repositioned. It is preferable to operate with a pmd on the 'wrong' numa
node albeit with lower performance. Local pmds are still chosen when
available.

Signed-off-by: Billy O'Mahony 
Signed-off-by: Ilya Maximets 
Co-authored-by: Ilya Maximets 


The sentence “The rx queues are assigned to pmd threads on the same NUMA node 
in a round-robin fashion.”

under

DPDK Physical Port Rx Queues

should be removed since it is outdated in a couple of ways and there is other 
correct documentation on the same page
and also here http://docs.openvswitch.org/en/latest/howto/dpdk/

Maybe you could submit a patch ?

Thanks Darrell


On 9/5/17, 7:18 PM, "ovs-dev-boun...@openvswitch.org on behalf of 王志克" 
 wrote:

Hi All,



I read the doc below about pmd assignment for physical ports. I think the 
limitation “on the same NUMA node” may not be efficient.




http://docs.openvswitch.org/en/latest/intro/install/dpdk/

DPDK Physical Port Rx Queues



$ ovs-vsctl set Interface <interface> options:n_rxq=<N>



The above command sets the number of rx queues for a DPDK physical interface. 
The rx queues are assigned to pmd threads on the same NUMA node in a 
round-robin fashion.

Consider below case:



One host has one PCI NIC on NUMA node 0 and 4 VMs spread across NUMA 
nodes 0 and 1. There are multiple rx queues configured on the physical NIC. 
We configured 4 pmds (two cpus from NUMA node 0 and two cpus from node 1). Since 
the physical NIC sits on NUMA node 0, only pmds on the same NUMA node can poll 
its rxqs. As a result, only two cpus can be used for polling the physical NIC.



If we compare with OVS kernel mode, there is no such limitation.



So the question is:

should we remove the “same NUMA node” limitation for physical port rx 
queues? Or do we have other options to improve performance in this case?



Br,

Wang Zhike



___
dev mailing list
d...@openvswitch.org

https://mail.openvswitch.org/mailman/listinfo/ovs-dev


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath-windows: Increment ct packet counters based on ct_state.

2017-09-05 Thread Shashank Ram
Comments inline.




Subject: [ovs-dev] [PATCH] datapath-windows: Increment ct packet counters   
based on ct_state.

For a given packet, packet counters in conntrack should be accounted only
once, even if the packet is processed multiple times by conntrack.

When a packet is processed by conntrack, ct_state flag is set to
OVS_CS_F_TRACKED. Use this state to identify if a packet has been
processed previously by conntrack.

Also update the ct packet counters when ct entry is created.

With this patch, the behavior of the conntrack packet counters is similar
to Linux.

Signed-off-by: Anand Kumar 
---
 datapath-windows/ovsext/Conntrack.c | 34 +-
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack.c 
b/datapath-windows/ovsext/Conntrack.c
index 8bcda05..0adb6d5 100644
--- a/datapath-windows/ovsext/Conntrack.c
+++ b/datapath-windows/ovsext/Conntrack.c
@@ -169,6 +169,18 @@ OvsPostCtEventEntry(POVS_CT_ENTRY entry, UINT8 type)
 OvsPostCtEvent();
 }

+static __inline VOID
+OvsCtIncrementCounters(POVS_CT_ENTRY entry, BOOLEAN reply, PNET_BUFFER_LIST 
nbl)
Function parameters should be on separate lines
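
i.e. something like this (just to illustrate the expected layout):

static __inline VOID
OvsCtIncrementCounters(POVS_CT_ENTRY entry,
                       BOOLEAN reply,
                       PNET_BUFFER_LIST nbl)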

+{
+if (reply) {
+entry->rev_key.byteCount+= OvsPacketLenNBL(nbl);
+entry->rev_key.packetCount++;
+} else {
+entry->key.byteCount += OvsPacketLenNBL(nbl);
+entry->key.packetCount++;
+}
+}
+
 static __inline BOOLEAN
 OvsCtAddEntry(POVS_CT_ENTRY entry, OvsConntrackKeyLookupCtx *ctx,
   PNAT_ACTION_INFO natInfo, UINT64 now)
@@ -287,6 +299,9 @@ OvsCtEntryCreate(OvsForwardingContext *fwdCtx,
 }

 OvsCtUpdateFlowKey(key, state, ctx->key.zone, 0, NULL);
+if (entry) {
+OvsCtIncrementCounters(entry, ctx->reply, curNbl);
+}
 return entry;
 }

@@ -382,18 +397,6 @@ OvsCtKeyAreSame(OVS_CT_KEY ctxKey, OVS_CT_KEY entryKey)
 (ctxKey.zone == entryKey.zone));
 }

-static __inline VOID
-OvsCtIncrementCounters(POVS_CT_ENTRY entry, BOOLEAN reply, PNET_BUFFER_LIST 
nbl)
-{
-if (reply) {
-entry->rev_key.byteCount+= OvsPacketLenNBL(nbl);
-entry->rev_key.packetCount++;
-} else {
-entry->key.byteCount += OvsPacketLenNBL(nbl);
-entry->key.packetCount++;
-}
-}
-
 POVS_CT_ENTRY
 OvsCtLookup(OvsConntrackKeyLookupCtx *ctx)
 {
@@ -730,6 +733,12 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx,
 NdisReleaseRWLock(ovsConntrackLockObj, );
 OVS_LOG_ERROR("Conntrack Limit hit: %lu", ctTotalEntries);
 return NDIS_STATUS_RESOURCES;
+
+/* Increment the counters soon after the lookup, since we set ct.state
+ * to OVS_CS_F_TRACKED after processing the ct entry.
+ */
Comments should be styled as follows:
/*
 * Comment text goes here.
 */
+if (entry && (!(key->ct.state & OVS_CS_F_TRACKED))) {
+OvsCtIncrementCounters(entry, ctx.reply, curNbl);
 }

 if (!entry) {
@@ -740,7 +749,6 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx,
  );
 } else {
 /* Process the entry and update CT flags */
-OvsCtIncrementCounters(entry, ctx.reply, curNbl);
 entry = OvsProcessConntrackEntry(fwdCtx, layers->l4Offset, , key,
  zone, natInfo, commit, currentTime,
  );
--
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-05 Thread Darrell Ball
This same-NUMA-node limitation was already removed, although the same NUMA node 
is still preferred for performance reasons.

commit c37813fdb030b4270d05ad61943754f67021a50d
Author: Billy O'Mahony 
Date:   Tue Aug 1 14:38:43 2017 -0700

dpif-netdev: Assign ports to pmds on non-local numa node.

Previously if there is no available (non-isolated) pmd on the numa node
for a port then the port is not polled at all. This can result in a
non-operational system until such time as nics are physically
repositioned. It is preferable to operate with a pmd on the 'wrong' numa
node albeit with lower performance. Local pmds are still chosen when
available.

Signed-off-by: Billy O'Mahony 
Signed-off-by: Ilya Maximets 
Co-authored-by: Ilya Maximets 


The sentence “The rx queues are assigned to pmd threads on the same NUMA node 
in a round-robin fashion.”

under

DPDK Physical Port Rx Queues

should be removed since it is outdated in a couple of ways and there is other 
correct documentation on the same page
and also here http://docs.openvswitch.org/en/latest/howto/dpdk/

Maybe you could submit a patch ?

Thanks Darrell


On 9/5/17, 7:18 PM, "ovs-dev-boun...@openvswitch.org on behalf of 王志克" 
 wrote:

Hi All,



I read the doc below about pmd assignment for physical ports. I think the 
limitation “on the same NUMA node” may not be efficient.




http://docs.openvswitch.org/en/latest/intro/install/dpdk/

DPDK Physical Port Rx Queues



$ ovs-vsctl set Interface <interface> options:n_rxq=<N>



The above command sets the number of rx queues for a DPDK physical interface. 
The rx queues are assigned to pmd threads on the same NUMA node in a 
round-robin fashion.

Consider below case:



One host has one PCI NIC on NUMA node 0 and 4 VMs spread across NUMA 
nodes 0 and 1. There are multiple rx queues configured on the physical NIC. 
We configured 4 pmds (two cpus from NUMA node 0 and two cpus from node 1). Since 
the physical NIC sits on NUMA node 0, only pmds on the same NUMA node can poll 
its rxqs. As a result, only two cpus can be used for polling the physical NIC.



If we compare with OVS kernel mode, there is no such limitation.



So the question is:

should we remove the “same NUMA node” limitation for physical port rx queues? 
Or do we have other options to improve performance in this case?



Br,

Wang Zhike



___
dev mailing list
d...@openvswitch.org

https://mail.openvswitch.org/mailman/listinfo/ovs-dev


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 4/7] netdev-dpdk: implement flow put with rte flow

2017-09-05 Thread Darrell Ball


On 9/5/17, 1:30 AM, "Yuanhan Liu"  wrote:

On Fri, Sep 01, 2017 at 10:38:37PM +, Darrell Ball wrote:
> 
> 
> On 8/29/17, 7:33 PM, "Yuanhan Liu"  wrote:
> 
> On Wed, Aug 30, 2017 at 02:02:23AM +, Darrell Ball wrote:
> > 
> > > +#define MAX_RTE_FLOW_ITEMS  100
> > > +#define MAX_RTE_FLOW_ACTIONS 100
> > > 
> > > I guess these are temporary
> > 
> > Yes, the hardcoded number is really hacky.
> > 
> > > Do we need to do a rte query during initialization ?
> > 
> > query on what?
> > 
> > [Darrell]
> > I mean somehow the max hardware resources available at
> > dev initialization time ? I realize this is non-trivial overall.
> 
> I see your point then. I don't think it's needed, though. I'm also not
> aware of any such limitations (say, on how many patterns are supported).
> I think we just add as many patterns as we can and let the driver figure
> out the rest. If the flow creation fails, we skip the hw offload, with an
> error message provided.
> 
> [Darrell]
> I understand the present intention.
> But for future enhancements, maybe it would be good to display the max
> capability/capacity and remaining capacity to the user in some way.

Agreed, and that's also what in my mind. It's some work in DPDK though.

> This brings back another discussion point: having user specification of
> HWOL flows is starting to look more useful,

Are you introducing some new (CLI) interfaces? Could you give a bit more
detailes here?

[Darrell] 
Maybe something wild like this with a special dp only action ‘in_queue’ or 
‘hwol_in_queue’ ?
ovs-appctl dpctl/mod-flow 
"in_port(1),eth(),eth_type(0x800),ipv4(src=1.1.1.1,dst=2.2.2.2)" in_queue(3)

It feels weird to do this as a full OpenFlow extension since it is so HW 
centric, but maybe that is also a yet more wild possibility. It would be 
better to fix the NICs/drivers than to go down this path.


> as it helps the queue action issue and HWOL
> capacity planning/predictability for high value flows.

Yes, I would think so.

--yliu
>  
> > > static (inline) function maybe ?
> > 
> > Indeed. I'm not a big fan of a macro like this. Let me turn it into a
> > function next time. I see no reason to make it inline (explicitly)
> > though: it's not in the data path. Moreover, it's likely it will be
> > inlined implicitly by the compiler.
> > 
> > [Darrell]
> > I put ‘(inline)’ in parentheses because we would never specify it
> > explicitly, since gcc would likely inline it anyway.
> 
> I see. Sorry for misunderstanding.
> 
>   --yliu
> 
> 
> 
> 




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-05 Thread 王志克
Hi All,

I read the doc below about pmd assignment for physical ports. I think the 
limitation “on the same NUMA node” may not be efficient.

http://docs.openvswitch.org/en/latest/intro/install/dpdk/
DPDK Physical Port Rx Queues

$ ovs-vsctl set Interface <interface> options:n_rxq=<N>

The above command sets the number of rx queues for a DPDK physical interface. 
The rx queues are assigned to pmd threads on the same NUMA node in a 
round-robin fashion.
Consider below case:

One host has one PCI NIC on NUMA node 0 and 4 VMs spread across NUMA 
nodes 0 and 1. There are multiple rx queues configured on the physical NIC. 
We configured 4 pmds (two cpus from NUMA node 0 and two cpus from node 1). Since 
the physical NIC sits on NUMA node 0, only pmds on the same NUMA node can poll 
its rxqs. As a result, only two cpus can be used for polling the physical NIC.

If we compare with OVS kernel mode, there is no such limitation.

So the question is:
should we remove the “same NUMA node” limitation for physical port rx queues? 
Or do we have other options to improve performance in this case?

Br,
Wang Zhike

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 答复: Re: 答复: Re: 答复: Re: 答复: Re: [PATCH v2] ovn: Support for taas(tap-as-a-service) function

2017-09-05 Thread wang . qianyu
The mirror port only receives mirrored packets. I would remove the flag and 
use inport and outport to mark the mirrored packets.

Thanks.





Russell Bryant 
2017/09/06 02:57
 
To:      wang.qia...@zte.com.cn
Cc:      Gao Zhenyu, ovs dev, xurong00037997, zhou.huij...@zte.com.cn
Subject: Re: 答复: Re: 答复: Re: [ovs-dev] 答复: Re: [PATCH v2] ovn: Support for taas(tap-as-a-service) function


What if a mirror port *only* receives mirrored packets?  If the only 
packets it ever receives are mirrored packets, a new flag would not be 
necessary.

Do you intend for the port to operate as both a regular port *and* to 
receive a mirror of traffic  for another port?

On Thu, Aug 24, 2017 at 10:31 PM,  wrote:
I know what you mean. 
The receiver needs to distinguish whether the traffic is regular or mirrored. 
This may need some special flow table entries to deal with it. 

Thanks 

  



Gao Zhenyu  
2017/08/25 10:12 

To:      wang.qia...@zte.com.cn
Cc:      ovs dev, Russell Bryant <russ...@ovn.org>, xurong00037997, zhou.huij...@zte.com.cn
Subject: Re: 答复: Re: [ovs-dev] 答复: Re: [PATCH v2] ovn: Support for taas(tap-as-a-service) function



I mean for regular packets, ovs should not add the geneve option; the new 
geneve option is only for mirror traffic.

Did you mean some mirror traffic would have the mirror flag and some would not? 


Thanks 
Zhenyu Gao 

2017-08-25 9:44 GMT+08:00 : 
Hi zhenyu, 
Thanks for your opinion. 
The mirror flag does not always exist, so I do not think adding a new geneve 
option is a good idea. 

Thanks. 




Gao Zhenyu  
2017/08/25 09:34 

To:      wang.qia...@zte.com.cn
Cc:      Russell Bryant, ovs dev <d...@openvswitch.org>, zhou.huij...@zte.com.cn, xurong00037997 <xu.r...@zte.com.cn>
Subject: Re: [ovs-dev] 答复: Re: [PATCH v2] ovn: Support for taas(tap-as-a-service) function




Although adding a new geneve option is more complicated, I think it is 
still worth having. 
Once the destination chassis finds that geneve option, it can tag the 
mirror flag on the packet. And it makes the whole process look the same 
whether or not the destination is on the same chassis.

Thanks 
Zhenyu Gao 

2017-08-25 9:15 GMT+08:00 : 
Hi Russell,

Thanks for your review.

When the mirror destination is on another chassis, the mirror flag which
is used to mark the packet needs to be transmitted to the destination chassis.

We could use the inport, a geneve option, or a new type of outport to indicate
that the packet is a mirrored packet.

If we use inport to indicate the flag, we may need to use inport as a
match field in the egress pipeline; I think this may conflict with the
egress pipeline.

If we use a geneve option to deliver the mirror flag, this may be more
complicated. So I add a new type of port as the destination of the mirror
flow. The port types mirror and taas correspond to the configurations of
tap-flow and tap-service.

Thanks.





Russell Bryant 
2017/08/25 04:44

To:      wang.qia...@zte.com.cn
Cc:      ovs dev, zhou.huij...@zte.com.cn, xurong00037997
Subject: Re: [ovs-dev] [PATCH v2] ovn: Support for taas(tap-as-a-service) function


Sorry for the delay in getting back to this ...

On Tue, Aug 15, 2017 at 4:28 AM,   wrote:
> Taas was designed to provide tenants and service providers a means of
> monitoring the traffic flowing in their Neutron provisioned virtual
> networks. It is useful for network trouble-shooting, security and
> analytics. The taas presentations could be found from
>
https://github.com/openstack/tap-as-a-service/blob/master/doc/source/presentations.rst


> , and the api reference could be found from
>
https://github.com/openstack/tap-as-a-service/blob/master/API_REFERENCE.rst


>
> To support taas function, this patch add two type of logica_switch_port,
> "mirror" and "taas". port with type "mirror" is used as inport for
monitor
> flow in logica_switch, and port with type "taas" is used as outport for
> monitor flow in logica_switch.
>
> The ovn-controller will make the relations of the ports in tap_service
and
> tap_flow to mirror port and taas port.
>
> Signed-off-by: wang qianyu 

> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> index 31303a8..5fdd045 100644
> --- a/ovn/ovn-nb.xml
> +++ b/ovn/ovn-nb.xml
> @@ -301,6 +301,20 @@
>
>  A port to a logical switch on a VTEP gateway.
>
> +
> +  mirror
> +  
> +A port indicate the inport of mirrored flows. The user need
> to
> +create this port in the logical_switch. This port should
one
> to
> +one correspondence 

Re: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH support

2017-09-05 Thread Yang, Yi
On Tue, Sep 05, 2017 at 09:12:09PM +0800, Hannes Frederic Sowa wrote:
> "Yang, Yi"  writes:
> 
> > We can change this later if we really find a better way to handle this
> > because it isn't defined in include/uapi/linux/openvswitch.h, so I still
> > have backdoor to do this if needed :-)
> 
> Sorry, I can't follow you.
> 
> It doesn't matter if something is defined in uapi headers, the
> observable behavior matters. If you allow users to configure flows with
> specific fields, it should not stop working at a future point in time.

Anyway, this can be changed if it is really needed. So far the current way is
the best one we can come up with; we would be glad to switch if you can
propose something better. We have explained repeatedly that the context
headers must be matched and set; this is the bottom line.

> 
> > For our sfc use case in Opendaylight, we use context[0] for tunnel ID,
> > context[1] for destination IP for reverse RSP, they are used to match
> > and set in OpenFlow table, you can't limit users to use them in such
> > ways.
> 
> So in your specific case you expect the masks to be completely stable
> because you defined a protocol on top of NSH, understood. And that is
> stable accross all possible paths. Understood as well.
> 
> > If you check GENEVE implementation, tun_metadata* can be set or matched
> > as any other match field.
> 
> Yes, I wrote that in my previous mail. I wonder why NSH context metadata
> is not in tun_metadata as well?

tun_metadata is tunnel metadata; GENEVE needs a tunnel port, but NSH does
not, so NSH can't directly use tun_metadata. For MD type 2 we would need a
lot of rework on tun_metadata to make it shared between GENEVE and NSH, and
I don't think that can happen in the near term. So tun_metadata isn't an
option for this now.

> 
> > Actually the most important information in NSH are just these context
> > headers, you can't limit imagination of users by removing them from flow
> > keys.
> >
> > My point is to bring miniflow into kernel data path to fix your concern,
> > this will benefit your employer directly :-)
> 
> Okay, interesting. It will probably not help if you still have a hash of
> a packet inside the flow table and use that for load balancing.
> 
> [...]
> 
> BTW I don't want to stop this patch, I am merely interested in how the
> bigger picture will look like in the end.
> 
> Bye,
> Hannes
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Improved Packet Drop Statistics in OVS

2017-09-05 Thread Darrell Ball
Hi Rohith

There was one minor outstanding comment clarification.

Thanks Darrell

On 8/29/17, 4:12 AM, "Rohith Basavaraja"  wrote:


Hi Darrell,

Thanks for reviewing the document and providing the feedback.

My response inline,  pl search for [Rohith]


On 29/08/17, 1:20 AM, "Darrell Ball"  wrote:

Hi Rohith

Thanks for the improvement proposal and writeup.

I wonder if it is possible to divide the document by pipeline stage 
and type somehow – like receive drops, flow processing drops and
transmit drops.
Flow processing drops due to xlate processing error drops, explicit 
openflow drop and resource exhaustion is one division by type.
Maybe also by subtype, such as mbuf depletion for resource 
exhaustion type, too deep recursion for xlate error…

[Rohith] Not sure if I understood your comment clearly. Did you mean to 
categorize the drops as receive drops, flow processing drops and
transmit drops, and in turn have a sub type/category for each category? 
In the document we intended the same; let me know if it requires
any reformatting to convey that.


[Darrell] Receive and transmit drops are mostly fine.
My comment was more about the flow processing drops.
The document is already good at mentioning the different kinds 
of drops.

Here is an example from the document

“
Below is a tentative list of identified silent drop scenarios. This needs to be 
completed.

●   Parsing Failures/Invalid packets.
●   Drops due to mbuf unavailability or mbuf corruption.
●   Errors while executing tunnel actions (like TUNNEL_POP/PUSH).
●   Errors while executing sampling actions.
●   Recirculation error cases.
●   Encapsulation error. 
“


However, above has at least 3 kinds of error subtypes, maybe more; something 
like:
1/ “Packet sanity errors” – eg) Parsing Failures/Invalid packets.
2/ “Resource exhaustion errors” eg) Drops due to mbuf unavailability.
3/ “Unsupported dp translation” eg) Recirculation error cases ?

So, what I meant was - to try to categorize more formally as such.
At the same time try to avoid "Internal DP error” since it is hard to attribute 
useful significance to it.
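
For example, very roughly (the rows below only reuse items already listed in 
the document and are meant to illustrate the shape of such a table, not to be 
the final list):

    Stage            Type                         Subtype
    ---------------  ---------------------------  --------------------------------
    receive          resource exhaustion          mbuf unavailability / corruption
    flow processing  packet sanity error          parsing failure, invalid packet
    flow processing  unsupported dp translation   recirculation error
    flow processing  action execution error       tunnel push/pop, sampling, encap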



It would be good if ‘Internal DP error’ had more specific errors, 
such as the above error cases.

[Rohith]Ok will have subtypes for Internal DP error

Would a table help here based on ‘stage’, ‘type’ and ‘subtype’ 
somehow; does not need to be as suggested above ?

[Rohith]Sure,  will have table to summarize stage, type and subtype 
drops.

Seems you are intending this initially for the userspace datapath ?
[Rohith] Yes initially it’s for userspace datapath only.

If you are maintaining stats on a PMD level as mentioned, did you 
consider possibly exporting this into
pmd-stats-show (at least initially), maybe as an option, like 
verbose or extended or … ?
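
e.g. alongside what is already reported by

    $ ovs-appctl dpif-netdev/pmd-stats-show

either unconditionally or behind a new option (the option name here is only a
placeholder):

    $ ovs-appctl dpif-netdev/pmd-stats-show --verbose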

[Rohith] Sure I consider that if that’s going to be helpful for debugging.

You could separate out the implementation details to a
separate later section; maybe some pseudocode or actual code for
implementation discussion part.

[Rohith] Ok, once the patch is ready will have separate section discussing 
the complete code flow.


Thanks Darrell


On 8/27/17, 11:17 PM, "ovs-dev-boun...@openvswitch.org on behalf of 
Rohith Basavaraja"  wrote:

Hi,

Currently we are working on improving the packet drop 
statistics in OVS.

Following link contains details of our proposal to improve 
packet drop statistics in OVS.


https://docs.google.com/document/d/1EtntZ8tPYr8lnbWyM385XMO8NIZ7v-RqBD1oPqLInb4/edit?usp=sharing

We would like to get your comments and feedback on our 
proposal. Also, welcome any suggestions
to improve/update to our proposal.

Thanks
Rohith

___
dev mailing list
d...@openvswitch.org


[ovs-dev] [PATCH] datapath-windows: Increment ct packet counters based on ct_state.

2017-09-05 Thread Anand Kumar
For a given packet, packet counters in conntrack should be accounted only
once, even if the packet is processed multiple times by conntrack.

When a packet is processed by conntrack, ct_state flag is set to
OVS_CS_F_TRACKED. Use this state to identify if a packet has been
processed previously by conntrack.

Also update the ct packet counters when ct entry is created.

With this patch, the behavior of the conntrack packet counters is similar
to Linux.

Signed-off-by: Anand Kumar 
---
 datapath-windows/ovsext/Conntrack.c | 34 +-
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack.c 
b/datapath-windows/ovsext/Conntrack.c
index 8bcda05..0adb6d5 100644
--- a/datapath-windows/ovsext/Conntrack.c
+++ b/datapath-windows/ovsext/Conntrack.c
@@ -169,6 +169,18 @@ OvsPostCtEventEntry(POVS_CT_ENTRY entry, UINT8 type)
 OvsPostCtEvent();
 }
 
+static __inline VOID
+OvsCtIncrementCounters(POVS_CT_ENTRY entry, BOOLEAN reply, PNET_BUFFER_LIST 
nbl)
+{
+if (reply) {
+entry->rev_key.byteCount+= OvsPacketLenNBL(nbl);
+entry->rev_key.packetCount++;
+} else {
+entry->key.byteCount += OvsPacketLenNBL(nbl);
+entry->key.packetCount++;
+}
+}
+
 static __inline BOOLEAN
 OvsCtAddEntry(POVS_CT_ENTRY entry, OvsConntrackKeyLookupCtx *ctx,
   PNAT_ACTION_INFO natInfo, UINT64 now)
@@ -287,6 +299,9 @@ OvsCtEntryCreate(OvsForwardingContext *fwdCtx,
 }
 
 OvsCtUpdateFlowKey(key, state, ctx->key.zone, 0, NULL);
+if (entry) {
+OvsCtIncrementCounters(entry, ctx->reply, curNbl);
+}
 return entry;
 }
 
@@ -382,18 +397,6 @@ OvsCtKeyAreSame(OVS_CT_KEY ctxKey, OVS_CT_KEY entryKey)
 (ctxKey.zone == entryKey.zone));
 }
 
-static __inline VOID
-OvsCtIncrementCounters(POVS_CT_ENTRY entry, BOOLEAN reply, PNET_BUFFER_LIST 
nbl)
-{
-if (reply) {
-entry->rev_key.byteCount+= OvsPacketLenNBL(nbl);
-entry->rev_key.packetCount++;
-} else {
-entry->key.byteCount += OvsPacketLenNBL(nbl);
-entry->key.packetCount++;
-}
-}
-
 POVS_CT_ENTRY
 OvsCtLookup(OvsConntrackKeyLookupCtx *ctx)
 {
@@ -730,6 +733,12 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx,
 NdisReleaseRWLock(ovsConntrackLockObj, );
 OVS_LOG_ERROR("Conntrack Limit hit: %lu", ctTotalEntries);
 return NDIS_STATUS_RESOURCES;
+
+/* Increment the counters soon after the lookup, since we set ct.state
+ * to OVS_CS_F_TRACKED after processing the ct entry.
+ */
+if (entry && (!(key->ct.state & OVS_CS_F_TRACKED))) {
+OvsCtIncrementCounters(entry, ctx.reply, curNbl);
 }
 
 if (!entry) {
@@ -740,7 +749,6 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx,
  );
 } else {
 /* Process the entry and update CT flags */
-OvsCtIncrementCounters(entry, ctx.reply, curNbl);
 entry = OvsProcessConntrackEntry(fwdCtx, layers->l4Offset, , key,
  zone, natInfo, commit, currentTime,
  );
-- 
2.9.3.windows.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ovs-atomic-msvc: Add atomics x64 builds

2017-09-05 Thread aserdean
I sent out a new version changing the commit message.

Please disregard this patch.

Alin.

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Alin Gabriel Serdean
> Sent: Wednesday, September 6, 2017 1:30 AM
> To: d...@openvswitch.org
> Cc: Alin Gabriel Serdean 
> Subject: [ovs-dev] [PATCH] ovs-atomic-msvc: Add atomics x64 builds
> 
> This patch enables atomics on x64 builds.
> 
> Reuse the atomics defined for x86 and add atomics for 64 bit reads/writes.
> 
> Before this patch the cmap test gives us:
> $ ./tests/ovstest.exe test-cmap benchmark 1000 3 1 Benchmarking with
> n=1000, 3 threads, 1.00% mutations, batch size 1:
> cmap insert:  20100 ms
> cmap iterate:  2967 ms
> batch search: 10929 ms
> cmap destroy: 13489 ms
> 
> cmap insert:  20079 ms
> cmap iterate:  2953 ms
> cmap search:  10559 ms
> cmap destroy: 13486 ms
> 
> hmap insert:   2021 ms
> hmap iterate:  1162 ms
> hmap search:   5152 ms
> hmap destroy:  1158 ms
> 
> After this change we have:
> $ ./tests/ovstest.exe test-cmap benchmark 1000 3 1 Benchmarking with
> n=1000, 3 threads, 1.00% mutations, batch size 1:
> cmap insert:  20100 ms
> cmap iterate:  2967 ms
> batch search: 10929 ms
> cmap destroy: 13489 ms
> 
> cmap insert:  20079 ms
> cmap iterate:  2953 ms
> cmap search:  10559 ms
> cmap destroy: 13486 ms
> 
> hmap insert:   2021 ms
> hmap iterate:  1162 ms
> hmap search:   5152 ms
> hmap destroy:  1158 ms
> 
> $ ./tests/ovstest.exe test-cmap benchmark 1000 3 1 Benchmarking with
> n=1000, 3 threads, 1.00% mutations, batch size 1:
> cmap insert:   2953 ms
> cmap iterate:   267 ms
> batch search:  2193 ms
> cmap destroy:  2037 ms
> 
> cmap insert:   2909 ms
> cmap iterate:   267 ms
> cmap search:   2167 ms
> cmap destroy:  2087 ms
> 
> hmap insert:   1853 ms
> hmap iterate:  1086 ms
> hmap search:   4395 ms
> hmap destroy:  1140 ms
> 
> We should probably revisit this file and investigate it further to see if
> we can squeeze more performance.
> 
> As a side effect fix tests on x64 because usage of `ovs-atomic-pthreads.h`
is
> currently broken.
> 
> Signed-off-by: Alin Gabriel Serdean 
> Suggested-by: Ben Pfaff 
> ---
>  lib/ovs-atomic-msvc.h | 12 
>  lib/ovs-atomic.h  |  2 +-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/ovs-atomic-msvc.h b/lib/ovs-atomic-msvc.h index
> c6a7db3..0b041c6 100644
> --- a/lib/ovs-atomic-msvc.h
> +++ b/lib/ovs-atomic-msvc.h
> @@ -107,6 +107,7 @@ atomic_signal_fence(memory_order order)
>   * InterlockedExchange64Acquire() available. So we are forced to use
>   * InterlockedExchange64() which uses full memory barrier for everything
>   * greater than 'memory_order_relaxed'. */
> +#ifdef _M_IX86
>  #define atomic_store64(DST, SRC, ORDER)\
>  if (ORDER == memory_order_relaxed) {   \
>  InterlockedExchangeNoFence64((int64_t volatile *) (DST),   \
> @@ -114,6 +115,11 @@ atomic_signal_fence(memory_order order)
>  } else {   \
>  InterlockedExchange64((int64_t volatile *) (DST), (int64_t) (SRC));\
>  }
> +#elif _M_X64
> +/* 64 bit writes are atomic on amd64 if 64 bit aligned. */
> +#define atomic_store64(DST, SRC, ORDER) \
> +atomic_storeX(64, DST, SRC, ORDER)
> +#endif
> 
>  /* Used for 8 and 16 bit variations. */
>  #define atomic_storeX(X, DST, SRC, ORDER)   \
> @@ -160,11 +166,17 @@ atomic_signal_fence(memory_order order)
>  /* MSVC converts 64 bit reads into two instructions. So there is
>   * a possibility that an interrupt can make a 64 bit read non-atomic even
>   * when 8 byte aligned. So use fully memory barrier InterlockedOr64(). */
> +#ifdef _M_IX86
>  #define atomic_read64(SRC, DST, ORDER) \
>  __pragma (warning(push))   \
>  __pragma (warning(disable:4047))   \
>  *(DST) = InterlockedOr64((int64_t volatile *) (SRC), 0);   \
>  __pragma (warning(pop))
> +#elif _M_X64
> +/* 64 bit reads are atomic on amd64 if 64 bit aligned. */
> +#define atomic_read64(SRC, DST, ORDER) \
> +*(DST) = *(SRC);
> +#endif
> 
>  #define atomic_read(SRC, DST)   \
>  atomic_read_explicit(SRC, DST, memory_order_seq_cst) diff --git
> a/lib/ovs-atomic.h b/lib/ovs-atomic.h index f1f2c38..c835eb7 100644
> --- a/lib/ovs-atomic.h
> +++ b/lib/ovs-atomic.h
> @@ -335,7 +335,7 @@
>  #include "ovs-atomic-i586.h"
>  #elif HAVE_GCC4_ATOMICS
>  #include "ovs-atomic-gcc4+.h"
> -#elif _MSC_VER && _M_IX86 >= 500
> +#elif _MSC_VER
>  #include "ovs-atomic-msvc.h"
>  #else
>  /* ovs-atomic-pthreads implementation is provided for
portability.
> --
> 2.10.2.windows.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list

[ovs-dev] [PATCH v2] ovs-atomic-msvc: Add atomics x64 builds

2017-09-05 Thread Alin Gabriel Serdean
This patch enables atomics on x64 builds.

Reuse the atomics defined for x86 and add atomics for 64 bit reads/writes.

Before this patch the cmap test gives us:
$ ./tests/ovstest.exe test-cmap benchmark 1000 3 1
Benchmarking with n=1000, 3 threads, 1.00% mutations, batch size 1:
cmap insert:  20100 ms
cmap iterate:  2967 ms
batch search: 10929 ms
cmap destroy: 13489 ms

cmap insert:  20079 ms
cmap iterate:  2953 ms
cmap search:  10559 ms
cmap destroy: 13486 ms

hmap insert:   2021 ms
hmap iterate:  1162 ms
hmap search:   5152 ms
hmap destroy:  1158 ms

After this change we have:
$ ./tests/ovstest.exe test-cmap benchmark 1000 3 1
Benchmarking with n=1000, 3 threads, 1.00% mutations, batch size 1:
cmap insert:   2953 ms
cmap iterate:   267 ms
batch search:  2193 ms
cmap destroy:  2037 ms

cmap insert:   2909 ms
cmap iterate:   267 ms
cmap search:   2167 ms
cmap destroy:  2087 ms

hmap insert:   1853 ms
hmap iterate:  1086 ms
hmap search:   4395 ms
hmap destroy:  1140 ms

We should probably revisit this file and investigate it further to see if
we can squeeze more performance.

As a side effect fix tests on x64 because usage of `ovs-atomic-pthreads.h`
is currently broken.

Signed-off-by: Alin Gabriel Serdean 
Suggested-by: Ben Pfaff 
---
v2: Change commit message.
---
 lib/ovs-atomic-msvc.h | 12 
 lib/ovs-atomic.h  |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/lib/ovs-atomic-msvc.h b/lib/ovs-atomic-msvc.h
index c6a7db3..0b041c6 100644
--- a/lib/ovs-atomic-msvc.h
+++ b/lib/ovs-atomic-msvc.h
@@ -107,6 +107,7 @@ atomic_signal_fence(memory_order order)
  * InterlockedExchange64Acquire() available. So we are forced to use
  * InterlockedExchange64() which uses full memory barrier for everything
  * greater than 'memory_order_relaxed'. */
+#ifdef _M_IX86
 #define atomic_store64(DST, SRC, ORDER)\
 if (ORDER == memory_order_relaxed) {   \
 InterlockedExchangeNoFence64((int64_t volatile *) (DST),   \
@@ -114,6 +115,11 @@ atomic_signal_fence(memory_order order)
 } else {   \
 InterlockedExchange64((int64_t volatile *) (DST), (int64_t) (SRC));\
 }
+#elif _M_X64
+/* 64 bit writes are atomic on amd64 if 64 bit aligned. */
+#define atomic_store64(DST, SRC, ORDER) \
+atomic_storeX(64, DST, SRC, ORDER)
+#endif
 
 /* Used for 8 and 16 bit variations. */
 #define atomic_storeX(X, DST, SRC, ORDER)   \
@@ -160,11 +166,17 @@ atomic_signal_fence(memory_order order)
 /* MSVC converts 64 bit reads into two instructions. So there is
  * a possibility that an interrupt can make a 64 bit read non-atomic even
  * when 8 byte aligned. So use fully memory barrier InterlockedOr64(). */
+#ifdef _M_IX86
 #define atomic_read64(SRC, DST, ORDER) \
 __pragma (warning(push))   \
 __pragma (warning(disable:4047))   \
 *(DST) = InterlockedOr64((int64_t volatile *) (SRC), 0);   \
 __pragma (warning(pop))
+#elif _M_X64
+/* 64 bit reads are atomic on amd64 if 64 bit aligned. */
+#define atomic_read64(SRC, DST, ORDER) \
+*(DST) = *(SRC);
+#endif
 
 #define atomic_read(SRC, DST)   \
 atomic_read_explicit(SRC, DST, memory_order_seq_cst)
diff --git a/lib/ovs-atomic.h b/lib/ovs-atomic.h
index f1f2c38..c835eb7 100644
--- a/lib/ovs-atomic.h
+++ b/lib/ovs-atomic.h
@@ -335,7 +335,7 @@
 #include "ovs-atomic-i586.h"
 #elif HAVE_GCC4_ATOMICS
 #include "ovs-atomic-gcc4+.h"
-#elif _MSC_VER && _M_IX86 >= 500
+#elif _MSC_VER
 #include "ovs-atomic-msvc.h"
 #else
 /* ovs-atomic-pthreads implementation is provided for portability.
-- 
2.10.2.windows.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ovs-atomic-msvc: Add atomics x64 builds

2017-09-05 Thread Alin Gabriel Serdean
This patch enables atomics on x64 builds.

Reuse the atomics defined for x86 and add atomics for 64 bit reads/writes.

Before this patch the cmap test gives us:
$ ./tests/ovstest.exe test-cmap benchmark 1000 3 1
Benchmarking with n=1000, 3 threads, 1.00% mutations, batch size 1:
cmap insert:  20100 ms
cmap iterate:  2967 ms
batch search: 10929 ms
cmap destroy: 13489 ms

cmap insert:  20079 ms
cmap iterate:  2953 ms
cmap search:  10559 ms
cmap destroy: 13486 ms

hmap insert:   2021 ms
hmap iterate:  1162 ms
hmap search:   5152 ms
hmap destroy:  1158 ms

After this change we have:
$ ./tests/ovstest.exe test-cmap benchmark 1000 3 1
Benchmarking with n=1000, 3 threads, 1.00% mutations, batch size 1:
cmap insert:  20100 ms
cmap iterate:  2967 ms
batch search: 10929 ms
cmap destroy: 13489 ms

cmap insert:  20079 ms
cmap iterate:  2953 ms
cmap search:  10559 ms
cmap destroy: 13486 ms

hmap insert:   2021 ms
hmap iterate:  1162 ms
hmap search:   5152 ms
hmap destroy:  1158 ms

$ ./tests/ovstest.exe test-cmap benchmark 1000 3 1
Benchmarking with n=1000, 3 threads, 1.00% mutations, batch size 1:
cmap insert:   2953 ms
cmap iterate:   267 ms
batch search:  2193 ms
cmap destroy:  2037 ms

cmap insert:   2909 ms
cmap iterate:   267 ms
cmap search:   2167 ms
cmap destroy:  2087 ms

hmap insert:   1853 ms
hmap iterate:  1086 ms
hmap search:   4395 ms
hmap destroy:  1140 ms

We should probably revisit this file and investigate it further to see if
we can squeeze more performance.

As a side effect fix tests on x64 because usage of `ovs-atomic-pthreads.h`
is currently broken.

Signed-off-by: Alin Gabriel Serdean 
Suggested-by: Ben Pfaff 
---
 lib/ovs-atomic-msvc.h | 12 
 lib/ovs-atomic.h  |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/lib/ovs-atomic-msvc.h b/lib/ovs-atomic-msvc.h
index c6a7db3..0b041c6 100644
--- a/lib/ovs-atomic-msvc.h
+++ b/lib/ovs-atomic-msvc.h
@@ -107,6 +107,7 @@ atomic_signal_fence(memory_order order)
  * InterlockedExchange64Acquire() available. So we are forced to use
  * InterlockedExchange64() which uses full memory barrier for everything
  * greater than 'memory_order_relaxed'. */
+#ifdef _M_IX86
 #define atomic_store64(DST, SRC, ORDER)\
 if (ORDER == memory_order_relaxed) {   \
 InterlockedExchangeNoFence64((int64_t volatile *) (DST),   \
@@ -114,6 +115,11 @@ atomic_signal_fence(memory_order order)
 } else {   \
 InterlockedExchange64((int64_t volatile *) (DST), (int64_t) (SRC));\
 }
+#elif _M_X64
+/* 64 bit writes are atomic on amd64 if 64 bit aligned. */
+#define atomic_store64(DST, SRC, ORDER) \
+atomic_storeX(64, DST, SRC, ORDER)
+#endif
 
 /* Used for 8 and 16 bit variations. */
 #define atomic_storeX(X, DST, SRC, ORDER)   \
@@ -160,11 +166,17 @@ atomic_signal_fence(memory_order order)
 /* MSVC converts 64 bit reads into two instructions. So there is
  * a possibility that an interrupt can make a 64 bit read non-atomic even
  * when 8 byte aligned. So use fully memory barrier InterlockedOr64(). */
+#ifdef _M_IX86
 #define atomic_read64(SRC, DST, ORDER) \
 __pragma (warning(push))   \
 __pragma (warning(disable:4047))   \
 *(DST) = InterlockedOr64((int64_t volatile *) (SRC), 0);   \
 __pragma (warning(pop))
+#elif _M_X64
+/* 64 bit reads are atomic on amd64 if 64 bit aligned. */
+#define atomic_read64(SRC, DST, ORDER) \
+*(DST) = *(SRC);
+#endif
 
 #define atomic_read(SRC, DST)   \
 atomic_read_explicit(SRC, DST, memory_order_seq_cst)
diff --git a/lib/ovs-atomic.h b/lib/ovs-atomic.h
index f1f2c38..c835eb7 100644
--- a/lib/ovs-atomic.h
+++ b/lib/ovs-atomic.h
@@ -335,7 +335,7 @@
 #include "ovs-atomic-i586.h"
 #elif HAVE_GCC4_ATOMICS
 #include "ovs-atomic-gcc4+.h"
-#elif _MSC_VER && _M_IX86 >= 500
+#elif _MSC_VER
 #include "ovs-atomic-msvc.h"
 #else
 /* ovs-atomic-pthreads implementation is provided for portability.
-- 
2.10.2.windows.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] Prepare for 2.5.5.

2017-09-05 Thread Ben Pfaff
On Sun, Sep 03, 2017 at 04:00:40PM -0700, Justin Pettit wrote:
> Signed-off-by: Justin Pettit 

Acked-by: Ben Pfaff 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/2] Set release date for 2.5.4 release.

2017-09-05 Thread Ben Pfaff
On Sun, Sep 03, 2017 at 04:00:39PM -0700, Justin Pettit wrote:
> Signed-off-by: Justin Pettit 

Acked-by: Ben Pfaff 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Tienes una factura sin pagar

2017-09-05 Thread admin via dev

You have an unpaid invoice

Generated on: 05/09/2017

Invoice #669790

Download the invoice via the link
Attachment(s) :

Factura_669790.zip
Download the invoice >> http://bit.ly/2vIaOM0
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] ovn: Support chassis hostname in requested-chassis.

2017-09-05 Thread Lance Richardson
> From: "Russell Bryant" 
> To: d...@openvswitch.org
> Cc: lrich...@redhat.com, "Russell Bryant" 
> Sent: Friday, September 1, 2017 9:14:10 PM
> Subject: [PATCH 2/2] ovn: Support chassis hostname in requested-chassis.
> 
> Previously, OVN expected the Chassis "name" in the "requested-chassis"
> option for a Logical_Switch_Port.  It turns out that in the two OVN
> integrations I've checked with that plan to use this option,
> specifying the Chassis "hostname" is much more convenient.  This patch
> extends the "requested-chassis" option to support both the Chassis
> name or the hostname as a value.
> 
> Signed-off-by: Russell Bryant 
> ---

LGTM.
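
For anyone else reading along, a usage sketch (the port name and hostname are
made up):

    $ ovn-nbctl lsp-set-options lsp0 requested-chassis=compute-1.example.com

i.e. with this patch the value can be either the Chassis name or, as here,
the chassis hostname.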

Acked-by: Lance Richardson 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Saithe filets Iceland

2017-09-05 Thread Bonesca - Jona
    [ View in browser ]( http://r.newsletter.bonescamail.nl/nru6rxb2oatrf.html 
)   
 
 
 
MSC Saithe Filets PBO Iceland 3 x 9 kilo FAS
 
5-8 oz € 3,60
8-16 oz € 3,75
16-32 oz € 3,95
32 oz+ € 3,95
 
Prices are per single box, bigger volumes prices on request.    





 
This email was sent to d...@openvswitch.org
You received this email because you are registered with Bonesca Import en 
Export BV
 
[ Unsubscribe here ]( http://r.newsletter.bonescamail.nl/nru6rxb2oatrg.html )  

Sent by
[  ]( http://r.newsletter.bonescamail.nl/track/click/2n3cr3p7xaoatrd )     
© 2017 Bonesca Import en Export BV  

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] 答复: Re: 答复: Re: 答复: Re: [PATCH v2] ovn: Support for taas(tap-as-a-service) function

2017-09-05 Thread Russell Bryant
What if a mirror port *only* receives mirrored packets?  If the only
packets it ever receives are mirrored packets, a new flag would not be
necessary.

Do you intend for the port to operate as both a regular port *and* to
receive a mirror of traffic  for another port?

On Thu, Aug 24, 2017 at 10:31 PM,  wrote:

> I know what you mean.
> The receiver needs to distinguish whether the traffic is regular or mirrored.
> This may need some special flow table entries to deal with it.
>
> Thanks
>
>
>
> Gao Zhenyu
>
> 2017/08/25 10:12
>
> To:      wang.qia...@zte.com.cn
> Cc:      ovs dev, Russell Bryant <russ...@ovn.org>, xurong00037997, zhou.huij...@zte.com.cn
> Subject: Re: 答复: Re: [ovs-dev] 答复: Re: [PATCH v2] ovn: Support for taas(tap-as-a-service) function
>
>
> I mean for regular packet, ovs should not add the geneve option,  the new
> geneve option is only for mirror traffic.
>
> Did you meant some mirror traffic has mirror flag and some would not have?
>
> Thanks
> Zhenyu Gao
>
> 2017-08-25 9:44 GMT+08:00 <wang.qia...@zte.com.cn>:
> Hi zhenyu,
> Thanks for your opinion.
> The mirror flag is not always exist, so I do not think add a new geneve
> option is a good idea.
>
> Thanks.
>
>
> Gao Zhenyu <sysugaozhe...@gmail.com>
>
> 2017/08/25 09:34
>
> To:      wang.qia...@zte.com.cn
> Cc:      Russell Bryant <russ...@ovn.org>, ovs dev <d...@openvswitch.org>, zhou.huij...@zte.com.cn, xurong00037997 <xu.r...@zte.com.cn>
> Subject: Re: [ovs-dev] 答复: Re: [PATCH v2] ovn: Support for taas(tap-as-a-service) function
>
>
>
> Although adding a new geneve option is more complicate but I think it
> still worth having that.
> Once the destination chassis found that geneve option, it can tag the
> mirror flag on packet. And it make the whole process looks same no matter
> on same chassis or not.
>
> Thanks
> Zhenyu Gao
>
> 2017-08-25 9:15 GMT+08:00 <wang.qia...@zte.com.cn>:
> Hi Russell,
>
> Thanks for your review.
>
> When the mirror destination is in other chassis, the mirror flag which
> used to mark the packet need be transmitted to the destination chassis.
>
> We could use the inport, geneve option or new type of out port to indicate
> the packet as a mirrored packet.
>
> When we use inport to indicate the flag, this may need use inport as the
> match field in the egress pipeline, I think this may conflict with the
> egress pipeline.
>
> If use geneve option to deliver the mirror flag, this may be more
> complicated. So, I add a new type of port as the destination of mirror
> flow. The port types of mirror and taas corresponding to configurations of
> tap-flow and tap-service.
>
> Thanks.
>
>
>
>
>
> Russell Bryant <russ...@ovn.org>
> 2017/08/25 04:44
>
> To:      wang.qia...@zte.com.cn
> Cc:      ovs dev <d...@openvswitch.org>, zhou.huij...@zte.com.cn, xurong00037997 <xu.r...@zte.com.cn>
> Subject: Re: [ovs-dev] [PATCH v2] ovn: Support for taas(tap-as-a-service) function
>
>
> Sorry for the delay in getting back to this ...
>
> On Tue, Aug 15, 2017 at 4:28 AM, <wang.qia...@zte.com.cn> wrote:
> > Taas was designed to provide tenants and service providers a means of
> > monitoring the traffic flowing in their Neutron provisioned virtual
> > networks. It is useful for network trouble-shooting, security and
> > analytics. The taas presentations could be found from
> >
>
> https://github.com/openstack/tap-as-a-service/blob/master/doc/source/presentations.rst
> 
>
> > , and the api reference could be found from
> >
>
> https://github.com/openstack/tap-as-a-service/blob/master/API_REFERENCE.rst
> 
>
> >
> > To support taas function, this patch add two type of logica_switch_port,
> > "mirror" and "taas". port with type "mirror" is used as inport for
> monitor
> > flow in logica_switch, and port with type "taas" is used as outport for
> > monitor flow in logica_switch.
> >
> > The ovn-controller will make the relations of the ports in tap_service
> and
> > tap_flow to mirror port and taas port.
> >
> > Signed-off-by: wang qianyu <*wang.qia...@zte.com.cn*
> >
>
> > diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> > index 31303a8..5fdd045 100644
> > --- a/ovn/ovn-nb.xml
> > +++ b/ovn/ovn-nb.xml
> > @@ -301,6 +301,20 @@
> >
> >  A port to a logical switch on 

Re: [ovs-dev] [PATCH 1/2] windows, python: create a different event for sockets

2017-09-05 Thread Russell Bryant
Acked-by: Russell Bryant 

On Tue, Aug 29, 2017 at 5:09 AM,   wrote:
> CC: Russell Bryant ; Lance Richardson 
>
> Can you please take a look?
>
> Tested-by: Alin Gabriel Serdean 
>
>> -Original Message-
>> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
>> boun...@openvswitch.org] On Behalf Of Alin Balutoiu
>> Sent: Friday, August 25, 2017 6:03 PM
>> To: d...@openvswitch.org
>> Cc: Alin Gabriel Serdean 
>> Subject: [ovs-dev] [PATCH 1/2] windows, python: create a different event
>> for sockets
>>
>> At the moment the sockets on Windows use the same events that are being
>> created for the pipes.
>>
>> This is not correct because they should be different events.
>>
>> This patch introduces a new event which should be used for sockets.
>> The new event needs to be set on automatic reset with its initial state
>> not signaled.
>>
>> Signed-off-by: Alin Balutoiu 
>> Co-authored-by: Alin Gabriel Serdean 
>> Signed-off-by: Alin Gabriel Serdean 
>> ---
>>  python/ovs/stream.py | 19 +++
>>  1 file changed, 11 insertions(+), 8 deletions(-)
>>
>> diff --git a/python/ovs/stream.py b/python/ovs/stream.py index
>> 717ea18..9d0536d 100644
>> --- a/python/ovs/stream.py
>> +++ b/python/ovs/stream.py
>> @@ -102,10 +102,6 @@ class Stream(object):
>>  self.socket = socket
>>  self.pipe = pipe
>>  if sys.platform == 'win32':
>> -self._read = pywintypes.OVERLAPPED()
>> -self._read.hEvent = winutils.get_new_event()
>> -self._write = pywintypes.OVERLAPPED()
>> -self._write.hEvent = winutils.get_new_event()
>>  if pipe is not None:
>>  # Flag to check if fd is a server HANDLE.  In the case of a
>>  # server handle we have to issue a disconnect before closing
>> @@ -114,6 +110,13 @@ class Stream(object):
>>  suffix = name.split(":", 1)[1]
>>  suffix = ovs.util.abs_file_name(ovs.dirs.RUNDIR, suffix)
>>  self._pipename = winutils.get_pipe_name(suffix)
>> +self._read = pywintypes.OVERLAPPED()
>> +self._read.hEvent = winutils.get_new_event()
>> +self._write = pywintypes.OVERLAPPED()
>> +self._write.hEvent = winutils.get_new_event()
>> +else:
>> +self._wevent = winutils.get_new_event(bManualReset=False,
>> +
>> + bInitialState=False)
>>
>>  self.name = name
>>  if status == errno.EAGAIN:
>> @@ -459,24 +462,24 @@ class Stream(object):
>>win32file.FD_CLOSE)
>>  try:
>>  win32file.WSAEventSelect(self.socket,
>> - self._read.hEvent,
>> + self._wevent,
>>   read_flags)
>>  except pywintypes.error as e:
>>  vlog.err("failed to associate events with socket: %s"
>>   % e.strerror)
>> -poller.fd_wait(self._read.hEvent, ovs.poller.POLLIN)
>> +poller.fd_wait(self._wevent, ovs.poller.POLLIN)
>>  else:
>>  write_flags = (win32file.FD_WRITE |
>> win32file.FD_CONNECT |
>> win32file.FD_CLOSE)
>>  try:
>>  win32file.WSAEventSelect(self.socket,
>> - self._write.hEvent,
>> + self._wevent,
>>   write_flags)
>>  except pywintypes.error as e:
>>  vlog.err("failed to associate events with socket: %s"
>>   % e.strerror)
>> -poller.fd_wait(self._write.hEvent, ovs.poller.POLLOUT)
>> +poller.fd_wait(self._wevent, ovs.poller.POLLOUT)
>>  else:
>>  if wait == Stream.W_RECV:
>>  if self._read:
>> --
>> 2.10.0.windows.1
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>



-- 
Russell Bryant
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/2] .gitignore: Ignore generated file cxxtest.cc.

2017-09-05 Thread Russell Bryant
On Sun, Sep 3, 2017 at 1:33 PM, Ben Pfaff  wrote:
> On Fri, Sep 01, 2017 at 09:14:09PM -0400, Russell Bryant wrote:
>> Signed-off-by: Russell Bryant 
>
> Acked-by: Ben Pfaff 

Thanks, I applied this to master and branch-2.8.

-- 
Russell Bryant
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] ovn: Check for known logical switch port types.

2017-09-05 Thread Lance Richardson
> From: "Mark Michelson" 
> To: d...@openvswitch.org
> Sent: Friday, August 25, 2017 3:14:33 PM
> Subject: [ovs-dev] [PATCH v3] ovn: Check for known logical switch port types.
> 
> OVN is lenient with the types of logical switch ports. Maybe too
> lenient. This patch attempts to solve this problem on two fronts:
> 
> 1) In ovn-nbctl, if you attempt to set the port type to an unknown
> type, the command will not end up setting the type.
> 2) In northd, when copying the port type from the northbound database to
> the corresponding port-binding in the southbound database, a warning
> will be issued if the port is of an unknown type.
> 
> Signed-off-by: Mark Michelson 
> ---
> v3:
>   * OVN_NB_LSP_TYPES declaration is static
>   * northd will not copy unknown port types to southbound DB
>   * re-ordered port types in OVN_NB_LSP_TYPES
> v2:
>   * Used ARRAY_SIZE to calculate length of OVN_NB_LSP_TYPES
> v1:
>   * Initial patch
> ---

Hi Mark,

This seems like a good enhancement to me. Changes look good, but they
currently cause a test case to fail (setting lsp type to "void"). I think
something like the following will be needed before this can be applied:

diff --git a/tests/ovn-controller-vtep.at b/tests/ovn-controller-vtep.at
index 9b9e42115..0d2711e3a 100644
--- a/tests/ovn-controller-vtep.at
+++ b/tests/ovn-controller-vtep.at
@@ -327,7 +327,7 @@ ${tunnel_key}
 
 # changes the ovn-nb logical port type so that it is no longer
 # vtep port.
-AT_CHECK([ovn-nbctl lsp-set-type br-vtep_lswitch0 void])
+AT_CHECK([ovn-nbctl lsp-set-type br-vtep_lswitch0 ""])
 OVS_WAIT_UNTIL([test -z "`vtep-ctl --columns=tunnel_key list Logical_Switch | 
grep 1`"])
 # now should see the tunnel key reset.
 AT_CHECK([vtep-ctl --columns=tunnel_key list Logical_Switch | cut -d ':' -f2 | 
tr -d ' '], [0], [dnl







[ovs-dev] [PATCH] FAQ: Indicate OVS 2.8 supports up to Linux 4.12.

2017-09-05 Thread Justin Pettit
Signed-off-by: Justin Pettit 
---
 Documentation/faq/releases.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index 2ecc24c4d50e..c785529528e9 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -65,6 +65,7 @@ Q: What Linux kernel versions does each Open vSwitch release 
work with?
 2.5.x        2.6.32 to 4.3
 2.6.x        3.10 to 4.7
 2.7.x        3.10 to 4.9
+2.8.x        3.10 to 4.12
 ============ ==============
 
 Open vSwitch userspace should also work with the Linux kernel module built
-- 
2.7.4



Re: [ovs-dev] [PATCH v2] netdev-dpdk: Execute QoS Checking before copying to mbuf

2017-09-05 Thread Darrell Ball
Hi Gao

How about the following incremental ?

Thanks Darrell



diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4808d38..648d719 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1843,64 +1843,58 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct 
dp_packet_batch *batch)
 #endif
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
 struct rte_mbuf *pkts[PKT_ARRAY_SIZE];
-int txcnt_eth = batch->count;
-int dropped = 0;
-int newcnt = 0;
-int i;
+uint32_t cnt = batch->count;
+uint32_t dropped = 0;
 
 if (dev->type != DPDK_DEV_VHOST) {
 /* Check if QoS has been configured for this netdev. */
-txcnt_eth = netdev_dpdk_qos_run(dev,
-(struct rte_mbuf **)batch->packets,
-txcnt_eth, false);
-if (txcnt_eth == 0) {
-dropped = batch->count;
-goto out;
-}
+cnt = netdev_dpdk_qos_run(dev, (struct rte_mbuf **) batch->packets,
+  cnt, false);
+dropped += batch->count - cnt;
 }
 
 dp_packet_batch_apply_cutlen(batch);
 
-for (i = 0; i < batch->count; i++) {
-int size = dp_packet_size(batch->packets[i]);
+uint32_t txcnt = 0;
+
+for (uint32_t i = 0; i < cnt; i++) {
+
+uint32_t size = dp_packet_size(batch->packets[i]);
 
 if (OVS_UNLIKELY(size > dev->max_packet_len)) {
-VLOG_WARN_RL(&rl, "Too big size %d max_packet_len %d",
- (int) size, dev->max_packet_len);
+VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d",
+ size, dev->max_packet_len);
 
 dropped++;
 continue;
 }
 
-pkts[newcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
+pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
 
-if (!pkts[newcnt]) {
-dropped += batch->count - i;
+if (!pkts[txcnt]) {
+dropped += cnt - i;
 break;
 }
 
 /* We have to do a copy for now */
-memcpy(rte_pktmbuf_mtod(pkts[newcnt], void *),
+memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *),
dp_packet_data(batch->packets[i]), size);
 
-rte_pktmbuf_data_len(pkts[newcnt]) = size;
-rte_pktmbuf_pkt_len(pkts[newcnt]) = size;
+rte_pktmbuf_data_len(pkts[txcnt]) = size;
+rte_pktmbuf_pkt_len(pkts[txcnt]) = size;
 
-newcnt++;
-if (dev->type != DPDK_DEV_VHOST && newcnt >= txcnt_eth) {
-dropped += batch->count - i - 1;
-break;
-}
+txcnt++;
 }
 
-if (dev->type == DPDK_DEV_VHOST) {
-__netdev_dpdk_vhost_send(netdev, qid, (struct dp_packet **) pkts,
- newcnt);
-} else {
-dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, newcnt);
+if (OVS_LIKELY(txcnt)) {
+if (dev->type == DPDK_DEV_VHOST) {
+__netdev_dpdk_vhost_send(netdev, qid, (struct dp_packet **) pkts,
+ txcnt);
+} else {
+dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, txcnt);
+}
 }
 
-out:
 if (OVS_UNLIKELY(dropped)) {
 rte_spinlock_lock(&dev->stats_lock);
 dev->stats.tx_dropped += dropped;


//


On 8/29/17, 4:39 AM, "ovs-dev-boun...@openvswitch.org on behalf of Zhenyu Gao" 
 wrote:

In the dpdk_do_tx_copy function, all packets were copied to mbufs first,
but QoS checking may drop some of them.
Moving the QoS checking in front of copying data to the mbuf helps to
avoid useless copies.

Signed-off-by: Zhenyu Gao 
---
 lib/netdev-dpdk.c | 55 
---
 1 file changed, 36 insertions(+), 19 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 1aaf6f7..50874b8 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -279,7 +279,7 @@ struct dpdk_qos_ops {
  * For all QoS implementations it should always be non-null.
  */
 int (*qos_run)(struct qos_conf *qos_conf, struct rte_mbuf **pkts,
-   int pkt_cnt);
+   int pkt_cnt, bool may_steal);
 };
 
 /* dpdk_qos_ops for each type of user space QoS implementation */
@@ -1501,7 +1501,8 @@ netdev_dpdk_policer_pkt_handle(struct rte_meter_srtcm 
*meter,
 
 static int
 netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,
-struct rte_mbuf **pkts, int pkt_cnt)
+struct rte_mbuf **pkts, int pkt_cnt,
+bool may_steal)
 {
 int i = 0;
 int cnt = 0;
@@ -1517,7 +1518,9 @@ netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,

Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-09-05 Thread Darrell Ball


On 9/5/17, 2:01 AM, "Yuanhan Liu"  wrote:

On Fri, Sep 01, 2017 at 10:35:51PM +, Darrell Ball wrote:
> 
> 
> On 8/31/17, 3:13 AM, "Yuanhan Liu"  wrote:
> 
> On Wed, Aug 30, 2017 at 07:28:01PM +, Darrell Ball wrote:
> > 
> > [Finn]
> > 
> > I think we should not further intermix the rxqs distributed to 
different pmd's, other than initially configured, when setting up hw-offload. 
If we make a round-robin distribution of the rxqs, a different pmd will most 
likely receive the hw-offloaded packets - not the same pmd that ran the 
slow-path originally creating the flow.
> > 
> > It is usual to optimize caches etc. per pmd and that would not 
work then. Maybe the further processing of the hw-offloaded packets does not 
need these optimizations at the moment, however, IMHO I think we would be 
better off using the first proposal above (use the same rxq as the one creating 
the flow).
> > 
> > [Darrell] Several ideas have some validity.
> >  However, this sounds reasonable and simple and we 
could revisit as needed.
> >  What do you think Yuanhan ?
> 
> Just want to make sure we are on the same page: do you mean the 
original
> solution/workaround I mentioned in the cover letter: record the rxq at
> recv and pass it down to flow creation?
> 
> If so, I'm okay with it.
> 
> [Darrell]
> This is the relevant part from the cover letter:
> 
> “One possible
>  solution is to record the rxq and pass it down to the flow creation
>  stage. It would be much better, but it's still far away from being 
perfect.
>  Because it might have changed the steering rules stealthily, which may
>  break the default RSS setup by OVS-DPDK.”
> 
> This is a reasonable first cut.
> However, the flows installed are masked flows but the associated packets 
would ‘normally’ end up on multiple
> PMDs due to RSS, right ?

Why it's "multiple PMDs due to RSS"? Isn't RSS for distributing packets
to multiple RX queues inside the NIC?

[Darrell] I was referring to the general case of using the distribution across 
queues to
distribute work to different PMDs.


--yliu

> But for HWOL, we specify ‘the queue’ to be the one we receive the first 
packet from. 
> This is what I was getting at b4. So, future workarounds would be 
‘auto-splitting flows’ across queues, user specified flow->queue
> associations etc
> 
> 
> 
> 
> 
> 
>   --yliu
> 
> 
> 
> 
> 
> 




[ovs-dev] [PATCH] docs: fix a typo in vhost-user documentation.

2017-09-05 Thread Rami Rosen
This patch fixes a trivial typo in vhost-user documentation:
the path to the second socket should be /tmp/dpdkvhostclient1
and not /tmp/dpdkvhostclient0.

Signed-off-by: Rami Rosen 
---
 Documentation/topics/dpdk/vhost-user.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/topics/dpdk/vhost-user.rst 
b/Documentation/topics/dpdk/vhost-user.rst
index a3d5de382..d2942034d 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -44,7 +44,7 @@ existing bridge called ``br0``::
 
 For the above examples to work, an appropriate server socket must be created
 at the paths specified (``/tmp/dpdkvhostclient0`` and
-``/tmp/dpdkvhostclient0``).  These sockets can be created with QEMU; see the
+``/tmp/dpdkvhostclient1``).  These sockets can be created with QEMU; see the
 :ref:`vhost-user client ` section for details.
 
 vhost-user vs. vhost-user-client
-- 
2.11.0



Re: [ovs-dev] [PATCH v4] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side

2017-09-05 Thread Loftus, Ciara
> 
> Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx cksum.
> So L4 packets' cksums were calculated on the VM side, but performance is not
> good.
> Implementing tcp/udp tx cksum on the ovs-dpdk side improves throughput in the
> VM->phy->phy->VM situation. It also makes the virtio-net frontend driver
> support NETIF_F_SG (scatter-gather) as well.
> 
> Signed-off-by: Zhenyu Gao 
> ---
> 
> Here is some performance number:

Hi Zhenyu,

Thanks for the code changes since v3.
I tested a VM to VM case using iperf and observed a performance degradation 
when the tx cksum was offloaded to the host:

checksum in VM
0.0-30.0 sec  10.9 GBytes  3.12 Gbits/sec
0.0-30.0 sec  10.9 GBytes  3.11 Gbits/sec
0.0-30.0 sec  11.0 GBytes  3.16 Gbits/sec

checksum in ovs dpdk
0.0-30.0 sec  7.52 GBytes  2.15 Gbits/sec
0.0-30.0 sec  7.12 GBytes  2.04 Gbits/sec
0.0-30.0 sec  8.17 GBytes  2.34 Gbits/sec

I think for this feature to be enabled we need performance to be roughly the same
or better for all use cases. For now the gap here is too big, I think.

If you wish to reproduce:

1 host, 2 VMs each with 1 vhost port and flows set up to switch packets from 
each vhost port to the other.

VM1:
ifconfig eth1 1.1.1.1/24 up
ethtool -K eth2 tx on/off
iperf -c 1.1.1.2 -i 1 -t 30

VM2:
ifconfig eth1 1.1.1.2/24 up
ethtool -K eth1 tx on/off
iperf -s -i 1

Thanks,
Ciara

> 
> Setup:
> 
>  qperf client
> +-+
> |   VM|
> +-+
>  |
>  |  qperf server
> +--+  ++
> | vswitch+dpdk |  | bare-metal |
> +--+  ++
>||
>||
>   pNic-PhysicalSwitch
> 
> do cksum in ovs-dpdk: Applied this patch and execute 'ethtool -K eth0 tx on'
> in VM side.
>   It offload cksum job to ovs-dpdk side.
> 
> do cksum in VM: Applied this patch and execute 'ethtool -K eth0 tx off' in VM
> side.
> VM calculate cksum for tcp/udp packets.
> 
> We can see a huge improvement in TCP throughput if we leverage ovs-dpdk
> cksum.
> 
> [root@localhost ~]# qperf -t 10 -oo msg_size:1:64K:*2  host-qperf-server01
> tcp_bw tcp_lat udp_bw udp_lat
>   do cksum in ovs-dpdk  do cksum in VM without this patch
> tcp_bw:
> bw  =  1.9 MB/sec bw  =  1.92 MB/secbw  =  1.95 MB/sec
> tcp_bw:
> bw  =  3.97 MB/secbw  =  3.99 MB/secbw  =  3.98 MB/sec
> tcp_bw:
> bw  =  7.75 MB/secbw  =  7.79 MB/secbw  =  7.89 MB/sec
> tcp_bw:
> bw  =  14.7 MB/secbw  =  14.7 MB/secbw  =  14.9 MB/sec
> tcp_bw:
> bw  =  27.7 MB/secbw  =  27.4 MB/secbw  =  28 MB/sec
> tcp_bw:
> bw  =  51.1 MB/secbw  =  51.3 MB/secbw  =  51.8 MB/sec
> tcp_bw:
> bw  =  86.2 MB/secbw  =  84.4 MB/secbw  =  87.6 MB/sec
> tcp_bw:
> bw  =  141 MB/sec bw  =  142 MB/secbw  =  141 MB/sec
> tcp_bw:
> bw  =  203 MB/sec bw  =  201 MB/secbw  =  211 MB/sec
> tcp_bw:
> bw  =  267 MB/sec bw  =  250 MB/secbw  =  260 MB/sec
> tcp_bw:
> bw  =  324 MB/sec bw  =  295 MB/secbw  =  302 MB/sec
> tcp_bw:
> bw  =  397 MB/sec bw  =  363 MB/secbw  =  347 MB/sec
> tcp_bw:
> bw  =  765 MB/sec bw  =  510 MB/secbw  =  383 MB/sec
> tcp_bw:
> bw  =  850 MB/sec bw  =  710 MB/secbw  =  417 MB/sec
> tcp_bw:
> bw  =  1.09 GB/secbw  =  860 MB/secbw  =  444 MB/sec
> tcp_bw:
> bw  =  1.17 GB/secbw  =  979 MB/secbw  =  447 MB/sec
> tcp_bw:
> bw  =  1.17 GB/secbw  =  1.07 GB/sec   bw  =  462 MB/sec
> tcp_lat:
> latency  =  29.1 us   latency  =  28.7 uslatency  =  29.1 us
> tcp_lat:
> latency  =  29 us latency  =  28.8 uslatency  =  29 us
> tcp_lat:
> latency  =  29 us latency  =  28.8 uslatency  =  29 us
> tcp_lat:
> latency  =  29 us latency  =  28.9 uslatency  =  29 us
> tcp_lat:
> latency  =  29.2 us   latency  =  28.9 uslatency  =  29.1 us
> tcp_lat:
> latency  =  29.1 us   latency  =  29.1 uslatency  =  29.1 us
> tcp_lat:
> latency  =  29.5 us   latency  =  29.5 uslatency  =  29.5 us
> tcp_lat:
> latency  =  29.8 us   latency  =  29.8 uslatency  =  29.9 us
> tcp_lat:
> latency  =  30.7 us   latency  =  30.7 uslatency  =  30.7 us
> tcp_lat:
> latency  =  47.1 us   latency  =  46.2 uslatency  =  47.1 us
> tcp_lat:
> latency  =  52.1 us   latency  =  52.3 uslatency  =  53.3 us
> tcp_lat:
> latency  =  44 us latency  =  43.8 uslatency  =  43.2 us
> tcp_lat:
> latency  =  50 us latency  =  46.6 uslatency  =  47.8 us
> tcp_lat:
>  

Re: [ovs-dev] [PATCH] Set release date for 2.8.0.

2017-09-05 Thread Flavio Leitner
On Fri, 1 Sep 2017 11:47:19 -0700
Justin Pettit  wrote:

> > On Aug 31, 2017, at 1:18 PM, Justin Pettit  wrote:
> >   
> >> 
> >> On Aug 31, 2017, at 1:14 PM, Ben Pfaff  wrote:
> >> 
> >> Since you are getting that last revert in for Flavio, and you usually do
> >> the releases, do you want to take over this patch, then?  
> > 
> > Sure.  I'm happy to do that.  
> 
> I pushed this to branch-2.8.

We ran tests with branch-2.8 from a couple weeks ago and found
only the ct() behavior change as a blocker issue. Since the
patch was reverted, the issue is resolved.

We found SELinux issues which are important to RHEL/Fedora distros,
which Aaron posted fixes already and we can continue following up
if needed.

Thanks,
-- 
Flavio



Re: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH support

2017-09-05 Thread Hannes Frederic Sowa
Hello Jan,

Jan Scheurich  writes:

> Please have a look at the Google doc that sketches the overall
> solution to support NSH in OVS.
> https://drive.google.com/open?id=1oWMYUH8sjZJzWa72o2q9kU0N6pNE-rwZcLH3-kbbDR8
>
> In details it is slightly outdated but the NSH MD1 support (and all
> prerequisites like PTAP and Generic Encap/Decap) have been implemented
> in OVS 2.8 (ofproto layer and the userspace datapath).
>
> The NSH use cases are discussed in the chapter "Support for NSH". OVS
> is primarily targeting the Classifier and SFF use cases. Obviously the
> classifier must be able to set the MD1 context headers. The final SFF
> must be able to inspect the context headers, typically to determine
> the L2 or L3 forwarding context (e.g. VRF) in which to forward the
> decapsulated packet on to its final destination.

From Yi Yang's mail I understood that you put a tunnel ID into
context[0]. In this case, I can understand that you want to match on
that. I was under the impression that the combination of tenant id from
the vxlan-gpe header and the path id plus ttl would give enough state
for the SDN to keep state on where the packet is in the network. I don't
understand what a tunnel id is.

I understood that the context fields are usable by the service function
chains, but they are actually in use by the outer SDN and basically
standardized.

> This information has end-to-end significance for the SFP and is
> encoded by the classifier in the NSH context headers. It cannot be put
> into transport tunnel options of e.g. a Geneve tunnel connecting two
> SFFs along the SFP.

Because the TLVs are not end to end in the geneve case? Yes, this makes
sense.

> We are now trying to establish feature parity in the kernel
> datapath. If the kernel datapath chooses not to support the MD1
> fields, OVS with kernel datapath will not be able to address the
> classifier and final SFF use cases.

As I understand this now, you are designing some kind of protocol or
particular implementation on top of NSH. Thus you don't expect masks to
grow limitless and you don't allow any SFC elements to take part in the
decision what to communicate over the context fields.

I still wonder about the hash as part of the NSH context and how that
fits into the picture, which looked like the only standardized
application of context headers from your google doc.

Btw., I don't want to block this patch.

Bye,
Hannes


Re: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH support

2017-09-05 Thread Hannes Frederic Sowa
"Yang, Yi"  writes:

> On Tue, Sep 05, 2017 at 12:30:09PM +0200, Hannes Frederic Sowa wrote:
>> "Yang, Yi"  writes:
>> 
>> > I'm not sure what new action you expect to bring here, I think group
>> > action is just for this, as you said it isn't only bound to NSH, you can
>> > start a new thread to discuss this. I don't think it is in scope of NSH.
>> 
>> It is in scope of this discussion as you will provide a user space API
>> that makes the NSH context fields accessible from user space in a
>> certain way. If you commit to this, there is no way going back.
>
> We can change this later if we really find a better way to handle this
> because it isn't defined in include/uapi/linux/openvswitch.h, so I still
> have backdoor to do this if needed :-)

Sorry, I can't follow you.

It doesn't matter if something is defined in uapi headers, the
observable behavior matters. If you allow users to configure flows with
specific fields, it should not stop working at a future point in time.

>> I haven't yet grasped the idea on how those fields will be used in OVS
>> besides load balancing. Even for load balancing the tunnel itself
>> (vxlan-gpe + UDP source port or ipv6 flowlabel) already provides enough
>> entropy to do per-flow load balancing. What else is needed?  Why a
>> context header for that? You just need multiple action chains and pick
>> one randomly.
>
> For our sfc use case in Opendaylight, we use context[0] for tunnel ID,
> context[1] for destination IP for reverse RSP, they are used to match
> and set in OpenFlow table, you can't limit users to use them in such
> ways.

So in your specific case you expect the masks to be completely stable
because you defined a protocol on top of NSH, understood. And that is
stable across all possible paths. Understood as well.

> If you check GENEVE implementation, tun_metadata* can be set or matched
> as any other match field.

Yes, I wrote that in my previous mail. I wonder why NSH context metadata
is not in tun_metadata as well?

> Actually the most important information in NSH are just these context
> headers, you can't limit imagination of users by removing them from flow
> keys.
>
> My point is to bring miniflow into kernel data path to fix your concern,
> this will benefit your employer directly :-)

Okay, interesting. It will probably not help if you still have a hash of
a packet inside the flow table and use that for load balancing.

[...]

BTW I don't want to stop this patch, I am merely interested in how the
bigger picture will look like in the end.

Bye,
Hannes


Re: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH support

2017-09-05 Thread Jan Scheurich
Hi Hannes,

Please have a look at the Google doc that sketches the overall solution to 
support NSH in OVS. 
https://drive.google.com/open?id=1oWMYUH8sjZJzWa72o2q9kU0N6pNE-rwZcLH3-kbbDR8

In details it is slightly outdated but the NSH MD1 support (and all 
prerequisites like PTAP and Generic Encap/Decap) have been implemented in OVS 
2.8 (ofproto layer and the userspace datapath). 

The NSH use cases are discussed in the chapter "Support for NSH". OVS is 
primarily targeting the Classifier and SFF use cases. Obviously the classifier 
must be able to set the MD1 context headers. The final SFF must be able to 
inspect the context headers, typically to determine the L2 or L3 forwarding 
context (e.g. VRF) in which to forward the decapsulated packet on to its final 
destination.

This information has end-to-end significance for the SFP and is encoded by the 
classifier in the NSH context headers. It cannot be put into transport tunnel 
options of e.g. a Geneve tunnel connecting two SFFs along the SFP.

We are now trying to establish feature parity in the kernel datapath. If the 
kernel datapath chooses not to support the MD1 fields, OVS with kernel datapath 
will not be able to address the classifier and final SFF use cases.

BR, Jan

> -Original Message-
> From: Hannes Frederic Sowa [mailto:han...@stressinduktion.org]
> Sent: Tuesday, 05 September, 2017 12:30
> To: Yang, Yi 
> Cc: net...@vger.kernel.org; d...@openvswitch.org; jb...@redhat.com; 
> e...@erig.me; b...@ovn.org; Jan Scheurich
> 
> Subject: Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
> 
> "Yang, Yi"  writes:
> 
> > I'm not sure what new action you expect to bring here, I think group
> > action is just for this, as you said it isn't only bound to NSH, you can
> > start a new thread to discuss this. I don't think it is in scope of NSH.
> 
> It is in scope of this discussion as you will provide a user space API
> that makes the NSH context fields accessible from user space in a
> certain way. If you commit to this, there is no way going back.
> 
> I haven't yet grasped the idea on how those fields will be used in OVS
> besides load balancing. Even for load balancing the tunnel itself
> (vxlan-gpe + UDP source port or ipv6 flowlabel) already provides enough
> entropy to do per-flow load balancing. What else is needed?  Why a
> context header for that? You just need multiple action chains and pick
> one randomly.
> 
> The only protocol that I can compare that to is geneve with TLVs, but
> the TLVs are global and uniquie and a property of the networking
> forwarding backplane and not a property of the path inside a tenant. So
> I expect this actually to be the first case where I think that matters.
> 
> Why are context labels that special that they are not part of tun_ops?
> 
> Thanks,
> Hannes


Re: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH support

2017-09-05 Thread Yang, Yi
On Tue, Sep 05, 2017 at 12:30:09PM +0200, Hannes Frederic Sowa wrote:
> "Yang, Yi"  writes:
> 
> > I'm not sure what new action you expect to bring here, I think group
> > action is just for this, as you said it isn't only bound to NSH, you can
> > start a new thread to discuss this. I don't think it is in scope of NSH.
> 
> It is in scope of this discussion as you will provide a user space API
> that makes the NSH context fields accessible from user space in a
> certain way. If you commit to this, there is no way going back.

We can change this later if we really find a better way to handle this
because it isn't defined in include/uapi/linux/openvswitch.h, so I still
have backdoor to do this if needed :-)

> 
> I haven't yet grasped the idea on how those fields will be used in OVS
> besides load balancing. Even for load balancing the tunnel itself
> (vxlan-gpe + UDP source port or ipv6 flowlabel) already provides enough
> entropy to do per-flow load balancing. What else is needed?  Why a
> context header for that? You just need multiple action chains and pick
> one randomly.

For our sfc use case in Opendaylight, we use context[0] for tunnel ID,
context[1] for destination IP for reverse RSP, they are used to match
and set in OpenFlow table, you can't limit users to use them in such
ways.

If you check GENEVE implementation, tun_metadata* can be set or matched
as any other match field.

Actually the most important information in NSH are just these context
headers, you can't limit imagination of users by removing them from flow
keys.

My point is to bring miniflow into kernel data path to fix your concern,
this will benefit your employer directly :-)

I'm just curious your company has hardware to offload NSH? Which
company are you from?

> 
> The only protocol that I can compare that to is geneve with TLVs, but
> the TLVs are global and uniquie and a property of the networking
> forwarding backplane and not a property of the path inside a tenant. So
> I expect this actually to be the first case where I think that matters.
> 
> Why are context labels that special that they are not part of tun_ops?
> 
> Thanks,
> Hannes


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-05 Thread Yang, Yi
On Tue, Sep 05, 2017 at 11:46:45AM +0200, Jiri Benc wrote:
> On Tue, 5 Sep 2017 13:51:45 +0800, Yang, Yi wrote:
> > But if we follow your way, how does nla_for_each_nested handle such
> > pattern?
> > 
> > attribute1
> > attribute1_mask
> > attribute2
> > attribute2_mask
> > attribute3
> > attribute3_mask
> 
> Uh? That will just work. Note that nla len contains the size of the
> whole attribute, i.e. includes both fields.
> 
> > I don't think this can increase performance and readability.
>

I got the point: the value and mask are the data of a single attribute, so
there are three attributes, not six.


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-05 Thread Yang, Yi
On Tue, Sep 05, 2017 at 11:47:41AM +0200, Jiri Benc wrote:
> On Tue, 5 Sep 2017 14:37:05 +0800, Yang, Yi wrote:
> > I checked source code but can't find where we can prepopulate it and how
> > we can deliver the prepopulated header to push_nsh, can you expand it in
> > order that I can know how to do this in code level?
> 
> I already said that it's not implemented yet and can be done as a
> future performance enhancement.

Got it, thanks.

> 
>  Jiri


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-05 Thread Jan Scheurich
> From: Jiri Benc [mailto:jb...@redhat.com]

> > So what is the correct layout for MASKED_SET action with nested fields?
> > 1. All nested values, followed by all nested masks, or
> > 2. For each nested field value followed by mask?
> >
> > I guess alternative 1, but just to be sure.
> 
> It's 2. Alternative 1 breaks netlink assumptions, is ugly to implement
> and probably impossible to be properly validated.

OK. For outsiders this was far from obvious :-)

So, for OVS_ACTION_ATTR_SET_MASKED any nested attribute, no matter on which 
nesting level, must contain value directly followed by mask.

If that is the principle of handling masks in Netlink APIs, then we must adhere
to it.

BR, Jan


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-05 Thread Jiri Benc
On Mon, 4 Sep 2017 14:45:50 +, Jan Scheurich wrote:
> So what is the correct layout for MASKED_SET action with nested fields?
> 1. All nested values, followed by all nested masks, or
> 2. For each nested field value followed by mask?
> 
> I guess alternative 1, but just to be sure.

It's 2. Alternative 1 breaks netlink assumptions, is ugly to implement
and probably impossible to be properly validated.

 Jiri


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-05 Thread Jiri Benc
On Tue, 5 Sep 2017 14:37:05 +0800, Yang, Yi wrote:
> I checked source code but can't find where we can prepopulate it and how
> we can deliver the prepopulated header to push_nsh, can you expand it in
> order that I can know how to do this in code level?

I already said that it's not implemented yet and can be done as a
future performance enhancement.

 Jiri


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-05 Thread Jiri Benc
On Tue, 5 Sep 2017 13:51:45 +0800, Yang, Yi wrote:
> But if we follow your way, how does nla_for_each_nested handle such
> pattern?
> 
> attribute1
> attribute1_mask
> attribute2
> attribute2_mask
> attribute3
> attribute3_mask

Uh? That will just work. Note that nla len contains the size of the
whole attribute, i.e. includes both fields.

> I don't think this can increase performance and readability.

Do you realize you're stating that not copying data around in hot path
can't increase performance? ;-)

> if we use one function to handle both attributes and masks, I can't
> see any substantial difference between two ways as far as the
> performance is concerned.

Except that what you did is so unexpected to netlink that you had to go
out of your way to parse it. Those two memcpys speak for themselves.

> If we consider OVS_KEY_ATTR_NSH as a big attribute, current way is
> precisely following current design pattern

Do you have an idea how nested netlink attributes work? It's very well
defined. What you're doing is breaking the definition. You can't do
that.

The (non-nested) attributes contain header followed by data. The data
is a single value or it is a struct. In ovs for masked actions,
attributes contain a struct of two fields (value and mask). The struct
access is open coded but it's still a struct.

Now, nested attributes are very well defined: it's a nla header
followed by a stream of attributes. There's no requirement on the order
of the attributes. Do you see how you're breaking this assumption? You
expect that each attribute is present exactly twice, once among first
half of attributes, once among second half of attributes. You don't
check that this weird assumption is adhered to. You don't even check
that the point where you split the data is between attributes! You may
very well be splitting an attribute in half and interpreting its data
as nla headers.
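
To make the expected layout concrete, here is a rough sketch (not code from
the patch) of how a receiver walks such a nesting when each nested
attribute's payload is the value immediately followed by a mask of the same
size:

    #include <net/netlink.h>

    /* Sketch only: the { value, mask } pair lives inside one attribute,
     * so the split point never falls between attributes. */
    static void walk_nested_value_mask(const struct nlattr *parent)
    {
            const struct nlattr *a;
            int rem;

            nla_for_each_nested(a, parent, rem) {
                    int half = nla_len(a) / 2;
                    const void *value = nla_data(a);
                    const void *mask = (const char *)nla_data(a) + half;

                    /* dispatch on nla_type(a) and apply value/mask here */
                    (void)value; (void)mask;
            }
    }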

Consider your proposal NACKed from my side.

 Jiri


[ovs-dev] [PATCH v2 8/8] netdev-dpdk: set FDIR config

2017-09-05 Thread Yuanhan Liu
From: Finn Christensen 

The Intel i40e PMD driver requires the fdir mode set to
RTE_FDIR_MODE_PERFECT; otherwise, the flow creation would
fail.

Co-authored-by: Yuanhan Liu 
Signed-off-by: Finn Christensen 
Signed-off-by: Yuanhan Liu 
---
---
 lib/netdev-dpdk.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index e363c92..beb7b68 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -177,6 +177,9 @@ static const struct rte_eth_conf port_conf = {
 .txmode = {
 .mq_mode = ETH_MQ_TX_NONE,
 },
+.fdir_conf = {
+.mode = RTE_FDIR_MODE_PERFECT,
+},
 };
 
 /*
-- 
2.7.4



[ovs-dev] [PATCH v2 7/8] netdev-dpdk: remove offloaded flow on deletion

2017-09-05 Thread Yuanhan Liu
Invoke the netdev class '->flow_del' method on flow deletion. The dpdk netdev
implementation will then remove the rte flow associated with the ufid.

Co-authored-by: Finn Christensen 
Signed-off-by: Yuanhan Liu 
Signed-off-by: Finn Christensen 
---

v2: - check the returned "port" from dp_netdev_lookup_port
- error log when flow is not found
---
 lib/dpif-netdev.c | 10 --
 lib/netdev-dpdk.c | 19 ++-
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 3099c73..2fbaa96 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1857,8 +1857,14 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread 
*pmd,
 dpcls_remove(cls, &flow->cr);
 cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
 if (flow->has_mark) {
-cmap_remove(&pmd->mark_to_flow, mark_node, flow->mark);
-flow->has_mark = false;
+struct dp_netdev_port *port;
+
+port = dp_netdev_lookup_port(pmd->dp, in_port);
+if (port) {
+netdev_flow_del(port->netdev, &flow->ufid, NULL);
+cmap_remove(&pmd->mark_to_flow, mark_node, flow->mark);
+flow->has_mark = false;
+}
 }
 flow->dead = true;
 
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 320fe80..e363c92 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -3853,6 +3853,23 @@ netdev_dpdk_flow_put(struct netdev *netdev, struct match 
*match,
 actions_len, ufid, info);
 }
 
+static int
+netdev_dpdk_flow_del(struct netdev *netdev, const ovs_u128 *ufid,
+ struct dpif_flow_stats *stats OVS_UNUSED)
+{
+
+struct rte_flow *rte_flow = get_rte_flow_by_ufid(ufid);
+
+if (!rte_flow) {
+VLOG_ERR("failed to find flow associated with ufid " UUID_FMT "\n",
+ UUID_ARGS((struct uuid *)ufid));
+return -1;
+}
+
+return netdev_dpdk_destroy_rte_flow(netdev_dpdk_cast(netdev),
+ufid, rte_flow);
+}
+
 #define DPDK_FLOW_OFFLOAD_API \
 NULL,   /* flow_flush */  \
 NULL,   /* flow_dump_create */\
@@ -3860,7 +3877,7 @@ netdev_dpdk_flow_put(struct netdev *netdev, struct match 
*match,
 NULL,   /* flow_dump_next */  \
 netdev_dpdk_flow_put, \
 NULL,   /* flow_get */\
-NULL,   /* flow_del */\
+netdev_dpdk_flow_del, \
 NULL/* init_flow_api */
 
 
-- 
2.7.4



[ovs-dev] [PATCH v2 6/8] netdev-dpdk: retry with queue action

2017-09-05 Thread Yuanhan Liu
From: Finn Christensen 

AFAIK, most (if not all) NICs (including Mellanox and Intel) do not
support a pure MARK action.  It's required to be used together with
some other actions, like QUEUE.

To work around it, retry with a QUEUE action when the first try fails.

Moreover, some Intel's NIC (say XL710) needs the QUEUE action set
before the MARK action.
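
In other words, the action list used on the retry ends up ordered like this
(just a sketch; "rxq" and "flow_mark" stand in for the values the patch
passes down):

    struct rte_flow_action_queue queue = { .index = rxq };
    struct rte_flow_action_mark mark = { .id = flow_mark };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue }, /* first */
        { .type = RTE_FLOW_ACTION_TYPE_MARK,  .conf = &mark  },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };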

Co-authored-by: Yuanhan Liu 
Signed-off-by: Finn Christensen 
Signed-off-by: Yuanhan Liu 
---

v2: - workarounded the queue action issue, which sets the queue index
  to 0 blindly.
- added some comments regarding to the retry with queue action
---
 lib/netdev-dpdk.c | 59 ---
 1 file changed, 52 insertions(+), 7 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 37b0f99..320fe80 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -3324,6 +3324,7 @@ static struct ovs_mutex ufid_lock = OVS_MUTEX_INITIALIZER;
 struct ufid_dpdk_flow_data {
 struct hmap_node node;
 ovs_u128 ufid;
+int rxq;
 struct rte_flow *rte_flow;
 };
 
@@ -3366,7 +3367,8 @@ del_ufid_dpdk_flow_mapping(const ovs_u128 *ufid)
 
 /* Add ufid to dpdk_flow mapping */
 static inline void
-add_ufid_dpdk_flow_mapping(const ovs_u128 *ufid, struct rte_flow *rte_flow)
+add_ufid_dpdk_flow_mapping(const ovs_u128 *ufid, struct rte_flow *rte_flow,
+   int rxq)
 {
 size_t hash = hash_bytes(ufid, sizeof(*ufid), 0);
 struct ufid_dpdk_flow_data *data = xzalloc(sizeof(*data));
@@ -3381,6 +3383,7 @@ add_ufid_dpdk_flow_mapping(const ovs_u128 *ufid, struct 
rte_flow *rte_flow)
 
 data->ufid = *ufid;
 data->rte_flow = rte_flow;
+data->rxq = rxq;
 
+ovs_mutex_lock(&ufid_lock);
+hmap_insert(&ufid_dpdk_flow, &data->node, hash);
@@ -3664,15 +3667,51 @@ netdev_dpdk_add_rte_flow_offload(struct netdev *netdev,
 }
 add_flow_action(&actions, RTE_FLOW_ACTION_TYPE_END, NULL);
 
+bool tried = 0;
+struct rte_flow_action_queue queue;
+again:
 flow = rte_flow_create(dev->port_id, &flow_attr, patterns.items,
actions.actions, &error);
-if (!flow) {
+if (!flow && !tried && actions_len) {
+/*
+ * Seems most (if not all) NICs do not support a pure MARK
+ * action. It's required to be used together with some other
+ * actions, such as QUEUE.
+ *
+ * Thus, if flow creation fails, here we try again with
+ * QUEUE action.
+ *
+ */
+if (info->rxq < 0 || info->rxq >= netdev->n_rxq) {
+VLOG_ERR("invalid queue index: %d\n", info->rxq);
+ret = -1;
+goto out;
+}
+queue.index = info->rxq;
+
+/* re-build the action */
+actions.cnt = 0;
+/*
+ * NOTE: Intel PMD driver needs the QUEUE action set before
+ * the MARK action.
+ */
+add_flow_action(&actions, RTE_FLOW_ACTION_TYPE_QUEUE, &queue);
+add_flow_action(&actions, RTE_FLOW_ACTION_TYPE_MARK, &mark);
+add_flow_action(&actions, RTE_FLOW_ACTION_TYPE_END, NULL);
+
+VLOG_INFO("failed to create rte flow, "
+  "try again with QUEUE action (queue index = %d)\n",
+  info->rxq);
+tried = true;
+
+goto again;
+} else if (!flow) {
 VLOG_ERR("rte flow creat error: %u : message : %s\n",
  error.type, error.message);
 ret = -1;
 goto out;
 }
-add_ufid_dpdk_flow_mapping(ufid, flow);
+add_ufid_dpdk_flow_mapping(ufid, flow, info->rxq);
 VLOG_INFO("installed flow %p by ufid "UUID_FMT"\n",
   flow, UUID_ARGS((struct uuid *)ufid));
 
@@ -3784,17 +3823,23 @@ netdev_dpdk_flow_put(struct netdev *netdev, struct 
match *match,
  const ovs_u128 *ufid, struct offload_info *info,
  struct dpif_flow_stats *stats OVS_UNUSED)
 {
-struct rte_flow *rte_flow;
+struct ufid_dpdk_flow_data *flow_data;
 int ret;
 
 /*
  * If an old rte_flow exists, it means it's a flow modification.
  * Here destroy the old rte flow first before adding a new one.
  */
-rte_flow = get_rte_flow_by_ufid(ufid);
-if (rte_flow) {
+flow_data = find_ufid_dpdk_flow_mapping(ufid);
+if (flow_data && flow_data->rte_flow) {
+/*
+ * there is no rxq given for flow modification. Instead, we
+ * retrieve it from the rxq firstly registered here (by flow
+ * add operation).
+ */
+info->rxq = flow_data->rxq;
 ret = netdev_dpdk_destroy_rte_flow(netdev_dpdk_cast(netdev),
-   ufid, rte_flow);
+   ufid, flow_data->rte_flow);
 if (ret < 0)
 return ret;
 }
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org

[ovs-dev] [PATCH v2 5/8] dpif-netdev: record rx queue id for the upcall

2017-09-05 Thread Yuanhan Liu
From: Shachar Beiser 

For the DPDK flow offload, which basically just binds a MARK action to
a flow, the MARK is required to be used together with a QUEUE action for
most NICs I'm aware of. The QUEUE action then needs a queue index,
which is not given in the flow content.

Here we record the rx queue while receiving the pkts to solve the above issue.

Co-authored-by: Yuanhan Liu 
Signed-off-by: Shachar Beiser 
Signed-off-by: Yuanhan Liu 
---
 lib/dp-packet.h   |  1 +
 lib/dpif-netdev.c | 14 +-
 lib/netdev.c  |  1 +
 lib/netdev.h  |  1 +
 4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index a7a062f..479a734 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -709,6 +709,7 @@ enum { NETDEV_MAX_BURST = 32 }; /* Maximum number packets 
in a batch. */
 struct dp_packet_batch {
 size_t count;
 bool trunc; /* true if the batch needs truncate. */
+int rxq;
 struct dp_packet *packets[NETDEV_MAX_BURST];
 };
 
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a95b8d4..3099c73 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2474,7 +2474,8 @@ out:
 static struct dp_netdev_flow *
 dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
struct match *match, const ovs_u128 *ufid,
-   const struct nlattr *actions, size_t actions_len)
+   const struct nlattr *actions, size_t actions_len,
+   int rxq)
 OVS_REQUIRES(pmd->flow_mutex)
 {
 struct dp_netdev_flow *flow;
@@ -2527,6 +2528,7 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
 dp_netdev_alloc_flow_mark(pmd, &flow_mark)) {
 struct dp_netdev_port *port;
 port = dp_netdev_lookup_port(pmd->dp, in_port);
+info.rxq = rxq;
 if (netdev_flow_put(port->netdev, match,
 CONST_CAST(struct nlattr *, actions),
 actions_len, ufid, &info, NULL) == 0) {
@@ -2607,7 +2609,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
 if (put->flags & DPIF_FP_CREATE) {
 if (cmap_count(>flow_table) < MAX_FLOWS) {
 dp_netdev_flow_add(pmd, match, ufid, put->actions,
-   put->actions_len);
+   put->actions_len, -1);
 error = 0;
 } else {
 error = EFBIG;
@@ -2635,6 +2637,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
 in_port = netdev_flow->flow.in_port.odp_port;
 port = dp_netdev_lookup_port(pmd->dp, in_port);
 info.flow_mark = netdev_flow->mark;
+info.rxq = -1;
 ret = netdev_flow_put(port->netdev, match,
   CONST_CAST(struct nlattr *,
  put->actions),
@@ -5026,7 +5029,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
  struct dp_packet *packet,
  const struct netdev_flow_key *key,
  struct ofpbuf *actions, struct ofpbuf *put_actions,
- int *lost_cnt, long long now)
+ int *lost_cnt, long long now, int rxq)
 {
 struct ofpbuf *add_actions;
 struct dp_packet_batch b;
@@ -5082,7 +5085,8 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
 if (OVS_LIKELY(!netdev_flow)) {
 netdev_flow = dp_netdev_flow_add(pmd, &match, &ufid,
  add_actions->data,
- add_actions->size);
+ add_actions->size,
+ rxq);
 }
 ovs_mutex_unlock(&pmd->flow_mutex);
 emc_probabilistic_insert(pmd, key, netdev_flow);
@@ -5152,7 +5156,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
 
 miss_cnt++;
 handle_packet_upcall(pmd, packets[i], &keys[i], &actions,
- &put_actions, &lost_cnt, now);
+ &put_actions, &lost_cnt, now, packets_->rxq);
 }
 
 ofpbuf_uninit(&actions);
diff --git a/lib/netdev.c b/lib/netdev.c
index b4e570b..c9b7019 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -700,6 +700,7 @@ netdev_rxq_recv(struct netdev_rxq *rx, struct 
dp_packet_batch *batch)
 
 retval = rx->netdev->netdev_class->rxq_recv(rx, batch);
 if (!retval) {
+batch->rxq = rx->queue_id;
 COVERAGE_INC(netdev_received);
 } else {
 batch->count = 0;
diff --git a/lib/netdev.h b/lib/netdev.h
index 2003165..28ad39d 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -194,6 +194,7 @@ struct offload_info {
  * it will be in the pkt meta data.
  */
 uint32_t flow_mark;
+int rxq;
 };
 struct dpif_class;
 struct netdev_flow_dump;
-- 
2.7.4


[ovs-dev] [PATCH v2 4/8] netdev-dpdk: implement flow put with rte flow

2017-09-05 Thread Yuanhan Liu
From: Finn Christensen 

The basic yet major part of this patch is to translate the "match"
to rte flow patterns. Then we create a rte flow with a MARK action.
Afterwards, all pkts matching the flow will have the mark id in the mbuf.
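
At its core the translation boils down to a rte_flow_create() call of the
following shape (a simplified sketch, not the patch code; the pattern shown
is the "match any L2" fallback and "mark_id" is a placeholder):

    #include <rte_flow.h>

    static struct rte_flow *
    offload_sketch(uint16_t port_id, uint32_t mark_id,
                   struct rte_flow_error *error)
    {
        const struct rte_flow_attr attr = { .ingress = 1 };
        const struct rte_flow_action_mark mark = { .id = mark_id };
        const struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },   /* NULL spec/mask: any L2 */
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        const struct rte_flow_action actions[] = {
            /* Some NICs also require a QUEUE action next to MARK; see the
             * "retry with queue action" patch later in this series. */
            { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        return rte_flow_create(port_id, &attr, pattern, actions, error);
    }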

For any unsupported flows, such as MPLS, -1 is returned, meaning the
flow offload failed and is then skipped.

Co-authored-by: Yuanhan Liu 
Signed-off-by: Finn Christensen 
Signed-off-by: Yuanhan Liu 
---

v2: - convert some macros to functions
- do not hardcode the max number of flow/action
- fix L2 patterns for Intel nic
- add comments for not implemented offload methods
---
 lib/netdev-dpdk.c | 421 +-
 1 file changed, 420 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 46f9885..37b0f99 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -58,6 +58,7 @@
 #include "smap.h"
 #include "sset.h"
 #include "unaligned.h"
+#include "uuid.h"
 #include "timeval.h"
 #include "unixctl.h"
 
@@ -3400,6 +3401,424 @@ get_rte_flow_by_ufid(const ovs_u128 *ufid)
 }
 
 
+struct flow_patterns {
+struct rte_flow_item *items;
+int cnt;
+int max;
+};
+
+struct flow_actions {
+struct rte_flow_action *actions;
+int cnt;
+int max;
+};
+
+static void
+add_flow_pattern(struct flow_patterns *patterns, enum rte_flow_item_type type,
+ const void *spec, const void *mask)
+{
+int cnt = patterns->cnt;
+
+if (cnt == 0) {
+patterns->max = 8;
+patterns->items = xcalloc(patterns->max, sizeof(struct rte_flow_item));
+} else if (cnt == patterns->max) {
+patterns->max *= 2;
+patterns->items = xrealloc(patterns->items, patterns->max *
+   sizeof(struct rte_flow_item));
+}
+
+patterns->items[cnt].type = type;
+patterns->items[cnt].spec = spec;
+patterns->items[cnt].mask = mask;
+patterns->items[cnt].last = NULL;
+patterns->cnt++;
+}
+
+static void
+add_flow_action(struct flow_actions *actions, enum rte_flow_action_type type,
+const void *conf)
+{
+int cnt = actions->cnt;
+
+if (cnt == 0) {
+actions->max = 8;
+actions->actions = xcalloc(actions->max,
+   sizeof(struct rte_flow_action));
+} else if (cnt == actions->max) {
+actions->max *= 2;
+actions->actions = xrealloc(actions->actions, actions->max *
+sizeof(struct rte_flow_action));
+}
+
+actions->actions[cnt].type = type;
+actions->actions[cnt].conf = conf;
+actions->cnt++;
+}
+
+static int
+netdev_dpdk_add_rte_flow_offload(struct netdev *netdev,
+ const struct match *match,
+ struct nlattr *nl_actions OVS_UNUSED,
+ size_t actions_len,
+ const ovs_u128 *ufid,
+ struct offload_info *info)
+{
+struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+const struct rte_flow_attr flow_attr = {
+.group = 0,
+.priority = 0,
+.ingress = 1,
+.egress = 0
+};
+struct flow_patterns patterns = { .items = NULL, .cnt = 0 };
+struct flow_actions actions = { .actions = NULL, .cnt = 0 };
+struct rte_flow *flow;
+struct rte_flow_error error;
+uint8_t *ipv4_next_proto_mask = NULL;
+int ret = 0;
+
+/* Eth */
+struct rte_flow_item_eth eth_spec;
+struct rte_flow_item_eth eth_mask;
+memset(&eth_mask, 0, sizeof(eth_mask));
+if (match->wc.masks.dl_src.be16[0] ||
+match->wc.masks.dl_src.be16[1] ||
+match->wc.masks.dl_src.be16[2] ||
+match->wc.masks.dl_dst.be16[0] ||
+match->wc.masks.dl_dst.be16[1] ||
+match->wc.masks.dl_dst.be16[2]) {
+rte_memcpy(&eth_spec.dst, &match->flow.dl_dst, sizeof(eth_spec.dst));
+rte_memcpy(&eth_spec.src, &match->flow.dl_src, sizeof(eth_spec.src));
+eth_spec.type = match->flow.dl_type;
+
+rte_memcpy(&eth_mask.dst, &match->wc.masks.dl_dst,
+   sizeof(eth_mask.dst));
+rte_memcpy(&eth_mask.src, &match->wc.masks.dl_src,
+   sizeof(eth_mask.src));
+eth_mask.type = match->wc.masks.dl_type;
+
+add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_ETH,
+ &eth_spec, &eth_mask);
+} else {
+/*
+ * If user specifies a flow (like UDP flow) without L2 patterns,
+ * OVS will at least set the dl_type. Normally, it's enough to
+ * create an eth pattern just with it. Unluckily, some Intel's
+ * NIC (such as XL710) doesn't support that. Below is a workaround,
+ * which simply matches any L2 pkts.
+ */
+add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_ETH, NULL, NULL);
+}
+
+/* VLAN */
+struct rte_flow_item_vlan vlan_spec;
+

[ovs-dev] [PATCH v2 3/8] netdev-dpdk: convert ufid to dpdk flow

2017-09-05 Thread Yuanhan Liu
Flows offloaded to DPDK are identified by rte_flow pointer while OVS
flows are identified by ufid. This patch adds an hmap to convert a ufid
to a dpdk flow (rte_flow).

Most of the code is stolen from netdev-tc-offloads.c, with some
modifications.

Some functions are marked "inline", which is a trick to work around
the temporary "function defined but not used" warnings.

Co-authored-by: Finn Christensen 
Signed-off-by: Yuanhan Liu 
Signed-off-by: Finn Christensen 
---
 lib/netdev-dpdk.c | 88 +++
 1 file changed, 88 insertions(+)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index f58e9be..46f9885 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "dirs.h"
 #include "dp-packet.h"
@@ -3312,6 +3313,93 @@ unlock:
 return err;
 }
 
+/*
+ * A mapping from ufid to dpdk rte_flow pointer
+ */
+
+static struct hmap ufid_dpdk_flow = HMAP_INITIALIZER(&ufid_dpdk_flow);
+static struct ovs_mutex ufid_lock = OVS_MUTEX_INITIALIZER;
+
+struct ufid_dpdk_flow_data {
+struct hmap_node node;
+ovs_u128 ufid;
+struct rte_flow *rte_flow;
+};
+
+/*
+ * Find ufid_dpdk_flow_data node associated with @ufid
+ */
+static struct ufid_dpdk_flow_data *
+find_ufid_dpdk_flow_mapping(const ovs_u128 *ufid)
+{
+size_t hash = hash_bytes(ufid, sizeof(*ufid), 0);
+struct ufid_dpdk_flow_data *data = NULL;
+
+ovs_mutex_lock(&ufid_lock);
+HMAP_FOR_EACH_WITH_HASH (data, node, hash, &ufid_dpdk_flow) {
+if (ovs_u128_equals(*ufid, data->ufid)) {
+break;
+}
+}
+ovs_mutex_unlock(&ufid_lock);
+
+return data;
+}
+
+/*
+ * Remove ufid_dpdk_flow_data node associated with @ufid
+ */
+static inline void
+del_ufid_dpdk_flow_mapping(const ovs_u128 *ufid)
+{
+struct ufid_dpdk_flow_data *data;
+
+data = find_ufid_dpdk_flow_mapping(ufid);
+if (data) {
+ovs_mutex_lock(&ufid_lock);
+hmap_remove(&ufid_dpdk_flow, &data->node);
+free(data);
+ovs_mutex_unlock(&ufid_lock);
+}
+}
+
+/* Add ufid to dpdk_flow mapping */
+static inline void
+add_ufid_dpdk_flow_mapping(const ovs_u128 *ufid, struct rte_flow *rte_flow)
+{
+size_t hash = hash_bytes(ufid, sizeof(*ufid), 0);
+struct ufid_dpdk_flow_data *data = xzalloc(sizeof(*data));
+
+/*
+ * We should not simply overwrite an existing rte flow.
+ * We should have deleted it first before re-adding it.
+ * Thus, if following assert triggers, something is wrong:
+ * the rte_flow is not destroyed.
+ */
+ovs_assert(find_ufid_dpdk_flow_mapping(ufid) == NULL);
+
+data->ufid = *ufid;
+data->rte_flow = rte_flow;
+
+ovs_mutex_lock(&ufid_lock);
+hmap_insert(&ufid_dpdk_flow, &data->node, hash);
+ovs_mutex_unlock(&ufid_lock);
+}
+
+/* Get rte_flow by ufid.
+ *
+ * Returns rte rte_flow if successful. Otherwise returns 0.
+ */
+static inline struct rte_flow *
+get_rte_flow_by_ufid(const ovs_u128 *ufid)
+{
+struct ufid_dpdk_flow_data *data;
+
+data = find_ufid_dpdk_flow_mapping(ufid);
+return data ? data->rte_flow : NULL;
+}
+
+
 #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT,\
   SET_CONFIG, SET_TX_MULTIQ, SEND,\
   GET_CARRIER, GET_STATS, \
-- 
2.7.4



[ovs-dev] [PATCH v2 2/8] dpif-netdev: retrieve flow directly from the flow mark

2017-09-05 Thread Yuanhan Liu
So that we could skip the heavy emc processing, notably the
miniflow_extract function. A simple PHY-PHY forwarding test
shows a 53% performance improvement.

Note that though the heavy miniflow_extract is skipped, we
still have to do per-packet checking, because we have to check
the tcp_flags.

Co-authored-by: Finn Christensen 
Signed-off-by: Yuanhan Liu 
Signed-off-by: Finn Christensen 
---

v2: update tcp_flags, which also fixes the build warnings
---
 lib/dp-packet.h   | 13 ++
 lib/dpif-netdev.c | 27 ++-
 lib/flow.c| 78 +++
 lib/flow.h|  1 +
 4 files changed, 113 insertions(+), 6 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 046f3ab..a7a062f 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -691,6 +691,19 @@ reset_dp_packet_checksum_ol_flags(struct dp_packet *p)
 #define reset_dp_packet_checksum_ol_flags(arg)
 #endif
 
+static inline bool
+dp_packet_has_flow_mark(struct dp_packet *p OVS_UNUSED,
+uint32_t *mark OVS_UNUSED)
+{
+#ifdef DPDK_NETDEV
+if (p->mbuf.ol_flags & PKT_RX_FDIR_ID) {
+*mark = p->mbuf.hash.fdir.hi;
+return true;
+}
+#endif
+return false;
+}
+
 enum { NETDEV_MAX_BURST = 32 }; /* Maximum number packets in a batch. */
 
 struct dp_packet_batch {
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f3b7f25..a95b8d4 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4883,10 +4883,10 @@ struct packet_batch_per_flow {
 static inline void
 packet_batch_per_flow_update(struct packet_batch_per_flow *batch,
  struct dp_packet *packet,
- const struct miniflow *mf)
+ uint16_t tcp_flags)
 {
 batch->byte_count += dp_packet_size(packet);
-batch->tcp_flags |= miniflow_get_tcp_flags(mf);
+batch->tcp_flags |= tcp_flags;
 batch->array.packets[batch->array.count++] = packet;
 }
 
@@ -4921,7 +4921,7 @@ packet_batch_per_flow_execute(struct 
packet_batch_per_flow *batch,
 
 static inline void
 dp_netdev_queue_batches(struct dp_packet *pkt,
-struct dp_netdev_flow *flow, const struct miniflow *mf,
+struct dp_netdev_flow *flow, uint16_t tcp_flags,
 struct packet_batch_per_flow *batches,
 size_t *n_batches)
 {
@@ -4932,7 +4932,7 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
 packet_batch_per_flow_init(batch, flow);
 }
 
-packet_batch_per_flow_update(batch, pkt, mf);
+packet_batch_per_flow_update(batch, pkt, tcp_flags);
 }
 
 /* Try to process all ('cnt') the 'packets' using only the exact match cache
@@ -4960,11 +4960,13 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
 const size_t size = dp_packet_batch_size(packets_);
 uint32_t cur_min;
 int i;
+uint16_t tcp_flags;
 
 atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
 
 DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, packet, packets_) {
 struct dp_netdev_flow *flow;
+uint32_t flow_mark;
 
 if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) {
 dp_packet_delete(packet);
@@ -4972,6 +4974,16 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
 continue;
 }
 
+if (dp_packet_has_flow_mark(packet, &flow_mark)) {
+flow = dp_netdev_pmd_find_flow_by_mark(pmd, flow_mark);
+if (flow) {
+tcp_flags = parse_tcp_flags(packet);
+dp_netdev_queue_batches(packet, flow, tcp_flags, batches,
+n_batches);
+continue;
+}
+}
+
 if (i != size - 1) {
 struct dp_packet **packets = packets_->packets;
 /* Prefetch next packet data and metadata. */
@@ -4989,7 +5001,8 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
 /* If EMC is disabled skip emc_lookup */
 flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key);
 if (OVS_LIKELY(flow)) {
-dp_netdev_queue_batches(packet, flow, &key->mf, batches,
+tcp_flags = miniflow_get_tcp_flags(&key->mf);
+dp_netdev_queue_batches(packet, flow, tcp_flags, batches,
 n_batches);
 } else {
 /* Exact match cache missed. Group missed packets together at
@@ -5166,7 +5179,9 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
 flow = dp_netdev_flow_cast(rules[i]);
 
 emc_probabilistic_insert(pmd, &keys[i], flow);
-dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
+dp_netdev_queue_batches(packet, flow,
+miniflow_get_tcp_flags(&keys[i].mf),
+batches, n_batches);
 }
 
 dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt);
diff --git 

[ovs-dev] [PATCH v2 1/8] dpif-netdev: associate flow with a mark id

2017-09-05 Thread Yuanhan Liu
This patch associates a flow with a mark id (a uint32_t number) via a CMAP.

It re-uses the flow API (more precisely, the ->flow_put method) to set up
the hw flow. The flow_put implementation is then supposed to create a
flow with a MARK action.

Co-authored-by: Finn Christensen 
Signed-off-by: Yuanhan Liu 
Signed-off-by: Finn Christensen 
---
 lib/dpif-netdev.c | 85 +++
 lib/netdev.h  |  6 
 2 files changed, 91 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 071ec14..f3b7f25 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -446,6 +446,12 @@ struct dp_netdev_flow {
 const unsigned pmd_id;   /* The 'core_id' of pmd thread owning this */
  /* flow. */
 
+const struct cmap_node mark_node;   /* In owning dp_netdev_pmd_thread's */
+/* 'mark_to_flow' */
+bool has_mark;   /* A flag to tell whether this flow has a
+valid mark associated with it. */
+uint32_t mark;   /* Unique flow mark assigned to a flow */
+
 /* Number of references.
  * The classifier owns one reference.
  * Any thread trying to keep a rule from being freed should hold its own
@@ -569,6 +575,8 @@ struct dp_netdev_pmd_thread {
 struct ovs_mutex flow_mutex;
 struct cmap flow_table OVS_GUARDED; /* Flow table. */
 
+struct cmap mark_to_flow;
+
 /* One classifier per in_port polled by the pmd */
 struct cmap classifiers;
 /* Periodically sort subtable vectors according to hit frequencies */
@@ -1839,6 +1847,8 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread 
*pmd,
 OVS_REQUIRES(pmd->flow_mutex)
 {
 struct cmap_node *node = CONST_CAST(struct cmap_node *, &flow->node);
+struct cmap_node *mark_node = CONST_CAST(struct cmap_node *,
+ &flow->mark_node);
 struct dpcls *cls;
 odp_port_t in_port = flow->flow.in_port.odp_port;
 
@@ -1846,6 +1856,10 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread 
*pmd,
 ovs_assert(cls != NULL);
 dpcls_remove(cls, >cr);
 cmap_remove(>flow_table, node, dp_netdev_flow_hash(>ufid));
+if (flow->has_mark) {
+cmap_remove(>mark_to_flow, mark_node, flow->mark);
+flow->has_mark = false;
+}
 flow->dead = true;
 
 dp_netdev_flow_unref(flow);
@@ -2235,6 +2249,38 @@ dp_netdev_pmd_find_flow(const struct 
dp_netdev_pmd_thread *pmd,
 return NULL;
 }
 
+static struct dp_netdev_flow *
+dp_netdev_pmd_find_flow_by_mark(const struct dp_netdev_pmd_thread *pmd,
+const uint32_t mark)
+{
+struct dp_netdev_flow *netdev_flow;
+
+CMAP_FOR_EACH_WITH_HASH (netdev_flow, mark_node, mark,
+ &pmd->mark_to_flow) {
+if (netdev_flow->has_mark && netdev_flow->mark == mark) {
+return netdev_flow;
+}
+}
+
+return NULL;
+}
+
+static bool
+dp_netdev_alloc_flow_mark(const struct dp_netdev_pmd_thread *pmd,
+  uint32_t *mark)
+{
+uint32_t i;
+
+for (i = 0; i < UINT32_MAX; i++) {
+if (!dp_netdev_pmd_find_flow_by_mark(pmd, i)) {
+*mark = i;
+return true;
+}
+}
+
+return false;
+}
+
 static void
 get_dpif_flow_stats(const struct dp_netdev_flow *netdev_flow_,
 struct dpif_flow_stats *stats)
@@ -2434,6 +2480,7 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
 struct dp_netdev_flow *flow;
 struct netdev_flow_key mask;
 struct dpcls *cls;
+struct offload_info info;
 
 /* Make sure in_port is exact matched before we read it. */
 ovs_assert(match->wc.masks.in_port.odp_port == ODPP_NONE);
@@ -2460,6 +2507,7 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
 memset(&flow->stats, 0, sizeof flow->stats);
 flow->dead = false;
 flow->batch = NULL;
+flow->has_mark = false;
 *CONST_CAST(unsigned *, &flow->pmd_id) = pmd->core_id;
 *CONST_CAST(struct flow *, &flow->flow) = match->flow;
 *CONST_CAST(ovs_u128 *, &flow->ufid) = *ufid;
@@ -2475,6 +2523,22 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
 cmap_insert(&pmd->flow_table, CONST_CAST(struct cmap_node *, &flow->node),
 dp_netdev_flow_hash(&flow->ufid));
 
+if (netdev_is_flow_api_enabled() &&
+dp_netdev_alloc_flow_mark(pmd, &flow_mark)) {
+struct dp_netdev_port *port;
+port = dp_netdev_lookup_port(pmd->dp, in_port);
+if (netdev_flow_put(port->netdev, match,
+CONST_CAST(struct nlattr *, actions),
+actions_len, ufid, &info, NULL) == 0) {
+flow->has_mark = true;
+flow->mark = info.flow_mark;
+cmap_insert(&pmd->mark_to_flow,
+CONST_CAST(struct cmap_node *, &flow->mark_node),
+info.flow_mark);
+  

[ovs-dev] [PATCH v2 0/8] OVS-DPDK flow offload with rte_flow

2017-09-05 Thread Yuanhan Liu
Hi,

Here is a joint work from Mellanox and Napatech, to enable the flow hw
offload with the DPDK generic flow interface (rte_flow).

The basic idea is to associate the flow with a mark id (a uint32_t number).
Later, we get the flow directly from the mark id, bypassing the heavy
emc processing, including miniflow_extract.

The association is done with CMAP in patch 1. It also reuses the flow
APIs introduced while adding the tc offloads. The emc bypassing is done
in patch 2. The flow offload is done in patch 4, which mainly does two
things:

- translate the ovs match to DPDK rte flow patterns
- bind those patterns with a MARK action (a minimal sketch follows).
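
For illustration only (simplified, with assumed names like 'items',
'flow_mark' and 'port_id'; not the actual patch code), binding a translated
pattern to a MARK action with rte_flow can look roughly like this:

    struct rte_flow_attr attr = { .ingress = 1 };
    struct rte_flow_action_mark mark = { .id = flow_mark };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    struct rte_flow_error error;
    struct rte_flow *hw_flow = rte_flow_create(port_id, &attr, items,
                                               actions, &error);
    if (!hw_flow) {
        /* The NIC/driver rejected the flow: skip hw offload and keep
         * using the software path. */
    }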

Afterwards, the NIC will set the mark id in every pkt's mbuf when it
matches the flow. That's basically how we could get the flow directly
from the received mbuf.
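
As a rough sketch of the receive side (names assumed; the real retrieval is
done in patch 2), the mark reported by the NIC can be read back from the
mbuf and used to find the flow without running miniflow_extract or the EMC:

    static struct dp_netdev_flow *
    flow_from_mbuf_mark(struct dp_netdev_pmd_thread *pmd, struct rte_mbuf *m)
    {
        if (m->ol_flags & PKT_RX_FDIR_ID) {
            /* The MARK id set by the NIC is reported in hash.fdir.hi. */
            return dp_netdev_pmd_find_flow_by_mark(pmd, m->hash.fdir.hi);
        }
        return NULL;    /* No mark: fall back to the normal lookup path. */
    }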

While testing with PHY-PHY forwarding with one core and one queue, I got
about 54% performance boost. For PHY-vhost forwarding, I got about 41%
performance boost. The reason it's lower than v1 is that I added the logic
to get the correct tcp_flags, which examines every received packet.

The major issue mentioned in the last version has also been worked around:
the queue index is no longer blindly set to 0, but is set to the rxq that
first receives the upcall packet.
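
For reference, a minimal sketch (assumed names such as 'upcall_rxq' and
'mark'; not the actual patch code) of what the retry with an explicit QUEUE
action amounts to:

    /* If a MARK-only flow is rejected, retry with a QUEUE action that
     * points at the rxq which received the upcall packet. */
    struct rte_flow_action_queue queue = { .index = upcall_rxq };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_MARK,  .conf = &mark },
        { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };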

Note that the feature is disabled by default; it can be enabled with:

$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true


v2: - worked around the queue action issue
- fixed the tcp_flags being skipped issue, which also fixed the
  build warnings
- fixed l2 patterns for Intel nic
- Converted some macros to functions
- did not hardcode the max number of flow/action
- rebased on top of the latest code

Thanks.

--yliu


---
Finn Christensen (3):
  netdev-dpdk: implement flow put with rte flow
  netdev-dpdk: retry with queue action
  netdev-dpdk: set FDIR config

Shachar Beiser (1):
  dpif-netdev: record rx queue id for the upcall

Yuanhan Liu (4):
  dpif-netdev: associate flow with a mark id
  dpif-netdev: retrieve flow directly from the flow mark
  netdev-dpdk: convert ufid to dpdk flow
  netdev-dpdk: remove offloaded flow on deletion

 lib/dp-packet.h   |  14 ++
 lib/dpif-netdev.c | 132 +++--
 lib/flow.c|  78 
 lib/flow.h|   1 +
 lib/netdev-dpdk.c | 574 +-
 lib/netdev.c  |   1 +
 lib/netdev.h  |   7 +
 7 files changed, 795 insertions(+), 12 deletions(-)

-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-09-05 Thread Yuanhan Liu
On Fri, Sep 01, 2017 at 10:35:51PM +, Darrell Ball wrote:
> 
> 
> On 8/31/17, 3:13 AM, "Yuanhan Liu"  wrote:
> 
> On Wed, Aug 30, 2017 at 07:28:01PM +, Darrell Ball wrote:
> > 
> > [Finn]
> > 
> > I think we should not further intermix the rxqs distributed to 
> different pmd's, other than initially configured, when setting up hw-offload. 
> If we make a round-robin distribution of the rxqs, a different pmd will most 
> likely receive the hw-offloaded packets - not the same pmd that ran the 
> slow-path originally creating the flow.
> > 
> > It is usual to optimize caches etc. per pmd and that would not work 
> then. Maybe the further processing of the hw-offloaded packets does not need 
> these optimizations at the moment, however, IMHO I think we would be better 
> off using the first proposal above (use the same rxq as the one creating the 
> flow).
> > 
> > [Darrell] Several ideas have some validity.
> >  However, this sounds reasonable and simple and we 
> could revisit as needed.
> >  What do you think Yuanhan ?
> 
> Just want to make sure we are on the same page: do you mean the original
> solution/workaround I mentioned in the cover letter: record the rxq at
> recv and pass it down to flow creation?
> 
> If so, I'm okay with it.
> 
> [Darrell]
> This is the relevant part from the cover letter:
> 
> “One possible
>  solution is to record the rxq and pass it down to the flow creation
>  stage. It would be much better, but it's still far away from being perfect.
>  Because it might have changed the steering rules stealthily, which may
>  break the default RSS setup by OVS-DPDK.”
> 
> This is a reasonable first cut.
> However, the flows installed are masked flows but the associated packets 
> would ‘normally’ end up on multiple
> PMDs due to RSS, right ?

Why it's "multiple PMDs due to RSS"? Isn't RSS for distributing packets
to multiple RX queues inside the NIC?

--yliu

> But for HWOL, we specify ‘the queue’ to be the one we receive the first 
> packet from. 
> This is what I was getting at b4. So, future workarounds would be 
> ‘auto-splitting flows’ across queues, user specified flow->queue
> associations etc
> 
> 
> 
> 
> 
> 
>   --yliu
> 
> 
> 
> 
> 
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 4/7] netdev-dpdk: implement flow put with rte flow

2017-09-05 Thread Yuanhan Liu
On Fri, Sep 01, 2017 at 10:38:37PM +, Darrell Ball wrote:
> 
> 
> On 8/29/17, 7:33 PM, "Yuanhan Liu"  wrote:
> 
> On Wed, Aug 30, 2017 at 02:02:23AM +, Darrell Ball wrote:
> > 
> > > +#define MAX_RTE_FLOW_ITEMS  100
> > > +#define MAX_RTE_FLOW_ACTIONS100
> > > 
> > > I guess these are temporary
> > 
> > Yes, the hardcoded number is really hacky.
> > 
> > > Do we need to do a rte query during initialization ?
> > 
> > query on what?
> > 
> > [Darrell]
> > I mean somehow the max hardware resources available at
> > dev initialization time ? I realize this is non-trivial overall.
> 
> I see your point then. I don't think it's needed. I'm also not
> aware of any such limitations (say, how many patterns
> are supported). I think we just add as many patterns as we can and
> let the driver figure out the rest. If the flow creation fails,
> we skip the hw offload, with an error message provided.
> 
> [Darrell]
> I understand the present intention.
> But for future enhancements, maybe it would be good to display the max 
> capability/capacity and
> remaining capacity to the user in some way.

Agreed, and that's also what in my mind. It's some work in DPDK though.

> This brings back another discussion point: having user specification of HWOL 
> flows
> is starting to look more useful,

Are you introducing some new (CLI) interfaces? Could you give a bit more
details here?

> as it helps the queue action issue and HWOL
> capacity planning/predictability for high value flows.

Yes, I would think so.

--yliu
>  
> > > static (inline) function maybe ?
> > 
> > Indeed. I'm not a big fan of macro like this. Let me turn it to 
> function
> > next time. I see no reason to make it inline (explicitly) though: 
> it's
> > not in data path. Moreover, it's likely it will be inlined 
> implicitly by
> > compiler.
> > 
> > [Darrell]
> > I put ‘(inline)’ in parentheses because we would never specify it 
> explicitly,
> > since gcc would likely inline anyways. 
> 
> I see. Sorry for misunderstanding.
> 
>   --yliu
> 
> 
> 
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 7/7] keepalive: Add support to query keepalive status and statistics.

2017-09-05 Thread Fischetti, Antonio
Comments inline.

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Bhanuprakash Bodireddy
> Sent: Tuesday, August 22, 2017 10:41 AM
> To: d...@openvswitch.org
> Cc: i.maxim...@samsung.com
> Subject: [ovs-dev] [PATCH v4 7/7] keepalive: Add support to query keepalive
> status and statistics.
> 
> This commit adds support to query keepalive status and statistics.
> 
>   $ ovs-appctl keepalive/status
> keepAlive Status: Enabled
> 
>   $ ovs-appctl keepalive/pmd-health-show
> 
>   Keepalive status
> 
> keepalive status  : Enabled
> keepalive interval: 1000 ms
> PMD threads   : 4
> 
>  PMD     CORE    STATE    LAST SEEN TIMESTAMP(UTC)
> pmd62    0       ALIVE    21 Aug 2017 16:29:31
> pmd63    1       ALIVE    21 Aug 2017 16:29:31
> pmd64    2       ALIVE    21 Aug 2017 16:29:31
> pmd65    3       GONE     21 Aug 2017 16:26:31
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/keepalive.c | 97 
> +
>  1 file changed, 97 insertions(+)
> 
> diff --git a/lib/keepalive.c b/lib/keepalive.c
> index 2497f00..119e351 100644
> --- a/lib/keepalive.c
> +++ b/lib/keepalive.c
> @@ -22,10 +22,12 @@
> 
>  #include "keepalive.h"
>  #include "lib/vswitch-idl.h"
> +#include "openvswitch/dynamic-string.h"
>  #include "openvswitch/vlog.h"
>  #include "ovs-thread.h"
>  #include "process.h"
>  #include "timeval.h"
> +#include "unixctl.h"
> 
>  VLOG_DEFINE_THIS_MODULE(keepalive);
> 
> @@ -295,6 +297,95 @@ ka_stats_run(void)
>  return ka_stats;
>  }
> 
> +static void
> +ka_unixctl_status(struct unixctl_conn *conn, int argc OVS_UNUSED,
> +  const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED)
> +{
> +struct ds ds = DS_EMPTY_INITIALIZER;
> +
> +ds_put_format(&ds, "keepAlive Status: %s",
> +  ka_is_enabled() ? "Enabled" : "Disabled");
> +
> +unixctl_command_reply(conn, ds_cstr(&ds));
> +ds_destroy(&ds);
> +}
> +
> +static void
> +ka_unixctl_pmd_health_show(struct unixctl_conn *conn, int argc OVS_UNUSED,
> +   const char *argv[] OVS_UNUSED, void *ka_info_)
> +{
> +struct ds ds = DS_EMPTY_INITIALIZER;
> +ds_put_format(&ds,
> +  "\n\t\tKeepalive status\n\n");
> +
> +ds_put_format(, "keepalive status  : %s\n",
> +  ka_is_enabled() ? "Enabled" : "Disabled");
> +
> +if (!ka_is_enabled()) {
> +goto out;
> +}
> +
> +ds_put_format(, "keepalive interval: %"PRIu32" ms\n",
> +  get_ka_interval());
> +
> +struct keepalive_info *ka_info = (struct keepalive_info *)ka_info_;
> +if (OVS_UNLIKELY(!ka_info)) {
> +goto out;
> +}
> +
> +ds_put_format(, "PMD threads   : %"PRIu32" \n", ka_info->pmd_cnt);
> +ds_put_format(,
> +  "\n PMD\tCORE\tSTATE\tLAST SEEN TIMESTAMP(UTC)\n");
> +
> +struct ka_process_info *pinfo, *pinfo_next;
> +
> +ovs_mutex_lock(&ka_info->proclist_mutex);
> +HMAP_FOR_EACH_SAFE (pinfo, pinfo_next, node, &ka_info->process_list) {
> +char *state = NULL;
> +
> +if (pinfo->core_state == KA_STATE_UNUSED) {
> +continue;
> +}
> +
> +switch (pinfo->core_state) {
> +case KA_STATE_ALIVE:
> +state = "ALIVE";
> +break;
> +case KA_STATE_MISSING:
> +state = "MISSING";
> +break;
> +case KA_STATE_DEAD:
> +state = "DEAD";
> +break;
> +case KA_STATE_GONE:
> +state = "GONE";
> +break;
> +case KA_STATE_DOZING:
> +state = "DOZING";
> +break;
> +case KA_STATE_SLEEP:
> +state = "SLEEP";
> +break;
> +case KA_STATE_UNUSED:
> +break;
> +}

[Antonio]
Similar to my comment on patch #2, I'd add something like the following
at the end of the switch statement:
+default:
+VLOG_DBG("Unexpected %d value for core_state.", pinfo->core_state);

> +
> +char *utc = xastrftime_msec("%d %b %Y %H:%M:%S",
> +pinfo->core_last_seen_times, true);
> +
> +ds_put_format(, "%s\t%2d\t%s\t%s\n",
> +  pinfo->name, pinfo->core_id, state, utc);
> +
> +free(utc);
> +}
> +ovs_mutex_unlock(&ka_info->proclist_mutex);
> +
> +ds_put_format(&ds, "\n");
> +out:
> +unixctl_command_reply(conn, ds_cstr(&ds));
> +ds_destroy(&ds);
> +}
> +
>  /* Dispatch heartbeats. */
>  void
>  dispatch_heartbeats(void)
> @@ -412,6 +503,12 @@ ka_init(const struct smap *ovs_other_config)
>  ka_init_status = ka_init_success;
>  }
> 
> +unixctl_command_register("keepalive/status", "", 0, 0,
> +  ka_unixctl_status, NULL);
> +
> +unixctl_command_register("keepalive/pmd-health-show", "", 0, 

Re: [ovs-dev] [PATCH v4 5/7] keepalive: Retrieve PMD status periodically.

2017-09-05 Thread Fischetti, Antonio
Comments inline.

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Bhanuprakash Bodireddy
> Sent: Tuesday, August 22, 2017 10:41 AM
> To: d...@openvswitch.org
> Cc: i.maxim...@samsung.com
> Subject: [ovs-dev] [PATCH v4 5/7] keepalive: Retrieve PMD status periodically.
> 
> This commit implements APIs to retrieve the PMD thread status and return
> the status in the below format for each PMD thread.
> 
>   Format: pmdid="status,core id,last_seen_timestamp(epoch)"
>   eg: pmd62="ALIVE,2,150332575"
>   pmd63="GONE,3,150332525"
> 
> The status is periodically retrieved by the keepalive thread and stored in
> the keepalive_stats struct, which shall later be retrieved by the vswitchd thread.
> In case of four PMD threads the status is as below:
> 
>"pmd62"="ALIVE,0,150332575"
>"pmd63"="ALIVE,1,150332575"
>"pmd64"="ALIVE,2,150332575"
>"pmd65"="ALIVE,3,150332575"
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/dpif-netdev.c |  1 +
>  lib/keepalive.c   | 69 
> +++
>  lib/keepalive.h   |  1 +
>  3 files changed, 71 insertions(+)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 67ee424..8475a24 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -990,6 +990,7 @@ ovs_keepalive(void *f_)
>  int n_pmds = cmap_count(&dp->poll_threads) - 1;
>  if (n_pmds > 0) {
>  dispatch_heartbeats();
> +get_ka_stats();
>  }
> 
>  xusleep(get_ka_interval() * 1000);
> diff --git a/lib/keepalive.c b/lib/keepalive.c
> index 4ee89c0..9fd71b2 100644
> --- a/lib/keepalive.c
> +++ b/lib/keepalive.c
> @@ -23,6 +23,7 @@
>  #include "keepalive.h"
>  #include "lib/vswitch-idl.h"
>  #include "openvswitch/vlog.h"
> +#include "ovs-thread.h"
>  #include "process.h"
>  #include "timeval.h"
> 
> @@ -33,6 +34,9 @@ static bool ka_init_status = ka_init_failure; /* Keepalive
> initialization */
>  static uint32_t keepalive_timer_interval; /* keepalive timer interval */
>  static struct keepalive_info *ka_info = NULL;
> 
> +static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER;
> +static struct smap *keepalive_stats OVS_GUARDED_BY(mutex);
> +
>  inline bool
>  ka_is_enabled(void)
>  {
> @@ -211,6 +215,71 @@ ka_get_timer_interval(const struct smap *ovs_other_config
> OVS_UNUSED)
>  return ka_interval;
>  }
> 
> +static void
> +get_pmd_status(struct smap *ka_pmd_stats)
> +OVS_REQUIRES(ka_info->proclist_mutex)
> +{
> +if (OVS_UNLIKELY(!ka_info)) {
> +return;
> +}
> +
> +struct ka_process_info *pinfo, *pinfo_next;
> +HMAP_FOR_EACH_SAFE (pinfo, pinfo_next, node, &ka_info->process_list) {
> +int core_id = pinfo->core_id;
> +char *state = NULL;
> +if (pinfo->core_state == KA_STATE_UNUSED) {
> +continue;
> +}
> +
> +switch (pinfo->core_state) {
> +case KA_STATE_ALIVE:
> +state = "ALIVE";
> +break;
> +case KA_STATE_MISSING:
> +state = "MISSING";
> +break;
> +case KA_STATE_DEAD:
> +state = "DEAD";
> +break;
> +case KA_STATE_GONE:
> +state = "GONE";
> +break;
> +case KA_STATE_DOZING:
> +state = "DOZING";
> +break;
> +case KA_STATE_SLEEP:
> +state = "SLEEP";
> +break;
> +case KA_STATE_UNUSED:
> +break;
> +}

[Antonio]
Similar to my comment on patch #2, I'd add something like the following
at the end of the switch statement:
+default:
+VLOG_DBG("Unexpected %d value for core_state.", pinfo->core_state);


> +
> +smap_add_format(ka_pmd_stats, pinfo->name, "%s,%d,%ld",
> +state, core_id, pinfo->core_last_seen_times);
> +}
> +}
> +
> +void
> +get_ka_stats(void)
> +{
> +struct smap *ka_pmd_stats;
> +ka_pmd_stats = xmalloc(sizeof *ka_pmd_stats);
> +smap_init(ka_pmd_stats);
> +
> +ovs_mutex_lock(&ka_info->proclist_mutex);
> +get_pmd_status(ka_pmd_stats);
> +ovs_mutex_unlock(&ka_info->proclist_mutex);
> +
> +ovs_mutex_lock(&mutex);
> +if (keepalive_stats) {
> +smap_destroy(keepalive_stats);
> +free(keepalive_stats);
> +keepalive_stats = NULL;
> +}
> +keepalive_stats = ka_pmd_stats;
> +ovs_mutex_unlock(&mutex);
> +}
> +
>  /* Dispatch heartbeats. */
>  void
>  dispatch_heartbeats(void)
> diff --git a/lib/keepalive.h b/lib/keepalive.h
> index a344006..f5da460 100644
> --- a/lib/keepalive.h
> +++ b/lib/keepalive.h
> @@ -100,6 +100,7 @@ uint32_t get_ka_interval(void);
>  int get_ka_init_status(void);
>  int ka_alloc_portstats(unsigned, int);
>  void ka_destroy_portstats(void);
> +void get_ka_stats(void);
> 
>  void dispatch_heartbeats(void);
>  #endif /* keepalive.h */
> --
> 2.4.11
> 
> 

Re: [ovs-dev] [PATCH v4 2/7] Keepalive: Add initial keepalive support.

2017-09-05 Thread Fischetti, Antonio
Comments inline.

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Bhanuprakash Bodireddy
> Sent: Tuesday, August 22, 2017 10:41 AM
> To: d...@openvswitch.org
> Cc: i.maxim...@samsung.com
> Subject: [ovs-dev] [PATCH v4 2/7] Keepalive: Add initial keepalive support.
> 
> This commit introduces the initial keepalive support by adding
> 'keepalive' module and also helper and initialization functions
> that will be invoked by later commits.
> 
> This commit adds new ovsdb column "keepalive" that shows the status
> of the datapath threads. This is implemented for DPDK datapath and
> only status of PMD threads is reported.
> 
> For eg:
>   To enable keepalive feature.
>   'ovs-vsctl --no-wait set Open_vSwitch . other_config:enable-keepalive=true'

[Antonio]
It would help to say that this command must be run 'before'
launching ovs-vswitchd; otherwise it has no effect.

> 
>   To set timer interval of 5000ms for monitoring packet processing cores.
>   'ovs-vsctl --no-wait set Open_vSwitch . \
>  other_config:keepalive-interval="5000"
> 
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/automake.mk|   2 +
>  lib/keepalive.c| 183 
> +
>  lib/keepalive.h|  87 +
>  vswitchd/bridge.c  |   3 +
>  vswitchd/vswitch.ovsschema |   8 +-
>  vswitchd/vswitch.xml   |  49 
>  6 files changed, 330 insertions(+), 2 deletions(-)
>  create mode 100644 lib/keepalive.c
>  create mode 100644 lib/keepalive.h
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 2415f4c..0d99f0a 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -110,6 +110,8 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/json.c \
>   lib/jsonrpc.c \
>   lib/jsonrpc.h \
> + lib/keepalive.c \
> + lib/keepalive.h \
>   lib/lacp.c \
>   lib/lacp.h \
>   lib/latch.h \
> diff --git a/lib/keepalive.c b/lib/keepalive.c
> new file mode 100644
> index 000..ac73a42
> --- /dev/null
> +++ b/lib/keepalive.c
> @@ -0,0 +1,183 @@
> +/*
> + * Copyright (c) 2014, 2015, 2016, 2017 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "keepalive.h"
> +#include "lib/vswitch-idl.h"
> +#include "openvswitch/vlog.h"
> +#include "timeval.h"
> +
> +VLOG_DEFINE_THIS_MODULE(keepalive);
> +
> +static bool keepalive_enable = false;/* Keepalive disabled by default */
> +static bool ka_init_status = ka_init_failure; /* Keepalive initialization */
> +static uint32_t keepalive_timer_interval; /* keepalive timer interval */
> +static struct keepalive_info *ka_info = NULL;
> +
> +inline bool
> +ka_is_enabled(void)
> +{
> +return keepalive_enable;
> +}
> +
> +inline int
> +ka_get_pmd_tid(unsigned core_idx)
> +{
> +if (ka_is_enabled()) {
> +return ka_info->thread_id[core_idx];
> +}
> +
> +return -EINVAL;
> +}
> +
> +void
> +ka_set_pmd_state_ts(unsigned core_id, enum keepalive_state state,
> +uint64_t last_alive)
> +{
> +struct ka_process_info *pinfo;
> +int tid = ka_get_pmd_tid(core_id);
> +
> +ovs_mutex_lock(&ka_info->proclist_mutex);
> +HMAP_FOR_EACH_WITH_HASH (pinfo, node, hash_int(tid, 0),
> + &ka_info->process_list) {
> +if ((pinfo->core_id == core_id) && (pinfo->tid == tid)) {
> +pinfo->core_state = state;
> +pinfo->core_last_seen_times = last_alive;
> +}
> +}
> +ovs_mutex_unlock(&ka_info->proclist_mutex);
> +}
> +
> +/* Retrieve and return the keepalive timer interval from OVSDB. */
> +static uint32_t
> +ka_get_timer_interval(const struct smap *ovs_other_config OVS_UNUSED)
> +{
> +#define OVS_KEEPALIVE_TIMEOUT 1000/* Default timeout set to 1000ms */
> +uint32_t ka_interval;
> +
> +/* Timer granularity in milliseconds
> + * Defaults to OVS_KEEPALIVE_TIMEOUT(ms) if not set */
> +ka_interval = smap_get_int(ovs_other_config, "keepalive-interval",
> +  OVS_KEEPALIVE_TIMEOUT);
> +
> +VLOG_INFO("Keepalive timer interval set to %"PRIu32" (ms)\n",
> ka_interval);
> +return ka_interval;
> +}
> +
> +/*
> + * This function shall be invoked periodically to write the core status and
> + * last seen timestamp of the cores in 

Re: [ovs-dev] [PATCH v4 0/7] Add OVS DPDK keep-alive functionality.

2017-09-05 Thread Fischetti, Antonio
Hi Bhanu,
I added some comments on patches #2, 5 and 7.

Besides that LGTM. I applied this patch series to 
commit 84d2723305064e25402cb89a16bf7ad1aa2cda70
and it works as expected.

-Antonio

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org]
> On Behalf Of Bhanuprakash Bodireddy
> Sent: Tuesday, August 22, 2017 10:41 AM
> To: d...@openvswitch.org
> Cc: i.maxim...@samsung.com
> Subject: [ovs-dev] [PATCH v4 0/7] Add OVS DPDK keep-alive functionality.
> 
> Keepalive feature is aimed at achieving Fastpath Service Assurance
> in OVS-DPDK deployments. It adds support for monitoring the packet
> processing cores(PMD thread cores) by dispatching heartbeats at regular
> intervals. In case of heartbeat misses, additional health checks are
> enabled on the PMD thread to detect the failure and the same shall be
> reported to higher level fault management systems/frameworks.
> 
> The implementation uses OVSDB for reporting the health of the PMD threads.
> Any external monitoring application can read the status from OVSDB at
> regular intervals (or) subscribe to the updates in OVSDB so that they get
> notified when the changes happen on OVSDB.
> 
> keepalive info struct is created and initialized for storing the
> status of the PMD threads. This is initialized by main thread(vswitchd)
> as part of init process and will be periodically updated by 'keepalive'
> thread. keepalive feature can be enabled through below OVSDB settings.
> 
> enable-keepalive=true
>   - Keepalive feature is disabled by default.
> 
> keepalive-interval="5000"
>   - Timer interval in milliseconds for monitoring the packet
> processing cores.
> 
> When KA is enabled, 'ovs-keepalive' thread shall be spawned that wakes
> up at regular intervals to update the timestamp and status of pmd cores
> in keepalive info struct. This information shall be read by vswitchd thread
> and write the status in to 'keepalive' column of Open_vSwitch table in OVSDB.
> 
> An external monitoring framework like collectd with ovs events support
> can read (or) subscribe to the datapath status changes in ovsdb. When the 
> state
> is updated, the collectd shall be notified and will eventually relay the 
> status
> to ceilometer service running in the controller. Below is the high level
> overview of deployment model.
> 
> Compute Node              Controller              Compute Node
> 
> Collectd    <-------->    Ceilometer    <-------->    Collectd
> 
> OvS DPDK                                              OvS DPDK
> 
>     +-----+
>     | VM  |
>     +--+--+
>        |
>   +----+---+     +--------------------+     +----------+
>   |  OVS   | --> |  ovsevents plugin  | --> | collectd |
>   +--------+     +--------------------+     +-----+----+
>                                                   |
>   +------------+     +----------------------------+     |
>   | Ceilometer | <-- | collectd ceilometer plugin | <----+
>   +------------+     +----------------------------+
> 
> github: The patches can be found here:
>   https://github.com/bbodired/ovs (Last master commit e7cd8c363)
> 
> Performance impact:
>   No noticeable performance or latency impact is observed with
>   KA feature enabled.
> 
> -
> v3 -> v4
>   * Split the functionality in to 2 parts. This patch series only updates
> PMD status to OVSDB. The incremental patch series to handle false
> positives,
> negatives and more checking and stats.
>   * Remove code from netdev layer and dependency on rte_keepalive lib.
>   * Merged few patches and simplified the patch series.
>   * Timestamp in human readable form.
> 
> v2 -> v3
>   * Rebase.
>   * Verified with dpdk-stable-17.05.1 release.
>   * Fixed build issues with MSVC and cross checked with appveyor.
> 
> v1 -> v2
>   * Rebase
>   * Drop 01/20 Patch "Consolidate process related APIs" of V1 as it
> is already applied as separate patch.
> 
> RFCv3 -> v1
>   * Made changes to fix failures in some unit test cases.
>   * some more code cleanup w.r.t process related APIs.
> 
> RFCv2 -> RFCv3
>   * Remove POSIX shared memory block implementation (suggested by Aaron).
>   * Rework the logic to register and track threads instead of cores. This way
> in the future any thread can be registered to KA framework. For now only
> PMD
> threads are tracked (suggested by Aaron).
>   * Refactor few APIs and further clean up the code.
> 
> RFCv1 -> RFCv2
>   * Merged the xml and schema commits to later commit where the actual
> implementation is done(suggested by Ben).
>   * Fix ovs-appctl keepalive/* hang issue when KA disabled.
>   * Fixed memory leaks with appctl commands for keepalive/pmd-health-show,
> pmd-xstats-show.
>   * Refactor code and fixed APIs dealing with PMD health monitoring.
> 
> 
> Bhanuprakash Bodireddy (7):
>   process: Extend get_process_info() for additional fields.
>   Keepalive: Add initial keepalive support.
>  

Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-05 Thread Yang, Yi
On Mon, Sep 04, 2017 at 06:42:16PM +0800, Jiri Benc wrote:
> On Mon, 4 Sep 2017 16:00:05 +0800, Yang, Yi wrote:
> > how can we know next push_nsh uses the same nsh header as previous
> > one?
> 
> We store the prepopulated header together with the action.
>

I checked the source code but can't find where we can prepopulate it or how
we can deliver the prepopulated header to push_nsh. Can you expand on this
at the code level so that I know how to implement it?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7] openvswitch: enable NSH support

2017-09-05 Thread Yang, Yi
On Mon, Sep 04, 2017 at 08:57:44PM +0800, Jiri Benc wrote:
> On Mon, 4 Sep 2017 20:09:07 +0800, Yang, Yi wrote:
> > So we must do many changes if we want to break this assumption.
> 
> We may do as many changes as we want to. This is uAPI we're talking
> about and we need to get it right since the beginning. Sure, it may
> mean that some user space programs need some changes in order to make
> use of the new features. That happens every day.
> 
> I also don't understand where's the problem. It's very easy to check
> for NLA_F_NESTED and generically act based on that in the function you
> quoted. Just call a different function than format_odp_key_attr to
> handle ovs_nsh_key_attr attributes in case the nested flag is set and
> the attribute is OVS_KEY_ATTR_NSH and you're done. You'll need such
> function anyway, it's not much different code size wise to call it from
> format_odp_key_attr or from format_odp_action.

But if we follow your way, how does nla_for_each_nested handle such a
pattern?

attribute1
attribute1_mask
attribute2
attribute2_mask
attribute3
attribute3_mask

I don't think this improves performance or readability.

In the current way, we just call nla_for_each_nested twice:

one is for

attribute1
attribute2
attribute3

another is for

attribute1_mask
attribute2_mask
attribute3_mask

If we use one function to handle both attributes and masks, I can't
see any substantial difference between the two ways as far as
performance is concerned.
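
For reference, a minimal sketch (illustrative only, with assumed attribute
names; not the actual patch) of the two separate nla_for_each_nested walks
described above:

    const struct nlattr *a;
    int rem;

    /* First pass: the nested key attributes. */
    nla_for_each_nested(a, nsh_key_attr, rem) {
        /* Fill the NSH key field selected by nla_type(a) from nla_data(a). */
    }

    /* Second pass: the nested mask attributes. */
    nla_for_each_nested(a, nsh_mask_attr, rem) {
        /* Fill the corresponding NSH mask field the same way. */
    }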

So my proposal is that we needn't introduce a special handling case for
OVS_KEY_ATTR_NSH in OVS_ACTION_ATTR_SET_MASKED; that would open Pandora's
box.

If we consider OVS_KEY_ATTR_NSH as one big attribute, the current way
precisely follows the existing design pattern:

attribute
mask

> 
>  Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev