Re: [ovs-dev] [PATCH v5] netdev-dpdk: fix ifindex assignment for DPDK ports

2017-04-03 Thread Darrell Ball


On 4/3/17, 5:27 AM, "ovs-dev-boun...@openvswitch.org on behalf of Przemyslaw 
Lal"  
wrote:

In current implementation port_id is used as an ifindex for all netdev-dpdk
interfaces.

For physical DPDK interfaces using port_id as ifindex causes that '0' is 
set as
ifindex for 'dpdk0' interface, '1' for 'dpdk1' and so on. For the DPDK vHost
interfaces ifindexes are not even assigned (0 is used by default) due to the
fact that vHost ports don't use port_id field from the DPDK library.

This causes multiple negative side-effects. First of all 0 is an invalid
ifindex value. The other issue is possible overlapping of 'dpdkX' interfaces
ifindex values with the ifindexes of kernel space interfaces which may cause
problems in any external tools that use those values. Neither 'dpdk0', nor 
any
DPDK vHost interfaces are visible in sFlow collector tools, as all 
interfaces
with ifindexes smaller than 1 are ignored.

Proposed solution to these issues is to calculate a hash of interface's name
and use calculated value as an ifindex. This way interfaces keep their
ifindexes during OVS-DPDK restarts, ports re-initialization events, etc., 
show
up in sFlow collectors and meet RFC 2863 specification regarding re-using
ifindex values by the same virtual interfaces and maximum ifindex value.

Signed-off-by: Przemyslaw Lal 
---
 lib/netdev-dpdk.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ddc651b..687b0a5 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2215,7 +2215,11 @@ netdev_dpdk_get_ifindex(const struct netdev *netdev)
 int ifindex;
 
 ovs_mutex_lock(>mutex);
-ifindex = dev->port_id;
+/* Calculate hash from the netdev name. Ensure that ifindex is a 24-bit
+ * postive integer to meet RFC 2863 recommendations.
+ */
+uint32_t h = hash_string(netdev->name, 0);
+ifindex = (int)(h % 0xfe + 1);


If user configuration was supported, enforcing uniqueness would be the 
responsibility of the user.

One minor question:
I know other ifindex implementations do not limit to 24 bits, so I checked RFC 
2863.
What section is the 24 bit limit recommendation mentioned in RFC 2863; I missed 
it. 



 ovs_mutex_unlock(>mutex);
 
 return ifindex;
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=Rcpenle5jqQ1mu3STwFMODvsTFjKL9iMBrwMfu9J8FM=FJtoKpLo8NxpzHwFKSz6xsT3aGqZCiR507-MDVxjFeE=
 




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v2] ofproto: Use macros to define DPIF support fields

2017-04-03 Thread Andy Zhou
On Mon, Apr 3, 2017 at 5:30 PM, Joe Stringer  wrote:
> On 3 April 2017 at 13:30, Andy Zhou  wrote:
>> When adding a new field in the 'struct dpif_backer_support', the
>> corresponding appctl show command should be updated to display
>> the new field. Currently, there is nothing to remind the developer
>> that to update the show command. This can lead to code maintenance
>> issues.
>>
>> Switch to use macros to define those fields. This makes the show
>> command update automatic.
>>
>> Signed-off-by: Andy Zhou 

Thanks for the review. Pushed with fixes suggested.
>>
>> ---
>
> Looks like there's a few typos:
>
> 's/FILED/FIELD/g'
Fixed
>
> A couple more comments below, but otherwise LGTM. Thanks!
>
> Acked-by: Joe Stringer 
>
>> v1->v2:
>> Add more comments. Fix typos
>> ---
>>  lib/odp-util.h | 52 ++--
>>  ofproto/ofproto-dpif.c | 28 ++--
>>  ofproto/ofproto-dpif.h | 59 
>> ++
>>  3 files changed, 78 insertions(+), 61 deletions(-)
>>
>> diff --git a/lib/odp-util.h b/lib/odp-util.h
>> index 50fa1d133e1f..d9668a75e811 100644
>> --- a/lib/odp-util.h
>> +++ b/lib/odp-util.h
>> @@ -167,30 +167,40 @@ int odp_flow_from_string(const char *s,
>>   const struct simap *port_names,
>>   struct ofpbuf *, struct ofpbuf *);
>>
>> +/* ODP_SUPPORT_FIELD(TYPE, FIELD_NAME, FILED_DESCRIPTION)
>> + *
>> + * Each 'ODP_SUPPORT_FIELD' defines a member in 'struct odp_support',
>> + * and represents support for related OVS_KEY_ATTR_* fields.
>> + * They are defined as macros so that 'dpif_show_support()' is less
>> + * likely to go out of sync when new fields are added.   */
>
> This is a bit simpler language, similarly for the dpif_support below.
>
> "They are defined as macros to keep 'dpif_show_support()' in sync as
> new fields are added."

Thanks, Will change.
>
> 
>
>> +/* DPIF_SUPPORT_FILED(TYPE, FIELD_NAME, FIELD_DESCRIPTION)
>> + *
>> + * Each 'DPIF_SUPPORT_FILED' defines a member in 'struct 
>> dpif_backer_support'
>> + * and represents support for a datapath action.
>> + * They are defined as macros so that 'dpif_show_support()' is less
>> + * likely to go out of sync when new fields are added.  */
>> +#define DPIF_SUPPORT_FIELDS 
>> \
>
> If DPIF_SUPPORT_FIELDS is about actions, perhaps a name like
> DPIF_SUPPORT_ACTION() is a better name?

They don't have to be only about actions down the road. The name is
about defining a field
for 'struct dpif_backer_support'. I kept it as is for now.  We can
always change them later.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v2] ofproto: Use macros to define DPIF support fields

2017-04-03 Thread Joe Stringer
On 3 April 2017 at 13:30, Andy Zhou  wrote:
> When adding a new field in the 'struct dpif_backer_support', the
> corresponding appctl show command should be updated to display
> the new field. Currently, there is nothing to remind the developer
> that to update the show command. This can lead to code maintenance
> issues.
>
> Switch to use macros to define those fields. This makes the show
> command update automatic.
>
> Signed-off-by: Andy Zhou 
>
> ---

Looks like there's a few typos:

's/FILED/FIELD/g'

A couple more comments below, but otherwise LGTM. Thanks!

Acked-by: Joe Stringer 

> v1->v2:
> Add more comments. Fix typos
> ---
>  lib/odp-util.h | 52 ++--
>  ofproto/ofproto-dpif.c | 28 ++--
>  ofproto/ofproto-dpif.h | 59 
> ++
>  3 files changed, 78 insertions(+), 61 deletions(-)
>
> diff --git a/lib/odp-util.h b/lib/odp-util.h
> index 50fa1d133e1f..d9668a75e811 100644
> --- a/lib/odp-util.h
> +++ b/lib/odp-util.h
> @@ -167,30 +167,40 @@ int odp_flow_from_string(const char *s,
>   const struct simap *port_names,
>   struct ofpbuf *, struct ofpbuf *);
>
> +/* ODP_SUPPORT_FIELD(TYPE, FIELD_NAME, FILED_DESCRIPTION)
> + *
> + * Each 'ODP_SUPPORT_FIELD' defines a member in 'struct odp_support',
> + * and represents support for related OVS_KEY_ATTR_* fields.
> + * They are defined as macros so that 'dpif_show_support()' is less
> + * likely to go out of sync when new fields are added.   */

This is a bit simpler language, similarly for the dpif_support below.

"They are defined as macros to keep 'dpif_show_support()' in sync as
new fields are added."



> +/* DPIF_SUPPORT_FILED(TYPE, FIELD_NAME, FIELD_DESCRIPTION)
> + *
> + * Each 'DPIF_SUPPORT_FILED' defines a member in 'struct dpif_backer_support'
> + * and represents support for a datapath action.
> + * They are defined as macros so that 'dpif_show_support()' is less
> + * likely to go out of sync when new fields are added.  */
> +#define DPIF_SUPPORT_FIELDS \

If DPIF_SUPPORT_FIELDS is about actions, perhaps a name like
DPIF_SUPPORT_ACTION() is a better name?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] ofproto-dpif-mirror: Fix issue of reseting snaplen in mirroring

2017-04-03 Thread Andy Zhou
On Fri, Mar 31, 2017 at 8:09 AM, William Tu  wrote:
> Looks good to me, thanks for the fix.
>
> Acked-by: William Tu 
>
> On Sun, Mar 26, 2017 at 8:16 PM, Zhenyu Gao  wrote:
>> Currently, the mirror code doesn't check new value of snaplen when try
>> to reconfigure snaplen.
>> This patch fix this issue and add testings to reconfigure snaplen.
>>
>> Signed-off-by: Zhenyu Gao 

Thanks Zhenyu for the bug fix and William for the review!
Pushed to master, branch-2.7 and branch 2.6.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] compat: Remove rpl_dev_queue_xmit() backport.

2017-04-03 Thread Yi-Hung Wei
Hi Simon,

I posted a patch series[1,2,3] to backport the MPLS GSO upstream
commits. It would be great if you could provide some feedback on that.
I have tested the backport series on kernel 4.4, and 4.9 with
the similar setup as in [3]. To trigger MPLS GSO, I use iperf instead of
the ping test in [3].

Thanks,

[1] https://patchwork.ozlabs.org/patch/746673/
[2] https://patchwork.ozlabs.org/patch/746674/
[3] https://patchwork.ozlabs.org/patch/746675/

-Yi-Hung

On Mon, Mar 20, 2017 at 12:37 PM, Joe Stringer  wrote:
> On 20 March 2017 at 06:36, Simon Horman  wrote:
>> On Thu, Mar 02, 2017 at 03:00:37PM -0800, Joe Stringer wrote:
>>> On 8 February 2017 at 16:50, Joe Stringer  wrote:
>>> > The MPLS portions of this were inadvertently broken in v2.4 due to
>>> > 433637881ca5 ("datapath: define compat __skb_gso_segment()") which
>>> > inverts the supports_mpls_gso() logic, then when rpl_dev_queue_xmit()
>>> > backport dropped its VLAN portion in v2.6, the whole function became a
>>> > no-op - since b63bf2488209 ("datapath: remove VLAN compat code from GSO").
>>> >
>>> > Apparently the MPLS side of this code never worked in a released version
>>> > of OVS and no-one noticed, so remove it.
>>> >
>>> > Signed-off-by: Joe Stringer 
>>>
>>> Picking this thread back up... it's a bit elaborate, but as per my
>>> understanding there were basically three important versions of the
>>> MPLS GSO code in upstream Linux to date:
>>>
>>> Linux earlier than 3.19 had a broken implementation.
>>> Linux 3.19 up until 4.9 had a version that worked OK for OVS
>>> codepaths, but was broken in other paths.
>>> Linux 4.9 and later should be correct in all cases[0 + fixes].
>>>
>>> Then we have our OVS out-of-tree backport of the upstream OVS module.
>>> Today, the MPLS code in OVS matches roughly Linux 4.8. Prior to 3.19,
>>> we backported MPLS GSO, but there was a problem with the activation of
>>> the code[1] so MPLS GSO is broken with out-of-tree against kernels up
>>> to 3.19. For kernels 3.19 to 4.8, the MPLS GSO works correctly in the
>>> OVS tree. Linux 4.9 expects the skb to be prepared differently for
>>> MPLS GSO to work, so the out-of-tree module in conjunction with that
>>> version will be broken. When we backport this patch[0], that should be
>>> fixed---but we'll need to take care to do it in a way that still works
>>> on earlier versions..
>>>
>>> This affects OVS 2.7 - users wanting to run OVS 2.7 with out-of-tree
>>> module with MPLS GSO (including loading the mpls_gso module) and a
>>> Linux 4.9 kernel may observe some issues like this when using MPLS
>>> push/pop. As far as I can tell, the most likely affected users would
>>> be those running recent Fedora releases but I don't think that they
>>> use the DKMS module by default. Given how elaborate this chain of
>>> requirements is, I'm not so strongly concerned about addressing this
>>> before bringing the three or four later backport series into the
>>> tree.. they've been blocking for pretty long already. It'd definitely
>>> be nice to fix it soon though.
>>>
>>> Yi-Hung has been doing the investigation on this, thanks for providing
>>> the details and I hope I portrayed it correctly to the list here!
>>
>> Am I correct in understanding that you would like to:
>> 1. Address outstanding backports and then
>> 2. Address the problem discussed in this thread, particularly for
>>OVS2.7 + Linux 4.9
>>
>> If so that sounds reasonable to me. I'd be happy to assist with this work
>> but I suspect that two (or more) heads will be better than one given the
>> complexity of getting it correct for all the relevant kernel versions.
>
> Correct. At this point I've reviewed and applied most backport changes
> up until the latest changes on net/net-next over the past few weeks,
> with the exception of these MPLS changes. I agree that this stuff is
> tricky and it would be helpful to have some assistance, thanks for the
> offer. I believe that Yi-Hung should be able to share a patch soon,
> and we can continue the discussion from there.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 3/3] system-traffic: Add test for mpls actions

2017-04-03 Thread Yi-Hung Wei
Add ping test to verify the behavior of mpls_push/pop actions. In this
test, we use the resubmit action to trigger recirulation for making sure
the flow key is revalidated after mpls_push/pop. This test depends on
commit 5ba0c107c51e ("datapath: Fix ovs_flow_key_update()") to behave
correctly.

Signed-off-by: Yi-Hung Wei 
---
 tests/system-traffic.at | 36 
 1 file changed, 36 insertions(+)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 1816b1a88f72..85b29c89db06 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -369,6 +369,42 @@ 
icmp,vlan_tci=0x,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([datapath - mpls actions])
+OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24")
+
+AT_CHECK([ip link add patch0 type veth peer name patch1])
+on_exit 'ip link del patch0'
+
+AT_CHECK([ip link set dev patch0 up])
+AT_CHECK([ip link set dev patch1 up])
+AT_CHECK([ovs-vsctl add-port br0 patch0])
+AT_CHECK([ovs-vsctl add-port br1 patch1])
+
+AT_DATA([flows.txt], [dnl
+table=0,priority=100,dl_type=0x0800 
actions=push_mpls:0x8847,set_mpls_label:3,resubmit(,1)
+table=0,priority=100,dl_type=0x8847,mpls_label=3 
actions=pop_mpls:0x0800,resubmit(,1)
+table=0,priority=10 actions=resubmit(,1)
+table=1,priority=10 actions=normal
+])
+
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+AT_CHECK([ovs-ofctl add-flows br1 flows.txt])
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], 
[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], 
[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
 AT_SETUP([datapath - basic truncate action])
 AT_SKIP_IF([test $HAVE_NC = no])
 OVS_TRAFFIC_VSWITCHD_START()
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/3] datapath: Fixups for MPLS GSO

2017-04-03 Thread Yi-Hung Wei
This commit backports the following upstream commits to fix MPLS GSO in ovs
datapath. It has been tested on kernel 4.4 and 4.9.

Upstream commit:
commit 48d2ab609b6bbecb7698487c8579bc40de9d6dfa
Author: David Ahern 
Date:   Wed Aug 24 20:10:44 2016 -0700

net: mpls: Fixups for GSO

As reported by Lennert the MPLS GSO code is failing to properly segment
large packets. There are a couple of problems:

1. the inner protocol is not set so the gso segment functions for inner
   protocol layers are not getting run, and

2  MPLS labels for packets that use the "native" (non-OVS) MPLS code
   are not properly accounted for in mpls_gso_segment.

The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
to call the gso segment functions for the higher layer protocols. That
means skb_mac_gso_segment is called twice -- once with the network
protocol set to MPLS and again with the network protocol set to the
inner protocol.

This patch sets the inner skb protocol addressing item 1 above and sets
the network_header and inner_network_header to mark where the MPLS labels
start and end. The MPLS code in OVS is also updated to set the two
network markers.

>From there the MPLS GSO code uses the difference between the network
header and the inner network header to know the size of the MPLS header
that was pushed. It then pulls the MPLS header, resets the mac_len and
protocol for the inner protocol and then calls skb_mac_gso_segment
to segment the skb.

Afterward the inner protocol segmentation is done the skb protocol
is set to mpls for each segment and the network and mac headers
restored.

Reported-by: Lennert Buytenhek 
Signed-off-by: David Ahern 
Signed-off-by: David S. Miller 

Upstream commit:
commit f7d49bce8e741e1e6aa14ce4db1b6cea7e4be4e8
Author: Jiri Benc 
Date:   Fri Sep 30 19:08:05 2016 +0200

openvswitch: mpls: set network header correctly on key extract

After the 48d2ab609b6b ("net: mpls: Fixups for GSO"), MPLS handling in
openvswitch was changed to have network header pointing to the start of the
MPLS headers and inner_network_header pointing after the MPLS headers.

However, key_extract was missed by the mentioned commit, causing incorrect
headers to be set when a MPLS packet just enters the bridge or after it is
recirculated.

Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO")
Signed-off-by: Jiri Benc 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 

Upstream commit:
commit 85de4a2101acb85c3b1dde465e84596ccca99f2c
Author: Jiri Benc 
Date:   Fri Sep 30 19:08:07 2016 +0200

openvswitch: use mpls_hdr

skb_mpls_header is equivalent to mpls_hdr now. Use the existing helper
instead.

Signed-off-by: Jiri Benc 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 

Upstream commit:
commit c66549ffd05831abf6cf19ce0571ad868e39
Author: Jiri Benc 
Date:   Wed Oct 5 15:01:57 2016 +0200

openvswitch: correctly fragment packet with mpls headers

If mpls headers were pushed to a defragmented packet, the refragmentation no
longer works correctly after 48d2ab609b6b ("net: mpls: Fixups for GSO"). The
network header has to be shifted after the mpls headers for the
fragmentation and restored afterwards.

Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO")
Signed-off-by: Jiri Benc 
Signed-off-by: David S. Miller 

Signed-off-by: Yi-Hung Wei 
---
 acinclude.m4 |  1 +
 datapath/actions.c   | 57 ++--
 datapath/flow.c  | 11 ++
 datapath/linux/compat/include/net/mpls.h | 18 --
 4 files changed, 60 insertions(+), 27 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index 744d8f89525c..82ca4a28015c 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -479,6 +479,7 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
 [OVS_GREP_IFELSE([$KSRC/include/net/dst_cache.h], 
[dst_cache],
  
[OVS_DEFINE([USE_UPSTREAM_TUNNEL])])])])
 
+  OVS_GREP_IFELSE([$KSRC/include/linux/mpls.h], [mpls_hdr])
   OVS_GREP_IFELSE([$KSRC/include/linux/net.h], [sock_create_kern.*net],
   [OVS_DEFINE([HAVE_SOCK_CREATE_KERN_NET])])
   OVS_GREP_IFELSE([$KSRC/include/linux/netdevice.h], [ndo_fill_metadata_dst])
diff --git a/datapath/actions.c b/datapath/actions.c
index 06080b24ea5a..ecc5136a3529 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -61,7 +61,8 @@ struct ovs_frag_data {
   

[ovs-dev] [PATCH 2/3] datapath: Fix ovs_flow_key_update()

2017-04-03 Thread Yi-Hung Wei
Upstream commit:
ovs_flow_key_update() is called when the flow key is invalid, and it is
used to update and revalidate the flow key. Commit 329f45bc4f19
("openvswitch: add mac_proto field to the flow key") introduces mac_proto
field to flow key and use it to determine whether the flow key is valid.
However, the commit does not update the code path in ovs_flow_key_update()
to revalidate the flow key which may cause BUG_ON() on execute_recirc().
This patch addresses the aforementioned issue.

Fixes: 329f45bc4f19 ("openvswitch: add mac_proto field to the flow key")
Signed-off-by: Yi-Hung Wei 
Acked-by: Jiri Benc 
Signed-off-by: David S. Miller 

Signed-off-by: Yi-Hung Wei 
---
 datapath/flow.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/datapath/flow.c b/datapath/flow.c
index 1d40b2400406..c4f63b0b5d76 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -531,7 +531,7 @@ static int key_extract(struct sk_buff *skb, struct 
sw_flow_key *key)
 
/* Link layer. */
clear_vlan(key);
-   if (key->mac_proto == MAC_PROTO_NONE) {
+   if (ovs_key_mac_proto(key) == MAC_PROTO_NONE) {
if (unlikely(eth_type_vlan(skb->protocol)))
return -EINVAL;
 
@@ -751,7 +751,13 @@ static int key_extract(struct sk_buff *skb, struct 
sw_flow_key *key)
 
 int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key)
 {
-   return key_extract(skb, key);
+   int res;
+
+   res = key_extract(skb, key);
+   if (!res)
+   key->mac_proto &= ~SW_FLOW_KEY_INVALID;
+
+   return res;
 }
 
 static int key_extract_mac_proto(struct sk_buff *skb)
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Traffic fails in vhost user port

2017-04-03 Thread Nadathur, Sundar
Hi,
I have an OVS bridge br0 with no NICs and  1 vhost user port which is 
connected to a VM. But ping fails between the VM and the br0 port, either way. 
The stats show zero all the time. Inside the VM, tcpdump shows nothing.

This is with OVS 2.7.0 and DPDK 17.02. Please indicate what could be going 
wrong.

In the host, the bridge's internal port is up.
# ip addr show br0
24: br0:  mtu 1500 qdisc noqueue state UNKNOWN 
qlen 500
link/ether 52:2f:57:13:d8:40 brd ff:ff:ff:ff:ff:ff
inet 200.1.1.1/24 scope global br0
   valid_lft forever preferred_lft forever

In the VM, the eth interface is up with address 200.1.1.2/24.

The ports are in the following state, even after "ovs-ofctl mod-port br0 vi1 
up":
# ovs-ofctl dump-ports-desc br0
OFPST_PORT_DESC reply (xid=0x2):
5(vi1): addr:00:00:00:00:00:00
 config: 0
 state:  0
 speed: 0 Mbps now, 0 Mbps max
LOCAL(br0): addr:52:2f:57:13:d8:40
 config: 0
 state:  0
 current:10MB-FD COPPER
 speed: 10 Mbps now, 0 Mbps max

The flows are configured as below:
# ovs-ofctl dump-flows br0
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=2833.612s, table=0, n_packets=0, n_bytes=0, idle_age=2833, 
in_port=1 actions=output:5
cookie=0x2, duration=2819.820s, table=0, n_packets=0, n_bytes=0, idle_age=2819, 
in_port=5 actions=output:1

The table info is as below:
# ovs-ofctl dump-tables br0 | more
OFPST_TABLE reply (xid=0x2):
  table 0 ("classifier"):
active=2, lookup=37, matched=28
max_entries=100
matching:
  in_port: exact match or wildcard
  eth_src: exact match or wildcard
  eth_dst: exact match or wildcard
  eth_type: exact match or wildcard
  vlan_vid: exact match or wildcard
  vlan_pcp: exact match or wildcard
  ip_src: exact match or wildcard
  ip_dst: exact match or wildcard
  nw_proto: exact match or wildcard
  nw_tos: exact match or wildcard
  tcp_src: exact match or wildcard
  tcp_dst: exact match or wildcard

  table 1 ("table1"):
   active=0, lookup=0, matched=0
(same features)

Thanks,
Sundar




Regards,
Sundar

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 0/8] userspace: Support for L3 tunneling

2017-04-03 Thread Jan Scheurich
I just realized that we missed to implement some of Ben's comments on patch 
series v1.
Will fix those and resubmit v3 as soon as possible.

Regards, Jan

> -Original Message-
> From: Zoltán Balogh
> Sent: Monday, 03 April, 2017 18:28
> To: 'd...@openvswitch.org' 
> Cc: Jan Scheurich ; Georg Schmuecking 
> 
> Subject: [PATCH v2 0/8] userspace: Support for L3 tunneling
> 
> From: Jan Scheurich 
> 
> This patch set is part of an initiative to deal with non-Ethernet packets in 
> OVS for advanced use cases like L3 tunneling or NSH. The
> initiative is centering on the new OpenFlow concepts of "Packet type-aware 
> pipeline" (PTAP) and "Generic encap/decap actions" (EXT-
> 382). The overall design is documented in
> https://docs.google.com/document/d/1oWMYUH8sjZJzWa72o2q9kU0N6pNE-rwZcLH3-kbbDR8
> 
> This patch series implements the user-space parts of the support for L3 
> tunnels connected to the legacy Ethernet-only pipeline in OVS.
> 
> In large parts it is an adaptation of the earlier work on L3 tunneling by 
> Lorand Jakab, Simon Horman, Jiri Benc and Yi Yang, adapted to the
> new design for packet type aware pipelines as prototyped by Jean Tourrilhes.
> 
> It supersedes user-space patches 9-11 of the most recent patch set submitted 
> by Yi Yang (https://mail.openvswitch.org/pipermail/ovs-
> dev/2016-December/326865.html). The remaining user-space parts of that series 
> (VXLAN-GPE) will be superseded by further patches in
> the context of this initiative.
> 
> Key changes are the introduction of explicit packet_type fields in the 
> dp_packet and flow structs, as well as a simpler handling of L3
> tunnels limited to the ofproto layer.
> 
>userspace: Add packet_type in dp_packet and flow
>userspace: Support for push_eth and pop_eth actions
>userspace: Switching of L3 packets in L2 pipeline
>ofproto-dpif-upcall: Intialize dump-seq of new flow to zero
>userspace: L3 tunnel support for GRE and LISP
>dpif-netlink: Don't send PACKET_TYPE to kernel
>ofproto-dpif-xlate: refactor compose_output_action__
>userspace: add vxlan gpe support to vport
> 
> For native L3 tunneling with the user-space datapath these patches are 
> complete. To apply L3 tunneling to the kernel datapath, they are
> dependent on the following other contributions:
> 
> .[PATCH net-next v13 0/8] openvswitch: support for layer 3 encapsulated 
> packets
> https://mail.openvswitch.org/pipermail/ovs-dev/2016-November/324943.html
> The net-next patch has been accepted
> .[RFC PATCH v4 0/6] create tunnel devices using rtnetlink interface
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327749.html
> .Another user-space patch to send the "layer3" option down via rtnetlink 
> interface
> .Backports of the kernel datapath patches to the OVS tree module:
> Patches 01-08 of patch series [PATCH v2 00/17] port Jiri Benc's L3 patchset 
> to ovs
> https://mail.openvswitch.org/pipermail/ovs-dev/2016-December/326865.html
> .A compatibility mechanism to send down "layer3" option to the backported 
> kernel module to support older kernels.
> 
> To ease the progress with the implementation of the subsequent user-space 
> patches for the PTAP, EXT-382 and NSH and avoid the pain of
> frequent re-basing of these large patch series, we would like to request 
> exemption for merging the present patch series before the kernel
> datapath pieces are all in place.
> 
>  build-aux/extract-ofp-fields  |   1 +
>  datapath/linux/compat/include/linux/openvswitch.h |   9 +-
>  include/openflow/openflow-common.h|   9 +
>  include/openvswitch/automake.mk   |   4 +-
>  include/openvswitch/flow.h|  16 +-
>  include/openvswitch/match.h   |   1 +
>  include/openvswitch/meta-flow.h   |  15 +-
>  include/openvswitch/ofp-print.h   |   9 +-
>  include/openvswitch/vxlangpe.h|  76 
>  lib/cfm.c |   2 +-
>  lib/conntrack.c   |   2 +-
>  lib/dp-packet.c   |   3 +
>  lib/dp-packet.h   |  25 +-
>  lib/dpif-netdev.c |  50 ++-
>  lib/dpif-netlink.c|  56 ++-
>  lib/dpif.c|   6 +-
>  lib/flow.c| 112 +++---
>  lib/flow.h|  11 +-
>  lib/match.c   |  33 +-
>  lib/meta-flow.c   |   2 +
>  lib/netdev-bsd.c  |   1 +
>  lib/netdev-dummy.c|   5 +
>  lib/netdev-linux.c|   4 +-
>  lib/netdev-native-tnl.c  

Re: [ovs-dev] [PATCH 2/3] ofproto: Store meters into imap

2017-04-03 Thread Andy Zhou
On Fri, Mar 31, 2017 at 7:53 PM, Ben Pfaff  wrote:
> On Fri, Mar 31, 2017 at 01:42:02PM -0700, Andy Zhou wrote:
>> Currently, meters are stored in a fixed pointer array. It is not
>> very efficient since the controller, at least in theory, can
>> pick any meter id (up to the limits to uint32_t), not necessarily
>> within the lower end of a region, or in close range to each other.
>> In particular, OFPM_SLOWPATH and OFPM_CONTROLLER meters are specified
>> at the high region.
>>
>> Switching to use imap, so that ofproto layer does not restrict
>> the number of meters that controller can add, nor does it care
>> about the value of meter_id. Only datapth actually limits the
>> number of meters ofproto layer can support.
>>
>> Signed-off-by: Andy Zhou 
>
> Instead of using a new integer-to-pointer map data structure, did you
> consider adding an hmap_node to struct meter?  That's the first approach
> that comes to mind, so I'm curious to know whether it's a poor approach
> for some reason.
I just thought 'imap' will be more general. Then I coded the way you suggested,
and it end up very similar code wise.  I will drop 'imap' in future
versions. Thanks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [v2] ofproto: Use macros to define DPIF support fields

2017-04-03 Thread Andy Zhou
When adding a new field in the 'struct dpif_backer_support', the
corresponding appctl show command should be updated to display
the new field. Currently, there is nothing to remind the developer
that to update the show command. This can lead to code maintenance
issues.

Switch to use macros to define those fields. This makes the show
command update automatic.

Signed-off-by: Andy Zhou 

---
v1->v2:
Add more comments. Fix typos
---
 lib/odp-util.h | 52 ++--
 ofproto/ofproto-dpif.c | 28 ++--
 ofproto/ofproto-dpif.h | 59 ++
 3 files changed, 78 insertions(+), 61 deletions(-)

diff --git a/lib/odp-util.h b/lib/odp-util.h
index 50fa1d133e1f..d9668a75e811 100644
--- a/lib/odp-util.h
+++ b/lib/odp-util.h
@@ -167,30 +167,40 @@ int odp_flow_from_string(const char *s,
  const struct simap *port_names,
  struct ofpbuf *, struct ofpbuf *);
 
+/* ODP_SUPPORT_FIELD(TYPE, FIELD_NAME, FILED_DESCRIPTION)
+ *
+ * Each 'ODP_SUPPORT_FIELD' defines a member in 'struct odp_support',
+ * and represents support for related OVS_KEY_ATTR_* fields.
+ * They are defined as macros so that 'dpif_show_support()' is less
+ * likely to go out of sync when new fields are added.   */
+#define ODP_SUPPORT_FIELDS   \
+/* Maximum number of 802.1q VLAN headers to serialize in a mask. */  \
+ODP_SUPPORT_FIELD(size_t, max_vlan_headers, "Max VLAN headers")  \
+/* Maximum number of MPLS label stack entries to serialise in a mask. */ \
+ODP_SUPPORT_FIELD(size_t, max_mpls_depth, "Max MPLS depth")  \
+/* If this is true, then recirculation fields will always be \
+ * serialised. */\
+ODP_SUPPORT_FIELD(bool, recirc, "Recirc")\
+/* If true, serialise the corresponding OVS_KEY_ATTR_CONN_* field. */\
+ODP_SUPPORT_FIELD(bool, ct_state, "CT state")\
+ODP_SUPPORT_FIELD(bool, ct_zone, "CT zone")  \
+ODP_SUPPORT_FIELD(bool, ct_mark, "CT mark")  \
+ODP_SUPPORT_FIELD(bool, ct_label, "CT label")\
+ \
+/* If true, it means that the datapath supports the NAT bits in  \
+ * 'ct_state'.  The above 'ct_state' member must be true for this\
+ * to make sense */  \
+ODP_SUPPORT_FIELD(bool, ct_state_nat, "CT state NAT")\
+ \
+/* Conntrack original direction tuple matching * supported. */   \
+ODP_SUPPORT_FIELD(bool, ct_orig_tuple, "CT orig tuple")
+
 /* Indicates support for various fields. This defines how flows will be
  * serialised. */
 struct odp_support {
-/* Maximum number of 802.1q VLAN headers to serialize in a mask. */
-size_t max_vlan_headers;
-/* Maximum number of MPLS label stack entries to serialise in a mask. */
-size_t max_mpls_depth;
-
-/* If this is true, then recirculation fields will always be serialised. */
-bool recirc;
-
-/* If true, serialise the corresponding OVS_KEY_ATTR_CONN_* field. */
-bool ct_state;
-bool ct_zone;
-bool ct_mark;
-bool ct_label;
-
-/* If true, it means that the datapath supports the NAT bits in
- * 'ct_state'.  The above 'ct_state' member must be true for this
- * to make sense */
-bool ct_state_nat;
-
-bool ct_orig_tuple;   /* Conntrack original direction tuple matching
-   * supported. */
+#define ODP_SUPPORT_FIELD(TYPE, NAME, TITLE) TYPE NAME;
+ODP_SUPPORT_FIELDS
+#undef ODP_SUPPORT_FIELD
 };
 
 struct odp_flow_key_parms {
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 523adad6fa52..9556a0e0cd58 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -4955,13 +4955,13 @@ ofproto_unixctl_dpif_dump_dps(struct unixctl_conn 
*conn, int argc OVS_UNUSED,
 }
 
 static void
-show_dp_feature_b(struct ds *ds, const char *feature, bool b)
+show_dp_feature_bool(struct ds *ds, const char *feature, bool b)
 {
 ds_put_format(ds, "%s: %s\n", feature, b ? "Yes" : "No");
 }
 
 static void
-show_dp_feature_s(struct ds *ds, const char *feature, size_t s)
+show_dp_feature_size_t(struct ds *ds, const char *feature, size_t s)
 {
 ds_put_format(ds, "%s: %"PRIuSIZE"\n", feature, s);
 }
@@ -4969,21 +4969,15 @@ show_dp_feature_s(struct ds *ds, const char *feature, 
size_t s)
 static void
 dpif_show_support(const struct dpif_backer_support *support, struct ds *ds)
 {
-show_dp_feature_b(ds, "Variable length userdata",
-

Re: [ovs-dev] [PATCH 3/3] netdev-dpdk: Use uint8_t for port_id.

2017-04-03 Thread Aaron Conole
Ilya Maximets  writes:

> Currently, signed integer is used for 'port_id' variable and
> '-1' as identifier of bad or uninitialized 'port_id'.
>
> This inconsistent with dpdk library and, also, in few cases,
> leads to passing '-1' to dpdk functions where uint8_t expected.
>
> Such behaviour doesn't produce any issues, but it's better to
> use same type as in dpdk library for consistency.
>
> Also, magic number '-1' replaced with DPDK_ETH_PORT_ID_INVALID
> macro.
>
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Aaron Conole 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/3] netdev-dpdk: Fix device leak on port deletion.

2017-04-03 Thread Aaron Conole
Ilya Maximets  writes:

> Currently, once created device in dpdk will exist forever
> even after del-port operation untill we manually call
> 'ovs-appctl netdev-dpdk/detach ', where  is not
> the port's name but the name of dpdk eth device or pci address.
>
> Few issues with current implementation:
>
>   1. Different API for usual (system) and DPDK devices.
>  (We have to call 'ovs-appctl netdev-dpdk/detach' each
>   time after 'del-port' to actually free the device)
>  This is a big issue mostly for virtual DPDK devices.
>
>   2. Follows from 1:
>  For DPDK devices 'del-port' leads just to
>  'rte_eth_dev_stop' and subsequent 'add-port' will
>  just start the already existing device. Such behaviour
>  will not reset the device to initial state as it could
>  be expected. For example: virtual pcap pmd will continue
>  reading input file instead of reading it from the beginning.
>
>   3. Follows from 2:
>  After execution of the following commands 'port1' will be
>  configured with the 'old-options' while 'ovs-vsctl show'
>  will show us 'new-options' in dpdk-devargs field:
>
>ovs-vsctl add-port port1 -- set interface port1 type=dpdk \
>  options:dpdk-devargs=,
>ovs-vsctl del-port port1
>ovs-vsctl add-port port1 -- set interface port1 type=dpdk \
>  options:dpdk-devargs=,
>
>   4. Follows from 1:
>  Not detached device consumes 'port_id'. Since we have very
>  limited number of 'port_id's (32 in common case) this may
>  lead to quick exhausting of id pool and inability to add any
>  other port.
>
> To avoid above issues we need to detach all the attached devices on
> port destruction.
> appctl 'netdev-dpdk/detach' removed because not needed anymore.
>
> CC: Ciara Loftus 
> Fixes: 55e075e65ef9 ("netdev-dpdk: Arbitrary 'dpdk' port naming")
> Fixes: 69876ed78611 ("netdev-dpdk: Add support for virtual DPDK PMDs (vdevs)")
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Aaron Conole 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/3] netdev-dpdk: Fix double attaching of virtual devices.

2017-04-03 Thread Aaron Conole
Ilya Maximets  writes:

> 'devargs' for virtual devices contains not only name but
> also a list of arguments like this:
>
>   'net_pcap0,rx_pcap=file_rx.pcap,tx_pcap=file_tx.pcap'
>   or
>   'eth_af_packet0,iface=eth0'
>
> We must cut off the arguments from this string before calling
> 'rte_eth_dev_get_port_by_name()' to avoid double attaching of
> the same device.
>
> CC: Ciara Loftus 
> Fixes: 69876ed78611 ("netdev-dpdk: Add support for virtual DPDK PMDs (vdevs)")
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Aaron Conole 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/7] netdev-dpdk: Add support for keepalive functionality.

2017-04-03 Thread Bodireddy, Bhanuprakash
>>>
>>>This whole mechanism seems very error prone.  Is it possible to hang a
>>>thread with the subsequent sem_post?
>>
>> The relay function is called by 'ovs_keepalive' thread. I didn't
>> completely understand your concerns here. I would be happy to verify
>> if you have any scenarios in mind that you think pose problems.
>
>My concern stems from this:
>   "is relayed to monitoring framework by unlocking the semaphore."
>
>Assume the OvS vswitchd crashes while another process is pended on this
>locked semaphore.  Since you do a shm_unlink, I think the pended process
>waiting on that semaphore will be stuck.  Even worse, since we do have a
>process left with a handle, the name will be unlinked, and the memory and
>reading process (which may be ceilometer, but may be something else) will be
>left orphaned waiting for an update so that they can destroy it.
>
>There's a lot of coordination that needs to be added here, and it's likely not
>portable.  I'm not even sure if there are examples where it has been shown to
>work 100% on specific systems.
>
>A much better design is to use a socket to share state.  Since it is tied to 
>the
>lifetime of a process, and cleanup is guaranteed by POSIX (and all sane OSes,
>anyway) you can rely on it.  It is a much better mechanism for signaling
>changes, and a message based interface means you can make this dump
>information to _any_ interested subscriber - even those who are off system if
>you choose.
>

Thanks Aaron for clarifying this and appreciate all your feedback on this patch 
series.
I shall try out the scenarios leading to unexpected crashes and see how well 
the applications
recover on a restart and check for any further shortcomings. 

- Bhanuprakash.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovs V6 00/24] Introducing HW offload support for openvswitch

2017-04-03 Thread Joe Stringer
On 1 April 2017 at 20:50, Roi Dayan  wrote:
>
>
> On 31/03/2017 01:11, Marcelo Ricardo Leitner wrote:
>>
>> On Thu, Mar 30, 2017 at 03:43:36PM -0300, Marcelo Ricardo Leitner wrote:
>>>
>>> On Wed, Mar 29, 2017 at 03:43:10PM +0300, Roi Dayan wrote:

 This patch series introduces rule offload functionality to dpif-netlink
 via netdev ports new flow offloading API. The user can specify whether
 to
 enable rule offloading or not via OVS configuration. Netdev providers
 are able to implement netdev flow offload API in order to offload rules.

 This patch series also implements one offload scheme for netdev-linux,
 using TC flower classifier, which was chosen because its sort of natural
 to state OVS DP rules for this classifier. However, the code can be
 extended to support other classifiers such as U32, eBPF, etc which
 support
 offload as well.

 The use-case we are currently addressing is the newly sriov switchdev
 mode
 in the Linux kernel which was introduced in version 4.8 [1][2].
 This series was tested against sriov vfs vports representors of the
 Mellanox 100G ConnectX-4 series exposed by the mlx5 kernel driver.


 V5->V6:
 - Rebase over master branch, fix compilation issue
 - Add Nicira copyright to tc interface
>>>
>>>
>>> Hi,
>>>
>>> I built rpm packages based on OVS upstream commit 36664f336664. Then I
>>
>>
>> Please s/36664f336664/36664f377d048/g in this email.
>> This was the commit used as base for the test:
>> commit 36664f377d04857256c1d8582057e1cf3e0e3923
>> Author: Ben Pfaff 
>> Date:   Mon Mar 20 10:56:22 2017 -0700
>>
>> ovsdb-server: Drop unnecessary find_db() function.
>>
>> Thanks
>>
>>> built a new set of packages with this patchset, v6, applied.
>>>
>>> When offloading is enabled, the rules are merged in an unexpected way.
>>> See the results below from the two scenarios.
>>>
>>> openvswitch-2.7.90-1.fc24.36664f336664.x86_64.rpm
>>>    upstream ovs commit
>>>
>>> [root@wsfd-netdev36 ~]# ovs-ofctl dump-flows ovs_pvp_br0
>>> NXST_FLOW reply (xid=0x4):
>>>  cookie=0x0, duration=12.501s, table=0, n_packets=1022759,
>>> n_bytes=61365540, idle_age=0, ip,in_port=1,nw_src=16.0.0.1,nw_dst=1.0.0.1
>>> actions=output:2
>>>  cookie=0x0, duration=12.501s, table=0, n_packets=1020990,
>>> n_bytes=61259400, idle_age=0, ip,in_port=1,nw_src=16.0.0.2,nw_dst=1.0.0.2
>>> actions=output:2
>>>  cookie=0x0, duration=12.500s, table=0, n_packets=1023860,
>>> n_bytes=61431600, idle_age=0, ip,in_port=1,nw_src=16.0.0.3,nw_dst=1.0.0.3
>>> actions=output:2
>>>  cookie=0x0, duration=12.500s, table=0, n_packets=1022554,
>>> n_bytes=61353240, idle_age=0, ip,in_port=1,nw_src=16.0.0.4,nw_dst=1.0.0.4
>>> actions=output:2
>>>  cookie=0x0, duration=12.500s, table=0, n_packets=1021516,
>>> n_bytes=61290960, idle_age=0, ip,in_port=1,nw_src=16.0.0.5,nw_dst=1.0.0.5
>>> actions=output:2
>>>  cookie=0x0, duration=12.500s, table=0, n_packets=1020599,
>>> n_bytes=61235940, idle_age=0, ip,in_port=1,nw_src=16.0.0.6,nw_dst=1.0.0.6
>>> actions=output:2
>>>  cookie=0x0, duration=12.500s, table=0, n_packets=1022285,
>>> n_bytes=61337100, idle_age=0, ip,in_port=1,nw_src=16.0.0.7,nw_dst=1.0.0.7
>>> actions=output:2
>>>  cookie=0x0, duration=12.500s, table=0, n_packets=1021725,
>>> n_bytes=61303500, idle_age=0, ip,in_port=1,nw_src=16.0.0.8,nw_dst=1.0.0.8
>>> actions=output:2
>>>  cookie=0x0, duration=12.500s, table=0, n_packets=1022281,
>>> n_bytes=61336860, idle_age=0, ip,in_port=1,nw_src=16.0.0.9,nw_dst=1.0.0.9
>>> actions=output:2
>>>  cookie=0x0, duration=12.500s, table=0, n_packets=1022449,
>>> n_bytes=61346940, idle_age=0, ip,in_port=1,nw_src=16.0.0.10,nw_dst=1.0.0.10
>>> actions=output:2
>>>  cookie=0x0, duration=12.208s, table=0, n_packets=2580, n_bytes=154800,
>>> idle_age=0, ip,in_port=2,nw_src=16.0.0.1,nw_dst=1.0.0.1 actions=output:1
>>>  cookie=0x0, duration=12.208s, table=0, n_packets=2675, n_bytes=160500,
>>> idle_age=0, ip,in_port=2,nw_src=16.0.0.2,nw_dst=1.0.0.2 actions=output:1
>>>  cookie=0x0, duration=12.208s, table=0, n_packets=2520, n_bytes=151200,
>>> idle_age=0, ip,in_port=2,nw_src=16.0.0.3,nw_dst=1.0.0.3 actions=output:1
>>>  cookie=0x0, duration=12.208s, table=0, n_packets=2587, n_bytes=155220,
>>> idle_age=0, ip,in_port=2,nw_src=16.0.0.4,nw_dst=1.0.0.4 actions=output:1
>>>  cookie=0x0, duration=12.208s, table=0, n_packets=2640, n_bytes=158400,
>>> idle_age=0, ip,in_port=2,nw_src=16.0.0.5,nw_dst=1.0.0.5 actions=output:1
>>>  cookie=0x0, duration=12.208s, table=0, n_packets=2593, n_bytes=155580,
>>> idle_age=0, ip,in_port=2,nw_src=16.0.0.6,nw_dst=1.0.0.6 actions=output:1
>>>  cookie=0x0, duration=12.208s, table=0, n_packets=2634, n_bytes=158040,
>>> idle_age=0, ip,in_port=2,nw_src=16.0.0.7,nw_dst=1.0.0.7 actions=output:1
>>>  cookie=0x0, duration=12.208s, table=0, n_packets=2549, n_bytes=152940,
>>> idle_age=0, 

Re: [ovs-dev] [PATCH ovs V5 00/24] Introducing HW offload support for openvswitch

2017-04-03 Thread Joe Stringer
On 3 April 2017 at 03:27, Roi Dayan  wrote:
>
>
> On 29/03/2017 20:13, Joe Stringer wrote:
>>
>> On 29 March 2017 at 04:50, Roi Dayan  wrote:
>>>
>>>
>>>
>>> On 23/03/2017 09:01, Joe Stringer wrote:


 I ran the make check-offloads tests on a recent net-next kernel and it
 failed, output was not as expected:

 ../../tests/system-offloaded-traffic.at:54
 : ovs-appctl dpctl/dump-flows |
 grep "eth_type(0x0800)" | sed -e


 's/used:[0-9].[0-9]*s/used:0.001s/;s/eth(src=[a-z0-9:]*,dst=[a-z0-9:]*)/eth(mac
 s)/;s/actions:[0-9,]*/actions:output/;s/recirc_id(0),//' | sort
 --- - 2017-03-22 16:43:37.598689692 -0700
 +++


 /home/vagrant/ovs/_build-clang/tests/system-offloads-testsuite.dir/at-groups/2/stdout
 2017-03-22 16:43:37.595628000 -0700
 @@ -1,3 +1,3 @@
 -in_port(2),eth(macs),eth_type(0x0800), packets:9, bytes:756,
 used:0.001s, actions:output
 -in_port(3),eth(macs),eth_type(0x0800), packets:9, bytes:756,
 used:0.001s, actions:output
 +in_port(2),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9,
 bytes:882, used:0.001s, actions:output
 +in_port(3),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9,
 bytes:882, used:0.001s, actions:output

>>>
>>> Hi Joe,
>>>
>>> can you tell me what kernel you used here?
>>> maybe tc offloads were not supported and there was a fallback to OVS dp.
>>
>>
>> I believe that it was a snapshot of net-next relatively recently,
>> 01461abe62df ("Merge branch 'fib-notifications-cleanup'"). I could try
>> again with latest net-next? Or do you think there may be some
>> userspace dependency the test relies on?
>>
>
> I installed net-next kernel and make check-offloads pass for me.
> The last commit I'm on is
> 397df70 Merge branch '40GbE' of
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
> last tag is 4.11-rc3+
> I'm thinking maybe the test fails for you from something else like a second
> openvswitch process running already?

I don't think that was the case, but let me try again with your latest
series and the above commit.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] ***BULK*** E-mail Account Upgrade

2017-04-03 Thread Raluca Negrea
E-mail Account Upgrade

Your email box account needs to be  
 
Upgraded now to the latest version 
inother for you to receive your pending incoming messages that has been 
suspended.


Microsoft Webmail (c) 2017









































































































































































































































?




NOTE:
The content of the document you received is confidential and is the property of 
RATEN ICN. It is addressed only to those authorized to receive it. If you are 
not the addressee, we inform you that any disclosure, copying or distribution 
of information related to the contents of this text are strictly prohibited and 
subject to applicable law. RATEN ICN is not responsible for altering the 
computer message nor for any technical problems that might arise after its 
reception.?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 7/8] ofproto-dpif-xlate: refactor compose_output_action__

2017-04-03 Thread Zoltán Balogh
From: Jan Scheurich 

The very long function compose_output_action__() has been re-factored to make
the different cases for output to patch-port, native tunnel port, kernel tunnel
port, recirculation, or termination of a native tunnel at output to LOCAL port
clearer. Larger, self-contained blocks have been split out into separate
functions.

Signed-off-by; Jan Scheurich 
Co-authored-by: Zoltan Balogh 
---
 ofproto/ofproto-dpif-xlate.c | 435 +++
 1 file changed, 235 insertions(+), 200 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 64a33ba..7dc5220 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3258,39 +3258,28 @@ xlate_flow_is_protected(const struct xlate_ctx *ctx, 
const struct flow *flow, co
 xport_in->xbundle->protected && xport_out->xbundle->protected);
 }
 
-static void
-compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
-const struct xlate_bond_recirc *xr, bool check_stp)
+static bool
+check_output_prerequisites(struct xlate_ctx *ctx,
+   const struct xport *xport,
+   struct flow *flow,
+   bool check_stp)
 {
-const struct xport *xport = get_ofp_port(ctx->xbridge, ofp_port);
 struct flow_wildcards *wc = ctx->wc;
-struct flow *flow = >xin->flow;
-struct flow_tnl flow_tnl;
-union flow_vlan_hdr flow_vlans[FLOW_MAX_VLAN_HEADERS];
-uint8_t flow_nw_tos;
-odp_port_t out_port, odp_port;
-bool tnl_push_pop_send = false;
-uint8_t dscp;
-
-/* If 'struct flow' gets additional metadata, we'll need to zero it out
- * before traversing a patch port. */
-BUILD_ASSERT_DECL(FLOW_WC_SEQ == 39);
-memset(_tnl, 0, sizeof flow_tnl);
 
 if (!xport) {
 xlate_report(ctx, OFT_WARN, "Nonexistent output port");
-return;
+return false;
 } else if (xport->config & OFPUTIL_PC_NO_FWD) {
 xlate_report(ctx, OFT_DETAIL, "OFPPC_NO_FWD set, skipping output");
-return;
+return false;
 } else if (ctx->mirror_snaplen != 0 && xport->odp_port == ODPP_NONE) {
 xlate_report(ctx, OFT_WARN,
  "Mirror truncate to ODPP_NONE, skipping output");
-return;
+return false;
 } else if (xlate_flow_is_protected(ctx, flow, xport)) {
 xlate_report(ctx, OFT_WARN,
  "Flow is between protected ports, skipping output.");
-return;
+return false;
 } else if (check_stp) {
 if (is_stp(>base_flow)) {
 if (!xport_stp_should_forward_bpdu(xport) &&
@@ -3304,7 +3293,7 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t 
ofp_port,
  "RSTP not managing BPDU in this state, "
  "skipping bpdu output");
 }
-return;
+return false;
 }
 } else if ((xport->cfm && cfm_should_process_flow(xport->cfm, flow, 
wc))
|| (xport->bfd && bfd_should_process_flow(xport->bfd, flow,
@@ -3319,162 +3308,214 @@ compose_output_action__(struct xlate_ctx *ctx, 
ofp_port_t ofp_port,
 xlate_report(ctx, OFT_WARN,
  "RSTP not in forwarding state, skipping output");
 }
-return;
+return false;
 }
 }
+return true;
+}
 
-if (flow->packet_type == htonl(PT_ETH) && xport->is_layer3 ) {
-/* Ethernet packet to L3 outport -> pop ethernet header. */
-flow->packet_type = PACKET_TYPE_BE(OFPHTN_ETHERTYPE,
-   ntohs(flow->dl_type));
+static void
+xlate_output_patch_port(struct xlate_ctx *ctx,
+const struct xport *xport)
+{
+const struct xport *peer = xport->peer;
+struct flow *flow = >xin->flow;
+struct flow old_flow = ctx->xin->flow;
+struct flow_tnl old_flow_tnl_wc = ctx->wc->masks.tunnel;
+bool old_conntrack = ctx->conntracked;
+bool old_was_mpls = ctx->was_mpls;
+ovs_version_t old_version = ctx->xin->tables_version;
+struct ofpbuf old_stack = ctx->stack;
+uint8_t new_stack[1024];
+struct ofpbuf old_action_set = ctx->action_set;
+struct ovs_list *old_trace = ctx->xin->trace;
+uint64_t actset_stub[1024 / 8];
+
+ofpbuf_use_stub(>stack, new_stack, sizeof new_stack);
+ofpbuf_use_stub(>action_set, actset_stub, sizeof actset_stub);
+flow->in_port.ofp_port = peer->ofp_port;
+flow->metadata = htonll(0);
+memset(>tunnel, 0, sizeof flow->tunnel);
+flow->tunnel.metadata.tab = ofproto_get_tun_tab(
+>xbridge->ofproto->up);
+ctx->wc->masks.tunnel.metadata.tab = flow->tunnel.metadata.tab;
+memset(flow->regs, 0, sizeof flow->regs);
+flow->actset_output = 

[ovs-dev] [PATCH v2 8/8] userspace: add vxlan gpe support to vport

2017-04-03 Thread Zoltán Balogh
From: Georg Schmuecking 

This patch is based on the "datapath: enable vxlangpe creation in compat mode"
from Yi Yang. It introduces an extension option "gpe" to the vxlan port in the
netdev-dpdk datapath. Furthermore it introduces a new protocol header file
vxlangpe.h in which the vxlan gpe protocoll is described. In the vxlan specific
methods the different packet are introduced and handled.

Signed-off-by: Yi Yang 
Signed-off-by: Georg Schmuecking 
---
 datapath/linux/compat/include/linux/openvswitch.h |  1 +
 include/openvswitch/automake.mk   |  4 +-
 include/openvswitch/vxlangpe.h| 76 +++
 lib/netdev-native-tnl.c   | 61 --
 lib/netdev-vport.c| 18 +-
 5 files changed, 150 insertions(+), 10 deletions(-)
 create mode 100755 include/openvswitch/vxlangpe.h

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index 61f9e22..0146d23 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -291,6 +291,7 @@ enum ovs_vport_attr {
 enum {
OVS_VXLAN_EXT_UNSPEC,
OVS_VXLAN_EXT_GBP,  /* Flag or __u32 */
+   OVS_VXLAN_EXT_GPE = 8,  /* Flag or __u32 */
__OVS_VXLAN_EXT_MAX,
 };
 
diff --git a/include/openvswitch/automake.mk b/include/openvswitch/automake.mk
index c0e276f..c125f1e 100644
--- a/include/openvswitch/automake.mk
+++ b/include/openvswitch/automake.mk
@@ -29,5 +29,5 @@ openvswitchinclude_HEADERS = \
include/openvswitch/uuid.h \
include/openvswitch/version.h \
include/openvswitch/vconn.h \
-   include/openvswitch/vlog.h
-
+   include/openvswitch/vlog.h \
+   include/openvswitch/vxlangpe.h
diff --git a/include/openvswitch/vxlangpe.h b/include/openvswitch/vxlangpe.h
new file mode 100755
index 000..4c18d90
--- /dev/null
+++ b/include/openvswitch/vxlangpe.h
@@ -0,0 +1,76 @@
+#ifndef __OPENVSWITCH_VXLANGPE_H
+#define __OPENVSWITCH_VXLANGPE_H 1
+
+#include "openvswitch/types.h"
+
+#define u8 uint8_t
+#define u32 uint8_t
+#define __be32 ovs_be32
+
+/*
+ * VXLAN Generic Protocol Extension (VXLAN_F_GPE):
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |R|R|Ver|I|P|R|O|   Reserved|Next Protocol  |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |VXLAN Network Identifier (VNI) |   Reserved|
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * Ver = Version. Indicates VXLAN GPE protocol version.
+ *
+ * P = Next Protocol Bit. The P bit is set to indicate that the
+ * Next Protocol field is present.
+ *
+ * O = OAM Flag Bit. The O bit is set to indicate that the packet
+ * is an OAM packet.
+ *
+ * Next Protocol = This 8 bit field indicates the protocol header
+ * immediately following the VXLAN GPE header.
+ *
+ * https://tools.ietf.org/html/draft-ietf-nvo3-vxlan-gpe-01
+ */
+
+struct vxlanhdr_gpe {
+#ifdef WORDS_BIGENDIAN
+   u8  reserved_flags2:2,
+   version:2,
+   instance_applied:1,
+   np_applied:1,
+   reserved_flags1:1,
+   oam_flag:1;
+#else
+   u8  oam_flag:1,
+   reserved_flags1:1,
+   np_applied:1,
+   instance_applied:1,
+   version:2,
+   reserved_flags2:2;
+#endif
+   u8  reserved_flags3;
+   u8  reserved_flags4;
+   u8  next_protocol;
+   __be32  vx_vni;
+};
+
+/* VXLAN-GPE header flags. */
+#define VXLAN_HF_VER   ((1U <<29) | (1U <<28))
+#define VXLAN_HF_NP(1U <<26)
+#define VXLAN_HF_OAM   (1U <<24)
+
+#define VXLAN_GPE_USED_BITS (VXLAN_HF_VER | VXLAN_HF_NP | VXLAN_HF_OAM | \
+0xff)
+
+/* VXLAN-GPE header Next Protocol. */
+#define VXLAN_GPE_NP_IPV4  0x01
+#define VXLAN_GPE_NP_IPV6  0x02
+#define VXLAN_GPE_NP_ETHERNET  0x03
+#define VXLAN_GPE_NP_NSH   0x04
+
+struct vxlan_metadata {
+   u32 gbp;
+   u32 gpe;
+};
+
+#define VXLAN_F_GPE0x4000
+#define VXLAN_HF_GPE 0x0400
+
+#endif /* __OPENVSWITCH_VXLANGPE_H */
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index b48fccd..e8ac078 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -44,6 +44,7 @@
 #include "unaligned.h"
 #include "unixctl.h"
 #include "openvswitch/vlog.h"
+#include "openvswitch/vxlangpe.h"
 
 VLOG_DEFINE_THIS_MODULE(native_tnl);
 static struct vlog_rate_limit err_rl = VLOG_RATE_LIMIT_INIT(60, 5);
@@ -498,6 +499,8 @@ netdev_vxlan_pop_header(struct dp_packet *packet)
 struct flow_tnl *tnl = >tunnel;
 struct vxlanhdr *vxh;
 unsigned int hlen;
+ovs_be32 vx_flags;
+enum packet_type next_pt = PT_ETH;
 
 

[ovs-dev] [PATCH v2 6/8] dpif-netlink: Don't send PACKET_TYPE to kernel

2017-04-03 Thread Zoltán Balogh

The kernel datapath does not support the packet_type match field.
Instead it encodes the packet type implictly by the presence or absence of
the Ethernet attribute in the flow key and mask.

This patch filters the PACKET_TYPE attribute out of netlink flow key and
mask to be sent to the kernel datapath.

Signed-off-by; Zoltan Balogh 
---
 lib/dpif-netlink.c | 37 -
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index c01c89e..7144465 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2911,6 +2911,34 @@ dpif_netlink_flow_from_ofpbuf(struct dpif_netlink_flow 
*flow,
 return 0;
 }
 
+static inline void
+nl_msg_put_exclude_packet_type(struct ofpbuf* buf, uint16_t type,
+   const struct nlattr *data,
+   uint16_t data_len)
+{
+const struct nlattr *packet_type = nl_attr_find__(data, data_len,
+  
OVS_KEY_ATTR_PACKET_TYPE);
+if (packet_type) {
+/* exclude PACKET_TYPE Netlink attribute. */
+ovs_assert(NLA_ALIGN(packet_type->nla_len) == NLA_HDRLEN + 
sizeof(uint32_t));
+
+size_t packet_type_len = NLA_HDRLEN + sizeof(uint32_t);
+size_t first_chunk_size =
+(size_t)((uint8_t *)packet_type - (uint8_t *)data);
+size_t second_chunk_size = data_len - first_chunk_size
+   - packet_type_len;
+uint8_t *first_attr = NULL;
+struct nlattr *next_attr = nl_attr_next(packet_type);
+
+first_attr = nl_msg_put_unspec_uninit(buf, type,
+  data_len - packet_type_len);
+memcpy(first_attr, data, first_chunk_size);
+memcpy(first_attr + first_chunk_size, next_attr, second_chunk_size);
+} else {
+nl_msg_put_unspec(buf, type, data, data_len);
+}
+}
+
 /* Appends to 'buf' (which must initially be empty) a "struct ovs_header"
  * followed by Netlink attributes corresponding to 'flow'. */
 static void
@@ -2937,13 +2965,12 @@ dpif_netlink_flow_to_ofpbuf(const struct 
dpif_netlink_flow *flow,
 }
 if (!flow->ufid_terse || !flow->ufid_present) {
 if (flow->key_len) {
-nl_msg_put_unspec(buf, OVS_FLOW_ATTR_KEY,
-  flow->key, flow->key_len);
+nl_msg_put_exclude_packet_type(buf, OVS_FLOW_ATTR_KEY, flow->key,
+   flow->key_len);
 }
-
 if (flow->mask_len) {
-nl_msg_put_unspec(buf, OVS_FLOW_ATTR_MASK,
-  flow->mask, flow->mask_len);
+nl_msg_put_exclude_packet_type(buf, OVS_FLOW_ATTR_MASK, flow->mask,
+   flow->mask_len);
 }
 if (flow->actions || flow->actions_len) {
 nl_msg_put_unspec(buf, OVS_FLOW_ATTR_ACTIONS,
-- 
2.7.4
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 2/8] userspace: Support for push_eth and pop_eth actions

2017-04-03 Thread Zoltán Balogh
From: Jan Scheurich 

Add support for actions push_eth and pop_eth to the netdev datapath and
the supporting libraries. This patch relies on the support for these actions
in the kernel datapath to be present.

Signed-off-by: Lorand Jakab 
Signed-off-by: Simon Horman 
Signed-off-by: Jiri Benc 
Signed-off-by: Yi Yang 
Signed-off-by: Jean Tourrilhes 
Signed-off-by: Jan Scheurich 
Co-authored-by: Zoltan Balogh 
---
 datapath/linux/compat/include/linux/openvswitch.h |  6 ++-
 lib/odp-execute.c | 17 +-
 lib/odp-util.c| 64 +--
 lib/odp-util.h|  7 +++
 lib/packets.c | 42 +++
 lib/packets.h |  4 ++
 ofproto/ofproto-dpif-sflow.c  | 14 ++---
 7 files changed, 139 insertions(+), 15 deletions(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index f2ef0c1..ad979eb 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -747,6 +747,9 @@ enum ovs_ct_attr {
  */
 struct ovs_action_push_eth {
struct ovs_key_ethernet addresses;
+#ifndef __KERNEL__
+__be16  eth_type;
+#endif
 };
 
 /**
@@ -823,8 +826,7 @@ enum ovs_nat_attr {
  * entries in the flow key.
  * @OVS_ACTION_ATTR_PUSH_ETH: Push a new outermost Ethernet header onto the
  * packet.
- * @OVS_ACTION_ATTR_POP_ETH: Pop the outermost Ethernet header off the
- * packet.
+ * @OVS_ACTION_ATTR_POP_ETH: Pop the outermost Ethernet header off the packet.
  *
  * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
diff --git a/lib/odp-execute.c b/lib/odp-execute.c
index c4563b1..8e090c8 100644
--- a/lib/odp-execute.c
+++ b/lib/odp-execute.c
@@ -732,6 +732,21 @@ odp_execute_actions(void *dp, struct dp_packet_batch 
*batch, bool steal,
 case OVS_ACTION_ATTR_METER:
 /* Not implemented yet. */
 break;
+case OVS_ACTION_ATTR_PUSH_ETH: {
+const struct ovs_action_push_eth *eth = nl_attr_get(a);
+
+DP_PACKET_BATCH_FOR_EACH (packet, batch) {
+push_eth(packet, >addresses.eth_dst,
+ >addresses.eth_src, eth->eth_type);
+}
+break;
+}
+
+case OVS_ACTION_ATTR_POP_ETH:
+DP_PACKET_BATCH_FOR_EACH (packet, batch) {
+pop_eth(packet);
+}
+break;
 
 case OVS_ACTION_ATTR_OUTPUT:
 case OVS_ACTION_ATTR_TUNNEL_PUSH:
@@ -739,8 +754,6 @@ odp_execute_actions(void *dp, struct dp_packet_batch 
*batch, bool steal,
 case OVS_ACTION_ATTR_USERSPACE:
 case OVS_ACTION_ATTR_RECIRC:
 case OVS_ACTION_ATTR_CT:
-case OVS_ACTION_ATTR_PUSH_ETH:
-case OVS_ACTION_ATTR_POP_ETH:
 case OVS_ACTION_ATTR_UNSPEC:
 case __OVS_ACTION_ATTR_MAX:
 OVS_NOT_REACHED();
diff --git a/lib/odp-util.c b/lib/odp-util.c
index 8747778..f137044 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -865,9 +865,11 @@ format_odp_action(struct ds *ds, const struct nlattr *a)
 break;
 case OVS_ACTION_ATTR_PUSH_ETH: {
 const struct ovs_action_push_eth *eth = nl_attr_get(a);
-ds_put_format(ds, "push_eth(src="ETH_ADDR_FMT",dst="ETH_ADDR_FMT")",
+ds_put_format(ds, "push_eth(src="ETH_ADDR_FMT",dst="ETH_ADDR_FMT
+  ",type=0x%04"PRIx16")",
   ETH_ADDR_ARGS(eth->addresses.eth_src),
-  ETH_ADDR_ARGS(eth->addresses.eth_dst));
+  ETH_ADDR_ARGS(eth->addresses.eth_dst),
+  ntohs(eth->eth_type));
 break;
 }
 case OVS_ACTION_ATTR_POP_ETH:
@@ -1078,14 +1080,41 @@ parse_odp_userspace_action(const char *s, struct ofpbuf 
*actions)
 odp_put_userspace_action(pid, user_data, user_data_size,
  tunnel_out_port, include_actions, 
actions);
 res = n + n1;
+goto out;
 } else if (s[n] == ')') {
 odp_put_userspace_action(pid, user_data, user_data_size,
  ODPP_NONE, include_actions, actions);
 res = n + 1;
-} else {
-res = -EINVAL;
+goto out;
+}
+}
+
+{
+struct ovs_action_push_eth push;
+int eth_type = 0;
+int n1 = -1;
+
+if (ovs_scan([n], "push_eth(src="ETH_ADDR_SCAN_FMT","
+ "dst="ETH_ADDR_SCAN_FMT",type=%i)%n",
+ 

[ovs-dev] [PATCH v2 5/8] userspace: L3 tunnel support for GRE and LISP

2017-04-03 Thread Zoltán Balogh
From: Jan Scheurich 

Add a boolean "layer3" configuration option for tunnel vports.
The layer3 option defaults to false for all ports except LISP.
GRE ports accept both true and false for "layer3".

A tunnel vport configured with layer3=true receives L3 packets.
which are then converted to Ethernet packets by pushing a dummy
Ethernet heder at the ingress of the OpenFlow pipeline. The
Ethernet header of a packet is stripped before sending to a
layer3 tunnel vport.

Presently a single GRE vport cannot carry both L2 and L3 packets.
But it is possible to create two GRE vports representing the same
GRE tunel, one with layer3=false, the other with layer3=true.
L2 packet from the tunnel are received on the first vport, L3
packets on the second. The controller must send packets to the
layer3 GRE vport to tunnel them without their Ethernet header.

Units tests have been added to check the L3 tunnel handling.

Signed-off-by: Simon Horman 
Signed-off-by: Jiri Benc 
Signed-off-by: Yi Yang 
Signed-off-by; Jan Scheurich 
Co-authored-by: Zoltan Balogh 
---
 lib/netdev-native-tnl.c   | 27 ++-
 lib/netdev-vport.c| 14 --
 ofproto/tunnel.c  | 14 +++---
 tests/tunnel-push-pop-ipv6.at | 15 ++-
 tests/tunnel-push-pop.at  | 33 +++--
 vswitchd/vswitch.xml  | 13 +
 6 files changed, 95 insertions(+), 21 deletions(-)

diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index 2798324..b48fccd 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -154,6 +154,10 @@ netdev_tnl_push_ip_header(struct dp_packet *packet,
 *ip_tot_size = dp_packet_size(packet) - sizeof (struct eth_header);
 
 memcpy(eth, header, size);
+/* The encapsulated packet has type Ethernet. Adjust dp_packet. */
+packet->packet_type = htonl(PT_ETH);
+dp_packet_reset_offsets(packet);
+packet->l3_ofs = sizeof (struct eth_header);
 
 if (netdev_tnl_is_header_ipv6(header)) {
 ip6 = netdev_tnl_ipv6_hdr(eth);
@@ -345,6 +349,7 @@ parse_gre_header(struct dp_packet *packet,
 ovs_16aligned_be32 *options;
 int hlen;
 unsigned int ulen;
+uint16_t greh_protocol;
 
 greh = netdev_tnl_ip_extract_tnl_md(packet, tnl, );
 if (!greh) {
@@ -355,10 +360,6 @@ parse_gre_header(struct dp_packet *packet,
 return -EINVAL;
 }
 
-if (greh->protocol != htons(ETH_TYPE_TEB)) {
-return -EINVAL;
-}
-
 hlen = ulen + gre_header_len(greh->flags);
 if (hlen > dp_packet_size(packet)) {
 return -EINVAL;
@@ -388,6 +389,15 @@ parse_gre_header(struct dp_packet *packet,
 options++;
 }
 
+/* Set the new packet type depending on the GRE protocol field. */
+greh_protocol = ntohs(greh->protocol);
+if (greh_protocol == ETH_TYPE_TEB) {
+packet->packet_type = htonl(PT_ETH);
+} else {
+/* Allow all GRE protocol values as Ethertypes */
+packet->packet_type = PACKET_TYPE_BE(OFPHTN_ETHERTYPE, greh_protocol);
+}
+
 return hlen;
 }
 
@@ -451,7 +461,12 @@ netdev_gre_build_header(const struct netdev *netdev,
 
 greh = netdev_tnl_ip_build_header(data, params, IPPROTO_GRE);
 
-greh->protocol = htons(ETH_TYPE_TEB);
+/* XXX: Needs fixing for versatile tunnels in PTAP bridges */
+if (tnl_cfg->is_layer3) {
+greh->protocol = params->flow->dl_type;
+} else {
+greh->protocol = htons(ETH_TYPE_TEB);
+}
 greh->flags = 0;
 
 options = (ovs_16aligned_be32 *) (greh + 1);
@@ -504,6 +519,7 @@ netdev_vxlan_pop_header(struct dp_packet *packet)
 tnl->tun_id = htonll(ntohl(get_16aligned_be32(>vx_vni)) >> 8);
 tnl->flags |= FLOW_TNL_F_KEY;
 
+packet->packet_type = htonl(PT_ETH);
 dp_packet_reset_packet(packet, hlen + VXLAN_HLEN);
 
 return packet;
@@ -583,6 +599,7 @@ netdev_geneve_pop_header(struct dp_packet *packet)
 tnl->metadata.present.len = opts_len;
 tnl->flags |= FLOW_TNL_F_UDPIF;
 
+packet->packet_type = htonl(PT_ETH);
 dp_packet_reset_packet(packet, hlen);
 
 return packet;
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index a6c1e23..8653e96 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -410,7 +410,7 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 const char *name = netdev_get_name(dev_);
 const char *type = netdev_get_type(dev_);
 struct ds errors = DS_EMPTY_INITIALIZER;
-bool needs_dst_port, has_csum;
+bool needs_dst_port, has_csum, optional_layer3;
 uint16_t dst_proto = 0, src_proto = 0;
 struct netdev_tunnel_config tnl_cfg;
 struct smap_node *node;
@@ -418,6 +418,7 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
*args, char **errp)
 
 has_csum = strstr(type, "gre") || 

[ovs-dev] [PATCH v2 0/8] userspace: Support for L3 tunneling

2017-04-03 Thread Zoltán Balogh
From: Jan Scheurich 

This patch set is part of an initiative to deal with non-Ethernet packets in 
OVS for advanced use cases like L3 tunneling or NSH. The initiative is 
centering on the new OpenFlow concepts of "Packet type-aware pipeline" (PTAP) 
and "Generic encap/decap actions" (EXT-382). The overall design is documented 
in 
https://docs.google.com/document/d/1oWMYUH8sjZJzWa72o2q9kU0N6pNE-rwZcLH3-kbbDR8

This patch series implements the user-space parts of the support for L3 tunnels 
connected to the legacy Ethernet-only pipeline in OVS. 

In large parts it is an adaptation of the earlier work on L3 tunneling by 
Lorand Jakab, Simon Horman, Jiri Benc and Yi Yang, adapted to the new design 
for packet type aware pipelines as prototyped by Jean Tourrilhes. 

It supersedes user-space patches 9-11 of the most recent patch set submitted by 
Yi Yang 
(https://mail.openvswitch.org/pipermail/ovs-dev/2016-December/326865.html). The 
remaining user-space parts of that series (VXLAN-GPE) will be superseded by 
further patches in the context of this initiative.

Key changes are the introduction of explicit packet_type fields in the 
dp_packet and flow structs, as well as a simpler handling of L3 tunnels limited 
to the ofproto layer.

   userspace: Add packet_type in dp_packet and flow
   userspace: Support for push_eth and pop_eth actions 
   userspace: Switching of L3 packets in L2 pipeline 
   ofproto-dpif-upcall: Intialize dump-seq of new flow to zero 
   userspace: L3 tunnel support for GRE and LISP 
   dpif-netlink: Don't send PACKET_TYPE to kernel 
   ofproto-dpif-xlate: refactor compose_output_action__
   userspace: add vxlan gpe support to vport

For native L3 tunneling with the user-space datapath these patches are 
complete. To apply L3 tunneling to the kernel datapath, they are dependent on 
the following other contributions:

.   [PATCH net-next v13 0/8] openvswitch: support for layer 3 encapsulated 
packets
https://mail.openvswitch.org/pipermail/ovs-dev/2016-November/324943.html
The net-next patch has been accepted
.   [RFC PATCH v4 0/6] create tunnel devices using rtnetlink interface
https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327749.html
.   Another user-space patch to send the "layer3" option down via rtnetlink 
interface
.   Backports of the kernel datapath patches to the OVS tree module:
Patches 01-08 of patch series [PATCH v2 00/17] port Jiri Benc's L3 
patchset to ovs
https://mail.openvswitch.org/pipermail/ovs-dev/2016-December/326865.html
.   A compatibility mechanism to send down "layer3" option to the 
backported kernel module to support   older kernels.

To ease the progress with the implementation of the subsequent user-space 
patches for the PTAP, EXT-382 and NSH and avoid the pain of frequent re-basing 
of these large patch series, we would like to request exemption for merging the 
present patch series before the kernel datapath pieces are all in place.

 build-aux/extract-ofp-fields  |   1 +
 datapath/linux/compat/include/linux/openvswitch.h |   9 +-
 include/openflow/openflow-common.h|   9 +
 include/openvswitch/automake.mk   |   4 +-
 include/openvswitch/flow.h|  16 +-
 include/openvswitch/match.h   |   1 +
 include/openvswitch/meta-flow.h   |  15 +-
 include/openvswitch/ofp-print.h   |   9 +-
 include/openvswitch/vxlangpe.h|  76 
 lib/cfm.c |   2 +-
 lib/conntrack.c   |   2 +-
 lib/dp-packet.c   |   3 +
 lib/dp-packet.h   |  25 +-
 lib/dpif-netdev.c |  50 ++-
 lib/dpif-netlink.c|  56 ++-
 lib/dpif.c|   6 +-
 lib/flow.c| 112 +++---
 lib/flow.h|  11 +-
 lib/match.c   |  33 +-
 lib/meta-flow.c   |   2 +
 lib/netdev-bsd.c  |   1 +
 lib/netdev-dummy.c|   5 +
 lib/netdev-linux.c|   4 +-
 lib/netdev-native-tnl.c   |  90 -
 lib/netdev-vport.c|  34 +-
 lib/netdev.h  |   1 +
 lib/nx-match.c|   2 +-
 lib/odp-execute.c |  21 +-
 lib/odp-util.c| 249 ++--
 lib/odp-util.h|  15 +-
 lib/ofp-print.c   |  29 +-
 lib/ofp-util.c|   2 +-
 lib/packets.c 

[ovs-dev] [PATCH v2 4/8] ofproto-dpif-upcall: Intialize dump-seq of new flow to zero

2017-04-03 Thread Zoltán Balogh
From: Jan Scheurich 

This forces updating of flow stat at the next re-validation, even for
flows that are being created when the revalidation has already commenced.

It enables reliable testing of fast path flow stats using ovs-appctl
time/warp after flow creation.

Signed-off-by: Jan Scheurich 
---
 ofproto/ofproto-dpif-upcall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 5d9f7ed..241cce1 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -1118,7 +1118,7 @@ upcall_xlate(struct udpif *udpif, struct upcall *upcall,
  * with pushing its stats eventually. */
 }
 
-upcall->dump_seq = seq_read(udpif->dump_seq);
+upcall->dump_seq = 0;
 upcall->reval_seq = seq_read(udpif->reval_seq);
 
 xlate_actions(, >xout);
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 3/8] userspace: Switching of L3 packets in L2 pipeline

2017-04-03 Thread Zoltán Balogh
From: Jan Scheurich 

Ports have a new layer3 attribute if they send/receive L3 packets.

The packet_type included in structs dp_packet and flow is considered in
ofproto-dpif. The classical L2 match fields (dl_src, dl_dst, dl_type, and
vlan_tci, vlan_vid, vlan_pcp) now have Ethernet as pre-requisite.

A dummy ethernet header is pushed to L3 packets received from L3 ports
before the the pipeline processing starts. The ethernet header is popped
before sending a packet to a L3 port.

For datapath ports that can receive L2 or L3 packets, the packet_type
becomes part of the flow key for datapath flows and is handled
appropriately in dpif-netdev.

Signed-off-by: Lorand Jakab 
Signed-off-by: Simon Horman 
Signed-off-by: Jiri Benc 
Signed-off-by: Yi Yang 
Signed-off-by: Jan Scheurich 
Co-authored-by: Zoltan Balogh 

Conflicts:
lib/dpif-netdev.c
ofproto/ofproto-dpif-xlate.c
---
 build-aux/extract-ofp-fields  |   1 +
 datapath/linux/compat/include/linux/openvswitch.h |   2 +
 include/openvswitch/match.h   |   1 +
 include/openvswitch/meta-flow.h   |  15 +-
 lib/dpif-netdev.c |  47 +++---
 lib/dpif-netlink.c|   2 +-
 lib/match.c   |  31 +++-
 lib/meta-flow.c   |   2 +
 lib/netdev-vport.c|   8 +-
 lib/netdev.h  |   1 +
 lib/odp-execute.c |   2 +
 lib/odp-util.c| 185 ++
 lib/odp-util.h|   6 +-
 lib/packets.h |   1 -
 ofproto/ofproto-dpif-sflow.c  |   1 +
 ofproto/ofproto-dpif-upcall.c |   4 +-
 ofproto/ofproto-dpif-xlate.c  |  49 +-
 ofproto/ofproto-dpif.c|   6 +-
 ofproto/tunnel.c  |   3 +
 tests/tunnel-push-pop-ipv6.at |  10 +-
 tests/tunnel-push-pop.at  |  12 +-
 tests/tunnel.at   |  28 ++--
 22 files changed, 305 insertions(+), 112 deletions(-)

diff --git a/build-aux/extract-ofp-fields b/build-aux/extract-ofp-fields
index af7c69b..d5b8a82 100755
--- a/build-aux/extract-ofp-fields
+++ b/build-aux/extract-ofp-fields
@@ -39,6 +39,7 @@ FORMATTING = {"decimal":("MFS_DECIMAL",  1,   
8),
   "TCP flags":  ("MFS_TCP_FLAGS",2,   2)}
 
 PREREQS = {"none": "MFP_NONE",
+   "Ethernet": "MFP_ETHERNET",
"ARP": "MFP_ARP",
"VLAN VID": "MFP_VLAN_VID",
"IPv4": "MFP_IPV4",
diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index ad979eb..61f9e22 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -363,6 +363,8 @@ enum ovs_key_attr {
/* Only used within kernel data path. */
OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ovs_tunnel_info */
 #endif
+
+   OVS_KEY_ATTR_PACKET_TYPE,  /* be32 packet type */
__OVS_KEY_ATTR_MAX
 };
 
diff --git a/include/openvswitch/match.h b/include/openvswitch/match.h
index 06fa04c..ce06919 100644
--- a/include/openvswitch/match.h
+++ b/include/openvswitch/match.h
@@ -115,6 +115,7 @@ void match_set_ct_ipv6_dst(struct match *, const struct 
in6_addr *);
 void match_set_ct_ipv6_dst_masked(struct match *, const struct in6_addr *,
   const struct in6_addr *);
 
+void match_set_packet_type(struct match *, ovs_be32 packet_type);
 void match_set_skb_priority(struct match *, uint32_t skb_priority);
 void match_set_dl_type(struct match *, ovs_be16);
 void match_set_dl_src(struct match *, const struct eth_addr );
diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
index 11852d2..c284ec6 100644
--- a/include/openvswitch/meta-flow.h
+++ b/include/openvswitch/meta-flow.h
@@ -985,7 +985,7 @@ enum OVS_PACKED_ENUM mf_field_id {
  * Type: MAC.
  * Maskable: bitwise.
  * Formatting: Ethernet.
- * Prerequisites: none.
+ * Prerequisites: Ethernet.
  * Access: read/write.
  * NXM: NXM_OF_ETH_SRC(2) since v1.1.
  * OXM: OXM_OF_ETH_SRC(4) since OF1.2 and v1.7.
@@ -1001,7 +1001,7 @@ enum OVS_PACKED_ENUM mf_field_id {
  * Type: MAC.
  * Maskable: bitwise.
  * Formatting: Ethernet.
- * Prerequisites: none.
+ * Prerequisites: Ethernet.
  * Access: read/write.
  * NXM: NXM_OF_ETH_DST(1) since v1.1.
  * OXM: OXM_OF_ETH_DST(3) since OF1.2 and v1.7.
@@ -1020,7 +1020,7 @@ enum OVS_PACKED_ENUM 

[ovs-dev] [PATCH v2 1/8] userspace: Add packet_type in dp_packet and flow

2017-04-03 Thread Zoltán Balogh
From: Jan Scheurich 

This commit adds a packet_type attribute to the structs dp_packet and flow
to explicitly carry the type of the packet as prepration for the
introduction of the so-called packet type-aware pipeline (PTAP) in OVS.

The packet_type is a big-endian 32 bit integer with the encoding as
specified in OpenFlow verion 1.5.

The upper 16 bits contain the packet type name space. Pre-defined values
are defined in openflow-common.h:

enum ofp_header_type_namespaces {
OFPHTN_ONF = 0, /* ONF namespace. */
OFPHTN_ETHERTYPE = 1,   /* ns_type is an Ethertype. */
OFPHTN_IP_PROTO = 2,/* ns_type is a IP protocol number. */
OFPHTN_UDP_TCP_PORT = 3,/* ns_type is a TCP or UDP port. */
OFPHTN_IPV4_OPTION = 4, /* ns_type is an IPv4 option number. */
};

The lower 16 bits specify the actual type in the context of the name space.

Only name spaces 0 and 1 will be supported for now.

For name space OFPHTN_ONF the relevant packet type is 0 (Ethernet).
This is the default packet_type in OVS and the only one supported so far.
Packets of type (OFPHTN_ONF, 0) are called Ethernet packets.

In name space OFPHTN_ETHERTYPE the type is the Ethertype of the packet.
A packet of type (OFPHTN_ETHERTYPE, ) is a standard L2 packet
whith the Ethernet header (and any VLAN tags) removed to expose the L3
(or L2.5) payload of the packet. These will simply be called L3 packets.

The Ethernet address fields dl_src and dl_dst in struct flow are not
applicable for an L3 packet and must be zero. However, to maintain
compatibility with the large code base, we have chosen to copy the
Ethertype of an L3 packet into the the dl_type field of struct flow.

This does not mean that it will be possible to match on dl_type for L3
packets with PTAP later on. Matching must be done on packet_type instead.

New dp_packets are initialized with packet_type Ethernet. Ports that
receive L3 packets will have to explicitly adjust the packet_type.

Signed-off-by: Jean Tourrilhes 
Signed-off-by: Jan Scheurich 
Co-authored-by: Zoltan Balogh 
---
 include/openflow/openflow-common.h |   9 +++
 include/openvswitch/flow.h |  16 +++---
 include/openvswitch/ofp-print.h|   9 ++-
 lib/cfm.c  |   2 +-
 lib/conntrack.c|   2 +-
 lib/dp-packet.c|   3 +
 lib/dp-packet.h|  25 +++--
 lib/dpif-netdev.c  |   3 +-
 lib/dpif-netlink.c |  17 ++
 lib/dpif.c |   6 +-
 lib/flow.c | 112 ++---
 lib/flow.h |  11 +++-
 lib/match.c|   2 +-
 lib/netdev-bsd.c   |   1 +
 lib/netdev-dummy.c |   5 ++
 lib/netdev-linux.c |   4 +-
 lib/netdev-native-tnl.c|   4 +-
 lib/nx-match.c |   2 +-
 lib/odp-execute.c  |   2 +-
 lib/odp-util.h |   2 +-
 lib/ofp-print.c|  29 --
 lib/ofp-util.c |   2 +-
 lib/packets.c  |  10 +++-
 lib/packets.h  |  25 +
 lib/pcap-file.c|   4 +-
 ofproto/ofproto-dpif-rid.h |   2 +-
 ofproto/ofproto-dpif-xlate.c   |   2 +-
 ofproto/ofproto-dpif.c |   4 +-
 ovn/controller/pinctrl.c   |   6 +-
 tests/test-flows.c |   2 +-
 30 files changed, 232 insertions(+), 91 deletions(-)

diff --git a/include/openflow/openflow-common.h 
b/include/openflow/openflow-common.h
index 7b619a9..2382682 100644
--- a/include/openflow/openflow-common.h
+++ b/include/openflow/openflow-common.h
@@ -454,4 +454,13 @@ enum ofp_table_config {
 OFPTC14_VACANCY_EVENTS= 1 << 3, /* Enable vacancy events. */
 };
 
+/* Header and packet type name spaces. */
+enum ofp_header_type_namespaces {
+OFPHTN_ONF = 0, /* ONF namespace. */
+OFPHTN_ETHERTYPE = 1,   /* ns_type is an Ethertype. */
+OFPHTN_IP_PROTO = 2,/* ns_type is a IP protocol number. */
+OFPHTN_UDP_TCP_PORT = 3,/* ns_type is a TCP or UDP port. */
+OFPHTN_IPV4_OPTION = 4, /* ns_type is an IPv4 option number. */
+};
+
 #endif /* openflow/openflow-common.h */
diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h
index 188467d..36a2a85 100644
--- a/include/openvswitch/flow.h
+++ b/include/openvswitch/flow.h
@@ -23,7 +23,7 @@
 /* This sequence number should be incremented whenever anything involving flows
  * or the wildcarding of flows changes.  This will cause build assertion
  * failures in places which likely need to be updated. */
-#define FLOW_WC_SEQ 38
+#define FLOW_WC_SEQ 39
 
 /* Number of Open vSwitch extension 32-bit registers. */
 #define FLOW_N_REGS 16
@@ -108,7 +108,7 @@ struct flow {
  

Re: [ovs-dev] [RFC] OpenStack Metadata API and OVN

2017-04-03 Thread Vasiliy Tolstov
Thanks! Very interesting

2017-04-03 19:03 GMT+03:00 Russell Bryant :
> I worked with a fellow team member to come up with a proposal for how
> to support the OpenStack Metadata API with OVN.  The proposal requires
> no additional changes to OVN itself, but I wanted to share on this
> list anyway for anyone that may be interested.
>
> https://review.openstack.org/#/c/452811/
>
> Thanks,
>
> --
> Russell Bryant
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev



-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC] OpenStack Metadata API and OVN

2017-04-03 Thread Russell Bryant
I worked with a fellow team member to come up with a proposal for how
to support the OpenStack Metadata API with OVN.  The proposal requires
no additional changes to OVN itself, but I wanted to share on this
list anyway for anyone that may be interested.

https://review.openstack.org/#/c/452811/

Thanks,

-- 
Russell Bryant
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/7] netdev-dpdk: Add support for keepalive functionality.

2017-04-03 Thread Aaron Conole
"Bodireddy, Bhanuprakash"  writes:

>>>
>>> +/*
>>> + * OVS Shared Memory structure
>>> + *
>>> + * The information in the shared memory block will be read by collectd.
>>> + * */
>>> +struct dpdk_keepalive_shm {
>>> +/* IPC semaphore. Posted when a core dies */
>>> +sem_t core_died;
>>> +
>>> +/*
>>> + * Relayed status of each core.
>>> + * UNUSED[0], ALIVE[1], DEAD[2], GONE[3], MISSING[4], DOZING[5],
>>SLEEP[6]
>>> + **/
>>> +enum rte_keepalive_state core_state[RTE_KEEPALIVE_MAXCORES];
>>
>>What is 'DOZING'?  What is 'MISSING'?  Where is a definition of these states
>>and what they mean?  What is DEAD?
>
> State 'DOZING' means core going idle,
>'MISSING' indicates first heartbeat miss,
>'DEAD' indicates two heart beat misses,
>'GONE' means it missed three or more heartbeats and is completely 
> 'burried'
>
> Note that these states are defined in DPDK Keepalive library. 
> [rte_keepalive.h]

I don't see that documented in the keepalive header - merely it's
documented by the fact that the code seems to implement a statemachine
that implies that.

Anyway, that's a dpdk issue - but it would be good to describe a little
something here so that the reader can understand why/when you are
transitioning things.

Even still, I think you should provide your own shim layer for enum
values.  With this, you are making DPDK's state values as a part of
the guaranteed ABI that an end application (such as ceilometer) will
rely on.  If DPDK changes the enum, versions won't properly match up.

>>> +/* Last seen timestamp of the core */
>>> +uint64_t core_last_seen_times[RTE_KEEPALIVE_MAXCORES];
>>> +
>>> +/* Store pmd thread tid */
>>> +pid_t thread_id[RTE_KEEPALIVE_MAXCORES];
>>> +};
>>> +
>>> +static struct dpdk_keepalive_shm *ka_shm;
>>>  static int netdev_dpdk_class_init(void);  static int
>>> netdev_dpdk_vhost_class_init(void);
>>>
>>> @@ -586,6 +613,202 @@ netdev_dpdk_mempool_configure(struct
>>netdev_dpdk *dev)
>>>  return 0;
>>>  }
>>>
>>> +void
>>> +dpdk_ka_get_tid(unsigned core_idx)
>>> +{
>>> +uint32_t tid = rte_sys_gettid();
>>> +
>>> +if (dpdk_is_ka_enabled() && ka_shm) {
>>> +ka_shm->thread_id[core_idx] = tid;
>>> +}
>>> +}
>>> +
>>> +/* Callback function invoked on heartbeat miss.  Verify if it is
>>> +genuine
>>> + * heartbeat miss or a false positive and log the message accordingly.
>>> + */
>>> +static void
>>> +dpdk_failcore_cb(void *ptr_data, const int core_id) {
>>> +struct dpdk_keepalive_shm *ka_shm = (struct dpdk_keepalive_shm
>>> +*)ptr_data;
>>> +
>>> +if (ka_shm) {
>>> +int pstate;
>>> +uint32_t tid = ka_shm->thread_id[core_id];
>>> +int err = get_process_status(tid, );
>>> +
>>> +if (!err) {
>>> +switch (pstate) {
>>> +
>>> +case ACTIVE_STATE:
>>> +VLOG_INFO_RL(,"False positive, pmd tid[%"PRIu32"] 
>>> alive\n",
>>> +  tid);
>>
>>Can we get false positives?  Doesn't that diminish the usefulness?
>
> You made a good point! Thanks for asking this. 
> On false positives, this can happen in few scenarios and depends on
> the load and port distribution among PMD threads.  This was also
> observed when the timer intervals aren't appropriately tuned in OvS
> and collectd service.
>
> For example, If single PMD thread is handling multiple PHY ports and
> vhostuser ports it can have a long list of ports to be polled and at
> times can miss successive heart beats. This may happen because PMD
> thread is spending more time processing the packets and haven't marked
> itself alive for a while. If the timer intervals are aggressive the
> chances of it missing the heartbeats are higher. This made me to add
> helper functions to check if the PMD thread is active and subsequently
> log it appropriately.
>
> This doesn't diminish the usefulness of the feature as it all boils
> down to setting sensible timeout values in ovsdb and collectd
> services. I would guess millisecond granularity isn't needed by every
> user and the case I explained above is more for customers looking at
> <=5ms granularity.
>
>>
>>> +break;
>>> +case STOPPED_STATE:
>>> +case TRACED_STATE:
>>> +case DEFUNC_STATE:
>>> +case UNINTERRUPTIBLE_SLEEP_STATE:
>>> +VLOG_WARN_RL(,
>>> +"PMD tid[%"PRIu32"] on core[%d] is unresponsive\n",
>>> +tid, core_id);
>>> +break;
>>> +default:
>>> +VLOG_DBG("%s: The process state: %d\n", __FUNCTION__,
>>pstate);
>>> +OVS_NOT_REACHED();
>>> +}
>>> +}
>>> +}
>>> +}
>>> +
>>> +/* Notify the external monitoring application for change in core state.
>>> + *
>>> + * On a consecutive heartbeat miss the core is considered dead and
>>> +the status
>>> + * is relayed to 

Re: [ovs-dev] OVN meeting report

2017-04-03 Thread Valentine Sinitsyn

Hi Ben,

On 23.03.2017 08:11, Ben Pfaff wrote:

Hello everyone.  I am not sure whether I am going to be able to attend
the OVN meeting tomorrow, because I will be in another possibly
distracting meeting, so I'm going to give my report here.

Toward the end of last week I did a full pass of reviews through
patchwork.  The most notable result, I think, is that I applied patches
that add 802.1ad support.  For OVN, this makes it more reasonable to
consider adding support for tagged logical ports--currently, OVN drops
all tagged logical packets--which I've heard requested once or twice,
because it means that they can now be gatewayed to physical ports within
an outer VLAN.  I don't have any plans to work on that, but I think that
it is worth pointing out.

The OVS "Open Source Day" talks have been scheduled at OpenStack
Boston.  They are all on Wednesday:
https://www.openstack.org/summit/boston-2017/summit-schedule/#track=135

I've been spending what dev time I have on database clustering.  Today,
I managed to get it working, with many caveats.  It will take weeks or
months longer to get it finished, tested, and ready for posting.  (If
you want what I have, check out the raft3 branch in my ovs-reviews repo
at github.)
I've checked out your raft3 branch, and even learned how to create an 
OVSDB cluster. Thanks for the docs!


What I don't get though is how do I instruct IDL to connect to the 
cluster now? Do I just connect to a random server, or there should be 
some dispatcher, or whatever?


Thanks,
Valentine


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 0/7] Add OVS DPDK keep-alive functionality

2017-04-03 Thread Aaron Conole
"Bodireddy, Bhanuprakash"  writes:

> Thanks Aaron for reviewing this patch series.  My comments inline.
>
>>
>>Hi Bhanu,
>>
>>Bhanuprakash Bodireddy  writes:
>>
>>> This patch is aimed at achieving Fastpath Service Assurance in
>>> OVS-DPDK deployments. This commit adds support for monitoring the
>>> packet processing cores(pmd thread cores) by dispatching heartbeats at
>>> regular intervals. Incase of heartbeat miss the failure shall be
>>> detected & reported to higher level fault management
>>systems/frameworks.
>>>
>>> The implementation uses POSIX shared memory object for storing the
>>> events that will be read by monitoring framework. keep-alive feature
>>> can be enabled through below OVSDB settings.
>>>
>>> keepalive=true
>>>- Keepalive feature is disabled by default
>>>
>>> keepalive-interval="50"
>>>- Timer interval in milliseconds for monitoring the packet
>>>  processing cores.
>>>
>>> keepalive-shm-name="/dpdk_keepalive_shm_name"
>>>- Shared memory block name where the events shall be updated.
>>>
>>> When KA is enabled, 'ovs-keepalive' thread shall be spawned that wakes
>>> up at regular intervals to update the timestamp and status of pmd
>>> cores in shared memory region.
>>>
>>> An external monitoring framework like collectd(with dpdk plugin
>>> support) can read the status updates from shared memory. On a missing
>>> heartbeat, the collectd shall relay the status to ceilometer service
>>> running in the controller. Below is the high level overview of deployment
>>model.
>>
>>Given this runs in-line in the fastpath, can you tell me what kind of impact 
>>it is
>>to throughput (and possibly latency) that this series implies?
>
> This feature has very minimal impact on the throughput. 
> As you know, there is a significant drop on the current master due to 
> commit: daf4d3c18da4("odp: Support conntrack orig tuple key.").
> I folded in the RFC patch  by Daniele that fixes the memset and measured the
> throughput with and without KA feature and see no significant difference in 
> throughput.
>
> Agree that Latency is indeed important. I would collect the latency
> stats and will share
> the results in this thread. 

Awesome.  Even better would be to put those informations in the cover
letter of v2.  Just a short summary of the tests and what the results
were for latency / throughput would be good.

Thanks, Bhanu.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 3/3] netdev-dpdk: Use uint8_t for port_id.

2017-04-03 Thread Ilya Maximets
Currently, signed integer is used for 'port_id' variable and
'-1' as identifier of bad or uninitialized 'port_id'.

This inconsistent with dpdk library and, also, in few cases,
leads to passing '-1' to dpdk functions where uint8_t expected.

Such behaviour doesn't produce any issues, but it's better to
use same type as in dpdk library for consistency.

Also, magic number '-1' replaced with DPDK_ETH_PORT_ID_INVALID
macro.

Signed-off-by: Ilya Maximets 
---
 lib/netdev-dpdk.c | 61 +++
 1 file changed, 35 insertions(+), 26 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 658a454..216ced8 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -140,6 +140,8 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / 
ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
 #define OVS_VHOST_QUEUE_DISABLED(-2) /* Queue was disabled by guest and not
   * yet mapped to another queue. */
 
+#define DPDK_ETH_PORT_ID_INVALIDRTE_MAX_ETHPORTS
+
 #define VHOST_ENQ_RETRY_NUM 8
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
 
@@ -309,7 +311,7 @@ struct dpdk_ring {
 struct rte_ring *cring_tx;
 struct rte_ring *cring_rx;
 unsigned int user_port_id; /* User given port no, parsed from port name */
-int eth_port_id; /* ethernet device port id */
+uint8_t eth_port_id; /* ethernet device port id */
 struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
 };
 
@@ -325,7 +327,7 @@ enum dpdk_hw_ol_features {
 
 struct netdev_dpdk {
 struct netdev up;
-int port_id;
+uint8_t port_id;
 int max_packet_len;
 enum dpdk_dev_type type;
 
@@ -402,7 +404,7 @@ struct netdev_dpdk {
 
 struct netdev_rxq_dpdk {
 struct netdev_rxq up;
-int port_id;
+uint8_t port_id;
 };
 
 static int netdev_dpdk_class_init(void);
@@ -602,12 +604,12 @@ check_link_status(struct netdev_dpdk *dev)
 dev->link_reset_cnt++;
 dev->link = link;
 if (dev->link.link_status) {
-VLOG_DBG_RL(, "Port %d Link Up - speed %u Mbps - %s",
+VLOG_DBG_RL(, "Port %"PRIu8" Link Up - speed %u Mbps - %s",
 dev->port_id, (unsigned) dev->link.link_speed,
 (dev->link.link_duplex == ETH_LINK_FULL_DUPLEX) ?
  ("full-duplex") : ("half-duplex"));
 } else {
-VLOG_DBG_RL(, "Port %d Link Down", dev->port_id);
+VLOG_DBG_RL(, "Port %"PRIu8" Link Down", dev->port_id);
 }
 }
 }
@@ -725,8 +727,8 @@ dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev)
 if (rx_csum_ol_flag &&
 (info.rx_offload_capa & rx_chksm_offload_capa) !=
  rx_chksm_offload_capa) {
-VLOG_WARN_ONCE("Rx checksum offload is not supported on device %d",
-   dev->port_id);
+VLOG_WARN_ONCE("Rx checksum offload is not supported on device %"PRIu8,
+   dev->port_id);
 dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD;
 return;
 }
@@ -737,7 +739,8 @@ static void
 dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex)
 {
 if (rte_eth_dev_flow_ctrl_set(dev->port_id, >fc_conf)) {
-VLOG_WARN("Failed to enable flow control on device %d", dev->port_id);
+VLOG_WARN("Failed to enable flow control on device %"PRIu8,
+  dev->port_id);
 }
 }
 
@@ -775,7 +778,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
 
 memset(_addr, 0x0, sizeof(eth_addr));
 rte_eth_macaddr_get(dev->port_id, _addr);
-VLOG_INFO_RL(, "Port %d: "ETH_ADDR_FMT,
+VLOG_INFO_RL(, "Port %"PRIu8": "ETH_ADDR_FMT,
 dev->port_id, ETH_ADDR_BYTES_ARGS(eth_addr.addr_bytes));
 
 memcpy(dev->hwaddr.ea, eth_addr.addr_bytes, ETH_ADDR_LEN);
@@ -787,7 +790,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
 /* Get the Flow control configuration for DPDK-ETH */
 diag = rte_eth_dev_flow_ctrl_get(dev->port_id, >fc_conf);
 if (diag) {
-VLOG_DBG("cannot get flow control parameters on port=%d, err=%d",
+VLOG_DBG("cannot get flow control parameters on port=%"PRIu8", err=%d",
  dev->port_id, diag);
 }
 
@@ -832,7 +835,7 @@ netdev_dpdk_alloc_txq(unsigned int n_txqs)
 }
 
 static int
-common_construct(struct netdev *netdev, unsigned int port_no,
+common_construct(struct netdev *netdev, uint8_t port_no,
  enum dpdk_dev_type type, int socket_id)
 OVS_REQUIRES(dpdk_mutex)
 {
@@ -917,7 +920,8 @@ vhost_common_construct(struct netdev *netdev)
 return ENOMEM;
 }
 
-return common_construct(netdev, -1, DPDK_DEV_VHOST, socket_id);
+return common_construct(netdev, DPDK_ETH_PORT_ID_INVALID,
+DPDK_DEV_VHOST, socket_id);
 }
 
 static int
@@ -977,7 +981,8 @@ netdev_dpdk_construct(struct netdev *netdev)
 int err;
 
 ovs_mutex_lock(_mutex);
-err = common_construct(netdev, 

[ovs-dev] [PATCH 2/3] netdev-dpdk: Fix device leak on port deletion.

2017-04-03 Thread Ilya Maximets
Currently, once created device in dpdk will exist forever
even after del-port operation untill we manually call
'ovs-appctl netdev-dpdk/detach ', where  is not
the port's name but the name of dpdk eth device or pci address.

Few issues with current implementation:

1. Different API for usual (system) and DPDK devices.
   (We have to call 'ovs-appctl netdev-dpdk/detach' each
time after 'del-port' to actually free the device)
   This is a big issue mostly for virtual DPDK devices.

2. Follows from 1:
   For DPDK devices 'del-port' leads just to
   'rte_eth_dev_stop' and subsequent 'add-port' will
   just start the already existing device. Such behaviour
   will not reset the device to initial state as it could
   be expected. For example: virtual pcap pmd will continue
   reading input file instead of reading it from the beginning.

3. Follows from 2:
   After execution of the following commands 'port1' will be
   configured with the 'old-options' while 'ovs-vsctl show'
   will show us 'new-options' in dpdk-devargs field:

 ovs-vsctl add-port port1 -- set interface port1 type=dpdk \
   options:dpdk-devargs=,
 ovs-vsctl del-port port1
 ovs-vsctl add-port port1 -- set interface port1 type=dpdk \
   options:dpdk-devargs=,

4. Follows from 1:
   Not detached device consumes 'port_id'. Since we have very
   limited number of 'port_id's (32 in common case) this may
   lead to quick exhausting of id pool and inability to add any
   other port.

To avoid above issues we need to detach all the attached devices on
port destruction.
appctl 'netdev-dpdk/detach' removed because not needed anymore.

CC: Ciara Loftus 
Fixes: 55e075e65ef9 ("netdev-dpdk: Arbitrary 'dpdk' port naming")
Fixes: 69876ed78611 ("netdev-dpdk: Add support for virtual DPDK PMDs (vdevs)")
Signed-off-by: Ilya Maximets 
---
 Documentation/howto/dpdk.rst |  5 ++--
 lib/netdev-dpdk.c| 66 +++-
 2 files changed, 18 insertions(+), 53 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index dc63f7d..20d8975 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -342,10 +342,9 @@ Then it can be attached to OVS::
 $ ovs-vsctl add-port br0 dpdkx -- set Interface dpdkx type=dpdk \
 options:dpdk-devargs=:01:00.0
 
-It is also possible to detach a port from ovs, the user has to remove the
-port using the del-port command, then it can be detached using::
+Detaching will be performed while processing del-port command::
 
-$ ovs-appctl netdev-dpdk/detach :01:00.0
+$ ovs-vsctl del-port dpdkx
 
 This feature is not supported with VFIO and does not work with some NICs.
 For more information please refer to the `DPDK Port Hotplug Framework
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index c8d7108..658a454 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -359,6 +359,9 @@ struct netdev_dpdk {
 /* Device arguments for dpdk ports */
 char *devargs;
 
+/* If true, device was attached by rte_eth_dev_attach(). */
+bool attached;
+
 /* In dpdk_list. */
 struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
 
@@ -851,6 +854,7 @@ common_construct(struct netdev *netdev, unsigned int 
port_no,
 dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
 ovsrcu_index_init(>vid, -1);
 dev->vhost_reconfigured = false;
+dev->attached = false;
 
 ovsrcu_init(>qos_conf, NULL);
 
@@ -996,10 +1000,21 @@ static void
 netdev_dpdk_destruct(struct netdev *netdev)
 {
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+char devname[RTE_ETH_NAME_MAX_LEN];
 
 ovs_mutex_lock(_mutex);
 
 rte_eth_dev_stop(dev->port_id);
+
+if (dev->attached) {
+rte_eth_dev_close(dev->port_id);
+if (rte_eth_dev_detach(dev->port_id, devname) < 0) {
+VLOG_ERR("Device '%s' can not be detached", dev->devargs);
+} else {
+VLOG_INFO("Device '%s' detached", devname);
+}
+}
+
 free(dev->devargs);
 common_destruct(dev);
 
@@ -1128,6 +1143,7 @@ netdev_dpdk_process_devargs(const char *devargs, char 
**errp)
 /* Device not found in DPDK, attempt to attach it */
 if (!rte_eth_dev_attach(devargs, _port_id)) {
 /* Attach successful */
+dev->attached = true;
 VLOG_INFO("Device '%s' attached to DPDK", devargs);
 } else {
 /* Attach unsuccessful */
@@ -2436,53 +2452,6 @@ netdev_dpdk_set_admin_state(struct unixctl_conn *conn, 
int argc,
 unixctl_command_reply(conn, "OK");
 }
 
-static void
-netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
-   const char *argv[], void 

[ovs-dev] [PATCH 1/3] netdev-dpdk: Fix double attaching of virtual devices.

2017-04-03 Thread Ilya Maximets
'devargs' for virtual devices contains not only name but
also a list of arguments like this:

'net_pcap0,rx_pcap=file_rx.pcap,tx_pcap=file_tx.pcap'
or
'eth_af_packet0,iface=eth0'

We must cut off the arguments from this string before calling
'rte_eth_dev_get_port_by_name()' to avoid double attaching of
the same device.

CC: Ciara Loftus 
Fixes: 69876ed78611 ("netdev-dpdk: Add support for virtual DPDK PMDs (vdevs)")
Signed-off-by: Ilya Maximets 
---
 lib/netdev-dpdk.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ddc651b..c8d7108 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1114,9 +1114,16 @@ static int
 netdev_dpdk_process_devargs(const char *devargs, char **errp)
 {
 uint8_t new_port_id = UINT8_MAX;
+char *ind, *name = xstrdup(devargs);
+
+/* Get the name from the comma separated list of arguments. */
+ind = index(name, ',');
+if (ind != NULL) {
+*ind = '\0';
+}
 
 if (!rte_eth_dev_count()
-|| rte_eth_dev_get_port_by_name(devargs, _port_id)
+|| rte_eth_dev_get_port_by_name(name, _port_id)
 || !rte_eth_dev_is_valid_port(new_port_id)) {
 /* Device not found in DPDK, attempt to attach it */
 if (!rte_eth_dev_attach(devargs, _port_id)) {
@@ -1129,6 +1136,7 @@ netdev_dpdk_process_devargs(const char *devargs, char 
**errp)
 }
 }
 
+free(name);
 return new_port_id;
 }
 
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 0/3] Hotplug fixes & port_id refactoring

2017-04-03 Thread Ilya Maximets

Ilya Maximets (3):
  netdev-dpdk: Fix double attaching of virtual devices.
  netdev-dpdk: Fix device leak on port deletion.
  netdev-dpdk: Use uint8_t for port_id.

 Documentation/howto/dpdk.rst |   5 +-
 lib/netdev-dpdk.c| 137 +++
 2 files changed, 62 insertions(+), 80 deletions(-)

-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 5/7] utils: Introduce xusleep for subsecond granularity.

2017-04-03 Thread Aaron Conole
"Bodireddy, Bhanuprakash"  writes:

>>Bhanuprakash Bodireddy  writes:
>>
>>> This will be used by KA framework that needs millisecond granularity.
>>>
>>> Signed-off-by: Bhanuprakash Bodireddy
>>> 
>>> ---
>>
>>Without this patch, builds starting at 3/7 will fail.  That's pretty
>> bad.  Please see
>>my earlier comment about including helpers when they are used.
>
> As mentioned earlier, it was my mistake and will fix this in v2.
>
>>
>>>  lib/util.c | 12 
>>>  lib/util.h |  1 +
>>>  2 files changed, 13 insertions(+)
>>>
>>> diff --git a/lib/util.c b/lib/util.c
>>> index 1c06ce0..889ebd8 100644
>>> --- a/lib/util.c
>>> +++ b/lib/util.c
>>> @@ -2125,6 +2125,18 @@ xsleep(unsigned int seconds)
>>>  ovsrcu_quiesce_end();
>>>  }
>>>
>>> +void
>>> +xusleep(unsigned int microseconds)
>>> +{
>>> +ovsrcu_quiesce_start();
>>> +#ifdef _WIN32
>>> +Sleep(microseconds/1000);
>>> +#else
>>> +usleep(microseconds);
>>> +#endif
>>> +ovsrcu_quiesce_end();
>>> +}
>>> +
>>
>>Wow!  This is deceptive.  If I call this with microseconds argument
>> as, say, 999
>>there's a *strong* chance this will NOT sleep for at least that amount of 
>>time.
>>This function needs a different implementation or just keep it non-windows.
>
> This is indeed deceptive :).  
> I did spend some time to understand if there is an equivalent usleep()
> implementation in Windows and came across Queryperformancecounter and
> usermode scheduling in windows.  I would better stick to non-windows
> implementation and allow the WINDOWS experts add their equivalent
> implantation above.

Since DPDK is only for linux/BSD right now, better to not expose this.
Let it be implemented when it's needed.  I'd make it a static inside
the dpdk.c file.  If the functionality is required from the windows
side, it can be re-abstracted.

> - Bhanuprakash.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 7/7] Documentation: Update DPDK doc with Keepalive feature.

2017-04-03 Thread Bodireddy, Bhanuprakash
>-Original Message-
>From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
>boun...@openvswitch.org] On Behalf Of Stephen Finucane
>Sent: Monday, April 3, 2017 11:15 AM
>To: ovs-dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH 7/7] Documentation: Update DPDK doc with
>Keepalive feature.
>
>On Sat, 2017-04-01 at 20:02 +0100, Bhanuprakash Bodireddy wrote:
>> Signed-off-by: Bhanuprakash Bodireddy
>> 
>
>Couple of changes suggested below if there's a future v2.
>
Thanks for reviewing the documentation. I would make the edits as suggested 
here when I post v2.

- Bhanuprakash.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 5/7] utils: Introduce xusleep for subsecond granularity.

2017-04-03 Thread Bodireddy, Bhanuprakash
>Bhanuprakash Bodireddy  writes:
>
>> This will be used by KA framework that needs millisecond granularity.
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> 
>> ---
>
>Without this patch, builds starting at 3/7 will fail.  That's pretty bad.  
>Please see
>my earlier comment about including helpers when they are used.

As mentioned earlier, it was my mistake and will fix this in v2.

>
>>  lib/util.c | 12 
>>  lib/util.h |  1 +
>>  2 files changed, 13 insertions(+)
>>
>> diff --git a/lib/util.c b/lib/util.c
>> index 1c06ce0..889ebd8 100644
>> --- a/lib/util.c
>> +++ b/lib/util.c
>> @@ -2125,6 +2125,18 @@ xsleep(unsigned int seconds)
>>  ovsrcu_quiesce_end();
>>  }
>>
>> +void
>> +xusleep(unsigned int microseconds)
>> +{
>> +ovsrcu_quiesce_start();
>> +#ifdef _WIN32
>> +Sleep(microseconds/1000);
>> +#else
>> +usleep(microseconds);
>> +#endif
>> +ovsrcu_quiesce_end();
>> +}
>> +
>
>Wow!  This is deceptive.  If I call this with microseconds argument as, say, 
>999
>there's a *strong* chance this will NOT sleep for at least that amount of time.
>This function needs a different implementation or just keep it non-windows.

This is indeed deceptive :).  
I did spend some time to understand if there is  an equivalent usleep() 
implementation in Windows and came across Queryperformancecounter and usermode 
scheduling in windows.  I would better stick to non-windows implementation and 
allow the WINDOWS experts add their equivalent implantation above. 

- Bhanuprakash.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 4/7] process: Retrieve process status.

2017-04-03 Thread Bodireddy, Bhanuprakash
>-Original Message-
>From: Aaron Conole [mailto:acon...@redhat.com]
>Sent: Monday, April 3, 2017 2:00 AM
>To: Bodireddy, Bhanuprakash 
>Cc: d...@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH 4/7] process: Retrieve process status.
>
>Bhanuprakash Bodireddy  writes:
>
>> Implement function to retrieve the process status. This will be used
>> by Keepalive monitoring thread for detecting false alarms.
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> 
>> ---
>>  lib/process.c | 60
>>
>+++
>>  lib/process.h | 10 ++
>>  2 files changed, 70 insertions(+)
>>
>> diff --git a/lib/process.c b/lib/process.c index e9d0ba9..e0601dd
>> 100644
>> --- a/lib/process.c
>> +++ b/lib/process.c
>> @@ -50,6 +50,20 @@ struct process {
>>  int status;
>>  };
>>
>> +struct pstate2Num {
>> +char *tidState;
>> +int num;
>> +};
>> +
>> +const struct pstate2Num pstate_map[] = {
>> +{ "S", STOPPED_STATE },
>> +{ "R", ACTIVE_STATE },
>> +{ "t", TRACED_STATE },
>> +{ "Z", DEFUNC_STATE },
>> +{ "D", UNINTERRUPTIBLE_SLEEP_STATE },
>> +{ "NULL", UNUSED_STATE },
>> +};
>> +
>>  /* Pipe used to signal child termination. */  static int fds[2];
>>
>> @@ -390,6 +404,52 @@ process_run(void)  #endif  }
>>
>> +int
>> +get_process_status(int tid, int *pstate) { #ifndef _WIN32
>
>The following is Linux specific.  Please use an '#if LINUX' - there are 
>examples
>in the code for this.

That's right. I would do this in v2.

- Bhanuprakash
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/7] netdev-dpdk: Add support for keepalive functionality.

2017-04-03 Thread Bodireddy, Bhanuprakash
>>
>> +/*
>> + * OVS Shared Memory structure
>> + *
>> + * The information in the shared memory block will be read by collectd.
>> + * */
>> +struct dpdk_keepalive_shm {
>> +/* IPC semaphore. Posted when a core dies */
>> +sem_t core_died;
>> +
>> +/*
>> + * Relayed status of each core.
>> + * UNUSED[0], ALIVE[1], DEAD[2], GONE[3], MISSING[4], DOZING[5],
>SLEEP[6]
>> + **/
>> +enum rte_keepalive_state core_state[RTE_KEEPALIVE_MAXCORES];
>
>What is 'DOZING'?  What is 'MISSING'?  Where is a definition of these states
>and what they mean?  What is DEAD?

State 'DOZING' means core going idle,
   'MISSING' indicates first heartbeat miss,
   'DEAD' indicates two heart beat misses,
   'GONE' means it missed three or more heartbeats and is completely 
'burried'

Note that these states are defined in DPDK Keepalive library. [rte_keepalive.h]

>
>> +/* Last seen timestamp of the core */
>> +uint64_t core_last_seen_times[RTE_KEEPALIVE_MAXCORES];
>> +
>> +/* Store pmd thread tid */
>> +pid_t thread_id[RTE_KEEPALIVE_MAXCORES];
>> +};
>> +
>> +static struct dpdk_keepalive_shm *ka_shm;
>>  static int netdev_dpdk_class_init(void);  static int
>> netdev_dpdk_vhost_class_init(void);
>>
>> @@ -586,6 +613,202 @@ netdev_dpdk_mempool_configure(struct
>netdev_dpdk *dev)
>>  return 0;
>>  }
>>
>> +void
>> +dpdk_ka_get_tid(unsigned core_idx)
>> +{
>> +uint32_t tid = rte_sys_gettid();
>> +
>> +if (dpdk_is_ka_enabled() && ka_shm) {
>> +ka_shm->thread_id[core_idx] = tid;
>> +}
>> +}
>> +
>> +/* Callback function invoked on heartbeat miss.  Verify if it is
>> +genuine
>> + * heartbeat miss or a false positive and log the message accordingly.
>> + */
>> +static void
>> +dpdk_failcore_cb(void *ptr_data, const int core_id) {
>> +struct dpdk_keepalive_shm *ka_shm = (struct dpdk_keepalive_shm
>> +*)ptr_data;
>> +
>> +if (ka_shm) {
>> +int pstate;
>> +uint32_t tid = ka_shm->thread_id[core_id];
>> +int err = get_process_status(tid, );
>> +
>> +if (!err) {
>> +switch (pstate) {
>> +
>> +case ACTIVE_STATE:
>> +VLOG_INFO_RL(,"False positive, pmd tid[%"PRIu32"] 
>> alive\n",
>> +  tid);
>
>Can we get false positives?  Doesn't that diminish the usefulness?

You made a good point! Thanks for asking this. 
On false positives, this can happen in few scenarios and depends on the load 
and port distribution among PMD threads.  This was also observed when the timer 
intervals aren't appropriately tuned in OvS and collectd service. 

For example, If single PMD thread is handling multiple PHY ports and  vhostuser 
ports it can have a long list  of ports to be polled  and at times can miss 
successive heart beats. This may happen because PMD thread is spending  more 
time processing the packets and haven't marked itself alive for a while. If the 
timer intervals are aggressive the chances of it missing the heartbeats are 
higher. This made me to add helper functions to check if the PMD thread is 
active and subsequently log it appropriately.

This doesn't diminish the usefulness of the feature as it all boils down to 
setting sensible timeout values in ovsdb and collectd services. I would guess 
millisecond granularity isn't needed by  every user and the case I explained 
above is  more for customers looking at <=5ms granularity.

>
>> +break;
>> +case STOPPED_STATE:
>> +case TRACED_STATE:
>> +case DEFUNC_STATE:
>> +case UNINTERRUPTIBLE_SLEEP_STATE:
>> +VLOG_WARN_RL(,
>> +"PMD tid[%"PRIu32"] on core[%d] is unresponsive\n",
>> +tid, core_id);
>> +break;
>> +default:
>> +VLOG_DBG("%s: The process state: %d\n", __FUNCTION__,
>pstate);
>> +OVS_NOT_REACHED();
>> +}
>> +}
>> +}
>> +}
>> +
>> +/* Notify the external monitoring application for change in core state.
>> + *
>> + * On a consecutive heartbeat miss the core is considered dead and
>> +the status
>> + * is relayed to monitoring framework by unlocking the semaphore.
>> + */
>> +static void
>> +dpdk_ka_relay_core_state(void *ptr_data, const int core_id,
>> +   const enum rte_keepalive_state core_state, uint64_t
>> +last_alive) {
>> +struct dpdk_keepalive_shm *ka_shm = (struct dpdk_keepalive_shm
>*)ptr_data;
>> +int count;
>> +
>> +if (!ka_shm) {
>> +VLOG_ERR("KeepAlive: Invalid shared memory block\n");
>> +return;
>> +}
>> +
>> +VLOG_DBG_RL(,
>> +   "TS(%lu):CORE%d, old state:%d, current_state:%d\n",
>> +   (unsigned 
>> long)time(NULL),core_id,ka_shm->core_state[core_id],
>> +   core_state);
>> +
>> +switch (core_state) {
>> +case RTE_KA_STATE_ALIVE:
>> +case RTE_KA_STATE_MISSING:
>> +

Re: [ovs-dev] [PATCH 1/7] dpdk: Add helper functions for DPDK keepalive.

2017-04-03 Thread Bodireddy, Bhanuprakash
>-Original Message-
>From: Aaron Conole [mailto:acon...@redhat.com]
>Sent: Monday, April 3, 2017 1:58 AM
>To: Bodireddy, Bhanuprakash 
>Cc: d...@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH 1/7] dpdk: Add helper functions for DPDK
>keepalive.
>
>Bhanuprakash Bodireddy  writes:
>
>> Introduce helper functions in 'dpdk' module that are needed for
>> keepalive functionality. Also add dummy functions in 'dpdk-stub'
>> module that are needed when DPDK is not available.
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> 
>> ---
>
>I think it's better to add helpers at the time they are first called.
>That means that there's no dead code at any point in the build, and it
>becomes obvious why the function is added.

Completely agree. I split  my earlier RFC patch in to smaller patches
and rebased them with master and later realized they were out of order. 
I will  handle this appropriately in next version. 

-Bhanuprakash.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 0/7] Add OVS DPDK keep-alive functionality

2017-04-03 Thread Bodireddy, Bhanuprakash
Thanks Aaron for reviewing this patch series.  My comments inline.

>
>Hi Bhanu,
>
>Bhanuprakash Bodireddy  writes:
>
>> This patch is aimed at achieving Fastpath Service Assurance in
>> OVS-DPDK deployments. This commit adds support for monitoring the
>> packet processing cores(pmd thread cores) by dispatching heartbeats at
>> regular intervals. Incase of heartbeat miss the failure shall be
>> detected & reported to higher level fault management
>systems/frameworks.
>>
>> The implementation uses POSIX shared memory object for storing the
>> events that will be read by monitoring framework. keep-alive feature
>> can be enabled through below OVSDB settings.
>>
>> keepalive=true
>>- Keepalive feature is disabled by default
>>
>> keepalive-interval="50"
>>- Timer interval in milliseconds for monitoring the packet
>>  processing cores.
>>
>> keepalive-shm-name="/dpdk_keepalive_shm_name"
>>- Shared memory block name where the events shall be updated.
>>
>> When KA is enabled, 'ovs-keepalive' thread shall be spawned that wakes
>> up at regular intervals to update the timestamp and status of pmd
>> cores in shared memory region.
>>
>> An external monitoring framework like collectd(with dpdk plugin
>> support) can read the status updates from shared memory. On a missing
>> heartbeat, the collectd shall relay the status to ceilometer service
>> running in the controller. Below is the high level overview of deployment
>model.
>
>Given this runs in-line in the fastpath, can you tell me what kind of impact 
>it is
>to throughput (and possibly latency) that this series implies?

This feature has very minimal impact on the throughput. 
As you know, there is a significant drop on the current master due to 
commit: daf4d3c18da4("odp: Support conntrack orig tuple key.").
I folded in the RFC patch  by Daniele that fixes the memset and measured the
throughput with and without KA feature and see no significant difference in 
throughput.

Agree that Latency is indeed important. I would collect the latency stats  and 
will share 
the results in this thread. 

- Bhanuprakash.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v5] netdev-dpdk: fix ifindex assignment for DPDK ports

2017-04-03 Thread Przemyslaw Lal
In current implementation port_id is used as an ifindex for all netdev-dpdk
interfaces.

For physical DPDK interfaces using port_id as ifindex causes that '0' is set as
ifindex for 'dpdk0' interface, '1' for 'dpdk1' and so on. For the DPDK vHost
interfaces ifindexes are not even assigned (0 is used by default) due to the
fact that vHost ports don't use port_id field from the DPDK library.

This causes multiple negative side-effects. First of all 0 is an invalid
ifindex value. The other issue is possible overlapping of 'dpdkX' interfaces
ifindex values with the ifindexes of kernel space interfaces which may cause
problems in any external tools that use those values. Neither 'dpdk0', nor any
DPDK vHost interfaces are visible in sFlow collector tools, as all interfaces
with ifindexes smaller than 1 are ignored.

Proposed solution to these issues is to calculate a hash of interface's name
and use calculated value as an ifindex. This way interfaces keep their
ifindexes during OVS-DPDK restarts, ports re-initialization events, etc., show
up in sFlow collectors and meet RFC 2863 specification regarding re-using
ifindex values by the same virtual interfaces and maximum ifindex value.

Signed-off-by: Przemyslaw Lal 
---
 lib/netdev-dpdk.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ddc651b..687b0a5 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2215,7 +2215,11 @@ netdev_dpdk_get_ifindex(const struct netdev *netdev)
 int ifindex;
 
 ovs_mutex_lock(>mutex);
-ifindex = dev->port_id;
+/* Calculate hash from the netdev name. Ensure that ifindex is a 24-bit
+ * postive integer to meet RFC 2863 recommendations.
+ */
+uint32_t h = hash_string(netdev->name, 0);
+ifindex = (int)(h % 0xfe + 1);
 ovs_mutex_unlock(>mutex);
 
 return ifindex;
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4 3/3] ovn-northd: Add logical flows to support native DNS

2017-04-03 Thread nusiddiq
From: Numan Siddique 

OVN implements native DNS resolution which can be used to resolve the
internal DNS names belonging to a logical datapath.

To support this, a new table 'DNS' is added in the NB DB. A new column
'dns_records' is added in 'Logical_Switch' table which references to the
'DNS' table.

Following flows are added for each logical switch if configured with
DNS records in the 'dns_records' column
 - A logical flow in DNS_LOOKUP stage which uses the action 'dns_lookup'
   to transform the DNS query to DNS reply packet and advances
   to the next stage - DNS_RESPONSE.

 - A logical flow in DNS_RESPONSE stage which implements the DNS responder
   by sending the DNS reply from previous stage back to the inport.

Signed-off-by: Numan Siddique 
---
 ovn/northd/ovn-northd.8.xml |  85 +-
 ovn/northd/ovn-northd.c | 180 -
 ovn/ovn-nb.ovsschema|  20 ++-
 ovn/ovn-nb.xml  |  27 +++-
 ovn/utilities/ovn-nbctl.c   |   3 +
 tests/ovn.at| 377 
 6 files changed, 679 insertions(+), 13 deletions(-)

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index ab8fd88..cb7473c 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -724,7 +724,71 @@ output;
   
 
 
-Ingress Table 13 Destination Lookup
+Ingress Table 13 DNS Lookup
+
+
+  This table looks up and resolves the DNS names of the logical ports
+  if configured with the host names.
+
+
+
+  
+
+  A priority-100 logical flow for each logical switch datapath
+  if it is configured with DNS records, which matches the IPv4 and IPv6
+  packets with udp.dst = 53 and applies the action
+  dns_lookup and advances the packet to the next table.
+
+
+
+reg0[4] = dns_lookup(); next;
+
+
+
+  For valid DNS packets, this transforms the packet into a DNS
+  reply if the DNS name can be resolved, and stores 1 into reg0[4].
+  For failed DNS resolution or other kinds of packets, it just stores
+  0 into reg0[4]. Either way, it continues to the next table.
+
+  
+
+
+Ingress Table 14 DNS Responses
+
+
+  This table implements DNS responder for the DNS replies generated by
+  the previous table.
+
+
+
+  
+
+  A priority-100 logical flow for each logical switch datapath
+  if it is configured with DNS records, which matches the IPv4 and IPv6
+  packets with udp.dst = 53  reg0[4] == 1
+  and responds back to the inport after applying these
+  actions.  If reg0[4] is set to 1, it means that the
+  action dns_lookup was successful.
+
+
+
+eth.dst  eth.src;
+ip4.src  ip4.dst;
+udp.dst = udp.src;
+udp.src = 53;
+outport = P;
+flags.loopback = 1;
+output;
+
+
+
+  (This terminates ingress packet processing; the packet does not go
+   to the next ingress table.)
+
+  
+
+
+Ingress Table 15 Destination Lookup
 
 
   This table implements switching behavior.  It contains these logical
@@ -834,11 +898,22 @@ output;
 
 
 
-  Also a priority 34000 logical flow is added for each logical port which
-  has DHCPv4 options defined to allow the DHCPv4 reply packet and which has
-  DHCPv6 options defined to allow the DHCPv6 reply packet from the
-  Ingress Table 12: DHCP responses.
+  Also the following flows are added.
 
+
+  
+A priority 34000 logical flow is added for each logical port which
+has DHCPv4 options defined to allow the DHCPv4 reply packet and which 
has
+DHCPv6 options defined to allow the DHCPv6 reply packet from the
+Ingress Table 12: DHCP responses.
+  
+
+  
+A priority 34000 logical flow is added for each logical switch datapath
+if it is configured with DNS records, which allows the DNS reply packet
+from the Ingress Table 14:DNS responses.
+  
+
 
 Egress Table 7: Egress Port Security - IP
 
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 5a2e5ab..5753777 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -112,7 +112,9 @@ enum ovn_stage {
 PIPELINE_STAGE(SWITCH, IN,  ARP_ND_RSP,10, "ls_in_arp_rsp")   \
 PIPELINE_STAGE(SWITCH, IN,  DHCP_OPTIONS,  11, "ls_in_dhcp_options")  \
 PIPELINE_STAGE(SWITCH, IN,  DHCP_RESPONSE, 12, "ls_in_dhcp_response") \
-PIPELINE_STAGE(SWITCH, IN,  L2_LKUP,   13, "ls_in_l2_lkup")   \
+PIPELINE_STAGE(SWITCH, IN,  DNS_LOOKUP,  13, "ls_in_dns_lookup") \
+PIPELINE_STAGE(SWITCH, IN,  DNS_RESPONSE,  14, "ls_in_dns_response") \
+PIPELINE_STAGE(SWITCH, IN,  L2_LKUP,   15, "ls_in_l2_lkup")   \
   

[ovs-dev] [PATCH v4 2/3] ovn-controller: Add 'dns_lookup' action

2017-04-03 Thread nusiddiq
From: Numan Siddique 

This patch adds a new OVN action 'dns_lookup' to support native DNS.
ovn-controller parses this action and adds a NXT_PACKET_IN2
OF flow with 'pause' flag set.

A new table 'DNS' is added in the SB DB to look up and resolve
the DNS queries. When a valid DNS packet is received by
ovn-controller, it looks up the DNS name in the 'DNS' table
and if successful, it frames a DNS reply, resumes the packet
and stores 1 in the 1-bit subfield. If the packet is invalid
or cannot be resolved, it resumes the packet without any
modifications and stores 0 in the 1-bit subfield.

reg0[4] = dns_lookup(); next;

An upcoming patch will use this action and adds logical flows.

Signed-off-by: Numan Siddique 
---
 include/ovn/actions.h |  17 ++-
 ovn/controller/pinctrl.c  | 262 +-
 ovn/lib/actions.c |  52 +
 ovn/lib/ovn-util.h|  19 
 ovn/ovn-sb.ovsschema  |  21 +++-
 ovn/ovn-sb.xml|  75 -
 ovn/utilities/ovn-sbctl.c |   3 +
 ovn/utilities/ovn-trace.c |   5 +
 tests/ovn.at  |   7 ++
 9 files changed, 452 insertions(+), 9 deletions(-)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index d2510fd..0e66ee8 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -70,7 +70,8 @@ struct simap;
 OVNACT(PUT_ND,ovnact_put_mac_bind)  \
 OVNACT(PUT_DHCPV4_OPTS, ovnact_put_dhcp_opts)   \
 OVNACT(PUT_DHCPV6_OPTS, ovnact_put_dhcp_opts)   \
-OVNACT(SET_QUEUE,   ovnact_set_queue)
+OVNACT(SET_QUEUE,   ovnact_set_queue)   \
+OVNACT(DNS_LOOKUP,  ovnact_dns_lookup)
 
 /* enum ovnact_type, with a member OVNACT_ for each action. */
 enum OVS_PACKED_ENUM ovnact_type {
@@ -258,6 +259,12 @@ struct ovnact_set_queue {
 uint16_t queue_id;
 };
 
+/* OVNACT_DNS_LOOKUP. */
+struct ovnact_dns_lookup {
+struct ovnact ovnact;
+struct expr_field dst;  /* 1-bit destination field. */
+};
+
 /* Internal use by the helpers below. */
 void ovnact_init(struct ovnact *, enum ovnact_type, size_t len);
 void *ovnact_put(struct ofpbuf *, enum ovnact_type, size_t len);
@@ -385,6 +392,14 @@ enum action_opcode {
  *   - Any number of DHCPv6 options.
  */
 ACTION_OPCODE_PUT_DHCPV6_OPTS,
+
+/* "result = dns_lookup()".
+ * Arguments follow the action_header, in this format:
+ *   - A 32-bit or 64-bit OXM header designating the result field.
+ *   - A 32-bit integer specifying a bit offset within the result field.
+ *
+ */
+ACTION_OPCODE_DNS_LOOKUP,
 };
 
 /* Header. */
diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index 01259e2..812e731 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -662,7 +662,256 @@ exit:
 }
 
 static void
-process_packet_in(const struct ofp_header *msg)
+pinctrl_handle_dns_lookup(
+struct dp_packet *pkt_in, struct ofputil_packet_in *pin,
+struct ofpbuf *userdata, struct ofpbuf *continuation,
+struct controller_ctx *ctx)
+{
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+enum ofp_version version = rconn_get_version(swconn);
+enum ofputil_protocol proto = ofputil_protocol_from_ofp_version(version);
+struct dp_packet *pkt_out_ptr = NULL;
+uint32_t success = 0;
+
+/* Parse result field. */
+const struct mf_field *f;
+enum ofperr ofperr = nx_pull_header(userdata, NULL, , NULL);
+if (ofperr) {
+   VLOG_WARN_RL(, "bad result OXM (%s)", ofperr_to_string(ofperr));
+   goto exit;
+}
+
+/* Parse result offset. */
+ovs_be32 *ofsp = ofpbuf_try_pull(userdata, sizeof *ofsp);
+if (!ofsp) {
+VLOG_WARN_RL(, "offset not present in the userdata");
+goto exit;
+}
+
+/* Check that the result is valid and writable. */
+struct mf_subfield dst = { .field = f, .ofs = ntohl(*ofsp), .n_bits = 1 };
+ofperr = mf_check_dst(, NULL);
+if (ofperr) {
+VLOG_WARN_RL(, "bad result bit (%s)", ofperr_to_string(ofperr));
+goto exit;
+}
+
+/* Extract the DNS header */
+struct dns_header const *in_dns_header = dp_packet_get_udp_payload(pkt_in);
+
+/* Check if it is DNS request or not */
+if (in_dns_header->lo_flag & 0x80) {
+/* It's a DNS response packet which we are not interested in */
+goto exit;
+}
+
+/* Check if at least one query request is present */
+if (!in_dns_header->qdcount) {
+goto exit;
+}
+
+struct udp_header *in_udp = dp_packet_l4(pkt_in);
+size_t udp_len = ntohs(in_udp->udp_len);
+size_t l4_len = dp_packet_l4_size(pkt_in);
+uint8_t *end = (uint8_t *)in_udp + MIN(udp_len, l4_len);
+uint8_t *in_dns_data = (uint8_t *)(in_dns_header + 1);
+uint8_t *in_hostname = in_dns_data;
+uint8_t idx = 0;
+struct ds hostname;
+ds_init();
+/* Extract the hostname. If the hostname is - 'www.ovn.org' it would be
+ * 

[ovs-dev] [PATCH v4 1/3] ovn-util: Add a new util function extract_ip_addresses

2017-04-03 Thread nusiddiq
From: Numan Siddique 

An upcoming commit will use this function to extract the IP (v4 and
v6) addresses from a string without a preceding eth address.

Signed-off-by: Numan Siddique 
---
 ovn/lib/ovn-util.c | 72 ++
 ovn/lib/ovn-util.h |  1 +
 2 files changed, 52 insertions(+), 21 deletions(-)

diff --git a/ovn/lib/ovn-util.c b/ovn/lib/ovn-util.c
index 644a5dc..8580391 100644
--- a/ovn/lib/ovn-util.c
+++ b/ovn/lib/ovn-util.c
@@ -82,19 +82,9 @@ is_dynamic_lsp_address(const char *address)
  ETH_ADDR_SCAN_ARGS(ea), ) && address[n] == '\0'));
 }
 
-/* Extracts the mac, IPv4 and IPv6 addresses from * 'address' which
- * should be of the format "MAC [IP1 IP2 ..] .." where IPn should be a
- * valid IPv4 or IPv6 address and stores them in the 'ipv4_addrs' and
- * 'ipv6_addrs' fields of 'laddrs'.  There may be additional content in
- * 'address' after "MAC [IP1 IP2 .. ]".  The value of 'ofs' that is
- * returned indicates the offset where that additional content begins.
- *
- * Returns true if at least 'MAC' is found in 'address', false otherwise.
- *
- * The caller must call destroy_lport_addresses(). */
-bool
-extract_addresses(const char *address, struct lport_addresses *laddrs,
-  int *ofs)
+static bool
+parse_and_store_addresses(const char *address, struct lport_addresses *laddrs,
+  int *ofs, bool extract_eth_addr)
 {
 memset(laddrs, 0, sizeof *laddrs);
 
@@ -102,15 +92,18 @@ extract_addresses(const char *address, struct 
lport_addresses *laddrs,
 const char *const start = buf;
 int buf_index = 0;
 const char *buf_end = buf + strlen(address);
-if (!ovs_scan_len(buf, _index, ETH_ADDR_SCAN_FMT,
-  ETH_ADDR_SCAN_ARGS(laddrs->ea))) {
-laddrs->ea = eth_addr_zero;
-*ofs = 0;
-return false;
-}
 
-snprintf(laddrs->ea_s, sizeof laddrs->ea_s, ETH_ADDR_FMT,
- ETH_ADDR_ARGS(laddrs->ea));
+if (extract_eth_addr) {
+if (!ovs_scan_len(buf, _index, ETH_ADDR_SCAN_FMT,
+  ETH_ADDR_SCAN_ARGS(laddrs->ea))) {
+laddrs->ea = eth_addr_zero;
+*ofs = 0;
+return false;
+}
+
+snprintf(laddrs->ea_s, sizeof laddrs->ea_s, ETH_ADDR_FMT,
+ ETH_ADDR_ARGS(laddrs->ea));
+}
 
 ovs_be32 ip4;
 struct in6_addr ip6;
@@ -145,6 +138,23 @@ extract_addresses(const char *address, struct 
lport_addresses *laddrs,
 }
 
 /* Extracts the mac, IPv4 and IPv6 addresses from * 'address' which
+ * should be of the format "MAC [IP1 IP2 ..] .." where IPn should be a
+ * valid IPv4 or IPv6 address and stores them in the 'ipv4_addrs' and
+ * 'ipv6_addrs' fields of 'laddrs'.  There may be additional content in
+ * 'address' after "MAC [IP1 IP2 .. ]".  The value of 'ofs' that is
+ * returned indicates the offset where that additional content begins.
+ *
+ * Returns true if at least 'MAC' is found in 'address', false otherwise.
+ *
+ * The caller must call destroy_lport_addresses(). */
+bool
+extract_addresses(const char *address, struct lport_addresses *laddrs,
+  int *ofs)
+{
+return parse_and_store_addresses(address, laddrs, ofs, true);
+}
+
+/* Extracts the mac, IPv4 and IPv6 addresses from * 'address' which
  * should be of the format 'MAC [IP1 IP2 ..]" where IPn should be a
  * valid IPv4 or IPv6 address and stores them in the 'ipv4_addrs' and
  * 'ipv6_addrs' fields of 'laddrs'.
@@ -166,6 +176,26 @@ extract_lsp_addresses(const char *address, struct 
lport_addresses *laddrs)
 return success;
 }
 
+/* Extracts the IPv4 and IPv6 addresses from * 'address' which
+ * should be of the format 'IP1 IP2 .." where IPn should be a
+ * valid IPv4 or IPv6 address and stores them in the 'ipv4_addrs' and
+ * 'ipv6_addrs' fields of 'laddrs'.
+ *
+ * Return true if at least one IP address is found in 'address',
+ * false otherwise.
+ *
+ * The caller must call destroy_lport_addresses(). */
+bool
+extract_ip_addresses(const char *address, struct lport_addresses *laddrs)
+{
+int ofs;
+if (parse_and_store_addresses(address, laddrs, , false)) {
+return (laddrs->n_ipv4_addrs || laddrs->n_ipv6_addrs);
+}
+
+return false;
+}
+
 /* Extracts the mac, IPv4 and IPv6 addresses from the
  * "nbrec_logical_router_port" parameter 'lrp'.  Stores the IPv4 and
  * IPv6 addresses in the 'ipv4_addrs' and 'ipv6_addrs' fields of
diff --git a/ovn/lib/ovn-util.h b/ovn/lib/ovn-util.h
index 8252283..b3d2125 100644
--- a/ovn/lib/ovn-util.h
+++ b/ovn/lib/ovn-util.h
@@ -57,6 +57,7 @@ bool is_dynamic_lsp_address(const char *address);
 bool extract_addresses(const char *address, struct lport_addresses *,
int *ofs);
 bool extract_lsp_addresses(const char *address, struct lport_addresses *);
+bool extract_ip_addresses(const char *address, struct lport_addresses *);
 bool 

[ovs-dev] [PATCH v4 0/3] ovn: Native DNS support

2017-04-03 Thread nusiddiq
From: Numan Siddique 

v3 -> v4


* PS1 (ovn-util: Add a new util function extract_ip_addresses)
  - Added this patch in v4 to extract the IP addresses from a string

* PS2 (ovn-controller: Add 'dns_lookup' action)
  - Addressed the review comments and modified the DNS table in SB DB
as per suggestions from Guru

* PS3 (ovn-northd: Add logical flows to support native DNS)
  - Modified the DNS table in NB DB as per suggestions from Guru.


Numan Siddique (3):
  ovn-util: Add a new util function extract_ip_addresses
  ovn-controller: Add 'dns_lookup' action
  ovn-northd: Add logical flows to support native DNS

 include/ovn/actions.h   |  17 +-
 ovn/controller/pinctrl.c| 262 +-
 ovn/lib/actions.c   |  52 ++
 ovn/lib/ovn-util.c  |  72 ++---
 ovn/lib/ovn-util.h  |  20 +++
 ovn/northd/ovn-northd.8.xml |  85 +-
 ovn/northd/ovn-northd.c | 180 -
 ovn/ovn-nb.ovsschema|  20 ++-
 ovn/ovn-nb.xml  |  27 +++-
 ovn/ovn-sb.ovsschema|  21 ++-
 ovn/ovn-sb.xml  |  75 -
 ovn/utilities/ovn-nbctl.c   |   3 +
 ovn/utilities/ovn-sbctl.c   |   3 +
 ovn/utilities/ovn-trace.c   |   5 +
 tests/ovn.at| 384 
 15 files changed, 1183 insertions(+), 43 deletions(-)

-- 
2.9.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovs V5 00/24] Introducing HW offload support for openvswitch

2017-04-03 Thread Roi Dayan



On 29/03/2017 20:13, Joe Stringer wrote:

On 29 March 2017 at 04:50, Roi Dayan  wrote:



On 23/03/2017 09:01, Joe Stringer wrote:


I ran the make check-offloads tests on a recent net-next kernel and it
failed, output was not as expected:

../../tests/system-offloaded-traffic.at:54
: ovs-appctl dpctl/dump-flows |
grep "eth_type(0x0800)" | sed -e

's/used:[0-9].[0-9]*s/used:0.001s/;s/eth(src=[a-z0-9:]*,dst=[a-z0-9:]*)/eth(mac
s)/;s/actions:[0-9,]*/actions:output/;s/recirc_id(0),//' | sort
--- - 2017-03-22 16:43:37.598689692 -0700
+++

/home/vagrant/ovs/_build-clang/tests/system-offloads-testsuite.dir/at-groups/2/stdout
2017-03-22 16:43:37.595628000 -0700
@@ -1,3 +1,3 @@
-in_port(2),eth(macs),eth_type(0x0800), packets:9, bytes:756,
used:0.001s, actions:output
-in_port(3),eth(macs),eth_type(0x0800), packets:9, bytes:756,
used:0.001s, actions:output
+in_port(2),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9,
bytes:882, used:0.001s, actions:output
+in_port(3),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9,
bytes:882, used:0.001s, actions:output



Hi Joe,

can you tell me what kernel you used here?
maybe tc offloads were not supported and there was a fallback to OVS dp.


I believe that it was a snapshot of net-next relatively recently,
01461abe62df ("Merge branch 'fib-notifications-cleanup'"). I could try
again with latest net-next? Or do you think there may be some
userspace dependency the test relies on?



I installed net-next kernel and make check-offloads pass for me.
The last commit I'm on is
397df70 Merge branch '40GbE' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

last tag is 4.11-rc3+
I'm thinking maybe the test fails for you from something else like a 
second openvswitch process running already?


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC] [PATCH 5/6] port_group: ofproto-dpif: add port_group definition and init

2017-04-03 Thread Matthias May
Signed-off-by: Matthias May 
---
 ofproto/ofproto-dpif.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 523adad6f..32d7d69c5 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -93,6 +93,7 @@ struct ofbundle {
 
 /* Configuration. */
 struct ovs_list ports;  /* Contains "struct ofport"s. */
+ofp_port_t port_group;  /* Group of ports to forbid loopback */
 enum port_vlan_mode vlan_mode; /* VLAN mode */
 uint16_t qinq_ethtype;
 int vlan;   /* -1=trunk port, else a 12-bit VLAN ID. */
@@ -451,8 +452,9 @@ type_run(const char *type)
 
 HMAP_FOR_EACH (bundle, hmap_node, >bundles) {
 xlate_bundle_set(ofproto, bundle, bundle->name,
- bundle->vlan_mode, bundle->qinq_ethtype,
- bundle->vlan, bundle->trunks, bundle->cvlans,
+ bundle->port_group, bundle->vlan_mode,
+ bundle->qinq_ethtype, bundle->vlan,
+ bundle->trunks, bundle->cvlans,
  bundle->use_priority_tags,
  bundle->bond, bundle->lacp,
  bundle->floodable, bundle->protected);
@@ -2896,6 +2898,7 @@ bundle_set(struct ofproto *ofproto_, void *aux,
 bundle->name = NULL;
 
 ovs_list_init(>ports);
+bundle->port_group = 0;
 bundle->vlan_mode = PORT_VLAN_TRUNK;
 bundle->qinq_ethtype = ETH_TYPE_VLAN_8021AD;
 bundle->vlan = -1;
@@ -2957,6 +2960,12 @@ bundle_set(struct ofproto *ofproto_, void *aux,
 return EINVAL;
 }
 
+/* Set port_group */
+if (s->port_group != bundle->port_group) {
+bundle->port_group = s->port_group;
+need_flush = true;
+}
+
 /* Set VLAN tagging mode */
 if (s->vlan_mode != bundle->vlan_mode
 || s->use_priority_tags != bundle->use_priority_tags) {
-- 
2.11.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC] [PATCH 3/6] port_group: bridge: set port_group in port config

2017-04-03 Thread Matthias May
Signed-off-by: Matthias May 
---
 vswitchd/bridge.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 867a26d8d..316e4742e 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -966,6 +966,17 @@ port_configure(struct port *port)
 s.slaves[s.n_slaves++] = iface->ofp_port;
 }
 
+/* Get port_group. */
+if (cfg->port_group) {
+if ((*cfg->port_group > 0 && *cfg->port_group < OFPP_MAX)
+|| *cfg->port_group == OFPP_LOCAL) {
+s.port_group = (uint32_t) *cfg->port_group;
+}
+} else {
+/* On a port with multiple interfaces we default to the first. */
+s.port_group = s.slaves[0];
+}
+
 /* Get VLAN tag. */
 s.vlan = -1;
 if (cfg->tag && *cfg->tag >= 0 && *cfg->tag <= 4095) {
-- 
2.11.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC] [PATCH 4/6] port_group: ofproto: add port_group definition

2017-04-03 Thread Matthias May
Signed-off-by: Matthias May 
---
 ofproto/ofproto.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
index 1e48e1952..29d7cdb91 100644
--- a/ofproto/ofproto.h
+++ b/ofproto/ofproto.h
@@ -402,6 +402,7 @@ struct ofproto_bundle_settings {
 ofp_port_t *slaves; /* OpenFlow port numbers for slaves. */
 size_t n_slaves;
 
+ofp_port_t port_group;  /* Group of ports to forbid loopback */
 enum port_vlan_mode vlan_mode; /* Selects mode for vlan and trunks */
 uint16_t qinq_ethtype;
 int vlan;   /* VLAN VID, except for PORT_VLAN_TRUNK. */
-- 
2.11.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC] [PATCH 2/6] port_group: add documentation entry

2017-04-03 Thread Matthias May
Signed-off-by: Matthias May 
---
 vswitchd/vswitch.xml | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 14297bf9a..3a7324f5d 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -1280,6 +1280,13 @@
   bonded Port.
 
 
+
+  The group affiliation of a port. If not set this defaults to the
+  open flow port number. Valid values are 1-65279, 65534.
+  Traffic processed by the normal action will not egress on a port which
+  has the same port_group as the port on which the frame ingressed.
+
+
 
   
 In short, a VLAN (short for ``virtual LAN'') is a way to partition a
-- 
2.11.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC] [PATCH 1/6] port_group: add db definitions

2017-04-03 Thread Matthias May
Signed-off-by: Matthias May 
---
 vswitchd/vswitch.ovsschema | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index 19b49daf1..7ebcd0d6a 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@
 {"name": "Open_vSwitch",
- "version": "7.15.0",
- "cksum": "544856471 23228",
+ "version": "7.16.0",
+ "cksum": "2597117342 23427",
  "tables": {
"Open_vSwitch": {
  "columns": {
@@ -160,6 +160,11 @@
"enum": ["set", ["trunk", "access", "native-tagged",
 "native-untagged", "dot1q-tunnel"]]},
  "min": 0, "max": 1}},
+   "port_group": {
+ "type": {"key": {"type": "integer",
+  "minInteger": 1,
+  "maxInteger": 65279},
+  "min": 0, "max": 1}},
"qos": {
  "type": {"key": {"type": "uuid",
   "refTable": "QoS"},
-- 
2.11.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC] [PATCH 0/6] Add port_group to prevent loopback between interfaces

2017-04-03 Thread Matthias May
When using openvswitch in combination with a hardware switch attached via dsa
it may be desirable to prevent frames from being looped back to interfaces
which reside on the same switch and are already processed by the
switching fabric in hardware.
This patch series achieves this by introducing a new parameter port_group to
be used by the normal action.
When the port_group is not set explicitly, it defaults to the ofp number.
Ports which are in the same group will not forward a frame between each other.

Matthias May (6):
  port_group: add db definitions
  port_group: add documentation entry
  port_group: bridge: set port_group in port config
  port_group: ofproto: add port_group definition
  port_group: ofproto-dpif: add port_group definition and init
  port_group: ofproto-dpif-xlate: implement port_group

 ofproto/ofproto-dpif-xlate.c | 53 +---
 ofproto/ofproto-dpif-xlate.h |  4 ++--
 ofproto/ofproto-dpif.c   | 13 +--
 ofproto/ofproto.h|  1 +
 vswitchd/bridge.c| 11 +
 vswitchd/vswitch.ovsschema   |  9 ++--
 vswitchd/vswitch.xml |  7 ++
 7 files changed, 79 insertions(+), 19 deletions(-)

-- 
2.11.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev