Re: [ovs-dev] [PATCH] openflow.rst: Update to reflect current status.

2017-06-15 Thread Ben Pfaff
On Wed, Jun 14, 2017 at 03:24:49PM -0700, Greg Rose wrote:
> On 06/14/2017 08:21 AM, Ben Pfaff wrote:
> >OpenFlow 1.1 and 1.2 support is complete.  Simon Horman is not known to
> >be working on flow entry notifications.
> >
> >Signed-off-by: Ben Pfaff 
> >---
> >  Documentation/topics/openflow.rst | 23 +++
> >  1 file changed, 3 insertions(+), 20 deletions(-)
> >
> >diff --git a/Documentation/topics/openflow.rst 
> >b/Documentation/topics/openflow.rst
> >index 00e65411c455..6f71d6c752fc 100644
> >--- a/Documentation/topics/openflow.rst
> >+++ b/Documentation/topics/openflow.rst
> >@@ -75,28 +75,12 @@ Here are the current approaches in a few tricky areas:
> >  OpenFlow 1.1
> >  
> >
> >-The list of remaining work items for OpenFlow 1.1 is below.  It is probably
> >-incomplete.
> >-
> >-* Match and set double-tagged VLANs (QinQ).
> >-
> >-  This requires kernel work for reasonable performance.
> >-
> >-  (optional for OF1.1+)
> >-
> >-* VLANs tagged with 88a8 Ethertype.
> >-
> >-  This requires kernel work for reasonable performance.
> >-
> >-  (required for OF1.1+)
> >+OpenFlow 1.1 support is believed to be complete.
> >
> >  OpenFlow 1.2
> >  
> >
> >-OpenFlow 1.2 support requires OpenFlow 1.1 as a prerequisite. All the
> >-additional work specific to Openflow 1.2 are complete.  (This is based on 
> >the
> >-change log at the end of the OF1.2 spec.  I didn't compare the specs 
> >carefully
> >-yet.)
> >+OpenFlow 1.2 support is believed to be complete.
> 
> FWIW it looks fine to me but I am wondering about the passive voice here.
> 
> Why not just say 'OpenFlow 1.2 support is complete'?  Just curious.

You're right.  This is documentation, not cross-examination.  I made
that change.

> >
> >  OpenFlow 1.3
> >  
> >@@ -170,8 +154,7 @@ in OVS.
> >
> >  * Flow entry notifications
> >
> >-  This seems to be modelled after OVS's NXST_FLOW_MONITOR.  (Simon Horman is
> >-  working on this.)
> >+  This seems to be modelled after OVS's NXST_FLOW_MONITOR.
> 
> LGTM
> 
> Acked-by: Greg Rose 

Thanks, I applied this to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] checkpatch: Fix skipping of the most recent commit.

2017-06-15 Thread Ben Pfaff
On Thu, Jun 15, 2017 at 02:57:30PM +0300, Ilya Maximets wrote:
> 'range(n_patches, 0, -1)' generates list starting from 'n_patches'
> and not including zero. This leads to checking of N most recent
> commits starting from the second one.
> 
> New version will generate right list starting from 'n_patches - 1'
> and including zero. So, the most recent commit (HEAD~0) will be
> checked and desired behavior will be achieved.
> 
> Also, 'reversed' looks better than 'range(n_patches - 1, -1, -1)'
> 
> Fixes: a1fccabce2cb ("checkpatch: Support checking recent commits in the 
> current repo.")
> Signed-off-by: Ilya Maximets 

Thanks for the fix.  I applied it.

(This was a truly dumb logic error of mine.  I spent a few minutes
playing with Python expressions to count down from N to 1, inclusive,
thinking I wanted, e.g., HEAD~4 through HEAD~1, and even tested it, and
never realized that I really want HEAD~3 through HEAD~0.)
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 15/31] fixup: Change from global modal behavior to L3 tunnel only mode.

2017-06-15 Thread Ben Pfaff
On Wed, Jun 14, 2017 at 10:48:50PM +, Jan Scheurich wrote:
> We do understand your motivation for proposing this. But such change is a 
> significant and requires careful consideration. We need to satisfy four main 
> requirements:
> 
> 
> 
> 1. Legacy controllers that are not packet-type aware must not be affected by 
> the introduction of packet_type in OVS. The externally visible behavior of 
> OVS must not change as long as OVS used in "legacy" mode:
> 
> * Matching works exactly as without support packet_type as long as the 
> pipeline handles only Ethernet packets.
> 
> * The controller shall not be exposed to unexpected packet_type fields in 
> e.g. flow or packet in messages.
> 
> 
> 
> 2. Packet type-unaware controllers shall be able to use legacy L3 tunnels as 
> in the past (LISP OVS 2.7, or with the just merged series for L3 tunneling in 
> user space), i.e. implicit conversion to/from Ethernet for processing in the 
> pipeline without controller involvement.
> 
> 
> 
> 3. Packet-type aware controllers shall be able to use versatile tunnel ports 
> directly without implicit conversion of packet types through OVS.
> 
> 
> 
> 4. The tunnel handling logic should be easily understandable not only for OVS 
> experts but also for controller developers of the two camps (packet 
> type-aware and unaware).
> 
> 
> 
> We think backward compatibility requirement (1) can be fulfilled even
> if an OVS bridge is always packet type-ware, provided that the default
> packet_type (0,0) is filtered away in all OF matches sent out by OVS
> (flow messages and packet in). 

This is what OVS does in the packet set I have proposed.

> (Note: It is not quite obvious that packet type-aware controllers will
> recognize flows if they have explicitly included a packet_type=(0,0)
> match in the flows they sent. But we have the reciprocal problem in
> our previous patch also).

Related problems exist already, since OpenFlow does not define a
canonical form of flow matches.  For example, OXMs can appear in any
order that satisfies prerequisites, which leads to a large number of
valid permutations in many cases.  I have been surprised for a long time
that OVS has received few bug reports in this area; I suspect that
either controllers tolerate OVS's particular foibles here or that
controllers tend to identify flows via cookie.

> Regarding your proposed renaming of the alternative bridge property:
> We think the name used in the OVSDB schema XML "legacy-l3-tunnel" (or
> perhaps "legacy-l3-tunneling"?) is more appropriate than the name
> "legacy-l3-pipeline" that is actually used in the code (and with
> negation between OVSDB and ofproto).

I'm happy to do that renaming.

> It seems clear that (2) and (3) together require some config knob to
> control the implicit packet type conversion for legacy controllers. If
> bridge property packet-type-aware is not available for this, we either
> need a different bridge property (as in your proposal) or we have to
> configure this per port.
> 
> 
> We can assume that the legacy L3 tunnel handling behavior (2) is
> set. In your proposal this is selected through
> "legacy-l3-tunnel=true". Tunnel ports operating in this mode are
> either L2 only or L3 only and the behavior is determined through the
> "layer3" option per tunnel. The semantics are summarized by the table
> below:
> 
> 
> 
> layer3Rx from tunnel  
> Tx to tunnel
> 
> 
> false  Ethernet packets are processed as is,   
> Ethernet packet are sent as is,
> L3 packets are dropped.   
>  L3 packets are dropped.
> 
> 
> true   L3 packets are encapsulated in   
> The Ethernet header is stripped
> dummy Ethernet header for pipeline   
> before transmission to tunnel,
> processing,   
>   L3 packets are dropped.
> Ethernet packets are dropped.
> 
> 
> 
> The new L3/versatile tunnel port handling for packet-type-aware controllers 
> is as follows:
> 
> 
> 
> Rx from tunnel
>   Tx to tunnel
> 
> 
> 
> All packets that can be mapped to a 
> All packet types are sent as is w/o
> known packet type (0,0) or (1,x) are 
> implicit type conversion. If the tunnel
> processed as is.  
>does not support the packet type as
> Other packets are dropped.
> payload, packets are dropped.
> 
> 
> 
> Our understanding is that you intend to apply this behavior for
> "legacy-l3-tunnel=false". It is not clear from the 

Re: [ovs-dev] [PATCH] system-traffic: 802.1ad: Add double VLAN match test case

2017-06-15 Thread Eric Garver
On Thu, Jun 15, 2017 at 04:49:17PM -0700, Joe Stringer wrote:
> On 8 June 2017 at 14:42, Joe Stringer  wrote:
> > Yes, I'm the right person.
> >
> > At this point I'm looking for the tap device persistence issue to be
> > addressed before running it on my tester box which runs tests on a
> > variety of platforms.
> 
> Hi Eric,
> 
> I'm finding that things are still pretty unstable in the
> system-traffic tests, particularly around kernels 4.8-4.9, but also
> with regards to cleaning up properly after testsuite failures. Without
> clean runs on master, it's a bit hard for me to run these tests as
> well and observe that these tests are good on a range of kernels.

Understood. I don't think there is any rush to get this particular patch
in (or the IPv6 underlay tests for that matter). When you get back
around to it let me know when/if they need rebased.

Thanks.
Eric.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 15/31] fixup: Change from global modal behavior to L3 tunnel only mode.

2017-06-15 Thread Jan Scheurich
Hi Jean, thanks for your feedback.

>   A few more details...
>   Requirement (1) is not valid for 1.5 controllers. There is no
> excuse for a controller supporting 1.5 to no support packet type, it's
> part of the spec ;-)

The usual pattern is that controllers (just like OVS) add protocol support for 
an OF wire protocol version without implementing all new features. As PTAP is a 
pretty major semantical addition, I think it would not be wise to rely on 
controllers to handle PTAP (especially non-Ethernet packet types) just because 
they can talk OF 1.5 wire protocol.

>   I claim that a switch can't properly support EXT-112/PTAP in
> OF prior to version 1.5. This is why EXT-112 was not ported to OF 1.3
> as an extension. For example, prior to 1.5 the controller can't do
> packet-out on a PTAP switch, which is pretty major.

We believe that an OF 1.3 controller extended with support for PTAP and 
EXT-382/Generic encap/decap for Ethernet is fully equipped (EXT-382 is the next 
patch series we will post). It can emulate the OF 1.5 Packet Out for 
non-Ethernet packets using the OF 1.3 Packet Out with an Ethernet packet and 
include a decap action to remove the unwanted Ethernet header before output.

>   I think a "legacy" controller can't work with PTAP flow
> entries and PTAP packet-in. Section 7.1.3 of OF 1.3.5 can only help so
> much. Converting those to "ethernet" when sending to the controller is
> dodgy, so IMHO the only way is to supress them. I don't remember what
> I did in my prototype patch.

As long as OVS is only configured with legacy tunnel ports and programmed by 
legacy controllers, all packets in the pipeline will be Ethernet and legacy 
controllers will not be subject to PTAP packet ins. Combining legacy 
controllers, PTAP controllers and versatile tunnels in the same bridge can in 
my eyes never work and should be considered misconfiguration.

>   A tunnel/port setting seems an easy way to avoid a lot of
> complexity. It's annoying to have such setting, it increases confusion
> and chances of misconfiguration. The alternative would be for every
> flow entry set by a non-PTAP controller to be duplicated as an
> Ethernet and L3 flow. Yuck.
>   IMHO, there is no way for a "legacy" controller to manage
> a "ptap" L3 tunnel.

Agreed. Legacy controllers can handle L3 tunnels only in the legacy way 
supported by OVS: L3 packets are implicitly converted to/from Ethernet at 
rx/tx, so that the pipeline only deals with Ethernet packets.

>   The switch need to make sure that only the appropriate packet
> types are sent on the tunnel, VxLAN can not encaps L3 packets and LISP
> can not encaps L2 packets, and GRE can encaps both L2 and L3.
>   In most cases, the tunnel knows the encaps it's dealing with,
> so in theory it could check the outgoing packet type and drop
> offending types itself. Same thing in reverse when packet out of the
> tunnel must be tagged with the proper type. Again, in most cases the
> decapsulation knows what packet type it receives and can select the
> tag.
>   I guess, it depends how and where you want to implement such
> filtering/tagging, and what kind of enforcement. OpenFlow has always
> been "let the controller shoot itself in the foot".

Yes, for PTAP tunnels OVS does the mapping between tunnel payload and pipeline 
packet type automatically, and drops packets for which such mapping is not 
defined or not implemented. 

>   It's clear to me that having both a bridge property
> (legacy-l3-tunnel) and a port property (layer3) is confusing, not only
> for the user but also for the implementers ;-) I don't see why you
> need to have a setting for the whole bridge, as it's really a property
> of the specific port.

We tend to agree here :-)

>   I can even imagine some bridges mixing up some tunnels in
> legacy mode and some tunnels in ptap mode. Let's go crazy.
>   That is why you propose to have the following per-port
> property, and no per-bridge setting :
>   1) layer3=layer2-only   // Legacy L2
>   2) layer3=layer3-to-ethernet// Legacy L3
>   3) layer3=versatile // PTAP aware
> 
>   But, I believe you can go one step further in the
> simplification. You just need to realise that (1) and (3) are
> effectively the same thing, if you assume the port knows what it is
> doing. There is no point in denying non-L2 outgoing packets at the
> property level in (1), because the encaps itself will take care of it
> (can't encode it, drop). And if a tunnel in (1) generate an incomming
> PTAP packet, OVS classifier will do the right thing (match nothing,
> drop, can't packet-in to <1.5 controller).
>   So, in conclusion, you just need a single per port boolean,
> and no per-bridge setting :
>   convert-to-ethernet=true/false

A single property looks attractive at first glance. The issue we see with that 
is that it can break backward compatibility for 

Re: [ovs-dev] [PATCH] dpif: execute ct together with recirc when slow path

2017-06-15 Thread Joe Stringer
On 14 June 2017 at 19:56, Wei Li  wrote:
> Great idea, this can process actions like
> "ct(...nat),recirc(...),output(...)" correctly, output packet will be a
> NATed packet
>
> Before implementing your idea, how deal with my patch?
>
> 1: drop it
>
> 2:accept it for fixing limited scenario (i will do checkpatch)
>
> What's your suggestion?

I think that we should probably work towards a proper solution, this
only addresses a subset of the currently-broken cases.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] docs: Document that hw-offload is experimental.

2017-06-15 Thread Joe Stringer
On 15 June 2017 at 16:36, Joe Stringer  wrote:
> Currently, the set of flows that may be offloaded is very small compared
> to the overall capabilities of the OpenFlow support in OVS. In the
> majority of cases, if a user attempts to enable this flag they are
> unlikely to observe a performance increase, because for instance they
> lack the correct hardware; lack the correct kernel version; or their
> flow tables are too complex for the hardware to handle.
>
> To moderate expectations around this feature, describe it as
> experimental. Over time, we expect that the functionality and usefulness
> of this feature will grow and we should be in a better shape to revisit
> the status of this functionality after it has had some time to mature.
>
> Signed-off-by: Joe Stringer 
> ---

Hi folks, this is a pretty high level description of some of the
limitations. I wonder if we could add another table to the FAQ
underneath the question "Are all features available with all
datapaths?" to answer the questions "Which features can be offloaded
to hardware?" and "Which hardware supports hardware offload"? It seems
to me like minimum to configure this stuff is 4.2, but various fields
are still being added to the matching side of this... then only a
small subset of actions can be offloaded.. then there's the question
of which kernels began to distribute drivers with support to offload
to particular NICs and switches. Could someone take a stab at
documenting this for users?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] docs: Document that hw-offload is experimental.

2017-06-15 Thread Joe Stringer
Currently, the set of flows that may be offloaded is very small compared
to the overall capabilities of the OpenFlow support in OVS. In the
majority of cases, if a user attempts to enable this flag they are
unlikely to observe a performance increase, because for instance they
lack the correct hardware; lack the correct kernel version; or their
flow tables are too complex for the hardware to handle.

To moderate expectations around this feature, describe it as
experimental. Over time, we expect that the functionality and usefulness
of this feature will grow and we should be in a better shape to revisit
the status of this functionality after it has had some time to mature.

Signed-off-by: Joe Stringer 
---
CC: Paul Blakey 
CC: Roi Dayan 
CC: Simon Horman 
CC: Flavio Leitner 
---
 NEWS | 2 +-
 vswitchd/vswitch.xml | 5 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index a2f5a6dc8e54..4ea7e628e1fc 100644
--- a/NEWS
+++ b/NEWS
@@ -65,7 +65,7 @@ Post-v2.7.0
  * Transparently pop and push Ethernet headers at transmit/reception
of packets to/from L3 tunnels.
- The BFD detection multiplier is now user-configurable.
-   - New support for HW offloading
+   - Add experimental support for hardware offloading
  * HW offloading is disabled by default.
  * HW offloading is done through the TC interface.
 
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 9bb828faa8eb..500f05a24280 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -192,6 +192,11 @@
 
   Currently Open vSwitch supports hardware offloading on
   Linux systems.  On other systems, this value is ignored.
+  This functionality is considered 'experimental'. Depending
+  on which OpenFlow matches and actions are configured,
+  which kernel version is used, and what hardware is
+  available, Open vSwitch may not be able to offload
+  functionality to hardware.
 
   
 
-- 
2.11.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V11 00/33] Introducing HW offload support for openvswitch

2017-06-15 Thread Joe Stringer
On 15 June 2017 at 03:15, Simon Horman  wrote:
> On Tue, Jun 13, 2017 at 06:03:22PM +0300, Roi Dayan wrote:
>> This patch series introduces rule offload functionality to dpif-netlink
>> via netdev ports new flow offloading API. The user can specify whether to
>> enable rule offloading or not via OVS configuration. Netdev providers
>> are able to implement netdev flow offload API in order to offload rules.
>>
>> This patch series also implements one offload scheme for netdev-linux,
>> using TC flower classifier, which was chosen because its sort of natural
>> to state OVS DP rules for this classifier. However, the code can be
>> extended to support other classifiers such as U32, eBPF, etc which support
>> offload as well.
>>
>> The use-case we are currently addressing is the newly sriov switchdev mode
>> in the Linux kernel which was introduced in version 4.8.
>> This series was tested against sriov vfs vports representors of the
>> Mellanox 100G ConnectX-4 series exposed by the mlx5 kernel driver.
>>
>> The feature is disabled by default and can be enabled by setting
>> other_config:hw-offload to True.
>> Currently offloading is done for rules consisting on matching on L2 MAC,
>> L3 IP (not including IP flags), L4 TCP/UDP (not inclduing TCP flags),
>> VLAN, VXLAN tunnel.
>> HW offloading is supported only for drop and for single output action with
>
> Thanks Roi,
>
> I have applied these to the master branch
> (in two batches; 1 - 7 yesterday and the rest today).
>
> There were some follow-up requests from Flavio for some of the patches,
> please lets not forget those.

Thanks all who've been involved in this work, it's a good start and I
look forward to seeing it become more fleshed out and become more
useful for a wider range of use cases.

Looking over the commits, I see that there was no mention of
experimental status yet so I'll send a followup patch shortly to
address this. I don't think that this should surprise anyone, as I
raised this a few times with people at netdev 2.1.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath-windows: Zero initialize Conntrack-ICMP entry

2017-06-15 Thread Alin Serdean
Mind if we backport this to 2.7 also?

Acked-by: Alin Gabriel Serdean 

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Shashank Ram
> Sent: Friday, June 16, 2017 12:31 AM
> To: Sairam Venugopal ; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] datapath-windows: Zero initialize Conntrack-
> ICMP entry
> 
> Looks good, one minor clarification needed (inline).
> 
> From: ovs-dev-boun...@openvswitch.org  boun...@openvswitch.org> on behalf of Sairam Venugopal
> 
> Sent: Thursday, June 15, 2017 2:07 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH] datapath-windows: Zero initialize Conntrack-
> ICMP entry
> 
> Set conntrack-icmp entry to {0}. Add some compile time asserts to ensure
> that conn_* struct's first member is OVS_CT_ENTRY.
> 
> >>> It would help to mention why this is a requirement.
> 
> Signed-off-by: Sairam Venugopal 
> ---
>  datapath-windows/ovsext/Conntrack-icmp.c  | 3 ++-  datapath-
> windows/ovsext/Conntrack-other.c | 1 +
>  datapath-windows/ovsext/Conntrack-tcp.c   | 1 +
>  3 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/datapath-windows/ovsext/Conntrack-icmp.c b/datapath-
> windows/ovsext/Conntrack-icmp.c
> index b1b6043..4da0665 100644
> --- a/datapath-windows/ovsext/Conntrack-icmp.c
> +++ b/datapath-windows/ovsext/Conntrack-icmp.c
> @@ -27,6 +27,7 @@ struct conn_icmp {
>  struct OVS_CT_ENTRY up;
>  enum icmp_state state;
>  };
> +C_ASSERT(offsetof(struct conn_icmp, up) == 0);
> 
>  static const enum ct_timeout icmp_timeouts[] = {
>  [ICMPS_FIRST] = 60 * CT_INTERVAL_SEC, @@ -78,7 +79,7 @@
> OvsConntrackCreateIcmpEntry(UINT64 now)
>  if (!conn) {
>  return NULL;
>  }
> -
> +conn->up = (OVS_CT_ENTRY) {0};
>  conn->state = ICMPS_FIRST;
> 
>  OvsConntrackUpdateExpiration(>up, now, diff --git a/datapath-
> windows/ovsext/Conntrack-other.c b/datapath-
> windows/ovsext/Conntrack-other.c
> index 6c68ba8..962cc8a 100644
> --- a/datapath-windows/ovsext/Conntrack-other.c
> +++ b/datapath-windows/ovsext/Conntrack-other.c
> @@ -27,6 +27,7 @@ struct conn_other {
>  struct OVS_CT_ENTRY up;
>  enum other_state state;
>  };
> +C_ASSERT(offsetof(struct conn_other, up) == 0);
> 
>  static const long long other_timeouts[] = {
>  [OTHERS_FIRST] = 60 * CT_INTERVAL_SEC, diff --git a/datapath-
> windows/ovsext/Conntrack-tcp.c b/datapath-windows/ovsext/Conntrack-
> tcp.c
> index f533b93..f8e85a2 100644
> --- a/datapath-windows/ovsext/Conntrack-tcp.c
> +++ b/datapath-windows/ovsext/Conntrack-tcp.c
> @@ -51,6 +51,7 @@ struct conn_tcp {
>  struct OVS_CT_ENTRY up;
>  struct tcp_peer peer[2];
>  };
> +C_ASSERT(offsetof(struct conn_tcp, up) == 0);
> 
>  enum {
>  TCPOPT_EOL,
> --
> 2.9.0.windows.1
> 
> ___
> 
> Acked-by: Shashank Ram 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath-windows: Define NAT_ACTION enum correctly

2017-06-15 Thread Alin Serdean
Acked-by: Alin Gabriel Serdean 


> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Sairam Venugopal
> Sent: Friday, June 16, 2017 12:20 AM
> To: Shashank Ram ; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] datapath-windows: Define NAT_ACTION
> enum correctly
> 
> Acked-by: Sairam Venugopal 
> 
> 
> 
> 
> 
> On 6/15/17, 12:46 PM, "ovs-dev-boun...@openvswitch.org on behalf of
> Shashank Ram"  r...@vmware.com> wrote:
> 
> >The existing code throws a warning when compiled with the Windows 10
> >SDK:
[Alin Serdean] (*)WDK
> >'typedef ': ignored on left of 'NAT_ACTION' when no variable is
> >declared
> >
> >Signed-off-by: Shashank Ram 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] datapath-windows: Add validations for IP_HEADER_LEN

2017-06-15 Thread Shashank Ram
Adds validations in OvsGetIp() to make sure the IHL is
within valid bounds. If IHL is invalid, then the packet
is dropped by the callers of this function.

Signed-off-by: Shashank Ram 
---
 datapath-windows/ovsext/Flow.c | 5 +
 datapath-windows/ovsext/Offload.c  | 3 +++
 datapath-windows/ovsext/PacketParser.h | 9 -
 datapath-windows/ovsext/Stt.c  | 2 +-
 datapath-windows/ovsext/User.c | 5 +
 datapath-windows/ovsext/Vxlan.c| 3 ++-
 6 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/datapath-windows/ovsext/Flow.c b/datapath-windows/ovsext/Flow.c
index 60f9b1c..852a22f 100644
--- a/datapath-windows/ovsext/Flow.c
+++ b/datapath-windows/ovsext/Flow.c
@@ -2141,6 +2141,9 @@ OvsExtractLayers(const NET_BUFFER_LIST *packet,
 }
 }
 }
+} else {
+/* Invalid network header */
+return NDIS_STATUS_INVALID_PACKET;
 }
 } else if (dlType == htons(ETH_TYPE_IPV6)) {
 NDIS_STATUS status;
@@ -2360,8 +2363,10 @@ OvsExtractFlow(const NET_BUFFER_LIST *packet,
 }
 }
 } else {
+/* Invalid network header */
 ((UINT64 *)ipKey)[0] = 0;
 ((UINT64 *)ipKey)[1] = 0;
+return NDIS_STATUS_INVALID_PACKET;
 }
 } else if (flow->l2.dlType == htons(ETH_TYPE_IPV6)) {
 NDIS_STATUS status;
diff --git a/datapath-windows/ovsext/Offload.c 
b/datapath-windows/ovsext/Offload.c
index 65d3b67..0905c80 100644
--- a/datapath-windows/ovsext/Offload.c
+++ b/datapath-windows/ovsext/Offload.c
@@ -563,6 +563,9 @@ OvsValidateIPChecksum(PNET_BUFFER_LIST curNbl,
 if (checksum != hdrChecksum) {
 return NDIS_STATUS_FAILURE;
 }
+} else {
+/* Invalid network header */
+return NDIS_STATUS_FAILURE;
 }
 }
 return NDIS_STATUS_SUCCESS;
diff --git a/datapath-windows/ovsext/PacketParser.h 
b/datapath-windows/ovsext/PacketParser.h
index f1d7f28..0d5c0a6 100644
--- a/datapath-windows/ovsext/PacketParser.h
+++ b/datapath-windows/ovsext/PacketParser.h
@@ -17,6 +17,8 @@
 #ifndef __PACKET_PARSER_H_
 #define __PACKET_PARSER_H_ 1
 
+#define MIN_IPV4_HLEN 20
+
 #include "precomp.h"
 #include "NetProto.h"
 
@@ -107,7 +109,12 @@ OvsGetIp(const NET_BUFFER_LIST *packet,
 const IPHdr *ip = OvsGetPacketBytes(packet, sizeof *ip, ofs, storage);
 if (ip) {
 int ipLen = ip->ihl * 4;
-if (ipLen >= sizeof *ip && OvsPacketLenNBL(packet) >= ofs + ipLen) {
+if (ipLen <  MIN_IPV4_HLEN ||
+ipLen > MAX_IPV4_HLEN ||
+OvsPacketLenNBL(packet) < ofs + ipLen) {
+   /* IP header is invalid, flag it */
+   return NULL;
+} else {
 return ip;
 }
 }
diff --git a/datapath-windows/ovsext/Stt.c b/datapath-windows/ovsext/Stt.c
index 1f36835..676cf0c 100644
--- a/datapath-windows/ovsext/Stt.c
+++ b/datapath-windows/ovsext/Stt.c
@@ -1019,7 +1019,7 @@ OvsDecapStt(POVS_SWITCH_CONTEXT switchContext,
 innerIpHdr->check = IPChecksum((UINT8 *)innerIpHdr,
 innerIpHdr->ihl * 4, 0);
 } else {
-status = NDIS_STATUS_RESOURCES;
+status = NDIS_STATUS_INVALID_PACKET;
 goto dropNbl;
 }
 } else if (layers.isIPv6) {
diff --git a/datapath-windows/ovsext/User.c b/datapath-windows/ovsext/User.c
index 7880220..22ee7af 100644
--- a/datapath-windows/ovsext/User.c
+++ b/datapath-windows/ovsext/User.c
@@ -465,6 +465,11 @@ OvsExecuteDpIoctl(OvsPacketExecute *execute)
 ndisStatus = OvsExtractFlow(pNbl, execute->inPort, , ,
  tempTunKey.tunKey.dst == 0 ? NULL : );
 
+if (ndisStatus != NDIS_STATUS_SUCCESS) {
+/* Invalid network header */
+goto dropit;
+}
+
 ctx = (POVS_BUFFER_CONTEXT)NET_BUFFER_LIST_CONTEXT_DATA_START(pNbl);
 ctx->mru = execute->mru;
 
diff --git a/datapath-windows/ovsext/Vxlan.c b/datapath-windows/ovsext/Vxlan.c
index 427f31c..f66a7e5 100644
--- a/datapath-windows/ovsext/Vxlan.c
+++ b/datapath-windows/ovsext/Vxlan.c
@@ -489,7 +489,8 @@ OvsSlowPathDecapVxlan(const PNET_BUFFER_LIST packet,
 if (nh) {
 layers.l4Offset = layers.l3Offset + nh->ihl * 4;
 } else {
-break;
+   status = NDIS_STATUS_INVALID_PACKET;
+   break;
 }
 
 /* make sure it's a VXLAN packet */
-- 
2.9.3.windows.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath-windows: Zero initialize Conntrack-ICMP entry

2017-06-15 Thread Shashank Ram
Looks good, one minor clarification needed (inline).

From: ovs-dev-boun...@openvswitch.org  on 
behalf of Sairam Venugopal 
Sent: Thursday, June 15, 2017 2:07 PM
To: d...@openvswitch.org
Subject: [ovs-dev] [PATCH] datapath-windows: Zero initialize Conntrack-ICMP 
entry

Set conntrack-icmp entry to {0}. Add some compile time asserts to ensure
that conn_* struct's first member is OVS_CT_ENTRY.

>>> It would help to mention why this is a requirement.

Signed-off-by: Sairam Venugopal 
---
 datapath-windows/ovsext/Conntrack-icmp.c  | 3 ++-
 datapath-windows/ovsext/Conntrack-other.c | 1 +
 datapath-windows/ovsext/Conntrack-tcp.c   | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/datapath-windows/ovsext/Conntrack-icmp.c 
b/datapath-windows/ovsext/Conntrack-icmp.c
index b1b6043..4da0665 100644
--- a/datapath-windows/ovsext/Conntrack-icmp.c
+++ b/datapath-windows/ovsext/Conntrack-icmp.c
@@ -27,6 +27,7 @@ struct conn_icmp {
 struct OVS_CT_ENTRY up;
 enum icmp_state state;
 };
+C_ASSERT(offsetof(struct conn_icmp, up) == 0);

 static const enum ct_timeout icmp_timeouts[] = {
 [ICMPS_FIRST] = 60 * CT_INTERVAL_SEC,
@@ -78,7 +79,7 @@ OvsConntrackCreateIcmpEntry(UINT64 now)
 if (!conn) {
 return NULL;
 }
-
+conn->up = (OVS_CT_ENTRY) {0};
 conn->state = ICMPS_FIRST;

 OvsConntrackUpdateExpiration(>up, now,
diff --git a/datapath-windows/ovsext/Conntrack-other.c 
b/datapath-windows/ovsext/Conntrack-other.c
index 6c68ba8..962cc8a 100644
--- a/datapath-windows/ovsext/Conntrack-other.c
+++ b/datapath-windows/ovsext/Conntrack-other.c
@@ -27,6 +27,7 @@ struct conn_other {
 struct OVS_CT_ENTRY up;
 enum other_state state;
 };
+C_ASSERT(offsetof(struct conn_other, up) == 0);

 static const long long other_timeouts[] = {
 [OTHERS_FIRST] = 60 * CT_INTERVAL_SEC,
diff --git a/datapath-windows/ovsext/Conntrack-tcp.c 
b/datapath-windows/ovsext/Conntrack-tcp.c
index f533b93..f8e85a2 100644
--- a/datapath-windows/ovsext/Conntrack-tcp.c
+++ b/datapath-windows/ovsext/Conntrack-tcp.c
@@ -51,6 +51,7 @@ struct conn_tcp {
 struct OVS_CT_ENTRY up;
 struct tcp_peer peer[2];
 };
+C_ASSERT(offsetof(struct conn_tcp, up) == 0);

 enum {
 TCPOPT_EOL,
--
2.9.0.windows.1

___

Acked-by: Shashank Ram 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] datapath-windows: Add validations in fragmentation module

2017-06-15 Thread Sairam Venugopal
Thanks for adding in the CVE.

Acked-by: Sairam Venugopal 





On 6/9/17, 7:54 PM, "ovs-dev-boun...@openvswitch.org on behalf of Anand Kumar" 
 wrote:

>- Minimum valid fragment size is 400 bytes, any fragment smaller
>is likely to be intentionally crafted (CVE-2000-0305).
>
>- Validate maximum length of an Ip datagram
>
>- Added counters to keep track of number of fragments for a given
>Ip datagram.
>
>Signed-off-by: Anand Kumar 
>Acked-by: Alin Gabriel Serdean 
>---
>v1->v2: Update commmit message to add (CVE-2000-0305) vulnerability
>---
> datapath-windows/ovsext/Actions.c|  2 +-
> datapath-windows/ovsext/IpFragment.c | 41 +---
> datapath-windows/ovsext/IpFragment.h |  2 ++
> 3 files changed, 32 insertions(+), 13 deletions(-)
>
>diff --git a/datapath-windows/ovsext/Actions.c 
>b/datapath-windows/ovsext/Actions.c
>index c3f0362..4eeaab3 100644
>--- a/datapath-windows/ovsext/Actions.c
>+++ b/datapath-windows/ovsext/Actions.c
>@@ -2030,7 +2030,7 @@ OvsDoExecuteActions(POVS_SWITCH_CONTEXT switchContext,
> if (status != NDIS_STATUS_SUCCESS) {
> /* Pending NBLs are consumed by Defragmentation. */
> if (status != NDIS_STATUS_PENDING) {
>-OVS_LOG_ERROR("CT Action failed");
>+OVS_LOG_ERROR("CT Action failed status = %lu", status);
> dropReason = L"OVS-conntrack action failed";
> } else {
> /* We added a new pending NBL to be consumed later.
>diff --git a/datapath-windows/ovsext/IpFragment.c 
>b/datapath-windows/ovsext/IpFragment.c
>index 0874cb9..e601a15 100644
>--- a/datapath-windows/ovsext/IpFragment.c
>+++ b/datapath-windows/ovsext/IpFragment.c
>@@ -25,6 +25,10 @@
> #undef OVS_DBG_MOD
> #endif
> #define OVS_DBG_MOD OVS_DBG_IPFRAG
>+/* Based on MIN_FRAGMENT_SIZE.*/
>+#define MAX_FRAGMENTS 164
>+#define MIN_FRAGMENT_SIZE 400
>+#define MAX_IPDATAGRAM_SIZE 65535
> 
> /* Function declarations */
> static VOID OvsIpFragmentEntryCleaner(PVOID data);
>@@ -146,8 +150,9 @@ OvsIpv4Reassemble(POVS_SWITCH_CONTEXT switchContext,
> IPHdr *ipHdr, *newIpHdr;
> CHAR *ethBuf[sizeof(EthHdr)];
> CHAR *packetBuf;
>-UINT16 ipHdrLen, packetLen, packetHeader;
>+UINT16 ipHdrLen, packetHeader;
> POVS_FRAGMENT_LIST head = NULL;
>+UINT32 packetLen;
> 
> curNb = NET_BUFFER_LIST_FIRST_NB(*curNbl);
> ASSERT(NET_BUFFER_NEXT_NB(curNb) == NULL);
>@@ -161,7 +166,10 @@ OvsIpv4Reassemble(POVS_SWITCH_CONTEXT switchContext,
> if (ipHdr == NULL) {
> return NDIS_STATUS_INVALID_PACKET;
> }
>-ipHdrLen = (UINT16)(ipHdr->ihl * 4);
>+ipHdrLen = ipHdr->ihl * 4;
>+if (ipHdrLen + entry->totalLen > MAX_IPDATAGRAM_SIZE) {
>+return NDIS_STATUS_INVALID_LENGTH;
>+}
> packetLen = ETH_HEADER_LENGTH + ipHdrLen + entry->totalLen;
> packetBuf = (CHAR*)OvsAllocateMemoryWithTag(packetLen,
> OVS_IPFRAG_POOL_TAG);
>@@ -185,7 +193,10 @@ OvsIpv4Reassemble(POVS_SWITCH_CONTEXT switchContext,
> packetHeader = ETH_HEADER_LENGTH + ipHdrLen;
> head = entry->head;
> while (head) {
>-ASSERT((packetHeader + head->offset) <= packetLen);
>+if ((UINT32)(packetHeader + head->offset) > packetLen) {
>+status = NDIS_STATUS_INVALID_DATA;
>+goto cleanup;
>+}
> NdisMoveMemory(packetBuf + packetHeader + head->offset,
>head->pbuff, head->len);
> head = head->next;
>@@ -197,10 +208,6 @@ OvsIpv4Reassemble(POVS_SWITCH_CONTEXT switchContext,
> status = NDIS_STATUS_RESOURCES;
> }
> 
>-OvsFreeMemoryWithTag(packetBuf, OVS_IPFRAG_POOL_TAG);
>-/* Timeout the entry so that clean up thread deletes it .*/
>-entry->expiration -= IPFRAG_ENTRY_TIMEOUT;
>-
> /* Complete the fragment NBL */
> ctx = (POVS_BUFFER_CONTEXT)NET_BUFFER_LIST_CONTEXT_DATA_START(*curNbl);
> if (ctx->flags & OVS_BUFFER_NEED_COMPLETE) {
>@@ -214,6 +221,9 @@ OvsIpv4Reassemble(POVS_SWITCH_CONTEXT switchContext,
> ctx = (POVS_BUFFER_CONTEXT)NET_BUFFER_LIST_CONTEXT_DATA_START(*newNbl);
> ctx->mru = entry->mru;
> *curNbl = *newNbl;
>+cleanup:
>+OvsFreeMemoryWithTag(packetBuf, OVS_IPFRAG_POOL_TAG);
>+entry->markedForDelete = TRUE;
> return status;
> }
> /*
>@@ -259,12 +269,15 @@ OvsProcessIpv4Fragment(POVS_SWITCH_CONTEXT switchContext,
> if (ipHdr == NULL) {
> return NDIS_STATUS_INVALID_PACKET;
> }
>-ipHdrLen = (UINT16)(ipHdr->ihl * 4);
>+ipHdrLen = ipHdr->ihl * 4;
> payloadLen = ntohs(ipHdr->tot_len) - ipHdrLen;
> offset = ntohs(ipHdr->frag_off) & IP_OFFSET;
> offset <<= 3;
> flags = ntohs(ipHdr->frag_off) & IP_MF;
>-
>+/* Only the last fragment can be of smaller size.*/
>+if (flags && 

Re: [ovs-dev] [PATCH] datapath-windows: Add support for UPDATE events in Conntrack

2017-06-15 Thread Sairam Venugopal
Acked-by: Sairam Venugopal 





On 6/12/17, 10:21 PM, "ovs-dev-boun...@openvswitch.org on behalf of Anand 
Kumar"  
wrote:

>  - Parse netlink ct attr OVS_CT_ATTR_EVENTMASK
>  - Add a new CT_EVENT_TYPE, OVS_EVENT_CT_UPDATE which is triggered
>only when CT_ATTR_EVENTMASK is set for MARK and LABEL updates.
>
>Signed-off-by: Anand Kumar 
>---
> datapath-windows/include/OvsDpInterfaceCtExt.h |  1 +
> datapath-windows/ovsext/Conntrack.c| 27 --
> datapath-windows/ovsext/Datapath.c |  3 +++
> datapath-windows/ovsext/DpInternal.h   |  3 ++-
> datapath-windows/ovsext/Event.c|  3 ++-
> lib/netlink-conntrack.c|  3 +++
> 6 files changed, 36 insertions(+), 4 deletions(-)
>
>diff --git a/datapath-windows/include/OvsDpInterfaceCtExt.h 
>b/datapath-windows/include/OvsDpInterfaceCtExt.h
>index 3b94778..45e7ff8 100644
>--- a/datapath-windows/include/OvsDpInterfaceCtExt.h
>+++ b/datapath-windows/include/OvsDpInterfaceCtExt.h
>@@ -154,6 +154,7 @@ enum cntl_msg_types {
> IPCTNL_MSG_CT_GET_STATS,
> IPCTNL_MSG_CT_GET_DYING,
> IPCTNL_MSG_CT_GET_UNCONFIRMED,
>+IPCTNL_MSG_CT_UPDATE,
> IPCTNL_MSG_MAX
> };
> 
>diff --git a/datapath-windows/ovsext/Conntrack.c 
>b/datapath-windows/ovsext/Conntrack.c
>index 68ed395..ab53993 100644
>--- a/datapath-windows/ovsext/Conntrack.c
>+++ b/datapath-windows/ovsext/Conntrack.c
>@@ -698,9 +698,11 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx,
>   MD_MARK *mark,
>   MD_LABELS *labels,
>   PCHAR helper,
>-  PNAT_ACTION_INFO natInfo)
>+  PNAT_ACTION_INFO natInfo,
>+  BOOLEAN postUpdateEvent)
> {
> NDIS_STATUS status = NDIS_STATUS_SUCCESS;
>+BOOLEAN sendUpdateEvent = FALSE;
> POVS_CT_ENTRY entry = NULL;
> PNET_BUFFER_LIST curNbl = fwdCtx->curNbl;
> OvsConntrackKeyLookupCtx ctx = { 0 };
>@@ -752,10 +754,16 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx,
> }
> 
> if (entry && mark) {
>+if (!entryCreated) {
>+sendUpdateEvent = TRUE;
>+}
> OvsConntrackSetMark(key, entry, mark->value, mark->mask);
> }
> 
> if (entry && labels) {
>+if (!entryCreated) {
>+sendUpdateEvent = TRUE;
>+}
> OvsConntrackSetLabels(key, entry, >value, >mask);
> }
> 
>@@ -790,6 +798,9 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx,
> if (entryCreated && entry) {
> OvsPostCtEventEntry(entry, OVS_EVENT_CT_NEW);
> }
>+if (postUpdateEvent && sendUpdateEvent) {
>+OvsPostCtEventEntry(entry, OVS_EVENT_CT_UPDATE);
>+}
> 
> NdisReleaseRWLock(ovsConntrackLockObj, );
> 
>@@ -811,7 +822,9 @@ OvsExecuteConntrackAction(OvsForwardingContext *fwdCtx,
> PNL_ATTR ctAttr;
> BOOLEAN commit = FALSE;
> BOOLEAN force = FALSE;
>+BOOLEAN postUpdateEvent = FALSE;
> UINT16 zone = 0;
>+UINT32 eventmask = 0;
> MD_MARK *mark = NULL;
> MD_LABELS *labels = NULL;
> PCHAR helper = NULL;
>@@ -932,9 +945,17 @@ OvsExecuteConntrackAction(OvsForwardingContext *fwdCtx,
> /* Force implicitly means commit */
> commit = TRUE;
> }
>+ctAttr = NlAttrFindNested(a, OVS_CT_ATTR_EVENTMASK);
>+if (ctAttr) {
>+eventmask = NlAttrGetU32(ctAttr);
>+/* Only mark and label updates are supported. */
>+if (eventmask & (1 << IPCT_MARK | 1 << IPCT_LABEL))
>+postUpdateEvent = TRUE;
>+}
> /* If newNbl is not allocated, use the current Nbl*/
> status = OvsCtExecute_(fwdCtx, key, layers,
>-   commit, force, zone, mark, labels, helper, 
>);
>+   commit, force, zone, mark, labels, helper, 
>,
>+   postUpdateEvent);
> return status;
> }
> 
>@@ -1290,6 +1311,8 @@ OvsCreateNlMsgFromCtEntry(POVS_CT_ENTRY entry,
> nlmsgType = (UINT16) (NFNL_SUBSYS_CTNETLINK << 8 | IPCTNL_MSG_CT_NEW);
> } else if (eventType == OVS_EVENT_CT_DELETE) {
> nlmsgType = (UINT16) (NFNL_SUBSYS_CTNETLINK << 8 | 
> IPCTNL_MSG_CT_DELETE);
>+} else if (eventType == OVS_EVENT_CT_UPDATE) {
>+nlmsgType = (UINT16) (NFNL_SUBSYS_CTNETLINK << 8 | 
>IPCTNL_MSG_CT_UPDATE);
> } else {
> return STATUS_INVALID_PARAMETER;
> }
>diff --git a/datapath-windows/ovsext/Datapath.c 
>b/datapath-windows/ovsext/Datapath.c
>index 83d996e..10412a1 100644
>--- a/datapath-windows/ovsext/Datapath.c
>+++ b/datapath-windows/ovsext/Datapath.c
>@@ -1312,6 +1312,9 @@ OvsSubscribeEventCmdHandler(POVS_USER_PARAMS_CONTEXT 
>usrParamsCtx,
> if (mcastGrp == NFNLGRP_CONNTRACK_DESTROY) {
> request.mask = OVS_EVENT_CT_DELETE;
> }
>+if (mcastGrp == NFNLGRP_CONNTRACK_UPDATE) {
>+request.mask = OVS_EVENT_CT_UPDATE;
>+}
> }
> 
> status = 

Re: [ovs-dev] [PATCH] datapath-windows: use NlAttrGet() in Conntrack.c

2017-06-15 Thread Sairam Venugopal
Acked-by: Sairam Venugopal 





On 6/14/17, 4:01 PM, "ovs-dev-boun...@openvswitch.org on behalf of Nithin Raju" 
 wrote:

>Couple of minor fixes that got flagged with a static checker.
>
>Signed-off-by: Nithin Raju 
>---
> datapath-windows/ovsext/Conntrack.c| 14 ++
> datapath-windows/ovsext/Netlink/Netlink.c  |  2 +-
> datapath-windows/ovsext/Netlink/NetlinkProto.h |  2 +-
> 3 files changed, 4 insertions(+), 14 deletions(-)
>
>diff --git a/datapath-windows/ovsext/Conntrack.c 
>b/datapath-windows/ovsext/Conntrack.c
>index 68ed395..07a9583 100644
>--- a/datapath-windows/ovsext/Conntrack.c
>+++ b/datapath-windows/ovsext/Conntrack.c
>@@ -863,23 +863,13 @@ OvsExecuteConntrackAction(OvsForwardingContext *fwdCtx,
> ? NAT_ACTION_SRC : NAT_ACTION_DST);
> break;
> case OVS_NAT_ATTR_IP_MIN:
>-   if (natAttr->nlaLen < NLA_HDRLEN) {
>-OVS_LOG_ERROR("Incorrect header length for "
>-  "OVS_NAT_ATTR_IP_MIN message.");
>-break;
>-}
> memcpy(,
>-   NlAttrData(natAttr), natAttr->nlaLen - NLA_HDRLEN);
>+   NlAttrData(natAttr), NlAttrGetSize(natAttr));
> hasMinIp = TRUE;
> break;
> case OVS_NAT_ATTR_IP_MAX:
>-if (natAttr->nlaLen < NLA_HDRLEN) {
>-OVS_LOG_ERROR("Incorrect header length for "
>-  "OVS_NAT_ATTR_IP_MAX message.");
>-break;
>-}
> memcpy(,
>-   NlAttrData(natAttr), natAttr->nlaLen - NLA_HDRLEN);
>+   NlAttrData(natAttr), NlAttrGetSize(natAttr));
> hasMaxIp = TRUE;
> break;
> case OVS_NAT_ATTR_PROTO_MIN:
>diff --git a/datapath-windows/ovsext/Netlink/Netlink.c 
>b/datapath-windows/ovsext/Netlink/Netlink.c
>index a63f066..156732c 100644
>--- a/datapath-windows/ovsext/Netlink/Netlink.c
>+++ b/datapath-windows/ovsext/Netlink/Netlink.c
>@@ -1000,7 +1000,7 @@ PCHAR
> NlAttrGetString(const PNL_ATTR nla)
> {
> ASSERT(nla->nlaLen >= NLA_HDRLEN);
>-if (!memchr(NlAttrGet(nla), '\0', nla->nlaLen - NLA_HDRLEN)) {
>+if (!memchr(NlAttrGet(nla), '\0', NlAttrGetSize(nla))) {
> return NULL;
> }
> return NlAttrGet(nla);
>diff --git a/datapath-windows/ovsext/Netlink/NetlinkProto.h 
>b/datapath-windows/ovsext/Netlink/NetlinkProto.h
>index 5175311..59b5656 100644
>--- a/datapath-windows/ovsext/Netlink/NetlinkProto.h
>+++ b/datapath-windows/ovsext/Netlink/NetlinkProto.h
>@@ -123,7 +123,7 @@ BUILD_ASSERT_DECL(sizeof(NL_ATTR) == 4);
> #define GENL_HDRLEN NLMSG_ALIGN(sizeof(GENL_MSG_HDR))
> #define NF_GEN_MSG_HDRLEN NLMSG_ALIGN(sizeof(NF_GEN_MSG_HDR))
> #define OVS_HDRLEN NLMSG_ALIGN(sizeof(OVS_HDR))
>-#define NLA_HDRLEN ((INT) NLA_ALIGN(sizeof(NL_ATTR)))
>+#define NLA_HDRLEN ((UINT16) NLA_ALIGN(sizeof(NL_ATTR)))
> 
> #define NETLINK_NETFILTER   12
> #define NETLINK_GENERIC 16
>-- 
>2.7.1.windows.1
>
>___
>dev mailing list
>d...@openvswitch.org
>https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=Z6vowHUOjP5ysP_g372c49Nqc1vEKqHKNBkR5Q5Z7uo=Gcu2sMIlG_kV-8opOt_iKbhfjzkPO9cYR2j-OsbKWtg=7_Bz1HQYeFWidKM6e_f97gb2MHM3ySqlpHx02X4XDtY=
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath-windows: Define NAT_ACTION enum correctly

2017-06-15 Thread Sairam Venugopal
Acked-by: Sairam Venugopal 





On 6/15/17, 12:46 PM, "ovs-dev-boun...@openvswitch.org on behalf of Shashank 
Ram"  wrote:

>The existing code throws a warning when compiled
>with the Windows 10 SDK:
>'typedef ': ignored on left of 'NAT_ACTION' when no variable is declared
>
>Signed-off-by: Shashank Ram 
>---
> datapath-windows/ovsext/Conntrack.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/datapath-windows/ovsext/Conntrack.h 
>b/datapath-windows/ovsext/Conntrack.h
>index fcdd96e..04ca299 100644
>--- a/datapath-windows/ovsext/Conntrack.h
>+++ b/datapath-windows/ovsext/Conntrack.h
>@@ -69,14 +69,14 @@ typedef struct MD_LABELS {
> struct ovs_key_ct_labels mask;
> } MD_LABELS;
> 
>-typedef enum NAT_ACTION {
>+typedef enum _NAT_ACTION {
> NAT_ACTION_NONE = 0,
> NAT_ACTION_REVERSE = 1 << 0,
> NAT_ACTION_SRC = 1 << 1,
> NAT_ACTION_SRC_PORT = 1 << 2,
> NAT_ACTION_DST = 1 << 3,
> NAT_ACTION_DST_PORT = 1 << 4,
>-};
>+} NAT_ACTION;
> 
> typedef struct _OVS_CT_KEY {
> struct ct_endpoint src;
>-- 
>2.9.3.windows.2
>
>___
>dev mailing list
>d...@openvswitch.org
>https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=Z6vowHUOjP5ysP_g372c49Nqc1vEKqHKNBkR5Q5Z7uo=3Dz3HHrBXQtG1qmIwlchkCobPqUjawGcRt9P4TMS6MA=G14YPFl6TG_Nkp3lTLYzBJsy2SiG9meKRUs8j-taX3w=
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] datapath-windows: Zero initialize Conntrack-ICMP entry

2017-06-15 Thread Sairam Venugopal
Set conntrack-icmp entry to {0}. Add some compile time asserts to ensure
that conn_* struct's first member is OVS_CT_ENTRY.

Signed-off-by: Sairam Venugopal 
---
 datapath-windows/ovsext/Conntrack-icmp.c  | 3 ++-
 datapath-windows/ovsext/Conntrack-other.c | 1 +
 datapath-windows/ovsext/Conntrack-tcp.c   | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/datapath-windows/ovsext/Conntrack-icmp.c 
b/datapath-windows/ovsext/Conntrack-icmp.c
index b1b6043..4da0665 100644
--- a/datapath-windows/ovsext/Conntrack-icmp.c
+++ b/datapath-windows/ovsext/Conntrack-icmp.c
@@ -27,6 +27,7 @@ struct conn_icmp {
 struct OVS_CT_ENTRY up;
 enum icmp_state state;
 };
+C_ASSERT(offsetof(struct conn_icmp, up) == 0);
 
 static const enum ct_timeout icmp_timeouts[] = {
 [ICMPS_FIRST] = 60 * CT_INTERVAL_SEC,
@@ -78,7 +79,7 @@ OvsConntrackCreateIcmpEntry(UINT64 now)
 if (!conn) {
 return NULL;
 }
-
+conn->up = (OVS_CT_ENTRY) {0};
 conn->state = ICMPS_FIRST;
 
 OvsConntrackUpdateExpiration(>up, now,
diff --git a/datapath-windows/ovsext/Conntrack-other.c 
b/datapath-windows/ovsext/Conntrack-other.c
index 6c68ba8..962cc8a 100644
--- a/datapath-windows/ovsext/Conntrack-other.c
+++ b/datapath-windows/ovsext/Conntrack-other.c
@@ -27,6 +27,7 @@ struct conn_other {
 struct OVS_CT_ENTRY up;
 enum other_state state;
 };
+C_ASSERT(offsetof(struct conn_other, up) == 0);
 
 static const long long other_timeouts[] = {
 [OTHERS_FIRST] = 60 * CT_INTERVAL_SEC,
diff --git a/datapath-windows/ovsext/Conntrack-tcp.c 
b/datapath-windows/ovsext/Conntrack-tcp.c
index f533b93..f8e85a2 100644
--- a/datapath-windows/ovsext/Conntrack-tcp.c
+++ b/datapath-windows/ovsext/Conntrack-tcp.c
@@ -51,6 +51,7 @@ struct conn_tcp {
 struct OVS_CT_ENTRY up;
 struct tcp_peer peer[2];
 };
+C_ASSERT(offsetof(struct conn_tcp, up) == 0);
 
 enum {
 TCPOPT_EOL,
-- 
2.9.0.windows.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath-windows: Define NAT_ACTION enum correctly

2017-06-15 Thread Nithin Raju
On Jun 15, 2017, at 12:46 PM, Shashank Ram 
> wrote:

The existing code throws a warning when compiled
with the Windows 10 SDK:
'typedef ': ignored on left of 'NAT_ACTION' when no variable is declared

Signed-off-by: Shashank Ram >

Acked-by: Nithin Raju >

Thanks,
-- Nithin

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] datapath-windows: Define NAT_ACTION enum correctly

2017-06-15 Thread Shashank Ram
The existing code throws a warning when compiled
with the Windows 10 SDK:
'typedef ': ignored on left of 'NAT_ACTION' when no variable is declared

Signed-off-by: Shashank Ram 
---
 datapath-windows/ovsext/Conntrack.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack.h 
b/datapath-windows/ovsext/Conntrack.h
index fcdd96e..04ca299 100644
--- a/datapath-windows/ovsext/Conntrack.h
+++ b/datapath-windows/ovsext/Conntrack.h
@@ -69,14 +69,14 @@ typedef struct MD_LABELS {
 struct ovs_key_ct_labels mask;
 } MD_LABELS;
 
-typedef enum NAT_ACTION {
+typedef enum _NAT_ACTION {
 NAT_ACTION_NONE = 0,
 NAT_ACTION_REVERSE = 1 << 0,
 NAT_ACTION_SRC = 1 << 1,
 NAT_ACTION_SRC_PORT = 1 << 2,
 NAT_ACTION_DST = 1 << 3,
 NAT_ACTION_DST_PORT = 1 << 4,
-};
+} NAT_ACTION;
 
 typedef struct _OVS_CT_KEY {
 struct ct_endpoint src;
-- 
2.9.3.windows.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 15/31] fixup: Change from global modal behavior to L3 tunnel only mode.

2017-06-15 Thread Jean Tourrilhes
Jan Scheurich wrote :
> 

Just a warning. I forgotten everything about my code, so those
are only high level thougts.

> 1. Legacy controllers that are not packet-type aware must not be
> affected by the introduction of packet_type in OVS. The externally
> visible behavior of OVS must not change as long as OVS used in
> "legacy" mode:

The way this is usually done by OpenFlow and OVS is by using
the OpenFlow version negociated by the switch. This would work
beautifully here :
If the controller negotiate 1.4 or earlier, the switch deals
with it in "legacy" mode.
If the controller negotiate 1.5 or later, the switch deals
with it in "ptap" mode.

A few more details...
Requirement (1) is not valid for 1.5 controllers. There is no
excuse for a controller supporting 1.5 to no support packet type, it's
part of the spec ;-)
I claim that a switch can't properly support EXT-112/PTAP in
OF prior to version 1.5. This is why EXT-112 was not ported to OF 1.3
as an extension. For example, prior to 1.5 the controller can't do
packet-out on a PTAP switch, which is pretty major.
There is no "the controller", there is "one of the
controllers". Even ovs-ofctl is effectively a controller. The switch
has to deal with a mix of "legacy" and "ptap" controllers. So,
a single modal switch won't work, it needs to be a per-controller
setting.
I think a "legacy" controller can't work with PTAP flow
entries and PTAP packet-in. Section 7.1.3 of OF 1.3.5 can only help so
much. Converting those to "ethernet" when sending to the controller is
dodgy, so IMHO the only way is to supress them. I don't remember what
I did in my prototype patch.
A tunnel/port setting seems an easy way to avoid a lot of
complexity. It's annoying to have such setting, it increases confusion
and chances of misconfiguration. The alternative would be for every
flow entry set by a non-PTAP controller to be duplicated as an
Ethernet and L3 flow. Yuck.
IMHO, there is no way for a "legacy" controller to manage
a "ptap" L3 tunnel.

> You say "tunnel ports configured as layer3 can only send and receive
> L3 packets." without detailing what that means.

The switch need to make sure that only the appropriate packet
types are sent on the tunnel, VxLAN can not encaps L3 packets and LISP
can not encaps L2 packets, and GRE can encaps both L2 and L3.
In most cases, the tunnel knows the encaps it's dealing with,
so in theory it could check the outgoing packet type and drop
offending types itself. Same thing in reverse when packet out of the
tunnel must be tagged with the proper type. Again, in most cases the
decapsulation knows what packet type it receives and can select the
tag.
I guess, it depends how and where you want to implement such
filtering/tagging, and what kind of enforcement. OpenFlow has always
been "let the controller shoot itself in the foot".

> The alternative to having both bridge property "legacy-l3-tunnel"
> and a port property "layer3" would be to go for a single port
> property "tunnel-mode" with three values "layer2" (default),
> "layer3" and "versatile".

It's clear to me that having both a bridge property
(legacy-l3-tunnel) and a port property (layer3) is confusing, not only
for the user but also for the implementers ;-) I don't see why you
need to have a setting for the whole bridge, as it's really a property
of the specific port.
I can even imagine some bridges mixing up some tunnels in
legacy mode and some tunnels in ptap mode. Let's go crazy.
That is why you propose to have the following per-port
property, and no per-bridge setting :
1) layer3=layer2-only   // Legacy L2
2) layer3=layer3-to-ethernet// Legacy L3
3) layer3=versatile // PTAP aware

But, I believe you can go one step further in the
simplification. You just need to realise that (1) and (3) are
effectively the same thing, if you assume the port knows what it is
doing. There is no point in denying non-L2 outgoing packets at the
property level in (1), because the encaps itself will take care of it
(can't encode it, drop). And if a tunnel in (1) generate an incomming
PTAP packet, OVS classifier will do the right thing (match nothing,
drop, can't packet-in to <1.5 controller).
So, in conclusion, you just need a single per port boolean,
and no per-bridge setting :
convert-to-ethernet=true/false

Regards,

Jean
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] How to efficiently connect docker network to ovs+dpdk switch

2017-06-15 Thread Darrell Ball
Pls use the ovs-discuss mailing list


On 6/15/17, 5:46 AM, "ovs-dev-boun...@openvswitch.org on behalf of 王志克" 
 wrote:

Hi All,

Previously I use kernel ovs, and docker veth-pair port can be added to ovs 
bridge directly. In this case, docker traffic from kernel will direct to ovs 
kernel module.

Now I want to use ovs+dpdk to speed up the forwarding performance, but I am 
wondering how docker traffic would go to ovs userspace bridge. From my 
understanding, veth-pair traffic would
always go from kernel, and it needs to be copied to userspace, then bridged 
by ovs+dpdk. The performance is quite low.

So question:
1) What is the proposed docker network port for ovs+dpdk? Ideally kernel 
should NOT be involved. I am not sure whether it is possible.
2) currently veth-pair port can only be handled by main thread 
(NON_PMD_CORE_ID) since its is_pmd attributes is false. This means it can only 
be handled by 1 CPU. How can multiple CPU handle such case?

Br,
Wang Zhike

___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=XStPvCJIaKLmJge8S1ORuqJ8EtXnY3eVeGxZgW7F3Xo=jAxn10xcjrlumptia0fIu73EfFFut5hMd2_P-GjZ5HY=
 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath-windows: use NlAttrGet() in Conntrack.c

2017-06-15 Thread Shashank Ram


From: ovs-dev-boun...@openvswitch.org  on 
behalf of Nithin Raju 
Sent: Wednesday, June 14, 2017 4:01 PM
To: d...@openvswitch.org
Subject: [ovs-dev] [PATCH] datapath-windows: use NlAttrGet() in Conntrack.c

Couple of minor fixes that got flagged with a static checker.

Signed-off-by: Nithin Raju 
---
 datapath-windows/ovsext/Conntrack.c| 14 ++
 datapath-windows/ovsext/Netlink/Netlink.c  |  2 +-
 datapath-windows/ovsext/Netlink/NetlinkProto.h |  2 +-
 3 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack.c 
b/datapath-windows/ovsext/Conntrack.c
index 68ed395..07a9583 100644
--- a/datapath-windows/ovsext/Conntrack.c
+++ b/datapath-windows/ovsext/Conntrack.c
@@ -863,23 +863,13 @@ OvsExecuteConntrackAction(OvsForwardingContext *fwdCtx,
 ? NAT_ACTION_SRC : NAT_ACTION_DST);
 break;
 case OVS_NAT_ATTR_IP_MIN:
-   if (natAttr->nlaLen < NLA_HDRLEN) {
-OVS_LOG_ERROR("Incorrect header length for "
-  "OVS_NAT_ATTR_IP_MIN message.");
-break;
-}
 memcpy(,
-   NlAttrData(natAttr), natAttr->nlaLen - NLA_HDRLEN);
+   NlAttrData(natAttr), NlAttrGetSize(natAttr));
 hasMinIp = TRUE;
 break;
 case OVS_NAT_ATTR_IP_MAX:
-if (natAttr->nlaLen < NLA_HDRLEN) {
-OVS_LOG_ERROR("Incorrect header length for "
-  "OVS_NAT_ATTR_IP_MAX message.");
-break;
-}
 memcpy(,
-   NlAttrData(natAttr), natAttr->nlaLen - NLA_HDRLEN);
+   NlAttrData(natAttr), NlAttrGetSize(natAttr));
 hasMaxIp = TRUE;
 break;
 case OVS_NAT_ATTR_PROTO_MIN:
diff --git a/datapath-windows/ovsext/Netlink/Netlink.c 
b/datapath-windows/ovsext/Netlink/Netlink.c
index a63f066..156732c 100644
--- a/datapath-windows/ovsext/Netlink/Netlink.c
+++ b/datapath-windows/ovsext/Netlink/Netlink.c
@@ -1000,7 +1000,7 @@ PCHAR
 NlAttrGetString(const PNL_ATTR nla)
 {
 ASSERT(nla->nlaLen >= NLA_HDRLEN);
-if (!memchr(NlAttrGet(nla), '\0', nla->nlaLen - NLA_HDRLEN)) {
+if (!memchr(NlAttrGet(nla), '\0', NlAttrGetSize(nla))) {
 return NULL;
 }
 return NlAttrGet(nla);
diff --git a/datapath-windows/ovsext/Netlink/NetlinkProto.h 
b/datapath-windows/ovsext/Netlink/NetlinkProto.h
index 5175311..59b5656 100644
--- a/datapath-windows/ovsext/Netlink/NetlinkProto.h
+++ b/datapath-windows/ovsext/Netlink/NetlinkProto.h
@@ -123,7 +123,7 @@ BUILD_ASSERT_DECL(sizeof(NL_ATTR) == 4);
 #define GENL_HDRLEN NLMSG_ALIGN(sizeof(GENL_MSG_HDR))
 #define NF_GEN_MSG_HDRLEN NLMSG_ALIGN(sizeof(NF_GEN_MSG_HDR))
 #define OVS_HDRLEN NLMSG_ALIGN(sizeof(OVS_HDR))
-#define NLA_HDRLEN ((INT) NLA_ALIGN(sizeof(NL_ATTR)))
+#define NLA_HDRLEN ((UINT16) NLA_ALIGN(sizeof(NL_ATTR)))

 #define NETLINK_NETFILTER   12
 #define NETLINK_GENERIC 16
--
2.7.1.windows.1

___

Acked-by: Shashank Ram 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] rte_eal_init() error when using ovs-dpdk with secondary application.

2017-06-15 Thread Aaron Conole
Junguk Cho  writes:

> Hi, Aaron.
>
> Thank you for reply.
>
> You will need to use populate this option into other_config:dpdk-extra
> in the ovsdb.
>
>   ovs-vsctl --no-wait set Open_vSwitch \
>   . other_config:dpdk-extra="--base-virtaddr=..."
>
> -> I tried to use "struct rte_memseg *m = rte_eal_get_physmem_layout()" 
> option based on
> (http://dpdk.org/doc/api/structrte__memseg.html).
> Always it returns different number. 
> How do I get virtual address value to use it as an input of "base-viraddr"?
>
> I don't know the base virtual address value you should use, however.
> -> Do you mean it will not help? 

I mean I don't know what value to use.

> Thanks,
> Junguk
>
> On Thu, Jun 15, 2017 at 9:28 AM, Aaron Conole  wrote:
>
>  Junguk Cho  writes:
>
>  > Hi,
>  >
>  > I use ovs-dpdk (ovs-2.7, dpdk-16.11.1) with one application which talks to
>  > ovs by using ring device and "--proc-type=secondary" (secondary processes).
>  > It generally works well, but sometimes it shows this error.
>  >
>  > It seems it could not find correct memory mapping in the application.
>  >
>  > 2017-06-13 11:21:40.988 STDERR Thread-2 [INFO] EAL: Probing VFIO support...
>  > 2017-06-13 11:21:40.988 STDERR Thread-2 [INFO] EAL: WARNING: Address Space
>  > Layout Randomization (ASLR) is enabled in the kernel.
>  > 2017-06-13 11:21:40.988 STDERR Thread-2 [INFO] EAL:This may cause
>  > issues with mapping memory into secondary processes
>  > 2017-06-13 11:21:40.989 STDERR Thread-2 [INFO] EAL: Could not mmap
>  > 17146314752 <(714)%20631-4752> bytes in /dev/zero at [0x7f662c80], got
>  > [0x7f650600] - please use '--base-virtaddr' option
>  > 2017-06-13 11:21:40.989 STDERR Thread-2 [INFO] EAL: It is recommended to
>  > disable ASLR in the kernel and retry running both primary and secondary
>  > processes
>  > 2017-06-13 11:21:40.990 STDERR Thread-2 [INFO] PANIC in rte_eal_init():
>  > 2017-06-13 11:21:40.990 STDERR Thread-2 [INFO] Cannot init memory
>  >
>  > I tried to disable ASLR, but it did not work.
>  >
>  > Does anyone have a similar situation?
>  > Based on this page (https://github.com/collectd/collectd/issues/2284), this
>  > is known issue and it seemed we can avoid this with "--base-virtaddr"
>  > option.
>  > Does anyone know how to use this option in ovs-dpdk?
>
>  You will need to use populate this option into other_config:dpdk-extra
>  in the ovsdb.
>
>ovs-vsctl --no-wait set Open_vSwitch \
>. other_config:dpdk-extra="--base-virtaddr=..."
>
>  I don't know the base virtual address value you should use, however.
>
>  -Aaron
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] rte_eal_init() error when using ovs-dpdk with secondary application.

2017-06-15 Thread Junguk Cho
Hi, Aaron.

Thank you for reply.

You will need to use populate this option into other_config:dpdk-extra
in the ovsdb.

  ovs-vsctl --no-wait set Open_vSwitch \
  . other_config:dpdk-extra="--base-virtaddr=..."

-> I tried to use "struct rte_memseg *m = rte_eal_get_physmem_layout()"
option based on (http://dpdk.org/doc/api/structrte__memseg.html).
Always it returns different number.
How do I get virtual address value to use it as an input of "base-viraddr"?

I don't know the base virtual address value you should use, however.
-> Do you mean it will not help?

Thanks,
Junguk


On Thu, Jun 15, 2017 at 9:28 AM, Aaron Conole  wrote:

> Junguk Cho  writes:
>
> > Hi,
> >
> > I use ovs-dpdk (ovs-2.7, dpdk-16.11.1) with one application which talks
> to
> > ovs by using ring device and "--proc-type=secondary" (secondary
> processes).
> > It generally works well, but sometimes it shows this error.
> >
> > It seems it could not find correct memory mapping in the application.
> >
> > 2017-06-13 11:21:40.988 STDERR Thread-2 [INFO] EAL: Probing VFIO
> support...
> > 2017-06-13 11:21:40.988 STDERR Thread-2 [INFO] EAL: WARNING: Address
> Space
> > Layout Randomization (ASLR) is enabled in the kernel.
> > 2017-06-13 11:21:40.988 STDERR Thread-2 [INFO] EAL:This may cause
> > issues with mapping memory into secondary processes
> > 2017-06-13 11:21:40.989 STDERR Thread-2 [INFO] EAL: Could not mmap
> > 17146314752 <(714)%20631-4752> bytes in /dev/zero at [0x7f662c80],
> got
> > [0x7f650600] - please use '--base-virtaddr' option
> > 2017-06-13 11:21:40.989 STDERR Thread-2 [INFO] EAL: It is recommended to
> > disable ASLR in the kernel and retry running both primary and secondary
> > processes
> > 2017-06-13 11:21:40.990 STDERR Thread-2 [INFO] PANIC in rte_eal_init():
> > 2017-06-13 11:21:40.990 STDERR Thread-2 [INFO] Cannot init memory
> >
> > I tried to disable ASLR, but it did not work.
> >
> > Does anyone have a similar situation?
> > Based on this page (https://github.com/collectd/collectd/issues/2284),
> this
> > is known issue and it seemed we can avoid this with "--base-virtaddr"
> > option.
> > Does anyone know how to use this option in ovs-dpdk?
>
> You will need to use populate this option into other_config:dpdk-extra
> in the ovsdb.
>
>   ovs-vsctl --no-wait set Open_vSwitch \
>   . other_config:dpdk-extra="--base-virtaddr=..."
>
> I don't know the base virtual address value you should use, however.
>
> -Aaron
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] rte_eal_init() error when using ovs-dpdk with secondary application.

2017-06-15 Thread Aaron Conole
Junguk Cho  writes:

> Hi,
>
> I use ovs-dpdk (ovs-2.7, dpdk-16.11.1) with one application which talks to
> ovs by using ring device and "--proc-type=secondary" (secondary processes).
> It generally works well, but sometimes it shows this error.
>
> It seems it could not find correct memory mapping in the application.
>
> 2017-06-13 11:21:40.988 STDERR Thread-2 [INFO] EAL: Probing VFIO support...
> 2017-06-13 11:21:40.988 STDERR Thread-2 [INFO] EAL: WARNING: Address Space
> Layout Randomization (ASLR) is enabled in the kernel.
> 2017-06-13 11:21:40.988 STDERR Thread-2 [INFO] EAL:This may cause
> issues with mapping memory into secondary processes
> 2017-06-13 11:21:40.989 STDERR Thread-2 [INFO] EAL: Could not mmap
> 17146314752 <(714)%20631-4752> bytes in /dev/zero at [0x7f662c80], got
> [0x7f650600] - please use '--base-virtaddr' option
> 2017-06-13 11:21:40.989 STDERR Thread-2 [INFO] EAL: It is recommended to
> disable ASLR in the kernel and retry running both primary and secondary
> processes
> 2017-06-13 11:21:40.990 STDERR Thread-2 [INFO] PANIC in rte_eal_init():
> 2017-06-13 11:21:40.990 STDERR Thread-2 [INFO] Cannot init memory
>
> I tried to disable ASLR, but it did not work.
>
> Does anyone have a similar situation?
> Based on this page (https://github.com/collectd/collectd/issues/2284), this
> is known issue and it seemed we can avoid this with "--base-virtaddr"
> option.
> Does anyone know how to use this option in ovs-dpdk?

You will need to use populate this option into other_config:dpdk-extra
in the ovsdb.

  ovs-vsctl --no-wait set Open_vSwitch \
  . other_config:dpdk-extra="--base-virtaddr=..."

I don't know the base virtual address value you should use, however.

-Aaron
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 01/31] userspace: Add OXM field MFF_PACKET_TYPE

2017-06-15 Thread Ben Pfaff
On Thu, Jun 15, 2017 at 11:16:15AM -0400, Aaron Conole wrote:
> Hi Ben,
> 
> Ben Pfaff  writes:
> 
> > From: Jan Scheurich 
> >
> > Allow packet type namespace OFPHTN_ETHERTYPE as alternative pre-requisite
> > for matching L3 protocols (MPLS, IP, IPv6, ARP etc).
> >
> > Change the meta-flow definition of packet_type field to use the new
> > custom format MFS_PACKET_TYPE representing "(NS,NS_TYPE)".
> >
> > Parsing routine for MFS_PACKET_TYPE added to meta-flow.c. Formatting
> > routine for field packet_type extracted from match_format() and moved to
> > flow.c to be used from meta-flow.c for formatting MFS_PACKET_TYPE.
> >
> > Updated the ovs-fields man page source meta-flow.xml with documentation
> > for packet-type-aware bridges and added documentation for field packet_type.
> >
> > Added packet_type to the matching properties in tests/ofproto.at. Should be
> > removed later, when packet_type_aware bridge attribute will be introduced.
> >
> > Signed-off-by: Jan Scheurich 
> > Signed-off-by: Ben Pfaff 
> 
> Does this need a co-authored by?  If not, there's probably a useful
> change from the checkpatch side to match From: tags with Signed-off-by:
> tags (for these delivery cases to squelch the error message).

This doesn't need a co-authored-by because I'm merely passing along a
patch authored by someone else.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 01/31] userspace: Add OXM field MFF_PACKET_TYPE

2017-06-15 Thread Aaron Conole
Hi Ben,

Ben Pfaff  writes:

> From: Jan Scheurich 
>
> Allow packet type namespace OFPHTN_ETHERTYPE as alternative pre-requisite
> for matching L3 protocols (MPLS, IP, IPv6, ARP etc).
>
> Change the meta-flow definition of packet_type field to use the new
> custom format MFS_PACKET_TYPE representing "(NS,NS_TYPE)".
>
> Parsing routine for MFS_PACKET_TYPE added to meta-flow.c. Formatting
> routine for field packet_type extracted from match_format() and moved to
> flow.c to be used from meta-flow.c for formatting MFS_PACKET_TYPE.
>
> Updated the ovs-fields man page source meta-flow.xml with documentation
> for packet-type-aware bridges and added documentation for field packet_type.
>
> Added packet_type to the matching properties in tests/ofproto.at. Should be
> removed later, when packet_type_aware bridge attribute will be introduced.
>
> Signed-off-by: Jan Scheurich 
> Signed-off-by: Ben Pfaff 

Does this need a co-authored by?  If not, there's probably a useful
change from the checkpatch side to match From: tags with Signed-off-by:
tags (for these delivery cases to squelch the error message).

-Aaron
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 2/3] dpif-netdev: Avoid port's reconfiguration on pmd-cpu-mask changes.

2017-06-15 Thread Stokes, Ian


> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Thursday, June 15, 2017 12:37 PM
> To: d...@openvswitch.org; Darrell Ball 
> Cc: Heetae Ahn ; Daniele Di Proietto
> ; Ben Pfaff ; Pravin Shelar
> ; Loftus, Ciara ; Stokes, Ian
> ; Kevin Traynor ; Ilya Maximets
> 
> Subject: [PATCH v3 2/3] dpif-netdev: Avoid port's reconfiguration on pmd-
> cpu-mask changes.
> 
> Reconfiguration of HW NICs may lead to packet drops.
> In current model all physical ports will be reconfigured each time number
> of PMD threads changed. Since we not stopping threads on pmd-cpu-mask
> changes, this patch will help to further decrease port's downtime by
> setting the maximum possible number of wanted tx queues to avoid
> unnecessary reconfigurations.
> 
> Signed-off-by: Ilya Maximets 
> Tested-by: Ian Stokes 
> ---
>  lib/dpif-netdev.c | 26 +-
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index c0bcca0..d2443da
> 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3450,7 +3450,7 @@ reconfigure_datapath(struct dp_netdev *dp)  {
>  struct dp_netdev_pmd_thread *pmd;
>  struct dp_netdev_port *port;
> -int wanted_txqs;
> +int needed_txqs, wanted_txqs;
> 
>  dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
> 
> @@ -3458,7 +3458,15 @@ reconfigure_datapath(struct dp_netdev *dp)
>   * on the system and the user configuration. */
>  reconfigure_pmd_threads(dp);
> 
> -wanted_txqs = cmap_count(>poll_threads);
> +/* We need 1 Tx queue for each thread to avoid locking, but we will
> try
> + * to allocate the maximum possible value to minimize the number of
> port
> + * reconfigurations. */
> +needed_txqs = cmap_count(>poll_threads);
> +/* (n_cores + 1) is the maximum that we might need to have.
> + * Additional queue is for non-PMD threads. */
> +wanted_txqs = ovs_numa_get_n_cores();
> +ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
> +wanted_txqs++;
> 
>  /* The number of pmd threads might have changed, or a port can be
> new:
>   * adjust the txqs. */
> @@ -3471,9 +3479,17 @@ reconfigure_datapath(struct dp_netdev *dp)
> 
>  /* Check for all the ports that need reconfiguration.  We cache this
> in
>   * 'port->reconfigure', because netdev_is_reconf_required() can
> change at
> - * any time. */
> + * any time.
> + * Also mark for reconfiguration all ports which will likely change
> their
> + * 'dynamic_txqs' parameter. It's required to stop using them before
> + * changing this setting and it's simpler to mark ports here and
> allow
> + * 'pmd_remove_stale_ports' to remove them from threads. There will
> be
> + * no actual reconfiguration in 'port_reconfigure' because it's
> + * unnecessary.  */
>  HMAP_FOR_EACH (port, node, >ports) {
> -if (netdev_is_reconf_required(port->netdev)) {
> +if (netdev_is_reconf_required(port->netdev)
> +|| (port->dynamic_txqs
> +!= (netdev_n_txq(port->netdev) < needed_txqs))) {
>  port->need_reconfigure = true;
>  }
>  }
> @@ -3508,7 +3524,7 @@ reconfigure_datapath(struct dp_netdev *dp)
>  seq_change(dp->port_seq);
>  port_destroy(port);
>  } else {
> -port->dynamic_txqs = netdev_n_txq(port->netdev) <
> wanted_txqs;
> +port->dynamic_txqs = netdev_n_txq(port->netdev) <
> + needed_txqs;
>  }
>  }
> 
LGTM, 

Acked-by: Ian Stokes 
> --
> 2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] How to efficiently connect docker network to ovs+dpdk switch

2017-06-15 Thread 王志克
Hi All,

Previously I use kernel ovs, and docker veth-pair port can be added to ovs 
bridge directly. In this case, docker traffic from kernel will direct to ovs 
kernel module.

Now I want to use ovs+dpdk to speed up the forwarding performance, but I am 
wondering how docker traffic would go to ovs userspace bridge. From my 
understanding, veth-pair traffic would
always go from kernel, and it needs to be copied to userspace, then bridged by 
ovs+dpdk. The performance is quite low.

So question:
1) What is the proposed docker network port for ovs+dpdk? Ideally kernel should 
NOT be involved. I am not sure whether it is possible.
2) currently veth-pair port can only be handled by main thread 
(NON_PMD_CORE_ID) since its is_pmd attributes is false. This means it can only 
be handled by 1 CPU. How can multiple CPU handle such case?

Br,
Wang Zhike

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Fwd: Re: [PATCH V11 00/33] Introducing HW offload support for openvswitch

2017-06-15 Thread Eelco Chaudron

FYI HW offload applied to OVS master ;)

//Eelco


 Forwarded Message 
Subject: 	Re: [ovs-dev] [PATCH V11 00/33] Introducing HW offload support 
for openvswitch

Date:   Thu, 15 Jun 2017 12:15:20 +0200
From:   Simon Horman 
To: Roi Dayan 
CC: 	d...@openvswitch.org, Shahar Klein , Hadar Hen 
Zion , Rony Efraim , Flavio 
Leitner , Jiri Pirko , Marcelo 
Ricardo Leitner , Or Gerlitz 
, Andy Gospodarek 




On Tue, Jun 13, 2017 at 06:03:22PM +0300, Roi Dayan wrote:

This patch series introduces rule offload functionality to dpif-netlink
via netdev ports new flow offloading API. The user can specify whether to
enable rule offloading or not via OVS configuration. Netdev providers
are able to implement netdev flow offload API in order to offload rules.

This patch series also implements one offload scheme for netdev-linux,
using TC flower classifier, which was chosen because its sort of natural
to state OVS DP rules for this classifier. However, the code can be
extended to support other classifiers such as U32, eBPF, etc which support
offload as well.

The use-case we are currently addressing is the newly sriov switchdev mode
in the Linux kernel which was introduced in version 4.8.
This series was tested against sriov vfs vports representors of the
Mellanox 100G ConnectX-4 series exposed by the mlx5 kernel driver.

The feature is disabled by default and can be enabled by setting
other_config:hw-offload to True.
Currently offloading is done for rules consisting on matching on L2 MAC,
L3 IP (not including IP flags), L4 TCP/UDP (not inclduing TCP flags),
VLAN, VXLAN tunnel.
HW offloading is supported only for drop and for single output action with


Thanks Roi,

I have applied these to the master branch
(in two batches; 1 - 7 yesterday and the rest today).

There were some follow-up requests from Flavio for some of the patches,
please lets not forget those.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev: Fix netdev_open() to adhere to class type if given

2017-06-15 Thread Eelco Chaudron

On 14/06/17 18:27, Ben Pfaff wrote:

On Wed, Jun 14, 2017 at 06:21:50PM +0200, Eelco Chaudron wrote:

...
Can we get this also on the 2.7 branch?

OK.  If it causes problems, we'll need to fix them or revert it, of
course.

If something does come up I'll try my best the help out/fix it.

Thanks,

Eelco
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] checkpatch: Fix skipping of the most recent commit.

2017-06-15 Thread Ilya Maximets
'range(n_patches, 0, -1)' generates list starting from 'n_patches'
and not including zero. This leads to checking of N most recent
commits starting from the second one.

New version will generate right list starting from 'n_patches - 1'
and including zero. So, the most recent commit (HEAD~0) will be
checked and desired behavior will be achieved.

Also, 'reversed' looks better than 'range(n_patches - 1, -1, -1)'

Fixes: a1fccabce2cb ("checkpatch: Support checking recent commits in the 
current repo.")
Signed-off-by: Ilya Maximets 
---
 utilities/checkpatch.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
index 243a430..b45a255 100755
--- a/utilities/checkpatch.py
+++ b/utilities/checkpatch.py
@@ -480,7 +480,7 @@ if __name__ == '__main__':
 
 if n_patches:
 status = 0
-for i in range(n_patches, 0, -1):
+for i in reversed(range(0, n_patches)):
 revision = 'HEAD~%d' % i
 f = os.popen('git format-patch -1 --stdout %s' % revision, 'r')
 patch = f.read()
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/3] dpif-netdev: Avoid port's reconfiguration on pmd-cpu-mask changes.

2017-06-15 Thread Stokes, Ian
> >> -Original Message-
> >> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> >> Sent: Thursday, June 15, 2017 9:45 AM
> >> To: Stokes, Ian ; d...@openvswitch.org; Darrell
> >> Ball 
> >> Cc: Heetae Ahn ; Daniele Di Proietto
> >> ; Ben Pfaff ; Pravin Shelar
> >> ; Loftus, Ciara ; Kevin
> >> Traynor 
> >> Subject: Re: [PATCH v2 2/3] dpif-netdev: Avoid port's reconfiguration
> >> on pmd-cpu-mask changes.
> >>
> >> On 15.06.2017 11:02, Stokes, Ian wrote:
>  Reconfiguration of HW NICs may lead to packet drops.
>  In current model all physical ports will be reconfigured each time
>  number of PMD threads changed. Since we not stopping threads on
>  pmd-cpu-mask changes, this patch will help to further decrease
>  port's downtime by setting the maximum possible number of wanted tx
>  queues to avoid unnecessary reconfigurations.
> >>>
> >>> Hi Ilya,
> >>>
> >>> Thanks for the patch, in testing I can confirm it resolves the
> >>> issue. A
> >> few observations and comments below.
> >>
> >> Hi Ian. Thanks for testing.
> >>
> >>>
> 
>  Signed-off-by: Ilya Maximets 
>  ---
>   lib/dpif-netdev.c | 26 +-
>   1 file changed, 21 insertions(+), 5 deletions(-)
> 
>  diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
>  596d133..79db770
>  100644
>  --- a/lib/dpif-netdev.c
>  +++ b/lib/dpif-netdev.c
>  @@ -3453,7 +3453,7 @@ reconfigure_datapath(struct dp_netdev *dp)  {
>   struct dp_netdev_pmd_thread *pmd;
>   struct dp_netdev_port *port;
>  -int wanted_txqs;
>  +int needed_txqs, wanted_txqs;
> 
>   dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
> 
>  @@ -3461,7 +3461,15 @@ reconfigure_datapath(struct dp_netdev *dp)
>    * on the system and the user configuration. */
>   reconfigure_pmd_threads(dp);
> 
>  -wanted_txqs = cmap_count(>poll_threads);
>  +/* We need 1 Tx queue for each thread to avoid locking, but we
>  + will
>  try
>  + * to allocate the maximum possible value to minimize the
>  + number of
>  port
>  + * reconfigurations. */
>  +needed_txqs = cmap_count(>poll_threads);
>  +/* (n_cores + 1) is the maximum that we might need to have.
>  + * Additional queue is for non-PMD threads. */
>  +wanted_txqs = ovs_numa_get_n_cores();
>  +ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
>  +wanted_txqs++;
> >>>
> >>> So just after this line there is a call
> >>>
> >>> HMAP_FOR_EACH (port, node, >ports) {
> >>> netdev_set_tx_multiq(port->netdev, wanted_txqs);
> >>> }
> >>>
> >>> My initial concern with this patch was twofold:
> >>>
> >>> (i) What would happen if the number of cores was greater than the
> >>> number
> >> of supported queues.
> >>>
> >>> That's not an issue from what I can see as the requested_txqs will
> >>> be
> >> compared to what's supported by the device and the smaller of the two
> >> will be chosen in the eventual call to dpdk_eth_dev_init():
> >>>
> >>> n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
> >>>
> >>> (ii) What happens when a device reports the incorrect number of max
> >>> tx
> >> queues? (can occur depending on the mode of the device).
> >>>
> >>> I think this case is ok as well as it's handled by the existing txq
> >> retry logic in dpdk_eth_dev_queue_setup().
> >>
> >> Actually these concerns are not about this patch. Such situations are
> >> possible on current master. But you're right, both of them handled
> >> properly with existing code according to conventions between
> >> dpif-netdev and netdev layers.
> >
> > Sure, they are not direct concerns as they could be reproduced on master
> as is, but moving to using core count can potentially cause the number of
> txqs requests to be larger than what's supported, for example with hyper
> threading a system can appear with 64 cores & will require 65 txqs queues,
> depending on the DPDK device and mode this can increase the chance of
> requesting more txqs than the device supports (instead of the current
> model of requesting the number of txqs as the number polling threads).
> >
> > Either way the code is in place to handle such a situation so I think
> it's fine.
> >>
>   /* The number of pmd threads might have changed, or a port can
>  be
>  new:
>    * adjust the txqs. */
>  @@ -3474,9 +3482,17 @@ reconfigure_datapath(struct dp_netdev *dp)
> 
>   /* Check for all the ports that need reconfiguration.  We
>  cache this in
>    * 'port->reconfigure', because netdev_is_reconf_required()
>  can change at
>  - * any time. */
>  + * any time.
>  + * Also mark for 

[ovs-dev] [PATCH v3 3/3] dpif-netdev: Don't uninit emc on reload.

2017-06-15 Thread Ilya Maximets
There are many reasons for reloading of pmd threads:
* reconfiguration of one of the ports.
* Adjusting of static_tx_qid.
* Adding new tx/rx ports.

In many cases EMC is still useful after reload and uninit
will only lead to unnecessary upcalls/classifier lookups.

Such behaviour slows down the datapath. Uninit itself slows
down the reload path. All this factors leads to additional
unexpected latencies/drops on events not directly connected
to current PMD thread.

Lets not uninitialize emc cache on reload path.
'emc_cache_slow_sweep()' and replacements should free all
the old/unwanted entries.

Signed-off-by: Ilya Maximets 
Acked-by: Cian Ferriter 
Tested-by: Cian Ferriter 
---
 lib/dpif-netdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d2443da..e7ab306 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3794,9 +3794,9 @@ pmd_thread_main(void *f_)
 ovs_numa_thread_setaffinity_core(pmd->core_id);
 dpdk_set_lcore_id(pmd->core_id);
 poll_cnt = pmd_load_queues_and_ports(pmd, _list);
+emc_cache_init(>flow_cache);
 reload:
 pmd_alloc_static_tx_qid(pmd);
-emc_cache_init(>flow_cache);
 
 /* List port/core affinity */
 for (i = 0; i < poll_cnt; i++) {
@@ -3843,13 +3843,13 @@ reload:
  * reloading the updated configuration. */
 dp_netdev_pmd_reload_done(pmd);
 
-emc_cache_uninit(>flow_cache);
 pmd_free_static_tx_qid(pmd);
 
 if (!exiting) {
 goto reload;
 }
 
+emc_cache_uninit(>flow_cache);
 free(poll_list);
 pmd_free_cached_ports(pmd);
 return NULL;
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 2/3] dpif-netdev: Avoid port's reconfiguration on pmd-cpu-mask changes.

2017-06-15 Thread Ilya Maximets
Reconfiguration of HW NICs may lead to packet drops.
In current model all physical ports will be reconfigured each
time number of PMD threads changed. Since we not stopping
threads on pmd-cpu-mask changes, this patch will help to further
decrease port's downtime by setting the maximum possible number
of wanted tx queues to avoid unnecessary reconfigurations.

Signed-off-by: Ilya Maximets 
Tested-by: Ian Stokes 
---
 lib/dpif-netdev.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c0bcca0..d2443da 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3450,7 +3450,7 @@ reconfigure_datapath(struct dp_netdev *dp)
 {
 struct dp_netdev_pmd_thread *pmd;
 struct dp_netdev_port *port;
-int wanted_txqs;
+int needed_txqs, wanted_txqs;
 
 dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
 
@@ -3458,7 +3458,15 @@ reconfigure_datapath(struct dp_netdev *dp)
  * on the system and the user configuration. */
 reconfigure_pmd_threads(dp);
 
-wanted_txqs = cmap_count(>poll_threads);
+/* We need 1 Tx queue for each thread to avoid locking, but we will try
+ * to allocate the maximum possible value to minimize the number of port
+ * reconfigurations. */
+needed_txqs = cmap_count(>poll_threads);
+/* (n_cores + 1) is the maximum that we might need to have.
+ * Additional queue is for non-PMD threads. */
+wanted_txqs = ovs_numa_get_n_cores();
+ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
+wanted_txqs++;
 
 /* The number of pmd threads might have changed, or a port can be new:
  * adjust the txqs. */
@@ -3471,9 +3479,17 @@ reconfigure_datapath(struct dp_netdev *dp)
 
 /* Check for all the ports that need reconfiguration.  We cache this in
  * 'port->reconfigure', because netdev_is_reconf_required() can change at
- * any time. */
+ * any time.
+ * Also mark for reconfiguration all ports which will likely change their
+ * 'dynamic_txqs' parameter. It's required to stop using them before
+ * changing this setting and it's simpler to mark ports here and allow
+ * 'pmd_remove_stale_ports' to remove them from threads. There will be
+ * no actual reconfiguration in 'port_reconfigure' because it's
+ * unnecessary.  */
 HMAP_FOR_EACH (port, node, >ports) {
-if (netdev_is_reconf_required(port->netdev)) {
+if (netdev_is_reconf_required(port->netdev)
+|| (port->dynamic_txqs
+!= (netdev_n_txq(port->netdev) < needed_txqs))) {
 port->need_reconfigure = true;
 }
 }
@@ -3508,7 +3524,7 @@ reconfigure_datapath(struct dp_netdev *dp)
 seq_change(dp->port_seq);
 port_destroy(port);
 } else {
-port->dynamic_txqs = netdev_n_txq(port->netdev) < wanted_txqs;
+port->dynamic_txqs = netdev_n_txq(port->netdev) < needed_txqs;
 }
 }
 
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 1/3] dpif-netdev: Incremental addition/deletion of PMD threads.

2017-06-15 Thread Ilya Maximets
Currently, change of 'pmd-cpu-mask' is very heavy operation.
It requires destroying of all the PMD threads and creating
them back. After that, all the threads will sleep until
ports' redistribution finished.

This patch adds ability to not stop the datapath while
adjusting number/placement of PMD threads. All not affected
threads will forward traffic without any additional latencies.

id-pool created for static tx queue ids to keep them sequential
in a flexible way. non-PMD thread will always have
static_tx_qid = 0 as it was before.

Signed-off-by: Ilya Maximets 
Tested-by: Mark Kavanagh 
Acked-by: Mark Kavanagh 
---
 lib/dpif-netdev.c | 146 +++---
 tests/pmd.at  |   2 +-
 2 files changed, 108 insertions(+), 40 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 2f224db..c0bcca0 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -48,6 +48,7 @@
 #include "fat-rwlock.h"
 #include "flow.h"
 #include "hmapx.h"
+#include "id-pool.h"
 #include "latch.h"
 #include "netdev.h"
 #include "netdev-vport.h"
@@ -278,6 +279,9 @@ struct dp_netdev {
 
 /* Stores all 'struct dp_netdev_pmd_thread's. */
 struct cmap poll_threads;
+/* id pool for per thread static_tx_qid. */
+struct id_pool *tx_qid_pool;
+struct ovs_mutex tx_qid_pool_mutex;
 
 /* Protects the access of the 'struct dp_netdev_pmd_thread'
  * instance for non-pmd thread. */
@@ -563,7 +567,7 @@ struct dp_netdev_pmd_thread {
 /* Queue id used by this pmd thread to send packets on all netdevs if
  * XPS disabled for this netdev. All static_tx_qid's are unique and less
  * than 'cmap_count(dp->poll_threads)'. */
-const int static_tx_qid;
+uint32_t static_tx_qid;
 
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
@@ -643,6 +647,8 @@ static struct dp_netdev_pmd_thread 
*dp_netdev_get_pmd(struct dp_netdev *dp,
   unsigned core_id);
 static struct dp_netdev_pmd_thread *
 dp_netdev_pmd_get_next(struct dp_netdev *dp, struct cmap_position *pos);
+static void dp_netdev_del_pmd(struct dp_netdev *dp,
+  struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_destroy_all_pmds(struct dp_netdev *dp, bool non_pmd);
 static void dp_netdev_pmd_clear_ports(struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
@@ -1182,10 +1188,17 @@ create_dp_netdev(const char *name, const struct 
dpif_class *class,
 atomic_init(>emc_insert_min, DEFAULT_EM_FLOW_INSERT_MIN);
 
 cmap_init(>poll_threads);
+
+ovs_mutex_init(>tx_qid_pool_mutex);
+/* We need 1 Tx queue for each possible core + 1 for non-PMD threads. */
+dp->tx_qid_pool = id_pool_create(0, ovs_numa_get_n_cores() + 1);
+
 ovs_mutex_init_recursive(>non_pmd_mutex);
 ovsthread_key_create(>per_pmd_key, NULL);
 
 ovs_mutex_lock(>port_mutex);
+/* non-PMD will be created before all other threads and will
+ * allocate static_tx_qid = 0. */
 dp_netdev_set_nonpmd(dp);
 
 error = do_add_port(dp, name, dpif_netdev_port_open_type(dp->class,
@@ -1280,6 +1293,9 @@ dp_netdev_free(struct dp_netdev *dp)
 dp_netdev_destroy_all_pmds(dp, true);
 cmap_destroy(>poll_threads);
 
+ovs_mutex_destroy(>tx_qid_pool_mutex);
+id_pool_destroy(dp->tx_qid_pool);
+
 ovs_mutex_destroy(>non_pmd_mutex);
 ovsthread_key_delete(dp->per_pmd_key);
 
@@ -3296,12 +3312,29 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) 
OVS_REQUIRES(dp->port_mutex)
 }
 
 static void
+reload_affected_pmds(struct dp_netdev *dp)
+{
+struct dp_netdev_pmd_thread *pmd;
+
+CMAP_FOR_EACH (pmd, node, >poll_threads) {
+if (pmd->need_reload) {
+dp_netdev_reload_pmd__(pmd);
+pmd->need_reload = false;
+}
+}
+}
+
+static void
 reconfigure_pmd_threads(struct dp_netdev *dp)
 OVS_REQUIRES(dp->port_mutex)
 {
 struct dp_netdev_pmd_thread *pmd;
 struct ovs_numa_dump *pmd_cores;
+struct ovs_numa_info_core *core;
+struct hmapx to_delete = HMAPX_INITIALIZER(_delete);
+struct hmapx_node *node;
 bool changed = false;
+bool need_to_adjust_static_tx_qids = false;
 
 /* The pmd threads should be started only if there's a pmd port in the
  * datapath.  If the user didn't provide any "pmd-cpu-mask", we start
@@ -3314,40 +3347,64 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
 pmd_cores = ovs_numa_dump_n_cores_per_numa(NR_PMD_THREADS);
 }
 
-/* Check for changed configuration */
-if (ovs_numa_dump_count(pmd_cores) != cmap_count(>poll_threads) - 1) {
-changed = true;
-} else {
-CMAP_FOR_EACH (pmd, node, >poll_threads) {
-if (pmd->core_id != NON_PMD_CORE_ID
-&& 

[ovs-dev] [PATCH v3 0/3] Incremental addition/deletion of PMD threads.

2017-06-15 Thread Ilya Maximets
Version 3:
* Added comment about 'static_txq_id's adjustment.
* Added additional parentheses around 'dynamic_txqs'
  comparison operand because of warning from GCC 6.3.1.

Version 2:
* Dropped patch [1/4] as already applied to master.
* Fixed checkpatch warning in patch [2/4].
* 'reconfigure_pmd_threads' modified for better maintainability
  as suggested by Daniele Di Proietto in patch [2/4]:
  1. delete old pmd threads
  2. reconfigure if 'static_tx_qid's adjustment needed
  3. add new pmd threads
* Fixed locking of HW tx queues in case of big amount of
  available cores in patch [3/4].
* Dropped RFC tag for patch [4/4].

Ilya Maximets (3):
  dpif-netdev: Incremental addition/deletion of PMD threads.
  dpif-netdev: Avoid port's reconfiguration on pmd-cpu-mask changes.
  dpif-netdev: Don't uninit emc on reload.

 lib/dpif-netdev.c | 176 --
 tests/pmd.at  |   2 +-
 2 files changed, 131 insertions(+), 47 deletions(-)

-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V11 00/33] Introducing HW offload support for openvswitch

2017-06-15 Thread Roi Dayan



On 15/06/2017 13:15, Simon Horman wrote:

On Tue, Jun 13, 2017 at 06:03:22PM +0300, Roi Dayan wrote:

This patch series introduces rule offload functionality to dpif-netlink
via netdev ports new flow offloading API. The user can specify whether to
enable rule offloading or not via OVS configuration. Netdev providers
are able to implement netdev flow offload API in order to offload rules.

This patch series also implements one offload scheme for netdev-linux,
using TC flower classifier, which was chosen because its sort of natural
to state OVS DP rules for this classifier. However, the code can be
extended to support other classifiers such as U32, eBPF, etc which support
offload as well.

The use-case we are currently addressing is the newly sriov switchdev mode
in the Linux kernel which was introduced in version 4.8.
This series was tested against sriov vfs vports representors of the
Mellanox 100G ConnectX-4 series exposed by the mlx5 kernel driver.

The feature is disabled by default and can be enabled by setting
other_config:hw-offload to True.
Currently offloading is done for rules consisting on matching on L2 MAC,
L3 IP (not including IP flags), L4 TCP/UDP (not inclduing TCP flags),
VLAN, VXLAN tunnel.
HW offloading is supported only for drop and for single output action with


Thanks Roi,

I have applied these to the master branch
(in two batches; 1 - 7 yesterday and the rest today).

There were some follow-up requests from Flavio for some of the patches,
please lets not forget those.



yes I wrote myself everything I think. here is the list in case someone
wants to help out with something.

1. add support for sctp
2. able to change hw-offload & tc-policy. I'll submit a patch for this
   next week maybe. need to recheck i dont reproduce an issue i had
   with it.
3. in dump-flows to show offloaded=no. I think Flavio said he'll do it.
4. change void* for the hmap to something more nice.
5. move the hmap locks to be per dpif.
6. try to avoid forcing eth_type from ovs_be16 to uint16
7. try to refactor nl_parse_tcf to use tc_ticks_to_bytes
8. better handle cls_flower and other modules (i.e. a nice error if
   modules not found)


Thanks everyone!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/3] dpif-netdev: Avoid port's reconfiguration on pmd-cpu-mask changes.

2017-06-15 Thread Ilya Maximets
On 15.06.2017 12:03, Stokes, Ian wrote:
>> -Original Message-
>> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
>> Sent: Thursday, June 15, 2017 9:45 AM
>> To: Stokes, Ian ; d...@openvswitch.org; Darrell Ball
>> 
>> Cc: Heetae Ahn ; Daniele Di Proietto
>> ; Ben Pfaff ; Pravin Shelar
>> ; Loftus, Ciara ; Kevin Traynor
>> 
>> Subject: Re: [PATCH v2 2/3] dpif-netdev: Avoid port's reconfiguration on
>> pmd-cpu-mask changes.
>>
>> On 15.06.2017 11:02, Stokes, Ian wrote:
 Reconfiguration of HW NICs may lead to packet drops.
 In current model all physical ports will be reconfigured each time
 number of PMD threads changed. Since we not stopping threads on
 pmd-cpu-mask changes, this patch will help to further decrease port's
 downtime by setting the maximum possible number of wanted tx queues
 to avoid unnecessary reconfigurations.
>>>
>>> Hi Ilya,
>>>
>>> Thanks for the patch, in testing I can confirm it resolves the issue. A
>> few observations and comments below.
>>
>> Hi Ian. Thanks for testing.
>>
>>>

 Signed-off-by: Ilya Maximets 
 ---
  lib/dpif-netdev.c | 26 +-
  1 file changed, 21 insertions(+), 5 deletions(-)

 diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
 596d133..79db770
 100644
 --- a/lib/dpif-netdev.c
 +++ b/lib/dpif-netdev.c
 @@ -3453,7 +3453,7 @@ reconfigure_datapath(struct dp_netdev *dp)  {
  struct dp_netdev_pmd_thread *pmd;
  struct dp_netdev_port *port;
 -int wanted_txqs;
 +int needed_txqs, wanted_txqs;

  dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);

 @@ -3461,7 +3461,15 @@ reconfigure_datapath(struct dp_netdev *dp)
   * on the system and the user configuration. */
  reconfigure_pmd_threads(dp);

 -wanted_txqs = cmap_count(>poll_threads);
 +/* We need 1 Tx queue for each thread to avoid locking, but we
 + will
 try
 + * to allocate the maximum possible value to minimize the number
 + of
 port
 + * reconfigurations. */
 +needed_txqs = cmap_count(>poll_threads);
 +/* (n_cores + 1) is the maximum that we might need to have.
 + * Additional queue is for non-PMD threads. */
 +wanted_txqs = ovs_numa_get_n_cores();
 +ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
 +wanted_txqs++;
>>>
>>> So just after this line there is a call
>>>
>>> HMAP_FOR_EACH (port, node, >ports) {
>>> netdev_set_tx_multiq(port->netdev, wanted_txqs);
>>> }
>>>
>>> My initial concern with this patch was twofold:
>>>
>>> (i) What would happen if the number of cores was greater than the number
>> of supported queues.
>>>
>>> That's not an issue from what I can see as the requested_txqs will be
>> compared to what's supported by the device and the smaller of the two will
>> be chosen in the eventual call to dpdk_eth_dev_init():
>>>
>>> n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
>>>
>>> (ii) What happens when a device reports the incorrect number of max tx
>> queues? (can occur depending on the mode of the device).
>>>
>>> I think this case is ok as well as it's handled by the existing txq
>> retry logic in dpdk_eth_dev_queue_setup().
>>
>> Actually these concerns are not about this patch. Such situations are
>> possible on current master. But you're right, both of them handled
>> properly with existing code according to conventions between dpif-netdev
>> and netdev layers.
> 
> Sure, they are not direct concerns as they could be reproduced on master as 
> is, but moving to using core count can potentially cause the number of txqs 
> requests to be larger than what's supported, for example with hyper threading 
> a system can appear with 64 cores & will require 65 txqs queues, depending on 
> the DPDK device and mode this can increase the chance of requesting more txqs 
> than the device supports (instead of the current model of requesting the 
> number of txqs as the number polling threads). 
> 
> Either way the code is in place to handle such a situation so I think it's 
> fine.
>>
  /* The number of pmd threads might have changed, or a port can
 be
 new:
   * adjust the txqs. */
 @@ -3474,9 +3482,17 @@ reconfigure_datapath(struct dp_netdev *dp)

  /* Check for all the ports that need reconfiguration.  We cache
 this in
   * 'port->reconfigure', because netdev_is_reconf_required() can
 change at
 - * any time. */
 + * any time.
 + * Also mark for reconfiguration all ports which will likely
 + change
 their
 + * 'dynamic_txqs' parameter. It's required to stop using them
>> before
 + * changing this setting and it's simpler to 

Re: [ovs-dev] [PATCH V11 00/33] Introducing HW offload support for openvswitch

2017-06-15 Thread Simon Horman
On Tue, Jun 13, 2017 at 06:03:22PM +0300, Roi Dayan wrote:
> This patch series introduces rule offload functionality to dpif-netlink
> via netdev ports new flow offloading API. The user can specify whether to
> enable rule offloading or not via OVS configuration. Netdev providers
> are able to implement netdev flow offload API in order to offload rules.
> 
> This patch series also implements one offload scheme for netdev-linux,
> using TC flower classifier, which was chosen because its sort of natural
> to state OVS DP rules for this classifier. However, the code can be
> extended to support other classifiers such as U32, eBPF, etc which support
> offload as well.
> 
> The use-case we are currently addressing is the newly sriov switchdev mode
> in the Linux kernel which was introduced in version 4.8.
> This series was tested against sriov vfs vports representors of the
> Mellanox 100G ConnectX-4 series exposed by the mlx5 kernel driver.
> 
> The feature is disabled by default and can be enabled by setting
> other_config:hw-offload to True.
> Currently offloading is done for rules consisting on matching on L2 MAC,
> L3 IP (not including IP flags), L4 TCP/UDP (not inclduing TCP flags),
> VLAN, VXLAN tunnel.
> HW offloading is supported only for drop and for single output action with

Thanks Roi,

I have applied these to the master branch
(in two batches; 1 - 7 yesterday and the rest today).

There were some follow-up requests from Flavio for some of the patches,
please lets not forget those.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V10 28/33] dpctl: Indicate if flow is offloaded when dumping flows of all types

2017-06-15 Thread Simon Horman
On Sat, Jun 10, 2017 at 10:23:11AM -0300, Flavio Leitner wrote:
> On Thu, Jun 08, 2017 at 02:46:45PM +0300, Roi Dayan wrote:
> > From: Paul Blakey 
> > 
> > When verbosity is requested on dump-flows (-m) indicate which flows
> > are offloaded.
> > 
> > Signed-off-by: Paul Blakey 
> > Reviewed-by: Roi Dayan 
> > ---
> 
> I'd suggest to either say offloaded: yes or no, and not to just
> add the field if offloaded. It makes harder to parse on a script.
> 
> I can propose an improvement if this gets merged, so it's not a
> problem for me.
> 
> Acked-by: Flavio Leitner 

I have applied v11 to master.

I for one am in favour of seeing your proposed improvement.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V11 18/33] netdev-tc-offloads: Implement netdev flow put using tc interface

2017-06-15 Thread Simon Horman
On Tue, Jun 13, 2017 at 06:03:40PM +0300, Roi Dayan wrote:
> From: Paul Blakey 
> 
> Currently only tunnel offload is supported.
> 
> Signed-off-by: Paul Blakey 
> Reviewed-by: Roi Dayan 
> Reviewed-by: Simon Horman 
> Acked-by: Flavio Leitner 

I have applied this to master with the following trivial modification.

> +static int
> +test_key_and_mask(struct match *match) {

I changed the above to:

static int
test_key_and_mask(struct match *match)
{
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/3] dpif-netdev: Incremental addition/deletion of PMD threads.

2017-06-15 Thread Kavanagh, Mark B
>From: Ilya Maximets [mailto:i.maxim...@samsung.com]
>Sent: Thursday, June 15, 2017 11:07 AM
>To: Kavanagh, Mark B ; d...@openvswitch.org; 
>Darrell Ball
>
>Cc: Heetae Ahn ; Daniele Di Proietto 
>; Ben
>Pfaff ; Pravin Shelar ; Loftus, Ciara 
>;
>Stokes, Ian ; Kevin Traynor 
>Subject: Re: [ovs-dev] [PATCH v2 1/3] dpif-netdev: Incremental 
>addition/deletion of PMD
>threads.
>
>On 06.06.2017 13:02, Kavanagh, Mark B wrote:
>>
>>
>>> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
>>> Sent: Monday, June 5, 2017 9:18 AM
>>> To: Kavanagh, Mark B ; d...@openvswitch.org; 
>>> Darrell Ball
>>> 
>>> Cc: Heetae Ahn ; Daniele Di Proietto 
>>> ; Ben
>>> Pfaff ; Pravin Shelar ; Loftus, Ciara
>;
>>> Stokes, Ian ; Kevin Traynor 
>>> Subject: Re: [ovs-dev] [PATCH v2 1/3] dpif-netdev: Incremental 
>>> addition/deletion of PMD
>>> threads.
>>>
>>> Thanks for testing, Mark.
>>>
>>> Comments inline.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>> On 01.06.2017 17:30, Kavanagh, Mark B wrote:
> From: ovs-dev-boun...@openvswitch.org 
> [mailto:ovs-dev-boun...@openvswitch.org] On Behalf
>>> Of
> Ilya Maximets
> Sent: Tuesday, May 30, 2017 3:12 PM
> To: d...@openvswitch.org; Darrell Ball 
> Cc: Heetae Ahn ; Ilya Maximets 
> 
> Subject: [ovs-dev] [PATCH v2 1/3] dpif-netdev: Incremental 
> addition/deletion of PMD
>>> threads.
>
> Currently, change of 'pmd-cpu-mask' is very heavy operation.
> It requires destroying of all the PMD threads and creating
> them back. After that, all the threads will sleep until
> ports' redistribution finished.
>
> This patch adds ability to not stop the datapath while
> adjusting number/placement of PMD threads. All not affected
> threads will forward traffic without any additional latencies.
>
> id-pool created for static tx queue ids to keep them sequential
> in a flexible way. non-PMD thread will always have
> static_tx_qid = 0 as it was before.
>
> Signed-off-by: Ilya Maximets 

 Hi Ilya,

 Patch LGTM - just once comment inline.

 I conducted some light testing and observed the following:
 - temporary dip in forwarding rate (~50%) when modifying pmd-cpu-mask 
 (in previous
>>> implementation, forwarding stopped completely)
 - occasional drop to zero in forwarding rate when modifying 
 pmd-cpu-mask (e.g.
>>> change from '2' to '12')
>>>
>>> I guess, drop to zero in this case could be fixed by second
>>> patch of the series. (If it caused by port reconfiguration)
>>
>> Yes, that's true - my colleague, Ian Stokes, will review and test that patch 
>> shortly.
>>
>>>
 - temporary drop of ~1mpps when adding/deleting port (similar to 
 existing
>>> implementation)

 I've also conducted the following sanity checks on the patch, and found no 
 issues:
  - checkpatch.py
  - compilation with clang
  - compilation with gcc
  - sparse
  - make check (no additional tests fail after application of this patch)

 Tested-by: Mark Kavanagh 

 Thanks,
 Mark

> ---
> lib/dpif-netdev.c | 143 
> +++---
> tests/pmd.at  |   2 +-
> 2 files changed, 105 insertions(+), 40 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index b50164b..596d133 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -48,6 +48,7 @@
> #include "fat-rwlock.h"
> #include "flow.h"
> #include "hmapx.h"
> +#include "id-pool.h"
> #include "latch.h"
> #include "netdev.h"
> #include "netdev-vport.h"
> @@ -277,6 +278,9 @@ struct dp_netdev {
>
> /* Stores all 'struct dp_netdev_pmd_thread's. */
> struct cmap poll_threads;
> +/* id pool for per thread static_tx_qid. */
> +struct id_pool *tx_qid_pool;
> +struct ovs_mutex tx_qid_pool_mutex;
>
> /* Protects the access of the 'struct dp_netdev_pmd_thread'
>  * instance for non-pmd thread. */
> @@ -562,7 +566,7 @@ struct dp_netdev_pmd_thread {
> /* Queue id used by this pmd thread to send packets on all netdevs if
>  * XPS disabled for this netdev. All static_tx_qid's are unique and 
> less
>  * than 'cmap_count(dp->poll_threads)'. */
> -const int static_tx_qid;
> +uint32_t static_tx_qid;
>
> struct ovs_mutex port_mutex;/* Mutex 

Re: [ovs-dev] [PATCH v2 1/3] dpif-netdev: Incremental addition/deletion of PMD threads.

2017-06-15 Thread Ilya Maximets
On 06.06.2017 13:02, Kavanagh, Mark B wrote:
> 
> 
>> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
>> Sent: Monday, June 5, 2017 9:18 AM
>> To: Kavanagh, Mark B ; d...@openvswitch.org; 
>> Darrell Ball
>> 
>> Cc: Heetae Ahn ; Daniele Di Proietto 
>> ; Ben
>> Pfaff ; Pravin Shelar ; Loftus, Ciara 
>> ;
>> Stokes, Ian ; Kevin Traynor 
>> Subject: Re: [ovs-dev] [PATCH v2 1/3] dpif-netdev: Incremental 
>> addition/deletion of PMD
>> threads.
>>
>> Thanks for testing, Mark.
>>
>> Comments inline.
>>
>> Best regards, Ilya Maximets.
>>
>> On 01.06.2017 17:30, Kavanagh, Mark B wrote:
 From: ovs-dev-boun...@openvswitch.org 
 [mailto:ovs-dev-boun...@openvswitch.org] On Behalf
>> Of
 Ilya Maximets
 Sent: Tuesday, May 30, 2017 3:12 PM
 To: d...@openvswitch.org; Darrell Ball 
 Cc: Heetae Ahn ; Ilya Maximets 
 
 Subject: [ovs-dev] [PATCH v2 1/3] dpif-netdev: Incremental 
 addition/deletion of PMD
>> threads.

 Currently, change of 'pmd-cpu-mask' is very heavy operation.
 It requires destroying of all the PMD threads and creating
 them back. After that, all the threads will sleep until
 ports' redistribution finished.

 This patch adds ability to not stop the datapath while
 adjusting number/placement of PMD threads. All not affected
 threads will forward traffic without any additional latencies.

 id-pool created for static tx queue ids to keep them sequential
 in a flexible way. non-PMD thread will always have
 static_tx_qid = 0 as it was before.

 Signed-off-by: Ilya Maximets 
>>>
>>> Hi Ilya,
>>>
>>> Patch LGTM - just once comment inline.
>>>
>>> I conducted some light testing and observed the following:
>>> - temporary dip in forwarding rate (~50%) when modifying pmd-cpu-mask 
>>> (in previous
>> implementation, forwarding stopped completely)
>>> - occasional drop to zero in forwarding rate when modifying 
>>> pmd-cpu-mask (e.g.
>> change from '2' to '12')
>>
>> I guess, drop to zero in this case could be fixed by second
>> patch of the series. (If it caused by port reconfiguration)
> 
> Yes, that's true - my colleague, Ian Stokes, will review and test that patch 
> shortly.
> 
>>
>>> - temporary drop of ~1mpps when adding/deleting port (similar to 
>>> existing
>> implementation)
>>>
>>> I've also conducted the following sanity checks on the patch, and found no 
>>> issues:
>>>  - checkpatch.py
>>>  - compilation with clang
>>>  - compilation with gcc
>>>  - sparse
>>>  - make check (no additional tests fail after application of this patch)
>>>
>>> Tested-by: Mark Kavanagh 
>>>
>>> Thanks,
>>> Mark
>>>
 ---
 lib/dpif-netdev.c | 143 
 +++---
 tests/pmd.at  |   2 +-
 2 files changed, 105 insertions(+), 40 deletions(-)

 diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
 index b50164b..596d133 100644
 --- a/lib/dpif-netdev.c
 +++ b/lib/dpif-netdev.c
 @@ -48,6 +48,7 @@
 #include "fat-rwlock.h"
 #include "flow.h"
 #include "hmapx.h"
 +#include "id-pool.h"
 #include "latch.h"
 #include "netdev.h"
 #include "netdev-vport.h"
 @@ -277,6 +278,9 @@ struct dp_netdev {

 /* Stores all 'struct dp_netdev_pmd_thread's. */
 struct cmap poll_threads;
 +/* id pool for per thread static_tx_qid. */
 +struct id_pool *tx_qid_pool;
 +struct ovs_mutex tx_qid_pool_mutex;

 /* Protects the access of the 'struct dp_netdev_pmd_thread'
  * instance for non-pmd thread. */
 @@ -562,7 +566,7 @@ struct dp_netdev_pmd_thread {
 /* Queue id used by this pmd thread to send packets on all netdevs if
  * XPS disabled for this netdev. All static_tx_qid's are unique and 
 less
  * than 'cmap_count(dp->poll_threads)'. */
 -const int static_tx_qid;
 +uint32_t static_tx_qid;

 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 
 'tx_ports'. */
 /* List of rx queues to poll. */
 @@ -642,6 +646,8 @@ static struct dp_netdev_pmd_thread 
 *dp_netdev_get_pmd(struct dp_netdev
 *dp,
   unsigned core_id);
 static struct dp_netdev_pmd_thread *
 dp_netdev_pmd_get_next(struct dp_netdev *dp, struct cmap_position *pos);
 +static void dp_netdev_del_pmd(struct dp_netdev *dp,
 +  struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_destroy_all_pmds(struct dp_netdev *dp, bool non_pmd);
 static void dp_netdev_pmd_clear_ports(struct dp_netdev_pmd_thread *pmd);
 

[ovs-dev] rte_eal_init() error when using ovs-dpdk with secondary application.

2017-06-15 Thread Junguk Cho
Hi,

I use ovs-dpdk (ovs-2.7, dpdk-16.11.1) with one application which talks to
ovs by using ring device and "--proc-type=secondary" (secondary processes).
It generally works well, but sometimes it shows this error.

It seems it could not find correct memory mapping in the application.

2017-06-13 11:21:40.988 STDERR Thread-2 [INFO] EAL: Probing VFIO support...
2017-06-13 11:21:40.988 STDERR Thread-2 [INFO] EAL: WARNING: Address Space
Layout Randomization (ASLR) is enabled in the kernel.
2017-06-13 11:21:40.988 STDERR Thread-2 [INFO] EAL:This may cause
issues with mapping memory into secondary processes
2017-06-13 11:21:40.989 STDERR Thread-2 [INFO] EAL: Could not mmap
17146314752 <(714)%20631-4752> bytes in /dev/zero at [0x7f662c80], got
[0x7f650600] - please use '--base-virtaddr' option
2017-06-13 11:21:40.989 STDERR Thread-2 [INFO] EAL: It is recommended to
disable ASLR in the kernel and retry running both primary and secondary
processes
2017-06-13 11:21:40.990 STDERR Thread-2 [INFO] PANIC in rte_eal_init():
2017-06-13 11:21:40.990 STDERR Thread-2 [INFO] Cannot init memory

I tried to disable ASLR, but it did not work.

Does anyone have a similar situation?
Based on this page (https://github.com/collectd/collectd/issues/2284), this
is known issue and it seemed we can avoid this with "--base-virtaddr"
option.
Does anyone know how to use this option in ovs-dpdk?

Thanks,
Junguk
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/3] dpif-netdev: Avoid port's reconfiguration on pmd-cpu-mask changes.

2017-06-15 Thread Stokes, Ian
> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Thursday, June 15, 2017 9:45 AM
> To: Stokes, Ian ; d...@openvswitch.org; Darrell Ball
> 
> Cc: Heetae Ahn ; Daniele Di Proietto
> ; Ben Pfaff ; Pravin Shelar
> ; Loftus, Ciara ; Kevin Traynor
> 
> Subject: Re: [PATCH v2 2/3] dpif-netdev: Avoid port's reconfiguration on
> pmd-cpu-mask changes.
> 
> On 15.06.2017 11:02, Stokes, Ian wrote:
> >> Reconfiguration of HW NICs may lead to packet drops.
> >> In current model all physical ports will be reconfigured each time
> >> number of PMD threads changed. Since we not stopping threads on
> >> pmd-cpu-mask changes, this patch will help to further decrease port's
> >> downtime by setting the maximum possible number of wanted tx queues
> >> to avoid unnecessary reconfigurations.
> >
> > Hi Ilya,
> >
> > Thanks for the patch, in testing I can confirm it resolves the issue. A
> few observations and comments below.
> 
> Hi Ian. Thanks for testing.
> 
> >
> >>
> >> Signed-off-by: Ilya Maximets 
> >> ---
> >>  lib/dpif-netdev.c | 26 +-
> >>  1 file changed, 21 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> >> 596d133..79db770
> >> 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -3453,7 +3453,7 @@ reconfigure_datapath(struct dp_netdev *dp)  {
> >>  struct dp_netdev_pmd_thread *pmd;
> >>  struct dp_netdev_port *port;
> >> -int wanted_txqs;
> >> +int needed_txqs, wanted_txqs;
> >>
> >>  dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
> >>
> >> @@ -3461,7 +3461,15 @@ reconfigure_datapath(struct dp_netdev *dp)
> >>   * on the system and the user configuration. */
> >>  reconfigure_pmd_threads(dp);
> >>
> >> -wanted_txqs = cmap_count(>poll_threads);
> >> +/* We need 1 Tx queue for each thread to avoid locking, but we
> >> + will
> >> try
> >> + * to allocate the maximum possible value to minimize the number
> >> + of
> >> port
> >> + * reconfigurations. */
> >> +needed_txqs = cmap_count(>poll_threads);
> >> +/* (n_cores + 1) is the maximum that we might need to have.
> >> + * Additional queue is for non-PMD threads. */
> >> +wanted_txqs = ovs_numa_get_n_cores();
> >> +ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
> >> +wanted_txqs++;
> >
> > So just after this line there is a call
> >
> > HMAP_FOR_EACH (port, node, >ports) {
> > netdev_set_tx_multiq(port->netdev, wanted_txqs);
> > }
> >
> > My initial concern with this patch was twofold:
> >
> > (i) What would happen if the number of cores was greater than the number
> of supported queues.
> >
> > That's not an issue from what I can see as the requested_txqs will be
> compared to what's supported by the device and the smaller of the two will
> be chosen in the eventual call to dpdk_eth_dev_init():
> >
> > n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
> >
> > (ii) What happens when a device reports the incorrect number of max tx
> queues? (can occur depending on the mode of the device).
> >
> > I think this case is ok as well as it's handled by the existing txq
> retry logic in dpdk_eth_dev_queue_setup().
> 
> Actually these concerns are not about this patch. Such situations are
> possible on current master. But you're right, both of them handled
> properly with existing code according to conventions between dpif-netdev
> and netdev layers.

Sure, they are not direct concerns as they could be reproduced on master as is, 
but moving to using core count can potentially cause the number of txqs 
requests to be larger than what's supported, for example with hyper threading a 
system can appear with 64 cores & will require 65 txqs queues, depending on the 
DPDK device and mode this can increase the chance of requesting more txqs than 
the device supports (instead of the current model of requesting the number of 
txqs as the number polling threads). 

Either way the code is in place to handle such a situation so I think it's fine.
> 
> >>  /* The number of pmd threads might have changed, or a port can
> >> be
> >> new:
> >>   * adjust the txqs. */
> >> @@ -3474,9 +3482,17 @@ reconfigure_datapath(struct dp_netdev *dp)
> >>
> >>  /* Check for all the ports that need reconfiguration.  We cache
> >> this in
> >>   * 'port->reconfigure', because netdev_is_reconf_required() can
> >> change at
> >> - * any time. */
> >> + * any time.
> >> + * Also mark for reconfiguration all ports which will likely
> >> + change
> >> their
> >> + * 'dynamic_txqs' parameter. It's required to stop using them
> before
> >> + * changing this setting and it's simpler to mark ports here and
> >> allow
> >> + * 'pmd_remove_stale_ports' to remove them from 

Re: [ovs-dev] [PATCH v2 2/3] dpif-netdev: Avoid port's reconfiguration on pmd-cpu-mask changes.

2017-06-15 Thread Ilya Maximets
On 15.06.2017 11:02, Stokes, Ian wrote:
>> Reconfiguration of HW NICs may lead to packet drops.
>> In current model all physical ports will be reconfigured each time number
>> of PMD threads changed. Since we not stopping threads on pmd-cpu-mask
>> changes, this patch will help to further decrease port's downtime by
>> setting the maximum possible number of wanted tx queues to avoid
>> unnecessary reconfigurations.
> 
> Hi Ilya,
> 
> Thanks for the patch, in testing I can confirm it resolves the issue. A few 
> observations and comments below.

Hi Ian. Thanks for testing.

> 
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>  lib/dpif-netdev.c | 26 +-
>>  1 file changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 596d133..79db770
>> 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -3453,7 +3453,7 @@ reconfigure_datapath(struct dp_netdev *dp)  {
>>  struct dp_netdev_pmd_thread *pmd;
>>  struct dp_netdev_port *port;
>> -int wanted_txqs;
>> +int needed_txqs, wanted_txqs;
>>
>>  dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
>>
>> @@ -3461,7 +3461,15 @@ reconfigure_datapath(struct dp_netdev *dp)
>>   * on the system and the user configuration. */
>>  reconfigure_pmd_threads(dp);
>>
>> -wanted_txqs = cmap_count(>poll_threads);
>> +/* We need 1 Tx queue for each thread to avoid locking, but we will
>> try
>> + * to allocate the maximum possible value to minimize the number of
>> port
>> + * reconfigurations. */
>> +needed_txqs = cmap_count(>poll_threads);
>> +/* (n_cores + 1) is the maximum that we might need to have.
>> + * Additional queue is for non-PMD threads. */
>> +wanted_txqs = ovs_numa_get_n_cores();
>> +ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
>> +wanted_txqs++;
> 
> So just after this line there is a call
> 
> HMAP_FOR_EACH (port, node, >ports) {
> netdev_set_tx_multiq(port->netdev, wanted_txqs);
> }
> 
> My initial concern with this patch was twofold:
> 
> (i) What would happen if the number of cores was greater than the number of 
> supported queues.
> 
> That's not an issue from what I can see as the requested_txqs will be 
> compared to what's supported by the device and the smaller of the two will be 
> chosen in the eventual call to dpdk_eth_dev_init():
> 
> n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
> 
> (ii) What happens when a device reports the incorrect number of max tx 
> queues? (can occur depending on the mode of the device).
> 
> I think this case is ok as well as it's handled by the existing txq retry 
> logic in dpdk_eth_dev_queue_setup(). 

Actually these concerns are not about this patch. Such situations are possible
on current master. But you're right, both of them handled properly with
existing code according to conventions between dpif-netdev and netdev layers.

>>  /* The number of pmd threads might have changed, or a port can be
>> new:
>>   * adjust the txqs. */
>> @@ -3474,9 +3482,17 @@ reconfigure_datapath(struct dp_netdev *dp)
>>
>>  /* Check for all the ports that need reconfiguration.  We cache this
>> in
>>   * 'port->reconfigure', because netdev_is_reconf_required() can
>> change at
>> - * any time. */
>> + * any time.
>> + * Also mark for reconfiguration all ports which will likely change
>> their
>> + * 'dynamic_txqs' parameter. It's required to stop using them before
>> + * changing this setting and it's simpler to mark ports here and
>> allow
>> + * 'pmd_remove_stale_ports' to remove them from threads. There will
>> be
>> + * no actual reconfiguration in 'port_reconfigure' because it's
>> + * unnecessary.  */
>>  HMAP_FOR_EACH (port, node, >ports) {
>> -if (netdev_is_reconf_required(port->netdev)) {
>> +if (netdev_is_reconf_required(port->netdev)
>> +|| (port->dynamic_txqs
>> +!=  netdev_n_txq(port->netdev) < needed_txqs)) {
> 
> GCC caught the following
> 
> lib/dpif-netdev.c:3448:48: error: suggest parentheses around comparison in 
> operand of '!=' [-Werror=parentheses]
>  !=  netdev_n_txq(port->netdev) < needed_txqs)) {

It's unnecessary because equality operators has higher priority than
relational, but I can add parentheses if compiler complains.

What version of GCC you're using? I didn't see such warnings on my
systems.

>>  port->need_reconfigure = true;
>>  }
>>  }
>> @@ -3510,7 +3526,7 @@ reconfigure_datapath(struct dp_netdev *dp)
>>  seq_change(dp->port_seq);
>>  port_destroy(port);
>>  } else {
>> -port->dynamic_txqs = netdev_n_txq(port->netdev) <
>> wanted_txqs;
>> +port->dynamic_txqs = netdev_n_txq(port->netdev) <
>> + needed_txqs;
>>  }
>>  }
>>
>> --
>> 2.7.4
___
dev mailing list

[ovs-dev] [PATCH v1 6/6] ovn-northd: Add logical flows to support native IPv6 RA

2017-06-15 Thread nusiddiq
From: Zongkai LI 

This patch adds logical flows which sends IPv6 Router Advertisement
packet in response to the IPv6 Router Solicitation request. It uses
the actions "put_nd_ra_opts" to transform the RS packet to RA packet
in the newly added ingress stage "lr_in_nd_ra_options" in router
pipeline. If the action "put_nd_ra_opts" is successful, it sends the
RA packet back to the originating port in the next ingress stage
"lr_in_nd_ra_response".

A new column "ipv6_ra_configs" is added in the Logical_Router_Port
table, which the CMS is expected to configure IPv6 RA
configurations - "address_mode" and "mtu" for adding these flows.

Co-authored-by: Numan Siddique 
Signed-off-by: Zongkai LI 
Signed-off-by: Numan Siddique 
---
 ovn/northd/ovn-northd.8.xml |  80 ++-
 ovn/northd/ovn-northd.c | 125 +++
 ovn/ovn-nb.ovsschema|   7 +-
 ovn/ovn-nb.xml  |  39 
 tests/ovn.at| 237 
 5 files changed, 465 insertions(+), 23 deletions(-)

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 7ff5245..92c593c 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -1573,7 +1573,79 @@ icmp4 {
   
 
 
-Ingress Table 5: IP Routing
+Ingress Table 5: IPv6 ND RA option processing
+
+
+  
+
+  A priority-50 logical flow is added for each logical router port
+  configured with IPv6 ND RA options which matches IPv6 ND Router
+  Solicitation packet and applies the action 
put_nd_ra_opts
+  and advances the packet to the next table.
+
+
+
+reg0[5] = put_nd_ra_opts(options);next;
+
+
+
+  For a valid IPv6 ND RS packet, this transforms the packet into an
+  IPv6 ND RA reply and sets the RA options to the packet and stores 1
+  into reg0[5]. For other kinds of packets, it just stores 0 into
+  reg0[5]. Either way, it continues to the next table.
+
+  
+
+  
+A priority-0 logical flow with match 1 has actions
+next;.
+  
+
+
+Ingress Table 6: IPv6 ND RA responder
+
+
+  This table implements IPv6 ND RA responder for the IPv6 ND RA replies
+  generated by the previous table.
+
+
+
+  
+
+  A priority-50 logical flow is added for each logical router port
+  configured with IPv6 ND RA options which matches IPv6 ND RA
+  packets and reg0[5] == 1 and responds back to the
+  inport after applying these actions.
+  If reg0[5] is set to 1, it means that the action
+  put_nd_ra_opts was successful.
+
+
+
+eth.src = E;
+ip6.src = I;
+outport = P;
+flags.loopback = 1;
+output;
+
+
+
+  where E is the MAC address and I is the IPv6
+  link local address of the logical router port.
+
+
+
+  (This terminates packet processing in ingress pipeline; the packet
+  does not go to the next ingress table.)
+
+  
+
+  
+A priority-0 logical flow with match 1 has actions
+next;.
+  
+
+
+Ingress Table 7: IP Routing
 
 
   A packet that arrives at this table is an IP packet that should be
@@ -1675,7 +1747,7 @@ next;
   
 
 
-Ingress Table 6: ARP/ND Resolution
+Ingress Table 8: ARP/ND Resolution
 
 
   Any packet that reaches this table is an IP packet whose next-hop
@@ -1768,7 +1840,7 @@ next;
   
 
 
-Ingress Table 7: Gateway Redirect
+Ingress Table 9: Gateway Redirect
 
 
   For distributed logical routers where one of the logical router
@@ -1825,7 +1897,7 @@ next;
   
 
 
-Ingress Table 8: ARP Request
+Ingress Table 10: ARP Request
 
 
   In the common case where the Ethernet destination has been resolved, this
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 0a1d652..4c29b61 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -128,15 +128,17 @@ enum ovn_stage {
 PIPELINE_STAGE(SWITCH, OUT, PORT_SEC_L2,  8, "ls_out_port_sec_l2")\
   \
 /* Logical router ingress stages. */  \
-PIPELINE_STAGE(ROUTER, IN,  ADMISSION,   0, "lr_in_admission")\
-PIPELINE_STAGE(ROUTER, IN,  IP_INPUT,1, "lr_in_ip_input") \
-PIPELINE_STAGE(ROUTER, IN,  DEFRAG,  2, "lr_in_defrag")   \
-PIPELINE_STAGE(ROUTER, IN,  UNSNAT,  3, "lr_in_unsnat")   \
-PIPELINE_STAGE(ROUTER, IN,  DNAT,4, "lr_in_dnat") \
-PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  5, "lr_in_ip_routing")   \
-PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 6, "lr_in_arp_resolve")  \
-PIPELINE_STAGE(ROUTER, IN,  GW_REDIRECT, 7, 

[ovs-dev] [PATCH v1 5/6] ovn-controller: Add a new action - 'put_nd_ra_opts'

2017-06-15 Thread nusiddiq
From: Numan Siddique 

This patch adds a new OVN action 'put_nd_ra_opts' to support native
IPv6 Router Advertisement in OVN. This action can be used to respond
to the IPv6 Router Solicitation requests.

ovn-controller parses this action and adds a NXT_PACKET_IN2 OF flow
with 'pause' flag set and the RA options stored in 'userdata' field.
This action is similar to 'put_dhcp_opts' and 'put_dhcpv6_opts'.

When a valid IPv6 RS packet is received by the pinctrl module of
ovn-controller, it frames a new RA packet and sets the RA options
from the 'userdata' field and resumes the packet storing 1 in the
1-bit result sub-field. If the packet is invalid, it resumes the
packet without any modifications storing 0 in the 1-bit result
sub-field.

Eg. reg0[5] = put_nd_ra_opts(address_mode = "slaac", mtu = 1450,
 slla = 01:02:03:04:05:06, prefix = aef0::/64)

Co-authored-by: Zongkai LI 
Signed-off-by: Zongkai LI 
Signed-off-by: Numan Siddique 
---
 include/ovn/actions.h |  14 +++-
 ovn/controller/lflow.c|  11 +++-
 ovn/controller/pinctrl.c  | 144 +
 ovn/lib/actions.c | 161 +-
 ovn/lib/ovn-l7.h  |  55 
 ovn/ovn-sb.xml|  74 +
 ovn/utilities/ovn-trace.c |  46 +
 tests/ovn.at  |  31 +
 tests/test-ovn.c  |  13 +++-
 9 files changed, 529 insertions(+), 20 deletions(-)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index de55d88..d15c13b 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -71,7 +71,8 @@ struct simap;
 OVNACT(PUT_DHCPV4_OPTS, ovnact_put_opts)   \
 OVNACT(PUT_DHCPV6_OPTS, ovnact_put_opts)   \
 OVNACT(SET_QUEUE,   ovnact_set_queue)   \
-OVNACT(DNS_LOOKUP,  ovnact_dns_lookup)
+OVNACT(DNS_LOOKUP,  ovnact_dns_lookup)  \
+OVNACT(PUT_ND_RA_OPTS,  ovnact_put_opts)
 
 /* enum ovnact_type, with a member OVNACT_ for each action. */
 enum OVS_PACKED_ENUM ovnact_type {
@@ -400,6 +401,14 @@ enum action_opcode {
  *
  */
 ACTION_OPCODE_DNS_LOOKUP,
+
+/* "result = put_nd_ra_opts(option, ...)".
+ * Arguments follow the action_header, in this format:
+ *   - A 32-bit or 64-bit OXM header designating the result field.
+ *   - A 32-bit integer specifying a bit offset within the result field.
+ *   - Any number of ICMPv6 options.
+ */
+ACTION_OPCODE_PUT_ND_RA_OPTS,
 };
 
 /* Header. */
@@ -420,6 +429,9 @@ struct ovnact_parse_params {
 /* hmap of 'struct gen_opts_map'  to support 'put_dhcpv6_opts' action */
 const struct hmap *dhcpv6_opts;
 
+/* hmap of 'struct gen_opts_map' to support 'put_nd_ra_opts' action */
+const struct hmap *nd_ra_opts;
+
 /* Each OVN flow exists in a logical table within a logical pipeline.
  * These parameters express this context for a set of OVN actions being
  * parsed:
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index c4fe5f9..fce1321 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -63,6 +63,7 @@ static void consider_logical_flow(const struct lport_index 
*lports,
   const struct sbrec_chassis *chassis,
   struct hmap *dhcp_opts,
   struct hmap *dhcpv6_opts,
+  struct hmap *nd_ra_opts,
   uint32_t *conj_id_ofs,
   const struct shash *addr_sets,
   struct hmap *flow_table);
@@ -143,15 +144,19 @@ add_logical_flows(struct controller_ctx *ctx, const 
struct lport_index *lports,
 dhcpv6_opt_row->type);
 }
 
+struct hmap nd_ra_opts = HMAP_INITIALIZER(_ra_opts);
+nd_ra_opts_init(_ra_opts);
+
 SBREC_LOGICAL_FLOW_FOR_EACH (lflow, ctx->ovnsb_idl) {
 consider_logical_flow(lports, mcgroups, lflow, local_datapaths,
   group_table, chassis,
-  _opts, _opts, _id_ofs,
-  addr_sets, flow_table);
+  _opts, _opts, _ra_opts,
+  _id_ofs, addr_sets, flow_table);
 }
 
 dhcp_opts_destroy(_opts);
 dhcp_opts_destroy(_opts);
+nd_ra_opts_destroy(_ra_opts);
 }
 
 static void
@@ -163,6 +168,7 @@ consider_logical_flow(const struct lport_index *lports,
   const struct sbrec_chassis *chassis,
   struct hmap *dhcp_opts,
   struct hmap *dhcpv6_opts,
+  struct hmap *nd_ra_opts,
   uint32_t *conj_id_ofs,
   const struct shash *addr_sets,
   struct hmap *flow_table)
@@ -196,6 +202,7 @@ consider_logical_flow(const 

[ovs-dev] [PATCH v1 4/6] ovn util: Refactor dhcp_opts_map to make it generic

2017-06-15 Thread nusiddiq
From: Numan Siddique 

Renamed 'struct dhcp_opts_map' to 'struct gen_opts_map' and
renamed ovn-dhcp.h to ovn-l7.h. An upcoming commit to support IPv6
Router Advertisement, will make use of the refactored code to store
the IPv6 ND RA options in 'struct gen_opts_map'.

Signed-off-by: Numan Siddique 
---
 include/ovn/actions.h|  16 +++
 ovn/controller/lflow.c   |   2 +-
 ovn/controller/pinctrl.c |   2 +-
 ovn/lib/actions.c| 100 ++-
 ovn/lib/automake.mk  |   2 +-
 ovn/lib/{ovn-dhcp.h => ovn-l7.h} |  67 ++
 ovn/northd/ovn-northd.c  |  14 +++---
 ovn/utilities/ovn-trace.c|  10 ++--
 tests/test-ovn.c |   3 +-
 9 files changed, 127 insertions(+), 89 deletions(-)
 rename ovn/lib/{ovn-dhcp.h => ovn-l7.h} (78%)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 9e4a5c5..de55d88 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -68,8 +68,8 @@ struct simap;
 OVNACT(PUT_ARP,   ovnact_put_mac_bind)  \
 OVNACT(GET_ND,ovnact_get_mac_bind)  \
 OVNACT(PUT_ND,ovnact_put_mac_bind)  \
-OVNACT(PUT_DHCPV4_OPTS, ovnact_put_dhcp_opts)   \
-OVNACT(PUT_DHCPV6_OPTS, ovnact_put_dhcp_opts)   \
+OVNACT(PUT_DHCPV4_OPTS, ovnact_put_opts)   \
+OVNACT(PUT_DHCPV6_OPTS, ovnact_put_opts)   \
 OVNACT(SET_QUEUE,   ovnact_set_queue)   \
 OVNACT(DNS_LOOKUP,  ovnact_dns_lookup)
 
@@ -233,16 +233,16 @@ struct ovnact_put_mac_bind {
 struct expr_field mac;  /* 48-bit Ethernet address. */
 };
 
-struct ovnact_dhcp_option {
-const struct dhcp_opts_map *option;
+struct ovnact_gen_option {
+const struct gen_opts_map *option;
 struct expr_constant_set value;
 };
 
 /* OVNACT_PUT_DHCPV4_OPTS, OVNACT_PUT_DHCPV6_OPTS. */
-struct ovnact_put_dhcp_opts {
+struct ovnact_put_opts {
 struct ovnact ovnact;
 struct expr_field dst;  /* 1-bit destination field. */
-struct ovnact_dhcp_option *options;
+struct ovnact_gen_option *options;
 size_t n_options;
 };
 
@@ -414,10 +414,10 @@ struct ovnact_parse_params {
  * expr_parse()). */
 const struct shash *symtab;
 
-/* hmap of 'struct dhcp_opts_map' to support 'put_dhcp_opts' action */
+/* hmap of 'struct gen_opts_map' to support 'put_dhcp_opts' action */
 const struct hmap *dhcp_opts;
 
-/* hmap of 'struct dhcp_opts_map'  to support 'put_dhcpv6_opts' action */
+/* hmap of 'struct gen_opts_map'  to support 'put_dhcpv6_opts' action */
 const struct hmap *dhcpv6_opts;
 
 /* Each OVN flow exists in a logical table within a logical pipeline.
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index b1b4b23..c4fe5f9 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -24,7 +24,7 @@
 #include "ovn-controller.h"
 #include "ovn/actions.h"
 #include "ovn/expr.h"
-#include "ovn/lib/ovn-dhcp.h"
+#include "ovn/lib/ovn-l7.h"
 #include "ovn/lib/ovn-sb-idl.h"
 #include "packets.h"
 #include "physical.h"
diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index 660233a..7239669 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -39,7 +39,7 @@
 #include "ovn/actions.h"
 #include "ovn/lex.h"
 #include "ovn/lib/logical-fields.h"
-#include "ovn/lib/ovn-dhcp.h"
+#include "ovn/lib/ovn-l7.h"
 #include "ovn/lib/ovn-util.h"
 #include "poll-loop.h"
 #include "rconn.h"
diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
index 937f94d..be82165 100644
--- a/ovn/lib/actions.c
+++ b/ovn/lib/actions.c
@@ -20,7 +20,7 @@
 #include "bitmap.h"
 #include "byte-order.h"
 #include "compiler.h"
-#include "ovn-dhcp.h"
+#include "ovn-l7.h"
 #include "hash.h"
 #include "logical-fields.h"
 #include "nx-match.h"
@@ -1373,19 +1373,17 @@ ovnact_put_mac_bind_free(struct ovnact_put_mac_bind 
*put_mac OVS_UNUSED)
 }
 
 static void
-parse_dhcp_opt(struct action_context *ctx, struct ovnact_dhcp_option *o,
-   bool v6)
+parse_gen_opt(struct action_context *ctx, struct ovnact_gen_option *o,
+  const struct hmap *gen_opts, const char *opts_type)
 {
 if (ctx->lexer->token.type != LEX_T_ID) {
 lexer_syntax_error(ctx->lexer, NULL);
 return;
 }
 
-const char *name = v6 ? "DHCPv6" : "DHCPv4";
-const struct hmap *map = v6 ? ctx->pp->dhcpv6_opts : ctx->pp->dhcp_opts;
-o->option = map ? dhcp_opts_find(map, ctx->lexer->token.s) : NULL;
+o->option = gen_opts ? gen_opts_find(gen_opts, ctx->lexer->token.s) : NULL;
 if (!o->option) {
-lexer_syntax_error(ctx->lexer, "expecting %s option name", name);
+lexer_syntax_error(ctx->lexer, "expecting %s option name", opts_type);
 return;
 }
 lexer_get(ctx->lexer);
@@ -1402,22 +1400,22 @@ parse_dhcp_opt(struct action_context *ctx, struct 
ovnact_dhcp_option *o,
 if (!strcmp(o->option->type, "str")) {
 

[ovs-dev] [PATCH v1 3/6] ovn: extend expr symbols for ND

2017-06-15 Thread nusiddiq
From: Zong Kai LI 

This patch updates ND symbols in logical-fields - "nd", "nd.target",
"nd.sll" and "nd.tll" to describe more clear about "icmp6.type"
predicate.

It adds new symbols:
 - "nd_rs" - to match Router Solicitation messages
 - "nd_ra" - to match Router Advertisement messages

Co-authored-by: Numan Siddique 
Signed-off-by: Zongkai LI 
Signed-off-by: Numan Siddique 
---
 ovn/lib/logical-fields.c | 18 ++
 ovn/northd/ovn-northd.c  | 10 ++
 ovn/ovn-sb.xml   |  4 +++-
 tests/ovn.at |  2 +-
 4 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/ovn/lib/logical-fields.c b/ovn/lib/logical-fields.c
index 26e336f..f8837f2 100644
--- a/ovn/lib/logical-fields.c
+++ b/ovn/lib/logical-fields.c
@@ -178,14 +178,24 @@ ovn_init_symtab(struct shash *symtab)
 expr_symtab_add_field(symtab, "arp.tha", MFF_ARP_THA, "arp", false);
 
 expr_symtab_add_predicate(symtab, "nd",
-  "icmp6.type == {135, 136} && icmp6.code == 0 && ip.ttl == 255");
+  "icmp6.type == {133, 134, 135, 136} "
+  "&& icmp6.code == 0 && ip.ttl == 255");
+expr_symtab_add_predicate(symtab, "nd_rs",
+  "icmp6.type == 133 && icmp6.code == 0 && ip.ttl == 255");
+expr_symtab_add_predicate(symtab, "nd_ra",
+  "icmp6.type == 134 && icmp6.code == 0 && ip.ttl == 255");
 expr_symtab_add_predicate(symtab, "nd_ns",
   "icmp6.type == 135 && icmp6.code == 0 && ip.ttl == 255");
 expr_symtab_add_predicate(symtab, "nd_na",
   "icmp6.type == 136 && icmp6.code == 0 && ip.ttl == 255");
-expr_symtab_add_field(symtab, "nd.target", MFF_ND_TARGET, "nd", false);
-expr_symtab_add_field(symtab, "nd.sll", MFF_ND_SLL, "nd_ns", false);
-expr_symtab_add_field(symtab, "nd.tll", MFF_ND_TLL, "nd_na", false);
+expr_symtab_add_field(symtab, "nd.target", MFF_ND_TARGET,
+  "icmp6.type == {135, 136} "
+  "&& icmp6.code == 0 && ip.ttl == 255", false);
+expr_symtab_add_field(symtab, "nd.sll", MFF_ND_SLL,
+  "icmp6.type == {133, 134, 135} "
+  "&& icmp6.code == 0 && ip.ttl == 255", false);
+expr_symtab_add_field(symtab, "nd.tll", MFF_ND_TLL,
+  "icmp6.type == 136 && icmp6.code == 0 && ip.ttl == 255", false);
 
 expr_symtab_add_predicate(symtab, "tcp", "ip.proto == 6");
 expr_symtab_add_field(symtab, "tcp.src", MFF_TCP_SRC, "tcp", false);
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index be3b371..b9a4b5e 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -2140,9 +2140,10 @@ build_port_security_ipv6_nd_flow(
 struct ds *match, struct eth_addr ea, struct ipv6_netaddr *ipv6_addrs,
 int n_ipv6_addrs)
 {
-ds_put_format(match, " && ip6 && nd && ((nd.sll == "ETH_ADDR_FMT" || "
-  "nd.sll == "ETH_ADDR_FMT") || ((nd.tll == "ETH_ADDR_FMT" || "
-  "nd.tll == "ETH_ADDR_FMT")", ETH_ADDR_ARGS(eth_addr_zero),
+ds_put_format(match, " && (nd_ns || nd_na) && ((nd.sll == "ETH_ADDR_FMT
+  " || nd.sll == "ETH_ADDR_FMT") || ((nd.tll == "ETH_ADDR_FMT
+  " || nd.tll == "ETH_ADDR_FMT")",
+  ETH_ADDR_ARGS(eth_addr_zero),
   ETH_ADDR_ARGS(ea), ETH_ADDR_ARGS(eth_addr_zero),
   ETH_ADDR_ARGS(ea));
 if (!n_ipv6_addrs) {
@@ -2270,7 +2271,8 @@ build_port_security_nd(struct ovn_port *op, struct hmap 
*lflows)
 }
 
 ds_clear();
-ds_put_format(, "inport == %s && (arp || nd)", op->json_key);
+ds_put_format(, "inport == %s && (arp || nd_ns || nd_na)",
+  op->json_key);
 ovn_lflow_add(lflows, op->od, S_SWITCH_IN_PORT_SEC_ND, 80,
   ds_cstr(), "drop;");
 ds_destroy();
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index b22d1ac..db33c31 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -905,7 +905,9 @@
 ip.later_frag expands to ip.frag[1]
 ip.first_frag expands to ip.is_frag  
!ip.later_frag
 arp expands to eth.type == 0x806
-nd expands to icmp6.type == {135, 136} 
 icmp6.code == 0  ip.ttl == 255
+nd expands to icmp6.type == {133, 134, 135, 
136}  icmp6.code == 0  ip.ttl == 255
+nd_rs expands to icmp6.type == 133  
icmp6.code == 0  ip.ttl == 255
+nd_ra expands to icmp6.type == 134  
icmp6.code == 0  ip.ttl == 255
 nd_ns expands to icmp6.type == 135  
icmp6.code == 0  ip.ttl == 255
 nd_na expands to icmp6.type == 136  
icmp6.code == 0  ip.ttl == 255
 tcp expands to ip.proto == 6
diff --git a/tests/ovn.at b/tests/ovn.at
index efcbd91..9133304 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -993,7 +993,7 @@ get_nd(xxreg0, ip6.dst);
 # put_nd
 put_nd(inport, nd.target, nd.sll);
 encodes as 

[ovs-dev] [PATCH v1 2/6] packets: Fix the reset dp_packet buffer issue in packet_put_ra_prefix_opt

2017-06-15 Thread nusiddiq
From: Numan Siddique 

packet_put_ra_prefix_opt() resets the dp_packet buffer incorrectly.

Fixes: b24ab67c2dfd ("packets: add compose_nd_ra")
Signed-off-by: Numan Siddique 
---
 lib/packets.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/packets.c b/lib/packets.c
index d51c91a..7a9071c 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -1536,7 +1536,8 @@ packet_put_ra_prefix_opt(struct dp_packet *b,
 nh->ip6_plen = htons(prev_l4_size + ND_PREFIX_OPT_LEN);
 
 struct ovs_ra_msg *ra = dp_packet_l4(b);
-struct ovs_nd_prefix_opt *prefix_opt = dp_packet_put_uninit(b, sizeof *b);
+struct ovs_nd_prefix_opt *prefix_opt =
+dp_packet_put_uninit(b, sizeof *prefix_opt);
 prefix_opt->type = ND_OPT_PREFIX_INFORMATION;
 prefix_opt->len = 4;
 prefix_opt->prefix_len = plen;
-- 
2.9.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v1 1/6] ofproto-dpif: Fix the clone issue with continuation

2017-06-15 Thread nusiddiq
From: Numan Siddique 

When the clone action is composed and if the inner clone actions
modifies any of the metadata fields, the updated values are not
preserved when xlate_clone() returns. This causes the controller
to receive invalid metadata if the cloned packet has a 'controller'
action with the 'pause' flag set. When the controller resumes the
packet the resumed packet will not continue properly.

This patch addresses this issue.

The issue can be reproduced running the test case added in this
patch without the fix.

Signed-off-by: Numan Siddique 
---
 ofproto/ofproto-dpif-xlate.c |  4 +++-
 tests/ofproto-dpif.at| 41 +
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index e15e3de..281737a 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -358,6 +358,7 @@ struct xlate_ctx {
 uint32_t dp_hash_alg;
 uint32_t dp_hash_basis;
 struct ofpbuf frozen_actions;
+struct flow paused_flow;
 const struct ofpact_controller *pause;
 
 /* True if a packet was but is no longer MPLS (due to an MPLS pop action).
@@ -4341,7 +4342,7 @@ emit_continuation(struct xlate_ctx *ctx, const struct 
frozen_state *state)
 .max_len = UINT16_MAX,
 },
 };
-flow_get_metadata(>xin->flow, >pin.up.public.flow_metadata);
+flow_get_metadata(>paused_flow, >pin.up.public.flow_metadata);
 
 /* Async messages are only sent once, so if we send one now, no
  * xlate cache entry is created.  */
@@ -5614,6 +5615,7 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t 
ofpacts_len,
 if (controller->pause) {
 ctx->pause = controller;
 ctx->xout->slow |= SLOW_CONTROLLER;
+ctx->paused_flow = ctx->xin->flow;
 ctx_trigger_freeze(ctx);
 a = ofpact_next(a);
 } else {
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index e222866..fe40e0e 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -4962,6 +4962,47 @@ CHECK_CONTINUATION([patch ports], [4], [1],
-- set interface patch11 type=patch options:peer=patch10 \
 ofport_request=11])
 
+
+# Check that pause works after the packet is cloned.
+AT_SETUP([ofproto-dpif - continuation after clone])
+AT_KEYWORDS([continuations clone pause resume])
+OVS_VSWITCHD_START
+
+add_of_ports --pcap br0 `seq 1 3`
+
+flow="in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=1,tos=0,ttl=128,frag=no),icmp(type=8,code=0)"
+
+AT_DATA([flows.txt], [dnl
+table=0 in_port=1 actions=set_field:1->reg1 resubmit(,2)
+table=1 reg1=0x5 actions=controller(pause) resubmit(,3)
+table=1 reg1=0x1 actions=2
+table=2 in_port=1 actions=clone(set_field:5->reg1, resubmit(,1))
+table=3 reg1=0x1 actions=3
+table=3 reg1=0x5 actions=2
+])
+
+AT_CHECK([ovs-ofctl -O OpenFlow13 add-flows br0 flows.txt])
+
+AT_CAPTURE_FILE([ofctl_monitor0.log])
+ovs-ofctl monitor br0 resume --detach --no-chdir \
+--pidfile=ovs-ofctl0.pid 2> ofctl_monitor0.log
+
+# Run a packet through the switch.
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 "$flow"], [0], [stdout])
+
+ovs-vsctl show
+ovs-ofctl dump-flows br0
+
+# The packet should be recieved by port 2 and not port 3
+AT_CHECK([test 1 = `$PYTHON "$top_srcdir/utilities/ovs-pcap.in" p2-tx.pcap | 
wc -l`])
+AT_CHECK([test 0 = `$PYTHON "$top_srcdir/utilities/ovs-pcap.in" p3-tx.pcap | 
wc -l`])
+
+# NXT_RESUMEs should be 1 and reg1 should be set to 0x5.
+OVS_WAIT_UNTIL([test 1 = `cat ofctl_monitor*.log | grep NXT_RESUME | grep -c 
reg1=0x5`])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 # Two testcases below are for the ofproto/trace command
 # The first one tests all correct syntax:
 # ofproto/trace [dp_name] odp_flow [-generate|packet]
-- 
2.9.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v1 0/6] ovn: Add IPv6 Router Solicitation responder support

2017-06-15 Thread nusiddiq
From: Numan Siddique 

This patch series supports IPv6 Router solicitation responder support.
This patch series picks up from the patches from here [1] and reset the version
to 1 since few of the patches were accepted.

In this new series continuations are used to support this feature.
When IPv6 RS packet is received, it is sent to ovn-controller which applies
the action 'put_nd_ra_opts' to transform it to an IPv6 RA packet and resumes
the packet. ovn-northd adds the necessary flows to respond the RA packet
back to the originating port.

[1] - https://mail.openvswitch.org/pipermail/ovs-dev/2017-May/332008.html

Numan Siddique (4):
  ofproto-dpif: Fix the clone issue with continuation
  packets: Fix the reset dp_packet buffer issue in
packet_put_ra_prefix_opt
  ovn util: Refactor dhcp_opts_map to make it generic
  ovn-controller: Add a new action - 'put_nd_ra_opts'

Zong Kai LI (2):
  ovn: extend expr symbols for ND
  ovn-northd: Add logical flows to support native IPv6 RA

 include/ovn/actions.h|  30 +++--
 lib/packets.c|   3 +-
 ofproto/ofproto-dpif-xlate.c |   4 +-
 ovn/controller/lflow.c   |  13 +-
 ovn/controller/pinctrl.c | 146 -
 ovn/lib/actions.c| 261 ++---
 ovn/lib/automake.mk  |   2 +-
 ovn/lib/logical-fields.c |  18 ++-
 ovn/lib/{ovn-dhcp.h => ovn-l7.h} | 122 +++---
 ovn/northd/ovn-northd.8.xml  |  80 +++-
 ovn/northd/ovn-northd.c  | 149 +
 ovn/ovn-nb.ovsschema |   7 +-
 ovn/ovn-nb.xml   |  39 ++
 ovn/ovn-sb.xml   |  78 ++-
 ovn/utilities/ovn-trace.c|  52 +---
 tests/ofproto-dpif.at|  41 ++
 tests/ovn.at | 270 ++-
 tests/test-ovn.c |  16 ++-
 18 files changed, 1189 insertions(+), 142 deletions(-)
 rename ovn/lib/{ovn-dhcp.h => ovn-l7.h} (61%)

-- 
2.9.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/3] dpif-netdev: Incremental addition/deletion of PMD threads.

2017-06-15 Thread Stokes, Ian
> >From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> >Sent: Monday, June 5, 2017 9:18 AM
> >To: Kavanagh, Mark B ; d...@openvswitch.org;
> >Darrell Ball 
> >Cc: Heetae Ahn ; Daniele Di Proietto
> >; Ben Pfaff ; Pravin Shelar
> >; Loftus, Ciara ; Stokes, Ian
> >; Kevin Traynor 
> >Subject: Re: [ovs-dev] [PATCH v2 1/3] dpif-netdev: Incremental
> >addition/deletion of PMD threads.
> >
> >Thanks for testing, Mark.
> >
> >Comments inline.
> >
> >Best regards, Ilya Maximets.
> >
> >On 01.06.2017 17:30, Kavanagh, Mark B wrote:
> >>> From: ovs-dev-boun...@openvswitch.org
> >>> [mailto:ovs-dev-boun...@openvswitch.org] On Behalf
> >Of
> >>> Ilya Maximets
> >>> Sent: Tuesday, May 30, 2017 3:12 PM
> >>> To: d...@openvswitch.org; Darrell Ball 
> >>> Cc: Heetae Ahn ; Ilya Maximets
> >>> 
> >>> Subject: [ovs-dev] [PATCH v2 1/3] dpif-netdev: Incremental
> >>> addition/deletion of PMD
> >threads.
> >>>
> >>> Currently, change of 'pmd-cpu-mask' is very heavy operation.
> >>> It requires destroying of all the PMD threads and creating them
> >>> back. After that, all the threads will sleep until ports'
> >>> redistribution finished.
> >>>
> >>> This patch adds ability to not stop the datapath while adjusting
> >>> number/placement of PMD threads. All not affected threads will
> >>> forward traffic without any additional latencies.
> >>>
> >>> id-pool created for static tx queue ids to keep them sequential in a
> >>> flexible way. non-PMD thread will always have static_tx_qid = 0 as
> >>> it was before.
> >>>
> >>> Signed-off-by: Ilya Maximets 
> >>
> >> Hi Ilya,
> >>
> >> Patch LGTM - just once comment inline.
> >>
> >> I conducted some light testing and observed the following:
> >> - temporary dip in forwarding rate (~50%) when modifying
> >> pmd-cpu-mask (in previous
> >implementation, forwarding stopped completely)
> >> - occasional drop to zero in forwarding rate when modifying
> pmd-cpu-mask (e.g.
> >change from '2' to '12')
> >
> >I guess, drop to zero in this case could be fixed by second patch of
> >the series. (If it caused by port reconfiguration)
> 
> Yes, that's true - my colleague, Ian Stokes, will review and test that
> patch shortly.

Just a follow up here, I've tested the avoid port reconfiguration patch and it 
removes the drop caused by reconfiguration.

https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/334123.html

Ian
> 
> >
> >> - temporary drop of ~1mpps when adding/deleting port (similar to
> >> existing
> >implementation)
> >>
> >> I've also conducted the following sanity checks on the patch, and found
> no issues:
> >>  - checkpatch.py
> >>  - compilation with clang
> >>  - compilation with gcc
> >>  - sparse
> >>  - make check (no additional tests fail after application of this
> >> patch)
> >>
> >> Tested-by: Mark Kavanagh 
> >>
> >> Thanks,
> >> Mark
> >>
> >>> ---
> >>> lib/dpif-netdev.c | 143 +++---
> 
> >>> tests/pmd.at  |   2 +-
> >>> 2 files changed, 105 insertions(+), 40 deletions(-)
> >>>
> >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> >>> b50164b..596d133 100644
> >>> --- a/lib/dpif-netdev.c
> >>> +++ b/lib/dpif-netdev.c
> >>> @@ -48,6 +48,7 @@
> >>> #include "fat-rwlock.h"
> >>> #include "flow.h"
> >>> #include "hmapx.h"
> >>> +#include "id-pool.h"
> >>> #include "latch.h"
> >>> #include "netdev.h"
> >>> #include "netdev-vport.h"
> >>> @@ -277,6 +278,9 @@ struct dp_netdev {
> >>>
> >>> /* Stores all 'struct dp_netdev_pmd_thread's. */
> >>> struct cmap poll_threads;
> >>> +/* id pool for per thread static_tx_qid. */
> >>> +struct id_pool *tx_qid_pool;
> >>> +struct ovs_mutex tx_qid_pool_mutex;
> >>>
> >>> /* Protects the access of the 'struct dp_netdev_pmd_thread'
> >>>  * instance for non-pmd thread. */ @@ -562,7 +566,7 @@ struct
> >>> dp_netdev_pmd_thread {
> >>> /* Queue id used by this pmd thread to send packets on all netdevs
> if
> >>>  * XPS disabled for this netdev. All static_tx_qid's are unique
> and less
> >>>  * than 'cmap_count(dp->poll_threads)'. */
> >>> -const int static_tx_qid;
> >>> +uint32_t static_tx_qid;
> >>>
> >>> struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and
> 'tx_ports'. */
> >>> /* List of rx queues to poll. */ @@ -642,6 +646,8 @@ static
> >>> struct dp_netdev_pmd_thread *dp_netdev_get_pmd(struct dp_netdev *dp,
> >>>   unsigned
> >>> core_id); static struct dp_netdev_pmd_thread *
> >>> dp_netdev_pmd_get_next(struct dp_netdev *dp, struct cmap_position
> >>> *pos);
> >>> +static void dp_netdev_del_pmd(struct dp_netdev *dp,
> >>> +   

[ovs-dev] 答复: [spam可疑邮件]Re: 答复: Re: [PATCH 2/2] ovn-northd: Fix ping failure of vlan networks.

2017-06-15 Thread wang . qianyu
Hi Russell, I am sorry for the late reply.
The route not bound to a chassis, and have no redirect-chassis. The dumped 
northbound db is as follow.
Ip addresses of 100.0.0.148 and 200.0.0.2 locate on different chassis. The 
ping between them is not success before this patch.


[root@tecs159 ~]# 
[root@tecs159 ~]# ovsdb-client dump 
unix:/var/run/openvswitch/ovnnb_db.sock
ACL table
_uuidactiondirection  external_ids 
log   match priority
 - -- 
 - 
--
 

ac2900f9-49fd-430a-b646-88d1f7c54ab8 allow from-lport 
{"neutron:lport"="1ef52eb4-1f0e-416d-8dc2-e2fc7557979c"} false "inport == 
\"1ef52eb4-1f0e-416d-8dc2-e2fc7557979c\" && ip4 && ip4.dst == 
{255.255.255.255, 100.0.0.0/24} && udp && udp.src == 68 && udp.dst == 67" 
1002 
784a55c3-05fd-4c4d-a51e-5b9ee5cc1e8e allow from-lport 
{"neutron:lport"="6c04e45e-ad83-4cf0-ae74-84f7720a5bc4"} false "inport == 
\"6c04e45e-ad83-4cf0-ae74-84f7720a5bc4\" && ip4 && ip4.dst == 
{255.255.255.255, 100.0.0.0/24} && udp && udp.src == 68 && udp.dst == 67" 
1002 
08be2532-f8ff-493f-83e3-085eede36e08 allow from-lport 
{"neutron:lport"="c5ff4f7b-bd0d-4757-ac18-636f9d62b94c"} false "inport == 
\"c5ff4f7b-bd0d-4757-ac18-636f9d62b94c\" && ip4 && ip4.dst == 
{255.255.255.255, 100.0.0.0/24} && udp && udp.src == 68 && udp.dst == 67" 
1002 
bb263947-a436-4a0d-9218-5abd89546a69 allow from-lport 
{"neutron:lport"="f8de0603-f4ec-4546-a8f3-574640f270e8"} false "inport == 
\"f8de0603-f4ec-4546-a8f3-574640f270e8\" && ip4 && ip4.dst == 
{255.255.255.255, 200.0.0.0/24} && udp && udp.src == 68 && udp.dst == 67" 
1002 
092964cc-2ce5-4a34-b747-558006bb3de1 allow-related from-lport 
{"neutron:lport"="1ef52eb4-1f0e-416d-8dc2-e2fc7557979c"} false "inport == 
\"1ef52eb4-1f0e-416d-8dc2-e2fc7557979c\" && ip4" 1002 
5f2ebb8e-edbc-40aa-ada6-2fc90fc104af allow-related from-lport 
{"neutron:lport"="1ef52eb4-1f0e-416d-8dc2-e2fc7557979c"} false "inport == 
\"1ef52eb4-1f0e-416d-8dc2-e2fc7557979c\" && ip6" 1002 
13d32fab-0ed7-4472-97c2-1e3057eaca6e allow-related from-lport 
{"neutron:lport"="6c04e45e-ad83-4cf0-ae74-84f7720a5bc4"} false "inport == 
\"6c04e45e-ad83-4cf0-ae74-84f7720a5bc4\" && ip4" 1002 
7fa4e0b0-ffce-436f-a20a-07b0584c3285 allow-related from-lport 
{"neutron:lport"="6c04e45e-ad83-4cf0-ae74-84f7720a5bc4"} false "inport == 
\"6c04e45e-ad83-4cf0-ae74-84f7720a5bc4\" && ip6" 1002 
b32cf462-a8e5-4597-9c6e-4dc02ae2e2c4 allow-related from-lport 
{"neutron:lport"="c5ff4f7b-bd0d-4757-ac18-636f9d62b94c"} false "inport == 
\"c5ff4f7b-bd0d-4757-ac18-636f9d62b94c\" && ip4" 1002 
4d003f24-f546-49fa-a33c-92384e4d3549 allow-related from-lport 
{"neutron:lport"="c5ff4f7b-bd0d-4757-ac18-636f9d62b94c"} false "inport == 
\"c5ff4f7b-bd0d-4757-ac18-636f9d62b94c\" && ip6" 1002 
7078873a-fa44-4c64-be7f-067d19361fb4 allow-related from-lport 
{"neutron:lport"="f8de0603-f4ec-4546-a8f3-574640f270e8"} false "inport == 
\"f8de0603-f4ec-4546-a8f3-574640f270e8\" && ip4" 1002 
a15bd032-9755-45a5-b7ea-9687b9d14560 allow-related from-lport 
{"neutron:lport"="f8de0603-f4ec-4546-a8f3-574640f270e8"} false "inport == 
\"f8de0603-f4ec-4546-a8f3-574640f270e8\" && ip6" 1002 
4ace5b98-e6dd-467c-a7cf-af5e76a258f5 allow-related to-lport 
{"neutron:lport"="1ef52eb4-1f0e-416d-8dc2-e2fc7557979c"} false "outport == 
\"1ef52eb4-1f0e-416d-8dc2-e2fc7557979c\" && ip4 && ip4.src == 
$as_ip4_539bd583_ca35_4ae7_9774_299fd56765ef" 1002 
6e3453ee-a717-49fe-8160-ab304daa7bd8 allow-related to-lport 
{"neutron:lport"="1ef52eb4-1f0e-416d-8dc2-e2fc7557979c"} false "outport == 
\"1ef52eb4-1f0e-416d-8dc2-e2fc7557979c\" && ip6 && ip6.src == 
$as_ip6_539bd583_ca35_4ae7_9774_299fd56765ef" 1002 
cc1b88c5-e9d7-42fe-8e17-deb2fbc7c7a2 allow-related to-lport 
{"neutron:lport"="6c04e45e-ad83-4cf0-ae74-84f7720a5bc4"} false "outport == 
\"6c04e45e-ad83-4cf0-ae74-84f7720a5bc4\" && ip4 && ip4.src == 
$as_ip4_539bd583_ca35_4ae7_9774_299fd56765ef" 1002 
ecb2798f-4a87-4260-b9a8-3cdea1eca638 allow-related to-lport 
{"neutron:lport"="6c04e45e-ad83-4cf0-ae74-84f7720a5bc4"} false "outport == 
\"6c04e45e-ad83-4cf0-ae74-84f7720a5bc4\" && ip6 && ip6.src == 
$as_ip6_539bd583_ca35_4ae7_9774_299fd56765ef" 1002 
71c56144-3b95-454a-acb4-67cd924dff08 allow-related to-lport 
{"neutron:lport"="c5ff4f7b-bd0d-4757-ac18-636f9d62b94c"} false "outport == 
\"c5ff4f7b-bd0d-4757-ac18-636f9d62b94c\" && ip4 && ip4.src == 
$as_ip4_539bd583_ca35_4ae7_9774_299fd56765ef" 1002 
15766592-aa79-465b-8935-bbc916692b75 allow-related to-lport 
{"neutron:lport"="c5ff4f7b-bd0d-4757-ac18-636f9d62b94c"} false "outport == 

Re: [ovs-dev] [PATCH v2 2/3] dpif-netdev: Avoid port's reconfiguration on pmd-cpu-mask changes.

2017-06-15 Thread Stokes, Ian
> Reconfiguration of HW NICs may lead to packet drops.
> In current model all physical ports will be reconfigured each time number
> of PMD threads changed. Since we not stopping threads on pmd-cpu-mask
> changes, this patch will help to further decrease port's downtime by
> setting the maximum possible number of wanted tx queues to avoid
> unnecessary reconfigurations.

Hi Ilya,

Thanks for the patch, in testing I can confirm it resolves the issue. A few 
observations and comments below.

> 
> Signed-off-by: Ilya Maximets 
> ---
>  lib/dpif-netdev.c | 26 +-
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 596d133..79db770
> 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3453,7 +3453,7 @@ reconfigure_datapath(struct dp_netdev *dp)  {
>  struct dp_netdev_pmd_thread *pmd;
>  struct dp_netdev_port *port;
> -int wanted_txqs;
> +int needed_txqs, wanted_txqs;
> 
>  dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
> 
> @@ -3461,7 +3461,15 @@ reconfigure_datapath(struct dp_netdev *dp)
>   * on the system and the user configuration. */
>  reconfigure_pmd_threads(dp);
> 
> -wanted_txqs = cmap_count(>poll_threads);
> +/* We need 1 Tx queue for each thread to avoid locking, but we will
> try
> + * to allocate the maximum possible value to minimize the number of
> port
> + * reconfigurations. */
> +needed_txqs = cmap_count(>poll_threads);
> +/* (n_cores + 1) is the maximum that we might need to have.
> + * Additional queue is for non-PMD threads. */
> +wanted_txqs = ovs_numa_get_n_cores();
> +ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
> +wanted_txqs++;

So just after this line there is a call

HMAP_FOR_EACH (port, node, >ports) {
netdev_set_tx_multiq(port->netdev, wanted_txqs);
}

My initial concern with this patch was twofold:

(i) What would happen if the number of cores was greater than the number of 
supported queues.

That's not an issue from what I can see as the requested_txqs will be compared 
to what's supported by the device and the smaller of the two will be chosen in 
the eventual call to dpdk_eth_dev_init():

n_txq = MIN(info.max_tx_queues, dev->up.n_txq);

(ii) What happens when a device reports the incorrect number of max tx queues? 
(can occur depending on the mode of the device).

I think this case is ok as well as it's handled by the existing txq retry logic 
in dpdk_eth_dev_queue_setup(). 

> 
>  /* The number of pmd threads might have changed, or a port can be
> new:
>   * adjust the txqs. */
> @@ -3474,9 +3482,17 @@ reconfigure_datapath(struct dp_netdev *dp)
> 
>  /* Check for all the ports that need reconfiguration.  We cache this
> in
>   * 'port->reconfigure', because netdev_is_reconf_required() can
> change at
> - * any time. */
> + * any time.
> + * Also mark for reconfiguration all ports which will likely change
> their
> + * 'dynamic_txqs' parameter. It's required to stop using them before
> + * changing this setting and it's simpler to mark ports here and
> allow
> + * 'pmd_remove_stale_ports' to remove them from threads. There will
> be
> + * no actual reconfiguration in 'port_reconfigure' because it's
> + * unnecessary.  */
>  HMAP_FOR_EACH (port, node, >ports) {
> -if (netdev_is_reconf_required(port->netdev)) {
> +if (netdev_is_reconf_required(port->netdev)
> +|| (port->dynamic_txqs
> +!=  netdev_n_txq(port->netdev) < needed_txqs)) {

GCC caught the following

lib/dpif-netdev.c:3448:48: error: suggest parentheses around comparison in 
operand of '!=' [-Werror=parentheses]
 !=  netdev_n_txq(port->netdev) < needed_txqs)) {

>  port->need_reconfigure = true;
>  }
>  }
> @@ -3510,7 +3526,7 @@ reconfigure_datapath(struct dp_netdev *dp)
>  seq_change(dp->port_seq);
>  port_destroy(port);
>  } else {
> -port->dynamic_txqs = netdev_n_txq(port->netdev) <
> wanted_txqs;
> +port->dynamic_txqs = netdev_n_txq(port->netdev) <
> + needed_txqs;
>  }
>  }
> 
> --
> 2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] OVS NAT feature using contrackd with asymmetric active-active mode

2017-06-15 Thread Sangho Shin
Hello,

I am testing OVS NAT features using OVS 2.6.1. 
I have set up two gateways using OVS, where source NAT is performed. The two 
gateways are active - active and are connected to a router.
So, with router ECMP support, the ingress packet might pass through different 
gateway in an asymmetric way. To support the asymmetric case, we use 
‘conntrackd’ to synchronize the conntrack entries.

I found out that using conntrackd active-active mode connrack entries are 
synchronized successfully, and it works with ICMP and UDP. However, TCP is NOT 
working strangely. I have checked that the conntrack entries for the TCP 
connection including NAT information are there in the two gateways. However, 
when SYN_ACK packet comes to a different gateway from egress gateway, OVS NAT 
is not working. OVS NAT works only when the SYN_ACK (ingress packet) comes 
though the same gateway as the egress packet (SYN).

I wonder if anybody has tested the same case. 


The following two rules are used for NAT

(egress)
cookie=0x493114b4dc, duration=4239.867s, table=0, n_packets=9, n_bytes=736, 
send_flow_rem priority=40500,ip,dl_dst=fe:00:00:00:00:02 
actions=ct(commit,nat(src=172.27.0.1)),set_field:fe:00:00:00:00:01->eth_dst,set_field:fe:00:00:00:00:02->e

(ingress)
cookie=0x4975ba74d4, duration=4229.834s, table=0, n_packets=19, 
n_bytes=1926, send_flow_rem priority=40500,ct_state=-trk,ip,nw_dst=172.27.0.1 
actions=set_field:fe:00:00:00:00:01->eth_dst,ct(table=0,nat)


The following is the output of conntrack entry using "conntrack -L”

(Gateway 1)
tcp  6 117 SYN_SENT src=172.1.1.3 dst=216.187.11.75 sport=57998 dport=80 
[UNREPLIED] src=216.187.11.75 dst=172.27.0.1 sport=80 dport=57998 mark=0 use=1

(Gateway 2)
conntrack v1.4.1 (conntrack-tools): 59 flow entries have been shown.
tcp  6 103 SYN_SENT src=172.1.1.3 dst=216.187.11.75 sport=57998 dport=80 
[UNREPLIED] src=216.187.11.75 dst=172.27.0.1 sport=80 dport=57998 mark=0 use=1

Also, I have checked that iptables are NOT filtering anything.

Anybody has any idea why it is not working??

Thank you,

Sangho

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev