Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

2018-03-05 Thread Adrien Mazarguil
On Mon, Feb 26, 2018 at 05:44:01PM +, Doherty, Declan wrote:
> On 13/02/2018 5:05 PM, Adrien Mazarguil wrote:
> > Hi,
> > 
> > Apologies for being late to this thread, I've read the ensuing discussion
> > (hope I didn't miss any) and also think rte_flow could be improved in
> > several ways to enable TEP support, in particular regarding the ordering of
> > actions.
> > 
> > On the other hand I'm not sure a dedicated API for TEP is needed at all. I'm
> > not convinced rte_security chose the right path and would like to avoid
> > repeating the same mistakes if possible, more below.
> > 
> > On Thu, Dec 21, 2017 at 10:21:13PM +, Doherty, Declan wrote:
> > > This RFC contains a proposal to add a new tunnel endpoint API to DPDK 
> > > that when used
> > > in conjunction with rte_flow enables the configuration of inline data 
> > > path encapsulation
> > > and decapsulation of tunnel endpoint network overlays on accelerated IO 
> > > devices.
> > > 
> > > The proposed new API would provide for the creation, destruction, and
> > > monitoring of a tunnel endpoint in supporting hw, as well as capabilities 
> > > APIs to allow the
> > > acceleration features to be discovered by applications.

> > Although I'm not convinced an opaque object is the right approach, if we
> > choose this route I suggest the much simpler:
> > 
> >   struct rte_flow_action_tep_(encap|decap) {
> >   struct rte_tep *tep;
> >   uint32_t flow_id;
> >   };
> > 
> 
> That's a fair point. The only other item that the encap/decap actions
> currently supported was the Ethernet item, and going back to a comment
> from Boris, having the Ethernet header separate from the tunnel is
> probably not ideal anyway. One of our reasons for using an opaque tep
> item was to allow modification of the TEP independently of all the flows
> being carried on it. So for instance if the src or dst MAC needs to be
> modified, or the output port needs to be changed, the TEP itself could be
> modified.

Makes sense. I think there's now consensus that without a dedicated API, it
can be done through multiple rte_flow groups and "jump" actions targeting
them. Such actions remain to be formally defined though.
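
For illustration, a rough sketch of how the multi-group approach could look
(a sketch only: the jump action names below are assumptions since no such
action was defined in rte_flow at the time, and the generic DECAP action is
the one proposed in this RFC):

#include <rte_byteorder.h>
#include <rte_flow.h>

/*
 * Group 0: match the outer VXLAN tunnel headers, strip them and jump to
 * group 1, where rules matching the inner flows can be added or removed
 * independently of the tunnel definition.
 */
static struct rte_flow *
tep_decap_rule(uint16_t port_id, rte_be32_t outer_daddr, rte_be16_t vxlan_port,
               struct rte_flow_error *err)
{
        struct rte_flow_attr attr = { .group = 0, .ingress = 1 };
        struct rte_flow_item_ipv4 ipv4 = { .hdr = { .dst_addr = outer_daddr } };
        struct rte_flow_item_udp udp = { .hdr = { .dst_port = vxlan_port } };
        struct rte_flow_item pattern[] = {
                { .type = RTE_FLOW_ITEM_TYPE_ETH },
                { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4 },
                { .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp },
                { .type = RTE_FLOW_ITEM_TYPE_VXLAN },
                { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_jump jump = { .group = 1 };          /* assumed */
        struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_DECAP },                /* proposed */
                { .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },  /* assumed */
                { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        return rte_flow_create(port_id, &attr, pattern, actions, err);
}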

In the meantime there is an alternative approach when opaque pattern
items/actions are unavoidable: by using negative values [1].

In addition to an opaque object to use with rte_flow, a PMD could return a
PMD-specific negative value cast as enum rte_flow_{item,action}_type and
usable with the associated port ID only.

An API could even initialize a pattern item or an action object directly:

 struct rte_flow_action tep_action;
 
 if (rte_tep_create(port_id, &tep_action, ...) != 0)
  rte_panic("no!");
 /*
  * tep_action is now initialized with an opaque type and conf pointer, it
  * can be used with rte_flow_create() as part of an action list.
  */

[1] http://dpdk.org/doc/guides/prog_guide/rte_flow.html#negative-types


> > > struct rte_tep *tep = rte_tep_create(port_id, &attrs, pattern);
> > > 
> > > Once the tep context is created, flows can then be directed to that
> > > endpoint for processing. The following sections will outline how the
> > > author envisages flow programming will work and also how TEP
> > > acceleration can be combined with other accelerations.
> > 
> > In order to allow a single TEP context object to be shared by multiple flow
> > rules, a whole new API must be implemented and applications still have to
> > additionally create one rte_flow rule per TEP flow_id to manage. While this
> > probably results in shorter flow rule patterns and action lists, is it
> > really worth it?
> > 
> > While I understand the reasons for this approach, I'd like to push for a
> > rte_flow-only API as much as possible, I'll provide suggestions below.
> > 
> 
> Not only are the rules shorter to implement, it could also greatly reduce
> the number of cycles required to add flows, both in terms of the
> application marshalling the data into rte_flow patterns and the PMD parsing
> those patterns every time a flow is added. In the case where 10k's of
> flows are being added per second this could add a significant overhead to
> the system.

True, although only if the underlying hardware supports it; some PMDs may
still have to update each flow rule independently in order to expose such an
API. Applications can't be certain an update operation will be quick and
atomic.


> > > /** VERY IMPORTANT NOTE **/
> > > One of the core concepts of this proposal is that actions which modify
> > > the packet are defined in the order in which they are to be processed.
> > > So first decap the outer Ethernet header, then the outer TEP headers.
> > > I think this is not only logical from a usability point of view, it
> > > should also simplify the logic required in PMDs to parse the desired
> > > actions.
> > 
> > This. I've been thinking about it for a very long time but never got around
> > to submitting a patch. Handling rte_flow 

Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

2018-02-26 Thread Doherty, Declan

On 13/02/2018 5:05 PM, Adrien Mazarguil wrote:

Hi,

Apologies for being late to this thread, I've read the ensuing discussion
(hope I didn't miss any) and also think rte_flow could be improved in
several ways to enable TEP support, in particular regarding the ordering of
actions.

On the other hand I'm not sure a dedicated API for TEP is needed at all. I'm
not convinced rte_security chose the right path and would like to avoid
repeating the same mistakes if possible, more below.

On Thu, Dec 21, 2017 at 10:21:13PM +, Doherty, Declan wrote:

This RFC contains a proposal to add a new tunnel endpoint API to DPDK that when 
used
in conjunction with rte_flow enables the configuration of inline data path 
encapsulation
and decapsulation of tunnel endpoint network overlays on accelerated IO devices.

The proposed new API would provide for the creation, destruction, and
monitoring of a tunnel endpoint in supporting hw, as well as capabilities APIs 
to allow the
acceleration features to be discovered by applications.

/** Tunnel Endpoint context, opaque structure */
struct rte_tep;

enum rte_tep_type {
    RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
    RTE_TEP_TYPE_NVGRE, /**< NVGRE Protocol */
    ...
};

/** Tunnel Endpoint Attributes */
struct rte_tep_attr {
    enum rte_tep_type type;

    /* other endpoint attributes here */
};

/**
* Create a tunnel end-point context as specified by the flow attribute and
* pattern
*
* @param   port_id Port identifier of Ethernet device.
* @param   attr    Flow rule attributes.
* @param   pattern Pattern specification by list of rte_flow_items.
* @return
*  - On success returns pointer to TEP context
*  - On failure returns NULL
*/
struct rte_tep *rte_tep_create(uint16_t port_id,
        struct rte_tep_attr *attr, struct rte_flow_item pattern[])

/**
* Destroy an existing tunnel end-point context. All the end-points context
* will be destroyed, so all active flows using tep should be freed before
* destroying context.
* @param   port_id Port identifier of Ethernet device.
* @param   tep     Tunnel endpoint context
* @return
*  - On success returns 0
*  - On failure returns 1
*/
int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep)

/**
* Get tunnel endpoint statistics
*
* @param   port_id Port identifier of Ethernet device.
* @param   tep     Tunnel endpoint context
* @param   stats   Tunnel endpoint statistics
*
* @return
*  - On success returns 0
*  - On failure returns 1
*/
int
rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
        struct rte_tep_stats *stats)

/**
* Get ports tunnel endpoint capabilities
*
* @param   port_id      Port identifier of Ethernet device.
* @param   capabilities Tunnel endpoint capabilities
*
* @return
*  - On success returns 0
*  - On failure returns 1
*/
int
rte_tep_capabilities_get(uint16_t port_id,
        struct rte_tep_capabilities *capabilities)


To direct traffic flows to hw terminated tunnel endpoint the rte_flow API is
enhanced to add a new flow item type. This contains a pointer to the
TEP context as well as the overlay flow id to which the traffic flow is
associated.

struct rte_flow_item_tep {
    struct rte_tep *tep;
    uint32_t flow_id;
};


What I dislike is rte_flow item/actions relying on externally-generated
opaque objects when these can be avoided, as it means yet another API
applications have to deal with and PMDs need to implement; this adds a layer
of inefficiency in my opinion.

I believe TEP can be fully implemented through a combination of new rte_flow
pattern items/actions without involving external API calls. More on that
later.


Also 2 new generic action types are added: encapsulation and decapsulation.

RTE_FLOW_ACTION_TYPE_ENCAP
RTE_FLOW_ACTION_TYPE_DECAP

struct rte_flow_action_encap {
    struct rte_flow_item *item;
};

struct rte_flow_action_decap {
    struct rte_flow_item *item;
};


Encap/decap actions are definitely needed and useful, no question about
that. I'm unsure about doing so through a generic action with the described
structures instead of dedicated ones though.

These can't work with anything other than rte_flow_item_tep; a special
pattern item using some kind of opaque object is needed (e.g. using
rte_flow_item_tcp makes no sense with them).

Also struct rte_flow_item is tailored for flow rule patterns, using it with
actions is not only confusing, it makes its "mask" and "last" members
useless and inconsistent with their documentation.

Although I'm not convinced an opaque object is the right approach, if we
choose this route I suggest the much simpler:

  struct rte_flow_action_tep_(encap|decap) {
  struct rte_tep *tep;
  uint32_t flow_id;
  };



That's a fair point. The only other item that the encap/decap actions 
currently supported was the Ethernet item

Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

2018-02-13 Thread Adrien Mazarguil
Hi,

Apologies for being late to this thread, I've read the ensuing discussion
(hope I didn't miss any) and also think rte_flow could be improved in
several ways to enable TEP support, in particular regarding the ordering of
actions.

On the other hand I'm not sure a dedicated API for TEP is needed at all. I'm
not convinced rte_security chose the right path and would like to avoid
repeating the same mistakes if possible, more below.

On Thu, Dec 21, 2017 at 10:21:13PM +, Doherty, Declan wrote:
> This RFC contains a proposal to add a new tunnel endpoint API to DPDK that 
> when used
> in conjunction with rte_flow enables the configuration of inline data path 
> encapsulation
> and decapsulation of tunnel endpoint network overlays on accelerated IO 
> devices.
> 
> The proposed new API would provide for the creation, destruction, and
> monitoring of a tunnel endpoint in supporting hw, as well as capabilities 
> APIs to allow the
> acceleration features to be discovered by applications.
> 
> /** Tunnel Endpoint context, opaque structure */
> struct rte_tep;
> 
> enum rte_tep_type {
>    RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
>    RTE_TEP_TYPE_NVGRE, /**< NVGRE Protocol */
>    ...
> };
> 
> /** Tunnel Endpoint Attributes */
> struct rte_tep_attr {
>    enum rte_tep_type type;
> 
>    /* other endpoint attributes here */
> };
> 
> /**
> * Create a tunnel end-point context as specified by the flow attribute and
> * pattern
> *
> * @param   port_id Port identifier of Ethernet device.
> * @param   attr    Flow rule attributes.
> * @param   pattern Pattern specification by list of rte_flow_items.
> * @return
> *  - On success returns pointer to TEP context
> *  - On failure returns NULL
> */
> struct rte_tep *rte_tep_create(uint16_t port_id,
>       struct rte_tep_attr *attr, struct rte_flow_item pattern[])
> 
> /**
> * Destroy an existing tunnel end-point context. All the end-points context
> * will be destroyed, so all active flows using tep should be freed before
> * destroying context.
> * @param   port_id Port identifier of Ethernet device.
> * @param   tep     Tunnel endpoint context
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep)
> 
> /**
> * Get tunnel endpoint statistics
> *
> * @param   port_id Port identifier of Ethernet device.
> * @param   tep     Tunnel endpoint context
> * @param   stats   Tunnel endpoint statistics
> *
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int
> rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
>       struct rte_tep_stats *stats)
> 
> /**
> * Get ports tunnel endpoint capabilities
> *
> * @param   port_id      Port identifier of Ethernet device.
> * @param   capabilities Tunnel endpoint capabilities
> *
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int
> rte_tep_capabilities_get(uint16_t port_id,
>       struct rte_tep_capabilities *capabilities)
> 
> 
> To direct traffic flows to hw terminated tunnel endpoint the rte_flow API is
> enhanced to add a new flow item type. This contains a pointer to the
> TEP context as well as the overlay flow id to which the traffic flow is
> associated.
> 
> struct rte_flow_item_tep {
>    struct rte_tep *tep;
>    uint32_t flow_id;
> };

What I dislike is rte_flow item/actions relying on externally-generated
opaque objects when these can be avoided, as it means yet another API
applications have to deal with and PMDs need to implement; this adds a layer
of inefficiency in my opinion.

I believe TEP can be fully implemented through a combination of new rte_flow
pattern items/actions without involving external API calls. More on that
later.

> Also 2 new generic action types are added: encapsulation and decapsulation.
> 
> RTE_FLOW_ACTION_TYPE_ENCAP
> RTE_FLOW_ACTION_TYPE_DECAP
> 
> struct rte_flow_action_encap {
>    struct rte_flow_item *item;
> };
> 
> struct rte_flow_action_decap {
>    struct rte_flow_item *item;
> };

Encap/decap actions are definitely needed and useful, no question about
that. I'm unsure about doing so through a generic action with the described
structures instead of dedicated ones though.

These can't work with anything other than rte_flow_item_tep; a special
pattern item using some kind of opaque object is needed (e.g. using
rte_flow_item_tcp makes no sense with them).

Also struct rte_flow_item is tailored for flow rule patterns, using it with
actions is not only confusing, it makes its "mask" and "last" members
useless and inconsistent with their documentation.

Although I'm not convinced an opaque object is the right approach, if we
choose this route I suggest the much simpler:

 struct rte_flow_action_tep_(encap|decap) {
 struct rte_tep *tep;
 uint32_t flow_id;
 };

Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

2018-02-01 Thread Shahaf Shuler
Hi Declan, sorry for the late response. 

Tuesday, January 23, 2018 5:36 PM, Doherty, Declan:
> > If I get it right, the API proposed here is to have a tunnel endpoint which 
> > is
> a logical port on top of an ethdev port. The TEP is able to receive and monitor
> some specific tunneled traffic, for example VXLAN, GENEVE and more.
> > For example, VXLAN TEP can have multiple flows with different VNIs all
> under the same context.
> >
> > Now, with the current rte_flow APIs, we can do exactly the same and give
> the application the full flexibility to group the tunnel flows into logical 
> TEP.
> > On this suggestion the application will:
> > 1. Create rte_flow rules for the pattern it wants to receive.
> > 2. In case it is interested in counting, a COUNT action will be added to the
> flow.
> > 3. In case header manipulation is required, a DECAP/ENCAP/REWRITE
> action will be added to the flow.
> > 4. Grouping of flows into a logical TEP will be done on the application 
> > layer
> simply by keeping the relevant rte_flow rules in some dedicated struct. With
> it, create/destroy TEP can be translated to create/destroy the flow rules.
> Statistics query can be done by querying each flow count and summing. Note that
> some devices can support the same counter for multiple flows. Even though
> it is not yet exposed in rte_flow this can be an interesting optimization.
> 
> As I responded in John's mail I think this approach fails in devices which
> support switching offload also. As the flows never hit the host application
> configuring the TEP and flows there is no easy way to sum those statistics,

Devices which support switching offloads must use NIC support to count the 
flows. It can be either by associating a count action with a flow or by using 
TEP in your proposal.
The TEP counting could be introduced in another way - instead of having a 1:1 
relation between flow counter and rte_flow, to introduce a counter element 
which can be attached to multiple flows. 
So this counter element along with the rte_flows it is associated with is 
basically the TEP:
1. it holds the sum of statistics from all the TEP flows it is associated with.
2. it holds the receive pattern 
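
For illustration only, such a counter element might look roughly as below
(none of these names exist in rte_flow; they merely sketch the idea of a
counter shared by several flows):

/* Hypothetical: an opaque counter object allocated once per TEP. */
struct rte_flow_shared_counter;

struct rte_flow_shared_counter *
rte_flow_shared_counter_create(uint16_t port_id, struct rte_flow_error *err);

/* Hypothetical conf for a COUNT action referencing the shared counter;
 * any number of flow rules could point at the same counter. */
struct rte_flow_action_count_shared {
        struct rte_flow_shared_counter *counter;
};

/* Querying the counter returns the aggregate statistics of every flow it is
 * attached to, i.e. the per-TEP statistics discussed above. */
int
rte_flow_shared_counter_query(uint16_t port_id,
                              struct rte_flow_shared_counter *counter,
                              struct rte_flow_query_count *stats,
                              struct rte_flow_error *err);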

My point is, I don't think it is correct to bind the TEP to the switching 
offload actions (encap/decap/rewrite in this context). 
The TEP can be presented as an auxiliary library/API to help with the flow 
grouping, however the application still needs to have the ability to control 
the switch offloads as it wishes. 

> also flows are transitory in terms of runtime so it would not be possible to
> keep accurate statistics over a period of time.

Am not sure I understand what you mean here. 
In order to receive traffic you need flows. Even the default RSS configuration 
of the PMD can be described by rte_flows. 
So as long as one receives traffic it has one or more flows configured on the 
device. 

> 
> 
> >
> >>>
>  As for the capabilities - what specifically you had in mind? The
>  current
> >>> usage you show with tep is with rte_flow rules. There are no
> >>> capabilities currently for rte_flow supported actions/pattern. To
> >>> check such capabilities application uses rte_flow_validate.
> >>>
> >>> I envisaged that the application should be able to see if an ethdev
> >>> can support TEP in the rx/tx offloads, and then the
> >>> rte_tep_capabilities would allow applications to query what tunnel
> >>> endpoint protocols are supported etc. I would like a simple
> >>> mechanism to allow users to see if a particular tunnel endpoint type
> >>> is supported without having to build actual flows to validate.
> >>
> >> I can see the value of that, but in the end wouldn't the API call
> >> rte_flow_validate anyways? Maybe we don't add the layer now or maybe
> >> it doesn't really belong in DPDK? I'm in favor of deferring the
> >> capabilities API until we know it's really needed.  I hate to see
> >> special capabilities APIs start sneaking in after we decided to go
> >> the rte_flow_validate route and users are starting to get used to it.
> >
> > I don't see how it is different from any other rte_flow creation.
> > We don't hold caps for a device's ability to filter packets according to
> > VXLAN or GENEVE items. Why should we start now?
> 
> I don't know, possibly if it makes adoption of the features easier for the end
> user.
> 
> >
> > We have already rte_flow_validate. I think part of the reasons for it was
> that the number of different capabilities possible with rte_flow is huge. I
> think this is also the case with the TEP capabilities (even though it is still 
> not
> clear to me what exactly they will include).
> 
> It may be that we only need to advertise that we are capable of encap/decap
> services, but it would be good to have input from downstream users on what
> they would like to see.
> 
> >
> >>>
>  Regarding the creation/destroy of tep. Why not simply use rte_flow
>  API
> >>> and avoid this extra control?
>  For example - with 17.11 

Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

2018-01-23 Thread Doherty, Declan

On 11/01/2018 9:44 PM, John Daley (johndale) wrote:

Hi,
One comment on DECAP action and a "feature request".  I'll also reply to the 
top of thread discussion separately. Thanks for the RFC Declan!

Feature request associated with ENCAP action:

VPP (and probably other apps) would like the ability to simply specify an 
independent tunnel ID as part of egress match criteria in an rte_flow rule. 
Then egress packets could specify a tunnel ID and valid flag in the mbuf. If 
it matched the rte_flow tunnel ID item, a simple lookup in the nic could be 
done and the associated actions (particularly ENCAP) executed. The application 
already knows the tunnel that the packet is associated with, so there is no 
need to have the nic do matching on a header pattern. Plus it's possible that 
packet headers alone are not enough to determine the correct encap action (the 
bridge where the packet came from might be required).

This would require a new mbuf field to specify the tunnel ID (maybe in 
tx_offload) and a valid flag.  It would also require a new rte flow item type 
for matching the tunnel ID (like RTE_FLOW_ITEM_TYPE_META_TUNNEL_ID).

Is something like this being considered by others? If not, should it be part of 
this RFC or a new one? I think this would be the 1st meta-data match criterion 
in rte_flow, but I could see others following.


This sounds similar to what we needed to do in rte_security to support 
metadata for inline crypto on the ixgbe. I wasn't aware of devices which 
supported this type of function for overlays, but it definitely sounds 
like we need to consider it here.




-johnd


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Doherty, Declan
Sent: Thursday, December 21, 2017 2:21 PM
To: dev@dpdk.org
Subject: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

This RFC contains a proposal to add a new tunnel endpoint API to DPDK that
when used in conjunction with rte_flow enables the configuration of inline
data path encapsulation and decapsulation of tunnel endpoint network
overlays on accelerated IO devices.

The proposed new API would provide for the creation, destruction, and
monitoring of a tunnel endpoint in supporting hw, as well as capabilities APIs
to allow the acceleration features to be discovered by applications.

/** Tunnel Endpoint context, opaque structure */
struct rte_tep;

enum rte_tep_type {
    RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
    RTE_TEP_TYPE_NVGRE, /**< NVGRE Protocol */
    ...
};

/** Tunnel Endpoint Attributes */
struct rte_tep_attr {
    enum rte_tep_type type;

    /* other endpoint attributes here */
};

/**
* Create a tunnel end-point context as specified by the flow attribute and
* pattern
*
* @param   port_id Port identifier of Ethernet device.
* @param   attr    Flow rule attributes.
* @param   pattern Pattern specification by list of rte_flow_items.
* @return
*  - On success returns pointer to TEP context
*  - On failure returns NULL
*/
struct rte_tep *rte_tep_create(uint16_t port_id,
        struct rte_tep_attr *attr, struct rte_flow_item pattern[])

/**
* Destroy an existing tunnel end-point context. All the end-points context
* will be destroyed, so all active flows using tep should be freed before
* destroying context.
* @param   port_id Port identifier of Ethernet device.
* @param   tep     Tunnel endpoint context
* @return
*  - On success returns 0
*  - On failure returns 1
*/
int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep)

/**
* Get tunnel endpoint statistics
*
* @param   port_id Port identifier of Ethernet device.
* @param   tep     Tunnel endpoint context
* @param   stats   Tunnel endpoint statistics
*
* @return
*  - On success returns 0
*  - On failure returns 1
*/
int
rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
        struct rte_tep_stats *stats)

/**
* Get ports tunnel endpoint capabilities
*
* @param   port_id      Port identifier of Ethernet device.
* @param   capabilities Tunnel endpoint capabilities
*
* @return
*  - On success returns 0
*  - On failure returns 1
*/
int
rte_tep_capabilities_get(uint16_t port_id,
        struct rte_tep_capabilities *capabilities)


To direct traffic flows to hw terminated tunnel endpoint the rte_flow API is
enhanced to add a new flow item type. This contains a pointer to the TEP
context as well as the overlay flow id to which the traffic flow is associated.

struct rte_flow_item_tep {
    struct rte_tep *tep;
    uint32_t flow_id;
};

Also 2 new generic action types are added: encapsulation and decapsulation.

RTE_FLOW_ACTION_TYPE_ENCAP
RTE_FLOW_ACTION_TYPE_DECAP

struct rte_flow_action_encap {
    struct rte_flow_item *item;
};

struct rte_flow_action_decap {
    struct rte_flow_item *item;
};

The following section outlines the intended usage of the new APIs and then
how they are combined with the existing rte_flow APIs.

Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

2018-01-23 Thread Doherty, Declan

On 16/01/2018 8:22 AM, Shahaf Shuler wrote:

Thursday, January 11, 2018 11:45 PM, John Daley:

Hi Declan and Shahaf,


I can't see how the existing
ethdev API could be used for statistics as a single ethdev could be
supporting many concurrent TEPs, therefore we would either need to use
the extended stats with many entries, one for each TEP, or if we treat
a TEP as an attribute of a port in a similar manner to the way
rte_security manages an IPsec SA, the state of each TEP can be
monitored and managed independently of both the overall port or the

flows being transported on that endpoint.

Assuming we can define one rte_flow rule per TEP, does what you propose
give us anything more than just using the COUNT action?


I agree with John here, and I am also not sure we need such an assumption.

If I get it right, the API proposed here is to have a tunnel endpoint which is 
a logical port on top of an ethdev port. The TEP is able to receive and monitor 
some specific tunneled traffic, for example VXLAN, GENEVE and more.
For example, VXLAN TEP can have multiple flows with different VNIs all under 
the same context.

Now, with the current rte_flow APIs, we can do exactly the same and give the 
application the full flexibility to group the tunnel flows into a logical TEP.
On this suggestion the application will:
1. Create rte_flow rules for the pattern it wants to receive.
2. In case it is interested in counting, a COUNT action will be added to the 
flow.
3. In case header manipulation is required, a DECAP/ENCAP/REWRITE action will 
be added to the flow.
4. Grouping of flows into a logical TEP will be done on the application layer 
simply by keeping the relevant rte_flow rules in some dedicated struct. With 
it, create/destroy TEP can be translated to create/destroy the flow rules. 
Statistics query can be done by querying each flow count and summing. Note that 
some devices can support the same counter for multiple flows. Even though it is 
not yet exposed in rte_flow this can be an interesting optimization.


As I responded in John's mail I think this approach fails in devices 
which support switching offload also. As the flows never hit the host 
application configuring the TEP and flows there is no easy way to sum 
those statistics; also flows are transitory in terms of runtime so it 
would not be possible to keep accurate statistics over a period of time.








As for the capabilities - what specifically you had in mind? The
current

usage you show with tep is with rte_flow rules. There are no
capabilities currently for rte_flow supported actions/pattern. To
check such capabilities application uses rte_flow_validate.

I envisaged that the application should be able to see if an ethdev
can support TEP in the rx/tx offloads, and then the
rte_tep_capabilities would allow applications to query what tunnel
endpoint protocols are supported etc. I would like a simple mechanism
to allow users to see if a particular tunnel endpoint type is
supported without having to build actual flows to validate.


I can see the value of that, but in the end wouldn't the API call
rte_flow_validate anyways? Maybe we don't add the layer now or maybe it
doesn't really belong in DPDK? I'm in favor of deferring the capabilities API
until we know it's really needed.  I hate to see special capabilities APIs start
sneaking in after we decided to go the rte_flow_validate route and users are
starting to get used to it.


I don't see how it is different from any other rte_flow creation.
We don't hold caps for a device's ability to filter packets according to VXLAN or 
GENEVE items. Why should we start now?


I don't know, possibly if it makes adoption of the features easier for 
the end user.




We have already rte_flow_validate. I think part of the reasons for it was 
that the number of different capabilities possible with rte_flow is huge. I 
think this is also the case with the TEP capabilities (even though it is still not 
clear to me what exactly they will include).


It may be that we only need to advertise that we are capable of encap/decap 
services, but it would be good to have input from downstream users on what 
they would like to see.







Regarding the creation/destroy of tep. Why not simply use rte_flow
API

and avoid this extra control?

For example - with 17.11 APIs, application can put the port in
isolate mode,

and insert a flow_rule to catch only IPv4 VXLAN traffic and direct to
some queue/do RSS. Such operation, per my understanding, will create a
tunnel endpoint. What are the down sides of doing it with the current

APIs?


> > That doesn't enable encapsulation and decapsulation of the outer
> > tunnel endpoint in the hw as far as I know. Apart from the inability
> > to monitor the endpoint statistics I mentioned above. It would also
> > require that you redefine the endpoint's parameters every time you
> > wish to add a new flow to it. I think having the rte_tep object
> > semantics should also simplify the ability to enable a full vswitch
> > offload of TEP where the hw is handling both encap/decap and switching
> > to a particular port.

Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

2018-01-23 Thread Doherty, Declan

On 11/01/2018 9:45 PM, John Daley (johndale) wrote:

Hi Declan and Shahaf,


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Doherty, Declan
Sent: Tuesday, January 09, 2018 9:31 AM
To: Shahaf Shuler ; dev@dpdk.org
Subject: Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

On 24/12/2017 5:30 PM, Shahaf Shuler wrote:

Hi Declan,



Hey Shahaf, apologies for the delay in responding, I have been out of office
for the last 2 weeks.


Friday, December 22, 2017 12:21 AM, Doherty, Declan:

This RFC contains a proposal to add a new tunnel endpoint API to DPDK
that when used in conjunction with rte_flow enables the configuration
of inline data path encapsulation and decapsulation of tunnel
endpoint network overlays on accelerated IO devices.

The proposed new API would provide for the creation, destruction, and
monitoring of a tunnel endpoint in supporting hw, as well as
capabilities APIs to allow the acceleration features to be discovered by

applications.







Am not sure I understand why there is a need for the above control
methods.
Are you introducing a new "tep device"?
As the tunnel endpoint is sending and receiving Ethernet packets from
the network I think it should still be counted as an Ethernet device but with
more capabilities (for example it supports encap/decap etc..), therefore it
should use the Ethdev layer API to query statistics (for example).

No, the new APIs are only intended to be a method of creating, monitoring
and deleting tunnel-endpoints on an existing ethdev. The rationale for APIs
separate to rte_flow is the same as that in rte_security: there is not a
1:1 mapping of TEPs to flows. Many flows (VNIs in VxLAN for example) can
originate/terminate on the same TEP, therefore managing the TEP
independently of the flows being transmitted on it is important to allow
visibility of that endpoint's stats, for example.


I don't quite understand what you mean by tunnel and flow here. Can you define 
exactly what you mean? Flow is an overloaded word in our world. I think that 
defining it will make understanding the RFC a little easier.



Hey John,

I think that's a good idea, for me the tunnel endpoint defines the l3/l4 
parameters of the endpoint, so for VxLAN over IPv4 this would include 
the IPv4, UDP and VxLAN headers excluding the VNI (flow id). I'm not sure 
if it makes more sense that each TEP contains the VNI (flow id) or not. I 
believe the model currently used by OvS today is similar to the RFC in 
that many VNIs can be terminated in the same TEP port context.


In terms of flow definitions, for encapsulated ingress I would see the 
definition of a flow as including the l2 and l3/l4 headers of the outer, 
including the flow id of the tunnel, and optionally any or all of the 
inner headers. For non-encapsulated egress traffic the flow defines any 
combination of the l2, l3, l4 headers as defined by the user.



Taking VxLAN, I think of the tunnel as including up through the VxLAN header, 
including the VNI. If you go by this definition, I would consider a flow to be 
all packets with the same VNI and the same 5-tuple hash of the inner packet. Is 
this what you mean by tunnel (or TEP) and flow here?


Yes, with the exception that I had excluded the VNI or flow id from the TEP 
definition and made it part of the flow, but otherwise essentially yes.




With these definitions, VPP for example might need up to a couple thousand TEPs 
on an interface and each TEP could have hundreds or thousands of flows. It 
would be quite possible to have 1 rte flow rule per TEP (or 2- ingress/decap 
and egress/encap). The COUNT action could be used to count the number of 
packets through each TEP. Is this adequate, or are you proposing that we need a 
mechanism to get stats of flows within each TEP? Is that the main point of the 
API? Assuming no need for stats on a per TEP/flow basis is there anything else 
the API adds?


Yes, the basis of having TEP as a separate API is to allow flows to be tracked 
independently of the overlay they may be transported on. I believe this 
will be a requirement for acceleration of any vswitch, as we could have 
a case where flows bypass the host vswitch completely and are 
encapsulated/decapsulated and switched in hw directly to/from the guest 
and physical port. OvS currently can track both flow and TEP statistics 
and I think we need to support this model.

I can't see how the existing
ethdev API could be used for statistics as a single ethdev could be supporting
many concurrent TEPs, therefore we would either need to use the extended
stats with many entries, one for each TEP, or if we treat a TEP as an attribute
of a port in a similar manner to the way rte_security manages an IPsec SA,
the state of each TEP can be monitored and managed independently of both
the overall port or the flows being transported on that endpoint.


Assuming we can define one rte_flow rule per TEP, does what you propose give
us anything more than just using the COUNT action?

Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

2018-01-16 Thread Shahaf Shuler
Thursday, January 11, 2018 11:45 PM, John Daley:
> Hi Declan and Shahaf,
> 
> > I can't see how the existing
> > ethdev API could be used for statistics as a single ethdev could be
> > supporting many concurrent TEPs, therefore we would either need to use
> > the extended stats with many entries, one for each TEP, or if we treat
> > a TEP as an attribute of a port in a similar manner to the way
> > rte_security manages an IPsec SA, the state of each TEP can be
> > monitored and managed independently of both the overall port or the
> flows being transported on that endpoint.
> 
> Assuming we can define one rte_flow rule per TEP, does what you propose
> give us anything more than just using the COUNT action?

I agree with John here, and I am also not sure we need such an assumption. 

If I get it right, the API proposed here is to have a tunnel endpoint which is 
a logical port on top of an ethdev port. The TEP is able to receive and monitor 
some specific tunneled traffic, for example VXLAN, GENEVE and more.
For example, VXLAN TEP can have multiple flows with different VNIs all under 
the same context. 

Now, with the current rte_flow APIs, we can do exactly the same and give the 
application the full flexibility to group the tunnel flows into a logical TEP. 
On this suggestion the application will:
1. Create rte_flow rules for the pattern it wants to receive.
2. In case it is interested in counting, a COUNT action will be added to the 
flow.
3. In case header manipulation is required, a DECAP/ENCAP/REWRITE action will 
be added to the flow. 
4. Grouping of flows into a logical TEP will be done on the application layer 
simply by keeping the relevant rte_flow rules in some dedicated struct. With 
it, create/destroy TEP can be translated to create/destroy the flow rules. 
Statistics query can be done by querying each flow count and summing. Note that 
some devices can support the same counter for multiple flows. Even though it is 
not yet exposed in rte_flow this can be an interesting optimization. 
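
As a rough illustration of the application-layer grouping described above
(a sketch only: the app_tep struct and helpers are hypothetical, and the
rte_flow_query() prototype used below is the one from the DPDK releases
current at the time, which took an action type enum):

#include <errno.h>
#include <string.h>
#include <rte_flow.h>

#define APP_TEP_MAX_FLOWS 1024

/* Hypothetical application-level TEP: just the rte_flow rules (one per VNI)
 * that make up the endpoint. */
struct app_tep {
        uint16_t port_id;
        struct rte_flow *flows[APP_TEP_MAX_FLOWS];
        unsigned int nb_flows;
};

/* Steps 1/2 above: create a rule for one VNI of the TEP and count it.
 * The DECAP/ENCAP/REWRITE actions of step 3 would be appended to the action
 * list once such actions are defined. */
static int
app_tep_add_flow(struct app_tep *tep, const struct rte_flow_attr *attr,
                 const struct rte_flow_item pattern[],
                 struct rte_flow_error *err)
{
        struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_COUNT },
                { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        struct rte_flow *flow;

        if (tep->nb_flows == APP_TEP_MAX_FLOWS)
                return -ENOSPC;
        flow = rte_flow_create(tep->port_id, attr, pattern, actions, err);
        if (flow == NULL)
                return -EINVAL;
        tep->flows[tep->nb_flows++] = flow;
        return 0;
}

/* Step 4 above: "TEP statistics" are then the sum of the per-flow counters. */
static int
app_tep_stats(struct app_tep *tep, uint64_t *hits, uint64_t *bytes,
              struct rte_flow_error *err)
{
        struct rte_flow_query_count q;
        unsigned int i;

        *hits = 0;
        *bytes = 0;
        for (i = 0; i < tep->nb_flows; i++) {
                memset(&q, 0, sizeof(q));
                if (rte_flow_query(tep->port_id, tep->flows[i],
                                   RTE_FLOW_ACTION_TYPE_COUNT, &q, err) != 0)
                        return -EIO;
                *hits += q.hits;
                *bytes += q.bytes;
        }
        return 0;
}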

> >
> > > As for the capabilities - what specifically you had in mind? The
> > > current
> > usage you show with tep is with rte_flow rules. There are no
> > capabilities currently for rte_flow supported actions/pattern. To
> > check such capabilities application uses rte_flow_validate.
> >
> > I envisaged that the application should be able to see if an ethdev
> > can support TEP in the rx/tx offloads, and then the
> > rte_tep_capabilities would allow applications to query what tunnel
> > endpoint protocols are supported etc. I would like a simple mechanism
> > to allow users to see if a particular tunnel endpoint type is
> > supported without having to build actual flows to validate.
> 
> I can see the value of that, but in the end wouldn't the API call
> rte_flow_validate anyways? Maybe we don't add the layer now or maybe it
> doesn't really belong in DPDK? I'm in favor of deferring the capabilities API
> until we know it's really needed.  I hate to see special capabilities APIs 
> start
> sneaking in after we decided to go the rte_flow_validate route and users are
> starting to get used to it.

I don't see how it is different from any other rte_flow creation.
We don't hold caps for a device's ability to filter packets according to VXLAN or 
GENEVE items. Why should we start now?

We have already rte_flow_validate. I think part of the reasons for it was 
that the number of different capabilities possible with rte_flow is huge. I 
think this is also the case with the TEP capabilities (even though it is still not 
clear to me what exactly they will include). 

> >
> > > Regarding the creation/destroy of tep. Why not simply use rte_flow
> > > API
> > and avoid this extra control?
> > > For example - with 17.11 APIs, application can put the port in
> > > isolate mode,
> > and insert a flow_rule to catch only IPv4 VXLAN traffic and direct to
> > some queue/do RSS. Such operation, per my understanding, will create a
> > tunnel endpoint. What are the down sides of doing it with the current
> APIs?
> >
> > That doesn't enable encapsulation and decapsulation of the outer
> > tunnel endpoint in the hw as far as I know. Apart from the inability
> > to monitor the endpoint statistics I mentioned above. It would also
> > require that you redefine the endpoint's parameters every time you
> > wish to add a new flow to it. I think having the rte_tep object
> > semantics should also simplify the ability to enable a full vswitch
> > offload of TEP where the hw is handling both encap/decap and switching to
> > a particular port.
> 
> If we have the ingress/decap and egress/encap actions and 1 rte_flow rule
> per TEP and use the COUNT action, I think we get all but the last bit. For 
> that,
> perhaps the application could keep ingress and egress rte_flow templates for
> each tunnel type (VxLAN, GRE, ..). Then copying the template and filling in
> the outer packet info and tunnel Id is all that would be required. We could
> also define these in rte_fl
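
For illustration, the template idea quoted above might look roughly like
this (a sketch only; the struct and helper names are hypothetical):

#include <string.h>
#include <rte_byteorder.h>
#include <rte_flow.h>

/* Hypothetical per-tunnel-type template kept by the application: only the
 * outer addresses/port and the VNI change from one TEP to the next. */
struct vxlan_tep_template {
        struct rte_flow_attr attr;
        struct rte_flow_item_ipv4 ipv4;
        struct rte_flow_item_udp udp;
        struct rte_flow_item_vxlan vxlan;
        struct rte_flow_item pattern[5];
};

static void
vxlan_tep_template_fill(struct vxlan_tep_template *t, rte_be32_t saddr,
                        rte_be32_t daddr, rte_be16_t dport,
                        const uint8_t vni[3])
{
        memset(t, 0, sizeof(*t));
        t->attr.ingress = 1;
        t->ipv4.hdr.src_addr = saddr;
        t->ipv4.hdr.dst_addr = daddr;
        t->udp.hdr.dst_port = dport;
        memcpy(t->vxlan.vni, vni, sizeof(t->vxlan.vni));
        t->pattern[0] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_ETH };
        t->pattern[1] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_IPV4,
                                                .spec = &t->ipv4 };
        t->pattern[2] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_UDP,
                                                .spec = &t->udp };
        t->pattern[3] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_VXLAN,
                                                .spec = &t->vxlan };
        t->pattern[4] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_END };
}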

Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

2018-01-11 Thread John Daley (johndale)
Hi Declan and Shahaf,

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Doherty, Declan
> Sent: Tuesday, January 09, 2018 9:31 AM
> To: Shahaf Shuler ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement
> 
> On 24/12/2017 5:30 PM, Shahaf Shuler wrote:
> > Hi Declan,
> >
> 
> Hey Shahaf, apologies for the delay in responding, I have been out of office
> for the last 2 weeks.
> 
> > Friday, December 22, 2017 12:21 AM, Doherty, Declan:
> >> This RFC contains a proposal to add a new tunnel endpoint API to DPDK
> >> that when used in conjunction with rte_flow enables the configuration
> >> of inline data path encapsulation and decapsulation of tunnel
> >> endpoint network overlays on accelerated IO devices.
> >>
> >> The proposed new API would provide for the creation, destruction, and
> >> monitoring of a tunnel endpoint in supporting hw, as well as
> >> capabilities APIs to allow the acceleration features to be discovered by
> applications.
> >>
> 
> >
> >
> > Am not sure I understand why there is a need for the above control
> > methods.
> > Are you introducing a new "tep device"?
> > As the tunnel endpoint is sending and receiving Ethernet packets from
> > the network I think it should still be counted as an Ethernet device but with
> > more capabilities (for example it supports encap/decap etc..), therefore it
> > should use the Ethdev layer API to query statistics (for example).
> 
> No, the new APIs are only intended to be a method of creating, monitoring
> and deleting tunnel-endpoints on an existing ethdev. The rationale for APIs
> separate to rte_flow is the same as that in rte_security: there is not a
> 1:1 mapping of TEPs to flows. Many flows (VNIs in VxLAN for example) can
> originate/terminate on the same TEP, therefore managing the TEP
> independently of the flows being transmitted on it is important to allow
> visibility of that endpoint's stats, for example.

I don't quite understand what you mean by tunnel and flow here. Can you define 
exactly what you mean? Flow is an overloaded word in our world. I think that 
defining it will make understanding the RFC a little easier.

Taking VxLAN, I think of the tunnel as including up through the VxLAN header, 
including the VNI. If you go by this definition, I would consider a flow to be 
all packets with the same VNI and the same 5-tuple hash of the inner packet. Is 
this what you mean by tunnel (or TEP) and flow here?

With these definitions, VPP for example might need up to a couple thousand TEPs 
on an interface and each TEP could have hundreds or thousands of flows. It 
would be quite possible to have 1 rte flow rule per TEP (or 2- ingress/decap 
and egress/encap). The COUNT action could be used to count the number of 
packets through each TEP. Is this adequate, or are you proposing that we need a 
mechanism to get stats of flows within each TEP? Is that the main point of the 
API? Assuming no need for stats on a per TEP/flow basis is there anything else 
the API adds?

> I can't see how the existing
> ethdev API could be used for statistics as a single ethdev could be supporting
> many concurrent TEPs, therefore we would either need to use the extended
> stats with many entries, one for each TEP, or if we treat a TEP as an 
> attribute
> of a port in a similar manner to the way rte_security manages an IPsec SA,
> the state of each TEP can be monitored and managed independently of both
> the overall port or the flows being transported on that endpoint.

Assuming we can define one rte_flow rule per TEP, does what you propose give us 
anything more than just using the COUNT action?
> 
> > As for the capabilities - what specifically you had in mind? The current
> usage you show with tep is with rte_flow rules. There are no capabilities
> currently for rte_flow supported actions/pattern. To check such capabilities
> application uses rte_flow_validate.
> 
> I envisaged that the application should be able to see if an ethdev can
> support TEP in the rx/tx offloads, and then the rte_tep_capabilities would
> allow applications to query what tunnel endpoint protocols are supported
> etc. I would like a simple mechanism to allow users to see if a particular
> tunnel endpoint type is supported without having to build actual flows to
> validate.

I can see the value of that, but in the end wouldn't the API call 
rte_flow_validate anyways? Maybe we don't add the layer now or maybe it doesn't 
really belong in DPDK? I'm in favor of deferring the capabilities API until we 
know it's really needed.  I hate to see special capabilities APIs start 
sneaking in after we decided to go the rte_flow_validate route and users are 
starting to get used to it.

Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

2018-01-11 Thread John Daley (johndale)
Hi,
One comment on DECAP action and a "feature request".  I'll also reply to the 
top of thread discussion separately. Thanks for the RFC Declan!

Feature request associated with ENCAP action:

VPP (and probably other apps) would like the ability to simply specify an 
independent tunnel ID as part of egress match criteria in an rte_flow rule. 
Then egress packets could specify a tunnel ID and valid flag in the mbuf. If 
it matched the rte_flow tunnel ID item, a simple lookup in the nic could be 
done and the associated actions (particularly ENCAP) executed. The application 
already knows the tunnel that the packet is associated with, so there is no 
need to have the nic do matching on a header pattern. Plus it's possible that 
packet headers alone are not enough to determine the correct encap action (the 
bridge where the packet came from might be required). 

This would require a new mbuf field to specify the tunnel ID (maybe in 
tx_offload) and a valid flag.  It would also require a new rte flow item type 
for matching the tunnel ID (like RTE_FLOW_ITEM_TYPE_META_TUNNEL_ID).
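
As a rough illustration of the request (every name below is hypothetical;
neither the mbuf field nor the item type exists in rte_flow today):

/* Hypothetical only: this item type and conf struct do not exist, they
 * illustrate the feature request above. */
struct rte_flow_item_meta_tunnel_id {
        uint32_t id; /* tunnel ID set by the application in the mbuf */
};

/* Egress rule sketch: no header matching at all, the lookup key is the mbuf
 * metadata; the matched rule would then carry the ENCAP action for that
 * tunnel as proposed in this RFC. */
struct rte_flow_item_meta_tunnel_id tid = { .id = 42 };
struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_META_TUNNEL_ID, .spec = &tid },
        { .type = RTE_FLOW_ITEM_TYPE_END },
};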

Is something like this being considered by others? If not, should it be part of 
this RFC or a new one? I think this would be the 1st meta-data match criterion 
in rte_flow, but I could see others following. 

-johnd

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Doherty, Declan
> Sent: Thursday, December 21, 2017 2:21 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement
> 
> This RFC contains a proposal to add a new tunnel endpoint API to DPDK that
> when used in conjunction with rte_flow enables the configuration of inline
> data path encapsulation and decapsulation of tunnel endpoint network
> overlays on accelerated IO devices.
> 
> The proposed new API would provide for the creation, destruction, and
> monitoring of a tunnel endpoint in supporting hw, as well as capabilities APIs
> to allow the acceleration features to be discovered by applications.
> 
> /** Tunnel Endpoint context, opaque structure */
> struct rte_tep;
> 
> enum rte_tep_type {
>    RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
>    RTE_TEP_TYPE_NVGRE, /**< NVGRE Protocol */
>    ...
> };
> 
> /** Tunnel Endpoint Attributes */
> struct rte_tep_attr {
>    enum rte_tep_type type;
> 
>    /* other endpoint attributes here */
> };
> 
> /**
> * Create a tunnel end-point context as specified by the flow attribute and
> * pattern
> *
> * @param   port_id Port identifier of Ethernet device.
> * @param   attr    Flow rule attributes.
> * @param   pattern Pattern specification by list of rte_flow_items.
> * @return
> *  - On success returns pointer to TEP context
> *  - On failure returns NULL
> */
> struct rte_tep *rte_tep_create(uint16_t port_id,
>       struct rte_tep_attr *attr, struct rte_flow_item pattern[])
> 
> /**
> * Destroy an existing tunnel end-point context. All the end-points context
> * will be destroyed, so all active flows using tep should be freed before
> * destroying context.
> * @param   port_id Port identifier of Ethernet device.
> * @param   tep     Tunnel endpoint context
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep)
> 
> /**
> * Get tunnel endpoint statistics
> *
> * @param   port_id Port identifier of Ethernet device.
> * @param   tep     Tunnel endpoint context
> * @param   stats   Tunnel endpoint statistics
> *
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int
> rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
>       struct rte_tep_stats *stats)
> 
> /**
> * Get ports tunnel endpoint capabilities
> *
> * @param   port_id      Port identifier of Ethernet device.
> * @param   capabilities Tunnel endpoint capabilities
> *
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int
> rte_tep_capabilities_get(uint16_t port_id,
>       struct rte_tep_capabilities *capabilities)
> 
> 
> To direct traffic flows to hw terminated tunnel endpoint the rte_flow API is
> enhanced to add a new flow item type. This contains a pointer to the TEP
> context as well as the overlay flow id to which the traffic flow is 
> associated.
> 
> struct rte_flow_item_tep {
>    struct rte_tep *tep;
>    uint32_t flow_id;
> };
> 
> Also 2 new generic action types are added: encapsulation and decapsulation.
> 
> RTE_FLOW_ACTION_TYPE_ENCAP
> RTE_FLOW_ACTION_TYPE_DECAP
> 
> struct rte_flow_action_encap {
>    struct rte_flow_item *item;
> };
> 
> struct rte_flow_action_decap {
>    struct rte_flow_item *item;
> };
> 
> The following section outlines the intended usage of the new APIs and then
> how they are combined with the existing rte_flow APIs.
> 
> Tunnel endpoints are created on logical ports which support the capability

Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

2018-01-10 Thread Doherty, Declan

Adding discussion back to list.

On 24/12/2017 2:18 PM, Boris Pismenny wrote:

Hi Declan

On 12/22/2017 12:21 AM, Doherty, Declan wrote:
This RFC contains a proposal to add a new tunnel endpoint API to DPDK 
that when used
in conjunction with rte_flow enables the configuration of inline data 
path encapsulation
and decapsulation of tunnel endpoint network overlays on accelerated 
IO devices.


The proposed new API would provide for the creation, destruction, and
monitoring of a tunnel endpoint in supporting hw, as well as capabilities
APIs to allow the acceleration features to be discovered by applications.

/** Tunnel Endpoint context, opaque structure */
struct rte_tep;

enum rte_tep_type {
    RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
    RTE_TEP_TYPE_NVGRE, /**< NVGRE Protocol */
    ...
};

/** Tunnel Endpoint Attributes */
struct rte_tep_attr {
    enum rte_tep_type type;

    /* other endpoint attributes here */
};

/**
* Create a tunnel end-point context as specified by the flow attribute
* and pattern
*
* @param   port_id Port identifier of Ethernet device.
* @param   attr    Flow rule attributes.
* @param   pattern Pattern specification by list of rte_flow_items.
* @return
*  - On success returns pointer to TEP context
*  - On failure returns NULL
*/
struct rte_tep *rte_tep_create(uint16_t port_id,
        struct rte_tep_attr *attr, struct rte_flow_item pattern[])


/**
* Destroy an existing tunnel end-point context. All the end-points context
* will be destroyed, so all active flows using tep should be freed before
* destroying context.
* @param   port_id Port identifier of Ethernet device.
* @param   tep     Tunnel endpoint context
* @return
*  - On success returns 0
*  - On failure returns 1
*/
int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep)

/**
* Get tunnel endpoint statistics
*
* @param   port_id Port identifier of Ethernet device.
* @param   tep     Tunnel endpoint context
* @param   stats   Tunnel endpoint statistics
*
* @return
*  - On success returns 0
*  - On failure returns 1
*/
int
rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
        struct rte_tep_stats *stats)

/**
* Get ports tunnel endpoint capabilities
*
* @param   port_id      Port identifier of Ethernet device.
* @param   capabilities Tunnel endpoint capabilities
*
* @return
*  - On success returns 0
*  - On failure returns 1
*/
int
rte_tep_capabilities_get(uint16_t port_id,
        struct rte_tep_capabilities *capabilities)


To direct traffic flows to hw terminated tunnel endpoint the rte_flow API is
enhanced to add a new flow item type. This contains a pointer to the
TEP context as well as the overlay flow id to which the traffic flow is
associated.

struct rte_flow_item_tep {
    struct rte_tep *tep;
    uint32_t flow_id;
};

Also 2 new generic action types are added: encapsulation and decapsulation.


RTE_FLOW_ACTION_TYPE_ENCAP
RTE_FLOW_ACTION_TYPE_DECAP

struct rte_flow_action_encap {
    struct rte_flow_item *item;
};

struct rte_flow_action_decap {
    struct rte_flow_item *item;
};

The following section outlines the intended usage of the new APIs and then how
they are combined with the existing rte_flow APIs.

Tunnel endpoints are created on logical ports which support the capability
using rte_tep_create() using a combination of TEP attributes and
rte_flow_items. In the example below a new IPv4 VxLAN endpoint is being
defined. The attrs parameter sets the TEP type, and could be used for other
possible attributes.

struct rte_tep_attr attrs = { .type = RTE_TEP_TYPE_VXLAN };

The values for the headers which make up the tunnel endpoint are then
defined using the spec parameter in the rte_flow items (IPv4, UDP and
VxLAN in this case)

struct rte_flow_item_ipv4 ipv4_item = {
    .hdr = { .src_addr = saddr, .dst_addr = daddr }
};

struct rte_flow_item_udp udp_item = {
    .hdr = { .src_port = sport, .dst_port = dport }
};

struct rte_flow_item_vxlan vxlan_item = { .flags = vxlan_flags };

struct rte_flow_item pattern[] = {
    { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_item },
    { .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp_item },
    { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vxlan_item },
    { .type = RTE_FLOW_ITEM_TYPE_END }
};

The tunnel endpoint can then be created on the port. Whether or not any hw
configuration is required at this point would be hw dependent, but if not
the context for the TEP is available for use in programming flow, so the
application is not forced to redefine the TEP parameters on each flow
addition.

struct rte_tep *tep = rte_tep_create(port_id, &attrs, pattern);

Once the tep context is created flows can then be directed to that endpoint
for processing. The following sections will outline how the author envisages
flow programming will work and also how TEP acceleration can be combined with
other accelerations.
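
A rough sketch of how a flow rule using the TEP could then look under the
proposed API (illustrative only: the TEP item type name and the generic DECAP
action are the proposal's, not existing rte_flow API):

/* Ingress: match overlay flow id 10 on the tep created above plus the inner
 * TCP flow, strip the outer headers and deliver the inner packet to queue 3. */
struct rte_flow_attr flow_attr = { .ingress = 1 };
struct rte_flow_item_tep tep_item = { .tep = tep, .flow_id = 10 };
struct rte_flow_item flow_pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_TEP, .spec = &tep_item },   /* proposed */
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
        { .type = RTE_FLOW_ITEM_TYPE_TCP },
        { .type = RTE_FLOW_ITEM_TYPE_END }
};

struct rte_flow_action_queue queue = { .index = 3 };
struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_DECAP },                   /* proposed */
        { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
        { .type = RTE_FLOW_ACTION_TYPE_END }
};

struct rte_flow_error error;
struct rte_flow *flow = rte_flow_create(port_id, &flow_attr, flow_pattern,
                                        actions, &error);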

Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

2018-01-09 Thread Doherty, Declan

On 24/12/2017 5:30 PM, Shahaf Shuler wrote:

Hi Declan,



Hey Shahaf, apologies for the delay in responding, I have been out of 
office for the last 2 weeks.



Friday, December 22, 2017 12:21 AM, Doherty, Declan:

This RFC contains a proposal to add a new tunnel endpoint API to DPDK that
when used in conjunction with rte_flow enables the configuration of inline
data path encapsulation and decapsulation of tunnel endpoint network
overlays on accelerated IO devices.

The proposed new API would provide for the creation, destruction, and
monitoring of a tunnel endpoint in supporting hw, as well as capabilities APIs
to allow the acceleration features to be discovered by applications.






Am not sure I understand why there is a need for the above control methods.
Are you introducing a new "tep device"?
As the tunnel endpoint is sending and receiving Ethernet packets from
the network I think it should still be counted as an Ethernet device but
with more capabilities (for example it supports encap/decap etc..),
therefore it should use the Ethdev layer API to query statistics (for
example).


No, the new APIs are only intended to be a method of creating, 
monitoring and deleting tunnel-endpoints on an existing ethdev. The 
rationale for APIs separate to rte_flow is the same as that in 
rte_security: there is not a 1:1 mapping of TEPs to flows. Many flows 
(VNIs in VxLAN for example) can originate/terminate on the same TEP, 
therefore managing the TEP independently of the flows being transmitted 
on it is important to allow visibility of that endpoint's stats, for 
example. I can't see how the existing ethdev API could be used for 
statistics as a single ethdev could be supporting many concurrent TEPs, 
therefore we would either need to use the extended stats with many 
entries, one for each TEP, or if we treat a TEP as an attribute of a 
port in a similar manner to the way rte_security manages an IPsec SA, 
the state of each TEP can be monitored and managed independently of both 
the overall port and the flows being transported on that endpoint.



As for the capabilities - what specifically you had in mind? The current usage 
you show with tep is with rte_flow rules. There are no capabilities currently 
for rte_flow supported actions/pattern. To check such capabilities application 
uses rte_flow_validate.


I envisaged that the application should be able to see if an ethdev can 
support TEP in the rx/tx offloads, and then the rte_tep_capabilities 
would allow applications to query what tunnel endpoint protocols are 
supported etc. I would like a simple mechanism to allow users to see if 
a particular tunnel endpoint type is supported without having to build 
actual flows to validate.
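For illustration, a minimal sketch of such a capability check before any
flows are built. The layout of rte_tep_capabilities (here a tunnel_types
bitmask) is an assumption, the RFC keeps the structure opaque:

/* Check whether the port can offload VxLAN TEPs at all. */
struct rte_tep_capabilities caps;

if (rte_tep_capabilities_get(port_id, &caps) != 0 ||
    !(caps.tunnel_types & (1ULL << RTE_TEP_TYPE_VXLAN))) {
    /* no hw TEP support for VxLAN on this port, use a sw path instead */
}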



Regarding the creation/destroy of tep. Why not simply use the rte_flow API
and avoid this extra control?
For example - with the 17.11 APIs, an application can put the port in isolate
mode, and insert a flow_rule to catch only IPv4 VXLAN traffic and direct it
to some queue/do RSS. Such an operation, per my understanding, will create a
tunnel endpoint. What are the downsides of doing it with the current APIs?


That doesn't enable encapsulation and decapsulation of the outer tunnel
endpoint in the hw as far as I know, apart from the inability to monitor the
endpoint statistics I mentioned above. It would also require that you
redefine the endpoint's parameters every time you wish to add a new flow to
it. I think having the rte_tep object semantics should also simplify the
ability to enable a full vswitch offload of TEP where the hw is handling
both encap/decap and switching to a particular port.







To direct traffic flows to hw terminated tunnel endpoint the rte_flow API is
enhanced to add a new flow item type. This contains a pointer to the TEP
context as well as the overlay flow id to which the traffic flow is associated.

struct rte_flow_item_tep {
    struct rte_tep *tep;
    uint32_t flow_id;
};


Can you provide a more detailed definition of the flow id? To which field
from the packet headers does it refer?
In your examples below it looks like it is used to match the VXLAN vni in the
case of VXLAN, but what about the other protocols? And also, why not use the
already existing VXLAN item?


I have only been looking initially at a couple of the tunnel endpoint
protocols, namely Geneve, NvGRE and VxLAN, but the idea here is to allow the
user to define the VNI in the case of Geneve and VxLAN and the VSID in the
case of NvGRE on a per-flow basis, as per my understanding these are used to
identify the source/destination hosts on the overlay network independently
from the endpoint they are transported across.
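As an illustration of that per-flow usage, a sketch of two overlay flows
sharing one TEP but carrying different VNIs. The item type name
RTE_FLOW_ITEM_TYPE_TEP is an assumption here, the RFC only defines the
struct:

/* Two flows on the same VxLAN TEP, distinguished only by their VNI. */
struct rte_flow_item_tep tep_vni_100 = { .tep = tep, .flow_id = 100 };
struct rte_flow_item_tep tep_vni_200 = { .tep = tep, .flow_id = 200 };

struct rte_flow_item pattern_vni_100[] = {
    { .type = RTE_FLOW_ITEM_TYPE_TEP, .spec = &tep_vni_100 },
    { .type = RTE_FLOW_ITEM_TYPE_END }
};

struct rte_flow_item pattern_vni_200[] = {
    { .type = RTE_FLOW_ITEM_TYPE_TEP, .spec = &tep_vni_200 },
    { .type = RTE_FLOW_ITEM_TYPE_END }
};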


The VxLAN item is used in the creation of the TEP object; using the TEP
object just removes the need for the user to constantly redefine all the
tunnel parameters, and also, depending on the hw implementation, it may
simplify the driver's work if it knows the exact endpoint the action applies
to.

Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

2018-01-02 Thread Boris Pismenny


Hi Declan,

On 12/22/2017 12:21 AM, Doherty, Declan wrote:

This RFC contains a proposal to add a new tunnel endpoint API to DPDK that when 
used
in conjunction with rte_flow enables the configuration of inline data path 
encapsulation
and decapsulation of tunnel endpoint network overlays on accelerated IO devices.

The proposed new API would provide for the creation, destruction, and
monitoring of a tunnel endpoint in supporting hw, as well as capabilities APIs 
to allow the
acceleration features to be discovered by applications.

/** Tunnel Endpoint context, opaque structure */
struct rte_tep;

enum rte_tep_type {
RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
RTE_TEP_TYPE_NVGRE, /**< NVGRE Protocol */
...
};

/** Tunnel Endpoint Attributes */
struct rte_tep_attr {
enum rte_type_type type;

/* other endpoint attributes here */
}

/**
* Create a tunnel end-point context as specified by the flow attribute and 
pattern
*
* @param   port_id Port identifier of Ethernet device.
* @param   attr    Flow rule attributes.
* @param   pattern Pattern specification by list of rte_flow_items.
* @return
*  - On success returns pointer to TEP context
*  - On failure returns NULL
*/
struct rte_tep *rte_tep_create(uint16_t port_id,
                               struct rte_tep_attr *attr,
                               struct rte_flow_item pattern[])

/**
* Destroy an existing tunnel end-point context. All the end-points context
* will be destroyed, so all active flows using tep should be freed before
* destroying context.
* @param   port_id Port identifier of Ethernet device.
* @param   tep     Tunnel endpoint context
* @return
*  - On success returns 0
*  - On failure returns 1
*/
int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep)

/**
* Get tunnel endpoint statistics
*
* @param   port_id Port identifier of Ethernet device.
* @param   tep     Tunnel endpoint context
* @param   stats  Tunnel endpoint statistics
*
* @return
*  - On success returns 0
*  - On failure returns 1
*/
int
rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
   struct rte_tep_stats *stats)

/**
* Get ports tunnel endpoint capabilities
*
* @param   port_id      Port identifier of Ethernet device.
* @param   capabilities Tunnel endpoint capabilities
*
* @return
*  - On success returns 0
*  - On failure returns 1
*/
int
rte_tep_capabilities_get(uint16_t port_id,
   struct rte_tep_capabilities *capabilities)


To direct traffic flows to hw terminated tunnel endpoint the rte_flow API is
enhanced to add a new flow item type. This contains a pointer to the
TEP context as well as the overlay flow id to which the traffic flow is
associated.

struct rte_flow_item_tep {
    struct rte_tep *tep;
    uint32_t flow_id;
};

Also 2 new generic action types are added: encapsulation and decapsulation.

RTE_FLOW_ACTION_TYPE_ENCAP
RTE_FLOW_ACTION_TYPE_DECAP

struct rte_flow_action_encap {
    struct rte_flow_item *item;
};

struct rte_flow_action_decap {
    struct rte_flow_item *item;
};
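For illustration, one way these actions might be attached to a rule's action
list for egress encapsulation. Pointing the encap action's item at a TEP item
and the name RTE_FLOW_ITEM_TYPE_TEP are assumptions, only the structs above
are defined by the RFC:

/* Egress sketch: encapsulate the matched flow on an existing TEP. */
struct rte_flow_item_tep tep_item = { .tep = tep, .flow_id = vni };
struct rte_flow_item encap_item = {
    .type = RTE_FLOW_ITEM_TYPE_TEP, .spec = &tep_item
};
struct rte_flow_action_encap encap = { .item = &encap_item };

struct rte_flow_action actions[] = {
    { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap },
    { .type = RTE_FLOW_ACTION_TYPE_END }
};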

The following section outlines the intended usage of the new APIs and then how
they are combined with the existing rte_flow APIs.

Tunnel endpoints are created on logical ports which support the capability,
using rte_tep_create() with a combination of TEP attributes and
rte_flow_items. In the example below a new IPv4 VxLAN endpoint is being
defined. The attrs parameter sets the TEP type, and could be used for other
possible attributes.

struct rte_tep_attr attrs = { .type = RTE_TEP_TYPE_VXLAN };

The values for the headers which make up the tunnel endpoint are then
defined using the spec parameter in the rte_flow items (IPv4, UDP and
VxLAN in this case):

struct rte_flow_item_ipv4 ipv4_item = {
.hdr = { .src_addr = saddr, .dst_addr = daddr }
};

struct rte_flow_item_udp udp_item = {
.hdr = { .src_port = sport, .dst_port = dport }
};

struct rte_flow_item_vxlan vxlan_item = { .flags = vxlan_flags };

struct rte_flow_item pattern[] = {
{ .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_item },
{ .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp_item },
{ .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vxlan_item },
{ .type = RTE_FLOW_ITEM_TYPE_END }
};

The tunnel endpoint can then be created on the port. Whether or not any hw
configuration is required at this point would be hw dependent, but if not
the context for the TEP is available for use in programming flows, so the
application is not forced to redefine the TEP parameters on each flow
addition.

struct rte_tep *tep = rte_tep_create(port_id, &attrs, pattern);

Once the tep context is created flows can then be directed to that endpoint
for processing. The following sections will outline how the author envisages
flow programming will work and also how TEP acceleration can be combined
with other accelerations.

Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

2017-12-24 Thread Shahaf Shuler
Hi Declan,

Friday, December 22, 2017 12:21 AM, Doherty, Declan:
> This RFC contains a proposal to add a new tunnel endpoint API to DPDK that
> when used in conjunction with rte_flow enables the configuration of inline
> data path encapsulation and decapsulation of tunnel endpoint network
> overlays on accelerated IO devices.
> 
> The proposed new API would provide for the creation, destruction, and
> monitoring of a tunnel endpoint in supporting hw, as well as capabilities APIs
> to allow the acceleration features to be discovered by applications.
> 
> /** Tunnel Endpoint context, opaque structure */ struct rte_tep;
> 
> enum rte_tep_type {
>RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
>RTE_TEP_TYPE_NVGRE, /**< NVGRE Protocol */
>...
> };
> 
> /** Tunnel Endpoint Attributes */
> struct rte_tep_attr {
>    enum rte_tep_type type;
> 
>/* other endpoint attributes here */ }
> 
> /**
> * Create a tunnel end-point context as specified by the flow attribute and
> pattern
> *
> * @param   port_id Port identifier of Ethernet device.
> * @param   attr    Flow rule attributes.
> * @param   pattern Pattern specification by list of rte_flow_items.
> * @return
> *  - On success returns pointer to TEP context
> *  - On failure returns NULL
> */
> struct rte_tep *rte_tep_create(uint16_t port_id,
>   struct rte_tep_attr *attr, struct rte_flow_item 
> pattern[])
> 
> /**
> * Destroy an existing tunnel end-point context. All the end-points context
> * will be destroyed, so all active flows using tep should be freed before
> * destroying context.
> * @param   port_id Port identifier of Ethernet device.
> * @param   tep     Tunnel endpoint context
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep)
> 
> /**
> * Get tunnel endpoint statistics
> *
> * @param   port_id Port identifier of Ethernet device.
> * @param   tep     Tunnel endpoint context
> * @param   stats  Tunnel endpoint statistics
> *
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int
> rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
>   struct rte_tep_stats *stats)
> 
> /**
> * Get ports tunnel endpoint capabilities
> *
> * @param   port_id      Port identifier of Ethernet device.
> * @param   capabilities Tunnel endpoint capabilities
> *
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int
> rte_tep_capabilities_get(uint16_t port_id,
>   struct rte_tep_capabilities *capabilities)


I am not sure I understand why there is a need for the above control methods.
Are you introducing a new "tep device"?
As the tunnel endpoint is sending and receiving Ethernet packets from the
network I think it should still be counted as an Ethernet device but with
more capabilities (for example it supports encap/decap etc.), therefore it
should use the ethdev layer API to query statistics (for example).
As for the capabilities - what specifically did you have in mind? The current
usage you show with tep is with rte_flow rules. There are no capabilities
currently for rte_flow supported actions/patterns. To check such capabilities
an application uses rte_flow_validate.
Regarding the creation/destroy of tep. Why not simply use the rte_flow API
and avoid this extra control?
For example - with the 17.11 APIs, an application can put the port in isolate
mode, and insert a flow_rule to catch only IPv4 VXLAN traffic and direct it
to some queue/do RSS. Such an operation, per my understanding, will create a
tunnel endpoint. What are the downsides of doing it with the current APIs?
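For reference, a rough sketch of such a rule with the existing 17.11 rte_flow
API, no new API involved (the queue index and the empty item specs are
placeholders):

/* Isolate the port, then steer all IPv4/UDP/VXLAN traffic to queue 0. */
struct rte_flow_error err;
struct rte_flow_attr attr = { .ingress = 1 };
struct rte_flow_action_queue queue = { .index = 0 };

struct rte_flow_item pattern[] = {
    { .type = RTE_FLOW_ITEM_TYPE_ETH },
    { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
    { .type = RTE_FLOW_ITEM_TYPE_UDP },
    { .type = RTE_FLOW_ITEM_TYPE_VXLAN },
    { .type = RTE_FLOW_ITEM_TYPE_END }
};
struct rte_flow_action actions[] = {
    { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
    { .type = RTE_FLOW_ACTION_TYPE_END }
};

rte_flow_isolate(port_id, 1, &err);
struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern, actions, &err);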

> 
> 
> To direct traffic flows to hw terminated tunnel endpoint the rte_flow API is
> enhanced to add a new flow item type. This contains a pointer to the TEP
> context as well as the overlay flow id to which the traffic flow is 
> associated.
> 
> struct rte_flow_item_tep {
>struct rte_tep *tep;
>uint32_t flow_id;
> }

Can you provide a more detailed definition of the flow id? To which field
from the packet headers does it refer?
In your examples below it looks like it is used to match the VXLAN vni in the
case of VXLAN, but what about the other protocols? And also, why not use the
already existing VXLAN item?

Generally I like the idea of separating the encap/decap context from the
action. However, it looks like the rte_flow_item has a double meaning in this
RFC, once for the classification and once for the action.
From the top of my head I would think of an API which separates those and
re-uses the existing flow items. Something like:

struct rte_flow_item pattern[] = {
    { /* set of already existing pattern items */ },
    { ... },
    { .type = RTE_FLOW_ITEM_TYPE_END }
};

encap_ctx = create_encap_context(pattern)

rte_flow_action act
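As a purely hypothetical illustration of the separation being suggested
(build an encap context once from existing flow items, then only reference it
from the action), with create_encap_context(), struct rte_encap_ctx and
struct rte_flow_action_encap_ctx as invented names, not part of this RFC:

/* Hypothetical: the encap context is created once from existing items
 * and the flow rule's action merely references it. */
struct rte_encap_ctx *encap_ctx = create_encap_context(pattern);

struct rte_flow_action_encap_ctx encap = { .ctx = encap_ctx };
struct rte_flow_action actions[] = {
    { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap },
    { .type = RTE_FLOW_ACTION_TYPE_END }
};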