Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement
On Mon, Feb 26, 2018 at 05:44:01PM +, Doherty, Declan wrote:
> On 13/02/2018 5:05 PM, Adrien Mazarguil wrote:
> > Hi,
> >
> > Apologies for being late to this thread, I've read the ensuing discussion
> > (hope I didn't miss any) and also think rte_flow could be improved in
> > several ways to enable TEP support, in particular regarding the ordering
> > of actions.
> >
> > On the other hand I'm not sure a dedicated API for TEP is needed at all.
> > I'm not convinced rte_security chose the right path and would like to
> > avoid repeating the same mistakes if possible, more below.
> >
> > On Thu, Dec 21, 2017 at 10:21:13PM +, Doherty, Declan wrote:
> > > This RFC contains a proposal to add a new tunnel endpoint API to DPDK
> > > that when used in conjunction with rte_flow enables the configuration
> > > of inline data path encapsulation and decapsulation of tunnel endpoint
> > > network overlays on accelerated IO devices.
> > >
> > > The proposed new API would provide for the creation, destruction, and
> > > monitoring of a tunnel endpoint in supporting hw, as well as
> > > capabilities APIs to allow the acceleration features to be discovered
> > > by applications.
> >
> > Although I'm not convinced an opaque object is the right approach, if we
> > choose this route I suggest the much simpler:
> >
> > struct rte_flow_action_tep_(encap|decap) {
> >     struct rte_tep *tep;
> >     uint32_t flow_id;
> > };
>
> That's a fair point. The only other item the encap/decap actions currently
> supported was the Ethernet item, and going back to a comment from Boris,
> having the Ethernet header separate from the tunnel is probably not ideal
> anyway. One of our reasons for using an opaque tep item was to allow
> modification of the TEP independently of all the flows being carried on
> it. So for instance if the src or dst MAC needs to be modified, or the
> output port needs to be changed, the TEP itself could be modified.

Makes sense.
I think there's now consensus that without a dedicated API, it can be done
through multiple rte_flow groups and "jump" actions targeting them. Such
actions remain to be formally defined though.

In the meantime there is an alternative approach when opaque pattern
items/actions are unavoidable: using negative values [1]. In addition to an
opaque object to use with rte_flow, a PMD could return a PMD-specific
negative value cast as enum rte_flow_{item,action}_type, usable with the
associated port ID only. An API could even initialize a pattern item or an
action object directly:

 struct rte_flow_action tep_action;

 if (rte_tep_create(port_id, &tep_action, ...) != 0)
     rte_panic("no!");
 /*
  * tep_action is now initialized with an opaque type and conf pointer, it
  * can be used with rte_flow_create() as part of an action list.
  */

[1] http://dpdk.org/doc/guides/prog_guide/rte_flow.html#negative-types

> > > struct rte_tep *tep = rte_tep_create(port_id, &attrs, pattern);
> > >
> > > Once the tep context is created flows can then be directed to that
> > > endpoint for processing. The following sections will outline how the
> > > author envisages flow programming will work and also how TEP
> > > acceleration can be combined with other accelerations.
> >
> > In order to allow a single TEP context object to be shared by multiple
> > flow rules, a whole new API must be implemented and applications still
> > have to additionally create one rte_flow rule per TEP flow_id to manage.
> > While this probably results in shorter flow rule patterns and action
> > lists, is it really worth it?
> >
> > While I understand the reasons for this approach, I'd like to push for a
> > rte_flow-only API as much as possible, I'll provide suggestions below.
> Not only are the rules shorter to implement, it could also greatly reduce
> the amount of cycles required to add flows, both in terms of the
> application marshaling the data into rte_flow patterns and the PMD parsing
> those patterns every time a flow is added. In the case where 10k's of
> flows are getting added per second this could add a significant overhead
> on the system.

True, although only if the underlying hardware supports it; some PMDs may
still have to update each flow rule independently in order to expose such an
API. Applications can't be certain an update operation will be quick and
atomic.

> > > /** VERY IMPORTANT NOTE **/
> > > One of the core concepts of this proposal is that actions which modify
> > > the packet are defined in the order which they are to be processed. So
> > > first decap the outer ethernet header, then the outer TEP headers.
> > > I think this is not only logical from a usability point of view, it
> > > should also simplify the logic required in PMDs to parse the desired
> > > actions.
> >
> > This. I've been thinking about it for a very long time but never got
> > around to submitting a patch. Handling rte_flow
Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement
On 13/02/2018 5:05 PM, Adrien Mazarguil wrote:

Hi,

Apologies for being late to this thread, I've read the ensuing discussion
(hope I didn't miss any) and also think rte_flow could be improved in
several ways to enable TEP support, in particular regarding the ordering of
actions.

On the other hand I'm not sure a dedicated API for TEP is needed at all.
I'm not convinced rte_security chose the right path and would like to avoid
repeating the same mistakes if possible, more below.

On Thu, Dec 21, 2017 at 10:21:13PM +, Doherty, Declan wrote:

This RFC contains a proposal to add a new tunnel endpoint API to DPDK that
when used in conjunction with rte_flow enables the configuration of inline
data path encapsulation and decapsulation of tunnel endpoint network
overlays on accelerated IO devices.

The proposed new API would provide for the creation, destruction, and
monitoring of a tunnel endpoint in supporting hw, as well as capabilities
APIs to allow the acceleration features to be discovered by applications.

/** Tunnel Endpoint context, opaque structure */
struct rte_tep;

enum rte_tep_type {
    RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
    RTE_TEP_TYPE_NVGRE,     /**< NVGRE Protocol */
    ...
};

/** Tunnel Endpoint Attributes */
struct rte_tep_attr {
    enum rte_tep_type type;

    /* other endpoint attributes here */
}

/**
 * Create a tunnel end-point context as specified by the flow attribute and
 * pattern
 *
 * @param port_id Port identifier of Ethernet device.
 * @param attr    Flow rule attributes.
 * @param pattern Pattern specification by list of rte_flow_items.
 * @return
 *  - On success returns pointer to TEP context
 *  - On failure returns NULL
 */
struct rte_tep *rte_tep_create(uint16_t port_id,
                               struct rte_tep_attr *attr,
                               struct rte_flow_item pattern[])

/**
 * Destroy an existing tunnel end-point context. All the end-point's
 * context will be destroyed, so all active flows using the tep should be
 * freed before destroying the context.
 *
 * @param port_id Port identifier of Ethernet device.
 * @param tep     Tunnel endpoint context
 * @return
 *  - On success returns 0
 *  - On failure returns 1
 */
int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep)

/**
 * Get tunnel endpoint statistics
 *
 * @param port_id Port identifier of Ethernet device.
 * @param tep     Tunnel endpoint context
 * @param stats   Tunnel endpoint statistics
 *
 * @return
 *  - On success returns 0
 *  - On failure returns 1
 */
int
rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
                  struct rte_tep_stats *stats)

/**
 * Get port's tunnel endpoint capabilities
 *
 * @param port_id      Port identifier of Ethernet device.
 * @param capabilities Tunnel endpoint capabilities
 *
 * @return
 *  - On success returns 0
 *  - On failure returns 1
 */
int
rte_tep_capabilities_get(uint16_t port_id,
                         struct rte_tep_capabilities *capabilities)

To direct traffic flows to hw terminated tunnel endpoints the rte_flow API
is enhanced to add a new flow item type. This contains a pointer to the TEP
context as well as the overlay flow id to which the traffic flow is
associated.

struct rte_flow_item_tep {
    struct rte_tep *tep;
    uint32_t flow_id;
}

What I dislike is rte_flow item/actions relying on externally-generated
opaque objects when these can be avoided, as it means yet another API
applications have to deal with and PMDs need to implement; this adds a
layer of inefficiency in my opinion. I believe TEP can be fully implemented
through a combination of new rte_flow pattern items/actions without
involving external API calls. More on that later.

Also 2 new generic action types are added, encapsulation and decapsulation:

RTE_FLOW_ACTION_TYPE_ENCAP
RTE_FLOW_ACTION_TYPE_DECAP

struct rte_flow_action_encap {
    struct rte_flow_item *item;
}

struct rte_flow_action_decap {
    struct rte_flow_item *item;
}

Encap/decap actions are definitely needed and useful, no question about
that. I'm unsure about doing so through a generic action with the described
structures instead of dedicated ones though.
These can't work with anything other than rte_flow_item_tep; a special pattern item using some kind of opaque object is needed (e.g. using rte_flow_item_tcp makes no sense with them). Also struct rte_flow_item is tailored for flow rule patterns, using it with actions is not only confusing, it makes its "mask" and "last" members useless and inconsistent with their documentation. Although I'm not convinced an opaque object is the right approach, if we choose this route I suggest the much simpler: struct rte_flow_action_tep_(encap|decap) { struct rte_tep *tep; uint32_t flow_id; }; That's a fair point, the only other action that we currently had the encap/decap actions supporting was the
Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement
Hi,

Apologies for being late to this thread, I've read the ensuing discussion
(hope I didn't miss any) and also think rte_flow could be improved in
several ways to enable TEP support, in particular regarding the ordering of
actions.

On the other hand I'm not sure a dedicated API for TEP is needed at all.
I'm not convinced rte_security chose the right path and would like to avoid
repeating the same mistakes if possible, more below.

On Thu, Dec 21, 2017 at 10:21:13PM +, Doherty, Declan wrote:
> This RFC contains a proposal to add a new tunnel endpoint API to DPDK
> that when used in conjunction with rte_flow enables the configuration of
> inline data path encapsulation and decapsulation of tunnel endpoint
> network overlays on accelerated IO devices.
>
> The proposed new API would provide for the creation, destruction, and
> monitoring of a tunnel endpoint in supporting hw, as well as capabilities
> APIs to allow the acceleration features to be discovered by applications.
>
> /** Tunnel Endpoint context, opaque structure */
> struct rte_tep;
>
> enum rte_tep_type {
>    RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
>    RTE_TEP_TYPE_NVGRE,     /**< NVGRE Protocol */
>    ...
> };
>
> /** Tunnel Endpoint Attributes */
> struct rte_tep_attr {
>    enum rte_tep_type type;
>
>    /* other endpoint attributes here */
> }
>
> /**
> * Create a tunnel end-point context as specified by the flow attribute
> * and pattern
> *
> * @param port_id Port identifier of Ethernet device.
> * @param attr    Flow rule attributes.
> * @param pattern Pattern specification by list of rte_flow_items.
> * @return
> *  - On success returns pointer to TEP context
> *  - On failure returns NULL
> */
> struct rte_tep *rte_tep_create(uint16_t port_id,
>                struct rte_tep_attr *attr,
>                struct rte_flow_item pattern[])
>
> /**
> * Destroy an existing tunnel end-point context. All the end-point's
> * context will be destroyed, so all active flows using the tep should
> * be freed before destroying the context.
> * @param port_id Port identifier of Ethernet device.
> * @param tep     Tunnel endpoint context
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep)
>
> /**
> * Get tunnel endpoint statistics
> *
> * @param port_id Port identifier of Ethernet device.
> * @param tep     Tunnel endpoint context
> * @param stats   Tunnel endpoint statistics
> *
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int
> rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
>                   struct rte_tep_stats *stats)
>
> /**
> * Get port's tunnel endpoint capabilities
> *
> * @param port_id      Port identifier of Ethernet device.
> * @param capabilities Tunnel endpoint capabilities
> *
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int
> rte_tep_capabilities_get(uint16_t port_id,
>                          struct rte_tep_capabilities *capabilities)
>
>
> To direct traffic flows to hw terminated tunnel endpoints the rte_flow
> API is enhanced to add a new flow item type. This contains a pointer to
> the TEP context as well as the overlay flow id to which the traffic flow
> is associated.
>
> struct rte_flow_item_tep {
>    struct rte_tep *tep;
>    uint32_t flow_id;
> }

What I dislike is rte_flow item/actions relying on externally-generated
opaque objects when these can be avoided, as it means yet another API
applications have to deal with and PMDs need to implement; this adds a
layer of inefficiency in my opinion. I believe TEP can be fully implemented
through a combination of new rte_flow pattern items/actions without
involving external API calls. More on that later.

> Also 2 new generic action types are added, encapsulation and
> decapsulation:
>
> RTE_FLOW_ACTION_TYPE_ENCAP
> RTE_FLOW_ACTION_TYPE_DECAP
>
> struct rte_flow_action_encap {
>    struct rte_flow_item *item;
> }
>
> struct rte_flow_action_decap {
>    struct rte_flow_item *item;
> }

Encap/decap actions are definitely needed and useful, no question about
that.
I'm unsure about doing so through a generic action with the described structures instead of dedicated ones though. These can't work with anything other than rte_flow_item_tep; a special pattern item using some kind of opaque object is needed (e.g. using rte_flow_item_tcp makes no sense with them). Also struct rte_flow_item is tailored for flow rule patterns, using it with actions is not only confusing, it makes its "mask" and "last" members useless and inconsistent with their documentation. Although I'm not convinced an opaque object is the right approach, if we choose this route I suggest the much simpler: struct rte_flow_action_tep_(encap|decap) { struct rte_tep *
Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement
Hi Declan, sorry for the late response.

Tuesday, January 23, 2018 5:36 PM, Doherty, Declan:
> > If I get it right, the API proposed here is to have a tunnel endpoint
> > which is a logical port on top of an ethdev port. The TEP is able to
> > receive and monitor some specific tunneled traffic, for example VXLAN,
> > GENEVE and more. For example, a VXLAN TEP can have multiple flows with
> > different VNIs all under the same context.
> >
> > Now, with the current rte_flow APIs, we can do exactly the same and
> > give the application the full flexibility to group the tunnel flows
> > into a logical TEP. On this suggestion the application will:
> > 1. Create rte_flow rules for the pattern it wants to receive.
> > 2. In case it is interested in counting, a COUNT action will be added
> >    to the flow.
> > 3. In case header manipulation is required, a DECAP/ENCAP/REWRITE
> >    action will be added to the flow.
> > 4. Grouping of flows into a logical TEP will be done on the application
> >    layer simply by keeping the relevant rte_flow rules in some
> >    dedicated struct. With it, create/destroy TEP can be translated to
> >    create/destroy of the flow rules. Statistics query can be done by
> >    querying each flow count and summing. Note that some devices can
> >    support the same counter for multiple flows. Even though it is not
> >    yet exposed in rte_flow this can be an interesting optimization.
>
> As I responded in John's mail, I think this approach fails in devices
> which also support switching offload. As the flows never hit the host
> application configuring the TEP and flows, there is no easy way to sum
> those statistics.

Devices which support switching offloads must use NIC support to count the
flows. It can be done either by associating a COUNT action with a flow or
by using the TEP in your proposal. The TEP counting could be introduced in
another way - instead of having a 1:1 relation between flow counter and
rte_flow, to introduce a counter element which can be attached to multiple
flows.
So this counter element, along with the rte_flows it is associated with, is
basically the TEP:
1. It holds the sum of statistics from all the TEP flows it is associated
   with.
2. It holds the receive pattern.

My point is, I don't think it is correct to bind the TEP to the switching
offload actions (encap/decap/rewrite in this context). The TEP can be
presented as an auxiliary library/API to help with the flow grouping,
however the application still needs the ability to control the switch
offloads as it wishes.

> also flows are transitory in terms of runtime so it would not be
> possible to keep accurate statistics over a period of time.

I am not sure I understand what you mean here. In order to receive traffic
you need flows. Even the default RSS configuration of the PMD can be
described by rte_flows. So as long as one receives traffic it has one or
more flows configured on the device.

> > > > As for the capabilities - what specifically did you have in mind?
> > > > The current usage you show with tep is with rte_flow rules. There
> > > > are no capabilities currently for rte_flow supported
> > > > actions/patterns. To check such capabilities the application uses
> > > > rte_flow_validate.
> > >
> > > I envisaged that the application should be able to see if an ethdev
> > > can support TEP in the rx/tx offloads, and then the
> > > rte_tep_capabilities would allow applications to query what tunnel
> > > endpoint protocols are supported etc. I would like a simple
> > > mechanism to allow users to see if a particular tunnel endpoint type
> > > is supported without having to build actual flows to validate.
> >
> > I can see the value of that, but in the end wouldn't the API call
> > rte_flow_validate anyway? Maybe we don't add the layer now or maybe
> > it doesn't really belong in DPDK? I'm in favor of deferring the
> > capabilities API until we know it's really needed.
> > I hate to see special capabilities APIs start sneaking in after we
> > decided to go the rte_flow_validate route and users are starting to
> > get used to it.
>
> I don't see how it is different from any other rte_flow creation. We
> don't hold caps for a device's ability to filter packets according to
> VXLAN or GENEVE items. Why should we start now?

I don't know, possibly if it makes adoption of the features easier for the
end user.

> We already have rte_flow_validate. I think part of the reason for it was
> that the number of different capabilities possible with rte_flow is
> huge. I think this is also the case with the TEP capabilities (even
> though it is still not clear to me what exactly they will include).

It may be that we only need to advertise that we are capable of
encap/decap services, but it would be good to have input from downstream
users on what they would like to see.

> > > Regarding the create/destroy of tep: why not simply use the rte_flow
> > > API and avoid this extra control?
> For example - with 17.11
Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement
On 11/01/2018 9:44 PM, John Daley (johndale) wrote:

Hi, one comment on the DECAP action and a "feature request". I'll also
reply to the top-of-thread discussion separately. Thanks for the RFC
Declan!

Feature request associated with the ENCAP action:

VPP (and probably other apps) would like the ability to simply specify an
independent tunnel ID as part of egress match criteria in an rte_flow rule.
Then egress packets could specify a tunnel ID and valid flag in the mbuf.
If it matched the rte_flow tunnel ID item, a simple lookup in the nic could
be done and the associated actions (particularly ENCAP) executed. The
application already knows the tunnel that the packet is associated with, so
there is no need to have the nic do matching on a header pattern. Plus it's
possible that packet headers alone are not enough to determine the correct
encap action (the bridge where the packet came from might be required).

This would require a new mbuf field to specify the tunnel ID (maybe in
tx_offload) and a valid flag. It would also require a new rte_flow item
type for matching the tunnel ID (like RTE_FLOW_ITEM_TYPE_META_TUNNEL_ID).

Is something like this being considered by others? If not, should it be
part of this RFC or a new one? I think this would be the 1st meta-data
match criteria in rte_flow, but I could see others following.

This sounds similar to what we needed to do in rte_security to support
metadata for inline crypto on the ixgbe. I wasn't aware of devices which
supported this type of function for overlays, but it definitely sounds like
we need to consider it here.
-johnd

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Doherty, Declan
Sent: Thursday, December 21, 2017 2:21 PM
To: dev@dpdk.org
Subject: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

This RFC contains a proposal to add a new tunnel endpoint API to DPDK that
when used in conjunction with rte_flow enables the configuration of inline
data path encapsulation and decapsulation of tunnel endpoint network
overlays on accelerated IO devices.

The proposed new API would provide for the creation, destruction, and
monitoring of a tunnel endpoint in supporting hw, as well as capabilities
APIs to allow the acceleration features to be discovered by applications.

/** Tunnel Endpoint context, opaque structure */
struct rte_tep;

enum rte_tep_type {
    RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
    RTE_TEP_TYPE_NVGRE,     /**< NVGRE Protocol */
    ...
};

/** Tunnel Endpoint Attributes */
struct rte_tep_attr {
    enum rte_tep_type type;

    /* other endpoint attributes here */
}

/**
 * Create a tunnel end-point context as specified by the flow attribute and
 * pattern
 *
 * @param port_id Port identifier of Ethernet device.
 * @param attr    Flow rule attributes.
 * @param pattern Pattern specification by list of rte_flow_items.
 * @return
 *  - On success returns pointer to TEP context
 *  - On failure returns NULL
 */
struct rte_tep *rte_tep_create(uint16_t port_id,
                               struct rte_tep_attr *attr,
                               struct rte_flow_item pattern[])

/**
 * Destroy an existing tunnel end-point context. All the end-point's
 * context will be destroyed, so all active flows using the tep should be
 * freed before destroying the context.
 *
 * @param port_id Port identifier of Ethernet device.
 * @param tep     Tunnel endpoint context
 * @return
 *  - On success returns 0
 *  - On failure returns 1
 */
int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep)

/**
 * Get tunnel endpoint statistics
 *
 * @param port_id Port identifier of Ethernet device.
 * @param tep     Tunnel endpoint context
 * @param stats   Tunnel endpoint statistics
 *
 * @return
 *  - On success returns 0
 *  - On failure returns 1
 */
int
rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
                  struct rte_tep_stats *stats)

/**
 * Get port's tunnel endpoint capabilities
 *
 * @param port_id      Port identifier of Ethernet device.
 * @param capabilities Tunnel endpoint capabilities
 *
 * @return
 *  - On success returns 0
 *  - On failure returns 1
 */
int
rte_tep_capabilities_get(uint16_t port_id,
                         struct rte_tep_capabilities *capabilities)

To direct traffic flows to hw terminated tunnel endpoints the rte_flow API
is enhanced to add a new flow item type. This contains a pointer to the TEP
context as well as the overlay flow id to which the traffic flow is
associated.

struct rte_flow_item_tep {
    struct rte_tep *tep;
    uint32_t flow_id;
}

Also 2 new generic action types are added, encapsulation and decapsulation:

RTE_FLOW_ACTION_TYPE_ENCAP
RTE_FLOW_ACTION_TYPE_DECAP

struct rte_flow_action_encap {
    struct rte_flow_item *item;
}

struct rte_flow_action_decap {
    struct rte_flow_item *item;
}

The following section outlines the intended
Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement
On 16/01/2018 8:22 AM, Shahaf Shuler wrote:

Thursday, January 11, 2018 11:45 PM, John Daley:

Hi Declan and Shahaf,

I can't see how the existing ethdev API could be used for statistics as a
single ethdev could be supporting many concurrent TEPs, therefore we would
either need to use the extended stats with many entries, one for each TEP,
or if we treat a TEP as an attribute of a port in a similar manner to the
way rte_security manages an IPsec SA, the state of each TEP can be
monitored and managed independently of both the overall port and the flows
being transported on that endpoint.

Assuming we can define one rte_flow rule per TEP, does what you propose
give us anything more than just using the COUNT action?

I agree with John here, and I am also not sure we need such an assumption.

If I get it right, the API proposed here is to have a tunnel endpoint which
is a logical port on top of an ethdev port. The TEP is able to receive and
monitor some specific tunneled traffic, for example VXLAN, GENEVE and more.
For example, a VXLAN TEP can have multiple flows with different VNIs all
under the same context.

Now, with the current rte_flow APIs, we can do exactly the same and give
the application the full flexibility to group the tunnel flows into a
logical TEP. On this suggestion the application will:
1. Create rte_flow rules for the pattern it wants to receive.
2. In case it is interested in counting, a COUNT action will be added to
   the flow.
3. In case header manipulation is required, a DECAP/ENCAP/REWRITE action
   will be added to the flow.
4. Grouping of flows into a logical TEP will be done on the application
   layer simply by keeping the relevant rte_flow rules in some dedicated
   struct. With it, create/destroy TEP can be translated to create/destroy
   of the flow rules. Statistics query can be done by querying each flow
   count and summing. Note that some devices can support the same counter
   for multiple flows. Even though it is not yet exposed in rte_flow this
   can be an interesting optimization.
As I responded in John's mail, I think this approach fails in devices which
also support switching offload. As the flows never hit the host application
configuring the TEP and flows, there is no easy way to sum those
statistics; also flows are transitory in terms of runtime so it would not
be possible to keep accurate statistics over a period of time.

As for the capabilities - what specifically did you have in mind? The
current usage you show with tep is with rte_flow rules. There are no
capabilities currently for rte_flow supported actions/patterns. To check
such capabilities the application uses rte_flow_validate.

I envisaged that the application should be able to see if an ethdev can
support TEP in the rx/tx offloads, and then the rte_tep_capabilities would
allow applications to query what tunnel endpoint protocols are supported
etc. I would like a simple mechanism to allow users to see if a particular
tunnel endpoint type is supported without having to build actual flows to
validate.

I can see the value of that, but in the end wouldn't the API call
rte_flow_validate anyway? Maybe we don't add the layer now or maybe it
doesn't really belong in DPDK? I'm in favor of deferring the capabilities
API until we know it's really needed. I hate to see special capabilities
APIs start sneaking in after we decided to go the rte_flow_validate route
and users are starting to get used to it.

I don't see how it is different from any other rte_flow creation. We don't
hold caps for a device's ability to filter packets according to VXLAN or
GENEVE items. Why should we start now?

I don't know, possibly if it makes adoption of the features easier for the
end user.

We already have rte_flow_validate. I think part of the reason for it was
that the number of different capabilities possible with rte_flow is huge. I
think this is also the case with the TEP capabilities (even though it is
still not clear to me what exactly they will include).
It may be that we only need to advertise that we are capable of encap/decap
services, but it would be good to have input from downstream users on what
they would like to see.

Regarding the create/destroy of tep: why not simply use the rte_flow API
and avoid this extra control? For example - with the 17.11 APIs, the
application can put the port in isolate mode, and insert a flow rule to
catch only IPv4 VXLAN traffic and direct it to some queue / do RSS. Such an
operation, per my understanding, will create a tunnel endpoint. What are
the downsides of doing it with the current APIs?

That doesn't enable encapsulation and decapsulation of the outer tunnel
endpoint in the hw as far as I know, apart from the inability to monitor
the endpoint statistics I mentioned above. It would also require that you
redefine the endpoint's parameters every time you wish to add a new flow to
it. I think having the rte_tep object semantics should also simplify the
ability to enable a full vswitch offload of TEP where the hw i
Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement
On 11/01/2018 9:45 PM, John Daley (johndale) wrote:

Hi Declan and Shahaf,

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Doherty, Declan
Sent: Tuesday, January 09, 2018 9:31 AM
To: Shahaf Shuler ; dev@dpdk.org
Subject: Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

On 24/12/2017 5:30 PM, Shahaf Shuler wrote:

Hi Declan,

Hey Shahaf, apologies for the delay in responding, I have been out of
office for the last 2 weeks.

Friday, December 22, 2017 12:21 AM, Doherty, Declan:

This RFC contains a proposal to add a new tunnel endpoint API to DPDK that
when used in conjunction with rte_flow enables the configuration of inline
data path encapsulation and decapsulation of tunnel endpoint network
overlays on accelerated IO devices.

The proposed new API would provide for the creation, destruction, and
monitoring of a tunnel endpoint in supporting hw, as well as capabilities
APIs to allow the acceleration features to be discovered by applications.

I am not sure I understand why there is a need for the above control
methods. Are you introducing a new "tep device"? As the tunnel endpoint is
sending and receiving Ethernet packets from the network I think it should
still be counted as an Ethernet device, but with more capabilities (for
example it supports encap/decap etc.), therefore it should use the ethdev
layer API to query statistics (for example).

No, the new APIs are only intended to be a method of creating, monitoring
and deleting tunnel endpoints on an existing ethdev. The rationale for APIs
separate to rte_flow is the same as that in rte_security: there is not a
1:1 mapping of TEPs to flows. Many flows (VNIs in VxLAN for example) can
originate/terminate on the same TEP, therefore managing the TEP
independently of the flows being transmitted on it is important to allow
visibility of that endpoint's stats, for example.

I don't quite understand what you mean by tunnel and flow here. Can you
define exactly what you mean?
Flow is an overloaded word in our world. I think that defining it will make
understanding the RFC a little easier.

Hey John, I think that's a good idea. For me the tunnel endpoint defines
the l3/l4 parameters of the endpoint, so for VxLAN over IPv4 this would
include the IPv4, UDP and VxLAN headers, excluding the VNI (flow id). I'm
not sure if it makes more sense for each TEP to contain the VNI (flow id)
or not. I believe the model currently used by OvS today is similar to the
RFC in that many VNIs can be terminated in the same TEP port context.

In terms of flow definitions, for encapsulated ingress I would see the
definition of a flow to include the l2 and l3/l4 headers of the outer,
including the flow id of the tunnel, and optionally any or all of the inner
headers. For non-encapsulated egress traffic the flow defines any
combination of the l2, l3, l4 headers as defined by the user.

Taking VxLAN, I think of the tunnel as including up through the VxLAN
header, including the VNI. If you go by this definition, I would consider a
flow to be all packets with the same VNI and the same 5-tuple hash of the
inner packet. Is this what you mean by tunnel (or TEP) and flow here?

Yes, with the exception that I had excluded the VNI (flow id) from the TEP
definition and it was part of the flow, but otherwise essentially yes.

With these definitions, VPP for example might need up to a couple thousand
TEPs on an interface and each TEP could have hundreds or thousands of
flows. It would be quite possible to have 1 rte_flow rule per TEP (or 2 -
ingress/decap and egress/encap). The COUNT action could be used to count
the number of packets through each TEP. Is this adequate, or are you
proposing that we need a mechanism to get stats of flows within each TEP?
Is that the main point of the API? Assuming no need for stats on a per
TEP/flow basis, is there anything else the API adds?
Yes, the basis of having the TEP as a separate API is to allow flows to be tracked independently of the overlay they may be transported on. I believe this will be a requirement for acceleration of any vswitch, as we could have a case where flows bypass the host vswitch completely and are encapped/decapped and switched in hw directly between the guest and the physical port. OvS currently can track both flow and TEP statistics and I think we need to support this model. I can't see how the existing ethdev API could be used for statistics, as a single ethdev could be supporting many concurrent TEPs; therefore we would either need to use the extended stats with many entries, one for each TEP, or, if we treat a TEP as an attribute of a port in a similar manner to the way rte_security manages an IPsec SA, the state of each TEP can be monitored and managed independently of both the overall port and the flows being transported on that endpoint. Assuming we can define one rte_flow rule per TEP, does what you propose give us anything more than just using the COUNT action?
Thursday, January 11, 2018 11:45 PM, John Daley: > Hi Declan and Shahaf, > > > I can't see how the existing > > ethdev API could be used for statistics as a single ethdev could be > > supporting many concurrent TEPs, therefore we would either need to use > > the extended stats with many entries, one for each TEP, or if we treat > > a TEP as an attribute of a port in a similar manner to the way > > rte_security manages an IPsec SA, the state of each TEP can be > > monitored and managed independently of both the overall port or the > flows being transported on that endpoint. > > Assuming we can define one rte_flow rule per TEP, does what you propose > give us anything more than just using the COUNT action? I agree with John here, and I am also not sure we need such an assumption. If I get it right, the API proposed here is to have a tunnel endpoint which is a logical port on top of an ethdev port. The TEP is able to receive and monitor some specific tunneled traffic, for example VXLAN, GENEVE and more. For example, a VXLAN TEP can have multiple flows with different VNIs all under the same context. Now, with the current rte_flow APIs, we can do exactly the same and give the application the full flexibility to group the tunnel flows into a logical TEP. With this suggestion the application will:

1. Create rte_flow rules for the patterns it wants to receive.
2. In case it is interested in counting, add a COUNT action to the flow.
3. In case header manipulation is required, add a DECAP/ENCAP/REWRITE action to the flow.
4. Group flows into a logical TEP on the application layer, simply by keeping the relevant rte_flow rules in some dedicated struct.

With this, create/destroy TEP can be translated to create/destroy of the flow rules. A statistics query can be done by querying each flow's count and summing. Note that some devices can support the same counter for multiple flows. Even though it is not yet exposed in rte_flow, this can be an interesting optimization.
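The application-layer grouping Shahaf describes can be sketched as below. This is illustrative only and has no DPDK dependency: `struct app_tep` and `struct mock_flow` are invented names, with the mock's hit counter standing in for an rte_flow handle plus the result of querying its RTE_FLOW_ACTION_TYPE_COUNT action; in a real application the handles would come from rte_flow_create() and the counts from rte_flow_query().

```c
/*
 * Sketch: an application-level TEP is just a dedicated struct grouping
 * the rte_flow rules (e.g. one per VNI) that make up the tunnel endpoint.
 * Mocked types -- no DPDK headers required.
 */
#include <stdint.h>

#define TEP_MAX_FLOWS 64

struct mock_flow {	/* stand-in for struct rte_flow + its COUNT result */
	uint64_t hits;
};

struct app_tep {
	struct mock_flow *flows[TEP_MAX_FLOWS];
	unsigned int nb_flows;
};

/* Add one rule (e.g. one VNI) to the TEP; returns 0 on success. */
static int app_tep_add_flow(struct app_tep *tep, struct mock_flow *flow)
{
	if (tep->nb_flows >= TEP_MAX_FLOWS)
		return -1;
	tep->flows[tep->nb_flows++] = flow;
	return 0;
}

/* TEP statistics = sum of the per-rule COUNT queries. */
static uint64_t app_tep_stats(const struct app_tep *tep)
{
	uint64_t total = 0;

	for (unsigned int i = 0; i < tep->nb_flows; i++)
		total += tep->flows[i]->hits;
	return total;
}
```

Destroying such a TEP is then simply iterating the same array and calling rte_flow_destroy() on each handle, which matches Shahaf's point that no new device-level object is strictly required.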
> > > > > As for the capabilities - what specifically did you have in mind? The > > > current > > usage you show with tep is with rte_flow rules. There are no > > capabilities currently for rte_flow supported actions/patterns. To > > check such capabilities the application uses rte_flow_validate. > > > > I envisaged that the application should be able to see if an ethdev > > can support TEP in the rx/tx offloads, and then the > > rte_tep_capabilities would allow applications to query what tunnel > > endpoint protocols are supported etc. I would like a simple mechanism > > to allow users to see if a particular tunnel endpoint type is > > supported without having to build actual flows to validate. > > I can see the value of that, but in the end wouldn't the API call > rte_flow_validate anyway? Maybe we don't add the layer now or maybe it > doesn't really belong in DPDK? I'm in favor of deferring the capabilities API > until we know it's really needed. I hate to see special capabilities APIs > start > sneaking in after we decided to go the rte_flow_validate route and users are > starting to get used to it. I don't see how it is different from any other rte_flow creation. We don't hold caps for a device's ability to filter packets according to VXLAN or GENEVE items. Why should we start now? We already have rte_flow_validate. I think part of the reason for it was that the number of different capabilities possible with rte_flow is huge. I think this is also the case with the TEP capabilities (even though it is still not clear to me what exactly they will include).
What are the downsides of doing it with the current > APIs? > > > > That doesn't enable encapsulation and decapsulation of the outer > > tunnel endpoint in the hw as far as I know. Apart from the inability > > to monitor the endpoint statistics I mentioned above. It would also > > require that you redefine the endpoint's parameters every time you > > wish to add a new flow to it. I think having the rte_tep object > > semantics should also simplify the ability to enable a full vswitch > > offload of TEP where the hw is handling both encap/decap and switching to > a particular port. > > If we have the ingress/decap and egress/encap actions and 1 rte_flow rule > per TEP and use the COUNT action, I think we get all but the last bit. For > that, > perhaps the application could keep ingress and egress rte_flow templates for > each tunnel type (VxLAN, GRE, ..). Then copying the template and filling in > the outer packet info and tunnel Id is all that would be required. We could > also define these in rte_fl
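The template idea above can be sketched as follows. This is a mocked illustration, not DPDK code: `struct vxlan_outer` and `make_vxlan_tep_spec()` are invented for the sketch; in a real application the template would be an array of rte_flow items whose spec structures are copied and then patched with the outer packet info and tunnel id before rte_flow_create().

```c
/*
 * Sketch: keep one template per tunnel type; per TEP, copy it and fill
 * in only the fields that vary (outer addresses, tunnel id).
 */
#include <stdint.h>
#include <string.h>

struct vxlan_outer {
	uint32_t src_addr;	/* outer IPv4 source */
	uint32_t dst_addr;	/* outer IPv4 destination */
	uint16_t udp_dst;	/* outer UDP destination port */
	uint32_t vni;		/* tunnel id */
};

/* The fixed part of the VxLAN template: only the UDP port is pre-filled. */
static const struct vxlan_outer vxlan_template = {
	.src_addr = 0,
	.dst_addr = 0,
	.udp_dst = 4789,	/* IANA-assigned VXLAN port */
	.vni = 0,
};

/* Copy the template and fill in the per-TEP fields. */
static struct vxlan_outer make_vxlan_tep_spec(uint32_t saddr, uint32_t daddr,
					      uint32_t vni)
{
	struct vxlan_outer spec;

	memcpy(&spec, &vxlan_template, sizeof(spec));
	spec.src_addr = saddr;
	spec.dst_addr = daddr;
	spec.vni = vni;
	return spec;
}
```

A GRE or GENEVE template would differ only in which fields are fixed, which is why a small per-tunnel-type template table can replace a dedicated TEP object for rule construction.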
Hi Declan and Shahaf, > -----Original Message----- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Doherty, Declan > Sent: Tuesday, January 09, 2018 9:31 AM > To: Shahaf Shuler ; dev@dpdk.org > Subject: Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement > > On 24/12/2017 5:30 PM, Shahaf Shuler wrote: > > Hi Declan, > > > > Hey Shahaf, apologies for the delay in responding, I have been out of office > for the last 2 weeks. > > > Friday, December 22, 2017 12:21 AM, Doherty, Declan: > >> This RFC contains a proposal to add a new tunnel endpoint API to DPDK > >> that when used in conjunction with rte_flow enables the configuration > >> of inline data path encapsulation and decapsulation of tunnel > >> endpoint network overlays on accelerated IO devices. > >> > >> The proposed new API would provide for the creation, destruction, and > >> monitoring of a tunnel endpoint in supporting hw, as well as > >> capabilities APIs to allow the acceleration features to be discovered by > applications. > >> > > > > I am not sure I understand why there is a need for the above control > methods. > > Are you introducing a new "tep device"? As the tunnel endpoint is > > sending and receiving Ethernet packets from > the network I think it should still be counted as an Ethernet device but with > more capabilities (for example it supports encap/decap etc.), therefore it > should use the Ethdev layer API to query statistics (for example). > > No, the new APIs are only intended to be a method of creating, monitoring > and deleting tunnel endpoints on an existing ethdev. The rationale for APIs > separate from rte_flow is the same as for rte_security: there is not a > 1:1 mapping of TEPs to flows. Many flows (VNIs in VxLAN for example) can > originate/terminate on the same TEP, therefore managing the TEP > independently of the flows being transmitted on it is important to allow > visibility of that endpoint's stats for example.
I don't quite understand what you mean by tunnel and flow here. Can you define exactly what you mean? Flow is an overloaded word in our world. I think that defining it will make understanding the RFC a little easier. Taking VxLAN, I think of the tunnel as including up through the VxLAN header, including the VNI. If you go by this definition, I would consider a flow to be all packets with the same VNI and the same 5-tuple hash of the inner packet. Is this what you mean by tunnel (or TEP) and flow here? With these definitions, VPP for example might need up to a couple thousand TEPs on an interface and each TEP could have hundreds or thousands of flows. It would be quite possible to have 1 rte_flow rule per TEP (or 2: ingress/decap and egress/encap). The COUNT action could be used to count the number of packets through each TEP. Is this adequate, or are you proposing that we need a mechanism to get stats of flows within each TEP? Is that the main point of the API? Assuming no need for stats on a per TEP/flow basis, is there anything else the API adds? > I can't see how the existing > ethdev API could be used for statistics as a single ethdev could be supporting > many concurrent TEPs, therefore we would either need to use the extended > stats with many entries, one for each TEP, or if we treat a TEP as an > attribute > of a port in a similar manner to the way rte_security manages an IPsec SA, > the state of each TEP can be monitored and managed independently of both > the overall port and the flows being transported on that endpoint. Assuming we can define one rte_flow rule per TEP, does what you propose give us anything more than just using the COUNT action? > > > As for the capabilities - what specifically did you have in mind? The current > usage you show with tep is with rte_flow rules. There are no capabilities > currently for rte_flow supported actions/patterns. To check such capabilities > the application uses rte_flow_validate.
> > I envisaged that the application should be able to see if an ethdev can > support TEP in the rx/tx offloads, and then the rte_tep_capabilities would > allow applications to query what tunnel endpoint protocols are supported > etc. I would like a simple mechanism to allow users to see if a particular > tunnel endpoint type is supported without having to build actual flows to > validate. I can see the value of that, but in the end wouldn't the API call rte_flow_validate anyway? Maybe we don't add the layer now or maybe it doesn't really belong in DPDK? I'm in favor of deferring the capabilities API until we know it's really needed. I hate to see special capabilities APIs start sneaking in after we decided to go the rte_flow_validate route and users are starting to get used to it.
Hi, One comment on the DECAP action and a "feature request". I'll also reply to the top of thread discussion separately. Thanks for the RFC Declan! Feature request associated with the ENCAP action: VPP (and probably other apps) would like the ability to simply specify an independent tunnel ID as part of egress match criteria in an rte_flow rule. Then egress packets could specify a tunnel ID and valid flag in the mbuf. If it matched the rte_flow tunnel ID item, a simple lookup in the nic could be done and the associated actions (particularly ENCAP) executed. The application already knows the tunnel that the packet is associated with, so there is no need to have the nic do matching on a header pattern. Plus it's possible that packet headers alone are not enough to determine the correct encap action (the bridge where the packet came from might be required). This would require a new mbuf field to specify the tunnel ID (maybe in tx_offload) and a valid flag. It would also require a new rte_flow item type for matching the tunnel ID (like RTE_FLOW_ITEM_TYPE_META_TUNNEL_ID). Is something like this being considered by others? If not, should it be part of this RFC or a new one? I think this would be the first meta-data match criterion in rte_flow, but I could see others following. -johnd > -----Original Message----- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Doherty, Declan > Sent: Thursday, December 21, 2017 2:21 PM > To: dev@dpdk.org > Subject: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement > > This RFC contains a proposal to add a new tunnel endpoint API to DPDK that > when used in conjunction with rte_flow enables the configuration of inline > data path encapsulation and decapsulation of tunnel endpoint network > overlays on accelerated IO devices. > > The proposed new API would provide for the creation, destruction, and > monitoring of a tunnel endpoint in supporting hw, as well as capabilities APIs > to allow the acceleration features to be discovered by applications.
> > /** Tunnel Endpoint context, opaque structure */ struct rte_tep; > > enum rte_tep_type { >RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */ >RTE_TEP_TYPE_NVGRE, /**< NVGRE Protocol */ >... > }; > > /** Tunnel Endpoint Attributes */ > struct rte_tep_attr { >enum rte_tep_type type; > >/* other endpoint attributes here */ } > > /** > * Create a tunnel end-point context as specified by the flow attribute and > pattern > * > * @param port_id Port identifier of Ethernet device. > * @param attr Flow rule attributes. > * @param pattern Pattern specification by list of rte_flow_items. > * @return > * - On success returns pointer to TEP context > * - On failure returns NULL > */ > struct rte_tep *rte_tep_create(uint16_t port_id, > struct rte_tep_attr *attr, struct rte_flow_item > pattern[]) > > /** > * Destroy an existing tunnel end-point context. All the end-point's context > * will be destroyed, so all active flows using the tep should be freed before > * destroying the context. > * @param port_id Port identifier of Ethernet device. > * @param tep Tunnel endpoint context > * @return > * - On success returns 0 > * - On failure returns 1 > */ > int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep) > > /** > * Get tunnel endpoint statistics > * > * @param port_id Port identifier of Ethernet device. > * @param tep Tunnel endpoint context > * @param stats Tunnel endpoint statistics > * > * @return > * - On success returns 0 > * - On failure returns 1 > */ > int > rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep, > struct rte_tep_stats *stats) > > /** > * Get port's tunnel endpoint capabilities > * > * @param port_id Port identifier of Ethernet device. > * @param capabilities Tunnel endpoint capabilities > * > * @return > * - On success returns 0 > * - On failure returns 1 > */ > int > rte_tep_capabilities_get(uint16_t port_id, > struct rte_tep_capabilities *capabilities) > > > To direct traffic flows to hw terminated tunnel endpoints the rte_flow API is > enhanced to add a new flow item type. This contains a pointer to the TEP > context as well as the overlay flow id to which the traffic flow is > associated. > > struct rte_flow_item_tep { >struct rte_tep *tep; >uint32_t flow_id; > } > > Also 2 new generic action types are added, encapsulation and decapsulation. > > RTE_FLOW_ACTION_TYPE_ENCAP > RTE_FLOW_ACTION_TYPE_DECAP > > struct rte_flow_action_encap { >struct rte_flow_item *item; } > > struct rte_flow_action_decap { >struct rte_flow_item *item; } > > The following section outlines the intended usage of the new APIs and then > how they are combined with the existing rte_flow APIs. > > Tunnel endpoints are created on logical ports which support the capability using rte_tep_create(), using a combination of TEP attributes and rte_flow_items.
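John's tunnel-ID metadata idea can be sketched in code, but note that everything below is purely hypothetical: neither the item type nor the mbuf field exists in DPDK, and all names (`meta_tunnel_id_item`, `mock_mbuf`, `lookup_encap_rule`) are invented for illustration. The point it shows is that the nic would select the encap actions by a simple id lookup instead of matching packet headers.

```c
/*
 * Hypothetical sketch of metadata-based encap selection: the application
 * stamps a tunnel id (plus a valid flag) on the egress mbuf, and the nic
 * resolves it to an encap rule without parsing headers.
 */
#include <stddef.h>
#include <stdint.h>

struct meta_tunnel_id_item {	/* would be matched by the proposed
				 * RTE_FLOW_ITEM_TYPE_META_TUNNEL_ID */
	uint32_t tunnel_id;
};

struct mock_mbuf {		/* stand-in: the proposal adds a tunnel id,
				 * maybe in tx_offload, plus a valid flag */
	uint32_t tunnel_id;
	int tunnel_id_valid;
};

/* The nic-side lookup the proposal implies: id -> rule index, -1 if none. */
static int lookup_encap_rule(const struct meta_tunnel_id_item *rules,
			     size_t nb_rules, const struct mock_mbuf *m)
{
	if (!m->tunnel_id_valid)
		return -1;	/* no metadata: fall back to header matching */
	for (size_t i = 0; i < nb_rules; i++)
		if (rules[i].tunnel_id == m->tunnel_id)
			return (int)i;
	return -1;
}
```

This also makes concrete why headers alone may not suffice: two packets with identical headers but different originating bridges could carry different tunnel ids and so hit different encap rules.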
Adding discussion back to list.

On 24/12/2017 2:18 PM, Boris Pismenny wrote: Hi Declan. On 12/22/2017 12:21 AM, Doherty, Declan wrote: This RFC contains a proposal to add a new tunnel endpoint API to DPDK that when used in conjunction with rte_flow enables the configuration of inline data path encapsulation and decapsulation of tunnel endpoint network overlays on accelerated IO devices. The proposed new API would provide for the creation, destruction, and monitoring of a tunnel endpoint in supporting hw, as well as capabilities APIs to allow the acceleration features to be discovered by applications.

/** Tunnel Endpoint context, opaque structure */
struct rte_tep;

enum rte_tep_type {
    RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
    RTE_TEP_TYPE_NVGRE,     /**< NVGRE Protocol */
    ...
};

/** Tunnel Endpoint Attributes */
struct rte_tep_attr {
    enum rte_tep_type type;

    /* other endpoint attributes here */
};

/**
 * Create a tunnel end-point context as specified by the flow attribute
 * and pattern
 *
 * @param port_id Port identifier of Ethernet device.
 * @param attr    Flow rule attributes.
 * @param pattern Pattern specification by list of rte_flow_items.
 * @return
 *   - On success returns pointer to TEP context
 *   - On failure returns NULL
 */
struct rte_tep *rte_tep_create(uint16_t port_id,
        struct rte_tep_attr *attr, struct rte_flow_item pattern[])

/**
 * Destroy an existing tunnel end-point context. All the end-point's context
 * will be destroyed, so all active flows using the tep should be freed before
 * destroying the context.
 * @param port_id Port identifier of Ethernet device.
 * @param tep     Tunnel endpoint context
 * @return
 *   - On success returns 0
 *   - On failure returns 1
 */
int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep)

/**
 * Get tunnel endpoint statistics
 *
 * @param port_id Port identifier of Ethernet device.
 * @param tep     Tunnel endpoint context
 * @param stats   Tunnel endpoint statistics
 *
 * @return
 *   - On success returns 0
 *   - On failure returns 1
 */
int rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
        struct rte_tep_stats *stats)

/**
 * Get port's tunnel endpoint capabilities
 *
 * @param port_id      Port identifier of Ethernet device.
 * @param capabilities Tunnel endpoint capabilities
 *
 * @return
 *   - On success returns 0
 *   - On failure returns 1
 */
int rte_tep_capabilities_get(uint16_t port_id,
        struct rte_tep_capabilities *capabilities)

To direct traffic flows to hw terminated tunnel endpoints the rte_flow API is enhanced to add a new flow item type. This contains a pointer to the TEP context as well as the overlay flow id to which the traffic flow is associated.

struct rte_flow_item_tep {
    struct rte_tep *tep;
    uint32_t flow_id;
};

Also 2 new generic action types are added, encapsulation and decapsulation:

RTE_FLOW_ACTION_TYPE_ENCAP
RTE_FLOW_ACTION_TYPE_DECAP

struct rte_flow_action_encap {
    struct rte_flow_item *item;
};

struct rte_flow_action_decap {
    struct rte_flow_item *item;
};

The following section outlines the intended usage of the new APIs and then how they are combined with the existing rte_flow APIs.

Tunnel endpoints are created on logical ports which support the capability using rte_tep_create(), using a combination of TEP attributes and rte_flow_items. In the example below a new IPv4 VxLAN endpoint is being defined. The attrs parameter sets the TEP type, and could be used for other possible attributes.

struct rte_tep_attr attrs = { .type = RTE_TEP_TYPE_VXLAN };

The values for the headers which make up the tunnel endpoint are then defined using the spec parameter in the rte_flow items (IPv4, UDP and VxLAN in this case):

struct rte_flow_item_ipv4 ipv4_item = {
    .hdr = { .src_addr = saddr, .dst_addr = daddr }
};
struct rte_flow_item_udp udp_item = {
    .hdr = { .src_port = sport, .dst_port = dport }
};
struct rte_flow_item_vxlan vxlan_item = { .flags = vxlan_flags };

struct rte_flow_item pattern[] = {
    { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_item },
    { .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp_item },
    { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vxlan_item },
    { .type = RTE_FLOW_ITEM_TYPE_END }
};

The tunnel endpoint can then be created on the port. Whether or not any hw configuration is required at this point would be hw dependent, but if not the context for the TEP is available for use in programming flows, so the application is not forced to redefine the TEP parameters on each flow addition.

struct rte_tep *tep = rte_tep_create(port_id, &attrs, pattern);

Once the tep context is created, flows can then be directed to that endpoint for processing. The following sections will outline how the author envisages flow programming will work and also how TEP
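The pattern-building step of the RFC example can be restated as a compilable sketch. The rte_flow item types are mocked here (`mock_item_type`, `mock_item`, `pattern_len` are invented names) so the sketch stands alone; the real definitions live in rte_flow.h, and rte_tep_create() itself is only proposed, not implemented. What it shows is the one invariant the example relies on: a pattern is an array terminated by an END item, which is how both rte_flow and the proposed rte_tep_create() would know where the specification stops.

```c
/*
 * Sketch: an END-terminated pattern array, mirroring the IPv4/UDP/VxLAN
 * example in the RFC, with mocked stand-ins for the rte_flow types.
 */
#include <stddef.h>

enum mock_item_type {		/* stand-ins for RTE_FLOW_ITEM_TYPE_* */
	ITEM_TYPE_END = 0,
	ITEM_TYPE_IPV4,
	ITEM_TYPE_UDP,
	ITEM_TYPE_VXLAN,
};

struct mock_item {		/* stand-in for struct rte_flow_item */
	enum mock_item_type type;
	const void *spec;
};

/* Length of an END-terminated pattern, excluding the terminator. */
static size_t pattern_len(const struct mock_item *pattern)
{
	size_t n = 0;

	while (pattern[n].type != ITEM_TYPE_END)
		n++;
	return n;
}
```

For the VxLAN TEP above the pattern is { IPV4, UDP, VXLAN, END }, i.e. three items for the driver to parse, with the Ethernet layer notably absent, which is the point Boris raises later in the thread.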
On 24/12/2017 5:30 PM, Shahaf Shuler wrote: Hi Declan, Hey Shahaf, apologies for the delay in responding, I have been out of office for the last 2 weeks. Friday, December 22, 2017 12:21 AM, Doherty, Declan: This RFC contains a proposal to add a new tunnel endpoint API to DPDK that when used in conjunction with rte_flow enables the configuration of inline data path encapsulation and decapsulation of tunnel endpoint network overlays on accelerated IO devices. The proposed new API would provide for the creation, destruction, and monitoring of a tunnel endpoint in supporting hw, as well as capabilities APIs to allow the acceleration features to be discovered by applications. I am not sure I understand why there is a need for the above control methods. Are you introducing a new "tep device"? As the tunnel endpoint is sending and receiving Ethernet packets from the network, I think it should still be counted as an Ethernet device, but with more capabilities (for example it supports encap/decap etc.), therefore it should use the Ethdev layer API to query statistics (for example). No, the new APIs are only intended to be a method of creating, monitoring and deleting tunnel endpoints on an existing ethdev. The rationale for APIs separate from rte_flow is the same as for rte_security: there is not a 1:1 mapping of TEPs to flows. Many flows (VNIs in VxLAN for example) can originate/terminate on the same TEP, therefore managing the TEP independently of the flows being transmitted on it is important, to allow visibility of that endpoint's stats for example. 
I can't see how the existing ethdev API could be used for statistics, as a single ethdev could be supporting many concurrent TEPs; therefore we would either need to use the extended stats with many entries, one for each TEP, or, if we treat a TEP as an attribute of a port in a similar manner to the way rte_security manages an IPsec SA, the state of each TEP can be monitored and managed independently of both the overall port and the flows being transported on that endpoint. As for the capabilities - what specifically did you have in mind? The current usage you show with tep is with rte_flow rules. There are no capabilities currently for rte_flow supported actions/patterns. To check such capabilities the application uses rte_flow_validate. I envisaged that the application should be able to see if an ethdev can support TEP in the rx/tx offloads, and then the rte_tep_capabilities would allow applications to query what tunnel endpoint protocols are supported etc. I would like a simple mechanism to allow users to see if a particular tunnel endpoint type is supported without having to build actual flows to validate. Regarding the creation/destroy of tep. Why not simply use the rte_flow API and avoid this extra control? For example - with 17.11 APIs, the application can put the port in isolate mode, and insert a flow_rule to catch only IPv4 VXLAN traffic and direct it to some queue/do RSS. Such an operation, per my understanding, will create a tunnel endpoint. What are the downsides of doing it with the current APIs? That doesn't enable encapsulation and decapsulation of the outer tunnel endpoint in the hw as far as I know. Apart from the inability to monitor the endpoint statistics I mentioned above. It would also require that you redefine the endpoint's parameters every time you wish to add a new flow to it. 
I think having the rte_tep object semantics should also simplify the ability to enable a full vswitch offload of TEP where the hw is handling both encap/decap and switching to a particular port. To direct traffic flows to hw terminated tunnel endpoints the rte_flow API is enhanced to add a new flow item type. This contains a pointer to the TEP context as well as the overlay flow id to which the traffic flow is associated. struct rte_flow_item_tep { struct rte_tep *tep; uint32_t flow_id; } Can you provide a more detailed definition of the flow id? To which field in the packet headers does it refer? On your examples below it looks like it is to match the VXLAN vni in the case of VXLAN; what about the other protocols? And also, why not use the already existing VXLAN item? I have only been looking initially at a couple of the tunnel endpoint protocols, namely Geneve, NvGRE, and VxLAN, but the idea here is to allow the user to define the VNI in the case of Geneve and VxLAN and the VSID in the case of NvGRE on a per flow basis, as per my understanding these are used to identify the source/destination hosts on the overlay network independently from the endpoint they are transported across. The VxLAN item is used in the creation of the TEP object; using the TEP object just removes the need for the user to constantly redefine all the tunnel parameters, and also, dependent on the hw implementation, it may simplify the driver's work if it knows the exact endpoint the action
Hi Declan, On 12/22/2017 12:21 AM, Doherty, Declan wrote: This RFC contains a proposal to add a new tunnel endpoint API to DPDK that when used in conjunction with rte_flow enables the configuration of inline data path encapsulation and decapsulation of tunnel endpoint network overlays on accelerated IO devices. The proposed new API would provide for the creation, destruction, and monitoring of a tunnel endpoint in supporting hw, as well as capabilities APIs to allow the acceleration features to be discovered by applications. /** Tunnel Endpoint context, opaque structure */ struct rte_tep; enum rte_tep_type { RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */ RTE_TEP_TYPE_NVGRE, /**< NVGRE Protocol */ ... }; /** Tunnel Endpoint Attributes */ struct rte_tep_attr { enum rte_tep_type type; /* other endpoint attributes here */ } /** * Create a tunnel end-point context as specified by the flow attribute and pattern * * @param port_id Port identifier of Ethernet device. * @param attr Flow rule attributes. * @param pattern Pattern specification by list of rte_flow_items. * @return * - On success returns pointer to TEP context * - On failure returns NULL */ struct rte_tep *rte_tep_create(uint16_t port_id, struct rte_tep_attr *attr, struct rte_flow_item pattern[]) /** * Destroy an existing tunnel end-point context. All the end-point's context * will be destroyed, so all active flows using the tep should be freed before * destroying the context. * @param port_id Port identifier of Ethernet device. * @param tep Tunnel endpoint context * @return * - On success returns 0 * - On failure returns 1 */ int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep) /** * Get tunnel endpoint statistics * * @param port_id Port identifier of Ethernet device. 
* @param tep Tunnel endpoint context * @param stats Tunnel endpoint statistics * * @return * - On success returns 0 * - On failure returns 1 */ int rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep, struct rte_tep_stats *stats) /** * Get port's tunnel endpoint capabilities * * @param port_id Port identifier of Ethernet device. * @param capabilities Tunnel endpoint capabilities * * @return * - On success returns 0 * - On failure returns 1 */ int rte_tep_capabilities_get(uint16_t port_id, struct rte_tep_capabilities *capabilities) To direct traffic flows to hw terminated tunnel endpoints the rte_flow API is enhanced to add a new flow item type. This contains a pointer to the TEP context as well as the overlay flow id to which the traffic flow is associated. struct rte_flow_item_tep { struct rte_tep *tep; uint32_t flow_id; } Also 2 new generic action types are added, encapsulation and decapsulation: RTE_FLOW_ACTION_TYPE_ENCAP RTE_FLOW_ACTION_TYPE_DECAP struct rte_flow_action_encap { struct rte_flow_item *item; } struct rte_flow_action_decap { struct rte_flow_item *item; } The following section outlines the intended usage of the new APIs and then how they are combined with the existing rte_flow APIs. Tunnel endpoints are created on logical ports which support the capability using rte_tep_create(), using a combination of TEP attributes and rte_flow_items. In the example below a new IPv4 VxLAN endpoint is being defined. The attrs parameter sets the TEP type, and could be used for other possible attributes. 
struct rte_tep_attr attrs = { .type = RTE_TEP_TYPE_VXLAN }; The values for the headers which make up the tunnel endpoint are then defined using the spec parameter in the rte_flow items (IPv4, UDP and VxLAN in this case): struct rte_flow_item_ipv4 ipv4_item = { .hdr = { .src_addr = saddr, .dst_addr = daddr } }; struct rte_flow_item_udp udp_item = { .hdr = { .src_port = sport, .dst_port = dport } }; struct rte_flow_item_vxlan vxlan_item = { .flags = vxlan_flags }; struct rte_flow_item pattern[] = { { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_item }, { .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp_item }, { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vxlan_item }, { .type = RTE_FLOW_ITEM_TYPE_END } }; The tunnel endpoint can then be created on the port. Whether or not any hw configuration is required at this point would be hw dependent, but if not the context for the TEP is available for use in programming flows, so the application is not forced to redefine the TEP parameters on each flow addition. struct rte_tep *tep = rte_tep_create(port_id, &attrs, pattern); Once the tep context is created, flows can then be directed to that endpoint for processing. The following sections will outline how the author envisages flow programming will work and also how TEP
Hi Declan, Friday, December 22, 2017 12:21 AM, Doherty, Declan: > This RFC contains a proposal to add a new tunnel endpoint API to DPDK that > when used in conjunction with rte_flow enables the configuration of inline > data path encapsulation and decapsulation of tunnel endpoint network > overlays on accelerated IO devices. > > The proposed new API would provide for the creation, destruction, and > monitoring of a tunnel endpoint in supporting hw, as well as capabilities APIs > to allow the acceleration features to be discovered by applications. > > /** Tunnel Endpoint context, opaque structure */ struct rte_tep; > > enum rte_tep_type { >RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */ >RTE_TEP_TYPE_NVGRE, /**< NVGRE Protocol */ >... > }; > > /** Tunnel Endpoint Attributes */ > struct rte_tep_attr { >enum rte_tep_type type; > >/* other endpoint attributes here */ } > > /** > * Create a tunnel end-point context as specified by the flow attribute and > pattern > * > * @param port_id Port identifier of Ethernet device. > * @param attr Flow rule attributes. > * @param pattern Pattern specification by list of rte_flow_items. > * @return > * - On success returns pointer to TEP context > * - On failure returns NULL > */ > struct rte_tep *rte_tep_create(uint16_t port_id, > struct rte_tep_attr *attr, struct rte_flow_item > pattern[]) > > /** > * Destroy an existing tunnel end-point context. All the end-point's context > * will be destroyed, so all active flows using the tep should be freed before > * destroying the context. > * @param port_id Port identifier of Ethernet device. > * @param tep Tunnel endpoint context > * @return > * - On success returns 0 > * - On failure returns 1 > */ > int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep) > > /** > * Get tunnel endpoint statistics > * > * @param port_id Port identifier of Ethernet device. 
> * @param tep Tunnel endpoint context > * @param stats Tunnel endpoint statistics > * > * @return > * - On success returns 0 > * - On failure returns 1 > */ > int > rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep, > struct rte_tep_stats *stats) > > /** > * Get port's tunnel endpoint capabilities > * > * @param port_id Port identifier of Ethernet device. > * @param capabilities Tunnel endpoint capabilities > * > * @return > * - On success returns 0 > * - On failure returns 1 > */ > int > rte_tep_capabilities_get(uint16_t port_id, > struct rte_tep_capabilities *capabilities) I am not sure I understand why there is a need for the above control methods. Are you introducing a new "tep device"? As the tunnel endpoint is sending and receiving Ethernet packets from the network, I think it should still be counted as an Ethernet device, but with more capabilities (for example it supports encap/decap etc.), therefore it should use the Ethdev layer API to query statistics (for example). As for the capabilities - what specifically did you have in mind? The current usage you show with tep is with rte_flow rules. There are no capabilities currently for rte_flow supported actions/patterns. To check such capabilities the application uses rte_flow_validate. Regarding the creation/destroy of tep. Why not simply use the rte_flow API and avoid this extra control? For example - with 17.11 APIs, the application can put the port in isolate mode, and insert a flow_rule to catch only IPv4 VXLAN traffic and direct it to some queue/do RSS. Such an operation, per my understanding, will create a tunnel endpoint. What are the downsides of doing it with the current APIs? > > > To direct traffic flows to hw terminated tunnel endpoints the rte_flow API is > enhanced to add a new flow item type. This contains a pointer to the TEP > context as well as the overlay flow id to which the traffic flow is > associated. 
> > struct rte_flow_item_tep { >struct rte_tep *tep; >uint32_t flow_id; > } Can you provide a more detailed definition of the flow id? To which field in the packet headers does it refer? On your examples below it looks like it is to match the VXLAN vni in the case of VXLAN; what about the other protocols? And also, why not use the already existing VXLAN item? Generally I like the idea of separating the encap/decap context from the action. However, it looks like the rte_flow_item has a double meaning in this RFC, once for classification and once for the action. From the top of my head I would think of an API which separates those and re-uses the existing flow items. Something like:

struct rte_flow_item pattern[] = {
    { set of already existing patterns },
    { ... },
    { .type = RTE_FLOW_ITEM_TYPE_END }
};

encap_ctx = create_encap_context(pattern)
rte_flow_action act