Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-14 Thread Kevin Traynor
On 12/14/2016 01:54 PM, Adrien Mazarguil wrote:

>>
>>> + * @param[out] error
>>> + *   Perform verbose error reporting if not NULL.
>>> + *
>>> + * @return
>>> + *   0 on success, a negative errno value otherwise and rte_errno is 
>>> set.
>>> + */
>>> +int
>>> +rte_flow_query(uint8_t port_id,
>>> +  struct rte_flow *flow,
>>> +  enum rte_flow_action_type action,
>>> +  void *data,
>>> +  struct rte_flow_error *error);
>>> +
>>> +#ifdef __cplusplus
>>> +}
>>> +#endif
>>
>> I don't see a way to dump all the rules for a port out. I think this is
>> neccessary for degbugging. You could have a look through dpif.h in OVS
>> and see how dpif_flow_dump_next() is used, it might be a good reference.
>
> DPDK does not maintain flow rules and, depending on hardware capabilities
> and level of compliance, PMDs do not necessarily do it either, 
> particularly
> since it requires space and application probably have a better method to
> store these pointers for their own needs.

 understood

>
> What you see here is only a PMD interface. Depending on applications 
> needs,
> generic helper functions built on top of these may be added to manage flow
> rules in the future.

 I'm thinking of the case where something goes wrong and I want to get a
 dump of all the flow rules from hardware, not query the rules I think I
 have. I don't see a way to do it or something to build a helper on top of?
>>>
>>> Generic helper functions would exist on top of this API and would likely
>>> maintain a list of flow rules themselves. The dump in that case would be
>>> entirely implemented in software. I think that recovering flow rules from HW
>>> may be complicated in many cases (even without taking storage allocation and
>>> rules conversion issues into account), therefore if there is really a need
>>> for it, we could perhaps add a dump() function that PMDs are free to
>>> implement later.
>>>
>>
>> ok. Maybe there are some more generic stats that can be got from the
>> hardware that would help debugging that would suffice, like total flow
>> rule hits/misses (i.e. not on a per flow rule basis).
>>
>> You can get this from the software flow caches and it's widely used for
>> debugging. e.g.
>>
>> pmd thread numa_id 0 core_id 3:
>>  emc hits:0
>>  megaflow hits:0
>>  avg. subtable lookups per hit:0.00
>>  miss:0
>>
> 
> Perhaps a rule such as the following could do the trick:
> 
>  group: 42 (or priority 42)
>  pattern: void
>  actions: count / passthru
> 
> Assuming useful flow rules are defined with higher priorities (using lower
> group ID or priority level) and provide a terminating action, this one would
> count all packets that were not caught by them.
> 
> That is one example to illustrate how "global" counters can be requested by
> applications.
> 
> Otherwise you could just make sure all rules contain mark / flag actions, in
> which case mbufs would tell directly if they went through them or need
> additional SW processing.
> 

ok, sounds like there's some options at least to work with on this which
is good. thanks.


Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-14 Thread Adrien Mazarguil
Hi Kevin,

On Wed, Dec 14, 2016 at 11:48:04AM +, Kevin Traynor wrote:
> hi Adrien, sorry for the delay
> 
> <...>
> 
> 
>  Is it expected that the application or pmd will provide locking between
>  these functions if required? I think it's going to have to be the app.
> >>>
> >>> Locking is indeed expected to be performed by applications. This API only
> >>> documents places where locking would make sense if necessary and expected
> >>> behavior.
> >>>
> >>> Like all control path APIs, this one assumes a single control thread.
> >>> Applications must take the necessary precautions.
> >>
> >> If you look at OVS now it's quite possible that you have 2 rx queues
> >> serviced by different threads, that would also install the flow rules in
> >> the software flow caches - possibly that could extend to adding hardware
> >> flows. There could also be another thread that is querying for stats. So
> >> anything that can be done to minimise the locking would be helpful -
> >> maybe query() could be atomic and not require any locking?
> > 
> > I think we need basic functions with as few constraints as possible on PMDs
> > first, this API being somewhat complex to implement on their side. That
> > covers the common use case where applications have a single control thread
> > or otherwise perform locking on their own.
> > 
> > Once the basics are there for most PMDs, we may add new functions, items,
> > properties and actions that provide additional constraints (timing,
> > multi-threading and so on), which remain to be defined according to
> > feedback. It is designed to be extended without causing ABI breakage.
> 
> I think Sugesh and I are trying to foresee some of the issues that may
> arise when integrating with something like OVS. OTOH it's
> hard/impossible to say what will be needed exactly in the API right now
> to make it suitable for OVS.
> 
> So, I'm ok with the approach you are taking by exposing a basic API
> but I think there should be an expectation that it may not be sufficient
> for a project like OVS to integrate in and may take several
> iterations/extensions - don't go anywhere!
> 
> > 
> > As for query(), let's see how PMDs handle it first. A race between query()
> > and create() on a given device is almost unavoidable without locking, same
> > for queries that reset counters in a given flow rule. Basic parallel queries
> > should not cause any harm otherwise, although this cannot be guaranteed yet.
> 
> You still have a race if there is locking, except it is for the lock,
> but it has the same effect. The downside of my suggestion is that all
> the PMDs would need to guarantee they could gets stats atomically - I'm
> not sure if they can or it's too restrictive.
> 
> > 
> 
> <...>
> 
> >>
> >>>
> > +
> > +/**
> > + * Destroy a flow rule on a given port.
> > + *
> > + * Failure to destroy a flow rule handle may occur when other flow 
> > rules
> > + * depend on it, and destroying it would result in an inconsistent 
> > state.
> > + *
> > + * This function is only guaranteed to succeed if handles are 
> > destroyed in
> > + * reverse order of their creation.
> 
>  How can the application find this information out on error?
> >>>
> >>> Without maintaining a list, they cannot. The specified case is the only
> >>> possible guarantee. That does not mean PMDs should not do their best to
> >>> destroy flow rules, only that ordering must remain consistent in case of
> >>> inability to destroy one.
> >>>
> >>> What do you suggest?
> >>
> >> I think if the app cannot remove a specific rule it may want to remove
> >> all rules and deal with flows in software for a time. So once the app
> >> knows it fails that should be enough.
> > 
> > OK, then since destruction may return an error already, is it fine?
> > Applications may call rte_flow_flush() (not supposed to fail unless there is
> > a serious issue, abort() in that case) and switch to SW fallback.
> 
> yes, it's fine.
> 
> > 
> 
> <...>
> 
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is 
> > set.
> > + */
> > +int
> > +rte_flow_query(uint8_t port_id,
> > +  struct rte_flow *flow,
> > +  enum rte_flow_action_type action,
> > +  void *data,
> > +  struct rte_flow_error *error);
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> 
>  I don't see a way to dump all the rules for a port out. I think this is
>  neccessary for degbugging. You could have a look through dpif.h in OVS
>  and see how dpif_flow_dump_next() is used, it might be a good reference.
> >>>
> >>> DPDK does not maintain flow rules and, depending on hardware capabilities
> >>> and level of compliance, PMDs do not necessarily do it either, 
> >>> particularly
> >>> 

Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-14 Thread Kevin Traynor
hi Adrien, sorry for the delay

<...>


 Is it expected that the application or pmd will provide locking between
 these functions if required? I think it's going to have to be the app.
>>>
>>> Locking is indeed expected to be performed by applications. This API only
>>> documents places where locking would make sense if necessary and expected
>>> behavior.
>>>
>>> Like all control path APIs, this one assumes a single control thread.
>>> Applications must take the necessary precautions.
>>
>> If you look at OVS now it's quite possible that you have 2 rx queues
>> serviced by different threads, that would also install the flow rules in
>> the software flow caches - possibly that could extend to adding hardware
>> flows. There could also be another thread that is querying for stats. So
>> anything that can be done to minimise the locking would be helpful -
>> maybe query() could be atomic and not require any locking?
> 
> I think we need basic functions with as few constraints as possible on PMDs
> first, this API being somewhat complex to implement on their side. That
> covers the common use case where applications have a single control thread
> or otherwise perform locking on their own.
> 
> Once the basics are there for most PMDs, we may add new functions, items,
> properties and actions that provide additional constraints (timing,
> multi-threading and so on), which remain to be defined according to
> feedback. It is designed to be extended without causing ABI breakage.

I think Sugesh and I are trying to foresee some of the issues that may
arise when integrating with something like OVS. OTOH it's
hard/impossible to say what will be needed exactly in the API right now
to make it suitable for OVS.

So, I'm ok with the approach you are taking by exposing a basic API
but I think there should be an expectation that it may not be sufficient
for a project like OVS to integrate in and may take several
iterations/extensions - don't go anywhere!

> 
> As for query(), let's see how PMDs handle it first. A race between query()
> and create() on a given device is almost unavoidable without locking, same
> for queries that reset counters in a given flow rule. Basic parallel queries
> should not cause any harm otherwise, although this cannot be guaranteed yet.

You still have a race if there is locking, except it is for the lock,
but it has the same effect. The downside of my suggestion is that all
the PMDs would need to guarantee they could gets stats atomically - I'm
not sure if they can or it's too restrictive.

> 

<...>

>>
>>>
> +
> +/**
> + * Destroy a flow rule on a given port.
> + *
> + * Failure to destroy a flow rule handle may occur when other flow rules
> + * depend on it, and destroying it would result in an inconsistent state.
> + *
> + * This function is only guaranteed to succeed if handles are destroyed 
> in
> + * reverse order of their creation.

 How can the application find this information out on error?
>>>
>>> Without maintaining a list, they cannot. The specified case is the only
>>> possible guarantee. That does not mean PMDs should not do their best to
>>> destroy flow rules, only that ordering must remain consistent in case of
>>> inability to destroy one.
>>>
>>> What do you suggest?
>>
>> I think if the app cannot remove a specific rule it may want to remove
>> all rules and deal with flows in software for a time. So once the app
>> knows it fails that should be enough.
> 
> OK, then since destruction may return an error already, is it fine?
> Applications may call rte_flow_flush() (not supposed to fail unless there is
> a serious issue, abort() in that case) and switch to SW fallback.

yes, it's fine.

> 

<...>

> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +rte_flow_query(uint8_t port_id,
> +struct rte_flow *flow,
> +enum rte_flow_action_type action,
> +void *data,
> +struct rte_flow_error *error);
> +
> +#ifdef __cplusplus
> +}
> +#endif

 I don't see a way to dump all the rules for a port out. I think this is
 neccessary for degbugging. You could have a look through dpif.h in OVS
 and see how dpif_flow_dump_next() is used, it might be a good reference.
>>>
>>> DPDK does not maintain flow rules and, depending on hardware capabilities
>>> and level of compliance, PMDs do not necessarily do it either, particularly
>>> since it requires space and application probably have a better method to
>>> store these pointers for their own needs.
>>
>> understood
>>
>>>
>>> What you see here is only a PMD interface. Depending on applications needs,
>>> generic helper functions built on top of these may be added to manage flow
>>> rules in the future.
>>
>> I'm thinking of the case where so

Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-12 Thread Adrien Mazarguil
Hi Sugesh,

On Mon, Dec 12, 2016 at 10:20:18AM +, Chandran, Sugesh wrote:
> Hi Adrien,
> 
> Regards
> _Sugesh
> 
> > -Original Message-
> > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> > Sent: Friday, December 9, 2016 4:39 PM
> > To: Chandran, Sugesh 
> > Cc: Kevin Traynor ; dev@dpdk.org; Thomas
> > Monjalon ; De Lara Guarch, Pablo
> > ; Olivier Matz ;
> > sugesh.chand...@intel.comn
> > Subject: Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
> > 
> > Hi Sugesh,
> > 
> > On Fri, Dec 09, 2016 at 12:18:03PM +, Chandran, Sugesh wrote:
> > [...]
> > > > > Are you going to provide any control over the initialization of
> > > > > NIC to define the capability matrices For eg; To operate in a L3
> > > > > router mode,
> > > > software wanted to initialize the NIC port only to consider the L2
> > > > and L3 fields.
> > > > > I assume the initialization is done based on the first rules that
> > > > > are
> > > > programmed into the NIC.?
> > > >
> > > > Precisely, PMDs are supposed to determine the most appropriate
> > > > device mode to use in order to handle the requested rules. They may
> > > > even switch to another mode if necessary assuming this does not
> > > > break existing constraints.
> > > >
> > > > I think we've discussed an atomic (commit-based) mode of operation
> > > > through separate functions as well, where the application would
> > > > attempt to create a bunch of rules at once, possibly making it
> > > > easier for PMDs to determine the most appropriate mode of operation
> > for the device.
> > > >
> > > > All of these may be added later according to users feedback once the
> > > > basic API has settled.
> > > [Sugesh] Yes , we discussed about this before. However I feel that, it
> > > make sense to provide some flexibility to the user/application to define a
> > profile/mode of the device.
> > > This way the complexity of determining the mode by itself will be taken
> > away from PMD.
> > > Looking at the P4 enablement patches in OVS, the mode definition APIs
> > > can be used in conjunction
> > > P4 behavioral model.
> > > For eg: A P4 model for a L2 switch operate OVS as a L2 switch. Using
> > > the mode definition APIs Its possible to impose the same behavioral model
> > in the hardware too.
> > > This way its simple, clean and very predictive though it needs to define 
> > > an
> > additional profile_define APIs.
> > > I am sorry to provide the comment at this stage,  However looking at
> > > the adoption of ebpf, P4 make me to think this way.
> > > What do you think?
> > 
> > What you suggest (device profile configuration) would be done by a separate
> > function in any case, so as long as everyone agrees on a generic method to
> > do so, no problem with extending rte_flow. By default in the meantime we'll
> > have to rely on PMDs to make the right decision.
> [Sugesh] I am fine with PMD is making the decision on profile/mode selection 
> in
> Default case. However we must provide an option for the application to define 
> a mode
> and PMD must honor with it to avoid making an invalid mode change.
> > 
> > Do you think it has to be defined from the beginning?
> [Sugesh] I feel it's going to be another big topic to decide how proposed 
> mode implementation will be looks like,
> What should be available modes and etc.  So I am OK to consider as its not 
> part of this flow API definition for now.
> However its good to mention that in the API comments section to be aware. Do 
> you agree that?

Will do, I'll mention it in the "future evolutions" section.

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-12 Thread Chandran, Sugesh
Hi Adrien,

Regards
_Sugesh

> -Original Message-
> From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> Sent: Friday, December 9, 2016 4:39 PM
> To: Chandran, Sugesh 
> Cc: Kevin Traynor ; dev@dpdk.org; Thomas
> Monjalon ; De Lara Guarch, Pablo
> ; Olivier Matz ;
> sugesh.chand...@intel.comn
> Subject: Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
> 
> Hi Sugesh,
> 
> On Fri, Dec 09, 2016 at 12:18:03PM +, Chandran, Sugesh wrote:
> [...]
> > > > Are you going to provide any control over the initialization of
> > > > NIC to define the capability matrices For eg; To operate in a L3
> > > > router mode,
> > > software wanted to initialize the NIC port only to consider the L2
> > > and L3 fields.
> > > > I assume the initialization is done based on the first rules that
> > > > are
> > > programmed into the NIC.?
> > >
> > > Precisely, PMDs are supposed to determine the most appropriate
> > > device mode to use in order to handle the requested rules. They may
> > > even switch to another mode if necessary assuming this does not
> > > break existing constraints.
> > >
> > > I think we've discussed an atomic (commit-based) mode of operation
> > > through separate functions as well, where the application would
> > > attempt to create a bunch of rules at once, possibly making it
> > > easier for PMDs to determine the most appropriate mode of operation
> for the device.
> > >
> > > All of these may be added later according to users feedback once the
> > > basic API has settled.
> > [Sugesh] Yes , we discussed about this before. However I feel that, it
> > make sense to provide some flexibility to the user/application to define a
> profile/mode of the device.
> > This way the complexity of determining the mode by itself will be taken
> away from PMD.
> > Looking at the P4 enablement patches in OVS, the mode definition APIs
> > can be used in conjunction
> > P4 behavioral model.
> > For eg: A P4 model for a L2 switch operate OVS as a L2 switch. Using
> > the mode definition APIs Its possible to impose the same behavioral model
> in the hardware too.
> > This way its simple, clean and very predictive though it needs to define an
> additional profile_define APIs.
> > I am sorry to provide the comment at this stage,  However looking at
> > the adoption of ebpf, P4 make me to think this way.
> > What do you think?
> 
> What you suggest (device profile configuration) would be done by a separate
> function in any case, so as long as everyone agrees on a generic method to
> do so, no problem with extending rte_flow. By default in the meantime we'll
> have to rely on PMDs to make the right decision.
[Sugesh] I am fine with PMD is making the decision on profile/mode selection in
Default case. However we must provide an option for the application to define a 
mode
and PMD must honor with it to avoid making an invalid mode change.
> 
> Do you think it has to be defined from the beginning?
[Sugesh] I feel it's going to be another big topic to decide how proposed mode 
implementation will be looks like,
What should be available modes and etc.  So I am OK to consider as its not part 
of this flow API definition for now.
However its good to mention that in the API comments section to be aware. Do 
you agree that?

> 
> --
> Adrien Mazarguil
> 6WIND


Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-09 Thread Adrien Mazarguil
Hi Sugesh,

On Fri, Dec 09, 2016 at 12:18:03PM +, Chandran, Sugesh wrote:
[...]
> > > Are you going to provide any control over the initialization of NIC
> > > to define the capability matrices For eg; To operate in a L3 router mode,
> > software wanted to initialize the NIC port only to consider the L2 and L3
> > fields.
> > > I assume the initialization is done based on the first rules that are
> > programmed into the NIC.?
> > 
> > Precisely, PMDs are supposed to determine the most appropriate device
> > mode to use in order to handle the requested rules. They may even switch
> > to another mode if necessary assuming this does not break existing
> > constraints.
> > 
> > I think we've discussed an atomic (commit-based) mode of operation
> > through separate functions as well, where the application would attempt to
> > create a bunch of rules at once, possibly making it easier for PMDs to
> > determine the most appropriate mode of operation for the device.
> > 
> > All of these may be added later according to users feedback once the basic
> > API has settled.
> [Sugesh] Yes , we discussed about this before. However I feel that, it make 
> sense
> to provide some flexibility to the user/application to define a profile/mode 
> of the device.
> This way the complexity of determining the mode by itself will be taken away 
> from PMD.
> Looking at the P4 enablement patches in OVS, the mode definition APIs can be 
> used in conjunction
> P4 behavioral model. 
> For eg: A P4 model for a L2 switch operate OVS as a L2 switch. Using the mode 
> definition APIs
> Its possible to impose the same behavioral model in the hardware too. 
> This way its simple, clean and very predictive though it needs to define an 
> additional profile_define APIs.
> I am sorry to provide the comment at this stage,  However looking at the 
> adoption of ebpf, P4 make me
> to think this way.
> What do you think?

What you suggest (device profile configuration) would be done by a separate
function in any case, so as long as everyone agrees on a generic method to
do so, no problem with extending rte_flow. By default in the meantime we'll
have to rely on PMDs to make the right decision.

Do you think it has to be defined from the beginning?

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-09 Thread Chandran, Sugesh
Hi Adrien,
Thank you for your comments,
Please see the reply below.

Regards
_Sugesh


> -Original Message-
> From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> Sent: Thursday, December 8, 2016 3:09 PM
> To: Chandran, Sugesh 
> Cc: Kevin Traynor ; dev@dpdk.org; Thomas
> Monjalon ; De Lara Guarch, Pablo
> ; Olivier Matz ;
> sugesh.chand...@intel.comn
> Subject: Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
> 
> Hi Sugesh,
> 
> On Tue, Dec 06, 2016 at 06:11:38PM +, Chandran, Sugesh wrote:
> [...]
> > > >>> +int
> > > >>> +rte_flow_validate(uint8_t port_id,
> > > >>> +   const struct rte_flow_attr *attr,
> > > >>> +   const struct rte_flow_item pattern[],
> > > >>> +   const struct rte_flow_action actions[],
> > > >>> +   struct rte_flow_error *error);
> > > >>
> > > >> Why not just use rte_flow_create() and get an error? Is it less
> > > >> disruptive to do a validate and find the rule cannot be created,
> > > >> than using a create directly?
> > > >
> > > > The rationale can be found in the original RFC, which I'll convert
> > > > to actual documentation in v2. In short:
> > > >
> > > > - Calling rte_flow_validate() before rte_flow_create() is useless since
> > > >   rte_flow_create() also performs validation.
> > > >
> > > > - We cannot possibly express a full static set of allowed flow rules, 
> > > > even
> > > >   if we could, it usually depends on the current hardware configuration
> > > >   therefore would not be static.
> > > >
> > > > - rte_flow_validate() is thus provided as a replacement for capability
> > > >   flags. It can be used to determine during initialization if the 
> > > > underlying
> > > >   device can support the typical flow rules an application might want to
> > > >   provide later and do something useful with that information (e.g.
> always
> > > >   use software fallback due to HW limitations).
> > > >
> > > > - rte_flow_validate() being a subset of rte_flow_create(), it is
> essentially
> > > >   free to expose.
> > >
> > > make sense now, thanks.
> > [Sugesh] : We had this discussion earlier at the design stage about
> > the time taken for programming the hardware, and how to make it
> > deterministic. How about having a timeout parameter as well for the
> > rte_flow_* If the hardware flow insert is timed out, error out than
> > waiting indefinitely, so that application have some control over The
> > time to program the flow. It can be another set of APIs something
> > like, rte_flow_create_timeout()
> 
> Yes as discussed the existing API does not provide any timing constraints to
> PMDs, validate() and create() may take forever to complete, although PMDs
> are strongly encouraged to take as little time as possible.
> 
> Like you suggested, this could be done through distinct API calls. The
> validate() function would also have its _timeout() counterpart since the set
> of possible rules could be restricted in that mode.
[Sugesh] Thanks!. Looking forward to see an api set with that implementation as 
well 
in the future :). I feel it's a must from the user application point of view.
> 
> > Are you going to provide any control over the initialization of NIC
> > to define the capability matrices For eg; To operate in a L3 router mode,
> software wanted to initialize the NIC port only to consider the L2 and L3
> fields.
> > I assume the initialization is done based on the first rules that are
> programmed into the NIC.?
> 
> Precisely, PMDs are supposed to determine the most appropriate device
> mode to use in order to handle the requested rules. They may even switch
> to another mode if necessary assuming this does not break existing
> constraints.
> 
> I think we've discussed an atomic (commit-based) mode of operation
> through separate functions as well, where the application would attempt to
> create a bunch of rules at once, possibly making it easier for PMDs to
> determine the most appropriate mode of operation for the device.
> 
> All of these may be added later according to users feedback once the basic
> API has settled.
[Sugesh] Yes , we discussed about this before. However I feel that, it make 
sense
to provide some flexibility to the user/application to define a profile/mode of 
the device.
This way the complexity of determining the mode by itself will be taken away 
from PMD.
Looking at the P4 enablement patches in OVS, the mode definition APIs can be 
used in conjunction
P4 behavioral model. 
For eg: A P4 model for a L2 switch operate OVS as a L2 switch. Using the mode 
definition APIs
Its possible to impose the same behavioral model in the hardware too. 
This way its simple, clean and very predictive though it needs to define an 
additional profile_define APIs.
I am sorry to provide the comment at this stage,  However looking at the 
adoption of ebpf, P4 make me
to think this way.
What do you think?
> 
> --
> Adrien Mazarguil
> 6WIND


Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-08 Thread Adrien Mazarguil
On Fri, Dec 02, 2016 at 09:06:42PM +, Kevin Traynor wrote:
> On 12/01/2016 08:36 AM, Adrien Mazarguil wrote:
> > Hi Kevin,
> > 
> > On Wed, Nov 30, 2016 at 05:47:17PM +, Kevin Traynor wrote:
> >> Hi Adrien,
> >>
> >> On 11/16/2016 04:23 PM, Adrien Mazarguil wrote:
> >>> This new API supersedes all the legacy filter types described in
> >>> rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
> >>> PMDs to process and validate flow rules.
> >>>
> >>> Benefits:
> >>>
> >>> - A unified API is easier to program for, applications do not have to be
> >>>   written for a specific filter type which may or may not be supported by
> >>>   the underlying device.
> >>>
> >>> - The behavior of a flow rule is the same regardless of the underlying
> >>>   device, applications do not need to be aware of hardware quirks.
> >>>
> >>> - Extensible by design, API/ABI breakage should rarely occur if at all.
> >>>
> >>> - Documentation is self-standing, no need to look up elsewhere.
> >>>
> >>> Existing filter types will be deprecated and removed in the near future.
> >>
> >> I'd suggest to add a deprecation notice to deprecation.rst, ideally with
> >> a target release.
> > 
> > Will do, not a sure about the target release though. It seems a bit early
> > since no PMD really supports this API yet.
> > 
> > [...]
> >>> diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
> >>> new file mode 100644
> >>> index 000..064963d
> >>> --- /dev/null
> >>> +++ b/lib/librte_ether/rte_flow.c
> >>> @@ -0,0 +1,159 @@
> >>> +/*-
> >>> + *   BSD LICENSE
> >>> + *
> >>> + *   Copyright 2016 6WIND S.A.
> >>> + *   Copyright 2016 Mellanox.
> >>
> >> There's Mellanox copyright but you are the only signed-off-by - is that
> >> right?
> > 
> > Yes, I'm the primary maintainer for Mellanox PMDs and this API was designed
> > on their behalf to expose several features from mlx4/mlx5 as the existing
> > filter types had too many limitations.
> > 
> > [...]
> >>> +/* Get generic flow operations structure from a port. */
> >>> +const struct rte_flow_ops *
> >>> +rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error)
> >>> +{
> >>> + struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> >>> + const struct rte_flow_ops *ops;
> >>> + int code;
> >>> +
> >>> + if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
> >>> + code = ENODEV;
> >>> + else if (unlikely(!dev->dev_ops->filter_ctrl ||
> >>> +   dev->dev_ops->filter_ctrl(dev,
> >>> + RTE_ETH_FILTER_GENERIC,
> >>> + RTE_ETH_FILTER_GET,
> >>> + &ops) ||
> >>> +   !ops))
> >>> + code = ENOTSUP;
> >>> + else
> >>> + return ops;
> >>> + rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> >>> +NULL, rte_strerror(code));
> >>> + return NULL;
> >>> +}
> >>> +
> >>
> >> Is it expected that the application or pmd will provide locking between
> >> these functions if required? I think it's going to have to be the app.
> > 
> > Locking is indeed expected to be performed by applications. This API only
> > documents places where locking would make sense if necessary and expected
> > behavior.
> > 
> > Like all control path APIs, this one assumes a single control thread.
> > Applications must take the necessary precautions.
> 
> If you look at OVS now it's quite possible that you have 2 rx queues
> serviced by different threads, that would also install the flow rules in
> the software flow caches - possibly that could extend to adding hardware
> flows. There could also be another thread that is querying for stats. So
> anything that can be done to minimise the locking would be helpful -
> maybe query() could be atomic and not require any locking?

I think we need basic functions with as few constraints as possible on PMDs
first, this API being somewhat complex to implement on their side. That
covers the common use case where applications have a single control thread
or otherwise perform locking on their own.

Once the basics are there for most PMDs, we may add new functions, items,
properties and actions that provide additional constraints (timing,
multi-threading and so on), which remain to be defined according to
feedback. It is designed to be extended without causing ABI breakage.

As for query(), let's see how PMDs handle it first. A race between query()
and create() on a given device is almost unavoidable without locking, same
for queries that reset counters in a given flow rule. Basic parallel queries
should not cause any harm otherwise, although this cannot be guaranteed yet.

> > [...]
> >>> +/**
> >>> + * Flow rule attributes.
> >>> + *
> >>> + * Priorities are set on two levels: per group and per rule within 
> >>> groups.
> >>> + *
> >>> + * Lower values denote higher priority, the highest priority for both 
> >>> levels
> >>> + * is

Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-08 Thread Adrien Mazarguil
Hi Sugesh,

On Tue, Dec 06, 2016 at 06:11:38PM +, Chandran, Sugesh wrote:
[...]
> > >>> +int
> > >>> +rte_flow_validate(uint8_t port_id,
> > >>> + const struct rte_flow_attr *attr,
> > >>> + const struct rte_flow_item pattern[],
> > >>> + const struct rte_flow_action actions[],
> > >>> + struct rte_flow_error *error);
> > >>
> > >> Why not just use rte_flow_create() and get an error? Is it less
> > >> disruptive to do a validate and find the rule cannot be created, than
> > >> using a create directly?
> > >
> > > The rationale can be found in the original RFC, which I'll convert to
> > > actual documentation in v2. In short:
> > >
> > > - Calling rte_flow_validate() before rte_flow_create() is useless since
> > >   rte_flow_create() also performs validation.
> > >
> > > - We cannot possibly express a full static set of allowed flow rules, even
> > >   if we could, it usually depends on the current hardware configuration
> > >   therefore would not be static.
> > >
> > > - rte_flow_validate() is thus provided as a replacement for capability
> > >   flags. It can be used to determine during initialization if the 
> > > underlying
> > >   device can support the typical flow rules an application might want to
> > >   provide later and do something useful with that information (e.g. always
> > >   use software fallback due to HW limitations).
> > >
> > > - rte_flow_validate() being a subset of rte_flow_create(), it is 
> > > essentially
> > >   free to expose.
> > 
> > make sense now, thanks.
> [Sugesh] : We had this discussion earlier at the design stage about the time 
> taken for programming the hardware,
> and how to make it deterministic. How about having a timeout parameter as 
> well for the rte_flow_*
> If the hardware flow insert is timed out, error out than waiting 
> indefinitely, so that application have some control over
> The time to program the flow. It can be another set of APIs something like, 
> rte_flow_create_timeout()

Yes as discussed the existing API does not provide any timing constraints to
PMDs, validate() and create() may take forever to complete, although PMDs
are strongly encouraged to take as little time as possible.

Like you suggested, this could be done through distinct API calls. The
validate() function would also have its _timeout() counterpart since the set
of possible rules could be restricted in that mode.

> Are you going to provide any control over the initialization of NIC  to 
> define the capability matrices
> For eg; To operate in a L3 router mode,  software wanted to initialize the 
> NIC port only to consider the L2 and L3 fields.
> I assume the initialization is done based on the first rules that are 
> programmed into the NIC.?

Precisely, PMDs are supposed to determine the most appropriate device mode
to use in order to handle the requested rules. They may even switch to
another mode if necessary assuming this does not break existing constraints.

I think we've discussed an atomic (commit-based) mode of operation through
separate functions as well, where the application would attempt to create a
bunch of rules at once, possibly making it easier for PMDs to determine the
most appropriate mode of operation for the device.

All of these may be added later according to users feedback once the basic
API has settled.

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-08 Thread Adrien Mazarguil
Hi Beilei,

On Thu, Dec 08, 2016 at 09:00:05AM +, Xing, Beilei wrote:
[...]
> > +/**
> > + * RTE_FLOW_ITEM_TYPE_ETH
> > + *
> > + * Matches an Ethernet header.
> > + */
> > +struct rte_flow_item_eth {
> > +   struct ether_addr dst; /**< Destination MAC. */
> > +   struct ether_addr src; /**< Source MAC. */
> > +   unsigned int type; /**< EtherType. */
> Hi Adrien,
> 
> ETHERTYPE in ether header is 2 bytes, so I think "uint16_t type" is more 
> appropriate here, what do you think?

You're right, thanks for catching this. I'll update it in v2 (soon).

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-08 Thread Xing, Beilei


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Adrien Mazarguil
> Sent: Thursday, November 17, 2016 12:23 AM
> To: dev@dpdk.org
> Cc: Thomas Monjalon ; De Lara Guarch,
> Pablo ; Olivier Matz
> 
> Subject: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
> 
> This new API supersedes all the legacy filter types described in 
> rte_eth_ctrl.h.
> It is slightly higher level and as a result relies more on PMDs to process and
> validate flow rules.
> 
> Benefits:
> 
> - A unified API is easier to program for, applications do not have to be
>   written for a specific filter type which may or may not be supported by
>   the underlying device.
> 
> - The behavior of a flow rule is the same regardless of the underlying
>   device, applications do not need to be aware of hardware quirks.
> 
> - Extensible by design, API/ABI breakage should rarely occur if at all.
> 
> - Documentation is self-standing, no need to look up elsewhere.
> 
> Existing filter types will be deprecated and removed in the near future.
> 
> Signed-off-by: Adrien Mazarguil 
> ---
>  MAINTAINERS|   4 +
>  lib/librte_ether/Makefile  |   3 +
>  lib/librte_ether/rte_eth_ctrl.h|   1 +
>  lib/librte_ether/rte_ether_version.map |  10 +
>  lib/librte_ether/rte_flow.c| 159 +
>  lib/librte_ether/rte_flow.h| 947 
>  lib/librte_ether/rte_flow_driver.h | 177 ++
>  7 files changed, 1301 insertions(+)
> 
> +/**
> + * RTE_FLOW_ITEM_TYPE_ETH
> + *
> + * Matches an Ethernet header.
> + */
> +struct rte_flow_item_eth {
> + struct ether_addr dst; /**< Destination MAC. */
> + struct ether_addr src; /**< Source MAC. */
> + unsigned int type; /**< EtherType. */
Hi Adrien,

ETHERTYPE in ether header is 2 bytes, so I think "uint16_t type" is more 
appropriate here, what do you think?

Thanks,
Beilei Xing
> +};
> +


Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-06 Thread Chandran, Sugesh
Hi Adrien,
Thanks for sending out the patches,

Please find few comments below,


Regards
_Sugesh


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Kevin Traynor
> Sent: Friday, December 2, 2016 9:07 PM
> To: Adrien Mazarguil 
> Cc: dev@dpdk.org; Thomas Monjalon ; De
> Lara Guarch, Pablo ; Olivier Matz
> ; sugesh.chand...@intel.comn
> Subject: Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
> 
>>>>>>Snipp
> >>> + *
> >>> + * Attaches a 32 bit value to packets.
> >>> + *
> >>> + * This value is arbitrary and application-defined. For
> >>> +compatibility with
> >>> + * FDIR it is returned in the hash.fdir.hi mbuf field.
> >>> +PKT_RX_FDIR_ID is
> >>> + * also set in ol_flags.
> >>> + */
> >>> +struct rte_flow_action_mark {
> >>> + uint32_t id; /**< 32 bit value to return with packets. */ };
> >>
> >> One use case I thought we would be able to do for OVS is
> >> classification in hardware and the unique flow id is sent with the packet 
> >> to
> software.
> >> But in OVS the ufid is 128 bits, so it means we can't and there is
> >> still the miniflow extract overhead. I'm not sure if there is a
> >> practical way around this.
> >>
> >> Sugesh (cc'd) has looked at this before and may be able to comment or
> >> correct me.
> >
> > Yes, we settled on 32 bit because currently no known hardware
> > implementation supports more than this. If that changes, another
> > action with a larger type shall be provided (no ABI breakage).
> >
> > Also since even 64 bit would not be enough for the use case you
> > mention, there is no choice but use this as an indirect value (such as
> > an array or hash table index/value).
> 
> ok, cool. I think Sugesh has other ideas anyway!
[Sugesh] It should be fine with 32 bit . we can manage it in OVS accordingly.
> 
> >
> > [...]
> >>> +/**
> >>> + * RTE_FLOW_ACTION_TYPE_RSS
> >>> + *
> >>> +
> >>> + *
> >>> + * Terminating by default.
> >>> + */
> >>> +struct rte_flow_action_vf {
> >>> + uint32_t original:1; /**< Use original VF ID if possible. */
> >>> + uint32_t reserved:31; /**< Reserved, must be zero. */
> >>> + uint32_t id; /**< VF ID to redirect packets to. */ };
> > [...]
> >>> +/**
> >>> + * Check whether a flow rule can be created on a given port.
> >>> + *
> >>> + * While this function has no effect on the target device, the flow
> >>> +rule is
> >>> + * validated against its current configuration state and the
> >>> +returned value
> >>> + * should be considered valid by the caller for that state only.
> >>> + *
> >>> + * The returned value is guaranteed to remain valid only as long as
> >>> +no
> >>> + * successful calls to rte_flow_create() or rte_flow_destroy() are
> >>> +made in
> >>> + * the meantime and no device parameter affecting flow rules in any
> >>> +way are
> >>> + * modified, due to possible collisions or resource limitations
> >>> +(although in
> >>> + * such cases EINVAL should not be returned).
> >>> + *
> >>> + * @param port_id
> >>> + *   Port identifier of Ethernet device.
> >>> + * @param[in] attr
> >>> + *   Flow rule attributes.
> >>> + * @param[in] pattern
> >>> + *   Pattern specification (list terminated by the END pattern item).
> >>> + * @param[in] actions
> >>> + *   Associated actions (list terminated by the END action).
> >>> + * @param[out] error
> >>> + *   Perform verbose error reporting if not NULL.
> >>> + *
> >>> + * @return
> >>> + *   0 if flow rule is valid and can be created. A negative errno value
> >>> + *   otherwise (rte_errno is also set), the following errors are defined:
> >>> + *
> >>> + *   -ENOSYS: underlying device does not support this functionality.
> >>> + *
> >>> + *   -EINVAL: unknown or invalid rule specification.
> >>> + *
> >>> + *   -ENOTSUP: valid but unsupported rule specification (e.g. partial
> >>> + *   bit-masks are unsupported).
> >>> + *
> >>> + *   -EEXIST: collision with an existing rule.
> >>> + *
> >>> + *   -ENOMEM: not enough resources

Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-02 Thread Kevin Traynor
On 12/01/2016 08:36 AM, Adrien Mazarguil wrote:
> Hi Kevin,
> 
> On Wed, Nov 30, 2016 at 05:47:17PM +, Kevin Traynor wrote:
>> Hi Adrien,
>>
>> On 11/16/2016 04:23 PM, Adrien Mazarguil wrote:
>>> This new API supersedes all the legacy filter types described in
>>> rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
>>> PMDs to process and validate flow rules.
>>>
>>> Benefits:
>>>
>>> - A unified API is easier to program for, applications do not have to be
>>>   written for a specific filter type which may or may not be supported by
>>>   the underlying device.
>>>
>>> - The behavior of a flow rule is the same regardless of the underlying
>>>   device, applications do not need to be aware of hardware quirks.
>>>
>>> - Extensible by design, API/ABI breakage should rarely occur if at all.
>>>
>>> - Documentation is self-standing, no need to look up elsewhere.
>>>
>>> Existing filter types will be deprecated and removed in the near future.
>>
>> I'd suggest to add a deprecation notice to deprecation.rst, ideally with
>> a target release.
> 
> Will do, not a sure about the target release though. It seems a bit early
> since no PMD really supports this API yet.
> 
> [...]
>>> diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
>>> new file mode 100644
>>> index 000..064963d
>>> --- /dev/null
>>> +++ b/lib/librte_ether/rte_flow.c
>>> @@ -0,0 +1,159 @@
>>> +/*-
>>> + *   BSD LICENSE
>>> + *
>>> + *   Copyright 2016 6WIND S.A.
>>> + *   Copyright 2016 Mellanox.
>>
>> There's Mellanox copyright but you are the only signed-off-by - is that
>> right?
> 
> Yes, I'm the primary maintainer for Mellanox PMDs and this API was designed
> on their behalf to expose several features from mlx4/mlx5 as the existing
> filter types had too many limitations.
> 
> [...]
>>> +/* Get generic flow operations structure from a port. */
>>> +const struct rte_flow_ops *
>>> +rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error)
>>> +{
>>> +   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
>>> +   const struct rte_flow_ops *ops;
>>> +   int code;
>>> +
>>> +   if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
>>> +   code = ENODEV;
>>> +   else if (unlikely(!dev->dev_ops->filter_ctrl ||
>>> + dev->dev_ops->filter_ctrl(dev,
>>> +   RTE_ETH_FILTER_GENERIC,
>>> +   RTE_ETH_FILTER_GET,
>>> +   &ops) ||
>>> + !ops))
>>> +   code = ENOTSUP;
>>> +   else
>>> +   return ops;
>>> +   rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
>>> +  NULL, rte_strerror(code));
>>> +   return NULL;
>>> +}
>>> +
>>
>> Is it expected that the application or pmd will provide locking between
>> these functions if required? I think it's going to have to be the app.
> 
> Locking is indeed expected to be performed by applications. This API only
> documents places where locking would make sense if necessary and expected
> behavior.
> 
> Like all control path APIs, this one assumes a single control thread.
> Applications must take the necessary precautions.

If you look at OVS now it's quite possible that you have 2 rx queues
serviced by different threads, that would also install the flow rules in
the software flow caches - possibly that could extend to adding hardware
flows. There could also be another thread that is querying for stats. So
anything that can be done to minimise the locking would be helpful -
maybe query() could be atomic and not require any locking?

> 
> [...]
>>> +/**
>>> + * Flow rule attributes.
>>> + *
>>> + * Priorities are set on two levels: per group and per rule within groups.
>>> + *
>>> + * Lower values denote higher priority, the highest priority for both 
>>> levels
>>> + * is 0, so that a rule with priority 0 in group 8 is always matched after 
>>> a
>>> + * rule with priority 8 in group 0.
>>> + *
>>> + * Although optional, applications are encouraged to group similar rules as
>>> + * much as possible to fully take advantage of hardware capabilities
>>> + * (e.g. optimized matching) and work around limitations (e.g. a single
>>> + * pattern type possibly allowed in a given group).
>>> + *
>>> + * Group and priority levels are arbitrary and up to the application, they
>>> + * do not need to be contiguous nor start from 0, however the maximum 
>>> number
>>> + * varies between devices and may be affected by existing flow rules.
>>> + *
>>> + * If a packet is matched by several rules of a given group for a given
>>> + * priority level, the outcome is undefined. It can take any path, may be
>>> + * duplicated or even cause unrecoverable errors.
>>
>> I get what you are trying to do here wrt supporting multiple
>> pmds/hardware implementations and it's a good idea to keep it flexible.
>>
>> Given that the outcome is undefined, it would be nice that t

[dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-01 Thread Adrien Mazarguil
Hi Kevin,

On Wed, Nov 30, 2016 at 05:47:17PM +, Kevin Traynor wrote:
> Hi Adrien,
> 
> On 11/16/2016 04:23 PM, Adrien Mazarguil wrote:
> > This new API supersedes all the legacy filter types described in
> > rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
> > PMDs to process and validate flow rules.
> > 
> > Benefits:
> > 
> > - A unified API is easier to program for, applications do not have to be
> >   written for a specific filter type which may or may not be supported by
> >   the underlying device.
> > 
> > - The behavior of a flow rule is the same regardless of the underlying
> >   device, applications do not need to be aware of hardware quirks.
> > 
> > - Extensible by design, API/ABI breakage should rarely occur if at all.
> > 
> > - Documentation is self-standing, no need to look up elsewhere.
> > 
> > Existing filter types will be deprecated and removed in the near future.
> 
> I'd suggest to add a deprecation notice to deprecation.rst, ideally with
> a target release.

Will do, not a sure about the target release though. It seems a bit early
since no PMD really supports this API yet.

[...]
> > diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
> > new file mode 100644
> > index 000..064963d
> > --- /dev/null
> > +++ b/lib/librte_ether/rte_flow.c
> > @@ -0,0 +1,159 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright 2016 6WIND S.A.
> > + *   Copyright 2016 Mellanox.
> 
> There's Mellanox copyright but you are the only signed-off-by - is that
> right?

Yes, I'm the primary maintainer for Mellanox PMDs and this API was designed
on their behalf to expose several features from mlx4/mlx5 as the existing
filter types had too many limitations.

[...]
> > +/* Get generic flow operations structure from a port. */
> > +const struct rte_flow_ops *
> > +rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error)
> > +{
> > +   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +   const struct rte_flow_ops *ops;
> > +   int code;
> > +
> > +   if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
> > +   code = ENODEV;
> > +   else if (unlikely(!dev->dev_ops->filter_ctrl ||
> > + dev->dev_ops->filter_ctrl(dev,
> > +   RTE_ETH_FILTER_GENERIC,
> > +   RTE_ETH_FILTER_GET,
> > +   &ops) ||
> > + !ops))
> > +   code = ENOTSUP;
> > +   else
> > +   return ops;
> > +   rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > +  NULL, rte_strerror(code));
> > +   return NULL;
> > +}
> > +
> 
> Is it expected that the application or pmd will provide locking between
> these functions if required? I think it's going to have to be the app.

Locking is indeed expected to be performed by applications. This API only
documents places where locking would make sense if necessary and expected
behavior.

Like all control path APIs, this one assumes a single control thread.
Applications must take the necessary precautions.

[...]
> > +/**
> > + * Flow rule attributes.
> > + *
> > + * Priorities are set on two levels: per group and per rule within groups.
> > + *
> > + * Lower values denote higher priority, the highest priority for both 
> > levels
> > + * is 0, so that a rule with priority 0 in group 8 is always matched after 
> > a
> > + * rule with priority 8 in group 0.
> > + *
> > + * Although optional, applications are encouraged to group similar rules as
> > + * much as possible to fully take advantage of hardware capabilities
> > + * (e.g. optimized matching) and work around limitations (e.g. a single
> > + * pattern type possibly allowed in a given group).
> > + *
> > + * Group and priority levels are arbitrary and up to the application, they
> > + * do not need to be contiguous nor start from 0, however the maximum 
> > number
> > + * varies between devices and may be affected by existing flow rules.
> > + *
> > + * If a packet is matched by several rules of a given group for a given
> > + * priority level, the outcome is undefined. It can take any path, may be
> > + * duplicated or even cause unrecoverable errors.
> 
> I get what you are trying to do here wrt supporting multiple
> pmds/hardware implementations and it's a good idea to keep it flexible.
> 
> Given that the outcome is undefined, it would be nice that the
> application has a way of finding the specific effects for verification
> and debugging.

Right, however it was deemed a bit difficult to manage in many cases hence
the vagueness.

For example, suppose two rules with the same group and priority, one
matching any IPv4 header, the other one any UDP header:

- TCPv4 packets => rule #1.
- UDPv6 packets => rule #2.
- UDPv4 packets => both?

That last one is perhaps invalid, checking that some unspecified protocol
combination does not overlap is expensive and may 

[dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-11-30 Thread Kevin Traynor
Hi Adrien,

On 11/16/2016 04:23 PM, Adrien Mazarguil wrote:
> This new API supersedes all the legacy filter types described in
> rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
> PMDs to process and validate flow rules.
> 
> Benefits:
> 
> - A unified API is easier to program for, applications do not have to be
>   written for a specific filter type which may or may not be supported by
>   the underlying device.
> 
> - The behavior of a flow rule is the same regardless of the underlying
>   device, applications do not need to be aware of hardware quirks.
> 
> - Extensible by design, API/ABI breakage should rarely occur if at all.
> 
> - Documentation is self-standing, no need to look up elsewhere.
> 
> Existing filter types will be deprecated and removed in the near future.

I'd suggest to add a deprecation notice to deprecation.rst, ideally with
a target release.

> 
> Signed-off-by: Adrien Mazarguil 
> ---
>  MAINTAINERS|   4 +
>  lib/librte_ether/Makefile  |   3 +
>  lib/librte_ether/rte_eth_ctrl.h|   1 +
>  lib/librte_ether/rte_ether_version.map |  10 +
>  lib/librte_ether/rte_flow.c| 159 +
>  lib/librte_ether/rte_flow.h| 947 
>  lib/librte_ether/rte_flow_driver.h | 177 ++
>  7 files changed, 1301 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d6bb8f8..3b46630 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -243,6 +243,10 @@ M: Thomas Monjalon 
>  F: lib/librte_ether/
>  F: scripts/test-null.sh
>  
> +Generic flow API
> +M: Adrien Mazarguil 
> +F: lib/librte_ether/rte_flow*
> +
>  Crypto API
>  M: Declan Doherty 
>  F: lib/librte_cryptodev/
> diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
> index efe1e5f..9335361 100644
> --- a/lib/librte_ether/Makefile
> +++ b/lib/librte_ether/Makefile
> @@ -44,6 +44,7 @@ EXPORT_MAP := rte_ether_version.map
>  LIBABIVER := 5
>  
>  SRCS-y += rte_ethdev.c
> +SRCS-y += rte_flow.c
>  
>  #
>  # Export include files
> @@ -51,6 +52,8 @@ SRCS-y += rte_ethdev.c
>  SYMLINK-y-include += rte_ethdev.h
>  SYMLINK-y-include += rte_eth_ctrl.h
>  SYMLINK-y-include += rte_dev_info.h
> +SYMLINK-y-include += rte_flow.h
> +SYMLINK-y-include += rte_flow_driver.h
>  
>  # this lib depends upon:
>  DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool 
> lib/librte_ring lib/librte_mbuf
> diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
> index fe80eb0..8386904 100644
> --- a/lib/librte_ether/rte_eth_ctrl.h
> +++ b/lib/librte_ether/rte_eth_ctrl.h
> @@ -99,6 +99,7 @@ enum rte_filter_type {
>   RTE_ETH_FILTER_FDIR,
>   RTE_ETH_FILTER_HASH,
>   RTE_ETH_FILTER_L2_TUNNEL,
> + RTE_ETH_FILTER_GENERIC,
>   RTE_ETH_FILTER_MAX
>  };
>  
> diff --git a/lib/librte_ether/rte_ether_version.map 
> b/lib/librte_ether/rte_ether_version.map
> index 72be66d..b5d2547 100644
> --- a/lib/librte_ether/rte_ether_version.map
> +++ b/lib/librte_ether/rte_ether_version.map
> @@ -147,3 +147,13 @@ DPDK_16.11 {
>   rte_eth_dev_pci_remove;
>  
>  } DPDK_16.07;
> +
> +DPDK_17.02 {
> + global:
> +
> + rte_flow_validate;
> + rte_flow_create;
> + rte_flow_destroy;
> + rte_flow_query;
> +
> +} DPDK_16.11;
> diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
> new file mode 100644
> index 000..064963d
> --- /dev/null
> +++ b/lib/librte_ether/rte_flow.c
> @@ -0,0 +1,159 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright 2016 6WIND S.A.
> + *   Copyright 2016 Mellanox.

There's Mellanox copyright but you are the only signed-off-by - is that
right?

> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of 6WIND S.A. nor the names of its
> + *   contributors may be used to endorse or promote products derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 

[dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-11-18 Thread Adrien Mazarguil
Hi Beilei,

On Fri, Nov 18, 2016 at 06:36:31AM +, Xing, Beilei wrote:
> Hi Adrien,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Adrien Mazarguil
> > Sent: Thursday, November 17, 2016 12:23 AM
> > To: dev at dpdk.org
> > Cc: Thomas Monjalon ; De Lara Guarch,
> > Pablo ; Olivier Matz
> > 
> > Subject: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
> > 
> > This new API supersedes all the legacy filter types described in 
> > rte_eth_ctrl.h.
> > It is slightly higher level and as a result relies more on PMDs to process 
> > and
> > validate flow rules.
> > 
> > Benefits:
> > 
> > - A unified API is easier to program for, applications do not have to be
> >   written for a specific filter type which may or may not be supported by
> >   the underlying device.
> > 
> > - The behavior of a flow rule is the same regardless of the underlying
> >   device, applications do not need to be aware of hardware quirks.
> > 
> > - Extensible by design, API/ABI breakage should rarely occur if at all.
> > 
> > - Documentation is self-standing, no need to look up elsewhere.
> > 
> > Existing filter types will be deprecated and removed in the near future.
> > 
> > Signed-off-by: Adrien Mazarguil 
> 
> 
> > +
> > +/**
> > + * Opaque type returned after successfully creating a flow.
> > + *
> > + * This handle can be used to manage and query the related flow (e.g.
> > +to
> > + * destroy it or retrieve counters).
> > + */
> > +struct rte_flow;
> > +
> 
> As we talked before, we use attr/pattern/actions to create and destroy a flow 
> in PMD, 
> but I don't think it's easy to clone the user-provided parameters and return 
> the result
> to the application as a rte_flow pointer.  As you suggested:
> /* PMD-specific code. */
>  struct rte_flow {
> struct rte_flow_attr attr;
> struct rte_flow_item *pattern;
> struct rte_flow_action *actions;
>  };

Just to provide some context to the community since the above snippet comes
from private exchanges, I've suggested the above structure as a mean to
create and remove rules in the same fashion as FDIR, by providing the rule
used for creation to the destroy callback.

As an opaque type, each PMD currently needs to implement its own version of
struct rte_flow. The above definition may ease transition from FDIR to
rte_flow for some PMDs, however they need to clone the entire
application-provided rule to do so because there is no requirement for it to
be kept allocated.

I've implemented such a function in testpmd (port_flow_new() in commit [1])
as an example.

 [1] http://dpdk.org/ml/archives/dev/2016-November/050266.html

However my suggestion is for PMDs to use their own HW-specific structure
that only contains relevant information instead of being forced to drag
large, non-native data around, missing useful context and that requires
parsing every time. This is one benefit of using an opaque type in the first
place, the other being ABI breakage avoidance.

> Because both pattern and actions are pointers, and there're also pointers in 
> structure
> rte_flow_item and struct rte_flow_action. We need to iterate allocation 
> during clone
> and iterate free during destroy, then seems that the code is something ugly, 
> right?

Well since I wrote that code, I won't easily admit it's ugly. I think PMDs
should not require the duplication of generic rules actually, which are only
defined as a common language between applications and PMDs. Both are free to
store rules in their own preferred and efficient format internally.

> I think application saves info when creating a flow rule, so why not 
> application provide
> attr/pattern/actions info to PMD before calling PMD API?

They have to do so temporarily (e.g. allocated on the stack) while calling
rte_flow_create() and rte_flow_validate(), that's it. Once a rule is
created, there's no requirement for applications to keep anything around.

For simple applications such as testpmd, the generic format is probably
enough. More complex and existing applications such as ovs-dpdk may rather
choose to keep using their internal format that already fits their needs,
partially duplicating this information in rte_flow_attr and
rte_flow_item/rte_flow_action lists would waste memory. The conversion in
this case should only be performed when creating/validating flow rules.

In short, I fail to see any downside with maintaining struct rte_flow opaque
to applications.

Best regards,

-- 
Adrien Mazarguil
6WIND


[dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-11-18 Thread Xing, Beilei
Hi Adrien,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Adrien Mazarguil
> Sent: Thursday, November 17, 2016 12:23 AM
> To: dev at dpdk.org
> Cc: Thomas Monjalon ; De Lara Guarch,
> Pablo ; Olivier Matz
> 
> Subject: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
> 
> This new API supersedes all the legacy filter types described in 
> rte_eth_ctrl.h.
> It is slightly higher level and as a result relies more on PMDs to process and
> validate flow rules.
> 
> Benefits:
> 
> - A unified API is easier to program for, applications do not have to be
>   written for a specific filter type which may or may not be supported by
>   the underlying device.
> 
> - The behavior of a flow rule is the same regardless of the underlying
>   device, applications do not need to be aware of hardware quirks.
> 
> - Extensible by design, API/ABI breakage should rarely occur if at all.
> 
> - Documentation is self-standing, no need to look up elsewhere.
> 
> Existing filter types will be deprecated and removed in the near future.
> 
> Signed-off-by: Adrien Mazarguil 


> +
> +/**
> + * Opaque type returned after successfully creating a flow.
> + *
> + * This handle can be used to manage and query the related flow (e.g.
> +to
> + * destroy it or retrieve counters).
> + */
> +struct rte_flow;
> +

As we talked before, we use attr/pattern/actions to create and destroy a flow 
in PMD, 
but I don't think it's easy to clone the user-provided parameters and return 
the result
to the application as a rte_flow pointer.  As you suggested:
/* PMD-specific code. */
 struct rte_flow {
struct rte_flow_attr attr;
struct rte_flow_item *pattern;
struct rte_flow_action *actions;
 };

Because both pattern and actions are pointers, and there're also pointers in 
structure
rte_flow_item and struct rte_flow_action. We need to iterate allocation during 
clone
and iterate free during destroy, then seems that the code is something ugly, 
right?

I think application saves info when creating a flow rule, so why not 
application provide
attr/pattern/actions info to PMD before calling PMD API?

Thanks,
Beilei Xing


[dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-11-16 Thread Adrien Mazarguil
This new API supersedes all the legacy filter types described in
rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
PMDs to process and validate flow rules.

Benefits:

- A unified API is easier to program for, applications do not have to be
  written for a specific filter type which may or may not be supported by
  the underlying device.

- The behavior of a flow rule is the same regardless of the underlying
  device, applications do not need to be aware of hardware quirks.

- Extensible by design, API/ABI breakage should rarely occur if at all.

- Documentation is self-standing, no need to look up elsewhere.

Existing filter types will be deprecated and removed in the near future.

Signed-off-by: Adrien Mazarguil 
---
 MAINTAINERS|   4 +
 lib/librte_ether/Makefile  |   3 +
 lib/librte_ether/rte_eth_ctrl.h|   1 +
 lib/librte_ether/rte_ether_version.map |  10 +
 lib/librte_ether/rte_flow.c| 159 +
 lib/librte_ether/rte_flow.h| 947 
 lib/librte_ether/rte_flow_driver.h | 177 ++
 7 files changed, 1301 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index d6bb8f8..3b46630 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -243,6 +243,10 @@ M: Thomas Monjalon 
 F: lib/librte_ether/
 F: scripts/test-null.sh

+Generic flow API
+M: Adrien Mazarguil 
+F: lib/librte_ether/rte_flow*
+
 Crypto API
 M: Declan Doherty 
 F: lib/librte_cryptodev/
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index efe1e5f..9335361 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -44,6 +44,7 @@ EXPORT_MAP := rte_ether_version.map
 LIBABIVER := 5

 SRCS-y += rte_ethdev.c
+SRCS-y += rte_flow.c

 #
 # Export include files
@@ -51,6 +52,8 @@ SRCS-y += rte_ethdev.c
 SYMLINK-y-include += rte_ethdev.h
 SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
+SYMLINK-y-include += rte_flow.h
+SYMLINK-y-include += rte_flow_driver.h

 # this lib depends upon:
 DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool lib/librte_ring 
lib/librte_mbuf
diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index fe80eb0..8386904 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -99,6 +99,7 @@ enum rte_filter_type {
RTE_ETH_FILTER_FDIR,
RTE_ETH_FILTER_HASH,
RTE_ETH_FILTER_L2_TUNNEL,
+   RTE_ETH_FILTER_GENERIC,
RTE_ETH_FILTER_MAX
 };

diff --git a/lib/librte_ether/rte_ether_version.map 
b/lib/librte_ether/rte_ether_version.map
index 72be66d..b5d2547 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -147,3 +147,13 @@ DPDK_16.11 {
rte_eth_dev_pci_remove;

 } DPDK_16.07;
+
+DPDK_17.02 {
+   global:
+
+   rte_flow_validate;
+   rte_flow_create;
+   rte_flow_destroy;
+   rte_flow_query;
+
+} DPDK_16.11;
diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
new file mode 100644
index 000..064963d
--- /dev/null
+++ b/lib/librte_ether/rte_flow.c
@@ -0,0 +1,159 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+
+#include 
+#include 
+#include "rte_ethdev.h"
+#include "rte_flow_driver.h"
+#include "rte_flow.h"
+
+/* Get generic flow operations structure fr