
Re: [dpdk-dev] [RFC] ethdev: add fragment attribute to IPv6 item

2020-06-02 Thread Adrien Mazarguil

e this:
> Eth / ipv6 / Not (Ipv6.proto = frag_proto) / udp
> But it makes the rules much harder to use, and I don't think that there
> is any HW that support not, and adding such feature to all items is overkill.
> 
>  
> > Bit string suggested above will allow to match:
> >  - UDP over IPv6 with any extension headers:
> > eth / ipv6 (ext_hdrs mask empty) / udp / end
> >  - UDP over IPv6 without any extension headers:
> > eth / ipv6 (ext_hdrs mask full, spec empty) / udp / end
> >  - UDP over IPv6 without fragment header:
> > eth / ipv6 (ext.spec & ~FRAGMENT, ext.mask | FRAGMENT) / udp / end
> >  - UDP over IPv6 with fragment header
> > eth / ipv6 (ext.spec | FRAGMENT, ext.mask | FRAGMENT) / udp / end
> > 
> > where FRAGMENT is 1 << IPPROTO_FRAGMENT.
> > 
> Please see my response regarding this above.
> 
> > Above I intentionally keep 'proto' unspecified in ipv6
> > since otherwise it would specify the next header after IPv6
> > header.
> > 
> > Extension headers mask should be empty by default.

This is a deliberate design choice/issue with rte_flow: an empty pattern
matches everything; adding items only narrows the selection. As Andrew said
there is currently no way to provide a specific item to reject, it can only
be done globally on a pattern through INVERT that no PMD implements so far.

So we have two requirements here: the ability to specifically match IPv6
fragment headers and the ability to reject them.

To match IPv6 fragment headers, we need a dedicated pattern item. The
generic RTE_FLOW_ITEM_TYPE_IPV6_EXT is useless for that on its own; it must
be complemented with RTE_FLOW_ITEM_TYPE_IPV6_EXT_FRAG and an associated
object to match individual fields if needed (like all the other
protocols/headers).

Then to reject a pattern item... My preference goes to a new "NOT" meta item
affecting the meaning of the item coming immediately after in the pattern
list. That would be ultra generic, wouldn't break any ABI/API and like
INVERT, wouldn't even require a new object associated with it.

To match UDPv6 traffic when there is no fragment header, one could then do
something like:

 eth / ipv6 / not / ipv6_ext_frag / udp

PMD support would be trivial to implement (I'm sure!)

We may later implement other kinds of "operator" items as Andrew suggested,
for bit-wise stuff and so on. Let's keep adding features on a needed basis
though.
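
A minimal sketch of how such a "not" meta item could be evaluated, assuming
a toy software model where a packet is only the ordered list of headers it
carries. The enum values and pattern_matches() are hypothetical and are not
rte_flow API; a real PMD would translate the pattern into hardware rules
instead of walking packets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical item types; ITEM_NOT negates the item that follows it. */
enum item_type {
	ITEM_END,
	ITEM_NOT,
	ITEM_ETH,
	ITEM_IPV6,
	ITEM_IPV6_EXT_FRAG,
	ITEM_UDP,
};

/*
 * Both arrays are ITEM_END-terminated. ITEM_NOT only makes sense in a
 * pattern, never in a packet.
 */
static bool
pattern_matches(const enum item_type *pattern, const enum item_type *packet)
{
	for (; *pattern != ITEM_END; ++pattern) {
		if (*pattern == ITEM_NOT) {
			++pattern;
			/* A dangling "not" at the end of a pattern is a caller bug. */
			assert(*pattern != ITEM_END);
			/* Reject when the negated header is present at this position. */
			if (*packet == *pattern)
				return false;
			/* Header absent: nothing is consumed from the packet. */
			continue;
		}
		if (*packet != *pattern)
			return false;
		++packet;
	}
	/* Headers that follow the last pattern item are not examined. */
	return true;
}
```

With the pattern eth / ipv6 / not / ipv6_ext_frag / udp, a packet made of
eth, ipv6, udp matches while eth, ipv6, ipv6_ext_frag, udp is rejected.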

-- 
Adrien Mazarguil
6WIND

[dpdk-dev] [PATCH] maintainers: resign from flow API maintenance

2020-01-08 Thread Adrien Mazarguil

Unfortunately due to lack of time, I've been unable to even participate in
flow API discussions for several months. Better make it official since this
is not going to improve anytime soon.

This doesn't mean I won't contribute to rte_flow in the future!

Cc: sta...@dpdk.org

Signed-off-by: Adrien Mazarguil 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 96fc89dc48..7355d88c95 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -370,7 +370,6 @@ F: devtools/test-null.sh
 F: doc/guides/prog_guide/switch_representation.rst
 
 Flow API
-M: Adrien Mazarguil 
 M: Ori Kam 
 T: git://dpdk.org/next/dpdk-next-net
 F: app/test-pmd/cmdline_flow.c
-- 
2.20.1

Re: [dpdk-dev] [PATCH] maintainers: add co-maintainer for flow API

2020-01-06 Thread Adrien Mazarguil

On Sun, Dec 29, 2019 at 08:56:58AM +, Ori Kam wrote:
> I volunteer to be co maintainer for the rte_flow lib.
> 
> Signed-off-by: Ori Kam 

Welcome :)

Acked-by: Adrien Mazarguil 

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v2] ethdev: add more protocol support in flow API

2019-08-14 Thread Adrien Mazarguil

s and their 
> attributes, if any.
>  
>- ``teid {unsigned}``: tunnel endpoint identifier.
>  
> +- ``gtp_psc``: match GTPv1 entension header (type is 0x85).
> +
> +  - ``pdu_type {unsigned}``: PDU type (0 or 1).
> +  - ``qfi {unsigned}``: QoS flow identifier.
> +
> +- ``pppoes``, ``pppoed``: match PPPOE header.

PPPOE => PPPoE

[...]
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index b66bf1495..ad5e46190 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -328,6 +328,34 @@ enum rte_flow_item_type {
>*/
>   RTE_FLOW_ITEM_TYPE_GTPU,
>  
> + /**
> +  * Matches a GTP PDU extension header (type is 0x85:
> +  * PDU Session Container).

Session Container => session container

> +  *
> +  * Configure flow for GTP packets with extension header type 0x85.
> +  *
> +  * See struct rte_flow_item_gtp_psc.
> +  */
> + RTE_FLOW_ITEM_TYPE_GTP_PSC,
> +
> + /**
> +  * Matches a PPPOE header.
> +  *
> +  * Configure flow for PPPoE Session packets.

Session => session

> +  *
> +  * See struct rte_flow_item_pppoe.
> +  */
> + RTE_FLOW_ITEM_TYPE_PPPOES,
> +
> + /**
> +  * Matches a PPPOE header.
> +  *
> +  * Configure flow for PPPoE Discovery stage packets.

Discovery => discovery

> +  *
> +  * See struct rte_flow_item_pppoe.
> +  */
> + RTE_FLOW_ITEM_TYPE_PPPOED,
> +
>   /**
>* Matches a ESP header.
>*
> @@ -922,6 +950,49 @@ static const struct rte_flow_item_gtp rte_flow_item_gtp_mask = {
>  };
>  #endif
>  
> +/**
> + * RTE_FLOW_ITEM_TYPE_GTP_PSC.
> + *
> + * Matches a GTP-extension header
> + * (type is 0x85: PDU Session Container).

Session Container => session container

(crusade against superfluous caps!)

[...]
> +/**
> + * RTE_FLOW_ITEM_TYPE_PPPOE.
> + *
> + * Matches a PPPOE header.
> + */
> +struct rte_flow_item_pppoe {
> + /**
> +  * Version (4b), type (4b).
> +  */
> + uint8_t v_t_flags;

v_t_flags => version_type

> + uint8_t code; /**< Message type. */
> + rte_be16_t session_id; /**< Session identifier. */
> + rte_be16_t length; /**< Payload length. */
> + rte_be16_t proto_id; /**< PPP Protocol identifier. */

As discussed, I suggest dropping proto_id to make this a generic match for
PPPoE.

> +};
> +
> +/** Default mask for RTE_FLOW_ITEM_TYPE_PPPOE. */
> +#ifndef __cplusplus
> +static const struct rte_flow_item_pppoe rte_flow_item_pppoe_mask = {
> + .session_id = RTE_BE16(0x),
> + .proto_id = RTE_BE16(0x),
> +};

I think the default for PPPoE should be an empty mask that simply says
"match PPPoE" since session_id only becomes known after negotiation and
proto_id shouldn't be part of this object.
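
To illustrate why an empty mask means "match PPPoE" and nothing more, here
is a small sketch of spec/mask comparison. The struct is a simplified,
hypothetical stand-in (not the layout from the patch) and
pppoe_item_matches() is an illustrative helper, not an rte_flow function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified PPPoE item used for illustration only. */
struct pppoe_item {
	uint8_t version_type;
	uint8_t code;
	uint16_t session_id;
};

/* A field is compared only where the mask has bits set. */
static bool
pppoe_item_matches(const struct pppoe_item *hdr,
		   const struct pppoe_item *spec,
		   const struct pppoe_item *mask)
{
	assert(hdr != NULL && spec != NULL && mask != NULL);
	return ((hdr->version_type ^ spec->version_type) &
		mask->version_type) == 0 &&
	       ((hdr->code ^ spec->code) & mask->code) == 0 &&
	       ((hdr->session_id ^ spec->session_id) & mask->session_id) == 0;
}
```

With an all-zero mask every PPPoE header matches whatever the spec says; a
default mask covering session_id would instead force users to know the
session in advance.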

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH 1/2] ethdev: add symmetric toeplitz hash support

2019-07-31 Thread Adrien Mazarguil

On Wed, Jul 31, 2019 at 03:08:19PM +0300, Andrew Rybchenko wrote:
> On 7/25/19 7:57 AM, simei wrote:
> > From: Simei Su 
> > 
> > Currently, there are DEFAULT,TOEPLITZ and SIMPLE_XOR hash funtion.
> > To support symmetric hash by rte_flow RSS action, this patch adds
> > new hash function "Symmetric Toeplitz" which is supported by some hardware.
> 
> Isn't it a question of key to achieve symmetry?
> I.e. hash algorithm (function) is still the same - Toeplitz, but
> hash key makes the result symmetric (i.e. equal for flows in both
> directions - swap transport ports and IPv4/6 addresses).

This is only an option when src/dst are known in advance.

When doing RSS, HW implementations (such as Mellanox's) implement a modified
Toeplitz that XORs src with dst, resulting in the same hash both ways
regardless of the key.
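
A rough sketch of the idea, assuming the modification amounts to feeding
src XOR dst into an otherwise plain Toeplitz; actual hardware may combine
the fields differently. struct flow4 and both function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain Toeplitz: XOR together the 32-bit key window at each set input bit. */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t key_len,
	      const uint8_t *data, size_t data_len)
{
	uint32_t window, hash = 0;
	size_t next_bit = 32; /* index of the next key bit to shift in */
	size_t i;
	int bit;

	/* The key must cover every input bit plus the initial 32-bit window. */
	assert(key_len >= data_len + 4);
	window = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
		 (uint32_t)key[2] << 8 | (uint32_t)key[3];
	for (i = 0; i < data_len; ++i) {
		for (bit = 7; bit >= 0; --bit) {
			if (data[i] & (1u << bit))
				hash ^= window;
			window <<= 1;
			if (key[next_bit / 8] & (0x80u >> (next_bit % 8)))
				window |= 1;
			++next_bit;
		}
	}
	return hash;
}

/* Hypothetical IPv4 flow description, host byte order. */
struct flow4 {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

/* Swapping src and dst leaves the XOR'ed input, hence the hash, unchanged. */
static uint32_t
symmetric_toeplitz(const uint8_t *key, size_t key_len, const struct flow4 *f)
{
	uint32_t ip = f->src_ip ^ f->dst_ip;
	uint16_t port = (uint16_t)(f->src_port ^ f->dst_port);
	const uint8_t in[6] = {
		(uint8_t)(ip >> 24), (uint8_t)(ip >> 16), (uint8_t)(ip >> 8),
		(uint8_t)ip, (uint8_t)(port >> 8), (uint8_t)port,
	};

	return toeplitz_hash(key, key_len, in, sizeof(in));
}
```

Since the symmetry comes from the input rather than from the key, any key
long enough for the input gives equal hashes in both directions.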

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [dpdk-stable] [PATCH] librte_flow_classify: fix out-of-bounds access

2019-07-30 Thread Adrien Mazarguil

On Tue, Jul 30, 2019 at 01:27:41PM -0400, Aaron Conole wrote:
> Ferruh Yigit  writes:
> 
> > On 7/30/2019 5:18 PM, Adrien Mazarguil wrote:
> >> On Tue, Jul 30, 2019 at 03:48:31PM +0100, Ferruh Yigit wrote:
> >>> On 7/30/2019 3:42 PM, Aaron Conole wrote:
> >>>> David Marchand  writes:
> >>>>
> >>>>> On Wed, Jul 10, 2019 at 11:49 PM Thomas Monjalon  
> >>>>> wrote:
> >>>>>>
> >>>>>> 09/07/2019 13:09, Bernard Iremonger:
> >>>>>>> This patch fixes the out-of-bounds coverity issue by removing the
> >>>>>>> offending line of code at line 107 in rte_flow_classify_parse.c
> >>>>>>> which is never executed.
> >>>>>>>
> >>>>>>> Coverity issue: 343454
> >>>>>>>
> >>>>>>> Fixes: be41ac2a330f ("flow_classify: introduce flow classify library")
> >>>>>>> Cc: sta...@dpdk.org
> >>>>>>> Signed-off-by: Bernard Iremonger 
> >>>>>>
> >>>>>> Applied, thanks
> >>>>>
> >>>>> We have a segfault in the unit tests since this patch.
> >>>>
> >>>> I think this patch is still correct.  The issue is in the semantic of
> >>>> the flow classify pattern.  It *MUST* always have a valid end marker,
> >>>> but the test passes an invalid end marker.  This causes the bounds to
> >>>> exceed.
> >>>>
> >>>> So, it would be best to fix it, either by having a "failure" on unknown
> >>>> markers (f.e. -1), or by passing a length around.  However, the crash
> >>>> should be expected.  The fact that the previous code was also incorrect
> >>>> and resulted in no segfault is pure luck.
> >>>>
> >>>> See rte_flow_classify_parse.c:80 and test_flow_classify.c:387
> >>>>
> >>>> I would be in favor of passing the lengths of the two arrays to these
> >>>> APIs.  That would let us still make use of the markers (for valid
> >>>> construction), but also let us reason about lengths in a sane way.
> >>>>
> >>>> WDYT?
> >>>>
> >>>
> >>> +1, I also just replied with something very similar.
> >>>
> >>> With current API the testcase is wrong, and it will crash, also the invalid
> >>> action one has exact same problem.
> >>>
> >>> The API can be updated as you suggested, with a length field and testcases can
> >>> be added back.
> >>>
> >>> What worries me more is the rte_flow, which uses same arguments, and open to
> >>> same errors, should we consider updating rte_flow APIs to have lengths values
> >>> too?
> >> 
> >> (Jumping in since all dashboard lights in my control room went red after
> >> "rte_flow" was detected in this discussion)
> >
> > :)
> >
> >> 
> >> Length values for patterns and action lists were considered during design
> >> but END was preferred as the better solution for convenience and because
> >> it's actually safer:
> >> 
> >> - C programmers are well aware of the dire consequences of omitting the
> >>   ending NUL byte in strings so it's not a foreign concept. This is
> >>   documented as such for rte_flow.
> >
> > I believe, C string functions are one of the most error prone part of the libc,
> > even after a dozen of years it is not rare to crash the applications because of
> > omitted terminating NULL, so I think this is not the best example :)
> 
> +1

Of course, but I see such crashes as a *feature* when something's wrong in
the code. Silent data corruption is much, much worse. Those are not
recoverable errors, so it's no different from ignoring SIGSEGV and hoping
for the best (whee, no more crashes!)

> >> 
> >> - Static initialization of flow rules (i.e. defining a large fixed array)
> >>   is much easier if one doesn't have to encode its size as well, think about
> >>   compilation directives (#ifdef) on some of its elements.
> >> 
> >> - Like omitting the END element, providing the wrong array size by mistake
> >>   remains a possibility, with similar or possibly worse consequences as
> >>   it's less likely to crash

Re: [dpdk-dev] [dpdk-stable] [PATCH] librte_flow_classify: fix out-of-bounds access

2019-07-30 Thread Adrien Mazarguil

On Tue, Jul 30, 2019 at 03:48:31PM +0100, Ferruh Yigit wrote:
> On 7/30/2019 3:42 PM, Aaron Conole wrote:
> > David Marchand  writes:
> > 
> >> On Wed, Jul 10, 2019 at 11:49 PM Thomas Monjalon  
> >> wrote:
> >>>
> >>> 09/07/2019 13:09, Bernard Iremonger:
> >>>> This patch fixes the out-of-bounds coverity issue by removing the
> >>>> offending line of code at line 107 in rte_flow_classify_parse.c
> >>>> which is never executed.
> >>>>
> >>>> Coverity issue: 343454
> >>>>
> >>>> Fixes: be41ac2a330f ("flow_classify: introduce flow classify library")
> >>>> Cc: sta...@dpdk.org
> >>>> Signed-off-by: Bernard Iremonger 
> >>>
> >>> Applied, thanks
> >>
> >> We have a segfault in the unit tests since this patch.
> > 
> > I think this patch is still correct.  The issue is in the semantic of
> > the flow classify pattern.  It *MUST* always have a valid end marker,
> > but the test passes an invalid end marker.  This causes the bounds to
> > exceed.
> > 
> > So, it would be best to fix it, either by having a "failure" on unknown
> > markers (f.e. -1), or by passing a length around.  However, the crash
> > should be expected.  The fact that the previous code was also incorrect
> > and resulted in no segfault is pure luck.
> > 
> > See rte_flow_classify_parse.c:80 and test_flow_classify.c:387
> > 
> > I would be in favor of passing the lengths of the two arrays to these
> > APIs.  That would let us still make use of the markers (for valid
> > construction), but also let us reason about lengths in a sane way.
> > 
> > WDYT?
> > 
> 
> +1, I also just replied with something very similar.
> 
> With current API the testcase is wrong, and it will crash, also the invalid
> action one has exact same problem.
> 
> The API can be updated as you suggested, with a length field and testcases can
> be added back.
> 
> What worries me more is the rte_flow, which uses same arguments, and open to
> same errors, should we consider updating rte_flow APIs to have lengths values 
> too?

(Jumping in since all dashboard lights in my control room went red after
"rte_flow" was detected in this discussion)

Length values for patterns and action lists were considered during design
but END was preferred as the better solution for convenience and because
it's actually safer:

- C programmers are well aware of the dire consequences of omitting the
  ending NUL byte in strings so it's not a foreign concept. This is
  documented as such for rte_flow.

- Static initialization of flow rules (i.e. defining a large fixed array)
  is much easier if one doesn't have to encode its size as well, think about
  compilation directives (#ifdef) on some of its elements.

- Like omitting the END element, providing the wrong array size by mistake
  remains a possibility, with similar or possibly worse consequences as
  it's less likely to crash early and more prone to silent data corruption.

- [tons of other good reasons here]

See?
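
The static initialization point can be sketched as follows. The types and
the WITH_VLAN switch are hypothetical stand-ins, not rte_flow definitions;
the only assumption is that the terminating type plays the role of END.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item types; ITEM_END terminates a pattern. */
enum item_type { ITEM_END, ITEM_ETH, ITEM_VLAN, ITEM_IPV4, ITEM_UDP };

struct item {
	enum item_type type;
};

/* No element count to keep in sync when WITH_VLAN toggles an entry. */
static const struct item pattern[] = {
	{ ITEM_ETH },
#ifdef WITH_VLAN
	{ ITEM_VLAN },
#endif
	{ ITEM_IPV4 },
	{ ITEM_UDP },
	{ ITEM_END },
};

/* Number of items before the terminator. */
static size_t
pattern_length(const struct item *p)
{
	size_t n = 0;

	assert(p != NULL);
	while (p[n].type != ITEM_END)
		++n;
	return n;
}
```

A length-based API would need a separate count that changes with every
compilation directive, and a wrong count would not necessarily crash.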

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [RFC,v3] ethdev: extend RSS offload types

2019-07-30 Thread Adrien Mazarguil

On Tue, Jul 30, 2019 at 06:06:56AM +, Ori Kam wrote:
> Hi Simei,
> 
> 
> 
> > -Original Message-
> > From: dev  On Behalf Of simei
> > Sent: Monday, July 29, 2019 5:44 AM
> > To: qi.z.zh...@intel.com; jingjing...@intel.com; ferruh.yi...@intel.com;
> > Adrien Mazarguil 
> > Cc: dev@dpdk.org; simei...@intel.com
> > Subject: [dpdk-dev] [RFC,v3] ethdev: extend RSS offload types
> > 
> > From: Simei Su 
> > 
> > Make it easier to represent to define macro values as (1ULL << ###).
> > 
> > This RFC reserves several bits as input set selection from bottom
> > of the 64 bits. The flow type is combined with input set to
> > represent rss types.
> > 
> 
> 
> Why reserve from the bottom? and not from the first available space?

I assume the reason is that maintaining the existing model with
RTE_ETH_FLOW_* sibling macros doesn't make sense for these, so this approach
doesn't impact future ETH_RSS_* definitions...

> > for example:
> > ETH_RSS_IPV4 | ETH_RSS_INSET_L3_SRC: hash on src ip address only
> > ETH_RSS_IPV4_UDP | ETH_RSS_INSET_L4_DST: hash on src/dst IP and
> > dst UDP port
> > ETH_RSS_L2_PAYLOAD | ETH_RSS_INSET_L2_DST: hash on dst mac address
> > 
> 
> What happens when the user set ETH_RSS_IPV4? From what I understand from your
> RFC this will do nothing since no bits where enabled, am I correct?
> If I'm correct this may break applications.

Also my thought. Again, I assume that ETH_RSS_INSET_* flags only act as
modifiers to ETH_RSS_IPV4 and friends: if none are provided, then both
source and destination are taken into account. If that's the design, it must
still be properly documented.

Looks like it's time to end the relationship between RTE_ETH_FLOW_* and
ETH_RSS_* seeing both serve different purposes, and these new macros only
make sense for RSS.

I suggest a prior patch that converts all those definitions:

 #define ETH_RSS_IPV4   (1ULL << RTE_ETH_FLOW_IPV4)
 [...]

To their numerical counterparts directly, in which case we have two options:

Without breaking ABI, e.g.:

 #define ETH_RSS_IPV4   (1ULL << 2)
 [...]

Or going further if we're ready to break ABI a tiny little bit, starting
over from zero and using unique flags for IPv4, IPv6, TCP and UDP without
distinguishing between NONFRAG, FRAG and IPV6_EX, which never made sense for
RSS, and separating L4 from L3 to save even more, that is:

 #define ETH_RSS_ETH    (1ULL << 0)
 #define ETH_RSS_IPV4   (1ULL << 1)
 #define ETH_RSS_IPV6   (1ULL << 2)
 #define ETH_RSS_UDP    (1ULL << 3)
 #define ETH_RSS_TCP    (1ULL << 4)
 #define ETH_RSS_SCTP   (1ULL << 5)
 [...]

Then the flags you would like to add would have to be more explicit. I think
Qi's original suggestion with "ONLY" was better in this regard than "INSET":

 #define ETH_RSS_L2_SRC_ONLY    (1ULL << 6)
 #define ETH_RSS_L2_DST_ONLY    (1ULL << 7)
 [...]

Otherwise there's still the negative approach:

 #define ETH_RSS_L2_NO_SRC      (1ULL << 6)
 #define ETH_RSS_L2_NO_DST      (1ULL << 7)
 [...]

Not sure which is better. Thoughts?
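
For the "ONLY" variant, the modifier semantics I have in mind could be
sketched like this. The flag values are the ones suggested above; struct
l2_inset and rss_l2_inset() are hypothetical, and treating both modifiers
together like neither is an assumption that would need to be documented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the suggestions above. */
#define ETH_RSS_ETH            (1ULL << 0)
#define ETH_RSS_L2_SRC_ONLY    (1ULL << 6)
#define ETH_RSS_L2_DST_ONLY    (1ULL << 7)

/* Which MAC addresses take part in the hash (hypothetical helper type). */
struct l2_inset {
	bool src;
	bool dst;
};

static struct l2_inset
rss_l2_inset(uint64_t types)
{
	struct l2_inset in = { false, false };
	bool src_only = (types & ETH_RSS_L2_SRC_ONLY) != 0;
	bool dst_only = (types & ETH_RSS_L2_DST_ONLY) != 0;

	/* Modifiers alone select nothing. */
	if (!(types & ETH_RSS_ETH))
		return in;
	/* No modifier (or both): hash on source and destination. */
	in.src = src_only || !dst_only;
	in.dst = dst_only || !src_only;
	/* With ETH_RSS_ETH set, at least one address always takes part. */
	assert(in.src || in.dst);
	return in;
}
```

This keeps a plain ETH_RSS_ETH behaving as applications expect today, which
addresses the concern about breaking them.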

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH] ethdev: extend flow metadata

2019-07-29 Thread Adrien Mazarguil

On Sun, Jul 14, 2019 at 02:46:58PM +0300, Andrew Rybchenko wrote:
> On 11.07.2019 10:44, Adrien Mazarguil wrote:
> > On Wed, Jul 10, 2019 at 04:37:46PM +, Yongseok Koh wrote:
> > > > On Jul 10, 2019, at 5:26 AM, Thomas Monjalon  
> > > > wrote:
> > > > 
> > > > 10/07/2019 14:01, Bruce Richardson:
> > > > > On Wed, Jul 10, 2019 at 12:07:43PM +0200, Olivier Matz wrote:
> > > > > > On Wed, Jul 10, 2019 at 10:55:34AM +0100, Bruce Richardson wrote:
> > > > > > > On Wed, Jul 10, 2019 at 11:31:56AM +0200, Olivier Matz wrote:
> > > > > > > > On Thu, Jul 04, 2019 at 04:21:22PM -0700, Yongseok Koh wrote:
> > > > > > > > > > Currently, metadata can be set on egress path via mbuf tx_metadata field
> > > > > > > > > > with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
> > > > > > > > > > 
> > > > > > > > > > This patch extends the usability.
> > > > > > > > > > 
> > > > > > > > > > 1) RTE_FLOW_ACTION_TYPE_SET_META
> > > > > > > > > > 
> > > > > > > > > > When supporting multiple tables, Tx metadata can also be set by a rule and
> > > > > > > > > > matched by another rule. This new action allows metadata to be set as a
> > > > > > > > > > result of flow match.
> > > > > > > > > > 
> > > > > > > > > > 2) Metadata on ingress
> > > > > > > > > > 
> > > > > > > > > > There's also need to support metadata on packet Rx. Metadata can be set by
> > > > > > > > > > SET_META action and matched by META item like Tx. The final value set by
> > > > > > > > > > the action will be delivered to application via mbuf metadata field with
> > > > > > > > > > PKT_RX_METADATA ol_flag.
> > > > > > > > > > 
> > > > > > > > > > For this purpose, mbuf->tx_metadata is moved as a separate new field and
> > > > > > > > > > renamed to 'metadata' to support both Rx and Tx metadata.
> > > > > > > > > > 
> > > > > > > > > > For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> > > > > > > > > > propagated to the other path depending on HW capability.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Yongseok Koh 
> > > > > > > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > @@ -648,17 +653,6 @@ struct rte_mbuf {
> > > > > > > > > >   /**< User defined tags. See rte_distributor_process() */
> > > > > > > > > >   uint32_t usr;
> > > > > > > > > >   } hash;   /**< hash information */
> > > > > > > > > - struct {
> > > > > > > > > - /**
> > > > > > > > > -  * Application specific metadata value
> > > > > > > > > -  * for egress flow rule match.
> > > > > > > > > -  * Valid if PKT_TX_METADATA is set.
> > > > > > > > > -  * Located here to allow conjunct use
> > > > > > > > > -  * with hash.sched.hi.
> > > > > > > > > -  */
> > > > > > > > > - uint32_t tx_metadata;
> > > > > > > > > - uint32_t reserved;
> > > > > > > > > - };
> > > > > > > > >   };
> > > > > > > > > 
> > > > > > > > >

Re: [dpdk-dev] [PATCH] ethdev: extend flow metadata

2019-07-11 Thread Adrien Mazarguil

is at the end, it's not going to move any 
> >>>> older
> >>>> fields, and since everything is cache-aligned I don't think the structure
> >>>> size changes either.
> >>> 
> >>> I think it does break the ABI: in previous version, when the PKT_TX_METADATA
> >>> flag is set, the associated value is put in m->tx_metadata (offset 44 on
> >>> x86-64), and in the next version, it will be in m->metadata (offset 112). So,
> >>> these 2 versions are not binary compatible.
> >>> 
> >>> Anyway, at least it breaks the API.
> >> 
> >> Ok, I misunderstood. I thought it was the structure change itself you were
> >> saying broke the ABI. Yes, putting the data in a different place is indeed
> >> an ABI break.
> > 
> > We could add the new field and keep the old one unused,
> > so it does not break the ABI.
> 
> Still breaks ABI if PKT_TX_METADATA is set. :-) In order not to break it, I can
> keep the current union'd field (tx_metadata) as is with PKT_TX_METADATA, add
> the new one at the end and make it used with the new PKT_RX_METADATA.
> 
> > However I suppose everybody will prefer a version using dynamic fields.
> > Is someone against using dynamic field for such usage?
> 
> However, given that the amazing dynamic fields is coming soon (thanks for your
> effort, Olivier and Thomas!), I'd be honored to be the first user of it.
> 
> Olivier, I'll take a look at your RFC.

Just got a crazy idea while reading this thread... How about repurposing
that "reserved" field as "rx_metadata" in the meantime?

I know reserved fields are cursed and no one's ever supposed to touch them
but this risk is mitigated by having the end user explicitly request its
use, so the patch author (and his relatives) should be safe from the
resulting bad juju.
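
To show why this would not move anything, here is a sketch using simplified
stand-ins for the anonymous struct quoted earlier in the thread. Both struct
names are hypothetical; only tx_metadata, reserved and the rx_metadata name
come from the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the current layout. */
struct meta_before {
	uint32_t tx_metadata; /* valid if PKT_TX_METADATA is set */
	uint32_t reserved;
};

/* Same layout with the reserved word renamed. */
struct meta_after {
	uint32_t tx_metadata; /* unchanged, still tied to PKT_TX_METADATA */
	uint32_t rx_metadata; /* formerly "reserved" */
};

/* Neither the size nor any field offset changes, so old binaries still fit. */
static_assert(sizeof(struct meta_before) == sizeof(struct meta_after),
	      "size must not change");
static_assert(offsetof(struct meta_before, tx_metadata) ==
	      offsetof(struct meta_after, tx_metadata),
	      "tx_metadata must not move");
static_assert(offsetof(struct meta_before, reserved) ==
	      offsetof(struct meta_after, rx_metadata),
	      "rx_metadata must reuse the reserved slot");
```

Existing users of tx_metadata keep reading the same offset, and the new
field would only be meaningful when explicitly requested.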

Joke aside, while I like the idea of Tx/Rx META, I think the similarities
with MARK (and eventually TAG) are a problem. I wasn't available and couldn't
comment when META was originally added to the Tx path, but there's a lot of
overlap between these items/actions, without anything explaining to the end
user how and why they should pick one over the other, if they can be
combined at all and what happens in that case.

All this must be documented, then we should think about unifying their
respective features and deprecating the less capable items/actions. In my
opinion, users need exactly one method to mark/match some mark while
processing Rx/Tx traffic and *optionally* have that mark read from/written
to the mbuf, which may or may not be possible depending on HW features.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH] ethdev: add flow tag

2019-07-09 Thread Adrien Mazarguil

On Fri, Jul 05, 2019 at 06:05:50PM +, Yongseok Koh wrote:
> > On Jul 5, 2019, at 6:54 AM, Adrien Mazarguil  
> > wrote:
> > 
> > On Thu, Jul 04, 2019 at 04:23:02PM -0700, Yongseok Koh wrote:
> >> A tag is a transient data which can be used during flow match. This can be
> >> used to store match result from a previous table so that the same pattern
> >> need not be matched again on the next table. Even if outer header is
> >> decapsulated on the previous match, the match result can be kept.
> >> 
> >> Some device expose internal registers of its flow processing pipeline and
> >> those registers are quite useful for stateful connection tracking as it
> >> keeps status of flow matching. Multiple tags are supported by specifying
> >> index.
> >> 
> >> Example testpmd commands are:
> >> 
> >>  flow create 0 ingress pattern ... / end
> >>actions set_tag index 2 value 0xaa00bb mask 0x00ff /
> >>set_tag index 3 value 0x123456 mask 0xff /
> >>vxlan_decap / jump group 1 / end
> >> 
> >>  flow create 0 ingress pattern ... / end
> >>actions set_tag index 2 value 0xcc00 mask 0xff00 /
> >>set_tag index 3 value 0x123456 mask 0xff /
> >>vxlan_decap / jump group 1 / end
> >> 
> >>  flow create 0 ingress group 1
> >>pattern tag index is 2 value spec 0xaa00bb value mask 0x00ff /
> >>eth ... / end
> >>actions ... jump group 2 / end
> >> 
> >>  flow create 0 ingress group 1
> >>pattern tag index is 2 value spec 0xcc00 value mask 0xff00 /
> >>tag index is 3 value spec 0x123456 value mask 0xff /
> >>eth ... / end
> >>actions ... / end
> >> 
> >>  flow create 0 ingress group 2
> >>pattern tag index is 3 value spec 0x123456 value mask 0xff /
> >>eth ... / end
> >>actions ... / end
> >> 
> >> Signed-off-by: Yongseok Koh 
> > 
> > Hi Yongseok,
> > 
> > Only high level questions for now, while it unquestionably looks useful,
> > from a user standpoint exposing the separate index seems redundant and not
> > necessarily convenient. Using the following example to illustrate:
> > 
> > actions set_tag index 3 value 0x123456 mask 0xf
> > 
> > pattern tag index is 3 value spec 0x123456 value mask 0xff
> > 
> > I might be missing something, but why isn't this enough:
> > 
> > pattern tag index is 3 # match whatever is stored at index 3
> > 
> > Assuming it can work, then why bother with providing value spec/mask on
> > set_tag? A flow rule pattern matches something, sets some arbitrary tag to
> > be matched by a subsequent flow rule and that's it. It even seems like
> > relying on the index only on both occasions is enough for identification.
> > 
> > Same question for the opposite approach; relying on the value, never
> > mentioning the index.
> > 
> > I'm under the impression that the index is a hardware-specific constraint
> > that shouldn't be exposed (especially since it's an 8-bit field). If so, a
> > PMD could keep track of used indices without having them exposed through the
> > public API.
> 
> 
> Thank you for review, Adrien.
> Hope you are doing well. It's been long since we talked each other. :-)

Yeah clearly! Hope you're doing well too. I'm somewhat busy hence slow to
answer these days...

  hey!
  no private talks!

Back to the topic:

> Your approach will work too in general but we have a request from customer that
> they want to partition this limited tag storage. Assuming that HW exposes 32bit
> tags (those are 'registers' in HW pipeline in mlx5 HW). Then, customers want to
> store multiple data even in a 32-bit storage. For example, 16bit vlan tag, 8bit
> table id and 8bit flow id. As they want to split one 32bit storage, I thought it
> is better to provide mask when setting/matching the value. Even some customer
> wants to store multiple flags bit by bit like ol_flags. They do want to alter
> only partial bits.
> 
> And for the index, it is to reference an entry of tags array as HW can provide
> larger registers than 32-bit. For example, mlx5 HW would provide 4 of 32b
> storage which users can use for their own sake.
>   tag[0], tag[1], tag[2], tag[3]

OK, looks like I missed the point then. I initially took it for a funky
alternative to RTE_FLOW_ITEM_TYPE_META & RTE_FLOW_ACTION_TYPE_SET_META

Re: [dpdk-dev] [PATCH] ethdev: add flow tag

2019-07-05 Thread Adrien Mazarguil

On Thu, Jul 04, 2019 at 04:23:02PM -0700, Yongseok Koh wrote:
> A tag is a transient data which can be used during flow match. This can be
> used to store match result from a previous table so that the same pattern
> need not be matched again on the next table. Even if outer header is
> decapsulated on the previous match, the match result can be kept.
> 
> Some device expose internal registers of its flow processing pipeline and
> those registers are quite useful for stateful connection tracking as it
> keeps status of flow matching. Multiple tags are supported by specifying
> index.
> 
> Example testpmd commands are:
> 
>   flow create 0 ingress pattern ... / end
> actions set_tag index 2 value 0xaa00bb mask 0x00ff /
> set_tag index 3 value 0x123456 mask 0xff /
> vxlan_decap / jump group 1 / end
> 
>   flow create 0 ingress pattern ... / end
> actions set_tag index 2 value 0xcc00 mask 0xff00 /
> set_tag index 3 value 0x123456 mask 0xff /
> vxlan_decap / jump group 1 / end
> 
>   flow create 0 ingress group 1
> pattern tag index is 2 value spec 0xaa00bb value mask 0x00ff /
> eth ... / end
> actions ... jump group 2 / end
> 
>   flow create 0 ingress group 1
> pattern tag index is 2 value spec 0xcc00 value mask 0xff00 /
> tag index is 3 value spec 0x123456 value mask 0xff /
> eth ... / end
> actions ... / end
> 
>   flow create 0 ingress group 2
> pattern tag index is 3 value spec 0x123456 value mask 0xff /
> eth ... / end
> actions ... / end
> 
> Signed-off-by: Yongseok Koh 

Hi Yongseok,

Only high-level questions for now. While it unquestionably looks useful,
from a user standpoint exposing the separate index seems redundant and not
necessarily convenient. Using the following example to illustrate:

 actions set_tag index 3 value 0x123456 mask 0xf

 pattern tag index is 3 value spec 0x123456 value mask 0xff

I might be missing something, but why isn't this enough:

 pattern tag index is 3 # match whatever is stored at index 3

Assuming it can work, then why bother with providing value spec/mask on
set_tag? A flow rule pattern matches something, sets some arbitrary tag to
be matched by a subsequent flow rule and that's it. It even seems like
relying on the index only on both occasions is enough for identification.

Same question for the opposite approach; relying on the value, never
mentioning the index.

I'm under the impression that the index is a hardware-specific constraint
that shouldn't be exposed (especially since it's an 8-bit field). If so, a
PMD could keep track of used indices without having them exposed through the
public API.
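
For reference, the index/value/mask semantics that the quoted testpmd
commands seem to assume can be sketched as below. TAG_COUNT, struct tag_regs
and both helpers are hypothetical, and the values used with them are
illustrative; none of this is taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_COUNT 4 /* hypothetical number of tag registers */

struct tag_regs {
	uint32_t tag[TAG_COUNT];
};

/* set_tag: only the bits selected by mask are overwritten. */
static void
set_tag(struct tag_regs *r, uint8_t index, uint32_t value, uint32_t mask)
{
	assert(index < TAG_COUNT);
	r->tag[index] = (r->tag[index] & ~mask) | (value & mask);
}

/* tag item: only the bits selected by mask are compared against spec. */
static bool
tag_matches(const struct tag_regs *r, uint8_t index,
	    uint32_t spec, uint32_t mask)
{
	assert(index < TAG_COUNT);
	return ((r->tag[index] ^ spec) & mask) == 0;
}
```

Under this reading, the mask is what lets two rules share one register, and
the index only selects which register is involved.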

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v6 4/4] app/testpmd: match GRE's key and present bits

2019-07-05 Thread Adrien Mazarguil

On Fri, Jul 05, 2019 at 10:14:45AM +0800, Xiaoyu Min wrote:
> support matching on GRE key and present bits (C,K,S)
> 
> example testpmd command could be:
>   testpmd>flow create 0 ingress group 1 pattern eth / ipv4 /
> gre / gre_key value is 0x12345678 / end
> actions rss queues 1 0 end / mark id 196 / end
> 
> Which will match GRE packet with k present bit set and key value is
> 0x12345678.
> 
> Acked-by: Ori Kam 
> Signed-off-by: Xiaoyu Min 

A few more nits below.

[...]
> @@ -1898,6 +1915,50 @@ static const struct token token_list[] = {
>   .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_gre,
>protocol)),
>   },
> + [ITEM_GRE_C_RSVD0_VER] = {
> + .name = "c_rsvd0_ver",
> + .help = "GRE's first word (bit0 - bit15)",

Help strings on existing fields should ideally be the same as their
counterparts in rte_flow.h (shortened if necessary, not starting with a cap
and not ending with "."), in this case for instance:

 .help =
 "checksum (1b), undefined (1b), key bit (1b),"
 " sequence number (1b), reserved 0 (9b),"
 " version (3b)",

> + .next = NEXT(item_gre, NEXT_ENTRY(UNSIGNED), item_param),
> + .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_gre,
> +  c_rsvd0_ver)),
> + },
> + [ITEM_GRE_C_BIT] = {
> + .name = "c_bit",
> + .help = "GRE's C present bit",

A bit odd, here's a suggestion:

 "checksum bit (C)".

> + .next = NEXT(item_gre, NEXT_ENTRY(BOOLEAN), item_param),
> + .args = ARGS(ARGS_ENTRY_MASK_HTON(struct rte_flow_item_gre,
> +   c_rsvd0_ver,
> +   "\x80\x00\x00\x00")),
> + },
> + [ITEM_GRE_S_BIT] = {
> + .name = "s_bit",
> + .help = "GRE's S present bit",

Ditto:

 "sequence number bit (S)"

> + .next = NEXT(item_gre, NEXT_ENTRY(BOOLEAN), item_param),
> + .args = ARGS(ARGS_ENTRY_MASK_HTON(struct rte_flow_item_gre,
> +   c_rsvd0_ver,
> +   "\x10\x00\x00\x00")),
> + },
> + [ITEM_GRE_K_BIT] = {
> + .name = "k_bit",
> + .help = "GRE's K present bit",

Ditto:

 "key bit (K)"

> + .next = NEXT(item_gre, NEXT_ENTRY(BOOLEAN), item_param),
> + .args = ARGS(ARGS_ENTRY_MASK_HTON(struct rte_flow_item_gre,
> +   c_rsvd0_ver,
> +   "\x20\x00\x00\x00")),
> + },
> + [ITEM_GRE_KEY] = {
> + .name = "gre_key",
> + .help = "match GRE Key",

Nit: no caps for "Key" => "match GRE key"

> + .priv = PRIV_ITEM(GRE_KEY, sizeof(rte_be32_t)),
> + .next = NEXT(item_gre_key),
> + .call = parse_vc,
> + },
> + [ITEM_GRE_KEY_VALUE] = {
> + .name = "value",
> + .help = "GRE key value",

No need to repeat "GRE" here since it's already in GRE context:

 "key value"

> + .next = NEXT(item_gre_key, NEXT_ENTRY(UNSIGNED), item_param),
> + .args = ARGS(ARG_ENTRY_HTON(rte_be32_t)),
> + },

Also ITEM_GRE_KEY and ITEM_GRE_KEY_VALUE should come after ITEM_META_DATA to
keep the same order as everywhere else.

Then assuming all the suggested changes are made:

Acked-by: Adrien Mazarguil 

Note I did not look at mlx5 patches, please make sure someone has reviewed
them. Thanks.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v6 1/4] ethdev: add GRE key field to flow API

2019-07-05 Thread Adrien Mazarguil

On Fri, Jul 05, 2019 at 10:14:42AM +0800, Xiaoyu Min wrote:
> Add new rte_flow_item_gre_key in order to match the optional key field.
> 
> Acked-by: Ori Kam 
> Signed-off-by: Xiaoyu Min 

Acked-by: Adrien Mazarguil 

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [RFC] ethdev: support input set change by RSS action

2019-07-04 Thread Adrien Mazarguil

On Thu, Jul 04, 2019 at 01:55:14PM +, Zhang, Qi Z wrote:
> 
> 
> > -Original Message-
> > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> > Sent: Thursday, July 4, 2019 5:07 PM
> > To: Su, Simei 
> > Cc: Zhang, Qi Z ; Wu, Jingjing 
> > ;
> > Xing, Beilei ; Yang, Qiming ;
> > dev@dpdk.org
> > Subject: Re: [dpdk-dev] [RFC] ethdev: support input set change by RSS action
> > 
> > On Thu, Jul 04, 2019 at 12:47:09PM +0800, simei wrote:
> > > From: Simei Su 
> > >
> > > This RFC introduces inputset structure to rte_flow_action_rss to
> > > support input set specific configuration by rte_flow RSS action.
> > >
> > > We can give an testpmd command line example to make it more clear.
> > >
> > > For example, below flow selects the l4 port as inputset for any
> > > eth/ipv4/tcp packet: #flow create 0 ingress pattern eth / ipv4 / tcp /
> > > end actions rss inputset tcp src mask 0x dst mask 0x /end
> > >
> > > Signed-off-by: Simei Su 
> > > ---
> > >  lib/librte_ethdev/rte_flow.h | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/lib/librte_ethdev/rte_flow.h
> > > b/lib/librte_ethdev/rte_flow.h index f3a8fb1..2a455b6 100644
> > > --- a/lib/librte_ethdev/rte_flow.h
> > > +++ b/lib/librte_ethdev/rte_flow.h
> > > @@ -1796,6 +1796,9 @@ struct rte_flow_action_rss {
> > >   uint32_t queue_num; /**< Number of entries in @p queue. */
> > >   const uint8_t *key; /**< Hash key. */
> > >   const uint16_t *queue; /**< Queue indices to use. */
> > > + struct rte_flow_item *inputset; /** Provide more specific inputset
> > configuration.
> > > +  * ignore spec, only mask.
> > > +  */
> > >  };
> > >
> > >  /**
> > 
> > To make sure I understand, is this kind of a more flexible version of
> > rte_flow_action_rss.types?
> 
> Yes
> > 
> > For instance while specifying .types = ETH_RSS_IPV4 normally covers both
> > source and destination addresses, does this approach enable users to perform
> > RSS on source IP only? 
> 
> Yes, .it is the case to select any subset of 5 tuples or even tunnel header's 
> id for hash
> 
> > In which case, what value does the Toeplitz algorithm
> > assume for the destination, 0x0? (note: must be documented)
> 
> My understanding is src/dst pair is only required for a symmetric case
> But for Toeplitz, it is just a hash function, it process a serial of data 
> with specific algorithm, have no idea about which part is src and dst , 
> So for ip src only with Toeplitz, dst is not required to be a placeholder..

Right, I had symmetric Toeplitz in mind and wondered what would happen when
users would not select the required fields. I guess the PMD would have to
reject unsupported combinations.

> anything I missed, would you share more insight?

No, that answers the question, thanks.

Now what about my suggestion below? In short: extending ETH_RSS_* assuming
there's enough bits left in there, instead of adding a whole new framework
and breaking ABI in the process.

> > My opinion is that, unless you know of a hardware which can perform RSS on
> > random bytes of a packet, this approach is a bit overkill at this point.
> > 
> > How about simply adding the needed ETH_RSS_* definitions (e.g.
> > ETH_RSS_IPV4_(SRC|DST))? How many are needed?
> > 
> > There are currently 20 used bits and struct rte_flow_action_rss.types is 
> > 64-bit
> > wide. I'm sure we can manage something without harming the ABI. Even
> > better, you wouldn't need a deprecation notice.
> > 
> > If you use the suggested approach, please update testpmd and its
> > documentation as part of the same patch, thanks.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [RFC] ethdev: support symmetric hash function

2019-07-04 Thread Adrien Mazarguil

On Thu, Jul 04, 2019 at 12:48:25PM +, Zhang, Qi Z wrote:
> 
> 
> > -Original Message-
> > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> > Sent: Thursday, July 4, 2019 5:07 PM
> > To: Su, Simei 
> > Cc: Zhang, Qi Z ; Wu, Jingjing 
> > ;
> > Xing, Beilei ; Yang, Qiming ;
> > dev@dpdk.org; Shahaf Shuler ; Yongseok Koh
> > 
> > Subject: Re: [dpdk-dev] [RFC] ethdev: support symmetric hash function
> > 
> > On Thu, Jul 04, 2019 at 12:46:07PM +0800, simei wrote:
> > > From: Simei Su 
> > >
> > > Currently, there are DEFAULT,TOEPLITZ and SIMPLE_XOR hash funtion.
> > > To support symmetric hash by rte_flow RSS action, this RFC introduces
> > > SYMMETRIC_TOEPLITZ to rte_eth_hash_function.
> > >
> > > Signed-off-by: Simei Su 
> > > ---
> > >  lib/librte_ethdev/rte_flow.h | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/lib/librte_ethdev/rte_flow.h
> > > b/lib/librte_ethdev/rte_flow.h index f3a8fb1..e3c4fe5 100644
> > > --- a/lib/librte_ethdev/rte_flow.h
> > > +++ b/lib/librte_ethdev/rte_flow.h
> > > @@ -1744,6 +1744,7 @@ enum rte_eth_hash_function {
> > >   RTE_ETH_HASH_FUNCTION_DEFAULT = 0,
> > >   RTE_ETH_HASH_FUNCTION_TOEPLITZ, /**< Toeplitz */
> > >   RTE_ETH_HASH_FUNCTION_SIMPLE_XOR, /**< Simple XOR */
> > > + RTE_ETH_HASH_FUNCTION_SYMMETRIC_TOEPLITZ, /**< Symmetric
> > TOEPLITZ */
> > 
> > "Symmetric TOEPLITZ" => "Symmetric Toeplitz."
> > 
> > >   RTE_ETH_HASH_FUNCTION_MAX,
> > >  };
> > 
> > Other than that, no problem with this change (no ABI impact, no need for
> > deprecation). Please update testpmd as part of the same patch:
> 
> Is it still ABI break but just with little risk? 
> RTE_ETH_HASH_FUNCTION_MAX's value is changed anyway.
> Should we just remove it, if no one use it?

Indeed, it will update RTE_ETH_HASH_FUNCTION_MAX so you're technically
right, and the fact it's unused in DPDK doesn't mean applications are not
using it for something.

However for this specific case, the intent behind RTE_ETH_HASH_FUNCTION_MAX
is clearly to give out the number of enum entries, applications are not
supposed to use it for anything other than determining if an arbitrary
integer value corresponds to a valid hash function.

And this is the reason we could say it's OK ABI-wise to increase it (not
ideal but acceptable): a binary application has a fixed idea of
RTE_ETH_HASH_FUNCTION_MAX, it doesn't know the entries you're about to add
yet. To such an application, those will exceed RTE_ETH_HASH_FUNCTION_MAX and
should be rejected accordingly.
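
To illustrate the reasoning above, here is a minimal sketch of such an
application-side check. The enum is a local stand-in for what an old binary
was compiled against, and app_hash_function_is_valid() is a hypothetical
helper, not part of DPDK:

```c
#include <stdbool.h>

/* Local copy of the enum as seen by an application built BEFORE the
 * new entry was added (illustrative names, not the real header). */
enum old_hash_function {
	OLD_HASH_FUNCTION_DEFAULT = 0,
	OLD_HASH_FUNCTION_TOEPLITZ,
	OLD_HASH_FUNCTION_SIMPLE_XOR,
	OLD_HASH_FUNCTION_MAX, /* == 3 in the old binary */
};

/* Value an updated library would assign to the new entry if it is
 * inserted right before MAX (assumption for this sketch). */
#define NEW_HASH_FUNCTION_SYMMETRIC_TOEPLITZ 3

/* Hypothetical application helper: accept only functions known at
 * build time, i.e. anything below the MAX it was compiled with. */
static bool
app_hash_function_is_valid(int value)
{
	return value >= 0 && value < OLD_HASH_FUNCTION_MAX;
}
```

With this kind of check, the old binary treats the new value as out of
range instead of misinterpreting it, which is why bumping MAX is tolerable.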

A more conservative approach would be to mark RTE_ETH_HASH_FUNCTION_MAX as
deprecated (in a separate patch) and schedule it for removal while adding
new entries after it. Its position in the enum could be recycled once
removed.

If you want to remove RTE_ETH_HASH_FUNCTION_MAX directly, do it in a
separate RFC/patch as it will otherwise block the rest of your submission
for something like 2 releases after deprecation.

It's up to you. I'm fine with any of these approaches.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v4 4/4] app/testpmd: match GRE's key and present bits

2019-07-04 Thread Adrien Mazarguil

On Thu, Jul 04, 2019 at 11:56:35AM +, Jack Min wrote:
> On Thu, 19-07-04, 11:52, Adrien Mazarguil wrote:
> > On Thu, Jul 04, 2019 at 05:52:43AM +, Jack Min wrote:
> > > On Wed, 19-07-03, 17:25, Adrien Mazarguil wrote:
> > > > On Tue, Jul 02, 2019 at 05:45:55PM +0800, Xiaoyu Min wrote:
> > > > > support matching on GRE key and present bits (C,K,S)
> > > > > 
> > > > > example testpmd command could be:
> > > > >   testpmd>flow create 0 ingress group 1 pattern eth / ipv4 /
> > > > >   gre crksv is 0x2000 crksv mask 0xb000 /
> > > > > gre_key key is 0x12345678 / end
> > > > > actions rss queues 1 0 end / mark id 196 / end
> > > > > 
> > > > > Which will match GRE packet with k present bit set and key value is
> > > > > 0x12345678.
> > > > > 
> > > > > Signed-off-by: Xiaoyu Min 
[...]
> > > Well, actullay, when a user explicitly set spec/mask K as "0" and still
> > > provide gre_key item, MLX5 PMD will implicitly set match on K bit as
> > > "1", just ingore the K bit set by user.
> > 
> > Not good then. You should spit an error out if it's an impossible
> > combination. You can't match both K == 0 *and* a GRE key, unless perhaps if
> > key mask is also 0, e.g.:
> > 
> >  gre crksv is 0x crksv mask 0xb000 /
> >  gre_key value spec 0x value mask 0x
> > 
> 
> Never thought man will wirte thing like this, they don't wanna match
> gre_key why put the item there ?
> But, since you have raised this example, I'll update PMD part to handle this.

It's just an example of a valid yet convoluted command, mind you. I'm not
forcing you to support it; however, if you don't, you must raise an error.
You can't just ignore the K bit if the user provides GRE_KEY.

[...]
> > Depends. They may want to match all GRE traffic with a key, doesn't matter
> > which, in order to process it through a different path. To do so they could
> > either:
> > 
> > 1. Use the GRE item only to match K bit == 1.
> > 
> > 2. Use the GRE_KEY item to match a nonspecific key value (mask == 0).
> > 
> > 3. Use a combination of both.
> > 
> > I think you can easily support all three of them with mlx5 if you support
> > partial masks on GRE keys (I haven't checked), even if you're unable to
> > specifically match the K bit itself.
> > 
> 
> Already support this.

OK, nice.

[...]
> > > Well, actully, we also wanna testpmd can match on C,S bits with K bit
> > > together so we can test on gre packet with key only or csum + key, or
> > > csum + key + sequence.
> > 
> > OK no problem. Perhaps you could make this easier by allowing users to match
> > individual bits, let me explain:
> > 
> > The flow command in testpmd is a direct interface to manipulate rte_flow's
> > structures. The "crksv" field doesn't exist in rte_flow_item_gre, its name
> > is "c_rsvd0_ver". Testpmd must use the same in its command and internal
> > code.
> > 
> > However since bit-masks are usually a pain to mentally work out, you can
> > provide extras for convenience. The "types" field of the RSS action
> > (ACTION_RSS_TYPES) is an extreme example of this approach.
> > 
> > So I suggest adding ITEM_GRE_C_RSVD0_VER taking a 16-bit value like CRKSV,
> > and complete it with ITEM_GRE_C_BIT, ITEM_GRE_S_BIT and ITEM_GRE_K_BIT
> > addressing the individual bits you would like to expose for convenience.
> > 
> 
> So something like:
>   eth / ipv4 / gre c_rsvd0_ver c_bit is 0 s_bit is 0 k_bit is 1 / ...
> 
> Is it right?

Looks like "c_rsvd0_ver" is incomplete, I assume you meant:

 eth / ipv4 / gre c_rsvd0_ver is 0 c_bit is 0 s_bit is 0 k_bit is 1 / ...

And yes it's valid. Of course since nothing is matched by default, users
will typically not provide c_rsvd0_ver at all and focus on the relevant bits
for their use case: 

 eth / ipv4 / gre k_bit is 1 / ...

Another suggestion, use BOOLEAN instead of INTEGER type for C/K/S to support
other binary expressions:

 eth / ipv4 / gre k_bit is on / ...

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v4 4/4] app/testpmd: match GRE's key and present bits

2019-07-04 Thread Adrien Mazarguil

On Thu, Jul 04, 2019 at 05:52:43AM +, Jack Min wrote:
> On Wed, 19-07-03, 17:25, Adrien Mazarguil wrote:
> > On Tue, Jul 02, 2019 at 05:45:55PM +0800, Xiaoyu Min wrote:
> > > support matching on GRE key and present bits (C,K,S)
> > > 
> > > example testpmd command could be:
> > >   testpmd>flow create 0 ingress group 1 pattern eth / ipv4 /
> > >   gre crksv is 0x2000 crksv mask 0xb000 /
> > > gre_key key is 0x12345678 / end
> > > actions rss queues 1 0 end / mark id 196 / end
> > > 
> > > Which will match GRE packet with k present bit set and key value is
> > > 0x12345678.
> > > 
> > > Signed-off-by: Xiaoyu Min 
> > 
> > I'm wondering... Is matching the K bit mandatory if one explicitly matches
> > gre_key already or is this a specific hardware limitation in your case?
> > 
> 
> If there is gre_key item MLX5 PMD will force set HW matching on K bit,
> From HW perspective it is mandatory. But, from testpmd (user)
> perspective, I agree with you, user needn't set matching on K bit if
> they already explicitly set gre_key item.

OK, makes sense.

> > Perhaps we could document that the K bit is implicitly matched as "1" in the
> > default mask when a gre_key pattern item is present. If a user explicitly
> 
> Yes, I should document this.
> So it should be documented in __testpmd_funcs.rst__ ?

No it would be a change in the GRE_KEY item itself at the rte_flow API
level (rte_flow.h) & documentation (rte_flow.rst). The flow rules created by
testpmd must be an exact translation of user input, as a debugging tool it
can't request something that wasn't explicitly written.

> > spec/mask K as "0" and still provides gre_key, the PMD can safely ignore the
> > gre_key item.
> > 
> 
> Well, actullay, when a user explicitly set spec/mask K as "0" and still
> provide gre_key item, MLX5 PMD will implicitly set match on K bit as
> "1", just ingore the K bit set by user.

Not good then. You should spit an error out if it's an impossible
combination. You can't match both K == 0 *and* a GRE key, unless perhaps if
key mask is also 0, e.g.:

 gre crksv is 0x crksv mask 0xb000 /
 gre_key value spec 0x value mask 0x

This is merely an overly complex way of telling the PMD that one wants to
match packets without GRE keys, which you could technically support.

> The reason is wanna keep code simple, needn't to get
> information from other item (gre) inside gre_key item, or vice verse.

PMDs typically maintain context as they process the pattern. The GRE pattern
item is guaranteed to come before GRE_KEY, so you already know at this point
whether users want to match K at all, and if so, what value they want it to
have.
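
As a rough sketch of what I mean by maintaining context, assuming
simplified stand-ins for the pattern items (struct item, enum item_type and
validate_gre_pattern() are hypothetical, values are kept in host order for
brevity, and this is not mlx5 code):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define GRE_K_BIT 0x2000 /* K bit of c_rsvd0_ver, host order */

/* Simplified stand-ins for rte_flow pattern items (hypothetical). */
enum item_type { ITEM_END, ITEM_GRE, ITEM_GRE_KEY };

struct item {
	enum item_type type;
	uint32_t spec; /* c_rsvd0_ver for GRE, key value for GRE_KEY */
	uint32_t mask;
};

/* Returns 0 if the pattern is consistent, -EINVAL otherwise. */
static int
validate_gre_pattern(const struct item *pattern)
{
	bool k_masked = false; /* user asked to match the K bit at all */
	bool k_value = false;  /* requested K value when masked */

	for (; pattern->type != ITEM_END; ++pattern) {
		switch (pattern->type) {
		case ITEM_GRE:
			/* Remember what the GRE item said about K. */
			k_masked = pattern->mask & GRE_K_BIT;
			k_value = pattern->spec & pattern->mask & GRE_K_BIT;
			break;
		case ITEM_GRE_KEY:
			/* K == 0 with a non-empty key mask cannot match. */
			if (k_masked && !k_value && pattern->mask)
				return -EINVAL;
			break;
		default:
			break;
		}
	}
	return 0;
}
```

The point is only that the GRE_KEY handler can consult state recorded while
handling GRE; how a real PMD stores that state is up to its implementation.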

> And, I think, when a user provides a gre_key item, most probably, they do
> really wanna match on gre_key. What do you think?

Depends. They may want to match all GRE traffic with a key, doesn't matter
which, in order to process it through a different path. To do so they could
either:

1. Use the GRE item only to match K bit == 1.

2. Use the GRE_KEY item to match a nonspecific key value (mask == 0).

3. Use a combination of both.

I think you can easily support all three of them with mlx5 if you support
partial masks on GRE keys (I haven't checked), even if you're unable to
specifically match the K bit itself.

[...]
> > > @@ -755,6 +759,13 @@ static const enum index item_mpls[] = {
> > >  
> > >  static const enum index item_gre[] = {
> > >   ITEM_GRE_PROTO,
> > > + ITEM_GRE_CRKSV,
> > 
> > CRKSV may be unnecessary in this patch if the K bit is documented and
> > implemented as described in my previous comment.
> > 
> 
> Well, actully, we also wanna testpmd can match on C,S bits with K bit
> together so we can test on gre packet with key only or csum + key, or
> csum + key + sequence.

OK no problem. Perhaps you could make this easier by allowing users to match
individual bits, let me explain:

The flow command in testpmd is a direct interface to manipulate rte_flow's
structures. The "crksv" field doesn't exist in rte_flow_item_gre, its name
is "c_rsvd0_ver". Testpmd must use the same in its command and internal
code.

However since bit-masks are usually a pain to mentally work out, you can
provide extras for convenience. The "types" field of the RSS action
(ACTION_RSS_TYPES) is an extreme example of this approach.

So I suggest adding ITEM_GRE_C_RSVD0_VER taking a 16-bit value like CRKSV,
and complete it with ITEM_GRE_C_BIT, ITEM_GRE_S_BIT and ITEM_GRE_K_BIT
addressing the individual bits you would like to expose for convenience.
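
For reference, a small sketch of how such per-bit tokens would fold into the
spec/mask of c_rsvd0_ver. Bit positions follow the masks used in this thread
(0xb000 covering C, K and S); struct gre_flags and gre_flags_set() are
hypothetical names, not testpmd code:

```c
#include <stdint.h>

/* Host-order positions of the present bits in the 16-bit c_rsvd0_ver
 * field (0x8000 | 0x2000 | 0x1000 == 0xb000). */
#define GRE_C_BIT UINT16_C(0x8000) /* checksum bit (C) */
#define GRE_K_BIT UINT16_C(0x2000) /* key bit (K) */
#define GRE_S_BIT UINT16_C(0x1000) /* sequence number bit (S) */

struct gre_flags {
	uint16_t spec;
	uint16_t mask;
};

/* Hypothetical helper: record "<bit> is <on|off>" in spec/mask. */
static void
gre_flags_set(struct gre_flags *f, uint16_t bit, int on)
{
	f->mask |= bit; /* the bit now takes part in matching */
	if (on)
		f->spec |= bit;
	else
		f->spec &= (uint16_t)~bit;
}
```

Setting K on and C/S off this way yields spec 0x2000 and mask 0xb000, i.e.
the same values as the "crksv is 0x2000 crksv mask 0xb000" example from the
commit log, without users having to work out the bit-mask themselves.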

[...]
> > You should have named this field "value" then, i.e.:
> > 
> >  - ``value {unsigned}``: key value.
> > 
> 
> OK, I'll update it.

Please remember to update it in rte_flow.h and documentation as well,
thanks.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [RFC] ethdev: support input set change by RSS action

2019-07-04 Thread Adrien Mazarguil

On Thu, Jul 04, 2019 at 12:47:09PM +0800, simei wrote:
> From: Simei Su 
> 
> This RFC introduces inputset structure to rte_flow_action_rss to
> support input set specific configuration by rte_flow RSS action.
> 
> We can give an testpmd command line example to make it more clear.
> 
> For example, below flow selects the l4 port as inputset for any
> eth/ipv4/tcp packet: #flow create 0 ingress pattern eth / ipv4 / tcp /
> end actions rss inputset tcp src mask 0x dst mask 0x /end
> 
> Signed-off-by: Simei Su 
> ---
>  lib/librte_ethdev/rte_flow.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index f3a8fb1..2a455b6 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -1796,6 +1796,9 @@ struct rte_flow_action_rss {
>   uint32_t queue_num; /**< Number of entries in @p queue. */
>   const uint8_t *key; /**< Hash key. */
>   const uint16_t *queue; /**< Queue indices to use. */
> + struct rte_flow_item *inputset; /** Provide more specific inputset 
> configuration.
> +  * ignore spec, only mask.
> +  */
>  };
>  
>  /**

To make sure I understand, is this kind of a more flexible version of
rte_flow_action_rss.types?

For instance while specifying .types = ETH_RSS_IPV4 normally covers both
source and destination addresses, does this approach enable users to perform
RSS on source IP only? In which case, what value does the Toeplitz algorithm
assume for the destination, 0x0? (note: must be documented)

My opinion is that, unless you know of hardware that can perform RSS on
random bytes of a packet, this approach is a bit overkill at this point.

How about simply adding the needed ETH_RSS_* definitions
(e.g. ETH_RSS_IPV4_(SRC|DST))? How many are needed?

There are currently 20 used bits and struct rte_flow_action_rss.types is
64-bit wide. I'm sure we can manage something without harming the ABI. Even
better, you wouldn't need a deprecation notice.
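
A sketch of the idea, with made-up names and bit positions (the EX_RSS_*
macros and rss_ipv4_src_only() are assumptions for illustration, not
existing or proposed DPDK definitions):

```c
#include <stdint.h>

/* Hypothetical bit assignments in a 64-bit "types" word. Only the idea
 * of spending spare bits on per-field selection comes from this thread. */
#define EX_RSS_IPV4     (UINT64_C(1) << 2)  /* both addresses, as today */
#define EX_RSS_IPV4_SRC (UINT64_C(1) << 20) /* source address only */
#define EX_RSS_IPV4_DST (UINT64_C(1) << 21) /* destination address only */

/* Hypothetical PMD-side check: was source-only hashing requested? */
static int
rss_ipv4_src_only(uint64_t types)
{
	return (types & EX_RSS_IPV4_SRC) && !(types & EX_RSS_IPV4_DST);
}
```

Existing applications that only set the current flags would keep working
unchanged, since the new bits default to zero.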

If you use the suggested approach, please update testpmd and its
documentation as part of the same patch, thanks.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [RFC] ethdev: support symmetric hash function

2019-07-04 Thread Adrien Mazarguil

On Thu, Jul 04, 2019 at 12:46:07PM +0800, simei wrote:
> From: Simei Su 
> 
> Currently, there are DEFAULT,TOEPLITZ and SIMPLE_XOR hash funtion.
> To support symmetric hash by rte_flow RSS action, this RFC introduces
> SYMMETRIC_TOEPLITZ to rte_eth_hash_function.
> 
> Signed-off-by: Simei Su 
> ---
>  lib/librte_ethdev/rte_flow.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index f3a8fb1..e3c4fe5 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -1744,6 +1744,7 @@ enum rte_eth_hash_function {
>   RTE_ETH_HASH_FUNCTION_DEFAULT = 0,
>   RTE_ETH_HASH_FUNCTION_TOEPLITZ, /**< Toeplitz */
>   RTE_ETH_HASH_FUNCTION_SIMPLE_XOR, /**< Simple XOR */
> + RTE_ETH_HASH_FUNCTION_SYMMETRIC_TOEPLITZ, /**< Symmetric TOEPLITZ */

"Symmetric TOEPLITZ" => "Symmetric Toeplitz."

>   RTE_ETH_HASH_FUNCTION_MAX,
>  };

Other than that, no problem with this change (no ABI impact, no need for
deprecation). Please update testpmd as part of the same patch:

- Wherever "toeplitz" is mentioned in test-pmd/cmdline.c.

- Ditto for flow command, i.e. add ACTION_RSS_FUNC_SYMMETRIC_TOEPLITZ to
  test-pmd/cmdline_flow.c.

- Update "set_hash_global_config" documentation section in
  testpmd_app_ug/testpmd_funcs.rst.

Note to Shahaf/Yongseok, since mlx5 supports both but defaults to symmetric
Toeplitz on vanilla Linux and standard Toeplitz when using OFED, how about
using this chance to make the algorithm configurable as well?

Thanks.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v4 1/4] ethdev: add GRE key field to flow API

2019-07-03 Thread Adrien Mazarguil

On Tue, Jul 02, 2019 at 05:45:52PM +0800, Xiaoyu Min wrote:
> Add new rte_flow_item_gre_key in order to match the optional key field.
> 
> Signed-off-by: Xiaoyu Min 

OK with adding this feature, however I still have a bunch of comments below.

> ---
>  doc/guides/prog_guide/rte_flow.rst | 8 
>  lib/librte_ethdev/rte_flow.c   | 1 +
>  lib/librte_ethdev/rte_flow.h   | 7 +++
>  3 files changed, 16 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst 
> b/doc/guides/prog_guide/rte_flow.rst
> index a34d012e55..f4b7baa3c3 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -980,6 +980,14 @@ Matches a GRE header.
>  - ``protocol``: protocol type.
>  - Default ``mask`` matches protocol only.
>  
> +Item: ``GRE_KEY``
> +^
> +
> +Matches a GRE key field.
> +This should be preceded by item ``GRE``

Nit: missing ending "."

> +
> +- Value to be matched is a big-endian 32 bit integer
> +
>  Item: ``FUZZY``
>  ^^^
>  
> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> index 3277be1edb..f3e56d0bbe 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -55,6 +55,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] 
> = {
>   MK_FLOW_ITEM(NVGRE, sizeof(struct rte_flow_item_nvgre)),
>   MK_FLOW_ITEM(MPLS, sizeof(struct rte_flow_item_mpls)),
>   MK_FLOW_ITEM(GRE, sizeof(struct rte_flow_item_gre)),
> + MK_FLOW_ITEM(GRE_KEY, sizeof(rte_be32_t)),

Hmm? Adding a new item in the middle?

>   MK_FLOW_ITEM(FUZZY, sizeof(struct rte_flow_item_fuzzy)),
>   MK_FLOW_ITEM(GTP, sizeof(struct rte_flow_item_gtp)),
>   MK_FLOW_ITEM(GTPC, sizeof(struct rte_flow_item_gtp)),
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index f3a8fb103f..5d3702a44c 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -289,6 +289,13 @@ enum rte_flow_item_type {
>*/
>   RTE_FLOW_ITEM_TYPE_GRE,
>  
> + /**
> +  * Matches a GRE optional key field.
> +  *
> +  * The value should a big-endian 32bit integer.
> +  */
> + RTE_FLOW_ITEM_TYPE_GRE_KEY,
> +

Same comment. While I understand the intent to group GRE and GRE_KEY, doing
so causes ABI breakage by shifting the value of all subsequent pattern
items (see IPV6 and IPV6_EXT for instance).

We could later decide to sort them while knowingly breaking ABI on purpose,
however right now there's no choice but adding new pattern items and actions
at the end of their respective enums, please do that.
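
To make the problem concrete, a reduced example with three illustrative
enums (names are made up for this sketch and only mimic a small subset of
rte_flow_item_type):

```c
/* Enum as compiled into an existing application (illustrative subset). */
enum item_type_v1 {
	V1_ITEM_GRE,   /* 0 */
	V1_ITEM_FUZZY, /* 1 */
	V1_ITEM_GTP,   /* 2 */
};

/* New entry inserted in the middle: FUZZY and GTP silently change value,
 * so an old binary passing 1 for FUZZY now requests GRE_KEY. */
enum item_type_middle {
	MID_ITEM_GRE,     /* 0 */
	MID_ITEM_GRE_KEY, /* 1 */
	MID_ITEM_FUZZY,   /* 2 */
	MID_ITEM_GTP,     /* 3 */
};

/* New entry appended at the end: all existing values are preserved. */
enum item_type_end {
	END_ITEM_GRE,     /* 0 */
	END_ITEM_FUZZY,   /* 1 */
	END_ITEM_GTP,     /* 2 */
	END_ITEM_GRE_KEY, /* 3 */
};
```

An application built against the first enum keeps working with a library
using the third one, but not with the second.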

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v4 4/4] app/testpmd: match GRE's key and present bits

2019-07-03 Thread Adrien Mazarguil

On Tue, Jul 02, 2019 at 05:45:55PM +0800, Xiaoyu Min wrote:
> support matching on GRE key and present bits (C,K,S)
> 
> example testpmd command could be:
>   testpmd>flow create 0 ingress group 1 pattern eth / ipv4 /
>   gre crksv is 0x2000 crksv mask 0xb000 /
> gre_key key is 0x12345678 / end
> actions rss queues 1 0 end / mark id 196 / end
> 
> Which will match GRE packet with k present bit set and key value is
> 0x12345678.
> 
> Signed-off-by: Xiaoyu Min 

I'm wondering... Is matching the K bit mandatory if one explicitly matches
gre_key already or is this a specific hardware limitation in your case?

Perhaps we could document that the K bit is implicitly matched as "1" in the
default mask when a gre_key pattern item is present. If a user explicitly
sets spec/mask K to "0" and still provides gre_key, the PMD can safely
ignore the gre_key item.

I'm asking because I think most users won't bother with the K bit when
attempting to match some key and their rules may not behave as expected as a
result.

More below.

> ---
> ** This patch is based on patch [1]
> 
> [1] https://patches.dpdk.org/patch/55773/
> ---
>  app/test-pmd/cmdline_flow.c | 32 +
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  4 +++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
> index 201bd9de56..8504cc8bc1 100644
> --- a/app/test-pmd/cmdline_flow.c
> +++ b/app/test-pmd/cmdline_flow.c
> @@ -148,6 +148,9 @@ enum index {
>   ITEM_MPLS_LABEL,
>   ITEM_GRE,
>   ITEM_GRE_PROTO,
> + ITEM_GRE_CRKSV,
> + ITEM_GRE_KEY,
> + ITEM_GRE_KEY_KEY,

Assuming you move the GRE_KEY definition in rte_flow.h, please keep its
location synchronized in this list as well.

>   ITEM_FUZZY,
>   ITEM_FUZZY_THRESH,
>   ITEM_GTP,
> @@ -595,6 +598,7 @@ static const enum index next_item[] = {
>   ITEM_NVGRE,
>   ITEM_MPLS,
>   ITEM_GRE,
> + ITEM_GRE_KEY,
>   ITEM_FUZZY,
>   ITEM_GTP,
>   ITEM_GTPC,
> @@ -755,6 +759,13 @@ static const enum index item_mpls[] = {
>  
>  static const enum index item_gre[] = {
>   ITEM_GRE_PROTO,
> + ITEM_GRE_CRKSV,

CRKSV may be unnecessary in this patch if the K bit is documented and
implemented as described in my previous comment.

> + ITEM_NEXT,
> + ZERO,
> +};
> +
> +static const enum index item_gre_key[] = {
> + ITEM_GRE_KEY_KEY,
>   ITEM_NEXT,
>   ZERO,
>  };
> @@ -1898,6 +1909,27 @@ static const struct token token_list[] = {
>   .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_gre,
>protocol)),
>   },
> + [ITEM_GRE_CRKSV] = {
> + .name = "crksv",
> + .help = "GRE's first word (bit0 - bit15)",
> + .next = NEXT(item_gre, NEXT_ENTRY(UNSIGNED), item_param),
> + .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_gre,
> +  c_rsvd0_ver)),
> + },
> + [ITEM_GRE_KEY] = {
> + .name = "gre_key",
> + .help = "match GRE Key",
> + .priv = PRIV_ITEM(GRE_KEY,
> +   sizeof(rte_be32_t)),

Could be a single line.

> + .next = NEXT(item_gre_key),
> + .call = parse_vc,
> + },
> + [ITEM_GRE_KEY_KEY] = {
> + .name = "key",
> + .help = "GRE key",
> + .next = NEXT(item_gre_key, NEXT_ENTRY(UNSIGNED), item_param),
> + .args = ARGS(ARG_ENTRY_HTON(rte_be32_t)),
> + },
>   [ITEM_FUZZY] = {
>   .name = "fuzzy",
>   .help = "fuzzy pattern match, expect faster than default",
> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst 
> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> index cb83a3ce8a..fc3ba8a009 100644
> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> @@ -3804,6 +3804,10 @@ This section lists supported pattern items and their 
> attributes, if any.
>  
>- ``protocol {unsigned}``: protocol type.
>  
> +- ``gre_key``: match GRE optional key field.
> +
> +  - ``key {unsigned}``: key value.
> +

You should have named this field "value" then, i.e.:

 - ``value {unsigned}``: key value.

>  - ``fuzzy``: fuzzy pattern match, expect faster than default.
>  
>- ``thresh {unsigned}``: accuracy threshold.
> -- 
> 2.21.0
> 

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v10 0/3] add actions to modify header fields

2019-07-02 Thread Adrien Mazarguil

On Tue, Jul 02, 2019 at 05:44:25PM +0300, Dekel Peled wrote:
> Patch [1] implemented set of header modification actions in MLX PMD, based on 
> ethdev and testpmd updates included in [2].
> This series implements support of additional header modification actions, in 
> ethdev, testpmd, and MLX5 PMD.
> 
> Original work by Xiaoyu Min.
> 
> [1] http://patches.dpdk.org/patch/49310/
> [2] http://mails.dpdk.org/archives/dev/2018-August/109672.html
> 
> ---
> v2: apply code review comments.
> v3: apply additional code review comments.
> - Update documentation of new commands.
> - Use common general struct for all commands.
> v4: apply checkpatch comments.
> v5: apply additional code review comments.
> - Add 8, 16, 32 bit types to union.
> - Update struct name and documentation.
> v6: expand description of new struct in h file and commit log.
> v7: - Remove the common general struct with union added in v3 & v5.
> - Commands take a simple integer value, not enclosed in a structure.
> - Use separate commands for INC and DEC with 32 bit unsigned value
>   of type rte_be32_t.
> v8: clean redundant comments refering to removed structure.
> v9: - Send the announcement of new approach (use action with single
>   argument configuration) in separate patch before this series,
>   see http://patches.dpdk.org/patch/55773/.
> - Add PMD release notes update.
> v10: - Reorder release notes update properly.
>  - Update comments for doxygen.

Phew!

Acked-by: Adrien Mazarguil 

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v2] ethdev: support action with any config object type

2019-07-02 Thread Adrien Mazarguil

On Tue, Jul 02, 2019 at 05:17:26PM +0300, Dekel Peled wrote:
> In current implementation, an action which requires parameters
> must accept them enclosed in a structure.
> Some actions require a single, trivial type parameter, but it still
> must be enclosed in a structure.
> This obligation results in multiple, action-specific structures, each
> containing a single trivial type parameter.
> 
> This patch introduces a new approach, allowing an action configuration
> object of any type, trivial or a structure.
> 
> Signed-off-by: Dekel Peled 

Acked-by: Adrien Mazarguil 

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v9 1/3] ethdev: add actions to modify TCP header fields

2019-07-02 Thread Adrien Mazarguil

On Tue, Jul 02, 2019 at 09:52:40AM +, Dekel Peled wrote:
> Thanks, PSB
> 
> > -Original Message-
> > From: Andrew Rybchenko 
> > Sent: Tuesday, July 2, 2019 11:14 AM
> > To: Dekel Peled ; Adrien Mazarguil
> > ; wenzhuo...@intel.com;
> > jingjing...@intel.com; bernard.iremon...@intel.com; Yongseok Koh
> > ; Shahaf Shuler ; Slava
> > Ovsiienko ; arybche...@solarflare.com
> > Cc: dev@dpdk.org; Ori Kam 
> > Subject: Re: [dpdk-dev] [PATCH v9 1/3] ethdev: add actions to modify TCP
> > header fields
> > 
> > On 01.07.2019 18:43, Dekel Peled wrote:
> > > Add actions:
> > > - INC_TCP_SEQ - Increase sequence number in the outermost TCP header.
> > > - DEC_TCP_SEQ - Decrease sequence number in the outermost TCP
> > header.
> > > - INC_TCP_ACK - Increase acknowledgment number in the outermost TCP
> > >   header.
> > > - DEC_TCP_ACK - Decrease acknowledgment number in the outermost TCP
> > >   header.
> > >
> > > Original work by Xiaoyu Min.
> > >
> > > This patch uses the new approach introduced by [1], using a simple
> > > integer instead of using an action-specific structure for each of the
> > > new actions.
> > >
> > > [1]
> > >
> > https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatch
> > >
> > es.dpdk.org%2Fpatch%2F55773%2F&data=02%7C01%7Cdekelp%40mell
> > anox.co
> > >
> > m%7Cae3a2667c3a243a9c1e508d6fec54a22%7Ca652971c7d2e4d9ba6a4d1492
> > 56f461
> > >
> > b%7C0%7C0%7C636976520663069258&sdata=1z3uDQGnnyPZH9NAUuY5
> > 0ZSg3smyZ
> > > nDmc3QZtuNTmyg%3D&reserved=0
> > >
> > > Signed-off-by: Dekel Peled 
> > > Acked-by: Andrew Rybchenko 
> > > ---
> > >   doc/guides/prog_guide/rte_flow.rst | 32
> > 
> > >   lib/librte_ethdev/rte_flow.c   |  4 
> > >   lib/librte_ethdev/rte_flow.h   | 32
> > 
> > >   3 files changed, 68 insertions(+)
> > >
> > > diff --git a/doc/guides/prog_guide/rte_flow.rst
> > > b/doc/guides/prog_guide/rte_flow.rst
> > > index 67deed7..bbe32db 100644
> > > --- a/doc/guides/prog_guide/rte_flow.rst
> > > +++ b/doc/guides/prog_guide/rte_flow.rst
> > > @@ -2346,6 +2346,38 @@ Otherwise, RTE_FLOW_ERROR_TYPE_ACTION
> > error will be returned.
> > >  | ``mac_addr`` | MAC address   |
> > >  +--+---+
> > >
> > > +Action: ``INC_TCP_SEQ``
> > > +^^^
> > > +
> > > +Increase sequence number in the outermost TCP header.
> > > +Value to increase TCP sequence number by is a big-endian 32 bit integer.
> > > +
> > > +Using this action on non-matching traffic will result in undefined 
> > > behavior.
> > > +
> > > +Action: ``DEC_TCP_SEQ``
> > > +^^^
> > > +
> > > +Decrease sequence number in the outermost TCP header.
> > > +Value to decrease TCP sequence number by is a big-endian 32 bit integer.
> > > +
> > > +Using this action on non-matching traffic will result in undefined 
> > > behavior.
> > > +
> > > +Action: ``INC_TCP_ACK``
> > > +^^^
> > > +
> > > +Increase acknowledgment number in the outermost TCP header.
> > > +Value to increase TCP acknowledgment number by is a big-endian 32 bit
> > integer.
> > > +
> > > +Using this action on non-matching traffic will result in undefined 
> > > behavior.
> > > +
> > > +Action: ``DEC_TCP_ACK``
> > > +^^^
> > > +
> > > +Decrease acknowledgment number in the outermost TCP header.
> > > +Value to decrease TCP acknowledgment number by is a big-endian 32 bit
> > integer.
> > > +
> > > +Using this action on non-matching traffic will result in undefined 
> > > behavior.
> > > +
> > >   Negative types
> > >   ~~
> > >
> > > diff --git a/lib/librte_ethdev/rte_flow.c
> > > b/lib/librte_ethdev/rte_flow.c index 3277be1..0c9f6c6 100644
> > > --- a/lib/librte_ethdev/rte_flow.c
> > > +++ b/lib/librte_ethdev/rte_flow.c
> > > @@ -143,6 +143,10 @@ struct rte_flow_desc_data {
> > >   MK_FLOW_ACTION(SET_TTL, sizeof(struct rte_flow_action_set_ttl)),
> > >   MK_FLOW_ACTION(SET_MAC_SRC, sizeof(struct
> > rte_flow_

Re: [dpdk-dev] [PATCH] ethdev: support action with any config object type

2019-07-02 Thread Adrien Mazarguil

On Tue, Jul 02, 2019 at 08:42:41AM +, Dekel Peled wrote:
> Thanks, PSB.
> 
> > -Original Message-
> > From: Andrew Rybchenko 
> > Sent: Tuesday, July 2, 2019 11:09 AM
> > To: Dekel Peled ; Adrien Mazarguil
> > ; wenzhuo...@intel.com;
> > jingjing...@intel.com; bernard.iremon...@intel.com; Yongseok Koh
> > ; Shahaf Shuler ; Slava
> > Ovsiienko ; arybche...@solarflare.com
> > Cc: dev@dpdk.org; Ori Kam 
> > Subject: Re: [dpdk-dev] [PATCH] ethdev: support action with any config
> > object type
> > 
> > On 01.07.2019 17:10, Dekel Peled wrote:
> > > In current implementation, an action which requires parameters must
> > > accept them enclosed in a structure.
> > > Some actions require a single, trivial type parameter, but it still
> > > must be enclosed in a structure.
> > > This obligation results in multiple, action-specific structures, each
> > > containing a single trivial type parameter.
> > >
> > > This patch introduces a new approach, allowing an action configuration
> > > object of any type, trivial or a structure.
> > >
> > > This patch introduces, in test-pmd, a new macro ARG_ENTRY_HTON, to
> > > allow using a single argument, not enclosed in a structure.
> > >
> > > Signed-off-by: Dekel Peled 
> > 
> > The term "object" confuses me a bit, but I'm not a native speaker so it 
> > could
> > be just my wrong association. I'd prefer "configuration data".
> 
> In previous version I wrote just "action configuration", and changed to 
> "action configuration object" per Adrien's suggestion. I think it is better, 
> but if it causes confusion maybe it should be changed.
> 
> Adrien, what do you think? Does "configuration data" carry the correct 
> meaning?

Well I'm no native speaker either but "object" is the term used in the C
standard with a well-defined meaning [1] and encompasses everything
(integers, floats, structures, unions, functions, pointers, arrays):

 "region of data storage in the execution environment, the contents of which
  can represent values"

I think it's a bit less vague than "data" because whenever objects are
mentioned in the standard, they always have a type. There's no such thing as
a C object without one, and rte_flow puts a lot of emphasis on documenting
them.

 int foo;
 struct { ... } foo;
 double foo;
 char foo[];
 void *foo;
 
Whatever the type, would you refer to "foo" itself as an "object" or as
"data"?
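
To make the point concrete, here is a minimal sketch of an action whose
"conf" pointer designates an object of any type, either a plain integer or
a structure. All example_* names are hypothetical stand-ins for
illustration, not the actual rte_flow definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real rte_flow definitions. */
enum example_action_type {
	EXAMPLE_ACTION_INC_SEQ,  /* conf points to a uint32_t */
	EXAMPLE_ACTION_SET_PAIR, /* conf points to a struct example_pair */
};

struct example_pair {
	uint16_t a;
	uint16_t b;
};

struct example_action {
	enum example_action_type type;
	const void *conf; /* Pointer to action configuration object. */
};

/* Interpret the configuration object according to the action type. */
static uint32_t
example_action_value(const struct example_action *action)
{
	assert(action != NULL && action->conf != NULL);
	switch (action->type) {
	case EXAMPLE_ACTION_INC_SEQ:
		/* The object is a trivial type, no wrapper structure. */
		return *(const uint32_t *)action->conf;
	case EXAMPLE_ACTION_SET_PAIR: {
		const struct example_pair *pair = action->conf;

		return ((uint32_t)pair->a << 16) | pair->b;
	}
	}
	return 0;
}
```

In both cases the thing "conf" points to has a type, which is what the
action documentation has to spell out.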

Unrelated, but you must remove ARG_ENTRY_HTON from this patch since there's
no testpmd change in there that requires it. There's no tolerance for dead
code in testpmd as it doesn't expose an API.

Thanks.

[1] 3.14 "object"
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v8 1/3] ethdev: add actions to modify TCP header fields

2019-07-01 Thread Adrien Mazarguil

On Sun, Jun 30, 2019 at 10:59:08AM +0300, Dekel Peled wrote:
> Add actions:
> - INC_TCP_SEQ - Increase sequence number in the outermost TCP header.
> - DEC_TCP_SEQ - Decrease sequence number in the outermost TCP header.
> - INC_TCP_ACK - Increase acknowledgment number in the outermost TCP
>   header.
> - DEC_TCP_ACK - Decrease acknowledgment number in the outermost TCP
>   header.
> 
> Original work by Xiaoyu Min.
> 
> This patch introduces a new approach, using a simple integer instead
> of using an action-specific structure for each of these actions.
> This approach can be later applied to modify existing actions which
> require only a single integer.
> 
> Signed-off-by: Dekel Peled 
> Acked-by: Andrew Rybchenko 

You didn't take Andrew's comment [1] into account; this patch must be
split. I'll highlight what needs to be moved to a pre-patch below.

[1] https://mails.dpdk.org/archives/dev/2019-June/136101.html

[...]
> diff --git a/doc/guides/prog_guide/rte_flow.rst 
> b/doc/guides/prog_guide/rte_flow.rst
> index a34d012..783a904 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -1214,7 +1214,8 @@ Actions
>  ~~~
>  
>  Each possible action is represented by a type. Some have associated
> -configuration structures. Several actions combined in a list can be assigned
> +configuration structures, some others use a simple integer.
> +Several actions combined in a list can be assigned
>  to a flow rule and are performed in order.

 This must be moved to a separate patch 

BTW, how about "configuration structure" -> "configuration object"
encompassing all kinds of objects once and for all instead? Such a generic
term will be handy when actions start using floats or function pointers.

[...]
>  /**
> @@ -2140,7 +2172,7 @@ struct rte_flow_action_set_mac {
>   */
>  struct rte_flow_action {
>   enum rte_flow_action_type type; /**< Action type. */
> - const void *conf; /**< Pointer to action configuration structure. */
> + const void *conf; /**< Pointer to action configuration. */
>  };

 This must be moved to a separate patch 

Same comment regarding "configuration object".

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH 1/2] eal: fix duplicate experimental tag

2019-07-01 Thread Adrien Mazarguil

On Fri, Jun 28, 2019 at 06:23:19PM +0200, David Marchand wrote:
> On Fri, Jun 28, 2019 at 5:57 PM Adrien Mazarguil 
> wrote:
> 
> > Its presence on the function prototype in the header file is enough.
> >
> > Fixes: 5f4ed3f05849 ("eal: introduce random generator with upper bound")
> > Cc: Mattias Rönnblom 
> >
> > Signed-off-by: Adrien Mazarguil 
> > ---
> >  lib/librte_eal/common/rte_random.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_eal/common/rte_random.c
> > b/lib/librte_eal/common/rte_random.c
> > index 3d9b9b7d8..f85119048 100644
> > --- a/lib/librte_eal/common/rte_random.c
> > +++ b/lib/librte_eal/common/rte_random.c
> > @@ -137,7 +137,7 @@ rte_rand(void)
> > return __rte_rand_lfsr258(state);
> >  }
> >
> > -uint64_t __rte_experimental
> > +uint64_t
> >  rte_rand_max(uint64_t upper_bound)
> >  {
> > struct rte_rand_state *state;
> > --
> > 2.11.0
> >
> 
> I had mentioned to Thomas I would do an extra pass, thanks for already
> catching this one.
> Do you mind if I squash this in my series ?

No problem, this series was tailored for this exact purpose, thanks for v2!

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v7 1/3] ethdev: add actions to modify TCP header fields

2019-06-28 Thread Adrien Mazarguil

On Thu, Jun 27, 2019 at 08:54:57PM +0300, Andrew Rybchenko wrote:
> On 6/27/19 8:39 PM, Dekel Peled wrote:
> > Add actions:
> > - INC_TCP_SEQ - Increase sequence number in the outermost TCP header.
> > - DEC_TCP_SEQ - Decrease sequence number in the outermost TCP header.
> > - INC_TCP_ACK - Increase acknowledgment number in the outermost TCP
> > header.
> > - DEC_TCP_ACK - Decrease acknowledgment number in the outermost TCP
> > header.
> > 
> > Original work by Xiaoyu Min.
> > 
> > This patch introduces a new approach, using a simple integer instead
> > of using an action-specific structure for each of these actions.
> > This approach can be later applied to modify existing actions which
> > require only a single integer.
> 
> If we allow it, may be we should fix at least experimental API and
> remove dummy structures.
>
> I think ideally it should be a pre-patch which allows to avoid structures.
> Right now it is a mixture of two logical changes.

Yep, I agree we need to split patches and affected experimental APIs should
be modified as well, although the latter part can be done in a third patch,
possibly a whole separate series since a number of "fix" patches might be
needed. This series has been waiting long enough already.

> > Signed-off-by: Dekel Peled 
> 
> A nit below (plus above), other than that
> Acked-by: Andrew Rybchenko 

Ditto, otherwise looks good to me too, we're almost there!

Dekel: this looks so much cleaner without those pesky structures :)

> [...]
> 
> > diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> > index f3a8fb1..8c962d0 100644
> > --- a/lib/librte_ethdev/rte_flow.h
> > +++ b/lib/librte_ethdev/rte_flow.h
> > @@ -1650,6 +1650,46 @@ enum rte_flow_action_type {
> >  * See struct rte_flow_action_set_mac.
> >  */
> > RTE_FLOW_ACTION_TYPE_SET_MAC_DST,
> > +
> > +   /**
> > +* Increase sequence number in the outermost TCP header.
> > +*
> > +* Using this action on non-matching traffic will result in
> > +* undefined behavior.
> > +*
> > +* See struct rte_flow_integer_action.
> 
> There is no  rte_flow_integer_action, please, fix.
> 
> [...]

-- 
Adrien Mazarguil
6WIND

[dpdk-dev] [PATCH 1/2] eal: fix duplicate experimental tag

2019-06-28 Thread Adrien Mazarguil

Its presence on the function prototype in the header file is enough.

Fixes: 5f4ed3f05849 ("eal: introduce random generator with upper bound")
Cc: Mattias Rönnblom 

Signed-off-by: Adrien Mazarguil 
---
 lib/librte_eal/common/rte_random.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/rte_random.c 
b/lib/librte_eal/common/rte_random.c
index 3d9b9b7d8..f85119048 100644
--- a/lib/librte_eal/common/rte_random.c
+++ b/lib/librte_eal/common/rte_random.c
@@ -137,7 +137,7 @@ rte_rand(void)
return __rte_rand_lfsr258(state);
 }
 
-uint64_t __rte_experimental
+uint64_t
 rte_rand_max(uint64_t upper_bound)
 {
struct rte_rand_state *state;
-- 
2.11.0

[dpdk-dev] [PATCH 2/2] Fix __rte_experimental clutter

2019-06-28 Thread Adrien Mazarguil

Rather than prefixing the return type of function prototypes with
__rte_experimental, move it to a separate line to enhance readability.

Except for checkpatches.sh, this patch was automatically generated by:

 sed -i \
 -e '/^\([^#].*\)\?__rte_experimental */{' \
 -e 's//\1/; s/ *$//; i\' \
 -e __rte_experimental \
 -e '/^$/d}' \
 $(git grep -l __rte_experimental -- '*.h')

And applies on top of below commit:

Fixes: 3b45414830ff ("enforce __rte_experimental at the start of symbol declarations")
Cc: David Marchand 

Signed-off-by: Adrien Mazarguil 
---
 devtools/checkpatches.sh|  6 +-
 drivers/net/ixgbe/rte_pmd_ixgbe.h   | 15 ++--
 drivers/net/softnic/rte_eth_softnic.h   |  3 +-
 lib/librte_bbdev/rte_bbdev.h| 72 --
 lib/librte_bbdev/rte_bbdev_op.h | 18 +++--
 lib/librte_bbdev/rte_bbdev_pmd.h| 12 ++-
 lib/librte_bpf/rte_bpf.h| 18 +++--
 lib/librte_bpf/rte_bpf_ethdev.h | 12 ++-
 lib/librte_compressdev/rte_comp.h   | 18 +++--
 lib/librte_compressdev/rte_compressdev.h| 66 +++--
 lib/librte_compressdev/rte_compressdev_pmd.h| 18 +++--
 lib/librte_cryptodev/rte_cryptodev.h| 42 +++
 .../common/include/arch/x86/rte_atomic_64.h |  3 +-
 .../common/include/generic/rte_atomic.h |  3 +-
 .../common/include/generic/rte_cycles.h |  3 +-
 .../common/include/generic/rte_rwlock.h |  6 +-
 .../common/include/generic/rte_ticketlock.h | 27 ---
 lib/librte_eal/common/include/rte_dev.h | 27 ---
 lib/librte_eal/common/include/rte_eal.h | 18 +++--
 lib/librte_eal/common/include/rte_fbarray.h | 78 +---
 lib/librte_eal/common/include/rte_interrupts.h  |  3 +-
 lib/librte_eal/common/include/rte_lcore.h   |  6 +-
 lib/librte_eal/common/include/rte_malloc.h  | 30 +---
 lib/librte_eal/common/include/rte_memory.h  | 72 --
 lib/librte_eal/common/include/rte_random.h  |  3 +-
 lib/librte_eal/common/include/rte_service.h |  9 ++-
 lib/librte_ethdev/rte_ethdev.h  | 36 ++---
 lib/librte_ethdev/rte_ethdev_driver.h   | 15 ++--
 lib/librte_ethdev/rte_flow_driver.h |  3 +-
 lib/librte_ethdev/rte_mtr.h | 36 ++---
 lib/librte_eventdev/rte_event_eth_rx_adapter.h  |  6 +-
 lib/librte_flow_classify/rte_flow_classify.h| 21 --
 lib/librte_hash/rte_hash.h  |  3 +-
 lib/librte_ip_frag/rte_ip_frag.h|  3 +-
 lib/librte_ipsec/rte_ipsec.h|  9 ++-
 lib/librte_ipsec/rte_ipsec_group.h  |  6 +-
 lib/librte_ipsec/rte_ipsec_sa.h | 12 ++-
 lib/librte_kni/rte_kni.h|  3 +-
 lib/librte_mbuf/rte_mbuf.h  |  9 ++-
 lib/librte_meter/rte_meter.h| 18 +++--
 lib/librte_net/rte_arp.h|  3 +-
 lib/librte_net/rte_net.h|  3 +-
 lib/librte_pipeline/rte_port_in_action.h| 24 --
 lib/librte_pipeline/rte_table_action.h  | 48 
 lib/librte_power/rte_power_empty_poll.h | 21 --
 lib/librte_rcu/rte_rcu_qsbr.h   | 39 ++
 lib/librte_sched/rte_sched.h|  3 +-
 lib/librte_security/rte_security.h  |  9 ++-
 lib/librte_stack/rte_stack.h| 21 --
 lib/librte_stack/rte_stack_lf.h |  6 +-
 lib/librte_stack/rte_stack_std.h|  9 ++-
 lib/librte_table/rte_table_hash_func.h  | 27 ---
 lib/librte_telemetry/rte_telemetry.h|  9 ++-
 lib/librte_telemetry/rte_telemetry_parser.h |  3 +-
 lib/librte_timer/rte_timer.h| 24 --
 lib/librte_vhost/rte_vdpa.h | 21 --
 lib/librte_vhost/rte_vhost.h| 33 ++---
 lib/librte_vhost/rte_vhost_crypto.h | 15 ++--
 58 files changed, 723 insertions(+), 363 deletions(-)

diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
index 25e3cc56c..e39bb5e21 100755
--- a/devtools/checkpatches.sh
+++ b/devtools/checkpatches.sh
@@ -95,9 +95,9 @@ check_experimental_tags() { # 
print "Please only put __rte_experimental tags in 
headers ("current_file")";
ret = 1;
}
-   if ($1 != "+__rte_experimental" &&
-   ($1 != "+" || $2 != "__rte_experimental")) {
-   print "__rte_experimental must be at the start of 
functions prototype.";
+   if ($1 != "+__rte_experimental" || $2 != "") {
+   print "__rte_experimental must appear

[dpdk-dev] [PATCH 0/2] Fix remaining issues with __rte_experimental

2019-06-28 Thread Adrien Mazarguil

This is a follow-up series to David's [1], addressing one remaining minor
issue and moving __rte_experimental on its own line where it belongs.

[1] "[PATCH 0/9] experimental tags fixes"
https://mails.dpdk.org/archives/dev/2019-June/136009.html

Adrien Mazarguil (2):
  eal: fix duplicate experimental tag
  Fix __rte_experimental clutter

 devtools/checkpatches.sh|  6 +-
 drivers/net/ixgbe/rte_pmd_ixgbe.h   | 15 ++--
 drivers/net/softnic/rte_eth_softnic.h   |  3 +-
 lib/librte_bbdev/rte_bbdev.h| 72 --
 lib/librte_bbdev/rte_bbdev_op.h | 18 +++--
 lib/librte_bbdev/rte_bbdev_pmd.h| 12 ++-
 lib/librte_bpf/rte_bpf.h| 18 +++--
 lib/librte_bpf/rte_bpf_ethdev.h | 12 ++-
 lib/librte_compressdev/rte_comp.h   | 18 +++--
 lib/librte_compressdev/rte_compressdev.h| 66 +++--
 lib/librte_compressdev/rte_compressdev_pmd.h| 18 +++--
 lib/librte_cryptodev/rte_cryptodev.h| 42 +++
 .../common/include/arch/x86/rte_atomic_64.h |  3 +-
 .../common/include/generic/rte_atomic.h |  3 +-
 .../common/include/generic/rte_cycles.h |  3 +-
 .../common/include/generic/rte_rwlock.h |  6 +-
 .../common/include/generic/rte_ticketlock.h | 27 ---
 lib/librte_eal/common/include/rte_dev.h | 27 ---
 lib/librte_eal/common/include/rte_eal.h | 18 +++--
 lib/librte_eal/common/include/rte_fbarray.h | 78 +---
 lib/librte_eal/common/include/rte_interrupts.h  |  3 +-
 lib/librte_eal/common/include/rte_lcore.h   |  6 +-
 lib/librte_eal/common/include/rte_malloc.h  | 30 +---
 lib/librte_eal/common/include/rte_memory.h  | 72 --
 lib/librte_eal/common/include/rte_random.h  |  3 +-
 lib/librte_eal/common/include/rte_service.h |  9 ++-
 lib/librte_eal/common/rte_random.c  |  2 +-
 lib/librte_ethdev/rte_ethdev.h  | 36 ++---
 lib/librte_ethdev/rte_ethdev_driver.h   | 15 ++--
 lib/librte_ethdev/rte_flow_driver.h |  3 +-
 lib/librte_ethdev/rte_mtr.h | 36 ++---
 lib/librte_eventdev/rte_event_eth_rx_adapter.h  |  6 +-
 lib/librte_flow_classify/rte_flow_classify.h| 21 --
 lib/librte_hash/rte_hash.h  |  3 +-
 lib/librte_ip_frag/rte_ip_frag.h|  3 +-
 lib/librte_ipsec/rte_ipsec.h|  9 ++-
 lib/librte_ipsec/rte_ipsec_group.h  |  6 +-
 lib/librte_ipsec/rte_ipsec_sa.h | 12 ++-
 lib/librte_kni/rte_kni.h|  3 +-
 lib/librte_mbuf/rte_mbuf.h  |  9 ++-
 lib/librte_meter/rte_meter.h| 18 +++--
 lib/librte_net/rte_arp.h|  3 +-
 lib/librte_net/rte_net.h|  3 +-
 lib/librte_pipeline/rte_port_in_action.h| 24 --
 lib/librte_pipeline/rte_table_action.h  | 48 
 lib/librte_power/rte_power_empty_poll.h | 21 --
 lib/librte_rcu/rte_rcu_qsbr.h   | 39 ++
 lib/librte_sched/rte_sched.h|  3 +-
 lib/librte_security/rte_security.h  |  9 ++-
 lib/librte_stack/rte_stack.h| 21 --
 lib/librte_stack/rte_stack_lf.h |  6 +-
 lib/librte_stack/rte_stack_std.h|  9 ++-
 lib/librte_table/rte_table_hash_func.h  | 27 ---
 lib/librte_telemetry/rte_telemetry.h|  9 ++-
 lib/librte_telemetry/rte_telemetry_parser.h |  3 +-
 lib/librte_timer/rte_timer.h| 24 --
 lib/librte_vhost/rte_vdpa.h | 21 --
 lib/librte_vhost/rte_vhost.h| 33 ++---
 lib/librte_vhost/rte_vhost_crypto.h | 15 ++--
 59 files changed, 724 insertions(+), 364 deletions(-)

-- 
2.11.0

Re: [dpdk-dev] [PATCH 9/9] enforce __rte_experimental at the start of symbol declarations

2019-06-27 Thread Adrien Mazarguil

Hey David,

On Thu, Jun 27, 2019 at 01:33:55PM +0200, David Marchand wrote:
> Putting a '__attribute__((deprecated))' in the middle of a function
> prototype does not result in the expected result with gcc (while clang
> is fine with this syntax).
> 
> $ cat deprecated.c
> void * __attribute__((deprecated)) incorrect() { return 0; }
> __attribute__((deprecated)) void *correct(void) { return 0; }
> int main(int argc, char *argv[]) { incorrect(); correct(); return 0; }
> $ gcc -o deprecated.o -c deprecated.c
> deprecated.c: In function ‘main’:
> deprecated.c:3:1: warning: ‘correct’ is deprecated (declared at
> deprecated.c:2) [-Wdeprecated-declarations]
>  int main(int argc, char *argv[]) { incorrect(); correct(); return 0; }
>  ^
> 
> Let's enforce the tag is at the very start of the lines, this is not
> perfect but we will trust reviewers to catch the other not so easy to
> detect patterns.
> 
> tag=__rte_experimental
> git grep -l [^^]$tag |grep \\.h$ |while read file; do
>   [ "$file" != 'lib/librte_eal/common/include/rte_compat.h' ] ||
>   continue
>   sed -i -e 's#^\(.*\)  *'$tag'#'$tag' \1#' $file
>   sed -i -e 's#^\(..*\)'$tag'#'$tag' \1#' $file
> done

Just a suggestion, how about putting __rte_experimental on its own line
before the actual prototype? So that instead of:

 __rte_experimental struct rte_compressdev *
 rte_compressdev_pmd_get_named_dev(const char *name);

We'd get:

 __rte_experimental
 struct rte_compressdev *
 rte_compressdev_pmd_get_named_dev(const char *name);

I personally find the latter much more readable.
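
As a self-contained illustration of that layout, here is a minimal sketch.
The example_experimental macro is a hypothetical stand-in that merely
expands to the deprecated attribute used in your test program; it is not
the real __rte_experimental definition, and the function names are made up:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tag standing in for __rte_experimental. */
#define example_experimental __attribute__((deprecated))

/* Tag on its own line, before the return type of the prototype. */
example_experimental
const char *
example_get_name(void);

/* The tag is not repeated on the definition. */
const char *
example_get_name(void)
{
	return "example";
}

/* Callers knowingly using the tagged symbol silence the warning locally. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
static size_t
example_name_length(void)
{
	const char *name = example_get_name();

	assert(name != NULL);
	return strlen(name);
}
#pragma GCC diagnostic pop
```

Without the pragma, any caller of example_get_name() should get a
deprecation warning since the attribute comes first in the declaration.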

Here's the relevant sed expression:

 sed -i \
 -e '/^\([^#].*\)__rte_experimental */{' \
 -e 's//\1/; s/ *$//; i\' \
 -e __rte_experimental \
 -e '}' \
 $(git grep -l __rte_experimental -- '*.h')

Otherwise this series looks good to me.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH] build: enable BSD features visibility for FreeBSD

2019-05-14 Thread Adrien Mazarguil

On Tue, May 14, 2019 at 01:43:54PM +0200, Marcin Smoczynski wrote:
> When a component uses either XOPEN_SOURCE or POSIX_C_SOURCE macro
> explicitly in its build recipe, it restricts visibility of a non POSIX
> features subset, such as IANA protocol numbers (IPPROTO_* macros).
> Non standard features are enabled by default for DPDK both for Linux
> thanks to _GNU_SOURCE and for FreeBSD thanks to __BSD_VISIBLE. However
> using XOPEN_SOURCE or POSIX_(C_)SOURCE in a component causes
> __BSD_VISIBLE to be defined to 0 for FreeBSD, causing different feature
> sets visibility for Linux and FreeBSD. It restricts from using IPPROTO
> macros in public headers, such as rte_ip.h, despite the fact they are
> already widely used in sources.
> 
> Add __BSD_VISIBLE macro specified unconditionally for FreeBSD targets
> which enforces feature sets visibility unification between Linux and
> FreeBSD.
> 
> This patch solves the problem of build breaks for [1] on FreeBSD [2]
> following the discussion [3].
> 
> [1] https://mails.dpdk.org/archives/dev/2019-May/131885.html
> [2] http://mails.dpdk.org/archives/test-report/2019-May/082263.html
> [3] https://mails.dpdk.org/archives/dev/2019-May/132110.html
> 
> Signed-off-by: Marcin Smoczynski 

Thanks!

Acked-by: Adrien Mazarguil 

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] Using _XOPEN_SOURCE macros may break builds on FreeBSD

2019-05-14 Thread Adrien Mazarguil

> > system, I think the blame is on DPDK.
> > >
> > > > > I think this reason is
> > > > > enough to go with -D__BSD_VISIBLE under FreeBSD without removing
> > > > > _XOPEN_SOURCE, as it should work regardless.
> > > >
> > > > So do you suggest to add '-D __BSD_VISIBLE'  into mlx/failsafe PMDs
> > > > Makefiles/meson.build, or ... ?
> > >
> > > Since headers of our public API potentially require it, it must be
> > > defined globally (unlike _XOPEN_SOURCE which is only local to a few
> > PMDs):
> > > app/meson.build, lib/meson.build, mk/target/generic/rte.vars.mk,
> > > alongside -D_GNU_SOURCE.
> > >
> > > Add it to mlx*/failsafe only if that's not enough. Just make sure
> > > applications inherit this flag.
> >
> > Ok, to summarize, your suggestion is:
> > 1. remove -D_XOPEN_SOURCE=... from mlx and failsafe PMDs.
> > 2. add '-D __BSD_VISIBLE'  into top level make/meson files
> > (app/meson.build, lib/meson.build, mk/target/generic/rte.vars.mk) Similar
> > to what we are doing for -D_GNU_SOURCE.
> >
> > If I understand you correctly, then it sounds ok to me.
> >
> > >
> > > > > Looking at the patch [1], I also think there's another, simpler 
> > > > > approach:
> > > > > unless really performance critical, defining
> > > > > rte_ipv6_get_next_ext() in rte_net.c instead of a static inline in
> > rte_ip.h should address this issue.
> > > >
> > > > It is performance critical, and I think that function call for each
> > > > ext header is a way too expensive approach.
> > > > Will prefer to keep that function inline.
> > >
> > > OK, a bit cumbersome but since we're heading this way [2], how about
> > > defining our own instead of all the above?
> > >
> > >  #define RTE_IPPROTO_HOPOPTS 0
> > >  #define RTE_IPPROTO_ROUTING 43
> > >  ...
> > >
> > > Which could prove handy later as it appears Linux and FreeBSD don't
> > > have the same set of available IPPROTO_* definitions.
> > >
> > > Thoughts?
> > >
> > > [2] "[RFC v2 00/14] prefix network structures"
> > > https://mails.dpdk.org/archives/dev/2019-April/129752.html
> >
> > Yep, that's definitely an option too.
> > But if we going to replace all current references of IPPROTO_ inside DPDK to
> > RTE_IPPROTO_ - the change will be massive.
> > And for sure it is out of scope of this patch series.
> > That's probably need to be done after Olivier RFC will be in and should be
> > subject of a separate patch series.
> > Konstantin
> 
> I agree that we need RTE_IPPROTO* macros but as Konstantin pointed out this
> would be a huge change and we should do that on top of Oliver's work in
> a separate patch set.
> 
> I will propose a patch set with:
> 1. Removed XOPEN_SOURCE macros as they are not needed anymore
> 2. Added BSD_VISIBLE at the top of build system.

Actually I still suggest leaving the existing _XOPEN_SOURCE for users of
-std=whatever, even if covered globally by _GNU_SOURCE and __BSD_VISIBLE.
I think it's useful as a reminder that they did their homework since this
macro is itself standard.

Regarding RTE_IPPROTO*, my suggestion wasn't to convert DPDK entirely, only
to add the missing ones so far only needed by this patch. Given their values
are defined by RFCs, they should be fully compatible and interchangeable
with system definitions.
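
A minimal sketch of what I mean, assuming only the two definitions quoted
earlier in this thread; the helper name is hypothetical. Where the system
exposes the corresponding IPPROTO_* names, the build checks that both sets
agree:

```c
#include <assert.h>
#include <netinet/in.h>

/* Local definitions; values are the IANA protocol numbers. */
#define RTE_IPPROTO_HOPOPTS 0
#define RTE_IPPROTO_ROUTING 43

/* Hypothetical helper relying only on the local definitions. */
static int
example_is_known_ext(int proto)
{
	return proto == RTE_IPPROTO_HOPOPTS ||
	       proto == RTE_IPPROTO_ROUTING;
}

/* When system definitions are visible, they must be interchangeable. */
#ifdef IPPROTO_HOPOPTS
_Static_assert(RTE_IPPROTO_HOPOPTS == IPPROTO_HOPOPTS,
	       "RTE_IPPROTO_HOPOPTS differs from system value");
#endif
#ifdef IPPROTO_ROUTING
_Static_assert(RTE_IPPROTO_ROUTING == IPPROTO_ROUTING,
	       "RTE_IPPROTO_ROUTING differs from system value");
#endif
```

When the system names are hidden, the #ifdef blocks simply vanish and the
helper keeps working.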

I'm fine with either approach in any case.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [RFC] ethdev: add GRE optional fields to flow API

2019-05-14 Thread Adrien Mazarguil

On Tue, May 14, 2019 at 10:34:22AM +0300, Andrew Rybchenko wrote:
> On 5/14/19 10:18 AM, Xiaoyu Min wrote:
> > Add GRE's checksum, key, and sequence field to the
> > struct rte_flow_item_gre in order to match.
> > 
> > Signed-off-by: Xiaoyu Min 
> > ---
> >   lib/librte_ethdev/rte_flow.h | 4 
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> > index 63f84fca65..fb04af3268 100644
> > --- a/lib/librte_ethdev/rte_flow.h
> > +++ b/lib/librte_ethdev/rte_flow.h
> > @@ -847,6 +847,10 @@ struct rte_flow_item_gre {
> >  */
> > rte_be16_t c_rsvd0_ver;
> > rte_be16_t protocol; /**< Protocol type. */
> > +   rte_be16_t checksum; /**< chksum for the header and payload, optional.*/
> > +   rte_be16_t rsvd1; /**< present when C bit is set, optional. */
> > +   rte_be32_t key; /**< application specific key value, optional. */
> > +   rte_be32_t sequence; /**< sequence num for the GRE packet, optional. */
> >   };
> >   /** Default mask for RTE_FLOW_ITEM_TYPE_GRE. */
> 
> What is the purpose to match checksum, reserved and sequence number?

I think it's not really an issue, this structure only describes a packet
header as found on the wire like other pattern items; rte_flow users only
have to provide a mask to select the fields to be matched.
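
For reference, a minimal sketch of how spec and mask combine when matching
a header. This is a generic illustration with a hypothetical function name,
not code taken from rte_flow or any PMD:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 when the packet header matches spec on every bit selected by
 * mask, 0 otherwise. All three buffers are len bytes long.
 */
static int
example_masked_match(const uint8_t *hdr, const uint8_t *spec,
		     const uint8_t *mask, size_t len)
{
	size_t i;

	assert(hdr != NULL && spec != NULL && mask != NULL);
	for (i = 0; i != len; ++i)
		if ((hdr[i] ^ spec[i]) & mask[i])
			return 0;
	return 1;
}
```

Fields whose mask bits are all zero are ignored, which is why extra fields
in the structure do not force anyone to match them.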

However you can't just modify an existing public structure without going
through the lengthy API/ABI deprecation/versioning process.

The reason these fields were not initially part of rte_flow_item_gre is that
each of them is optional, meaning the GRE header has variable length.
They should be handled through separate objects like IPv6 options (struct
rte_flow_item_ipv6_ext), ARP (struct rte_flow_item_arp_eth_ipv4) or ICMPv6
neighbor discovery (struct rte_flow_item_icmp6_nd_opt), either all together
e.g.:

 RTE_FLOW_ITEM_TYPE_GRE_OPTS

 struct rte_flow_item_gre_opts {
 rte_be16_t checksum; /**< Checksum for GRE header and payload (C bit). */
 rte_be16_t rsvd1; /**< Reserved bits (C bit). */
 rte_be32_t key; /**< Application specific key value (K bit). */
 rte_be32_t sequence; /**< Sequence number for GRE packet (S bit). */
 };

Or separately; since I guess only the key matters, there is no need to define
the others:

 RTE_FLOW_ITEM_TYPE_GRE_KEY

 struct rte_flow_item_gre_key {
 rte_be32_t key; /**< Application specific key value (K bit). */
 };

In both cases, the default mask for this object should cover "key". Make
sure to update documentation and testpmd in the same patch.
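
To show why the header has variable length, here is a minimal sketch that
computes the GRE header size and the offset of the key from the flag bits.
The example_* and EXAMPLE_* names are hypothetical; the bit positions are
those of the C, K and S flags in the first 16 bits of the header, taken in
host byte order:

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits in the first 16 bits of the GRE header, host byte order. */
#define EXAMPLE_GRE_C_BIT 0x8000 /* checksum + reserved1 present */
#define EXAMPLE_GRE_K_BIT 0x2000 /* key present */
#define EXAMPLE_GRE_S_BIT 0x1000 /* sequence number present */

/* Total GRE header length in bytes for the given flags. */
static unsigned int
example_gre_hdr_len(uint16_t flags)
{
	unsigned int len = 4; /* c_rsvd0_ver + protocol */

	if (flags & EXAMPLE_GRE_C_BIT)
		len += 4; /* checksum + rsvd1 */
	if (flags & EXAMPLE_GRE_K_BIT)
		len += 4; /* key */
	if (flags & EXAMPLE_GRE_S_BIT)
		len += 4; /* sequence */
	return len;
}

/* Byte offset of the key from the start of GRE, -1 if K bit is unset. */
static int
example_gre_key_offset(uint16_t flags)
{
	if (!(flags & EXAMPLE_GRE_K_BIT))
		return -1;
	return (flags & EXAMPLE_GRE_C_BIT) ? 8 : 4;
}
```

The key does not sit at a fixed offset, hence a separate item rather than
extra members appended to rte_flow_item_gre.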

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] Using _XOPEN_SOURCE macros may break builds on FreeBSD

2019-05-13 Thread Adrien Mazarguil

Hey Konstantin,

On Mon, May 13, 2019 at 10:49:00AM +, Ananyev, Konstantin wrote:
> Hi Adrien,
> 
> > 
> > On Mon, May 13, 2019 at 09:51:24AM +, Smoczynski, MarcinX wrote:
> > > 10/05/2019 20:17, Thomas Monjalon:
> > > > 10/05/2019 19:14, Smoczynski, MarcinX:
> > > > > To summarize we have different visibility sets for Linux and BSD
> > > > > when using XOPEN_SOURCE or POSIX_C_SOURCE explicitly. To overcome
> > > > > this situation we can either remove problematic XOPEN macros from
> > > > > mk/meson rules (drivers/net/failsafe, drivers/net/mlx4,
> > > > > drivers/net/mlx5)
> > > >
> > > > What is the consequence of removing these macros in mlx and failsafe 
> > > > PMDs?
> > >
> > > The purpose of these *_SOURCE constants is to enable particular feature 
> > > sets
> > > visibility. As long as we have GNU_SOURCE on Linux removing it won't have 
> > > any
> > > consequences. On BSD it will unify feature sets visibility with the rest 
> > > of
> > > sources. Can't think of any downsides here.
> > >
> > > I believe XOPEN_SOURCE was introduced to extend features not to restrict 
> > > them.
> > 
> > I confirm that under Linux, all IPPROTO_* (POSIX/XOPEN/RFC1700) are defined
> > regardless (_GNU_SOURCE not even needed), while under FreeBSD, the non-POSIX
> > versions are only defined when __BSD_VISIBLE is set.
> > 
> > The FreeBSD behavior is more correct in this respect since the purpose of
> > _XOPEN_SOURCE and friends is also to let applications limit the risk of
> > redefinitions in case they were written for an earlier standard
> > (e.g. -D_XOPEN_SOURCE=500 vs. -D_XOPEN_SOURCE=600).
> 
> Still not sure why do you need it for failsafe and mlx PMDs?
> Would something in these PMDs be broken without  '-D_XOPEN_SOURCE=600'?

Well, not really. At least not anymore if they compile fine without it on
all supported targets. I don't mind if they are removed from PMDs.

_XOPEN_SOURCE=600 was originally added to mlx4 (later inherited by mlx5 and
failsafe) for the following reasons:

- Out of habit, since a lot of stuff in unistd.h and fcntl.h depends on it
  to be exposed. Some affected definitions were likely needed at some point.

- Besides toggling C syntax extensions, forcing a C standard through the
  -std parameter (e.g. -std=c99) in order to guarantee a minimum level of
  C compliance disables the implicit presence of nonstandard definitions,
  which must be re-enabled as needed through the appropriate #defines.

For instance, including unistd.h for getsid() stops working as soon as you
use -std=c99. On Linux you can get it back through -std=gnu99 or by
combining -std=c99 with -D_GNU_SOURCE or -D_XOPEN_SOURCE. The latter was
chosen because it is the standard define supposed to work across OSes.
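
A minimal sketch of the getsid() case, meant to be built with -std=c99.
The wrapper name is hypothetical, and the define could just as well come
from the command line as -D_XOPEN_SOURCE=600:

```c
/* Must precede every #include to take effect. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600
#endif

#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the session ID of the calling process. */
static pid_t
example_current_session(void)
{
	pid_t sid = getsid(0);

	/* Querying the calling process itself is not expected to fail. */
	assert(sid != (pid_t)-1);
	return sid;
}
```

Removing the define while keeping -std=c99 is expected to leave getsid()
undeclared on Linux, as described above.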

Historically mlx4 had to enable -std=c99 to be able to use various features
not present when GCC defaulted to -std=gnu90. It was later transformed to
-std=c11 for similar reasons (anonymous members in structs/unions if memory
serves me right).

> > DPDK applications may also define _XOPEN_SOURCE for their own needs. They
> > should still be able to use rte_ip.h afterward.
> 
> I suppose they can, they would just have (on FreeBSD) to add '-D 
> __BSD_VISIBLE'
> themselves. 

Of course, but public headers should be as self sufficient as possible.
Unless they provide really insane compiler flags, if user applications get
compilation errors after including some header we install on the system,
I think the blame is on DPDK.

> > I think this reason is
> > enough to go with -D__BSD_VISIBLE under FreeBSD without removing
> > _XOPEN_SOURCE, as it should work regardless.
> 
> So do you suggest to add '-D __BSD_VISIBLE'  into
> mlx/failsafe PMDs Makefiles/meson.build, or ... ?

Since headers of our public API potentially require it, it must be defined
globally (unlike _XOPEN_SOURCE which is only local to a few PMDs):
app/meson.build, lib/meson.build, mk/target/generic/rte.vars.mk, alongside
-D_GNU_SOURCE.

Add it to mlx*/failsafe only if that's not enough. Just make sure
applications inherit this flag.

> > Looking at the patch [1], I also think there's another, simpler approach:
> > unless really performance critical, defining rte_ipv6_get_next_ext() in
> > rte_net.c instead of a static inline in rte_ip.h should address this issue.
> 
> It is performance critical, and I think that 
> function call for each ext header is a way too expensive approach.
> Will prefer to keep that function inline.

OK, a bit cumbersome but since we're heading this way [2], how about
defining our own instead of all the above?

 #define RTE_IPPROTO_HOPOPTS 0
 #define RTE_IPPROTO_ROUTING 43
 ...

Which could prove handy later as it appears Linux and FreeBSD don't have the
same set of available IPPROTO_* definitions.
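
A rough sketch of what this could look like. Only RTE_IPPROTO_HOPOPTS and
RTE_IPPROTO_ROUTING are named above; the other three names and the helper
is_ipv6_ext_hdr() are assumptions for illustration. The numeric values are
the IANA-assigned protocol numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Own definitions, independent of what netinet/in.h exposes. */
#define RTE_IPPROTO_HOPOPTS   0 /* IPv6 hop-by-hop options */
#define RTE_IPPROTO_ROUTING  43 /* IPv6 routing header */
#define RTE_IPPROTO_FRAGMENT 44 /* hypothetical name, IPv6 fragment header */
#define RTE_IPPROTO_AH       51 /* hypothetical name, authentication header */
#define RTE_IPPROTO_DSTOPTS  60 /* hypothetical name, IPv6 destination options */

/* Hypothetical helper: tell whether proto is one of the headers above. */
static bool
is_ipv6_ext_hdr(uint8_t proto)
{
	switch (proto) {
	case RTE_IPPROTO_HOPOPTS:
	case RTE_IPPROTO_ROUTING:
	case RTE_IPPROTO_FRAGMENT:
	case RTE_IPPROTO_AH:
	case RTE_IPPROTO_DSTOPTS:
		return true;
	default:
		return false;
	}
}
```

Since nothing here depends on system headers, it should behave the same
regardless of _XOPEN_SOURCE or __BSD_VISIBLE.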

Thoughts?

[2] "[RFC v2 00/14] prefix network structures"
https://mails.dpdk.org/archives/dev/2019-April/129752.html

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] Using _XOPEN_SOURCE macros may break builds on FreeBSD

2019-05-13 Thread Adrien Mazarguil

On Mon, May 13, 2019 at 09:51:24AM +, Smoczynski, MarcinX wrote:
> 10/05/2019 20:17, Thomas Monjalon:
> > 10/05/2019 19:14, Smoczynski, MarcinX:
> > > To summarize we have different visibility sets for Linux and BSD 
> > > when using XOPEN_SOURCE or POSIX_C_SOURCE explicitly. To overcome 
> > > this situation we can either remove problematic XOPEN macros from 
> > > mk/meson rules (drivers/net/failsafe, drivers/net/mlx4, 
> > > drivers/net/mlx5)
> > 
> > What is the consequence of removing these macros in mlx and failsafe PMDs?
> 
> The purpose of these *_SOURCE constants is to enable particular feature sets
> visibility. As long as we have GNU_SOURCE on Linux removing it won't have any
> consequences. On BSD it will unify feature sets visibility with the rest of
> sources. Can't think of any downsides here.
> 
> I believe XOPEN_SOURCE was introduced to extend features not to restrict them.

I confirm that under Linux, all IPPROTO_* (POSIX/XOPEN/RFC1700) are defined
regardless (_GNU_SOURCE not even needed), while under FreeBSD, the non-POSIX
versions are only defined when __BSD_VISIBLE is set.

The FreeBSD behavior is more correct in this respect since the purpose of
_XOPEN_SOURCE and friends is also to let applications limit the risk of
redefinitions in case they were written for an earlier standard
(e.g. -D_XOPEN_SOURCE=500 vs. -D_XOPEN_SOURCE=600).

DPDK applications may also define _XOPEN_SOURCE for their own needs. They
should still be able to use rte_ip.h afterward. I think this reason is
enough to go with -D__BSD_VISIBLE under FreeBSD without removing
_XOPEN_SOURCE, as it should work regardless.

Looking at the patch [1], I also think there's another, simpler approach:
unless really performance critical, defining rte_ipv6_get_next_ext() in
rte_net.c instead of a static inline in rte_ip.h should address this issue.

[1] "[PATCH 1/3] net: new ipv6 header extension parsing function"
https://mails.dpdk.org/archives/dev/2019-May/131885.html
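
For reference, a non-inline parsing step could look roughly like the sketch
below. This is not the code from [1]; the function name, the SK_PROTO_*
constants and the exact signature are assumptions, and the length rules are
those of the IPv6 and AH specifications as I understand them.

```c
#include <stddef.h>
#include <stdint.h>

/* Local protocol numbers so that no IPPROTO_* definition is needed. */
enum {
	SK_PROTO_HOPOPTS = 0,
	SK_PROTO_ROUTING = 43,
	SK_PROTO_FRAGMENT = 44,
	SK_PROTO_AH = 51,
	SK_PROTO_DSTOPTS = 60,
};

/*
 * Hypothetical helper: p points to an extension header of type proto.
 * Store its length in bytes in *ext_len and return the next header
 * type, or -1 if proto is not a known extension header.
 */
static int
sk_ipv6_next_ext(const uint8_t *p, int proto, size_t *ext_len)
{
	switch (proto) {
	case SK_PROTO_HOPOPTS:
	case SK_PROTO_ROUTING:
	case SK_PROTO_DSTOPTS:
		/* Length is in 8-byte units, not counting the first 8 bytes. */
		*ext_len = ((size_t)p[1] + 1) * 8;
		return p[0];
	case SK_PROTO_FRAGMENT:
		/* Fragment header has a fixed size. */
		*ext_len = 8;
		return p[0];
	case SK_PROTO_AH:
		/* Length is in 4-byte units, minus 2. */
		*ext_len = ((size_t)p[1] + 2) * 4;
		return p[0];
	default:
		return -1;
	}
}
```

Callers would loop on the return value until it is negative, at which point
the last proto is the upper-layer protocol.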

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH] net/failsafe: fix source port ID in Rx packets

2019-04-19 Thread Adrien Mazarguil

On Thu, Apr 18, 2019 at 04:06:31PM +0200, David Marchand wrote:
> Hey Adrien,

Hey David!

> On Thu, Apr 18, 2019 at 3:12 PM Adrien Mazarguil 
> wrote:
> 
> > When passed to the application, Rx packets retain the port ID value
> > originally set by slave devices. Unfortunately these IDs have no meaning to
> > applications, which are typically unaware of their existence.
> >
> > This confuses those caring about the source port field in mbufs (m->port)
> > which experience issues ranging from traffic drop to crashes.
> >
> > Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")

> Not a big fan of those duplicated rx_burst functions...
> Reviewed-by: David Marchand 
> 
> I suppose the bonding pmd has the same issue.

I don't have much experience with it, however chances are that it's not as
bad as with failsafe since bonding behaves more like a helper that
applications knowingly use to aggregate ports they set up and already know
about. Leaving the original port ID in this case may actually be useful, but
could be optional.

Failsafe on the other hand spawns and manages any number of sub-devices
hidden from the application on its own based on opaque user configuration.
These sub-devices may or may not be present depending on hot-plug events
absorbed by failsafe, which means their port IDs are not only unexpected but
also volatile.

-- 
Adrien Mazarguil
6WIND

[dpdk-dev] [PATCH v3] net/failsafe: fix source port ID in Rx packets

2019-04-18 Thread Adrien Mazarguil

When passed to the application, Rx packets retain the port ID value
originally set by slave devices. Unfortunately these IDs have no meaning to
applications, which are typically unaware of their existence.

This confuses those caring about the source port field in mbufs (m->port)
which experience issues ranging from traffic drop to crashes.

Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
Cc: sta...@dpdk.org

Signed-off-by: Adrien Mazarguil 
Reviewed-by: David Marchand 
Acked-by: Gaetan Rivet 
--
v3 changes:

Removed unnecessary reference to slavery ("slave" device) for political
correctness.

Also kept *-by lines as in v2.

v2 changes:

Modified "rxq->priv->dev->data->port_id" (v18.11-style) to
"rxq->priv->data->port_id" (since v19.05) and checked compilation against
master this time.

Given the limited scope of that change, reviewed-by/acked-by lines were
kept.
---
 drivers/net/failsafe/failsafe_rxtx.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
index 231c83291..fee08fa23 100644
--- a/drivers/net/failsafe/failsafe_rxtx.c
+++ b/drivers/net/failsafe/failsafe_rxtx.c
@@ -61,6 +61,21 @@ failsafe_set_burst_fn(struct rte_eth_dev *dev, int force_safe)
rte_wmb();
 }
 
+/*
+ * Override source port in Rx packets.
+ *
+ * Make Rx packets originate from this PMD instance instead of one of its
+ * sub-devices. This is mandatory to avoid breaking applications.
+ */
+static void
+failsafe_rx_set_port(struct rte_mbuf **rx_pkts, uint16_t nb_pkts, uint16_t port)
+{
+   unsigned int i;
+
+   for (i = 0; i != nb_pkts; ++i)
+   rx_pkts[i]->port = port;
+}
+
 uint16_t
 failsafe_rx_burst(void *queue,
  struct rte_mbuf **rx_pkts,
@@ -87,6 +102,9 @@ failsafe_rx_burst(void *queue,
sdev = sdev->next;
} while (nb_rx == 0 && sdev != rxq->sdev);
rxq->sdev = sdev;
+   if (nb_rx)
+   failsafe_rx_set_port(rx_pkts, nb_rx,
+rxq->priv->data->port_id);
return nb_rx;
 }
 
@@ -112,6 +130,9 @@ failsafe_rx_burst_fast(void *queue,
sdev = sdev->next;
} while (nb_rx == 0 && sdev != rxq->sdev);
rxq->sdev = sdev;
+   if (nb_rx)
+   failsafe_rx_set_port(rx_pkts, nb_rx,
+rxq->priv->data->port_id);
return nb_rx;
 }
 
-- 
2.11.0

Re: [dpdk-dev] [PATCH v2] net/failsafe: fix source port ID in Rx packets

2019-04-18 Thread Adrien Mazarguil

On Thu, Apr 18, 2019 at 06:54:22PM +0200, Thomas Monjalon wrote:

> > 
> > > > "slave" is a wording from bonding.
> > > > In failsafe, it is sub-device, isn't it?
> > 
> > I don't mind, although grep shows a couple of comments talking about slaves
> > already. Either way I think it fits as those are failsafe's pets, as in
> > failsafe does whatever it wants to them and they don't have a say :)
> > 
> > Does it warrant a v3?
> 
> Yes please, except if Ferruh is already doing the change on apply.

Will do.


> > > > I'm afraid the performance drop to be hard.
> > 
> > Mbufs are still hot from the oven at this stage, so it's not *that*
> > expensive. I don't see a more efficient approach.
> 
> Yes, Ali did some quick tests showing no perf drop.

Great.

> > > > How the port id in mbuf is used exactly?
> > 
> > Applications that dissociate Rx itself from packet processing, or whenever a
> > networking stack is involved. Basically every time some code wonders where a
> > packet comes from due to lack of context and looks at m->port for the
> > answer (e.g. checking that a packet arrives on the right port given its
> > destination address).
> > 
> > > > What crash are you seeing?
> > 
> > None, thankfully. In my specific use case, 6WINDGate's stack simply drops
> > traffic coming from unknown ports.
> > 
> > However nothing prevents applications from using m->port as an index of some
> > array they allocated to quickly retrieve port context without looking it
> > up. They wouldn't expect indices they do not know about in there; assuming
> > it will result in a crash is not far fetched.
> > 
> > > Another way to fix it without performance drop would be to add
> > > a new driver op to set the top-level port id.
> > > This top-level id would be stored in the private structure of the port,
> > > initialized with the port id of the port itself, and used to fill mbufs.
> > > 
> > > Thoughts?
> > 
> > Adding a new devop as a fix would be a problem for stable releases, so this
> > patch is definitely needed, at least as a first step.
> > 
> > I'm not against a new API, however would it be worth the trouble? Especially
> > considering it would only be used by failsafe-like drivers with something to
> > hide from applications which is not the main use case.
> > 
> > For some PMDs, this operation could only be done at init time before port ID
> > is stored in private Rx queue data for fast retrieval. Retrieving it through
> > a pointer so it can be updated anytime would make it more expensive than
> > necessary for them.
> 
> I don't understand this comment.
> The port id is currently retrieved via some pointers already.
> I suggest to look at private structure, it is not different.

See "rep->port = rxq->port_id" in mlx4_rxtx.c for instance. Port ID is
cached in private queue data structure (struct rxq) and retrieved there to
avoid looking it up in non-local data structure rxq->priv->dev_data->port.
In fact rxq->priv is not accessed even once during Rx.
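
Roughly, the pattern is the following. This is a simplified sketch with
made-up structures (sk_*), not the actual mlx4 definitions; it only shows
the port ID being copied once at queue setup and then read locally during
Rx.

```c
#include <stdint.h>

/* Stand-ins for device data, PMD private data, Rx queue and mbuf. */
struct sk_dev_data { uint16_t port_id; };
struct sk_priv { struct sk_dev_data *dev_data; };
struct sk_rxq {
	uint16_t port_id;     /* cached copy, local to the queue */
	struct sk_priv *priv; /* not dereferenced in the Rx path */
};
struct sk_pkt { uint16_t port; };

/* Queue setup: the only place where non-local data is looked up. */
static void
sk_rxq_init(struct sk_rxq *rxq, struct sk_priv *priv)
{
	rxq->priv = priv;
	rxq->port_id = priv->dev_data->port_id;
}

/* Rx path: stamp packets using the cached value only. */
static void
sk_rx_stamp(const struct sk_rxq *rxq, struct sk_pkt *pkts, unsigned int n)
{
	unsigned int i;

	for (i = 0; i != n; ++i)
		pkts[i].port = rxq->port_id;
}
```

The flip side is that a port ID modified after sk_rxq_init() is not seen
by the Rx path, hence my remark about init time.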

> > It's understood that having failsafe in the dataplane has a cost, but even
> > with the proposed fix, that cost is dwarfed by the amount of work done by a
> > true PMD (and the application) for Rx processing.
> > 
> > My suggestion is to wait for someone to complain about the performance
> > compared to what they had before that fix, only then see what we can do.
> 
> OK
> 
> 

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v2] net/failsafe: fix source port ID in Rx packets

2019-04-18 Thread Adrien Mazarguil

On Thu, Apr 18, 2019 at 05:51:18PM +0200, Thomas Monjalon wrote:
> 18/04/2019 17:39, Thomas Monjalon:
> > 18/04/2019 17:32, Adrien Mazarguil:
> > > When passed to the application, Rx packets retain the port ID value
> > > originally set by slave devices. Unfortunately these IDs have no meaning to
> > > applications, which are typically unaware of their existence.
> > > 
> > > This confuses those caring about the source port field in mbufs (m->port)
> > > which experience issues ranging from traffic drop to crashes.
> [...]
> > > +/*
> > > + * Override source port in Rx packets.
> > > + *
> > > + * Make Rx packets originate from this PMD instance instead of one of its
> > > + * slaves. This is mandatory to avoid breaking applications.
> > > + */

> > "slave" is a wording from bonding.
> > In failsafe, it is sub-device, isn't it?

I don't mind, although grep shows a couple of comments talking about slaves
already. Either way I think it fits as those are failsafe's pets, as in
failsafe does whatever it wants to them and they don't have a say :)

Does it warrant a v3?

> > > +static void
> > > +failsafe_rx_set_port(struct rte_mbuf **rx_pkts, uint16_t nb_pkts, uint16_t port)
> > > +{
> > > + unsigned int i;
> > > +
> > > + for (i = 0; i != nb_pkts; ++i)
> > > + rx_pkts[i]->port = port;
> > > +}
> > > +
> > >  uint16_t
> > >  failsafe_rx_burst(void *queue,
> > > struct rte_mbuf **rx_pkts,
> > > @@ -87,6 +102,9 @@ failsafe_rx_burst(void *queue,
> > >   sdev = sdev->next;
> > >   } while (nb_rx == 0 && sdev != rxq->sdev);
> > >   rxq->sdev = sdev;
> > > + if (nb_rx)
> > > + failsafe_rx_set_port(rx_pkts, nb_rx,
> > > +  rxq->priv->data->port_id);
> > >   return nb_rx;
> > >  }
> > 
> > I'm afraid the performance drop to be hard.

Mbufs are still hot from the oven at this stage, so it's not *that*
expensive. I don't see a more efficient approach.

> > How the port id in mbuf is used exactly?

Applications that dissociate Rx itself from packet processing, or whenever a
networking stack is involved. Basically every time some code wonders where a
packet comes from due to lack of context and looks at m->port for the
answer (e.g. checking that a packet arrives on the right port given its
destination address).

> > What crash are you seeing?

None, thankfully. In my specific use case, 6WINDGate's stack simply drops
traffic coming from unknown ports.

However nothing prevents applications from using m->port as an index of some
array they allocated to quickly retrieve port context without looking it
up. They wouldn't expect indices they do not know about in there; assuming
it will result in a crash is not far fetched.
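
To make the array case concrete, here is a sketch of such an application
with a defensive lookup. All names (APP_MAX_PORTS, app_pkt, app_port_ctx,
app_port_lookup) are hypothetical; app_pkt stands in for an mbuf and only
carries the source port.

```c
#include <stddef.h>
#include <stdint.h>

#define APP_MAX_PORTS 4 /* hypothetical: ports the application set up */

struct app_pkt { uint16_t port; };              /* stand-in for m->port */
struct app_port_ctx { unsigned long rx_pkts; }; /* per-port context */

static struct app_port_ctx app_ports[APP_MAX_PORTS];

/*
 * Return the context of the port a packet came from, or NULL when the
 * port ID is not one the application knows about. Without the bounds
 * check, an unexpected ID turns into an out-of-bounds access.
 */
static struct app_port_ctx *
app_port_lookup(const struct app_pkt *pkt)
{
	if (pkt->port >= APP_MAX_PORTS)
		return NULL;
	return &app_ports[pkt->port];
}
```

Many applications skip the check on the assumption that m->port is always
one of their own ports, which is the situation this patch restores.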

> Another way to fix it without performance drop would be to add
> a new driver op to set the top-level port id.
> This top-level id would be stored in the private structure of the port,
> initialized with the port id of the port itself, and used to fill mbufs.
> 
> Thoughts?

Adding a new devop as a fix would be a problem for stable releases, so this
patch is definitely needed, at least as a first step.

I'm not against a new API, however would it be worth the trouble? Especially
considering it would only be used by failsafe-like drivers with something to
hide from applications which is not the main use case.

For some PMDs, this operation could only be done at init time before port ID
is stored in private Rx queue data for fast retrieval. Retrieving it through
a pointer so it can be updated anytime would make it more expensive than
necessary for them.

It's understood that having failsafe in the dataplane has a cost, but even
with the proposed fix, that cost is dwarfed by the amount of work done by a
true PMD (and the application) for Rx processing.

My suggestion is to wait for someone to complain about the performance
compared to what they had before that fix, only then see what we can do.

-- 
Adrien Mazarguil
6WIND

[dpdk-dev] [PATCH v2] net/failsafe: fix source port ID in Rx packets

2019-04-18 Thread Adrien Mazarguil

When passed to the application, Rx packets retain the port ID value
originally set by slave devices. Unfortunately these IDs have no meaning to
applications, which are typically unaware of their existence.

This confuses those caring about the source port field in mbufs (m->port)
which experience issues ranging from traffic drop to crashes.

Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
Cc: sta...@dpdk.org

Signed-off-by: Adrien Mazarguil 
Reviewed-by: David Marchand 
Acked-by: Gaetan Rivet 
--
v2 changes:

Modified "rxq->priv->dev->data->port_id" (v18.11-style) to
"rxq->priv->data->port_id" (since v19.05) and checked compilation against
master this time.

Given the limited scope of that change, reviewed-by/acked-by lines were
kept.
---
 drivers/net/failsafe/failsafe_rxtx.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
index 231c83291..b9cddec78 100644
--- a/drivers/net/failsafe/failsafe_rxtx.c
+++ b/drivers/net/failsafe/failsafe_rxtx.c
@@ -61,6 +61,21 @@ failsafe_set_burst_fn(struct rte_eth_dev *dev, int force_safe)
rte_wmb();
 }
 
+/*
+ * Override source port in Rx packets.
+ *
+ * Make Rx packets originate from this PMD instance instead of one of its
+ * slaves. This is mandatory to avoid breaking applications.
+ */
+static void
+failsafe_rx_set_port(struct rte_mbuf **rx_pkts, uint16_t nb_pkts, uint16_t port)
+{
+   unsigned int i;
+
+   for (i = 0; i != nb_pkts; ++i)
+   rx_pkts[i]->port = port;
+}
+
 uint16_t
 failsafe_rx_burst(void *queue,
  struct rte_mbuf **rx_pkts,
@@ -87,6 +102,9 @@ failsafe_rx_burst(void *queue,
sdev = sdev->next;
} while (nb_rx == 0 && sdev != rxq->sdev);
rxq->sdev = sdev;
+   if (nb_rx)
+   failsafe_rx_set_port(rx_pkts, nb_rx,
+rxq->priv->data->port_id);
return nb_rx;
 }
 
@@ -112,6 +130,9 @@ failsafe_rx_burst_fast(void *queue,
sdev = sdev->next;
} while (nb_rx == 0 && sdev != rxq->sdev);
rxq->sdev = sdev;
+   if (nb_rx)
+   failsafe_rx_set_port(rx_pkts, nb_rx,
+rxq->priv->data->port_id);
return nb_rx;
 }
 
-- 
2.11.0

Re: [dpdk-dev] [PATCH] net/failsafe: fix source port ID in Rx packets

2019-04-18 Thread Adrien Mazarguil

On Thu, Apr 18, 2019 at 03:42:24PM +0100, Ferruh Yigit wrote:
> On 4/18/2019 2:11 PM, Adrien Mazarguil wrote:
> > When passed to the application, Rx packets retain the port ID value
> > originally set by slave devices. Unfortunately these IDs have no meaning to
> > applications, which are typically unaware of their existence.
> > 
> > This confuses those caring about the source port field in mbufs (m->port)
> > which experience issues ranging from traffic drop to crashes.
> > 
> > Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
> > Signed-off-by: Adrien Mazarguil 
> > ---
> >  drivers/net/failsafe/failsafe_rxtx.c | 21 +
> >  1 file changed, 21 insertions(+)
> > 
> > diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
> > index 231c83291..e78624127 100644
> > --- a/drivers/net/failsafe/failsafe_rxtx.c
> > +++ b/drivers/net/failsafe/failsafe_rxtx.c
> > @@ -61,6 +61,21 @@ failsafe_set_burst_fn(struct rte_eth_dev *dev, int force_safe)
> > rte_wmb();
> >  }
> >  
> > +/*
> > + * Override source port in Rx packets.
> > + *
> > + * Make Rx packets originate from this PMD instance instead of one of its
> > + * slaves. This is mandatory to avoid breaking applications.
> > + */
> > +static void
> > +failsafe_rx_set_port(struct rte_mbuf **rx_pkts, uint16_t nb_pkts, uint16_t port)
> > +{
> > +   unsigned int i;
> > +
> > +   for (i = 0; i != nb_pkts; ++i)
> > +   rx_pkts[i]->port = port;
> > +}
> > +
> >  uint16_t
> >  failsafe_rx_burst(void *queue,
> >   struct rte_mbuf **rx_pkts,
> > @@ -87,6 +102,9 @@ failsafe_rx_burst(void *queue,
> > sdev = sdev->next;
> > } while (nb_rx == 0 && sdev != rxq->sdev);
> > rxq->sdev = sdev;
> > +   if (nb_rx)
> > +   failsafe_rx_set_port(rx_pkts, nb_rx,
> > +        rxq->priv->dev->data->port_id);
> 
> error: struct "fs_priv" has no field "dev"
> 
> I guess intention is: "rxq->priv->data->port_id"

Indeed, I validated this patch as is against v18.11 and overlooked a
compilation check against master. I'll send an updated version with
Cc stable and all, thanks!

-- 
Adrien Mazarguil
6WIND

[dpdk-dev] [PATCH] net/failsafe: fix source port ID in Rx packets

2019-04-18 Thread Adrien Mazarguil

When passed to the application, Rx packets retain the port ID value
originally set by slave devices. Unfortunately these IDs have no meaning to
applications, which are typically unaware of their existence.

This confuses those caring about the source port field in mbufs (m->port)
which experience issues ranging from traffic drop to crashes.

Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/failsafe/failsafe_rxtx.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
index 231c83291..e78624127 100644
--- a/drivers/net/failsafe/failsafe_rxtx.c
+++ b/drivers/net/failsafe/failsafe_rxtx.c
@@ -61,6 +61,21 @@ failsafe_set_burst_fn(struct rte_eth_dev *dev, int force_safe)
rte_wmb();
 }
 
+/*
+ * Override source port in Rx packets.
+ *
+ * Make Rx packets originate from this PMD instance instead of one of its
+ * slaves. This is mandatory to avoid breaking applications.
+ */
+static void
+failsafe_rx_set_port(struct rte_mbuf **rx_pkts, uint16_t nb_pkts, uint16_t port)
+{
+   unsigned int i;
+
+   for (i = 0; i != nb_pkts; ++i)
+   rx_pkts[i]->port = port;
+}
+
 uint16_t
 failsafe_rx_burst(void *queue,
  struct rte_mbuf **rx_pkts,
@@ -87,6 +102,9 @@ failsafe_rx_burst(void *queue,
sdev = sdev->next;
} while (nb_rx == 0 && sdev != rxq->sdev);
rxq->sdev = sdev;
+   if (nb_rx)
+   failsafe_rx_set_port(rx_pkts, nb_rx,
+rxq->priv->dev->data->port_id);
return nb_rx;
 }
 
@@ -112,6 +130,9 @@ failsafe_rx_burst_fast(void *queue,
sdev = sdev->next;
} while (nb_rx == 0 && sdev != rxq->sdev);
rxq->sdev = sdev;
+   if (nb_rx)
+   failsafe_rx_set_port(rx_pkts, nb_rx,
+rxq->priv->dev->data->port_id);
return nb_rx;
 }
 
-- 
2.11.0

[dpdk-dev] [PATCH] net/mlx4: add support for multicast address list interface

2019-04-18 Thread Adrien Mazarguil

Since this driver does not distinguish unicast/multicast addresses,
applications could always rely on the standard MAC add/remove/set interface
to configure both types.

As a result, the multicast address list interface
(rte_eth_dev_set_mc_addr_list()) never got implemented. However, PMD-agnostic
applications still rely on it for compatibility reasons; a wrapper is
therefore required.

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c|  1 +
 drivers/net/mlx4/mlx4.h|  3 ++
 drivers/net/mlx4/mlx4_ethdev.c | 61 +++--
 3 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 17dfcd5a3..fe559c040 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -410,6 +410,7 @@ static const struct eth_dev_ops mlx4_dev_ops = {
.mac_addr_remove = mlx4_mac_addr_remove,
.mac_addr_add = mlx4_mac_addr_add,
.mac_addr_set = mlx4_mac_addr_set,
+   .set_mc_addr_list = mlx4_set_mc_addr_list,
.stats_get = mlx4_stats_get,
.stats_reset = mlx4_stats_reset,
.fw_version_get = mlx4_fw_version_get,
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 6224b3be1..e2d184f84 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -188,6 +188,7 @@ struct mlx4_priv {
LIST_HEAD(, rte_flow) flows; /**< Configured flow rule handles. */
struct ether_addr mac[MLX4_MAX_MAC_ADDRESSES];
/**< Configured MAC addresses. Unused entries are zeroed. */
+   uint32_t mac_mc; /**< Number of trailing multicast entries in mac[]. */
struct mlx4_verbs_alloc_ctx verbs_alloc_ctx;
/**< Context for Verbs allocator. */
 };
@@ -211,6 +212,8 @@ void mlx4_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
 int mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
  uint32_t index, uint32_t vmdq);
 int mlx4_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr);
+int mlx4_set_mc_addr_list(struct rte_eth_dev *dev, struct ether_addr *list,
+ uint32_t num);
 int mlx4_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on);
 int mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats);
 void mlx4_stats_reset(struct rte_eth_dev *dev);
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index 4dae67a1b..c38455767 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -433,7 +433,7 @@ mlx4_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
struct mlx4_priv *priv = dev->data->dev_private;
struct rte_flow_error error;
 
-   if (index >= RTE_DIM(priv->mac)) {
+   if (index >= RTE_DIM(priv->mac) - priv->mac_mc) {
rte_errno = EINVAL;
return;
}
@@ -471,7 +471,7 @@ mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
int ret;
 
(void)vmdq;
-   if (index >= RTE_DIM(priv->mac)) {
+   if (index >= RTE_DIM(priv->mac) - priv->mac_mc) {
rte_errno = EINVAL;
return -rte_errno;
}
@@ -488,6 +488,63 @@ mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
 }
 
 /**
+ * DPDK callback to configure multicast addresses.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param list
+ *   List of MAC addresses to register.
+ * @param num
+ *   Number of entries in list.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_set_mc_addr_list(struct rte_eth_dev *dev, struct ether_addr *list,
+ uint32_t num)
+{
+   struct mlx4_priv *priv = dev->data->dev_private;
+   struct rte_flow_error error;
+   int ret;
+
+   if (num > RTE_DIM(priv->mac)) {
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+   /*
+* Make sure there is enough room to increase the number of
+* multicast entries without overwriting standard entries.
+*/
+   if (num > priv->mac_mc) {
+   unsigned int i;
+
+   for (i = RTE_DIM(priv->mac) - num;
+i != RTE_DIM(priv->mac) - priv->mac_mc;
+++i)
+   if (!is_zero_ether_addr(&priv->mac[i])) {
+   rte_errno = EBUSY;
+   return -rte_errno;
+   }
+   } else if (num < priv->mac_mc) {
+   /* Clear unused entries. */
+   memset(priv->mac + RTE_DIM(priv->mac) - priv->mac_mc,
+  0,
+  sizeof(priv->mac[0]) * (priv->mac_mc - num));
+   }
+   memcpy(priv->mac + RTE_DIM(priv->mac) - num, list, sizeof(*list) * num);
+

Re: [dpdk-dev] [PATCH] ethdev: deprecate legacy filter API

2019-04-18 Thread Adrien Mazarguil

On Wed, Apr 17, 2019 at 02:36:27AM +0200, Thomas Monjalon wrote:
> As stated in the deprecation notice from December 2016,
> "the legacy filter API, including rte_eth_dev_filter_supported(),
> rte_eth_dev_filter_ctrl() as well as filter types MACVLAN, ETHERTYPE,
> FLEXIBLE, SYN, NTUPLE, TUNNEL, FDIR, HASH and L2_TUNNEL, is superseded
> by the generic flow API (rte_flow)".
> 
> After a long wait of more than two years, the legacy filter API
> is marked as deprecated, while still tested with testpmd and
> the tep_termination example.
> 
> The next step will be to announce a deadline for complete removal.
> As preparation of the removal of rte_eth_ctrl.h,
> RTE_ETH_FLOW_*, RTE_TUNNEL_TYPE_* and RTE_ETH_HASH_FUNCTION_* definitions
> are moved to rte_ethdev.h and rte_flow.h.
> 
> Signed-off-by: Thomas Monjalon 

Acked-by: Adrien Mazarguil 

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v4 1/3] ethdev: add actions to modify TCP header fields

2019-04-18 Thread Adrien Mazarguil

On Wed, Apr 10, 2019 at 02:50:41PM +0300, Dekel Peled wrote:
> Add actions:
> - INC_TCP_SEQ - Increase sequence number in the outermost TCP header.
> - DEC_TCP_SEQ - Decrease sequence number in the outermost TCP header.
> - INC_TCP_ACK - Increase acknowledgment number in the outermost TCP
>   header.
> - DEC_TCP_ACK - Decrease acknowledgment number in the outermost TCP
>   header.
> 
> Original work by Xiaoyu Min.
> 
> Signed-off-by: Dekel Peled 

Almost there... Some changes were not made as discussed, please see below.

> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index 0203f4f..fc234de 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -2345,6 +2345,78 @@ Otherwise, RTE_FLOW_ERROR_TYPE_ACTION error will be returned.
> | ``mac_addr`` | MAC address   |
> +--+---+
>  
> +Action: ``INC_TCP_SEQ``
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Increase sequence number in the outermost TCP header.
> +
> +Using this action on non-matching traffic will result in undefined behavior,
> +depending on PMD implementation.

OK, "depending on PMD implementation" looks a bit redundant though.

> +
> +.. _table_rte_flow_action_inc_tcp_seq:
> +
> +.. table:: INC_TCP_SEQ
> +
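> +   +-----------+------------------------------------------+
> +   | Field     | Value                                    |
> +   +===========+==========================================+
> +   | ``value`` | Value to increase TCP sequence number by |
> +   +-----------+------------------------------------------+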

Configuration object documentation needs updating since you changed its type
and fields.

These comments also apply to DEC_TCP_SEQ, INC_TCP_ACK and DEC_TCP_ACK.

> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index c0fe879..e3f6210 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -1651,6 +1651,46 @@ enum rte_flow_action_type {
>* See struct rte_flow_action_set_mac.
>*/
>   RTE_FLOW_ACTION_TYPE_SET_MAC_DST,
> +
> + /**
> +  * Increase sequence number in the outermost TCP header.
> +  *
> +  * Using this action on non-matching traffic will result in
> +  * undefined behavior, depending on PMD implementation.

"depending on PMD implementation" also redundant here.

>  /*
> + * @warning
> + * @b EXPERIMENTAL: this union may change without prior notice
> + *
> + * General integer type, can be expanded as needed.
> + */
> +union rte_flow_integer {
> + rte_be32_t be32;
> +};

You must include the extra fields you don't use (64/16/32/8 and
le/be/host/signed variants as described in previous messages) to limit the
risk of ABI breakage next time someone needs a field that isn't there yet.

This object must be able to accommodate the largest supported integer and
have the proper alignment constraints for any of these types, hence the
union with all of them from the start.
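
Something along these lines, as a sketch only. The sk_* names are
placeholders: the endian typedefs stand in for the rte_be*_t/rte_le*_t
family so the example builds on its own, and the exact member list is
open to discussion.

```c
#include <stdint.h>

/* Placeholders for DPDK's endian-annotated integer types. */
typedef uint16_t sk_be16_t;
typedef uint16_t sk_le16_t;
typedef uint32_t sk_be32_t;
typedef uint32_t sk_le32_t;
typedef uint64_t sk_be64_t;
typedef uint64_t sk_le64_t;

/* Every width and variant is present from day one. */
union sk_flow_integer {
	uint8_t u8;
	int8_t i8;
	uint16_t u16;
	int16_t i16;
	sk_be16_t be16;
	sk_le16_t le16;
	uint32_t u32;
	int32_t i32;
	sk_be32_t be32;
	sk_le32_t le32;
	uint64_t u64;
	int64_t i64;
	sk_be64_t be64;
	sk_le64_t le64;
};

/* Size is that of the largest member, so later users do not change it. */
_Static_assert(sizeof(union sk_flow_integer) == sizeof(uint64_t),
	       "union must be as large as its largest member");
```

With only be32 in the union, adding a 64-bit member later would change
both its size and alignment, which is the ABI breakage to avoid.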

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * General struct, for use by actions that require a single integer value.
> + */
> +struct rte_flow_general_action {
> + union rte_flow_integer integer;
> +};

Seriously, why can't actions take "union rte_flow_integer" directly? Besides
"rte_flow_general_action" is vague as heck.

Seems like you made this structure so it could be extended later, but forgot
that doing so is not an option since ABIs are set in stone. You must make
new APIs as exhaustive and restrict their scope as much as possible from the
start because of that.

Note if you don't want to use union rte_flow_integer directly, it's also
fine to document your actions as taking (const rte_be32_t *) directly
through their conf pointer.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v2 1/3] ethdev: add actions to modify TCP header fields

2019-04-08 Thread Adrien Mazarguil

Hi Andrew, *Dekel* (I swear I'm not doing it on purpose, hopefully I won't
make that stupid mistake again :)

On Mon, Apr 08, 2019 at 04:53:54PM +0300, Andrew Rybchenko wrote:
> On 4/8/19 4:36 PM, Dekel Peled wrote:
> > Regarding Andrew's suggestion: "Shouldn't these actions be
> > RTE_FLOW_ACTION_TYPE_MOD_TCP_{ACK,SEQ} with signed 32-bit integer parameter
> > (negative to decrement, positive to increment)?"
> > I will leave the actions as is, the action names indicate the operation
> > they perform.
> 
> I think it is an overkill to have two actions for the purpose: DEC (value)
> == INC ((uint32_t)-value)
> If it is really important to have DEC and INC, please, make it clear from
> actions documentation why.

The main reason in my opinion is that a signed value may not be able to
represent an increment large enough for an unsigned value of the same bit
width.

This can be worked around by using a type larger than the underlying data
field (e.g. i64 for u32), but it will look confusing and is not an option
for the largest unsigned type we support (u64).
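
A small sketch of both points, using plain host-order uint32_t arithmetic
(byte order is deliberately left out here). The function names are made up
for this example.

```c
#include <stdint.h>

/* Increase a 32-bit sequence number, wrapping modulo 2^32. */
static uint32_t
tcp_seq_inc(uint32_t seq, uint32_t value)
{
	return seq + value;
}

/* Decrease a 32-bit sequence number, wrapping modulo 2^32. */
static uint32_t
tcp_seq_dec(uint32_t seq, uint32_t value)
{
	return seq - value;
}
```

With unsigned parameters, tcp_seq_dec(seq, v) gives the same result as
tcp_seq_inc(seq, (uint32_t)-v), which is Andrew's point. On the other hand
an increment such as 3000000000 fits in uint32_t but not in int32_t, so a
single signed 32-bit parameter cannot express it.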

Another problem is what endian increment/decrement actions should use. There
are no dedicated endian types for signed values at the moment, and I'm not
sure we should define any.

This could be addressed by defining a third "SET" action to overwrite the
current value (even if unused) as this action will use the same type as
the two others and that of the underlying data (including endianness) for
consistency.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v2 1/3] ethdev: add actions to modify TCP header fields

2019-04-04 Thread Adrien Mazarguil

Hi Ori,

(trimming message down a bit)

On Thu, Apr 04, 2019 at 09:01:52AM +, Ori Kam wrote:
> Hi Adrien,
> 
> PSB

> 
> > From: Adrien Mazarguil 

> > On Wed, Apr 03, 2019 at 10:49:09AM +, Dekel Peled wrote:
> > > Thanks, PSB.

> > > > From: Adrien Mazarguil 

> > > > I still don't agree with the wording as it implies one must combine
> > > > this action with the TCP pattern item or else, while one should simply
> > > > ensure the presence of TCP traffic somehow. This may be done by a prior
> > > > filtering rule.
> > > >
> > > > So here's a generic suggestion which could be used with pretty much all
> > > > modifying actions (other actions have the same problem and will have to
> > > > be fixed as well eventually):
> > > >
> > > >  Using this action on non-matching traffic results in undefined
> > > >  behavior.
> > > >
> > > > This comment applies to all instances in this patch.
> > >
> > > I accept your suggestion, indeed the existing actions have the problematic
> > condition.
> > > However I would like to currently leave this patch as-is for consistency.
> > > I will send a fix patch for next release, applying the updated text to all
> > modify-header actions.
> > 
> > Please do it now as it's much more difficult to change an existing API
> > later (think deprecation notices and endless discussions); even seemingly
> > minor documentation issues like this one may affect applications.
> > 
> I agree that changing API is not easy. This is why I think we should keep
> Dekel's patch; there are a number of APIs and consistency is very important.
> Also the PMD is based on the current description that such command should
> fail.
> 
> So let's keep it this way; if you want to change all APIs, then and only then
> should this API be changed.

Wait, I'm not asking Dekel to modify existing code/APIs right now, only to
document these new actions properly from the start so we don't have to do it
later (you even acknowledged it's more difficult that way).

So I fail to understand why it's so important for their documentation to be
consistent with unrelated and badly documented actions?

Note the change I'm asking for at the API level doesn't affect PMD code,
which remains free to put extra limitations (namely the presence of TCP
pattern items). It's just that these limitations have nothing to do in the
API itself.


> > > It's either 2 actions with 1 parameters, or 1 action with 2 parameters.
> > > The current implementation is more straight-forward in my opinion.
> > 
> > I generally also prefer the one action per thing to do approach, but seeing
> > the kind of actions you're adding, I fear we'll soon end up with lots of
> > similar rte_flow_action_* structures modifying a single 32-bit value in some
> > way.
> > 
> > So for the same reasons as above, I think it's the right time to define a
> > shared structure to rule them all, or maybe even let users provide a
> > rte_be32_t/uint32_t/whatever pointer directly as a conf pointer (not
> > as straightforward to document though).
> > 
> > An object to rule them all would look something like that:
> > 
> >  union rte_flow_integer {
> >  rte_be64_t be64;
> >  rte_le64_t le64;
> >  uint64_t u64;
> >  int64_t i64;
> >  rte_be32_t be32;
> >  rte_le32_t le32;
> >  uint32_t u32;
> >  int32_t i32;
> >  uint8_t u8;
> >  int8_t i8;
> >  };
> > 
> > Then actions that need a single integer value only have to document which
> > field is relevant to them. How about that?
> > 
> 
> Like my previous comment. I understand your idea, but it has no huge
> advantage compared to the one suggested by Dekel, which also matches all
> other APIs.
> 
> Currently for each action we have a direct command, which is easy to
> understand; by using your idea we break this concept.

Yes, although not all actions have a configuration structure. Those that do
indeed have a rte_flow_action_* counterpart, but it doesn't have to be
unique, see RTE_FLOW_ITEM_GTP/GTPC/GTPU for instance.

Likewise this patch adds struct rte_flow_action_modify_tcp_seq shared by
RTE_FLOW_ACTION_TYPE_INC_TCP_SEQ and RTE_FLOW_ACTION_TYPE_DEC_TCP_SEQ
although they lack a common prefix (inc_tcp/dec_tcp vs. modify_tcp). The
type to use is covered by documentation and that's fine.

So why not go a little further and share the exact same structure with
RTE_FLOW_ACTIO

Re: [dpdk-dev] [PATCH v2 1/3] ethdev: add actions to modify TCP header fields

2019-04-03 Thread Adrien Mazarguil

On Wed, Apr 03, 2019 at 10:49:09AM +, Dekel Peled wrote:
> Thanks, PSB.
> 
> > -Original Message-
> > From: Adrien Mazarguil 
> > Sent: Wednesday, April 3, 2019 12:15 PM
> > To: Dekel Peled 
> > Cc: wenzhuo...@intel.com; jingjing...@intel.com;
> > bernard.iremon...@intel.com; Yongseok Koh ;
> > Shahaf Shuler ; dev@dpdk.org; Ori Kam
> > 
> > Subject: Re: [PATCH v2 1/3] ethdev: add actions to modify TCP header fields
> > 
> > Hi Dekel,
> > 
> > On Tue, Apr 02, 2019 at 06:13:19PM +0300, Dekel Peled wrote:
> > > Add actions:
> > > - INC_TCP_SEQ - Increase sequence number in the outermost TCP header.
> > > - DEC_TCP_SEQ - Decrease sequence number in the outermost TCP header.
> > > - INC_TCP_ACK - Increase acknowledgment number in the outermost TCP
> > >   header.
> > > - DEC_TCP_ACK - Decrease acknowledgment number in the outermost TCP
> > >   header.
> > >
> > > Original work by Xiaoyu Min.
> > >
> > > Signed-off-by: Dekel Peled 
> > 
> > > +Action: ``INC_TCP_SEQ``
> > > +^^^^^^^^^^^^^^^^^^^^^^^
> > > +
> > > +Increase sequence number in the outermost TCP header.
> > > +
> > > +If this action is used without a valid RTE_FLOW_ITEM_TYPE_TCP flow
> > > +pattern item, behavior is unspecified, depending on PMD implementation.
> > 
> > I still don't agree with the wording as it implies one must combine this
> > action with the TCP pattern item or else, while one should simply ensure the
> > presence of TCP traffic somehow. This may be done by a prior filtering rule.
> > 
> > So here's a generic suggestion which could be used with pretty much all
> > modifying actions (other actions have the same problem and will have to be
> > fixed as well eventually):
> > 
> >  Using this action on non-matching traffic results in undefined behavior.
> > 
> > This comment applies to all instances in this patch.
> 
> I accept your suggestion, indeed the existing actions have the problematic
> condition.
> However I would like to currently leave this patch as-is for consistency.
> I will send a fix patch for next release, applying the updated text to all
> modify-header actions.

Please do it now as it's much more difficult to change an existing API
later (think deprecation notices and endless discussions); even seemingly
minor documentation issues like this one may affect applications.

> > 
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this structure may change without prior notice
> > > + *
> > > + * RTE_FLOW_ACTION_TYPE_INC_TCP_SEQ
> > > + * RTE_FLOW_ACTION_TYPE_DEC_TCP_SEQ
> > > + *
> > > + * Increase/Decrease outermost TCP sequence number
> > > + */
> > > +struct rte_flow_action_modify_tcp_seq {
> > > + rte_be32_t value; /**< Value to increase/decrease by. */
> > > +};
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this structure may change without prior notice
> > > + *
> > > + * RTE_FLOW_ACTION_TYPE_INC_TCP_ACK
> > > + * RTE_FLOW_ACTION_TYPE_DEC_TCP_ACK
> > > + *
> > > + * Increase/Decrease outermost TCP acknowledgment number.
> > > + */
> > > +struct rte_flow_action_modify_tcp_ack {
> > > + rte_be32_t value; /**< Value to increase/decrease by. */
> > > +};
> > 
> > Thanks for adding experimental tags and comments, however you didn't
> > reply anything about using a single action, or at least a single structure
> > for add/sub/set? I'd like to hear your thoughts.
> 
> It's either 2 actions with 1 parameters, or 1 action with 2 parameters.
> The current implementation is more straight-forward in my opinion.

I generally also prefer the one action per thing to do approach, but seeing
the kind of actions you're adding, I fear we'll soon end up with lots of
similar rte_flow_action_* structures modifying a single 32-bit value in some
way.

So for the same reasons as above, I think it's the right time to define a
shared structure to rule them all, or maybe even let users provide a
rte_be32_t/uint32_t/whatever pointer directly as a conf pointer (not
as straightforward to document though).

An object to rule them all would look something like that:

 union rte_flow_integer {
 rte_be64_t be64;
 rte_le64_t le64;
 uint64_t u64;
 int64_t i64;
 rte_be32_t be32;
 rte_le32_t le32;
 uint32_t u32;
 int32_t i32;
 uint8_t u8;
 int8_t i8;
 };

Then actions that need a single integer value only have to document which
field is relevant to them. How about that?
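
As a rough sketch of how an action could consume such a shared object (this is
only the proposal above, not an existing API; the typedefs stand in for DPDK's
byte order types so the example builds on its own, and inc_tcp_seq() is a
hypothetical consumer documented as reading the be32 field):

```c
#include <stdint.h>
#include <arpa/inet.h> /* htonl(), ntohl() */

/* Stand-ins for the DPDK byte order types, for illustration only. */
typedef uint64_t rte_be64_t;
typedef uint64_t rte_le64_t;
typedef uint32_t rte_be32_t;
typedef uint32_t rte_le32_t;

/* Shared configuration object as proposed above. */
union rte_flow_integer {
	rte_be64_t be64;
	rte_le64_t le64;
	uint64_t u64;
	int64_t i64;
	rte_be32_t be32;
	rte_le32_t le32;
	uint32_t u32;
	int32_t i32;
	uint8_t u8;
	int8_t i8;
};

/* Hypothetical "increase TCP sequence number" handler. Its documentation
 * would state that only the be32 field is relevant. seq_be is the sequence
 * number as stored in the TCP header (network byte order). */
static rte_be32_t
inc_tcp_seq(rte_be32_t seq_be, const union rte_flow_integer *conf)
{
	/* Convert both operands to host order, add, convert back. */
	return htonl(ntohl(seq_be) + ntohl(conf->be32));
}
```

An application would then fill only the documented field, e.g. a conf with
.be32 = htonl(5), and pass it as the action's conf pointer.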

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v2 1/3] ethdev: add actions to modify TCP header fields

2019-04-03 Thread Adrien Mazarguil

Hi Dekel,

On Tue, Apr 02, 2019 at 06:13:19PM +0300, Dekel Peled wrote:
> Add actions:
> - INC_TCP_SEQ - Increase sequence number in the outermost TCP header.
> - DEC_TCP_SEQ - Decrease sequence number in the outermost TCP header.
> - INC_TCP_ACK - Increase acknowledgment number in the outermost TCP
>   header.
> - DEC_TCP_ACK - Decrease acknowledgment number in the outermost TCP
>   header.
> 
> Original work by Xiaoyu Min.
> 
> Signed-off-by: Dekel Peled 

> +Action: ``INC_TCP_SEQ``
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Increase sequence number in the outermost TCP header.
> +
> +If this action is used without a valid RTE_FLOW_ITEM_TYPE_TCP flow pattern item,
> +behavior is unspecified, depending on PMD implementation.

I still don't agree with the wording as it implies one must combine this
action with the TCP pattern item or else, while one should simply ensure the
presence of TCP traffic somehow. This may be done by a prior filtering rule.

So here's a generic suggestion which could be used with pretty much all
modifying actions (other actions have the same problem and will have to be
fixed as well eventually):

 Using this action on non-matching traffic results in undefined behavior.

This comment applies to all instances in this patch.


> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ACTION_TYPE_INC_TCP_SEQ
> + * RTE_FLOW_ACTION_TYPE_DEC_TCP_SEQ
> + *
> + * Increase/Decrease outermost TCP sequence number
> + */
> +struct rte_flow_action_modify_tcp_seq {
> + rte_be32_t value; /**< Value to increase/decrease by. */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ACTION_TYPE_INC_TCP_ACK
> + * RTE_FLOW_ACTION_TYPE_DEC_TCP_ACK
> + *
> + * Increase/Decrease outermost TCP acknowledgment number.
> + */
> +struct rte_flow_action_modify_tcp_ack {
> + rte_be32_t value; /**< Value to increase/decrease by. */
> +};

Thanks for adding experimental tags and comments, however you didn't reply
anything about using a single action, or at least a single structure for
add/sub/set? I'd like to hear your thoughts.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH 2/3] app/testpmd: add actions to modify TCP header fields

2019-03-29 Thread Adrien Mazarguil

+ .next = NEXT(action_inc_tcp_ack),
> + .call = parse_vc,
> + },
> + [ACTION_INC_TCP_ACK_VALUE] = {
> + .name = "value",
> + .help = "the value to increase TCP acknowledgment number by",
> + .next = NEXT(action_inc_tcp_ack, NEXT_ENTRY(UNSIGNED)),
> + .args = ARGS(ARGS_ENTRY_HTON
> + (struct rte_flow_action_modify_tcp_ack, value)),

Ditto.

> + .call = parse_vc_conf,
> + },
> + [ACTION_DEC_TCP_ACK] = {
> + .name = "dec_tcp_ack",
> + .help = "decrease TCP acknowledgment number",
> + .priv = PRIV_ACTION(DEC_TCP_ACK,
> + sizeof(struct rte_flow_action_modify_tcp_ack)),
> + .next = NEXT(action_dec_tcp_ack),
> + .call = parse_vc,
> + },
> + [ACTION_DEC_TCP_ACK_VALUE] = {
> + .name = "value",
> + .help = "the value to decrease TCP acknowledgment number by",
> + .next = NEXT(action_dec_tcp_ack, NEXT_ENTRY(UNSIGNED)),
> + .args = ARGS(ARGS_ENTRY_HTON
> + (struct rte_flow_action_modify_tcp_ack, value)),

Ditto.

> + .call = parse_vc_conf,
> + },
>  };
>  
>  /** Remove and return last entry from argument stack. */
> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst 
> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> index 1a12da4..c6f8b2c 100644
> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> @@ -3961,6 +3961,22 @@ This section lists supported actions and their 
> attributes, if any.
>  
>- ``mac_addr {MAC-48}``: new destination MAC address
>  
> +- ``inc_tcp_seq``: Increase sequence number in the outermost TCP header
> +
> +  - ``value {unsigned}``: Value to increase TCP sequence number by
> +
> +- ``dec_tcp_seq``: Decrease sequence number in the outermost TCP header
> +
> +  - ``value {unsigned}``: Value to decrease TCP sequence number by
> +
> +- ``inc_tcp_ack``: Increase acknowledgment number in the outermost TCP header
> +
> +  - ``value {unsigned}``: Value to increase TCP acknowledgment number by
> +
> +- ``dec_tcp_ack``: Decrease acknowledgment number in the outermost TCP header
> +
> +  - ``value {unsigned}``: Value to decrease TCP acknowledgment number by

Please add missing "." to each line.

> +
>  Destroying flow rules
>  ~~~~~~~~~~~~~~~~~~~~~
>  
> -- 
> 1.8.3.1
> 

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH 1/3] ethdev: add actions to modify TCP header fields

2019-03-29 Thread Adrien Mazarguil

ibrte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> index 3277be1..589d0b9 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -143,6 +143,14 @@ struct rte_flow_desc_data {
>   MK_FLOW_ACTION(SET_TTL, sizeof(struct rte_flow_action_set_ttl)),
>   MK_FLOW_ACTION(SET_MAC_SRC, sizeof(struct rte_flow_action_set_mac)),
>   MK_FLOW_ACTION(SET_MAC_DST, sizeof(struct rte_flow_action_set_mac)),
> + MK_FLOW_ACTION(INC_TCP_SEQ,
> + sizeof(struct rte_flow_action_modify_tcp_seq)),
> + MK_FLOW_ACTION(DEC_TCP_SEQ,
> + sizeof(struct rte_flow_action_modify_tcp_seq)),
> + MK_FLOW_ACTION(INC_TCP_ACK,
> + sizeof(struct rte_flow_action_modify_tcp_ack)),
> + MK_FLOW_ACTION(DEC_TCP_ACK,
> + sizeof(struct rte_flow_action_modify_tcp_ack)),
>  };
>  
>  static int
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index c0fe879..74cd03e 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -1651,6 +1651,46 @@ enum rte_flow_action_type {
>* See struct rte_flow_action_set_mac.
>*/
>   RTE_FLOW_ACTION_TYPE_SET_MAC_DST,
> +
> + /**
> +  * Increase sequence number in the outermost TCP header.
> +  *
> +  * If flow pattern does not define a valid RTE_FLOW_ITEM_TYPE_TCP,
> +  * the PMD should return a RTE_FLOW_ERROR_TYPE_ACTION error.

Ditto.

> +  *
> +  * See struct rte_flow_action_modify_tcp_seq
> +  */
> + RTE_FLOW_ACTION_TYPE_INC_TCP_SEQ,
> +
> + /**
> +  * Decrease sequence number in the outermost TCP header.
> +  *
> +  * If flow pattern does not define a valid RTE_FLOW_ITEM_TYPE_TCP,
> +  * the PMD should return a RTE_FLOW_ERROR_TYPE_ACTION error.

Ditto.

> +  *
> +  * See struct rte_flow_action_modify_tcp_seq
> +  */
> + RTE_FLOW_ACTION_TYPE_DEC_TCP_SEQ,
> +
> + /**
> +  * Increase acknowledgment number in the outermost TCP header.
> +  *
> +  * If flow pattern does not define a valid RTE_FLOW_ITEM_TYPE_TCP,
> +  * the PMD should return a RTE_FLOW_ERROR_TYPE_ACTION error.
> +  *

Ditto.

> +  * See struct rte_flow_action_modify_tcp_ack
> +  */
> + RTE_FLOW_ACTION_TYPE_INC_TCP_ACK,
> +
> + /**
> +  * Decrease acknowledgment number in the outermost TCP header.
> +  *
> +  * If flow pattern does not define a valid RTE_FLOW_ITEM_TYPE_TCP,
> +  * the PMD should return a RTE_FLOW_ERROR_TYPE_ACTION error.
> +  *

Ditto.

> +  * See struct rte_flow_action_modify_tcp_ack
> +  */
> + RTE_FLOW_ACTION_TYPE_DEC_TCP_ACK,
>  };
>  
>  /**
> @@ -2122,6 +2162,26 @@ struct rte_flow_action_set_mac {
>   uint8_t mac_addr[ETHER_ADDR_LEN];
>  };
>  
> +/**

Experimental tag is missing.

> + * RTE_FLOW_ACTION_TYPE_INC_TCP_SEQ
> + * RTE_FLOW_ACTION_TYPE_DEC_TCP_SEQ
> + *
> + * Increase/Decrease outermost TCP's sequence number

Suggestion:

 Increase/decrease outermost TCP sequence number.

> + */
> +struct rte_flow_action_modify_tcp_seq {
> + rte_be32_t value;

Field documentation is mandatory, e.g.:

 rte_be32_t value; /**< Value to add/subtract. */

Besides, I'm not sure this value should be big endian since it's not stored
as is in the TCP header; it's used by the host system to compute a new
value.

Also what about having another field to specify what needs to be done with
this value (e.g. add/sub/set - in which case big endian makes sense) to
reduce the number of new actions? Something like:

 struct rte_flow_action_mod_tcp_seq {
 enum rte_flow_action_mod_tcp_seq_op {
 RTE_FLOW_ACTION_MOD_TCP_SEQ_OP_ADD,
 RTE_FLOW_ACTION_MOD_TCP_SEQ_OP_SUB,
 RTE_FLOW_ACTION_MOD_TCP_SEQ_OP_SET,
 } op; /**< Operation to perform. */
 rte_be32_t value; /**< Value to use with operation. */
 };
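
For illustration, a minimal sketch of what handling such a structure could look
like on the host side. The structure is the one suggested above;
mod_tcp_seq_apply() and the rte_be32_t typedef are assumptions made for this
example only:

```c
#include <stdint.h>
#include <arpa/inet.h> /* htonl(), ntohl() */

typedef uint32_t rte_be32_t; /* stand-in for the DPDK type */

struct rte_flow_action_mod_tcp_seq {
	enum rte_flow_action_mod_tcp_seq_op {
		RTE_FLOW_ACTION_MOD_TCP_SEQ_OP_ADD,
		RTE_FLOW_ACTION_MOD_TCP_SEQ_OP_SUB,
		RTE_FLOW_ACTION_MOD_TCP_SEQ_OP_SET,
	} op; /**< Operation to perform. */
	rte_be32_t value; /**< Value to use with operation. */
};

/* Hypothetical software fallback: return the new sequence number in network
 * byte order, given the current one as found in the TCP header. */
static rte_be32_t
mod_tcp_seq_apply(rte_be32_t seq_be,
		  const struct rte_flow_action_mod_tcp_seq *conf)
{
	uint32_t seq = ntohl(seq_be);
	uint32_t val = ntohl(conf->value);

	switch (conf->op) {
	case RTE_FLOW_ACTION_MOD_TCP_SEQ_OP_ADD:
		seq += val; /* wraps modulo 2^32 */
		break;
	case RTE_FLOW_ACTION_MOD_TCP_SEQ_OP_SUB:
		seq -= val; /* wraps modulo 2^32 */
		break;
	case RTE_FLOW_ACTION_MOD_TCP_SEQ_OP_SET:
		/* Big endian pays off here: stored as is. */
		return conf->value;
	}
	return htonl(seq);
}
```

Only the SET case uses the value without conversion, which is where a big
endian field makes sense.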

> +};
> +
> +/**

Experimental tag also missing.

> + * RTE_FLOW_ACTION_TYPE_INC_TCP_ACK
> + * RTE_FLOW_ACTION_TYPE_DEC_TCP_ACK
> + *
> + * Increase/Decrease TCP's acknowledgment number.

Suggestion:

 Increase/decrease outermost TCP acknowledgment number.

> + */
> +struct rte_flow_action_modify_tcp_ack {
> + rte_be32_t value;

Field documentation also missing.

> +};
> +
>  /*
>   * Definition of a single action.
>   *
> -- 
> 1.8.3.1
> 

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH] app/testpmd: fix MPLSoUDP encapsulation

2018-11-22 Thread Adrien Mazarguil

On Thu, Nov 22, 2018 at 09:56:09AM +, Dekel Peled wrote:
> Thanks, PSB.
> 
> > -Original Message-
> > From: Adrien Mazarguil 
> > Sent: Thursday, November 22, 2018 11:05 AM
> > To: Dekel Peled 
> > Cc: wenzhuo...@intel.com; jingjing...@intel.com;
> > bernard.iremon...@intel.com; dev@dpdk.org; Ori Kam
> > ; Shahaf Shuler 
> > Subject: Re: [dpdk-dev] [PATCH] app/testpmd: fix MPLSoUDP encapsulation
> > 
> > On Mon, Nov 19, 2018 at 06:54:50PM +0200, Dekel Peled wrote:
> > > Set MPLS label value in appropriate location at mplsoudp_encap_conf,
> > > so it is correctly copied to rte_flow_item_mpls.
> > >
> > > Fixes: a1191d39cb57 ("app/testpmd: add MPLSoUDP encapsulation")
> > > Cc: or...@mellanox.com
> > >
> > > Signed-off-by: Dekel Peled 
> > > ---
> > >  app/test-pmd/cmdline.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> > > > index 1275074..40e64cc 100644
> > > > --- a/app/test-pmd/cmdline.c
> > > > +++ b/app/test-pmd/cmdline.c
> > > > @@ -15804,10 +15804,10 @@ static void cmd_set_mplsoudp_encap_parsed(void *parsed_result,
> > > >   struct cmd_set_mplsoudp_encap_result *res = parsed_result;
> > > >   union {
> > > >   uint32_t mplsoudp_label;
> > > > - uint8_t label[3];
> > > > + uint8_t label[4];
> > > >   } id = {
> > > >   .mplsoudp_label =
> > > > - rte_cpu_to_be_32(res->label) & RTE_BE32(0x00ff),
> > > > + rte_cpu_to_be_32(res->label<<4) & RTE_BE32(0x00ff),
> > 
> > > Just to be sure, since label is a 20-bit value, isn't the shift supposed to
> > > be 12 bits? In which case that mask is harmless but misleading. How about:
> > > 
> > >  .mplsoudp_label = rte_cpu_to_be_32((res->label & 0xfffff) << 12);
> > 
> 
> Label is 20-bits value in a 24-bits field, see struct rte_flow_item_mpls.

OK, I know, what I missed was the following line:

 rte_memcpy(mplsoudp_encap_conf.label, &id.label[1], 3);

Just a suggestion then: using the same memcpy() offsets in both places for
clarity:

  rte_be32_t label = rte_cpu_to_be_32(res->label << 12);

  memcpy(mplsoudp_encap_conf.label, &label, 3);
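
To check that both layouts end up with the same three bytes, here is a small
standalone sketch. It uses htonl() instead of the DPDK byte order helpers so it
builds without DPDK, and both function names are made up for this example:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h> /* htonl() */

/* Variant suggested above: shift the 20-bit label by 12 so it occupies the
 * most significant bits of the 32-bit word, then copy the first 3 bytes of
 * its big endian representation. */
static void
mpls_label_pack_shift12(uint8_t dst[3], uint32_t label)
{
	uint32_t be = htonl((label & 0xfffff) << 12);

	memcpy(dst, &be, 3);
}

/* Variant from the patch: shift by 4 so the label sits in the upper 20 bits
 * of the low 24 bits, then skip the first (zero) byte when copying. */
static void
mpls_label_pack_shift4(uint8_t dst[3], uint32_t label)
{
	uint32_t be = htonl((label & 0xfffff) << 4);

	memcpy(dst, (const uint8_t *)&be + 1, 3);
}
```

For label 0xabcde both produce the bytes ab cd e0, so the difference is only
about which offset the reader has to keep in mind.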

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH] app/testpmd: fix MPLSoUDP encapsulation

2018-11-22 Thread Adrien Mazarguil

On Mon, Nov 19, 2018 at 06:54:50PM +0200, Dekel Peled wrote:
> Set MPLS label value in appropriate location at mplsoudp_encap_conf,
> so it is correctly copied to rte_flow_item_mpls.
> 
> Fixes: a1191d39cb57 ("app/testpmd: add MPLSoUDP encapsulation")
> Cc: or...@mellanox.com
> 
> Signed-off-by: Dekel Peled 
> ---
>  app/test-pmd/cmdline.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 1275074..40e64cc 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -15804,10 +15804,10 @@ static void cmd_set_mplsoudp_encap_parsed(void *parsed_result,
>   struct cmd_set_mplsoudp_encap_result *res = parsed_result;
>   union {
>   uint32_t mplsoudp_label;
> - uint8_t label[3];
> + uint8_t label[4];
>   } id = {
>   .mplsoudp_label =
> - rte_cpu_to_be_32(res->label) & RTE_BE32(0x00ff),
> + rte_cpu_to_be_32(res->label<<4) & RTE_BE32(0x00ff),

Just to be sure, since label is a 20-bit value, isn't the shift supposed to
be 12 bits? In which case that mask is harmless but misleading. How about:

 .mplsoudp_label = rte_cpu_to_be_32((res->label & 0xfffff) << 12);

>   };
>  
>   if (strcmp(res->mplsoudp, "mplsoudp_encap") == 0)
> -- 
> 1.8.3.1
> 

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v2] ethdev: document RSS default key and types

2018-11-14 Thread Adrien Mazarguil

On Wed, Nov 14, 2018 at 01:51:19PM +, Shahaf Shuler wrote:

> IMO, it will make it more clear if the key will *have* to be null, because
> there is no single good reason to have it otherwise.
> 
> However it looks like an endless debate between strict and relaxed API. There
> are points to both sides, yet we are failing to converge.
> Mlx5 already implements according to the rss_key_len and rss_key approach.
> What are other PMDs doing?
> 
> Assuming there is a consensus among the PMDs, maybe we can follow it in order
> to avoid the extra work.
> Is it that critical for you to enforce only the key_len w/o the rss_key?

Not at all, I don't mind extra checks in PMDs actually.

To be clear, here's a list of what I consider valid PMD checks:

- if (!key_len) use_default_key();

- if (key_len) { assert(key); use_app_key(); }

- if (key_len) { if (!key) complain(); else use_app_key(); }

- if (key_len) { if (!key) { complain(); use_default_key(); } else
  use_app_key(); }

- if (key && key_len) use_app_key(); else use_default_key(); /* it's OK
  since the alternative is a crash */

- if (!key_len || !key) use_default_key(); /* ditto */

While those are invalid:

- if (!key_len && !key) use_default_key(); /* err, else what? */

- if (!key_len) { assert(!key); use_default_key(); } /* unless you hate
  users */

- if (!key_len) { if (key) complain(); use_default_key(); } /* extra noise
  can be annoying */

What I'm most concerned with is rte_flow API documentation, that is, what we
ask users to do in order to achieve something. A default behavior for a zero
value is currently documented on "level" and "types" fields. "key_len" and
"queue_num" are currently lacking this information.

Just like "key", requesting RSS without providing a list of queues should
default to all configured Rx queues for convenience. In that case the
"queue" pointer can be undefined, not necessarily NULL: the PMD won't look
for queues in that list anyway, so its value doesn't matter.

So documentation will describe that if "key_len" or "queue_num" are nonzero,
non-default behavior is requested and the related "key"/"queue" fields must
be set and valid. Otherwise they can remain undefined.
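
A minimal sketch of the second valid check from the list above, written as a
standalone helper. Everything here is hypothetical: the 40-byte default key,
struct rss_key_ref and rss_select_key() are assumptions for illustration and
do not come from any PMD:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PMD default key; real drivers come with their own. */
static const uint8_t pmd_default_key[40] = { 0x6d, 0x5a };

/* Key actually programmed into the device. */
struct rss_key_ref {
	const uint8_t *key;
	uint32_t len;
};

/* Only key_len decides between default and application-provided key. The
 * key pointer is not even read when key_len is zero, so it may be left
 * undefined by the application in that case. */
static struct rss_key_ref
rss_select_key(const uint8_t *key, uint32_t key_len)
{
	struct rss_key_ref ref;

	if (!key_len) {
		ref.key = pmd_default_key;
		ref.len = sizeof(pmd_default_key);
		return ref;
	}
	/* Nonzero length with an invalid key is an application bug. */
	assert(key != NULL);
	ref.key = key;
	ref.len = key_len;
	return ref;
}
```

The same pattern would apply to "queue_num" and "queue": a zero count selects
all configured Rx queues and the pointer is ignored.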

> > > To this doc issue,
> > > I don't understand on what cases it makes sense for application to have
> > > rss_key_len = 0 while rss_key != NULL. This is obviously inconsistent
> > > input, and of course all PMD will just ignore the key.
> > > I think enforcing rss_key and rss_key_len to be NULL is a fair requirement
> > > from application, and it makes no confusion in the API inputs.
> > 
> > Then you need to define what happens when key_len != 0 and key == NULL,
> > also when key_len == 0 and key != NULL, none of which make sense
> > currently.
> 
> Of course all are inconsistent and the PMD is free to reject such rules. This
> is not related though to how we define the default RSS right?

If we force users to set both key_len = 0 *and* key = NULL like you suggest
in order to get the default behavior and key_len is not enough, I think it
makes sense to also describe these two cases for consistency. Users are
going to wonder why is the damn PMD reading key at all if its length is
zero, so we have to explain that no, the PMD doesn't do that but the pointer
still has to be NULL anyway.

Likewise users shouldn't be tempted to enter a nonzero key length without a
valid key. It also has to be described should we choose this approach.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v2] ethdev: document RSS default key and types

2018-11-14 Thread Adrien Mazarguil

Hi Shahaf,

On Tue, Nov 13, 2018 at 06:39:04PM +, Shahaf Shuler wrote:
> Hi Adrien, 
> 
> Tuesday, November 13, 2018 7:15 PM, Adrien Mazarguil:
> > Subject: Re: [dpdk-dev] [PATCH v2] ethdev: document RSS default key and
> > types
> > 
> > Again a bit late to the party, please see below.
> > 
> > On Sun, Nov 11, 2018 at 09:35:22AM +, Ori Kam wrote:
> 
> [...]
> 
> > > > The segfault is the result of commit a4391f8bae ("app/testpmd: set
> > > > default RSS key as null").
> > > > Reverting this commit should fix the segfault but it also means
> > > > there is no way to set default key (key=NULL) with testpmd.
> > > > Need to check if this is only a testpmd limitation and not all
> > > > applications limitation.
> > > >
> > > > We should decide how an application can set default RSS without
> > > > knowing anything about keys.
> > > >
> > >
> > > I agree with Adrien that the main criteria should be the length.
> > > Maybe the set default RSS in testpmd should get new parameter.
> > 
> > Since [1] was reverted and we seem to agree that a zero key_len should
> > trigger a PMD-specific default key, this can already be requested with
> > testpmd by overriding key_len, e.g.:
> > 
> >  flow create 1 pattern eth / end actions rss key_len 0 / end
> > 
> > Using an empty string as the key would yield the same result but cannot be
> > expressed on the command line yet. Note that specifying a key automatically
> > overrides key_len, so key_len must be forced to 0 last to get PMD defaults:
> > 
> >  flow create 1 pattern eth / end actions rss key foo key_len 0 / end
> 
> I don't understand why we are backing up API claims with "how testpmd is
> implemented". The APIs should be correct, regardless of how testpmd is using
> them.

This wasn't the intent, I mean, currently one cannot input something like
that to get a zero key length:

 flow create 1 pattern eth / end actions rss key "" / end

Because "" is interpreted literally. So the only way to request a zero key
length is by explicitly setting it through "key_len 0".

The API remains clear: a zero key length requests default behavior from the
PMD regardless of the key pointer, which doesn't *have* to be NULL, merely
undefined. Testpmd does exactly that.

> To this doc issue,
> I don't understand on what cases it makes sense for application to have
> rss_key_len = 0 while rss_key != NULL. This is obviously inconsistent input,
> and of course all PMD will just ignore the key.
> I think enforcing rss_key and rss_key_len to be NULL is a fair requirement
> from application, and it makes no confusion in the API inputs.

Then you need to define what happens when key_len != 0 and key == NULL, also
when key_len == 0 and key != NULL, none of which make sense currently.

There's no reason for the PMD to even look at the key pointer if key_len is
0. Only if it is nonzero *can* the PMD check the pointer's validity, but
there's no reason to: an invalid pointer is a programming error in the
application, and assert() is more appropriate for such situations.

I agree there's a lack of documentation which must be addressed, my point is
that key_len is the only guarantee a PMD needs from the application.

> > Here key_len is set to testpmd's default size when parsing "rss", updated to
> > 3 when parsing "key foo" and updated once again when parsing "key_len 0".
> > 
> > Lastly, while it would make sense for testpmd to use 0 as the default value,
> > doing so yields inconsistent balancing results between vendors/devices as
> > they all come with a different key. Same reason as initializing the RSS
> > types field to the global rss_hf instead of 0.
> 
> 
> 
> > 
> > [1] "app/testpmd: revert setting default RSS"
> > 
> > https://mails.dpdk.org/archives/dev/2018-November/118786.html

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v2] ethdev: document RSS default key and types

2018-11-13 Thread Adrien Mazarguil

Again a bit late to the party, please see below.

On Sun, Nov 11, 2018 at 09:35:22AM +, Ori Kam wrote:
> > -Original Message-
> > From: dev  On Behalf Of Ophir Munk
> > Sent: Friday, November 9, 2018 10:14 AM
> > To: Yongseok Koh ; Adrien Mazarguil
> > ; Andrew Rybchenko
> > 
> > Cc: Ferruh Yigit ; dev@dpdk.org; Thomas Monjalon
> > ; Asaf Penso ; Shahaf Shuler
> > ; Olga Shern 
> > Subject: Re: [dpdk-dev] [PATCH v2] ethdev: document RSS default key and 
> > types
> > 
> > > -Original Message-
> > > From: Yongseok Koh
> > > Sent: Friday, November 09, 2018 1:07 AM
> > > To: Ophir Munk ; Adrien Mazarguil
> > > ; Andrew Rybchenko
> > > 
> > > Cc: Ferruh Yigit ; dev@dpdk.org; Thomas Monjalon
> > > ; Asaf Penso ; Shahaf
> > > Shuler ; Olga Shern 
> > > Subject: Re: [dpdk-dev] [PATCH v2] ethdev: document RSS default key and
> > > types
> > >

> > >
> > > -   if (src.rss->key_len) {
> > > +   if (src.rss->key && src.rss->key_len) {
> > >
> > > but looks like we should conclude this thread first?
> > > Or, does the fix make any sense regardless of having key_len=0 or key=null
> > > for default key?
> > > Having more sanity check is no harm usually...
> > >
> > >
> > > Thanks,
> > > Yongseok
> > >
> > 
> > The segfault is the result of commit a4391f8bae ("app/testpmd: set default
> > RSS key as null").
> > Reverting this commit should fix the segfault but it also means there is no
> > way to set default key (key=NULL) with testpmd.
> > Need to check if this is only a testpmd limitation and not all applications
> > limitation.
> > 
> > We should decide how an application can set default RSS without knowing
> > anything about keys.
> > 
> 
> I agree with Adrien that the main criteria should be the length.
> Maybe the set default RSS in testpmd should get new parameter.

Since [1] was reverted and we seem to agree that a zero key_len should
trigger a PMD-specific default key, this can already be requested with
testpmd by overriding key_len, e.g.:

 flow create 1 pattern eth / end actions rss key_len 0 / end 

Using an empty string as the key would yield the same result but cannot be
expressed on the command line yet. Note that specifying a key automatically
overrides key_len, so key_len must be forced to 0 last to get PMD defaults:

 flow create 1 pattern eth / end actions rss key foo key_len 0 / end

Here key_len is set to testpmd's default size when parsing "rss", updated to
3 when parsing "key foo" and updated once again when parsing "key_len 0".
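
The order dependency can be pictured with a toy model of the parser state.
This is not testpmd code: the structure, the three token handlers and the
40-byte default length are all assumptions made for this sketch:

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_DEFAULT_KEY_LEN 40 /* assumed default size, for illustration */

/* Toy stand-in for the RSS action configuration being built. */
struct rss_conf_sketch {
	const char *key;
	uint32_t key_len;
};

static const char sketch_default_key[SKETCH_DEFAULT_KEY_LEN + 1] = "";

/* "rss": start from the application defaults. */
static void
tok_rss(struct rss_conf_sketch *conf)
{
	conf->key = sketch_default_key;
	conf->key_len = SKETCH_DEFAULT_KEY_LEN;
}

/* "key <string>": sets the key and implicitly overrides key_len. */
static void
tok_key(struct rss_conf_sketch *conf, const char *str)
{
	conf->key = str;
	conf->key_len = (uint32_t)strlen(str);
}

/* "key_len <n>": overrides the length only, the key pointer is untouched. */
static void
tok_key_len(struct rss_conf_sketch *conf, uint32_t len)
{
	conf->key_len = len;
}
```

Feeding it "rss", "key foo" and "key_len 0" in that order leaves key_len at 0
with a stale but harmless key pointer; swapping the last two tokens would end
with key_len 3 instead.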

Lastly, while it would make sense for testpmd to use 0 as the default value,
doing so yields inconsistent balancing results between vendors/devices as
they all come with a different key. Same reason as initializing the RSS
types field to the global rss_hf instead of 0.

[1] "app/testpmd: revert setting default RSS"
https://mails.dpdk.org/archives/dev/2018-November/118786.html

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v2] ethdev: document RSS default key and types

2018-11-07 Thread Adrien Mazarguil

On Wed, Nov 07, 2018 at 03:13:07PM +, Ophir Munk wrote:
> 
> > -Original Message-
> > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> > Sent: Wednesday, November 07, 2018 4:06 PM
> > To: Ophir Munk 
> > Cc: Ferruh Yigit ; Andrew Rybchenko
> > ; dev@dpdk.org; Thomas Monjalon
> > ; Asaf Penso ; Shahaf
> > Shuler ; Olga Shern 
> > Subject: Re: [dpdk-dev] [PATCH v2] ethdev: document RSS default key and
> > types
> > 
> > On Wed, Nov 07, 2018 at 12:39:24PM +, Ophir Munk wrote:
> > > > -Original Message-
> > > > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> > > > Sent: Wednesday, November 07, 2018 11:31 AM
> > > > To: Ophir Munk 
> > > > Cc: Ferruh Yigit ; Andrew Rybchenko
> > > > ; dev@dpdk.org; Thomas Monjalon
> > > > ; Asaf Penso ; Shahaf
> > > > Shuler ; Olga Shern 
> > > > Subject: Re: [dpdk-dev] [PATCH v2] ethdev: document RSS default key
> > > > and types
> > > >
> > > > On Wed, Nov 07, 2018 at 09:23:42AM +, Ophir Munk wrote:
> > > > > struct rte_flow_action_rss include fields 'key' and 'types'.
> > > > > Field 'key' is a pointer to bytes array (uint8_t *) which contains
> > > > > the specific RSS hash key.
> > > > > If an application is only interested in default RSS operation it
> > > > > should not care about the specific hash key. The application can
> > > > > set the hash key to NULL such that any PMD uses its default RSS key.
> > > > >
> > > > > Field 'types' is a uint64_t bits flag used to specify a specific
> > > > > RSS hash type such as ETH_RSS_IP (see ETH_RSS_*).
> > > > > If an application does not care about the specific RSS type it can
> > > > > set this field to 0 such that any PMD uses its default type.
> > > > >
> > > > > Signed-off-by: Ophir Munk 
> > > > > ---
> > > > >  lib/librte_ethdev/rte_flow.h | 9 +++--
> > > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/lib/librte_ethdev/rte_flow.h
> > > > > b/lib/librte_ethdev/rte_flow.h index c0fe879..ca9e135 100644
> > > > > --- a/lib/librte_ethdev/rte_flow.h
> > > > > +++ b/lib/librte_ethdev/rte_flow.h
> > > > > @@ -1782,10 +1782,15 @@ struct rte_flow_action_rss {
> > > > >* through.
> > > > >*/
> > > > >   uint32_t level;
> > > > > - uint64_t types; /**< Specific RSS hash types (see ETH_RSS_*). */
> > > > > + /**
> > > > > +  * Specific RSS hash types (see ETH_RSS_*),
> > > > > +  * or 0 for PMD specific default.
> > > > > +  */
> > > > > + uint64_t types;
> > > > >   uint32_t key_len; /**< Hash key length in bytes. */
> > > > >   uint32_t queue_num; /**< Number of entries in @p queue. */
> > > > > - const uint8_t *key; /**< Hash key. */
> > > > > + /** Hash key, or NULL for PMD specific default key. */
> > > > > + const uint8_t *key;
> > > >
> > > > I'd suggest to document that on key_len instead. If key_len is
> > > > nonzero, key cannot be NULL anyway.
> > >
> > > The decision if a key/len combination is valid is done in the PMD action
> > > validation API.
> > > For example, in MLX5 key==NULL and key_len==40 is accepted.
> > > The combination key==NULL and key_len==0 should always succeed,
> > > however the "must" parameter for RSS default is key==NULL and not
> > > key_len==0.
> > 
> > I understand this is how the mlx5 PMD implemented it, but my point is that
> > it makes more sense API-wise to define key_len == 0 as the trigger for a
> > default RSS hash key than key == NULL.
> > 
> > My suggestion is to follow the same trend as memcpy(), mmap(), snprintf()
> > and other well-known functions that take a size when dealing with
> > NULL/undefined pointers. Only size matters! :)
> > 
> 
> Please let's stay backward compatible and consistent with previous dpdk
> releases where key==NULL is used in struct rte_eth_rss_conf (see code
> snippet in [1]).

And I thought I wouldn't hear again from that structure after ac8d22de2394
("ethdev: flatten RSS configuration in flow API") go

Re: [dpdk-dev] [PATCH v2] ethdev: document RSS default key and types

2018-11-07 Thread Adrien Mazarguil

On Wed, Nov 07, 2018 at 12:39:24PM +, Ophir Munk wrote:
> > -Original Message-
> > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> > Sent: Wednesday, November 07, 2018 11:31 AM
> > To: Ophir Munk 
> > Cc: Ferruh Yigit ; Andrew Rybchenko
> > ; dev@dpdk.org; Thomas Monjalon
> > ; Asaf Penso ; Shahaf
> > Shuler ; Olga Shern 
> > Subject: Re: [dpdk-dev] [PATCH v2] ethdev: document RSS default key and
> > types
> > 
> > On Wed, Nov 07, 2018 at 09:23:42AM +, Ophir Munk wrote:
> > > struct rte_flow_action_rss include fields 'key' and 'types'.
> > > Field 'key' is a pointer to bytes array (uint8_t *) which contains the
> > > specific RSS hash key.
> > > If an application is only interested in default RSS operation it
> > > should not care about the specific hash key. The application can set
> > > the hash key to NULL such that any PMD uses its default RSS key.
> > >
> > > Field 'types' is a uint64_t bits flag used to specify a specific RSS
> > > hash type such as ETH_RSS_IP (see ETH_RSS_*).
> > > If an application does not care about the specific RSS type it can set
> > > this field to 0 such that any PMD uses its default type.
> > >
> > > Signed-off-by: Ophir Munk 
> > > ---
> > >  lib/librte_ethdev/rte_flow.h | 9 +++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/lib/librte_ethdev/rte_flow.h
> > > b/lib/librte_ethdev/rte_flow.h index c0fe879..ca9e135 100644
> > > --- a/lib/librte_ethdev/rte_flow.h
> > > +++ b/lib/librte_ethdev/rte_flow.h
> > > @@ -1782,10 +1782,15 @@ struct rte_flow_action_rss {
> > >* through.
> > >*/
> > >   uint32_t level;
> > > - uint64_t types; /**< Specific RSS hash types (see ETH_RSS_*). */
> > > + /**
> > > +  * Specific RSS hash types (see ETH_RSS_*),
> > > +  * or 0 for PMD specific default.
> > > +  */
> > > + uint64_t types;
> > >   uint32_t key_len; /**< Hash key length in bytes. */
> > >   uint32_t queue_num; /**< Number of entries in @p queue. */
> > > - const uint8_t *key; /**< Hash key. */
> > > + /** Hash key, or NULL for PMD specific default key. */
> > > + const uint8_t *key;
> > 
> > I'd suggest to document that on key_len instead. If key_len is nonzero, key
> > cannot be NULL anyway.
> 
> The decision if a key/len combination is valid is done in the PMD action
> validation API.
> For example, in MLX5 key==NULL and key_len==40 is accepted.
> The combination key==NULL and key_len==0 should always succeed, however the
> "must" parameter for RSS default is key==NULL and not key_len==0.

I understand this is how the mlx5 PMD implemented it, but my point is that
it makes more sense API-wise to define key_len == 0 as the trigger for a
default RSS hash key than key == NULL.

My suggestion is to follow the same trend as memcpy(), mmap(), snprintf()
and other well-known functions that take a size when dealing with
NULL/undefined pointers. Only size matters! :)
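
A rough sketch of what I mean on the PMD side, assuming a local two-field structure and a made-up 4-byte default key (neither is the DPDK definition nor any driver's actual key):

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the two fields under discussion; not rte_flow.h. */
struct rss_conf_sketch {
	uint32_t key_len;   /* Hash key length in bytes, 0 for default. */
	const uint8_t *key; /* Only meaningful when key_len is nonzero. */
};

/* Hypothetical driver default key. */
static const uint8_t sketch_default_key[4] = { 0x6d, 0x5a, 0x56, 0xda };

/*
 * Resolve the effective key. Returns 0 on success, -1 when a nonzero
 * length comes with a NULL pointer.
 */
static int
resolve_key(const struct rss_conf_sketch *conf,
	    const uint8_t **key, uint32_t *len)
{
	if (conf->key_len == 0) {
		/* The key pointer is not consulted at all in this case. */
		*key = sketch_default_key;
		*len = sizeof(sketch_default_key);
		return 0;
	}
	if (conf->key == NULL)
		return -1;
	*key = conf->key;
	*len = conf->key_len;
	return 0;
}
```

The length alone decides; the pointer only gets checked once the length says there is something to read.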

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] a doubt about rss types action in rte_flow

2018-11-07 Thread Adrien Mazarguil

Hi and sorry for the delay.

On Thu, Oct 18, 2018 at 06:42:52AM +, Peng, Yuan wrote:
> Hi Adrien,
> 
> I have a doubt about the action rss types in rte_flow.
> testpmd> flow create 0 ingress pattern end actions rss types end / end
> 
> what is the expected function of the command?
> Does it mean enable RSS in no types? So actually it can disable rss all?

Doing so requests whatever counts as default RSS from the driver as
documented in doc/guides/prog_guide/rte_flow.rst:

 "Unlike global RSS settings used by other DPDK APIs, unsetting the ``types``
  field does not disable RSS in a flow rule. Doing so instead requests safe
  unspecified "best-effort" settings from the underlying PMD, which depending
  on the flow rule, may result in anything ranging from empty (single queue)
  to all-inclusive RSS."

> There are different execution result of the command from our different NICs.
> Some NICs report error:
> Caught error type 2 (flow rule (handle)): Failed to create flow.: Invalid argument
> With some NICs, the command can be executed successfully, and disable rss
> function for all packet types.
> 
> Could you help to solve my questions?

PMDs must handle it as described by the documentation, 0 being the only safe
value applications can rely on regardless of device properties. It is meant for
applications that just want traffic to be spread somehow among the queues they
configured instead of landing on a single queue.

Therefore PMDs that currently reject it should be fixed.
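
For what it's worth, a sketch of the expected handling, with invented SK_* flags standing in for ETH_RSS_* and an imaginary device capability mask (nothing here comes from an actual PMD):

```c
#include <stdint.h>

/* Hypothetical hash type bits standing in for ETH_RSS_* flags. */
#define SK_RSS_IP  (UINT64_C(1) << 0)
#define SK_RSS_UDP (UINT64_C(1) << 1)
#define SK_RSS_TCP (UINT64_C(1) << 2)

/* What this imaginary device is able to hash on. */
#define SK_DEV_SUPPORTED (SK_RSS_IP | SK_RSS_UDP)

/*
 * Return 0 and the effective hash types, or -1 when the request includes
 * types the device cannot handle.
 */
static int
sk_rss_types(uint64_t requested, uint64_t *effective)
{
	if (requested == 0) {
		/* Not an error: fall back to whatever the driver prefers. */
		*effective = SK_RSS_IP;
		return 0;
	}
	if (requested & ~SK_DEV_SUPPORTED)
		return -1;
	*effective = requested;
	return 0;
}
```

Only explicitly requested types the device cannot honor lead to a rejection, an empty "types" never does.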

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v2] ethdev: document RSS default key and types

2018-11-07 Thread Adrien Mazarguil

On Wed, Nov 07, 2018 at 09:23:42AM +, Ophir Munk wrote:
> struct rte_flow_action_rss include fields 'key' and 'types'.
> Field 'key' is a pointer to bytes array (uint8_t *) which contains the
> specific RSS hash key.
> If an application is only interested in default RSS operation it
> should not care about the specific hash key. The application can set
> the hash key to NULL such that any PMD uses its default RSS key.
> 
> Field 'types' is a uint64_t bits flag used to specify a specific RSS
> hash type such as ETH_RSS_IP (see ETH_RSS_*).
> If an application does not care about the specific RSS type it can set
> this field to 0 such that any PMD uses its default type.
> 
> Signed-off-by: Ophir Munk 
> ---
>  lib/librte_ethdev/rte_flow.h | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index c0fe879..ca9e135 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -1782,10 +1782,15 @@ struct rte_flow_action_rss {
>* through.
>*/
>   uint32_t level;
> - uint64_t types; /**< Specific RSS hash types (see ETH_RSS_*). */
> + /**
> +  * Specific RSS hash types (see ETH_RSS_*),
> +  * or 0 for PMD specific default.
> +  */
> + uint64_t types;
>   uint32_t key_len; /**< Hash key length in bytes. */
>   uint32_t queue_num; /**< Number of entries in @p queue. */
> - const uint8_t *key; /**< Hash key. */
> + /** Hash key, or NULL for PMD specific default key. */
> + const uint8_t *key;

I'd suggest to document that on key_len instead. If key_len is nonzero, key
cannot be NULL anyway.

>   const uint16_t *queue; /**< Queue indices to use. */
>  };
>  
> -- 
> 1.8.3.1
> 

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH] ethdev: add function name to log message

2018-10-12 Thread Adrien Mazarguil

On Fri, Oct 12, 2018 at 11:45:01AM +0100, Ferruh Yigit wrote:
> On 10/12/2018 11:42 AM, Ferruh Yigit wrote:
> > On 10/11/2018 6:59 PM, Stephen Hemminger wrote:
> >> @@ -161,8 +161,9 @@ extern "C" {
> >>  
> >>  extern int rte_eth_dev_logtype;
> >>  
> >> -#define RTE_ETHDEV_LOG(level, ...) \
> >> -  rte_log(RTE_LOG_ ## level, rte_eth_dev_logtype, "" __VA_ARGS__)
> >> +#define RTE_ETHDEV_LOG(level, fmt, ...)   \
> >> +  rte_log(RTE_LOG_ ## level, rte_eth_dev_logtype, \
> >> +  "%s():" fmt, __func__, ## __VA_ARGS__)
> > 
> > +1 to adding function name, but
> > 
> > failsafe is giving build error [1] with clang because of ## usage [2], that
> > is why I add this as ` "" __VA_ARGS__` at first place but you can't do this
> > trick if __VA_ARGS__ used after fmt.
> > 
> > I am not aware of a solution for this, __VA_OPT__(,) also didn't work
> > with clang.
> 
> +cc Adrien & Gaetan,
> 
> I saw Adrien put some "workaround" to this for mlx5

Yes, through RTE_FMT() (rte_common.h). Something like this:

 #define RTE_ETHDEV_LOG(level, fmt, ...) \
 rte_log(RTE_LOG_ ## level, \
 rte_eth_dev_logtype, \
 "%s():" fmt, \
 __func__, \
 ## __VA_ARGS__)

Can be rewritten like that:

 #define RTE_ETHDEV_LOG(level, ...) \
 rte_log(RTE_LOG_ ## level, \
 rte_eth_dev_logtype, \
 RTE_FMT("%s():" RTE_FMT_HEAD(__VA_ARGS__,), \
 __func__, \
 RTE_FMT_TAIL(__VA_ARGS__,)))

Although neither pretty nor convenient, it does the job. In short:

- Remove "fmt" argument from prototype.
- Enclose format string and its arguments in RTE_FMT().
- Replace "fmt" with RTE_FMT_HEAD(__VA_ARGS__,).
- Replace "## __VA_ARGS__" with RTE_FMT_TAIL(__VA_ARGS__,).
- Yes, trailing commas are mandatory in RTE_FMT_(HEAD|TAIL)().
- Note it quietly appends a dummy "%.0s" argument to the format string.
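
For anyone who wants to try the trick outside DPDK, here is a standalone sketch. The SK_* helpers are local copies written from memory of how the rte_common.h ones behave, so treat their exact definitions as an assumption; output goes to a buffer through snprintf() instead of rte_log() to keep it self-contained.

```c
#include <stdio.h>

/* Local copies modeled on RTE_FMT(), RTE_FMT_HEAD() and RTE_FMT_TAIL(). */
#define SK_FMT(fmt, ...) fmt "%.0s", __VA_ARGS__ ""
#define SK_FMT_HEAD(fmt, ...) fmt
#define SK_FMT_TAIL(fmt, ...) __VA_ARGS__

/* Prefix the message with the calling function's name, no ## needed. */
#define SK_LOG(buf, size, ...) \
	snprintf(buf, size, \
		 SK_FMT("%s(): " SK_FMT_HEAD(__VA_ARGS__,), \
			__func__, \
			SK_FMT_TAIL(__VA_ARGS__,)))

/* Format string followed by one argument. */
static int
demo_log(char *buf, size_t size, int port)
{
	return SK_LOG(buf, size, "port %d", port);
}

/* Format string alone: the dummy "%.0s" consumes the trailing "". */
static int
demo_log_noargs(char *buf, size_t size)
{
	return SK_LOG(buf, size, "no arguments");
}
```

In the second case the call expands to a format ending in "%.0s" with "" as its last argument, which prints nothing and keeps the argument list from ending on a dangling comma.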

> > [1]
> > .../build/include/rte_ethdev.h:166:26: error: token pasting of ',' and
> > __VA_ARGS__ is a GNU extension [-Werror,-Wgnu-zero-variadic-macro-arguments]
> > "%s():" fmt, __func__, ## __VA_ARGS__)
> >^
> > 
> > [2]
> > This seems because of "-pedantic" argument driver uses, and other PMDs using
> > "-pedantic", like mlx,  will have same error although they are disable by
> > default and error not observed in default build.
> > 
> 

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions

2018-10-11 Thread Adrien Mazarguil

Hey Ori,

(removing most of the discussion, I'll only reply to the summary)

On Thu, Oct 11, 2018 at 08:48:05AM +, Ori Kam wrote:
> Hi Adrian,
> 
> Thanks for your comments please see my answer below and inline.
> 
> Due to a very short time limit and the fact that we have more than
> 4 patches that are based on this we need to close it fast.
> 
> As I can see there are number of options:
> * the old approach that neither of us like. And which mean that for 
>every tunnel we create a new command.

Just to be sure, you mean that for each new tunnel *type* a new rte_flow
action *type* must be added to DPDK right? Because the above reads like with
your proposal, a single flow rule can manage any number of TEPs and flow
rule creation for subsequent tunnels can be somehow bypassed.

One flow *rule* is still needed per TEP or did I miss something?

> * My proposed suggestion as is. Which is easier for at least a number of
>   applications to implement and faster in most cases.
> * My suggestion with a different name, but then we need to find also a name
>   for the decap and also a name for decap_l3. This approach is also
>   problematic since we have 2 APIs that are doing the same thing. For example
>   in test-pmd encap vxlan, which API shall we use?

Since you're doing this for MPLSoUDP and MPLSoGRE, you could leave
VXLAN/NVGRE encap as is, especially since (AFAIK) there are series still
relying on their API floating on the ML.

> * Combine between my suggestion and the current one by replacing the raw
>   buffer with list of items. Less code duplication, easier on the validation
>   (though I don't think we need to validate the encap data) but we lose
>   insertion rate.

Already suggested in the past [1], this led to VXLAN and NVGRE encap as we
know them.

> * your suggestion of a list of actions where each action is one item. Main
>   problem is speed. Complexity from the application side and time to
>   implement.

Speed matters a lot to me also (go figure) but I still doubt this approach
is measurably faster. On the usability side, compared to one action per
protocol layer which better fits the rte_flow model, I'm also not
convinced.

If we put aside usability and performance on which we'll never agree, there
is still one outstanding issue: the lack of mask. Users cannot tell which
fields are relevant and to be kept as is, and which are not.

How do applications know what blanks are filled in by HW? How do PMDs know
what applications expect? There's a risk of sending incomplete or malformed
packets depending on the implementation.

One may expect PMDs and HW to just "do the sensible thing" but some
applications won't know that some fields are not offloaded and will be
emitted with an unexpected value, while others will attempt to force a
normally offloaded field to some specific value and expect it to be left
unmodified. This cannot be predicted by the PMD, something is needed.

Assuming you add a mask pointer to address this, generic encap should be
functionally complete but not all that different from what we currently have
for VXLAN/NVGRE and from Declan's earlier proposal for generic encap [1];
PMD must parse the buffer (using a proper packet parser with your approach),
collect relevant fields, see if anything's unsupported while doing so before
proceeding with the flow rule.

Anyway, if you add that mask and rename these actions (since they should work
with pretty much anything, not necessarily tunnels, i.e. lazy applications
could ask HW to prepend missing Ethernet headers to pure IP traffic), they
can make sense. How about labeling this "raw" encap/decap?

 RTE_FLOW_ACTION_TYPE_RAW_(ENCAP|DECAP)

 struct rte_flow_action_raw_encap {
 uint8_t *data; /**< Encapsulation data. */
 uint8_t *preserve; /**< Bit-mask of @p data to preserve on output. */
 size_t size; /**< Size of @p data and @p preserve. */
 };

I guess decap could use the same object. Since there is no way to define a
sensible default behavior that works across multiple vendors when "preserve"
is not provided, I think this field cannot be NULL.
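
To illustrate one possible reading of "preserve", here is a sketch where bits set in the mask are taken from the application's data and cleared bits are left to the device. Both this interpretation and the hw_fill buffer are assumptions on my part, only the three-field layout comes from the proposal above.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout proposed above, reproduced locally for the sketch. */
struct raw_encap_sketch {
	uint8_t *data;     /* Encapsulation data. */
	uint8_t *preserve; /* Bit-mask of data to preserve on output. */
	size_t size;       /* Size of data and preserve. */
};

/*
 * Merge application data with values the device would fill in by itself.
 * Bits set in preserve come from data, cleared bits come from hw_fill.
 */
static void
sk_apply_preserve(const struct raw_encap_sketch *conf,
		  const uint8_t *hw_fill, uint8_t *out)
{
	size_t i;

	for (i = 0; i != conf->size; ++i)
		out[i] = (uint8_t)((conf->data[i] & conf->preserve[i]) |
				   (hw_fill[i] & ~conf->preserve[i]));
}
```

With such a mask, an application forcing a field to a given value and a PMD offloading the rest no longer have to guess each other's intent.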

As for "L3 decap", well, can't one just provide a separate encap action?
I mean a raw decap action, followed by another action doing raw encap of the
intended L2? A separate set of actions seems unnecessary for that.

[1] "[PATCH v3 2/4] ethdev: Add tunnel encap/decap actions"
https://mails.dpdk.org/archives/dev/2018-April/095733.html

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH] app/testpmd: fix flow query failure

2018-10-11 Thread Adrien Mazarguil

On Wed, Oct 10, 2018 at 05:04:56PM +, Mordechay Haimovsky wrote:
> This patch fixes a bug found in port_flow_query routine which caused
> flow query command to fail with the following error "Caught error
> type 1 (cause unspecified): unknown object type to retrieve the name
> of: Invalid argument".
> 
> Fixes: f7ba5e7a0f8c ("app/testpmd: rely on flow API conversion function")
> 
> Signed-off-by: Moti Haimovsky 

Acked-by: Adrien Mazarguil 

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH] app/testpmd: fix flow list command

2018-10-10 Thread Adrien Mazarguil

On Wed, Oct 10, 2018 at 04:27:59PM +, Mordechay Haimovsky wrote:
> Hi Adrien,
>  You are correct, the bug is not where we thought it is, moreover the fix
> breaks the CLI and should be rejected.
> We investigated more and found that the bug is in the port_flow_query routine
> which passes an incorrect argument to rte_flow_conv as follows:
>    ret = rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
>                        &name, sizeof(name), action, &error);
> While it should pass only the action type as follows:
>    ret = rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
>                        &name, sizeof(name),
>                        (void *)(uintptr_t)action->type, &error);
> As done in port_flow_list routine (which works)
>    if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
>                      &name, sizeof(name),
>                      (void *)(uintptr_t)action->type,
>                      NULL) <= 0)
> And according to the parameters description of rte_flow_conv_name (called by
> rte_flow_conv):
>   * @param[in] src
>   *   Depending on @p is_action, source pattern item or action type cast as a
>   *   pointer.
> Modifying port_flow_query accordingly solves the issue.
> I will issue a new patch tonight.

Indeed, I confirm this is the right bug to address (and it seems I did not
validate rte_flow_query() properly). Thanks for taking care of it!

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions

2018-10-10 Thread Adrien Mazarguil

On Wed, Oct 10, 2018 at 01:17:01PM +, Ori Kam wrote:

> > -Original Message-
> > From: Adrien Mazarguil 

> > On Wed, Oct 10, 2018 at 09:00:52AM +, Ori Kam wrote:
> > 
> > > > On 10/7/2018 1:57 PM, Ori Kam wrote:

> > > > In addition the parameter to the encap action is a list of rte items,
> > > > this results in 2 extra translations, between the application to the
> > > > action and from the action to the NIC. This results in negative impact
> > > > on the insertion performance.
> > 
> > Not sure it's a valid concern since in this proposal, PMD is still expected
> > to interpret the opaque buffer contents regardless for validation and to
> > convert it to its internal format.
> > 
> This is the action to take, we should assume
> that the pattern is valid and not parse it at all.
> Another issue, we have a lot of complaints about the time we take
> for validation, I know that currently we must validate the rule when creating
> it, but this can change, why should a rule that was validated and the only
> change is the IP dest of the encap data?
> virtual switch after creating the first flow are just modifying it so why
> force them into revalidating it? (but this issue is a different topic)

Did you measure what proportion of time is spent on validation when creating
a flow rule?

Based on past experience with mlx4/mlx5, creation used to involve a number
of expensive system calls while validation was basically a single logic loop
checking individual items/actions while performing conversion to HW
format (mandatory for creation). Context switches related to kernel
involvement are the true performance killers.

I'm not sure this is a valid argument in favor of this approach since flow
rule validation still needs to happen regardless.

By the way, applications are not supposed to call rte_flow_validate() before
rte_flow_create(). The former can be helpful in some cases (e.g. to get a
rough idea of PMD capabilities during initialization) but they should in
practice only rely on rte_flow_create(), then fall back to software
processing if that fails.
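
In application terms this boils down to something like the sketch below. Every sk_* name is a stand-in invented for the example (a real application would call rte_flow_create() where sk_flow_create() appears), and the "supported_by_hw" flag merely simulates a PMD accepting or rejecting the rule.

```c
#include <stddef.h>

/* Stand-ins for a flow rule description and its handle. */
struct sk_rule {
	int supported_by_hw; /* Simulates the PMD's verdict. */
};

struct sk_flow {
	const struct sk_rule *rule;
};

static struct sk_flow sk_flow_storage;

/* Plays the role of rte_flow_create(): NULL means the rule is refused. */
static struct sk_flow *
sk_flow_create(const struct sk_rule *rule)
{
	if (!rule->supported_by_hw)
		return NULL;
	sk_flow_storage.rule = rule;
	return &sk_flow_storage;
}

enum sk_path { SK_PATH_HW, SK_PATH_SW };

/* Attempt creation directly and fall back to software on failure. */
static enum sk_path
sk_install(const struct sk_rule *rule)
{
	/* No separate validation call: creation validates the rule anyway. */
	if (sk_flow_create(rule) != NULL)
		return SK_PATH_HW;
	return SK_PATH_SW;
}
```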

> > Worse, it will require a packet parser to iterate over enclosed headers
> > instead of a list of convenient rte_flow_whatever objects. It won't be
> > faster without the convenience of pointers to properly aligned structures
> > that only contain relevant data fields.
> >
> Also in the rte_item we are not aligned so there is no difference in
> performance between the two approaches. In the rte_item actually we have
> unused pointers which are just a waste.

Regarding unused pointers: right, VXLAN/NVGRE encap actions shouldn't have
relied on _pattern item_ structures, the room for their "last" pointer is
arguably wasted. On the other hand, the "mask" pointer allows masking
relevant fields that matter to the application (e.g. source/destination
addresses as opposed to IPv4 length, version and other irrelevant fields for
encap).

Not sure why you think it's not aligned. We're comparing an array of
rte_flow_item objects with raw packet data. The latter requires
interpretation of each protocol header to jump to the next offset. This is
more complex on both sides: to build such a buffer for the application, then
to have it processed by the PMD.
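
To give an idea of the interpretation involved, here is a deliberately minimal sketch that only locates the UDP header in an untagged Ethernet/IPv4/UDP buffer. The function name is made up, and a real PMD would additionally have to deal with VLAN, IPv6, GRE, MPLS and so on.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Find the offset of the UDP header in an Ethernet/IPv4/UDP buffer.
 * Returns -1 for anything else or when the buffer is too short.
 */
static int
sk_udp_offset(const uint8_t *buf, size_t len)
{
	size_t off = 14; /* Untagged Ethernet header size. */
	size_t ihl;

	if (len < off + 20)
		return -1;
	if (buf[12] != 0x08 || buf[13] != 0x00) /* EtherType must be IPv4. */
		return -1;
	if ((buf[off] >> 4) != 4) /* IP version field. */
		return -1;
	ihl = (size_t)(buf[off] & 0x0f) * 4; /* IPv4 header size in bytes. */
	if (ihl < 20 || len < off + ihl + 8)
		return -1;
	if (buf[off + 9] != 17) /* IPv4 protocol must be UDP. */
		return -1;
	return (int)(off + ihl);
}
```

Each layer has to be decoded before the offset of the next one is known, whereas an array of items hands over one pointer per header.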

> Also needs to consider how applications are using it. They already have it
> in a raw buffer so it saves the conversion time for the application.

I don't think so. Applications typically know where some traffic is supposed
to go and what VNI it should use. They don't have a prefabricated packet
handy to prepend to outgoing traffic. If that was the case they'd most
likely do so themselves through an extra packet segment and not bother with
PMD offloads.

> > From a usability standpoint I'm not a fan of the current interface to
> > perform NVGRE/VXLAN encap, however this proposal adds another layer of
> > opaqueness in the name of making things more generic than rte_flow already
> > is.
> > 
> I'm sorry but I don't understand why it is more opaque, as I see it, it is
> very simple: just give the encapsulation data and that's it. For example on
> systems that support a number of encapsulations they don't need to call a
> different function just to change the buffer.

I'm saying it's opaque from an API standpoint if you expect the PMD to
interpret that buffer's contents in order to prepend it in a smart way.

Since this generic encap does not support masks, there is no way for an
application to at least tell a PMD what data matters and what doesn't in the
provided buffer. This means invalid checksums, lengths and

Re: [dpdk-dev] [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions

2018-10-10 Thread Adrien Mazarguil

Who will expect
something that isn't defined by the API to work and rely on it in their
application? I don't see it happening.

Come on, adding new encap/decap actions to DPDK shouldn't be such a pain
that the only alternative is a generic API to work around me :)

> > Arguments about a way of encap/decap headers specification (flow items
> > vs raw) sound sensible, but I'm not sure about it.
> > It would be simpler if the tunnel header is added appended or removed
> > as is, but as I understand it is not true. For example, IPv4 ID will be
> > different in incoming packets to be decapsulated and different values
> > should be used on encapsulation. Checksums will be different (but
> > offloaded in any case).
> > 
> 
> I'm not sure I understand your comment. 
> Decapsulation is independent of encapsulation, for example if we decap 
> L2 tunnel type then there is no parameter at all the NIC just removes 
> the outer layers.

According to the pattern? As described above, you can't rely on that.
Pattern does not necessarily match the full stack of outer layers.

Decap action must be able to determine what to do on its own, possibly in
conjunction with other actions in the list but that's all.

> > Current way allows to specify which fields do not matter and which one
> > must match. It allows to say that, for example, VNI match is sufficient
> > to decapsulate.
> > 
> 
> The encapsulation according to definition, is a list of headers that should 
> encapsulate the packet. So I don't understand your comment about matching
> fields. The matching is based on the flow and the encapsulation is just data
> that should be added on top of the packet.
> 
> > Also arguments assume that action input is accepted as is by the HW.
> > It could be true, but could be obviously false and HW interface may
> > require parsed input (i.e. driver must parse the input buffer and extract
> > required fields of packet headers).
> > 
> 
> You are correct, there are some PMDs, even Mellanox (for the E-Switch), that
> require parsed input.
> There is no driver that knows rte_flow structure so in any case there should
> be some translation between the encapsulation data and the NIC data.
> I agree that writing the code for translation can be harder in this approach,
> but the code is only written once and the insertion speed is much higher this
> way.

Avoiding code duplication is enough of a reason to do something. Yes NVGRE and
VXLAN encap/decap should be redefined because of that. But IMO, they should
prepend a single VXLAN or NVGRE header and be followed by other actions that
in turn prepend a UDP header, an IPv4/IPv6 one, any number of VLAN headers
and finally an Ethernet header.

> Also like I said some Virtual Switches already store this data in a raw
> buffer (they update only specific fields) so this will also save time for the
> application when creating a rule.
> 
> > So, I'd say no. It should be better motivated if we change existing
> > approach (even advertised as experimental).
> 
> I think the reasons I gave are very good motivation to change the approach
> please also consider that there is no implementation yet that supports the
> old approach.

Well, although the existing API made this painful, I did submit one [4] and
there's an updated version from Slava [5] for mlx5.

> while we do have code that uses the new approach.

If you need the ability to prepend a raw buffer, please consider a different
name for the related actions, redefine them without reliance on specific
pattern items and leave NVGRE/VXLAN encap/decap as is for the time
being. They can be deprecated anytime without ABI impact.

On the other hand if that raw buffer is to be interpreted by the PMD for
more intelligent tunnel encap/decap handling, I do not agree with the
proposed approach for usability reasons.

[2] [PATCH v3 2/4] ethdev: Add tunnel encap/decap actions
https://mails.dpdk.org/archives/dev/2018-April/096418.html

[3] ethdev: alter behavior of flow API actions
https://git.dpdk.org/dpdk/commit/?id=cc17feb90413

[4] net/mlx5: add VXLAN encap support to switch flow rules
https://mails.dpdk.org/archives/dev/2018-August/110598.html

[5] net/mlx5: e-switch VXLAN flow validation routine
https://mails.dpdk.org/archives/dev/2018-October/113782.html

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH] app/testpmd: fix flow list command

2018-10-10 Thread Adrien Mazarguil

Hi John

On Tue, Oct 09, 2018 at 03:51:24PM -0700, John Daley wrote:
> This patch fixes the 'flow list ' command which caused a
> segfault when passing the action or item 'type' field instead
> of the action or item struct pointer in the call to rte_flow_conv.
> 
> Fixes: 7d94dcedf7ce ("app/testpmd: rely on flow API conversion function")
> 
> Signed-off-by: John Daley 

That bug was introduced by a broken fix, it wasn't present in the original
patch, please see yesterday's discussion [1].

RTE_FLOW_CONV_OP_(ITEM|ACTION)_NAME[_PTR] operations are documented as using
an integer type (enum rte_flow_item_type) cast as (void *) for src because
they convert item/action *types* to corresponding strings, i.e. no need to
allocate temporary items/actions just to retrieve their names. I thought it
would be more versatile and efficient that way.

[1] "ethdev: fix flow API item/action name conversion"
https://mails.dpdk.org/archives/dev/2018-October/115054.html

> ---
>  app/test-pmd/config.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index 86c205806..2ce40f3e1 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -1354,7 +1354,7 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t group[n])
>   while (item->type != RTE_FLOW_ITEM_TYPE_END) {
>   if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
> &name, sizeof(name),
> -   (void *)(uintptr_t)item->type,
> +   (void *)(uintptr_t)item,

Also, while it does work because type is the first field, it should have
read "&item->type" for correctness. Anyway this patch shouldn't be needed
assuming the broken fix is reverted.

> NULL) <= 0)
>   name = "[UNKNOWN]";
>   if (item->type != RTE_FLOW_ITEM_TYPE_VOID)
> @@ -1365,7 +1365,7 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t group[n])
>   while (action->type != RTE_FLOW_ACTION_TYPE_END) {
>   if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
> &name, sizeof(name),
> -   (void *)(uintptr_t)action->type,
> +   (void *)(uintptr_t)action,

Ditto.

>     NULL) <= 0)
>   name = "[UNKNOWN]";
>   if (action->type != RTE_FLOW_ACTION_TYPE_VOID)
> -- 
> 2.16.2
> 

Thanks.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v1] ethdev: fix flow API item/action name conversion

2018-10-09 Thread Adrien Mazarguil

Hi,

Jumping in although I cannot spend much time on rte_flow at the moment,
please see below.

On Tue, Oct 09, 2018 at 02:21:23PM +0100, Ferruh Yigit wrote:
> On 10/7/2018 5:31 PM, Ori Kam wrote:
> > 
> > 
> >> -Original Message-
> >> From: dev  On Behalf Of Mordechay Haimovsky
> >> Sent: Sunday, October 7, 2018 7:22 PM
> >> To: Adrien Mazarguil ; Shahaf Shuler
> >> ; or...@contextream.com
> >> Cc: dev@dpdk.org; Mordechay Haimovsky 
> >> Subject: [dpdk-dev] [PATCH v1] ethdev: fix flow API item/action name
> >> conversion
> >>
> >> This patch fixes a typecast bug found in rte_flow_conv_name routine
> >> used in rte_flow item/action name conversion.
> >>
> >> Fixes: 0c2640cbfa7a ("ethdev: add flow API item/action name conversion")
> >>
> >> Signed-off-by: Moti Haimovsky 
> <...>
> > Acked-by: Ori Kam 
> 
> Series applied to dpdk-next-net/master, thanks.
> 
> (please confirm latest next-net head)

Please revert, it breaks something that didn't need to be fixed. I don't
think this patch was validated properly.

As documented in RTE_FLOW_CONV_OP_ITEM_NAME, RTE_FLOW_CONV_OP_ACTION_NAME,
RTE_FLOW_CONV_OP_ITEM_NAME_PTR and RTE_FLOW_CONV_OP_ACTION_NAME_PTR:

 @p src type:
   @code (const void *)enum rte_flow_item_type @endcode

With the following reminder in rte_flow_conv_name()'s Doxygen documentation:

 @param[in] src
   Depending on @p is_action, source pattern item or action type cast as a
   pointer.

Hence the original conversion results in the expected behavior while this
one is almost guaranteed to trigger a segfault:

 -   unsigned int type = (uintptr_t)src;
 +   unsigned int type = *(const unsigned int *)src;

This can be validated with testpmd. See what happens with "flow list".
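
For reference, a self-contained sketch of the convention. The enum, the table and the function name are invented for the example; only the cast itself mirrors the first line of the diff above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical item types; the real ones are in enum rte_flow_item_type. */
enum sk_item_type { SK_ITEM_END, SK_ITEM_VOID, SK_ITEM_ETH };

static const char *const sk_item_names[] = { "END", "VOID", "ETH" };

/*
 * src carries the type value itself, cast as a pointer. It is converted
 * back to an integer and never dereferenced.
 */
static const char *
sk_conv_name(const void *src)
{
	uintptr_t type = (uintptr_t)src;

	if (type >= sizeof(sk_item_names) / sizeof(sk_item_names[0]))
		return NULL;
	return sk_item_names[type];
}
```

A caller passes (const void *)(uintptr_t)SK_ITEM_ETH. Dereferencing src as the patched version does would read from address 2 in this case, hence the segfault.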

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v3] ppc64: fix compilation of when AltiVec is enabled

2018-09-03 Thread Adrien Mazarguil

Hi Christian,

Couldn't follow up on this last week, however I still have some concerns and
comments, please see below.

On Thu, Aug 30, 2018 at 01:59:59PM +0200, Christian Ehrhardt wrote:
> The definition of almost any newer standard like --std=c11 will drop
> __APPLE_ALTIVEC__ which otherwise would be defined.
> If that is the case then altivec.h will redefine bool to a type
> conflicting with those defined by stdbool.h.
> 
> This breaks compilation of 18.08 on ppc64 like:
>   mlx5_nl_flow.c:407:17: error: incompatible types when assigning
>   to type ‘__vector __bool int’ {aka ‘__vector(4) __bool int’}
>   from type ‘int’ in_port_id_set = false;
> 
> Other alternatives were pursued on [1] but they always ended up being
> more complex than what would be appropriate for the issue we face.
> 
> [1]: http://mails.dpdk.org/archives/dev/2018-August/109926.html
> 
> Tested-by: Takeshi T Yoshimura 
> Reviewed-by: Adrien Mazarguil 
> Signed-off-by: Christian Ehrhardt 
> ---
>  .../common/include/arch/ppc_64/rte_memcpy.h   | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h b/lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h
> index 75f74897b..0b3b89b56 100644
> --- a/lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h
> +++ b/lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h
> @@ -37,6 +37,17 @@
>  #include 
>  /*To include altivec.h, GCC version must  >= 4.8 */
>  #include 
> +/*
> + * Compilation workaround for PPC64 targets when AltiVec is fully
> + * enabled e.g. with std=c11. Otherwise there would be a type conflict
> + * of "bool" between stdbool and altivec.
> + */
> +#if defined(__PPC64__) && !defined(__APPLE_ALTIVEC__)
> + #undef bool
> + /* redefine as in stdbool.h */
> + #define bool _Bool
> +#endif
> +

The above will break existing C++ programs that include rte_memcpy.h.

Problem is that bool is an actual C++ type. C99 has _Bool which doesn't
exist in C++ along with a bool macro that appears only after including
stdbool.h.

To make things worse, nothing prevents C++ programs from importing a C-style
bool macro by including stdbool.h (or cstdbool).

Enclosing it in #ifdef __cplusplus won't help because you never know what
bool is supposed to be in the first place as it depends on how applications
are written. I think something like this prior suggestion [1]
(saving/restoring bool) is the only way to deal with that in a safe-ish
fashion.

Pending something better, the above #undef/#define workaround is only safe
to use inside mlx5 PMD code that triggers the compilation issue. It must not
be found in a public header.
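
For reference, one possible shape of a save/restore approach is sketched
below. It is only an assumption of how this could look, not necessarily what
[1] proposed: it relies on the push_macro/pop_macro pragmas, which are
compiler extensions (available in GCC, Clang and MSVC) rather than standard
C, and "#define bool int" merely stands in for whatever a header such as
altivec.h would do to bool. The demo_* typedefs are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Save whatever "bool" currently expands to (if it is a macro at all). */
#pragma push_macro("bool")
#undef bool
/* Stand-in for a header that redefines bool behind the caller's back. */
#define bool int
typedef bool demo_clobbered_bool; /* Expands to int in this region. */
/* Restore the definition the including file started with. */
#pragma pop_macro("bool")

typedef bool demo_restored_bool; /* Same meaning as before the push. */
```

The point is that the including file gets its own notion of bool back
regardless of whether it is C with stdbool.h, C++, or something else.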

>  #ifdef __cplusplus
>  extern "C" {
> -- 
> 2.17.1
> 

[1] https://mails.dpdk.org/archives/dev/2018-August/110401.html

-- 
Adrien Mazarguil
6WIND

[dpdk-dev] [PATCH 7/8] net/mlx5: add VXLAN encap support to switch flow rules

2018-08-31 Thread Adrien Mazarguil

This patch is huge because support for VXLAN encapsulation in switch flow
rules involves configuration of virtual network interfaces on the host
system including source addresses, routes and neighbor entries for flow
rules to be offloadable by TC. All of this is done through Netlink.

VXLAN interfaces are dynamically created for each combination of local UDP
port and outer network interface associated with flow rules, then used as
targets for TC "flower" filters in order to perform encapsulation.

To automatically create and remove these interfaces on an as-needed basis
according to the applied flow rules, the PMD maintains global resources
shared between all PMD instances of the primary process.

Testpmd example:

- Setting up outer properties of VXLAN tunnel:

  set vxlan ip-version ipv4 vni 0x112233 udp-src 4242 udp-dst 4789
ip-src 1.1.1.1 ip-dst 2.2.2.2
eth-src 00:11:22:33:44:55 eth-dst 66:77:88:99:aa:bb

- Creating a flow rule on port ID 2 performing VXLAN encapsulation with the
  above properties and directing the resulting traffic to port ID 1:

  flow create 2 ingress transfer pattern eth src is 00:11:22:33:44:55 /
 ipv4 / udp dst is 5566 / end actions vxlan_encap / port_id id 1 / end

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/Makefile   |   10 +
 drivers/net/mlx5/mlx5_nl_flow.c | 1198 +-
 2 files changed, 1204 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 2e70dec5b..1ba4ce612 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -384,6 +384,16 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
/usr/include/assert.h \
define static_assert \
$(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TC_ACT_TUNNEL_KEY \
+   linux/tc_act/tc_tunnel_key.h \
+   define TCA_ACT_TUNNEL_KEY \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TCA_TUNNEL_KEY_ENC_DST_PORT \
+   linux/tc_act/tc_tunnel_key.h \
+   enum TCA_TUNNEL_KEY_ENC_DST_PORT \
+   $(AUTOCONF_OUTPUT)
 
 # Create mlx5_autoconf.h or update it in case it differs from the new one.
 
diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index 91ff90a13..672f92863 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -6,7 +6,31 @@
 #include 
 #include 
 #include 
+/*
+ * Older versions of linux/if.h do not have the required safeties to coexist
+ * with net/if.h. This causes a compilation failure due to symbol
+ * redefinitions even when including the latter first.
+ *
+ * One workaround is to prevent net/if.h from defining conflicting symbols
+ * by removing __USE_MISC, and maintaining it undefined while including
+ * linux/if.h.
+ *
+ * Alphabetical order cannot be preserved since net/if.h must always be
+ * included before linux/if.h regardless.
+ */
+#ifdef __USE_MISC
+#undef __USE_MISC
+#define RESTORE_USE_MISC
+#endif
+#include 
+#include 
+#ifdef RESTORE_USE_MISC
+#undef RESTORE_USE_MISC
+#define __USE_MISC 1
+#endif
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -14,11 +38,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -52,6 +78,34 @@ struct tc_vlan {
 
 #endif /* HAVE_TC_ACT_VLAN */
 
+#ifdef HAVE_TC_ACT_TUNNEL_KEY
+
+#include 
+
+#ifndef HAVE_TCA_TUNNEL_KEY_ENC_DST_PORT
+#define TCA_TUNNEL_KEY_ENC_DST_PORT 9
+#endif
+
+#else /* HAVE_TC_ACT_TUNNEL_KEY */
+
+#define TCA_ACT_TUNNEL_KEY 17
+#define TCA_TUNNEL_KEY_ACT_SET 1
+#define TCA_TUNNEL_KEY_ACT_RELEASE 2
+#define TCA_TUNNEL_KEY_PARMS 2
+#define TCA_TUNNEL_KEY_ENC_IPV4_SRC 3
+#define TCA_TUNNEL_KEY_ENC_IPV4_DST 4
+#define TCA_TUNNEL_KEY_ENC_IPV6_SRC 5
+#define TCA_TUNNEL_KEY_ENC_IPV6_DST 6
+#define TCA_TUNNEL_KEY_ENC_KEY_ID 7
+#define TCA_TUNNEL_KEY_ENC_DST_PORT 9
+
+struct tc_tunnel_key {
+   tc_gen;
+   int t_action;
+};
+
+#endif /* HAVE_TC_ACT_TUNNEL_KEY */
+
 /* Normally found in linux/netlink.h. */
 #ifndef NETLINK_CAP_ACK
 #define NETLINK_CAP_ACK 10
@@ -148,6 +202,71 @@ struct tc_vlan {
 #define TCA_FLOWER_KEY_VLAN_ETH_TYPE 25
 #endif
 
+#define BIT(b) (1 << (b))
+#define BIT_ENCAP(e) BIT(MLX5_NL_FLOW_ENCAP_ ## e)
+
+/** Flags used for @p mask in struct mlx5_nl_flow_encap. */
+enum mlx5_nl_flow_encap_flag {
+   MLX5_NL_FLOW_ENCAP_ETH_SRC,
+   MLX5_NL_FLOW_ENCAP_ETH_DST,
+   MLX5_NL_FLOW_ENCAP_IPV4_SRC,
+   MLX5_NL_FLOW_ENCAP_IPV4_DST,
+   MLX5_NL_FLOW_ENCAP_IPV6_SRC,
+   MLX5_NL_FLOW_ENCAP_IPV6_DST,
+   MLX5_NL_FLOW_ENCAP_UDP_SRC,
+   MLX5_NL_FLOW_ENCAP_UDP_DST,
+   MLX5_NL_FLOW_ENCAP_VXLAN_VNI,
+};
+
+/** Encapsulation structure with fixed format for convenience. */
+struct mlx5_nl_flow_encap {
+   uin

[dpdk-dev] [PATCH 8/8] net/mlx5: add VXLAN decap support to switch flow rules

2018-08-31 Thread Adrien Mazarguil

This provides support for the VXLAN_DECAP action. Outer tunnel properties
are specified as the initial part of the flow rule pattern (up to and
including VXLAN item), optionally followed by inner traffic properties.

Testpmd examples:

- Creating a flow on port ID 1 performing VXLAN decapsulation and directing
  the result to port ID 2 without checking inner properties:

  flow create 1 ingress transfer pattern eth src is 66:77:88:99:aa:bb
 dst is 00:11:22:33:44:55 / ipv4 src is 2.2.2.2 dst is 1.1.1.1 /
 udp src is 4789 dst is 4242 / vxlan vni is 0x112233 / end
 actions vxlan_decap / port_id id 2 / end

- Same as above except only inner TCPv6 packets with destination port 42
  will be let through:

  flow create 1 ingress transfer pattern eth src is 66:77:88:99:aa:bb
 dst is 00:11:22:33:44:55 / ipv4 src is 2.2.2.2 dst is 1.1.1.1 /
 udp src is 4789 dst is 4242 / vxlan vni is 0x112233 /
 eth / ipv6 / tcp dst is 42 / end
 actions vxlan_decap / port_id id 2 / end

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/Makefile   |  65 +++
 drivers/net/mlx5/mlx5_nl_flow.c | 344 ---
 2 files changed, 379 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 1ba4ce612..85672abd6 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -335,6 +335,71 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
enum TCA_FLOWER_KEY_VLAN_ETH_TYPE \
$(AUTOCONF_OUTPUT)
$Q sh -- '$<' '$@' \
+   HAVE_TCA_FLOWER_KEY_ENC_KEY_ID \
+   linux/pkt_cls.h \
+   enum TCA_FLOWER_KEY_ENC_KEY_ID \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TCA_FLOWER_KEY_ENC_IPV4_SRC \
+   linux/pkt_cls.h \
+   enum TCA_FLOWER_KEY_ENC_IPV4_SRC \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK \
+   linux/pkt_cls.h \
+   enum TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TCA_FLOWER_KEY_ENC_IPV4_DST \
+   linux/pkt_cls.h \
+   enum TCA_FLOWER_KEY_ENC_IPV4_DST \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TCA_FLOWER_KEY_ENC_IPV4_DST_MASK \
+   linux/pkt_cls.h \
+   enum TCA_FLOWER_KEY_ENC_IPV4_DST_MASK \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TCA_FLOWER_KEY_ENC_IPV6_SRC \
+   linux/pkt_cls.h \
+   enum TCA_FLOWER_KEY_ENC_IPV6_SRC \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK \
+   linux/pkt_cls.h \
+   enum TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TCA_FLOWER_KEY_ENC_IPV6_DST \
+   linux/pkt_cls.h \
+   enum TCA_FLOWER_KEY_ENC_IPV6_DST \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TCA_FLOWER_KEY_ENC_IPV6_DST_MASK \
+   linux/pkt_cls.h \
+   enum TCA_FLOWER_KEY_ENC_IPV6_DST_MASK \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TCA_FLOWER_KEY_ENC_UDP_SRC_PORT \
+   linux/pkt_cls.h \
+   enum TCA_FLOWER_KEY_ENC_UDP_SRC_PORT \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK \
+   linux/pkt_cls.h \
+   enum TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TCA_FLOWER_KEY_ENC_UDP_DST_PORT \
+   linux/pkt_cls.h \
+   enum TCA_FLOWER_KEY_ENC_UDP_DST_PORT \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK \
+   linux/pkt_cls.h \
+   enum TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
HAVE_TC_ACT_VLAN \
linux/tc_act/tc_vlan.h \
enum TCA_VLAN_PUSH_VLAN_PRIORITY \
diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index 672f92863..12802796a 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -201,6 +201,45 @@ struct tc_tunnel_key {
 #ifndef HAVE_TCA_FLOWER_KEY_VLAN_ETH_TYPE
 #

[dpdk-dev] [PATCH 6/8] net/mlx5: add convenience macros to switch flow rule engine

2018-08-31 Thread Adrien Mazarguil

Upcoming patches will rely on them.

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_nl_flow.c | 31 +--
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index d20416026..91ff90a13 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -236,6 +236,13 @@ static const union {
struct rte_flow_item_udp udp;
 } mlx5_nl_flow_mask_empty;
 
+#define ETHER_ADDR_MASK "\xff\xff\xff\xff\xff\xff"
+#define IN_ADDR_MASK RTE_BE32(0xffffffff)
+#define IN6_ADDR_MASK \
+   "\xff\xff\xff\xff\xff\xff\xff\xff" \
+   "\xff\xff\xff\xff\xff\xff\xff\xff"
+#define BE16_MASK RTE_BE16(0xffff)
+
 /** Supported masks for known item types. */
 static const struct {
struct rte_flow_item_port_id port_id;
@@ -251,8 +258,8 @@ static const struct {
},
.eth = {
.type = RTE_BE16(0xffff),
-   .dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
-   .src.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+   .dst.addr_bytes = ETHER_ADDR_MASK,
+   .src.addr_bytes = ETHER_ADDR_MASK,
},
.vlan = {
/* PCP and VID only, no DEI. */
@@ -261,25 +268,21 @@ static const struct {
},
.ipv4.hdr = {
.next_proto_id = 0xff,
-   .src_addr = RTE_BE32(0xffffffff),
-   .dst_addr = RTE_BE32(0xffffffff),
+   .src_addr = IN_ADDR_MASK,
+   .dst_addr = IN_ADDR_MASK,
},
.ipv6.hdr = {
.proto = 0xff,
-   .src_addr =
-   "\xff\xff\xff\xff\xff\xff\xff\xff"
-   "\xff\xff\xff\xff\xff\xff\xff\xff",
-   .dst_addr =
-   "\xff\xff\xff\xff\xff\xff\xff\xff"
-   "\xff\xff\xff\xff\xff\xff\xff\xff",
+   .src_addr = IN6_ADDR_MASK,
+   .dst_addr = IN6_ADDR_MASK,
},
.tcp.hdr = {
-   .src_port = RTE_BE16(0xffff),
-   .dst_port = RTE_BE16(0xffff),
+   .src_port = BE16_MASK,
+   .dst_port = BE16_MASK,
},
.udp.hdr = {
-   .src_port = RTE_BE16(0xffff),
-   .dst_port = RTE_BE16(0xffff),
+   .src_port = BE16_MASK,
+   .dst_port = BE16_MASK,
},
 };
 
-- 
2.11.0

[dpdk-dev] [PATCH 5/8] net/mlx5: prepare switch flow rule parser for encap offloads

2018-08-31 Thread Adrien Mazarguil

A mere message buffer is not enough to support the additional logic
required to manage flow rules with such offloads; a dedicated object
(struct mlx5_nl_flow) with the ability to store additional information and
adjustable target network interfaces is needed, as well as a context
object for shared data (struct mlx5_nl_flow_ctx).

A predictable message sequence number can now be stored in the context
object as an improvement over CPU counters.
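
As a rough illustration of that last point, a per-context counter could look
like the sketch below. The struct and function names are hypothetical; the
actual layout of struct mlx5_nl_flow_ctx is defined by the patch and is not
reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical context object; only the sequence counter is shown. */
struct demo_nl_ctx {
	uint32_t seq; /* Last sequence number handed out. */
};

/* Return the next sequence number; wraps around naturally at 2^32. */
static uint32_t
demo_nl_ctx_next_seq(struct demo_nl_ctx *ctx)
{
	return ++ctx->seq;
}
```

Consecutive requests issued through the same context then never share a
sequence number until the counter wraps.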

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c |  18 ++--
 drivers/net/mlx5/mlx5.h |  22 ++--
 drivers/net/mlx5/mlx5_flow.c|  10 +-
 drivers/net/mlx5/mlx5_nl_flow.c | 189 ---
 4 files changed, 155 insertions(+), 84 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 9a504a31c..c10ca4ae5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -282,8 +282,8 @@ mlx5_dev_close(struct rte_eth_dev *dev)
close(priv->nl_socket_route);
if (priv->nl_socket_rdma >= 0)
close(priv->nl_socket_rdma);
-   if (priv->mnl_socket)
-   mlx5_nl_flow_socket_destroy(priv->mnl_socket);
+   if (priv->nl_flow_ctx)
+   mlx5_nl_flow_ctx_destroy(priv->nl_flow_ctx);
ret = mlx5_hrxq_ibv_verify(dev);
if (ret)
DRV_LOG(WARNING, "port %u some hash Rx queue still remain",
@@ -1136,13 +1136,13 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
claim_zero(mlx5_mac_addr_add(eth_dev, &mac, 0, 0));
if (vf && config.vf_nl_en)
mlx5_nl_mac_addr_sync(eth_dev);
-   priv->mnl_socket = mlx5_nl_flow_socket_create();
-   if (!priv->mnl_socket ||
+   priv->nl_flow_ctx = mlx5_nl_flow_ctx_create(eth_dev->device->numa_node);
+   if (!priv->nl_flow_ctx ||
!priv->ifindex ||
-   mlx5_nl_flow_ifindex_init(priv->mnl_socket, priv->ifindex,
+   mlx5_nl_flow_ifindex_init(priv->nl_flow_ctx, priv->ifindex,
  &flow_error)) {
-   if (!priv->mnl_socket) {
-   flow_error.message = "cannot open libmnl socket";
+   if (!priv->nl_flow_ctx) {
+   flow_error.message = "cannot create NL flow context";
} else if (!priv->ifindex) {
rte_errno = ENXIO;
flow_error.message = "unknown network interface index";
@@ -1204,8 +1204,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
close(priv->nl_socket_route);
if (priv->nl_socket_rdma >= 0)
close(priv->nl_socket_rdma);
-   if (priv->mnl_socket)
-   mlx5_nl_flow_socket_destroy(priv->mnl_socket);
+   if (priv->nl_flow_ctx)
+   mlx5_nl_flow_ctx_destroy(priv->nl_flow_ctx);
if (own_domain_id)
claim_zero(rte_eth_switch_domain_free(priv->domain_id));
rte_free(priv);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 287cfc643..210f4ea11 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -162,7 +162,8 @@ struct mlx5_nl_flow_ptoi {
unsigned int ifindex; /**< Network interface index. */
 };
 
-struct mnl_socket;
+struct mlx5_nl_flow;
+struct mlx5_nl_flow_ctx;
 
 struct priv {
LIST_ENTRY(priv) mem_event_cb; /* Called by memory event callback. */
@@ -229,7 +230,7 @@ struct priv {
rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
/* UAR same-page access control required in 32bit implementations. */
 #endif
-   struct mnl_socket *mnl_socket; /* Libmnl socket. */
+   struct mlx5_nl_flow_ctx *nl_flow_ctx; /* Context for NL flow rules. */
 };
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
@@ -396,21 +397,24 @@ int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 
 /* mlx5_nl_flow.c */
 
-int mlx5_nl_flow_transpose(void *buf,
+int mlx5_nl_flow_transpose(struct mlx5_nl_flow *nl_flow,
   size_t size,
   const struct mlx5_nl_flow_ptoi *ptoi,
   const struct rte_flow_attr *attr,
   const struct rte_flow_item *pattern,
   const struct rte_flow_action *actions,
   struct rte_flow_error *error);
-void mlx5_nl_flow_brand(void *buf, uint32_t handle);
-int mlx5_nl_flow_create(struct mnl_socket *nl, void *buf,
+void mlx5_nl_flow_brand(struct mlx5_nl_flow *nl_flow, uint32_t handle);
+int mlx5_nl_flow_create(struct mlx5_nl_flow_ctx *ctx,
+   struct mlx5_nl_flow *nl_flow,
struct rte_flow_error *error);
-int mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
+int mlx5_nl_

[dpdk-dev] [PATCH 2/8] net/mlx5: clean up redundant interface name getters

2018-08-31 Thread Adrien Mazarguil

In order to return the network interface index (ifindex) associated with a
device, mlx5_ifindex() uses if_nametoindex() to convert the result of
mlx5_get_ifname(). This is inefficient because the latter first retrieves
ifindex on its own to pass it through if_indextoname().

Since indices are much more reliable than names (less prone to change) and
involved in flow rule management where performance matters, this patch
moves ifindex-getting code directly into mlx5_ifindex() and replaces
remaining mlx5_get_ifname() calls with if_indextoname().

Similarly, the new function mlx5_master_ifindex() replaces
mlx5_get_master_ifname() while getting rid of irrelevant compatibility
code for unsupported Linux and MLNX_OFED versions.
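
For readers unfamiliar with these calls, the index-to-name direction can be
sketched as follows. demo_ifname() is a made-up helper, not a function from
this patch; it only wraps the standard if_indextoname() from <net/if.h> and
falls back to an empty string, in the spirit of the debug message found in
the diff.

```c
#include <assert.h>
#include <net/if.h>
#include <string.h>

/*
 * Resolve an interface index to a name for logging purposes.
 * Returns an empty string when the index does not map to an interface.
 */
static const char *
demo_ifname(unsigned int ifindex, char name[IF_NAMESIZE])
{
	return if_indextoname(ifindex, name) ? name : "";
}
```

Going the other way, if_nametoindex() returns 0 when the name is unknown,
which is why 0 can double as the "no index" value.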

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c|  15 +--
 drivers/net/mlx5/mlx5.h|   3 -
 drivers/net/mlx5/mlx5_ethdev.c | 184 ++--
 3 files changed, 74 insertions(+), 128 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 55b73a03b..1414ce0c5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -734,7 +734,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
struct ibv_counter_set_description cs_desc = { .counter_type = 0 };
 #endif
struct ether_addr mac;
-   char name[RTE_ETH_NAME_MAX_LEN];
+   char name[RTE_MAX(IF_NAMESIZE, RTE_ETH_NAME_MAX_LEN)];
int own_domain_id = 0;
struct rte_flow_error flow_error;
unsigned int i;
@@ -1116,16 +1116,9 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
mac.addr_bytes[2], mac.addr_bytes[3],
mac.addr_bytes[4], mac.addr_bytes[5]);
 #ifndef NDEBUG
-   {
-   char ifname[IF_NAMESIZE];
-
-   if (mlx5_get_ifname(eth_dev, &ifname) == 0)
-   DRV_LOG(DEBUG, "port %u ifname is \"%s\"",
-   eth_dev->data->port_id, ifname);
-   else
-   DRV_LOG(DEBUG, "port %u ifname is unknown",
-   eth_dev->data->port_id);
-   }
+   DRV_LOG(DEBUG, "port %u ifname is \"%s\"",
+   eth_dev->data->port_id,
+   if_indextoname(priv->ifindex, name) ? name : "");
 #endif
/* Get actual MTU if possible. */
err = mlx5_get_mtu(eth_dev, &priv->mtu);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4c2dec644..0807cf689 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -241,9 +241,6 @@ int mlx5_getenv_int(const char *);
 
 /* mlx5_ethdev.c */
 
-int mlx5_get_master_ifname(const struct rte_eth_dev *dev,
-  char (*ifname)[IF_NAMESIZE]);
-int mlx5_get_ifname(const struct rte_eth_dev *dev, char (*ifname)[IF_NAMESIZE]);
 unsigned int mlx5_ifindex(const struct rte_eth_dev *dev);
 int mlx5_ifreq(const struct rte_eth_dev *dev, int req, struct ifreq *ifr,
   int master);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index cf0b415b2..67149b7b3 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -119,149 +119,104 @@ struct ethtool_link_settings {
 #endif
 
 /**
- * Get master interface name from private structure.
+ * Get network interface index associated with master device.
+ *
+ * Result differs from mlx5_ifindex() when the current device is a port
+ * representor.
  *
  * @param[in] dev
  *   Pointer to Ethernet device.
- * @param[out] ifname
- *   Interface name output buffer.
  *
  * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
+ *   Nonzero interface index on success, zero otherwise and rte_errno is set.
  */
-int
-mlx5_get_master_ifname(const struct rte_eth_dev *dev,
-  char (*ifname)[IF_NAMESIZE])
+static unsigned int
+mlx5_master_ifindex(const struct rte_eth_dev *dev)
 {
struct priv *priv = dev->data->dev_private;
-   DIR *dir;
-   struct dirent *dent;
-   unsigned int dev_type = 0;
-   unsigned int dev_port_prev = ~0u;
-   char match[IF_NAMESIZE] = "";
-
-   {
-   MKSTR(path, "%s/device/net", priv->ibdev_path);
-
-   dir = opendir(path);
-   if (dir == NULL) {
-   rte_errno = errno;
-   return -rte_errno;
-   }
-   }
-   while ((dent = readdir(dir)) != NULL) {
-   char *name = dent->d_name;
-   FILE *file;
-   unsigned int dev_port;
-   int r;
-
-   if ((name[0] == '.') &&
-   ((name[1] == '\0') ||
-((name[1] == '.') && (name[2] == '\0'
-   continue;
+   DIR *dir = NULL;
+   stru

[dpdk-dev] [PATCH 1/8] net/mlx5: speed up interface index retrieval for flow rules

2018-08-31 Thread Adrien Mazarguil

rte_eth_dev_info_get() can be avoided since the underlying device type and
data structure are known.

Caching the index before creating any flow rules avoids a number of
redundant system calls later since users are not expected to destroy the
associated network interface while PMD is bound and running.

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c| 48 ++---
 drivers/net/mlx5/mlx5.h|  1 +
 drivers/net/mlx5/mlx5_ethdev.c |  4 +---
 drivers/net/mlx5/mlx5_flow.c   |  6 ++---
 drivers/net/mlx5/mlx5_nl.c |  9 +++
 5 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a8ae2b5d3..55b73a03b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -736,6 +736,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
struct ether_addr mac;
char name[RTE_ETH_NAME_MAX_LEN];
int own_domain_id = 0;
+   struct rte_flow_error flow_error;
unsigned int i;
 
/* Determine if this port representor is supposed to be spawned. */
@@ -959,6 +960,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
priv->representor_id =
switch_info->representor ? switch_info->port_name : -1;
+   /* Interface index will be known once eth_dev is allocated. */
+   priv->ifindex = 0;
/*
 * Look for sibling devices in order to reuse their switch domain
 * if any, otherwise allocate one.
@@ -1087,6 +1090,16 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
err = rte_errno;
goto error;
}
+   /*
+* Cache associated interface index since lookups are expensive.
+* It is not expected to change while a PMD instance is bound and
+* running.
+*/
+   priv->ifindex = mlx5_ifindex(eth_dev);
+   if (!priv->ifindex)
+   DRV_LOG(WARNING,
+   "cannot retrieve network interface index: %s",
+   strerror(rte_errno));
/* Configure the first MAC address by default. */
if (mlx5_get_mac(eth_dev, &mac.addr_bytes)) {
DRV_LOG(ERR,
@@ -1131,32 +1144,19 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
if (vf && config.vf_nl_en)
mlx5_nl_mac_addr_sync(eth_dev);
priv->mnl_socket = mlx5_nl_flow_socket_create();
-   if (!priv->mnl_socket) {
-   err = -rte_errno;
+   if (!priv->mnl_socket ||
+   !priv->ifindex ||
+   mlx5_nl_flow_init(priv->mnl_socket, priv->ifindex, &flow_error)) {
+   if (!priv->mnl_socket) {
+   flow_error.message = "cannot open libmnl socket";
+   } else if (!priv->ifindex) {
+   rte_errno = ENXIO;
+   flow_error.message = "unknown network interface index";
+   }
DRV_LOG(WARNING,
"flow rules relying on switch offloads will not be"
-   " supported: cannot open libmnl socket: %s",
-   strerror(rte_errno));
-   } else {
-   struct rte_flow_error error;
-   unsigned int ifindex = mlx5_ifindex(eth_dev);
-
-   if (!ifindex) {
-   err = -rte_errno;
-   error.message =
-   "cannot retrieve network interface index";
-   } else {
-   err = mlx5_nl_flow_init(priv->mnl_socket, ifindex,
-   &error);
-   }
-   if (err) {
-   DRV_LOG(WARNING,
-   "flow rules relying on switch offloads will"
-   " not be supported: %s: %s",
-   error.message, strerror(rte_errno));
-   mlx5_nl_flow_socket_destroy(priv->mnl_socket);
-   priv->mnl_socket = NULL;
-   }
+   " supported: %s: %s",
+   flow_error.message, strerror(rte_errno));
}
TAILQ_INIT(&priv->flows);
TAILQ_INIT(&priv->ctrl_flows);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 35a196e76..4c2dec644 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -183,6 +183,7 @@ struct priv {
unsigned int representor:1; /* Device is a port representor. */
uint16_t domain_id; /* Switch domain identifier. */
int32_t representor_id; /* Port representor identifier. */
+   unsigned int ifindex; /* Interface index associated with device. */
/* RX/TX queues. */
unsigned int rxqs_n;

[dpdk-dev] [PATCH 0/8] net/mlx5: add switch offload for VXLAN encap/decap

2018-08-31 Thread Adrien Mazarguil

This series adds support for RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP and
RTE_FLOW_ACTION_TYPE_VXLAN_DECAP to mlx5.

Since these actions are supported at the switch level, the "transfer"
attribute must be set on such flow rules. They must also be combined with a
port redirection action to make sense.

A typical use case is port representors in switchdev mode, with VXLAN
traffic encapsulation performed on traffic coming *from* a representor and
decapsulation on traffic going *to* that representor, in order to
transparently assign a given VXLAN to VF traffic.

Since only ingress is supported, encapsulation flow rules are normally
applied on a physical port and emit traffic to a port representor. The
opposite order is used for decapsulation.

Like other mlx5 switch flow rule actions, these are implemented through
Linux's TC flower API. Since the Linux interface for VXLAN encap/decap
involves virtual network devices (i.e. ip link add type vxlan [...]), the
PMD automatically spawns them on an as-needed basis through Netlink calls. The
added complexity necessarily results in a rather convoluted PMD
implementation.

This series relies on "ethdev: add flow API object converter" [1] which
should be applied first since testpmd does not provide a means to test VXLAN
encap otherwise.

[1] https://patches.dpdk.org/project/dpdk/list/?series=1123

Adrien Mazarguil (8):
  net/mlx5: speed up interface index retrieval for flow rules
  net/mlx5: clean up redundant interface name getters
  net/mlx5: rename internal function
  net/mlx5: enhance TC flow rule send/ack function
  net/mlx5: prepare switch flow rule parser for encap offloads
  net/mlx5: add convenience macros to switch flow rule engine
  net/mlx5: add VXLAN encap support to switch flow rules
  net/mlx5: add VXLAN decap support to switch flow rules

 drivers/net/mlx5/Makefile   |   75 ++
 drivers/net/mlx5/mlx5.c |   74 +-
 drivers/net/mlx5/mlx5.h |   28 +-
 drivers/net/mlx5/mlx5_ethdev.c  |  188 ++--
 drivers/net/mlx5/mlx5_flow.c|   16 +-
 drivers/net/mlx5/mlx5_nl.c  |9 +-
 drivers/net/mlx5/mlx5_nl_flow.c | 1763 --
 7 files changed, 1871 insertions(+), 282 deletions(-)

-- 
2.11.0

[dpdk-dev] [PATCH 4/8] net/mlx5: enhance TC flow rule send/ack function

2018-08-31 Thread Adrien Mazarguil

A callback parameter to process replies will be useful for subsequent work
in this area. It implies the following:

- Replies may be much larger than requests. In fact their size cannot
  really be known in advance. Using MNL_SOCKET_BUFFER_SIZE (at least 8192
  bytes) is the recommended approach to make truncation less likely (look
  for NLMSG_GOODSIZE in Linux).

- Multipart replies are made of several messages. A loop is needed to
  process these.

- In case of a truncated message (since one cannot really be sure),
  its remaining parts must be flushed to prevent their reception by
  subsequent queries.

- Using rte_get_tsc_cycles() instead of random() for message sequence
  numbers is faster yet unlikely to pick the same number twice in a row.

- mlx5_nl_flow_init() can be simplified since the query message is never
  written over (it was already the case actually).
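
The multipart point above can be sketched without libmnl, using only the
NLMSG_* macros from <linux/netlink.h>. This is a simplified illustration
under that assumption, not the code from this patch (which goes through
mnl_cb_run()); demo_nl_walk() is a made-up name.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <stddef.h>

/*
 * Walk the Netlink messages found in one receive buffer.
 *
 * Returns 1 if another datagram is expected (multipart reply not finished),
 * 0 if the reply is complete, -1 if trailing bytes do not form a message.
 * The number of messages seen is added to *count.
 */
static int
demo_nl_walk(const void *buf, size_t len, unsigned int *count)
{
	const struct nlmsghdr *nlh = buf;
	int remaining = (int)len;
	int more = 0;

	while (NLMSG_OK(nlh, remaining)) {
		++(*count);
		/* NLMSG_DONE terminates a multipart reply. */
		if (nlh->nlmsg_type == NLMSG_DONE)
			return 0;
		/* NLM_F_MULTI means more parts follow this one. */
		more = !!(nlh->nlmsg_flags & NLM_F_MULTI);
		nlh = NLMSG_NEXT(nlh, remaining);
	}
	return remaining ? -1 : more;
}
```

A caller would keep receiving datagrams for as long as this returns 1, which
is the role of the loop described above.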

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_nl_flow.c | 73 
 1 file changed, 48 insertions(+), 25 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index 9ea2a1b55..e720728b7 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -22,6 +22,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1050,38 +1051,63 @@ mlx5_nl_flow_brand(void *buf, uint32_t handle)
 }
 
 /**
- * Send Netlink message with acknowledgment.
+ * Send Netlink message with acknowledgment and process reply.
  *
  * @param nl
  *   Libmnl socket to use.
  * @param nlh
- *   Message to send. This function always raises the NLM_F_ACK flag before
- *   sending.
+ *   Message to send. This function always raises the NLM_F_ACK flag and
+ *   sets its sequence number before sending.
+ * @param cb
+ *   Callback handler for received message.
+ * @param arg
+ *   Data pointer for callback handler.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_flow_nl_ack(struct mnl_socket *nl, struct nlmsghdr *nlh)
+mlx5_nl_flow_chat(struct mnl_socket *nl, struct nlmsghdr *nlh,
+ mnl_cb_t cb, void *arg)
 {
alignas(struct nlmsghdr)
-   uint8_t ans[mnl_nlmsg_size(sizeof(struct nlmsgerr)) +
-   nlh->nlmsg_len - sizeof(*nlh)];
-   uint32_t seq = random();
+   uint8_t ans[MNL_SOCKET_BUFFER_SIZE];
+   unsigned int portid = mnl_socket_get_portid(nl);
+   uint32_t seq = rte_get_tsc_cycles();
+   int err = 0;
int ret;
 
nlh->nlmsg_flags |= NLM_F_ACK;
nlh->nlmsg_seq = seq;
ret = mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
-   if (ret != -1)
+   nlh = (void *)ans;
+   /*
+* The following loop postpones non-fatal errors until multipart
+* messages are complete.
+*/
+   while (ret > 0) {
ret = mnl_socket_recvfrom(nl, ans, sizeof(ans));
-   if (ret != -1)
-   ret = mnl_cb_run
-   (ans, ret, seq, mnl_socket_get_portid(nl), NULL, NULL);
-   if (!ret)
+   if (ret == -1) {
+   err = errno;
+   if (err != ENOSPC)
+   break;
+   ret = sizeof(*nlh);
+   }
+   if (!err) {
+   ret = mnl_cb_run(nlh, ret, seq, portid, cb, arg);
+   if (ret < 0)
+   err = -ret;
+   }
+   if (!(nlh->nlmsg_flags & NLM_F_MULTI) ||
+   nlh->nlmsg_type == NLMSG_DONE)
+   ret = -err;
+   else
+   ret = 1;
+   }
+   if (!err)
return 0;
-   rte_errno = errno;
-   return -rte_errno;
+   rte_errno = err;
+   return -err;
 }
 
 /**
@@ -1105,7 +1131,7 @@ mlx5_nl_flow_create(struct mnl_socket *nl, void *buf,
 
nlh->nlmsg_type = RTM_NEWTFILTER;
nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
-   if (!mlx5_nl_flow_nl_ack(nl, nlh))
+   if (!mlx5_nl_flow_chat(nl, nlh, NULL, NULL))
return 0;
return rte_flow_error_set
(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
@@ -1133,7 +1159,7 @@ mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
 
nlh->nlmsg_type = RTM_DELTFILTER;
nlh->nlmsg_flags = NLM_F_REQUEST;
-   if (!mlx5_nl_flow_nl_ack(nl, nlh))
+   if (!mlx5_nl_flow_chat(nl, nlh, NULL, NULL))
return 0;
return rte_flow_error_set
(error, errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
@@ -1171,23 +1197,20 @@ mlx5_nl_flow_ifindex_init(struct mnl_socket *nl, unsigned int ifindex,
tcm->tcm_ifindex = ifindex;
tcm->tcm_handle = TC_H_MAKE(TC_H_INGRESS, 0);
tcm->tcm_parent = TC_H_INGRESS;
+   if (!mnl_attr_pu

[dpdk-dev] [PATCH 3/8] net/mlx5: rename internal function

2018-08-31 Thread Adrien Mazarguil

Clarify difference between mlx5_nl_flow_create() and mlx5_nl_flow_init()
by renaming the latter mlx5_nl_flow_ifindex_init().

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c | 3 ++-
 drivers/net/mlx5/mlx5.h | 4 ++--
 drivers/net/mlx5/mlx5_nl_flow.c | 4 ++--
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 1414ce0c5..9a504a31c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1139,7 +1139,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
priv->mnl_socket = mlx5_nl_flow_socket_create();
if (!priv->mnl_socket ||
!priv->ifindex ||
-   mlx5_nl_flow_init(priv->mnl_socket, priv->ifindex, &flow_error)) {
+   mlx5_nl_flow_ifindex_init(priv->mnl_socket, priv->ifindex,
+ &flow_error)) {
if (!priv->mnl_socket) {
flow_error.message = "cannot open libmnl socket";
} else if (!priv->ifindex) {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 0807cf689..287cfc643 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -408,8 +408,8 @@ int mlx5_nl_flow_create(struct mnl_socket *nl, void *buf,
struct rte_flow_error *error);
 int mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
 struct rte_flow_error *error);
-int mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex,
- struct rte_flow_error *error);
+int mlx5_nl_flow_ifindex_init(struct mnl_socket *nl, unsigned int ifindex,
+ struct rte_flow_error *error);
 struct mnl_socket *mlx5_nl_flow_socket_create(void);
 void mlx5_nl_flow_socket_destroy(struct mnl_socket *nl);
 
diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index beb03c911..9ea2a1b55 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -1154,8 +1154,8 @@ mlx5_nl_flow_destroy(struct mnl_socket *nl, void *buf,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_flow_init(struct mnl_socket *nl, unsigned int ifindex,
- struct rte_flow_error *error)
+mlx5_nl_flow_ifindex_init(struct mnl_socket *nl, unsigned int ifindex,
+ struct rte_flow_error *error)
 {
struct nlmsghdr *nlh;
struct tcmsg *tcm;
-- 
2.11.0

[dpdk-dev] [PATCH] app/testpmd: show errno along with flow API errors

2018-08-31 Thread Adrien Mazarguil

Signed-off-by: Adrien Mazarguil 
---
 app/test-pmd/config.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 669be168b..84e817ff2 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1048,11 +1048,12 @@ port_flow_complain(struct rte_flow_error *error)
errstr = "unknown type";
else
errstr = errstrlist[error->type];
-   printf("Caught error type %d (%s): %s%s\n",
+   printf("Caught error type %d (%s): %s%s: %s\n",
   error->type, errstr,
   error->cause ? (snprintf(buf, sizeof(buf), "cause: %p, ",
error->cause), buf) : "",
-  error->message ? error->message : "(no stated reason)");
+  error->message ? error->message : "(no stated reason)",
+  rte_strerror(err));
return -err;
 }
 
-- 
2.11.0

[dpdk-dev] [PATCH v3 7/7] ethdev: deprecate rte_flow_copy function

2018-08-31 Thread Adrien Mazarguil

No users left for this function, time to deprecate it.

Signed-off-by: Adrien Mazarguil 
Cc: Thomas Monjalon 
Cc: Ferruh Yigit 
Cc: Andrew Rybchenko 
Cc: Gaetan Rivet 
--
v3 changes:

- Removed deprecation notice (finally got Ferruh's point), made patch last
  in series.

v2 changes:

- Patch was not present in original series.
---
 doc/guides/rel_notes/deprecation.rst | 7 ---
 lib/librte_ethdev/rte_flow.h | 7 ++-
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index e2dbee317..48cfb266b 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -88,10 +88,3 @@ Deprecation Notices
   - ``rte_pdump_set_socket_dir`` will be removed;
   - The parameter, ``path``, of ``rte_pdump_init`` will be removed;
   - The enum ``rte_pdump_socktype`` will be removed.
-
-* ethdev: flow API function ``rte_flow_copy()`` will be deprecated in v18.11
-  in favor of ``rte_flow_conv()`` (which will appear in that version) and
-  subsequently removed for v19.02.
-
-  This is due to a lack of flexibility and reliance on a type unusable with
-  C++ programs (struct rte_flow_desc).
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index 052ceefb6..f062ffead 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -2332,6 +2332,7 @@ rte_flow_error_set(struct rte_flow_error *error,
   const char *message);
 
 /**
+ * @deprecated
  * @see rte_flow_copy()
  */
 struct rte_flow_desc {
@@ -2343,10 +2344,13 @@ struct rte_flow_desc {
 };
 
 /**
+ * @deprecated
  * Copy an rte_flow rule description.
  *
  * This interface is kept for compatibility with older applications but is
- * implemented as a wrapper to rte_flow_conv().
+ * implemented as a wrapper to rte_flow_conv(). It is deprecated due to its
+ * lack of flexibility and reliance on a type unusable with C++ programs
+ * (struct rte_flow_desc).
  *
  * @param[in] fd
  *   Flow rule description.
@@ -2365,6 +2369,7 @@ struct rte_flow_desc {
  *   If len is lower than the size of the flow, the number of bytes that would
  *   have been written to desc had it been sufficient. Nothing is written.
  */
+__rte_deprecated
 size_t
 rte_flow_copy(struct rte_flow_desc *fd, size_t len,
  const struct rte_flow_attr *attr,
-- 
2.11.0

[dpdk-dev] [PATCH v3 5/7] net/bonding: switch to flow API object conversion function

2018-08-31 Thread Adrien Mazarguil

This patch replaces rte_flow_copy() with rte_flow_conv().

Signed-off-by: Adrien Mazarguil 
Cc: Declan Doherty 
Cc: Chas Williams 
--
v3 changes:

- Added build directives to allow experimental APIs, now needed for
  rte_flow_conv().

v2 changes:

- Patch was not present in original series.
---
 drivers/net/bonding/Makefile   |  1 +
 drivers/net/bonding/meson.build|  1 +
 drivers/net/bonding/rte_eth_bond_api.c |  6 ++---
 drivers/net/bonding/rte_eth_bond_flow.c| 31 +++--
 drivers/net/bonding/rte_eth_bond_private.h |  5 +++-
 5 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/drivers/net/bonding/Makefile b/drivers/net/bonding/Makefile
index acad16a1a..1893e3cad 100644
--- a/drivers/net/bonding/Makefile
+++ b/drivers/net/bonding/Makefile
@@ -8,6 +8,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 #
 LIB = librte_pmd_bond.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
diff --git a/drivers/net/bonding/meson.build b/drivers/net/bonding/meson.build
index 602d28803..00374edb2 100644
--- a/drivers/net/bonding/meson.build
+++ b/drivers/net/bonding/meson.build
@@ -3,6 +3,7 @@
 
 name = 'bond' #, james bond :-)
 version = 2
+allow_experimental_apis = true
 sources = files('rte_eth_bond_api.c', 'rte_eth_bond_pmd.c', 'rte_eth_bond_flow.c',
'rte_eth_bond_args.c', 'rte_eth_bond_8023ad.c', 'rte_eth_bond_alb.c')
 
diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
index 8bc04cfd1..a438fc509 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -245,9 +245,9 @@ slave_rte_flow_prepare(uint16_t slave_id, struct bond_dev_private *internals)
}
TAILQ_FOREACH(flow, &internals->flow_list, next) {
flow->flows[slave_id] = rte_flow_create(slave_port_id,
-   &flow->fd->attr,
-   flow->fd->items,
-   flow->fd->actions,
+   flow->rule.attr,
+   flow->rule.pattern,
+   flow->rule.actions,
&ferror);
if (flow->flows[slave_id] == NULL) {
RTE_BOND_LOG(ERR, "Cannot create flow for slave"
diff --git a/drivers/net/bonding/rte_eth_bond_flow.c b/drivers/net/bonding/rte_eth_bond_flow.c
index 31e4bcaeb..f94d46ca4 100644
--- a/drivers/net/bonding/rte_eth_bond_flow.c
+++ b/drivers/net/bonding/rte_eth_bond_flow.c
@@ -2,8 +2,11 @@
  * Copyright 2018 Mellanox Technologies, Ltd
  */
 
+#include 
+#include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -16,19 +19,33 @@ bond_flow_alloc(int numa_node, const struct rte_flow_attr *attr,
   const struct rte_flow_action *actions)
 {
struct rte_flow *flow;
-   size_t fdsz;
+   const struct rte_flow_conv_rule rule = {
+   .attr_ro = attr,
+   .pattern_ro = items,
+   .actions_ro = actions,
+   };
+   struct rte_flow_error error;
+   int ret;
 
-   fdsz = rte_flow_copy(NULL, 0, attr, items, actions);
-   flow = rte_zmalloc_socket(NULL, sizeof(struct rte_flow) + fdsz,
+   ret = rte_flow_conv(RTE_FLOW_CONV_OP_RULE, NULL, 0, &rule, &error);
+   if (ret < 0) {
+   RTE_BOND_LOG(ERR, "Unable to process flow rule (%s): %s",
+error.message ? error.message : "unspecified",
+strerror(rte_errno));
+   return NULL;
+   }
+   flow = rte_zmalloc_socket(NULL, offsetof(struct rte_flow, rule) + ret,
  RTE_CACHE_LINE_SIZE, numa_node);
if (unlikely(flow == NULL)) {
RTE_BOND_LOG(ERR, "Could not allocate new flow");
return NULL;
}
-   flow->fd = (void *)((uintptr_t)flow + sizeof(*flow));
-   if (unlikely(rte_flow_copy(flow->fd, fdsz, attr, items, actions) !=
-fdsz)) {
-   RTE_BOND_LOG(ERR, "Failed to copy flow description");
+   ret = rte_flow_conv(RTE_FLOW_CONV_OP_RULE, &flow->rule, ret, &rule,
+   &error);
+   if (ret < 0) {
+   RTE_BOND_LOG(ERR, "Failed to copy flow rule (%s): %s",
+error.message ? error.message : "unspecified",
+strerror(rte_errno));
rte_free(flow);
return NUL

[dpdk-dev] [PATCH v3 6/7] ethdev: add missing items/actions to flow object converter

2018-08-31 Thread Adrien Mazarguil

Several pattern items and actions were never handled by rte_flow_copy()
because their descriptions were missing. rte_flow_conv() inherited this
deficiency.

This patch adds them and reorders others to match rte_flow.h. It is not
submitted as a fix because so far no one has complained about it and
rte_flow_conv() would have to be backported as well: this function is the
only sane approach to handle VXLAN and NVGRE encap definitions.

As a matter of fact, it's the last missing piece to finally allow testpmd
users to request the creation of VXLAN/NVGRE encap/decap flow rules without
getting rejected outright.
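
For readers unfamiliar with the description tables touched below, here is a
minimal standalone sketch of the idea. It assumes the converter rejects any
type that has no table entry, which is what the paragraph above implies. All
demo_* and DEMO_* names are hypothetical and exist only for illustration;
they are not DPDK symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action types, for illustration only. */
enum demo_action_type {
	DEMO_ACTION_END,
	DEMO_ACTION_MARK,
	DEMO_ACTION_ENCAP, /* deliberately left out of the table below */
	DEMO_ACTION_MAX,
};

/* One description per known type: its name and configuration size. */
struct demo_desc {
	const char *name;
	size_t size;
};

/* Generate a table entry indexed by type (designated initializer). */
#define MK_DEMO_ACTION(t, s) \
	[DEMO_ACTION_ ## t] = { \
		.name = # t, \
		.size = s, \
	}

/* Entries not listed here are zero-initialized, hence name == NULL. */
static const struct demo_desc demo_desc_action[DEMO_ACTION_MAX] = {
	MK_DEMO_ACTION(END, 0),
	MK_DEMO_ACTION(MARK, sizeof(unsigned int)),
};

/* Return configuration size for type, or -1 when no description exists. */
static long
demo_action_conf_size(unsigned int type)
{
	if (type >= DEMO_ACTION_MAX || demo_desc_action[type].name == NULL)
		return -1;
	return (long)demo_desc_action[type].size;
}
```

In this sketch DEMO_ACTION_ENCAP is a valid enum value but has no
description, so any lookup fails until an entry is added to the table. This
is the kind of gap the patch closes for the real tables.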

Signed-off-by: Adrien Mazarguil 
Cc: Declan Doherty 
Cc: Nelio Laranjeiro 
--
v2 changes:

- Patch was not present in original series.
---
 lib/librte_ethdev/rte_flow.c | 50 +--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index c3ff7e713..9c56a9734 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -51,10 +51,15 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
MK_FLOW_ITEM(TCP, sizeof(struct rte_flow_item_tcp)),
MK_FLOW_ITEM(SCTP, sizeof(struct rte_flow_item_sctp)),
MK_FLOW_ITEM(VXLAN, sizeof(struct rte_flow_item_vxlan)),
-   MK_FLOW_ITEM(MPLS, sizeof(struct rte_flow_item_mpls)),
-   MK_FLOW_ITEM(GRE, sizeof(struct rte_flow_item_gre)),
MK_FLOW_ITEM(E_TAG, sizeof(struct rte_flow_item_e_tag)),
MK_FLOW_ITEM(NVGRE, sizeof(struct rte_flow_item_nvgre)),
+   MK_FLOW_ITEM(MPLS, sizeof(struct rte_flow_item_mpls)),
+   MK_FLOW_ITEM(GRE, sizeof(struct rte_flow_item_gre)),
+   MK_FLOW_ITEM(FUZZY, sizeof(struct rte_flow_item_fuzzy)),
+   MK_FLOW_ITEM(GTP, sizeof(struct rte_flow_item_gtp)),
+   MK_FLOW_ITEM(GTPC, sizeof(struct rte_flow_item_gtp)),
+   MK_FLOW_ITEM(GTPU, sizeof(struct rte_flow_item_gtp)),
+   MK_FLOW_ITEM(ESP, sizeof(struct rte_flow_item_esp)),
MK_FLOW_ITEM(GENEVE, sizeof(struct rte_flow_item_geneve)),
MK_FLOW_ITEM(VXLAN_GPE, sizeof(struct rte_flow_item_vxlan_gpe)),
MK_FLOW_ITEM(ARP_ETH_IPV4, sizeof(struct rte_flow_item_arp_eth_ipv4)),
@@ -67,6 +72,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
 sizeof(struct rte_flow_item_icmp6_nd_opt_sla_eth)),
MK_FLOW_ITEM(ICMP6_ND_OPT_TLA_ETH,
 sizeof(struct rte_flow_item_icmp6_nd_opt_tla_eth)),
+   MK_FLOW_ITEM(MARK, sizeof(struct rte_flow_item_mark)),
 };
 
 /** Generate flow_action[] entry. */
@@ -81,6 +87,7 @@ static const struct rte_flow_desc_data rte_flow_desc_action[] = {
MK_FLOW_ACTION(END, 0),
MK_FLOW_ACTION(VOID, 0),
MK_FLOW_ACTION(PASSTHRU, 0),
+   MK_FLOW_ACTION(JUMP, sizeof(struct rte_flow_action_jump)),
MK_FLOW_ACTION(MARK, sizeof(struct rte_flow_action_mark)),
MK_FLOW_ACTION(FLAG, 0),
MK_FLOW_ACTION(QUEUE, sizeof(struct rte_flow_action_queue)),
@@ -91,6 +98,8 @@ static const struct rte_flow_desc_data rte_flow_desc_action[] = {
MK_FLOW_ACTION(VF, sizeof(struct rte_flow_action_vf)),
MK_FLOW_ACTION(PHY_PORT, sizeof(struct rte_flow_action_phy_port)),
MK_FLOW_ACTION(PORT_ID, sizeof(struct rte_flow_action_port_id)),
+   MK_FLOW_ACTION(METER, sizeof(struct rte_flow_action_meter)),
+   MK_FLOW_ACTION(SECURITY, sizeof(struct rte_flow_action_security)),
MK_FLOW_ACTION(OF_SET_MPLS_TTL,
   sizeof(struct rte_flow_action_of_set_mpls_ttl)),
MK_FLOW_ACTION(OF_DEC_MPLS_TTL, 0),
@@ -110,6 +119,10 @@ static const struct rte_flow_desc_data rte_flow_desc_action[] = {
   sizeof(struct rte_flow_action_of_pop_mpls)),
MK_FLOW_ACTION(OF_PUSH_MPLS,
   sizeof(struct rte_flow_action_of_push_mpls)),
+   MK_FLOW_ACTION(VXLAN_ENCAP, sizeof(struct rte_flow_action_vxlan_encap)),
+   MK_FLOW_ACTION(VXLAN_DECAP, 0),
+   MK_FLOW_ACTION(NVGRE_ENCAP, sizeof(struct rte_flow_action_nvgre_encap)),
+   MK_FLOW_ACTION(NVGRE_DECAP, 0),
 };
 
 static int
@@ -407,11 +420,16 @@ rte_flow_conv_action_conf(void *buf, const size_t size,
switch (action->type) {
union {
const struct rte_flow_action_rss *rss;
+   const struct rte_flow_action_vxlan_encap *vxlan_encap;
+   const struct rte_flow_action_nvgre_encap *nvgre_encap;
} src;
union {
struct rte_flow_action_rss *rss;
+   struct rte_flow_action_vxlan_encap *vxlan_encap;
+   struct rte_flow_action_nvgre_encap *nvgre_encap;
} dst;
size_t tmp;
+   int ret;
 
case RTE_FLOW_ACTION_TYPE_RSS:
src.rss = action->conf;
@@ -445,6 +463,34 @@ rte_flow_conv_action_conf(void *buf, c

[dpdk-dev] [PATCH v3 3/7] app/testpmd: rely on flow API conversion function

2018-08-31 Thread Adrien Mazarguil

This commit replaces all local information about pattern items and actions
as well as flow rule duplication code with calls to rte_flow_conv().

Signed-off-by: Adrien Mazarguil 
Cc: Wenzhuo Lu 
Cc: Jingjing Wu 
Cc: Bernard Iremonger 
---
 app/test-pmd/config.c  | 407 +++-
 app/test-pmd/testpmd.h |   7 +-
 2 files changed, 67 insertions(+), 347 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 14ccd6864..669be168b 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -984,324 +984,35 @@ port_mtu_set(portid_t port_id, uint16_t mtu)
 
 /* Generic flow management functions. */
 
-/** Generate flow_item[] entry. */
-#define MK_FLOW_ITEM(t, s) \
-   [RTE_FLOW_ITEM_TYPE_ ## t] = { \
-   .name = # t, \
-   .size = s, \
-   }
-
-/** Information about known flow pattern items. */
-static const struct {
-   const char *name;
-   size_t size;
-} flow_item[] = {
-   MK_FLOW_ITEM(END, 0),
-   MK_FLOW_ITEM(VOID, 0),
-   MK_FLOW_ITEM(INVERT, 0),
-   MK_FLOW_ITEM(ANY, sizeof(struct rte_flow_item_any)),
-   MK_FLOW_ITEM(PF, 0),
-   MK_FLOW_ITEM(VF, sizeof(struct rte_flow_item_vf)),
-   MK_FLOW_ITEM(PHY_PORT, sizeof(struct rte_flow_item_phy_port)),
-   MK_FLOW_ITEM(PORT_ID, sizeof(struct rte_flow_item_port_id)),
-   MK_FLOW_ITEM(RAW, sizeof(struct rte_flow_item_raw)),
-   MK_FLOW_ITEM(ETH, sizeof(struct rte_flow_item_eth)),
-   MK_FLOW_ITEM(VLAN, sizeof(struct rte_flow_item_vlan)),
-   MK_FLOW_ITEM(IPV4, sizeof(struct rte_flow_item_ipv4)),
-   MK_FLOW_ITEM(IPV6, sizeof(struct rte_flow_item_ipv6)),
-   MK_FLOW_ITEM(ICMP, sizeof(struct rte_flow_item_icmp)),
-   MK_FLOW_ITEM(UDP, sizeof(struct rte_flow_item_udp)),
-   MK_FLOW_ITEM(TCP, sizeof(struct rte_flow_item_tcp)),
-   MK_FLOW_ITEM(SCTP, sizeof(struct rte_flow_item_sctp)),
-   MK_FLOW_ITEM(VXLAN, sizeof(struct rte_flow_item_vxlan)),
-   MK_FLOW_ITEM(E_TAG, sizeof(struct rte_flow_item_e_tag)),
-   MK_FLOW_ITEM(NVGRE, sizeof(struct rte_flow_item_nvgre)),
-   MK_FLOW_ITEM(MPLS, sizeof(struct rte_flow_item_mpls)),
-   MK_FLOW_ITEM(GRE, sizeof(struct rte_flow_item_gre)),
-   MK_FLOW_ITEM(FUZZY, sizeof(struct rte_flow_item_fuzzy)),
-   MK_FLOW_ITEM(GTP, sizeof(struct rte_flow_item_gtp)),
-   MK_FLOW_ITEM(GTPC, sizeof(struct rte_flow_item_gtp)),
-   MK_FLOW_ITEM(GTPU, sizeof(struct rte_flow_item_gtp)),
-   MK_FLOW_ITEM(GENEVE, sizeof(struct rte_flow_item_geneve)),
-   MK_FLOW_ITEM(VXLAN_GPE, sizeof(struct rte_flow_item_vxlan_gpe)),
-   MK_FLOW_ITEM(ARP_ETH_IPV4, sizeof(struct rte_flow_item_arp_eth_ipv4)),
-   MK_FLOW_ITEM(IPV6_EXT, sizeof(struct rte_flow_item_ipv6_ext)),
-   MK_FLOW_ITEM(ICMP6, sizeof(struct rte_flow_item_icmp6)),
-   MK_FLOW_ITEM(ICMP6_ND_NS, sizeof(struct rte_flow_item_icmp6_nd_ns)),
-   MK_FLOW_ITEM(ICMP6_ND_NA, sizeof(struct rte_flow_item_icmp6_nd_na)),
-   MK_FLOW_ITEM(ICMP6_ND_OPT, sizeof(struct rte_flow_item_icmp6_nd_opt)),
-   MK_FLOW_ITEM(ICMP6_ND_OPT_SLA_ETH,
-sizeof(struct rte_flow_item_icmp6_nd_opt_sla_eth)),
-   MK_FLOW_ITEM(ICMP6_ND_OPT_TLA_ETH,
-sizeof(struct rte_flow_item_icmp6_nd_opt_tla_eth)),
-};
-
-/** Pattern item specification types. */
-enum item_spec_type {
-   ITEM_SPEC,
-   ITEM_LAST,
-   ITEM_MASK,
-};
-
-/** Compute storage space needed by item specification and copy it. */
-static size_t
-flow_item_spec_copy(void *buf, const struct rte_flow_item *item,
-   enum item_spec_type type)
-{
-   size_t size = 0;
-   const void *data =
-   type == ITEM_SPEC ? item->spec :
-   type == ITEM_LAST ? item->last :
-   type == ITEM_MASK ? item->mask :
-   NULL;
-
-   if (!item->spec || !data)
-   goto empty;
-   switch (item->type) {
-   union {
-   const struct rte_flow_item_raw *raw;
-   } spec;
-   union {
-   const struct rte_flow_item_raw *raw;
-   } last;
-   union {
-   const struct rte_flow_item_raw *raw;
-   } mask;
-   union {
-   const struct rte_flow_item_raw *raw;
-   } src;
-   union {
-   struct rte_flow_item_raw *raw;
-   } dst;
-   size_t off;
-
-   case RTE_FLOW_ITEM_TYPE_RAW:
-   spec.raw = item->spec;
-   last.raw = item->last ? item->last : item->spec;
-   mask.raw = item->mask ? item->mask : &rte_flow_item_raw_mask;
-   src.raw = data;
-   dst.raw = buf;
-   off = RTE_ALIGN_CEIL(sizeof(struct rte_flow_item_raw),
-sizeof(*src.raw->pattern)

[dpdk-dev] [PATCH v3 4/7] net/failsafe: switch to flow API object conversion function

2018-08-31 Thread Adrien Mazarguil

This patch replaces rte_flow_copy() with rte_flow_conv().

Signed-off-by: Adrien Mazarguil 
Cc: Gaetan Rivet 
--
v2 changes:

- Patch was split from "ethdev: add flow API object converter".
---
 drivers/net/failsafe/failsafe_ether.c   |  6 +++---
 drivers/net/failsafe/failsafe_flow.c| 31 +---
 drivers/net/failsafe/failsafe_private.h |  5 -
 3 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index 5b5cb3b49..8bce368f3 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -230,9 +230,9 @@ fs_eth_dev_conf_apply(struct rte_eth_dev *dev,
DEBUG("Creating flow #%" PRIu32, i++);
flow->flows[SUB_ID(sdev)] =
rte_flow_create(PORT_ID(sdev),
-   &flow->fd->attr,
-   flow->fd->items,
-   flow->fd->actions,
+   flow->rule.attr,
+   flow->rule.pattern,
+   flow->rule.actions,
&ferror);
ret = rte_errno;
if (ret)
diff --git a/drivers/net/failsafe/failsafe_flow.c b/drivers/net/failsafe/failsafe_flow.c
index bfe42fcee..5e2b5f7c6 100644
--- a/drivers/net/failsafe/failsafe_flow.c
+++ b/drivers/net/failsafe/failsafe_flow.c
@@ -3,8 +3,11 @@
  * Copyright 2017 Mellanox Technologies, Ltd
  */
 
+#include 
+#include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -18,19 +21,33 @@ fs_flow_allocate(const struct rte_flow_attr *attr,
 const struct rte_flow_action *actions)
 {
struct rte_flow *flow;
-   size_t fdsz;
+   const struct rte_flow_conv_rule rule = {
+   .attr_ro = attr,
+   .pattern_ro = items,
+   .actions_ro = actions,
+   };
+   struct rte_flow_error error;
+   int ret;
 
-   fdsz = rte_flow_copy(NULL, 0, attr, items, actions);
-   flow = rte_zmalloc(NULL,
-  sizeof(struct rte_flow) + fdsz,
+   ret = rte_flow_conv(RTE_FLOW_CONV_OP_RULE, NULL, 0, &rule, &error);
+   if (ret < 0) {
+   ERROR("Unable to process flow rule (%s): %s",
+ error.message ? error.message : "unspecified",
+ strerror(rte_errno));
+   return NULL;
+   }
+   flow = rte_zmalloc(NULL, offsetof(struct rte_flow, rule) + ret,
   RTE_CACHE_LINE_SIZE);
if (flow == NULL) {
ERROR("Could not allocate new flow");
return NULL;
}
-   flow->fd = (void *)((uintptr_t)flow + sizeof(*flow));
-   if (rte_flow_copy(flow->fd, fdsz, attr, items, actions) != fdsz) {
-   ERROR("Failed to copy flow description");
+   ret = rte_flow_conv(RTE_FLOW_CONV_OP_RULE, &flow->rule, ret, &rule,
+   &error);
+   if (ret < 0) {
+   ERROR("Failed to copy flow rule (%s): %s",
+ error.message ? error.message : "unspecified",
+ strerror(rte_errno));
rte_free(flow);
return NULL;
}
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index 886af8616..cc1f0343d 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -6,6 +6,7 @@
 #ifndef _RTE_ETH_FAILSAFE_PRIVATE_H_
 #define _RTE_ETH_FAILSAFE_PRIVATE_H_
 
+#include 
 #include 
 #include 
 
@@ -13,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define FAILSAFE_DRIVER_NAME "Fail-safe PMD"
@@ -81,7 +83,8 @@ struct rte_flow {
/* sub_flows */
struct rte_flow *flows[FAILSAFE_MAX_ETHPORTS];
/* flow description for synchronization */
-   struct rte_flow_desc *fd;
+   struct rte_flow_conv_rule rule;
+   uint8_t rule_data[];
 };
 
 enum dev_state {
-- 
2.11.0

[dpdk-dev] [PATCH v3 0/7] ethdev: add flow API object converter

2018-08-31 Thread Adrien Mazarguil

This is a follow-up to the "Flow API helpers enhancements" series submitted
almost a year ago [1]. The new title reflects the reduced scope of this
version.

rte_flow_conv() is a flexible replacement for rte_flow_copy(), itself a
temporary solution pending something better [2]. It replaces a lot of
duplicated code found in testpmd and removes a maintenance step that
developers (me included) tend to forget when modifying pattern items or
actions, namely updating app/test-pmd/config.c.

This series was unearthed in order to complete the implementation of
RTE_FLOW_ACTION_TYPE_(VXLAN|NVGRE)_ENCAP in testpmd [3] without having to
duplicate existing code once again.

See individual patches for specific changes in this version.

v3 changes:

- Marked rte_flow_conv() as experimental, modified net/bonding accordingly.
- Fixed compilation issue on ARM.
- Removed deprecation notice.

v2 changes:

- rte_flow_copy() is kept, albeit deprecated, no API/ABI impact.
- Updated bonding PMD.
- No more automatic generation of rte_flow_conv.h.

[1] https://mails.dpdk.org/archives/dev/2017-October/077551.html
[2] https://mails.dpdk.org/archives/dev/2017-July/070492.html
[3] Currently the command-line parser (cmdline_flow.c) is aware of these
    actions, but config.c is not. Flow rules with such actions cannot
    be created and cannot be validated with PMDs that implement them.

Adrien Mazarguil (7):
  ethdev: add flow API object converter
  ethdev: add flow API item/action name conversion
  app/testpmd: rely on flow API conversion function
  net/failsafe: switch to flow API object conversion function
  net/bonding: switch to flow API object conversion function
  ethdev: add missing items/actions to flow object converter
  ethdev: deprecate rte_flow_copy function

 app/test-pmd/config.c  | 407 +++
 app/test-pmd/testpmd.h |   7 +-
 doc/guides/prog_guide/rte_flow.rst |  20 +
 doc/guides/rel_notes/deprecation.rst   |   7 -
 drivers/net/bonding/Makefile   |   1 +
 drivers/net/bonding/meson.build|   1 +
 drivers/net/bonding/rte_eth_bond_api.c |   6 +-
 drivers/net/bonding/rte_eth_bond_flow.c|  31 +-
 drivers/net/bonding/rte_eth_bond_private.h |   5 +-
 drivers/net/failsafe/failsafe_ether.c  |   6 +-
 drivers/net/failsafe/failsafe_flow.c   |  31 +-
 drivers/net/failsafe/failsafe_private.h|   5 +-
 lib/librte_ethdev/rte_ethdev_version.map   |   1 +
 lib/librte_ethdev/rte_flow.c   | 666 ++--
 lib/librte_ethdev/rte_flow.h   | 231 +++-
 15 files changed, 886 insertions(+), 539 deletions(-)

-- 
2.11.0

[dpdk-dev] [PATCH v3 1/7] ethdev: add flow API object converter

2018-08-31 Thread Adrien Mazarguil

rte_flow_copy() can only duplicate entire flow rule descriptions
(attributes, pattern and list of actions, all at once). However,
applications sometimes need more flexibility, for instance the ability to
duplicate only one of the underlying objects (a single pattern item or
action) or to retrieve other properties such as their names.

Instead of adding dedicated functions to handle each possible use case,
this patch introduces rte_flow_conv(), which supports any number of object
conversion operations in an extensible manner.

This patch re-implements rte_flow_copy() as a wrapper to rte_flow_conv().

Signed-off-by: Adrien Mazarguil 
Cc: Thomas Monjalon 
Cc: Ferruh Yigit 
Cc: Andrew Rybchenko 
Cc: Gaetan Rivet 
--
v3 changes:

- Worked around compilation issue on ARM where rte_memcpy() is a macro that
  chokes on commas.

- Marked rte_flow_conv() as experimental.

v2 changes:

- Modified patch to keep rte_flow_copy() around instead of removing it
  entirely. Reworded commit log accordingly.

- Moved failsafe PMD changes to a subsequent patch.

- Re-implemented rte_flow_copy() as a wrapper to rte_flow_conv() to reduce
  code duplication.

- Tweaked semantics of rte_flow_conv() to return the required number of
  bytes regardless of the size parameter; a buffer not large enough is not
  considered to be an error anymore. This change removes the need for a
  "store" pass in underlying helper functions.

- Renamed and properly documented internal helper functions.
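
The size semantics described in the v2 notes above (the required number of
bytes is returned regardless of the size parameter) let a caller query the
size first, allocate, then convert. A minimal standalone sketch of that
calling pattern follows. conv_bytes() and dup_bytes() are hypothetical
stand-ins written for this example, not DPDK functions, and the sketch
leaves out the error reporting that rte_flow_conv() performs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a converter with the semantics described above:
 * always return the number of bytes needed to store src, and write at most
 * size bytes to dst (truncating when dst is too small).
 */
static size_t
conv_bytes(void *dst, size_t size, const void *src, size_t src_len)
{
	size_t n = size < src_len ? size : src_len;

	/* dst may be NULL only when size is zero. */
	assert(dst != NULL || size == 0);
	if (n)
		memcpy(dst, src, n);
	return src_len;
}

/* Two-pass usage: query the required size, allocate, then convert. */
static void *
dup_bytes(const void *src, size_t src_len, size_t *out_len)
{
	size_t need = conv_bytes(NULL, 0, src, src_len); /* pass 1: size only */
	void *buf = malloc(need ? need : 1);

	if (buf == NULL)
		return NULL;
	*out_len = conv_bytes(buf, need, src, src_len); /* pass 2: copy */
	return buf;
}
```

Since a short buffer is not an error, the same function serves both passes
and no separate "store" helper is needed, which appears to be the point of
the semantics change noted above.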
---
 doc/guides/prog_guide/rte_flow.rst   |  19 +
 lib/librte_ethdev/rte_ethdev_version.map |   1 +
 lib/librte_ethdev/rte_flow.c | 553 ++
 lib/librte_ethdev/rte_flow.h | 170 +++-
 4 files changed, 582 insertions(+), 161 deletions(-)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index b305a72a5..964cf9ceb 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2419,6 +2419,25 @@ This function initializes ``error`` (if non-NULL) with the provided
 parameters and sets ``rte_errno`` to ``code``. A negative error ``code`` is
 then returned.
 
+Object conversion
+~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
+   int
+   rte_flow_conv(enum rte_flow_conv_op op,
+ void *dst,
+ size_t size,
+ const void *src,
+ struct rte_flow_error *error);
+
+Convert ``src`` to ``dst`` according to operation ``op``. Possible
+operations include:
+
+- Attributes, pattern item or action duplication.
+- Duplication of an entire pattern or list of actions.
+- Duplication of a complete flow rule description.
+
 Caveats
 -------
 
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index 38f117f01..2ee9173a1 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -239,6 +239,7 @@ EXPERIMENTAL {
rte_eth_dev_tx_offload_name;
rte_eth_switch_domain_alloc;
rte_eth_switch_domain_free;
+   rte_flow_conv;
rte_flow_expand_rss;
rte_mtr_capabilities_get;
rte_mtr_create;
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index cff4b5209..4fd6cfa76 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -288,26 +288,41 @@ rte_flow_error_set(struct rte_flow_error *error,
 }
 
 /** Pattern item specification types. */
-enum item_spec_type {
-   ITEM_SPEC,
-   ITEM_LAST,
-   ITEM_MASK,
+enum rte_flow_conv_item_spec_type {
+   RTE_FLOW_CONV_ITEM_SPEC,
+   RTE_FLOW_CONV_ITEM_LAST,
+   RTE_FLOW_CONV_ITEM_MASK,
 };
 
-/** Compute storage space needed by item specification and copy it. */
+/**
+ * Copy pattern item specification.
+ *
+ * @param[out] buf
+ *   Output buffer. Can be NULL if @p size is zero.
+ * @param size
+ *   Size of @p buf in bytes.
+ * @param[in] item
+ *   Pattern item to copy specification from.
+ * @param type
+ *   Specification selector for either @p spec, @p last or @p mask.
+ *
+ * @return
+ *   Number of bytes needed to store pattern item specification regardless
+ *   of @p size. @p buf contents are truncated to @p size if not large
+ *   enough.
+ */
 static size_t
-flow_item_spec_copy(void *buf, const struct rte_flow_item *item,
-   enum item_spec_type type)
+rte_flow_conv_item_spec(void *buf, const size_t size,
+   const struct rte_flow_item *item,
+   enum rte_flow_conv_item_spec_type type)
 {
-   size_t size = 0;
+   size_t off;
const void *data =
-   type == ITEM_SPEC ? item->spec :
-   type == ITEM_LAST ? item->last :
-   type == ITEM_MASK ? item->mask :
+   type == RTE_FLOW_CONV_ITEM_SPEC ? item->spec :
+   type == RTE_FLOW_CONV_ITEM_LAST ? item->last :
+   type == RTE_FLOW_CONV_ITEM_MASK ? item->

[dpdk-dev] [PATCH v3 2/7] ethdev: add flow API item/action name conversion

2018-08-31 Thread Adrien Mazarguil

This provides a means for applications to retrieve the name of flow pattern
items and actions.
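
As a rough illustration of the two retrieval modes added here (copying the
name into a caller buffer versus handing back a pointer to the constant
string), consider the standalone sketch below. Everything named demo_* or
DEMO_* is hypothetical. The sketch also uses snprintf() where the patch uses
strlcpy(), only to stay within standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical item types, for illustration only. */
enum demo_item_type {
	DEMO_ITEM_END,
	DEMO_ITEM_ETH,
	DEMO_ITEM_IPV4,
};

/* Names indexed by type. */
static const char *const demo_item_name[] = {
	[DEMO_ITEM_END] = "END",
	[DEMO_ITEM_ETH] = "ETH",
	[DEMO_ITEM_IPV4] = "IPV4",
};

#define DEMO_DIM(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Copy the name of type into dst (truncated to size bytes, always
 * NUL-terminated when size is nonzero). Return the name length excluding
 * the terminating NUL, or -1 if type is unknown.
 */
static int
demo_item_name_copy(char *dst, size_t size, unsigned int type)
{
	/* dst may be NULL only when size is zero. */
	assert(dst != NULL || size == 0);
	if (type >= DEMO_DIM(demo_item_name))
		return -1;
	return snprintf(dst, size, "%s", demo_item_name[type]);
}

/* Return a pointer to the constant name, or NULL if type is unknown. */
static const char *
demo_item_name_ptr(unsigned int type)
{
	if (type >= DEMO_DIM(demo_item_name))
		return NULL;
	return demo_item_name[type];
}
```

The pointer variant avoids a copy when the caller only needs to print the
name, while the copying variant suits callers that must own the string.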

Signed-off-by: Adrien Mazarguil 
Cc: Thomas Monjalon 
Cc: Ferruh Yigit 
Cc: Andrew Rybchenko 
--
v2 changes:

- Replaced rte_flow_conv_name_ptr() with extra is_ptr argument to
  rte_flow_conv_name() since both functions were almost identical.

- Properly documented internal helper functions.
---
 doc/guides/prog_guide/rte_flow.rst |  1 +
 lib/librte_ethdev/rte_flow.c   | 63 +
 lib/librte_ethdev/rte_flow.h   | 56 +
 3 files changed, 120 insertions(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 964cf9ceb..1b17f6e01 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2437,6 +2437,7 @@ operations include:
 - Attributes, pattern item or action duplication.
 - Duplication of an entire pattern or list of actions.
 - Duplication of a complete flow rule description.
+- Pattern item or action name retrieval.
 
 Caveats
 -------
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index 4fd6cfa76..c3ff7e713 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "rte_ethdev.h"
 #include "rte_flow_driver.h"
 #include "rte_flow.h"
@@ -679,6 +680,60 @@ rte_flow_conv_rule(struct rte_flow_conv_rule *dst,
return off;
 }
 
+/**
+ * Retrieve the name of a pattern item/action type.
+ *
+ * @param is_action
+ *   Nonzero when @p src represents an action type instead of a pattern item
+ *   type.
+ * @param is_ptr
+ *   Nonzero to write string address instead of contents into @p dst.
+ * @param[out] dst
+ *   Destination buffer. Can be NULL if @p size is zero.
+ * @param size
+ *   Size of @p dst in bytes.
+ * @param[in] src
+ *   Depending on @p is_action, source pattern item or action type cast as a
+ *   pointer.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   A positive value representing the number of bytes needed to store the
+ *   name or its address regardless of @p size on success (@p dst contents
+ *   are truncated to @p size if not large enough), a negative errno value
+ *   otherwise and rte_errno is set.
+ */
+static int
+rte_flow_conv_name(int is_action,
+  int is_ptr,
+  char *dst,
+  const size_t size,
+  const void *src,
+  struct rte_flow_error *error)
+{
+   struct desc_info {
+   const struct rte_flow_desc_data *data;
+   size_t num;
+   };
+   static const struct desc_info info_rep[2] = {
+   { rte_flow_desc_item, RTE_DIM(rte_flow_desc_item), },
+   { rte_flow_desc_action, RTE_DIM(rte_flow_desc_action), },
+   };
+   const struct desc_info *const info = &info_rep[!!is_action];
+   unsigned int type = (uintptr_t)src;
+
+   if (type >= info->num)
+   return rte_flow_error_set
+   (error, EINVAL, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+"unknown object type to retrieve the name of");
+   if (!is_ptr)
+   return strlcpy(dst, info->data[type].name, size);
+   if (size >= sizeof(const char **))
+   *((const char **)dst) = info->data[type].name;
+   return sizeof(const char **);
+}
+
 /** Helper function to convert flow API objects. */
 int
 rte_flow_conv(enum rte_flow_conv_op op,
@@ -708,6 +763,14 @@ rte_flow_conv(enum rte_flow_conv_op op,
return rte_flow_conv_actions(dst, size, src, 0, error);
case RTE_FLOW_CONV_OP_RULE:
return rte_flow_conv_rule(dst, size, src, error);
+   case RTE_FLOW_CONV_OP_ITEM_NAME:
+   return rte_flow_conv_name(0, 0, dst, size, src, error);
+   case RTE_FLOW_CONV_OP_ACTION_NAME:
+   return rte_flow_conv_name(1, 0, dst, size, src, error);
+   case RTE_FLOW_CONV_OP_ITEM_NAME_PTR:
+   return rte_flow_conv_name(0, 1, dst, size, src, error);
+   case RTE_FLOW_CONV_OP_ACTION_NAME_PTR:
+   return rte_flow_conv_name(1, 1, dst, size, src, error);
}
return rte_flow_error_set
(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index 1288e76ae..052ceefb6 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -2043,6 +2043,62 @@ enum rte_flow_conv_op {
 *   @code struct rte_flow_conv_rule * @endcode
 */
RTE_FLOW_CONV_OP_RULE,
+
+   /**
+* Convert item type to its name string.
+*
+* Writes a NUL-terminated string to @p dst. Like snprintf(), the
+* returned value excludes th
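
As a rough illustration of the snprintf()-like size semantics documented in
the patch above, here is a minimal standalone sketch. It does not use the
real rte_flow tables: demo_item_names[] and demo_conv_name() are hypothetical
stand-ins, the -1 error value replaces the negative errno of the real helper,
and only the "return the full length regardless of size" convention is taken
from the patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical name table standing in for rte_flow_desc_item[]. */
static const char *const demo_item_names[] = {
	"END", "VOID", "INVERT", "ANY",
};

/*
 * Copy the name of item "type" into "dst" (at most "size" bytes, always
 * NUL-terminated when "size" is nonzero). Returns the full name length
 * excluding the terminating NUL, regardless of "size", or -1 if "type" is
 * out of range.
 */
int
demo_conv_name(char *dst, size_t size, unsigned int type)
{
	size_t num = sizeof(demo_item_names) / sizeof(demo_item_names[0]);

	if (type >= num)
		return -1;
	/* snprintf() truncates to "size" but reports the untruncated length. */
	return snprintf(dst, size, "%s", demo_item_names[type]);
}
```

With this convention a caller can pass a NULL buffer and a size of zero to
learn the required length, then allocate length + 1 bytes and call again.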

Re: [dpdk-dev] 18.08 build error on ppc64el - bool as vector type

2018-08-29 Thread Adrien Mazarguil

On Wed, Aug 29, 2018 at 10:27:03AM +0200, Christian Ehrhardt wrote:
> On Tue, Aug 28, 2018 at 5:02 PM Adrien Mazarguil 
> wrote:
> 
> > On Tue, Aug 28, 2018 at 02:38:35PM +0200, Christian Ehrhardt wrote:

> > > --- a/lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h
> > > +++ b/lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h
> > > @@ -36,6 +36,14 @@
> > > #include 
> > > #include 
> > > /*To include altivec.h, GCC version must  >= 4.8 */
> > > +/*
> > > + * If built with std=c11 stdbool and altivec bool will conflict.
> > > + * The altivec bool type is not needed at the moment, to avoid the
> > conflict
> > > + * define __APPLE_ALTIVEC__ so that the conflict will not happen.
> > > + */
> > > +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L &&
> > > !defined(__APPLE_ALTIVEC__)
> > > +#define __APPLE_ALTIVEC__
> > > +#endif
> > > #include 
> > >
> > > #ifdef __cplusplus
> > >
> > > But it turned out we are not allowed to switch off other things as vector
> > > (and probably some more code than the type) is actually used:
> > > With your suggestion or mine above it will break on:
> > >
> > > x5.o -c /home/ubuntu/deb_dpdk/drivers/net/mlx5/mlx5.c
> > > In file included from
> > /home/ubuntu/deb_dpdk/drivers/net/mlx5/mlx5_prm.h:21,
> > > from
> > /home/ubuntu/deb_dpdk/drivers/net/mlx5/mlx5_rxtx.h:37,
> > > from /home/ubuntu/deb_dpdk/drivers/net/mlx5/mlx5.h:36,
> > > from /home/ubuntu/deb_dpdk/drivers/net/mlx5/mlx5.c:42:
> > > /home/ubuntu/deb_dpdk/debian/build/static-root/include/rte_vect.h:43:15:
> > error:
> > > expected ‘;’ before ‘signed’
> > > typedef vector signed int xmm_t;
> > >   ^~~
> > >   ;
> > > /home/ubuntu/deb_dpdk/debian/build/static-root/include/rte_vect.h:49:2:
> > error:
> > > expected specifier-qualifier-list before ‘xmm_t’
> > >  xmm_tx;
> > >  ^
> > >
> > > I have no much better suggestion for the ordering issue that you raised.
> > > To test what would happen I moved the stdbool include after all other
> > > includes in drivers/net/mlx5/mlx5_nl.c
> > > I also moved mlx5.h (which eventually brings in altivec) right at the
> > top.
> > > This works to build, but such a check is always subtle as one of the
> > other
> > > includes might have pulled in stdbool before altivec still.
> > > For a bit of confidence I picked said gcc call and ran it with -E.
> > > The output suggests altivec really was included before stdbool.
> >
> > How about making altivec.h users (rte_vect.h and rte_memcpy.h) rely on
> > "__vector" directly instead of the "vector" macro to make it transparent
> > for
> > others then?
> >
> > I think we can assume they have internal knowledge of this file in order to
> > deal with __APPLE_ALTIVEC__ anyway.
> >
> 
> While "pushing the internal knowledge out to users" sounds right at first.
> There are far too many IMHO, the change would be huge unclean and messy.
> 
> $ grep -Hrn altivec.h
> drivers/net/i40e/i40e_rxtx_vec_altivec.c:45:#include 
> examples/l3fwd/l3fwd_lpm.c:165:#include "l3fwd_lpm_altivec.h"
> examples/l3fwd/l3fwd_lpm_altivec.h:10:#include "l3fwd_altivec.h"
> MAINTAINERS:239:F: examples/l3fwd/*altivec.h
> lib/librte_acl/acl_run_altivec.c:34:#include "acl_run_altivec.h"
> lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h:49:/*To include
> altivec.h, GCC version must  >= 4.8 */
> lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h:50:#include <
> altivec.h>
> lib/librte_eal/common/include/arch/ppc_64/rte_vect.h:36:#include 
> 
> lib/librte_lpm/meson.build:9:headers += files('rte_lpm_altivec.h',
> 'rte_lpm_neon.h', 'rte_lpm_sse.h')
> lib/librte_lpm/Makefile:28:SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include +=
> rte_lpm_altivec.h
> lib/librte_lpm/rte_lpm.h:461:#include "rte_lpm_altivec.h"

I'd still like to give it a try, given that only known users of AltiVec code may
rely on these vector/pixel/bool definitions. Scope should be quite small.

The root issue we need to address is that DPDK applications may
involuntarily pull altivec.h by including something unrelated (rte_memcpy.h)
and get unwanted bool/vector/pixel macros polluting their namespace and
breaking things.
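
To make the namespace pollution concrete, here is a small portable sketch.
It is only an assumption-laden model: fake_vector_bool and the first
"#define bool" are hypothetical stand-ins for what altivec.h does on PPC
(the real header maps bool to __bool), and restoring the macro afterwards is
just one possible variant of the #undef idea, not a fix agreed on in this
thread.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the AltiVec header: it redefines "bool" as a
 * vector-like type but leaves true/false untouched.
 */
typedef struct { unsigned int lane[4]; } fake_vector_bool;
#undef bool
#define bool fake_vector_bool

/*
 * Possible workaround applied right after the offending include: drop the
 * vector meaning and restore the scalar one expected by stdbool.h users.
 */
#undef bool
#define bool _Bool

/* Scalar semantics are back: integers and true/false can be assigned. */
int
is_master(int switch_id_seen, int port_name_seen)
{
	bool switch_id_set = switch_id_seen;
	bool port_name_set = false;

	if (port_name_seen)
		port_name_set = true;
	return switch_id_set && !port_name_set;
}
```

Removing the second #undef/#define pair reproduces the kind of failure
reported for mlx5_nl.c, since an int can no longer initialize a "bool".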

> > Also I would suggest not to make this workaround C11-only. I suspect the

Re: [dpdk-dev] 18.08 build error on ppc64el - bool as vector type

2018-08-28 Thread Adrien Mazarguil

On Tue, Aug 28, 2018 at 02:38:35PM +0200, Christian Ehrhardt wrote:
> On Tue, Aug 28, 2018 at 1:44 PM Adrien Mazarguil 
> wrote:
> 
> > On Tue, Aug 28, 2018 at 01:30:12PM +0200, Christian Ehrhardt wrote:
> > > On Mon, Aug 27, 2018 at 2:22 PM Adrien Mazarguil <
> > adrien.mazarg...@6wind.com>
> > > wrote:
> > >
> > > > Hi Christian,
> > > >
> > > > On Wed, Aug 22, 2018 at 05:11:41PM +0200, Christian Ehrhardt wrote:
> > > > > Just FYI the simple change hits similar issues later on.
> > > > >
> > > > > The (not really) proposed patch would have to be extended to be as
> > > > > following.
> > > > > We really need a better solution (or somebody has to convince me
> > that my
> > > > > change is better than a band aid).
> > > >
> > > > Thanks for reporting. I've made a quick investigation on my own and
> > believe
> > > > it's a toolchain issue which may affect more than this PMD;
> > potentially all
> > > > users of stdbool.h (C11) on this platform.
> > > >
> > >
> > > Yeah I assumed as much, which is why I was hoping that some of the arch
> > > experts would jump in and say "yeah this is a common thing and correctly
> > > handled like "
> > > I'll continue trying to reach out to people that should know better still
> > > ...
> > >
> > >
> > > > C11's stdbool.h defines a bool macro as _Bool (big B) along with
> > > > true/false. On PPC targets, another file (altivec.h) defines bool as
> > __bool
> > > > (small b) but not true/false:
> > > >
> > > >  #if !defined(__APPLE_ALTIVEC__)
> > > >  /* You are allowed to undef these for C++ compatibility.  */
> > > >  #define vector __vector
> > > >  #define pixel __pixel
> > > >  #define bool __bool
> > > >  #endif
> > > >
> > > > mlx5_nl.c explicitly includes stdbool.h to get the above definitions
> > then
> > > > includes mlx5.h -> rte_ether.h -> ppc_64/rte_memcpy.h -> altivec.h.
> > > >
> > > > For some reason the conflicting bool redefinition doesn't seem to
> > raise any
> > > > warnings, but results in mismatching bool and true/false definitions;
> > an
> > > > integer value cannot be assigned to a bool variable anymore, hence the
> > > > build
> > > > failure.
> > > >
> > > > The inability to assign integer values to bool is, in my opinion, a
> > > > fundamental issue caused by altivec.h. If there is no way to fix this
> > on
> > > > the
> > > > system, there are a couple of workarounds for DPDK, by order of
> > preference:
> > > >
> > > > 1. Always #undef bool after including altivec.h in
> > > >lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h. I do not
> > think
> > > >anyone expects this type to be unusable with true/false or integer
> > > > values
> > > >anyway. The version of altivec.h I have doesn't rely on this macro
> > at
> > > >all so it's probably not a big loss.
> > > >
> > >
> > > The undef of a definition in header A by header B can lead to most
> > > interesting, still broken effects.
> > > If e.g. one does
> > > #include 
> > > #include "mlx5.h"
> > >
> > > or similar then it would undefine that of stdbool as well right?
> > > In any case, the undefine not only would be suspicious it also fails
> > right
> > > away:
> > >
> > > In file included from
> > > /home/ubuntu/deb_dpdk/lib/librte_eal/common/malloc_heap.c:27:
> > > /home/ubuntu/deb_dpdk/lib/librte_eal/common/eal_memalloc.h:30:15:
> > > error: unknown
> > > type name ‘bool’; did you mean ‘_Bool’?
> > >   int socket, bool exact);
> > >   ^~~~
> > >   _Bool
> > > [...]
> > >
> > >
> > >
> > > >Ditto for "pixel" and "vector" keywords. Alternatively you could
> > #define
> > > >__APPLE_ALTIVEC__ before including altivec.h to prevent them from
> > > > getting
> > > >defined in the first place.
> > > >
> > >
> > > Interesting I got plenty of these:
> > > In file included from
> > > /home/ubuntu/

Re: [dpdk-dev] 18.08 build error on ppc64el - bool as vector type

2018-08-28 Thread Adrien Mazarguil

On Tue, Aug 28, 2018 at 01:30:12PM +0200, Christian Ehrhardt wrote:
> On Mon, Aug 27, 2018 at 2:22 PM Adrien Mazarguil 
> wrote:
> 
> > Hi Christian,
> >
> > On Wed, Aug 22, 2018 at 05:11:41PM +0200, Christian Ehrhardt wrote:
> > > Just FYI the simple change hits similar issues later on.
> > >
> > > The (not really) proposed patch would have to be extended to be as
> > > following.
> > > We really need a better solution (or somebody has to convince me that my
> > > change is better than a band aid).
> >
> > Thanks for reporting. I've made a quick investigation on my own and believe
> > it's a toolchain issue which may affect more than this PMD; potentially all
> > users of stdbool.h (C11) on this platform.
> >
> 
> Yeah I assumed as much, which is why I was hoping that some of the arch
> experts would jump in and say "yeah this is a common thing and correctly
> handled like "
> I'll continue trying to reach out to people that should know better still
> ...
> 
> 
> > C11's stdbool.h defines a bool macro as _Bool (big B) along with
> > true/false. On PPC targets, another file (altivec.h) defines bool as __bool
> > (small b) but not true/false:
> >
> >  #if !defined(__APPLE_ALTIVEC__)
> >  /* You are allowed to undef these for C++ compatibility.  */
> >  #define vector __vector
> >  #define pixel __pixel
> >  #define bool __bool
> >  #endif
> >
> > mlx5_nl.c explicitly includes stdbool.h to get the above definitions then
> > includes mlx5.h -> rte_ether.h -> ppc_64/rte_memcpy.h -> altivec.h.
> >
> > For some reason the conflicting bool redefinition doesn't seem to raise any
> > warnings, but results in mismatching bool and true/false definitions; an
> > integer value cannot be assigned to a bool variable anymore, hence the
> > build
> > failure.
> >
> > The inability to assign integer values to bool is, in my opinion, a
> > fundamental issue caused by altivec.h. If there is no way to fix this on
> > the
> > system, there are a couple of workarounds for DPDK, by order of preference:
> >
> > 1. Always #undef bool after including altivec.h in
> >lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h. I do not think
> >anyone expects this type to be unusable with true/false or integer
> > values
> >anyway. The version of altivec.h I have doesn't rely on this macro at
> >all so it's probably not a big loss.
> >
> 
> The undef of a definition in header A by header B can lead to most
> interesting, still broken effects.
> If e.g. one does
> #include 
> #include "mlx5.h"
> 
> or similar then it would undefine that of stdbool as well right?
> In any case, the undefine not only would be suspicious it also fails right
> away:
> 
> In file included from
> /home/ubuntu/deb_dpdk/lib/librte_eal/common/malloc_heap.c:27:
> /home/ubuntu/deb_dpdk/lib/librte_eal/common/eal_memalloc.h:30:15:
> error: unknown
> type name ‘bool’; did you mean ‘_Bool’?
>   int socket, bool exact);
>   ^~~~
>   _Bool
> [...]
> 
> 
> 
> >Ditto for "pixel" and "vector" keywords. Alternatively you could #define
> >__APPLE_ALTIVEC__ before including altivec.h to prevent them from
> > getting
> >defined in the first place.
> >
> 
> Interesting I got plenty of these:
> In file included from
> /home/ubuntu/deb_dpdk/lib/librte_eal/common/eal_common_options.c:25:
> /home/ubuntu/deb_dpdk/debian/build/static-root/include/rte_memcpy.h:39:
> warning:
> "__APPLE_ALTIVEC__" redefined
> #define __APPLE_ALTIVEC__
> 
> With a few of it being even errors, but the position of the original define
> is interesting.
> /home/ubuntu/deb_dpdk/debian/build/static-root/include/rte_memcpy.h:39: error:
> "__APPLE_ALTIVEC__" redefined [-Werror]
> #define __APPLE_ALTIVEC__
> : note: this is the location of the previous definition
> 
> So if being a built-in, shouldn't it ALWAYS be defined and never
> over-declare the bool type?
> 
> Checking GCC on the platform:
> $ gcc -dM -E - < /dev/null | grep ALTI
> #define __ALTIVEC__ 1
> #define __APPLE_ALTIVEC__ 1
> 
> 
> I added an #error in the header and dropped all dpdk changes.
> if !defined(__APPLE_ALTIVEC__)
> /* You are allowed to undef these for C++ compatibility.  */
> #error WOULD REDECLARE BOOL
> #define vector __vector
> 
> And I get:
> gcc -Wp,-MD,./.mlx4.o.d.tmp -Wdate-time -D_FORTIFY_SOURCE=2 -m64 -pthread
>   -DRTE_MACHINE_CPUFLAG_PPC64 -DRTE_MA

Re: [dpdk-dev] [PATCH v2 0/7] ethdev: add flow API object converter

2018-08-27 Thread Adrien Mazarguil

On Thu, Aug 23, 2018 at 02:48:37PM +0100, Ferruh Yigit wrote:
> On 8/3/2018 2:36 PM, Adrien Mazarguil wrote:
> > This is a follow up to the "Flow API helpers enhancements" series submitted
> > almost a year ago [1]. The new title is due to the reduced scope of this
> > version.
> > 
> > rte_flow_conv() is a flexible replacement to rte_flow_copy(), itself a
> > temporary solution pending something better [2]. It replaces a lot of
> > duplicated code found in testpmd and removes some of the maintenance burden
> > that developers tend to forget (me included) when modifying pattern
> > item or actions (updating app/test-pmd/config.c to be clear).
> > 
> > This series was unearthed in order to complete the implementation of
> > RTE_FLOW_ACTION_TYPE_ENCAP_(VXLAN|NVGRE) in testpmd [3] without having to
> > duplicate existing code once again.
> > 
> > See individual patches for specific changes in this version.
> > 
> > v2 changes:
> > 
> > - rte_flow_copy() is kept, albeit deprecated, no API/ABI impact.
> > - Updated bonding PMD.
> > - No more automatic generation of rte_flow_conv.h.
> > 
> > [1] https://mails.dpdk.org/archives/dev/2017-October/077551.html
> > [2] https://mails.dpdk.org/archives/dev/2017-July/070492.html
> > [3] Currently the command-line parser (cmdline_flow.c) is aware of these
> > actions, however config.c isn't. Flow rules with such actions cannot
> > be created and cannot be validated with PMDs that implement them.
> > 
> > Adrien Mazarguil (7):
> >   ethdev: add flow API object converter
> >   ethdev: add flow API item/action name conversion
> >   app/testpmd: rely on flow API conversion function
> >   net/failsafe: switch to flow API object conversion function
> >   net/bonding: switch to flow API object conversion function
> >   ethdev: deprecate rte_flow_copy function
> >   ethdev: add missing item/actions to flow object converter
> 
> Patch needs to be rebased to target v18.11 (in map file),

Right, will do it for v3.

> and indeed new APIs
> (rte_flow_conv) needs to be experimental.

This is what I did at first. Problem is that experimental APIs cannot be
used in internal code without triggering a compilation error unless
ALLOW_EXPERIMENTAL_API is defined (bonding cannot rely on an API marked as
experimental).

Since this series reimplements rte_flow_copy() as a wrapper to
rte_flow_conv(), I thought it didn't make sense for internal code to keep
using the former either.

Considering this, shall I add -DALLOW_EXPERIMENTAL_API to the bonding PMD or
keep the API non-experimental?

> And needs to remove deprecation notice in this patchset.

Doesn't it make sense to deprecate this function immediately after providing
a replacement on top of which it is reimplemented? Users end up using the
new function whether they want it or not. I don't think maintaining the
old duplicated code around is the right thing to do either.

> Also, do you think it makes sense to announce this change in release notes?

I'm not sure it's worth a release note. It's a rather obscure helper
function part of rte_flow. We didn't do it for rte_flow_copy() for
instance. Please confirm if you think it's needed.

> Apart from above, any volunteer for reviewing actual implementation?

I hope Gaetan will take a look, he added rte_flow_copy() after all :)

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH v2 0/7] ethdev: add flow API object converter

2018-08-27 Thread Adrien Mazarguil

On Fri, Aug 24, 2018 at 11:58:39AM +0100, Ferruh Yigit wrote:
> On 8/3/2018 2:36 PM, Adrien Mazarguil wrote:
> > This is a follow up to the "Flow API helpers enhancements" series submitted
> > almost a year ago [1]. The new title is due to the reduced scope of this
> > version.
> > 
> > rte_flow_conv() is a flexible replacement to rte_flow_copy(), itself a
> > temporary solution pending something better [2]. It replaces a lot of
> > duplicated code found in testpmd and removes some of the maintenance burden
> > that developers tend to forget (me included) when modifying pattern
> > item or actions (updating app/test-pmd/config.c to be clear).
> > 
> > This series was unearthed in order to complete the implementation of
> > RTE_FLOW_ACTION_TYPE_ENCAP_(VXLAN|NVGRE) in testpmd [3] without having to
> > duplicate existing code once again.
> > 
> > See individual patches for specific changes in this version.
> > 
> > v2 changes:
> > 
> > - rte_flow_copy() is kept, albeit deprecated, no API/ABI impact.
> > - Updated bonding PMD.
> > - No more automatic generation of rte_flow_conv.h.
> > 
> > [1] https://mails.dpdk.org/archives/dev/2017-October/077551.html
> > [2] https://mails.dpdk.org/archives/dev/2017-July/070492.html
> > [3] Currently the command-line parser (cmdline_flow.c) is aware of these
> > actions, however config.c isn't. Flow rules with such actions cannot
> > be created and cannot be validated with PMDs that implement them.
> > 
> > Adrien Mazarguil (7):
> >   ethdev: add flow API object converter
> >   ethdev: add flow API item/action name conversion
> >   app/testpmd: rely on flow API conversion function
> >   net/failsafe: switch to flow API object conversion function
> >   net/bonding: switch to flow API object conversion function
> >   ethdev: deprecate rte_flow_copy function
> >   ethdev: add missing item/actions to flow object converter
> 
> Causing build error for arm, it looks like related to rte_memcpy macro:
> 
> .../lib/librte_ethdev/rte_flow.c: In function ‘rte_flow_conv_item_spec’:
> .../lib/librte_ethdev/rte_flow.c:373:58: error: macro "rte_memcpy" passed 9
> arguments, but takes just 3
>(size > sizeof(*dst.raw) ? sizeof(*dst.raw) : size));

Thanks, noticed it after sending v2. I'll fix it for v3.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] RTE-FLOW: PF vs PHY_PORT

2018-08-27 Thread Adrien Mazarguil

Hi Vivek,

On Wed, Aug 22, 2018 at 05:16:52PM +0530, Vivek Sharma wrote:
> Hi Devs,
> 
> I am trying to enable RTE-FLOW support on one of our platforms & having hard 
> time in figuring out PF vs PHY_PORT differences and DPDK rationale for 
> introducing these two distinct identities. 
> 
> Rte-Flow distinguishes between RTE_FLOW_ITEM_TYPE_PF & 
> RTE_FLOW_ITEM_TYPE_PHY_PORT and
> 
>RTE_FLOW_ACTION_TYPE_PF & 
> RTE_FLOW_ACTION_TYPE_PHY_PORT.
> 
> 
> I am finding it difficult to justify the presence of both these types, when 
> functionality & implementation wise, these look quite similar. I would really 
> appreciate if you could illustrate the differences between above item & 
> action types by taking some hardware/platform as reference.

Some devices, typically those with a single PCI bus address shared for all
ports (e.g. Mellanox ConnectX-3) expose all their physical ports to each
PF/VF instance [1], not the other way around. With these, PHY_PORT item and
action give the ability to select a nondefault physical port in a flow rule.

PHY_PORT cannot be specified on most devices with PF/VF dedicated to
physical ports, although their drivers should at least recognize 0 as a
supported index and ignore it.

Since devices can expose any number of PF/VF instances and physical ports,
this gives applications the ability to use both as matching criteria and/or
action target.

A higher level alternative to PHY_PORT and PF/VF items/actions is PORT_ID to
match/target DPDK port IDs, which users may find more convenient. One
drawback is that it only works with devices instantiated within DPDK.

PF/VF and PHY_PORT should be reserved for corner cases where PORT_ID cannot
be used. My advice is to implement PORT_ID and not bother with the others
since port IDs are what applications are familiar with.

[1] Although with CX3, individual ports can be disabled per VF, they remain
"seen" by each instance.

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] 18.08 build error on ppc64el - bool as vector type

2018-08-27 Thread Adrien Mazarguil

t/mlx5/mlx5_nl_flow.c
> +++ b/drivers/net/mlx5/mlx5_nl_flow.c
> @@ -385,11 +385,11 @@ mlx5_nl_flow_transpose(void *buf,
>const struct rte_flow_action *action;
>unsigned int n;
>uint32_t act_index_cur;
> -   bool in_port_id_set;
> -   bool eth_type_set;
> -   bool vlan_present;
> -   bool vlan_eth_type_set;
> -   bool ip_proto_set;
> +   int in_port_id_set;
> +   int eth_type_set;
> +   int vlan_present;
> +   int vlan_eth_type_set;
> +   int ip_proto_set;
>struct nlattr *na_flower;
>struct nlattr *na_flower_act;
>struct nlattr *na_vlan_id;
> @@ -404,11 +404,11 @@ init:
>action = actions;
>n = 0;
>act_index_cur = 0;
> -   in_port_id_set = false;
> -   eth_type_set = false;
> -   vlan_present = false;
> -   vlan_eth_type_set = false;
> -   ip_proto_set = false;
> +   in_port_id_set = 0;
> +   eth_type_set = 0;
> +   vlan_present = 0;
> +   vlan_eth_type_set = 0;
> +   ip_proto_set = 0;
>na_flower = NULL;
>na_flower_act = NULL;
>na_vlan_id = NULL;
> 
> 
> On Tue, Aug 21, 2018 at 4:19 PM Christian Ehrhardt <
> christian.ehrha...@canonical.com> wrote:
> 
> > Hi,
> > Debian and Ubuntu face a build error with 18.08 on ppc64el.
> > It looks like that:
> >
> > Full log:
> >
> > https://buildd.debian.org/status/fetch.php?pkg=dpdk&arch=ppc64el&ver=18.08-1&stamp=1534520196&raw=0
> >
> > /<>/drivers/net/mlx5/mlx5_nl.c: In function
> > 'mlx5_nl_switch_info_cb':
> > /<>/drivers/net/mlx5/mlx5_nl.c:837:23: error: incompatible
> > types when initializing type '__vector __bool int' {aka '__vector(4) __bool
> > int'} using type 'int'
> >   bool port_name_set = false;
> >^
> > /<>/drivers/net/mlx5/mlx5_nl.c:838:23: error: incompatible
> > types when initializing type '__vector __bool int' {aka '__vector(4) __bool
> > int'} using type 'int'
> >   bool switch_id_set = false;
> >^
> > /<>/drivers/net/mlx5/mlx5_nl.c:857:18: error: incompatible
> > types when assigning to type '__vector __bool int' {aka '__vector(4) __bool
> > int'} from type 'int'
> > port_name_set = true;
> >   ^
> > /<>/drivers/net/mlx5/mlx5_nl.c:865:18: error: incompatible
> > types when assigning to type '__vector __bool int' {aka '__vector(4) __bool
> > int'} from type 'int'
> > switch_id_set = true;
> >   ^
> > /<>/drivers/net/mlx5/mlx5_nl.c:870:16: error: used vector
> > type where scalar is required
> >   info.master = switch_id_set && !port_name_set;
> > ^
> > /<>/drivers/net/mlx5/mlx5_nl.c:870:33: error: wrong type
> > argument to unary exclamation mark
> >   info.master = switch_id_set && !port_name_set;
> >  ^
> > /<>/drivers/net/mlx5/mlx5_nl.c:871:21: error: used vector
> > type where scalar is required
> >   info.representor = switch_id_set && port_name_set;
> >
> >
> > Now I checked and the reason seems to be some combination of altivec and
> > MLX headers and the use of bool - probably stdbool vs altivec bool.
> >
> > If built with gcc -E I see it the bool variables become:
> >__attribute__((altivec(bool__))) unsigned port_name_set =
> >
> > I have found a strawmans approach to it, but I'm sure people with
> > experience on the matter will come up with something better.
> >
> > My current change looks like that and would work:
> > $ git diff
> > diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
> > index d61826aea..2cc8f49c5 100644
> > --- a/drivers/net/mlx5/mlx5_nl.c
> > +++ b/drivers/net/mlx5/mlx5_nl.c
> > @@ -834,8 +834,8 @@ mlx5_nl_switch_info_cb(struct nlmsghdr *nh, void
> > *arg)
> >.switch_id = 0,
> >};
> >size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
> > -   bool port_name_set = false;
> > -   bool switch_id_set = false;
> > +   int port_name_set = 0;
> > +   int switch_id_set = 0;
> >
> >if (nh->nlmsg_type != RTM_NEWLINK)
> >goto error;
> > @@ -854,7 +854,7 @@ mlx5_nl_switch_info_cb(struct nlmsghdr *nh, void
> > *arg)
> >if (errno ||
> >(size_t)(end - (char *)payload) !=
> > strlen(payload))
> >goto error;
> > -   port_name_set = true;
> > +   port_name_set = 1;
> >break;
> >case IFLA_PHYS_SWITCH_ID:
> >info.switch_id = 0;
> > @@ -862,7 +862,7 @@ mlx5_nl_switch_info_cb(struct nlmsghdr *nh, void
> > *arg)
> >info.switch_id <<= 8;
> >info.switch_id |= ((uint8_t *)payload)[i];
> >}
> > -   switch_id_set = true;
> > +   switch_id_set = 1;
> >break;
> >}
> >off += RTA_ALIGN(ra->rta_len);

-- 
Adrien Mazarguil
6WIND

[dpdk-dev] [PATCH] net/mlx5: fix artificial L4 limitation on switch flow rules

2018-08-06 Thread Adrien Mazarguil

Partial bit-masks are in fact supported on TCP/UDP source/destination
ports. Remove unnecessary check.

Fixes: 2bfc777e07 ("net/mlx5: add L2-L4 pattern items to switch flow rules")

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_nl_flow.c | 20 
 1 file changed, 20 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_nl_flow.c b/drivers/net/mlx5/mlx5_nl_flow.c
index a1c8c340b..beb03c911 100644
--- a/drivers/net/mlx5/mlx5_nl_flow.c
+++ b/drivers/net/mlx5/mlx5_nl_flow.c
@@ -800,16 +800,6 @@ mlx5_nl_flow_transpose(void *buf,
}
spec.tcp = item->spec;
if ((mask.tcp->hdr.src_port &&
-mask.tcp->hdr.src_port != RTE_BE16(0xffff)) ||
-   (mask.tcp->hdr.dst_port &&
-mask.tcp->hdr.dst_port != RTE_BE16(0xffff)))
-   return rte_flow_error_set
-   (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
-mask.tcp,
-"no support for partial masks on"
-" \"hdr.src_port\" and \"hdr.dst_port\""
-" fields");
-   if ((mask.tcp->hdr.src_port &&
 (!mnl_attr_put_u16_check(buf, size,
  TCA_FLOWER_KEY_TCP_SRC,
  spec.tcp->hdr.src_port) ||
@@ -847,16 +837,6 @@ mlx5_nl_flow_transpose(void *buf,
}
spec.udp = item->spec;
if ((mask.udp->hdr.src_port &&
-mask.udp->hdr.src_port != RTE_BE16(0xffff)) ||
-   (mask.udp->hdr.dst_port &&
-mask.udp->hdr.dst_port != RTE_BE16(0xffff)))
-   return rte_flow_error_set
-   (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM_MASK,
-mask.udp,
-"no support for partial masks on"
-" \"hdr.src_port\" and \"hdr.dst_port\""
-" fields");
-   if ((mask.udp->hdr.src_port &&
 (!mnl_attr_put_u16_check(buf, size,
  TCA_FLOWER_KEY_UDP_SRC,
  spec.udp->hdr.src_port) ||
-- 
2.11.0

Re: [dpdk-dev] [PATCH v2] net/tap: fix zeroed flow mask configurations

2018-08-06 Thread Adrien Mazarguil

On Mon, Aug 06, 2018 at 10:58:47AM +, Matan Azrad wrote:
> The rte_flow meaning of zero flow mask configuration is to match all
> the range of the item value.
> For example, the flow eth / ipv4 dst spec 1.2.3.4 dst mask 0.0.0.0
> should match all the ipv4 traffic from the rte_flow API perspective.
> 
> From some kernel perspectives the above rule means to ignore all the
> ipv4 traffic (e.g. Ubuntu 16.04, 4.15.10).
> 
> Due to the fact that the tap PMD should provide the rte_flow meaning,
> it is necessary to ignore the spec in case the mask is zero when it
> forwards such like flows to the kernel.
> So, the above rule should be translated to eth / ipv4 to get the
> correct meaning.
> 
> Ignore spec configurations when the mask is zero.
> 
> Fixes: de96fe68ae95 ("net/tap: add basic flow API patterns and actions")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Matan Azrad 
> ---
>  drivers/net/tap/tap_flow.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> V2:
> Address Adrien comments to fix also the spec=0 check.

Thanks,

Acked-by: Adrien Mazarguil 

-- 
Adrien Mazarguil
6WIND

Re: [dpdk-dev] [PATCH] net/tap: fix zeroed flow mask configurations

2018-08-06 Thread Adrien Mazarguil

On Sun, Aug 05, 2018 at 06:10:55AM +, Matan Azrad wrote:
> Hi Adrien
> 
> From: Adrien Mazarguil
> > Hi Matan,
> > 
> > On Thu, Aug 02, 2018 at 05:52:18PM +, Matan Azrad wrote:
> > > Hi Adrien
> > >
> > > From: Adrien Mazarguil
> > > > On Thu, Aug 02, 2018 at 10:33:00AM +, Matan Azrad wrote:
> > > > > The rte_flow meaning of zero flow mask configuration is to match
> > > > > all the range of the item value.
> > > > > For example, the flow eth / ipv4 dst spec 1.2.3.4 dst mask 0.0.0.0
> > > > > should match all the ipv4 traffic from the rte_flow API perspective.
> > > > >
> > > > > From some kernel perspectives the above rule means to ignore all
> > > > > the
> > > > > ipv4 traffic (e.g. Ubuntu 16.04, 4.15.10).
> > > > >
> > > > > Due to the fact that the tap PMD should provide the rte_flow
> > > > > meaning, it is necessary to ignore the spec in case the mask is
> > > > > zero when it forwards such like flows to the kernel.
> > > > > So, the above rule should be translated to eth / ipv4 to get the
> > > > > correct meaning.
> > > > >
> > > > > Ignore spec configurations when the mask is zero.
> > > >
> > > > I would go further, one should be able to match IP address 0.0.0.0 for
> > instance.
> > > > The PMD should only trust the mask on all fields without looking at 
> > > > spec.
> > >
> > > The PMD should convert the RTE flow API to the device configuration,
> > > So I can think on scenarios that the PMD should look on spec.
> > 
> > Obviously the PMD needs to take spec into account. What I meant is that for
> > each field, spec must be taken into account according to mask only.
> > 
> > For any given field, when mask is empty, don't look at spec, it's like a 
> > wildcard.
> > When mask is full, take spec as is, even if spec only contains zeroed bits.
> > 
> > User intent in that case is to match a zero value exactly, so it must not 
> > result in
> > a wildcard match. If supported, when mask is partial, masked bits are also
> > matched exactly, even if these turn out to be a zero value. Unmasked bits 
> > are
> > considered wildcards.
> > 
> 
> Yes I understand your point Adrien, but I mean that maybe sometimes some spec 
> values should be converted to another spec values to get the correct 
> translation of rte_flow for a special device.
> 
> Here, maybe IP_spec=0.0.0.0 is a special case that should be taken into 
> account, so we must validate what happens in Tap for this case to apply your 
> suggestion too, Maybe there was some intentions for spec=0 cases from the 
> current code author.

I understand that's a lot of maybes :)

I've checked the code and I'm sure it's a mistake made by the original
author. See tap_flow_create_eth() for instance:

 if (!is_zero_ether_addr(&spec->dst)) {

Followed by:

 if (!is_zero_ether_addr(&mask->src))

This lack of consistency doesn't make any sense, it cannot be on purpose.

For what it's worth, I wrote very similar code which uses TC flower in mlx5
and relies on mask (only) in order to retrieve spec. Have a look at
drivers/net/mlx5/mlx5_nl_flow.c. I validated that traffic where addresses
were all zeroes could be successfully matched.
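
The mask-driven rule described in this thread can be sketched as follows.
This is a minimal standalone model, not code from the tap or mlx5 PMDs:
field_matches() and field_is_wildcard() are hypothetical helpers operating on
a single 32-bit field such as an IPv4 address in host order.

```c
#include <stdint.h>

/*
 * A field matches when every bit selected by "mask" is identical in
 * "value" and "spec". Bits outside "mask" are wildcards, so an empty mask
 * matches anything and spec is never looked at on its own.
 */
int
field_matches(uint32_t value, uint32_t spec, uint32_t mask)
{
	return ((value ^ spec) & mask) == 0;
}

/*
 * Whether a field can be left out of the rule sent to the device or
 * kernel. The decision depends on mask only: an all-zero spec combined
 * with a nonzero mask is still an exact match on zero.
 */
int
field_is_wildcard(uint32_t mask)
{
	return mask == 0;
}
```

Under this model, "dst spec 1.2.3.4 dst mask 0.0.0.0" matches all IPv4
traffic, while "dst spec 0.0.0.0" with a full mask matches only address
0.0.0.0.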

-- 
Adrien Mazarguil
6WIND
