from:"Vladimir Medvedkin"

[dpdk-dev] purifier

2016-07-19 Thread Vladimir Medvedkin

Hi all!

I have decided to open-source my project. It is a fast, scalable, transparent
stateful firewall that helps mitigate transport-layer DDoS attacks in a
datacenter environment.

https://github.com/medvedv/purifier

It was written against DPDK 1.7.1, and I hope to port it to the current
stable version of DPDK soon.

-- 
Regards,
Vladimir

[dpdk-dev] performance of rte_lpm rule subsystem

2016-04-20 Thread Vladimir Medvedkin

Hi Alexander,

Why is next_hop 64 bits long?


2016-04-19 18:46 GMT+03:00 Stephen Hemminger :

> On Tue, 19 Apr 2016 14:11:11 +0300
> ? ???  wrote:
>
> > Hi.
> >
> > While testing rte_lpm (adding/deleting BGP full-table rules) I noticed
> > that the rule subsystem is very slow, even considering that it was
> > probably never designed for use in a data forwarding plane. So I want to
> > propose some changes to the "rule" subsystem.
> >
> > I reimplemented the rule part of the lib using rte_hash, and the
> > performance of adding/deleting routes has increased dramatically.
> > If faster route addition and deletion makes sense for anybody else,
> > I would like to discuss my patch.
> > The patch also includes changes that make next_hop 64 bit, so please just
> > ignore them. The rule changes are in the following functions only:
> >
> > rte_lpm2_create
> >
> > rule_find
> > rule_add
> > rule_delete
> > find_previous_rule
> > delete_depth_small
> > delete_depth_big
> >
> > rte_lpm2_add
> > rte_lpm2_delete
> > rte_lpm2_is_rule_present
> > rte_lpm2_delete_all
> >
>
> We forked LPM back several versions ago.
> I sent patches to use a BSD red-black tree for rules, but the patches were
> ignored, mostly because they broke the ABI.
>



-- 
Regards,
Vladimir

[dpdk-dev] lpm patches

2015-10-30 Thread Vladimir Medvedkin

Hi all,

2015-10-30 20:59 GMT+03:00 Matthew Hall :

> On Fri, Oct 30, 2015 at 12:00:18PM +, Bruce Richardson wrote:
> > Matthew's patches were attachments, I don't think they came through in
> patchwork
> > correctly :-(, but that is the relevant link there anyway.]
>
> Let me know if there is something I can do better there. I was having a
> difficult time figuring out how to preserve the thread ID in the middle of
> the
> thread and not cause a new thread. The git email workflows are very
> confusing
> and I figured it was better to send something as soon as I could.
>
> > * Some patches increase the next-hop to 16 bits, others to 24-bits. In
> both cases
> >   a single entry still only occupies 32-bits, so can be read/written to
> >   atomically
>
> I went with 24 because it was the biggest amount I could get that still had
> this property.
>
> > * Only Michal's set appears to take into account ABI versioning, which is
> >   a difficult problem for this lib, with inlined functions.
>
> Agreed. His patches are the most professional from this perspective. This
> is
> why I was trying to contribute to you and to him so we get the most
> professional result for the customers.
>
> > * Matthew's patchset moves the lookup functions to be non-inlined, which
> will
> >   make future updates easier from ABI compatibility - at the cost of
> lookup
> >   performance.
>
> This point is optional for me. I did it, because without it, it was totally
> impossible for me to work on the code in a debugger as I am a security
> engineering guy not a crazy embedded C coder or kernel hacker.
>
> > * Vladimir's patchset merges in the tbl24 and tbl8 entries into a single
> data
> >   type.
>
> I really liked this feature of Vladimir's patches, it makes it easier to
> maintain and less confusing. I had a lot of headaches keeping all those
> structs straight with the separate types, but I didn't know we had the
> chance
> for a great big MEGA-REFACTOR. I love this community!
>
> > * That patchset also introduces an extra optional 32-bit field "as_num",
> allowing
> >   64-bit lpm table entries - obviously at a cost of increased
> memory/cache
> >   footprint.
>
> Is there a way we could test it? Vladimir, did you test the performance? If
> so, what happened?
>
Performance regression depends on the traffic pattern and cache size.
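
As a rough illustration of the memory side of that cost, here is a minimal
sketch. It assumes tbl24 holds one entry per 24-bit prefix (2^24 entries), as
in the existing rte_lpm. The macro and function names are made up for this
example and are not part of DPDK.

```c
#include <stddef.h>
#include <stdint.h>

/* tbl24 is indexed by the top 24 bits of the IPv4 address,
 * so it always has 2^24 entries (assumption stated above). */
#define SKETCH_TBL24_ENTRIES ((size_t)1 << 24)

/* Bytes needed for tbl24 for a given size of one table entry. */
static size_t
sketch_tbl24_bytes(size_t entry_size)
{
	return SKETCH_TBL24_ENTRIES * entry_size;
}
```

With 4-byte entries tbl24 takes 64 MiB, and with 8-byte entries (as_num
enabled) 128 MiB, so only half as many entries fit in each cache line.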

>
> > * Stephen's patchset includes a range of other fixes e.g. for more
> efficient
> >   management of the rules array, and dynamic allocation of the TBL8s.
> > * Matthew's patchset also includes change to LPM for IPv6, which I'm
> considering
> >   out-of-scope for now, so as to focus on LPM v4 only.
>
> Any change that is inconsistent between LPM4 and LPM6 really hoses me,
> because
> I am writing green-field code which treats both protocols as first-class
> citizens and I'd really not like to have totally inconsistent and inferior
> support in one versus the other.
>
> > * Increase next hops to be the full 24 bits, so as to allow maximum
> flexibility
> >   and not waste the extra 8 bits of space in the 32-bit entries.
>
> +1
>
+1. I can split the next hop and forwarding class in application logic.
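
A minimal sketch of such a split, assuming the application chooses to keep a
16-bit next-hop index in the low bits and an 8-bit forwarding class in the
high bits of the 24-bit value. The layout and the function names are
hypothetical; the library would only see an opaque 24-bit next hop.

```c
#include <stdint.h>

/* Hypothetical app-side layout of the 24-bit lookup result:
 * bits 0..15  next-hop index
 * bits 16..23 forwarding class */
static inline uint32_t
app_pack_res(uint16_t next_hop, uint8_t fwd_class)
{
	return ((uint32_t)fwd_class << 16) | next_hop;
}

static inline uint16_t
app_next_hop(uint32_t res)
{
	return (uint16_t)(res & 0xFFFF);
}

static inline uint8_t
app_fwd_class(uint32_t res)
{
	return (uint8_t)((res >> 16) & 0xFF);
}
```

The application packs the value before adding a route and unpacks it after
each lookup.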

>
> > * Move the lookup functions which work on multiple packets to be
> non-inlined
>
> Open to opinions on the performance of this. I am not an expert on this
> area.
>
> > * Merge in the tbl24 and tbl8 structures to make the code that little
> bit shorter
>
> +1
>
> > * Look to pull in as many of Stephen's other improvements as possible -
> though
> >   this may be in a separate patchset to the other changes.
>
> +1. Perhaps if we get a pre-release on a branch with everything else, we
> could
> see if Stephen is willing to rebase his non-duplicate changes.
>
> > * I'm uncertain as to the extra 32-bit as_num field. Adding it as an
> extra
> >   #define is trivial, but adds to the compile-time config. Having it as
> a run-time
> >   option is possible, but likely will make the code a lot more
> complicated, as
> >   we no longer have arrays of a fixed size.
> >
> > Naturally, with whatever solution is come up with, ABI compatibility
> must be
> > taken into account and functions versioned appropriately!
>
> Normally I am not a big define guy. But it seems like a define is good
> here.
> Somebody is going to need to know beforehand if they are making a Core
> Router
> where they want this, or a Security Inspection system like mine, etc.
>
For example, on a core router the as_num feature can be necessary for
NetFlow. It can also be necessary on a security device such as my DDoS
mitigation system.

>
> So it seems easier than doing a bunch of crazy size-juggling in the code.
>
> > do we want to have some of these changes in 2.2?
>
> Personally I am OK to wait as I have it working in my copy. I am just
> trying
> to be a good citizen of the community and contribute back when I see some
> core
> engineers going after the same code.
>
> In particular, for me, having LPM4 only with no LPM6 is not worth much

[dpdk-dev] Release of Packet Journey

2015-10-30 Thread Vladimir Medvedkin

Hi Nikita,

First of all, thank you for publishing your project.
Please apply the patch below:
diff --git a/app/acl.h b/app/acl.h
index fb2f73a..74a1dd5 100644
--- a/app/acl.h
+++ b/app/acl.h
@@ -72,4 +72,21 @@ extern struct acl_parm acl_parm_config;
 extern struct rte_acl_ctx *ipv4_acx[NB_SOCKETS];
 extern struct rte_acl_ctx *ipv6_acx[NB_SOCKETS];

+/*
+ * That effectively defines order of IPV4VLAN classifications:
+ *  - PROTO
+ *  - VLAN (TAG and DOMAIN)
+ *  - SRC IP ADDRESS
+ *  - DST IP ADDRESS
+ *  - PORTS (SRC and DST)
+ */
+enum {
+RTE_ACL_IPV4VLAN_PROTO,
+RTE_ACL_IPV4VLAN_VLAN,
+RTE_ACL_IPV4VLAN_SRC,
+RTE_ACL_IPV4VLAN_DST,
+RTE_ACL_IPV4VLAN_PORTS,
+RTE_ACL_IPV4VLAN_NUM
+};
+
 #endif

Without it your project does not compile.

Furthermore, your app crashes with a segmentation fault. I run it as follows:
root at war202:~/PKTJ/packet-journey#
./build/app/x86_64-native-linuxapp-gcc/app/pktj -l 0,1,2,3 -n 4
--socket-mem=4096 --log-level=4 -- --configfile
/home/medved/PKTJ/packet-journey/tests/integration/lab00/pktj.conf
PKTJ_ACL: IPv6 ACL entries 0:
PKTJ_ACL: IPv4 ACL entries 1:
PKTJ_ACL:   1:PKTJ_ACL: 0.0.0.0/0 PKTJ_ACL: 1.2.6.0/24 PKTJ_ACL: 0 :
65535 0 : 65535 0x0/0x0 PKTJ_ACL: 0x-0x0-0xf000 PKTJ_ACL:
acl context @0x7fd3bf41aa80
  socket_id=0
  alg=3
  max_rules=10
  rule_size=96
  num_rules=1
  num_categories=1
  num_tries=1
ACL: allocation of 9600904 bytes on socket 1 for ACL_pktj-acl-ipv41-0 failed
PKTJ_ACL: Failed to create ACL context
PKTJ_ACL: setup_acl failed for ipv4 with socketid 1, keeping previous rules
for that socket
ACL: allocation of 9600904 bytes on socket 2 for ACL_pktj-acl-ipv42-0 failed
PKTJ_ACL: Failed to create ACL context
PKTJ_ACL: setup_acl failed for ipv4 with socketid 2, keeping previous rules
for that socket
ACL: allocation of 9600904 bytes on socket 3 for ACL_pktj-acl-ipv43-0 failed
PKTJ_ACL: Failed to create ACL context
PKTJ_ACL: setup_acl failed for ipv4 with socketid 3, keeping previous rules
for that socket
 Address:90:E2:BA:39:2A:D8
port=0 tx_queueid=3 nb_txd=512 kni
launching control thread for socketid 0 on lcore 0
CMDLINE1: symlink() failed
Segmentation fault

Regards,
Vladimir



2015-10-29 22:53 GMT+03:00 Nikita Kozlov :

> Hello,
>
> We have opensourced our dpdk-based project, Packet Journey
> https://github.com/Gandi/packet-journey .
>
> Packet Journey is a combination of Linux RT_NETLINK and several parts
> of DPDK (rte_kni, rte_lpm, rte_acl, rte_cmdline) and is intended to
> serve as an edge router.
>
> Our use case is:
> - pktj starts several forwarding threads and a KNI thread per external port
> - pktj launches a script which configures the KNI interface (MAC, IP,
> VLAN) and launches a BGP daemon
> - the host receives routes from the BGP daemon
> - the BGP daemon injects the routes in Linux
> - pktj receives the routes from NETLINK and put them in LPM in our
> "control" threads
> - pktj forwards packets to the KNI if the packets are
>   - for the KNI IP or
>   - if the neighbor is not known yet
>   - if ttl reaches 0
> - pktj filters packets if they match an ACL or if they exceed the rate
> limit, kni output is also rate-limited
> - pktj forwards packets directly from the RX queue to the TX queue if the
> neighbor is known
>
> --
> Nikita
>

[dpdk-dev] [PATCH v1 0/3] lpm: increase number of next hops for lpm (ipv4)

2015-10-27 Thread Vladimir Medvedkin

Hi Michal,

Try the patch below. I will send it via git.

Regards,
Vladimir

2015-10-26 21:40 GMT+03:00 Matthew Hall :

> > I can't apply patch 0001-... , could You check it please?
>
> I generated it from a rebase of my own copy of DPDK against DPDK upstream
> master.
>
> So I'm not sure why it would not apply against latest DPDK master.
>
> But I will try it and see what could be the reason.
>
> Matthew.
>

[dpdk-dev] [PATCH v1 0/3] lpm: increase number of next hops for lpm (ipv4)

2015-10-27 Thread Vladimir Medvedkin

Signed-off-by: Vladimir Medvedkin 
---
 config/common_bsdapp |   1 +
 config/common_linuxapp   |   1 +
 lib/librte_lpm/rte_lpm.c | 194 +--
 lib/librte_lpm/rte_lpm.h | 163 +++
 4 files changed, 219 insertions(+), 140 deletions(-)

diff --git a/config/common_bsdapp b/config/common_bsdapp
index b37dcf4..408cc2c 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -344,6 +344,7 @@ CONFIG_RTE_LIBRTE_JOBSTATS=y
 #
 CONFIG_RTE_LIBRTE_LPM=y
 CONFIG_RTE_LIBRTE_LPM_DEBUG=n
+CONFIG_RTE_LIBRTE_LPM_ASNUM=n

 #
 # Compile librte_acl
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..1c60e63 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -352,6 +352,7 @@ CONFIG_RTE_LIBRTE_JOBSTATS=y
 #
 CONFIG_RTE_LIBRTE_LPM=y
 CONFIG_RTE_LIBRTE_LPM_DEBUG=n
+CONFIG_RTE_LIBRTE_LPM_ASNUM=n

 #
 # Compile librte_acl
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 163ba3c..363b400 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -159,9 +159,11 @@ rte_lpm_create(const char *name, int socket_id, int 
max_rules,

lpm_list = RTE_TAILQ_CAST(rte_lpm_tailq.head, rte_lpm_list);

-   RTE_BUILD_BUG_ON(sizeof(struct rte_lpm_tbl24_entry) != 2);
-   RTE_BUILD_BUG_ON(sizeof(struct rte_lpm_tbl8_entry) != 2);
-
+#ifdef RTE_LIBRTE_LPM_ASNUM
+   RTE_BUILD_BUG_ON(sizeof(struct rte_lpm_tbl_entry) != 8);
+#else
+   RTE_BUILD_BUG_ON(sizeof(struct rte_lpm_tbl_entry) != 4);
+#endif
/* Check user arguments. */
if ((name == NULL) || (socket_id < -1) || (max_rules == 0)){
rte_errno = EINVAL;
@@ -261,7 +263,7 @@ rte_lpm_free(struct rte_lpm *lpm)
  */
 static inline int32_t
 rule_add(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
-   uint8_t next_hop)
+   struct rte_lpm_res *res)
 {
uint32_t rule_gindex, rule_index, last_rule;
int i;
@@ -282,8 +284,11 @@ rule_add(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t 
depth,

/* If rule already exists update its next_hop and 
return. */
if (lpm->rules_tbl[rule_index].ip == ip_masked) {
-   lpm->rules_tbl[rule_index].next_hop = next_hop;
-
+   lpm->rules_tbl[rule_index].next_hop = 
res->next_hop;
+   lpm->rules_tbl[rule_index].fwd_class = 
res->fwd_class;
+#ifdef RTE_LIBRTE_LPM_ASNUM
+   lpm->rules_tbl[rule_index].as_num = res->as_num;
+#endif
return rule_index;
}
}
@@ -320,7 +325,11 @@ rule_add(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t 
depth,

/* Add the new rule. */
lpm->rules_tbl[rule_index].ip = ip_masked;
-   lpm->rules_tbl[rule_index].next_hop = next_hop;
+   lpm->rules_tbl[rule_index].next_hop = res->next_hop;
+   lpm->rules_tbl[rule_index].fwd_class = res->fwd_class;
+#ifdef RTE_LIBRTE_LPM_ASNUM
+   lpm->rules_tbl[rule_index].as_num = res->as_num;
+#endif

/* Increment the used rules counter for this rule group. */
lpm->rule_info[depth - 1].used_rules++;
@@ -382,10 +391,10 @@ rule_find(struct rte_lpm *lpm, uint32_t ip_masked, 
uint8_t depth)
  * Find, clean and allocate a tbl8.
  */
 static inline int32_t
-tbl8_alloc(struct rte_lpm_tbl8_entry *tbl8)
+tbl8_alloc(struct rte_lpm_tbl_entry *tbl8)
 {
uint32_t tbl8_gindex; /* tbl8 group index. */
-   struct rte_lpm_tbl8_entry *tbl8_entry;
+   struct rte_lpm_tbl_entry *tbl8_entry;

/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
for (tbl8_gindex = 0; tbl8_gindex < RTE_LPM_TBL8_NUM_GROUPS;
@@ -393,12 +402,12 @@ tbl8_alloc(struct rte_lpm_tbl8_entry *tbl8)
tbl8_entry = &tbl8[tbl8_gindex *
   RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
/* If a free tbl8 group is found clean it and set as VALID. */
-   if (!tbl8_entry->valid_group) {
+   if (!tbl8_entry->ext_valid) {
memset(&tbl8_entry[0], 0,
RTE_LPM_TBL8_GROUP_NUM_ENTRIES *
sizeof(tbl8_entry[0]));

-   tbl8_entry->valid_group = VALID;
+   tbl8_entry->ext_valid = VALID;

/* Return group index for allocated tbl8 group. */
return tbl8_gindex;
@@ -410,46 +419,50 @@ tbl8_alloc(struct rte_lpm_tbl8_entry *tbl8)
 }

 static inline void
-tbl8_free(struct rte_lpm_tbl8_entry *tbl8, uint32_t tbl8_group_start)
+tbl8_free(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
 {
/* Set tbl8 group invalid*/
-   tbl8[tbl8_group_start].valid_group = INVALID;
+   tbl8[tbl8_group_start].ext_valid = INV

[dpdk-dev] [PATCH v1 0/3] lpm: increase number of next hops for lpm (ipv4)

2015-10-26 Thread Vladimir Medvedkin

Michal,

Looks strange, you have:
error: while searching for:

   lpm_list = RTE_TAILQ_CAST(rte_lpm_tailq.head, rte_lpm_list);
...
error: patch failed: lib/librte_lpm/rte_lpm.c:159
but if we look at
http://dpdk.org/browse/dpdk/tree/lib/librte_lpm/rte_lpm.c#n159
the patch should apply fine.
Latest commit in my repo is 139debc42dc0a320dad40f5295b74d2e3ab8a7f9


2015-10-26 18:39 GMT+03:00 Michal Jastrzebski <
michalx.k.jastrzebski at intel.com>:

> On Mon, Oct 26, 2015 at 05:03:31PM +0300, Vladimir Medvedkin wrote:
> > Hi Michal,
> >
> > A forwarding class lets us classify traffic based on the dst prefix; it
> > is something like Juniper DCU (destination class usage). For example, on
> > a Juniper MX I can write a policy that installs a prefix into the FIB with
> > some class and then use that class on the dataplane, for example in an
> > ACL.
> > On a Juniper MX I can do something like this:
> > #show policy-options
> > policy-statement community-to-class {
> > term customer {
> > from community originate-customer;
> > then destination-class customer;
> > }
> > }
> > community originate-customer members 12345:1;
> > # show routing-options
> > forwarding-table {
> > export community-to-class;
> > }
> > # show forwarding-options
> > forwarding-options {
> > family inet {
> > filter {
> > output test-filter;
> > }
> > }
> > }
> > # show firewall family inet filter test-filter
> > term 1 {
> > from {
> > protocol icmp;
> > destination-class customer;
> > }
> > then {
> > discard;
> > }
> > }
> > announce route 10.10.10.10/32 next-hop 10.10.10.2 community 12345:1
> > After that, on the dataplane we have:
> > NPC1( vty)# show route ip lookup 10.10.10.10
> > Route Information (10.10.10.10):
> >  interface : xe-1/0/0.0 (328)
> >  Nexthop prefix : -
> >  Nexthop ID : 1048574
> >  MTU: 0
> >  Class ID   : 129 <- That is "forwarding class" in my implementation
> > This construction discards all ICMP traffic going to dst prefixes that
> > were originated with community 12345:1. With this mechanism we can build
> > sophisticated policies on the control plane to control traffic on the
> > dataplane.
> > The same goes for as_num: on the dataplane we can have the AS number that
> > originated the prefix, or another 4-byte number, e.g. a geo-id.
> > What issue do you mean? I think it is because the table/pipeline/test
> > frameworks do not compile due to the changed API/ABI. You can turn them
> > off for LPM testing; if my patch is applied I will make the changes in
> > the above-mentioned frameworks.
> >
> > Regards,
> > Vladimir
>
> Hi Vladimir,
> I have an issue with applying your patch, not with compilation.
> This is the error I get:
> Checking patch config/common_bsdapp...
> Checking patch config/common_linuxapp...
> Checking patch lib/librte_lpm/rte_lpm.c...
> error: while searching for:
>
>lpm_list = RTE_TAILQ_CAST(rte_lpm_tailq.head, rte_lpm_list);
>
>RTE_BUILD_BUG_ON(sizeof(struct rte_lpm_tbl24_entry) != 2);
>RTE_BUILD_BUG_ON(sizeof(struct rte_lpm_tbl8_entry) != 2);
>
>/* Check user arguments. */
>if ((name == NULL) || (socket_id < -1) || (max_rules == 0)){
>rte_errno = EINVAL;
>
> error: patch failed: lib/librte_lpm/rte_lpm.c:159
> error: lib/librte_lpm/rte_lpm.c: patch does not apply
> Checking patch lib/librte_lpm/rte_lpm.h...
> error: while searching for:
> #define RTE_LPM_RETURN_IF_TRUE(cond, retval)
> #endif
>
> /** @internal bitmask with valid and ext_entry/valid_group fields set */
> #define RTE_LPM_VALID_EXT_ENTRY_BITMASK 0x0300
>
> /** Bitmask used to indicate successful lookup */
> #define RTE_LPM_LOOKUP_SUCCESS  0x0100
>
> #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> /** @internal Tbl24 entry structure. */
> struct rte_lpm_tbl24_entry {
>/* Stores Next hop or group index (i.e. gindex)into tbl8. */
>union {
>uint8_t next_hop;
>uint8_t tbl8_gindex;
>};
>/* Using single uint8_t to store 3 values. */
>uint8_t valid :1; /**< Validation flag. */
>uint8_t ext_entry :1; /**< External entry. */
>uint8_t depth :6; /**< Rule depth. */
> };
>
> /** @internal Tbl8 entry structure. */
> struct rte_lpm_tbl8_entry {
>uint8_t next_hop; /**< next hop. */
>/* Using single uint8_t to store 3 values. */
>uint8_t valid   :1

[dpdk-dev] [PATCH v1 0/3] lpm: increase number of next hops for lpm (ipv4)

2015-10-26 Thread Vladimir Medvedkin

Hi Michal,

A forwarding class lets us classify traffic based on the dst prefix; it is
something like Juniper DCU (destination class usage). For example, on a
Juniper MX I can write a policy that installs a prefix into the FIB with some
class and then use that class on the dataplane, for example in an ACL.
On a Juniper MX I can do something like this:
#show policy-options
policy-statement community-to-class {
term customer {
from community originate-customer;
then destination-class customer;
}
}
community originate-customer members 12345:1;
# show routing-options
forwarding-table {
export community-to-class;
}
# show forwarding-options
forwarding-options {
family inet {
filter {
output test-filter;
}
}
}
# show firewall family inet filter test-filter
term 1 {
from {
protocol icmp;
destination-class customer;
}
then {
discard;
}
}
announce route 10.10.10.10/32 next-hop 10.10.10.2 community 12345:1
After that, on the dataplane we have:
NPC1( vty)# show route ip lookup 10.10.10.10
Route Information (10.10.10.10):
 interface : xe-1/0/0.0 (328)
 Nexthop prefix : -
 Nexthop ID : 1048574
 MTU: 0
 Class ID   : 129 <- That is "forwarding class" in my implementation
This construction discards all ICMP traffic going to dst prefixes that were
originated with community 12345:1. With this mechanism we can build
sophisticated policies on the control plane to control traffic on the
dataplane.
The same goes for as_num: on the dataplane we can have the AS number that
originated the prefix, or another 4-byte number, e.g. a geo-id.
What issue do you mean? I think it is because the table/pipeline/test
frameworks do not compile due to the changed API/ABI. You can turn them off
for LPM testing; if my patch is applied I will make the changes in the
above-mentioned frameworks.
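
To make the dataplane side of the Juniper example concrete, here is a minimal
sketch of how an application might use a forwarding class returned by a
lookup. Everything in it is hypothetical: the result struct, the constants and
the function name are invented for illustration, and class 129 is simply the
Class ID from the output above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IPPROTO_ICMP   1   /* IANA protocol number for ICMP */
#define SKETCH_CLASS_CUSTOMER 129 /* Class ID taken from the example above */

/* Hypothetical lookup result carrying a forwarding class. */
struct sketch_lpm_res {
	uint32_t next_hop;
	uint8_t fwd_class;
};

/* Mirror of the "test-filter" term: discard ICMP going to prefixes
 * that were installed with the "customer" destination class. */
static bool
sketch_should_discard(uint8_t ip_proto, const struct sketch_lpm_res *res)
{
	return ip_proto == SKETCH_IPPROTO_ICMP &&
	       res->fwd_class == SKETCH_CLASS_CUSTOMER;
}
```

The control plane decides which prefixes get which class; the dataplane only
compares one byte per packet.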

Regards,
Vladimir

2015-10-26 14:57 GMT+03:00 Jastrzebski, MichalX K <
michalx.k.jastrzebski at intel.com>:

> > -Original Message-
> > From: Michal Jastrzebski [mailto:michalx.k.jastrzebski at intel.com]
> > Sent: Monday, October 26, 2015 12:55 PM
> > To: Vladimir Medvedkin
> > Subject: Re: [dpdk-dev] [PATCH v1 0/3] lpm: increase number of next hops
> > for lpm (ipv4)
> >
> > On Sun, Oct 25, 2015 at 08:52:04PM +0300, Vladimir Medvedkin wrote:
> > > Hi all,
> > >
> > > Here is my implementation
> > >
> > > Signed-off-by: Vladimir Medvedkin 
> > > ---
> > >  config/common_bsdapp |   1 +
> > >  config/common_linuxapp   |   1 +
> > >  lib/librte_lpm/rte_lpm.c | 194
> > > +--
> > >  lib/librte_lpm/rte_lpm.h | 163 +++
> > >  4 files changed, 219 insertions(+), 140 deletions(-)
> > >
> > > diff --git a/config/common_bsdapp b/config/common_bsdapp
> > > index b37dcf4..408cc2c 100644
> > > --- a/config/common_bsdapp
> > > +++ b/config/common_bsdapp
> > > @@ -344,6 +344,7 @@ CONFIG_RTE_LIBRTE_JOBSTATS=y
> > >  #
> > >  CONFIG_RTE_LIBRTE_LPM=y
> > >  CONFIG_RTE_LIBRTE_LPM_DEBUG=n
> > > +CONFIG_RTE_LIBRTE_LPM_ASNUM=n
> > >
> > >  #
> > >  # Compile librte_acl
> > > diff --git a/config/common_linuxapp b/config/common_linuxapp
> > > index 0de43d5..1c60e63 100644
> > > --- a/config/common_linuxapp
> > > +++ b/config/common_linuxapp
> > > @@ -352,6 +352,7 @@ CONFIG_RTE_LIBRTE_JOBSTATS=y
> > >  #
> > >  CONFIG_RTE_LIBRTE_LPM=y
> > >  CONFIG_RTE_LIBRTE_LPM_DEBUG=n
> > > +CONFIG_RTE_LIBRTE_LPM_ASNUM=n
> > >
> > >  #
> > >  # Compile librte_acl
> > > diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> > > index 163ba3c..363b400 100644
> > > --- a/lib/librte_lpm/rte_lpm.c
> > > +++ b/lib/librte_lpm/rte_lpm.c
> > > @@ -159,9 +159,11 @@ rte_lpm_create(const char *name, int socket_id,
> > int
> > > max_rules,
> > >
> > > lpm_list = RTE_TAILQ_CAST(rte_lpm_tailq.head, rte_lpm_list);
> > >
> > > -   RTE_BUILD_BUG_ON(sizeof(struct rte_lpm_tbl24_entry) != 2);
> > > -   RTE_BUILD_BUG_ON(sizeof(struct rte_lpm_tbl8_entry) != 2);
> > > -
> > > +#ifdef RTE_LIBRTE_LPM_ASNUM
> > > +   RTE_BUILD_BUG_ON(sizeof(struct rte_lpm_tbl_entry) != 8);
> > > +#else
> > > +   RTE_BUILD_BUG_ON(sizeof(struct rte_lpm_tbl_entry) != 4);
> > > +#endif
> > > /* Check user arguments. */
> > > if ((name == NULL) || (socket_id < -1) || (max_rules == 0)){
> > > rte_errno = EINVAL;
> > > @@ -261,7 +26

[dpdk-dev] [PATCH v1 0/3] lpm: increase number of next hops for lpm (ipv4)

2015-10-25 Thread Vladimir Medvedkin

Hi all,

Here is my implementation

Signed-off-by: Vladimir Medvedkin 
---
 config/common_bsdapp |   1 +
 config/common_linuxapp   |   1 +
 lib/librte_lpm/rte_lpm.c | 194
+--
 lib/librte_lpm/rte_lpm.h | 163 +++
 4 files changed, 219 insertions(+), 140 deletions(-)

diff --git a/config/common_bsdapp b/config/common_bsdapp
index b37dcf4..408cc2c 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -344,6 +344,7 @@ CONFIG_RTE_LIBRTE_JOBSTATS=y
 #
 CONFIG_RTE_LIBRTE_LPM=y
 CONFIG_RTE_LIBRTE_LPM_DEBUG=n
+CONFIG_RTE_LIBRTE_LPM_ASNUM=n

 #
 # Compile librte_acl
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..1c60e63 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -352,6 +352,7 @@ CONFIG_RTE_LIBRTE_JOBSTATS=y
 #
 CONFIG_RTE_LIBRTE_LPM=y
 CONFIG_RTE_LIBRTE_LPM_DEBUG=n
+CONFIG_RTE_LIBRTE_LPM_ASNUM=n

 #
 # Compile librte_acl
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 163ba3c..363b400 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -159,9 +159,11 @@ rte_lpm_create(const char *name, int socket_id, int
max_rules,

lpm_list = RTE_TAILQ_CAST(rte_lpm_tailq.head, rte_lpm_list);

-   RTE_BUILD_BUG_ON(sizeof(struct rte_lpm_tbl24_entry) != 2);
-   RTE_BUILD_BUG_ON(sizeof(struct rte_lpm_tbl8_entry) != 2);
-
+#ifdef RTE_LIBRTE_LPM_ASNUM
+   RTE_BUILD_BUG_ON(sizeof(struct rte_lpm_tbl_entry) != 8);
+#else
+   RTE_BUILD_BUG_ON(sizeof(struct rte_lpm_tbl_entry) != 4);
+#endif
/* Check user arguments. */
if ((name == NULL) || (socket_id < -1) || (max_rules == 0)){
rte_errno = EINVAL;
@@ -261,7 +263,7 @@ rte_lpm_free(struct rte_lpm *lpm)
  */
 static inline int32_t
 rule_add(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
-   uint8_t next_hop)
+   struct rte_lpm_res *res)
 {
uint32_t rule_gindex, rule_index, last_rule;
int i;
@@ -282,8 +284,11 @@ rule_add(struct rte_lpm *lpm, uint32_t ip_masked,
uint8_t depth,

/* If rule already exists update its next_hop and
return. */
if (lpm->rules_tbl[rule_index].ip == ip_masked) {
-   lpm->rules_tbl[rule_index].next_hop =
next_hop;
-
+   lpm->rules_tbl[rule_index].next_hop =
res->next_hop;
+   lpm->rules_tbl[rule_index].fwd_class =
res->fwd_class;
+#ifdef RTE_LIBRTE_LPM_ASNUM
+   lpm->rules_tbl[rule_index].as_num =
res->as_num;
+#endif
return rule_index;
}
}
@@ -320,7 +325,11 @@ rule_add(struct rte_lpm *lpm, uint32_t ip_masked,
uint8_t depth,

/* Add the new rule. */
lpm->rules_tbl[rule_index].ip = ip_masked;
-   lpm->rules_tbl[rule_index].next_hop = next_hop;
+   lpm->rules_tbl[rule_index].next_hop = res->next_hop;
+   lpm->rules_tbl[rule_index].fwd_class = res->fwd_class;
+#ifdef RTE_LIBRTE_LPM_ASNUM
+   lpm->rules_tbl[rule_index].as_num = res->as_num;
+#endif

/* Increment the used rules counter for this rule group. */
lpm->rule_info[depth - 1].used_rules++;
@@ -382,10 +391,10 @@ rule_find(struct rte_lpm *lpm, uint32_t ip_masked,
uint8_t depth)
  * Find, clean and allocate a tbl8.
  */
 static inline int32_t
-tbl8_alloc(struct rte_lpm_tbl8_entry *tbl8)
+tbl8_alloc(struct rte_lpm_tbl_entry *tbl8)
 {
uint32_t tbl8_gindex; /* tbl8 group index. */
-   struct rte_lpm_tbl8_entry *tbl8_entry;
+   struct rte_lpm_tbl_entry *tbl8_entry;

/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
for (tbl8_gindex = 0; tbl8_gindex < RTE_LPM_TBL8_NUM_GROUPS;
@@ -393,12 +402,12 @@ tbl8_alloc(struct rte_lpm_tbl8_entry *tbl8)
tbl8_entry = &tbl8[tbl8_gindex *
   RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
/* If a free tbl8 group is found clean it and set as VALID.
*/
-   if (!tbl8_entry->valid_group) {
+   if (!tbl8_entry->ext_valid) {
memset(&tbl8_entry[0], 0,
RTE_LPM_TBL8_GROUP_NUM_ENTRIES *
sizeof(tbl8_entry[0]));

-   tbl8_entry->valid_group = VALID;
+   tbl8_entry->ext_valid = VALID;

/* Return group index for allocated tbl8 group. */
return tbl8_gindex;
@@ -410,46 +419,50 @@ tbl8_alloc(struct rte_lpm_tbl8_entry *tbl8)
 }

 static inline void
-tbl8_free(struct rte_lpm_tbl8_entry *tbl8, uint32_t tbl8_group_start)
+tbl8_free(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
 {
/* Set tbl8 group invalid*/
-   tbl8[tbl8_group_start].valid_group = INVALID;
+   t

[dpdk-dev] [PATCH v1 0/3] lpm: increase number of next hops for lpm (ipv4)

2015-10-23 Thread Vladimir Medvedkin

Hi all,

I also have an LPM library implementation. Main points:
- First, we don't need two separate structures, rte_lpm_tbl8_entry and
rte_lpm_tbl24_entry. I think it's better to merge them into one
rte_lpm_tbl_entry, because the only difference is the name of one bit:
valid_group vs ext_entry. Let its name be ext_valid.
- Second, I think that 16 bits is more than enough for the next-hop index.
It's better to use the remaining 8 bits for a so-called forwarding class. It
is something like Juniper DCU and can help us classify traffic based on the
dst prefix. But after a conversation with Bruce Richardson I agree with him
that the next-hop index and forwarding class can be split from one return
value by the application.
- Third, I want to add the possibility to look up the AS number (or any other
4-byte value) that originated the prefix. It can be defined like:
union rte_lpm_tbl_entry_extend {
#ifdef RTE_LPM_ASNUM
uint64_t entry;
#else
uint32_t entry;
#endif
#ifdef RTE_LPM_ASNUM
   uint32_t as_num;
#endif
   struct{
   uint32_t next_hop   :24;/**< next hop. */
   uint32_t valid  :1; /**< Validation flag. */
   uint32_t ext_valid :1; /**< External entry. */
   uint32_t depth :6; /**< Rule depth. */
};
};
- Fourth, the next-hop index is extended not only to increase the number of
next hops but also to increase the number of more specific routes. So I think
this should be fixed:
+   unsigned tbl8_index = (uint8_t)ip +
+   ((uint8_t)tbl_entry *
RTE_LPM_TBL8_GROUP_NUM_ENTRIES);
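
The points above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code from any of the posted
patchsets: the struct, macro and function names are invented, the optional
as_num is placed next to the bit-fields so the entry is 8 bytes when enabled,
and a tbl8 group is assumed to hold 256 entries as in the current library.

```c
#include <stdint.h>

#define SKETCH_TBL8_GROUP_NUM_ENTRIES 256u /* assumed tbl8 group size */

/* One entry type for both tbl24 and tbl8 (sketch only). */
struct sketch_lpm_tbl_entry {
#ifdef SKETCH_LPM_ASNUM
	uint32_t as_num;        /* optional origin AS or other 4-byte tag */
#endif
	uint32_t next_hop  :24; /* next hop, or tbl8 group index */
	uint32_t valid     :1;  /* validation flag */
	uint32_t ext_valid :1;  /* ext_entry in tbl24, valid_group in tbl8 */
	uint32_t depth     :6;  /* rule depth */
};

#ifdef SKETCH_LPM_ASNUM
_Static_assert(sizeof(struct sketch_lpm_tbl_entry) == 8, "64-bit entry");
#else
_Static_assert(sizeof(struct sketch_lpm_tbl_entry) == 4, "32-bit entry");
#endif

/* Index into tbl8 without truncating the group index to 8 bits,
 * so that more than 256 tbl8 groups can be addressed. */
static inline uint32_t
sketch_tbl8_index(uint32_t ip, uint32_t tbl8_gindex)
{
	return (uint32_t)(uint8_t)ip +
	       tbl8_gindex * SKETCH_TBL8_GROUP_NUM_ENTRIES;
}
```

Bit-field layout is compiler-dependent, which is why the size is checked at
build time in the same spirit as RTE_BUILD_BUG_ON.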

Regards,
Vladimir

2015-10-23 21:38 GMT+03:00 Matthew Hall :

> On Fri, Oct 23, 2015 at 09:33:05AM -0700, Stephen Hemminger wrote:
> > On Fri, 23 Oct 2015 09:20:33 -0700
> > Matthew Hall  wrote:
> >
> > > On Fri, Oct 23, 2015 at 03:51:48PM +0200, Michal Jastrzebski wrote:
> > > > From: Michal Kobylinski  
> > > >
> > > > The current DPDK implementation for LPM for IPv4 and IPv6 limits the
> > > > number of next hops to 256, as the next hop ID is an 8-bit long
> field.
> > > > Proposed extension increase number of next hops for IPv4 to 2^24 and
> > > > also allows 32-bits read/write operations.
> > > >
> > > > This patchset requires additional change to rte_table library to meet
> > > > ABI compatibility requirements. A v2 will be sent next week.
> > >
> > > I also have a patchset for this.
> > >
> > > I will send it out as well so we could compare.
> > >
> > > Matthew.
> >
> > Could you consider rolling in the Brocade/Vyatta changes to LPM
> > structure as well. Would prefer only one ABI change
>
> Hi Stephen,
>
> I asked you if you could send me these a while ago but I never heard
> anything.
>
> That's the only reason I made my own version.
>
> If you have them available also maybe we can consolidate things.
>
> Matthew.
>

[dpdk-dev] Changes to 5tuple IPv4 filters in dpdk v2.0

2015-08-05 Thread Vladimir Medvedkin

Hi Kam,

Flow director can filter by src/dst prefix, but the src/dst prefix length
is global for all rules. So, if you decide to specify a /16 dst network, all
rules will have a /16 prefix length for the dst address.
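
A small sketch of what "global" means here, not of the real flow director
programming interface: one mask is derived from the chosen prefix length and
every rule is compared under it. The struct and function names are
hypothetical.

```c
#include <stdint.h>

/* Convert a prefix length (0..32) to a host-order IPv4 netmask. */
static uint32_t
sketch_prefix_len_to_mask(unsigned int len)
{
	if (len == 0)
		return 0;
	if (len >= 32)
		return UINT32_MAX;
	return UINT32_MAX << (32 - len);
}

/* Hypothetical filter table: one dst mask shared by all rules. */
struct sketch_fdir_table {
	uint32_t global_dst_mask;
};

/* A packet matches a rule if the addresses agree under the global mask. */
static int
sketch_fdir_match(const struct sketch_fdir_table *t,
		  uint32_t rule_dst, uint32_t pkt_dst)
{
	return (rule_dst & t->global_dst_mask) ==
	       (pkt_dst & t->global_dst_mask);
}
```

With a /16 global mask, a rule written for 192.168.1.0 also matches
192.168.2.5, because only the first 16 bits are compared.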

Regards,
Vladimir

2015-08-05 17:53 GMT+03:00 Kamraan Nasim :

> Hi Vladimir,
>
> Thank you for the link. Seems to simply be an abstraction over the
> existing filters so it is safe for me to upgrade to v2.0 :)
>
> Since we are on the subject, are you aware of any filters on 82599 or
> Fortville that may provide subnet filtering(I can specify something like
> 192.168.0.0/16 instead of host addresses)?  What about flow director
> filters?
>
>
> --Kam
>
> On Tue, Aug 4, 2015 at 5:40 PM, Vladimir Medvedkin 
> wrote:
>
>> Hi Kam,
>>
>> 1) The reason is discussed in
>> http://dpdk.org/ml/archives/dev/2014-September/005179.html
>> 2) No, it's still not supported (on current NICs). At the moment ntuple
>> is supported only by igb and ixgbe. If you look at
>> drivers/net/ixgbe/ixgbe_ethdev.c you can see the ntuple_filter_to_5tuple
>> function, which translates rte_eth_ntuple_filter to
>> ixgbe_5tuple_filter_info, so the mask can be either UINT32_MAX or 0. It's a
>> hardware limitation (see 82599 datasheet, section 7.1.2.5 "L3/L4 5-tuple
>> Filters").
>>
>> Regards,
>> Vladimir
>>
>> 2015-08-04 23:44 GMT+03:00 Kamraan Nasim :
>>
>>> Hi DPDK community,
>>>
>>> I've been using DPDK v1.7 and v1.8 for the past year. On updating to
>>> v2.0.0,  I see that *rte_5tuple_filter* has been deprecated as well as
>>> the
>>> associated install/remove call,* rte_eth_dev_add_5tuple_filter()*
>>>
>>> I now see that rte_eth_ntuple_filter has been added in place.
>>>
>>> 1) Is there a specific reason for removing backward compatibility? As in
>>> is
>>> there a known issue with rte_5tuple_filter infra that was discovered in
>>> v2.0?
>>>
>>>
>>> 2) One limitation of rte_5tuple_filter was that it could not be used to
>>> filter /24 or /16 ip addresses (subnet filtering). I now see that the
>>> src_ip_mask and dst_ip_mask is 32 bits and a separate
>>> RTE_NTUPLE_FLAGS_SRC_IP
>>> <http://dpdk.org/doc/api/rte__eth__ctrl_8h.html#aff1204ca0b33628610956f840dd9b206>
>>> has been introduced. Does this imply that we NOW support subnet
>>> filtering (use mask for wildcard masking)?
>>>
>>>
>>> Any help or pointers on the subject will be greatly appreciated!!!
>>>
>>>
>>> Thanks,
>>> Kam
>>>
>>
>>
>

[dpdk-dev] Changes to 5tuple IPv4 filters in dpdk v2.0

2015-08-05 Thread Vladimir Medvedkin

Hi Kam,

1) The reason is discussed in
http://dpdk.org/ml/archives/dev/2014-September/005179.html
2) No, it's still not supported (on current NICs). At the moment ntuple is
supported only by igb and ixgbe. If you look at
drivers/net/ixgbe/ixgbe_ethdev.c you can see the ntuple_filter_to_5tuple
function, which translates rte_eth_ntuple_filter to ixgbe_5tuple_filter_info,
so the mask can be either UINT32_MAX or 0. It's a hardware limitation (see 82599
datasheet 7.1.2.5 L3/L4 5-tuple Filters).
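
The "UINT32_MAX or 0" constraint above can be expressed as a one-line
check. This is only a sketch of the rule as described, not the ixgbe driver
code; five_tuple_mask_supported() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check modelled on the constraint described above: an
 * address field is either compared in full (mask of all ones) or
 * ignored (mask of zero). Any partial mask, such as a /16, is rejected. */
static int five_tuple_mask_supported(uint32_t ip_mask)
{
    return ip_mask == UINT32_MAX || ip_mask == 0;
}
```

So a subnet mask such as 0xFFFF0000 (a /16) falls outside what the 5-tuple
filter can express, which is why ntuple does not give you subnet filtering
on this hardware.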

Regards,
Vladimir

2015-08-04 23:44 GMT+03:00 Kamraan Nasim :

> Hi DPDK community,
>
> I've been using DPDK v1.7 and v1.8 for the past year. On updating to
> v2.0.0,  I see that *rte_5tuple_filter* has been deprecated as well as the
> associated install/remove call,* rte_eth_dev_add_5tuple_filter()*
>
> I now see that rte_eth_ntuple_filter has been added in place.
>
> 1) Is there a specific reason for removing backward compatibility? As in is
> there a known issue with rte_5tuple_filter infra that was discovered in
> v2.0?
>
>
> 2) One limitation of rte_5tuple_filter was that it could not be used to
> filter /24 or /16 ip addresses (subnet filtering). I now see that the
> src_ip_mask and dst_ip_mask is 32 bits and a separate
> RTE_NTUPLE_FLAGS_SRC_IP
> <http://dpdk.org/doc/api/rte__eth__ctrl_8h.html#aff1204ca0b33628610956f840dd9b206>
> has been introduced. Does this imply that we NOW support subnet
> filtering (use mask for wildcard masking)?
>
>
> Any help or pointers on the subject will be greatly appreciated!!!
>
>
> Thanks,
> Kam
>

[dpdk-dev] [PATCH v6] Add toeplitz hash algorithm used by RSS

2015-07-29 Thread Vladimir Medvedkin

Hi Michael,

Thanks for comment, it will be fixed in next patch.

Regards,
Vladimir

2015-07-29 8:01 GMT+03:00 Qiu, Michael :

>  Hi, Vladimir
>
> You need also to fix this issue in i686 platform:
>
> RHEL65_32,2.6.32,4.4.7,14.0.0
> SUSE11SP3_32,3.0.76-0,4.3.4,14.0.0
>
>
> i686-native-linuxapp-gcc/include/rte_thash.h:63: error: integer constant
> is too large for 'long' type
> i686-native-linuxapp-gcc/include/rte_thash.h:63: error: integer constant
> is too large for 'long' type
>
>
> Thanks,
> Michael
>
> On 2015/7/27 4:58, Vladimir Medvedkin wrote:
>
> Hi Tony,
>
> Sorry for the late reply, I was on vacation.
> I'll prepare patch soon.
>
> Regards,
> Vladimir
>
> 2015-07-22 10:55 GMT+03:00 Tony Lu  :
>
>
>  Hi, Vladimir
>
> When compiling thash for no-X86 arches, it fails with the following errors.
> I wonder if
> it is possible to make the thash library arch-independent?
>
> == Build app/test
>   CC test_thash.o
> In file included from /u/zlu.bjg/git/dpdk.org/app/test/test_thash.c:40:
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:56:22:
> error: rte_vect.h: No such file or directory
> In file included from /u/zlu.bjg/git/dpdk.org/app/test/test_thash.c:40:
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:62:
> error: expected '=', ',', ';', 'asm' or '__attribute__' before
> 'rte_thash_ipv6_bswap_mask'
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:129:
> error: requested alignment is not a constant
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h: In
> function 'rte_thash_load_v6_addrs':
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:160:
> error: '__m128i' undeclared (first use in this function)
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:160:
> error: (Each undeclared identifier is reported only once
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:160:
> error: for each function it appears in.)
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:160:
> error: expected ';' before 'ipv6'
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:161:
> error: expected expression before ')' token
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:163:
> error: 'ipv6' undeclared (first use in this function)
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:163:
> warning: implicit declaration of function '_mm_loadu_si128'
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:163:
> warning: nested extern declaration of '_mm_loadu_si128'
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:163:
> error: expected ')' before '__m128i'
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:163:
> warning: type defaults to 'int' in declaration of 'type name'
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:163:
> warning: cast from pointer to integer of different size
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:164:
> error: expected expression before ')' token
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:158:
> warning: unused parameter 'targ'
> make[3]: *** [test_thash.o] Error 1
> make[2]: *** [test] Error 2
> make[1]: *** [app] Error 2
> make: *** [all] Error 2
>
> Thanks
> -Zhigang Lu
>
>
>  -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org ] On 
> Behalf Of Vladimir Medvedkin
> Sent: Wednesday, July 01, 2015 7:40 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v6] Add toeplitz hash algorithm used by RSS
>
> Software implementation of the Toeplitz hash function used by RSS.
> Can be used either for packet distribution on single queue NIC or for
> simulating of RSS computation on specific NIC (for example after GRE header
> decapsulating).
>
> v6 changes
> - Fix compilation error
> - Rename some defines and function
>
> v5 changes
> - Fix errors reported by checkpatch.pl
>
> v4 changes
> - Fix copyright
> - rename bswap_mask constant, add rte_ prefix
> - change rte_ipv[46]_tuple struct
> - change rte_thash_load_v6_addr prototype
>
> v3 changes
> - Rework API to be more generic
> - Add sctp_tag into tuple
>
> v2 changes
> - Add ipv6 support
> - Various style fixes
>
> Signed-off-by: Vladimir Medvedkin
> ---
> lib/librte_hash/Makefile|   1 +
> lib/librte_hash/rte_thash.h | 231
> 
> 2 files changed, 232 insertions(+)
> create mode 100644 lib/librte_hash/rte_thash.h
>

[dpdk-dev] [PATCH v2] Make the thash library arch-independent

2015-07-29 Thread Vladimir Medvedkin

v2 changes
- Fix SSE to SSE3 typo
- Remove unnecessary comments
- Leave union rte_thash_tuple unaligned if there is no SSE3 support
- Make 32-bit compilers happy by adding the ULL suffix

Signed-off-by: Vladimir Medvedkin 
---
 lib/librte_hash/rte_thash.h | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
index 6156e8a..d98e98e 100644
--- a/lib/librte_hash/rte_thash.h
+++ b/lib/librte_hash/rte_thash.h
@@ -53,14 +53,19 @@ extern "C" {

 #include 
 #include 
-#include 
 #include 

+#ifdef __SSE3__
+#include 
+#endif
+
+#ifdef __SSE3__
 /* Byte swap mask used for converting IPv6 address
  * 4-byte chunks to CPU byte order
  */
 static const __m128i rte_thash_ipv6_bswap_mask = {
-   0x0405060700010203, 0x0C0D0E0F08090A0B};
+   0x0405060700010203ULL, 0x0C0D0E0F08090A0BULL};
+#endif

 /**
  * length in dwords of input tuple to
@@ -126,7 +131,11 @@ struct rte_ipv6_tuple {
 union rte_thash_tuple {
struct rte_ipv4_tuple   v4;
struct rte_ipv6_tuple   v6;
+#ifdef __SSE3__
 } __attribute__((aligned(XMM_SIZE)));
+#else
+};
+#endif

 /**
  * Prepare special converted key to use with rte_softrss_be()
@@ -157,12 +166,22 @@ rte_convert_rss_key(const uint32_t *orig, uint32_t *targ, int len)
 static inline void
 rte_thash_load_v6_addrs(const struct ipv6_hdr *orig, union rte_thash_tuple *targ)
 {
+#ifdef __SSE3__
__m128i ipv6 = _mm_loadu_si128((const __m128i *)orig->src_addr);
*(__m128i *)targ->v6.src_addr =
_mm_shuffle_epi8(ipv6, rte_thash_ipv6_bswap_mask);
ipv6 = _mm_loadu_si128((const __m128i *)orig->dst_addr);
*(__m128i *)targ->v6.dst_addr =
_mm_shuffle_epi8(ipv6, rte_thash_ipv6_bswap_mask);
+#else
+   int i;
+   for (i = 0; i < 4; i++) {
+   *((uint32_t *)targ->v6.src_addr + i) =
+   rte_be_to_cpu_32(*((const uint32_t *)orig->src_addr + i));
+   *((uint32_t *)targ->v6.dst_addr + i) =
+   rte_be_to_cpu_32(*((const uint32_t *)orig->dst_addr + i));
+   }
+#endif
 }

 /**
-- 
1.8.3.2

[dpdk-dev] [PATCH] Make the thash library arch-independent

2015-07-28 Thread Vladimir Medvedkin

2015-07-28 19:05 GMT+03:00 Thomas Monjalon :

> 2015-07-28 18:33, Vladimir Medvedkin:
> > 2015-07-28 16:47 GMT+03:00 Thomas Monjalon :
> > > 2015-07-28 09:06, Vladimir Medvedkin:
> > > Please explain how it was broken and how you fixed it.
> > > It would be interesting to know which part is __SSE3__ and __SSE__.
> > >
> >  As mentioned in http://dpdk.org/ml/archives/dev/2015-July/022020.html
> > compilation fails on non x86 architectures( in that case it was tile). So I
> > add for optimized code, which uses SSE3 intrinsics, non optimized general
> > version.
>
> I know. I was requesting an updated commit with explanations:
> build is broken because...
> x86 version uses SSE3...
> Some code is enclosed with __SSE__, not __SSE3__ because...
>
Oh, that's my mistake. Will fix this typo in the next patch.

>
> What happens if it is built with SSE3 support but run on
> a CPU without such support?
> Please check how it is done for ACL.
>
> > > > +#ifdef __SSE3__
> > > > +#include 
> > > > +#endif /* __SSE3__ */
> > >
> > > Comments after short ifdef block are not needed.
> > >
> > Should I delete it?
>
> Yes please.
>
> > > > +#ifndef XMM_SIZE
> > > > +#define XMM_SIZE 16
> > >
> > > Why is it needed?
> > >
> >  because there is no defines for XMM_SIZE on non X86 architectures
>
> Why XMM_SIZE is needed on non x86 arch?
>
Ok, I will leave union rte_thash_tuple unaligned for non x86 arch

[dpdk-dev] [PATCH] Make the thash library arch-independent

2015-07-28 Thread Vladimir Medvedkin

Hi Thomas,


2015-07-28 16:47 GMT+03:00 Thomas Monjalon :

> Hi Vladimir,
> Thanks for fixing.
> Comments below.
>
> 2015-07-28 09:06, Vladimir Medvedkin:
> > Signed-off-by: Vladimir Medvedkin 
>
> Please explain how it was broken and how you fixed it.
> It would be interesting to know which part is __SSE3__ and __SSE__.
>
As mentioned in http://dpdk.org/ml/archives/dev/2015-July/022020.html
compilation fails on non-x86 architectures (in that case it was tile). So,
alongside the optimized code, which uses SSE3 intrinsics, I added a
non-optimized generic version.
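
The generic path can be tried in isolation, without any DPDK headers. The
sketch below is an illustration, not the patch itself: load_be32() and
ipv6_addr_to_cpu() are hypothetical names, and the byte-wise loads stand in
for rte_be_to_cpu_32() so that the code does not depend on host endianness
or on the alignment of the address buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Read one big-endian (network order) 32-bit word from a byte buffer.
 * Works on any host endianness and any buffer alignment. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Convert a 16-byte IPv6 address from network order into four 32-bit
 * words in CPU byte order, one 4-byte chunk at a time. */
static void ipv6_addr_to_cpu(const uint8_t addr[16], uint32_t out[4])
{
    int i;

    for (i = 0; i < 4; i++)
        out[i] = load_be32(addr + 4 * i);
}
```

For 3ffe:2501:200:3::1 this gives the words 0x3ffe2501, 0x02000003, 0 and 1,
which is the same per-chunk result the SSE3 shuffle is meant to produce.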

>
> > +#ifdef __SSE3__
> > +#include 
> > +#endif /* __SSE3__ */
>
> Comments after short ifdef block are not needed.
>
Should I delete it?

>
> > +#ifndef XMM_SIZE
> > +#define XMM_SIZE 16
>
> Why is it needed?
>
Because there are no defines for XMM_SIZE on non-x86 architectures.

Regards,
Vladimir

[dpdk-dev] [PATCH] Make the thash library arch-independent

2015-07-28 Thread Vladimir Medvedkin

Signed-off-by: Vladimir Medvedkin 
---
 lib/librte_hash/rte_thash.h | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
index 6156e8a..ddb650a 100644
--- a/lib/librte_hash/rte_thash.h
+++ b/lib/librte_hash/rte_thash.h
@@ -53,14 +53,23 @@ extern "C" {

 #include 
 #include 
-#include 
 #include 

+#ifdef __SSE3__
+#include 
+#endif /* __SSE3__ */
+
+#ifndef XMM_SIZE
+#define XMM_SIZE   16
+#endif /* XMM_SIZE */
+
+#ifdef __SSE3__
 /* Byte swap mask used for converting IPv6 address
  * 4-byte chunks to CPU byte order
  */
 static const __m128i rte_thash_ipv6_bswap_mask = {
0x0405060700010203, 0x0C0D0E0F08090A0B};
+#endif /* __SSE3__ */

 /**
  * length in dwords of input tuple to
@@ -157,12 +166,22 @@ rte_convert_rss_key(const uint32_t *orig, uint32_t *targ, int len)
 static inline void
 rte_thash_load_v6_addrs(const struct ipv6_hdr *orig, union rte_thash_tuple *targ)
 {
+#ifdef __SSE__
__m128i ipv6 = _mm_loadu_si128((const __m128i *)orig->src_addr);
*(__m128i *)targ->v6.src_addr =
_mm_shuffle_epi8(ipv6, rte_thash_ipv6_bswap_mask);
ipv6 = _mm_loadu_si128((const __m128i *)orig->dst_addr);
*(__m128i *)targ->v6.dst_addr =
_mm_shuffle_epi8(ipv6, rte_thash_ipv6_bswap_mask);
+#else
+   int i;
+   for (i = 0; i < 4; i++) {
+   *((uint32_t *)targ->v6.src_addr + i) =
+   rte_be_to_cpu_32(*((const uint32_t *)orig->src_addr + i));
+   *((uint32_t *)targ->v6.dst_addr + i) =
+   rte_be_to_cpu_32(*((const uint32_t *)orig->dst_addr + i));
+   }
+#endif /* __SSE3__ */
 }

 /**
-- 
1.8.3.2

[dpdk-dev] [PATCH v6] Add toeplitz hash algorithm used by RSS

2015-07-27 Thread Vladimir Medvedkin

Hi Tony,

Sorry for the late reply, I was on vacation.
I'll prepare patch soon.

Regards,
Vladimir

2015-07-22 10:55 GMT+03:00 Tony Lu :

> Hi, Vladimir
>
> When compiling thash for no-X86 arches, it fails with the following errors.
> I wonder if
> it is possible to make the thash library arch-independent?
>
> == Build app/test
>   CC test_thash.o
> In file included from /u/zlu.bjg/git/dpdk.org/app/test/test_thash.c:40:
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:56:22:
> error: rte_vect.h: No such file or directory
> In file included from /u/zlu.bjg/git/dpdk.org/app/test/test_thash.c:40:
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:62:
> error: expected '=', ',', ';', 'asm' or '__attribute__' before
> 'rte_thash_ipv6_bswap_mask'
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:129:
> error: requested alignment is not a constant
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h: In
> function 'rte_thash_load_v6_addrs':
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:160:
> error: '__m128i' undeclared (first use in this function)
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:160:
> error: (Each undeclared identifier is reported only once
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:160:
> error: for each function it appears in.)
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:160:
> error: expected ';' before 'ipv6'
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:161:
> error: expected expression before ')' token
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:163:
> error: 'ipv6' undeclared (first use in this function)
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:163:
> warning: implicit declaration of function '_mm_loadu_si128'
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:163:
> warning: nested extern declaration of '_mm_loadu_si128'
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:163:
> error: expected ')' before '__m128i'
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:163:
> warning: type defaults to 'int' in declaration of 'type name'
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:163:
> warning: cast from pointer to integer of different size
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:164:
> error: expected expression before ')' token
> /u/zlu.bjg/git/dpdk.org/tile-tilegx-linuxapp-gcc/include/rte_thash.h:158:
> warning: unused parameter 'targ'
> make[3]: *** [test_thash.o] Error 1
> make[2]: *** [test] Error 2
> make[1]: *** [app] Error 2
> make: *** [all] Error 2
>
> Thanks
> -Zhigang Lu
>
> >-Original Message-
> >From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladimir Medvedkin
> >Sent: Wednesday, July 01, 2015 7:40 AM
> >To: dev at dpdk.org
> >Subject: [dpdk-dev] [PATCH v6] Add toeplitz hash algorithm used by RSS
> >
> >Software implementation of the Toeplitz hash function used by RSS.
> >Can be used either for packet distribution on single queue NIC or for
> >simulating of RSS computation on specific NIC (for example after GRE header
> >decapsulating).
> >
> >v6 changes
> >- Fix compilation error
> >- Rename some defines and function
> >
> >v5 changes
> >- Fix errors reported by checkpatch.pl
> >
> >v4 changes
> >- Fix copyright
> >- rename bswap_mask constant, add rte_ prefix
> >- change rte_ipv[46]_tuple struct
> >- change rte_thash_load_v6_addr prototype
> >
> >v3 changes
> >- Rework API to be more generic
> >- Add sctp_tag into tuple
> >
> >v2 changes
> >- Add ipv6 support
> >- Various style fixes
> >
> >Signed-off-by: Vladimir Medvedkin 
> >---
> > lib/librte_hash/Makefile|   1 +
> > lib/librte_hash/rte_thash.h | 231
> >
> > 2 files changed, 232 insertions(+)
> > create mode 100644 lib/librte_hash/rte_thash.h
> >
> >diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile index
> >3696cb1..981230b 100644
> >--- a/lib/librte_hash/Makefile
> >+++ b/lib/librte_hash/Makefile
> >@@ -49,6 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += rte_fbk_hash.c
> >SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h
> >SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_hash_crc.h
> >SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_jhash.h
> >+SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_thash.h
>

[dpdk-dev] Could not achieve wire speed for 40GE with any DPDK version on XL710 NIC's

2015-07-01 Thread Vladimir Medvedkin

Hi Anuj,

Thanks for the fixes!
I have 2 comments:
- from i40e_ethdev.h: #define I40E_DEFAULT_RX_WTHRESH  0
- (26 + 32) / 4 (batched descriptor writeback) should be (26 + 4 * 32) / 4
  (batched descriptor writeback), thus we have 135 bytes/packet

This corresponds to 58.8 Mpps
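
The corrected per-packet byte count can be reproduced with a few lines of C.
This is a sketch under the assumptions used in this thread (26-byte TLP
overhead without ECRC, 64-byte packets, 32-byte descriptors, one writeback
and one read request per batch); all names are illustrative. The Mpps
figure you get from it depends on the usable link throughput you assume.

```c
#include <assert.h>

/* Sizes in bytes; values are the assumptions used in the thread. */
#define TLP_OVERHEAD 26.0 /* TLP overhead with ECRC disabled */
#define PKT_SIZE     64.0 /* minimum-size packet */
#define DESC_SIZE    32.0 /* rx descriptor */

/* PCIe bytes moved per received packet when `batch` descriptors are
 * written back in one TLP and one read request covers `batch` packets. */
static double bytes_per_packet(double batch)
{
    double payload   = TLP_OVERHEAD + PKT_SIZE;
    double writeback = (TLP_OVERHEAD + batch * DESC_SIZE) / batch;
    double read_req  = TLP_OVERHEAD / batch;

    return payload + writeback + read_req;
}

/* Packet rate in Mpps for an assumed usable throughput in bytes per ns. */
static double mpps(double bytes_per_ns, double bytes_pkt)
{
    return bytes_per_ns / bytes_pkt * 1000.0;
}
```

bytes_per_packet(4.0) evaluates to 135, matching the number above. The batch
size is a parameter because it follows WTHRESH, so a different setting
changes the result.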

Regards,
Vladimir

2015-07-01 17:22 GMT+03:00 Anuj Kalia :

> Vladimir,
>
> Few possible fixes to your PCIe analysis (let me know if I'm wrong):
> - ECRC is probably disabled (check using sudo lspci -vvv | grep
> CGenEn-), so TLP header is 26 bytes
> - Descriptor writeback can be batched using high value of WTHRESH,
> which is what DPDK uses by default
> - Read request contains full TLP header (26 bytes)
>
> Assuming WTHRESH = 4, bytes transferred from NIC to host per packet =
> 26 + 64 (packet itself) +
> (26 + 32) / 4 (batched descriptor writeback) +
> (26 / 4) (read request for new descriptors) =
> 111 bytes / packet
>
> This corresponds to 70.9 Mpps over PCIe 3.0 x8. Assuming 5% DLLP
> overhead, rate = 67.4 Mpps
>
> --Anuj
>
>
>
> On Wed, Jul 1, 2015 at 9:40 AM, Vladimir Medvedkin 
> wrote:
> > In case with syn flood you should take into account return syn-ack traffic,
> > which generates PCIe DLLP's from NIC to host, thus pcie bandwith exceeds
> > faster. And don't forget about DLLP's generated by rx traffic, which
> > saturates host-to-NIC bus.
> >
> > 2015-07-01 16:05 GMT+03:00 Pavel Odintsov :
> >
> >> Yes, Bruce, we understand this. But we are working with huge SYN
> >> attacks processing and they are 64byte only :(
> >>
> >> On Wed, Jul 1, 2015 at 3:59 PM, Bruce Richardson
> >>  wrote:
> >> > On Wed, Jul 01, 2015 at 03:44:57PM +0300, Pavel Odintsov wrote:
> >> >> Thanks for answer, Vladimir! So we need look for x16 NIC if we want
> >> >> achieve 40GE line rate...
> >> >>
> >> > Note that this would only apply for your minimal i.e. 64-byte, packet sizes.
> >> > Once you go up to larger e.g. 128B packets, your PCI bandwidth requirements
> >> > are lower and you can easier achieve line rate.
> >> >
> >> > /Bruce
> >> >
> >> >> On Wed, Jul 1, 2015 at 3:06 PM, Vladimir Medvedkin <medvedkinv at gmail.com> wrote:
> >> >> > Hi Pavel,
> >> >> >
> >> >> > Looks like you ran into pcie bottleneck. So let's calculate xl710 rx only
> >> >> > case.
> >> >> > Assume we have 32byte descriptors (if we want more offload).
> >> >> > DMA makes one pcie transaction with packet payload, one descriptor writeback
> >> >> > and one memory request for free descriptors for every 4 packets. For
> >> >> > Transaction Layer Packet (TLP) there is 30 bytes overhead (4 PHY + 6 DLL +
> >> >> > 16 header + 4 ECRC). So for 1 rx packet dma sends 30 + 64(packet itself) +
> >> >> > 30 + 32 (writeback descriptor) + (16 / 4) (read request for new
> >> >> > descriptors). Note that we do not take into account PCIe ACK/NACK/FC Update
> >> >> > DLLP. So we have 160 bytes per packet. One lane PCIe 3.0 transmits 1 byte in
> >> >> > 1 ns, so x8 transmits 8 bytes  in 1 ns. 1 packet transmits in 20 ns. Thus
> >> >> > in theory pcie 3.0 x8 may transfer not more than 50mpps.
> >> >> > Correct me if I'm wrong.
> >> >> > Regards,
> >> >> > Vladimir
> >> >> >
> >> >> >
> >>
> >>
> >>
> >> --
> >> Sincerely yours, Pavel Odintsov
> >>
>

[dpdk-dev] Could not achieve wire speed for 40GE with any DPDK version on XL710 NIC's

2015-07-01 Thread Vladimir Medvedkin

In the case of a SYN flood you should also take into account the return
SYN-ACK traffic, which generates PCIe DLLPs from NIC to host, so the PCIe
bandwidth is exhausted sooner. And don't forget about the DLLPs generated by
rx traffic, which saturate the host-to-NIC direction of the bus.

2015-07-01 16:05 GMT+03:00 Pavel Odintsov :

> Yes, Bruce, we understand this. But we are working with huge SYN
> attacks processing and they are 64byte only :(
>
> On Wed, Jul 1, 2015 at 3:59 PM, Bruce Richardson
>  wrote:
> > On Wed, Jul 01, 2015 at 03:44:57PM +0300, Pavel Odintsov wrote:
> >> Thanks for answer, Vladimir! So we need look for x16 NIC if we want
> >> achieve 40GE line rate...
> >>
> > Note that this would only apply for your minimal i.e. 64-byte, packet sizes.
> > Once you go up to larger e.g. 128B packets, your PCI bandwidth requirements
> > are lower and you can easier achieve line rate.
> >
> > /Bruce
> >
> >> On Wed, Jul 1, 2015 at 3:06 PM, Vladimir Medvedkin <medvedkinv at gmail.com> wrote:
> >> > Hi Pavel,
> >> >
> >> > Looks like you ran into pcie bottleneck. So let's calculate xl710 rx only
> >> > case.
> >> > Assume we have 32byte descriptors (if we want more offload).
> >> > DMA makes one pcie transaction with packet payload, one descriptor writeback
> >> > and one memory request for free descriptors for every 4 packets. For
> >> > Transaction Layer Packet (TLP) there is 30 bytes overhead (4 PHY + 6 DLL +
> >> > 16 header + 4 ECRC). So for 1 rx packet dma sends 30 + 64(packet itself) +
> >> > 30 + 32 (writeback descriptor) + (16 / 4) (read request for new
> >> > descriptors). Note that we do not take into account PCIe ACK/NACK/FC Update
> >> > DLLP. So we have 160 bytes per packet. One lane PCIe 3.0 transmits 1 byte in
> >> > 1 ns, so x8 transmits 8 bytes  in 1 ns. 1 packet transmits in 20 ns. Thus
> >> > in theory pcie 3.0 x8 may transfer not more than 50mpps.
> >> > Correct me if I'm wrong.
> >> >
> >> > Regards,
> >> > Vladimir
> >> >
> >> >
>
>
>
> --
> Sincerely yours, Pavel Odintsov
>

[dpdk-dev] Could not achieve wire speed for 40GE with any DPDK version on XL710 NIC's

2015-07-01 Thread Vladimir Medvedkin

Hi Pavel,

Looks like you ran into a PCIe bottleneck. Let's calculate the XL710 rx-only
case.
Assume we have 32-byte descriptors (if we want more offloads).
DMA makes one PCIe transaction with the packet payload, one descriptor
writeback, and one memory request for free descriptors for every 4 packets.
For a Transaction Layer Packet (TLP) there are 30 bytes of overhead (4 PHY + 6
DLL + 16 header + 4 ECRC). So for 1 rx packet DMA sends 30 + 64 (packet
itself) + 30 + 32 (writeback descriptor) + (16 / 4) (read request for new
descriptors). Note that we do not take into account PCIe ACK/NACK/FC Update
DLLPs. So we have 160 bytes per packet. One PCIe 3.0 lane transmits 1 byte
in 1 ns, so x8 transmits 8 bytes in 1 ns, and 1 packet is transferred in 20 ns.
Thus in theory PCIe 3.0 x8 can transfer no more than 50 Mpps.
Correct me if I'm wrong.
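
The arithmetic above, written out as integer C so each term can be checked.
This is only a restatement of the estimate, with the same simplifications
(1 byte per ns per lane, no DLLP traffic); the names are illustrative.

```c
#include <assert.h>

enum {
    TLP_OVERHEAD_ECRC = 4 + 6 + 16 + 4, /* PHY + DLL + header + ECRC */
    PKT_BYTES         = 64,             /* minimum-size packet */
    DESC_BYTES        = 32,             /* rx descriptor */
    READ_REQ_HDR      = 16,             /* read request, as counted above */
    PKTS_PER_READ     = 4,              /* one read request per 4 packets */
    LANES             = 8               /* x8 link, about 1 byte/ns per lane */
};

/* Bytes on the PCIe link per received packet. */
static int rx_bytes_per_packet(void)
{
    return TLP_OVERHEAD_ECRC + PKT_BYTES   /* packet payload TLP */
         + TLP_OVERHEAD_ECRC + DESC_BYTES  /* descriptor writeback TLP */
         + READ_REQ_HDR / PKTS_PER_READ;   /* amortized read request */
}

/* Time on the link per packet, in ns. */
static int ns_per_packet(void)
{
    return rx_bytes_per_packet() / LANES;
}

/* Upper bound on the packet rate, in Mpps. */
static int max_mpps(void)
{
    return 1000 / ns_per_packet();
}
```

This gives 160 bytes, 20 ns and 50 Mpps, the same figures as in the text.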

Regards,
Vladimir


2015-06-29 18:41 GMT+03:00 Pavel Odintsov :

> Hello, Andrew!
>
> What NIC have you used? Is it XL710?
>
> On Mon, Jun 29, 2015 at 6:38 PM, Andrew Theurer 
> wrote:
> >
> >
> > On Mon, Jun 29, 2015 at 10:06 AM, Keunhong Lee 
> wrote:
> >>
> >> I have not used XL710 or i40e.
> >> I have no opinion for those NICs.
> >>
> >> Keunhong.
> >>
> >> 2015-06-29 15:59 GMT+09:00 Pavel Odintsov :
> >>
> >> > Hello!
> >> >
> >> > Lee, thank you so much for sharing your experience! What do you think
> >> > about 40GE version of 82599?
> >> >
> >> > On Mon, Jun 29, 2015 at 2:35 AM, Keunhong Lee 
> >> > wrote:
> >> > > DISCLAIMER: This information is not verified. This is truly my
> >> > > personal
> >> > > opinion.
> >> > >
> >> > > As I know, intel 82599 is the only 10G NIC which supports line rate
> >> > > with
> >> > > minimum sized packets (64 byte).
> >> > > According to our internal tests, Mellanox's 40G NICs even support less than
> >> > > 30Mpps.
> >> > > I think 40 Mpps is the hardware capacity.
> >
> >
> > This is approximately what I see as well.
> >
> >>
> >> > >
> >> > > Keunhong.
> >> > >
> >> > >
> >> > >
> >> > > 2015-06-28 19:34 GMT+09:00 Pavel Odintsov  >:
> >> > >>
> >> > >> Hello, folks!
> >> > >>
> >> > >> We have execute bunch of tests for receive data with Intel XL710 40GE
> >> > >> NIC. We want to achieve wire speed on this platform for traffic
> >> > >> capture.
> >> > >>
> >> > >> But we definitely can't do it. We tried with different versions of
> >> > >> DPDK: 1.4, 1.6, 1.8, 2.0. And have not success.
> >> > >>
> >> > >> We achieved only 40Mpps and could do more.
> >> > >>
> >> > >> Could anybody help us with this issue? Looks like this NIC's could
> >> > >> not
> >> > >> work on wire speed :(
> >> > >>
> >> > >> Platform: Intel Xeon E5 e5 2670 + XL 710.
> >> > >>
> >> > >> --
> >> > >> Sincerely yours, Pavel Odintsov
> >> > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Sincerely yours, Pavel Odintsov
> >> >
> >
> > -Andrew
> >
> >
>
>
>
> --
> Sincerely yours, Pavel Odintsov
>

[dpdk-dev] [PATCH v4] Add unit test for thash library

2015-06-30 Thread Vladimir Medvedkin

Add unit test for thash library

v4 changes
- Reflect rte_thash.h changes

v3 changes
- Fix checkpatch errors

v2 changes
- fix typo
- remove unnecessary comments

Signed-off-by: Vladimir Medvedkin 
---
 app/test/Makefile |   1 +
 app/test/test_thash.c | 176 ++
 2 files changed, 177 insertions(+)
 create mode 100644 app/test/test_thash.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 2e2758c..caa359c 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -82,6 +82,7 @@ SRCS-y += test_memcpy.c
 SRCS-y += test_memcpy_perf.c

 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash.c
+SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_thash.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_perf.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_functions.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_scaling.c
diff --git a/app/test/test_thash.c b/app/test/test_thash.c
new file mode 100644
index 000..8e9dca0
--- /dev/null
+++ b/app/test/test_thash.c
@@ -0,0 +1,176 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Vladimir Medvedkin 
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#include 
+
+struct test_thash_v4 {
+   uint32_t    dst_ip;
+   uint32_t    src_ip;
+   uint16_t    dst_port;
+   uint16_t    src_port;
+   uint32_t    hash_l3;
+   uint32_t    hash_l3l4;
+};
+
+struct test_thash_v6 {
+   uint8_t     dst_ip[16];
+   uint8_t     src_ip[16];
+   uint16_t    dst_port;
+   uint16_t    src_port;
+   uint32_t    hash_l3;
+   uint32_t    hash_l3l4;
+};
+
+/*From 82599 Datasheet 7.1.2.8.3 RSS Verification Suite*/
+struct test_thash_v4 v4_tbl[] = {
+{IPv4(161, 142, 100, 80), IPv4(66, 9, 149, 187),
+   1766, 2794, 0x323e8fc2, 0x51ccc178},
+{IPv4(65, 69, 140, 83), IPv4(199, 92, 111, 2),
+   4739, 14230, 0xd718262a, 0xc626b0ea},
+{IPv4(12, 22, 207, 184), IPv4(24, 19, 198, 95),
+   38024, 12898, 0xd2d0a5de, 0x5c2b394a},
+{IPv4(209, 142, 163, 6), IPv4(38, 27, 205, 30),
+   2217, 48228, 0x82989176, 0xafc7327f},
+{IPv4(202, 188, 127, 2), IPv4(153, 39, 163, 191),
+   1303, 44251, 0x5d1809c5, 0x10e828a2},
+};
+
+struct test_thash_v6 v6_tbl[] = {
+/*3ffe:2501:200:3::1*/
+{{0x3f, 0xfe, 0x25, 0x01, 0x02, 0x00, 0x00, 0x03,
+0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,},
+/*3ffe:2501:200:1fff::7*/
+{0x3f, 0xfe, 0x25, 0x01, 0x02, 0x00, 0x1f, 0xff,
+0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07,},
+1766, 2794, 0x2cc18cd5, 0x40207d3d},
+/*ff02::1*/
+{{0xff, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,},
+/*3ffe:501:8::260:97ff:fe40:efab*/
+{0x3f, 0xfe, 0x05, 0x01, 0x00, 0x08, 0x00, 0x00,
+0x02, 0x60, 0x97, 0xff, 0xfe, 0x40, 0xef, 0xab,},
+4739, 14230, 0x0f0c461c, 0xdde51bbf},
+/*fe80::200:f8ff:fe21:67cf*/
+{{0xfe, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+0x02, 0x00, 0xf8, 0xff, 0xfe, 0x21, 0x67, 0xcf,},
+/*3ffe:1900:4545:3:200:f8ff:fe21:67cf*/
+{0x3f, 0xfe, 0x19, 0x00, 0x45, 0x45, 0x00, 0x03,
+0x02, 0x00, 0xf8, 0xff, 0xfe, 0x21, 0x67, 0xcf,},
+38024, 44251, 0x4b61e985, 0x02d1feef},
+};
+
+uint8_t default_rss_key[] = {
+0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
+0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
+0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
+0x77, 0xcb, 0x2d, 0xa3, 0x

[dpdk-dev] [PATCH v6] Add toeplitz hash algorithm used by RSS

2015-06-30 Thread Vladimir Medvedkin

Software implementation of the Toeplitz hash function used by RSS.
It can be used either for packet distribution on a single-queue NIC
or for simulating the RSS computation of a specific NIC (for example,
after GRE header decapsulation).
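
For readers unfamiliar with the algorithm, here is a minimal bit-by-bit
Toeplitz sketch in plain C. It is not the rte_thash.h implementation from
this patch: toeplitz_hash() is a hypothetical name, and the code favours
clarity over speed. The key is the well-known 40-byte RSS verification key,
and the input is laid out as src address, dst address, src port, dst port in
network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 40-byte key used by the RSS verification suite. */
static const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* For every set bit of the input (most significant bit first), XOR in
 * the 32 key bits that start at the same bit offset.
 * Requires key_len >= 4; key bits past the end are treated as zero. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t data_len)
{
    uint32_t hash = 0;
    /* 32-bit key window aligned with the current input bit */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    size_t i;
    int bit;

    for (i = 0; i < data_len; i++) {
        /* key byte whose bits slide into the window during this byte */
        uint8_t next = (i + 4 < key_len) ? key[i + 4] : 0;

        for (bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            window = (window << 1) | ((uint32_t)(next >> bit) & 1u);
        }
    }
    return hash;
}
```

Hashing the first flow of the verification suite (66.9.149.187:2794 to
161.142.100.80:1766) gives 0x323e8fc2 over the two addresses and 0x51ccc178
when the two ports are appended.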

v6 changes
- Fix compilation error
- Rename some defines and function

v5 changes
- Fix errors reported by checkpatch.pl

v4 changes
- Fix copyright
- rename bswap_mask constant, add rte_ prefix
- change rte_ipv[46]_tuple struct
- change rte_thash_load_v6_addr prototype

v3 changes
- Rework API to be more generic
- Add sctp_tag into tuple

v2 changes
- Add ipv6 support
- Various style fixes

Signed-off-by: Vladimir Medvedkin 
---
 lib/librte_hash/Makefile|   1 +
 lib/librte_hash/rte_thash.h | 231 
 2 files changed, 232 insertions(+)
 create mode 100644 lib/librte_hash/rte_thash.h

diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile
index 3696cb1..981230b 100644
--- a/lib/librte_hash/Makefile
+++ b/lib/librte_hash/Makefile
@@ -49,6 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += rte_fbk_hash.c
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_hash_crc.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_jhash.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_thash.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_fbk_hash.h

 # this lib needs eal
diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
new file mode 100644
index 000..1808f47
--- /dev/null
+++ b/lib/librte_hash/rte_thash.h
@@ -0,0 +1,231 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Vladimir Medvedkin 
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_THASH_H
+#define _RTE_THASH_H
+
+/**
+ * @file
+ *
+ * toeplitz hash functions.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Software implementation of the Toeplitz hash function used by RSS.
+ * Can be used either for packet distribution on single queue NIC
+ * or for simulating of RSS computation on specific NIC (for example
+ * after GRE header decapsulating)
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/* Byte swap mask used for converting IPv6 address
+ * 4-byte chunks to CPU byte order
+ */
+static const __m128i rte_thash_ipv6_bswap_mask = {
+   0x0405060700010203, 0x0C0D0E0F08090A0B};
+
+/**
+ * length in dwords of input tuple to
+ * calculate hash of ipv4 header only
+ */
+#define RTE_THASH_V4_L3_LEN ((sizeof(struct rte_ipv4_tuple) -   \
+   sizeof(((struct rte_ipv4_tuple *)0)->sctp_tag)) / 4)
+
+/**
+ * length in dwords of input tuple to
+ * calculate hash of ipv4 header +
+ * transport header
+ */
+#define RTE_THASH_V4_L4_LEN ((sizeof(struct rte_ipv4_tuple)) / 4)
+
+/**
+ * length in dwords of input tuple to
+ * calculate hash of ipv6 header only
+ */
+#define RTE_THASH_V6_L3_LEN ((sizeof(struct rte_ipv6_tuple) -   \
+   sizeof(((struct rte_ipv6_tuple *)0)->sctp_tag)) / 4)
+
+/**
+ * length in dwords of input tuple to
+ * calculate hash of ipv6 header +
+ * transport header
+ */
+#define RTE_THASH_V6_L4_LEN ((sizeof(struct rte_ipv6_tuple)) / 4)
+
+/**
+ * IPv4 tuple
+ * addresses and ports/sctp_tag have to be in CPU byte order
+ */
+struct rte_ipv4_tuple {
+   uint32_t src_addr;
+   uint32_t dst_addr;
+   unio

[dpdk-dev] [PATCH v5] Add toeplitz hash algorithm used by RSS

2015-06-30 Thread Vladimir Medvedkin

Hi Bruce,

2015-06-29 15:40 GMT+03:00 Bruce Richardson :

> On Fri, Jun 19, 2015 at 01:31:13PM -0400, Vladimir Medvedkin wrote:
> > Software implementation of the Toeplitz hash function used by RSS.
> > Can be used either for packet distribution on single queue NIC
> > or for simulating of RSS computation on specific NIC (for example
> > after GRE header decapsulating).
> >
> > v5 changes
> > - Fix errors reported by checkpatch.pl
> >
> > v4 changes
> > - Fix copyright
> > - rename bswap_mask constant, add rte_ prefix
> > - change rte_ipv[46]_tuple struct
> > - change rte_thash_load_v6_addr prototype
> >
> > v3 changes
> > - Rework API to be more generic
> > - Add sctp_tag into tuple
> >
> > v2 changes
> > - Add ipv6 support
> > - Various style fixes
> >
> > Signed-off-by: Vladimir Medvedkin 
> > ---
> >  lib/librte_hash/Makefile|   1 +
> >  lib/librte_hash/rte_thash.h | 207
> 
> >  2 files changed, 208 insertions(+)
> >  create mode 100644 lib/librte_hash/rte_thash.h
> >
> 
> > +static const __m128i rte_thash_ipv6_bswap_mask = {
> > + 0x0405060700010203, 0x0C0D0E0F08090A0B};
> > +
> > +#define RTE_THASH_V4_L3   2  /*calculate hash of ipv4 header
> only*/
> > +#define RTE_THASH_V4_L4   3  /*calculate hash of ipv4 +
> transport headers*/
> > +#define RTE_THASH_V6_L3   8  /*calculate hash of ipv6 header
> only */
> > +#define RTE_THASH_V6_L4   9  /*calculate hash of ipv6 +
> transport headers */
> > +
>
> Comment as on V4 patch - add LEN to name to make it clear they are lengths
> in quadwords.
>
Agreed, but the lengths are in dwords (4-byte units), not quadwords.
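
To illustrate the dword lengths discussed here: a minimal sketch, assuming simplified copies of the tuple structs from the patch (the sketch_ and SKETCH_ names are hypothetical and not part of DPDK). Deriving the lengths with sizeof, as the v6 patch does, gives the same 2, 3, 8 and 9 that v5 hard-coded, which shows that they count 4-byte units:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copies of the tuple layouts quoted in this thread. */
struct sketch_ipv4_tuple {
	uint32_t src_addr;
	uint32_t dst_addr;
	union {
		struct {
			uint16_t dport;
			uint16_t sport;
		};
		uint32_t sctp_tag;
	};
};

struct sketch_ipv6_tuple {
	uint8_t src_addr[16];
	uint8_t dst_addr[16];
	union {
		struct {
			uint16_t dport;
			uint16_t sport;
		};
		uint32_t sctp_tag;
	};
};

/* Input lengths in dwords (4-byte units): L3 = addresses only, L4 = addresses + ports. */
#define SKETCH_V4_L3_LEN ((sizeof(struct sketch_ipv4_tuple) - \
	sizeof(((struct sketch_ipv4_tuple *)0)->sctp_tag)) / 4)
#define SKETCH_V4_L4_LEN (sizeof(struct sketch_ipv4_tuple) / 4)
#define SKETCH_V6_L3_LEN ((sizeof(struct sketch_ipv6_tuple) - \
	sizeof(((struct sketch_ipv6_tuple *)0)->sctp_tag)) / 4)
#define SKETCH_V6_L4_LEN (sizeof(struct sketch_ipv6_tuple) / 4)

/* Compile-time check against the constants hard-coded in v5. */
static_assert(SKETCH_V4_L3_LEN == 2, "IPv4 L3 input is 2 dwords");
static_assert(SKETCH_V4_L4_LEN == 3, "IPv4 L3+L4 input is 3 dwords");
static_assert(SKETCH_V6_L3_LEN == 8, "IPv6 L3 input is 8 dwords");
static_assert(SKETCH_V6_L4_LEN == 9, "IPv6 L3+L4 input is 9 dwords");
```

The sketch relies on C11 anonymous structs/unions and static_assert; with an older compiler mode the checks would have to be written differently.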

>
> > +/**
> > + * IPv4 tuple
> > + * addreses and ports/sctp_tag have to be CPU byte order
> > + */
> > +struct rte_ipv4_tuple {
> > + uint32_t src_addr;
> > + uint32_t dst_addr;
> > + union {
> > + struct {
> > + uint16_t dport;
> > + uint16_t sport;
> > + };
> > + uint32_t sctp_tag;
> > + };
> > +};
> > +
> > +/**
> > + * IPv6 tuple
> > + * Addresses have to be filled by rte_thash_load_v6_addr()
> > + * ports/sctp_tag have to be CPU byte order
> > + */
> > +struct rte_ipv6_tuple {
> > + uint8_t src_addr[16];
> > + uint8_t dst_addr[16];
> > + union {
> > + struct {
> > + uint16_t dport;
> > + uint16_t sport;
> > + };
> > + uint32_t sctp_tag;
> > + };
> > +};
> > +
> > +union rte_thash_tuple {
> > + struct rte_ipv4_tuple   v4;
> > + struct rte_ipv6_tuple   v6;
> > +} __aligned(size);
> > +
> This throws an error on compilation [with unit test patch also applied].
>
I will never copy paste
I will never copy paste
I will never... =)
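
For context on the compiler error quoted just below: __aligned() is not defined in this header, so the compiler appears to parse "} __aligned(size);" as the declaration of a function named __aligned with an untyped parameter "size". A minimal sketch of one way to request the alignment with the GCC/Clang attribute syntax follows. The value 16 (the size of one SSE register) and all sketch_ names are assumptions for illustration, not necessarily what the fixed patch used.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Simplified stand-ins for the IPv4/IPv6 tuples (hypothetical names). */
struct sketch_ipv4_tuple {
	uint32_t src_addr;
	uint32_t dst_addr;
	uint32_t ports_or_sctp_tag;
};

struct sketch_ipv6_tuple {
	uint8_t src_addr[16];
	uint8_t dst_addr[16];
	uint32_t ports_or_sctp_tag;
};

/* Assumed alignment: 16 bytes, so that aligned 128-bit stores into
 * v6.src_addr and v6.dst_addr are valid. */
#define SKETCH_TUPLE_ALIGN 16

union sketch_thash_tuple {
	struct sketch_ipv4_tuple v4;
	struct sketch_ipv6_tuple v6;
} __attribute__((aligned(SKETCH_TUPLE_ALIGN)));

static_assert(alignof(union sketch_thash_tuple) == SKETCH_TUPLE_ALIGN,
	"union must be 16-byte aligned");
static_assert(sizeof(union sketch_thash_tuple) % SKETCH_TUPLE_ALIGN == 0,
	"size is padded up to a multiple of the alignment");
```

The alignment matters because rte_thash_load_v6_addr() in the patch stores through a __m128i pointer into the tuple, which assumes a 16-byte aligned destination.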

>
> In file included from /home/bruce/dpdk.org/app/test/test_thash.c:40:0:
> /home/bruce/dpdk.org/x86_64-native-linuxapp-gcc/include/rte_thash.h:106:1:
> error: parameter names (without types) in function declaration [-Werror]
>  } __aligned(size);
>   ^
>
> +/**
> > + * Prepare special converted key to use with rte_softrss_be()
> > + * @param orig
> > + *   pointer to original RSS key
> > + * @param targ
> > + *   pointer to target RSS key
> > + * @param len
> > + *   RSS key length
> > + */
> > +static inline void
> > +rte_convert_rss_key(const uint32_t *orig, uint32_t *targ, int len)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < (len >> 2); i++)
> > + targ[i] = rte_be_to_cpu_32(orig[i]);
> > +}
> > +
> > +/**
> > + * Prepare and load IPv6 address
> > + * @param orig
> > + *   Pointer to ipv6 header of the original packet
> > + * @param targ
> > + *   Pointer to rte_ipv6_tuple structure
> > + */
> > +static inline void
> > +rte_thash_load_v6_addr(const struct ipv6_hdr *orig, union
> rte_thash_tuple *targ)
> > +{
> > + __m128i ipv6 = _mm_loadu_si128((const __m128i *)orig->src_addr);
> > + *(__m128i *)targ->v6.src_addr =
> > + _mm_shuffle_epi8(ipv6, rte_thash_ipv6_bswap_mask);
> > + ipv6 = _mm_loadu_si128((const __m128i *)orig->dst_addr);
> > + *(__m128i *)targ->v6.dst_addr =
> > + _mm_shuffle_epi8(ipv6, rte_thash_ipv6_bswap_mask);
> > +}
>
> I think the function name needs to be pluralized, and the comment updated
> to make
> it clear that it is not just one IPv6 address that is loaded, but rather
> both
> source and destination.
>
I don't think we need a function name as long as
rte_thash_load_v6_addresses(); instead I'll make the comment clearer.
Is that enough?
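
To make it easier to see what the function does to both addresses: with the byte swap mask from the patch, _mm_shuffle_epi8() reverses the bytes inside each 4-byte chunk of a 16-byte address. A scalar sketch of the same transformation follows. The function names are hypothetical, and this is an illustration of the byte movement, not the SSE code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bytes inside each 4-byte chunk of a 16-byte IPv6 address,
 * i.e. convert each 32-bit word from network to little-endian CPU order. */
static void
sketch_swap_v6_chunks(const uint8_t in[16], uint8_t out[16])
{
	size_t i;

	assert(in != out); /* not safe for in-place use */
	for (i = 0; i < 16; i += 4) {
		out[i + 0] = in[i + 3];
		out[i + 1] = in[i + 2];
		out[i + 2] = in[i + 1];
		out[i + 3] = in[i + 0];
	}
}

/* Load both the source and the destination address, as the patch does. */
static void
sketch_load_v6_addrs(const uint8_t src[16], const uint8_t dst[16],
	uint8_t tuple_src[16], uint8_t tuple_dst[16])
{
	sketch_swap_v6_chunks(src, tuple_src);
	sketch_swap_v6_chunks(dst, tuple_dst);
}
```

For example, an address starting with the bytes 3f fe 25 01 ends up starting with 01 25 fe 3f in the tuple.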

>
> Regards,
> /Bruce
>

[dpdk-dev] rte_lpm with larger nexthops or another method?

2015-06-24 Thread Vladimir Medvedkin

Hi Matthew,

I published changes to rte_lpm_tbl24_entry only, because it was just an idea
:) So rte_lpm_tbl8_entry should look like:

struct rte_lpm_tbl8_entry {
    uint32_t next_hop    :24; /**< next hop. */
    uint32_t valid       :1;  /**< Validation flag. */
    uint32_t valid_group :1;  /**< Group validation flag. */
    uint32_t depth       :6;  /**< Rule depth. */
};

and

struct rte_lpm_rule {
    uint32_t ip;       /**< Rule IP address. */
    uint32_t next_hop; /**< Rule next hop. */
};

The various defines and checks would have to be modified too.
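
As an illustration of the kind of checks that would need adjusting: a minimal sketch, assuming GCC or Clang (uint32_t bit-fields are an implementation-defined extension of standard C). The sketch_ names and the range helper are hypothetical. It verifies at compile time that the widened entry still fits in 32 bits, and shows a range check for the 24-bit next hop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of the proposed tbl8 entry with a 24-bit next hop. */
struct sketch_lpm_tbl8_entry {
	uint32_t next_hop    :24; /* next hop */
	uint32_t valid       :1;  /* validation flag */
	uint32_t valid_group :1;  /* group validation flag */
	uint32_t depth       :6;  /* rule depth, 0..32 fits in 6 bits */
};

/* Hypothetical copy of the proposed rule with a full-width next hop. */
struct sketch_lpm_rule {
	uint32_t ip;
	uint32_t next_hop;
};

/* The entry must stay one 32-bit word so a lookup reads a single word. */
static_assert(sizeof(struct sketch_lpm_tbl8_entry) == sizeof(uint32_t),
	"tbl8 entry must remain 4 bytes");

/* Largest next hop that fits in the 24-bit field. */
#define SKETCH_NEXT_HOP_MAX ((UINT32_C(1) << 24) - 1)

/* Returns 1 if the rule's next hop can be stored in a table entry. */
static int
sketch_next_hop_fits(uint32_t next_hop)
{
	return next_hop <= SKETCH_NEXT_HOP_MAX;
}
```

Since the rule keeps a 32-bit next_hop while the table entry stores only 24 bits, an add path built this way would probably want to reject values above SKETCH_NEXT_HOP_MAX rather than truncate them silently.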


2015-06-24 8:15 GMT+03:00 Matthew Hall :

> OK, I went and made a whole ton of patches to LPM and the tests and
> examples,
> now the selftest errors out... but I think maybe I don't have an adequate
> amount of hugepages. How much hugepage memory did people have when they
> did the selftest successfully before?
>
> I just keep seeing this over and over...
>
> RTE>>lpm_autotest
> Error at line 293:
> ERROR: LPM Test tests[i]: FAIL
> LPM: LPM memory allocation failed
>
> Matthew.
>

[dpdk-dev] rte_lpm with larger nexthops or another method?

2015-06-23 Thread Vladimir Medvedkin

Hi all,

Matthew, I think the IPv6 LPM code needs fewer changes:

struct rte_lpm6_tbl_entry {
    uint32_t next_hop    :21; /**< Next hop / next table to be checked. */
    uint32_t depth       :8;  /**< Rule depth. */

    /* Flags. */
    uint32_t valid       :1;  /**< Validation flag. */
    uint32_t valid_group :1;  /**< Group validation flag. */
    uint32_t ext_entry   :1;  /**< External entry. */
};

There are already 21 bits for next_hop (only rte_lpm6_rule needs to change).
In Stephen's approach next_hop is given only 16 bits; this is enough for a
next hop index, but not enough for the AS number that originates the prefix.

Regards,
Vladimir

2015-06-23 9:30 GMT+03:00 Matthew Hall :

> On Mon, Jun 22, 2015 at 11:51:02PM -0400, Stephen Hemminger wrote:
> > In order to make Vyatta/Brocade router work with LPM code
> > I ended up redoing the layout. It is:
> >
> > And also several other scalability improvements (plus IPv6)
> > and the correct handling of /32.
> >
> > Unfortunately, this is such a big binary change that I was
> > reluctant to break any tests or applications using existing code
> > and therefore never submitted the patches.
>
> 1. What you and Vladimir have done to this code kicks total ass and will
> be a
> huge help so I am very excited to squeeze in some cycles somewhere to test
> all
> of this stuff out ASAP.
>
> 2. Vladimir's changes were somewhat smaller, but Stephen yours are larger.
> Stephen, if you could place them into a cloned copy of DPDK or a branch
> somewhere for convenient pickup, I think I could help you make a lot of
> progress.
>
> I could help test these fixes in a second app besides your own to get some
> cross validation, and help make the required cleanups, so we could get a
> bit
> more external validation before we try to negotiate a safe way to merge
> them
> upstream to Bruce since he is marked as the LPM maintainer.
>
> My DPDK fork is located here, for example, but it could really be anywhere
> you
> like to put it which I could access. Or even a one-off zip or tarball with
> the
> git repo inside and I could host it in my fork or give you access on the
> fork
> to push it as a second remote if you are OK to do that...
>
> https://github.com/megahall/dpdk_mhall
>
> Matthew.
>

[dpdk-dev] rte_lpm with larger nexthops or another method?

2015-06-22 Thread Vladimir Medvedkin

Hi Matthew,

I recently thought about extending next_hop. For IPv4 we can do
something like:

struct rte_lpm_tbl24_entry {
    /* Stores Next hop or group index (i.e. gindex) into tbl8. */
    union {
        uint32_t next_hop    :24;
        uint32_t tbl8_gindex :24;
    } __attribute__((__packed__));
    /* Using single uint8_t to store 3 values. */
    uint32_t valid     :1; /**< Validation flag. */
    uint32_t ext_entry :1; /**< External entry. */
    uint32_t depth     :6; /**< Rule depth. */
};

so we have 24 bits for next_hop.
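
To show how the 24-bit field would be read in a lookup: a minimal sketch, assuming the usual two-level layout in which tbl24 is indexed by the top 24 bits of the address and an entry with ext_entry set points to a group of 256 tbl8 entries. All sketch_ names are hypothetical, a single flattened "value" field stands in for the next_hop/tbl8_gindex union, and this is not the rte_lpm implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit table entry: "value" is the next hop, or the tbl8
 * group index when ext_entry is set. */
struct sketch_tbl_entry {
	uint32_t value     :24;
	uint32_t valid     :1;
	uint32_t ext_entry :1;
	uint32_t depth     :6;
};

static_assert(sizeof(struct sketch_tbl_entry) == sizeof(uint32_t),
	"entry must remain 4 bytes");

#define SKETCH_TBL8_GROUP_ENTRIES 256

/* Returns 0 and stores the next hop on a hit, -1 on a miss. */
static int
sketch_lookup(const struct sketch_tbl_entry *tbl24,
	const struct sketch_tbl_entry *tbl8, uint32_t ip, uint32_t *next_hop)
{
	struct sketch_tbl_entry e = tbl24[ip >> 8];

	if (e.valid && e.ext_entry) {
		/* Widen before multiplying: the 24-bit field promotes to int. */
		size_t idx = (size_t)e.value * SKETCH_TBL8_GROUP_ENTRIES +
			(ip & 0xff);
		e = tbl8[idx];
	}
	if (!e.valid)
		return -1;
	*next_hop = e.value;
	return 0;
}
```

With this layout a next hop such as 70000, which does not fit in the current 8-bit field, can be stored directly in a tbl24 entry.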

2015-06-22 5:29 GMT+03:00 Matthew Hall :

> Hello,
>
> I have gone out on the internet for days looking at a bunch of different
> radix tree implementations to see if I could figure a way to implement my
> own tree, just to work around the really low 255 CIDR block limitation in
> librte_lpm. Unfortunately every single one I could find falls into one of
> these two annoying categories:
>
> 1) bloated with a lot of irrelevant kernel code I don't care about
> (especially the Linux version but also the BSD one, which also makes a
> weird assumption every address object stores its length in byte 0 of the
> address struct). These are hard to convert into something that plays nice
> with raw packet data.
>
> 2) very seemingly simple code, which breaks horribly if you try to add
> IPv6 support (such as the radix tree from University of Michigan / LLVM
> compiler benchmark suite, and the one from the old unmaintained mrt daemon,
> which includes a bizarre custom reference-counted memory manager that is
> very convoluted). These are easy to set up, but cause a lot of weird
> segfaults which I am having a difficult time to try to debug.
>
> So it seems like I am going nowhere with this approach. Instead, I'd like
> to know, what would I need to do to add this support to my local copy of
> librte_lpm? Let's assume for the sake of this discussion, that I don't care
> one iota about any performance cost, and I am happy if I need to prefetch
> two cachelines instead of just one (which I recall from a past thread is
> why librte_lpm has such a low nexthop limit to start with).
>
> Failing that, does anybody have a known good userspace version of any of
> these sort of items:
>
> 1) Hash Based FIB (forwarding information base),
> 2) Tree Based FIB,
> 3) Patricia trie (which does not break horribly on IPv6 or make bad
> assumptions about data format besides uint8_t* and length),
> 4) Crit-Bit tree
> 5) any other good way of taking IPv4 and IPv6 and finding the longest
> prefix match against a table of pre-loaded CIDR blocks?
>
> I am really pulling out my hair trying to find a way to do something which
> doesn't seem like it should have to be this difficult. I must be missing
> a more obvious way to handle this.
>
> Thanks,
> Matthew

[dpdk-dev] [PATCH] Add unit test for thash library

2015-06-19 Thread Vladimir Medvedkin

2015-06-19 19:14 GMT+03:00 Richardson, Bruce :

>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladimir Medvedkin
> > Sent: Friday, June 19, 2015 3:56 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH] Add unit test for thash library
> >
> > Add unit test for thash library
> >
> Missing sign-off.
>
> > ---
> >  app/test/Makefile |   2 +
> >  app/test/autotest_data.py |  13 
> >  app/test/test_thash.c | 164
> > ++
> >  3 files changed, 179 insertions(+)
> >  create mode 100644 app/test/test_thash.c
> >
> > diff --git a/app/test/Makefile b/app/test/Makefile
> > index 5cf8296..fc6a247 100644
> > --- a/app/test/Makefile
> > +++ b/app/test/Makefile
> > @@ -85,6 +85,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_perf.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_functions.c
> >
> > +SRCS-y += test_thash.c
> > +
> >  SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm6.c
> >
> > diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
> > index 0c3802b..7653f09 100644
> > --- a/app/test/autotest_data.py
> > +++ b/app/test/autotest_data.py
> > @@ -475,6 +475,19 @@ non_parallel_test_group_list = [
> >   },
> >   ]
> >  },
> > +{
> > + "Prefix" :  "thash",
> > + "Memory" :  "32",
> > + "Tests" :
> > + [
> > + {
> > + "Name" :   "Thash autotest",
> > + "Command" :"thash_autotest",
> > + "Func" :   default_autotest,
> > + "Report" : None,
> > +},
> > + ]
> > +},
> >
> >  #
> >  # Please always make sure that ring_perf is the last test!
> > diff --git a/app/test/test_thash.c b/app/test/test_thash.c
> > new file mode 100644
> > index 000..4c863cc
> > --- /dev/null
> > +++ b/app/test/test_thash.c
> > @@ -0,0 +1,164 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2015 Vladimir Medvedkin 
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + * * Redistributions of source code must retain the above copyright
> > + *   notice, this list of conditions and the following disclaimer.
> > + * * Redistributions in binary form must reproduce the above
> > copyright
> > + *   notice, this list of conditions and the following disclaimer in
> > + *   the documentation and/or other materials provided with the
> > + *   distribution.
> > + * * Neither the name of Intel Corporation nor the names of its
> > + *   contributors may be used to endorse or promote products derived
> > + *   from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> > FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> > INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
> > USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
> > ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
> > USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> > + */
> > +
> > +//#include 
> > +//#include 
> > +//#include 
> > +#include 
> > +//#include 
> > +//#include 
>
> Please just delete the commented out lines, there is no need to keep them.
>
Deleted in v2 patch.

>
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "test.h"
> > +
> >

[dpdk-dev] [PATCH v4] Add toeplitz hash algorithm used by RSS

2015-06-19 Thread Vladimir Medvedkin

Hi Bruce,

2015-06-19 18:59 GMT+03:00 Richardson, Bruce :

>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladimir Medvedkin
> > Sent: Friday, June 19, 2015 3:56 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v4] Add toeplitz hash algorithm used by RSS
> >
> > v4 changes
> > - Fix copyright
> > - rename bswap_mask constant, add rte_ prefix
> > - change rte_ipv[46]_tuple struct
> > - change rte_thash_load_v6_addr prototype
> >
> > v3 changes
> > - Rework API to be more generic
> > - Add sctp_tag into tuple
> >
> > v2 changes
> > - Add ipv6 support
> > - Various style fixes
> >
>
> Missing signoff line.
>
> > ---
> >  lib/librte_hash/Makefile|   1 +
> >  lib/librte_hash/rte_thash.h | 202
> > 
> >  2 files changed, 203 insertions(+)
> >  create mode 100644 lib/librte_hash/rte_thash.h
> >
> <...snip...>
> > +
> > +/* Byte swap mask used for converting IPv6 address 4-byte chunks to CPU
> > byte order */
> > +static const __m128i rte_thash_ipv6_bswap_mask = {0x0405060700010203,
> > 0x0C0D0E0F08090A0B};
> > +
> > +#define RTE_THASH_V4_L3   2  /*calculate hash of ipv4 header
> only*/
> > +#define RTE_THASH_V4_L4   3  /*calculate hash of ipv4 +
> transport
> > headers*/
> > +#define RTE_THASH_V6_L3   8  /*calculate hash of ipv6 header
> only
> > */
> > +#define RTE_THASH_V6_L4   9  /*calculate hash of ipv6 +
> transport
> > headers */
>
> I'm still not seeing why these values need to be defined here, rather than
> in a specific app.
> Also, the choice of values for these defines seems strange to me? How were
> they chosen?
>
These are predefined values. They give the length (in 4-byte words) of the
input data to be hashed. I think they are like the defines in rte_ip.h, for
example.
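
To make the role of these lengths concrete: a minimal bit-by-bit sketch of the Toeplitz hash, assuming the input is given as bytes in network order (source address, destination address, source port, destination port). The function name is hypothetical, and this is a plain reference loop, not the code from the patch. The key is only the first 28 bytes of the RSS key as they appear in the unit test patch in this archive, which is enough for inputs of up to 24 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 28 bytes of the RSS key from the unit test in this archive. */
static const uint8_t sketch_rss_key[28] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3,
};

/* Hash len_dwords * 4 bytes of input. For every set input bit, XOR in
 * the 32-bit window of the key that starts at the same bit position. */
static uint32_t
sketch_toeplitz(const uint8_t *key, size_t key_len,
	const uint8_t *data, size_t len_dwords)
{
	size_t len = len_dwords * 4;
	uint32_t window, hash = 0;
	size_t i;
	int bit;

	assert(key_len >= len + 4); /* the window slides 4 bytes ahead */
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		((uint32_t)key[2] << 8) | (uint32_t)key[3];
	for (i = 0; i < len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;
			/* Slide the window one bit further along the key. */
			window = (window << 1) | ((key[i + 4] >> bit) & 1u);
		}
	}
	return hash;
}
```

With a length of 2 dwords only the two IPv4 addresses are hashed, and with 3 dwords the ports are included. Hashing the first row of the IPv4 table in the unit test (source 66.9.149.187:2794, destination 161.142.100.80:1766) this way should reproduce the 0x323e8fc2 and 0x51ccc178 values listed there.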

>
> /Bruce
>
>

[dpdk-dev] [PATCH v3] Add unit test for thash library

2015-06-19 Thread Vladimir Medvedkin

Add unit test for thash library

v3 changes
- Fix checkpatch errors

v2 changes
- fix typo
- remove unnecessary comments

Signed-off-by: Vladimir Medvedkin 
---
 app/test/Makefile |   2 +
 app/test/autotest_data.py |  13 
 app/test/test_thash.c | 176 ++
 3 files changed, 191 insertions(+)
 create mode 100644 app/test/test_thash.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 2e2758c..22e0052 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -86,6 +86,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_perf.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_functions.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_scaling.c

+SRCS-y += test_thash.c
+
 SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm.c
 SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm6.c

diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
index 0c3802b..7653f09 100644
--- a/app/test/autotest_data.py
+++ b/app/test/autotest_data.py
@@ -475,6 +475,19 @@ non_parallel_test_group_list = [
},
]
 },
+{
+   "Prefix" :  "thash",
+   "Memory" :  "32",
+   "Tests" :
+   [
+   {
+ "Name" :   "Thash autotest",
+ "Command" :"thash_autotest",
+ "Func" :   default_autotest,
+ "Report" : None,
+},
+   ]
+},

 #
 # Please always make sure that ring_perf is the last test!
diff --git a/app/test/test_thash.c b/app/test/test_thash.c
new file mode 100644
index 000..2a9eb28
--- /dev/null
+++ b/app/test/test_thash.c
@@ -0,0 +1,176 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Vladimir Medvedkin 
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#include 
+
+struct test_thash_v4 {
+   uint32_t dst_ip;
+   uint32_t src_ip;
+   uint16_t dst_port;
+   uint16_t src_port;
+   uint32_t hash_l3;
+   uint32_t hash_l3l4;
+};
+
+struct test_thash_v6 {
+   uint8_t dst_ip[16];
+   uint8_t src_ip[16];
+   uint16_t dst_port;
+   uint16_t src_port;
+   uint32_t hash_l3;
+   uint32_t hash_l3l4;
+};
+
+/*From 82599 Datasheet 7.1.2.8.3 RSS Verification Suite*/
+struct test_thash_v4 v4_tbl[] = {
+{IPv4(161, 142, 100, 80), IPv4(66, 9, 149, 187),
+   1766, 2794, 0x323e8fc2, 0x51ccc178},
+{IPv4(65, 69, 140, 83), IPv4(199, 92, 111, 2),
+   4739, 14230, 0xd718262a, 0xc626b0ea},
+{IPv4(12, 22, 207, 184), IPv4(24, 19, 198, 95),
+   38024, 12898, 0xd2d0a5de, 0x5c2b394a},
+{IPv4(209, 142, 163, 6), IPv4(38, 27, 205, 30),
+   2217, 48228, 0x82989176, 0xafc7327f},
+{IPv4(202, 188, 127, 2), IPv4(153, 39, 163, 191),
+   1303, 44251, 0x5d1809c5, 0x10e828a2},
+};
+
+struct test_thash_v6 v6_tbl[] = {
+/*3ffe:2501:200:3::1*/
+{{0x3f, 0xfe, 0x25, 0x01, 0x02, 0x00, 0x00, 0x03,
+0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,},
+/*3ffe:2501:200:1fff::7*/
+{0x3f, 0xfe, 0x25, 0x01, 0x02, 0x00, 0x1f, 0xff,
+0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07,},
+1766, 2794, 0x2cc18cd5, 0x40207d3d},
+/*ff02::1*/
+{{0xff, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+0x00, 0x00, 0x00, 0x00,

[dpdk-dev] [PATCH v5] Add toeplitz hash algorithm used by RSS

2015-06-19 Thread Vladimir Medvedkin

Software implementation of the Toeplitz hash function used by RSS.
It can be used either for packet distribution on a single-queue NIC
or for simulating the RSS computation of a specific NIC (for example
after GRE header decapsulation).

v5 changes
- Fix errors reported by checkpatch.pl

v4 changes
- Fix copyright
- rename bswap_mask constant, add rte_ prefix
- change rte_ipv[46]_tuple struct
- change rte_thash_load_v6_addr prototype

v3 changes
- Rework API to be more generic
- Add sctp_tag into tuple

v2 changes
- Add ipv6 support
- Various style fixes

Signed-off-by: Vladimir Medvedkin 
---
 lib/librte_hash/Makefile|   1 +
 lib/librte_hash/rte_thash.h | 207 
 2 files changed, 208 insertions(+)
 create mode 100644 lib/librte_hash/rte_thash.h

diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile
index 3696cb1..981230b 100644
--- a/lib/librte_hash/Makefile
+++ b/lib/librte_hash/Makefile
@@ -49,6 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += rte_fbk_hash.c
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_hash_crc.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_jhash.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_thash.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_fbk_hash.h

 # this lib needs eal
diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
new file mode 100644
index 000..877bd50
--- /dev/null
+++ b/lib/librte_hash/rte_thash.h
@@ -0,0 +1,207 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Vladimir Medvedkin 
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_THASH_H
+#define _RTE_THASH_H
+
+/**
+ * @file
+ *
+ * toeplitz hash functions.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Software implementation of the Toeplitz hash function used by RSS.
+ * Can be used either for packet distribution on single queue NIC
+ * or for simulating of RSS computation on specific NIC (for example
+ * after GRE header decapsulating)
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/* Byte swap mask used for converting IPv6 address
+ * 4-byte chunks to CPU byte order
+ */
+static const __m128i rte_thash_ipv6_bswap_mask = {
+   0x0405060700010203, 0x0C0D0E0F08090A0B};
+
+#define RTE_THASH_V4_L3 2  /*calculate hash of ipv4 header only*/
+#define RTE_THASH_V4_L4 3  /*calculate hash of ipv4 + transport headers*/
+#define RTE_THASH_V6_L3 8  /*calculate hash of ipv6 header only */
+#define RTE_THASH_V6_L4 9  /*calculate hash of ipv6 + transport headers */
+
+/**
+ * IPv4 tuple
+ * addresses and ports/sctp_tag have to be in CPU byte order
+ */
+struct rte_ipv4_tuple {
+   uint32_t src_addr;
+   uint32_t dst_addr;
+   union {
+   struct {
+   uint16_t dport;
+   uint16_t sport;
+   };
+   uint32_t sctp_tag;
+   };
+};
+
+/**
+ * IPv6 tuple
+ * Addresses have to be filled by rte_thash_load_v6_addr()
+ * ports/sctp_tag have to be CPU byte order
+ */
+struct rte_ipv6_tuple {
+   uint8_t src_addr[16];
+   uint8_t dst_addr[16];
+   union {
+   struct {
+   uint16_t dport;
+   uint16_t sport;
+   };
+   uint

[dpdk-dev] [PATCH v2] Add unit test for thash library

2015-06-19 Thread Vladimir Medvedkin

Add unit test for thash library

v2 changes
- fix typo
- remove unnecessary comments

---
 app/test/Makefile |   2 +
 app/test/autotest_data.py |  13 
 app/test/test_thash.c | 157 ++
 3 files changed, 172 insertions(+)
 create mode 100644 app/test/test_thash.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 5cf8296..fc6a247 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -85,6 +85,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_perf.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_functions.c

+SRCS-y += test_thash.c
+
 SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm.c
 SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm6.c

diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
index 0c3802b..7653f09 100644
--- a/app/test/autotest_data.py
+++ b/app/test/autotest_data.py
@@ -475,6 +475,19 @@ non_parallel_test_group_list = [
},
]
 },
+{
+   "Prefix" :  "thash",
+   "Memory" :  "32",
+   "Tests" :
+   [
+   {
+ "Name" :   "Thash autotest",
+ "Command" :"thash_autotest",
+ "Func" :   default_autotest,
+ "Report" : None,
+},
+   ]
+},

 #
 # Please always make sure that ring_perf is the last test!
diff --git a/app/test/test_thash.c b/app/test/test_thash.c
new file mode 100644
index 000..148fd0a
--- /dev/null
+++ b/app/test/test_thash.c
@@ -0,0 +1,157 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Vladimir Medvedkin 
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#include 
+
+struct test_thash_v4 {
+   uint32_t dst_ip;
+   uint32_t src_ip;
+   uint16_t dst_port;
+   uint16_t src_port;
+   uint32_t hash_l3;
+   uint32_t hash_l3l4;
+};
+
+struct test_thash_v6 {
+   uint8_t dst_ip[16];
+   uint8_t src_ip[16];
+   uint16_t dst_port;
+   uint16_t src_port;
+   uint32_t hash_l3;
+   uint32_t hash_l3l4;
+};
+
+/*From 82599 Datasheet p.309 7.1.2.8.3 RSS Verification Suite*/
+struct test_thash_v4 v4_tbl[] =
+{
+{IPv4(161,142,100,80), IPv4(66,9,149,187), 1766, 2794, 0x323e8fc2, 0x51ccc178},
+{IPv4(65,69,140,83), IPv4(199,92,111,2), 4739, 14230, 0xd718262a, 0xc626b0ea},
+{IPv4(12,22,207,184), IPv4(24,19,198,95), 38024, 12898, 0xd2d0a5de, 0x5c2b394a},
+{IPv4(209,142,163,6), IPv4(38,27,205,30), 2217, 48228, 0x82989176, 0xafc7327f},
+{IPv4(202,188,127,2), IPv4(153,39,163,191), 1303, 44251, 0x5d1809c5, 0x10e828a2},
+};
+
+struct test_thash_v6 v6_tbl[] =
+{
+/*3ffe:2501:200:3::1*/
+{{0x3f, 0xfe, 0x25, 0x01, 0x02, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,},
+/*3ffe:2501:200:1fff::7*/
+ {0x3f, 0xfe, 0x25, 0x01, 0x02, 0x00, 0x1f, 0xff, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07,},
+ 1766, 2794, 0x2cc18cd5, 0x40207d3d},
+/*ff02::1*/
+{{0xff, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,},
+/*3ffe:501:8::260:97ff:fe40:efab*/
+ {0x3f, 0xfe, 0x05, 0x01, 0x00, 0x08, 0x00, 0x00, 0x02, 0x60, 0x97, 0xff

[dpdk-dev] [PATCH] Add unit test for thash library

2015-06-19 Thread Vladimir Medvedkin

Add unit test for thash library

---
 app/test/Makefile |   2 +
 app/test/autotest_data.py |  13 
 app/test/test_thash.c | 164 ++
 3 files changed, 179 insertions(+)
 create mode 100644 app/test/test_thash.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 5cf8296..fc6a247 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -85,6 +85,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_perf.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_functions.c

+SRCS-y += test_thash.c
+
 SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm.c
 SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm6.c

diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
index 0c3802b..7653f09 100644
--- a/app/test/autotest_data.py
+++ b/app/test/autotest_data.py
@@ -475,6 +475,19 @@ non_parallel_test_group_list = [
},
]
 },
+{
+   "Prefix" :  "thash",
+   "Memory" :  "32",
+   "Tests" :
+   [
+   {
+ "Name" :   "Thash autotest",
+ "Command" :"thash_autotest",
+ "Func" :   default_autotest,
+ "Report" : None,
+},
+   ]
+},

 #
 # Please always make sure that ring_perf is the last test!
diff --git a/app/test/test_thash.c b/app/test/test_thash.c
new file mode 100644
index 000..4c863cc
--- /dev/null
+++ b/app/test/test_thash.c
@@ -0,0 +1,164 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Vladimir Medvedkin 
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+//#include 
+//#include 
+//#include 
+#include 
+//#include 
+//#include 
+
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#include 
+
+struct test_thash_v4 {
+   uint32_t dst_ip;
+   uint32_t src_ip;
+   uint16_t dst_port;
+   uint16_t src_port;
+   uint32_t hash_l3;
+   uint32_t hash_l3l4;
+};
+
+struct test_thash_v6 {
+   uint8_t dst_ip[16];
+   uint8_t src_ip[16];
+   uint16_t dst_port;
+   uint16_t src_port;
+   uint32_t hash_l3;
+   uint32_t hash_l3l4;
+};
+
+/*From 82599 Datasheet p.309 7.1.2.8.3 RSS Verification Suite*/
+struct test_thash_v4 v4_tbl[] =
+{
+{IPv4(161,142,100,80), IPv4(66,9,149,187), 1766, 2794, 0x323e8fc2, 0x51ccc178},
+{IPv4(65,69,140,83), IPv4(199,92,111,2), 4739, 14230, 0xd718262a, 0xc626b0ea},
+{IPv4(12,22,207,184), IPv4(24,19,198,95), 38024, 12898, 0xd2d0a5de, 0x5c2b394a},
+{IPv4(209,142,163,6), IPv4(38,27,205,30), 2217, 48228, 0x82989176, 0xafc7327f},
+{IPv4(202,188,127,2), IPv4(153,39,163,191), 1303, 44251, 0x5d1809c5, 0x10e828a2},
+};
+
+struct test_thash_v6 v6_tbl[] =
+{
+/*3ffe:2501:200:3::1*/
+{{0x3f, 0xfe, 0x25, 0x01, 0x02, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,},
+/*3ffe:2501:200:1fff::7*/
+ {0x3f, 0xfe, 0x25, 0x01, 0x02, 0x00, 0x1f, 0xff, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07,},
+ 1766, 2794, 0x2cc18cd5, 0x40207d3d},
+/*ff02::1*/
+{{0xff, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,},
+/*3ffe:501:8::260:97ff:fe40:efab*/
+ {0x3f, 0xfe, 0x05, 0x01, 0x00, 0x08, 0x00, 0x00

[dpdk-dev] [PATCH v4] Add toeplitz hash algorithm used by RSS

2015-06-19 Thread Vladimir Medvedkin

v4 changes
- Fix copyright
- rename bswap_mask constant, add rte_ prefix
- change rte_ipv[46]_tuple struct
- change rte_thash_load_v6_addr prototype

v3 changes
- Rework API to be more generic
- Add sctp_tag into tuple

v2 changes
- Add ipv6 support
- Various style fixes

---
 lib/librte_hash/Makefile|   1 +
 lib/librte_hash/rte_thash.h | 202 
 2 files changed, 203 insertions(+)
 create mode 100644 lib/librte_hash/rte_thash.h

diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile
index 3696cb1..981230b 100644
--- a/lib/librte_hash/Makefile
+++ b/lib/librte_hash/Makefile
@@ -49,6 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += rte_fbk_hash.c
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_hash_crc.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_jhash.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_thash.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_fbk_hash.h

 # this lib needs eal
diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
new file mode 100644
index 000..7b3cc52
--- /dev/null
+++ b/lib/librte_hash/rte_thash.h
@@ -0,0 +1,202 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Vladimir Medvedkin 
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_THASH_H
+#define _RTE_THASH_H
+
+/**
+ * @file
+ *
+ * toeplitz hash functions.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Software implementation of the Toeplitz hash function used by RSS.
+ * Can be used either for packet distribution on single queue NIC
+ * or for simulating of RSS computation on specific NIC (for example
+ * after GRE header decapsulating)
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/* Byte swap mask used for converting IPv6 address 4-byte chunks to CPU byte order */
+static const __m128i rte_thash_ipv6_bswap_mask = {0x0405060700010203, 0x0C0D0E0F08090A0B};
+
+#define RTE_THASH_V4_L3 2  /*calculate hash of ipv4 header only*/
+#define RTE_THASH_V4_L4 3  /*calculate hash of ipv4 + transport headers*/
+#define RTE_THASH_V6_L3 8  /*calculate hash of ipv6 header only */
+#define RTE_THASH_V6_L4 9  /*calculate hash of ipv6 + transport headers */
+
+/**
+ * IPv4 tuple
+ * addresses and ports/sctp_tag have to be CPU byte order
+ */
+struct rte_ipv4_tuple {
+   uint32_t src_addr;
+   uint32_t dst_addr;
+   union {
+      struct {
+         uint16_t dport;
+         uint16_t sport;
+      };
+      uint32_t sctp_tag;
+   };
+};
+
+/**
+ * IPv6 tuple
+ * Addresses have to be filled by rte_thash_load_v6_addr()
+ * ports/sctp_tag have to be CPU byte order
+ */
+struct rte_ipv6_tuple {
+   uint8_t src_addr[16];
+   uint8_t dst_addr[16];
+   union {
+      struct {
+         uint16_t dport;
+         uint16_t sport;
+      };
+      uint32_t sctp_tag;
+   };
+};
+
+union rte_thash_tuple {
+   struct rte_ipv4_tuple   v4;
+   struct rte_ipv6_tuple   v6;
+} __attribute__((aligned(16)));
+
+/**
+ * Prepare special converted key to use with rte_softrss_be()
+ * @param orig
+ *   pointer to original RSS key
+ * @param targ
+ *   pointer to target RS

[dpdk-dev] [PATCH v2] Add toeplitz hash algorithm used by RSS

2015-05-08 Thread Vladimir Medvedkin

Hi Andrey,

OK, so be it. If you want to distribute packets (or just calculate a hash
over a non-standard tuple), use your own tuple and your own hash key; the
lengths of the tuple and of the key are the programmer's responsibility.
If you want to emulate NIC RSS, use union rte_thash_tuple (it still needs
to be updated with the input tuples of new NICs) and the NIC's RSS hash key.
P.S. Thanks for the reviews.
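
To illustrate the "own tuple, own key" case, here is a minimal sketch of a
length-agnostic Toeplitz hash. It is not the rte_softrss() code from the
patch: the name toeplitz_hash(), its argument order and the verif_tuple
array are made up for this example, and verif_key is the commonly published
RSS verification-suite key, which is an assumption because the key itself is
not quoted in this thread. With that key, the first IPv4 entry of the
verification table used in the unit test should give 0x323e8fc2 for the
addresses only and 0x51ccc178 for addresses plus ports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: the widely published 40-byte RSS verification-suite key. */
static const uint8_t verif_key[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/*
 * Hypothetical generic Toeplitz hash over an arbitrary byte string.
 * 'data' must be in network byte order. The key must be at least 4 bytes
 * long; it should be at least data_len + 4 bytes, and key bits past the
 * end of the key are treated as zero.
 */
static uint32_t
toeplitz_hash(const uint8_t *data, size_t data_len,
	      const uint8_t *key, size_t key_len)
{
	uint32_t hash = 0;
	uint32_t window;	/* 32 key bits aligned with the current input bit */
	size_t next_bit = 32;	/* index of the next key bit to shift in */
	size_t i;
	int b;

	assert(key_len >= 4);
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < data_len; i++) {
		for (b = 7; b >= 0; b--) {
			/* XOR in the key window for every set input bit */
			if ((data[i] >> b) & 1)
				hash ^= window;
			/* slide the window one bit along the key */
			window <<= 1;
			if (next_bit < key_len * 8 &&
			    ((key[next_bit / 8] >> (7 - next_bit % 8)) & 1))
				window |= 1;
			next_bit++;
		}
	}
	return hash;
}

/*
 * First IPv4 entry of the verification table, laid out as
 * src addr, dst addr, src port, dst port in network byte order:
 * 66.9.149.187:2794 -> 161.142.100.80:1766
 */
static const uint8_t verif_tuple[12] = {
	66, 9, 149, 187, 161, 142, 100, 80, 0x0a, 0xea, 0x06, 0xe6,
};
```

Passing 8 as data_len hashes only the two addresses; passing 12 also covers
the ports. Any other tuple works the same way as long as the caller
serialises it in network byte order and supplies a key that is long enough.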

Regards,
Vladimir

2015-05-07 14:38 GMT+03:00 Chilikin, Andrey :

>  Hi Vladimir,
>
>
>
> Yes, at the moment NICs support limited input sets for hash calculation,
> but why limit SW to the same sets if it can be done in a more general way
> and be easily scalable for HW updates? Using a limited input set for RSS is
> not a feature of the Toeplitz hash, but a limitation of HW. I believe that
> a general Toeplitz function will be more appropriate: it will cover input
> sets currently supported by HW and also will be easily scalable for future
> HW. Also, talking about different NICs: Niantic and Fortville, for
> example, have hash keys of different length, so the rte_softrss() function
> should take the hash key's length into account as well.
>
>  Regards,
>
> Andrey
>
>
>
>
>
> *From:* Vladimir Medvedkin [mailto:medvedkinv at gmail.com]
> *Sent:* Thursday, May 7, 2015 11:28 AM
> *To:* Chilikin, Andrey
> *Cc:* dev at dpdk.org
> *Subject:* Re: [dpdk-dev] [PATCH v2] Add toeplitz hash algorithm used by
> RSS
>
>
>
> Hi Andrey,
>
> The main goal of these new functions is to calculate a hash that is
> equal to the hash computed by the NIC.
> According to table 7-5 of the XL710 datasheet, the input set for SCTP
> consists of IP4-S, IP4-D and SCTP-Verification-Tag. I don't see any NIC
> that uses QinQ or a single VLAN tag, IP proto number, tunnel ID, VXLAN,
> etc. for calculating the RSS hash. If one appears, we can always update
> union rte_thash_tuple.
> I think it should be like:
>
> struct rte_ports {
>     uint16_t dport;
>     uint16_t sport;
> };
>
> union rte_thash_l4 {
>     struct rte_ports ports;
>     uint32_t sctp_tag;
> };
> struct rte_ipv4_tuple {
>     uint32_t src_addr;
>     uint32_t dst_addr;
>     union rte_thash_l4 l4;
> };
>
> If it is necessary to distribute packets according to non-standard tuples,
> I think it's more appropriate to use crc32 or jhash because of speed.
> rte_softrss_be consumes 400-500 clocks for each 4-byte input on an E3
> 1230v1 at 3.2GHz. This means that for ipv4+tcp it consumes ~1500 clocks.
>
> If you or someone else still thinks a general Toeplitz hash is needed,
> I'll add it.
>
> Regards,
>
> Vladimir
>
>
>
>
>
> 2015-05-05 19:03 GMT+03:00 Chilikin, Andrey :
>
> Hi Vladimir,
>
> Why limit Toeplitz hash calculation to predefined tuples and length?
> Should it be more general, something like
> rte_softrss_be(void *input, uint32_t input_len, const uint8_t *rss_key) to
> enable hash calculation for an input of any size? It would be useful for
> distributing packets using some non-standard tuples, like hashing on QinQ
> or adding IP protocol to hash calculation to separate UDP and TCP flows or
> even some other fields from a packet, for example, tunnel ID from VXLAN
> headers. By the way, i40e already supports RSS for SCTP in addition to TCP
> and UDP and includes Verification Tag as well as SCTP source and
> destination ports for RSS hash.
>
> Regards,
> Andrey
>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladimir
> > Medvedkin
> > Sent: Tuesday, May 5, 2015 2:20 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v2] Add toeplitz hash algorithm used by RSS
> >
> > Software implementation of the Toeplitz hash function used by RSS.
> > Can be used either for packet distribution on single queue NIC or for
> > simulating of RSS computation on specific NIC (for example after GRE
> header
> > decapsulating).
> >
> > v2 changes
> > - Add ipv6 support
> > - Various style fixes
> >
> > Signed-off-by: Vladimir Medvedkin 
> > ---
> >  lib/librte_hash/Makefile|   1 +
> >  lib/librte_hash/rte_thash.h | 209
> > 
> >  2 files changed, 210 insertions(+)
> >  create mode 100644 lib/librte_hash/rte_thash.h
> >
> > diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile index
> > 3696cb1..981230b 100644
> > --- a/lib/librte_hash/Makefile
> > +++ b/lib/librte_hash/Makefile
> > @@ -49,6 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += rte_fbk_hash.c
> > SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h  SYMLINK-
> > $(CONFIG_RTE_LIBRTE_HASH)-include += rte_hash_

[dpdk-dev] [PATCH v2] Add toeplitz hash algorithm used by RSS

2015-05-08 Thread Vladimir Medvedkin

Software implementation of the Toeplitz hash function used by RSS.
It can be used either for packet distribution on a single-queue NIC
or for simulating the RSS computation of a specific NIC (for example
after GRE header decapsulation).

v3 changes
- Rework API to be more generic
- Add sctp_tag into tuple

v2 changes
- Add ipv6 support
- Various style fixes

Signed-off-by: Vladimir Medvedkin 
---
 lib/librte_hash/Makefile|   1 +
 lib/librte_hash/rte_thash.h | 207 
 2 files changed, 208 insertions(+)
 create mode 100644 lib/librte_hash/rte_thash.h

diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile
index 3696cb1..981230b 100644
--- a/lib/librte_hash/Makefile
+++ b/lib/librte_hash/Makefile
@@ -49,6 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += rte_fbk_hash.c
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_hash_crc.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_jhash.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_thash.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_fbk_hash.h

 # this lib needs eal
diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
new file mode 100644
index 000..5d5111b
--- /dev/null
+++ b/lib/librte_hash/rte_thash.h
@@ -0,0 +1,207 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_THASH_H
+#define _RTE_THASH_H
+
+/**
+ * @file
+ *
+ * toeplitz hash functions.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Software implementation of the Toeplitz hash function used by RSS.
+ * Can be used either for packet distribution on single queue NIC
+ * or for simulating of RSS computation on specific NIC (for example
+ * after GRE header decapsulating)
+ */
+
+#include 
+#include 
+#include 
+
+#ifdef __SSE3__
+static const __m128i bswap_mask = {0x0405060700010203, 0x0C0D0E0F08090A0B};
+#endif
+
+#define RTE_THASH_V4_L3 2  /*calculate hash of ipv4 header only*/
+#define RTE_THASH_V4_L4 3  /*calculate hash of ipv4 + transport headers*/
+#define RTE_THASH_V6_L3 8  /*calculate hash of ipv6 header only */
+#define RTE_THASH_V6_L4 9  /*calculate hash of ipv6 + transport headers */
+
+struct rte_ports {
+   uint16_t dport;
+   uint16_t sport;
+};
+
+union rte_thash_l4 {
+   struct rte_ports ports;
+   uint32_t sctp_tag;
+};
+
+/**
+ * IPv4 tuple
+ * addresses and ports/sctp_tag have to be CPU byte order
+ */
+struct rte_ipv4_tuple {
+   uint32_t src_addr;
+   uint32_t dst_addr;
+   union rte_thash_l4 l4;
+};
+
+/**
+ * IPv6 tuple
+ * Addresses have to be filled by rte_thash_load_v6_addr()
+ * ports/sctp_tag have to be CPU byte order
+ */
+struct rte_ipv6_tuple {
+   uint8_t src_addr[16];
+   uint8_t dst_addr[16];
+   union rte_thash_l4 l4;
+};
+
+union rte_thash_tuple {
+   struct rte_ipv4_tuple   v4;
+   struct rte_ipv6_tuple   v6;
+} __attribute__((aligned(16)));
+
+/**
+ * Prepare special converted key to use with rte_softrss_be()
+ * @param orig
+ *   pointer to original RSS key
+ * @param targ
+ *   pointer to target RSS key
+ * @param len
+ *   RSS key length
+ */
+static inline void
+rte_convert_rss_key(const uint

[dpdk-dev] [PATCH v2] Add toeplitz hash algorithm used by RSS

2015-05-07 Thread Vladimir Medvedkin

Hi Andrey,

The main goal of these new functions is to calculate a hash that is equal
to the hash computed by the NIC.
According to table 7-5 of the XL710 datasheet, the input set for SCTP
consists of IP4-S, IP4-D and SCTP-Verification-Tag. I don't see any NIC
that uses QinQ or a single VLAN tag, IP proto number, tunnel ID, VXLAN,
etc. for calculating the RSS hash. If one appears, we can always update
union rte_thash_tuple.
I think it should be like:
struct rte_ports {
    uint16_t dport;
    uint16_t sport;
};

union rte_thash_l4 {
    struct rte_ports ports;
    uint32_t sctp_tag;
};
struct rte_ipv4_tuple {
    uint32_t src_addr;
    uint32_t dst_addr;
    union rte_thash_l4 l4;
};
If it is necessary to distribute packets according to non-standard tuples, I
think it's more appropriate to use crc32 or jhash because of speed.
rte_softrss_be consumes 400-500 clocks for each 4-byte input on an E3
1230v1 at 3.2GHz. This means that for ipv4+tcp it consumes ~1500 clocks.
If you or someone else still thinks a general Toeplitz hash is needed, I'll
add it.
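
As a rough check of the layout and of the cost estimate above, the sketch
below restates the proposed structures and counts how many 32-bit words are
hashed for an IPv4 tuple with L4 data. The helpers thash_words() and
thash_cost() are hypothetical names invented for this illustration, and the
per-word cost is just the 400-500 clock range quoted above, not a new
measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Structures as proposed in this message. */
struct rte_ports {
	uint16_t dport;
	uint16_t sport;
};

union rte_thash_l4 {
	struct rte_ports ports;
	uint32_t sctp_tag;	/* overlays both ports */
};

struct rte_ipv4_tuple {
	uint32_t src_addr;
	uint32_t dst_addr;
	union rte_thash_l4 l4;
};

/* The IPv4 tuple is expected to be exactly three 32-bit words. */
static_assert(sizeof(struct rte_ipv4_tuple) == 12,
	      "IPv4 tuple should be 3 x 4 bytes");

/* Hypothetical helper: number of 32-bit words in an input of 'bytes' bytes. */
static inline unsigned int
thash_words(size_t bytes)
{
	return (unsigned int)(bytes / sizeof(uint32_t));
}

/* Hypothetical helper: clock estimate for a given per-word cost. */
static inline unsigned int
thash_cost(unsigned int words, unsigned int clocks_per_word)
{
	return words * clocks_per_word;
}
```

With three words and 400-500 clocks per word, this gives 1200-1500 clocks
for ipv4+tcp, which matches the ~1500 figure as an upper bound. The SCTP
verification tag occupies the same word as the two ports, so it costs the
same.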

Regards,
Vladimir


2015-05-05 19:03 GMT+03:00 Chilikin, Andrey :

> Hi Vladimir,
>
> Why limit Toeplitz hash calculation to predefined tuples and length?
> Should it be more general, something like
> rte_softrss_be(void *input, uint32_t input_len, const uint8_t *rss_key) to
> enable hash calculation for an input of any size? It would be useful for
> distributing packets using some non-standard tuples, like hashing on QinQ
> or adding IP protocol to hash calculation to separate UDP and TCP flows or
> even some other fields from a packet, for example, tunnel ID from VXLAN
> headers. By the way, i40e already supports RSS for SCTP in addition to TCP
> and UDP and includes Verification Tag as well as SCTP source and
> destination ports for RSS hash.
>
> Regards,
> Andrey
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladimir
> > Medvedkin
> > Sent: Tuesday, May 5, 2015 2:20 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v2] Add toeplitz hash algorithm used by RSS
> >
> > Software implementation of the Toeplitz hash function used by RSS.
> > Can be used either for packet distribution on single queue NIC or for
> > simulating of RSS computation on specific NIC (for example after GRE
> header
> > decapsulating).
> >
> > v2 changes
> > - Add ipv6 support
> > - Various style fixes
> >
> > Signed-off-by: Vladimir Medvedkin 
> > ---
> >  lib/librte_hash/Makefile|   1 +
> >  lib/librte_hash/rte_thash.h | 209
> > 
> >  2 files changed, 210 insertions(+)
> >  create mode 100644 lib/librte_hash/rte_thash.h
> >
> > diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile index
> > 3696cb1..981230b 100644
> > --- a/lib/librte_hash/Makefile
> > +++ b/lib/librte_hash/Makefile
> > @@ -49,6 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += rte_fbk_hash.c
> > SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h  SYMLINK-
> > $(CONFIG_RTE_LIBRTE_HASH)-include += rte_hash_crc.h  SYMLINK-
> > $(CONFIG_RTE_LIBRTE_HASH)-include += rte_jhash.h
> > +SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_thash.h
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_fbk_hash.h
> >
> >  # this lib needs eal
> > diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
> new file
> > mode 100644 index 000..42c7bf6
> > --- /dev/null
> > +++ b/lib/librte_hash/rte_thash.h
> > @@ -0,0 +1,209 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + * * Redistributions of source code must retain the above copyright
> > + *   notice, this list of conditions and the following disclaimer.
> > + * * Redistributions in binary form must reproduce the above
> copyright
> > + *   notice, this list of conditions and the following disclaimer in
> > + *   the documentation and/or other materials provided with the
> > + *   distribution.
> > + * * Neither the name of Intel Corporation nor the names of its
> > + *   contributors may be used to endorse or promote products derived
> > + *   from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> > CONTR

[dpdk-dev] [PATCH v2] Add toeplitz hash algorithm used by RSS

2015-05-05 Thread Vladimir Medvedkin

Software implementation of the Toeplitz hash function used by RSS.
It can be used either for packet distribution on a single-queue NIC
or for simulating the RSS computation of a specific NIC (for example
after GRE header decapsulation).

v2 changes
- Add ipv6 support
- Various style fixes

Signed-off-by: Vladimir Medvedkin 
---
 lib/librte_hash/Makefile|   1 +
 lib/librte_hash/rte_thash.h | 209 
 2 files changed, 210 insertions(+)
 create mode 100644 lib/librte_hash/rte_thash.h

diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile
index 3696cb1..981230b 100644
--- a/lib/librte_hash/Makefile
+++ b/lib/librte_hash/Makefile
@@ -49,6 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += rte_fbk_hash.c
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_hash_crc.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_jhash.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_thash.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_fbk_hash.h

 # this lib needs eal
diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
new file mode 100644
index 000..42c7bf6
--- /dev/null
+++ b/lib/librte_hash/rte_thash.h
@@ -0,0 +1,209 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_THASH_H
+#define _RTE_THASH_H
+
+/**
+ * @file
+ *
+ * toeplitz hash functions.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Software implementation of the Toeplitz hash function used by RSS.
+ * Can be used either for packet distribution on single queue NIC
+ * or for simulating of RSS computation on specific NIC (for example
+ * after GRE header decapsulating)
+ */
+
+#include 
+#include 
+#include 
+
+#ifdef __SSE3__
+static const __m128i bswap_mask = {0x0405060700010203, 0x0C0D0E0F08090A0B};
+#endif
+
+enum rte_thash_len {
+   RTE_THASH_V4_L3 = 2, /*calculate hash of ipv4 header only*/
+   RTE_THASH_V4_L4 = 3, /*calculate hash of ipv4 + transport headers*/
+   RTE_THASH_V6_L3 = 8, /*calculate hash of ipv6 header only */
+   RTE_THASH_V6_L4 = 9, /*calculate hash of ipv6 + transport headers */
+};
+
+/**
+ * IPv4 tuple
+ * addresses and ports have to be CPU byte order
+ */
+struct rte_ipv4_tuple {
+   uint32_t src_addr;
+   uint32_t dst_addr;
+   uint16_t dport;
+   uint16_t sport;
+};
+
+/**
+ * IPv6 tuple
+ * Addresses have to be filled by rte_thash_load_v6_addr()
+ * ports have to be CPU byte order
+ */
+struct rte_ipv6_tuple {
+   uint8_t src_addr[16];
+   uint8_t dst_addr[16];
+   uint16_t dport;
+   uint16_t sport;
+};
+
+union rte_thash_tuple {
+   struct rte_ipv4_tuple   v4;
+   struct rte_ipv6_tuple   v6;
+} __attribute__((aligned(16)));
+
+/**
+ * Prepare special converted key to use with rte_softrss_be()
+ * @param orig
+ *   pointer to original RSS key
+ * @param targ
+ *   pointer to target RSS key
+ * @param len
+ *   RSS key length
+ */
+static inline void
+rte_convert_rss_key(const uint32_t *orig, uint32_t *targ, int len)
+{
+   int i;
+
+   for (i = 0; i < (len >> 2); i++) {
+   targ[i] = rte_be_to_cpu_32(orig[i]);
+   }
+}
+
+/**
+ * Prepare and load IPv6 address

[dpdk-dev] Can't allocate different number of TX and RX queues for single port of ixgbe NIC

2015-04-28 Thread Vladimir Medvedkin

Hi Pavel,

I think the mistake is here:
-int eth_configure_ret = rte_eth_dev_configure(current_port, tx_queues, rx_queues, _port_conf);
+int eth_configure_ret = rte_eth_dev_configure(current_port, rx_queues, tx_queues, _port_conf);
According to
http://dpdk.org/doc/api/rte__ethdev_8h.html#ac30d075b4b206c7122e200164ce69893
the second argument is the number of RX queues and the third is the number
of TX queues.

Regards,
Vladimir

2015-04-28 13:02 GMT+03:00 Pavel Odintsov :

> Hello, Network Performance Gurus!
>
> I have Debian Jessie with a 3.16 kernel and DPDK 2.0.0 with an ixgbe NIC,
> and I wrote the following code:
> https://gist.github.com/pavel-odintsov/e1f64de4d56c0ab1b37c
>
> I am trying to allocate 2 queues for TX and only 1 queue for RX, but it
> fails with an error (detailed error message:
> https://gist.github.com/pavel-odintsov/507cf7a082793f547120):
> PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f9e9dcdbc80
> hw_ring=0x7f9e9dd41500 dma_addr=0x36b41500
> PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f9e9dcd9b40
> hw_ring=0x7f9e9dd51580 dma_addr=0x36b51580
> PMD: ixgbe_set_tx_function(): Using simple tx code path
> PMD: ixgbe_set_tx_function(): Vector tx enabled.
> EAL: Error - exiting with code: 1
>   Cause: Can't configure TX queue 1 for port 0
>
> I could fix this issue by allocating 2 queues for TX and 2 queues
> for RX, but that is useless for my application: I need multiple TX
> queues but can use only one RX queue, and I want to be able to specify
> a different number of TX and RX queues for the NIC.
>
> Thank you so much!
>
> --
> Sincerely yours, Pavel Odintsov
>

[dpdk-dev] [PATCH] Add toeplitz hash algorithm

2015-04-09 Thread Vladimir Medvedkin

Hi Gleb,


2015-04-09 9:37 GMT+03:00 Gleb Natapov :

> On Wed, Apr 08, 2015 at 03:06:13PM -0400, Vladimir Medvedkin wrote:
> > Software implementation of the Toeplitz hash function used by RSS.
> > Can be used either for packet distribution on single queue NIC
> > or for simulating of RSS computation on specific NIC (for example
> > after GRE header decapsulating).
> >
> > Signed-off-by: Vladimir Medvedkin 
> > ---
> >  lib/librte_hash/Makefile|   1 +
> >  lib/librte_hash/rte_thash.h | 179
> 
> >  2 files changed, 180 insertions(+)
> >  create mode 100644 lib/librte_hash/rte_thash.h
> >
> > diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile
> > index 3696cb1..083a9e5 100644
> > --- a/lib/librte_hash/Makefile
> > +++ b/lib/librte_hash/Makefile
> > @@ -50,6 +50,7 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_hash_crc.h
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_jhash.h
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_fbk_hash.h
> > +SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_thash.h
> >
> >  # this lib needs eal
> >  DEPDIRS-$(CONFIG_RTE_LIBRTE_HASH) += lib/librte_eal lib/librte_malloc
> > diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
> > new file mode 100644
> > index 000..1acfa3a
> > --- /dev/null
> > +++ b/lib/librte_hash/rte_thash.h
> > @@ -0,0 +1,179 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + * * Redistributions of source code must retain the above copyright
> > + *   notice, this list of conditions and the following disclaimer.
> > + * * Redistributions in binary form must reproduce the above
> copyright
> > + *   notice, this list of conditions and the following disclaimer in
> > + *   the documentation and/or other materials provided with the
> > + *   distribution.
> > + * * Neither the name of Intel Corporation nor the names of its
> > + *   contributors may be used to endorse or promote products derived
> > + *   from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
> USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
> ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
> USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> > + */
> > +
> > +#ifndef _RTE_THASH_H
> > +#define _RTE_THASH_H
> > +
> > +/**
> > + * @file
> > + *
> > + * toeplitz hash functions.
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +/**
> > + * Software implementation of the Toeplitz hash function used by RSS.
> > + * Can be used either for packet distribution on single queue NIC
> > + * or for simulating of RSS computation on specific NIC (for example
> > + * after GRE header decapsulating)
> > + */
> > +
> > +#include 
> > +#include 
> > +
> > +enum rte_thash_flag {
> > + RTE_THASH_L3 = 0,   //calculate hash tacking into account only
> l3 header
> > + RTE_THASH_L4//calculate hash tacking into account l4 +
> l4 headers
> > +};
> > +
> > +/**
> > + * Prepare special converted key to use with rte_softrss_be()
> > + * @param orig
> > + *   pointer to original RSS key
> > + * @param targ
> > + *   pointer to target RSS key
> > + */
> > +
> > +static inline void
> > +rte_convert_rss_key(

[dpdk-dev] [PATCH] Add toeplitz hash algorithm

2015-04-09 Thread Vladimir Medvedkin

Hi Stephen,



2015-04-09 1:24 GMT+03:00 Stephen Hemminger :

> On Wed,  8 Apr 2015 15:06:13 -0400
> Vladimir Medvedkin  wrote:
>
> > Software implementation of the Toeplitz hash function used by RSS.
> > Can be used either for packet distribution on single queue NIC
> > or for simulating of RSS computation on specific NIC (for example
> > after GRE header decapsulating).
> >
> > Signed-off-by: Vladimir Medvedkin 
>
> > +enum rte_thash_flag {
> > + RTE_THASH_L3 = 0,   //calculate hash tacking into account only
> l3 header
> > + RTE_THASH_L4//calculate hash tacking into account l4 +
> l4 headers
> > +};
> > +
> > +/**
> > + * Prepare special converted key to use with rte_softrss_be()
> > + * @param orig
> > + *   pointer to original RSS key
> > + * @param targ
> > + *   pointer to target RSS key
> > + */
> > +
> > +static inline void
> > +rte_convert_rss_key(uint32_t *orig, uint32_t *targ)
> orig should be const
>
> > +{
> > + int i;
> > + for (i = 0; i < 10; i++) {
> > + targ[i] = rte_be_to_cpu_32(orig[i]);
> > + }
> > +}
>
> > +static inline uint32_t
> > +rte_softrss(uint32_t sip, uint32_t dip, uint16_t sp, uint16_t dp, enum
> rte_thash_flag flag, uint32_t *rss_key)
>
> rss_key should be const
>
> > +{
> > + uint32_t ret = 0;
> > + int i;
> > + for (i = 0; i < 32; i++) {
> blank line after declaration please
>
> > + if (sip & (1 << (31 - i))) {
> > + ret ^= (rte_cpu_to_be_32(*rss_key) <<
> i)|(uint32_t)((uint64_t)(rte_cpu_to_be_32(*(rss_key + 1))) >> (32 - i));
>
> Long expression > 80 characters.
> Repeated multiple times (should be inline)
> Extra parens ()
>
Thanks for the remarks, I'll fix it.

> Extension to 64 bits is only to avoid compiler warning?
>
No. When i = 0 the count (32 - i) is 32, so we would shift a uint32_t right
by 32 bits, which is undefined behaviour. In practice x86 masks the shift
count to 5 bits, so the effective count is limited to 0..31 and a shift by 32
typically behaves like a shift by 0.
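
A minimal sketch of that point, pulled out into a helper. The name
key_window() is made up for this example and is not part of the patch; it
computes the same expression as the loop body, with the key words already in
CPU byte order.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit window of the key that starts i bits into `hi` (0 <= i <= 31);
 * `lo` is the next key word. Widening `lo` to 64 bits keeps the shift
 * count (32 - i) legal for i == 0, where it equals 32 and the second
 * term must contribute nothing. */
static inline uint32_t
key_window(uint32_t hi, uint32_t lo, unsigned int i)
{
	return (hi << i) | (uint32_t)((uint64_t)lo >> (32 - i));
}
```

For i == 0 the result is just `hi`; without the 64-bit cast the same
expression could OR in all of `lo` on hardware that masks the count.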

>
>
> > + }
> > + }
> > + rss_key++;
> > + for (i = 0; i < 32; i++) {
> > + if (dip & (1 << (31 - i))) {
> > + ret ^= (rte_cpu_to_be_32(*rss_key) <<
> i)|(uint32_t)((uint64_t)(rte_cpu_to_be_32(*(rss_key + 1))) >> (32 - i));
> > + }
> > + }
> > +if (flag == RTE_THASH_L4) {
> > + rss_key++;
> > + for (i = 0; i < 32; i++) {
> > + if (((sp<<16)|dp) & (1 << (31 - i))) {
> > + ret ^= (rte_cpu_to_be_32(*rss_key) <<
> i)|(uint32_t)((uint64_t)(rte_cpu_to_be_32(*(rss_key + 1))) >> (32 - i));
> > + }
> > + }
> > + }
> > + return ret;
> > +}
> > +
> > +/**
> > + * Optimized implementation.
> > + * If you want the calculated hash value matches NIC RSS value
> > + * you have to use special converted key.
> > + * All ip's and ports have to be CPU byte order.
> > + * @param sip
> > + *   Source ip address.
> > + * @param dip
> > + *   Destination ip address.
> > + * @param sp
> > + *   Source TCP|UDP port.
> > + * @param dp
> > + *   Destination TCP|UDP port.
> > + * @param flag
> > + *   RTE_THASH_L3:   calculate hash tacking into account only sip and
> dip
> > + *   RTE_THASH_L4:   calculate hash tacking into account sip, dip, sp
> and dp
> > + * @param *rss_key
> > + *   Pointer to 40-byte RSS hash key.
> > + * @return
> > + *   Calculated hash value.
> > + */
> > +
> > +static inline uint32_t
> > +rte_softrss_be(uint32_t sip, uint32_t dip, uint16_t sp, uint16_t dp,
> enum rte_thash_flag flag, uint32_t *rss_key)
> > +{
>
> Same problems as previous code.
> Also lots of copy paste (see Do Not Repeat Yourself principle).
>

[dpdk-dev] [PATCH] Add toeplitz hash algorithm

2015-04-08 Thread Vladimir Medvedkin

Software implementation of the Toeplitz hash function used by RSS.
Can be used either for packet distribution on single queue NIC
or for simulating of RSS computation on specific NIC (for example
after GRE header decapsulating).

Signed-off-by: Vladimir Medvedkin 
---
 lib/librte_hash/Makefile|   1 +
 lib/librte_hash/rte_thash.h | 179 
 2 files changed, 180 insertions(+)
 create mode 100644 lib/librte_hash/rte_thash.h

diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile
index 3696cb1..083a9e5 100644
--- a/lib/librte_hash/Makefile
+++ b/lib/librte_hash/Makefile
@@ -50,6 +50,7 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_hash_crc.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_jhash.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_fbk_hash.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_thash.h

 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_HASH) += lib/librte_eal lib/librte_malloc
diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
new file mode 100644
index 000..1acfa3a
--- /dev/null
+++ b/lib/librte_hash/rte_thash.h
@@ -0,0 +1,179 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_THASH_H
+#define _RTE_THASH_H
+
+/**
+ * @file
+ *
+ * toeplitz hash functions.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Software implementation of the Toeplitz hash function used by RSS.
+ * Can be used either for packet distribution on single queue NIC
+ * or for simulating of RSS computation on specific NIC (for example
+ * after GRE header decapsulating)
+ */
+
+#include 
+#include 
+
+enum rte_thash_flag {
+   RTE_THASH_L3 = 0,   //calculate hash taking into account only l3 header
+   RTE_THASH_L4        //calculate hash taking into account l3 + l4 headers
+};
+
+/**
+ * Prepare special converted key to use with rte_softrss_be()
+ * @param orig
+ *   pointer to original RSS key
+ * @param targ
+ *   pointer to target RSS key
+ */
+
+static inline void
+rte_convert_rss_key(uint32_t *orig, uint32_t *targ)
+{
+   int i;
+   for (i = 0; i < 10; i++) {
+   targ[i] = rte_be_to_cpu_32(orig[i]);
+   }
+}
+
+/**
+ * Generic implementation. Can be used with original rss_key
+ * All ip's and ports have to be CPU byte order.
+ * @param sip
+ *   Source ip address.
+ * @param dip
+ *   Destination ip address.
+ * @param sp
+ *   Source TCP|UDP port.
+ * @param dp
+ *   Destination TCP|UDP port.
+ * @param flag
+ *   RTE_THASH_L3: calculate hash tacking into account only sip and dip
+ *   RTE_THASH_L4: calculate hash tacking into account sip, dip, sp and dp
+ * @param *rss_key
+ *   Pointer to 40-byte RSS hash key.
+ * @return
+ *   Calculated hash value.
+ */
+
+static inline uint32_t
+rte_softrss(uint32_t sip, uint32_t dip, uint16_t sp, uint16_t dp, enum 
rte_thash_flag flag, uint32_t *rss_key)
+{
+   uint32_t ret = 0;
+   int i;
+   for (i = 0; i < 32; i++) {
+   if (sip & (1 << (31 - i))) {
+   ret ^= (rte_cpu_to_be_32(*rss_key) << 
i)|(uint32_t)((uint64_t)(rte_cpu_to_be_32(*(rss_key + 1))) >> (32 - i));
+   }
+   }

[dpdk-dev] Symmetric RSS Hashing, Part 2

2015-03-30 Thread Vladimir Medvedkin

Matthew,

I don't use any special tricks to make symmetric RSS work. Furthermore, it
works not only with 0x6d5a.

Regards,
Vladimir

2015-03-28 23:11 GMT+03:00 Matthew Hall :

> On Sat, Mar 28, 2015 at 12:10:20PM +0300, Vladimir Medvedkin wrote:
> > I just verify RSS symmetric in my code, all works great.
> > ...
> > By the way, maybe it will be usefull to add softrss function in DPDK?
>
> Vladimir,
>
> All of this is super-awesome code. I agree having SW RSS would be quite
> nice.
> Then you could more easily support things like virtio-net and other stuff
> which doesn't have RSS.
>
> Did you have to use any special tricks to get the 0x6d5a to work? I wasn't
> quite
> sure how to initialize that and get it to run right.
>
> Matthew.
>

[dpdk-dev] Symmetric RSS Hashing, Part 2

2015-03-28 Thread Vladimir Medvedkin

Hi Matthew,

I just verified symmetric RSS in my code, and it all works great. I have an
82599 NIC and DPDK 1.7.0. Moreover, the RSS hash key does not have to be
0x6d5a repeated: any 2 random bytes repeated work for the 4-tuple, and any
4 bytes repeated work for the 2-tuple. Below is some code:
uint8_t my_rss_key[40];
static const struct rte_eth_conf port_conf = {
        .rxmode = {
                .mq_mode = ETH_MQ_RX_RSS,
                .split_hdr_size = 0,
                .header_split   = 0, /**< Header Split disabled */
                .hw_ip_checksum = 0, /**< IP checksum offload disabled */
                .hw_vlan_filter = 0, /**< VLAN filtering disabled */
                .jumbo_frame    = 0, /**< Jumbo Frame Support disabled */
                .hw_strip_crc   = 0, /**< CRC stripping by hardware disabled */
        },
        .txmode = {
                .mq_mode = ETH_MQ_TX_NONE,
        },
        .rx_adv_conf.rss_conf = {
                .rss_key = my_rss_key,
                .rss_hf  = ETH_RSS_IPV4|ETH_RSS_IPV4_TCP,
        },
};
...
int i;
uint8_t a, b;

/* two random bytes, repeated over the whole 40-byte key */
a = rte_rand();
b = rte_rand();
for (i = 0; i < 40; i++) {
        switch (i & 0x1) {
        case 0: my_rss_key[i] = a; break;
        case 1: my_rss_key[i] = b; break;
        }
}

ret = rte_eth_dev_configure(portid, 1, 1, &port_conf);
...
static uint32_t
softrss(uint32_t sip, uint32_t dip, uint16_t sp, uint16_t dp, int l4flag,
        uint32_t *rss_key)
{
        uint32_t ret = 0;
        uint32_t ports = ((uint32_t)sp << 16) | dp;
        int i;

        /* The next key word is widened to 64 bits because for i == 0 the
         * right shift count is 32, which is undefined for a uint32_t. */
        for (i = 0; i < 32; i++) {
                if (sip & (1U << (31 - i))) {
                        ret ^= (rte_cpu_to_be_32(*rss_key) << i) |
                                (uint32_t)((uint64_t)rte_cpu_to_be_32(*(rss_key + 1)) >> (32 - i));
                }
        }
        rss_key++;
        for (i = 0; i < 32; i++) {
                if (dip & (1U << (31 - i))) {
                        ret ^= (rte_cpu_to_be_32(*rss_key) << i) |
                                (uint32_t)((uint64_t)rte_cpu_to_be_32(*(rss_key + 1)) >> (32 - i));
                }
        }
        rss_key++;
        if (l4flag == 1) {
                for (i = 0; i < 32; i++) {
                        if (ports & (1U << (31 - i))) {
                                ret ^= (rte_cpu_to_be_32(*rss_key) << i) |
                                        (uint32_t)((uint64_t)rte_cpu_to_be_32(*(rss_key + 1)) >> (32 - i));
                        }
                }
        }
        return ret;
}
...
uint32_t rss = softrss(rte_be_to_cpu_32(ipv4_hdr->src_addr),
        rte_be_to_cpu_32(ipv4_hdr->dst_addr),
        rte_be_to_cpu_16(tcp_hdr->src_port),
        rte_be_to_cpu_16(tcp_hdr->dst_port), 1, (uint32_t *)my_rss_key);
printf("RSS %u \t\t softRSS %u\n", m->pkt.hash.rss, rss);
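
The reason a repeated 2-byte key gives a symmetric hash can be checked
without a NIC. The sketch below is a plain byte-wise Toeplitz hash written
only for illustration; it is not the softrss() code and not a DPDK API, and
all helper names are made up. Swapping sip and dip moves every input bit by
32 positions, and swapping the ports moves them by 16, so with a key whose
period is 16 bits each bit still selects the same 32-bit key window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit key window starting at bit offset `bit` (key read MSB first). */
static uint32_t
key_window_at(const uint8_t *key, size_t bit)
{
	size_t byte = bit / 8;
	unsigned int off = (unsigned int)(bit % 8);
	uint64_t v = 0;
	int k;

	/* Gather 5 bytes so any bit offset 0..7 still yields 32 bits. */
	for (k = 0; k < 5; k++)
		v = (v << 8) | key[byte + k];
	return (uint32_t)(v >> (8 - off));
}

/* Toeplitz hash over `len` input bytes, most significant bit first. */
static uint32_t
toeplitz(const uint8_t *key, const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	size_t bit;

	for (bit = 0; bit < len * 8; bit++) {
		if (data[bit / 8] & (0x80u >> (bit % 8)))
			hash ^= key_window_at(key, bit);
	}
	return hash;
}

/* Hash an IPv4 4-tuple given in CPU byte order. */
static uint32_t
hash_4tuple(const uint8_t *key, uint32_t sip, uint32_t dip,
	    uint16_t sp, uint16_t dp)
{
	uint8_t in[12] = {
		(uint8_t)(sip >> 24), (uint8_t)(sip >> 16),
		(uint8_t)(sip >> 8), (uint8_t)sip,
		(uint8_t)(dip >> 24), (uint8_t)(dip >> 16),
		(uint8_t)(dip >> 8), (uint8_t)dip,
		(uint8_t)(sp >> 8), (uint8_t)sp,
		(uint8_t)(dp >> 8), (uint8_t)dp,
	};

	return toeplitz(key, in, sizeof(in));
}

/* Fill a 40-byte key with the 2-byte pattern a, b repeated. */
static void
fill_key(uint8_t key[40], uint8_t a, uint8_t b)
{
	int i;

	for (i = 0; i < 40; i++)
		key[i] = (i & 1) ? b : a;
}

/* Returns 1 if both directions of the flow hash to the same value. */
static int
is_symmetric(uint8_t a, uint8_t b, uint32_t sip, uint32_t dip,
	     uint16_t sp, uint16_t dp)
{
	uint8_t key[40];

	fill_key(key, a, b);
	return hash_4tuple(key, sip, dip, sp, dp) ==
	       hash_4tuple(key, dip, sip, dp, sp);
}
```

is_symmetric() holds for any pair of bytes a, b, not only 0x6d and 0x5a.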

By the way, maybe it would be useful to add a softrss function to DPDK?

2015-03-27 17:37 GMT+03:00 Matthew Hall :

> On Mar 26, 2015, at 10:30 PM, Zhang, Helin  wrote:
> > Hi guys
> >
> > Did you guys talk about symmetric hash in software or in hardware?
> >
> > If about hardware, I have one comment.
> > I40e supports symmetric hash by hardware, which was enabled in i40e PMD
> recently. You can have a try.
> >
> > Regards,
> > Helin
>
> Hello Helin,
>
> Very few of us have that hardware or driver yet.
>
> It's also quite costly if you're doing the open-source model like I am.
>
> Is there any way to get the symmetric mode to work for IGB or IXGBE?
>
> Matthew.
>
>

[dpdk-dev] [PATCH v2 0/4] NIC filters support for generic filter

2014-06-14 Thread Vladimir Medvedkin

Hi Jingjing,

Ok!
Let's get back to this patch after the 1.7 release.

Thanks!

Regards,
Vladimir


2014-06-14 5:00 GMT+04:00 Wu, Jingjing :

>  Hi, Vladimir
>
>
>
> Yes, for Fortville, uint8_t is not enough, it was also the concern is to
> keep consistent with flow director's implementation. But I agree that we
> need to change.
>
>
>
> Let make an agreement like:
>
>
>
> I will make change for the remarks from you. One is the change the type of
> rx_queue to uint16_t. The other is change API like
> "rte_eth_dev_add_syn_filter(uint8_t port_id, struct rte_syn_filter *filter,
> uint16_t rx_queue)".
>
>
>
> And about the pool and virtualization case, maybe you will send a new
> patch about it, maybe me. Whatever, just leave it in future, not  include
> in this patch.
>
>
>
> Thank you!
>
> Jingjing
>
>
>
> *From:* Vladimir Medvedkin [mailto:medvedkinv at gmail.com]
> *Sent:* Saturday, June 14, 2014 12:19 AM
> *To:* Wu, Jingjing
> *Cc:* Thomas Monjalon; dev at dpdk.org
>
> *Subject:* Re: [dpdk-dev] [PATCH v2 0/4] NIC filters support for generic
> filter
>
>
>
> Hi Jingjing,
>
> Yes, I agree.
> I have one more remark. It is about type of rx_queue arg. Now it is
> uint8_t. I think we have to change it to uint16_t because for Fortville NIC
> it is not enough. Quote from the datasheet:
> A PF VSI (Virtual Station Interfaces aka virtual NICs) can allocate and
> use up to 1536 LQPs (LAN queue pairs).
>
> Regards,
>
> Vladimir
>
>
>
> 2014-06-13 18:12 GMT+04:00 Wu, Jingjing :
>
>  Hi, Vladimir
>
>
>
> Thanks a lot for your remarks.
>
>
>
> Yes, your understanding is correct, in non-IOV mode, we can use 64pool,
> per pool can has 2 queues when ETH_MQ_RX_VMDQ_ONLY.  While in IOV mode,
> current DPDK version makes the number of queue to 1 by default. The pools
> logic makes sense, but I didn't consider it globally with the thinking we
> can do it in future. I will be great if you can generate a new patch based
> on mine. Or we can discuss it further? Due to it is close to the feature
> deliver time now and much verification work for it, it may not possible to
> add it in this patch.
>
>
>
> In API
>
> About your first remark, the reason why I didn't put the queue in the
> filter structure is that the filter contains the fields used for comparison
> and the queue is acted as result, and another concern is to keep consistent
> with flow director's implementation.
>
> About your second remark, I will accept it and integrate the change to
> patch in new version.
>
>
>
> Do your  agree my proposal?
>
>
>
>
>
> *From:* Vladimir Medvedkin [mailto:medvedkinv at gmail.com]
> *Sent:* Friday, June 13, 2014 7:52 PM
> *To:* Thomas Monjalon
> *Cc:* Wu, Jingjing; dev at dpdk.org
>
>
> *Subject:* Re: [dpdk-dev] [PATCH v2 0/4] NIC filters support for generic
> filter
>
>
>
> Hi all,
>
> The 82599 datasheet (p. 284 and p.287) has only recommendations and only
> when possible about assign rx queue not used by RSS/DCB. I do not see any
> serious restrictions do not assign the rx queue used by RSS/DCB.
>
> For cases with only 1 queue if I understand correctly this patch
> http://dpdk.org/ml/archives/dev/2014-May/002589.html we can init second
> queue in pool and assign it by filter. In *ETH_MQ_RX_VMDQ_ONLY*  mode
> init all possible queues (even if hardware route packets to zero queue in
> pools) so there no problem. Moreover, it is not necesary for rx queue to be
> set in the same pool.
>
>
> About genericity. I agree with Jingjing, different controllers have
> different definitions for pools or VFs. And it is only Intel controllers!
> It is very hard to predict hardware implementation. For example for
> Fortville I can not find 5-tuple filters at all.
>
>
>
> API. I have several remarks.
>
> 1. You use rx_queue as separate arg. For example:
>
> rte_eth_dev_add_ethertype_filter(uint8_t port_id, uint16_t index, struct
> rte_ethertype_filter *filter, uint8_t rx_queue)
> rte_eth_dev_get_ethertype_filter(uint8_t port_id, uint16_t index, struct
> rte_ethertype_filter *filter, uint8_t *rx_queue)
>
> you can move uint8_t rx_queue into struct rte_ethertype_filter *filter.
>
> 2. In SYN filter:
> rte_eth_dev_add_syn_filter(uint8_t port_id, uint8_t high_pri, uint8_t
> rx_queue)
> rte_eth_dev_get_syn_filter(uint8_t port_id, struct rte_syn_filter *filter,
> uint8_t *rx_queue)
>
> In first ADD func you alloc struct rte_syn_filter inside func, but in GET
> func you have to alloc struct rte_syn_filter in your app. May be better to
> do
> rte_eth_dev_add_syn_filter(uint8_t port_id, struct rte_syn_filter *filter,

[dpdk-dev] [PATCH v2 0/4] NIC filters support for generic filter

2014-06-13 Thread Vladimir Medvedkin

Hi Jingjing,

Yes, I agree.
I have one more remark, about the type of the rx_queue arg. Now it is
uint8_t. I think we have to change it to uint16_t because uint8_t is not
enough for the Fortville NIC. Quote from the datasheet:
A PF VSI (Virtual Station Interfaces aka virtual NICs) can allocate and use
up to 1536 LQPs (LAN queue pairs).
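
A small sketch of the problem, using made-up helper names: an 8-bit field
can only address queues 0..255, so a queue index near the top of a 1536-queue
range is silently truncated.

```c
#include <assert.h>
#include <stdint.h>

/* Queue index as it would end up in an 8-bit rx_queue field. */
static uint8_t
as_u8_queue(unsigned int queue_id)
{
	return (uint8_t)queue_id;
}

/* Queue index as it would end up in a 16-bit rx_queue field. */
static uint16_t
as_u16_queue(unsigned int queue_id)
{
	return (uint16_t)queue_id;
}
```

Queue 1535 (0x5ff) becomes 255 in the 8-bit field but is kept intact in the
16-bit one.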

Regards,
Vladimir


2014-06-13 18:12 GMT+04:00 Wu, Jingjing :

>  Hi, Vladimir
>
>
>
> Thanks a lot for your remarks.
>
>
>
> Yes, your understanding is correct, in non-IOV mode, we can use 64pool,
> per pool can has 2 queues when ETH_MQ_RX_VMDQ_ONLY.  While in IOV mode,
> current DPDK version makes the number of queue to 1 by default. The pools
> logic makes sense, but I didn't consider it globally with the thinking we
> can do it in future. I will be great if you can generate a new patch based
> on mine. Or we can discuss it further? Due to it is close to the feature
> deliver time now and much verification work for it, it may not possible to
> add it in this patch.
>
>
>
> In API
>
> About your first remark, the reason why I didn't put the queue in the
> filter structure is that the filter contains the fields used for comparison
> and the queue is acted as result, and another concern is to keep consistent
> with flow director's implementation.
>
> About your second remark, I will accept it and integrate the change to
> patch in new version.
>
>
>
> Do your  agree my proposal?
>
>
>
>
>
> *From:* Vladimir Medvedkin [mailto:medvedkinv at gmail.com]
> *Sent:* Friday, June 13, 2014 7:52 PM
> *To:* Thomas Monjalon
> *Cc:* Wu, Jingjing; dev at dpdk.org
>
> *Subject:* Re: [dpdk-dev] [PATCH v2 0/4] NIC filters support for generic
> filter
>
>
>
> Hi all,
>
> The 82599 datasheet (p. 284 and p.287) has only recommendations and only
> when possible about assign rx queue not used by RSS/DCB. I do not see any
> serious restrictions do not assign the rx queue used by RSS/DCB.
>
> For cases with only 1 queue if I understand correctly this patch
> http://dpdk.org/ml/archives/dev/2014-May/002589.html we can init second
> queue in pool and assign it by filter. In *ETH_MQ_RX_VMDQ_ONLY*  mode
> init all possible queues (even if hardware route packets to zero queue in
> pools) so there no problem. Moreover, it is not necesary for rx queue to be
> set in the same pool.
>
>
> About genericity. I agree with Jingjing, different controllers have
> different definitions for pools or VFs. And it is only Intel controllers!
> It is very hard to predict hardware implementation. For example for
> Fortville I can not find 5-tuple filters at all.
>
>
>
> API. I have several remarks.
>
> 1. You use rx_queue as separate arg. For example:
>
> rte_eth_dev_add_ethertype_filter(uint8_t port_id, uint16_t index, struct
> rte_ethertype_filter *filter, uint8_t rx_queue)
> rte_eth_dev_get_ethertype_filter(uint8_t port_id, uint16_t index, struct
> rte_ethertype_filter *filter, uint8_t *rx_queue)
>
> you can move uint8_t rx_queue into struct rte_ethertype_filter *filter.
>
> 2. In SYN filter:
> rte_eth_dev_add_syn_filter(uint8_t port_id, uint8_t high_pri, uint8_t
> rx_queue)
> rte_eth_dev_get_syn_filter(uint8_t port_id, struct rte_syn_filter *filter,
> uint8_t *rx_queue)
>
> In first ADD func you alloc struct rte_syn_filter inside func, but in GET
> func you have to alloc struct rte_syn_filter in your app. May be better to
> do
> rte_eth_dev_add_syn_filter(uint8_t port_id, struct rte_syn_filter *filter,
> uint8_t *rx_queue) ?
>
>
>
> So, Jingjing made a lot of work, much more then I (igb filters, testpmd
> commands). It works the same as mine (not counting pools logic), so let's
> integrate it (it's will be great if jingjing change api according to my
> remarks).
>
>
>
> Regards,
>
> Vladimir
>
>
>
> 2014-06-12 19:36 GMT+04:00 Thomas Monjalon :
>
> > 2014-06-11 17:45, Thomas Monjalon:
>
> > > My main concern is that Vladimir Medvedkin suggested another API and
> I'd
> > > like you give your opinion about it:
> > > http://dpdk.org/ml/archives/dev/2014-June/003053.html
> > > It offers pool number in configuration of the filters.
>
> 2014-06-12 08:08, Wu, Jingjing:
>
> > The pool field is used in virtualization scenario. It is acting as one of
> > input set during filter matching in ixgbe.
> > My patch didn't consider the virtualization scenario in generic filter
> > feature. Because in 82599 datasheet, it is recommended to assign rx
> queues
> > not used by DCB/RSS, that is virtualization without RSS and DCB mode. For
> > this mode, current DPDK version makes the number of queue to 1 by
> default in

[dpdk-dev] [PATCH v2 0/4] NIC filters support for generic filter

2014-06-13 Thread Vladimir Medvedkin

Hi all,

The 82599 datasheet (p. 284 and p. 287) only recommends, and only when
possible, assigning an rx queue that is not used by RSS/DCB. I do not see any
serious restriction against assigning an rx queue that is used by RSS/DCB.
For cases with only 1 queue, if I understand this patch
http://dpdk.org/ml/archives/dev/2014-May/002589.html correctly, we can init a
second queue in the pool and assign it by filter. In *ETH_MQ_RX_VMDQ_ONLY*
mode all possible queues are initialized (even if hardware routes packets to
queue zero in pools), so there is no problem. Moreover, it is not necessary
for the rx queue to be in the same pool.

About genericity. I agree with Jingjing: different controllers have
different definitions for pools or VFs, and that is only among Intel
controllers! It is very hard to predict hardware implementation. For example,
for Fortville I cannot find 5-tuple filters at all.

API. I have several remarks.

1. You pass rx_queue as a separate arg. For example:
rte_eth_dev_add_ethertype_filter(uint8_t port_id, uint16_t index, struct
rte_ethertype_filter *filter, uint8_t rx_queue)
rte_eth_dev_get_ethertype_filter(uint8_t port_id, uint16_t index, struct
rte_ethertype_filter *filter, uint8_t *rx_queue)
You could move uint8_t rx_queue into struct rte_ethertype_filter.

2. In the SYN filter:
rte_eth_dev_add_syn_filter(uint8_t port_id, uint8_t high_pri, uint8_t
rx_queue)
rte_eth_dev_get_syn_filter(uint8_t port_id, struct rte_syn_filter *filter,
uint8_t *rx_queue)
The ADD func allocates struct rte_syn_filter inside the func, but for the GET
func the app has to allocate struct rte_syn_filter itself. Maybe it would be
better to do
rte_eth_dev_add_syn_filter(uint8_t port_id, struct rte_syn_filter *filter,
uint8_t *rx_queue) ?
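
To make the two remarks concrete, here is a rough sketch of what such an API
shape could look like. Everything in it is hypothetical: the example_* names,
the struct layout and the in-memory table are assumptions for illustration
only and do not come from either patch. The point is that ADD and GET take
the same caller-allocated struct, and the target queue travels inside it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_PORTS 4

/* Hypothetical filter layout: match field plus the target queue. */
struct example_syn_filter {
	uint8_t  hi_pri;   /* match field */
	uint16_t rx_queue; /* action: queue that receives matching packets */
};

/* Hypothetical per-port storage standing in for the NIC registers. */
static struct example_syn_filter example_table[EXAMPLE_MAX_PORTS];
static int example_in_use[EXAMPLE_MAX_PORTS];

/* ADD: the caller allocates and fills the filter. */
static int
example_add_syn_filter(uint8_t port_id, const struct example_syn_filter *filter)
{
	if (port_id >= EXAMPLE_MAX_PORTS || filter == NULL)
		return -1;
	example_table[port_id] = *filter;
	example_in_use[port_id] = 1;
	return 0;
}

/* GET: the caller allocates the filter, the function fills it in. */
static int
example_get_syn_filter(uint8_t port_id, struct example_syn_filter *filter)
{
	if (port_id >= EXAMPLE_MAX_PORTS || filter == NULL ||
	    !example_in_use[port_id])
		return -1;
	*filter = example_table[port_id];
	return 0;
}
```

A filter added on a port can be read back unchanged, including its queue,
with the same struct type on both sides.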

So, Jingjing did a lot of work, much more than I did (igb filters, testpmd
commands). It works the same as mine (not counting the pools logic), so let's
integrate it (it would be great if Jingjing changed the API according to my
remarks).

Regards,
Vladimir


2014-06-12 19:36 GMT+04:00 Thomas Monjalon :

> > 2014-06-11 17:45, Thomas Monjalon:
> > > My main concern is that Vladimir Medvedkin suggested another API and
> I'd
> > > like you give your opinion about it:
> > > http://dpdk.org/ml/archives/dev/2014-June/003053.html
> > > It offers pool number in configuration of the filters.
>
> 2014-06-12 08:08, Wu, Jingjing:
> > The pool field is used in virtualization scenario. It is acting as one of
> > input set during filter matching in ixgbe.
> > My patch didn't consider the virtualization scenario in generic filter
> > feature. Because in 82599 datasheet, it is recommended to assign rx
> queues
> > not used by DCB/RSS, that is virtualization without RSS and DCB mode. For
> > this mode, current DPDK version makes the number of queue to 1 by
> default in
> > IOV mode. So in this case it makes no sense make pool as a input set and
> the
> > rx queue also need to be set to in this pool, so just keep the consistent
> > with flow director who also ignore it in previous version.
> > And further E1000/Niantic/Fortville have different definitions for VF, we
> > need to think how to define it more generic.
> > And even just need offer pool number in configuration of the filters as
> what
> > Vladimir did, it also need to verify the interworking with Virtualization
> > for different kinds of NICs, and the interworking with DCB and RSS which
> is
> > not recommended in 82599's datasheet.
> > So I think it will be a good choice to implement generic filter
> interworking
> > with virtualization in future patch. If there is any volunteer to send
> patch
> > for support this concern later, it will be also cool.
>
> Vladimir, do you agree with this analysis?
> As you suggested another implementation, I need you acknowledgment for this
> patchset to be integrated.
>
> Thanks
> --
> Thomas
>

[dpdk-dev] [PATCH 0/3] ixgbe: Add L2 Ethertype, SYN and Five tuple queue filters

2014-06-04 Thread Vladimir Medvedkin

Hi Thomas,

Sorry for late reply, I'm on vacation now.

1. I'm not sure about NICs other than Intel. For Intel NICs the API is
generic enough, even more generic than Jingjing's API because of the pool
logic. Besides, I think it would be more proper to make rx_queue part of the
filter struct in Jingjing's etype and 5-tuple filter implementations.

2. I'll try to send a checked patch today.

Regards,
Vladimir.


2014-05-28 3:09 GMT+04:00 Thomas Monjalon :

> Hi Vladimir,
>
> Seems like hardware filtering becomes useful these days :)
>
> 2014-05-19 19:51, Vladimir Medvedkin:
> > This patchset adds in addition to the Flow Director filters L2 Ethertype,
> > SYN and Five tuple queue filters to route packets according to ethertype,
> > l4 proto, source/destination ip/ports pool and presence of SYN flag in
> TCP
> > packet. Unlike http://dpdk.org/ml/archives/dev/2014-May/002512.html this
> > gives capability to work with pools. This patch functionality can be
> merged
> > with the patch above.
>
> 2 comments:
>
> 1) Do you have a good confidence that this new API is generic enough to be
> used by other NICs than ixgbe?
>
> 2) Could you try to check your patches with the kernel script
> checkpatch.pl,
> please?
>
> Thanks
> --
> Thomas
>

[dpdk-dev] [PATCH v2 3/3] ixgbe: Add five tuple filter for ixgbe

2014-06-04 Thread Vladimir Medvedkin

This patch adds the ability to route packets to a certain queue according to
source and destination ip/ports, L4 proto and pool.

Signed-off-by: Vladimir Medvedkin 
---
 lib/librte_ether/rte_ethdev.c   |  81 ++
 lib/librte_ether/rte_ethdev.h   |  96 ++
 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   8 +++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 115 
 4 files changed, 300 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 4e25e59..6e6e282 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1195,6 +1195,87 @@ rte_eth_dev_get_vlan_offload(uint8_t port_id)
 }

 int
+rte_eth_dev_5tuple_add_filter(uint8_t port_id, uint8_t filter_id, struct 
rte_eth_5tuple_filter *filter)
+{
+   struct rte_eth_dev *dev;
+   struct rte_eth_dev_info info;
+
+   rte_eth_dev_info_get(port_id, &info);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   if (filter == NULL) {
+   PMD_DEBUG_TRACE("Invalid filter pointer\n");
+   return (-EINVAL);
+   }
+
+   if (filter->rx_queue >= info.max_rx_queues) {
+   PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", filter->rx_queue);
+   return (-EINVAL);
+   }
+
+   if ((filter->proto == RTE_5TUPLE_PROTO_OTHER) && (filter->mask & 
(ETH_5TUPLE_MASK_SRCPORT|ETH_5TUPLE_MASK_DSTPORT))) {
+   PMD_DEBUG_TRACE("L4 protocol is not TCP, UDP or SCTP, ports are meaningless\n");
+   return(-EINVAL);
+   }
+
+   if (filter->mask & ETH_5TUPLE_MASK_POOL) {
+   if (dev->data->dev_conf.rxmode.mq_mode < ETH_MQ_RX_VMDQ_ONLY) {
+   PMD_DEBUG_TRACE("Port %d is in non-VT mode\n", port_id);
+   return (-EINVAL);
+   }
+   if (filter->pool >= 
dev->data->dev_conf.rx_adv_conf.vmdq_rx_conf.nb_queue_pools) {
+   PMD_DEBUG_TRACE("Invalid pool number %d\n", 
filter->pool);
+   return (-EINVAL);
+   }
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->add_5tuple_filter, -ENOTSUP);
+   return (*dev->dev_ops->add_5tuple_filter)(dev, filter_id, filter);
+}
+
+int
+rte_eth_dev_5tuple_get_filter(uint8_t port_id, uint8_t filter_id, struct 
rte_eth_5tuple_filter *filter)
+{
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   if (filter == NULL) {
+   PMD_DEBUG_TRACE("Invalid filter pointer\n");
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_5tuple_filter, -ENOTSUP);
+   return (*dev->dev_ops->get_5tuple_filter)(dev, filter_id, filter);
+}
+
+int
+rte_eth_dev_5tuple_del_filter(uint8_t port_id, uint8_t filter_id)
+{
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->del_5tuple_filter, -ENOTSUP);
+   return (*dev->dev_ops->del_5tuple_filter)(dev, filter_id);
+}
+int
 rte_eth_dev_synq_add_filter(uint8_t port_id, struct rte_eth_synq_filter 
*filter)
 {
struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 6b90aed..2f38f4b 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -363,6 +363,14 @@ struct rte_eth_rss_conf {
 /* Definitions used for unicast hash  */
 #define ETH_VMDQ_NUM_UC_HASH_ARRAY  128 /**< Maximum nb. of UC hash array. */

+/* Definitions used for 5 tuple filters  */
+#define ETH_5TUPLE_MASK_SRCIP  0x1
+#define ETH_5TUPLE_MASK_DSTIP  0x2
+#define ETH_5TUPLE_MASK_SRCPORT0x4
+#define ETH_5TUPLE_MASK_DSTPORT0x8
+#define ETH_5TUPLE_MASK_PROTO  0x10
+#define ETH_5TUPLE_MASK_POOL   0x20
+
 /* Definitions used for L2 Ether type filters  */
 #define ETH_L2ETYPE_UP_EN  0x1
 #define ETH_L2ETYPE_POOL_EN0x2
@@ -562,6 +570,31 @@ struct rte_eth_pfc_conf {
 };

 /**
+ *  Possible l4type of 5 tuple filters.
+ */
+enum rte_5tuple_proto {
+   RTE_5TUPLE_PROTO_TCP = 0,   /**< TCP. */
+   RTE_5TUPLE_PROTO_UDP,   /**< UDP. */
+   RTE_5TUPLE_PROTO_SCTP,  /**< SCTP. */
+   RTE_5TUPLE_PROTO_OTHER, /**< Other. */
+};
+
+/**
+ * A structur

[dpdk-dev] [PATCH v2 2/3] ixgbe: Add syn queue filter for ixgbe

2014-06-04 Thread Vladimir Medvedkin

This patch adds ability to route TCP packets according to SYN flag presence to 
certain queue.

Signed-off-by: Vladimir Medvedkin 
---
 lib/librte_ether/rte_ethdev.c   | 66 +
 lib/librte_ether/rte_ethdev.h   | 63 +++
 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |  6 +++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 57 +++-
 4 files changed, 191 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ad19817..4e25e59 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1195,6 +1195,72 @@ rte_eth_dev_get_vlan_offload(uint8_t port_id)
 }

 int
+rte_eth_dev_synq_add_filter(uint8_t port_id, struct rte_eth_synq_filter 
*filter)
+{
+   struct rte_eth_dev *dev;
+   struct rte_eth_dev_info info;
+
+   rte_eth_dev_info_get(port_id, &info);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   if (filter == NULL) {
+   PMD_DEBUG_TRACE("Invalid filter pointer\n");
+   return (-EINVAL);
+   }
+
+   if (filter->rx_queue >= info.max_rx_queues) {
+   PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", filter->rx_queue);
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->synq_add_filter, -ENOTSUP);
+   return (*dev->dev_ops->synq_add_filter)(dev, filter);
+}
+
+int
+rte_eth_dev_synq_get_filter(uint8_t port_id, struct rte_eth_synq_filter 
*filter)
+{
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   if (filter == NULL) {
+   PMD_DEBUG_TRACE("Invalid filter pointer\n");
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->synq_get_filter, -ENOTSUP);
+   return (*dev->dev_ops->synq_get_filter)(dev, filter);
+}
+
+int
+rte_eth_dev_synq_del_filter(uint8_t port_id)
+{
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->synq_del_filter, -ENOTSUP);
+   return (*dev->dev_ops->synq_del_filter)(dev);
+}
+
+int
 rte_eth_dev_l2_etype_add_filter(uint8_t port_id, uint8_t filter_id, struct 
rte_eth_l2etype_filter *filter)
 {
struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 0e6326e..6b90aed 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -562,6 +562,14 @@ struct rte_eth_pfc_conf {
 };

 /**
+ * A structure used to configure SYN Packet queue Filters
+ */
+struct rte_eth_synq_filter {
+   uint8_t rx_queue;
+   uint8_t synq_first; /**< Defines the priority between SYNQF and 
5-tuple filter.  */
+};
+
+/**
  * A structure used to configure L2 Ethertype Filters
  */
 struct rte_eth_l2etype_filter {
@@ -926,6 +934,15 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. 
*/

+typedef int (*synq_add_filter_t)(struct rte_eth_dev *dev, struct 
rte_eth_synq_filter *filter);
+/**< @internal Setup SYN Packet queue filter */
+
+typedef int (*synq_get_filter_t)(struct rte_eth_dev *dev, struct 
rte_eth_synq_filter *filter);
+/**< @internal Get SYN Packet queue filter */
+
+typedef int (*synq_del_filter_t)(struct rte_eth_dev *dev);
+/**< @internal Delete SYN Packet queue filter */
+
 typedef int (*l2_etype_add_filter_t)(struct rte_eth_dev *dev, uint8_t 
filter_id, struct rte_eth_l2etype_filter *filter);
 /**< @internal Setup a new L2 Ethertype filter */

@@ -1144,6 +1161,12 @@ struct eth_dev_ops {
eth_set_vf_tx_tset_vf_tx;  /**< enable/disable a VF 
transmit */
eth_set_vf_vlan_filter_t   set_vf_vlan_filter;  /**< Set VF VLAN filter 
*/

+   /** Setup a SYN Packet queue filter. */
+   synq_add_filter_t synq_add_filter;
+   /** Get a SYN Packet queue filter. */
+   synq_get_filter_t synq_get_filter;
+   /** Delete a SYN Packet queue filter. */
+   synq_del_filter_t synq_del_filter;
/** Setup a L2 Ethertype filter. */
l2_etype_add_filter_t l2_etype_add_filter;
/** Get a L2 Ethertype filter. */
@@ -2139,6 +2162,46 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 #endif

 /**
+ * Setup SYN Packet queue filter
+ * @param port_id
+ *   The port

[dpdk-dev] [PATCH v2 0/3] ixgbe: Add L2 Ethertype, SYN and Five tuple queue filters

2014-06-04 Thread Vladimir Medvedkin

In addition to the Flow Director filters, this patchset adds L2 Ethertype, SYN
and five-tuple queue filters. They route packets according to ethertype, L4
protocol, source/destination IP addresses and ports, pool, and the presence of
the SYN flag in a TCP packet.
Unlike http://dpdk.org/ml/archives/dev/2014-May/002512.html, this gives the
capability to work with pools.
The functionality of this patchset can be merged with the patch above.

V2 changes:
* Fixing various checkpatch.pl errors 

Vladimir Medvedkin (3):
  ixgbe: Add L2 ethertype filter for ixgbe
  ixgbe: Add syn queue filter for ixgbe
  ixgbe: Add five tuple filter for ixgbe

 lib/librte_ether/rte_ethdev.c   | 228 ++
 lib/librte_ether/rte_ethdev.h   | 237 +++
 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |  20 +++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 273 
 4 files changed, 758 insertions(+)

-- 
1.8.3.2

[dpdk-dev] [PATCH 3/3] ixgbe: Add five tuple filter for ixgbe

2014-05-19 Thread Vladimir Medvedkin

This patch adds ability to route packets according to source, destination 
ip/ports, L4 proto and pool to certain queue.
---
 lib/librte_ether/rte_ethdev.c   |  81 ++
 lib/librte_ether/rte_ethdev.h   |  96 ++
 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   8 +++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 115 
 4 files changed, 300 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 4597176..6cf838b 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1195,6 +1195,87 @@ rte_eth_dev_get_vlan_offload(uint8_t port_id)
 }

 int
+rte_eth_dev_5tuple_add_filter(uint8_t port_id, uint8_t filter_id, struct 
rte_eth_5tuple_filter *filter)
+{
+   struct rte_eth_dev *dev;
+   struct rte_eth_dev_info info;
+
+   rte_eth_dev_info_get(port_id, &info);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   if (filter == NULL) {
+   PMD_DEBUG_TRACE("Invalid filter pointer\n");
+   return (-EINVAL);
+   }
+
+   if(filter->rx_queue >= info.max_rx_queues) {
+   PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", filter->rx_queue);
+   return (-EINVAL);
+   }
+
+   if ((filter->proto == RTE_5TUPLE_PROTO_OTHER) && (filter->mask & 
(ETH_5TUPLE_MASK_SRCPORT|ETH_5TUPLE_MASK_DSTPORT))) {
+   PMD_DEBUG_TRACE("L4 protocol is not TCP, UDP or SCTP, ports are meaningless\n");
+   return(-EINVAL);
+   }
+
+   if (filter->mask & ETH_5TUPLE_MASK_POOL) {
+   if(dev->data->dev_conf.rxmode.mq_mode < ETH_MQ_RX_VMDQ_ONLY) {
+   PMD_DEBUG_TRACE("Port %d is in non-VT mode\n", port_id);
+   return (-EINVAL);
+   }
+   if(filter->pool >= 
dev->data->dev_conf.rx_adv_conf.vmdq_rx_conf.nb_queue_pools) {
+   PMD_DEBUG_TRACE("Invalid pool number %d\n", 
filter->pool);
+   return (-EINVAL);
+   }
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->add_5tuple_filter, -ENOTSUP);
+   return (*dev->dev_ops->add_5tuple_filter)(dev, filter_id, filter);
+}
+
+int
+rte_eth_dev_5tuple_get_filter(uint8_t port_id, uint8_t filter_id, struct 
rte_eth_5tuple_filter *filter)
+{
+struct rte_eth_dev *dev;
+
+if (port_id >= nb_ports) {
+PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+return (-ENODEV);
+}
+
+dev = &rte_eth_devices[port_id];
+
+if (filter == NULL) {
+PMD_DEBUG_TRACE("Invalid filter pointer\n");
+return (-EINVAL);
+}
+
+FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_5tuple_filter, -ENOTSUP);
+return (*dev->dev_ops->get_5tuple_filter)(dev, filter_id, filter);
+}
+
+int
+rte_eth_dev_5tuple_del_filter(uint8_t port_id, uint8_t filter_id)
+{
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->del_5tuple_filter, -ENOTSUP);
+   return (*dev->dev_ops->del_5tuple_filter)(dev, filter_id);
+}
+int
 rte_eth_dev_synq_add_filter(uint8_t port_id, struct rte_eth_synq_filter 
*filter)
 {
struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 6b90aed..7f460c8 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -363,6 +363,14 @@ struct rte_eth_rss_conf {
 /* Definitions used for unicast hash  */
 #define ETH_VMDQ_NUM_UC_HASH_ARRAY  128 /**< Maximum nb. of UC hash array. */

+/* Definitions used for 5 tuple filters  */
+#define ETH_5TUPLE_MASK_SRCIP  0x1
+#define ETH_5TUPLE_MASK_DSTIP  0x2
+#define ETH_5TUPLE_MASK_SRCPORT0x4
+#define ETH_5TUPLE_MASK_DSTPORT0x8
+#define ETH_5TUPLE_MASK_PROTO  0x10
+#define ETH_5TUPLE_MASK_POOL   0x20
+
 /* Definitions used for L2 Ether type filters  */
 #define ETH_L2ETYPE_UP_EN  0x1
 #define ETH_L2ETYPE_POOL_EN0x2
@@ -562,6 +570,31 @@ struct rte_eth_pfc_conf {
 };

 /**
+ *  Possible l4type of 5 tuple filters.
+ */
+enum rte_5tuple_proto {
+RTE_5TUPLE_PROTO_TCP = 0,  /**< TCP. */
+RTE_5TUPLE_PROTO_UDP,  /**< UDP. */
+RTE_5TUPLE_PROTO_SCTP, /**< SCTP. */
+RTE_5TUPLE_PROTO_OTHER,/**< Other. */
+};
+
+/**
+ * A structure used to configure Five Tuple Filters
+ */
+struct rte_eth_5tuple_filter {
+   uint32_tsrc;
+   uint32_tdst;
+   uint16_tsrc_port;
+   uint16_tdst_port;
+

[dpdk-dev] [PATCH 2/3] ixgbe: Add syn queue filter for ixgbe

2014-05-19 Thread Vladimir Medvedkin

This patch adds ability to route TCP packets according to SYN flag presence to 
certain queue.
---
 lib/librte_ether/rte_ethdev.c   | 66 +
 lib/librte_ether/rte_ethdev.h   | 63 +++
 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |  6 +++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 57 +++-
 4 files changed, 191 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 5cd0148..4597176 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1195,6 +1195,72 @@ rte_eth_dev_get_vlan_offload(uint8_t port_id)
 }

 int
+rte_eth_dev_synq_add_filter(uint8_t port_id, struct rte_eth_synq_filter 
*filter)
+{
+   struct rte_eth_dev *dev;
+   struct rte_eth_dev_info info;
+
+   rte_eth_dev_info_get(port_id, &info);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   if (filter == NULL) {
+   PMD_DEBUG_TRACE("Invalid filter pointer\n");
+   return (-EINVAL);
+   }
+
+   if(filter->rx_queue >= info.max_rx_queues) {
+   PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", filter->rx_queue);
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->synq_add_filter, -ENOTSUP);
+   return (*dev->dev_ops->synq_add_filter)(dev, filter);
+}
+
+int
+rte_eth_dev_synq_get_filter(uint8_t port_id, struct rte_eth_synq_filter 
*filter)
+{
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   if (filter == NULL) {
+   PMD_DEBUG_TRACE("Invalid filter pointer\n");
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->synq_get_filter, -ENOTSUP);
+   return (*dev->dev_ops->synq_get_filter)(dev, filter);
+}
+
+int
+rte_eth_dev_synq_del_filter(uint8_t port_id)
+{
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->synq_del_filter, -ENOTSUP);
+   return (*dev->dev_ops->synq_del_filter)(dev);
+}
+
+int
 rte_eth_dev_l2_etype_add_filter(uint8_t port_id, uint8_t filter_id, struct 
rte_eth_l2etype_filter *filter)
 {
struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 0e6326e..6b90aed 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -562,6 +562,14 @@ struct rte_eth_pfc_conf {
 };

 /**
+ * A structure used to configure SYN Packet queue Filters
+ */
+struct rte_eth_synq_filter {
+   uint8_t rx_queue;
+   uint8_t synq_first; /**< Defines the priority between SYNQF and 
5-tuple filter.  */
+};
+
+/**
  * A structure used to configure L2 Ethertype Filters
  */
 struct rte_eth_l2etype_filter {
@@ -926,6 +934,15 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. 
*/

+typedef int (*synq_add_filter_t)(struct rte_eth_dev *dev, struct 
rte_eth_synq_filter *filter);
+/**< @internal Setup SYN Packet queue filter */
+
+typedef int (*synq_get_filter_t)(struct rte_eth_dev *dev, struct 
rte_eth_synq_filter *filter);
+/**< @internal Get SYN Packet queue filter */
+
+typedef int (*synq_del_filter_t)(struct rte_eth_dev *dev);
+/**< @internal Delete SYN Packet queue filter */
+
 typedef int (*l2_etype_add_filter_t)(struct rte_eth_dev *dev, uint8_t 
filter_id, struct rte_eth_l2etype_filter *filter);
 /**< @internal Setup a new L2 Ethertype filter */

@@ -1144,6 +1161,12 @@ struct eth_dev_ops {
eth_set_vf_tx_tset_vf_tx;  /**< enable/disable a VF 
transmit */
eth_set_vf_vlan_filter_t   set_vf_vlan_filter;  /**< Set VF VLAN filter 
*/

+   /** Setup a SYN Packet queue filter. */
+   synq_add_filter_t synq_add_filter;
+   /** Get a SYN Packet queue filter. */
+   synq_get_filter_t synq_get_filter;
+   /** Delete a SYN Packet queue filter. */
+   synq_del_filter_t synq_del_filter;
/** Setup a L2 Ethertype filter. */
l2_etype_add_filter_t l2_etype_add_filter;
/** Get a L2 Ethertype filter. */
@@ -2139,6 +2162,46 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 #endif

 /**
+ * Setup SYN Packet queue filter
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param filter
+ *   The pointer to the synq filter structure.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support SYN

[dpdk-dev] [PATCH 1/3] ixgbe: Add L2 ethertype filter for ixgbe

2014-05-19 Thread Vladimir Medvedkin

This patch adds ability to route packets according to ethertype, priority and 
pool to certain queue specified in rx_queue field.
---
 lib/librte_ether/rte_ethdev.c   |  81 +
 lib/librte_ether/rte_ethdev.h   |  78 
 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |   6 ++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 104 
 4 files changed, 269 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a5727dd..5cd0148 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1194,6 +1194,87 @@ rte_eth_dev_get_vlan_offload(uint8_t port_id)
return ret;
 }

+int
+rte_eth_dev_l2_etype_add_filter(uint8_t port_id, uint8_t filter_id, struct 
rte_eth_l2etype_filter *filter)
+{
+   struct rte_eth_dev *dev;
+   struct rte_eth_dev_info info;
+
+   rte_eth_dev_info_get(port_id, &info);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+if (filter == NULL) {
+PMD_DEBUG_TRACE("Invalid filter pointer\n");
+return (-EINVAL);
+}
+
+if(filter->rx_queue >= info.max_rx_queues) {
+PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", filter->rx_queue);
+return (-EINVAL);
+}
+
+   if (filter->etype == ETHER_TYPE_IPv4 || filter->etype == 
ETHER_TYPE_IPv6){
+   PMD_DEBUG_TRACE("IP and IPv6 are not supported in ethertype 
filter\n");
+   return (-EINVAL);
+   }
+
+if(filter->flags & ETH_L2ETYPE_POOL_EN) {
+if(dev->data->dev_conf.rxmode.mq_mode < ETH_MQ_RX_VMDQ_ONLY) {
+PMD_DEBUG_TRACE("Port %d is in non-VT mode\n", 
port_id);
+return (-EINVAL);
+}
+if(filter->pool >= 
dev->data->dev_conf.rx_adv_conf.vmdq_rx_conf.nb_queue_pools) {
+PMD_DEBUG_TRACE("Invalid pool number %d\n", 
filter->pool);
+return (-EINVAL);
+}
+}
+
+FUNC_PTR_OR_ERR_RET(*dev->dev_ops->l2_etype_add_filter, -ENOTSUP);
+return (*dev->dev_ops->l2_etype_add_filter)(dev, filter_id, filter);
+}
+
+int
+rte_eth_dev_l2_etype_get_filter(uint8_t port_id, uint8_t filter_id, struct 
rte_eth_l2etype_filter *filter)
+{
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   if (filter == NULL) {
+   PMD_DEBUG_TRACE("Invalid filter pointer\n");
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->l2_etype_get_filter, -ENOTSUP);
+   return (*dev->dev_ops->l2_etype_get_filter)(dev, filter_id, filter);
+}
+
+int
+rte_eth_dev_l2_etype_del_filter(uint8_t port_id, uint8_t filter_id)
+{
+struct rte_eth_dev *dev;
+
+if (port_id >= nb_ports) {
+PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+return (-ENODEV);
+}
+
+dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->l2_etype_del_filter, -ENOTSUP);
+   return (*dev->dev_ops->l2_etype_del_filter)(dev, filter_id);
+}

 int
 rte_eth_dev_fdir_add_signature_filter(uint8_t port_id,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index dea7471..0e6326e 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -363,6 +363,10 @@ struct rte_eth_rss_conf {
 /* Definitions used for unicast hash  */
 #define ETH_VMDQ_NUM_UC_HASH_ARRAY  128 /**< Maximum nb. of UC hash array. */

+/* Definitions used for L2 Ether type filters  */
+#define ETH_L2ETYPE_UP_EN  0x1
+#define ETH_L2ETYPE_POOL_EN0x2
+
 /* Definitions used for VMDQ pool rx mode setting */
 #define ETH_VMDQ_ACCEPT_UNTAG   0x0001 /**< accept untagged packets. */
 #define ETH_VMDQ_ACCEPT_HASH_MC 0x0002 /**< accept packets in multicast table 
. */
@@ -558,6 +562,17 @@ struct rte_eth_pfc_conf {
 };

 /**
+ * A structure used to configure L2 Ethertype Filters
+ */
+struct rte_eth_l2etype_filter {
+   uint16_tetype;
+   uint8_t priority; /**< VLAN User Priority.  */
+   uint8_t pool;
+   uint8_t flags;/**< Flags byte.  */
+   uint8_t rx_queue;
+};
+
+/**
  *  Flow Director setting modes: none (default), signature or perfect.
  */
 enum rte_fdir_mode {
@@ -911,6 +926,15 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. 
*/

+typedef int (*l2_etype_add_filter_t)(struct rte_eth_dev *dev, uint8_t 
filter_id, struct

[dpdk-dev] [PATCH 0/3] ixgbe: Add L2 Ethertype, SYN and Five tuple queue filters

2014-05-19 Thread Vladimir Medvedkin

In addition to the Flow Director filters, this patchset adds L2 Ethertype, SYN
and five-tuple queue filters. They route packets according to ethertype, L4
protocol, source/destination IP addresses and ports, pool, and the presence of
the SYN flag in a TCP packet.
Unlike http://dpdk.org/ml/archives/dev/2014-May/002512.html, this gives the
capability to work with pools.
The functionality of this patchset can be merged with the patch above.


Vladimir Medvedkin (3):
  ixgbe: Add L2 ethertype filter for ixgbe
  ixgbe: Add syn queue filter for ixgbe
  ixgbe: Add five tuple filter for ixgbe

 lib/librte_ether/rte_ethdev.c   | 228 ++
 lib/librte_ether/rte_ethdev.h   | 237 +++
 lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h |  20 +++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 274 
 4 files changed, 759 insertions(+)

-- 
1.8.3.2

[dpdk-dev] Question regarding concurrency and hash table

2014-05-12 Thread Vladimir Medvedkin

Hi,

Programmer's guide section 22.1:

The hash and LPM libraries are, by design, thread unsafe in order to
maintain performance. However, if required the developer can add layers on
top of these libraries to provide thread safety. Locking is not needed in
all situations, and in both the hash and LPM libraries, lookups of values
can be performed in parallel in multiple threads. Adding, removing or
modifying values, however, cannot be done in multiple threads without using
locking when a single hash or LPM table is accessed. Another alternative to
locking would be to create multiple instances of these tables allowing each
thread its own copy.


So rte_hash_add_key() is not multi-thread safe, whereas rte_hash_lookup()
is. You can call the add and lookup functions simultaneously as long as
rte_hash_add_key() is either called from a single dedicated thread or
protected by a lock on the hash table.
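
A minimal sketch of the "protect add with a lock" option follows. It does not
use rte_hash at all: toy_table, toy_add_locked() and toy_lookup() are
hypothetical stand-ins built on C11 atomics, so that the pattern can be compiled
without DPDK. In a real application the lock would wrap the rte_hash_add_key()
call instead, and whether lookups may run concurrently with a writer depends on
the hash library version in use.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SLOTS 64 /* power of two, illustrative size */

/* Insert-only toy table: a stand-in for a real hash table handle. */
struct toy_table {
	atomic_flag writer_lock;     /* taken by every thread that adds */
	uint32_t keys[TOY_SLOTS];
	atomic_bool used[TOY_SLOTS]; /* published after the key is written */
};

void toy_init(struct toy_table *t)
{
	atomic_flag_clear(&t->writer_lock);
	for (int i = 0; i < TOY_SLOTS; i++)
		atomic_init(&t->used[i], false);
}

/* Add under the writer lock; returns the slot index or -1 if full. */
int toy_add_locked(struct toy_table *t, uint32_t key)
{
	int ret = -1;

	while (atomic_flag_test_and_set_explicit(&t->writer_lock,
						 memory_order_acquire))
		; /* spin until the other writer is done */

	for (uint32_t n = 0; n < TOY_SLOTS; n++) {
		uint32_t i = (key + n) & (TOY_SLOTS - 1);

		if (atomic_load_explicit(&t->used[i], memory_order_relaxed)) {
			if (t->keys[i] == key) {
				ret = (int)i; /* already present */
				break;
			}
			continue; /* occupied by another key, probe on */
		}
		t->keys[i] = key;
		/* Release: readers that see used[i] also see keys[i]. */
		atomic_store_explicit(&t->used[i], true, memory_order_release);
		ret = (int)i;
		break;
	}

	atomic_flag_clear_explicit(&t->writer_lock, memory_order_release);
	return ret;
}

/* Lookup without taking the lock; returns the slot index or -1. */
int toy_lookup(struct toy_table *t, uint32_t key)
{
	for (uint32_t n = 0; n < TOY_SLOTS; n++) {
		uint32_t i = (key + n) & (TOY_SLOTS - 1);

		if (!atomic_load_explicit(&t->used[i], memory_order_acquire))
			return -1; /* empty slot ends the probe sequence */
		if (t->keys[i] == key)
			return (int)i;
	}
	return -1;
}
```

The other option from the reply, a single dedicated writer thread, needs no
lock at all; the spinlock only matters when several threads may add.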


2014-05-12 12:29 GMT+04:00 Tomas Vestelind :

> Hello all!
>
> I have a question regarding the possible concurrency issues in hash table.
> My questions is:
> Is it possible to call rte_hash_add() and rte_hash_lookup() at the same
> time without data inconsistency?
>
> My guess is that I need to use a lock as protection. I see that you have a
> couple of nice ones :)
>
> BR,
> Tomas Vestelind
>

[dpdk-dev] L2FWD uses 'too much' CPU

2014-04-01 Thread Vladimir Medvedkin

Hi,

One of the objectives of DPDK is to avoid interrupts, so an application
(not only L2FWD) polls the NIC in an infinite loop. Have a look at section 24,
"Power Management", of the Programmer's Guide and at "L3 Forwarding with Power
Management Sample Application" in the Sample Applications User Guide.
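
As a rough illustration of the idea behind backing off when the queues are
idle, a poll loop can count consecutive empty polls and sleep once that count
passes a threshold. The sketch below is not the code of the sample application;
the function names and all thresholds are assumptions chosen for readability.

```c
#include <stdint.h>

/* Illustrative thresholds; they would need tuning for a real workload. */
#define IDLE_POLLS_BEFORE_SHORT_SLEEP 100u
#define IDLE_POLLS_BEFORE_LONG_SLEEP  1000u
#define SHORT_SLEEP_US 1u
#define LONG_SLEEP_US  100u

/* Update the idle counter from the packet count of the last RX poll. */
uint32_t update_idle_polls(uint32_t idle_polls, uint16_t nb_rx)
{
	return nb_rx > 0 ? 0 : idle_polls + 1;
}

/*
 * How long to sleep, in microseconds, after `idle_polls` consecutive
 * empty polls. Zero means keep busy-polling.
 */
uint32_t idle_sleep_us(uint32_t idle_polls)
{
	if (idle_polls < IDLE_POLLS_BEFORE_SHORT_SLEEP)
		return 0;
	if (idle_polls < IDLE_POLLS_BEFORE_LONG_SLEEP)
		return SHORT_SLEEP_US;
	return LONG_SLEEP_US;
}
```

In the forwarding loop one would feed the return value of rte_eth_rx_burst()
into update_idle_polls() and sleep for idle_sleep_us() microseconds when it is
non-zero. This trades some latency on the first packet after an idle period
for lower CPU usage.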

Regards,
Vladimir.


2014-04-01 9:24 GMT+04:00 Fred Pedrisa :

> Hi,
>
>
>
> Why by default L2FWD saturate both cores ? I mean, it seems it keeps
> wasting
> cycles due to the infinite loop placed there to check the queues.
>
>
>
> Which would be the way to improve this and make it to become more efficient
> ?
>
>
>
> Sincerely,
>
>
>
> Fred
>
>

[dpdk-dev] Questions on use of multiple NIC interfaces

2014-03-28 Thread Vladimir Medvedkin

1. Yes.
2. Yes. Look at programmer's guide section 16 Multi-process support.
3. You can use blacklist eal option.

Regards,
Vladimir


2014-03-28 13:25 GMT+04:00 Sujith Sankar (ssujith) :

> Hi all,
>
> Could someone answer the following questions about the usage of multiple
> NIC interfaces with DPDK?
>
> 1.  If my server has two identical Intel NICs, could I use both with DPDK
> and its applications?
> 2.  If both the NIC cards could be used with DPDK, could I use them with
> separate instances of applications?  E.g., NIC1 used by App1 and NIC2 used
> by App2.
> 3.  If answer to qn no 2 is yes, does the driver take care to avoid
> reinitialising NIC1 when App2 tries to initialise NIC2?  From what I've
> seen, DPDK calls the driver init for all the matching devices (vendor id
> and device id).
>
> Thanks,
> -Sujith
>

[dpdk-dev] VM L2 control register (PFVML2FLT) configuring in VMDQ mode

2014-03-24 Thread Vladimir Medvedkin

Hi all,

I found that there is no way to configure pool behavior (for example,
accepting broadcasts) in VMDQ mode. For SR-IOV there is
rte_eth_dev_set_vf_rxmode(), but according to the datasheet it doesn't matter
whether the 82599's virtual environment operates in IOV mode or in Next
Generation VMDq mode.

So we have two options: either make a single function by removing from
rte_eth_dev_set_vf_rxmode() the check of the pool (or VF) number against
dev_info.max_vfs, or add a similar function for VMDQ mode.

Which is better?

[dpdk-dev] Crafting a packet for transmission.

2014-03-13 Thread Vladimir Medvedkin

Cast the returned pointer to a pointer to the required header struct, for
example for IPv4:
struct ipv4_hdr *iph;
iph = (struct ipv4_hdr *)rte_pktmbuf_append(m, sizeof(struct ipv4_hdr));
Then fill in the fields.
The header definitions are in the lib/librte_net/ directory.
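
To make "fill in the fields" concrete, here is a sketch that writes a minimal
IPv4 header into a plain byte buffer. The buffer stands in for the area
returned by rte_pktmbuf_append(). struct ipv4_hdr_sketch is a local type laid
out like an option-less IPv4 header, and fill_ipv4_hdr() and ipv4_hdr_cksum()
are hypothetical helper names, not DPDK functions. The TTL of 64 is an
arbitrary choice.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of a 20-byte IPv4 header; fields in network byte order. */
struct ipv4_hdr_sketch {
	uint8_t  version_ihl;
	uint8_t  type_of_service;
	uint16_t total_length;
	uint16_t packet_id;
	uint16_t fragment_offset;
	uint8_t  time_to_live;
	uint8_t  next_proto_id;
	uint16_t hdr_checksum;
	uint32_t src_addr;
	uint32_t dst_addr;
} __attribute__((__packed__));

/* RFC 1071 checksum over `len` bytes; result is in network byte order. */
uint16_t ipv4_hdr_cksum(const void *hdr, size_t len)
{
	const uint8_t *p = hdr;
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((p[i] << 8) | p[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return htons((uint16_t)~sum);
}

/* src, dst and payload_len are given in host byte order. */
void fill_ipv4_hdr(void *buf, uint32_t src, uint32_t dst,
		   uint8_t proto, uint16_t payload_len)
{
	struct ipv4_hdr_sketch *iph = buf;

	memset(iph, 0, sizeof(*iph));
	iph->version_ihl = 0x45; /* IPv4, 5 x 32-bit words */
	iph->total_length = htons((uint16_t)(sizeof(*iph) + payload_len));
	iph->time_to_live = 64;
	iph->next_proto_id = proto;
	iph->src_addr = htonl(src);
	iph->dst_addr = htonl(dst);
	/* Checksum is computed last, with the checksum field still zero. */
	iph->hdr_checksum = ipv4_hdr_cksum(iph, sizeof(*iph));
}
```

The Ethernet header in front of it and the L4 header behind it are filled the
same way, each into the region returned by its own append call.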

Regards,
Vladimir


2014-03-13 16:04 GMT+04:00 Aravind :

> Thank you for your reply Vladimir.
>
> I just went through the api reference for Intel DPDK. 'rte_pktmbuf_append'
> funcion append len bytes to an mbuf and return a pointer to the start
> address of the added data. But how am I suppose to fill in the packet
> headers...? It would be great if you could guide me on this.
>
>
> On Thu, Mar 13, 2014 at 5:18 PM, Vladimir Medvedkin wrote:
>
>> Hi,
>>
>> At first look at https://github.com/Pktgen/Pktgen-DPDK
>>
>> If you need your custom app:
>> - alloc mbuf with rte_pktmbuf_alloc
>> - fill up L2-4 headers fields (look at rte_pktmbuf_append func for
>> example)
>> - send packet via rte_eth_tx_burst
>>
>> Regards,
>> Vladimir
>>
>>
>> 2014-03-13 15:15 GMT+04:00 sabu kurian :
>>
>> > Hai friends,
>> >
>> > My requirement is to create a packet generator. So I could use
>> >
>> > struct rte_mbuf * m;
>> >
>> > to create a single packet holder. So how am I suppose to fill in the
>> packet
>> > details like the MAC source , destination and also the IP source ,
>> > destination (in case of IPv4 packets).
>> >
>> > Following the l2fwd example, which has got the TAP interface to write
>> the
>> > data to and read the data from.
>> >
>> > using the ether_hdr , one could read the MAC address from the packet
>> >
>> > eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
>> > tmp = &eth->s_addr.addr_bytes[0];
>> >
>> > But how am I suppose to craft a packet and supply in all these details
>> , so
>> > that I could sent the packet via rte_eth_tx_burst
>> >
>> >
>> > Thanks in advance
>> >
>>
>
>

[dpdk-dev] Crafting a packet for transmission.

2014-03-13 Thread Vladimir Medvedkin

Hi,

At first look at https://github.com/Pktgen/Pktgen-DPDK

If you need your custom app:
- alloc mbuf with rte_pktmbuf_alloc
- fill up L2-4 headers fields (look at rte_pktmbuf_append func for example)
- send packet via rte_eth_tx_burst

Regards,
Vladimir


2014-03-13 15:15 GMT+04:00 sabu kurian :

> Hai friends,
>
> My requirement is to create a packet generator. So I could use
>
> struct rte_mbuf * m;
>
> to create a single packet holder. So how am I suppose to fill in the packet
> details like the MAC source , destination and also the IP source ,
> destination (in case of IPv4 packets).
>
> Following the l2fwd example, which has got the TAP interface to write the
> data to and read the data from.
>
> using the ether_hdr , one could read the MAC address from the packet
>
> eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
> tmp = &eth->s_addr.addr_bytes[0];
>
> But how am I suppose to craft a packet and supply in all these details , so
> that I could sent the packet via rte_eth_tx_burst
>
>
> Thanks in advance
>

[dpdk-dev] checking packet drop at NIC

2014-01-31 Thread Vladimir Medvedkin

Hi Sharath,

In DPDK, interrupts are disabled to eliminate their performance overhead;
they are used only for link status changes. So you can poll
rte_eth_stats_get() and check struct rte_eth_stats for errors.
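
One way to act on polled counters is to keep the previous snapshot and look at
the difference. The sketch below assumes a local struct port_counters whose
field names merely resemble error counters of struct rte_eth_stats; it is not
the DPDK type, and new_errors() is a hypothetical helper.

```c
#include <stdint.h>

/* Local snapshot type; a stand-in for the counters read from the port. */
struct port_counters {
	uint64_t ierrors;   /* erroneous received packets */
	uint64_t oerrors;   /* failed transmitted packets */
	uint64_t rx_nombuf; /* RX mbuf allocation failures */
};

/* Number of new error events between two snapshots of the same port. */
uint64_t new_errors(const struct port_counters *prev,
		    const struct port_counters *cur)
{
	return (cur->ierrors - prev->ierrors) +
	       (cur->oerrors - prev->oerrors) +
	       (cur->rx_nombuf - prev->rx_nombuf);
}
```

In an application the snapshots would be filled from rte_eth_stats_get() at a
fixed interval, for example from the main lcore, and a non-zero result would be
logged or exported.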

Regards,
Vladimir



2014-01-31 Sharath 

> Hi Daniel & all,
>
> can anyone please let me know about this.
>
> Tx
> -SB
>
>
> On Thu, Jan 30, 2014 at 4:50 PM, Sharath wrote:
>
> > hi!
> >
> > are there any interrupts which are raised by DPDK, for the fifo errors.
> >
> > please let me know, where can I find the details and how to handle such
> > interrupts ?
> >
> > Tx
> > -SB
> >
> >
> > On Thu, Jan 30, 2014 at 2:30 PM, Sharath wrote:
> >
> >> Thanks Daniel !
> >> Let me check it out . . .
> >> On Jan 29, 2014 8:54 PM, "Daniel Kaminsky" <
> >> daniel.kaminsky at infinitelocality.com> wrote:
> >>
> >>> Hi Sharath,
> >>>
> >>> Try rte_eth_stats_get, I think this should give you what you're looking
> >>> for.
> >>>
> >>> Regards,
> >>> Daniel
> >>>
> >>>
> >>> On Tue, Jan 28, 2014 at 7:29 AM, Sharath <
> sharathjm.bharadwaj at gmail.com>wrote:
> >>>
> >>>> hi !
> >>>>
> >>>> can someone please tell me whether the DPDK provides any method to
> >>>> handle below
> >>>>
> >>>> a. account the packet drops at NIC level ? is there any interrupt
> >>>> raised by DPDK for the same ?
> >>>> b. to check fifo errors ?
> >>>> c. way to check rx and tx in sync
> >>>>
> >>>> Tx,
> >>>> -SB
> >>>>
> >>>
> >>>
> >
>

[dpdk-dev] pktgen offload checksum flag not able to make it work with pacp packets.

2014-01-27 Thread Vladimir Medvedkin

Hi Banashankar,

For a proper TCP checksum calculation you have to calculate the checksum over
the pseudo-header (see app/test-pmd/cmdline.c) and put the result in
tcp_hdr->cksum.
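
A sketch of the pseudo-header sum is given below. It assumes that the NIC
expects the folded, non-complemented one's complement sum of the IPv4
pseudo-header (source, destination, protocol, L4 length) in the checksum field
and adds the TCP header and payload itself; check this against the datasheet
of the NIC in use. ipv4_pseudo_hdr_sum() is a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Folded 16-bit one's complement sum of the IPv4 pseudo-header.
 * All arguments are in host byte order; l4_len is the length of the
 * TCP (or UDP) header plus payload in bytes.
 */
uint16_t ipv4_pseudo_hdr_sum(uint32_t src, uint32_t dst,
			     uint8_t proto, uint16_t l4_len)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto;  /* zero byte + protocol byte form one 16-bit word */
	sum += l4_len;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

The caller would store the result in network byte order in tcp_hdr->cksum
before setting PKT_TX_TCP_CKSUM in m->ol_flags.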


2014-01-26 Wiles, Roger Keith 

> Hi Banashankar,
>
> The tx_conf is used in the pktgen_config_ports() with the
> rte_eth_tx_queue_setup() and I am not sure why it matters that tx_conf is
> disable. The values are mostly zero, but some type of interaction must be
> going on. It may be the txq_flags being set to  IXGBE_SIMPLE_FLAGS and it
> is overriding the the per packet flag later. You will need to look at the
> driver to determine the real reason.
>
> The checksum should not be wrong unless the hardware registers are not
> setup correctly, but I would not think that is the case. You may want to
> verify the checksum is correct another way, because I can not see the
> hardware doing the checksum wrong.
>
> Thanks
> ++Keith
>
>
> Keith Wiles, Principal Technologist for Networking member of the CTO
> office, Wind River
> mobile 940.213.5533
>
> On Jan 25, 2014, at 4:53 PM, Banashankar KV <banveerad at gmail.com> wrote:
>
> Hi,
> Thanks a lot for the reply !
> Yes I have checked those examples and had set all those flags. But IP
> checksum started working after commenting off the txq_flags from
> the pktgen.c file's tx_conf .
>
> And I added the following flag to calculate the tcp checksum.
>
> m->ol_flags  |= PKT_TX_TCP_CKSUM
>
> its calculating the TCP checksum but turning out to be wrong checksum.
>
> Thanks
> Banashankar
>
>
>
> On Fri, Jan 24, 2014 at 11:44 AM, Wiles, Roger Keith <
> keith.wiles at windriver.com> wrote:
> I have not enabled that feature myself, but I would expect it to work as
> long as the hardware does. What does the docs say about enabling hardware
> offload support? Did you look at the following files:
>
> ip_reassembly/ipv4_rsmbl.h: m->ol_flags |= PKT_TX_IP_CKSUM;
> ipv4_frag/rte_ipv4_frag.h:  out_pkt->ol_flags |=
> PKT_TX_IP_CKSUM;
>
> Thanks
> ++Keith
>
> Keith Wiles, Principal Technologist for Networking member of the CTO
> office, Wind River
> mobile 940.213.5533
>
> On Jan 24, 2014, at 12:54 PM, Banashankar KV <banveerad at gmail.com> wrote:
>
> I was modifying a packet in pktgen_pcap_mbuf_ctor()
> and after modifying I wanted to offload the checksum calculation to h/w
> so I am setting these flags in pktgen_pcap_mbuf_ctor function.
>
> m->pkt.vlan_macip.f.l2_len = sizeof(struct ether_hdr);
> m->pkt.vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);
>
> m->ol_flags = PKT_TX_IP_CKSUM
>
>
> I even tried with setting .txq_flags = 0 in rte_eth_txconf struct in
> pktgen.c.
>
> But still not able to get the h/w checksum. Am I missing anything ?
>
>
>
> Thanks
> Banashankar
>
>
>
Regards,
Vladimir

[dpdk-dev] Who can correct me about 82599 RSS Hash Function

2013-12-12 Thread Vladimir Medvedkin

Hi,

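First, make sure that port_conf->rx_adv_conf.rss_conf.rss_key and .rss_hf
are configured properly.
Second, the hash consumes each input byte starting from its most significant
bit, so the inner loop has to run from bit 7 down to bit 0: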

-for(j=0;j<8;j++){
+for(j=7;j>=0;j--){
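
A self-contained sketch of the Toeplitz hash with that bit order, for
reference. The function name toeplitz_hash and the array names are made up
for this example; the key and the address/port tuple are the ones from the
quoted message, which gives 0x51ccc178 as the expected result. Note that the
32-bit key window is assembled big-endian here, an assumption that avoids
reading the key through a uint32_t pointer on a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash: walk the input bit by bit, most significant bit of each
 * byte first. When the bit is set, XOR the current leftmost 32 bits of
 * the key into the result; then slide the key window left by one bit.
 * key_len must be at least 4 bytes.
 */
uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                       const uint8_t *input, size_t input_len)
{
    uint32_t result = 0;
    /* leftmost 32 key bits, built big-endian so any host gives the same value */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    size_t next_bit = 32;   /* index of the next key bit to shift in */

    for (size_t i = 0; i < input_len; i++) {
        for (int j = 7; j >= 0; j--) {
            if (input[i] & (1u << j))
                result ^= window;

            uint32_t bit = 0;
            if (next_bit < key_len * 8)
                bit = (key[next_bit / 8] >> (7 - next_bit % 8)) & 1u;
            window = (window << 1) | bit;
            next_bit++;
        }
    }
    return result;
}

/* Key from the quoted message. */
const uint8_t example_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* sip 66.9.149.187, dip 161.142.100.80, sport 2794, dport 1766,
 * all in network byte order. */
const uint8_t example_tuple[12] = {
    66, 9, 149, 187,  161, 142, 100, 80,  0x0a, 0xea,  0x06, 0xe6,
};
```

Calling toeplitz_hash(example_key, sizeof(example_key), example_tuple,
sizeof(example_tuple)) is the software equivalent of what the NIC does for
this TCP/IPv4 tuple.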


Regards,
Vladimir

2013/12/11 chen_lp at neusoft.com 

>
> I want to calculate the NIC's RSS hash result in software, but the result
> is not right and I don't know what is wrong.
>
>
> struct mbf_cb{
> uint32_t sip;
> uint32_t dip;
> uint16_t sport;
> uint16_t dport;
> };
>
> static uint8_t test_rss[]={
> 0x6d,0x5a,0x56,0xda,0x25,0x5b,0x0e,0xc2,
> 0x41,0x67,0x25,0x3d,0x43,0xa3,0x8f,0xb0,
> 0xd0,0xca,0x2b,0xcb,0xae,0x7b,0x30,0xb4,
> 0x77,0xcb,0x2d,0xa3,0x80,0x30,0xf2,0x0c,
> 0x6a,0x42,0xb7,0x3b,0xbe,0xac,0x01,0xfa,
> };
>
> static uint8_t input_mask[]={
> 0x01,0x02,0x04,0x08,
> 0x10,0x20,0x40,0x80,
> };
>
> mcb.sip=rte_cpu_to_be_32(IPv4(66,9,149,187));
> mcb.dip=rte_cpu_to_be_32(IPv4(161,142,100,80));
> mcb.sport=rte_cpu_to_be_16(2794);
> mcb.dport=rte_cpu_to_be_16(1766);
>
>
> uint32_t compute_hash(uint8_t *input, int n)
> {
>     int i,j,k;
>     uint32_t result=0;
>     uint32_t *lk;
>     uint8_t rss_key[40];
>
>     memcpy(rss_key,test_rss,40);
>
>     lk=(uint32_t *)rss_key;
>     for(i=0;i<n;i++){
>         for(j=0;j<8;j++){
>             if((input_mask[j])&input[i]){
>                 result^=*lk;
>             }
>
>             // shift k left 1 bit
>             rss_key[0]=rss_key[0]<<1;
>             for(k=1;k<40;k++){
>                 if(rss_key[k]&0x80){
>                     rss_key[k-1]|=0x01;
>                 }
>                 rss_key[k]=rss_key[k]<<1;
>             }
>         }
>     }
>     return result;
> }
>
> printf("rss_hash=%#x\n",compute_hash((uint8_t *)&mcb,sizeof(struct mbf_cb)));
>
> rss_hash=0x57476eca
>  but the right result is 0x51ccc178
>

[dpdk-dev] How to calculate checksum automically with NIC when sending a packet?

2013-11-22 Thread Vladimir Medvedkin

Of course, you also have to set the
m->pkt.vlan_macip.f.l2_len and
m->pkt.vlan_macip.f.l3_len fields properly.

Regards,
Vladimir


2013/11/22 Vladimir Medvedkin 

> Hi,
>
> If you only need the IP checksum:
> struct rte_mbuf *m;
> m->ol_flags |= PKT_TX_IP_CKSUM;
>
> If you need the TCP checksum as well, also add the PKT_TX_TCP_CKSUM flag to
> the ol_flags field and calculate the pseudo-header checksum (see
> get_ipv4_psd_sum() in app/test-pmd/csumonly.c):
>
> struct tcp_hdr *th;
> th->cksum   = get_ipv4_psd_sum(iph);
>
> Regards,
> Vladimir
>
>
>
>
> 2013/11/22 William Rolinson 
>
>> As in the subject line.
>>
>
>

[dpdk-dev] pci_unbind.py failure

2013-11-13 Thread Vladimir Medvedkin

Hi all,

I have faced a similar problem in my real environment with an 82599 NIC.
It looks like in some cases rte_eth_dev_count() returns 0 instead of the real
number of ports bound to igb_uio. After restarting the app several times,
rte_eth_dev_count() returns the real number of ports and the app continues
to execute normally.


2013/11/13 Daniel Kaminsky 

> Hi Jyotiswarup,
>
> Did you initialize all the relevant parts beforehand
> (rte_eal_init(), rte_pmd_init_all() and don't forget rte_eal_pci_probe())?
>
> Regards,
> Daniel
>
>
> On Wed, Nov 13, 2013 at 1:27 PM, Jose Gavine Cueto wrote:
>
> > Hi,
> >
> > How are you using it? I've successfully used it with vbox.
> >
> > Cheers
> > On Nov 13, 2013 7:17 PM, "Jyotiswarup Raiturkar" <
> jyotisr5 at googlemail.com>
> > wrote:
> >
> > > hi
> > >
> > > > I got my application running inside a vm (vmplayer) where the VM
> > > > emulates an e1000 NIC (82545EM). But rte_eth_dev_count() seems to
> > > > return 0. From the website it looks like it's a supported NIC. My
> > > > lspci and pci_unbind status are below. Any pointers?
> > > >
> > > # ./tools/pci_unbind.py --status
> > >
> > > Network devices using IGB_UIO driver
> > > 
> > > :02:06.0 '82545EM Gigabit Ethernet Controller (Copper)' drv=igb_uio
> > > unused=e1000
> > >
> > > Network devices using kernel driver
> > > ===
> > > :02:01.0 '79c970 [PCnet32 LANCE]' if=eth1 drv=pcnet32 unused=
> > *Active*
> > >
> > > Other network devices
> > > =
> > > 
> > >
> > >
> > > # lspci -vt
> > > -[:00]-+-00.0  Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host
> > > bridge
> > >+-01.0-[01]--
> > >+-07.0  Intel Corporation 82371AB/EB/MB PIIX4 ISA
> > >+-07.1  Intel Corporation 82371AB/EB/MB PIIX4 IDE
> > >+-07.3  Intel Corporation 82371AB/EB/MB PIIX4 ACPI
> > >+-07.7  VMware Virtual Machine Communication Interface
> > >+-0f.0  VMware SVGA II Adapter
> > >+-10.0  LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT
> > Dual
> > > Ultra320 SCSI
> > >+-11.0-[02]--+-00.0  VMware USB1.1 UHCI Controller
> > >|+-01.0  Advanced Micro Devices [AMD] 79c970
> > > [PCnet32 LANCE]
> > >|+-02.0  Ensoniq ES1371 [AudioPCI-97]
> > >|+-03.0  VMware USB2 EHCI Controller
> > >|+-05.0  VMware Device 07e0
> > >|\-06.0  Intel Corporation 82545EM Gigabit
> > Ethernet
> > > Controller (Copper)
> > >+-15.0-[03]--
> > >+-15.1-[04]--
> > >+-15.2-[05]--
> > >+-15.3-[06]--
> > >+-15.4-[07]--
> > >+-15.5-[08]--
> > >+-15.6-[09]--
> > >+-15.7-[0a]--
> > >+-16.0-[0b]--
> > >+-16.1-[0c]--
> > >+-16.2-[0d]--
> > >+-16.3-[0e]--
> > >+-16.4-[0f]--
> > >+-16.5-[10]--
> > >+-16.6-[11]--
> > >+-16.7-[12]--
> > >+-17.0-[13]--
> > >+-17.1-[14]--
> > >+-17.2-[15]--
> > >+-17.3-[16]--
> > >+-17.4-[17]--
> > >+-17.5-[18]--
> > >+-17.6-[19]--
> > >+-17.7-[1a]--
> > >+-18.0-[1b]--
> > >+-18.1-[1c]--
> > >+-18.2-[1d]--
> > >+-18.3-[1e]--
> > >+-18.4-[1f]--
> > >+-18.5-[20]--
> > >+-18.6-[21]--
> > >\-18.7-[22]--
> > >
> > >
> > > Regards
> > > Jyotiswarup
> > >
> > >
> > > On Tue, Nov 5, 2013 at 9:34 PM, Cyril Cressent <
> cyril.cressent at intel.com
> > > >wrote:
> > >
> > > > On Tue, Nov 05, 2013 at 08:01:06PM +0530, Jyotiswarup Raiturkar
> wrote:
> > > >
> > > > > > Thanks for the quick reply. I saw some definitions of
> > > > > > e1000_phy_82579 hence I thought (hoped) the NIC would be
> > > > > > supported. I will try to run my dpdk app inside a VM with an
> > > > > > emulated e1000 NIC (just to test the code ..).
> > > >
> > > > > As a general rule, even if you find references to a NIC in the poll
> > > > > mode drivers, if it's not listed in
> > > > > lib/librte_eal/common/include/rte_pci_dev_ids.h
> > > > > then consider the NIC as not supported.
> > > >
> > > > Good luck with the VM,
> > > >
> > > > Cyril
> > > >
> > >
> >
>

[dpdk-dev] olflags in SRIOV VF environment

2013-11-12 Thread Vladimir Medvedkin

Hi Prashant,

Maybe it doesn't work because of one of the Known Issues and Limitations
(see the Release Notes).
Quote:

6.1 In packets provided by the PMD, some flags are missing
In packets provided by the PMD, some flags are missing. The application
does not have access to information provided by the hardware (packet is
broadcast, packet is multicast, packet is IPv4 and so on).
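
Where the flags are not available, the workaround described in the quoted
message (classifying by the EtherType of the frame) can be sketched roughly
as follows. The function and enum names are made up for this example, and it
assumes an untagged frame or a single 802.1Q tag. In a DPDK app the byte
pointer would come from the mbuf data area (for example via
rte_pktmbuf_mtod()).

```c
#include <stddef.h>
#include <stdint.h>

enum l3_type { L3_OTHER = 0, L3_IPV4, L3_IPV6 };

#define ET_IPV4 0x0800u
#define ET_IPV6 0x86DDu
#define ET_VLAN 0x8100u   /* 802.1Q tag */

/* Read a big-endian 16-bit value from the frame. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/*
 * frame points at the destination MAC address, len is the number of
 * valid bytes. Returns the L3 protocol derived from the EtherType.
 */
enum l3_type classify_l3(const uint8_t *frame, size_t len)
{
    size_t off = 12;                 /* EtherType follows two 6-byte MACs */
    uint16_t type;

    if (len < off + 2)
        return L3_OTHER;
    type = read_be16(frame + off);

    if (type == ET_VLAN) {           /* skip one VLAN tag (4 bytes) */
        off += 4;
        if (len < off + 2)
            return L3_OTHER;
        type = read_be16(frame + off);
    }

    if (type == ET_IPV4)
        return L3_IPV4;
    if (type == ET_IPV6)
        return L3_IPV6;
    return L3_OTHER;
}
```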

Regards,
Vladimir



2013/11/12 Prashant Upadhyaya 

> Hi guys,
>
> I am facing a peculiar issue with the usage of struct rte_mbuf-> ol_flags
> field in the rte_mbuf when I receive the packets with the rte_eth_rx_burst
> function.
> I use the ol_flags field to identify whether is an IPv4 or IPv6 packet or
> not thus -
>
> if ((pkts_burst->ol_flags & PKT_RX_IPV4_HDR) ||
> (pkts_burst->ol_flags &
> PKT_RX_IPV6_HDR))
>
> [pkts_burst is my rte_mbuf pointer]
>
> Now here are the observations -
>
>
> 1.   This works mighty fine when my app is working on the native
> machine
>
> 2.   This works good when I run this in a VM and use one VF over SRIOV
> from one NIC port
>
> 3.   This works good when I run this in two VM's and use one VF from 2
> different NIC ports (one VF from each) and use these VF's in these 2 VM's
> (VF1 from NIC port1 in VM1 and VF2 from NIC port2 in VM2)
>
> 4.   However the ol_flags fails to classify the packets when I use 2
> VM's and use 2 VF's from the 'same' NIC port and expose one each to the 2
> VM's I have
>
> There is no bug in my 'own' application, because when I stopped inspecting
> the ol_flags for classification of IPv4 and V6 packets and wrote a mini
> logic of my own by inspecting the ether type of the packets (the packets
> themselves come proper in all the cases, thankfully), my entire usecase
> passes (it is a rather significant usecase, so it can't be luck)
>
> Any idea guys why it works and doesn't work ?
>
> Regards
> -Prashant
>

[dpdk-dev] TX IP checksum offloading

2013-11-08 Thread Vladimir Medvedkin

Hi,

Did you set the l2_len and l3_len fields?

m->pkt.vlan_macip.f.l2_len = sizeof(struct ether_hdr);
m->pkt.vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);

Regards,
Vladimir


2013/11/7 Daniel Kaminsky 

> Hi,
>
> Has anyone had experience using the PKT_TX_IP_CKSUM flag?
> I have an application that generates IP traffic, but whenever I set this
> flag on the rte_mbuf (m->ol_flags = PKT_TX_IP_CKSUM), nothing is sent out.
> Retrieving the statistics from the ethernet device shows zero for opackets
> and oerrors.
>
> I'm using ixgbe driver and DPDK 1.5.0-8 (from Intel) distribution.
>
> Thanks,
> Daniel
>

[dpdk-dev] Multiple LCore receiving from same port/queue

2013-10-17 Thread Vladimir Medvedkin

Hi,

By default, you can't poll the same queue on the same port from different
lcores. If you need to poll the same queue from several lcores, use locks to
avoid race conditions.
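
A rough sketch of the locking idea in plain C11, not DPDK code. The
toy_queue type and toy_rx_burst() are hypothetical stand-ins for a real RX
queue and for rte_eth_rx_burst(); a DPDK app would more likely use
rte_spinlock_t from rte_spinlock.h than a hand-rolled lock.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Minimal test-and-set spinlock on C11 atomics.
 * Initialise the flag with ATOMIC_FLAG_INIT. */
struct queue_lock {
    atomic_flag busy;
};

void queue_lock_acquire(struct queue_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->busy, memory_order_acquire))
        ;   /* spin until the current holder releases the lock */
}

void queue_lock_release(struct queue_lock *l)
{
    atomic_flag_clear_explicit(&l->busy, memory_order_release);
}

/* Hypothetical stand-in for an RX queue: just a count of pending packets. */
struct toy_queue {
    uint32_t pending;
};

/* Hypothetical stand-in for the receive call; not safe to call
 * concurrently on the same queue, like the real one. */
uint16_t toy_rx_burst(struct toy_queue *q, uint16_t max)
{
    uint16_t n = q->pending < max ? (uint16_t)q->pending : max;
    q->pending -= n;
    return n;
}

/* Every lcore polling the shared queue goes through this wrapper, so only
 * one of them is inside the receive call at any time. */
uint16_t locked_rx_burst(struct queue_lock *l, struct toy_queue *q,
                         uint16_t max)
{
    uint16_t n;

    queue_lock_acquire(l);
    n = toy_rx_burst(q, max);
    queue_lock_release(l);
    return n;
}
```

The lock serialises the lcores on that queue, so where the NIC supports it,
giving each lcore its own RX queue avoids the contention altogether.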


2013/10/17 Sambath Kumar Balasubramanian 

> Hi,
>
>   I have a test dpdk application with 2 lcores receiving packets
> using rte_eth_rx_burst API. Is this a supported packet processing model.
> The reason I am asking is I am running into some trouble with this model.
>
> Thanks,
> Sambath
>

[dpdk-dev] L3FWD LPM IP lookup performance question

2013-10-01 Thread Vladimir Medvedkin

Hi,

The basic concepts of the algorithm used by LPM (DIR-24-8) are described here:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.21.9617&rep=rep1&type=pdf
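
To give a feel for the idea, here is a much simplified DIR-24-8 sketch in
plain C. It is not the rte_lpm implementation: the names, the 16-bit entry
layout and the restrictions are all assumptions made for this example.
Routes must be added in order of increasing prefix length, there is no
delete, and next hop 0 means "no route". A lookup costs one memory access
for prefixes up to /24 and two for longer ones.

```c
#include <stdint.h>
#include <stdlib.h>

#define TBL24_ENTRIES (1u << 24)
#define TBL8_GROUP    256u
#define EXT_FLAG      0x8000u  /* entry holds a tbl8 group index, not a next hop */

struct dir24_8 {
    uint16_t *tbl24;       /* indexed by the top 24 address bits */
    uint16_t *tbl8;        /* groups of 256 entries, indexed by the low 8 bits */
    uint32_t groups_used;
    uint32_t groups_max;
};

struct dir24_8 *dir24_8_create(uint32_t groups_max)
{
    struct dir24_8 *t = calloc(1, sizeof(*t));

    if (t == NULL || groups_max == 0 || groups_max > EXT_FLAG) {
        free(t);
        return NULL;
    }
    t->tbl24 = calloc(TBL24_ENTRIES, sizeof(uint16_t));
    t->tbl8 = calloc((size_t)groups_max * TBL8_GROUP, sizeof(uint16_t));
    t->groups_max = groups_max;
    if (t->tbl24 == NULL || t->tbl8 == NULL) {
        free(t->tbl24);
        free(t->tbl8);
        free(t);
        return NULL;
    }
    return t;
}

/* ip is in host byte order; next_hop must be in 1..0x7FFF. */
int dir24_8_add(struct dir24_8 *t, uint32_t ip, uint8_t depth, uint16_t next_hop)
{
    uint32_t i, first, count, idx24, base;
    uint16_t e;

    if (depth == 0 || depth > 32 || next_hop == 0 || (next_hop & EXT_FLAG))
        return -1;

    if (depth <= 24) {               /* expand the prefix over tbl24 */
        count = 1u << (24 - depth);
        first = (ip >> 8) & ~(count - 1);
        for (i = 0; i < count; i++)
            t->tbl24[first + i] = next_hop;
        return 0;
    }

    idx24 = ip >> 8;
    e = t->tbl24[idx24];
    if (!(e & EXT_FLAG)) {           /* first long prefix under this /24 */
        if (t->groups_used == t->groups_max)
            return -1;
        base = t->groups_used++ * TBL8_GROUP;
        for (i = 0; i < TBL8_GROUP; i++)
            t->tbl8[base + i] = e;   /* inherit the shorter route */
        e = (uint16_t)(EXT_FLAG | (base / TBL8_GROUP));
        t->tbl24[idx24] = e;
    }
    base = (uint32_t)(e & 0x7FFFu) * TBL8_GROUP;
    count = 1u << (32 - depth);
    first = (ip & 0xFFu) & ~(count - 1);
    for (i = 0; i < count; i++)
        t->tbl8[base + first + i] = next_hop;
    return 0;
}

/* Returns the next hop, or 0 when no route matches. */
uint16_t dir24_8_lookup(const struct dir24_8 *t, uint32_t ip)
{
    uint16_t e = t->tbl24[ip >> 8];

    if (e & EXT_FLAG)
        e = t->tbl8[(uint32_t)(e & 0x7FFFu) * TBL8_GROUP + (ip & 0xFFu)];
    return e;
}
```

The table costs 32 MB for tbl24 alone in this layout, which is the usual
trade of memory for lookup speed in this scheme.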

Best regards,
   Vladimir


On 24/09/2013 15:53, Jun Han wrote:
> Hello,
>
> We are trying to benchmark L3FWD application and have a question regarding
> the IP lookup algorithm as we expect the bottleneck to be at the lookup.
> Could someone let us know how efficient the lookup algorithm that L3FWD is
> using (e.g, LPM)? We are asking because we want to obtain highest L3
> forwarding performance number, and we might need to change the lookup
> method if the current LPM method is not as efficient.
>
> Thank you very much,
>
> Jun
