[dpdk-dev] [RFC 0/4] Use Google Test as DPDK unit test framework

2016-08-05 Thread Yerden Zhumabekov

On 03.08.2016 15:57, Doherty, Declan wrote:
> Some of the things I've come across include:
> No standard output format to integrate with continuous regression systems
> No ability to specify specific unit tests or groups of tests to run from
> the command line
> No standard set of test assertions used across the test suites.
> No standard setup and teardown functions across test suites; state from a
> previous test suite can break the current one
> Requirement to use a python script to orchestrate test runs.
> No support for mocking functionality.
>
> I know that all of the above could be fixed in our current test
> application, but I would question whether the effort is worthwhile when we
> can take an off-the-shelf framework which does all those things and a whole
> lot more, and which has been tested and used in a huge variety of projects.
>
> I am certainly willing to look at other frameworks, both C and C++, but I
> have yet to find a C framework which comes close to the usability and
> flexibility of the popular C++ ones.

We use cmocka (cmocka.org) for our tests. It is written in C and supports:
* mocking;
* setup/teardown;
* asserts;
* test groups.

Output is nicely formatted.


[dpdk-dev] random pkt generator PMD

2016-06-21 Thread Yerden Zhumabekov
I've developed a preliminary version of the driver. The code is derived
from the Null PMD, but it required a lot of rework.

It uses the following devargs to generate packets:

1) edit=offset:size:[rnd|value]
 Edit a field within the mbuf packet data at the given offset and size.
Either mark it as 'rnd' or assign it a hex value, for example:
'edit=8:16:rnd' marks the 16-byte field at offset 8 as randomly generated,
'edit=14:4:0xdeadbeef' writes the given byte sequence to the field
(network byte order).

2) tmpl=name
 Use the template with the given name. Instead of editing data manually,
specify a hard-coded template and then edit only the intended fields.
icmp4 and tcp4 are implemented so far; more are needed.

3) size=len
 Specify the packet size. It may not be less than the size of the template
(checked at devinit).
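
To make the 'edit' syntax concrete, here is a minimal sketch of how such a value could be parsed. It is not the driver's code: struct edit_rule and parse_edit_arg() are hypothetical names, and the sketch assumes that a fixed value fits into 64 bits, which the description above does not require.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor for one "edit" rule; not taken from the driver. */
struct edit_rule {
	unsigned long offset;   /* byte offset into the packet data */
	unsigned long size;     /* field size in bytes */
	int is_random;          /* 1 if the field was marked "rnd" */
	uint64_t value;         /* fixed value when is_random == 0 */
};

/*
 * Parse the part after "edit=": "offset:size:rnd" or "offset:size:0x<hex>".
 * Returns 0 on success, -1 on malformed input.
 */
int
parse_edit_arg(const char *arg, struct edit_rule *rule)
{
	char *end;

	assert(arg != NULL && rule != NULL);
	memset(rule, 0, sizeof(*rule));

	rule->offset = strtoul(arg, &end, 10);
	if (end == arg || *end != ':')
		return -1;
	arg = end + 1;

	rule->size = strtoul(arg, &end, 10);
	if (end == arg || *end != ':' || rule->size == 0)
		return -1;
	arg = end + 1;

	if (strcmp(arg, "rnd") == 0) {
		rule->is_random = 1;
		return 0;
	}

	/* Assumption of this sketch: fixed values are hex and fit in 64 bits. */
	if (strncmp(arg, "0x", 2) != 0 || rule->size > sizeof(rule->value))
		return -1;
	rule->value = strtoull(arg, &end, 16);
	if (*end != '\0')
		return -1;
	return 0;
}
```

With this sketch, "8:16:rnd" yields a random 16-byte field at offset 8, and "14:4:0xdeadbeef" yields a fixed 4-byte field at offset 14. Converting the value to network byte order when writing it into the mbuf is left out.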

I ran testpmd (start/stop) and then l2fwd, and it looks like it works, but
I'd be happy to hear about additional tests I need to run to verify the
PMD's conformance.

With 64-byte packets and one 8-byte random field it's about 6-7 Mpps now.
I use rte_rand()/lrand48() as the source of random bytes, which hurts
performance, but I haven't come up with anything better.
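
One possible direction for cheaper random bytes is a small per-queue generator such as Marsaglia's xorshift64. The sketch below is only an illustration of that idea, not what the driver does; fill_random() is a hypothetical helper, and whether it actually improves the packet rate would have to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One xorshift64 step (shift triple 13, 7, 17); the state must be non-zero. */
uint64_t
xorshift64_next(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

/* Fill 'len' bytes at 'dst' with pseudo-random data, 8 bytes per step. */
void
fill_random(uint8_t *dst, size_t len, uint64_t *state)
{
	assert(dst != NULL && state != NULL && *state != 0);

	while (len > 0) {
		uint64_t r = xorshift64_next(state);
		size_t n = len < sizeof(r) ? len : sizeof(r);

		memcpy(dst, &r, n);
		dst += n;
		len -= n;
	}
}
```

Keeping one state word per queue would avoid sharing generator state between lcores. This generator is not suitable for anything security-related; it only produces varied field contents.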


On 15.06.2016 16:07, Bruce Richardson wrote:
> On Wed, Jun 15, 2016 at 04:03:59PM +0600, Yerden Zhumabekov wrote:
>>
>> Right, but development of various features regarding L3/L4 etc requires more
>> subtle approach, like live packets, different protocol versions, fields
>> manipulation. In this case some packet mangling/randomizing capabilities
>> would be quite useful. Something similar to what is done in Pktgen, but more
>> lightweight approach, in a same app.
>>
>> I've almost made my mind :) so the next question: is there any guide on PMD
>> dev? I'm looking through rte_ether.h right now, but some doc would be very
>> nice.
> Unfortunately not. My suggestion is to take one of the simple vdev's e.g. 
> ring,
> pcap, null, and work off a copy of it.
>
> /Bruce



[dpdk-dev] random pkt generator PMD

2016-06-16 Thread Yerden Zhumabekov
On 15.06.2016 19:02, Neil Horman wrote:
> On Wed, Jun 15, 2016 at 03:43:56PM +0600, Yerden Zhumabekov wrote:
>> Hello everybody,
>>
>> DPDK already got a number of PMDs for various eth devices, it even has PMD
>> emulations for backends such as pcap, sw rings etc.
>>
>> I've been thinking about the idea of having PMD which would generate mbufs
>> on the fly in some randomized fashion. This would serve goals like, for
>> example:
>>
>> 1) running tests for applications with network processing capabilities
>> without additional software packet generators;
>> 2) making performance measurements with no hw inteference;
>> 3) ability to run without root privileges, --no-pci, --no-huge, for CI
>> build, so on.
>>
>> Maybe there's no such need, and these goals may be achieved by other means
>> and this idea is flawed? Any thoughts?
>>
> I think you already have a solution to this problem.  Linux/BSD have multiple
> user space packet generators that can dump thier output to a pcap format file,
> and dpdk has a pcap pmd that accepts a pcap file as input to send in packets.

Things that I don't like about the idea of using the PCAP PMD:

1) the need to create additional files with additional scripts and keep
those with your test suite;
2) the need to rewind the pcap file once you have played it (fixable);
3) reading packets one by one, with file operations which may hurt
performance;
4) low variability among source packets.

Those are the things which led me to the idea of a randomized packet
generator PMD. Possible devargs could be:
1) id of a template, like "ipv4", "ipv6", "dot1q" etc;
2) size of the mbuf payload;
3) an array of tuples like (offset, size, value), with value being either
an exact value or the "rnd" keyword.


[dpdk-dev] random pkt generator PMD

2016-06-15 Thread Yerden Zhumabekov


On 15.06.2016 18:25, Dumitrescu, Cristian wrote:

>>>> So add a loop-mode to pcap pmd?
>>> It would be nice to have an option like "...,rewind=1,...".
>> As Cristian points out in
>> http://dpdk.org/ml/archives/dev/2016-June/041589.html, the current pmd
>> behavior of stopping is the odd man out in the pmd crowd.
>>
>> Rather than whether to rewind or not, I'd make the number of loops
>> configurable, defaulting to forever and 1 being the equal to current
>> behavior.
>>
>>  - Panu -
> +1

I'm afraid all packets from the pcap file would need to be preloaded into
memory. Otherwise, each loop would involve reopening the file
(pcap_open/pcap_close()), am I wrong?
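
A rough sketch of the preloaded variant with a configurable number of loops follows. Everything here is hypothetical (struct replay, replay_next(), and the choice of 0 to mean "forever"); it only tracks indexes into an in-memory packet array and does not touch libpcap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLAY_FOREVER 0   /* assumed encoding for "loop forever" */

struct replay {
	size_t n_pkts;     /* number of packets preloaded from the pcap file */
	size_t next;       /* index of the next packet to hand out */
	uint64_t loops;    /* configured passes; REPLAY_FOREVER = no limit */
	uint64_t done;     /* passes completed so far */
};

/* Return the index of the next packet, or -1 once all passes are used up. */
long
replay_next(struct replay *r)
{
	long idx;

	assert(r != NULL);
	if (r->n_pkts == 0)
		return -1;
	if (r->loops != REPLAY_FOREVER && r->done >= r->loops)
		return -1;

	idx = (long)r->next;
	if (++r->next == r->n_pkts) {   /* end of capture: rewind in memory */
		r->next = 0;
		r->done++;
	}
	return idx;
}
```

With loops set to 1 this stops after a single pass, which corresponds to the current behavior described in the thread.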


[dpdk-dev] random pkt generator PMD

2016-06-15 Thread Yerden Zhumabekov

On 15.06.2016 18:33, Jay Rolette wrote:
> On Wed, Jun 15, 2016 at 7:11 AM, Yerden Zhumabekov 
> wrote:
>>
>> On 15.06.2016 17:50, Jay Rolette wrote:
>>
>>> On Wed, Jun 15, 2016 at 4:43 AM, Yerden Zhumabekov 
>>> wrote:
>>>
>>>> Hello everybody,
>>>> DPDK already got a number of PMDs for various eth devices, it even has
>>>> PMD
>>>> emulations for backends such as pcap, sw rings etc.
>>>>
>>>> I've been thinking about the idea of having PMD which would generate
>>>> mbufs
>>>> on the fly in some randomized fashion. This would serve goals like, for
>>>> example:
>>>>
>>>> 1) running tests for applications with network processing capabilities
>>>> without additional software packet generators;
>>>> 2) making performance measurements with no hw inteference;
>>>> 3) ability to run without root privileges, --no-pci, --no-huge, for CI
>>>> build, so on.
>>>>
>>>> Maybe there's no such need, and these goals may be achieved by other
>>>> means
>>>> and this idea is flawed? Any thoughts?
>>>>
>>> Are you thinking of something along the lines of what BreakingPoint (now
>>> part of Ixia) does, but as an open source software tool?
>>>
>>>
>> More dreaming than thinking though :) Live flows generation, malware,
>> attacks simulation etc is way out of scope of PMD dev, I guess.
>>
> Having a DPDK-based open-source BreakingPoint app would be a _fantastic_
> tool for the security community, but yes, it doesn't really make sense to
> put any of that logic in the PMD itself.
>
> Were you more after the capabilities from that sort of tool or the
> experience of writing a PMD?
>

We're developing packet processing applications at our company and, of
course, having a testing tool with such capabilities would be great. As
for experience in PMD development: sure, why not gain some.


[dpdk-dev] random pkt generator PMD

2016-06-15 Thread Yerden Zhumabekov


On 15.06.2016 17:25, Panu Matilainen wrote:
> On 06/15/2016 02:10 PM, Yerden Zhumabekov wrote:
>>
>>
>> On 15.06.2016 16:43, Dumitrescu, Cristian wrote:
>>>
>>>>
>>>> Hello everybody,
>>>>
>>>> DPDK already got a number of PMDs for various eth devices, it even has
>>>> PMD emulations for backends such as pcap, sw rings etc.
>>>>
>>>> I've been thinking about the idea of having PMD which would generate
>>>> mbufs on the fly in some randomized fashion. This would serve goals
>>>> like, for example:
>>>>
>>>> 1) running tests for applications with network processing capabilities
>>>> without additional software packet generators;
>>>> 2) making performance measurements with no hw inteference;
>>>> 3) ability to run without root privileges, --no-pci, --no-huge, for CI
>>>> build, so on.
>>>>
>>>> Maybe there's no such need, and these goals may be achieved by other
>>>> means and this idea is flawed? Any thoughts?
>>> How about a Perl/Python script to generate a PCAP file with random
>>> packets and then feed the PCAP file to the PCAP PMD?
>>>
>>> Random can mean different requirements for different
>>> users/application, I think it is difficult to fit this  under a simple
>>> generic API. Customizing the script for different requirements if a
>>> far better option in my opinion.
>>
>> AFAIK, the thing about pcap pmd is that one needs to rewind pcap file
>> once pcap pmd reaches its end. It requires additional (non-generic)
>> handling in app code.
>
> So add a loop-mode to pcap pmd?

It would be nice to have an option like "...,rewind=1,...".


[dpdk-dev] random pkt generator PMD

2016-06-15 Thread Yerden Zhumabekov


On 15.06.2016 17:50, Jay Rolette wrote:
> On Wed, Jun 15, 2016 at 4:43 AM, Yerden Zhumabekov 
> wrote:
>
>> Hello everybody,
>>
>> DPDK already got a number of PMDs for various eth devices, it even has PMD
>> emulations for backends such as pcap, sw rings etc.
>>
>> I've been thinking about the idea of having PMD which would generate mbufs
>> on the fly in some randomized fashion. This would serve goals like, for
>> example:
>>
>> 1) running tests for applications with network processing capabilities
>> without additional software packet generators;
>> 2) making performance measurements with no hw inteference;
>> 3) ability to run without root privileges, --no-pci, --no-huge, for CI
>> build, so on.
>>
>> Maybe there's no such need, and these goals may be achieved by other means
>> and this idea is flawed? Any thoughts?
>>
> Are you thinking of something along the lines of what BreakingPoint (now
> part of Ixia) does, but as an open source software tool?
>

More dreaming than thinking though :) Live flows generation, malware, 
attacks simulation etc is way out of scope of PMD dev, I guess.


[dpdk-dev] random pkt generator PMD

2016-06-15 Thread Yerden Zhumabekov


On 15.06.2016 16:43, Dumitrescu, Cristian wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yerden
>> Zhumabekov
>> Sent: Wednesday, June 15, 2016 10:44 AM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] random pkt generator PMD
>>
>> Hello everybody,
>>
>> DPDK already got a number of PMDs for various eth devices, it even has
>> PMD emulations for backends such as pcap, sw rings etc.
>>
>> I've been thinking about the idea of having PMD which would generate
>> mbufs on the fly in some randomized fashion. This would serve goals
>> like, for example:
>>
>> 1) running tests for applications with network processing capabilities
>> without additional software packet generators;
>> 2) making performance measurements with no hw inteference;
>> 3) ability to run without root privileges, --no-pci, --no-huge, for CI
>> build, so on.
>>
>> Maybe there's no such need, and these goals may be achieved by other
>> means and this idea is flawed? Any thoughts?
> How about a Perl/Python script to generate a PCAP file with random packets 
> and then feed the PCAP file to the PCAP PMD?
>
> Random can mean different requirements for different users/application, I 
> think it is difficult to fit this  under a simple generic API. Customizing 
> the script for different requirements if a far better option in my opinion.

AFAIK, the thing about pcap pmd is that one needs to rewind pcap file 
once pcap pmd reaches its end. It requires additional (non-generic) 
handling in app code.


[dpdk-dev] random pkt generator PMD

2016-06-15 Thread Yerden Zhumabekov
On 15.06.2016 15:49, Bruce Richardson wrote:
> On Wed, Jun 15, 2016 at 03:43:56PM +0600, Yerden Zhumabekov wrote:
>> Hello everybody,
>>
>> DPDK already got a number of PMDs for various eth devices, it even has PMD
>> emulations for backends such as pcap, sw rings etc.
>>
>> I've been thinking about the idea of having PMD which would generate mbufs
>> on the fly in some randomized fashion. This would serve goals like, for
>> example:
>>
>> 1) running tests for applications with network processing capabilities
>> without additional software packet generators;
>> 2) making performance measurements with no hw inteference;
>> 3) ability to run without root privileges, --no-pci, --no-huge, for CI
>> build, so on.
>>
>> Maybe there's no such need, and these goals may be achieved by other means
>> and this idea is flawed? Any thoughts?
> Isn't some of this already covered by the NULL PMD? Perhaps it could be 
> extended
> or enhanced to meet some more of your requirements?
>
> /Bruce
Right, but developing features for L3/L4 and so on requires a more subtle
approach: live packets, different protocol versions, field manipulation.
In this case some packet mangling/randomizing capabilities would be quite
useful. Something similar to what is done in Pktgen, but more lightweight
and within the same app.

I've almost made up my mind :) so the next question: is there any guide on
PMD development? I'm looking through rte_ether.h right now, but some
documentation would be very nice.


[dpdk-dev] random pkt generator PMD

2016-06-15 Thread Yerden Zhumabekov
Hello everybody,

DPDK already has a number of PMDs for various eth devices; it even has
PMD emulations for backends such as pcap, sw rings etc.

I've been thinking about the idea of having a PMD which would generate
mbufs on the fly in some randomized fashion. This would serve goals
like, for example:

1) running tests for applications with network processing capabilities
without additional software packet generators;
2) making performance measurements with no hw interference;
3) the ability to run without root privileges, with --no-pci and
--no-huge, for CI builds and so on.

Maybe there's no such need, these goals can be achieved by other means,
and this idea is flawed? Any thoughts?


[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-03 Thread Yerden Zhumabekov
+1

We're using INI in our app; it turned out to be quite simple, like this:

[eal]
;; EAL common options:
;;   -c COREMASK         Hexadecimal bitmask of cores to run on
# coremask = fff

;;   -l CORELIST         List of cores to run on
corelist = 3,4,5

;;   --lcores COREMAP    Map lcore set to physical cpu set
; coremap =

;;   --master-lcore ID   Core ID that is used as master
; master-lcore-id = 0

;;   -n CHANNELS         Number of memory channels
memory-channels = 4


On 01.06.2016 22:18, Bruce Richardson wrote:
> On Wed, Jun 01, 2016 at 10:58:41AM -0500, Jay Rolette wrote:
>> On Wed, Jun 1, 2016 at 10:00 AM, Wiles, Keith  
>> wrote:
>>
>>> Started from the link below, but did not want to highjack the thread.
>>> http://dpdk.org/ml/archives/dev/2016-June/040021.html
>>>
>>> I was thinking about this problem from a user perspective and command line
>>> options are very difficult to manage specifically when you have a large
>>> number of options as we have in dpdk. I see all of these options as a type
>>> of database of information for the DPDK and the application, because the
>>> application command line options are also getting very complex as well.
>>>
>>> I have been looking at a number of different options here and the
>>> direction I was thinking was using a file for the options and
>>> configurations with the data in a clean format. It could have been a INI
>>> file or JSON or XML, but they all seem to have some problems I do not like.
>>> The INI file is too flat and I wanted a hierarchy in the data, the JSON
>>> data is similar and XML is just hard to read. I wanted to be able to manage
>>> multiple applications and possible system the DPDK/app runs. The problem
>>> with the above formats is they are just data and not easy to make decisions
>>> about the system and applications at runtime.
>>>
>> INI format is simplest for users to read, but if you really need hierarchy,
>> JSON will do that just fine. Not sure what you mean by "JSON data is
>> similar"...
>>
>>
> I'd be quite concerned if we start needing lots of hierarchies for 
> configuration.
>
> I'd really just like to see ini file format used for this because:
> * it's a well understood, simple format
> * very easily human readable and editable
> * lots of support for it in lots of languages
> * hierarchies are possible in it too - just not as easy as in other formats
>though. [In a previous life I worked with ini files which had address
>hierarchies 6-levels deep in them. It wasn't hard to work with]
> * it works well with grep since you must have one value per-line
> * it allows comments
> * we already have a DPDK library for parsing them
>
> However, for me the biggest advantage of using something like ini is that it
> would force us to keep things simple!
>
> I'd stay away from formats like json or XML that are designed for serializing
> entire objects or structures, and look for something that allows us to just
> specify configuration values.
>
> Regards,
> /Bruce
>



[dpdk-dev] [RFC] kernel paramters like DPDK CLI options

2016-06-01 Thread Yerden Zhumabekov
I recently got tired enough of specifying various EAL options that I
switched to an INI-based configuration. EAL parameters from a dedicated
section of the INI file are parsed into an argv array, which is
subsequently fed to rte_eal_init(). Quite handy, but maybe a little
overkill.
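
A minimal sketch of that conversion step is shown below. It assumes the INI section has already been parsed into key/value pairs; struct kv, build_eal_argv() and the key names are illustrative only (the key-to-flag pairs follow the EAL option names -c, -l, --lcores, --master-lcore and -n), and the call to rte_eal_init() itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct kv {
	const char *key;
	const char *value;
};

/* Illustrative mapping from INI keys to EAL command-line flags. */
static const struct kv eal_flags[] = {
	{ "coremask",        "-c" },
	{ "corelist",        "-l" },
	{ "coremap",         "--lcores" },
	{ "master-lcore-id", "--master-lcore" },
	{ "memory-channels", "-n" },
};

/*
 * Build a NULL-terminated argv from parsed INI options.
 * Returns argc, or -1 on an unknown key or if argv is too small.
 */
int
build_eal_argv(const char *prog, const struct kv *opts, size_t n_opts,
	       const char **argv, size_t argv_cap)
{
	size_t argc = 0, i, j;

	assert(prog != NULL && argv != NULL);
	if (argv_cap < 2 * n_opts + 2)
		return -1;

	argv[argc++] = prog;
	for (i = 0; i < n_opts; i++) {
		const char *flag = NULL;

		for (j = 0; j < sizeof(eal_flags) / sizeof(eal_flags[0]); j++)
			if (strcmp(opts[i].key, eal_flags[j].key) == 0)
				flag = eal_flags[j].value;
		if (flag == NULL)
			return -1;
		argv[argc++] = flag;
		argv[argc++] = opts[i].value;
	}
	argv[argc] = NULL;
	return (int)argc;
}
```

For the options "corelist = 3,4,5" and "memory-channels = 4" this produces the equivalent of "app -l 3,4,5 -n 4". Since rte_eal_init() takes a non-const char **argv, a real implementation would have to build the array from writable copies of the strings.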


On 01.06.2016 12:04, Yuanhan Liu wrote:
> Hi all,
>
> I guess we (maybe just me :) have stated few times something like
> "hey, this kind of stuff is good to have, but you are trying to
> add an EAL CLI option for a specific subsystem/driver, which is
> wrong".
>
> One recent example that is still fresh in my mind is the one from
> Christian [0], that he made a proposal to introduce two new EAL
> options, --vhost-owner and --vhost-perm, to configure the vhost
> user socket file permission.
>
>  [0]: http://dpdk.org/ml/archives/dev/2016-April/037948.html
>
> Another example is the one I met while enabling virtio 1.0 support.
> QEMU has the ability to support both virtio 0.95 (legacy) and 1.0
> (modern) at the same time for one virtio device, therefore, we
> could either use legacy driver or modern driver to operate the
> device. However, the current logic is we try with modern driver
> first, and then legacy driver if it failed. In above case, we will
> never hit the legacy driver. But sometimes, it's nice to let it
> force back to the legacy driver, say, for debug or compare purpose.
>
> Apparently, adding a new EAL option like "--force-legacy" looks
> wrong.
>
> The generic yet elegant solution I just thought of while having
> lunch is to add a new EAL option, say, --extra-options, where we
> could specify driver/subsystem specific options. As you see, it's
> nothing big deal, it just looks like Linux kernel parameters.
>
> Take above two cases as example, it could be:
>
>  --extra-options "vhost-owner=kvm:kvm force-legacy"
>
> Note that those options could also be delimited by comma.
>
> DPDK EAL then will provide some generic helper functions to get
> and parse those options, and let the specific driver/subsystem
> to invoke them to do the actual parse and do the proper action
> when some option is specified, say, virtio PMD driver will force
> back to legacy driver when "force-legacy" is given.
>
> Comments? Makes sense to you guys, or something nice to have?
>
>   --yliu



[dpdk-dev] pcap->eth low TX performance

2015-09-07 Thread Yerden Zhumabekov
tx burst sends, say, only 10-15% of the supplied array. The tail is not
sent, so I have to drop it to avoid overflow.
The Ethernet device is an 82599.

In my app, I pass all traffic through a ring and then feed it to the eth
device. That leads to overflow as well.
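
For reference, this is the general shape of a bounded retry around a burst call that may accept only part of the array. It is a sketch, not our application code: tx_burst_fn stands in for a call with the same return convention as rte_eth_tx_burst() (number of packets actually queued), and fake_tx() is a test double used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical burst callback: returns how many of the n packets it took. */
typedef uint16_t (*tx_burst_fn)(void *ctx, void **pkts, uint16_t n);

/*
 * Keep offering the unsent tail until everything is sent or the device
 * accepts nothing for more than max_retries consecutive calls.
 * Returns the number of packets sent; the caller frees pkts[sent..n).
 */
uint16_t
tx_with_retry(tx_burst_fn tx, void *ctx, void **pkts, uint16_t n,
	      unsigned max_retries)
{
	uint16_t sent = 0;
	unsigned stalled = 0;

	assert(tx != NULL);
	while (sent < n && stalled <= max_retries) {
		uint16_t k = tx(ctx, pkts + sent, (uint16_t)(n - sent));

		sent = (uint16_t)(sent + k);
		stalled = (k == 0) ? stalled + 1 : 0;
	}
	return sent;
}

/* Test double: a "device" that accepts at most per_call packets per call. */
struct fake_dev {
	uint16_t per_call;
	unsigned calls;
};

uint16_t
fake_tx(void *ctx, void **pkts, uint16_t n)
{
	struct fake_dev *dev = ctx;

	(void)pkts;
	dev->calls++;
	return n < dev->per_call ? n : dev->per_call;
}
```

Retrying only helps if the source can be held back. If packets keep arriving faster than the device drains them, the tail still has to be dropped.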

04.09.2015 20:03, Kyle Larose wrote:
> Are you reading from the pcap faster than the device can transmit?
> Does the app hold off reading from the pcap when the ethdev is pushing
> back, or does it just tail drop?
>
> On Fri, Sep 4, 2015 at 12:14 AM, Yerden Zhumabekov
> <e_zhumabekov at sts.kz> wrote:
>
> Hello,
>
> Did anyone try to work with pcap PMD recently? We're testing our app
> with this setup:
>
> PCAP --- rte_eth_rx_burst--> APP-> rte_eth_tx_burst -> ethdev
>
> I'm experiencing very low TX performance leading to massive mbuf drop
> while trying to send those packets over the Ethernet device. I tried
> running ordinary l2fwd and got the same issue with over 80-90% of
> packets drop. When I substitute PCAP with another ordinary Ethernet
> device, everything works fine. Can anyone share an idea?
>
> --
> Sincerely,
>
> Yerden Zhumabekov
> State Technical Service
> Astana, KZ
>
>

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] pcap->eth low TX performance

2015-09-04 Thread Yerden Zhumabekov
Hello,

Has anyone worked with the pcap PMD recently? We're testing our app
with this setup:

PCAP --> rte_eth_rx_burst --> APP --> rte_eth_tx_burst --> ethdev

I'm experiencing very low TX performance, leading to massive mbuf drops
while trying to send those packets over the Ethernet device. I tried
running plain l2fwd and got the same issue, with 80-90% of packets
dropped. When I replace PCAP with another ordinary Ethernet device,
everything works fine. Can anyone share an idea?

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH v2] hash: fix breaking strict-aliasing rules

2015-03-24 Thread Yerden Zhumabekov
Fix rte_hash_crc() function by making use of a uintptr_t variable
to hold the pointer to the data being hashed. In this way, casting
a uint64_t pointer to a uint32_t pointer is avoided.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 1cd626c..abdbd9a 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -513,35 +513,36 @@ rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
 {
    unsigned i;
    uint64_t temp = 0;
-   const uint64_t *p64 = (const uint64_t *)data;
+   uintptr_t pd = (uintptr_t) data;

    for (i = 0; i < data_len / 8; i++) {
-       init_val = rte_hash_crc_8byte(*p64++, init_val);
+       init_val = rte_hash_crc_8byte(*(const uint64_t *)pd, init_val);
+       pd += 8;
    }

    switch (7 - (data_len & 0x07)) {
    case 0:
-       temp |= (uint64_t) *((const uint8_t *)p64 + 6) << 48;
+       temp |= (uint64_t) *((const uint8_t *)pd + 6) << 48;
        /* Fallthrough */
    case 1:
-       temp |= (uint64_t) *((const uint8_t *)p64 + 5) << 40;
+       temp |= (uint64_t) *((const uint8_t *)pd + 5) << 40;
        /* Fallthrough */
    case 2:
-       temp |= (uint64_t) *((const uint8_t *)p64 + 4) << 32;
-       temp |= *((const uint32_t *)p64);
+       temp |= (uint64_t) *((const uint8_t *)pd + 4) << 32;
+       temp |= *(const uint32_t *)pd;
        init_val = rte_hash_crc_8byte(temp, init_val);
        break;
    case 3:
-       init_val = rte_hash_crc_4byte(*(const uint32_t *)p64, init_val);
+       init_val = rte_hash_crc_4byte(*(const uint32_t *)pd, init_val);
        break;
    case 4:
-       temp |= *((const uint8_t *)p64 + 2) << 16;
+       temp |= *((const uint8_t *)pd + 2) << 16;
        /* Fallthrough */
    case 5:
-       temp |= *((const uint8_t *)p64 + 1) << 8;
+       temp |= *((const uint8_t *)pd + 1) << 8;
        /* Fallthrough */
    case 6:
-       temp |= *((const uint8_t *)p64);
+       temp |= *(const uint8_t *)pd;
        init_val = rte_hash_crc_4byte(temp, init_val);
        /* Fallthrough */
    default:
-- 
1.7.9.5
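
For comparison, another common way to read trailing bytes without casting the data pointer to a wider type is to copy them with memcpy(). The sketch below illustrates that alternative; it is not what this patch does, and load_partial_u64() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy up to 8 trailing bytes into a zero-initialized uint64_t.
 * The bytes keep their memory order, so on a little-endian CPU the
 * first byte ends up in the low-order position.
 */
uint64_t
load_partial_u64(const void *data, size_t len)
{
	uint64_t v = 0;

	assert(data != NULL && len <= sizeof(v));
	memcpy(&v, data, len);
	return v;
}
```

This also never reads past the end of the buffer, since only 'len' bytes are copied. Compilers commonly turn a fixed-size memcpy() into a plain load, but that depends on the compiler and optimization level.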



[dpdk-dev] [PATCH] hash: fix breaking strict-aliasing rules

2015-03-20 Thread Yerden Zhumabekov
Hi Bruce,

Answers below.

19.03.2015 22:25, Bruce Richardson wrote:
> On Wed, Mar 18, 2015 at 10:51:12PM +0600, Yerden Zhumabekov wrote:
>> Fix rte_hash_crc() function. Casting uint64_t pointer to uint32_t
>> may trigger a compiler warning about breaking strict-aliasing rules.
>> To avoid that, introduce a lookup table which is used to mask out
>> a remainder of data.
>>
>> See issue #1, http://dpdk.org/ml/archives/dev/2015-March/015174.html
>>
>> Signed-off-by: Yerden Zhumabekov 
> Looks ok to me. Couple of minor suggestions below.
>
> /Bruce
>
>> ---
>>  lib/librte_hash/rte_hash_crc.h |   31 +++
>>  1 file changed, 15 insertions(+), 16 deletions(-)
>>
>> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
>> index 3dcd362..e81920f 100644
>> --- a/lib/librte_hash/rte_hash_crc.h
>> +++ b/lib/librte_hash/rte_hash_crc.h
>> @@ -323,6 +323,16 @@ static const uint32_t crc32c_tables[8][256] = {{
>>   0xE54C35A1, 0xAC704886, 0x7734CFEF, 0x3E08B2C8, 0xC451B7CC, 0x8D6DCAEB, 
>> 0x56294D82, 0x1F1530A5
>>  }};
>>  
>> +static const uint64_t odd_8byte_mask[] = {
> Where does the name of this variable come from, it seems unclear to me?

If the number of bytes of data to hash cannot be evenly divided by 8,
the remainder is extracted with these masks. Hence, we have 'odd' bytes
to mask out. Maybe it's my poor English. :) Suggestions are welcome.
What about remainder_8byte_mask?

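>> +   0x00FFFFFFFFFFFFFF,
>> +   0x0000FFFFFFFFFFFF,
>> +   0x000000FFFFFFFFFF,
>> +   0x00000000FFFFFFFF,
>> +   0x0000000000FFFFFF,
>> +   0x000000000000FFFF,
>> +   0x00000000000000FF,
>> +};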
>> +
>>  #define CRC32_UPD(crc, n) \
>>  (crc32c_tables[(n)][(crc) & 0xFF] ^ \
>>   crc32c_tables[(n)-1][((crc) >> 8) & 0xFF])
>> @@ -535,38 +545,27 @@ static inline uint32_t
>>  rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
>>  {
>>  unsigned i;
>> -uint64_t temp = 0;
>> +uint64_t temp;
> Is it worth keeping the variable "temp" at all? It looks to me like it could
> be done away with without seriously affecting readability.

Noted.

>>  const uint64_t *p64 = (const uint64_t *)data;
>>  
>>  for (i = 0; i < data_len / 8; i++) {
>>  init_val = rte_hash_crc_8byte(*p64++, init_val);
>>  }
>>  
>> -switch (7 - (data_len & 0x07)) {
>> +i = 7 - (data_len & 0x07);
> i is not a terribly meaningful variable name, perhaps a slightly longer, more
> meaningful name might improve readability.

Noted, I'll declare a new one.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] Virtual NIC interface fails to receive packets

2015-03-19 Thread Yerden Zhumabekov
'82545EM Gigabit Ethernet Controller (Copper)' if=eth1
> drv=e1000 unused=igb_uio *Active*
> :00:11.0 '82545EM Gigabit Ethernet Controller (Copper)' if=eth2
> drv=e1000 unused=igb_uio *Active*
>
> Other network devices
> =
> 
>
> got the below output:
>
> controller at controller-VirtualBox:~$ ifconfig eth4 >>> Corresponds to the
> interface bound to igb driver
>
> eth4: error fetching interface information: Device not found
> controller at controller-VirtualBox:~$ ifconfig eth2
> eth2  Link encap:Ethernet  HWaddr 08:00:27:bc:04:b6
>   inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
>   inet6 addr: fe80::a00:27ff:febc:4b6/64 Scope:Link
>   UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>   RX packets:197 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:271 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:15864 (15.8 KB)  TX bytes:30592 (30.5 KB)
>
> controller at controller-VirtualBox:~$ ifconfig eth1
> eth1  Link encap:Ethernet  HWaddr 08:00:27:ef:8b:a1
>   inet addr:192.168.56.101  Bcast:192.168.56.255  Mask:255.255.255.0
>   inet6 addr: fe80::a00:27ff:feef:8ba1/64 Scope:Link
>   UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>   RX packets:2 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:90 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:1180 (1.1 KB)  TX bytes:17162 (17.1 KB)
>
>
>   After this when I send traffic with this MAC >>> 080027b73a25
> traffic does not flow to this interface
>
>  Regards
>  Shankari.V

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH] hash: fix breaking strict-aliasing rules

2015-03-18 Thread Yerden Zhumabekov
Fix rte_hash_crc() function. Casting a uint64_t pointer to a uint32_t
pointer may trigger a compiler warning about breaking strict-aliasing
rules. To avoid that, introduce a lookup table which is used to mask out
the remainder of the data.

See issue #1, http://dpdk.org/ml/archives/dev/2015-March/015174.html

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 3dcd362..e81920f 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -323,6 +323,16 @@ static const uint32_t crc32c_tables[8][256] = {{
    0xE54C35A1, 0xAC704886, 0x7734CFEF, 0x3E08B2C8, 0xC451B7CC, 0x8D6DCAEB, 0x56294D82, 0x1F1530A5
 }};

+static const uint64_t odd_8byte_mask[] = {
+   0x00FFFFFFFFFFFFFF,
+   0x0000FFFFFFFFFFFF,
+   0x000000FFFFFFFFFF,
+   0x00000000FFFFFFFF,
+   0x0000000000FFFFFF,
+   0x000000000000FFFF,
+   0x00000000000000FF,
+};
+
 #define CRC32_UPD(crc, n) \
    (crc32c_tables[(n)][(crc) & 0xFF] ^ \
     crc32c_tables[(n)-1][((crc) >> 8) & 0xFF])
@@ -535,38 +545,27 @@ static inline uint32_t
 rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
 {
    unsigned i;
-   uint64_t temp = 0;
+   uint64_t temp;
    const uint64_t *p64 = (const uint64_t *)data;

    for (i = 0; i < data_len / 8; i++) {
        init_val = rte_hash_crc_8byte(*p64++, init_val);
    }

-   switch (7 - (data_len & 0x07)) {
+   i = 7 - (data_len & 0x07);
+   switch (i) {
    case 0:
-       temp |= (uint64_t) *((const uint8_t *)p64 + 6) << 48;
-       /* Fallthrough */
    case 1:
-       temp |= (uint64_t) *((const uint8_t *)p64 + 5) << 40;
-       /* Fallthrough */
    case 2:
-       temp |= (uint64_t) *((const uint8_t *)p64 + 4) << 32;
-       temp |= *((const uint32_t *)p64);
+       temp = odd_8byte_mask[i] & *p64;
        init_val = rte_hash_crc_8byte(temp, init_val);
        break;
    case 3:
-       init_val = rte_hash_crc_4byte(*(const uint32_t *)p64, init_val);
-       break;
    case 4:
-       temp |= *((const uint8_t *)p64 + 2) << 16;
-       /* Fallthrough */
    case 5:
-       temp |= *((const uint8_t *)p64 + 1) << 8;
-       /* Fallthrough */
    case 6:
-       temp |= *((const uint8_t *)p64);
+       temp = odd_8byte_mask[i] & *p64;
        init_val = rte_hash_crc_4byte(temp, init_val);
-       /* Fallthrough */
    default:
        break;
    }
-- 
1.7.9.5
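
As a standalone illustration of the masking idea, the sketch below keeps only the low-order bytes of a 64-bit word that belong to the remainder. It assumes a little-endian CPU, where the first bytes in memory are the low-order bytes of the loaded word. The table is named remainder_8byte_mask as floated in the review discussion, and mask_remainder() is a hypothetical helper that is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Entry i keeps the low (7 - i) bytes of a 64-bit word. */
static const uint64_t remainder_8byte_mask[] = {
	0x00FFFFFFFFFFFFFFULL,
	0x0000FFFFFFFFFFFFULL,
	0x000000FFFFFFFFFFULL,
	0x00000000FFFFFFFFULL,
	0x0000000000FFFFFFULL,
	0x000000000000FFFFULL,
	0x00000000000000FFULL,
};

/*
 * Keep only the (data_len % 8) low-order bytes of 'word'.
 * Returns 0 when data_len is a multiple of 8 (no remainder).
 */
uint64_t
mask_remainder(uint64_t word, uint32_t data_len)
{
	uint32_t rem = data_len & 0x07;

	if (rem == 0)
		return 0;
	assert(7 - rem < sizeof(remainder_8byte_mask) /
	       sizeof(remainder_8byte_mask[0]));
	return word & remainder_8byte_mask[7 - rem];
}
```

The index 7 - (data_len & 0x07) is the same expression the switch in the patch uses. For a remainder of rem bytes the selected mask equals (1 << (8 * rem)) - 1.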



[dpdk-dev] [PATCH 1/3 v2] librte_hash: Fix unsupported instruction `crc32' in i686 platform

2015-03-10 Thread Yerden Zhumabekov


08.03.2015 0:39, Thomas Monjalon wrote:
> 2015-03-06 01:39, Qiu, Michael:
>> On 3/6/2015 1:11 AM, Thomas Monjalon wrote:
>>> 2015-03-06 00:55, Michael Qiu:
>>>> ... skipped ...
>>>>  
>>>> +#if defined RTE_ARCH_I686 || defined RTE_ARCH_X86_64
>>>>  static inline uint32_t
>>>>  crc32c_sse42_u32(uint32_t data, uint32_t init_val)
>>>>  {
>>>> @@ -373,7 +374,9 @@ crc32c_sse42_u32(uint32_t data, uint32_t init_val)
>>>>: [data] "rm" (data));
>>>>return init_val;
>>>>  }
>>>> +#endif
>>> Wouldn't it be more elegant to define a stub which returns 0 in #else
>>> in order to remove #ifdef below?
>>> Not sure, matter of taste.
>> It may be not a good idea, see rte_hash_crc_8byte(), if no crc32
>> support, it will use crc32c_2words(), if we define a stub which returns
>> 0 in #else, then we need always check the return value whether it is
>> none-zero otherwise need fallback.
> I don't think so.
> The stub would never be called because the calls are guarded by the cpuflag
> condition.

That would be a nasty surprise if someone launched such a pre-built
binary on an SSE4.2-capable machine :) It's fine, though, if binary
portability is out of scope here.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH 1/3 v2] librte_hash: Fix unsupported instruction `crc32' in i686 platform

2015-03-05 Thread Yerden Zhumabekov
Acked-by: Yerden Zhumabekov 

05.03.2015 22:55, Michael Qiu wrote:
> CC rte_hash.o
> Error: unsupported instruction `crc32'
>
> The root cause is that i686 platform does not support 'crc32q'
> Need make it only available in x86_64 platform
>
> Signed-off-by: Michael Qiu 
> ---
> v2 --> v1:
>  Make crc32 instruction only works in X86 platform
>  lib/librte_hash/rte_hash_crc.h | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
> index d28bb2a..c0a789e 100644
> --- a/lib/librte_hash/rte_hash_crc.h
> +++ b/lib/librte_hash/rte_hash_crc.h
> @@ -364,6 +364,7 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>   return crc;
>  }
>  
> +#if defined RTE_ARCH_I686 || defined RTE_ARCH_X86_64
>  static inline uint32_t
>  crc32c_sse42_u32(uint32_t data, uint32_t init_val)
>  {
> @@ -373,7 +374,9 @@ crc32c_sse42_u32(uint32_t data, uint32_t init_val)
>   : [data] "rm" (data));
>   return init_val;
>  }
> +#endif
>  
> +#ifdef RTE_ARCH_X86_64
>  static inline uint32_t
>  crc32c_sse42_u64(uint64_t data, uint64_t init_val)
>  {
> @@ -383,7 +386,9 @@ crc32c_sse42_u64(uint64_t data, uint64_t init_val)
>   : [data] "rm" (data));
>   return init_val;
>  }
> +#endif
>  
> +#if defined RTE_ARCH_I686 || defined RTE_ARCH_X86_64
>  static inline uint32_t
>  crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
>  {
> @@ -397,6 +402,7 @@ crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
>   init_val = crc32c_sse42_u32(d.u32[1], init_val);
>   return init_val;
>  }
> +#endif
>  
>  #define CRC32_SW (1U << 0)
>  #define CRC32_SSE42 (1U << 1)
> @@ -455,8 +461,10 @@ rte_hash_crc_init_alg(void)
>  static inline uint32_t
>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>  {
> +#if defined RTE_ARCH_I686 || defined RTE_ARCH_X86_64
>   if (likely(crc32_alg & CRC32_SSE42))
>   return crc32c_sse42_u32(data, init_val);
> +#endif
>  
>   return crc32c_1word(data, init_val);
>  }
> @@ -476,11 +484,15 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>  static inline uint32_t
>  rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
>  {
> +#ifdef RTE_ARCH_X86_64
>   if (likely(crc32_alg == CRC32_SSE42_x64))
>   return crc32c_sse42_u64(data, init_val);
> +#endif
>  
> +#if defined RTE_ARCH_I686 || defined RTE_ARCH_X86_64
>   if (likely(crc32_alg & CRC32_SSE42))
>   return crc32c_sse42_u64_mimic(data, init_val);
> +#endif
>  
>   return crc32c_2words(data, init_val);
>  }

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH 1/3] librte_hash: Fix unsupported instruction `crc32' in i686 platform

2015-03-05 Thread Yerden Zhumabekov
Hi Michael,

Thanks for this patch. In fact, I didn't try to compile on i686 when
developing the original software fallback for CRC32.

I think if we want the code to compile on as many platforms as possible, we
should compile out all SSE4.2 instructions. As for the patch, we could also
compile out the code emitting the 'crc32l' instruction if the arch is not x86.
This concerns two functions: crc32c_sse42_u32() and
crc32c_sse42_u64_mimic().

The compile check might be something like this:

#if defined(RTE_ARCH_I686) || defined(RTE_ARCH_X86_64)
#endif

Otherwise, the patch looks good.
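
The guard suggested above could look roughly like the following. This is a minimal sketch, not the code from the patch: it assumes GCC or Clang, uses the compiler-predefined `__i386__` / `__x86_64__` macros as stand-ins for RTE_ARCH_I686 / RTE_ARCH_X86_64 so that it builds outside DPDK, and the names `HAVE_X86_CRC32`, `crc32c_sw_u32()`, `crc32c_hw_u32()` and `crc32c_u32()` are hypothetical. The bitwise fallback is for illustration only; the series itself uses lookup tables.

```c
#include <stdint.h>

/* Stand-in for RTE_ARCH_I686 / RTE_ARCH_X86_64 (assumption: GCC or Clang). */
#if defined(__i386__) || defined(__x86_64__)
#define HAVE_X86_CRC32 1
#else
#define HAVE_X86_CRC32 0
#endif

/* Bitwise CRC32C (reflected polynomial 0x82F63B78) over one 32-bit word. */
static inline uint32_t
crc32c_sw_u32(uint32_t data, uint32_t init_val)
{
	uint32_t crc = init_val ^ data;
	int i;

	for (i = 0; i < 32; i++)
		crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
	return crc;
}

#if HAVE_X86_CRC32
/* Compiled only on x86, so other targets never see the 'crc32l' mnemonic. */
static inline uint32_t
crc32c_hw_u32(uint32_t data, uint32_t init_val)
{
	__asm__ volatile(
		"crc32l %[data], %[init_val];"
		: [init_val] "+r" (init_val)
		: [data] "rm" (data));
	return init_val;
}
#endif

/* Pick the hardware path only if it was compiled in and the CPU has SSE4.2. */
static inline uint32_t
crc32c_u32(uint32_t data, uint32_t init_val)
{
#if HAVE_X86_CRC32
	if (__builtin_cpu_supports("sse4.2"))
		return crc32c_hw_u32(data, init_val);
#endif
	return crc32c_sw_u32(data, init_val);
}
```

The compile-time guard decides whether the instruction may appear in the object file at all, while the run-time check decides whether it is actually executed, so both are needed.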

05.03.2015 19:15, Michael Qiu wrote:
> CC rte_hash.o
> Error: unsupported instruction `crc32'
>
> The root cause is that i686 platform does not support 'crc32q'
> Need make it only available in x86_64 platform
>
> Signed-off-by: Michael Qiu 
> ---
>  lib/librte_hash/rte_hash_crc.h | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
> index d28bb2a..4e9546f 100644
> --- a/lib/librte_hash/rte_hash_crc.h
> +++ b/lib/librte_hash/rte_hash_crc.h
> @@ -374,6 +374,7 @@ crc32c_sse42_u32(uint32_t data, uint32_t init_val)
>   return init_val;
>  }
>  
> +#ifdef RTE_ARCH_X86_64
>  static inline uint32_t
>  crc32c_sse42_u64(uint64_t data, uint64_t init_val)
>  {
> @@ -383,6 +384,7 @@ crc32c_sse42_u64(uint64_t data, uint64_t init_val)
>   : [data] "rm" (data));
>   return init_val;
>  }
> +#endif
>  
>  static inline uint32_t
>  crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
> @@ -476,8 +478,10 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>  static inline uint32_t
>  rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
>  {
> +#ifdef RTE_ARCH_X86_64
>   if (likely(crc32_alg == CRC32_SSE42_x64))
>   return crc32c_sse42_u64(data, init_val);
> +#endif
>  
>   if (likely(crc32_alg & CRC32_SSE42))
>   return crc32c_sse42_u64_mimic(data, init_val);

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] [PATCH v2] app/test: add crc32 algorithms equivalence check

2015-02-25 Thread Yerden Zhumabekov
All notes taken into account. v3 posted.

25.02.2015 17:34, Bruce Richardson wrote:
> On Wed, Feb 25, 2015 at 10:08:32AM +0600, Yerden Zhumabekov wrote:
>> New function test_crc32_hash_alg_equiv() checks whether software,
>> 4-byte operand and 8-byte operand versions of CRC32 hash function
>> implementations return the same result value.
>>
>> Signed-off-by: Yerden Zhumabekov 
> Two small notes below for improving output on error.
>
> Acked-by: Bruce Richardson 
>
>> ---
>>  app/test/test_hash.c |   63 ++
>>  1 file changed, 63 insertions(+)
>>
>> diff --git a/app/test/test_hash.c b/app/test/test_hash.c
>> index 76b1b8f..3e94af1 100644
>> --- a/app/test/test_hash.c
>> +++ b/app/test/test_hash.c
>> @@ -177,6 +177,66 @@ static struct rte_hash_parameters ut_params = {
>>  .socket_id = 0,
>>  };
>>  
>> +#define CRC32_ITERATIONS (1U << 20)
>> +#define CRC32_DWORDS (1U << 6)
>> +/*
>> + * Test if all CRC32 implementations yield the same hash value
>> + */
>> +static int
>> +test_crc32_hash_alg_equiv(void)
>> +{
>> +uint32_t hash_val;
>> +uint32_t init_val;
>> +uint64_t data64[CRC32_DWORDS];
>> +unsigned i, j;
>> +size_t data_len;
>> +
>> +printf("# CRC32 implementations equivalence test\n");
>> +for (i = 0; i < CRC32_ITERATIONS; i++) {
>> +/* Randomizing data_len of data set */
>> +data_len = (size_t) ((rte_rand() % sizeof(data64)) + 1);
>> +init_val = (uint32_t) rte_rand();
>> +
>> +/* Fill the data set */
>> +for (j = 0; j < CRC32_DWORDS; j++)
>> +data64[j] = rte_rand();
>> +
>> +/* Calculate software CRC32 */
>> +rte_hash_crc_set_alg(CRC32_SW);
>> +hash_val = rte_hash_crc(data64, data_len, init_val);
>> +
>> +/* Check against 4-byte-operand sse4.2 CRC32 if available */
>> +rte_hash_crc_set_alg(CRC32_SSE42);
>> +if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
>> +printf("Failed checking CRC32_SW against CRC32_SSE42\n");
>> +break;
>> +}
>> +
>> +/* Check against 8-byte-operand sse4.2 CRC32 if available */
>> +rte_hash_crc_set_alg(CRC32_SSE42_x64);
>> +if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
>> +printf("Failed checking CRC32_SW against CRC32_SSE42_x64\n");
>> +break;
>> +}
>> +}
>> +
>> +/* Resetting to best available algorithm */
>> +rte_hash_crc_set_alg(CRC32_SSE42_x64);
>> +
>> +if (i == CRC32_ITERATIONS)
>> +return 0;
>> +
>> +printf("Failed test data (hex):\n");
>> +
>> +for (j = 0; j < data_len; j++) {
>> +printf("%02X", ((uint8_t *)data64)[j]);
> Put in a space after each hex character, otherwise it comes out like:
>
> Failed test data (hex):
> AAD292776348010C7A18D3080DB3A300
> FD
> Test Failed
>
> [I forced a failure by changing a != to == to test it, don't worry, the
> hash calculations are fine! :-)]
>
>> +    if ((j+1) % 16 == 0 || j == data_len - 1)
>> +printf("\n");
>> +}
> Maybe also print out here, or before the hex digits, the length of the data
> that was tested. e.g. "printf("%u bytes total\n", data_len);" or similar.
>> +
>> +return -1;
>> +}
>> +
>>  /*
>>   * Test a hash function.
>>   */
>> @@ -1356,6 +1416,9 @@ test_hash(void)
>>  
>>  run_hash_func_tests();
>>  
>> +if (test_crc32_hash_alg_equiv() < 0)
>> +return -1;
>> +
>>  return 0;
>>  }
>>  
>> -- 
>> 1.7.9.5
>>

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH v3] app/test: add crc32 algorithms equivalence check

2015-02-25 Thread Yerden Zhumabekov
New function test_crc32_hash_alg_equiv() checks whether software,
4-byte operand and 8-byte operand versions of CRC32 hash function
implementations return the same result value.

Signed-off-by: Yerden Zhumabekov 
---
 app/test/test_hash.c |   60 ++
 1 file changed, 60 insertions(+)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 76b1b8f..653dd86 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -177,6 +177,63 @@ static struct rte_hash_parameters ut_params = {
.socket_id = 0,
 };

+#define CRC32_ITERATIONS (1U << 20)
+#define CRC32_DWORDS (1U << 6)
+/*
+ * Test if all CRC32 implementations yield the same hash value
+ */
+static int
+test_crc32_hash_alg_equiv(void)
+{
+   uint32_t hash_val;
+   uint32_t init_val;
+   uint64_t data64[CRC32_DWORDS];
+   unsigned i, j;
+   size_t data_len;
+
+   printf("# CRC32 implementations equivalence test\n");
+   for (i = 0; i < CRC32_ITERATIONS; i++) {
+   /* Randomizing data_len of data set */
+   data_len = (size_t) ((rte_rand() % sizeof(data64)) + 1);
+   init_val = (uint32_t) rte_rand();
+
+   /* Fill the data set */
+   for (j = 0; j < CRC32_DWORDS; j++)
+   data64[j] = rte_rand();
+
+   /* Calculate software CRC32 */
+   rte_hash_crc_set_alg(CRC32_SW);
+   hash_val = rte_hash_crc(data64, data_len, init_val);
+
+   /* Check against 4-byte-operand sse4.2 CRC32 if available */
+   rte_hash_crc_set_alg(CRC32_SSE42);
+   if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
+   printf("Failed checking CRC32_SW against CRC32_SSE42\n");
+   break;
+   }
+
+   /* Check against 8-byte-operand sse4.2 CRC32 if available */
+   rte_hash_crc_set_alg(CRC32_SSE42_x64);
+   if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
+   printf("Failed checking CRC32_SW against CRC32_SSE42_x64\n");
+   break;
+   }
+   }
+
+   /* Resetting to best available algorithm */
+   rte_hash_crc_set_alg(CRC32_SSE42_x64);
+
+   if (i == CRC32_ITERATIONS)
+   return 0;
+
+   printf("Failed test data (hex, %zu bytes total):\n", data_len);
+   for (j = 0; j < data_len; j++)
+   printf("%02X%c", ((uint8_t *)data64)[j],
+   ((j+1) % 16 == 0 || j == data_len - 1) ? '\n' : ' ');
+
+   return -1;
+}
+
 /*
  * Test a hash function.
  */
@@ -1356,6 +1413,9 @@ test_hash(void)

run_hash_func_tests();

+   if (test_crc32_hash_alg_equiv() < 0)
+   return -1;
+
return 0;
 }

-- 
1.7.9.5



[dpdk-dev] [PATCH v2] app/test: add crc32 algorithms equivalence check

2015-02-25 Thread Yerden Zhumabekov
New function test_crc32_hash_alg_equiv() checks whether software,
4-byte operand and 8-byte operand versions of CRC32 hash function
implementations return the same result value.

Signed-off-by: Yerden Zhumabekov 
---
 app/test/test_hash.c |   63 ++
 1 file changed, 63 insertions(+)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 76b1b8f..3e94af1 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -177,6 +177,66 @@ static struct rte_hash_parameters ut_params = {
.socket_id = 0,
 };

+#define CRC32_ITERATIONS (1U << 20)
+#define CRC32_DWORDS (1U << 6)
+/*
+ * Test if all CRC32 implementations yield the same hash value
+ */
+static int
+test_crc32_hash_alg_equiv(void)
+{
+   uint32_t hash_val;
+   uint32_t init_val;
+   uint64_t data64[CRC32_DWORDS];
+   unsigned i, j;
+   size_t data_len;
+
+   printf("# CRC32 implementations equivalence test\n");
+   for (i = 0; i < CRC32_ITERATIONS; i++) {
+   /* Randomizing data_len of data set */
+   data_len = (size_t) ((rte_rand() % sizeof(data64)) + 1);
+   init_val = (uint32_t) rte_rand();
+
+   /* Fill the data set */
+   for (j = 0; j < CRC32_DWORDS; j++)
+   data64[j] = rte_rand();
+
+   /* Calculate software CRC32 */
+   rte_hash_crc_set_alg(CRC32_SW);
+   hash_val = rte_hash_crc(data64, data_len, init_val);
+
+   /* Check against 4-byte-operand sse4.2 CRC32 if available */
+   rte_hash_crc_set_alg(CRC32_SSE42);
+   if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
+   printf("Failed checking CRC32_SW against CRC32_SSE42\n");
+   break;
+   }
+
+   /* Check against 8-byte-operand sse4.2 CRC32 if available */
+   rte_hash_crc_set_alg(CRC32_SSE42_x64);
+   if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
+   printf("Failed checking CRC32_SW against CRC32_SSE42_x64\n");
+   break;
+   }
+   }
+
+   /* Resetting to best available algorithm */
+   rte_hash_crc_set_alg(CRC32_SSE42_x64);
+
+   if (i == CRC32_ITERATIONS)
+   return 0;
+
+   printf("Failed test data (hex):\n");
+
+   for (j = 0; j < data_len; j++) {
+   printf("%02X", ((uint8_t *)data64)[j]);
+   if ((j+1) % 16 == 0 || j == data_len - 1)
+   printf("\n");
+   }
+
+   return -1;
+}
+
 /*
  * Test a hash function.
  */
@@ -1356,6 +1416,9 @@ test_hash(void)

run_hash_func_tests();

+   if (test_crc32_hash_alg_equiv() < 0)
+   return -1;
+
return 0;
 }

-- 
1.7.9.5



[dpdk-dev] [PATCH] app/test: add crc32 algorithms equivalence check

2015-02-25 Thread Yerden Zhumabekov

24.02.2015 20:57, Bruce Richardson wrote:
> +#define CRC32_ITERATIONS (1U << 16)
> This test takes almost no time at all, so maybe we want to do a few more
> iterations e.g. 2^18 - 2^20. 
Noted, I'll put (1U << 20).
>> +printf("# CRC32 implementations equivalence test\n");
>> +for (i = 0; i < CRC32_ITERATIONS; i++) {
>> +/* Randomizing data_len of data set */
>> +data_len = (size_t) (rte_rand() % sizeof(data64) + 1);
> I suggest parenthesis around the % operation for clarity.
Noted.
>> +init_val = (uint32_t) rte_rand();
>> +
>> +/* Fill the data set */
>> +for (j = 0; j < CRC32_DWORDS; j++) {
>> +data64[j] = rte_rand();
>> +}
> As a matter of style, we generally omit braces for single-statement loop 
> bodies.
Noted.
>> +
>> +/* Calculate software CRC32 */
>> +rte_hash_crc_set_alg(CRC32_SW);
>> +hash_val = rte_hash_crc(data64, data_len, init_val);
>> +
>> +/* Check against 4-byte-operand sse4.2 CRC32 if available */
>> +rte_hash_crc_set_alg(CRC32_SSE42);
>> +if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
>> +res = -1;
> I think you need a print statement here, stating that the test failed, and
> why exactly it failed.
> Also, rather than setting res to -1, you can just do a print and break, and
> change "return res" below to "return i == CRC32_ITERATIONS ? 0 : -1", making
> use of the fact that you can check i to detect early termination on error.

Noted. In that case, I'll also print out the test data which caused the
break. It might be handy for further investigation.
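
The "print and break, then check the loop counter" idiom discussed above can be sketched in isolation as follows. The names `check_equiv()`, `impl_ref()`, `impl_fast()` and `impl_broken()` are hypothetical and only illustrate the control flow; they are not part of the test suite.

```c
#include <stdint.h>
#include <stdio.h>

#define ITERATIONS 1000u

/* Two hypothetical implementations that should agree on every input. */
static uint32_t impl_ref(uint32_t x)  { return x * 2u; }
static uint32_t impl_fast(uint32_t x) { return x << 1; }

/* A deliberately wrong variant, used to exercise the failure path. */
static uint32_t impl_broken(uint32_t x) { return x == 7u ? 0u : x * 2u; }

/* Returns 0 if all iterations passed, -1 after the first mismatch. */
static int
check_equiv(uint32_t (*a)(uint32_t), uint32_t (*b)(uint32_t))
{
	uint32_t i;

	for (i = 0; i < ITERATIONS; i++) {
		if (a(i) != b(i)) {
			/* Say what failed and why, then stop early. */
			printf("Mismatch at input %u: %u != %u\n",
				(unsigned)i, (unsigned)a(i), (unsigned)b(i));
			break;
		}
	}

	/* i reaches ITERATIONS only if the loop never hit the break. */
	return i == ITERATIONS ? 0 : -1;
}
```

Because the counter doubles as the success indicator, no separate result variable is needed, and the failing input is still in scope after the loop for a diagnostic dump.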

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH] app/test: add crc32 algorithms equivalence check

2015-02-24 Thread Yerden Zhumabekov
New function test_crc32_hash_alg_equiv() checks whether software,
4-byte operand and 8-byte operand versions of CRC32 hash function
implementations return the same result value.

Signed-off-by: Yerden Zhumabekov 
---
 app/test/test_hash.c |   53 ++
 1 file changed, 53 insertions(+)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 76b1b8f..941dc69 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -177,6 +177,56 @@ static struct rte_hash_parameters ut_params = {
.socket_id = 0,
 };

+#define CRC32_ITERATIONS (1U << 16)
+#define CRC32_DWORDS (1U << 6)
+/*
+ * Test if all CRC32 implementations yield the same hash value
+ */
+static int
+test_crc32_hash_alg_equiv(void)
+{
+   uint32_t hash_val;
+   uint32_t init_val;
+   uint64_t data64[CRC32_DWORDS];
+   unsigned i, j;
+   size_t data_len;
+   int res = 0;
+
+   printf("# CRC32 implementations equivalence test\n");
+   for (i = 0; i < CRC32_ITERATIONS; i++) {
+   /* Randomizing data_len of data set */
+   data_len = (size_t) (rte_rand() % sizeof(data64) + 1);
+   init_val = (uint32_t) rte_rand();
+
+   /* Fill the data set */
+   for (j = 0; j < CRC32_DWORDS; j++) {
+   data64[j] = rte_rand();
+   }
+
+   /* Calculate software CRC32 */
+   rte_hash_crc_set_alg(CRC32_SW);
+   hash_val = rte_hash_crc(data64, data_len, init_val);
+
+   /* Check against 4-byte-operand sse4.2 CRC32 if available */
+   rte_hash_crc_set_alg(CRC32_SSE42);
+   if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
+   res = -1;
+   break;
+   }
+
+   /* Check against 8-byte-operand sse4.2 CRC32 if available */
+   rte_hash_crc_set_alg(CRC32_SSE42_x64);
+   if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
+   res = -1;
+   break;
+   }
+   }
+
+   /* Resetting to best available algorithm */
+   rte_hash_crc_set_alg(CRC32_SSE42_x64);
+   return res;
+}
+
 /*
  * Test a hash function.
  */
@@ -1356,6 +1406,9 @@ test_hash(void)

run_hash_func_tests();

+   if (test_crc32_hash_alg_equiv() < 0)
+   return -1;
+
return 0;
 }

-- 
1.7.9.5



[dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent

2015-02-24 Thread Yerden Zhumabekov

23.02.2015 23:36, Thomas Monjalon wrote:
> 2015-02-19 15:21, Bruce Richardson:
>> Confirmed, this worked for me too.
>> Looking at the patches, they look good. However, one thing I think we are
>> missing is a unit test to verify that all our CRC implementations give the
>> same result. That would be useful as a sanity check of the software fallback
>> especially. The existing hash tests test the hash table implementation rather
>> than the mathematical algorithm used to compute the hash values.
>>
>> Overall, though, software fallback for CRC is something well worthwhile
>> having.
>>
>> Series Acked-by: Bruce Richardson 
> Applied, thanks
>
> Note: running doxygen compilation helped me to find and fix a small
> mismatch (parameter alg was flag in comment).

Thanks, Bruce, Thomas.

As for yielding the same hash value, I made a test which runs every
CRC32 implementation across a number of randomly generated data sets.
Results are equal on my trial run.

I can post a patch for test_hash.c a bit later if this kind of check
suffices.
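
A check of this kind can be sketched without DPDK as shown below. This is an illustration under stated assumptions, not the test that was later posted: it compares a bit-at-a-time CRC32C against a table-driven one over pseudo-random buffers, and every name in it (`crc32c_bitwise()`, `crc32c_bytewise()`, `prng_next()`, `crc32c_equiv_test()`) is hypothetical. A small xorshift generator stands in for rte_rand() so that runs are reproducible.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u	/* reflected Castagnoli polynomial */

static uint32_t crc32c_table[256];

/* Build the byte-wise lookup table from the bitwise definition. */
static void
crc32c_table_init(void)
{
	uint32_t n, c;
	int k;

	for (n = 0; n < 256; n++) {
		c = n;
		for (k = 0; k < 8; k++)
			c = (c & 1u) ? (c >> 1) ^ CRC32C_POLY : c >> 1;
		crc32c_table[n] = c;
	}
}

/* Reference: one bit at a time, no pre- or post-inversion. */
static uint32_t
crc32c_bitwise(const uint8_t *p, size_t len, uint32_t crc)
{
	size_t i;
	int k;

	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (k = 0; k < 8; k++)
			crc = (crc & 1u) ? (crc >> 1) ^ CRC32C_POLY : crc >> 1;
	}
	return crc;
}

/* Candidate: one table lookup per byte. Needs crc32c_table_init() first. */
static uint32_t
crc32c_bytewise(const uint8_t *p, size_t len, uint32_t crc)
{
	size_t i;

	for (i = 0; i < len; i++)
		crc = crc32c_table[(crc ^ p[i]) & 0xFFu] ^ (crc >> 8);
	return crc;
}

/* xorshift32: tiny deterministic generator (state must be non-zero). */
static uint32_t
prng_next(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Returns 0 if both implementations agree on every random data set. */
static int
crc32c_equiv_test(unsigned iterations)
{
	uint8_t buf[64];
	uint32_t seed = 0x12345678u;
	uint32_t init;
	size_t j, len;
	unsigned i;

	crc32c_table_init();
	for (i = 0; i < iterations; i++) {
		len = (prng_next(&seed) % sizeof(buf)) + 1;
		init = prng_next(&seed);
		for (j = 0; j < sizeof(buf); j++)
			buf[j] = (uint8_t)prng_next(&seed);
		if (crc32c_bitwise(buf, len, init) !=
				crc32c_bytewise(buf, len, init))
			break;
	}
	return i == iterations ? 0 : -1;
}
```

Random lengths between 1 and the buffer size matter here because they exercise every tail length, which is where sliced implementations are most likely to diverge.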

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent

2015-02-02 Thread Yerden Zhumabekov

02.02.2015 9:31, Neil Horman wrote:
> On Mon, Feb 02, 2015 at 09:07:45AM +0600, Yerden Zhumabekov wrote:
>
>> I think so; I've just successfully built it against the latest snapshot with
>> RTE_TARGET equal to 'x86_64-native-linuxapp-gcc'.
>>
> Please confirm that setting the machine type to default builds and runs 
> properly.

If I understood you correctly, I set CONFIG_RTE_MACHINE="default" in the
config and the build was successful.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] [PATCH v6 2/7] hash: add assembly implementation of CRC32 intrinsics

2015-02-02 Thread Yerden Zhumabekov

02.02.2015 11:15, Liang, Cunming wrote:
>
>> +static inline uint32_t
>> +crc32c_sse42_u64(uint64_t data, uint64_t init_val)
>> +{
>> +__asm__ volatile(
>> +"crc32q %[data], %[init_val];"
>> +: [init_val] "+r" (init_val)
>> +: [data] "rm" (data));
>> +return init_val;
>> +}
> [LCM] I'm curious about the benefit of replacing CRC32 intrinsic
> "_mm_crc32_u32/64".

These intrinsics are not available on a platform which has no SSE4.2
support so the build would fail.

See previous suggestion from Neil: 
http://dpdk.org/ml/archives/dev/2014-November/008353.html

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent

2015-02-02 Thread Yerden Zhumabekov

01.02.2015 20:13, Neil Horman wrote:
> On Thu, Jan 29, 2015 at 02:48:11PM +0600, Yerden Zhumabekov wrote:
>> This is a rework of my previous patches improving performance of 
>> rte_hash_crc.
>>
>> Summary of changes:
>> * software implementation of CRC32 introduced;
>> * in the runtime, algorithm can fall back to software version if CPU doesn't 
>> support SSE4.2;
>> * best available algorithm is automatically detected upon application 
>> startup;
>> * redundant compile checks removed from test utilities;
>> * assembly code for emitting SSE4.2 instructions is used instead of built-in 
>> intrinsics;
>> * rte_hash_crc() function performance significantly improved.
>>
>> v6 changes:
>> * added 'const' qualifier to crc32c lookup tables declaration.
> Just to be clear, this does build if you compile it against the "default"
> machine type, correct?
> Neil

I think so; I've just successfully built it against the latest snapshot with
RTE_TARGET equal to 'x86_64-native-linuxapp-gcc'.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] [PATCH v6 7/7] test: remove redundant compile checks

2015-01-29 Thread Yerden Zhumabekov
Since rte_hash_crc() can now be run regardless of SSE4.2 support,
we can safely remove compile checks for RTE_MACHINE_CPUFLAG_SSE4_2
in test utilities.

Signed-off-by: Yerden Zhumabekov 
---
 app/test/test_hash.c  |7 ---
 app/test/test_hash_perf.c |   11 ---
 2 files changed, 18 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 178ec3f..76b1b8f 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -55,10 +55,7 @@
 #include 
 #include 
 #include 
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include 
-#endif

 
/***
  * Hash function performance test configuration section. Each performance test
@@ -67,11 +64,7 @@
  * The five arrays below control what tests are performed. Every combination
  * from the array entries is tested.
  */
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {0, 2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 31, 32, 33, 63, 64};
 
/**/
diff --git a/app/test/test_hash_perf.c b/app/test/test_hash_perf.c
index be34957..05a88ec 100644
--- a/app/test/test_hash_perf.c
+++ b/app/test/test_hash_perf.c
@@ -56,10 +56,7 @@
 #include 
 #include 
 #include 
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include 
-#endif

 /* Types of hash table performance test that can be performed */
 enum hash_test_t {
@@ -97,11 +94,7 @@ struct tbl_perf_test_params {
  */
 #define HASHTEST_ITERATIONS 100

-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 31, 32, 33, 63, 64};
 
/**/
@@ -243,7 +236,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {   LOOKUP,  ITERATIONS,  1048576,   4,  64,rte_jhash,   0},
 {   LOOKUP,  ITERATIONS,  1048576,   8,  64,rte_jhash,   0},
 {   LOOKUP,  ITERATIONS,  1048576,  16,  64,rte_jhash,   0},
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 /* Small table, add */
 /*  Test type | Iterations | Entries | BucketSize | KeyLen |HashFunc | InitVal */
 { ADD_ON_EMPTY,1024, 1024,   1,  16, rte_hash_crc,   0},
@@ -376,7 +368,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {   LOOKUP,  ITERATIONS,  1048576,   4,  64, rte_hash_crc,   0},
 {   LOOKUP,  ITERATIONS,  1048576,   8,  64, rte_hash_crc,   0},
 {   LOOKUP,  ITERATIONS,  1048576,  16,  64, rte_hash_crc,   0},
-#endif
 };

 
/**/
@@ -423,10 +414,8 @@ static const char *get_hash_name(rte_hash_function f)
if (f == rte_jhash)
return "jhash";

-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
if (f == rte_hash_crc)
return "rte_hash_crc";
-#endif

return "UnknownHash";
 }
-- 
1.7.9.5



[dpdk-dev] [PATCH v6 5/7] hash: add fallback to software CRC32 implementation

2015-01-29 Thread Yerden Zhumabekov
Initially, SSE4.2 support is detected via the constructor function.

Added rte_hash_crc_set_alg() function to detect and set CRC32
implementation if necessary. SSE4.2 is allowed by default.

rte_hash_crc_*byte() functions reworked so they choose available
CRC32 implementation in the runtime.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   61 ++--
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 6cc67cd..435048e 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -45,6 +45,8 @@ extern "C" {
 #endif

 #include 
+#include 
+#include 

 /* Lookup tables for software implementation of CRC32C */
 static const uint32_t crc32c_tables[8][256] = {{
@@ -396,8 +398,52 @@ crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
return init_val;
 }

+#define CRC32_SW    (1U << 0)
+#define CRC32_SSE42 (1U << 1)
+#define CRC32_x64   (1U << 2)
+#define CRC32_SSE42_x64 (CRC32_x64|CRC32_SSE42)
+
+static uint8_t crc32_alg = CRC32_SW;
+
+/**
+ * Allow or disallow use of SSE4.2 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of the following flags:
+ *   - (CRC32_SW) Don't use SSE4.2 intrinsics
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default)
+ *
+ */
+static inline void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+   switch (alg) {
+   case CRC32_SSE42_x64:
+   if (! rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T))
+   alg = CRC32_SSE42;
+   case CRC32_SSE42:
+   if (! rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
+   alg = CRC32_SW;
+   case CRC32_SW:
+   crc32_alg = alg;
+   default:
+   break;
+   }
+}
+
+/* Setting the best available algorithm */
+static inline void __attribute__((constructor))
+rte_hash_crc_init_alg(void)
+{
+   rte_hash_crc_set_alg(CRC32_SSE42_x64);
+}
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -409,11 +455,16 @@ crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-   return crc32c_sse42_u32(data, init_val);
+   if (likely(crc32_alg & CRC32_SSE42))
+   return crc32c_sse42_u32(data, init_val);
+
+   return crc32c_1word(data, init_val);
 }

 /**
  * Use single crc32 instruction to perform a hash on a 8 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -425,7 +476,13 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-   return crc32c_sse42_u64(data, init_val);
+   if (likely(crc32_alg == CRC32_SSE42_x64))
+   return crc32c_sse42_u64(data, init_val);
+
+   if (likely(crc32_alg & CRC32_SSE42))
+   return crc32c_sse42_u64_mimic(data, init_val);
+
+   return crc32c_2words(data, init_val);
 }

 /**
-- 
1.7.9.5



[dpdk-dev] [PATCH v6 4/7] hash: add rte_hash_crc_8byte function

2015-01-29 Thread Yerden Zhumabekov
SSE4.2 provides CRC32 intrinsic with 8-byte operand.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   16 
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 45b0dce..6cc67cd 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -413,6 +413,22 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 }

 /**
+ * Use single crc32 instruction to perform a hash on a 8 byte value.
+ *
+ * @param data
+ *   Data to perform hash on.
+ * @param init_val
+ *   Value to initialise hash generator.
+ * @return
+ *   32bit calculated hash value.
+ */
+static inline uint32_t
+rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
+{
+   return crc32c_sse42_u64(data, init_val);
+}
+
+/**
  * Use crc32 instruction to perform a hash.
  *
  * @param data
-- 
1.7.9.5



[dpdk-dev] [PATCH v6 3/7] hash: replace built-in functions implementing SSE4.2

2015-01-29 Thread Yerden Zhumabekov
Give up using built-in intrinsics and use our own assembly
implementation. Remove #include entry as well.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index fe35996..45b0dce 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -45,7 +45,6 @@ extern "C" {
 #endif

 #include 
-#include 

 /* Lookup tables for software implementation of CRC32C */
 static const uint32_t crc32c_tables[8][256] = {{
@@ -410,7 +409,7 @@ crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-   return _mm_crc32_u32(init_val, data);
+   return crc32c_sse42_u32(data, init_val);
 }

 /**
-- 
1.7.9.5



[dpdk-dev] [PATCH v6 2/7] hash: add assembly implementation of CRC32 intrinsics

2015-01-29 Thread Yerden Zhumabekov
Added:
- crc32c_sse42_u32() emits 'crc32l' asm instruction;
- crc32c_sse42_u64() emits 'crc32q' asm instruction;
- crc32c_sse42_u64_mimic(), wrapper in case of run on 32-bit platform.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   34 ++
 1 file changed, 34 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 4da7ca4..fe35996 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -363,6 +363,40 @@ crc32c_2words(uint64_t data, uint32_t init_val)
return crc;
 }

+static inline uint32_t
+crc32c_sse42_u32(uint32_t data, uint32_t init_val)
+{
+   __asm__ volatile(
+   "crc32l %[data], %[init_val];"
+   : [init_val] "+r" (init_val)
+   : [data] "rm" (data));
+   return init_val;
+}
+
+static inline uint32_t
+crc32c_sse42_u64(uint64_t data, uint64_t init_val)
+{
+   __asm__ volatile(
+   "crc32q %[data], %[init_val];"
+   : [init_val] "+r" (init_val)
+   : [data] "rm" (data));
+   return init_val;
+}
+
+static inline uint32_t
+crc32c_sse42_u64_mimic(uint64_t data, uint64_t init_val)
+{
+   union {
+   uint32_t u32[2];
+   uint64_t u64;
+   } d;
+
+   d.u64 = data;
+   init_val = crc32c_sse42_u32(d.u32[0], init_val);
+   init_val = crc32c_sse42_u32(d.u32[1], init_val);
+   return init_val;
+}
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
  *
-- 
1.7.9.5



[dpdk-dev] [PATCH v6 0/7] rte_hash_crc reworked to be platform-independent

2015-01-29 Thread Yerden Zhumabekov
This is a rework of my previous patches improving performance of rte_hash_crc.

Summary of changes:
* software implementation of CRC32 introduced;
* at run time, the algorithm can fall back to the software version if the CPU
doesn't support SSE4.2;
* best available algorithm is automatically detected upon application startup;
* redundant compile checks removed from test utilities;
* assembly code for emitting SSE4.2 instructions is used instead of built-in 
intrinsics;
* rte_hash_crc() function performance significantly improved.

v6 changes:
* added 'const' qualifier to crc32c lookup tables declaration.

v5 changes:
* given up gcc's builtin SSE4.2 intrinsics;
* add assembly code for emitting SSE4.2 instructions.

v4 changes:
* icc-specific compile checks removed.

v3 changes:
* the default algorithm implementation is set by a constructor at application
startup;
* crc32 software implementation improved;
* removed compile-time checks from test_hash_perf and test_hash.

v2 changes:
* added CRC32 software implementation;
* added rte_hash_crc_set_alg() function to control availability of SSE4.2;
* added fallback to sw crc32 in case SSE4.2 is not available, or if SSE4.2 is 
intentionally disabled.

Initial version (v1) changes:
* added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand;
* reworked rte_hash_crc() function which leverages both versions of CRC32 hash 
calculation functions with 4 and 8-byte operands.
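
The idea of slicing the input into 8-byte pieces, with smaller operands for the remainder, can be sketched as follows. This is not the rte_hash_crc() code from the series: the functions `crc32c_bits()`, `load_le()`, `crc32c_sliced()` and `crc32c_bytewise()` are hypothetical, the bitwise update is a software stand-in for the crc32 instruction, and the tail handling is an assumption chosen so that the result matches a plain byte-wise CRC32C. The tail handling in the actual patch may differ.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u	/* reflected Castagnoli polynomial */

/* Fold the low nbits of data into crc, one bit at a time. */
static uint32_t
crc32c_bits(uint64_t data, unsigned nbits, uint32_t crc)
{
	uint64_t c = data ^ crc;
	unsigned i;

	for (i = 0; i < nbits; i++)
		c = (c & 1u) ? (c >> 1) ^ CRC32C_POLY : c >> 1;
	return (uint32_t)c;
}

/* Assemble n (at most 8) bytes into a little-endian word. */
static uint64_t
load_le(const uint8_t *p, unsigned n)
{
	uint64_t v = 0;
	unsigned i;

	for (i = 0; i < n; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* 8-byte pieces first, then one 4-byte piece, then the last 1..3 bytes. */
static uint32_t
crc32c_sliced(const void *data, size_t len, uint32_t crc)
{
	const uint8_t *p = data;

	for (; len >= 8; len -= 8, p += 8)
		crc = crc32c_bits(load_le(p, 8), 64, crc);
	if (len >= 4) {
		crc = crc32c_bits(load_le(p, 4), 32, crc);
		p += 4;
		len -= 4;
	}
	if (len > 0)
		crc = crc32c_bits(load_le(p, (unsigned)len),
				8 * (unsigned)len, crc);
	return crc;
}

/* Reference: one byte per step. */
static uint32_t
crc32c_bytewise(const void *data, size_t len, uint32_t crc)
{
	const uint8_t *p = data;
	size_t i;

	for (i = 0; i < len; i++)
		crc = crc32c_bits(p[i], 8, crc);
	return crc;
}
```

With hardware support, each 8-byte piece maps to one instruction instead of two, which is where the speed-up over a 4-byte loop would come from.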


Yerden Zhumabekov (7):
  hash: add software CRC32 implementation
  hash: add assembly implementation of CRC32 intrinsics
  hash: replace built-in functions implementing SSE4.2
  hash: add rte_hash_crc_8byte function
  hash: add fallback to software CRC32 implementation
  hash: rte_hash_crc() slices data into 8-byte pieces
  test: remove redundant compile checks

 app/test/test_hash.c   |7 -
 app/test/test_hash_perf.c  |   11 -
 lib/librte_hash/rte_hash_crc.h |  459 +++-
 3 files changed, 448 insertions(+), 29 deletions(-)

-- 
1.7.9.5



[dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent

2014-11-28 Thread Yerden Zhumabekov

28.11.2014 3:04, Thomas Monjalon wrote:
> 2014-11-20 11:15, Yerden Zhumabekov:
>> These patches bring a fallback mechanism to ensure that CRC32 hash is 
>> calculated regardless of hardware support from CPU (i.e. SSE4.2 intrinsics).
>> Performance is also improved by slicing data into 8-byte pieces.
>>
>> Patches were tested on machines both with and without SSE4.2 support.
>>
>> The software implementation seems to be about 4-5 times slower than the 
>> SSE4.2-enabled one. Of course, they return identical results.
>>
>> Summary of changes:
>> * added CRC32 software implementation, which is used as a fallback in case 
>> SSE4.2 is not available, or if SSE4.2 is intentionally disabled.
>> * added rte_hash_crc_set_alg() function to control availability of SSE4.2.
>> * added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand.
>> * reworked rte_hash_crc() function which leverages both versions of CRC32 
>> hash calculation functions with 4 and 8-byte operands.
>> * removed compile-time checks from test_hash_perf and test_hash.
>> * the default algorithm implementation is set by a constructor at 
>> application startup.
>> * SSE4.2 intrinsics are implemented through inline assembly code.
>> * added additional run-time check for 64-bit support.
> So you don't want to use the target attribute as suggested by Konstantin?
>
> Why the discussion ended without any acknowledgement?
>

I decided to emit the SSE4.2 instructions directly from the code, because:
* this approach works with gcc 4.3;
* use of the target attribute (in the way suggested by Konstantin) presumably
still requires #ifdef, which we want to avoid.

I didn't investigate it any further after that. I'm quite happy with the
last revision, but I'm open to ideas and discussion.
I have prepared a new patch series whose only change is declaring the crc32c
tables 'const', as Stephen suggested, and I can post it. But first I'd like
to see confirmation of what I've done so far.
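
To make the point about emitting the instruction directly more concrete, here
is a rough sketch, not the code from the series. It assumes gcc or clang on
x86; the demo_* names are hypothetical, and the CPU check uses the compiler's
__builtin_cpu_supports() rather than DPDK's own CPU flag API. On other targets
the sketch simply falls back to a bit-by-bit software step.

```c
#include <assert.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define DEMO_X86_GNUC 1
#else
#define DEMO_X86_GNUC 0
#endif

/* Portable bit-at-a-time CRC32C step over a 32-bit operand (fallback). */
static uint32_t
demo_crc32c_sw_u32(uint32_t data, uint32_t init_val)
{
	int bit;

	init_val ^= data;
	for (bit = 0; bit < 32; bit++)
		init_val = (init_val >> 1) ^
			(0x82F63B78u & (0u - (init_val & 1u)));
	return init_val;
}

#if DEMO_X86_GNUC
/* Emit crc32l directly: no -msse4.2 flag and no nmmintrin.h required. */
static inline uint32_t
demo_crc32c_asm_u32(uint32_t data, uint32_t init_val)
{
	__asm__ volatile(
		"crc32l %[data], %[init_val];"
		: [init_val] "+r" (init_val)
		: [data] "rm" (data));
	return init_val;
}
#endif

/* Pick the hardware path only when the running CPU reports SSE4.2. */
static uint32_t
demo_crc32c_u32(uint32_t data, uint32_t init_val)
{
#if DEMO_X86_GNUC
	__builtin_cpu_init();
	if (__builtin_cpu_supports("sse4.2"))
		return demo_crc32c_asm_u32(data, init_val);
#endif
	return demo_crc32c_sw_u32(data, init_val);
}

/* Both paths must agree, whichever one the dispatcher picked. */
static void
demo_crc32c_selftest(void)
{
	assert(demo_crc32c_u32(0xdeadbeefu, 0) ==
		demo_crc32c_sw_u32(0xdeadbeefu, 0));
	assert(demo_crc32c_u32(0, 0xffffffffu) ==
		demo_crc32c_sw_u32(0, 0xffffffffu));
}
```

Checking the CPU on every call is only for brevity here. The series caches the
decision in a variable that a constructor sets at startup.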

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] [PATCH v5 4/7] hash: add rte_hash_crc_8byte function

2014-11-21 Thread Yerden Zhumabekov

21.11.2014 17:22, Neil Horman wrote:
> On Thu, Nov 20, 2014 at 11:16:34AM +0600, Yerden Zhumabekov wrote:
>> SSE4.2 provides CRC32 intrinsic with 8-byte operand.
>>
>> Signed-off-by: Yerden Zhumabekov 
>> ---
>>  lib/librte_hash/rte_hash_crc.h |   16 
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
>> index cd28833..2c8ec99 100644
>> --- a/lib/librte_hash/rte_hash_crc.h
>> +++ b/lib/librte_hash/rte_hash_crc.h
>> @@ -413,6 +413,22 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>>  }
>>  
>>  /**
>> + * Use single crc32 instruction to perform a hash on a 8 byte value.
>> + *
>> + * @param data
>> + *   Data to perform hash on.
>> + * @param init_val
>> + *   Value to initialise hash generator.
>> + * @return
>> + *   32bit calculated hash value.
>> + */
>> +static inline uint32_t
>> +rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
>> +{
>> +return crc32c_sse42_u64(data, init_val);
>> +}
>> +
>> +/**
>>   * Use crc32 instruction to perform a hash.
>>   *
>>   * @param data
>> -- 
>> 1.7.9.5
>>
>>
> I'm sorry, it may be early here, so I may be missing something. The assembly
> implementations look great, but if a user calls rte_hash_crc_8byte on a system
> that doesn't support SSE4.2, how do they wind up getting into the software crc
> implementation given what you have above?
> Neil

After applying patch 4 of 7 alone there is no fallback. The fallback to the
SW crc32 algorithm is added in patch 5/7.

Moreover, patch 5/7 adds a check for whether the platform supports
64-bit; if it doesn't, 64-bit operand support is mimicked with two 32-bit
function calls.
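
Patch 5/7 is not quoted in this message, so the following is only a sketch of
the idea under my own assumptions, with hypothetical demo_* names and a
bit-by-bit software step standing in for the crc32 instruction. The 8-byte
operand is split into its low and high 32-bit halves, and the low half is
folded in first, which matches the byte order the 64-bit instruction sees on
little-endian x86.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-at-a-time CRC32C over the low nbits of data (32 or 64). */
static uint32_t
demo_crc32c_bits(uint64_t data, unsigned nbits, uint32_t crc)
{
	uint64_t reg = (uint64_t)crc ^ data;

	while (nbits--)
		reg = (reg >> 1) ^ (0x82F63B78ULL & (0ULL - (reg & 1ULL)));
	return (uint32_t)reg;
}

/* Stand-in for the 4-byte crc32 step. */
static uint32_t
demo_crc32c_u32(uint32_t data, uint32_t crc)
{
	return demo_crc32c_bits(data, 32, crc);
}

/* Stand-in for the native 8-byte crc32 step. */
static uint32_t
demo_crc32c_u64(uint64_t data, uint32_t crc)
{
	return demo_crc32c_bits(data, 64, crc);
}

/* Mimic the 8-byte step with two 4-byte steps: low half, then high half. */
static uint32_t
demo_crc32c_u64_split(uint64_t data, uint32_t crc)
{
	crc = demo_crc32c_u32((uint32_t)data, crc);
	return demo_crc32c_u32((uint32_t)(data >> 32), crc);
}

/* The split version must give the same result as the direct one. */
static void
demo_crc32c_split_selftest(void)
{
	const uint64_t v = 0x0123456789abcdefULL;

	assert(demo_crc32c_u64_split(v, 0) == demo_crc32c_u64(v, 0));
	assert(demo_crc32c_u64_split(v, 0xffffffffu) ==
		demo_crc32c_u64(v, 0xffffffffu));
}
```

The order of the halves matters: swapping them still yields a usable hash, but
it would no longer match the result of the 64-bit instruction.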

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH v5 7/7] test: remove redundant compile checks

2014-11-20 Thread Yerden Zhumabekov
Since rte_hash_crc() can now be run regardless of SSE4.2 support,
we can safely remove compile checks for RTE_MACHINE_CPUFLAG_SSE4_2
in test utilities.

Signed-off-by: Yerden Zhumabekov 
---
 app/test/test_hash.c  |7 ---
 app/test/test_hash_perf.c |   11 ---
 2 files changed, 18 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 178ec3f..76b1b8f 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -55,10 +55,7 @@
 #include 
 #include 
 #include 
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include 
-#endif

 
/***
  * Hash function performance test configuration section. Each performance test
@@ -67,11 +64,7 @@
  * The five arrays below control what tests are performed. Every combination
  * from the array entries is tested.
  */
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {0, 2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 
21, 31, 32, 33, 63, 64};
 
/**/
diff --git a/app/test/test_hash_perf.c b/app/test/test_hash_perf.c
index be34957..05a88ec 100644
--- a/app/test/test_hash_perf.c
+++ b/app/test/test_hash_perf.c
@@ -56,10 +56,7 @@
 #include 
 #include 
 #include 
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include 
-#endif

 /* Types of hash table performance test that can be performed */
 enum hash_test_t {
@@ -97,11 +94,7 @@ struct tbl_perf_test_params {
  */
 #define HASHTEST_ITERATIONS 100

-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 
31, 32, 33, 63, 64};
 
/**/
@@ -243,7 +236,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {   LOOKUP,  ITERATIONS,  1048576,   4,  64,rte_jhash,   
0},
 {   LOOKUP,  ITERATIONS,  1048576,   8,  64,rte_jhash,   
0},
 {   LOOKUP,  ITERATIONS,  1048576,  16,  64,rte_jhash,   
0},
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 /* Small table, add */
 /*  Test type | Iterations | Entries | BucketSize | KeyLen |HashFunc | 
InitVal */
 { ADD_ON_EMPTY,1024, 1024,   1,  16, rte_hash_crc,   
0},
@@ -376,7 +368,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {   LOOKUP,  ITERATIONS,  1048576,   4,  64, rte_hash_crc,   
0},
 {   LOOKUP,  ITERATIONS,  1048576,   8,  64, rte_hash_crc,   
0},
 {   LOOKUP,  ITERATIONS,  1048576,  16,  64, rte_hash_crc,   
0},
-#endif
 };

 
/**/
@@ -423,10 +414,8 @@ static const char *get_hash_name(rte_hash_function f)
if (f == rte_jhash)
return "jhash";

-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
if (f == rte_hash_crc)
return "rte_hash_crc";
-#endif

return "UnknownHash";
 }
-- 
1.7.9.5



[dpdk-dev] [PATCH v5 6/7] hash: rte_hash_crc() slices data into 8-byte pieces

2014-11-20 Thread Yerden Zhumabekov
Calculating hash for data of variable length is more efficient
when that data is sliced into 8-byte pieces. The remaining tail of the data
is hashed using the CRC32 functions with either 8-byte or 4-byte operands.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   33 -
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 469b4f5..39d0569 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -486,7 +486,7 @@ rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 }

 /**
- * Use crc32 instruction to perform a hash.
+ * Calculate CRC32 hash on user-supplied byte array.
  *
  * @param data
  *   Data to perform hash on.
@@ -501,23 +501,38 @@ static inline uint32_t
 rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
 {
unsigned i;
-   uint32_t temp = 0;
-   const uint32_t *p32 = (const uint32_t *)data;
+   uint64_t temp = 0;
+   const uint64_t *p64 = (const uint64_t *)data;

-   for (i = 0; i < data_len / 4; i++) {
-   init_val = rte_hash_crc_4byte(*p32++, init_val);
+   for (i = 0; i < data_len / 8; i++) {
+   init_val = rte_hash_crc_8byte(*p64++, init_val);
}

-   switch (3 - (data_len & 0x03)) {
+   switch (7 - (data_len & 0x07)) {
case 0:
-   temp |= *((const uint8_t *)p32 + 2) << 16;
+   temp |= (uint64_t) *((const uint8_t *)p64 + 6) << 48;
/* Fallthrough */
case 1:
-   temp |= *((const uint8_t *)p32 + 1) << 8;
+   temp |= (uint64_t) *((const uint8_t *)p64 + 5) << 40;
/* Fallthrough */
case 2:
-   temp |= *((const uint8_t *)p32);
+   temp |= (uint64_t) *((const uint8_t *)p64 + 4) << 32;
+   temp |= *((const uint32_t *)p64);
+   init_val = rte_hash_crc_8byte(temp, init_val);
+   break;
+   case 3:
+   init_val = rte_hash_crc_4byte(*(const uint32_t *)p64, init_val);
+   break;
+   case 4:
+   temp |= *((const uint8_t *)p64 + 2) << 16;
+   /* Fallthrough */
+   case 5:
+   temp |= *((const uint8_t *)p64 + 1) << 8;
+   /* Fallthrough */
+   case 6:
+   temp |= *((const uint8_t *)p64);
init_val = rte_hash_crc_4byte(temp, init_val);
+   /* Fallthrough */
default:
break;
}
-- 
1.7.9.5
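
For illustration, the slicing scheme from the patch above can be sketched as
follows. This is not the patch itself: the demo_* names are hypothetical, the
CRC step is a bit-by-bit stand-in for the crc32 instruction, and memcpy() is
used instead of casting the buffer to uint64_t pointers, to sidestep alignment
concerns. On a little-endian host it should process the same operands as the
switch statement in the patch: tails of 5 to 7 bytes are zero-padded to an
8-byte operand, and tails of 1 to 4 bytes go through the 4-byte step.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-at-a-time CRC32C over the low nbits of data (32 or 64). */
static uint32_t
demo_crc_bits(uint64_t data, unsigned nbits, uint32_t crc)
{
	uint64_t reg = (uint64_t)crc ^ data;

	while (nbits--)
		reg = (reg >> 1) ^ (0x82F63B78ULL & (0ULL - (reg & 1ULL)));
	return (uint32_t)reg;
}

static uint32_t
demo_crc_4byte(uint32_t data, uint32_t crc)
{
	return demo_crc_bits(data, 32, crc);
}

static uint32_t
demo_crc_8byte(uint64_t data, uint32_t crc)
{
	return demo_crc_bits(data, 64, crc);
}

static uint32_t
demo_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
{
	const uint8_t *p = data;
	uint64_t w64;
	uint32_t w32;

	assert(p != NULL || data_len == 0);

	/* Bulk of the data: one 8-byte step per full slice. */
	for (; data_len >= 8; data_len -= 8, p += 8) {
		memcpy(&w64, p, 8);
		init_val = demo_crc_8byte(w64, init_val);
	}

	if (data_len > 4) {
		/* 5..7 bytes left: zero-pad to 8 bytes, one 8-byte step. */
		w64 = 0;
		memcpy(&w64, p, data_len);
		init_val = demo_crc_8byte(w64, init_val);
	} else if (data_len > 0) {
		/* 1..4 bytes left: zero-pad to 4 bytes, one 4-byte step. */
		w32 = 0;
		memcpy(&w32, p, data_len);
		init_val = demo_crc_4byte(w32, init_val);
	}
	return init_val;
}

/* 13 bytes = one full slice plus a 5-byte tail padded to 8 bytes. */
static void
demo_hash_crc_selftest(void)
{
	static const uint8_t buf[13] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13};
	uint64_t head, tail = 0;

	memcpy(&head, buf, 8);
	memcpy(&tail, buf + 8, 5);
	assert(demo_hash_crc(buf, 13, 0) ==
		demo_crc_8byte(tail, demo_crc_8byte(head, 0)));
	assert(demo_hash_crc(buf, 0, 42) == 42);
}
```

Because short tails are zero-padded, the result is a hash rather than a
textbook CRC of the exact byte string, which is all the hash library needs.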



[dpdk-dev] [PATCH v5 4/7] hash: add rte_hash_crc_8byte function

2014-11-20 Thread Yerden Zhumabekov
SSE4.2 provides CRC32 intrinsic with 8-byte operand.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   16 
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index cd28833..2c8ec99 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -413,6 +413,22 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 }

 /**
+ * Use single crc32 instruction to perform a hash on a 8 byte value.
+ *
+ * @param data
+ *   Data to perform hash on.
+ * @param init_val
+ *   Value to initialise hash generator.
+ * @return
+ *   32bit calculated hash value.
+ */
+static inline uint32_t
+rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
+{
+   return crc32c_sse42_u64(data, init_val);
+}
+
+/**
  * Use crc32 instruction to perform a hash.
  *
  * @param data
-- 
1.7.9.5



[dpdk-dev] [PATCH v5 0/7] rte_hash_crc reworked to be platform-independent

2014-11-20 Thread Yerden Zhumabekov
These patches bring a fallback mechanism to ensure that CRC32 hash is 
calculated regardless of hardware support from CPU (i.e. SSE4.2 intrinsics).
Performance is also improved by slicing data into 8-byte pieces.

Patches were tested on machines both with and without SSE4.2 support.

The software implementation seems to be about 4-5 times slower than the 
SSE4.2-enabled one. Of course, they return identical results.

Summary of changes:
* added CRC32 software implementation, which is used as a fallback in case 
SSE4.2 is not available, or if SSE4.2 is intentionally disabled.
* added rte_hash_crc_set_alg() function to control availability of SSE4.2.
* added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand.
* reworked rte_hash_crc() function which leverages both versions of CRC32 hash 
calculation functions with 4 and 8-byte operands.
* removed compile-time checks from test_hash_perf and test_hash.
* the default algorithm implementation is set by a constructor at application 
startup.
* SSE4.2 intrinsics are implemented through inline assembly code.
* added additional run-time check for 64-bit support.

Yerden Zhumabekov (7):
  hash: add software CRC32 implementation
  hash: add assembly implementation of CRC32 intrinsics
  hash: replace built-in functions implementing SSE4.2
  hash: add rte_hash_crc_8byte function
  hash: add fallback to software CRC32 implementation
  hash: rte_hash_crc() slices data into 8-byte pieces
  test: remove redundant compile checks

 app/test/test_hash.c   |7 -
 app/test/test_hash_perf.c  |   11 -
 lib/librte_hash/rte_hash_crc.h |  459 +++-
 3 files changed, 448 insertions(+), 29 deletions(-)

-- 
1.7.9.5



[dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation

2014-11-20 Thread Yerden Zhumabekov

19.11.2014 21:07, Neil Horman wrote:
> On Wed, Nov 19, 2014 at 05:35:51PM +0600, Yerden Zhumabekov wrote:
>> static inline uint32_t
>> crc32_sse42_u32(uint32_t data, uint32_t init_val)
>> {
>> 	__asm__ volatile(
>> 		"crc32l %[data], %[init_val];"
>> 		: [init_val] "+r" (init_val)
>> 		: [data] "rm" (data));
>> 	return init_val;
>> }
>>
>> But wait, will the __builtin_ia32_crc32si and __builtin_ia32_crc32di
>> functions do the trick? Does ICC have them?
> If builtins work on both icc and gcc, yes, that would be a solution as it
> creates non sse instructions when the target cpu doesn't support it.

Can anyone acknowledge?

>
>> What about prototyping functions and extracting their bodies to separate
>> module? Does it break anything?
>>
> That would be a variant on the asm inline idea, but yes, I think that would 
> work
> too

No luck. Performance degrades by up to 30-50 percent when the functions
are extracted into a separate module.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation

2014-11-19 Thread Yerden Zhumabekov

19.11.2014 17:50, Ananyev, Konstantin wrote:
>
> As I remember, with gcc & icc it is possible to specify that you'd like to 
> compile a particular function for a different target.
> From https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html:
> "target
> The target attribute is used to specify that a function is to be compiled 
> with different target options than specified on the command line. This can be 
> used for instance to have functions compiled with a different ISA 
> (instruction set architecture) than the default. You can also use the 
> '#pragma GCC target' pragma to set more than one function to be compiled with 
> specific target options. See Function Specific Option Pragmas, for details 
> about the '#pragma GCC target' pragma.
> For instance on a 386, you could compile one function with 
> target("sse4.1,arch=core2") and another with target("sse4a,arch=amdfam10"). 
> This is equivalent to compiling the first function with -msse4.1 and 
> -march=core2 options, and the second function with -msse4a and 
> -march=amdfam10 options. It is up to the user to make sure that a function is 
> only invoked on a machine that supports the particular ISA it is compiled for 
> (for example by using cpuid on 386 to determine what feature bits and 
> architecture family are used).
>
>   int core2_func (void) __attribute__ ((__target__ ("arch=core2")));
>   int sse3_func (void) __attribute__ ((__target__ ("sse3")));
> You can either use multiple strings to specify multiple options, or separate 
> the options with a comma (',').
>
> The target attribute is presently implemented for i386/x86_64, PowerPC, and 
> Nios II targets only. The options supported are specific to each target.
>
> On the 386, the following options are allowed:
> ...
>  'sse4.2'
> 'no-sse4.2'"
>
> Wouldn't that suit your purposes?
> Probably you can even keep your function inline with that approach.
Very nice. Thank you. I will test it.


-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation

2014-11-19 Thread Yerden Zhumabekov

19.11.2014 16:16, Bruce Richardson wrote:
> On Tue, Nov 18, 2014 at 04:36:24PM -0500, Neil Horman wrote:
>> an alternate option would be to not use the intrinsic, and craft some 
>> explicit
>> __asm__ statement that executes the right sse42 instructions.  That way the 
>> asm
>> is directly emitted, without requiring the -msse42 flag at all, and it will 
>> just
>> work in all the files that call it.
>>
> I really don't like that approach. I think using intrinsics is much more 
> maintainable.
>

static inline uint32_t
crc32_sse42_u32(uint32_t data, uint32_t init_val)
{
	__asm__ volatile(
		"crc32l %[data], %[init_val];"
		: [init_val] "+r" (init_val)
		: [data] "rm" (data));
	return init_val;
}

But wait, will the __builtin_ia32_crc32si and __builtin_ia32_crc32di
functions do the trick? Does ICC have them?
What about keeping only the function prototypes in the header and moving
their bodies to a separate module? Does it break anything?

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation

2014-11-19 Thread Yerden Zhumabekov

18.11.2014 23:29, Wang, Shawn wrote:
> I have a general question about using CPUID to detect supported instruction 
> set.
> What if we are compiling the software with some old hardware which does not 
> support SSE4.2, but run it on new hardware which does support SSE4.2. Is 
> there still a static way to force the compiler to turn on the SSE4.2 support? 
> I guess for SSE4.2, most of the CPU has support for it now. But for AVX2, 
> this might not be the case.
According to the gcc 4.7 release notes (https://gcc.gnu.org/gcc-4.7/changes.html),
support for AVX2 instructions was added in that version.
Use -mavx2 or -march=core-avx2. The latter seems to be supported by ICC
as well, according to Google :)

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation

2014-11-19 Thread Yerden Zhumabekov

19.11.2014 3:36, Neil Horman wrote:
> On Tue, Nov 18, 2014 at 05:52:27PM +, Bruce Richardson wrote:
>> On Tue, Nov 18, 2014 at 12:46:19PM -0500, Neil Horman wrote:
>>> On Tue, Nov 18, 2014 at 11:13:17PM +0600, Yerden Zhumabekov wrote:
>>>> Everybody's up for the second option? :)
>>>>
>>> Crud, you're right, I didn't think about the header inclusion issue.  Is it
>>> worth adding the jump to enable the dynamic hash selection?
>>> Neil
>> Maybe for cases where SSE4.2 is not currently available, i.e. for generic 
>> builds.
>> For builds where we have hardware support confirmed at compile time, just use
>> the function from the header file.
>> Does that make sense?
>>
> I'm not certain of that, as I don't think anything can be 'confirmed' at 
> compile
> time.  I.e. just because you have sse42 at compile time doesn't guarantee you
> have it at run time with a DSO.  If you have these as macros, you need to 
> enable
> sse42 whereever you include the file so that the intrinsic works properly.
>
> an alternate option would be to not use the intrinsic, and craft some explicit
> __asm__ statement that executes the right sse42 instructions.  That way the 
> asm
> is directly emitted, without requiring the -msse42 flag at all, and it will 
> just
> work in all the files that call it.

Thanks for the discussion. To summarize it with my suggestions for 'v5':
1) replace intrinsics with asm code and stop including nmmintrin.h;
2) detect the arch (EM64T flag) at run time, because crc32 with a 64-bit
operand doesn't work on 32-bit x86;
3) separate function prototypes (leaving them in the header) from bodies and
add the new source file to SRCS in the Makefile.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation

2014-11-18 Thread Yerden Zhumabekov

18.11.2014 23:46, Neil Horman wrote:
> On Tue, Nov 18, 2014 at 11:13:17PM +0600, Yerden Zhumabekov wrote:
>> 18.11.2014 22:00, Neil Horman wrote:
>>>
>>> You need to edit the makefile so that the compiler gets passed the option
>>> -msse42.  That way it will know to emit sse42 instructions. It will also 
>>> allow
>>> you to remove the ifdef from the include file
>> In this case, I guess there are two options:
>> 1) modify all makefiles which use librte_hash
>> 2) move all function bodies from rte_hash_crc.h to a separate module,
>> leaving only the prototypes there.
>>
>> Everybody's up for the second option? :)
>>
> Crud, you're right, I didn't think about the header inclusion issue.  Is it
> worth adding the jump to enable the dynamic hash selection?

If I understood you correctly: I've already added a function,
rte_hash_crc_set_alg(), to change the CRC32 implementation dynamically at
run time. I can rework the patches once again, if everybody's
fine with the separate module.
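
As a rough illustration of that run-time selection, here is a simplified
sketch modelled on rte_hash_crc_set_alg() from patch 3/5. It is not the DPDK
code: the demo_* names are hypothetical, it assumes gcc or clang, and the CPU
check uses the compiler's __builtin_cpu_supports() instead of
rte_cpu_get_flag_enabled(). On non-x86 targets it always reports no SSE4.2.

```c
#include <assert.h>

enum demo_crc32_alg {
	DEMO_CRC32_SW = 0,
	DEMO_CRC32_SSE42
};

/* Currently selected implementation; software until proven otherwise. */
static enum demo_crc32_alg demo_alg_active = DEMO_CRC32_SW;

/* Ask the running CPU, not the build machine, whether SSE4.2 exists. */
static int
demo_cpu_has_sse42(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
	__builtin_cpu_init();
	return __builtin_cpu_supports("sse4.2") != 0;
#else
	return 0;
#endif
}

/*
 * Request an implementation. Asking for SSE4.2 is honoured only if the
 * CPU supports it; asking for software always succeeds.
 */
static void
demo_crc_set_alg(enum demo_crc32_alg alg)
{
	enum demo_crc32_alg best;

	assert(alg == DEMO_CRC32_SW || alg == DEMO_CRC32_SSE42);
	best = demo_cpu_has_sse42() ? DEMO_CRC32_SSE42 : DEMO_CRC32_SW;
	demo_alg_active = (alg == DEMO_CRC32_SSE42) ? best : DEMO_CRC32_SW;
}

/* Runs before main(): pick the best available implementation by default. */
static void __attribute__((constructor))
demo_crc_init_alg(void)
{
	demo_crc_set_alg(DEMO_CRC32_SSE42);
}
```

An application that wants reproducible timing, or wants to exercise the
fallback path in tests, would call demo_crc_set_alg(DEMO_CRC32_SW) after
startup. The hash functions then only need to test demo_alg_active.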

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation

2014-11-18 Thread Yerden Zhumabekov

18.11.2014 22:00, Neil Horman wrote:
> On Tue, Nov 18, 2014 at 09:06:35PM +0600, Yerden Zhumabekov wrote:
>> 18.11.2014 20:41, Neil Horman wrote:
>>> On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
>>>>  /**
>>>>   * Use single crc32 instruction to perform a hash on a 4 byte value.
>>>> + * Fall back to software crc32 implementation in case SSE4.2 is
>>>> + * not supported
>>>>   *
>>>>   * @param data
>>>>   *   Data to perform hash on.
>>>> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>>>>  static inline uint32_t
>>>>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>>>>  {
>>>> -  return _mm_crc32_u32(init_val, data);
>>>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>>>> +  if (likely(crc32_alg == CRC32_SSE42))
>>>> +  return _mm_crc32_u32(init_val, data);
>>>> +#endif
>>> you don't really need these ifdefs here anymore given that you have a
>>> constructor to do the algorithm selection.  In fact you need to remove 
>>> them, in
>>> the event you build on a system that doesn't support SSE42, but run on a 
>>> system
>>> that does.
>> Originally, I thought so as well. I wrote the code without these ifdefs,
>> but it didn't compile on my machine, which doesn't support SSE4.2. The error
>> was triggered by nmmintrin.h, which checks for the respective GCC
>> extension. So I think these ifdefs are indeed required.
>>
> You need to edit the makefile so that the compiler gets passed the option
> -msse42.  That way it will know to emit sse42 instructions. It will also allow
> you to remove the ifdef from the include file

In this case, I guess there are two options:
1) modify all makefiles which use librte_hash
2) move all function bodies from rte_hash_crc.h to a separate module,
leaving only the prototypes there.

Everybody's up for the second option? :)

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation

2014-11-18 Thread Yerden Zhumabekov

18.11.2014 20:41, Neil Horman wrote:
> On Tue, Nov 18, 2014 at 08:03:40PM +0600, Yerden Zhumabekov wrote:
>> Initially, SSE4.2 support is detected via CPUID instruction.
>>
>> Added rte_hash_crc_set_alg() function to detect and set CRC32
>> implementation if necessary. SSE4.2 is allowed by default. If it's
>> not available, fall back to sw implementation.
>>
>> Best available algorithm is detected upon application startup
>> through the constructor function rte_hash_crc_try_sse42().
>>
>> Signed-off-by: Yerden Zhumabekov 
>> ---
>>  lib/librte_hash/rte_hash_crc.h |   53 
>> ++--
>>  1 file changed, 51 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
>> index 15f687a..332ed99 100644
>> --- a/lib/librte_hash/rte_hash_crc.h
>> +++ b/lib/librte_hash/rte_hash_crc.h
>> @@ -45,7 +45,11 @@ extern "C" {
>>  #endif
>>  
>>  #include 
>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>>  #include 
>> +#endif
>> +#include 
>> +#include 
>>  
>>  /* Lookup tables for software implementation of CRC32C */
>>  static uint32_t crc32c_tables[8][256] = {{
>> @@ -363,8 +367,41 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>>  return crc;
>>  }
>>  
>> +enum crc32_alg_t {
>> +CRC32_SW = 0,
>> +CRC32_SSE42
>> +};
>> +
>> +static enum crc32_alg_t crc32_alg;
>> +
>> +/**
>> + * Allow or disallow use of SSE4.2 intrinsics for CRC32 hash
>> + * calculation.
>> + *
>> + * @param flag
>> + *   unsigned integer flag
>> + *   - (CRC32_SW) Don't use SSE4.2 intrinsics
>> + *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
>> + */
>> +static inline void
>> +rte_hash_crc_set_alg(enum crc32_alg_t alg)
>> +{
>> +int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
>> +enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
>> +crc32_alg = (alg == CRC32_SSE42) ? alg_supp : CRC32_SW;
>> +}
>> +
>> +/* Best available algorithm is detected via CPUID instruction */
>> +static inline void __attribute__((constructor))
>> +rte_hash_crc_try_sse42(void)
>> +{
>> +rte_hash_crc_set_alg(CRC32_SSE42);
>> +}
>> +
>>  /**
>>   * Use single crc32 instruction to perform a hash on a 4 byte value.
>> + * Fall back to software crc32 implementation in case SSE4.2 is
>> + * not supported
>>   *
>>   * @param data
>>   *   Data to perform hash on.
>> @@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>>  static inline uint32_t
>>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>>  {
>> -return _mm_crc32_u32(init_val, data);
>> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>> +if (likely(crc32_alg == CRC32_SSE42))
>> +return _mm_crc32_u32(init_val, data);
>> +#endif
> you don't really need these ifdefs here anymore given that you have a
> constructor to do the algorithm selection.  In fact you need to remove them, 
> in
> the event you build on a system that doesn't support SSE42, but run on a 
> system
> that does.

Originally, I thought so as well. I wrote the code without these ifdefs,
but it didn't compile on my machine, which doesn't support SSE4.2. The error
was triggered by nmmintrin.h, which checks for the respective GCC
extension. So I think these ifdefs are indeed required.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] [PATCH v4 5/5] test: remove redundant compile checks

2014-11-18 Thread Yerden Zhumabekov
Since rte_hash_crc() can now be run regardless of SSE4.2 support,
we can safely remove compile checks for RTE_MACHINE_CPUFLAG_SSE4_2
in test utilities.

Signed-off-by: Yerden Zhumabekov 
---
 app/test/test_hash.c  |7 ---
 app/test/test_hash_perf.c |   11 ---
 2 files changed, 18 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 178ec3f..76b1b8f 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -55,10 +55,7 @@
 #include 
 #include 
 #include 
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include 
-#endif

 
/***
  * Hash function performance test configuration section. Each performance test
@@ -67,11 +64,7 @@
  * The five arrays below control what tests are performed. Every combination
  * from the array entries is tested.
  */
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {0, 2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 
21, 31, 32, 33, 63, 64};
 
/**/
diff --git a/app/test/test_hash_perf.c b/app/test/test_hash_perf.c
index be34957..05a88ec 100644
--- a/app/test/test_hash_perf.c
+++ b/app/test/test_hash_perf.c
@@ -56,10 +56,7 @@
 #include 
 #include 
 #include 
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include 
-#endif

 /* Types of hash table performance test that can be performed */
 enum hash_test_t {
@@ -97,11 +94,7 @@ struct tbl_perf_test_params {
  */
 #define HASHTEST_ITERATIONS 100

-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 
31, 32, 33, 63, 64};
 
/**/
@@ -243,7 +236,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {   LOOKUP,  ITERATIONS,  1048576,   4,  64,rte_jhash,   
0},
 {   LOOKUP,  ITERATIONS,  1048576,   8,  64,rte_jhash,   
0},
 {   LOOKUP,  ITERATIONS,  1048576,  16,  64,rte_jhash,   
0},
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 /* Small table, add */
 /*  Test type | Iterations | Entries | BucketSize | KeyLen |HashFunc | 
InitVal */
 { ADD_ON_EMPTY,1024, 1024,   1,  16, rte_hash_crc,   
0},
@@ -376,7 +368,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {   LOOKUP,  ITERATIONS,  1048576,   4,  64, rte_hash_crc,   
0},
 {   LOOKUP,  ITERATIONS,  1048576,   8,  64, rte_hash_crc,   
0},
 {   LOOKUP,  ITERATIONS,  1048576,  16,  64, rte_hash_crc,   
0},
-#endif
 };

 
/**/
@@ -423,10 +414,8 @@ static const char *get_hash_name(rte_hash_function f)
if (f == rte_jhash)
return "jhash";

-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
if (f == rte_hash_crc)
return "rte_hash_crc";
-#endif

return "UnknownHash";
 }
-- 
1.7.9.5



[dpdk-dev] [PATCH v4 4/5] hash: rte_hash_crc() slices data into 8-byte pieces

2014-11-18 Thread Yerden Zhumabekov
Calculating hash for data of variable length is more efficient
when that data is sliced into 8-byte pieces. The remaining tail of the data
is hashed using the CRC32 functions with either 8-byte or 4-byte operands.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   33 -
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 332ed99..e7819f3 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -445,7 +445,7 @@ rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 }

 /**
- * Use crc32 instruction to perform a hash.
+ * Calculate CRC32 hash on user-supplied byte array.
  *
  * @param data
  *   Data to perform hash on.
@@ -460,23 +460,38 @@ static inline uint32_t
 rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
 {
unsigned i;
-   uint32_t temp = 0;
-   const uint32_t *p32 = (const uint32_t *)data;
+   uint64_t temp = 0;
+   const uint64_t *p64 = (const uint64_t *)data;

-   for (i = 0; i < data_len / 4; i++) {
-   init_val = rte_hash_crc_4byte(*p32++, init_val);
+   for (i = 0; i < data_len / 8; i++) {
+   init_val = rte_hash_crc_8byte(*p64++, init_val);
}

-   switch (3 - (data_len & 0x03)) {
+   switch (7 - (data_len & 0x07)) {
case 0:
-   temp |= *((const uint8_t *)p32 + 2) << 16;
+   temp |= (uint64_t) *((const uint8_t *)p64 + 6) << 48;
/* Fallthrough */
case 1:
-   temp |= *((const uint8_t *)p32 + 1) << 8;
+   temp |= (uint64_t) *((const uint8_t *)p64 + 5) << 40;
/* Fallthrough */
case 2:
-   temp |= *((const uint8_t *)p32);
+   temp |= (uint64_t) *((const uint8_t *)p64 + 4) << 32;
+   temp |= *((const uint32_t *)p64);
+   init_val = rte_hash_crc_8byte(temp, init_val);
+   break;
+   case 3:
+   init_val = rte_hash_crc_4byte(*(const uint32_t *)p64, init_val);
+   break;
+   case 4:
+   temp |= *((const uint8_t *)p64 + 2) << 16;
+   /* Fallthrough */
+   case 5:
+   temp |= *((const uint8_t *)p64 + 1) << 8;
+   /* Fallthrough */
+   case 6:
+   temp |= *((const uint8_t *)p64);
init_val = rte_hash_crc_4byte(temp, init_val);
+   /* Fallthrough */
default:
break;
}
-- 
1.7.9.5



[dpdk-dev] [PATCH v4 3/5] hash: add fallback to software CRC32 implementation

2014-11-18 Thread Yerden Zhumabekov
Initially, SSE4.2 support is detected via CPUID instruction.

Added rte_hash_crc_set_alg() function to detect and set CRC32
implementation if necessary. SSE4.2 is allowed by default. If it's
not available, fall back to sw implementation.

Best available algorithm is detected upon application startup
through the constructor function rte_hash_crc_try_sse42().

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   53 ++--
 1 file changed, 51 insertions(+), 2 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 15f687a..332ed99 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -45,7 +45,11 @@ extern "C" {
 #endif

 #include 
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include 
+#endif
+#include 
+#include 

 /* Lookup tables for software implementation of CRC32C */
 static uint32_t crc32c_tables[8][256] = {{
@@ -363,8 +367,41 @@ crc32c_2words(uint64_t data, uint32_t init_val)
return crc;
 }

+enum crc32_alg_t {
+   CRC32_SW = 0,
+   CRC32_SSE42
+};
+
+static enum crc32_alg_t crc32_alg;
+
+/**
+ * Allow or disallow use of SSE4.2 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   Requested CRC32 implementation
+ *   - (CRC32_SW) Don't use SSE4.2 intrinsics
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
+ */
+static inline void
+rte_hash_crc_set_alg(enum crc32_alg_t alg)
+{
+   int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
+   enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
+   crc32_alg = (alg == CRC32_SSE42) ? alg_supp : CRC32_SW;
+}
+
+/* Best available algorithm is detected via CPUID instruction */
+static inline void __attribute__((constructor))
+rte_hash_crc_try_sse42(void)
+{
+   rte_hash_crc_set_alg(CRC32_SSE42);
+}
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -376,11 +413,18 @@ crc32c_2words(uint64_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-   return _mm_crc32_u32(init_val, data);
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+   if (likely(crc32_alg == CRC32_SSE42))
+   return _mm_crc32_u32(init_val, data);
+#endif
+
+   return crc32c_1word(data, init_val);
 }

 /**
  * Use single crc32 instruction to perform a hash on a 8 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -392,7 +436,12 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-   return _mm_crc32_u64(init_val, data);
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+   if (likely(crc32_alg == CRC32_SSE42))
+   return _mm_crc32_u64(init_val, data);
+#endif
+
+   return crc32c_2words(data, init_val);
 }

 /**
-- 
1.7.9.5
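
The selection logic in this patch can be sketched on its own, outside
DPDK. The block below is only an illustration under stated assumptions:
every name in it (crc_alg, select_alg, set_alg, pick_default_alg,
hw_available) is hypothetical, and the CPUID probe is replaced by the
GCC/Clang builtin __builtin_cpu_supports() rather than
rte_cpu_get_flag_enabled(). It shows the two ideas from the commit
message: the hardware path is granted only when it is both requested
and present, and a constructor asks for the best option before main().

```c
#include <stdint.h>

/* Hypothetical names; only the selection logic mirrors the patch. */
enum crc_alg { ALG_SW = 0, ALG_HW };

static enum crc_alg active_alg; /* ALG_SW until the constructor runs */
static int hw_available;        /* stand-in for the CPUID probe result */

/* Grant the hardware path only when it is both requested and present. */
static enum crc_alg
select_alg(enum crc_alg requested, int hw_present)
{
	if (requested == ALG_HW && hw_present)
		return ALG_HW;
	return ALG_SW;
}

/* Runtime switch, e.g. to force the software path for testing. */
static void
set_alg(enum crc_alg requested)
{
	active_alg = select_alg(requested, hw_available);
}

/* Runs before main() on GCC and Clang and requests the best option. */
static void __attribute__((constructor))
pick_default_alg(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_cpu_init(); /* required before use inside a constructor */
	hw_available = __builtin_cpu_supports("sse4.2") ? 1 : 0;
#else
	hw_available = 0;
#endif
	set_alg(ALG_HW);
}
```

A caller that wants the software path regardless of the CPU would call
set_alg(ALG_SW) after startup; asking for ALG_HW on a machine without
the instruction silently yields ALG_SW.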



[dpdk-dev] [PATCH v4 2/5] hash: add new rte_hash_crc_8byte call

2014-11-18 Thread Yerden Zhumabekov
SSE4.2 provides the _mm_crc32_u64 intrinsic, which takes an 8-byte operand.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   16 
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 4d7532a..15f687a 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -380,6 +380,22 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 }

 /**
+ * Use single crc32 instruction to perform a hash on a 8 byte value.
+ *
+ * @param data
+ *   Data to perform hash on.
+ * @param init_val
+ *   Value to initialise hash generator.
+ * @return
+ *   32bit calculated hash value.
+ */
+static inline uint32_t
+rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
+{
+   return _mm_crc32_u64(init_val, data);
+}
+
+/**
  * Use crc32 instruction to perform a hash.
  *
  * @param data
-- 
1.7.9.5
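
Why an 8-byte call can replace two 4-byte calls may be easier to see
with a software model. The following is a minimal sketch, not the
patch's code: crc32c_bits(), sw_crc_4byte() and sw_crc_8byte() are
hypothetical names, and the bit-by-bit loop is a slow stand-in for the
crc32 instruction. It assumes the raw CRC32C update (reflected
polynomial 0x82F63B78, no initial or final inversion), which is what
the intrinsics compute.

```c
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u /* reflected Castagnoli polynomial */

/* Bitwise CRC32C update over the low `bits` bits of `data`, LSB first.
 * Bits of `data` above `bits` must be zero. */
static uint32_t
crc32c_bits(uint64_t data, unsigned bits, uint32_t crc)
{
	uint64_t reg = (uint64_t)crc ^ data;
	unsigned i;

	for (i = 0; i < bits; i++)
		reg = (reg >> 1) ^ ((reg & 1) ? CRC32C_POLY : 0);
	return (uint32_t)reg;
}

/* Software model of a 4-byte CRC step. */
static uint32_t
sw_crc_4byte(uint32_t data, uint32_t init_val)
{
	return crc32c_bits(data, 32, init_val);
}

/* Software model of an 8-byte CRC step. */
static uint32_t
sw_crc_8byte(uint64_t data, uint32_t init_val)
{
	return crc32c_bits(data, 64, init_val);
}
```

In this model one 8-byte step gives the same value as a 4-byte step on
the low word followed by a 4-byte step on the high word, so the wider
call halves the number of steps without changing the hash.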



[dpdk-dev] [PATCH v4 1/5] hash: add software CRC32 implementation

2014-11-18 Thread Yerden Zhumabekov
Add lookup tables for the CRC32 algorithm, plus crc32c_1word() and
crc32c_2words() functions returning the hash of a 32-bit and a
64-bit operand, respectively.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |  316 
 1 file changed, 316 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index b48b0db..4d7532a 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -47,6 +47,322 @@ extern "C" {
 #include 
 #include 

+/* Lookup tables for software implementation of CRC32C */
+static uint32_t crc32c_tables[8][256] = {{
+ 0x00000000, 0xF26B8303, 0xE13B70F7, 0x1350F3F4, 0xC79A971F, 0x35F1141C, 
0x26A1E7E8, 0xD4CA64EB,
+ 0x8AD958CF, 0x78B2DBCC, 0x6BE22838, 0x9989AB3B, 0x4D43CFD0, 0xBF284CD3, 
0xAC78BF27, 0x5E133C24,
+ 0x105EC76F, 0xE235446C, 0xF165B798, 0x030E349B, 0xD7C45070, 0x25AFD373, 
0x36FF2087, 0xC494A384,
+ 0x9A879FA0, 0x68EC1CA3, 0x7BBCEF57, 0x89D76C54, 0x5D1D08BF, 0xAF768BBC, 
0xBC267848, 0x4E4DFB4B,
+ 0x20BD8EDE, 0xD2D60DDD, 0xC186FE29, 0x33ED7D2A, 0xE72719C1, 0x154C9AC2, 
0x061C6936, 0xF477EA35,
+ 0xAA64D611, 0x580F5512, 0x4B5FA6E6, 0xB93425E5, 0x6DFE410E, 0x9F95C20D, 
0x8CC531F9, 0x7EAEB2FA,
+ 0x30E349B1, 0xC288CAB2, 0xD1D83946, 0x23B3BA45, 0xF779DEAE, 0x05125DAD, 
0x1642AE59, 0xE4292D5A,
+ 0xBA3A117E, 0x4851927D, 0x5B016189, 0xA96AE28A, 0x7DA08661, 0x8FCB0562, 
0x9C9BF696, 0x6EF07595,
+ 0x417B1DBC, 0xB3109EBF, 0xA0406D4B, 0x522BEE48, 0x86E18AA3, 0x748A09A0, 
0x67DAFA54, 0x95B17957,
+ 0xCBA24573, 0x39C9C670, 0x2A993584, 0xD8F2B687, 0x0C38D26C, 0xFE53516F, 
0xED03A29B, 0x1F682198,
+ 0x5125DAD3, 0xA34E59D0, 0xB01EAA24, 0x42752927, 0x96BF4DCC, 0x64D4CECF, 
0x77843D3B, 0x85EFBE38,
+ 0xDBFC821C, 0x2997011F, 0x3AC7F2EB, 0xC8AC71E8, 0x1C661503, 0xEE0D9600, 
0xFD5D65F4, 0x0F36E6F7,
+ 0x61C69362, 0x93AD1061, 0x80FDE395, 0x72966096, 0xA65C047D, 0x5437877E, 
0x4767748A, 0xB50CF789,
+ 0xEB1FCBAD, 0x197448AE, 0x0A24BB5A, 0xF84F3859, 0x2C855CB2, 0xDEEEDFB1, 
0xCDBE2C45, 0x3FD5AF46,
+ 0x7198540D, 0x83F3D70E, 0x90A324FA, 0x62C8A7F9, 0xB602C312, 0x44694011, 
0x5739B3E5, 0xA55230E6,
+ 0xFB410CC2, 0x092A8FC1, 0x1A7A7C35, 0xE811FF36, 0x3CDB9BDD, 0xCEB018DE, 
0xDDE0EB2A, 0x2F8B6829,
+ 0x82F63B78, 0x709DB87B, 0x63CD4B8F, 0x91A6C88C, 0x456CAC67, 0xB7072F64, 
0xA457DC90, 0x563C5F93,
+ 0x082F63B7, 0xFA44E0B4, 0xE9141340, 0x1B7F9043, 0xCFB5F4A8, 0x3DDE77AB, 
0x2E8E845F, 0xDCE5075C,
+ 0x92A8FC17, 0x60C37F14, 0x73938CE0, 0x81F80FE3, 0x55326B08, 0xA759E80B, 
0xB4091BFF, 0x466298FC,
+ 0x1871A4D8, 0xEA1A27DB, 0xF94AD42F, 0x0B21572C, 0xDFEB33C7, 0x2D80B0C4, 
0x3ED04330, 0xCCBBC033,
+ 0xA24BB5A6, 0x502036A5, 0x4370C551, 0xB11B4652, 0x65D122B9, 0x97BAA1BA, 
0x84EA524E, 0x7681D14D,
+ 0x2892ED69, 0xDAF96E6A, 0xC9A99D9E, 0x3BC21E9D, 0xEF087A76, 0x1D63F975, 
0x0E330A81, 0xFC588982,
+ 0xB21572C9, 0x407EF1CA, 0x532E023E, 0xA145813D, 0x758FE5D6, 0x87E466D5, 
0x94B49521, 0x66DF1622,
+ 0x38CC2A06, 0xCAA7A905, 0xD9F75AF1, 0x2B9CD9F2, 0xFF56BD19, 0x0D3D3E1A, 
0x1E6DCDEE, 0xEC064EED,
+ 0xC38D26C4, 0x31E6A5C7, 0x22B65633, 0xD0DDD530, 0x0417B1DB, 0xF67C32D8, 
0xE52CC12C, 0x1747422F,
+ 0x49547E0B, 0xBB3FFD08, 0xA86F0EFC, 0x5A048DFF, 0x8ECEE914, 0x7CA56A17, 
0x6FF599E3, 0x9D9E1AE0,
+ 0xD3D3E1AB, 0x21B862A8, 0x32E8915C, 0xC083125F, 0x144976B4, 0xE622F5B7, 
0xF5720643, 0x07198540,
+ 0x590AB964, 0xAB613A67, 0xB831C993, 0x4A5A4A90, 0x9E902E7B, 0x6CFBAD78, 
0x7FAB5E8C, 0x8DC0DD8F,
+ 0xE330A81A, 0x115B2B19, 0x020BD8ED, 0xF0605BEE, 0x24AA3F05, 0xD6C1BC06, 
0xC5914FF2, 0x37FACCF1,
+ 0x69E9F0D5, 0x9B8273D6, 0x88D28022, 0x7AB90321, 0xAE7367CA, 0x5C18E4C9, 
0x4F48173D, 0xBD23943E,
+ 0xF36E6F75, 0x0105EC76, 0x12551F82, 0xE03E9C81, 0x34F4F86A, 0xC69F7B69, 
0xD5CF889D, 0x27A40B9E,
+ 0x79B737BA, 0x8BDCB4B9, 0x988C474D, 0x6AE7C44E, 0xBE2DA0A5, 0x4C4623A6, 
0x5F16D052, 0xAD7D5351
+},
+{
+ 0x00000000, 0x13A29877, 0x274530EE, 0x34E7A899, 0x4E8A61DC, 0x5D28F9AB, 
0x69CF5132, 0x7A6DC945,
+ 0x9D14C3B8, 0x8EB65BCF, 0xBA51F356, 0xA9F36B21, 0xD39EA264, 0xC03C3A13, 
0xF4DB928A, 0xE7790AFD,
+ 0x3FC5F181, 0x2C6769F6, 0x1880C16F, 0x0B225918, 0x714F905D, 0x62ED082A, 
0x560AA0B3, 0x45A838C4,
+ 0xA2D13239, 0xB173AA4E, 0x859402D7, 0x96369AA0, 0xEC5B53E5, 0xFFF9CB92, 
0xCB1E630B, 0xD8BCFB7C,
+ 0x7F8BE302, 0x6C297B75, 0x58CED3EC, 0x4B6C4B9B, 0x310182DE, 0x22A31AA9, 
0x1644B230, 0x05E62A47,
+ 0xE29F20BA, 0xF13DB8CD, 0xC5DA1054, 0xD6788823, 0xAC154166, 0xBFB7D911, 
0x8B507188, 0x98F2E9FF,
+ 0x404E1283, 0x53EC8AF4, 0x670B226D, 0x74A9BA1A, 0x0EC4735F, 0x1D66EB28, 
0x298143B1, 0x3A23DBC6,
+ 0xDD5AD13B, 0xCEF8494C, 0xFA1FE1D5, 0xE9BD79A2, 0x93D0B0E7, 0x80722890, 
0xB4958009, 0xA737187E,
+ 0xFF17C604, 0xECB55E73, 0xD852F6EA, 0xCBF06E9D, 0xB19DA7D8, 0xA23F3FAF, 
0x96D89736, 0x857A0F41,
+ 0x620305BC, 0x71A19DCB, 0x45463552, 0x56E4AD25, 0x2C896460, 0x3F2BFC17, 
0x0BCC548E, 0x186ECCF9,
+ 0xC0D23785, 0xD370AFF2, 0xE797076B, 0xF4359F1C, 0x8E585659, 0x9DFACE2E, 
0xA91D66B7, 0xBABFFEC0,
+ 0x5DC6F43D, 0x4E646C4A, 0x7A83C4D3, 0x69215CA4, 0x134C95E1, 0x00EE0D96, 
0x3409A50F, 0x27AB3D78,
+ 
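
The table is cut off in this archive, but its entries can be
regenerated. The sketch below is an assumption-labelled illustration,
not the patch's code: build_tables() and crc_1word() are hypothetical
names, and the one-word lookup uses the usual slicing-by-4 scheme,
which may differ in detail from the patch's crc32c_1word(). It
assumes the reflected CRC32C polynomial 0x82F63B78, which reproduces
the entries visible above (for example 0xF26B8303 and 0x13A29877).

```c
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u /* reflected Castagnoli polynomial */

static uint32_t tables[8][256];

/* Fill the eight 256-entry tables used by sliced CRC32C lookups. */
static void
build_tables(void)
{
	unsigned i, k, t;

	/* Table 0: CRC of each single byte value. */
	for (i = 0; i < 256; i++) {
		uint32_t crc = i;

		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ ((crc & 1) ? CRC32C_POLY : 0);
		tables[0][i] = crc;
	}

	/* Table t: table t-1 advanced by one further zero byte. */
	for (t = 1; t < 8; t++) {
		for (i = 0; i < 256; i++) {
			uint32_t prev = tables[t - 1][i];

			tables[t][i] = (prev >> 8) ^ tables[0][prev & 0xFF];
		}
	}
}

/* Hash one 32-bit word with four independent table lookups. */
static uint32_t
crc_1word(uint32_t data, uint32_t init_val)
{
	uint32_t term = data ^ init_val;

	return tables[3][term & 0xFF] ^
	       tables[2][(term >> 8) & 0xFF] ^
	       tables[1][(term >> 16) & 0xFF] ^
	       tables[0][term >> 24];
}
```

build_tables() must run once before crc_1word() is used. A static
initializer, as in the patch, avoids that startup step at the cost of
a long literal table in the header.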

[dpdk-dev] [PATCH v4 0/5] rte_hash_crc reworked to be platform-independent

2014-11-18 Thread Yerden Zhumabekov
This is a rework of my previous patches improving performance of rte_hash_crc. 
In addition, this revision brings a fallback mechanism to ensure that CRC32 
hash is calculated regardless of hardware support from CPU (i.e. SSE4.2 
intrinsics). Performance of software CRC32 implementation is also improved.

Summary of changes:
* added CRC32 software implementation, which is used as a fallback in case 
SSE4.2 is not available, or if SSE4.2 is intentionally disabled.
* added rte_hash_crc_set_alg() function to control availability of SSE4.2.
* added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand.
* reworked rte_hash_crc() function which leverages both versions of CRC32 hash 
calculation functions with 4 and 8-byte operands.
* removed compile-time checks from test_hash_perf and test_hash.
* the default algorithm is now selected by a constructor at application 
startup.
* compared to v3, icc-specific code was removed.

Patches were tested on machines both with and without SSE4.2 support. 
The software implementation seems to be about 4-5 times slower than the 
SSE4.2-enabled one. Of course, they return identical results.

Yerden Zhumabekov (5):
  hash: add software CRC32 implementation
  hash: add new rte_hash_crc_8byte call
  hash: add fallback to software CRC32 implementation
  hash: rte_hash_crc() slices data into 8-byte pieces
  test: remove redundant compile checks

 app/test/test_hash.c   |7 -
 app/test/test_hash_perf.c  |   11 --
 lib/librte_hash/rte_hash_crc.h |  416 +++-
 3 files changed, 406 insertions(+), 28 deletions(-)

-- 
1.7.9.5



[dpdk-dev] [PATCH v3 3/5] hash: add fallback to software CRC32 implementation

2014-11-18 Thread Yerden Zhumabekov

18.11.2014 19:33, Neil Horman wrote:
> On Tue, Nov 18, 2014 at 10:56:24AM +0600, Yerden Zhumabekov wrote:
>> Sorry, maybe I made a mistake here.
>>
>> According to the code in lib/librte_eal/common/eal_common_cpuflags.c, it
>> seemed to me that the constructor attribute is not supported by the Intel
>> compiler, so for that case I decided to keep the autodetection code. Am I
>> correct?
>>
> I don't think that's correct. The Intel Compiler claims support for most GCC
> features, except where explicitly stated in the release notes, and I don't find
> any documentation clearly excepting the constructor attribute from that list.
> That said, since the Intel compiler isn't open, I don't have access to it and
> cannot confirm either way, though if it's the case, the DPDK has a major issue,
> as __attribute__((constructor)) is used extensively throughout the code.
> Neil

My bad. OK, I'll redo it and send the series as 'v4'.
Thanks.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH v3 3/5] hash: add fallback to software CRC32 implementation

2014-11-18 Thread Yerden Zhumabekov
Sorry, maybe I made a mistake here.

According to the code in lib/librte_eal/common/eal_common_cpuflags.c, it
seemed to me that the constructor attribute is not supported by the Intel
compiler, so for that case I decided to keep the autodetection code. Am I
correct?

18.11.2014 9:21, Yerden Zhumabekov wrote:
> Initially, SSE4.2 support is detected via CPUID instruction.
>
> Added rte_hash_crc_set_alg() function to detect and set CRC32
> implementation if necessary. SSE4.2 is allowed by default. If it's
> not available, fall back to sw implementation.
>
> Depending on compiler attributes support, best available algorithm
> may be detected upon application startup.
>
> Signed-off-by: Yerden Zhumabekov 
> ---
>  lib/librte_hash/rte_hash_crc.h |   64 ++--
>  1 file changed, 62 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
> index 15f687a..c1b75e8 100644
> --- a/lib/librte_hash/rte_hash_crc.h
> +++ b/lib/librte_hash/rte_hash_crc.h
> @@ -45,7 +45,11 @@ extern "C" {
>  #endif
>  
>  #include 
> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>  #include 
> +#endif
> +#include 
> +#include 
>  
>  /* Lookup tables for software implementation of CRC32C */
>  static uint32_t crc32c_tables[8][256] = {{
> @@ -363,8 +367,44 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>   return crc;
>  }
>  
> +enum crc32_alg_t {
> + CRC32_SW = 0,
> + CRC32_SSE42,
> + CRC32_AUTODETECT
> +};
> +
> +static enum crc32_alg_t crc32_alg = CRC32_AUTODETECT;
> +
> +/**
> + * Allow or disallow use of SSE4.2 intrinsics for CRC32 hash
> + * calculation.
> + *
> + * @param alg
> + *   Requested CRC32 implementation
> + *   - (CRC32_SW) Don't use SSE4.2 intrinsics
> + *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
> + */
> +static inline void
> +rte_hash_crc_set_alg(enum crc32_alg_t alg)
> +{
> + int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
> + enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
> + crc32_alg = (alg == CRC32_SSE42) ? alg_supp : CRC32_SW;
> +}
> +
> +/* Best available algorithm is detected via CPUID instruction */
> +#ifndef __INTEL_COMPILER
> +static inline void __attribute__((constructor))
> +rte_hash_crc_try_sse42(void)
> +{
> + rte_hash_crc_set_alg(CRC32_SSE42);
> +}
> +#endif
> +
>  /**
>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> + * Fall back to software crc32 implementation in case SSE4.2 is
> + * not supported
>   *
>   * @param data
>   *   Data to perform hash on.
> @@ -376,11 +416,22 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>  static inline uint32_t
>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>  {
> - return _mm_crc32_u32(init_val, data);
> +#ifdef __INTEL_COMPILER
> + if (unlikely(crc32_alg == CRC32_AUTODETECT))
> + rte_hash_crc_set_alg(CRC32_SSE42);
> +#endif
> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> + if (likely(crc32_alg == CRC32_SSE42))
> + return _mm_crc32_u32(init_val, data);
> +#endif
> +
> + return crc32c_1word(data, init_val);
>  }
>  
>  /**
>   * Use single crc32 instruction to perform a hash on a 8 byte value.
> + * Fall back to software crc32 implementation in case SSE4.2 is
> + * not supported
>   *
>   * @param data
>   *   Data to perform hash on.
> @@ -392,7 +443,16 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>  static inline uint32_t
>  rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
>  {
> - return _mm_crc32_u64(init_val, data);
> +#ifdef __INTEL_COMPILER
> + if (unlikely(crc32_alg == CRC32_AUTODETECT))
> + rte_hash_crc_set_alg(CRC32_SSE42);
> +#endif
> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> + if (likely(crc32_alg == CRC32_SSE42))
> + return _mm_crc32_u64(init_val, data);
> +#endif
> +
> + return crc32c_2words(data, init_val);
>  }
>  
>  /**

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH v3 5/5] test: remove redundant compile checks

2014-11-18 Thread Yerden Zhumabekov
Since rte_hash_crc() can now be run regardless of SSE4.2 support,
we can safely remove compile checks for RTE_MACHINE_CPUFLAG_SSE4_2
in test utilities.

Signed-off-by: Yerden Zhumabekov 
---
 app/test/test_hash.c  |7 ---
 app/test/test_hash_perf.c |   11 ---
 2 files changed, 18 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 178ec3f..76b1b8f 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -55,10 +55,7 @@
 #include 
 #include 
 #include 
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include 
-#endif

 
/***
  * Hash function performance test configuration section. Each performance test
@@ -67,11 +64,7 @@
  * The five arrays below control what tests are performed. Every combination
  * from the array entries is tested.
  */
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {0, 2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 31, 32, 33, 63, 64};
 
/**/
diff --git a/app/test/test_hash_perf.c b/app/test/test_hash_perf.c
index be34957..05a88ec 100644
--- a/app/test/test_hash_perf.c
+++ b/app/test/test_hash_perf.c
@@ -56,10 +56,7 @@
 #include 
 #include 
 #include 
-
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include 
-#endif

 /* Types of hash table performance test that can be performed */
 enum hash_test_t {
@@ -97,11 +94,7 @@ struct tbl_perf_test_params {
  */
 #define HASHTEST_ITERATIONS 100

-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 static rte_hash_function hashtest_funcs[] = {rte_jhash, rte_hash_crc};
-#else
-static rte_hash_function hashtest_funcs[] = {rte_jhash};
-#endif
 static uint32_t hashtest_initvals[] = {0};
 static uint32_t hashtest_key_lens[] = {2, 4, 5, 6, 7, 8, 10, 11, 15, 16, 21, 31, 32, 33, 63, 64};
 
/**/
@@ -243,7 +236,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {   LOOKUP,  ITERATIONS,  1048576,   4,  64,rte_jhash,   0},
 {   LOOKUP,  ITERATIONS,  1048576,   8,  64,rte_jhash,   0},
 {   LOOKUP,  ITERATIONS,  1048576,  16,  64,rte_jhash,   0},
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 /* Small table, add */
 /*  Test type | Iterations | Entries | BucketSize | KeyLen |HashFunc | InitVal */
 { ADD_ON_EMPTY,1024, 1024,   1,  16, rte_hash_crc,   0},
@@ -376,7 +368,6 @@ struct tbl_perf_test_params tbl_perf_params[] =
 {   LOOKUP,  ITERATIONS,  1048576,   4,  64, rte_hash_crc,   0},
 {   LOOKUP,  ITERATIONS,  1048576,   8,  64, rte_hash_crc,   0},
 {   LOOKUP,  ITERATIONS,  1048576,  16,  64, rte_hash_crc,   0},
-#endif
 };

 
/**/
@@ -423,10 +414,8 @@ static const char *get_hash_name(rte_hash_function f)
if (f == rte_jhash)
return "jhash";

-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
if (f == rte_hash_crc)
return "rte_hash_crc";
-#endif

return "UnknownHash";
 }
-- 
1.7.9.5



[dpdk-dev] [PATCH v3 4/5] hash: rte_hash_crc() slices data into 8-byte pieces

2014-11-18 Thread Yerden Zhumabekov
Calculating a hash for data of variable length is more efficient
when that data is sliced into 8-byte pieces. The remaining bytes
are hashed using CRC32 functions with either 8- or 4-byte operands.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   33 -
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index c1b75e8..2d95e3c 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -456,7 +456,7 @@ rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 }

 /**
- * Use crc32 instruction to perform a hash.
+ * Calculate CRC32 hash on user-supplied byte array.
  *
  * @param data
  *   Data to perform hash on.
@@ -471,23 +471,38 @@ static inline uint32_t
 rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
 {
unsigned i;
-   uint32_t temp = 0;
-   const uint32_t *p32 = (const uint32_t *)data;
+   uint64_t temp = 0;
+   const uint64_t *p64 = (const uint64_t *)data;

-   for (i = 0; i < data_len / 4; i++) {
-   init_val = rte_hash_crc_4byte(*p32++, init_val);
+   for (i = 0; i < data_len / 8; i++) {
+   init_val = rte_hash_crc_8byte(*p64++, init_val);
}

-   switch (3 - (data_len & 0x03)) {
+   switch (7 - (data_len & 0x07)) {
case 0:
-   temp |= *((const uint8_t *)p32 + 2) << 16;
+   temp |= (uint64_t) *((const uint8_t *)p64 + 6) << 48;
/* Fallthrough */
case 1:
-   temp |= *((const uint8_t *)p32 + 1) << 8;
+   temp |= (uint64_t) *((const uint8_t *)p64 + 5) << 40;
/* Fallthrough */
case 2:
-   temp |= *((const uint8_t *)p32);
+   temp |= (uint64_t) *((const uint8_t *)p64 + 4) << 32;
+   temp |= *((const uint32_t *)p64);
+   init_val = rte_hash_crc_8byte(temp, init_val);
+   break;
+   case 3:
+   init_val = rte_hash_crc_4byte(*(const uint32_t *)p64, init_val);
+   break;
+   case 4:
+   temp |= *((const uint8_t *)p64 + 2) << 16;
+   /* Fallthrough */
+   case 5:
+   temp |= *((const uint8_t *)p64 + 1) << 8;
+   /* Fallthrough */
+   case 6:
+   temp |= *((const uint8_t *)p64);
init_val = rte_hash_crc_4byte(temp, init_val);
+   /* Fallthrough */
default:
break;
}
-- 
1.7.9.5
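
The slicing and tail handling described in this patch can be sketched
portably. The block below is an illustration under stated assumptions,
not the patch itself: sketch_hash_crc(), load_le() and crc32c_bits()
are hypothetical names, the bitwise loop stands in for the
rte_hash_crc_4byte()/rte_hash_crc_8byte() steps, and bytes are
assembled explicitly in little-endian order instead of through the
pointer casts used in the diff. As in the patch, a tail of 5 to 7
bytes goes through one zero-padded 8-byte step and a tail of 1 to 4
bytes through one zero-padded 4-byte step.

```c
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u /* reflected Castagnoli polynomial */

/* Bitwise CRC32C update over the low `bits` bits of `data`, LSB first.
 * A slow stand-in for the crc32 instruction. */
static uint32_t
crc32c_bits(uint64_t data, unsigned bits, uint32_t crc)
{
	uint64_t reg = (uint64_t)crc ^ data;
	unsigned i;

	for (i = 0; i < bits; i++)
		reg = (reg >> 1) ^ ((reg & 1) ? CRC32C_POLY : 0);
	return (uint32_t)reg;
}

/* Assemble n (at most 8) bytes into a little-endian word, zero-padded. */
static uint64_t
load_le(const uint8_t *p, unsigned n)
{
	uint64_t v = 0;
	unsigned i;

	for (i = 0; i < n; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* Hash a byte array: 8-byte slices first, then one step for the tail. */
static uint32_t
sketch_hash_crc(const void *data, uint32_t len, uint32_t init_val)
{
	const uint8_t *p = data;
	unsigned tail = len & 7;
	uint32_t i;

	for (i = 0; i < len / 8; i++, p += 8)
		init_val = crc32c_bits(load_le(p, 8), 64, init_val);

	if (tail > 4)        /* 5..7 bytes: one zero-padded 8-byte step */
		init_val = crc32c_bits(load_le(p, tail), 64, init_val);
	else if (tail > 0)   /* 1..4 bytes: one zero-padded 4-byte step */
		init_val = crc32c_bits(load_le(p, tail), 32, init_val);
	return init_val;
}
```

One consequence of the zero padding is that keys differing only in
trailing zero bytes within the same padded word hash to the same
value, for example a 3-byte key "abc" and the 4-byte key "abc\0".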



[dpdk-dev] [PATCH v3 3/5] hash: add fallback to software CRC32 implementation

2014-11-18 Thread Yerden Zhumabekov
Initially, SSE4.2 support is detected via the CPUID instruction.

Added rte_hash_crc_set_alg() function to detect and set the CRC32
implementation if necessary. SSE4.2 is allowed by default. If it's
not available, fall back to the sw implementation.

Depending on compiler attribute support, the best available algorithm
may be detected upon application startup.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   64 ++--
 1 file changed, 62 insertions(+), 2 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 15f687a..c1b75e8 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -45,7 +45,11 @@ extern "C" {
 #endif

 #include 
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include 
+#endif
+#include 
+#include 

 /* Lookup tables for software implementation of CRC32C */
 static uint32_t crc32c_tables[8][256] = {{
@@ -363,8 +367,44 @@ crc32c_2words(uint64_t data, uint32_t init_val)
return crc;
 }

+enum crc32_alg_t {
+   CRC32_SW = 0,
+   CRC32_SSE42,
+   CRC32_AUTODETECT
+};
+
+static enum crc32_alg_t crc32_alg = CRC32_AUTODETECT;
+
+/**
+ * Allow or disallow use of SSE4.2 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   Requested CRC32 implementation
+ *   - (CRC32_SW) Don't use SSE4.2 intrinsics
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
+ */
+static inline void
+rte_hash_crc_set_alg(enum crc32_alg_t alg)
+{
+   int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
+   enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
+   crc32_alg = (alg == CRC32_SSE42) ? alg_supp : CRC32_SW;
+}
+
+/* Best available algorithm is detected via CPUID instruction */
+#ifndef __INTEL_COMPILER
+static inline void __attribute__((constructor))
+rte_hash_crc_try_sse42(void)
+{
+   rte_hash_crc_set_alg(CRC32_SSE42);
+}
+#endif
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -376,11 +416,22 @@ crc32c_2words(uint64_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-   return _mm_crc32_u32(init_val, data);
+#ifdef __INTEL_COMPILER
+   if (unlikely(crc32_alg == CRC32_AUTODETECT))
+   rte_hash_crc_set_alg(CRC32_SSE42);
+#endif
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+   if (likely(crc32_alg == CRC32_SSE42))
+   return _mm_crc32_u32(init_val, data);
+#endif
+
+   return crc32c_1word(data, init_val);
 }

 /**
  * Use single crc32 instruction to perform a hash on a 8 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -392,7 +443,16 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-   return _mm_crc32_u64(init_val, data);
+#ifdef __INTEL_COMPILER
+   if (unlikely(crc32_alg == CRC32_AUTODETECT))
+   rte_hash_crc_set_alg(CRC32_SSE42);
+#endif
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+   if (likely(crc32_alg == CRC32_SSE42))
+   return _mm_crc32_u64(init_val, data);
+#endif
+
+   return crc32c_2words(data, init_val);
 }

 /**
-- 
1.7.9.5
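
The lazy autodetection used in this revision for compilers without
constructor support can be sketched as follows. This is only an
illustration with hypothetical names (crc_alg, probe_hw, set_alg,
current_alg, hw_present, probe_count); probe_hw() stands in for the
CPUID check, and its counter exists purely to show that detection
happens once. The sentinel value is resolved on the first hash call
and every later call sees a concrete choice.

```c
#include <stdint.h>

/* Hypothetical names; the sentinel mirrors CRC32_AUTODETECT. */
enum crc_alg { ALG_SW = 0, ALG_HW, ALG_AUTODETECT };

static enum crc_alg active_alg = ALG_AUTODETECT;
static int hw_present;       /* stand-in for what CPUID would report */
static unsigned probe_count; /* how many times detection has run */

/* Stand-in for the CPU feature probe. */
static int
probe_hw(void)
{
	probe_count++;
	return hw_present;
}

/* Resolve a request to a concrete implementation. */
static void
set_alg(enum crc_alg requested)
{
	int hw = probe_hw();

	active_alg = (requested == ALG_HW && hw) ? ALG_HW : ALG_SW;
}

/* Called at the top of every hash step: detect on first use only. */
static enum crc_alg
current_alg(void)
{
	if (active_alg == ALG_AUTODETECT)
		set_alg(ALG_HW);
	return active_alg;
}
```

The cost of this approach is one extra comparison on every hash call,
which is what moving the detection into a constructor avoids.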



[dpdk-dev] [PATCH v3 2/5] hash: add new rte_hash_crc_8byte call

2014-11-18 Thread Yerden Zhumabekov
SSE4.2 provides the _mm_crc32_u64 intrinsic, which takes an 8-byte operand.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   16 
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 4d7532a..15f687a 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -380,6 +380,22 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 }

 /**
+ * Use single crc32 instruction to perform a hash on a 8 byte value.
+ *
+ * @param data
+ *   Data to perform hash on.
+ * @param init_val
+ *   Value to initialise hash generator.
+ * @return
+ *   32bit calculated hash value.
+ */
+static inline uint32_t
+rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
+{
+   return _mm_crc32_u64(init_val, data);
+}
+
+/**
  * Use crc32 instruction to perform a hash.
  *
  * @param data
-- 
1.7.9.5



[dpdk-dev] [PATCH v3 1/5] hash: add software CRC32 implementation

2014-11-18 Thread Yerden Zhumabekov
Add lookup tables for the CRC32 algorithm, plus crc32c_1word() and
crc32c_2words() functions returning the hash of a 32-bit and a
64-bit operand, respectively.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |  316 
 1 file changed, 316 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index b48b0db..4d7532a 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -47,6 +47,322 @@ extern "C" {
 #include 
 #include 

+/* Lookup tables for software implementation of CRC32C */
+static uint32_t crc32c_tables[8][256] = {{
+ 0x00000000, 0xF26B8303, 0xE13B70F7, 0x1350F3F4, 0xC79A971F, 0x35F1141C, 
0x26A1E7E8, 0xD4CA64EB,
+ 0x8AD958CF, 0x78B2DBCC, 0x6BE22838, 0x9989AB3B, 0x4D43CFD0, 0xBF284CD3, 
0xAC78BF27, 0x5E133C24,
+ 0x105EC76F, 0xE235446C, 0xF165B798, 0x030E349B, 0xD7C45070, 0x25AFD373, 
0x36FF2087, 0xC494A384,
+ 0x9A879FA0, 0x68EC1CA3, 0x7BBCEF57, 0x89D76C54, 0x5D1D08BF, 0xAF768BBC, 
0xBC267848, 0x4E4DFB4B,
+ 0x20BD8EDE, 0xD2D60DDD, 0xC186FE29, 0x33ED7D2A, 0xE72719C1, 0x154C9AC2, 
0x061C6936, 0xF477EA35,
+ 0xAA64D611, 0x580F5512, 0x4B5FA6E6, 0xB93425E5, 0x6DFE410E, 0x9F95C20D, 
0x8CC531F9, 0x7EAEB2FA,
+ 0x30E349B1, 0xC288CAB2, 0xD1D83946, 0x23B3BA45, 0xF779DEAE, 0x05125DAD, 
0x1642AE59, 0xE4292D5A,
+ 0xBA3A117E, 0x4851927D, 0x5B016189, 0xA96AE28A, 0x7DA08661, 0x8FCB0562, 
0x9C9BF696, 0x6EF07595,
+ 0x417B1DBC, 0xB3109EBF, 0xA0406D4B, 0x522BEE48, 0x86E18AA3, 0x748A09A0, 
0x67DAFA54, 0x95B17957,
+ 0xCBA24573, 0x39C9C670, 0x2A993584, 0xD8F2B687, 0x0C38D26C, 0xFE53516F, 
0xED03A29B, 0x1F682198,
+ 0x5125DAD3, 0xA34E59D0, 0xB01EAA24, 0x42752927, 0x96BF4DCC, 0x64D4CECF, 
0x77843D3B, 0x85EFBE38,
+ 0xDBFC821C, 0x2997011F, 0x3AC7F2EB, 0xC8AC71E8, 0x1C661503, 0xEE0D9600, 
0xFD5D65F4, 0x0F36E6F7,
+ 0x61C69362, 0x93AD1061, 0x80FDE395, 0x72966096, 0xA65C047D, 0x5437877E, 
0x4767748A, 0xB50CF789,
+ 0xEB1FCBAD, 0x197448AE, 0x0A24BB5A, 0xF84F3859, 0x2C855CB2, 0xDEEEDFB1, 
0xCDBE2C45, 0x3FD5AF46,
+ 0x7198540D, 0x83F3D70E, 0x90A324FA, 0x62C8A7F9, 0xB602C312, 0x44694011, 
0x5739B3E5, 0xA55230E6,
+ 0xFB410CC2, 0x092A8FC1, 0x1A7A7C35, 0xE811FF36, 0x3CDB9BDD, 0xCEB018DE, 
0xDDE0EB2A, 0x2F8B6829,
+ 0x82F63B78, 0x709DB87B, 0x63CD4B8F, 0x91A6C88C, 0x456CAC67, 0xB7072F64, 
0xA457DC90, 0x563C5F93,
+ 0x082F63B7, 0xFA44E0B4, 0xE9141340, 0x1B7F9043, 0xCFB5F4A8, 0x3DDE77AB, 
0x2E8E845F, 0xDCE5075C,
+ 0x92A8FC17, 0x60C37F14, 0x73938CE0, 0x81F80FE3, 0x55326B08, 0xA759E80B, 
0xB4091BFF, 0x466298FC,
+ 0x1871A4D8, 0xEA1A27DB, 0xF94AD42F, 0x0B21572C, 0xDFEB33C7, 0x2D80B0C4, 
0x3ED04330, 0xCCBBC033,
+ 0xA24BB5A6, 0x502036A5, 0x4370C551, 0xB11B4652, 0x65D122B9, 0x97BAA1BA, 
0x84EA524E, 0x7681D14D,
+ 0x2892ED69, 0xDAF96E6A, 0xC9A99D9E, 0x3BC21E9D, 0xEF087A76, 0x1D63F975, 
0x0E330A81, 0xFC588982,
+ 0xB21572C9, 0x407EF1CA, 0x532E023E, 0xA145813D, 0x758FE5D6, 0x87E466D5, 
0x94B49521, 0x66DF1622,
+ 0x38CC2A06, 0xCAA7A905, 0xD9F75AF1, 0x2B9CD9F2, 0xFF56BD19, 0x0D3D3E1A, 
0x1E6DCDEE, 0xEC064EED,
+ 0xC38D26C4, 0x31E6A5C7, 0x22B65633, 0xD0DDD530, 0x0417B1DB, 0xF67C32D8, 
0xE52CC12C, 0x1747422F,
+ 0x49547E0B, 0xBB3FFD08, 0xA86F0EFC, 0x5A048DFF, 0x8ECEE914, 0x7CA56A17, 
0x6FF599E3, 0x9D9E1AE0,
+ 0xD3D3E1AB, 0x21B862A8, 0x32E8915C, 0xC083125F, 0x144976B4, 0xE622F5B7, 
0xF5720643, 0x07198540,
+ 0x590AB964, 0xAB613A67, 0xB831C993, 0x4A5A4A90, 0x9E902E7B, 0x6CFBAD78, 
0x7FAB5E8C, 0x8DC0DD8F,
+ 0xE330A81A, 0x115B2B19, 0x020BD8ED, 0xF0605BEE, 0x24AA3F05, 0xD6C1BC06, 
0xC5914FF2, 0x37FACCF1,
+ 0x69E9F0D5, 0x9B8273D6, 0x88D28022, 0x7AB90321, 0xAE7367CA, 0x5C18E4C9, 
0x4F48173D, 0xBD23943E,
+ 0xF36E6F75, 0x0105EC76, 0x12551F82, 0xE03E9C81, 0x34F4F86A, 0xC69F7B69, 
0xD5CF889D, 0x27A40B9E,
+ 0x79B737BA, 0x8BDCB4B9, 0x988C474D, 0x6AE7C44E, 0xBE2DA0A5, 0x4C4623A6, 
0x5F16D052, 0xAD7D5351
+},
+{
+ 0x00000000, 0x13A29877, 0x274530EE, 0x34E7A899, 0x4E8A61DC, 0x5D28F9AB, 
0x69CF5132, 0x7A6DC945,
+ 0x9D14C3B8, 0x8EB65BCF, 0xBA51F356, 0xA9F36B21, 0xD39EA264, 0xC03C3A13, 
0xF4DB928A, 0xE7790AFD,
+ 0x3FC5F181, 0x2C6769F6, 0x1880C16F, 0x0B225918, 0x714F905D, 0x62ED082A, 
0x560AA0B3, 0x45A838C4,
+ 0xA2D13239, 0xB173AA4E, 0x859402D7, 0x96369AA0, 0xEC5B53E5, 0xFFF9CB92, 
0xCB1E630B, 0xD8BCFB7C,
+ 0x7F8BE302, 0x6C297B75, 0x58CED3EC, 0x4B6C4B9B, 0x310182DE, 0x22A31AA9, 
0x1644B230, 0x05E62A47,
+ 0xE29F20BA, 0xF13DB8CD, 0xC5DA1054, 0xD6788823, 0xAC154166, 0xBFB7D911, 
0x8B507188, 0x98F2E9FF,
+ 0x404E1283, 0x53EC8AF4, 0x670B226D, 0x74A9BA1A, 0x0EC4735F, 0x1D66EB28, 
0x298143B1, 0x3A23DBC6,
+ 0xDD5AD13B, 0xCEF8494C, 0xFA1FE1D5, 0xE9BD79A2, 0x93D0B0E7, 0x80722890, 
0xB4958009, 0xA737187E,
+ 0xFF17C604, 0xECB55E73, 0xD852F6EA, 0xCBF06E9D, 0xB19DA7D8, 0xA23F3FAF, 
0x96D89736, 0x857A0F41,
+ 0x620305BC, 0x71A19DCB, 0x45463552, 0x56E4AD25, 0x2C896460, 0x3F2BFC17, 
0x0BCC548E, 0x186ECCF9,
+ 0xC0D23785, 0xD370AFF2, 0xE797076B, 0xF4359F1C, 0x8E585659, 0x9DFACE2E, 
0xA91D66B7, 0xBABFFEC0,
+ 0x5DC6F43D, 0x4E646C4A, 0x7A83C4D3, 0x69215CA4, 0x134C95E1, 0x00EE0D96, 
0x3409A50F, 0x27AB3D78,
+ 

[dpdk-dev] [PATCH v3 0/5] rte_hash_crc reworked to be platform-independent

2014-11-18 Thread Yerden Zhumabekov
This is a rework of my previous patches improving performance of rte_hash_crc. 
In addition, this revision brings a fallback mechanism to ensure that CRC32 
hash is calculated regardless of hardware support from CPU (i.e. SSE4.2 
intrinsics). Performance of software CRC32 implementation is also improved.

Summary of changes:
* added CRC32 software implementation, which is used as a fallback in case 
SSE4.2 is not available, or if SSE4.2 is intentionally disabled.
* added rte_hash_crc_set_alg() function to control availability of SSE4.2.
* added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand.
* reworked rte_hash_crc() function which leverages both versions of CRC32 hash 
calculation functions with 4 and 8-byte operands.
* removed compile-time checks from test_hash_perf and test_hash.
* the default algorithm is now selected by a constructor at application 
startup.

Patches were tested on machines both with and without SSE4.2 support. 
The software implementation seems to be about 4-5 times slower than the 
SSE4.2-enabled one. Of course, they return identical results.

Yerden Zhumabekov (5):
  hash: add software CRC32 implementation
  hash: add new rte_hash_crc_8byte call
  hash: add fallback to software CRC32 implementation
  hash: rte_hash_crc() slices data into 8-byte pieces
  test: remove redundant compile checks

 app/test/test_hash.c   |7 -
 app/test/test_hash_perf.c  |   11 --
 lib/librte_hash/rte_hash_crc.h |  427 +++-
 3 files changed, 417 insertions(+), 28 deletions(-)

-- 
1.7.9.5



[dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 implementation

2014-11-17 Thread Yerden Zhumabekov

17.11.2014 18:34, Ananyev, Konstantin wrote:
> Hi Yerden,
>
>> +static inline void
>> +rte_hash_crc_set_alg(enum crc32_alg_t alg)
>> +{
>> +int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
>> +enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
>> +
>> +if (alg == CRC32_SSE42)
>> +crc32_alg = alg_supp;
>> +else
>> +crc32_alg = CRC32_SW;
>> +}
>> +
> Wonder can we define that function with __attribute__((constructor))?
> Then, I suppose we can remove CRC32_AUTODETECT, and remove:
> if (unlikely(crc32_alg == CRC32_AUTODETECT))
>rte_hash_crc_set_alg(CRC32_SSE42);   
> from rte_hash_crc_*byte().
Nice feature  I was unfamiliar with :)
Since I'm going to revise the patch series anyway, I'll apply it and
test. Thank you.
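
A minimal sketch of the constructor idea suggested above, assuming GCC or Clang. All names here (crc32c_sw_u32, crc32c_hw_u32, cpu_has_sse42, crc32_init_alg, hash_crc_4byte) are hypothetical and are not the DPDK API; CPU detection uses the compiler builtin __builtin_cpu_supports() rather than rte_cpu_get_flag_enabled(), and the software path is a plain bitwise CRC32C rather than the table-driven one from the patch.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

enum crc32_alg { CRC32_SW = 0, CRC32_SSE42 };

/* Selected once at startup by the constructor below. */
static enum crc32_alg crc32_alg = CRC32_SW;

/* Bitwise CRC32C (reflected polynomial 0x82F63B78) over one 32-bit word.
 * XOR-ing the whole word and shifting 32 times is equivalent to feeding
 * the four bytes one by one, least significant byte first. */
static uint32_t crc32c_sw_u32(uint32_t data, uint32_t crc)
{
    crc ^= data;
    for (int i = 0; i < 32; i++)
        crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
    return crc;
}

#if HAVE_X86
/* The target attribute lets this one function use the intrinsic even
 * when the rest of the file is built without -msse4.2. */
__attribute__((target("sse4.2")))
static uint32_t crc32c_hw_u32(uint32_t data, uint32_t crc)
{
    return _mm_crc32_u32(crc, data);
}
#endif

/* Runtime CPU check; always false on non-x86 builds. */
static int cpu_has_sse42(void)
{
#if HAVE_X86
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") != 0;
#else
    return 0;
#endif
}

/* Request an algorithm; SSE4.2 is honoured only if the CPU supports it. */
static void crc32_set_alg(enum crc32_alg alg)
{
    crc32_alg = (alg == CRC32_SSE42 && cpu_has_sse42()) ? CRC32_SSE42
                                                        : CRC32_SW;
}

/* Runs before main(), so no autodetect check is needed per call. */
__attribute__((constructor))
static void crc32_init_alg(void)
{
    crc32_set_alg(CRC32_SSE42);
}

static uint32_t hash_crc_4byte(uint32_t data, uint32_t init_val)
{
#if HAVE_X86
    if (crc32_alg == CRC32_SSE42)
        return crc32c_hw_u32(data, init_val);
#endif
    return crc32c_sw_u32(data, init_val);
}
```

With the constructor in place, the hot path only tests crc32_alg once per call and no longer needs a separate "not yet detected" state.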

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH v2 0/4] rte_hash_crc reworked to be platform-independent

2014-11-17 Thread Yerden Zhumabekov

17.11.2014 17:31, Neil Horman wrote:
> On Sun, Nov 16, 2014 at 11:59:16PM +0600, Yerden Zhumabekov wrote:
>> This is a rework of my previous patches improving performance of 
>> rte_hash_crc. In addition, this revision brings a fallback mechanism to 
>> ensure that CRC32 hash is calculated regardless of hardware support from CPU 
>> (i.e. SSE4.2 intrinsics).
>>
>> Summary of changes:
>> * added CRC32 software implementation, which is used as a fallback in case 
>> SSE4.2 is not available, or if SSE4.2 is intentionally disabled.
>> * added rte_hash_crc_set_alg() function to control availability of SSE4.2.
>> * added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand.
>> * reworked rte_hash_crc() function which leverages both versions of CRC32 
>> hash calculation functions with 4 and 8-byte operands.
>>
>> Patches were tested on machines either with and without SSE4.2 support. 
>> Software implementation seems to be about 15 times slower than 
>> SSE4.2-enabled one. Of course, they return identical results.
>>
>> Yerden Zhumabekov (4):
>>   hash: add software CRC32 implementation
>>   hash: add new rte_hash_crc_8byte call
>>   hash: add fallback to software CRC32 implementation
>>   hash: rte_hash_crc() slices data into 8-byte pieces
>>
>>  lib/librte_hash/rte_hash_crc.h |  212 
>> ++--
>>  1 file changed, 202 insertions(+), 10 deletions(-)
>>
>> -- 
>> 1.7.9.5
>>
>>
> Functionally this all looks great, but I think you want to add a 5th patch to
> the series in which you remove the ifdef SSE4.2 bits from test_hash_perf, 
> since
> this makes rte_hash_crc usable in all cases.  Not sure if you would rather 
> just
> ditch rte_hash_jhash alltogether, or make testing it a command line runtime
> option

Meanwhile, I've borrowed some of Intel's BSD-licensed code for the software
CRC32 algorithm; it runs 4 times faster at the cost of 2K of memory for
additional lookup tables. I'd like to include it as well. As for
test_hash_perf, I'll look at it.
Should I just send the new series as 'v3'? Any approval/disapproval of
the current series?

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] [PATCH v2 4/4] hash: rte_hash_crc() slices data into 8-byte pieces

2014-11-16 Thread Yerden Zhumabekov
Calculating the hash for data of variable length is more efficient
when that data is sliced into 8-byte pieces. The remainder of the data
is hashed using CRC32 functions with either 8- or 4-byte operands.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   33 -
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 178b162..3d8dafe 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -241,7 +241,7 @@ rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 }

 /**
- * Use crc32 instruction to perform a hash.
+ * Calculate CRC32 hash on user-supplied byte array.
  *
  * @param data
  *   Data to perform hash on.
@@ -256,23 +256,38 @@ static inline uint32_t
 rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
 {
unsigned i;
-   uint32_t temp = 0;
-   const uint32_t *p32 = (const uint32_t *)data;
+   uint64_t temp = 0;
+   const uint64_t *p64 = (const uint64_t *)data;

-   for (i = 0; i < data_len / 4; i++) {
-   init_val = rte_hash_crc_4byte(*p32++, init_val);
+   for (i = 0; i < data_len / 8; i++) {
+   init_val = rte_hash_crc_8byte(*p64++, init_val);
}

-   switch (3 - (data_len & 0x03)) {
+   switch (7 - (data_len & 0x07)) {
case 0:
-   temp |= *((const uint8_t *)p32 + 2) << 16;
+   temp |= (uint64_t) *((const uint8_t *)p64 + 6) << 48;
/* Fallthrough */
case 1:
-   temp |= *((const uint8_t *)p32 + 1) << 8;
+   temp |= (uint64_t) *((const uint8_t *)p64 + 5) << 40;
/* Fallthrough */
case 2:
-   temp |= *((const uint8_t *)p32);
+   temp |= (uint64_t) *((const uint8_t *)p64 + 4) << 32;
+   temp |= *((const uint32_t *)p64);
+   init_val = rte_hash_crc_8byte(temp, init_val);
+   break;
+   case 3:
+   init_val = rte_hash_crc_4byte(*(const uint32_t *)p64, init_val);
+   break;
+   case 4:
+   temp |= *((const uint8_t *)p64 + 2) << 16;
+   /* Fallthrough */
+   case 5:
+   temp |= *((const uint8_t *)p64 + 1) << 8;
+   /* Fallthrough */
+   case 6:
+   temp |= *((const uint8_t *)p64);
init_val = rte_hash_crc_4byte(temp, init_val);
+   /* Fallthrough */
default:
break;
}
-- 
1.7.9.5



[dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 implementation

2014-11-16 Thread Yerden Zhumabekov
Initially, SSE4.2 support is detected via CPUID instruction.

Added rte_hash_crc_set_alg() function to detect and set CRC32
implementation if necessary. SSE4.2 is allowed by default. If it's
not available, fall back to sw implementation.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   60 ++--
 1 file changed, 58 insertions(+), 2 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 74e2d92..178b162 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -45,7 +45,11 @@ extern "C" {
 #endif

 #include 
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
 #include 
+#endif
+#include 
+#include 

 /* Lookup table for software implementation of CRC32C */
 static const uint32_t crc32c_table[256] = {
@@ -152,8 +156,42 @@ crc32c_2words(uint64_t data, uint32_t init_val)
return init_val;
 }

+enum crc32_alg_t {
+   CRC32_SW = 0,
+   CRC32_SSE42,
+   CRC32_AUTODETECT
+};
+
+/* Default algorithm is left for autodetection,
+ * it is detected on first run of hash function
+ */
+static enum crc32_alg_t crc32_alg = CRC32_AUTODETECT;
+
+/**
+ * Allow or disallow use of SSE4.2 instrinsics for CRC32 hash
+ * hash calculation.
+ *
+ * @param flag
+ *   unsigned integer flag
+ *   - (CRC32_SW) Don't use SSE4.2 intrinsics
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
+ */
+static inline void
+rte_hash_crc_set_alg(enum crc32_alg_t alg)
+{
+   int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
+   enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
+
+   if (alg == CRC32_SSE42)
+   crc32_alg = alg_supp;
+   else
+   crc32_alg = CRC32_SW;
+}
+
 /**
  * Use single crc32 instruction to perform a hash on a 4 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -165,11 +203,21 @@ crc32c_2words(uint64_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-   return _mm_crc32_u32(init_val, data);
+   if (unlikely(crc32_alg == CRC32_AUTODETECT))
+   rte_hash_crc_set_alg(CRC32_SSE42);
+
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+   if (likely(crc32_alg == CRC32_SSE42))
+   return _mm_crc32_u32(init_val, data);
+#endif
+
+   return crc32c_1word(data, init_val);
 }

 /**
  * Use single crc32 instruction to perform a hash on a 8 byte value.
+ * Fall back to software crc32 implementation in case SSE4.2 is
+ * not supported
  *
  * @param data
  *   Data to perform hash on.
@@ -181,7 +229,15 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-   return _mm_crc32_u64(init_val, data);
+   if (unlikely(crc32_alg == CRC32_AUTODETECT))
+   rte_hash_crc_set_alg(CRC32_SSE42);
+
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+   if (likely(crc32_alg == CRC32_SSE42))
+   return _mm_crc32_u64(init_val, data);
+#endif
+
+   return crc32c_2words(data, init_val);
 }

 /**
-- 
1.7.9.5



[dpdk-dev] [PATCH v2 2/4] hash: add new rte_hash_crc_8byte call

2014-11-16 Thread Yerden Zhumabekov
SSE4.2 provides _mm_crc32_u64 intrinsic with 8-byte operand.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   16 
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 3c368c5..74e2d92 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -169,6 +169,22 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 }

 /**
+ * Use single crc32 instruction to perform a hash on a 8 byte value.
+ *
+ * @param data
+ *   Data to perform hash on.
+ * @param init_val
+ *   Value to initialise hash generator.
+ * @return
+ *   32bit calculated hash value.
+ */
+static inline uint32_t
+rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
+{
+   return _mm_crc32_u64(init_val, data);
+}
+
+/**
  * Use crc32 instruction to perform a hash.
  *
  * @param data
-- 
1.7.9.5



[dpdk-dev] [PATCH v2 1/4] hash: add software CRC32 implementation

2014-11-16 Thread Yerden Zhumabekov
Add lookup table for CRC32 algorithm, crc32c_1word() and
crc32c_2words() functions returning hash of 32-bit and
64-bit operand.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |  105 
 1 file changed, 105 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index b48b0db..3c368c5 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -47,6 +47,111 @@ extern "C" {
 #include 
 #include 

+/* Lookup table for software implementation of CRC32C */
+static const uint32_t crc32c_table[256] = {
+   0x00000000L, 0xF26B8303L, 0xE13B70F7L, 0x1350F3F4L,
+   0xC79A971FL, 0x35F1141CL, 0x26A1E7E8L, 0xD4CA64EBL,
+   0x8AD958CFL, 0x78B2DBCCL, 0x6BE22838L, 0x9989AB3BL,
+   0x4D43CFD0L, 0xBF284CD3L, 0xAC78BF27L, 0x5E133C24L,
+   0x105EC76FL, 0xE235446CL, 0xF165B798L, 0x030E349BL,
+   0xD7C45070L, 0x25AFD373L, 0x36FF2087L, 0xC494A384L,
+   0x9A879FA0L, 0x68EC1CA3L, 0x7BBCEF57L, 0x89D76C54L,
+   0x5D1D08BFL, 0xAF768BBCL, 0xBC267848L, 0x4E4DFB4BL,
+   0x20BD8EDEL, 0xD2D60DDDL, 0xC186FE29L, 0x33ED7D2AL,
+   0xE72719C1L, 0x154C9AC2L, 0x061C6936L, 0xF477EA35L,
+   0xAA64D611L, 0x580F5512L, 0x4B5FA6E6L, 0xB93425E5L,
+   0x6DFE410EL, 0x9F95C20DL, 0x8CC531F9L, 0x7EAEB2FAL,
+   0x30E349B1L, 0xC288CAB2L, 0xD1D83946L, 0x23B3BA45L,
+   0xF779DEAEL, 0x05125DADL, 0x1642AE59L, 0xE4292D5AL,
+   0xBA3A117EL, 0x4851927DL, 0x5B016189L, 0xA96AE28AL,
+   0x7DA08661L, 0x8FCB0562L, 0x9C9BF696L, 0x6EF07595L,
+   0x417B1DBCL, 0xB3109EBFL, 0xA0406D4BL, 0x522BEE48L,
+   0x86E18AA3L, 0x748A09A0L, 0x67DAFA54L, 0x95B17957L,
+   0xCBA24573L, 0x39C9C670L, 0x2A993584L, 0xD8F2B687L,
+   0x0C38D26CL, 0xFE53516FL, 0xED03A29BL, 0x1F682198L,
+   0x5125DAD3L, 0xA34E59D0L, 0xB01EAA24L, 0x42752927L,
+   0x96BF4DCCL, 0x64D4CECFL, 0x77843D3BL, 0x85EFBE38L,
+   0xDBFC821CL, 0x2997011FL, 0x3AC7F2EBL, 0xC8AC71E8L,
+   0x1C661503L, 0xEE0D9600L, 0xFD5D65F4L, 0x0F36E6F7L,
+   0x61C69362L, 0x93AD1061L, 0x80FDE395L, 0x72966096L,
+   0xA65C047DL, 0x5437877EL, 0x4767748AL, 0xB50CF789L,
+   0xEB1FCBADL, 0x197448AEL, 0x0A24BB5AL, 0xF84F3859L,
+   0x2C855CB2L, 0xDEEEDFB1L, 0xCDBE2C45L, 0x3FD5AF46L,
+   0x7198540DL, 0x83F3D70EL, 0x90A324FAL, 0x62C8A7F9L,
+   0xB602C312L, 0x44694011L, 0x5739B3E5L, 0xA55230E6L,
+   0xFB410CC2L, 0x092A8FC1L, 0x1A7A7C35L, 0xE811FF36L,
+   0x3CDB9BDDL, 0xCEB018DEL, 0xDDE0EB2AL, 0x2F8B6829L,
+   0x82F63B78L, 0x709DB87BL, 0x63CD4B8FL, 0x91A6C88CL,
+   0x456CAC67L, 0xB7072F64L, 0xA457DC90L, 0x563C5F93L,
+   0x082F63B7L, 0xFA44E0B4L, 0xE9141340L, 0x1B7F9043L,
+   0xCFB5F4A8L, 0x3DDE77ABL, 0x2E8E845FL, 0xDCE5075CL,
+   0x92A8FC17L, 0x60C37F14L, 0x73938CE0L, 0x81F80FE3L,
+   0x55326B08L, 0xA759E80BL, 0xB4091BFFL, 0x466298FCL,
+   0x1871A4D8L, 0xEA1A27DBL, 0xF94AD42FL, 0x0B21572CL,
+   0xDFEB33C7L, 0x2D80B0C4L, 0x3ED04330L, 0xCCBBC033L,
+   0xA24BB5A6L, 0x502036A5L, 0x4370C551L, 0xB11B4652L,
+   0x65D122B9L, 0x97BAA1BAL, 0x84EA524EL, 0x7681D14DL,
+   0x2892ED69L, 0xDAF96E6AL, 0xC9A99D9EL, 0x3BC21E9DL,
+   0xEF087A76L, 0x1D63F975L, 0x0E330A81L, 0xFC588982L,
+   0xB21572C9L, 0x407EF1CAL, 0x532E023EL, 0xA145813DL,
+   0x758FE5D6L, 0x87E466D5L, 0x94B49521L, 0x66DF1622L,
+   0x38CC2A06L, 0xCAA7A905L, 0xD9F75AF1L, 0x2B9CD9F2L,
+   0xFF56BD19L, 0x0D3D3E1AL, 0x1E6DCDEEL, 0xEC064EEDL,
+   0xC38D26C4L, 0x31E6A5C7L, 0x22B65633L, 0xD0DDD530L,
+   0x0417B1DBL, 0xF67C32D8L, 0xE52CC12CL, 0x1747422FL,
+   0x49547E0BL, 0xBB3FFD08L, 0xA86F0EFCL, 0x5A048DFFL,
+   0x8ECEE914L, 0x7CA56A17L, 0x6FF599E3L, 0x9D9E1AE0L,
+   0xD3D3E1ABL, 0x21B862A8L, 0x32E8915CL, 0xC083125FL,
+   0x144976B4L, 0xE622F5B7L, 0xF5720643L, 0x07198540L,
+   0x590AB964L, 0xAB613A67L, 0xB831C993L, 0x4A5A4A90L,
+   0x9E902E7BL, 0x6CFBAD78L, 0x7FAB5E8CL, 0x8DC0DD8FL,
+   0xE330A81AL, 0x115B2B19L, 0x020BD8EDL, 0xF0605BEEL,
+   0x24AA3F05L, 0xD6C1BC06L, 0xC5914FF2L, 0x37FACCF1L,
+   0x69E9F0D5L, 0x9B8273D6L, 0x88D28022L, 0x7AB90321L,
+   0xAE7367CAL, 0x5C18E4C9L, 0x4F48173DL, 0xBD23943EL,
+   0xF36E6F75L, 0x0105EC76L, 0x12551F82L, 0xE03E9C81L,
+   0x34F4F86AL, 0xC69F7B69L, 0xD5CF889DL, 0x27A40B9EL,
+   0x79B737BAL, 0x8BDCB4B9L, 0x988C474DL, 0x6AE7C44EL,
+   0xBE2DA0A5L, 0x4C4623A6L, 0x5F16D052L, 0xAD7D5351L
+};
+
+#define CRC32C_UPD(crc, byte) \
+   (crc = crc32c_table[((crc) ^ (byte)) & 0xFFL] ^ ((crc) >> 8))
+
+static inline uint32_t
+crc32c_1word(uint32_t data, uint32_t init_val)
+{
+   union {
+   uint32_t u32;
+   uint8_t u8[4];
+   } d;
+   d.u32 = data;
+   CRC32C_UPD(init_val, d.u8[0]);
+   CRC32C_UPD(init_val, d.u8[1]);
+   CRC32C_UPD(init_val, d.u8[2]);
+   CRC32C_UPD(init_val, d.u8[3]);
+   return init_val;
+}
+
+static inline uint32_t
+crc32c_
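
For reference, the 256-entry table in the patch above corresponds to the reflected CRC32C (Castagnoli) polynomial 0x82F63B78 and can be regenerated rather than typed in. The sketch below assumes nothing beyond standard C; crc32c_make_table and crc32c_upd are hypothetical helper names, the latter mirroring what the CRC32C_UPD macro does for one byte.

```c
#include <stdint.h>

/* Fill `table` with the byte-wise CRC32C lookup values for the
 * reflected polynomial 0x82F63B78. */
static void crc32c_make_table(uint32_t table[256])
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0x82F63B78u : c >> 1;
        table[n] = c;
    }
}

/* One-byte table-driven update, same arithmetic as CRC32C_UPD. */
static uint32_t crc32c_upd(const uint32_t table[256], uint32_t crc,
                           uint8_t byte)
{
    return table[(crc ^ byte) & 0xFF] ^ (crc >> 8);
}
```

Generating the table at startup trades a small one-time cost for a shorter header; a static const table, as in the patch, avoids any initialization ordering concerns.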

[dpdk-dev] [PATCH v2 0/4] rte_hash_crc reworked to be platform-independent

2014-11-16 Thread Yerden Zhumabekov
This is a rework of my previous patches improving performance of rte_hash_crc. 
In addition, this revision brings a fallback mechanism to ensure that CRC32 
hash is calculated regardless of hardware support from CPU (i.e. SSE4.2 
intrinsics).

Summary of changes:
* added CRC32 software implementation, which is used as a fallback in case 
SSE4.2 is not available, or if SSE4.2 is intentionally disabled.
* added rte_hash_crc_set_alg() function to control availability of SSE4.2.
* added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand.
* reworked rte_hash_crc() function which leverages both versions of CRC32 hash 
calculation functions with 4 and 8-byte operands.

Patches were tested on machines both with and without SSE4.2 support.
The software implementation seems to be about 15 times slower than the
SSE4.2-enabled one. Of course, they return identical results.

Yerden Zhumabekov (4):
  hash: add software CRC32 implementation
  hash: add new rte_hash_crc_8byte call
  hash: add fallback to software CRC32 implementation
  hash: rte_hash_crc() slices data into 8-byte pieces

 lib/librte_hash/rte_hash_crc.h |  212 ++--
 1 file changed, 202 insertions(+), 10 deletions(-)

-- 
1.7.9.5



[dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call

2014-11-16 Thread Yerden Zhumabekov

15.11.2014 0:41, Neil Horman wrote:
> On Fri, Nov 14, 2014 at 10:43:39PM +0600, Yerden Zhumabekov wrote:
>> 14.11.2014 19:53, Neil Horman wrote:
>>>
>>> Well, its possible you'll get lucky.  crc is such a common operation, its
>>> entirely possible that the gcc intrinsic emits software based crc 
>>> computation if
>>> the SSE4.2 instructions aren't enabled.  I recommend modifying the 
>>> test_hash_crc
>>> function to use rte_hash_crc with SSE4.2 disabled, and see if you get a 
>>> crash.
>>> If you don't examine the disassembly of your new function and confirm that
>>> something reasonable that doesn't use SSE4.2 is emitted.  If thats the case,
>>> your patch is fine, and we can focus on how to change the ifdefs in the 
>>> existing
>>> code, as use of the rte_hash_crc functions should be safe.
>>>
>> Unfortunately, it seems not to be the case. Trying to force compiling a
>> test program with _mm_crc32_u32 intrinsic on computer with no SSE4.2
>> support leads to "Illegal instruction error". So it looks like GCC does
>> not fall back to crc32 software implementation.
>>
> Ok, but crc32 is pretty easy to implement in software.  Just appropriate the
> calculate_crc32c function from the BSD or Linux kernels and if
> (unlikely(!support_sse42)) calculate_crc32 operation.
>

I've almost finished reworking the patches, but there's one more issue I was
wondering about.

If we use a flag (say, 'sse42_flag') to determine the code path, where
should it be defined?
Should there be some sort of rte_hash_crc_init() call in the init stage of
the application?

Alternatively, I could have implemented it like this:


static uint8_t sse42_flag = FLAG_UNKNOWN;

rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
{
    if (unlikely(sse42_flag == FLAG_UNKNOWN))
        sse42_flag = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2) ?
            FLAG_SUPPORTED : FLAG_NOTSUPPORTED;

    if (likely(sse42_flag == FLAG_SUPPORTED))
        return _mm_crc32_u32(init_val, data);
    ...
}



-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] Load-balancing position field in DPDK load_balancer sample app vs. Hash table

2014-11-15 Thread Yerden Zhumabekov
Hello Matt,

You can specify RSS configuration through rte_eth_dev_configure()
function supplied with this structure:

struct rte_eth_conf port_conf = {
    .rxmode = {
        .mq_mode = ETH_MQ_RX_RSS,
        ...
    },
    .rx_adv_conf = {
        .rss_conf = {
            .rss_key = NULL,
            .rss_hf = ETH_RSS_IPV4 | ETH_RSS_IPV6,
        },
    },
    ...
};

In this case, the RSS hash is calculated over IP addresses only, using the
default RSS key. Look at lib/librte_ether/rte_ethdev.h for other
definitions.


15.11.2014 0:49, Matt Laswell wrote:
> Hey Folks,
>
> This thread has been tremendously helpful, as I'm looking at adding
> RSS-based load balancing to my application in the not too distant
> future.  Many thanks to all who have contributed, especially regarding
> symmetric RSS.
>
> Not to derail the conversation too badly, but could one of you point
> me to some example code that demonstrates the steps needed to
> configure RSS?  We're using Niantic NICs, so I assume that this is
> pretty standard stuff, but having an example to study is a real leg up.
>
> Again, thanks for all of the information.
>
> --
> Matt Laswell
> laswell at infiniteio.com
> infinite io, inc.
>
> On Fri, Nov 14, 2014 at 10:57 AM, Chilikin, Andrey
> <andrey.chilikin at intel.com> wrote:
>
> Fortville supports symmetrical hashing on HW level, a patch for
> i40e PMD was submitted a couple of weeks ago. For Niantic you can
> use symmetrical  rss key recommended by Konstantin.
>
> Regards,
> Andrey
>
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Friday, November 14, 2014 4:50 PM
> To: Yerden Zhumabekov; Kamraan Nasim; dev at dpdk.org
> Cc: Yuanzhang Hu
> Subject: Re: [dpdk-dev] Load-balancing position field in DPDK
> load_balancer sample app vs. Hash table
>
> > -----Original Message-----
> > From: Yerden Zhumabekov [mailto:e_zhumabekov at sts.kz]
> > Sent: Friday, November 14, 2014 4:23 PM
> > To: Ananyev, Konstantin; Kamraan Nasim; dev at dpdk.org
> > Cc: Yuanzhang Hu
> > Subject: Re: [dpdk-dev] Load-balancing position field in DPDK
> > load_balancer sample app vs. Hash table
> >
> > I'd like to interject a question here.
> >
> > In case of flow classification, one might possibly prefer for
> packets
> > from the same flow to fall on the same logical core. With this '%'
> > load balancing, it would require to get the same RSS hash value for
> > packets with direct (src to dst) and swapped (dst to src) IPs and
> > ports. Am I correct that hardware RSS calculation cannot provide
> this symmetry?
>
> As I remember, it is possible but you have to tweak rss key values.
> Here is a paper describing how to do that:
> http://www.ndsl.kaist.edu/~shinae/papers/TR-symRSS.pdf
>
> Konstantin
>
> >
> > 14.11.2014 20:44, Ananyev, Konstantin wrote:
> > > If you have a NIC that is capable to do HW hash computation, then
> > > you can do your load balancing based on that value.
> > > Let say ixgbe/igb/i40e NICs can calculate RSS hash value based on
> > > different combinations of dst/src Ips, dst/src ports.
> > > This value can be stored inside mbuf for each RX packet by PMD
> RX function.
> > > Then you can do:
> > > worker_id = mbuf->hash.rss % n_workers;
> > >
> > > That might to provide better balancing then using just one byte
> > > value, plus should be a bit faster, as in that case your
> balancer code don't need to touch packet's data.
> > >
> > > Konstantin
> >
> > --
> > Sincerely,
> >
> > Yerden Zhumabekov
> > State Technical Service
> > Astana, KZ
> >
>
>

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] Load-balancing position field in DPDK load_balancer sample app vs. Hash table

2014-11-14 Thread Yerden Zhumabekov
Thank you. And one more thing, does Fortville (or Niantic) support
various L2 headers when calculating RSS hash? I mean MPLS, QinQ, etc.?

14.11.2014 22:57, Chilikin, Andrey wrote:
> Fortville supports symmetrical hashing on HW level, a patch for i40e PMD was 
> submitted a couple of weeks ago. For Niantic you can use symmetrical  rss key 
> recommended by Konstantin.
>
> Regards,
> Andrey
>
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Friday, November 14, 2014 4:50 PM
> To: Yerden Zhumabekov; Kamraan Nasim; dev at dpdk.org
> Cc: Yuanzhang Hu
> Subject: Re: [dpdk-dev] Load-balancing position field in DPDK load_balancer 
> sample app vs. Hash table
>
>> -----Original Message-----
>> From: Yerden Zhumabekov [mailto:e_zhumabekov at sts.kz]
>> Sent: Friday, November 14, 2014 4:23 PM
>> To: Ananyev, Konstantin; Kamraan Nasim; dev at dpdk.org
>> Cc: Yuanzhang Hu
>> Subject: Re: [dpdk-dev] Load-balancing position field in DPDK 
>> load_balancer sample app vs. Hash table
>>
>> I'd like to interject a question here.
>>
>> In case of flow classification, one might possibly prefer for packets 
>> from the same flow to fall on the same logical core. With this '%' 
>> load balancing, it would require to get the same RSS hash value for 
>> packets with direct (src to dst) and swapped (dst to src) IPs and 
>> ports. Am I correct that hardware RSS calculation cannot provide this 
>> symmetry?
> As I remember, it is possible but you have to tweak rss key values.
> Here is a paper describing how to do that:
> http://www.ndsl.kaist.edu/~shinae/papers/TR-symRSS.pdf
>
> Konstantin
>

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] Load-balancing position field in DPDK load_balancer sample app vs. Hash table

2014-11-14 Thread Yerden Zhumabekov

14.11.2014 22:50, Ananyev, Konstantin wrote:
>> -----Original Message-----
>> From: Yerden Zhumabekov [mailto:e_zhumabekov at sts.kz]
>> Sent: Friday, November 14, 2014 4:23 PM
>> To: Ananyev, Konstantin; Kamraan Nasim; dev at dpdk.org
>> Cc: Yuanzhang Hu
>> Subject: Re: [dpdk-dev] Load-balancing position field in DPDK load_balancer 
>> sample app vs. Hash table
>>
>> I'd like to interject a question here.
>>
>> In case of flow classification, one might possibly prefer for packets
>> from the same flow to fall on the same logical core. With this '%' load
>> balancing, it would require to get the same RSS hash value for packets
>> with direct (src to dst) and swapped (dst to src) IPs and ports. Am I
>> correct that hardware RSS calculation cannot provide this symmetry?
> As I remember, it is possible but you have to tweak rss key values.
> Here is a paper describing how to do that:
> http://www.ndsl.kaist.edu/~shinae/papers/TR-symRSS.pdf

Oh, very interesting paper. Thank you for hinting. Need to give it a go.
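
A small software model may help when experimenting with the key tweak discussed above. The sketch below assumes the NIC uses the standard Toeplitz RSS hash over the tuple (src IP, dst IP, src port, dst port); toeplitz_hash and rss_fill_periodic_key are hypothetical helper names, not DPDK functions. Any key that repeats with a 16-bit period makes the hash invariant under swapping the two addresses and the two ports; the 0x6D, 0x5A byte pair used in the check below is only an illustrative choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Software Toeplitz hash: for every set input bit (MSB first), XOR in
 * the 32-bit window of the key that starts at that bit position.
 * Requires key_len >= 4; key bits past the end are treated as zero. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t data_len)
{
    uint32_t hash = 0;
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (size_t i = 0; i < data_len; i++) {
        /* Key byte that slides into the window while consuming data[i]. */
        uint8_t next = (i + 4 < key_len) ? key[i + 4] : 0;

        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            window = (window << 1) | ((uint32_t)(next >> bit) & 1u);
        }
    }
    return hash;
}

/* Fill a key with a repeating two-byte pattern (16-bit period). */
static void rss_fill_periodic_key(uint8_t *key, size_t key_len,
                                  uint8_t hi, uint8_t lo)
{
    for (size_t i = 0; i < key_len; i++)
        key[i] = (i % 2 == 0) ? hi : lo;
}
```

The symmetry follows from the period: swapping the IPv4 addresses moves each bit by 32 positions and swapping the ports moves each bit by 16, so every bit still selects the same key window.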

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call

2014-11-14 Thread Yerden Zhumabekov

14.11.2014 19:53, Neil Horman wrote:
> On Fri, Nov 14, 2014 at 05:57:51PM +0600, Yerden Zhumabekov wrote:
>> 14.11.2014 17:33, Neil Horman wrote:
>>> Not really.  That covers the case of applications selecting the hash 
>>> function
>>> using the DEFAULT_HASH_FUNC macro, but does nothing for applications 
>>> using
>>> the function directly.  Test_hash_perf is an example  of this, and 
>>> ostensibly
>>> because of the behavior without SSE4.2 it defines these huge test tables 
>>> twice
>>> based on the availability of SSE4.2.  It would be better if we could allow
>>> applications to use rte_hash_crc regardless, and make the code it uses at 
>>> run
>>> time configurable.
>> I see, then we have a problem here :)
>>
>> Actually, that was one of my concerns when developing these patches. I
>> looked through the source code of libs and examples and I saw the
>> '#ifdef..#include..#endif'-like approach while selecting hash function
>> was common. So I organized patches to minimize the impact on API and not
>> to contradict this approach.
>>
> Thats a reasonable approach, but I really hate the idea of continuing this 
> need
> to select cpu features at compile time if its not nececcesary.
>
>> If we prefer to change this approach then, I guess, we need to introduce
>> broader changes to rte_hash library and change other code which uses it.
>> If that's what's needed, then it'll take some time for me to rework
>> these patches.
>>
> Well, its possible you'll get lucky.  crc is such a common operation, its
> entirely possible that the gcc intrinsic emits software based crc computation 
> if
> the SSE4.2 instructions aren't enabled.  I recommend modifying the 
> test_hash_crc
> function to use rte_hash_crc with SSE4.2 disabled, and see if you get a crash.
> If you don't examine the disassembly of your new function and confirm that
> something reasonable that doesn't use SSE4.2 is emitted.  If thats the case,
> your patch is fine, and we can focus on how to change the ifdefs in the 
> existing
> code, as use of the rte_hash_crc functions should be safe.
>

Unfortunately, that does not seem to be the case. Forcing compilation of a
test program that uses the _mm_crc32_u32 intrinsic and running it on a
computer without SSE4.2 support leads to an "Illegal instruction" error. So
it looks like GCC does not fall back to a software crc32 implementation.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] Load-balancing position field in DPDK load_balancer sample app vs. Hash table

2014-11-14 Thread Yerden Zhumabekov
I'd like to interject a question here.

In case of flow classification, one might possibly prefer for packets
from the same flow to fall on the same logical core. With this '%' load
balancing, it would require to get the same RSS hash value for packets
with direct (src to dst) and swapped (dst to src) IPs and ports. Am I
correct that hardware RSS calculation cannot provide this symmetry?

14.11.2014 20:44, Ananyev, Konstantin wrote:
> If you have a NIC that is capable to do HW hash computation,
> then you can do your load balancing based on that value.
> Let say ixgbe/igb/i40e NICs can calculate RSS hash value based on different 
> combinations of 
> dst/src Ips, dst/src ports.
> This value can be stored inside mbuf for each RX packet by PMD RX function.
> Then you can do:
> worker_id = mbuf->hash.rss % n_workers;
>
> That might to provide better balancing then using just one byte value,
> plus should be a bit faster, as in that case your balancer code don't need to 
> touch packet's data.   
>
> Konstantin

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call

2014-11-14 Thread Yerden Zhumabekov

14.11.2014 17:33, Neil Horman wrote:
> On Fri, Nov 14, 2014 at 01:15:12PM +0600, Yerden Zhumabekov wrote:
>>
>> Hello,
>>
>> A quick grep on dpdk source shows that rte_hash_crc() is used in
>> librte_hash in following context:
>>
>> In rte_hash.c:
>> /* Hash function used if none is specified */
>> #ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>> #include 
>> #define DEFAULT_HASH_FUNC   rte_hash_crc
>> #else
>> #include 
>> #define DEFAULT_HASH_FUNC   rte_jhash
>> #endif
>>
>> In rte_fbk_hash.h
>> #ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>> #include 
>> /** Default four-byte key hash function if none is specified. */
>> #define RTE_FBK_HASH_FUNC_DEFAULT   rte_hash_crc_4byte
>> #else
>> #include 
>> #define RTE_FBK_HASH_FUNC_DEFAULT   rte_jhash_1word
>> #endif
>> #endif
>>
>>
>> I guess it covers the cpu flags check you're talking about.
>>
> Not really.  That covers the case of applications selecting the hash function
> using the DEFAULT_HASH_FUNC macro, but does nothing for applications using
> the function directly.  Test_hash_perf is an example  of this, and ostensibly
> because of the behavior without SSE4.2 it defines these huge test tables twice
> based on the availability of SSE4.2.  It would be better if we could allow
> applications to use rte_hash_crc regardless, and make the code it uses at run
> time configurable.

I see, then we have a problem here :)

Actually, that was one of my concerns when developing these patches. I
looked through the source code of the libs and examples and saw that the
'#ifdef..#include..#endif'-like approach to selecting the hash function
is common. So I organized the patches to minimize the impact on the API and
not to contradict this approach.

If we prefer to change this approach then, I guess, we need to introduce
broader changes to rte_hash library and change other code which uses it.
If that's what's needed, then it'll take some time for me to rework
these patches.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call

2014-11-14 Thread Yerden Zhumabekov
14.11.2014 6:52, Neil Horman wrote:
> On Thu, Nov 13, 2014 at 06:33:14PM +0100, Thomas Monjalon wrote:
>> Any comment on these patches?
>>
>> 2014-09-03 12:05, Yerden Zhumabekov:
>>> As SSE4.2 provides CRC32 instructions with either 32 and 64 bit operands,
>>> new rte_hash_crc_8byte() call assisted with _mm_crc32_u64 intrinsic may be
>>> useful.
>>>
>>> ...  ...
>>
> Yeah, sorry I didn't speak up earlier.  I meant to ask if the _mm_crc32_u64
> intrinsic will emit software emulated versions of the sse4.2 instruction in 
> the
> event that you build with a config that doesn't enable sse4.2?  If not, then
> NAK, since this will break on the default build.  In that event you'll have to
> modify the new function to do a runtime cpu flags check to either just use the
> instruction inlined with some asm, or emulate it in software.

Hello,

A quick grep on dpdk source shows that rte_hash_crc() is used in
librte_hash in following context:

In rte_hash.c:
/* Hash function used if none is specified */
#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
#include 
#define DEFAULT_HASH_FUNC   rte_hash_crc
#else
#include 
#define DEFAULT_HASH_FUNC   rte_jhash
#endif

In rte_fbk_hash.h
#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
#include 
/** Default four-byte key hash function if none is specified. */
#define RTE_FBK_HASH_FUNC_DEFAULT   rte_hash_crc_4byte
#else
#include 
#define RTE_FBK_HASH_FUNC_DEFAULT   rte_jhash_1word
#endif
#endif


I guess it covers the cpu flags check you're talking about.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] l2fwd does not send packets

2014-09-11 Thread Yerden Zhumabekov
Hi,

To make l2fwd act like an L2 bridge, I altered the l2fwd_simple_forward()
function as follows:

static void
l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
{
    unsigned dst_port;

    /* look up the paired port and send the frame there */
    dst_port = l2fwd_dst_ports[portid];
    l2fwd_send_packet(m, (uint8_t) dst_port);
}

11.09.2014 3:18, Xin Li wrote:
> Hi,
>
> The l2fwd sample application in my environment does not send packets
> through the TX port. I run DPDK inside a KVM guest. The NIC ports are VFs
> assigned to the VM by pci passthrough.
>
> Environment:
>
> Host OS: ubuntu 14.04 x86_64
> NIC: intel x540-t1
> Guest OS: ubuntu 14.04 x86_64
> DPDK: v1.7.0
>
> Some findings:
>
> 1. l2fwd reports 511 packets sent when max tx descriptor is 512. The number
> changes to 1023 if the max tx descriptor is set to 1024.
>
> 2. On the receiver side, no packet captured.
>
> Anyone know the issue and the corresponding fix? Thanks.
>
> Best,
> Xin

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH 03/13] mbuf: add packet_type field

2014-09-08 Thread Yerden Zhumabekov
I would use it :)
It's useful for storing the IP protocol number (UDP, TCP, etc.) and the IP
version (4, 6), and then relaying the packet to a specific handler.
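A rough sketch of that idea, assuming a made-up encoding of the 16-bit field (IP version in the high byte, IP protocol number in the low byte). The layout and every demo_* name are illustrative assumptions, not something the patch defines.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_IPPROTO_TCP 6	/* IANA protocol numbers */
#define DEMO_IPPROTO_UDP 17

enum demo_handler {
	DEMO_H_TCP4, DEMO_H_UDP4, DEMO_H_TCP6, DEMO_H_UDP6, DEMO_H_OTHER
};

/* Hypothetical layout: version in bits 15..8, protocol in bits 7..0. */
static inline uint16_t
demo_ptype_pack(uint8_t ip_version, uint8_t ip_proto)
{
	assert(ip_version == 4 || ip_version == 6);
	return (uint16_t)((ip_version << 8) | ip_proto);
}

/* Relay decision based only on the stored packet type. */
static inline enum demo_handler
demo_ptype_dispatch(uint16_t ptype)
{
	uint8_t version = (uint8_t)(ptype >> 8);
	uint8_t proto = (uint8_t)(ptype & 0xff);

	if (version == 4 && proto == DEMO_IPPROTO_TCP)
		return DEMO_H_TCP4;
	if (version == 4 && proto == DEMO_IPPROTO_UDP)
		return DEMO_H_UDP4;
	if (version == 6 && proto == DEMO_IPPROTO_TCP)
		return DEMO_H_TCP6;
	if (version == 6 && proto == DEMO_IPPROTO_UDP)
		return DEMO_H_UDP6;
	return DEMO_H_OTHER;
}
```

The classifier would fill the field once on RX, and later stages would only read it.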

08.09.2014 16:17, Olivier MATZ wrote:
> Hi Bruce,
>
> On 09/03/2014 05:49 PM, Bruce Richardson wrote:
>> Replace a reserved slot with the new packet type field used to identify
>> the type of the packet, i.e. what protocols are used.
>>
>> Signed-off-by: Bruce Richardson 
>> ---
>>  lib/librte_mbuf/rte_mbuf.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>> index f136d37..8d0c6fb 100644
>> --- a/lib/librte_mbuf/rte_mbuf.h
>> +++ b/lib/librte_mbuf/rte_mbuf.h
>> @@ -146,7 +146,7 @@ struct rte_mbuf {
>>  uint32_t reserved1; /**< Unused field. Required for padding */
>>  
>>  /* remaining bytes are set on RX when pulling packet from descriptor */
>> -uint16_t reserved2; /**< Unused field. Required for padding */
>> +uint16_t packet_type;   /**< Type of packet, e.g. protocols used */
>>  uint16_t data_len;  /**< Amount of data in segment buffer. */
>>  uint32_t pkt_len;   /**< Total pkt len: sum of all segments. */
>>  uint16_t l3_len:9;  /**< L3 (IP) Header Length. */
>>
> This patch adds a new field that nobody uses. So why should we add it?

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH 11/13] mbuf: move l2_len and l3_len to second cache line

2014-09-04 Thread Yerden Zhumabekov
I get your point. I've also read through the code of various PMDs and
found no indication of the l2_len/l3_len fields being set either.

As for testing, we'd be happy to test the patchset, but we are still in the
process of building our testing facilities, so we are not yet ready to
provide enough workload for the hardware/software. I was also wondering
if anyone has run some tests and can provide some numbers on that matter.

Personally, I don't think the frag/reassembly app is a perfect example for
evaluating the 2nd cache line performance penalty. The offsets to the L3 and
L4 headers need to be calculated for all TCP/IP traffic, and fragmented
traffic is not representative in this case. Maybe it would be better to
write an app which calculates these offsets for different sets of mbufs
and provides some stats. For example, l2fwd/l3fwd + additional l2_len
and l3_len calculation.
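The per-packet work such a test app would add can be sketched roughly as follows, for Ethernet with optional VLAN tags carrying IPv4. It parses a raw byte buffer rather than an rte_mbuf, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_offsets {
	uint16_t l2_len;	/* bytes before the IPv4 header */
	uint16_t l3_len;	/* IPv4 header length in bytes */
};

/* Read a 16-bit big-endian (network order) value. */
static inline uint16_t
demo_rd16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the frame is truncated or not IPv4. */
static inline int
demo_parse_ipv4(const uint8_t *pkt, size_t len, struct demo_offsets *out)
{
	size_t off = 12;	/* skip destination and source MAC */
	uint16_t etype;
	uint8_t ihl;

	assert(pkt != NULL && out != NULL);
	if (len < 14)
		return -1;
	etype = demo_rd16(pkt + off);
	off += 2;
	/* Skip 802.1Q / 802.1ad tags: 2 bytes TCI + 2 bytes inner type. */
	while (etype == 0x8100 || etype == 0x88A8) {
		if (len < off + 4)
			return -1;
		etype = demo_rd16(pkt + off + 2);
		off += 4;
	}
	if (etype != 0x0800 || len < off + 20)
		return -1;
	if ((pkt[off] >> 4) != 4)
		return -1;
	ihl = pkt[off] & 0x0f;	/* header length in 32-bit words */
	if (ihl < 5 || len < off + (size_t)ihl * 4)
		return -1;
	out->l2_len = (uint16_t)off;
	out->l3_len = (uint16_t)(ihl * 4);
	return 0;
}
```

A benchmark could run this over every received packet and compare throughput with the fields on cache line 0 and on cache line 1.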

And I'm also figuring out how to rewrite our app/libs (prefetch etc) to
reflect the future changes in mbuf, hence my concerns :)


04.09.2014 16:27, Bruce Richardson wrote:
> Hi Yerden,
>
> I understand your concerns and it's good to have this discussion.
>
> There are a number of reasons why I've moved these particular fields
> to the second cache line. Firstly, the main reason is that, obviously enough,
> not all fields will fit in cache line 0, and we need to prioritize what does
> get stored there. The guiding principle behind what fields get moved or not
> that I've chosen to use for this patch set is to move fields that are not
> used on the receive path (or the fastpath receive path, more specifically -
> so that we can move fields only used by jumbo frames that span mbufs) to the
> second cache line. From a search through the existing codebase, there are no
> drivers which set the l2/l3 length fields on RX, this is only used in
> reassembly libraries/apps and by the drivers on TX.
>
> The other reason for moving it to the second cache line is that it logically
> belongs with all the other length fields that we need to add to enable
> tunneling support. [To get an idea of the extra fields that I propose adding
> to the mbuf, please see the RFC patchset I sent out previously as "[RFC
> PATCH 00/14] Extend the mbuf structure"]. While we probably can fit the
> 16 bits needed for l2/l3 length on the mbuf line 0, there is not enough room
> for all the lengths so we would end up splitting them with other fields in
> between.
>
> So, in terms of what to do about this particular issue. I would hope that for
> applications that use these fields the impact should be small and/or possible
> to work around e.g. maybe prefetch second cache line on RX in driver. If not,
> then I'm happy to see about withdrawing this particular change and seeing if
> we can keep l2/l3 lengths on cache line zero, with other length fields being
> on cache line 1.
>
> Question: would you consider the ip fragmentation and reassembly example apps
> in the Intel DPDK releases good examples to test to see the impacts of this
> change, or is there some other test you would prefer that I look to do? 
> Can you perhaps test out the patch sets for the mbuf that I've upstreamed so
> far and let me know what regressions, if any, you see in your use-case
> scenarios?
>
> Regards,
> /Bruce
>
-- 
Sincerely,

Yerden Zhumabekov
STS, ACI
Astana, KZ



[dpdk-dev] [PATCH 11/13] mbuf: move l2_len and l3_len to second cache line

2014-09-04 Thread Yerden Zhumabekov
Hello Bruce,

I'm a little bit concerned about the performance issues that would arise if
these fields went to the 2nd cache line.

For example, the l2_len and l3_len fields are used by librte_ip_frag to find
the positions of the L3 and L4 headers inside the mbuf data. Thus, these values
should be calculated by NIC offload, or by the user on the RX leg.

Secondly, (I wouldn't speak on behalf of everyone, but) we use these
fields in our libraries as well, for classification purposes. For
instance, when you need to support ethertypes which are not
handled by NIC offload (MPLS, IPX etc), you still need to locate
the L3 and L4 headers.
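As an illustration of the MPLS case, here is a minimal sketch of locating the L3 header in software by walking the label stack until the bottom-of-stack bit. It works on a raw, untagged Ethernet frame; the function name is hypothetical and this is not code from our libraries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first byte after the MPLS label stack
 * (where the L3 header starts), or 0 if the frame is not MPLS or is
 * truncated. Each label entry is 4 bytes; the bottom-of-stack (S)
 * bit is the lowest bit of the entry's third byte. */
static inline size_t
demo_mpls_l3_offset(const uint8_t *pkt, size_t len)
{
	size_t off = 14;	/* untagged Ethernet header */

	assert(pkt != NULL);
	if (len < 14)
		return 0;
	/* 0x8847: MPLS unicast, 0x8848: MPLS multicast */
	if (pkt[12] != 0x88 || (pkt[13] != 0x47 && pkt[13] != 0x48))
		return 0;
	for (;;) {
		if (len < off + 4)
			return 0;
		off += 4;
		if (pkt[off - 2] & 0x01)
			return off;
	}
}
```

The returned offset is what would be stored in l2_len, so that later stages can find the L3 header without re-parsing.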

If my concerns are valid, what would you suggest?

03.09.2014 21:49, Bruce Richardson wrote:
> The l2_len and l3_len fields are used for TX offloads and so should be
> put on the second cache line, along with the other fields only used on
> TX.
>
> Signed-off-by: Bruce Richardson 

-- 
Sincerely,

Yerden Zhumabekov
STS, ACI
Astana, KZ



[dpdk-dev] [PATCH 2/2] hash: rte_hash_crc uses 8- and 4-byte CRC32 intrinsics

2014-09-03 Thread Yerden Zhumabekov
Calculating a hash for data of variable length is more efficient
when that data is sliced into 8-byte pieces. The remainder of the data
is hashed using either the 8-byte or the 4-byte CRC32 intrinsic.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   31 +++
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index 102b2a0..d023e5d 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -95,23 +95,38 @@ static inline uint32_t
 rte_hash_crc(const void *data, uint32_t data_len, uint32_t init_val)
 {
unsigned i;
-   uint32_t temp = 0;
-   const uint32_t *p32 = (const uint32_t *)data;
+   uint64_t temp = 0;
+   const uint64_t *p64 = (const uint64_t *)data;

-   for (i = 0; i < data_len / 4; i++) {
-   init_val = rte_hash_crc_4byte(*p32++, init_val);
+   for (i = 0; i < data_len / 8; i++) {
+   init_val = rte_hash_crc_8byte(*p64++, init_val);
}

-   switch (3 - (data_len & 0x03)) {
+   switch (7 - (data_len & 0x07)) {
case 0:
-   temp |= *((const uint8_t *)p32 + 2) << 16;
+   temp |= (uint64_t) *((const uint8_t *)p64 + 6) << 48;
/* Fallthrough */
case 1:
-   temp |= *((const uint8_t *)p32 + 1) << 8;
+   temp |= (uint64_t) *((const uint8_t *)p64 + 5) << 40;
/* Fallthrough */
case 2:
-   temp |= *((const uint8_t *)p32);
+   temp |= (uint64_t) *((const uint8_t *)p64 + 4) << 32;
+   temp |= *((const uint32_t *)p64);
+   init_val = rte_hash_crc_8byte(temp, init_val);
+   break;
+   case 3:
+   init_val = rte_hash_crc_4byte(*(const uint32_t *)p64, init_val);
+   break;
+   case 4:
+   temp |= *((const uint8_t *)p64 + 2) << 16;
+   /* Fallthrough */
+   case 5:
+   temp |= *((const uint8_t *)p64 + 1) << 8;
+   /* Fallthrough */
+   case 6:
+   temp |= *((const uint8_t *)p64);
init_val = rte_hash_crc_4byte(temp, init_val);
+   /* Fallthrough */
default:
break;
}
-- 
1.7.9.5



[dpdk-dev] [PATCH 1/2] hash: add new rte_hash_crc_8byte call

2014-09-03 Thread Yerden Zhumabekov
SSE4.2 provides _mm_crc32_u64 intrinsic with 8-byte operand.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_hash/rte_hash_crc.h |   16 
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
index b48b0db..102b2a0 100644
--- a/lib/librte_hash/rte_hash_crc.h
+++ b/lib/librte_hash/rte_hash_crc.h
@@ -64,6 +64,22 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 }

 /**
+ * Use single crc32 instruction to perform a hash on a 8 byte value.
+ *
+ * @param data
+ *   Data to perform hash on.
+ * @param init_val
+ *   Value to initialise hash generator.
+ * @return
+ *   32bit calculated hash value.
+ */
+static inline uint32_t
+rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
+{
+   return _mm_crc32_u64(init_val, data);
+}
+
+/**
  * Use crc32 instruction to perform a hash.
  *
  * @param data
-- 
1.7.9.5



[dpdk-dev] [PATCH 0/2] rewritten rte_hash_crc() call

2014-09-03 Thread Yerden Zhumabekov
As SSE4.2 provides CRC32 instructions with both 32- and 64-bit operands,
a new rte_hash_crc_8byte() call backed by the _mm_crc32_u64 intrinsic may be
useful.

The rte_hash_crc() function is then redesigned to take advantage of both 32-
and 64-bit operands. In the test below this cuts its run time by roughly
28-33%, with identical hash values.
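A standalone sketch of the slicing scheme. A bitwise software CRC32C stands in for the intrinsics so it runs on any machine, and all demo_* names are hypothetical. The tail rule (5 to 7 leftover bytes padded into one 8-byte step, 1 to 3 padded into one 4-byte step) follows my reading of patch 2/2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Feed the low nbytes bytes of v, least significant first (software
 * stand-in for the 4- and 8-byte crc32 instructions). */
static inline uint32_t
demo_crc32c_word(uint32_t crc, uint64_t v, unsigned nbytes)
{
	unsigned i;
	int k;

	for (i = 0; i < nbytes; i++) {
		crc ^= (uint8_t)(v >> (8 * i));
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Assemble n (at most 8) bytes into a zero-padded little-endian word. */
static inline uint64_t
demo_load_le(const uint8_t *p, unsigned n)
{
	uint64_t v = 0;
	unsigned i;

	for (i = 0; i < n; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* New scheme: 8-byte steps, then one 8- or 4-byte step for the tail. */
static inline uint32_t
demo_hash_crc_sliced(const void *data, uint32_t len, uint32_t init_val)
{
	const uint8_t *p = data;

	assert(data != NULL || len == 0);
	for (; len >= 8; p += 8, len -= 8)
		init_val = demo_crc32c_word(init_val, demo_load_le(p, 8), 8);
	if (len > 4)
		init_val = demo_crc32c_word(init_val, demo_load_le(p, len), 8);
	else if (len > 0)
		init_val = demo_crc32c_word(init_val, demo_load_le(p, len), 4);
	return init_val;
}

/* Old scheme for comparison: 4-byte steps, tail padded to 4 bytes. */
static inline uint32_t
demo_hash_crc_by4(const void *data, uint32_t len, uint32_t init_val)
{
	const uint8_t *p = data;

	assert(data != NULL || len == 0);
	for (; len >= 4; p += 4, len -= 4)
		init_val = demo_crc32c_word(init_val, demo_load_le(p, 4), 4);
	if (len > 0)
		init_val = demo_crc32c_word(init_val, demo_load_le(p, len), 4);
	return init_val;
}
```

Both schemes effectively hash the data zero-padded to a multiple of 4 bytes, which would explain why the old and new functions report the same hash for every chunk size.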

Results of my test run on a single CPU core are below.

CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
Number of iterations/chunks: 52428800
Chunk size: 24
  rte_hash_crc:     0.379 sec, hash: 0x14c64e11
  rte_hash_crc_new: 0.253 sec, hash: 0x14c64e11
Chunk size: 25
  rte_hash_crc:     0.442 sec, hash: 0xa9afc779
  rte_hash_crc_new: 0.316 sec, hash: 0xa9afc779
Chunk size: 26
  rte_hash_crc:     0.442 sec, hash: 0x92f2284b
  rte_hash_crc_new: 0.316 sec, hash: 0x92f2284b
Chunk size: 27
  rte_hash_crc:     0.442 sec, hash: 0x7c4655ff
  rte_hash_crc_new: 0.316 sec, hash: 0x7c4655ff
Chunk size: 28
  rte_hash_crc:     0.442 sec, hash: 0xf577c6b4
  rte_hash_crc_new: 0.316 sec, hash: 0xf577c6b4
Chunk size: 29
  rte_hash_crc:     0.505 sec, hash: 0x6e18ba55
  rte_hash_crc_new: 0.337 sec, hash: 0x6e18ba55
Chunk size: 30
  rte_hash_crc:     0.505 sec, hash: 0x35f07dbb
  rte_hash_crc_new: 0.337 sec, hash: 0x35f07dbb
Chunk size: 31
  rte_hash_crc:     0.505 sec, hash: 0x1bf2ee8c
  rte_hash_crc_new: 0.337 sec, hash: 0x1bf2ee8c

Yerden Zhumabekov (2):
  hash: add new rte_hash_crc_8byte call
  hash: rte_hash_crc uses 8- and 4-byte CRC32 intrinsics

 lib/librte_hash/rte_hash_crc.h |   47 +---
 1 file changed, 39 insertions(+), 8 deletions(-)

-- 
1.7.9.5



[dpdk-dev] [PATCH] igb_uio: fall back to enable/disable irq mode

2014-07-24 Thread Yerden Zhumabekov
24.07.2014 0:09, Stephen Hemminger wrote:
>> Rewritten IRQ mode handling code introduced in commit 399a3f0d
>> (igb_uio: fix IRQ mode handling) renders some faulty NICs (VMware
>> e1000, for example) unusable if INTX mode is not supported.
>>
>> This patch gets these NICs up and running, but throwing a kernel
>> warning.
>>
>> Signed-off-by: Yerden Zhumabekov 
> That is because the VMWare PCI INTX is broken.
> The masking logic doesn't work.
>
> Rather than applying this patch a deeper fix in E1000 and DPDK handling
> of link state is needed. Better to just make the E1000 able
> to function without IRQ for Link state than just pretend masking works

I'll dig deeper then, maybe I'll figure something out.
If the IRQ can't be relied on then, I guess, the NIC has to be polled
continuously for link state. If so, where should I focus my efforts? The PMD?

-- 
Sincerely,

Yerden Zhumabekov
STS, ACI
Astana, KZ



[dpdk-dev] [PATCH] igb_uio: fall back to enable/disable irq mode

2014-07-23 Thread Yerden Zhumabekov
The rewritten IRQ mode handling code introduced in commit 399a3f0d
(igb_uio: fix IRQ mode handling) renders some faulty NICs (VMware
e1000, for example) unusable if INTX mode is not supported.

This patch gets these NICs up and running, at the cost of a kernel
warning.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index f220a12..c4ab01a 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -620,9 +620,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
udev->info.irq_flags = IRQF_SHARED;
udev->mode = RTE_INTR_MODE_LEGACY;
} else {
-   dev_err(&dev->dev, "PCI INTX mask not supported\n");
-   err = -EIO;
-   goto fail_release_iomem;
+   dev_warn(&dev->dev, "PCI INTX mask not supported\n");
+   udev->info.irq_flags = IRQF_SHARED;
+   udev->mode = RTE_INTR_MODE_LEGACY;
}
break;
default:
-- 
1.7.10.4



[dpdk-dev] [PATCH] igb_uio dropped support for some faulty NICs

2014-07-23 Thread Yerden Zhumabekov
Hi,

Recent patch 399a3f0d (igb_uio: fix IRQ mode handling) introduced new IRQ
mode handling code.

As Stephen reported earlier, VMware's PCI emulation of interrupts is somehow
broken, so INTX mode is not supported (see
http://dpdk.org/ml/archives/dev/2014-May/002432.html). With the current code,
a VMware e1000 cannot be bound to the igb_uio driver: the probe fails with -EIO.

What I suggest is to emit a kernel warning but still allow working with this
kind of NIC.
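The detection and the proposed fallback can be modelled in userspace roughly like this. The toy device struct, the writable_mask field and all demo_* names are assumptions made for illustration; only the INTX_DISABLE bit position (bit 10 of the PCI command register) comes from the PCI headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PCI_COMMAND_INTX_DISABLE 0x400	/* bit 10 */

/* Toy command register: only bits in writable_mask react to writes. */
struct demo_dev {
	uint16_t command;
	uint16_t writable_mask;
};

static inline void
demo_write_command(struct demo_dev *d, uint16_t val)
{
	d->command = (uint16_t)((d->command & ~d->writable_mask) |
				(val & d->writable_mask));
}

/* Toggle INTX_DISABLE, read it back, restore the original value. */
static inline bool
demo_intx_mask_supported(struct demo_dev *d)
{
	uint16_t orig, readback;
	bool supported;

	assert(d != NULL);
	orig = d->command;
	demo_write_command(d, (uint16_t)(orig ^ DEMO_PCI_COMMAND_INTX_DISABLE));
	readback = d->command;
	supported = ((orig ^ readback) & DEMO_PCI_COMMAND_INTX_DISABLE) != 0;
	if (supported)
		demo_write_command(d, orig);
	return supported;
}

/* Proposed policy: warn and continue in legacy mode instead of
 * failing the probe with -EIO. */
static inline int
demo_probe_legacy(struct demo_dev *d, bool *warned)
{
	*warned = !demo_intx_mask_supported(d);
	return 0;
}
```

A device whose emulation ignores writes to the bit takes the warning path but still gets bound.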

Yerden Zhumabekov (1):
  igb_uio: fall back to enable/disable irq mode

 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
1.7.10.4



[dpdk-dev] [PATCH 3/3] igb_uio: renaming pci config lock/unlock functions

2014-07-21 Thread Yerden Zhumabekov

renaming pci config lock/unlock functions using wrappers introduced
in commit f57049874f61046641a8eb1e9832810cc33befe5

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |   18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)



[dpdk-dev] [PATCH 2/3] igb_uio: pci_config_lock/pci_config_unlock wrappers

2014-07-21 Thread Yerden Zhumabekov

Since PCI config lock/unlock functions were renamed in linux kernel,
these wrappers are introduced to reflect this change.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |   20 
 1 file changed, 20 insertions(+)



[dpdk-dev] [PATCH 1/3] igb_uio: fixed typos

2014-07-21 Thread Yerden Zhumabekov

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |   24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)



[dpdk-dev] [PATCH 0/3] igb_uio: fixed typos and pci lock/unlock calls

2014-07-21 Thread Yerden Zhumabekov
Since PCI config lock/unlock functions were renamed in linux kernel,
wrappers are introduced to reflect this change.
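The selection the wrappers make can be mimicked at run time in a small sketch. The real wrappers choose at compile time with #if on LINUX_VERSION_CODE; the packing below is the one KERNEL_VERSION() used in kernels of that era, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum demo_cfg_api {
	DEMO_API_BLOCK_USER_CFG,	/* pci_block_user_cfg_access() family */
	DEMO_API_CFG_ACCESS_LOCK	/* pci_cfg_access_lock() family */
};

/* Pack a version the way KERNEL_VERSION(a, b, c) did. */
static inline uint32_t
demo_version_code(unsigned major, unsigned minor, unsigned patch)
{
	assert(minor < 256 && patch < 256);	/* 8 bits per field */
	return ((uint32_t)major << 16) + ((uint32_t)minor << 8) + patch;
}

/* Kernels older than 3.3 only have the old function names. */
static inline enum demo_cfg_api
demo_pick_cfg_api(uint32_t running)
{
	return running < demo_version_code(3, 3, 0) ?
		DEMO_API_BLOCK_USER_CFG : DEMO_API_CFG_ACCESS_LOCK;
}
```

For example, the 3.2 kernel that Ubuntu 12.04 shipped with lands on the old names.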

Fixed a few typos.

Patches attached, not inlined. :)

Yerden Zhumabekov (3):
  igb_uio: fixed typos
  igb_uio: pci_config_lock/pci_config_unlock wrappers
  igb_uio: renaming pci config lock/unlock functions

 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |   54 -
 1 file changed, 38 insertions(+), 16 deletions(-)

-- 
1.7.10.4



[dpdk-dev] [igb_uio PATCH 3/3] igb_uio: renaming pci config lock/unlock functions

2014-07-21 Thread Yerden Zhumabekov
renaming pci config lock/unlock functions using wrappers introduced
in commit f57049874f61046641a8eb1e9832810cc33befe5

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |   18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 605410e..418bfa2 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -148,11 +148,11 @@ store_extended_tag(struct device *dev,
else
return -EINVAL;

-   pci_cfg_access_lock(pci_dev);
+   pci_config_lock(pci_dev);
pci_bus_read_config_dword(pci_dev->bus, pci_dev->devfn,
PCI_DEV_CAP_REG, &val);
if (!(val & PCI_DEV_CAP_EXT_TAG_MASK)) { /* Not supported */
-   pci_cfg_access_unlock(pci_dev);
+   pci_config_unlock(pci_dev);
return -EPERM;
}

@@ -165,7 +165,7 @@ store_extended_tag(struct device *dev,
val &= ~PCI_DEV_CTRL_EXT_TAG_MASK;
pci_bus_write_config_dword(pci_dev->bus, pci_dev->devfn,
PCI_DEV_CTRL_REG, val);
-   pci_cfg_access_unlock(pci_dev);
+   pci_config_unlock(pci_dev);

return count;
 }
@@ -252,7 +252,7 @@ static bool pci_intx_mask_supported(struct pci_dev *pdev)
bool mask_supported = false;
uint16_t orig, new;

-   pci_block_user_cfg_access(pdev);
+   pci_config_lock(pdev);
pci_read_config_word(pdev, PCI_COMMAND, &orig);
pci_write_config_word(pdev, PCI_COMMAND,
  orig ^ PCI_COMMAND_INTX_DISABLE);
@@ -265,7 +265,7 @@ static bool pci_intx_mask_supported(struct pci_dev *pdev)
mask_supported = true;
pci_write_config_word(pdev, PCI_COMMAND, orig);
}
-   pci_unblock_user_cfg_access(pdev);
+   pci_config_unlock(pdev);

return mask_supported;
 }
@@ -275,7 +275,7 @@ static bool pci_check_and_mask_intx(struct pci_dev *pdev)
bool pending;
uint32_t status;

-   pci_block_user_cfg_access(pdev);
+   pci_config_lock(pdev);
pci_read_config_dword(pdev, PCI_COMMAND, &status);

/* interrupt is not ours, goes to out */
@@ -292,7 +292,7 @@ static bool pci_check_and_mask_intx(struct pci_dev *pdev)
if (old != new)
pci_write_config_word(pdev, PCI_COMMAND, new);
}
-   pci_unblock_user_cfg_access(pdev);
+   pci_config_unlock(pdev);

return pending;
 }
@@ -357,7 +357,7 @@ igbuio_pci_irqcontrol(struct uio_info *info, s32 irq_state)
struct rte_uio_pci_dev *udev = igbuio_get_uio_pci_dev(info);
struct pci_dev *pdev = udev->pdev;

-   pci_cfg_access_lock(pdev);
+   pci_config_lock(pdev);
if (udev->mode == RTE_INTR_MODE_LEGACY)
pci_intx(pdev, !!irq_state);
else if (udev->mode == RTE_INTR_MODE_MSI) {
@@ -370,7 +370,7 @@ igbuio_pci_irqcontrol(struct uio_info *info, s32 irq_state)
list_for_each_entry(desc, &pdev->msi_list, list)
igbuio_msix_mask_irq(desc, irq_state);
}
-   pci_cfg_access_unlock(pdev);
+   pci_config_unlock(pdev);

return 0;
 }
-- 
1.7.10.4



[dpdk-dev] [igb_uio PATCH 2/3] igb_uio: pci_config_lock/pci_config_unlock wrappers

2014-07-21 Thread Yerden Zhumabekov
Since PCI config lock/unlock functions were renamed in linux kernel,
these wrappers are introduced to reflect this change.

Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |   20 
 1 file changed, 20 insertions(+)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 02545d9..605410e 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -223,6 +223,26 @@ static const struct attribute_group dev_attr_grp = {
.attrs = dev_attrs,
 };

+static inline void
+pci_config_lock(struct pci_dev *pdev)
+{
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,3,0)
+   pci_block_user_cfg_access(pdev);
+#else
+   pci_cfg_access_lock(pdev);
+#endif
+}
+
+static inline void
+pci_config_unlock(struct pci_dev *pdev)
+{
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,3,0)
+   pci_unblock_user_cfg_access(pdev);
+#else
+   pci_cfg_access_unlock(pdev);
+#endif
+}
+
 #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 3, 0)
 /* Check if INTX works to control irq's.
  * Set's INTX_DISABLE flag and reads it back
-- 
1.7.10.4



[dpdk-dev] [igb_uio PATCH 1/3] igb_uio: fixed typos

2014-07-21 Thread Yerden Zhumabekov
Signed-off-by: Yerden Zhumabekov 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |   24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 05cbe8e..02545d9 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -227,25 +227,27 @@ static const struct attribute_group dev_attr_grp = {
 /* Check if INTX works to control irq's.
  * Set's INTX_DISABLE flag and reads it back
  */
-static bool pci_intx_mask_supported(struct pci_dev *dev)
+static bool pci_intx_mask_supported(struct pci_dev *pdev)
 {
bool mask_supported = false;
-   uint16_t orig, new
+   uint16_t orig, new;

-   pci_block_user_cfg_access(dev);
+   pci_block_user_cfg_access(pdev);
pci_read_config_word(pdev, PCI_COMMAND, &orig);
-   pci_write_config_word(dev, PCI_COMMAND,
+   pci_write_config_word(pdev, PCI_COMMAND,
  orig ^ PCI_COMMAND_INTX_DISABLE);
-   pci_read_config_word(dev, PCI_COMMAND, &new);
+   pci_read_config_word(pdev, PCI_COMMAND, &new);

if ((new ^ orig) & ~PCI_COMMAND_INTX_DISABLE) {
-   dev_err(&dev->dev, "Command register changed from "
+   dev_err(&pdev->dev, "Command register changed from "
"0x%x to 0x%x: driver or hardware bug?\n", orig, new);
} else if ((new ^ orig) & PCI_COMMAND_INTX_DISABLE) {
mask_supported = true;
-   pci_write_config_word(dev, PCI_COMMAND, orig);
+   pci_write_config_word(pdev, PCI_COMMAND, orig);
}
-   pci_unblock_user_cfg_access(dev);
+   pci_unblock_user_cfg_access(pdev);
+
+   return mask_supported;
 }

 static bool pci_check_and_mask_intx(struct pci_dev *pdev)
@@ -253,7 +255,7 @@ static bool pci_check_and_mask_intx(struct pci_dev *pdev)
bool pending;
uint32_t status;

-   pci_block_user_cfg_access(dev);
+   pci_block_user_cfg_access(pdev);
pci_read_config_dword(pdev, PCI_COMMAND, &status);

/* interrupt is not ours, goes to out */
@@ -262,7 +264,7 @@ static bool pci_check_and_mask_intx(struct pci_dev *pdev)
uint16_t old, new;

old = status;
-   if (state != 0)
+   if (status != 0)
new = old & (~PCI_COMMAND_INTX_DISABLE);
else
new = old | PCI_COMMAND_INTX_DISABLE;
@@ -270,7 +272,7 @@ static bool pci_check_and_mask_intx(struct pci_dev *pdev)
if (old != new)
pci_write_config_word(pdev, PCI_COMMAND, new);
}
-   pci_unblock_user_cfg_access(dev);
+   pci_unblock_user_cfg_access(pdev);

return pending;
 }
-- 
1.7.10.4



[dpdk-dev] [igb_uio PATCH 0/3] igb_uio: fixed typos and pci lock/unlock calls

2014-07-21 Thread Yerden Zhumabekov
Since PCI config lock/unlock functions were renamed in linux kernel,
wrappers are introduced to reflect this change.

Fixed a few typos.


Yerden Zhumabekov (3):
  igb_uio: fixed typos
  igb_uio: pci_config_lock/pci_config_unlock wrappers
  igb_uio: renaming pci config lock/unlock functions

 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |   54 -
 1 file changed, 38 insertions(+), 16 deletions(-)

-- 
1.7.10.4



[dpdk-dev] [PATCH 00/10] igb_uio related patches

2014-07-20 Thread Yerden Zhumabekov
Hi,

Unfortunately the latest 'master' no longer builds on Ubuntu 12.04.
Build log attached.
I've also attached a patch to fix it. It fixes some typos and
reverts to the PCI config lock/unlock functions from older kernel versions
(maybe I'm not at liberty to do this). Please review and comment.


19.07.2014 6:16, Thomas Monjalon wrote:
> 2014-07-18 09:14, Stephen Hemminger:
>> Update patches so all are now bisectable, and incorporate comments.
>> Also fix the checkpatch warnings that are fixable.
> I've isolated all MSI additions in the dedicated commit.
>
> Acked-by: Thomas Monjalon 
>
> Applied for version 1.7.1
>
> What are the news about your uio work for kernel.org?
>
> Thanks

-- 
Sincerely,

Yerden Zhumabekov
STS, ACI
Astana, KZ

-- next part --
== Installing x86_64-native-linuxapp-gcc
make[5]: Nothing to be done for `depdirs'.
Configuration done
== Build scripts
== Build scripts/testhost
== Build lib
== Build lib/librte_eal
== Build lib/librte_eal/common
== Build lib/librte_eal/linuxapp
== Build lib/librte_eal/linuxapp/igb_uio
  CC [M]  /home/yerden/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.o
/home/yerden/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c: In function 'pci_intx_mask_supported':
/home/yerden/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:235:2: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'pci_block_user_cfg_access'
/home/yerden/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:236:23: error: 'pdev' undeclared (first use in this function)
/home/yerden/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:236:23: note: each undeclared identifier is reported only once for each function it appears in
/home/yerden/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:239:42: error: 'new' undeclared (first use in this function)
/home/yerden/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:249:1: error: no return statement in function returning non-void [-Werror=return-type]
/home/yerden/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c: In function 'pci_check_and_mask_intx':
/home/yerden/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:256:28: error: 'dev' undeclared (first use in this function)
/home/yerden/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:265:7: error: 'state' undeclared (first use in this function)
/home/yerden/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c: In function 'igbuio_pci_irqcontrol':
/home/yerden/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:338:2: error: implicit declaration of function 'pci_cfg_access_lock' [-Werror=implicit-function-declaration]
/home/yerden/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:351:2: error: implicit declaration of function 'pci_cfg_access_unlock' [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors
make[10]: *** [/home/yerden/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.o] Error 1
make[9]: *** [_module_/home/yerden/dpdk/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio] Error 2
make[8]: *** [sub-make] Error 2
make[7]: *** [igb_uio.ko] Error 2
make[6]: *** [igb_uio] Error 2
make[5]: *** [linuxapp] Error 2
make[4]: *** [librte_eal] Error 2
make[3]: *** [lib] Error 2
make[2]: *** [all] Error 2
make[1]: *** [x86_64-native-linuxapp-gcc_install] Error 2
make: *** [install] Error 2
-- next part --
diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 05cbe8e..c5dbbe2 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -148,11 +148,11 @@ store_extended_tag(struct device *dev,
else
return -EINVAL;

-   pci_cfg_access_lock(pci_dev);
+   pci_block_user_cfg_access(pci_dev);
pci_bus_read_config_dword(pci_dev->bus, pci_dev->devfn,
PCI_DEV_CAP_REG, &val);
if (!(val & PCI_DEV_CAP_EXT_TAG_MASK)) { /* Not supported */
-   pci_cfg_access_unlock(pci_dev);
+   pci_unblock_user_cfg_access(pci_dev);
return -EPERM;
}

@@ -165,7 +165,7 @@ store_extended_tag(struct device *dev,
val &= ~PCI_DEV_CTRL_EXT_TAG_MASK;
pci_bus_write_config_dword(pci_dev->bus, pci_dev->devfn,
PCI_DEV_CTRL_REG, val);
-   pci_cfg_access_unlock(pci_dev);
+   pci_unblock_user_cfg_access(pci_dev);

return count;
 }
@@ -227,25 +227,26 @@ 

[dpdk-dev] librte_distributor relies on RSS?

2014-07-09 Thread Yerden Zhumabekov
Hi,

As far as I understand the code of rte_distributor,
rte_distributor_process() assigns packets to workers using the
m->pkt.hash.rss field of the mbuf. This means that Receive Side Scaling
should be initialized for the NIC. Otherwise, all packets would be
erroneously distributed to the same single worker.

Am I correct? If so, maybe the RSS requirement should be added to the
description of the library?

I'm having some trouble enabling RSS for the vmxnet3 PMD, so
the question is: is it acceptable to calculate and fill the
m->pkt.hash.rss field using my own hash function?
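To make the question concrete, here is a sketch of what computing such a tag in software could look like. FNV-1a is used purely as a stand-in hash; the flow struct and demo_* names are hypothetical, and whether the distributor accepts a software-filled tag is exactly what is being asked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key extracted from the packet headers. */
struct demo_flow {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
	uint8_t proto;
};

/* 32-bit FNV-1a over a byte buffer, continuing from state h. */
static inline uint32_t
demo_fnv1a(uint32_t h, const void *buf, size_t len)
{
	const uint8_t *p = buf;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Hash the fields one by one so struct padding never enters the tag. */
static inline uint32_t
demo_flow_tag(const struct demo_flow *f)
{
	uint32_t h = 2166136261u;	/* FNV offset basis */

	h = demo_fnv1a(h, &f->src_ip, sizeof(f->src_ip));
	h = demo_fnv1a(h, &f->dst_ip, sizeof(f->dst_ip));
	h = demo_fnv1a(h, &f->src_port, sizeof(f->src_port));
	h = demo_fnv1a(h, &f->dst_port, sizeof(f->dst_port));
	h = demo_fnv1a(h, &f->proto, sizeof(f->proto));
	return h;
}
```

The result would be written into the mbuf's hash field before handing the packet to the distributor, so that packets of one flow always carry the same tag.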

-- 
Sincerely,

Yerden Zhumabekov
STS, ACI
Astana, KZ



[dpdk-dev] non-x86 ? Re: Would DPDK run on AMD processors

2014-07-04 Thread Yerden Zhumabekov
Intel DPDK is intentionally being developed to bring packet processing
support to Intel processors. Achieving high packet-processing performance
requires rather heavy optimization, such as extensive use of SSE intrinsics.
Hence it demands the x86 architecture.

Since Intel DPDK is open source, there is no license restriction on running
it on any architecture, but I guess one will face rather intriguing technical
issues doing that :)

03.07.2014 19:35, Derek Wasely wrote:
> how about running DPDK on other processors ?   Any licensing restriction on 
> using it on non-x86 arch ?  Does it work automatically on say  PPC or OCTEON ?

-- 
Sincerely,

Yerden Zhumabekov
STS, ACI
Astana, KZ


