[dpdk-dev] some questions about rte_memcpy

2015-01-21 Thread Matthew Hall
On Thu, Jan 22, 2015 at 01:32:04PM +0800, Linhaifeng wrote:
> Do you mean if call rte_memcpy before rte_eal_init() would crash?why?

No guarantee. But a theory. It might use some things from the EAL init to 
figure out which version of the accelerated algorithm to use.
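
To make the theory concrete, here is a minimal sketch of init-time selection of a copy routine. It is not taken from the rte_memcpy sources; every `demo_` name is hypothetical. It only shows that a dispatch pointer with a safe default survives calls made before init, whereas a NULL default would crash.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*demo_copy_fn)(void *dst, const void *src, size_t n);

/* Generic routine that is always safe to call. */
static void *demo_copy_generic(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Stand-in for an accelerated variant (hypothetical; same behaviour here). */
static void *demo_copy_fast(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Defaulting to the generic routine keeps pre-init calls working.
 * If this pointer started out as NULL, a call before init would crash. */
static demo_copy_fn demo_copy_impl = demo_copy_generic;

/* Hypothetical init hook: pick the variant once CPU features are known. */
static void demo_init(int cpu_has_fast_path)
{
    if (cpu_has_fast_path)
        demo_copy_impl = demo_copy_fast;
}

static void *demo_memcpy(void *dst, const void *src, size_t n)
{
    return demo_copy_impl(dst, src, n);
}
```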

Matthew.


[dpdk-dev] [PATCH v6 4/4] docs: Add ABI documentation

2015-01-21 Thread Thomas Monjalon
2015-01-21 14:43, Neil Horman:
> On Wed, Jan 21, 2015 at 05:05:51PM +0100, Thomas Monjalon wrote:
> > 2015-01-21 09:59, Neil Horman:
> > > Considered and answered already.  I'm in favor of listing macros and
> > > structure changes in the abi document, but I think an exhaustive list
> > > isn't needed.  If it is, we could spend pages diving into minutiae.
> > > Better to point out the need for abi notices as patches get posted.
> > 
> > I'm afraid you don't understand what I'm saying. Copy/paste:
> > "No, I was suggesting to explain in this doc that macro removal must be
> > announced with a deprecation notice,
> > and that in case structure must be reworked, the name must change if we
> > want to preserve ABI compatibility with old structure."
> > Rewording: if you agree with this policy, please add it in this document.
> > 
> Yes, we're on the same page regarding what you're asking, I just don't agree
> that it needs to be explicitly called out.  I thought I was clear on that.
> Apparently not however, so if it will settle the point, I'll just add it.

OK maybe I didn't explain enough my proposal.
You can disagree but I want to be sure we think about the same thing.

1) Macros are not part of the ABI but can be part of the API.
Removal of such a macro must be announced in the previous release.
2) Structures are part of the ABI but cannot be versioned the way functions are.
So an ABI-breaking change should be made by cloning the structure into a new one.
And the API functions where this structure appears should be cloned and
versioned to support the new structure while keeping the old version.
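
A minimal sketch of point 2, assuming a made-up configuration structure and function (all `demo_` names are hypothetical and not part of DPDK). The old structure keeps its layout, the reworked one gets a new name, and the old entry point is kept as a thin wrapper around the new one.

```c
#include <stdint.h>

/* Old layout: left untouched so binaries built against it keep working. */
struct demo_conf_v1 {
    uint16_t nb_queues;
};

/* Reworked layout: a new name instead of a change to the old structure. */
struct demo_conf_v2 {
    uint16_t nb_queues;
    uint32_t flags; /* field added by the rework */
};

/* New version of the API function, taking the new structure. */
static int demo_configure_v2(const struct demo_conf_v2 *conf)
{
    return conf->nb_queues > 0 ? 0 : -1;
}

/* Old version kept for compatibility: converts to the new layout,
 * fills the new field with a default and forwards the call. */
static int demo_configure_v1(const struct demo_conf_v1 *conf)
{
    struct demo_conf_v2 tmp;

    tmp.nb_queues = conf->nb_queues;
    tmp.flags = 0;
    return demo_configure_v2(&tmp);
}
```

How the two function versions are exported under one public name is a separate question that this sketch does not cover.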

Maybe these clarifications are confusing and useless.
Now I think I understand what you were saying by "an exhaustive list isn't
needed".
You mean listing all types of ABI/API breakage like I did with these 2 cases,
right?
I thought it was about the list of actual deprecations.

> > > > Neil, we expect that you consider comments done previously and that you 
> > > > test your patch.
> > > > Otherwise, we are losing time in useless reviews.
> > > > 
> > > Thomas, I have considered your comments, I simply don't agree with all of
> > > them, and I made that clear.
> > > 
> > > As for losing time, you let the first attempt at this patch rot on the
> > > list in 1.7 and have done the same thing for the 1.8 cycle until I yelled
> > > for reviews.
> > 
> > Now, I'm really upset by your wrong assumptions.
> > You sent your first proposal in September, during the 1.8 cycle, not 1.7!
> > And during this cycle, the decision was to postpone it to the 2.0 release.
> > 
> You're missing the point. I apologize for not getting the release numbers
> right, it should be 1.8 to 2.0 not 1.7 to 1.8 as you note, but that doesn't
> really matter.  The point was 6 months.  6 months this has been sitting around.

No, 5 months. Yes, it's long.

> In that time up to this point I've gotten one review from another developer
> on the set, and you indicating that it's not ready yet.  Then, the day 1.8
> released, I reposted the patch series as we agreed, and it's taken almost 5
> weeks before I've gotten any feedback on it, and then it's feedback that could
> have been given 6 months ago (you'll note this patch was initially identical
> to the version I posted back in September).  I think you can understand how I
> find that frustrating.

You must understand that I'd prefer more people to feel involved in this change.
It would be saner to have this policy reviewed and acked by many developers.
As it was announced on the roadmap for 2.0, this first month of the cycle was
ideal for more discussion on how this policy can be precisely applied.
You only received my comments (which may be useless) and it's now time to
apply this important patchset.

> > I don't understand what's wrong with you.
> The above is what's wrong with me.  The fact that I can try and try and try to
> add value to this project so that I can expand its user base, and the best
> I've thus far been able to receive is indifference.  At worst, the
> indifference is followed by being told that the indifference is tantamount to
> rejection.
> 
> 
> > You don't make any effort to understand what we are saying and
> > you make no effort to understand what is this doc directory.
> > You prefer crying that your patch is not applied.
> No effort?  How many emails have I written contesting your opinions,
> presenting supporting evidence, only to be met with assertions?  I don't think
> I'm the one not making an effort here.

In the end, I accept your point of view and will apply the patchset.

> > And I still don't understand if you are willing to work on a test tool for 
> > ABI?
> > 
> From this email
> http://dpdk.org/ml/archives/dev/2015-January/011306.html
> 
> ===
> > Yes, it should be another patchset.
> > Do you plan to work on it? It would be very convenient for developers and
> > maintainers 

[dpdk-dev] Packet drops during non-exhaustive flood with OVS and 1.8.0

2015-01-21 Thread Andrey Korolyov
Hello,

I observed that the latest OVS with dpdk-1.8.0 and igb_uio starts to
drop packets earlier than a regular Linux ixgbe 10G interface, setup
follows:

receiver/forwarder:
- 8 core/2 head system with E5-2603v2, cores 1-3 are given to OVS exclusively
- n-dpdk-rxqs=6, rx scattering is not enabled
- x520 da
- 3.10/3.18 host kernel
- during 'legacy mode' testing, queue interrupts are scattered through all cores

sender:
- 16-core E5-2630, netmap framework for packet generation
- pkt-gen -f tx -i eth2 -s 10.6.9.0-10.6.9.255 -d
10.6.10.0-10.6.10.255 -S 90:e2:ba:84:19:a0 -D 90:e2:ba:85:06:07 -R
1100, which results in an 11 Mpps flood of 60-byte packets; the values
stay constant during the test.

OVS contains only a single drop rule at the moment:
ovs-ofctl add-flow br0 in_port=1,actions=DROP

The packet generator was run for tens of seconds in both the Linux stack
and OVS+DPDK cases. The first showed a zero drop/error count on the
interface, and the counter values on pktgen matched the host interface
stats (meaning that none of the generated packets are unaccounted for).

I selected a rate of about 11M because OVS starts to drop packets
around this value; after the same short test the interface stats show
the following:

statistics  : {collisions=0, rx_bytes=22003928768,
rx_crc_err=0, rx_dropped=0, rx_errors=10694693, rx_frame_err=0,
rx_over_err=0, rx_packets=343811387, tx_bytes=0, tx_dropped=0,
tx_errors=0, tx_packets=0}

pktgen side:
Sent 354506080 packets, 60 bytes each, in 32.23 seconds.
Speed: 11.00 Mpps Bandwidth: 5.28 Gbps (raw 7.39 Gbps)
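
The numbers above can be cross-checked with a few lines of arithmetic. This is only a sketch with hypothetical helper names. It assumes rx_errors is not included in rx_packets, which fits the data because 343811387 + 10694693 equals the 354506080 packets pktgen reports as sent. The raw bandwidth figure is consistent with 24 extra bytes per frame on the wire (4-byte FCS, 8-byte preamble, 12-byte inter-frame gap).

```c
#include <stdint.h>

/* Share of all frames seen by the port that were counted as errors. */
static double demo_error_ratio(uint64_t rx_packets, uint64_t rx_errors)
{
    uint64_t total = rx_packets + rx_errors;

    return total ? (double)rx_errors / (double)total : 0.0;
}

/* Bit rate in Gbps for a packet rate in Mpps and a per-packet byte count. */
static double demo_gbps(double mpps, unsigned int bytes_per_pkt)
{
    return mpps * bytes_per_pkt * 8.0 / 1000.0;
}
```

With the counters from the test, `demo_error_ratio(343811387, 10694693)` comes to roughly 3% of the generated packets, and `demo_gbps(11.0, 60)` and `demo_gbps(11.0, 84)` correspond to the 5.28 and 7.39 Gbps figures printed by pktgen.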

If the rate is increased to 13-14 Mpps, the share of errors in the
overall count rises to about one third. So far OVS on DPDK shows perfect
results and I do not want to reject this solution because of behavior
like that described above, so I'm open to any suggestions to
improve the situation (except using the 1.7 branch :) ).


[dpdk-dev] [PATCH v4 06/11] eal/linux/pci: Add functions for unmapping igb_uio resources

2015-01-21 Thread Tetsuya Mukawa
Hi Michael,

On 2015/01/20 18:23, Qiu, Michael wrote:
> On 1/19/2015 6:42 PM, Tetsuya Mukawa wrote:
>> The patch adds functions for unmapping igb_uio resources. The patch is only
>> for Linux and igb_uio environment. VFIO and BSD are not supported.
>>
>> v4:
>> - Add paramerter checking.
>> - Add header file to determine if hotplug can be enabled.
>>
>> Signed-off-by: Tetsuya Mukawa 
>> ---
>>  lib/librte_eal/common/Makefile  |  1 +
>>  lib/librte_eal/common/include/rte_dev_hotplug.h | 44 +
>>  lib/librte_eal/linuxapp/eal/eal_pci.c   | 38 +++
>>  lib/librte_eal/linuxapp/eal/eal_pci_init.h  |  8 +++
>>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c   | 65 
>> +
>>  5 files changed, 156 insertions(+)
>>  create mode 100644 lib/librte_eal/common/include/rte_dev_hotplug.h
>>
>> diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
>> index 52c1a5f..db7cc93 100644
>> --- a/lib/librte_eal/common/Makefile
>> +++ b/lib/librte_eal/common/Makefile
>> @@ -41,6 +41,7 @@ INC += rte_eal_memconfig.h rte_malloc_heap.h
>>  INC += rte_hexdump.h rte_devargs.h rte_dev.h
>>  INC += rte_common_vect.h
>>  INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
>> +INC += rte_dev_hotplug.h
>>  
>>  ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
>>  INC += rte_warnings.h
>> diff --git a/lib/librte_eal/common/include/rte_dev_hotplug.h 
>> b/lib/librte_eal/common/include/rte_dev_hotplug.h
>> new file mode 100644
>> index 000..b333e0f
>> --- /dev/null
>> +++ b/lib/librte_eal/common/include/rte_dev_hotplug.h
>> @@ -0,0 +1,44 @@
>> +/*-
>> + *   BSD LICENSE
>> + *
>> + *   Copyright(c) 2015 IGEL Co.,LTd.
>> + *   All rights reserved.
>> + *
>> + *   Redistribution and use in source and binary forms, with or without
>> + *   modification, are permitted provided that the following conditions
>> + *   are met:
>> + *
>> + * * Redistributions of source code must retain the above copyright
>> + *   notice, this list of conditions and the following disclaimer.
>> + * * Redistributions in binary form must reproduce the above copyright
>> + *   notice, this list of conditions and the following disclaimer in
>> + *   the documentation and/or other materials provided with the
>> + *   distribution.
>> + * * Neither the name of IGEL Co.,Ltd. nor the names of its
>> + *   contributors may be used to endorse or promote products derived
>> + *   from this software without specific prior written permission.
>> + *
>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + */
>> +
>> +#ifndef _RTE_DEV_HOTPLUG_H_
>> +#define _RTE_DEV_HOTPLUG_H_
>> +
>> +/*
>> + * determine if hotplug can be enabled on the system
>> + */
>> +#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
> As you said, VFIO should not work with it, so does it need to add the
> vfio check here?

Could I ask for your advice?
At first I thought it best to include "eal_vfio.h" here and add
a check of the VFIO_PRESENT macro.
But it seems I cannot reach "eal_vfio.h" from this file.

My second option is just checking the RTE_EAL_VFIO macro.
But according to "eal_vfio.h", if the kernel is older than 3.6.0, VFIO_PRESENT
will not be defined even when RTE_EAL_VFIO is enabled.
So I guess a simple macro check will not work correctly.

Anyway, here are my implementation choices so far.

1. Like "eal_vfio.h", check the kernel version in "rte_dev_hotplug.h".
In this case, if "eal_vfio.h" is changed, "rte_dev_hotplug.h" may need
to be changed also.

2. Merge the "eal_vfio.h" and "rte_dev_hotplug.h" definitions, and define
them in a new rte header like "rte_settings.h".
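
As a rough sketch of choice 1, the version test could be repeated next to the hotplug check. The `DEMO_` macros below are hypothetical stand-ins for KERNEL_VERSION() and LINUX_VERSION_CODE from <linux/version.h>, defined locally so that the fragment builds anywhere; the three RTE_ macros are assumed to come from the build configuration. What ENABLE_HOTPLUG should do once VFIO is detected is left open here.

```c
/* Hypothetical stand-ins for the macros in <linux/version.h>. */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#ifndef DEMO_LINUX_VERSION_CODE
#define DEMO_LINUX_VERSION_CODE DEMO_KERNEL_VERSION(3, 10, 0)
#endif

/* Assumed to be provided by the build configuration. */
#define RTE_EAL_VFIO
#define RTE_LIBRTE_EAL_HOTPLUG
#define RTE_LIBRTE_EAL_LINUXAPP

/* Choice 1: duplicate the kernel version test (3.6.0, as stated above)
 * so this header does not need to include "eal_vfio.h". */
#if defined(RTE_EAL_VFIO) && \
    DEMO_LINUX_VERSION_CODE >= DEMO_KERNEL_VERSION(3, 6, 0)
#define DEMO_VFIO_PRESENT
#endif

#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
#define ENABLE_HOTPLUG
#endif

/* Small helpers that expose the result of the checks at run time. */
static int demo_vfio_present(void)
{
#ifdef DEMO_VFIO_PRESENT
    return 1;
#else
    return 0;
#endif
}

static int demo_hotplug_enabled(void)
{
#ifdef ENABLE_HOTPLUG
    return 1;
#else
    return 0;
#endif
}
```

The drawback is the one already noted for this choice: the duplicated test has to be kept in sync with "eal_vfio.h" by hand.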

Can I have advice about it?

Thanks,
Tetsuya

>
> Thanks,
> Michael
>> +#define ENABLE_HOTPLUG
>> +#endif /* RTE_LIBRTE_EAL_HOTPLUG & RTE_LIBRTE_EAL_LINUXAPP */
>> +
>> +#endif /* _RTE_DEV_HOTPLUG_H_ */
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
>> b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> index 3d2d93c..52c464c 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> @@ -137,6 +137,25 @@ pci_map_resource(void *requested_addr, int fd, off_t 
>> 

[dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine

2015-01-21 Thread Olivier MATZ
Hi Konstantin,

On 01/21/2015 05:28 PM, Ananyev, Konstantin wrote:
>> I added the support of Ether over GRE, IP over GRE and IP over IP
>> tunnels in csumonly to do the test. I ask the csum forward engine
>> to calculate inner IP+TCP checksums, and outer IP (case 6 in [1]).
>> Here are the results:
>>
>> 1/ When I use I40E_TXD_CTX_UDP_TUNNELING:
>> - vxlan: all checksums ok
>> - eth over gre: all checksums ok
>> - ip over gre: not transmitted by hw
>> - ip over ip: all checksums wrong (set to 0 by hw)
>>
>> 2/ When I use I40E_TXD_CTX_GRE_TUNNELING:
>> - vxlan: checksums ok
>> - eth over gre: all checksums ok
>> - ip over gre: all checksums ok
>> - ip over ip: all checksums wrong (set to 0 by hw)
>>
>> 3/ When I use 00b:
>> - vxlan: all checksums ok
>> - eth over gre: all checksums ok
>> - ip over gre: all checksums ok
>> - ip over ip: checksums wrong (set to 0 by hw)
>
> Wow, so there is absolutely no difference in results for L4TUNT=2(GRE) and 
> L4TUNT=0, right?
> And IP over IP doesn't work at all?

Right. I probably missed something in i40e driver. The application seems
to fill the mbuf properly.

> I suppose you set L4TUNLEN as described in spec for each case, right?

I think so.

> That looks really weird to me and, as far as I can see, completely contradicts
> what the spec says.
> I suppose we'll need to reproduce all those tests on our HW too.
> Could you send us a patch with your changes, so we can try the same thing?
> Or just a dump of TDD and TCD values for each case.

Sure, I'm going to send all my code and tests in a RFC patchset in
a few minutes. By the way, I'm off tomorrow, I won't be able to
answer.

Regards,
Olivier



[dpdk-dev] [PATCH v6 4/4] docs: Add ABI documentation

2015-01-21 Thread Thomas Monjalon
2015-01-21 09:59, Neil Horman:
> On Wed, Jan 21, 2015 at 11:25:48AM +0100, Thomas Monjalon wrote:
> > 2015-01-20 16:17, Neil Horman:
> > > Adding a document describing rudimentary ABI policy and adding notice 
> > > space for
> > > any deprecation announcements
> > > 
> > > Signed-off-by: Neil Horman 
> > > CC: Thomas Monjalon 
> > > CC: "Richardson, Bruce" 
> > > 
> > > ---
> > > Change notes:
> > > 
> > > v5) Updated documentation to add notes from Thomas M.
> > > 
> > > v6) Moved abi.txt to guides/rel_notes/abi.rst
> > 
> > You didn't integrate this file in the index.
> > 
> Siobhan indicated that it's just a plain text file, so I left it as a plain
> text file.  I guess we have different definitions of plain text files.
> 
> > [...]
> > 
> > > --- /dev/null
> > > +++ b/doc/guides/rel_notes/abi.rst
> > > @@ -0,0 +1,38 @@
> > > +ABI policy
> > > +==
> > > + ABI versions are set at the time of major release labeling, and ABI may
> > > +change multiple times between the last labeling and the HEAD label of 
> > > the git
> > > +tree without warning
> > > +
> > > + ABI versions, once released are available until such time as their
> > > +deprecation has been noted here for at least one major release cycle, 
> > > after it
> > > +has been tagged.  E.g. the ABI for DPDK 1.8 is shipped, and then the 
> > > decision to
> > > +remove it is made during the development of DPDK 1.9.  The decision will 
> > > be
> > > +recorded here, shipped with the DPDK 1.9 release, and actually removed 
> > > when DPDK
> > > +1.10 ships.
> > 
> > As previously said, speaking about 2.0/2.1 would be more coherent.
> > 
> As previously mentioned, I really don't see this as relevant, as it will be
> out of date within a release, and I think we can agree, no one is going to
> update this paragraph every release.
> 
> > > +
> > > + ABI versions may be deprecated in whole, or in part as needed by a given
> > > +update.
> > > +
> > > + Some ABI changes may be too significant to reasonably maintain multiple
> > > +versions of.  In those events ABI's may be updated without backward
> > > +compatibility provided.  The requirements for doing so are:
> > > + 1) At least 3 acknoweldgements of the need on the dpdk.org
> > > + 2) A full deprecation cycle must be made to offer downstream consumers
> > > +sufficient warning of the change.  E.g. if dpdk 2.0 is under development 
> > > when
> > > +the change is proposed, a deprecation notice must be added to this file, 
> > > and
> > > +released with dpdk 2.0.  Then the change may be incorporated for dpdk 2.1
> > > + 3) The LIBABIVER variable in the makefilei(s) where the ABI changes are
> > > +incorporated must be incremented in parallel with the ABI changes 
> > > themselves
> > > +
> > > + Note that the above process for ABI deprecation should not be undertaken
> > > +lightly.  ABI stability is extreemely important for downstream consumers 
> > > of the
> > > +DPDK, especially when distributed in shared object form.  Every effort 
> > > should be
> > > +made to preserve ABI whenever possible.  For instance, reorganizing 
> > > public
> > > +structure field for astetic or readability purposes should be avoided as 
> > > it will
> > > +cause ABI breakage.  Only significant (e.g. performance) reasons should 
> > > be seen
> > > +as cause to alter ABI.
> > 
> > When applying the patch, there are these (minor) warnings:
> > 
> > /home/thomas/projects/dpdk/dpdk/.git/rebase-apply/patch:52: trailing 
> > whitespace.
> > /home/thomas/projects/dpdk/dpdk/.git/rebase-apply/patch:55: new blank line 
> > at EOF.
> > 
> > When building the documentation, there are these errors:
> > make doc-guides-html
> > /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:4: WARNING: 
> > Block quote ends without a blank line; unexpected unindent.
> > /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:8: WARNING: 
> > Block quote ends without a blank line; unexpected unindent.
> > /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:15: WARNING: 
> > Block quote ends without a blank line; unexpected unindent.
> > /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:18: WARNING: 
> > Block quote ends without a blank line; unexpected unindent.
> > /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:20: ERROR: 
> > Unexpected indentation.
> > /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:22: WARNING: 
> > Block quote ends without a blank line; unexpected unindent.
> > /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:25: ERROR: 
> > Unexpected indentation.
> > /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:26: WARNING: 
> > Block quote ends without a blank line; unexpected unindent.
> > /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:29: WARNING: 
> > Block quote ends without a blank line; unexpected unindent.
> > /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:: WARNING: 
> > document isn't included in any 

[dpdk-dev] [RFC 00/16] enhance checksum offload API

2015-01-21 Thread Stephen Hemminger
On Thu, 22 Jan 2015 00:36:19 +0100
Olivier Matz  wrote:

> The goal of this series is to clarify and simplify the mbuf offload API.
> Several issues are solved:
> 
> - simplify the definitions of PKT_TX_IP_CKSUM and PKT_TX_IPV4, each
>   flag has now only one meaning. No impact on the code.
> 
> - add a feature flag for OUTER_IP_CHECKSUM (from Jijiang's patches)
> 
> - remove the PKT_TX_UDP_TUNNEL_PKT flag: it is useless from an API point
>   of view. It was added because i40e need this info for some reason. We
>   have 3 solutions:
> 
>   - remove the flag and adapt the driver to the API (the choice I made
> for this series).
> 
>   - remove the flag and stop advertising OUTER_IP_CHECKSUM in i40e
> 
>   - keep this flag, penalizing performance of drivers that do not
> require the flag. It would also mean that drivers won't support
> outer IP checksum for all tunnel types, but only for the tunnel
> types having a flag.
> 
> - a side effect of this API clarification is that there is only one
>   way for doing one operation. If the hardware has several ways to
>   do the same operation, a choice has to be made in the driver.
> 
> The patch adds new tunnel types to testpmd csum forward engine.
> Note: the i40e patches should be carefully checked by i40e experts.
> 
> [1] http://dpdk.org/ml/archives/dev/2015-January/011127.html

If you are doing this, could you invert the meaning of the checksum
flags? Right now the flags are fine for Intel hardware but are useless
for devices that have less checksum support.

It would work better if, instead of two states:
  * Checksum known bad      =>  PKT_RX_L4_CKSUM_BAD == 1
  * Checksum (maybe) good   =>  PKT_RX_L4_CKSUM_BAD == 0
the bit were changed to flag only a good checksum:
  * Checksum known good     =>  PKT_RX_L4_CKSUM_GOOD == 1
  * Checksum status unknown =>  PKT_RX_L4_CKSUM_GOOD == 0

That way code could fall back to a software checksum if the hardware was
incapable of handling the packet. It does mean packets with bad checksums get
checked twice, but that's ok.
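
A sketch of the receive-side logic this would allow. The flag value, the packet structure and the function names are hypothetical, not existing DPDK definitions, and the software check is simplified: it runs a plain Internet checksum over a buffer that is assumed to already contain the pseudo-header and the L4 segment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit for the proposed "known good" semantics. */
#define DEMO_PKT_RX_L4_CKSUM_GOOD (1ULL << 0)

/* Hypothetical, minimal view of a received packet. */
struct demo_pkt {
    uint64_t ol_flags;
    const uint8_t *l4_data; /* pseudo-header + L4 segment (assumed) */
    size_t l4_len;
};

/* 16-bit ones' complement checksum. Returns 0 when the buffer,
 * including its embedded checksum field, is consistent. */
static uint16_t demo_inet_cksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Returns 1 if the L4 checksum is acceptable, 0 otherwise. */
static int demo_l4_cksum_ok(const struct demo_pkt *pkt)
{
    if (pkt->ol_flags & DEMO_PKT_RX_L4_CKSUM_GOOD)
        return 1; /* hardware vouched for the checksum */
    /* Status unknown: fall back to verifying in software. */
    return demo_inet_cksum(pkt->l4_data, pkt->l4_len) == 0;
}
```

A packet that the hardware could not parse, or whose checksum is bad, arrives without the flag and takes the software path, which is the double check mentioned above.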



[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-21 Thread Stephen Hemminger
On Wed, 21 Jan 2015 15:25:40 -0600
Jim Thompson  wrote:

> I'm not as concerned with compile times given the potential performance boost.

Compile time matters. Right now a full build of a large project is fast.
Like 2 minutes or less.


Is this only the test applications (which can be disabled from the build),
or is the library trying to do some tests? Since the build and target environment
will be different on a real product, the whole scheme seems flawed.


[dpdk-dev] [PATCH 0/7] vmxnet3: driver enhancements

2015-01-21 Thread Stephen Hemminger
On Thu, 15 Jan 2015 12:02:11 +0100
Thomas Monjalon  wrote:

> Someone to review these patches?

Any comments from Bruce Richardson?


[dpdk-dev] [PATCH] doc: commands changed in testpmd_funcs for ethertype filter

2015-01-21 Thread Jingjing Wu
new commands for ethertype filter
  - ethertype_filter (port_id) (add|del) (mac_addr|mac_ignr)
(mac_address) ethertype (ether_type) (drop|fwd) queue (queue_id)

Signed-off-by: Jingjing Wu 
---
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 46 +++--
 1 file changed, 11 insertions(+), 35 deletions(-)

diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst 
b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index be935c2..61a7f6d 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -1397,56 +1397,32 @@ add_ethertype_filter

 Add a L2 Ethertype filter, which identify packets by their L2 Ethertype mainly 
assign them to a receive queue.

-add_ethertype_filter (port_id) ethertype (eth_value) priority (enable|disable) 
(pri_value) queue (queue_id) index (idx)
+ethertype_filter (port_id) (add|del) (mac_addr|mac_ignr) (mac_address) 
ethertype (ether_type) (drop|fwd) queue (queue_id)

 The available information parameters are:

 *   port_id:  the port which the Ethertype filter assigned on.

-*   eth_value: the EtherType value want to match,
-for example 0x0806 for ARP packet. 0x0800 (IPv4) and 0x86DD (IPv6) are 
invalid.
-
-*   enable: user priority participates in the match.
-
-*   disable: user priority doesn't participate in the match.
-
-*   pri_value: user priority value that want to match.
-
-*   queue_id : The receive queue associated with this EtherType filter
+*   mac_addr: need compare destination mac address.

-*   index: the index of this EtherType filter
-
-Example:
-
-.. code-block:: console
+*   mac_ignr: ignore destination mac address match.

-testpmd> add_ethertype_filter 0 ethertype 0x0806 priority disable 0 queue 
3 index 0
-Assign ARP packet to receive queue 3
+*   mac_address: destination mac address need to match.

-remove_ethertype_filter
-~~~
-
-Remove a L2 Ethertype filter
-
-remove_ethertype_filter (port_id) index (idx)
-
-get_ethertype_filter
-
-
-Get and display a L2 Ethertype filter
+*   ether_type: the EtherType value want to match,
+for example 0x0806 for ARP packet. 0x0800 (IPv4) and 0x86DD (IPv6) are 
invalid.

-get_ethertype_filter (port_id) index (idx)
+*   queue_id : The receive queue associated with this EtherType filter. It is 
meaningless when deleting or dropping.

 Example:

 .. code-block:: console

-testpmd> get_ethertype_filter 0 index 0
+testpmd> ethertype_filter 0 add mac_ignr ethertype 0x0806 fwd queue 3
+add a rule to assign ARP packet to receive queue 3

-filter[0]:
-ethertype: 0x0806
-priority: disable, 0
-queue: 3
+   testpmd> ethertype_filter 0 del mac_ignr ethertype 0x0806 fwd queue 3
+delete the rule to assign ARP packet to receive queue 3

 add_2tuple_filter
 ~
-- 
1.9.3



[dpdk-dev] DPDK - TX from lcore in packet distributor configuration

2015-01-21 Thread Bruce Richardson
On Wed, Jan 21, 2015 at 03:00:50PM +0100, deco33000 Jog wrote:
> Hello,
> 
> -- PROBLEM
> I have a AF_PACKET socket which is in promiscuous mode to get all the NIC 
> traffic and let my apps do the whole stuff.
> 
> So I have one receiver and need to communicate the packet to different 
> threads/processes (lcore) so that they can process the rest of the packet 
> (tcp/udp...)
> 
> -- QUESTION
> I read about the packet distributor architecture which seems to answer that 
> need.
> http://dpdk.org/doc/guides/prog_guide/packet_distrib_lib.html
> 
> BUT i fear that it be slow at resending the packet to the distributor which 
> may already be overloaded by inputs from the net. Why pass back the answer to 
> the distributor if the lcore could send to the wire directly ?
> 
> My problem is going back to the distributor after the packet processing. i 
> would a direct send to the tx ring.
> 
> Is it possible ? How ?  By passing the TX pointer to the lcore ?

If you set up the NIC so that it has multiple queues, one for each thread, you
can have each worker send directly to the NIC. Without multiple queues, they
could still send directly, they will just have to use locking or some other
mechanism to mediate access to the common TX queue.
The other problem with this approach is that sending packets individually is
almost always slower than sending them in bursts. To mitigate this,
you could look to buffer packets inside the workers before transmitting them
back out, but that could lead to packets being sent out of order - not sure if
that is a problem for you or not.
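
A minimal sketch of the per-worker buffering idea, assuming each worker owns its own TX queue. The `demo_` names and the burst size of 32 are illustrative assumptions; `demo_tx_burst()` is a stub standing in for a real burst transmit call on that queue, and unlike a real driver it always accepts every packet.

```c
#include <stddef.h>

#define DEMO_BURST 32 /* illustrative burst size */

/* Per-worker buffer of packets waiting to be transmitted. */
struct demo_txbuf {
    void *pkts[DEMO_BURST];
    size_t count;
    size_t bursts_sent; /* flush counter, for illustration only */
};

/* Stub for a burst transmit on the worker's own TX queue. */
static size_t demo_tx_burst(void **pkts, size_t n)
{
    (void)pkts;
    return n;
}

/* Send whatever is buffered as one burst. */
static void demo_txbuf_flush(struct demo_txbuf *buf)
{
    if (buf->count == 0)
        return;
    demo_tx_burst(buf->pkts, buf->count);
    buf->bursts_sent++;
    buf->count = 0;
}

/* Queue one packet and transmit only when a full burst is ready. */
static void demo_txbuf_add(struct demo_txbuf *buf, void *pkt)
{
    buf->pkts[buf->count++] = pkt;
    if (buf->count == DEMO_BURST)
        demo_txbuf_flush(buf);
}
```

A worker would also call `demo_txbuf_flush()` when it goes idle so that a partial burst does not sit in the buffer. Since each worker flushes on its own schedule, packets handled by different workers can leave in a different order than they arrived, which is the reordering risk mentioned above.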

The reason the distributor sample app sends the packets back to the distributor
after worker processing is to overcome these limitations. There is no additional
cross-core round trip involved in sending a packet back to the distributor along
with the request, and having the distributor re-gather the packets ensures
ordering within flows/tags is maintained. Thereafter packets can be burst-sent
out an ethernet port.

Out of interest, given you are using the AF_PACKET driver, what rate of packets
per second are you looking at? 

/Bruce


[dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine

2015-01-21 Thread Ananyev, Konstantin
Hi Olivier,

> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Wednesday, January 21, 2015 3:25 PM
> To: Liu, Jijiang; Ananyev, Konstantin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum 
> forwarding engine
> 
> Hi,
> 
> On 01/21/2015 04:12 AM, Liu, Jijiang wrote:
> > Ok, and why it should be our problem?
> > We have a lot of things done in a different manner then
> > linux/freebsd kernel drivers, Why now it became a problem?
> 
>  If linux doesn't need an equivalent flag for doing the same thing, it
>  probably means we don't need it either.
> >>>
> >>> Probably yes  Or probably not.
> >>> Why do we need to guess what was the intention of guys who wrote that
> >> part of linux driver?
> >>
> >> Because the dpdk looks very similar to that part of linux driver.
> >
> > A  guy from Intel  who have already confirmed that the NVGRE is not 
> > supported yet in Linux kernel.
> >
> > He said "So far as I know it is not yet supported and I have no information 
> > on when it will be."
> 
> I added the support of Ether over GRE, IP over GRE and IP over IP
> tunnels in csumonly to do the test. I ask the csum forward engine
> to calculate inner IP+TCP checksums, and outer IP (case 6 in [1]).
> Here are the results:
> 
> 1/ When I use I40E_TXD_CTX_UDP_TUNNELING:
> - vxlan: all checksums ok
> - eth over gre: all checksums ok
> - ip over gre: not transmitted by hw
> - ip over ip: all checksums wrong (set to 0 by hw)
> 
> 2/ When I use I40E_TXD_CTX_GRE_TUNNELING:
> - vxlan: checksums ok
> - eth over gre: all checksums ok
> - ip over gre: all checksums ok
> - ip over ip: all checksums wrong (set to 0 by hw)
> 
> 3/ When I use 00b:
> - vxlan: all checksums ok
> - eth over gre: all checksums ok
> - ip over gre: all checksums ok
> - ip over ip: checksums wrong (set to 0 by hw)

Wow, so there is absolutely no difference in results for L4TUNT=2(GRE) and 
L4TUNT=0, right?
And IP over IP doesn't work at all?
I suppose you set L4TUNLEN as described in spec for each case, right?
That looks really weird to me and, as far as I can see, completely contradicts
what the spec says.
I suppose we'll need to reproduce all those tests on our HW too.
Could you send us a patch with your changes, so we can try the same thing?
Or just a dump of TDD and TCD values for each case.
Konstantin

> 
> All the ip over ip tests do not work yet for an unknown reason.
> There is maybe something wrong in my app or in the driver
> (although the registers looks consistent with the datasheet).
> 
> I think we could use 3/ for all tunnels, because the ipip case
> is supposed to work according to the datasheet, and all other cases
> work too.
> 
> It would allow to remove the UDP_TUNNELING flag from mbuf API.
> 
> I will send a RFC patch that provides the API change and this new
> feature in csum forward engine, with full tests on ixgbe and i40e
> and explanations for all changes.
> 
> Regards,
> Olivier
> 
> [1] http://dpdk.org/ml/archives/dev/2015-January/011127.html



[dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine

2015-01-21 Thread Olivier MATZ
Hi,

On 01/21/2015 04:12 AM, Liu, Jijiang wrote:
> Ok, and why it should be our problem?
> We have a lot of things done in a different manner then
> linux/freebsd kernel drivers, Why now it became a problem?

 If linux doesn't need an equivalent flag for doing the same thing, it
 probably means we don't need it either.
>>>
>>> Probably yes  Or probably not.
>>> Why do we need to guess what was the intention of guys who wrote that
>> part of linux driver?
>>
>> Because the dpdk looks very similar to that part of linux driver.
>
> A  guy from Intel  who have already confirmed that the NVGRE is not supported 
> yet in Linux kernel.
>
> He said "So far as I know it is not yet supported and I have no information 
> on when it will be."

I added the support of Ether over GRE, IP over GRE and IP over IP
tunnels in csumonly to do the test. I ask the csum forward engine
to calculate inner IP+TCP checksums, and outer IP (case 6 in [1]).
Here are the results:

1/ When I use I40E_TXD_CTX_UDP_TUNNELING:
- vxlan: all checksums ok
- eth over gre: all checksums ok
- ip over gre: not transmitted by hw
- ip over ip: all checksums wrong (set to 0 by hw)

2/ When I use I40E_TXD_CTX_GRE_TUNNELING:
- vxlan: checksums ok
- eth over gre: all checksums ok
- ip over gre: all checksums ok
- ip over ip: all checksums wrong (set to 0 by hw)

3/ When I use 00b:
- vxlan: all checksums ok
- eth over gre: all checksums ok
- ip over gre: all checksums ok
- ip over ip: checksums wrong (set to 0 by hw)

None of the ip over ip tests work yet, for an unknown reason.
There may be something wrong in my app or in the driver
(although the registers look consistent with the datasheet).

I think we could use 3/ for all tunnels, because the ipip case
is supposed to work according to the datasheet, and all other cases
work too.

It would allow us to remove the UDP_TUNNELING flag from the mbuf API.

I will send an RFC patch that provides the API change and this new
feature in the csum forward engine, with full tests on ixgbe and i40e
and explanations for all changes.

Regards,
Olivier

[1] http://dpdk.org/ml/archives/dev/2015-January/011127.html



[dpdk-dev] Q on Support for I217 and I218 Intel chipsets.

2015-01-21 Thread Ravi Kerur
Intel team,

Please let me know what additional testing needs to be done for I217/I218.
I have confined the changes to the _osdep_ files and have done basic testing
with the testpmd utility. Since the DPDK PMD driver supporting e1000e has been
available for quite some time, I have assumed that basic testing of Tx/Rx
packets should suffice.

Thanks,
Ravi

On Fri, Jan 16, 2015 at 4:01 AM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

>
>
> > -Original Message-
> > From: Richardson, Bruce
> > Sent: Friday, January 16, 2015 11:32 AM
> > To: Ananyev, Konstantin
> > Cc: Ravi Kerur; Thomas Monjalon; dev at dpdk.org
> > Subject: Re: [dpdk-dev] Q on Support for I217 and I218 Intel chipsets.
> >
> > On Fri, Jan 16, 2015 at 11:08:46AM +, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Richardson, Bruce
> > > > Sent: Friday, January 16, 2015 10:53 AM
> > > > To: Ananyev, Konstantin
> > > > Cc: Ravi Kerur; Thomas Monjalon; dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] Q on Support for I217 and I218 Intel
> chipsets.
> > > >
> > > > On Thu, Jan 15, 2015 at 11:54:52PM +, Ananyev, Konstantin wrote:
> > > > > Hi,
> > > > >
> > > > > > -Original Message-
> > > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ravi Kerur
> > > > > > Sent: Thursday, January 15, 2015 8:34 PM
> > > > > > To: Thomas Monjalon
> > > > > > Cc: dev at dpdk.org
> > > > > > Subject: Re: [dpdk-dev] Q on Support for I217 and I218 Intel
> chipsets.
> > > > > >
> > > > > > On Wed, Jan 14, 2015 at 8:27 AM, Thomas Monjalon <
> thomas.monjalon at 6wind.com>
> > > > > > wrote:
> > > > > >
> > > > > > > 2015-01-09 04:41, Ravi Kerur:
> > > > > > > > Thomas,
> > > > > > > >
> > > > > > > > Please let me know how I can move forward on this. If i
> confine changes
> > > > > > > in
> > > > > > > > e1000/ directory to e1000_osdep.h file only and the rest in
> PMD will that
> > > > > > > > work? The reason I ask is because of following comment  in
> README file.
> > > > > > > >
> > > > > > > > ...
> > > > > > > > Few changes to the original FreeBSD sources were made to:
> > > > > > > > - Adopt it for PMD usage mode:
> > > > > > > > e1000_osdep.c
> > > > > > > > e1000_osdep.h
> > > > > > > > ...
> > > > >
> > > > > Yes, if needed you can modify these files.
> > > > > In fact, these files are the only 2 that are allowed to be
> modified inside e1000 sub-directory.
> > > > > As I understand you plan to implement E1000_READ_FLASH_REG  and
> E1000_WRITE_FLASH_REG
> > > > > macros properly, correct?
> > > > > Konstantin
> > > > >
> > > >
> > > > As a cleanup we should really look to move these two files out of
> the e1000
> > > > subdirectory (and similarly for the ixgbe versions etc.), so as to
> give a cleaner
> > > > and more manageable separation between what can be edited or not.
> > >
> > > It was always like that for all Intel PMDs we have:
> > >
> > > $ find lib/ -name '*_osdep.*' | grep -v acl
> > > lib/librte_pmd_vmxnet3/vmxnet3/vmxnet3_osdep.h
> > > lib/librte_pmd_ixgbe/ixgbe/ixgbe_osdep.h
> > > lib/librte_eal/linuxapp/kni/ethtool/ixgbe/ixgbe_osdep.h
> > > lib/librte_eal/linuxapp/kni/ethtool/igb/e1000_osdep.h
> > > lib/librte_pmd_i40e/i40e/i40e_osdep.h
> > > lib/librte_pmd_e1000/e1000/e1000_osdep.c
> > > lib/librte_pmd_e1000/e1000/e1000_osdep.h
> > >
> > > As I understand ND has it's own version of _osdep.* for each
> OS they support.
> > > We obviously modify it to fit DPDK purposes.
> > >
> > > Konstantin
> > >
> > > >
> > > > /Bruce
> >
> > Yep. Doesn't mean we haven't put it in the wrong place though! :-)
>
> We just don't move it at all :)
> It is at the same place where ND puts it, we just modify the contents.
> From my point - current location is perfectly ok.
> Konstantin
>
> >
> > /Bruce
>


[dpdk-dev] [PATCH v7 4/4] docs: Add ABI documentation

2015-01-21 Thread Neil Horman
Adding a document describing rudimentary ABI policy and adding notice space for
any deprecation announcements

Signed-off-by: Neil Horman 
CC: Thomas Monjalon 
CC: "Richardson, Bruce" 

---
Change notes:

v5) Updated documentation to add notes from Thomas M.

v6) Moved abi.txt to guides/rel_notes/abi.rst

v7) Updated abi.rst to integrate with index file
Updated abi.rst to conform to rst formatting
Updated abi.rst to include example deprecation notices.  It's not exactly the
language that Thomas indicated, but I think it makes the idea clear.
---
 doc/guides/rel_notes/abi.rst | 41 +
 1 file changed, 41 insertions(+)
 create mode 100644 doc/guides/rel_notes/abi.rst
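
Reader note (not part of the patch): the policy's warning about reorganizing
public structure fields can be illustrated with a minimal sketch. The two
structures below are hypothetical and do not exist in DPDK; the sketch only
assumes a common C ABI in which fields are laid out in declaration order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical public structure as shipped in an older release. */
struct pkt_info_v1 {
	uint16_t port;
	uint16_t queue;
	uint32_t flags;
	uint64_t timestamp;
};

/* Same fields, reordered "for readability" in a newer release. */
struct pkt_info_v2 {
	uint64_t timestamp;
	uint32_t flags;
	uint16_t port;
	uint16_t queue;
};

/*
 * Returns 1 if a binary compiled against the v1 layout would look for
 * 'flags' at a different offset than the v2 layout uses, 0 otherwise.
 */
static int flags_offset_changed(void)
{
	return offsetof(struct pkt_info_v1, flags) !=
	       offsetof(struct pkt_info_v2, flags);
}
```

On common ABIs both structures have the same size, so the change is invisible
to sizeof() checks, yet an application built against the first layout would
read 'flags' from bytes that now belong to 'timestamp'.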

diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
new file mode 100644
index 000..9b72719
--- /dev/null
+++ b/doc/guides/rel_notes/abi.rst
@@ -0,0 +1,41 @@
+ABI policy
+==========
+ABI versions are set at the time of major release labeling, and ABI may change
+multiple times between the last labeling and the HEAD label of the git tree
+without warning.
+
+ABI versions, once released, are available until such time as their
+deprecation has been noted here for at least one major release cycle, after it
+has been tagged.  E.g. the ABI for DPDK 1.8 is shipped, and then the decision to
+remove it is made during the development of DPDK 1.9.  The decision will be
+recorded here, shipped with the DPDK 1.9 release, and actually removed when DPDK
+1.10 ships.
+
+ABI versions may be deprecated in whole, or in part as needed by a given update.
+
+Some ABI changes may be too significant to reasonably maintain multiple
+versions of.  In those events ABIs may be updated without backward
+compatibility provided.  The requirements for doing so are:
+
+#. At least 3 acknowledgements of the need on the dpdk.org mailing list
+#. A full deprecation cycle must be made to offer downstream consumers sufficient warning of the change.  E.g. if dpdk 2.0 is under development when the change is proposed, a deprecation notice must be added to this file, and released with dpdk 2.0.  Then the change may be incorporated for dpdk 2.1
+#. The LIBABIVER variable in the makefile(s) where the ABI changes are incorporated must be incremented in parallel with the ABI changes themselves
+
+Note that the above process for ABI deprecation should not be undertaken
+lightly.  ABI stability is extremely important for downstream consumers of the
+DPDK, especially when distributed in shared object form.  Every effort should be
+made to preserve ABI whenever possible.  For instance, reorganizing public
+structure fields for aesthetic or readability purposes should be avoided as it will
+cause ABI breakage.  Only significant (e.g. performance) reasons should be seen
+as cause to alter ABI.
+
+Examples of Deprecation notices
+-------------------------------
+* The Macro #RTE_FOO is deprecated and will be removed with version 2.0, to be replaced with the inline function rte_bar()
+* The function rte_mbuf_grok has been updated to include a new parameter in version 2.0.  Backwards compatibility will be maintained for this function until the release of version 2.1
+* The members of struct foo have been reorganized in release 2.0.  Existing binary applications will have backwards compatibility in release 2.0, while newly built binaries will need to reference the new structure variant struct foo2.  Compatibility will be removed in release 2.2, and all applications will require updating and rebuilding to the new structure at that time, which will be renamed to the original struct foo.
+* Significant ABI changes are planned for the librte_dostuff library.  The upcoming release 2.0 will not contain these changes, but release 2.1 will, and no backwards compatibility is planned due to the invasive nature of these changes.  Binaries using this library built prior to version 2.1 will require updating and recompilation.
+
+Deprecation Notices
+-------------------
+
-- 
2.1.0



[dpdk-dev] [PATCH v7 3/4] Add library version extenstion

2015-01-21 Thread Neil Horman
To differentiate libraries that break ABI, we add a library version number
suffix to the library, which must be incremented when a given library's ABI is
broken.  This patch enforces that addition, sets the initial abi soname
extension to 1 for each library and creates a symlink to the base SONAME so that
the test applications will link properly.

Signed-off-by: Neil Horman 
CC: Thomas Monjalon 
CC: "Richardson, Bruce" 

---
Change Notes:
v3)
Made symlinking of libraries conditional on a DSO build

v4) Removed erroneous newline
changed @exit 1 to @false
changed ./$(LIB) to $<
---
 lib/librte_acl/Makefile  |  2 ++
 lib/librte_cfgfile/Makefile  |  2 ++
 lib/librte_cmdline/Makefile  |  2 ++
 lib/librte_compat/Makefile   |  2 ++
 lib/librte_distributor/Makefile  |  2 ++
 lib/librte_eal/bsdapp/eal/Makefile   |  2 ++
 lib/librte_eal/linuxapp/eal/Makefile |  2 ++
 lib/librte_ether/Makefile|  2 ++
 lib/librte_hash/Makefile |  2 ++
 lib/librte_ip_frag/Makefile  |  2 ++
 lib/librte_ivshmem/Makefile  |  2 ++
 lib/librte_kni/Makefile  |  2 ++
 lib/librte_kvargs/Makefile   |  2 ++
 lib/librte_lpm/Makefile  |  2 ++
 lib/librte_malloc/Makefile   |  2 ++
 lib/librte_mbuf/Makefile |  2 ++
 lib/librte_mempool/Makefile  |  2 ++
 lib/librte_meter/Makefile|  2 ++
 lib/librte_pipeline/Makefile |  2 ++
 lib/librte_pmd_af_packet/Makefile|  2 ++
 lib/librte_pmd_bond/Makefile |  2 ++
 lib/librte_pmd_e1000/Makefile|  2 ++
 lib/librte_pmd_enic/Makefile |  2 ++
 lib/librte_pmd_i40e/Makefile |  2 ++
 lib/librte_pmd_ixgbe/Makefile|  2 ++
 lib/librte_pmd_pcap/Makefile |  2 ++
 lib/librte_pmd_ring/Makefile |  2 ++
 lib/librte_pmd_virtio/Makefile   |  2 ++
 lib/librte_pmd_vmxnet3/Makefile  |  2 ++
 lib/librte_pmd_xenvirt/Makefile  |  2 ++
 lib/librte_port/Makefile |  2 ++
 lib/librte_power/Makefile|  2 ++
 lib/librte_ring/Makefile |  2 ++
 lib/librte_sched/Makefile|  2 ++
 lib/librte_table/Makefile|  2 ++
 lib/librte_timer/Makefile|  2 ++
 lib/librte_vhost/Makefile|  2 ++
 mk/rte.lib.mk| 12 ++--
 38 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
index 45cbf80..765deb1 100644
--- a/lib/librte_acl/Makefile
+++ b/lib/librte_acl/Makefile
@@ -39,6 +39,8 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)

 EXPORT_MAP := rte_acl_version.map

+LIBABIVER := 1
+
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += tb_mem.c

diff --git a/lib/librte_cfgfile/Makefile b/lib/librte_cfgfile/Makefile
index a4f73de..032c240 100644
--- a/lib/librte_cfgfile/Makefile
+++ b/lib/librte_cfgfile/Makefile
@@ -41,6 +41,8 @@ CFLAGS += $(WERROR_FLAGS)

 EXPORT_MAP := rte_cfgfile_version.map

+LIBABIVER := 1
+
 #
 # all source are stored in SRCS-y
 #
diff --git a/lib/librte_cmdline/Makefile b/lib/librte_cmdline/Makefile
index 3c71831..719dff6 100644
--- a/lib/librte_cmdline/Makefile
+++ b/lib/librte_cmdline/Makefile
@@ -38,6 +38,8 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3

 EXPORT_MAP := rte_cmdline_version.map

+LIBABIVER := 1
+
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) := cmdline.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += cmdline_cirbuf.c
diff --git a/lib/librte_compat/Makefile b/lib/librte_compat/Makefile
index 0bab870..0c57533 100644
--- a/lib/librte_compat/Makefile
+++ b/lib/librte_compat/Makefile
@@ -32,6 +32,8 @@
 include $(RTE_SDK)/mk/rte.vars.mk


+LIBABIVER := 1
+
 # install includes
 SYMLINK-y-include := rte_compat.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 3674a2c..4c9af17 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -39,6 +39,8 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)

 EXPORT_MAP := rte_distributor_version.map

+LIBABIVER := 1
+
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c

diff --git a/lib/librte_eal/bsdapp/eal/Makefile 
b/lib/librte_eal/bsdapp/eal/Makefile
index 0b5f9d9..ae214a4 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -48,6 +48,8 @@ CFLAGS += $(WERROR_FLAGS) -O3

 EXPORT_MAP := rte_eal_version.map

+LIBABIVER := 1
+
 # specific to linuxapp exec-env
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) := eal.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_memory.c
diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index bae8af1..e117cec 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -35,6 +35,8 @@ LIB = librte_eal.a

 EXPORT_MAP := rte_eal_version.map

+LIBABIVER := 1
+
 VPATH += $(RTE_SDK)/lib/librte_eal/common

 CFLAGS += 

[dpdk-dev] [PATCH v7 1/4] compat: Add infrastructure to support symbol versioning

2015-01-21 Thread Neil Horman
Add initial pass header files to support symbol versioning.

Signed-off-by: Neil Horman 
CC: Thomas Monjalon 
CC: "Richardson, Bruce" 
CC: "Gonzalez Monroy, Sergio" 

---
Change Notes:
V2)
Moved ifeq to _INSTALL target

V3)
Undo V2 changes and make librte_compat use the rte.install.mk file
instead

v4)
changed --version-script to accept SRCDIR in this patch as per request
documented versioning macros
cleaned up macro parameter consistency
converted SA macro to RTE_STR macro
fixed copyright
---
 lib/Makefile   |   1 +
 lib/librte_compat/Makefile |  38 +
 lib/librte_compat/rte_compat.h | 117 +
 mk/rte.lib.mk  |   4 ++
 4 files changed, 160 insertions(+)
 create mode 100644 lib/librte_compat/Makefile
 create mode 100644 lib/librte_compat/rte_compat.h

diff --git a/lib/Makefile b/lib/Makefile
index 0ffc982..d617d81 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -31,6 +31,7 @@

 include $(RTE_SDK)/mk/rte.vars.mk

+DIRS-y += librte_compat
 DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_MALLOC) += librte_malloc
 DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring
diff --git a/lib/librte_compat/Makefile b/lib/librte_compat/Makefile
new file mode 100644
index 000..0bab870
--- /dev/null
+++ b/lib/librte_compat/Makefile
@@ -0,0 +1,38 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2013 Neil Horman 
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+
+# install includes
+SYMLINK-y-include := rte_compat.h
+
+include $(RTE_SDK)/mk/rte.install.mk
diff --git a/lib/librte_compat/rte_compat.h b/lib/librte_compat/rte_compat.h
new file mode 100644
index 000..d7cc176
--- /dev/null
+++ b/lib/librte_compat/rte_compat.h
@@ -0,0 +1,117 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010 Neil Horman .
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF 

[dpdk-dev] [PATCH v7 01/26] version: 2.0.0-rc0

2015-01-21 Thread Neil Horman
On Wed, Jan 21, 2015 at 03:57:25PM -0500, Neil Horman wrote:
> From: Thomas Monjalon 
> 
> Signed-off-by: Thomas Monjalon 
> ---
>  lib/librte_eal/common/include/rte_version.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_version.h 
> b/lib/librte_eal/common/include/rte_version.h
> index d2686ae..a267040 100644
> --- a/lib/librte_eal/common/include/rte_version.h
> +++ b/lib/librte_eal/common/include/rte_version.h
> @@ -54,12 +54,12 @@ extern "C" {
>  /**
>   * Major version number i.e. the x in x.y.z
>   */
> -#define RTE_VER_MAJOR 1
> +#define RTE_VER_MAJOR 2
>  
>  /**
>   * Minor version number i.e. the y in x.y.z
>   */
> -#define RTE_VER_MINOR 8
> +#define RTE_VER_MINOR 0
>  
>  /**
>   * Patch level number i.e. the z in x.y.z
> @@ -69,14 +69,14 @@ extern "C" {
>  /**
>   * Extra string to be appended to version number
>   */
> -#define RTE_VER_SUFFIX ""
> +#define RTE_VER_SUFFIX "-rc"
>  
>  /**
>   * Patch release number
>   *   0-15 = release candidates
>   *   16   = release
>   */
> -#define RTE_VER_PATCH_RELEASE 16
> +#define RTE_VER_PATCH_RELEASE 0
>  
>  /**
>   * Macro to compute a version number usable for comparisons
> -- 
> 2.1.0
> 
> 
Shoot, sorry, excuse this post; something just went sideways with my git tree
here.
Neil
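
For readers wondering how the "0-15 = release candidates, 16 = release"
convention in the quoted diff plays out in version comparisons, here is a
minimal sketch. The VER_NUM macro and its one-byte-per-field packing are
assumptions made for illustration, as is a patch level of 0 for both releases;
the actual comparison macro is not shown in the diff.

```c
#include <stdint.h>

/* Hypothetical packing: one byte per field, most significant first. */
#define VER_NUM(major, minor, patch, release) \
	((uint32_t)(major) << 24 | (uint32_t)(minor) << 16 | \
	 (uint32_t)(patch) << 8 | (uint32_t)(release))

/* Major/minor/release values taken from the quoted diff. */
#define VER_1_8_0     VER_NUM(1, 8, 0, 16) /* 16 = final release */
#define VER_2_0_0_RC0 VER_NUM(2, 0, 0, 0)  /* 0-15 = release candidates */
#define VER_2_0_0     VER_NUM(2, 0, 0, 16)

/* Returns 1 if version 'have' is the same as or newer than 'want'. */
static int version_is_at_least(uint32_t have, uint32_t want)
{
	return have >= want;
}
```

Under this packing, 1.8.0 sorts before 2.0.0-rc0, and every release candidate
sorts before the final 2.0.0, which is why the final release uses 16 rather
than 0 in the last field.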



[dpdk-dev] [PATCH v7 10/26] nic_uio: fix thread structure compatibility for future FreeBSD

2015-01-21 Thread Neil Horman
From: Bruce Richardson 

Replace d_thread_t with struct thread in nic_uio.

Ref: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196691
Quote:
"The d_thread_t typedef is a compat shim to support FreeBSD 4.x.
I'm planning to remove this shim from 11 and dpdk is very unlikely
to ever be ported to 4.x.
If it does it will need far more changes than just d_thread_t"

Reported-by: John Baldwin 
Signed-off-by: Bruce Richardson 
---
 lib/librte_eal/bsdapp/nic_uio/nic_uio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/nic_uio/nic_uio.c 
b/lib/librte_eal/bsdapp/nic_uio/nic_uio.c
index ed11d84..5ae8560 100644
--- a/lib/librte_eal/bsdapp/nic_uio/nic_uio.c
+++ b/lib/librte_eal/bsdapp/nic_uio/nic_uio.c
@@ -175,13 +175,13 @@ nic_uio_mmap_single(struct cdev *cdev, vm_ooffset_t 
*offset, vm_size_t size,


 int
-nic_uio_open(struct cdev *dev, int oflags, int devtype, d_thread_t *td)
+nic_uio_open(struct cdev *dev, int oflags, int devtype, struct thread *td)
 {
return 0;
 }

 int
-nic_uio_close(struct cdev *dev, int fflag, int devtype, d_thread_t *td)
+nic_uio_close(struct cdev *dev, int fflag, int devtype, struct thread *td)
 {
return 0;
 }
-- 
2.1.0



[dpdk-dev] [PATCH v7 09/26] app/testpmd: remove duplicated function for list parsing

2015-01-21 Thread Neil Horman
From: Bruce Richardson 

There were two static functions called "parse_item_list" in testpmd app.
Since one was a superset of the functionality of the other, we can
collapse the two calls down into a single one, shared between the two
C files.

Signed-off-by: Bruce Richardson 
Acked-by: Pablo de Lara 
---
 app/test-pmd/cmdline.c|  2 +-
 app/test-pmd/parameters.c | 49 ++-
 app/test-pmd/testpmd.h|  3 +++
 3 files changed, 6 insertions(+), 48 deletions(-)
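
Reader note (not part of the patch): the idea of one parser serving both
callers through an item name and a uniqueness flag can be sketched as follows.
This is a hypothetical stand-in named parse_uint_list, not the code kept in
cmdline.c, whose body is not shown in this diff; the error messages and the
"return 0 on any error" convention are assumptions.

```c
#include <stdio.h>

/*
 * Split a comma-separated list of decimal numbers into 'items'.
 * Returns the number of items parsed, or 0 on any error. When
 * 'check_unique' is non-zero, duplicate values are rejected.
 */
static unsigned int
parse_uint_list(const char *str, const char *item_name,
		unsigned int max_items, unsigned int *items,
		int check_unique)
{
	unsigned int nb = 0, value = 0, i, j;
	int have_digit = 0;

	for (i = 0; ; i++) {
		char c = str[i];

		if (c >= '0' && c <= '9') {
			value = value * 10 + (unsigned int)(c - '0');
			have_digit = 1;
			continue;
		}
		if (c != ',' && c != '\0') {
			printf("character %c is not a decimal digit\n", c);
			return 0;
		}
		if (!have_digit) {
			printf("missing %s value\n", item_name);
			return 0;
		}
		if (nb >= max_items) {
			printf("too many %s (max %u)\n", item_name, max_items);
			return 0;
		}
		/* A comma or the end of the string closes the current item. */
		items[nb++] = value;
		value = 0;
		have_digit = 0;
		if (c == '\0')
			break;
	}

	if (check_unique) {
		for (i = 0; i < nb; i++)
			for (j = i + 1; j < nb; j++)
				if (items[i] == items[j]) {
					printf("duplicated %s %u\n",
					       item_name, items[i]);
					return 0;
				}
	}
	return nb;
}
```

A caller that does not care about duplicates, such as the txpkt segment
lengths, would pass 0 as the last argument, while a core or port list would
pass 1. The sketch does not guard against numeric overflow.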

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 882a5a2..4618b92 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2224,7 +2224,7 @@ cmdline_parse_inst_t cmd_stop = {

 /* *** SET CORELIST and PORTLIST CONFIGURATION *** */

-static unsigned int
+unsigned int
 parse_item_list(char* str, const char* item_name, unsigned int max_items,
unsigned int *parsed_items, int check_unique_values)
 {
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index fcb2c99..adf3203 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -491,52 +491,6 @@ parse_ringnuma_config(const char *q_arg)
return 0;
 }

-static unsigned int
-parse_item_list(char* str, unsigned int max_items, unsigned int *parsed_items)
-{
-   unsigned int nb_item;
-   unsigned int value;
-   unsigned int i;
-   int value_ok;
-   char c;
-
-   /*
-* First parse all items in the list and store their value.
-*/
-   value = 0;
-   nb_item = 0;
-   value_ok = 0;
-   for (i = 0; i < strlen(str); i++) {
-   c = str[i];
-   if ((c >= '0') && (c <= '9')) {
-   value = (unsigned int) (value * 10 + (c - '0'));
-   value_ok = 1;
-   continue;
-   }
-   if (c != ',') {
-   printf("character %c is not a decimal digit\n", c);
-   return (0);
-   }
-   if (! value_ok) {
-   printf("No valid value before comma\n");
-   return (0);
-   }
-   if (nb_item < max_items) {
-   parsed_items[nb_item] = value;
-   value_ok = 0;
-   value = 0;
-   }
-   nb_item++;
-   }
-
-   if (nb_item >= max_items)
-   rte_exit(EXIT_FAILURE, "too many txpkt segments!\n");
-
-   parsed_items[nb_item++] = value;
-
-   return (nb_item);
-}
-
 void
 launch_args_parse(int argc, char** argv)
 {
@@ -1050,7 +1004,8 @@ launch_args_parse(int argc, char** argv)
unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
unsigned int nb_segs;

-   nb_segs = parse_item_list(optarg, 
RTE_MAX_SEGS_PER_PKT, seg_lengths);
+   nb_segs = parse_item_list(optarg, "txpkt 
segments",
+   RTE_MAX_SEGS_PER_PKT, 
seg_lengths, 0);
if (nb_segs > 0)
set_tx_pkt_segments(seg_lengths, 
nb_segs);
else
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index f8b0740..8f5e6c7 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -446,6 +446,9 @@ port_pci_reg_write(struct rte_port *port, uint32_t reg_off, 
uint32_t reg_v)
port_pci_reg_write([(pt_id)], (reg_off), (reg_value))

 /* Prototypes */
+unsigned int parse_item_list(char* str, const char* item_name,
+   unsigned int max_items,
+   unsigned int *parsed_items, int check_unique_values);
 void launch_args_parse(int argc, char** argv);
 void prompt(void);
 void nic_stats_display(portid_t port_id);
-- 
2.1.0



[dpdk-dev] [PATCH v7 08/26] bond: fix vlan flag interpretation

2015-01-21 Thread Neil Horman
From: Declan Doherty 

This patch contains a fix for link bonding handling of vlan tagged packets in
mode 3 and 5. Currently the xmit_slave_hash function misinterprets the
PKT_RX_VLAN_PKT flag to mean that there is a vlan tag within the packet, when it
actually means that there is a valid entry in the vlan_tci field in the mbuf.

- Fixes VLAN tag support in hashing functions.
- Adds support for TCP in layer 4 header hashing.
- Splits transmit hashing function into separate functions for each policy to
  reduce branching and to make the code clearer.
- Fixes incorrect flag set in test application packet generator.

Test report: http://dpdk.org/ml/archives/dev/2015-January/010792.html

Signed-off-by: Declan Doherty 
Acked-by: Pawel Wodkowski 
Tested-by: SunX Jiajia 
---
 app/test/packet_burst_generator.c  |   2 +-
 lib/librte_net/rte_ip.h|   8 ++
 lib/librte_pmd_bond/rte_eth_bond_api.c |   8 ++
 lib/librte_pmd_bond/rte_eth_bond_pmd.c | 162 -
 lib/librte_pmd_bond/rte_eth_bond_private.h |  15 +++
 5 files changed, 122 insertions(+), 73 deletions(-)
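
Reader note (not part of the patch): the commit message implies that the hash
functions should inspect the EtherType in the packet data instead of trusting
an mbuf flag, and the new rte_ip.h macros describe how the IPv4 header length
is derived. A minimal sketch of both ideas, assuming raw frame bytes rather
than the mbuf and ether_hdr structures; the names vlan_offset, ipv4_hdr_len
and SK_ETHER_TYPE_VLAN are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_ETHER_TYPE_VLAN 0x8100 /* 802.1Q tag protocol identifier */
#define SK_VLAN_HDR_LEN    4      /* TCI + encapsulated EtherType */

/* Read a big-endian 16-bit field without assuming alignment. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the number of bytes occupied by VLAN tags that follow the two
 * MAC addresses in 'frame' (0 if untagged). Stacked tags are skipped
 * for as long as the next EtherType is itself the VLAN type.
 */
static size_t vlan_offset(const uint8_t *frame, size_t len)
{
	size_t off = 0;
	size_t type_pos = 12; /* EtherType follows the two 6-byte MACs */

	while (type_pos + 2 <= len &&
	       read_be16(frame + type_pos) == SK_ETHER_TYPE_VLAN) {
		off += SK_VLAN_HDR_LEN;
		type_pos += SK_VLAN_HDR_LEN;
	}
	return off;
}

/* IPv4 header length in bytes: low nibble of version_ihl times 4. */
static size_t ipv4_hdr_len(uint8_t version_ihl)
{
	return (size_t)(version_ihl & 0x0f) * 4;
}
```

For an untagged frame vlan_offset() yields 0, and for a version_ihl byte of
0x45 (IPv4, no options) ipv4_hdr_len() yields 20, so the layer 4 ports would
be found 20 bytes after the start of the IP header.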

diff --git a/app/test/packet_burst_generator.c 
b/app/test/packet_burst_generator.c
index b2824dc..4a89663 100644
--- a/app/test/packet_burst_generator.c
+++ b/app/test/packet_burst_generator.c
@@ -97,7 +97,7 @@ initialize_eth_header(struct ether_hdr *eth_hdr, struct 
ether_addr *src_mac,
vhdr->eth_proto =  rte_cpu_to_be_16(ETHER_TYPE_IPv4);
vhdr->vlan_tci = van_id;
} else {
-   eth_hdr->ether_type = rte_cpu_to_be_16(ETHER_TYPE_VLAN);
+   eth_hdr->ether_type = rte_cpu_to_be_16(ETHER_TYPE_IPv4);
}

 }
diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
index f0ec543..64935d9 100644
--- a/lib/librte_net/rte_ip.h
+++ b/lib/librte_net/rte_ip.h
@@ -110,6 +110,14 @@ struct ipv4_hdr {
   (((c) & 0xff) << 8)  | \
   ((d) & 0xff))

+/** Internet header length mask for version_ihl field */
+#define IPV4_HDR_IHL_MASK  (0x0f)
+/**
+ * Internet header length field multiplier (IHL field specifies overall header
+ * length in number of 4-byte words)
+ */
+#define IPV4_IHL_MULTIPLIER(4)
+
 /* Fragment Offset * Flags. */
 #defineIPV4_HDR_DF_SHIFT   14
 #defineIPV4_HDR_MF_SHIFT   13
diff --git a/lib/librte_pmd_bond/rte_eth_bond_api.c 
b/lib/librte_pmd_bond/rte_eth_bond_api.c
index c2a99a3..4ab3267 100644
--- a/lib/librte_pmd_bond/rte_eth_bond_api.c
+++ b/lib/librte_pmd_bond/rte_eth_bond_api.c
@@ -272,6 +272,7 @@ rte_eth_bond_create(const char *name, uint8_t mode, uint8_t 
socket_id)
internals->mode = BONDING_MODE_INVALID;
internals->current_primary_port = 0;
internals->balance_xmit_policy = BALANCE_XMIT_POLICY_LAYER2;
+   internals->xmit_hash = xmit_l2_hash;
internals->user_defined_mac = 0;
internals->link_props_set = 0;

@@ -714,9 +715,16 @@ rte_eth_bond_xmit_policy_set(uint8_t bonded_port_id, 
uint8_t policy)

switch (policy) {
case BALANCE_XMIT_POLICY_LAYER2:
+   internals->balance_xmit_policy = policy;
+   internals->xmit_hash = xmit_l2_hash;
+   break;
case BALANCE_XMIT_POLICY_LAYER23:
+   internals->balance_xmit_policy = policy;
+   internals->xmit_hash = xmit_l23_hash;
+   break;
case BALANCE_XMIT_POLICY_LAYER34:
internals->balance_xmit_policy = policy;
+   internals->xmit_hash = xmit_l34_hash;
break;

default:
diff --git a/lib/librte_pmd_bond/rte_eth_bond_pmd.c 
b/lib/librte_pmd_bond/rte_eth_bond_pmd.c
index bb4a537..e9cec2a 100644
--- a/lib/librte_pmd_bond/rte_eth_bond_pmd.c
+++ b/lib/librte_pmd_bond/rte_eth_bond_pmd.c
@@ -31,6 +31,8 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 #include 
+#include 
+
 #include 
 #include 
 #include 
@@ -48,6 +50,9 @@
 #include "rte_eth_bond_8023ad_private.h"

 #define REORDER_PERIOD_MS 10
+
+#define HASH_L4_PORTS(h) ((h)->src_port ^ (h)->dst_port)
+
 /* Table for statistics in mode 5 TLB */
 static uint64_t tlb_last_obytets[RTE_MAX_ETHPORTS];

@@ -276,90 +281,105 @@ ipv6_hash(struct ipv6_hdr *ipv6_hdr)
(word_src_addr[3] ^ word_dst_addr[3]);
 }

-static uint32_t
-udp_hash(struct udp_hdr *hdr)
+static inline size_t
+get_vlan_offset(struct ether_hdr *eth_hdr)
 {
-   return hdr->src_port ^ hdr->dst_port;
+   size_t vlan_offset = 0;
+
+   /* Calculate VLAN offset */
+   if (rte_cpu_to_be_16(ETHER_TYPE_VLAN) == eth_hdr->ether_type) {
+   struct vlan_hdr *vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);
+   vlan_offset = sizeof(struct vlan_hdr);
+
+   while (rte_cpu_to_be_16(ETHER_TYPE_VLAN) ==
+   vlan_hdr->eth_proto) {
+   

[dpdk-dev] [PATCH v7 07/26] vfio: avoid enabling while the module is not loaded

2015-01-21 Thread Neil Horman
From: Michael Qiu 

When the vfio module is not loaded but the kernel supports the VFIO feature,
the routine still tries to open the container to get a file
descriptor.

This is not safe, and of course produces error messages:

EAL: Detected 40 lcore(s)
EAL:   unsupported IOMMU type!
EAL: VFIO support could not be initialized
EAL: Setting up memory...

This may confuse users; this patch makes the behavior more reasonable
and much smoother for the user.

Signed-off-by: Michael Qiu 
Acked-by: Anatoly Burakov 
---
 lib/librte_eal/common/eal_private.h| 14 ++
 lib/librte_eal/linuxapp/eal/eal.c  | 27 +++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 24 ++--
 3 files changed, 63 insertions(+), 2 deletions(-)
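
Reader note (not part of the patch): the module check described above boils
down to matching the first token of each line in /proc/modules. A minimal
sketch under that assumption follows; module_listed is a hypothetical helper
that takes an already opened stream so it can be exercised on any file, and it
loops on the read result instead of on feof().

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan a stream formatted like /proc/modules (module name first on
 * each line) for 'module_name'.
 * Returns 1 if found, 0 if not found, -1 on bad arguments.
 */
static int
module_listed(FILE *f, const char *module_name)
{
	char line[256];
	char name[64];
	int at_line_start = 1;

	if (f == NULL || module_name == NULL)
		return -1;

	while (fgets(line, sizeof(line), f) != NULL) {
		size_t n = strlen(line);

		/* Only the first token of a real line is a module name. */
		if (at_line_start && sscanf(line, "%63s", name) == 1 &&
		    strcmp(name, module_name) == 0)
			return 1;
		/* A chunk without '\n' means the line continues. */
		at_line_start = (n > 0 && line[n - 1] == '\n');
	}
	return 0;
}
```

A caller would open "/proc/modules" with fopen(), pass the stream together
with a name such as "vfio_iommu_type1", and skip VFIO setup when the result
is 0.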

diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index 232fcec..159cd66 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -203,4 +203,18 @@ int rte_eal_alarm_init(void);
  */
 int rte_eal_dev_init(void);

+/**
+ * Check whether a kernel module (e.g. vfio, vfio_iommu_type1)
+ * is loaded.
+ *
+ * @param module_name
+ * The name of the module to check
+ *
+ * @return
+ * -1 if an error occurred (NULL pointer or open failure)
+ * 0  if the module is not loaded
+ * 1  if the module is loaded
+ */
+int rte_eal_check_module(const char *module_name);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 2fb1acc..648ef81 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -859,3 +859,30 @@ int rte_eal_has_hugepages(void)
 {
return ! internal_config.no_hugetlbfs;
 }
+
+int
+rte_eal_check_module(const char *module_name)
+{
+   char mod_name[30]; /* Can any module name be longer than 30 bytes? */
+   int ret = 0;
+
+   if (NULL == module_name)
+   return -1;
+
+   FILE *fd = fopen("/proc/modules", "r");
+   if (NULL == fd) {
+   RTE_LOG(ERR, EAL, "Open /proc/modules failed!"
+   " error %i (%s)\n", errno, strerror(errno));
+   return -1;
+   }
+   /* feof() is only set after a read fails, so test fscanf() itself */
+   while (fscanf(fd, "%29s %*[^\n]", mod_name) == 1) {
+   if (!strcmp(mod_name, module_name)) {
+   ret = 1;
+   break;
+   }
+   }
+   fclose(fd);
+
+   return ret;
+}
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index c1246e8..20e0977 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "eal_filesystem.h"
 #include "eal_pci_init.h"
@@ -340,9 +341,11 @@ pci_vfio_get_container_fd(void)
if (ret != 1) {
if (ret < 0)
RTE_LOG(ERR, EAL, "  could not get IOMMU type, "
-   "error %i (%s)\n", errno, 
strerror(errno));
+   "error %i (%s)\n", errno,
+   strerror(errno));
else
-   RTE_LOG(ERR, EAL, "  unsupported IOMMU 
type!\n");
+   RTE_LOG(ERR, EAL, "  unsupported IOMMU type "
+   "detected in VFIO\n");
close(vfio_container_fd);
return -1;
}
@@ -783,11 +786,28 @@ pci_vfio_enable(void)
 {
/* initialize group list */
int i;
+   int module_vfio_type1;

for (i = 0; i < VFIO_MAX_GROUPS; i++) {
vfio_cfg.vfio_groups[i].fd = -1;
vfio_cfg.vfio_groups[i].group_no = -1;
}
+
+   module_vfio_type1 = rte_eal_check_module("vfio_iommu_type1");
+
+   /* return error directly */
+   if (module_vfio_type1 == -1) {
+   RTE_LOG(INFO, EAL, "Could not get loaded module details!\n");
+   return -1;
+   }
+
+   /* return 0 if VFIO modules not loaded */
+   if (module_vfio_type1 == 0) {
+   RTE_LOG(INFO, EAL, "VFIO modules not all loaded, "
+   "skip VFIO support...\n");
+   return 0;
+   }
+
vfio_cfg.vfio_container_fd = pci_vfio_get_container_fd();

/* check if we have VFIO driver enabled */
-- 
2.1.0
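
Note on the /proc/modules scan in the patch above: a loop of the form
"while (!feof(fd)) fscanf(...)" can compare a stale buffer after the last
read fails, and the return value of fscanf() is never checked. A minimal
sketch of a line-based alternative follows. It is an illustration only, not
the code that was merged; scan_modules() is a hypothetical helper name, and
it takes an already opened stream so that it can be exercised without
/proc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch only: scan a /proc/modules-style stream for module_name.
 * Returns 1 if found, 0 if not found, -1 if module_name is NULL.
 * scan_modules() is a hypothetical helper, not part of the patch.
 */
static int
scan_modules(FILE *fp, const char *module_name)
{
	char line[256];
	char mod_name[64];
	int at_line_start = 1;

	assert(fp != NULL); /* the caller is expected to check fopen() */
	if (module_name == NULL)
		return -1;

	/* fgets() returns NULL at EOF, so no stale buffer is compared. */
	while (fgets(line, sizeof(line), fp) != NULL) {
		size_t len = strlen(line);

		/* Only the first token of a real line is a module name. */
		if (at_line_start &&
		    sscanf(line, "%63s", mod_name) == 1 &&
		    strcmp(mod_name, module_name) == 0)
			return 1;

		/* A chunk without '\n' means the line continues. */
		at_line_start = (len > 0 && line[len - 1] == '\n');
	}
	return 0;
}
```

A caller would fopen("/proc/modules", "r"), pass the stream to
scan_modules(), and fclose() it afterwards, keeping the -1/0/1 return
convention used by rte_eal_check_module() in the patch.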



[dpdk-dev] [PATCH v7 06/26] log: remove unnecessary stubs

2015-01-21 Thread Neil Horman
From: Stephen Hemminger 

The read/seek/close stub functions are unnecessary on the
log stream.  Per glibc fopencookie man page:

   cookie_read_function_t *read
  If *read is a null pointer, then reads from  the  custom  stream
  always return end of file.

   cookie_seek_function_t *seek
  If *seek is a null pointer, then it is not possible  to  perform
  seek operations on the stream.

   cookie_close_function_t *close
  If  *close is NULL, then no special action is performed when the
  stream is closed.

Signed-off-by: Stephen Hemminger 
Acked-by: Thomas Monjalon 
---
 lib/librte_eal/linuxapp/eal/eal_log.c | 50 ---
 1 file changed, 50 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_log.c 
b/lib/librte_eal/linuxapp/eal/eal_log.c
index 94dedfb..a2d9056 100644
--- a/lib/librte_eal/linuxapp/eal/eal_log.c
+++ b/lib/librte_eal/linuxapp/eal/eal_log.c
@@ -83,33 +83,8 @@ console_log_write(__attribute__((unused)) void *c, const 
char *buf, size_t size)
return ret;
 }

-static ssize_t
-console_log_read(__attribute__((unused)) void *c,
-__attribute__((unused)) char *buf,
-__attribute__((unused)) size_t size)
-{
-   return 0;
-}
-
-static int
-console_log_seek(__attribute__((unused)) void *c,
-__attribute__((unused)) off64_t *offset,
-__attribute__((unused)) int whence)
-{
-   return -1;
-}
-
-static int
-console_log_close(__attribute__((unused)) void *c)
-{
-   return 0;
-}
-
 static cookie_io_functions_t console_log_func = {
-   .read  = console_log_read,
.write = console_log_write,
-   .seek  = console_log_seek,
-   .close = console_log_close
 };

 /*
@@ -150,33 +125,8 @@ early_log_write(__attribute__((unused)) void *c, const 
char *buf, size_t size)
return ret;
 }

-static ssize_t
-early_log_read(__attribute__((unused)) void *c,
-  __attribute__((unused)) char *buf,
-  __attribute__((unused)) size_t size)
-{
-   return 0;
-}
-
-static int
-early_log_seek(__attribute__((unused)) void *c,
-  __attribute__((unused)) off64_t *offset,
-  __attribute__((unused)) int whence)
-{
-   return -1;
-}
-
-static int
-early_log_close(__attribute__((unused)) void *c)
-{
-   return 0;
-}
-
 static cookie_io_functions_t early_log_func = {
-   .read  = early_log_read,
.write = early_log_write,
-   .seek  = early_log_seek,
-   .close = early_log_close
 };
 static FILE *early_log_stream;

-- 
2.1.0
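
For readers unfamiliar with fopencookie(), the behaviour quoted from the
man page above can be sketched with a stream that sets only the write
hook. This is a glibc-specific illustration under stated assumptions:
struct sink, sink_write() and sink_open() are hypothetical names invented
for this example and are unrelated to the EAL log code.

```c
#define _GNU_SOURCE /* fopencookie() is a GNU extension */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical in-memory sink used only for this illustration. */
struct sink {
	char buf[128];
	size_t len;
};

/* Append to the sink; bytes that do not fit are silently dropped. */
static ssize_t
sink_write(void *c, const char *buf, size_t size)
{
	struct sink *s = c;
	size_t room = sizeof(s->buf) - 1 - s->len;
	size_t n = size < room ? size : room;

	memcpy(s->buf + s->len, buf, n);
	s->len += n;
	s->buf[s->len] = '\0';
	return (ssize_t)size; /* report the whole request as consumed */
}

/* Open a write-only stream; read, seek and close hooks stay NULL. */
static FILE *
sink_open(struct sink *s)
{
	cookie_io_functions_t funcs = {
		.write = sink_write,
	};

	assert(s != NULL);
	s->len = 0;
	s->buf[0] = '\0';
	return fopencookie(s, "w", funcs);
}
```

With this, fprintf() followed by fflush() on the returned stream lands in
sink.buf, and fclose() succeeds without any close hook, which is the
property the patch relies on.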



[dpdk-dev] [PATCH v7 05/26] mem: search only dpdk hugetlbfs maps

2015-01-21 Thread Neil Horman
From: Vlad Zolotarov 

When scanning the hugetlbfs maps, search only for the DPDK maps.
This allows the application to create its own hugetlbfs mappings
and use the DPDK facilities on the same hugetlbfs mount point.

Signed-off-by: Vlad Zolotarov 
Acked-by: Thomas Monjalon 
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index bae2507..a67a1b0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -611,7 +611,7 @@ find_numasocket(struct hugepage_file *hugepg_tbl, struct 
hugepage_info *hpi)
}

snprintf(hugedir_str, sizeof(hugedir_str),
-   "%s/", hpi->hugedir);
+   "%s/%s", hpi->hugedir, internal_config.hugefile_prefix);

/* parse numa map */
while (fgets(buf, sizeof(buf), f) != NULL) {
-- 
2.1.0
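
The effect of the one-line change above can be sketched as a plain
substring test. This is an assumption-laden illustration: the helper name
line_is_dpdk_map(), the sample paths and the "rte" prefix are chosen for
the example, and the real find_numasocket() applies further checks on each
line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if a numa_maps-style line mentions a
 * file under hugedir whose name starts with prefix, else 0.
 */
static int
line_is_dpdk_map(const char *line, const char *hugedir, const char *prefix)
{
	char needle[256];
	int n;

	assert(line != NULL && hugedir != NULL && prefix != NULL);

	/* Same "%s/%s" construction as in the patch. */
	n = snprintf(needle, sizeof(needle), "%s/%s", hugedir, prefix);
	if (n < 0 || (size_t)n >= sizeof(needle))
		return 0; /* truncated needle: refuse to match */

	return strstr(line, needle) != NULL;
}
```

Passing an empty prefix reproduces the old "%s/" behaviour, where any file
on the mount point matches, including mappings the application created
itself.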



[dpdk-dev] [PATCH v7 04/26] ethdev: fix missing parenthesis in mac check

2015-01-21 Thread Neil Horman
From: Pawel Wodkowski 

Fix check introduced in commit 4bdefaade6d1 (VMDQ enhancements).

Signed-off-by: Pawel Wodkowski 
Acked-by: Thomas Monjalon 
---
 lib/librte_ether/rte_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 95f2ceb..ff26bd0 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -814,7 +814,7 @@ rte_eth_dev_config_restore(uint8_t port_id)

/* add address to the hardware */
if  (*dev->dev_ops->mac_addr_add &&
-   dev->data->mac_pool_sel[i] & (1ULL << pool))
+   (dev->data->mac_pool_sel[i] & (1ULL << pool)))
(*dev->dev_ops->mac_addr_add)(dev, , i, pool);
else {
PMD_DEBUG_TRACE("port %d: MAC address array not 
supported\n",
-- 
2.1.0



[dpdk-dev] [PATCH v7 03/26] eal: fix check for power of 2 in 0 case

2015-01-21 Thread Neil Horman
From: Ravi Kerur 

rte_is_power_of_2 returns true for 0, but 0 is not a power of 2.
Fix by also checking that n is non-zero.

Signed-off-by: Ravi Kerur 
Acked-by: Neil Horman 
---
 lib/librte_eal/common/include/rte_common.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_common.h 
b/lib/librte_eal/common/include/rte_common.h
index 921b91f..8ac940c 100644
--- a/lib/librte_eal/common/include/rte_common.h
+++ b/lib/librte_eal/common/include/rte_common.h
@@ -203,7 +203,7 @@ extern int RTE_BUILD_BUG_ON_detected_error;
 static inline int
 rte_is_power_of_2(uint32_t n)
 {
-   return ((n-1) & n) == 0;
+   return n && !(n & (n - 1));
 }

 /**
-- 
2.1.0
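
The zero case can be checked in isolation. The sketch below copies the two
expressions from the diff into stand-alone functions; the _old/_new
function names and the self-check are additions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-patch form: for n == 0, (n - 1) & n is 0, so zero is accepted. */
static inline int
is_power_of_2_old(uint32_t n)
{
	return ((n - 1) & n) == 0;
}

/* Post-patch form: the leading "n &&" rejects zero. */
static inline int
is_power_of_2_new(uint32_t n)
{
	return n && !(n & (n - 1));
}

/* Small self-check showing where the two forms disagree. */
static void
power_of_2_self_check(void)
{
	assert(is_power_of_2_old(0) == 1); /* the reported bug */
	assert(is_power_of_2_new(0) == 0); /* the fix */
	assert(is_power_of_2_new(64) == 1);
	assert(is_power_of_2_new(96) == 0);
}
```

For every non-zero n the two forms agree, so the change only affects
callers that pass 0.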



[dpdk-dev] [PATCH v7 02/26] mk: fix link to static combined library

2015-01-21 Thread Neil Horman
When building static archives with CONFIG_RTE_BUILD_COMBINE_LIBS, we still need to
specify --whole-archive to pull in all the proper constructors.

Signed-off-by: Neil Horman 
Reported-by: Lyn M 
Tested-by: Lyn M 
Acked-by: Thomas Monjalon 
---
 mk/rte.app.mk | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index e1a0dbf..40afb2c 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -61,6 +61,10 @@ ifeq ($(NO_AUTOLIBS),)

 LDLIBS += --whole-archive

+ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),y)
+LDLIBS += -l$(RTE_LIBNAME)
+endif
+
 ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),n)

 ifeq ($(CONFIG_RTE_LIBRTE_DISTRIBUTOR),y)
@@ -251,10 +255,6 @@ build: _postbuild

 exe2cmd = $(strip $(call dotfile,$(patsubst %,%.cmd,$(1))))

-ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),y)
-LDLIBS += -l$(RTE_LIBNAME)
-endif
-
 ifeq ($(LINK_USING_CC),1)
 override EXTRA_LDFLAGS := $(call linkerprefix,$(EXTRA_LDFLAGS))
 O_TO_EXE = $(CC) $(CFLAGS) $(LDFLAGS_$(@)) \
-- 
2.1.0
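
Background on why --whole-archive matters here: when linking against a
static archive, the linker normally extracts only the members that resolve
an undefined symbol. An object whose only entry point is a constructor
function is referenced by nothing, so it is dropped and its registration
never runs. A minimal sketch of that registration pattern follows; the
demo_* names are hypothetical, and the list is a simplification of how
drivers register themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver record and registry, for illustration only. */
struct demo_driver {
	const char *name;
	struct demo_driver *next;
};

static struct demo_driver *demo_driver_list;
static unsigned demo_driver_count;

static void
demo_driver_register(struct demo_driver *d)
{
	assert(d != NULL);
	d->next = demo_driver_list;
	demo_driver_list = d;
	demo_driver_count++;
}

static struct demo_driver demo_pmd = { .name = "demo_pmd" };

/*
 * Runs before main(). Nothing else references this function, which is
 * why an archive member containing only this code would be skipped by
 * the linker unless --whole-archive is in effect.
 */
__attribute__((constructor))
static void
demo_pmd_init(void)
{
	demo_driver_register(&demo_pmd);
}
```

In a single translation unit like this one the constructor always runs;
the problem only shows up once demo_pmd and its constructor live in a
separate object inside a .a file.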



[dpdk-dev] [PATCH v7 01/26] version: 2.0.0-rc0

2015-01-21 Thread Neil Horman
From: Thomas Monjalon 

Signed-off-by: Thomas Monjalon 
---
 lib/librte_eal/common/include/rte_version.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_version.h 
b/lib/librte_eal/common/include/rte_version.h
index d2686ae..a267040 100644
--- a/lib/librte_eal/common/include/rte_version.h
+++ b/lib/librte_eal/common/include/rte_version.h
@@ -54,12 +54,12 @@ extern "C" {
 /**
  * Major version number i.e. the x in x.y.z
  */
-#define RTE_VER_MAJOR 1
+#define RTE_VER_MAJOR 2

 /**
  * Minor version number i.e. the y in x.y.z
  */
-#define RTE_VER_MINOR 8
+#define RTE_VER_MINOR 0

 /**
  * Patch level number i.e. the z in x.y.z
@@ -69,14 +69,14 @@ extern "C" {
 /**
  * Extra string to be appended to version number
  */
-#define RTE_VER_SUFFIX ""
+#define RTE_VER_SUFFIX "-rc"

 /**
  * Patch release number
  *   0-15 = release candidates
  *   16   = release
  */
-#define RTE_VER_PATCH_RELEASE 16
+#define RTE_VER_PATCH_RELEASE 0

 /**
  * Macro to compute a version number usable for comparisons
-- 
2.1.0
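
The reason the patch release drops from 16 to 0 can be seen with a packed
version number. The sketch below assumes one byte per field, most
significant first; DEMO_VERSION_NUM is a hypothetical stand-in and is not
claimed to match the macro defined in rte_version.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing: major, minor, patch level, patch release. */
#define DEMO_VERSION_NUM(major, minor, patch, release) \
	((uint32_t)(major) << 24 | (uint32_t)(minor) << 16 | \
	 (uint32_t)(patch) << 8 | (uint32_t)(release))

/* 2.0.0-rc0 (release field 0) still sorts after 1.8.0 final (16). */
static_assert(DEMO_VERSION_NUM(2, 0, 0, 0) > DEMO_VERSION_NUM(1, 8, 0, 16),
	      "2.0.0-rc0 must compare newer than 1.8.0");

/* Within 2.0.0, the final release (16) sorts after every rc (0-15). */
static_assert(DEMO_VERSION_NUM(2, 0, 0, 16) > DEMO_VERSION_NUM(2, 0, 0, 15),
	      "the final release must compare newer than any rc");
```

Under this packing, application code can guard on a minimum version with
an ordinary integer comparison.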



[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-21 Thread Neil Horman
On Wed, Jan 21, 2015 at 11:49:47AM -0800, Stephen Hemminger wrote:
> On Wed, 21 Jan 2015 13:26:20 +
> Bruce Richardson  wrote:
> 
> > On Wed, Jan 21, 2015 at 02:21:25PM +0100, Marc Sune wrote:
> > > 
> > > On 21/01/15 14:02, Bruce Richardson wrote:
> > > >On Wed, Jan 21, 2015 at 01:36:41PM +0100, Marc Sune wrote:
> > > >>On 21/01/15 04:44, Wang, Zhihong wrote:
> > > -Original Message-
> > > From: Richardson, Bruce
> > > Sent: Wednesday, January 21, 2015 12:15 AM
> > > To: Neil Horman
> > > Cc: Wang, Zhihong; dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > > 
> > > On Tue, Jan 20, 2015 at 10:11:18AM -0500, Neil Horman wrote:
> > > >On Tue, Jan 20, 2015 at 03:01:44AM +, Wang, Zhihong wrote:
> > > >>>-Original Message-
> > > >>>From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > >>>Sent: Monday, January 19, 2015 9:02 PM
> > > >>>To: Wang, Zhihong
> > > >>>Cc: dev at dpdk.org
> > > >>>Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > > >>>
> > > >>>On Mon, Jan 19, 2015 at 09:53:30AM +0800, zhihong.wang at intel.com
> > > wrote:
> > > This patch set optimizes memcpy for DPDK for both SSE and AVX
> > > platforms.
> > > It also extends memcpy test coverage with unaligned cases and
> > > more test
> > > >>>points.
> > > Optimization techniques are summarized below:
> > > 
> > > 1. Utilize full cache bandwidth
> > > 
> > > 2. Enforce aligned stores
> > > 
> > > 3. Apply load address alignment based on architecture features
> > > 
> > > 4. Make load/store address available as early as possible
> > > 
> > > 5. General optimization techniques like inlining, branch
> > > reducing, prefetch pattern access
> > > 
> > > Zhihong Wang (4):
> > >    Disabled VTA for memcpy test in app/test/Makefile
> > >    Removed unnecessary test cases in test_memcpy.c
> > >    Extended test coverage in test_memcpy_perf.c
> > >    Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX
> > >  platforms
> > > 
> > >   app/test/Makefile  |   6 +
> > >   app/test/test_memcpy.c |  52 +-
> > >   app/test/test_memcpy_perf.c| 238 
> > >  +---
> > >   .../common/include/arch/x86/rte_memcpy.h   | 664
> > > >>>+++--
> > >   4 files changed, 656 insertions(+), 304 deletions(-)
> > > 
> > > --
> > > 1.9.3
> > > 
> > > 
> > > >>>Are you able to compile this with gcc 4.9.2?  The compilation of
> > > >>>test_memcpy_perf is taking forever for me.  It appears hung.
> > > >>>Neil
> > > >>Neil,
> > > >>
> > > >>Thanks for reporting this!
> > > >>It should compile but will take quite some time if the CPU doesn't 
> > > >>support
> > > AVX2, the reason is that:
> > > >>1. The SSE & AVX memcpy implementation is more complicated than
> > > AVX2
> > > >>version thus the compiler takes more time to compile and optimize 2.
> > > >>The new test_memcpy_perf.c contains 126 constants memcpy calls for
> > > >>better test case coverage, that's quite a lot
> > > >>
> > > >>I've just tested this patch on an Ivy Bridge machine with GCC 4.9.2:
> > > >>1. The whole compile process takes 9'41" with the original
> > > >>test_memcpy_perf.c (63 + 63 = 126 constant memcpy calls) 2. It takes
> > > >>only 2'41" after I reduce the constant memcpy call number to 12 + 12
> > > >>= 24
> > > >>
> > > >>I'll reduce memcpy call in the next version of patch.
> > > >>
> > > >ok, thank you.  I'm all for optimzation, but I think a compile that
> > > >takes almost
> > > >10 minutes for a single file is going to generate some raised 
> > > >eyebrows
> > > >when end users start tinkering with it
> > > >
> > > >Neil
> > > >
> > > >>Zhihong (John)
> > > >>
> > > Even two minutes is a very long time to compile, IMHO. The whole of 
> > > DPDK
> > > doesn't take that long to compile right now, and that's with a couple 
> > > of huge
> > > header files with routing tables in it. Any chance you could cut 
> > > compile time
> > > down to a few seconds while still having reasonable tests?
> > > Also, when there is AVX2 present on the system, what is the compile 
> > > time
> > > like for that code?
> > > 
> > >   /Bruce
> > > >>>Neil, Bruce,
> > > >>>
> > > >>>Some data first.
> > > >>>
> > > >>>Sandy Bridge without AVX2:
> > > >>>1. original w/ 10 constant memcpy: 2'25"
> > > >>>2. patch w/ 12 constant memcpy: 2'41"
> > > >>>3. patch w/ 63 constant memcpy: 9'41"
> > > 

[dpdk-dev] [PATCH v4 10/11] eal/pci: Add rte_eal_dev_attach/detach() functions

2015-01-21 Thread Tetsuya Mukawa
Hi Michael,

On 2015/01/21 12:49, Qiu, Michael wrote:
> On 1/19/2015 6:43 PM, Tetsuya Mukawa wrote:
>> These functions are used for attaching or detaching a port.
>> When rte_eal_dev_attach() is called, the function tries to interpret the
>> device name as a PCI address. If this succeeds,
>> rte_eal_dev_attach() will attach a physical device port. If not, it
>> attaches a virtual device port.
>> When rte_eal_dev_detach() is called, the function gets the device type
>> of this port to determine whether the port is physical or virtual,
>> and then calls the corresponding detach function.
>>
>> v4:
>> - Fix comment.
>> - Add error checking.
>> - Fix indent of 'if' statement.
>> - Change function name.
>>
>> Signed-off-by: Tetsuya Mukawa 
>> ---
>>  lib/librte_eal/common/eal_common_dev.c  | 273 
>> 
>>  lib/librte_eal/common/eal_private.h |  11 ++
>>  lib/librte_eal/common/include/rte_dev.h |  33 
>>  lib/librte_eal/linuxapp/eal/Makefile|   1 +
>>  lib/librte_eal/linuxapp/eal/eal_pci.c   |   6 +-
>>  5 files changed, 321 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_eal/common/eal_common_dev.c 
>> b/lib/librte_eal/common/eal_common_dev.c
>> index eae5656..828bd70 100644
>> --- a/lib/librte_eal/common/eal_common_dev.c
>> +++ b/lib/librte_eal/common/eal_common_dev.c
>> @@ -32,10 +32,13 @@
>>   *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>>   */
>>  
>> +#include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>>  
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -107,3 +110,273 @@ rte_eal_dev_init(void)
>>  }
>>  return 0;
>>  }
>> +
>> +/* So far, DPDK hotplug function only supports linux */
>> +#ifdef ENABLE_HOTPLUG
>> +static void
>> +rte_eal_dev_invoke(struct rte_driver *driver,
>> +struct rte_devargs *devargs, enum rte_eal_invoke_type type)
>> +{
>> +if ((driver == NULL) || (devargs == NULL))
>> +return;
>> +
>> +switch (type) {
>> +case RTE_EAL_INVOKE_TYPE_PROBE:
>> +driver->init(devargs->virtual.drv_name, devargs->args);
>> +break;
>> +case RTE_EAL_INVOKE_TYPE_CLOSE:
>> +driver->uninit(devargs->virtual.drv_name, devargs->args);
>> +break;
>> +default:
>> +break;
>> +}
>> +}
>> +
>> +static int
>> +rte_eal_dev_find_and_invoke(const char *name, int type)
> This function is only for vdevs, so I would like the name to show that,
> e.g. *rte_eal_vdev_find_and_invoke*

Sure, I will rename it as you suggest. I appreciate your suggestion.

Thanks,
Tetsuya

>> +{
>> +struct rte_devargs *devargs;
>> +struct rte_driver *driver;
>> +
>> +if (name == NULL)
>> +return -EINVAL;
>> +
>> +/* call the init function for each virtual device */
>> +TAILQ_FOREACH(devargs, _list, next) {
>> +
>> +if (devargs->type != RTE_DEVTYPE_VIRTUAL)
>> +continue;
>> +
>> +if (strncmp(name, devargs->virtual.drv_name, strlen(name)))
>> +continue;
>> +
>> +TAILQ_FOREACH(driver, _driver_list, next) {
>> +if (driver->type != PMD_VDEV)
>> +continue;
>> +
>> +/* search a driver prefix in virtual device name */
>> +if (!strncmp(driver->name, devargs->virtual.drv_name,
>> +strlen(driver->name))) {
>> +rte_eal_dev_invoke(driver, devargs, type);
>> +break;
>> +}
>> +}
>> +
>> +if (driver == NULL) {
>> +RTE_LOG(WARNING, EAL, "no driver found for %s\n",
>> +  devargs->virtual.drv_name);
>> +}
>> +return 0;
>> +}
>> +return 1;
>> +}
>> +
>> +/* attach the new physical device, then store port_id of the device */
>> +static int
>> +rte_eal_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id)
>> +{
>> +uint8_t new_port_id;
>> +struct rte_eth_dev devs[RTE_MAX_ETHPORTS];
>> +
>> +if ((addr == NULL) || (port_id == NULL))
>> +goto err;
>> +
>> +/* save current port status */
>> +rte_eth_dev_save(devs);
>> +/* re-construct pci_device_list */
>> +if (rte_eal_pci_scan())
>> +goto err;
>> +/* invoke probe func of the driver can handle the new device */
>> +if (rte_eal_pci_probe_one(addr))
>> +goto err;
>> +/* get port_id enabled by above procedures */
>> +if (rte_eth_dev_get_changed_port(devs, _port_id))
>> +goto err;
>> +
>> +*port_id = new_port_id;
>> +return 0;
>> +err:
>> +RTE_LOG(ERR, EAL, "Drver, cannot attach the device\n");
>> +return -1;
>> +}
>> +
>> +/* detach the new physical device, then store pci_addr of the device */
>> +static int
>> +rte_eal_dev_detach_pdev(uint8_t port_id, struct rte_pci_addr *addr)
>> +{
>> +struct 

[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-21 Thread Jim Thompson

I'm not as concerned with compile times given the potential performance boost.

A long time ago (mid-80s) I was at Convex, and wanted to do a vector bcopy(), 
because it would make the I/O system (mostly disk then (*)) go faster.
The architect explained to me that the vector registers were for applications, 
not the kernel (as well as re-explaining the expense of vector context
switches, should the kernel be using the vector unit(s) when some application 
also wanted to use them).

The same is true today of AVX/AVX2, SSE, and even the AES-NI instructions.  
Normally we don't use these in kernel code (which is traditionally where
the networking stack has lived).

The differences with DPDK are that a) entire cores (including the AVX/SSE units 
and even AES-NI (FPU)) are dedicated to DPDK, and b) DPDK is a library,
and the resulting networking applications are exactly that, applications.  The 
"operating system" is now a control plane.

Jim

(* Back then it was commonly thought that TCP would never be able to fill a 
10Gbps Ethernet.)

> On Jan 21, 2015, at 2:54 PM, Neil Horman  wrote:
> 
> On Wed, Jan 21, 2015 at 11:49:47AM -0800, Stephen Hemminger wrote:
>> On Wed, 21 Jan 2015 13:26:20 +
>> Bruce Richardson  wrote:
>> 
>>> On Wed, Jan 21, 2015 at 02:21:25PM +0100, Marc Sune wrote:
 
 On 21/01/15 14:02, Bruce Richardson wrote:
> On Wed, Jan 21, 2015 at 01:36:41PM +0100, Marc Sune wrote:
>> On 21/01/15 04:44, Wang, Zhihong wrote:
 -Original Message-
 From: Richardson, Bruce
 Sent: Wednesday, January 21, 2015 12:15 AM
 To: Neil Horman
 Cc: Wang, Zhihong; dev at dpdk.org
 Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
 
 On Tue, Jan 20, 2015 at 10:11:18AM -0500, Neil Horman wrote:
> On Tue, Jan 20, 2015 at 03:01:44AM +, Wang, Zhihong wrote:
>>> -Original Message-
>>> From: Neil Horman [mailto:nhorman at tuxdriver.com]
>>> Sent: Monday, January 19, 2015 9:02 PM
>>> To: Wang, Zhihong
>>> Cc: dev at dpdk.org
>>> Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
>>> 
>>> On Mon, Jan 19, 2015 at 09:53:30AM +0800, zhihong.wang at intel.com
 wrote:
 This patch set optimizes memcpy for DPDK for both SSE and AVX
 platforms.
 It also extends memcpy test coverage with unaligned cases and
 more test
>>> points.
 Optimization techniques are summarized below:
 
 1. Utilize full cache bandwidth
 
 2. Enforce aligned stores
 
 3. Apply load address alignment based on architecture features
 
 4. Make load/store address available as early as possible
 
 5. General optimization techniques like inlining, branch
 reducing, prefetch pattern access
 
 Zhihong Wang (4):
  Disabled VTA for memcpy test in app/test/Makefile
  Removed unnecessary test cases in test_memcpy.c
  Extended test coverage in test_memcpy_perf.c
  Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX
platforms
 
 app/test/Makefile  |   6 +
 app/test/test_memcpy.c |  52 +-
 app/test/test_memcpy_perf.c| 238 +---
 .../common/include/arch/x86/rte_memcpy.h   | 664
>>> +++--
 4 files changed, 656 insertions(+), 304 deletions(-)
 
 --
 1.9.3
 
 
>>> Are you able to compile this with gcc 4.9.2?  The compilation of
>>> test_memcpy_perf is taking forever for me.  It appears hung.
>>> Neil
>> Neil,
>> 
>> Thanks for reporting this!
>> It should compile but will take quite some time if the CPU doesn't 
>> support
 AVX2, the reason is that:
>> 1. The SSE & AVX memcpy implementation is more complicated than
 AVX2
>> version thus the compiler takes more time to compile and optimize 2.
>> The new test_memcpy_perf.c contains 126 constants memcpy calls for
>> better test case coverage, that's quite a lot
>> 
>> I've just tested this patch on an Ivy Bridge machine with GCC 4.9.2:
>> 1. The whole compile process takes 9'41" with the original
>> test_memcpy_perf.c (63 + 63 = 126 constant memcpy calls) 2. It takes
>> only 2'41" after I reduce the constant memcpy call number to 12 + 12
>> = 24
>> 
>> I'll reduce memcpy call in the next version of patch.
>> 
> ok, thank you.  I'm all for optimzation, but I think a compile that
> takes almost

[dpdk-dev] New to DPDK

2015-01-21 Thread Ravi Rao
Hi,
Thanks for the inputs. I did download dpdk-1.8 and used your latest 
packet-gen-2.8.0 but I am getting this crash.
Looks like I am missing or not specifying the correct arguments. Can you 
please help ?
./app/pktgen -c 03 -n 1 --proc-type auto --socket-mem 128 --file-prefix 
pg -- -T -P -m "1.0, 2.1" -f themes/black-yellow.theme

!PANIC!: Cannot configure device: port=0, Num queues 2,2 (2)Invalid argument
PANIC in pktgen_config_ports():
Cannot configure device: port=0, Num queues 2,2 (2)Invalid argument6: 
[./app/pktgen() [0x4268c3]]
5: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) 
[0x7f6959e0dec5]]
4: [./app/pktgen(main+0x47c) [0x4262ac]]
3: [./app/pktgen(pktgen_config_ports+0x16ab) [0x43e09b]]
2: [./app/pktgen(__rte_panic+0xc1) [0x425d20]]
1: [./app/pktgen(rte_dump_stack+0x18) [0x4bce48]]
./doit.sh: line 53: 21912 Aborted (core dumped) 
./app/pktgen -c 0f -n 3 --proc-type auto --socket-mem 128 --file-prefix 
pg -- -T -P -m "1.0, 2.1" -f themes/black-yellow.theme
labadmin at Openstack:~/dpdk-1.8.0/pktgen-2.8.0$ sudo gdb ./app/pktgen 
-core core
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./app/pktgen...done.
[New LWP 21912]
[New LWP 21924]
[New LWP 21925]
[New LWP 21923]
[New LWP 21922]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./app/pktgen -c 0f -n 3 --proc-type auto 
--socket-mem 128 --file-prefix pg -- -'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7f6959e22cc9 in __GI_raise (sig=sig at entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x7f6959e22cc9 in __GI_raise (sig=sig at entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f6959e260d8 in __GI_abort () at abort.c:89
#2  0x00425d25 in __rte_panic ()
#3  0x0043e09b in pktgen_config_ports () at pktgen-port-cfg.c:275
#4  0x004262ac in main (argc=7, argv=0x7fff12510090) at 
pktgen-main.c:420
(gdb) info shared
From            To              Syms Read   Shared Object Library
0x7f695aae6c20  0x7f695ab050fa  Yes (*)     /usr/lib/x86_64-linux-gnu/libpcap.so.0.8
0x7f695a7e1610  0x7f695a8501b6  Yes         /lib/x86_64-linux-gnu/libm.so.6
0x7f695a5d6350  0x7f695a5d933c  Yes         /lib/x86_64-linux-gnu/librt.so.1
0x7f695a3d0ed0  0x7f695a3d19ce  Yes         /lib/x86_64-linux-gnu/libdl.so.2
0x7f695a1b79f0  0x7f695a1c44a1  Yes         /lib/x86_64-linux-gnu/libpthread.so.0
0x7f6959e0b4a0  0x7f6959f51113  Yes         /lib/x86_64-linux-gnu/libc.so.6
0x7f695ad20ae0  0x7f695ad3b4e0  Yes         /lib64/ld-linux-x86-64.so.2
0x7f6959bd8ab0  0x7f6959be8995  Yes (*)     /lib/x86_64-linux-gnu/libgcc_s.so.1
(*): Shared library is missing debugging information.
(gdb)

Regards,
Ravi
On 01/14/2015 02:26 PM, Wiles, Keith wrote:
> BTW, you may want to check out the Pktgen-DPDK (I wrote) to give you a
> simple starting point or at least a traffic generator like system.
>
> http://dpdk.org/browse/apps/pktgen-dpdk/refs/
>
>
> On 1/14/15, 1:54 PM, "Ravi Rao"  wrote:
>
>> Thanks a lot for the quick response.
>> On 01/14/2015 01:27 PM, Wiles, Keith wrote:
>>> Most people I guess use a Xeon CPU class MB with one or two sockets
>>> running Linux with a supported NICs. I use a motherboard like the one
>>> below running Ubuntu 12.04 with 12G RAM and the 82599 NICs. You can find
>>> more supported NICs in the documentation and you need find the rest of
>>> the
>>> parts :-) You do not need much disk space I have a 500G disk and you can
>>> use less memory, but that is something you need to decide on.
>>>
>>>
>>> http://www.intel.com/content/www/us/en/motherboards/server-motherboards/s
>>> er
>>> ver-board-w2600cr.html
>>>
>>>
>>> On 1/14/15, 12:33 PM, "Ravi Rao"  wrote:
>>>
>>>> Hi All,
>>>>  I am a newbie to DPDK. Can one of you please let me know if there is
>>>> any reference board available which I can use to build and
>>>> try out the DPDK stuff on?
>>>> Regards,
>>>> Ravi



[dpdk-dev] DPDK - TX from lcore in packet distributor configuration

2015-01-21 Thread deco33000 Jog
Hello,

-- PROBLEM
I have an AF_PACKET socket in promiscuous mode that receives all the NIC 
traffic, so that my apps can do all the processing themselves.

So I have one receiver and need to hand each packet to different 
threads/processes (lcores) so that they can process the rest of the packet 
(tcp/udp...)

-- QUESTION
I read about the packet distributor architecture which seems to answer that 
need.
http://dpdk.org/doc/guides/prog_guide/packet_distrib_lib.html

But I fear that sending each packet back through the distributor will be slow, 
since it may already be overloaded by input from the network. Why pass the 
result back to the distributor if the lcore could send to the wire directly?

My problem is the trip back to the distributor after packet processing. I 
would prefer to send directly to the TX ring.

Is that possible? How? By passing the TX pointer to the lcore?


[dpdk-dev] DPDK Community Call, Monday 2nd February, 17:00 GMT

2015-01-21 Thread O'driscoll, Tim
> From: O'driscoll, Tim
> 
> > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> >
> > On Tue, 20 Jan 2015 15:21:40 +
> > "O'driscoll, Tim"  wrote:
> >
> > > We had our last community call in December, and then took a break over
> > the holiday period. I think we should reinstate these, so I've scheduled the
> > next one for Monday 2nd February. Since our last call was at a time
> > convenient for Asia, this one is at a time that's more convenient for people
> > based in the USA. As for previous calls, I'll post a recording to youtube
> > afterwards for anybody who can't make it.
> > >
> > > I don't have an agenda yet, but will send one out in advance of the
> > meeting.
> >
> > This is right after FOSDEM and many people will be returning home.
> 
> Good point. Thanks for pointing this out. I'll move the meeting to avoid this
> clash.

To avoid the clash, I've moved this to Thursday 5th Feb, at the same time 
(17:00 GMT). Hopefully that's a more convenient time for most people.

Meeting time:
Dublin (Ireland), Thursday, February 5, 2015 at 5:00:00 PM GMT UTC
San Francisco (U.S.A. - California), Thursday, February 5, 2015 at 9:00:00 AM PST UTC-8 hours
Phoenix (U.S.A. - Arizona), Thursday, February 5, 2015 at 10:00:00 AM MST UTC-7 hours
New York (U.S.A. - New York), Thursday, February 5, 2015 at 12:00:00 Noon EST UTC-5 hours
Ottawa (Canada - Ontario), Thursday, February 5, 2015 at 12:00:00 Noon EST UTC-5 hours
Paris (France), Thursday, February 5, 2015 at 6:00:00 PM CET UTC+1 hour
Tel Aviv (Israel), Thursday, February 5, 2015 at 7:00:00 PM IST UTC+2 hours
Moscow (Russia), Thursday, February 5, 2015 at 8:00:00 PM MSK UTC+3 hours
New Delhi (India - Delhi), Thursday, February 5, 2015 at 10:30:00 PM IST UTC+5:30 hours
Shanghai (China - Shanghai Municipality), Friday, February 6, 2015 at 1:00:00 AM CST UTC+8 hours
Tokyo (Japan), Friday, February 6, 2015 at 2:00:00 AM JST UTC+9 hours
Corresponding UTC (GMT), Thursday, February 5, 2015 at 17:00:00

GoToMeeting Details:
To join, follow the meeting link: 
https://global.gotomeeting.com/join/557845085. This will start the GoToMeeting 
web viewer. You then have two options for audio:

1. To use your computer's audio via a headset, you need to switch to the 
desktop version of GoToMeeting. You can do this by clicking the GoToMeeting 
icon on the top right hand side of the web viewer, and then selecting "Switch 
to the desktop version". The desktop version will need to download and install, 
so if you plan to use this you may want to get it set up in advance. Once it 
starts, under the Audio section, you can select "Mic & Speakers". The desktop 
version is only available for Windows and Mac, so if you're using Linux then 
you need to use option 2 below.

2. You can join using a phone via one of the numbers listed below. The Access 
Code is 557-845-085. You'll also be asked for an Audio PIN, which is accessible 
by clicking the phone icon in the GoToMeeting web viewer after you've joined 
the meeting.
Canada +1 (647) 497-9391
France +33 (0) 170 950 593
Ireland +353 (0) 15 290 180
United Kingdom +44 (0) 20 3713 5028
United States +1 (646) 982-0002
More phone numbers: https://global.gotomeeting.com/557845085/numbersdisplay.html

Info on downloading the desktop app is available at: 
http://support.citrixonline.com/en_US/meeting/help_files/G2M010002?title=Download%7D
Info on the web viewer is available at: 
http://support.citrixonline.com/en_US/GoToMeeting/help_files/GTM130019?title=Web+Viewer+FAQs


Thanks,
Tim



[dpdk-dev] [PATCH] stats: remove useless memset's

2015-01-21 Thread David Marchand
Hello Stephen,

On Wed, Jan 21, 2015 at 5:16 AM,  wrote:

> From: Stephen Hemminger 
>
> The rte_eth_stats_get is the only API that should call the device
> statistics function directly, and it already does a memset of the
> resulting structure. Therefore doing memset() in the driver is
> redundant and should be removed.
>
> Signed-off-by: Stephen Hemminger 
> ---
>  lib/librte_pmd_af_packet/rte_eth_af_packet.c | 2 --
>  lib/librte_pmd_bond/rte_eth_bond_pmd.c   | 4 
>  lib/librte_pmd_enic/enic_main.c  | 1 -
>  lib/librte_pmd_i40e/i40e_ethdev_vf.c | 1 -
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c  | 1 -
>  lib/librte_pmd_ring/rte_eth_ring.c   | 1 -
>  6 files changed, 10 deletions(-)
>

I think you missed some :
- lib/librte_pmd_e1000/igb_ethdev.c function eth_igbvf_stats_get()
- lib/librte_pmd_pcap/rte_eth_pcap.c function eth_stats_get()

With these fixed :
Acked-By: David Marchand 

-- 
David Marchand


[dpdk-dev] Segmentation fault in ixgbe_rxtx_vec.c:444 with 1.8.0

2015-01-21 Thread Bruce Richardson
On Tue, Jan 20, 2015 at 11:39:03AM +0100, Martin Weiser wrote:
> Hi again,
> 
> I did some further testing and it seems like this issue is linked to
> jumbo frames. I think a similar issue has already been reported by
> Prashant Upadhyaya with the subject 'Packet Rx issue with DPDK1.8'.
> In our application we use the following rxmode port configuration:
> 
> .mq_mode= ETH_MQ_RX_RSS,
> .split_hdr_size = 0,
> .header_split   = 0,
> .hw_ip_checksum = 1,
> .hw_vlan_filter = 0,
> .jumbo_frame= 1,
> .hw_strip_crc   = 1,
> .max_rx_pkt_len = 9000,
> 
> and the mbuf size is calculated like the following:
> 
> (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)
> 
> This works fine with DPDK 1.7 and jumbo frames are split into buffer
> chains and can be forwarded on another port without a problem.
> With DPDK 1.8 and the default configuration (CONFIG_RTE_IXGBE_INC_VECTOR
> enabled) the application sometimes crashes like described in my first
> mail and sometimes packet receiving stops with subsequently arriving
> packets counted as rx errors. When CONFIG_RTE_IXGBE_INC_VECTOR is
> disabled the packet processing also comes to a halt as soon as jumbo
> frames arrive, with the slightly different effect that now
> rte_eth_tx_burst refuses to send any previously received packets.
> 
> Is there anything special to consider regarding jumbo frames when moving
> from DPDK 1.7 to 1.8 that we might have missed?
> 
> Martin
> 
> 
> 
> On 19.01.15 11:26, Martin Weiser wrote:
> > Hi everybody,
> >
> > we quite recently updated one of our applications to DPDK 1.8.0 and are
> > now seeing a segmentation fault in ixgbe_rxtx_vec.c:444 after a few minutes.
> > I just did some quick debugging and I only have a very limited
> > understanding of the code in question but it seems that the 'continue'
> > in line 445 without increasing 'buf_idx' might cause the problem. In one
> > debugging session when the crash occurred the value of 'buf_idx' was 2
> > and the value of 'pkt_idx' was 8965.
> > Any help with this issue would be greatly appreciated. If you need any
> > further information just let me know.
> >
> > Martin
> >
> >
> 
Hi Martin, Prashant,

I've managed to reproduce the issue here and had a look at it. Could you
both perhaps try the proposed change below and see if it fixes the problem for
you and gives you a working system? If so, I'll submit this as a patch fix 
officially - or go back to the drawing board, if not. :-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
index b54cb19..dfaccee 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
@@ -402,10 +402,10 @@ reassemble_packets(struct igb_rx_queue *rxq, struct rte_mbuf **rx_bufs,
struct rte_mbuf *pkts[RTE_IXGBE_VPMD_RX_BURST]; /*finished pkts*/
struct rte_mbuf *start = rxq->pkt_first_seg;
struct rte_mbuf *end =  rxq->pkt_last_seg;
-   unsigned pkt_idx = 0, buf_idx = 0;
+   unsigned pkt_idx, buf_idx;


-   while (buf_idx < nb_bufs) {
+   for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
if (end != NULL) {
/* processing a split packet */
end->next = rx_bufs[buf_idx];
@@ -448,7 +448,6 @@ reassemble_packets(struct igb_rx_queue *rxq, struct rte_mbuf **rx_bufs,
rx_bufs[buf_idx]->data_len += rxq->crc_len;
rx_bufs[buf_idx]->pkt_len += rxq->crc_len;
}
-   buf_idx++;
}

/* save the partial packet for next time */


Regards,
/Bruce
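
The failure mode the proposed change targets can be shown in isolation. The
sketch below is not the ixgbe code: the function names and the skip[] array
are hypothetical stand-ins, and max_iters exists only so the demonstration
terminates. It contrasts a while loop whose 'continue' jumps over the trailing
increment with a for loop whose increment sits in the loop header.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in: skip[i] != 0 means "take the continue path"
 * for buffer i. Both functions return the number of loop iterations run.
 */

/* while form: 'continue' bypasses the increment at the bottom of the body. */
static unsigned
sketch_iterations_while(const int *skip, unsigned nb_bufs, unsigned max_iters)
{
	unsigned buf_idx = 0, iters = 0;

	assert(skip != NULL);
	while (buf_idx < nb_bufs && iters < max_iters) {
		iters++;
		if (skip[buf_idx])
			continue;	/* buf_idx is NOT advanced on this path */
		buf_idx++;
	}
	return iters;
}

/* for form: the header increment runs even after 'continue'. */
static unsigned
sketch_iterations_for(const int *skip, unsigned nb_bufs, unsigned max_iters)
{
	unsigned buf_idx, iters = 0;

	assert(skip != NULL);
	for (buf_idx = 0; buf_idx < nb_bufs && iters < max_iters; buf_idx++) {
		iters++;
		if (skip[buf_idx])
			continue;	/* buf_idx++ in the header still runs */
	}
	return iters;
}
```

With one skipped entry, the while form keeps re-examining the same index until
the cap is hit, while the for form visits each buffer once. A stuck index of
this kind would be consistent with the reported state where 'buf_idx' stayed
small while 'pkt_idx' kept growing.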



[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-21 Thread Marc Sune

On 21/01/15 04:44, Wang, Zhihong wrote:
>
>> -Original Message-
>> From: Richardson, Bruce
>> Sent: Wednesday, January 21, 2015 12:15 AM
>> To: Neil Horman
>> Cc: Wang, Zhihong; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
>>
>> On Tue, Jan 20, 2015 at 10:11:18AM -0500, Neil Horman wrote:
>>> On Tue, Jan 20, 2015 at 03:01:44AM +, Wang, Zhihong wrote:

> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Monday, January 19, 2015 9:02 PM
> To: Wang, Zhihong
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
>
> On Mon, Jan 19, 2015 at 09:53:30AM +0800, zhihong.wang at intel.com
>> wrote:
>> This patch set optimizes memcpy for DPDK for both SSE and AVX
>> platforms.
>> It also extends memcpy test coverage with unaligned cases and
>> more test
> points.
>> Optimization techniques are summarized below:
>>
>> 1. Utilize full cache bandwidth
>>
>> 2. Enforce aligned stores
>>
>> 3. Apply load address alignment based on architecture features
>>
>> 4. Make load/store address available as early as possible
>>
>> 5. General optimization techniques like inlining, branch
>> reducing, prefetch pattern access
>>
>> Zhihong Wang (4):
>>Disabled VTA for memcpy test in app/test/Makefile
>>Removed unnecessary test cases in test_memcpy.c
>>Extended test coverage in test_memcpy_perf.c
>>Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX
>>  platforms
>>
>>   app/test/Makefile  |   6 +
>>   app/test/test_memcpy.c |  52 +-
>>   app/test/test_memcpy_perf.c| 238 +---
>>   .../common/include/arch/x86/rte_memcpy.h   | 664
> +++--
>>   4 files changed, 656 insertions(+), 304 deletions(-)
>>
>> --
>> 1.9.3
>>
>>
> Are you able to compile this with gcc 4.9.2?  The compilation of
> test_memcpy_perf is taking forever for me.  It appears hung.
> Neil

 Neil,

 Thanks for reporting this!
 It should compile but will take quite some time if the CPU doesn't support
>> AVX2, the reason is that:
 1. The SSE & AVX memcpy implementation is more complicated than
>> AVX2
 version thus the compiler takes more time to compile and optimize 2.
 The new test_memcpy_perf.c contains 126 constants memcpy calls for
 better test case coverage, that's quite a lot

 I've just tested this patch on an Ivy Bridge machine with GCC 4.9.2:
 1. The whole compile process takes 9'41" with the original
 test_memcpy_perf.c (63 + 63 = 126 constant memcpy calls) 2. It takes
 only 2'41" after I reduce the constant memcpy call number to 12 + 12
 = 24

 I'll reduce memcpy call in the next version of patch.

>>> ok, thank you.  I'm all for optimization, but I think a compile that
>>> takes almost
>>> 10 minutes for a single file is going to generate some raised eyebrows
>>> when end users start tinkering with it
>>>
>>> Neil
>>>
 Zhihong (John)

>> Even two minutes is a very long time to compile, IMHO. The whole of DPDK
>> doesn't take that long to compile right now, and that's with a couple of huge
>> header files with routing tables in it. Any chance you could cut compile time
>> down to a few seconds while still having reasonable tests?
>> Also, when there is AVX2 present on the system, what is the compile time
>> like for that code?
>>
>>  /Bruce
> Neil, Bruce,
>
> Some data first.
>
> Sandy Bridge without AVX2:
> 1. original w/ 10 constant memcpy: 2'25"
> 2. patch w/ 12 constant memcpy: 2'41"
> 3. patch w/ 63 constant memcpy: 9'41"
>
> Haswell with AVX2:
> 1. original w/ 10 constant memcpy: 1'57"
> 2. patch w/ 12 constant memcpy: 1'56"
> 3. patch w/ 63 constant memcpy: 3'16"
>
> Also, to address Bruce's question, we have to reduce test case to cut down 
> compile time. Because we use:
> 1. intrinsics instead of assembly for better flexibility and can utilize more 
> compiler optimization
> 2. complex function body for better performance
> 3. inlining
> This increases compile time.
> But I think it'd be okay to do that as long as we can select a fair set of 
> test points.
>
> It'd be great if you could give some suggestion, say, 12 points.
>
> Zhihong (John)
>
>

While I agree that, in the general case, such long compilation times are 
painful for users, a factor of 2-8x in memcpy operations is quite an 
improvement, especially for DPDK applications that (unfortunately) rely 
heavily on them -- e.g. IP fragmentation and reassembly.

Why not have a fast compilation by default, and a tunable config flag 
to enable a highly optimized version of rte_memcpy (e.g. 
RTE_EAL_OPT_MEMCPY)?

Marc
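
A rough sketch of what such a build-time switch could look like follows.
RTE_EAL_OPT_MEMCPY is only the name floated above, not an existing DPDK
option, and sketch_memcpy is a hypothetical wrapper; the "optimized" branch is
a simple chunked copy standing in for the real SSE/AVX code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef RTE_EAL_OPT_MEMCPY	/* hypothetical flag, off by default */
/*
 * Stand-in for a heavily inlined, vectorized copy. The real patch uses
 * SSE/AVX intrinsics; here fixed 16-byte chunks are copied, then the tail.
 */
static inline void *
sketch_memcpy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	assert(dst != NULL && src != NULL);
	while (n >= 16) {
		memcpy(d, s, 16);
		d += 16;
		s += 16;
		n -= 16;
	}
	if (n > 0)
		memcpy(d, s, n);
	return dst;
}
#else
/* Default: defer to libc so the build stays fast. */
static inline void *
sketch_memcpy(void *dst, const void *src, size_t n)
{
	assert(dst != NULL && src != NULL);
	return memcpy(dst, src, n);
}
#endif
```

Callers would use sketch_memcpy() everywhere, and only builds that define the
flag would pay the extra compile time.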




[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-21 Thread Bruce Richardson
On Wed, Jan 21, 2015 at 02:21:25PM +0100, Marc Sune wrote:
> 
> On 21/01/15 14:02, Bruce Richardson wrote:
> >On Wed, Jan 21, 2015 at 01:36:41PM +0100, Marc Sune wrote:
> >>On 21/01/15 04:44, Wang, Zhihong wrote:
> -Original Message-
> From: Richardson, Bruce
> Sent: Wednesday, January 21, 2015 12:15 AM
> To: Neil Horman
> Cc: Wang, Zhihong; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> 
> On Tue, Jan 20, 2015 at 10:11:18AM -0500, Neil Horman wrote:
> >On Tue, Jan 20, 2015 at 03:01:44AM +, Wang, Zhihong wrote:
> >>>-Original Message-
> >>>From: Neil Horman [mailto:nhorman at tuxdriver.com]
> >>>Sent: Monday, January 19, 2015 9:02 PM
> >>>To: Wang, Zhihong
> >>>Cc: dev at dpdk.org
> >>>Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> >>>
> >>>On Mon, Jan 19, 2015 at 09:53:30AM +0800, zhihong.wang at intel.com
> wrote:
> This patch set optimizes memcpy for DPDK for both SSE and AVX
> platforms.
> It also extends memcpy test coverage with unaligned cases and
> more test
> >>>points.
> Optimization techniques are summarized below:
> 
> 1. Utilize full cache bandwidth
> 
> 2. Enforce aligned stores
> 
> 3. Apply load address alignment based on architecture features
> 
> 4. Make load/store address available as early as possible
> 
> 5. General optimization techniques like inlining, branch
> reducing, prefetch pattern access
> 
> Zhihong Wang (4):
>    Disabled VTA for memcpy test in app/test/Makefile
>    Removed unnecessary test cases in test_memcpy.c
>    Extended test coverage in test_memcpy_perf.c
>    Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX
>  platforms
> 
>   app/test/Makefile  |   6 +
>   app/test/test_memcpy.c |  52 +-
>   app/test/test_memcpy_perf.c| 238 +---
>   .../common/include/arch/x86/rte_memcpy.h   | 664
> >>>+++--
>   4 files changed, 656 insertions(+), 304 deletions(-)
> 
> --
> 1.9.3
> 
> 
> >>>Are you able to compile this with gcc 4.9.2?  The compilation of
> >>>test_memcpy_perf is taking forever for me.  It appears hung.
> >>>Neil
> >>Neil,
> >>
> >>Thanks for reporting this!
> >>It should compile but will take quite some time if the CPU doesn't 
> >>support
> AVX2, the reason is that:
> >>1. The SSE & AVX memcpy implementation is more complicated than
> AVX2
> >>version thus the compiler takes more time to compile and optimize 2.
> >>The new test_memcpy_perf.c contains 126 constants memcpy calls for
> >>better test case coverage, that's quite a lot
> >>
> >>I've just tested this patch on an Ivy Bridge machine with GCC 4.9.2:
> >>1. The whole compile process takes 9'41" with the original
> >>test_memcpy_perf.c (63 + 63 = 126 constant memcpy calls) 2. It takes
> >>only 2'41" after I reduce the constant memcpy call number to 12 + 12
> >>= 24
> >>
> >>I'll reduce memcpy call in the next version of patch.
> >>
> >ok, thank you.  I'm all for optimization, but I think a compile that
> >takes almost
> >10 minutes for a single file is going to generate some raised eyebrows
> >when end users start tinkering with it
> >
> >Neil
> >
> >>Zhihong (John)
> >>
> Even two minutes is a very long time to compile, IMHO. The whole of DPDK
> doesn't take that long to compile right now, and that's with a couple of 
> huge
> header files with routing tables in it. Any chance you could cut compile 
> time
> down to a few seconds while still having reasonable tests?
> Also, when there is AVX2 present on the system, what is the compile time
> like for that code?
> 
>   /Bruce
> >>>Neil, Bruce,
> >>>
> >>>Some data first.
> >>>
> >>>Sandy Bridge without AVX2:
> >>>1. original w/ 10 constant memcpy: 2'25"
> >>>2. patch w/ 12 constant memcpy: 2'41"
> >>>3. patch w/ 63 constant memcpy: 9'41"
> >>>
> >>>Haswell with AVX2:
> >>>1. original w/ 10 constant memcpy: 1'57"
> >>>2. patch w/ 12 constant memcpy: 1'56"
> >>>3. patch w/ 63 constant memcpy: 3'16"
> >>>
> >>>Also, to address Bruce's question, we have to reduce test case to cut down 
> >>>compile time. Because we use:
> >>>1. intrinsics instead of assembly for better flexibility and can utilize 
> >>>more compiler optimization
> >>>2. complex function body for better performance
> >>>3. inlining
> >>>This increases compile time.
> >>>But I think it'd be okay to do that as long as we can select a fair set of 
> >>>test points.
> >>>
> >>>It'd be great if you 

[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-21 Thread Bruce Richardson
On Wed, Jan 21, 2015 at 01:36:41PM +0100, Marc Sune wrote:
> 
> On 21/01/15 04:44, Wang, Zhihong wrote:
> >
> >>-Original Message-
> >>From: Richardson, Bruce
> >>Sent: Wednesday, January 21, 2015 12:15 AM
> >>To: Neil Horman
> >>Cc: Wang, Zhihong; dev at dpdk.org
> >>Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> >>
> >>On Tue, Jan 20, 2015 at 10:11:18AM -0500, Neil Horman wrote:
> >>>On Tue, Jan 20, 2015 at 03:01:44AM +, Wang, Zhihong wrote:
> 
> >-Original Message-
> >From: Neil Horman [mailto:nhorman at tuxdriver.com]
> >Sent: Monday, January 19, 2015 9:02 PM
> >To: Wang, Zhihong
> >Cc: dev at dpdk.org
> >Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> >
> >On Mon, Jan 19, 2015 at 09:53:30AM +0800, zhihong.wang at intel.com
> >>wrote:
> >>This patch set optimizes memcpy for DPDK for both SSE and AVX
> >>platforms.
> >>It also extends memcpy test coverage with unaligned cases and
> >>more test
> >points.
> >>Optimization techniques are summarized below:
> >>
> >>1. Utilize full cache bandwidth
> >>
> >>2. Enforce aligned stores
> >>
> >>3. Apply load address alignment based on architecture features
> >>
> >>4. Make load/store address available as early as possible
> >>
> >>5. General optimization techniques like inlining, branch
> >>reducing, prefetch pattern access
> >>
> >>Zhihong Wang (4):
> >>   Disabled VTA for memcpy test in app/test/Makefile
> >>   Removed unnecessary test cases in test_memcpy.c
> >>   Extended test coverage in test_memcpy_perf.c
> >>   Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX
> >> platforms
> >>
> >>  app/test/Makefile  |   6 +
> >>  app/test/test_memcpy.c |  52 +-
> >>  app/test/test_memcpy_perf.c| 238 +---
> >>  .../common/include/arch/x86/rte_memcpy.h   | 664
> >+++--
> >>  4 files changed, 656 insertions(+), 304 deletions(-)
> >>
> >>--
> >>1.9.3
> >>
> >>
> >Are you able to compile this with gcc 4.9.2?  The compilation of
> >test_memcpy_perf is taking forever for me.  It appears hung.
> >Neil
> 
> Neil,
> 
> Thanks for reporting this!
> It should compile but will take quite some time if the CPU doesn't support
> >>AVX2, the reason is that:
> 1. The SSE & AVX memcpy implementation is more complicated than
> >>AVX2
> version thus the compiler takes more time to compile and optimize 2.
> The new test_memcpy_perf.c contains 126 constants memcpy calls for
> better test case coverage, that's quite a lot
> 
> I've just tested this patch on an Ivy Bridge machine with GCC 4.9.2:
> 1. The whole compile process takes 9'41" with the original
> test_memcpy_perf.c (63 + 63 = 126 constant memcpy calls) 2. It takes
> only 2'41" after I reduce the constant memcpy call number to 12 + 12
> = 24
> 
> I'll reduce memcpy call in the next version of patch.
> 
> >>>ok, thank you.  I'm all for optimization, but I think a compile that
> >>>takes almost
> >>>10 minutes for a single file is going to generate some raised eyebrows
> >>>when end users start tinkering with it
> >>>
> >>>Neil
> >>>
> Zhihong (John)
> 
> >>Even two minutes is a very long time to compile, IMHO. The whole of DPDK
> >>doesn't take that long to compile right now, and that's with a couple of 
> >>huge
> >>header files with routing tables in it. Any chance you could cut compile 
> >>time
> >>down to a few seconds while still having reasonable tests?
> >>Also, when there is AVX2 present on the system, what is the compile time
> >>like for that code?
> >>
> >>/Bruce
> >Neil, Bruce,
> >
> >Some data first.
> >
> >Sandy Bridge without AVX2:
> >1. original w/ 10 constant memcpy: 2'25"
> >2. patch w/ 12 constant memcpy: 2'41"
> >3. patch w/ 63 constant memcpy: 9'41"
> >
> >Haswell with AVX2:
> >1. original w/ 10 constant memcpy: 1'57"
> >2. patch w/ 12 constant memcpy: 1'56"
> >3. patch w/ 63 constant memcpy: 3'16"
> >
> >Also, to address Bruce's question, we have to reduce test case to cut down 
> >compile time. Because we use:
> >1. intrinsics instead of assembly for better flexibility and can utilize 
> >more compiler optimization
> >2. complex function body for better performance
> >3. inlining
> >This increases compile time.
> >But I think it'd be okay to do that as long as we can select a fair set of 
> >test points.
> >
> >It'd be great if you could give some suggestion, say, 12 points.
> >
> >Zhihong (John)
> >
> >
> 
> While I agree in the general case these long compilation times is painful
> for the users, having a factor of 2-8x in memcpy operations is quite an
> improvement, specially in DPDK applications which need to deal
> (unfortunately) heavily on them -- e.g. IP 

[dpdk-dev] [PATCH 0/5] new ntuple filter replaces 2tuple and 5tuple filters

2015-01-21 Thread De Lara Guarch, Pablo


> -Original Message-
> From: Wu, Jingjing
> Sent: Thursday, January 15, 2015 1:46 AM
> To: dev at dpdk.org
> Cc: Wu, Jingjing; De Lara Guarch, Pablo; Cao, Min
> Subject: [PATCH 0/5] new ntuple filter replaces 2tuple and 5tuple filters
> 
> The patch set uses new filter_ctrl API to replace old 2tuple and 5tuple filter
> APIs.
> It defines ntuple filter to combine 2tuple and 5tuple types.
> It uses new functions and structure to replace old ones in igb/ixgbe driver,
> new commands to replace old ones in testpmd, and removes the old APIs.
> It removes the filter's index parameters from user interface, only the
> filter's key and assigned queue are visible to user.
> 
> Jingjing Wu (5):
>   ethdev: define ntuple filter type and its structure
>   ixgbe: ntuple filter functions replace old ones for 5tuple filter
>   e1000: ntuple filter functions replace old ones for 2tuple and 5tuple
> filter
>   testpmd: new commands for ntuple filter
>   ethdev: remove old APIs and structures of 5tuple and 2tuple filters
> 
>  app/test-pmd/cmdline.c  | 406 
>  app/test-pmd/config.c   |  65 ---
>  lib/librte_ether/rte_eth_ctrl.h |  57 +++
>  lib/librte_ether/rte_ethdev.c   | 116 -
>  lib/librte_ether/rte_ethdev.h   | 193 
>  lib/librte_pmd_e1000/e1000_ethdev.h |  79 +++-
>  lib/librte_pmd_e1000/igb_ethdev.c   | 892 +-
> --
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 489 +++-
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  62 ++-
>  9 files changed, 1344 insertions(+), 1015 deletions(-)
> 
> --
> 1.9.3

Could you send a v2 of this patchset? Your previous patchset, "Integrate
ethertype filter in igb/ixgbe driver to new API" (applied yesterday),
already contains some of the code in this patchset.

Thanks,
Pablo



[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-21 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wang, Zhihong
> Sent: Wednesday, January 21, 2015 3:44 AM
> To: Richardson, Bruce; Neil Horman
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> 
> 
> 
> > -Original Message-
> > From: Richardson, Bruce
> > Sent: Wednesday, January 21, 2015 12:15 AM
> > To: Neil Horman
> > Cc: Wang, Zhihong; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> >
> > On Tue, Jan 20, 2015 at 10:11:18AM -0500, Neil Horman wrote:
> > > On Tue, Jan 20, 2015 at 03:01:44AM +, Wang, Zhihong wrote:
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > > > Sent: Monday, January 19, 2015 9:02 PM
> > > > > To: Wang, Zhihong
> > > > > Cc: dev at dpdk.org
> > > > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > > > >
> > > > > On Mon, Jan 19, 2015 at 09:53:30AM +0800, zhihong.wang at intel.com
> > wrote:
> > > > > > This patch set optimizes memcpy for DPDK for both SSE and AVX
> > platforms.
> > > > > > It also extends memcpy test coverage with unaligned cases and
> > > > > > more test
> > > > > points.
> > > > > >
> > > > > > Optimization techniques are summarized below:
> > > > > >
> > > > > > 1. Utilize full cache bandwidth
> > > > > >
> > > > > > 2. Enforce aligned stores
> > > > > >
> > > > > > 3. Apply load address alignment based on architecture features
> > > > > >
> > > > > > 4. Make load/store address available as early as possible
> > > > > >
> > > > > > 5. General optimization techniques like inlining, branch
> > > > > > reducing, prefetch pattern access
> > > > > >
> > > > > > Zhihong Wang (4):
> > > > > >   Disabled VTA for memcpy test in app/test/Makefile
> > > > > >   Removed unnecessary test cases in test_memcpy.c
> > > > > >   Extended test coverage in test_memcpy_perf.c
> > > > > >   Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX
> > > > > > platforms
> > > > > >
> > > > > >  app/test/Makefile  |   6 +
> > > > > >  app/test/test_memcpy.c |  52 +-
> > > > > >  app/test/test_memcpy_perf.c| 238 +---
> > > > > >  .../common/include/arch/x86/rte_memcpy.h   | 664
> > > > > +++--
> > > > > >  4 files changed, 656 insertions(+), 304 deletions(-)
> > > > > >
> > > > > > --
> > > > > > 1.9.3
> > > > > >
> > > > > >
> > > > > Are you able to compile this with gcc 4.9.2?  The compilation of
> > > > > test_memcpy_perf is taking forever for me.  It appears hung.
> > > > > Neil
> > > >
> > > >
> > > > Neil,
> > > >
> > > > Thanks for reporting this!
> > > > It should compile but will take quite some time if the CPU doesn't 
> > > > support
> > AVX2, the reason is that:
> > > > 1. The SSE & AVX memcpy implementation is more complicated than
> > AVX2
> > > > version thus the compiler takes more time to compile and optimize 2.
> > > > The new test_memcpy_perf.c contains 126 constants memcpy calls for
> > > > better test case coverage, that's quite a lot
> > > >
> > > > I've just tested this patch on an Ivy Bridge machine with GCC 4.9.2:
> > > > 1. The whole compile process takes 9'41" with the original
> > > > test_memcpy_perf.c (63 + 63 = 126 constant memcpy calls) 2. It takes
> > > > only 2'41" after I reduce the constant memcpy call number to 12 + 12
> > > > = 24
> > > >
> > > > I'll reduce memcpy call in the next version of patch.
> > > >
> > > ok, thank you.  I'm all for optimization, but I think a compile that
> > > takes almost
> > > 10 minutes for a single file is going to generate some raised eyebrows
> > > when end users start tinkering with it
> > >
> > > Neil
> > >
> > > > Zhihong (John)
> > > >
> > Even two minutes is a very long time to compile, IMHO. The whole of DPDK
> > doesn't take that long to compile right now, and that's with a couple of 
> > huge
> > header files with routing tables in it. Any chance you could cut compile 
> > time
> > down to a few seconds while still having reasonable tests?
> > Also, when there is AVX2 present on the system, what is the compile time
> > like for that code?
> >
> > /Bruce
> 
> Neil, Bruce,
> 
> Some data first.
> 
> Sandy Bridge without AVX2:
> 1. original w/ 10 constant memcpy: 2'25"
> 2. patch w/ 12 constant memcpy: 2'41"
> 3. patch w/ 63 constant memcpy: 9'41"
> 
> Haswell with AVX2:
> 1. original w/ 10 constant memcpy: 1'57"
> 2. patch w/ 12 constant memcpy: 1'56"
> 3. patch w/ 63 constant memcpy: 3'16"
> 
> Also, to address Bruce's question, we have to reduce test case to cut down 
> compile time. Because we use:
> 1. intrinsics instead of assembly for better flexibility and can utilize more 
> compiler optimization
> 2. complex function body for better performance
> 3. inlining
> This increases compile time.

We use intrinsics and inlining in many other places too.
Why it suddenly became a 

[dpdk-dev] DPDK Community Call, Monday 2nd February, 17:00 GMT

2015-01-21 Thread Stephen Hemminger
On Wed, 21 Jan 2015 14:39:17 +
"O'driscoll, Tim"  wrote:

> > From: O'driscoll, Tim
> > 
> > > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > >
> > > On Tue, 20 Jan 2015 15:21:40 +
> > > "O'driscoll, Tim"  wrote:
> > >
> > > > We had our last community call in December, and then took a break over
> > > the holiday period. I think we should reinstate these, so I've scheduled 
> > > the
> > > next one for Monday 2nd February. Since our last call was at a time
> > > convenient for Asia, this one is at a time that's more convenient for 
> > > people
> > > based in the USA. As for previous calls, I'll post a recording to youtube
> > > afterwards for anybody who can't make it.
> > > >
> > > > I don't have an agenda yet, but will send one out in advance of the
> > > meeting.
> > >
> > > This is right after FOSDEM and many people will be returning home.
> > 
> > Good point. Thanks for pointing this out. I'll move the meeting to avoid 
> > this
> > clash.
> 
> To avoid the clash, I've moved this to Thursday 5th Feb, at the same time 
> (17:00 GMT). Hopefully that's a more convenient time for most people.
> 
> Meeting time:
> Dublin (Ireland), Thursday, February 5, 2015 at 5:00:00 PM GMT UTC
> San Francisco (U.S.A. - California), Thursday, February 5, 2015 at 9:00:00 AM PST UTC-8 hours
> Phoenix (U.S.A. - Arizona), Thursday, February 5, 2015 at 10:00:00 AM MST UTC-7 hours
> New York (U.S.A. - New York), Thursday, February 5, 2015 at 12:00:00 Noon EST UTC-5 hours
> Ottawa (Canada - Ontario), Thursday, February 5, 2015 at 12:00:00 Noon EST UTC-5 hours
> Paris (France), Thursday, February 5, 2015 at 6:00:00 PM CET UTC+1 hour
> Tel Aviv (Israel), Thursday, February 5, 2015 at 7:00:00 PM IST UTC+2 hours
> Moscow (Russia), Thursday, February 5, 2015 at 8:00:00 PM MSK UTC+3 hours
> New Delhi (India - Delhi), Thursday, February 5, 2015 at 10:30:00 PM IST UTC+5:30 hours
> Shanghai (China - Shanghai Municipality), Friday, February 6, 2015 at 1:00:00 AM CST UTC+8 hours
> Tokyo (Japan), Friday, February 6, 2015 at 2:00:00 AM JST UTC+9 hours
> Corresponding UTC (GMT), Thursday, February 5, 2015 at 17:00:00
> 
> GoToMeeting Details:
> To join, follow the meeting link: 
> https://global.gotomeeting.com/join/557845085. This will start the 
> GoToMeeting web viewer. You then have two options for audio:
> 
> 1. To use your computer's audio via a headset, you need to switch to the 
> desktop version of GoToMeeting. You can do this by clicking the GoToMeeting 
> icon on the top right hand side of the web viewer, and then selecting "Switch 
> to the desktop version". The desktop version will need to download and 
> install, so if you plan to use this you may want to get it set up in advance. 
> Once it starts, under the Audio section, you can select "Mic & Speakers". The 
> desktop version is only available for Windows and Mac, so if you're using 
> Linux then you need to use option 2 below.
> 
> 2. You can join using a phone via one of the numbers listed below. The Access 
> Code is 557-845-085. You'll also be asked for an Audio PIN, which is 
> accessible by clicking the phone icon in the GoToMeeting web viewer after 
> you've joined the meeting.
> Canada +1 (647) 497-9391
> France +33 (0) 170 950 593
> Ireland +353 (0) 15 290 180
> United Kingdom +44 (0) 20 3713 5028
> United States +1 (646) 982-0002
> More phone numbers: 
> https://global.gotomeeting.com/557845085/numbersdisplay.html
> 
> Info on downloading the desktop app is available at: 
> http://support.citrixonline.com/en_US/meeting/help_files/G2M010002?title=Download%7D
> Info on the web viewer is available at: 
> http://support.citrixonline.com/en_US/GoToMeeting/help_files/GTM130019?title=Web+Viewer+FAQs
> 
> 
> Thanks,
> Tim
>  

Great thanks


[dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine

2015-01-21 Thread Ananyev, Konstantin


> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Wednesday, January 21, 2015 9:11 AM
> To: Liu, Jijiang
> Cc: Ananyev, Konstantin; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum 
> forwarding engine
> 
> Hi Jijiang,
> 
> On 01/21/2015 09:01 AM, Liu, Jijiang wrote:
> >>> I still don't understand why you are so eager to 'forbid' it.
> >>> Yes we support it for FVL, but no one forces you to use it.
> >>
> >> Well, how would you describe this 2 ways of doing the same thing in the
> >> offload API? Would you talk about the i40e registers? It's not because i40e
> >> has 2 ways to do the same operation that the DPDK should do the same.
> >>
> >> How will you explain to a user how to choose between these 2 cases?
> >
> > Talk about B method in 
> > http://dpdk.org/ml/archives/dev/2014-December/009213.html again.
> >
> > DPDK Never supports a  NIC that can recognize tunneling packet for TX side 
> > before 1.8, right?
> 
> When you say "recognize tunnel", if you mean offloading checksum of
> tunnel headers, I agree.
> 
> If you mean recognizing a tunnel packet in rx, I also agree
> it's new to dpdk-1.8, but I think it's unrelated to what we
> are talking about, which is tx checksum. A DPDK application
> is able to generate tunnel packets by itself and offload the
> checksums to the NIC.
> 
> > So when we need to support TX checksum offload for tunneling packet,  and 
> > we have to choose B.2.
> 
> I don't see why we should choose either B.1 or B.2 (I guess you want
> to say B.1 here, right?).
> 
> The m->lX_len are not filled in rx today. If one day they are, it won't
> prevent the application to configure the lX_len fields and offload
> flags according to the API.
> 
> > After introducing i40e(FVL),  FVL is able to recognize tunneling packet and 
> >  support outer IP, or inner IP or outer IP and inner IP TX
> checksum for tunneling packet.
> > And you agree on "outer and inner at the same time", why do you object 
> > "only inner"?
> >
> > Actually, B.2 method is a software workaround using L2 length when NIC 
> > can't recognize tunneling packet.
> > When NIC is able to recognize tunneling packet, I think you shouldn't take 
> > B.2 as a standard to 'forbid' other method.
> 
> Again, I'm not sure there is a link between "recognizing tunneling
> packets" and tx checksum offload of tunnels. 

I think Jijiang is not talking about RX here.
What he is trying to say is that with B.2 (case 4) we hide from the HW the fact
that the packet is a tunnelling packet.
We are just using the fact that, to calculate the checksum for the inner packet
only, the HW doesn't need to know whether it is a tunnelling packet or not.
All it needs is proper values of l2_len and l3_len.
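
As an illustration of that point, here is a minimal sketch of how the two
lengths could be derived when only the inner checksum is offloaded. The struct
and function names are hypothetical (this is not the rte_mbuf layout), and the
example values assume VXLAN over IPv4 with no IP options or VLAN tags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two lengths handed to the NIC. */
struct sketch_tx_lens {
	uint16_t l2_len;
	uint16_t l3_len;
};

/*
 * Inner-checksum-only case: everything in front of the inner IP header
 * (outer L2, outer L3, UDP, tunnel header, inner L2) is folded into
 * l2_len, so the hardware sees what looks like a plain packet with a
 * long L2 header. l3_len is the inner IP header length.
 */
static struct sketch_tx_lens
sketch_inner_only_lens(uint16_t outer_l2, uint16_t outer_l3,
		       uint16_t udp_hdr, uint16_t tunnel_hdr,
		       uint16_t inner_l2, uint16_t inner_l3)
{
	struct sketch_tx_lens lens;

	assert(inner_l3 > 0);
	lens.l2_len = (uint16_t)(outer_l2 + outer_l3 + udp_hdr +
				 tunnel_hdr + inner_l2);
	lens.l3_len = inner_l3;
	return lens;
}
```

For Ethernet (14) + IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet (14) and
an inner IPv4 header of 20 bytes, this gives l2_len = 64 and l3_len = 20.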

> 
> Regards,
> Olivier



[dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf rss

2015-01-21 Thread Ouyang Changchun
To follow up on the comments from Wodkowski, PawelX: remove this unnecessary check.
check_mq_mode has already checked the queue number in the device configure stage;
if the queue number of the VF is not correct, it returns an error code and exits,
so there is no need to check again here in the device start stage (note:
pf_host_configure is called in the device start stage).

This fixes commit 42d2f78abcb77ecb769be4149df550308169ef0f

Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_ixgbe/ixgbe_pf.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_pf.c b/lib/librte_pmd_ixgbe/ixgbe_pf.c
index 93f6e43..dbda9b5 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_pf.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_pf.c
@@ -187,21 +187,6 @@ int ixgbe_pf_host_configure(struct rte_eth_dev *eth_dev)
IXGBE_WRITE_REG(hw, IXGBE_MPSAR_LO(hw->mac.num_rar_entries), 0);
IXGBE_WRITE_REG(hw, IXGBE_MPSAR_HI(hw->mac.num_rar_entries), 0);

-   /*
-* VF RSS can support at most 4 queues for each VF, even if
-* 8 queues are available for each VF, it need refine to 4
-* queues here due to this limitation, otherwise no queue
-* will receive any packet even RSS is enabled.
-*/
-   if (eth_dev->data->dev_conf.rxmode.mq_mode == ETH_MQ_RX_VMDQ_RSS) {
-   if (RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool == 8) {
-   RTE_ETH_DEV_SRIOV(eth_dev).active = ETH_32_POOLS;
-   RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool = 4;
-   RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx =
-   dev_num_vf(eth_dev) * 4;
-   }
-   }
-
/* set VMDq map to default PF pool */
hw->mac.ops.set_vmdq(hw, 0, RTE_ETH_DEV_SRIOV(eth_dev).def_vmdq_idx);

-- 
1.8.4.2
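
The reasoning in the commit message (validate once at configure time, so start
time does not need to repeat or patch up the value) can be sketched as below.
All names are hypothetical, and the limit of 4 queues per VF is taken from the
comment removed by the patch; the real check_mq_mode logic may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Limit quoted in the removed comment: at most 4 queues per VF for VF RSS. */
#define SKETCH_MAX_VF_RSS_QUEUES 4

/* Configure stage: reject an unsupported queue count up front. */
static int
sketch_dev_configure(uint16_t nb_q_per_pool)
{
	if (nb_q_per_pool > SKETCH_MAX_VF_RSS_QUEUES)
		return -EINVAL;
	return 0;
}

/*
 * Start stage: only reached after a successful configure, so the queue
 * count is already known to be valid and is not re-checked or rewritten.
 */
static int
sketch_dev_start(uint16_t nb_q_per_pool)
{
	assert(nb_q_per_pool <= SKETCH_MAX_VF_RSS_QUEUES);
	return 0;
}
```

A caller would run sketch_dev_configure() first and call sketch_dev_start()
only when it returned 0.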



[dpdk-dev] [PATCH v6 4/4] docs: Add ABI documentation

2015-01-21 Thread Thomas Monjalon
2015-01-20 16:17, Neil Horman:
> Adding a document describing rudimentary ABI policy and adding notice space for
> any deprecation announcements
> 
> Signed-off-by: Neil Horman 
> CC: Thomas Monjalon 
> CC: "Richardson, Bruce" 
> 
> ---
> Change notes:
> 
> v5) Updated documentation to add notes from Thomas M.
> 
> v6) Moved abi.txt to guides/rel_notes/abi.rst

You didn't integrate this file in the index.

[...]

> --- /dev/null
> +++ b/doc/guides/rel_notes/abi.rst
> @@ -0,0 +1,38 @@
> +ABI policy
> +==========
> + ABI versions are set at the time of major release labeling, and ABI may
> +change multiple times between the last labeling and the HEAD label of the git
> +tree without warning
> +
> + ABI versions, once released are available until such time as their
> +deprecation has been noted here for at least one major release cycle, after it
> +has been tagged.  E.g. the ABI for DPDK 1.8 is shipped, and then the decision to
> +remove it is made during the development of DPDK 1.9.  The decision will be
> +recorded here, shipped with the DPDK 1.9 release, and actually removed when DPDK
> +1.10 ships.

As previously said, speaking about 2.0/2.1 would be more coherent.

> +
> + ABI versions may be deprecated in whole, or in part as needed by a given
> +update.
> +
> + Some ABI changes may be too significant to reasonably maintain multiple
> +versions of.  In those events ABI's may be updated without backward
> +compatibility provided.  The requirements for doing so are:
> + 1) At least 3 acknoweldgements of the need on the dpdk.org
> + 2) A full deprecation cycle must be made to offer downstream consumers
> +sufficient warning of the change.  E.g. if dpdk 2.0 is under development when
> +the change is proposed, a deprecation notice must be added to this file, and
> +released with dpdk 2.0.  Then the change may be incorporated for dpdk 2.1
> + 3) The LIBABIVER variable in the makefilei(s) where the ABI changes are
> +incorporated must be incremented in parallel with the ABI changes themselves
> +
> + Note that the above process for ABI deprecation should not be undertaken
> +lightly.  ABI stability is extreemely important for downstream consumers of 
> the
> +DPDK, especially when distributed in shared object form.  Every effort 
> should be
> +made to preserve ABI whenever possible.  For instance, reorganizing public
> +structure field for astetic or readability purposes should be avoided as it 
> will
> +cause ABI breakage.  Only significant (e.g. performance) reasons should be 
> seen
> +as cause to alter ABI.

When applying the patch, there are these (minor) warnings:

/home/thomas/projects/dpdk/dpdk/.git/rebase-apply/patch:52: trailing whitespace.
/home/thomas/projects/dpdk/dpdk/.git/rebase-apply/patch:55: new blank line at EOF.

When building the documentation, there are these errors:
make doc-guides-html
/home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:4: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:8: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:15: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:18: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:20: ERROR: Unexpected indentation.
/home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:22: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:25: ERROR: Unexpected indentation.
/home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:26: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:29: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:: WARNING: document isn't included in any toctree

Please check it.

Another comment: what about the additions I suggested about macros and
structure renaming?

Neil, we expect you to consider comments made previously and to test
your patch.
Otherwise, we are losing time in useless reviews.

-- 
Thomas
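
The quoted policy text says that reorganizing public structure fields causes ABI breakage. A minimal sketch of why, in C: the two structures below are hypothetical (they are not DPDK types) and hold the same fields in a different order. The size stays the same, but the offset of `flags` moves, so a binary built against the first layout would read the wrong bytes from the second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical public structure as shipped in some release N. */
struct port_conf_v1 {
	uint16_t nb_rx_queues;
	uint16_t nb_tx_queues;
	uint32_t flags;
};

/* Same fields, reordered "for readability" in release N+1. */
struct port_conf_v2 {
	uint32_t flags;
	uint16_t nb_rx_queues;
	uint16_t nb_tx_queues;
};

/* The total size is unchanged, which makes the breakage easy to miss. */
static_assert(sizeof(struct port_conf_v1) == sizeof(struct port_conf_v2),
	      "both layouts occupy the same number of bytes");

/* Returns 1 if a field that an old binary reads has moved. */
static int layout_changed(void)
{
	return offsetof(struct port_conf_v1, flags) !=
	       offsetof(struct port_conf_v2, flags);
}
```

An application compiled against the first header and linked at run time with a library built from the second would silently exchange `flags` and the queue counts. That is the case the policy asks contributors to avoid unless there is a significant reason.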


[dpdk-dev] [PATCH] doc: commands changed in testpmd_funcs for ethertype filter

2015-01-21 Thread Thomas Monjalon
Hi Jingjing,

Thanks for providing a patch quickly for the missing doc.
I have a few comments.

2015-01-21 16:30, Jingjing Wu:
> new commands for ethertype filter
>   - ethertype_filter (port_id) (add|del) (mac_addr|mac_ignr)
> (mac_address) ethertype (ether_type) (drop|fwd) queue (queue_id)
> 
> Signed-off-by: Jingjing Wu 
> ---
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst | 46 
> +++--
>  1 file changed, 11 insertions(+), 35 deletions(-)
> 
> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst 
> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> index be935c2..61a7f6d 100644
> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> @@ -1397,56 +1397,32 @@ add_ethertype_filter
>  
>  Add a L2 Ethertype filter, which identify packets by their L2 Ethertype 
> mainly assign them to a receive queue.
>  
> -add_ethertype_filter (port_id) ethertype (eth_value) priority 
> (enable|disable) (pri_value) queue (queue_id) index (idx)
> +ethertype_filter (port_id) (add|del) (mac_addr|mac_ignr) (mac_address) 
> ethertype (ether_type) (drop|fwd) queue (queue_id)
>  
>  The available information parameters are:
>  
>  *   port_id:  the port which the Ethertype filter assigned on.
>  
> -*   eth_value: the EtherType value want to match,
> -for example 0x0806 for ARP packet. 0x0800 (IPv4) and 0x86DD (IPv6) are 
> invalid.
> -
> -*   enable: user priority participates in the match.
> -
> -*   disable: user priority doesn't participate in the match.
> -
> -*   pri_value: user priority value that want to match.
> -
> -*   queue_id : The receive queue associated with this EtherType filter
> +*   mac_addr: need compare destination mac address.

is "need" needed? ;)
mac_addr: compare destination mac address.

>  
> -*   index: the index of this EtherType filter
> -
> -Example:
> -
> -.. code-block:: console
> +*   mac_ignr: ignore destination mac address match.
>  
> -testpmd> add_ethertype_filter 0 ethertype 0x0806 priority disable 0 
> queue 3 index 0
> -Assign ARP packet to receive queue 3
> +*   mac_address: destination mac address need to match.

again, I would remove "need"
mac_address: destination mac address to match.

>  
> -remove_ethertype_filter
> -~~~
> -
> -Remove a L2 Ethertype filter
> -
> -remove_ethertype_filter (port_id) index (idx)
> -
> -get_ethertype_filter
> -
> -
> -Get and display a L2 Ethertype filter
> +*   ether_type: the EtherType value want to match,
> +for example 0x0806 for ARP packet. 0x0800 (IPv4) and 0x86DD (IPv6) are 
> invalid.
>  
> -get_ethertype_filter (port_id) index (idx)
> +*   queue_id : The receive queue associated with this EtherType filter. It 
> is meaningless when deleting or dropping.

Do you mean queue_id is optional?

>  
>  Example:
>  
>  .. code-block:: console
>  
> -testpmd> get_ethertype_filter 0 index 0
> +testpmd> ethertype_filter 0 add mac_ignr ethertype 0x0806 fwd queue 3
> +add a rule to assign ARP packet to receive queue 3

You are adding a comment in the code-block. Not sure it is a good idea.

>  
> -filter[0]:
> -ethertype: 0x0806
> -priority: disable, 0
> -queue: 3
> + testpmd> ethertype_filter 0 del mac_ignr ethertype 0x0806 fwd queue 3

The indent is strange here.

> +delete the rule to assign ARP packet to receive queue 3
>  
>  add_2tuple_filter
>  ~
> 


-- 
Thomas


[dpdk-dev] [PATCH v6 4/4] docs: Add ABI documentation

2015-01-21 Thread Iremonger, Bernard
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> Sent: Tuesday, January 20, 2015 9:18 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v6 4/4] docs: Add ABI documentation
> 
> Adding a document describing rudimentary ABI policy and adding notice space 
> for any deprecation
> announcements
> 
> Signed-off-by: Neil Horman 
> CC: Thomas Monjalon 
> CC: "Richardson, Bruce" 
> 
> ---
> Change notes:
> 
> v5) Updated documentation to add notes from Thomas M.
> 
> v6) Moved abi.txt to guides/rel_notes/abi.rst
> ---
>  doc/guides/rel_notes/abi.rst | 38 ++
>  1 file changed, 38 insertions(+)
>  create mode 100644 doc/guides/rel_notes/abi.rst

Hi Neil,

The file doc/guides/rel_notes/index.rst  should be modified to include "abi" so 
that the abi.rst file is included in the release notes.

> 
> diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst new 
> file mode 100644 index
> 000..98ac19d
> --- /dev/null
> +++ b/doc/guides/rel_notes/abi.rst
> @@ -0,0 +1,38 @@
> +ABI policy
> +==
> + ABI versions are set at the time of major release labeling, and ABI
> +may change multiple times between the last labeling and the HEAD label
> +of the git tree without warning
> +
> + ABI versions, once released are available until such time as their
> +deprecation has been noted here for at least one major release cycle,
> +after it has been tagged.  E.g. the ABI for DPDK 1.8 is shipped, and
> +then the decision to remove it is made during the development of DPDK
> +1.9.  The decision will be recorded here, shipped with the DPDK 1.9
> +release, and actually removed when DPDK
> +1.10 ships.
> +
> + ABI versions may be deprecated in whole, or in part as needed by a
> +given update.
> +
> + Some ABI changes may be too significant to reasonably maintain
> +multiple versions of.  In those events ABI's may be updated without
> +backward compatibility provided.  The requirements for doing so are:

The #. syntax could be used for numbered lists.

> + 1) At least 3 acknoweldgements of the need on the dpdk.org

A blank line is needed otherwise the text will concatenate.

> + 2) A full deprecation cycle must be made to offer downstream consumers
> +sufficient warning of the change.  E.g. if dpdk 2.0 is under
> +development when the change is proposed, a deprecation notice must be
> +added to this file, and released with dpdk 2.0.  Then the change may be 
> incorporated for dpdk 2.1

A blank line is needed otherwise the text will concatenate.

> + 3) The LIBABIVER variable in the makefilei(s) where the ABI changes
> +are incorporated must be incremented in parallel with the ABI changes
> +themselves

A blank line is needed otherwise the text will concatenate.
> +
> + Note that the above process for ABI deprecation should not be
> +undertaken lightly.  ABI stability is extreemely important for
> +downstream consumers of the DPDK, especially when distributed in shared
> +object form.  Every effort should be made to preserve ABI whenever
> +possible.  For instance, reorganizing public structure field for
> +astetic or readability purposes should be avoided as it will cause ABI
> +breakage.  Only significant (e.g. performance) reasons should be seen as 
> cause to alter ABI.
> +
> +Deprecation Notices
> +===
> +
> --
> 2.1.0

Regards,

Bernard.


[dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine

2015-01-21 Thread Olivier MATZ
Hi Jijiang,

On 01/21/2015 09:01 AM, Liu, Jijiang wrote:
>>> I still don't understand why you are so eager to 'forbid' it.
>>> Yes we support it for FVL, but no one forces you to use it.
>>
>> Well, how would you describe this 2 ways of doing the same thing in the
>> offload API? Would you talk about the i40e registers? It's not because i40e
>> has 2 ways to do the same operation that the DPDK should do the same.
>>
>> How will you explain to a user how to choose between these 2 cases?
>
> Talk about B method in 
> http://dpdk.org/ml/archives/dev/2014-December/009213.html again.
>
> DPDK Never supports a  NIC that can recognize tunneling packet for TX side 
> before 1.8, right?

When you say "recognize tunnel", if you mean offloading checksum of
tunnel headers, I agree.

If you mean recognizing a tunnel packet in rx, I also agree
it's new to dpdk-1.8, but I think it's unrelated to what we
are talking about, which is tx checksum. A DPDK application
is able to generate tunnel packets by itself and offload the
checksums to the NIC.

> So when we need to support TX checksum offload for tunneling packet,  and we 
> have to choose B.2.

I don't see why we should choose either B.1 or B.2 (I guess you want
to say B.1 here, right?).

The m->lX_len are not filled in rx today. If one day they are, it won't
prevent the application from configuring the lX_len fields and offload
flags according to the API.

> After introducing i40e(FVL),  FVL is able to recognize tunneling packet and  
> support outer IP, or inner IP or outer IP and inner IP TX checksum for 
> tunneling packet.
> And you agree on "outer and inner at the same time", why do you object "only 
> inner"?
>
> Actually, B.2 method is a software workaround using L2 length when NIC can't 
> recognize tunneling packet.
> When NIC is able to recognize tunneling packet, I think you shouldn't take 
> B.2 as a standard to 'forbid' other method.

Again, I'm not sure there is a link between "recognizing tunneling
packets" and tx checksum offload of tunnels.

Regards,
Olivier



[dpdk-dev] [PATCH v6 4/4] docs: Add ABI documentation

2015-01-21 Thread Neil Horman
On Wed, Jan 21, 2015 at 11:25:48AM +0100, Thomas Monjalon wrote:
> 2015-01-20 16:17, Neil Horman:
> > Adding a document describing rudimentary ABI policy and adding notice space 
> > for
> > any deprecation announcements
> > 
> > Signed-off-by: Neil Horman 
> > CC: Thomas Monjalon 
> > CC: "Richardson, Bruce" 
> > 
> > ---
> > Change notes:
> > 
> > v5) Updated documentation to add notes from Thomas M.
> > 
> > v6) Moved abi.txt to guides/rel_notes/abi.rst
> 
> You didn't integrate this file in the index.
> 
Shiobahn indicated that it's just a plain text file, so I left it as a plain text
file.  I guess we have different definitions of plain text files.

> [...]
> 
> > --- /dev/null
> > +++ b/doc/guides/rel_notes/abi.rst
> > @@ -0,0 +1,38 @@
> > +ABI policy
> > +==
> > +   ABI versions are set at the time of major release labeling, and ABI may
> > +change multiple times between the last labeling and the HEAD label of the 
> > git
> > +tree without warning
> > +
> > +   ABI versions, once released are available until such time as their
> > +deprecation has been noted here for at least one major release cycle, 
> > after it
> > +has been tagged.  E.g. the ABI for DPDK 1.8 is shipped, and then the 
> > decision to
> > +remove it is made during the development of DPDK 1.9.  The decision will be
> > +recorded here, shipped with the DPDK 1.9 release, and actually removed 
> > when DPDK
> > +1.10 ships.
> 
> As previously said, speaking about 2.0/2.1 would be more coherent.
> 
As previously mentioned, I really don't see this as relevant, as it will be out
of date within a release, and I think we can agree, no one is going to update
this paragraph every release.

> > +
> > +   ABI versions may be deprecated in whole, or in part as needed by a given
> > +update.
> > +
> > +   Some ABI changes may be too significant to reasonably maintain multiple
> > +versions of.  In those events ABI's may be updated without backward
> > +compatibility provided.  The requirements for doing so are:
> > +   1) At least 3 acknoweldgements of the need on the dpdk.org
> > +   2) A full deprecation cycle must be made to offer downstream consumers
> > +sufficient warning of the change.  E.g. if dpdk 2.0 is under development 
> > when
> > +the change is proposed, a deprecation notice must be added to this file, 
> > and
> > +released with dpdk 2.0.  Then the change may be incorporated for dpdk 2.1
> > +   3) The LIBABIVER variable in the makefilei(s) where the ABI changes are
> > +incorporated must be incremented in parallel with the ABI changes 
> > themselves
> > +
> > +   Note that the above process for ABI deprecation should not be undertaken
> > +lightly.  ABI stability is extreemely important for downstream consumers 
> > of the
> > +DPDK, especially when distributed in shared object form.  Every effort 
> > should be
> > +made to preserve ABI whenever possible.  For instance, reorganizing public
> > +structure field for astetic or readability purposes should be avoided as 
> > it will
> > +cause ABI breakage.  Only significant (e.g. performance) reasons should be 
> > seen
> > +as cause to alter ABI.
> 
> When applying the patch, there are these (minor) warnings:
> 
> /home/thomas/projects/dpdk/dpdk/.git/rebase-apply/patch:52: trailing 
> whitespace.
> /home/thomas/projects/dpdk/dpdk/.git/rebase-apply/patch:55: new blank line at 
> EOF.
> 
> When building the documentation, there are these errors:
> make doc-guides-html
> /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:4: WARNING: 
> Block quote ends without a blank line; unexpected unindent.
> /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:8: WARNING: 
> Block quote ends without a blank line; unexpected unindent.
> /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:15: WARNING: 
> Block quote ends without a blank line; unexpected unindent.
> /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:18: WARNING: 
> Block quote ends without a blank line; unexpected unindent.
> /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:20: ERROR: 
> Unexpected indentation.
> /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:22: WARNING: 
> Block quote ends without a blank line; unexpected unindent.
> /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:25: ERROR: 
> Unexpected indentation.
> /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:26: WARNING: 
> Block quote ends without a blank line; unexpected unindent.
> /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:29: WARNING: 
> Block quote ends without a blank line; unexpected unindent.
> /home/thomas/projects/dpdk/dpdk/doc/guides/rel_notes/abi.rst:: WARNING: 
> document isn't included in any toctree
> 
> Please check it.
> 
Again, I guess we have separate definitions of what a plain text file is, but
I'll look into it.


> Other comment, what about the additions I suggested about macros and 
> structure renaming?
> 
Considered and answered 

[dpdk-dev] DPDK Community Call, Monday 2nd February, 17:00 GMT

2015-01-21 Thread O'driscoll, Tim
> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, January 20, 2015 5:11 PM
> To: O'driscoll, Tim
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] DPDK Community Call, Monday 2nd February, 17:00
> GMT
> 
> On Tue, 20 Jan 2015 15:21:40 +
> "O'driscoll, Tim"  wrote:
> 
> > We had our last community call in December, and then took a break over
> the holiday period. I think we should reinstate these, so I've scheduled the
> next one for Monday 2nd February. Since our last call was at a time
> convenient for Asia, this one is at a time that's more convenient for people
> based in the USA. As for previous calls, I'll post a recording to youtube
> afterwards for anybody who can't make it.
> >
> > I don't have an agenda yet, but will send one out in advance of the
> meeting.
> 
> This is right after FOSDEM and many people will be returning home.

Good point. Thanks for pointing this out. I'll move the meeting to avoid this 
clash.


Tim


[dpdk-dev] [PATCH v4 06/11] eal/linux/pci: Add functions for unmapping igb_uio resources

2015-01-21 Thread Tetsuya Mukawa
Hi Michael,

On 2015/01/20 18:23, Qiu, Michael wrote:
> On 1/19/2015 6:42 PM, Tetsuya Mukawa wrote:
>> The patch adds functions for unmapping igb_uio resources. The patch is only
>> for Linux and igb_uio environment. VFIO and BSD are not supported.
>>
>> v4:
>> - Add paramerter checking.
>> - Add header file to determine if hotplug can be enabled.
>>
>> Signed-off-by: Tetsuya Mukawa 
>> ---
>>  lib/librte_eal/common/Makefile  |  1 +
>>  lib/librte_eal/common/include/rte_dev_hotplug.h | 44 +
>>  lib/librte_eal/linuxapp/eal/eal_pci.c   | 38 +++
>>  lib/librte_eal/linuxapp/eal/eal_pci_init.h  |  8 +++
>>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c   | 65 
>> +
>>  5 files changed, 156 insertions(+)
>>  create mode 100644 lib/librte_eal/common/include/rte_dev_hotplug.h
>>
>> diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
>> index 52c1a5f..db7cc93 100644
>> --- a/lib/librte_eal/common/Makefile
>> +++ b/lib/librte_eal/common/Makefile
>> @@ -41,6 +41,7 @@ INC += rte_eal_memconfig.h rte_malloc_heap.h
>>  INC += rte_hexdump.h rte_devargs.h rte_dev.h
>>  INC += rte_common_vect.h
>>  INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
>> +INC += rte_dev_hotplug.h
>>  
>>  ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
>>  INC += rte_warnings.h
>> diff --git a/lib/librte_eal/common/include/rte_dev_hotplug.h 
>> b/lib/librte_eal/common/include/rte_dev_hotplug.h
>> new file mode 100644
>> index 000..b333e0f
>> --- /dev/null
>> +++ b/lib/librte_eal/common/include/rte_dev_hotplug.h
>> @@ -0,0 +1,44 @@
>> +/*-
>> + *   BSD LICENSE
>> + *
>> + *   Copyright(c) 2015 IGEL Co.,LTd.
>> + *   All rights reserved.
>> + *
>> + *   Redistribution and use in source and binary forms, with or without
>> + *   modification, are permitted provided that the following conditions
>> + *   are met:
>> + *
>> + * * Redistributions of source code must retain the above copyright
>> + *   notice, this list of conditions and the following disclaimer.
>> + * * Redistributions in binary form must reproduce the above copyright
>> + *   notice, this list of conditions and the following disclaimer in
>> + *   the documentation and/or other materials provided with the
>> + *   distribution.
>> + * * Neither the name of IGEL Co.,Ltd. nor the names of its
>> + *   contributors may be used to endorse or promote products derived
>> + *   from this software without specific prior written permission.
>> + *
>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + */
>> +
>> +#ifndef _RTE_DEV_HOTPLUG_H_
>> +#define _RTE_DEV_HOTPLUG_H_
>> +
>> +/*
>> + * determine if hotplug can be enabled on the system
>> + */
>> +#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
> As you said, VFIO should not work with it, so does it need to add the
> vfio check here?
I appreciate your comment.
Yes, it should be. I will fix it in the next version.

Thanks,
Tetsuya

> Thanks,
> Michael
>> +#define ENABLE_HOTPLUG
>> +#endif /* RTE_LIBRTE_EAL_HOTPLUG & RTE_LIBRTE_EAL_LINUXAPP */
>> +
>> +#endif /* _RTE_DEV_HOTPLUG_H_ */
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
>> b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> index 3d2d93c..52c464c 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> @@ -137,6 +137,25 @@ pci_map_resource(void *requested_addr, int fd, off_t 
>> offset, size_t size)
>>  return mapaddr;
>>  }
>>  
>> +#ifdef ENABLE_HOTPLUG
>> +/* unmap a particular resource */
>> +void
>> +pci_unmap_resource(void *requested_addr, size_t size)
>> +{
>> +if (requested_addr == NULL)
>> +return;
>> +
>> +/* Unmap the PCI memory resource of device */
>> +if (munmap(requested_addr, size)) {
>> +RTE_LOG(ERR, EAL, "%s(): cannot munmap(%p, 0x%lx): %s\n",
>> +__func__, requested_addr, (unsigned long)size,
>> +strerror(errno));
>> +} else
>> +RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n",
>> +requested_addr);
>> +}
>> +#endif /* ENABLE_HOTPLUG 
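
The unmap helper in the quoted patch can be exercised outside DPDK with a small stand-alone sketch. The names below (`unmap_region`, `map_then_unmap`) are hypothetical and use plain `mmap`/`munmap` on anonymous memory instead of a UIO resource. The quoted patch logs "PCI memory mapped at" after a successful `munmap`, which looks like a leftover from the map path; this sketch logs "unmapped" instead.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Unmap a region; a NULL address is treated as "nothing to do". */
static int unmap_region(void *addr, size_t size)
{
	if (addr == NULL)
		return 0;

	if (munmap(addr, size) != 0) {
		fprintf(stderr, "%s(): cannot munmap(%p, 0x%lx): %s\n",
			__func__, addr, (unsigned long)size, strerror(errno));
		return -1;
	}

	fprintf(stderr, "  memory unmapped at %p\n", addr);
	return 0;
}

/* Map anonymous memory, touch it, then release it again. */
static int map_then_unmap(size_t size)
{
	void *addr;

	assert(size > 0);
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return -1;

	memset(addr, 0xab, size);
	return unmap_region(addr, size);
}
```

Returning a status rather than `void` is a choice made for this sketch so that the caller can tell whether the mapping is really gone; the patch itself only logs the error.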

[dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS

2015-01-21 Thread Wodkowski, PawelX


> -Original Message-
> From: Ouyang, Changchun
> Sent: Wednesday, January 21, 2015 3:44 AM
> To: Wodkowski, PawelX; dev at dpdk.org
> Cc: Ouyang, Changchun
> Subject: RE: [dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS
> 
> 
> 
> > -Original Message-
> > From: Wodkowski, PawelX
> > Sent: Tuesday, January 20, 2015 5:35 PM
> > To: Ouyang, Changchun; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang
> > Changchun
> > > Sent: Monday, January 12, 2015 6:59 AM
> > > To: dev at dpdk.org
> > > Subject: [dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS
> > >
> > > It needs config RSS and IXGBE_MRQC and IXGBE_VFPSRTYPE to enable VF
> > RSS.
> > >
> > > The psrtype will determine how many queues the received packets will
> > > distribute to, and the value of psrtype should depends on both facet:
> > > max VF rxq number which has been negotiated with PF, and the number of
> > > rxq specified in config on guest.
> > >
> > > Signed-off-by: Changchun Ouyang 
> > >
> > > Changes in v6:
> > >   - Raise an error for the case of ETH_16_POOLS in config vf rss, as the
> > previous
> > > logic have changed it into: ETH_32_POOLS.
> > >
> > > Changes in v4:
> > >  - The number of rxq from config should be power of 2 and should not
> > > bigger than
> > > max VF rxq number(negotiated between guest and host).
> > >
> > > ---
> > >  lib/librte_pmd_ixgbe/ixgbe_pf.c   |  15 ++
> > >  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 102
> > > +-
> > >  2 files changed, 105 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_pf.c
> > > b/lib/librte_pmd_ixgbe/ixgbe_pf.c index dbda9b5..93f6e43 100644
> > > --- a/lib/librte_pmd_ixgbe/ixgbe_pf.c
> > > +++ b/lib/librte_pmd_ixgbe/ixgbe_pf.c
> > > @@ -187,6 +187,21 @@ int ixgbe_pf_host_configure(struct rte_eth_dev
> > > *eth_dev)
> > >   IXGBE_WRITE_REG(hw, IXGBE_MPSAR_LO(hw-
> > >mac.num_rar_entries),
> > > 0);
> > >   IXGBE_WRITE_REG(hw, IXGBE_MPSAR_HI(hw-
> > >mac.num_rar_entries),
> > > 0);
> > >
> > > + /*
> > > +  * VF RSS can support at most 4 queues for each VF, even if
> > > +  * 8 queues are available for each VF, it need refine to 4
> > > +  * queues here due to this limitation, otherwise no queue
> > > +  * will receive any packet even RSS is enabled.
> > > +  */
> > > + if (eth_dev->data->dev_conf.rxmode.mq_mode ==
> > > ETH_MQ_RX_VMDQ_RSS) {
> > > + if (RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool == 8) {
> > > + RTE_ETH_DEV_SRIOV(eth_dev).active =
> > > ETH_32_POOLS;
> > > + RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool = 4;
> > > + RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx =
> > > + dev_num_vf(eth_dev) * 4;
> > > + }
> > > + }
> > > +
> >
> > I did not looked before at your patches but I think you are messing with
> > things that should not be changed:
> >
> > Why you are changing those values. They are set up during
> > ixgbe_pf_host_init(). Limitation you are describing is only RSS related. If
> > there will be reconfiguration from ETH_MQ_RX_VMDQ_RSS to other mode
> > those value need to be re-evaluated. If you find this kind of limitation you
> > should handle it during RSS part configuration. Or if your way is the right 
> > way
> > you should explicitly make separate function that will re-evaluate those
> > parameters each time.
> >
> > Second issue with this code is that the nb_q_per_pool is changed from:
> > ixgbe_pf_host_configure() -> ixgbe_dev_start() -> rte_eth_dev_start() and
> > rte_eth_dev_check_vf_rss_rxq_num() -> rte_eth_dev_check_mq_mode() ->
> > rte_eth_dev_configure()
> >
> > Which one is the right one? If both, why they are calculated twice?
> >
> > I don't think that rte_eth_dev_data::sriov field should be changed at all - 
> > it
> > holds current SRIOV capabilities.
> > If this will change during runtime it no point to have this field at all 
> > and should
> > be some kind of "siov_get()"
> > function that will calculate and return those parameters dynamically.
> >
> > Please refer also to
> >
>  > .com>
> > for further issues.
> >
> > I think this patchset should not be applied.
> 
> The better way should be either raise your comments before this patch is
> merged into mainline, or

Yes, I should have, but I trusted that Vlad's review was covering this part.
It does not matter; my fault.

> You send out a patch to fix it.
> I agree on part of what you said, the check is not necessary for vf rss in
> pf_host_configure because
> Check_mq_mode has already check the queue number, I will send out a patch to
> fix it by removing this check.
> 
> On the other hand, I disagree with you on " rte_eth_dev_data::sriov field 
> should
> be changed at all ",

This is my private opinion, but either way, recalculating those values or not,
it should be consistent and for feature development 
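
The v4 change note quoted above states that the number of rx queues from the config should be a power of 2 and not bigger than the max VF rxq number negotiated between guest and host. A minimal sketch of such a check, with a hypothetical function name (this is not the ixgbe PMD code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if nb_rxq is usable for VF RSS under the rule quoted in the
 * change note: non-zero, a power of two, and not above the negotiated
 * maximum. Returns 0 otherwise.
 */
static int vf_rss_rxq_num_valid(uint16_t nb_rxq, uint16_t max_vf_rxq)
{
	if (nb_rxq == 0 || nb_rxq > max_vf_rxq)
		return 0;

	/* A power of two has exactly one bit set. */
	return (nb_rxq & (nb_rxq - 1)) == 0;
}

/* Usage: with the limit of 4 queues per VF mentioned in the patch comment. */
static void vf_rss_rxq_examples(void)
{
	assert(vf_rss_rxq_num_valid(4, 4));
	assert(vf_rss_rxq_num_valid(2, 4));
	assert(!vf_rss_rxq_num_valid(3, 4)); /* not a power of two */
	assert(!vf_rss_rxq_num_valid(8, 4)); /* above the negotiated max */
}
```

Doing this validation once, at configuration time, is in line with the point raised in the thread that the queue number is already checked in the mq mode check and need not be adjusted again in ixgbe_pf_host_configure().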

[dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine

2015-01-21 Thread Liu, Jijiang
Hi,

> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Wednesday, January 21, 2015 2:16 AM
> To: Ananyev, Konstantin; Liu, Jijiang
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and
> csum forwarding engine
> 
> Hi Konstantin,
> >
> >> Case 4) and case 9) would fill the hardware registers exactly the same.
> >
> > No, they wouldn't.
> > Please read corresponding section of FVL spec and i40e_rxtx.c For case
> > 4) we only need to setup TDD (TX data descriptor) with the following
> values:
> > IIPT, IPLEN, L4T, L4LEN
> > For case 9) we need to setup both TDD and TCD (TX context descriptor)
> with the following values:
> > TDD: IIPT, IPLEN, L4T, L4LEN
> > TCD: EIPT, EIPLEN,  L4TUNT, L4TUNLEN
> >
> >> To me, it's just an API question.
> >
> > No, it is not.
> >
> > I still don't understand why you are so eager to 'forbid' it.
> > Yes we support it for FVL, but no one forces you to use it.
> 
> Well, how would you describe this 2 ways of doing the same thing in the
> offload API? Would you talk about the i40e registers? It's not because i40e
> has 2 ways to do the same operation that the DPDK should do the same.
> 
> How will you explain to a user how to choose between these 2 cases?

Let's talk about method B in
http://dpdk.org/ml/archives/dev/2014-December/009213.html again.

DPDK never supported a NIC that can recognize tunneling packets on the TX side
before 1.8, right?
So when we needed to support TX checksum offload for tunneling packets, we
had to choose B.2.

After introducing i40e (FVL), FVL is able to recognize tunneling packets and
support outer IP, inner IP, or both outer and inner IP TX checksum for
tunneling packets.
And you agree on "outer and inner at the same time", so why do you object to
"only inner"?

Actually, the B.2 method is a software workaround using the L2 length when the
NIC can't recognize tunneling packets.
When the NIC is able to recognize tunneling packets, I think you shouldn't take
B.2 as a standard to 'forbid' other methods.


> Having to support these 2 different cases for the same thing will complexify
> all future drivers that will not work the same way than i40e.
> 
> >>> As one of possible use-cases:  HW VLAN tags insertion for both inner and
> outer packets.
> >>> FVL can do that, though as I know our PMD doesn't implement it yet.
> >>> For that, we'll need to specify at least:
> >>> outer_l2_len, outer_l3_len, l2_len.
> >>> While PKT_TX_OUTER_* might stay cleared.
> >>
> >> If a VLAN flag has to be inserted in outer header, a new flag
> >> PKT_TX_OUTER_INSERT_VLAN would be added. So my specification
> would
> >> still be correct:
> >>
> >> The driver should look at mb->outer_lX_len only if a
> >> PKT_TX_OUTER_* flag is present.
> >>
> >
> > Introducing PKT_TX_OUTER_INSERT_VLAN is ok.
> > Though I still think we'll need TX_*_TUNNEL flags and no need to 'forbid'
> case 9).
> > BTW, as I can see linux i40e driver for tunnelling packets uses case 9), not
> case 4), right?
> 
> I need to check this.
> 
> Regards,
> Olivier
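
The rule proposed in the quoted text ("the driver should look at mb->outer_lX_len only if a PKT_TX_OUTER_* flag is present") can be sketched as follows. The structure and flag names are hypothetical stand-ins, not the rte_mbuf definitions, and the sketch assumes that `l2_len` covers everything between the last described L3 header and the inner L3 header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offload flags, standing in for the PKT_TX_* family. */
#define TX_F_IP_CKSUM        (1u << 0)
#define TX_F_OUTER_IP_CKSUM  (1u << 1)

/* Hypothetical per-packet metadata, standing in for the mbuf fields. */
struct pkt_meta {
	uint32_t ol_flags;
	uint16_t outer_l2_len;
	uint16_t outer_l3_len;
	uint16_t l2_len;
	uint16_t l3_len;
};

/*
 * Offset of the inner L3 header from the start of the packet.
 * The outer lengths are read only when an outer offload is requested;
 * otherwise they are ignored and l2_len alone describes the headers
 * to skip (the "B.2" style of the thread).
 */
static uint16_t inner_l3_offset(const struct pkt_meta *m)
{
	uint16_t off;

	assert(m != NULL);
	off = m->l2_len;
	if (m->ol_flags & TX_F_OUTER_IP_CKSUM)
		off = (uint16_t)(off + m->outer_l2_len + m->outer_l3_len);
	return off;
}
```

For a VXLAN-like packet with a 14-byte outer Ethernet header, a 20-byte outer IPv4 header and 30 bytes of UDP, tunnel and inner Ethernet headers, both descriptions point at the same inner L3 offset of 64 bytes. That is the sense in which the thread calls them two ways of expressing the same thing.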



[dpdk-dev] [PATCH] stats: remove useless memset's

2015-01-21 Thread Neil Horman
On Tue, Jan 20, 2015 at 08:16:58PM -0800, stephen at networkplumber.org wrote:
> From: Stephen Hemminger 
> 
> The rte_eth_stats_get is the only API that should call the device
> statistics function directly, and it already does a memset of the
> resulting structure. Therefore doing memset() in the driver is
> redundant and should be removed.
> 
> Signed-off-by: Stephen Hemminger 
Acked-by: Neil Horman 
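
For readers skimming the archive, the reasoning in the commit message can be
sketched as below. This is a simplified model, not the ethdev code: the struct
and function names are hypothetical, and only the pattern (the generic layer
zeroes the structure once, so the driver callback does not need to) is taken
from the patch description.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced statistics structure. */
struct stats {
	uint64_t ipackets;
	uint64_t opackets;
	uint64_t ierrors;
};

typedef void (*stats_get_t)(struct stats *st);

/* Driver callback: fills only the counters it supports, without memset. */
static void
drv_stats_get(struct stats *st)
{
	st->ipackets = 10;
	st->opackets = 7;
}

/* Generic entry point: zero the structure once, then call the driver. */
static void
generic_stats_get(stats_get_t cb, struct stats *st)
{
	assert(cb != NULL && st != NULL);
	memset(st, 0, sizeof(*st));
	cb(st);
}
```

Counters the driver never touches (ierrors here) still read as zero, because
the only caller of the callback has already cleared them.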



[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-21 Thread Neil Horman
On Wed, Jan 21, 2015 at 12:02:57PM +, Ananyev, Konstantin wrote:
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wang, Zhihong
> > Sent: Wednesday, January 21, 2015 3:44 AM
> > To: Richardson, Bruce; Neil Horman
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > 
> > 
> > 
> > > -Original Message-
> > > From: Richardson, Bruce
> > > Sent: Wednesday, January 21, 2015 12:15 AM
> > > To: Neil Horman
> > > Cc: Wang, Zhihong; dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > >
> > > On Tue, Jan 20, 2015 at 10:11:18AM -0500, Neil Horman wrote:
> > > > On Tue, Jan 20, 2015 at 03:01:44AM +, Wang, Zhihong wrote:
> > > > >
> > > > >
> > > > > > -Original Message-
> > > > > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > > > > Sent: Monday, January 19, 2015 9:02 PM
> > > > > > To: Wang, Zhihong
> > > > > > Cc: dev at dpdk.org
> > > > > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> > > > > >
> > > > > > On Mon, Jan 19, 2015 at 09:53:30AM +0800, zhihong.wang at intel.com
> > > wrote:
> > > > > > > This patch set optimizes memcpy for DPDK for both SSE and AVX
> > > platforms.
> > > > > > > It also extends memcpy test coverage with unaligned cases and
> > > > > > > more test
> > > > > > points.
> > > > > > >
> > > > > > > Optimization techniques are summarized below:
> > > > > > >
> > > > > > > 1. Utilize full cache bandwidth
> > > > > > >
> > > > > > > 2. Enforce aligned stores
> > > > > > >
> > > > > > > 3. Apply load address alignment based on architecture features
> > > > > > >
> > > > > > > 4. Make load/store address available as early as possible
> > > > > > >
> > > > > > > 5. General optimization techniques like inlining, branch
> > > > > > > reducing, prefetch pattern access
> > > > > > >
> > > > > > > Zhihong Wang (4):
> > > > > > >   Disabled VTA for memcpy test in app/test/Makefile
> > > > > > >   Removed unnecessary test cases in test_memcpy.c
> > > > > > >   Extended test coverage in test_memcpy_perf.c
> > > > > > >   Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX
> > > > > > > platforms
> > > > > > >
> > > > > > >  app/test/Makefile  |   6 +
> > > > > > >  app/test/test_memcpy.c |  52 +-
> > > > > > >  app/test/test_memcpy_perf.c| 238 +---
> > > > > > >  .../common/include/arch/x86/rte_memcpy.h   | 664
> > > > > > +++--
> > > > > > >  4 files changed, 656 insertions(+), 304 deletions(-)
> > > > > > >
> > > > > > > --
> > > > > > > 1.9.3
> > > > > > >
> > > > > > >
> > > > > > Are you able to compile this with gcc 4.9.2?  The compilation of
> > > > > > test_memcpy_perf is taking forever for me.  It appears hung.
> > > > > > Neil
> > > > >
> > > > >
> > > > > Neil,
> > > > >
> > > > > Thanks for reporting this!
> > > > > It should compile but will take quite some time if the CPU doesn't 
> > > > > support
> > > AVX2, the reason is that:
> > > > > 1. The SSE & AVX memcpy implementation is more complicated than
> > > AVX2
> > > > > version thus the compiler takes more time to compile and optimize 2.
> > > > > The new test_memcpy_perf.c contains 126 constants memcpy calls for
> > > > > better test case coverage, that's quite a lot
> > > > >
> > > > > I've just tested this patch on an Ivy Bridge machine with GCC 4.9.2:
> > > > > 1. The whole compile process takes 9'41" with the original
> > > > > test_memcpy_perf.c (63 + 63 = 126 constant memcpy calls) 2. It takes
> > > > > only 2'41" after I reduce the constant memcpy call number to 12 + 12
> > > > > = 24
> > > > >
> > > > > I'll reduce memcpy call in the next version of patch.
> > > > >
> > > > ok, thank you.  I'm all for optimzation, but I think a compile that
> > > > takes almost
> > > > 10 minutes for a single file is going to generate some raised eyebrows
> > > > when end users start tinkering with it
> > > >
> > > > Neil
> > > >
> > > > > Zhihong (John)
> > > > >
> > > Even two minutes is a very long time to compile, IMHO. The whole of DPDK
> > > doesn't take that long to compile right now, and that's with a couple of 
> > > huge
> > > header files with routing tables in it. Any chance you could cut compile 
> > > time
> > > down to a few seconds while still having reasonable tests?
> > > Also, when there is AVX2 present on the system, what is the compile time
> > > like for that code?
> > >
> > >   /Bruce
> > 
> > Neil, Bruce,
> > 
> > Some data first.
> > 
> > Sandy Bridge without AVX2:
> > 1. original w/ 10 constant memcpy: 2'25"
> > 2. patch w/ 12 constant memcpy: 2'41"
> > 3. patch w/ 63 constant memcpy: 9'41"
> > 
> > Haswell with AVX2:
> > 1. original w/ 10 constant memcpy: 1'57"
> > 2. patch w/ 12 constant memcpy: 1'56"
> > 3. patch w/ 63 constant memcpy: 3'16"
> > 
> > Also, to address Bruce's question, we have to reduce test case to cut 

[dpdk-dev] [PATCH v6 0/6] enicpmd: Cisco Systems Inc. VIC Ethernet PMD

2015-01-21 Thread Sujith Sankar (ssujith)
Hi David,

Apologies for the delay.  I was not able to find quality time to finish it as a 
few other things have been keeping me busy.  But I shall work on it and provide 
the doc and the perf details soon.
In the meantime, it would be great if you could point me to some resources on 
running pktgen-dpdk, as I got stuck on it.

Thanks,
-Sujith

From: David Marchand <david.march...@6wind.com>
Date: Tuesday, 20 January 2015 4:55 pm
To: "Sujith Sankar (ssujith)" <ssujith at cisco.com>
Cc: "dev at dpdk.org" <dev at dpdk.org>, "Prasad Rao (prrao)" <prrao at cisco.com>,
 Neil Horman <nhorman at tuxdriver.com>, Thomas Monjalon <thomas.monjalon at 6wind.com>
Subject: Re: [dpdk-dev] [PATCH v6 0/6] enicpmd: Cisco Systems Inc. VIC Ethernet PMD

Hello Sujith,

Any news on the documentation and the performance numbers you said you would 
send ?

Thanks.

--
David Marchand

On Thu, Nov 27, 2014 at 4:31 PM, Thomas Monjalon <thomas.monjalon at 6wind.com> wrote:
2014-11-27 04:27, Sujith Sankar:
> Thanks Thomas, David and Neil !
>
> I shall work on finishing the documentation.
> About that, you had mentioned that you wanted it in doc/drivers/ path.
> Could I send a patch with documentation in the path doc/drivers/enicpmd/ ?

Yes.
I'd prefer doc/drivers/enic/ but it's a detail ;)
The format must be sphinx rst to allow web publishing.

It would be great to have some design documentation of every drivers
in doc/drivers.

Thanks
--
Thomas



[dpdk-dev] [PATCH v4 10/11] eal/pci: Add rte_eal_dev_attach/detach() functions

2015-01-21 Thread Qiu, Michael
On 1/19/2015 6:43 PM, Tetsuya Mukawa wrote:
> These functions are used for attaching or detaching a port.
> When rte_eal_dev_attach() is called, the function tries to realize the
> device name as pci address. If this is done successfully,
> rte_eal_dev_attach() will attach physical device port. If not, attaches
> virtual devive port.
> When rte_eal_dev_detach() is called, the function gets the device type
> of this port to know whether the port is came from physical or virtual.
> And then specific detaching function will be called.
>
> v4:
> - Fix comment.
> - Add error checking.
> - Fix indent of 'if' statement.
> - Change function name.
>
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_eal/common/eal_common_dev.c  | 273 
> 
>  lib/librte_eal/common/eal_private.h |  11 ++
>  lib/librte_eal/common/include/rte_dev.h |  33 
>  lib/librte_eal/linuxapp/eal/Makefile|   1 +
>  lib/librte_eal/linuxapp/eal/eal_pci.c   |   6 +-
>  5 files changed, 321 insertions(+), 3 deletions(-)
>
> diff --git a/lib/librte_eal/common/eal_common_dev.c 
> b/lib/librte_eal/common/eal_common_dev.c
> index eae5656..828bd70 100644
> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -32,10 +32,13 @@
>   *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>   */
>  
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -107,3 +110,273 @@ rte_eal_dev_init(void)
>   }
>   return 0;
>  }
> +
> +/* So far, DPDK hotplug function only supports linux */
> +#ifdef ENABLE_HOTPLUG
> +static void
> +rte_eal_dev_invoke(struct rte_driver *driver,
> + struct rte_devargs *devargs, enum rte_eal_invoke_type type)
> +{
> + if ((driver == NULL) || (devargs == NULL))
> + return;
> +
> + switch (type) {
> + case RTE_EAL_INVOKE_TYPE_PROBE:
> + driver->init(devargs->virtual.drv_name, devargs->args);
> + break;
> + case RTE_EAL_INVOKE_TYPE_CLOSE:
> + driver->uninit(devargs->virtual.drv_name, devargs->args);
> + break;
> + default:
> + break;
> + }
> +}
> +
> +static int
> +rte_eal_dev_find_and_invoke(const char *name, int type)

This function is only for vdevs, so I would like the name to reflect that,
e.g. *rte_eal_vdev_find_and_invoke*

> +{
> + struct rte_devargs *devargs;
> + struct rte_driver *driver;
> +
> + if (name == NULL)
> + return -EINVAL;
> +
> + /* call the init function for each virtual device */
> + TAILQ_FOREACH(devargs, &devargs_list, next) {
> +
> + if (devargs->type != RTE_DEVTYPE_VIRTUAL)
> + continue;
> +
> + if (strncmp(name, devargs->virtual.drv_name, strlen(name)))
> + continue;
> +
> + TAILQ_FOREACH(driver, &dev_driver_list, next) {
> + if (driver->type != PMD_VDEV)
> + continue;
> +
> + /* search a driver prefix in virtual device name */
> + if (!strncmp(driver->name, devargs->virtual.drv_name,
> + strlen(driver->name))) {
> + rte_eal_dev_invoke(driver, devargs, type);
> + break;
> + }
> + }
> +
> + if (driver == NULL) {
> + RTE_LOG(WARNING, EAL, "no driver found for %s\n",
> +   devargs->virtual.drv_name);
> + }
> + return 0;
> + }
> + return 1;
> +}
> +
> +/* attach the new physical device, then store port_id of the device */
> +static int
> +rte_eal_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id)
> +{
> + uint8_t new_port_id;
> + struct rte_eth_dev devs[RTE_MAX_ETHPORTS];
> +
> + if ((addr == NULL) || (port_id == NULL))
> + goto err;
> +
> + /* save current port status */
> + rte_eth_dev_save(devs);
> + /* re-construct pci_device_list */
> + if (rte_eal_pci_scan())
> + goto err;
> + /* invoke probe func of the driver can handle the new device */
> + if (rte_eal_pci_probe_one(addr))
> + goto err;
> + /* get port_id enabled by above procedures */
> + if (rte_eth_dev_get_changed_port(devs, &new_port_id))
> + goto err;
> +
> + *port_id = new_port_id;
> + return 0;
> +err:
> + RTE_LOG(ERR, EAL, "Drver, cannot attach the device\n");
> + return -1;
> +}
> +
> +/* detach the new physical device, then store pci_addr of the device */
> +static int
> +rte_eal_dev_detach_pdev(uint8_t port_id, struct rte_pci_addr *addr)
> +{
> + struct rte_pci_addr freed_addr;
> + struct rte_pci_addr vp;
> +
> + if (addr == NULL)
> + goto err;
> +
> + /* check whether the driver supports detach feature, or not */
> + if 

[dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms

2015-01-21 Thread Wang, Zhihong


> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Wednesday, January 21, 2015 3:16 AM
> To: Stephen Hemminger
> Cc: Wang, Zhihong; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in
> arch/x86/rte_memcpy.h for both SSE and AVX platforms
> 
> On Tue, Jan 20, 2015 at 09:15:38AM -0800, Stephen Hemminger wrote:
> > On Mon, 19 Jan 2015 09:53:34 +0800
> > zhihong.wang at intel.com wrote:
> >
> > > Main code changes:
> > >
> > > 1. Differentiate architectural features based on CPU flags
> > >
> > > a. Implement separated move functions for SSE/AVX/AVX2 to make
> > > full utilization of cache bandwidth
> > >
> > > b. Implement separated copy flow specifically optimized for
> > > target architecture
> > >
> > > 2. Rewrite the memcpy function "rte_memcpy"
> > >
> > > a. Add store aligning
> > >
> > > b. Add load aligning based on architectural features
> > >
> > > c. Put block copy loop into inline move functions for better
> > > control of instruction order
> > >
> > > d. Eliminate unnecessary MOVs
> > >
> > > 3. Rewrite the inline move functions
> > >
> > > a. Add move functions for unaligned load cases
> > >
> > > b. Change instruction order in copy loops for better pipeline
> > > utilization
> > >
> > > c. Use intrinsics instead of assembly code
> > >
> > > 4. Remove slow glibc call for constant copies
> > >
> > > Signed-off-by: Zhihong Wang 
> >
> > Dumb question: why not fix glibc memcpy instead?
> > What is special about rte_memcpy?
> >
> >
> Fair point.  Though, does glibc implement optimized memcpys per arch?  Or
> do they just rely on the __builtin's from gcc to get optimized variants?
> 
> Neil

Neil, Stephen,

Glibc has per-arch implementations, but they are general purpose, while 
rte_memcpy targets small, in-cache copies, which is the DPDK case. This leads 
to different trade-offs and optimization techniques.
Also, glibc's updates from version to version are based on general 
judgments. We can say that glibc 2.18 is for Ivy Bridge and 2.20 is for 
Haswell, though that is not fully accurate. But we need an implementation for 
both Sandy Bridge and Haswell.

For instance, glibc 2.18 has a load aligning optimization for unaligned memcpy 
but doesn't support 256-bit mov, while glibc 2.20 adds support for 256-bit mov 
but removes the load aligning optimization. This hurts unaligned memcpy 
performance a lot on architectures like Ivy Bridge. Glibc's reasoning is that 
the load aligning optimization doesn't help when src/dst isn't in cache, which 
could be the general case, but is not the DPDK case.

Zhihong (John)
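
To make the "store aligning" idea from the patch description concrete, here is
a minimal sketch in plain C. It is not the rte_memcpy implementation: the
function name and the BLOCK constant are hypothetical, and fixed-size memcpy()
calls stand in for the SSE/AVX loads and stores the patch uses. It only shows
the control flow: copy one unaligned head block, advance the destination to
the next boundary, then issue aligned block stores and finish with the tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16 /* stand-in for one SSE register width, in bytes */

static void *
copy_store_aligned(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t adj;

	assert(dst != NULL && src != NULL);

	/* Small copies: not worth aligning, copy in one go. */
	if (n < 2 * BLOCK)
		return memcpy(dst, src, n);

	/* Head: one unaligned block, then step d up to the next boundary. */
	memcpy(d, s, BLOCK);
	adj = BLOCK - ((uintptr_t)d & (BLOCK - 1));
	d += adj;
	s += adj;
	n -= adj;

	/* Body: every store now starts on a BLOCK-aligned address. */
	while (n >= BLOCK) {
		memcpy(d, s, BLOCK);
		d += BLOCK;
		s += BLOCK;
		n -= BLOCK;
	}

	/* Tail: whatever is left, fewer than BLOCK bytes. */
	if (n > 0)
		memcpy(d, s, n);
	return dst;
}
```

The head block may overlap the first aligned block by up to BLOCK - 1 bytes;
those bytes are simply written twice with the same data, which is cheaper than
branching on the exact misalignment.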


[dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine

2015-01-21 Thread Liu, Jijiang
Hi,

> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Wednesday, January 21, 2015 2:16 AM
> To: Ananyev, Konstantin; Liu, Jijiang
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and
> csum forwarding engine
> 
> 
> 
> >>> Ok, and why it should be our problem?
> >>> We have a lot of things done in a different manner then
> >>> linux/freebsd kernel drivers, Why now it became a problem?
> >>
> >> If linux doesn't need an equivalent flag for doing the same thing, it
> >> probably means we don't need it either.
> >
> > Probably yes  Or probably not.
> > Why do we need to guess what was the intention of guys who wrote that
> part of linux driver?
> 
> Because the dpdk looks very similar to that part of linux driver.

Someone from Intel has already confirmed that NVGRE is not yet supported in the 
Linux kernel.

He said "So far as I know it is not yet supported and I have no information on 
when it will be."

> > BTW, the macro for GRE is here:
> > find lib/librte_pmd_i40e/i40e -type f | xargs grep TUN | grep TXD
> > lib/librte_pmd_i40e/i40e/i40e_type.h:#define
> > I40E_TXD_CTX_UDP_TUNNELING (0x1ULL <<
> I40E_TXD_CTX_QW0_NATT_SHIFT)
> > lib/librte_pmd_i40e/i40e/i40e_type.h:#define
> > I40E_TXD_CTX_GRE_TUNNELING (0x2ULL <<
> I40E_TXD_CTX_QW0_NATT_SHIFT)
> >
> > Though it not used (yet?) by some reason.
> >
> >>
> 
> Regards,
> Olivier



[dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS

2015-01-21 Thread Ouyang, Changchun


> -Original Message-
> From: Wodkowski, PawelX
> Sent: Tuesday, January 20, 2015 5:35 PM
> To: Ouyang, Changchun; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang
> Changchun
> > Sent: Monday, January 12, 2015 6:59 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS
> >
> > It needs config RSS and IXGBE_MRQC and IXGBE_VFPSRTYPE to enable VF
> RSS.
> >
> > The psrtype will determine how many queues the received packets will
> > distribute to, and the value of psrtype should depends on both facet:
> > max VF rxq number which has been negotiated with PF, and the number of
> > rxq specified in config on guest.
> >
> > Signed-off-by: Changchun Ouyang 
> >
> > Changes in v6:
> >   - Raise an error for the case of ETH_16_POOLS in config vf rss, as the
> previous
> > logic have changed it into: ETH_32_POOLS.
> >
> > Changes in v4:
> >  - The number of rxq from config should be power of 2 and should not
> > bigger than
> > max VF rxq number(negotiated between guest and host).
> >
> > ---
> >  lib/librte_pmd_ixgbe/ixgbe_pf.c   |  15 ++
> >  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 102
> > +-
> >  2 files changed, 105 insertions(+), 12 deletions(-)
> >
> > diff --git a/lib/librte_pmd_ixgbe/ixgbe_pf.c
> > b/lib/librte_pmd_ixgbe/ixgbe_pf.c index dbda9b5..93f6e43 100644
> > --- a/lib/librte_pmd_ixgbe/ixgbe_pf.c
> > +++ b/lib/librte_pmd_ixgbe/ixgbe_pf.c
> > @@ -187,6 +187,21 @@ int ixgbe_pf_host_configure(struct rte_eth_dev
> > *eth_dev)
> > IXGBE_WRITE_REG(hw, IXGBE_MPSAR_LO(hw-
> >mac.num_rar_entries),
> > 0);
> > IXGBE_WRITE_REG(hw, IXGBE_MPSAR_HI(hw-
> >mac.num_rar_entries),
> > 0);
> >
> > +   /*
> > +* VF RSS can support at most 4 queues for each VF, even if
> > +* 8 queues are available for each VF, it need refine to 4
> > +* queues here due to this limitation, otherwise no queue
> > +* will receive any packet even RSS is enabled.
> > +*/
> > +   if (eth_dev->data->dev_conf.rxmode.mq_mode ==
> > ETH_MQ_RX_VMDQ_RSS) {
> > +   if (RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool == 8) {
> > +   RTE_ETH_DEV_SRIOV(eth_dev).active =
> > ETH_32_POOLS;
> > +   RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool = 4;
> > +   RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx =
> > +   dev_num_vf(eth_dev) * 4;
> > +   }
> > +   }
> > +
> 
> I did not looked before at your patches but I think you are messing with
> things that should not be changed:
> 
> Why you are changing those values. They are set up during
> ixgbe_pf_host_init(). Limitation you are describing is only RSS related. If
> there will be reconfiguration from ETH_MQ_RX_VMDQ_RSS to other mode
> those value need to be re-evaluated. If you find this kind of limitation you
> should handle it during RSS part configuration. Or if your way is the right 
> way
> you should explicitly make separate function that will re-evaluate those
> parameters each time.
> 
> Second issue with this code is that the nb_q_per_pool is changed from:
> ixgbe_pf_host_configure() -> ixgbe_dev_start() -> rte_eth_dev_start() and
> rte_eth_dev_check_vf_rss_rxq_num() -> rte_eth_dev_check_mq_mode() ->
> rte_eth_dev_configure()
> 
> Which one is the right one? If both, why they are calculated twice?
> 
> I don't think that rte_eth_dev_data::sriov field should be changed at all - it
> holds current SRIOV capabilities.
> If this will change during runtime it no point to have this field at all and 
> should
> be some kind of "siov_get()"
> function that will calculate and return those parameters dynamically.
> 
> Please refer also to
>  .com>
> for further issues.
> 
> I think this patchset should not be applied.

It would have been better either to raise your comments before this patch was 
merged into mainline, or to send out a patch to fix it.
I agree with part of what you said: the check is not necessary for VF RSS in 
pf_host_configure, because check_mq_mode has already checked the queue number. 
I will send out a patch to fix it by removing this check.

On the other hand, I disagree with your point that the rte_eth_dev_data::sriov 
field should not be changed at all.
The reason we need to refine those values is that they are obtained in pf_init, 
which is called at the dev probe stage, so they are not accurate. They should 
vary according to the mq mode, and the mq mode can be determined only after the 
dev is configured.
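
As a rough sketch of the queue-number rule from the v4 changelog (the rxq count
must be a power of 2 and no bigger than the negotiated max VF rxq number),
combined with the 4-queue VF RSS limit from the patch comment. The function and
macro names below are hypothetical and this is not the code in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define VF_RSS_MAX_QUEUES 4 /* limit stated in the patch comment */

/*
 * Validate the number of RX queues requested for VF RSS.
 * Returns 0 when acceptable, -EINVAL otherwise.
 */
static int
check_vf_rss_rxq_num(uint16_t nb_rx_q, uint16_t max_vf_rxq)
{
	/* Must be a non-zero power of two. */
	if (nb_rx_q == 0 || (nb_rx_q & (nb_rx_q - 1)) != 0)
		return -EINVAL;
	/* Must fit both the negotiated maximum and the RSS limit. */
	if (nb_rx_q > max_vf_rxq || nb_rx_q > VF_RSS_MAX_QUEUES)
		return -EINVAL;
	return 0;
}
```

Doing this check once at configure time is what makes the second adjustment in
pf_host_configure redundant.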

> 
> > /* set VMDq map to default PF pool */
> > hw->mac.ops.set_vmdq(hw, 0,
> > RTE_ETH_DEV_SRIOV(eth_dev).def_vmdq_idx);
> >
> > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > index f69abda..20627df 100644
> > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > @@ -3327,6 +3327,67 @@ 

[dpdk-dev] [PATCH v4 05/11] ethdev: Add functions that will be used by port hotplug functions

2015-01-21 Thread Qiu, Michael
On 1/19/2015 6:42 PM, Tetsuya Mukawa wrote:
> The patch adds following functions.
>
> - rte_eth_dev_save()
>   The function is used for saving current rte_eth_dev structures.
> - rte_eth_dev_get_changed_port()
>   The function receives the rte_eth_dev structures, then compare
>   these with current values to know which port is actually
>   attached or detached.
> - rte_eth_dev_get_addr_by_port()
>   The function returns a pci address of a ethdev specified by port
>   identifier.
> - rte_eth_dev_get_port_by_addr()
>   The function returns a port identifier of a ethdev specified by
>   pci address.
> - rte_eth_dev_get_name_by_port()
>   The function returns a unique identifier name of a ethdev
>   specified by port identifier.
> - Add rte_eth_dev_check_detachable()
>   The function returns whether a PMD supports detach function.
>
> Also the patch changes scope of rte_eth_dev_allocated() to global.
> This function will be called by virtual PMDs to support port hotplug.
> So change scope of the function to global.
>
> v4:
> - Add paramerter checking.
> v3:
> - Fix if-condition bug while comparing pci addresses.
> - Add error checking codes.
> Reported-by: Mark Enright 
>
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_ether/rte_ethdev.c | 102 
> +-
>  lib/librte_ether/rte_ethdev.h |  80 +
>  2 files changed, 181 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index e5145b7..e572ef4 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -206,7 +206,7 @@ rte_eth_dev_data_alloc(void)
>   RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data));
>  }
>  
> -static struct rte_eth_dev *
> +struct rte_eth_dev *
>  rte_eth_dev_allocated(const char *name)
>  {
>   unsigned i;
> @@ -422,6 +422,106 @@ rte_eth_dev_count(void)
>   return (nb_ports);
>  }
>  
> +void
> +rte_eth_dev_save(struct rte_eth_dev *devs)
> +{
> + if (devs == NULL)
> + return;
> +
> + /* save current rte_eth_devices */
> + memcpy(devs, rte_eth_devices,
> + sizeof(struct rte_eth_dev) * RTE_MAX_ETHPORTS);
> +}
> +
> +int
> +rte_eth_dev_get_changed_port(struct rte_eth_dev *devs, uint8_t *port_id)
> +{
> + if ((devs == NULL) || (port_id == NULL))
> + return -1;
> +
> + /* check which port was attached or detached */
> + for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++, devs++) {
> + if (rte_eth_devices[*port_id].attached ^ devs->attached)
> + return 0;
> + }
> + return 1;
> +}
> +
> +int
> +rte_eth_dev_get_addr_by_port(uint8_t port_id, struct rte_pci_addr *addr)
> +{
> + if (rte_eth_dev_validate_port(port_id) == DEV_INVALID) {
> + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> + return -EINVAL;
> + }
> +
> + if (addr == NULL) {
> + PMD_DEBUG_TRACE("Null pointer is specified\n");
> + return -EINVAL;
> + }
> +
> + *addr = rte_eth_devices[port_id].pci_dev->addr;
> + return 0;
> +}
> +
> +int
> +rte_eth_dev_get_port_by_addr(struct rte_pci_addr *addr, uint8_t *port_id)
> +{
> + struct rte_pci_addr *tmp;
> +
> + if ((addr == NULL) || (port_id == NULL)) {
> + PMD_DEBUG_TRACE("Null pointer is specified\n");
> + return -1;

Would it be better to replace "-1" with "-EINVAL"?

> + }
> +
> + for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++) {
> + if (!rte_eth_devices[*port_id].attached)
> + continue;
> + if (!rte_eth_devices[*port_id].pci_dev)
> + continue;
> + tmp = &rte_eth_devices[*port_id].pci_dev->addr;
> + if (eal_compare_pci_addr(tmp, addr) == 0)
> + return 0;
> + }
> + return -1;
> +}
> +
> +int
> +rte_eth_dev_get_name_by_port(uint8_t port_id, char *name)
> +{
> + char *tmp;
> +
> + if (rte_eth_dev_validate_port(port_id) == DEV_INVALID) {
> + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> + return -EINVAL;
> + }
> +
> + if (name == NULL) {
> + PMD_DEBUG_TRACE("Null pointer is specified\n");
> + return -EINVAL;
> + }
> +
> + /* shouldn't check 'rte_eth_devices[i].data',
> +  * because it might be overwritten by VDEV PMD */
> + tmp = rte_eth_dev_data[port_id].name;
> + strncpy(name, tmp, strlen(tmp) + 1);
> + return 0;
> +}
> +
> +int
> +rte_eth_dev_check_detachable(uint8_t port_id)
> +{
> + uint32_t drv_flags;
> +
> + if (port_id >= RTE_MAX_ETHPORTS) {
> + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> + return -EINVAL;
> + }
> +
> + drv_flags = rte_eth_devices[port_id].driver->pci_drv.drv_flags;
> + return !(drv_flags & RTE_PCI_DRV_DETACHABLE);
> +}
> +
>  static int
>  
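
The save/compare approach used by rte_eth_dev_save() and
rte_eth_dev_get_changed_port() in the patch can be modelled with a few lines of
standalone C. The names and the reduced port structure below are hypothetical.
Unlike the quoted patch, this sketch leaves *port_id untouched when nothing
changed, which is an assumption about what a caller might prefer, not what the
patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PORTS 8

/* Hypothetical, reduced port structure: only the attached state. */
struct port {
	uint8_t attached;
};

static struct port ports[MAX_PORTS];

/* Take a snapshot of the current port table. */
static void
ports_save(struct port *snap)
{
	assert(snap != NULL);
	memcpy(snap, ports, sizeof(ports));
}

/*
 * Find the first port whose attached state differs from the snapshot.
 * Returns 0 and stores the index on a change, 1 if nothing changed,
 * -1 on bad arguments.
 */
static int
ports_get_changed(const struct port *snap, uint8_t *port_id)
{
	uint8_t i;

	if (snap == NULL || port_id == NULL)
		return -1;

	for (i = 0; i < MAX_PORTS; i++) {
		if (ports[i].attached != snap[i].attached) {
			*port_id = i;
			return 0;
		}
	}
	return 1;
}
```

Typical use mirrors the attach path in patch 10/11: save, run the probe, then
ask which port changed.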

[dpdk-dev] [PATCH v8 3/4] i40e: support of controlling hash functions

2015-01-21 Thread Zhang, Helin
Hi Thomas

Sure, I will do it ASAP! Thank you for the help!

Regards,
Helin

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, January 20, 2015 3:54 PM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v8 3/4] i40e: support of controlling hash
> functions
> 
> Hi Helin,
> 
> 2014-12-02 10:19, Helin Zhang:
> > Hash filter control has been implemented for i40e. It includes
> > getting/setting,
> > - global hash configurations (hash function type, and symmetric
> >   hash enable per flow type)
> > - symmetric hash enable per port
> >
> > Signed-off-by: Helin Zhang 
> > ---
> >  lib/librte_ether/rte_eth_ctrl.h   |  63 
> >  lib/librte_pmd_i40e/i40e_ethdev.c | 294
> > +-
> >  2 files changed, 355 insertions(+), 2 deletions(-)
> 
> Please, could you split ethdev and i40e parts while keeping Konstantin's ack?
> 
> [...]
> > + * Each bit in valid_bit_mask[] indicates if the coresponding bit in
> 
> Typo: corresponding
> 
> [...]
> > +   /** Bit mask indicates if the coresponding bit is valid */
> 
> Same typo
> 
> [...]
> > +   /** Details of hash filter infomation */
> 
> Typo: information
> 
> > +   union {
> > +   /* For RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT */
> > +   uint8_t enable;
> > +   /* Global configurations of hash filter */
> > +   struct rte_eth_hash_global_conf global_conf;
> > +   } info;
> 
> Why these comments are not doxygen'ed?
> 
> Sorry for nitpicking, that's the last review pass ;)
> --
> Thomas
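
For reference, a doxygen'ed form of the union members Thomas asks about could
look like the sketch below. The struct names are hypothetical stand-ins (the
real ones live in rte_eth_ctrl.h), and the comment wording is only a
suggestion. The point is the `/**` and `/**<` markers, which doxygen picks up
where a plain comment is ignored.

```c
#include <assert.h>
#include <stdint.h>

/** Hypothetical stand-in for the global hash configuration. */
struct hash_global_conf {
	uint32_t hash_func; /**< Hash function type. */
};

/** Hypothetical container for hash filter information. */
struct hash_filter_info {
	/** Details of hash filter information. */
	union {
		/** Symmetric hash enable, per port. */
		uint8_t enable;
		/** Global configurations of hash filter. */
		struct hash_global_conf global_conf;
	} info;
};
```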