date:20150122

[dpdk-dev] make config on CentOS 6.6 with 3.5 kernel and 4.7 gcc not finding config target

2015-01-22 Thread Vipin Agrawal

I was able to build it without any issues yesterday, but all of a sudden
it started to give me this error. All I did was add these boot options
in grub.conf: default_hugepagesz=1G hugepagesz=1G hugepages=4 hpet_mmap

Even after removing them, this problem still persists.

[root@localhost dpdk-1.8.0]# make config T=x86_64-default-linuxapp-gcc
make[1]: *** No rule to make target 
`/root/dpdk-1.8.0/config/defconfig_x86_64-default-linuxapp-gcc', needed by 
`/root/dpdk-1.8.0/build/.config'.  Stop.
make: *** [config] Error 2






[dpdk-dev] [PATCH] ixgbe: do not include CRC in Tx byte count

2015-01-22 Thread step...@networkplumber.org

From: Stephen Hemminger 

The ixgbe driver was including CRC in the transmit packet byte
count, but not for packets received. This was noticed when forwarding:
the number of bytes received was greater than the number of bytes
transmitted for the same number of packets. Make the driver behave like
other virtual devices and not include CRC in the byte count. Use the same
kind of queue counters already computed and used for Rx.

Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index b58ec45..27355eb 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -1724,12 +1724,15 @@ ixgbe_dev_stats_get(struct rte_eth_dev *dev, struct 
rte_eth_stats *stats)
struct ixgbe_hw_stats *hw_stats =
IXGBE_DEV_PRIVATE_TO_STATS(dev->data->dev_private);
uint32_t bprc, lxon, lxoff, total;
-   uint64_t total_missed_rx, total_qbrc, total_qprc;
+   uint64_t total_missed_rx;
+   uint64_t total_qbrc, total_qprc, total_qbtc, total_qptc;
unsigned i;

total_missed_rx = 0;
total_qbrc = 0;
total_qprc = 0;
+   total_qbtc = 0;
+   total_qptc = 0;

hw_stats->crcerrs += IXGBE_READ_REG(hw, IXGBE_CRCERRS);
hw_stats->illerrc += IXGBE_READ_REG(hw, IXGBE_ILLERRC);
@@ -1770,6 +1773,8 @@ ixgbe_dev_stats_get(struct rte_eth_dev *dev, struct 
rte_eth_stats *stats)

total_qprc += hw_stats->qprc[i];
total_qbrc += hw_stats->qbrc[i];
+   total_qptc += hw_stats->qptc[i];
+   total_qbtc += hw_stats->qbtc[i];
}
hw_stats->mlfc += IXGBE_READ_REG(hw, IXGBE_MLFC);
hw_stats->mrfc += IXGBE_READ_REG(hw, IXGBE_MRFC);
@@ -1860,8 +1865,8 @@ ixgbe_dev_stats_get(struct rte_eth_dev *dev, struct 
rte_eth_stats *stats)
/* Fill out the rte_eth_stats statistics structure */
stats->ipackets = total_qprc;
stats->ibytes = total_qbrc;
-   stats->opackets = hw_stats->gptc;
-   stats->obytes = hw_stats->gotc;
+   stats->opackets = total_qptc;
+   stats->obytes = total_qbtc;
stats->imcasts = hw_stats->mprc;

for (i = 0; i < IXGBE_QUEUE_STAT_COUNTERS; i++) {
-- 
2.1.4
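
The accounting change in the patch above can be illustrated outside the
driver. The following is only a minimal sketch, not the ixgbe code: the
struct and function names are hypothetical, and it assumes nothing more
than per-queue packet/byte counters and the 4-byte Ethernet CRC. It sums
the per-queue Tx counters the way the patch does for qptc/qbtc, and shows
how large the gap to a CRC-inclusive byte counter would be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHER_CRC_BYTES 4u /* length of the Ethernet frame check sequence */

/* Hypothetical per-queue counters; the field names mirror the patch. */
struct queue_tx_stats {
	uint64_t qptc; /* packets transmitted on this queue */
	uint64_t qbtc; /* bytes transmitted on this queue */
};

struct tx_totals {
	uint64_t opackets;
	uint64_t obytes;
};

/* Sum the per-queue Tx counters into port-level totals. */
static struct tx_totals
sum_tx_queues(const struct queue_tx_stats *q, size_t nb_queues)
{
	struct tx_totals t = { 0, 0 };
	size_t i;

	assert(q != NULL || nb_queues == 0);
	for (i = 0; i < nb_queues; i++) {
		t.opackets += q[i].qptc;
		t.obytes += q[i].qbtc;
	}
	return t;
}

/* Byte count a CRC-inclusive counter would report for the same traffic. */
static uint64_t
bytes_including_crc(struct tx_totals t)
{
	return t.obytes + t.opackets * ETHER_CRC_BYTES;
}
```

With 35 packets in flight the two byte counts differ by 140 bytes, which
is the kind of Rx/Tx mismatch the commit message describes.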

[dpdk-dev] [PATCH] mk: allow application to override clean

2015-01-22 Thread step...@networkplumber.org

From: Stephen Hemminger 

In some cases an application may want to have additional rules
for clean. This can be handled by using the double-colon
form of the rule.

 https://www.gnu.org/software/make/manual/html_node/Double_002dColon.html

Mixing single-colon and double-colon rules for the same target
causes an error.

Signed-off-by: Stephen Hemminger 

---
 mk/rte.app.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 40afb2c..9c8b06a 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -325,7 +325,7 @@ $(RTE_OUTPUT)/app/$(APP).map: $(APP)
 # Clean all generated files
 #
 .PHONY: clean
-clean: _postclean
+clean:: _postclean
$(Q)rm -f $(_BUILD_TARGETS) $(_INSTALL_TARGETS) $(_CLEAN_TARGETS)

 .PHONY: doclean
-- 
2.1.4

[dpdk-dev] [PATCH v3 00/18] ACL: New AVX2 classify method and several other enhancements.

2015-01-22 Thread Ananyev, Konstantin



> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Thursday, January 22, 2015 6:55 PM
> To: Ananyev, Konstantin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 00/18] ACL: New AVX2 classify method and 
> several other enhancements.
> 
> On Tue, Jan 20, 2015 at 06:40:49PM +, Konstantin Ananyev wrote:
> > v3 changes:
> > Applied review comments from Thomas:
> > - fix spelling errors reported by codespell.
> > - split last patch into two:
> > first to remove unused macros,
> > second to add some comments about ACL internal layout.
> >
> > v2 changes:
> > - When built with compilers that don't support AVX2 instructions,
> > make rte_acl_classify_avx2() do nothing and return an error.
> > - Remove unneeded 'ifdef __AVX2__' in acl_run_avx2.*.
> > - Reorder order of patches in the set, to keep RTE_LIBRTE_ACL_STANDALONE=y
> > always buildable.
> >
> > This patch series contains several fixes and enhancements for the ACL
> > library. See the complete list below.
> > Two main changes are externally visible:
> > - Introduce a new classify method: RTE_ACL_CLASSIFY_AVX2.
> > It uses AVX2 instructions and 256-bit wide data types
> > to perform internal trie traversal.
> > That helps to increase classify() throughput.
> > This method is selected as the default on CPUs that support AVX2.
> > - Introduce a new field in the build config structure: max_size.
> > It specifies the maximum size that the internal RT structure for a
> > given context can reach.
> > The purpose is to let the user decide on the space/performance
> > trade-off (faster classify() vs less space for RT internal structures)
> > for each given set of rules.
> >
> > Konstantin Ananyev (18):
> >   fix fix compilation issues with RTE_LIBRTE_ACL_STANDALONE=y
> >   app/test: few small fixes fot test_acl.c
> >   librte_acl: make data_indexes long enough to survive idle transitions.
> >   librte_acl: remove build phase heuristsic with negative performance
> > effect.
> >   librte_acl: fix a bug at build phase that can cause matches beeing
> > overwirtten.
> >   librte_acl: introduce DFA nodes compression (group64) for identical
> > entries.
> >   librte_acl: build/gen phase - simplify the way match nodes are
> > allocated.
> >   librte_acl: make scalar RT code to be more similar to vector one.
> >   librte_acl: a bit of RT code deduplication.
> >   EAL: introduce rte_ymm and relatives in rte_common_vect.h.
> >   librte_acl: add AVX2 as new rte_acl_classify() method
> >   test-acl: add ability to manually select RT method.
> >   librte_acl: Remove search_sse_2 and relatives.
> >   libter_acl: move lo/hi dwords shuffle out from calc_addr
> >   libte_acl: make calc_addr a define to deduplicate the code.
> >   libte_acl: introduce max_size into rte_acl_config.
> >   libte_acl: remove unused macros.
> >   libte_acl: add some comments about ACL internal layout.
> >
> >  app/test-acl/main.c | 126 +++--
> >  app/test/test_acl.c |   8 +-
> >  examples/l3fwd-acl/main.c   |   3 +-
> >  examples/l3fwd/main.c   |   2 +-
> >  lib/librte_acl/Makefile |  18 +
> >  lib/librte_acl/acl.h|  58 ++-
> >  lib/librte_acl/acl_bld.c| 392 +++-
> >  lib/librte_acl/acl_gen.c| 268 +++
> >  lib/librte_acl/acl_run.h|   7 +-
> >  lib/librte_acl/acl_run_avx2.c   |  54 +++
> >  lib/librte_acl/acl_run_avx2.h   | 284 
> >  lib/librte_acl/acl_run_scalar.c |  65 ++-
> >  lib/librte_acl/acl_run_sse.c| 585 
> > +---
> >  lib/librte_acl/acl_run_sse.h| 357 +++
> >  lib/librte_acl/acl_vect.h   | 132 +++---
> >  lib/librte_acl/rte_acl.c|  47 +-
> >  lib/librte_acl/rte_acl.h|   4 +
> >  lib/librte_acl/rte_acl_osdep_alone.h|  47 +-
> >  lib/librte_eal/common/include/rte_common_vect.h |  39 +-
> >  lib/librte_lpm/rte_lpm.h|   2 +-
> >  20 files changed, 1444 insertions(+), 1054 deletions(-)
> >  create mode 100644 lib/librte_acl/acl_run_avx2.c
> >  create mode 100644 lib/librte_acl/acl_run_avx2.h
> >  create mode 100644 lib/librte_acl/acl_run_sse.h
> >
> > --
> > 1.8.5.3
> >
> >
> I'm sorry I've not looked at this yet Konstantin, I'm trying to get to it soon
> Neil

No worries, and thanks for your reviews :)
Konstantin

[dpdk-dev] Packet drops during non-exhaustive flood with OVS and 1.8.0

2015-01-22 Thread Andrey Korolyov

On Wed, Jan 21, 2015 at 8:02 PM, Andrey Korolyov  wrote:
> Hello,
>
> I observed that the latest OVS with dpdk-1.8.0 and igb_uio starts to
> drop packets earlier than a regular Linux ixgbe 10G interface, setup
> follows:
>
> receiver/forwarder:
> - 8 core/2 head system with E5-2603v2, cores 1-3 are given to OVS exclusively
> - n-dpdk-rxqs=6, rx scattering is not enabled
> - x520 da
> - 3.10/3.18 host kernel
> - during 'legacy mode' testing, queue interrupts are scattered through all 
> cores
>
> sender:
> - 16-core E52630, netmap framework for packet generation
> - pkt-gen -f tx -i eth2 -s 10.6.9.0-10.6.9.255 -d
> 10.6.10.0-10.6.10.255 -S 90:e2:ba:84:19:a0 -D 90:e2:ba:85:06:07 -R
> 1100, results in 11Mpps 60-byte packet flood, there are constant
> values during test.
>
> OVS contains only single drop rule at the moment:
> ovs-ofctl add-flow br0 in_port=1,actions=DROP
>
> Packet generator was launched for tens of seconds for both the Linux
> stack and OVS+DPDK cases. In the first case this resulted in a zero
> drop/error count on the interface, along with identical counter values
> on pktgen and the host interface stats (meaning that none of the
> generated packets are unaccounted for).
>
> I selected a rate of about 11M because OVS starts to drop packets
> around this value; after the same short test the interface stats show
> the following:
>
> statistics  : {collisions=0, rx_bytes=22003928768,
> rx_crc_err=0, rx_dropped=0, rx_errors=10694693, rx_frame_err=0,
> rx_over_err=0, rx_packets=343811387, tx_bytes=0, tx_dropped=0,
> tx_errors=0, tx_packets=0}
>
> pktgen side:
> Sent 354506080 packets, 60 bytes each, in 32.23 seconds.
> Speed: 11.00 Mpps Bandwidth: 5.28 Gbps (raw 7.39 Gbps)
>
> If the rate is increased to 13-14Mpps, the ratio of errors to overall
> packets rises to about one third. So far OVS on dpdk shows perfect
> results and I do not want to reject this solution due to exhaustive
> behavior like the one described, so I'm open to any suggestions to
> improve the situation (except using 1.7 branch :) ).

At a glance it looks like there is a problem with the pmd threads, as they
start to consume about five thousandths of sys% on their dedicated cores
during the flood, but in theory they should not. Any ideas for
debugging/improving this situation are very welcome!
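
The counters quoted in this report can be cross-checked with a little
arithmetic. The helpers below are only a sketch with hypothetical names;
the 84-byte wire size is an assumption (60-byte frame plus 4 bytes of CRC
and 20 bytes of preamble and inter-frame gap) chosen because it matches
the "raw" figure printed by pkt-gen.

```c
#include <assert.h>
#include <stdint.h>

/* Share of frames counted as errors, in parts per million of all frames
 * the interface accounted for (good packets plus errors). */
static uint64_t
error_ppm(uint64_t rx_packets, uint64_t rx_errors)
{
	uint64_t total = rx_packets + rx_errors;

	assert(total > 0);
	return rx_errors * 1000000ULL / total;
}

/* Link load in bits per second for a packet rate and a per-frame size. */
static uint64_t
bits_per_second(uint64_t pps, uint64_t bytes_per_frame)
{
	return pps * bytes_per_frame * 8u;
}
```

Fed with the numbers above, rx_packets + rx_errors (343811387 + 10694693)
equals the 354506080 packets reported as sent, so every generated packet
is accounted for as either received or errored, and the error share is
about 3%. rx_bytes divided by rx_packets is exactly 64, which suggests
that this byte counter includes the 4-byte CRC on top of the 60-byte
frames. At 11 Mpps, 60 bytes per frame gives 5.28 Gbps and 84 bytes per
frame gives 7.392 Gbps, in line with the pktgen output.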

[dpdk-dev] some questions about rte_memcpy

2015-01-22 Thread Linhaifeng



On 2015/1/22 19:34, Bruce Richardson wrote:
> On Thu, Jan 22, 2015 at 07:23:49PM +0900, Tetsuya Mukawa wrote:
>> On 2015/01/22 16:35, Matthew Hall wrote:
>>> On Thu, Jan 22, 2015 at 01:32:04PM +0800, Linhaifeng wrote:
>>>> Do you mean that calling rte_memcpy before rte_eal_init() would crash? Why?
>>> No guarantee. But a theory. It might use some things from the EAL init to 
>>> figure out which version of the accelerated algorithm to use.
>>
>> This selection is done at compile-time.
>> And if the size is constant, I guess DPDK assumes memcpy is replaced by
>> inline __builtin_memcpy.
>> I haven't checked the performance of builtin memcpy, but probably much
>> faster.
>>
> 
> Yes, that assumption is correct. A couple of years ago we discovered that for
> constant size values, the compiler would generate much faster code for us
> using a regular memcpy than rte_memcpy, hence the macro.
> 
> /Bruce
> 
>> Tetsuya
>>
>>> Matthew.
>>
>>
> 
> 

Hi, Bruce,

I tested it. Most results match what you said (using a constant may be
faster), but sometimes it is not.

linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 999
rte_memcpy(constant) used:279893712 @@ not faster
rte_memcpy(variable) used:277818600
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 999
rte_memcpy(constant) used:279264328 @@ not faster
rte_memcpy(variable) used:277667116
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 999
rte_memcpy(constant) used:279491832 @@ not faster
rte_memcpy(variable) used:277622772
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 999
rte_memcpy(constant) used:279402156 @@ not faster
rte_memcpy(variable) used:277738464
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 999
rte_memcpy(constant) used:279305172 @@ not faster
rte_memcpy(variable) used:277483004
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 999
rte_memcpy(constant) used:279784124 @@ not faster
rte_memcpy(variable) used:277605332
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 999
rte_memcpy(constant) used:322817260
rte_memcpy(variable) used:350333864
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 999
rte_memcpy(constant) used:322840748
rte_memcpy(variable) used:350297868
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 999
rte_memcpy(constant) used:322488240
rte_memcpy(variable) used:350348652
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 999
rte_memcpy(constant) used:322021428
rte_memcpy(variable) used:350416440
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 999
rte_memcpy(constant) used:321370900
rte_memcpy(variable) used:350355796
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 999
rte_memcpy(constant) used:322704552
rte_memcpy(variable) used:349900832
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 999
rte_memcpy(constant) used:422705828
rte_memcpy(variable) used:425493328
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 999
rte_memcpy(constant) used:422421840 @@ not faster
rte_memcpy(variable) used:413691412
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 999
rte_memcpy(constant) used:425233088 @@ not faster
rte_memcpy(variable) used:421136724
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 999
rte_memcpy(constant) used:901014608 @@ not faster
rte_memcpy(variable) used:900997388
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 999
rte_memcpy(constant) used:900803308 @@ not faster
rte_memcpy(variable) used:900794076
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 999
rte_memcpy(constant) used:901842436 @@ not faster
rte_memcpy(variable) used:901218984
linux-mnSyvH:/mnt/sdb/linhf/test #



Here is my test code:

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

#include <rte_cycles.h>
#include <rte_memcpy.h>

int main(int narg, char** args)
{
	int i;
	char buf[1024];
	uint64_t start, end;

	if (narg < 3) {
		printf("usage:./rte_memcpy_test size times\n");
		return 0;
	}

	size_t size_v = atoi(args[1]);
	/* const-qualified, but the value is still only known at run time */
	const size_t size_c = atoi(args[1]);
	int times = atoi(args[2]);

	/* time the copies that use the const-qualified size */
	start = rte_rdtsc();
	for (i = 0; i < times; i++) {
		rte_memcpy(buf, buf, size_c);
	}
	end = rte_rdtsc();
	printf("rte_memcpy(constant) used:%" PRIu64 "\n", end - start);

	/* time the copies that use the plain variable size */
	start = rte_rdtsc();
	for (i = 0; i < times; i++) {
		rte_memcpy(buf, buf, size_v);
	}
	end = rte_rdtsc();
	printf("rte_memcpy(variable) used:%" PRIu64 "\n", end - start);

	return 0;
}
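
Bruce's reply mentions a macro that falls back to a regular memcpy when
the size is a constant. The following is a minimal sketch of that kind of
dispatch, not the DPDK source: sketch_memcpy and sketch_memcpy_func are
hypothetical names, and it assumes a GCC or Clang compiler that provides
__builtin_constant_p.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical out-of-line routine standing in for a hand-tuned copy. */
static void *
sketch_memcpy_func(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}

/* Use plain memcpy when the compiler can see the size at compile time,
 * otherwise call the out-of-line routine. */
#define sketch_memcpy(dst, src, n)                    \
	(__builtin_constant_p(n) ?                    \
		memcpy((dst), (src), (n)) :           \
		sketch_memcpy_func((dst), (src), (n)))
```

With a dispatch like this, only a size the compiler can evaluate at build
time (a literal, sizeof, an enum value) takes the memcpy branch. A
const-qualified variable initialised from atoi() is still a run-time
value, so it seems likely that both loops in the test above exercise the
same code path, which would explain why the two timings are so close.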





-- 
Regards,
Haifeng

[dpdk-dev] Segmentation fault in ixgbe_rxtx_vec.c:444 with 1.8.0

2015-01-22 Thread Prashant Upadhyaya

On Wed, Jan 21, 2015 at 7:19 PM, Bruce Richardson <
bruce.richardson at intel.com> wrote:

> On Tue, Jan 20, 2015 at 11:39:03AM +0100, Martin Weiser wrote:
> > Hi again,
> >
> > I did some further testing and it seems like this issue is linked to
> > jumbo frames. I think a similar issue has already been reported by
> > Prashant Upadhyaya with the subject 'Packet Rx issue with DPDK1.8'.
> > In our application we use the following rxmode port configuration:
> >
> > .mq_mode= ETH_MQ_RX_RSS,
> > .split_hdr_size = 0,
> > .header_split   = 0,
> > .hw_ip_checksum = 1,
> > .hw_vlan_filter = 0,
> > .jumbo_frame= 1,
> > .hw_strip_crc   = 1,
> > .max_rx_pkt_len = 9000,
> >
> > and the mbuf size is calculated like the following:
> >
> > (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)
> >
> > This works fine with DPDK 1.7 and jumbo frames are split into buffer
> > chains and can be forwarded on another port without a problem.
> > With DPDK 1.8 and the default configuration (CONFIG_RTE_IXGBE_INC_VECTOR
> > enabled) the application sometimes crashes like described in my first
> > mail and sometimes packet receiving stops with subsequently arriving
> > packets counted as rx errors. When CONFIG_RTE_IXGBE_INC_VECTOR is
> > disabled the packet processing also comes to a halt as soon as jumbo
> > frames arrive with the slightly different effect that now
> > rte_eth_tx_burst refuses to send any previously received packets.
> >
> > Is there anything special to consider regarding jumbo frames when moving
> > from DPDK 1.7 to 1.8 that we might have missed?
> >
> > Martin
> >
> >
> >
> > On 19.01.15 11:26, Martin Weiser wrote:
> > > Hi everybody,
> > >
> > > we quite recently updated one of our applications to DPDK 1.8.0 and are
> > > now seeing a segmentation fault in ixgbe_rxtx_vec.c:444 after a few
> minutes.
> > > I just did some quick debugging and I only have a very limited
> > > understanding of the code in question but it seems that the 'continue'
> > > in line 445 without increasing 'buf_idx' might cause the problem. In
> one
> > > debugging session when the crash occurred the value of 'buf_idx' was 2
> > > and the value of 'pkt_idx' was 8965.
> > > Any help with this issue would be greatly appreciated. If you need any
> > > further information just let me know.
> > >
> > > Martin
> > >
> > >
> >
> Hi Martin, Prashant,
>
> I've managed to reproduce the issue here and had a look at it. Could you
> both perhaps try the proposed change below and see if it fixes the problem
> for
> you and gives you a working system? If so, I'll submit this as a patch fix
> officially - or go back to the drawing board, if not. :-)
>
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
> b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
> index b54cb19..dfaccee 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
> @@ -402,10 +402,10 @@ reassemble_packets(struct igb_rx_queue *rxq, struct
> rte_mbuf **rx_bufs,
> struct rte_mbuf *pkts[RTE_IXGBE_VPMD_RX_BURST]; /*finished pkts*/
> struct rte_mbuf *start = rxq->pkt_first_seg;
> struct rte_mbuf *end =  rxq->pkt_last_seg;
> -   unsigned pkt_idx = 0, buf_idx = 0;
> +   unsigned pkt_idx, buf_idx;
>
>
> -   while (buf_idx < nb_bufs) {
> +   for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
> if (end != NULL) {
> /* processing a split packet */
> end->next = rx_bufs[buf_idx];
> @@ -448,7 +448,6 @@ reassemble_packets(struct igb_rx_queue *rxq, struct
> rte_mbuf **rx_bufs,
> rx_bufs[buf_idx]->data_len += rxq->crc_len;
> rx_bufs[buf_idx]->pkt_len += rxq->crc_len;
> }
> -   buf_idx++;
> }
>
> /* save the partial packet for next time */
>
>
> Regards,
> /Bruce
>
Hi Bruce,

I am afraid your patch did not work for me. In my case I am not trying to
receive jumbo frames but normal frames. They are not received by my
application. Further, your patched function is not being exercised in my
use case.

Regards
-Prashant
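
For readers following the proposed change quoted above: the point of
moving from a while loop with a trailing buf_idx++ to a for loop is that
a 'continue' inside the body can no longer skip the increment. The
snippet below is a much simplified sketch of that loop shape, with
hypothetical types and names; it is not the ixgbe reassembly code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a received buffer segment. */
struct seg {
	unsigned data_len;  /* bytes held in this segment */
	int end_of_packet;  /* non-zero on the last segment of a packet */
};

/* Walk the segments, store the total length of every completed packet in
 * pkt_len_out, and return how many packets were completed. */
static size_t
count_packets(const struct seg *segs, size_t nb_segs, unsigned *pkt_len_out)
{
	size_t pkt_idx = 0;
	size_t buf_idx;
	unsigned cur_len = 0;

	for (buf_idx = 0; buf_idx < nb_segs; buf_idx++) {
		cur_len += segs[buf_idx].data_len;
		if (!segs[buf_idx].end_of_packet)
			continue; /* safe: the for statement still advances buf_idx */
		pkt_len_out[pkt_idx++] = cur_len;
		cur_len = 0;
	}
	return pkt_idx;
}
```

Had the increment been left as the last statement of a while body, the
'continue' would jump straight back to the loop condition with buf_idx
unchanged, and the same segment would be processed again and again.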

[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-22 Thread Luke Gorrie

On 22 January 2015 at 14:29, Jay Rolette  wrote:

> Microseconds matter. Scaling up to 100GbE, nanoseconds matter.
>

True. Is there a cut-off point though? Does one nanosecond matter?

AVX512 will fit a 64-byte packet in one register and move that to or from
memory with one instruction. L1/L2 cache bandwidth per server is growing on
a double-exponential curve (both bandwidth per core and cores per CPU). I
wonder if moving data around in cache will soon be too cheap for us to
justify worrying about.

I suppose that 1500 byte wide registers are still a ways off though ;-)

Cheers!
-Luke (begging your indulgence for wandering off on a tangent)
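
To put "nanoseconds matter" into numbers, the per-frame time budget at
line rate can be computed directly. This is only a sketch with a
hypothetical helper name; it assumes the usual 20 bytes of per-frame wire
overhead (preamble, start-of-frame delimiter and inter-frame gap) on top
of the frame itself.

```c
#include <assert.h>
#include <stdint.h>

#define WIRE_OVERHEAD_BYTES 20u /* preamble + SFD + inter-frame gap */

/* Time one frame occupies the wire, in picoseconds, for a link speed
 * given in Gbit/s. frame_bytes includes the 4-byte CRC. */
static uint64_t
frame_time_ps(uint64_t frame_bytes, uint64_t link_gbps)
{
	assert(link_gbps > 0);
	return (frame_bytes + WIRE_OVERHEAD_BYTES) * 8u * 1000u / link_gbps;
}
```

For minimum-size 64-byte frames this gives 67.2 ns per frame at 10GbE and
6.72 ns at 100GbE, so at 100GbE a single nanosecond is a noticeable share
of the whole per-packet budget.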

[dpdk-dev] some questions about rte_memcpy

2015-01-22 Thread Tetsuya Mukawa

On 2015/01/22 16:35, Matthew Hall wrote:
> On Thu, Jan 22, 2015 at 01:32:04PM +0800, Linhaifeng wrote:
>> Do you mean that calling rte_memcpy before rte_eal_init() would crash? Why?
> No guarantee. But a theory. It might use some things from the EAL init to 
> figure out which version of the accelerated algorithm to use.

This selection is done at compile-time.
And if the size is constant, I guess DPDK assumes memcpy is replaced by
inline __builtin_memcpy.
I haven't checked the performance of builtin memcpy, but probably much
faster.

Tetsuya

> Matthew.

[dpdk-dev] [PATCH v4 06/11] eal/linux/pci: Add functions for unmapping igb_uio resources

2015-01-22 Thread Tetsuya Mukawa

Hi Michael,

On 2015/01/22 17:12, Qiu, Michael wrote:
> On 1/21/2015 6:01 PM, Tetsuya Mukawa wrote:
>> Hi Michael,
>>
>> On 2015/01/20 18:23, Qiu, Michael wrote:
>>> On 1/19/2015 6:42 PM, Tetsuya Mukawa wrote:
>>>> The patch adds functions for unmapping igb_uio resources. The patch is only
>>>> for Linux and igb_uio environment. VFIO and BSD are not supported.
>>>>
>>>> v4:
>>>> - Add parameter checking.
>>>> - Add header file to determine if hotplug can be enabled.
>>>>
>>>> Signed-off-by: Tetsuya Mukawa
>>>> ---
>>>>  lib/librte_eal/common/Makefile  |  1 +
>>>>  lib/librte_eal/common/include/rte_dev_hotplug.h | 44 +
>>>>  lib/librte_eal/linuxapp/eal/eal_pci.c   | 38 +++
>>>>  lib/librte_eal/linuxapp/eal/eal_pci_init.h  |  8 +++
>>>>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c   | 65 +
>>>>  5 files changed, 156 insertions(+)
>>>>  create mode 100644 lib/librte_eal/common/include/rte_dev_hotplug.h
>>>>
>>>> diff --git a/lib/librte_eal/common/Makefile
>>>> b/lib/librte_eal/common/Makefile
>>>> index 52c1a5f..db7cc93 100644
>>>> --- a/lib/librte_eal/common/Makefile
>>>> +++ b/lib/librte_eal/common/Makefile
>>>> @@ -41,6 +41,7 @@ INC += rte_eal_memconfig.h rte_malloc_heap.h
>>>>  INC += rte_hexdump.h rte_devargs.h rte_dev.h
>>>>  INC += rte_common_vect.h
>>>>  INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
>>>> +INC += rte_dev_hotplug.h
>>>>
>>>>  ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
>>>>  INC += rte_warnings.h
>>>> diff --git a/lib/librte_eal/common/include/rte_dev_hotplug.h
>>>> b/lib/librte_eal/common/include/rte_dev_hotplug.h
>>>> new file mode 100644
>>>> index 000..b333e0f
>>>> --- /dev/null
>>>> +++ b/lib/librte_eal/common/include/rte_dev_hotplug.h
>>>> @@ -0,0 +1,44 @@
>>>> +/*-
>>>> + *   BSD LICENSE
>>>> + *
>>>> + *   Copyright(c) 2015 IGEL Co.,LTd.
>>>> + *   All rights reserved.
>>>> + *
>>>> + *   Redistribution and use in source and binary forms, with or without
>>>> + *   modification, are permitted provided that the following conditions
>>>> + *   are met:
>>>> + *
>>>> + * * Redistributions of source code must retain the above copyright
>>>> + *   notice, this list of conditions and the following disclaimer.
>>>> + * * Redistributions in binary form must reproduce the above copyright
>>>> + *   notice, this list of conditions and the following disclaimer in
>>>> + *   the documentation and/or other materials provided with the
>>>> + *   distribution.
>>>> + * * Neither the name of IGEL Co.,Ltd. nor the names of its
>>>> + *   contributors may be used to endorse or promote products derived
>>>> + *   from this software without specific prior written permission.
>>>> + *
>>>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>>>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>>>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>>>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>>>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>>>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>>>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>>>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>>>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>>>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>>>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>>>> + */
>>>> +
>>>> +#ifndef _RTE_DEV_HOTPLUG_H_
>>>> +#define _RTE_DEV_HOTPLUG_H_
>>>> +
>>>> +/*
>>>> + * determine if hotplug can be enabled on the system
>>>> + */
>>>> +#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
>>> As you said, VFIO should not work with it, so does it need to add the
>>> vfio check here?
>> Could I have your advice?
>> First I guess it's the best to include "eal_vfio.h" here, and add
>> checking of VFIO_PRESENT macro.
>
> I have a question, will your hotplug  feature support freebsd ?
>
> If not, how about to put it in  "lib/librte_eal/linuxapp/eal/" ? Also 
> include attach or detach affairs.

I appreciate your comments.

So far, FreeBSD doesn't support PCI hotplug. So I didn't implement code
for FreeBSD.
But in the future, I want to implement it when FreeBSD supports it.
Also my hotplug implementation depends on legacy code already
implemented in common layer.
Anyway, for me it's nice to implement the feature in common layer.

>> But it seems I cannot reach "eal_vfio.h" from this file.
> Yes, you can't :)
>
>> My second option is just checking RTE_EAL_VFIO macro.
>> But according to "eal_vfio.h", if kernel is under 3.6.0, VFIO_PRESENT
> Actually,  in my opinion, whatever vfio or uio, only need be care in
> runtime.
>
> DPDK to check vfio only to add supp

[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-22 Thread EDMISON, Kelvin (Kelvin)



On 2015-01-21, 3:54 PM, "Neil Horman"  wrote:

>On Wed, Jan 21, 2015 at 11:49:47AM -0800, Stephen Hemminger wrote:
>> On Wed, 21 Jan 2015 13:26:20 +
>> Bruce Richardson  wrote:
>> 
>> > On Wed, Jan 21, 2015 at 02:21:25PM +0100, Marc Sune wrote:
>> > > 
>> > > On 21/01/15 14:02, Bruce Richardson wrote:
>> > > >On Wed, Jan 21, 2015 at 01:36:41PM +0100, Marc Sune wrote:
>> > > >>On 21/01/15 04:44, Wang, Zhihong wrote:
>> > > -Original Message-
>> > > From: Richardson, Bruce
>> > > Sent: Wednesday, January 21, 2015 12:15 AM
>> > > To: Neil Horman
>> > > Cc: Wang, Zhihong; dev at dpdk.org
>> > > Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
>> > > 
>> > > On Tue, Jan 20, 2015 at 10:11:18AM -0500, Neil Horman wrote:
>> > > >On Tue, Jan 20, 2015 at 03:01:44AM +, Wang, Zhihong wrote:
>> > > >>>-Original Message-
>> > > >>>From: Neil Horman [mailto:nhorman at tuxdriver.com]
>> > > >>>Sent: Monday, January 19, 2015 9:02 PM
>> > > >>>To: Wang, Zhihong
>> > > >>>Cc: dev at dpdk.org
>> > > >>>Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
>> > > >>>
>> > > >>>On Mon, Jan 19, 2015 at 09:53:30AM +0800,
>>zhihong.wang at intel.com
>> > > wrote:
>> > > This patch set optimizes memcpy for DPDK for both SSE and
>>AVX
>> > > platforms.
>> > > It also extends memcpy test coverage with unaligned cases
>>and
>> > > more test
>> > > >>>points.
>> > > Optimization techniques are summarized below:
>> > > 
>> > > 1. Utilize full cache bandwidth
>> > > 
>> > > 2. Enforce aligned stores
>> > > 
>> > > 3. Apply load address alignment based on architecture
>>features
>> > > 
>> > > 4. Make load/store address available as early as possible
>> > > 
>> > > 5. General optimization techniques like inlining, branch
>> > > reducing, prefetch pattern access
>> > > 
>> > > Zhihong Wang (4):
>> > >    Disabled VTA for memcpy test in app/test/Makefile
>> > >    Removed unnecessary test cases in test_memcpy.c
>> > >    Extended test coverage in test_memcpy_perf.c
>> > >    Optimized memcpy in arch/x86/rte_memcpy.h for both SSE
>>and AVX
>> > >  platforms
>> > > 
>> > >   app/test/Makefile  |   6 +
>> > >   app/test/test_memcpy.c |  52
>>+-
>> > >   app/test/test_memcpy_perf.c| 238
>>+---
>> > >   .../common/include/arch/x86/rte_memcpy.h   | 664
>> > > >>>+++--
>> > >   4 files changed, 656 insertions(+), 304 deletions(-)
>> > > 
>> > > --
>> > > 1.9.3
>> > > 
>> > > 
>> > > >>>Are you able to compile this with gcc 4.9.2?  The
>>compilation of
>> > > >>>test_memcpy_perf is taking forever for me.  It appears hung.
>> > > >>>Neil
>> > > >>Neil,
>> > > >>
>> > > >>Thanks for reporting this!
>> > > >>It should compile but will take quite some time if the CPU
>>doesn't support
>> > > AVX2, the reason is that:
>> > > >>1. The SSE & AVX memcpy implementation is more complicated
>>than
>> > > AVX2
>> > > >>version thus the compiler takes more time to compile and
>>optimize 2.
>> > > >>The new test_memcpy_perf.c contains 126 constants memcpy
>>calls for
>> > > >>better test case coverage, that's quite a lot
>> > > >>
>> > > >>I've just tested this patch on an Ivy Bridge machine with GCC
>>4.9.2:
>> > > >>1. The whole compile process takes 9'41" with the original
>> > > >>test_memcpy_perf.c (63 + 63 = 126 constant memcpy calls) 2.
>>It takes
>> > > >>only 2'41" after I reduce the constant memcpy call number to
>>12 + 12
>> > > >>= 24
>> > > >>
>> > > >>I'll reduce memcpy call in the next version of patch.
>> > > >>
>> > > >ok, thank you.  I'm all for optimzation, but I think a compile
>>that
>> > > >takes almost
>> > > >10 minutes for a single file is going to generate some raised
>>eyebrows
>> > > >when end users start tinkering with it
>> > > >
>> > > >Neil
>> > > >
>> > > >>Zhihong (John)
>> > > >>
>> > > Even two minutes is a very long time to compile, IMHO. The
>>whole of DPDK
>> > > doesn't take that long to compile right now, and that's with a
>>couple of huge
>> > > header files with routing tables in it. Any chance you could
>>cut compile time
>> > > down to a few seconds while still having reasonable tests?
>> > > Also, when there is AVX2 present on the system, what is the
>>compile time
>> > > like for that code?
>> > > 
>> > >  /Bruce
>> > > >>>Neil, Bruce,
>> > > >>>
>> > > >>>Some data first.
>> > > >>>
>> > > >>>Sandy Bridge without AVX2:
>> > > >>>1. original w/ 10 constant mem

[dpdk-dev] [PATCH v8 4/4] docs: Add ABI documentation

2015-01-22 Thread Butler, Siobhan A


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> Sent: Thursday, January 22, 2015 3:49 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v8 4/4] docs: Add ABI documentation
> 
> Adding a document describing rudimentary ABI policy and adding notice
> space for any deprecation announcements
> 
> Signed-off-by: Neil Horman 
> CC: Thomas Monjalon 
> CC: "Richardson, Bruce" 
> 
> ---
> Change notes:
> 
> v5) Updated documentation to add notes from Thomas M.
> 
> v6) Moved abi.txt to guides/rel_notes/abi.rst
> 
> v7) Updated abi.rst to integrate with index file
> Updated abi.rst to conform to rst formatting
> Updated abi.rst to include example deprecation notices.  Its not exactly 
> the
> language that Thomas indicated, but I think it makes the idea clear.
> 
> v8) Add missing file index.rst which was left out of the prior commit
> ---
>  doc/guides/rel_notes/abi.rst   | 40
> 
>  doc/guides/rel_notes/index.rst |  1 +
>  2 files changed, 41 insertions(+)
>  create mode 100644 doc/guides/rel_notes/abi.rst
> 
> diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst new
> file mode 100644 index 000..73d88ca
> --- /dev/null
> +++ b/doc/guides/rel_notes/abi.rst
> @@ -0,0 +1,40 @@
> +ABI policy
> +==
> +ABI versions are set at the time of major release labeling, and ABI may
> +change multiple times between the last labeling and the HEAD label of
> +the git tree without warning.
> +
> +ABI versions, once released are available until such time as their
> +deprecation has been noted here for at least one major release cycle,
> +after it has been tagged.  E.g. the ABI for DPDK 1.8 is shipped, and
> +then the decision to remove it is made during the development of DPDK
> +1.9.  The decision will be recorded here, shipped with the DPDK 1.9
> +release, and actually removed when DPDK
> +1.10 ships.
> +
> +ABI versions may be deprecated in whole, or in part as needed by a given
> update.
> +
> +Some ABI changes may be too significant to reasonably maintain multiple
> +versions of.  In those cases ABIs may be updated without backward
> +compatibility provided.  The requirements for doing so are:
> +
> +#. At least 3 acknowledgements of the need on the dpdk.org
> +#. A full deprecation cycle must be made to offer downstream consumers
> +   sufficient warning of the change.  E.g. if dpdk 2.0 is under development
> +   when the change is proposed, a deprecation notice must be added to this
> +   file, and released with dpdk 2.0.  Then the change may be incorporated
> +   for dpdk 2.1
> +#. The LIBABIVER variable in the makefile(s) where the ABI changes are
> +   incorporated must be incremented in parallel with the ABI changes
> +   themselves
> +
> +Note that the above process for ABI deprecation should not be
> +undertaken lightly.  ABI stability is extremely important for
> +downstream consumers of the DPDK, especially when distributed in shared
> +object form.  Every effort should be made to preserve ABI whenever
> +possible.  For instance, reorganizing public structure fields for
> +aesthetic or readability purposes should be avoided as it will cause ABI
> +breakage.  Only significant (e.g. performance) reasons should be seen as
> cause to alter ABI.
> +
> +Examples of Deprecation notices
> +---
> +* The Macro #RTE_FOO is deprecated and will be removed with version
> +2.0, to be replaced with the inline function rte_bar()
> +* The function rte_mbuf_grok has been updated to include new parameter
> +in version 2.0.  Backwards compatibility will be maintained for this
> +function until the release of version 2.1
> +* The members of struct foo have been reorganized in release 2.0.  Existing
> binary applications will have backwards compatibility in release 2.0, while
> newly built binaries will need to reference new structure variant struct foo2.
> Compatibility will be removed in release 2.2, and all applications will
> require updating and rebuilding to the new structure at that time, which
> will be renamed to the original struct foo.
> +* Significant ABI changes are planned for the librte_dostuff library.  The
> upcoming release 2.0 will not contain these changes, but release 2.1 will,
> and no backwards compatibility is planned due to the invasive nature of
> these changes.  Binaries using this library built prior to version 2.1 will
> require updating and recompilation.
> +
> +Deprecation Notices
> +---
> diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
> index 2724149..cf712b2 100644
> --- a/doc/guides/rel_notes/index.rst
> +++ b/doc/guides/rel_notes/index.rst
> @@ -48,4 +48,5 @@ Contents
>  updating_apps
>  known_issues
>  resolved_issues
> +abi
>  faq
> --
> 2.1.0

Acked-by: Siobhan Butler

[dpdk-dev] Purge all entries in a rte_hash

2015-01-22 Thread Padam J. Singh

Hello,

Is there some way to purge all keys in an rte_hash while maintaining read 
concurrency?

I am assuming that I can't do a free/create step while other threads may still 
be doing lookups on it.

What I can do is store the key as part of the value in the array of user data, 
iterate over this array, and call delete on the hash with each key. Would this 
be the best way?

Thanks,
Padam
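
The approach described in the question can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_hash`, `toy_add`, `toy_del`, `toy_lookup`, `purge_all` and `struct user_data` are hypothetical names for a tiny single-threaded stand-in table, not the rte_hash API, and the sketch says nothing about whether per-key deletes are safe against concurrent lookups.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS 8   /* hypothetical table capacity */

/* Toy stand-in for a hash table: maps a key to a stable slot index. */
struct toy_hash {
    uint32_t keys[TOY_SLOTS];
    int      used[TOY_SLOTS];
};

/* User data array indexed by slot; the key is stored next to the value. */
struct user_data {
    uint32_t key;
    int      value;
    int      valid;
};

/* Return the slot holding key, or -1 if the key is absent. */
static int toy_lookup(const struct toy_hash *h, uint32_t key)
{
    for (int i = 0; i < TOY_SLOTS; i++)
        if (h->used[i] && h->keys[i] == key)
            return i;
    return -1;
}

/* Insert key and return its slot, or -1 if the table is full. */
static int toy_add(struct toy_hash *h, uint32_t key)
{
    int pos = toy_lookup(h, key);

    if (pos >= 0)
        return pos;
    for (int i = 0; i < TOY_SLOTS; i++) {
        if (!h->used[i]) {
            h->used[i] = 1;
            h->keys[i] = key;
            return i;
        }
    }
    return -1;
}

/* Delete key and return the slot it occupied, or -1 if it was absent. */
static int toy_del(struct toy_hash *h, uint32_t key)
{
    int pos = toy_lookup(h, key);

    if (pos >= 0)
        h->used[pos] = 0;
    return pos;
}

/* Purge: walk the user data and delete every recorded key, one by one. */
static unsigned purge_all(struct toy_hash *h, struct user_data *data)
{
    unsigned removed = 0;

    assert(h && data);
    for (int i = 0; i < TOY_SLOTS; i++) {
        if (!data[i].valid)
            continue;
        if (toy_del(h, data[i].key) >= 0)
            removed++;
        data[i].valid = 0;
    }
    return removed;
}
```

The cost of this scheme is one delete per stored key plus a full walk of the user data array, so it scales with the array size rather than with the number of live keys.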

[dpdk-dev] [PATCH v1 15/15] timer: add support to non-EAL thread

2015-01-22 Thread Cunming Liang

Allow timers to be set up only for EAL (lcore) threads (__lcore_id < RTE_MAX_LCORE).
E.g. a dynamically created thread will be able to reset/stop a timer for an lcore 
thread, but it will not be allowed to set up a timer for itself or for another 
non-lcore thread.
rte_timer_manage() for a non-lcore thread simply does nothing and returns 
straight away.

Signed-off-by: Cunming Liang 
---
 lib/librte_timer/rte_timer.c | 40 +++-
 lib/librte_timer/rte_timer.h |  2 +-
 2 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 269a992..601c159 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -79,9 +79,10 @@ static struct priv_timer priv_timer[RTE_MAX_LCORE];

 /* when debug is enabled, store some statistics */
 #ifdef RTE_LIBRTE_TIMER_DEBUG
-#define __TIMER_STAT_ADD(name, n) do { \
-   unsigned __lcore_id = rte_lcore_id();   \
-   priv_timer[__lcore_id].stats.name += (n);   \
+#define __TIMER_STAT_ADD(name, n) do { \
+   unsigned __lcore_id = rte_lcore_id();   \
+   if (__lcore_id < RTE_MAX_LCORE) \
+   priv_timer[__lcore_id].stats.name += (n);   \
} while(0)
 #else
 #define __TIMER_STAT_ADD(name, n) do {} while(0)
@@ -127,15 +128,26 @@ timer_set_config_state(struct rte_timer *tim,
unsigned lcore_id;

lcore_id = rte_lcore_id();
+   if (lcore_id >= RTE_MAX_LCORE)
+   lcore_id = LCORE_ID_ANY;

/* wait that the timer is in correct status before update,
 * and mark it as being configured */
while (success == 0) {
prev_status.u32 = tim->status.u32;

+   /*
+* prevent race condition of non-EAL threads
+* to update the timer. When 'owner == LCORE_ID_ANY',
+* it means updated by a non-EAL thread.
+*/
+   if (lcore_id == (unsigned)LCORE_ID_ANY &&
+   (uint16_t)lcore_id == prev_status.owner)
+   return -1;
+
/* timer is running on another core, exit */
if (prev_status.state == RTE_TIMER_RUNNING &&
-   (unsigned)prev_status.owner != lcore_id)
+   prev_status.owner != (uint16_t)lcore_id)
return -1;

/* timer is being configured on another core */
@@ -366,9 +378,13 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,

/* round robin for tim_lcore */
if (tim_lcore == (unsigned)LCORE_ID_ANY) {
-   tim_lcore = rte_get_next_lcore(priv_timer[lcore_id].prev_lcore,
-  0, 1);
-   priv_timer[lcore_id].prev_lcore = tim_lcore;
+   if (lcore_id < RTE_MAX_LCORE) {
+   tim_lcore = rte_get_next_lcore(
+   priv_timer[lcore_id].prev_lcore,
+   0, 1);
+   priv_timer[lcore_id].prev_lcore = tim_lcore;
+   } else
+   tim_lcore = rte_get_next_lcore(LCORE_ID_ANY, 0, 1);
}

/* wait that the timer is in correct status before update,
@@ -378,7 +394,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
return -1;

__TIMER_STAT_ADD(reset, 1);
-   if (prev_status.state == RTE_TIMER_RUNNING) {
+   if (prev_status.state == RTE_TIMER_RUNNING &&
+   lcore_id < RTE_MAX_LCORE) {
priv_timer[lcore_id].updated = 1;
}

@@ -455,7 +472,8 @@ rte_timer_stop(struct rte_timer *tim)
return -1;

__TIMER_STAT_ADD(stop, 1);
-   if (prev_status.state == RTE_TIMER_RUNNING) {
+   if (prev_status.state == RTE_TIMER_RUNNING &&
+   lcore_id < RTE_MAX_LCORE) {
priv_timer[lcore_id].updated = 1;
}

@@ -499,6 +517,10 @@ void rte_timer_manage(void)
uint64_t cur_time;
int i, ret;

+   /* timer manager only runs on EAL thread */
+   if (lcore_id >= RTE_MAX_LCORE)
+   return;
+
__TIMER_STAT_ADD(manage, 1);
/* optimize for the case where per-cpu list is empty */
if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 4907cf5..5c5df91 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -76,7 +76,7 @@ extern "C" {
 #define RTE_TIMER_RUNNING 2 /**< State: timer function is running. */
 #define RTE_TIMER_CONFIG  3 /**< State: timer is being configured. */

-#define RTE_TIMER_NO_OWNER -1 /**< Timer has no owner. */
+#define RTE_TIMER_NO_OWNER -2 /**< Timer has no owner. */

 /**
  * Timer type: Periodic or single (one-shot).
-- 
1.8.1.4

[dpdk-dev] [PATCH v1 14/15] ring: add support to non-EAL thread

2015-01-22 Thread Cunming Liang

The ring debug statistics are not updated for non-EAL threads.

Signed-off-by: Cunming Liang 
---
 lib/librte_ring/rte_ring.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 7cd5f2d..39bacdd 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -188,10 +188,12 @@ struct rte_ring {
  *   The number to add to the object-oriented statistics.
  */
 #ifdef RTE_LIBRTE_RING_DEBUG
-#define __RING_STAT_ADD(r, name, n) do {   \
-   unsigned __lcore_id = rte_lcore_id();   \
-   r->stats[__lcore_id].name##_objs += n;  \
-   r->stats[__lcore_id].name##_bulk += 1;  \
+#define __RING_STAT_ADD(r, name, n) do {\
+   unsigned __lcore_id = rte_lcore_id();   \
+   if (__lcore_id < RTE_MAX_LCORE) {   \
+   r->stats[__lcore_id].name##_objs += n;  \
+   r->stats[__lcore_id].name##_bulk += 1;  \
+   }   \
} while(0)
 #else
 #define __RING_STAT_ADD(r, name, n) do {} while(0)
-- 
1.8.1.4
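
The guard added to the statistics macro can be illustrated outside DPDK with a small sketch. Everything here is an assumption made for the example: `MAX_LCORE`, `LCORE_ANY`, `current_lcore`, `TOY_STAT_ADD`, `record_enqueue` and `struct toy_ring` are hypothetical stand-ins, not DPDK definitions. The point is only that a per-lcore array must not be indexed by a thread that has no valid lcore id.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LCORE 4            /* hypothetical stand-in for RTE_MAX_LCORE */
#define LCORE_ANY UINT32_MAX   /* hypothetical stand-in for LCORE_ID_ANY */

struct ring_stats {
    uint64_t enq_objs;   /* objects enqueued */
    uint64_t enq_bulk;   /* enqueue calls */
};

struct toy_ring {
    struct ring_stats stats[MAX_LCORE];
};

/* Per-thread lcore id; threads that are not lcores keep LCORE_ANY. */
static _Thread_local unsigned current_lcore = LCORE_ANY;

/* Update the per-lcore counters only when the id is a valid array index. */
#define TOY_STAT_ADD(r, name, n) do {                 \
        unsigned id_ = current_lcore;                 \
        if (id_ < MAX_LCORE) {                        \
            (r)->stats[id_].name##_objs += (n);       \
            (r)->stats[id_].name##_bulk += 1;         \
        }                                             \
    } while (0)

/* Record one enqueue of n objects on behalf of the calling thread. */
static void record_enqueue(struct toy_ring *r, unsigned n)
{
    assert(r);
    TOY_STAT_ADD(r, enq, n);
}
```

With this shape, activity from threads without an lcore id is simply not counted, which matches the commit message: the debug counters ignore such threads rather than writing out of bounds.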

[dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL thread

2015-01-22 Thread Cunming Liang

For a non-EAL thread, bypass the per-lcore cache and use the ring pool directly.
This allows rte_mempool to be used in either an EAL thread or any user pthread.
A non-EAL thread relies directly on rte_ring, which is non-preemptive.
Running multiple pthreads per CPU that compete for the same rte_mempool is therefore not recommended.
Performance will be poor, and there is a critical risk if the scheduling policy is RT.

Signed-off-by: Cunming Liang 
---
 lib/librte_mempool/rte_mempool.h | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 3314651..4845f27 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -198,10 +198,12 @@ struct rte_mempool {
  *   Number to add to the object-oriented statistics.
  */
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
-#define __MEMPOOL_STAT_ADD(mp, name, n) do {   \
-   unsigned __lcore_id = rte_lcore_id();   \
-   mp->stats[__lcore_id].name##_objs += n; \
-   mp->stats[__lcore_id].name##_bulk += 1; \
+#define __MEMPOOL_STAT_ADD(mp, name, n) do {\
+   unsigned __lcore_id = rte_lcore_id();   \
+   if (__lcore_id < RTE_MAX_LCORE) {   \
+   mp->stats[__lcore_id].name##_objs += n; \
+   mp->stats[__lcore_id].name##_bulk += 1; \
+   }   \
} while(0)
 #else
 #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0)
@@ -767,8 +769,9 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const 
*obj_table,
__MEMPOOL_STAT_ADD(mp, put, n);

 #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
-   /* cache is not enabled or single producer */
-   if (unlikely(cache_size == 0 || is_mp == 0))
+   /* cache is not enabled or single producer or none EAL thread */
+   if (unlikely(cache_size == 0 || is_mp == 0 ||
+lcore_id >= RTE_MAX_LCORE))
goto ring_enqueue;

/* Go straight to ring if put would overflow mem allocated for cache */
@@ -952,7 +955,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
uint32_t cache_size = mp->cache_size;

/* cache is not enabled or single consumer */
-   if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size))
+   if (unlikely(cache_size == 0 || is_mc == 0 ||
+n >= cache_size || lcore_id >= RTE_MAX_LCORE))
goto ring_dequeue;

cache = &mp->local_cache[lcore_id];
-- 
1.8.1.4
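
The cache bypass can be pictured with a toy model in which counters stand in for the real object storage. This is a sketch under assumptions, not the rte_mempool implementation: `toy_pool`, `toy_put`, `toy_get`, `MAX_LCORE`, `LCORE_ANY` and `CACHE_SIZE` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LCORE  4            /* hypothetical stand-in for RTE_MAX_LCORE */
#define LCORE_ANY  UINT32_MAX   /* hypothetical stand-in for LCORE_ID_ANY */
#define CACHE_SIZE 8            /* hypothetical per-lcore cache capacity */

struct toy_pool {
    unsigned cache_len[MAX_LCORE]; /* objects parked in each lcore cache */
    unsigned ring_len;             /* objects in the shared ring */
};

/* Return n objects to the pool on behalf of the thread with id lcore_id. */
static void toy_put(struct toy_pool *p, unsigned lcore_id, unsigned n)
{
    assert(p);
    /* No valid lcore id means no cache slot: go straight to the ring. */
    if (lcore_id >= MAX_LCORE || p->cache_len[lcore_id] + n > CACHE_SIZE) {
        p->ring_len += n;
        return;
    }
    p->cache_len[lcore_id] += n;
}

/* Take n objects; returns 0 on success, -1 if not enough are available. */
static int toy_get(struct toy_pool *p, unsigned lcore_id, unsigned n)
{
    assert(p);
    /* Only a thread with a valid lcore id may be served from a cache. */
    if (lcore_id < MAX_LCORE && p->cache_len[lcore_id] >= n) {
        p->cache_len[lcore_id] -= n;
        return 0;
    }
    if (p->ring_len < n)
        return -1;
    p->ring_len -= n;
    return 0;
}
```

The range check comes first in each condition so that the cache array is never indexed with an out-of-range id.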

[dpdk-dev] [PATCH v1 12/15] eal: fix recursive spinlock in non-EAL thread

2015-01-22 Thread Cunming Liang

In a non-EAL thread, lcore_id is always LCORE_ID_ANY.
It cannot be used as a unique id for the recursive spinlock.
Use rte_gettid() to replace it.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/common/include/generic/rte_spinlock.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h 
b/lib/librte_eal/common/include/generic/rte_spinlock.h
index dea885c..c7fb0df 100644
--- a/lib/librte_eal/common/include/generic/rte_spinlock.h
+++ b/lib/librte_eal/common/include/generic/rte_spinlock.h
@@ -179,7 +179,7 @@ static inline void 
rte_spinlock_recursive_init(rte_spinlock_recursive_t *slr)
  */
 static inline void rte_spinlock_recursive_lock(rte_spinlock_recursive_t *slr)
 {
-   int id = rte_lcore_id();
+   int id = rte_gettid();

if (slr->user != id) {
rte_spinlock_lock(&slr->sl);
@@ -212,7 +212,7 @@ static inline void 
rte_spinlock_recursive_unlock(rte_spinlock_recursive_t *slr)
  */
 static inline int rte_spinlock_recursive_trylock(rte_spinlock_recursive_t *slr)
 {
-   int id = rte_lcore_id();
+   int id = rte_gettid();

if (slr->user != id) {
if (rte_spinlock_trylock(&slr->sl) == 0)
-- 
1.8.1.4

[dpdk-dev] [PATCH v1 11/15] eal: set _lcore_id and _socket_id to (-1) by default

2015-01-22 Thread Cunming Liang

For non-EAL threads, *_lcore_id* shall always be LCORE_ID_ANY.
Libraries that use *_lcore_id* as an index need to take care.
*_socket_id* is always SOCKET_ID_ANY until the thread changes its affinity
with rte_thread_set_affinity().

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal_thread.c   | 4 ++--
 lib/librte_eal/linuxapp/eal/eal_thread.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c 
b/lib/librte_eal/bsdapp/eal/eal_thread.c
index 5b16302..2b3c9a8 100644
--- a/lib/librte_eal/bsdapp/eal/eal_thread.c
+++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
@@ -56,8 +56,8 @@
 #include "eal_private.h"
 #include "eal_thread.h"

-RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
-RTE_DEFINE_PER_LCORE(unsigned, _socket_id);
+RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = (unsigned)LCORE_ID_ANY;
+RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
 RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);

 /*
diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c 
b/lib/librte_eal/linuxapp/eal/eal_thread.c
index 6eb1525..ab94e20 100644
--- a/lib/librte_eal/linuxapp/eal/eal_thread.c
+++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
@@ -57,8 +57,8 @@
 #include "eal_private.h"
 #include "eal_thread.h"

-RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
-RTE_DEFINE_PER_LCORE(unsigned, _socket_id);
+RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = (unsigned)LCORE_ID_ANY;
+RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
 RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);

 /*
-- 
1.8.1.4

[dpdk-dev] [PATCH v1 10/15] log: fix the gap to support non-EAL thread

2015-01-22 Thread Cunming Liang

For non-EAL threads, *_lcore_id* is invalid and is probably larger than 
RTE_MAX_LCORE.
The patch adds the check and allows only EAL threads to use the EAL per-thread 
log level and log type.
Other threads share the global log level.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/common/eal_common_log.c  | 17 +++--
 lib/librte_eal/common/include/rte_log.h |  5 +
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_log.c 
b/lib/librte_eal/common/eal_common_log.c
index cf57619..e8dc94a 100644
--- a/lib/librte_eal/common/eal_common_log.c
+++ b/lib/librte_eal/common/eal_common_log.c
@@ -193,11 +193,20 @@ rte_set_log_type(uint32_t type, int enable)
rte_logs.type &= (~type);
 }

+/* Get global log type */
+uint32_t
+rte_get_log_type(void)
+{
+   return rte_logs.type;
+}
+
 /* get the current loglevel for the message beeing processed */
 int rte_log_cur_msg_loglevel(void)
 {
unsigned lcore_id;
lcore_id = rte_lcore_id();
+   if (lcore_id >= RTE_MAX_LCORE)
+   return rte_get_log_level();
return log_cur_msg[lcore_id].loglevel;
 }

@@ -206,6 +215,8 @@ int rte_log_cur_msg_logtype(void)
 {
unsigned lcore_id;
lcore_id = rte_lcore_id();
+   if (lcore_id >= RTE_MAX_LCORE)
+   return rte_get_log_type();
return log_cur_msg[lcore_id].logtype;
 }

@@ -265,8 +276,10 @@ rte_vlog(__attribute__((unused)) uint32_t level,

/* save loglevel and logtype in a global per-lcore variable */
lcore_id = rte_lcore_id();
-   log_cur_msg[lcore_id].loglevel = level;
-   log_cur_msg[lcore_id].logtype = logtype;
+   if (lcore_id < RTE_MAX_LCORE) {
+   log_cur_msg[lcore_id].loglevel = level;
+   log_cur_msg[lcore_id].logtype = logtype;
+   }

ret = vfprintf(f, format, ap);
fflush(f);
diff --git a/lib/librte_eal/common/include/rte_log.h 
b/lib/librte_eal/common/include/rte_log.h
index db1ea08..f83a0d9 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -144,6 +144,11 @@ uint32_t rte_get_log_level(void);
 void rte_set_log_type(uint32_t type, int enable);

 /**
+ * Get the global log type.
+ */
+uint32_t rte_get_log_type(void);
+
+/**
  * Get the current loglevel for the message being processed.
  *
  * Before calling the user-defined stream for logging, the log
-- 
1.8.1.4

[dpdk-dev] [PATCH v1 09/15] malloc: fix the issue of SOCKET_ID_ANY

2015-01-22 Thread Cunming Liang

Add a check on rte_socket_id() to avoid an unexpected return value such as (-1).

Signed-off-by: Cunming Liang 
---
 lib/librte_malloc/malloc_heap.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/librte_malloc/malloc_heap.h b/lib/librte_malloc/malloc_heap.h
index b4aec45..a47136d 100644
--- a/lib/librte_malloc/malloc_heap.h
+++ b/lib/librte_malloc/malloc_heap.h
@@ -44,7 +44,12 @@ extern "C" {
 static inline unsigned
 malloc_get_numa_socket(void)
 {
-   return rte_socket_id();
+   unsigned socket_id = rte_socket_id();
+
+   if (socket_id == (unsigned)SOCKET_ID_ANY)
+   return 0;
+
+   return socket_id;
 }

 void *
-- 
1.8.1.4

[dpdk-dev] [PATCH v1 08/15] enic: fix re-define freebsd compile complain

2015-01-22 Thread Cunming Liang

Some macros are already defined by FreeBSD's 'sys/param.h'.

Signed-off-by: Cunming Liang 
---
 lib/librte_pmd_enic/enic.h| 1 +
 lib/librte_pmd_enic/enic_compat.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/lib/librte_pmd_enic/enic.h b/lib/librte_pmd_enic/enic.h
index c43417c..189c3b9 100644
--- a/lib/librte_pmd_enic/enic.h
+++ b/lib/librte_pmd_enic/enic.h
@@ -66,6 +66,7 @@
 #define ENIC_CALC_IP_CKSUM  1
 #define ENIC_CALC_TCP_UDP_CKSUM 2
 #define ENIC_MAX_MTU9000
+#undef PAGE_SIZE
 #define PAGE_SIZE   4096
 #define PAGE_ROUND_UP(x) \
unsigned long)(x)) + PAGE_SIZE-1) & (~(PAGE_SIZE-1)))
diff --git a/lib/librte_pmd_enic/enic_compat.h 
b/lib/librte_pmd_enic/enic_compat.h
index b1af838..b84c766 100644
--- a/lib/librte_pmd_enic/enic_compat.h
+++ b/lib/librte_pmd_enic/enic_compat.h
@@ -67,6 +67,7 @@
 #define pr_warn(y, args...) dev_warning(0, y, ##args)
 #define BUG() pr_err("BUG at %s:%d", __func__, __LINE__)

+#undef ALIGN
 #define ALIGN(x, a)  __ALIGN_MASK(x, (typeof(x))(a)-1)
 #define __ALIGN_MASK(x, mask)(((x)+(mask))&~(mask))
 #define udelay usleep
-- 
1.8.1.4

[dpdk-dev] [PATCH v1 07/15] eal: apply affinity of EAL thread by assigned cpuset

2015-01-22 Thread Cunming Liang

EAL threads use the assigned cpuset to set core affinity during startup.
The mapping stays 1:1 if no '--lcores' option is used.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal.c  | 13 ---
 lib/librte_eal/bsdapp/eal/eal_thread.c   | 63 +-
 lib/librte_eal/linuxapp/eal/eal.c|  7 +++-
 lib/librte_eal/linuxapp/eal/eal_thread.c | 67 +++-
 4 files changed, 54 insertions(+), 96 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 69f3c03..98c5a83 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -432,6 +432,7 @@ rte_eal_init(int argc, char **argv)
int i, fctret, ret;
pthread_t thread_id;
static rte_atomic32_t run_once = RTE_ATOMIC32_INIT(0);
+   char cpuset[CPU_STR_LEN];

if (!rte_atomic32_test_and_set(&run_once))
return -1;
@@ -502,13 +503,17 @@ rte_eal_init(int argc, char **argv)
if (rte_eal_pci_init() < 0)
rte_panic("Cannot init PCI\n");

-   RTE_LOG(DEBUG, EAL, "Master core %u is ready (tid=%p)\n",
-   rte_config.master_lcore, thread_id);
-
eal_check_mem_on_local_socket();

rte_eal_mcfg_complete();

+   eal_thread_init_master(rte_config.master_lcore);
+
+   eal_thread_dump_affinity(cpuset, CPU_STR_LEN);
+
+   RTE_LOG(DEBUG, EAL, "Master lcore %u is ready (tid=%p;cpuset=[%s])\n",
+   rte_config.master_lcore, thread_id, cpuset);
+
if (rte_eal_dev_init() < 0)
rte_panic("Cannot init pmd devices\n");

@@ -532,8 +537,6 @@ rte_eal_init(int argc, char **argv)
rte_panic("Cannot create thread\n");
}

-   eal_thread_init_master(rte_config.master_lcore);
-
/*
 * Launch a dummy function on all slave lcores, so that master lcore
 * knows they are all ready when this function returns.
diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c 
b/lib/librte_eal/bsdapp/eal/eal_thread.c
index d0c077b..5b16302 100644
--- a/lib/librte_eal/bsdapp/eal/eal_thread.c
+++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
@@ -103,55 +103,27 @@ eal_thread_set_affinity(void)
 {
int s;
pthread_t thread;
-
-/*
- * According to the section VERSIONS of the CPU_ALLOC man page:
- *
- * The CPU_ZERO(), CPU_SET(), CPU_CLR(), and CPU_ISSET() macros were added
- * in glibc 2.3.3.
- *
- * CPU_COUNT() first appeared in glibc 2.6.
- *
- * CPU_AND(), CPU_OR(), CPU_XOR(),CPU_EQUAL(),CPU_ALLOC(),
- * CPU_ALLOC_SIZE(), CPU_FREE(), CPU_ZERO_S(),  CPU_SET_S(),  CPU_CLR_S(),
- * CPU_ISSET_S(),  CPU_AND_S(), CPU_OR_S(), CPU_XOR_S(), and CPU_EQUAL_S()
- * first appeared in glibc 2.7.
- */
-#if defined(CPU_ALLOC)
-   size_t size;
-   cpu_set_t *cpusetp;
-
-   cpusetp = CPU_ALLOC(RTE_MAX_LCORE);
-   if (cpusetp == NULL) {
-   RTE_LOG(ERR, EAL, "CPU_ALLOC failed\n");
-   return -1;
-   }
-
-   size = CPU_ALLOC_SIZE(RTE_MAX_LCORE);
-   CPU_ZERO_S(size, cpusetp);
-   CPU_SET_S(rte_lcore_id(), size, cpusetp);
+   unsigned lcore_id = rte_lcore_id();

thread = pthread_self();
-   s = pthread_setaffinity_np(thread, size, cpusetp);
+   s = pthread_setaffinity_np(thread, sizeof(cpuset_t),
+  &lcore_config[lcore_id].cpuset);
if (s != 0) {
RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
-   CPU_FREE(cpusetp);
return -1;
}

-   CPU_FREE(cpusetp);
-#else /* CPU_ALLOC */
-   cpuset_t cpuset;
-   CPU_ZERO( &cpuset );
-   CPU_SET( rte_lcore_id(), &cpuset );
+   /* acquire system unique id  */
+   rte_gettid();
+
+   /* store socket_id in TLS for quick access */
+   RTE_PER_LCORE(_socket_id) =
+   eal_cpuset_socket_id(&lcore_config[lcore_id].cpuset);
+
+   CPU_COPY(&lcore_config[lcore_id].cpuset, &RTE_PER_LCORE(_cpuset));
+
+   lcore_config[lcore_id].socket_id = RTE_PER_LCORE(_socket_id);

-   thread = pthread_self();
-   s = pthread_setaffinity_np(thread, sizeof( cpuset ), &cpuset);
-   if (s != 0) {
-   RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
-   return -1;
-   }
-#endif
return 0;
 }

@@ -174,6 +146,7 @@ eal_thread_loop(__attribute__((unused)) void *arg)
unsigned lcore_id;
pthread_t thread_id;
int m2s, s2m;
+   char cpuset[CPU_STR_LEN];

thread_id = pthread_self();

@@ -185,9 +158,6 @@ eal_thread_loop(__attribute__((unused)) void *arg)
if (lcore_id == RTE_MAX_LCORE)
rte_panic("cannot retrieve lcore id\n");

-   RTE_LOG(DEBUG, EAL, "Core %u is ready (tid=%p)\n",
-   lcore_id, thread_id);
-
m2s = lcore_config[lcore_id].pipe_master2slave[0];
s2m = lcore_config[lcore_id].pipe_slave2master[1];

@@ -198,6 +168,11

[dpdk-dev] [PATCH v1 06/15] eal: add rte_gettid() to acquire unique system tid

2015-01-22 Thread Cunming Liang

rte_gettid() wraps the thread-id system calls of Linux (gettid()) and FreeBSD (thr_self()).
It provides a persistent unique thread id for the calling thread.
It saves the unique id in TLS on first use.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal_thread.c   |  9 +
 lib/librte_eal/common/include/rte_eal.h  | 27 +++
 lib/librte_eal/linuxapp/eal/eal_thread.c |  7 +++
 3 files changed, 43 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c 
b/lib/librte_eal/bsdapp/eal/eal_thread.c
index 10220c7..d0c077b 100644
--- a/lib/librte_eal/bsdapp/eal/eal_thread.c
+++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -233,3 +234,11 @@ eal_thread_loop(__attribute__((unused)) void *arg)
/* pthread_exit(NULL); */
/* return NULL; */
 }
+
+/* require calling thread tid by gettid() */
+int rte_sys_gettid(void)
+{
+   long lwpid;
+   thr_self(&lwpid);
+   return (int)lwpid;
+}
diff --git a/lib/librte_eal/common/include/rte_eal.h 
b/lib/librte_eal/common/include/rte_eal.h
index f4ecd2e..8ccdd65 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -41,6 +41,9 @@
  */

 #include 
+#include 
+
+#include 

 #ifdef __cplusplus
 extern "C" {
@@ -262,6 +265,30 @@ rte_set_application_usage_hook( rte_usage_hook_t 
usage_func );
  */
 int rte_eal_has_hugepages(void);

+/**
+ * A wrap API for syscall gettid.
+ *
+ * @return
+ *   On success, returns the thread ID of calling process.
+ *   It always successful.
+ */
+int rte_sys_gettid(void);
+
+/**
+ * Get system unique thread id.
+ *
+ * @return
+ *   On success, returns the thread ID of calling process.
+ *   It always successful.
+ */
+static inline int rte_gettid(void)
+{
+   static RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
+   if (RTE_PER_LCORE(_thread_id) == -1)
+   RTE_PER_LCORE(_thread_id) = rte_sys_gettid();
+   return RTE_PER_LCORE(_thread_id);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c 
b/lib/librte_eal/linuxapp/eal/eal_thread.c
index 748a83a..ed20c93 100644
--- a/lib/librte_eal/linuxapp/eal/eal_thread.c
+++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -233,3 +234,9 @@ eal_thread_loop(__attribute__((unused)) void *arg)
/* pthread_exit(NULL); */
/* return NULL; */
 }
+
+/* require calling thread tid by gettid() */
+int rte_sys_gettid(void)
+{
+   return (int)syscall(SYS_gettid);
+}
-- 
1.8.1.4

[dpdk-dev] [PATCH v1 05/15] eal: add eal_common_thread.c for common thread API

2015-01-22 Thread Cunming Liang

The API works for both EAL threads and non-EAL threads.
When rte_thread_set_affinity is called, the *_socket_id* and
*_cpuset* of the calling thread are updated if the thread
successfully sets its CPU affinity.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/Makefile|   1 +
 lib/librte_eal/common/eal_common_thread.c | 142 ++
 lib/librte_eal/linuxapp/eal/Makefile  |   2 +
 3 files changed, 145 insertions(+)
 create mode 100644 lib/librte_eal/common/eal_common_thread.c

diff --git a/lib/librte_eal/bsdapp/eal/Makefile 
b/lib/librte_eal/bsdapp/eal/Makefile
index d434882..78406be 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -73,6 +73,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_hexdump.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_devargs.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_dev.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_options.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_thread.c

 CFLAGS_eal.o := -D_GNU_SOURCE
 #CFLAGS_eal_thread.o := -D_GNU_SOURCE
diff --git a/lib/librte_eal/common/eal_common_thread.c 
b/lib/librte_eal/common/eal_common_thread.c
new file mode 100644
index 000..d996690
--- /dev/null
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -0,0 +1,142 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "eal_thread.h"
+
+int
+rte_thread_set_affinity(rte_cpuset_t *cpusetp)
+{
+   int s;
+   unsigned lcore_id;
+   pthread_t tid;
+
+   if (!cpusetp)
+   return -1;
+
+   lcore_id = rte_lcore_id();
+   if (lcore_id != (unsigned)LCORE_ID_ANY) {
+   /* EAL thread */
+   tid = lcore_config[lcore_id].thread_id;
+
+   s = pthread_setaffinity_np(tid, sizeof(rte_cpuset_t), cpusetp);
+   if (s != 0) {
+   RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
+   return -1;
+   }
+
+   /* store socket_id in TLS for quick access */
+   RTE_PER_LCORE(_socket_id) =
+   eal_cpuset_socket_id(cpusetp);
+
+   /* store cpuset in TLS for quick access */
+   rte_memcpy(&RTE_PER_LCORE(_cpuset), cpusetp,
+  sizeof(rte_cpuset_t));
+
+   /* update lcore_config */
+   lcore_config[lcore_id].socket_id = RTE_PER_LCORE(_socket_id);
+   rte_memcpy(&lcore_config[lcore_id].cpuset, cpusetp,
+  sizeof(rte_cpuset_t));
+   } else {
+   /* none EAL thread */
+   tid = pthread_self();
+
+   s = pthread_setaffinity_np(tid, sizeof(rte_cpuset_t), cpusetp);
+   if (s != 0) {
+   RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
+   return -1;
+   }
+
+   /* store cpuset in TLS for quick access */
+   rte_memcpy(&RTE_PER_LCORE(_cpuset), cpusetp,
+  sizeof(rte_cpuset_t));
+
+   /* store socket_id in TLS for quick access */
+   RTE_PER_LCORE(_socket_id) =
+

[dpdk-dev] [PATCH v1 04/15] eal: new TLS definition and API declaration

2015-01-22 Thread Cunming Liang

1. add two TLS variables, *_socket_id* and *_cpuset*
2. add two external APIs, rte_thread_set/get_affinity
3. add one internal API, eal_thread_dump_affinity

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal_thread.c|  2 ++
 lib/librte_eal/common/eal_thread.h| 14 ++
 lib/librte_eal/common/include/rte_lcore.h | 29 +++--
 lib/librte_eal/linuxapp/eal/eal_thread.c  |  2 ++
 4 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c 
b/lib/librte_eal/bsdapp/eal/eal_thread.c
index ab05368..10220c7 100644
--- a/lib/librte_eal/bsdapp/eal/eal_thread.c
+++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
@@ -56,6 +56,8 @@
 #include "eal_thread.h"

 RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
+RTE_DEFINE_PER_LCORE(unsigned, _socket_id);
+RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);

 /*
  * Send a message to a slave lcore identified by slave_id to call a
diff --git a/lib/librte_eal/common/eal_thread.h 
b/lib/librte_eal/common/eal_thread.h
index a25ee86..28edf51 100644
--- a/lib/librte_eal/common/eal_thread.h
+++ b/lib/librte_eal/common/eal_thread.h
@@ -102,4 +102,18 @@ eal_cpuset_socket_id(rte_cpuset_t *cpusetp)
return socket_id;
 }

+/**
+ * Dump the current pthread cpuset.
+ * This function is private to EAL.
+ *
+ * @param str
+ *   The string buffer the cpuset will dump to.
+ * @param size
+ *   The string buffer size.
+ */
+#define CPU_STR_LEN256
+void
+eal_thread_dump_affinity(char str[], unsigned size);
+
+
 #endif /* EAL_THREAD_H */
diff --git a/lib/librte_eal/common/include/rte_lcore.h 
b/lib/librte_eal/common/include/rte_lcore.h
index 4c7d6bb..facdbdc 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 

 #ifdef __cplusplus
 extern "C" {
@@ -80,7 +81,9 @@ struct lcore_config {
  */
 extern struct lcore_config lcore_config[RTE_MAX_LCORE];

-RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per core "core id". */
+RTE_DECLARE_PER_LCORE(unsigned, _lcore_id);  /**< Per thread "lcore id". */
+RTE_DECLARE_PER_LCORE(unsigned, _socket_id); /**< Per thread "socket id". */
+RTE_DECLARE_PER_LCORE(rte_cpuset_t, _cpuset); /**< Per thread "cpuset". */

 /**
  * Return the ID of the execution unit we are running on.
@@ -146,7 +149,7 @@ rte_lcore_index(int lcore_id)
 static inline unsigned
 rte_socket_id(void)
 {
-   return lcore_config[rte_lcore_id()].socket_id;
+   return RTE_PER_LCORE(_socket_id);
 }

 /**
@@ -229,6 +232,28 @@ rte_get_next_lcore(unsigned i, int skip_master, int wrap)
 i

[dpdk-dev] [PATCH v1 03/15] eal: add support parsing socket_id from cpuset

2015-01-22 Thread Cunming Liang

It returns the socket_id if all cpus in the cpuset belong
to the same NUMA node; otherwise it returns SOCKET_ID_ANY.
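
The rule described above can be sketched in a few lines of C. This is only an
illustration under stated assumptions, not the code from the patch: a 64-bit
mask stands in for rte_cpuset_t, and the cpu_to_socket table and every
sketch_/SKETCH_ name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SOCKET_ID_ANY (-1) /* stand-in for DPDK's SOCKET_ID_ANY */
#define SKETCH_MAX_CPUS 64        /* one bit per cpu in a uint64_t */

/*
 * Return the NUMA socket shared by every cpu set in 'mask', or
 * SKETCH_SOCKET_ID_ANY when the set is empty or spans several sockets.
 * 'cpu_to_socket' is a hypothetical lookup table indexed by cpu id.
 */
static int
sketch_cpuset_socket_id(uint64_t mask, const int *cpu_to_socket,
			unsigned ncpus)
{
	int socket_id = SKETCH_SOCKET_ID_ANY;
	unsigned cpu;

	assert(ncpus <= SKETCH_MAX_CPUS);

	for (cpu = 0; cpu < ncpus; cpu++) {
		if (!(mask & (UINT64_C(1) << cpu)))
			continue; /* cpu is not part of the set */
		if (socket_id == SKETCH_SOCKET_ID_ANY)
			socket_id = cpu_to_socket[cpu]; /* first member */
		else if (socket_id != cpu_to_socket[cpu])
			return SKETCH_SOCKET_ID_ANY; /* mixed sockets */
	}
	return socket_id;
}
```

With a table that places cpus 0-1 on socket 0 and cpus 2-3 on socket 1, the
mask 0x3 resolves to socket 0, while 0x6 spans both sockets and falls back to
the "any" value.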

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal_lcore.c   |  7 +
 lib/librte_eal/common/eal_thread.h  | 52 +
 lib/librte_eal/linuxapp/eal/eal_lcore.c |  7 +
 3 files changed, 66 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_lcore.c 
b/lib/librte_eal/bsdapp/eal/eal_lcore.c
index 72f8ac2..162fb4f 100644
--- a/lib/librte_eal/bsdapp/eal/eal_lcore.c
+++ b/lib/librte_eal/bsdapp/eal/eal_lcore.c
@@ -41,6 +41,7 @@
 #include 

 #include "eal_private.h"
+#include "eal_thread.h"

 /* No topology information available on FreeBSD including NUMA info */
 #define cpu_core_id(X) 0
@@ -112,3 +113,9 @@ rte_eal_cpu_init(void)

return 0;
 }
+
+unsigned
+eal_cpu_socket_id(__rte_unused unsigned cpu_id)
+{
+   return cpu_socket_id(cpu_id);
+}
diff --git a/lib/librte_eal/common/eal_thread.h 
b/lib/librte_eal/common/eal_thread.h
index b53b84d..a25ee86 100644
--- a/lib/librte_eal/common/eal_thread.h
+++ b/lib/librte_eal/common/eal_thread.h
@@ -34,6 +34,10 @@
 #ifndef EAL_THREAD_H
 #define EAL_THREAD_H

+#include 
+
+#include 
+
 /**
  * basic loop of thread, called for each thread by eal_init().
  *
@@ -50,4 +54,52 @@ __attribute__((noreturn)) void *eal_thread_loop(void *arg);
  */
 void eal_thread_init_master(unsigned lcore_id);

+/**
+ * Get the NUMA socket id from cpu id.
+ * This function is private to EAL.
+ *
+ * @param cpu_id
+ *   The logical process id.
+ * @return
+ *   socket_id or SOCKET_ID_ANY
+ */
+unsigned eal_cpu_socket_id(unsigned cpu_id);
+
+/**
+ * Get the NUMA socket id from cpuset.
+ * This function is private to EAL.
+ *
+ * @param cpusetp
+ *   The point to a valid cpu set.
+ * @return
+ *   socket_id or SOCKET_ID_ANY
+ */
+static inline int
+eal_cpuset_socket_id(rte_cpuset_t *cpusetp)
+{
+   unsigned cpu = 0;
+   int socket_id = SOCKET_ID_ANY;
+   int sid;
+
+   if (cpusetp == NULL)
+   return SOCKET_ID_ANY;
+
+   do {
+   if (!CPU_ISSET(cpu, cpusetp))
+   continue;
+
+   if (socket_id == SOCKET_ID_ANY)
+   socket_id = eal_cpu_socket_id(cpu);
+
+   sid = eal_cpu_socket_id(cpu);
+   if (socket_id != sid) {
+   socket_id = SOCKET_ID_ANY;
+   break;
+   }
+
+   } while (++cpu < RTE_MAX_LCORE);
+
+   return socket_id;
+}
+
 #endif /* EAL_THREAD_H */
diff --git a/lib/librte_eal/linuxapp/eal/eal_lcore.c 
b/lib/librte_eal/linuxapp/eal/eal_lcore.c
index 29615f8..922af6d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_lcore.c
+++ b/lib/librte_eal/linuxapp/eal/eal_lcore.c
@@ -45,6 +45,7 @@

 #include "eal_private.h"
 #include "eal_filesystem.h"
+#include "eal_thread.h"

 #define SYS_CPU_DIR "/sys/devices/system/cpu/cpu%u"
 #define CORE_ID_FILE "topology/core_id"
@@ -197,3 +198,9 @@ rte_eal_cpu_init(void)

return 0;
 }
+
+unsigned
+eal_cpu_socket_id(unsigned cpu_id)
+{
+   return cpu_socket_id(cpu_id);
+}
-- 
1.8.1.4

[dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for cpu assignment

2015-01-22 Thread Cunming Liang

It adds a new EAL long option '--lcores' for EAL thread cpuset assignment.

The format pattern:
--lcores='lcores[@cpus]<,lcores[@cpus]>'
lcores and cpus can each be a single number or a group.
'(' and ')' are required around a group.
If '@cpus' is not supplied, cpus takes the same value as lcores.

e.g. '1,2@(5-7),(3-5)@(0,2),(0,6)' means starting 7 EAL threads as below
  lcore 0 runs on cpuset 0x41 (cpu 0,6)
  lcore 1 runs on cpuset 0x2 (cpu 1)
  lcore 2 runs on cpuset 0xe0 (cpu 5,6,7)
  lcores 3,4,5 run on cpuset 0x5 (cpu 0,2)
  lcore 6 runs on cpuset 0x41 (cpu 0,6)
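
The hexadecimal cpuset values in the example are plain bitmasks with one bit
per cpu id. The sketch below shows how such a mask is derived. It is an
illustration only: the helper name is hypothetical and a uint64_t stands in
for the real cpuset type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a bitmask with one bit per cpu id; cpu ids must be below 64. */
static uint64_t
sketch_cpus_to_mask(const unsigned *cpus, size_t n)
{
	uint64_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(cpus[i] < 64);
		mask |= UINT64_C(1) << cpus[i]; /* set the bit for this cpu */
	}
	return mask;
}
```

For instance, the cpu list {0, 6} gives 0x41 and {5, 6, 7} gives 0xe0,
matching the values listed for lcores 0 and 2.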

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/common/eal_common_launch.c  |   1 -
 lib/librte_eal/common/eal_common_options.c | 262 -
 lib/librte_eal/common/eal_options.h|   2 +
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 4 files changed, 261 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_launch.c 
b/lib/librte_eal/common/eal_common_launch.c
index 599f83b..2d732b1 100644
--- a/lib/librte_eal/common/eal_common_launch.c
+++ b/lib/librte_eal/common/eal_common_launch.c
@@ -117,4 +117,3 @@ rte_eal_mp_wait_lcore(void)
rte_eal_wait_lcore(lcore_id);
}
 }
-
diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index e2810ab..fc47588 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "eal_internal_cfg.h"
 #include "eal_options.h"
@@ -85,6 +86,7 @@ eal_long_options[] = {
{OPT_XEN_DOM0, 0, 0, OPT_XEN_DOM0_NUM},
{OPT_CREATE_UIO_DEV, 1, NULL, OPT_CREATE_UIO_DEV_NUM},
{OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
+   {OPT_LCORES, 1, 0, OPT_LCORES_NUM},
{0, 0, 0, 0}
 };

@@ -255,9 +257,11 @@ eal_parse_corelist(const char *corelist)
if (min == RTE_MAX_LCORE)
min = idx;
for (idx = min; idx <= max; idx++) {
-   cfg->lcore_role[idx] = ROLE_RTE;
-   lcore_config[idx].core_index = count;
-   count++;
+   if (cfg->lcore_role[idx] != ROLE_RTE) {
+   cfg->lcore_role[idx] = ROLE_RTE;
+   lcore_config[idx].core_index = count;
+   count++;
+   }
}
min = RTE_MAX_LCORE;
} else
@@ -289,6 +293,241 @@ eal_parse_master_lcore(const char *arg)
return 0;
 }

+/*
+ * Parse elem, the elem could be single number or '(' ')' group
+ * Within group elem, '-' used for a range seperator;
+ *',' used for a single number.
+ */
+static int
+eal_parse_set(const char *input, uint16_t set[], unsigned num)
+{
+   unsigned idx;
+   const char *str = input;
+   char *end = NULL;
+   unsigned min, max;
+
+   memset(set, 0, num * sizeof(uint16_t));
+
+   while (isblank(*str))
+   str++;
+
+   /* only digit or left bracket is qulify for start point */
+   if ((!isdigit(*str) && *str != '(') || *str == '\0')
+   return -1;
+
+   /* process single number */
+   if (*str != '(') {
+   errno = 0;
+   idx = strtoul(str, &end, 10);
+   if (errno || end == NULL || idx >= num)
+   return -1;
+   else {
+   while (isblank(*end))
+   end++;
+
+   if (*end != ',' && *end != '\0' &&
+   *end != '@')
+   return -1;
+
+   set[idx] = 1;
+   return end - input;
+   }
+   }
+
+   /* process set within bracket */
+   str++;
+   while (isblank(*str))
+   str++;
+   if (*str == '\0')
+   return -1;
+
+   min = RTE_MAX_LCORE;
+   do {
+
+   /* go ahead to the first digit */
+   while (isblank(*str))
+   str++;
+   if (!isdigit(*str))
+   return -1;
+
+   /* get the digit value */
+   errno = 0;
+   idx = strtoul(str, &end, 10);
+   if (errno || end == NULL || idx >= num)
+   return -1;
+
+   /* go ahead to separator '-',',' and ')' */
+   while (isblank(*end))
+   end++;
+   if (*end == '-') {
+   if (min == RTE_MAX_LCORE)
+   min = idx;
+   else /* avoid continuous '-' */
+   return -1;
+   } else if ((*end == ',') || (*end == ')')) {
+

[dpdk-dev] [PATCH v1 01/15] eal: add cpuset into per EAL thread lcore_config

2015-01-22 Thread Cunming Liang

The patch adds 'cpuset' to the per-lcore configuration 'lcore_config[]',
as an lcore is no longer always pinned 1:1 to a physical cpu.
An lcore now stands for an EAL thread rather than a logical cpu.

It doesn't change the default behavior of 1:1 mapping, but allows
an EAL thread to have affinity to multiple cpus.
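
The default 1:1 mapping can be pictured with the following sketch. It is a
simplified illustration, not the patch itself: the struct and function names
are hypothetical, a uint64_t replaces rte_cpuset_t, and detection is reduced
to "lcore_id < ncpus" as in the FreeBSD variant of the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_LCORE 8 /* small value for illustration only */

struct sketch_lcore_config {
	int detected;    /* non-zero if the matching cpu exists */
	uint64_t cpuset; /* bit i set: the thread may run on cpu i */
};

/* Default setup: lcore i is pinned to cpu i when that cpu was detected. */
static void
sketch_cpu_init(struct sketch_lcore_config *cfg, unsigned ncpus)
{
	unsigned lcore_id;

	assert(cfg != NULL);

	for (lcore_id = 0; lcore_id < SKETCH_MAX_LCORE; lcore_id++) {
		cfg[lcore_id].cpuset = 0; /* start from an empty set */
		cfg[lcore_id].detected = (lcore_id < ncpus);
		if (!cfg[lcore_id].detected)
			continue; /* undetected lcores keep an empty set */
		cfg[lcore_id].cpuset = UINT64_C(1) << lcore_id;
	}
}
```

A later option parser could then overwrite the cpuset field of an lcore with
a wider mask without touching the remaining fields.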

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal_lcore.c | 7 +++
 lib/librte_eal/bsdapp/eal/eal_memory.c| 2 ++
 lib/librte_eal/common/include/rte_lcore.h | 8 
 lib/librte_eal/linuxapp/eal/Makefile  | 1 +
 lib/librte_eal/linuxapp/eal/eal_lcore.c   | 8 
 5 files changed, 26 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_lcore.c 
b/lib/librte_eal/bsdapp/eal/eal_lcore.c
index 662f024..72f8ac2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_lcore.c
+++ b/lib/librte_eal/bsdapp/eal/eal_lcore.c
@@ -76,11 +76,18 @@ rte_eal_cpu_init(void)
 * ones and enable them by default.
 */
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+   /* init cpuset for per lcore config */
+   CPU_ZERO(&lcore_config[lcore_id].cpuset);
+
lcore_config[lcore_id].detected = (lcore_id < ncpus);
if (lcore_config[lcore_id].detected == 0) {
config->lcore_role[lcore_id] = ROLE_OFF;
continue;
}
+
+   /* By default, lcore 1:1 map to cpu id */
+   CPU_SET(lcore_id, &lcore_config[lcore_id].cpuset);
+
/* By default, each detected core is enabled */
config->lcore_role[lcore_id] = ROLE_RTE;
lcore_config[lcore_id].core_id = cpu_core_id(lcore_id);
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c 
b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ee87d..a34d500 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -45,6 +45,8 @@
 #include "eal_internal_cfg.h"
 #include "eal_filesystem.h"

+/* avoid re-defined against with freebsd header */
+#undef PAGE_SIZE
 #define PAGE_SIZE (sysconf(_SC_PAGESIZE))

 /*
diff --git a/lib/librte_eal/common/include/rte_lcore.h 
b/lib/librte_eal/common/include/rte_lcore.h
index 49b2c03..4c7d6bb 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -50,6 +50,13 @@ extern "C" {

 #define LCORE_ID_ANY -1/**< Any lcore. */

+#if defined(__linux__)
+   typedef cpu_set_t rte_cpuset_t;
+#elif defined(__FreeBSD__)
+#include 
+   typedef cpuset_t rte_cpuset_t;
+#endif
+
 /**
  * Structure storing internal configuration (per-lcore)
  */
@@ -65,6 +72,7 @@ struct lcore_config {
unsigned socket_id;/**< physical socket id for this lcore */
unsigned core_id;  /**< core number on socket for this lcore */
int core_index;/**< relative index, starting from 0 */
+   rte_cpuset_t cpuset;   /**< cpu set which the lcore affinity to */
 };

 /**
diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index 72ecf3a..0e9c447 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -87,6 +87,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_dev.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_options.c

 CFLAGS_eal.o := -D_GNU_SOURCE
+CFLAGS_eal_lcore.o := -D_GNU_SOURCE
 CFLAGS_eal_thread.o := -D_GNU_SOURCE
 CFLAGS_eal_log.o := -D_GNU_SOURCE
 CFLAGS_eal_common_log.o := -D_GNU_SOURCE
diff --git a/lib/librte_eal/linuxapp/eal/eal_lcore.c 
b/lib/librte_eal/linuxapp/eal/eal_lcore.c
index c67e0e6..29615f8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_lcore.c
+++ b/lib/librte_eal/linuxapp/eal/eal_lcore.c
@@ -158,11 +158,19 @@ rte_eal_cpu_init(void)
 * ones and enable them by default.
 */
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+   /* init cpuset for per lcore config */
+   CPU_ZERO(&lcore_config[lcore_id].cpuset);
+
+   /* in 1:1 mapping, record related cpu detected state */
lcore_config[lcore_id].detected = cpu_detected(lcore_id);
if (lcore_config[lcore_id].detected == 0) {
config->lcore_role[lcore_id] = ROLE_OFF;
continue;
}
+
+   /* By default, lcore 1:1 map to cpu id */
+   CPU_SET(lcore_id, &lcore_config[lcore_id].cpuset);
+
/* By default, each detected core is enabled */
config->lcore_role[lcore_id] = ROLE_RTE;
lcore_config[lcore_id].core_id = cpu_core_id(lcore_id);
-- 
1.8.1.4

[dpdk-dev] [PATCH v1 00/15] support multi-pthread per core

2015-01-22 Thread Cunming Liang

The patch series contains enhancements to EAL and fixes to libraries that
allow running multiple pthreads (either EAL or non-EAL threads) per physical core.
The two major changes are listed below:
- Extend the core affinity of each EAL thread to 1:n.
  Each lcore stands for an EAL thread rather than a logical core.
  The change adds a new EAL option to allow static lcore to cpuset assignment.
  An lcore (EAL thread) then has affinity to a cpuset; the original 1:1 mapping
  is the special case.
- Fix the libraries to allow running on any non-EAL thread.
  It closes the gaps in running the libraries in a non-EAL thread (dynamically
  created by the user).
  Each fixed library handles the case of rte_lcore_id() >= RTE_MAX_LCORE.
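
One way to picture the rte_lcore_id() >= RTE_MAX_LCORE case is a per-lcore
array with a shared fallback slot. The sketch below is a hypothetical pattern,
not taken from any patch in the series: all names are invented, and the
shared path is not thread-safe as written.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_LCORE 128 /* stand-in for RTE_MAX_LCORE */

struct sketch_stats {
	unsigned long per_lcore[SKETCH_MAX_LCORE]; /* one slot per EAL thread */
	unsigned long shared; /* used by threads without a valid lcore id */
};

/* Count one event, using the per-lcore slot only when the id is in range. */
static void
sketch_count(struct sketch_stats *st, unsigned lcore_id)
{
	assert(st != NULL);

	if (lcore_id < SKETCH_MAX_LCORE)
		st->per_lcore[lcore_id]++;
	else
		st->shared++; /* real code would need an atomic or a lock */
}
```

A non-EAL thread whose lcore id is (unsigned)-1 takes the shared path, so it
never indexes past the end of the per-lcore array.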

Thanks a million for the comments from Konstantin, Bruce, Mirek and Stephen in
the RFC review.

Cunming Liang (15):
  eal: add cpuset into per EAL thread lcore_config
  eal: new eal option '--lcores' for cpu assignment
  eal: add support parsing socket_id from cpuset
  eal: new TLS definition and API declaration
  eal: add eal_common_thread.c for common thread API
  eal: add rte_gettid() to acquire unique system tid
  eal: apply affinity of EAL thread by assigned cpuset
  enic: fix re-define freebsd compile complain
  malloc: fix the issue of SOCKET_ID_ANY
  log: fix the gap to support non-EAL thread
  eal: set _lcore_id and _socket_id to (-1) by default
  eal: fix recursive spinlock in non-EAL thraed
  mempool: add support to non-EAL thread
  ring: add support to non-EAL thread
  timer: add support to non-EAL thread

 lib/librte_eal/bsdapp/eal/Makefile |   1 +
 lib/librte_eal/bsdapp/eal/eal.c|  13 +-
 lib/librte_eal/bsdapp/eal/eal_lcore.c  |  14 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c |   2 +
 lib/librte_eal/bsdapp/eal/eal_thread.c |  76 +++---
 lib/librte_eal/common/eal_common_launch.c  |   1 -
 lib/librte_eal/common/eal_common_log.c |  17 +-
 lib/librte_eal/common/eal_common_options.c | 262 -
 lib/librte_eal/common/eal_common_thread.c  | 142 +++
 lib/librte_eal/common/eal_options.h|   2 +
 lib/librte_eal/common/eal_thread.h |  66 ++
 .../common/include/generic/rte_spinlock.h  |   4 +-
 lib/librte_eal/common/include/rte_eal.h|  27 +++
 lib/librte_eal/common/include/rte_lcore.h  |  37 ++-
 lib/librte_eal/common/include/rte_log.h|   5 +
 lib/librte_eal/linuxapp/eal/Makefile   |   4 +
 lib/librte_eal/linuxapp/eal/eal.c  |   7 +-
 lib/librte_eal/linuxapp/eal/eal_lcore.c|  15 ++
 lib/librte_eal/linuxapp/eal/eal_thread.c   |  78 +++---
 lib/librte_malloc/malloc_heap.h|   7 +-
 lib/librte_mempool/rte_mempool.h   |  18 +-
 lib/librte_pmd_enic/enic.h |   1 +
 lib/librte_pmd_enic/enic_compat.h  |   1 +
 lib/librte_ring/rte_ring.h |  10 +-
 lib/librte_timer/rte_timer.c   |  40 +++-
 lib/librte_timer/rte_timer.h   |   2 +-
 26 files changed, 721 insertions(+), 131 deletions(-)
 create mode 100644 lib/librte_eal/common/eal_common_thread.c

-- 
1.8.1.4

[dpdk-dev] [PATCH v2 6/6] doc: commands changed in testpmd_funcs for 2tuple amd 5tuple filter

2015-01-22 Thread Jingjing Wu

document of new commands:
 - 2tuple_filter (port_id) (add|del)
   dst_port (dst_port_value) protocol (protocol_value)
   mask (mask_value) tcp_flags (tcp_flags_value)
   priority (prio_value) queue (queue_id)
 - 5tuple_filter (port_id) (add|del)
   dst_ip (dst_address) src_ip (src_address)
   dst_port (dst_port_value) src_port (src_port_value)
   protocol (protocol_value)
   mask (mask_value) tcp_flags (tcp_flags_value)
   priority (prio_value) queue (queue_id)

Signed-off-by: Jingjing Wu 
---
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 99 ++---
 1 file changed, 21 insertions(+), 78 deletions(-)

diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst 
b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index be935c2..56d7c82 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -1448,76 +1448,48 @@ Example:
 priority: disable, 0
 queue: 3

-add_2tuple_filter
+2tuple_filter
 ~

-Add a 2-tuple filter,
+Add or delete a 2-tuple filter,
 which identify packets by specific protocol and destination TCP/UDP port
 and forwards packets into one of the receive queues.

-add_2tuple_filter (port_id) protocol (pro_value) (pro_mask) dst_port 
(port_value) (port_mask)
-flags (flg_value) priority (prio_value) queue (queue_id) index (idx)
+2tuple_filter (port_id) (add|del) dst_port (dst_port_value) protocol 
(protocol_value)
+mask (mask_value) tcp_flags (tcp_flags_value) priority (prio_value) queue 
(queue_id)

 The available information parameters are:

 *   port_id: the port which the 2-tuple filter assigned on.

-*   pro_value: IP L4 protocol
+*   dst_port_value: destination port in L4.

-*   pro_mask: protocol participates in the match or not, 1 means participate
+*   protocol_value: IP L4 protocol.

-*   port_value: destination port in L4.
+*   mask_value: participates in the match or not by bit for field above, 1b 
means participate.

-*   port_mask: destination port participates in the match or not, 1 means 
participate.
+*   tcp_flags_value: TCP control bits. The non-zero value is invalid, when the 
pro_value is not set to 0x06 (TCP).

-*   flg_value: TCP control bits. The non-zero value is invalid, when the 
pro_value is not set to 0x06 (TCP).
+*   prio_value: priority of this filter.

-*   prio_value: the priority of this filter.
-
-*   queue_id: The receive queue associated with this 2-tuple filter
+*   queue_id: The receive queue associated with this 2-tuple filter.

-*   index: the index of this 2-tuple filter
-
-Example:
+Example, to add/remove an 2tuple filter rule:

 .. code-block:: console

-testpmd> add_2tuple_filter 0 protocol 0x06 1 dst_port 32 1 flags 0x02 
priority 3 queue 3 index 0
-
-remove_2tuple_filter
-
-
-Remove a 2-tuple filter
-
-remove_2tuple_filter (port_id) index (idx)
+testpmd> 2tuple_filter 0 add dst_port 32 protocol 0x06 mask 0x03 tcp_flags 
0x02 priority 3 queue 3
+testpmd> 2tuple_filter 0 del dst_port 32 protocol 0x06 mask 0x03 tcp_flags 
0x02 priority 3 queue 3

-get_2tuple_filter
+5tuple_filter
 ~

-Get and display a 2-tuple filter
-
-get_2tuple_filter (port_id) index (idx)
-
-Example:
-
-.. code-block:: console
-
-testpmd> get_2tuple_filter 0 index 0
-
-filter[0]:
-Destination Port: 0x0020 mask: 1
-protocol: 0x06 mask:1 tcp_flags: 0x02
-priority: 3   queue: 3
-
-add_5tuple_filter
-~
-
-Add a 5-tuple filter,
+Add or delete a 5-tuple filter,
 which consists of a 5-tuple (protocol, source and destination IP addresses, 
source and destination TCP/UDP/SCTP port)
 and routes packets into one of the receive queues.

-add_5tuple_filter (port_id) dst_ip (dst_address) src_ip (src_address) dst_port 
(dst_port_value) src_port (src_port_value)
-protocol (protocol_value) mask (mask_value) flags (flags_value) priority 
(prio_value) queue (queue_id) index (idx)
+5tuple_filter (port_id) (add|del) dst_ip (dst_address) src_ip (src_address) 
dst_port (dst_port_value) src_port (src_port_value)
+protocol (protocol_value) mask (mask_value) tcp_flags (tcp_flags_value) 
priority (prio_value) queue (queue_id)

 The available information parameters are:

@@ -1535,47 +1507,18 @@ The available information parameters are:

 *   mask_value: participates in the match or not by bit for field above, 1b 
means participate

-*   flags_value: TCP control bits. The non-zero value is invalid, when the 
protocol_value is not set to 0x06 (TCP).
+*   tcp_flags_value: TCP control bits. The non-zero value is invalid, when the 
protocol_value is not set to 0x06 (TCP).

 *   prio_value: the priority of this filter.

 *   queue_id: The receive queue associated with this 5-tuple filter.

-*   index: the index of this 5-tuple filter
-
-Example:
-
-.. code-block:: console
-
-testpmd> add_5tuple_filter 1 dst_ip 2.2.2.5 src_ip 2.2.2.4 dst_port 64 
src_port 32 protocol 0x06 mask 0x1F flags 0x0 priority 3 queue 3 index 0
-
-remov

[dpdk-dev] [PATCH v2 5/6] ethdev: remove old APIs and structures of 5tuple and 2tuple filters

2015-01-22 Thread Jingjing Wu

Following structures are removed:
 - rte_2tuple_filter
 - rte_5tuple_filter
Following APIs are removed:
 - rte_eth_dev_add_2tuple_filter
 - rte_eth_dev_remove_2tuple_filter
 - rte_eth_dev_get_2tuple_filter
 - rte_eth_dev_add_5tuple_filter
 - rte_eth_dev_remove_5tuple_filter
 - rte_eth_dev_get_5tuple_filter
It also moves the TCP_*_FLAG macros to rte_eth_ctrl.h, and removes the macro
TCP_UGR_FLAG, which duplicates TCP_URG_FLAG.
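
The relocated macros are single-bit values that can be OR-ed together. The
sketch below copies the macro values from this patch and adds a hypothetical
check function modelled on the "non-zero flags require TCP" test in the
removed APIs. The function name, the SKETCH_PROTO_TCP constant and the extra
range check against TCP_FLAG_ALL are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values as added to rte_eth_ctrl.h by this patch. */
#define TCP_URG_FLAG 0x20
#define TCP_ACK_FLAG 0x10
#define TCP_PSH_FLAG 0x08
#define TCP_RST_FLAG 0x04
#define TCP_SYN_FLAG 0x02
#define TCP_FIN_FLAG 0x01
#define TCP_FLAG_ALL 0x3F

#define SKETCH_PROTO_TCP 0x06 /* IP protocol number of TCP */

/* Return 0 if the flag/protocol combination is usable, -EINVAL otherwise. */
static int
sketch_check_tcp_flags(uint8_t protocol, uint8_t tcp_flags)
{
	/* Assumed extra check: reject bits outside the six defined flags. */
	if (tcp_flags & ~TCP_FLAG_ALL)
		return -EINVAL;
	/* Flags are only meaningful when the L4 protocol is TCP. */
	if (protocol != SKETCH_PROTO_TCP && tcp_flags != 0)
		return -EINVAL;
	return 0;
}
```

For example, TCP_SYN_FLAG | TCP_ACK_FLAG with protocol 0x06 passes, while any
non-zero flag value with protocol 0x11 (UDP) is rejected.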

Signed-off-by: Jingjing Wu 
---
 lib/librte_ether/rte_eth_ctrl.h |   7 ++
 lib/librte_ether/rte_ethdev.c   | 116 
 lib/librte_ether/rte_ethdev.h   | 194 
 3 files changed, 7 insertions(+), 310 deletions(-)

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index 3465c68..e4b9b52 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -139,6 +139,13 @@ struct rte_eth_ethertype_filter {
RTE_NTUPLE_FLAGS_DST_PORT | \
RTE_NTUPLE_FLAGS_PROTO)

+#define TCP_URG_FLAG 0x20
+#define TCP_ACK_FLAG 0x10
+#define TCP_PSH_FLAG 0x08
+#define TCP_RST_FLAG 0x04
+#define TCP_SYN_FLAG 0x02
+#define TCP_FIN_FLAG 0x01
+#define TCP_FLAG_ALL 0x3F

 /**
  * A structure used to define the ntuple filter entry
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ea3a1fb..a2e71e0 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3056,122 +3056,6 @@ rte_eth_dev_get_syn_filter(uint8_t port_id,
 }

 int
-rte_eth_dev_add_2tuple_filter(uint8_t port_id, uint16_t index,
-   struct rte_2tuple_filter *filter, uint16_t rx_queue)
-{
-   struct rte_eth_dev *dev;
-
-   if (port_id >= nb_ports) {
-   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
-   return -ENODEV;
-   }
-   if (filter->protocol != IPPROTO_TCP &&
-   filter->tcp_flags != 0){
-   PMD_DEBUG_TRACE("tcp flags is 0x%x, but the protocol value"
-   " is not TCP\n",
-   filter->tcp_flags);
-   return -EINVAL;
-   }
-
-   dev = &rte_eth_devices[port_id];
-   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->add_2tuple_filter, -ENOTSUP);
-   return (*dev->dev_ops->add_2tuple_filter)(dev, index, filter, rx_queue);
-}
-
-int
-rte_eth_dev_remove_2tuple_filter(uint8_t port_id, uint16_t index)
-{
-   struct rte_eth_dev *dev;
-
-   if (port_id >= nb_ports) {
-   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
-   return -ENODEV;
-   }
-
-   dev = &rte_eth_devices[port_id];
-   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->remove_2tuple_filter, -ENOTSUP);
-   return (*dev->dev_ops->remove_2tuple_filter)(dev, index);
-}
-
-int
-rte_eth_dev_get_2tuple_filter(uint8_t port_id, uint16_t index,
-   struct rte_2tuple_filter *filter, uint16_t *rx_queue)
-{
-   struct rte_eth_dev *dev;
-
-   if (filter == NULL || rx_queue == NULL)
-   return -EINVAL;
-
-   if (port_id >= nb_ports) {
-   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
-   return -ENODEV;
-   }
-
-   dev = &rte_eth_devices[port_id];
-   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_2tuple_filter, -ENOTSUP);
-   return (*dev->dev_ops->get_2tuple_filter)(dev, index, filter, rx_queue);
-}
-
-int
-rte_eth_dev_add_5tuple_filter(uint8_t port_id, uint16_t index,
-   struct rte_5tuple_filter *filter, uint16_t rx_queue)
-{
-   struct rte_eth_dev *dev;
-
-   if (port_id >= nb_ports) {
-   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
-   return -ENODEV;
-   }
-
-   if (filter->protocol != IPPROTO_TCP &&
-   filter->tcp_flags != 0){
-   PMD_DEBUG_TRACE("tcp flags is 0x%x, but the protocol value"
-   " is not TCP\n",
-   filter->tcp_flags);
-   return -EINVAL;
-   }
-
-   dev = &rte_eth_devices[port_id];
-   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->add_5tuple_filter, -ENOTSUP);
-   return (*dev->dev_ops->add_5tuple_filter)(dev, index, filter, rx_queue);
-}
-
-int
-rte_eth_dev_remove_5tuple_filter(uint8_t port_id, uint16_t index)
-{
-   struct rte_eth_dev *dev;
-
-   if (port_id >= nb_ports) {
-   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
-   return -ENODEV;
-   }
-
-   dev = &rte_eth_devices[port_id];
-   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->remove_5tuple_filter, -ENOTSUP);
-   return (*dev->dev_ops->remove_5tuple_filter)(dev, index);
-}
-
-int
-rte_eth_dev_get_5tuple_filter(uint8_t port_id, uint16_t index,
-   struct rte_5tuple_filter *filter, uint16_t *rx_queue)
-{
-   struct rte_eth_dev *dev;
-
-   if (filter == NULL || rx_queue == NULL)
-   return -EINVAL;
-
-   if (port_id >= nb_ports) {
-   PMD_DEBUG_T

[dpdk-dev] [PATCH v2 4/6] testpmd: new commands for ntuple filter

2015-01-22 Thread Jingjing Wu

Following commands of 5tuple and 2tuple filter are removed:
 - add_2tuple_filter (port_id) protocol (pro_value) (pro_mask)
   dst_port (port_value) (port_mask) flags (flg_value) priority (prio
   queue (queue_id) index (idx)
 - remove_2tuple_filter (port_id) index (idx)
 - get_2tuple_filter (port_id) index (idx)
 - add_5tuple_filter (port_id) dst_ip (dst_address) src_ip (src_addres
   dst_port (dst_port_value) src_port (src_port_value) protocol (prot
   mask (mask_value) flags (flags_value) priority (prio_value)"
   queue (queue_id) index (idx)
 - remove_5tuple_filter (port_id) index (idx)
 - get_5tuple_filter (port_id) index (idx)

New commands are added for 5tuple and 2tuple filter by using filter_ctrl API
and new ntuple filter structure:
 - 2tuple_filter (port_id) (add|del)
   dst_port (dst_port_value) protocol (protocol_value)
   mask (mask_value) tcp_flags (tcp_flags_value)
   priority (prio_value) queue (queue_id)
 - 5tuple_filter (port_id) (add|del)
   dst_ip (dst_address) src_ip (src_address)
   dst_port (dst_port_value) src_port (src_port_value)
   protocol (protocol_value)
   mask (mask_value) tcp_flags (tcp_flags_value)
   priority (prio_value) queue (queue_id)

Signed-off-by: Jingjing Wu 
---
 app/test-pmd/cmdline.c | 406 ++---
 app/test-pmd/config.c  |  65 
 2 files changed, 186 insertions(+), 285 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4beb404..790b142 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -659,28 +659,19 @@ static void cmd_help_long_parsed(void *parsed_result,
" (ether_type) (drop|fwd) queue (queue_id)\n"
"Add/Del an ethertype filter.\n\n"

-   "add_2tuple_filter (port_id) protocol (pro_value) 
(pro_mask)"
-   " dst_port (port_value) (port_mask) flags (flg_value) 
priority (prio_value)"
-   " queue (queue_id) index (idx)\n"
-   "add a 2tuple filter.\n\n"
-
-   "remove_2tuple_filter (port_id) index (idx)\n"
-   "remove a 2tuple filter.\n\n"
-
-   "get_2tuple_filter (port_id) index (idx)\n"
-   "get info of a 2tuple filter.\n\n"
-
-   "add_5tuple_filter (port_id) dst_ip (dst_address) 
src_ip (src_address)"
-   " dst_port (dst_port_value) src_port (src_port_value) 
protocol (protocol_value)"
-   " mask (mask_value) flags (flags_value) priority 
(prio_value)"
-   " queue (queue_id) index (idx)\n"
-   "add a 5tuple filter.\n\n"
-
-   "remove_5tuple_filter (port_id) index (idx)\n"
-   "remove a 5tuple filter.\n\n"
-
-   "get_5tuple_filter (port_id) index (idx)\n"
-   "get info of a 5tuple filter.\n\n"
+   "2tuple_filter (port_id) (add|del)"
+   " dst_port (dst_port_value) protocol (protocol_value)"
+   " mask (mask_value) tcp_flags (tcp_flags_value)"
+   " priority (prio_value) queue (queue_id)\n"
+   "Add/Del a 2tuple filter.\n\n"
+
+   "5tuple_filter (port_id) (add|del)"
+   " dst_ip (dst_address) src_ip (src_address)"
+   " dst_port (dst_port_value) src_port (src_port_value)"
+   " protocol (protocol_value)"
+   " mask (mask_value) tcp_flags (tcp_flags_value)"
+   " priority (prio_value) queue (queue_id)\n"
+   "Add/Del a 5tuple filter.\n\n"

"add_syn_filter (port_id) priority (high|low) queue 
(queue_id)"
"add syn filter.\n\n"
@@ -7357,21 +7348,20 @@ cmdline_parse_inst_t cmd_get_syn_filter = {
 /* *** ADD/REMOVE A 2tuple FILTER *** */
 struct cmd_2tuple_filter_result {
cmdline_fixed_string_t filter;
-   uint8_t port_id;
-   cmdline_fixed_string_t protocol;
-   uint8_t protocol_value;
-   uint8_t protocol_mask;
+   uint8_t  port_id;
+   cmdline_fixed_string_t ops;
cmdline_fixed_string_t dst_port;
uint16_t dst_port_value;
-   uint16_t dst_port_mask;
-   cmdline_fixed_string_t flags;
-   uint8_t flags_value;
+   cmdline_fixed_string_t protocol;
+   uint8_t protocol_value;
+   cmdline_fixed_string_t mask;
+   uint8_t  mask_value;
+   cmdline_fixed_string_t tcp_flags;
+   uint8_t tcp_flags_value;
cmdline_fixed_string_t priority;
-   uint8_t priority_value;
+   uint8_t  priority_value;
cmdline_fixed_string_t queue;
-   uint16_t queue_id;
-   cmdline_fixed_string_t index;
-   uint16_t index_value;
+   uint16_t  queue_id;
 };

 static void
@@ -7379,59 +7369

[dpdk-dev] [PATCH v2 3/6] e1000: ntuple filter functions replace old ones for 2tuple and 5tuple filter

2015-01-22 Thread Jingjing Wu

This patch defines new functions dealing with ntuple filters, which
correspond to the 2tuple filter for 82580 and i350 in HW, and to the 5tuple
filter for 82576 in HW.
It removes the old functions which deal with 2tuple and 5tuple filters in the
igb driver.
Ntuple filters are handled through the entry point eth_igb_filter_ctrl.

Signed-off-by: Jingjing Wu 
---
 lib/librte_pmd_e1000/e1000_ethdev.h |  69 ++-
 lib/librte_pmd_e1000/igb_ethdev.c   | 869 
 2 files changed, 647 insertions(+), 291 deletions(-)

diff --git a/lib/librte_pmd_e1000/e1000_ethdev.h 
b/lib/librte_pmd_e1000/e1000_ethdev.h
index d155e77..571a70d 100644
--- a/lib/librte_pmd_e1000/e1000_ethdev.h
+++ b/lib/librte_pmd_e1000/e1000_ethdev.h
@@ -67,14 +67,6 @@

 #define E1000_IMIR_DSTPORT 0x
 #define E1000_IMIR_PRIORITY0xE000
-#define E1000_IMIR_EXT_SIZE_BP 0x1000
-#define E1000_IMIR_EXT_CTRL_UGR0x2000
-#define E1000_IMIR_EXT_CTRL_ACK0x4000
-#define E1000_IMIR_EXT_CTRL_PSH0x8000
-#define E1000_IMIR_EXT_CTRL_RST0x0001
-#define E1000_IMIR_EXT_CTRL_SYN0x0002
-#define E1000_IMIR_EXT_CTRL_FIN0x0004
-#define E1000_IMIR_EXT_CTRL_BP 0x0008
 #define E1000_MAX_TTQF_FILTERS 8
 #define E1000_2TUPLE_MAX_PRI   7

@@ -96,11 +88,6 @@
 #define E1000_MAX_FTQF_FILTERS   8
 #define E1000_FTQF_PROTOCOL_MASK 0x00FF
 #define E1000_FTQF_5TUPLE_MASK_SHIFT 28
-#define E1000_FTQF_PROTOCOL_COMP_MASK0x1000
-#define E1000_FTQF_SOURCE_ADDR_MASK  0x2000
-#define E1000_FTQF_DEST_ADDR_MASK0x4000
-#define E1000_FTQF_SOURCE_PORT_MASK  0x8000
-#define E1000_FTQF_VF_MASK_EN0x8000
 #define E1000_FTQF_QUEUE_MASK0x03ff
 #define E1000_FTQF_QUEUE_SHIFT   16
 #define E1000_FTQF_QUEUE_ENABLE  0x0100
@@ -131,6 +118,56 @@ struct e1000_vf_info {
uint16_t tx_rate;
 };

+TAILQ_HEAD(e1000_5tuple_filter_list, e1000_5tuple_filter);
+TAILQ_HEAD(e1000_2tuple_filter_list, e1000_2tuple_filter);
+
+struct e1000_5tuple_filter_info {
+   uint32_t dst_ip;
+   uint32_t src_ip;
+   uint16_t dst_port;
+   uint16_t src_port;
+   uint8_t proto;   /* l4 protocol. */
+   /* the packet matched above 5tuple and contain any set bit will hit 
this filter. */
+   uint8_t tcp_flags;
+   uint8_t priority;/* seven levels (001b-111b), 111b is highest,
+ used when more than one filter matches. */
+   uint8_t dst_ip_mask:1,   /* if mask is 1b, do not compare dst ip. */
+   src_ip_mask:1,   /* if mask is 1b, do not compare src ip. */
+   dst_port_mask:1, /* if mask is 1b, do not compare dst port. */
+   src_port_mask:1, /* if mask is 1b, do not compare src port. */
+   proto_mask:1;/* if mask is 1b, do not compare protocol. */
+};
+
+struct e1000_2tuple_filter_info {
+   uint16_t dst_port;
+   uint8_t proto;   /* l4 protocol. */
+   /* the packet matched above 2tuple and contain any set bit will hit 
this filter. */
+   uint8_t tcp_flags;
+   uint8_t priority;/* seven levels (001b-111b), 111b is highest,
+ used when more than one filter matches. */
+   uint8_t dst_ip_mask:1,   /* if mask is 1b, do not compare dst ip. */
+   src_ip_mask:1,   /* if mask is 1b, do not compare src ip. */
+   dst_port_mask:1, /* if mask is 1b, do not compare dst port. */
+   src_port_mask:1, /* if mask is 1b, do not compare src port. */
+   proto_mask:1;/* if mask is 1b, do not compare protocol. */
+};
+
+/* 5tuple filter structure */
+struct e1000_5tuple_filter {
+   TAILQ_ENTRY(e1000_5tuple_filter) entries;
+   uint16_t index;   /* the index of 5tuple filter */
+   struct e1000_5tuple_filter_info filter_info;
+   uint16_t queue;   /* rx queue assigned to */
+};
+
+/* 2tuple filter structure */
+struct e1000_2tuple_filter {
+   TAILQ_ENTRY(e1000_2tuple_filter) entries;
+   uint16_t index; /* the index of 2tuple filter */
+   struct e1000_2tuple_filter_info filter_info;
+   uint16_t queue;   /* rx queue assigned to */
+};
+
 /*
  * Structure to store filters' info.
  */
@@ -138,6 +175,12 @@ struct e1000_filter_info {
uint8_t ethertype_mask; /* Bit mask for every used ethertype filter */
/* store used ethertype filters*/
uint16_t ethertype_filters[E1000_MAX_ETQF_FILTERS];
+   /* Bit mask for every used 5tuple filter */
+   uint8_t fivetuple_mask;
+   struct e1000_5tuple_filter_list fivetuple_list;
+   /* Bit mask for every used 2tuple filter */
+   uint8_t twotuple_mask;
+   struct e1000_2tuple_filter_list twotuple_list;
 };

 /*
diff --git a/lib/librte_pmd_e1000/igb_ethdev.c 
b/lib/librte_pmd_e1000/igb_

[dpdk-dev] [PATCH v2 2/6] ixgbe: ntuple filter functions replace old ones for 5tuple filter

2015-01-22 Thread Jingjing Wu

This patch defines new functions for ntuple filters, which correspond to
5tuple filters in HW.
It removes the old functions that deal with 5tuple filters.
Ntuple filters are handled through the ixgbe_dev_filter_ctrl entry point.

Signed-off-by: Jingjing Wu 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 468 ++--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  52 +++-
 2 files changed, 389 insertions(+), 131 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index b58ec45..9f1ad5b 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -231,14 +231,18 @@ static int ixgbe_add_syn_filter(struct rte_eth_dev *dev,
 static int ixgbe_remove_syn_filter(struct rte_eth_dev *dev);
 static int ixgbe_get_syn_filter(struct rte_eth_dev *dev,
struct rte_syn_filter *filter, uint16_t *rx_queue);
-static int ixgbe_add_5tuple_filter(struct rte_eth_dev *dev, uint16_t index,
-   struct rte_5tuple_filter *filter, uint16_t rx_queue);
-static int ixgbe_remove_5tuple_filter(struct rte_eth_dev *dev,
-   uint16_t index);
-static int ixgbe_get_5tuple_filter(struct rte_eth_dev *dev, uint16_t index,
-   struct rte_5tuple_filter *filter, uint16_t *rx_queue);
-
-static int ixgbevf_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu);
+static int ixgbe_add_5tuple_filter(struct rte_eth_dev *dev,
+   struct ixgbe_5tuple_filter *filter);
+static void ixgbe_remove_5tuple_filter(struct rte_eth_dev *dev,
+   struct ixgbe_5tuple_filter *filter);
+static int ixgbe_add_del_ntuple_filter(struct rte_eth_dev *dev,
+   struct rte_eth_ntuple_filter *filter,
+   bool add);
+static int ixgbe_ntuple_filter_handle(struct rte_eth_dev *dev,
+   enum rte_filter_op filter_op,
+   void *arg);
+static int ixgbe_get_ntuple_filter(struct rte_eth_dev *dev,
+   struct rte_eth_ntuple_filter *filter);
 static int ixgbe_add_del_ethertype_filter(struct rte_eth_dev *dev,
struct rte_eth_ethertype_filter *filter,
bool add);
@@ -251,6 +255,7 @@ static int ixgbe_dev_filter_ctrl(struct rte_eth_dev *dev,
 enum rte_filter_type filter_type,
 enum rte_filter_op filter_op,
 void *arg);
+static int ixgbevf_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu);

 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -386,9 +391,6 @@ static struct eth_dev_ops ixgbe_eth_dev_ops = {
.add_syn_filter  = ixgbe_add_syn_filter,
.remove_syn_filter   = ixgbe_remove_syn_filter,
.get_syn_filter  = ixgbe_get_syn_filter,
-   .add_5tuple_filter   = ixgbe_add_5tuple_filter,
-   .remove_5tuple_filter= ixgbe_remove_5tuple_filter,
-   .get_5tuple_filter   = ixgbe_get_5tuple_filter,
.filter_ctrl = ixgbe_dev_filter_ctrl,
 };

@@ -736,6 +738,8 @@ eth_ixgbe_dev_init(__attribute__((unused)) struct eth_driver *eth_drv,
IXGBE_DEV_PRIVATE_TO_HWSTRIP_BITMAP(eth_dev->data->dev_private);
struct ixgbe_dcb_config *dcb_config =
IXGBE_DEV_PRIVATE_TO_DCB_CFG(eth_dev->data->dev_private);
+   struct ixgbe_filter_info *filter_info =
+   IXGBE_DEV_PRIVATE_TO_FILTER_INFO(eth_dev->data->dev_private);
uint32_t ctrl_ext;
uint16_t csum;
int diag, i;
@@ -917,6 +921,11 @@ eth_ixgbe_dev_init(__attribute__((unused)) struct eth_driver *eth_drv,
/* enable support intr */
ixgbe_enable_intr(eth_dev);

+   /* initialize 5tuple filter list */
+   TAILQ_INIT(&filter_info->fivetuple_list);
+   memset(filter_info->fivetuple_mask, 0,
+   sizeof(uint32_t) * IXGBE_5TUPLE_ARRAY_SIZE);
+
return 0;
 }

@@ -1606,6 +1615,9 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct ixgbe_vf_info *vfinfo =
*IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private);
+   struct ixgbe_filter_info *filter_info =
+   IXGBE_DEV_PRIVATE_TO_FILTER_INFO(dev->data->dev_private);
+   struct ixgbe_5tuple_filter *p_5tuple, *p_5tuple_next;
int vf;

PMD_INIT_FUNC_TRACE();
@@ -1635,6 +1647,18 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
/* Clear recorded link status */
memset(&link, 0, sizeof(link));
rte_ixgbe_dev_atomic_write_link_status(dev, &link);
+
+   /* Remove all ntuple filters of the device */
+   for (p_5tuple = TAILQ_FIRST(&filter_info->fivetuple_list);
+p_5tuple != NULL; p_5tuple = p_5tuple_next) {
+   p_5tuple_next = TAILQ_NEXT(p_5tuple, entries);
+   TAILQ_REMOVE(&filter_info->fivetuple_list,
+

[dpdk-dev] [PATCH v2 1/6] ethdev: define ntuple filter type and its structure

2015-01-22 Thread Jingjing Wu

This patch defines the ntuple filter type RTE_ETH_FILTER_NTUPLE and its
structure rte_eth_ntuple_filter.
It also adds TCP_URG_FLAG as the correctly spelled name for TCP_UGR_FLAG;
the old macro is kept.

Signed-off-by: Jingjing Wu 
---
 lib/librte_ether/rte_eth_ctrl.h | 50 +
 lib/librte_ether/rte_ethdev.h   |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index 5d9c387..3465c68 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -53,6 +53,7 @@ enum rte_filter_type {
RTE_ETH_FILTER_NONE = 0,
RTE_ETH_FILTER_MACVLAN,
RTE_ETH_FILTER_ETHERTYPE,
+   RTE_ETH_FILTER_NTUPLE,
RTE_ETH_FILTER_TUNNEL,
RTE_ETH_FILTER_FDIR,
RTE_ETH_FILTER_MAX
@@ -117,6 +118,55 @@ struct rte_eth_ethertype_filter {
 };

 /**
+ * Define all structures for ntuple Filter type.
+ */
+
+#define RTE_NTUPLE_FLAGS_DST_IP    0x0001 /**< If set, dst_ip is part of ntuple */
+#define RTE_NTUPLE_FLAGS_SRC_IP    0x0002 /**< If set, src_ip is part of ntuple */
+#define RTE_NTUPLE_FLAGS_DST_PORT  0x0004 /**< If set, dst_port is part of ntuple */
+#define RTE_NTUPLE_FLAGS_SRC_PORT  0x0008 /**< If set, src_port is part of ntuple */
+#define RTE_NTUPLE_FLAGS_PROTO     0x0010 /**< If set, protocol is part of ntuple */
+#define RTE_NTUPLE_FLAGS_TCP_FLAG  0x0020 /**< If set, tcp flag is involved */
+
+#define RTE_5TUPLE_FLAGS ( \
+   RTE_NTUPLE_FLAGS_DST_IP | \
+   RTE_NTUPLE_FLAGS_SRC_IP | \
+   RTE_NTUPLE_FLAGS_DST_PORT | \
+   RTE_NTUPLE_FLAGS_SRC_PORT | \
+   RTE_NTUPLE_FLAGS_PROTO)
+
+#define RTE_2TUPLE_FLAGS ( \
+   RTE_NTUPLE_FLAGS_DST_PORT | \
+   RTE_NTUPLE_FLAGS_PROTO)
+
+
+/**
+ * A structure used to define the ntuple filter entry
+ * to support RTE_ETH_FILTER_NTUPLE with RTE_ETH_FILTER_ADD,
+ * RTE_ETH_FILTER_DELETE and RTE_ETH_FILTER_GET operations.
+ */
+struct rte_eth_ntuple_filter {
+   uint16_t flags;  /**< Flags from RTE_NTUPLE_FLAGS_* */
+   uint32_t dst_ip; /**< Destination IP address in big endian. */
+   uint32_t dst_ip_mask;/**< Mask of destination IP address. */
+   uint32_t src_ip; /**< Source IP address in big endian. */
+   uint32_t src_ip_mask;/**< Mask of source IP address. */
+   uint16_t dst_port;   /**< Destination port in big endian. */
+   uint16_t dst_port_mask;  /**< Mask of destination port. */
+   uint16_t src_port;   /**< Source Port in big endian. */
+   uint16_t src_port_mask;  /**< Mask of source port. */
+   uint8_t proto;   /**< L4 protocol. */
+   uint8_t proto_mask;  /**< Mask of L4 protocol. */
+   /** tcp_flags only meaningful when the proto is TCP.
+   The packet matched above ntuple fields and contain
+   any set bit in tcp_flags will hit this filter. */
+   uint8_t tcp_flags;
+   uint16_t priority;   /**< seven levels (001b-111b), 111b is highest,
+ used when more than one filter matches. */
+   uint16_t queue;  /**< Queue assigned to when match*/
+};
+
+/**
  * Tunneled type.
  */
 enum rte_eth_tunnel_type {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 1200c1c..5992e43 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -963,6 +963,8 @@ struct rte_eth_dev_callback;
 /** @internal Structure to keep track of registered callbacks */
 TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);

+
+#define TCP_URG_FLAG 0x20
 #define TCP_UGR_FLAG 0x20
 #define TCP_ACK_FLAG 0x10
 #define TCP_PSH_FLAG 0x08
-- 
1.9.3

[dpdk-dev] [PATCH v2 0/6] new ntuple filter replaces 2tuple and 5tuple filters

2015-01-22 Thread Jingjing Wu

v2 changes:
  - remove the code which is already applied in patch "Integrate ethertype
filter in igb/ixgbe driver to new API".
  - modify commands' description in doc testpmd_funcs.rst.

The patch set uses the filter_ctrl API to replace the old 2tuple and 5tuple
filter APIs.
It defines the ntuple filter to combine the 2tuple and 5tuple types.
It uses new functions and structures to replace the old ones in the igb/ixgbe
drivers, adds new commands to replace the old ones in testpmd, and removes the
old APIs.
It removes the filter's index parameter from the user interface; only the
filter's key and assigned queue are visible to the user.

Jingjing Wu (6):
  ethdev: define ntuple filter type and its structure
  ixgbe: ntuple filter functions replace old ones for 5tuple filter
  e1000: ntuple filter functions replace old ones for 2tuple and 5tuple
filter
  testpmd: new commands for ntuple filter
  ethdev: remove old APIs and structures of 5tuple and 2tuple filters
  doc: commands changed in testpmd_funcs for 2tuple and 5tuple filter

 app/test-pmd/cmdline.c  | 406 ++---
 app/test-pmd/config.c   |  65 ---
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  99 +---
 lib/librte_ether/rte_eth_ctrl.h |  57 ++
 lib/librte_ether/rte_ethdev.c   | 116 
 lib/librte_ether/rte_ethdev.h   | 192 --
 lib/librte_pmd_e1000/e1000_ethdev.h |  69 ++-
 lib/librte_pmd_e1000/igb_ethdev.c   | 869 +++-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 468 +++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  52 +-
 10 files changed, 1300 insertions(+), 1093 deletions(-)

-- 
1.9.3

[dpdk-dev] [PATCH v9 5/5] app/testpmd: add commands to support hash functions

2015-01-22 Thread Helin Zhang

The following commands are added to demonstrate hash filter control:
- get_sym_hash_ena_per_port
- set_sym_hash_ena_per_port
- get_hash_global_config
- set_hash_global_config

Signed-off-by: Helin Zhang 
---
 app/test-pmd/cmdline.c | 333 +
 1 file changed, 333 insertions(+)

v6 changes:
* Flow type strings are used to replace Packet Classification Types, to isolate
  hardware specific things.

v7 changes:
* Removed commands of,
  get_sym_hash_ena_per_pctype
  set_sym_hash_ena_per_pctype
  get_filter_swap
  set_filter_swap
  get_hash_function
  set_hash_function.
* Added new commands of,
  get_hash_global_config
  set_hash_global_config

v8 changes:
* Fixed the compile issue on ICC, of "error #188: enumerated type mixed with
  another type".

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4beb404..590e427 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -75,6 +75,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -735,6 +736,21 @@ static void cmd_help_long_parsed(void *parsed_result,
"flow_director_flex_payload (port_id)"
" (l2|l3|l4) (config)\n"
"Configure flex payload selection.\n\n"
+
+   "get_sym_hash_ena_per_port (port_id)\n"
+   "get symmetric hash enable configuration per port.\n\n"
+
+   "set_sym_hash_ena_per_port (port_id) (enable|disable)\n"
+   "set symmetric hash enable configuration per port"
+   " to enable or disable.\n\n"
+
+   "get_hash_global_config (port_id)\n"
+   "Get the global configurations of hash filters.\n\n"
+
+   "set_hash_global_config (port_id) (toeplitz|simple_xor|default)"
+   " (ip4|ip4-frag|tcp4|udp4|#sctp4|ip6|ip6-frag|tcp6|udp6|sctp6)"
+   " (enable|disable)\n"
+   "Set the global configurations of hash filters.\n\n"
);
}
 }
@@ -8670,6 +8686,319 @@ cmdline_parse_inst_t cmd_set_flow_director_flex_payload = {
},
 };

+/* *** Classification Filters Control *** */
+/* *** Get symmetric hash enable per port *** */
+struct cmd_get_sym_hash_ena_per_port_result {
+   cmdline_fixed_string_t get_sym_hash_ena_per_port;
+   uint8_t port_id;
+};
+
+static void
+cmd_get_sym_hash_per_port_parsed(void *parsed_result,
+__rte_unused struct cmdline *cl,
+__rte_unused void *data)
+{
+   struct cmd_get_sym_hash_ena_per_port_result *res = parsed_result;
+   struct rte_eth_hash_filter_info info;
+   int ret;
+
+   if (rte_eth_dev_filter_supported(res->port_id,
+   RTE_ETH_FILTER_HASH) < 0) {
+   printf("RTE_ETH_FILTER_HASH not supported on port: %d\n",
+   res->port_id);
+   return;
+   }
+
+   memset(&info, 0, sizeof(info));
+   info.info_type = RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT;
+   ret = rte_eth_dev_filter_ctrl(res->port_id, RTE_ETH_FILTER_HASH,
+   RTE_ETH_FILTER_GET, &info);
+
+   if (ret < 0) {
+   printf("Cannot get symmetric hash enable per port "
+   "on port %u\n", res->port_id);
+   return;
+   }
+
+   printf("Symmetric hash is %s on port %u\n", info.info.enable ?
+   "enabled" : "disabled", res->port_id);
+}
+
+cmdline_parse_token_string_t cmd_get_sym_hash_ena_per_port_all =
+   TOKEN_STRING_INITIALIZER(struct cmd_get_sym_hash_ena_per_port_result,
+   get_sym_hash_ena_per_port, "get_sym_hash_ena_per_port");
+cmdline_parse_token_num_t cmd_get_sym_hash_ena_per_port_port_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_get_sym_hash_ena_per_port_result,
+   port_id, UINT8);
+
+cmdline_parse_inst_t cmd_get_sym_hash_ena_per_port = {
+   .f = cmd_get_sym_hash_per_port_parsed,
+   .data = NULL,
+   .help_str = "get_sym_hash_ena_per_port port_id",
+   .tokens = {
+   (void *)&cmd_get_sym_hash_ena_per_port_all,
+   (void *)&cmd_get_sym_hash_ena_per_port_port_id,
+   NULL,
+   },
+};
+
+/* *** Set symmetric hash enable per port *** */
+struct cmd_set_sym_hash_ena_per_port_result {
+   cmdline_fixed_string_t set_sym_hash_ena_per_port;
+   cmdline_fixed_string_t enable;
+   uint8_t port_id;
+};
+
+static void
+cmd_set_sym_hash_per_port_parsed(void *parsed_result,
+__rte_unused struct cmdline *cl,
+__rte_unused void *data)
+{
+   struct cmd_set_sym_hash_ena_per_port_result *res = parsed_result;
+   struct rte_eth_hash_filter_info

[dpdk-dev] [PATCH v9 4/5] i40e: support of controlling hash functions

2015-01-22 Thread Helin Zhang

Hash filter control has been implemented for i40e. It includes
getting/setting,
- global hash configurations (hash function type, and symmetric
  hash enable per flow type)
- symmetric hash enable per port

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_ethdev.c | 294 +-
 1 file changed, 292 insertions(+), 2 deletions(-)

v5 changes:
* Integrated with filter API defined recently.

v6 changes:
* Implemented the mapping function to convert RSS offload types to Packet
  Classification Types, to isolate the real hardware specific things.
* Removed initialization of global registers in i40e PMD, as global registers
  shouldn't be initialized per port.
* Added more annotations to make the code more understandable.
* Corrected annotation format for documentation.

v7 changes:
* Removed swap configurations, as it is not allowed by hardware design.
* Put symmetric hash per flow type and hash function type into
  'RTE_ETH_HASH_FILTER_GLOBAL_CONFIG', as they control global registers
  which affect all the ports of the same NIC.

v8 changes:
* Removed redundant return value checks of i40e_flowtype_to_pctype(), as it
  should always be correct.
* Fixed the compile issue on ICC, of "error #188: enumerated type mixed with
  another type".

v9 changes:
* Split the patch; this one is for i40e only.

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index 48bc34d..9fa6bec 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -93,6 +93,18 @@
I40E_PFINT_ICR0_ENA_VFLR_MASK | \
I40E_PFINT_ICR0_ENA_ADMINQ_MASK)

+#define I40E_FLOW_TYPES ( \
+   (1UL << RTE_ETH_FLOW_TYPE_UDPV4) | \
+   (1UL << RTE_ETH_FLOW_TYPE_TCPV4) | \
+   (1UL << RTE_ETH_FLOW_TYPE_SCTPV4) | \
+   (1UL << RTE_ETH_FLOW_TYPE_IPV4_OTHER) | \
+   (1UL << RTE_ETH_FLOW_TYPE_FRAG_IPV4) | \
+   (1UL << RTE_ETH_FLOW_TYPE_UDPV6) | \
+   (1UL << RTE_ETH_FLOW_TYPE_TCPV6) | \
+   (1UL << RTE_ETH_FLOW_TYPE_SCTPV6) | \
+   (1UL << RTE_ETH_FLOW_TYPE_IPV6_OTHER) | \
+   (1UL << RTE_ETH_FLOW_TYPE_FRAG_IPV6))
+
 static int eth_i40e_dev_init(\
__attribute__((unused)) struct eth_driver *eth_drv,
struct rte_eth_dev *eth_dev);
@@ -199,6 +211,7 @@ static int i40e_dev_filter_ctrl(struct rte_eth_dev *dev,
enum rte_filter_op filter_op,
void *arg);
 static void i40e_configure_registers(struct i40e_hw *hw);
+static void i40e_hw_init(struct i40e_hw *hw);

 static struct rte_pci_id pci_id_i40e_map[] = {
 #define RTE_PCI_DEV_ID_DECL_I40E(vend, dev) {RTE_PCI_DEVICE(vend, dev)},
@@ -398,6 +411,9 @@ eth_i40e_dev_init(__rte_unused struct eth_driver *eth_drv,
/* Make sure all is clean before doing PF reset */
i40e_clear_hw(hw);

+   /* Initialize the hardware */
+   i40e_hw_init(hw);
+
/* Reset here to make sure all is clean for each PF */
ret = i40e_pf_reset(hw);
if (ret) {
@@ -5136,6 +5152,260 @@ i40e_pf_config_mq_rx(struct i40e_pf *pf)
return ret;
 }

+/* Get the symmetric hash enable configurations per port */
+static void
+i40e_get_symmetric_hash_enable_per_port(struct i40e_hw *hw, uint8_t *enable)
+{
+   uint32_t reg = I40E_READ_REG(hw, I40E_PRTQF_CTL_0);
+
+   *enable = reg & I40E_PRTQF_CTL_0_HSYM_ENA_MASK ? 1 : 0;
+}
+
+/* Set the symmetric hash enable configurations per port */
+static void
+i40e_set_symmetric_hash_enable_per_port(struct i40e_hw *hw, uint8_t enable)
+{
+   uint32_t reg = I40E_READ_REG(hw, I40E_PRTQF_CTL_0);
+
+   if (enable > 0) {
+   if (reg & I40E_PRTQF_CTL_0_HSYM_ENA_MASK) {
+   PMD_DRV_LOG(INFO, "Symmetric hash has already "
+   "been enabled");
+   return;
+   }
+   reg |= I40E_PRTQF_CTL_0_HSYM_ENA_MASK;
+   } else {
+   if (!(reg & I40E_PRTQF_CTL_0_HSYM_ENA_MASK)) {
+   PMD_DRV_LOG(INFO, "Symmetric hash has already "
+   "been disabled");
+   return;
+   }
+   reg &= ~I40E_PRTQF_CTL_0_HSYM_ENA_MASK;
+   }
+   I40E_WRITE_REG(hw, I40E_PRTQF_CTL_0, reg);
+   I40E_WRITE_FLUSH(hw);
+}
+
+/*
+ * Get global configurations of hash function type and symmetric hash enable
+ * per flow type (pctype). Note that global configuration means it affects all
+ * the ports on the same NIC.
+ */
+static int
+i40e_get_hash_filter_global_config(struct i40e_hw *hw,
+  struct rte_eth_hash_global_conf *g_cfg)
+{
+   uint32_t reg, mask = I40E_FLOW_TYPES;
+   uint32_t i;
+   enum i40e_filter_pctype pctype;
+
+   memset(g_cfg, 0, sizeof(*g_cfg));
+   reg = I40E_READ_REG(hw, I40E_GLQF_CTL);
+   if (r

[dpdk-dev] [PATCH v9 3/5] ethdev: support of configuring hash functions

2015-01-22 Thread Helin Zhang

To support hash filter configuration, the hash filter type is added,
along with the corresponding structures, macros and definitions.

Signed-off-by: Helin Zhang 
---
 lib/librte_ether/rte_eth_ctrl.h | 63 +
 1 file changed, 63 insertions(+)

v9 changes:
* Added typo fixes.
* Split the patch; this one is for ethdev only.

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index 4b3c5fc..0ce241e 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -55,6 +55,7 @@ enum rte_filter_type {
RTE_ETH_FILTER_ETHERTYPE,
RTE_ETH_FILTER_TUNNEL,
RTE_ETH_FILTER_FDIR,
+   RTE_ETH_FILTER_HASH,
RTE_ETH_FILTER_MAX
 };

@@ -449,6 +450,68 @@ struct rte_eth_fdir_stats {
uint32_t best_cnt; /**< Number of filters in best effort spaces. */
 };

+/**
+ * Hash filter information types.
+ * - RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT is for getting/setting the
+ *   information/configuration of 'symmetric hash enable' per port.
+ * - RTE_ETH_HASH_FILTER_GLOBAL_CONFIG is for getting/setting the global
+ *   configurations of hash filters. Those global configurations are valid
+ *   for all ports of the same NIC.
+ */
+enum rte_eth_hash_filter_info_type {
+   RTE_ETH_HASH_FILTER_INFO_TYPE_UNKNOWN = 0,
+   /** Symmetric hash enable per port */
+   RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT,
+   /** Configure globally for hash filter */
+   RTE_ETH_HASH_FILTER_GLOBAL_CONFIG,
+   RTE_ETH_HASH_FILTER_INFO_TYPE_MAX,
+};
+
+/**
+ * Hash function types.
+ */
+enum rte_eth_hash_function {
+   RTE_ETH_HASH_FUNCTION_DEFAULT = 0,
+   RTE_ETH_HASH_FUNCTION_TOEPLITZ, /**< Toeplitz */
+   RTE_ETH_HASH_FUNCTION_SIMPLE_XOR, /**< Simple XOR */
+   RTE_ETH_HASH_FUNCTION_MAX,
+};
+
+#define UINT32_BIT (CHAR_BIT * sizeof(uint32_t))
+#define RTE_SYM_HASH_MASK_ARRAY_SIZE \
+   (RTE_ALIGN(RTE_ETH_FLOW_TYPE_MAX, UINT32_BIT)/UINT32_BIT)
+/**
+ * A structure used to set or get global hash function configurations which
+ * include symmetric hash enable per flow type and hash function type.
+ * Each bit in sym_hash_enable_mask[] indicates if the symmetric hash of the
+ * corresponding flow type is enabled or not.
+ * Each bit in valid_bit_mask[] indicates if the corresponding bit in
+ * sym_hash_enable_mask[] is valid or not. For the configurations gotten, it
+ * also means if the flow type is supported by hardware or not.
+ */
+struct rte_eth_hash_global_conf {
+   enum rte_eth_hash_function hash_func; /**< Hash function type */
+   /** Bit mask for symmetric hash enable per flow type */
+   uint32_t sym_hash_enable_mask[RTE_SYM_HASH_MASK_ARRAY_SIZE];
+   /** Bit mask indicates if the corresponding bit is valid */
+   uint32_t valid_bit_mask[RTE_SYM_HASH_MASK_ARRAY_SIZE];
+};
+
+/**
+ * A structure used to set or get hash filter information, to support filter
+ * type of 'RTE_ETH_FILTER_HASH' and its operations.
+ */
+struct rte_eth_hash_filter_info {
+   enum rte_eth_hash_filter_info_type info_type; /**< Information type */
+   /** Details of hash filter information */
+   union {
+   /** For RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT */
+   uint8_t enable;
+   /** Global configurations of hash filter */
+   struct rte_eth_hash_global_conf global_conf;
+   } info;
+};
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.1.4

[dpdk-dev] [PATCH v9 2/5] ethdev: code style fixes

2015-01-22 Thread Helin Zhang

Added code style fixes.

Signed-off-by: Helin Zhang 
---
 lib/librte_ether/rte_eth_ctrl.h | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index 5d9c387..4b3c5fc 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -62,8 +62,8 @@ enum rte_filter_type {
  * Generic operations on filters
  */
 enum rte_filter_op {
+   /** used to check whether the type filter is supported */
RTE_ETH_FILTER_NOP = 0,
-   /**< used to check whether the type filter is supported */
RTE_ETH_FILTER_ADD,  /**< add filter entry */
RTE_ETH_FILTER_UPDATE,   /**< update filter entry */
RTE_ETH_FILTER_DELETE,   /**< delete filter entry */
@@ -75,16 +75,15 @@ enum rte_filter_op {
RTE_ETH_FILTER_OP_MAX
 };

-/**
+/*
  * MAC filter type
  */
 enum rte_mac_filter_type {
RTE_MAC_PERFECT_MATCH = 1, /**< exact match of MAC addr. */
-   RTE_MACVLAN_PERFECT_MATCH,
-   /**< exact match of MAC addr and VLAN ID. */
+   RTE_MACVLAN_PERFECT_MATCH, /**< exact match of MAC addr and VLAN ID. */
RTE_MAC_HASH_MATCH, /**< hash match of MAC addr. */
+   /** hash match of MAC addr and exact match of VLAN ID. */
RTE_MACVLAN_HASH_MATCH,
-   /**< hash match of MAC addr and exact match of VLAN ID. */
 };

 /**
-- 
1.8.1.4

[dpdk-dev] [PATCH v9 1/5] i40e: use constant as the default hash keys

2015-01-22 Thread Helin Zhang

Calculating the default RSS hash keys at run time is not needed
at all, and may introduce race conditions. The alternative is to use
an array of random values, generated manually, as the
default hash keys.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_ethdev.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index b47a3d2..48bc34d 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -73,7 +73,7 @@
 /* Maximun number of VSI */
 #define I40E_MAX_NUM_VSIS  (384UL)

-/* Default queue interrupt throttling time in microseconds*/
+/* Default queue interrupt throttling time in microseconds */
 #define I40E_ITR_INDEX_DEFAULT  0
 #define I40E_QUEUE_ITR_INTERVAL_DEFAULT 32 /* 32 us */
 #define I40E_QUEUE_ITR_INTERVAL_MAX 8160 /* 8160 us */
@@ -200,9 +200,6 @@ static int i40e_dev_filter_ctrl(struct rte_eth_dev *dev,
void *arg);
 static void i40e_configure_registers(struct i40e_hw *hw);

-/* Default hash key buffer for RSS */
-static uint32_t rss_key_default[I40E_PFQF_HKEY_MAX_INDEX + 1];
-
 static struct rte_pci_id pci_id_i40e_map[] = {
 #define RTE_PCI_DEV_ID_DECL_I40E(vend, dev) {RTE_PCI_DEVICE(vend, dev)},
 #include "rte_pci_dev_ids.h"
@@ -5039,9 +5036,12 @@ i40e_pf_config_rss(struct i40e_pf *pf)
}
if (rss_conf.rss_key == NULL || rss_conf.rss_key_len <
(I40E_PFQF_HKEY_MAX_INDEX + 1) * sizeof(uint32_t)) {
-   /* Calculate the default hash key */
-   for (i = 0; i <= I40E_PFQF_HKEY_MAX_INDEX; i++)
-   rss_key_default[i] = (uint32_t)rte_rand();
+   /* Random default keys */
+   static uint32_t rss_key_default[] = {0x6b793944,
+   0x23504cb5, 0x5bea75b6, 0x309f4f12, 0x3dc0a2b8,
+   0x024ddcdf, 0x339b8ca0, 0x4c4af64a, 0x34fac605,
+   0x55d85839, 0x3a58997d, 0x2ec938e1, 0x66031581};
+
rss_conf.rss_key = (uint8_t *)rss_key_default;
rss_conf.rss_key_len = (I40E_PFQF_HKEY_MAX_INDEX + 1) *
sizeof(uint32_t);
-- 
1.8.1.4

[dpdk-dev] [PATCH v9 0/5] Support configuring hash functions

2015-01-22 Thread Helin Zhang

These patches mainly add support for configuring hash functions. In detail,
 - It can get/set global hash configurations.
   * Get/set symmetric hash enable per flow type.
   * Get/set hash function type.
 - It can get/set symmetric hash enable per port.
 - Four commands have been implemented in testpmd to support testing the above.
   * get_sym_hash_ena_per_port
   * set_sym_hash_ena_per_port
   * get_hash_global_config
   * set_hash_global_config

It also replaces the hash keys generated at run time with constant hash keys.
Global initialization is added to put the registers into a correct initial state.

v3 changes:
* Removed renamings in rte_ethdev.h.
* Redesigned filter control API and its relevant structures/enums.
* Renamed header file from rte_eth_features.h to rte_eth_ctrl.h.
* Removed the public header file rte_i40e.h, which was specific to i40e.
* Added hardware initialization function during port init.
* Used constant random hash keys in i40e PF.
* Renamed the commands in testpmd based on the redesigned filter control API.

v4 changes:
* Fixed a bug in testpmd to support 'set_sym_hash_ena_per_port'.

v5 changes:
* Integrated with filter API defined recently.
* Removed all filter API definitions, as they have already been defined and
  merged recently.

v6 changes:
* Flow type strings are used to replace Packet Classification Types, to isolate
  hardware specific things.
* Implemented the mapping function to convert RSS offload types to Packet
  Classification Types, to isolate the real hardware specific things.
* Removed initialization of global registers in i40e PMD, as global registers
  shouldn't be initialized per port.
* Added more annotations to make the code more understandable.
* Corrected annotation format for documentation.

v7 changes:
* Removed swap configurations, as it is not allowed by hardware design.
* Put symmetric hash per flow type and hash function type into
  'RTE_ETH_HASH_FILTER_GLOBAL_CONFIG', as they control global registers
  which affect all the ports of the same NIC.

v8 changes:
* Removed redundant checks in i40e_ethdev.c.
* Solved compile errors on ICC.

v9 changes:
* Added typo fixes in rte_eth_ctrl.h.
* Split the modifications to rte_eth_ctrl.h and i40e: one patch is for ethdev,
  the other one is for i40e.

Helin Zhang (5):
  i40e: use constant as the default hash keys
  ethdev: code style fixes
  ethdev: support of configuring hash functions
  i40e: support of controlling hash functions
  app/testpmd: add commands to support hash functions

 app/test-pmd/cmdline.c| 333 ++
 lib/librte_ether/rte_eth_ctrl.h   |  72 -
 lib/librte_pmd_i40e/i40e_ethdev.c | 308 +--
 3 files changed, 699 insertions(+), 14 deletions(-)

-- 
1.8.1.4

[dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for cpu assignment

2015-01-22 Thread Bruce Richardson

On Thu, Jan 22, 2015 at 02:34:07PM +0000, Ananyev, Konstantin wrote:
> Hi Bruce,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Thursday, January 22, 2015 12:19 PM
> > To: Liang, Cunming
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for 
> > cpu assignment
> > 
> > On Thu, Jan 22, 2015 at 04:16:25PM +0800, Cunming Liang wrote:
> > > It supports one new eal long option '--lcores' for EAL thread cpuset 
> > > assignment.
> > >
> > > The format pattern:
> > >   --lcores='lcores[@cpus]<,lcores[@cpus]>'
> > > lcores, cpus could be a single digit or a group.
> > > '(' and ')' are necessary if it's a group.
> > > If '@cpus' is not supplied, cpus takes the same value as lcores.
> > >
> > > e.g. '1,2@(5-7),(3-5)@(0,2),(0,6)' means starting 7 EAL threads as below
> > >   lcore 0 runs on cpuset 0x41 (cpu 0,6)
> > >   lcore 1 runs on cpuset 0x2 (cpu 1)
> > >   lcore 2 runs on cpuset 0xe0 (cpu 5,6,7)
> > >   lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2)
> > >   lcore 6 runs on cpuset 0x41 (cpu 0,6)
> > >
> > 
> > This strikes me as very confusing, though a couple of tweaks might help with
> > readability. The lcore 0 at the end is especially confusing.
> 
> I didn't follow you here: is it the (0,6) that you find confusing?
> Because the parentheses implicitly specify the affinity for the group of
> enclosed lcores?
> 
> > Perhaps we can
> > limit the allowed formats here,
> > * require the lcore_id to be specified - the lack of an lcore id for the
> >   last part makes having it as lcore 0 surprising.
> 
> Again, I'm not sure I understand you properly: lcore_id(s) are always
> specified explicitly.
> Only the physical cpus part may be omitted.
> 
> > * only allow one lcore id to be given for each set of cores.
> 
> So you mean that for '(3-5)@(0,2)' the user would have to write
> '3@(0,2),4@(0,2),5@(0,2)'?
> I don't see a big difference here, but imagine you'd like to create a pool of
> 32 EAL threads running on the same cpu set.
> With the current syntax it is just something like: '(32-63)@(0-7)'.
> With what you are proposing it would be a very long list.
> 
> > 
> > I think it may still be readable if we allow the core set to be omitted if
> > it's to be the same as the lcore_id.
> 
> I think that is supported.
> See lcore_id=1 in Steve's example above.
> As I understand: --lcores='0,2,3-5' is equal to '-l 0,2,3-5' and to '-c 0x3d'.
> 
> Konstantin

Ok, thanks for the clarification.

/Bruce

[dpdk-dev] some questions about rte_memcpy

2015-01-22 Thread Bruce Richardson

On Thu, Jan 22, 2015 at 08:53:13PM +0800, Linhaifeng wrote:
> 
> 
> On 2015/1/22 19:34, Bruce Richardson wrote:
> > On Thu, Jan 22, 2015 at 07:23:49PM +0900, Tetsuya Mukawa wrote:
> >> On 2015/01/22 16:35, Matthew Hall wrote:
> >>> On Thu, Jan 22, 2015 at 01:32:04PM +0800, Linhaifeng wrote:
> >>>> Do you mean that calling rte_memcpy before rte_eal_init() would crash? Why?
> >>> No guarantee. But a theory. It might use some things from the EAL init to 
> >>> figure out which version of the accelerated algorithm to use.
> >>
> >> This selection is done at compile-time.
> >> And if the size is constant, I guess DPDK assumes memcpy is replaced by
> >> inline __builtin_memcpy.
> >> I haven't checked the performance of builtin memcpy, but probably much
> >> faster.
> >>
> > 
> > Yes, that assumption is correct. A couple of years ago we discovered that
> > for constant size values, the compiler would generate much faster code for
> > us using a regular memcpy than rte_memcpy, hence the macro.
> > 
> > /Bruce
> > 
> >> Tetsuya
> >>
> >>> Matthew.
> >>
> >>
> > 
> > 
> 
> Hi Bruce,
> 
> I tested it. In most cases, as you said, using a constant may be faster, but
> sometimes it is not.
> 
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 999
> rte_memcpy(constant) used:279893712   @@ not faster
> rte_memcpy(variable) used:277818600
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 999
> rte_memcpy(constant) used:279264328   @@ not faster
> rte_memcpy(variable) used:277667116
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 999
> rte_memcpy(constant) used:279491832   @@ not faster
> rte_memcpy(variable) used:277622772
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 999
> rte_memcpy(constant) used:279402156   @@ not faster
> rte_memcpy(variable) used:277738464
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 999
> rte_memcpy(constant) used:279305172   @@ not faster
> rte_memcpy(variable) used:277483004
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 999
> rte_memcpy(constant) used:279784124   @@ not faster
> rte_memcpy(variable) used:277605332
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 999
> rte_memcpy(constant) used:322817260
> rte_memcpy(variable) used:350333864
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 999
> rte_memcpy(constant) used:322840748
> rte_memcpy(variable) used:350297868
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 999
> rte_memcpy(constant) used:322488240
> rte_memcpy(variable) used:350348652
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 999
> rte_memcpy(constant) used:322021428
> rte_memcpy(variable) used:350416440
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 999
> rte_memcpy(constant) used:321370900
> rte_memcpy(variable) used:350355796
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 999
> rte_memcpy(constant) used:322704552
> rte_memcpy(variable) used:349900832
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 999
> rte_memcpy(constant) used:422705828
> rte_memcpy(variable) used:425493328
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 999
> rte_memcpy(constant) used:422421840   @@ not faster
> rte_memcpy(variable) used:413691412
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 999
> rte_memcpy(constant) used:425233088   @@ not faster
> rte_memcpy(variable) used:421136724
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 999
> rte_memcpy(constant) used:901014608   @@ not faster
> rte_memcpy(variable) used:900997388
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 999
> rte_memcpy(constant) used:900803308   @@ not faster
> rte_memcpy(variable) used:900794076
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 999
> rte_memcpy(constant) used:901842436   @@ not faster
> rte_memcpy(variable) used:901218984
> linux-mnSyvH:/mnt/sdb/linhf/test #
> 
> 
> 
> Here is my test code:
> 
> #include 
> #include 
> #include 
> 
> 
> int main(int narg, char** args)
> {
> int i;
> char buf[1024];
> uint64_t start, end;
> 
> if (narg < 3) {
> printf("usage:./rte_memcpy_test size times\n");
> return 0;
> }
> 
> size_t size_v = atoi(args[1]);
> const size_t size_c = atoi(args[1]);

This (size_c) is a run-time constant, not a compile-time constant. To trigger
the memcpy optimizations inside the compiler, the size value must be constant
at compile time.

Regards,
/Bruce
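
A minimal sketch of the distinction, assuming a GCC-compatible compiler that provides `__builtin_constant_p`. The names `COPY`, `copy_variable` and `variable_path_calls` are made up for illustration and are not DPDK's; the macro only mimics the kind of constant-size dispatch discussed in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counter: how often the run-time-size path was taken. */
static unsigned long variable_path_calls;

/* Hypothetical stand-in for a hand-optimized copy routine. */
static inline void *
copy_variable(void *dst, const void *src, size_t n)
{
	variable_path_calls++;
	return memcpy(dst, src, n);
}

/*
 * If n is known at compile time, call plain memcpy so the compiler can
 * expand it inline; otherwise fall back to the run-time routine.
 */
#define COPY(dst, src, n)                \
	(__builtin_constant_p(n) ?       \
	 memcpy((dst), (src), (n)) :     \
	 copy_variable((dst), (src), (n)))
```

With such a macro, `COPY(dst, src, 64)` takes the `memcpy` branch, while a size obtained from `atoi()` at run time is expected to take `copy_variable`, even if the variable holding it is declared `const`.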

> int times = atoi(args[2]);
> 
> start = rte_rdtsc();
> for(i = 0; i < times; i++) {
> rte_memcpy(buf, buf, size_c);
> }
> end = rte_rdtsc();
> printf("rte_memcpy(constant) used:%llu\n", end - start);
> 
> start = rte_rdtsc();

[dpdk-dev] Segmentation fault in ixgbe_rxtx_vec.c:444 with 1.8.0

2015-01-22 Thread Bruce Richardson

On Thu, Jan 22, 2015 at 07:35:45PM +0530, Prashant Upadhyaya wrote:
> On Wed, Jan 21, 2015 at 7:19 PM, Bruce Richardson <
> bruce.richardson at intel.com> wrote:
> 
> > On Tue, Jan 20, 2015 at 11:39:03AM +0100, Martin Weiser wrote:
> > > Hi again,
> > >
> > > I did some further testing and it seems like this issue is linked to
> > > jumbo frames. I think a similar issue has already been reported by
> > > Prashant Upadhyaya with the subject 'Packet Rx issue with DPDK1.8'.
> > > In our application we use the following rxmode port configuration:
> > >
> > > .mq_mode= ETH_MQ_RX_RSS,
> > > .split_hdr_size = 0,
> > > .header_split   = 0,
> > > .hw_ip_checksum = 1,
> > > .hw_vlan_filter = 0,
> > > .jumbo_frame= 1,
> > > .hw_strip_crc   = 1,
> > > .max_rx_pkt_len = 9000,
> > >
> > > and the mbuf size is calculated like the following:
> > >
> > > (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)
> > >
> > > This works fine with DPDK 1.7 and jumbo frames are split into buffer
> > > chains and can be forwarded on another port without a problem.
> > > With DPDK 1.8 and the default configuration (CONFIG_RTE_IXGBE_INC_VECTOR
> > > enabled) the application sometimes crashes like described in my first
> > > mail and sometimes packet receiving stops with subsequently arriving
> > > packets counted as rx errors. When CONFIG_RTE_IXGBE_INC_VECTOR is
> > > disabled the packet processing also comes to a halt as soon as jumbo
> > > frames arrive, with the slightly different effect that now
> > > rte_eth_tx_burst refuses to send any previously received packets.
> > >
> > > Is there anything special to consider regarding jumbo frames when moving
> > > from DPDK 1.7 to 1.8 that we might have missed?
> > >
> > > Martin
> > >
> > >
> > >
> > > On 19.01.15 11:26, Martin Weiser wrote:
> > > > Hi everybody,
> > > >
> > > > we quite recently updated one of our applications to DPDK 1.8.0 and are
> > > > now seeing a segmentation fault in ixgbe_rxtx_vec.c:444 after a few
> > minutes.
> > > > I just did some quick debugging and I only have a very limited
> > > > understanding of the code in question but it seems that the 'continue'
> > > > in line 445 without increasing 'buf_idx' might cause the problem. In
> > one
> > > > debugging session when the crash occurred the value of 'buf_idx' was 2
> > > > and the value of 'pkt_idx' was 8965.
> > > > Any help with this issue would be greatly appreciated. If you need any
> > > > further information just let me know.
> > > >
> > > > Martin
> > > >
> > > >
> > >
> > Hi Martin, Prashant,
> >
> > I've managed to reproduce the issue here and had a look at it. Could you
> > both perhaps try the proposed change below and see if it fixes the problem
> > for
> > you and gives you a working system? If so, I'll submit this as a patch fix
> > officially - or go back to the drawing board, if not. :-)
> >
> > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
> > b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
> > index b54cb19..dfaccee 100644
> > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
> > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
> > @@ -402,10 +402,10 @@ reassemble_packets(struct igb_rx_queue *rxq, struct
> > rte_mbuf **rx_bufs,
> > struct rte_mbuf *pkts[RTE_IXGBE_VPMD_RX_BURST]; /*finished pkts*/
> > struct rte_mbuf *start = rxq->pkt_first_seg;
> > struct rte_mbuf *end =  rxq->pkt_last_seg;
> > -   unsigned pkt_idx = 0, buf_idx = 0;
> > +   unsigned pkt_idx, buf_idx;
> >
> >
> > -   while (buf_idx < nb_bufs) {
> > +   for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
> > if (end != NULL) {
> > /* processing a split packet */
> > end->next = rx_bufs[buf_idx];
> > @@ -448,7 +448,6 @@ reassemble_packets(struct igb_rx_queue *rxq, struct
> > rte_mbuf **rx_bufs,
> > rx_bufs[buf_idx]->data_len += rxq->crc_len;
> > rx_bufs[buf_idx]->pkt_len += rxq->crc_len;
> > }
> > -   buf_idx++;
> > }
> >
> > /* save the partial packet for next time */
> >
> >
> > Regards,
> > /Bruce
> >
> Hi Bruce,
> 
> I am afraid your patch did not work for me. In my case I am not trying to
> receive jumbo frames but normal frames. They are not received by my
> application. Further, your patched function is not being exercised in my
> use case.
> 
> Regards
> -Prashant

Hi Prashant,

Can your problem be reproduced using testpmd? If so, can you perhaps send me the
command line for testpmd and the traffic profile needed to reproduce the issue?

Thanks,
/Bruce
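
For context on the change being tested: the proposed patch quoted above turns a `while` loop, whose index was incremented at the end of the body, into a `for` loop. A minimal sketch of why that matters, using a made-up function (`count_nonzero`) rather than the driver's `reassemble_packets`: in a `for` loop the increment still runs after `continue`, whereas in the `while` form a `continue` placed before `buf_idx++` skips the increment and revisits the same entry.

```c
#include <stddef.h>

/*
 * Hypothetical example: count the non-zero entries of bufs[].
 * Zero entries are skipped with `continue`; because the increment lives
 * in the for-loop header, buf_idx still advances on every iteration.
 */
static size_t
count_nonzero(const int *bufs, size_t nb_bufs)
{
	size_t buf_idx, pkt_idx;

	for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
		if (bufs[buf_idx] == 0)
			continue; /* buf_idx++ in the header is still executed */
		pkt_idx++;
	}
	return pkt_idx;
}
```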

[dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for cpu assignment

2015-01-22 Thread Wodkowski, PawelX

Hi,
I want to mention that Pktgen-DPDK has a similar, but to me much more readable,
syntax for defining the core-to-port mapping. Maybe we can adopt this syntax
for the new '--lcores' parameter.

See the '-m' parameter syntax in the Pktgen README:
https://github.com/pktgen/Pktgen-DPDK/blob/master/dpdk/examples/pktgen/README.md

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Thursday, January 22, 2015 3:34 PM
> To: Richardson, Bruce; Liang, Cunming
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for 
> cpu
> assignment
> 
> Hi Bruce,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Thursday, January 22, 2015 12:19 PM
> > To: Liang, Cunming
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for 
> > cpu
> assignment
> >
> > On Thu, Jan 22, 2015 at 04:16:25PM +0800, Cunming Liang wrote:
> > > It supports one new eal long option '--lcores' for EAL thread cpuset
> assignment.
> > >
> > > The format pattern:
> > >   --lcores='lcores[@cpus]<,lcores[@cpus]>'
> > > lcores, cpus could be a single digit or a group.
> > > '(' and ')' are necessary if it's a group.
> > > If not supply '@cpus', the value of cpus uses the same as lcores.
> > >
> > > e.g. '1,2@(5-7),(3-5)@(0,2),(0,6)' means starting 7 EAL thread as below
> > >   lcore 0 runs on cpuset 0x41 (cpu 0,6)
> > >   lcore 1 runs on cpuset 0x2 (cpu 1)
> > >   lcore 2 runs on cpuset 0xe0 (cpu 5,6,7)
> > >   lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2)
> > >   lcore 6 runs on cpuset 0x41 (cpu 0,6)
> > >
> >
> > This strikes me as very confusing, though a couple of tweaks might help with
> > readability. The lcore 0 at the end is especially confusing.
> 
> Didn't get you here: do you find (0,6) confusing, right?
> Because braces implicitly specifies affinity for group of en-braced lcores?
> 
> > Perhaps we can
> > limit the allowed formats here,
> > * require the lcore_id to be specified - the lack of an lcore id for the 
> > last part
> > makes having it as lcore 0 surprising.
> 
> Again, not sure I understand you properly:  lcore_id(s) are always specified
> explicitly.
> Physical cpus part might be omitted.
> 
> > * only allow one lcore id to be given for each set of cores.
> 
> So you mean for '(3-5)@(0,2)' user would have to: '3@(0,2),4@(0,2),5@(0,2)'?
> I don't see big difference here, but imagine you'd like to create a pool of 
> 32 EAL-
> threads running on same cpu set.
> With current syntax it is just something like: '(32-63)@(0-7)'.
> With what you proposing it will be a very long list.
> 
> >
> > I think it may still be readable if we allow the core set to be omitted if 
> > its
> > to be the same as the lcore_id.
> 
> I think that is supported.
> See lcore_id=1 in Steve's example above.
> As I understand: --lcores='0,2,3-5' is equal to '-l 0,2,3-5' and to '-c 0x3d'.
> 
> Konstantin
> 
> >
> > It's probably still not going to be very tidy, but I think we can improve 
> > things.
> >
> > /Bruce
> >
> > > Signed-off-by: Cunming Liang 
> > > ---
> > >  lib/librte_eal/common/eal_common_launch.c  |   1 -
> > >  lib/librte_eal/common/eal_common_options.c | 262
> -
> > >  lib/librte_eal/common/eal_options.h|   2 +
> > >  lib/librte_eal/linuxapp/eal/Makefile   |   1 +
> > >  4 files changed, 261 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/lib/librte_eal/common/eal_common_launch.c
> b/lib/librte_eal/common/eal_common_launch.c
> > > index 599f83b..2d732b1 100644
> > > --- a/lib/librte_eal/common/eal_common_launch.c
> > > +++ b/lib/librte_eal/common/eal_common_launch.c
> > > @@ -117,4 +117,3 @@ rte_eal_mp_wait_lcore(void)
> > >   rte_eal_wait_lcore(lcore_id);
> > >   }
> > >  }
> > > -
> > > diff --git a/lib/librte_eal/common/eal_common_options.c
> b/lib/librte_eal/common/eal_common_options.c
> > > index e2810ab..fc47588 100644
> > > --- a/lib/librte_eal/common/eal_common_options.c
> > > +++ b/lib/librte_eal/common/eal_common_options.c
> > > @@ -45,6 +45,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >
> > >  #include "eal_internal_cfg.h"
> > >  #include "eal_options.h"
> > > @@ -85,6 +86,7 @@ eal_long_options[] = {
> > >   {OPT_XEN_DOM0, 0, 0, OPT_XEN_DOM0_NUM},
> > >   {OPT_CREATE_UIO_DEV, 1, NULL, OPT_CREATE_UIO_DEV_NUM},
> > >   {OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
> > > + {OPT_LCORES, 1, 0, OPT_LCORES_NUM},
> > >   {0, 0, 0, 0}
> > >  };
> > >
> > > @@ -255,9 +257,11 @@ eal_parse_corelist(const char *corelist)
> > >   if (min == RTE_MAX_LCORE)
> > >   min = idx;
> > >   for (idx = min; idx <= max; idx++) {
> > > - cfg->lcore_role[idx] = ROLE_RTE;
> > > - lcore_config[idx].core_index = count;
> > > - count++;
> > > +

[dpdk-dev] [PATCH] lib/librte_ether: change socket_id passed to rte_memzone_reserve

2015-01-22 Thread Cian Ferriter

Remove the dependency of this memzone reservation on the socket of the
core it currently runs on. Using the socket of the master lcore
yields more predictable results when this function is called
after initialisation.

Signed-off-by: Cian Ferriter 
Reviewed-by: Maryam Tahhan 
Reviewed-by: Bruce Richardson 
---
 lib/librte_ether/rte_ethdev.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
 mode change 100644 => 100755 lib/librte_ether/rte_ethdev.c

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
old mode 100644
new mode 100755
index ea3a1fb..088bffc
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -184,7 +184,7 @@ rte_eth_dev_data_alloc(void)
if (rte_eal_process_type() == RTE_PROC_PRIMARY){
mz = rte_memzone_reserve(MZ_RTE_ETH_DEV_DATA,
RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data),
-   rte_socket_id(), flags);
+   rte_lcore_to_socket_id(rte_get_master_lcore()), 
flags);
} else
mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA);
if (mz == NULL)
-- 
1.7.4.1

[dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS

2015-01-22 Thread Vlad Zolotarov


On 01/21/15 10:44, Wodkowski, PawelX wrote:
>
>> -Original Message-
>> From: Ouyang, Changchun
>> Sent: Wednesday, January 21, 2015 3:44 AM
>> To: Wodkowski, PawelX; dev at dpdk.org
>> Cc: Ouyang, Changchun
>> Subject: RE: [dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS
>>
>>
>>
>>> -Original Message-
>>> From: Wodkowski, PawelX
>>> Sent: Tuesday, January 20, 2015 5:35 PM
>>> To: Ouyang, Changchun; dev at dpdk.org
>>> Subject: RE: [dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS
>>>
>>>> -Original Message-
>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang Changchun
>>>> Sent: Monday, January 12, 2015 6:59 AM
>>>> To: dev at dpdk.org
>>>> Subject: [dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS
>>>>
>>>> It needs config RSS and IXGBE_MRQC and IXGBE_VFPSRTYPE to enable VF RSS.
>>>> The psrtype will determine how many queues the received packets will
>>>> distribute to, and the value of psrtype should depends on both facet:
>>>> max VF rxq number which has been negotiated with PF, and the number of
>>>> rxq specified in config on guest.
>>>>
>>>> Signed-off-by: Changchun Ouyang 
>>>>
>>>> Changes in v6:
>>>>   - Raise an error for the case of ETH_16_POOLS in config vf rss, as the previous
>>>>     logic have changed it into: ETH_32_POOLS.
>>>>
>>>> Changes in v4:
>>>>   - The number of rxq from config should be power of 2 and should not bigger than
>>>>     max VF rxq number(negotiated between guest and host).
>>>>
>>>> ---
>>>>   lib/librte_pmd_ixgbe/ixgbe_pf.c   |  15 ++
>>>>   lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 102 +-
>>>>   2 files changed, 105 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_pf.c b/lib/librte_pmd_ixgbe/ixgbe_pf.c
>>>> index dbda9b5..93f6e43 100644
>>>> --- a/lib/librte_pmd_ixgbe/ixgbe_pf.c
>>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_pf.c
>>>> @@ -187,6 +187,21 @@ int ixgbe_pf_host_configure(struct rte_eth_dev *eth_dev)
>>>>    IXGBE_WRITE_REG(hw, IXGBE_MPSAR_LO(hw->mac.num_rar_entries), 0);
>>>>    IXGBE_WRITE_REG(hw, IXGBE_MPSAR_HI(hw->mac.num_rar_entries), 0);
>>>>
>>>> +  /*
>>>> +   * VF RSS can support at most 4 queues for each VF, even if
>>>> +   * 8 queues are available for each VF, it need refine to 4
>>>> +   * queues here due to this limitation, otherwise no queue
>>>> +   * will receive any packet even RSS is enabled.
>>>> +   */
>>>> +  if (eth_dev->data->dev_conf.rxmode.mq_mode == ETH_MQ_RX_VMDQ_RSS) {
>>>> +  if (RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool == 8) {
>>>> +  RTE_ETH_DEV_SRIOV(eth_dev).active = ETH_32_POOLS;
>>>> +  RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool = 4;
>>>> +  RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx =
>>>> +  dev_num_vf(eth_dev) * 4;
>>>> +  }
>>>> +  }
>>>> +
>>> I did not looked before at your patches but I think you are messing with
>>> things that should not be changed:
>>>
>>> Why you are changing those values. They are set up during
>>> ixgbe_pf_host_init(). Limitation you are describing is only RSS related. If
>>> there will be reconfiguration from ETH_MQ_RX_VMDQ_RSS to other mode
>>> those value need to be re-evaluated. If you find this kind of limitation you
>>> should handle it during RSS part configuration. Or if your way is the right 
>>> way
>>> you should explicitly make separate function that will re-evaluate those
>>> parameters each time.
>>>
>>> Second issue with this code is that the nb_q_per_pool is changed from:
>>> ixgbe_pf_host_configure() -> ixgbe_dev_start() -> rte_eth_dev_start() and
>>> rte_eth_dev_check_vf_rss_rxq_num() -> rte_eth_dev_check_mq_mode() ->
>>> rte_eth_dev_configure()
>>>
>>> Which one is the right one? If both, why they are calculated twice?
>>>
>>> I don't think that the rte_eth_dev_data::sriov field should be changed at all -
>>> it holds the current SRIOV capabilities.
>>> If it can change during runtime, there is no point in having this field at all;
>>> there should instead be some kind of "siov_get()"
>>> function that calculates and returns those parameters dynamically.
>>>
>>> Please refer also to
>>>
>> >> .com>
>>> for further issues.
>>>
>>> I think this patchset should not be applied.
>> The better way should be either raise your comments before this patch is
>> merged into mainline, or
> Yes, I should but I trusted that Vlad review was covering this part.

I'm new on the list and my experience with DPDK is about two months, so
please don't judge me too harshly... ;)
I tried to cover the obvious things and actually learned the code while
reviewing. The things you say, Pavel(X?), make sense and I obviously missed
that.
But as Changchun mentioned, there is nothing that can't be fixed with
follow-up patches... ;)
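
As a small illustration of the constraint stated in the v4 changelog of the patch quoted above (the configured rxq number must be a power of 2 and no bigger than the max VF rxq number negotiated with the PF), here is a minimal sketch. The function name `vf_rss_rxq_num_valid` and its parameters are hypothetical and do not come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: nb_rx_q must be a non-zero power of 2 and must not
 * exceed max_vf_rxq (the queue count negotiated between guest and host).
 */
static bool
vf_rss_rxq_num_valid(uint16_t nb_rx_q, uint16_t max_vf_rxq)
{
	if (nb_rx_q == 0 || nb_rx_q > max_vf_rxq)
		return false;
	/* A power of 2 has exactly one bit set. */
	return (nb_rx_q & (nb_rx_q - 1)) == 0;
}
```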


> Doesn't matter,
> my fault.
>
>> You send out a patch to fix it.
>> I agree on part of what you said, the check is not necessary for vf rss in
>> pf_hos

[dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for cpu assignment

2015-01-22 Thread Ananyev, Konstantin

Hi Bruce,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> Sent: Thursday, January 22, 2015 12:19 PM
> To: Liang, Cunming
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for 
> cpu assignment
> 
> On Thu, Jan 22, 2015 at 04:16:25PM +0800, Cunming Liang wrote:
> > It supports one new eal long option '--lcores' for EAL thread cpuset 
> > assignment.
> >
> > The format pattern:
> > --lcores='lcores[@cpus]<,lcores[@cpus]>'
> > lcores, cpus could be a single digit or a group.
> > '(' and ')' are necessary if it's a group.
> > If not supply '@cpus', the value of cpus uses the same as lcores.
> >
> > e.g. '1,2@(5-7),(3-5)@(0,2),(0,6)' means starting 7 EAL thread as below
> >   lcore 0 runs on cpuset 0x41 (cpu 0,6)
> >   lcore 1 runs on cpuset 0x2 (cpu 1)
> >   lcore 2 runs on cpuset 0xe0 (cpu 5,6,7)
> >   lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2)
> >   lcore 6 runs on cpuset 0x41 (cpu 0,6)
> >
> 
> This strikes me as very confusing, though a couple of tweaks might help with
> readability. The lcore 0 at the end is especially confusing.

I didn't get you here: you find '(0,6)' confusing, right?
Because the parentheses implicitly specify the affinity for the group of enclosed lcores?

> Perhaps we can
> limit the allowed formats here,
> * require the lcore_id to be specified - the lack of an lcore id for the last 
> part
> makes having it as lcore 0 surprising.

Again, not sure I understand you properly:  lcore_id(s) are always specified 
explicitly. 
Physical cpus part might be omitted.

> * only allow one lcore id to be given for each set of cores.

So you mean that for '(3-5)@(0,2)' the user would have to write '3@(0,2),4@(0,2),5@(0,2)'?
I don't see a big difference here, but imagine you'd like to create a pool of 32
EAL threads running on the same cpu set.
With the current syntax it is just something like: '(32-63)@(0-7)'.
With what you are proposing it will be a very long list.
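
To make the size difference concrete, here is a minimal sketch that expands a grouped entry into the one-lcore-per-entry form. The helper `expand_lcore_group` is hypothetical and is not part of the EAL; it only formats strings.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "first@cpus,...,last@cpus" into buf.
 * Returns the number of characters written, or -1 if the range is
 * empty or buf is too small.
 */
static int
expand_lcore_group(unsigned first, unsigned last, const char *cpus,
		   char *buf, size_t len)
{
	size_t off = 0;
	unsigned id;

	if (first > last || len == 0)
		return -1;
	for (id = first; id <= last; id++) {
		int n = snprintf(buf + off, len - off, "%s%u@%s",
				 id == first ? "" : ",", id, cpus);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}
```

Expanding lcores 3 to 5 on "(0,2)" gives the 23-character string from the example above, while expanding 32 to 63 on "(0-7)" already needs 287 characters.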

> 
> I think it may still be readable if we allow the core set to be omitted if its
> to be the same as the lcore_id.

I think that is supported.
See lcore_id=1 in Steve's example above.
As I understand: --lcores='0,2,3-5' is equal to '-l 0,2,3-5' and to '-c 0x3d'.

Konstantin
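
The masks mentioned in this thread can be checked with a short sketch: each cpu id sets one bit, so cpus 0,2,3,4,5 give 0x3d and cpus 0,6 give 0x41. The helper name `cpus_to_mask` is made up for illustration and ids of 64 or more are simply ignored here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a bitmask with one bit set per listed
 * cpu id. Ids that do not fit into 64 bits are skipped.
 */
static uint64_t
cpus_to_mask(const unsigned *cpus, size_t n)
{
	uint64_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (cpus[i] < 64)
			mask |= UINT64_C(1) << cpus[i];
	return mask;
}
```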

> 
> It's probably still not going to be very tidy, but I think we can improve 
> things.
> 
> /Bruce
> 
> > Signed-off-by: Cunming Liang 
> > ---
> >  lib/librte_eal/common/eal_common_launch.c  |   1 -
> >  lib/librte_eal/common/eal_common_options.c | 262 
> > -
> >  lib/librte_eal/common/eal_options.h|   2 +
> >  lib/librte_eal/linuxapp/eal/Makefile   |   1 +
> >  4 files changed, 261 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/eal_common_launch.c 
> > b/lib/librte_eal/common/eal_common_launch.c
> > index 599f83b..2d732b1 100644
> > --- a/lib/librte_eal/common/eal_common_launch.c
> > +++ b/lib/librte_eal/common/eal_common_launch.c
> > @@ -117,4 +117,3 @@ rte_eal_mp_wait_lcore(void)
> > rte_eal_wait_lcore(lcore_id);
> > }
> >  }
> > -
> > diff --git a/lib/librte_eal/common/eal_common_options.c 
> > b/lib/librte_eal/common/eal_common_options.c
> > index e2810ab..fc47588 100644
> > --- a/lib/librte_eal/common/eal_common_options.c
> > +++ b/lib/librte_eal/common/eal_common_options.c
> > @@ -45,6 +45,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include "eal_internal_cfg.h"
> >  #include "eal_options.h"
> > @@ -85,6 +86,7 @@ eal_long_options[] = {
> > {OPT_XEN_DOM0, 0, 0, OPT_XEN_DOM0_NUM},
> > {OPT_CREATE_UIO_DEV, 1, NULL, OPT_CREATE_UIO_DEV_NUM},
> > {OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
> > +   {OPT_LCORES, 1, 0, OPT_LCORES_NUM},
> > {0, 0, 0, 0}
> >  };
> >
> > @@ -255,9 +257,11 @@ eal_parse_corelist(const char *corelist)
> > if (min == RTE_MAX_LCORE)
> > min = idx;
> > for (idx = min; idx <= max; idx++) {
> > -   cfg->lcore_role[idx] = ROLE_RTE;
> > -   lcore_config[idx].core_index = count;
> > -   count++;
> > +   if (cfg->lcore_role[idx] != ROLE_RTE) {
> > +   cfg->lcore_role[idx] = ROLE_RTE;
> > +   lcore_config[idx].core_index = count;
> > +   count++;
> > +   }
> > }
> > min = RTE_MAX_LCORE;
> > } else
> > @@ -289,6 +293,241 @@ eal_parse_master_lcore(const char *arg)
> > return 0;
> >  }
> >
> > +/*
> > + * Parse elem, the elem could be single number or '(' ')' group
> > + * Within group elem, '-' used for a range seperator;
> > + *',' used for a single number.
> > + */
> > +static int
> > +eal_parse_set(const char *input, uint16_t set[], unsigned num)
> > +{
> > +   unsigned idx;
> > +   const char

[dpdk-dev] [PATCH v6 4/4] docs: Add ABI documentation

2015-01-22 Thread Neil Horman

On Wed, Jan 21, 2015 at 11:24:12PM +0100, Thomas Monjalon wrote:
> 2015-01-21 14:43, Neil Horman:
> > On Wed, Jan 21, 2015 at 05:05:51PM +0100, Thomas Monjalon wrote:
> > > 2015-01-21 09:59, Neil Horman:
> > > > Considered and answered already.  I'm in favor of listing macros and 
> > > > structure
> > > > changes in the abi document, but I think an exhaustive list isn't 
> > > > needed.  If it
> > > > is, we could spend pages diving into minute.  Better to point out the 
> > > > need for
> > > > abi noticies as patches get posted.
> > > 
> > > I'm afraid you don't understand what I'm saying. Copy/paste:
> > > "No, I was suggesting to explain in this doc that macro removal must be
> > > announced with a deprecation notice,
> > > and that in case structure must be reworked, the name must change if we
> > > want to preserve ABI compatibility with old structure."
> > > Rewording: if you agree with this policy, please add it in this document.
> > > 
> > Yes, we're on the same page regarding what your asking, I just don't agree 
> > that
> > it needs to be explicitly called out.  I thought I was clear on that.
> > Appaerntly not however, so if it will settle the point, I'll just add it.
> 
> OK maybe I didn't explain enough my proposal.
> You can disagree but I want to be sure we think about the same thing.
> 
> 1) Macros are not part of the ABI but can be part of the API.
> Such macro removal must be announced in the previous release.
> 2) Structures are part of the ABI but cannot be versionned as the functions.
> So an ABI breaking change should be done by cloning the structure in a new 
> one.
> And the API functions where this structure appears should be cloned and 
> versionned
> to support new structure while keeping old version.
> 
> Maybe that these precisions are confuse and useless.
> Now I think I understand what you were saying by "an exhaustive list isn't 
> needed".
> You mean listing all types of ABI/API breakage like I did with these 2 cases, 
> right?
> I thought it was related to list of real/effective deprecations.
> 
> > > > > Neil, we expect that you consider comments done previously and that 
> > > > > you test your patch.
> > > > > Otherwise, we are losing time in useless reviews.
> > > > > 
> > > > Thomas, I have considered your comments, I simply don't agree with all 
> > > > of them,
> > > > and I made that clear.
> > > > 
> > > > As for losing time, you let the first attempt at this
> > > > patch rot on the list in 1.7 and have done the same thing for the 1.8 
> > > > cycle
> > > > until I yelled for reviews.
> > > 
> > > Now, I'm really upset of your wrong assumptions.
> > > You sent your first proposal on september, during 1.8 cycle, not 1.7 !
> > > And during this cycle, the decision was to postpone it for 2.0 release.
> > > 
> > you're missing the point. I apologize for not getting the release numbers 
> > right,
> > it should be 1.8 to 2.0 not 1.7 to 1.8 as you note, but that doesn't really
> > matter.  The point was 6 months.  6 months this has been sitting around.
> 
> No, 5 months. Yes, it's long.
> 
> > In that time up to this point I've gotten one review from another devloper 
> > on the
> > set, and you indicating that its not ready yet.  Then, the day 1.8 
> > released, I
> > reposed the patch series as we agreed, and its taken almost 5 weeks before 
> > I've
> > gotten any feedback on it, and then its feedback that could have been given 
> > 6
> > months ago (you'll note this patch was initially identical to the version I
> > posted back in september).  I think you can understand how I find that
> > frustrating.
> 
> You must understand that I'd prefer more people feel involved by this change.
> It would be saner to have this policy reviewed and acked by many developpers.
> As it was announced on the roadmap for 2.0, this first month of the cycle was
> ideal to have more discussions on how this policy can be precisely applied.
> You only received my comments (which may be useless) and it's now time to
> apply this important patchset.
> 
> > > I don't understand what's wrong with you.
> > The above is whats wrong with me.  The fact that I can try and try and try 
> > to
> > add value to this project so that I can expand its user base, and the best 
> > I've
> > thus far been able to receive is indifference.  At worst, the indifference 
> > is
> > followed by being told that the indifference is tantamount to rejection.
> > 
> > 
> > > You don't make any effort to understand what we are saying and
> > > you make no effort to understand what is this doc directory.
> > > You prefer crying that your patch is not applied.
> > No effort?  How many emails have I written contesting your opinions, 
> > presenting
> > supporting evidence, only to be met with assertions?  I don't think I'm the 
> > one
> > not making an effort here.
> 
> At the end, I accept your point of view and will apply the patchset.
> 
> > > And I still don't understand if you are willing to work on a test tool

[dpdk-dev] [PATCH v1 00/15] support multi-pthread per core

2015-01-22 Thread Ananyev, Konstantin



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> Sent: Thursday, January 22, 2015 8:16 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v1 00/15] support multi-pthread per core
> 
> The patch series contains enhancements to the EAL and fixes for libraries
> to run multiple pthreads (either EAL or non-EAL threads) per physical core.
> The two major changes are listed below:
> - Extend the core affinity of each EAL thread to 1:n.
>   Each lcore stands for an EAL thread rather than a logical core.
>   The change adds a new EAL option to allow static lcore-to-cpuset assignment.
>   An lcore (EAL thread) then has affinity to a cpuset; the original 1:1 mapping
>   is the special case.
> - Fix the libraries to allow running on any non-EAL thread.
>   This closes the gaps when running the libraries in a non-EAL thread
>   (dynamically created by the user).
>   Each fixed library takes care of the case rte_lcore_id() >= RTE_MAX_LCORE.
> 
> Thanks a million for the comments from Konstantin, Bruce, Mirek and Stephen 
> in RFC review.
> 
> Cunming Liang (15):
>   eal: add cpuset into per EAL thread lcore_config
>   eal: new eal option '--lcores' for cpu assignment
>   eal: add support parsing socket_id from cpuset
>   eal: new TLS definition and API declaration
>   eal: add eal_common_thread.c for common thread API
>   eal: add rte_gettid() to acquire unique system tid
>   eal: apply affinity of EAL thread by assigned cpuset
>   enic: fix re-define freebsd compile complain
>   malloc: fix the issue of SOCKET_ID_ANY
>   log: fix the gap to support non-EAL thread
>   eal: set _lcore_id and _socket_id to (-1) by default
>   eal: fix recursive spinlock in non-EAL thraed
>   mempool: add support to non-EAL thread
>   ring: add support to non-EAL thread
>   timer: add support to non-EAL thread
> 
>  lib/librte_eal/bsdapp/eal/Makefile |   1 +
>  lib/librte_eal/bsdapp/eal/eal.c|  13 +-
>  lib/librte_eal/bsdapp/eal/eal_lcore.c  |  14 ++
>  lib/librte_eal/bsdapp/eal/eal_memory.c |   2 +
>  lib/librte_eal/bsdapp/eal/eal_thread.c |  76 +++---
>  lib/librte_eal/common/eal_common_launch.c  |   1 -
>  lib/librte_eal/common/eal_common_log.c |  17 +-
>  lib/librte_eal/common/eal_common_options.c | 262 
> -
>  lib/librte_eal/common/eal_common_thread.c  | 142 +++
>  lib/librte_eal/common/eal_options.h|   2 +
>  lib/librte_eal/common/eal_thread.h |  66 ++
>  .../common/include/generic/rte_spinlock.h  |   4 +-
>  lib/librte_eal/common/include/rte_eal.h|  27 +++
>  lib/librte_eal/common/include/rte_lcore.h  |  37 ++-
>  lib/librte_eal/common/include/rte_log.h|   5 +
>  lib/librte_eal/linuxapp/eal/Makefile   |   4 +
>  lib/librte_eal/linuxapp/eal/eal.c  |   7 +-
>  lib/librte_eal/linuxapp/eal/eal_lcore.c|  15 ++
>  lib/librte_eal/linuxapp/eal/eal_thread.c   |  78 +++---
>  lib/librte_malloc/malloc_heap.h|   7 +-
>  lib/librte_mempool/rte_mempool.h   |  18 +-
>  lib/librte_pmd_enic/enic.h |   1 +
>  lib/librte_pmd_enic/enic_compat.h  |   1 +
>  lib/librte_ring/rte_ring.h |  10 +-
>  lib/librte_timer/rte_timer.c   |  40 +++-
>  lib/librte_timer/rte_timer.h   |   2 +-
>  26 files changed, 721 insertions(+), 131 deletions(-)
>  create mode 100644 lib/librte_eal/common/eal_common_thread.c
> 
> --

Acked-by: Konstantin Ananyev 

> 1.8.1.4

[dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL thread

2015-01-22 Thread Ananyev, Konstantin


Hi Miroslaw,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Walukiewicz, Miroslaw
> Sent: Thursday, January 22, 2015 12:45 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL 
> thread
> 
> 
> 
> > -Original Message-
> > From: Liang, Cunming
> > Sent: Thursday, January 22, 2015 1:20 PM
> > To: Walukiewicz, Miroslaw; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL
> > thread
> >
> >
> >
> > > -Original Message-
> > > From: Walukiewicz, Miroslaw
> > > Sent: Thursday, January 22, 2015 5:53 PM
> > > To: Liang, Cunming; dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-
> > EAL
> > > thread
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> > > > Sent: Thursday, January 22, 2015 9:17 AM
> > > > To: dev at dpdk.org
> > > > Subject: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL
> > > > thread
> > > >
> > > > For non-EAL thread, bypass per lcore cache, directly use ring pool.
> > > > It allows using rte_mempool in either EAL thread or any user pthread.
> > > > As in non-EAL thread, it directly rely on rte_ring and it's none 
> > > > preemptive.
> > > > It doesn't suggest to run multi-pthread/cpu which compete the
> > > > rte_mempool.
> > > > It will get bad performance and has critical risk if scheduling policy 
> > > > is RT.
> > > >
> > > > Signed-off-by: Cunming Liang 
> > > > ---
> > > >  lib/librte_mempool/rte_mempool.h | 18 +++---
> > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/lib/librte_mempool/rte_mempool.h
> > > > b/lib/librte_mempool/rte_mempool.h
> > > > index 3314651..4845f27 100644
> > > > --- a/lib/librte_mempool/rte_mempool.h
> > > > +++ b/lib/librte_mempool/rte_mempool.h
> > > > @@ -198,10 +198,12 @@ struct rte_mempool {
> > > >   *   Number to add to the object-oriented statistics.
> > > >   */
> > > >  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
> > > > > -#define __MEMPOOL_STAT_ADD(mp, name, n) do {   \
> > > > -   unsigned __lcore_id = rte_lcore_id();   \
> > > > -   mp->stats[__lcore_id].name##_objs += n; \
> > > > -   mp->stats[__lcore_id].name##_bulk += 1; \
> > > > +#define __MEMPOOL_STAT_ADD(mp, name, n) do {\
> > > > +   unsigned __lcore_id = rte_lcore_id();   \
> > > > +   if (__lcore_id < RTE_MAX_LCORE) {   \
> > > > +   mp->stats[__lcore_id].name##_objs += n; \
> > > > +   mp->stats[__lcore_id].name##_bulk += 1; \
> > > > +   }   \
> > > > } while(0)
> > > >  #else
> > > >  #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0)
> > > > @@ -767,8 +769,9 @@ __mempool_put_bulk(struct rte_mempool *mp,
> > > > void * const *obj_table,
> > > > __MEMPOOL_STAT_ADD(mp, put, n);
> > > >
> > > >  #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> > > > -   /* cache is not enabled or single producer */
> > > > -   if (unlikely(cache_size == 0 || is_mp == 0))
> > > > +   /* cache is not enabled or single producer or none EAL thread */
> > >
> > > I don't understand this limitation.
> > >
> > > I see that the rte_membuf.h defines table per RTE_MAX_LCORE like below
> > > #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> > > /** Per-lcore local cache. */
> > > struct rte_mempool_cache local_cache[RTE_MAX_LCORE];
> > > #endif
> > >
> > > But why we cannot extent the size of the local cache table to something
> > like
> > > RTE_MAX_THREADS that does not exceed max value of rte_lcore_id()
> > >
> > > Keeping this condition here is a  real performance killer!!.
> > > I saw in my test application spending more 95% of CPU time reading the
> > atomic
> > > in M C/MP ring utilizing access to mempool.
> > [Liang, Cunming] This is the first step to make it work.
> > By Konstantin's comments, shall prevent to allocate unique id by ourselves.
> > And the return value from gettid() is too large as an index.
> > For non-EAL thread performance gap, will think about additional fix patch
> > here.
> > If care about performance, still prefer to choose EAL thread now.
> 
> In previous patch you had allocation of the thread id on base of unique 
> gettid() as number
> not a potential pointer as we can expect from implementation getid() from 
> Linux or BSD.

I am really puzzled by your sentence above.
What 'potential pointer' are you talking about?
rte_lcore_id() returns an unsigned 32-bit integer (as it always did).
_lcore_id for each EAL thread is assigned at rte_eal_init().
For an EAL thread the _lcore_id value is in the interval [0, RTE_MAX_LCORE) and
it is up to the user to make sure that each _lcore_id is unique inside DPDK 
MultiProcess grou

[dpdk-dev] [PATCH v3 00/18] ACL: New AVX2 classify method and several other enhancements.

2015-01-22 Thread Neil Horman

On Tue, Jan 20, 2015 at 06:40:49PM +, Konstantin Ananyev wrote:
> v3 changes:
> Applied review comments from Thomas:
> - fix spelling errors reported by codespell.
> - split last patch into two:
> first to remove unused macros,
> second to add some comments about ACL internal layout.
> 
> v2 changes:
> - When build with the compilers that don't support AVX2 instructions,
> make rte_acl_classify_avx2() do nothing and return an error.
> - Remove unneeded 'ifdef __AVX2__' in acl_run_avx2.*.
> - Reorder order of patches in the set, to keep RTE_LIBRTE_ACL_STANDALONE=y
> always buildable.
> 
> This patch series contains several fixes and enhancements for the ACL library.
> See the complete list below.
> Two main changes are externally visible:
> - Introduce a new classify method: RTE_ACL_CLASSIFY_AVX2.
> It uses AVX2 instructions and 256-bit-wide data types
> to perform internal trie traversal.
> That helps to increase classify() throughput.
> This method is selected as the default on CPUs that support AVX2.
> - Introduce a new field in the build config structure: max_size.
> It specifies the maximum size that the internal RT structure for a given
> context can reach.
> The purpose of that is to allow the user to decide on the space/performance
> trade-off (faster classify() vs. less space for RT internal structures)
> for each given set of rules.
> 
> Konstantin Ananyev (18):
>   fix compilation issues with RTE_LIBRTE_ACL_STANDALONE=y
>   app/test: few small fixes for test_acl.c
>   librte_acl: make data_indexes long enough to survive idle transitions.
>   librte_acl: remove build phase heuristic with negative performance
> effect.
>   librte_acl: fix a bug at build phase that can cause matches being
> overwritten.
>   librte_acl: introduce DFA nodes compression (group64) for identical
> entries.
>   librte_acl: build/gen phase - simplify the way match nodes are
> allocated.
>   librte_acl: make scalar RT code to be more similar to vector one.
>   librte_acl: a bit of RT code deduplication.
>   EAL: introduce rte_ymm and relatives in rte_common_vect.h.
>   librte_acl: add AVX2 as new rte_acl_classify() method
>   test-acl: add ability to manually select RT method.
>   librte_acl: Remove search_sse_2 and relatives.
>   librte_acl: move lo/hi dwords shuffle out from calc_addr
>   librte_acl: make calc_addr a define to deduplicate the code.
>   librte_acl: introduce max_size into rte_acl_config.
>   librte_acl: remove unused macros.
>   librte_acl: add some comments about ACL internal layout.
> 
>  app/test-acl/main.c | 126 +++--
>  app/test/test_acl.c |   8 +-
>  examples/l3fwd-acl/main.c   |   3 +-
>  examples/l3fwd/main.c   |   2 +-
>  lib/librte_acl/Makefile |  18 +
>  lib/librte_acl/acl.h|  58 ++-
>  lib/librte_acl/acl_bld.c| 392 +++-
>  lib/librte_acl/acl_gen.c| 268 +++
>  lib/librte_acl/acl_run.h|   7 +-
>  lib/librte_acl/acl_run_avx2.c   |  54 +++
>  lib/librte_acl/acl_run_avx2.h   | 284 
>  lib/librte_acl/acl_run_scalar.c |  65 ++-
>  lib/librte_acl/acl_run_sse.c| 585 
> +---
>  lib/librte_acl/acl_run_sse.h| 357 +++
>  lib/librte_acl/acl_vect.h   | 132 +++---
>  lib/librte_acl/rte_acl.c|  47 +-
>  lib/librte_acl/rte_acl.h|   4 +
>  lib/librte_acl/rte_acl_osdep_alone.h|  47 +-
>  lib/librte_eal/common/include/rte_common_vect.h |  39 +-
>  lib/librte_lpm/rte_lpm.h|   2 +-
>  20 files changed, 1444 insertions(+), 1054 deletions(-)
>  create mode 100644 lib/librte_acl/acl_run_avx2.c
>  create mode 100644 lib/librte_acl/acl_run_avx2.h
>  create mode 100644 lib/librte_acl/acl_run_sse.h
> 
> -- 
> 1.8.5.3
> 
> 
I'm sorry I've not looked at this yet, Konstantin. I'm trying to get to it soon.
Neil

[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-22 Thread Jay Rolette

On Thu, Jan 22, 2015 at 12:27 PM, Luke Gorrie  wrote:

> On 22 January 2015 at 14:29, Jay Rolette  wrote:
>
>> Microseconds matter. Scaling up to 100GbE, nanoseconds matter.
>>
>
> True. Is there a cut-off point though?
>

There are always engineering trade-offs that have to be made. If I'm
optimizing something today, I'm certainly not starting at something that
takes 1ns for an app that is doing L4-7 processing. It's all about
profiling and figuring out where the bottlenecks are.

For past networking products I've built, there was a lot of traffic that
the software didn't have to do much to. Minimal L2/L3 checks, then forward
the packet. It didn't even have to parse the headers because that was
offloaded on an FPGA. The only way to make those packets faster was to turn
them around in the FPGA and not send them to the CPU at all. That change
improved small packet performance by ~30%. That was on high-end network
processors that are significantly faster than Intel processors for packet
handling.

It seems to be a strange thing when you realize that just getting the
packets into the CPU is expensive, never mind what you do with them after
that.

> Does one nanosecond matter?
>

You just have to be careful when talking about things like a nanosecond.
It sounds really small, but IPG for a 10G link is only 9.6ns. It's all
relative.
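
For reference, the 9.6ns figure can be checked with a little arithmetic. A
minimal sketch, assuming the standard Ethernet inter-packet gap of 12 byte
times (96 bit times) and an 8-byte preamble; the helper name wire_time_ps is
made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Ethernet framing overheads, in bits. */
#define IPG_BITS       96u          /* 12 byte times between frames */
#define PREAMBLE_BITS  64u          /* 7-byte preamble + 1-byte SFD */
#define MIN_FRAME_BITS (64u * 8u)   /* minimum frame, including CRC */

/*
 * Time needed to put `bits` on a link running at `gbps` gigabits per
 * second, in picoseconds. One bit at 1 Gb/s takes 1000 ps.
 */
static uint64_t
wire_time_ps(uint64_t bits, uint64_t gbps)
{
	assert(gbps > 0);
	return bits * 1000u / gbps;
}
```

With these assumptions the gap alone costs 9600 ps at 10 Gb/s, and a
minimum-size frame plus preamble and gap occupies the wire for 67200 ps, so a
single nanosecond is already more than 1% of the per-packet budget at that
rate.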

> AVX512 will fit a 64-byte packet in one register and move that to or from
> memory with one instruction. L1/L2 cache bandwidth per server is growing on
> a double-exponential curve (both bandwidth per core and cores per CPU). I
> wonder if moving data around in cache will soon be too cheap for us to
> justify worrying about.
>

Adding cores helps with aggregate performance, but doesn't really help with
latency on a single packet. That said, I'll take advantage of anything I
can from the hardware to either let me scale up how much traffic I can
handle or the amount of features I can add at the same performance level!

Jay

[dpdk-dev] some questions about rte_memcpy

2015-01-22 Thread Linhaifeng



On 2015/1/22 12:45, Matthew Hall wrote:
> One theory. Many DPDK functions crash if they are called before 
> rte_eal_init() 
> is called. So perhaps this could be a cause, since that won't have been 
> called 
> when working on a constant

Hi, Matthew

Thank you for your response.

Do you mean that calling rte_memcpy before rte_eal_init() would crash? Why?
-- 
Regards,
Haifeng

[dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS

2015-01-22 Thread Wodkowski, PawelX

> 
> I'm new on the list and my experience with DPDK is about two months so,
> pls., don't judge me too harsh... ;)
> I tried to cover the obvious things and actually learned the code while
> reviewing. The things u say, Pavel(X?) make sense and I obviously missed

I am really puzzled about the mail client I have to use. It is really stubborn
about using my correct name :P

> that.
> But as Changchun mentioned there is nothing that can't be fixed with a
> followup patches... ;)
> 
Roger that :P 
No judging, I should also have looked at those patches before they were acked.

Waiting for fixes.

Pawel

[dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL thread

2015-01-22 Thread Walukiewicz, Miroslaw



> -Original Message-
> From: Liang, Cunming
> Sent: Thursday, January 22, 2015 1:20 PM
> To: Walukiewicz, Miroslaw; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL
> thread
> 
> 
> 
> > -Original Message-
> > From: Walukiewicz, Miroslaw
> > Sent: Thursday, January 22, 2015 5:53 PM
> > To: Liang, Cunming; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-
> EAL
> > thread
> >
> >
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> > > Sent: Thursday, January 22, 2015 9:17 AM
> > > To: dev at dpdk.org
> > > Subject: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL
> > > thread
> > >
> > > For non-EAL thread, bypass per lcore cache, directly use ring pool.
> > > It allows using rte_mempool in either EAL thread or any user pthread.
> > > As in non-EAL thread, it directly rely on rte_ring and it's none 
> > > preemptive.
> > > It doesn't suggest to run multi-pthread/cpu which compete the
> > > rte_mempool.
> > > It will get bad performance and has critical risk if scheduling policy is 
> > > RT.
> > >
> > > Signed-off-by: Cunming Liang 
> > > ---
> > >  lib/librte_mempool/rte_mempool.h | 18 +++---
> > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/lib/librte_mempool/rte_mempool.h
> > > b/lib/librte_mempool/rte_mempool.h
> > > index 3314651..4845f27 100644
> > > --- a/lib/librte_mempool/rte_mempool.h
> > > +++ b/lib/librte_mempool/rte_mempool.h
> > > @@ -198,10 +198,12 @@ struct rte_mempool {
> > >   *   Number to add to the object-oriented statistics.
> > >   */
> > >  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
> > > -#define __MEMPOOL_STAT_ADD(mp, name, n) do {   \
> > > - unsigned __lcore_id = rte_lcore_id();   \
> > > - mp->stats[__lcore_id].name##_objs += n; \
> > > - mp->stats[__lcore_id].name##_bulk += 1; \
> > > +#define __MEMPOOL_STAT_ADD(mp, name, n) do {\
> > > + unsigned __lcore_id = rte_lcore_id();   \
> > > + if (__lcore_id < RTE_MAX_LCORE) {   \
> > > + mp->stats[__lcore_id].name##_objs += n; \
> > > + mp->stats[__lcore_id].name##_bulk += 1; \
> > > + }   \
> > >   } while(0)
> > >  #else
> > >  #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0)
> > > @@ -767,8 +769,9 @@ __mempool_put_bulk(struct rte_mempool *mp,
> > > void * const *obj_table,
> > >   __MEMPOOL_STAT_ADD(mp, put, n);
> > >
> > >  #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> > > - /* cache is not enabled or single producer */
> > > - if (unlikely(cache_size == 0 || is_mp == 0))
> > > + /* cache is not enabled or single producer or none EAL thread */
> >
> > I don't understand this limitation.
> >
> > I see that the rte_membuf.h defines table per RTE_MAX_LCORE like below
> > #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> > /** Per-lcore local cache. */
> > struct rte_mempool_cache local_cache[RTE_MAX_LCORE];
> > #endif
> >
> > But why we cannot extent the size of the local cache table to something
> like
> > RTE_MAX_THREADS that does not exceed max value of rte_lcore_id()
> >
> > Keeping this condition here is a  real performance killer!!.
> > I saw in my test application spending more 95% of CPU time reading the
> atomic
> > in M C/MP ring utilizing access to mempool.
> [Liang, Cunming] This is the first step to make it work.
> By Konstantin's comments, shall prevent to allocate unique id by ourselves.
> And the return value from gettid() is too large as an index.
> For non-EAL thread performance gap, will think about additional fix patch
> here.
> If care about performance, still prefer to choose EAL thread now.

In the previous patch you allocated the thread id as a number, based on the
unique gettid(), not as a potential pointer, which is what we can expect from
the implementation of gettid() on Linux or BSD.

Another problem is that here we compare an int with some unique thread
identifier.
How can you prevent problems if the implementation of gettid() changes and the
unique thread identifier becomes less than RTE_MAX_LCORE while still being
unique?

I think that your assumption will work for well-known operating systems but
will be very unportable.

Regarding performance, DPDK can work efficiently in different environments,
including pthreads.
You can imagine running DPDK from a pthread application where affinity is set
by the application.
Effectiveness then depends on the application's thread implementation and can
be comparable to EAL threads.

I think that this is a goal for this change.

> >
> > Same comment for get operation below
> >
> > > + if (unlikely(cache_size == 0 || is_mp == 0 ||
> > > +  lcore_id >= RTE_MAX_LCORE))
> > >   goto ring_enqueue;
> > >
> > >   /* Go straight to ring if put would overflow mem allocated for cache
> > > */

[dpdk-dev] [PATCH v1 15/15] timer: add support to non-EAL thread

2015-01-22 Thread Liang, Cunming



> -Original Message-
> From: Walukiewicz, Miroslaw
> Sent: Thursday, January 22, 2015 5:58 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v1 15/15] timer: add support to non-EAL thread
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> > Sent: Thursday, January 22, 2015 9:17 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v1 15/15] timer: add support to non-EAL thread
> >
> > Allow timers to be set up only for EAL (lcore) threads (__lcore_id <
> > MAX_LCORE_ID).
> > E.g. a dynamically created thread will be able to reset/stop a timer for an
> > lcore thread, but it will not be allowed to set up a timer for itself or
> > another non-lcore thread.
> > rte_timer_manage() for a non-lcore thread would simply do nothing and
> > return straight away.
> > Signed-off-by: Cunming Liang 
> > ---
> >  lib/librte_timer/rte_timer.c | 40 +++
> > -
> >  lib/librte_timer/rte_timer.h |  2 +-
> >  2 files changed, 32 insertions(+), 10 deletions(-)
> >
> > diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
> > index 269a992..601c159 100644
> > --- a/lib/librte_timer/rte_timer.c
> > +++ b/lib/librte_timer/rte_timer.c
> > @@ -79,9 +79,10 @@ static struct priv_timer priv_timer[RTE_MAX_LCORE];
> >
> 
> Why not extend the priv_timer size to value being in range returned by
> rte_lcore_id().
> 
> All timer stuff will work automatically after such change without any change 
> in
> timer logic including stats.
[Liang, Cunming] The reason is the same as for mempool.
The first step is not expected to involve dynamic unique id allocation for
user threads.
A secondary process that fails would not release its reserved id, which could
cause an unexpected leak.
So I will look for another approach to improve the libraries in the next step.
> 
> >  /* when debug is enabled, store some statistics */
> >  #ifdef RTE_LIBRTE_TIMER_DEBUG
> > -#define __TIMER_STAT_ADD(name, n) do { \
> > -   unsigned __lcore_id = rte_lcore_id();   \
> > -   priv_timer[__lcore_id].stats.name += (n);   \
> > +#define __TIMER_STAT_ADD(name, n) do { \
> > +   unsigned __lcore_id = rte_lcore_id();   \
> > +   if (__lcore_id < RTE_MAX_LCORE) \
> > +   priv_timer[__lcore_id].stats.name += (n);   \
> > } while(0)
> >  #else
> >  #define __TIMER_STAT_ADD(name, n) do {} while(0)
> > @@ -127,15 +128,26 @@ timer_set_config_state(struct rte_timer *tim,
> > unsigned lcore_id;
> >
> > lcore_id = rte_lcore_id();
> > +   if (lcore_id >= RTE_MAX_LCORE)
> > +   lcore_id = LCORE_ID_ANY;
> >
> > /* wait that the timer is in correct status before update,
> >  * and mark it as being configured */
> > while (success == 0) {
> > prev_status.u32 = tim->status.u32;
> >
> > +   /*
> > +* prevent race condition of non-EAL threads
> > +* to update the timer. When 'owner == LCORE_ID_ANY',
> > +* it means updated by a non-EAL thread.
> > +*/
> > +   if (lcore_id == (unsigned)LCORE_ID_ANY &&
> > +   (uint16_t)lcore_id == prev_status.owner)
> > +   return -1;
> > +
> > /* timer is running on another core, exit */
> > if (prev_status.state == RTE_TIMER_RUNNING &&
> > -   (unsigned)prev_status.owner != lcore_id)
> > +   prev_status.owner != (uint16_t)lcore_id)
> > return -1;
> >
> > /* timer is being configured on another core */
> > @@ -366,9 +378,13 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t
> > expire,
> >
> > /* round robin for tim_lcore */
> > if (tim_lcore == (unsigned)LCORE_ID_ANY) {
> > -   tim_lcore = rte_get_next_lcore(priv_timer[lcore_id].prev_lcore,
> > -  0, 1);
> > -   priv_timer[lcore_id].prev_lcore = tim_lcore;
> > +   if (lcore_id < RTE_MAX_LCORE) {
> > +   tim_lcore = rte_get_next_lcore(
> > +   priv_timer[lcore_id].prev_lcore,
> > +   0, 1);
> > +   priv_timer[lcore_id].prev_lcore = tim_lcore;
> > +   } else
> > +   tim_lcore = rte_get_next_lcore(LCORE_ID_ANY, 0, 1);
> > }
> >
> > /* wait that the timer is in correct status before update,
> > @@ -378,7 +394,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t
> > expire,
> > return -1;
> >
> > __TIMER_STAT_ADD(reset, 1);
> > -   if (prev_status.state == RTE_TIMER_RUNNING) {
> > +   if (prev_status.state == RTE_TIMER_RUNNING &&
> > +   lcore_id < RTE_MAX_LCORE) {
> > priv_timer[lcore_id].updated = 1;
> > }
> >
> > @@ -455,7 +472,8 @@ rte_timer_stop(struct rte_timer *tim)
> > return -1;
> >
> > __TIMER_STAT_A

[dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL thread

2015-01-22 Thread Liang, Cunming



> -Original Message-
> From: Walukiewicz, Miroslaw
> Sent: Thursday, January 22, 2015 5:53 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL
> thread
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> > Sent: Thursday, January 22, 2015 9:17 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL
> > thread
> >
> > For non-EAL thread, bypass per lcore cache, directly use ring pool.
> > It allows using rte_mempool in either EAL thread or any user pthread.
> > As in non-EAL thread, it directly rely on rte_ring and it's none preemptive.
> > It doesn't suggest to run multi-pthread/cpu which compete the
> > rte_mempool.
> > It will get bad performance and has critical risk if scheduling policy is 
> > RT.
> >
> > Signed-off-by: Cunming Liang 
> > ---
> >  lib/librte_mempool/rte_mempool.h | 18 +++---
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/lib/librte_mempool/rte_mempool.h
> > b/lib/librte_mempool/rte_mempool.h
> > index 3314651..4845f27 100644
> > --- a/lib/librte_mempool/rte_mempool.h
> > +++ b/lib/librte_mempool/rte_mempool.h
> > @@ -198,10 +198,12 @@ struct rte_mempool {
> >   *   Number to add to the object-oriented statistics.
> >   */
> >  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
> > -#define __MEMPOOL_STAT_ADD(mp, name, n) do {   \
> > -   unsigned __lcore_id = rte_lcore_id();   \
> > -   mp->stats[__lcore_id].name##_objs += n; \
> > -   mp->stats[__lcore_id].name##_bulk += 1; \
> > +#define __MEMPOOL_STAT_ADD(mp, name, n) do {\
> > +   unsigned __lcore_id = rte_lcore_id();   \
> > +   if (__lcore_id < RTE_MAX_LCORE) {   \
> > +   mp->stats[__lcore_id].name##_objs += n; \
> > +   mp->stats[__lcore_id].name##_bulk += 1; \
> > +   }   \
> > } while(0)
> >  #else
> >  #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0)
> > @@ -767,8 +769,9 @@ __mempool_put_bulk(struct rte_mempool *mp,
> > void * const *obj_table,
> > __MEMPOOL_STAT_ADD(mp, put, n);
> >
> >  #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> > -   /* cache is not enabled or single producer */
> > -   if (unlikely(cache_size == 0 || is_mp == 0))
> > +   /* cache is not enabled or single producer or none EAL thread */
> 
> I don't understand this limitation.
> 
> I see that the rte_membuf.h defines table per RTE_MAX_LCORE like below
> #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> /** Per-lcore local cache. */
> struct rte_mempool_cache local_cache[RTE_MAX_LCORE];
> #endif
> 
> But why we cannot extent the size of the local cache table to something like
> RTE_MAX_THREADS that does not exceed max value of rte_lcore_id()
> 
> Keeping this condition here is a  real performance killer!!.
> I saw in my test application spending more 95% of CPU time reading the atomic
> in M C/MP ring utilizing access to mempool.
[Liang, Cunming] This is the first step to make it work.
Following Konstantin's comments, we should avoid allocating a unique id
ourselves.
And the return value from gettid() is too large to use as an index.
For the non-EAL thread performance gap, I will think about an additional fix
patch.
If performance matters, it is still preferable to use an EAL thread for now.
> 
> Same comment for get operation below
> 
> > +   if (unlikely(cache_size == 0 || is_mp == 0 ||
> > +lcore_id >= RTE_MAX_LCORE))
> > goto ring_enqueue;
> >
> > /* Go straight to ring if put would overflow mem allocated for cache
> > */
> > @@ -952,7 +955,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void
> > **obj_table,
> > uint32_t cache_size = mp->cache_size;
> >
> > /* cache is not enabled or single consumer */
> > -   if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size))
> > +   if (unlikely(cache_size == 0 || is_mc == 0 ||
> > +n >= cache_size || lcore_id >= RTE_MAX_LCORE))
> > goto ring_dequeue;
> >
> > cache = &mp->local_cache[lcore_id];
> > --
> > 1.8.1.4
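
The effect of the lcore_id checks in the patch quoted above can be sketched in
isolation. This is a simplified model, not the rte_mempool code: the struct
layout, MAX_LCORE, LCORE_ID_NONE and the function names are assumptions made
for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LCORE     8u          /* stand-in for RTE_MAX_LCORE */
#define LCORE_ID_NONE UINT32_MAX  /* stand-in for the (-1) id of a non-EAL thread */

struct pool_stats {
	uint64_t put_objs;
	uint64_t put_bulk;
};

struct pool {
	unsigned cache_size;                 /* 0 means the cache is disabled */
	struct pool_stats stats[MAX_LCORE];  /* one slot per lcore */
};

/* Count a put only when the caller owns a valid per-lcore slot. */
static void
pool_stat_add_put(struct pool *p, unsigned lcore_id, unsigned n)
{
	assert(p != NULL);
	if (lcore_id < MAX_LCORE) {
		p->stats[lcore_id].put_objs += n;
		p->stats[lcore_id].put_bulk += 1;
	}
}

/*
 * Return 1 if a put may use the per-lcore cache, 0 if it has to go
 * straight to the ring (cache disabled, single producer, or a thread
 * without a valid lcore id).
 */
static int
pool_put_uses_cache(const struct pool *p, unsigned lcore_id, int is_mp)
{
	assert(p != NULL);
	return p->cache_size != 0 && is_mp != 0 && lcore_id < MAX_LCORE;
}
```

In this model a thread whose id is LCORE_ID_NONE never touches the per-lcore
arrays and always takes the ring path, which is the performance cost raised in
the review comments.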

[dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for cpu assignment

2015-01-22 Thread Bruce Richardson

On Thu, Jan 22, 2015 at 04:16:25PM +0800, Cunming Liang wrote:
> It supports one new eal long option '--lcores' for EAL thread cpuset 
> assignment.
> 
> The format pattern:
>   --lcores='lcores[@cpus]<,lcores[@cpus]>'
> lcores, cpus could be a single digit or a group.
> '(' and ')' are necessary if it's a group.
> If '@cpus' is not supplied, cpus takes the same value as lcores.
> 
> e.g. '1,2@(5-7),(3-5)@(0,2),(0,6)' means starting 7 EAL threads as below
>   lcore 0 runs on cpuset 0x41 (cpu 0,6)
>   lcore 1 runs on cpuset 0x2 (cpu 1)
>   lcore 2 runs on cpuset 0xe0 (cpu 5,6,7)
>   lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2)
>   lcore 6 runs on cpuset 0x41 (cpu 0,6)
> 

This strikes me as very confusing, though a couple of tweaks might help with
readability. The lcore 0 at the end is especially confusing. Perhaps we can
limit the allowed formats here:
* require the lcore_id to be specified - the lack of an lcore id for the last
  part makes having it as lcore 0 surprising.
* only allow one lcore id to be given for each set of cores.

I think it may still be readable if we allow the core set to be omitted if it's
to be the same as the lcore_id.

It's probably still not going to be very tidy, but I think we can improve
things.

/Bruce
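
To make the quoted format concrete, here is a minimal standalone sketch of how
one 'lcores[@cpus]' entry could be turned into bitmasks. It is not the parser
from the patch: the names parse_set and parse_lcores_entry are made up, ids
are limited to 0..63 so that a set fits in a uint64_t, and whitespace handling
is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse one element: a single number ("1") or a bracketed group such as
 * "(5-7)" or "(0,2)". Returns the ids as a bitmask and stores the position
 * just past the element in *endp. Returns 0 on a syntax error.
 */
static uint64_t
parse_set(const char *s, const char **endp)
{
	uint64_t mask = 0;
	unsigned long lo, hi;
	char *end;
	int in_group = 0;

	assert(s != NULL && endp != NULL);
	if (*s == '(') {
		in_group = 1;
		s++;
	}
	for (;;) {
		if (!isdigit((unsigned char)*s))
			return 0;
		lo = hi = strtoul(s, &end, 10);
		s = end;
		if (in_group && *s == '-') {	/* range inside a group */
			s++;
			if (!isdigit((unsigned char)*s))
				return 0;
			hi = strtoul(s, &end, 10);
			s = end;
		}
		if (lo > hi || hi > 63)
			return 0;
		for (; lo <= hi; lo++)
			mask |= UINT64_C(1) << lo;
		if (!in_group || *s != ',')
			break;
		s++;				/* next member of the group */
	}
	if (in_group && *s++ != ')')
		return 0;
	*endp = s;
	return mask;
}

/* Parse "lcores[@cpus]"; cpus defaults to lcores when '@cpus' is omitted. */
static int
parse_lcores_entry(const char *s, uint64_t *lcores, uint64_t *cpus,
		   const char **endp)
{
	*lcores = parse_set(s, &s);
	if (*lcores == 0)
		return -1;
	*cpus = *lcores;
	if (*s == '@') {
		*cpus = parse_set(s + 1, &s);
		if (*cpus == 0)
			return -1;
	}
	*endp = s;
	return 0;
}
```

Walking the example string '1,2@(5-7),(3-5)@(0,2),(0,6)' entry by entry with
this sketch gives cpu masks 0x2, 0xe0, 0x5 and 0x41, matching the list in the
commit message. The last entry sets lcores 0 and 6 at once, which is where the
surprising lcore 0 comes from.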

> Signed-off-by: Cunming Liang 
> ---
>  lib/librte_eal/common/eal_common_launch.c  |   1 -
>  lib/librte_eal/common/eal_common_options.c | 262 
> -
>  lib/librte_eal/common/eal_options.h|   2 +
>  lib/librte_eal/linuxapp/eal/Makefile   |   1 +
>  4 files changed, 261 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_launch.c 
> b/lib/librte_eal/common/eal_common_launch.c
> index 599f83b..2d732b1 100644
> --- a/lib/librte_eal/common/eal_common_launch.c
> +++ b/lib/librte_eal/common/eal_common_launch.c
> @@ -117,4 +117,3 @@ rte_eal_mp_wait_lcore(void)
>   rte_eal_wait_lcore(lcore_id);
>   }
>  }
> -
> diff --git a/lib/librte_eal/common/eal_common_options.c 
> b/lib/librte_eal/common/eal_common_options.c
> index e2810ab..fc47588 100644
> --- a/lib/librte_eal/common/eal_common_options.c
> +++ b/lib/librte_eal/common/eal_common_options.c
> @@ -45,6 +45,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "eal_internal_cfg.h"
>  #include "eal_options.h"
> @@ -85,6 +86,7 @@ eal_long_options[] = {
>   {OPT_XEN_DOM0, 0, 0, OPT_XEN_DOM0_NUM},
>   {OPT_CREATE_UIO_DEV, 1, NULL, OPT_CREATE_UIO_DEV_NUM},
>   {OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
> + {OPT_LCORES, 1, 0, OPT_LCORES_NUM},
>   {0, 0, 0, 0}
>  };
>  
> @@ -255,9 +257,11 @@ eal_parse_corelist(const char *corelist)
>   if (min == RTE_MAX_LCORE)
>   min = idx;
>   for (idx = min; idx <= max; idx++) {
> - cfg->lcore_role[idx] = ROLE_RTE;
> - lcore_config[idx].core_index = count;
> - count++;
> + if (cfg->lcore_role[idx] != ROLE_RTE) {
> + cfg->lcore_role[idx] = ROLE_RTE;
> + lcore_config[idx].core_index = count;
> + count++;
> + }
>   }
>   min = RTE_MAX_LCORE;
>   } else
> @@ -289,6 +293,241 @@ eal_parse_master_lcore(const char *arg)
>   return 0;
>  }
>  
> +/*
> + * Parse elem, the elem could be single number or '(' ')' group
> + * Within group elem, '-' used for a range seperator;
> + *',' used for a single number.
> + */
> +static int
> +eal_parse_set(const char *input, uint16_t set[], unsigned num)
> +{
> + unsigned idx;
> + const char *str = input;
> + char *end = NULL;
> + unsigned min, max;
> +
> + memset(set, 0, num * sizeof(uint16_t));
> +
> + while (isblank(*str))
> + str++;
> +
> + /* only digit or left bracket is qulify for start point */
> + if ((!isdigit(*str) && *str != '(') || *str == '\0')
> + return -1;
> +
> + /* process single number */
> + if (*str != '(') {
> + errno = 0;
> + idx = strtoul(str, &end, 10);
> + if (errno || end == NULL || idx >= num)
> + return -1;
> + else {
> + while (isblank(*end))
> + end++;
> +
> + if (*end != ',' && *end != '\0' &&
> + *end != '@')
> + return -1;
> +
> + set[idx] = 1;
> + return end - input;
> + }
> + }
> +
> + /* process set within bracket */
> + str++;
> + while (isblank(*str))
> + str++;
> + if (*str == '\0')
> + return -1;
> +
> + min = RTE_MAX_LCORE;
> + do {
> +
> + /* go ahead to the fi

[dpdk-dev] [PATCH] doc: commands changed in testpmd_funcs for ethertype filter

2015-01-22 Thread Thomas Monjalon

Hi Jingjing,

A few comments about how to send a v2 patch (as described in
http://dpdk.org/dev#send):
--subject-prefix 'PATCH v2' to make it clear that it supersedes a previous
  version
--annotate to add a changelog below the three dashes
--in-reply-to  to have the versions threaded together
http://dpdk.org/dev/patchwork to set the previous version as superseded

2015-01-22 09:09, Jingjing Wu:
> new commands for ethertype filter
>   - ethertype_filter (port_id) (add|del) (mac_addr|mac_ignr)
> (mac_address) ethertype (ether_type) (drop|fwd) queue (queue_id)
> 
> Signed-off-by: Jingjing Wu 

Applied

Thanks
-- 
Thomas

[dpdk-dev] some questions about rte_memcpy

2015-01-22 Thread Linhaifeng

#define rte_memcpy(dst, src, n)  \
((__builtin_constant_p(n)) ?  \
memcpy((dst), (src), (n)) :  \
rte_memcpy_func((dst), (src), (n)))


Why call memcpy when n is a compile-time constant?
Can I change it to the following code?

#define rte_memcpy(dst, src, n)  \
{   \
int num = (n);  \
rte_memcpy_func((dst), (src), (num));   \
}
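
For context, a minimal sketch of what the __builtin_constant_p() dispatch
does. It assumes GCC or Clang (the builtin is a compiler extension), and
my_memcpy, my_memcpy_func and slow_path_calls are stand-ins made up for
illustration, not the DPDK implementation. When the size is a compile-time
constant the compiler's own memcpy is used, which it can expand inline for
small fixed sizes; otherwise the call goes to the out-of-line routine:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int slow_path_calls;	/* counts calls to the stand-in below */

/* Stand-in for rte_memcpy_func(); here it simply forwards to memmove(). */
static void *
my_memcpy_func(void *dst, const void *src, size_t n)
{
	assert(dst != NULL && src != NULL);
	slow_path_calls++;
	return memmove(dst, src, n);
}

/*
 * Same shape as the rte_memcpy() macro quoted above: constant sizes go to
 * memcpy(), run-time sizes go to the out-of-line function.
 */
#define my_memcpy(dst, src, n) \
	(__builtin_constant_p(n) ? \
	 memcpy((dst), (src), (n)) : \
	 my_memcpy_func((dst), (src), (n)))
```

A macro that always calls the out-of-line function would give up that inline
expansion for fixed-size copies, which is presumably why the constant case is
routed to memcpy.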


-- 
Regards,
Haifeng

[dpdk-dev] [PATCH RFC 00/13] Update build system

2015-01-22 Thread Thomas Monjalon

2015-01-22 10:03, Gonzalez Monroy, Sergio:
> > From: Gonzalez Monroy, Sergio
> > Sent: Monday, January 12, 2015 5:22 PM
> > To: Thomas Monjalon
> > Subject: Re: [dpdk-dev] [PATCH RFC 00/13] Update build system
> > 
> > Hi Thomas,
> > 
> > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > Sent: Monday, January 12, 2015 4:52 PM
> > >
> > > Hi Sergio,
> > >
> > > 2015-01-12 16:33, Sergio Gonzalez Monroy:
> > > > This patch series updates the DPDK build system.
> > >
> > > Thanks for proposing such rework.
> > > We need discussions on that topic. So I ask some questions below.
> > >
> > > > Following are the goals it tries to accomplish:
> > > >  - Create a library containing core DPDK libraries (librte_eal,
> > > >librte_malloc, librte_mempool, librte_mbuf and librte_ring).
> > > >The idea of core libraries is to group those libraries that are
> > > >always required for any DPDK application.
> > >
> > > How is it better? Is it only to reduce dependencies lines?
> > >
> > In my opinion there is a set of libraries that are always required
> > and therefore should be grouped as a single one.
> > Basically all apps and other DPDK libs would have dependencies on these core
> > libraries.
> > 
> > Aside from that, I don't think there is any difference. Note that this 
> > affects
> > shared libraries, with no difference for apps linked against static libs.
> > 
> > > >  - Remove config option to build a combined library.
> > >
> > > Why removing combined library? Is there people finding it helpful?
> > >
> > I don't think it makes sense from a shared library point of view, maybe it
> > does for static?
> > For example, in the case of shared libraries I think we want to try to 
> > avoid the
> > case where we have an app linked against librte_dpdk.so, but such library
> > may contain different libraries depending on the options that were enabled
> > when the lib was built.
> > 
> > The core libraries would be that set of libraries that are always required 
> > for
> > an app, and its content would be fixed regardless of the option libraries 
> > (like
> > acl, hash, distributor, etc.) We could add more libraries as core if we 
> > think it is
> > a better solution, but the goal should be that librte_core.so contains the
> > same libraries/API regardless of the system/arch.
> > 
> > > >  - For shared libraries, explicitly link against dependant
> > > >libraries (adding entries to DT_NEEDED).
> > >
> > > OK, good.
> > >
> > > >  - Update app linking flags against static/shared DPDK libs.
> > > >
> > > > Note that this patch turns up being quite big because of moving lib
> > > > directories to a new subdirectory.
> > > > I have ommited the actual diff from the patch doing the move of
> > > > librte_eal as it is quite big (6MB). Probably a different approach
> > > > is
> > > preferred.
> > >
> > > Why do you think moving directories is needed?
> > >
> > Actually I am not sure is the best way to do this :) There is no need to 
> > move
> > them, as the same result could be achieved without moving directories, but I
> > thought that it would be easier for anyone to see which libraries are 'core'
> > and which are not.
> > 
> > Not moving those directories would definitely simplify this patch series.
> > 
> > > Thanks
> > > --
> > > Thomas
> > 
> > Thanks,
> > Sergio
> 
> Hi Thomas,
> 
> Any other comments/suggestions ? 
> My main concern would be the patch needed to move librte_eal (around 6MB). 
> 
> Thoughts?

I think you shouldn't move the libs.
Maybe we can link the core libs into one (I'm not sure it is of interest),
but I think we shouldn't move them into a core/ subdir.

On another note, I'd like to see KNI moved out of EAL.

-- 
Thomas

[dpdk-dev] some questions about rte_memcpy

2015-01-22 Thread Bruce Richardson

On Thu, Jan 22, 2015 at 07:23:49PM +0900, Tetsuya Mukawa wrote:
> On 2015/01/22 16:35, Matthew Hall wrote:
> > On Thu, Jan 22, 2015 at 01:32:04PM +0800, Linhaifeng wrote:
> >> Do you mean if call rte_memcpy before rte_eal_init() would crash?why?
> > No guarantee. But a theory. It might use some things from the EAL init to 
> > figure out which version of the accelerated algorithm to use.
> 
> This selection is done at compile-time.
> And if the size is constant, I guess DPDK assumes memcpy is replaced by
> inline __builtin_memcpy.
> I haven't checked the performance of builtin memcpy, but probably much
> faster.
> 

Yes, that assumption is correct. A couple of years ago we discovered that for
constant size values, the compiler would generate much faster code for us
using a regular memcpy than rte_memcpy, hence the macro.

/Bruce

> Tetsuya
> 
> > Matthew.
> 
>
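
The dispatch described in this thread can be tried out with a small
stand-alone sketch. It assumes GCC or Clang (for __builtin_constant_p).
The names my_memcpy, my_memcpy_func, copy_fixed and copy_var are
hypothetical, and the byte loop is only a placeholder for the generic
routine; it is not DPDK's rte_memcpy_func implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the generic copy routine: a plain byte loop.
 * The real DPDK routine is different; this only keeps the sketch
 * self-contained. */
static inline void *
my_memcpy_func(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}

/* Same shape as the macro quoted in the thread: when the compiler can
 * prove that n is a compile-time constant, use memcpy() so that it can be
 * expanded inline; otherwise call the generic routine. */
#define my_memcpy(dst, src, n)              \
	((__builtin_constant_p(n)) ?            \
	memcpy((dst), (src), (n)) :             \
	my_memcpy_func((dst), (src), (n)))

/* Copy with a literal size: takes the memcpy() branch. */
static inline void
copy_fixed(char *dst, const char *src)
{
	my_memcpy(dst, src, 8);
}

/* Copy with a size that is only a function parameter here. */
static inline void
copy_var(char *dst, const char *src, size_t n)
{
	my_memcpy(dst, src, n);
}
```

Both branches produce the same bytes; only the generated code differs.
Whether a variable size is also treated as constant after inlining depends
on the compiler and the optimization level, so the sketch makes no claim
about which branch copy_var ends up on.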

[dpdk-dev] [PATCH] lib/librte_ether: change socket_id passed to rte_memzone_reserve

2015-01-22 Thread Bruce Richardson

On Thu, Jan 22, 2015 at 09:56:48AM +0000, Ferriter, Cian wrote:
> Hey guys,
> 
> I just wanted to ask is there anything more that can be done with this patch 
> or is it in an acceptable state for pushing?
> 
> Cian

At this stage I think I'm OK with the patch contents, unless anyone else
objects.
However, your patch submission is missing the sign-off line needed before it
can be committed. Can you please resubmit with the proper sign-off?
[See http://www.dpdk.org/dev]

Regards,
/Bruce

> 
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ferriter, Cian
> Sent: Monday, January 19, 2015 6:39 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] lib/librte_ether: change socket_id passed to 
> rte_memzone_reserve
> 
> I would be happy with the original suggestion. If the ethdev data for a port 
> in use is in cache, it removes the performance concern associated with the 
> current setup and my fix. The original suggestion also fixes the crash that I 
> was seeing because of memory being reserved from a NUMA node with no 
> "--socket-mem" allocated.
> 
> Cian
> 
> -Original Message-
> From: Richardson, Bruce
> Sent: Wednesday, January 14, 2015 10:10 AM
> To: Ferriter, Cian
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] lib/librte_ether: change socket_id passed to 
> rte_memzone_reserve
> 
> On Tue, Jan 13, 2015 at 06:05:25PM +0000, Ferriter, Cian wrote:
> > Comments on alternative solutions:
> > 1) how would this solution work when there is no NIC present, and 
> > "rte_eth_from_rings" is called? Here, could you have an else where the 
> > socket id of the master core is passed to the "memzone_reserve"?
> > 2) how would you advise making this change? I have looked at where 
> > "rte_eth_dev_allocate" is being called and in all but one case, there is a 
> > "numa_id" that could be passed in. This isn't the case for " 
> > rte_eth_dev_init" however, is there an easy solution for this? Would there 
> > now need to be an "rte_eth_dev_data" struct for each socket that there is a 
> > NIC attached to, reserving memory from that socket?
> > 
> > Cian
> 
> While I think the issues you highlight can probably be overcome, I'm not so 
> sure any more how much it matters what NUMA node this is allocated on. The 
> ethdev data for any port in use should be in the cache. In that 
> case, if it doesn't matter, your original suggestion would work fine.
> 
>   /Bruce
> 
> > 
> > -Original Message-
> > From: Richardson, Bruce
> > Sent: Tuesday, January 13, 2015 1:56 PM
> > To: Ferriter, Cian
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] lib/librte_ether: change socket_id 
> > passed to rte_memzone_reserve
> > 
> > On Tue, Jan 13, 2015 at 09:23:16AM +0000, Ferriter, Cian wrote:
> > > Passing a socket id of "rte_socket_id()" can cause problems in non DPDK 
> > > applications as there is a dependency on the current logical core we are 
> > > running on.
> > > Passing " rte_lcore_to_socket_id(rte_get_master_lcore())" as the socket 
> > > id to rte_memzone_reserve resolves these issues as the master lcore 
> > > doesn't change.
> > > 
> > 
> > The only trouble is that when affinitizing the memory for the NICs to the 
> > socket of the master lcore, it gives us no way to correctly configure an 
> > app to use NICs connected to two different sockets on the one system. All 
> > memory for all NICs will end up on the same socket. Two possible 
> > alternative solutions:
> > 1) affinitize memory to the socket the NIC is connected to
> > 2) add a socket parameter to the API calls to allow the user complete 
> > control over their memory allocations
> > 
> > Obviously the second one breaks backward compatibility (assume we modify 
> > existing API call), but is more powerful.
> > 
> > Thoughts?
> > 
> > /Bruce
> > 
> > > -Original Message-
> > > From: Ferriter, Cian
> > > Sent: Tuesday, January 13, 2015 9:22 AM
> > > To: dev at dpdk.org
> > > Cc: Ferriter, Cian
> > > Subject: [PATCH] lib/librte_ether: change socket_id passed to 
> > > rte_memzone_reserve
> > > 
> > > Change the socket id that is passed to rte_memzone_reserve from the 
> > > socket id of current logical core to the socket id of the master_lcore.
> > > ---
> > >  lib/librte_ether/rte_ethdev.c |2 +-
> > >  1 files changed, 1 insertions(+), 1 deletions(-)  mode change
> > > 100644 => 100755 lib/librte_ether/rte_ethdev.c
> > > 
> > > diff --git a/lib/librte_ether/rte_ethdev.c 
> > > b/lib/librte_ether/rte_ethdev.c old mode 100644 new mode 100755 
> > > index 95f2ceb..835540d
> > > --- a/lib/librte_ether/rte_ethdev.c
> > > +++ b/lib/librte_ether/rte_ethdev.c
> > > @@ -184,7 +184,7 @@ rte_eth_dev_data_alloc(void)
> > >   if (rte_eal_process_type() == RTE_PROC_PRIMARY){
> > >   mz = rte_memzone_reserve(MZ_RTE_ETH_DEV_DATA,
> > >   RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data),
> > > - rte_socket_id(),
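
The first alternative discussed in this thread (allocate on the socket the
NIC is attached to), combined with the fallback asked about for devices
with no NIC behind them, can be sketched as a small policy function. This
is an illustration only: pick_alloc_socket and SOCKET_UNKNOWN are
hypothetical names, not DPDK APIs. In a real application the
master_socket argument would presumably come from
rte_lcore_to_socket_id(rte_get_master_lcore()), as in the patch.

```c
/* Hypothetical marker meaning "this device is not tied to a NUMA node",
 * e.g. a ring-backed ethdev with no physical NIC behind it. */
#define SOCKET_UNKNOWN (-1)

/* Hypothetical helper: choose the socket to reserve ethdev memory from.
 * Prefer the socket the device is attached to; fall back to the socket of
 * the master lcore when the device has no socket of its own. */
static int
pick_alloc_socket(int dev_socket, int master_socket)
{
	if (dev_socket != SOCKET_UNKNOWN)
		return dev_socket;
	return master_socket;
}
```

With this policy two NICs on different sockets get memory on their own
sockets, while a device without a socket still gets a stable choice that
does not depend on which lcore happens to run the allocation.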

[dpdk-dev] [PATCH v2] add one option memory-only for secondary processes

2015-01-22 Thread Bruce Richardson

On Thu, Jan 22, 2015 at 09:05:34AM +0000, Chi, Xiaobo (NSN - CN/Hangzhou) wrote:
> Hi, Bruce,
> Since the DPDK 2.0 merge window is open now, is it possible for this
> patch to be a candidate for v2.0?
> I searched in the DPDK patchwork
> (http://www.dpdk.org/dev/patchwork/project/dpdk/list/?state=*&q=memory-only&archive=both),
> but cannot find this V2 patch. Can you please help to check why? Thanks
> a lot.
> 
> Filters: Search = memory-only
> Patch                                                           Date        Submitter  Delegate  State
> [dpdk-dev] add one option memory-only for those secondary PRBs  2014-12-02  chixiaobo            Not Applicable
> [dpdk-dev] add one option memory-only for those secondary PRBs  2014-12-02  chixiaobo            Changes Requested
> 
> Brgs,
> Chi Xiaobo
> 
That's a question that Thomas is better able to answer than me, since he is the
man with control over patchwork! :-)

Thomas, any feedback here?

Thanks,
/Bruce
> 
> -Original Message-
> 
> From: ext Bruce Richardson [mailto:bruce.richardson at intel.com] 
> Sent: Tuesday, December 16, 2014 6:04 PM
> To: Chi, Xiaobo (NSN - CN/Hangzhou)
> Cc: ext Hiroshi Shimamoto; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] add one option memory-only for secondary 
> processes
> 
> On Tue, Dec 16, 2014 at 09:26:48AM +0000, Chi, Xiaobo (NSN - CN/Hangzhou) 
> wrote:
> > Hi, Bruce,
> > How about this patch, can it be merged to master branch? Thanks.
> > 
> > Brgs,
> > Chi Xiaobo
> > 
> 
> At this point, I think we are well past code-freeze for new features for 1.8,
> but this looks a good candidate for 2.0 once the merge window for that opens.
> 
> /Bruce
> 
> > 
> > -Original Message-
> > From: Chi, Xiaobo (NSN - CN/Hangzhou) 
> > Sent: Monday, December 15, 2014 5:58 PM
> > To: 'ext Hiroshi Shimamoto'; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v2] add one option memory-only for secondary 
> > processes
> > 
> > Hi, Hiroshi,
> > Yes, there should be performance degradation, not only due to the mempool 
> > cache, but also due to process scheduling overhead (caused by not pinning 
> > to CPUs).
> > I have not done any performance testing. In my project's scenarios, those 
> > SECONDARY processes only send/receive messages to/from the PRIMARY process 
> > via mempool/ring; the throughput is not so high, so the performance 
> > degradation is not critical to us. But there are dozens of SECONDARY 
> > processes in our system, and it would be hard to pin them manually to 
> > different CPU cores. What we want is to use the standard Linux scheduling 
> > mechanism to balance load between CPU cores.
> > 
> > Brgs,
> > Chi Xiaobo
> > 
> > 
> > -Original Message-
> > From: ext Hiroshi Shimamoto [mailto:h-shimamoto at ct.jp.nec.com] 
> > Sent: Thursday, December 11, 2014 11:03 AM
> > To: Chi, Xiaobo (NSN - CN/Hangzhou); dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v2] add one option memory-only for secondary 
> > processes
> > 
> > Hi,
> > 
> > sorry for the delay.
> > 
> > > Subject: RE: [dpdk-dev] [PATCH v2] add one option memory-only for 
> > > secondary processes
> > > 
> > > Hi, Hiroshi,
> > > Yes, you are right. To avoid this problem, when creating a mempool 
> > > that will be shared between the primary process and those secondary 
> > > processes, we need to set the cache_size parameter to zero. And to make 
> > > the system more stable, it is better to define RTE_MEMPOOL_CACHE_MAX_SIZE 
> > > as 0 in rte_config.h.
> > 
> > Yes, it prevents the data corruption, but it also hurts the performance.
> > I think that if we use mbufs without a cache for the PMD, we will see 
> > performance degradation.
> > 
> > Do you have any numbers?
> > 
> > thanks,
> > Hiroshi
> > 
> > > 
> > > /* create the mempool */
> > > struct rte_mempool *
> > > rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
> > >  unsigned cache_size, unsigned private_data_size,
> > >  rte_mempool_ctor_t *mp_init, void *mp_init_arg,
> > >  rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
> > >  int socket_id, unsigned flags);
> > > 
> > > 
> > > Brgs,
> > > Chi xiaobo
> > > 
> > > 
> > > -Original Message-
> > > From: ext Hiroshi Shimamoto [mailto:h-shimamoto at ct.jp.nec.com]
> > > Sent: Wednesday, December 03, 2014 6:54 PM
> > > To: Chi, Xiaobo (NSN - CN/Hangzhou); dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH v2] add one option memory-only for 
> > > secondary processes
> > > 
> > > Hi,
> > > 
> > > > Subject: [dpdk-dev] [PATCH v2] add one option memory-only for secondary 
> > > > processes
> > > >
> > > > From: Chi Xiaobo 
> > > >
> > > > Problem: There is one normal DPDK processes deployment scenarios: one 
> > > > primary process and several (even hundreds) secondary
> > > > processes; all outside packets/messages are sent/received by primary 
> > > > process and then distribute them to those secondary
> > > >

[dpdk-dev] [PATCH] doc: commands changed in testpmd_funcs for ethertype filter

2015-01-22 Thread Iremonger, Bernard



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jingjing Wu
> Sent: Thursday, January 22, 2015 1:09 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] doc: commands changed in testpmd_funcs for 
> ethertype filter
> 
> new commands for ethertype filter
>   - ethertype_filter (port_id) (add|del) (mac_addr|mac_ignr)
> (mac_address) ethertype (ether_type) (drop|fwd) queue (queue_id)
> 
> Signed-off-by: Jingjing Wu 
> ---
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst | 51 
> +++--
>  1 file changed, 12 insertions(+), 39 deletions(-)
> 
> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> index be935c2..218835a 100644
> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> @@ -1392,61 +1392,34 @@ Filter Functions
> 
>  This section details the available filter functions that are available.
> 
> -add_ethertype_filter
> +ethertype_filter
>  
> 
> -Add a L2 Ethertype filter, which identify packets by their L2 Ethertype 
> mainly assign them to a receive
> queue.
> +Add or delete a L2 Ethertype filter, which identify packets by their L2 
> Ethertype mainly assign them to
> a receive queue.
> 
> -add_ethertype_filter (port_id) ethertype (eth_value) priority 
> (enable|disable) (pri_value) queue
> (queue_id) index (idx)
> +ethertype_filter (port_id) (add|del) (mac_addr|mac_ignr) (mac_address) 
> ethertype (ether_type)
> (drop|fwd) queue (queue_id)
> 
>  The available information parameters are:
> 
>  *   port_id:  the port which the Ethertype filter assigned on.
> 
> -*   eth_value: the EtherType value want to match,
> -for example 0x0806 for ARP packet. 0x0800 (IPv4) and 0x86DD (IPv6) are 
> invalid.
> -
> -*   enable: user priority participates in the match.
> -
> -*   disable: user priority doesn't participate in the match.
> -
> -*   pri_value: user priority value that want to match.
> -
> -*   queue_id : The receive queue associated with this EtherType filter
> +*   mac_addr: compare destination mac address.
> 
> -*   index: the index of this EtherType filter
> +*   mac_ignr: ignore destination mac address match.
> 
> -Example:
> -
> -.. code-block:: console
> +*   mac_address: destination mac address to match.
> 
> -testpmd> add_ethertype_filter 0 ethertype 0x0806 priority disable 0 
> queue 3 index 0
> -Assign ARP packet to receive queue 3
> -
> -remove_ethertype_filter
> -~~~
> -
> -Remove a L2 Ethertype filter
> -
> -remove_ethertype_filter (port_id) index (idx)
> -
> -get_ethertype_filter
> -
> -
> -Get and display a L2 Ethertype filter
> +*   ether_type: the EtherType value want to match,
> +for example 0x0806 for ARP packet. 0x0800 (IPv4) and 0x86DD (IPv6) are 
> invalid.
> 
> -get_ethertype_filter (port_id) index (idx)
> +*   queue_id : The receive queue associated with this EtherType filter. It 
> is meaningless when
> deleting or dropping.
> 
> -Example:
> +Example, to add/remove an ethertype filter rule:
> 
>  .. code-block:: console
> 
> -testpmd> get_ethertype_filter 0 index 0
> -
> -filter[0]:
> -ethertype: 0x0806
> -priority: disable, 0
> -queue: 3
> +testpmd> ethertype_filter 0 add mac_ignr ethertype 0x0806 fwd queue 3
> +testpmd> ethertype_filter 0 del mac_ignr ethertype 0x0806 fwd queue 3

Hi Jingjing,

There is a duplicate line here.
I have applied the patch and checked the HTML output (applies and builds 
cleanly).

Regards,

Bernard
> 
>  add_2tuple_filter
>  ~
> --
> 1.9.3

[dpdk-dev] [PATCH RFC 00/13] Update build system

2015-01-22 Thread Gonzalez Monroy, Sergio

> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, January 22, 2015 10:39 AM
> To: Gonzalez Monroy, Sergio
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH RFC 00/13] Update build system
> 
> I think you shouldn't move the libs.
> Maybe we can link the core libs into one (I'm not sure it is of interest),
> but I think we shouldn't move them into a core/ subdir.
> 
> On another note, I'd like to see KNI moved out of EAL.
> 
> --
> Thomas

I think moving KNI out of EAL belongs in a different patch.

We can still link librte_core without moving the directories into core/.

I'll work on it.

Thanks,
Sergio

[dpdk-dev] [RFC 00/16] enhance checksum offload API

2015-01-22 Thread Thomas Monjalon

2015-01-22 00:41, Olivier MATZ:
> We use the attached scapy script (dpdk-cksum-test.py) for testing.

Attaching the python script which was filtered out.
-- next part --
A non-text attachment was scrubbed...
Name: dpdk-cksum-test.py
Type: text/x-python
Size: 2407 bytes
Desc: not available
URL: 
<http://dpdk.org/ml/archives/dev/attachments/20150122/0115fb4b/attachment-0001.py>

[dpdk-dev] [PATCH v7 4/4] docs: Add ABI documentation

2015-01-22 Thread Iremonger, Bernard



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> Sent: Wednesday, January 21, 2015 9:00 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v7 4/4] docs: Add ABI documentation
> 
> Adding a document describing rudimentary ABI policy and adding notice space 
> for any deprecation
> announcements
> 
> Signed-off-by: Neil Horman 
> CC: Thomas Monjalon 
> CC: "Richardson, Bruce" 

Hi Neil,

I tried to apply the patch and build it, and the following warnings occurred.

Applying: docs: Add ABI documentation
/nfs/sie/disks/git_workspace/bairemon/dpdk-doc-next/.git/rebase-apply/patch:55: 
trailing whitespace.
---  
/nfs/sie/disks/git_workspace/bairemon/dpdk-doc-next/.git/rebase-apply/patch:22: 
new blank line at EOF.
+
warning: 2 lines add whitespace errors.

sivswdev01> make doc-guides-html
sphinx for guides...
/nfs/sie/disks/git_workspace/bairemon/dpdk-doc-next/doc/guides/rel_notes/abi.rst::
 WARNING: document isn't included in any toctree

The change to doc/guides/rel_notes/index.rst is missing from the patch.

I suggest applying and building the patch and then checking the generated HTML
in Firefox to see that it is as you expect.
file:///home/nhorman/dpdk-doc-next/build/doc/html/guides/rel_notes/index.html

Regards,

Bernard.


> 
> ---
> Change notes:
> 
> v5) Updated documentation to add notes from Thomas M.
> 
> v6) Moved abi.txt to guides/rel_notes/abi.rst
> 
> v7) Updated abi.rst to integrate with index file
> Updated abi.rst to conform to rst formatting
> Updated abi.rst to include example deprecation notices.  Its not exactly 
> the language that Thomas
> indicated, but I think it makes the idea clear.
> ---
>  doc/guides/rel_notes/abi.rst | 41 +
>  1 file changed, 41 insertions(+)
>  create mode 100644 doc/guides/rel_notes/abi.rst
> 
> diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst new 
> file mode 100644 index
> 000..9b72719
> --- /dev/null
> +++ b/doc/guides/rel_notes/abi.rst
> @@ -0,0 +1,41 @@
> +ABI policy
> +==
> +ABI versions are set at the time of major release labeling, and ABI may
> +change multiple times between the last labeling and the HEAD label of
> +the git tree without warning.
> +
> +ABI versions, once released are available until such time as their
> +deprecation has been noted here for at least one major release cycle,
> +after it has been tagged.  E.g. the ABI for DPDK 1.8 is shipped, and
> +then the decision to remove it is made during the development of DPDK
> +1.9.  The decision will be recorded here, shipped with the DPDK 1.9
> +release, and actually removed when DPDK
> +1.10 ships.
> +
> +ABI versions may be deprecated in whole, or in part as needed by a given 
> update.
> +
> +Some ABI changes may be too significant to reasonably maintain multiple
> +versions of.  In those events ABI's may be updated without backward
> +compatibility provided.  The requirements for doing so are:
> +
> +#. At least 3 acknoweldgements of the need on the dpdk.org #. A full
> +deprecation cycle must be made to offer downstream consumers sufficient
> +warning of the change.  E.g. if dpdk 2.0 is under development when the
> +change is proposed, a deprecation notice must be added to this file,
> +and released with dpdk 2.0.  Then the change may be incorporated for
> +dpdk 2.1 #. The LIBABIVER variable in the makefilei(s) where the ABI
> +changes are incorporated must be incremented in parallel with the ABI
> +changes themselves
> +
> +Note that the above process for ABI deprecation should not be
> +undertaken lightly.  ABI stability is extreemely important for
> +downstream consumers of the DPDK, especially when distributed in shared
> +object form.  Every effort should be made to preserve ABI whenever
> +possible.  For instance, reorganizing public structure field for
> +astetic or readability purposes should be avoided as it will cause ABI
> +breakage.  Only significant (e.g. performance) reasons should be seen as 
> cause to alter ABI.
> +
> +Examples of Deprecation notices
> +---
> +* The Macro #RTE_FOO is deprecated and will be removed with version
> +2.0, to be replaced with the inline function rte_bar()
> +* The function rte_mbuf_grok has been updated to include new parameter
> +in version 2.0.  Backwards compatibility will be maintained for this
> +function until the release of version 2.1
> +* The members struct foo have been reorganized in release 2.0.  Existing 
> binary applications will have
> backwards compatibility in release 2.0, while newly built binaries will need 
> to reference new structure
> variant struct foo2.  Compatibility will be removed in release 2.2, and all 
> applications will require
> updating a rebuilding to the new structure at that time, which will be 
> renamed to the origional struct
> foo.
> +* Significant ABI changes are planned for the librte_dostuff library.  The 
> upcommi

[dpdk-dev] [PATCH] mk: add support for ICC 15 compiler

2015-01-22 Thread Daniel Mrzyglod

This patch adds support for ICC 15.

ICC 15 changed the default values of inline-max-size and inline-max-total-size,
so for ICC 15 the flags -no-inline-max-size -no-inline-max-total-size must be added.

Additionally, disable the compile errors for:
13368 - loop was not vectorized with "vector always assert"
15527 - loop was not vectorized: function call to fprintf cannot be vectorized

Signed-off-by: Daniel Mrzyglod 
---
 mk/toolchain/icc/rte.vars.mk | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/mk/toolchain/icc/rte.vars.mk b/mk/toolchain/icc/rte.vars.mk
index 5503fb0..e39d710 100644
--- a/mk/toolchain/icc/rte.vars.mk
+++ b/mk/toolchain/icc/rte.vars.mk
@@ -66,11 +66,18 @@ TOOLCHAIN_ASFLAGS =
 # Turn off some ICC warnings -
 #   Remark #271   : trailing comma is nonstandard
 #   Warning #1478 : function "" (declared at line N of "")
+#   error #13368: loop was not vectorized with "vector always assert"
+#   error #15527: loop was not vectorized: function call to fprintf cannot be vectorized
 #   was declared "deprecated"
 WERROR_FLAGS := -Wall -Werror-all -w2 -diag-disable 271 -diag-warning 1478
+WERROR_FLAGS += -diag-disable 13368 -diag-disable 15527

 # process cpu flags
 include $(RTE_SDK)/mk/toolchain/$(RTE_TOOLCHAIN)/rte.toolchain-compat.mk
+# disable max-inline params boundaries for ICC 15 compiler
+ifeq ($(shell test $(ICC_MAJOR_VERSION) -eq 15 && echo 1), 1)
+   TOOLCHAIN_CFLAGS += -no-inline-max-size -no-inline-max-total-size
+endif

 export CC AS AR LD OBJCOPY OBJDUMP STRIP READELF
 export TOOLCHAIN_CFLAGS TOOLCHAIN_LDFLAGS TOOLCHAIN_ASFLAGS
-- 
2.1.0

[dpdk-dev] [PATCH v8 4/4] docs: Add ABI documentation

2015-01-22 Thread Neil Horman

Adding a document describing rudimentary ABI policy and adding notice space for
any deprecation announcements

Signed-off-by: Neil Horman 
CC: Thomas Monjalon 
CC: "Richardson, Bruce" 

---
Change notes:

v5) Updated documentation to add notes from Thomas M.

v6) Moved abi.txt to guides/rel_notes/abi.rst

v7) Updated abi.rst to integrate with index file
Updated abi.rst to conform to rst formatting
Updated abi.rst to include example deprecation notices.  It's not exactly the
language that Thomas indicated, but I think it makes the idea clear.

v8) Add missing file index.rst which was left out of the prior commit
---
 doc/guides/rel_notes/abi.rst   | 40 
 doc/guides/rel_notes/index.rst |  1 +
 2 files changed, 41 insertions(+)
 create mode 100644 doc/guides/rel_notes/abi.rst

diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
new file mode 100644
index 0000000..73d88ca
--- /dev/null
+++ b/doc/guides/rel_notes/abi.rst
@@ -0,0 +1,40 @@
+ABI policy
+==========
+ABI versions are set at the time of major release labeling, and ABI may change
+multiple times between the last labeling and the HEAD label of the git tree
+without warning.
+
+ABI versions, once released are available until such time as their
+deprecation has been noted here for at least one major release cycle, after it
+has been tagged.  E.g. the ABI for DPDK 1.8 is shipped, and then the decision to
+remove it is made during the development of DPDK 1.9.  The decision will be
+recorded here, shipped with the DPDK 1.9 release, and actually removed when DPDK
+1.10 ships.
+
+ABI versions may be deprecated in whole, or in part as needed by a given update.
+
+Some ABI changes may be too significant to reasonably maintain multiple
+versions of.  In those events ABIs may be updated without backward
+compatibility provided.  The requirements for doing so are:
+
+#. At least 3 acknowledgements of the need on the dpdk.org
+#. A full deprecation cycle must be made to offer downstream consumers sufficient warning of the change.  E.g. if dpdk 2.0 is under development when the change is proposed, a deprecation notice must be added to this file, and released with dpdk 2.0.  Then the change may be incorporated for dpdk 2.1
+#. The LIBABIVER variable in the makefile(s) where the ABI changes are incorporated must be incremented in parallel with the ABI changes themselves
+
+Note that the above process for ABI deprecation should not be undertaken
+lightly.  ABI stability is extremely important for downstream consumers of the
+DPDK, especially when distributed in shared object form.  Every effort should be
+made to preserve ABI whenever possible.  For instance, reorganizing public
+structure fields for aesthetic or readability purposes should be avoided as it will
+cause ABI breakage.  Only significant (e.g. performance) reasons should be seen
+as cause to alter ABI.
+
+Examples of Deprecation notices
+---
+* The Macro #RTE_FOO is deprecated and will be removed with version 2.0, to be 
replaced with the inline function rte_bar()
+* The function rte_mbuf_grok has been updated to include new parameter in 
version 2.0.  Backwards compatibility will be maintained for this function 
until the release of version 2.1
+* The members of struct foo have been reorganized in release 2.0.  Existing 
binary applications will have backwards compatibility in release 2.0, while 
newly built binaries will need to reference new structure variant struct foo2.  
Compatibility will be removed in release 2.2, and all applications will require 
updating and rebuilding to the new structure at that time, which will be renamed 
to the original struct foo.
+* Significant ABI changes are planned for the librte_dostuff library.  The 
upcoming release 2.0 will not contain these changes, but release 2.1 will, and 
no backwards compatibility is planned due to the invasive nature of these 
changes.  Binaries using this library built prior to version 2.1 will require 
updating and recompilation.
+
+Deprecation Notices
+---
diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index 2724149..cf712b2 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -48,4 +48,5 @@ Contents
 updating_apps
 known_issues
 resolved_issues
+abi
 faq
-- 
2.1.0
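The policy text in this patch warns that reorganizing public structure fields breaks ABI. A minimal sketch of why, assuming two hypothetical layouts of the `struct foo` used in the example notices (the field names are invented for illustration and do not come from any DPDK header): a binary built against the first layout reads `len` at one offset, while a library built against the second layout stores it at another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical original layout, as an already-built application sees it. */
struct foo_v1 {
	uint8_t  flags;
	uint32_t len;
	uint16_t port;
};

/* Hypothetical "tidied" layout: same fields, different order. */
struct foo_v2 {
	uint32_t len;
	uint16_t port;
	uint8_t  flags;
};

/*
 * Returns non-zero if 'len' lives at a different byte offset in the two
 * layouts, i.e. if old binaries would read the wrong bytes from a
 * structure filled in by a library compiled against the new layout.
 */
int foo_len_offset_changed(void)
{
	return offsetof(struct foo_v1, len) != offsetof(struct foo_v2, len);
}
```

Nothing changes at the source level, which is why this kind of breakage is easy to introduce by accident; only the compiled offsets differ.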

[dpdk-dev] [PATCH v8 3/4] Add library version extension

2015-01-22 Thread Neil Horman

To differentiate libraries that break ABI, we add a library version number
suffix to the library, which must be incremented when a given library's ABI is
broken.  This patch enforces that addition, sets the initial abi soname
extension to 1 for each library and creates a symlink to the base SONAME so that
the test applications will link properly.

Signed-off-by: Neil Horman 
CC: Thomas Monjalon 
CC: "Richardson, Bruce" 

---
Change Notes:
v3)
Made symlinking of libraries conditional on a DSO build

v4) Removed erroneous newline
changed @exit 1 to @false
changed ./$(LIB) to $<
---
 lib/librte_acl/Makefile  |  2 ++
 lib/librte_cfgfile/Makefile  |  2 ++
 lib/librte_cmdline/Makefile  |  2 ++
 lib/librte_compat/Makefile   |  2 ++
 lib/librte_distributor/Makefile  |  2 ++
 lib/librte_eal/bsdapp/eal/Makefile   |  2 ++
 lib/librte_eal/linuxapp/eal/Makefile |  2 ++
 lib/librte_ether/Makefile|  2 ++
 lib/librte_hash/Makefile |  2 ++
 lib/librte_ip_frag/Makefile  |  2 ++
 lib/librte_ivshmem/Makefile  |  2 ++
 lib/librte_kni/Makefile  |  2 ++
 lib/librte_kvargs/Makefile   |  2 ++
 lib/librte_lpm/Makefile  |  2 ++
 lib/librte_malloc/Makefile   |  2 ++
 lib/librte_mbuf/Makefile |  2 ++
 lib/librte_mempool/Makefile  |  2 ++
 lib/librte_meter/Makefile|  2 ++
 lib/librte_pipeline/Makefile |  2 ++
 lib/librte_pmd_af_packet/Makefile|  2 ++
 lib/librte_pmd_bond/Makefile |  2 ++
 lib/librte_pmd_e1000/Makefile|  2 ++
 lib/librte_pmd_enic/Makefile |  2 ++
 lib/librte_pmd_i40e/Makefile |  2 ++
 lib/librte_pmd_ixgbe/Makefile|  2 ++
 lib/librte_pmd_pcap/Makefile |  2 ++
 lib/librte_pmd_ring/Makefile |  2 ++
 lib/librte_pmd_virtio/Makefile   |  2 ++
 lib/librte_pmd_vmxnet3/Makefile  |  2 ++
 lib/librte_pmd_xenvirt/Makefile  |  2 ++
 lib/librte_port/Makefile |  2 ++
 lib/librte_power/Makefile|  2 ++
 lib/librte_ring/Makefile |  2 ++
 lib/librte_sched/Makefile|  2 ++
 lib/librte_table/Makefile|  2 ++
 lib/librte_timer/Makefile|  2 ++
 lib/librte_vhost/Makefile|  2 ++
 mk/rte.lib.mk| 12 ++--
 38 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
index 45cbf80..765deb1 100644
--- a/lib/librte_acl/Makefile
+++ b/lib/librte_acl/Makefile
@@ -39,6 +39,8 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)

 EXPORT_MAP := rte_acl_version.map

+LIBABIVER := 1
+
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += tb_mem.c

diff --git a/lib/librte_cfgfile/Makefile b/lib/librte_cfgfile/Makefile
index a4f73de..032c240 100644
--- a/lib/librte_cfgfile/Makefile
+++ b/lib/librte_cfgfile/Makefile
@@ -41,6 +41,8 @@ CFLAGS += $(WERROR_FLAGS)

 EXPORT_MAP := rte_cfgfile_version.map

+LIBABIVER := 1
+
 #
 # all source are stored in SRCS-y
 #
diff --git a/lib/librte_cmdline/Makefile b/lib/librte_cmdline/Makefile
index 3c71831..719dff6 100644
--- a/lib/librte_cmdline/Makefile
+++ b/lib/librte_cmdline/Makefile
@@ -38,6 +38,8 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3

 EXPORT_MAP := rte_cmdline_version.map

+LIBABIVER := 1
+
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) := cmdline.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += cmdline_cirbuf.c
diff --git a/lib/librte_compat/Makefile b/lib/librte_compat/Makefile
index 0bab870..0c57533 100644
--- a/lib/librte_compat/Makefile
+++ b/lib/librte_compat/Makefile
@@ -32,6 +32,8 @@
 include $(RTE_SDK)/mk/rte.vars.mk


+LIBABIVER := 1
+
 # install includes
 SYMLINK-y-include := rte_compat.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 3674a2c..4c9af17 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -39,6 +39,8 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)

 EXPORT_MAP := rte_distributor_version.map

+LIBABIVER := 1
+
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c

diff --git a/lib/librte_eal/bsdapp/eal/Makefile 
b/lib/librte_eal/bsdapp/eal/Makefile
index 0b5f9d9..ae214a4 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -48,6 +48,8 @@ CFLAGS += $(WERROR_FLAGS) -O3

 EXPORT_MAP := rte_eal_version.map

+LIBABIVER := 1
+
 # specific to linuxapp exec-env
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) := eal.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_memory.c
diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index bae8af1..e117cec 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -35,6 +35,8 @@ LIB = librte_eal.a

 EXPORT_MAP := rte_eal_version.map

+LIBABIVER := 1
+
 VPATH += $(RTE_SDK)/lib/librte_eal/common

 CFLAGS += -I$(SRCDI

[dpdk-dev] [PATCH v8 2/4] Provide initial versioning for all DPDK libraries

2015-01-22 Thread Neil Horman

Add linker version script files to each DPDK library to put a stake in the
ground from which we can start cleaning up APIs

Signed-off-by: Neil Horman 
CC: Thomas Monjalon 
CC: "Richardson, Bruce" 

---
Change Notes:

v2)
* Updated export map to not require full path
---
 lib/librte_acl/Makefile|   2 +
 lib/librte_acl/rte_acl_version.map |  21 
 lib/librte_cfgfile/Makefile|   2 +
 lib/librte_cfgfile/rte_cfgfile_version.map |  14 +++
 lib/librte_cmdline/Makefile|   2 +
 lib/librte_cmdline/rte_cmdline_version.map |  69 +
 lib/librte_distributor/Makefile|   2 +
 lib/librte_distributor/rte_distributor_version.map |  16 +++
 lib/librte_eal/bsdapp/eal/Makefile |   2 +
 lib/librte_eal/bsdapp/eal/rte_eal_version.map  |  90 
 lib/librte_eal/linuxapp/eal/Makefile   |   2 +
 lib/librte_eal/linuxapp/eal/rte_eal_version.map|  90 
 lib/librte_ether/Makefile  |   2 +
 lib/librte_ether/rte_ether_version.map | 113 +
 lib/librte_hash/Makefile   |   2 +
 lib/librte_hash/rte_hash_version.map   |  18 
 lib/librte_ip_frag/Makefile|   2 +
 lib/librte_ip_frag/rte_ipfrag_version.map  |  14 +++
 lib/librte_ivshmem/Makefile|   2 +
 lib/librte_ivshmem/rte_ivshmem_version.map |  13 +++
 lib/librte_kni/Makefile|   2 +
 lib/librte_kni/rte_kni_version.map |  20 
 lib/librte_kvargs/Makefile |   2 +
 lib/librte_kvargs/rte_kvargs_version.map   |  10 ++
 lib/librte_lpm/Makefile|   2 +
 lib/librte_lpm/rte_lpm_version.map |  24 +
 lib/librte_malloc/Makefile |   2 +
 lib/librte_malloc/rte_malloc_version.map   |  19 
 lib/librte_mbuf/Makefile   |   2 +
 lib/librte_mbuf/rte_mbuf_version.map   |  14 +++
 lib/librte_mempool/Makefile|   2 +
 lib/librte_mempool/rte_mempool_version.map |  18 
 lib/librte_meter/Makefile  |   2 +
 lib/librte_meter/rte_meter_version.map |  13 +++
 lib/librte_pipeline/Makefile   |   2 +
 lib/librte_pipeline/rte_pipeline_version.map   |  23 +
 lib/librte_pmd_af_packet/Makefile  |   2 +
 .../rte_pmd_af_packet_version.map  |   7 ++
 lib/librte_pmd_bond/Makefile   |   2 +
 lib/librte_pmd_bond/rte_eth_bond_version.map   |  21 
 lib/librte_pmd_e1000/Makefile  |   2 +
 lib/librte_pmd_e1000/rte_pmd_e1000_version.map |   5 +
 lib/librte_pmd_enic/Makefile   |   2 +
 lib/librte_pmd_enic/rte_pmd_enic_version.map   |   5 +
 lib/librte_pmd_i40e/Makefile   |   2 +
 lib/librte_pmd_i40e/rte_pmd_i40e_version.map   |   5 +
 lib/librte_pmd_ixgbe/Makefile  |   2 +
 lib/librte_pmd_ixgbe/rte_pmd_ixgbe_version.map |   5 +
 lib/librte_pmd_pcap/Makefile   |   2 +
 lib/librte_pmd_pcap/rte_pmd_pcap_version.map   |   5 +
 lib/librte_pmd_ring/Makefile   |   2 +
 lib/librte_pmd_ring/rte_eth_ring.c |   2 +-
 lib/librte_pmd_ring/rte_eth_ring.h |   6 --
 lib/librte_pmd_ring/rte_eth_ring_version.map   |  10 ++
 lib/librte_pmd_virtio/Makefile |   1 +
 lib/librte_pmd_virtio/rte_pmd_virtio_version.map   |   5 +
 lib/librte_pmd_vmxnet3/Makefile|   2 +
 lib/librte_pmd_vmxnet3/rte_pmd_vmxnet3_version.map |   5 +
 lib/librte_pmd_xenvirt/Makefile|   2 +
 lib/librte_pmd_xenvirt/rte_eth_xenvirt_version.map |   8 ++
 lib/librte_port/Makefile   |   2 +
 lib/librte_port/rte_port_version.map   |  18 
 lib/librte_power/Makefile  |   2 +
 lib/librte_power/rte_power_version.map |  18 
 lib/librte_ring/Makefile   |   2 +
 lib/librte_ring/rte_ring_version.map   |  12 +++
 lib/librte_sched/Makefile  |   2 +
 lib/librte_sched/rte_sched_version.map |  22 
 lib/librte_table/Makefile  |   2 +
 lib/librte_table/rte_table_version.map |  22 
 lib/librte_timer/Makefile  |   2 +
 lib/librte_timer/rte_timer_version.map |  16 +++
 lib/librte_vhost/Makefile  |   2 +
 lib/librte_vhost/rte_vhost_version.map |  14 +++
 74 files changed, 874 insertions(+), 7 deletions(-)
 create mode 100644 lib/librte_acl/rte_acl_version.map
 create mode 100644 lib/librte_cfgfile/rte_cfg

[dpdk-dev] [PATCH v8 1/4] compat: Add infrastructure to support symbol versioning

2015-01-22 Thread Neil Horman

Add initial pass header files to support symbol versioning.
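As background for readers unfamiliar with symbol versioning: the GNU toolchain binds an implementation to a versioned symbol name through an assembler `.symver` directive, and header macros typically build that directive by stringifying their arguments. The sketch below only assembles the directive text so that it can be inspected; the macro names, the `rte_foo` symbol and the `DPDK_` version-node prefix are assumptions made for illustration and are not copied from rte_compat.h.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification so that macro arguments are expanded first. */
#define STR_INNER(x) #x
#define STR(x) STR_INNER(x)

/*
 * Hypothetical helper: text of a .symver directive binding the
 * implementation <base><suffix> to the versioned name <base>@DPDK_<ver>.
 * In a shared library the string would be passed to __asm__() and the
 * version node would have to exist in the linker version script.
 */
#define SYMVER_TEXT(base, suffix, ver) \
	".symver " STR(base) STR(suffix) ", " STR(base) "@DPDK_" STR(ver)

/* Directive text for an invented function rte_foo, version 1.8. */
const char *example_directive(void)
{
	return SYMVER_TEXT(rte_foo, _v18, 1.8);
}

/* Returns non-zero if the generated directive equals 'expected'. */
int directive_matches(const char *expected)
{
	assert(expected != NULL);
	return strcmp(example_directive(), expected) == 0;
}
```

Keeping the old implementation under a suffixed name is what lets an existing binary keep resolving the version it was linked against while new builds pick up the default.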

Signed-off-by: Neil Horman 
CC: Thomas Monjalon 
CC: "Richardson, Bruce" 
CC: "Gonzalez Monroy, Sergio" 

---
Change Notes:
V2)
Moved ifeq to _INSTALL target

V3)
Undo V2 changes and make librte_compat use the rte.install.mk file
instead

v4)
changed --version-script to accept SRCDIR in this patch as per request
documented versioning macros
cleaned up macro parameter consistency
converted SA macro to RTE_STR macro
fixed copyright
---
 lib/Makefile   |   1 +
 lib/librte_compat/Makefile |  38 +
 lib/librte_compat/rte_compat.h | 117 +
 mk/rte.lib.mk  |   4 ++
 4 files changed, 160 insertions(+)
 create mode 100644 lib/librte_compat/Makefile
 create mode 100644 lib/librte_compat/rte_compat.h

diff --git a/lib/Makefile b/lib/Makefile
index 0ffc982..d617d81 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -31,6 +31,7 @@

 include $(RTE_SDK)/mk/rte.vars.mk

+DIRS-y += librte_compat
 DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_MALLOC) += librte_malloc
 DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring
diff --git a/lib/librte_compat/Makefile b/lib/librte_compat/Makefile
new file mode 100644
index 000..0bab870
--- /dev/null
+++ b/lib/librte_compat/Makefile
@@ -0,0 +1,38 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2013 Neil Horman 
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+
+# install includes
+SYMLINK-y-include := rte_compat.h
+
+include $(RTE_SDK)/mk/rte.install.mk
diff --git a/lib/librte_compat/rte_compat.h b/lib/librte_compat/rte_compat.h
new file mode 100644
index 000..d7cc176
--- /dev/null
+++ b/lib/librte_compat/rte_compat.h
@@ -0,0 +1,117 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010 Neil Horman .
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF TH

[dpdk-dev] [PATCH 0/7] vmxnet3: driver enhancements

2015-01-22 Thread Thomas Monjalon

2015-01-21 16:49, Stephen Hemminger:
> On Thu, 15 Jan 2015 12:02:11 +0100
> Thomas Monjalon  wrote:
> 
> > Someone to review these patches?
> 
> Any comments from 
> Bruce Richardson 

Sorry, what do you mean?
Are there some comments I missed?

-- 
Thomas

[dpdk-dev] [PATCH v7 4/4] docs: Add ABI documentation

2015-01-22 Thread Neil Horman

On Thu, Jan 22, 2015 at 10:56:08AM +, Iremonger, Bernard wrote:
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> > Sent: Wednesday, January 21, 2015 9:00 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v7 4/4] docs: Add ABI documentation
> > 
> > Adding a document describing rudimentary ABI policy and adding notice space 
> > for any deprecation
> > announcements
> > 
> > Signed-off-by: Neil Horman 
> > CC: Thomas Monjalon 
> > CC: "Richardson, Bruce" 
> 
> Hi Neil,
> 
> Tried to apply the patch and build it, the following warnings occurred.
> 
> Applying: docs: Add ABI documentation
> /nfs/sie/disks/git_workspace/bairemon/dpdk-doc-next/.git/rebase-apply/patch:55:
>  trailing whitespace.
> ---  
> /nfs/sie/disks/git_workspace/bairemon/dpdk-doc-next/.git/rebase-apply/patch:22:
>  new blank line at EOF.
> +
> warning: 2 lines add whitespace errors.
> 
Those errors don't make much sense.  There's no whitespace on line 55 of the
patch after the last character and line 22 isn't the end of a file that I can
see.

> sivswdev01> make doc-guides-html
> sphinx for guides...
> /nfs/sie/disks/git_workspace/bairemon/dpdk-doc-next/doc/guides/rel_notes/abi.rst::
>  WARNING: document isn't included in any toctree
> 
> Change to doc/guides/rel_notes/index.rst is missing from the patch.
> 
> Suggest applying and building the patch and then checking the generated HTML 
> in Firefox to see that it is as you expect.
> file:///home/nhorman/dpdk-doc-next/build/doc/html/guides/rel_notes/index.html
> 
No, the generated html works fine.  I have the abi line added to index.rst but
somehow it didn't get committed, so my testing worked, but the patch isn't right.
I'll resend it, and try to figure out what's wrong with those lines, though they
really seem screwy to me.
Neil

> Regards,
> 
> Bernard.
> 
>  
> > 
> > ---
> > Change notes:
> > 
> > v5) Updated documentation to add notes from Thomas M.
> > 
> > v6) Moved abi.txt to guides/rel_notes/abi.rst
> > 
> > v7) Updated abi.rst to integrate with index file
> > Updated abi.rst to conform to rst formatting
> > Updated abi.rst to include example deprecation notices.  Its not 
> > exactly the language that Thomas
> > indicated, but I think it makes the idea clear.
> > ---
> >  doc/guides/rel_notes/abi.rst | 41 +
> >  1 file changed, 41 insertions(+)
> >  create mode 100644 doc/guides/rel_notes/abi.rst
> > 
> > diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst 
> > new file mode 100644 index
> > 000..9b72719
> > --- /dev/null
> > +++ b/doc/guides/rel_notes/abi.rst
> > @@ -0,0 +1,41 @@
> > +ABI policy
> > +==
> > +ABI versions are set at the time of major release labeling, and ABI may
> > +change multiple times between the last labeling and the HEAD label of
> > +the git tree without warning.
> > +
> > +ABI versions, once released are available until such time as their
> > +deprecation has been noted here for at least one major release cycle,
> > +after it has been tagged.  E.g. the ABI for DPDK 1.8 is shipped, and
> > +then the decision to remove it is made during the development of DPDK
> > +1.9.  The decision will be recorded here, shipped with the DPDK 1.9
> > +release, and actually removed when DPDK
> > +1.10 ships.
> > +
> > +ABI versions may be deprecated in whole, or in part as needed by a given 
> > update.
> > +
> > +Some ABI changes may be too significant to reasonably maintain multiple
> > +versions of.  In those events ABI's may be updated without backward
> > +compatibility provided.  The requirements for doing so are:
> > +
> > +#. At least 3 acknoweldgements of the need on the dpdk.org #. A full
> > +deprecation cycle must be made to offer downstream consumers sufficient
> > +warning of the change.  E.g. if dpdk 2.0 is under development when the
> > +change is proposed, a deprecation notice must be added to this file,
> > +and released with dpdk 2.0.  Then the change may be incorporated for
> > +dpdk 2.1 #. The LIBABIVER variable in the makefilei(s) where the ABI
> > +changes are incorporated must be incremented in parallel with the ABI
> > +changes themselves
> > +
> > +Note that the above process for ABI deprecation should not be
> > +undertaken lightly.  ABI stability is extreemely important for
> > +downstream consumers of the DPDK, especially when distributed in shared
> > +object form.  Every effort should be made to preserve ABI whenever
> > +possible.  For instance, reorganizing public structure field for
> > +astetic or readability purposes should be avoided as it will cause ABI
> > +breakage.  Only significant (e.g. performance) reasons should be seen as 
> > cause to alter ABI.
> > +
> > +Examples of Deprecation notices
> > +---
> > +* The Macro #RTE_FOO is deprecated and will be removed with version
> > +2.0, to be replaced with the inline function rte_

[dpdk-dev] [PATCH] eal/linux: allow to map BARs with MSI-X tables, around them

2015-01-22 Thread Dan Aloni

While VFIO doesn't allow us to map complete BARs with MSI-X tables,
it does allow us to map around them in PAGE_SIZE granularity. There
might be adapters that provide their registers in the same BAR
but on a different page. For example, Intel's NVME adapter, though
not a network adapter, provides only one MMIO BAR that contains
the MSI-X table.
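The page-granular split described above can be sketched as follows. This is an illustration of the arithmetic only, under the assumption that the pages overlapping the MSI-X table are left unmapped and everything else in the BAR is mapped; `struct region` and `map_around_msix()` are hypothetical names, not functions from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* One mappable byte range inside a BAR (hypothetical helper type). */
struct region {
	uint64_t offset;
	uint64_t size;
};

/*
 * Split a BAR of bar_size bytes into the ranges that can be mapped
 * without touching any page that holds part of the MSI-X table.
 * page_size must be a power of two.  Returns the number of ranges
 * written to out[] (0, 1 or 2).
 */
int map_around_msix(uint64_t bar_size, uint64_t table_offset,
		    uint64_t table_size, uint64_t page_size,
		    struct region out[2])
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	assert(table_offset + table_size <= bar_size);

	uint64_t mask = ~(page_size - 1);
	/* Round the table start down and the table end up to page bounds. */
	uint64_t hole_start = table_offset & mask;
	uint64_t hole_end = (table_offset + table_size + page_size - 1) & mask;
	int n = 0;

	if (hole_end > bar_size)
		hole_end = bar_size;

	if (hole_start > 0) {
		out[n].offset = 0;
		out[n].size = hole_start;
		n++;
	}
	if (hole_end < bar_size) {
		out[n].offset = hole_end;
		out[n].size = bar_size - hole_end;
		n++;
	}
	return n;
}
```

For a 64 KiB BAR with a 256-byte table at offset 0x2000 and 4 KiB pages, this yields the ranges [0, 0x2000) and [0x3000, 0x10000), so registers on any other page of the same BAR remain reachable.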

Signed-off-by: Dan Aloni 
CC: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c  |  5 +-
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |  2 +-
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c  |  4 +-
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 99 +++---
 lib/librte_eal/linuxapp/eal/eal_vfio.h |  8 ++-
 5 files changed, 101 insertions(+), 17 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index b5f54101e8aa..4a74a9372a15 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -118,13 +118,14 @@ pci_find_max_end_va(void)

 /* map a particular resource from a file */
 void *
-pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
+pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
+int additional_flags)
 {
void *mapaddr;

/* Map the PCI memory resource of device */
mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
-   MAP_SHARED, fd, offset);
+   MAP_SHARED | additional_flags, fd, offset);
if (mapaddr == MAP_FAILED) {
RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s 
(%p)\n",
__func__, fd, requested_addr,
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h 
b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index 1070eb88fe0a..0a0853d4c4df 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -66,7 +66,7 @@ extern void *pci_map_addr;
 void *pci_find_max_end_va(void);

 void *pci_map_resource(void *requested_addr, int fd, off_t offset,
-   size_t size);
+  size_t size, int additional_flags);

 /* map IGB_UIO resource prototype */
 int pci_uio_map_resource(struct rte_pci_device *dev);
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index e53f06b82430..eaa2e36f643e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -139,7 +139,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev)

if (pci_map_resource(uio_res->maps[i].addr, fd,
 (off_t)uio_res->maps[i].offset,
-(size_t)uio_res->maps[i].size)
+(size_t)uio_res->maps[i].size, 0)
!= uio_res->maps[i].addr) {
RTE_LOG(ERR, EAL,
"Cannot mmap device resource\n");
@@ -379,7 +379,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
pci_map_addr = pci_find_max_end_va();

mapaddr = pci_map_resource(pci_map_addr, fd, 
(off_t)offset,
-   (size_t)maps[j].size);
+   (size_t)maps[j].size, 0);
if (mapaddr == MAP_FAILED)
fail = 1;

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 20e097727f80..f6542a1f1464 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -62,6 +62,9 @@

 #ifdef VFIO_PRESENT

+#define PAGE_SIZE   (sysconf(_SC_PAGESIZE))
+#define PAGE_MASK   (~(PAGE_SIZE - 1))
+
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
@@ -72,10 +75,12 @@ static struct vfio_config vfio_cfg;

 /* get PCI BAR number where MSI-X interrupts are */
 static int
-pci_vfio_get_msix_bar(int fd, int *msix_bar)
+pci_vfio_get_msix_bar(int fd, int *msix_bar, uint32_t *msix_table_offset,
+ uint32_t *msix_table_size)
 {
int ret;
uint32_t reg;
+   uint16_t flags;
uint8_t cap_id, cap_offset;

/* read PCI capability pointer from config space */
@@ -134,7 +139,18 @@ pci_vfio_get_msix_bar(int fd, int *msix_bar)
return -1;
}

+   ret = pread64(fd, &flags, sizeof(flags),
+   
VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+   cap_offset + 2);
+   if (ret != sizeof(flags)) {
+   RTE_LOG(ERR, EAL, "Cannot read table flags from 
PCI config "
+

[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-22 Thread Luke Gorrie

Howdy!

This memcpy discussion is absolutely fascinating. Glad to be a fly on the
wall!

On 21 January 2015 at 22:25, Jim Thompson  wrote:

>
> The differences with DPDK are that a) entire cores (including the AVX/SSE
> units and even AES-NI (FPU)) are dedicated to DPDK, and b) DPDK is a library,
> and the resulting networking applications are exactly that, applications.
> The "operating system" is now a control plane.
>
>
Here is another thought: when is it time to start thinking of packet copy
as a cheap unit-time operation?

Packets are shrinking exponentially when measured in:

- Cache lines
- Cache load/store operations needed to copy
- Number of vector move instructions needed to copy

because those units are all based on exponentially growing quantities,
while the byte size of packets stays the same for many applications.
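To put rough numbers on that, here is a small sketch that counts how many fixed-size units (cache lines or vector registers) a copy has to touch. The 16, 32 and 64 byte widths correspond to SSE, AVX and AVX-512 registers; the function name and the packet sizes are chosen for illustration, and the count says nothing about actual cycle cost.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of unit_bytes-sized units needed to cover packet_bytes,
 * rounding up so that a partial unit counts as a whole one.
 */
size_t units_to_copy(size_t packet_bytes, size_t unit_bytes)
{
	assert(unit_bytes != 0);
	return (packet_bytes + unit_bytes - 1) / unit_bytes;
}
```

A 64-byte packet needs 4 moves with 16-byte vectors, 2 with 32-byte vectors and 1 with 64-byte vectors, while a 1500-byte packet still spans 24 cache lines of 64 bytes.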

So when is it time to stop caring?

(Are we already there, even, for certain conditions? How about Haswell CPU,
data already exclusively in our L1 cache, start and end both known to be
cache-line-aligned?)

Cheers,
-Luke (eagerly awaiting arrival of Haswell server...)

[dpdk-dev] [PATCH RFC 00/13] Update build system

2015-01-22 Thread Gonzalez Monroy, Sergio

> From: Gonzalez Monroy, Sergio
> Sent: Monday, January 12, 2015 5:22 PM
> To: Thomas Monjalon
> Subject: Re: [dpdk-dev] [PATCH RFC 00/13] Update build system
> 
> Hi Thomas,
> 
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Sent: Monday, January 12, 2015 4:52 PM
> >
> > Hi Sergio,
> >
> > 2015-01-12 16:33, Sergio Gonzalez Monroy:
> > > This patch series updates the DPDK build system.
> >
> > Thanks for proposing such rework.
> > We need discussions on that topic. So I ask some questions below.
> >
> > > Following are the goals it tries to accomplish:
> > >  - Create a library containing core DPDK libraries (librte_eal,
> > >librte_malloc, librte_mempool, librte_mbuf and librte_ring).
> > >The idea of core libraries is to group those libraries that are
> > >always required for any DPDK application.
> >
> > How is it better? Is it only to reduce dependencies lines?
> >
> In my opinion I think that there are a set of libraries that are always 
> required
> and therefore should be grouped as a single one.
> Basically all apps and other DPDK libs would have dependencies to these core
> libraries.
> 
> Aside from that, I don't think there is any difference. Note that this affects
> shared libraries, with no difference for apps linked against static libs.
> 
> > >  - Remove config option to build a combined library.
> >
> > Why remove the combined library? Are there people finding it helpful?
> >
> I don't think it makes sense from a shared library point of view, maybe it
> does for static?
> For example, in the case of shared libraries I think we want to try to avoid 
> the
> case where we have an app linked against librte_dpdk.so, but such library
> may contain different libraries depending on the options that were enabled
> when the lib was built.
> 
> The core libraries would be that set of libraries that are always required for
> an app, and its content would be fixed regardless of the option libraries 
> (like
> acl, hash, distributor, etc.) We could add more libraries as core if we think 
> it is
> a better solution, but the goal should be that librte_core.so contains the
> same libraries/API regardless of the system/arch.
> 
> > >  - For shared libraries, explicitly link against dependent
> > >libraries (adding entries to DT_NEEDED).
> >
> > OK, good.
> >
> > >  - Update app linking flags against static/shared DPDK libs.
> > >
> > > Note that this patch turns up being quite big because of moving lib
> > > directories to a new subdirectory.
> > > I have omitted the actual diff from the patch doing the move of
> > > librte_eal as it is quite big (6MB). Probably a different approach
> > > is
> > preferred.
> >
> > Why do you think moving directories is needed?
> >
> Actually I am not sure it is the best way to do this :) There is no need to move
> them, as the same result could be achieved without moving directories, but I
> thought that it would be easier for anyone to see which libraries are 'core'
> and which are not.
> 
> Not moving those directories would definitely simplify this patch series.
> 
> > Thanks
> > --
> > Thomas
> 
> Thanks,
> Sergio

Hi Thomas,

Any other comments/suggestions ? 
My main concern would be the patch needed to move librte_eal (around 6MB). 

Thoughts?

Thanks,
Sergio

[dpdk-dev] [PATCH v1 15/15] timer: add support to non-EAL thread

2015-01-22 Thread Walukiewicz, Miroslaw



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> Sent: Thursday, January 22, 2015 9:17 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v1 15/15] timer: add support to non-EAL thread
> 
> Allow to setup timers only for EAL (lcore) threads (__lcore_id <
> MAX_LCORE_ID).
> E.g. a dynamically created thread will be able to reset/stop timer for lcore
> thread,
> but it will be not allowed to setup timer for itself or another non-lcore
> thread.
> rte_timer_manage() for non-lcore thread would simply do nothing and
> return straightway.
> 
> Signed-off-by: Cunming Liang 
> ---
>  lib/librte_timer/rte_timer.c | 40 +++
> -
>  lib/librte_timer/rte_timer.h |  2 +-
>  2 files changed, 32 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
> index 269a992..601c159 100644
> --- a/lib/librte_timer/rte_timer.c
> +++ b/lib/librte_timer/rte_timer.c
> @@ -79,9 +79,10 @@ static struct priv_timer priv_timer[RTE_MAX_LCORE];
> 

Why not extend the priv_timer array so that it covers the whole range of values 
returned by rte_lcore_id()?

All the timer code, including stats, would then work automatically without any 
change to the timer logic.
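For comparison, the quoted patch takes the other route and guards every per-lcore access with a bounds check. A minimal sketch of that pattern, assuming a fixed-size table and invented names (`MAX_LCORE`, `per_lcore_stats`, `stat_add_reset` are stand-ins, not the rte_timer definitions):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LCORE 128u	/* hypothetical table size */

struct timer_stats {
	uint64_t reset;
	uint64_t stop;
	uint64_t manage;
};

static struct timer_stats per_lcore_stats[MAX_LCORE];

/*
 * Add n to the 'reset' counter of the calling thread's slot.
 * Threads whose id falls outside the table (non-EAL threads in the
 * patch's terms) are skipped.  Returns 1 if counted, 0 if skipped.
 */
int stat_add_reset(unsigned lcore_id, uint64_t n)
{
	if (lcore_id >= MAX_LCORE)
		return 0;
	per_lcore_stats[lcore_id].reset += n;
	return 1;
}

/* Read back the counter for a valid slot. */
uint64_t stat_get_reset(unsigned lcore_id)
{
	assert(lcore_id < MAX_LCORE);
	return per_lcore_stats[lcore_id].reset;
}
```

The trade-off is visible here: the guard keeps the table small but silently drops statistics for out-of-range threads, whereas enlarging the table would count them at the cost of memory.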

>  /* when debug is enabled, store some statistics */
>  #ifdef RTE_LIBRTE_TIMER_DEBUG
> -#define __TIMER_STAT_ADD(name, n) do {   \
> - unsigned __lcore_id = rte_lcore_id();   \
> - priv_timer[__lcore_id].stats.name += (n);   \
> +#define __TIMER_STAT_ADD(name, n) do {
>   \
> + unsigned __lcore_id = rte_lcore_id();   \
> + if (__lcore_id < RTE_MAX_LCORE)
>   \
> + priv_timer[__lcore_id].stats.name += (n);   \
>   } while(0)
>  #else
>  #define __TIMER_STAT_ADD(name, n) do {} while(0)
> @@ -127,15 +128,26 @@ timer_set_config_state(struct rte_timer *tim,
>   unsigned lcore_id;
> 
>   lcore_id = rte_lcore_id();
> + if (lcore_id >= RTE_MAX_LCORE)
> + lcore_id = LCORE_ID_ANY;
> 
>   /* wait that the timer is in correct status before update,
>* and mark it as being configured */
>   while (success == 0) {
>   prev_status.u32 = tim->status.u32;
> 
> + /*
> +  * prevent race condition of non-EAL threads
> +  * to update the timer. When 'owner == LCORE_ID_ANY',
> +  * it means updated by a non-EAL thread.
> +  */
> + if (lcore_id == (unsigned)LCORE_ID_ANY &&
> + (uint16_t)lcore_id == prev_status.owner)
> + return -1;
> +
>   /* timer is running on another core, exit */
>   if (prev_status.state == RTE_TIMER_RUNNING &&
> - (unsigned)prev_status.owner != lcore_id)
> + prev_status.owner != (uint16_t)lcore_id)
>   return -1;
> 
>   /* timer is being configured on another core */
> @@ -366,9 +378,13 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t
> expire,
> 
>   /* round robin for tim_lcore */
>   if (tim_lcore == (unsigned)LCORE_ID_ANY) {
> - tim_lcore =
> rte_get_next_lcore(priv_timer[lcore_id].prev_lcore,
> -0, 1);
> - priv_timer[lcore_id].prev_lcore = tim_lcore;
> + if (lcore_id < RTE_MAX_LCORE) {
> + tim_lcore = rte_get_next_lcore(
> + priv_timer[lcore_id].prev_lcore,
> + 0, 1);
> + priv_timer[lcore_id].prev_lcore = tim_lcore;
> + } else
> + tim_lcore = rte_get_next_lcore(LCORE_ID_ANY, 0,
> 1);
>   }
> 
>   /* wait that the timer is in correct status before update,
> @@ -378,7 +394,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t
> expire,
>   return -1;
> 
>   __TIMER_STAT_ADD(reset, 1);
> - if (prev_status.state == RTE_TIMER_RUNNING) {
> + if (prev_status.state == RTE_TIMER_RUNNING &&
> + lcore_id < RTE_MAX_LCORE) {
>   priv_timer[lcore_id].updated = 1;
>   }
> 
> @@ -455,7 +472,8 @@ rte_timer_stop(struct rte_timer *tim)
>   return -1;
> 
>   __TIMER_STAT_ADD(stop, 1);
> - if (prev_status.state == RTE_TIMER_RUNNING) {
> + if (prev_status.state == RTE_TIMER_RUNNING &&
> + lcore_id < RTE_MAX_LCORE) {
>   priv_timer[lcore_id].updated = 1;
>   }
> 
> @@ -499,6 +517,10 @@ void rte_timer_manage(void)
>   uint64_t cur_time;
>   int i, ret;
> 
> + /* timer manager only runs on EAL thread */
> + if (lcore_id >= RTE_MAX_LCORE)
> + return;
> +
>   __TIMER_STAT_ADD(manage, 1);
>   /* optimize for the case where per-cpu list is empty */
>   if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
> diff --git a/lib/librte_timer

[dpdk-dev] [PATCH] lib/librte_ether: change socket_id passed to rte_memzone_reserve

2015-01-22 Thread Ferriter, Cian

Hey guys,

I just wanted to ask whether anything more needs to be done with this patch, or 
whether it is in an acceptable state for pushing.

Cian

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ferriter, Cian
Sent: Monday, January 19, 2015 6:39 PM
To: Richardson, Bruce
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH] lib/librte_ether: change socket_id passed to 
rte_memzone_reserve

I would be happy with the original suggestion. If the ethdev data for a port in 
use is in cache, that removes the performance concern associated with both the 
current setup and my fix. The original suggestion also fixes the crash that I 
was seeing, which was caused by memory being reserved from a NUMA node with no 
"--socket-mem" allocated.

Cian

-Original Message-
From: Richardson, Bruce
Sent: Wednesday, January 14, 2015 10:10 AM
To: Ferriter, Cian
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH] lib/librte_ether: change socket_id passed to 
rte_memzone_reserve

On Tue, Jan 13, 2015 at 06:05:25PM +, Ferriter, Cian wrote:
> Comments on alternative solutions:
> 1) how would this solution work when there is no NIC present, and 
> "rte_eth_from_rings" is called? Here, could you have an else where the socket 
> id of the master core is passed to the "memzone_reserve"?
> 2) how would you advise making this change? I have looked at where 
> "rte_eth_dev_allocate" is being called and in all but one case, there is a 
> "numa_id" that could be passed in. This isn't the case for " 
> rte_eth_dev_init" however, is there an easy solution for this? Would there 
> now need to be an "rte_eth_dev_data" struct for each socket that there is a 
> NIC attached to, reserving memory from that socket?
> 
> Cian

While I think the issues you highlight can probably be overcome, I'm no longer 
so sure how much it matters which NUMA node this is allocated on. The ethdev 
data for any port in use should be in the cache. In that case, if it doesn't 
matter, your original suggestion would work fine.

/Bruce

> 
> -Original Message-
> From: Richardson, Bruce
> Sent: Tuesday, January 13, 2015 1:56 PM
> To: Ferriter, Cian
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] lib/librte_ether: change socket_id 
> passed to rte_memzone_reserve
> 
> On Tue, Jan 13, 2015 at 09:23:16AM +, Ferriter, Cian wrote:
> > Passing a socket id of "rte_socket_id()" can cause problems in non DPDK 
> > applications as there is a dependency on the current logical core we are 
> > running on.
> > Passing " rte_lcore_to_socket_id(rte_get_master_lcore())" as the socket id 
> > to rte_memzone_reserve resolves these issues as the master lcore doesn't 
> > change.
> > 
> 
> The only trouble is that when affinitizing the memory for the NICs to the 
> socket of the master lcore, it gives us no way to correctly configure an app 
> to use NICs connected to two different sockets on the one system. All memory 
> for all NICs will end up on the same socket. Two possible alternative 
> solutions:
> 1) affinitize memory to the socket the NIC is connected to
> 2) add a socket parameter to the API calls to allow the user complete 
> control over their memory allocations
> 
> Obviously the second one breaks backward compatibility (assume we modify 
> existing API call), but is more powerful.
> 
> Thoughts?
> 
> /Bruce
> 
> > -Original Message-
> > From: Ferriter, Cian
> > Sent: Tuesday, January 13, 2015 9:22 AM
> > To: dev at dpdk.org
> > Cc: Ferriter, Cian
> > Subject: [PATCH] lib/librte_ether: change socket_id passed to 
> > rte_memzone_reserve
> > 
> > Change the socket id that is passed to rte_memzone_reserve from the socket 
> > id of current logical core to the socket id of the master_lcore.
> > ---
> >  lib/librte_ether/rte_ethdev.c |2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)  mode change
> > 100644 => 100755 lib/librte_ether/rte_ethdev.c
> > 
> > diff --git a/lib/librte_ether/rte_ethdev.c 
> > b/lib/librte_ether/rte_ethdev.c old mode 100644 new mode 100755 
> > index 95f2ceb..835540d
> > --- a/lib/librte_ether/rte_ethdev.c
> > +++ b/lib/librte_ether/rte_ethdev.c
> > @@ -184,7 +184,7 @@ rte_eth_dev_data_alloc(void)
> > if (rte_eal_process_type() == RTE_PROC_PRIMARY){
> > mz = rte_memzone_reserve(MZ_RTE_ETH_DEV_DATA,
> > RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data),
> > -   rte_socket_id(), flags);
> > +   rte_lcore_to_socket_id(rte_get_master_lcore()), 
> > flags);
> > } else
> > mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA);
> > if (mz == NULL)
> > --
> > 1.7.4.1
> >

[dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL thread

2015-01-22 Thread Walukiewicz, Miroslaw



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> Sent: Thursday, January 22, 2015 9:17 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL
> thread
> 
> For non-EAL threads, bypass the per-lcore cache and use the ring pool directly.
> This allows using rte_mempool in either an EAL thread or any user pthread.
> In a non-EAL thread it relies directly on rte_ring, which is non-preemptive.
> Running multiple pthreads per CPU that compete for the same rte_mempool is
> not recommended.
> It will give bad performance and carries a critical risk if the scheduling policy is RT.
> 
> Signed-off-by: Cunming Liang 
> ---
>  lib/librte_mempool/rte_mempool.h | 18 +++---
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/librte_mempool/rte_mempool.h
> b/lib/librte_mempool/rte_mempool.h
> index 3314651..4845f27 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -198,10 +198,12 @@ struct rte_mempool {
>   *   Number to add to the object-oriented statistics.
>   */
>  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
> -#define __MEMPOOL_STAT_ADD(mp, name, n) do { \
> - unsigned __lcore_id = rte_lcore_id();   \
> - mp->stats[__lcore_id].name##_objs += n; \
> - mp->stats[__lcore_id].name##_bulk += 1; \
> +#define __MEMPOOL_STAT_ADD(mp, name, n) do {\
> + unsigned __lcore_id = rte_lcore_id();   \
> + if (__lcore_id < RTE_MAX_LCORE) {   \
> + mp->stats[__lcore_id].name##_objs += n; \
> + mp->stats[__lcore_id].name##_bulk += 1; \
> + }   \
>   } while(0)
>  #else
>  #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0)
> @@ -767,8 +769,9 @@ __mempool_put_bulk(struct rte_mempool *mp,
> void * const *obj_table,
>   __MEMPOOL_STAT_ADD(mp, put, n);
> 
>  #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> - /* cache is not enabled or single producer */
> - if (unlikely(cache_size == 0 || is_mp == 0))
> + /* cache is not enabled or single producer or none EAL thread */

I don't understand this limitation. 

I see that rte_mempool.h defines a table sized by RTE_MAX_LCORE, like below:
#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
/** Per-lcore local cache. */
struct rte_mempool_cache local_cache[RTE_MAX_LCORE];
#endif

But why can't we extend the size of the local cache table to something like 
RTE_MAX_THREADS that does not exceed the maximum value of rte_lcore_id()?

Keeping this condition here is a real performance killer!
In my test application I saw more than 95% of CPU time spent reading the atomic 
in the MC/MP ring used to access the mempool.

The same comment applies to the get operation below.

> + if (unlikely(cache_size == 0 || is_mp == 0 ||
> +  lcore_id >= RTE_MAX_LCORE))
>   goto ring_enqueue;
> 
>   /* Go straight to ring if put would overflow mem allocated for cache
> */
> @@ -952,7 +955,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void
> **obj_table,
>   uint32_t cache_size = mp->cache_size;
> 
>   /* cache is not enabled or single consumer */
> - if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size))
> + if (unlikely(cache_size == 0 || is_mc == 0 ||
> +  n >= cache_size || lcore_id >= RTE_MAX_LCORE))
>   goto ring_dequeue;
> 
>   cache = &mp->local_cache[lcore_id];
> --
> 1.8.1.4
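
The bypass discussed above can be modelled in a few lines. This is a toy
sketch, not the rte_mempool code: toy_pool, toy_put(), MAX_LCORE, LCORE_ANY
and CACHE_SIZE are hypothetical stand-ins, and the shared ring is reduced to
a counter. It only shows why a thread whose id is outside the per-lcore table
has to take the shared (contended) path.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LCORE  4            /* stand-in for RTE_MAX_LCORE */
#define LCORE_ANY  0xffffffffu  /* stand-in for the id seen by a non-EAL thread */
#define CACHE_SIZE 8            /* toy per-lcore cache capacity */

struct toy_cache {
    unsigned len;
    void *objs[CACHE_SIZE];
};

struct toy_pool {
    struct toy_cache local_cache[MAX_LCORE]; /* one slot per EAL lcore */
    unsigned ring_puts;                      /* counts shared-ring enqueues */
};

/* Return 1 if the object went to a per-lcore cache, 0 if it went to the ring. */
static int
toy_put(struct toy_pool *mp, unsigned lcore_id, void *obj)
{
    assert(mp != NULL);

    /*
     * A non-EAL thread has no slot in local_cache[], so indexing with its
     * id would run off the array; it must use the shared ring instead.
     * A full cache also falls through to the ring in this toy model.
     */
    if (lcore_id >= MAX_LCORE ||
        mp->local_cache[lcore_id].len >= CACHE_SIZE) {
        mp->ring_puts++;
        return 0;
    }

    struct toy_cache *c = &mp->local_cache[lcore_id];
    c->objs[c->len++] = obj;
    return 1;
}
```

Every put from a non-EAL thread lands on the shared path, which is the cost
the review comment objects to; a larger table indexed by a per-thread id is
the alternative it proposes.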

[dpdk-dev] [PATCH] doc: commands changed in testpmd_funcs for ethertype filter

2015-01-22 Thread Jingjing Wu

new commands for ethertype filter
  - ethertype_filter (port_id) (add|del) (mac_addr|mac_ignr)
(mac_address) ethertype (ether_type) (drop|fwd) queue (queue_id)

Signed-off-by: Jingjing Wu 
---
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 51 +++--
 1 file changed, 12 insertions(+), 39 deletions(-)

diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst 
b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index be935c2..218835a 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -1392,61 +1392,34 @@ Filter Functions

 This section details the available filter functions that are available.

-add_ethertype_filter
+ethertype_filter
 

-Add a L2 Ethertype filter, which identify packets by their L2 Ethertype mainly 
assign them to a receive queue.
+Add or delete a L2 Ethertype filter, which identify packets by their L2 
Ethertype mainly assign them to a receive queue.

-add_ethertype_filter (port_id) ethertype (eth_value) priority (enable|disable) 
(pri_value) queue (queue_id) index (idx)
+ethertype_filter (port_id) (add|del) (mac_addr|mac_ignr) (mac_address) 
ethertype (ether_type) (drop|fwd) queue (queue_id)

 The available information parameters are:

 *   port_id:  the port which the Ethertype filter assigned on.

-*   eth_value: the EtherType value want to match,
-for example 0x0806 for ARP packet. 0x0800 (IPv4) and 0x86DD (IPv6) are 
invalid.
-
-*   enable: user priority participates in the match.
-
-*   disable: user priority doesn't participate in the match.
-
-*   pri_value: user priority value that want to match.
-
-*   queue_id : The receive queue associated with this EtherType filter
+*   mac_addr: compare destination mac address.

-*   index: the index of this EtherType filter
+*   mac_ignr: ignore destination mac address match.

-Example:
-
-.. code-block:: console
+*   mac_address: destination mac address to match.

-testpmd> add_ethertype_filter 0 ethertype 0x0806 priority disable 0 queue 
3 index 0
-Assign ARP packet to receive queue 3
-
-remove_ethertype_filter
-~~~
-
-Remove a L2 Ethertype filter
-
-remove_ethertype_filter (port_id) index (idx)
-
-get_ethertype_filter
-
-
-Get and display a L2 Ethertype filter
+*   ether_type: the EtherType value want to match,
+for example 0x0806 for ARP packet. 0x0800 (IPv4) and 0x86DD (IPv6) are 
invalid.

-get_ethertype_filter (port_id) index (idx)
+*   queue_id : The receive queue associated with this EtherType filter. It is 
meaningless when deleting or dropping.

-Example:
+Example, to add/remove an ethertype filter rule:

 .. code-block:: console

-testpmd> get_ethertype_filter 0 index 0
-
-filter[0]:
-ethertype: 0x0806
-priority: disable, 0
-queue: 3
+testpmd> ethertype_filter 0 add mac_ignr ethertype 0x0806 fwd queue 3
+testpmd> ethertype_filter 0 del mac_ignr ethertype 0x0806 fwd queue 3

 add_2tuple_filter
 ~
-- 
1.9.3
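
The parameter rules in the documentation patch above can be sketched as a
small check. This is an illustration only: toy_ethertype_filter and
toy_ethertype_filter_check() are hypothetical names, not testpmd or ethdev
structures, and the only rule encoded is the one the patch text states
(0x0800 and 0x86DD are invalid EtherType values for this filter).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the command fields; not a DPDK structure. */
struct toy_ethertype_filter {
    uint16_t ether_type; /* EtherType to match, e.g. 0x0806 for ARP */
    int      drop;       /* 1 = drop, 0 = forward */
    uint16_t queue_id;   /* meaningless when dropping, per the patch text */
};

/* Return 0 if the filter is acceptable, -1 if the EtherType is rejected. */
static int
toy_ethertype_filter_check(const struct toy_ethertype_filter *f)
{
    assert(f != NULL);

    /* The documentation above lists IPv4 and IPv6 as invalid values. */
    if (f->ether_type == 0x0800 || f->ether_type == 0x86DD)
        return -1;
    return 0;
}
```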

[dpdk-dev] [PATCH v2] add one option memory-only for secondary processes

2015-01-22 Thread Chi, Xiaobo (NSN - CN/Hangzhou)

Hi, Bruce,
Since the DPDK 2.0 merge window is now open, is it possible for this patch 
to be a candidate for v2.0?
I searched the DPDK 
patchwork (http://www.dpdk.org/dev/patchwork/project/dpdk/list/?state=*&q=memory-only&archive=both
 ), but cannot find this v2 patch. Can you please help to check why? Thanks a 
lot.

Filters: Search = memory-only
Patch                                                           Date        Submitter  Delegate  State
[dpdk-dev] add one option memory-only for those secondary PRBs  2014-12-02  chixiaobo            Not Applicable
[dpdk-dev] add one option memory-only for those secondary PRBs  2014-12-02  chixiaobo            Changes Requested

Brgs,
Chi Xiaobo


-Original Message-

From: ext Bruce Richardson [mailto:bruce.richard...@intel.com] 
Sent: Tuesday, December 16, 2014 6:04 PM
To: Chi, Xiaobo (NSN - CN/Hangzhou)
Cc: ext Hiroshi Shimamoto; dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2] add one option memory-only for secondary 
processes

On Tue, Dec 16, 2014 at 09:26:48AM +, Chi, Xiaobo (NSN - CN/Hangzhou) wrote:
> Hi, Bruce,
> How about this patch, can it be merged to master branch? Thanks.
> 
> Brgs,
> Chi Xiaobo
> 

At this point, I think we are well past code-freeze for new features for 1.8,
but this looks a good candidate for 2.0 once the merge window for that opens.

/Bruce

> 
> -Original Message-
> From: Chi, Xiaobo (NSN - CN/Hangzhou) 
> Sent: Monday, December 15, 2014 5:58 PM
> To: 'ext Hiroshi Shimamoto'; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2] add one option memory-only for secondary 
> processes
> 
> Hi, Hiroshi,
> Yes, there should be performance degradation, not only due to the mempool 
> cache, but also due to process scheduling overhead (caused by the lack of CPU 
> pinning).
> I have not done any performance testing. In my project scenarios, those 
> SECONDARY processes only send/receive messages to/from the PRIMARY process 
> via mempool/ring, and the throughput is not so high, so the performance 
> degradation is not critical to us. But there are dozens of SECONDARY 
> processes in our system, and it would be hard to pin them manually to 
> different CPU cores; what we want is to apply the standard Linux scheduling 
> mechanism to balance load between CPU cores.
> 
> Brgs,
> Chi Xiaobo
> 
> 
> -Original Message-
> From: ext Hiroshi Shimamoto [mailto:h-shimamoto at ct.jp.nec.com] 
> Sent: Thursday, December 11, 2014 11:03 AM
> To: Chi, Xiaobo (NSN - CN/Hangzhou); dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2] add one option memory-only for secondary 
> processes
> 
> Hi,
> 
> sorry for the delay.
> 
> > Subject: RE: [dpdk-dev] [PATCH v2] add one option memory-only for secondary 
> > processes
> > 
> > Hi, Hiroshi,
> > Yes, you are right. To avoid such a problem, when creating the mempool 
> > that is shared between the primary process and the secondary processes, 
> > we need to set the cache_size parameter to zero. And to make the system 
> > more stable, it's better to define RTE_MEMPOOL_CACHE_MAX_SIZE as 0 in 
> > rte_config.h.
> 
> Yes, it prevents the data corruption, but it also hurts the performance.
> I think, if we use the mbuf w/o cache for PMD, we will see the performance 
> degradation.
> 
> Do you have any numbers?
> 
> thanks,
> Hiroshi
> 
> > 
> > /* create the mempool */
> > struct rte_mempool *
> > rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
> >unsigned cache_size, unsigned private_data_size,
> >rte_mempool_ctor_t *mp_init, void *mp_init_arg,
> >rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
> >int socket_id, unsigned flags);
> > 
> > 
> > Brgs,
> > Chi xiaobo
> > 
> > 
> > -Original Message-
> > From: ext Hiroshi Shimamoto [mailto:h-shimamoto at ct.jp.nec.com]
> > Sent: Wednesday, December 03, 2014 6:54 PM
> > To: Chi, Xiaobo (NSN - CN/Hangzhou); dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v2] add one option memory-only for secondary 
> > processes
> > 
> > Hi,
> > 
> > > Subject: [dpdk-dev] [PATCH v2] add one option memory-only for secondary 
> > > processes
> > >
> > > From: Chi Xiaobo 
> > >
> > > Problem: There is one normal DPDK processes deployment scenarios: one 
> > > primary process and several (even hundreds) secondary
> > > processes; all outside packets/messages are sent/received by primary 
> > > process and then distribute them to those secondary
> > > processes by DPDK's ring/sharedmemory mechanism. In such scenarios, those 
> > > SECONDARY processes need only hugepage based
> > > sharememory mechanism and it’s upper libs (such as ring, mempool, 
> > > etc.), they need not cpu core pinning, iopl privilege
> > > changing , pci device, timer, alarm, interrupt, shared_driver_list,  
> > > core_info, threads for each core, etc. Then, for
> > > such kind of SECONDARY processes, the current rte_eal_init() is too heavy.
> > >
> > > Solution:One new EAL

[dpdk-dev] Should the other queues at same port work when one queue is full ?

2015-01-22 Thread XU Liang

Thanks, I have validated the 'rx_drop_en' setting. It works.
Regards,
/Liang

--
From: Bruce Richardson
Time: 2015 Jan 19 (Mon) 18:58
To: ??
Cc: dev
Subject: Re: [dpdk-dev] Should the other queues at same port work when one queue is full ?
On Sun, Jan 18, 2015 at 07:12:31PM +0800, XU Liang wrote:
> I configured the 82599 ports to work in multi-queue mode and flow director to 
> assign different TCP connections to different queues. A multi-process 
> application receives packets from the queues, and each process reads one 
> queue. When I kill one process, that process's queue fills up: all 
> descriptors of the queue are used. Then I send packets to the other queues, 
> but no packet is received by the other processes from the other queues, and 
> there are no ierrors in the port stats. I'm not sure whether it's a bug or 
> designed that way.
> I expect a process exiting abnormally to affect only part of the 
> connections, but now none of the connections work properly. How can I 
> just turn off the abnormal queue, so that the other processes / queues work 
> properly?

You need to turn on the "drop enable" bit in your NIC configuration to allow 
packets for full queues to be dropped, allowing other queues to continue as 
normal.
In DPDK this is set by the value "rx_drop_en" in the rx configuration.

In the latest DPDK tree, you can see this value being set for the symmetric mp
example application in: examples/multi_process/symmetric_mp/main.c

Regards,
/Bruce
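
The effect of the "drop enable" bit can be illustrated with a toy model. This
is a sketch under simplifying assumptions, not driver code: toy_port and
toy_rx() are hypothetical names, and the shared receive buffer is reduced to
a single "stalled" flag that is set as soon as one packet cannot be placed.
In DPDK itself the setting is the "rx_drop_en" value in the rx configuration,
as noted above.

```c
#include <assert.h>
#include <stddef.h>

#define NB_QUEUES 2
#define RING_SIZE 4   /* toy number of rx descriptors per queue */

struct toy_port {
    int      drop_en;               /* models the "drop enable" bit */
    unsigned used[NB_QUEUES];       /* descriptors in use per queue */
    unsigned delivered[NB_QUEUES];  /* packets placed in each queue */
    unsigned dropped;               /* packets dropped for full queues */
    int      stalled;               /* a packet is blocking the shared buffer */
};

/* Receive one packet steered to queue q; nobody drains the queues here. */
static void
toy_rx(struct toy_port *p, unsigned q)
{
    assert(p != NULL && q < NB_QUEUES);

    if (p->stalled)
        return;                 /* stuck behind the blocked packet */

    if (p->used[q] < RING_SIZE) {
        p->used[q]++;
        p->delivered[q]++;
    } else if (p->drop_en) {
        p->dropped++;           /* full queue: drop, other queues go on */
    } else {
        p->stalled = 1;         /* full queue: packet blocks everything */
    }
}
```

With drop_en cleared, one dead queue stops delivery to every queue once its
descriptors run out; with drop_en set, only packets for the dead queue are
lost.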

[dpdk-dev] [PATCH v4 06/11] eal/linux/pci: Add functions for unmapping igb_uio resources

2015-01-22 Thread Qiu, Michael

On 1/21/2015 6:01 PM, Tetsuya Mukawa wrote:
> Hi Michael,
>
> On 2015/01/20 18:23, Qiu, Michael wrote:
>> On 1/19/2015 6:42 PM, Tetsuya Mukawa wrote:
>>> The patch adds functions for unmapping igb_uio resources. The patch is only
>>> for Linux and igb_uio environment. VFIO and BSD are not supported.
>>>
>>> v4:
>>> - Add parameter checking.
>>> - Add header file to determine if hotplug can be enabled.
>>>
>>> Signed-off-by: Tetsuya Mukawa 
>>> ---
>>>  lib/librte_eal/common/Makefile  |  1 +
>>>  lib/librte_eal/common/include/rte_dev_hotplug.h | 44 +
>>>  lib/librte_eal/linuxapp/eal/eal_pci.c   | 38 +++
>>>  lib/librte_eal/linuxapp/eal/eal_pci_init.h  |  8 +++
>>>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c   | 65 
>>> +
>>>  5 files changed, 156 insertions(+)
>>>  create mode 100644 lib/librte_eal/common/include/rte_dev_hotplug.h
>>>
>>> diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
>>> index 52c1a5f..db7cc93 100644
>>> --- a/lib/librte_eal/common/Makefile
>>> +++ b/lib/librte_eal/common/Makefile
>>> @@ -41,6 +41,7 @@ INC += rte_eal_memconfig.h rte_malloc_heap.h
>>>  INC += rte_hexdump.h rte_devargs.h rte_dev.h
>>>  INC += rte_common_vect.h
>>>  INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
>>> +INC += rte_dev_hotplug.h
>>>  
>>>  ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
>>>  INC += rte_warnings.h
>>> diff --git a/lib/librte_eal/common/include/rte_dev_hotplug.h 
>>> b/lib/librte_eal/common/include/rte_dev_hotplug.h
>>> new file mode 100644
>>> index 000..b333e0f
>>> --- /dev/null
>>> +++ b/lib/librte_eal/common/include/rte_dev_hotplug.h
>>> @@ -0,0 +1,44 @@
>>> +/*-
>>> + *   BSD LICENSE
>>> + *
>>> + *   Copyright(c) 2015 IGEL Co.,LTd.
>>> + *   All rights reserved.
>>> + *
>>> + *   Redistribution and use in source and binary forms, with or without
>>> + *   modification, are permitted provided that the following conditions
>>> + *   are met:
>>> + *
>>> + * * Redistributions of source code must retain the above copyright
>>> + *   notice, this list of conditions and the following disclaimer.
>>> + * * Redistributions in binary form must reproduce the above copyright
>>> + *   notice, this list of conditions and the following disclaimer in
>>> + *   the documentation and/or other materials provided with the
>>> + *   distribution.
>>> + * * Neither the name of IGEL Co.,Ltd. nor the names of its
>>> + *   contributors may be used to endorse or promote products derived
>>> + *   from this software without specific prior written permission.
>>> + *
>>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>>> + */
>>> +
>>> +#ifndef _RTE_DEV_HOTPLUG_H_
>>> +#define _RTE_DEV_HOTPLUG_H_
>>> +
>>> +/*
>>> + * determine if hotplug can be enabled on the system
>>> + */
>>> +#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
>> As you said, VFIO should not work with it, so does it need to add the
>> vfio check here?
> Could I have your advice?
> First I guess it's the best to include "eal_vfio.h" here, and add
> checking of VFIO_PRESENT macro.


I have a question: will your hotplug feature support FreeBSD?

If not, how about putting it in "lib/librte_eal/linuxapp/eal/", along with 
the attach and detach code?

> But it seems I cannot reach "eal_vfio.h" from this file.

Yes, you can't :)

> My second option is just checking RTE_EAL_VFIO macro.
> But according to "eal_vfio.h", if kernel is under 3.6.0, VFIO_PRESENT

Actually, in my opinion, whether it is vfio or uio only needs to be handled at
runtime.

DPDK checks for vfio only to add vfio support, but this does not
mean the device will use vfio.

So even if VFIO_PRESENT is defined and vfio is enabled, if the device
is bound to igb_uio then your hotplug still needs to work; if it is bound
to vfio, it will not. Am I right?

If yes, I'm not sure whether your hotplug has this ability, but I think it
would be reasonable.

> will not be defined even when RTE_EAL_VFIO is enabled.
> So I guess simply macro checking will not work correctly.
>  
> Anyway, here are my implementation choices so far.
>
> 1

[dpdk-dev] [PATCH 0/7] unification of flow types and RSS offload types

2015-01-22 Thread Wu, Jingjing



> -Original Message-
> From: Zhang, Helin
> Sent: Monday, January 19, 2015 2:56 PM
> To: dev at dpdk.org
> Cc: Wu, Jingjing; Cao, Waterman; Zhang, Helin
> Subject: [PATCH 0/7] unification of flow types and RSS offload types
> 
> It unifies the flow types and RSS offload types for all PMDs. Previously flow
> types were actually defined only for i40e, and there were different RSS offload
> types for 1/10G and 40G separately. This is not so convenient for application
> development, and not good for adding new PMDs. In addition, it enables
> new RSS offloads of 'tcp' and 'all' in testpmd.
> 
> 
> Helin Zhang (7):
>   app/test-pmd: code style fix
>   ethdev: code style fix
>   i40e: code style fix
>   ethdev: fix of calculating the size of flow type mask array
>   ethdev: unification of flow types
>   ethdev: unification of RSS offload types
>   app/testpmd: support new rss offloads
> 
>  app/test-pipeline/init.c|   2 +-
>  app/test-pmd/cmdline.c  | 107 +++
>  app/test-pmd/config.c   | 137 +++--
>  examples/distributor/main.c |   9 +-
>  examples/ip_pipeline/init.c |   2 +-
>  examples/l3fwd-acl/main.c   |   7 +-
>  lib/librte_ether/rte_eth_ctrl.h |  91 +++-
>  lib/librte_ether/rte_ethdev.h   | 147 
> +---
>  lib/librte_pmd_e1000/e1000_ethdev.h |  11 +++
>  lib/librte_pmd_e1000/igb_ethdev.c   |   1 +
>  lib/librte_pmd_e1000/igb_rxtx.c |  27 ++
>  lib/librte_pmd_i40e/i40e_ethdev.c   | 126 ++-
>  lib/librte_pmd_i40e/i40e_ethdev.h   |  50 +--
>  lib/librte_pmd_i40e/i40e_ethdev_vf.c|   1 +
>  lib/librte_pmd_i40e/i40e_fdir.c |  91 ++--
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   1 +
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  11 +++
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c   |  27 ++
>  lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c |   1 +
>  lib/librte_pmd_vmxnet3/vmxnet3_ethdev.h |   6 ++
>  lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c   |  10 +--
>  21 files changed, 473 insertions(+), 392 deletions(-)
> 
> --
> 1.9.3

Acked-by: Jingjing Wu

[dpdk-dev] [PATCH v8 3/4] i40e: support of controlling hash functions

2015-01-22 Thread Zhang, Helin

Hi Thomas

I have sent out v9 of this patch set, incorporating your comments. Thank you 
very much!

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, January 20, 2015 3:54 PM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v8 3/4] i40e: support of controlling hash
> functions
> 
> Hi Helin,
> 
> 2014-12-02 10:19, Helin Zhang:
> > Hash filter control has been implemented for i40e. It includes
> > getting/setting,
> > - global hash configurations (hash function type, and symmetric
> >   hash enable per flow type)
> > - symmetric hash enable per port
> >
> > Signed-off-by: Helin Zhang 
> > ---
> >  lib/librte_ether/rte_eth_ctrl.h   |  63 
> >  lib/librte_pmd_i40e/i40e_ethdev.c | 294
> > +-
> >  2 files changed, 355 insertions(+), 2 deletions(-)
> 
> Please, could you split ethdev and i40e parts while keeping Konstantin's ack?
I have split it into two patches. Please forgive my typo (splitted->split) in 
the change logs,
which will not appear in the commits.
Do I need to ask Konstantin to send out his Acked-by again, or can we just add it?
The only differences are shown below.

--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -485,7 +485,7 @@ enum rte_eth_hash_function {
  * include symmetric hash enable per flow type and hash function type.
  * Each bit in sym_hash_enable_mask[] indicates if the symmetric hash of the
  * coresponding flow type is enabled or not.
- * Each bit in valid_bit_mask[] indicates if the corresponding bit in
+ * Each bit in valid_bit_mask[] indicates if the coresponding bit in
  * sym_hash_enable_mask[] is valid or not. For the configurations gotten, it
  * also means if the flow type is supported by hardware or not.
  */
@@ -493,7 +493,7 @@ struct rte_eth_hash_global_conf {
enum rte_eth_hash_function hash_func; /**< Hash function type */
/** Bit mask for symmetric hash enable per flow type */
uint32_t sym_hash_enable_mask[RTE_SYM_HASH_MASK_ARRAY_SIZE];
-   /** Bit mask indicates if the corresponding bit is valid */
+   /** Bit mask indicates if the coresponding bit is valid */
uint32_t valid_bit_mask[RTE_SYM_HASH_MASK_ARRAY_SIZE];
 };

@@ -502,12 +502,12 @@ struct rte_eth_hash_global_conf {
  * type of 'RTE_ETH_FILTER_HASH' and its operations.
  */
 struct rte_eth_hash_filter_info {
-   enum rte_eth_hash_filter_info_type info_type; /**< Information type */
-   /** Details of hash filter information */
+   enum rte_eth_hash_filter_info_type info_type; /**< Information type. */
+   /** Details of hash filter infomation */
union {
-   /** For RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT */
+   /* For RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT */
uint8_t enable;
-   /** Global configurations of hash filter */
+   /* Global configurations of hash filter */
struct rte_eth_hash_global_conf global_conf;
} info;
 };

> 
> [...]
> > + * Each bit in valid_bit_mask[] indicates if the coresponding bit in
> 
> Typo: corresponding
Thanks, it is corrected in v9.

> 
> [...]
> > +   /** Bit mask indicates if the coresponding bit is valid */
> 
> Same typo
Thanks, it is corrected in v9.

> 
> [...]
> > +   /** Details of hash filter infomation */
> 
> Typo: information
Thanks, it is corrected in v9.

> 
> > +   union {
> > +   /* For RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT */
> > +   uint8_t enable;
> > +   /* Global configurations of hash filter */
> > +   struct rte_eth_hash_global_conf global_conf;
> > +   } info;
> 
> Why these comments are not doxygen'ed?
Thanks, it is enabled in v9.

> 
> Sorry for nitpicking, that's the last review pass ;)
Don't worry, it is not nitpicking from my point of view. I really appreciate 
your hard work!
Thank you!

Regards,
Helin

> --
> Thomas

[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-22 Thread Jay Rolette

On Thu, Jan 22, 2015 at 3:06 AM, Luke Gorrie  wrote:

> Here is another thought: when is it time to start thinking of packet copy
> as a cheap unit-time operation?
>

Pretty much never short of changes to memory architecture, IMO. Frankly,
there are never enough cycles for deep packet inspection applications that
need to run at/near line-rate. Don't waste any doing something you can
avoid in the first place.

Microseconds matter. Scaling up to 100GbE, nanoseconds matter.

Jay
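
The arithmetic behind "nanoseconds matter" can be worked out for minimum-size
Ethernet frames. A small sketch: per_packet_budget_ps() is a name made up for
this example, and it assumes 64-byte frames plus 20 bytes of preamble, start
delimiter and inter-frame gap on the wire.

```c
#include <assert.h>

/* 64-byte frame + 8 bytes preamble/SFD + 12 bytes inter-frame gap. */
#define MIN_WIRE_BYTES 84u

/*
 * Time available per minimum-size packet at line rate, in picoseconds,
 * for a link speed given in Gbit/s.
 */
static unsigned long
per_packet_budget_ps(unsigned link_gbps)
{
    assert(link_gbps > 0);

    /* bits divided by Gbit/s gives nanoseconds; scale by 1000 for ps. */
    return (MIN_WIRE_BYTES * 8ul * 1000ul) / link_gbps;
}
```

At 10GbE this gives 67200 ps (67.2 ns) per packet and at 100GbE 6720 ps
(6.72 ns), so any per-packet copy has to fit inside a budget of a few
nanoseconds.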

[dpdk-dev] [PATCH v2] add one option memory-only for secondary processes

2015-01-22 Thread Thomas Monjalon

2015-01-22 11:17, Bruce Richardson:
> On Thu, Jan 22, 2015 at 09:05:34AM +, Chi, Xiaobo (NSN - CN/Hangzhou) 
> wrote:
> > Hi, Bruce,
> > Since the DPDK2.0 merge window is opened now, so is it possible for this 
> > patch to be one candidate for v2.0?
> > I searched in the DPDK 
> > patchwork(http://www.dpdk.org/dev/patchwork/project/dpdk/list/?state=*&q=memory-only&archive=both
> >  ), but can not find this V2 patch. Can you please help to check why? 
> > Thanks a lot.
> > 
> > Filters: Search = memory-only
> > Patch                                                           Date        Submitter  Delegate  State
> > [dpdk-dev] add one option memory-only for those secondary PRBs  2014-12-02  chixiaobo            Not Applicable
> > [dpdk-dev] add one option memory-only for those secondary PRBs  2014-12-02  chixiaobo            Changes Requested
> > 
> > Brgs,
> > Chi Xiaobo
> > 
> That's a question that Thomas is better able to answer than me, since he is 
> the
> man with control over patchwork! :-)
> 
> Thomas, any feedback here?

I have no log for this kind of problem.
But I know that patchwork ignores emails with special characters.
And in your commit log, there are some in "mechanism and it’s upper libs".
Moreover, this commit log should be wrapped.
A quick look shows also that some spaces/tabs are missing.
It was a v2 and there is no change log.
Please submit a v3 after cleaning.

I didn't review this patch and nobody gave its Acked-by.
So at the moment, it's pending. I'll try to review v3 carefully.
Other comments are welcome. I feel this patch can break some important things.
Which tests have you done? (it could be described in commit log)

Last point: I don't like the current implementation of secondary process
and Ericsson wanted to discuss their own implementation:
http://dpdk.org/ml/archives/dev/2014-December/009796.html

-- 
Thomas
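
A quick way to check a commit log for the kind of special characters mentioned
above is to scan for bytes outside 7-bit ASCII before sending. This is a
hypothetical helper for illustration only; first_non_ascii() is not part of
any DPDK or patchwork tool.

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of the first non-ASCII byte in s, or -1 if there is none. */
static long
first_non_ascii(const char *s)
{
    assert(s != NULL);

    for (size_t i = 0; s[i] != '\0'; i++) {
        /* Bytes above 0x7F belong to multi-byte or legacy encodings. */
        if ((unsigned char)s[i] > 0x7F)
            return (long)i;
    }
    return -1;
}
```

A typographic apostrophe encoded as UTF-8 is reported at the offset of its
first byte, while a plain ASCII apostrophe passes.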

[dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine

2015-01-22 Thread Liu, Jijiang

Hi,

> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Thursday, January 22, 2015 3:45 AM
> To: Liu, Jijiang
> Cc: Olivier MATZ; Ananyev, Konstantin; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and
> csum forwarding engine
> 
> On Wed, 21 Jan 2015 03:12:35 +
> "Liu, Jijiang"  wrote:
> 
> > > Because the dpdk looks very similar to that part of linux driver.
> >
> > A  guy from Intel  who have already confirmed that the NVGRE is not
> supported yet in Linux kernel.
> >
> > He said "So far as I know it is not yet supported and I have no information
> on when it will be."
> 
> The existing GRETAP support is sufficient to support NVGRE. No new work is
> needed.

Sorry, I meant the i40e NVGRE feature.

[dpdk-dev] [RFC 00/16] enhance checksum offload API

2015-01-22 Thread Olivier MATZ

Test done on testpmd on x86_64-native-linuxapp-gcc

platform:

  Tester (linux)   <---->   DUT (DPDK)
      ixgbe6                port0 (i40e or ixgbe)

Run testpmd on DUT:

  cd dpdk.org/
  make install T=x86_64-native-linuxapp-gcc
  cd x86_64-native-linuxapp-gcc/
  modprobe uio
  insmod kmod/igb_uio.ko
  python ../tools/dpdk_nic_bind.py -b igb_uio :02:00.0
  echo 0 > /proc/sys/kernel/randomize_va_space
  echo 1000 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
  echo 1000 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
  mount -t hugetlbfs none /mnt/huge
  ./app/testpmd -c 0x55 -n 4 -m 800 -- -i --port-topology=chained --enable-rx-cksum

Disable all offload features on the tester, and start the capture:

  ethtool -K ixgbe6 rx off tx off tso off gso off gro off lro off
  ip l set ixgbe6 up
  tcpdump -n -e -i ixgbe6 -s 0 -w /tmp/cap

We use the attached scapy script (dpdk-cksum-test.py) for testing.

In each attached capture file, odd packets are generated by scapy and
even ones are generated by the dpdk (except for TSO, where several tx
packets correspond to one rx packet).

In some conditions, the checksum cannot be calculated, for instance when
inner checksum is done in hw and outer checksum in sw, or if it is not
supported by hardware.

Notes:
- case 6 is not present for ixgbe (inner + outer)
- case 5 is not present for i40e (tso)
- some strange behavior to be analyzed for first packets of tso on ixgbe
- ipip tunnel is not working in case 6 of i40e

--

case 1) calculate checksum of out_ip  (was case A in [1])

mb->l2_len = len(out_eth)
mb->l3_len = len(out_ip)
mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM
set out_ip checksum to 0 in the packet
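
For reference, when "csum set ip sw" is selected the same IP header checksum has to be computed by the CPU. The sketch below is only an illustration of the RFC 1071 one's-complement sum over a raw header. The function name sketch_ipv4_hdr_cksum is made up for this mail and is not a DPDK API; it assumes the checksum field was zeroed first, as in the step above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a DPDK API): RFC 1071 checksum over an IPv4
 * header given as raw bytes in network order. The checksum field inside
 * the header must be 0 when computing a fresh checksum. */
static uint16_t
sketch_ipv4_hdr_cksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	/* an IPv4 header is always a multiple of 4 bytes */
	assert(hdr != NULL && len % 2 == 0);

	/* add the header as 16-bit big-endian words */
	for (i = 0; i < len; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

	/* fold the carries back into the low 16 bits */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	/* the checksum is the one's complement of the folded sum */
	return (uint16_t)~sum;
}
```

Running the same function over a header that already carries a valid checksum returns 0, which is a convenient way to check a capture by hand.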

Testpmd commands:

stop
csum parse_tunnel off 0
csum set ip hw 0
csum set tcp sw 0
csum set udp sw 0
csum set sctp sw 0
tso set 0 0
set fwd csum
set verbose 1
start

--

case 2) calculate checksum of out_ip and out_udp

mb->l2_len = len(out_eth)
mb->l3_len = len(out_ip)
mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM
set out_ip checksum to 0 in the packet
set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()

Testpmd commands:

stop
csum parse_tunnel off 0
csum set ip hw 0
csum set tcp hw 0
csum set udp hw 0
csum set sctp hw 0
tso set 0 0
set fwd csum
set verbose 1
start

--

case 3) calculate checksum of in_ip

mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
mb->l3_len = len(in_ip)
mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM
set in_ip checksum to 0 in the packet

Testpmd commands:

stop
csum parse_tunnel on 0
csum set ip hw 0
csum set tcp sw 0
csum set udp sw 0
csum set sctp sw 0
csum set outer-ip sw 0
tso set 0 0
set fwd csum
set verbose 1
start

--

case 4) calculate checksum of in_ip and in_tcp  (was case B.2 in [1])

mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
mb->l3_len = len(in_ip)
mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM
set in_ip checksum to 0 in the packet
set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()

Testpmd commands:

stop
csum parse_tunnel on 0
csum set ip hw 0
csum set tcp hw 0
csum set udp hw 0
csum set sctp hw 0
csum set outer-ip sw 0
tso set 0 0
set fwd csum
set verbose 1
start

--

case 5) segment inner TCP

mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
mb->l3_len = len(in_ip)
mb->l4_len = len(in_tcp)
mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM |
  PKT_TX_TCP_SEG;
set in_ip checksum to 0 in the packet
set in_tcp checksum to pseudo header without including the IP
  payload length using rte_ipv4_phdr_cksum()
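
As a quick aid for reading the TSO captures, the number of tx packets to expect for one rx packet can be estimated as below. This is only a sketch: sketch_tso_nb_segs is a made-up name, and it assumes the segment size given to "tso set" is the maximum TCP payload per segment and that a value of 0 means no segmentation.

```c
#include <stdint.h>

/* Hypothetical helper: number of segments expected on the wire for one
 * packet carrying tcp_payload_len bytes of TCP payload. */
static uint32_t
sketch_tso_nb_segs(uint32_t tcp_payload_len, uint16_t segsz)
{
	/* segsz == 0 is assumed to mean TSO is disabled ("tso set 0 0") */
	if (segsz == 0 || tcp_payload_len == 0)
		return 1;

	/* round up: the last segment may be shorter than segsz */
	return (tcp_payload_len + segsz - 1) / segsz;
}
```

With "tso set 500 0", a packet carrying 1400 bytes of TCP payload would be expected to leave as 3 segments under these assumptions.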

Testpmd commands:

stop
csum parse_tunnel on 0
csum set ip hw 0
csum set tcp hw 0
csum set udp hw 0
csum set sctp hw 0
csum set outer-ip sw 0
tso set 500 0
set fwd csum
set verbose 1
start

--

case 6) calculate checksum of out_ip, in_ip, in_tcp (was case C in [1])

mb->outer_l2_len = len(out_eth)
mb->outer_l3_len = len(out_ip)
mb->l2_len = len(out_udp + vxlan + in_eth)
mb->l3_len = len(in_ip)
mb->ol_flags |= PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CKSUM  | \
  PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
set out_ip checksum to 0 in the packet
set in_ip checksum to 0 in the packet
set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()
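
The difference between the length fields of cases 3 to 5 (inner offload only) and case 6 (outer + inner) may be easier to see with numbers. The sketch below assumes an untagged Ethernet header, IPv4 without options, and the usual 8-byte UDP and VXLAN headers. The struct and function names are invented for this example and are not part of rte_mbuf.

```c
#include <stdint.h>

/* Header sizes assumed for this sketch. */
enum {
	SK_ETH_HLEN   = 14,
	SK_IPV4_HLEN  = 20, /* no IP options */
	SK_UDP_HLEN   = 8,
	SK_VXLAN_HLEN = 8
};

/* Hypothetical container mirroring the mbuf length fields used above. */
struct sk_tx_lens {
	uint16_t outer_l2_len;
	uint16_t outer_l3_len;
	uint16_t l2_len;
	uint16_t l3_len;
};

/* Cases 3 to 5: everything in front of the inner IP header counts as L2. */
static struct sk_tx_lens
sk_lens_inner_only(void)
{
	struct sk_tx_lens l = {0, 0, 0, 0};

	l.l2_len = SK_ETH_HLEN + SK_IPV4_HLEN + SK_UDP_HLEN +
		SK_VXLAN_HLEN + SK_ETH_HLEN;
	l.l3_len = SK_IPV4_HLEN;
	return l;
}

/* Case 6: outer headers are described separately, and l2_len covers
 * outer UDP + VXLAN + inner Ethernet. */
static struct sk_tx_lens
sk_lens_outer_and_inner(void)
{
	struct sk_tx_lens l = {0, 0, 0, 0};

	l.outer_l2_len = SK_ETH_HLEN;
	l.outer_l3_len = SK_IPV4_HLEN;
	l.l2_len = SK_UDP_HLEN + SK_VXLAN_HLEN + SK_ETH_HLEN;
	l.l3_len = SK_IPV4_HLEN;
	return l;
}
```

In both layouts the inner IP header starts at the same offset (64 bytes under these assumptions); only the way that offset is split between the fields changes.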

Testpmd commands:

stop
csum parse_tunnel on 0
csum set ip hw 0
csum set tcp hw 0
csum set udp hw 0
csum set sctp hw 0
csum set outer-ip hw 0
tso set 0 0
set fwd csum
set verbose 1
start

[dpdk-dev] [RFC 16/16] testpmd: support ipip tunnel in csum forward engine

2015-01-22 Thread Olivier Matz

Add support for IP over IP tunnels.

Signed-off-by: Olivier Matz 
---
 app/test-pmd/cmdline.c  |  2 +-
 app/test-pmd/csumonly.c | 40 ++--
 2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 9304207..b1832e3 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -323,7 +323,7 @@ static void cmd_help_long_parsed(void *parsed_result,
"ip|udp|tcp|sctp always concern the inner layer.\n"
"outer-ip concerns the outer IP layer in"
" case the packet is recognized as a tunnel packet by"
-   " the forward engine (vxlan and gre are supported)\n"
+   " the forward engine (vxlan, gre and ipip are supported)\n"
"Please check the NIC datasheet for HW limits.\n\n"

"csum parse-tunnel (on|off) (port_id)\n"
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 02c01f6..407e3b3 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -278,6 +278,35 @@ parse_gre(struct simple_gre_hdr *gre_hdr, struct testpmd_offload_info *info)
info->l2_len += sizeof(struct simple_gre_hdr);
 }

+
+/* Parse an encapsulated ip or ipv6 header */
+static void
+parse_encap_ip(void *encap_ip, struct testpmd_offload_info *info)
+{
+   struct ipv4_hdr *ipv4_hdr = encap_ip;
+   struct ipv6_hdr *ipv6_hdr = encap_ip;
+   uint8_t ip_version;
+
+   ip_version = (ipv4_hdr->version_ihl & 0xf0) >> 4;
+
+   if (ip_version != 4 && ip_version != 6)
+   return;
+
+   info->is_tunnel = 1;
+   info->outer_ethertype = info->ethertype;
+   info->outer_l2_len = info->l2_len;
+   info->outer_l3_len = info->l3_len;
+
+   if (ip_version == 4) {
+   parse_ipv4(ipv4_hdr, info);
+   info->ethertype = _htons(ETHER_TYPE_IPv4);
+   } else {
+   parse_ipv6(ipv6_hdr, info);
+   info->ethertype = _htons(ETHER_TYPE_IPv6);
+   }
+   info->l2_len = 0;
+}
+
 /* modify the IPv4 or IPv4 source address of a packet */
 static void
 change_ip_addresses(void *l3_hdr, uint16_t ethertype)
@@ -430,6 +459,7 @@ uint16_t testpmd_ol_flags)
  *   UDP|TCP|SCTP
  *   Ether / (vlan) / outer IP|IP6 / GRE / Ether / IP|IP6 / UDP|TCP|SCTP
  *   Ether / (vlan) / outer IP|IP6 / GRE / IP|IP6 / UDP|TCP|SCTP
+ *   Ether / (vlan) / outer IP|IP6 / IP|IP6 / UDP|TCP|SCTP
  *
  * The testpmd command line for this forward engine sets the flags
  * TESTPMD_TX_OFFLOAD_* in ports[tx_port].tx_ol_flags. They control
@@ -511,14 +541,12 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
gre_hdr = (struct simple_gre_hdr *)
((char *)l3_hdr + info.l3_len);
parse_gre(gre_hdr, &info);
+   } else if (info.l4_proto == IPPROTO_IPIP) {
+   void *encap_ip_hdr;
+   encap_ip_hdr = (char *)l3_hdr + info.l3_len;
+   parse_encap_ip(encap_ip_hdr, &info);
}
}
-   info.l4_proto == IPPROTO_GRE) {
-   struct simple_gre_hdr *gre_hdr;
-   gre_hdr = (struct simple_gre_hdr *)((char *)l3_hdr +
-   info.l3_len);
-   parse_gre(gre_hdr, &info);
-   }

/* update l3_hdr and outer_l3_hdr if a tunnel was parsed */
if (info.is_tunnel) {
-- 
2.1.3

[dpdk-dev] [RFC 15/16] testpmd: support gre tunnels in csum fwd engine

2015-01-22 Thread Olivier Matz

Add support for Ethernet over GRE and IP over GRE tunnels.

Signed-off-by: Olivier Matz 
---
 app/test-pmd/cmdline.c  |  6 ++--
 app/test-pmd/csumonly.c | 87 +
 2 files changed, 84 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 451c728..9304207 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -321,9 +321,9 @@ static void cmd_help_long_parsed(void *parsed_result,
" checksum with when transmitting a packet using the"
" csum forward engine.\n"
"ip|udp|tcp|sctp always concern the inner layer.\n"
-   "outer-ip concerns the outer IP layer (in"
-   " case the packet is recognized as a vxlan packet by"
-   " the forward engine)\n"
+   "outer-ip concerns the outer IP layer in"
+   " case the packet is recognized as a tunnel packet by"
+   " the forward engine (vxlan and gre are supported)\n"
"Please check the NIC datasheet for HW limits.\n\n"

"csum parse-tunnel (on|off) (port_id)\n"
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 52af0e7..02c01f6 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -100,6 +100,12 @@ struct testpmd_offload_info {
uint16_t tso_segsz;
 };

+/* simplified GRE header (flags must be 0) */
+struct simple_gre_hdr {
+   uint16_t flags;
+   uint16_t proto;
+};
+
 static uint16_t
 get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
 {
@@ -218,6 +224,60 @@ parse_vxlan(struct udp_hdr *udp_hdr, struct testpmd_offload_info *info,
info->l2_len += ETHER_VXLAN_HLEN; /* add udp + vxlan */
 }

+/* Parse a gre header */
+static void
+parse_gre(struct simple_gre_hdr *gre_hdr, struct testpmd_offload_info *info)
+{
+   struct ether_hdr *eth_hdr;
+   struct ipv4_hdr *ipv4_hdr;
+   struct ipv6_hdr *ipv6_hdr;
+
+   /* if flags != 0; it's not supported */
+   if (gre_hdr->flags != 0)
+   return;
+
+   if (gre_hdr->proto == _htons(ETHER_TYPE_IPv4)) {
+   info->is_tunnel = 1;
+   info->outer_ethertype = info->ethertype;
+   info->outer_l2_len = info->l2_len;
+   info->outer_l3_len = info->l3_len;
+
+   ipv4_hdr = (struct ipv4_hdr *)((char *)gre_hdr +
+   sizeof(struct simple_gre_hdr));
+
+   parse_ipv4(ipv4_hdr, info);
+   info->ethertype = _htons(ETHER_TYPE_IPv4);
+   info->l2_len = 0;
+
+   } else if (gre_hdr->proto == _htons(ETHER_TYPE_IPv6)) {
+   info->is_tunnel = 1;
+   info->outer_ethertype = info->ethertype;
+   info->outer_l2_len = info->l2_len;
+   info->outer_l3_len = info->l3_len;
+
+   ipv6_hdr = (struct ipv6_hdr *)((char *)gre_hdr +
+   sizeof(struct simple_gre_hdr));
+
+   info->ethertype = _htons(ETHER_TYPE_IPv6);
+   parse_ipv6(ipv6_hdr, info);
+   info->l2_len = 0;
+
+   } else if (gre_hdr->proto == _htons(0x6558)) { /* ETH_P_TEB in linux */
+   info->is_tunnel = 1;
+   info->outer_ethertype = info->ethertype;
+   info->outer_l2_len = info->l2_len;
+   info->outer_l3_len = info->l3_len;
+
+   eth_hdr = (struct ether_hdr *)((char *)gre_hdr +
+   sizeof(struct simple_gre_hdr));
+
+   parse_ethernet(eth_hdr, info);
+   } else
+   return;
+
+   info->l2_len += sizeof(struct simple_gre_hdr);
+}
+
 /* modify the IPv4 or IPv4 source address of a packet */
 static void
 change_ip_addresses(void *l3_hdr, uint16_t ethertype)
@@ -368,6 +428,8 @@ uint16_t testpmd_ol_flags)
  *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
  *   Ether / (vlan) / outer IP|IP6 / outer UDP / VxLAN / Ether / IP|IP6 /
  *   UDP|TCP|SCTP
+ *   Ether / (vlan) / outer IP|IP6 / GRE / Ether / IP|IP6 / UDP|TCP|SCTP
+ *   Ether / (vlan) / outer IP|IP6 / GRE / IP|IP6 / UDP|TCP|SCTP
  *
  * The testpmd command line for this forward engine sets the flags
  * TESTPMD_TX_OFFLOAD_* in ports[tx_port].tx_ol_flags. They control
@@ -437,12 +499,25 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
parse_ethernet(eth_hdr, &info);
l3_hdr = (char *)eth_hdr + info.l2_len;

-   /* check if it's a supported tunnel (only vxlan for now) */
-   if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_PARSE_TUNNEL) &&
-   info.l4_proto == IPPROTO_UDP) {
-   struct udp_hdr *udp_hdr;
-   udp_hdr = (struct udp_hdr *)((char *)l3_hdr + info.l3_len);
-   parse_vxlan(udp_hdr, &info, m->ol_flags);
+

[dpdk-dev] [RFC 14/16] testpmd: introduce parse_vxlan in csum fwd engine

2015-01-22 Thread Olivier Matz

Move code parsing vxlan into a function. It will ease the support
of GRE tunnels and IPIP tunnels in next commits.

Signed-off-by: Olivier Matz 
---
 app/test-pmd/csumonly.c | 68 +++--
 1 file changed, 37 insertions(+), 31 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 0b89d89..52af0e7 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -93,7 +93,6 @@ struct testpmd_offload_info {
uint16_t l3_len;
uint16_t l4_len;
uint8_t l4_proto;
-   uint8_t l4_tun_len;
uint8_t is_tunnel;
uint16_t outer_ethertype;
uint16_t outer_l2_len;
@@ -191,6 +190,34 @@ parse_ethernet(struct ether_hdr *eth_hdr, struct testpmd_offload_info *info)
}
 }

+/* Parse a vxlan header */
+static void
+parse_vxlan(struct udp_hdr *udp_hdr, struct testpmd_offload_info *info,
+   uint64_t mbuf_olflags)
+{
+   struct ether_hdr *eth_hdr;
+
+   /* check udp destination port, 4789 is the default vxlan port
+* (rfc7348) or that the rx offload flag is set (i40e only
+* currently) */
+   if (udp_hdr->dst_port != _htons(4789) &&
+   (mbuf_olflags & (PKT_RX_TUNNEL_IPV4_HDR |
+   PKT_RX_TUNNEL_IPV6_HDR)) == 0)
+   return;
+
+   info->is_tunnel = 1;
+   info->outer_ethertype = info->ethertype;
+   info->outer_l2_len = info->l2_len;
+   info->outer_l3_len = info->l3_len;
+
+   eth_hdr = (struct ether_hdr *)((char *)udp_hdr +
+   sizeof(struct udp_hdr) +
+   sizeof(struct vxlan_hdr));
+
+   parse_ethernet(eth_hdr, info);
+   info->l2_len += ETHER_VXLAN_HLEN; /* add udp + vxlan */
+}
+
 /* modify the IPv4 or IPv4 source address of a packet */
 static void
 change_ip_addresses(void *l3_hdr, uint16_t ethertype)
@@ -356,7 +383,6 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
struct rte_mbuf *m;
struct ether_hdr *eth_hdr;
void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
-   struct udp_hdr *udp_hdr;
uint16_t nb_rx;
uint16_t nb_tx;
uint16_t i;
@@ -414,33 +440,15 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
/* check if it's a supported tunnel (only vxlan for now) */
if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_PARSE_TUNNEL) &&
info.l4_proto == IPPROTO_UDP) {
+   struct udp_hdr *udp_hdr;
udp_hdr = (struct udp_hdr *)((char *)l3_hdr + info.l3_len);
+   parse_vxlan(udp_hdr, &info, m->ol_flags);
+   }

-   /* check udp destination port, 4789 is the default
-* vxlan port (rfc7348) */
-   if (udp_hdr->dst_port == _htons(4789)) {
-   info.l4_tun_len = ETHER_VXLAN_HLEN;
-   info.is_tunnel = 1;
-
-   /* currently, this flag is set by i40e only if the
-* packet is vxlan */
-   } else if (m->ol_flags & (PKT_RX_TUNNEL_IPV4_HDR |
-   PKT_RX_TUNNEL_IPV6_HDR))
-   info.is_tunnel = 1;
-
-   if (info.is_tunnel == 1) {
-   info.outer_ethertype = info.ethertype;
-   info.outer_l2_len = info.l2_len;
-   info.outer_l3_len = info.l3_len;
-   outer_l3_hdr = l3_hdr;
-
-   eth_hdr = (struct ether_hdr *)((char *)udp_hdr +
-   sizeof(struct udp_hdr) +
-   sizeof(struct vxlan_hdr));
-
-   parse_ethernet(eth_hdr, &info);
-   l3_hdr = (char *)eth_hdr + info.l2_len;
-   }
+   /* update l3_hdr and outer_l3_hdr if a tunnel was parsed */
+   if (info.is_tunnel) {
+   outer_l3_hdr = l3_hdr;
+   l3_hdr = (char *)l3_hdr + info.outer_l3_len + info.l2_len;
}

/* step 2: change all source IPs (v4 or v6) so we need
@@ -472,7 +480,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_OUTER_IP_CKSUM) {
m->outer_l2_len = info.outer_l2_len;
m->outer_l3_len = info.outer_l3_len;
-   m->l2_len = info.l4_tun_len + info.l2_len;
+   m->l2_len = info.l2_len;
m->l3_len = info.l3_len;
}
else {
@@ -482,9 +490,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
   the payload will be modifie

[dpdk-dev] [RFC 13/16] testpmd: use a structure to store offload info in csum fwd engine

2015-01-22 Thread Olivier Matz

To simplify the API of parse_* functions, store all the offload
information for the current packet in a structure.

No functional change.

Signed-off-by: Olivier Matz 
---
 app/test-pmd/csumonly.c | 222 +---
 1 file changed, 115 insertions(+), 107 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index b023f12..0b89d89 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -86,6 +86,21 @@
 #define _htons(x) (x)
 #endif

+/* structure that caches offload info for the current packet */
+struct testpmd_offload_info {
+   uint16_t ethertype;
+   uint16_t l2_len;
+   uint16_t l3_len;
+   uint16_t l4_len;
+   uint8_t l4_proto;
+   uint8_t l4_tun_len;
+   uint8_t is_tunnel;
+   uint16_t outer_ethertype;
+   uint16_t outer_l2_len;
+   uint16_t outer_l3_len;
+   uint16_t tso_segsz;
+};
+
 static uint16_t
 get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
 {
@@ -106,38 +121,36 @@ get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)

 /* Parse an IPv4 header to fill l3_len, l4_len, and l4_proto */
 static void
-parse_ipv4(struct ipv4_hdr *ipv4_hdr, uint16_t *l3_len, uint8_t *l4_proto,
-   uint16_t *l4_len)
+parse_ipv4(struct ipv4_hdr *ipv4_hdr, struct testpmd_offload_info *info)
 {
struct tcp_hdr *tcp_hdr;

-   *l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
-   *l4_proto = ipv4_hdr->next_proto_id;
+   info->l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
+   info->l4_proto = ipv4_hdr->next_proto_id;

/* only fill l4_len for TCP, it's useful for TSO */
-   if (*l4_proto == IPPROTO_TCP) {
-   tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + *l3_len);
-   *l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
+   if (info->l4_proto == IPPROTO_TCP) {
+   tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + info->l3_len);
+   info->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
} else
-   *l4_len = 0;
+   info->l4_len = 0;
 }

 /* Parse an IPv6 header to fill l3_len, l4_len, and l4_proto */
 static void
-parse_ipv6(struct ipv6_hdr *ipv6_hdr, uint16_t *l3_len, uint8_t *l4_proto,
-   uint16_t *l4_len)
+parse_ipv6(struct ipv6_hdr *ipv6_hdr, struct testpmd_offload_info *info)
 {
struct tcp_hdr *tcp_hdr;

-   *l3_len = sizeof(struct ipv6_hdr);
-   *l4_proto = ipv6_hdr->proto;
+   info->l3_len = sizeof(struct ipv6_hdr);
+   info->l4_proto = ipv6_hdr->proto;

/* only fill l4_len for TCP, it's useful for TSO */
-   if (*l4_proto == IPPROTO_TCP) {
-   tcp_hdr = (struct tcp_hdr *)((char *)ipv6_hdr + *l3_len);
-   *l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
+   if (info->l4_proto == IPPROTO_TCP) {
+   tcp_hdr = (struct tcp_hdr *)((char *)ipv6_hdr + info->l3_len);
+   info->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
} else
-   *l4_len = 0;
+   info->l4_len = 0;
 }

 /*
@@ -146,35 +159,34 @@ parse_ipv6(struct ipv6_hdr *ipv6_hdr, uint16_t *l3_len, uint8_t *l4_proto,
  * header. The l4_len argument is only set in case of TCP (useful for TSO).
  */
 static void
-parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t *l2_len,
-   uint16_t *l3_len, uint8_t *l4_proto, uint16_t *l4_len)
+parse_ethernet(struct ether_hdr *eth_hdr, struct testpmd_offload_info *info)
 {
struct ipv4_hdr *ipv4_hdr;
struct ipv6_hdr *ipv6_hdr;

-   *l2_len = sizeof(struct ether_hdr);
-   *ethertype = eth_hdr->ether_type;
+   info->l2_len = sizeof(struct ether_hdr);
+   info->ethertype = eth_hdr->ether_type;

-   if (*ethertype == _htons(ETHER_TYPE_VLAN)) {
+   if (info->ethertype == _htons(ETHER_TYPE_VLAN)) {
struct vlan_hdr *vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);

-   *l2_len  += sizeof(struct vlan_hdr);
-   *ethertype = vlan_hdr->eth_proto;
+   info->l2_len  += sizeof(struct vlan_hdr);
+   info->ethertype = vlan_hdr->eth_proto;
}

-   switch (*ethertype) {
+   switch (info->ethertype) {
case _htons(ETHER_TYPE_IPv4):
-   ipv4_hdr = (struct ipv4_hdr *) ((char *)eth_hdr + *l2_len);
-   parse_ipv4(ipv4_hdr, l3_len, l4_proto, l4_len);
+   ipv4_hdr = (struct ipv4_hdr *) ((char *)eth_hdr + info->l2_len);
+   parse_ipv4(ipv4_hdr, info);
break;
case _htons(ETHER_TYPE_IPv6):
-   ipv6_hdr = (struct ipv6_hdr *) ((char *)eth_hdr + *l2_len);
-   parse_ipv6(ipv6_hdr, l3_len, l4_proto, l4_len);
+   ipv6_hdr = (struct ipv6_hdr *) ((char *)eth_hdr + info->l2_len);
+   parse_ipv6(ipv6_hdr, info);
break;
default:
-   *l4_len = 0;
-   *l3_len = 0;
-   *l4_proto = 0;
+

[dpdk-dev] [RFC 12/16] testpmd: introduce parse_ipv* in csum fwd engine

2015-01-22 Thread Olivier Matz

These functions can be used to parse encapsulated layers
once IP over GRE tunnels are supported.

No functional change.

Signed-off-by: Olivier Matz 
---
 app/test-pmd/csumonly.c | 51 +
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 3921643..b023f12 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -104,6 +104,42 @@ get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
return rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr);
 }

+/* Parse an IPv4 header to fill l3_len, l4_len, and l4_proto */
+static void
+parse_ipv4(struct ipv4_hdr *ipv4_hdr, uint16_t *l3_len, uint8_t *l4_proto,
+   uint16_t *l4_len)
+{
+   struct tcp_hdr *tcp_hdr;
+
+   *l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
+   *l4_proto = ipv4_hdr->next_proto_id;
+
+   /* only fill l4_len for TCP, it's useful for TSO */
+   if (*l4_proto == IPPROTO_TCP) {
+   tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + *l3_len);
+   *l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
+   } else
+   *l4_len = 0;
+}
+
+/* Parse an IPv6 header to fill l3_len, l4_len, and l4_proto */
+static void
+parse_ipv6(struct ipv6_hdr *ipv6_hdr, uint16_t *l3_len, uint8_t *l4_proto,
+   uint16_t *l4_len)
+{
+   struct tcp_hdr *tcp_hdr;
+
+   *l3_len = sizeof(struct ipv6_hdr);
+   *l4_proto = ipv6_hdr->proto;
+
+   /* only fill l4_len for TCP, it's useful for TSO */
+   if (*l4_proto == IPPROTO_TCP) {
+   tcp_hdr = (struct tcp_hdr *)((char *)ipv6_hdr + *l3_len);
+   *l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
+   } else
+   *l4_len = 0;
+}
+
 /*
  * Parse an ethernet header to fill the ethertype, l2_len, l3_len and
  * ipproto. This function is able to recognize IPv4/IPv6 with one optional vlan
@@ -115,7 +151,6 @@ parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t *l2_len,
 {
struct ipv4_hdr *ipv4_hdr;
struct ipv6_hdr *ipv6_hdr;
-   struct tcp_hdr *tcp_hdr;

*l2_len = sizeof(struct ether_hdr);
*ethertype = eth_hdr->ether_type;
@@ -130,26 +165,18 @@ parse_ethernet(struct ether_hdr *eth_hdr, uint16_t *ethertype, uint16_t *l2_len,
switch (*ethertype) {
case _htons(ETHER_TYPE_IPv4):
ipv4_hdr = (struct ipv4_hdr *) ((char *)eth_hdr + *l2_len);
-   *l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
-   *l4_proto = ipv4_hdr->next_proto_id;
+   parse_ipv4(ipv4_hdr, l3_len, l4_proto, l4_len);
break;
case _htons(ETHER_TYPE_IPv6):
ipv6_hdr = (struct ipv6_hdr *) ((char *)eth_hdr + *l2_len);
-   *l3_len = sizeof(struct ipv6_hdr);
-   *l4_proto = ipv6_hdr->proto;
+   parse_ipv6(ipv6_hdr, l3_len, l4_proto, l4_len);
break;
default:
+   *l4_len = 0;
*l3_len = 0;
*l4_proto = 0;
break;
}
-
-   if (*l4_proto == IPPROTO_TCP) {
-   tcp_hdr = (struct tcp_hdr *)((char *)eth_hdr +
-   *l2_len + *l3_len);
-   *l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
-   } else
-   *l4_len = 0;
 }

 /* modify the IPv4 or IPv4 source address of a packet */
-- 
2.1.3

[dpdk-dev] [RFC 11/16] testpmd: rename vxlan in outer_ip in csum commands

2015-01-22 Thread Olivier Matz

The tx_checksum command concerns the outer IP checksum, not a VxLAN checksum.
Actually there is no checksum in the VxLAN header: there is one checksum in
the outer IP header, and one checksum in the outer UDP header. This option
only controls the outer IP checksum.

Signed-off-by: Olivier Matz 
---
 app/test-pmd/cmdline.c  | 16 
 app/test-pmd/csumonly.c | 25 ++---
 app/test-pmd/testpmd.h  |  4 ++--
 3 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 1d294bc..451c728 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -316,12 +316,12 @@ static void cmd_help_long_parsed(void *parsed_result,
"Disable hardware insertion of a VLAN header in"
" packets sent on a port.\n\n"

-   "csum set (ip|udp|tcp|sctp|vxlan) (hw|sw) (port_id)\n"
+   "csum set (ip|udp|tcp|sctp|outer-ip) (hw|sw) (port_id)\n"
"Select hardware or software calculation of the"
" checksum with when transmitting a packet using the"
" csum forward engine.\n"
"ip|udp|tcp|sctp always concern the inner layer.\n"
-   "vxlan concerns the outer IP and UDP layer (in"
+   "outer-ip concerns the outer IP layer (in"
" case the packet is recognized as a vxlan packet by"
" the forward engine)\n"
"Please check the NIC datasheet for HW limits.\n\n"
@@ -2887,8 +2887,8 @@ csum_show(int port_id)
(ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) ? "hw" : "sw");
printf("SCTP checksum offload is %s\n",
(ol_flags & TESTPMD_TX_OFFLOAD_SCTP_CKSUM) ? "hw" : "sw");
-   printf("VxLAN checksum offload is %s\n",
-   (ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM) ? "hw" : "sw");
+   printf("Outer-Ip checksum offload is %s\n",
+   (ol_flags & TESTPMD_TX_OFFLOAD_OUTER_IP_CKSUM) ? "hw" : "sw");

/* display warnings if configuration is not supported by the NIC */
rte_eth_dev_info_get(port_id, &dev_info);
@@ -2942,8 +2942,8 @@ cmd_csum_parsed(void *parsed_result,
mask = TESTPMD_TX_OFFLOAD_TCP_CKSUM;
} else if (!strcmp(res->proto, "sctp")) {
mask = TESTPMD_TX_OFFLOAD_SCTP_CKSUM;
-   } else if (!strcmp(res->proto, "vxlan")) {
-   mask = TESTPMD_TX_OFFLOAD_VXLAN_CKSUM;
+   } else if (!strcmp(res->proto, "outer-ip")) {
+   mask = TESTPMD_TX_OFFLOAD_OUTER_IP_CKSUM;
}

if (hw)
@@ -2962,7 +2962,7 @@ cmdline_parse_token_string_t cmd_csum_mode =
mode, "set");
 cmdline_parse_token_string_t cmd_csum_proto =
TOKEN_STRING_INITIALIZER(struct cmd_csum_result,
-   proto, "ip#tcp#udp#sctp#vxlan");
+   proto, "ip#tcp#udp#sctp#outer-ip");
 cmdline_parse_token_string_t cmd_csum_hwsw =
TOKEN_STRING_INITIALIZER(struct cmd_csum_result,
hwsw, "hw#sw");
@@ -2974,7 +2974,7 @@ cmdline_parse_inst_t cmd_csum_set = {
.f = cmd_csum_parsed,
.data = NULL,
.help_str = "enable/disable hardware calculation of L3/L4 checksum when "
-   "using csum forward engine: csum set ip|tcp|udp|sctp|vxlan hw|sw ",
+   "using csum forward engine: csum set ip|tcp|udp|sctp|outer-ip hw|sw ",
.tokens = {
(void *)&cmd_csum_csum,
(void *)&cmd_csum_mode,
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 858eb47..3921643 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -259,13 +259,16 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
ipv4_hdr->hdr_checksum = 0;
ol_flags |= PKT_TX_OUTER_IPV4;

-   if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
+   if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_OUTER_IP_CKSUM)
ol_flags |= PKT_TX_OUTER_IP_CKSUM;
else
ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
-   } else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
+   } else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_OUTER_IP_CKSUM)
ol_flags |= PKT_TX_OUTER_IPV6;

+   /* outer UDP checksum is always done in software as we have no
+* hardware supporting it today, and no API for it. */
+
udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + outer_l3_len);
/* do not recalculate udp cksum if it was 0 */
if (udp_hdr->dgram_cksum != 0) {
@@ -300,8 +303,8 @@ process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
  * The testpmd comm

[dpdk-dev] [RFC 10/16] testpmd: add csum parse_tunnel command

2015-01-22 Thread Olivier Matz

Add a new command related to csum forward engine:

  csum parse_tunnel (on|off) (port_id)

If enabled, the tunnel packets received by the csum forward engine are
parsed and seen as "outer-headers/inner-headers/data".

If disabled, the parsing of the csum forward engine stops at the first
l4 layer. A tunnel packet is seen as "headers/data" (inner headers are
included in payload).
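
The on/off behaviour boils down to setting or clearing one bit in the per-port tx offload flags. A minimal standalone sketch is shown below. SKETCH_PARSE_TUNNEL and its value are placeholders chosen for this example (the real TESTPMD_TX_OFFLOAD_PARSE_TUNNEL value is defined in testpmd.h), and sketch_set_parse_tunnel is not a testpmd function.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder bit for this sketch only; the real flag value lives in
 * app/test-pmd/testpmd.h. */
#define SKETCH_PARSE_TUNNEL 0x0020

/* Return the updated flags: "on" sets the bit, anything else clears it. */
static uint16_t
sketch_set_parse_tunnel(uint16_t tx_ol_flags, const char *onoff)
{
	if (strcmp(onoff, "on") == 0)
		return (uint16_t)(tx_ol_flags | SKETCH_PARSE_TUNNEL);

	return (uint16_t)(tx_ol_flags & ~SKETCH_PARSE_TUNNEL);
}
```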

Signed-off-by: Olivier Matz 
---
 app/test-pmd/cmdline.c  | 64 +
 app/test-pmd/csumonly.c |  3 ++-
 app/test-pmd/testpmd.h  |  5 +++-
 3 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 260a273..1d294bc 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -326,6 +326,10 @@ static void cmd_help_long_parsed(void *parsed_result,
" the forward engine)\n"
"Please check the NIC datasheet for HW limits.\n\n"

+   "csum parse-tunnel (on|off) (port_id)\n"
+   "If disabled, treat tunnel packets as non-tunneled"
+   " packets (treat inner headers as payload).\n\n"
+
"csum show (port_id)\n"
"Display tx checksum offload configuration\n\n"

@@ -2873,6 +2877,8 @@ csum_show(int port_id)
uint16_t ol_flags;

ol_flags = ports[port_id].tx_ol_flags;
+   printf("Parse tunnel is %s\n",
+   (ol_flags & TESTPMD_TX_OFFLOAD_PARSE_TUNNEL) ? "on" : "off");
printf("IP checksum offload is %s\n",
(ol_flags & TESTPMD_TX_OFFLOAD_IP_CKSUM) ? "hw" : "sw");
printf("UDP checksum offload is %s\n",
@@ -2995,6 +3001,63 @@ cmdline_parse_inst_t cmd_csum_show = {
},
 };

+/* Enable/disable tunnel parsing */
+struct cmd_csum_tunnel_result {
+   cmdline_fixed_string_t csum;
+   cmdline_fixed_string_t parse;
+   cmdline_fixed_string_t onoff;
+   uint8_t port_id;
+};
+
+static void
+cmd_csum_tunnel_parsed(void *parsed_result,
+  __attribute__((unused)) struct cmdline *cl,
+  __attribute__((unused)) void *data)
+{
+   struct cmd_csum_tunnel_result *res = parsed_result;
+
+   if (port_id_is_invalid(res->port_id)) {
+   printf("invalid port %d\n", res->port_id);
+   return;
+   }
+
+   if (!strcmp(res->onoff, "on"))
+   ports[res->port_id].tx_ol_flags |=
+   TESTPMD_TX_OFFLOAD_PARSE_TUNNEL;
+   else
+   ports[res->port_id].tx_ol_flags &=
+   (~TESTPMD_TX_OFFLOAD_PARSE_TUNNEL);
+
+   csum_show(res->port_id);
+}
+
+cmdline_parse_token_string_t cmd_csum_tunnel_csum =
+   TOKEN_STRING_INITIALIZER(struct cmd_csum_tunnel_result,
+   csum, "csum");
+cmdline_parse_token_string_t cmd_csum_tunnel_parse =
+   TOKEN_STRING_INITIALIZER(struct cmd_csum_tunnel_result,
+   parse, "parse_tunnel");
+cmdline_parse_token_string_t cmd_csum_tunnel_onoff =
+   TOKEN_STRING_INITIALIZER(struct cmd_csum_tunnel_result,
+   onoff, "on#off");
+cmdline_parse_token_num_t cmd_csum_tunnel_portid =
+   TOKEN_NUM_INITIALIZER(struct cmd_csum_tunnel_result,
+   port_id, UINT8);
+
+cmdline_parse_inst_t cmd_csum_tunnel = {
+   .f = cmd_csum_tunnel_parsed,
+   .data = NULL,
+   .help_str = "enable/disable parsing of tunnels for csum engine: "
+   "csum parse_tunnel on|off ",
+   .tokens = {
+   (void *)&cmd_csum_tunnel_csum,
+   (void *)&cmd_csum_tunnel_parse,
+   (void *)&cmd_csum_tunnel_onoff,
+   (void *)&cmd_csum_tunnel_portid,
+   NULL,
+   },
+};
+
 /* *** ENABLE HARDWARE SEGMENTATION IN TX PACKETS *** */
 struct cmd_tso_set_result {
cmdline_fixed_string_t tso;
@@ -8731,6 +8794,7 @@ cmdline_parse_ctx_t main_ctx[] = {
(cmdline_parse_inst_t *)&cmd_tx_vlan_set_pvid,
(cmdline_parse_inst_t *)&cmd_csum_set,
(cmdline_parse_inst_t *)&cmd_csum_show,
+   (cmdline_parse_inst_t *)&cmd_csum_tunnel,
(cmdline_parse_inst_t *)&cmd_tso_set,
(cmdline_parse_inst_t *)&cmd_tso_show,
(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index ca5ca39..858eb47 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -373,7 +373,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
l3_hdr = (char *)eth_hdr + l2_len;

/* check if it's a supported tunnel (only vxlan for now) */
-   if (l4_proto == IPPROTO_UDP) {
+   if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_PARSE_TUNNEL) &&
+   l4_proto == IPPROTO_UDP) {
udp_hdr = (struct udp_hdr *)((char *)l3_hdr + l3_len);
