[dpdk-dev] Could not achieve wire speed for 40GE with any DPDK version on XL710 NIC's

2015-07-01 Thread Vladimir Medvedkin
Hi Anuj,

Thanks for the fixes!
I have two comments:
- from i40e_ethdev.h : #define I40E_DEFAULT_RX_WTHRESH  0
- (26 + 32) / 4 (batched descriptor writeback) should be (26 + 4 * 32) / 4
(batched descriptor writeback),
thus we have 135 bytes/packet

This corresponds to 58.8 Mpps

Regards,
Vladimir

2015-07-01 17:22 GMT+03:00 Anuj Kalia :

> Vladimir,
>
> A few possible fixes to your PCIe analysis (let me know if I'm wrong):
> - ECRC is probably disabled (check using sudo lspci -vvv | grep
> CGenEn-), so TLP header is 26 bytes
> - Descriptor writeback can be batched using high value of WTHRESH,
> which is what DPDK uses by default
> - Read request contains full TLP header (26 bytes)
>
> Assuming WTHRESH = 4, bytes transferred from NIC to host per packet =
> 26 + 64 (packet itself) +
> (26 + 32) / 4 (batched descriptor writeback) +
> (26 / 4) (read request for new descriptors) =
> 111 bytes / packet
>
> This corresponds to 70.9 Mpps over PCIe 3.0 x8. Assuming 5% DLLP
> overhead, rate = 67.4 Mpps
>
> --Anuj
>
>
>
> On Wed, Jul 1, 2015 at 9:40 AM, Vladimir Medvedkin 
> wrote:
> > In the case of a SYN flood you should take into account the return SYN-ACK
> > traffic, which generates PCIe DLLPs from NIC to host, thus PCIe bandwidth
> > is exhausted faster. And don't forget about the DLLPs generated by RX
> > traffic, which saturate the host-to-NIC bus.
> >
> > 2015-07-01 16:05 GMT+03:00 Pavel Odintsov :
> >
> >> Yes, Bruce, we understand this. But we are processing huge SYN
> >> attacks, and those packets are 64 bytes only :(
> >>
> >> On Wed, Jul 1, 2015 at 3:59 PM, Bruce Richardson
> >>  wrote:
> >> > On Wed, Jul 01, 2015 at 03:44:57PM +0300, Pavel Odintsov wrote:
> >> >> Thanks for answer, Vladimir! So we need look for x16 NIC if we want
> >> >> achieve 40GE line rate...
> >> >>
> >> > Note that this would only apply for your minimal i.e. 64-byte, packet
> >> sizes.
> >> > Once you go up to larger e.g. 128B packets, your PCI bandwidth
> >> requirements
> >> > are lower and you can easier achieve line rate.
> >> >
> >> > /Bruce
> >> >
> >> >> On Wed, Jul 1, 2015 at 3:06 PM, Vladimir Medvedkin <
> >> medvedkinv at gmail.com> wrote:
> >> >> > Hi Pavel,
> >> >> >
> >> >> > Looks like you ran into pcie bottleneck. So let's calculate xl710
> rx
> >> only
> >> >> > case.
> >> >> > Assume we have 32byte descriptors (if we want more offload).
> >> >> > DMA makes one pcie transaction with packet payload, one descriptor
> >> writeback
> >> >> > and one memory request for free descriptors for every 4 packets.
> For
> >> >> > Transaction Layer Packet (TLP) there is 30 bytes overhead (4 PHY +
> 6
> >> DLL +
> >> >> > 16 header + 4 ECRC). So for 1 rx packet dma sends 30 + 64(packet
> >> itself) +
> >> >> > 30 + 32 (writeback descriptor) + (16 / 4) (read request for new
> >> >> > descriptors). Note that we do not take into account PCIe
> ACK/NACK/FC
> >> Update
> >> >> > DLLP. So we have 160 bytes per packet. One lane PCIe 3.0 transmits
> 1
> >> byte in
> >> >> > 1 ns, so x8 transmits 8 bytes  in 1 ns. 1 packet transmits in 20
> ns.
> >> Thus
> >> >> > in theory pcie 3.0 x8 may transfer not more than 50mpps.
> >> >> > Correct me if I'm wrong.
> >> >> >
> >> >> > Regards,
> >> >> > Vladimir
> >> >> >
> >> >> >
> >>
> >>
> >>
> >> --
> >> Sincerely yours, Pavel Odintsov
> >>
>
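Putting the thread's corrected numbers together: per received 64-byte packet the NIC sends one 26 + 64-byte TLP for the packet itself, one descriptor-writeback TLP amortized over WTHRESH packets, and one read request amortized over 4 packets. A self-contained sketch of the arithmetic — the constants and the 128b/130b encoding factor are this thread's assumptions, not measured values:

```cpp
#include <cassert>

// Bytes moved over PCIe per received 64-byte packet under the thread's
// model: a 26-byte TLP header (ECRC disabled) plus payload for the packet
// write, one descriptor-writeback TLP amortized over WTHRESH packets
// (header once, one 32-byte descriptor each), and one read request for
// new descriptors amortized over 4 packets.
static double bytes_per_packet(int wthresh)
{
    double pkt_write = 26.0 + 64.0;                        // packet DMA write
    double desc_wb   = (26.0 + 32.0 * wthresh) / wthresh;  // batched writeback
    double read_req  = 26.0 / 4.0;                         // amortized request
    return pkt_write + desc_wb + read_req;
}

// PCIe 3.0 x8: 8 GT/s per lane with 128b/130b encoding, i.e. roughly
// 7.88 usable bytes per nanosecond across the whole link.
static double mpps(double bytes, double dllp_overhead)
{
    double bytes_per_ns = 8.0 * (128.0 / 130.0);
    return bytes_per_ns / bytes * 1000.0 * (1.0 - dllp_overhead);
}
```

With WTHRESH = 4 this gives 135 bytes/packet and roughly 58 Mpps before DLLP overhead, in line with Vladimir's 58.8 Mpps figure; the small gap comes from how the encoding and DLLP overheads are rounded.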


[dpdk-dev] DPDK Hash library

2015-07-01 Thread Abdul, Jaffar
Hi,

I am wondering how I can use the hash library if I don't know the number of 
entries in a bucket (the number of entries in a bucket can grow dynamically).
I am trying to use the DPDK hash library for a MAC table, where I can't give a 
fixed number of elements for each bucket.

Please let me know if you have any suggestions on this topic.

Thanks in advance!

Thanks
Jaffar
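For context, rte_hash buckets hold a fixed number of entries, so there is no built-in way to grow a bucket at runtime; common answers are to over-provision the entry count at create time or to keep per-bucket overflow chains in the application. A minimal self-contained sketch of the chaining idea for a MAC table — all names here are hypothetical illustrations, not DPDK's rte_hash API:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#define MAC_LEN 6
#define NUM_BUCKETS 1024   // power of two so hashing can mask instead of mod

struct mac_entry {
    uint8_t mac[MAC_LEN];
    uint16_t port;          // example payload: learned egress port
    mac_entry *next;        // the chain lets a bucket grow on demand
};

static mac_entry *table[NUM_BUCKETS];

// Toy FNV-1a hash; a real table would use a stronger function (e.g. jhash).
static uint32_t mac_hash(const uint8_t *mac)
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < MAC_LEN; i++)
        h = (h ^ mac[i]) * 16777619u;
    return h & (NUM_BUCKETS - 1);
}

static void mac_add(const uint8_t *mac, uint16_t port)
{
    uint32_t b = mac_hash(mac);
    mac_entry *e = new mac_entry;
    std::memcpy(e->mac, mac, MAC_LEN);
    e->port = port;
    e->next = table[b];     // push onto the bucket's chain
    table[b] = e;
}

static int mac_lookup(const uint8_t *mac, uint16_t *port)
{
    for (mac_entry *e = table[mac_hash(mac)]; e != nullptr; e = e->next)
        if (std::memcmp(e->mac, mac, MAC_LEN) == 0) {
            *port = e->port;
            return 0;       // found
        }
    return -1;              // not found
}
```

The trade-off versus a fixed-bucket table is the extra pointer chase per collision, which is why DPDK's design prefers fixed buckets sized up front.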




[dpdk-dev] 100% cpu of VM (OS windows 7)

2015-07-01 Thread Wei Li
dpdk:   2.0.0
ovs:2.4.90
qemu:   2.3.0
OS of vm: windows7 64bit
driver of virtio for windows: virtio-win-0.1.96_amd64

ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 tap0 -- set Interface tap0 type=dpdkvhostuser
qemu-system-x86_64 win.img -cpu host -smp 2 --enable-kvm -m 2G -vnc :1 
-object 
memory-backend-file,id=mem,size=2G,mem-path=/dev/hugepages,share=on 
-numa node,memdev=mem -mem-prealloc -chardev 
socket,id=char1,path=/var/run/openvswitch/tap0 -netdev 
type=vhost-user,id=net1,chardev=char1,vhostforce -device 
virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
 
&


|-||-|
| VM  || ovs |use vhostuser
|-||-|

When the VM receives packets from virtio-net, the VM's CPU usage hits 100%, 
but sending packets looks OK.

If the guest OS is Linux, there is no problem.

What is different between Windows 7 and Linux?

Has anyone met this problem?


[dpdk-dev] g++ applications doesn't compile anymore when dpdk header files are included

2015-07-01 Thread Dirk-Holger Lenz
Hello,
g++ reports the following problems when rte_*.h header files 
are included:
   x86_64-native-linuxapp-gcc/include/rte_common.h:95:59: warning: 
invalid conversion from 'void*' to 'rte_mempool_objhdr*'; this 
appears in multiple flavours.
   - As a workaround the '-fpermissive' option of g++ may be used, but 
that's not nice.
   x86_64-native-linuxapp-gcc/include/generic/rte_cpuflags.h:50:6: 
error: use of enum 'rte_cpu_flag_t' without previous declaration
   enum rte_cpu_flag_t;
   x86_64-native-linuxapp-gcc/include/generic/rte_cpuflags.h:55:6: 
error: use of enum 'cpu_register_t' without previous declaration
   enum cpu_register_t;
   x86_64-native-linuxapp-gcc/include/generic/rte_cpuflags.h:104:31: 
error: use of enum 'rte_cpu_flag_t' without previous declaration
   rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature);
   - As a workaround the file 
'lib/librte_eal/common/include/generic/rte_cpuflags.h' may be modified:
   remove or comment out the following lines:
   line 50: //enum rte_cpu_flag_t;
   line 55: //enum cpu_register_t;
   lines 103/104: //static inline int
                  //rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature);
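The rte_common.h warning stems from a C/C++ difference: C converts void* to any object pointer implicitly, while C++ requires an explicit cast (g++ only lets the implicit form pass with -fpermissive). A minimal illustration of the fix pattern, with a hypothetical stand-in struct rather than the actual DPDK header code:

```cpp
#include <cassert>
#include <cstdlib>

struct rte_mempool_objhdr_demo { int refcnt; };  // stand-in for the real type

static rte_mempool_objhdr_demo *hdr_from_obj(void *obj)
{
    // In C this assignment compiles without a cast:
    //     struct rte_mempool_objhdr_demo *h = obj;  /* OK in C, error in C++ */
    // When the same header code is compiled by g++, an explicit cast is needed:
    return static_cast<rte_mempool_objhdr_demo *>(obj);
}
```

This is why headers meant to be consumed from both C and C++ usually write the cast explicitly, which is valid in both languages.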

Best regards
Dirk
-- 
Dirk-Holger Lenz
+49 157 848 09099
dirk.lenz at ng4t.com
ng 4 T  Telecommunication
Technology
Testing
Tools


ng4T GmbH
Siemensdamm 50
13629 Berlin
Germany
www.ng4t.com 
+49 30 65218530

Berlin-Charlottenburg, HRB 123546
CEO Dr. Andreas Kallmann


[dpdk-dev] g++ applications doesn't compile anymore when dpdk header files are included

2015-07-01 Thread Sanford, Robert
Hello Dirk,

We recently resolved a similar problem with rte_cpuflags.h.
I'm attaching a git-diff of how we worked around it.

We may submit a patch for this, eventually.

--
Regards,
Robert


>Hello,
>the g++ complains the following problems when rte_.h header files
>are included:
>   x86_64-native-linuxapp-gcc/include/rte_common.h:95:59: warning:
>invalid conversion from 'void*' to 'rte_mempool_objhdr*' this
>appears in multiple flavours
>   -as a workaround the '-fpermissive' option of the g++ may be used but
>that's not nice.
>x86_64-native-linuxapp-gcc/include/generic/rte_cpuflags.h:50:6:
>error: use of enum 'rte_cpu_flag_t' without previous declaration enum
>rte_cpu_flag_t;
>   x86_64-native-linuxapp-gcc/include/generic/rte_cpuflags.h:55:6:
>error: use of enum 'cpu_register_t' without previous declaration
>  enum cpu_register_t;
>x86_64-native-linuxapp-gcc/include/generic/rte_cpuflags.h:104:31:
>error: use of enum 'rte_cpu_flag_t' without previous declaration
>rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature);
>   -as a workaround the file
>'lib/librte_eal/common/include/generic/rte_cpuflags.h' may be modified:
>   remove or uncomment the following lines:
>line 50: //enum rte_cpu_flag_t;
>line 55: //enum cpu_register_t;
>lines 103/104: //static inline int
>   //rte_cpu_get_flag_enabled(enum
>rte_cpu_flag_t feature);
>
>Best regards
>Dirk
>-- 
>Dirk-Holger Lenz
>+49 157 848 09099
>dirk.lenz at ng4t.com
>   ng 4 T  Telecommunication
>Technology
>Testing
>Tools
>
>
>ng4T GmbH
>Siemensdamm 50
>13629 Berlin
>Germany
>www.ng4t.com <http://www.ng4t.com/>
>+49 30 65218530
>
>Berlin-Charlottenburg, HRB 123546
>CEO Dr. Andreas Kallmann

-- next part --
An embedded and charset-unspecified text was scrubbed...
Name: cpuflags-diff.txt
URL: 
<http://dpdk.org/ml/archives/dev/attachments/20150701/43e5c753/attachment.txt>


[dpdk-dev] OVDK userspace vhost : Issue with VIRTIO_RING_F_INDIRECT_DESC capability

2015-07-01 Thread sai kiran
Hi,

I am using OVDK userspace-vhost interfaces for VM-to-external communication
and facing an issue with them.

I am using the topology mentioned in
https://github.com/01org/dpdk-ovs/blob/development/docs/04_Sample_Configurations/02_Userspace-vHost.md
But the guest is a FreeBSD VM with our own custom userspace virtio drivers.
The guest does not have DPDK.


   1. I could start the OVDK application and vswitchd, and provision a FreeBSD
   guest VM with two userspace-vhost interfaces.
   2. These two userspace-vhost interfaces do not have the
   VIRTIO_RING_F_INDIRECT_DESC capability negotiated from the backend OVDK-QEMU.
   3. Inside the FreeBSD VM, I have my own userspace drivers running, which
   make use of virtio's indirect descriptors.
   4. *Question 1*: By default, without the INDIRECT descriptor capability, my
   drivers fail to run. Is there any way to increase the virtio ring size?
   This could help us avoid using indirect descriptors, since there would be
   more space in the ring.
   5. *Question 2*: When I try to set the capability from the backend QEMU, the
   guest drivers do not see any packets reaching the guest. Is there
   any way to resolve this?

Any help/suggestion would be of great help.

*Thanks & Regards,*
*Saikiran V*


[dpdk-dev] Could not achieve wire speed for 40GE with any DPDK version on XL710 NIC's

2015-07-01 Thread Vladimir Medvedkin
In the case of a SYN flood you should take into account the return SYN-ACK
traffic, which generates PCIe DLLPs from NIC to host, thus PCIe bandwidth is
exhausted faster. And don't forget about the DLLPs generated by RX traffic,
which saturate the host-to-NIC bus.

2015-07-01 16:05 GMT+03:00 Pavel Odintsov :

> Yes, Bruce, we understand this. But we are processing huge SYN
> attacks, and those packets are 64 bytes only :(
>
> On Wed, Jul 1, 2015 at 3:59 PM, Bruce Richardson
>  wrote:
> > On Wed, Jul 01, 2015 at 03:44:57PM +0300, Pavel Odintsov wrote:
> >> Thanks for answer, Vladimir! So we need look for x16 NIC if we want
> >> achieve 40GE line rate...
> >>
> > Note that this would only apply for your minimal i.e. 64-byte, packet
> sizes.
> > Once you go up to larger e.g. 128B packets, your PCI bandwidth
> requirements
> > are lower and you can easier achieve line rate.
> >
> > /Bruce
> >
> >> On Wed, Jul 1, 2015 at 3:06 PM, Vladimir Medvedkin <
> medvedkinv at gmail.com> wrote:
> >> > Hi Pavel,
> >> >
> >> > Looks like you ran into pcie bottleneck. So let's calculate xl710 rx
> only
> >> > case.
> >> > Assume we have 32byte descriptors (if we want more offload).
> >> > DMA makes one pcie transaction with packet payload, one descriptor
> writeback
> >> > and one memory request for free descriptors for every 4 packets. For
> >> > Transaction Layer Packet (TLP) there is 30 bytes overhead (4 PHY + 6
> DLL +
> >> > 16 header + 4 ECRC). So for 1 rx packet dma sends 30 + 64(packet
> itself) +
> >> > 30 + 32 (writeback descriptor) + (16 / 4) (read request for new
> >> > descriptors). Note that we do not take into account PCIe ACK/NACK/FC
> Update
> >> > DLLP. So we have 160 bytes per packet. One lane PCIe 3.0 transmits 1
> byte in
> >> > 1 ns, so x8 transmits 8 bytes  in 1 ns. 1 packet transmits in 20 ns.
> Thus
> >> > in theory pcie 3.0 x8 may transfer not more than 50mpps.
> >> > Correct me if I'm wrong.
> >> >
> >> > Regards,
> >> > Vladimir
> >> >
> >> >
>
>
>
> --
> Sincerely yours, Pavel Odintsov
>


[dpdk-dev] [PATCH v4] Add unit test for thash library

2015-07-01 Thread Bruce Richardson
On Tue, Jun 30, 2015 at 07:41:13PM -0400, Vladimir Medvedkin wrote:
> Add unit test for thash library
> 
> v4 changes
> - Reflect rte_thash.h changes
> 
> v3 changes
> - Fix checkpatch errors
> 
> v2 changes
> - fix typo
> - remove unnecessary comments
> 
> Signed-off-by: Vladimir Medvedkin 

Acked-by: Bruce Richardson 



[dpdk-dev] [PATCH v6] Add toeplitz hash algorithm used by RSS

2015-07-01 Thread Bruce Richardson
On Tue, Jun 30, 2015 at 07:40:20PM -0400, Vladimir Medvedkin wrote:
> Software implementation of the Toeplitz hash function used by RSS.
> Can be used either for packet distribution on single queue NIC
> or for simulating of RSS computation on specific NIC (for example
> after GRE header decapsulating).
> 
> v6 changes
> - Fix compilation error
> - Rename some defines and function
> 
> v5 changes
> - Fix errors reported by checkpatch.pl
> 
> v4 changes
> - Fix copyright
> - rename bswap_mask constant, add rte_ prefix
> - change rte_ipv[46]_tuple struct
> - change rte_thash_load_v6_addr prototype
> 
> v3 changes
> - Rework API to be more generic
> - Add sctp_tag into tuple
> 
> v2 changes
> - Add ipv6 support
> - Various style fixes
> 
> Signed-off-by: Vladimir Medvedkin 

Acked-by: Bruce Richardson 
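For readers unfamiliar with the algorithm being acked here: the Toeplitz hash walks the input tuple bit by bit (MSB first) and XORs in a sliding 32-bit window of the RSS key whenever the input bit is set. A self-contained serial sketch of that idea — my own illustration, not the code from the patch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Serial Toeplitz hash as used by RSS: for every set bit of the input
// tuple, XOR in the current 32-bit window of the key, then slide the
// window one bit to the left.  The key must be at least len + 4 bytes
// long (the standard RSS key is 40 bytes).
static uint32_t toeplitz_hash(const uint8_t *key, const uint8_t *data,
                              size_t len)
{
    uint32_t hash = 0;
    // 32-bit window over the first four key bytes (big-endian bit order)
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8)  |  (uint32_t)key[3];

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            // shift in the next key bit from the trailing key bytes
            window = (window << 1) | ((key[i + 4] >> bit) & 1u);
        }
    }
    return hash;
}
```

Implementations like this are normally validated against the Toeplitz test vectors in Microsoft's RSS specification; production code (including the patch's vectorized version) also byte-swaps the tuple fields into the expected order first.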



[dpdk-dev] [PATCH] cxgbe: fix compilation using icc

2015-07-01 Thread Bruce Richardson
When compiling the cxgbe driver with icc, multiple errors about using
enums as integers appear across a number of files, including in the base
code and in the DPDK-specific driver code.

.../drivers/net/cxgbe/cxgbe_main.c(386): error #188: enumerated type mixed
with another type
t4_get_port_type_description(pi->port_type));
 ^
For the errors in the base driver code we use the CFLAGS_BASE_DRIVER 
approach used by other drivers to disable warnings.

For errors in the DPDK-specific code, typecasts are used to fix the
errors in the code itself.

Signed-off-by: Bruce Richardson 
---
 drivers/net/cxgbe/Makefile | 17 +
 drivers/net/cxgbe/cxgbe_main.c |  5 +++--
 drivers/net/cxgbe/sge.c|  3 ++-
 3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/net/cxgbe/Makefile b/drivers/net/cxgbe/Makefile
index ae12d75..0711976 100644
--- a/drivers/net/cxgbe/Makefile
+++ b/drivers/net/cxgbe/Makefile
@@ -49,7 +49,7 @@ ifeq ($(CC), icc)
 #
 # CFLAGS for icc
 #
-CFLAGS_BASE_DRIVER = -wd174 -wd593 -wd869 -wd981 -wd2259
+CFLAGS_BASE_DRIVER = -wd188
 else
 #
 # CFLAGS for gcc/clang
@@ -57,18 +57,27 @@ else
 ifeq ($(shell test $(CC) = gcc && test $(GCC_VERSION) -ge 44 && echo 1), 1)
 CFLAGS += -Wno-deprecated
 endif
-CFLAGS_BASE_DRIVER = -Wno-unused-parameter -Wno-unused-value
-CFLAGS_BASE_DRIVER += -Wno-strict-aliasing -Wno-format-extra-args
+CFLAGS_BASE_DRIVER =

 endif

 #
+# Add extra flags for base driver files (also known as shared code)
+# to disable warnings in them
+#
+BASE_DRIVER_OBJS=$(patsubst %.c,%.o,$(notdir $(wildcard $(SRCDIR)/base/*.c)))
+$(foreach obj, $(BASE_DRIVER_OBJS), $(eval 
CFLAGS_$(obj)+=$(CFLAGS_BASE_DRIVER)))
+
+VPATH += $(SRCDIR)/base
+
+
+#
 # all source are stored in SRCS-y
 #
 SRCS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += cxgbe_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += cxgbe_main.c
 SRCS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += sge.c
-SRCS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += base/t4_hw.c
+SRCS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += t4_hw.c

 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/cxgbe/cxgbe_main.c b/drivers/net/cxgbe/cxgbe_main.c
index dad0a98..b879820 100644
--- a/drivers/net/cxgbe/cxgbe_main.c
+++ b/drivers/net/cxgbe/cxgbe_main.c
@@ -383,7 +383,8 @@ static void print_port_info(struct adapter *adap)
if (bufp != buf)
--bufp;
sprintf(bufp, "BASE-%s",
-   t4_get_port_type_description(pi->port_type));
+   t4_get_port_type_description(
+   (enum fw_port_type)pi->port_type));

dev_info(adap,
 " " PCI_PRI_FMT " Chelsio rev %d %s %s\n",
@@ -629,7 +630,7 @@ static int adap_init0(struct adapter *adap)
dev_err(adap, "Failed to restart. Exit.\n");
goto bye;
}
-   state &= ~DEV_STATE_INIT;
+   state = (enum dev_state)((unsigned)state & ~DEV_STATE_INIT);
}

t4_get_fw_version(adap, >params.fw_vers);
diff --git a/drivers/net/cxgbe/sge.c b/drivers/net/cxgbe/sge.c
index 4da6320..359296e 100644
--- a/drivers/net/cxgbe/sge.c
+++ b/drivers/net/cxgbe/sge.c
@@ -1670,7 +1670,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct 
sge_rspq *iq, bool fwevtq,
if (fl) {
struct sge_eth_rxq *rxq = container_of(fl, struct sge_eth_rxq,
   fl);
-   enum chip_type chip = CHELSIO_CHIP_VERSION(adap->params.chip);
+   enum chip_type chip = (enum chip_type)CHELSIO_CHIP_VERSION(
+   adap->params.chip);

/*
 * Allocate the ring for the hardware free list (with space
-- 
2.4.3



[dpdk-dev] Could not achieve wire speed for 40GE with any DPDK version on XL710 NIC's

2015-07-01 Thread Pavel Odintsov
Yes, Bruce, we understand this. But we are processing huge SYN
attacks, and those packets are 64 bytes only :(

On Wed, Jul 1, 2015 at 3:59 PM, Bruce Richardson
 wrote:
> On Wed, Jul 01, 2015 at 03:44:57PM +0300, Pavel Odintsov wrote:
>> Thanks for answer, Vladimir! So we need look for x16 NIC if we want
>> achieve 40GE line rate...
>>
> Note that this would only apply for your minimal, i.e. 64-byte, packet sizes.
> Once you go up to larger, e.g. 128B, packets, your PCI bandwidth requirements
> are lower and you can more easily achieve line rate.
>
> /Bruce
>
>> On Wed, Jul 1, 2015 at 3:06 PM, Vladimir Medvedkin  
>> wrote:
>> > Hi Pavel,
>> >
>> > Looks like you ran into pcie bottleneck. So let's calculate xl710 rx only
>> > case.
>> > Assume we have 32byte descriptors (if we want more offload).
>> > DMA makes one pcie transaction with packet payload, one descriptor 
>> > writeback
>> > and one memory request for free descriptors for every 4 packets. For
>> > Transaction Layer Packet (TLP) there is 30 bytes overhead (4 PHY + 6 DLL +
>> > 16 header + 4 ECRC). So for 1 rx packet dma sends 30 + 64(packet itself) +
>> > 30 + 32 (writeback descriptor) + (16 / 4) (read request for new
>> > descriptors). Note that we do not take into account PCIe ACK/NACK/FC Update
>> > DLLP. So we have 160 bytes per packet. One lane PCIe 3.0 transmits 1 byte 
>> > in
>> > 1 ns, so x8 transmits 8 bytes  in 1 ns. 1 packet transmits in 20 ns.  Thus
>> > in theory pcie 3.0 x8 may transfer not more than 50mpps.
>> > Correct me if I'm wrong.
>> >
>> > Regards,
>> > Vladimir
>> >
>> >



-- 
Sincerely yours, Pavel Odintsov


[dpdk-dev] [PATCH] virtio: fix the vq size issue

2015-07-01 Thread Xie, Huawei
On 7/1/2015 3:49 PM, Ouyang Changchun wrote:
> This commit breaks virtio basic packets rx functionality:
>   d78deadae4dca240e85054bf2d604a801676becc
>
> The QEMU use 256 as default vring size, also use this default value to 
> calculate the virtio
> avail ring base address and used ring base address, and vhost in the backend 
> use the ring base
> address to do packet IO.
>
> Virtio spec also says the queue size in PCI configuration is read-only, so 
> virtio front end
> can't change it. just need use the read-only value to allocate space for 
> vring and calculate the
> avail and used ring base address. Otherwise, the avail and used ring base 
> address will be different
> between host and guest, accordingly, packet IO can't work normally.
The virtio driver could still use vq_size to initialize the avail ring and
used ring so that they still have the same base address.
The other issue is that vhost uses index & (vq->size - 1) to index the ring.


Thomas:
This fix works but introduces a slight change from the original code. Could we
just roll back that commit?

d78deadae4dca240e85054bf2d604a801676becc


>
> Signed-off-by: Changchun Ouyang 
> ---
>  drivers/net/virtio/virtio_ethdev.c | 14 +++---
>  1 file changed, 3 insertions(+), 11 deletions(-)
>
>
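The masking Huawei mentions is why host and guest must agree on the ring size: both sides wrap a free-running index into the ring with index & (size - 1), so the size must be a power of two and identical on both ends. A toy illustration of the failure mode (hypothetical helper, not virtio code):

```cpp
#include <cassert>
#include <cstdint>

// virtio/vhost wrap a free-running 16-bit index into a power-of-two ring.
// If the two sides assume different sizes, the same free-running index
// maps to different slots and the ring desynchronizes.
static uint16_t slot(uint16_t free_running_idx, uint16_t ring_size)
{
    return free_running_idx & (ring_size - 1);  // ring_size: power of two
}
```

For example, index 500 lands in slot 244 of a 256-entry ring but slot 116 of a 128-entry ring, so a guest assuming 128 entries while QEMU assumes 256 would read and write the wrong descriptors.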



[dpdk-dev] [PATCH] virtio: fix the vq size issue

2015-07-01 Thread Ouyang Changchun
This commit breaks basic virtio packet RX functionality:
  d78deadae4dca240e85054bf2d604a801676becc

QEMU uses 256 as the default vring size, and also uses this default value to
calculate the virtio avail ring base address and used ring base address, and
vhost in the backend uses the ring base address to do packet IO.

The virtio spec also says the queue size in the PCI configuration is
read-only, so the virtio front end can't change it. It just needs to use the
read-only value to allocate space for the vring and to calculate the avail
and used ring base addresses. Otherwise, the avail and used ring base
addresses will differ between host and guest, and, accordingly, packet IO
can't work normally.

Signed-off-by: Changchun Ouyang 
---
 drivers/net/virtio/virtio_ethdev.c | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index fe5f9a1..d84de13 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -263,8 +263,6 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 */
vq_size = VIRTIO_READ_REG_2(hw, VIRTIO_PCI_QUEUE_NUM);
PMD_INIT_LOG(DEBUG, "vq_size: %d nb_desc:%d", vq_size, nb_desc);
-   if (nb_desc == 0)
-   nb_desc = vq_size;
if (vq_size == 0) {
PMD_INIT_LOG(ERR, "%s: virtqueue does not exist", __func__);
return -EINVAL;
@@ -275,15 +273,9 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
return -EINVAL;
}

-   if (nb_desc < vq_size) {
-   if (!rte_is_power_of_2(nb_desc)) {
-   PMD_INIT_LOG(ERR,
-"nb_desc(%u) size is not powerof 2",
-nb_desc);
-   return -EINVAL;
-   }
-   vq_size = nb_desc;
-   }
+   if (nb_desc != vq_size)
+   PMD_INIT_LOG(ERR, "Warning: nb_desc(%d) is not equal to vq size 
(%d), fall to vq size",
+   nb_desc, vq_size);

if (queue_type == VTNET_RQ) {
snprintf(vq_name, sizeof(vq_name), "port%d_rvq%d",
-- 
1.8.4.2



[dpdk-dev] Could not achieve wire speed for 40GE with any DPDK version on XL710 NIC's

2015-07-01 Thread Pavel Odintsov
Thanks for the answer, Vladimir! So we need to look for an x16 NIC if we want
to achieve 40GE line rate...

On Wed, Jul 1, 2015 at 3:06 PM, Vladimir Medvedkin  
wrote:
> Hi Pavel,
>
> Looks like you ran into pcie bottleneck. So let's calculate xl710 rx only
> case.
> Assume we have 32byte descriptors (if we want more offload).
> DMA makes one pcie transaction with packet payload, one descriptor writeback
> and one memory request for free descriptors for every 4 packets. For
> Transaction Layer Packet (TLP) there is 30 bytes overhead (4 PHY + 6 DLL +
> 16 header + 4 ECRC). So for 1 rx packet dma sends 30 + 64(packet itself) +
> 30 + 32 (writeback descriptor) + (16 / 4) (read request for new
> descriptors). Note that we do not take into account PCIe ACK/NACK/FC Update
> DLLP. So we have 160 bytes per packet. One lane PCIe 3.0 transmits 1 byte in
> 1 ns, so x8 transmits 8 bytes  in 1 ns. 1 packet transmits in 20 ns.  Thus
> in theory pcie 3.0 x8 may transfer not more than 50mpps.
> Correct me if I'm wrong.
>
> Regards,
> Vladimir
>
>
> 2015-06-29 18:41 GMT+03:00 Pavel Odintsov :
>>
>> Hello, Andrew!
>>
>> What NIC have you used? Is it XL710?
>>
>> On Mon, Jun 29, 2015 at 6:38 PM, Andrew Theurer 
>> wrote:
>> >
>> >
>> > On Mon, Jun 29, 2015 at 10:06 AM, Keunhong Lee 
>> > wrote:
>> >>
>> >> I have not used XL710 or i40e.
>> >> I have no opinion for those NICs.
>> >>
>> >> Keunhong.
>> >>
>> >> 2015-06-29 15:59 GMT+09:00 Pavel Odintsov :
>> >>
>> >> > Hello!
>> >> >
>> >> > Lee, thank you so much for sharing your experience! What do you think
>> >> > about 40GE version of 82599?
>> >> >
>> >> > On Mon, Jun 29, 2015 at 2:35 AM, Keunhong Lee 
>> >> > wrote:
>> >> > > DISCLAIMER: This information is not verified. This is truly my
>> >> > > personal
>> >> > > opinion.
>> >> > >
>> >> > > As I know, intel 82599 is the only 10G NIC which supports line rate
>> >> > > with
>> >> > > minimum sized packets (64 byte).
>> >> > > According to our internal tests, Mellanox's 40G NICs even support
>> >> > > less
>> >> > than
>> >> > > 30Mpps.
>> >> > > I think 40 Mpps is the hardware capacity.
>> >
>> >
>> > This is approximately what I see as well.
>> >
>> >>
>> >> > >
>> >> > > Keunhong.
>> >> > >
>> >> > >
>> >> > >
>> >> > > 2015-06-28 19:34 GMT+09:00 Pavel Odintsov
>> >> > > :
>> >> > >>
>> >> > >> Hello, folks!
>> >> > >>
>> >> > >> We have executed a bunch of tests receiving data with the Intel
>> >> > >> XL710 40GE NIC. We want to achieve wire speed on this platform
>> >> > >> for traffic capture.
>> >> > >>
>> >> > >> But we definitely can't do it. We tried different versions of
>> >> > >> DPDK: 1.4, 1.6, 1.8, 2.0, without success.
>> >> > >>
>> >> > >> We achieved only 40 Mpps and could not do more.
>> >> > >>
>> >> > >> Could anybody help us with this issue? Looks like this NIC's could
>> >> > >> not
>> >> > >> work on wire speed :(
>> >> > >>
>> >> > >> Platform: Intel Xeon E5 e5 2670 + XL 710.
>> >> > >>
>> >> > >> --
>> >> > >> Sincerely yours, Pavel Odintsov
>> >> > >
>> >> > >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Sincerely yours, Pavel Odintsov
>> >> >
>> >
>> > -Andrew
>> >
>> >
>>
>>
>>
>> --
>> Sincerely yours, Pavel Odintsov
>
>



-- 
Sincerely yours, Pavel Odintsov


[dpdk-dev] [PATCH v7 12/12] eal: Consolidate rte_eal_pci_probe/close_one_driver() of linuxapp and bsdapp

2015-07-01 Thread Tetsuya Mukawa
On 2015/06/30 23:56, Iremonger, Bernard wrote:
>> -Original Message-
>> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
>> Sent: Tuesday, June 30, 2015 9:24 AM
>> To: dev at dpdk.org
>> Cc: Iremonger, Bernard; david.marchand at 6wind.com; Richardson, Bruce;
>> Tetsuya.Mukawa
>> Subject: [PATCH v7 12/12] eal: Consolidate
>> rte_eal_pci_probe/close_one_driver() of linuxapp and bsdapp
>>
>> From: "Tetsuya.Mukawa" 
>>
>> This patch consolidates below functions, and implements these in common
>> eal code.
>>  - rte_eal_pci_probe_one_driver()
>>  - rte_eal_pci_close_one_driver()
>>
>> Because pci_map_device() is only implemented in linuxapp, the patch
>> implements it in bsdapp too. This implemented function will be merged to
>> linuxapp one with later patch.
> Hi Tetsuya,
>
> The description lines above seem to be out of date now as pci_map_device() is 
> not implemented in the bsdapp now.
>
> Regards,
>
> Bernard.
>

Hi Bernard,

Yes, I need to change the above description.
I will fix it in the next patch series.

Regards,
Tetsuya


[dpdk-dev] [PATCH v4] eal: Enable Port Hotplug as default in Linux and BSD

2015-07-01 Thread Tetsuya Mukawa
On 2015/07/01 0:40, Bruce Richardson wrote:
> On Tue, Jun 30, 2015 at 04:08:08PM +0100, Iremonger, Bernard wrote:
>>> -Original Message-
>>> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
>>> Sent: Tuesday, June 30, 2015 9:27 AM
>>> To: dev at dpdk.org
>>> Cc: Iremonger, Bernard; david.marchand at 6wind.com; Richardson, Bruce;
>>> Tetsuya.Mukawa
>>> Subject: [PATCH v4] eal: Enable Port Hotplug as default in Linux and BSD
>>>
>>> From: "Tetsuya.Mukawa" 
>>>
>>> This patch removes CONFIG_RTE_LIBRTE_EAL_HOTPLUG option, and enables
>>> it as default in both Linux and BSD.
>>> Also, to support port hotplug, rte_eal_pci_scan() and below missing symbols
>>> should be exported to ethdev library.
>>>  - rte_eal_parse_devargs_str()
>>>  - rte_eal_pci_close_one()
>>>  - rte_eal_pci_probe_one()
>>>  - rte_eal_pci_scan()
>>>  - rte_eal_vdev_init()
>>>  - rte_eal_vdev_uninit()
>>>
>>> Signed-off-by: Tetsuya Mukawa 
>> Hi Tetsuya,
>>
>> Would it be cleaner to add this patch to the  [PATCH  v7 12/12] patch set as 
>>  patch 13 rather than having it as a separate patch?
>>
> The other patch set is cleanup, merging BSD and Linuxapp code, so I think it's
> best kept as a separate set. New features I'd suggest keeping separate from
> cleanup. 
> That being said, it is only one patch, so it probably doesn't matter much 
> either
> way. :-)
>
> /Bruce

Thanks for commenting. Let's keep them separate.
It will be easier for me to write the cover letter for the patch series.

Regards,
Tetsuya


[dpdk-dev] [PATCH] fm10k: fix an error message when adding default VLAN

2015-07-01 Thread Thomas Monjalon
2015-06-26 10:37, Shaopeng He:
> The default MAC address is directly copied to Device Ethernet
> Link address array in the device initialize phase, which

Do you mean "device start phase" instead?

> bypasses fm10k MAC address number check mechanism, and will
> cause an error message when adding default VLAN. Fix it by

What is the error message?
Is it only an error message or a behaviour error?

> moving default MAC address registration to device
> initialize phase.

Yes it is moved from start to init.

> --- a/drivers/net/fm10k/fm10k_ethdev.c
> +++ b/drivers/net/fm10k/fm10k_ethdev.c
> @@ -791,14 +791,10 @@ fm10k_dev_start(struct rte_eth_dev *dev)
>   }
>   }
>  
> - if (hw->mac.default_vid && hw->mac.default_vid <= ETHER_MAX_VLAN_ID) {
> - /* Update default vlan */
> + /* Update default vlan */
> + if (hw->mac.default_vid && hw->mac.default_vid <= ETHER_MAX_VLAN_ID)
>   fm10k_vlan_filter_set(dev, hw->mac.default_vid, true);
>  
> - /* Add default mac/vlan filter to PF/Switch manager */
> - fm10k_MAC_filter_set(dev, hw->mac.addr, true);
> - }
> -
>   return 0;
>  }
>  
> @@ -2144,6 +2140,8 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
>  
>   fm10k_mbx_unlock(hw);
>  
> + /* Add default mac address */
> + fm10k_MAC_filter_set(dev, hw->mac.addr, true);
>  
>   return 0;
>  }
> 




[dpdk-dev] Could not achieve wire speed for 40GE with any DPDK version on XL710 NIC's

2015-07-01 Thread Vladimir Medvedkin
Hi Pavel,

Looks like you ran into a PCIe bottleneck. So let's calculate the XL710
RX-only case.
Assume we have 32-byte descriptors (if we want more offload).
DMA makes one PCIe transaction with the packet payload, one descriptor
writeback, and one memory request for free descriptors for every 4 packets.
For a Transaction Layer Packet (TLP) there is 30 bytes of overhead (4 PHY + 6
DLL + 16 header + 4 ECRC). So for 1 RX packet the DMA sends 30 + 64 (packet
itself) + 30 + 32 (writeback descriptor) + (16 / 4) (read request for new
descriptors). Note that we do not take into account PCIe ACK/NACK/FC Update
DLLPs. So we have 160 bytes per packet. One PCIe 3.0 lane transmits 1 byte in
1 ns, so x8 transmits 8 bytes in 1 ns; 1 packet transmits in 20 ns.
Thus in theory PCIe 3.0 x8 may transfer not more than 50 Mpps.
Correct me if I'm wrong.

Regards,
Vladimir


2015-06-29 18:41 GMT+03:00 Pavel Odintsov :

> Hello, Andrew!
>
> What NIC have you used? Is it XL710?
>
> On Mon, Jun 29, 2015 at 6:38 PM, Andrew Theurer 
> wrote:
> >
> >
> > On Mon, Jun 29, 2015 at 10:06 AM, Keunhong Lee 
> wrote:
> >>
> >> I have not used XL710 or i40e.
> >> I have no opinion for those NICs.
> >>
> >> Keunhong.
> >>
> >> 2015-06-29 15:59 GMT+09:00 Pavel Odintsov :
> >>
> >> > Hello!
> >> >
> >> > Lee, thank you so much for sharing your experience! What do you think
> >> > about 40GE version of 82599?
> >> >
> >> > On Mon, Jun 29, 2015 at 2:35 AM, Keunhong Lee 
> >> > wrote:
> >> > > DISCLAIMER: This information is not verified. This is truly my
> >> > > personal
> >> > > opinion.
> >> > >
> >> > > As I know, intel 82599 is the only 10G NIC which supports line rate
> >> > > with
> >> > > minimum sized packets (64 byte).
> >> > > According to our internal tests, Mellanox's 40G NICs even support
> less
> >> > than
> >> > > 30Mpps.
> >> > > I think 40 Mpps is the hardware capacity.
> >
> >
> > This is approximately what I see as well.
> >
> >>
> >> > >
> >> > > Keunhong.
> >> > >
> >> > >
> >> > >
> >> > > 2015-06-28 19:34 GMT+09:00 Pavel Odintsov  >:
> >> > >>
> >> > >> Hello, folks!
> >> > >>
> >> > >> We have executed a bunch of tests receiving data with the Intel
> >> > >> XL710 40GE NIC. We want to achieve wire speed on this platform
> >> > >> for traffic capture.
> >> > >>
> >> > >> But we definitely can't do it. We tried different versions of
> >> > >> DPDK: 1.4, 1.6, 1.8, 2.0, without success.
> >> > >>
> >> > >> We achieved only 40 Mpps and could not do more.
> >> > >>
> >> > >> Could anybody help us with this issue? Looks like this NIC's could
> >> > >> not
> >> > >> work on wire speed :(
> >> > >>
> >> > >> Platform: Intel Xeon E5 e5 2670 + XL 710.
> >> > >>
> >> > >> --
> >> > >> Sincerely yours, Pavel Odintsov
> >> > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Sincerely yours, Pavel Odintsov
> >> >
> >
> > -Andrew
> >
> >
>
>
>
> --
> Sincerely yours, Pavel Odintsov
>


[dpdk-dev] How to get net_device and use struct ethtool_cmd in the DPDK environment?

2015-07-01 Thread "Scott.Jhuang (莊清翔) : 6309"
Hi Sy Jong,

Any ideas?

"Scott.Jhuang (莊清翔) : 6309" wrote on 2015/06/23 21:24:
Dear Sy Jong,

Yes, I have checked out DPDK KNI, but I still can't find how to prepare the
net_device structure...
And I also can't find how to get "ethtool_cmd.phy_address".
Could you let me know the path of the source code folder?

Choi, Sy Jong wrote on 2015/06/19 10:35:
Hi Scott,

DPDK PMDs interface through rte_ethdev.c, which links to ixgbe_ethdev.c;
there's no "net_device" in our code.

But if you search the DPDK code base, we have a KNI example that shows how to
prepare the net_device structure.
Have you checked out our DPDK KNI code?

Regards,
Choi, Sy Jong
Platform Application Engineer

From: "Scott.Jhuang (莊清翔) : 6309" [mailto:scott.jhu...@cas-well.com]
Sent: Thursday, June 18, 2015 12:25 PM
To: Choi, Sy Jong; dev at dpdk.org
Subject: Re: [dpdk-dev] How to get net_device and use struct ethtool_cmd in the
DPDK environment?

Dear Sy Jong,

I'm planning to program a driver to get every ethport's net_device structure,
because I need some information from these net_device structures.
I also need to use the net_device struct's ethtool_cmd to get some information,
e.g. ethtool_cmd.phy_address, net_device->ethtool_ops->get_settings.

In fact, I need some information from the net_device struct to access and
control the PHY's link-up/down,
and I referenced the igb driver to design the link-up/down functions. Since the
DPDK environment doesn't have the igb driver,
I don't know how to get a network device's net_device structs or the other
information normally initialized by the igb driver.

Choi, Sy Jong wrote on 2015/06/17 11:15:
Hi Scott,

You are right, the KNI will be a good reference for you. It demonstrates how a
DPDK PMD interfaces with the kernel.
May I know whether you are planning to build the interface to ethtool? You can
try running the KNI app.

Regards,
Choi, Sy Jong
Platform Application Engineer

From: "Scott.Jhuang (莊清翔) : 6309" [mailto:scott.jhu...@cas-well.com]
Sent: Wednesday, June 17, 2015 11:12 AM
To: Choi, Sy Jong; dev at dpdk.org
Subject: Re: [dpdk-dev] How to get net_device and use struct ethtool_cmd in the
DPDK environment?

Hi Sy Jong,

But... I am programming a driver now; is there any sample driver I can reference?

Choi, Sy Jong wrote on 2015/06/16 14:48:

Hi Scott,



You can review the DPDK KNI sample app; there's ethtool support using a vEth
device that interfaces to a DPDK PMD.



A pure DPDK PMD requires programming to expose the information ethtool
displays. The interfacing is demonstrated in the KNI sample app.



Regards,

Choi, Sy Jong

Platform Application Engineer



-Original Message-

From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of "Scott.Jhuang (莊清翔) : 6309"

Sent: Monday, June 15, 2015 6:35 PM

To: dev at dpdk.org

Subject: [dpdk-dev] How to get net_device and use struct ethtool_cmd at DPDK 
enverinment?



Hi,



I want to get each ethport's net_device struct and use ethtool_cmd to get some
information about the ethports.

Does the igb_uio driver also provide these capabilities?



If not, how can I get the net_device structs and use the ethtool_cmd
capabilities?



--



Best Regards,

Scott Jhuang



Software Engineering Dept.

Software Engineer

CASwell


8F, No.242, Bo-Ai St., Shu-Lin Dist, New Taipei City 238, Taiwan

Tel: +886-2-7705- # 6309

Fax: +886-2-7731-9988

E-mail: scott.jhuang at cas-well.com

CASWELL Inc.  http://www.cas-well.com





This email may contain 
confidential information. Please do not use or disclose it in any way and 
delete it if you are not the intended recipient.





[dpdk-dev] [PATCH v3 0/7] Expose IXGBE extended stats to DPDK apps

2015-07-01 Thread Tahhan, Maryam
> This patch set implements xstats_get() and xstats_reset() in dev_ops for ixgbe
> to expose detailed error statistics to DPDK applications. The dump_cfg
> application was extended to demonstrate the usage of retrieving statistics for
> DPDK interfaces and renamed to proc_info in order to reflect this new
> functionality. This patch set also removes non-generic statistics from the
> statistics strings at the ethdev level and marks the relevant registers as
> deprecated in struct rte_eth_stats.
> 
> v2:
>  - Fixed patch dependencies.
>  - Broke down patches into smaller logical changes.
> 
> v3:
>  - Removes non-generic stats fields in rte_stats_strings and deprecates
>the fields related to them in struct rte_eth_stats.
>  - Modifies rte_eth_xstats_get() to return generic stats and extended stats.
> 
> Maryam Tahhan (7):
>   ixgbe: move stats register reads to a new function
>   ixgbe: add functions to get and reset xstats
>   ethdev: expose extended error stats
>   ethdev: remove HW specific stats in stats structs
>   ixgbe: add NIC specific stats removed from ethdev
>   app: remove dump_cfg
>   app: add a new app proc_info
> 
>  MAINTAINERS  |   4 +
>  app/Makefile |   2 +-
>  app/dump_cfg/Makefile|  45 
>  app/dump_cfg/main.c  |  92 ---
>  app/proc_info/Makefile   |  45 
>  app/proc_info/main.c | 512
> +++
>  doc/guides/rel_notes/abi.rst |  11 +
>  drivers/net/ixgbe/ixgbe_ethdev.c | 192 ---
>  lib/librte_ether/rte_ethdev.c|  29 ++-
>  lib/librte_ether/rte_ethdev.h|  30 ++-
>  mk/rte.sdktest.mk|   4 +-
>  11 files changed, 762 insertions(+), 204 deletions(-)  delete mode 100644
> app/dump_cfg/Makefile  delete mode 100644 app/dump_cfg/main.c  create
> mode 100644 app/proc_info/Makefile  create mode 100644
> app/proc_info/main.c
> 
> --
> 1.9.3

Hi Olivier,
I posted a new patch set with the suggested mods. Let me know if there are any 
issues.

Thanks in advance.

Best Regards, 
Maryam



[dpdk-dev] Could not achieve wire speed for 40GE with any DPDK version on XL710 NIC's

2015-07-01 Thread Anuj Kalia
Thanks for the comments.

On Wed, Jul 1, 2015 at 1:32 PM, Vladimir Medvedkin  
wrote:
> Hi Anuj,
>
> Thanks for fixes!
> I have 2 comments
> - from i40e_ethdev.h : #define I40E_DEFAULT_RX_WTHRESH  0
> - (26 + 32) / 4 (batched descriptor writeback) should be (26 + 4 * 32) / 4
> (batched descriptor writeback)
> , thus we have 135 bytes/packet
>
> This corresponds to 58.8 Mpps
>
> Regards,
> Vladimir
>
> 2015-07-01 17:22 GMT+03:00 Anuj Kalia :
>>
>> Vladimir,
>>
>> Few possible fixes to your PCIe analysis (let me know if I'm wrong):
>> - ECRC is probably disabled (check using sudo lspci -vvv | grep
>> CGenEn-), so TLP header is 26 bytes
>> - Descriptor writeback can be batched using high value of WTHRESH,
>> which is what DPDK uses by default
>> - Read request contains full TLP header (26 bytes)
>>
>> Assuming WTHRESH = 4, bytes transferred from NIC to host per packet =
>> 26 + 64 (packet itself) +
>> (26 + 32) / 4 (batched descriptor writeback) +
>> (26 / 4) (read request for new descriptors) =
>> 111 bytes / packet
>>
>> This corresponds to 70.9 Mpps over PCIe 3.0 x8. Assuming 5% DLLP
>> overhead, rate = 67.4 Mpps
>>
>> --Anuj
>>
>>
>>
>> On Wed, Jul 1, 2015 at 9:40 AM, Vladimir Medvedkin 
>> wrote:
>> > In the SYN-flood case you should take into account the returning SYN-ACK
>> > traffic,
>> > which generates PCIe DLLPs from NIC to host, so the PCIe bandwidth is
>> > exhausted faster. And don't forget about the DLLPs generated by RX
>> > traffic, which saturate the host-to-NIC bus.
>> >
>> > 2015-07-01 16:05 GMT+03:00 Pavel Odintsov :
>> >
>> >> Yes, Bruce, we understand this. But we are working with huge SYN
>> >> attacks processing and they are 64byte only :(
>> >>
>> >> On Wed, Jul 1, 2015 at 3:59 PM, Bruce Richardson
>> >>  wrote:
>> >> > On Wed, Jul 01, 2015 at 03:44:57PM +0300, Pavel Odintsov wrote:
>> >> >> Thanks for answer, Vladimir! So we need look for x16 NIC if we want
>> >> >> achieve 40GE line rate...
>> >> >>
>> >> > Note that this would only apply for your minimal i.e. 64-byte, packet
>> >> sizes.
>> >> > Once you go up to larger e.g. 128B packets, your PCI bandwidth
>> >> requirements
>> >> > are lower and you can easier achieve line rate.
>> >> >
>> >> > /Bruce
>> >> >
>> >> >> On Wed, Jul 1, 2015 at 3:06 PM, Vladimir Medvedkin <
>> >> medvedkinv at gmail.com> wrote:
>> >> >> > Hi Pavel,
>> >> >> >
>> >> >> > Looks like you ran into pcie bottleneck. So let's calculate xl710
>> >> >> > rx
>> >> only
>> >> >> > case.
>> >> >> > Assume we have 32byte descriptors (if we want more offload).
>> >> >> > DMA makes one pcie transaction with packet payload, one descriptor
>> >> writeback
>> >> >> > and one memory request for free descriptors for every 4 packets.
>> >> >> > For
>> >> >> > Transaction Layer Packet (TLP) there is 30 bytes overhead (4 PHY +
>> >> >> > 6
>> >> DLL +
>> >> >> > 16 header + 4 ECRC). So for 1 rx packet dma sends 30 + 64(packet
>> >> itself) +
>> >> >> > 30 + 32 (writeback descriptor) + (16 / 4) (read request for new
>> >> >> > descriptors). Note that we do not take into account PCIe
>> >> >> > ACK/NACK/FC
>> >> Update
>> >> >> > DLLP. So we have 160 bytes per packet. One lane PCIe 3.0 transmits
>> >> >> > 1
>> >> byte in
>> >> >> > 1 ns, so x8 transmits 8 bytes  in 1 ns. 1 packet transmits in 20
>> >> >> > ns.
>> >> Thus
>> >> >> > in theory pcie 3.0 x8 may transfer not more than 50mpps.
>> >> >> > Correct me if I'm wrong.
>> >> >> >
>> >> >> > Regards,
>> >> >> > Vladimir
>> >> >> >
>> >> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Sincerely yours, Pavel Odintsov
>> >>
>
>


[dpdk-dev] Could not achieve wire speed for 40GE with any DPDK version on XL710 NIC's

2015-07-01 Thread Bruce Richardson
On Wed, Jul 01, 2015 at 03:44:57PM +0300, Pavel Odintsov wrote:
> Thanks for answer, Vladimir! So we need look for x16 NIC if we want
> achieve 40GE line rate...
> 
Note that this would only apply for your minimal, i.e. 64-byte, packet sizes.
Once you go up to larger, e.g. 128B, packets, your PCI bandwidth requirements
are lower and you can more easily achieve line rate.

/Bruce

> On Wed, Jul 1, 2015 at 3:06 PM, Vladimir Medvedkin  
> wrote:
> > Hi Pavel,
> >
> > Looks like you ran into pcie bottleneck. So let's calculate xl710 rx only
> > case.
> > Assume we have 32byte descriptors (if we want more offload).
> > DMA makes one pcie transaction with packet payload, one descriptor writeback
> > and one memory request for free descriptors for every 4 packets. For
> > Transaction Layer Packet (TLP) there is 30 bytes overhead (4 PHY + 6 DLL +
> > 16 header + 4 ECRC). So for 1 rx packet dma sends 30 + 64(packet itself) +
> > 30 + 32 (writeback descriptor) + (16 / 4) (read request for new
> > descriptors). Note that we do not take into account PCIe ACK/NACK/FC Update
> > DLLP. So we have 160 bytes per packet. One lane PCIe 3.0 transmits 1 byte in
> > 1 ns, so x8 transmits 8 bytes  in 1 ns. 1 packet transmits in 20 ns.  Thus
> > in theory pcie 3.0 x8 may transfer not more than 50mpps.
> > Correct me if I'm wrong.
> >
> > Regards,
> > Vladimir
> >
> >


[dpdk-dev] [PATCH v2 0/4] kni: fix build with kernel 4.1

2015-07-01 Thread De Lara Guarch, Pablo
Hi

> -Original Message-
> From: Miguel Bernal Marin [mailto:miguel.bernal.marin at linux.intel.com]
> Sent: Friday, June 26, 2015 11:15 PM
> To: dev at dpdk.org
> Cc: De Lara Guarch, Pablo
> Subject: [PATCH v2 0/4] kni: fix build with kernel 4.1
> 
> Due to API changes in netdevice.h in 4.1 kernel release, KNI modules
> would not build.  This patch set adds the proper checks to fix
> compilation.
> 
> Changes in v2:
> 
>  - Fixed vHost module build errors.
> 
> Miguel Bernal Marin (4):
>   kni: fix igb_ndo_bridge_getlink to build with 4.1
>   kni: fix header_ops to build with 4.1
>   kni: fix function parameter from proto_ops pointers
>   kni: fix missing validation when vhost HDR is enabled
> 
>  lib/librte_eal/linuxapp/kni/compat.h   |  4 
>  lib/librte_eal/linuxapp/kni/ethtool/igb/igb_main.c | 10 ++
>  lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h  |  5 +
>  lib/librte_eal/linuxapp/kni/kni_net.c  |  4 
>  lib/librte_eal/linuxapp/kni/kni_vhost.c| 17 -
>  5 files changed, 39 insertions(+), 1 deletion(-)
> 
> --
> 2.4.4

Thanks!

Acked-by: Pablo de Lara 


[dpdk-dev] [PATCH] fm10k: support XEN domain0

2015-07-01 Thread Thomas Monjalon
2015-06-30 03:27, He, Shaopeng:
> From: He, Shaopeng
> > From: Liu, Jijiang
> > > Acked-by: Jijiang Liu 
> > >
> > > I think this patch could be merged before Stephen's following patch[1] is
> > > merged, then Stephen should rework the patch[1].
> > > Thanks.
> > >
> > > [1]http://dpdk.org/ml/archives/dev/2015-March/014992.html
> > 
> > Do you think we can accept this patch in the current not-so-elegant way, so
> > user can
> > use XEN with fm10k from release 2.1, or is it better to wait for Stephen's patch?
> > Thank you in advance for your attention to this matter.
> 
> This patch is necessary to use fm10k in a XEN environment with DPDK. How
> could
> we move forward? Could you please kindly give some advice?

Applied.
It's probably the last time this ifdef is used. Next time, we'll have to 
introduce
a function which is common to Xen and standard case. A rework is welcome.
The Xen case could also be switched at run-time. Stephen's patch must be
reviewed for that.


[dpdk-dev] dpdk-2.0.0: crash in ixgbe_recv_scattered_pkts_vec->_recv_raw_pkts_vec->desc_to_olflags_v

2015-07-01 Thread Bruce Richardson
On Tue, Jun 30, 2015 at 05:50:14PM -0700, Gopakumar Choorakkot Edakkunni wrote:
> So, an update on this. The summary is that it's purely my fault; apologies
> for prematurely suspecting the wrong areas. Details below.
> 
> 1. So my AWS box had an eth0 interface without DPDK, I enabled dpdk
> AND created a KNI interface also AND named the KNI interface to be
> eth0
> 
> 2. So Ubuntu started its DHCP client on that interface, but my app
> doesn't really do anything to read the DHCP (renews) from the KNI and
> send it out the physical port, and vice versa... The KNI was just
> sitting there, not doing much RX/TX
> 
> 3. Now my l2fwd-equivalent code started working fine; after a few
> minutes, the DHCP client on Ubuntu gave up attempting DHCP renews (eth0
> already had an IP) and attempted to remove the IP from eth0
> 
> 4. At this point the standard KNI examples in dpdk which has callbacks
> registered, ended up being invoked - and the examples have a
> port_stop() and a port_start() in them - and exactly at this point my
> app crashed
> 
> So my bad! I just no-oped the callbacks for now and changed the AWS eth0
> from DHCP to a static IP, and things are fine now! My system has been up
> for a long time with no issues.
> 
> Thanks again Thomas and Bruce for the quick response and suggestions
>
No problem. Thanks for letting us know that this was resolved.

Regards,
/Bruce


[dpdk-dev] RTM instruction compile failure for XABORT when AVX is active

2015-07-01 Thread Bruce Richardson
On Tue, Jun 30, 2015 at 10:49:26PM -0700, Matthew Hall wrote:
> With those two items commented out, and these CFLAGS:
> 
> "-g -O0 -fPIC -msse4.2"
> 

The recommended way of specifying a particular instruction set is via the
RTE_MACHINE setting in your build time config. Can you perhaps reproduce the
issue using a setting there?

/Bruce

> it looks like I can reproduce the issue in clang 2.6 series:
> 
> /vagrant/external/dpdk/build/include/rte_rtm.h:56:15: error: invalid operand 
> for inline asm constraint 'i'
> asm volatile(".byte 0xc6,0xf8,%P0" :: "i" (status) : "memory");
> 
> So there are definitely some corner cases that seem to be able to trigger it.
> 
> On Jun 30, 2015, at 10:17 PM, Matthew Hall  wrote:
> 
> > To be a bit more specific, this is what I had to do to fix it for clang 3.6 
> > SVN snapshot release.
> > 
> > I am not sure if there is a better way of handling this situation. I'd love 
> > to know where I could improve it.
> > 
> > Matthew.
> > 
> > diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
> > index f595cd0..8c883ee 100644
> > --- a/mk/rte.cpuflags.mk
> > +++ b/mk/rte.cpuflags.mk
> > @@ -77,13 +77,13 @@ ifneq ($(filter $(AUTO_CPUFLAGS),__RDRND__),)
> > CPUFLAGS += RDRAND
> > endif
> > 
> > -ifneq ($(filter $(AUTO_CPUFLAGS),__FSGSBASE__),)
> > -CPUFLAGS += FSGSBASE
> > -endif
> > +#ifneq ($(filter $(AUTO_CPUFLAGS),__FSGSBASE__),)
> > +#CPUFLAGS += FSGSBASE
> > +#endif
> > 
> > -ifneq ($(filter $(AUTO_CPUFLAGS),__F16C__),)
> > -CPUFLAGS += F16C
> > -endif
> > +#ifneq ($(filter $(AUTO_CPUFLAGS),__F16C__),)
> > +#CPUFLAGS += F16C
> > +#endif
> > 
> > ifneq ($(filter $(AUTO_CPUFLAGS),__AVX2__),)
> > CPUFLAGS += AVX2
> 


[dpdk-dev] [PATCH v4 0/9] Chelsio Terminator 5 (T5) 10G/40G Poll Mode Driver

2015-07-01 Thread Rahul Lakkireddy
On Tue, Jun 30, 2015 at 23:01:39 +0200, Thomas Monjalon wrote:
> 2015-06-30 04:58, Rahul Lakkireddy:
> > This series of patches add the CXGBE Poll Mode Driver support for Chelsio
> > Terminator 5 series of 10G/40G adapters.  The CXGBE PMD is split into 
> > multiple
> > patches.  The first patch adds the hardware specific api for all supported
> > Chelsio T5 adapters and the patches from 2 to 8 add the actual DPDK CXGBE 
> > PMD.
> > 
> > Also, the CXGBE PMD is enabled for compilation and linking by patch 2.
> > MAINTAINERS file is also updated by patch 2 to claim responsibility for the
> > CXGBE PMD.
> > 
> > More information on the CXGBE PMD can be found in the documentation added by
> > patch 9.  
> > 
> > v4:
> > - Fix 32-bit and clang compilation.
> > - Moved cxgbe doc entry in MAINTAINERS from patch 2 to patch 9 for 
> > consistency.
> > 
> > v3:
> > - Merge patches 10 and 11 with patch 2.
> > - Add rte_pmd_cxgbe_version.map and add EXPORT_MAP and LIBABIVER to cxgbe
> >   Makefile.
> > - Use RTE_DIM macro for calculating ARRAY_SIZE.
> > 
> > v2:
> > - Move the driver to drivers/net directory and update all config files and
> >   commit logs.  Also update MAINTAINERS.
> > - Break the second patch into more patches; incrementally, adding features 
> > to
> >   the cxgbe poll mode driver.
> > - Replace bitwise operations in finding last (most significant) bit set with
> >   gcc's __builtin_clz.
> > - Fix the return value returned by link update eth_dev operation.
> > - Few bug fixes and code cleanup.
> > 
> > Rahul Lakkireddy (9):
> >   cxgbe: add hardware specific api for all supported Chelsio T5 series
> > adapters.
> >   cxgbe: add cxgbe poll mode driver.
> >   cxgbe: add device configuration and RX support for cxgbe PMD.
> >   cxgbe: add TX support for cxgbe PMD.
> >   cxgbe: add device related operations for cxgbe PMD.
> >   cxgbe: add port statistics for cxgbe PMD.
> >   cxgbe: add link related functions for cxgbe PMD.
> >   cxgbe: add flow control functions for cxgbe PMD.
> >   doc: add cxgbe PMD documentation under doc/guides/nics/cxgbe.rst
> 
> Applied, thanks for the good work and welcome :)

Thank you very much.  We are very glad to be on board.

Thanks,
Rahul


[dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements

2015-07-01 Thread Thomas Monjalon
2015-06-30 11:27, Adrien Mazarguil:
> This patchset adds compatibility with the upcoming Mellanox OFED 3.0
> release (new kernel drivers and userland support libraries), which supports
> new features such as L3/L4 checksum validation offloads and addresses
> several bugs and limitations at the same time.
> 
> v2:
>  - Bugfix for a possible crash when allocating mbufs.
>  - Several API changes following the release of Mellanox OFED 3.0.
>  - Performance improvements made possible by the new API.
>  - Add TX checksum offloads.
>  - Update documentation to reflect the changes.
> 
> Adrien Mazarguil (6):
>   mlx4: fix possible crash on scattered mbuf allocation failure
>   mlx4: add MOFED 3.0 compatibility to interfaces names retrieval
>   mlx4: use MOFED 3.0 fast verbs interface for TX operations
>   mlx4: move scattered TX processing to helper function
>   mlx4: add L2 tunnel (VXLAN) checksum offload support
>   doc: update mlx4 documentation following MOFED 3.0 changes
> 
> Alex Rosenbaum (8):
>   mlx4: avoid looking up WR ID to improve RX performance
>   mlx4: merge RX queue setup functions
>   mlx4: use MOFED 3.0 extended flow steering API
>   mlx4: use MOFED 3.0 fast verbs interface for RX operations
>   mlx4: improve performance by requesting TX completion events less
> often
>   mlx4: shrink TX queue elements for better performance
>   mlx4: prefetch completed TX mbufs before releasing them
>   mlx4: associate resource domain with CQs and QPs to enhance
> performance
> 
> Gilad Berman (1):
>   mlx4: add L3 and L4 checksum offload support
> 
> Olga Shern (5):
>   mlx4: make sure experimental device query function is implemented
>   mlx4: allow applications to partially use fork()
>   mlx4: improve accuracy of link status information
>   mlx4: fix support for multiple VLAN filters
>   mlx4: disable multicast echo when device is not VF
> 
> Or Ami (3):
>   mlx4: fix error message for invalid number of descriptors
>   mlx4: remove provision for flow creation failure in DMFS A0 mode
>   mlx4: query netdevice to get initial MAC address

Applied, thanks


[dpdk-dev] [PATCH v4 0/4] bonding corrections and additions

2015-07-01 Thread Declan Doherty
On 28/06/15 23:02, Thomas Monjalon wrote:
> Declan, Neil,
>
> Please help to review this series.
>
...


Hey Thomas, I'll review this version of the patchset today, sorry for 
taking so long to get to this.

Cheers
Declan



[dpdk-dev] [PATCH v3 0/8] Dynamic RSS Configuration for Bonding

2015-07-01 Thread Declan Doherty
On 29/06/15 15:50, Tomasz Kulasek wrote:
> OVERVIEW
> 
> 1) Setting .rxmode.mq_mode for bonding device to ETH_MQ_RX_RSS makes bonding
> device fully RSS-capable, so all slaves are synchronized with its 
> configuration.
> This mode is intended to provide RSS configuration as known from "dynamic RSS
> configuration for one port" and made slaves transparent for client application
> implementation.
>
> 2) If .rxmode.mq_mode for bonding device isn't ETH_MQ_RX_RSS, slaves are not
> synchronized. That provides an ability to configure them manually. This mode 
> may
> be useful when application wants to manage RSS in an unusual way and the
> consistency of RSS configuration for slaves isn't required.
>
> Turning on/off RSS mode for slaves when bonding is started is not possible.
> Other RSS configuration is propagated over slaves, when bonding device API is
> used to do it.
>
> v3 changes:
>   - checkpatch cleanups
...
>
Acked-by : Declan Doherty 




[dpdk-dev] Using rte_ring_mp_xyz() across EAL and non-EAL threads ?

2015-07-01 Thread Gopakumar Choorakkot Edakkunni
Hi,

I have a requirement where one of my non-EAL app threads needs to
hand off some packets to an EAL task. I was thinking of using
rte_ring_mp_enqueue/dequeue for that purpose. I looked at the code of
the rte_ring library and it doesn't look like it has any "EAL"
dependencies, but I wanted to double-confirm that there are no issues
in using it that way. I didn't find much yes/no info about this on the
mailing lists/docs. Please let me know your thoughts.

Rgds,
Gopa.


[dpdk-dev] Could not achieve wire speed for 40GE with any DPDK version on XL710 NIC's

2015-07-01 Thread Anuj Kalia
Vladimir,

Few possible fixes to your PCIe analysis (let me know if I'm wrong):
- ECRC is probably disabled (check using sudo lspci -vvv | grep
CGenEn-), so TLP header is 26 bytes
- Descriptor writeback can be batched using high value of WTHRESH,
which is what DPDK uses by default
- Read request contains full TLP header (26 bytes)

Assuming WTHRESH = 4, bytes transferred from NIC to host per packet =
26 + 64 (packet itself) +
(26 + 32) / 4 (batched descriptor writeback) +
(26 / 4) (read request for new descriptors) =
111 bytes / packet

This corresponds to 70.9 Mpps over PCIe 3.0 x8. Assuming 5% DLLP
overhead, rate = 67.4 Mpps

--Anuj



On Wed, Jul 1, 2015 at 9:40 AM, Vladimir Medvedkin  
wrote:
> In the SYN-flood case you should take into account the returning SYN-ACK
> traffic,
> which generates PCIe DLLPs from NIC to host, so the PCIe bandwidth is
> exhausted faster. And don't forget about the DLLPs generated by RX traffic,
> which saturate the host-to-NIC bus.
>
> 2015-07-01 16:05 GMT+03:00 Pavel Odintsov :
>
>> Yes, Bruce, we understand this. But we are working with huge SYN
>> attacks processing and they are 64byte only :(
>>
>> On Wed, Jul 1, 2015 at 3:59 PM, Bruce Richardson
>>  wrote:
>> > On Wed, Jul 01, 2015 at 03:44:57PM +0300, Pavel Odintsov wrote:
>> >> Thanks for answer, Vladimir! So we need look for x16 NIC if we want
>> >> achieve 40GE line rate...
>> >>
>> > Note that this would only apply for your minimal i.e. 64-byte, packet
>> sizes.
>> > Once you go up to larger e.g. 128B packets, your PCI bandwidth
>> requirements
>> > are lower and you can easier achieve line rate.
>> >
>> > /Bruce
>> >
>> >> On Wed, Jul 1, 2015 at 3:06 PM, Vladimir Medvedkin <
>> medvedkinv at gmail.com> wrote:
>> >> > Hi Pavel,
>> >> >
>> >> > Looks like you ran into pcie bottleneck. So let's calculate xl710 rx
>> only
>> >> > case.
>> >> > Assume we have 32byte descriptors (if we want more offload).
>> >> > DMA makes one pcie transaction with packet payload, one descriptor
>> writeback
>> >> > and one memory request for free descriptors for every 4 packets. For
>> >> > Transaction Layer Packet (TLP) there is 30 bytes overhead (4 PHY + 6
>> DLL +
>> >> > 16 header + 4 ECRC). So for 1 rx packet dma sends 30 + 64(packet
>> itself) +
>> >> > 30 + 32 (writeback descriptor) + (16 / 4) (read request for new
>> >> > descriptors). Note that we do not take into account PCIe ACK/NACK/FC
>> Update
>> >> > DLLP. So we have 160 bytes per packet. One lane PCIe 3.0 transmits 1
>> byte in
>> >> > 1 ns, so x8 transmits 8 bytes  in 1 ns. 1 packet transmits in 20 ns.
>> Thus
>> >> > in theory pcie 3.0 x8 may transfer not more than 50mpps.
>> >> > Correct me if I'm wrong.
>> >> >
>> >> > Regards,
>> >> > Vladimir
>> >> >
>> >> >
>>
>>
>>
>> --
>> Sincerely yours, Pavel Odintsov
>>


[dpdk-dev] [PATCH v2] mempool: improve cache search

2015-07-01 Thread Zoltan Kiss
The current way has a few problems:

- if cache->len < n, we copy our elements into the cache first, then
  into obj_table, that's unnecessary
- if n >= cache_size (or the backfill fails), and we can't fulfil the
  request from the ring alone, we don't try to combine with the cache
- if refill fails, we don't return anything, even if the ring has enough
  for our request

This patch rewrites it substantially:
- at the first part of the function we only try the cache if cache->len < n
- otherwise take our elements straight from the ring
- if that fails but we have something in the cache, try to combine them
- the refill happens at the end, and its failure doesn't modify our return
  value

Signed-off-by: Zoltan Kiss 
---
v2:
- fix subject
- add unlikely for branch where request is fulfilled both from cache and ring

 lib/librte_mempool/rte_mempool.h | 63 +---
 1 file changed, 39 insertions(+), 24 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 6d4ce9a..1e96f03 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -947,34 +947,14 @@ __mempool_get_bulk(struct rte_mempool *mp, void 
**obj_table,
unsigned lcore_id = rte_lcore_id();
uint32_t cache_size = mp->cache_size;

-   /* cache is not enabled or single consumer */
+   cache = &mp->local_cache[lcore_id];
+   /* cache is not enabled or single consumer or not enough */
if (unlikely(cache_size == 0 || is_mc == 0 ||
-n >= cache_size || lcore_id >= RTE_MAX_LCORE))
+cache->len < n || lcore_id >= RTE_MAX_LCORE))
goto ring_dequeue;

-   cache = &mp->local_cache[lcore_id];
cache_objs = cache->objs;

-   /* Can this be satisfied from the cache? */
-   if (cache->len < n) {
-   /* No. Backfill the cache first, and then fill from it */
-   uint32_t req = n + (cache_size - cache->len);
-
-   /* How many do we require i.e. number to fill the cache + the 
request */
-   ret = rte_ring_mc_dequeue_bulk(mp->ring, &cache->objs[cache->len], req);
-   if (unlikely(ret < 0)) {
-   /*
-* In the offchance that we are buffer constrained,
-* where we are not able to allocate cache + n, go to
-* the ring directly. If that fails, we are truly out of
-* buffers.
-*/
-   goto ring_dequeue;
-   }
-
-   cache->len += req;
-   }
-
/* Now fill in the response ... */
for (index = 0, len = cache->len - 1; index < n; ++index, len--, 
obj_table++)
*obj_table = cache_objs[len];
@@ -983,7 +963,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,

__MEMPOOL_STAT_ADD(mp, get_success, n);

-   return 0;
+   ret = 0;
+   goto cache_refill;

 ring_dequeue:
 #endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
@@ -994,11 +975,45 @@ ring_dequeue:
else
ret = rte_ring_sc_dequeue_bulk(mp->ring, obj_table, n);

+#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
+   if (unlikely(ret < 0 && is_mc == 1 && cache->len > 0)) {
+   uint32_t req = n - cache->len;
+
+   ret = rte_ring_mc_dequeue_bulk(mp->ring, obj_table, req);
+   if (ret == 0) {
+   cache_objs = cache->objs;
+   obj_table += req;
+   for (index = 0; index < cache->len;
+++index, ++obj_table)
+   *obj_table = cache_objs[index];
+   cache->len = 0;
+   }
+   }
+#endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
+
if (ret < 0)
__MEMPOOL_STAT_ADD(mp, get_fail, n);
else
__MEMPOOL_STAT_ADD(mp, get_success, n);

+#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
+cache_refill:
+   /* If previous dequeue was OK and we have less than n, start refill */
+   if (ret == 0 && cache_size > 0 && cache->len < n) {
+   uint32_t req = cache_size - cache->len;
+
+   cache_objs = cache->objs;
+   ret = rte_ring_mc_dequeue_bulk(mp->ring,
+  &cache->objs[cache->len],
+  req);
+   if (likely(ret == 0))
+   cache->len += req;
+   else
+   /* Don't spoil the return value */
+   ret = 0;
+   }
+#endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
+
return ret;
 }

-- 
1.9.1



[dpdk-dev] How to set pool mirror on Intel 82599ES

2015-07-01 Thread Zhou, Tianlin
Dear all,


I used the following steps to setup pool mirror for Intel 82599ES, but failed.

Step 1: I setup 4 VFs for 05:00.0

05:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ 
Network Connection (rev 01)
05:10.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller 
Virtual Function (rev 01)
05:10.2 Ethernet controller: Intel Corporation 82599 Ethernet Controller 
Virtual Function (rev 01)
05:10.4 Ethernet controller: Intel Corporation 82599 Ethernet Controller 
Virtual Function (rev 01)
05:10.6 Ethernet controller: Intel Corporation 82599 Ethernet Controller 
Virtual Function (rev 01)

Step 2: bind 05:00.0 to igb_uio

Step3: start testpmd and then type the following command

// I want to set pool mirroring on 05:00.0
// Mirror the traffic VF 4 - Pool 3 (05:10.6) to VF 3 - Pool 2 (05:10.4)
set port 0 mirror-rule 0 pool-mirror 0x8 dst-pool 2 on

Step 4: send traffic on VF4.
I expected that I could see the same traffic on VF 3, but I failed.

Is anything wrong with my steps?
Has anyone ever set up pool mirroring successfully?

Thanks a lot!
-Tianlin


[dpdk-dev] [PATCH v2] vfio: Fix overflow while assigning vfio BAR region offset and size

2015-07-01 Thread Burakov, Anatoly
Hi all,

> The last patch from Rahul does not solve the problem. For those cases where 
> the MSI-X table is in one of the BARs to map, the memreg array is still in 
> use.

Rahul's initial patch was pretty much what you have submitted, it just didn't 
build on a 32-bit system.

> My fix was using unsigned long instead of uint32_t for the memreg array, as 
> this is used as a parameter for the mmap system call, which expects such a 
> type for the offset (and size).

Maybe use off_t? That would at least be guaranteed to compile on any system...

> In a 32-bit system, the mmap system call and the VFIO mmap implementation will 
> get an unsigned long offset, as does struct vm_area_struct for vm_pgoff.
> VFIO will not be able to map the right BAR except for BAR 0.
> 
> So, basically, VFIO kernel code does not work for 32 bit systems.
> 
> I think we should define memreg as unsigned long and report this problem 
> to the VFIO kernel maintainers.

If that's the case, this should indeed be taken up with the kernel maintainers. 
I don't have a 32-bit system handy to test it, unfortunately.

Thanks,
Anatoly


[dpdk-dev] [PATCH v2] vfio: Fix overflow while assigning vfio BAR region offset and size

2015-07-01 Thread Alejandro Lucero
I submitted a patch for fixing this issue on the 25th of June. I did not
notice someone had reported this before. The last patch from Rahul does not
solve the problem. For those cases where the MSI-X table is in one of the
BARs to map, the memreg array is still in use.

My fix was using unsigned long instead of uint32_t for the memreg array, as
this is used as a parameter for the mmap system call, which expects such a type
for the offset (and size). This worked for me, but I did not realize this
has to be compiled for 32-bit systems as well. In that case unsigned long
will work for mmap but not for the VFIO kernel API, which expects
uint64_t for the offset and size inside struct vfio_region_info.

The point is, the offset param of vfio_region_info encodes the index of the BAR
to map. For this, VFIO kernel code uses VFIO_PCI_INDEX_TO_OFFSET:

 #define VFIO_PCI_OFFSET_SHIFT 40
 #define VFIO_PCI_INDEX_TO_OFFSET(index) ((u64)(index) << VFIO_PCI_OFFSET_SHIFT)

This index will be used by the VFIO mmap implementation when the DPDK code
tries to map the BARs. That code does the opposite for getting the index:

index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);

In this case PAGE_SHIFT needs to be used because the mmap path has already
converted the byte offset into a page offset (vm_pgoff).
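
To make the truncation concrete: VFIO puts the BAR index in bits 40 and up of the mmap offset, so storing that offset in a 32-bit variable zeroes the index for every BAR except BAR 0. A small sketch using the constants quoted above (PAGE_SHIFT of 12 is an assumption for this sketch, typical on x86):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption for this sketch */
#define VFIO_PCI_OFFSET_SHIFT 40
#define VFIO_PCI_INDEX_TO_OFFSET(index) \
        ((uint64_t)(index) << VFIO_PCI_OFFSET_SHIFT)

/* What the kernel side recovers from vm_pgoff on mmap. */
static unsigned index_from_pgoff(uint64_t pgoff)
{
    return (unsigned)(pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT));
}
```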

In a 32-bit system, the mmap system call and the VFIO mmap implementation will
get an unsigned long offset, as does struct vm_area_struct for vm_pgoff.
VFIO will not be able to map the right BAR except for BAR 0.

So, basically, VFIO kernel code does not work for 32 bit systems.

I think we should define memreg as unsigned long and report this problem
to the VFIO kernel maintainers.




On Tue, Jun 30, 2015 at 10:12 PM, Thomas Monjalon  wrote:

> Hi Anatoly,
> Please could you review this fix to allow Chelsio using VFIO?
> Thanks
>
> 2015-06-23 20:30, Rahul Lakkireddy:
> > When using vfio, the probe fails over Chelsio T5 adapters after
> > commit-id 90a1633b2 (eal/linux: allow to map BARs with MSI-X tables).
> >
> > While debugging further, found that the BAR region offset and size read
> from
> > vfio are u64, but are assigned to uint32_t variables.  This results in
> the u64
> > value getting truncated to 0 and passing wrong offset and size to mmap
> for
> > subsequent BAR regions (i.e. trying to overwrite previously allocated
> BAR 0
> > region).
> >
> > The fix is to use these region offset and size directly rather than
> assigning
> > to uint32_t variables.
> >
> > Fixes: 90a1633b2347 ("eal/linux: allow to map BARs with MSI-X tables")
> > Signed-off-by: Rahul Lakkireddy 
> > Signed-off-by: Kumar Sanghvi 
>
>


[dpdk-dev] rte_lpm4 with expanded next hop support now available

2015-07-01 Thread Matthew Hall
On Jul 1, 2015, at 4:20 AM, Bruce Richardson  
wrote:
> Could you maybe send a patch (or set) with all your changes in it here for us
> to look at? [I did look at it in github, but I'm not very familiar with github
> and the changes seem to be spread over a whole series of commits]

Here is a view of the specific commits:

https://github.com/megahall/dpdk_mhall/compare/megahall/lpm-expansion

I'll work on emails when I get a moment. I was hoping that, since the branch is 
open to all for download, someone could sync it and try it in an environment 
that has some kind of performance tests / known results for the self-tests, as 
my development setup is not that great compared to some of the other DPDK 
engineers out there.

> In terms of ABI issues, the overall function set for lpm4 library is not that
> big, so it may be possible to maintain old and new copies of the functions
> in parallel
> for one release, and solve the ABI issues that way. I'm quite keen to get 
> these
> changes in, since I think being limited to 255 next hops is quite a limitation
> for many cases.

Sounds good.

> A final interesting suggestion I might throw out, is: can we make the lpm 
> library
> configurable in that it can use either 8-bit, 16/24 bit or even pointer based
> next hops (I won't say 64-bit, as for pointers we might be able to get away
> with less than 64-bits being stored)? Would such a thing be useful to people?

I think this could be pretty nice. The tricky part is that, at least in the 
version Vladimir and Stephen helped me cook up, a lot of bitfield trickery was 
involved. So we'd need to switch away from bitfields to something a bit more 
flexible or easier to work with when variable configuration comes into the 
picture. Also I'm not sure how it'd work at runtime versus compile time, etc. 
You guys know more about this stuff than I do.

Matthew.
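
To make the configurable-width idea concrete, here is one hypothetical packing of a 32-bit tbl24 entry with a 24-bit next hop. This layout is purely illustrative; it is not the actual rte_lpm entry format:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit table entry: widening next_hop to 24 bits leaves
 * 8 bits for the valid/ext flags and depth, vs. the stock 8-bit next hop. */
struct lpm_entry24 {
    uint32_t next_hop  : 24;
    uint32_t valid     : 1;
    uint32_t ext_entry : 1;
    uint32_t depth     : 6;
};
```

The trade-off Matthew describes follows directly: any extra next-hop bits must come out of the flag/depth budget or grow the entry past 32 bits, which costs memory bandwidth on lookup.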



[dpdk-dev] [PATCH] virtio: fix the vq size issue

2015-07-01 Thread Xu, Qian Q
Tested-by: Qian Xu 
- Test Commit: c55e94f560ef5c9fcee4584952de1d0bd414aece
- OS: Fedora 21
- GCC: gcc (GCC) 4.9.2 20141101 (Red Hat 4.9.2-1)
- CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
- NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
- Target: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 
01)
- Total 2 cases (Test Cases 7 and 8), 2 passed, 0 failed. 

Test Case 1:  test_perf_virtio_one_vm_dpdk_fwd_vhost-cuse_jumboframe


On host:

1. Start up vhost-switch; mergeable 1 means the jumbo frame feature is enabled, 
vm2vm 0 means only one vm without vm-to-vm communication::

taskset -c 1-3 /examples/vhost/build/vhost-switch -c 0xf -n 4 
--huge-dir /mnt/huge --socket-mem 1024,1024 -- -p 1 --mergeable 1 --zero-copy 0 
--vm2vm 0


2. Start VM with vhost cuse as backend::

taskset -c 4-6  /home/qxu10/qemu-2.2.0/x86_64-softmmu/qemu-system-x86_64 
-object memory-backend-file,id=mem,size=2048M,mem-path=/mnt/huge,share=on 
-numa node,memdev=mem -mem-prealloc \
-enable-kvm -m 2048 -smp 4 -cpu host -name dpdk1-vm1 \
-drive file=/home/img/dpdk1-vm1.img \
-netdev tap,id=vhost3,ifname=tap_vhost3,vhost=on,script=no \
-device 
virtio-net-pci,netdev=vhost3,mac=52:54:00:00:00:01,id=net3,csum=off,gso=off,guest_csum=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
 \
-netdev tap,id=vhost4,ifname=tap_vhost4,vhost=on,script=no \
-device 
virtio-net-pci,netdev=vhost4,mac=52:54:00:00:00:02,id=net4,csum=off,gso=off,guest_csum=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
 \
-netdev tap,id=ipvm1,ifname=tap3,script=/etc/qemu-ifup -device 
rtl8139,netdev=ipvm1,id=net0,mac=00:00:00:00:00:01 \
-localtime -nographic

On guest:

3. Ensure the DPDK folder is copied to the guest with the same config file and 
build process as on the host. Then bind the 2 virtio devices to igb_uio and 
start testpmd; below are the steps for reference::

.//tools/dpdk_nic_bind.py --bind igb_uio 00:03.0 00:04.0

.//x86_64-native-linuxapp-gcc/app/test-pmd/testpmd -c f -n 4 
-- -i --txqflags 0x0f00 --max-pkt-len 9000 

$ >set fwd mac

$ >start tx_first

4. After typing start tx_first in testpmd, the user can see 2 virtio devices 
with MAC and VLAN id registered in vhost-user; the log is shown in the host's 
vhost-sample output.

5. Send traffic (30 seconds) to virtio1 and virtio2, with packet sizes from 64 
to 1518 as well as 3000-byte jumbo frames. Check the performance in Mpps. 
Traffic sent to virtio1 should have virtio1's MAC as destination and virtio1's 
VLAN id; traffic sent to virtio2 should have virtio2's MAC as destination and 
virtio2's VLAN id. As the functionality criterion, the received rate should 
not be zero. As for the performance criteria, check with the developer or the 
design doc/PRD.

Test Case 7:  test_perf_virtio_one_vm_dpdk_fwd_vhost-user_jumboframe


This case is similar to Test Case 1, except the backend changes from vhost cuse 
to vhost-user, so DPDK needs to be rebuilt for vhost-user on the host; other 
steps are the same as Test Case 1. The command to launch the VM is different; 
see below as an example:: 

/x86_64-softmmu/qemu-system-x86_64 -name us-vhost-vm1 
-cpu host -enable-kvm -m 2048 -object 
memory-backend-file,id=mem,size=2048M,mem-path=/mnt/huge,share=on -numa 
node,memdev=mem -mem-prealloc -smp 2 -drive file=/home/img/dpdk1-vm1.img 
-chardev socket,id=char0,path=
---
 drivers/net/virtio/virtio_ethdev.c | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index fe5f9a1..d84de13 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -263,8 +263,6 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 */
vq_size = VIRTIO_READ_REG_2(hw, VIRTIO_PCI_QUEUE_NUM);
PMD_INIT_LOG(DEBUG, "vq_size: %d nb_desc:%d", vq_size, nb_desc);
-   if (nb_desc == 0)
-   nb_desc = vq_size;
if (vq_size == 0) {
PMD_INIT_LOG(ERR, "%s: virtqueue does not exist", __func__);
return -EINVAL;
@@ -275,15 +273,9 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
return -EINVAL;
}

-   if (nb_desc < vq_size) {
-   if (!rte_is_power_of_2(nb_desc)) {
-   PMD_INIT_LOG(ERR,
-"nb_desc(%u) size is not powerof 2",
-nb_desc);
-   return -EINVAL;
-   }
-   vq_size = nb_desc;
-   }
+   if (nb_desc != vq_size)
+   PMD_INIT_LOG(ERR, "Warning: nb_desc(%d) is not equal to vq size 
(%d), fall to vq size",
+   nb_desc, vq_size);

if (queue_type == VTNET_RQ) {
snprintf(vq_name, 

[dpdk-dev] RTM instruction compile failure for XABORT when AVX is active

2015-07-01 Thread Matthew Hall
Previously, with the -msse4.2 flag removed, the build failed for a different 
reason.

I can retry without it and see whether that's still the case in the new DPDK.

On Jul 1, 2015, at 4:10 AM, Bruce Richardson  
wrote:

> On Tue, Jun 30, 2015 at 10:49:26PM -0700, Matthew Hall wrote:
>> With those two items commented out, and these CFLAGS:
>> 
>> "-g -O0 -fPIC -msse4.2"
>> 
> 
> The recommended way of specifying a particular instruction set is via the
> RTE_MACHINE setting in your build time config. Can you perhaps reproduce the
> issue using a setting there?
> 
> /Bruce



[dpdk-dev] [PATCH] hash: add missing symbol in version map

2015-07-01 Thread Gonzalez Monroy, Sergio
On 30/06/2015 19:37, Pablo de Lara wrote:
> rte_hash_hash is a public function but was not in
> rte_hash_version.map
>
> Signed-off-by: Pablo de Lara 
> ---
>   lib/librte_hash/rte_hash_version.map |1 +
>   1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/lib/librte_hash/rte_hash_version.map 
> b/lib/librte_hash/rte_hash_version.map
> index 0b749e8..94a0fec 100644
> --- a/lib/librte_hash/rte_hash_version.map
> +++ b/lib/librte_hash/rte_hash_version.map
> @@ -11,6 +11,7 @@ DPDK_2.0 {
>   rte_hash_del_key_with_hash;
>   rte_hash_find_existing;
>   rte_hash_free;
> + rte_hash_hash;
>   rte_hash_lookup;
>   rte_hash_lookup_bulk;
>   rte_hash_lookup_with_hash;
Acked-by: Sergio Gonzalez Monroy 


[dpdk-dev] How to get net_device and use struct ethtool_cmd at DPDK enverinment?

2015-07-01 Thread Choi, Sy Jong
Hi Scott,

Please refer to our KNI library at:-
dpdk-1.8.0\lib\librte_eal\linuxapp\kni\ethtool\igb\igb.h

Regards,
Choi, Sy Jong
Platform Application Engineer

From: "Scott.Jhuang (???) : 6309" [mailto:scott.jhu...@cas-well.com]
Sent: Wednesday, July 01, 2015 2:44 PM
To: Choi, Sy Jong; dev at dpdk.org
Subject: Re: [dpdk-dev] How to get net_device and use struct ethtool_cmd at 
DPDK enverinment?

Hi Sy Jong,

Have any idea?

"Scott.Jhuang (???) : 6309" ? 2015?06?23? 21:24 ??:
Dear Sy Jong,

Yes, I have checked out DPDK KNI, but I still can't find how to prepare the 
net_device structure...
I also can't find how to get "ethtool_cmd.phy_address".
Could you let me know the path of the source code folder?

Choi, Sy Jong wrote on 2015/06/19 10:35:
Hi Scott,

DPDK PMDs interface through rte_ethdev.c, which links to ixgbe_ethdev.c; 
there's no "net_device" in our code.

But if you search the DPDK code base, we have a KNI example that shows how to 
prepare the net_device structure.
Have you checked out our DPDK KNI code?

Regards,
Choi, Sy Jong
Platform Application Engineer

From: "Scott.Jhuang (? ? ?) : 6309" [mailto:scott.jhu...@cas-well.com]
Sent: Thursday, June 18, 2015 12:25 PM
To: Choi, Sy Jong; dev at dpdk.org
Subject: Re: [dpdk-dev] How to get net_device and use struct ethtool_cmd at 
DPDK enverinment?

Dear Sy Jong,

I'm planning to write a driver to get every ethport's net_device structure, 
because I need some information from those structures.
I also need to use the net_device struct's ethtool_cmd to get information such 
as ethtool_cmd.phy_address and net_device->ethtool_ops->get_settings.

In fact, I need information from the net_device struct to access and control 
the PHY's link-up/down, and I referenced the igb driver to design the 
link-up/down functions. Since the DPDK environment doesn't have the igb 
driver, I don't know how to get the network devices' net_device structs or the 
other information initialized by the igb driver.

Choi, Sy Jong wrote on 2015/06/17 11:15:
Hi Scott,

You are right, the KNI will be a good reference for you. It demonstrates how a 
DPDK PMD interfaces with the kernel.
May I ask whether you are planning to build the interface to ethtool? You can 
try running the KNI app.

Regards,
Choi, Sy Jong
Platform Application Engineer

From: "Scott.Jhuang (?? ?) : 6309" [mailto:scott.jhu...@cas-well.com]
Sent: Wednesday, June 17, 2015 11:12 AM
To: Choi, Sy Jong; dev at dpdk.org
Subject: Re: [dpdk-dev] How to get net_device and use struct ethtool_cmd at 
DPDK enverinment?

Hi Sy Jong,

But...I am programming a driver now, have any sample driver I can reference?

Choi, Sy Jong wrote on 2015/06/16 14:48:

Hi Scott,



You can review the DPDK KNI sample app; it has ethtool support using a vEth 
device that interfaces with a DPDK PMD.

A pure DPDK PMD requires extra programming to expose information through 
ethtool. The interfacing is demonstrated in the KNI sample app.



Regards,

Choi, Sy Jong

Platform Application Engineer



-Original Message-

From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of "Scott.Jhuang (???) : 6309"

Sent: Monday, June 15, 2015 6:35 PM

To: dev at dpdk.org

Subject: [dpdk-dev] How to get net_device and use struct ethtool_cmd at DPDK 
enverinment?



Hi,



I want to get the etherports' net_device structs and use ethtool_cmd to get 
some information about the etherports.

Are these capabilities also provided by the igb_uio driver?

If not, how can I get the net_devices and use the ethtool_cmd capabilities?



--

Best Regards,

Scott Jhuang

Software Engineering Dept.
Software Engineer
CASwell
8F, No.242, Bo-Ai St., Shu-Lin Dist, New Taipei City 238, Taiwan
Tel: +886-2-7705- # 6309
Fax: +886-2-7731-9988
E-mail: scott.jhuang at cas-well.com
CASWELL Inc.  http://www.cas-well.com

This email may contain confidential information. Please do not use or disclose 
it in any way and delete it if you are not the intended recipient.



[dpdk-dev] [PATCH v4 4/4] vhost: add comment for potential unwanted callback on listenfds

2015-07-01 Thread Ouyang, Changchun


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
> Sent: Tuesday, June 30, 2015 5:21 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v4 4/4] vhost: add comment for potential
> unwanted callback on listenfds
> 
> add comment for potential unwanted callback on listenfds
> 
> v4 changes:
> add comment for potential unwanted callback on listenfds
> 
> Signed-off-by: Huawei Xie 

Acked-by: Changchun Ouyang 

> ---
>  lib/librte_vhost/vhost_user/fd_man.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/lib/librte_vhost/vhost_user/fd_man.c
> b/lib/librte_vhost/vhost_user/fd_man.c
> index bd30f8d..d68b270 100644
> --- a/lib/librte_vhost/vhost_user/fd_man.c
> +++ b/lib/librte_vhost/vhost_user/fd_man.c
> @@ -242,6 +242,13 @@ fdset_event_dispatch(struct fdset *pfdset)
> 
>   pthread_mutex_unlock(&pfdset->fd_mutex);
> 
> + /*
> +  * When select is blocked, other threads might unregister
> +  * listenfds from and register new listenfds into fdset.
> +  * When select returns, the entries for listenfds in the fdset
> +  * might have been updated. It is ok if there is unwanted call
> +  * for new listenfds.
> +  */
>   ret = select(maxfds + 1, &rfds, &wfds, NULL, &tv);
>   if (ret <= 0)
>   continue;
> --
> 1.8.1.4



[dpdk-dev] [PATCH v4 3/4] vhost: version map file update

2015-07-01 Thread Ouyang, Changchun


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
> Sent: Tuesday, June 30, 2015 5:21 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v4 3/4] vhost: version map file update
> 
> update version map file for rte_vhost_driver_unregister API
> 
> v3 changes:
> update version map file
> 
> Signed-off-by: Huawei Xie 

Acked-by: Changchun Ouyang 

> ---
>  lib/librte_vhost/rte_vhost_version.map | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/lib/librte_vhost/rte_vhost_version.map
> b/lib/librte_vhost/rte_vhost_version.map
> index 163dde0..fb6bb9e 100644
> --- a/lib/librte_vhost/rte_vhost_version.map
> +++ b/lib/librte_vhost/rte_vhost_version.map
> @@ -13,3 +13,11 @@ DPDK_2.0 {
> 
>   local: *;
>  };
> +
> +DPDK_2.1 {
> + global:
> +
> + rte_vhost_driver_unregister;
> +
> + local: *;
> +} DPDK_2.0;
> --
> 1.8.1.4



[dpdk-dev] [PATCH v4 2/4] vhost: vhost unix domain socket cleanup

2015-07-01 Thread Ouyang, Changchun


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
> Sent: Tuesday, June 30, 2015 5:21 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v4 2/4] vhost: vhost unix domain socket cleanup
> 
> rte_vhost_driver_unregister API will remove the listenfd from event list, and
> then close it.
> 
> v2 changes:
> -minor code style fix, remove unnecessary new line
> 
> Signed-off-by: Huawei Xie 
> Signed-off-by: Peng Sun 

Acked-by: Changchun Ouyang 

> ---
>  lib/librte_vhost/rte_virtio_net.h|  3 ++
>  lib/librte_vhost/vhost_cuse/vhost-net-cdev.c |  9 
>  lib/librte_vhost/vhost_user/vhost-net-user.c | 68 +++-
>  lib/librte_vhost/vhost_user/vhost-net-user.h |  2 +-
>  4 files changed, 69 insertions(+), 13 deletions(-)
> 
> diff --git a/lib/librte_vhost/rte_virtio_net.h
> b/lib/librte_vhost/rte_virtio_net.h
> index 5d38185..5630fbc 100644
> --- a/lib/librte_vhost/rte_virtio_net.h
> +++ b/lib/librte_vhost/rte_virtio_net.h
> @@ -188,6 +188,9 @@ int rte_vhost_enable_guest_notification(struct
> virtio_net *dev, uint16_t queue_i
>  /* Register vhost driver. dev_name could be different for multiple instance
> support. */  int rte_vhost_driver_register(const char *dev_name);
> 
> +/* Unregister vhost driver. This is only meaningful to vhost user. */
> +int rte_vhost_driver_unregister(const char *dev_name);
> +
>  /* Register callbacks. */
>  int rte_vhost_driver_callback_register(struct virtio_net_device_ops const *
> const);
>  /* Start vhost driver session blocking loop. */ diff --git
> a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
> b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
> index 6b68abf..1ae7c49 100644
> --- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
> +++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
> @@ -405,6 +405,15 @@ rte_vhost_driver_register(const char *dev_name)  }
> 
>  /**
> + * An empty function for unregister
> + */
> +int
> +rte_vhost_driver_unregister(const char *dev_name __rte_unused) {
> + return 0;
> +}
> +
> +/**
>   * The CUSE session is launched allowing the application to receive open,
>   * release and ioctl calls.
>   */
> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c
> b/lib/librte_vhost/vhost_user/vhost-net-user.c
> index 31f1215..87a4711 100644
> --- a/lib/librte_vhost/vhost_user/vhost-net-user.c
> +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
> @@ -66,6 +66,8 @@ struct connfd_ctx {
>  struct _vhost_server {
>   struct vhost_server *server[MAX_VHOST_SERVER];
>   struct fdset fdset;
> + int vserver_cnt;
> + pthread_mutex_t server_mutex;
>  };
> 
>  static struct _vhost_server g_vhost_server = { @@ -74,10 +76,10 @@ static
> struct _vhost_server g_vhost_server = {
>   .fd_mutex = PTHREAD_MUTEX_INITIALIZER,
>   .num = 0
>   },
> + .vserver_cnt = 0,
> + .server_mutex = PTHREAD_MUTEX_INITIALIZER,
>  };
> 
> -static int vserver_idx;
> -
>  static const char *vhost_message_str[VHOST_USER_MAX] = {
>   [VHOST_USER_NONE] = "VHOST_USER_NONE",
>   [VHOST_USER_GET_FEATURES] = "VHOST_USER_GET_FEATURES",
> @@ -427,7 +429,6 @@ vserver_message_handler(int connfd, void *dat, int
> *remove)
>   }
>  }
> 
> -
>  /**
>   * Creates and initialise the vhost server.
>   */
> @@ -436,34 +437,77 @@ rte_vhost_driver_register(const char *path)  {
>   struct vhost_server *vserver;
> 
> - if (vserver_idx == 0)
> + pthread_mutex_lock(&g_vhost_server.server_mutex);
> + if (ops == NULL)
>   ops = get_virtio_net_callbacks();
> - if (vserver_idx == MAX_VHOST_SERVER)
> +
> + if (g_vhost_server.vserver_cnt == MAX_VHOST_SERVER) {
> + RTE_LOG(ERR, VHOST_CONFIG,
> + "error: the number of servers reaches maximum\n");
> + pthread_mutex_unlock(&g_vhost_server.server_mutex);
>   return -1;
> + }
> 
>   vserver = calloc(sizeof(struct vhost_server), 1);
> - if (vserver == NULL)
> + if (vserver == NULL) {
> + pthread_mutex_unlock(&g_vhost_server.server_mutex);
>   return -1;
> -
> - unlink(path);
> + }
> 
>   vserver->listenfd = uds_socket(path);
>   if (vserver->listenfd < 0) {
>   free(vserver);
> + pthread_mutex_unlock(&g_vhost_server.server_mutex);
>   return -1;
>   }
> - vserver->path = path;
> +
> + vserver->path = strdup(path);
> 
>   fdset_add(&g_vhost_server.fdset, vserver->listenfd,
> - vserver_new_vq_conn, NULL,
> - vserver);
> + vserver_new_vq_conn, NULL, vserver);
> 
> - g_vhost_server.server[vserver_idx++] = vserver;
> + g_vhost_server.server[g_vhost_server.vserver_cnt++] = vserver;
> + pthread_mutex_unlock(&g_vhost_server.server_mutex);
> 
>   return 0;
>  }
> 
> 
> +/**
> + * Unregister the specified vhost server  */ int
> +rte_vhost_driver_unregister(const char *path) {
> + int i;
> + int 

[dpdk-dev] [PATCH v4 1/4] vhost: call fdset_del_slot to remove connection fd

2015-07-01 Thread Ouyang, Changchun


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
> Sent: Tuesday, June 30, 2015 5:21 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v4 1/4] vhost: call fdset_del_slot to remove
> connection fd
> 
> In the event handler of a connection fd, the connection fd could possibly be
> closed. The event dispatch loop would then try to remove the fd from the
> fdset. Between these two actions, another thread might register a new listenfd
> reusing the value of the just-closed fd, so we couldn't call fdset_del, which
> would wrongly clean up the new listenfd. A new function fdset_del_slot is
> provided to clean up the fd at the specified location.
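
The hazard and the fix described above can be modeled with a tiny stand-alone sketch (simplified structures, not the real fd_man.c types): removing by slot index stays correct even after the fd number has been reused by a new registration.

```c
#include <assert.h>

#define MAX_FDS 8

struct fdentry { int fd; };
struct fdset { struct fdentry fd[MAX_FDS]; int num; };

/* Removing by slot index is unambiguous even if the fd value was
 * already reused by another registration in a different slot. */
static void fdset_del_slot(struct fdset *s, int index)
{
    if (index < 0 || index >= MAX_FDS || s->fd[index].fd == -1)
        return;
    s->fd[index].fd = -1;
    s->num--;
}
```

Removing by fd value would have to guess which of two identical fd numbers to clear; removing by index clears exactly the stale slot the dispatch loop was iterating over.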
> 
> v4 changes:
> - call fdset_del_slot to remove connection fd
> 
> Signed-off-by: Huawei Xie 

Acked-by: Changchun Ouyang 

> ---
>  lib/librte_vhost/vhost_user/fd_man.c | 27
> ++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/vhost_user/fd_man.c
> b/lib/librte_vhost/vhost_user/fd_man.c
> index 831c9c1..bd30f8d 100644
> --- a/lib/librte_vhost/vhost_user/fd_man.c
> +++ b/lib/librte_vhost/vhost_user/fd_man.c
> @@ -188,6 +188,24 @@ fdset_del(struct fdset *pfdset, int fd)  }
> 
>  /**
> + *  Unregister the fd at the specified slot from the fdset.
> + */
> +static void
> +fdset_del_slot(struct fdset *pfdset, int index) {
> + if (pfdset == NULL || index < 0 || index >= MAX_FDS)
> + return;
> +
> + pthread_mutex_lock(&pfdset->fd_mutex);
> +
> + pfdset->fd[index].fd = -1;
> + pfdset->fd[index].rcb = pfdset->fd[index].wcb = NULL;
> + pfdset->num--;
> +
> + pthread_mutex_unlock(&pfdset->fd_mutex);
> +}
> +
> +/**
>   * This functions runs in infinite blocking loop until there is no fd in
>   * pfdset. It calls corresponding r/w handler if there is event on the fd.
>   *
> @@ -248,8 +266,15 @@ fdset_event_dispatch(struct fdset *pfdset)
>* We don't allow fdset_del to be called in callback
>* directly.
>*/
> + /*
> +  * When we are about to clean up the fd from the fdset,
> +  * because the fd was closed in the cb, the old fd value
> +  * could already be reused by a new listen fd created in
> +  * another thread, so we couldn't call fdset_del.
> +  */
>   if (remove1 || remove2)
> - fdset_del(pfdset, fd);
> + fdset_del_slot(pfdset, i);
>   }
>   }
>  }
> --
> 1.8.1.4



[dpdk-dev] [PATCH] cxgbe: fix build with clang

2015-07-01 Thread Thomas Monjalon
GCC_VERSION is empty when building with clang, which breaks the version test:
/bin/sh: line 0: test: -ge: unary operator expected

The variable cannot simply be quoted, because test expects an integer.
So the fix is to check for the empty value with a separate test.
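
The failure and the guard are easy to reproduce in a plain shell (a sketch; the real Makefile uses the $(CC) and $(GCC_VERSION) make variables):

```shell
GCC_VERSION=""   # what a clang build ends up with
CC=clang

# Unguarded numeric test: expands to 'test -ge 44', which errors out.
if test $GCC_VERSION -ge 44 2>/dev/null; then old=pass; else old=fail; fi

# Guarded: only run the numeric comparison when the compiler is gcc.
if [ "$CC" = gcc ] && [ "$GCC_VERSION" -ge 44 ]; then
    new=add-flag
else
    new=skip-flag
fi

echo "$old $new"   # prints: fail skip-flag
```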

Signed-off-by: Thomas Monjalon 
---
 drivers/net/cxgbe/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

This patch is already applied.

diff --git a/drivers/net/cxgbe/Makefile b/drivers/net/cxgbe/Makefile
index 4dfc6b0..ae12d75 100644
--- a/drivers/net/cxgbe/Makefile
+++ b/drivers/net/cxgbe/Makefile
@@ -52,9 +52,9 @@ ifeq ($(CC), icc)
 CFLAGS_BASE_DRIVER = -wd174 -wd593 -wd869 -wd981 -wd2259
 else
 #
-# CFLAGS for gcc
+# CFLAGS for gcc/clang
 #
-ifeq ($(shell test $(GCC_VERSION) -ge 44 && echo 1), 1)
+ifeq ($(shell test $(CC) = gcc && test $(GCC_VERSION) -ge 44 && echo 1), 1)
 CFLAGS += -Wno-deprecated
 endif
 CFLAGS_BASE_DRIVER = -Wno-unused-parameter -Wno-unused-value
-- 
2.4.2



[dpdk-dev] rte_lpm4 with expanded next hop support now available

2015-07-01 Thread Matthew Hall
Hello,

Based on the wonderful assistance from Vladimir, Stephen, and a close friend 
of mine who is a hypervisor developer and helped me reverse engineer and 
rewrite rte_lpm_lookupx4, I have a known-working version of rte_lpm4 with 
expanded 24-bit next hop support available here:

https://github.com/megahall/dpdk_mhall/tree/megahall/lpm-expansion

I'm going to work on rte_lpm6 next. Its self-test seems to take a whole ton of 
memory; it ran out when I tried it, so it would help if anybody knows how much 
it needs.

Sadly this change is not ABI compatible or performance compatible with the 
original rte_lpm, because I had to change the bitwise layout to fit more data 
in, and it will run maybe 50% slower because it has to access more memory.

Despite all this, I'd really like to do the right thing and find a way to 
contribute it back, perhaps as a second kind of rte_lpm, so I wouldn't be the 
only person using it and forking the code; I've already met several others who 
needed it. I could use some ideas on how to handle the situation.

Matthew.


[dpdk-dev] [PATCH v2] vfio: Fix overflow while assigning vfio BAR region offset and size

2015-07-01 Thread Thomas Monjalon
Hi Anatoly,
Please could you review this fix to allow Chelsio using VFIO?
Thanks

2015-06-23 20:30, Rahul Lakkireddy:
> When using vfio, the probe fails over Chelsio T5 adapters after
> commit-id 90a1633b2 (eal/linux: allow to map BARs with MSI-X tables).
> 
> While debugging further, found that the BAR region offset and size read from
> vfio are u64, but are assigned to uint32_t variables.  This results in the u64
> value getting truncated to 0 and passing wrong offset and size to mmap for
> subsequent BAR regions (i.e. trying to overwrite previously allocated BAR 0
> region).
> 
> The fix is to use these region offset and size directly rather than assigning
> to uint32_t variables.
> 
> Fixes: 90a1633b2347 ("eal/linux: allow to map BARs with MSI-X tables")
> Signed-off-by: Rahul Lakkireddy 
> Signed-off-by: Kumar Sanghvi 



[dpdk-dev] [PATCH v4 0/9] Chelsio Terminator 5 (T5) 10G/40G Poll Mode Driver

2015-07-01 Thread Thomas Monjalon
2015-06-30 04:58, Rahul Lakkireddy:
> This series of patches add the CXGBE Poll Mode Driver support for Chelsio
> Terminator 5 series of 10G/40G adapters.  The CXGBE PMD is split into multiple
> patches.  The first patch adds the hardware specific api for all supported
> Chelsio T5 adapters and the patches from 2 to 8 add the actual DPDK CXGBE PMD.
> 
> Also, the CXGBE PMD is enabled for compilation and linking by patch 2.
> MAINTAINERS file is also updated by patch 2 to claim responsibility for the
> CXGBE PMD.
> 
> More information on the CXGBE PMD can be found in the documentation added by
> patch 9.  
> 
> v4:
> - Fix 32-bit and clang compilation.
> - Moved cxgbe doc entry in MAINTAINERS from patch 2 to patch 9 for 
> consistency.
> 
> v3:
> - Merge patches 10 and 11 with patch 2.
> - Add rte_pmd_cxgbe_version.map and add EXPORT_MAP and LIBABIVER to cxgbe
>   Makefile.
> - Use RTE_DIM macro for calculating ARRAY_SIZE.
> 
> v2:
> - Move the driver to drivers/net directory and update all config files and
>   commit logs.  Also update MAINTAINERS.
> - Break the second patch into more patches; incrementally, adding features to
>   the cxgbe poll mode driver.
> - Replace bitwise operations in finding last (most significant) bit set with
>   gcc's __builtin_clz.
> - Fix the return value returned by link update eth_dev operation.
> - Few bug fixes and code cleanup.
> 
> Rahul Lakkireddy (9):
>   cxgbe: add hardware specific api for all supported Chelsio T5 series
> adapters.
>   cxgbe: add cxgbe poll mode driver.
>   cxgbe: add device configuration and RX support for cxgbe PMD.
>   cxgbe: add TX support for cxgbe PMD.
>   cxgbe: add device related operations for cxgbe PMD.
>   cxgbe: add port statistics for cxgbe PMD.
>   cxgbe: add link related functions for cxgbe PMD.
>   cxgbe: add flow control functions for cxgbe PMD.
>   doc: add cxgbe PMD documentation under doc/guides/nics/cxgbe.rst

Applied, thanks for the good work and welcome :)