date:20160530

[dpdk-dev] [PATCH v1] change log level for check when add port in blacklist

2016-05-30 Thread Thomas Monjalon

2016-05-31 10:40, xu, huilong:
> maybe we should change log level, when add port in blacklist,for check it 
> easy.
> and it not influence performance and function

Please, could you show an example of the device init logs at INFO level
before and after this patch?
Thanks

[dpdk-dev] rte_pmd_init_all is missing while upgrading from DPDK 1.6 to DPDK 1.7

2016-05-30 Thread shiva m

Hi,

Thank you very much for your replies. I removed this API, I also see
rte_eal_pci_set_blacklist() is no more available with DPDK 1.7.1,  I am
using rte_eal_devargs_add(). Any document mentioning changes from DPDK-1.6
to DPDK-1.7?

Thanks,
Shiva

On Tue, May 24, 2016 at 11:32 AM, Wiles, Keith 
wrote:

> The PMD?s are now being inited in the EAL init call for the code and that
> API is not required anymore. If you are adding a new driver, make sure it
> is up to date with any changes in the driver code too.
>
> Regards,
> Keith
>
>
>
> >Hi,
> >
> >I have rte_pmd_init_all used in my code. While upgrading from DPDK 1.6 to
> >DPDK 1.7, I see this function is missing in DPDK 1.7. I
> >removed rte_pmd_init_all from my code, but I am not sure whether
> >initialization is done for all poll mode drivers
> >
> >What is the right way to do this?. Can anyone please help in this.
> >
> >Thanks,
> >Shiva
> >
>
>
>
>

[dpdk-dev] [PATCH] vhost: fix segfault on bad descriptor address.

2016-05-30 Thread Yuanhan Liu

Hi Ilya,

Generically speaking, this patch looks good to me. But I guess still
need more time to check this issue later; I still failed to reproduce
it on my side after all. So, please allow a late merge.

Thanks.

--yliu

On Mon, May 30, 2016 at 02:05:07PM +0300, Ilya Maximets wrote:
> Ping.
> 
> Best regards, Ilya Maximets.
> 
> On 23.05.2016 14:04, Ilya Maximets wrote:
> > On 23.05.2016 13:57, Yuanhan Liu wrote:
> >> On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
> >>> In current implementation guest application can reinitialize vrings
> >>> by executing start after stop. In the same time host application
> >>> can still poll virtqueue while device stopped in guest and it will
> >>> crash with segmentation fault while vring reinitialization because
> >>> of dereferencing of bad descriptor addresses.
> >>>
> >>> OVS crash for example:
> >>> <>
> >>> [test-pmd inside guest VM]
> >>>
> >>>   testpmd> port stop all
> >>>   Stopping ports...
> >>>   Checking link statuses...
> >>>   Port 0 Link Up - speed 1 Mbps - full-duplex
> >>>   Done
> >>>   testpmd> port config all rxq 2
> >>>   testpmd> port config all txq 2
> >>>   testpmd> port start all
> >>>   Configuring Port 0 (socket 0)
> >>>   Port 0: 52:54:00:CB:44:C8
> >>>   Checking link statuses...
> >>>   Port 0 Link Up - speed 1 Mbps - full-duplex
> >>>   Done
> >>
> >> I actually didn't manage to reproduce it on my side, with the
> >> vhost-example instead of OVS though. Is that all the commands
> >> to reproduce it, and run them just after start test-pmd?
> > 
> > Actually, I think, packet flow should be enabled while performing
> > above actions and some traffic already should be sent through port
> > to change last used idx on vhost side.
> > 
> > Something like:
> > start
> > ..wait a while.. see that packets are flowing.
> > stop
> > port stop
> > port config
> > port config
> > port start
> >>
> >>> [OVS on host]
> >>>   Program received signal SIGSEGV, Segmentation fault.
> >>>   rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h
> >>>
> >>>   (gdb) bt
> >>>   #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
> >>>   #1  copy_desc_to_mbuf
> >>>   #2  rte_vhost_dequeue_burst
> >>>   #3  netdev_dpdk_vhost_rxq_recv
> >>>   ...
> >>>
> >>>   (gdb) bt full
> >>>   #0  rte_memcpy
> >>>   ...
> >>>   #1  copy_desc_to_mbuf
> >>>   desc_addr = 0
> >>>   mbuf_offset = 0
> >>>   desc_offset = 12
> >>>   ...
> >>> <>
> >>>
> >>> Fix that by checking addresses of descriptors before using them.
> >>>
> >>> Note: For mergeable buffers this patch checks only guest's address for
> >>> zero, but in non-meargeable case host's address checked. This is done
> >>> because checking of host's address in mergeable case requires additional
> >>> refactoring to keep virtqueue in consistent state in case of error.
> >>>
> >>> Signed-off-by: Ilya Maximets 
> >>> ---
> >>>
> >>> Actually, current virtio implementation looks broken for me. Because
> >>> 'virtio_dev_start' breaks virtqueue while it still available from the 
> >>> vhost
> >>> side.
> >>>
> >>> There was 2 patches about this behaviour:
> >>>
> >>>   1. a85786dc816f ("virtio: fix states handling during initialization")
> >>>   2. 9a0615af7746 ("virtio: fix restart")
> >>>
> >>> The second patch fixes somehow issue intoduced in the first patch, but 
> >>> actually
> >>> also breaks vhost in the way described above.
> >>> It's not pretty clear for me what to do in current situation with virtio,
> >>> because it will be broken for guest application even if vhost will not 
> >>> crash.
> >>>
> >>> May be it'll be better to forbid stopping of virtio device and force user 
> >>> to
> >>> exit and start again (may be implemented in hidden from user way)?
> >>>
> >>> This patch adds additional sane checks, so it should be applied anyway, 
> >>> IMHO.
> >>
> >> Agreed.
> >>
> >>--yliu
> >>
> >>

[dpdk-dev] [PATCH] app/testpmd: log mbuf pool creation

2016-05-30 Thread Thomas Monjalon

2016-05-30 14:04, Olivier Matz:
> Enhance the logs related to mbuf pool creation. Display an info level
> log when creating the mbuf, and display the error as a string on failure.
> 
> After the patch, we have:
> 
>   [...]
>   EAL:   probe driver: 8086:10fb rte_ixgbe_pmd
>   USER1: create a new mbuf pool : n=331456, \
>   size=2176, socket=0
>   EAL: Error - exiting with code: 1
> Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate \
>   memory

Yes printing the error reason is a good improvement.
But why the previous line is at INFO level? It looks to be a debug.

[dpdk-dev] [PATCH] kni: fix use of undefined comma variable in makefile

2016-05-30 Thread Olivier Matz

Hi Ferruh,

On 05/30/2016 05:49 PM, Ferruh Yigit wrote:
> On 5/30/2016 12:56 PM, Olivier Matz wrote:
>> The $(comma) variable is not defined in this Makefile, nor in
>> any included Makefile. Seen while doing a "make clean" on ubuntu:
>>
>>   $ make clean
>>   == Clean lib
>>   == Clean lib/librte_compat
>>   == Clean lib/librte_eal
>>   == Clean lib/librte_eal/common
>>   == Clean lib/librte_eal/linuxapp
>>   == Clean lib/librte_eal/linuxapp/eal
>>   == Clean lib/librte_eal/linuxapp/igb_uio
>>   == Clean lib/librte_eal/linuxapp/kni
>>   tr: missing operand after ?.-?
>>   Two strings must be given when translating.
>>   Try 'tr --help' for more information.
> 
> I don't observe this error on Ubuntu.
> 
> Also did a quick check and $(comma) seems defined, but not sure exactly
> where. What I test:
> 
> $ lsb_release -si
> Ubuntu
> 
> 
> $ git diff
> diff --git a/lib/librte_eal/linuxapp/kni/Makefile
> b/lib/librte_eal/linuxapp/kni/Makefile
> index ac99d3f..dcad241 100644
> --- a/lib/librte_eal/linuxapp/kni/Makefile
> +++ b/lib/librte_eal/linuxapp/kni/Makefile
> @@ -90,4 +90,7 @@ SRCS-y += kni_net.c
>  SRCS-y += kni_ethtool.c
>  SRCS-$(CONFIG_RTE_KNI_VHOST) += kni_vhost.c
> 
> +test_comma:
> +   @echo "= comma: $(comma) ="
> +
> 
> 
> 
> $ make -C lib/librte_eal/linuxapp/kni test_comma
> make: Entering directory '.../dpdk/lib/librte_eal/linuxapp/kni'
> = comma: , =
> make: Leaving directory '.../dpdk/lib/librte_eal/linuxapp/kni'

Thanks for testing and review.
I added the following lines to the makefile (just after the include
rte.vars.mk):

$(info --)
$(info comma=$(comma))
$(info $(origin comma))

And it gives me:

$ make clean
== Clean lib/librte_eal/linuxapp/kni

comma=,
file
make -C /lib/modules/3.13.0-79-generic/build
M=/home/user/dpdk.org/build/build/lib/librte_eal/linuxapp/kni
O=/lib/modules/3.13.0-79-generic/build clean
make -C /usr/src/linux-headers-3.13.0-79-generic \
KBUILD_SRC=/usr/src/linux-headers-3.13.0-79-generic \

KBUILD_EXTMOD="/home/user/dpdk.org/build/build/lib/librte_eal/linuxapp/kni"
-f /usr/src/linux-headers-3.13.0-79-generic/Makefile \
clean
make -f /usr/src/linux-headers-3.13.0-79-generic/scripts/Makefile.clean
obj=/home/user/dpdk.org/build/build/lib/librte_eal/linuxapp/kni

comma=
undefined
tr: missing operand after ?.-?
Two strings must be given when translating.
Try 'tr --help' for more information.
[...]

Actually this Makefile is used twice, and second time the $(comma)
variable is not defined.

It seems the $(comma) variable is defined in rte.cpuflags.mk when
including rte.vars.mk:

  mk/rte.vars.mk
mk/target/generic/rte.vars.mk
  mk/rte.cpuflags.mk(only if KERNELRELEASE is unset, which is
 not the case for the second call)


Do you see the issue when you do a "make clean"?

Thanks,
Olivier

[dpdk-dev] [PATCH v3] ip_pipeline: add rss support

2016-05-30 Thread Jasvinder Singh

This patch enables rss (receive side scaling) per network interface
through the configuration file. The user can specify following
parameters in LINK section for enabling the rss feature - rss_qs,
rss_proto_ipv4, rss_proto_ipv6 and ip_proto_l2.

The "rss_qs" is mandatory parameter which indicates the queues to be
used for rss, while rest of the parameters are optional. When optional
parameters are not provided in the configuration file, default setting
(ETH_RSS_IPV4 | ETH_RSS_IPV6) is assumed for "rss_hf" field of the
rss_conf structure.

For example, following configuration can be applied for using the rss
on port 0 of the network interface;

[PIPELINE0]
type = MASTER
core = 0

[LINK0]
rss_qs = 0 1

[PIPELINE1]
type = PASS-THROUGH
core = 1
pktq_in = RXQ0.0 RXQ0.1 RXQ1.0
pktq_out = TXQ0.0 TXQ1.0 TXQ0.1

Signed-off-by: Jasvinder Singh 
Acked-by: Cristian Dumitrescu 
---
v3
- rebase on top of ip_pipeline configuration file clean up patch
  (http://dpdk.org/dev/patchwork/patch/13106/)
v2
- add check on the number of rss queues entries

 examples/ip_pipeline/app.h  |  27 ++--
 examples/ip_pipeline/config_check.c |  32 +++-
 examples/ip_pipeline/config_parse.c | 299 +++-
 examples/ip_pipeline/init.c |  70 -
 4 files changed, 408 insertions(+), 20 deletions(-)

diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h
index 05d608b..976fbd0 100644
--- a/examples/ip_pipeline/app.h
+++ b/examples/ip_pipeline/app.h
@@ -51,6 +51,14 @@
 #define APP_PARAM_NAME_SIZE  PIPELINE_NAME_SIZE
 #define APP_LINK_PCI_BDF_SIZE16

+#ifndef APP_LINK_MAX_HWQ_IN
+#define APP_LINK_MAX_HWQ_IN  64
+#endif
+
+#ifndef APP_LINK_MAX_HWQ_OUT
+#define APP_LINK_MAX_HWQ_OUT 64
+#endif
+
 struct app_mempool_params {
char *name;
uint32_t parsed;
@@ -70,6 +78,12 @@ struct app_link_params {
uint32_t tcp_local_q; /* 0 = Disabled (pkts go to default queue 0) */
uint32_t udp_local_q; /* 0 = Disabled (pkts go to default queue 0) */
uint32_t sctp_local_q; /* 0 = Disabled (pkts go to default queue 0) */
+   uint32_t rss_qs[APP_LINK_MAX_HWQ_IN];
+   uint32_t n_rss_qs;
+   uint64_t rss_proto_ipv4;
+   uint64_t rss_proto_ipv6;
+   uint64_t rss_proto_l2;
+   uint32_t promisc;
uint32_t state; /* DOWN = 0, UP = 1 */
uint32_t ip; /* 0 = Invalid */
uint32_t depth; /* Valid only when IP is valid */
@@ -77,7 +91,6 @@ struct app_link_params {
char pci_bdf[APP_LINK_PCI_BDF_SIZE];

struct rte_eth_conf conf;
-   uint8_t promisc;
 };

 struct app_pktq_hwq_in_params {
@@ -383,17 +396,9 @@ struct app_eal_params {
 #define APP_MAX_MEMPOOLS 8
 #endif

-#ifndef APP_LINK_MAX_HWQ_IN
-#define APP_LINK_MAX_HWQ_IN  64
-#endif
-
-#ifndef APP_LINK_MAX_HWQ_OUT
-#define APP_LINK_MAX_HWQ_OUT 64
-#endif
-
-#define APP_MAX_HWQ_IN (APP_MAX_LINKS * 
APP_LINK_MAX_HWQ_IN)
+#define APP_MAX_HWQ_IN  (APP_MAX_LINKS * APP_LINK_MAX_HWQ_IN)

-#define APP_MAX_HWQ_OUT   (APP_MAX_LINKS * 
APP_LINK_MAX_HWQ_OUT)
+#define APP_MAX_HWQ_OUT (APP_MAX_LINKS * APP_LINK_MAX_HWQ_OUT)

 #ifndef APP_MAX_PKTQ_SWQ
 #define APP_MAX_PKTQ_SWQ 256
diff --git a/examples/ip_pipeline/config_check.c 
b/examples/ip_pipeline/config_check.c
index fd9ff49..18f57be 100644
--- a/examples/ip_pipeline/config_check.c
+++ b/examples/ip_pipeline/config_check.c
@@ -56,6 +56,26 @@ check_mempools(struct app_params *app)
}
 }

+static inline uint32_t
+link_rxq_used(struct app_link_params *link, uint32_t q_id)
+{
+   uint32_t i;
+
+   if ((link->arp_q == q_id) ||
+   (link->tcp_syn_q == q_id) ||
+   (link->ip_local_q == q_id) ||
+   (link->tcp_local_q == q_id) ||
+   (link->udp_local_q == q_id) ||
+   (link->sctp_local_q == q_id))
+   return 1;
+
+   for (i = 0; i < link->n_rss_qs; i++)
+   if (link->rss_qs[i] == q_id)
+   return 1;
+
+   return 0;
+}
+
 static void
 check_links(struct app_params *app)
 {
@@ -90,14 +110,12 @@ check_links(struct app_params *app)
rxq_max = link->udp_local_q;
if (link->sctp_local_q > rxq_max)
rxq_max = link->sctp_local_q;
+   for (i = 0; i < link->n_rss_qs; i++)
+   if (link->rss_qs[i] > rxq_max)
+   rxq_max = link->rss_qs[i];

for (i = 1; i <= rxq_max; i++)
-   APP_CHECK(((link->arp_q == i) ||
-   (link->tcp_syn_q == i) ||
-   (link->ip_local_q == i) ||
-   (link->tcp_local_q == i) ||
-   (link->udp_local_q ==

[dpdk-dev] about rx checksum flags

2016-05-30 Thread Adrien Mazarguil

On Mon, May 30, 2016 at 05:26:21PM +0200, Olivier Matz wrote:
> Hi,
> 
> I'm planning to add the support for offloads in virtio-net pmd.
> It appears that the current rx flags in mbuf are not sufficient to
> describe the state of a packet received from a virtual driver.
> I think we need a way to say "the checksum in the packet data is
> not calculated, but the integrity of the data is verified".
> 
> Currently, we have one flag for L4 (same for IP):
> 
>   PKT_RX_L4_CKSUM_BAD: L4 cksum of RX pkt. is not OK.
> 
> This has also another problem that has already been discussed [1]:
> if no flag is set, it is expected that the checksum is verified by
> hw, but there is no way to say "the hw does not know if the cksum
> is correct".
> 
> I would like to extend this flag to a 4-state value (2 bits):
> 
>  PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
>   -> the application should verify the checksum by sw
> 
>  PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
>   -> the application can drop the packet without additional check
> 
>  PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
>   -> the application can accept the packet without verifying the
>  checksum by sw
> 
>  PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
>  data, but the integrity of the L4 header is verified.
>   -> the application can process the packet but must not verify the
>  checksum by sw. It has to take care to recalculate the cksum
>  if the packet is transmitted (either by sw or using tx offload)

This API makes and lot of sense for both mlx4 and mlx5 PMDs as well, for
which hardware only returns information about "good" headers ("unknown"
should be assumed otherwise). It does not map well with the current API that
only provides info about bad checksums.

For instance, non-IP packets do not have valid L3/L4 checksums and thus are
not returned as "good" by hardware. These PMDs perform the "not good means
bad" conversion for DPDK which is wrong in such cases. Same for unknown
packet types.

> To keep the compatibility with application, the old flag is kept at the
> same value, and a new flag is added. It is assumed that the behavior
> of applications was:
> 
>   PKT_RX_L4_CKSUM_BAD = 0 -> packet is accepted
>   PKT_RX_L4_CKSUM_BAD = 1 -> packet is dropped
> 
> The new checksum states for L4 (same for IP) would be:
> 
>   old flag   new flag   meaning
>   0  0  PKT_RX_L4_CKSUM_UNKNOWN
>   1  0  PKT_RX_L4_CKSUM_BAD
>   0  1  PKT_RX_L4_CKSUM_GOOD
>   1  1  PKT_RX_L4_CKSUM_NONE
> 
> With this, an old application that only checks the old flag, and
> running using a dpdk having this modification would accept GOOD and
> UNKNOWN packets (like today), drop BAD packets (like today) and
> drop NONE packets (this is a new feature that has to be explicitly
> enabled by the application).
> 
> 
> Any comment?

Considering mlx4 and mlx5 can only return "good" or "unknown" checksums, I'm
all for updating the API as suggested.

> Olivier
> 
> 
> [1] http://dpdk.org/ml/archives/dev/2015-January/011550.html

-- 
Adrien Mazarguil
6WIND

[dpdk-dev] [PATCH] kni: fix use of undefined comma variable in makefile

2016-05-30 Thread Ferruh Yigit

On 5/30/2016 5:21 PM, Olivier Matz wrote:
> Hi Ferruh,
> 
> On 05/30/2016 05:49 PM, Ferruh Yigit wrote:
>> On 5/30/2016 12:56 PM, Olivier Matz wrote:
>>> The $(comma) variable is not defined in this Makefile, nor in
>>> any included Makefile. Seen while doing a "make clean" on ubuntu:
>>>
>>>   $ make clean
>>>   == Clean lib
>>>   == Clean lib/librte_compat
>>>   == Clean lib/librte_eal
>>>   == Clean lib/librte_eal/common
>>>   == Clean lib/librte_eal/linuxapp
>>>   == Clean lib/librte_eal/linuxapp/eal
>>>   == Clean lib/librte_eal/linuxapp/igb_uio
>>>   == Clean lib/librte_eal/linuxapp/kni
>>>   tr: missing operand after ?.-?
>>>   Two strings must be given when translating.
>>>   Try 'tr --help' for more information.
>>
>> I don't observe this error on Ubuntu.
>>
>> Also did a quick check and $(comma) seems defined, but not sure exactly
>> where. What I test:
>>
>> $ lsb_release -si
>> Ubuntu
>>
>>
>> $ git diff
>> diff --git a/lib/librte_eal/linuxapp/kni/Makefile
>> b/lib/librte_eal/linuxapp/kni/Makefile
>> index ac99d3f..dcad241 100644
>> --- a/lib/librte_eal/linuxapp/kni/Makefile
>> +++ b/lib/librte_eal/linuxapp/kni/Makefile
>> @@ -90,4 +90,7 @@ SRCS-y += kni_net.c
>>  SRCS-y += kni_ethtool.c
>>  SRCS-$(CONFIG_RTE_KNI_VHOST) += kni_vhost.c
>>
>> +test_comma:
>> +   @echo "= comma: $(comma) ="
>> +
>>
>>
>>
>> $ make -C lib/librte_eal/linuxapp/kni test_comma
>> make: Entering directory '.../dpdk/lib/librte_eal/linuxapp/kni'
>> = comma: , =
>> make: Leaving directory '.../dpdk/lib/librte_eal/linuxapp/kni'
> 
> Thanks for testing and review.
> I added the following lines to the makefile (just after the include
> rte.vars.mk):
> 
>   $(info --)
>   $(info comma=$(comma))
>   $(info $(origin comma))
> 
> And it gives me:
> 
>   $ make clean
>   == Clean lib/librte_eal/linuxapp/kni
>   
>   comma=,
>   file
>   make -C /lib/modules/3.13.0-79-generic/build
> M=/home/user/dpdk.org/build/build/lib/librte_eal/linuxapp/kni
> O=/lib/modules/3.13.0-79-generic/build clean
>   make -C /usr/src/linux-headers-3.13.0-79-generic \
> KBUILD_SRC=/usr/src/linux-headers-3.13.0-79-generic \
> 
> KBUILD_EXTMOD="/home/user/dpdk.org/build/build/lib/librte_eal/linuxapp/kni"
> -f /usr/src/linux-headers-3.13.0-79-generic/Makefile \
>   clean
>   make -f /usr/src/linux-headers-3.13.0-79-generic/scripts/Makefile.clean
> obj=/home/user/dpdk.org/build/build/lib/librte_eal/linuxapp/kni
>   
>   comma=
>   undefined
>   tr: missing operand after ?.-?
>   Two strings must be given when translating.
>   Try 'tr --help' for more information.
>   [...]
> 
> Actually this Makefile is used twice, and second time the $(comma)
> variable is not defined.
> 
> It seems the $(comma) variable is defined in rte.cpuflags.mk when
> including rte.vars.mk:
> 
>   mk/rte.vars.mk
> mk/target/generic/rte.vars.mk
>   mk/rte.cpuflags.mk(only if KERNELRELEASE is unset, which is
>  not the case for the second call)
> 
> 
> Do you see the issue when you do a "make clean"?
> 

No issue on "make clean".

I did same modification as you did, for me comma defined for both times,
a system variable can be triggering the behavior perhaps.

What I got with "make clean" is:

...
== Clean lib/librte_eal/linuxapp/kni

comma=,
file
make -C /lib/modules/4.4.0-22-generic/build
M=/home/ferruhy/development/dpdk/build/build/lib/librte_eal/linuxapp/kni
O=/lib/modules/4.4.0-22-generic/build clean
make -C /usr/src/linux-headers-4.4.0-22-generic
KBUILD_SRC=/usr/src/linux-headers-4.4.0-22-generic \
-f /usr/src/linux-headers-4.4.0-22-generic/Makefile clean
make -f /usr/src/linux-headers-4.4.0-22-generic/scripts/Makefile.clean
obj=/home/ferruhy/development/dpdk/build/build/lib/librte_eal/linuxapp/kni

comma=,
file
  rm -rf
/home/ferruhy/development/dpdk/build/build/lib/librte_eal/linuxapp/kni/.tmp_versions
  rm -f
/home/ferruhy/development/dpdk/build/build/lib/librte_eal/linuxapp/kni/Module.symvers
== Clean lib/librte_eal/linuxapp/xen_dom0
...

[dpdk-dev] [PATCH v2] i40evf: fix return value if command fails

2016-05-30 Thread Bruce Richardson

On Tue, May 10, 2016 at 10:51:59AM +0800, Jingjing Wu wrote:
> Previously, if message is sent successfully, but no response is
> received, function "i40evf_execute_vf_cmd" will return without error.
> The root cause is value "err" is overwritten. This patch fixes it.
> 
> Fixes: ae19955e7c86 ("i40evf: support reporting PF reset")
> Signed-off-by: Jingjing Wu 
> ---
Applied to dpdk-next-net/rel_16_07

/Bruce

[dpdk-dev] about rx checksum flags

2016-05-30 Thread Olivier Matz

Hi,

I'm planning to add the support for offloads in virtio-net pmd.
It appears that the current rx flags in mbuf are not sufficient to
describe the state of a packet received from a virtual driver.
I think we need a way to say "the checksum in the packet data is
not calculated, but the integrity of the data is verified".

Currently, we have one flag for L4 (same for IP):

  PKT_RX_L4_CKSUM_BAD: L4 cksum of RX pkt. is not OK.

This has also another problem that has already been discussed [1]:
if no flag is set, it is expected that the checksum is verified by
hw, but there is no way to say "the hw does not know if the cksum
is correct".

I would like to extend this flag to a 4-state value (2 bits):

 PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
  -> the application should verify the checksum by sw

 PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
  -> the application can drop the packet without additional check

 PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
  -> the application can accept the packet without verifying the
 checksum by sw

 PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
 data, but the integrity of the L4 header is verified.
  -> the application can process the packet but must not verify the
 checksum by sw. It has to take care to recalculate the cksum
 if the packet is transmitted (either by sw or using tx offload)

To keep the compatibility with application, the old flag is kept at the
same value, and a new flag is added. It is assumed that the behavior
of applications was:

  PKT_RX_L4_CKSUM_BAD = 0 -> packet is accepted
  PKT_RX_L4_CKSUM_BAD = 1 -> packet is dropped

The new checksum states for L4 (same for IP) would be:

  old flag   new flag   meaning
  0  0  PKT_RX_L4_CKSUM_UNKNOWN
  1  0  PKT_RX_L4_CKSUM_BAD
  0  1  PKT_RX_L4_CKSUM_GOOD
  1  1  PKT_RX_L4_CKSUM_NONE

With this, an old application that only checks the old flag, and
running using a dpdk having this modification would accept GOOD and
UNKNOWN packets (like today), drop BAD packets (like today) and
drop NONE packets (this is a new feature that has to be explicitly
enabled by the application).


Any comment?

Olivier


[1] http://dpdk.org/ml/archives/dev/2015-January/011550.html

[dpdk-dev] [PATCH v3] virtio: split virtio rx/tx queue

2016-05-30 Thread Huawei Xie

We keep a common vq structure, containing only vq related fields,
and then split others into RX, TX and control queue respectively.

Signed-off-by: Huawei Xie 
---
v2:
- don't split virtio_dev_rx/tx_queue_setup
v3:
- fix some 80 char warnings
- fix other newer version checkpatch warnings
- remove '\n' in PMD_RX_LOG
- remove hdr zone allocation for RX queue

 drivers/net/virtio/virtio_ethdev.c  | 352 ++--
 drivers/net/virtio/virtio_ethdev.h  |   2 +-
 drivers/net/virtio/virtio_pci.c |   4 +-
 drivers/net/virtio/virtio_pci.h |   3 +-
 drivers/net/virtio/virtio_rxtx.c| 294 ++
 drivers/net/virtio/virtio_rxtx.h|  56 -
 drivers/net/virtio/virtio_rxtx_simple.c |  83 
 drivers/net/virtio/virtqueue.h  |  70 +++
 8 files changed, 491 insertions(+), 373 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index c3fb628..256888a 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -114,40 +114,61 @@ struct rte_virtio_xstats_name_off {
 };

 /* [rt]x_qX_ is prepended to the name string here */
-static const struct rte_virtio_xstats_name_off rte_virtio_q_stat_strings[] = {
-   {"good_packets",   offsetof(struct virtqueue, packets)},
-   {"good_bytes", offsetof(struct virtqueue, bytes)},
-   {"errors", offsetof(struct virtqueue, errors)},
-   {"multicast_packets",  offsetof(struct virtqueue, multicast)},
-   {"broadcast_packets",  offsetof(struct virtqueue, broadcast)},
-   {"undersize_packets",  offsetof(struct virtqueue, size_bins[0])},
-   {"size_64_packets",offsetof(struct virtqueue, size_bins[1])},
-   {"size_65_127_packets",offsetof(struct virtqueue, size_bins[2])},
-   {"size_128_255_packets",   offsetof(struct virtqueue, size_bins[3])},
-   {"size_256_511_packets",   offsetof(struct virtqueue, size_bins[4])},
-   {"size_512_1023_packets",  offsetof(struct virtqueue, size_bins[5])},
-   {"size_1024_1517_packets", offsetof(struct virtqueue, size_bins[6])},
-   {"size_1518_max_packets",  offsetof(struct virtqueue, size_bins[7])},
+static const struct rte_virtio_xstats_name_off rte_virtio_rxq_stat_strings[] = 
{
+   {"good_packets",   offsetof(struct virtnet_rx, stats.packets)},
+   {"good_bytes", offsetof(struct virtnet_rx, stats.bytes)},
+   {"errors", offsetof(struct virtnet_rx, stats.errors)},
+   {"multicast_packets",  offsetof(struct virtnet_rx, 
stats.multicast)},
+   {"broadcast_packets",  offsetof(struct virtnet_rx, 
stats.broadcast)},
+   {"undersize_packets",  offsetof(struct virtnet_rx, 
stats.size_bins[0])},
+   {"size_64_packets",offsetof(struct virtnet_rx, 
stats.size_bins[1])},
+   {"size_65_127_packets",offsetof(struct virtnet_rx, 
stats.size_bins[2])},
+   {"size_128_255_packets",   offsetof(struct virtnet_rx, 
stats.size_bins[3])},
+   {"size_256_511_packets",   offsetof(struct virtnet_rx, 
stats.size_bins[4])},
+   {"size_512_1023_packets",  offsetof(struct virtnet_rx, 
stats.size_bins[5])},
+   {"size_1024_1517_packets", offsetof(struct virtnet_rx, 
stats.size_bins[6])},
+   {"size_1518_max_packets",  offsetof(struct virtnet_rx, 
stats.size_bins[7])},
 };

-#define VIRTIO_NB_Q_XSTATS (sizeof(rte_virtio_q_stat_strings) / \
-   sizeof(rte_virtio_q_stat_strings[0]))
+/* [rt]x_qX_ is prepended to the name string here */
+static const struct rte_virtio_xstats_name_off rte_virtio_txq_stat_strings[] = 
{
+   {"good_packets",   offsetof(struct virtnet_tx, stats.packets)},
+   {"good_bytes", offsetof(struct virtnet_tx, stats.bytes)},
+   {"errors", offsetof(struct virtnet_tx, stats.errors)},
+   {"multicast_packets",  offsetof(struct virtnet_tx, 
stats.multicast)},
+   {"broadcast_packets",  offsetof(struct virtnet_tx, 
stats.broadcast)},
+   {"undersize_packets",  offsetof(struct virtnet_tx, 
stats.size_bins[0])},
+   {"size_64_packets",offsetof(struct virtnet_tx, 
stats.size_bins[1])},
+   {"size_65_127_packets",offsetof(struct virtnet_tx, 
stats.size_bins[2])},
+   {"size_128_255_packets",   offsetof(struct virtnet_tx, 
stats.size_bins[3])},
+   {"size_256_511_packets",   offsetof(struct virtnet_tx, 
stats.size_bins[4])},
+   {"size_512_1023_packets",  offsetof(struct virtnet_tx, 
stats.size_bins[5])},
+   {"size_1024_1517_packets", offsetof(struct virtnet_tx, 
stats.size_bins[6])},
+   {"size_1518_max_packets",  offsetof(struct virtnet_tx, 
stats.size_bins[7])},
+};
+
+#define VIRTIO_NB_RXQ_XSTATS (sizeof(rte_virtio_rxq_stat_strings) / \
+   sizeof(rte_virtio_rxq_stat_strings[0]))
+#define VIRTIO_NB_TXQ_XSTATS (sizeof(rte_virt

[dpdk-dev] [PATCH] kni: fix use of undefined comma variable in makefile

2016-05-30 Thread Ferruh Yigit

On 5/30/2016 12:56 PM, Olivier Matz wrote:
> The $(comma) variable is not defined in this Makefile, nor in
> any included Makefile. Seen while doing a "make clean" on ubuntu:
> 
>   $ make clean
>   == Clean lib
>   == Clean lib/librte_compat
>   == Clean lib/librte_eal
>   == Clean lib/librte_eal/common
>   == Clean lib/librte_eal/linuxapp
>   == Clean lib/librte_eal/linuxapp/eal
>   == Clean lib/librte_eal/linuxapp/igb_uio
>   == Clean lib/librte_eal/linuxapp/kni
>   tr: missing operand after ?.-?
>   Two strings must be given when translating.
>   Try 'tr --help' for more information.

I don't observe this error on Ubuntu.

Also did a quick check and $(comma) seems defined, but not sure exactly
where. What I test:

$ lsb_release -si
Ubuntu


$ git diff
diff --git a/lib/librte_eal/linuxapp/kni/Makefile
b/lib/librte_eal/linuxapp/kni/Makefile
index ac99d3f..dcad241 100644
--- a/lib/librte_eal/linuxapp/kni/Makefile
+++ b/lib/librte_eal/linuxapp/kni/Makefile
@@ -90,4 +90,7 @@ SRCS-y += kni_net.c
 SRCS-y += kni_ethtool.c
 SRCS-$(CONFIG_RTE_KNI_VHOST) += kni_vhost.c

+test_comma:
+   @echo "= comma: $(comma) ="
+



$ make -C lib/librte_eal/linuxapp/kni test_comma
make: Entering directory '.../dpdk/lib/librte_eal/linuxapp/kni'
= comma: , =
make: Leaving directory '.../dpdk/lib/librte_eal/linuxapp/kni'

[dpdk-dev] [ovs-dev] If 1 KVM Guest loads the virtio-pci, on top of dpdkvhostuser OVS socket interface, it slows down everything!

2016-05-30 Thread Bodireddy, Bhanuprakash

From: Martinx - ? [mailto:thiagocmarti...@gmail.com]
Sent: Monday, May 30, 2016 5:01 PM
To: Bodireddy, Bhanuprakash 
Cc: Christian Ehrhardt ;  ; dev ; qemu-stable 
at nongnu.org
Subject: Re: [ovs-dev] If 1 KVM Guest loads the virtio-pci, on top of 
dpdkvhostuser OVS socket interface, it slows down everything!

Hello Bhanu,

 I'm a little bit confused, you said that the problem can be fixed but, later, 
you also said that:

 "On a Multi VM setup even with the above patch applied, one might see 
aggregate throughput difference when vNIC is bind to igb_uio vs virtio-pci"...

 My idea is to use OVS with DPDK in a multi-vm environment but, based on your 
answer, this is not possible, because the VM A, can interfere with VM B... Is 
that true even with that patch applied, can you confirm this?

[BHANU] With the patch applied the issue should be fixed.  Without the patch,  
VM A can interfere with VM B when VM A isn?t processing its queues which 
eventually triggers vhost send retries until timeout (100ms in this case). This 
cause the pmd thread to slow down and that affects the other Virtual 
Machines(VM B) on the host as happened in your case. The patch that I pointed 
will remove the retry logic completely.

Also in your case, packets are sent to the idle VM (no packets drained from the 
virt queues inside) which triggered the issue and affected the neighboring VMs. 
consider sending the traffic to the newly booted VM after the forwarding is 
enabled inside the guest.

 I don't think that diverting the traffic from a VM that loaded virtio-pci 
drivers is a doable solution (since you can't predict what the owners of the 
VMs will be doing), also, specially because in my env, the DPDK App is a L2 
bridge, so, it receives traffic that is not destined to it (might be even 
harder to try to do this)...

 I have all the required hardware to keep testing this, so, let me know when 
you guys (Intel / Canonical) have newer versions, I'll test it with pleasure!   
:-)
[BHANU] Apply the patch from the thread and this should resolve the issue 
reported.

Thanks!
Thiago

On 25 May 2016 at 11:00, Bodireddy, Bhanuprakash mailto:bhanuprakash.bodireddy at intel.com>> wrote:
I could reproduce the issue and this can be fixed as below

Firstly, the throughput issues observed with other VMs when a new VM is started 
can be fixed using the patch in the thread 
http://openvswitch.org/pipermail/dev/2016-May/071615.html.  I have put up an 
explanation in this thread for the cause of issue especially with multi VM 
setup on OVS DPDK.

On a Multi VM setup even with the above patch applied, one might see aggregate 
throughput difference when vNIC is bind to igb_uio vs virtio-pci, this is for 
the fact that the interrupt overhead is significantly higher when virtio-pci is 
in use.

More importantly if you have setup explicit flows matching VM's MAC/IP, 
disabling the flows to the VM that are idle would improve the aggregate 
throughput and lessen the burden on the pmd thread.   'watch -d 
./utilities/ovs-appctl dpctl/show -s' will show no. of packet stats.

Regards,
Bhanu Prakash.

>-Original Message-
>From: dev [mailto:dev-bounces at openvswitch.orgopenvswitch.org>] On Behalf Of Christian
>Ehrhardt
>Sent: Wednesday, May 25, 2016 7:08 AM
>To: Martinx - ? mailto:thiagocmartinsc at 
>gmail.com>>
>Cc: mailto:dev at openvswitch.org>> openvswitch.org>; dev dpdk.org>;
>qemu-stable at nongnu.org
>Subject: Re: [ovs-dev] If 1 KVM Guest loads the virtio-pci, on top of
>dpdkvhostuser OVS socket interface, it slows down everything!
>
>Hi again,
>another forgotten case.
>
>I currently I lack the HW to fully reproduce this, but the video summary is
>pretty good and shows the issue in an impressive way.
>
>Also the description is good and here as well I wonder if anybody else could
>reproduce this.
>Any hints / insights are welcome.
>
>P.S. and also again - two list cross posting, but here as well it is yet 
>unclear
>which it belongs to so I'll keep it as well
>
>Christian Ehrhardt
>Software Engineer, Ubuntu Server
>Canonical Ltd
>
>On Sun, May 22, 2016 at 6:35 PM, Martinx - ?
>mailto:thiagocmartinsc at gmail.com>>
>wrote:
>
>> Guys,
>>
>>  I'm seeing a strange problem here, in my OVS+DPDK deployment, on top
>> of Ubuntu 16.04 (DPDK 2.2 and OVS 2.5).
>>
>>  Here is what I'm trying to do: run OVS with DPDK at the host, for KVM
>> Guests that also, will be running more DPDK Apps.
>>
>>  The host have 2 x 10G NICs, for OVS+DPDK and each KVM Guest receives
>> its own VLAN tagged traffic (or all tags).
>>
>>  There is an IXIA Traffic Generator sending 10G of traffic on both
>> directions (20G total).
>>
>>  Exemplifying, the problem is, lets say that I already have 2 VMs (or
>> 10) running DPDK Apps (on top of dpdkvhostuser), everything is working
>> as expected, then, if I boot the 3rd (or 11) KVM Guest, the OVS+DPDK
>> br

[dpdk-dev] [PATCH] virtio: use volatile to get used->idx in the loop

2016-05-30 Thread Yuanhan Liu

On Wed, May 25, 2016 at 12:16:41AM +0800, Huawei Xie wrote:
> There is no external function call or any barrier in the loop,
> the used->idx would only be retrieved once.
> 
> Signed-off-by: Huawei Xie 
> ---
>  drivers/net/virtio/virtio_ethdev.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.c 
> b/drivers/net/virtio/virtio_ethdev.c
> index c3fb628..f6d6305 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -204,7 +204,8 @@ virtio_send_command(struct virtqueue *vq, struct 
> virtio_pmd_ctrl *ctrl,
>   usleep(100);
>   }
>  
> - while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
> + while (vq->vq_used_cons_idx !=
> +*((volatile uint16_t *)(&vq->vq_ring.used->idx))) {

I'm wondering maybe we could fix VIRTQUEUE_NUSED (which has no such
qualifier) and use this macro here?

If you check the reference of that macro, you might find similar
issues, say, it is also used inside the while-loop of
virtio_recv_mergeable_pkts().

--yliu

[dpdk-dev] [PATCH v8 3/3] i40e: add floating VEB extension support

2016-05-30 Thread Peng, Yuan

;&
+   (pf->vf_fbitmap & 1 << vf_id)) {
vf->vsi = i40e_vsi_setup(vf->pf, I40E_VSI_SRIOV,
NULL, vf->vf_idx);
} else {
--
2.1.4

-- next part --
An embedded and charset-unspecified text was scrubbed...
Name: floating-VEB-testplan - attach.txt
URL: 
<http://dpdk.org/ml/archives/dev/attachments/20160530/8259c3de/attachment-0001.txt>

[dpdk-dev] [PATCH v1] null: set port_id in mbufs received from null PMD

2016-05-30 Thread Tetsuya Mukawa

On 2016/05/27 21:28, Mcnamara, John wrote:
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Sean Harte
>> Sent: Friday, May 27, 2016 1:27 PM
>> To: mukawa at igel.co.jp
>> Cc: dev at dpdk.org; Harte, Sean 
>> Subject: [dpdk-dev] [PATCH v1] null: set port_id in mbufs received from
>> null PMD
>>
>> Ensure that the port field is set in mbufs received from the null PMD.
>>
>> Signed-off-by: Sean Harte 
> Acked-by: John McNamara 
>
Acked-by: Tetsuya Mukawa

[dpdk-dev] [PATCH 1/2] qede: rename config option

2016-05-30 Thread Bruce Richardson

On Fri, May 06, 2016 at 09:21:30PM -0700, Rasesh Mody wrote:
> Rename RTE_LIBRTE_QEDE_DEBUG_DRV to RTE_LIBRTE_QEDE_DEBUG_DRIVER
> 
> Fixes: 425cba2a5176 ("qede: enable PMD build")
> Fixes: 33e9ff1b72ca ("qede: add core driver")
> 
These fixes lines don't have the correct commit id's in them. The commit
"qede: enable PMD build" is actually commit 3eae93a9, and "qede: add core
driver" is 2ea6f76a. The commit id of a patch can change from what it is
originally when the patch is applied.

/Bruce

[dpdk-dev] [PATCH 2/2] qede: return LAN stats to MFW

2016-05-30 Thread Bruce Richardson

On Fri, May 06, 2016 at 09:21:31PM -0700, Rasesh Mody wrote:
> From: Harish Patil 
> 
> Under certain scenarios, MFW periodically polls the driver
> for LAN statistics. This patch implements the osal hook to
> fill in the stats.
> 
> Fixes: ffa002d318d36 ("qede: add base driver")
> 
What is MFW?

/Bruce

[dpdk-dev] [PATCH v4] ip_pipeline: configuration file parser cleanup

2016-05-30 Thread Jasvinder Singh

This commit adds following changes to configuration file parsing of
the ip pipeline application;

1. Parsing routines related to packet queues (pktq_in/out fields in the
PIPELINE section) and message queues (msgq_in/out fields of in the MSGQ
Section) are updated.

In the parsing routines, function "strtok_r()" is used for parsing the
string instead of manually checking the string termination, white
spaces, tabs etc., between the string tokens. Each call to strtok_r()
returns a pointer to a null-terminated string containing the next token.
If no more tokens are found, strtok_r() returns NULL. As a result of
using strtok_r(), the code size of the parsing routines is reduced
significantly.

2. Replace PARSER_PARAM_ADD_CHECK macro by more specific macros such as
PARSE_CHECK_DUPLICATE_SECTION, PARSE_CHECK_DUPLICATE_SECTION_EAL to detect
duplicate entries in the various sections of the configuration file

3. Add new macros PARSER_ERROR_NO_ELEMENTS and PARSE_ERROR_TOO_MANY_ELEMENTS
for detecting no element and more elements than allowed situations
respectively, in the section entry.

4. Add new macros APP_PARAM_ADD_LINK_FOR_RXQ, APP_PARAM_ADD_LINK_FOR_TXQ
and APP_PARAM_ADD_LINK_FOR_TM which add corresponding nic ports entry to
the application param structure while parsing rx/tx queues, TM (Traffic
Manager) port sections and pktq_in/out entries of pipeline sections

Signed-off-by: Jasvinder Singh 
Acked-by: Cristian Dumitrescu 
---
v4
- update the commit message
- move APP_PARAM_ADD macro from app.h to config_parse.c as it is only used
  by the routines defined in this file
- remove extra newline character from error message display
- rebased on top of ip_pipeline CLI patchset
  (http://dpdk.org/dev/patchwork/patch/12911/) and NULL packet processing
  fix patch (http://dpdk.org/dev/patchwork/patch/12807/)

v3
- add check on the number of pktq_in/out entries
- add check on the number of msgq_in/out entries

v2
- update the commit message
- change the local variable name from "token" to "name"

 examples/ip_pipeline/app.h  |  25 +-
 examples/ip_pipeline/config_parse.c | 472 +++-
 2 files changed, 196 insertions(+), 301 deletions(-)

diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h
index e775024..05d608b 100644
--- a/examples/ip_pipeline/app.h
+++ b/examples/ip_pipeline/app.h
@@ -50,6 +50,7 @@

 #define APP_PARAM_NAME_SIZE  PIPELINE_NAME_SIZE
 #define APP_LINK_PCI_BDF_SIZE16
+
 struct app_mempool_params {
char *name;
uint32_t parsed;
@@ -370,6 +371,8 @@ struct app_eal_params {
/* Support running on Xen dom0 without hugetlbfs */
uint32_t xen_dom0_present;
int xen_dom0;
+
+   uint32_t parsed;
 };

 #ifndef APP_APPNAME_SIZE
@@ -529,28 +532,6 @@ do 
\
sscanf(obj->name, prefix "%" SCNu32, &id);  
\
 while (0)  \

-#define APP_PARAM_ADD(obj_array, obj_name) \
-({ \
-   ssize_t obj_idx;\
-   const ssize_t obj_count = RTE_DIM(obj_array);   \
-   \
-   obj_idx = APP_PARAM_FIND(obj_array, obj_name);  \
-   if (obj_idx < 0) {  \
-   for (obj_idx = 0; obj_idx < obj_count; obj_idx++) { \
-   if (!APP_PARAM_VALID(&((obj_array)[obj_idx])))  \
-   break;  \
-   }   \
-   \
-   if (obj_idx < obj_count) {  \
-   (obj_array)[obj_idx].name = strdup(obj_name);   \
-   if ((obj_array)[obj_idx].name == NULL)  \
-   obj_idx = -EINVAL;  \
-   } else  \
-   obj_idx = -ENOMEM;  \
-   }   \
-   obj_idx;\
-})
-
 #defineAPP_CHECK(exp, fmt, ...)
\
 do {   \
if (!(exp)) {   \
diff --git a/examples/ip_pipeline/config_parse.c 
b/examples/ip_pipeline/config_parse.c
index 3951e1d..53130a0 100644
--- a/examples/ip_pipeline/config_parse.c
+++ b/examples/ip_pipeline/config_parse.c
@@ -1,4 +1,4 @@
-/*-
+?/*-

[dpdk-dev] [PATCH] qat: fix phys address of content descriptor

2016-05-30 Thread De Lara Guarch, Pablo

Hi Arek,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Arek Kusztal
> Sent: Monday, May 30, 2016 1:39 PM
> To: dev at dpdk.org
> Cc: Trahe, Fiona; Griffin, John; Jain, Deepak K; olivier.matz at 6wind.com;
> thomas.monjalon at 6wind.com; Kusztal, ArkadiuszX
> Subject: [dpdk-dev] [PATCH] qat: fix phys address of content descriptor
> 
> From: Arkadiusz Kusztal 
> 
> this patch fixes an error with computation of physical address of
> content descriptor in the symmetric operations session
> 
> Signed-off-by: Arkadiusz Kusztal 

You forgot to include the "Fixes" line. Also you could reword the commit 
message like:
"Fixes wrong computation of physical address..."

[dpdk-dev] [PATCH] vhost: fix segfault on bad descriptor address.

2016-05-30 Thread Ilya Maximets

On 30.05.2016 15:00, Tan, Jianfeng wrote:
> Hi,
> 
>> -Original Message-
>> From: Ilya Maximets [mailto:i.maximets at samsung.com]
>> Sent: Friday, May 20, 2016 8:50 PM
>> To: dev at dpdk.org; Xie, Huawei; Yuanhan Liu
>> Cc: Dyasly Sergey; Heetae Ahn; Tan, Jianfeng; Ilya Maximets
>> Subject: [PATCH] vhost: fix segfault on bad descriptor address.
>>
>> In current implementation guest application can reinitialize vrings
>> by executing start after stop. In the same time host application
>> can still poll virtqueue while device stopped in guest and it will
>> crash with segmentation fault while vring reinitialization because
>> of dereferencing of bad descriptor addresses.
>>
>> OVS crash for example:
>> <>
>> [test-pmd inside guest VM]
>>
>>  testpmd> port stop all
>>  Stopping ports...
>>  Checking link statuses...
>>  Port 0 Link Up - speed 1 Mbps - full-duplex
>>  Done
>>  testpmd> port config all rxq 2
>>  testpmd> port config all txq 2
>>  testpmd> port start all
>>  Configuring Port 0 (socket 0)
>>  Port 0: 52:54:00:CB:44:C8
>>  Checking link statuses...
>>  Port 0 Link Up - speed 1 Mbps - full-duplex
>>  Done
>>
>> [OVS on host]
>>  Program received signal SIGSEGV, Segmentation fault.
>>  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at
>> rte_memcpy.h
>>
>>  (gdb) bt
>>  #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
>>  #1  copy_desc_to_mbuf
>>  #2  rte_vhost_dequeue_burst
>>  #3  netdev_dpdk_vhost_rxq_recv
>>  ...
>>
>>  (gdb) bt full
>>  #0  rte_memcpy
>>  ...
>>  #1  copy_desc_to_mbuf
>>  desc_addr = 0
>>  mbuf_offset = 0
>>  desc_offset = 12
>>  ...
>> <>
>>
>> Fix that by checking addresses of descriptors before using them.
>>
>> Note: For mergeable buffers this patch checks only guest's address for
>> zero, but in non-meargeable case host's address checked. This is done
>> because checking of host's address in mergeable case requires additional
>> refactoring to keep virtqueue in consistent state in case of error.
> 
> 
> I agree with you that it should be fixed because malicious guest could launch
> DOS attack on vswitch with the current implementation.
> 
> But I don't understand why you do not fix the mergable case in
> copy_mbuf_to_desc_mergable() on where gpa_to_vva() happens? And the change in
> fill_vec_buf(), checking !vq->desc[idx].addr, make any sense?
> 
> Thanks,
> Jianfeng

Hi.
As I said inside commit-message, checking of host's address in mergeable case
requires additional refactoring to keep virtqueue in consistent state.

There are few issues with checking inside copy_mbuf_to_desc_mergable() :

1. Ring elements already reserved and we must fill them with some
   sane data before going out of virtio_dev_merge_rx().

2. copy_mbuf_to_desc_mergable() can't return an error in current
   implementation (additional checking needed), otherwise used->idx
   will be decremented (I think, it's bad).

Checking !vq->desc[idx].addr inside fill_vec_buf() make sense in case of virtio
reinitialization, because guest's address will be zero (case described in
commit-message). Checking of guest's address will not help in case of bad and
not NULL address, but useful in this common case.
Also, we can't catch bad address what we able to map, so, malicious guest could
break vhost anyway.

I agree, that checking of host's address is better, but this may be done later
together with resolving above issues.

Best regards, Ilya Maximets.

[dpdk-dev] [PATCH v5 1/3] mempool: support external handler

2016-05-30 Thread Jerin Jacob

On Fri, May 27, 2016 at 03:44:31PM +0100, Hunt, David wrote:
> 
> 
Hi David,
[snip]
>  That chunk of code above would be better moved all right. I'd suggest
> moving it to the
> rte_mempool_create function, as that's the one that needs the backward
> compatibility.

OK

> 
> On the flags issue, each mempool handler can re-interpret the flags as
> needed. Maybe we
> could use the upper half of the bits for different handlers, changing the
> meaning of the
> bits depending on which handler is being set up. We can then keep the lower
> half for bits that are common across all handlers? That way the user can

Common lower half bit in flags looks good.

> just set the bits they
> are interested in for that handler. Also, the alloc function has access to
> the flags, so maybe the
> handler specific setup could be handled in the alloc function rather than
> adding a new function pointer?

Yes. I agree.

> 
> Of course, that won't help if we need to pass in more data, in which case
> we'd probably need an
> opaque data pointer somewhere. It would probably be most useful to pass it
> in with the
> alloc, which may need the data. Any suggestions?

But the top level rte_mempool_create() function needs to pass the data. Right?
That would be an ABI change. IMO, we need to start thinking about
passing a struct of config data to rte_mempool_create to create
backward compatibility on new argument addition to rte_mempool_create()

Other points in HW assisted pool manager perspective,

1) May be RING can be replaced with some other higher abstraction name
for the internal MEMPOOL_F_RING_CREATED flag
2) IMO, It is better to change void *pool in struct rte_mempool to
anonymous union type, something like below, so that mempool
implementation can choose the best type.
union {
void *pool;
uint64_t val;
}

3) int32_t handler_idx creates 4 byte hole in struct rte_mempool in
64 bit system. IMO it better to rearrange.(as const struct rte_memzone
*mz comes next)

4) IMO, It is better to change ext_alloc/ext_free to ext_create/ext_destroy
as their is no allocation in HW assisted pool manager case,
it will be mostly creating some HW initialization.

> 
> Regards,
> Dave.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>

[dpdk-dev] [PATCH] qat: fix phys address of content descriptor

2016-05-30 Thread Jain, Deepak K


-Original Message-
From: Kusztal, ArkadiuszX 
Sent: Monday, May 30, 2016 1:39 PM
To: dev at dpdk.org
Cc: Trahe, Fiona ; Griffin, John ; Jain, Deepak K ; olivier.matz at 
6wind.com; thomas.monjalon at 6wind.com; Kusztal, ArkadiuszX 

Subject: [PATCH] qat: fix phys address of content descriptor

From: Arkadiusz Kusztal 

this patch fixes an error with computation of physical address of content 
descriptor in the symmetric operations session

Signed-off-by: Arkadiusz Kusztal 
---
 drivers/crypto/qat/qat_crypto.c  | 9 ++---
 lib/librte_cryptodev/rte_cryptodev.c | 2 +-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/qat/qat_crypto.c b/drivers/crypto/qat/qat_crypto.c 
index 495ea1c..abe0511 100644
--- a/drivers/crypto/qat/qat_crypto.c
+++ b/drivers/crypto/qat/qat_crypto.c
@@ -807,12 +807,15 @@ static inline uint32_t adf_modulo(uint32_t data, uint32_t 
shift)
return data - mult;
 }

-void qat_crypto_sym_session_init(struct rte_mempool *mp, void *priv_sess)
+void qat_crypto_sym_session_init(struct rte_mempool *mp, void 
+*sym_sess)
 {
-   struct qat_session *s = priv_sess;
+   struct rte_cryptodev_sym_session *sess = sym_sess;
+   struct qat_session *s = (void *)sess->_private;

PMD_INIT_FUNC_TRACE();
-   s->cd_paddr = rte_mempool_virt2phy(mp, &s->cd);
+   s->cd_paddr = rte_mempool_virt2phy(mp, sess) +
+   offsetof(struct qat_session, cd) +
+   offsetof(struct rte_cryptodev_sym_session, _private);
 }

 int qat_dev_config(__rte_unused struct rte_cryptodev *dev) diff --git 
a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c
index aa4ea42..960e2d5 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -956,7 +956,7 @@ rte_cryptodev_sym_session_init(struct rte_mempool *mp,
sess->mp = mp;

if (dev->dev_ops->session_initialize)
-   (*dev->dev_ops->session_initialize)(mp, sess->_private);
+   (*dev->dev_ops->session_initialize)(mp, sess);
 }

 static int
--
2.1.0

Acked-by: Deepak Kumar Jain

[dpdk-dev] [PATCH] port: add kni interface support

2016-05-30 Thread Dumitrescu, Cristian

Hi Wei Jie,

Thank you for submitting this patch. I am currently travelling, I will review 
your patch and come back to you hopefully later this week or early next week.

Regards,
Cristian

> -Original Message-
> From: WeiJie Zhuang [mailto:zhuangwj at gmail.com]
> Sent: Saturday, May 28, 2016 12:26 PM
> To: Dumitrescu, Cristian 
> Cc: dev at dpdk.org; WeiJie Zhuang 
> Subject: [PATCH] port: add kni interface support
> 
> 1. add KNI port type to the packet framework
> 2. add KNI support to the IP Pipeline sample Application
> 
> Signed-off-by: WeiJie Zhuang 
> ---
> v2:
> * Fix check patch error.
> ---
>  doc/api/doxy-api-index.md   |   1 +
>  examples/ip_pipeline/Makefile   |   6 +-
>  examples/ip_pipeline/app.h  |  74 +
>  examples/ip_pipeline/config/kni.cfg |  12 ++
>  examples/ip_pipeline/config_check.c |  34 
>  examples/ip_pipeline/config_parse.c | 130 +++
>  examples/ip_pipeline/init.c |  79 +
>  examples/ip_pipeline/kni/kni.c  |  80 +
>  examples/ip_pipeline/kni/kni.h  |  16 ++
>  examples/ip_pipeline/pipeline_be.h  |  13 ++
>  examples/ip_pipeline/thread.c   |   9 +
>  lib/librte_port/Makefile|   7 +
>  lib/librte_port/rte_port_kni.c  | 316
> 
>  lib/librte_port/rte_port_kni.h  |  81 +
>  14 files changed, 856 insertions(+), 2 deletions(-)
>  create mode 100644 examples/ip_pipeline/config/kni.cfg
>  create mode 100644 examples/ip_pipeline/kni/kni.c
>  create mode 100644 examples/ip_pipeline/kni/kni.h
>  create mode 100644 lib/librte_port/rte_port_kni.c
>  create mode 100644 lib/librte_port/rte_port_kni.h
> 
> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
> index f626386..e38a959 100644
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -119,6 +119,7 @@ There are many libraries, so their headers may be
> grouped by topics:
>  [reass](@ref rte_port_ras.h),
>  [sched](@ref rte_port_sched.h),
>  [src/sink] (@ref rte_port_source_sink.h)
> +[kni]  (@ref rte_port_kni.h)
>* [table](@ref rte_table.h):
>  [lpm IPv4] (@ref rte_table_lpm.h),
>  [lpm IPv6] (@ref rte_table_lpm_ipv6.h),
> diff --git a/examples/ip_pipeline/Makefile b/examples/ip_pipeline/Makefile
> index 10fe1ba..848c2aa 100644
> --- a/examples/ip_pipeline/Makefile
> +++ b/examples/ip_pipeline/Makefile
> @@ -43,9 +43,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  # binary name
>  APP = ip_pipeline
> 
> +VPATH += $(SRCDIR)/kni
>  VPATH += $(SRCDIR)/pipeline
> 
> -INC += $(wildcard *.h) $(wildcard pipeline/*.h)
> +INC += $(wildcard *.h) $(wildcard pipeline/*.h) $(wildcard kni/*.h)
> 
>  # all source are stored in SRCS-y
>  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) := main.c
> @@ -56,6 +57,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += init.c
>  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += thread.c
>  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += thread_fe.c
>  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += cpu_core_map.c
> +SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += kni.c
> 
>  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += pipeline_common_be.c
>  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += pipeline_common_fe.c
> @@ -72,7 +74,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) +=
> pipeline_flow_actions.c
>  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += pipeline_routing_be.c
>  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += pipeline_routing.c
> 
> -CFLAGS += -I$(SRCDIR) -I$(SRCDIR)/pipeline
> +CFLAGS += -I$(SRCDIR) -I$(SRCDIR)/pipeline -I$(SRCDIR)/kni
>  CFLAGS += -O3
>  CFLAGS += $(WERROR_FLAGS) -Wno-error=unused-function -Wno-
> error=unused-variable
> 
> diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h
> index e775024..a86ce57 100644
> --- a/examples/ip_pipeline/app.h
> +++ b/examples/ip_pipeline/app.h
> @@ -44,7 +44,9 @@
>  #include 
> 
>  #include 
> +#include 
> 
> +#include "kni.h"
>  #include "cpu_core_map.h"
>  #include "pipeline.h"
> 
> @@ -99,6 +101,18 @@ struct app_pktq_hwq_out_params {
>   struct rte_eth_txconf conf;
>  };
> 
> +struct app_kni_params {
> + char *name;
> + uint32_t parsed;
> +
> + uint32_t socket_id;
> + uint32_t core_id;
> + uint32_t hyper_th_id;
> +
> + uint32_t mempool_id;
> + uint32_t burst;
> +};
> +
>  struct app_pktq_swq_params {
>   char *name;
>   uint32_t parsed;
> @@ -172,6 +186,7 @@ enum app_pktq_in_type {
>   APP_PKTQ_IN_SWQ,
>   APP_PKTQ_IN_TM,
>   APP_PKTQ_IN_SOURCE,
> + APP_PKTQ_IN_KNI,
>  };
> 
>  struct app_pktq_in_params {
> @@ -184,6 +199,7 @@ enum app_pktq_out_type {
>   APP_PKTQ_OUT_SWQ,
>   APP_PKTQ_OUT_TM,
>   APP_PKTQ_OUT_SINK,
> + APP_PKTQ_OUT_KNI,
>  };
> 
>  struct app_pktq_out_params {
> @@ -434,6 +450,10 @@ struct app_eal_params {
>  #define APP_THREAD_HEADROOM_STATS_COLLECT1
>  #endif
> 
> +#ifndef APP_MAX_KNI
> +#define APP_MAX_KNI  8
> +#endif
> +
>  struct a

[dpdk-dev] [PATCH] vhost: fix segfault on bad descriptor address.

2016-05-30 Thread Ilya Maximets

Ping.

Best regards, Ilya Maximets.

On 23.05.2016 14:04, Ilya Maximets wrote:
> On 23.05.2016 13:57, Yuanhan Liu wrote:
>> On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
>>> In current implementation guest application can reinitialize vrings
>>> by executing start after stop. In the same time host application
>>> can still poll virtqueue while device stopped in guest and it will
>>> crash with segmentation fault while vring reinitialization because
>>> of dereferencing of bad descriptor addresses.
>>>
>>> OVS crash for example:
>>> <>
>>> [test-pmd inside guest VM]
>>>
>>> testpmd> port stop all
>>> Stopping ports...
>>> Checking link statuses...
>>> Port 0 Link Up - speed 1 Mbps - full-duplex
>>> Done
>>> testpmd> port config all rxq 2
>>> testpmd> port config all txq 2
>>> testpmd> port start all
>>> Configuring Port 0 (socket 0)
>>> Port 0: 52:54:00:CB:44:C8
>>> Checking link statuses...
>>> Port 0 Link Up - speed 1 Mbps - full-duplex
>>> Done
>>
>> I actually didn't manage to reproduce it on my side, with the
>> vhost-example instead of OVS though. Is that all the commands
>> to reproduce it, and run them just after start test-pmd?
> 
> Actually, I think, packet flow should be enabled while performing
> above actions and some traffic already should be sent through port
> to change last used idx on vhost side.
> 
> Something like:
>   start
>   ..wait a while.. see that packets are flowing.
>   stop
>   port stop
>   port config
>   port config
>   port start
>>
>>> [OVS on host]
>>> Program received signal SIGSEGV, Segmentation fault.
>>> rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h
>>>
>>> (gdb) bt
>>> #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
>>> #1  copy_desc_to_mbuf
>>> #2  rte_vhost_dequeue_burst
>>> #3  netdev_dpdk_vhost_rxq_recv
>>> ...
>>>
>>> (gdb) bt full
>>> #0  rte_memcpy
>>> ...
>>> #1  copy_desc_to_mbuf
>>> desc_addr = 0
>>> mbuf_offset = 0
>>> desc_offset = 12
>>> ...
>>> <>
>>>
>>> Fix that by checking addresses of descriptors before using them.
>>>
>>> Note: For mergeable buffers this patch checks only guest's address for
>>> zero, but in non-meargeable case host's address checked. This is done
>>> because checking of host's address in mergeable case requires additional
>>> refactoring to keep virtqueue in consistent state in case of error.
>>>
>>> Signed-off-by: Ilya Maximets 
>>> ---
>>>
>>> Actually, current virtio implementation looks broken for me. Because
>>> 'virtio_dev_start' breaks virtqueue while it still available from the vhost
>>> side.
>>>
>>> There was 2 patches about this behaviour:
>>>
>>> 1. a85786dc816f ("virtio: fix states handling during initialization")
>>> 2. 9a0615af7746 ("virtio: fix restart")
>>>
>>> The second patch fixes somehow issue intoduced in the first patch, but 
>>> actually
>>> also breaks vhost in the way described above.
>>> It's not pretty clear for me what to do in current situation with virtio,
>>> because it will be broken for guest application even if vhost will not 
>>> crash.
>>>
>>> May be it'll be better to forbid stopping of virtio device and force user to
>>> exit and start again (may be implemented in hidden from user way)?
>>>
>>> This patch adds additional sane checks, so it should be applied anyway, 
>>> IMHO.
>>
>> Agreed.
>>
>>  --yliu
>>
>>

[dpdk-dev] [PATCH] app/testpmd: log mbuf pool creation

2016-05-30 Thread Olivier Matz

Enhance the logs related to mbuf pool creation. Display an info level
log when creating the mbuf, and display the error as a string on failure.

After the patch, we have:

  [...]
  EAL:   probe driver: 8086:10fb rte_ixgbe_pmd
  USER1: create a new mbuf pool : n=331456, \
  size=2176, socket=0
  EAL: Error - exiting with code: 1
Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate \
  memory

Signed-off-by: Olivier Matz 
---
 app/test-pmd/testpmd.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9d11830..a585aad 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -49,6 +49,7 @@
 #include 

 #include 
+#include 
 #include 
 #include 
 #include 
@@ -415,6 +416,10 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
mb_size = sizeof(struct rte_mbuf) + mbuf_seg_size;
mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name));

+   RTE_LOG(INFO, USER1,
+   "create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n",
+   pool_name, nb_mbuf, mbuf_seg_size, socket_id);
+
 #ifdef RTE_LIBRTE_PMD_XENVIRT
rte_mp = rte_mempool_gntalloc_create(pool_name, nb_mbuf, mb_size,
(unsigned) mb_mempool_cache,
@@ -446,8 +451,9 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
}

if (rte_mp == NULL) {
-   rte_exit(EXIT_FAILURE, "Creation of mbuf pool for socket %u "
-   "failed\n", socket_id);
+   rte_exit(EXIT_FAILURE,
+   "Creation of mbuf pool for socket %u failed: %s\n",
+   socket_id, rte_strerror(rte_errno));
} else if (verbose_level > 0) {
rte_mempool_dump(stdout, rte_mp);
}
-- 
2.8.0.rc3

[dpdk-dev] [PATCH v2 1/2] ethdev: add callback to get register size in bytes

2016-05-30 Thread Panu Matilainen

On 05/30/2016 12:39 PM, zr at semihalf.com wrote:
> From: Zyta Szpak 
>
> Version 2 of fixing the fixed register width assumption.
> rte_eth_dev_get_reg_length and rte_eth_dev_get_reg callbacks
> do not provide register size to the app in any way. It is
> needed to allocate proper number of bytes before retrieving
> registers content with rte_eth_dev_get_reg.
>
> Signed-off-by: Zyta Szpak 
[...]
> diff --git a/lib/librte_ether/rte_ether_version.map 
> b/lib/librte_ether/rte_ether_version.map
> index 214ecc7..288bc63 100644
> --- a/lib/librte_ether/rte_ether_version.map
> +++ b/lib/librte_ether/rte_ether_version.map
> @@ -130,5 +130,6 @@ DPDK_16.04 {
>   rte_eth_tx_buffer_drop_callback;
>   rte_eth_tx_buffer_init;
>   rte_eth_tx_buffer_set_err_callback;
> + rte_eth_dev_get_reg_width;
>
>  } DPDK_2.2;

This symbol did not exist in DPDK 16.04 so it must not be added there. 
Add a new section for 16.07 which inherits from DPDK_16.04.

- Panu -

[dpdk-dev] [PATCH] kni: fix use of undefined comma variable in makefile

2016-05-30 Thread Olivier Matz

The $(comma) variable is not defined in this Makefile, nor in
any included Makefile. Seen while doing a "make clean" on ubuntu:

  $ make clean
  == Clean lib
  == Clean lib/librte_compat
  == Clean lib/librte_eal
  == Clean lib/librte_eal/common
  == Clean lib/librte_eal/linuxapp
  == Clean lib/librte_eal/linuxapp/eal
  == Clean lib/librte_eal/linuxapp/igb_uio
  == Clean lib/librte_eal/linuxapp/kni
  tr: missing operand after ?.-?
  Two strings must be given when translating.
  Try 'tr --help' for more information.

This commit replaces $(comma) by a ',' character, it's not a problem in
that case since we are inside antiquotes.

Fixes: a09b359daca3 ("kni: fix build on Ubuntu 14.04")
Signed-off-by: Olivier Matz 
---
 lib/librte_eal/linuxapp/kni/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/kni/Makefile 
b/lib/librte_eal/linuxapp/kni/Makefile
index ac99d3f..8cc6b61 100644
--- a/lib/librte_eal/linuxapp/kni/Makefile
+++ b/lib/librte_eal/linuxapp/kni/Makefile
@@ -47,7 +47,7 @@ MODULE_CFLAGS += -Wall -Werror
 ifeq ($(shell lsb_release -si 2>/dev/null),Ubuntu)
 MODULE_CFLAGS += -DUBUNTU_RELEASE_CODE=$(shell lsb_release -sr | tr -d .)
 UBUNTU_KERNEL_CODE := $(shell echo `grep UTS_RELEASE 
$(RTE_KERNELDIR)/include/generated/utsrelease.h \
-| cut -d '"' -f2 | cut -d- -f1,2 | tr .- $(comma)`,1)
+| cut -d '"' -f2 | cut -d- -f1,2 | tr .- ,`,1)
 MODULE_CFLAGS += 
-D"UBUNTU_KERNEL_CODE=UBUNTU_KERNEL_VERSION($(UBUNTU_KERNEL_CODE))"
 endif

-- 
2.8.0.rc3

[dpdk-dev] [PATCH] qat: fix phys address of content descriptor

2016-05-30 Thread Arek Kusztal

From: Arkadiusz Kusztal 

this patch fixes an error with computation of physical address of
content descriptor in the symmetric operations session

Signed-off-by: Arkadiusz Kusztal 
---
 drivers/crypto/qat/qat_crypto.c  | 9 ++---
 lib/librte_cryptodev/rte_cryptodev.c | 2 +-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/qat/qat_crypto.c b/drivers/crypto/qat/qat_crypto.c
index 495ea1c..abe0511 100644
--- a/drivers/crypto/qat/qat_crypto.c
+++ b/drivers/crypto/qat/qat_crypto.c
@@ -807,12 +807,15 @@ static inline uint32_t adf_modulo(uint32_t data, uint32_t 
shift)
return data - mult;
 }

-void qat_crypto_sym_session_init(struct rte_mempool *mp, void *priv_sess)
+void qat_crypto_sym_session_init(struct rte_mempool *mp, void *sym_sess)
 {
-   struct qat_session *s = priv_sess;
+   struct rte_cryptodev_sym_session *sess = sym_sess;
+   struct qat_session *s = (void *)sess->_private;

PMD_INIT_FUNC_TRACE();
-   s->cd_paddr = rte_mempool_virt2phy(mp, &s->cd);
+   s->cd_paddr = rte_mempool_virt2phy(mp, sess) +
+   offsetof(struct qat_session, cd) +
+   offsetof(struct rte_cryptodev_sym_session, _private);
 }

 int qat_dev_config(__rte_unused struct rte_cryptodev *dev)
diff --git a/lib/librte_cryptodev/rte_cryptodev.c 
b/lib/librte_cryptodev/rte_cryptodev.c
index aa4ea42..960e2d5 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -956,7 +956,7 @@ rte_cryptodev_sym_session_init(struct rte_mempool *mp,
sess->mp = mp;

if (dev->dev_ops->session_initialize)
-   (*dev->dev_ops->session_initialize)(mp, sess->_private);
+   (*dev->dev_ops->session_initialize)(mp, sess);
 }

 static int
-- 
2.1.0

[dpdk-dev] [ovs-dev] If 1 KVM Guest loads the virtio-pci, on top of dpdkvhostuser OVS socket interface, it slows down everything!

2016-05-30 Thread Martinx - ジェームズ

Answers inline, as follows:

On 30 May 2016 at 12:44, Bodireddy, Bhanuprakash <
bhanuprakash.bodireddy at intel.com> wrote:

> *From:* Martinx - ? [mailto:thiagocmartinsc at gmail.com]
> *Sent:* Monday, May 30, 2016 5:01 PM
> *To:* Bodireddy, Bhanuprakash 
> *Cc:* Christian Ehrhardt ; <
> dev at openvswitch.org> ; dev ;
> qemu-stable at nongnu.org
> *Subject:* Re: [ovs-dev] If 1 KVM Guest loads the virtio-pci, on top of
> dpdkvhostuser OVS socket interface, it slows down everything!
>
>
>
> Hello Bhanu,
>
>
>
>  I'm a little bit confused, you said that the problem can be fixed but,
> later, you also said that:
>
>
>
>  "On a Multi VM setup even with the above patch applied, one might see
> aggregate throughput difference when vNIC is bind to igb_uio vs
> virtio-pci"...
>
>
>
>  My idea is to use OVS with DPDK in a multi-vm environment but, based on
> your answer, this is not possible, because the VM A, can interfere with VM
> B... Is that true even with that patch applied, can you confirm this?
>
>
>
> [BHANU] With the patch applied the issue should be fixed.  Without the
> patch,  VM A can interfere with VM B when VM A isn?t processing its queues
> which eventually triggers vhost send retries until timeout (100ms in this
> case). This cause the pmd thread to slow down and that affects the other
> Virtual Machines(VM B) on the host as happened in your case. The patch that
> I pointed will remove the retry logic completely.
>

Sounds cool! I'll try that.


>
>
> Also in your case, packets are sent to the idle VM (no packets drained
> from the virt queues inside) which triggered the issue and affected the
> neighboring VMs. consider sending the traffic to the newly booted VM after
> the forwarding is enabled inside the guest.
>

This looks impossible to do in a dynamic environment... Specially if using
OpenStack with OVS + DPDK at the Compute Nodes... Customers will launch
their NFV (DPDK Apps) and the traffic will be already there... I can't see
a way of "stopping traffic" if the l2 forward becomes down, or if customer
reboots its Instance (that will bring virtio-pci momentarily).

Even on KVM-Only environments, this looks very hard to do and a nightmare
to manage...

How can I tell a customer that, before running "service my-l2-dpdk-app
stop", he needs to communicate the "cloud provider", or to access the
KVM-Host to manage OVS+DPDK as root, to stop sending traffic to the VM...


>
>  I don't think that diverting the traffic from a VM that loaded virtio-pci
> drivers is a doable solution (since you can't predict what the owners of
> the VMs will be doing), also, specially because in my env, the DPDK App is
> a L2 bridge, so, it receives traffic that is not destined to it (might be
> even harder to try to do this)...
>
>
>
>  I have all the required hardware to keep testing this, so, let me know
> when you guys (Intel / Canonical) have newer versions, I'll test it with
> pleasure!   :-)
>
> [BHANU] Apply the patch from the thread and this should resolve the issue
> reported.
>

I'll definitely give it a try! Thank you for sharing that... I'll post the
results soon.


>
>
> Thanks!
>
> Thiago
>
>
>
> On 25 May 2016 at 11:00, Bodireddy, Bhanuprakash <
> bhanuprakash.bodireddy at intel.com> wrote:
>
> I could reproduce the issue and this can be fixed as below
>
> Firstly, the throughput issues observed with other VMs when a new VM is
> started can be fixed using the patch in the thread
> http://openvswitch.org/pipermail/dev/2016-May/071615.html.  I have put up
> an explanation in this thread for the cause of issue especially with multi
> VM setup on OVS DPDK.
>
> On a Multi VM setup even with the above patch applied, one might see
> aggregate throughput difference when vNIC is bind to igb_uio vs virtio-pci,
> this is for the fact that the interrupt overhead is significantly higher
> when virtio-pci is in use.
>
> More importantly if you have setup explicit flows matching VM's MAC/IP,
> disabling the flows to the VM that are idle would improve the aggregate
> throughput and lessen the burden on the pmd thread.   'watch -d
> ./utilities/ovs-appctl dpctl/show -s' will show no. of packet stats.
>
> Regards,
> Bhanu Prakash.
>
>
> >-Original Message-
> >From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Christian
> >Ehrhardt
> >Sent: Wednesday, May 25, 2016 7:08 AM
> >To: Martinx - ? 
> >Cc:  ; dev ;
> >qemu-stable at nongnu.org
> >Subject: Re: [ovs-dev] If 1 KVM Guest loads the virtio-pci, on top of
> >dpdkvhostuser OVS socket interface, it slows down everything!
> >
>
> >Hi again,
> >another forgotten case.
> >
> >I currently I lack the HW to fully reproduce this, but the video summary
> is
> >pretty good and shows the issue in an impressive way.
> >
> >Also the description is good and here as well I wonder if anybody else
> could
> >reproduce this.
> >Any hints / insights are welcome.
> >
> >P.S. and also again - two list cross posting, but here as well it is yet
> unclear
> >which it

[dpdk-dev] [PATCH v5 1/3] mempool: support external handler

2016-05-30 Thread Hunt, David

On 5/30/2016 10:41 AM, Jerin Jacob wrote:
--snip--
>> Of course, that won't help if we need to pass in more data, in which case
>> we'd probably need an
>> opaque data pointer somewhere. It would probably be most useful to pass it
>> in with the
>> alloc, which may need the data. Any suggestions?
> But the top level rte_mempool_create() function needs to pass the data. Right?
> That would be an ABI change. IMO, we need to start thinking about
> passing a struct of config data to rte_mempool_create to create
> backward compatibility on new argument addition to rte_mempool_create()

New mempool handlers will use rte_mempool_create_empty(), 
rte_mempool_set_handler(),
then rte_mempool_populate_*(). These three functions are new to this 
release, to no problem
to add a parameter to one of them for the config data. Also since we're 
adding some new
elements to the mempool structure, how about we add a new pointer for a 
void pointer to a
config data structure, as defined by the handler.

So, new element in rte_mempool struct alongside the *pool
 void *pool;
 void *pool_config;

Then add a param to the rte_mempool_set_handler function:
int
rte_mempool_set_handler(struct rte_mempool *mp, const char *name, void 
*pool_config)

The function would simply set the pointer in the mempool struct, and the 
custom handler
alloc/create function would use as apporopriate as needed. Handlers that 
do not need this
data can be passed NULL.

> Other points in HW assisted pool manager perspective,
>
> 1) May be RING can be replaced with some other higher abstraction name
> for the internal MEMPOOL_F_RING_CREATED flag

Agreed. I'll change to MEMPOOL_F_POOL_CREATED, since we're already 
changing the *ring
element of the mempool struct to *pool

> 2) IMO, It is better to change void *pool in struct rte_mempool to
> anonymous union type, something like below, so that mempool
> implementation can choose the best type.
>   union {
>   void *pool;
>   uint64_t val;
>   }

Could we do this by using the union for the *pool_config suggested 
above, would that give
you what you need?

> 3) int32_t handler_idx creates 4 byte hole in struct rte_mempool in
> 64 bit system. IMO it better to rearrange.(as const struct rte_memzone
> *mz comes next)
OK, Will look at this.

> 4) IMO, It is better to change ext_alloc/ext_free to ext_create/ext_destroy
> as their is no allocation in HW assisted pool manager case,
> it will be mostly creating some HW initialization.

OK, I'll change. I think that makes more sense.

Regards,
Dave.

[dpdk-dev] [PATCH 3/3] kni: move more kernel version check to compat header

2016-05-30 Thread Ferruh Yigit

Signed-off-by: Ferruh Yigit 
---
 lib/librte_eal/linuxapp/kni/compat.h   |  9 +
 lib/librte_eal/linuxapp/kni/kni_misc.c | 30 --
 lib/librte_eal/linuxapp/kni/kni_net.c  |  8 
 3 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/lib/librte_eal/linuxapp/kni/compat.h 
b/lib/librte_eal/linuxapp/kni/compat.h
index d10040d..647ba3c 100644
--- a/lib/librte_eal/linuxapp/kni/compat.h
+++ b/lib/librte_eal/linuxapp/kni/compat.h
@@ -14,16 +14,25 @@

 #endif /* < 2.6.39 */

+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 33)
+#define HAVE_SIMPLIFIED_PERNET_OPERATIONS
+#endif
+
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 35)
 #define sk_sleep(s) (s)->sk_sleep
 #endif

+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 9, 0)
+#define HAVE_CHANGE_CARRIER_CB
+#endif
+
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 19, 0)
 #define HAVE_IOV_ITER_MSGHDR
 #endif

 #if LINUX_VERSION_CODE < KERNEL_VERSION(4, 1, 0)
 #define HAVE_KIOCB_MSG_PARAM
+#define HAVE_REBUILD_HEADER
 #endif

 #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 7, 0)
diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c 
b/lib/librte_eal/linuxapp/kni/kni_misc.c
index e1ec443..4f34232 100644
--- a/lib/librte_eal/linuxapp/kni/kni_misc.c
+++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
@@ -35,6 +35,8 @@
 #include 

 #include 
+
+#include "compat.h"
 #include "kni_dev.h"

 MODULE_LICENSE("Dual BSD/GPL");
@@ -105,7 +107,7 @@ struct kni_net {

 static int __net_init kni_init_net(struct net *net)
 {
-#if LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 32)
+#ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
struct kni_net *knet = net_generic(net, kni_net_id);
 #else
struct kni_net *knet;
@@ -116,7 +118,7 @@ static int __net_init kni_init_net(struct net *net)
ret = -ENOMEM;
return ret;
}
-#endif /* LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 32) */
+#endif

/* Clear the bit of device in use */
clear_bit(KNI_DEV_IN_USE_BIT_NUM, &knet->device_in_use);
@@ -124,7 +126,7 @@ static int __net_init kni_init_net(struct net *net)
init_rwsem(&knet->kni_list_lock);
INIT_LIST_HEAD(&knet->kni_list_head);

-#if LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 32)
+#ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
return 0;
 #else
ret = net_assign_generic(net, kni_net_id, knet);
@@ -132,25 +134,25 @@ static int __net_init kni_init_net(struct net *net)
kfree(knet);

return ret;
-#endif /* LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 32) */
+#endif
 }

 static void __net_exit kni_exit_net(struct net *net)
 {
-#if LINUX_VERSION_CODE <= KERNEL_VERSION(2, 6, 32)
+#ifndef HAVE_SIMPLIFIED_PERNET_OPERATIONS
struct kni_net *knet = net_generic(net, kni_net_id);

kfree(knet);
-#endif /* LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 32) */
+#endif
 }

 static struct pernet_operations kni_net_ops = {
.init = kni_init_net,
.exit = kni_exit_net,
-#if LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 32)
+#ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
.id   = &kni_net_id,
.size = sizeof(struct kni_net),
-#endif /* LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 32) */
+#endif
 };

 static int __init
@@ -165,11 +167,11 @@ kni_init(void)
return -EINVAL;
}

-#if LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 32)
+#ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
rc = register_pernet_subsys(&kni_net_ops);
 #else
rc = register_pernet_gen_subsys(&kni_net_id, &kni_net_ops);
-#endif /* LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 32) */
+#endif
if (rc)
return -EPERM;

@@ -187,11 +189,11 @@ kni_init(void)
return 0;

 out:
-#if LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 32)
+#ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
unregister_pernet_subsys(&kni_net_ops);
 #else
register_pernet_gen_subsys(&kni_net_id, &kni_net_ops);
-#endif /* LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 32) */
+#endif
return rc;
 }

@@ -199,11 +201,11 @@ static void __exit
 kni_exit(void)
 {
misc_deregister(&kni_misc);
-#if LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 32)
+#ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
unregister_pernet_subsys(&kni_net_ops);
 #else
register_pernet_gen_subsys(&kni_net_id, &kni_net_ops);
-#endif /* LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 32) */
+#endif
KNI_PRINT("### DPDK kni module unloaded  ###\n");
 }

diff --git a/lib/librte_eal/linuxapp/kni/kni_net.c 
b/lib/librte_eal/linuxapp/kni/kni_net.c
index 076372c..fc82193 100644
--- a/lib/librte_eal/linuxapp/kni/kni_net.c
+++ b/lib/librte_eal/linuxapp/kni/kni_net.c
@@ -641,7 +641,7 @@ kni_net_header(struct sk_buff *skb, struct net_device *dev,
 /*
  * Re-fill the eth header
  */
-#if (LINUX_VERSION_CODE < KERNEL_VERSION(4, 1, 0))
+#ifdef HAVE_REBUILD_HEADER
 static int
 kni_net_rebuild_header(struct sk_buff *skb)
 {
@@ -671,7 +671,7 @@ static int kni_net_set_mac(struct net_device *netdev, void 
*p)
r

[dpdk-dev] [PATCH 2/3] kni: compat.h syntax update

2016-05-30 Thread Ferruh Yigit

Signed-off-by: Ferruh Yigit 
---
 lib/librte_eal/linuxapp/kni/compat.h | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/linuxapp/kni/compat.h 
b/lib/librte_eal/linuxapp/kni/compat.h
index 0e939e4..d10040d 100644
--- a/lib/librte_eal/linuxapp/kni/compat.h
+++ b/lib/librte_eal/linuxapp/kni/compat.h
@@ -14,19 +14,17 @@

 #endif /* < 2.6.39 */

-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,35)
-
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 35)
 #define sk_sleep(s) (s)->sk_sleep
+#endif

-#endif /* < 2.6.35 */
-
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0)
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 19, 0)
 #define HAVE_IOV_ITER_MSGHDR
 #endif

-#if ( LINUX_VERSION_CODE < KERNEL_VERSION(4,1,0) )
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 1, 0)
 #define HAVE_KIOCB_MSG_PARAM
-#endif /* < 4.1.0 */
+#endif

 #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 7, 0)
 #define HAVE_TRANS_START_HELPER
-- 
2.5.5

[dpdk-dev] [PATCH 1/3] kni: fix compile error for Linux 4.7

2016-05-30 Thread Ferruh Yigit

Fix compile error becuase of Linux API change, 'trans_start' field
removed from 'struct net_device'.

Linux: 9b36627acecd ("net: remove dev->trans_start")

Signed-off-by: Ferruh Yigit 
---
 lib/librte_eal/linuxapp/kni/compat.h  | 4 
 lib/librte_eal/linuxapp/kni/kni_net.c | 9 -
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/kni/compat.h 
b/lib/librte_eal/linuxapp/kni/compat.h
index cf100b6..0e939e4 100644
--- a/lib/librte_eal/linuxapp/kni/compat.h
+++ b/lib/librte_eal/linuxapp/kni/compat.h
@@ -27,3 +27,7 @@
 #if ( LINUX_VERSION_CODE < KERNEL_VERSION(4,1,0) )
 #define HAVE_KIOCB_MSG_PARAM
 #endif /* < 4.1.0 */
+
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 7, 0)
+#define HAVE_TRANS_START_HELPER
+#endif
diff --git a/lib/librte_eal/linuxapp/kni/kni_net.c 
b/lib/librte_eal/linuxapp/kni/kni_net.c
index 4095382..076372c 100644
--- a/lib/librte_eal/linuxapp/kni/kni_net.c
+++ b/lib/librte_eal/linuxapp/kni/kni_net.c
@@ -38,6 +38,8 @@

 #include 
 #include 
+
+#include "compat.h"
 #include "kni_dev.h"

 #define WD_TIMEOUT 5 /*jiffies */
@@ -426,7 +428,12 @@ kni_net_tx(struct sk_buff *skb, struct net_device *dev)
struct rte_kni_mbuf *pkt_kva = NULL;
struct rte_kni_mbuf *pkt_va = NULL;

-   dev->trans_start = jiffies; /* save the timestamp */
+   /* save the timestamp */
+#ifdef HAVE_TRANS_START_HELPER
+   netif_trans_update(dev);
+#else
+   dev->trans_start = jiffies;
+#endif

/* Check if the length of skb is less than mbuf size */
if (skb->len > kni->mbuf_size)
-- 
2.5.5

[dpdk-dev] [ovs-dev] If 1 KVM Guest loads the virtio-pci, on top of dpdkvhostuser OVS socket interface, it slows down everything!

2016-05-30 Thread Martinx - ジェームズ

Hello Bhanu,

 I'm a little bit confused, you said that the problem can be fixed but,
later, you also said that:

 "On a Multi VM setup even with the above patch applied, one might see
aggregate throughput difference when vNIC is bind to igb_uio vs
virtio-pci"...

 My idea is to use OVS with DPDK in a multi-vm environment but, based on
your answer, this is not possible, because the VM A, can interfere with VM
B... Is that true even with that patch applied, can you confirm this?

 I don't think that diverting the traffic from a VM that loaded virtio-pci
drivers is a doable solution (since you can't predict what the owners of
the VMs will be doing), also, specially because in my env, the DPDK App is
a L2 bridge, so, it receives traffic that is not destined to it (might be
even harder to try to do this)...

 I have all the required hardware to keep testing this, so, let me know
when you guys (Intel / Canonical) have newer versions, I'll test it with
pleasure!   :-)

Thanks!
Thiago

On 25 May 2016 at 11:00, Bodireddy, Bhanuprakash <
bhanuprakash.bodireddy at intel.com> wrote:

> I could reproduce the issue and this can be fixed as below
>
> Firstly, the throughput issues observed with other VMs when a new VM is
> started can be fixed using the patch in the thread
> http://openvswitch.org/pipermail/dev/2016-May/071615.html.  I have put up
> an explanation in this thread for the cause of issue especially with multi
> VM setup on OVS DPDK.
>
> On a Multi VM setup even with the above patch applied, one might see
> aggregate throughput difference when vNIC is bind to igb_uio vs virtio-pci,
> this is for the fact that the interrupt overhead is significantly higher
> when virtio-pci is in use.
>
> More importantly if you have setup explicit flows matching VM's MAC/IP,
> disabling the flows to the VM that are idle would improve the aggregate
> throughput and lessen the burden on the pmd thread.   'watch -d
> ./utilities/ovs-appctl dpctl/show -s' will show no. of packet stats.
>
> Regards,
> Bhanu Prakash.
>
>
> >-Original Message-
> >From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Christian
> >Ehrhardt
> >Sent: Wednesday, May 25, 2016 7:08 AM
> >To: Martinx - ? 
> >Cc:  ; dev ;
> >qemu-stable at nongnu.org
> >Subject: Re: [ovs-dev] If 1 KVM Guest loads the virtio-pci, on top of
> >dpdkvhostuser OVS socket interface, it slows down everything!
> >
> >Hi again,
> >another forgotten case.
> >
> >I currently I lack the HW to fully reproduce this, but the video summary
> is
> >pretty good and shows the issue in an impressive way.
> >
> >Also the description is good and here as well I wonder if anybody else
> could
> >reproduce this.
> >Any hints / insights are welcome.
> >
> >P.S. and also again - two list cross posting, but here as well it is yet
> unclear
> >which it belongs to so I'll keep it as well
> >
> >Christian Ehrhardt
> >Software Engineer, Ubuntu Server
> >Canonical Ltd
> >
> >On Sun, May 22, 2016 at 6:35 PM, Martinx - ?
> >
> >wrote:
> >
> >> Guys,
> >>
> >>  I'm seeing a strange problem here, in my OVS+DPDK deployment, on top
> >> of Ubuntu 16.04 (DPDK 2.2 and OVS 2.5).
> >>
> >>  Here is what I'm trying to do: run OVS with DPDK at the host, for KVM
> >> Guests that also, will be running more DPDK Apps.
> >>
> >>  The host have 2 x 10G NICs, for OVS+DPDK and each KVM Guest receives
> >> its own VLAN tagged traffic (or all tags).
> >>
> >>  There is an IXIA Traffic Generator sending 10G of traffic on both
> >> directions (20G total).
> >>
> >>  Exemplifying, the problem is, lets say that I already have 2 VMs (or
> >> 10) running DPDK Apps (on top of dpdkvhostuser), everything is working
> >> as expected, then, if I boot the 3rd (or 11) KVM Guest, the OVS+DPDK
> >> bridge at the host, slows down, a lot! The 3rd (or 11) VM affects not
> >> only the host, but also, all the other neighbors VMs!!!
> >>
> >>  NOTE: This problem appear since the boot of VM 1.
> >>
> >>  Soon as you, inside of the 3rd VM, bind the VirtIO NIC to the
> >> DPDK-Compative Drivers, the speed comes back to normal. If you bind it
> >> back to "virtio-pci", boom! The OVS+DPDK at the host and all VMs loses
> >> too much speed.
> >>
> >>  This problem is detailed at the following bug report:
> >>
> >> --
> >> The OVS+DPDK dpdkvhostuser socket bridge, only works as expected, if
> >> the KVM Guest also have DPDK drivers loaded:
> >>
> >> https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1577256
> >> --
> >>
> >>  Also, I've recorded a ~15 min screen cast video about this problem,
> >> so, you guys can see exactly what is happening here.
> >>
> >>
> >https://www.youtube.com/v/yHnaSikd9XY?version=3&vq=hd720&autoplay=
> >1
> >>
> >>  * At 5:25, I'm starting a VM that will boot up and load a DPDK App;
> >>
> >>  * At 5:33, OVS+DPDK is messed up, it loses speed;
> >>The KVM running with virtio-pci drivers breaks OVS+DPDK at the
> >> host;
> >>
> >>  * At 6:50, DPDK inside of the KVM guest loads up its drivers,

[dpdk-dev] [PATCH] vhost: fix segfault on bad descriptor address.

2016-05-30 Thread Tan, Jianfeng

Hi,

> -Original Message-
> From: Ilya Maximets [mailto:i.maximets at samsung.com]
> Sent: Friday, May 20, 2016 8:50 PM
> To: dev at dpdk.org; Xie, Huawei; Yuanhan Liu
> Cc: Dyasly Sergey; Heetae Ahn; Tan, Jianfeng; Ilya Maximets
> Subject: [PATCH] vhost: fix segfault on bad descriptor address.
> 
> In current implementation guest application can reinitialize vrings
> by executing start after stop. In the same time host application
> can still poll virtqueue while device stopped in guest and it will
> crash with segmentation fault while vring reinitialization because
> of dereferencing of bad descriptor addresses.
> 
> OVS crash for example:
> <>
> [test-pmd inside guest VM]
> 
>   testpmd> port stop all
>   Stopping ports...
>   Checking link statuses...
>   Port 0 Link Up - speed 1 Mbps - full-duplex
>   Done
>   testpmd> port config all rxq 2
>   testpmd> port config all txq 2
>   testpmd> port start all
>   Configuring Port 0 (socket 0)
>   Port 0: 52:54:00:CB:44:C8
>   Checking link statuses...
>   Port 0 Link Up - speed 1 Mbps - full-duplex
>   Done
> 
> [OVS on host]
>   Program received signal SIGSEGV, Segmentation fault.
>   rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at
> rte_memcpy.h
> 
>   (gdb) bt
>   #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
>   #1  copy_desc_to_mbuf
>   #2  rte_vhost_dequeue_burst
>   #3  netdev_dpdk_vhost_rxq_recv
>   ...
> 
>   (gdb) bt full
>   #0  rte_memcpy
>   ...
>   #1  copy_desc_to_mbuf
>   desc_addr = 0
>   mbuf_offset = 0
>   desc_offset = 12
>   ...
> <>
> 
> Fix that by checking addresses of descriptors before using them.
> 
> Note: For mergeable buffers this patch checks only guest's address for
> zero, but in non-meargeable case host's address checked. This is done
> because checking of host's address in mergeable case requires additional
> refactoring to keep virtqueue in consistent state in case of error.


I agree with you that it should be fixed because malicious guest could launch 
DOS attack on vswitch with the current implementation.

But I don't understand why you do not fix the mergable case in 
copy_mbuf_to_desc_mergable() on where gpa_to_vva() happens? And the change in 
fill_vec_buf(), checking !vq->desc[idx].addr, make any sense?

Thanks,
Jianfeng

[dpdk-dev] [PATCH v3 10/10] doc: update xstats documentation

2016-05-30 Thread Remy Horton

Signed-off-by: Remy Horton 
---
 doc/guides/prog_guide/poll_mode_drv.rst | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/doc/guides/prog_guide/poll_mode_drv.rst 
b/doc/guides/prog_guide/poll_mode_drv.rst
index 7698692..6cd86dd 100644
--- a/doc/guides/prog_guide/poll_mode_drv.rst
+++ b/doc/guides/prog_guide/poll_mode_drv.rst
@@ -299,10 +299,27 @@ Extended Statistics API
 ~~~

 The extended statistics API allows each individual PMD to expose a unique set
-of statistics. The client of the API provides an array of
-``struct rte_eth_xstats`` type. Each ``struct rte_eth_xstats`` contains a
-string and value pair. The amount of xstats exposed, and position of the
-statistic in the array must remain constant during runtime.
+of statistics. Accessing these from application programs is done via three
+functions:
+
+* ``rte_eth_xstats_count``: Fetches the number of extended statistics.
+* ``rte_eth_xstats_get``: Fills in an array of ``struct rte_eth_xstats``
+  with extended statistics.
+* ``rte_eth_xstats_names``: Fills in an array of ``struct rte_eth_name``
+  with extended statistic name lookup information.
+
+Each ``struct rte_eth_xstats`` contains an identifier and value pair, and
+each ``struct rte_eth_xstats_name`` contains an identifier and string pair.
+Each identifier within ``struct rte_eth_xstats`` must have a corresponding
+entry in ``struct rte_eth_xstats_name`` with a matching identifier. These
+identifiers, as well as the number of extended statistic exposed, must
+remain constant during runtime.
+
+Note that extended statistic identifiers are driver-specific, and hence
+might not be the same for different ports. Although it is expected that
+drivers will make the identifiers used within ``struct rte_eth_xstats`` and
+``struct rte_eth_xstats_name`` entries match the entries' array index, this
+property should not be relied on by applications for lookups.

 A naming scheme exists for the strings exposed to clients of the API. This is
 to allow scraping of the API for statistics of interest. The naming scheme uses
-- 
2.5.5

[dpdk-dev] [PATCH v3 09/10] remove name field from struct rte_eth_xstats

2016-05-30 Thread Remy Horton

The current extended ethernet statistics fetching involve doing several
string operations, which causes performance issues if there are lots of
statistics and/or network interfaces. This patch removes the name field
and all its usage of the old API.

Signed-off-by: Remy Horton 
---
 drivers/net/e1000/igb_ethdev.c | 2 --
 drivers/net/fm10k/fm10k_ethdev.c   | 3 ---
 drivers/net/i40e/i40e_ethdev.c | 4 
 drivers/net/i40e/i40e_ethdev_vf.c  | 1 -
 drivers/net/ixgbe/ixgbe_ethdev.c   | 3 ---
 drivers/net/virtio/virtio_ethdev.c | 2 --
 lib/librte_ether/rte_ethdev.c  | 3 ---
 lib/librte_ether/rte_ethdev.h  | 2 --
 8 files changed, 20 deletions(-)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 6d5e46f..2bfd3f8 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -1741,7 +1741,6 @@ eth_igb_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstats *xstats,

/* Extended stats */
for (i = 0; i < IGB_NB_XSTATS; i++) {
-   xstats[i].name[0] = '\0';
xstats[i].id = i;
xstats[i].value = *(uint64_t *)(((char *)hw_stats) +
rte_igb_stats_strings[i].offset);
@@ -1824,7 +1823,6 @@ eth_igbvf_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstats *xstats,
return 0;

for (i = 0; i < IGBVF_NB_XSTATS; i++) {
-   xstats[i].name[0] = '\0';
xstats[i].id = i;
xstats[i].value = *(uint64_t *)(((char *)hw_stats) +
rte_igbvf_stats_strings[i].offset);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 179441d..be7431d 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1310,7 +1310,6 @@ fm10k_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstats *xstats,

/* Global stats */
for (i = 0; i < FM10K_NB_HW_XSTATS; i++) {
-   xstats[count].name[0] = '\0';
xstats[count].value = *(uint64_t *)(((char *)hw_stats) +
fm10k_hw_stats_strings[count].offset);
count++;
@@ -1319,14 +1318,12 @@ fm10k_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstats *xstats,
/* PF queue stats */
for (q = 0; q < FM10K_MAX_QUEUES_PF; q++) {
for (i = 0; i < FM10K_NB_RX_Q_XSTATS; i++) {
-   xstats[count].name[0] = '\0';
xstats[count].value =
*(uint64_t *)(((char *)&hw_stats->q[q]) +
fm10k_hw_stats_rx_q_strings[i].offset);
count++;
}
for (i = 0; i < FM10K_NB_TX_Q_XSTATS; i++) {
-   xstats[count].name[0] = '\0';
xstats[count].value =
*(uint64_t *)(((char *)&hw_stats->q[q]) +
fm10k_hw_stats_tx_q_strings[i].offset);
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index e0ce695..cc77370 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -2284,7 +2284,6 @@ i40e_dev_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstats *xstats,

/* Get stats from i40e_eth_stats struct */
for (i = 0; i < I40E_NB_ETH_XSTATS; i++) {
-   xstats[count].name[0] = '\0';
xstats[count].id = count;
xstats[count].value = *(uint64_t *)(((char *)&hw_stats->eth) +
rte_i40e_stats_strings[i].offset);
@@ -2293,7 +2292,6 @@ i40e_dev_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstats *xstats,

/* Get individiual stats from i40e_hw_port struct */
for (i = 0; i < I40E_NB_HW_PORT_XSTATS; i++) {
-   xstats[count].name[0] = '\0';
xstats[count].id = count;
xstats[count].value = *(uint64_t *)(((char *)hw_stats) +
rte_i40e_hw_port_strings[i].offset);
@@ -2302,7 +2300,6 @@ i40e_dev_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstats *xstats,

for (i = 0; i < I40E_NB_RXQ_PRIO_XSTATS; i++) {
for (prio = 0; prio < 8; prio++) {
-   xstats[count].name[0] = '\0';
xstats[count].id = count;
xstats[count].value =
*(uint64_t *)(((char *)hw_stats) +
@@ -2314,7 +2311,6 @@ i40e_dev_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstats *xstats,

for (i = 0; i < I40E_NB_TXQ_PRIO_XSTATS; i++) {
for (prio = 0; prio < 8; prio++) {
-   xstats[count].name[0] = '\0';
xstats[count].id = count;
xstats[count].value =
*(uint64_t *)(((char *)hw_stats) +
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c 
b/drivers/net/i40e/i40e_ethdev_vf.c
i

[dpdk-dev] [PATCH v3 08/10] app/proc_info: change xstats to use integer ids

2016-05-30 Thread Remy Horton

The current extended ethernet statistics fetching involve doing several
string operations, which causes performance issues if there are lots of
statistics and/or network interfaces. This patch changes the proc_info
application to use the new API that seperates name string and value
queries.

Signed-off-by: Remy Horton 
---
 app/proc_info/main.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/app/proc_info/main.c b/app/proc_info/main.c
index 5f83092..ef71fcf 100644
--- a/app/proc_info/main.c
+++ b/app/proc_info/main.c
@@ -1,7 +1,7 @@
 /*
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -243,11 +243,13 @@ nic_stats_clear(uint8_t port_id)
 static void
 nic_xstats_display(uint8_t port_id)
 {
+   struct rte_eth_xstats_name *ptr_names;
struct rte_eth_xstats *xstats;
int len, ret, i;
+   int idx_name;
static const char *nic_stats_border = "";

-   len = rte_eth_xstats_get(port_id, NULL, 0);
+   len = rte_eth_xstats_count(port_id);
if (len < 0) {
printf("Cannot get xstats count\n");
return;
@@ -258,6 +260,18 @@ nic_xstats_display(uint8_t port_id)
return;
}

+   ptr_names = malloc(sizeof(struct rte_eth_xstats_name) * len);
+   if (ptr_names == NULL) {
+   printf("Cannot allocate memory for xstat names\n");
+   free(xstats);
+   return;
+   }
+   if (len != rte_eth_xstats_names(
+   port_id, ptr_names, len)) {
+   printf("Cannot get xstat names\n");
+   return;
+   }
+
printf("## NIC extended statistics for port %-2d #\n",
   port_id);
printf("%s\n",
@@ -270,11 +284,17 @@ nic_xstats_display(uint8_t port_id)
}

for (i = 0; i < len; i++)
-   printf("%s: %"PRIu64"\n", xstats[i].name, xstats[i].value);
+   for (idx_name = 0; idx_name < len; idx_name++)
+   if (ptr_names[idx_name].id == xstats[i].id) {
+   printf("%s: %lu\n", ptr_names[idx_name].name,
+   xstats[i].value);
+   break;
+   }

printf("%s\n",
   nic_stats_border);
free(xstats);
+   free(ptr_names);
 }

 static void
-- 
2.5.5

[dpdk-dev] [PATCH v3 07/10] app/test-pmd: change xstats to use integer ids

2016-05-30 Thread Remy Horton

The current extended ethernet statistics fetching involve doing several
string operations, which causes performance issues if there are lots of
statistics and/or network interfaces. This patch changes the test-pmd
application to use the new API that seperates name string and value
queries.

Signed-off-by: Remy Horton 
---
 app/test-pmd/config.c | 52 +++
 1 file changed, 40 insertions(+), 12 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 1c552e4..3ba5679 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -233,28 +233,56 @@ void
 nic_xstats_display(portid_t port_id)
 {
struct rte_eth_xstats *xstats;
-   int len, ret, i;
+   int cnt_xstats, idx_xstat, idx_name;
+   struct rte_eth_xstats_name *ptr_names;

printf("## NIC extended statistics for port %-2d\n", port_id);
+   if (!rte_eth_dev_is_valid_port(port_id)) {
+   printf("Error: Invalid port number %i\n", port_id);
+   return;
+   }
+
+   /* Get count */
+   cnt_xstats = rte_eth_xstats_count(port_id);
+   if (cnt_xstats  < 0) {
+   printf("Error: Cannot get count of xstats\n");
+   return;
+   }

-   len = rte_eth_xstats_get(port_id, NULL, 0);
-   if (len < 0) {
-   printf("Cannot get xstats count\n");
+   /* Get id-name lookup table */
+   ptr_names = malloc(sizeof(struct rte_eth_xstats_name) * cnt_xstats);
+   if (ptr_names == NULL) {
+   printf("Cannot allocate memory for xstats lookup\n");
return;
}
-   xstats = malloc(sizeof(xstats[0]) * len);
+   if (cnt_xstats != rte_eth_xstats_names(
+   port_id, ptr_names, cnt_xstats)) {
+   printf("Error: Cannot get xstats lookup\n");
+   return;
+   }
+
+   /* Get stats themselves */
+   xstats = malloc(sizeof(struct rte_eth_xstats) * cnt_xstats);
if (xstats == NULL) {
printf("Cannot allocate memory for xstats\n");
+   free(ptr_names);
return;
}
-   ret = rte_eth_xstats_get(port_id, xstats, len);
-   if (ret < 0 || ret > len) {
-   printf("Cannot get xstats\n");
-   free(xstats);
+   if (cnt_xstats != rte_eth_xstats_get(port_id, xstats, cnt_xstats)) {
+   printf("Error: Unable to get xstats\n");
return;
}
-   for (i = 0; i < len; i++)
-   printf("%s: %"PRIu64"\n", xstats[i].name, xstats[i].value);
+
+   /* Display xstats */
+   for (idx_xstat = 0; idx_xstat < cnt_xstats; idx_xstat++)
+   for (idx_name = 0; idx_name < cnt_xstats; idx_name++)
+   if (ptr_names[idx_name].id == xstats[idx_xstat].id) {
+   printf("%s: %"PRIu64"\n",
+   ptr_names[idx_name].name,
+   xstats[idx_xstat].value);
+   break;
+   }
+   free(ptr_names);
free(xstats);
 }

-- 
2.5.5

[dpdk-dev] [PATCH v3 06/10] drivers/net/virtio: change xstats to use integer ids

2016-05-30 Thread Remy Horton

The current extended ethernet statistics fetching involve doing several
string operations, which causes performance issues if there are lots of
statistics and/or network interfaces. This patch changes the virtio driver
to use the new API that seperates name string and value queries.

Signed-off-by: Remy Horton 
---
 drivers/net/virtio/virtio_ethdev.c | 62 +-
 1 file changed, 55 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index c3fb628..9d549de 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -80,6 +80,9 @@ static void virtio_dev_stats_get(struct rte_eth_dev *dev,
 struct rte_eth_stats *stats);
 static int virtio_dev_xstats_get(struct rte_eth_dev *dev,
 struct rte_eth_xstats *xstats, unsigned n);
+static int virtio_dev_xstats_names(struct rte_eth_dev *dev,
+  struct rte_eth_xstats_name *ptr_names,
+  unsigned limit);
 static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
 static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);
 static int virtio_vlan_filter_set(struct rte_eth_dev *dev,
@@ -615,6 +618,7 @@ static const struct eth_dev_ops virtio_eth_dev_ops = {
.dev_infos_get   = virtio_dev_info_get,
.stats_get   = virtio_dev_stats_get,
.xstats_get  = virtio_dev_xstats_get,
+   .xstats_names= virtio_dev_xstats_names,
.stats_reset = virtio_dev_stats_reset,
.xstats_reset= virtio_dev_stats_reset,
.link_update = virtio_dev_link_update,
@@ -708,6 +712,52 @@ virtio_update_stats(struct rte_eth_dev *dev, struct 
rte_eth_stats *stats)
stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed;
 }

+static int virtio_dev_xstats_names(struct rte_eth_dev *dev,
+  struct rte_eth_xstats_name *ptr_names,
+  __rte_unused unsigned limit)
+{
+   unsigned i;
+   unsigned count = 0;
+   unsigned t;
+
+   unsigned nstats = dev->data->nb_tx_queues * VIRTIO_NB_Q_XSTATS +
+   dev->data->nb_rx_queues * VIRTIO_NB_Q_XSTATS;
+
+   if (ptr_names == NULL) {
+   /* Note: limit checked in rte_eth_xstats_names() */
+
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   struct virtqueue *rxvq = dev->data->rx_queues[i];
+   if (rxvq == NULL)
+   continue;
+   for (t = 0; t < VIRTIO_NB_Q_XSTATS; t++) {
+   snprintf(ptr_names[count].name,
+   sizeof(ptr_names[count].name),
+   "rx_q%u_%s", i,
+   rte_virtio_q_stat_strings[t].name);
+   ptr_names[count].id = count;
+   count++;
+   }
+   }
+
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   struct virtqueue *txvq = dev->data->tx_queues[i];
+   if (txvq == NULL)
+   continue;
+   for (t = 0; t < VIRTIO_NB_Q_XSTATS; t++) {
+   snprintf(ptr_names[count].name,
+   sizeof(ptr_names[count].name),
+   "tx_q%u_%s", i,
+   rte_virtio_q_stat_strings[t].name);
+   ptr_names[count].id = count;
+   count++;
+   }
+   }
+   return count;
+   }
+   return nstats;
+}
+
 static int
 virtio_dev_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstats *xstats,
  unsigned n)
@@ -730,9 +780,8 @@ virtio_dev_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstats *xstats,
unsigned t;

for (t = 0; t < VIRTIO_NB_Q_XSTATS; t++) {
-   snprintf(xstats[count].name, sizeof(xstats[count].name),
-"rx_q%u_%s", i,
-rte_virtio_q_stat_strings[t].name);
+   xstats[count].name[0] = '\0';
+   xstats[count].id = count;
xstats[count].value = *(uint64_t *)(((char *)rxvq) +
rte_virtio_q_stat_strings[t].offset);
count++;

[dpdk-dev] [PATCH v3 05/10] drivers/net/i40e: change xstats to use integer ids

2016-05-30 Thread Remy Horton

The current extended ethernet statistics fetching involve doing several
string operations, which causes performance issues if there are lots of
statistics and/or network interfaces. This patch changes the i40e driver
to use the new API that seperates name string and value queries.

Signed-off-by: Remy Horton 
---
 drivers/net/i40e/i40e_ethdev.c| 81 ---
 drivers/net/i40e/i40e_ethdev_vf.c | 25 ++--
 2 files changed, 89 insertions(+), 17 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 24777d5..e0ce695 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -306,6 +306,9 @@ static void i40e_dev_stats_get(struct rte_eth_dev *dev,
   struct rte_eth_stats *stats);
 static int i40e_dev_xstats_get(struct rte_eth_dev *dev,
   struct rte_eth_xstats *xstats, unsigned n);
+static int i40e_dev_xstats_names(struct rte_eth_dev *dev,
+struct rte_eth_xstats_name *ptr_names,
+unsigned limit);
 static void i40e_dev_stats_reset(struct rte_eth_dev *dev);
 static int i40e_dev_queue_stats_mapping_set(struct rte_eth_dev *dev,
uint16_t queue_id,
@@ -467,6 +470,7 @@ static const struct eth_dev_ops i40e_eth_dev_ops = {
.link_update  = i40e_dev_link_update,
.stats_get= i40e_dev_stats_get,
.xstats_get   = i40e_dev_xstats_get,
+   .xstats_names = i40e_dev_xstats_names,
.stats_reset  = i40e_dev_stats_reset,
.xstats_reset = i40e_dev_stats_reset,
.queue_stats_mapping_set  = i40e_dev_queue_stats_mapping_set,
@@ -2205,6 +2209,59 @@ i40e_xstats_calc_num(void)
(I40E_NB_TXQ_PRIO_XSTATS * 8);
 }

+static int i40e_dev_xstats_names(__rte_unused struct rte_eth_dev *dev,
+struct rte_eth_xstats_name *ptr_names,
+__rte_unused unsigned limit)
+{
+   unsigned count = 0;
+   unsigned i, prio;
+
+   if (ptr_names == NULL)
+   return i40e_xstats_calc_num();
+
+   /* Note: limit checked in rte_eth_xstats_names() */
+
+   /* Get stats from i40e_eth_stats struct */
+   for (i = 0; i < I40E_NB_ETH_XSTATS; i++) {
+   snprintf(ptr_names[count].name, sizeof(ptr_names[count].name),
+"%s", rte_i40e_stats_strings[i].name);
+   ptr_names[count].id = count;
+   count++;
+   }
+
+   /* Get individiual stats from i40e_hw_port struct */
+   for (i = 0; i < I40E_NB_HW_PORT_XSTATS; i++) {
+   snprintf(ptr_names[count].name,
+   sizeof(ptr_names[count].name),
+"%s", rte_i40e_hw_port_strings[i].name);
+   ptr_names[count].id = count;
+   count++;
+   }
+
+   for (i = 0; i < I40E_NB_RXQ_PRIO_XSTATS; i++) {
+   for (prio = 0; prio < 8; prio++) {
+   snprintf(ptr_names[count].name,
+sizeof(ptr_names[count].name),
+"rx_priority%u_%s", prio,
+rte_i40e_rxq_prio_strings[i].name);
+   ptr_names[count].id = count;
+   count++;
+   }
+   }
+
+   for (i = 0; i < I40E_NB_TXQ_PRIO_XSTATS; i++) {
+   for (prio = 0; prio < 8; prio++) {
+   snprintf(ptr_names[count].name,
+sizeof(ptr_names[count].name),
+"tx_priority%u_%s", prio,
+rte_i40e_txq_prio_strings[i].name);
+   ptr_names[count].id = count;
+   count++;
+   }
+   }
+   return count;
+}
+
 static int
 i40e_dev_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstats *xstats,
unsigned n)
@@ -2227,8 +2284,8 @@ i40e_dev_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstats *xstats,

/* Get stats from i40e_eth_stats struct */
for (i = 0; i < I40E_NB_ETH_XSTATS; i++) {
-   snprintf(xstats[count].name, sizeof(xstats[count].name),
-"%s", rte_i40e_stats_strings[i].name);
+   xstats[count].name[0] = '\0';
+   xstats[count].id = count;
xstats[count].value = *(uint64_t *)(((char *)&hw_stats->eth) +
rte_i40e_stats_strings[i].offset);
c

[dpdk-dev] [PATCH v3 04/10] drivers/net/fm10k: change xstats to use integer ids

2016-05-30 Thread Remy Horton

The current extended ethernet statistics fetching involve doing several
string operations, which causes performance issues if there are lots of
statistics and/or network interfaces. This patch changes the fm10k driver
to use the new API that seperates name string and value queries.

Signed-off-by: Remy Horton 
---
 drivers/net/fm10k/fm10k_ethdev.c | 55 +---
 1 file changed, 46 insertions(+), 9 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index c2d377f..179441d 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -1256,6 +1256,47 @@ fm10k_link_update(struct rte_eth_dev *dev,
return 0;
 }

+static int fm10k_xstats_names(__rte_unused struct rte_eth_dev *dev,
+   struct rte_eth_xstats_name *ptr_names, __rte_unused unsigned limit)
+{
+   unsigned i, q;
+   unsigned count = 0;
+
+   if (ptr_names != NULL) {
+   /* Note: limit checked in rte_eth_xstats_names() */
+
+   /* Global stats */
+   for (i = 0; i < FM10K_NB_HW_XSTATS; i++) {
+   snprintf(ptr_names[count].name,
+   sizeof(ptr_names[count].name),
+   "%s", fm10k_hw_stats_strings[count].name);
+   ptr_names[count].id = count;
+   count++;
+   }
+
+   /* PF queue stats */
+   for (q = 0; q < FM10K_MAX_QUEUES_PF; q++) {
+   for (i = 0; i < FM10K_NB_RX_Q_XSTATS; i++) {
+   snprintf(ptr_names[count].name,
+   sizeof(ptr_names[count].name),
+   "rx_q%u_%s", q,
+   fm10k_hw_stats_rx_q_strings[i].name);
+   ptr_names[count].id = count;
+   count++;
+   }
+   for (i = 0; i < FM10K_NB_TX_Q_XSTATS; i++) {
+   snprintf(ptr_names[count].name,
+   sizeof(ptr_names[count].name),
+   "tx_q%u_%s", q,
+   fm10k_hw_stats_tx_q_strings[i].name);
+   ptr_names[count].id = count;
+   count++;
+   }
+   }
+   }
+   return FM10K_NB_XSTATS;
+}
+
 static int
 fm10k_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstats *xstats,
 unsigned n)
@@ -1269,8 +1310,7 @@ fm10k_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstats *xstats,

/* Global stats */
for (i = 0; i < FM10K_NB_HW_XSTATS; i++) {
-   snprintf(xstats[count].name, sizeof(xstats[count].name),
-"%s", fm10k_hw_stats_strings[count].name);
+   xstats[count].name[0] = '\0';
xstats[count].value = *(uint64_t *)(((char *)hw_stats) +
fm10k_hw_stats_strings[count].offset);
count++;
@@ -1279,18 +1319,14 @@ fm10k_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstats *xstats,
/* PF queue stats */
for (q = 0; q < FM10K_MAX_QUEUES_PF; q++) {
for (i = 0; i < FM10K_NB_RX_Q_XSTATS; i++) {
-   snprintf(xstats[count].name, sizeof(xstats[count].name),
-"rx_q%u_%s", q,
-fm10k_hw_stats_rx_q_strings[i].name);
+   xstats[count].name[0] = '\0';
xstats[count].value =
*(uint64_t *)(((char *)&hw_stats->q[q]) +
fm10k_hw_stats_rx_q_strings[i].offset);
count++;
}
for (i = 0; i < FM10K_NB_TX_Q_XSTATS; i++) {
-   snprintf(xstats[count].name, sizeof(xstats[count].name),
-"tx_q%u_%s", q,
-fm10k_hw_stats_tx_q_strings[i].name);
+   xstats[count].name[0] = '\0';
xstats[count].value =
*(uint64_t *)(((char *)&hw_stats->q[q]) +
fm10k_hw_stats_tx_q_strings[i].offset);
@@ -2629,6 +2665,7 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
.allmulticast_disable   = fm10k_dev_allmulticast_disable,
.stats_get  = fm10k_stats_get,
.xstats_get = fm10k_xstats_get,
+   .xstats_names   = fm10k_xstats_names,
.stats_

[dpdk-dev] [PATCH v3 03/10] drivers/net/e1000: change xstats to use integer ids

2016-05-30 Thread Remy Horton

The current extended ethernet statistics fetching involve doing several
string operations, which causes performance issues if there are lots of
statistics and/or network interfaces. This patch changes the e1000 driver
to use the new API that seperates name string and value queries.

Signed-off-by: Remy Horton 
---
 drivers/net/e1000/igb_ethdev.c | 52 ++
 1 file changed, 48 insertions(+), 4 deletions(-)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index f0921ee..6d5e46f 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -100,6 +100,9 @@ static void eth_igb_stats_get(struct rte_eth_dev *dev,
struct rte_eth_stats *rte_stats);
 static int eth_igb_xstats_get(struct rte_eth_dev *dev,
  struct rte_eth_xstats *xstats, unsigned n);
+static int eth_igb_xstats_names(struct rte_eth_dev *dev,
+   struct rte_eth_xstats_name *ptr_names,
+   unsigned limit);
 static void eth_igb_stats_reset(struct rte_eth_dev *dev);
 static void eth_igb_xstats_reset(struct rte_eth_dev *dev);
 static void eth_igb_infos_get(struct rte_eth_dev *dev,
@@ -165,6 +168,9 @@ static void eth_igbvf_stats_get(struct rte_eth_dev *dev,
struct rte_eth_stats *rte_stats);
 static int eth_igbvf_xstats_get(struct rte_eth_dev *dev,
struct rte_eth_xstats *xstats, unsigned n);
+static int eth_igbvf_xstats_names(struct rte_eth_dev *dev,
+ struct rte_eth_xstats_name *ptr_names,
+ unsigned limit);
 static void eth_igbvf_stats_reset(struct rte_eth_dev *dev);
 static int igbvf_vlan_filter_set(struct rte_eth_dev *dev,
uint16_t vlan_id, int on);
@@ -324,6 +330,7 @@ static const struct eth_dev_ops eth_igb_ops = {
.link_update  = eth_igb_link_update,
.stats_get= eth_igb_stats_get,
.xstats_get   = eth_igb_xstats_get,
+   .xstats_names = eth_igb_xstats_names,
.stats_reset  = eth_igb_stats_reset,
.xstats_reset = eth_igb_xstats_reset,
.dev_infos_get= eth_igb_infos_get,
@@ -385,6 +392,7 @@ static const struct eth_dev_ops igbvf_eth_dev_ops = {
.link_update  = eth_igb_link_update,
.stats_get= eth_igbvf_stats_get,
.xstats_get   = eth_igbvf_xstats_get,
+   .xstats_names = eth_igbvf_xstats_names,
.stats_reset  = eth_igbvf_stats_reset,
.xstats_reset = eth_igbvf_stats_reset,
.vlan_filter_set  = igbvf_vlan_filter_set,
@@ -1691,6 +1699,26 @@ eth_igb_xstats_reset(struct rte_eth_dev *dev)
memset(stats, 0, sizeof(*stats));
 }

+static int eth_igb_xstats_names(__rte_unused struct rte_eth_dev *dev,
+   struct rte_eth_xstats_name *ptr_names,
+   __rte_unused unsigned limit)
+{
+   unsigned i;
+
+   if (ptr_names == NULL)
+   return IGB_NB_XSTATS;
+
+   /* Note: limit checked in rte_eth_xstats_names() */
+
+   for (i = 0; i < IGB_NB_XSTATS; i++) {
+   snprintf(ptr_names[i].name, sizeof(ptr_names[i].name),
+"%s", rte_igb_stats_strings[i].name);
+   ptr_names[i].id = i;
+   }
+
+   return IGB_NB_XSTATS;
+}
+
 static int
 eth_igb_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstats *xstats,
   unsigned n)
@@ -1713,8 +1741,8 @@ eth_igb_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstats *xstats,

/* Extended stats */
for (i = 0; i < IGB_NB_XSTATS; i++) {
-   snprintf(xstats[i].name, sizeof(xstats[i].name),
-"%s", rte_igb_stats_strings[i].name);
+   xstats[i].name[0] = '\0';
+   xstats[i].id = i;
xstats[i].value = *(uint64_t *)(((char *)hw_stats) +
rte_igb_stats_strings[i].offset);
}
@@ -1762,6 +1790,22 @@ igbvf_read_stats_registers(struct e1000_hw *hw, struct 
e1000_vf_stats *hw_stats)
hw_stats->last_gotlbc, hw_stats->gotlbc);
 }

+static int eth_igbvf_xstats_names(__rte_unused struct rte_eth_dev *dev,
+ struct rte_eth_xstats_name *ptr_names,
+ __rte_unused unsigned limit)
+{
+   unsigned i;
+
+   if (ptr_names != NULL)
+   for (i = 0; i < IGBVF_NB_XSTATS; i++) {
+   snprintf(ptr_names[i].name,
+   sizeof(ptr_names[i].name), "%s",
+   rte_igbvf_stats_strings[i].name);
+   ptr_names[i].id = i;
+   }
+   return IGBVF_NB_XSTATS;
+}
+
 static int
 eth_igbvf_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstats *xstats,
 unsigned n)
@@ -1780,8 +1824

[dpdk-dev] [PATCH v3 02/10] drivers/net/ixgbe: change xstats to use integer ids

2016-05-30 Thread Remy Horton

The current extended ethernet statistics fetching involve doing several
string operations, which causes performance issues if there are lots of
statistics and/or network interfaces. This patch changes the ixgbe driver
to use the new API that seperates name string and value queries.

Signed-off-by: Remy Horton 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 101 +--
 1 file changed, 86 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index a2b170b..bb5940b 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -179,6 +179,10 @@ static int ixgbevf_dev_xstats_get(struct rte_eth_dev *dev,
  struct rte_eth_xstats *xstats, unsigned n);
 static void ixgbe_dev_stats_reset(struct rte_eth_dev *dev);
 static void ixgbe_dev_xstats_reset(struct rte_eth_dev *dev);
+static int ixgbe_dev_xstats_names(__rte_unused struct rte_eth_dev *dev,
+   struct rte_eth_xstats_name *ptr_names, __rte_unused unsigned limit);
+static int ixgbevf_dev_xstats_names(__rte_unused struct rte_eth_dev *dev,
+   struct rte_eth_xstats_name *ptr_names, __rte_unused unsigned limit);
 static int ixgbe_dev_queue_stats_mapping_set(struct rte_eth_dev *eth_dev,
 uint16_t queue_id,
 uint8_t stat_idx,
@@ -466,6 +470,7 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
.xstats_get   = ixgbe_dev_xstats_get,
.stats_reset  = ixgbe_dev_stats_reset,
.xstats_reset = ixgbe_dev_xstats_reset,
+   .xstats_names = ixgbe_dev_xstats_names,
.queue_stats_mapping_set = ixgbe_dev_queue_stats_mapping_set,
.dev_infos_get= ixgbe_dev_info_get,
.dev_supported_ptypes_get = ixgbe_dev_supported_ptypes_get,
@@ -555,6 +560,7 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops = {
.xstats_get   = ixgbevf_dev_xstats_get,
.stats_reset  = ixgbevf_dev_stats_reset,
.xstats_reset = ixgbevf_dev_stats_reset,
+   .xstats_names = ixgbevf_dev_xstats_names,
.dev_close= ixgbevf_dev_close,
.allmulticast_enable  = ixgbevf_dev_allmulticast_enable,
.allmulticast_disable = ixgbevf_dev_allmulticast_disable,
@@ -685,6 +691,7 @@ static const struct rte_ixgbe_xstats_name_off 
rte_ixgbe_rxq_strings[] = {

 #define IXGBE_NB_RXQ_PRIO_STATS (sizeof(rte_ixgbe_rxq_strings) / \
   sizeof(rte_ixgbe_rxq_strings[0]))
+#define IXGBE_NB_RXQ_PRIO_VALUES 8

 static const struct rte_ixgbe_xstats_name_off rte_ixgbe_txq_strings[] = {
{"xon_packets", offsetof(struct ixgbe_hw_stats, pxontxc)},
@@ -695,6 +702,7 @@ static const struct rte_ixgbe_xstats_name_off 
rte_ixgbe_txq_strings[] = {

 #define IXGBE_NB_TXQ_PRIO_STATS (sizeof(rte_ixgbe_txq_strings) / \
   sizeof(rte_ixgbe_txq_strings[0]))
+#define IXGBE_NB_TXQ_PRIO_VALUES 8

 static const struct rte_ixgbe_xstats_name_off rte_ixgbevf_stats_strings[] = {
{"rx_multicast_packets", offsetof(struct ixgbevf_hw_stats, vfmprc)},
@@ -2695,8 +2703,75 @@ ixgbe_dev_stats_reset(struct rte_eth_dev *dev)
 /* This function calculates the number of xstats based on the current config */
 static unsigned
 ixgbe_xstats_calc_num(void) {
-   return IXGBE_NB_HW_STATS + (IXGBE_NB_RXQ_PRIO_STATS * 8) +
-   (IXGBE_NB_TXQ_PRIO_STATS * 8);
+   return IXGBE_NB_HW_STATS +
+   (IXGBE_NB_RXQ_PRIO_STATS * IXGBE_NB_RXQ_PRIO_VALUES) +
+   (IXGBE_NB_TXQ_PRIO_STATS * IXGBE_NB_TXQ_PRIO_VALUES);
+}
+
+static int ixgbe_dev_xstats_names(__rte_unused struct rte_eth_dev *dev,
+   struct rte_eth_xstats_name *ptr_names, __rte_unused unsigned limit)
+{
+   const unsigned cnt_stats = ixgbe_xstats_calc_num();
+   unsigned stat, i, count;
+
+   if (ptr_names != NULL) {
+   count = 0;
+
+   /* Note: limit >= cnt_stats checked upstream
+* in rte_eth_xstats_names()
+*/
+
+   /* Extended stats from ixgbe_hw_stats */
+   for (i = 0; i < IXGBE_NB_HW_STATS; i++) {
+   ptr_names[count].id = count;
+   snprintf(ptr_names[count].name,
+   sizeof(ptr_names[count].name),
+   "%s",
+   rte_ixgbe_stats_strings[i].name);
+   count++;
+   }
+
+   /* RX Priority Stats */
+   for (stat = 0; stat < IXGBE_NB_RXQ_PRIO_STATS; stat++) {
+   for (i = 0; i < IXGBE_NB_RXQ_PRIO_VALUES; i++) {
+   ptr_names[count].id = count;
+   snprintf(ptr_names[count].name,
+   sizeof(ptr_names[count].name),
+

[dpdk-dev] [PATCH v3 01/10] rte: change xstats to use integer ids

2016-05-30 Thread Remy Horton

The current extended ethernet statistics fetching involve doing several
string operations, which causes performance issues if there are lots of
statistics and/or network interfaces. This patch changes the API for
xstats to use integer identifiers instead of strings.

Signed-off-by: Remy Horton 
---
 lib/librte_ether/rte_ethdev.c  | 95 +++---
 lib/librte_ether/rte_ethdev.h  | 44 
 lib/librte_ether/rte_ether_version.map |  7 +++
 3 files changed, 138 insertions(+), 8 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index e148028..86fb0bc 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1502,6 +1502,87 @@ rte_eth_stats_reset(uint8_t port_id)
dev->data->rx_mbuf_alloc_failed = 0;
 }

+int
+rte_eth_xstats_count(uint8_t port_id)
+{
+   struct rte_eth_dev *dev;
+   int count;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+   dev = &rte_eth_devices[port_id];
+   if (dev->dev_ops->xstats_names != NULL) {
+   count = (*dev->dev_ops->xstats_names)(dev, NULL, 0);
+   if (count < 0)
+   return count;
+   } else
+   count = 0;
+   count += RTE_NB_STATS;
+   count += dev->data->nb_rx_queues * RTE_NB_RXQ_STATS;
+   count += dev->data->nb_tx_queues * RTE_NB_TXQ_STATS;
+   return count;
+}
+
+int
+rte_eth_xstats_names(uint8_t port_id, struct rte_eth_xstats_name *ptr_names,
+   unsigned limit)
+{
+   struct rte_eth_dev *dev;
+   int cnt_used_entries;
+   int cnt_expected_entries;
+   uint32_t idx, id_queue;
+
+   if (ptr_names == NULL)
+   return -EINVAL;
+   cnt_expected_entries = rte_eth_xstats_count(port_id);
+   if (cnt_expected_entries < 0)
+   return cnt_expected_entries;
+   if ((int)limit < cnt_expected_entries)
+   return -ERANGE;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+   dev = &rte_eth_devices[port_id];
+   if (dev->dev_ops->xstats_names != NULL) {
+   cnt_used_entries = (*dev->dev_ops->xstats_names)(
+   dev, ptr_names, limit);
+   if (cnt_used_entries < 0)
+   return cnt_used_entries;
+   } else
+   /* Driver itself does not support extended stats, but
+* still have basic stats.
+*/
+   cnt_used_entries = 0;
+
+   for (idx = 0; idx < RTE_NB_STATS; idx++) {
+   ptr_names[cnt_used_entries].id = cnt_used_entries;
+   snprintf(ptr_names[cnt_used_entries].name,
+   sizeof(ptr_names[0].name),
+   "%s", rte_stats_strings[idx].name);
+   cnt_used_entries++;
+   }
+   for (id_queue = 0; id_queue < dev->data->nb_rx_queues; id_queue++) {
+   for (idx = 0; idx < RTE_NB_RXQ_STATS; idx++) {
+   ptr_names[cnt_used_entries].id = cnt_used_entries;
+   snprintf(ptr_names[cnt_used_entries].name,
+   sizeof(ptr_names[0].name),
+   "rx_q%u%s",
+   id_queue, rte_rxq_stats_strings[idx].name);
+   cnt_used_entries++;
+   }
+
+   }
+   for (id_queue = 0; id_queue < dev->data->nb_tx_queues; id_queue++) {
+   for (idx = 0; idx < RTE_NB_TXQ_STATS; idx++) {
+   ptr_names[cnt_used_entries].id = cnt_used_entries;
+   snprintf(ptr_names[cnt_used_entries].name,
+   sizeof(ptr_names[0].name),
+   "tx_q%u%s",
+   id_queue, rte_txq_stats_strings[idx].name);
+   cnt_used_entries++;
+   }
+   }
+   return cnt_used_entries;
+}
+
 /* retrieve ethdev extended statistics */
 int
 rte_eth_xstats_get(uint8_t port_id, struct rte_eth_xstats *xstats,
@@ -1546,8 +1627,8 @@ rte_eth_xstats_get(uint8_t port_id, struct rte_eth_xstats 
*xstats,
stats_ptr = RTE_PTR_ADD(ð_stats,
rte_stats_strings[i].offset);
val = *stats_ptr;
-   snprintf(xstats[count].name, sizeof(xstats[count].name),
-   "%s", rte_stats_strings[i].name);
+   xstats[count].name[0] = '\0';
+   xstats[count].id = count + xcount;
xstats[count++].value = val;
}

@@ -1558,9 +1639,8 @@ rte_eth_xstats_get(uint8_t port_id, struct rte_eth_xstats 
*xstats,
rte_rxq_stats_strings[i].offset +
q * sizeof(uint64_t));
val = *stats_ptr;
-   snprintf(xstats[count].name, sizeof(xstats[count].name),
-   "rx_q%u_%s", q,
-

[dpdk-dev] [PATCH v3 00/10] Remove string operations from xstats

2016-05-30 Thread Remy Horton

The current extended ethernet statistics fetching involve doing several
string operations, which causes performance issues if there are lots of
statistics and/or network interfaces. This patchset changes the API for
xstats to use integer identifiers instead of strings and implements
this new API for the ixgbe, i40e, e1000, fm10k, and virtio drivers.

--

v3 changes:
* Corrected ixgbe vf xstats fetching
* Added xstats changes to e1000, f10k, and virtio drivers
* Added cleanup patch that removes now-redundant name field
* Removed ethtool xstats command 
* Removed unused .xstats_count from eth-dev_ops
* Changed test-pmd & proc_info to use new API
* Added documentation update
* Added missing changes to .map file (affected shared lib builds)

v2 changes:
* Fetching xstats count now seperate API function
* Added #define constants for some magic numbers
* Fixed bug with virtual function count fetching
* For non-xstats-supporting drivers, queue stats returned
* Some refactoring/cleanups
* Removed index assumption from example

Remy Horton (10):
  rte: change xstats to use integer ids
  drivers/net/ixgbe: change xstats to use integer ids
  drivers/net/e1000: change xstats to use integer ids
  drivers/net/fm10k: change xstats to use integer ids
  drivers/net/i40e: change xstats to use integer ids
  drivers/net/virtio: change xstats to use integer ids
  app/test-pmd: change xstats to use integer ids
  app/proc_info: change xstats to use integer ids
  remove name field from struct rte_eth_xstats
  doc: update xstats documentation

 app/proc_info/main.c| 26 -
 app/test-pmd/config.c   | 52 +
 doc/guides/prog_guide/poll_mode_drv.rst | 25 +++--
 drivers/net/e1000/igb_ethdev.c  | 50 +++--
 drivers/net/fm10k/fm10k_ethdev.c| 52 ++---
 drivers/net/i40e/i40e_ethdev.c  | 77 +-
 drivers/net/i40e/i40e_ethdev_vf.c   | 24 +++-
 drivers/net/ixgbe/ixgbe_ethdev.c| 98 -
 drivers/net/virtio/virtio_ethdev.c  | 60 +---
 lib/librte_ether/rte_ethdev.c   | 92 ---
 lib/librte_ether/rte_ethdev.h   | 44 ++-
 lib/librte_ether/rte_ether_version.map  |  7 +++
 12 files changed, 527 insertions(+), 80 deletions(-)

-- 
2.5.5

[dpdk-dev] [PATCH v2 2/2] examples/ethtool: get reg width to allocate memory

2016-05-30 Thread z...@semihalf.com

From: Zyta Szpak 

Version 2 of fixing the fixed register width assumption.
Not every device uses 32-bit wide register. The app was allocating too
little space for 64-bit registers which resulted in memory corruption.
This commit resolves this by getting the size of register in bytes for
a specific device. If the device does not implement this function, it
fallsback to sizeof(uint32_t)

Signed-off-by: Zyta Szpak 
---
 examples/ethtool/lib/rte_ethtool.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/examples/ethtool/lib/rte_ethtool.c 
b/examples/ethtool/lib/rte_ethtool.c
index 42e05f1..59191ca 100644
--- a/examples/ethtool/lib/rte_ethtool.c
+++ b/examples/ethtool/lib/rte_ethtool.c
@@ -88,10 +88,14 @@ int
 rte_ethtool_get_regs_len(uint8_t port_id)
 {
int count_regs;
+   int reg_width;

count_regs = rte_eth_dev_get_reg_length(port_id);
+   reg_width = rte_eth_dev_get_reg_width(port_id);
+   if (reg_width < 0)
+   reg_width = sizeof(uint32_t);
if (count_regs > 0)
-   return count_regs * sizeof(uint32_t);
+   return count_regs * reg_width;
return count_regs;
 }

-- 
1.9.1

[dpdk-dev] [PATCH v2 1/2] ethdev: add callback to get register size in bytes

2016-05-30 Thread z...@semihalf.com

From: Zyta Szpak 

Version 2 of fixing the fixed register width assumption.
rte_eth_dev_get_reg_length and rte_eth_dev_get_reg callbacks
do not provide register size to the app in any way. It is
needed to allocate proper number of bytes before retrieving
registers content with rte_eth_dev_get_reg.

Signed-off-by: Zyta Szpak 
---
 lib/librte_ether/rte_ethdev.c  | 12 
 lib/librte_ether/rte_ethdev.h  | 18 ++
 lib/librte_ether/rte_ether_version.map |  1 +
 3 files changed, 31 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a31018e..e0765f8 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3231,6 +3231,18 @@ rte_eth_dev_get_reg_length(uint8_t port_id)
 }

 int
+rte_eth_dev_get_reg_width(uint8_t port_id)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+   dev = &rte_eth_devices[port_id];
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_reg_width, -ENOTSUP);
+   return (*dev->dev_ops->get_reg_width)(dev);
+}
+
+int
 rte_eth_dev_get_reg_info(uint8_t port_id, struct rte_dev_reg_info *info)
 {
struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2757510..552eaed 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1292,6 +1292,9 @@ typedef int (*eth_timesync_write_time)(struct rte_eth_dev 
*dev,
 typedef int (*eth_get_reg_length_t)(struct rte_eth_dev *dev);
 /**< @internal Retrieve device register count  */

+typedef int (*eth_get_reg_width_t)(struct rte_eth_dev *dev);
+/**< @internal Retrieve device register byte number */
+
 typedef int (*eth_get_reg_t)(struct rte_eth_dev *dev,
struct rte_dev_reg_info *info);
 /**< @internal Retrieve registers  */
@@ -1455,6 +1458,8 @@ struct eth_dev_ops {

eth_get_reg_length_t get_reg_length;
/**< Get # of registers */
+   eth_get_reg_width_t get_reg_width;
+   /**< Get # of bytes in register */
eth_get_reg_t get_reg;
/**< Get registers */
eth_get_eeprom_length_t get_eeprom_length;
@@ -3971,6 +3976,19 @@ int rte_eth_tx_queue_info_get(uint8_t port_id, uint16_t 
queue_id,
  */
 int rte_eth_dev_get_reg_length(uint8_t port_id);

+/*
+ * Retrieve the number of bytes in register for a specific device
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @return
+ *   - (>=0) number of registers if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - others depends on the specific operations implementation.
+ */
+int rte_eth_dev_get_reg_width(uint8_t port_id);
+
 /**
  * Retrieve device registers and register attributes
  *
diff --git a/lib/librte_ether/rte_ether_version.map 
b/lib/librte_ether/rte_ether_version.map
index 214ecc7..288bc63 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -130,5 +130,6 @@ DPDK_16.04 {
rte_eth_tx_buffer_drop_callback;
rte_eth_tx_buffer_init;
rte_eth_tx_buffer_set_err_callback;
+   rte_eth_dev_get_reg_width;

 } DPDK_2.2;
-- 
1.9.1

[dpdk-dev] [PATCH v2 1/2] ethdev: add callback to get register size in bytes

2016-05-30 Thread z...@semihalf.com

From: Zyta Szpak 

Version 2 of fixing the fixed register width assumption.
rte_eth_dev_get_reg_length and rte_eth_dev_get_reg callbacks
do not provide register size to the app in any way. It is
needed to allocate proper number of bytes before retrieving
registers content with rte_eth_dev_get_reg.

Signed-off-by: Zyta Szpak 
---
 lib/librte_ether/rte_ethdev.c  | 12 
 lib/librte_ether/rte_ethdev.h  | 18 ++
 lib/librte_ether/rte_ether_version.map |  1 +
 3 files changed, 31 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a31018e..e0765f8 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3231,6 +3231,18 @@ rte_eth_dev_get_reg_length(uint8_t port_id)
 }

 int
+rte_eth_dev_get_reg_width(uint8_t port_id)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+   dev = &rte_eth_devices[port_id];
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_reg_width, -ENOTSUP);
+   return (*dev->dev_ops->get_reg_width)(dev);
+}
+
+int
 rte_eth_dev_get_reg_info(uint8_t port_id, struct rte_dev_reg_info *info)
 {
struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2757510..552eaed 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1292,6 +1292,9 @@ typedef int (*eth_timesync_write_time)(struct rte_eth_dev 
*dev,
 typedef int (*eth_get_reg_length_t)(struct rte_eth_dev *dev);
 /**< @internal Retrieve device register count  */

+typedef int (*eth_get_reg_width_t)(struct rte_eth_dev *dev);
+/**< @internal Retrieve device register byte number */
+
 typedef int (*eth_get_reg_t)(struct rte_eth_dev *dev,
struct rte_dev_reg_info *info);
 /**< @internal Retrieve registers  */
@@ -1455,6 +1458,8 @@ struct eth_dev_ops {

eth_get_reg_length_t get_reg_length;
/**< Get # of registers */
+   eth_get_reg_width_t get_reg_width;
+   /**< Get # of bytes in register */
eth_get_reg_t get_reg;
/**< Get registers */
eth_get_eeprom_length_t get_eeprom_length;
@@ -3971,6 +3976,19 @@ int rte_eth_tx_queue_info_get(uint8_t port_id, uint16_t 
queue_id,
  */
 int rte_eth_dev_get_reg_length(uint8_t port_id);

+/*
+ * Retrieve the number of bytes in register for a specific device
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @return
+ *   - (>=0) number of registers if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - others depends on the specific operations implementation.
+ */
+int rte_eth_dev_get_reg_width(uint8_t port_id);
+
 /**
  * Retrieve device registers and register attributes
  *
diff --git a/lib/librte_ether/rte_ether_version.map 
b/lib/librte_ether/rte_ether_version.map
index 214ecc7..288bc63 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -130,5 +130,6 @@ DPDK_16.04 {
rte_eth_tx_buffer_drop_callback;
rte_eth_tx_buffer_init;
rte_eth_tx_buffer_set_err_callback;
+   rte_eth_dev_get_reg_width;

 } DPDK_2.2;
-- 
1.9.1

[dpdk-dev] [PATCH 1/2] ethdev: add callback to get register size in bytes

2016-05-30 Thread Zyta Szpak



On 27.05.2016 12:28, Panu Matilainen wrote:
> On 05/25/2016 09:36 AM, zr at semihalf.com wrote:
>> From: Zyta Szpak 
>>
>> Version 2 of fixing the fixed register width assumption.
>> rte_eth_dev_get_reg_length and rte_eth_dev_get_reg callbacks
>> do not provide register size to the app in any way. It is
>> needed to allocate proper number of bytes before retrieving
>> registers content with rte_eth_dev_get_reg.
>>
>> Signed-off-by: Zyta Szpak 
>> ---
>>  lib/librte_ether/rte_ethdev.c | 12 
>>  lib/librte_ether/rte_ethdev.h | 18 ++
>>  2 files changed, 30 insertions(+)
>>
>> diff --git a/lib/librte_ether/rte_ethdev.c 
>> b/lib/librte_ether/rte_ethdev.c
>> index a31018e..e0765f8 100644
>> --- a/lib/librte_ether/rte_ethdev.c
>> +++ b/lib/librte_ether/rte_ethdev.c
>> @@ -3231,6 +3231,18 @@ rte_eth_dev_get_reg_length(uint8_t port_id)
>>  }
>>
>>  int
>> +rte_eth_dev_get_reg_width(uint8_t port_id)
>> +{
>> +struct rte_eth_dev *dev;
>> +
>> +RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>> +
>> +dev = &rte_eth_devices[port_id];
>> +RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_reg_width, -ENOTSUP);
>> +return (*dev->dev_ops->get_reg_width)(dev);
>> +}
>> +
>> +int
>>  rte_eth_dev_get_reg_info(uint8_t port_id, struct rte_dev_reg_info 
>> *info)
>>  {
>>  struct rte_eth_dev *dev;
>> diff --git a/lib/librte_ether/rte_ethdev.h 
>> b/lib/librte_ether/rte_ethdev.h
>> index 2757510..552eaed 100644
>> --- a/lib/librte_ether/rte_ethdev.h
>> +++ b/lib/librte_ether/rte_ethdev.h
>> @@ -1292,6 +1292,9 @@ typedef int (*eth_timesync_write_time)(struct 
>> rte_eth_dev *dev,
>>  typedef int (*eth_get_reg_length_t)(struct rte_eth_dev *dev);
>>  /**< @internal Retrieve device register count  */
>>
>> +typedef int (*eth_get_reg_width_t)(struct rte_eth_dev *dev);
>> +/**< @internal Retrieve device register byte number */
>> +
>>  typedef int (*eth_get_reg_t)(struct rte_eth_dev *dev,
>>  struct rte_dev_reg_info *info);
>>  /**< @internal Retrieve registers  */
>> @@ -1455,6 +1458,8 @@ struct eth_dev_ops {
>>
>>  eth_get_reg_length_t get_reg_length;
>>  /**< Get # of registers */
>> +eth_get_reg_width_t get_reg_width;
>> +/**< Get # of bytes in register */
>>  eth_get_reg_t get_reg;
>>  /**< Get registers */
>>  eth_get_eeprom_length_t get_eeprom_length;
>
> This is an ABI break, but maybe it is part of that "driver 
> implementation API" which is exempt from the ABI/API policies. Thomas?
>
>> @@ -3971,6 +3976,19 @@ int rte_eth_tx_queue_info_get(uint8_t port_id, 
>> uint16_t queue_id,
>>   */
>>  int rte_eth_dev_get_reg_length(uint8_t port_id);
>>
>> +/*
>> + * Retrieve the number of bytes in register for a specific device
>> + *
>> + * @param port_id
>> + *   The port identifier of the Ethernet device.
>> + * @return
>> + *   - (>=0) number of registers if successful.
>> + *   - (-ENOTSUP) if hardware doesn't support.
>> + *   - (-ENODEV) if *port_id* invalid.
>> + *   - others depends on the specific operations implementation.
>> + */
>> +int rte_eth_dev_get_reg_width(uint8_t port_id);
>> +
>>  /**
>>   * Retrieve device registers and register attributes
>>   *
>
> The function needs to be exported via rte_ether_version.map as well.
OK, right!
>
> - Panu -
>>
>

[dpdk-dev] [PATCH] virtio: split virtio rx/tx queue

2016-05-30 Thread Yuanhan Liu

On Mon, May 30, 2016 at 02:40:00AM +, Xie, Huawei wrote:
> On 5/27/2016 5:06 PM, Yuanhan Liu wrote:
> > On Tue, May 24, 2016 at 09:38:32PM +0800, Huawei Xie wrote:
> >>vq->vq_ring_mem = mz->phys_addr;
> >>vq->vq_ring_virt_mem = mz->addr;
> >> -  PMD_INIT_LOG(DEBUG, "vq->vq_ring_mem:  0x%"PRIx64, 
> >> (uint64_t)mz->phys_addr);
> >> -  PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%"PRIx64, 
> >> (uint64_t)(uintptr_t)mz->addr);
> >> -  vq->virtio_net_hdr_mz  = NULL;
> >> -  vq->virtio_net_hdr_mem = 0;
> >> +  PMD_INIT_LOG(DEBUG, "vq->vq_ring_mem:  0x%"PRIx64,
> >> +   (uint64_t)mz->phys_addr);
> >> +  PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%"PRIx64,
> >> +   (uint64_t)(uintptr_t)mz->addr);
> >> +
> >> +  hdr_mz = rte_memzone_reserve_aligned(vq_hdr_name, sz_hdr_mz, socket_id,
> >> +   0, RTE_CACHE_LINE_SIZE);
> > We don't need allocate hdr_mz for Rx queue, and in such case, sz_hdr_mz
> > is 0. I'm wondering what hdr_mz would be then, NULL?
> >
> > Anyway, you should skip the hdr_mz allocation for Rx queue, and I also
> > would suggest you to move the vq_hdr_name setup here.
> 
> will check sz_hdr_mz before the zone allocation.
> 
> 
> >
> >> +  if (hdr_mz == NULL) {
> >> +  if (rte_errno == EEXIST)
> >> +  hdr_mz = rte_memzone_lookup(vq_hdr_name);
> >> +  if (hdr_mz == NULL) {
> >> +  ret = -ENOMEM;
> >> +  goto fail_q_alloc;
> >> +  }
> >> +  }
> >>  
> > ...
> >>  
> >>PMD_INIT_FUNC_TRACE();
> >>ret = virtio_dev_queue_setup(dev, VTNET_CQ, VTNET_SQ_CQ_QUEUE_IDX,
> >> -  vtpci_queue_idx, 0, socket_id, &vq);
> >> +  vtpci_queue_idx, 0, socket_id, (void **)&cvq);
> > Unnecessary cast. Note that there are few others like that in this
> > patch.
> 
> This cast is needed.

Oh, right, indeed. Sorry.

> >
> >> -  PMD_RX_LOG(DEBUG, "dequeue:%d", num);
> >> -  PMD_RX_LOG(DEBUG, "packet len:%d", len[0]);
> >> +  PMD_RX_LOG(DEBUG, "dequeue:%d\n", num);
> >> +  PMD_RX_LOG(DEBUG, "packet len:%d\n", len[0]);
> > We should not append "\n" for PMD_RX_LOG; this macro alreadys does it.
> 
> Weird. Will remove it. Thanks.
> 
> >
> > Another note is that you might want to run checkpatch; I saw quite many
> > warnings.
> 
> Had checked. The warnings are all due to 80 char limitation of
> virtio_rxq_stats_strings. Just 4 or 5 chars cross 80 line limit. I
> prefer to keep the fields aligned.

Agreed. However, I was talking about others warnings.

--yliu

---
CHECK:BRACES: braces {} should be used on all arms of this statement
#198: FILE: drivers/net/virtio/virtio_ethdev.c:343:
+   if (queue_type == VTNET_RQ)
[...]
+   else if (queue_type == VTNET_TQ) {
[...]
+   } else if (queue_type == VTNET_CQ) {
[...]

CHECK:CAMELCASE: Avoid CamelCase: 
#280: FILE: drivers/net/virtio/virtio_ethdev.c:404:
+   PMD_INIT_LOG(DEBUG, "vq->vq_ring_mem:  0x%"PRIx64,

CHECK:CONCATENATED_STRING: Concatenated strings should use spaces
between elements
#280: FILE: drivers/net/virtio/virtio_ethdev.c:404:
+   PMD_INIT_LOG(DEBUG, "vq->vq_ring_mem:  0x%"PRIx64,

CHECK:CONCATENATED_STRING: Concatenated strings should use spaces
between elements
#282: FILE: drivers/net/virtio/virtio_ethdev.c:406:
+   PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%"PRIx64,

WARNING:UNSPECIFIED_INT: Prefer 'unsigned int' to bare use of 'unsigned'
#482: FILE: drivers/net/virtio/virtio_ethdev.c:753:
+   unsigned nstats = dev->data->nb_tx_queues * VIRTIO_NB_TXQ_XSTATS
+

WARNING:LONG_LINE: line over 80 characters
#551: FILE: drivers/net/virtio/virtio_ethdev.c:819:
+   memset(txvq->stats.size_bins, 0,
sizeof(txvq->stats.size_bins[0]) * 8);

WARNING:LONG_LINE: line over 80 characters
#571: FILE: drivers/net/virtio/virtio_ethdev.c:832:
+   memset(rxvq->stats.size_bins, 0,
sizeof(rxvq->stats.size_bins[0]) * 8);

CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#1529: FILE: drivers/net/virtio/virtio_rxtx_simple.c:362:
+   nb_commit = nb_pkts = RTE_MIN((vq->vq_free_cnt >> 1), nb_pkts);

CHECK:SPACING: No space is necessary after a cast
#1530: FILE: drivers/net/virtio/virtio_rxtx_simple.c:363:
+   desc_idx = (uint16_t) (vq->vq_avail_idx & desc_idx_max);

[dpdk-dev] [PATCH 1/2] ethdev: add callback to get register size in bytes

2016-05-30 Thread Zyta Szpak

Hi,
It is the standard DPDK return value -ENOTSUP when the function is not 
supported by Ethernet device. I think it is safer to keep it this way 
rather than default implicitly to sizeof(uint32_t) and more generic.

Regards,
Zyta

On 25.05.2016 15:14, Remy Horton wrote:
> 'noon,
>
> Was expecting rte_eth_dev_get_reg_width() itself to default to 
> sizeof(uint32_t) rather than -ENOTSUP, but that is purely personal 
> taste which others might disagree with. You'll also need a 
> documentation update & Fixes: line.
>
>
> On 25/05/2016 07:36, zr at semihalf.com wrote:
>> From: Zyta Szpak 
> [..]
>> Signed-off-by: Zyta Szpak 
>
> Acked-by: Remy Horton

[dpdk-dev] [PATCH v5 8/8] doc: update doc for virtio-user

2016-05-30 Thread Jianfeng Tan

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
---
 doc/guides/rel_notes/release_16_07.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_07.rst 
b/doc/guides/rel_notes/release_16_07.rst
index f6d543c..b1054b6 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -34,6 +34,10 @@ This section should contain new features added in this 
release. Sample format:

   Refer to the previous release notes for examples.

+* **Virtio support for containers.**
+
+  Add a new virtual device, named virtio-user, to support virtio for 
containers.
+

 Resolved Issues
 ---
-- 
2.1.4

[dpdk-dev] [PATCH v5 7/8] virtio-user: add a new vdev named virtio-user

2016-05-30 Thread Jianfeng Tan

Add a new virtual device named vhost-user, which can be used just like
eth_ring, eth_null, etc. To reuse the code of original virtio, we do
some adjustment in virtio_ethdev.c, such as remove key _static_ of
eth_virtio_dev_init() so that it can be reused in virtual device; and
we add some check to make sure it will not crash.

Configured parameters include:
  - queues (optional, 1 by default), number of queue pairs, multi-queue
not supported for now.
  - cq (optional, 0 by default), not supported for now.
  - mac (optional), random value will be given if not specified.
  - queue_size (optional, 256 by default), size of virtqueues.
  - path (madatory), path of vhost, depends on the file type, vhost
user if the given path points to a unix socket; vhost-net if the
given path points to a char device.
  - ifname (optional), specify the name of backend tap device; only
valid when backend is vhost-net.

When enable CONFIG_RTE_VIRTIO_VDEV (enabled by default), the compiled
library can be used in both VM and container environment.

Examples:
path_vhost=/dev/vhost-net # use vhost-net as a backend
path_vhost= # use vhost-user as a backend

sudo ./examples/l2fwd/build/l2fwd -c 0x10 -n 4 \
--socket-mem 0,1024 --no-pci --file-prefix=l2fwd \
--vdev=virtio-user0,mac=00:01:02:03:04:05,path=$path_vhost -- -p 0x1

Known issues:
 - Control queue and multi-queue are not supported yet.
 - Cannot work with --huge-unlink.
 - Cannot work with no-huge.
 - Cannot work when there are more than VHOST_MEMORY_MAX_NREGIONS(8)
   hugepages.
 - Root privilege is a must (mainly becase of sorting hugepages according
   to physical address).
 - Applications should not use file name like HUGEFILE_FMT ("%smap_%d").

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-by: Neil Horman 
---
 drivers/net/virtio/virtio_ethdev.c   |  19 +-
 drivers/net/virtio/virtio_ethdev.h   |   2 +
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 309 +++
 3 files changed, 323 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 1866afd..f8972f2 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -59,7 +59,6 @@
 #include "virtqueue.h"
 #include "virtio_rxtx.h"

-static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
 static int eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev);
 static int  virtio_dev_configure(struct rte_eth_dev *dev);
 static int  virtio_dev_start(struct rte_eth_dev *dev);
@@ -1038,7 +1037,7 @@ rx_func_get(struct rte_eth_dev *eth_dev)
  * This function is based on probe() function in virtio_pci.c
  * It returns 0 on success.
  */
-static int
+int
 eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 {
struct virtio_hw *hw = eth_dev->data->dev_private;
@@ -1069,9 +1068,11 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

pci_dev = eth_dev->pci_dev;

-   ret = vtpci_init(pci_dev, hw, &dev_flags);
-   if (ret)
-   return ret;
+   if (pci_dev) {
+   ret = vtpci_init(pci_dev, hw, &dev_flags);
+   if (ret)
+   return ret;
+   }

/* Reset the device although not necessary at startup */
vtpci_reset(hw);
@@ -1163,7 +1164,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d   hw->max_tx_queues=%d",
hw->max_rx_queues, hw->max_tx_queues);
-   PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
+   if (pci_dev)
+   PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
eth_dev->data->port_id, pci_dev->id.vendor_id,
pci_dev->id.device_id);

@@ -1442,7 +1444,10 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
 {
struct virtio_hw *hw = dev->data->dev_private;

-   dev_info->driver_name = dev->driver->pci_drv.name;
+   if (dev->pci_dev)
+   dev_info->driver_name = dev->driver->pci_drv.name;
+   else
+   dev_info->driver_name = "virtio-user PMD";
dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
dev_info->min_rx_bufsize = VIRTIO_MIN_RX_BUFSIZE;
diff --git a/drivers/net/virtio/virtio_ethdev.h 
b/drivers/net/virtio/virtio_ethdev.h
index 66423a0..284afaa 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -113,6 +113,8 @@ uint16_t virtio_recv_pkts_vec(void *rx_queue, struct 
rte_mbuf **rx_pkts,
 uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

+int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
+
 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
  * frames larger than 1514 bytes. We do not yet support software LRO
diff --git a/

[dpdk-dev] [PATCH v5 6/8] virtio-user: add new virtual pci driver for virtio

2016-05-30 Thread Jianfeng Tan

This patch implements another new instance of struct virtio_pci_ops to
drive the virtio-user virtual device. Instead of rd/wr ioport or PCI
configuration space, this virtual pci driver will rd/wr the virtual
device struct virtio_user_hw, and when necessary, invokes APIs provided
by device emulation later to start/stop the device.

  --
  | -- |
  | | virtio driver  | |> (virtio_user_pci.c)
  | -- |
  | |  |
  | -- | -->  virtio-user PMD
  | | device emulate | |
  | || |
  | | vhost adapter  | |
  | -- |
  --
|
|
|
   --
   | vhost backend  |
   --

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-by: Neil Horman 
---
 drivers/net/virtio/Makefile  |   1 +
 drivers/net/virtio/virtio_pci.h  |   1 +
 drivers/net/virtio/virtio_user/virtio_user_dev.h |   2 +
 drivers/net/virtio/virtio_user/virtio_user_pci.c | 218 +++
 4 files changed, 222 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_user/virtio_user_pci.c

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 68068bd..13b2b75 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -60,6 +60,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_kernel.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_dev.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_pci.c
 endif

 # this lib depends upon:
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index a76daf7..d10d013 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -260,6 +260,7 @@ struct virtio_hw {
struct virtio_pci_common_cfg *common_cfg;
struct virtio_net_config *dev_cfg;
const struct virtio_pci_ops *vtpci_ops;
+   void*virtio_user_dev;
 };

 /*
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h 
b/drivers/net/virtio/virtio_user/virtio_user_dev.h
index 8ca0095..9ebe440 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.h
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h
@@ -57,4 +57,6 @@ struct virtio_user_dev {
 int virtio_user_start_device(struct virtio_user_dev *dev);
 int virtio_user_stop_device(struct virtio_user_dev *dev);

+const struct virtio_pci_ops vdev_ops;
+
 #endif
diff --git a/drivers/net/virtio/virtio_user/virtio_user_pci.c 
b/drivers/net/virtio/virtio_user/virtio_user_pci.c
new file mode 100644
index 000..b56419b
--- /dev/null
+++ b/drivers/net/virtio/virtio_user/virtio_user_pci.c
@@ -0,0 +1,218 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "../virtio_logs.h"
+#include "../virtio_pci.h"
+#include "../virtqueue.h"
+#include "virtio_user_dev.h"
+
+#define virtio_user_get_dev(hw) \
+   ((struct virtio_user_dev *)(hw)->virtio_user_dev);
+
+static void
+vdev_read_dev_config(struct virtio_hw *hw, uint64_t offset,
+void *dst, int length)
+{
+   int i;

[dpdk-dev] [PATCH v5 5/8] virtio-user: add device emulation layer APIs

2016-05-30 Thread Jianfeng Tan

Two device emulation layer APIs are added for virtio driver to call:
  - virtio_user_start_device()
  - virtio_user_stop_device()

These APIs will get called by virtio driver, and they call vhost adapter
layer APIs to implement the functionality. Besides, this patch defines
a struct named virtio_user_dev to help manage the data stands for this
kind of virtual device.

  --
  | -- |
  | | virtio driver  | |
  | -- |
  | |  |
  | -- | -->  virtio-user PMD
  | | device emulate |-|> (virtio_user_dev.c, virtio_user_dev.h)
  | || |
  | | vhost adapter  | |
  | -- |
  --
|
|
|
   --
   | vhost backend  |
   --

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-by: Neil Horman 
---
 drivers/net/virtio/Makefile  |   1 +
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 168 +++
 drivers/net/virtio/virtio_user/virtio_user_dev.h |  60 
 3 files changed, 229 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_user/virtio_user_dev.c
 create mode 100644 drivers/net/virtio/virtio_user/virtio_user_dev.h

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index c9f2bc0..68068bd 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -59,6 +59,7 @@ ifeq ($(CONFIG_RTE_VIRTIO_VDEV),y)
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_kernel.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_dev.c
 endif

 # this lib depends upon:
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c 
b/drivers/net/virtio/virtio_user/virtio_user_dev.c
new file mode 100644
index 000..41d8ad1
--- /dev/null
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -0,0 +1,168 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "vhost.h"
+#include "virtio_user_dev.h"
+#include "../virtio_ethdev.h"
+
+static int
+virtio_user_kick_queue(struct virtio_user_dev *dev, uint32_t queue_sel)
+{
+   int callfd, kickfd;
+   struct vhost_vring_file file;
+   struct vhost_vring_state state;
+   struct vring *vring = &dev->vrings[queue_sel];
+   struct vhost_vring_addr addr = {
+   .index = queue_sel,
+   .desc_user_addr = (uint64_t)(uintptr_t)vring->desc,
+   .avail_user_addr = (uint64_t)(uintptr_t)vring->avail,
+   .used_user_addr = (uint64_t)(uintptr_t)vring->used,
+   .log_guest_addr = 0,
+   .flags = 0, /* disable log */
+   };
+
+   /* May use invalid flag, but some backend leverages kickfd and callfd as
+* criteria to judge if dev is alive. so finally we use real event_fd.
+*/
+   callfd = eventfd(0, O_CLOEXEC | O_NONBLOCK);
+   if (callfd < 0) {
+   PMD_DRV_LOG(ERR, "callfd error, %s\n", strerror(errno));
+   return -1;
+   }
+   kickfd = eventf

[dpdk-dev] [PATCH v5 4/8] virtio-user: add vhost adapter layer

2016-05-30 Thread Jianfeng Tan

This patch is to provide vhost adapter layer implementations. Instead
of relying on a hypervisor to translate between device emulation and
vhost backend, here we directly talk with vhost backend through the
vhost file. Depending on the type of vhost file,
  - vhost-user is used if the given path points to a unix socket;
  - vhost-kernel is used if the given path points to a char device.

Here three main APIs are provided to upper layer (device emulation):
  - vhost_user_setup(), to set up env to talk to a vhost user backend;
  - vhost_kernel_setup(), to set up env to talk to a vhost kernel backend.
  - vhost_call(), to provide a unified interface to communicate with
vhost backend.

  --
  | -- |
  | | virtio driver  | |
  | -- |
  | |  |
  | -- | -->  virtio-user PMD
  | | device emulate | |
  | || |
  | | vhost adapter  |-|> (vhost_user.c, vhost_kernel.c, vhost.c)
  | -- |
  --
|
| -- --> (vhost-user protocol or vhost-net ioctls)
|
   --
   | vhost backend  |
   --

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-by: Neil Horman 
---
 config/common_linuxapp|   3 +
 drivers/net/virtio/Makefile   |   6 +
 drivers/net/virtio/virtio_user/vhost.c| 105 +++
 drivers/net/virtio/virtio_user/vhost.h| 222 +++
 drivers/net/virtio/virtio_user/vhost_kernel.c | 254 +
 drivers/net/virtio/virtio_user/vhost_user.c   | 378 ++
 6 files changed, 968 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_user/vhost.c
 create mode 100644 drivers/net/virtio/virtio_user/vhost.h
 create mode 100644 drivers/net/virtio/virtio_user/vhost_kernel.c
 create mode 100644 drivers/net/virtio/virtio_user/vhost_user.c

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 7e698e2..946a6d4 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -43,3 +43,6 @@ CONFIG_RTE_LIBRTE_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_POWER=y
+
+# Enable virtio-user
+CONFIG_RTE_VIRTIO_VDEV=y
diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index ef84f60..c9f2bc0 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -55,6 +55,12 @@ ifeq ($(findstring 
RTE_MACHINE_CPUFLAG_SSSE3,$(CFLAGS)),RTE_MACHINE_CPUFLAG_SSSE
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
 endif

+ifeq ($(CONFIG_RTE_VIRTIO_VDEV),y)
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_kernel.c
+endif
+
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_mempool lib/librte_mbuf
diff --git a/drivers/net/virtio/virtio_user/vhost.c 
b/drivers/net/virtio/virtio_user/vhost.c
new file mode 100644
index 000..1944a97
--- /dev/null
+++ b/drivers/net/virtio/virtio_user/vhost.c
@@ -0,0 +1,105 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE

[dpdk-dev] [PATCH v5 3/8] virtio: enable use virtual address to fill desc

2016-05-30 Thread Jianfeng Tan

This patch is related to how to calculate relative address for vhost
backend.

The principle is that: based on one or multiple shared memory regions,
vhost maintains a reference system with the frontend start address,
backend start address, and length for each segment, so that each
frontend address (GPA, Guest Physical Address) can be translated into
vhost-recognizable backend address. To make the address translation
efficient, we need to maintain as few regions as possible. In the case
of VM, GPA is always locally continuous. But for some other case, like
virtio-user, we use virtual address here.

It basically means:
  a. when set_base_addr, VA address is used;
  b. when preparing RX's descriptors, VA address is used;
  c. when transmitting packets, VA is filled in TX's descriptors;
  d. in TX and CQ's header, VA is used.

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-by: Neil Horman 
---
 drivers/net/virtio/virtio_ethdev.c  | 21 -
 drivers/net/virtio/virtio_rxtx.c|  5 ++---
 drivers/net/virtio/virtio_rxtx_simple.c | 13 +++--
 drivers/net/virtio/virtqueue.h  | 13 -
 4 files changed, 37 insertions(+), 15 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 781886d..1866afd 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -167,14 +167,14 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,
 * One RX packet for ACK.
 */
vq->vq_ring.desc[head].flags = VRING_DESC_F_NEXT;
-   vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mz->phys_addr;
+   vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mem;
vq->vq_ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
vq->vq_free_cnt--;
i = vq->vq_ring.desc[head].next;

for (k = 0; k < pkt_num; k++) {
vq->vq_ring.desc[i].flags = VRING_DESC_F_NEXT;
-   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
+   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mem
+ sizeof(struct virtio_net_ctrl_hdr)
+ sizeof(ctrl->status) + sizeof(uint8_t)*sum;
vq->vq_ring.desc[i].len = dlen[k];
@@ -184,7 +184,7 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,
}

vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
-   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
+   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mem
+ sizeof(struct virtio_net_ctrl_hdr);
vq->vq_ring.desc[i].len = sizeof(ctrl->status);
vq->vq_free_cnt--;
@@ -419,8 +419,6 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
vq->virtio_net_hdr_mem = hdr_mz->phys_addr;

memset(hdr_mz->addr, 0, hdr_mz_sz);
-   vring_hdr_desc_init(vq);
-
} else if (queue_type == VTNET_CQ) {
/* Allocate a page for control vq command, data and status */
snprintf(vq_name, sizeof(vq_name), "port%d_cvq_hdrzone",
@@ -441,6 +439,19 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
memset(vq->virtio_net_hdr_mz->addr, 0, PAGE_SIZE);
}

+   if (dev->pci_dev)
+   vq->offset = offsetof(struct rte_mbuf, buf_physaddr);
+   else {
+   vq->vq_ring_mem = (phys_addr_t)vq->mz->addr;
+   vq->offset = offsetof(struct rte_mbuf, buf_addr);
+   if (vq->virtio_net_hdr_mz)
+   vq->virtio_net_hdr_mem =
+   (phys_addr_t)vq->virtio_net_hdr_mz->addr;
+   }
+
+   if (queue_type == VTNET_TQ)
+   vring_hdr_desc_init(vq);
+
if (hw->vtpci_ops->setup_queue(hw, vq) < 0) {
PMD_INIT_LOG(ERR, "setup_queue failed");
virtio_dev_queue_release(vq);
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index f326222..5b0c3df 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -193,8 +193,7 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct 
rte_mbuf *cookie)

start_dp = vq->vq_ring.desc;
start_dp[idx].addr =
-   (uint64_t)(cookie->buf_physaddr + RTE_PKTMBUF_HEADROOM
-   - hw->vtnet_hdr_size);
+   MBUF_DATA_DMA_ADDR(cookie, vq->offset) - hw->vtnet_hdr_size;
start_dp[idx].len =
cookie->buf_len - RTE_PKTMBUF_HEADROOM + hw->vtnet_hdr_size;
start_dp[idx].flags =  VRING_DESC_F_WRITE;
@@ -265,7 +264,7 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct 
rte_mbuf *cookie,
}

do {
-   start_dp[idx].addr  = rte_mbuf_data_dma_addr(cookie);
+   start_dp[idx].addr  = MBUF_DATA_DMA_ADDR(cookie, txvq->offset);
start_dp[idx].len   = cookie->data_len;

[dpdk-dev] [PATCH v5 2/8] virtio: clean up virtio_dev_queue_setup

2016-05-30 Thread Jianfeng Tan

Abstract vring hdr desc init as an inline method.

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
---
 drivers/net/virtio/virtio_ethdev.c | 42 ++
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index a3031e4..781886d 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -278,6 +278,26 @@ virtio_dev_queue_release(struct virtqueue *vq)
}
 }

+static void
+vring_hdr_desc_init(struct virtqueue *vq)
+{
+   int i;
+   struct virtio_tx_region *txr = vq->virtio_net_hdr_mz->addr;
+
+   for (i = 0; i < vq->vq_nentries; i++) {
+   struct vring_desc *start_dp = txr[i].tx_indir;
+
+   vring_desc_init(start_dp, RTE_DIM(txr[i].tx_indir));
+
+   /* first indirect descriptor is always the tx header */
+   start_dp->addr = vq->virtio_net_hdr_mem + i * sizeof(*txr) +
+offsetof(struct virtio_tx_region, tx_hdr);
+
+   start_dp->len = vq->hw->vtnet_hdr_size;
+   start_dp->flags = VRING_DESC_F_NEXT;
+   }
+}
+
 int virtio_dev_queue_setup(struct rte_eth_dev *dev,
int queue_type,
uint16_t queue_idx,
@@ -375,8 +395,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,

if (queue_type == VTNET_TQ) {
const struct rte_memzone *hdr_mz;
-   struct virtio_tx_region *txr;
-   unsigned int i;
+   size_t hdr_mz_sz = vq_size * sizeof(struct virtio_tx_region);

/*
 * For each xmit packet, allocate a virtio_net_hdr
@@ -385,7 +404,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d_hdrzone",
 dev->data->port_id, queue_idx);
hdr_mz = rte_memzone_reserve_aligned(vq_name,
-vq_size * sizeof(*txr),
+hdr_mz_sz,
 socket_id, 0,
 RTE_CACHE_LINE_SIZE);
if (hdr_mz == NULL) {
@@ -399,21 +418,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
vq->virtio_net_hdr_mz = hdr_mz;
vq->virtio_net_hdr_mem = hdr_mz->phys_addr;

-   txr = hdr_mz->addr;
-   memset(txr, 0, vq_size * sizeof(*txr));
-   for (i = 0; i < vq_size; i++) {
-   struct vring_desc *start_dp = txr[i].tx_indir;
-
-   vring_desc_init(start_dp, RTE_DIM(txr[i].tx_indir));
-
-   /* first indirect descriptor is always the tx header */
-   start_dp->addr = vq->virtio_net_hdr_mem
-   + i * sizeof(*txr)
-   + offsetof(struct virtio_tx_region, tx_hdr);
-
-   start_dp->len = vq->hw->vtnet_hdr_size;
-   start_dp->flags = VRING_DESC_F_NEXT;
-   }
+   memset(hdr_mz->addr, 0, hdr_mz_sz);
+   vring_hdr_desc_init(vq);

} else if (queue_type == VTNET_CQ) {
/* Allocate a page for control vq command, data and status */
-- 
2.1.4

[dpdk-dev] [PATCH v5 1/8] virtio: hide phys addr check inside pci ops

2016-05-30 Thread Jianfeng Tan

This patch is to move phys addr check from virtio_dev_queue_setup
to pci ops. To makt that happen, make sure virtio_ops.setup_queue
return the result if we pass through the check.

Signed-off-by: Jianfeng Tan 
Signed-off-by: Huawei Xie 
Acked-by: Yuanhan Liu 
---
 drivers/net/virtio/virtio_ethdev.c | 17 +
 drivers/net/virtio/virtio_pci.c| 30 --
 drivers/net/virtio/virtio_pci.h|  2 +-
 3 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index c3fb628..a3031e4 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -364,17 +364,6 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
}
}

-   /*
-* Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
-* and only accepts 32 bit page frame number.
-* Check if the allocated physical memory exceeds 16TB.
-*/
-   if ((mz->phys_addr + vq->vq_ring_size - 1) >> 
(VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
-   PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
-   virtio_dev_queue_release(vq);
-   return -ENOMEM;
-   }
-
memset(mz->addr, 0, sizeof(mz->len));
vq->mz = mz;
vq->vq_ring_mem = mz->phys_addr;
@@ -446,7 +435,11 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
memset(vq->virtio_net_hdr_mz->addr, 0, PAGE_SIZE);
}

-   hw->vtpci_ops->setup_queue(hw, vq);
+   if (hw->vtpci_ops->setup_queue(hw, vq) < 0) {
+   PMD_INIT_LOG(ERR, "setup_queue failed");
+   virtio_dev_queue_release(vq);
+   return -EINVAL;
+   }

vq->configured = 1;
*pvq = vq;
diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 9cdca06..6bd239c 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -55,6 +55,22 @@
  */
 #define VIRTIO_PCI_CONFIG(hw) (((hw)->use_msix) ? 24 : 20)

+static inline int
+check_vq_phys_addr_ok(struct virtqueue *vq)
+{
+   /* Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
+* and only accepts 32 bit page frame number.
+* Check if the allocated physical memory exceeds 16TB.
+*/
+   if ((vq->vq_ring_mem + vq->vq_ring_size - 1) >>
+   (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+   PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
+   return 0;
+   }
+
+   return 1;
+}
+
 static void
 legacy_read_dev_config(struct virtio_hw *hw, size_t offset,
   void *dst, int length)
@@ -143,15 +159,20 @@ legacy_get_queue_num(struct virtio_hw *hw, uint16_t 
queue_id)
return dst;
 }

-static void
+static int
 legacy_setup_queue(struct virtio_hw *hw, struct virtqueue *vq)
 {
uint32_t src;

+   if (!check_vq_phys_addr_ok(vq))
+   return -1;
+
rte_eal_pci_ioport_write(&hw->io, &vq->vq_queue_index, 2,
 VIRTIO_PCI_QUEUE_SEL);
src = vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
rte_eal_pci_ioport_write(&hw->io, &src, 4, VIRTIO_PCI_QUEUE_PFN);
+
+   return 0;
 }

 static void
@@ -367,12 +388,15 @@ modern_get_queue_num(struct virtio_hw *hw, uint16_t 
queue_id)
return io_read16(&hw->common_cfg->queue_size);
 }

-static void
+static int
 modern_setup_queue(struct virtio_hw *hw, struct virtqueue *vq)
 {
uint64_t desc_addr, avail_addr, used_addr;
uint16_t notify_off;

+   if (!check_vq_phys_addr_ok(vq))
+   return -1;
+
desc_addr = vq->mz->phys_addr;
avail_addr = desc_addr + vq->vq_nentries * sizeof(struct vring_desc);
used_addr = RTE_ALIGN_CEIL(avail_addr + offsetof(struct vring_avail,
@@ -400,6 +424,8 @@ modern_setup_queue(struct virtio_hw *hw, struct virtqueue 
*vq)
PMD_INIT_LOG(DEBUG, "\t used_addr: %" PRIx64, used_addr);
PMD_INIT_LOG(DEBUG, "\t notify addr: %p (notify offset: %u)",
vq->notify_addr, notify_off);
+
+   return 0;
 }

 static void
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 554efea..a76daf7 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -234,7 +234,7 @@ struct virtio_pci_ops {
uint16_t (*set_config_irq)(struct virtio_hw *hw, uint16_t vec);

uint16_t (*get_queue_num)(struct virtio_hw *hw, uint16_t queue_id);
-   void (*setup_queue)(struct virtio_hw *hw, struct virtqueue *vq);
+   int (*setup_queue)(struct virtio_hw *hw, struct virtqueue *vq);
void (*del_queue)(struct virtio_hw *hw, struct virtqueue *vq);
void (*notify_queue)(struct virtio_hw *hw, struct virtqueue *vq);
 };
-- 
2.1.4

[dpdk-dev] [PATCH v5 0/8] virtio support for container

2016-05-30 Thread Jianfeng Tan

v5:
 - Rename struct virtio_user_hw to struct virtio_user_dev.
 - Rename "vdev_private" to "virtio_user_dev".
 - Move special handling into virtio_ethdev.c from queue_setup().
 - Add vring in virtio_user_dev (remove rte_eth_dev_data), so that
   device does not depend on driver's data structure (rte_eth_dev_data).
 - Remove update on doc/guides/nics/overview.rst, because virtio-user has
   exact feature set with virtio.
 - Change "unsigned long int" to "uint64_t", "unsigned" to "uint32_t".
 - Remove unnecessary cast in vdev_read_dev_config().
 - Add functions in virtio_user_dev.c with prefix of "virtio_user_".
 - Rebase on virtio-next-virtio.

v4:
 - Avoid using dev_type, instead use (eth_dev->pci_device is NULL) to
   judge if it's virtual device or physical device.
 - Change the added device name to virtio-user.
 - Split into vhost_user.c, vhost_kernel.c, vhost.c, virtio_user_pci.c,
   virtio_user_dev.c.
 - Move virtio-user specific data from struct virtio_hw into struct
   virtio_user_hw.
 - Add support to send reset_owner message.
 - Change del_queue implementation. (This need more check)
 - Remove rte_panic(), and superseded with log.
 - Add reset_owner into virtio_pci_ops.reset.
 - Merge parameter "rx" and "tx" to "queues" to emliminate confusion.
 - Move get_features to after set_owner.
 - Redefine path in virtio_user_hw from char * to char [].

v3:
 - Remove --single-file option; do no change at EAL memory.
 - Remove the added API rte_eal_get_backfile_info(), instead we check all
   opened files with HUGEFILE_FMT to find hugepage files owned by DPDK.
 - Accordingly, add more restrictions at "Known issue" section.
 - Rename parameter from queue_num to queue_size for confusion.
 - Rename vhost_embedded.c to rte_eth_virtio_vdev.c.
 - Move code related to the newly added vdev to rte_eth_virtio_vdev.c, to
   reuse eth_virtio_dev_init(), remove its static declaration.
 - Implement dev_uninit() for rte_eth_dev_detach().
 - WARN -> ERR, in vhost_embedded.c
 - Add more commit message for clarify the model.

v2:
 - Rebase on the patchset of virtio 1.0 support.
 - Fix cannot create non-hugepage memory.
 - Fix wrong size of memory region when "single-file" is used.
 - Fix setting of offset in virtqueue to use virtual address.
 - Fix setting TUNSETVNETHDRSZ in vhost-user's branch.
 - Add mac option to specify the mac address of this virtual device.
 - Update doc.

This patchset is to provide high performance networking interface (virtio)
for container-based DPDK applications. The way of starting DPDK apps in
containers with ownership of NIC devices exclusively is beyond the scope.
The basic idea here is to present a new virtual device (named virtio-user),
which can be discovered and initialized by DPDK. To minimize the change,
we reuse already-existing virtio PMD code (driver/net/virtio/).

Background: Previously, we usually use a virtio device in the context of
QEMU/VM as below pic shows. Virtio nic is emulated in QEMU, and usually
presented in VM as a PCI device.

  --
  |  virtio driver |  ->  VM
  --
|
| --> (over PCI bus or MMIO or Channel I/O)
|
  --
  | device emulate |
  ||  ->  QEMU
  | vhost adapter  |
  --
|
| --> (vhost-user protocol or vhost-net ioctls)
|
  --
  | vhost backend  |
  --

Compared to QEMU/VM case, virtio support for contaner requires to embedded
device framework inside the virtio PMD. So this converged driver actually
plays three roles:
  - virtio driver to drive this new kind of virtual device;
  - device emulation to present this virtual device and reponse to the
virtio driver, which is originally by QEMU;
  - and the role to communicate with vhost backend, which is also
originally by QEMU.

The code layout and functionality of each module:

  --
  | -- |
  | | virtio driver  | |> (virtio_user_pci.c)
  | -- |
  | |  |
  | -- | -->  virtio-user PMD
  | | device emulate |-|> (virtio_user_dev.c)
  | || |
  | | vhost adapter  |-|> (vhost_user.c, vhost_kernel.c, vhost.c)
  | -- |
  --
 |
 | -- --> (vhost-user protocol or vhost-net ioctls)
 |
   --
   | vhost backend  |
   --

How to share memory? In VM's case, qemu always shares all physical layout
to backend. But it's not feasible for a container, as a process, to share
all virtual memory regions to backend. So only specified virtual memory
regions (with type of shared) are sent to backend. It's a limitation that
only addresses in these areas can be used to transmit or receive packets.

Known issues:
 - Control queue and multi-queue are not supported yet.
 - Cannot work with --huge-unlink.
 - Cannot work with no-huge.
 - Cannot wor

[dpdk-dev] [PATCH v2] test: fix mempool perf test enq_count wraparound of 32-bit uint

2016-05-30 Thread Olivier Matz

Hi David,

On 05/26/2016 04:15 PM, David Hunt wrote:
> recent CPU's can easily wrap around a 32-bit unsigned int in
> the mempool perf test. Increase to a 64-bit uint.
> 
> v2: change from %lu to %"PRIu64"
> 
> Signed-off-by: David Hunt 

Acked-by: Olivier Matz

[dpdk-dev] [PATCH v2 5/7] eal/linux: mmap ioports on ppc64

2016-05-30 Thread Olivier Matz



On 05/24/2016 07:15 AM, Yuanhan Liu wrote:
> On Mon, May 23, 2016 at 03:40:58PM +0200, Olivier Matz wrote:
>> For reference, here is the report of the ABI checker for EAL:
>>
>> [?] struct rte_pci_ioport (2)
>>
>>  1 Field len has been added to this type.
>>1) This field will not be initialized by old clients.
>>2) Size of the inclusive type has been changed.
>>   NOTE: this field should be accessed only from the new library
>>   functions, otherwise it may result in crash or incorrect behavior
>>   of applications.
>>  2 Size of this type has been changed from 16 bytes to 24 bytes. 
>>The fields or parameters of such data type may be incorrectly
>>initialized or accessed by old client applications.
>>
>> [?] affected symbols (4)
>>  rte_eal_pci_ioport_map ( struct rte_pci_device* dev, int bar,
>> struct rte_pci_ioport* p ) @@ DPDK_16.04
>>  3rd parameter 'p' (pointer) has base type 'struct rte_pci_ioport'.
>>  rte_eal_pci_ioport_read ( struct rte_pci_ioport* p, void* data,
>> size_t len, off_t offset ) @@ DPDK_16.04
>>  1st parameter 'p' (pointer) has base type 'struct rte_pci_ioport'.
>>  rte_eal_pci_ioport_unmap ( struct rte_pci_ioport* p ) @@ DPDK_16.04
>>  1st parameter 'p' (pointer) has base type 'struct rte_pci_ioport'.
>>  rte_eal_pci_ioport_write ( struct rte_pci_ioport* p, void const* data,
>> size_t len, off_t offset ) @@ DPDK_16.04
>>  1st parameter 'p' (pointer) has base type 'struct rte_pci_ioport'.
>>
>>
>> My understanding of the comment for this structure is that it's
>> internal to EAL:
> 
> I'm not quite sure that is enough. Cc'ed Panu, the guru on ABI stuff,
> hopefully he could shed some light on it.
> 
>> /**
>>  * A structure used to access io resources for a pci device.
>>  * rte_pci_ioport is arch, os, driver specific, and should not be used
>> outside
>>  * of pci ioport api.
>>  */
>> struct rte_pci_ioport {
>>  ...
>> }
>>
>> So I'd say it's ok to have it integrated for 16.07.
> 
> I'll let Thomas to decide it :)

Panu or Thomas, do you have any comment on this?

Thanks,
Olivier

[dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy

2016-05-30 Thread Olivier Matz

Hi Jerin,

On 05/26/2016 10:07 AM, Jerin Jacob wrote:
> Signed-off-by: Jerin Jacob 
> ---
> v1..v2
> Corrected the the git commit message(s/mbuf/mempool/g)
> ---
>  lib/librte_mempool/rte_mempool.h | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_mempool/rte_mempool.h 
> b/lib/librte_mempool/rte_mempool.h
> index 60339bd..24876a2 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -73,6 +73,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifdef __cplusplus
>  extern "C" {
> @@ -739,7 +740,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const 
> *obj_table,
>   unsigned n, int is_mp)
>  {
>   struct rte_mempool_cache *cache;
> - uint32_t index;
>   void **cache_objs;
>   unsigned lcore_id = rte_lcore_id();
>   uint32_t cache_size = mp->cache_size;
> @@ -768,8 +768,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const 
> *obj_table,
>*/
>  
>   /* Add elements back into the cache */
> - for (index = 0; index < n; ++index, obj_table++)
> - cache_objs[index] = *obj_table;
> + rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
>  
>   cache->len += n;
>  
> 

I also checked in the get_bulk() function, which looks like that:

/* Now fill in the response ... */
for (index = 0, len = cache->len - 1;
index < n;
++index, len--, obj_table++)
*obj_table = cache_objs[len];

I think we could replace it by something like:

rte_memcpy(obj_table, &cache_objs[len - n], sizeof(void *) * n);

The only difference is that it won't reverse the pointers in the
table, but I don't see any problem with that.

What do you think?


Regards,
Olivier

[dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy

2016-05-30 Thread Olivier Matz



On 05/27/2016 05:05 PM, Thomas Monjalon wrote:
> 2016-05-27 17:12, Jerin Jacob:
>> IMHO, I think we should have means to abstract this _logical_ changes
>> under conditional compilation flags and any arch/platform can choose
>> to select what it suites better for that arch/platform.
>>
>> We may NOT need to have frequent patches to select the specific
>> configuration, but logical patches under compilation flags can be accepted 
>> and
>> each arch/platform can choose specific set configuration when we make
>> the final release candidate for the release.
>>
>> Any thoughts?
> 
> Yes having some #ifdefs for arch configuration may be reasonnable.
> But other methods must be preffered first:
> 1/ try implementing the function in arch-specific files

I agree with Thomas. This option should be preferred, and I think we
should avoid as much as possible to have:

#if ARCH1
  do stuff optimized for arch1
#elif ARCH2
  do the same stuff optimized for arch2
#else
  ...


In this particular case, rte_memcpy() seems to be the appropriate
function, because it should already be arch-optimized.


> 2/ and check at runtime if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_X
> 3/ or check #ifdef RTE_MACHINE_CPUFLAG_X
> 4/ or check #ifdef RTE_ARCH_Y
> 5/ or check a specific #ifdef RTE_FEATURE_NAME to choose in config files
> 
> The option 2 is a nice to have which implies other options.
> 
> Maybe that doc/guides/contributing/design.rst needs to be updated.


Regards,
Olivier

[dpdk-dev] [PATCH v2 09/11] enic: optimize the Tx function

2016-05-30 Thread Azarewicz, PiotrX T

Hi,

>  uint16_t enic_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>   uint16_t nb_pkts)
>  {
>   uint16_t index;
> - unsigned int frags;
> - unsigned int pkt_len;
> - unsigned int seg_len;
> - unsigned int inc_len;
> + unsigned int pkt_len, data_len;
>   unsigned int nb_segs;
> - struct rte_mbuf *tx_pkt, *next_tx_pkt;
> + struct rte_mbuf *tx_pkt;
>   struct vnic_wq *wq = (struct vnic_wq *)tx_queue;
>   struct enic *enic = vnic_dev_priv(wq->vdev);
>   unsigned short vlan_id;
>   unsigned short ol_flags;

Above ol_flags  should be uint64_t.

> - uint8_t last_seg, eop;
> - unsigned int host_tx_descs = 0;
> + unsigned int wq_desc_avail;
> + int head_idx;
> + struct vnic_wq_buf *buf;
> + unsigned int hw_ip_cksum_enabled;
> + unsigned int desc_count;
> + struct wq_enet_desc *descs, *desc_p, desc_tmp;
> + uint16_t mss;
> + uint8_t vlan_tag_insert;
> + uint8_t eop;
> + uint64_t bus_addr;
> 
> + enic_cleanup_wq(enic, wq);
> + wq_desc_avail = vnic_wq_desc_avail(wq);
> + head_idx = wq->head_idx;
> + desc_count = wq->ring.desc_count;
> +
> + nb_pkts = RTE_MIN(nb_pkts, ENIC_TX_XMIT_MAX);
> +
> + hw_ip_cksum_enabled = enic->hw_ip_checksum;
>   for (index = 0; index < nb_pkts; index++) {
>   tx_pkt = *tx_pkts++;
> - inc_len = 0;
>   nb_segs = tx_pkt->nb_segs;
> - if (nb_segs > vnic_wq_desc_avail(wq)) {
> + if (nb_segs > wq_desc_avail) {
>   if (index > 0)
> - enic_post_wq_index(wq);
> -
> - /* wq cleanup and try again */
> - if (!enic_cleanup_wq(enic, wq) ||
> - (nb_segs > vnic_wq_desc_avail(wq))) {
> - return index;
> - }
> + goto post;
> + goto done;
>   }
> 
>   pkt_len = tx_pkt->pkt_len;
> + data_len = tx_pkt->data_len;
>   vlan_id = tx_pkt->vlan_tci;
>   ol_flags = tx_pkt->ol_flags;

Cause you may miss a lot of flags in here.
Piotr

> - for (frags = 0; inc_len < pkt_len; frags++) {
> - if (!tx_pkt)
> - break;
> - next_tx_pkt = tx_pkt->next;
> - seg_len = tx_pkt->data_len;
> - inc_len += seg_len;
> -
> - host_tx_descs++;
> - last_seg = 0;
> - eop = 0;
> - if ((pkt_len == inc_len) || !next_tx_pkt) {
> - eop = 1;
> - /* post if last packet in batch or > thresh */
> - if ((index == (nb_pkts - 1)) ||
> -(host_tx_descs > ENIC_TX_POST_THRESH))
> {
> - last_seg = 1;
> - host_tx_descs = 0;
> - }
> +
> + mss = 0;
> + vlan_tag_insert = 0;
> + bus_addr = (dma_addr_t)
> +(tx_pkt->buf_physaddr + tx_pkt->data_off);
> +
> + descs = (struct wq_enet_desc *)wq->ring.descs;
> + desc_p = descs + head_idx;
> +
> + eop = (data_len == pkt_len);
> +
> + if (ol_flags & PKT_TX_VLAN_PKT)
> + vlan_tag_insert = 1;
> +
> + if (hw_ip_cksum_enabled && (ol_flags &
> PKT_TX_IP_CKSUM))
> + mss |= ENIC_CALC_IP_CKSUM;
> +
> + if (hw_ip_cksum_enabled && (ol_flags &
> PKT_TX_TCP_UDP_CKSUM))
> + mss |= ENIC_CALC_TCP_UDP_CKSUM;
> +
> + wq_enet_desc_enc(&desc_tmp, bus_addr, data_len, mss, 0,
> 0, eop,
> +  eop, 0, vlan_tag_insert, vlan_id, 0);
> +
> + *desc_p = desc_tmp;
> + buf = &wq->bufs[head_idx];
> + buf->mb = (void *)tx_pkt;
> + head_idx = enic_ring_incr(desc_count, head_idx);
> + wq_desc_avail--;
> +
> + if (!eop) {
> + for (tx_pkt = tx_pkt->next; tx_pkt; tx_pkt =
> + tx_pkt->next) {
> + data_len = tx_pkt->data_len;
> +
> + if (tx_pkt->next == NULL)
> + eop = 1;
> + desc_p = descs + head_idx;
> + bus_addr = (dma_addr_t)(tx_pkt-
> >buf_physaddr
> ++ tx_pkt->data_off);
> + wq_enet_desc_enc((struct wq_enet_desc *)
> +  &desc_tmp, bus_addr,
> data_len,
> +  mss, 0, 0, eop, eop, 0,
> +  vlan_tag_insert

[dpdk-dev] [PATCH] virtio: split virtio rx/tx queue

2016-05-30 Thread Xie, Huawei

On 5/30/2016 11:00 AM, Yuanhan Liu wrote:
> On Mon, May 30, 2016 at 02:40:00AM +, Xie, Huawei wrote:
>> On 5/27/2016 5:06 PM, Yuanhan Liu wrote:
>>> On Tue, May 24, 2016 at 09:38:32PM +0800, Huawei Xie wrote:
vq->vq_ring_mem = mz->phys_addr;
vq->vq_ring_virt_mem = mz->addr;
 -  PMD_INIT_LOG(DEBUG, "vq->vq_ring_mem:  0x%"PRIx64, 
 (uint64_t)mz->phys_addr);
 -  PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%"PRIx64, 
 (uint64_t)(uintptr_t)mz->addr);
 -  vq->virtio_net_hdr_mz  = NULL;
 -  vq->virtio_net_hdr_mem = 0;
 +  PMD_INIT_LOG(DEBUG, "vq->vq_ring_mem:  0x%"PRIx64,
 +   (uint64_t)mz->phys_addr);
 +  PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%"PRIx64,
 +   (uint64_t)(uintptr_t)mz->addr);
 +
 +  hdr_mz = rte_memzone_reserve_aligned(vq_hdr_name, sz_hdr_mz, socket_id,
 +   0, RTE_CACHE_LINE_SIZE);
>>> We don't need allocate hdr_mz for Rx queue, and in such case, sz_hdr_mz
>>> is 0. I'm wondering what hdr_mz would be then, NULL?
>>>
>>> Anyway, you should skip the hdr_mz allocation for Rx queue, and I also
>>> would suggest you to move the vq_hdr_name setup here.
>> will check sz_hdr_mz before the zone allocation.
>>
>>
 +  if (hdr_mz == NULL) {
 +  if (rte_errno == EEXIST)
 +  hdr_mz = rte_memzone_lookup(vq_hdr_name);
 +  if (hdr_mz == NULL) {
 +  ret = -ENOMEM;
 +  goto fail_q_alloc;
 +  }
 +  }
  
>>> ...
  
PMD_INIT_FUNC_TRACE();
ret = virtio_dev_queue_setup(dev, VTNET_CQ, VTNET_SQ_CQ_QUEUE_IDX,
 -  vtpci_queue_idx, 0, socket_id, &vq);
 +  vtpci_queue_idx, 0, socket_id, (void **)&cvq);
>>> Unnecessary cast. Note that there are few others like that in this
>>> patch.
>> This cast is needed.
> Oh, right, indeed. Sorry.
>
 -  PMD_RX_LOG(DEBUG, "dequeue:%d", num);
 -  PMD_RX_LOG(DEBUG, "packet len:%d", len[0]);
 +  PMD_RX_LOG(DEBUG, "dequeue:%d\n", num);
 +  PMD_RX_LOG(DEBUG, "packet len:%d\n", len[0]);
>>> We should not append "\n" for PMD_RX_LOG; this macro alreadys does it.
>> Weird. Will remove it. Thanks.
>>
>>> Another note is that you might want to run checkpatch; I saw quite many
>>> warnings.
>> Had checked. The warnings are all due to 80 char limitation of
>> virtio_rxq_stats_strings. Just 4 or 5 chars cross 80 line limit. I
>> prefer to keep the fields aligned.
> Agreed. However, I was talking about others warnings.

ok, i see you are using checkpatch of newer Linux kernel.


>
>   --yliu
>
> ---
> CHECK:BRACES: braces {} should be used on all arms of this statement
> #198: FILE: drivers/net/virtio/virtio_ethdev.c:343:

would fix this.

> +   if (queue_type == VTNET_RQ)
> [...]
> +   else if (queue_type == VTNET_TQ) {
> [...]
> +   } else if (queue_type == VTNET_CQ) {
> [...]
>
> CHECK:CAMELCASE: Avoid CamelCase: 
> #280: FILE: drivers/net/virtio/virtio_ethdev.c:404:
> +   PMD_INIT_LOG(DEBUG, "vq->vq_ring_mem:  0x%"PRIx64,
>
> CHECK:CONCATENATED_STRING: Concatenated strings should use spaces
> between elements
> #280: FILE: drivers/net/virtio/virtio_ethdev.c:404:
> +   PMD_INIT_LOG(DEBUG, "vq->vq_ring_mem:  0x%"PRIx64,
>
> CHECK:CONCATENATED_STRING: Concatenated strings should use spaces
> between elements
> #282: FILE: drivers/net/virtio/virtio_ethdev.c:406:
> +   PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%"PRIx64,
>
> WARNING:UNSPECIFIED_INT: Prefer 'unsigned int' to bare use of 'unsigned'
> #482: FILE: drivers/net/virtio/virtio_ethdev.c:753:
> +   unsigned nstats = dev->data->nb_tx_queues * VIRTIO_NB_TXQ_XSTATS
> +
>
> WARNING:LONG_LINE: line over 80 characters
> #551: FILE: drivers/net/virtio/virtio_ethdev.c:819:
> +   memset(txvq->stats.size_bins, 0,
> sizeof(txvq->stats.size_bins[0]) * 8);
>
> WARNING:LONG_LINE: line over 80 characters
> #571: FILE: drivers/net/virtio/virtio_ethdev.c:832:
> +   memset(rxvq->stats.size_bins, 0,
> sizeof(rxvq->stats.size_bins[0]) * 8);

would fix this.

>
> CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
> #1529: FILE: drivers/net/virtio/virtio_rxtx_simple.c:362:
> +   nb_commit = nb_pkts = RTE_MIN((vq->vq_free_cnt >> 1), nb_pkts);
>
> CHECK:SPACING: No space is necessary after a cast
> #1530: FILE: drivers/net/virtio/virtio_rxtx_simple.c:363:
> +   desc_idx = (uint16_t) (vq->vq_avail_idx & desc_idx_max);

All other warnings are due to the newer checkpatch script, and not
introduced by this patch, so wouldn't fix in this patch.
But i think some are better programming practice, like 'Concatenated
strings should use spaces between elements', so would post a patch to
fix them in virtio driver (if possible, all other drivers).

[dpdk-dev] [PATCH] i40e: fix vlan filter in promiscuous mode

2016-05-30 Thread Peng, Yuan

Tested-by: Peng Yuan 



- Test Commit: a3f9ec846f9e7347d3a98da52256607345b4861d

- OS/Kernel: Fedora 23/4.2.3

- GCC: gcc version 5.3.1 20151207 (Red Hat 5.3.1-2) (GCC)

- CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

- Total 7 cases, 7 passed, 0 failed.



pf

vf(pf_kerneldriver)


mac_filter

vlan_filter

mac+vlan_filter

mac_filter

vlan_filter

mac+vlan_filter

promisc off

PASS(dts case)

PASS

PASS

PASS(dts case)

PASS(dts case)

PASS

promisc on

N/A

PASS(dts case)

N/A

N/A

N/A

N/A






All the test cases I verified covers 7 scenarios as below table.



The issue happened in vlan_filter/promisc on, so I just describe the test steps 
in this scenario.



Test_vlan_enable_receipt



1.   Fortpark i40e driver,  . /dpdk_nic_bind.py --bind=igb_uio :8a:00.1 
:8a:00.3

2.   ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x6 -n 4  -- -i 
--portmask=0x1 --port-topology=loop --txqflags=0

3.   Testpmd> set verbose 1

Testpmd> set fwd mac

Testpmd> vlan set filter on 0

Testpmd> vlan set strip off 0

Testpmd> rx_vlan add 51 0

Testpmd>start

4.   Tester:

sendp([Ether(dst="00:00:00:00:03:15")/Dot1Q(vlan=51)/IP()/UDP()],iface="enp138s0f0",
 count=1)

5.   DUT can receive the packet, and check the vlan ID is correct.

6.   Send packet with vlan0 and the packet can be received,
send packet without vlan and the packet can be received.

Send packet with wrong vlan(52/4095) and packet can't be receive.



Test_ vlan_disable_receipt



1.   Testpmd>rx_vlan rm 51 0

Testpmd>start

2.   Tester:

sendp([Ether(dst="00:00:00:00:03:15")/Dot1Q(vlan=51)/IP()/UDP()],iface="enp138s0f0",
 count=1)

3.   DUT can not receive the packet.



The vlan filter works normally.



Thanks

Yuan.





-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jingjing Wu
Sent: Friday, May 27, 2016 4:06 PM
To: Zhang, Helin 
Cc: dev at dpdk.org; Wu, Jingjing ; Pei, Yulong 

Subject: [dpdk-dev] [PATCH] i40e: fix vlan filter in promiscuous mode



Vlan filter didn't work if promiscuous mode is enabled. That is because i40e 
driver uses MAC VLAN table for the l2 filtering and internal switch. And the 
vlan table is disabled by default, till the first time to add rule in vlan 
table.

This patch fixed this issue to enable vlan filter by using vlan table.



Fixes: 4861cde46116 (i40e: new poll mode driver)

Signed-off-by: Jingjing Wu mailto:jingjing.wu at 
intel.com>>

---

drivers/net/i40e/i40e_ethdev.c | 23 +++

1 file changed, 19 insertions(+), 4 deletions(-)



diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c 
index 24777d5..0d91e29 100644

--- a/drivers/net/i40e/i40e_ethdev.c

+++ b/drivers/net/i40e/i40e_ethdev.c

@@ -2443,12 +2443,16 @@ i40e_vlan_offload_set(struct rte_eth_dev *dev, int 
mask)  {

   struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);

   struct i40e_vsi *vsi = pf->main_vsi;

+   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);

if (mask & ETH_VLAN_FILTER_MASK) {

- if (dev->data->dev_conf.rxmode.hw_vlan_filter)

+if (dev->data->dev_conf.rxmode.hw_vlan_filter) {

+  i40e_aq_set_vsi_vlan_promisc(hw, vsi->seid, false, 
NULL);

  i40e_vsi_config_vlan_filter(vsi, TRUE);

- else

+} else {

+  i40e_aq_set_vsi_vlan_promisc(hw, vsi->seid, true, 
NULL);

  i40e_vsi_config_vlan_filter(vsi, FALSE);

+}

   }

if (mask & ETH_VLAN_STRIP_MASK) {

@@ -5419,17 +5423,28 @@ i40e_set_vlan_filter(struct i40e_vsi *vsi,

   uint16_t vlan_id, bool on)

{

   uint32_t vid_idx, vid_bit;

+   struct i40e_hw *hw = I40E_VSI_TO_HW(vsi);

+   struct i40e_aqc_add_remove_vlan_element_data vlan_data = {0};

+   int ret;

if (vlan_id > ETH_VLAN_ID_MAX)

return;

vid_idx = I40E_VFTA_IDX(vlan_id);

   vid_bit = I40E_VFTA_BIT(vlan_id);

+   vlan_data.vlan_tag = rte_cpu_to_le_16(vlan_id);

-if (on)

+   if (on) {

+ret = i40e_aq_add_vlan(hw, vsi->seid, &vlan_data, 1, NULL);

+if (ret != I40E_SUCCESS)

+  PMD_DRV_LOG(ERR, "Failed to add vlan filter");

vsi->vfta[vid_idx] |= vid_bit;

-else

+   } else {

+ret = i40e_aq_remove_vlan(hw, vsi->seid, &vlan_data, 1, NULL);

+if (ret != I40E_SUCCESS)

+  PMD_DRV_LOG(ERR, "Failed to remove vlan filter");

vsi->vfta[vid_idx] &= ~vid_bit;

+   }

}

 /**

--

2.4.0

[dpdk-dev] [PATCH] virtio: split virtio rx/tx queue

2016-05-30 Thread Xie, Huawei

On 5/27/2016 5:06 PM, Yuanhan Liu wrote:
> On Tue, May 24, 2016 at 09:38:32PM +0800, Huawei Xie wrote:
>>  vq->vq_ring_mem = mz->phys_addr;
>>  vq->vq_ring_virt_mem = mz->addr;
>> -PMD_INIT_LOG(DEBUG, "vq->vq_ring_mem:  0x%"PRIx64, 
>> (uint64_t)mz->phys_addr);
>> -PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%"PRIx64, 
>> (uint64_t)(uintptr_t)mz->addr);
>> -vq->virtio_net_hdr_mz  = NULL;
>> -vq->virtio_net_hdr_mem = 0;
>> +PMD_INIT_LOG(DEBUG, "vq->vq_ring_mem:  0x%"PRIx64,
>> + (uint64_t)mz->phys_addr);
>> +PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%"PRIx64,
>> + (uint64_t)(uintptr_t)mz->addr);
>> +
>> +hdr_mz = rte_memzone_reserve_aligned(vq_hdr_name, sz_hdr_mz, socket_id,
>> + 0, RTE_CACHE_LINE_SIZE);
> We don't need allocate hdr_mz for Rx queue, and in such case, sz_hdr_mz
> is 0. I'm wondering what hdr_mz would be then, NULL?
>
> Anyway, you should skip the hdr_mz allocation for Rx queue, and I also
> would suggest you to move the vq_hdr_name setup here.

will check sz_hdr_mz before the zone allocation.


>
>> +if (hdr_mz == NULL) {
>> +if (rte_errno == EEXIST)
>> +hdr_mz = rte_memzone_lookup(vq_hdr_name);
>> +if (hdr_mz == NULL) {
>> +ret = -ENOMEM;
>> +goto fail_q_alloc;
>> +}
>> +}
>>  
> ...
>>  
>>  PMD_INIT_FUNC_TRACE();
>>  ret = virtio_dev_queue_setup(dev, VTNET_CQ, VTNET_SQ_CQ_QUEUE_IDX,
>> -vtpci_queue_idx, 0, socket_id, &vq);
>> +vtpci_queue_idx, 0, socket_id, (void **)&cvq);
> Unnecessary cast. Note that there are few others like that in this
> patch.

This cast is needed.

>
>> -PMD_RX_LOG(DEBUG, "dequeue:%d", num);
>> -PMD_RX_LOG(DEBUG, "packet len:%d", len[0]);
>> +PMD_RX_LOG(DEBUG, "dequeue:%d\n", num);
>> +PMD_RX_LOG(DEBUG, "packet len:%d\n", len[0]);
> We should not append "\n" for PMD_RX_LOG; this macro alreadys does it.

Weird. Will remove it. Thanks.

>
> Another note is that you might want to run checkpatch; I saw quite many
> warnings.

Had checked. The warnings are all due to 80 char limitation of
virtio_rxq_stats_strings. Just 4 or 5 chars cross 80 line limit. I
prefer to keep the fields aligned.

>
>   --yliu
>

70 matches

Mail list logo