date:20160122

[dpdk-dev] [PKTGEN] additional terminal IO question

2016-01-22 Thread Matthew Hall

On Thu, Jan 21, 2016 at 05:35:00PM +0200, Arnon Warshavsky wrote:
> Keith,
> For the record, on my end (can only speak for myself)  this is not a real
> problem.
> I work around it by using a different theme and live with it happily ever
> after.
> I just provided the input since I encountered it.
> 
> /Arnon

For me, breaking stuff with a black background to gain questionably useful 
colors and/or themes seems like more overhead for cognition of the code for 
not much benefit.

This is going to break the tool people who use a Linux standard framebuffer 
with no X also, isn't it?

Matthew.

[dpdk-dev] [PKTGEN] [PATCH 1/2] usage_pktgen.rst: multiple instances: clean up section intro

2016-01-22 Thread Matthew Hall

On Thu, Jan 21, 2016 at 03:46:21PM +, Wiles, Keith wrote:
> It appears (if I compared the text correctly) the above only move a few 
> trailing words to the next line, why?

I believe in trying to leave code / docs cleaner than I found them.

Most Markdown / ReStructured Text has a tradition of 78-character wide line 
length limits.

So just trying to make the world a better place. That's it.

> Maybe this is clearer.
> 
> If you are running pktgen and your application together, you have to start up
> each one a bit differently to make sure they share the resources like memory,
> huge page files and ports. I will use two Pktgens running on the same machine,
> which just means you have imagine your application as one of the Pktgen 
> instances.

I'll make some adjustments when I get more time and try again.

In general I think it would be good if we had a bit more of a forest view of 
trying to make docs for all of the things pktgen docs are missing and not get 
too hung up on one tree or another.

I don't necessarily have a ton of time for editing these to the Nth degree, 
but I do want to make life better for the next confused people to avoid a 
storm of duplicate comments and issues.

Matthew.

[dpdk-dev] [RFC PATCH 5/5] virtio: Extend virtio-net PMD to support container environment

2016-01-22 Thread Tetsuya Mukawa

On 2016/01/22 17:14, Xie, Huawei wrote:
> On 1/21/2016 7:09 PM, Tetsuya Mukawa wrote:
>> virtio: Extend virtio-net PMD to support container environment
>>
>> The patch adds a new virtio-net PMD configuration that allows the PMD to
>> work on host as if the PMD is in VM.
>> Here is new configuration for virtio-net PMD.
>>  - CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE
>> To use this mode, EAL needs physically contiguous memory. To allocate
>> such memory, add "--shm" option to application command line.
>>
>> To prepare virtio-net device on host, the users need to invoke QEMU
>> process in special qtest mode. This mode is mainly used for testing QEMU
>> devices from outer process. In this mode, no guest runs.
>> Here is QEMU command line.
>>
>>  $ qemu-system-x86_64 \
>>  -machine pc-i440fx-1.4,accel=qtest \
>>  -display none -qtest-log /dev/null \
>>  -qtest unix:/tmp/socket,server \
>>  -netdev type=tap,script=/etc/qemu-ifup,id=net0,queues=1\
>>  -device virtio-net-pci,netdev=net0,mq=on \
>>  -chardev socket,id=chr1,path=/tmp/ivshmem,server \
>>  -device ivshmem,size=1G,chardev=chr1,vectors=1
>>
>>  * QEMU process is needed per port.
> Does qtest supports hot plug virtio-net pci device, so that we could run
> one QEMU process in host, which provisions the virtio-net virtual
> devices for the container?

Theoretically, we can use hot plug in some cases.
But I guess we have 3 concerns here.

1. Security.
If we share QEMU process between multiple DPDK applications, this QEMU
process will have all fds of  the applications on different containers.
In some cases, it will be security concern.
So, I guess we need to support current 1:1 configuration at least.

2. shared memory.
Currently, QEMU and DPDK application will map shared memory using same
virtual address.
So if multiple DPDK application connects to one QEMU process, each DPDK
application should have different address for shared memory. I guess
this will be a big limitation.

3. PCI bridge.
So far, QEMU has one PCI bridge, so we can connect almost 10 PCI devices
to QEMU.
(I forget correct number, but it's almost 10, because some slots are
reserved by QEMU)
A DPDK application needs both virtio-net and ivshmem device, so I guess
almost 5 DPDK applications can connect to one QEMU process, so far.
To add more PCI bridges solves this.
But we need to add a lot of implementation to support cascaded PCI
bridges and PCI devices.
(Also we need to solve above "2nd" concern.)

Anyway, if we use virtio-net PMD and vhost-user PMD, QEMU process will
not do anything after initialization.
(QEMU will try to read a qtest socket, then be stopped because there is
no message after initialization)
So I guess we can ignore overhead of these QEMU processes.
If someone cannot ignore it, I guess this is the one of cases that it's
nice to use your light weight container implementation.

Thanks,
Tetsuya

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread David Harton (dharton)

> > > Do you see a fundamental problem with parsing the strings to
> > > retrieve values?
> >
> > I think parsing strings is expensive CPU wise
> 
> Parsing the strings can be done once at startup, and the index of the
> statistics in the array cached. When actually reading statistics, access
> of the array of statistics at the previously cached index will return the
> same statistic. It is not needed to do strcmp() per statistic per read.

How is this order guaranteed through the API?  The stat is currently identified 
by the string not by order returned.  What if a driver returns more/less stats 
based on config that changes after boot?

> 
> 
> > That's why I was wondering if a binary id could be generated instead.
> 
> I've worked with such a system before, it dynamically registered string-
> >int mappings at runtime, and allowed "reverse" mapping them too. Its
> workable, and I'm not opposed to it if the conclusion is that it really is
> necessary, but I'm not sure about that.
> 
> 
> > The API
> > would be very similar to the current xstats API except it would pass
> > id/value pairs instead of string/value pairs.  This avoids string
> > comparisons to figure out while still allowing flexibility between
> drivers.
> 
> Apart from a once-off scan at startup, xstats achieves this.

Partially true.  You may not need to perform strcmp() per read, although I 
don't believe the defined API guarantees that this would be safe (see above); 
but, you still have to perform a strcpy() of each stat whether the caller is 
interested in it or not.

> 
> 
> >  It would also 100% guarantee that any strings returned by "const
> > char* rte_eth_xstats_str_get(uint32_t stat_id)" are consistent across
> > devices.
> 
> Yes - that is a nice feature that xstats (in its current form) doesn't
> have.
> But what price is there to pay for this?
> We need to map an ID for each stat that exists.
> 
> How and where will this mapping happen? Perhaps would you expand a little
> on how you see this working?

I wasn't thinking dynamic registration as that might be more than necessary.  I 
can code up a detailed snippet that shares the idea if wanted but I think the 
following communicates the basic idea:

enum rte_eth_stat_e {
/* accurate desc #1 */
RTE_ETH_STAT_1, 
/* accurate desc #2 */
RTE_ETH_STAT_2,
...
}
struct rte_eth_id_stat {
rte_eth_stat_e id;
uin64_t value;
}

int rte_eth_id_stats_num(uint8_t port_id, uint32_t *num_stats);
/* returns < 0 on error or the number of stats that could have been read (i.e. 
if userd  */
int rte_eth_id_stats_get(uint8_t port_id, uint32_t num_stats, rte_eth_id_stat 
*id_stats);
const char* rte_eth_id_stat_str(rte_eth_stat_e id);

This allows a driver to return whatever stats that it supports in a consistent 
manner and also in a performance friendly way.  In fact, the driver side would 
be identical to what they have today but instead of having arrays with "string 
stat name" they will have the rte_eth_stat_e.

> 
> > I also think there is less chance for error if drivers assign their
> > stats to ID versus constructing a string.
> 
> Have a look in how the mapping is done in the xstats implementation for
> ixgbe for example:
> http://dpdk.org/browse/dpdk/tree/drivers/net/ixgbe/ixgbe_ethdev.c#n540
> 
> It's a pretty simple  {"string stat name" -> offsetof( stuct hw,
> register_var_name )}

It's not how they add the strings but rather the format of the string itself 
that is error prone.  
IOTW, the "string stat name" is prone to not being formatted correctly or 
consistently.

> 
> 
> > I haven't checked my application, but I wonder if the binary size
> > would actually be smaller if all the stats strings were centrally
> > located versus distributed across binaries (highly dependent on the
> linker and optimization level).
> 
> Sounds like optimizing the wrong thing to me.

Wasn't trying to optimize image size as much as saying it's a side benefit due 
to string consolidation.

> 
> 
> > > If you like I could code up a minimal sample-app that only pulls
> > > statistics, and you can show "interest" in various statistics for
> > > printing / monitoring?
> >
> > Appreciate the offer.  I actually understand what's is available.
> > And, BTW, I apologize for being late to the game (looks like this was
> > discussed last summer) but I'm just now getting looped in and
> > following the mailer but I'm just wondering if something that is
> performance friendly for the user/application and flexible for the drivers
> is possible.
> 
> We're all looking for the same thing - just different approaches that's
> all :)
> 
> 
> RE: Thomas asking about performance numbers:
> I can scrape together some raw tsc data on Monday and post to list, and we
> can discuss it more then.

I can do the same if desired.  But, just to make sure we are discussing the 
same issue:

1) call rte_eth_xtats_get()
This will result in many string copies and depending on the driver *many* 
copies I don't want or

[dpdk-dev] [RFC 0/2] slow data path communication between DPDK port and Linux

2016-01-22 Thread Thomas Monjalon

2016-01-22 16:56, Ferruh Yigit:
> On Fri, Jan 22, 2016 at 05:31:45PM +0100, Thomas Monjalon wrote:
> > Hi Ferruh,
> > 
> > Not commenting the implementation, just the method.
> > 
> > 2016-01-22 16:00, Ferruh Yigit:
> > > This is slow data path communication implementation based on existing KNI.
> > > Difference is: librte_kni converted into a PMD, kdp kernel module is 
> > > almost
> > > same except all control path functionality removed and some 
> > > simplification done.
> > 
> > Is there a chance to submit such kernel module on LKML instead of DPDK?
> > We should avoid maintaining some out-of-tree modules.
> 
> The ones I have sent are not generic enough to be in Linux tree.

I've not read the details.
What is missing to be generic?

> We already maintain kni kernel module,

Yes it is painful and not accepted in some Linux distros.

> these patches are part of effort to make
> kni more maintainable, by separation of concerns, removing network drivers 
> from it,
> and simplifying some of code.

Your patch is not removing KNI unfortunately ;)

> For this patch set, tun/tap interface can be alternative, and it looks like it
> removes out-of-tree kernel module requirement, unless people want current
> FIFO implementation because of better performance.
> 
> For control path, unfortunately I am not aware of any solution without 
> out-of-tree
> kernel module support.
> 
> Thanks,
> ferruh

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread Igor Ryzhov

Hello, everyone.

How about exposing stats according to IF-MIB?

Statistics to be exposed are - octets, unicast packets, multicast packets, 
broadcast packets, errors and discards for both TX and RX.
These counters are basic and implemented by most of drivers.

All other driver-specific counters can be exposed by xstats.

Best regards,
Igor

> 22 ???. 2016 ?., ? 17:44, Thomas Monjalon  
> ???(?):
> 
> 2016-01-22 14:18, Tahhan, Maryam:
>> So what can be enabled again in struct rte_eth_stats  from what was already 
>> there is the equivalent of: 
>> * rx_length_errors
>> * rx_crc_errors
>> * rx_missed_errors - the deprecation notice was removed for this field.
>> * multicast
>> 
>> What should be added in to distinguish between errors and drops. struct 
>> rte_eth_stats :
>> * rx_errors
>> * tx_errors
>> 
>> As for the detailed rx errors and tx errors I'm open to feedback from you 
>> folks as to what should go in and what is too detailed. These weren't in 
>> struct rte_eth_stats previously, they are available through xstats and are 
>> uniformly named across the drivers. Oliver + Harry any thoughts?
>> 
>> David I assume you are looking for all the missing fields to be added?
> 
> They are not missing. They just not exactly match ones having a
> long history in Linux kernel.
> Please let's avoid to blindly mimic others without thinking
> about modern needs.
> 
 From: David Harton
> Is there a reason the stats have been deprecated?  Why not keep
> the stats in line with the standard linux practices such as
> rtnl_link_stats64?
>

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread Thomas Monjalon

2016-01-22 16:04, David Harton:
> I think parsing strings is expensive CPU wise
[...]
> wondering if something that is performance friendly for the user/application
> and flexible for the drivers is possible.

I think we need some numbers from experimentations and some requirements.
How many cycles it takes to read stats?
How often stats must be read at max?

[dpdk-dev] [RFC 0/2] slow data path communication between DPDK port and Linux

2016-01-22 Thread Thomas Monjalon

Hi Ferruh,

Not commenting the implementation, just the method.

2016-01-22 16:00, Ferruh Yigit:
> This is slow data path communication implementation based on existing KNI.
> Difference is: librte_kni converted into a PMD, kdp kernel module is almost
> same except all control path functionality removed and some simplification 
> done.

Is there a chance to submit such kernel module on LKML instead of DPDK?
We should avoid maintaining some out-of-tree modules.

[dpdk-dev] [RFC 0/2] slow data path communication between DPDK port and Linux

2016-01-22 Thread Ferruh Yigit

On Fri, Jan 22, 2016 at 05:31:45PM +0100, Thomas Monjalon wrote:
> Hi Ferruh,
> 
> Not commenting the implementation, just the method.
> 
> 2016-01-22 16:00, Ferruh Yigit:
> > This is slow data path communication implementation based on existing KNI.
> > Difference is: librte_kni converted into a PMD, kdp kernel module is almost
> > same except all control path functionality removed and some simplification 
> > done.
> 
> Is there a chance to submit such kernel module on LKML instead of DPDK?
> We should avoid maintaining some out-of-tree modules.

The ones I have sent are not generic enough to be in Linux tree.

We already maintain kni kernel module, these patches are part of effort to make
kni more maintainable, by separation of concerns, removing network drivers from 
it,
and simplifying some of code.

For this patch set, tun/tap interface can be alternative, and it looks like it
removes out-of-tree kernel module requirement, unless people want current
FIFO implementation because of better performance.

For control path, unfortunately I am not aware of any solution without 
out-of-tree
kernel module support.

Thanks,
ferruh

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread Van Haaren, Harry

+Jay, (@all, please keep everybody in the CCs :)

> From: David Harton (dharton) [mailto:dharton at cisco.com]
> To: Van Haaren, Harry ; Thomas Monjalon
> > > xstats are driver agnostic and have a well-defined naming scheme.
> >
> > Indeed, described here:
> > http://dpdk.org/doc/guides/prog_guide/poll_mode_drv.html#extended-
> > statistics-api
> 
> Thanks for sharing.  I do think what is in the link is well thought out but 
> it also may
> not match how the application would choose to represent that stat 
> (capitalization,
> abbreviation, etc).

Welcome. Sure, an application can match any xstats string to its "Display" 
counterpart,
or a few simple translations actually make (almost) all of them human readable. 
Translating
an _ to space, capitalize first letter of description items, and remove the 
unit at the end).
This use-case was considered when detailing the naming scheme.


> > The "rules" of encoding a statistic as an xstats string are pretty simple,
> > and if any PMD breaks the rules, it should be considered broken and fixed.
> 
> I understand it may be broken, but that doesn't help finding them and 
> ensuring they aren't
> broken to begin with.  What regression/automation is in place to ensure 
> drivers aren't
> broken?

Currently nothing automated - patch review on the mailing list. I'm open to 
ideas on this,
if you have suggestions.


> > Do you see a fundamental problem with parsing the strings to retrieve
> > values?
> 
> I think parsing strings is expensive CPU wise

Parsing the strings can be done once at startup, and the index of the 
statistics in the
array cached. When actually reading statistics, access of the array of 
statistics at the
previously cached index will return the same statistic. It is not needed to do 
strcmp()
per statistic per read.


> That's why I was wondering if a binary id could be generated instead.

I've worked with such a system before, it dynamically registered string->int 
mappings at
runtime, and allowed "reverse" mapping them too. Its workable, and I'm not 
opposed to it
if the conclusion is that it really is necessary, but I'm not sure about that.


> The API
> would be very similar to the current xstats API except it would pass id/value 
> pairs
> instead of string/value pairs.  This avoids string comparisons to figure out 
> while still
> allowing flexibility between drivers.

Apart from a once-off scan at startup, xstats achieves this. 


>  It would also 100% guarantee that any strings
> returned by "const char* rte_eth_xstats_str_get(uint32_t stat_id)" are 
> consistent across
> devices.

Yes - that is a nice feature that xstats (in its current form) doesn't have.
But what price is there to pay for this?
We need to map an ID for each stat that exists.

How and where will this mapping happen? Perhaps would you expand a little on how
you see this working? 

> I also think there is less chance for error if drivers assign their stats to 
> ID versus
> constructing a string. 

Have a look in how the mapping is done in the xstats implementation for ixgbe 
for example:
http://dpdk.org/browse/dpdk/tree/drivers/net/ixgbe/ixgbe_ethdev.c#n540

It's a pretty simple  {"string stat name" -> offsetof( stuct hw, 
register_var_name )}


> I haven't checked my application, but I wonder if the binary size
> would actually be smaller if all the stats strings were centrally located 
> versus
> distributed across binaries (highly dependent on the linker and optimization 
> level).

Sounds like optimizing the wrong thing to me.


> > If you like I could code up a minimal sample-app that only pulls
> > statistics, and you can show "interest" in various statistics for printing
> > / monitoring?
> 
> Appreciate the offer.  I actually understand what's is available.  And, BTW, 
> I apologize
> for being late to the game (looks like this was discussed last summer) but 
> I'm just now
> getting looped in and following the mailer but I'm just wondering if 
> something that is
> performance friendly for the user/application and flexible for the drivers is 
> possible.

We're all looking for the same thing - just different approaches that's all :)


RE: Thomas asking about performance numbers:
I can scrape together some raw tsc data on Monday and post to list, and we can 
discuss it more then.


Regards, -Harry

[dpdk-dev] [RFC PATCH 3/5] virtio: Add a new layer to abstract pci access method

2016-01-22 Thread Tetsuya Mukawa

On 2016/01/22 16:26, Xie, Huawei wrote:
> On 1/21/2016 7:08 PM, Tetsuya Mukawa wrote:
>> +static void
>> +phys_legacy_write16(struct virtio_hw *hw, uint16_t *addr, uint16_t val)
>> +{
>> +return outb_p((unsigned short)val,
>> +(unsigned short)(hw->io_base + (uint64_t)addr));
> outb_p -> outw_p
>
>> +}
>> +
>> +static void
>> +phys_legacy_write32(struct virtio_hw *hw, uint32_t *addr, uint32_t val)
>> +{
>> +return outb_p((unsigned int)val,
>> +(unsigned short)(hw->io_base + (uint64_t)addr));
> outb_p -> outl_p
>

Oops, thanks!

Tetsuya

[dpdk-dev] [PATCH 9/9] pci: blacklist only in global probe function

2016-01-22 Thread David Marchand

Move blacklist evaluation to rte_eal_pci_probe().
This way, rte_eal_pci_probe_one() (used by hotplug when attaching) now
accepts to probe devices that were blacklisted initially.

A new debug trace is added when skipping a device not part of a given
whitelist.

Signed-off-by: David Marchand 
---
 lib/librte_eal/common/eal_common_pci.c | 56 --
 1 file changed, 33 insertions(+), 23 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 71222a5..52d4594 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -209,21 +209,6 @@ static int
 pci_probe_device(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
int ret;
-   struct rte_pci_addr *loc = >addr;
-
-   RTE_LOG(DEBUG, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket %i\n",
-   loc->domain, loc->bus, loc->devid, loc->function,
-   dev->numa_node);
-
-   RTE_LOG(DEBUG, EAL, "  probe driver: %x:%x %s\n", dev->id.vendor_id,
-   dev->id.device_id, dr->name);
-
-   /* no initialization when blacklisted, return without error */
-   if (dev->devargs != NULL &&
-   dev->devargs->type == RTE_DEVTYPE_BLACKLISTED_PCI) {
-   RTE_LOG(DEBUG, EAL, "  Device is blacklisted, not 
initializing\n");
-   return 1;
-   }

if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
 #ifdef RTE_PCI_CONFIG
@@ -306,6 +291,14 @@ rte_eal_pci_probe_one(const struct rte_pci_addr *addr)
if (!dr)
goto err_return;

+   RTE_LOG(DEBUG, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket %i\n",
+   dev->addr.domain, dev->addr.bus,
+   dev->addr.devid, dev->addr.function,
+   dev->numa_node);
+
+   RTE_LOG(DEBUG, EAL, "  probe driver: %x:%x %s\n", dev->id.vendor_id,
+   dev->id.device_id, dr->name);
+
if (pci_probe_device(dr, dev) < 0)
goto err_return;

@@ -364,10 +357,8 @@ rte_eal_pci_probe(void)
struct rte_pci_device *dev;
struct rte_pci_driver *dr;
struct rte_devargs *devargs;
-   int probe_all = 0;
-
-   if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) == 0)
-   probe_all = 1;
+   int whitelist = rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI);
+   int blacklist = rte_eal_devargs_type_count(RTE_DEVTYPE_BLACKLISTED_PCI);

TAILQ_FOREACH(dev, _device_list, next) {

@@ -381,11 +372,30 @@ rte_eal_pci_probe(void)
if (devargs != NULL)
dev->devargs = devargs;

-   /* skip if not probing all and device is not whitelisted */
-   if (!probe_all &&
-   (devargs == NULL ||
-devargs->type != RTE_DEVTYPE_WHITELISTED_PCI))
+   RTE_LOG(DEBUG, EAL,
+   "PCI device "PCI_PRI_FMT" on NUMA socket %i\n",
+   dev->addr.domain, dev->addr.bus,
+   dev->addr.devid, dev->addr.function,
+   dev->numa_node);
+
+   RTE_LOG(DEBUG, EAL, "  probe driver: %x:%x %s\n",
+   dev->id.vendor_id, dev->id.device_id, dr->name);
+
+   /*
+* We want to probe this device when:
+* - there is no whitelist/blacklist,
+* - there is a whitelist, and device is part of it,
+* - there is a blacklist, and device is not part of it.
+*/
+   if (whitelist && !devargs) {
+   RTE_LOG(DEBUG, EAL, "  Device is not whitelisted, not 
initializing\n");
+   continue;
+   }
+
+   if (blacklist && devargs) {
+   RTE_LOG(DEBUG, EAL, "  Device is blacklisted, not 
initializing\n");
continue;
+   }

if (pci_probe_device(dr, dev) < 0)
rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
-- 
1.9.1

[dpdk-dev] [PATCH 8/9] pci: remove driver lookup from detach

2016-01-22 Thread David Marchand

A device is linked to a driver at probe time.
When detaching, we should call this same driver detach() function and no
need for a driver lookup.

Signed-off-by: David Marchand 
---
 lib/librte_eal/common/eal_common_pci.c | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index a1efd57..71222a5 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -256,16 +256,9 @@ pci_probe_device(struct rte_pci_driver *dr, struct 
rte_pci_device *dev)
  * driver.
  */
 static int
-pci_detach_device(struct rte_pci_driver *dr, struct rte_pci_device *dev)
+pci_detach_device(struct rte_pci_device *dev)
 {
-   struct rte_pci_addr *loc = >addr;
-
-   RTE_LOG(DEBUG, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket %i\n",
-   loc->domain, loc->bus, loc->devid,
-   loc->function, dev->numa_node);
-
-   RTE_LOG(DEBUG, EAL, "  remove driver: %x:%x %s\n", dev->id.vendor_id,
-   dev->id.device_id, dr->name);
+   struct rte_pci_driver *dr = dev->driver;

if (dr->devuninit && (dr->devuninit(dev) < 0))
return -1;  /* negative value is an error */
@@ -332,20 +325,22 @@ int
 rte_eal_pci_detach(const struct rte_pci_addr *addr)
 {
struct rte_pci_device *dev;
-   struct rte_pci_driver *dr;

if (addr == NULL)
return -1;

dev = pci_find_device(addr);
-   if (!dev)
+   if (!dev || !dev->driver)
goto err_return;

-   dr = pci_find_driver(dev);
-   if (!dr)
-   goto err_return;
+   RTE_LOG(DEBUG, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket %i\n",
+   dev->addr.domain, dev->addr.bus, dev->addr.devid,
+   dev->addr.function, dev->numa_node);
+
+   RTE_LOG(DEBUG, EAL, "  remove driver: %x:%x %s\n", dev->id.vendor_id,
+   dev->id.device_id, dev->driver->name);

-   if (pci_detach_device(dr, dev) < 0)
+   if (pci_detach_device(dev) < 0)
goto err_return;

TAILQ_REMOVE(_device_list, dev, next);
-- 
1.9.1

[dpdk-dev] [PATCH 7/9] pci: factorize driver search

2016-01-22 Thread David Marchand

Same idea as a few commits before, no need to implement the same logic in
probe and detach functions.
Here, the driver matching is done by a helper, then probe / detach
functions are called respectively.

Signed-off-by: David Marchand 
---
 lib/librte_eal/common/eal_common_pci.c | 96 --
 1 file changed, 34 insertions(+), 62 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 769c42e..a1efd57 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -280,55 +280,16 @@ pci_detach_device(struct rte_pci_driver *dr, struct 
rte_pci_device *dev)
return 0;
 }

-/*
- * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if initialization
- * failed, return 1 if no driver is found for this device.
- */
-static int
-pci_probe_all_drivers(struct rte_pci_device *dev)
-{
-   struct rte_pci_driver *dr = NULL;
-   int rc = 0;
-
-   TAILQ_FOREACH(dr, _driver_list, next) {
-   if (!pci_driver_supports_device(dr, dev))
-   continue;
-
-   rc = pci_probe_device(dr, dev);
-   if (rc < 0)
-   /* negative value is an error */
-   return -1;
-   if (rc > 0)
-   /* positive value means device is blacklisted */
-   continue;
-   return 0;
-   }
-   return 1;
-}
-
-/*
- * If vendor/device ID match, call the devuninit() function of all
- * registered driver for the given device. Return -1 if initialization
- * failed, return 1 if no driver is found for this device.
- */
-static int
-pci_detach_all_drivers(struct rte_pci_device *dev)
+static struct rte_pci_driver *
+pci_find_driver(struct rte_pci_device *dev)
 {
-   struct rte_pci_driver *dr = NULL;
-   int rc = 0;
+   struct rte_pci_driver *dr;

TAILQ_FOREACH(dr, _driver_list, next) {
-   if (!pci_driver_supports_device(dr, dev))
-   continue;
-
-   rc = pci_detach_device(dr, dev);
-   if (rc < 0)
-   /* negative value is an error */
-   return -1;
-   return 0;
+   if (pci_driver_supports_device(dr, dev))
+   break;
}
-   return 1;
+   return dr;
 }

 /*
@@ -338,8 +299,8 @@ pci_detach_all_drivers(struct rte_pci_device *dev)
 int
 rte_eal_pci_probe_one(const struct rte_pci_addr *addr)
 {
-   struct rte_pci_device *dev = NULL;
-   int ret = 0;
+   struct rte_pci_device *dev;
+   struct rte_pci_driver *dr;

if (addr == NULL)
return -1;
@@ -348,8 +309,11 @@ rte_eal_pci_probe_one(const struct rte_pci_addr *addr)
if (!dev)
goto err_return;

-   ret = pci_probe_all_drivers(dev);
-   if (ret < 0)
+   dr = pci_find_driver(dev);
+   if (!dr)
+   goto err_return;
+
+   if (pci_probe_device(dr, dev) < 0)
goto err_return;

return 0;
@@ -367,8 +331,8 @@ err_return:
 int
 rte_eal_pci_detach(const struct rte_pci_addr *addr)
 {
-   struct rte_pci_device *dev = NULL;
-   int ret = 0;
+   struct rte_pci_device *dev;
+   struct rte_pci_driver *dr;

if (addr == NULL)
return -1;
@@ -377,8 +341,11 @@ rte_eal_pci_detach(const struct rte_pci_addr *addr)
if (!dev)
goto err_return;

-   ret = pci_detach_all_drivers(dev);
-   if (ret < 0)
+   dr = pci_find_driver(dev);
+   if (!dr)
+   goto err_return;
+
+   if (pci_detach_device(dr, dev) < 0)
goto err_return;

TAILQ_REMOVE(_device_list, dev, next);
@@ -399,28 +366,33 @@ err_return:
 int
 rte_eal_pci_probe(void)
 {
-   struct rte_pci_device *dev = NULL;
+   struct rte_pci_device *dev;
+   struct rte_pci_driver *dr;
struct rte_devargs *devargs;
int probe_all = 0;
-   int ret = 0;

if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) == 0)
probe_all = 1;

TAILQ_FOREACH(dev, _device_list, next) {

+   /* no driver available */
+   dr = pci_find_driver(dev);
+   if (!dr)
+   continue;
+
/* set devargs in PCI structure */
devargs = pci_devargs_lookup(dev);
if (devargs != NULL)
dev->devargs = devargs;

-   /* probe all or only whitelisted devices */
-   if (probe_all)
-   ret = pci_probe_all_drivers(dev);
-   else if (devargs != NULL &&
-   devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
-   ret = pci_probe_all_drivers(dev);
-   if (ret < 0)
+   /* skip if not

[dpdk-dev] [PATCH 6/9] pci: cosmetic change

2016-01-22 Thread David Marchand

Indent previous commit, rename functions (internal and local functions do
not need rte_eal_ prefix).

Signed-off-by: David Marchand 
---
 lib/librte_eal/common/eal_common_pci.c | 106 -
 1 file changed, 52 insertions(+), 54 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 44549f7..769c42e 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -206,50 +206,49 @@ pci_driver_supports_device(const struct rte_pci_driver 
*dr,
  * driver.
  */
 static int
-rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr,
-struct rte_pci_device *dev)
+pci_probe_device(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
int ret;
-   struct rte_pci_addr *loc = >addr;
+   struct rte_pci_addr *loc = >addr;

-   RTE_LOG(DEBUG, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket 
%i\n",
-   loc->domain, loc->bus, loc->devid, 
loc->function,
-   dev->numa_node);
+   RTE_LOG(DEBUG, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket %i\n",
+   loc->domain, loc->bus, loc->devid, loc->function,
+   dev->numa_node);

-   RTE_LOG(DEBUG, EAL, "  probe driver: %x:%x %s\n", 
dev->id.vendor_id,
-   dev->id.device_id, dr->name);
+   RTE_LOG(DEBUG, EAL, "  probe driver: %x:%x %s\n", dev->id.vendor_id,
+   dev->id.device_id, dr->name);

-   /* no initialization when blacklisted, return without error */
-   if (dev->devargs != NULL &&
-   dev->devargs->type == RTE_DEVTYPE_BLACKLISTED_PCI) {
-   RTE_LOG(DEBUG, EAL, "  Device is blacklisted, not 
initializing\n");
-   return 1;
-   }
+   /* no initialization when blacklisted, return without error */
+   if (dev->devargs != NULL &&
+   dev->devargs->type == RTE_DEVTYPE_BLACKLISTED_PCI) {
+   RTE_LOG(DEBUG, EAL, "  Device is blacklisted, not 
initializing\n");
+   return 1;
+   }

-   if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
+   if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
 #ifdef RTE_PCI_CONFIG
-   /*
-* Set PCIe config space for high performance.
-* Return value can be ignored.
-*/
-   pci_config_space_set(dev);
+   /*
+* Set PCIe config space for high performance.
+* Return value can be ignored.
+*/
+   pci_config_space_set(dev);
 #endif
-   /* map resources for devices that use igb_uio */
-   ret = pci_map_device(dev);
-   if (ret != 0)
-   return ret;
-   } else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
-   rte_eal_process_type() == RTE_PROC_PRIMARY) {
-   /* unbind current driver */
-   if (pci_unbind_kernel_driver(dev) < 0)
-   return -1;
-   }
-
-   /* reference driver structure */
-   dev->driver = dr;
-
-   /* call the driver devinit() function */
-   return dr->devinit(dr, dev);
+   /* map resources for devices that use igb_uio */
+   ret = pci_map_device(dev);
+   if (ret != 0)
+   return ret;
+   } else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
+  rte_eal_process_type() == RTE_PROC_PRIMARY) {
+   /* unbind current driver */
+   if (pci_unbind_kernel_driver(dev) < 0)
+   return -1;
+   }
+
+   /* reference driver structure */
+   dev->driver = dr;
+
+   /* call the driver devinit() function */
+   return dr->devinit(dr, dev);
 }

 /*
@@ -257,29 +256,28 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr,
  * driver.
  */
 static int
-rte_eal_pci_detach_dev(struct rte_pci_driver *dr,
-   struct rte_pci_device *dev)
+pci_detach_device(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
-   struct rte_pci_addr *loc = >addr;
+   struct rte_pci_addr *loc = >addr;

-   RTE_LOG(DEBUG, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket 
%i\n",
-   loc->domain, loc->bus, loc->devid,
-   loc->function, dev->numa_node);
+   RTE_LOG(DEBUG, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket %i\n",
+   loc->domain, loc->bus, loc->devid,
+   loc->function, dev->numa_node);

-   RTE_LOG(DEBUG, EAL, "  remove driver: %x:%x %s\n", 
dev->id.vendor_id,
-

[dpdk-dev] [PATCH 5/9] pci: factorize probe/detach code

2016-01-22 Thread David Marchand

Move pci id matching to a helper and reuse it in probe and detach
functions.

Signed-off-by: David Marchand 
---
 lib/librte_eal/common/eal_common_pci.c | 67 --
 1 file changed, 23 insertions(+), 44 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 2528775..44549f7 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -174,14 +174,10 @@ pci_add_device(struct rte_pci_device *dev)
return 0;
 }

-/*
- * If vendor/device ID match, call the devinit() function of the
- * driver.
- */
 static int
-rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device 
*dev)
+pci_driver_supports_device(const struct rte_pci_driver *dr,
+  const struct rte_pci_device *dev)
 {
-   int ret;
const struct rte_pci_id *id_table;

for (id_table = dr->id_table; id_table->vendor_id != 0; id_table++) {
@@ -200,6 +196,20 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d
id_table->subsystem_device_id != PCI_ANY_ID)
continue;

+   return 1;
+   }
+   return 0;
+}
+
+/*
+ * If vendor/device ID match, call the devinit() function of the
+ * driver.
+ */
+static int
+rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr,
+struct rte_pci_device *dev)
+{
+   int ret;
struct rte_pci_addr *loc = >addr;

RTE_LOG(DEBUG, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket 
%i\n",
@@ -240,9 +250,6 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d

/* call the driver devinit() function */
return dr->devinit(dr, dev);
-   }
-   /* return positive value if driver is not found */
-   return 1;
 }

 /*
@@ -253,27 +260,6 @@ static int
 rte_eal_pci_detach_dev(struct rte_pci_driver *dr,
struct rte_pci_device *dev)
 {
-   const struct rte_pci_id *id_table;
-
-   if ((dr == NULL) || (dev == NULL))
-   return -EINVAL;
-
-   for (id_table = dr->id_table; id_table->vendor_id != 0; id_table++) {
-
-   /* check if device's identifiers match the driver's ones */
-   if (id_table->vendor_id != dev->id.vendor_id &&
-   id_table->vendor_id != PCI_ANY_ID)
-   continue;
-   if (id_table->device_id != dev->id.device_id &&
-   id_table->device_id != PCI_ANY_ID)
-   continue;
-   if (id_table->subsystem_vendor_id != 
dev->id.subsystem_vendor_id &&
-   id_table->subsystem_vendor_id != PCI_ANY_ID)
-   continue;
-   if (id_table->subsystem_device_id != 
dev->id.subsystem_device_id &&
-   id_table->subsystem_device_id != PCI_ANY_ID)
-   continue;
-
struct rte_pci_addr *loc = >addr;

RTE_LOG(DEBUG, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket 
%i\n",
@@ -294,10 +280,6 @@ rte_eal_pci_detach_dev(struct rte_pci_driver *dr,
pci_unmap_device(dev);

return 0;
-   }
-
-   /* return positive value if driver is not found */
-   return 1;
 }

 /*
@@ -311,16 +293,16 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
struct rte_pci_driver *dr = NULL;
int rc = 0;

-   if (dev == NULL)
-   return -1;
-
TAILQ_FOREACH(dr, _driver_list, next) {
+   if (!pci_driver_supports_device(dr, dev))
+   continue;
+
rc = rte_eal_pci_probe_one_driver(dr, dev);
if (rc < 0)
/* negative value is an error */
return -1;
if (rc > 0)
-   /* positive value means driver not found */
+   /* positive value means device is blacklisted */
continue;
return 0;
}
@@ -338,17 +320,14 @@ pci_detach_all_drivers(struct rte_pci_device *dev)
struct rte_pci_driver *dr = NULL;
int rc = 0;

-   if (dev == NULL)
-   return -1;
-
TAILQ_FOREACH(dr, _driver_list, next) {
+   if (!pci_driver_supports_device(dr, dev))
+   continue;
+
rc = rte_eal_pci_detach_dev(dr, dev);
if (rc < 0)
/* negative value is an error */
return -1;
-   if (rc > 0)
-   /* positive value means driver not found */
-   continue;
return 0;
}
return 1;
-- 
1.9.1

[dpdk-dev] [PATCH 4/9] pci: rework sysfs parsing for driver

2016-01-22 Thread David Marchand

There is no use for pci_get_kernel_driver_by_path() apart recognising
kernel driver and fill kdrv field.
If driver parsing fails, report "none" driver rather than "unknown".

Signed-off-by: David Marchand 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 48 ++-
 1 file changed, 19 insertions(+), 29 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 7bec30f..00fdee7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -94,32 +94,37 @@ error:
 }

 static int
-pci_get_kernel_driver_by_path(const char *filename, char *dri_name)
+pci_parse_sysfs_driver(const char *filename, struct rte_pci_device *dev)
 {
int count;
char path[PATH_MAX];
char *name;

-   if (!filename || !dri_name)
-   return -1;
-
count = readlink(filename, path, PATH_MAX);
if (count >= PATH_MAX)
return -1;

-   /* For device does not have a driver */
-   if (count < 0)
-   return 1;
+   dev->kdrv = RTE_KDRV_NONE;

-   path[count] = '\0';
+   if (count > 0) {
+   dev->kdrv = RTE_KDRV_UNKNOWN;

-   name = strrchr(path, '/');
-   if (name) {
-   strncpy(dri_name, name + 1, strlen(name + 1) + 1);
-   return 0;
+   path[count] = '\0';
+   name = strrchr(path, '/');
+   if (name) {
+   name[0] = '\0';
+   name++;
+   }
+
+   if (!strcmp(name, "vfio-pci"))
+   dev->kdrv = RTE_KDRV_VFIO;
+   else if (!strcmp(name, "igb_uio"))
+   dev->kdrv = RTE_KDRV_IGB_UIO;
+   else if (!strcmp(name, "uio_pci_generic"))
+   dev->kdrv = RTE_KDRV_UIO_GENERIC;
}

-   return -1;
+   return 0;
 }

 /* Map pci device */
@@ -260,8 +265,6 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t 
bus,
unsigned long tmp;
struct rte_pci_device *dev;
struct rte_pci_device *dev2;
-   char driver[PATH_MAX];
-   int ret;

dev = malloc(sizeof(*dev));
if (dev == NULL)
@@ -341,25 +344,12 @@ pci_scan_one(const char *dirname, uint16_t domain, 
uint8_t bus,

/* parse driver */
snprintf(filename, sizeof(filename), "%s/driver", dirname);
-   ret = pci_get_kernel_driver_by_path(filename, driver);
-   if (ret < 0) {
+   if (pci_parse_sysfs_driver(filename, dev) < 0) {
RTE_LOG(ERR, EAL, "Fail to get kernel driver\n");
free(dev);
return -1;
}

-   if (!ret) {
-   if (!strcmp(driver, "vfio-pci"))
-   dev->kdrv = RTE_KDRV_VFIO;
-   else if (!strcmp(driver, "igb_uio"))
-   dev->kdrv = RTE_KDRV_IGB_UIO;
-   else if (!strcmp(driver, "uio_pci_generic"))
-   dev->kdrv = RTE_KDRV_UIO_GENERIC;
-   else
-   dev->kdrv = RTE_KDRV_UNKNOWN;
-   } else
-   dev->kdrv = RTE_KDRV_UNKNOWN;
-
dev2 = pci_find_device(>addr);
if (!dev2)
pci_add_device(dev);
-- 
1.9.1

[dpdk-dev] [PATCH 3/9] pci: minor cleanup

2016-01-22 Thread David Marchand

Those are the only fields that are explicitly set to 0 while the global
dev pointer is set to 0 a few lines before.

Signed-off-by: David Marchand 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 83eec5d..7bec30f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -308,7 +308,6 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t 
bus,
dev->id.subsystem_device_id = (uint16_t)tmp;

/* get max_vfs */
-   dev->max_vfs = 0;
snprintf(filename, sizeof(filename), "%s/max_vfs", dirname);
if (!access(filename, F_OK) &&
eal_parse_sysfs_value(filename, ) == 0)
@@ -323,12 +322,8 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t 
bus,
}

/* get numa node */
-   snprintf(filename, sizeof(filename), "%s/numa_node",
-dirname);
-   if (access(filename, R_OK) != 0) {
-   /* if no NUMA support, set default to 0 */
-   dev->numa_node = 0;
-   } else {
+   snprintf(filename, sizeof(filename), "%s/numa_node", dirname);
+   if (!access(filename, R_OK)) {
if (eal_parse_sysfs_value(filename, ) < 0) {
free(dev);
return -1;
-- 
1.9.1

[dpdk-dev] [PATCH 2/9] pci: add internal device list helpers

2016-01-22 Thread David Marchand

Remove duplicate code by moving this to helpers in common.

Signed-off-by: David Marchand 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c | 34 +---
 lib/librte_eal/common/eal_common_pci.c  | 70 -
 lib/librte_eal/common/include/rte_pci.h | 24 +++
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 34 +---
 4 files changed, 94 insertions(+), 68 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 4584522..401cda3 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -247,6 +247,7 @@ static int
 pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 {
struct rte_pci_device *dev;
+   struct rte_pci_device *dev2;
struct pci_bar_io bar;
unsigned i, max;

@@ -311,32 +312,15 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
dev->mem_resource[i].phys_addr = bar.pbi_base & 
~((uint64_t)0xf);
}

-   /* device is valid, add in list (sorted) */
-   if (TAILQ_EMPTY(_device_list)) {
-   TAILQ_INSERT_TAIL(_device_list, dev, next);
-   }
+   dev2 = pci_find_device(>addr);
+   if (!dev2)
+   pci_add_device(dev);
else {
-   struct rte_pci_device *dev2 = NULL;
-   int ret;
-
-   TAILQ_FOREACH(dev2, _device_list, next) {
-   ret = rte_eal_compare_pci_addr(>addr, >addr);
-   if (ret > 0)
-   continue;
-   else if (ret < 0) {
-   TAILQ_INSERT_BEFORE(dev2, dev, next);
-   return 0;
-   } else { /* already registered */
-   dev2->kdrv = dev->kdrv;
-   dev2->max_vfs = dev->max_vfs;
-   memmove(dev2->mem_resource,
-   dev->mem_resource,
-   sizeof(dev->mem_resource));
-   free(dev);
-   return 0;
-   }
-   }
-   TAILQ_INSERT_TAIL(_device_list, dev, next);
+   dev2->kdrv = dev->kdrv;
+   dev2->max_vfs = dev->max_vfs;
+   memmove(dev2->mem_resource, dev->mem_resource,
+   sizeof(dev->mem_resource));
+   free(dev);
}

return 0;
diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 1e12776..2528775 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -139,6 +139,41 @@ pci_unmap_resource(void *requested_addr, size_t size)
requested_addr);
 }

+struct rte_pci_device *
+pci_find_device(const struct rte_pci_addr *addr)
+{
+   struct rte_pci_device *dev;
+
+   TAILQ_FOREACH(dev, _device_list, next) {
+   if (!rte_eal_compare_pci_addr(>addr, addr))
+   break;
+   }
+
+   return dev;
+}
+
+int
+pci_add_device(struct rte_pci_device *dev)
+{
+   struct rte_pci_device *dev2;
+   int ret;
+
+   TAILQ_FOREACH(dev2, _device_list, next) {
+   ret = rte_eal_compare_pci_addr(>addr, >addr);
+   if (ret > 0)
+   continue;
+
+   if (ret == 0)
+   return -1;
+
+   TAILQ_INSERT_BEFORE(dev2, dev, next);
+   return 0;
+   }
+
+   TAILQ_INSERT_TAIL(_device_list, dev, next);
+   return 0;
+}
+
 /*
  * If vendor/device ID match, call the devinit() function of the
  * driver.
@@ -332,16 +367,15 @@ rte_eal_pci_probe_one(const struct rte_pci_addr *addr)
if (addr == NULL)
return -1;

-   TAILQ_FOREACH(dev, _device_list, next) {
-   if (rte_eal_compare_pci_addr(>addr, addr))
-   continue;
+   dev = pci_find_device(addr);
+   if (!dev)
+   goto err_return;

-   ret = pci_probe_all_drivers(dev);
-   if (ret < 0)
-   goto err_return;
-   return 0;
-   }
-   return -1;
+   ret = pci_probe_all_drivers(dev);
+   if (ret < 0)
+   goto err_return;
+
+   return 0;

 err_return:
RTE_LOG(WARNING, EAL, "Requested device " PCI_PRI_FMT
@@ -362,18 +396,16 @@ rte_eal_pci_detach(const struct rte_pci_addr *addr)
if (addr == NULL)
return -1;

-   TAILQ_FOREACH(dev, _device_list, next) {
-   if (rte_eal_compare_pci_addr(>addr, addr))
-   continue;
+   dev = pci_find_device(addr);
+   if (!dev)
+   goto err_return;

-   ret = pci_detach_all_drivers(dev);
-   if (ret < 0)
-   goto err_return;
+   ret =

[dpdk-dev] [PATCH 0/9] pci cleanup and blacklist rework

2016-01-22 Thread David Marchand

Before 2.2.0 release, while preparing for more changes in eal (and fixing
a problem reported by Roger M. [1]), I came up with this patchset that
tries to make the pci code more compact and easier to read.

I did limited testing, but this has been in my tree for quite some time
now, so sending this to get feedback for 2.3 / 16.04.


The 4th patch introduces a change in linux eal.
Before, if a pci device was bound to no kernel driver, eal would set kdrv
to "unknown". With this change, kdrv is set to "none".
This might make it possible to avoid the old issue of virtio devices being
used by dpdk while still bound to kernel driver reported by Franck B. [2].
I'll let virtio guys look at this.
At the very least, it makes more sense to me.


The 5th and 6th patches could be squashed.
I just find it easier to review this way.


In the 8th patch, I noticed that a driver lookup is done when detaching a
pci device while this makes no sense and looks buggy to me.


The last patch moves pci blacklist evaluation to rte_eal_pci_probe().
With this, it is now possible to attach a pci device blacklisted
initially.
I am not entirely happy with this last patch, I will try to send a
different solution next week (hooks in pci layer).


[1] http://dpdk.org/ml/archives/dev/2015-November/028140.html
[2] http://dpdk.org/ml/archives/dev/2015-September/023389.html

-- 
David Marchand

David Marchand (9):
  pci: no need for dynamic tailq init
  pci: add internal device list helpers
  pci: minor cleanup
  pci: rework sysfs parsing for driver
  pci: factorize probe/detach code
  pci: cosmetic change
  pci: factorize driver search
  pci: remove driver lookup from detach
  pci: blacklist only in global probe function

 lib/librte_eal/bsdapp/eal/eal_pci.c |  37 +---
 lib/librte_eal/common/eal_common_pci.c  | 320 +++-
 lib/librte_eal/common/include/rte_pci.h |  24 +++
 lib/librte_eal/linuxapp/eal/eal_pci.c   |  94 --
 4 files changed, 218 insertions(+), 257 deletions(-)

-- 
1.9.1

[dpdk-dev] [RFC] Remove "extern" keyword for functions from headers

2016-01-22 Thread Ferruh Yigit

Does anybody have an objection if I send a patch to remove "extern"
keywords in header files, the ones for function prototypes.

Regards,
ferruh

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread David Harton (dharton)

> > xstats are driver agnostic and have a well-defined naming scheme.
> 
> Indeed, described here:
> http://dpdk.org/doc/guides/prog_guide/poll_mode_drv.html#extended-
> statistics-api

Thanks for sharing.  I do think what is in the link is well thought out but it 
also may not match how the application would choose to represent that stat 
(capitalization, abbreviation, etc).

> From: David Harton:
> > For example, what if there was a kind of "stats registry" composed of
> > ID and name.  It would work similar to xtats except instead of
> > advertising strings only the "get" API would return ID/count pairs.
> > If the application wishes to use the DPDK provided string they can
> > call another API to get the stat string via the ID.  These IDs would
> > be well-defined clearly explaining what the count represent.  This way
> > the strings for counts will be uniform across drivers and also make it
> clear to the users what the counts truly represent and the application
> could obtain stats from any driver in a driver agnostic manner.
> 
> What you (David Harton) describe about a well-defined name, and clearly
> explaining the value it is representing is what the goal was for xstats: a
> human readable string, which can be easily parsed and a value retrieved.
> 
> The "rules" of encoding a statistic as an xstats string are pretty simple,
> and if any PMD breaks the rules, it should be considered broken and fixed.

I understand it may be broken, but that doesn't help finding them and ensuring 
they aren't broken to begin with.  What regression/automation is in place to 
ensure drivers aren't broken?

> 
> Consistency is key, in this case consistency in the PMD xstats
> implementations to make it useful for clients of the API.
> 
> The big advantage xstats has over adding items to rte_eth_stats is that it
> won't break ABI.

100% agree.  I can see the intent.

> 
> Do you see a fundamental problem with parsing the strings to retrieve
> values?

I think parsing strings is expensive CPU wise and again subject to error as 
devices are added.  That's why I was wondering if a binary id could be 
generated instead.  The binary id could also be used to look up the string name 
if the application wants it.  The API would be very similar to the current 
xstats API except it would pass id/value pairs instead of string/value pairs.  
This avoids string comparisons to figure out while still allowing flexibility 
between drivers.  It would also 100% guarantee that any strings returned by 
"const char* rte_eth_xstats_str_get(uint32_t stat_id)" are consistent across 
devices.

I also think there is less chance for error if drivers assign their stats to ID 
versus constructing a string.  I haven't checked my application, but I wonder 
if the binary size would actually be smaller if all the stats strings were 
centrally located versus distributed across binaries (highly dependent on the 
linker and optimization level).

> 
> If you like I could code up a minimal sample-app that only pulls
> statistics, and you can show "interest" in various statistics for printing
> / monitoring?

Appreciate the offer.  I actually understand what's is available.  And, BTW, I 
apologize for being late to the game (looks like this was discussed last 
summer) but I'm just now getting looped in and following the mailer but I'm 
just wondering if something that is performance friendly for the 
user/application and flexible for the drivers is possible.

Thanks,
Dave

> 
> 
> Regards, -Harry

[dpdk-dev] [RFC 2/2] kdp: add virtual PMD for kernel slow data path communication

2016-01-22 Thread Ferruh Yigit

This patch provides slow data path communication to the Linux kernel.
Patch is based on librte_kni, and heavily re-uses it.

The main difference is librte_kni library converted into a PMD, to
provide ease of use for applications.

Now any application can use slow path communication without any update
in application, because of existing eal support for virtual PMD.

PMD's rx_pkt_burst() get packets from FIFO, and tx_pkt_burst() puts
packet to the FIFO.
The corresponding Linux virtual network device driver code
also gets/puts packages from FIFO as they are coming from hardware.

Signed-off-by: Ferruh Yigit 
---
 config/common_linuxapp  |   1 +
 drivers/net/Makefile|   3 +-
 drivers/net/kdp/Makefile|  60 +
 drivers/net/kdp/rte_eth_kdp.c   | 405 
 drivers/net/kdp/rte_kdp.c   | 365 
 drivers/net/kdp/rte_kdp.h   | 113 +
 drivers/net/kdp/rte_kdp_fifo.h  |  91 +++
 drivers/net/kdp/rte_pmd_kdp_version.map |   4 +
 lib/librte_eal/common/include/rte_log.h |   3 +-
 mk/rte.app.mk   |   3 +-
 10 files changed, 1045 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/kdp/Makefile
 create mode 100644 drivers/net/kdp/rte_eth_kdp.c
 create mode 100644 drivers/net/kdp/rte_kdp.c
 create mode 100644 drivers/net/kdp/rte_kdp.h
 create mode 100644 drivers/net/kdp/rte_kdp_fifo.h
 create mode 100644 drivers/net/kdp/rte_pmd_kdp_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 73c91d8..b9dec0c 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -322,6 +322,7 @@ CONFIG_RTE_LIBRTE_PMD_NULL=y
 #
 # Compile KDP PMD
 #
+CONFIG_RTE_LIBRTE_PMD_KDP=y
 CONFIG_RTE_KDP_KMOD=y
 CONFIG_RTE_KDP_PREEMPT_DEFAULT=y

diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 6e4497e..0be06f5 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -51,6 +51,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += szedata2
 DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += kdp

 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/kdp/Makefile b/drivers/net/kdp/Makefile
new file mode 100644
index 000..ac44c0f
--- /dev/null
+++ b/drivers/net/kdp/Makefile
@@ -0,0 +1,60 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_kdp.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_kdp_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += rte_eth_kdp.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += rte_kdp.c
+
+#
+# Export include files
+#
+SYMLINK-y-include +=
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += lib/librte_ether
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git

[dpdk-dev] [RFC 1/2] kdp: add kernel data path kernel module

2016-01-22 Thread Ferruh Yigit

This kernel module is based on KNI module, but this one is stripped
version of it and only for data messages, no control functionality
provided.

FIFO implementation of the KNI is kept exact same, but ethtool related
code removed and virtual network management related code simplified.

This module contains kernel support to create network devices and
this module has a simple driver for virtual network device, the driver
simply puts/gets packages to/from FIFO instead of real hardware.

FIFO is created owned by userspace application, which is for this case
KDP PMD.

Signed-off-by: Ferruh Yigit 
---
 config/common_linuxapp |   8 +-
 lib/librte_eal/linuxapp/Makefile   |   5 +-
 lib/librte_eal/linuxapp/eal/Makefile   |   3 +-
 .../linuxapp/eal/include/exec-env/rte_kdp_common.h | 138 +
 lib/librte_eal/linuxapp/kdp/Makefile   |  56 ++
 lib/librte_eal/linuxapp/kdp/compat.h   |  29 ++
 lib/librte_eal/linuxapp/kdp/kdp_dev.h  |  85 +++
 lib/librte_eal/linuxapp/kdp/kdp_fifo.h |  94 
 lib/librte_eal/linuxapp/kdp/kdp_misc.c | 463 +
 lib/librte_eal/linuxapp/kdp/kdp_net.c  | 578 +
 10 files changed, 1456 insertions(+), 3 deletions(-)
 create mode 100644 
lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h
 create mode 100644 lib/librte_eal/linuxapp/kdp/Makefile
 create mode 100644 lib/librte_eal/linuxapp/kdp/compat.h
 create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_dev.h
 create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_fifo.h
 create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_misc.c
 create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_net.c

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74bc515..73c91d8 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -320,6 +320,12 @@ CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
 CONFIG_RTE_LIBRTE_PMD_NULL=y

 #
+# Compile KDP PMD
+#
+CONFIG_RTE_KDP_KMOD=y
+CONFIG_RTE_KDP_PREEMPT_DEFAULT=y
+
+#
 # Do prefetch of packet data within PMD driver receive function
 #
 CONFIG_RTE_PMD_PACKET_PREFETCH=y
diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile
index d9c5233..e3f91a7 100644
--- a/lib/librte_eal/linuxapp/Makefile
+++ b/lib/librte_eal/linuxapp/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -38,6 +38,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal
 ifeq ($(CONFIG_RTE_KNI_KMOD),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kni
 endif
+ifeq ($(CONFIG_RTE_KDP_KMOD),y)
+DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kdp
+endif
 ifeq ($(CONFIG_RTE_LIBRTE_XEN_DOM0),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += xen_dom0
 endif
diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index 26eced5..ac72aea 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -116,6 +116,7 @@ CFLAGS_eal_thread.o += -Wno-return-type
 endif

 INC := rte_interrupts.h rte_kni_common.h rte_dom0_common.h
+INC += rte_kdp_common.h

 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP)-include/exec-env := \
$(addprefix include/exec-env/,$(INC))
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h 
b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h
new file mode 100644
index 000..71f7dc9
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h
@@ -0,0 +1,138 @@
+/*-
+ *   This file is provided under a dual BSD/LGPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GNU LESSER GENERAL PUBLIC LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2.1 of the GNU Lesser General Public License
+ *   as published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *   Lesser General Public License for more

[dpdk-dev] [RFC 0/2] slow data path communication between DPDK port and Linux

2016-01-22 Thread Ferruh Yigit

This is slow data path communication implementation based on existing KNI.

Difference is: librte_kni converted into a PMD, kdp kernel module is almost
same except all control path functionality removed and some simplification done.

Motivation is to simplify slow path data communication.
Now any application can use this new PMD to send/get data to Linux kernel.

PMD initialization functions handles creating virtual interfaces (with help of
kdp kernel module) and created FIFO. FIFO is used to share data between
userspace and kernelspace.


Sample usage:
1) Transfer any packet received from NIC that bound to DPDK, to the Linux kernel

a) insert kdp kernel module
insmod build/kmod/rte_kdp.ko

b) bind NIC to the DPDK using dpdk_nic_bind.py

c) ./testpmd --vdev eth_kdp0

c1) testpmd show two ports, one of them physical, other virtual
...
Configuring Port 0 (socket 0)
Port 0: 00:00:00:00:00:00
Configuring Port 1 (socket 0)
...
Checking link statuses...
Port 0 Link Up - speed 1 Mbps - full-duplex
Port 1 Link Up - speed 1 Mbps - full-duplex
Done

c2) This will create "kdp0" Linux interface
$ ip l show kdp0
21: kdp0:  mtu 1500 qdisc noop state DOWN mode DEFAULT 
group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

d) Linux port can be used for data

d1)
$ ifconfig kdp0 1.0.0.2
$ ping 1.0.0.1
PING 1.0.0.1 (1.0.0.1) 56(84) bytes of data.
64 bytes from 1.0.0.1: icmp_seq=1 ttl=64 time=0.789 ms
64 bytes from 1.0.0.1: icmp_seq=2 ttl=64 time=0.881 ms

d2)
$ tcpdump -nn -i kdp0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on kdp0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:01:22.407506 IP 1.0.0.1 > 1.0.0.2: ICMP echo request, id 40016, seq 18, 
length 64
15:01:22.408521 IP 1.0.0.2 > 1.0.0.1: ICMP echo reply, id 40016, seq 18, length 
64



2) Data travels between virtual Linux interfaces pass from DPDK application,
application can alter data

a) insert kdp kernel module
insmod build/kmod/rte_kdp.ko

b) No physical NIC involved

c) ./testpmd --vdev eth_kdp0 --vdev eth_kdp1

c1) testpmd show two ports, both of them are virtual
...
Configuring Port 0 (socket 0)
Port 0: 00:00:00:00:00:00
Configuring Port 1 (socket 0)
Port 1: 00:00:00:00:00:00
Checking link statuses...
Port 0 Link Up - speed 1 Mbps - full-duplex
Port 1 Link Up - speed 1 Mbps - full-duplex
Done

c2) This will create "kdp0"  and "kdp1" Linux interfaces
$ ip l show kdp0; ip l show kdp1
22: kdp0:  mtu 1500 qdisc noop state DOWN mode DEFAULT 
group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
23: kdp1:  mtu 1500 qdisc noop state DOWN mode DEFAULT 
group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

d) Data travel between virtual ports pass from DPDK application
$ifconfig kdp0 1.0.0.1
$ifconfig kdp1 1.0.0.2

d1)
$ ping 1.0.0.1
PING 1.0.0.1 (1.0.0.1) 56(84) bytes of data.
64 bytes from 1.0.0.1: icmp_seq=1 ttl=64 time=3.57 ms
64 bytes from 1.0.0.1: icmp_seq=2 ttl=64 time=1.85 ms
64 bytes from 1.0.0.1: icmp_seq=3 ttl=64 time=1.89 ms

d2)
$ tcpdump -nn -i kdp0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on kdp0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:20:51.908543 IP 1.0.0.2 > 1.0.0.1: ICMP echo request, id 41234, seq 1, 
length 64
15:20:51.909570 IP 1.0.0.1 > 1.0.0.2: ICMP echo reply, id 41234, seq 1, length 
64
15:20:52.909551 IP 1.0.0.2 > 1.0.0.1: ICMP echo request, id 41234, seq 2, 
length 64
15:20:52.910577 IP 1.0.0.1 > 1.0.0.2: ICMP echo reply, id 41234, seq 2, length 
64




Ferruh Yigit (2):
  kdp: add kernel data path kernel module
  kdp: add virtual PMD for kernel slow data path communication

 config/common_linuxapp |   9 +-
 drivers/net/Makefile   |   3 +-
 drivers/net/kdp/Makefile   |  60 +++
 drivers/net/kdp/rte_eth_kdp.c  | 405 +++
 drivers/net/kdp/rte_kdp.c  | 365 +
 drivers/net/kdp/rte_kdp.h  | 113 
 drivers/net/kdp/rte_kdp_fifo.h |  91 
 drivers/net/kdp/rte_pmd_kdp_version.map|   4 +
 lib/librte_eal/common/include/rte_log.h|   3 +-
 lib/librte_eal/linuxapp/Makefile   |   5 +-
 lib/librte_eal/linuxapp/eal/Makefile   |   3 +-
 .../linuxapp/eal/include/exec-env/rte_kdp_common.h | 138 +
 lib/librte_eal/linuxapp/kdp/Makefile   |  56 ++
 lib/librte_eal/linuxapp/kdp/compat.h   |  29 ++
 lib/librte_eal/linuxapp/kdp/kdp_dev.h  |  85 +++
 lib/librte_eal/linuxapp/kdp/kdp_fifo.h |  94 
 lib/librte_eal/linuxapp/kdp/kdp_misc.c | 463 +
 lib/librte_eal/linuxapp/kdp/kdp_net.c  | 578 +
 mk/rte.app.mk

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread Matthew Hall

On Fri, Jan 22, 2016 at 06:02:24PM +0300, Igor Ryzhov wrote:
> How about exposing stats according to IF-MIB?
> 
> Statistics to be exposed are - octets, unicast packets, multicast packets, 
> broadcast packets, errors and discards for both TX and RX.
> 
> These counters are basic and implemented by most of drivers.

To be a bit more specific it would be good to have IF-MIB ifTable with the 
items from ifXTable as well:

ifIndex
ifMtu
ifHighSpeed
ifPromiscuousMode
ifPhysAddress
ifConnectorPresent

ifHCInOctets
ifHCInUcastPkts
ifHCInMulticastPkts
ifHCInBroadcastPkts
ifInDiscards
ifInErrors
ifInUnknownProtos

ifHCOutOctets
ifHCOutUcastPkts
ifHCOutMulticastPkts
ifHCOutBroadcastPkts
ifOutDiscards
ifOutErrors

A number of things are missing or weird in the DPDK stats interface. Then I 
get stuck trying to maintain them in my app instead and it's annoying.

Also, it is nice to get the struct populated atomically so the values are as 
self-consistent as possible. If you have to call a function separately on each 
stat it makes them self-inconsistent because it is less atomically populated.

>From long experience, this inconsistency is quite annoying when trying to make 
very accurate traffic measurements in network management software.

Matthew.

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread Thomas Monjalon

+ Harry

2016-01-22 14:40, David Harton:
> Hi Maryam,
> 
> I'm not dictating they be re-added (although adding would be nice) but more 
> important I'm trying to express an application view point rather than a 
> driver view point.  
> 
> I completely understand how a driver wants to be able to advertise all the 
> stats they want to advertise to help them debug their issues (i.e. xstats).   
> Yet, I'm very interested in DPDK providing a driver agnostic method of 
> advertising well-defined stats.

xstats are driver agnostic and have a well-defined naming scheme.

> For example, what if there was a kind of "stats registry" composed of ID and 
> name.  It would work similar to xtats except instead of advertising strings 
> only the "get" API would return ID/count pairs.  If the application wishes to 
> use the DPDK provided string they can call another API to get the stat string 
> via the ID.  These IDs would be well-defined clearly explaining what the 
> count represent.  This way the strings for counts will be uniform across 
> drivers and also make it clear to the users what the counts truly represent 
> and the application could obtain stats from any driver in a driver agnostic 
> manner.

I don't understand how adding another indirection (an ID matching a string)
would help?
I have the feeling you want a list of possible statistics, right?

PS: please avoid top posting

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread Thomas Monjalon

2016-01-22 14:18, Tahhan, Maryam:
> So what can be enabled again in struct rte_eth_stats  from what was already 
> there is the equivalent of: 
> * rx_length_errors
> * rx_crc_errors
> * rx_missed_errors - the deprecation notice was removed for this field.
> * multicast
> 
> What should be added in to distinguish between errors and drops. struct 
> rte_eth_stats :
>  * rx_errors
>  * tx_errors
> 
> As for the detailed rx errors and tx errors I'm open to feedback from you 
> folks as to what should go in and what is too detailed. These weren't in 
> struct rte_eth_stats previously, they are available through xstats and are 
> uniformly named across the drivers. Oliver + Harry any thoughts?
> 
> David I assume you are looking for all the missing fields to be added?

They are not missing. They just not exactly match ones having a
long history in Linux kernel.
Please let's avoid to blindly mimic others without thinking
about modern needs.

> > > From: David Harton
> > > > Is there a reason the stats have been deprecated?  Why not keep
> > > > the stats in line with the standard linux practices such as
> > > > rtnl_link_stats64?

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread Van Haaren, Harry

> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Subject: Re: [dpdk-dev] Future Direction for rte_eth_stats_get()

> + Harry

Hey all,

> xstats are driver agnostic and have a well-defined naming scheme.

Indeed, described here:
http://dpdk.org/doc/guides/prog_guide/poll_mode_drv.html#extended-statistics-api

All of the rte_eth_stats statistics are presented in xstats consistently
(independent of which PMD is running), as they are implemented in rte_ethdev.c:
http://dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ethdev.c#n1500


From: David Harton:
> For example, what if there was a kind of "stats registry" composed of ID and 
> name.  It
> would work similar to xtats except instead of advertising strings only the 
> "get" API would
> return ID/count pairs.  If the application wishes to use the DPDK provided 
> string they can
> call another API to get the stat string via the ID.  These IDs would be 
> well-defined
> clearly explaining what the count represent.  This way the strings for counts 
> will be
> uniform across drivers and also make it clear to the users what the counts 
> truly represent
> and the application could obtain stats from any driver in a driver agnostic 
> manner.

What you (David Harton) describe about a well-defined name, and clearly 
explaining the value
it is representing is what the goal was for xstats: a human readable string, 
which can be
easily parsed and a value retrieved.

The "rules" of encoding a statistic as an xstats string are pretty simple, and 
if any PMD
breaks the rules, it should be considered broken and fixed.

Consistency is key, in this case consistency in the PMD xstats implementations 
to make it
useful for clients of the API.

The big advantage xstats has over adding items to rte_eth_stats is that it 
won't break ABI.

Do you see a fundamental problem with parsing the strings to retrieve values?

If you like I could code up a minimal sample-app that only pulls statistics,
and you can show "interest" in various statistics for printing / monitoring?


Regards, -Harry

[dpdk-dev] [PATCH v2 2/2] ethdev: move code to common place in hotplug

2016-01-22 Thread David Marchand

Move these error logs and checks on detach capabilities in a common place.

Signed-off-by: David Marchand 
---
Changes since v1:
- restore EINVAL error code for rte_eth_dev_(at|de)tach

 lib/librte_ether/rte_ethdev.c | 77 ++-
 1 file changed, 46 insertions(+), 31 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index cab74e0..a45563d 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -514,7 +514,6 @@ rte_eth_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t 
*port_id)

return 0;
 err:
-   RTE_LOG(ERR, EAL, "Driver, cannot attach the device\n");
return -1;
 }

@@ -525,10 +524,6 @@ rte_eth_dev_detach_pdev(uint8_t port_id, struct 
rte_pci_addr *addr)
struct rte_pci_addr freed_addr;
struct rte_pci_addr vp;

-   /* check whether the driver supports detach feature, or not */
-   if (rte_eth_dev_is_detachable(port_id))
-   goto err;
-
/* get pci address by port id */
if (rte_eth_dev_get_addr_by_port(port_id, _addr))
goto err;
@@ -546,7 +541,6 @@ rte_eth_dev_detach_pdev(uint8_t port_id, struct 
rte_pci_addr *addr)
*addr = freed_addr;
return 0;
 err:
-   RTE_LOG(ERR, EAL, "Driver, cannot detach the device\n");
return -1;
 }

@@ -579,8 +573,6 @@ end:
if (args)
free(args);

-   if (ret < 0)
-   RTE_LOG(ERR, EAL, "Driver, cannot attach the device\n");
return ret;
 }

@@ -590,10 +582,6 @@ rte_eth_dev_detach_vdev(uint8_t port_id, char *vdevname)
 {
char name[RTE_ETH_NAME_MAX_LEN];

-   /* check whether the driver supports detach feature, or not */
-   if (rte_eth_dev_is_detachable(port_id))
-   goto err;
-
/* get device name by port id */
if (rte_eth_dev_get_name_by_port(port_id, name))
goto err;
@@ -605,7 +593,6 @@ rte_eth_dev_detach_vdev(uint8_t port_id, char *vdevname)
strncpy(vdevname, name, sizeof(name));
return 0;
 err:
-   RTE_LOG(ERR, EAL, "Driver, cannot detach the device\n");
return -1;
 }

@@ -614,14 +601,27 @@ int
 rte_eth_dev_attach(const char *devargs, uint8_t *port_id)
 {
struct rte_pci_addr addr;
+   int ret = -1;

-   if ((devargs == NULL) || (port_id == NULL))
-   return -EINVAL;
+   if ((devargs == NULL) || (port_id == NULL)) {
+   ret = -EINVAL;
+   goto err;
+   }

-   if (eal_parse_pci_DomBDF(devargs, ) == 0)
-   return rte_eth_dev_attach_pdev(, port_id);
-   else
-   return rte_eth_dev_attach_vdev(devargs, port_id);
+   if (eal_parse_pci_DomBDF(devargs, ) == 0) {
+   ret = rte_eth_dev_attach_pdev(, port_id);
+   if (ret < 0)
+   goto err;
+   } else {
+   ret = rte_eth_dev_attach_vdev(devargs, port_id);
+   if (ret < 0)
+   goto err;
+   }
+
+   return 0;
+err:
+   RTE_LOG(ERR, EAL, "Driver, cannot attach the device\n");
+   return ret;
 }

 /* detach the device, then store the name of the device */
@@ -629,26 +629,41 @@ int
 rte_eth_dev_detach(uint8_t port_id, char *name)
 {
struct rte_pci_addr addr;
-   int ret;
+   int ret = -1;

-   if (name == NULL)
-   return -EINVAL;
+   if (name == NULL) {
+   ret = -EINVAL;
+   goto err;
+   }
+
+   /* check whether the driver supports detach feature, or not */
+   if (rte_eth_dev_is_detachable(port_id))
+   goto err;

if (rte_eth_dev_get_device_type(port_id) == RTE_ETH_DEV_PCI) {
ret = rte_eth_dev_get_addr_by_port(port_id, );
if (ret < 0)
-   return ret;
+   goto err;

ret = rte_eth_dev_detach_pdev(port_id, );
-   if (ret == 0)
-   snprintf(name, RTE_ETH_NAME_MAX_LEN,
-   "%04x:%02x:%02x.%d",
-   addr.domain, addr.bus,
-   addr.devid, addr.function);
+   if (ret < 0)
+   goto err;

-   return ret;
-   } else
-   return rte_eth_dev_detach_vdev(port_id, name);
+   snprintf(name, RTE_ETH_NAME_MAX_LEN,
+   "%04x:%02x:%02x.%d",
+   addr.domain, addr.bus,
+   addr.devid, addr.function);
+   } else {
+   ret = rte_eth_dev_detach_vdev(port_id, name);
+   if (ret < 0)
+   goto err;
+   }
+
+   return 0;
+
+err:
+   RTE_LOG(ERR, EAL, "Driver, cannot detach the device\n");
+   return ret;
 }

 static int
-- 
1.9.1

[dpdk-dev] [PATCH v2 1/2] ethdev: remove useless null checks

2016-01-22 Thread David Marchand

We are in static functions and those passed arguments can't be NULL.

Signed-off-by: David Marchand 
---
 lib/librte_ether/rte_ethdev.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ed971b4..cab74e0 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -220,9 +220,6 @@ rte_eth_dev_create_unique_device_name(char *name, size_t 
size,
 {
int ret;

-   if ((name == NULL) || (pci_dev == NULL))
-   return -EINVAL;
-
ret = snprintf(name, size, "%d:%d.%d",
pci_dev->addr.bus, pci_dev->addr.devid,
pci_dev->addr.function);
@@ -505,9 +502,6 @@ rte_eth_dev_is_detachable(uint8_t port_id)
 static int
 rte_eth_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id)
 {
-   if ((addr == NULL) || (port_id == NULL))
-   goto err;
-
/* re-construct pci_device_list */
if (rte_eal_pci_scan())
goto err;
@@ -531,9 +525,6 @@ rte_eth_dev_detach_pdev(uint8_t port_id, struct 
rte_pci_addr *addr)
struct rte_pci_addr freed_addr;
struct rte_pci_addr vp;

-   if (addr == NULL)
-   goto err;
-
/* check whether the driver supports detach feature, or not */
if (rte_eth_dev_is_detachable(port_id))
goto err;
@@ -566,9 +557,6 @@ rte_eth_dev_attach_vdev(const char *vdevargs, uint8_t 
*port_id)
char *name = NULL, *args = NULL;
int ret = -1;

-   if ((vdevargs == NULL) || (port_id == NULL))
-   goto end;
-
/* parse vdevargs, then retrieve device name and args */
if (rte_eal_parse_devargs_str(vdevargs, , ))
goto end;
@@ -602,9 +590,6 @@ rte_eth_dev_detach_vdev(uint8_t port_id, char *vdevname)
 {
char name[RTE_ETH_NAME_MAX_LEN];

-   if (vdevname == NULL)
-   goto err;
-
/* check whether the driver supports detach feature, or not */
if (rte_eth_dev_is_detachable(port_id))
goto err;
-- 
1.9.1

[dpdk-dev] [PATCH v2 0/2] minor cleanup in ethdev hotplug

2016-01-22 Thread David Marchand

It was first a preparation step for future patchsets, but I am not sure what
will become of them, so sending this anyway since it does not hurt to clean
this now.

Changes since v1:
- rebased on HEAD (previous patchset was based on another patch I sent
  separately)
- restored EINVAL error code for rte_eth_dev_(at|de)tach (thanks Jan)


-- 
David Marchand

David Marchand (2):
  ethdev: remove useless null checks
  ethdev: move code to common place in hotplug

 lib/librte_ether/rte_ethdev.c | 92 +--
 1 file changed, 46 insertions(+), 46 deletions(-)

-- 
1.9.1

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread Tahhan, Maryam

> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, January 22, 2016 2:44 PM
> To: Tahhan, Maryam ; David Harton
> (dharton) 
> Cc: dev at dpdk.org; olivier.matz at 6wind.com; Van Haaren, Harry
> 
> Subject: Re: [dpdk-dev] Future Direction for rte_eth_stats_get()
> 
> 2016-01-22 14:18, Tahhan, Maryam:
> > So what can be enabled again in struct rte_eth_stats  from what was
> already there is the equivalent of:
> > * rx_length_errors
> > * rx_crc_errors
> > * rx_missed_errors - the deprecation notice was removed for this
> field.
> > * multicast
> >
> > What should be added in to distinguish between errors and drops.
> struct rte_eth_stats :
> >  * rx_errors
> >  * tx_errors
> >
> > As for the detailed rx errors and tx errors I'm open to feedback from
> you folks as to what should go in and what is too detailed. These
> weren't in struct rte_eth_stats previously, they are available through
> xstats and are uniformly named across the drivers. Oliver + Harry any
> thoughts?
> >
> > David I assume you are looking for all the missing fields to be added?
> 
> They are not missing. They just not exactly match ones having a long
> history in Linux kernel.
> Please let's avoid to blindly mimic others without thinking about
> modern needs.
> 

My bad wording - I apologise, but I agree we should consider what makes sense 
and what doesn't.

> > > > From: David Harton
> > > > > Is there a reason the stats have been deprecated?  Why not
> keep
> > > > > the stats in line with the standard linux practices such as
> > > > > rtnl_link_stats64?

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread David Harton (dharton)

Hi Maryam,

I'm not dictating they be re-added (although adding would be nice) but more 
important I'm trying to express an application view point rather than a driver 
view point.  

I completely understand how a driver wants to be able to advertise all the 
stats they want to advertise to help them debug their issues (i.e. xstats).   
Yet, I'm very interested in DPDK providing a driver agnostic method of 
advertising well-defined stats.

For example, what if there was a kind of "stats registry" composed of ID and 
name.  It would work similar to xtats except instead of advertising strings 
only the "get" API would return ID/count pairs.  If the application wishes to 
use the DPDK provided string they can call another API to get the stat string 
via the ID.  These IDs would be well-defined clearly explaining what the count 
represent.  This way the strings for counts will be uniform across drivers and 
also make it clear to the users what the counts truly represent and the 
application could obtain stats from any driver in a driver agnostic manner.

Just an idea,
Dave

> -Original Message-
> From: Tahhan, Maryam [mailto:maryam.tahhan at intel.com]
> Sent: Friday, January 22, 2016 9:18 AM
> To: David Harton (dharton) ; dev at dpdk.org
> Cc: Olivier MATZ ; Van Haaren, Harry
> 
> Subject: RE: Future Direction for rte_eth_stats_get()
> 
> Hi David
> 
> + Olivier and Harry as contributors and advisors in the past on the stats
> and stats API.
> 
> So I pulled the rtnl_link_stats64 from http://lxr.free-
> electrons.com/source/include/uapi/linux/if_link.h#L41 and crossed them
> with http://dpdk.org/dev/patchwork/patch/5842/ and
> http://dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ethdev.h
> 
> 40 /* The main device statistics structure */
>  41 struct rtnl_link_stats64 {
>  42 __u64   rx_packets; /* total packets received
> */  is in struct rte_eth_stats
>  43 __u64   tx_packets; /* total packets transmitted
> */   is in struct rte_eth_stats
>  44 __u64   rx_bytes;   /* total bytes received
> */   is in struct rte_eth_stats
>  45 __u64   tx_bytes;   /* total bytes transmitted
> */is in struct rte_eth_stats
>  46 __u64   rx_errors;  /* bad packets received
> */ I'm working on a patch to distinguish between error
> and drops for struct rte_eth_stats at the moment only drops are reported
> through ierrors and oerrors and we are tied to those field name for
> backward compatibility.
>  47 __u64   tx_errors;  /* packet transmit problems
> */Same comment as rx_errors.
>  48 __u64   rx_dropped; /* no space in linux buffers
> */   is in struct rte_eth_stats
>  49 __u64   tx_dropped; /* no space available in linux
> */  is in struct rte_eth_stats
>  50 __u64   multicast;  /* multicast packets received
> */was deprecated in struct rte_eth_stats - but we can
> remove the notice and expose again from drivers we modified
>  51 __u64   collisions;  Not in struct rte_eth_stats - not
> sure it's in xstats either...
>  52
>  53 /* detailed rx_errors: */
>  54 __u64   rx_length_errors;
>  was deprecated in struct rte_eth_stats, is in xstats - but we can
> remove the notice and expose again from drivers we modified
>  55 __u64   rx_over_errors; /* receiver ring buff overflow
> */ was never exposed to struct rte_eth_stats  - but is in xstats.
>  56 __u64   rx_crc_errors;  /* recved pkt with crc error
> */  was deprecated in struct rte_eth_stats, is in xstats - but
> we can remove the notice and expose again from drivers we modified
>  57 __u64   rx_frame_errors;/* recv'd frame alignment
> error */  was never exposed to struct rte_eth_stats - but is in
> xstats.
>  58 __u64   rx_fifo_errors; /* recv'r fifo overrun
> */was never in struct rte_eth_stats. Not exposed by
> all drivers
>  59 __u64   rx_missed_errors;   /* receiver missed packet
> */was deprecated in struct rte_eth_stats, is in xstats - but we
> can remove the notice
>  60
>  61 /* detailed tx_errors */
>  62 __u64   tx_aborted_errors;
>  was never in struct rte_eth_stats. Not exposed by all drivers
>  63 __u64   tx_carrier_errors;
>  was never in struct rte_eth_stats. Not exposed by all drivers
>  64 __u64   tx_fifo_errors;
>  was never in struct rte_eth_stats. Not exposed by all drivers
>  65 __u64   tx_heartbeat_errors;
>  was never in struct rte_eth_stats. Not exposed by all drivers
>  66 __u64   tx_window_errors;
>  was never in struct rte_eth_stats. Not exposed by all drivers
>  67
>  68 /* for cslip etc */
>  69 __u64   rx_compressed;
>  was never in struct rte_eth_stats. Not

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread Tahhan, Maryam

Hi David

+ Olivier and Harry as contributors and advisors in the past on the stats and 
stats API.

So I pulled the rtnl_link_stats64 from 
http://lxr.free-electrons.com/source/include/uapi/linux/if_link.h#L41 and 
crossed them with http://dpdk.org/dev/patchwork/patch/5842/ and 
http://dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ethdev.h 

40 /* The main device statistics structure */
 41 struct rtnl_link_stats64 {
 42 __u64   rx_packets; /* total packets received   */  
is in struct rte_eth_stats
 43 __u64   tx_packets; /* total packets transmitted*/  
 is in struct rte_eth_stats
 44 __u64   rx_bytes;   /* total bytes received */  
 is in struct rte_eth_stats
 45 __u64   tx_bytes;   /* total bytes transmitted  */  
  is in struct rte_eth_stats
 46 __u64   rx_errors;  /* bad packets received */  
   I'm working on a patch to distinguish between error and 
drops for struct rte_eth_stats at the moment only drops are reported through 
ierrors and oerrors and we are tied to those field name for backward 
compatibility.
 47 __u64   tx_errors;  /* packet transmit problems */  
  Same comment as rx_errors.
 48 __u64   rx_dropped; /* no space in linux buffers*/  
 is in struct rte_eth_stats
 49 __u64   tx_dropped; /* no space available in linux  */  
is in struct rte_eth_stats
 50 __u64   multicast;  /* multicast packets received   */  
  was deprecated in struct rte_eth_stats - but we can remove the 
notice and expose again from drivers we modified
 51 __u64   collisions;  Not in struct rte_eth_stats - not sure 
it's in xstats either... 
 52 
 53 /* detailed rx_errors: */
 54 __u64   rx_length_errors;   
  was deprecated in struct rte_eth_stats, 
is in xstats - but we can remove the notice and expose again from drivers we 
modified
 55 __u64   rx_over_errors; /* receiver ring buff overflow  */  
   was never exposed to struct rte_eth_stats  - but is in xstats.
 56 __u64   rx_crc_errors;  /* recved pkt with crc error*/  
was deprecated in struct rte_eth_stats, is in xstats - but we can 
remove the notice and expose again from drivers we modified
 57 __u64   rx_frame_errors;/* recv'd frame alignment error */ 
 was never exposed to struct rte_eth_stats - but is in xstats.
 58 __u64   rx_fifo_errors; /* recv'r fifo overrun  */  
  was never in struct rte_eth_stats. Not exposed by all drivers
 59 __u64   rx_missed_errors;   /* receiver missed packet   */  
  was deprecated in struct rte_eth_stats, is in xstats - but we can remove 
the notice
 60 
 61 /* detailed tx_errors */
 62 __u64   tx_aborted_errors;  
was never in struct rte_eth_stats. Not 
exposed by all drivers
 63 __u64   tx_carrier_errors;  
   was never in struct rte_eth_stats. Not 
exposed by all drivers
 64 __u64   tx_fifo_errors; 
  was never in struct rte_eth_stats. 
Not exposed by all drivers
 65 __u64   tx_heartbeat_errors;
  was never in struct rte_eth_stats. Not 
exposed by all drivers
 66 __u64   tx_window_errors;   
   was never in struct rte_eth_stats. Not 
exposed by all drivers
 67 
 68 /* for cslip etc */
 69 __u64   rx_compressed;  
  was never in struct rte_eth_stats. Not 
exposed by all drivers
 70 __u64   tx_compressed;  
  was never in struct rte_eth_stats. Not 
exposed by all drivers
 71 };

So what can be enabled again in struct rte_eth_stats  from what was already 
there is the equivalent of: 
* rx_length_errors
* rx_crc_errors
* rx_missed_errors - the deprecation notice was removed for this field.
* multicast

What should be added in to distinguish between errors and drops. struct 
rte_eth_stats :
 * rx_errors
 * tx_errors

As for the detailed rx errors and tx errors I'm open to feedback from you folks 
as to what should go in and what is too detailed. These weren't in struct 
rte_eth_stats previously, they are available through xstats and are uniformly 
named across the drivers. Oliver + Harry any thoughts?

David I assume you are looking for all the

[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-22 Thread Tan, Jianfeng

Hi Amit,

On 1/20/2016 11:19 PM, Amit Tomer wrote:
> Hello,
>
>> For this case, please use --single-file option because it creates much more
>> than 8 fds, which can be handled by vhost-user sendmsg().
> Thanks, I'm able to verify it by sending ARP packet from container to
> host on arm64. But sometimes, I do see following message while running
> l2fwd in container(pointed by Rich).
>
> EAL: Master lcore 0 is ready (tid=8a7a3000;cpuset=[0])
> EAL: lcore 1 is ready (tid=89cdf050;cpuset=[1])
> Notice: odd number of ports in portmask.
> Lcore 0: RX port 0
> Initializing port 0... PANIC in kick_all_vq():
> TUNSETVNETHDRSZ failed: Inappropriate ioctl for device
>
> How it could be avoided?
>
> Thanks,
> Amit.

Thanks for pointing out this bug. Actually it's caused by one of my 
fault. So vhost-user cannot work well.
Below change can help start vhost-user.

diff --git a/drivers/net/virtio/vhost.c b/drivers/net/virtio/vhost.c
index e423e02..dbca374 100644
--- a/drivers/net/virtio/vhost.c
+++ b/drivers/net/virtio/vhost.c
@@ -483,8 +483,9 @@ static void kick_all_vq(struct virtio_hw *hw)
 uint64_t features = hw->guest_features;
 features &= ~(1ull << VIRTIO_NET_F_MAC);
 vhost_call(hw, VHOST_MSG_SET_FEATURES, );
-   if (ioctl(hw->backfd, TUNSETVNETHDRSZ, >vtnet_hdr_size) == -1)
-   rte_panic("TUNSETVNETHDRSZ failed: %s\n", strerror(errno));
+   if (hw->type == VHOST_KERNEL)
+   if (ioctl(hw->backfd, TUNSETVNETHDRSZ, 
>vtnet_hdr_size) == -1)
+   rte_panic("TUNSETVNETHDRSZ failed: %s\n", 
strerror(errno));
 PMD_DRV_LOG(INFO, "set features:%"PRIx64"\n", features);


Thanks,
Jianfeng

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread David Harton (dharton)

Hi Maryam,

Thanks for the pointer.  I'll review the convo's.

Consider I have an application that uses many(all?) of the DPDK drivers and 
their netmap counterparts depending on configuration.  The user interface 
provides a set of I/O stats to help debug I/O issues and that set is the same 
regardless of driver type.  The set of stats provided matches what linux 
provides today since netmap existed before dpdk.

What I want to avoid is having an application that is driver independent having 
to become driver dependent interpreting a bunch of strings (from xstats) or 
worse the layer running at the data plane core that is advertising the API 
needed by the application parsing those strings because the application cannot 
change the upper layer.

What if instead of passing strings and values a set of stat ids and value are 
returned.  At least this way the application can remain driver agnostic versus 
having to parse a free form string that likely isn't the same across driver 
types.

Also, why wouldn't rte_eth_stats_get() align with linux which is the defacto 
standard?  I understand that not every driver may not support every stat but 
that's ok.  Just return 0 (pause frames, crc errors, etc).  It's not like the 
list of linux stats is ever growing or advertising an absurd number of stats 
that aren't applicable to all drivers.

Thanks again,
Dave

> -Original Message-
> From: Tahhan, Maryam [mailto:maryam.tahhan at intel.com]
> Sent: Friday, January 22, 2016 6:08 AM
> To: David Harton (dharton) ; dev at dpdk.org
> Subject: RE: Future Direction for rte_eth_stats_get()
> 
> Hi David
> Some of the stats were HW specific rather than generic stats that should
> be exposed through rte_eth_stats and were migrated to the xstats API.
> http://dpdk.org/ml/archives/dev/2015-June/019915.html. The naming was also
> not generic enough to cover some of the drivers and in some cases what was
> exposed was only a partial view of the relevant stats.
> 
> They were marked as deprecated to deter people from using them in the
> future, but haven't been removed from all the driver implementations yet.
> The Registers that remain undeprecated are those considered to be generic.
> 
> Which registers are you particularly interested in that have been
> deprecated? Can you elaborate on what you mean by " scenarios where a
> static view is expected "
> 
> Thanks in advance.
> 
> Best Regards,
> Maryam
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of David Harton
> > (dharton)
> > Sent: Wednesday, January 20, 2016 5:19 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] Future Direction for rte_eth_stats_get()
> >
> > I see that some of the rte_eth_stats have been marked deprecated in 2.2
> > that are returned by rte_eth_stats_get().  Applications that utilize any
> > number of device types rely on functions like this one to debug I/O
> > issues.
> >
> > Is there a reason the stats have been deprecated?  Why not keep the
> > stats in line with the standard linux practices such as
> rtnl_link_stats64?
> >
> > Note, using rte_eth_xstats_get() does not help for this particular
> scenario
> > because a common binary API is needed to communicate through
> > various layers and also provide a consistent view/meaning to users.  The
> > xstats is excellent for debugging device specific scenarios but can't
> help
> > in scenarios where a static view is expected.
> >
> > Thanks,
> > Dave

[dpdk-dev] [PATCH 1/2] mbuf: Add rte_pktmbuf_copy

2016-01-22 Thread Mrzyglod, DanielX T



>-Original Message-
>From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger
>Sent: Friday, July 10, 2015 1:38 AM
>To: dev at dpdk.org
>Cc: Mike Davison ; Stephen Hemminger
>
>Subject: [dpdk-dev] [PATCH 1/2] mbuf: Add rte_pktmbuf_copy
>
>From: Stephen Hemminger 
>
>Added rte_pktmbuf_copy() function since copying multi-part
>segments is common issue and can be problematic.
>
>Signed-off-by: Mike Davison 
>Reviewed-by: Stephen Hemminger 
>---
> lib/librte_mbuf/rte_mbuf.h | 59
>++
> 1 file changed, 59 insertions(+)
>
>diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>index 80419df..f0a543b 100644
>--- a/lib/librte_mbuf/rte_mbuf.h
>+++ b/lib/librte_mbuf/rte_mbuf.h
>@@ -60,6 +60,7 @@
> #include 
> #include 
> #include 
>+#include 
>
> #ifdef __cplusplus
> extern "C" {
>@@ -1272,6 +1273,64 @@ static inline int rte_pktmbuf_is_contiguous(const
>struct rte_mbuf *m)
>   return !!(m->nb_segs == 1);
> }
>
>+/*
>+ * Creates a copy of the given packet mbuf.
>+ *
>+ * Walks through all segments of the given packet mbuf, and for each of them:
>+ *  - Creates a new packet mbuf from the given pool.
>+ *  - Copy segment to newly created mbuf.
>+ * Then updates pkt_len and nb_segs of the new packet mbuf to match values
>+ * from the original packet mbuf.
>+ *
>+ * @param md
>+ *   The packet mbuf to be copied.
>+ * @param mp
>+ *   The mempool from which the mbufs are allocated.
>+ * @return
>+ *   - The pointer to the new mbuf on success.
>+ *   - NULL if allocation fails.
>+ */
>+static inline struct rte_mbuf *rte_pktmbuf_copy(struct rte_mbuf *md,
>+  struct rte_mempool *mp)
>+{
>+  struct rte_mbuf *mc = NULL;
>+  struct rte_mbuf **prev = 
>+
>+  do {
>+  struct rte_mbuf *mi;
>+
>+  mi = rte_pktmbuf_alloc(mp);
>+  if (unlikely(mi == NULL)) {
>+  rte_pktmbuf_free(mc);
>+  return NULL;
>+  }
>+
>+  mi->data_off = md->data_off;
>+  mi->data_len = md->data_len;
>+  mi->port = md->port;
>+  mi->vlan_tci = md->vlan_tci;
>+  mi->tx_offload = md->tx_offload;
>+  mi->hash = md->hash;
>+
>+  mi->next = NULL;
>+  mi->pkt_len = md->pkt_len;
>+  mi->nb_segs = md->nb_segs;
>+  mi->ol_flags = md->ol_flags;
>+  mi->packet_type = md->packet_type;
>+
>+  rte_memcpy(rte_pktmbuf_mtod(mi, char *),
>+ rte_pktmbuf_mtod(md, char *),
>+ md->data_len);
>+
>+  *prev = mi;
>+  prev = >next;
>+  } while ((md = md->next) != NULL);
>+
>+  *prev = NULL;
>+  __rte_mbuf_sanity_check(mc, 1);
>+  return mc;
>+}
>+
> /**
>  * Dump an mbuf structure to the console.
>  *
>--
>2.1.4

Hi Stephen :>
This patch look useful in case of backup buffs. 
There will be second approach ?

[dpdk-dev] [PATCH 15/16] fm10k: use default mailbox message handler for pf

2016-01-22 Thread Bruce Richardson

On Thu, Jan 21, 2016 at 06:36:00PM +0800, Wang Xiao W wrote:
> The new share code makes fm10k_msg_update_pvid_pf function static, so we can
> not refer to it now in fm10k_ethdev.c. The registered pf handler is almost the
> same as the default pf handler, removing it has no impact on mailbox.
> 
> Signed-off-by: Wang Xiao W 

What patch makes the function static, as we need to ensure that the build is
not broken by having this patch in the wrong place in the patchset?

Also, it seems strange having this patch in the middle of a series of base code
updates - perhaps it should go first, so that all base code update patches can
go one after the other.

/Bruce

> ---
>  drivers/net/fm10k/fm10k_ethdev.c | 17 ++---
>  1 file changed, 2 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/net/fm10k/fm10k_ethdev.c 
> b/drivers/net/fm10k/fm10k_ethdev.c
> index e967628..a118cf4 100644
> --- a/drivers/net/fm10k/fm10k_ethdev.c
> +++ b/drivers/net/fm10k/fm10k_ethdev.c
> @@ -2367,29 +2367,16 @@ static const struct fm10k_msg_data fm10k_msgdata_vf[] 
> = {
>   FM10K_TLV_MSG_ERROR_HANDLER(fm10k_tlv_msg_error),
>  };
>  
> -/* Mailbox message handler in PF */
> -static const struct fm10k_msg_data fm10k_msgdata_pf[] = {
> - FM10K_PF_MSG_ERR_HANDLER(XCAST_MODES, fm10k_msg_err_pf),
> - FM10K_PF_MSG_ERR_HANDLER(UPDATE_MAC_FWD_RULE, fm10k_msg_err_pf),
> - FM10K_PF_MSG_LPORT_MAP_HANDLER(fm10k_msg_lport_map_pf),
> - FM10K_PF_MSG_ERR_HANDLER(LPORT_CREATE, fm10k_msg_err_pf),
> - FM10K_PF_MSG_ERR_HANDLER(LPORT_DELETE, fm10k_msg_err_pf),
> - FM10K_PF_MSG_UPDATE_PVID_HANDLER(fm10k_msg_update_pvid_pf),
> - FM10K_TLV_MSG_ERROR_HANDLER(fm10k_tlv_msg_error),
> -};
> -
>  static int
>  fm10k_setup_mbx_service(struct fm10k_hw *hw)
>  {
> - int err;
> + int err = 0;
>  
>   /* Initialize mailbox lock */
>   fm10k_mbx_initlock(hw);
>  
>   /* Replace default message handler with new ones */
> - if (hw->mac.type == fm10k_mac_pf)
> - err = hw->mbx.ops.register_handlers(>mbx, fm10k_msgdata_pf);
> - else
> + if (hw->mac.type == fm10k_mac_vf)
>   err = hw->mbx.ops.register_handlers(>mbx, fm10k_msgdata_vf);
>  
>   if (err) {
> -- 
> 1.9.3
>

[dpdk-dev] [RFC PATCH 4/5] EAL: Add new EAL "--shm" option.

2016-01-22 Thread Tetsuya Mukawa

On 2016/01/22 11:07, Tan, Jianfeng wrote:
> Hi Tetsuya,
>
> On 1/22/2016 9:43 AM, Tan, Jianfeng wrote:
>> Hi Tetsuya,
>>
>> On 1/21/2016 7:07 PM, Tetsuya Mukawa wrote:
>>> This is a temporary patch to get EAL memory under 16T(1 << 44).
>>>
>>> The patch adds new EAL "--shm" option. If the option is specified,
>>> EAL will allocate one file from hugetlbfs. This memory is for sharing
>>> memory between DPDK applicaiton and QEMU ivhsmem device.
>>>
>>> Signed-off-by: Tetsuya Mukawa 
>>> ---
>>>   lib/librte_eal/common/eal_common_options.c |  5 ++
>>>   lib/librte_eal/common/eal_internal_cfg.h   |  1 +
>>>   lib/librte_eal/common/eal_options.h|  2 +
>>>   lib/librte_eal/common/include/rte_memory.h |  5 ++
>>>   lib/librte_eal/linuxapp/eal/eal_memory.c   | 76
>>> ++
>>>   5 files changed, 89 insertions(+)
>>>
>>
> ...
>>> +vaddr = mmap((void *)(1ULL << 43), size, PROT_READ | PROT_WRITE,
>>> +MAP_SHARED | MAP_FIXED, fd, 0);
>>> +if (vaddr != MAP_FAILED) {
>>> +memset(vaddr, 0, size);
>>> +*pfd = fd;
>>> +}
>>
>> I'm not sure if hard-coded way is good enough. It's known that kernel
>> manages VMAs using red-black tree, but I don't know if kernel
>> allocates VMA from low address to high address (if yes, can we
>> leverage this feature?).
>>
>

Hi Jianfeng,

Thanks for comments.
Yes, this "--shm" patch is totally crap, and this patch series is just
for describing how to implement an access method abstraction layer and
how to use it. Main purpose of this patch series is this.

But without this "--shm" patch, if someone wants to test my patch, they
cannot do it.
Because of this, I just involved this patch.
I guess you have already implemented almost same feature in EAL, I will
use it when I submit my container patch.
If we can agree how to implement the access method abstraction layer, I
will submit my container patch separately.

> A little more:it seems that kernel uses
> arch_get_unmapped_area_topdown() -> unmapped_area_topdown() to do
> that, which starts at mm->highest_vm_end. If this value bigger than
> (1ULL << 44)?
>

Yes, I specified MAP_FIXED as mmap parameter.
Then arch_get_unmapped_area_topdown() will not go "unmapped_area_topdown()".
Anyway, we will need to check /proc/self/maps to determine where we can
mmap the memory.

Thanks,
Tetsuya

[dpdk-dev] [PATCH v3] Patch introducing API to read/write Intel Architecture Model Specific Registers (MSR)...

2016-01-22 Thread Panu Matilainen

On 01/21/2016 12:51 PM, Ananyev, Konstantin wrote:
> Hi Panu,
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Panu Matilainen
>> Sent: Thursday, January 21, 2016 10:39 AM
>> To: Andralojc, WojciechX
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v3] Patch introducing API to read/write Intel 
>> Architecture Model Specific Registers (MSR)...
>>
>> On 01/21/2016 10:18 AM, Wojciech Andralojc wrote:
>>> Patch reworked.
>>>
>>> Signed-off-by: Wojciech Andralojc 
>>> ---
>>>lib/librte_eal/common/include/arch/x86/rte_msr.h |  88 +
>>>lib/librte_eal/linuxapp/eal/Makefile |   1 +
>>>lib/librte_eal/linuxapp/eal/arch/x86/rte_msr.c   | 116 
>>> +++
>>>3 files changed, 205 insertions(+)
>>>create mode 100644 lib/librte_eal/common/include/arch/x86/rte_msr.h
>>>create mode 100644 lib/librte_eal/linuxapp/eal/arch/x86/rte_msr.c
>>
>> This creates a new arch-specific public API, with rte_msr.h installed as
>> a public header and implementation in the library (as opposed to
>> inline), and so the new functions would have to be added to
>> rte_eal_version.map.
>>
>> However that is a bit of a problem since it only exists on IA
>> architectures, so it'd mean dummy entries in the version map for all
>> other architectures. All the other arch-specific APIs are inline code so
>> this is the first of its kind.
>
> My thought was:
> 1. implementation is linux specific (as I know not supposed to work under 
> freebsd).
> 2. they are not supposed to be used at run-time cide-path, so no need to be 
> inlined.

Speed is not the only interesting attribute of inlining, inlined code 
also effectively escapes the library ABI so it does not need versioning 
/ exporting.

> 3. As I understand we plan to  have a library that will use these functions 
> anyway.

Is this library going to be a generic or specific to Intel CPUs?
Also, if there are no other uses for the MSR API at the moment, perhaps 
the best place for this code would be within that library anyway?

> About dummy entries in the .map file: if we'll create a 'weak' generic 
> implementation,
> that would just return an error - would it solve the issue?

Sure it'd solve the issue of dummy entries in .map but people seemed 
opposed to having a highly arch-specific API exposed to all architectures.

- Panu -

[dpdk-dev] [PATCH v3] Patch introducing API to read/write Intel Architecture Model Specific Registers (MSR)...

2016-01-22 Thread Andralojc, WojciechX

> -Original Message-
> From: Ananyev, Konstantin
> Sent: Friday, January 22, 2016 11:05 AM
> To: Panu Matilainen; Andralojc, WojciechX
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v3] Patch introducing API to read/write Intel
> Architecture Model Specific Registers (MSR)...
> 
> > -Original Message-
> > From: Panu Matilainen [mailto:pmatilai at redhat.com]
> > Sent: Friday, January 22, 2016 10:06 AM
> > To: Ananyev, Konstantin; Andralojc, WojciechX
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v3] Patch introducing API to read/write Intel
> Architecture Model Specific Registers (MSR)...
> >
> > On 01/21/2016 12:51 PM, Ananyev, Konstantin wrote:
> > > Hi Panu,
> > >
> > >> -Original Message-
> > >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Panu
> > >> Matilainen
> > >> Sent: Thursday, January 21, 2016 10:39 AM
> > >> To: Andralojc, WojciechX
> > >> Cc: dev at dpdk.org
> > >> Subject: Re: [dpdk-dev] [PATCH v3] Patch introducing API to read/write
> Intel Architecture Model Specific Registers (MSR)...
> > >>
> > >> On 01/21/2016 10:18 AM, Wojciech Andralojc wrote:
> > >>> Patch reworked.
> > >>>
> > >>> Signed-off-by: Wojciech Andralojc 
> > >>> ---
> > >>>lib/librte_eal/common/include/arch/x86/rte_msr.h |  88
> +
> > >>>lib/librte_eal/linuxapp/eal/Makefile |   1 +
> > >>>lib/librte_eal/linuxapp/eal/arch/x86/rte_msr.c   | 116
> +++
> > >>>3 files changed, 205 insertions(+)
> > >>>create mode 100644
> lib/librte_eal/common/include/arch/x86/rte_msr.h
> > >>>create mode 100644
> > >>> lib/librte_eal/linuxapp/eal/arch/x86/rte_msr.c
> > >>
> > >> This creates a new arch-specific public API, with rte_msr.h
> > >> installed as a public header and implementation in the library (as
> > >> opposed to inline), and so the new functions would have to be added
> > >> to rte_eal_version.map.
> > >>
> > >> However that is a bit of a problem since it only exists on IA
> > >> architectures, so it'd mean dummy entries in the version map for
> > >> all other architectures. All the other arch-specific APIs are
> > >> inline code so this is the first of its kind.
> > >
> > > My thought was:
> > > 1. implementation is linux specific (as I know not supposed to work under
> freebsd).
> > > 2. they are not supposed to be used at run-time cide-path, so no need to 
> > > be
> inlined.
> >
> > Speed is not the only interesting attribute of inlining, inlined code
> > also effectively escapes the library ABI so it does not need
> > versioning / exporting.
> >
> > > 3. As I understand we plan to  have a library that will use these 
> > > functions
> anyway.
> >
> > Is this library going to be a generic or specific to Intel CPUs?
> 
> As I understand - yes.
> It supposed to utilise new Intel chips CAT abilities.
> Wojciech, please correct me if I missed something here.

Konstantin, yes, you are right,
CAT enabling lib is Intel CPUs specific.

> 
> > Also, if there are no other uses for the MSR API at the moment,
> > perhaps the best place for this code would be within that library anyway?
> 
> Sounds good to me.

OK, sounds good.
Thank you both for your input.

> 
> Konstantin
> 
> >
> > > About dummy entries in the .map file: if we'll create a 'weak'
> > > generic implementation, that would just return an error - would it solve 
> > > the
> issue?
> >
> > Sure it'd solve the issue of dummy entries in .map but people seemed
> > opposed to having a highly arch-specific API exposed to all architectures.
> >
> > - Panu -
> >

Wojciech Andralojc

[dpdk-dev] [PATCH 1/3] i40e: enable extended tag

2016-01-22 Thread Thomas Monjalon

2015-12-21 10:38, Helin Zhang:
> PCIe feature of 'Extended Tag' is important for 40G performance.
> It adds its enabling during each port initialization, to ensure
> the high performance.

If it's so important, why the values are not documented?
Please start to fill a file doc/guides/nics/i40e.rst to explain
how the device works.
Thanks

[dpdk-dev] [PATCH] i40e: add VEB switching support for i40e

2016-01-22 Thread Thomas Monjalon

2016-01-21 14:49, Zhe Tao:
> VEB switching feature for i40e is used to enable the switching between the
>  VSIs connect to the virtual bridge. The old implementation is setting the
>  virtual bridge mode as VEPA which is port aggregation. Enable the switching 
>  ability by setting the loop back mode for the specific VSIs which connect to 
> PF
>  or VFs. 
> 
> Signed-off-by: Zhe Tao 

Is VEB something specific to i40e?
Please explain the acronyms VEB, VSI and VEPA in the commit log or in a
doc dedicated to i40e. Having some design explanations in a doc would help
to understand the changes and to contribute.
Thanks

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread Tahhan, Maryam

Hi David
Some of the stats were HW specific rather than generic stats that should be 
exposed through rte_eth_stats and were migrated to the xstats API. 
http://dpdk.org/ml/archives/dev/2015-June/019915.html. The naming was also not 
generic enough to cover some of the drivers and in some cases what was exposed 
was only a partial view of the relevant stats.

They were marked as deprecated to deter people from using them in the future, 
but haven't been removed from all the driver implementations yet. The Registers 
that remain undeprecated are those considered to be generic.

Which registers are you particularly interested in that have been deprecated? 
Can you elaborate on what you mean by " scenarios where a static view is 
expected "

Thanks in advance.

Best Regards, 
Maryam


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of David Harton
> (dharton)
> Sent: Wednesday, January 20, 2016 5:19 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Future Direction for rte_eth_stats_get()
> 
> I see that some of the rte_eth_stats have been marked deprecated in 2.2
> that are returned by rte_eth_stats_get().  Applications that utilize any
> number of device types rely on functions like this one to debug I/O
> issues.
> 
> Is there a reason the stats have been deprecated?  Why not keep the
> stats in line with the standard linux practices such as rtnl_link_stats64?
> 
> Note, using rte_eth_xstats_get() does not help for this particular scenario
> because a common binary API is needed to communicate through
> various layers and also provide a consistent view/meaning to users.  The
> xstats is excellent for debugging device specific scenarios but can't help
> in scenarios where a static view is expected.
> 
> Thanks,
> Dave

[dpdk-dev] [PATCH v3] Patch introducing API to read/write Intel Architecture Model Specific Registers (MSR)...

2016-01-22 Thread Ananyev, Konstantin



> -Original Message-
> From: Panu Matilainen [mailto:pmatilai at redhat.com]
> Sent: Friday, January 22, 2016 10:06 AM
> To: Ananyev, Konstantin; Andralojc, WojciechX
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3] Patch introducing API to read/write Intel 
> Architecture Model Specific Registers (MSR)...
> 
> On 01/21/2016 12:51 PM, Ananyev, Konstantin wrote:
> > Hi Panu,
> >
> >> -Original Message-
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Panu Matilainen
> >> Sent: Thursday, January 21, 2016 10:39 AM
> >> To: Andralojc, WojciechX
> >> Cc: dev at dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v3] Patch introducing API to read/write 
> >> Intel Architecture Model Specific Registers (MSR)...
> >>
> >> On 01/21/2016 10:18 AM, Wojciech Andralojc wrote:
> >>> Patch reworked.
> >>>
> >>> Signed-off-by: Wojciech Andralojc 
> >>> ---
> >>>lib/librte_eal/common/include/arch/x86/rte_msr.h |  88 
> >>> +
> >>>lib/librte_eal/linuxapp/eal/Makefile |   1 +
> >>>lib/librte_eal/linuxapp/eal/arch/x86/rte_msr.c   | 116 
> >>> +++
> >>>3 files changed, 205 insertions(+)
> >>>create mode 100644 lib/librte_eal/common/include/arch/x86/rte_msr.h
> >>>create mode 100644 lib/librte_eal/linuxapp/eal/arch/x86/rte_msr.c
> >>
> >> This creates a new arch-specific public API, with rte_msr.h installed as
> >> a public header and implementation in the library (as opposed to
> >> inline), and so the new functions would have to be added to
> >> rte_eal_version.map.
> >>
> >> However that is a bit of a problem since it only exists on IA
> >> architectures, so it'd mean dummy entries in the version map for all
> >> other architectures. All the other arch-specific APIs are inline code so
> >> this is the first of its kind.
> >
> > My thought was:
> > 1. implementation is linux specific (as I know not supposed to work under 
> > freebsd).
> > 2. they are not supposed to be used at run-time cide-path, so no need to be 
> > inlined.
> 
> Speed is not the only interesting attribute of inlining, inlined code
> also effectively escapes the library ABI so it does not need versioning
> / exporting.
> 
> > 3. As I understand we plan to  have a library that will use these functions 
> > anyway.
> 
> Is this library going to be a generic or specific to Intel CPUs?

As I understand - yes.
It supposed to utilise new Intel chips CAT abilities. 
Wojciech, please correct me if I missed something here.

> Also, if there are no other uses for the MSR API at the moment, perhaps
> the best place for this code would be within that library anyway?

Sounds good to me.

Konstantin

> 
> > About dummy entries in the .map file: if we'll create a 'weak' generic 
> > implementation,
> > that would just return an error - would it solve the issue?
> 
> Sure it'd solve the issue of dummy entries in .map but people seemed
> opposed to having a highly arch-specific API exposed to all architectures.
> 
>   - Panu -
>

[dpdk-dev] [PATCH] vfio/noiommu: Don't use iommu_present() to track fake groups

2016-01-22 Thread Alex Williamson

Using iommu_present() to determine whether an IOMMU group is real or
fake has some problems.  First, apparently Power systems don't
register an IOMMU on the device bus, so the groups and containers get
marked as noiommu and then won't bind to their actual IOMMU driver.
Second, I expect we'll run into the same issue as we try to support
vGPUs through vfio, since they're likely to emulate this behavior of
creating an IOMMU group on a virtual device and then providing a vfio
IOMMU backend tailored to the sort of isolation they provide, which
won't necessarily be fully compatible with the IOMMU API.

The solution here is to use the existing iommudata interface to IOMMU
groups, which allows us to easily identify the fake groups we've
created for noiommu purposes.  The iommudata we set is purely
arbitrary since we're only comparing the address, so we use the
address of the noiommu switch itself.

Reported-by: Alexey Kardashevskiy 
Fixes: 03a76b60f8ba ("vfio: Include No-IOMMU mode")
Signed-off-by: Alex Williamson 
---

Copying some DPDK folks and would appreciate validation that this
still works for the intended no-iommu use case.  Thanks!

 drivers/vfio/vfio.c |   24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 82f25cc..ecca316 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -123,8 +123,8 @@ struct iommu_group *vfio_iommu_group_get(struct device *dev)
/*
 * With noiommu enabled, an IOMMU group will be created for a device
 * that doesn't already have one and doesn't have an iommu_ops on their
-* bus.  We use iommu_present() again in the main code to detect these
-* fake groups.
+* bus.  We set iommudata simply to be able to identify these groups
+* as special use and for reclamation later.
 */
if (group || !noiommu || iommu_present(dev->bus))
return group;
@@ -134,6 +134,7 @@ struct iommu_group *vfio_iommu_group_get(struct device *dev)
return NULL;

iommu_group_set_name(group, "vfio-noiommu");
+   iommu_group_set_iommudata(group, , NULL);
ret = iommu_group_add_device(group, dev);
iommu_group_put(group);
if (ret)
@@ -158,7 +159,7 @@ EXPORT_SYMBOL_GPL(vfio_iommu_group_get);
 void vfio_iommu_group_put(struct iommu_group *group, struct device *dev)
 {
 #ifdef CONFIG_VFIO_NOIOMMU
-   if (!iommu_present(dev->bus))
+   if (iommu_group_get_iommudata(group) == )
iommu_group_remove_device(dev);
 #endif

@@ -190,16 +191,10 @@ static long vfio_noiommu_ioctl(void *iommu_data,
return -ENOTTY;
 }

-static int vfio_iommu_present(struct device *dev, void *unused)
-{
-   return iommu_present(dev->bus) ? 1 : 0;
-}
-
 static int vfio_noiommu_attach_group(void *iommu_data,
 struct iommu_group *iommu_group)
 {
-   return iommu_group_for_each_dev(iommu_group, NULL,
-   vfio_iommu_present) ? -EINVAL : 0;
+   return iommu_group_get_iommudata(iommu_group) ==  ? 0 : -EINVAL;
 }

 static void vfio_noiommu_detach_group(void *iommu_data,
@@ -323,8 +318,7 @@ static void vfio_group_unlock_and_free(struct vfio_group 
*group)
 /**
  * Group objects - create, release, get, put, search
  */
-static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
-   bool iommu_present)
+static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group)
 {
struct vfio_group *group, *tmp;
struct device *dev;
@@ -342,7 +336,9 @@ static struct vfio_group *vfio_create_group(struct 
iommu_group *iommu_group,
atomic_set(>container_users, 0);
atomic_set(>opened, 0);
group->iommu_group = iommu_group;
-   group->noiommu = !iommu_present;
+#ifdef CONFIG_VFIO_NOIOMMU
+   group->noiommu = (iommu_group_get_iommudata(iommu_group) == );
+#endif

group->nb.notifier_call = vfio_iommu_group_notifier;

@@ -767,7 +763,7 @@ int vfio_add_group_dev(struct device *dev,

group = vfio_group_get_from_iommu(iommu_group);
if (!group) {
-   group = vfio_create_group(iommu_group, iommu_present(dev->bus));
+   group = vfio_create_group(iommu_group);
if (IS_ERR(group)) {
iommu_group_put(iommu_group);
return PTR_ERR(group);

[dpdk-dev] [dpdk-dev,1/2] ethdev: remove useless null checks

2016-01-22 Thread Thomas Monjalon

2016-01-21 20:02, Jan Viktorin:
> On Thu, 21 Jan 2016 12:57:10 +0100
> David Marchand  wrote:
> > -   if ((name == NULL) || (pci_dev == NULL))
> > -   return -EINVAL;
> 
> Do you use a kind of assert in DPDK? The patch looks OK, however, I
> would prefer something like
> 
>   assert_not_null(name);
>   assert_not_null(pci_dev);
> 
> Usually, if some outer code is broken by mistake, the assert catches
> such an issue. At the same time, it documents the code by telling
> "this must never be NULL here". I agree, that returning -EINVAL for
> this kind of check is incorrect.
> 
> Same for other changes...

For this patch, I agree to remove useless checks.
For the other checks, EINVAL seems to be the right error code.
And yes you are right, we could replace most of EINVAL returns by a kind
of assert. RTE_VERIFY would be appropriate.

[dpdk-dev] [RFC PATCH 4/5] EAL: Add new EAL "--shm" option.

2016-01-22 Thread Tan, Jianfeng

Hi Tetsuya,

On 1/22/2016 9:43 AM, Tan, Jianfeng wrote:
> Hi Tetsuya,
>
> On 1/21/2016 7:07 PM, Tetsuya Mukawa wrote:
>> This is a temporary patch to get EAL memory under 16T(1 << 44).
>>
>> The patch adds new EAL "--shm" option. If the option is specified,
>> EAL will allocate one file from hugetlbfs. This memory is for sharing
>> memory between DPDK applicaiton and QEMU ivhsmem device.
>>
>> Signed-off-by: Tetsuya Mukawa 
>> ---
>>   lib/librte_eal/common/eal_common_options.c |  5 ++
>>   lib/librte_eal/common/eal_internal_cfg.h   |  1 +
>>   lib/librte_eal/common/eal_options.h|  2 +
>>   lib/librte_eal/common/include/rte_memory.h |  5 ++
>>   lib/librte_eal/linuxapp/eal/eal_memory.c   | 76 
>> ++
>>   5 files changed, 89 insertions(+)
>>
>
...
>> +vaddr = mmap((void *)(1ULL << 43), size, PROT_READ | PROT_WRITE,
>> +MAP_SHARED | MAP_FIXED, fd, 0);
>> +if (vaddr != MAP_FAILED) {
>> +memset(vaddr, 0, size);
>> +*pfd = fd;
>> +}
>
> I'm not sure if hard-coded way is good enough. It's known that kernel 
> manages VMAs using red-black tree, but I don't know if kernel 
> allocates VMA from low address to high address (if yes, can we 
> leverage this feature?).
>

A little more:it seems that kernel uses arch_get_unmapped_area_topdown() 
-> unmapped_area_topdown() to do that, which starts at 
mm->highest_vm_end. If this value bigger than (1ULL << 44)?

Thanks,
Jianfeng

[dpdk-dev] ixgbevf does not recover from pf reset

2016-01-22 Thread David Marchand

Hello,

On Fri, Jan 22, 2016 at 3:05 AM, Wu, Jingjing  wrote:
> Hi, David
>
> We also noticed this issue before. And we are planning to fix this issue.
> And the patch for i40e is ready:
> http://dpdk.org/dev/patchwork/patch/9832/
> http://dpdk.org/dev/patchwork/patch/9833/
>
> The solution is that: PF reset interrupt will be captured by DPDK VF, then 
> DPDK
> Call the application's callback function to reset the VF device. For i40e 
> device,
> application need to detach and re-attach the device.
>
> For ixgbe, I think the solution would be similar.

Ok, so to handle a problem in hardware, the application must implement
a new mechanism to recover the port ?
I don't find this solution really elegant ...

This should be handled by the pmd itself.


Regards,
-- 
David Marchand

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread Jay Rolette

On Fri, Jan 22, 2016 at 9:22 AM, Van Haaren, Harry <
harry.van.haaren at intel.com> wrote:

> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Subject: Re: [dpdk-dev] Future Direction for rte_eth_stats_get()
>
> > + Harry
>
> Hey all,
>
> > xstats are driver agnostic and have a well-defined naming scheme.
>
> Indeed, described here:
>
> http://dpdk.org/doc/guides/prog_guide/poll_mode_drv.html#extended-statistics-api
>
> All of the rte_eth_stats statistics are presented in xstats consistently
> (independent of which PMD is running), as they are implemented in
> rte_ethdev.c:
> http://dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ethdev.c#n1500
>
>
> From: David Harton:
> > For example, what if there was a kind of "stats registry" composed of ID
> and name.  It
> > would work similar to xtats except instead of advertising strings only
> the "get" API would
> > return ID/count pairs.  If the application wishes to use the DPDK
> provided string they can
> > call another API to get the stat string via the ID.  These IDs would be
> well-defined
> > clearly explaining what the count represent.  This way the strings for
> counts will be
> > uniform across drivers and also make it clear to the users what the
> counts truly represent
> > and the application could obtain stats from any driver in a driver
> agnostic manner.
>
> What you (David Harton) describe about a well-defined name, and clearly
> explaining the value
> it is representing is what the goal was for xstats: a human readable
> string, which can be
> easily parsed and a value retrieved.
>
> The "rules" of encoding a statistic as an xstats string are pretty simple,
> and if any PMD
> breaks the rules, it should be considered broken and fixed.
>
> Consistency is key, in this case consistency in the PMD xstats
> implementations to make it
> useful for clients of the API.
>
> The big advantage xstats has over adding items to rte_eth_stats is that it
> won't break ABI.
>
> Do you see a fundamental problem with parsing the strings to retrieve
> values?
>

When you have an appliance with a large number of ports and a large number
of stats that you need to collect, having to parse those strings constantly
is very slow compared to having fixed struct members or at least known ID
values.

Strings are also error prone. While they may be consistent at this point,
it is remarkably easy for them to get out of sync along the way. Using an
ID instead avoids that neatly.

Jay

[dpdk-dev] [RFC PATCH 4/5] EAL: Add new EAL "--shm" option.

2016-01-22 Thread Tan, Jianfeng

Hi Tetsuya,

On 1/21/2016 7:07 PM, Tetsuya Mukawa wrote:
> This is a temporary patch to get EAL memory under 16T(1 << 44).
>
> The patch adds new EAL "--shm" option. If the option is specified,
> EAL will allocate one file from hugetlbfs. This memory is for sharing
> memory between DPDK applicaiton and QEMU ivhsmem device.
>
> Signed-off-by: Tetsuya Mukawa 
> ---
>   lib/librte_eal/common/eal_common_options.c |  5 ++
>   lib/librte_eal/common/eal_internal_cfg.h   |  1 +
>   lib/librte_eal/common/eal_options.h|  2 +
>   lib/librte_eal/common/include/rte_memory.h |  5 ++
>   lib/librte_eal/linuxapp/eal/eal_memory.c   | 76 
> ++
>   5 files changed, 89 insertions(+)
>
...
>   }
>   
> +int
> +rte_memseg_info_get(int index, int *pfd, uint64_t *psize, void **paddr)
> +{
> + struct rte_mem_config *mcfg;
> + mcfg = rte_eal_get_configuration()->mem_config;
> +
> + if (pfd != NULL)
> + *pfd = mcfg->memseg[index].fd;
> + if (psize != NULL)
> + *psize = (uint64_t)mcfg->memseg[index].len;
> + if (paddr != NULL)
> + *paddr = (void *)(uint64_t)mcfg->memseg[index].addr;
> + return 0;
> +}

In my patch, I introduce another API to get memseg info. In my mind, no 
reason to keep those FDs open. How do you think?

> +
>   /*
>* Get physical address of any mapped virtual address in the current 
> process.
>*/
> @@ -1075,6 +1090,46 @@ calc_num_pages_per_socket(uint64_t * memory,
>   return total_num_pages;
>   }
>   
> +static void *
> +rte_eal_shm_create(int *pfd, const char *hugedir)
> +{
> + int ret, fd;
> + char filepath[256];
> + void *vaddr;
> + uint64_t size = internal_config.memory;
> +
> + sprintf(filepath, "%s/%s_cvio", hugedir,
> + internal_config.hugefile_prefix);
> +
> + fd = open(filepath, O_CREAT | O_RDWR, 0600);
> + if (fd < 0)
> + rte_panic("open %s failed: %s\n", filepath, strerror(errno));
> +
> + ret = flock(fd, LOCK_EX);
> + if (ret < 0) {
> + close(fd);
> + rte_panic("flock %s failed: %s\n", filepath, strerror(errno));
> + }
> +
> + ret = ftruncate(fd, size);
> + if (ret < 0)
> + rte_panic("ftruncate failed: %s\n", strerror(errno));
> +
> + /*
> +  * Here, we need to map under (1 << 44).
> +  * This is temporary implementation.
> +  */
> + vaddr = mmap((void *)(1ULL << 43), size, PROT_READ | PROT_WRITE,
> + MAP_SHARED | MAP_FIXED, fd, 0);
> + if (vaddr != MAP_FAILED) {
> + memset(vaddr, 0, size);
> + *pfd = fd;
> + }

I'm not sure if hard-coded way is good enough. It's known that kernel 
manages VMAs using red-black tree, but I don't know if kernel allocates 
VMA from low address to high address (if yes, can we leverage this 
feature?).

> + memset(vaddr, 0, size);
> +
> + return vaddr;
> +}
> +
>   /*
>* Prepare physical memory mapping: fill configuration structure with
>* these infos, return 0 on success.
> @@ -1127,6 +1182,27 @@ rte_eal_hugepage_init(void)
>   return 0;
>   }
>   
> + /* create shared memory consist of only one file */
> + if (internal_config.shm) {
> + int fd;
> + struct hugepage_info *hpi;
> +
> + hpi = _config.hugepage_info[0];
> + addr = rte_eal_shm_create(, hpi->hugedir);
> + if (addr == MAP_FAILED) {
> + RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
> + strerror(errno));
> + return -1;
> + }
> + mcfg->memseg[0].phys_addr = rte_mem_virt2phy(addr);
> + mcfg->memseg[0].addr = addr;
> + mcfg->memseg[0].hugepage_sz = hpi->hugepage_sz;
> + mcfg->memseg[0].len = internal_config.memory;
> + mcfg->memseg[0].socket_id = 0;

As pointed out in my patchset, hard-coded socket_id into 0 may lead to 
failure. Do you have any better idea?

Thanks,
Jianfeng

[dpdk-dev] [PATCH] eal: add function to check if primary proc alive

2016-01-22 Thread Bruce Richardson

On Thu, Jan 21, 2016 at 09:02:41AM +, Van Haaren, Harry wrote:
> > From: Qiu, Michael
> > Sent: Thursday, January 21, 2016 6:14 AM
> > To: Van Haaren, Harry ; david.marchand at 
> > 6wind.com
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] eal: add function to check if primary proc 
> > alive
> > 
> > As we could start up many primaries, how does your secondary process
> > work with them?
> 
> When a primary process initializes, the location of the config file is 
> important. The default is /var/run/.rte_config
> 
> To run multiple primary processes, the --file-prefix= option is used to 
> specific a custom location for the config file. Eg: --file-prefix=testing
> /var/run/.testing_config
> 
> The rte_eal_check_primary_alive(const char*) function takes a char* parameter 
> - this is the location of the config file that the secondary process will 
> wait for. Setting it to the correct value will make this secondary process 
> wait for the corresponding primary process.
> 
> Regards, -Harry

Since a given secondary process only works with a single primary process, I'm 
not
sure why the user should want or need to pass in this parameter. What's the use
case for a secondary process wanting to know about a different primary process?
The details of what the config file is should largely be hidden from the user
IMHO.

If you want to allow a secondary to query an arbitrary primary process can you
still allow a NULL string to query the default primary based on the passed in
file-prefix parameter (if any)?

/Bruce

[dpdk-dev] [PATCH 2/2] i40e: add VLAN ether type config

2016-01-22 Thread Helin Zhang

It adds the setting VLAN ether type of single VLAN, inner and
outer VLAN. Single VLAN is treated as inner VLAN as usual.

Signed-off-by: Helin Zhang 
---
 drivers/net/i40e/i40e_ethdev.c | 55 +++---
 1 file changed, 51 insertions(+), 4 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 56ed28d..d97270d 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -273,6 +273,11 @@
 #define I40E_INSET_IPV6_TC_MASK   0x0009F00FUL
 #define I40E_INSET_IPV6_NEXT_HDR_MASK 0x000C00FFUL

+#define I40E_GL_SWT_L2TAGCTRL(_i) (0x001C0A70 + ((_i) * 4))
+#define I40E_GL_SWT_L2TAGCTRL_ETHERTYPE_SHIFT 16
+#define I40E_GL_SWT_L2TAGCTRL_ETHERTYPE_MASK  \
+   I40E_MASK(0x, I40E_GL_SWT_L2TAGCTRL_ETHERTYPE_SHIFT)
+
 static int eth_i40e_dev_init(struct rte_eth_dev *eth_dev);
 static int eth_i40e_dev_uninit(struct rte_eth_dev *eth_dev);
 static int i40e_dev_configure(struct rte_eth_dev *dev);
@@ -2321,11 +2326,53 @@ i40e_vlan_filter_set(struct rte_eth_dev *dev, uint16_t 
vlan_id, int on)
 }

 static void
-i40e_vlan_tpid_set(__rte_unused struct rte_eth_dev *dev,
-  __rte_unused enum rte_vlan_type vlan_type,
-  __rte_unused uint16_t tpid)
+i40e_vlan_tpid_set(struct rte_eth_dev *dev,
+  enum rte_vlan_type vlan_type,
+  uint16_t tpid)
 {
-   PMD_INIT_FUNC_TRACE();
+   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint64_t reg_r = 0, reg_w = 0;
+   uint16_t reg_id = 0;
+   int ret;
+
+   switch (vlan_type) {
+   case ETH_VLAN_TYPE_OUTER:
+   reg_id = 2;
+   break;
+   case ETH_VLAN_TYPE_INNER:
+   reg_id = 3;
+   break;
+   default:
+   PMD_DRV_LOG(ERR, "Unsupported vlan type %d", vlan_type);
+   return;
+   }
+
+   ret = i40e_aq_debug_read_register(hw, I40E_GL_SWT_L2TAGCTRL(reg_id),
+ _r, NULL);
+   if (ret != I40E_SUCCESS) {
+   PMD_DRV_LOG(ERR, "Fail to debug read from "
+   "I40E_GL_SWT_L2TAGCTRL[%d]", reg_id);
+   return;
+   }
+   PMD_DRV_LOG(DEBUG, "Debug read from I40E_GL_SWT_L2TAGCTRL[%d]: "
+   "0x%08"PRIx64"", reg_id, reg_r);
+
+   reg_w = reg_r & (~(I40E_GL_SWT_L2TAGCTRL_ETHERTYPE_MASK));
+   reg_w |= ((uint64_t)tpid << I40E_GL_SWT_L2TAGCTRL_ETHERTYPE_SHIFT);
+   if (reg_r == reg_w) {
+   PMD_DRV_LOG(DEBUG, "No need to write");
+   return;
+   }
+
+   ret = i40e_aq_debug_write_register(hw, I40E_GL_SWT_L2TAGCTRL(reg_id),
+  reg_w, NULL);
+   if (ret != I40E_SUCCESS) {
+   PMD_DRV_LOG(ERR, "Fail to debug write to "
+   "I40E_GL_SWT_L2TAGCTRL[%d]", reg_id);
+   return;
+   }
+   PMD_DRV_LOG(DEBUG, "Debug write 0x%08"PRIx64" to "
+   "I40E_GL_SWT_L2TAGCTRL[%d]", reg_w, reg_id);
 }

 static void
-- 
2.5.0

[dpdk-dev] [PATCH 1/2] ethdev: add vlan type for setting ether type

2016-01-22 Thread Helin Zhang

In order to set ether type of VLAN for single VLAN, inner
and outer VLAN, the VLAN type as an input parameter is added
to 'rte_eth_dev_set_vlan_ether_type()'.
In addition, corresponding changes in e1000, ixgbe and i40e are
also added.

Signed-off-by: Helin Zhang 
---
 app/test-pmd/cmdline.c   | 29 -
 app/test-pmd/config.c|  8 
 app/test-pmd/testpmd.h   |  3 ++-
 doc/guides/rel_notes/deprecation.rst |  4 
 doc/guides/rel_notes/release_2_3.rst |  6 ++
 drivers/net/e1000/igb_ethdev.c   | 20 +++-
 drivers/net/i40e/i40e_ethdev.c   |  4 +++-
 drivers/net/ixgbe/ixgbe_ethdev.c | 18 ++
 lib/librte_ether/rte_ethdev.c|  5 +++--
 lib/librte_ether/rte_ethdev.h| 19 +--
 10 files changed, 88 insertions(+), 28 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 73298c9..ba1650b 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -275,8 +275,8 @@ static void cmd_help_long_parsed(void *parsed_result,
"Set the VLAN QinQ (extended queue in queue)"
" on a port.\n\n"

-   "vlan set tpid (value) (port_id)\n"
-   "Set the outer VLAN TPID for Packet Filtering on"
+   "vlan set (inner|outer) tpid (value) (port_id)\n"
+   "Set the VLAN TPID for Packet Filtering on"
" a port\n\n"

"rx_vlan add (vlan_id|all) (port_id)\n"
@@ -295,10 +295,6 @@ static void cmd_help_long_parsed(void *parsed_result,
"Remove a vlan_id, to the set of VLAN identifiers"
"filtered for VF(s) from port_id.\n\n"

-   "rx_vlan set tpid (value) (port_id)\n"
-   "Set the outer VLAN TPID for Packet Filtering on"
-   " a port\n\n"
-
"tunnel_filter add (port_id) (outer_mac) (inner_mac) 
(ip_addr) "
"(inner_vlan) (vxlan|nvgre) (filter_type) (tenant_id) 
(queue_id)\n"
"   add a tunnel filter of a port.\n\n"
@@ -2845,6 +2841,7 @@ cmdline_parse_inst_t cmd_vlan_offload = {
 struct cmd_vlan_tpid_result {
cmdline_fixed_string_t vlan;
cmdline_fixed_string_t set;
+   cmdline_fixed_string_t vlan_type;
cmdline_fixed_string_t what;
uint16_t tp_id;
uint8_t port_id;
@@ -2856,8 +2853,17 @@ cmd_vlan_tpid_parsed(void *parsed_result,
  __attribute__((unused)) void *data)
 {
struct cmd_vlan_tpid_result *res = parsed_result;
-   vlan_tpid_set(res->port_id, res->tp_id);
-   return;
+   enum rte_vlan_type vlan_type;
+
+   if (!strcmp(res->vlan_type, "inner"))
+   vlan_type = ETH_VLAN_TYPE_INNER;
+   else if (!strcmp(res->vlan_type, "outer"))
+   vlan_type = ETH_VLAN_TYPE_OUTER;
+   else {
+   printf("Unknown vlan type\n");
+   return;
+   }
+   vlan_tpid_set(res->port_id, vlan_type, res->tp_id);
 }

 cmdline_parse_token_string_t cmd_vlan_tpid_vlan =
@@ -2866,6 +2872,9 @@ cmdline_parse_token_string_t cmd_vlan_tpid_vlan =
 cmdline_parse_token_string_t cmd_vlan_tpid_set =
TOKEN_STRING_INITIALIZER(struct cmd_vlan_tpid_result,
 set, "set");
+cmdline_parse_token_string_t cmd_vlan_type =
+   TOKEN_STRING_INITIALIZER(struct cmd_vlan_tpid_result,
+vlan_type, "inner#outer");
 cmdline_parse_token_string_t cmd_vlan_tpid_what =
TOKEN_STRING_INITIALIZER(struct cmd_vlan_tpid_result,
 what, "tpid");
@@ -2879,10 +2888,12 @@ cmdline_parse_token_num_t cmd_vlan_tpid_portid =
 cmdline_parse_inst_t cmd_vlan_tpid = {
.f = cmd_vlan_tpid_parsed,
.data = NULL,
-   .help_str = "set tpid tp_id port_id, set the Outer VLAN Ether type",
+   .help_str = "set inner|outer tpid tp_id port_id, set the VLAN "
+   "Ether type",
.tokens = {
(void *)_vlan_tpid_vlan,
(void *)_vlan_tpid_set,
+   (void *)_vlan_type,
(void *)_vlan_tpid_what,
(void *)_vlan_tpid_tpid,
(void *)_vlan_tpid_portid,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 7088f6f..537dd0b 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1821,19 +1821,19 @@ rx_vlan_all_filter_set(portid_t port_id, int on)
 }

 void
-vlan_tpid_set(portid_t port_id, uint16_t tp_id)
+vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type, uint16_t tp_id)
 {
int diag;
if (port_id_is_invalid(port_id, ENABLED_WARN))
return;

-   diag = rte_eth_dev_set_vlan_ether_type(port_id, tp_id);
+   diag = rte_eth_dev_set_vlan_ether_type(port_id,

[dpdk-dev] [PATCH 0/2] i40e setting ether type of VLANs

2016-01-22 Thread Helin Zhang

It adds setting ether type of single VLAN(inner VLAN) and outer
VLAN for i40e. For ixgbe and e1000/igb, as the external API changed,
it supports setting single VLAN(inner VLAN) only.

Helin Zhang (2):
  ethdev: add vlan type for setting ether type
  i40e: add VLAN ether type config

 app/test-pmd/cmdline.c   | 29 --
 app/test-pmd/config.c|  8 ++---
 app/test-pmd/testpmd.h   |  3 +-
 doc/guides/rel_notes/deprecation.rst |  4 +++
 doc/guides/rel_notes/release_2_3.rst |  6 
 drivers/net/e1000/igb_ethdev.c   | 20 +
 drivers/net/i40e/i40e_ethdev.c   | 57 +---
 drivers/net/ixgbe/ixgbe_ethdev.c | 18 +---
 lib/librte_ether/rte_ethdev.c|  5 ++--
 lib/librte_ether/rte_ethdev.h| 19 ++--
 10 files changed, 138 insertions(+), 31 deletions(-)

-- 
2.5.0

[dpdk-dev] [PATCH 1/2] mbuf: Add rte_pktmbuf_copy

2016-01-22 Thread Stephen Hemminger

On Fri, 22 Jan 2016 13:40:44 +
"Mrzyglod, DanielX T"  wrote:

> 
> 
> >-Original Message-
> >From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger
> >Sent: Friday, July 10, 2015 1:38 AM
> >To: dev at dpdk.org
> >Cc: Mike Davison ; Stephen Hemminger
> >
> >Subject: [dpdk-dev] [PATCH 1/2] mbuf: Add rte_pktmbuf_copy
> >
> >From: Stephen Hemminger 
> >
> >Added rte_pktmbuf_copy() function since copying multi-part
> >segments is common issue and can be problematic.
> >
> >Signed-off-by: Mike Davison 
> >Reviewed-by: Stephen Hemminger 
> >---
> > lib/librte_mbuf/rte_mbuf.h | 59
> >++
> > 1 file changed, 59 insertions(+)
> >
> >diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> >index 80419df..f0a543b 100644
> >--- a/lib/librte_mbuf/rte_mbuf.h
> >+++ b/lib/librte_mbuf/rte_mbuf.h
> >@@ -60,6 +60,7 @@
> > #include 
> > #include 
> > #include 
> >+#include 
> >
> > #ifdef __cplusplus
> > extern "C" {
> >@@ -1272,6 +1273,64 @@ static inline int rte_pktmbuf_is_contiguous(const
> >struct rte_mbuf *m)
> > return !!(m->nb_segs == 1);
> > }
> >
> >+/*
> >+ * Creates a copy of the given packet mbuf.
> >+ *
> >+ * Walks through all segments of the given packet mbuf, and for each of 
> >them:
> >+ *  - Creates a new packet mbuf from the given pool.
> >+ *  - Copy segment to newly created mbuf.
> >+ * Then updates pkt_len and nb_segs of the new packet mbuf to match values
> >+ * from the original packet mbuf.
> >+ *
> >+ * @param md
> >+ *   The packet mbuf to be copied.
> >+ * @param mp
> >+ *   The mempool from which the mbufs are allocated.
> >+ * @return
> >+ *   - The pointer to the new mbuf on success.
> >+ *   - NULL if allocation fails.
> >+ */
> >+static inline struct rte_mbuf *rte_pktmbuf_copy(struct rte_mbuf *md,
> >+struct rte_mempool *mp)
> >+{
> >+struct rte_mbuf *mc = NULL;
> >+struct rte_mbuf **prev = 
> >+
> >+do {
> >+struct rte_mbuf *mi;
> >+
> >+mi = rte_pktmbuf_alloc(mp);
> >+if (unlikely(mi == NULL)) {
> >+rte_pktmbuf_free(mc);
> >+return NULL;
> >+}
> >+
> >+mi->data_off = md->data_off;
> >+mi->data_len = md->data_len;
> >+mi->port = md->port;
> >+mi->vlan_tci = md->vlan_tci;
> >+mi->tx_offload = md->tx_offload;
> >+mi->hash = md->hash;
> >+
> >+mi->next = NULL;
> >+mi->pkt_len = md->pkt_len;
> >+mi->nb_segs = md->nb_segs;
> >+mi->ol_flags = md->ol_flags;
> >+mi->packet_type = md->packet_type;
> >+
> >+rte_memcpy(rte_pktmbuf_mtod(mi, char *),
> >+   rte_pktmbuf_mtod(md, char *),
> >+   md->data_len);
> >+
> >+*prev = mi;
> >+prev = >next;
> >+} while ((md = md->next) != NULL);
> >+
> >+*prev = NULL;
> >+__rte_mbuf_sanity_check(mc, 1);
> >+return mc;
> >+}
> >+
> > /**
> >  * Dump an mbuf structure to the console.
> >  *
> >--
> >2.1.4
> 
> Hi Stephen :>
> This patch look useful in case of backup buffs. 
> There will be second approach ?

I did bother resubmitting it since unless there are users in current
code base it would likely rot.

[dpdk-dev] [PATCH] i40evf: enable ops to add and remove mac address

2016-01-22 Thread Jingjing Wu

This patch implemented the ops of adding and removing mac
address in i40evf driver. Functions are assigned like:
  .mac_addr_add=  i40evf_add_mac_addr,
  .mac_addr_remove = i40evf_del_mac_addr,
To support multiple mac addresses setting, this patch also
extended the mac addresses adding and deletion when device
start and stop. For each VF, 64 mac addresses can be added
to in maximum.

Signed-off-by: Jingjing Wu 
---
 doc/guides/rel_notes/release_2_3.rst |   2 +
 drivers/net/i40e/i40e_ethdev.c   |   2 -
 drivers/net/i40e/i40e_ethdev.h   |   3 +
 drivers/net/i40e/i40e_ethdev_vf.c| 123 ++-
 4 files changed, 98 insertions(+), 32 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 99de186..8b07f50 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,8 @@ DPDK Release 2.3
 New Features
 

+* **Added i40e VF mac address setting support.**
+

 Resolved Issues
 ---
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index bf6220d..30cd203 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -62,8 +62,6 @@
 #include "i40e_rxtx.h"
 #include "i40e_pf.h"

-/* Maximun number of MAC addresses */
-#define I40E_NUM_MACADDR_MAX   64
 #define I40E_CLEAR_PXE_WAIT_MS 200

 /* Maximun number of capability elements */
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 1f9792b..288d936 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -53,6 +53,9 @@
 #define I40E_DEFAULT_QP_NUM_FDIR  1
 #define I40E_UINT32_BIT_SIZE  (CHAR_BIT * sizeof(uint32_t))
 #define I40E_VFTA_SIZE(4096 / I40E_UINT32_BIT_SIZE)
+/* Maximun number of MAC addresses */
+#define I40E_NUM_MACADDR_MAX   64
+
 /*
  * vlan_id is a 12 bit number.
  * The VFTA array is actually a 4096 bit array, 128 of 32bit elements.
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c 
b/drivers/net/i40e/i40e_ethdev_vf.c
index 14d2a50..544bebb 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -139,6 +139,11 @@ static int i40evf_dev_tx_queue_start(struct rte_eth_dev 
*dev,
 uint16_t tx_queue_id);
 static int i40evf_dev_tx_queue_stop(struct rte_eth_dev *dev,
uint16_t tx_queue_id);
+static void i40evf_add_mac_addr(struct rte_eth_dev *dev,
+   struct ether_addr *addr,
+   uint32_t index,
+   uint32_t pool);
+static void i40evf_del_mac_addr(struct rte_eth_dev *dev, uint32_t index);
 static int i40evf_dev_rss_reta_update(struct rte_eth_dev *dev,
struct rte_eth_rss_reta_entry64 *reta_conf,
uint16_t reta_size);
@@ -210,6 +215,8 @@ static const struct eth_dev_ops i40evf_eth_dev_ops = {
.rx_descriptor_done   = i40e_dev_rx_descriptor_done,
.tx_queue_setup   = i40e_dev_tx_queue_setup,
.tx_queue_release = i40e_dev_tx_queue_release,
+   .mac_addr_add = i40evf_add_mac_addr,
+   .mac_addr_remove  = i40evf_del_mac_addr,
.reta_update  = i40evf_dev_rss_reta_update,
.reta_query   = i40evf_dev_rss_reta_query,
.rss_hash_update  = i40evf_dev_rss_hash_update,
@@ -875,8 +882,11 @@ i40evf_stop_queues(struct rte_eth_dev *dev)
return 0;
 }

-static int
-i40evf_add_mac_addr(struct rte_eth_dev *dev, struct ether_addr *addr)
+static void
+i40evf_add_mac_addr(struct rte_eth_dev *dev,
+   struct ether_addr *addr,
+   __rte_unused uint32_t index,
+   __rte_unused uint32_t pool)
 {
struct i40e_virtchnl_ether_addr_list *list;
struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
@@ -890,7 +900,7 @@ i40evf_add_mac_addr(struct rte_eth_dev *dev, struct 
ether_addr *addr)
addr->addr_bytes[0], addr->addr_bytes[1],
addr->addr_bytes[2], addr->addr_bytes[3],
addr->addr_bytes[4], addr->addr_bytes[5]);
-   return -1;
+   return;
}

list = (struct i40e_virtchnl_ether_addr_list *)cmd_buffer;
@@ -909,25 +919,29 @@ i40evf_add_mac_addr(struct rte_eth_dev *dev, struct 
ether_addr *addr)
PMD_DRV_LOG(ERR, "fail to execute command "
"OP_ADD_ETHER_ADDRESS");

-   return err;
+   return;
 }

-static int
-i40evf_del_mac_addr(struct rte_eth_dev *dev, struct ether_addr *addr)
+static void
+i40evf_del_mac_addr(struct rte_eth_dev *dev, uint32_t index)
 {
struct i40e_virtchnl_ether_addr_list *list;
struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
+   struct rte_eth_dev_data *data = dev->data;

[dpdk-dev] [PKTGEN] fixing weird termio issues that complicate debugging

2016-01-22 Thread Panu Matilainen

On 01/21/2016 05:03 PM, Wiles, Keith wrote:
> On 1/21/16, 2:46 AM, "Panu Matilainen"  wrote:
>
>> On 01/20/2016 06:26 PM, Wiles, Keith wrote:
>>> On 1/20/16, 12:32 AM, "dev on behalf of Matthew Hall" >> dpdk.org on behalf of mhall at mhcomputing.net> wrote:
>>>
 Hello,

 Since the pktgen code is reindented I am finding time to read through it
 and experiment and see if I can get it working.

 I have issues with the init process of pktgen. It is difficult to debug
 it because the init code does a lot of very scary stuff to the terminal
 control / TTY device at inconvenient times in an inconvenient order, and
 in the process damages the debug output and damages the screen of your
 GDB without doing weird things to run GDB on a different TTY.

 Of course I am willing to contribute patches and not just complain, but
 first I need some help to follow what is going on.

 Here is the problematic call-flow with some explanation what went wrong
 trying it on some community machines outside of its original environment:

 1) it calls printf("\n%s %s\n", wr_copyright_msg(), wr_powered_by());
 which dumps tons of weird boilerplate of licenses, copyrights, code
 creator, etc.

 It is open source and everybody that matters already knows who coded it,
 so is this stuff really that important? This gets in the way when you
 are trying to work on it and I just have to comment it out.
>>>
>>> One problem is a number of people wanted to steal the code and use in
>>> a paid application, so the copyright is some what a requirement.
>>
>> In that case, why is it under a BSD'ish license instead of something
>> like GPL that's designed to prevent it in the first place? Might be too
>> late to change it by now, just wondering.
>
> DPDK is BSD, so you can not use a GPL application with DPDK (I think)

Well I sure hope the license is not chosen by that assumption. Why 
assume when you can trivially check, eg:
http://www.gnu.org/licenses/license-list.html

DPDK itself is under the very lax 3-clause BSD license which is 
compatible with the GPL. The 4-clause "advertising license" used by 
pktgen is not. But its not the license I'm complaining about.

> anyway I can try to speed you the screens, but does it really matter
> as these are only at startup and I normally leave pktgen running for
> long periods of time. The extra time at the start does not seem to
> be a big issue, right?

We wouldn't be discussing this if it was not an issue. It is offensive 
enough to turn away both users and contributors, and merely speeding up 
a bit is not going to make it a whole lot better.

As I (and now others as well) already suggested changing it to a one 
line printout is what would make worlds of difference while still 
complying with the license AFAICT. The license text requires printing 
out the copyright notice, it does not say anything about rendering it in 
full-screen ascii-art, or printing out the entire license.

- Panu -

[dpdk-dev] [dpdk-dev, 2/2] ethdev: move code to common place in hotplug

2016-01-22 Thread David Marchand

Hello,

On Thu, Jan 21, 2016 at 7:42 PM, Thomas Monjalon
 wrote:
> 2016-01-21 19:06, David Marchand:
>> On Thu, Jan 21, 2016 at 4:38 PM, Jan Viktorin  
>> wrote:
>> > This change modifies the return value from -EINVAL to -1. I don't know
>> > whether is this an issue but it looks suspicious.
>>
>> Should not be an issue, as the api does not give details on expected
>> negative return values.
>> Just noticed, this also introduces a new log message that was not
>> displayed before.
>>
>> To be safe, I suppose I should restore this.
>>
>> Thomas, opinion ?
>
> I'm OK with the log message added for this error case.
> I would just keep the -EINVAL return value.

Ok will do.


-- 
David Marchand

[dpdk-dev] [RFC PATCH 5/5] virtio: Extend virtio-net PMD to support container environment

2016-01-22 Thread Xie, Huawei

On 1/21/2016 7:09 PM, Tetsuya Mukawa wrote:
> virtio: Extend virtio-net PMD to support container environment
>
> The patch adds a new virtio-net PMD configuration that allows the PMD to
> work on host as if the PMD is in VM.
> Here is new configuration for virtio-net PMD.
>  - CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE
> To use this mode, EAL needs physically contiguous memory. To allocate
> such memory, add "--shm" option to application command line.
>
> To prepare virtio-net device on host, the users need to invoke QEMU
> process in special qtest mode. This mode is mainly used for testing QEMU
> devices from outer process. In this mode, no guest runs.
> Here is QEMU command line.
>
>  $ qemu-system-x86_64 \
>  -machine pc-i440fx-1.4,accel=qtest \
>  -display none -qtest-log /dev/null \
>  -qtest unix:/tmp/socket,server \
>  -netdev type=tap,script=/etc/qemu-ifup,id=net0,queues=1\
>  -device virtio-net-pci,netdev=net0,mq=on \
>  -chardev socket,id=chr1,path=/tmp/ivshmem,server \
>  -device ivshmem,size=1G,chardev=chr1,vectors=1
>
>  * QEMU process is needed per port.

Does qtest supports hot plug virtio-net pci device, so that we could run
one QEMU process in host, which provisions the virtio-net virtual
devices for the container?

[dpdk-dev] [RFC PATCH 3/5] virtio: Add a new layer to abstract pci access method

2016-01-22 Thread Xie, Huawei

On 1/21/2016 7:08 PM, Tetsuya Mukawa wrote:
> +static void
> +phys_legacy_write16(struct virtio_hw *hw, uint16_t *addr, uint16_t val)
> +{
> + return outb_p((unsigned short)val,
> + (unsigned short)(hw->io_base + (uint64_t)addr));

outb_p -> outw_p

> +}
> +
> +static void
> +phys_legacy_write32(struct virtio_hw *hw, uint32_t *addr, uint32_t val)
> +{
> + return outb_p((unsigned int)val,
> + (unsigned short)(hw->io_base + (uint64_t)addr));

outb_p -> outl_p

> +}
> +

[dpdk-dev] ixgbevf does not recover from pf reset

2016-01-22 Thread Wu, Jingjing

Hi, David

We also noticed this issue before. And we are planning to fix this issue.
And the patch for i40e is ready:
http://dpdk.org/dev/patchwork/patch/9832/
http://dpdk.org/dev/patchwork/patch/9833/

The solution is that: PF reset interrupt will be captured by DPDK VF, then DPDK
Call the application's callback function to reset the VF device. For i40e 
device,
application need to detach and re-attach the device.

For ixgbe, I think the solution would be similar.

Thanks
Jingjing

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of David Marchand
> Sent: Friday, December 11, 2015 10:47 PM
> To: Zhang, Helin; Ananyev, Konstantin
> Cc: dev at dpdk.org
> Subject: [dpdk-dev] ixgbevf does not recover from pf reset
> 
> Hello Helin, Konstantin,
> 
> I noticed this issue quite some time ago (maybe around dpdk-1.3.0, not
> sure) but had no time to investigate/report.
> I hit it again with dpdk-2.2.0-rc2, so maybe all dpdk versions are impacted.
> 
> The test is quite simple :
> - send icmp packets from a system (tn) to itself using two 10G interfaces
> connected to two VF ports of a dual port 82599 nic,
> - run testpmd in a vm owning those two VF ports,
> - reset a pf interface in the host,
> - then no connectivity on the associated VF port unless it is completely
> restarted.
> 
> In my previous test, I think I also unplugged the fiber then replugged
> (without touching the pf interface status on the host) and observed the
> same issue.
> 
> I looked at ixgbevf pmd, but I can't find a place where the PF asking for 
> reset
> would be handled except at devinit.
> 
> 
> 
> Here is the full description of my test :
> 
> The system owning the two VF ports is a ubuntu 14.04 virtual machine
> (3.13.0-71-generic kernel).
> The host running this virtual machine is running ubuntu 14.04 as well (but
> kernel is newer 3.19.0-39-generic).
> 
> 
> - host setup :
> echo 3072
> >/sys/devices/system/node/node0/hugepages/hugepages-
> 2048kB/nr_hugepages
> echo 3072
> >/sys/devices/system/node/node1/hugepages/hugepages-
> 2048kB/nr_hugepages
> mkdir -p
> /mnt/huge
> mount -t hugetlbfs none /mnt/huge
> 
> echo 1 >
> "/sys/bus/pci/devices/:83:00.0/sriov_numvfs"
> bridge link set dev ixgbe0 hwmode veb
> ip link set dev ixgbe0 vf 0 mac 00:09:c0:0e:4e:64 ip link set dev ixgbe0 vf 0
> spoofchk off ip link set dev ixgbe0 mtu 9000 ip link set dev ixgbe0 up ip link
> set dev ixgbe0 address 00:09:c0:0e:4e:64
> 
> echo 1 >
> "/sys/bus/pci/devices/:83:00.1/sriov_numvfs"
> bridge link set dev ixgbe1 hwmode veb
> ip link set dev ixgbe1 vf 0 mac 00:09:c0:0e:4e:65 ip link set dev ixgbe1 vf 0
> spoofchk off ip link set dev ixgbe1 mtu 9000 ip link set dev ixgbe1 up ip link
> set dev ixgbe1 address 00:09:c0:0e:4e:65
> 
> qemu-system-x86_64 -k fr -daemonize -S --enable-kvm -m 3G -mem-path
> /mnt/huge -cpu host -smp cores=6,threads=2,sockets=1 -serial
> telnet::53714,server,nowait -serial null -monitor telnet::36576,server,nowait
> -device
> virtio-net,mac=DE:AD:DE:01:02:03,netdev=user.0,addr=03 -netdev
> user,id=user.0 -device pci-assign,host=:83:10.0,addr=04 -device
> pci-assign,host=:83:10.1,addr=05 -hda ubuntu-14.04-template.qcow2 -
> snapshot -vga none -display none
> 
> 
> - guest setup :
> echo 1024 >
> /sys/devices/system/node/node0/hugepages/hugepages-
> 2048kB/nr_hugepages
> mkdir -p
> /mnt/huge
> mount -t hugetlbfs none /mnt/huge
> 
> modprobe uio
> modprobe igb_uio
> echo :00:04.0 > /sys/m/bus/pci/drivers/ixgbevf/unbind
> echo :00:05.0 > /sys/m/bus/pci/drivers/ixgbevf/unbind
> echo 8086 10ed > /sys/bus/pci/drivers/igb_uio/new_id
> 
> testpmd -w:00:04.0 -w:00:05.0 --huge-dir=/mnt/huge -n 4 -c 0xe -- -i
> [snip] Port 0:
> 00:09:C0:0E:4E:64
> [snip]
> Port 1:
> 00:09:C0:0E:4E:65
> Checking link
> statuses...
> Port 0 Link Up - speed 1 Mbps -
> full-duplex
> Port 1 Link Up - speed 1 Mbps -
> full-duplex
> Done
> 
> testpmd>
> start
> 
> 
> - tn setup :
> ip link set dev xaui0 up
> ip link set dev xaui1 up
> ip addr add 1.1.1.1/24 dev xaui0
> arp -i xaui0 -s 1.1.1.2 00:09:c0:0e:4e:64
> 
> 
> 
> - From here, on tn, send icmp packets with a tcpdump running background
> on the other iface :
> ping 1.1.1.2
> PING 1.1.1.2 (1.1.1.2): 56 data
> bytes
> 03:10:15.003047 00:02:bb:a8:0d:10 > 00:09:c0:0e:4e:64, ethertype IPv4
> (0x0800), length 98: IP 1.1.1.1 > 1.1.1.2: icmp 64: echo request seq 1
> 03:10:16.003151 00:02:bb:a8:0d:10 > 00:09:c0:0e:4e:64, ethertype IPv4
> (0x0800), length 98: IP 1.1.1.1 > 1.1.1.2: icmp 64: echo request seq 2
> 03:10:17.003246 00:02:bb:a8:0d:10 > 00:09:c0:0e:4e:64, ethertype IPv4
> (0x0800), length 98: IP 1.1.1.1 > 1.1.1.2: icmp 64: echo request seq 3
> 
> 
> - Ok, now, kick ixgbe0 pf interface down / up on host system.
> ip link set dev ixgbe0 down
> ip link set dev ixgbe0 up
> 
> 
> - On tn, no packet is received anymore, the only workaround I found is to
> stop and restart the port 0 :
> 
> testpmd>
> stop
> [snip]
> Done.
> 
> testpmd> port stop

[dpdk-dev] [PATCH 1/3] i40e: enable extended tag

2016-01-22 Thread Wu, Jingjing

Hi

>   */
>  static void
> -i40e_hw_init(struct i40e_hw *hw)
> +i40e_hw_init(struct rte_eth_dev *dev)
>  {
> + struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data-
> >dev_private);
> +
> + i40e_enable_extended_tag(dev);
If the function i40e_enable_extended_tag is only used here without
Checking the return value, why not define it as void?

Thanks
Jingjing
> +
>   /* clear the PF Queue Filter control register */
>   I40E_WRITE_REG(hw, I40E_PFQF_CTL_0, 0);
> 
> --
> 1.9.3

64 matches

Mail list logo