date:20150209

[dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when read

2015-02-09 Thread Dumitrescu, Cristian

Hi Stephen,

What is the reason not to clear statistics on read? Do you have a use-case / 
justification for it?

(BTW, I see you added the reset functions, but was it also your intention to 
remove the memset to 0 from the stats read functions? :) )

Regards,
Cristian

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Stephen Hemminger
Sent: Thursday, February 5, 2015 6:14 AM
To: dev at dpdk.org
Cc: Stephen Hemminger
Subject: [dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when read

From: Stephen Hemminger 

Make rte_sched statistics API work like the ethernet statistics API.
Don't auto-clear statistics.

Signed-off-by: Stephen Hemminger 
---
 lib/librte_sched/rte_sched.c | 30 ++
 lib/librte_sched/rte_sched.h | 29 +
 2 files changed, 59 insertions(+)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 8cb8bf1..d891e50 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -935,6 +935,21 @@ rte_sched_subport_read_stats(struct rte_sched_port *port,
 }

 int
+rte_sched_subport_stats_reset(struct rte_sched_port *port,
+ uint32_t subport_id)
+{
+   struct rte_sched_subport *s;
+
+   /* Check user parameters */
+   if (port == NULL || subport_id >= port->n_subports_per_port)
+   return -1;
+
+   s = port->subport + subport_id;
+   memset(&s->stats, 0, sizeof(struct rte_sched_subport_stats));
+   return 0;
+}
+
+int
 rte_sched_queue_read_stats(struct rte_sched_port *port,
uint32_t queue_id,
struct rte_sched_queue_stats *stats,
@@ -963,6 +978,21 @@ rte_sched_queue_read_stats(struct rte_sched_port *port,
return 0;
 }

+int
+rte_sched_queue_stats_reset(struct rte_sched_port *port,
+   uint32_t queue_id)
+{
+   struct rte_sched_queue_extra *qe;
+
+   /* Check user parameters */
+   if (port == NULL || queue_id >= rte_sched_port_queues_per_port(port))
+   return -1;
+
+   qe = port->queue_extra + queue_id;
+   memset(&qe->stats, 0, sizeof(struct rte_sched_queue_stats));
+   return 0;
+}
+
 static inline uint32_t
 rte_sched_port_qindex(struct rte_sched_port *port, uint32_t subport, uint32_t 
pipe, uint32_t traffic_class, uint32_t queue)
 {
diff --git a/lib/librte_sched/rte_sched.h b/lib/librte_sched/rte_sched.h
index e9bf18a..3d007e4 100644
--- a/lib/librte_sched/rte_sched.h
+++ b/lib/librte_sched/rte_sched.h
@@ -317,6 +317,21 @@ rte_sched_subport_read_stats(struct rte_sched_port *port,
struct rte_sched_subport_stats *stats,
uint32_t *tc_ov);

+
+/**
+ * Hierarchical scheduler subport statistics reset
+ *
+ * @param port
+ *   Handle to port scheduler instance
+ * @param subport_id
+ *   Subport ID
+ * @return
+ *   0 upon success, error code otherwise
+ */
+int
+rte_sched_subport_stats_reset(struct rte_sched_port *port,
+ uint32_t subport_id);
+
 /**
  * Hierarchical scheduler queue statistics read
  *
@@ -338,6 +353,20 @@ rte_sched_queue_read_stats(struct rte_sched_port *port,
struct rte_sched_queue_stats *stats,
uint16_t *qlen);

+/**
+ * Hierarchical scheduler queue statistics reset
+ *
+ * @param port
+ *   Handle to port scheduler instance
+ * @param queue_id
+ *   Queue ID within port scheduler
+ * @return
+ *   0 upon success, error code otherwise
+ */
+int
+rte_sched_queue_stats_reset(struct rte_sched_port *port,
+   uint32_t queue_id);
+
 /*
  * Run-time
  *
-- 
2.1.4

--
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare

This e-mail and any attachments may contain confidential material for the sole 
use of the intended recipient(s). Any review or distribution by others is 
strictly prohibited. If you are not the intended recipient, please contact the 
sender and delete all copies.

[dpdk-dev] Error seen while running testpmd sample application

2015-02-09 Thread Shankari Vaidyalingam

Hi,

I'm trying to execute the testpmd sample application.
I'm getting the error below and am not able to run the sample application.

controller@controller-VirtualBox:~/software/dpdk-1.7.1$ sudo ./build/app/testpmd -c7 -n3 -- -i --nb-cores=2 --nb-ports=2
./build/app/testpmd: error while loading shared libraries: librte_distributor.so: cannot open shared object file: No such file or directory
controller@controller-VirtualBox:~/software/dpdk-1.7.1$ find . -name librte_distributor.so -print
./build/build/lib/librte_distributor/librte_distributor.so
./build/lib/librte_distributor.so
./x86_64-native-linuxapp-gcc/build/lib/librte_distributor/librte_distributor.so
./x86_64-native-linuxapp-gcc/lib/librte_distributor.so
controller@controller-VirtualBox:~/software/dpdk-1.7.1$

I'm using Ubuntu 12.04 LTS version and DPDK 1.7.1 version.
I have already searched the mailing list for similar issues. I found three
related discussions, but none of them contained a solution.

I used the steps below to compile and execute the application:

  sudo ./tools/dpdk_nic_bind.py --status
  mkdir -p /mnt/huge
  sudo mount -t hugetlbfs nodev /mnt/huge
  sudo echo 64 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
  ls
  sudo ./build/app/testpmd -c7 -n3 -- -i --nb-cores=2 --nb-ports=2



Please let me know how to solve this issue.

Regards
Shankari.V

[dpdk-dev] Vhost-user roadmap

2015-02-09 Thread Benoît Canet


Hello Xie,

I am testing your vhost-user patchset with the plumbing of
my ongoing lower latency neutron implementation.

Can you share your 2015 roadmap for the DPDK vhost topic
with the list?

Best regards

Benoît

[dpdk-dev] [PATCH v4 14/17] mempool: add support to non-EAL thread

2015-02-09 Thread Olivier MATZ

Hi,

On 02/09/2015 03:41 PM, Liang, Cunming wrote:
>>>  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
>>> -#define __MEMPOOL_STAT_ADD(mp, name, n) do {   \
>>> -   unsigned __lcore_id = rte_lcore_id();   \
>>> -   mp->stats[__lcore_id].name##_objs += n; \
>>> -   mp->stats[__lcore_id].name##_bulk += 1; \
>>> +#define __MEMPOOL_STAT_ADD(mp, name, n) do {\
>>> +   unsigned __lcore_id = rte_lcore_id();   \
>>> +   if (__lcore_id < RTE_MAX_LCORE) {   \
>>> +   mp->stats[__lcore_id].name##_objs += n; \
>>> +   mp->stats[__lcore_id].name##_bulk += 1; \
>>> +   }   \
>>
>> Does it mean that we have no statistics for non-EAL threads?
>> (same question for rings and timers in the next patches)
> [LCM] Yes. This patch set mainly focuses on EAL threads and makes sure nothing
> breaks when running on a non-EAL thread.
> Full non-EAL thread support will come as a second step, in a separate patch set.

OK
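The effect of the guard added to __MEMPOOL_STAT_ADD can be illustrated with a minimal standalone sketch. It assumes, as the series does, that a thread without an lcore id sees LCORE_ID_ANY (all bits set) and simply has no slot in the per-lcore array. MAX_LCORE, thread_lcore_id, pool_stats and pool_stat_add are hypothetical stand-ins for RTE_MAX_LCORE, the _lcore_id TLS variable and the real macro.

```c
#include <stdint.h>

#define MAX_LCORE 4                     /* stand-in for RTE_MAX_LCORE */
#define LCORE_ID_ANY ((unsigned)-1)     /* what a non-EAL thread reports */

/* Hypothetical per-thread lcore id; non-EAL threads keep the default. */
static __thread unsigned thread_lcore_id = LCORE_ID_ANY;

struct pool_stats {
	uint64_t put_objs;
	uint64_t put_bulk;
};

/* One statistics slot per lcore, as in struct rte_mempool. */
static struct pool_stats stats[MAX_LCORE];

/* Account n objects to the calling lcore. Non-EAL threads fail the
 * bounds check and are silently not counted. */
static void pool_stat_add(unsigned n)
{
	unsigned id = thread_lcore_id;

	if (id < MAX_LCORE) {
		stats[id].put_objs += n;
		stats[id].put_bulk += 1;
	}
}
```

The same `lcore_id >= RTE_MAX_LCORE` test is what sends __mempool_get_bulk() straight to the ring instead of the per-lcore cache, so in this patch set non-EAL threads get neither cache nor statistics.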

>>> @@ -952,7 +955,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
>>> uint32_t cache_size = mp->cache_size;
>>>
>>> /* cache is not enabled or single consumer */
>>> -   if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size))
>>> +   if (unlikely(cache_size == 0 || is_mc == 0 ||
>>> +n >= cache_size || lcore_id >= RTE_MAX_LCORE))
>>> goto ring_dequeue;
>>>
>>> cache = &mp->local_cache[lcore_id];
>>>
>>
>> What is the performance impact of adding this test?
> [LCM] According to the perf unit test, it's almost the same. But I haven't measured
> the case where an EAL thread and a non-EAL thread share the same mempool.


When you say "unit test", are you talking about mempool tests from
"make test"? Do you have some numbers to share?

[dpdk-dev] [PATCH v4 12/17] eal: set _lcore_id and _socket_id to (-1) by default

2015-02-09 Thread Olivier MATZ

Hi,

On 02/09/2015 03:24 PM, Liang, Cunming wrote:
>>> --- a/lib/librte_eal/linuxapp/eal/eal_thread.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
>>> @@ -57,8 +57,8 @@
>>>  #include "eal_private.h"
>>>  #include "eal_thread.h"
>>>
>>> -RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
>>> -RTE_DEFINE_PER_LCORE(unsigned, _socket_id);
>>> +RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = (unsigned)LCORE_ID_ANY;
>>> +RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
>>>  RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
>>
>> As far as I understand, now a rte_lcore_id() can return LCORE_ID_ANY.
>> This should be modified in the rte_lcore_id() API comments.
>>
>> Same for rte_socket_id().
> [LCM] accept.
>>
>> I also wonder if the API of these functions should be modified to
>> return an int instead of an unsigned as LCORE_ID_ANY is -1.
> [LCM] I prefer not to change the API definition. (unsigned)LCORE_ID_ANY is already
> used elsewhere.

OK

And what about directly defining the following?

#define LCORE_ID_ANY ((unsigned)-1)


It would avoid the casts.
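A minimal sketch of that suggestion, with hypothetical names (my_lcore_id, is_eal_thread) standing in for the real TLS variable and callers: once the macro itself has an unsigned type, initializers and comparisons against unsigned lcore ids need no casts at the use site.

```c
#include <limits.h>

/* Proposed definition: the sentinel is unsigned at its definition. */
#define LCORE_ID_ANY ((unsigned)-1)

/* (unsigned)-1 converts to UINT_MAX by the C conversion rules. */
_Static_assert(LCORE_ID_ANY == UINT_MAX, "LCORE_ID_ANY must equal UINT_MAX");

/* Hypothetical per-thread lcore id, initialized without a cast. */
static __thread unsigned my_lcore_id = LCORE_ID_ANY;

/* Both operands are unsigned, so there is no signed/unsigned mix. */
static int is_eal_thread(void)
{
	return my_lcore_id != LCORE_ID_ANY;
}
```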

[dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when read

2015-02-09 Thread Neil Horman

On Mon, Feb 09, 2015 at 10:48:36PM +, Dumitrescu, Cristian wrote:
> Hi Stephen,
> 
> What is the reason not to clear statistics on read? Do you have a use-case / 
> justification for it?
> 
> (BTW, I see you added the reset functions, but was it also your intention to 
> remove the memset to 0 from the stats read functions? :) )
> 
> Regards,
> Cristian
> 
It's the difference between a hardware and a software interface. Hardware stats
are often read-to-clear, but software hides that, making stats continuous.
Exposing read-to-clear behavior is atypical for a software stack.
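Neil's point can be illustrated with a small self-contained sketch (not DPDK code; every name below is hypothetical). A software layer folds a read-to-clear hardware counter into a running total, reading does not clear, reset is a separate call, and a consumer that wants per-interval numbers keeps its own snapshot and subtracts.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical continuous counters kept by a software layer. */
struct sw_stats {
	uint64_t n_pkts;
};

/* Stand-in for a read-to-clear hardware register. */
static uint64_t hw_pkts_reg;

/* Reading the "hardware" returns the count and clears it. */
static uint64_t hw_read_clear(void)
{
	uint64_t v = hw_pkts_reg;

	hw_pkts_reg = 0;
	return v;
}

/* Fold the read-to-clear delta into the running software total. */
static void sw_stats_poll(struct sw_stats *s)
{
	s->n_pkts += hw_read_clear();
}

/* Read copies the totals and leaves them untouched. */
static void sw_stats_read(const struct sw_stats *s, struct sw_stats *out)
{
	*out = *s;
}

/* Clearing is an explicit, separate operation. */
static void sw_stats_reset(struct sw_stats *s)
{
	memset(s, 0, sizeof(*s));
}

/* Per-interval value computed from the consumer's own snapshot. */
static uint64_t sw_stats_delta(const struct sw_stats *now,
			       struct sw_stats *prev)
{
	uint64_t d = now->n_pkts - prev->n_pkts;

	*prev = *now;
	return d;
}
```

Because each reader keeps its own snapshot, several independent consumers can sample the same counters without resetting them for each other.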
Neil

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Thursday, February 5, 2015 6:14 AM
> To: dev at dpdk.org
> Cc: Stephen Hemminger
> Subject: [dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when read
> 
> From: Stephen Hemminger 
> 
> Make rte_sched statistics API work like the ethernet statistics API.
> Don't auto-clear statistics.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  lib/librte_sched/rte_sched.c | 30 ++
>  lib/librte_sched/rte_sched.h | 29 +
>  2 files changed, 59 insertions(+)
> 
> diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> index 8cb8bf1..d891e50 100644
> --- a/lib/librte_sched/rte_sched.c
> +++ b/lib/librte_sched/rte_sched.c
> @@ -935,6 +935,21 @@ rte_sched_subport_read_stats(struct rte_sched_port *port,
>  }
>  
>  int
> +rte_sched_subport_stats_reset(struct rte_sched_port *port,
> +   uint32_t subport_id)
> +{
> + struct rte_sched_subport *s;
> +
> + /* Check user parameters */
> + if (port == NULL || subport_id >= port->n_subports_per_port)
> + return -1;
> +
> + s = port->subport + subport_id;
> + memset(&s->stats, 0, sizeof(struct rte_sched_subport_stats));
> + return 0;
> +}
> +
> +int
>  rte_sched_queue_read_stats(struct rte_sched_port *port,
>   uint32_t queue_id,
>   struct rte_sched_queue_stats *stats,
> @@ -963,6 +978,21 @@ rte_sched_queue_read_stats(struct rte_sched_port *port,
>   return 0;
>  }
>  
> +int
> +rte_sched_queue_stats_reset(struct rte_sched_port *port,
> + uint32_t queue_id)
> +{
> + struct rte_sched_queue_extra *qe;
> +
> + /* Check user parameters */
> + if (port == NULL || queue_id >= rte_sched_port_queues_per_port(port))
> + return -1;
> +
> + qe = port->queue_extra + queue_id;
> + memset(&qe->stats, 0, sizeof(struct rte_sched_queue_stats));
> + return 0;
> +}
> +
>  static inline uint32_t
>  rte_sched_port_qindex(struct rte_sched_port *port, uint32_t subport, 
> uint32_t pipe, uint32_t traffic_class, uint32_t queue)
>  {
> diff --git a/lib/librte_sched/rte_sched.h b/lib/librte_sched/rte_sched.h
> index e9bf18a..3d007e4 100644
> --- a/lib/librte_sched/rte_sched.h
> +++ b/lib/librte_sched/rte_sched.h
> @@ -317,6 +317,21 @@ rte_sched_subport_read_stats(struct rte_sched_port *port,
>   struct rte_sched_subport_stats *stats,
>   uint32_t *tc_ov);
>  
> +
> +/**
> + * Hierarchical scheduler subport statistics reset
> + *
> + * @param port
> + *   Handle to port scheduler instance
> + * @param subport_id
> + *   Subport ID
> + * @return
> + *   0 upon success, error code otherwise
> + */
> +int
> +rte_sched_subport_stats_reset(struct rte_sched_port *port,
> +   uint32_t subport_id);
> +
>  /**
>   * Hierarchical scheduler queue statistics read
>   *
> @@ -338,6 +353,20 @@ rte_sched_queue_read_stats(struct rte_sched_port *port,
>   struct rte_sched_queue_stats *stats,
>   uint16_t *qlen);
>  
> +/**
> + * Hierarchical scheduler queue statistics reset
> + *
> + * @param port
> + *   Handle to port scheduler instance
> + * @param queue_id
> + *   Queue ID within port scheduler
> + * @return
> + *   0 upon success, error code otherwise
> + */
> +int
> +rte_sched_queue_stats_reset(struct rte_sched_port *port,
> + uint32_t queue_id);
> +
>  /*
>   * Run-time
>   *
> -- 
> 2.1.4
> 

[dpdk-dev] [PATCH v4 11/17] log: fix the gap to support non-EAL thread

2015-02-09 Thread Olivier MATZ

Hi,

On 02/09/2015 03:19 PM, Liang, Cunming wrote:
>>> --- a/lib/librte_eal/common/include/rte_log.h
>>> +++ b/lib/librte_eal/common/include/rte_log.h
>>> @@ -144,6 +144,11 @@ uint32_t rte_get_log_level(void);
>>>  void rte_set_log_type(uint32_t type, int enable);
>>>
>>>  /**
>>> + * Get the global log type.
>>> + */
>>> +uint32_t rte_get_log_type(void);
>>> +
>>> +/**
>>>   * Get the current loglevel for the message being processed.
>>>   *
>>>   * Before calling the user-defined stream for logging, the log
>>>
>>
>> Wouldn't it be better to change the variable:
>> static struct log_cur_msg log_cur_msg[RTE_MAX_LCORE];
>> into a pthread (tls) variable?
>>
>> With your patch, the log level and log type are not saved for
>> non-EAL threads. If TLS were used, I think it would work in any case.
> [LCM] Good point. But this patch set is meant to avoid any big impact on EAL
> threads. Improvements for non-EAL threads will come in a separate patch set.

OK, that's fine

Will it be for 2.0 or later?
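The TLS suggestion can be sketched as follows. The struct fields and helper names are assumptions made for illustration, not the actual rte_log.c code; the point is that a __thread variable gives every thread, EAL or not, its own log state without needing an lcore id to index an array.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-message log state, mirroring log_cur_msg. */
struct log_cur_msg {
	uint32_t loglevel;
	uint32_t logtype;
};

/* One copy per thread instead of log_cur_msg[RTE_MAX_LCORE]. */
static __thread struct log_cur_msg log_cur_msg;

static void log_set_cur(uint32_t level, uint32_t type)
{
	log_cur_msg.loglevel = level;
	log_cur_msg.logtype = type;
}

static uint32_t log_cur_level(void)
{
	return log_cur_msg.loglevel;
}

/* Thread body used to show isolation: it sets its own level and
 * reports what it observes through arg. */
static void *log_worker(void *arg)
{
	uint32_t *seen = arg;

	log_set_cur(7, 1);
	*seen = log_cur_level();
	return NULL;
}
```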

[dpdk-dev] [PATCH v4 10/17] malloc: fix the issue of SOCKET_ID_ANY

2015-02-09 Thread Olivier MATZ

Hi,

On 02/09/2015 03:08 PM, Liang, Cunming wrote:
> 
> 
>> -Original Message-
>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
>> Sent: Monday, February 09, 2015 4:01 AM
>> To: Liang, Cunming; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v4 10/17] malloc: fix the issue of 
>> SOCKET_ID_ANY
>>
>> Hi,
>>
>> On 02/02/2015 03:02 AM, Cunming Liang wrote:
>>> Add a check on rte_socket_id() to avoid an unexpected return value such as (-1).
>>>
>>> Signed-off-by: Cunming Liang 
>>> ---
>>>  lib/librte_malloc/malloc_heap.h | 7 ++-
>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_malloc/malloc_heap.h b/lib/librte_malloc/malloc_heap.h
>>> index b4aec45..a47136d 100644
>>> --- a/lib/librte_malloc/malloc_heap.h
>>> +++ b/lib/librte_malloc/malloc_heap.h
>>> @@ -44,7 +44,12 @@ extern "C" {
>>>  static inline unsigned
>>>  malloc_get_numa_socket(void)
>>>  {
>>> -   return rte_socket_id();
>>> +   unsigned socket_id = rte_socket_id();
>>> +
>>> +   if (socket_id == (unsigned)SOCKET_ID_ANY)
>>> +   return 0;
>>> +
>>> +   return socket_id;
>>>  }
>>>
>>>  void *
>>>
>>
>> The documentation of rte_malloc_socket() says:
>>
>> @param socket
>>   NUMA socket to allocate memory on. If SOCKET_ID_ANY is used, this
>>   function will behave the same as rte_malloc().
>>
>> void *
>> rte_malloc_socket(const char *type, size_t size, unsigned align, int
>> socket);
>>
>>
>> Your patch changes the behavior of rte_malloc() without explaining
>> why, and the documentation becomes wrong.
>>
>> Can you explain why you need this change?
> [LCM] I don't think I changed the declaration of rte_malloc_socket().
> If socket_arg=SOCKET_ID_ANY, the socket value is expected to be the return value of
> malloc_get_numa_socket(), which is supposed to return the correct TLS _socket_id.
> That works fine in normal cases, but this series changes the default value of the
> TLS _socket_id to SOCKET_ID_ANY.
> Also, one lcore can run on multiple cpus: if the cpus in its cpuset do not all
> belong to one NUMA node, _socket_id will be SOCKET_ID_ANY.
> When the user calls rte_malloc_socket(SOCKET_ID_ANY), it still behaves the same as
> rte_malloc(): both get the socket id from malloc_get_numa_socket(). The added code
> only handles this exception path.

Sorry, I checked again, you are right.

[dpdk-dev] [PATCH v4 08/17] eal: apply affinity of EAL thread by assigned cpuset

2015-02-09 Thread Olivier MATZ

Hi,

On 02/09/2015 02:48 PM, Liang, Cunming wrote:
>> -Original Message-
>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
>> Sent: Monday, February 09, 2015 4:01 AM
>> To: Liang, Cunming; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v4 08/17] eal: apply affinity of EAL thread by
>> assigned cpuset
>>
>> Hi,
>>
>> On 02/02/2015 03:02 AM, Cunming Liang wrote:
>>> EAL threads use assigned cpuset to set core affinity during startup.
>>> It keeps 1:1 mapping, if no '--lcores' option is used.
>>>
>>> [...]
>>>
>>>  lib/librte_eal/bsdapp/eal/eal.c  | 13 ---
>>>  lib/librte_eal/bsdapp/eal/eal_thread.c   | 63 +-
>>>  lib/librte_eal/linuxapp/eal/eal.c|  7 +++-
>>>  lib/librte_eal/linuxapp/eal/eal_thread.c | 67 +++-
>>>  4 files changed, 54 insertions(+), 96 deletions(-)
>>>
>>> diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
>>> index 69f3c03..98c5a83 100644
>>> --- a/lib/librte_eal/bsdapp/eal/eal.c
>>> +++ b/lib/librte_eal/bsdapp/eal/eal.c
>>> @@ -432,6 +432,7 @@ rte_eal_init(int argc, char **argv)
>>> int i, fctret, ret;
>>> pthread_t thread_id;
>>> static rte_atomic32_t run_once = RTE_ATOMIC32_INIT(0);
>>> +   char cpuset[CPU_STR_LEN];
>>>
>>> if (!rte_atomic32_test_and_set(&run_once))
>>> return -1;
>>> @@ -502,13 +503,17 @@ rte_eal_init(int argc, char **argv)
>>> if (rte_eal_pci_init() < 0)
>>> rte_panic("Cannot init PCI\n");
>>>
>>> -   RTE_LOG(DEBUG, EAL, "Master core %u is ready (tid=%p)\n",
>>> -   rte_config.master_lcore, thread_id);
>>> -
>>> eal_check_mem_on_local_socket();
>>>
>>> rte_eal_mcfg_complete();
>>>
>>> +   eal_thread_init_master(rte_config.master_lcore);
>>> +
>>> +   eal_thread_dump_affinity(cpuset, CPU_STR_LEN);
>>> +
>>> +   RTE_LOG(DEBUG, EAL, "Master lcore %u is ready (tid=%p;cpuset=[%s])\n",
>>> +   rte_config.master_lcore, thread_id, cpuset);
>>> +
>>> if (rte_eal_dev_init() < 0)
>>> rte_panic("Cannot init pmd devices\n");
>>>
>>> @@ -532,8 +537,6 @@ rte_eal_init(int argc, char **argv)
>>> rte_panic("Cannot create thread\n");
>>> }
>>>
>>> -   eal_thread_init_master(rte_config.master_lcore);
>>> -
>>> /*
>>>  * Launch a dummy function on all slave lcores, so that master lcore
>>>  * knows they are all ready when this function returns.
>>
>> I wonder if changing this may have an impact on third-party drivers
>> that already use a management thread. Before the patch, the init()
>> function of the external library was called with default affinities,
>> and now it's called with the affinity from master lcore.
>>
>> I think it should at least be noticed in the commit log.
>>
>> Why are you doing this change? (I don't say it's a bad change, but
>> I don't understand why you are doing it here)
> [LCM] To be honest, the main reason is that I didn't find any reason for linuxapp
> and freebsdapp to have different init sequences.
> I mean that linux calls init_master before dev_init(), but freebsd does the reverse.


I agree that's something we should fix.


> And since the default value of the TLS variables has changed, if dev_init() runs
> first and uses those TLS variables, it will appear not to be running in an EAL
> thread, although it actually is running in the EAL master thread. So I prefer to
> change the freebsd sequence to follow the linuxapp one.

That makes sense. Is it possible to have this reordering in a separate
patch? The title could be
"eal: standardize init sequence between linux and bsd"



>>
>>
>>> diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c b/lib/librte_eal/bsdapp/eal/eal_thread.c
>>> index d0c077b..5b16302 100644
>>> --- a/lib/librte_eal/bsdapp/eal/eal_thread.c
>>> +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
>>> @@ -103,55 +103,27 @@ eal_thread_set_affinity(void)
>>>  {
>>> int s;
>>> pthread_t thread;
>>> -
>>> -/*
>>> - * According to the section VERSIONS of the CPU_ALLOC man page:
>>> - *
>>> - * The CPU_ZERO(), CPU_SET(), CPU_CLR(), and CPU_ISSET() macros were added
>>> - * in glibc 2.3.3.
>>> - *
>>> - * CPU_COUNT() first appeared in glibc 2.6.
>>> - *
>>> - * CPU_AND(), CPU_OR(), CPU_XOR(),CPU_EQUAL(), CPU_ALLOC(),
>>> - * CPU_ALLOC_SIZE(), CPU_FREE(), CPU_ZERO_S(),  CPU_SET_S(), CPU_CLR_S(),
>>> - * CPU_ISSET_S(),  CPU_AND_S(), CPU_OR_S(), CPU_XOR_S(), and CPU_EQUAL_S()
>>> - * first appeared in glibc 2.7.
>>> - */
>>> -#if defined(CPU_ALLOC)
>>> -   size_t size;
>>> -   cpu_set_t *cpusetp;
>>> -
>>> -   cpusetp = CPU_ALLOC(RTE_MAX_LCORE);
>>> -   if (cpusetp == NULL) {
>>> -   RTE_LOG(ERR, EAL, "CPU_ALLOC failed\n");
>>> -   return -1;
>>> -   }
>>> -
>>> -   size = CPU_ALLOC_SIZE(RTE_MAX_LCORE);
>>> -   CPU_ZERO_S(size, cpusetp);
>>> -   CPU_SET_S(rte_lcore_id(), size, cpusetp);
>>> +   unsigned lcore_id = rte_lcore_id();
>>>
>>> thread = pthread_self();
>>> -   s = pthread_setaffinity_np(thread, size, cpusetp);
>>> +

[dpdk-dev] [PATCH v4 06/17] eal: add eal_common_thread.c for common thread API

2015-02-09 Thread Olivier MATZ

Hi,

On 02/09/2015 02:12 PM, Liang, Cunming wrote:
>>> +int
>>> +rte_thread_get_affinity(rte_cpuset_t *cpusetp)
>>> +{
>>> +   if (!cpusetp)
>>> +   return -1;
>>
>> Same here. This is the only reason why rte_thread_get_affinity() could
>> fail. Removing this test would allow to change the API to return void
>> instead. It will avoid a useless test below in
>> eal_thread_dump_affinity().
> [LCM] cpusetp is used as the destination of memcpy, and the function is meant to
> be an EAL API.
> I don't think it's a good idea to remove the check, do you?

I know we often debate this subject on the list. My personal
opinion is that checking for a NULL pointer in these cases is useless
because the user is supposed to pass a non-NULL pointer. Returning
an error forces the caller to handle an error for something that cannot
happen.

On the other hand, adding an assert() (or the dpdk equivalent) would
be a good idea.


>>
>>> +
>>> +   rte_memcpy(cpusetp, &RTE_PER_LCORE(_cpuset),
>>> +  sizeof(rte_cpuset_t));
>>> +
>>> +   return 0;
>>> +}
>>> +
>>> +void
>>> +eal_thread_dump_affinity(char str[], unsigned size)
>>> +{
>>> +   rte_cpuset_t cpuset;
>>> +   unsigned cpu;
>>> +   int ret;
>>> +   unsigned int out = 0;
>>> +
>>> +   if (rte_thread_get_affinity(&cpuset) < 0) {
>>> +   str[0] = '\0';
>>> +   return;
>>> +   }
>>
>> This one could be removed if the (== NULL) test is removed.
>>
>>> +
>>> +   for (cpu = 0; cpu < RTE_MAX_LCORE; cpu++) {
>>> +   if (!CPU_ISSET(cpu, &cpuset))
>>> +   continue;
>>> +
>>> +   ret = snprintf(str + out,
>>> +  size - out, "%u,", cpu);
>>> +   if (ret < 0 || (unsigned)ret >= size - out)
>>> +   break;
>>
>> On the contrary, I think returning an error to the user here
>> would be useful, so they know that the dump is incomplete.
> [LCM] accept.
>>
>>
>> Regards,
>> Olivier
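The two review suggestions here (assert on the pointer instead of a run-time NULL check, and report truncation to the caller) can be combined in a standalone sketch. It uses Linux's cpu_set_t and the glibc CPU_* macros directly and takes the set as a parameter rather than reading the thread's affinity; the function name dump_cpuset is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/*
 * Write the CPUs in *set as "0,2,10" into str.
 * Returns 0 on success, -1 if the output did not fit (truncated).
 * Pointers are asserted, not checked: passing NULL is a caller bug.
 */
static int dump_cpuset(const cpu_set_t *set, char *str, size_t size)
{
	size_t out = 0;
	int cpu, ret;

	assert(set != NULL && str != NULL && size > 0);
	str[0] = '\0';

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, set))
			continue;

		ret = snprintf(str + out, size - out, "%d,", cpu);
		if (ret < 0 || (size_t)ret >= size - out)
			return -1;	/* dump incomplete: tell the caller */
		out += (size_t)ret;
	}

	/* Drop the trailing comma, if any CPU was printed. */
	if (out > 0)
		str[out - 1] = '\0';
	return 0;
}
```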

[dpdk-dev] [PATCH v4 05/17] eal: new TLS definition and API declaration

2015-02-09 Thread Olivier MATZ

Hi,

On 02/09/2015 01:45 PM, Liang, Cunming wrote:
>>> +/**
>>> + * Dump the current pthread cpuset.
>>> + * This function is private to EAL.
>>> + *
>>> + * @param str
>>> + *   The string buffer the cpuset will dump to.
>>> + * @param size
>>> + *   The string buffer size.
>>> + */
>>> +#define CPU_STR_LEN 256
>>> +void
>>> +eal_thread_dump_affinity(char str[], unsigned size);
>>
>> Although it's equivalent for function arguments, I think "char *str" is
>> usually preferred over "char str[]". See for instance in snprintf() or
>> fgets().
> [LCM] Accept.
>>
>> What is the purpose of CPU_STR_LEN?
> [LCM] It is a default size for the str[] buffer passed to dump_affinity().

So the function's API comment is not in the right
place.

A comment "Default buffer size to use with eal_thread_dump_affinity()"
should be added above CPU_STR_LEN. Also, it could be renamed to
RTE_CPU_STR_LEN or RTE_CPU_AFFINITY_STR_LEN.



>>> @@ -80,7 +81,9 @@ struct lcore_config {
>>>   */
>>>  extern struct lcore_config lcore_config[RTE_MAX_LCORE];
>>>
>>> -RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per core "core id". */
>>> +RTE_DECLARE_PER_LCORE(unsigned, _lcore_id);  /**< Per thread "lcore id". */
>>> +RTE_DECLARE_PER_LCORE(unsigned, _socket_id); /**< Per thread "socket id". */
>>> +RTE_DECLARE_PER_LCORE(rte_cpuset_t, _cpuset); /**< Per thread "cpuset". */
>>>
>>>  /**
>>>   * Return the ID of the execution unit we are running on.
>>> @@ -146,7 +149,7 @@ rte_lcore_index(int lcore_id)
>>>  static inline unsigned
>>>  rte_socket_id(void)
>>>  {
>>> -   return lcore_config[rte_lcore_id()].socket_id;
>>> +   return RTE_PER_LCORE(_socket_id);
>>>  }
>>
>> I don't see where the _socket_id variable is assigned. I think there
>> is probably an issue with the splitting of the patches.
> [LCM] The value is initialized to SOCKET_ID_ANY by RTE_DEFINE_PER_LCORE(),
> and updated in eal_thread_set_affinity() for EAL threads and in
> rte_thread_set_affinity() for non-EAL threads.

This is done in later patches:

"eal: set _lcore_id and _socket_id to (-1) by default"
"eal: apply affinity of EAL thread by assigned cpuset"

That's why I said there is probably an issue with the ordering
of the patches as these values are used here but initialized
later in the series.

[dpdk-dev] [PATCH v4 04/17] eal: add support parsing socket_id from cpuset

2015-02-09 Thread Olivier MATZ

Hi,

On 02/09/2015 01:26 PM, Liang, Cunming wrote:
>>> @@ -50,4 +54,52 @@ __attribute__((noreturn)) void *eal_thread_loop(void *arg);
>>>   */
>>>  void eal_thread_init_master(unsigned lcore_id);
>>>
>>> +/**
>>> + * Get the NUMA socket id from cpu id.
>>> + * This function is private to EAL.
>>> + *
>>> + * @param cpu_id
>>> + *   The logical process id.
>>> + * @return
>>> + *   socket_id or SOCKET_ID_ANY
>>> + */
>>> +unsigned eal_cpu_socket_id(unsigned cpu_id);
>>
>> Wouldn't it be better to rename the existing function cpu_socket_id()
>> in eal_cpu_socket_id() and export it in eal_thread.h?
>>
>> In case of bsd where cpu_socket_id() is implemented using a #define,
>> a new function should be created returning 0.
> [LCM] In eal_lcore.c, cpu_socket_id()/cpu_core_id() are defined as static and
> only used in rte_eal_cpu_init().
> I suppose the intent of the original design is to keep the sysfs parsing visible
> only inside that file.
> Whether we remove the 'static' prefix of cpu_core_id() or add a new wrapper
> eal_cpu_socket_id(), the result is a new extern EAL API.
> So I prefer not to change the visibility of the original static function, and to
> add a separate extern interface instead.

Yes, but I don't see the advantage of using a wrapper.
If there is no advantage, I think the option with less code is
better.



>>> +static inline int
>>> +eal_cpuset_socket_id(rte_cpuset_t *cpusetp)
>>> +{
>>> +   unsigned cpu = 0;
>>> +   int socket_id = SOCKET_ID_ANY;
>>> +   int sid;
>>> +
>>> +   if (cpusetp == NULL)
>>> +   return SOCKET_ID_ANY;
>>> +
>>> +   do {
>>> +   if (!CPU_ISSET(cpu, cpusetp))
>>> +   continue;
>>> +
>>> +   if (socket_id == SOCKET_ID_ANY)
>>> +   socket_id = eal_cpu_socket_id(cpu);
>>> +
>>> +   sid = eal_cpu_socket_id(cpu);
>>> +   if (socket_id != sid) {
>>> +   socket_id = SOCKET_ID_ANY;
>>> +   break;
>>> +   }
>>> +
>>> +   } while (++cpu < RTE_MAX_LCORE);
>>> +
>>> +   return socket_id;
>>> +}
>>
>>
>> I don't think this function should be inlined.
>>
>> As this function is not used, it could be interesting for reviewers
>> to understand when
> [LCM] It's used in eal_thread_set_affinity() of eal_thread.c.

As it's not visible in the patch, could you add an explanation in
the commit log?
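To make the intent of eal_cpuset_socket_id() easier to review, here is a self-contained sketch of the same rule on a plain bitmask with a made-up topology table: return the socket shared by every CPU in the set, or SOCKET_ID_ANY if the CPUs span sockets or the set is empty. cpumask_socket_id and cpu_socket[] are hypothetical, not EAL code.

```c
#define SOCKET_ID_ANY (-1)
#define MAX_CPU 8

/* Hypothetical topology: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static const int cpu_socket[MAX_CPU] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Bit i of mask selects CPU i. */
static int cpumask_socket_id(unsigned mask)
{
	int socket_id = SOCKET_ID_ANY;
	unsigned cpu;

	for (cpu = 0; cpu < MAX_CPU; cpu++) {
		if (!(mask & (1u << cpu)))
			continue;

		if (socket_id == SOCKET_ID_ANY)
			socket_id = cpu_socket[cpu];	/* first CPU in the set */
		else if (socket_id != cpu_socket[cpu])
			return SOCKET_ID_ANY;		/* CPUs on mixed sockets */
	}
	return socket_id;
}
```

An lcore pinned to CPUs on two nodes therefore ends up with _socket_id == SOCKET_ID_ANY, which is the case the malloc_get_numa_socket() fallback in patch 10/17 handles.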

[dpdk-dev] [PATCH v4 03/17] eal: fix wrong strnlen() return value in 32bit icc

2015-02-09 Thread Olivier MATZ

Hi,

On 02/09/2015 12:57 PM, Liang, Cunming wrote:
>>> @@ -469,7 +469,7 @@ eal_parse_lcores(const char *lcores)
>>> /* Remove all blank characters ahead and after */
>>> while (isblank(*lcores))
>>> lcores++;
>>> -   i = strnlen(lcores, sysconf(_SC_ARG_MAX));
>>> +   i = strnlen(lcores, PATH_MAX);
>>> while ((i > 0) && isblank(lcores[i - 1]))
>>> i--;
>>>
>>>
>>
>> I think PATH_MAX is not equivalent to _SC_ARG_MAX.
>>
>> But the main question is: why do we need to use strnlen() here instead
>> of strlen? We can expect that argv[] pointers are always nul-terminated.
>> Replacing them by strlen() would probably also solve the icc issue.
> [LCM] You're right: strlen() would also solve the icc issue, with no risk for
> argv[]. But as a matter of good practice, continuing to use the 'n' variants of
> these functions in DPDK is not a bad thing.
> There are two additional reasons to keep strnlen and PATH_MAX:
> 1. PATH_MAX is defined as 4096, which is enough for our input. It doesn't
> matter whether it is _SC_ARG_MAX or not.

PATH_MAX is 4096 but it's not related to the maximum argument length.

> 2. strnlen and PATH_MAX are already used in eal_parse_coremask, so this keeps the
> style consistent between '-l' and '--lcores'.

I don't think it's a valid argument.

What is the problem with using strlen()? It looks like it solves all the
issues. Using strlen on valid strings is not a security issue.


Regards,
Olivier
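The trimming done at the start of eal_parse_lcores() works the same with plain strlen(), since argv[] strings are always NUL-terminated. A minimal standalone sketch of that loop; the helper name trim_blanks is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/*
 * Skip leading blanks, then return (via *len) the length of the
 * remaining text without trailing blanks. s must be NUL-terminated.
 */
static const char *trim_blanks(const char *s, size_t *len)
{
	size_t i;

	while (isblank((unsigned char)*s))
		s++;
	i = strlen(s);
	while (i > 0 && isblank((unsigned char)s[i - 1]))
		i--;
	*len = i;
	return s;
}
```

The unsigned char casts matter because passing a negative char value to isblank() is undefined behavior.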

[dpdk-dev] [PATCH] MAINTAINERS: claim metering, sched and pkt framework

2015-02-09 Thread Thomas Monjalon

2015-02-09 16:18, Dumitrescu, Cristian:
> > About cfgfile, we are still waiting for the cleanup in qos_sched example:
> > http://dpdk.org/ml/archives/dev/2014-October/006774.html
> > Do you have news?

> We are working on some enhancements on librte_cfg for release 2.1, so in
> order to avoid unnecessary code churn, it is probably better to have the
> librte_cfgfile changes done first, then have a subsequent patch on qos_sched.

Why not deduplicate now?

[dpdk-dev] [PATCH v4 01/17] eal: add cpuset into per EAL thread lcore_config

2015-02-09 Thread Olivier MATZ

Hi,

On 02/09/2015 12:33 PM, Liang, Cunming wrote:
>> On 02/02/2015 03:02 AM, Cunming Liang wrote:
>>> The patch adds 'cpuset' to the per-lcore configuration 'lcore_config[]',
>>> as an lcore is no longer always pinned 1:1 to a physical cpu.
>>> An lcore now stands for an EAL thread rather than a logical cpu.
>>>
>>> It doesn't change the default behavior of 1:1 mapping, but allows
>>> the EAL thread to be affinitized to multiple cpus.
>>>
>>> [...]
>>> diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
>>> index 65ee87d..a34d500 100644
>>> --- a/lib/librte_eal/bsdapp/eal/eal_memory.c
>>> +++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
>>> @@ -45,6 +45,8 @@
>>>  #include "eal_internal_cfg.h"
>>>  #include "eal_filesystem.h"
>>>
>>> +/* avoid re-defined against with freebsd header */
>>> +#undef PAGE_SIZE
>>>  #define PAGE_SIZE (sysconf(_SC_PAGESIZE))
>>
>> I don't see the link with the patch. Should this go somewhere else?

Maybe you missed this one.


>>> diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
>>> index 49b2c03..4c7d6bb 100644
>>> --- a/lib/librte_eal/common/include/rte_lcore.h
>>> +++ b/lib/librte_eal/common/include/rte_lcore.h
>>> @@ -50,6 +50,13 @@ extern "C" {
>>>
>>>  #define LCORE_ID_ANY -1 /**< Any lcore. */
>>>
>>> +#if defined(__linux__)
>>> +   typedef cpu_set_t rte_cpuset_t;
>>> +#elif defined(__FreeBSD__)
>>> +#include 
>>> +   typedef cpuset_t rte_cpuset_t;
>>> +#endif
>>> +
>>
>> Should we also define RTE_CPU_SETSIZE?
>> For linux, should  be included?
> [LCM] It uses a fixed-size cpuset; CPU_ALLOC() is not used to get a pointer
> to the cpuset.
> RTE_CPU_SETSIZE would always be equal to sizeof(rte_cpuset_t).

The advantage of using CPU_ALLOC() is to avoid issues when the number
of cores exceeds 1024. I agree it's probably a bit early
to think about this, but it could happen soon :)
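
To illustrate the CPU_ALLOC() alternative, here is a minimal sketch assuming Linux with glibc (the `CPU_*_S` macros are glibc extensions); `example_dyn_cpuset_count()` is a hypothetical name, not an EAL function. The set is sized at run time, so CPU ids above 1024 can be represented:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc exposes CPU_ALLOC and the *_S macros only with this */
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/*
 * Hypothetical helper: allocate a CPU set able to describe ncpus CPUs
 * (possibly more than a fixed cpu_set_t holds), mark the listed CPUs and
 * return how many ended up in the set. Returns -1 on bad size or OOM.
 */
static int example_dyn_cpuset_count(int ncpus, const int *cpus, size_t n)
{
    cpu_set_t *set;
    size_t setsize;
    size_t i;
    int count;

    if (ncpus <= 0)
        return -1;
    set = CPU_ALLOC(ncpus);
    if (set == NULL)
        return -1;
    setsize = CPU_ALLOC_SIZE(ncpus);   /* size in bytes, needed by every *_S macro */

    CPU_ZERO_S(setsize, set);
    for (i = 0; i < n; i++) {
        /* ignore ids outside the requested range */
        if (cpus[i] >= 0 && cpus[i] < ncpus)
            CPU_SET_S(cpus[i], setsize, set);
    }
    count = CPU_COUNT_S(setsize, set);
    CPU_FREE(set);
    return count;
}
```

The cost is that every access has to carry the explicit `setsize`, which is the extra bookkeeping a fixed-size cpuset avoids.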


>> If I understand correctly, after the patch series, users of
>> rte_thread_set_affinity() and rte_thread_get_affinity() are
>> supposed to use the macros from sched.h to access this
>> cpuset parameter. So I'm wondering whether it would be better to
>> use cpu_set_t from libc instead of redefining rte_cpuset_t.
>>
>> To reword my question: what is the purpose of redefining
>> cpu_set_t as rte_cpuset_t if we still need to use the whole
>> libc API to access it?
> [LCM] On Linux the type is *cpu_set_t*, but on FreeBSD it's *cpuset_t*.
> The purpose of *rte_cpuset_t* is to provide a consistent type definition in
> EAL and to avoid lots of #ifdefs for this difference.
> On either Linux or FreeBSD, the libc macros can still be used to set an
> rte_cpuset_t.

OK, it makes sense then. I did not notice the difference between linux
and bsd.
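
A minimal sketch of the approach described above, assuming glibc on Linux and `<sys/param.h>`/`<sys/cpuset.h>` on FreeBSD: one typedef hides the `cpu_set_t`/`cpuset_t` naming difference, and the ordinary libc `CPU_ZERO()`, `CPU_SET()` and `CPU_ISSET()` macros still operate on it. `example_cpuset_t` and the helpers are hypothetical names standing in for the patch's `rte_cpuset_t`:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for the CPU_* macros in glibc <sched.h> */
#endif
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; the patch itself calls its typedef rte_cpuset_t. */
#if defined(__linux__)
#include <sched.h>
typedef cpu_set_t example_cpuset_t;
#elif defined(__FreeBSD__)
#include <sys/param.h>
#include <sys/cpuset.h>
typedef cpuset_t example_cpuset_t;
#else
#error "unsupported platform for this sketch"
#endif

/* Fill a fixed-size set from a list of CPU ids using only the libc macros.
 * Returns -1 if an id does not fit into the fixed-size set. */
static int example_cpuset_from_list(example_cpuset_t *set,
                                    const int *cpus, size_t n)
{
    size_t i;

    CPU_ZERO(set);
    for (i = 0; i < n; i++) {
        if (cpus[i] < 0 || cpus[i] >= CPU_SETSIZE)
            return -1;
        CPU_SET(cpus[i], set);
    }
    return 0;
}

/* Count members by probing each bit; CPU_ISSET exists on both systems. */
static int example_cpuset_count(const example_cpuset_t *set)
{
    int cpu, count = 0;

    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, set))
            count++;
    return count;
}
```

Callers only ever see the one type name; the platform difference stays in a single `#if` block.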

[dpdk-dev] [PATCH v6 1/2] librte_pmd_null: Add null PMD

2015-02-09 Thread Tetsuya Mukawa

On 2015/02/06 20:32, Iremonger, Bernard wrote:
> Hi Tetsuya,
>
> My comments are in line below.
>
>> -----Original Message-----
>> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
>> Sent: Friday, February 6, 2015 4:38 AM
>> To: dev at dpdk.org
>> Cc: Iremonger, Bernard; Tetsuya Mukawa
>> Subject: [PATCH v6 1/2] librte_pmd_null: Add null PMD
>>
>> 'null PMD' is a virtual device driver particularly designed to measure
>> the performance of DPDK PMDs. When an application calls rx, the null PMD
>> just allocates mbufs and returns them. Likewise, on tx the PMD just frees
>> the mbufs.
>>
>> The PMD has the following options.
>> - size: specify the packet size allocated by RX. Default packet size is 64.
>> - copy: specify 1 or 0 to enable or disable copying during RX and TX.
>>  Default value is 0 (disabled).
>>  This option is used to emulate a more realistic data transfer.
>>  Copy size is equal to packet size.
>>
>> To use the PMD, enable CONFIG_RTE_BUILD_SHARED_LIB in the config file, then
>> compile the PMD as a shared library. The library can be linked using the
>> '-d' option when the application is invoked.
>>
>> Here is an example.
>> $ sudo ./testpmd -c f -n 4 -d librte_pmd_null.so \
>>  --vdev 'eth_null0' --vdev 'eth_null1' -- -i --no-flush-rx
>>
>> If testpmd is compiled with CONFIG_RTE_BUILD_SHARED_LIB, it may be necessary
>> to specify more libraries using the '-d' option.
>>
>> v4:
>>  - Fix memory leak.
>>(Thanks to Iremonger, Bernard)
>>
>> Signed-off-by: Tetsuya Mukawa 
>> ---
>>  config/common_bsdapp   |   5 +
>>  config/common_linuxapp |   5 +
>>  lib/Makefile   |   1 +
>>  lib/librte_pmd_null/Makefile   |  58 +
>>  lib/librte_pmd_null/rte_eth_null.c | 485 +
>>  5 files changed, 554 insertions(+)
>>  create mode 100644 lib/librte_pmd_null/Makefile
>>  create mode 100644 lib/librte_pmd_null/rte_eth_null.c
>>
>> diff --git a/config/common_bsdapp b/config/common_bsdapp
>> index 9177db1..fa849be 100644
>> --- a/config/common_bsdapp
>> +++ b/config/common_bsdapp
>> @@ -224,6 +224,11 @@ CONFIG_RTE_LIBRTE_PMD_PCAP=y
>>  CONFIG_RTE_LIBRTE_PMD_BOND=y
>>
>>  #
>> +# Compile null PMD
>> +#
>> +CONFIG_RTE_LIBRTE_PMD_NULL=y
>> +
>> +#
>>  # Do prefetch of packet data within PMD driver receive function
>>  #
>>  CONFIG_RTE_PMD_PACKET_PREFETCH=y
>> diff --git a/config/common_linuxapp b/config/common_linuxapp
>> index 27d05be..456fbfe 100644
>> --- a/config/common_linuxapp
>> +++ b/config/common_linuxapp
>> @@ -237,6 +237,11 @@ CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
>> CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
>>
>>  #
>> +# Compile null PMD
>> +#
>> +CONFIG_RTE_LIBRTE_PMD_NULL=y
>> +
>> +#
>>  # Do prefetch of packet data within PMD driver receive function
>>  #
>>  CONFIG_RTE_PMD_PACKET_PREFETCH=y
>> diff --git a/lib/Makefile b/lib/Makefile
>> index 0ffc982..d246c53 100644
>> --- a/lib/Makefile
>> +++ b/lib/Makefile
>> @@ -52,6 +52,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += librte_pmd_virtio
>>  DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += librte_pmd_vmxnet3
>>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += librte_pmd_xenvirt
>>  DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += librte_vhost
>> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += librte_pmd_null
>>  DIRS-$(CONFIG_RTE_LIBRTE_HASH) += librte_hash
>>  DIRS-$(CONFIG_RTE_LIBRTE_LPM) += librte_lpm
>>  DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
>> diff --git a/lib/librte_pmd_null/Makefile b/lib/librte_pmd_null/Makefile
>> new file mode 100644
>> index 0000000..0ec4db9
>> --- /dev/null
>> +++ b/lib/librte_pmd_null/Makefile
>> @@ -0,0 +1,58 @@
>> +#   BSD LICENSE
>> +#
>> +#   Copyright (C) IGEL Co.,Ltd.
>> +#   All rights reserved.
>> +#
>> +#   Redistribution and use in source and binary forms, with or without
>> +#   modification, are permitted provided that the following conditions
>> +#   are met:
>> +#
>> +# * Redistributions of source code must retain the above copyright
>> +#   notice, this list of conditions and the following disclaimer.
>> +# * Redistributions in binary form must reproduce the above copyright
>> +#   notice, this list of conditions and the following disclaimer in
>> +#   the documentation and/or other materials provided with the
>> +#   distribution.
>> +# * Neither the name of IGEL Co.,Ltd. nor the names of its
>> +#   contributors may be used to endorse or promote products derived
>> +#   from this software without specific prior written permission.
>> +#
>> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;

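The RX/TX behavior described in the null PMD commit message can be sketched in plain C. This is not the driver's code: `struct example_pkt` is a hypothetical stand-in for an mbuf, `malloc()`/`free()` replace mempool allocation, and the `size` and `copy` fields mirror the two PMD options:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: a length plus a heap buffer. */
struct example_pkt {
    uint32_t len;
    uint8_t *data;
};

/* Hypothetical per-port settings mirroring the "size" and "copy" options. */
struct example_null_port {
    uint32_t size;    /* bytes per packet handed out on RX */
    int copy;         /* non-zero: copy packet data on RX and TX */
    uint8_t *scratch; /* at least `size` bytes; copy source/destination */
};

/* RX: "receive" by allocating up to nb packets; returns how many were made. */
static uint16_t example_null_rx(struct example_null_port *p,
                                struct example_pkt **pkts, uint16_t nb)
{
    uint16_t i;

    for (i = 0; i < nb; i++) {
        struct example_pkt *m = malloc(sizeof(*m));

        if (m == NULL)
            break;
        m->data = malloc(p->size);
        if (m->data == NULL) {
            free(m);
            break;
        }
        m->len = p->size;
        if (p->copy)
            memcpy(m->data, p->scratch, p->size);
        pkts[i] = m;
    }
    return i;
}

/* TX: "transmit" by (optionally copying and then) freeing every packet. */
static uint16_t example_null_tx(struct example_null_port *p,
                                struct example_pkt **pkts, uint16_t nb)
{
    uint16_t i;

    for (i = 0; i < nb; i++) {
        if (p->copy) {
            uint32_t n = pkts[i]->len < p->size ? pkts[i]->len : p->size;
            memcpy(p->scratch, pkts[i]->data, n);
        }
        free(pkts[i]->data);
        free(pkts[i]);
    }
    return nb;
}
```

Since nothing reaches a wire, a benchmark driving such a port mostly measures per-packet allocation, free and (with `copy` enabled) memcpy cost.
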
[dpdk-dev] [PATCH v4 01/17] eal: add cpuset into per EAL thread lcore_config

2015-02-09 Thread Ananyev, Konstantin



> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier MATZ
> Sent: Monday, February 09, 2015 5:07 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 01/17] eal: add cpuset into per EAL thread 
> lcore_config
> 
> Hi,
> 
> On 02/09/2015 12:33 PM, Liang, Cunming wrote:
> >> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> >>> The patch adds 'cpuset' to the per-lcore configuration 'lcore_config[]',
> >>> as an lcore is no longer always pinned 1:1 to a physical CPU.
> >>> An lcore now stands for an EAL thread rather than a logical CPU.
> >>>
> >>> It doesn't change the default 1:1 mapping behavior, but allows
> >>> the EAL thread to be affinitized to multiple CPUs.
> >>>
> >>> [...]
> >>> diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
> >>> index 65ee87d..a34d500 100644
> >>> --- a/lib/librte_eal/bsdapp/eal/eal_memory.c
> >>> +++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
> >>> @@ -45,6 +45,8 @@
> >>>  #include "eal_internal_cfg.h"
> >>>  #include "eal_filesystem.h"
> >>>
> >>> +/* avoid redefinition conflict with the FreeBSD header */
> >>> +#undef PAGE_SIZE
> >>>  #define PAGE_SIZE (sysconf(_SC_PAGESIZE))
> >>
> >> I don't see the link with the patch. Should this go somewhere else?
> 
> Maybe you missed this one.
> 
> 
> >>> diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
> >>> index 49b2c03..4c7d6bb 100644
> >>> --- a/lib/librte_eal/common/include/rte_lcore.h
> >>> +++ b/lib/librte_eal/common/include/rte_lcore.h
> >>> @@ -50,6 +50,13 @@ extern "C" {
> >>>
> >>>  #define LCORE_ID_ANY -1/**< Any lcore. */
> >>>
> >>> +#if defined(__linux__)
> >>> + typedef cpu_set_t rte_cpuset_t;
> >>> +#elif defined(__FreeBSD__)
> >>> +#include 
> >>> + typedef cpuset_t rte_cpuset_t;
> >>> +#endif
> >>> +
> >>
> >> Should we also define RTE_CPU_SETSIZE?
> >> For linux, should  be included?
> > [LCM] It uses a fixed-size cpuset; it won't use CPU_ALLOC() to get a pointer
> > to the cpuset.
> > RTE_CPU_SETSIZE is always equal to sizeof(rte_cpuset_t).
> 
> The advantage of using CPU_ALLOC() is to avoid issues when the number
> of cores exceeds 1024. I agree it's probably a bit early
> to think about this, but it could happen soon :)

I personally don't think we'll hit the 1K CPU limit anytime soon...
On the other hand, a fixed-size cpuset allows us to clean up and simplify
the code quite a bit.
So, I'd suggest sticking with a fixed size for now.
Konstantin

> 
> 
> >> If I understand correctly, after the patch series, users of
> >> rte_thread_set_affinity() and rte_thread_get_affinity() are
> >> supposed to use the macros from sched.h to access this
> >> cpuset parameter. So I'm wondering whether it would be better to
> >> use cpu_set_t from libc instead of redefining rte_cpuset_t.
> >>
> >> To reword my question: what is the purpose of redefining
> >> cpu_set_t as rte_cpuset_t if we still need to use the whole
> >> libc API to access it?
> > [LCM] On Linux the type is *cpu_set_t*, but on FreeBSD it's *cpuset_t*.
> > The purpose of *rte_cpuset_t* is to provide a consistent type definition in
> > EAL and to avoid lots of #ifdefs for this difference.
> > On either Linux or FreeBSD, the libc macros can still be used to set an
> > rte_cpuset_t.
> 
> OK, it makes sense then. I did not notice the difference between linux
> and bsd.

[dpdk-dev] [PATCH v7] testpmd: Add port hotplug support

2015-02-09 Thread Tetsuya Mukawa

The patch introduces the following commands.
- port attach [ident]
- port detach [port_id]
 - attach: attach a port
 - detach: detach a port
 - ident: PCI address of a physical device,
   or device name and parameters of a virtual device
   (ex. 0000:02:00.0, eth_pcap0,iface=eth0)
 - port_id: port identifier

v7:
- Fix doc.
  (Thanks to Iremonger, Bernard)
- Fix port checking implementation of start_port().
  (Thanks to Qiu, Michael)
v5:
- Add testpmd documentation.
  (Thanks to Iremonger, Bernard)
v4:
 - Fix strings of command help.

Signed-off-by: Tetsuya Mukawa 
---
 app/test-pmd/cmdline.c  | 133 +++
 app/test-pmd/config.c   | 116 +---
 app/test-pmd/parameters.c   |  22 ++-
 app/test-pmd/testpmd.c  | 199 +---
 app/test-pmd/testpmd.h  |  18 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  57 
 6 files changed, 415 insertions(+), 130 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 590e427..a4ca914 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -573,6 +573,12 @@ static void cmd_help_long_parsed(void *parsed_result,
"port close (port_id|all)\n"
"Close all ports or port_id.\n\n"

+   "port attach (ident)\n"
+   "Attach physical or virtual dev by pci address or virtual device name\n\n"
+
+   "port detach (port_id)\n"
+   "Detach physical or virtual dev by port_id\n\n"
+
"port config (port_id|all)"
" speed (10|100|1000|10000|40000|auto)"
" duplex (half|full|auto)\n"
@@ -864,6 +870,89 @@ cmdline_parse_inst_t cmd_operate_specific_port = {
},
 };

+/* *** attach a specified port *** */
+struct cmd_operate_attach_port_result {
+   cmdline_fixed_string_t port;
+   cmdline_fixed_string_t keyword;
+   cmdline_fixed_string_t identifier;
+};
+
+static void cmd_operate_attach_port_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_operate_attach_port_result *res = parsed_result;
+
+   if (!strcmp(res->keyword, "attach"))
+   attach_port(res->identifier);
+   else
+   printf("Unknown parameter\n");
+}
+
+cmdline_parse_token_string_t cmd_operate_attach_port_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result,
+   port, "port");
+cmdline_parse_token_string_t cmd_operate_attach_port_keyword =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result,
+   keyword, "attach");
+cmdline_parse_token_string_t cmd_operate_attach_port_identifier =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result,
+   identifier, NULL);
+
+cmdline_parse_inst_t cmd_operate_attach_port = {
+   .f = cmd_operate_attach_port_parsed,
+   .data = NULL,
+   .help_str = "port attach identifier, "
+   "identifier: pci address or virtual dev name",
+   .tokens = {
+   (void *)&cmd_operate_attach_port_port,
+   (void *)&cmd_operate_attach_port_keyword,
+   (void *)&cmd_operate_attach_port_identifier,
+   NULL,
+   },
+};
+
+/* *** detach a specified port *** */
+struct cmd_operate_detach_port_result {
+   cmdline_fixed_string_t port;
+   cmdline_fixed_string_t keyword;
+   uint8_t port_id;
+};
+
+static void cmd_operate_detach_port_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_operate_detach_port_result *res = parsed_result;
+
+   if (!strcmp(res->keyword, "detach"))
+   detach_port(res->port_id);
+   else
+   printf("Unknown parameter\n");
+}
+
+cmdline_parse_token_string_t cmd_operate_detach_port_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_detach_port_result,
+   port, "port");
+cmdline_parse_token_string_t cmd_operate_detach_port_keyword =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_detach_port_result,
+   keyword, "detach");
+cmdline_parse_token_num_t cmd_operate_detach_port_port_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_operate_detach_port_result,
+   port_id, UINT8);
+
+cmdline_parse_inst_t cmd_operate_detach_port = {
+   .f = cmd_operate_detach_port_parsed,
+   .data = NULL,
+   .help_str = "port detach port_id",
+   .tokens = {
+   (void *)&cmd_operate_detach_port_port,
+   (void *)&cmd_operate_detach_port_keyword,
+   (void

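The `ident` argument of `port attach` is either a PCI address or a virtual device string. A minimal sketch of telling the two apart, assuming the usual `domain:bus:devid.function` notation; `example_parse_pci_addr()` and `struct example_pci_addr` are hypothetical and are not the EAL's own parser:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical PCI address holder using the domain:bus:devid.function layout. */
struct example_pci_addr {
    unsigned int domain;
    unsigned int bus;
    unsigned int devid;
    unsigned int function;
};

/* Returns 1 and fills *addr if ident looks like "DDDD:BB:DD.F",
 * otherwise 0 (the caller would then treat ident as a vdev string). */
static int example_parse_pci_addr(const char *ident,
                                  struct example_pci_addr *addr)
{
    char trailing;
    int n = sscanf(ident, "%x:%x:%x.%x%c", &addr->domain, &addr->bus,
                   &addr->devid, &addr->function, &trailing);

    /* exactly four fields, with nothing left over after the function */
    if (n != 4)
        return 0;
    /* reject values outside the usual PCI field widths */
    return addr->domain <= 0xffff && addr->bus <= 0xff &&
           addr->devid <= 0x1f && addr->function <= 0x7;
}
```

A string that fails this check, such as `eth_pcap0,iface=eth0`, would then be handed over as a virtual device name.
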
[dpdk-dev] [PATCH v7] librte_pmd_pcap: Add port hotplug support

2015-02-09 Thread Tetsuya Mukawa

This patch adds finalization code to free resources allocated by the
PMD.

v6:
 - Fix a parameter of rte_eth_dev_free().
v4:
 - Change function name.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_pmd_pcap/rte_eth_pcap.c | 40 ++
 1 file changed, 40 insertions(+)

diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c b/lib/librte_pmd_pcap/rte_eth_pcap.c
index af7fae8..5f88efd 100644
--- a/lib/librte_pmd_pcap/rte_eth_pcap.c
+++ b/lib/librte_pmd_pcap/rte_eth_pcap.c
@@ -498,6 +498,13 @@ static struct eth_dev_ops ops = {
.stats_reset = eth_stats_reset,
 };

+static struct eth_driver rte_pcap_pmd = {
+   .pci_drv = {
+   .name = "rte_pcap_pmd",
+   .drv_flags = RTE_PCI_DRV_DETACHABLE,
+   },
+};
+
 /*
  * Function handler that opens the pcap file for reading a stores a
  * reference of it for use it later on.
@@ -713,6 +720,10 @@ rte_pmd_init_internals(const char *name, const unsigned nb_rx_queues,
if (*eth_dev == NULL)
goto error;

+   /* check length of device name */
+   if ((strlen((*eth_dev)->data->name) + 1) > sizeof(data->name))
+   goto error;
+
/* now put it all together
 * - store queue data in internals,
 * - store numa_node info in pci_driver
@@ -739,10 +750,13 @@ rte_pmd_init_internals(const char *name, const unsigned nb_rx_queues,
data->nb_tx_queues = (uint16_t)nb_tx_queues;
data->dev_link = pmd_link;
data->mac_addrs = &eth_addr;
+   strncpy(data->name,
+   (*eth_dev)->data->name, strlen((*eth_dev)->data->name));

(*eth_dev)->data = data;
(*eth_dev)->dev_ops = &ops;
(*eth_dev)->pci_dev = pci_dev;
+   (*eth_dev)->driver = &rte_pcap_pmd;

return 0;

@@ -927,10 +941,36 @@ rte_pmd_pcap_devinit(const char *name, const char *params)

 }

+static int
+rte_pmd_pcap_devuninit(const char *name, const char *params __rte_unused)
+{
+   struct rte_eth_dev *eth_dev = NULL;
+
+   RTE_LOG(INFO, PMD, "Closing pcap ethdev on numa socket %u\n",
+   rte_socket_id());
+
+   if (name == NULL)
+   return -1;
+
+   /* find the ethdev entry allocated for this device */
+   eth_dev = rte_eth_dev_allocated(name);
+   if (eth_dev == NULL)
+   return -1;
+
+   rte_free(eth_dev->data->dev_private);
+   rte_free(eth_dev->data);
+   rte_free(eth_dev->pci_dev);
+
+   rte_eth_dev_free(eth_dev);
+
+   return 0;
+}
+
 static struct rte_driver pmd_pcap_drv = {
.name = "eth_pcap",
.type = PMD_VDEV,
.init = rte_pmd_pcap_devinit,
+   .uninit = rte_pmd_pcap_devuninit,
 };

 PMD_REGISTER_DRIVER(pmd_pcap_drv);
-- 
1.9.1

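The uninit path in the pcap patch finds the ethdev by name, frees its resources and releases the slot, while the init path now rejects names that do not fit. A simplified sketch of that pairing over a hypothetical fixed-size port table (`example_port_*` are illustrative names, not the ethdev API). Note that `strncpy(dst, src, strlen(src))`, as used in the hunk, does not write a terminating NUL, so it relies on the destination having been zeroed beforehand; the sketch uses `snprintf()`, which always terminates:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EXAMPLE_MAX_PORTS 4
#define EXAMPLE_NAME_LEN  32

/* Hypothetical slot table standing in for the ethdev array. */
struct example_port {
    int in_use;
    char name[EXAMPLE_NAME_LEN];
};

static struct example_port example_ports[EXAMPLE_MAX_PORTS];

/* Reserve a slot; reject names that would not fit (including the NUL). */
static int example_port_alloc(const char *name)
{
    int i;

    if (name == NULL || strlen(name) + 1 > sizeof(example_ports[0].name))
        return -1;
    for (i = 0; i < EXAMPLE_MAX_PORTS; i++) {
        if (!example_ports[i].in_use) {
            /* snprintf always NUL-terminates the copy */
            snprintf(example_ports[i].name, sizeof(example_ports[i].name),
                     "%s", name);
            example_ports[i].in_use = 1;
            return i;
        }
    }
    return -1;
}

/* Look up an allocated slot by name, as the uninit path does. */
static struct example_port *example_port_find(const char *name)
{
    int i;

    for (i = 0; i < EXAMPLE_MAX_PORTS; i++)
        if (example_ports[i].in_use &&
            strcmp(example_ports[i].name, name) == 0)
            return &example_ports[i];
    return NULL;
}

/* Uninit: find by name, release per-port resources, then free the slot. */
static int example_port_uninit(const char *name)
{
    struct example_port *p;

    if (name == NULL)
        return -1;
    p = example_port_find(name);
    if (p == NULL)
        return -1;
    /* a real driver would free private data and queues here first */
    memset(p, 0, sizeof(*p));
    return 0;
}
```
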
[dpdk-dev] [PATCH v7 14/14] doc: Add port hotplug framework section to programmers guide

2015-02-09 Thread Tetsuya Mukawa

This patch adds a new section for describing port hotplug framework.

Signed-off-by: Tetsuya Mukawa 
---
 doc/guides/prog_guide/index.rst  |   1 +
 doc/guides/prog_guide/port_hotplug_framework.rst | 110 +++
 2 files changed, 111 insertions(+)
 create mode 100644 doc/guides/prog_guide/port_hotplug_framework.rst

diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 8d86dd4..428b76b 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -70,6 +70,7 @@ Programmer's Guide
 packet_classif_access_ctrl
 packet_framework
 vhost_lib
+port_hotplug_framework
 source_org
 dev_kit_build_system
 dev_kit_root_make_help
diff --git a/doc/guides/prog_guide/port_hotplug_framework.rst b/doc/guides/prog_guide/port_hotplug_framework.rst
new file mode 100644
index 0000000..355ae28
--- /dev/null
+++ b/doc/guides/prog_guide/port_hotplug_framework.rst
@@ -0,0 +1,110 @@
+..  BSD LICENSE
+Copyright(c) 2015 IGEL Co.,Ltd. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of IGEL Co.,Ltd. nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Port Hotplug Framework
+======================
+
+The Port Hotplug Framework provides DPDK applications with the ability to
+attach and detach ports at runtime. Because the framework depends on PMD
+implementation, ports that PMDs cannot handle are out of the scope of this
+framework. Furthermore, after a port is detached from a DPDK application, the
+framework doesn't provide a way to remove the device from the system.
+For ports backed by a physical NIC, the kernel will need to support the PCI
+Hotplug feature.
+
+Overview
+--------
+
+The basic requirements of the Port Hotplug Framework are:
+
+*   DPDK applications that use the Port Hotplug Framework must manage their
+    own ports.
+
+The Port Hotplug Framework is implemented to allow DPDK applications to
+manage ports. For example, when DPDK applications call the port attach
+function, the attached port number is returned. DPDK applications can
+also detach the port by port number.
+
+*   Kernel support is needed for attaching or detaching physical device
+    ports.
+
+To attach new physical device ports, the device must first be recognized by
+the userspace driver I/O framework in the kernel. Then DPDK
+applications can call the Port Hotplug functions to attach the ports.
+For detaching, the steps are reversed.
+
+*   Before detaching, ports must be stopped and closed.
+
+DPDK applications must call "rte_eth_dev_stop()" and
+"rte_eth_dev_close()" APIs before detaching ports. These functions will
+start the finalization sequence of the PMDs.
+
+*   The framework doesn't affect the behavior of legacy DPDK applications.
+
+If the Port Hotplug functions aren't called, all legacy DPDK apps can
+still work without modifications.
+
+Port Hotplug API overview
+-------------------------
+
+*   Attaching a port
+
+"rte_eal_dev_attach()" API attaches a port to a DPDK application, and
+returns the attached port number. Before calling the API, the device
+should be recognized by a userspace driver I/O framework. The API
+receives a PCI address like "0000:01:00.0" or a virtual device name
+like "eth_pcap0,iface=eth0". In the case of a virtual device name, the
+format is the same as the general "--vdev" option of DPDK.
+
+*

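The ordering rules in the port hotplug guide (attach, then stop and close before detaching) can be modeled as a small state machine. A sketch with hypothetical state and function names that only illustrates the required call order; it does not call any DPDK API:

```c
#include <assert.h>

/* Hypothetical port states; the names are illustrative, not DPDK's. */
enum example_port_state {
    EX_PORT_DETACHED,
    EX_PORT_STOPPED,   /* attached (or stopped again), not running */
    EX_PORT_STARTED,
    EX_PORT_CLOSED
};

/* Move from `from` to `to`, refusing any call made in the wrong state. */
static int ex_transition(enum example_port_state *s,
                         enum example_port_state from,
                         enum example_port_state to)
{
    if (*s != from)
        return -1;
    *s = to;
    return 0;
}

static int example_attach(enum example_port_state *s)
{
    return ex_transition(s, EX_PORT_DETACHED, EX_PORT_STOPPED);
}

static int example_start(enum example_port_state *s)
{
    return ex_transition(s, EX_PORT_STOPPED, EX_PORT_STARTED);
}

static int example_stop(enum example_port_state *s)
{
    return ex_transition(s, EX_PORT_STARTED, EX_PORT_STOPPED);
}

static int example_close(enum example_port_state *s)
{
    return ex_transition(s, EX_PORT_STOPPED, EX_PORT_CLOSED);
}

/* Detach is only legal once the port has been stopped and closed. */
static int example_detach(enum example_port_state *s)
{
    return ex_transition(s, EX_PORT_CLOSED, EX_PORT_DETACHED);
}
```
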
[dpdk-dev] [PATCH v7 13/14] eal: Enable port hotplug framework in Linux

2015-02-09 Thread Tetsuya Mukawa

The patch enables CONFIG_RTE_LIBRTE_EAL_HOTPLUG in Linux configuration.

Signed-off-by: Tetsuya Mukawa 
---
 config/common_linuxapp | 5 +
 1 file changed, 5 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index d428f84..81055f8 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -114,6 +114,11 @@ CONFIG_RTE_PCI_MAX_READ_REQUEST_SIZE=0
 CONFIG_RTE_LIBRTE_EAL_LINUXAPP=y

 #
+# Compile Environment Abstraction Layer to support hotplug
+#
+CONFIG_RTE_LIBRTE_EAL_HOTPLUG=y
+
+#
 # Compile Environment Abstraction Layer to support Vmware TSC map
 #
 CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
-- 
1.9.1

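Code guarded by a build option, like the `#ifdef ENABLE_HOTPLUG` blocks elsewhere in this series, can keep a stub when the option is off so callers still link and get a clear error. A minimal sketch; `EXAMPLE_ENABLE_HOTPLUG` and `example_dev_detach()` are hypothetical, and how `CONFIG_RTE_LIBRTE_EAL_HOTPLUG` becomes a C macro is not shown in this patch:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical build switch; comment it out to compile the stub instead. */
#define EXAMPLE_ENABLE_HOTPLUG 1

#ifdef EXAMPLE_ENABLE_HOTPLUG
/* Feature compiled in: validate the argument and do the work. */
static int example_dev_detach(int port_id)
{
    if (port_id < 0)
        return -EINVAL;
    /* real detach steps would go here */
    return 0;
}
#else
/* Feature compiled out: keep the symbol but report it as unsupported. */
static int example_dev_detach(int port_id)
{
    (void)port_id;
    return -ENOTSUP;
}
#endif
```
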
[dpdk-dev] [PATCH v7 12/14] eal/pci: Add rte_eal_dev_attach/detach() functions

2015-02-09 Thread Tetsuya Mukawa

These functions are used for attaching or detaching a port.
When rte_eal_dev_attach() is called, the function tries to interpret the
device name as a PCI address. If this succeeds,
rte_eal_dev_attach() attaches a physical device port. If not, it attaches
a virtual device port.
When rte_eal_dev_detach() is called, the function gets the device type
of the port to determine whether it comes from a physical or virtual
device, and then calls the corresponding detach function.

v7:
- Fix typo of warning messages.
  (Thanks to Qiu, Michael)
v5:
- Change function names like below.
  rte_eal_dev_find_and_invoke() to rte_eal_vdev_find_and_invoke().
  rte_eal_dev_invoke() to rte_eal_vdev_invoke().
- Add code to handle a return value of rte_eal_devargs_remove().
- Fix pci address format in rte_eal_dev_detach().
v4:
- Fix comment.
- Add error checking.
- Fix indent of 'if' statement.
- Change function name.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_dev.c  | 274 
 lib/librte_eal/common/eal_private.h |  11 ++
 lib/librte_eal/common/include/rte_dev.h |  33 
 lib/librte_eal/linuxapp/eal/Makefile|   1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   |   6 +-
 5 files changed, 322 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index eae5656..39407c0 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -32,10 +32,13 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

+#include 
+#include 
 #include 
 #include 
 #include 

+#include 
 #include 
 #include 
 #include 
@@ -107,3 +110,274 @@ rte_eal_dev_init(void)
}
return 0;
 }
+
+/* So far, DPDK hotplug function only supports linux */
+#ifdef ENABLE_HOTPLUG
+static void
+rte_eal_vdev_invoke(struct rte_driver *driver,
+   struct rte_devargs *devargs, enum rte_eal_invoke_type type)
+{
+   if ((driver == NULL) || (devargs == NULL))
+   return;
+
+   switch (type) {
+   case RTE_EAL_INVOKE_TYPE_PROBE:
+   driver->init(devargs->virtual.drv_name, devargs->args);
+   break;
+   case RTE_EAL_INVOKE_TYPE_CLOSE:
+   driver->uninit(devargs->virtual.drv_name, devargs->args);
+   break;
+   default:
+   break;
+   }
+}
+
+static int
+rte_eal_vdev_find_and_invoke(const char *name, int type)
+{
+   struct rte_devargs *devargs;
+   struct rte_driver *driver;
+
+   if (name == NULL)
+   return -EINVAL;
+
+   /* call the init function for each virtual device */
+   TAILQ_FOREACH(devargs, &devargs_list, next) {
+
+   if (devargs->type != RTE_DEVTYPE_VIRTUAL)
+   continue;
+
+   if (strncmp(name, devargs->virtual.drv_name, strlen(name)))
+   continue;
+
+   TAILQ_FOREACH(driver, &dev_driver_list, next) {
+   if (driver->type != PMD_VDEV)
+   continue;
+
+   /* search a driver prefix in virtual device name */
+   if (!strncmp(driver->name, devargs->virtual.drv_name,
+   strlen(driver->name))) {
+   rte_eal_vdev_invoke(driver, devargs, type);
+   break;
+   }
+   }
+
+   if (driver == NULL) {
+   RTE_LOG(WARNING, EAL, "no driver found for %s\n",
+ devargs->virtual.drv_name);
+   }
+   return 0;
+   }
+   return 1;
+}
+
+/* attach the new physical device, then store port_id of the device */
+static int
+rte_eal_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id)
+{
+   uint8_t new_port_id;
+   struct rte_eth_dev devs[RTE_MAX_ETHPORTS];
+
+   if ((addr == NULL) || (port_id == NULL))
+   goto err;
+
+   /* save current port status */
+   rte_eth_dev_save(devs);
+   /* re-construct pci_device_list */
+   if (rte_eal_pci_scan())
+   goto err;
+   /* invoke probe func of the driver can handle the new device */
+   if (rte_eal_pci_probe_one(addr))
+   goto err;
+   /* get port_id enabled by above procedures */
+   if (rte_eth_dev_get_changed_port(devs, &new_port_id))
+   goto err;
+
+   *port_id = new_port_id;
+   return 0;
+err:
+   RTE_LOG(ERR, EAL, "Driver, cannot attach the device\n");
+   return -1;
+}
+
+/* detach the new physical device, then store pci_addr of the device */
+static int
+rte_eal_dev_detach_pdev(uint8_t port_id, struct rte_pci_addr *addr)
+{
+   struct rte_pci_addr freed_addr;
+   struct rte_pci_addr vp;
+
+   if (addr == NULL)
+   goto err;
+
+   /* check whether the driver supports detach feature, or not */
+   if

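`rte_eal_dev_attach_pdev()` learns the new port id by saving the port table, probing, and then comparing the table against the saved copy. A minimal sketch of that snapshot-and-diff idea over a hypothetical array of attached flags (`example_attached`, `example_save()` and `example_get_changed_port()` are illustrative, not the ethdev functions):

```c
#include <assert.h>
#include <string.h>

#define EXAMPLE_MAX_PORTS 8

/* Hypothetical per-port "attached" flags standing in for the ethdev table. */
static int example_attached[EXAMPLE_MAX_PORTS];

/* Copy the current flags so a later call can spot what changed. */
static void example_save(int snap[EXAMPLE_MAX_PORTS])
{
    memcpy(snap, example_attached, sizeof(example_attached));
}

/* Find the first port that is attached now but was not in the snapshot.
 * Returns 0 and stores the id, or -1 if nothing new appeared. */
static int example_get_changed_port(const int snap[EXAMPLE_MAX_PORTS],
                                    int *port_id)
{
    int i;

    for (i = 0; i < EXAMPLE_MAX_PORTS; i++) {
        if (example_attached[i] && !snap[i]) {
            *port_id = i;
            return 0;
        }
    }
    return -1;
}
```

The approach assumes nothing else attaches a port between the snapshot and the comparison.
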
[dpdk-dev] [PATCH v7 10/14] eal/pci: Cleanup pci driver initialization code

2015-02-09 Thread Tetsuya Mukawa

- Add rte_eal_pci_close_one_driver()
  The function is used for closing the specified driver and device.
- Add pci_invoke_all_drivers()
  The function is based on pci_probe_all_drivers(), but it can close
  drivers as well as probe them.
- Add pci_close_all_drivers()
  The function tries to find a driver for the specified device, and
  then close the driver.
- Add rte_eal_pci_probe_one() and rte_eal_pci_close_one()
  The functions are used to probe and close a device.
  They first look up the device that has the specified
  PCI address, then probe or close it.

v5:
- Remove RTE_EAL_INVOKE_TYPE_UNKNOWN, because it's unused.
v4:
- Fix parameter checking.
- Fix indent of 'if' statement.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_pci.c  | 90 +
 lib/librte_eal/common/eal_private.h | 24 +
 lib/librte_eal/common/include/rte_pci.h | 33 
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 69 +
 4 files changed, 206 insertions(+), 10 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index a89f5c3..7c9b8c5 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -99,19 +99,27 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
return NULL;
 }

-/*
- * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if initialization
- * failed, return 1 if no driver is found for this device.
- */
 static int
-pci_probe_all_drivers(struct rte_pci_device *dev)
+pci_invoke_all_drivers(struct rte_pci_device *dev,
+   enum rte_eal_invoke_type type)
 {
struct rte_pci_driver *dr = NULL;
-   int rc;
+   int rc = 0;
+
+   if ((dev == NULL) || (type >= RTE_EAL_INVOKE_TYPE_MAX))
+   return -1;

TAILQ_FOREACH(dr, &pci_driver_list, next) {
-   rc = rte_eal_pci_probe_one_driver(dr, dev);
+   switch (type) {
+   case RTE_EAL_INVOKE_TYPE_PROBE:
+   rc = rte_eal_pci_probe_one_driver(dr, dev);
+   break;
+   case RTE_EAL_INVOKE_TYPE_CLOSE:
+   rc = rte_eal_pci_close_one_driver(dr, dev);
+   break;
+   default:
+   return -1;
+   }
if (rc < 0)
/* negative value is an error */
return -1;
@@ -123,6 +131,66 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
return 1;
 }

+#ifdef ENABLE_HOTPLUG
+static int
+rte_eal_pci_invoke_one(struct rte_pci_addr *addr,
+   enum rte_eal_invoke_type type)
+{
+   struct rte_pci_device *dev = NULL;
+   int ret = 0;
+
+   if ((addr == NULL) || (type >= RTE_EAL_INVOKE_TYPE_MAX))
+   return -1;
+
+   TAILQ_FOREACH(dev, &pci_device_list, next) {
+   if (eal_compare_pci_addr(&dev->addr, addr))
+   continue;
+
+   ret = pci_invoke_all_drivers(dev, type);
+   if (ret < 0)
+   goto invoke_err_return;
+
+   if (type == RTE_EAL_INVOKE_TYPE_CLOSE)
+   goto remove_dev;
+
+   return 0;
+   }
+
+   return -1;
+
+invoke_err_return:
+   RTE_LOG(WARNING, EAL, "Requested device " PCI_PRI_FMT
+   " cannot be used\n", dev->addr.domain, dev->addr.bus,
+   dev->addr.devid, dev->addr.function);
+   return -1;
+
+remove_dev:
+   TAILQ_REMOVE(&pci_device_list, dev, next);
+   return 0;
+}
+
+
+/*
+ * Find the pci device specified by pci address, then invoke probe function of
+ * the driver of the device.
+ */
+int
+rte_eal_pci_probe_one(struct rte_pci_addr *addr)
+{
+   return rte_eal_pci_invoke_one(addr, RTE_EAL_INVOKE_TYPE_PROBE);
+}
+
+/*
+ * Find the pci device specified by pci address, then invoke close function of
+ * the driver of the device.
+ */
+int
+rte_eal_pci_close_one(struct rte_pci_addr *addr)
+{
+   return rte_eal_pci_invoke_one(addr, RTE_EAL_INVOKE_TYPE_CLOSE);
+}
+#endif /* ENABLE_HOTPLUG */
+
 /*
  * Scan the content of the PCI bus, and call the devinit() function for
  * all registered drivers that have a matching entry in its id_table
@@ -148,10 +216,12 @@ rte_eal_pci_probe(void)

/* probe all or only whitelisted devices */
if (probe_all)
-   ret = pci_probe_all_drivers(dev);
+   ret = pci_invoke_all_drivers(dev,
+   RTE_EAL_INVOKE_TYPE_PROBE);
else if (devargs != NULL &&
devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
-   ret = pci_probe_all_drivers(dev);
+   ret = pci_invoke_all_drivers(dev,
+

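`pci_invoke_all_drivers()` folds probe and close into one loop selected by an invoke type, keeping the convention from the removed comment: a negative value is an error, and 1 means no driver handled the device. A simplified sketch of that dispatch with a hypothetical driver struct that matches on a vendor id only (none of these names are EAL APIs):

```c
#include <assert.h>
#include <stddef.h>

enum example_invoke_type {
    EXAMPLE_INVOKE_PROBE,
    EXAMPLE_INVOKE_CLOSE,
    EXAMPLE_INVOKE_MAX
};

/* Hypothetical driver: matches one vendor id and counts probe/close calls. */
struct example_driver {
    unsigned int vendor;
    int probed;
    int closed;
};

/* Per-driver hooks: 1 = "not my device", 0 = handled, <0 = error. */
static int example_probe_one(struct example_driver *dr, unsigned int vendor)
{
    if (dr->vendor != vendor)
        return 1;
    dr->probed++;
    return 0;
}

static int example_close_one(struct example_driver *dr, unsigned int vendor)
{
    if (dr->vendor != vendor)
        return 1;
    dr->closed++;
    return 0;
}

/* Walk all drivers with one loop; the invoke type selects the hook. */
static int example_invoke_all(struct example_driver *drs, size_t n,
                              unsigned int vendor,
                              enum example_invoke_type type)
{
    size_t i;
    int rc;

    if (type >= EXAMPLE_INVOKE_MAX)
        return -1;
    for (i = 0; i < n; i++) {
        rc = (type == EXAMPLE_INVOKE_PROBE) ?
            example_probe_one(&drs[i], vendor) :
            example_close_one(&drs[i], vendor);
        if (rc < 0)
            return -1;   /* a driver reported an error */
        if (rc > 0)
            continue;    /* this driver does not handle the device */
        return 0;        /* first matching driver handled it */
    }
    return 1;            /* no driver found for this device */
}
```
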
[dpdk-dev] [PATCH v7 09/14] eal/pci: Add a function to remove the entry of devargs list

2015-02-09 Thread Tetsuya Mukawa

The function removes the specified devargs entry from devargs_list.
Also, the patch adds sanity checking to rte_eal_devargs_add().
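
The find/add/remove logic described here can be sketched with the `TAILQ` macros from `<sys/queue.h>`. This is a simplified, hypothetical version keyed by a device name only: it compares names exactly with `strcmp()` (the patch's `rte_eal_devargs_find()` compares `strlen(args)` bytes with `memcmp()`, which also matches longer names sharing that prefix), and it frees the rejected allocation on a duplicate (in the quoted hunks the duplicate branches return -1 without a visible `free(devargs)`):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical devargs-like entry keyed by a device name. */
struct example_devarg {
    TAILQ_ENTRY(example_devarg) next;
    char name[32];
};

TAILQ_HEAD(example_devarg_list, example_devarg);
static struct example_devarg_list example_list =
    TAILQ_HEAD_INITIALIZER(example_list);

/* Exact-match lookup by name. */
static struct example_devarg *example_find(const char *name)
{
    struct example_devarg *d;

    TAILQ_FOREACH(d, &example_list, next)
        if (strcmp(d->name, name) == 0)
            return d;
    return NULL;
}

/* Add an entry, refusing duplicates; the rejected allocation is freed. */
static int example_add(const char *name)
{
    struct example_devarg *d;

    if (name == NULL || strlen(name) >= sizeof(d->name))
        return -1;
    d = calloc(1, sizeof(*d));
    if (d == NULL)
        return -1;
    strcpy(d->name, name);
    if (example_find(d->name) != NULL) {
        free(d);
        return -1;
    }
    TAILQ_INSERT_TAIL(&example_list, d, next);
    return 0;
}

/* Unlink the entry and free it. */
static int example_remove(const char *name)
{
    struct example_devarg *d = example_find(name);

    if (d == NULL)
        return -1;
    TAILQ_REMOVE(&example_list, d, next);
    free(d);
    return 0;
}
```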

v5:
- Change function definition of rte_eal_devargs_remove().
v4:
- Fix sanity check code.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_devargs.c  | 60 +
 lib/librte_eal/common/include/rte_devargs.h | 21 ++
 2 files changed, 81 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_devargs.c b/lib/librte_eal/common/eal_common_devargs.c
index 4c7d11a..5b1ac8e 100644
--- a/lib/librte_eal/common/eal_common_devargs.c
+++ b/lib/librte_eal/common/eal_common_devargs.c
@@ -44,6 +44,35 @@
 struct rte_devargs_list devargs_list =
TAILQ_HEAD_INITIALIZER(devargs_list);

+
+/* find an entry specified by pci address or device name */
+static struct rte_devargs *
+rte_eal_devargs_find(enum rte_devtype devtype, void *args)
+{
+   struct rte_devargs *devargs;
+
+   if (args == NULL)
+   return NULL;
+
+   TAILQ_FOREACH(devargs, &devargs_list, next) {
+   switch (devtype) {
+   case RTE_DEVTYPE_WHITELISTED_PCI:
+   case RTE_DEVTYPE_BLACKLISTED_PCI:
+   if (eal_compare_pci_addr(&devargs->pci.addr, args) == 0)
+   goto found;
+   break;
+   case RTE_DEVTYPE_VIRTUAL:
+   if (memcmp(&devargs->virtual.drv_name, args,
+   strlen((char *)args)) == 0)
+   goto found;
+   break;
+   }
+   }
+   return NULL;
+found:
+   return devargs;
+}
+
 /* store a whitelist parameter for later parsing */
 int
 rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str)
@@ -87,6 +116,12 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str)
free(devargs);
return -1;
}
+   /* make sure there is no same entry */
+   if (rte_eal_devargs_find(devtype, &devargs->pci.addr)) {
+   RTE_LOG(ERR, EAL,
+   "device already registered: <%s>\n", buf);
+   return -1;
+   }
break;
case RTE_DEVTYPE_VIRTUAL:
/* save driver name */
@@ -98,6 +133,12 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str)
free(devargs);
return -1;
}
+   /* make sure there is no same entry */
+   if (rte_eal_devargs_find(devtype, &devargs->virtual.drv_name)) {
+   RTE_LOG(ERR, EAL,
+   "device already registered: <%s>\n", buf);
+   return -1;
+   }
break;
}

@@ -105,6 +146,25 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str)
return 0;
 }

+/* remove it from the devargs_list */
+int
+rte_eal_devargs_remove(enum rte_devtype devtype, void *args)
+{
+   struct rte_devargs *devargs;
+
+   if (args == NULL)
+   return -EINVAL;
+
+   devargs = rte_eal_devargs_find(devtype, args);
+   if (devargs == NULL) {
+   RTE_LOG(ERR, EAL, "device not found\n");
+   return -ENODEV;
+   }
+
+   TAILQ_REMOVE(&devargs_list, devargs, next);
+   return 0;
+}
+
 /* count the number of devices of a specified type */
 unsigned int
 rte_eal_devargs_type_count(enum rte_devtype devtype)
diff --git a/lib/librte_eal/common/include/rte_devargs.h b/lib/librte_eal/common/include/rte_devargs.h
index 9f9c98f..6d9763b 100644
--- a/lib/librte_eal/common/include/rte_devargs.h
+++ b/lib/librte_eal/common/include/rte_devargs.h
@@ -123,6 +123,27 @@ extern struct rte_devargs_list devargs_list;
 int rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str);

 /**
+ * Remove a device from the user device list
+ *
+ * For PCI devices, *args* points to the struct rte_pci_addr of the device,
+ * for example the address "08:00.1". It shouldn't include parameters for
+ * the device.
+ *
+ * For virtual devices, *args* is the driver name string. It shouldn't
+ * include parameters for the device. Example: "eth_ring". The validity of
+ * the driver name is not checked by this function; it is checked when the
+ * drivers are closed.
+ *
+ * @param devtype
+ *   The type of the device.
+ * @param args
+ *   The PCI address or driver name of the device, as described above.
+ *
+ * @return
+ *   - 0 on success, negative on error
+ */
+int rte_eal_devargs_remove(enum rte_devtype devtype, void *args);
+
+/**
  * Count the number of user devices of a specified type
  *
  * @param devtype
-- 
1.9.1
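A minimal, self-contained sketch of the find/add/remove pattern this patch introduces, using the BSD `<sys/queue.h>` TAILQ macros that the devargs list is built on. The `toy_*` names and the name-only entry are hypothetical stand-ins for `struct rte_devargs`; this is an illustration, not the EAL code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct rte_devargs: only a name is kept. */
struct toy_devargs {
	TAILQ_ENTRY(toy_devargs) next;
	char name[32];
};

TAILQ_HEAD(toy_devargs_list, toy_devargs);

static struct toy_devargs_list toy_list = TAILQ_HEAD_INITIALIZER(toy_list);

/* Return the entry whose name matches exactly, or NULL. */
static struct toy_devargs *toy_find(const char *name)
{
	struct toy_devargs *d;

	if (name == NULL)
		return NULL;
	TAILQ_FOREACH(d, &toy_list, next) {
		if (strcmp(d->name, name) == 0)
			return d;
	}
	return NULL;
}

/* Append an entry, rejecting duplicates (the check added to
 * rte_eal_devargs_add()). */
static int toy_add(const char *name)
{
	struct toy_devargs *d;

	if (name == NULL || toy_find(name) != NULL)
		return -1;
	d = calloc(1, sizeof(*d));
	if (d == NULL)
		return -1;
	snprintf(d->name, sizeof(d->name), "%s", name);
	TAILQ_INSERT_TAIL(&toy_list, d, next);
	return 0;
}

/* Unlink the entry and release it, since this list owns the allocation. */
static int toy_remove(const char *name)
{
	struct toy_devargs *d = toy_find(name);

	if (d == NULL)
		return -1;
	TAILQ_REMOVE(&toy_list, d, next);
	free(d);
	return 0;
}
```

The sketch uses strcmp() for an exact match. A memcmp() bounded by strlen(args), as in the patch, also matches any entry whose name merely begins with args (for example, "eth_ring" would match "eth_ring0"). The sketch also frees the unlinked entry; the patch's rte_eal_devargs_remove() only unlinks it.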

[dpdk-dev] [PATCH v7 07/14] ethdev: Add functions that will be used by port hotplug functions

2015-02-09 Thread Tetsuya Mukawa

The patch adds the following functions.

- rte_eth_dev_save()
  The function saves the current rte_eth_dev structures.
- rte_eth_dev_get_changed_port()
  The function receives the saved rte_eth_dev structures, then compares
  them with the current values to find out which port was actually
  attached or detached.
- rte_eth_dev_get_addr_by_port()
  The function returns the PCI address of the ethdev specified by a port
  identifier.
- rte_eth_dev_get_port_by_addr()
  The function returns the port identifier of the ethdev specified by a
  PCI address.
- rte_eth_dev_get_name_by_port()
  The function returns the unique identifier name of the ethdev
  specified by a port identifier.
- rte_eth_dev_check_detachable()
  The function returns whether the PMD of a port supports detaching.

Also, the patch makes rte_eth_dev_allocated() global, because virtual
PMDs will call it to support port hotplug.

v7:
- Add pt_driver checking to rte_eth_dev_check_detachable().
  (Thanks to Qiu, Michael)
v5:
- Fix the return values of the following functions:
  rte_eth_dev_get_changed_port().
  rte_eth_dev_get_port_by_addr().
v4:
- Add parameter checking.
v3:
- Fix if-condition bug while comparing pci addresses.
- Add error checking codes.
Reported-by: Mark Enright 

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c | 109 +-
 lib/librte_ether/rte_ethdev.h |  80 +++
 2 files changed, 188 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 7bed901..14a040a 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -206,7 +206,7 @@ rte_eth_dev_data_alloc(void)
RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data));
 }

-static struct rte_eth_dev *
+struct rte_eth_dev *
 rte_eth_dev_allocated(const char *name)
 {
unsigned i;
@@ -426,6 +426,113 @@ rte_eth_dev_count(void)
return (nb_ports);
 }

+void
+rte_eth_dev_save(struct rte_eth_dev *devs)
+{
+   if (devs == NULL)
+   return;
+
+   /* save current rte_eth_devices */
+   memcpy(devs, rte_eth_devices,
+   sizeof(struct rte_eth_dev) * RTE_MAX_ETHPORTS);
+}
+
+int
+rte_eth_dev_get_changed_port(struct rte_eth_dev *devs, uint8_t *port_id)
+{
+   if ((devs == NULL) || (port_id == NULL))
+   return -EINVAL;
+
+   /* check which port was attached or detached */
+   for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++, devs++) {
+   if (rte_eth_devices[*port_id].attached ^ devs->attached)
+   return 0;
+   }
+   return -ENODEV;
+}
+
+int
+rte_eth_dev_get_addr_by_port(uint8_t port_id, struct rte_pci_addr *addr)
+{
+   if (rte_eth_dev_validate_port(port_id, TRACE) == DEV_INVALID)
+   return -EINVAL;
+
+   if (addr == NULL) {
+   PMD_DEBUG_TRACE("Null pointer is specified\n");
+   return -EINVAL;
+   }
+
+   *addr = rte_eth_devices[port_id].pci_dev->addr;
+   return 0;
+}
+
+int
+rte_eth_dev_get_port_by_addr(struct rte_pci_addr *addr, uint8_t *port_id)
+{
+   struct rte_pci_addr *tmp;
+
+   if ((addr == NULL) || (port_id == NULL)) {
+   PMD_DEBUG_TRACE("Null pointer is specified\n");
+   return -EINVAL;
+   }
+
+   for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++) {
+   if (!rte_eth_devices[*port_id].attached)
+   continue;
+   if (!rte_eth_devices[*port_id].pci_dev)
+   continue;
+   tmp = &rte_eth_devices[*port_id].pci_dev->addr;
+   if (eal_compare_pci_addr(tmp, addr) == 0)
+   return 0;
+   }
+   return -ENODEV;
+}
+
+int
+rte_eth_dev_get_name_by_port(uint8_t port_id, char *name)
+{
+   char *tmp;
+
+   if (rte_eth_dev_validate_port(port_id, TRACE) == DEV_INVALID)
+   return -EINVAL;
+
+   if (name == NULL) {
+   PMD_DEBUG_TRACE("Null pointer is specified\n");
+   return -EINVAL;
+   }
+
+   /* shouldn't check 'rte_eth_devices[i].data',
+* because it might be overwritten by VDEV PMD */
+   tmp = rte_eth_dev_data[port_id].name;
+   strncpy(name, tmp, strlen(tmp) + 1);
+   return 0;
+}
+
+int
+rte_eth_dev_check_detachable(uint8_t port_id)
+{
+   uint32_t drv_flags;
+
+   if (port_id >= RTE_MAX_ETHPORTS) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -EINVAL;
+   }
+
+   if (rte_eth_devices[port_id].dev_type == RTE_ETH_DEV_PHYSICAL) {
+   switch (rte_eth_devices[port_id].pci_dev->pt_driver) {
+   case RTE_PT_IGB_UIO:
+   case RTE_PT_UIO_GENERIC:
+   break;
+   case RTE_PT_VFIO:
+   default:
+   return

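A small sketch of the snapshot-and-compare idea behind rte_eth_dev_save() and rte_eth_dev_get_changed_port(): copy the port table before an attach or detach, then scan for the first slot whose attached flag changed. `TOY_MAX_PORTS`, `struct toy_port` and the `toy_*` functions are hypothetical; only the `attached` flag is modelled.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_PORTS 8 /* hypothetical; stands in for RTE_MAX_ETHPORTS */

/* Hypothetical per-port slot; only the 'attached' flag matters here. */
struct toy_port {
	uint8_t attached;
};

static struct toy_port toy_ports[TOY_MAX_PORTS];

/* Copy the whole table before an attach/detach operation. */
static void toy_save(struct toy_port *snapshot)
{
	if (snapshot != NULL)
		memcpy(snapshot, toy_ports, sizeof(toy_ports));
}

/* Report the first port whose attached state differs from the snapshot. */
static int toy_get_changed_port(const struct toy_port *snapshot,
				unsigned int *port_id)
{
	unsigned int i;

	if (snapshot == NULL || port_id == NULL)
		return -EINVAL;
	for (i = 0; i < TOY_MAX_PORTS; i++) {
		if (toy_ports[i].attached != snapshot[i].attached) {
			*port_id = i;
			return 0;
		}
	}
	return -ENODEV;
}
```

The loop counter here is an `unsigned int` rather than the `uint8_t` used in the patch; with a `uint8_t` counter the loop condition could never become false if RTE_MAX_ETHPORTS were configured to 256.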
[dpdk-dev] [PATCH v7 06/14] eal, ethdev: Add a function and function pointers to close ether device

2015-02-09 Thread Tetsuya Mukawa

The patch adds function pointers to the rte_pci_driver and eth_driver
structures. These function pointers are used when ports are detached.
The patch also adds rte_eth_dev_uninit(). So far it is not called
anywhere, but it will be called once the port hotplug function is
implemented.

v6:
- Fix rte_eth_dev_uninit() to handle a return value of uninit
  function of PMD.
v4:
- Add parameter checking.
- Change function names.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/include/rte_pci.h |  7 +
 lib/librte_ether/rte_ethdev.c   | 47 +
 lib/librte_ether/rte_ethdev.h   | 24 +
 3 files changed, 78 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 4814cd7..87ca4cf 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -189,12 +189,19 @@ struct rte_pci_driver;
 typedef int (pci_devinit_t)(struct rte_pci_driver *, struct rte_pci_device *);

 /**
+ * Uninitialisation function for the driver called during hotplugging.
+ */
+typedef int (pci_devuninit_t)(
+   struct rte_pci_driver *, struct rte_pci_device *);
+
+/**
  * A structure describing a PCI driver.
  */
 struct rte_pci_driver {
TAILQ_ENTRY(rte_pci_driver) next;   /**< Next in list. */
const char *name;   /**< Driver name. */
pci_devinit_t *devinit; /**< Device init. function. */
+   pci_devuninit_t *devuninit; /**< Device uninit function. */
struct rte_pci_id *id_table;/**< ID table, NULL terminated. */
uint32_t drv_flags; /**< Flags contolling handling of device. */
 };
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index b58bab3..7bed901 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -326,6 +326,52 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
return diag;
 }

+static int
+rte_eth_dev_uninit(struct rte_pci_driver *pci_drv,
+struct rte_pci_device *pci_dev)
+{
+   struct eth_driver *eth_drv;
+   struct rte_eth_dev *eth_dev;
+   char ethdev_name[RTE_ETH_NAME_MAX_LEN];
+   int ret;
+
+   if ((pci_drv == NULL) || (pci_dev == NULL))
+   return -EINVAL;
+
+   /* Create unique Ethernet device name using PCI address */
+   snprintf(ethdev_name, RTE_ETH_NAME_MAX_LEN, "%d:%d.%d",
+   pci_dev->addr.bus, pci_dev->addr.devid,
+   pci_dev->addr.function);
+
+   eth_dev = rte_eth_dev_allocated(ethdev_name);
+   if (eth_dev == NULL)
+   return -ENODEV;
+
+   eth_drv = (struct eth_driver *)pci_drv;
+
+   /* Invoke PMD device uninit function */
+   if (*eth_drv->eth_dev_uninit) {
+   ret = (*eth_drv->eth_dev_uninit)(eth_drv, eth_dev);
+   if (ret)
+   return ret;
+   }
+
+   /* free ether device */
+   rte_eth_dev_free(eth_dev);
+
+   /* init user callbacks */
+   TAILQ_INIT(&(eth_dev->callbacks));
+
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+   rte_free(eth_dev->data->dev_private);
+
+   eth_dev->pci_dev = NULL;
+   eth_dev->driver = NULL;
+   eth_dev->data = NULL;
+
+   return 0;
+}
+
 /**
  * Register an Ethernet [Poll Mode] driver.
  *
@@ -344,6 +390,7 @@ void
 rte_eth_driver_register(struct eth_driver *eth_drv)
 {
eth_drv->pci_drv.devinit = rte_eth_dev_init;
+   eth_drv->pci_drv.devuninit = rte_eth_dev_uninit;
+   rte_eal_pci_register(&eth_drv->pci_drv);
 }

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index fbe7ac1..91d9e86 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1678,6 +1678,27 @@ typedef int (*eth_dev_init_t)(struct eth_driver  *eth_drv,

 /**
  * @internal
+ * Finalization function of an Ethernet driver invoked for each matching
+ * Ethernet PCI device detected during the PCI closing phase.
+ *
+ * @param eth_drv
+ *   The pointer to the [matching] Ethernet driver structure supplied by
+ *   the PMD when it registered itself.
+ * @param eth_dev
+ *   The *eth_dev* pointer is the address of the *rte_eth_dev* structure
+ *   associated with the matching device and which has been [automatically]
+ *   allocated in the *rte_eth_devices* array.
+ * @return
+ *   - 0: Success, the device is properly finalized by the driver.
+ *        In particular, the driver MUST free the *dev_ops* pointer
+ *        of the *eth_dev* structure.
+ *   - <0: Error code of the device finalization failure.
+ */
+typedef int (*eth_dev_uninit_t)(struct eth_driver  *eth_drv,
+ struct rte_eth_dev *eth_dev);
+
+/**
+ * @internal
  * The structure associated with a PMD Ethernet driver.
  *
  * Each Ethernet driver acts as a PCI driver and is

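A minimal sketch of the ordering rte_eth_dev_uninit() follows: call the driver's optional uninit hook first, and release the generic per-port state only if that hook succeeds, so a failing PMD leaves the port attached (the v6 change). `struct toy_driver`, `struct toy_dev` and `toy_dev_uninit()` are hypothetical simplifications, not the ethdev structures.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down device and driver descriptors. The driver has an
 * init/uninit hook pair, like devinit/devuninit in struct rte_pci_driver. */
struct toy_dev {
	int attached;
	void *priv;
};

struct toy_driver {
	int (*devinit)(struct toy_dev *dev);
	int (*devuninit)(struct toy_dev *dev); /* may be NULL */
};

/* Tear a device down: release generic state only if the driver's own
 * uninit hook (when present) succeeded. */
static int toy_dev_uninit(const struct toy_driver *drv, struct toy_dev *dev)
{
	int ret;

	if (drv == NULL || dev == NULL)
		return -EINVAL;
	if (!dev->attached)
		return -ENODEV;
	if (drv->devuninit != NULL) {
		ret = drv->devuninit(dev);
		if (ret != 0)
			return ret; /* leave the port untouched */
	}
	dev->priv = NULL;
	dev->attached = 0;
	return 0;
}
```

In the patch, `if (*eth_drv->eth_dev_uninit)` dereferences a function pointer; in C the result converts straight back to a pointer, so it behaves like the plain NULL check written here.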
[dpdk-dev] [PATCH v7 04/14] eal/pci: Consolidate pci address comparison APIs

2015-02-09 Thread Tetsuya Mukawa

This patch replaces pci_addr_comparison() and memcmp() of PCI addresses with
eal_compare_pci_addr().

v5:
- Fix pci_scan_one to handle pt_driver correctly.
v4:
- Fix calculation method of eal_compare_pci_addr().
- Add parameter checking.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c   | 25 ---
 lib/librte_eal/common/eal_common_pci.c|  2 +-
 lib/librte_eal/common/include/rte_pci.h   | 34 +++
 lib/librte_eal/linuxapp/eal/eal_pci.c | 25 ---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  2 +-
 5 files changed, 54 insertions(+), 34 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 74ecce7..c844d58 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -270,20 +270,6 @@ pci_uio_map_resource(struct rte_pci_device *dev)
return (0);
 }

-/* Compare two PCI device addresses. */
-static int
-pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
-{
-   uint64_t dev_addr = (addr->domain << 24) + (addr->bus << 16) + (addr->devid << 8) + addr->function;
-   uint64_t dev_addr2 = (addr2->domain << 24) + (addr2->bus << 16) + (addr2->devid << 8) + addr2->function;
-
-   if (dev_addr > dev_addr2)
-   return 1;
-   else
-   return 0;
-}
-
-
 /* Scan one pci sysfs entry, and fill the devices list from it. */
 static int
 pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
@@ -356,13 +342,20 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
}
else {
struct rte_pci_device *dev2 = NULL;
+   int ret;

TAILQ_FOREACH(dev2, _device_list, next) {
-   if (pci_addr_comparison(&dev->addr, &dev2->addr))
+   ret = eal_compare_pci_addr(&dev->addr, &dev2->addr);
+   if (ret > 0)
continue;
-   else {
+   else if (ret < 0) {
TAILQ_INSERT_BEFORE(dev2, dev, next);
return 0;
+   } else { /* already registered */
+   /* update pt_driver */
+   dev2->pt_driver = dev->pt_driver;
+   free(dev);
+   return 0;
}
}
TAILQ_INSERT_TAIL(&pci_device_list, dev, next);
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index f3c7f71..a89f5c3 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -93,7 +93,7 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
if (devargs->type != RTE_DEVTYPE_BLACKLISTED_PCI &&
devargs->type != RTE_DEVTYPE_WHITELISTED_PCI)
continue;
-   if (!memcmp(&dev->addr, &devargs->pci.addr, sizeof(dev->addr)))
+   if (!eal_compare_pci_addr(&dev->addr, &devargs->pci.addr))
return devargs;
}
return NULL;
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 7f2d699..4814cd7 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -269,6 +269,40 @@ eal_parse_pci_DomBDF(const char *input, struct rte_pci_addr *dev_addr)
 }
 #undef GET_PCIADDR_FIELD

+/* Compare two PCI device addresses. */
+/**
+ * Utility function to compare two PCI device addresses.
+ *
+ * @param addr
+ * The PCI Bus-Device-Function address to compare
+ * @param addr2
+ * The PCI Bus-Device-Function address to compare
+ * @return
+ * 0 if the PCI addresses are equal.
+ * Positive if addr is greater than addr2.
+ * Negative if addr is less than addr2, or on error.
+ */
+static inline int
+eal_compare_pci_addr(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
+{
+   uint64_t dev_addr, dev_addr2;
+
+   if ((addr == NULL) || (addr2 == NULL))
+   return -1;
+
+   dev_addr = (addr->domain << 24) | (addr->bus << 16) |
+   (addr->devid << 8) | addr->function;
+   dev_addr2 = (addr2->domain << 24) | (addr2->bus << 16) |
+   (addr2->devid << 8) | addr2->function;
+
+   if (dev_addr > dev_addr2)
+   return 1;
+   else if (dev_addr < dev_addr2)
+   return -1;
+   else
+   return 0;
+}
+
 /**
  * Probe the PCI bus for registered drivers.
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index c0ca5a5..d847102 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -229,20 +229,6 @@ error:
return -1;
 }

-/* Compare two PCI device addresses. */
-static int

[dpdk-dev] [PATCH v7 03/14] eal/pci, ethdev: Remove assumption that port will not be detached

2015-02-09 Thread Tetsuya Mukawa

This patch removes the assumption that a port will never be detached.

It adds "RTE_PCI_DRV_DETACHABLE" to the drv_flags of the rte_pci_driver
structure. The flag indicates that the driver can detach devices at
runtime.

To remove the assumption, the patch also:
- Adds an 'attached' member to the rte_eth_dev structure.
  This member indicates whether the port is attached or not.
- Adds rte_eth_dev_allocate_new_port().
  This function allocates a new port.

v5:
- Change parameters of rte_eth_dev_validate_port() to cleanup code.
v4:
- Use braces with 'for' loop.
- Fix indent of 'if' statement.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/include/rte_pci.h |   2 +
 lib/librte_ether/rte_ethdev.c   | 454 +---
 lib/librte_ether/rte_ethdev.h   |   5 +
 3 files changed, 186 insertions(+), 275 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 7b48b55..7f2d699 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -207,6 +207,8 @@ struct rte_pci_driver {
 #define RTE_PCI_DRV_FORCE_UNBIND 0x0004
 /** Device driver supports link state interrupt */
 #define RTE_PCI_DRV_INTR_LSC   0x0008
+/** Device driver supports detaching capability */
+#define RTE_PCI_DRV_DETACHABLE 0x0010

 /**< Internal use only - Macro used by pci addr parsing functions **/
 #define GET_PCIADDR_FIELD(in, fd, lim, dlm)   \
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ea3a1fb..d70854f 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -175,6 +175,16 @@ enum {
STAT_QMAP_RX
 };

+enum {
+   DEV_INVALID = 0,
+   DEV_VALID,
+};
+
+enum {
+   DEV_DISCONNECTED = 0,
+   DEV_CONNECTED
+};
+
 static inline void
 rte_eth_dev_data_alloc(void)
 {
@@ -201,19 +211,34 @@ rte_eth_dev_allocated(const char *name)
 {
unsigned i;

-   for (i = 0; i < nb_ports; i++) {
-   if (strcmp(rte_eth_devices[i].data->name, name) == 0)
+   for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
+   if ((rte_eth_devices[i].attached == DEV_CONNECTED) &&
+   strcmp(rte_eth_devices[i].data->name, name) == 0)
return &rte_eth_devices[i];
}
return NULL;
 }

+static uint8_t
+rte_eth_dev_allocate_new_port(void)
+{
+   unsigned i;
+
+   for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
+   if (rte_eth_devices[i].attached == DEV_DISCONNECTED)
+   return i;
+   }
+   return RTE_MAX_ETHPORTS;
+}
+
 struct rte_eth_dev *
 rte_eth_dev_allocate(const char *name)
 {
+   uint8_t port_id;
struct rte_eth_dev *eth_dev;

-   if (nb_ports == RTE_MAX_ETHPORTS) {
+   port_id = rte_eth_dev_allocate_new_port();
+   if (port_id == RTE_MAX_ETHPORTS) {
PMD_DEBUG_TRACE("Reached maximum number of Ethernet ports\n");
return NULL;
}
@@ -226,10 +251,12 @@ rte_eth_dev_allocate(const char *name)
return NULL;
}

-   eth_dev = &rte_eth_devices[nb_ports];
-   eth_dev->data = &rte_eth_dev_data[nb_ports];
+   eth_dev = &rte_eth_devices[port_id];
+   eth_dev->data = &rte_eth_dev_data[port_id];
snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
-   eth_dev->data->port_id = nb_ports++;
+   eth_dev->data->port_id = port_id;
+   eth_dev->attached = DEV_CONNECTED;
+   nb_ports++;
return eth_dev;
 }

@@ -283,6 +310,7 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
(unsigned) pci_dev->id.device_id);
if (rte_eal_process_type() == RTE_PROC_PRIMARY)
rte_free(eth_dev->data->dev_private);
+   eth_dev->attached = DEV_DISCONNECTED;
nb_ports--;
return diag;
 }
@@ -308,10 +336,28 @@ rte_eth_driver_register(struct eth_driver *eth_drv)
rte_eal_pci_register(&eth_drv->pci_drv);
 }

+enum {
+   NONE_TRACE = 0,
+   TRACE
+};
+
+static int
+rte_eth_dev_validate_port(uint8_t port_id, int trace)
+{
+   if (port_id >= RTE_MAX_ETHPORTS ||
+   rte_eth_devices[port_id].attached != DEV_CONNECTED) {
+   if (trace) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   }
+   return DEV_INVALID;
+   } else
+   return DEV_VALID;
+}
+
 int
 rte_eth_dev_socket_id(uint8_t port_id)
 {
-   if (port_id >= nb_ports)
+   if (rte_eth_dev_validate_port(port_id, NONE_TRACE) == DEV_INVALID)
return -1;
return rte_eth_devices[port_id].pci_dev->numa_node;
 }
@@ -369,10 +415,8 @@ rte_eth_dev_rx_queue_start(uint8_t port_id, uint16_t rx_queue_id)
 * in a multi-process setup*/
PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);

-   if (port_id >= nb_ports) {
-   PMD_DEBUG_TRACE("Invalid port_id=%d\n",

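Once ports can be detached, port ids can have holes, so `port_id < nb_ports` is no longer a valid range check. A minimal sketch of the first-fit allocation done by rte_eth_dev_allocate_new_port() and the attached-flag check done by rte_eth_dev_validate_port(); the array, the counter and the `toy_*` functions are hypothetical.

```c
#include <stdint.h>

#define TOY_MAX_PORTS 4 /* hypothetical; stands in for RTE_MAX_ETHPORTS */

enum { TOY_DISCONNECTED = 0, TOY_CONNECTED };

static uint8_t toy_attached[TOY_MAX_PORTS];
static unsigned int toy_nb_ports;

/* First-fit search: returns a free port id, or TOY_MAX_PORTS if full. */
static unsigned int toy_allocate_new_port(void)
{
	unsigned int i;

	for (i = 0; i < TOY_MAX_PORTS; i++) {
		if (toy_attached[i] == TOY_DISCONNECTED)
			return i;
	}
	return TOY_MAX_PORTS;
}

/* A port id is valid only if it is in range AND currently attached. */
static int toy_validate_port(unsigned int port_id)
{
	return port_id < TOY_MAX_PORTS && toy_attached[port_id] == TOY_CONNECTED;
}

static int toy_attach(void)
{
	unsigned int id = toy_allocate_new_port();

	if (id == TOY_MAX_PORTS)
		return -1;
	toy_attached[id] = TOY_CONNECTED;
	toy_nb_ports++;
	return (int)id;
}

static void toy_detach(unsigned int id)
{
	if (toy_validate_port(id)) {
		toy_attached[id] = TOY_DISCONNECTED;
		toy_nb_ports--;
	}
}
```

After attaching ports 0, 1 and 2 and detaching port 1, `toy_nb_ports` is 2 but port 2 is still valid, and the next attach reuses the hole at id 1.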
[dpdk-dev] [PATCH v7 01/14] eal_pci: Add flag to hold kernel driver type

2015-02-09 Thread Tetsuya Mukawa

From: Michael Qiu 

Currently, DPDK has no way to know which type of kernel driver
(vfio-pci, igb_uio or uio_pci_generic) a device is bound to. It can
only check statically whether VFIO is enabled or not.

It is really useful to have this flag, because different driver types
need to be handled differently at runtime, for example for PCI memory
mapping, port hotplug, and so on.

This patch adds a flag field to the PCI device structure to solve the
above issue.

Signed-off-by: Michael Qiu 
Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/include/rte_pci.h |  8 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 53 +++--
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 66ed793..7b48b55 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -139,6 +139,13 @@ struct rte_pci_addr {

 struct rte_devargs;

+enum rte_pt_driver {
+   RTE_PT_UNKNOWN  = 0,
+   RTE_PT_IGB_UIO  = 1,
+   RTE_PT_VFIO = 2,
+   RTE_PT_UIO_GENERIC  = 3,
+};
+
 /**
  * A structure describing a PCI device.
  */
@@ -152,6 +159,7 @@ struct rte_pci_device {
uint16_t max_vfs;   /**< sriov enable if not zero */
int numa_node;  /**< NUMA node connection */
struct rte_devargs *devargs;/**< Device user arguments */
+   enum rte_pt_driver pt_driver;   /**< Driver of passthrough */
 };

 /** Any PCI device identifier (vendor, device, ...) */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index b5f5410..bd3f77d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -97,6 +97,35 @@ error:
return -1;
 }

+static int
+pci_get_kernel_driver_by_path(const char *filename, char *dri_name)
+{
+   int count;
+   char path[PATH_MAX];
+   char *name;
+
+   if (!filename || !dri_name)
+   return -1;
+
+   count = readlink(filename, path, PATH_MAX);
+   if (count >= PATH_MAX)
+   return -1;
+
+   /* For device does not have a driver */
+   if (count < 0)
+   return 1;
+
+   path[count] = '\0';
+
+   name = strrchr(path, '/');
+   if (name) {
+   strncpy(dri_name, name + 1, strlen(name + 1) + 1);
+   return 0;
+   }
+
+   return -1;
+}
+
 void *
 pci_find_max_end_va(void)
 {
@@ -222,11 +251,12 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t bus,
char filename[PATH_MAX];
unsigned long tmp;
struct rte_pci_device *dev;
+   char driver[PATH_MAX];
+   int ret;

dev = malloc(sizeof(*dev));
-   if (dev == NULL) {
+   if (dev == NULL)
return -1;
-   }

memset(dev, 0, sizeof(*dev));
dev->addr.domain = domain;
@@ -298,6 +328,25 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t bus,
return -1;
}

+   /* parse driver */
+   snprintf(filename, sizeof(filename), "%s/driver", dirname);
+   ret = pci_get_kernel_driver_by_path(filename, driver);
+   if (!ret) {
+   if (!strcmp(driver, "vfio-pci"))
+   dev->pt_driver = RTE_PT_VFIO;
+   else if (!strcmp(driver, "igb_uio"))
+   dev->pt_driver = RTE_PT_IGB_UIO;
+   else if (!strcmp(driver, "uio_pci_generic"))
+   dev->pt_driver = RTE_PT_UIO_GENERIC;
+   else
+   dev->pt_driver = RTE_PT_UNKNOWN;
+   } else if (ret < 0) {
+   RTE_LOG(ERR, EAL, "Fail to get kernel driver\n");
+   free(dev);
+   return -1;
+   } else
+   dev->pt_driver = RTE_PT_UNKNOWN;
+
/* device is valid, add in list (sorted) */
if (TAILQ_EMPTY(&pci_device_list)) {
TAILQ_INSERT_TAIL(&pci_device_list, dev, next);
-- 
1.9.1
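A self-contained sketch of how the kernel driver bound to a PCI device can be read from its sysfs `driver` symlink, as pci_get_kernel_driver_by_path() does: call readlink(), add the NUL terminator that readlink() does not write, then take the component after the last '/'. The enum and function names are hypothetical mirrors of `enum rte_pt_driver`; as in the patch, any readlink() failure is treated as "no driver bound".

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical mirror of enum rte_pt_driver from the patch. */
enum toy_pt_driver {
	TOY_PT_UNKNOWN = 0,
	TOY_PT_IGB_UIO,
	TOY_PT_VFIO,
	TOY_PT_UIO_GENERIC,
};

/* Resolve a sysfs-style "driver" symlink and map its last path component
 * to a driver type. Returns 1 if there is no link (no driver bound),
 * 0 on success, -1 on bad arguments. */
static int toy_get_pt_driver(const char *link_path, enum toy_pt_driver *out)
{
	char path[PATH_MAX];
	const char *name;
	ssize_t n;

	if (link_path == NULL || out == NULL)
		return -1;
	n = readlink(link_path, path, sizeof(path) - 1);
	if (n < 0) {
		*out = TOY_PT_UNKNOWN; /* no symlink: device has no driver */
		return 1;
	}
	path[n] = '\0'; /* readlink() does not NUL-terminate */
	name = strrchr(path, '/');
	name = (name != NULL) ? name + 1 : path;

	if (strcmp(name, "vfio-pci") == 0)
		*out = TOY_PT_VFIO;
	else if (strcmp(name, "igb_uio") == 0)
		*out = TOY_PT_IGB_UIO;
	else if (strcmp(name, "uio_pci_generic") == 0)
		*out = TOY_PT_UIO_GENERIC;
	else
		*out = TOY_PT_UNKNOWN;
	return 0;
}
```

On a real system the link would be `/sys/bus/pci/devices/<PCI address>/driver`. Passing `sizeof(path) - 1` to readlink() always leaves room for the terminator; a result equal to that length may mean the target was truncated.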

[dpdk-dev] [PATCH v7 00/14] Port Hotplug Framework

2015-02-09 Thread Tetsuya Mukawa

This patch series adds a dynamic port hotplug framework to DPDK.
With the patches, DPDK apps can attach or detach ports at runtime.

The basic concept of port hotplug is as follows.
- DPDK apps are responsible for managing ports.
  Only the DPDK app knows which ports are attached or detached at any moment.
  The port hotplug framework is implemented to allow DPDK apps to manage ports.
  For example, when a DPDK app calls the port attach function, the number of
  the attached port is returned. A DPDK app can also detach a port by its
  port number.
- Kernel support is needed for attaching or detaching physical device ports.
  To attach a new physical device port, the device must first be recognized
  by a userspace direct I/O framework in the kernel. Then DPDK apps can call
  the port hotplug functions to attach the port.
  Detaching follows the same steps in reverse order.
- Ports must be stopped and closed before they are detached.
  A DPDK application must call rte_eth_dev_stop() and rte_eth_dev_close()
  before detaching a port. These functions call the finalization code of the
  PMD. But so far, no PMD frees all the resources allocated during
  initialization, which means PMDs need to be fixed to support port hotplug.
  'RTE_PCI_DRV_DETACHABLE' is a new flag indicating that a PMD supports
  detaching. Without this flag, detaching fails.
- Legacy DPDK apps must not be affected.
  EAL behavior is unchanged if the port hotplug functions aren't called,
  so all legacy DPDK apps still work without modification.
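A toy model, under the rules listed above, of when a detach request may succeed: the port must be stopped and then closed, and its driver must advertise the detachable flag. The states and the `toy_*` functions are hypothetical; only the flag value 0x0010 is taken from the series.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_DRV_DETACHABLE 0x0010 /* value the series gives RTE_PCI_DRV_DETACHABLE */

/* Hypothetical port life-cycle states, for illustration only. */
enum toy_state { TOY_STARTED, TOY_STOPPED, TOY_CLOSED, TOY_DETACHED };

struct toy_port {
	enum toy_state state;
	uint32_t drv_flags;
};

static void toy_stop(struct toy_port *p)
{
	if (p->state == TOY_STARTED)
		p->state = TOY_STOPPED;
}

static void toy_close(struct toy_port *p)
{
	if (p->state == TOY_STOPPED)
		p->state = TOY_CLOSED;
}

/* Detach succeeds only for a closed port whose driver advertises the
 * detachable flag; otherwise the port is left untouched. */
static int toy_detach(struct toy_port *p)
{
	if (!(p->drv_flags & TOY_DRV_DETACHABLE))
		return -ENOTSUP;
	if (p->state != TOY_CLOSED)
		return -EBUSY;
	p->state = TOY_DETACHED;
	return 0;
}
```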

There are also a few limitations.
- The port hotplug functions are not thread safe.
  DPDK apps must provide their own synchronization.
- Only Linux and igb_uio are supported so far.
  BSD and VFIO are not supported. I will send VFIO patches at least, but I
  don't currently plan to submit a BSD patch.


Here are the port hotplug APIs.
---
/**
 * Attach a new device.
 *
 * @param devargs
 *   A pointer to a string describing the new device
 *   to be attached. The string should be a PCI address like
 *   ':01:00.0' or a virtual device name like 'eth_pcap0'.
 * @param port_id
 *  A pointer to which the identifier of the attached port is written.
 * @return
 *  0 on success and port_id is filled, negative on error
 */
int rte_eal_dev_attach(const char *devargs, uint8_t *port_id);

/**
 * Detach a device.
 *
 * @param port_id
 *   The port identifier of the device to detach.
 * @param devname
 *  A pointer to a buffer that receives the name of the detached device.
 * @return
 *  0 on success and devname is filled, negative on error
 */
int rte_eal_dev_detach(uint8_t port_id, char *devname);
---

This patch series is for the DPDK EAL. For DPDK apps to use the port hotplug
functions, each PMD must be fixed to support the 'RTE_PCI_DRV_DETACHABLE'
flag. Please check the patch for the pcap PMD.

Also, please check the testpmd patch. It shows how to fix your legacy
applications to support the port hotplug feature.

PATCH v7 changes
 - Add a new section to programmer's guide.
   (Thanks to Iremonger, Bernard)
 - Fix port checking implementation of start_port().
 - Fix typos in warning messages.
 - Add pt_driver checking to rte_eth_dev_check_detachable().
   (Thanks to Qiu, Michael)

PATCH v6 changes
 - Fix rte_eth_dev_uninit() to handle the return value of the uninit
   function of the PMD. To do this, the following changes were also applied:
   - Fix a parameter of rte_eth_dev_free().
   - Use the rte_eth_dev structure as the parameter of rte_eth_dev_free().

PATCH v5 changes
 - Add a runtime check of the passthrough driver type, such as vfio-pci,
   igb_uio and uio_pci_generic.
   This was done by Qiu, Michael. Thanks a lot.
 - Rename functions as follows:
   - rte_eal_dev_find_and_invoke() to rte_eal_vdev_find_and_invoke().
   - rte_eal_dev_invoke() to rte_eal_vdev_invoke().
 - Add code to handle a return value of rte_eal_devargs_remove().
 - Fix pci address format in rte_eal_dev_detach().
 - Remove RTE_EAL_INVOKE_TYPE_UNKNOWN, because it's unused.
 - Change function definition of rte_eal_devargs_remove().
 - Fix pci_unmap_device() to check pt_driver.
 - Fix the return values of the following functions:
   - rte_eth_dev_get_changed_port().
   - rte_eth_dev_get_port_by_addr().
 - Change parameters of rte_eth_dev_validate_port() to clean up code.
 - Fix pci_scan_one to handle pt_driver correctly.
   (Thanks to Qiu, Michael for above suggestions)

PATCH v4 changes
 - Merge patches to make review easier.
 - Fix indent of 'if' statement.
 - Fix calculation method of eal_compare_pci_addr().
 - Fix header file declaration.
 - Add header file to determine if hotplug can be enabled.
   (Thanks to Qiu, Michael)
 - Use braces with 'for' loop.
 - Add parameter checking.
 - Fix sanity check code
 - Fix comments of rte_eth_dev_type.
 - Change function names.
   (Thanks to Iremonger, Bernard)

PATCH v3 changes:
 - Fix enum definition used in rte_ethdev.c.
   (Thanks to Zhang, Helin)

PATCH v2 changes:
 - Replace rte_eal_dev_attach_pdev(),

[dpdk-dev] [ Information needed related to Packet Generator in DPDK and ThroughPut Analysis ]

2015-02-09 Thread Arkajit Ghosh


Hi Team,

Can anyone please suggest a packet generator that supports Intel DPDK and
can send packets in a controlled way? [By "controlled" I mean that I can
control the sending data rate, the number of packets and their size.]

Along with this, please let me know how to do throughput analysis during
packet processing. Is there any tool that can do this analysis?

I really appreciate your valuable inputs and suggestions.

Thanks a lot in advance.

Thanks & Regards
Arkajit Ghosh


[dpdk-dev] [PATCH] maintainer: claim review for virtio/vhost

2015-02-09 Thread Thomas Monjalon

> > I volunteer to review the following files:
> >lib/librte_pmd_virtio/
> >doc/guides/prog_guide/poll_mode_drv_emulated_virtio_nic.rst
> >lib/librte_vhost/
> >doc/guides/prog_guide/vhost_lib.rst
> >examples/vhost/
> >doc/guides/sample_app_ug/vhost.rst
> > 
> > Signed-off-by: Changchun Ouyang 
> 
> Acked-by: Sergio Gonzalez Monroy 

Acked-by: Thomas Monjalon 

Applied, thanks

[dpdk-dev] ACL Issue with single field rule and rest with wild card entry

2015-02-09 Thread Ananyev, Konstantin

Hi Varun,

> -Original Message-
> From: Rapelly, Varun [mailto:vrapelly at sonusnet.com]
> Sent: Friday, February 06, 2015 4:58 PM
> To: Ananyev, Konstantin
> Subject: FW: [dpdk-dev] ACL Issue with single field rule and rest with wild card entry
> 
> Sorry for too many mails. :)
> 
> I tried with DPDK 1.7 also, but got the same different results as 1.6.0 :(

I just tried with v1.7.0-rc4.
For me results are absolutely the same for both cases:
rte_acl_classify() returns 0 , res[0]=11

That's on HSW, fedora 20, gcc 4.8.3.
Same for IVB, fedora 16 with gcc 4.6
RTE_TARGET=x86_64-native-linuxapp-gcc

Not sure why you are getting different results than me with DPDK 1.7.
I suppose you didn't modify ACL library in any way?

BTW, do you realise that in your test you specify 'size = sizeof (uint8_t)'
for all your fields?
It doesn't really matter for that particular test case, as all fields
except the very first one are wildcards, but in a real app it shouldn't
be that way.
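A minimal sketch, assuming the `ipv6_5tuple` layout quoted further down in this thread, of deriving each field's size and offset from the struct itself rather than hard-coding `sizeof(uint8_t)`. In a real program these two values would go into the `size` and `offset` members of `struct rte_acl_field_def`; `struct toy_field` and `FIELD_SIZE` are hypothetical helpers so that the sketch compiles without DPDK.

```c
#include <stddef.h>
#include <stdint.h>

/* The key layout quoted in the original report. */
struct ipv6_5tuple {
	uint8_t proto;       /* Protocol, next header. */
	uint32_t src_addr0;  /* IP address of source host. */
	uint32_t src_addr1;
	uint32_t src_addr2;
	uint32_t src_addr3;
};

/* Size of a struct member, without needing an instance. */
#define FIELD_SIZE(type, member) sizeof(((type *)0)->member)

/* Hypothetical cut-down descriptor: only the two values at issue. */
struct toy_field {
	uint8_t size;
	uint32_t offset;
};

static const struct toy_field ipv6_fields[] = {
	{ FIELD_SIZE(struct ipv6_5tuple, proto),
	  offsetof(struct ipv6_5tuple, proto) },
	{ FIELD_SIZE(struct ipv6_5tuple, src_addr0),
	  offsetof(struct ipv6_5tuple, src_addr0) },
	{ FIELD_SIZE(struct ipv6_5tuple, src_addr1),
	  offsetof(struct ipv6_5tuple, src_addr1) },
	{ FIELD_SIZE(struct ipv6_5tuple, src_addr2),
	  offsetof(struct ipv6_5tuple, src_addr2) },
	{ FIELD_SIZE(struct ipv6_5tuple, src_addr3),
	  offsetof(struct ipv6_5tuple, src_addr3) },
};
```

With this table, `proto` gets size 1 and each address word gets size 4, and the offsets account for the padding the compiler inserts after `proto`.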

Konstantin

> 
> Regards,
> Varun
> -Original Message-
> From: Rapelly, Varun
> Sent: Friday, February 06, 2015 10:24 PM
> To: 'Ananyev, Konstantin'
> Subject: FW: [dpdk-dev] ACL Issue with single field rule and rest with wild card entry
> 
> Hi Konstantin,
> 
> FYI: I'm using DPDK 1.6.0
> 
> -Original Message-
> From: Rapelly, Varun
> Sent: Friday, February 06, 2015 10:08 PM
> To: 'Ananyev, Konstantin'
> Subject: RE: [dpdk-dev] ACL Issue with single field rule and rest with wild card entry
> 
> Hi Konstantin,
> 
> Thanks for your quick response.
> 
> I'm getting different results:
> 
> With lines 118-125 commented out:
> 
> ACL: Gen phase for ACL "ACL_example":
> runtime memory footprint on socket -1:
> single nodes/bytes used: 0/0
> quad nodes/bytes used: 0/0
> DFA nodes/bytes used: 1/2048
> match nodes/bytes used: 4/512
> total: 4960 bytes
> ACL: Build phase for ACL "ACL_example":
> memory consumed: 8388615
> ACL: trie 0: number of rules: 4
> rte_acl_classify() returns 0
> , res[0]=0
> 
> With lines 118-125 uncommented:
> ACL: Gen phase for ACL "ACL_example":
> runtime memory footprint on socket -1:
> single nodes/bytes used: 12/96
> quad nodes/bytes used: 4/96
> DFA nodes/bytes used: 1/2048
> match nodes/bytes used: 4/512
> total: 5152 bytes
> ACL: Build phase for ACL "ACL_example":
> memory consumed: 8388615
> ACL: trie 0: number of rules: 4
> rte_acl_classify() returns 0
> , res[0]=11
> 
> Please let me know: does it depend on any other environment variables?
> 
> Regards,
> Varun
> 
> -Original Message-
> From: Ananyev, Konstantin [mailto:konstantin.ananyev at intel.com]
> Sent: Friday, February 06, 2015 5:15 PM
> To: Rapelly, Varun; dev at dpdk.org
> Subject: RE: [dpdk-dev] ACL Issue with single field rule and rest with wild 
> card entry
> 
> Hi Varun,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Rapelly, Varun
> > Sent: Friday, February 06, 2015 7:25 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] ACL Issue with single field rule and rest with
> > wild card entry
> >
> > Hi,
> >
> > struct ipv6_5tuple {
> >uint8_t proto; /* Protocol, next header. */
> >uint32_t src_addr0;  /* IP address of source host. */
> >uint32_t src_addr1;  /* IP address of source host. */
> >uint32_t src_addr2;  /* IP address of source host. */
> >uint32_t src_addr3;  /* IP address of source host. */ };
> >
> > enum {
> >PROTO_FIELD_IPV6,
> >SRC_FIELD0_IPV6,
> >SRC_FIELD1_IPV6,
> >SRC_FIELD2_IPV6,
> >SRC_FIELD3_IPV6,
> >NUM_FIELDS_IPV6
> > };
> >
> >
> > I'm using the above data to insert in to ACL trie.
> >
> > If I insert rules that differ only in the proto field [I expect the
> > other fields to be wildcard entries], then the rules do not match.
> >
> > But if I insert one rule with dummy entries [lines 118-125 in the
> > attached file], then the above issue is resolved.
> 
> Hmm, it is strange...
> I took your source code, compiled it, then commented out lines 118-125 and
> recompiled it.
> Both binaries produce a valid result for me:
> 
> 1. original code:
> ACL: Gen phase for ACL "ACL_example":
> runtime memory footprint on socket -1:
> single nodes/bytes used: 0/0
> quad nodes/vectors/bytes used: 0/0/0
> DFA nodes/group64/bytes used: 1/4/4104
> match nodes/bytes used: 4/512
> total: 6816 bytes
> max limit: 18446744073709551615 bytes
> ACL: Build phase for ACL "ACL_example":
> node limit for tree split: 2048
> nodes created: 5
> memory consumed: 8388615
> ACL: trie 0: number of rules: 4, indexes: 1
> rte_acl_classify() returns 0
> , res[0]=11
> 
> 
> 2. code with lines 118-125 commented out:
> ACL: Gen phase for ACL "ACL_example":
> runtime memory footprint on socket -1:
> single nodes/bytes used: 0/0
> quad nodes/vectors/bytes used: 0/0/0
> DFA nodes/group64/bytes used: 1/4/4104
> match nodes/bytes used: 3/384
> total: 6688 bytes
> max limit: 18446744073709551615 bytes
> ACL: Build phase for ACL "ACL_example":
>
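When two builds disagree like this, a brute-force reference model can help decide which result is correct. A minimal sketch, not part of DPDK: every field is treated as value/mask, a zero mask is a wildcard, and the userdata of the highest-priority matching rule is returned, with 0 meaning "no match" (which is how the res[0]=0 output above reads). struct ref_rule and ref_classify() are hypothetical names. With rules that set only proto, a packet with a matching protocol should yield that rule's userdata whatever its address words are.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_FIELDS 5  /* proto + four 32-bit source address words */

/* Hypothetical rule for a reference model: each field is value/mask,
 * and mask == 0 means "wildcard, match anything". */
struct ref_rule {
	uint32_t value[NUM_FIELDS];
	uint32_t mask[NUM_FIELDS];
	uint32_t priority;   /* higher value wins when several rules match */
	uint32_t userdata;   /* reported on match; 0 is reserved for "no match" */
};

/* Linear scan: return the userdata of the highest-priority matching rule,
 * or 0 if no rule matches. */
static uint32_t ref_classify(const struct ref_rule *rules, size_t n_rules,
			     const uint32_t key[NUM_FIELDS])
{
	uint32_t best_prio = 0, best_data = 0;
	int found = 0;

	for (size_t r = 0; r < n_rules; r++) {
		int match = 1;

		/* A field matches when the masked bits agree. */
		for (size_t f = 0; f < NUM_FIELDS && match; f++)
			match = ((key[f] ^ rules[r].value[f]) & rules[r].mask[f]) == 0;
		if (match && (!found || rules[r].priority > best_prio)) {
			found = 1;
			best_prio = rules[r].priority;
			best_data = rules[r].userdata;
		}
	}
	return best_data;
}
```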

[dpdk-dev] [PATCH] maintainers: claim eal common and linux

2015-02-09 Thread Thomas Monjalon

2015-02-09 14:50, David Marchand:
> As discussed with Thomas, I would like to take care of the common eal and
> linux implementation.
> 
> Signed-off-by: David Marchand 
[...]
>  EAL API and common code
> -M: Thomas Monjalon 
> +M: David Marchand 

Thank you, David, for taking on this responsibility.
You have already done an excellent job with EAL reviews and cleanups,
especially for the arch split.

>  Linux EAL (with overlaps)
> +M: David Marchand 

Acked-by: Thomas Monjalon 

Applied, thanks

[dpdk-dev] [PATCH] MAINTAINERS: claim i40e and KNI

2015-02-09 Thread Thomas Monjalon

> Claim i40e and KNI modules.
> 
> Signed-off-by: Helin Zhang 

Acked-by: Thomas Monjalon 

Applied, thanks

[dpdk-dev] [PATCH] MAINTAINERS: claim metering, sched and pkt framework

2015-02-09 Thread Dumitrescu, Cristian

Thank you, Thomas!

We are working on some enhancements to librte_cfgfile for release 2.1, so in order
to avoid unnecessary code churn, it is probably better to have the librte_cfgfile
changes done first, followed by a separate patch for qos_sched.

Regards,
Cristian


-Original Message-
From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
Sent: Monday, February 9, 2015 3:15 PM
To: Dumitrescu, Cristian
Cc: dev at dpdk.org; Gonzalez Monroy, Sergio
Subject: Re: [dpdk-dev] [PATCH] MAINTAINERS: claim metering, sched and pkt 
framework

2015-02-06 13:13, Gonzalez Monroy, Sergio:
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cristian Dumitrescu
> > Sent: Wednesday, February 4, 2015 3:53 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH] MAINTAINERS: claim metering, sched and pkt
> > framework
> > 
> > As original author of these DPDK components, I am volunteering to maintain
> > them going forward:
> > - Traffic Metering
> > - Hierarchical Scheduler
> > - Packet Framework
> > - Configuration File
> > 
> > Signed-off-by: Cristian Dumitrescu 
> 
> Acked-by: Sergio Gonzalez Monroy 

Acked-by: Thomas Monjalon 

Applied, thanks

About cfgfile, we are still waiting for the cleanup in qos_sched example:
http://dpdk.org/ml/archives/dev/2014-October/006774.html
Do you have news?

[dpdk-dev] [PATCH v3] maintainers: claim hash and lpm libraries

2015-02-09 Thread Thomas Monjalon

> > Signed-off-by: Bruce Richardson 
> 
> Acked-by: Helin Zhang 

Acked-by: Thomas Monjalon 

Applied, thanks

[dpdk-dev] [PATCH v2] maintainers: claim FreeBSD EAL and distributor

2015-02-09 Thread Thomas Monjalon

> > Signed-off-by: Bruce Richardson 
> 
> Acked-by: Pablo de Lara 

Acked-by: Thomas Monjalon 

Applied, thanks

[dpdk-dev] [PATCH] MAINTAINERS: claim IP fragmentation and ACL

2015-02-09 Thread Thomas Monjalon

> > Signed-off-by: Konstantin Ananyev 
> 
> Acked-by: Sergio Gonzalez Monroy 

Acked-by: Thomas Monjalon 

Applied, thanks

[dpdk-dev] [PATCH] MAINTAINERS: claim metering, sched and pkt framework

2015-02-09 Thread Thomas Monjalon

2015-02-06 13:13, Gonzalez Monroy, Sergio:
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cristian Dumitrescu
> > Sent: Wednesday, February 4, 2015 3:53 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH] MAINTAINERS: claim metering, sched and pkt
> > framework
> > 
> > As original author of these DPDK components, I am volunteering to maintain
> > them going forward:
> > - Traffic Metering
> > - Hierarchical Scheduler
> > - Packet Framework
> > - Configuration File
> > 
> > Signed-off-by: Cristian Dumitrescu 
> 
> Acked-by: Sergio Gonzalez Monroy 

Acked-by: Thomas Monjalon 

Applied, thanks

About cfgfile, we are still waiting for the cleanup in qos_sched example:
http://dpdk.org/ml/archives/dev/2014-October/006774.html
Do you have news?

[dpdk-dev] [PATCH] maintainers: claim responsibility for testpmd and user guide

2015-02-09 Thread Thomas Monjalon

> > Signed-off-by: Pablo de Lara 
[...]
> >  Driver testing tool
> > +M: Pablo de Lara 
> >  F: app/test-pmd/
> >  F: doc/guides/testpmd_app_ug/
> 
> Acked-by: Sergio Gonzalez Monroy 
Acked-by: Thomas Monjalon 

Applied, thanks

[dpdk-dev] [PATCH v4 16/17] ring: add sched_yield to avoid spin forever

2015-02-09 Thread Ananyev, Konstantin

Hi Olivier,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier MATZ
> Sent: Friday, February 06, 2015 3:20 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 16/17] ring: add sched_yield to avoid spin 
> forever
> 
> Hi,
> 
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > Add a sched_yield() syscall if the thread spins for too long waiting for
> > another thread to finish its operations on the ring.
> > That gives a pre-empted thread a chance to proceed and finish its ring
> > enqueue/dequeue operation.
> > The purpose is to reduce contention on the ring.
> >
> > Signed-off-by: Cunming Liang 
> > ---
> >  lib/librte_ring/rte_ring.h | 35 +--
> >  1 file changed, 29 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index 39bacdd..c402c73 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -126,6 +126,7 @@ struct rte_ring_debug_stats {
> >
> >  #define RTE_RING_NAMESIZE 32 /**< The maximum length of a ring name. */
> >  #define RTE_RING_MZ_PREFIX "RG_"
> > +#define RTE_RING_PAUSE_REP 0x100  /**< yield after num of times pause. */
> >
> >  /**
> >   * An RTE ring structure.
> > @@ -410,7 +411,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
> > uint32_t cons_tail, free_entries;
> > const unsigned max = n;
> > int success;
> > -   unsigned i;
> > +   unsigned i, rep;
> > uint32_t mask = r->prod.mask;
> > int ret;
> >
> > @@ -468,8 +469,19 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
> >  * If there are other enqueues in progress that preceded us,
> >  * we need to wait for them to complete
> >  */
> > -   while (unlikely(r->prod.tail != prod_head))
> > -   rte_pause();
> > +   do {
> > +   /* avoid spin too long waiting for other thread finish */
> > +   for (rep = RTE_RING_PAUSE_REP;
> > +rep != 0 && r->prod.tail != prod_head; rep--)
> > +   rte_pause();
> > +
> > +   /*
> > +* It gives pre-empted thread a chance to proceed and
> > +* finish with ring enqueue operation.
> > +*/
> > +   if (rep == 0)
> > +   sched_yield();
> > +   } while (rep == 0);
> >
> > r->prod.tail = prod_next;
> > return ret;
> > @@ -589,7 +601,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
> > uint32_t cons_next, entries;
> > const unsigned max = n;
> > int success;
> > -   unsigned i;
> > +   unsigned i, rep;
> > uint32_t mask = r->prod.mask;
> >
> > /* move cons.head atomically */
> > @@ -634,8 +646,19 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
> >  * If there are other dequeues in progress that preceded us,
> >  * we need to wait for them to complete
> >  */
> > -   while (unlikely(r->cons.tail != cons_head))
> > -   rte_pause();
> > +   do {
> > +   /* avoid spin too long waiting for other thread finish */
> > +   for (rep = RTE_RING_PAUSE_REP;
> > +rep != 0 && r->cons.tail != cons_head; rep--)
> > +   rte_pause();
> > +
> > +   /*
> > +* It gives pre-empted thread a chance to proceed and
> > +* finish with ring dequeue operation.
> > +*/
> > +   if (rep == 0)
> > +   sched_yield();
> > +   } while (rep == 0);
> >
> > __RING_STAT_ADD(r, deq_success, n);
> > r->cons.tail = cons_next;
> >
> 
> The ring library was designed with the assumption that the code is not
> preemptable. The code is lock-less but not wait-less. Actually, if the
> code is preempted at a bad moment, it can spin forever until it's
> unscheduled.
> 
> I wonder if adding a sched_yield() may not penalize the current
> implementations that only use one pthread per core? Even if there
> is only one pthread in the scheduler queue for this CPU, calling
> the scheduler code may cost thousands of cycles.
> 
> Also, where does this value "RTE_RING_PAUSE_REP 0x100" come from?
> Why is 0x100 better than 42 or 1?

The idea was to use a value a few times bigger than the actual number of
active cores in the system, to minimise the chance of sched_yield() being called
in the case where we have one thread per physical core.
My thought was that that many repeats would make such a chance negligible,
though right now I don't have any data to back it up.

> I think it could be good to check if there is a performance impact
> with this change, especially where there is a lot of contention on
> the ring. If it has an impact, what about adding a compile or runtime
> option?

Good idea. We should probably make RTE_RING_PAUSE_REP a configuration option
and, say, avoid emitting sched_yield() at all if RTE_RING_PAUSE_REP == 0.
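A minimal sketch of that suggestion, under stated assumptions: RING_PAUSE_REP, cpu_relax() and wait_for_tail() are hypothetical names, a C11 atomic stands in for the ring's prod.tail/cons.tail field, and __builtin_ia32_pause() (GCC/Clang, x86 only) stands in for rte_pause(). With the option set to 0 the loop is the original unbounded spin and sched_yield() is compiled out entirely.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical build-time option: 0 means "never yield, spin forever"
 * (the behaviour before the patch). */
#ifndef RING_PAUSE_REP
#define RING_PAUSE_REP 0x100
#endif

/* Stand-in for rte_pause(): the PAUSE instruction on x86, no-op elsewhere. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#endif
}

/* Wait until *tail == expected; return how many times sched_yield() was called. */
static unsigned wait_for_tail(_Atomic uint32_t *tail, uint32_t expected)
{
	unsigned yields = 0;

#if RING_PAUSE_REP == 0
	while (atomic_load_explicit(tail, memory_order_acquire) != expected)
		cpu_relax();
#else
	for (;;) {
		/* Spin for a bounded number of pauses first. */
		for (unsigned rep = RING_PAUSE_REP; rep != 0; rep--) {
			if (atomic_load_explicit(tail, memory_order_acquire) == expected)
				return yields;
			cpu_relax();
		}
		/* Still not there: give a pre-empted thread a chance to finish. */
		sched_yield();
		yields++;
	}
#endif
	return yields;
}
```

Checking the tail before each pause, rather than only in the loop condition, also avoids the extra sched_yield() that the patch's loop can issue when the tail catches up on the very last repetition.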

[dpdk-dev] [PATCH] maintainers: claim responsibility for VMXNET3 PMD

2015-02-09 Thread Thomas Monjalon

> Signed-off-by: Yong Wang 

>  VMware vmxnet3
> +M: Yong Wang 
>  F: lib/librte_pmd_vmxnet3/
>  F: doc/guides/prog_guide/poll_mode_drv_paravirtual_vmxnets_nic.rst

Acked-by: Thomas Monjalon 

Could you help reviewing these patches?
http://dpdk.org/dev/patchwork/project/dpdk/list/?q=vmxnet3

Thank you Yong.

[dpdk-dev] [PATCH 0/3] update maintainers areas

2015-02-09 Thread Thomas Monjalon

> More files should be referenced in MAINTAINERS files:
>   - some (forgotten) docs can be co-maintained in doc and lib areas
>   - new ABI files
> The script can now check for unknown files.
> 
> Thomas Monjalon (3):
>   maintainers: dispatch more doc
>   maintainers: add ABI versioning
>   scripts: check wrong patterns in maintainers file

Applied

[dpdk-dev] Atheros PMDs

2015-02-09 Thread Neil Horman

On Tue, Feb 10, 2015 at 01:24:02AM +0530, Akshay wrote:
> Sorry for not mentioning it, but I meant Atheros wired NICs.
> 
Ah, I wasn't aware they made wired NICs. Regardless, no such PMD has been
submitted for inclusion. They may have a proprietary one, but none that
I'm aware of.
Neil

> On Tue, Feb 10, 2015 at 1:14 AM, Neil Horman  wrote:
> 
> > On Tue, Feb 10, 2015 at 12:11:54AM +0530, Akshay wrote:
> > > Hi,
> > >
> > > Are there any PMDs available for Atheros?
> > >
> > > Regards,
> > > Akshay.
> > >
> >
> > The DPDK isn't currently able to handle wireless NICs.
> > Neil
> >
> >

[dpdk-dev] [PATCH v7 03/14] eal/pci, ethdev: Remove assumption that port will not be detached

2015-02-09 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 9, 2015 8:30 AM
> To: dev at dpdk.org
> Cc: Iremonger, Bernard; Qiu, Michael; Tetsuya Mukawa
> Subject: [PATCH v7 03/14] eal/pci,ethdev: Remove assumption that port will 
> not be detached
> 
> To remove this assumption, do the following.
> 
> This patch adds "RTE_PCI_DRV_DETACHABLE" to drv_flags of the rte_pci_driver
> structure. The flag indicates that the driver can detach devices at runtime.
> Also, remove the assumption that a port will not be detached.
> 
> To remove the assumption:
> - Add an 'attached' member to the rte_eth_dev structure.
>   This member indicates whether the port is attached or not.
> - Add rte_eth_dev_allocate_new_port().
>   This function allocates a new port.
> 
> v5:
> - Change parameters of rte_eth_dev_validate_port() to cleanup code.
> v4:
> - Use braces with 'for' loop.
> - Fix indent of 'if' statement.
> 
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_eal/common/include/rte_pci.h |   2 +
>  lib/librte_ether/rte_ethdev.c   | 454 +---
>  lib/librte_ether/rte_ethdev.h   |   5 +
>  3 files changed, 186 insertions(+), 275 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
> index 7b48b55..7f2d699 100644
> --- a/lib/librte_eal/common/include/rte_pci.h
> +++ b/lib/librte_eal/common/include/rte_pci.h
> @@ -207,6 +207,8 @@ struct rte_pci_driver {
>  #define RTE_PCI_DRV_FORCE_UNBIND 0x0004
>  /** Device driver supports link state interrupt */
>  #define RTE_PCI_DRV_INTR_LSC 0x0008
> +/** Device driver supports detaching capability */
> +#define RTE_PCI_DRV_DETACHABLE   0x0010
> 
>  /**< Internal use only - Macro used by pci addr parsing functions **/
>  #define GET_PCIADDR_FIELD(in, fd, lim, dlm)   \
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index ea3a1fb..d70854f 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -175,6 +175,16 @@ enum {
>   STAT_QMAP_RX
>  };
> 
> +enum {
> + DEV_INVALID = 0,
> + DEV_VALID,
> +};
> +
> +enum {
> + DEV_DISCONNECTED = 0,
> + DEV_CONNECTED
> +};
> +
>  static inline void
>  rte_eth_dev_data_alloc(void)
>  {
> @@ -201,19 +211,34 @@ rte_eth_dev_allocated(const char *name)
>  {
>   unsigned i;
> 
> - for (i = 0; i < nb_ports; i++) {
> - if (strcmp(rte_eth_devices[i].data->name, name) == 0)
> + for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> + if ((rte_eth_devices[i].attached == DEV_CONNECTED) &&
> + strcmp(rte_eth_devices[i].data->name, name) == 0)
>   return &rte_eth_devices[i];
>   }
>   return NULL;
>  }
> 
> +static uint8_t
> +rte_eth_dev_allocate_new_port(void)
> +{
> + unsigned i;
> +
> + for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> + if (rte_eth_devices[i].attached == DEV_DISCONNECTED)
> + return i;
> + }
> + return RTE_MAX_ETHPORTS;
> +}
> +
>  struct rte_eth_dev *
>  rte_eth_dev_allocate(const char *name)
>  {
> + uint8_t port_id;
>   struct rte_eth_dev *eth_dev;
> 
> - if (nb_ports == RTE_MAX_ETHPORTS) {
> + port_id = rte_eth_dev_allocate_new_port();
> + if (port_id == RTE_MAX_ETHPORTS) {
>   PMD_DEBUG_TRACE("Reached maximum number of Ethernet ports\n");
>   return NULL;
>   }
> @@ -226,10 +251,12 @@ rte_eth_dev_allocate(const char *name)
>   return NULL;
>   }
> 
> - eth_dev = &rte_eth_devices[nb_ports];
> - eth_dev->data = &rte_eth_dev_data[nb_ports];
> + eth_dev = &rte_eth_devices[port_id];
> + eth_dev->data = &rte_eth_dev_data[port_id];
>   snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
> - eth_dev->data->port_id = nb_ports++;
> + eth_dev->data->port_id = port_id;
> + eth_dev->attached = DEV_CONNECTED;
> + nb_ports++;
>   return eth_dev;
>  }
> 
> @@ -283,6 +310,7 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
>   (unsigned) pci_dev->id.device_id);
>   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
>   rte_free(eth_dev->data->dev_private);
> + eth_dev->attached = DEV_DISCONNECTED;
>   nb_ports--;
>   return diag;
>  }
> @@ -308,10 +336,28 @@ rte_eth_driver_register(struct eth_driver *eth_drv)
>   rte_eal_pci_register(&eth_drv->pci_drv);
>  }
> 
> +enum {
> + NONE_TRACE = 0,

Hi Tetsuya,

NO_TRACE would be clearer than NONE_TRACE.


Regards,

Bernard.

> + TRACE
> +};
> +
> +static int
> +rte_eth_dev_validate_port(uint8_t port_id, int trace) {
> + if (port_id >= RTE_MAX_ETHPORTS ||
> + rte_eth_devices[port_id].attached != DEV_CONNECTED) {
> + if (trace) {
> + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> + }
> + return DEV_INVALID;
> + } else
> +
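The core of this patch is that nb_ports stops being the index of the next free port once ports can be detached: after detaching port 1 out of ports 0-2, nb_ports is 2, and allocating at index 2 would overwrite a live port. A minimal sketch of the free-slot scan that rte_eth_dev_allocate_new_port() introduces, using hypothetical names (MAX_PORTS, port_attach(), port_detach()) and a plain array in place of rte_eth_devices[]:

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_PORTS 8   /* hypothetical stand-in for RTE_MAX_ETHPORTS */

enum { SLOT_DETACHED = 0, SLOT_ATTACHED };

struct port_slot {
	int attached;
	char name[32];
};

static struct port_slot ports[MAX_PORTS];
static unsigned nb_ports;   /* a count of attached ports, not an index */

/* First free slot, or MAX_PORTS when the table is full. */
static unsigned port_find_free(void)
{
	for (unsigned i = 0; i < MAX_PORTS; i++)
		if (ports[i].attached == SLOT_DETACHED)
			return i;
	return MAX_PORTS;
}

/* Attach a port in the first free slot; returns its id or MAX_PORTS. */
static unsigned port_attach(const char *name)
{
	unsigned id = port_find_free();

	if (id == MAX_PORTS)
		return MAX_PORTS;
	snprintf(ports[id].name, sizeof(ports[id].name), "%s", name);
	ports[id].attached = SLOT_ATTACHED;
	nb_ports++;
	return id;
}

/* Detach a port; returns -1 for an invalid or already-detached id. */
static int port_detach(unsigned id)
{
	if (id >= MAX_PORTS || ports[id].attached != SLOT_ATTACHED)
		return -1;
	ports[id].attached = SLOT_DETACHED;
	nb_ports--;
	return 0;
}
```

Validation then has to check the attached flag, as rte_eth_dev_validate_port() does in the patch, because an id below the old nb_ports is no longer guaranteed to be live.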

[dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when read

2015-02-09 Thread Stephen Hemminger

On Mon, 9 Feb 2015 22:48:36 +
"Dumitrescu, Cristian"  wrote:

> Hi Stephen,
> 
> What is the reason not to clear statistics on read? Do you have a use-case / 
> justification for it?
> 
> (BTW, I see you added the reset functions, but was it also your intention to 
> remove the memset to 0 from the stats read functions? :) )
> 
> Regards,
> Cristian

Read-and-clear is a non-standard model; interface statistics are not read/clear.
We have lots of scripts that read statistics, and users don't like it when
statistics disappear.
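To illustrate the model Stephen describes (read-only counters plus a separate, explicit reset, as with the ethdev statistics API), here is a minimal sketch with hypothetical names (struct q_stats, struct demo_queue, q_stats_read(), q_stats_reset()) rather than the real rte_sched types. Two readers polling the same counters both see the full totals; with read-and-clear, each read would silently take counts away from the other.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical counters; the real rte_sched stats structures differ. */
struct q_stats {
	uint64_t n_pkts;
	uint64_t n_pkts_dropped;
};

struct demo_queue {
	struct q_stats stats;   /* only ever incremented by the datapath */
};

/* Read without side effects: any number of readers see the same totals. */
static void q_stats_read(const struct demo_queue *q, struct q_stats *out)
{
	*out = q->stats;
}

/* Explicit, separate reset, in the spirit of the patch's *_stats_reset(). */
static void q_stats_reset(struct demo_queue *q)
{
	memset(&q->stats, 0, sizeof(q->stats));
}

/* A monitoring script derives per-interval counts from two snapshots,
 * so it never needs to clear the counters itself. */
static uint64_t q_pkts_delta(const struct q_stats *prev, const struct q_stats *cur)
{
	return cur->n_pkts - prev->n_pkts;   /* unsigned wrap-around is well defined */
}
```

The delta approach survives counter wrap-around but not a reset between snapshots, which is one reason to keep resets explicit and rare.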

[dpdk-dev] Atheros PMDs

2015-02-09 Thread Neil Horman

On Tue, Feb 10, 2015 at 12:11:54AM +0530, Akshay wrote:
> Hi,
> 
> Are there any PMDs available for Atheros?
> 
> Regards,
> Akshay.
> 

The DPDK isn't currently able to handle wireless NICs.
Neil

[dpdk-dev] [PATCH v4 14/17] mempool: add support to non-EAL thread

2015-02-09 Thread Liang, Cunming



> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, February 09, 2015 4:01 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 14/17] mempool: add support to non-EAL
> thread
> 
> Hi,
> 
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > For non-EAL threads, bypass the per-lcore cache and use the ring pool directly.
> > This allows rte_mempool to be used from either an EAL thread or any user pthread.
> > In a non-EAL thread it relies directly on rte_ring, which is non-preemptive.
> > Running multiple pthreads per CPU that compete for the same rte_mempool is not
> > recommended: performance will be poor, and there is a critical risk if the
> > scheduling policy is RT.
> >
> > Signed-off-by: Cunming Liang 
> > ---
> >  lib/librte_mempool/rte_mempool.h | 18 +++---
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> > index 3314651..4845f27 100644
> > --- a/lib/librte_mempool/rte_mempool.h
> > +++ b/lib/librte_mempool/rte_mempool.h
> > @@ -198,10 +198,12 @@ struct rte_mempool {
> >   *   Number to add to the object-oriented statistics.
> >   */
> >  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
> > -#define __MEMPOOL_STAT_ADD(mp, name, n) do {   \
> > -   unsigned __lcore_id = rte_lcore_id();   \
> > -   mp->stats[__lcore_id].name##_objs += n; \
> > -   mp->stats[__lcore_id].name##_bulk += 1; \
> > +#define __MEMPOOL_STAT_ADD(mp, name, n) do {\
> > +   unsigned __lcore_id = rte_lcore_id();   \
> > +   if (__lcore_id < RTE_MAX_LCORE) {   \
> > +   mp->stats[__lcore_id].name##_objs += n; \
> > +   mp->stats[__lcore_id].name##_bulk += 1; \
> > +   }   \
> 
> Does it mean that we have no statistics for non-EAL threads?
> (same question for rings and timers in the next patches)
[LCM] Yes. This patch set mainly focuses on EAL threads and makes sure there are
no running issues on non-EAL threads.
Full non-EAL thread support will come in another patch set as the second step.
> 
> 
> > } while(0)
> >  #else
> >  #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0)
> > @@ -767,8 +769,9 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
> > __MEMPOOL_STAT_ADD(mp, put, n);
> >
> >  #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> > -   /* cache is not enabled or single producer */
> > -   if (unlikely(cache_size == 0 || is_mp == 0))
> > +   /* cache is not enabled or single producer or non-EAL thread */
> > +   if (unlikely(cache_size == 0 || is_mp == 0 ||
> > +lcore_id >= RTE_MAX_LCORE))
> > goto ring_enqueue;
> >
> > /* Go straight to ring if put would overflow mem allocated for cache */
> > @@ -952,7 +955,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
> > uint32_t cache_size = mp->cache_size;
> >
> > /* cache is not enabled or single consumer */
> > -   if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size))
> > +   if (unlikely(cache_size == 0 || is_mc == 0 ||
> > +n >= cache_size || lcore_id >= RTE_MAX_LCORE))
> > goto ring_dequeue;
> >
> > cache = &mp->local_cache[lcore_id];
> >
> 
> What is the performance impact of adding this test?
[LCM] In the unit-test perf measurements it's almost the same. But I haven't
measured the case where an EAL thread and a non-EAL thread share the same mempool.
> 
> 
> Regards,
> Olivier
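The behaviour being asked about reduces to two bounds checks. A minimal sketch, assuming (as the patch implies) that a non-EAL thread reports an lcore id of at least RTE_MAX_LCORE; MAX_LCORE, LCORE_ID_NONE, stat_add_put() and use_local_cache() are hypothetical names. It shows why such threads get neither statistics nor a local cache: both live in per-lcore arrays, and the guard skips them instead of indexing out of bounds.

```c
#include <stdint.h>

#define MAX_LCORE 4                 /* hypothetical stand-in for RTE_MAX_LCORE */
#define LCORE_ID_NONE UINT32_MAX    /* hypothetical id seen by non-EAL threads */

struct pool_stats {
	uint64_t put_objs;
	uint64_t put_bulk;
};

static struct pool_stats lcore_stats[MAX_LCORE];

/* Mirrors the guarded __MEMPOOL_STAT_ADD: threads without a valid
 * lcore id are simply not counted. */
static void stat_add_put(uint32_t lcore_id, unsigned n)
{
	if (lcore_id < MAX_LCORE) {
		lcore_stats[lcore_id].put_objs += n;
		lcore_stats[lcore_id].put_bulk += 1;
	}
}

/* Mirrors the put-path decision: use the per-lcore cache only when caching
 * is enabled, the pool is multi-producer, and this thread owns a cache slot;
 * otherwise go straight to the shared ring. */
static int use_local_cache(uint32_t lcore_id, unsigned cache_size, int is_mp)
{
	return cache_size != 0 && is_mp && lcore_id < MAX_LCORE;
}
```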

[dpdk-dev] [PATCH v2 15/15] mbuf: remove old packet type bit masks

2015-02-09 Thread Helin Zhang

As unified packet types are used instead, those old bit masks
and the relevant macros for packet type indication need to be
removed.

Signed-off-by: Helin Zhang 
---
 lib/librte_mbuf/rte_mbuf.c |  6 --
 lib/librte_mbuf/rte_mbuf.h | 14 --
 2 files changed, 4 insertions(+), 16 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.
* Redefined the bit masks for packet RX offload flags.

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 1b14e02..8050ccf 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -215,14 +215,8 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
-   case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
-   case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
-   case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
-   case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
-   case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
-   case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
default: return NULL;
}
 }
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index ee912d6..55336b2 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -90,16 +90,10 @@ extern "C" {
 #define PKT_RX_HBUF_OVERFLOW (0ULL << 0)  /**< Header buffer overflow. */
 #define PKT_RX_RECIP_ERR (0ULL << 0)  /**< Hardware processing error. */
 #define PKT_RX_MAC_ERR   (0ULL << 0)  /**< MAC error. */
-#define PKT_RX_IPV4_HDR  (1ULL << 5)  /**< RX packet with IPv4 header. */
-#define PKT_RX_IPV4_HDR_EXT  (1ULL << 6)  /**< RX packet with extended IPv4 header. */
-#define PKT_RX_IPV6_HDR  (1ULL << 7)  /**< RX packet with IPv6 header. */
-#define PKT_RX_IPV6_HDR_EXT  (1ULL << 8)  /**< RX packet with extended IPv6 header. */
-#define PKT_RX_IEEE1588_PTP  (1ULL << 9)  /**< RX IEEE1588 L2 Ethernet PT Packet. */
-#define PKT_RX_IEEE1588_TMST (1ULL << 10) /**< RX IEEE1588 L2/L4 timestamped packet.*/
-#define PKT_RX_TUNNEL_IPV4_HDR (1ULL << 11) /**< RX tunnel packet with IPv4 header.*/
-#define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
-#define PKT_RX_FDIR_ID   (1ULL << 13) /**< FD id reported if FDIR match. */
-#define PKT_RX_FDIR_FLX  (1ULL << 14) /**< Flexible bytes reported if FDIR match. */
+#define PKT_RX_IEEE1588_PTP  (1ULL << 5)  /**< RX IEEE1588 L2 Ethernet PT Packet. */
+#define PKT_RX_IEEE1588_TMST (1ULL << 6) /**< RX IEEE1588 L2/L4 timestamped packet.*/
+#define PKT_RX_FDIR_ID   (1ULL << 7) /**< FD id reported if FDIR match. */
+#define PKT_RX_FDIR_FLX  (1ULL << 8) /**< Flexible bytes reported if FDIR match. */
 /* add new RX flags here */

 /* add new TX flags here */
-- 
1.9.3

[dpdk-dev] [PATCH v2 14/15] examples/l3fwd: support of unified packet type

2015-02-09 Thread Helin Zhang

To unify packet types among all PMDs, bit masks and relevant macros
of packet type for ol_flags are replaced by unified packet type and
relevant macros.

Signed-off-by: Helin Zhang 
---
 examples/l3fwd/main.c | 64 ---
 1 file changed, 35 insertions(+), 29 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 6f7d7d4..302322e 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -958,7 +958,7 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, struct lcore_conf *qcon

eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);

-   if (m->ol_flags & PKT_RX_IPV4_HDR) {
+   if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
/* Handle IPv4 headers.*/
ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(m, unsigned char *) +
sizeof(struct ether_hdr));
@@ -993,7 +993,7 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, struct lcore_conf *qcon

send_single_packet(m, dst_port);

-   } else {
+   } else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
/* Handle IPv6 headers.*/
struct ipv6_hdr *ipv6_hdr;

@@ -1039,11 +1039,11 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, struct lcore_conf *qcon
  * to BAD_PORT value.
  */
 static inline __attribute__((always_inline)) void
-rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t flags)
+rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint16_t ptype)
 {
uint8_t ihl;

-   if ((flags & PKT_RX_IPV4_HDR) != 0) {
+   if (RTE_ETH_IS_IPV4_HDR(ptype)) {

ihl = ipv4_hdr->version_ihl - IPV4_MIN_VER_IHL;

@@ -1074,11 +1074,11 @@ get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
struct ipv6_hdr *ipv6_hdr;
struct ether_hdr *eth_hdr;

-   if (pkt->ol_flags & PKT_RX_IPV4_HDR) {
+   if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
if (rte_lpm_lookup(qconf->ipv4_lookup_struct, dst_ipv4,
&next_hop) != 0)
next_hop = portid;
-   } else if (pkt->ol_flags & PKT_RX_IPV6_HDR) {
+   } else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
if (rte_lpm6_lookup(qconf->ipv6_lookup_struct,
@@ -1112,17 +1112,19 @@ process_packet(struct lcore_conf *qconf, struct rte_mbuf *pkt,
ve = val_eth[dp];

dst_port[0] = dp;
-   rfc1812_process(ipv4_hdr, dst_port, pkt->ol_flags);
+   rfc1812_process(ipv4_hdr, dst_port, pkt->packet_type);

te =  _mm_blend_epi16(te, ve, MASK_ETH);
_mm_store_si128((__m128i *)eth_hdr, te);
 }

 /*
- * Read ol_flags and destination IPV4 addresses from 4 mbufs.
+ * Read packet_type and destination IPV4 addresses from 4 mbufs.
  */
 static inline void
-processx4_step1(struct rte_mbuf *pkt[FWDSTEP], __m128i *dip, uint32_t *flag)
+processx4_step1(struct rte_mbuf *pkt[FWDSTEP],
+   __m128i *dip,
+   uint32_t *ipv4_flag)
 {
struct ipv4_hdr *ipv4_hdr;
struct ether_hdr *eth_hdr;
@@ -1131,22 +1133,20 @@ processx4_step1(struct rte_mbuf *pkt[FWDSTEP], __m128i *dip, uint32_t *flag)
eth_hdr = rte_pktmbuf_mtod(pkt[0], struct ether_hdr *);
ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
x0 = ipv4_hdr->dst_addr;
-   flag[0] = pkt[0]->ol_flags & PKT_RX_IPV4_HDR;

eth_hdr = rte_pktmbuf_mtod(pkt[1], struct ether_hdr *);
ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
x1 = ipv4_hdr->dst_addr;
-   flag[0] &= pkt[1]->ol_flags;

eth_hdr = rte_pktmbuf_mtod(pkt[2], struct ether_hdr *);
ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
x2 = ipv4_hdr->dst_addr;
-   flag[0] &= pkt[2]->ol_flags;

eth_hdr = rte_pktmbuf_mtod(pkt[3], struct ether_hdr *);
ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
x3 = ipv4_hdr->dst_addr;
-   flag[0] &= pkt[3]->ol_flags;
+   *ipv4_flag = pkt[0]->packet_type & pkt[1]->packet_type &
+   pkt[2]->packet_type & pkt[3]->packet_type & RTE_PTYPE_L3_IPV4;

dip[0] = _mm_set_epi32(x3, x2, x1, x0);
 }
@@ -1156,8 +1156,12 @@ processx4_step1(struct rte_mbuf *pkt[FWDSTEP], __m128i *dip, uint32_t *flag)
  * If lookup fails, use incoming port (portid) as destination port.
  */
 static inline void
-processx4_step2(const struct lcore_conf *qconf, __m128i dip, uint32_t flag,
-   uint8_t portid, struct rte_mbuf *pkt[FWDSTEP], uint16_t dprt[FWDSTEP])
+processx4_step2(const struct lcore_conf *qconf,
+   __m128i dip,
+   uint32_t ipv4_flag,
+   uint8_t portid,
+   struct rte_mbuf *pkt[FWDSTEP],
+   uint16_t dprt[FWDSTEP])
 {
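The rewritten processx4_step1() replaces the running flag[0] &= ...ol_flags with a single AND across all four packet_type values. A minimal sketch of why that works, with hypothetical constants (the values below are assumptions, not the ones defined by this patch series): the result is nonzero only when every packet carries the IPv4 bit, which is valid only if the IPv4 indication is one bit shared by all IPv4 variants and absent from all IPv6 ones.

```c
#include <stdint.h>

/* Hypothetical packet-type values: the only property relied on is that
 * every IPv4 variant sets PTYPE_L3_IPV4 and no IPv6 variant does. */
#define PTYPE_L3_IPV4      0x10u
#define PTYPE_L3_IPV4_EXT  0x30u
#define PTYPE_L3_IPV6      0x40u

/* Nonzero only if all four packets carry the IPv4 bit, so the
 * vectorised IPv4 path can be taken for the whole group of four. */
static uint32_t all_ipv4(const uint32_t ptype[4])
{
	return ptype[0] & ptype[1] & ptype[2] & ptype[3] & PTYPE_L3_IPV4;
}
```

If the L3 types were instead encoded as arbitrary enumerated values in a shared field, AND-ing them could produce false positives, so this shortcut depends on how the packet-type bits are laid out.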

[dpdk-dev] [PATCH v2 12/15] examples/l3fwd-acl: support of unified packet type

2015-02-09 Thread Helin Zhang

To unify packet types among all PMDs, bit masks and relevant macros
of packet type for ol_flags are replaced by unified packet type and
relevant macros.

Signed-off-by: Helin Zhang 
---
 examples/l3fwd-acl/main.c | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

diff --git a/examples/l3fwd-acl/main.c b/examples/l3fwd-acl/main.c
index f1f7601..af70ccd 100644
--- a/examples/l3fwd-acl/main.c
+++ b/examples/l3fwd-acl/main.c
@@ -651,9 +651,7 @@ prepare_one_packet(struct rte_mbuf **pkts_in, struct acl_search_t *acl,
struct ipv4_hdr *ipv4_hdr;
struct rte_mbuf *pkt = pkts_in[index];

-   int type = pkt->ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV6_HDR);
-
-   if (type == PKT_RX_IPV4_HDR) {
+   if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {

ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
unsigned char *) + sizeof(struct ether_hdr));
@@ -674,8 +672,7 @@ prepare_one_packet(struct rte_mbuf **pkts_in, struct acl_search_t *acl,
rte_pktmbuf_free(pkt);
}

-   } else if (type == PKT_RX_IPV6_HDR) {
-
+   } else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
/* Fill acl structure */
acl->data_ipv6[acl->num_ipv6] = MBUF_IPV6_2PROTO(pkt);
acl->m_ipv6[(acl->num_ipv6)++] = pkt;
@@ -693,17 +690,13 @@ prepare_one_packet(struct rte_mbuf **pkts_in, struct acl_search_t *acl,
 {
struct rte_mbuf *pkt = pkts_in[index];

-   int type = pkt->ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV6_HDR);
-
-   if (type == PKT_RX_IPV4_HDR) {
-
+   if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
/* Fill acl structure */
acl->data_ipv4[acl->num_ipv4] = MBUF_IPV4_2PROTO(pkt);
acl->m_ipv4[(acl->num_ipv4)++] = pkt;


-   } else if (type == PKT_RX_IPV6_HDR) {
-
+   } else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
/* Fill acl structure */
acl->data_ipv6[acl->num_ipv6] = MBUF_IPV6_2PROTO(pkt);
acl->m_ipv6[(acl->num_ipv6)++] = pkt;
@@ -751,9 +744,9 @@ send_one_packet(struct rte_mbuf *m, uint32_t res)
/* in the ACL list, drop it */
 #ifdef L3FWDACL_DEBUG
if ((res & ACL_DENY_SIGNATURE) != 0) {
-   if (m->ol_flags & PKT_RX_IPV4_HDR)
+   if (RTE_ETH_IS_IPV4_HDR(m->packet_type))
dump_acl4_rule(m, res);
-   else
+   else if (RTE_ETH_IS_IPV6_HDR(m->packet_type))
dump_acl6_rule(m, res);
}
 #endif
-- 
1.9.3
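The l3fwd-acl changes replace the ol_flags comparisons with RTE_ETH_IS_IPV4_HDR() and RTE_ETH_IS_IPV6_HDR(). Their definitions are not part of this excerpt; the mbuf patch (01/15) only notes that the L3 values were chosen so that IPv4/IPv6 checks are fast. A minimal sketch, assuming the check reduces to a single bit test on the L3 values listed in that patch (every IPv4 variant has bit 4 set, every IPv6 variant has bit 6 set); the SKETCH_ names and helpers are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* L3 type values as listed in patch 01/15 (bits 7:4 of packet_type). */
#define SKETCH_PTYPE_L3_IPV4              0x00000010u
#define SKETCH_PTYPE_L3_IPV4_EXT          0x00000030u
#define SKETCH_PTYPE_L3_IPV6              0x00000040u
#define SKETCH_PTYPE_L3_IPV4_EXT_UNKNOWN  0x00000090u
#define SKETCH_PTYPE_L3_IPV6_EXT          0x000000c0u
#define SKETCH_PTYPE_L3_IPV6_EXT_UNKNOWN  0x000000e0u

/* Hypothetical helper: 0x10, 0x30 and 0x90 all have bit 4 set, and no
 * IPv6 value does, so one AND identifies any IPv4 header variant. */
static inline bool sketch_is_ipv4_hdr(uint32_t ptype)
{
	return (ptype & SKETCH_PTYPE_L3_IPV4) != 0;
}

/* Hypothetical helper: 0x40, 0xc0 and 0xe0 all have bit 6 set, and no
 * IPv4 value does. */
static inline bool sketch_is_ipv6_hdr(uint32_t ptype)
{
	return (ptype & SKETCH_PTYPE_L3_IPV6) != 0;
}
```

A packet whose L3 field is unknown (0) passes neither test, which is consistent with the debug path in send_one_packet now using `else if` instead of a bare `else`.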

[dpdk-dev] [PATCH v2 11/15] examples/ip_reassembly: support of unified packet type

2015-02-09 Thread Helin Zhang

To unify packet types among all PMDs, the packet type bit masks in ol_flags
and their related macros are replaced by the unified packet type and its
corresponding macros.

Signed-off-by: Helin Zhang 
---
 examples/ip_reassembly/main.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index 8492153..5ef2135 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -357,7 +357,7 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
dst_port = portid;

/* if packet is IPv4 */
-   if (m->ol_flags & (PKT_RX_IPV4_HDR)) {
+   if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
struct ipv4_hdr *ip_hdr;
uint32_t ip_dst;

@@ -397,9 +397,8 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
}

eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
-   }
-   /* if packet is IPv6 */
-   else if (m->ol_flags & (PKT_RX_IPV6_HDR | PKT_RX_IPV6_HDR_EXT)) {
+   } else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+   /* if packet is IPv6 */
struct ipv6_extension_fragment *frag_hdr;
struct ipv6_hdr *ip_hdr;

-- 
1.9.3

[dpdk-dev] [PATCH v2 10/15] examples/ip_fragmentation: support of unified packet type

2015-02-09 Thread Helin Zhang

To unify packet types among all PMDs, the packet type bit masks in ol_flags
and their related macros are replaced by the unified packet type and its
corresponding macros.

Signed-off-by: Helin Zhang 
---
 examples/ip_fragmentation/main.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index eac5427..152844e 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -286,7 +286,7 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
len = qconf->tx_mbufs[port_out].len;

/* if this is an IPv4 packet */
-   if (m->ol_flags & PKT_RX_IPV4_HDR) {
+   if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
struct ipv4_hdr *ip_hdr;
uint32_t ip_dst;
/* Read the lookup key (i.e. ip_dst) from the input packet */
@@ -320,9 +320,8 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
if (unlikely (len2 < 0))
return;
}
-   }
-   /* if this is an IPv6 packet */
-   else if (m->ol_flags & PKT_RX_IPV6_HDR) {
+   } else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+   /* if this is an IPv6 packet */
struct ipv6_hdr *ip_hdr;

ipv6 = 1;
-- 
1.9.3

[dpdk-dev] [PATCH v2 09/15] app/test: support of unified packet type

2015-02-09 Thread Helin Zhang

To unify packet types among all PMDs, the packet type bit masks in ol_flags
and their related macros are replaced by the unified packet type and its
corresponding macros.

Signed-off-by: Helin Zhang 
Signed-off-by: Jijiang Liu 
---
 app/test-pmd/csumonly.c | 6 +++---
 app/test-pmd/rxonly.c   | 9 +++--
 2 files changed, 6 insertions(+), 9 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 41711fd..5e08272 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -319,7 +319,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
uint16_t nb_tx;
uint16_t i;
uint64_t ol_flags;
-   uint16_t testpmd_ol_flags;
+   uint16_t testpmd_ol_flags, packet_type;
uint8_t l4_proto, l4_tun_len = 0;
uint16_t ethertype = 0, outer_ethertype = 0;
uint16_t l2_len = 0, l3_len = 0, l4_len = 0;
@@ -362,6 +362,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
tunnel = 0;
l4_tun_len = 0;
m = pkts_burst[i];
+   packet_type = m->packet_type;

/* Update the L3/L4 checksum error packet statistics */
rx_bad_ip_csum += ((m->ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
@@ -387,8 +388,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)

/* currently, this flag is set by i40e only if the
 * packet is vxlan */
-   } else if (m->ol_flags & (PKT_RX_TUNNEL_IPV4_HDR |
-   PKT_RX_TUNNEL_IPV6_HDR))
+   } else if (RTE_ETH_IS_TUNNEL_PKT(packet_type))
tunnel = 1;

if (tunnel == 1) {
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index fdfe990..8eb68c4 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -92,7 +92,7 @@ pkt_burst_receive(struct fwd_stream *fs)
uint64_t ol_flags;
uint16_t nb_rx;
uint16_t i, packet_type;
-   uint64_t is_encapsulation;
+   uint16_t is_encapsulation;

 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
uint64_t start_tsc;
@@ -135,10 +135,7 @@ pkt_burst_receive(struct fwd_stream *fs)
eth_type = RTE_BE_TO_CPU_16(eth_hdr->ether_type);
ol_flags = mb->ol_flags;
packet_type = mb->packet_type;
-
-   is_encapsulation = ol_flags & (PKT_RX_TUNNEL_IPV4_HDR |
-   PKT_RX_TUNNEL_IPV6_HDR);
-
+   is_encapsulation = RTE_ETH_IS_TUNNEL_PKT(packet_type);
print_ether_addr("  src=", &eth_hdr->s_addr);
print_ether_addr(" - dst=", &eth_hdr->d_addr);
printf(" - type=0x%04x - length=%u - nb_segs=%d",
@@ -174,7 +171,7 @@ pkt_burst_receive(struct fwd_stream *fs)
l2_len  = sizeof(struct ether_hdr);

 /* Do not support ipv4 option field */
-   if (ol_flags & PKT_RX_TUNNEL_IPV4_HDR) {
+   if (RTE_ETH_IS_IPV4_HDR(packet_type)) {
l3_len = sizeof(struct ipv4_hdr);
ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
unsigned char *) + l2_len);
-- 
1.9.3

[dpdk-dev] [PATCH v2 07/15] vmxnet3: support of unified packet type

2015-02-09 Thread Helin Zhang

To unify packet types among all PMDs, the packet type bit masks in
ol_flags are replaced by the unified packet type.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
index 8425f32..c85ebd8 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
@@ -650,9 +650,9 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
struct ipv4_hdr *ip = (struct ipv4_hdr *)(eth + 1);

if (((ip->version_ihl & 0xf) << 2) > (int)sizeof(struct ipv4_hdr))
-   rxm->ol_flags |= PKT_RX_IPV4_HDR_EXT;
+   rxm->packet_type = RTE_PTYPE_L3_IPV4_EXT;
else
-   rxm->ol_flags |= PKT_RX_IPV4_HDR;
+   rxm->packet_type = RTE_PTYPE_L3_IPV4;

if (!rcd->cnc) {
if (!rcd->ipc)
-- 
1.9.3
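The vmxnet3 hunk distinguishes RTE_PTYPE_L3_IPV4 from RTE_PTYPE_L3_IPV4_EXT by checking whether the IPv4 header is longer than the fixed 20-byte struct ipv4_hdr, i.e. whether it carries options. A self-contained sketch of that test on the version_ihl byte; the helper names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Minimum IPv4 header: 5 32-bit words, i.e. 20 bytes with no options. */
#define SKETCH_IPV4_MIN_HDR_LEN 20u

/* Hypothetical helper: header length in bytes from the version_ihl byte.
 * The low nibble (IHL) counts 32-bit words, hence the shift by 2. */
static inline size_t sketch_ipv4_hdr_len(uint8_t version_ihl)
{
	return (size_t)(version_ihl & 0x0f) << 2;
}

/* Hypothetical helper: nonzero when the header carries options, which is
 * the case the hunk maps to RTE_PTYPE_L3_IPV4_EXT. */
static inline int sketch_ipv4_has_options(uint8_t version_ihl)
{
	return sketch_ipv4_hdr_len(version_ihl) > SKETCH_IPV4_MIN_HDR_LEN;
}
```

A typical option-less header has version_ihl 0x45 (version 4, IHL 5), giving 20 bytes; 0x46 gives 24 bytes and therefore counts as an extended header.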

[dpdk-dev] [PATCH v2 05/15] i40e: support of unified packet type

2015-02-09 Thread Helin Zhang

To unify packet types among all PMDs, the packet type bit masks in
ol_flags are replaced by the unified packet type.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_rxtx.c | 786 ++--
 1 file changed, 512 insertions(+), 274 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c
index 2beae3c..bcb49f0 100644
--- a/lib/librte_pmd_i40e/i40e_rxtx.c
+++ b/lib/librte_pmd_i40e/i40e_rxtx.c
@@ -146,272 +146,511 @@ i40e_rxd_error_to_pkt_flags(uint64_t qword)
return flags;
 }

-/* Translate pkt types to pkt flags */
-static inline uint64_t
-i40e_rxd_ptype_to_pkt_flags(uint64_t qword)
+/* See the hardware datasheet for details on the meaning of each value */
+static inline uint32_t
+i40e_rxd_pkt_type_mapping(uint8_t ptype)
 {
-   uint8_t ptype = (uint8_t)((qword & I40E_RXD_QW1_PTYPE_MASK) >>
-   I40E_RXD_QW1_PTYPE_SHIFT);
-   static const uint64_t ip_ptype_map[I40E_MAX_PKT_TYPE] = {
-   0, /* PTYPE 0 */
-   0, /* PTYPE 1 */
-   0, /* PTYPE 2 */
-   0, /* PTYPE 3 */
-   0, /* PTYPE 4 */
-   0, /* PTYPE 5 */
-   0, /* PTYPE 6 */
-   0, /* PTYPE 7 */
-   0, /* PTYPE 8 */
-   0, /* PTYPE 9 */
-   0, /* PTYPE 10 */
-   0, /* PTYPE 11 */
-   0, /* PTYPE 12 */
-   0, /* PTYPE 13 */
-   0, /* PTYPE 14 */
-   0, /* PTYPE 15 */
-   0, /* PTYPE 16 */
-   0, /* PTYPE 17 */
-   0, /* PTYPE 18 */
-   0, /* PTYPE 19 */
-   0, /* PTYPE 20 */
-   0, /* PTYPE 21 */
-   PKT_RX_IPV4_HDR, /* PTYPE 22 */
-   PKT_RX_IPV4_HDR, /* PTYPE 23 */
-   PKT_RX_IPV4_HDR, /* PTYPE 24 */
-   0, /* PTYPE 25 */
-   PKT_RX_IPV4_HDR, /* PTYPE 26 */
-   PKT_RX_IPV4_HDR, /* PTYPE 27 */
-   PKT_RX_IPV4_HDR, /* PTYPE 28 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 29 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 30 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 31 */
-   0, /* PTYPE 32 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 33 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 34 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 35 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 36 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 37 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 38 */
-   0, /* PTYPE 39 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 40 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 41 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 42 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 43 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 44 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 45 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 46 */
-   0, /* PTYPE 47 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 48 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 49 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 50 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 51 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 52 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 53 */
-   0, /* PTYPE 54 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 55 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 56 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 57 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 58 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 59 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 60 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 61 */
-   0, /* PTYPE 62 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 63 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 64 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 65 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 66 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 67 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 68 */
-   0, /* PTYPE 69 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 70 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 71 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 72 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 73 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 74 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 75 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 76 */
-   0, /* PTYPE 77 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 78 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 79 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 80 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 81 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 82 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 83 */
-   0, /* PTYPE 84 */
-

[dpdk-dev] [PATCH v2 04/15] ixgbe: support of unified packet type for vector

2015-02-09 Thread Helin Zhang

To unify packet types among all PMDs, the packet type bit masks in
ol_flags are replaced by the unified packet type.
Note that a performance drop of around 2% (64-byte packets) was observed
when doing IO forwarding on 4 ports (1 port per 82599 card) on the same
SNB core.

Signed-off-by: Cunming Liang 
Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c | 49 +++
 1 file changed, 26 insertions(+), 23 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
index b54cb19..357eb1d 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
@@ -134,44 +134,35 @@ ixgbe_rxq_rearm(struct igb_rx_queue *rxq)
  */
 #ifdef RTE_IXGBE_RX_OLFLAGS_ENABLE

-#define OLFLAGS_MASK ((uint16_t)(PKT_RX_VLAN_PKT | PKT_RX_IPV4_HDR |\
-PKT_RX_IPV4_HDR_EXT | PKT_RX_IPV6_HDR |\
-PKT_RX_IPV6_HDR_EXT))
-#define OLFLAGS_MASK_V   (((uint64_t)OLFLAGS_MASK << 48) | \
- ((uint64_t)OLFLAGS_MASK << 32) | \
- ((uint64_t)OLFLAGS_MASK << 16) | \
- ((uint64_t)OLFLAGS_MASK))
-#define PTYPE_SHIFT    (1)
+#define OLFLAGS_MASK_V   (((uint64_t)PKT_RX_VLAN_PKT << 48) | \
+ ((uint64_t)PKT_RX_VLAN_PKT << 32) | \
+ ((uint64_t)PKT_RX_VLAN_PKT << 16) | \
+ ((uint64_t)PKT_RX_VLAN_PKT))
 #define VTAG_SHIFT (3)

 static inline void
 desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 {
-   __m128i ptype0, ptype1, vtag0, vtag1;
+   __m128i vtag0, vtag1;
union {
uint16_t e[4];
uint64_t dword;
} vol;

-   ptype0 = _mm_unpacklo_epi16(descs[0], descs[1]);
-   ptype1 = _mm_unpacklo_epi16(descs[2], descs[3]);
vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]);
vtag1 = _mm_unpackhi_epi16(descs[2], descs[3]);

-   ptype1 = _mm_unpacklo_epi32(ptype0, ptype1);
vtag1 = _mm_unpacklo_epi32(vtag0, vtag1);
-
-   ptype1 = _mm_slli_epi16(ptype1, PTYPE_SHIFT);
vtag1 = _mm_srli_epi16(vtag1, VTAG_SHIFT);

-   ptype1 = _mm_or_si128(ptype1, vtag1);
-   vol.dword = _mm_cvtsi128_si64(ptype1) & OLFLAGS_MASK_V;
+   vol.dword = _mm_cvtsi128_si64(vtag1) & OLFLAGS_MASK_V;

rx_pkts[0]->ol_flags = vol.e[0];
rx_pkts[1]->ol_flags = vol.e[1];
rx_pkts[2]->ol_flags = vol.e[2];
rx_pkts[3]->ol_flags = vol.e[3];
 }
+
 #else
 #define desc_to_olflags_v(desc, rx_pkts) do {} while (0)
 #endif
@@ -197,13 +188,15 @@ _recv_raw_pkts_vec(struct igb_rx_queue *rxq, struct rte_mbuf **rx_pkts,
uint64_t var;
__m128i shuf_msk;
__m128i crc_adjust = _mm_set_epi16(
-   0, 0, 0, 0, /* ignore non-length fields */
+   0, 0, 0,/* ignore non-length fields */
+   -rxq->crc_len, /* sub crc on data_len */
0,  /* ignore high-16bits of pkt_len */
-rxq->crc_len, /* sub crc on pkt_len */
-   -rxq->crc_len, /* sub crc on data_len */
-   0/* ignore pkt_type field */
+   0, 0/* ignore pkt_type field */
);
__m128i dd_check, eop_check;
+   __m128i desc_mask = _mm_set_epi32(0x, 0x,
+ 0x, 0x07F0);

if (unlikely(nb_pkts < RTE_IXGBE_VPMD_RX_BURST))
return 0;
@@ -234,12 +227,13 @@ _recv_raw_pkts_vec(struct igb_rx_queue *rxq, struct rte_mbuf **rx_pkts,
/* mask to shuffle from desc. to mbuf */
shuf_msk = _mm_set_epi8(
7, 6, 5, 4,  /* octet 4~7, 32bits rss */
-   0xFF, 0xFF,  /* skip high 16 bits vlan_macip, zero out */
15, 14,  /* octet 14~15, low 16 bits vlan_macip */
+   13, 12,  /* octet 12~13, 16 bits data_len */
0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
13, 12,  /* octet 12~13, low 16 bits pkt_len */
-   13, 12,  /* octet 12~13, 16 bits data_len */
-   0xFF, 0xFF   /* skip pkt_type field */
+   0xFF, 0xFF,  /* skip high 16 bits pkt_type */
+   1,   /* octet 1, 8 bits pkt_type field */
+   0/* octet 0, 4 bits offset 4 pkt_type field */
);

/* Cache is empty -> need to scan the buffer rings, but first move
@@ -248,6 +242,7 @@ _recv_raw_pkts_vec(struct igb_rx_queue *rxq, struct rte_mbuf **rx_pkts,

/*
 * A. load 4 packet in one loop
+* [A*. mask out 4 unused dirty field in desc]
 * B. copy 4 mbuf

[dpdk-dev] [PATCH v2 03/15] ixgbe: support of unified packet type

2015-02-09 Thread Helin Zhang

To unify packet types among all PMDs, the packet type bit masks in
ol_flags are replaced by the unified packet type.
Note that a performance drop of around 2.5% (64-byte packets) was observed
when doing IO forwarding on 4 ports (1 port per 82599 card) on the same
SNB core.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 146 +-
 1 file changed, 112 insertions(+), 34 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index e6766b3..a2e4234 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -866,40 +866,107 @@ end_of_tx:
  *  RX functions
  *
  **/
-static inline uint64_t
-rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
+#define IXGBE_PACKET_TYPE_IPV4  0X01
+#define IXGBE_PACKET_TYPE_IPV4_TCP  0X11
+#define IXGBE_PACKET_TYPE_IPV4_UDP  0X21
+#define IXGBE_PACKET_TYPE_IPV4_SCTP 0X41
+#define IXGBE_PACKET_TYPE_IPV4_EXT  0X03
+#define IXGBE_PACKET_TYPE_IPV4_EXT_SCTP 0X43
+#define IXGBE_PACKET_TYPE_IPV6  0X04
+#define IXGBE_PACKET_TYPE_IPV6_TCP  0X14
+#define IXGBE_PACKET_TYPE_IPV6_UDP  0X24
+#define IXGBE_PACKET_TYPE_IPV6_EXT  0X0C
+#define IXGBE_PACKET_TYPE_IPV6_EXT_TCP  0X1C
+#define IXGBE_PACKET_TYPE_IPV6_EXT_UDP  0X2C
+#define IXGBE_PACKET_TYPE_IPV4_IPV6 0X05
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_TCP 0X15
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_UDP 0X25
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_EXT 0X0D
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_TCP 0X1D
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_UDP 0X2D
+#define IXGBE_PACKET_TYPE_MAX   0X80
+#define IXGBE_PACKET_TYPE_MASK  0X7F
+#define IXGBE_PACKET_TYPE_SHIFT 0X04
+static inline uint32_t
+ixgbe_rxd_pkt_info_to_pkt_type(uint16_t pkt_info)
 {
-   uint64_t pkt_flags;
-
-   static uint64_t ip_pkt_types_map[16] = {
-   0, PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT, PKT_RX_IPV4_HDR_EXT,
-   PKT_RX_IPV6_HDR, 0, 0, 0,
-   PKT_RX_IPV6_HDR_EXT, 0, 0, 0,
-   PKT_RX_IPV6_HDR_EXT, 0, 0, 0,
+   static const uint32_t
+   ptype_table[IXGBE_PACKET_TYPE_MAX] __rte_cache_aligned = {
+   [IXGBE_PACKET_TYPE_IPV4] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4,
+   [IXGBE_PACKET_TYPE_IPV4_EXT] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4_EXT,
+   [IXGBE_PACKET_TYPE_IPV6] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV6,
+   [IXGBE_PACKET_TYPE_IPV4_IPV6] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6,
+   [IXGBE_PACKET_TYPE_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV6_EXT,
+   [IXGBE_PACKET_TYPE_IPV4_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6_EXT,
+   [IXGBE_PACKET_TYPE_IPV4_TCP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP,
+   [IXGBE_PACKET_TYPE_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_TCP,
+   [IXGBE_PACKET_TYPE_IPV4_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_TCP,
+   [IXGBE_PACKET_TYPE_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_TCP,
+   [IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_TCP,
+   [IXGBE_PACKET_TYPE_IPV4_UDP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP,
+   [IXGBE_PACKET_TYPE_IPV6_UDP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_UDP,
+   [IXGBE_PACKET_TYPE_IPV4_IPV6_UDP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_UDP,
+   [IXGBE_PACKET_TYPE_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_UDP,
+   [IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_UDP,
+   [IXGBE_PACKET_TYPE_IPV4_SCTP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 |

[dpdk-dev] [PATCH v2 02/15] e1000: support of unified packet type

2015-02-09 Thread Helin Zhang

To unify packet types among all PMDs, the packet type bit masks in
ol_flags are replaced by the unified packet type.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_e1000/igb_rxtx.c | 98 ++---
 1 file changed, 83 insertions(+), 15 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index 5c394a9..12a68f4 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -602,17 +602,85 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
  *  RX functions
  *
  **/
+#define IGB_PACKET_TYPE_IPV4  0X01
+#define IGB_PACKET_TYPE_IPV4_TCP  0X11
+#define IGB_PACKET_TYPE_IPV4_UDP  0X21
+#define IGB_PACKET_TYPE_IPV4_SCTP 0X41
+#define IGB_PACKET_TYPE_IPV4_EXT  0X03
+#define IGB_PACKET_TYPE_IPV4_EXT_SCTP 0X43
+#define IGB_PACKET_TYPE_IPV6  0X04
+#define IGB_PACKET_TYPE_IPV6_TCP  0X14
+#define IGB_PACKET_TYPE_IPV6_UDP  0X24
+#define IGB_PACKET_TYPE_IPV6_EXT  0X0C
+#define IGB_PACKET_TYPE_IPV6_EXT_TCP  0X1C
+#define IGB_PACKET_TYPE_IPV6_EXT_UDP  0X2C
+#define IGB_PACKET_TYPE_IPV4_IPV6 0X05
+#define IGB_PACKET_TYPE_IPV4_IPV6_TCP 0X15
+#define IGB_PACKET_TYPE_IPV4_IPV6_UDP 0X25
+#define IGB_PACKET_TYPE_IPV4_IPV6_EXT 0X0D
+#define IGB_PACKET_TYPE_IPV4_IPV6_EXT_TCP 0X1D
+#define IGB_PACKET_TYPE_IPV4_IPV6_EXT_UDP 0X2D
+#define IGB_PACKET_TYPE_MAX   0X80
+#define IGB_PACKET_TYPE_MASK  0X7F
+#define IGB_PACKET_TYPE_SHIFT 0X04
+static inline uint32_t
+igb_rxd_pkt_info_to_pkt_type(uint16_t pkt_info)
+{
+   static const uint32_t
+   ptype_table[IGB_PACKET_TYPE_MAX] __rte_cache_aligned = {
+   [IGB_PACKET_TYPE_IPV4] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4,
+   [IGB_PACKET_TYPE_IPV4_EXT] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4_EXT,
+   [IGB_PACKET_TYPE_IPV6] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6,
+   [IGB_PACKET_TYPE_IPV4_IPV6] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6,
+   [IGB_PACKET_TYPE_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV6_EXT,
+   [IGB_PACKET_TYPE_IPV4_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6_EXT,
+   [IGB_PACKET_TYPE_IPV4_TCP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP,
+   [IGB_PACKET_TYPE_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_TCP,
+   [IGB_PACKET_TYPE_IPV4_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_TCP,
+   [IGB_PACKET_TYPE_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_TCP,
+   [IGB_PACKET_TYPE_IPV4_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_TCP,
+   [IGB_PACKET_TYPE_IPV4_UDP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP,
+   [IGB_PACKET_TYPE_IPV6_UDP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_UDP,
+   [IGB_PACKET_TYPE_IPV4_IPV6_UDP] =  RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_UDP,
+   [IGB_PACKET_TYPE_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_UDP,
+   [IGB_PACKET_TYPE_IPV4_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_UDP,
+   [IGB_PACKET_TYPE_IPV4_SCTP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_SCTP,
+   [IGB_PACKET_TYPE_IPV4_EXT_SCTP] = RTE_PTYPE_L2_MAC |
+   RTE_PTYPE_L3_IPV4_EXT | RTE_PTYPE_L4_SCTP,
+   };
+   if (unlikely(pkt_info & E1000_RXDADV_PKTTYPE_ETQF))
+   return RTE_PTYPE_UNKNOWN;
+
+   pkt_info = (pkt_info >> IGB_PACKET_TYPE_SHIFT) & IGB_PACKET_TYPE_MASK;
+
+   return ptype_table[pkt_info];
+}
+
 static inline uint64_t
 rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 {
-   uint64_t pkt_flags;
-
-   static uint64_t ip_pkt_types_map[16] = {
-   0, PKT_RX_IPV4_HDR,
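The igb translation in igb_rxd_pkt_info_to_pkt_type() above (and its ixgbe counterpart) drops the low 4 bits of the descriptor's packet-type info, masks the result to 7 bits, and uses it to index a 128-entry table of unified values; descriptors flagged with E1000_RXDADV_PKTTYPE_ETQF return RTE_PTYPE_UNKNOWN directly. A reduced sketch of the same lookup with only three table entries; the SK_ names are hypothetical, and the ETQF bit value is an assumption rather than the real E1000_RXDADV_PKTTYPE_ETQF definition:

```c
#include <stdint.h>

/* Index values copied from the igb hunk above. */
#define SK_PACKET_TYPE_IPV4      0x01
#define SK_PACKET_TYPE_IPV4_TCP  0x11
#define SK_PACKET_TYPE_IPV6_UDP  0x24
#define SK_PACKET_TYPE_MAX       0x80
#define SK_PACKET_TYPE_MASK      0x7F
#define SK_PACKET_TYPE_SHIFT     0x04

/* Unified values following patch 01/15 (L2 bits 3:0, L3 7:4, L4 11:8). */
#define SK_PTYPE_UNKNOWN  0x00000000u
#define SK_PTYPE_L2_MAC   0x00000001u
#define SK_PTYPE_L3_IPV4  0x00000010u
#define SK_PTYPE_L3_IPV6  0x00000040u
#define SK_PTYPE_L4_TCP   0x00000100u
#define SK_PTYPE_L4_UDP   0x00000200u

/* Assumed stand-in for E1000_RXDADV_PKTTYPE_ETQF. */
#define SK_PKTTYPE_ETQF   0x8000u

static uint32_t sk_pkt_info_to_ptype(uint16_t pkt_info)
{
	/* Entries without an initializer are 0, i.e. unknown here. */
	static const uint32_t table[SK_PACKET_TYPE_MAX] = {
		[SK_PACKET_TYPE_IPV4] = SK_PTYPE_L2_MAC | SK_PTYPE_L3_IPV4,
		[SK_PACKET_TYPE_IPV4_TCP] = SK_PTYPE_L2_MAC |
			SK_PTYPE_L3_IPV4 | SK_PTYPE_L4_TCP,
		[SK_PACKET_TYPE_IPV6_UDP] = SK_PTYPE_L2_MAC |
			SK_PTYPE_L3_IPV6 | SK_PTYPE_L4_UDP,
	};

	if (pkt_info & SK_PKTTYPE_ETQF)
		return SK_PTYPE_UNKNOWN;

	/* Drop the low 4 bits, then keep a 7-bit index into the table. */
	return table[(pkt_info >> SK_PACKET_TYPE_SHIFT) & SK_PACKET_TYPE_MASK];
}
```

Because the index is masked to 7 bits before the lookup, it can never exceed SK_PACKET_TYPE_MAX - 1, so the table access needs no separate bounds check.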

[dpdk-dev] [PATCH v2 01/15] mbuf: add definitions of unified packet types

2015-02-09 Thread Helin Zhang

There are only 6 bit flags in ol_flags for indicating packet types, which
is not enough to describe all the packet types that hardware can
recognize; for example, i40e hardware can recognize more than 150 packet
types. The unified packet type is composed of tunnel type, L3 type, L4
type and inner L3 type fields, and is stored in the mbuf 'packet_type'
field, which is enlarged from 16 bits to 32 bits in the mbuf structure.
Accordingly, the 'rte_kni_mbuf' structure needs to be modified as well.

Signed-off-by: Helin Zhang 
Signed-off-by: Cunming Liang 
Signed-off-by: Jijiang Liu 
---
 .../linuxapp/eal/include/exec-env/rte_kni_common.h |   4 +-
 lib/librte_mbuf/rte_mbuf.h | 113 +++--
 2 files changed, 108 insertions(+), 9 deletions(-)

v2 changes:
* Enlarged the packet_type field from 16 bits to 32 bits.
* Redefined the packet type sub-fields.
* Updated the 'struct rte_kni_mbuf' for KNI according to the mbuf changes.

diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index 1e55c2d..bd1cc09 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -117,9 +117,9 @@ struct rte_kni_mbuf {
uint16_t data_off;  /**< Start address of data in segment buffer. */
char pad1[4];
uint64_t ol_flags;  /**< Offload features. */
-   char pad2[2];
-   uint16_t data_len;  /**< Amount of data in segment buffer. */
+   char pad2[4];
uint32_t pkt_len;   /**< Total pkt len: sum of all segment data_len. */
+   uint16_t data_len;  /**< Amount of data in segment buffer. */

/* fields on second cache line */
char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 16059c6..ee912d6 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -165,6 +165,96 @@ extern "C" {
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG   (1ULL << 63) /**< Mbuf contains control data */

+/*
+ * The 32 bits are divided into several fields to mark packet types. Note
+ * that each field holds an enumerated value (an index), not a set of
+ * independent bit flags.
+ * - Bits 3:0 are for L2 types.
+ * - Bits 7:4 are for L3 or outer L3 (for tunneling case) types.
+ * - Bits 11:8 are for L4 or outer L4 (for tunneling case) types.
+ * - Bits 15:12 are for tunnel types.
+ * - Bits 19:16 are for inner L2 types.
+ * - Bits 23:20 are for inner L3 types.
+ * - Bits 27:24 are for inner L4 types.
+ * - Bits 31:28 are reserved.
+ *
+ * To be compatible with the vector PMD, RTE_PTYPE_L3_IPV4,
+ * RTE_PTYPE_L3_IPV4_EXT, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT,
+ * RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP and RTE_PTYPE_L4_SCTP must keep the
+ * values below, within 7 contiguous bits.
+ *
+ * Note that the L3 type values were chosen so that IPv4/IPv6 headers can
+ * be checked with minimal overhead. Read the comments on
+ * RTE_ETH_IS_IPV4_HDR and RTE_ETH_IS_IPV6_HDR before making any future
+ * changes to the L3 type values.
+ */
+#define RTE_PTYPE_UNKNOWN               0x00000000
+/* bit 3:0 for L2 types */
+#define RTE_PTYPE_L2_MAC                0x00000001
+#define RTE_PTYPE_L2_MAC_TIMESYNC       0x00000002
+#define RTE_PTYPE_L2_ARP                0x00000003
+#define RTE_PTYPE_L2_LLDP               0x00000004
+#define RTE_PTYPE_L2_MASK               0x0000000f
+/* bit 7:4 for L3 types */
+#define RTE_PTYPE_L3_IPV4               0x00000010
+#define RTE_PTYPE_L3_IPV4_EXT           0x00000030
+#define RTE_PTYPE_L3_IPV6               0x00000040
+#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN   0x00000090
+#define RTE_PTYPE_L3_IPV6_EXT           0x000000c0
+#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN   0x000000e0
+#define RTE_PTYPE_L3_MASK               0x000000f0
+/* bit 11:8 for L4 types */
+#define RTE_PTYPE_L4_TCP                0x00000100
+#define RTE_PTYPE_L4_UDP                0x00000200
+#define RTE_PTYPE_L4_FRAG               0x00000300
+#define RTE_PTYPE_L4_SCTP               0x00000400
+#define RTE_PTYPE_L4_ICMP               0x00000500
+#define RTE_PTYPE_L4_NONFRAG            0x00000600
+#define RTE_PTYPE_L4_MASK               0x00000f00
+/* bit 15:12 for tunnel types */
+#define RTE_PTYPE_TUNNEL_IP             0x00001000
+#define RTE_PTYPE_TUNNEL_GRE            0x00002000
+#define RTE_PTYPE_TUNNEL_VXLAN          0x00003000
+#define RTE_PTYPE_TUNNEL_NVGRE          0x00004000
+#define RTE_PTYPE_TUNNEL_GENEVE         0x00005000
+#define RTE_PTYPE_TUNNEL_GRENAT         0x00006000
+#define RTE_PTYPE_TUNNEL_MASK           0x0000f000
+/* bit 19:16 for inner L2 types */
+#define RTE_PTYPE_INNER_L2_MAC          0x00010000
+#define RTE_PTYPE_INNER_L2_MAC_VLAN     0x00020000
+#define RTE_PTYPE_INNER_L2_MASK         0x000f0000
+/* bit 23:20 for inner

[dpdk-dev] [PATCH v2 00/15] unified packet type

2015-02-09 Thread Helin Zhang

Currently only 6 bits stored in ol_flags are used to indicate packet types.
This is not enough, as some NIC hardware can recognize many more packet
types; e.g. i40e hardware can recognize more than 150 packet types. Hiding
those packet types hides hardware offload capabilities that could be quite
useful to end users and for improving performance. A unified packet type
is therefore needed to support all possible PMDs. The 16-bit packet_type
field in the mbuf structure can be enlarged to 32 bits and used for this
purpose. In addition, all packet types stored in the ol_flags field should
be removed entirely, which frees up 6 bits of ol_flags.

The 32-bit packet_type is divided into several sub-fields, each describing
one aspect of the packet's type. The initial design defines fields for L2
types, L3 types, L4 types, tunnel types, inner L2 types, inner L3 types and
inner L4 types. Every PMD should translate the packet types recognized by
its hardware into these 7 fields for use by applications.
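As a sketch of how an application might read the seven sub-fields, the helper below masks and shifts each 4-bit field out of a 32-bit packet_type, using the bit positions from the mbuf patch (01/15). The SK_ names and the decoded struct are hypothetical, and the inner L3/L4 masks are inferred from the documented layout rather than copied from the patch:

```c
#include <stdint.h>

/* Field masks following the bit layout described in patch 01/15. */
#define SK_PTYPE_L2_MASK        0x0000000fu
#define SK_PTYPE_L3_MASK        0x000000f0u
#define SK_PTYPE_L4_MASK        0x00000f00u
#define SK_PTYPE_TUNNEL_MASK    0x0000f000u
#define SK_PTYPE_INNER_L2_MASK  0x000f0000u
#define SK_PTYPE_INNER_L3_MASK  0x00f00000u
#define SK_PTYPE_INNER_L4_MASK  0x0f000000u

/* Hypothetical decoded view of a packet_type value. */
struct sk_ptype_fields {
	uint8_t l2, l3, l4, tunnel, inner_l2, inner_l3, inner_l4;
};

/* Mask each 4-bit field and shift it down to its enumerated value. */
static struct sk_ptype_fields sk_ptype_split(uint32_t ptype)
{
	struct sk_ptype_fields f;

	f.l2       = (uint8_t)(ptype & SK_PTYPE_L2_MASK);
	f.l3       = (uint8_t)((ptype & SK_PTYPE_L3_MASK) >> 4);
	f.l4       = (uint8_t)((ptype & SK_PTYPE_L4_MASK) >> 8);
	f.tunnel   = (uint8_t)((ptype & SK_PTYPE_TUNNEL_MASK) >> 12);
	f.inner_l2 = (uint8_t)((ptype & SK_PTYPE_INNER_L2_MASK) >> 16);
	f.inner_l3 = (uint8_t)((ptype & SK_PTYPE_INNER_L3_MASK) >> 20);
	f.inner_l4 = (uint8_t)((ptype & SK_PTYPE_INNER_L4_MASK) >> 24);
	return f;
}

/* Any non-zero tunnel field means the packet is encapsulated. */
static int sk_is_tunnel(uint32_t ptype)
{
	return (ptype & SK_PTYPE_TUNNEL_MASK) != 0;
}
```

Because each field is an enumerated value, most fields are compared with `==` after masking; the IPv4/IPv6 L3 values are the exception noted in the patch, since they were chosen so that a single bit test suffices.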

v2 changes:
* Enlarged the packet_type field from 16 bits to 32 bits.
* Redefined the packet type sub-fields.
* Updated the 'struct rte_kni_mbuf' for KNI according to the mbuf changes.
* Used redefined packet types and enlarged packet_type field for all PMDs
  and corresponding applications.
* Removed the changes to bonding and its related application, as they are
  no longer needed after the recent bonding changes.

Helin Zhang (15):
  mbuf: add definitions of unified packet types
  e1000: support of unified packet type
  ixgbe: support of unified packet type
  ixgbe: support of unified packet type for vector
  i40e: support of unified packet type
  enic: support of unified packet type
  vmxnet3: support of unified packet type
  app/test-pipeline: support of unified packet type
  app/test: support of unified packet type
  examples/ip_fragmentation: support of unified packet type
  examples/ip_reassembly: support of unified packet type
  examples/l3fwd-acl: support of unified packet type
  examples/l3fwd-power: support of unified packet type
  examples/l3fwd: support of unified packet type
  mbuf: remove old packet type bit masks

 app/test-pipeline/pipeline_hash.c                  |   7 +-
 app/test-pmd/csumonly.c                            |   6 +-
 app/test-pmd/rxonly.c                              |   9 +-
 examples/ip_fragmentation/main.c                   |   7 +-
 examples/ip_reassembly/main.c                      |   7 +-
 examples/l3fwd-acl/main.c                          |  19 +-
 examples/l3fwd-power/main.c                        |   5 +-
 examples/l3fwd/main.c                              |  64 +-
 .../linuxapp/eal/include/exec-env/rte_kni_common.h |   4 +-
 lib/librte_mbuf/rte_mbuf.c                         |   6 -
 lib/librte_mbuf/rte_mbuf.h                         | 127 +++-
 lib/librte_pmd_e1000/igb_rxtx.c                    |  98 ++-
 lib/librte_pmd_enic/enic_main.c                    |  14 +-
 lib/librte_pmd_i40e/i40e_rxtx.c                    | 786 ++---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c                  | 146 +++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c              |  49 +-
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c              |   4 +-
 17 files changed, 914 insertions(+), 444 deletions(-)

-- 
1.9.3

[dpdk-dev] [PATCH 2/2] i40e:enable TSO support

2015-02-09 Thread Jijiang Liu

This patch enables the i40e TSO feature for both non-tunneled and tunneled
packets.

Signed-off-by: Jijiang Liu 
Signed-off-by: Miroslaw Walukiewicz 
---
 lib/librte_pmd_i40e/i40e_rxtx.c |   99 ---
 lib/librte_pmd_i40e/i40e_rxtx.h |   13 +
 2 files changed, 85 insertions(+), 27 deletions(-)

diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c
index 349c1e5..9b9bdcd 100644
--- a/lib/librte_pmd_i40e/i40e_rxtx.c
+++ b/lib/librte_pmd_i40e/i40e_rxtx.c
@@ -465,16 +465,13 @@ static inline void
 i40e_txd_enable_checksum(uint64_t ol_flags,
uint32_t *td_cmd,
uint32_t *td_offset,
-   uint8_t l2_len,
-   uint16_t l3_len,
-   uint8_t outer_l2_len,
-   uint16_t outer_l3_len,
+   union i40e_tx_offload tx_offload,
uint32_t *cd_tunneling)
 {
/* UDP tunneling packet TX checksum offload */
if (unlikely(ol_flags & PKT_TX_OUTER_IP_CKSUM)) {

-   *td_offset |= (outer_l2_len >> 1)
+   *td_offset |= (tx_offload.outer_l2_len >> 1)
<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;

if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
@@ -485,25 +482,35 @@ i40e_txd_enable_checksum(uint64_t ol_flags,
*cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV6;

/* Now set the ctx descriptor fields */
-   *cd_tunneling |= (outer_l3_len >> 2) <<
+   *cd_tunneling |= (tx_offload.outer_l3_len >> 2) <<
I40E_TXD_CTX_QW0_EXT_IPLEN_SHIFT |
-   (l2_len >> 1) <<
+   (tx_offload.l2_len >> 1) <<
I40E_TXD_CTX_QW0_NATLEN_SHIFT;

} else
-   *td_offset |= (l2_len >> 1)
+   *td_offset |= (tx_offload.l2_len >> 1)
<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;

/* Enable L3 checksum offloads */
if (ol_flags & PKT_TX_IP_CKSUM) {
*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4_CSUM;
-   *td_offset |= (l3_len >> 2) << I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+   *td_offset |= (tx_offload.l3_len >> 2)
+   << I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
} else if (ol_flags & PKT_TX_IPV4) {
*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4;
-   *td_offset |= (l3_len >> 2) << I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+   *td_offset |= (tx_offload.l3_len >> 2)
+<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
} else if (ol_flags & PKT_TX_IPV6) {
*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV6;
-   *td_offset |= (l3_len >> 2) << I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+   *td_offset |= (tx_offload.l3_len >> 2)
+<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+   }
+
+   if (ol_flags & PKT_TX_TCP_SEG) {
+   *td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
+   *td_offset |= (tx_offload.l4_len >> 2)
+   << I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+   return;
}

/* Enable L4 checksum offloads */
@@ -1154,7 +1161,7 @@ i40e_calc_context_desc(uint64_t flags)
 {
uint64_t mask = 0ULL;

-   mask |= PKT_TX_OUTER_IP_CKSUM;
+   mask |= (PKT_TX_OUTER_IP_CKSUM | PKT_TX_TCP_SEG);

 #ifdef RTE_LIBRTE_IEEE1588
mask |= PKT_TX_IEEE1588_TMST;
@@ -1165,6 +1172,41 @@ i40e_calc_context_desc(uint64_t flags)
return 0;
 }

+/* set i40e TSO context descriptor */
+static inline uint64_t
+i40e_set_tso_ctx(struct rte_mbuf *mbuf, union i40e_tx_offload tx_offload)
+{
+
+   uint64_t ctx_desc = 0;
+   uint32_t cd_cmd, hdr_len, cd_tso_len;
+
+
+   if (!tx_offload.l4_len) {
+   PMD_DRV_LOG(DEBUG, "L4 length set to 0");
+   return ctx_desc;
+   }
+
+   /**
+* in case of tunneling packet, the outer_l2_len and
+* outer_l3_len must be 0.
+*/
+   hdr_len = tx_offload.outer_l2_len +
+   tx_offload.outer_l3_len +
+   tx_offload.l2_len +
+   tx_offload.l3_len +
+   tx_offload.l4_len;
+
+   cd_cmd = I40E_TX_CTX_DESC_TSO;
+   cd_tso_len = mbuf->pkt_len - hdr_len;
+   ctx_desc |= ((uint64_t)cd_cmd << I40E_TXD_CTX_QW1_CMD_SHIFT) |
+   ((uint64_t)cd_tso_len <<
+I40E_TXD_CTX_QW1_TSO_LEN_SHIFT) |
+   ((uint64_t)mbuf->tso_segsz <<
+   I40E_TXD_CTX_QW1_MSS_SHIFT);
+
+   return ctx_desc;
+}
+
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -1183,15 +1225,12 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
uint32_t tx_flags;
uint32_t td_tag;
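The TSO context-descriptor arithmetic in i40e_set_tso_ctx() can be checked in isolation with a small standalone sketch. It is not driver code: `struct tso_hdr_lens`, `tso_hdr_len()`, `tso_ctx_qw1()` and `tso_nb_segs()` are hypothetical names, and the `SKETCH_*` constants are assumed to match I40E_TX_CTX_DESC_TSO and the I40E_TXD_CTX_QW1_*_SHIFT values in i40e_type.h (verify them against the real header before reuse).

```c
#include <stdint.h>

/* Hypothetical stand-in for the length fields of union i40e_tx_offload
 * that i40e_set_tso_ctx() reads; all lengths are in bytes. */
struct tso_hdr_lens {
	uint16_t outer_l2_len;
	uint16_t outer_l3_len;
	uint16_t l2_len;
	uint16_t l3_len;
	uint16_t l4_len;
};

/* Assumed to mirror i40e_type.h; check the real header before relying on them. */
#define SKETCH_CTX_DESC_TSO          0x01ULL
#define SKETCH_CTX_QW1_CMD_SHIFT     4
#define SKETCH_CTX_QW1_TSO_LEN_SHIFT 30
#define SKETCH_CTX_QW1_MSS_SHIFT     50

/* Total header length replicated in front of every TSO segment. */
uint32_t tso_hdr_len(const struct tso_hdr_lens *h)
{
	return (uint32_t)h->outer_l2_len + h->outer_l3_len +
	       h->l2_len + h->l3_len + h->l4_len;
}

/* Build quad-word 1 of a TSO context descriptor the way the patch does:
 * TSO command, payload length (pkt_len - hdr_len) and MSS.
 * Returns 0 when l4_len is 0, mirroring the patch's early return. */
uint64_t tso_ctx_qw1(uint32_t pkt_len, uint16_t mss,
		     const struct tso_hdr_lens *h)
{
	uint32_t tso_len;

	if (h->l4_len == 0)
		return 0;

	tso_len = pkt_len - tso_hdr_len(h);
	return (SKETCH_CTX_DESC_TSO << SKETCH_CTX_QW1_CMD_SHIFT) |
	       ((uint64_t)tso_len << SKETCH_CTX_QW1_TSO_LEN_SHIFT) |
	       ((uint64_t)mss << SKETCH_CTX_QW1_MSS_SHIFT);
}

/* Number of segments the NIC will put on the wire: ceil(payload / MSS). */
uint32_t tso_nb_segs(uint32_t pkt_len, uint16_t mss,
		     const struct tso_hdr_lens *h)
{
	uint32_t payload = pkt_len - tso_hdr_len(h);

	return (payload + mss - 1) / mss;
}
```

For a plain TCP/IPv4 frame of 4054 bytes with 54 header bytes (14 + 20 + 20) and an MSS of 1460, the TSO payload length is 4000 bytes and the NIC emits three segments.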

[dpdk-dev] [PATCH 1/2] i40e:advertise TSO capability

2015-02-09 Thread Jijiang Liu

Advertise the DEV_TX_OFFLOAD_TCP_TSO flag in the PMD features, indicating
that the i40e PMD supports TSO offload.

Signed-off-by: Jijiang Liu 
Signed-off-by: Miroslaw Walukiewicz 
---
 lib/librte_pmd_i40e/i40e_ethdev.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index 6f385d2..fd625f9 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -1530,7 +1530,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
DEV_TX_OFFLOAD_UDP_CKSUM |
DEV_TX_OFFLOAD_TCP_CKSUM |
DEV_TX_OFFLOAD_SCTP_CKSUM |
-   DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM;
+   DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM |
+   DEV_TX_OFFLOAD_TCP_TSO;
dev_info->reta_size = pf->hash_lut_size;

dev_info->default_rxconf = (struct rte_eth_rxconf) {
-- 
1.7.7.6
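Once the flag is advertised, an application can check it before setting PKT_TX_TCP_SEG on its mbufs. A minimal sketch, assuming the application has copied `dev_info.tx_offload_capa` (filled by rte_eth_dev_info_get()) into a plain integer; `port_can_tso()` and the `SKETCH_TX_OFFLOAD_*` bit values are hypothetical placeholders so the snippet builds without DPDK headers, and real code should use the DEV_TX_OFFLOAD_* macros from rte_ethdev.h instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholders; use DEV_TX_OFFLOAD_TCP_CKSUM and
 * DEV_TX_OFFLOAD_TCP_TSO from rte_ethdev.h in real code. */
#define SKETCH_TX_OFFLOAD_TCP_CKSUM (1u << 3)
#define SKETCH_TX_OFFLOAD_TCP_TSO   (1u << 5)

/* Return true only if the port reports both TSO and TCP checksum offload. */
bool port_can_tso(uint32_t tx_offload_capa)
{
	const uint32_t needed = SKETCH_TX_OFFLOAD_TCP_TSO |
				SKETCH_TX_OFFLOAD_TCP_CKSUM;

	return (tx_offload_capa & needed) == needed;
}
```

Requiring the TCP checksum bit as well is a conservative choice of this sketch: the NIC has to compute the TCP checksum of every segment it produces, so a port that cannot do that is treated as not TSO-capable.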

[dpdk-dev] [PATCH 0/2] support TSO on i40e

2015-02-09 Thread Jijiang Liu

This patch set enables the i40e TSO feature for both non-tunneled and tunneled
packets.

Change logs:
v2 change: rework based on Olivier's patch set [PATCH v2 00/20] enhance tx checksum offload API
http://dpdk.org/ml/archives/dev/2015-February/012375.html

Jijiang Liu (2):
  advertise TSO capability in the i40e PMD
  enable i40e TSO feature 

 lib/librte_pmd_i40e/i40e_ethdev.c |3 +-
 lib/librte_pmd_i40e/i40e_rxtx.c   |   99 +++--
 lib/librte_pmd_i40e/i40e_rxtx.h   |   13 +
 3 files changed, 87 insertions(+), 28 deletions(-)

-- 
1.7.7.6

[dpdk-dev] [PATCH 0/3] Enable uio_pci_generic support

2015-02-09 Thread Bruce Richardson

On Thu, Jan 29, 2015 at 05:28:16PM +0800, Danny Zhou wrote:
> The Linux kernel provides both the UIO and VFIO mechanisms for writing user
> space device drivers. UIO has been available since kernel 2.6.32, while VFIO
> was introduced in kernel 3.6.0 with better interrupt and memory protection
> support (built on top of Intel VT-d technology).
> Basically, UIO and VFIO both do the following two things:
> 1) Map the PCIe device's I/O memory space to the user space driver
> 2) Provide a device interrupt notification mechanism that notifies the user
>    space driver/application when a device interrupt triggers.
> 
> To run a DPDK application using VFIO, two in-kernel modules, vfio and
> vfio-pci, must be loaded. To use UIO, the DPDK kernel module igb_uio, which
> has existed since DPDK was first released, must be loaded on top of the
> in-kernel uio module. As a step toward deprecating igb_uio, this patch series
> leverages the in-kernel uio_pci_generic module to support DPDK user space
> PMDs in a generic fashion (similar to how VFIO works), removing the user
> space DPDK dependency on the GPL igb_uio kernel code.
> 
> Example of binding network ports to uio_pci_generic:
>   modprobe uio
>   modprobe uio_pci_generic
>   # bind device 08:00.0 to the uio_pci_generic driver
>   ./tools/dpdk_nic_bind.py -b uio_pci_generic 08:00.0
> 
> Note: this patch set does not remove igb_uio support, because igb_uio supports
> creating the maximum number of SR-IOV VFs (Virtual Functions) through the
> max_vfs module parameter on older kernels (kernel 3.7.x and below).
> Specifically, igb_uio explicitly calls pci_enable_sriov() to create VFs, while
> neither the uio nor the uio_pci_generic kernel module invokes it. On kernel
> 3.8.x and above, users can enable VFs through the standard sysfs interface,
> for example:
> 
> # echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs   # enable VFs
> # echo 0 > /sys/class/net/$dev/device/sriov_numvfs                 # disable VFs
> 
> Danny Zhou (3):
>   eal: add interrupt enable/disable routines for uio_pci_generic
>   eal: enable uio_pci_generic support
>   tools: enable binding NIC device to uio_pci_generic
> 
>  lib/librte_eal/common/include/rte_pci.h|   1 +
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c   |  68 +--
>  lib/librte_eal/linuxapp/eal/eal_pci.c  |   2 +-
>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c  | 202 +++--
>  .../linuxapp/eal/include/exec-env/rte_interrupts.h |  10 +-
>  tools/dpdk_nic_bind.py |   2 +-
>  6 files changed, 172 insertions(+), 113 deletions(-)
> 
> -- 
> 1.8.1.4
> 

Series
Acked-by: Bruce Richardson
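The sysfs route for creating VFs on kernel 3.8 and later, mentioned in the quoted cover letter, comes down to writing a number to `sriov_numvfs`. A minimal sketch, assuming the caller has permission to write that file; `sriov_numvfs_path()` and `sriov_set_numvfs()` are hypothetical helpers, and the path layout is the one quoted above.

```c
#include <stdio.h>

/* Build "/sys/class/net/<ifname>/device/sriov_numvfs" into buf.
 * Returns the path length, or -1 if it does not fit in len bytes. */
int sriov_numvfs_path(char *buf, size_t len, const char *ifname)
{
	int n = snprintf(buf, len, "/sys/class/net/%s/device/sriov_numvfs",
			 ifname);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write the requested VF count to a sriov_numvfs file (0 disables VFs).
 * The value only reaches sysfs when the stream is flushed, so fclose()
 * errors are reported as well. Returns 0 on success, -1 on error. */
int sriov_set_numvfs(const char *path, unsigned int num_vfs)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (f == NULL)
		return -1;
	if (fprintf(f, "%u\n", num_vfs) < 0)
		ret = -1;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

The kernel typically refuses to change a non-zero VF count directly, so a caller that wants a different count would write 0 first and then the new value.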

[dpdk-dev] [PATCH v4 12/17] eal: set _lcore_id and _socket_id to (-1) by default

2015-02-09 Thread Liang, Cunming



> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, February 09, 2015 4:01 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 12/17] eal: set _lcore_id and _socket_id to 
> (-1)
> by default
> 
> Hi,
> 
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > For non-EAL threads, *_lcore_id* shall always be LCORE_ID_ANY.
> > Libraries that use *_lcore_id* as an index need to take care.
> > *_socket_id* always be SOCKET_ID_ANY unitl the thread changes the affinity
> 
> unitl -> until
[LCM] accept.
> 
> > by rte_thread_set_affinity()
> >
> > Signed-off-by: Cunming Liang 
> > ---
> >  lib/librte_eal/bsdapp/eal/eal_thread.c   | 4 ++--
> >  lib/librte_eal/linuxapp/eal/eal_thread.c | 4 ++--
> >  2 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > index 5b16302..2b3c9a8 100644
> > --- a/lib/librte_eal/bsdapp/eal/eal_thread.c
> > +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > @@ -56,8 +56,8 @@
> >  #include "eal_private.h"
> >  #include "eal_thread.h"
> >
> > -RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
> > -RTE_DEFINE_PER_LCORE(unsigned, _socket_id);
> > +RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = (unsigned)LCORE_ID_ANY;
> > +RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
> >  RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
> >
> >  /*
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c b/lib/librte_eal/linuxapp/eal/eal_thread.c
> > index 6eb1525..ab94e20 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_thread.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
> > @@ -57,8 +57,8 @@
> >  #include "eal_private.h"
> >  #include "eal_thread.h"
> >
> > -RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
> > -RTE_DEFINE_PER_LCORE(unsigned, _socket_id);
> > +RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = (unsigned)LCORE_ID_ANY;
> > +RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
> >  RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
> 
> As far as I understand, rte_lcore_id() can now return LCORE_ID_ANY.
> This should be documented in the rte_lcore_id() API comments.
> 
> Same for rte_socket_id().
[LCM] accept.
> 
> I also wonder if the API of these functions should be modified to
> return an int instead of an unsigned as LCORE_ID_ANY is -1.
[LCM] I prefer not to change the API definition. (unsigned)LCORE_ID_ANY is
already used in existing code.
> 
> Regards,
> Olivier
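The unsigned-return point above can be illustrated with a small standalone sketch: when the per-thread id defaults to (unsigned)LCORE_ID_ANY, which is UINT_MAX, a single `< RTE_MAX_LCORE` check both rejects the sentinel and guards array indexing. The names `SKETCH_MAX_LCORE`, `SKETCH_LCORE_ID_ANY`, `sketch_lcore_id_get()`, `sketch_lcore_id_set()` and `sketch_is_eal_lcore()` are hypothetical; `__thread` stands in for RTE_DEFINE_PER_LCORE, which is built on the same GCC thread-local storage.

```c
#include <stdbool.h>

#define SKETCH_MAX_LCORE    128
#define SKETCH_LCORE_ID_ANY (-1)   /* same value as LCORE_ID_ANY */

/* Every thread starts with the "not an EAL lcore" sentinel, as in the patch. */
static __thread unsigned sketch_lcore_id = (unsigned)SKETCH_LCORE_ID_ANY;

unsigned sketch_lcore_id_get(void)
{
	return sketch_lcore_id;
}

/* What EAL would do when it launches a thread on a given lcore. */
void sketch_lcore_id_set(unsigned id)
{
	sketch_lcore_id = id;
}

/* (unsigned)-1 is UINT_MAX, so the sentinel always fails this check. */
bool sketch_is_eal_lcore(unsigned id)
{
	return id < SKETCH_MAX_LCORE;
}
```

Callers that compare against `(unsigned)LCORE_ID_ANY` or bound-check against RTE_MAX_LCORE keep working unchanged, which is the argument for leaving the return type unsigned.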

[dpdk-dev] [PATCH v4 11/17] log: fix the gap to support non-EAL thread

2015-02-09 Thread Liang, Cunming



> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, February 09, 2015 4:01 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 11/17] log: fix the gap to support non-EAL
> thread
> 
> Hi,
> 
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > For non-EAL threads, *_lcore_id* is invalid and is probably larger than
> > RTE_MAX_LCORE. The patch adds a check so that only EAL threads use the EAL
> > per-thread log level and log type; other threads share the global log level.
> >
> > Signed-off-by: Cunming Liang 
> > ---
> >  lib/librte_eal/common/eal_common_log.c  | 17 +++--
> >  lib/librte_eal/common/include/rte_log.h |  5 +
> >  2 files changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/eal_common_log.c b/lib/librte_eal/common/eal_common_log.c
> > index cf57619..e8dc94a 100644
> > --- a/lib/librte_eal/common/eal_common_log.c
> > +++ b/lib/librte_eal/common/eal_common_log.c
> > @@ -193,11 +193,20 @@ rte_set_log_type(uint32_t type, int enable)
> > rte_logs.type &= (~type);
> >  }
> >
> > +/* Get global log type */
> > +uint32_t
> > +rte_get_log_type(void)
> > +{
> > +   return rte_logs.type;
> > +}
> > +
> >  /* get the current loglevel for the message beeing processed */
> >  int rte_log_cur_msg_loglevel(void)
> >  {
> > unsigned lcore_id;
> > lcore_id = rte_lcore_id();
> > +   if (lcore_id >= RTE_MAX_LCORE)
> > +   return rte_get_log_level();
> > return log_cur_msg[lcore_id].loglevel;
> >  }
> >
> > @@ -206,6 +215,8 @@ int rte_log_cur_msg_logtype(void)
> >  {
> > unsigned lcore_id;
> > lcore_id = rte_lcore_id();
> > +   if (lcore_id >= RTE_MAX_LCORE)
> > +   return rte_get_log_type();
> > return log_cur_msg[lcore_id].logtype;
> >  }
> >
> > @@ -265,8 +276,10 @@ rte_vlog(__attribute__((unused)) uint32_t level,
> >
> > /* save loglevel and logtype in a global per-lcore variable */
> > lcore_id = rte_lcore_id();
> > -   log_cur_msg[lcore_id].loglevel = level;
> > -   log_cur_msg[lcore_id].logtype = logtype;
> > +   if (lcore_id < RTE_MAX_LCORE) {
> > +   log_cur_msg[lcore_id].loglevel = level;
> > +   log_cur_msg[lcore_id].logtype = logtype;
> > +   }
> >
> > ret = vfprintf(f, format, ap);
> > fflush(f);
> > diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h
> > index db1ea08..f83a0d9 100644
> > --- a/lib/librte_eal/common/include/rte_log.h
> > +++ b/lib/librte_eal/common/include/rte_log.h
> > @@ -144,6 +144,11 @@ uint32_t rte_get_log_level(void);
> >  void rte_set_log_type(uint32_t type, int enable);
> >
> >  /**
> > + * Get the global log type.
> > + */
> > +uint32_t rte_get_log_type(void);
> > +
> > +/**
> >   * Get the current loglevel for the message being processed.
> >   *
> >   * Before calling the user-defined stream for logging, the log
> >
> 
> Wouldn't it be better to change the variable:
> static struct log_cur_msg log_cur_msg[RTE_MAX_LCORE];
> into a pthread (tls) variable?
> 
> With your patch, the log level and log type are not saved for
> non-EAL threads. If TLS were used, I think it would work in any case.
[LCM] Good point. But this patch set is meant to have little impact on EAL
threads. We'll improve non-EAL thread support in a separate patch set.
> 
> Regards,
> Olivier
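Olivier's TLS suggestion could look roughly like the sketch below: each thread, EAL or not, keeps its own copy of the current log level and type, so neither rte_lcore_id() nor an RTE_MAX_LCORE bounds check is needed. All `sketch_*` names are hypothetical, the two fields are assumed to mirror struct log_cur_msg in eal_common_log.c, and `__thread` is the same mechanism RTE_DEFINE_PER_LCORE uses.

```c
#include <stdint.h>

/* Assumed to mirror the two fields of struct log_cur_msg. */
struct sketch_log_cur_msg {
	uint32_t loglevel;
	uint32_t logtype;
};

/* One copy per thread instead of an array indexed by lcore_id. */
static __thread struct sketch_log_cur_msg sketch_cur_msg;

/* Called at the start of a log call, like the save done in rte_vlog(). */
void sketch_vlog_save(uint32_t level, uint32_t logtype)
{
	sketch_cur_msg.loglevel = level;
	sketch_cur_msg.logtype = logtype;
}

/* Counterparts of rte_log_cur_msg_loglevel() and rte_log_cur_msg_logtype():
 * valid for any thread, with no RTE_MAX_LCORE check. */
int sketch_cur_msg_loglevel(void)
{
	return (int)sketch_cur_msg.loglevel;
}

int sketch_cur_msg_logtype(void)
{
	return (int)sketch_cur_msg.logtype;
}
```

Unlike the bounds-checked array in the patch, this keeps the level and type of the message being processed for non-EAL threads too.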

[dpdk-dev] [PATCH v4 10/17] malloc: fix the issue of SOCKET_ID_ANY

2015-02-09 Thread Liang, Cunming



> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, February 09, 2015 4:01 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 10/17] malloc: fix the issue of 
> SOCKET_ID_ANY
> 
> Hi,
> 
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > Add a check on rte_socket_id() to avoid an unexpected return value such as (-1).
> >
> > Signed-off-by: Cunming Liang 
> > ---
> >  lib/librte_malloc/malloc_heap.h | 7 ++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_malloc/malloc_heap.h b/lib/librte_malloc/malloc_heap.h
> > index b4aec45..a47136d 100644
> > --- a/lib/librte_malloc/malloc_heap.h
> > +++ b/lib/librte_malloc/malloc_heap.h
> > @@ -44,7 +44,12 @@ extern "C" {
> >  static inline unsigned
> >  malloc_get_numa_socket(void)
> >  {
> > -   return rte_socket_id();
> > +   unsigned socket_id = rte_socket_id();
> > +
> > +   if (socket_id == (unsigned)SOCKET_ID_ANY)
> > +   return 0;
> > +
> > +   return socket_id;
> >  }
> >
> >  void *
> >
> 
> The documentation of rte_malloc_socket() says:
> 
> @param socket
>   NUMA socket to allocate memory on. If SOCKET_ID_ANY is used, this
>   function will behave the same as rte_malloc().
> 
> void *
> rte_malloc_socket(const char *type, size_t size, unsigned align, int
> socket);
> 
> 
> Your patch changes the behavior of rte_malloc() without explaining
> why, and the documentation becomes wrong.
> 
> Can you explain why you need this change?
[LCM] I don't think I changed the declaration of rte_malloc_socket().
If socket_arg is SOCKET_ID_ANY, the socket used is the return value of
malloc_get_numa_socket(), which is supposed to return the correct TLS _socket_id.
That works fine in the normal case, but this series changes the default value of
the TLS _socket_id to SOCKET_ID_ANY. Moreover, one lcore can now run on multiple
CPUs; if the CPUs in its cpuset do not all belong to the same NUMA node, its
_socket_id is SOCKET_ID_ANY.
When the user calls rte_malloc_socket(SOCKET_ID_ANY), it still behaves the same
as rte_malloc(): both get the socket_id from malloc_get_numa_socket(). The added
code only handles this exception path.
> 
> Regards,
> Olivier
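The exception path LCM describes can be sketched as follows: an lcore whose cpuset spans more than one NUMA node gets SOCKET_ID_ANY, and malloc_get_numa_socket() then falls back to socket 0. This is an illustration only: a 64-bit mask stands in for rte_cpuset_t, the CPU-to-socket table is a hypothetical input (the series derives the socket through eal_cpuset_socket_id()), and the `sketch_*` names are not DPDK APIs.

```c
#include <stdint.h>

#define SKETCH_SOCKET_ID_ANY (-1)   /* same value as SOCKET_ID_ANY */

/* Return the NUMA socket shared by every CPU set in cpumask, or
 * SKETCH_SOCKET_ID_ANY if the mask is empty or spans several sockets.
 * cpu_socket[i] is the socket of CPU i; ncpu is the table size. */
int sketch_cpuset_socket_id(uint64_t cpumask, const int *cpu_socket, int ncpu)
{
	int socket = SKETCH_SOCKET_ID_ANY;
	int cpu;

	for (cpu = 0; cpu < ncpu && cpu < 64; cpu++) {
		if (!(cpumask & (UINT64_C(1) << cpu)))
			continue;
		if (socket == SKETCH_SOCKET_ID_ANY)
			socket = cpu_socket[cpu];
		else if (socket != cpu_socket[cpu])
			return SKETCH_SOCKET_ID_ANY;
	}
	return socket;
}

/* Fallback applied by the patched malloc_get_numa_socket():
 * SOCKET_ID_ANY is mapped to socket 0. */
unsigned sketch_malloc_socket(int socket_id)
{
	if (socket_id == SKETCH_SOCKET_ID_ANY)
		return 0;
	return (unsigned)socket_id;
}
```

With CPUs 0-1 on socket 0 and CPUs 2-3 on socket 1, a mask of CPUs 2 and 3 resolves to socket 1, while adding CPU 0 makes it SOCKET_ID_ANY and the allocation falls back to socket 0.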

[dpdk-dev] [PATCH v4 09/17] enic: fix re-define freebsd compile complain

2015-02-09 Thread Liang, Cunming



> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, February 09, 2015 4:01 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 09/17] enic: fix re-define freebsd compile
> complain
> 
> Hi,
> 
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > Some macros are already defined by FreeBSD's 'sys/param.h'.
> >
> > Signed-off-by: Cunming Liang 
> > ---
> >  lib/librte_pmd_enic/enic.h| 1 +
> >  lib/librte_pmd_enic/enic_compat.h | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/lib/librte_pmd_enic/enic.h b/lib/librte_pmd_enic/enic.h
> > index c43417c..189c3b9 100644
> > --- a/lib/librte_pmd_enic/enic.h
> > +++ b/lib/librte_pmd_enic/enic.h
> > @@ -66,6 +66,7 @@
> >  #define ENIC_CALC_IP_CKSUM  1
> >  #define ENIC_CALC_TCP_UDP_CKSUM 2
> >  #define ENIC_MAX_MTU9000
> > +#undef PAGE_SIZE
> >  #define PAGE_SIZE   4096
> >  #define PAGE_ROUND_UP(x) \
> > unsigned long)(x)) + PAGE_SIZE-1) & (~(PAGE_SIZE-1)))
> > diff --git a/lib/librte_pmd_enic/enic_compat.h b/lib/librte_pmd_enic/enic_compat.h
> > index b1af838..b84c766 100644
> > --- a/lib/librte_pmd_enic/enic_compat.h
> > +++ b/lib/librte_pmd_enic/enic_compat.h
> > @@ -67,6 +67,7 @@
> >  #define pr_warn(y, args...) dev_warning(0, y, ##args)
> >  #define BUG() pr_err("BUG at %s:%d", __func__, __LINE__)
> >
> > +#undef ALIGN
> >  #define ALIGN(x, a)  __ALIGN_MASK(x, (typeof(x))(a)-1)
> >  #define __ALIGN_MASK(x, mask)(((x)+(mask))&~(mask))
> >  #define udelay usleep
> >
> 
> Is the issue caused by a change you've made previously in the patch
> series?
[LCM] Yes, it is caused by [01/17], which includes  in freebsdapp.
> 
> Wouldn't it be better to rename the macros in enic instead of doing
> #undef?
[LCM] Agree, will do it.
> 
> Regards,
> Olivier
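The renaming Olivier suggests, and LCM accepts, amounts to giving the driver's macros a private prefix so they can no longer collide with definitions pulled in from FreeBSD's 'sys/param.h'. A sketch of what that could look like; the `ENIC_`-prefixed names are hypothetical and not necessarily those used in the respin, and `__typeof__` replaces `typeof` so the snippet also builds in strict ISO mode with GCC or Clang.

```c
/* Driver-private names: no #undef needed, no clash with <sys/param.h>. */
#define ENIC_PAGE_SIZE 4096
#define ENIC_PAGE_ROUND_UP(x) \
	((((unsigned long)(x)) + ENIC_PAGE_SIZE - 1) & (~(ENIC_PAGE_SIZE - 1)))

/* Round x up to a multiple of a (a must be a power of two). */
#define ENIC_ALIGN(x, a)  ENIC_ALIGN_MASK(x, (__typeof__(x))(a) - 1)
#define ENIC_ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
```

Existing users of PAGE_SIZE, PAGE_ROUND_UP() and ALIGN() inside the driver would then switch to the prefixed names.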

[dpdk-dev] [PATCH v4 08/17] eal: apply affinity of EAL thread by assigned cpuset

2015-02-09 Thread Liang, Cunming



> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, February 09, 2015 4:01 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 08/17] eal: apply affinity of EAL thread by
> assigned cpuset
> 
> Hi,
> 
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > EAL threads use their assigned cpuset to set core affinity during startup.
> > The 1:1 mapping is kept if no '--lcores' option is used.
> >
> > [...]
> >
> >  lib/librte_eal/bsdapp/eal/eal.c  | 13 ---
> >  lib/librte_eal/bsdapp/eal/eal_thread.c   | 63 +-
> >  lib/librte_eal/linuxapp/eal/eal.c|  7 +++-
> >  lib/librte_eal/linuxapp/eal/eal_thread.c | 67 +++-
> >  4 files changed, 54 insertions(+), 96 deletions(-)
> >
> > diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
> > index 69f3c03..98c5a83 100644
> > --- a/lib/librte_eal/bsdapp/eal/eal.c
> > +++ b/lib/librte_eal/bsdapp/eal/eal.c
> > @@ -432,6 +432,7 @@ rte_eal_init(int argc, char **argv)
> > int i, fctret, ret;
> > pthread_t thread_id;
> > static rte_atomic32_t run_once = RTE_ATOMIC32_INIT(0);
> > +   char cpuset[CPU_STR_LEN];
> >
> > if (!rte_atomic32_test_and_set(&run_once))
> > return -1;
> > @@ -502,13 +503,17 @@ rte_eal_init(int argc, char **argv)
> > if (rte_eal_pci_init() < 0)
> > rte_panic("Cannot init PCI\n");
> >
> > -   RTE_LOG(DEBUG, EAL, "Master core %u is ready (tid=%p)\n",
> > -   rte_config.master_lcore, thread_id);
> > -
> > eal_check_mem_on_local_socket();
> >
> > rte_eal_mcfg_complete();
> >
> > +   eal_thread_init_master(rte_config.master_lcore);
> > +
> > +   eal_thread_dump_affinity(cpuset, CPU_STR_LEN);
> > +
> > +   RTE_LOG(DEBUG, EAL, "Master lcore %u is ready (tid=%p;cpuset=[%s])\n",
> > +   rte_config.master_lcore, thread_id, cpuset);
> > +
> > if (rte_eal_dev_init() < 0)
> > rte_panic("Cannot init pmd devices\n");
> >
> > @@ -532,8 +537,6 @@ rte_eal_init(int argc, char **argv)
> > rte_panic("Cannot create thread\n");
> > }
> >
> > -   eal_thread_init_master(rte_config.master_lcore);
> > -
> > /*
> >  * Launch a dummy function on all slave lcores, so that master lcore
> >  * knows they are all ready when this function returns.
> 
> I wonder if this change may have an impact on third-party drivers
> that already use a management thread. Before the patch, the init()
> function of the external library was called with the default affinity,
> and now it is called with the affinity of the master lcore.
> 
> I think this should at least be noted in the commit log.
> 
> Why are you making this change? (I'm not saying it's a bad change, but
> I don't understand why you are doing it here.)
[LCM] To be honest, the main reason is that I didn't find any reason for
linuxapp and freebsdapp to have different init sequences: on Linux,
init_master runs before dev_init(), but on FreeBSD it is the reverse.
Also, since the default value of the TLS variables has changed, if dev_init()
ran first and used those TLS variables, it would appear not to be running in
an EAL thread, even though it actually runs in the EAL master thread. So I
prefer to make FreeBSD follow the linuxapp sequence.
> 
> 
> > diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > index d0c077b..5b16302 100644
> > --- a/lib/librte_eal/bsdapp/eal/eal_thread.c
> > +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > @@ -103,55 +103,27 @@ eal_thread_set_affinity(void)
> >  {
> > int s;
> > pthread_t thread;
> > -
> > -/*
> > - * According to the section VERSIONS of the CPU_ALLOC man page:
> > - *
> > - * The CPU_ZERO(), CPU_SET(), CPU_CLR(), and CPU_ISSET() macros were added
> > - * in glibc 2.3.3.
> > - *
> > - * CPU_COUNT() first appeared in glibc 2.6.
> > - *
> > - * CPU_AND(), CPU_OR(), CPU_XOR(),CPU_EQUAL(), CPU_ALLOC(),
> > - * CPU_ALLOC_SIZE(), CPU_FREE(), CPU_ZERO_S(),  CPU_SET_S(), CPU_CLR_S(),
> > - * CPU_ISSET_S(),  CPU_AND_S(), CPU_OR_S(), CPU_XOR_S(), and CPU_EQUAL_S()
> > - * first appeared in glibc 2.7.
> > - * first appeared in glibc 2.7.
> > - */
> > -#if defined(CPU_ALLOC)
> > -   size_t size;
> > -   cpu_set_t *cpusetp;
> > -
> > -   cpusetp = CPU_ALLOC(RTE_MAX_LCORE);
> > -   if (cpusetp == NULL) {
> > -   RTE_LOG(ERR, EAL, "CPU_ALLOC failed\n");
> > -   return -1;
> > -   }
> > -
> > -   size = CPU_ALLOC_SIZE(RTE_MAX_LCORE);
> > -   CPU_ZERO_S(size, cpusetp);
> > -   CPU_SET_S(rte_lcore_id(), size, cpusetp);
> > +   unsigned lcore_id = rte_lcore_id();
> >
> > thread = pthread_self();
> > -   s = pthread_setaffinity_np(thread, size, cpusetp);
> > +   s = pthread_setaffinity_np(thread, sizeof(cpuset_t),
> > +  &lcore_config[lcore_id].cpuset);
> > if (s != 0) {
> > RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
> > -   CPU_FREE(cpusetp);
> > return -1;
> > }
>

[dpdk-dev] [PATCH v7 08/14] eal/linux/pci: Add functions for unmapping igb_uio resources

2015-02-09 Thread Iremonger, Bernard

> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Monday, February 9, 2015 8:31 AM
> To: dev at dpdk.org
> Cc: Iremonger, Bernard; Qiu, Michael; Tetsuya Mukawa
> Subject: [PATCH v7 08/14] eal/linux/pci: Add functions for unmapping igb_uio 
> resources
> 
> The patch adds functions for unmapping igb_uio resources. It only covers the
> Linux igb_uio environment; VFIO and BSD are not supported.
> 
> v5:
> - Fix pci_unmap_device() to check pt_driver.
> v4:
> - Add parameter checking.
> - Add header file to determine if hotplug can be enabled.
> 
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_eal/common/Makefile  |  1 +
>  lib/librte_eal/common/include/rte_dev_hotplug.h | 44 +
>  lib/librte_eal/linuxapp/eal/eal_pci.c   | 44 +
>  lib/librte_eal/linuxapp/eal/eal_pci_init.h  |  8 +++
>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c   | 65 +
>  5 files changed, 162 insertions(+)
>  create mode 100644 lib/librte_eal/common/include/rte_dev_hotplug.h
> 
> diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
> index 52c1a5f..db7cc93 100644
> --- a/lib/librte_eal/common/Makefile
> +++ b/lib/librte_eal/common/Makefile
> @@ -41,6 +41,7 @@ INC += rte_eal_memconfig.h rte_malloc_heap.h
>  INC += rte_hexdump.h rte_devargs.h rte_dev.h
>  INC += rte_common_vect.h
>  INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
> +INC += rte_dev_hotplug.h
> 
>  ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
>  INC += rte_warnings.h
> diff --git a/lib/librte_eal/common/include/rte_dev_hotplug.h b/lib/librte_eal/common/include/rte_dev_hotplug.h
> new file mode 100644
> index 000..b333e0f
> --- /dev/null
> +++ b/lib/librte_eal/common/include/rte_dev_hotplug.h
> @@ -0,0 +1,44 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2015 IGEL Co.,LTd.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of IGEL Co.,Ltd. nor the names of its
> + *   contributors may be used to endorse or promote products derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_DEV_HOTPLUG_H_
> +#define _RTE_DEV_HOTPLUG_H_
> +
> +/*
> + * determine if hotplug can be enabled on the system
> + */
> +#if defined(RTE_LIBRTE_EAL_HOTPLUG) && defined(RTE_LIBRTE_EAL_LINUXAPP)
> +#define ENABLE_HOTPLUG
> +#endif /* RTE_LIBRTE_EAL_HOTPLUG & RTE_LIBRTE_EAL_LINUXAPP */
> +
> +#endif /* _RTE_DEV_HOTPLUG_H_ */
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index d847102..c3b7917 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -166,6 +166,25 @@ pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
>   return mapaddr;
>  }
> 
> +#ifdef ENABLE_HOTPLUG
> +/* unmap a particular resource */
> +void
> +pci_unmap_resource(void *requested_addr, size_t size) {
> + if (requested_addr == NULL)
> + return;
> +
> + /* Unmap the PCI memory resource of device */
> + if (munmap(requested_addr, size)) {
> + RTE_LOG(ERR, EAL, "%s(): cannot munmap(%p, 0x%lx): %s\n",
> + __func__, requested_addr, (unsigned long)size,
> + strerror(errno));
> + } else
> + RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n",
Hi Tetsuya,

" PCI memory mapped" should be "PCI memory unmapped"

Regards,

Bernard

> +

[dpdk-dev] mbuf: how to set data to NULL?

2015-02-09 Thread Kavanagh, Mark B

Hi Bruce,

I figured as much, thanks for confirming.

We'll probably go with a flag.

Thanks again,
Mark

> -Original Message-
> From: Richardson, Bruce
> Sent: Monday, February 9, 2015 12:59 PM
> To: Kavanagh, Mark B
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] mbuf: how to set data to NULL?
> 
> On Mon, Feb 09, 2015 at 10:51:36AM +, Kavanagh, Mark B wrote:
> > Hi Bruce,
> >
> > As a follow-on to my previous question: I suppose what I'm really getting at
> > is trying to understand the implications of removing the data pointer, and to
> > determine if it's possible to replicate the behavior observed in DPDK 1.7
> > (which we need in our use case).
> >
> > Take this situation for example:
> >
> > DPDK 1.7: I want to set an mbuf's data to NULL:
> > =>   buf.data = NULL;
> > Then, when I subsequently attempt to access the mbuf's data section,
> > rte_pktmbuf_mtod(buf) returns NULL
> >
> > DPDK 1.8: I want to set an mbuf's data to NULL:
> > =>  buf.data_off = 0;  (is this correct?)
> > Then, if I attempt to access the mbuf's data, rte_pktmbuf_mtod(buf)
> > returns buf_addr, not NULL.
> >
> > Is it possible in DPDK 1.8 to replicate the same behavior observed in 1.7?
> >
> > Btw, in our use case a data_len of 0 doesn't necessarily indicate a data
> > value of NULL.
> >
> > Thanks,
> > Mark
> >
> 
> I don't think there is any way to replicate this behaviour exactly with the new
> mbuf structure. Memsetting zero may do what you want, but depending upon what
> the meaning of an mbuf with NULL data is, there may still be better ways to
> indicate such a thing, e.g. a flag value in another field, or setting data_len
> to -1?
> 
> /Bruce
> 
> >
> > > -Original Message-
> > > From: Richardson, Bruce
> > > Sent: Wednesday, December 17, 2014 4:50 PM
> > > To: Kavanagh, Mark B
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] mbuf: how to set data to NULL?
> > >
> > > On Wed, Dec 17, 2014 at 04:44:15PM +, Kavanagh, Mark B wrote:
> > > > Hi,
> > > >
> > > > DPDK 1.8.0 removes the data pointer from the mbuf structure, such that the
> > > > start of the data in the segment buffer must be calculated (i.e. buf_addr +
> > > > data_off = 'data').
> > > >
> > > > Given this, what is the best approach to set the mbuf data to NULL
> > > > (previously mbuf.data = NULL)?
> > > >
> > > > As I see it, given an initialized mbuf, such that buf_addr is non-null, and
> > > > data_off = RTE_PKTMBUF_HEADROOM, is it fair to say that the best solution is
> > > > to memset to 0 from location (buf_addr + data_off) for a length of
> > > > (data_len - data_off)?
> > > >
> > > > Thanks in advance,
> > > > Mark
> > >
> > > Why not just set data_len = 0 to indicate an empty mbuf?
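Mark's plan to "go with a flag" might look like the following. Since DPDK 1.8 derives the data pointer as buf_addr + data_off, there is no field left to hold NULL, so the sketch keeps the "no data" state in an application-owned flag and wraps the pointer computation. `struct sketch_mbuf`, `SKETCH_F_NODATA`, `sketch_set_nodata()` and `sketch_mtod()` are hypothetical stand-ins for rte_mbuf, an application flag, and rte_pktmbuf_mtod(); where such a flag lives in a real application (its own per-packet metadata, for example) is an assumption left to the user.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag meaning "this buffer carries no data" (the DPDK 1.7
 * convention was data == NULL). */
#define SKETCH_F_NODATA 0x1u

/* Cut-down stand-in for the DPDK 1.8 mbuf fields involved. */
struct sketch_mbuf {
	void *buf_addr;      /* start of the segment buffer */
	uint16_t data_off;   /* offset of the data from buf_addr */
	uint16_t data_len;   /* amount of data in this segment */
	uint32_t app_flags;  /* application-owned flags (assumption) */
};

/* Equivalent of "m->data = NULL" in 1.7: mark the buffer, keep data_len
 * untouched since a zero length does not imply "no data" here. */
void sketch_set_nodata(struct sketch_mbuf *m)
{
	m->app_flags |= SKETCH_F_NODATA;
}

/* Like rte_pktmbuf_mtod(), but returns NULL for "no data" buffers. */
void *sketch_mtod(const struct sketch_mbuf *m)
{
	if (m->app_flags & SKETCH_F_NODATA)
		return NULL;
	return (char *)m->buf_addr + m->data_off;
}
```

Keeping the flag separate from data_len preserves the distinction Mark needs between an empty buffer and a NULL-data buffer.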

[dpdk-dev] [PATCH] enic: silence log message

2015-02-09 Thread David Marchand

On Sun, Feb 8, 2015 at 6:36 PM, Stephen Hemminger <stephen at networkplumber.org> wrote:

> From: Stephen Hemminger 
>
> Silence is normal. drivers should speak only when spoken to and not
> be chatty.
>
> Signed-off-by: Stephen Hemminger 
> ---
>  lib/librte_pmd_enic/enic_main.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/lib/librte_pmd_enic/enic_main.c b/lib/librte_pmd_enic/enic_main.c
> index 48fdca2..dad8922 100644
> --- a/lib/librte_pmd_enic/enic_main.c
> +++ b/lib/librte_pmd_enic/enic_main.c
> @@ -1046,8 +1046,6 @@ int enic_probe(struct enic *enic)
> struct rte_pci_device *pdev = enic->pdev;
> int err = -1;
>
> -   dev_info(enic, " Initializing ENIC PMD version %s\n", DRV_VERSION);
> -
> enic->bar0.vaddr = (void *)pdev->mem_resource[0].addr;
> enic->bar0.len = pdev->mem_resource[0].len;
>

NAK.

The main problem is that the enic PMD uses printf to write logs.
So the PMD should be fixed so that the dev_* macros use RTE_LOG.

Silence is good when it is the default behaviour.
But I would prefer being able to change this at runtime, rather than stripping
the log messages, especially for init.
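The idea is that driver messages go through a level that can be changed at runtime instead of being deleted. A minimal sketch of that pattern, with hypothetical names (`sketch_log_level`, `sketch_log()`, `SKETCH_DEV_INFO`); it is not the RTE_LOG implementation, and the numeric levels only follow the same "higher is more verbose" ordering that RTE_LOG levels use.

```c
#include <stdarg.h>
#include <stdio.h>

enum sketch_level { SKETCH_ERR = 4, SKETCH_NOTICE = 6, SKETCH_INFO = 7 };

/* Runtime-adjustable threshold; the default keeps INFO-level init messages quiet. */
static int sketch_log_level = SKETCH_NOTICE;

void sketch_set_log_level(int level)
{
	sketch_log_level = level;
}

/* Print only if level is within the current threshold.
 * Returns the number of characters written, or 0 if filtered out. */
int sketch_log(int level, FILE *out, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (level > sketch_log_level)
		return 0;
	va_start(ap, fmt);
	n = vfprintf(out, fmt, ap);
	va_end(ap);
	return n;
}

/* What a dev_info()-style macro could expand to instead of a bare printf. */
#define SKETCH_DEV_INFO(...) sketch_log(SKETCH_INFO, stdout, __VA_ARGS__)
```

With this shape, the init banner stays in the source and is silent by default, but can be turned back on by raising the level at runtime.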

-- 
David Marchand

[dpdk-dev] [PATCH v4 06/17] eal: add eal_common_thread.c for common thread API

2015-02-09 Thread Liang, Cunming



> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, February 09, 2015 4:00 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 06/17] eal: add eal_common_thread.c for
> common thread API
> 
> Hi,
> 
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > The API works for both EAL threads and non-EAL threads.
> > When rte_thread_set_affinity is called, the *_socket_id* and
> > *_cpuset* of the calling thread are updated if the thread
> > successfully sets the CPU affinity.
> >
> > [...]
> > +int
> > +rte_thread_set_affinity(rte_cpuset_t *cpusetp)
> > +{
> > +   int s;
> > +   unsigned lcore_id;
> > +   pthread_t tid;
> > +
> > +   if (!cpusetp)
> > +   return -1;
> 
> Is it really needed to test that cpusetp is not NULL?
[LCM] Accepted. We can drop it and rely on pthread_setaffinity_np() to return
a failure.
> 
> > +
> > +   lcore_id = rte_lcore_id();
> > +   if (lcore_id != (unsigned)LCORE_ID_ANY) {
> 
> It is strange to see a test for something that cannot happen:
> lcore_id == LCORE_ID_ANY is only possible after your patch 12/17
> is added. Maybe the patches can be reordered to avoid this inconsistency?
[LCM] You're right, there is some reordering here.
The point is to have everything ready before switching the default value to -1,
so the whole function can be implemented in one patch.
It just won't take effect yet, but it doesn't bring any additional risk.
> 
> > +   /* EAL thread */
> > +   tid = lcore_config[lcore_id].thread_id;
> > +
> > +   s = pthread_setaffinity_np(tid, sizeof(rte_cpuset_t), cpusetp);
> > +   if (s != 0) {
> > +   RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
> > +   return -1;
> > +   }
> > +
> > +   /* store socket_id in TLS for quick access */
> > +   RTE_PER_LCORE(_socket_id) =
> > +   eal_cpuset_socket_id(cpusetp);
> > +
> > +   /* store cpuset in TLS for quick access */
> > +   rte_memcpy(&RTE_PER_LCORE(_cpuset), cpusetp,
> > +  sizeof(rte_cpuset_t));
> > +
> > +   /* update lcore_config */
> > +   lcore_config[lcore_id].socket_id = RTE_PER_LCORE(_socket_id);
> > +   rte_memcpy(&lcore_config[lcore_id].cpuset, cpusetp,
> > +  sizeof(rte_cpuset_t));
> > +   } else {
> > +   /* none EAL thread */
> > +   tid = pthread_self();
> > +
> > +   s = pthread_setaffinity_np(tid, sizeof(rte_cpuset_t), cpusetp);
> > +   if (s != 0) {
> > +   RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
> > +   return -1;
> > +   }
> > +
> > +   /* store cpuset in TLS for quick access */
> > +   rte_memcpy(&RTE_PER_LCORE(_cpuset), cpusetp,
> > +  sizeof(rte_cpuset_t));
> > +
> > +   /* store socket_id in TLS for quick access */
> > +   RTE_PER_LCORE(_socket_id) =
> > +   eal_cpuset_socket_id(cpusetp);
> > +   }
> 
> Why not always using pthread_self() to get the tid?
[LCM] Good point, I hadn't noticed it.
> 
> I think most of the code could be factorized here. The only difference
> (which is hard to see as is as code is not exactly ordered in the same
> manner) is that the config is updated in case it's an EAL thread.
[LCM] Accept.
> 
> 
> 
> > +
> > +   return 0;
> > +}
> > +
> > +int
> > +rte_thread_get_affinity(rte_cpuset_t *cpusetp)
> > +{
> > +   if (!cpusetp)
> > +   return -1;
> 
> Same here. This is the only reason why rte_thread_get_affinity() could
> fail. Removing this test would allow to change the API to return void
> instead. It will avoid a useless test below in
> eal_thread_dump_affinity().
[LCM] cpusetp is used as the destination of the memcpy, and the function is
meant to be a public EAL API.
I don't think it's a good idea to remove the check, do you?
> 
> > +
> > +   rte_memcpy(cpusetp, &RTE_PER_LCORE(_cpuset),
> > +  sizeof(rte_cpuset_t));
> > +
> > +   return 0;
> > +}
> > +
> > +void
> > +eal_thread_dump_affinity(char str[], unsigned size)
> > +{
> > +   rte_cpuset_t cpuset;
> > +   unsigned cpu;
> > +   int ret;
> > +   unsigned int out = 0;
> > +
> > +   if (rte_thread_get_affinity(&cpuset) < 0) {
> > +   str[0] = '\0';
> > +   return;
> > +   }
> 
> This one could be removed if the (== NULL) test is removed.
> 
> > +
> > +   for (cpu = 0; cpu < RTE_MAX_LCORE; cpu++) {
> > +   if (!CPU_ISSET(cpu, &cpuset))
> > +   continue;
> > +
> > +   ret = snprintf(str + out,
> > +  size - out, "%u,", cpu);
> > +   if (ret < 0 || (unsigned)ret >= size - out)
> > +   break;
> 
> On the contrary, I think returning an error to the user here
> would be useful so they know that the dump is not complete.
[LCM] accept.
> 
> 
> Regards,
> Olivier
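A hedged sketch of the factorization Olivier suggests: always take the thread id from pthread_self(), perform the TLS update for every thread, and keep only the lcore_config update conditional on being an EAL thread. It assumes Linux/glibc (pthread_setaffinity_np() and cpu_set_t); the sketch_* names are hypothetical stand-ins for the EAL variables, and the socket_id update is left out for brevity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Hypothetical stand-ins for the EAL state discussed in the review. */
#define SKETCH_MAX_LCORE    128
#define SKETCH_LCORE_ID_ANY (-1)

struct sketch_lcore_config {
    cpu_set_t cpuset;
};

static struct sketch_lcore_config sketch_lcore_config[SKETCH_MAX_LCORE];
/* Per-thread state: lcore id (ANY for non-EAL threads) and cached cpuset. */
static __thread int sketch_lcore_id = SKETCH_LCORE_ID_ANY;
static __thread cpu_set_t sketch_cpuset;

static int sketch_thread_set_affinity(const cpu_set_t *cpusetp)
{
    /* pthread_self() is valid for EAL and non-EAL threads alike. */
    if (pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t),
                               cpusetp) != 0)
        return -1;

    /* Common path: cache the new cpuset in TLS for quick access. */
    memcpy(&sketch_cpuset, cpusetp, sizeof(cpu_set_t));

    /* Only EAL threads also own an lcore_config entry to update. */
    if (sketch_lcore_id != SKETCH_LCORE_ID_ANY) {
        assert(sketch_lcore_id >= 0 && sketch_lcore_id < SKETCH_MAX_LCORE);
        memcpy(&sketch_lcore_config[sketch_lcore_id].cpuset, cpusetp,
               sizeof(cpu_set_t));
    }
    return 0;
}
```

The two branches of the original collapse into one path, which makes the single real difference (the lcore_config update) easy to see.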

[dpdk-dev] i40e: Steps and required configurations of how to achieve the best performance!

2015-02-09 Thread David Marchand

Hello Helin,

On Thu, Oct 16, 2014 at 2:43 AM, Zhang, Helin  wrote:

>  Hi Thomas
>
>
>
>  Yes, your proposal is the perfect one, but also the most complicated one. I
> was thinking of that one as well, but we did not have enough time for that
> in our 1.8 timeframe.
>
> In the long run, I agree with you to implement EAL function to access PCI
> config space directly. I will try to put it in our plan as soon as
> possible, if no objections.
>
>
>
> For now, I think the quickest and easiest way might be to write a
> script using 'setpci', the Linux command. It is harmless for our code
> base, and we can remove it when we have a better choice. What do you think?
>
>
>
> Thank you very much for the great comments on this topic! I really like it!
>

Did you make any progress on this ?
Actually, looking at Stephen patches (
http://dpdk.org/dev/patchwork/patch/3024/), I think we could go with this
approach once both uio and vfio are fine.


-- 
David Marchand

[dpdk-dev] [PATCH v7 04/14] eal/pci: Consolidate pci address comparison APIs

2015-02-09 Thread Qiu, Michael

On 2/9/2015 4:31 PM, Tetsuya Mukawa wrote:
> This patch replaces pci_addr_comparison() and memcmp() of pci addresses by
> eal_compare_pci_addr().
>
> v5:
> - Fix pci_scan_one to handle pt_driver correctly.
> v4:
> - Fix calculation method of eal_compare_pci_addr().
> - Add parameter checking.
>
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_eal/bsdapp/eal/eal_pci.c   | 25 ---
>  lib/librte_eal/common/eal_common_pci.c|  2 +-
>  lib/librte_eal/common/include/rte_pci.h   | 34 
> +++
>  lib/librte_eal/linuxapp/eal/eal_pci.c | 25 ---
>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  2 +-
>  5 files changed, 54 insertions(+), 34 deletions(-)
>
> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
> b/lib/librte_eal/bsdapp/eal/eal_pci.c
> index 74ecce7..c844d58 100644
> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
> @@ -270,20 +270,6 @@ pci_uio_map_resource(struct rte_pci_device *dev)
>   return (0);
>  }
>  
> -/* Compare two PCI device addresses. */
> -static int
> -pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
> -{
> - uint64_t dev_addr = (addr->domain << 24) + (addr->bus << 16) + (addr->devid << 8) + addr->function;
> - uint64_t dev_addr2 = (addr2->domain << 24) + (addr2->bus << 16) + (addr2->devid << 8) + addr2->function;
> -
> - if (dev_addr > dev_addr2)
> - return 1;
> - else
> - return 0;
> -}
> -
> -
>  /* Scan one pci sysfs entry, and fill the devices list from it. */
>  static int
>  pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
> @@ -356,13 +342,20 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
>   }
>   else {
>   struct rte_pci_device *dev2 = NULL;
> + int ret;
>  
>   TAILQ_FOREACH(dev2, &pci_device_list, next) {
> - if (pci_addr_comparison(&dev->addr, &dev2->addr))
> + ret = eal_compare_pci_addr(&dev->addr, &dev2->addr);
> + if (ret > 0)
>   continue;
> - else {
> + else if (ret < 0) {
>   TAILQ_INSERT_BEFORE(dev2, dev, next);
>   return 0;
> + } else { /* already registered */
> + /* update pt_driver */
> + dev2->pt_driver = dev->pt_driver;
> + free(dev);
> + return 0;
>   }
>   }
>   TAILQ_INSERT_TAIL(&pci_device_list, dev, next);
> diff --git a/lib/librte_eal/common/eal_common_pci.c 
> b/lib/librte_eal/common/eal_common_pci.c
> index f3c7f71..a89f5c3 100644
> --- a/lib/librte_eal/common/eal_common_pci.c
> +++ b/lib/librte_eal/common/eal_common_pci.c
> @@ -93,7 +93,7 @@ static struct rte_devargs *pci_devargs_lookup(struct 
> rte_pci_device *dev)
>   if (devargs->type != RTE_DEVTYPE_BLACKLISTED_PCI &&
>   devargs->type != RTE_DEVTYPE_WHITELISTED_PCI)
>   continue;
> - if (!memcmp(&dev->addr, &devargs->pci.addr, sizeof(dev->addr)))
> + if (!eal_compare_pci_addr(&dev->addr, &devargs->pci.addr))
>   return devargs;
>   }
>   return NULL;
> diff --git a/lib/librte_eal/common/include/rte_pci.h 
> b/lib/librte_eal/common/include/rte_pci.h
> index 7f2d699..4814cd7 100644
> --- a/lib/librte_eal/common/include/rte_pci.h
> +++ b/lib/librte_eal/common/include/rte_pci.h
> @@ -269,6 +269,40 @@ eal_parse_pci_DomBDF(const char *input, struct 
> rte_pci_addr *dev_addr)
>  }
>  #undef GET_PCIADDR_FIELD
>  
> +/* Compare two PCI device addresses. */
> +/**
> + * Utility function to compare two PCI device addresses.
> + *
> + * @param addr
> + *   The PCI Bus-Device-Function address to compare
> + * @param addr2
> + *   The PCI Bus-Device-Function address to compare
> + * @return
> + *   0 on equal PCI address.
> + *   Positive on addr is greater than addr2.
> + *   Negative on addr is less than addr2, or error.
> + */
> +static inline int
> +eal_compare_pci_addr(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
> +{
> + uint64_t dev_addr, dev_addr2;
> +
> + if ((addr == NULL) || (addr2 == NULL))
> + return -1;
> +
> + dev_addr = (addr->domain << 24) | (addr->bus << 16) |
> + (addr->devid << 8) | addr->function;
> + dev_addr2 = (addr2->domain << 24) | (addr2->bus << 16) |
> + (addr2->devid << 8) | addr2->function;
> +
> + if (dev_addr > dev_addr2)
> + return 1;
> + else if (dev_addr < dev_addr2)
> + return -1;
> + else
> + return 0;
> +}
> +
>  /**
>   * Probe the PCI bus for registered drivers.
>   *
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
> b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index c0ca5a5..d847102 100644
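One detail worth noting in eal_compare_pci_addr() in the patch above: domain is a uint16_t that is promoted to int before `<< 24`, so a domain of 0x80 or more would shift into the sign bit of a 32-bit int, which is undefined behaviour. A sketch that avoids the packing altogether by comparing field by field (casting each field to uint64_t before shifting would be another option). struct sketch_pci_addr is a hypothetical mirror of the rte_pci_addr fields, not the DPDK type itself.

```c
#include <stdint.h>

/* Hypothetical mirror of the fields of struct rte_pci_addr. */
struct sketch_pci_addr {
    uint16_t domain;
    uint8_t bus;
    uint8_t devid;
    uint8_t function;
};

/* Compare domain, then bus, devid and function, most significant first.
 * Returns <0, 0 or >0, like memcmp(); no shifting, so no overflow. */
static int sketch_compare_pci_addr(const struct sketch_pci_addr *a,
                                   const struct sketch_pci_addr *b)
{
    if (a->domain != b->domain)
        return a->domain < b->domain ? -1 : 1;
    if (a->bus != b->bus)
        return a->bus < b->bus ? -1 : 1;
    if (a->devid != b->devid)
        return a->devid < b->devid ? -1 : 1;
    if (a->function != b->function)
        return a->function < b->function ? -1 : 1;
    return 0;
}
```

The ordering is the same as the packed form intends (domain:bus:devid.function), so the sorted insert in pci_scan_one() would behave identically.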

[dpdk-dev] mbuf: how to set data to NULL?

2015-02-09 Thread Bruce Richardson

On Mon, Feb 09, 2015 at 10:51:36AM +, Kavanagh, Mark B wrote:
> Hi Bruce,
> 
> As a follow-on to my previous question: I suppose what I'm really getting at 
> is trying to understand the implications of removing the data pointer, and 
> determine if it's possible to replicate behavior observed in DPDK 1.7 (which 
> we need in our use case).
> 
> Take this situation for example:
> 
> DPDK 1.7: I want to set an mbuf's data to NULL:
>   =>   buf.data = NULL;
>Then, when I subsequently attempt to access the mbuf's data 
> section, rte_pktmbuf_mtod(buf) returns NULL
> 
> DPDK 1.8: I want to set an mbuf's data to NULL:
>   =>  buf.data_off = 0;  (is this correct?)
>   Then, if I attempt to access the mbuf's data, instead of NULL, 
> rte_pktmbuf_mtod(buf) returns buf_addr, not NULL.
> 
> Is it possible in DPDK 1.8 to replicate the same behavior observed in 1.7?
> 
> Btw, in our use case a data_len of 0 doesn't necessarily indicate a data 
> value of NULL.
> 
> Thanks,
> Mark
> 

I don't think there is any way to replicate this behaviour exactly with the new 
mbuf
structure. Memsetting zero may do what you want, but depending upon what the
meaning of an mbuf with NULL data is there may still be better ways to indicate
such a thing e.g. a flag value in another field, or setting data_len to -1?

/Bruce

> 
> > -Original Message-
> > From: Richardson, Bruce
> > Sent: Wednesday, December 17, 2014 4:50 PM
> > To: Kavanagh, Mark B
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] mbuf: how to set data to NULL?
> > 
> > On Wed, Dec 17, 2014 at 04:44:15PM +, Kavanagh, Mark B wrote:
> > > Hi,
> > >
> > > DPDK 1.8.0 removes the data pointer from the mbuf structure, such that 
> > > the start of the
> > data in the segment buffer must be calculated (i.e. buf_addr + data_off = 
> > 'data').
> > >
> > > Given this, what is the best approach to set the mbuf data to NULL 
> > > (previously mbuf.data
> > = NULL)?
> > >
> > > As I see it, given an initialized mbuf, such that buf_addr is non-null, 
> > > and data_off
> > =RTE_PKTMBUF_HEADROOM, is it fair to say that the best solution is to 
> > memset to 0 from
> > location (buf_addr + data_off) for a length of (data_len - data_off)?
> > >
> > > Thanks in advance,
> > > Mark
> > 
> > Why not just set data_len = 0 to indicate an empty mbuf?
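To make the 1.8 behaviour concrete: rte_pktmbuf_mtod() computes buf_addr + data_off, so with a valid buf_addr it cannot yield NULL, and setting data_off = 0 simply yields buf_addr. A hypothetical sketch of Bruce's "flag value in another field" suggestion, using a stripped-down mbuf mirror and an application-defined flag (struct sketch_mbuf and SKETCH_F_NO_DATA are illustrative, not DPDK definitions).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down mirror of the DPDK 1.8 mbuf data fields. */
struct sketch_mbuf {
    void *buf_addr;     /* start of the segment buffer */
    uint16_t data_off;  /* offset of the data from buf_addr */
    uint16_t data_len;  /* amount of data in this segment */
    uint32_t app_flags; /* hypothetical application-owned flags */
};

#define SKETCH_F_NO_DATA 0x1u /* hypothetical "data is NULL" marker */

/* Like rte_pktmbuf_mtod(): the data pointer is computed, never stored,
 * so for a valid buf_addr the result is never NULL. */
static void *sketch_mtod(const struct sketch_mbuf *m)
{
    return (char *)m->buf_addr + m->data_off;
}

/* Restores 1.7-style "NULL data" semantics through an explicit flag. */
static void *sketch_data_or_null(const struct sketch_mbuf *m)
{
    return (m->app_flags & SKETCH_F_NO_DATA) ? NULL : sketch_mtod(m);
}
```

An explicit flag keeps "no data" distinct from "empty data" (data_len == 0), which matters for the use case above where the two are not equivalent.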

[dpdk-dev] [PATCH v4 05/17] eal: new TLS definition and API declaration

2015-02-09 Thread Liang, Cunming



> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, February 09, 2015 4:00 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 05/17] eal: new TLS definition and API
> declaration
> 
> Hi,
> 
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > 1. add two TLS *_socket_id* and *_cpuset*
> > 2. add two external API rte_thread_set/get_affinity
> > 3. add one internal API eal_thread_dump_affinity
> 
> To me, it's a bit strange to add an API without the associated code.
> Maybe you have a good reason to do that, but I think in this case it
> should be explained in the commit log.
[LCM] Accept.
> 
> >
> > Signed-off-by: Cunming Liang 
> > ---
> >  lib/librte_eal/bsdapp/eal/eal_thread.c|  2 ++
> >  lib/librte_eal/common/eal_thread.h| 14 ++
> >  lib/librte_eal/common/include/rte_lcore.h | 29
> +++--
> >  lib/librte_eal/linuxapp/eal/eal_thread.c  |  2 ++
> >  4 files changed, 45 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c
> b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > index ab05368..10220c7 100644
> > --- a/lib/librte_eal/bsdapp/eal/eal_thread.c
> > +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > @@ -56,6 +56,8 @@
> >  #include "eal_thread.h"
> >
> >  RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
> > +RTE_DEFINE_PER_LCORE(unsigned, _socket_id);
> > +RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
> >
> >  /*
> >   * Send a message to a slave lcore identified by slave_id to call a
> > diff --git a/lib/librte_eal/common/eal_thread.h
> b/lib/librte_eal/common/eal_thread.h
> > index a25ee86..28edf51 100644
> > --- a/lib/librte_eal/common/eal_thread.h
> > +++ b/lib/librte_eal/common/eal_thread.h
> > @@ -102,4 +102,18 @@ eal_cpuset_socket_id(rte_cpuset_t *cpusetp)
> > return socket_id;
> >  }
> >
> > +/**
> > + * Dump the current pthread cpuset.
> > + * This function is private to EAL.
> > + *
> > + * @param str
> > + *   The string buffer the cpuset will dump to.
> > + * @param size
> > + *   The string buffer size.
> > + */
> > +#define CPU_STR_LEN 256
> > +void
> > +eal_thread_dump_affinity(char str[], unsigned size);
> 
> Although it's equivalent for function arguments, I think "char *str" is
> usually preferred over "char str[]". See for instance in snprintf() or
> fgets().
[LCM] Accept.
> 
> What is the purpose of CPU_STR_LEN?
[LCM] It is a default size for defining the str[] buffer passed to eal_thread_dump_affinity().
> 
> What occurs if the size of the dump is greater than the size of the
> given buffer? Is the string truncated? Is there a \0 at the end?
[LCM] Yes, the string is always terminated with '\0'.
> This should be described in the API comments.
[LCM] Accept.
> Maybe adding a return
> value could help the user to determine if the string was truncated.
[LCM] Good idea, so the user can continue to print '...' for the truncated part.
> 
> > +
> > +
> >  #endif /* EAL_THREAD_H */
> > diff --git a/lib/librte_eal/common/include/rte_lcore.h
> b/lib/librte_eal/common/include/rte_lcore.h
> > index 4c7d6bb..facdbdc 100644
> > --- a/lib/librte_eal/common/include/rte_lcore.h
> > +++ b/lib/librte_eal/common/include/rte_lcore.h
> > @@ -43,6 +43,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #ifdef __cplusplus
> >  extern "C" {
> > @@ -80,7 +81,9 @@ struct lcore_config {
> >   */
> >  extern struct lcore_config lcore_config[RTE_MAX_LCORE];
> >
> > -RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per core "core id". */
> > +RTE_DECLARE_PER_LCORE(unsigned, _lcore_id);  /**< Per thread "lcore id". */
> > +RTE_DECLARE_PER_LCORE(unsigned, _socket_id); /**< Per thread "socket id". */
> > +RTE_DECLARE_PER_LCORE(rte_cpuset_t, _cpuset); /**< Per thread "cpuset". */
> >
> >  /**
> >   * Return the ID of the execution unit we are running on.
> > @@ -146,7 +149,7 @@ rte_lcore_index(int lcore_id)
> >  static inline unsigned
> >  rte_socket_id(void)
> >  {
> > -   return lcore_config[rte_lcore_id()].socket_id;
> > +   return RTE_PER_LCORE(_socket_id);
> >  }
> 
> I don't see where the _socket_id variable is assigned. I think there
> is probably an issue with the splitting of the patches.
[LCM] The value is initialized to SOCKET_ID_ANY by RTE_DEFINE_PER_LCORE(),
and it is updated in eal_thread_set_affinity() for EAL threads and in
rte_thread_set_affinity() for non-EAL threads.
> 
> Regards,
> Olivier
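A sketch of the dump behaviour agreed in this thread: the output is always NUL-terminated, and the function returns an error when the list was truncated so the caller can append '...'. To keep it self-contained, a plain 64-bit mask stands in for rte_cpuset_t; sketch_dump_affinity() is a hypothetical name, and it uses "char *str" as Olivier suggests.

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for rte_cpuset_t: one bit per CPU, 64 CPUs max. */
typedef uint64_t sketch_cpumask_t;

/*
 * Write the CPUs set in 'mask' to 'str' as a list such as "0,1,2".
 * 'str' is always NUL-terminated when size > 0.
 * Returns 0 if the whole list fit, -1 if it was truncated.
 */
static int sketch_dump_affinity(sketch_cpumask_t mask, char *str, size_t size)
{
    size_t out = 0;
    unsigned cpu;
    int ret;

    if (size == 0)
        return -1;
    str[0] = '\0';

    for (cpu = 0; cpu < 64; cpu++) {
        if (!(mask & ((sketch_cpumask_t)1 << cpu)))
            continue;
        /* Prefix a comma for every entry after the first one. */
        ret = snprintf(str + out, size - out, "%s%u", out ? "," : "", cpu);
        if (ret < 0 || (size_t)ret >= size - out)
            return -1; /* truncated; snprintf still wrote the '\0' */
        out += (size_t)ret;
    }
    return 0;
}
```

Writing the separator before each entry avoids a trailing comma that would otherwise need trimming.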

[dpdk-dev] [PATCH v4 04/17] eal: add support parsing socket_id from cpuset

2015-02-09 Thread Liang, Cunming



> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, February 09, 2015 4:00 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 04/17] eal: add support parsing socket_id
> from cpuset
> 
> Hi,
> 
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > It returns the socket_id if all cpus in the cpuset belong
> > to the same NUMA node; otherwise it returns SOCKET_ID_ANY.
> >
> > Signed-off-by: Cunming Liang 
> > ---
> >  lib/librte_eal/bsdapp/eal/eal_lcore.c   |  7 +
> >  lib/librte_eal/common/eal_thread.h  | 52
> +
> >  lib/librte_eal/linuxapp/eal/eal_lcore.c |  7 +
> >  3 files changed, 66 insertions(+)
> >
> > diff --git a/lib/librte_eal/bsdapp/eal/eal_lcore.c
> b/lib/librte_eal/bsdapp/eal/eal_lcore.c
> > index 72f8ac2..162fb4f 100644
> > --- a/lib/librte_eal/bsdapp/eal/eal_lcore.c
> > +++ b/lib/librte_eal/bsdapp/eal/eal_lcore.c
> > @@ -41,6 +41,7 @@
> >  #include 
> >
> >  #include "eal_private.h"
> > +#include "eal_thread.h"
> >
> >  /* No topology information available on FreeBSD including NUMA info */
> >  #define cpu_core_id(X) 0
> > @@ -112,3 +113,9 @@ rte_eal_cpu_init(void)
> >
> > return 0;
> >  }
> > +
> > +unsigned
> > +eal_cpu_socket_id(__rte_unused unsigned cpu_id)
> > +{
> > +   return cpu_socket_id(cpu_id);
> > +}
> > diff --git a/lib/librte_eal/common/eal_thread.h
> b/lib/librte_eal/common/eal_thread.h
> > index b53b84d..a25ee86 100644
> > --- a/lib/librte_eal/common/eal_thread.h
> > +++ b/lib/librte_eal/common/eal_thread.h
> > @@ -34,6 +34,10 @@
> >  #ifndef EAL_THREAD_H
> >  #define EAL_THREAD_H
> >
> > +#include 
> > +
> > +#include 
> > +
> >  /**
> >   * basic loop of thread, called for each thread by eal_init().
> >   *
> > @@ -50,4 +54,52 @@ __attribute__((noreturn)) void *eal_thread_loop(void *arg);
> >   */
> >  void eal_thread_init_master(unsigned lcore_id);
> >
> > +/**
> > + * Get the NUMA socket id from cpu id.
> > + * This function is private to EAL.
> > + *
> > + * @param cpu_id
> > + *   The logical process id.
> > + * @return
> > + *   socket_id or SOCKET_ID_ANY
> > + */
> > +unsigned eal_cpu_socket_id(unsigned cpu_id);
> 
> Wouldn't it be better to rename the existing function cpu_socket_id()
> to eal_cpu_socket_id() and export it in eal_thread.h?
> 
> In case of bsd where cpu_socket_id() is implemented using a #define,
> a new function should be created returning 0.
[LCM] In eal_lcore.c, cpu_socket_id()/cpu_core_id() are defined as static and
only used in rte_eal_cpu_init().
I suppose the purpose of the original design is to keep the sysfs parsing
visible only within that file.
Whether we remove the 'static' prefix of cpu_core_id() or add a new wrapper
eal_cpu_socket_id(), the result is a new extern EAL API.
So I prefer not to change the visibility of the original static function, and
to add one extern interface instead.
> 
> 
> > +
> > +/**
> > + * Get the NUMA socket id from cpuset.
> > + * This function is private to EAL.
> > + *
> > + * @param cpusetp
> > + *   The pointer to a valid cpu set.
> > + * @return
> > + *   socket_id or SOCKET_ID_ANY
> > + */
> > +static inline int
> > +eal_cpuset_socket_id(rte_cpuset_t *cpusetp)
> > +{
> > +   unsigned cpu = 0;
> > +   int socket_id = SOCKET_ID_ANY;
> > +   int sid;
> > +
> > +   if (cpusetp == NULL)
> > +   return SOCKET_ID_ANY;
> 
> SOCKET_ID_ANY is not defined, maybe <rte_memory.h> should be included
> somewhere.
[LCM] Agreed. eal_cpuset_socket_id() can move into eal_common_thread.c,
which would include rte_memory.h for the SOCKET_ID_ANY definition.
> 
> > +
> > +   do {
> > +   if (!CPU_ISSET(cpu, cpusetp))
> > +   continue;
> > +
> > +   if (socket_id == SOCKET_ID_ANY)
> > +   socket_id = eal_cpu_socket_id(cpu);
> > +
> > +   sid = eal_cpu_socket_id(cpu);
> > +   if (socket_id != sid) {
> > +   socket_id = SOCKET_ID_ANY;
> > +   break;
> > +   }
> > +
> > +   } while (++cpu < RTE_MAX_LCORE);
> > +
> > +   return socket_id;
> > +}
> 
> 
> I don't think this function should be inlined.
> 
> As this function is not used, it could be interesting for reviewers
> to understand when
[LCM] It's used in eal_thread_set_affinity() of eal_thread.c.
> 
> > +
> >  #endif /* EAL_THREAD_H */
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_lcore.c
> b/lib/librte_eal/linuxapp/eal/eal_lcore.c
> > index 29615f8..922af6d 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_lcore.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_lcore.c
> > @@ -45,6 +45,7 @@
> >
> >  #include "eal_private.h"
> >  #include "eal_filesystem.h"
> > +#include "eal_thread.h"
> >
> >  #define SYS_CPU_DIR "/sys/devices/system/cpu/cpu%u"
> >  #define CORE_ID_FILE "topology/core_id"
> > @@ -197,3 +198,9 @@ rte_eal_cpu_init(void)
> >
> > return 0;
> >  }
> > +
> > +unsigned
> > +eal_cpu_socket_id(unsigned cpu_id)
> > +{
> > +   return cpu_socket_id(cpu_id);
> > +}
> >
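A sketch of the same socket-from-cpuset logic, restructured so that each CPU's socket is looked up only once (the patch above calls eal_cpu_socket_id() twice for the first CPU). The CPU-to-socket table and the 64-bit mask are hypothetical stand-ins for the sysfs data and rte_cpuset_t; SKETCH_SOCKET_ID_ANY mirrors SOCKET_ID_ANY (-1).

```c
#include <stdint.h>

#define SKETCH_SOCKET_ID_ANY (-1) /* mirrors SOCKET_ID_ANY */
#define SKETCH_MAX_CPU 64

/* Hypothetical CPU -> NUMA node table; EAL derives this from sysfs. */
static int sketch_cpu_socket[SKETCH_MAX_CPU];

/* Return the socket shared by every CPU in 'mask', or
 * SKETCH_SOCKET_ID_ANY if they span several sockets or mask is empty. */
static int sketch_cpuset_socket_id(uint64_t mask)
{
    int socket_id = SKETCH_SOCKET_ID_ANY;
    unsigned cpu;

    for (cpu = 0; cpu < SKETCH_MAX_CPU; cpu++) {
        int sid;

        if (!(mask & ((uint64_t)1 << cpu)))
            continue;
        sid = sketch_cpu_socket[cpu];      /* one lookup per CPU */
        if (socket_id == SKETCH_SOCKET_ID_ANY)
            socket_id = sid;               /* first CPU sets the candidate */
        else if (sid != socket_id)
            return SKETCH_SOCKET_ID_ANY;   /* CPUs span several sockets */
    }
    return socket_id;
}
```

Because the function is not on a hot path, it could also live as a normal (non-inline) function in a .c file, as Olivier suggests.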

[dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there is no buffer

2015-02-09 Thread Linhaifeng

On 2015/2/9 10:57, Xu, Qian Q wrote:
> Haifeng, 
> No matter mergeable =0 or 1, I have not met the issue that the vhost-user 
> crash when start VM. Have u changed the code? As you said below, vhost-switch 
> will notify guest after sending every packet, yes, it's the current code, and 
> Huawei, Xie will plan to optimize it in future. Is the crash caused by 
> changing code or any other step? 
> What do you want for the vhost-user, changing the notification mechanism? 
> Thx. By the way, sth means something. 
> 

Yes, I modified the code to fix the compile errors (I replaced it with
memset(&msgh, 0, sizeof msgh)).

The issue is that mmap fails (the memory size is not aligned to the hugepage
size). I guess this is a QEMU bug.

In file included from /mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/virtio-net.c:34:
/usr/include/linux/vhost.h:33: error: expected specifier-qualifier-list before 'pid_t'
== Build lib/librte_port
cc1: warnings being treated as errors
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c: In function 'read_fd_message':
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c:141: error: missing initializer
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c:141: error: (near initialization for 'msgh.msg_namelen')
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c: In function 'send_fd_message':
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c:213: error: missing initializer
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c:213: error: (near initialization for 'msgh.msg_namelen')
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c: In function 'vserver_new_vq_conn':
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c:276: error: missing initializer
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c:276: error: (near initialization for 'vdev_ctx.fh')
make[5]: *** [vhost_user/vhost-net-user.o] Error 1
make[5]: *** Waiting for unfinished jobs
cc1: warnings being treated as errors
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/virtio-net-user.c: In function 'user_set_mem_table':
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/virtio-net-user.c:104: error: missing initializer
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/virtio-net-user.c:104: error: (near initialization for 'tmp[0].mapped_address')

> -Original Message-
> From: Linhaifeng [mailto:haifeng.lin at huawei.com] 
> Sent: Saturday, February 07, 2015 12:27 PM
> To: Xu, Qian Q; Xie, Huawei
> Cc: lilijun; liuyongan at huawei.com; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there 
> is no buffer
> 
> 
> 
> On 2015/2/6 13:54, Xu, Qian Q wrote:
>> Haifeng
>> Are you using the latest dpdk branch with vhost-user patches? I have never 
>> met the issue.
>> When is the vhost sample crashed? When you start VM or when you run sth in 
>> VM? Is your qemu 2.2? How about your memory info? Could you give more 
>> details about your steps? 
>>
>>
> 
> I know why you never met the issue: vhost-switch notifies the guest
> after sending every packet (performance is not very good).
> 
> static inline int __attribute__((always_inline)) virtio_tx_local(struct 
> vhost_dev *vdev, struct rte_mbuf *m) {
>   ...
>   ret = rte_vhost_enqueue_burst(tdev, VIRTIO_RXQ, &m, 1 /* you can try to fill with rx_count */);
>   ..
> 
> }
> 
>>
>> -Original Message-
>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>> Sent: Friday, February 06, 2015 12:02 PM
>> To: Xu, Qian Q; Xie, Huawei
>> Cc: lilijun; liuyongan at huawei.com; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] vhost: notify guest to fill buffer 
>> when there is no buffer
>>
>>
>>
>> On 2015/2/4 9:38, Xu, Qian Q wrote:
>>> 4. Launch VM1 and VM2 with a virtio device. Note: you need to use QEMU
>>> version > 2.1 to enable the vhost-user server feature. Older QEMU versions
>>> such as 1.5 and 1.6 don't support it.
>>> Below is my VM1 startup command, for your reference, similar for VM2. 
>>> /home/qemu-2.2.0/x86_64-softmmu/qemu-system-x86_64 -name us-vhost-vm1 
>>> -cpu host -enable-kvm -m 2048 -object 
>>> memory-backend-file,id=mem,size=2048M,mem-path=/mnt/huge,share=on 
>>> -numa node,memdev=mem -mem-prealloc -smp 2 -drive 
>>> file=/home/img/dpdk1-vm1.img -chardev 
>>> socket,id=char0,path=/home/dpdk-vhost/vhost-net -netdev 
>>> type=vhost-user,id=mynet1,chardev=char0,vhostforce -device 
>>> virtio-net-pci,mac=00:00:00:00:00:01, -nographic
>>>
>>> 5. Then in the VM, you can have the same operations as before, send packet 
>>> from virtio1 to virtio2. 
>>>
>>> Pls let me know if any questions, issues. 
>>
>> Hi xie & xu
>>
>> When I try to start VM vhost-switch crashed.
>>
>> VHOST_CONFIG: read message
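The "missing initializer" errors in the build log above most likely come from older GCC's -Wmissing-field-initializers (promoted to errors by -Werror) on an aggregate initializer that sets only the first member of struct msghdr; zeroing the struct with memset(), as Linhaifeng did, avoids the warning. A self-contained sketch of that pattern in a file-descriptor pass over a UNIX socket (SCM_RIGHTS), similar in spirit to read_fd_message()/send_fd_message(); the sketch_* functions are hypothetical and are not the vhost library's code.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer sized for one fd, aligned for struct cmsghdr. */
union sketch_fd_control {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one byte of payload plus one file descriptor. */
static int sketch_send_fd(int sock, int fd)
{
    struct msghdr msgh;
    struct iovec iov;
    union sketch_fd_control control;
    struct cmsghdr *cmsg;
    char byte = 0;

    /* memset instead of a partial "= { ... }" initializer: no
     * missing-initializer warning, and every field starts at zero. */
    memset(&msgh, 0, sizeof msgh);
    memset(&control, 0, sizeof control);
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msgh.msg_iov = &iov;
    msgh.msg_iovlen = 1;
    msgh.msg_control = control.buf;
    msgh.msg_controllen = sizeof control.buf;

    cmsg = CMSG_FIRSTHDR(&msgh);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msgh, 0) == 1 ? 0 : -1;
}

/* Receive one byte plus one file descriptor; returns the fd or -1. */
static int sketch_recv_fd(int sock)
{
    struct msghdr msgh;
    struct iovec iov;
    union sketch_fd_control control;
    struct cmsghdr *cmsg;
    char byte;
    int fd;

    memset(&msgh, 0, sizeof msgh);
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msgh.msg_iov = &iov;
    msgh.msg_iovlen = 1;
    msgh.msg_control = control.buf;
    msgh.msg_controllen = sizeof control.buf;

    if (recvmsg(sock, &msgh, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msgh);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```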

[dpdk-dev] [PATCH v4 03/17] eal: fix wrong strnlen() return value in 32bit icc

2015-02-09 Thread Liang, Cunming



> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, February 09, 2015 4:00 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 03/17] eal: fix wrong strnlen() return 
> value in
> 32bit icc
> 
> Hi,
> 
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > The problem is that strnlen() here may return an invalid value with 32-bit icc
> > (it actually returns its second parameter, e.g. sysconf(_SC_ARG_MAX)).
> > It starts to manifest when the max_len parameter is > 2M and icc is used with -m32 -O2 (or above).
> >
> > Suggested-by: Konstantin Ananyev 
> > Signed-off-by: Cunming Liang 
> > ---
> >  lib/librte_eal/common/eal_common_options.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/eal_common_options.c
> b/lib/librte_eal/common/eal_common_options.c
> > index 29ebb6f..22d5d37 100644
> > --- a/lib/librte_eal/common/eal_common_options.c
> > +++ b/lib/librte_eal/common/eal_common_options.c
> > @@ -227,7 +227,7 @@ eal_parse_corelist(const char *corelist)
> > /* Remove all blank characters ahead and after */
> > while (isblank(*corelist))
> > corelist++;
> > -   i = strnlen(corelist, sysconf(_SC_ARG_MAX));
> > +   i = strnlen(corelist, PATH_MAX);
> > while ((i > 0) && isblank(corelist[i - 1]))
> > i--;
> >
> > @@ -469,7 +469,7 @@ eal_parse_lcores(const char *lcores)
> > /* Remove all blank characters ahead and after */
> > while (isblank(*lcores))
> > lcores++;
> > -   i = strnlen(lcores, sysconf(_SC_ARG_MAX));
> > +   i = strnlen(lcores, PATH_MAX);
> > while ((i > 0) && isblank(lcores[i - 1]))
> > i--;
> >
> >
> 
> I think PATH_MAX is not equivalent to _SC_ARG_MAX.
> 
> But the main question is: why do we need to use strnlen() here instead
> of strlen? We can expect that argv[] pointers are always nul-terminated.
> Replacing them by strlen() would probably also solve the icc issue.
[LCM] You're right, strlen() here would also solve the icc issue, with no risk
for argv[].
But as a matter of practice, keeping the 'n' variants of these functions in
DPDK is not bad.
There are two additional reasons to keep strnlen and PATH_MAX:
1. PATH_MAX is defined as 4096, which is enough for our input. Whether it
equals _SC_ARG_MAX or not doesn't matter.
2. strnlen and PATH_MAX are already used in eal_parse_coremask, so this keeps
the style consistent between '-l' and '--lcores'.

> 
> Regards,
> Olivier
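The trimming logic shared by eal_parse_corelist() and eal_parse_lcores() can be sketched as a standalone helper. It keeps a bounded strnlen() as in the patch; as noted above, argv[] strings are NUL-terminated, so strlen() would give the same result. sketch_trim_blanks() is a hypothetical name, and the _POSIX_C_SOURCE define is only there so strnlen() is declared under strict compiler modes.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <string.h>

/*
 * Skip leading blanks and drop trailing blanks without modifying 'arg'.
 * The length scan is bounded by 'max_len' (PATH_MAX in the patch).
 * Returns the first non-blank character; the trimmed length goes in *len.
 */
static const char *sketch_trim_blanks(const char *arg, size_t max_len,
                                      size_t *len)
{
    size_t i;

    while (isblank((unsigned char)*arg))
        arg++;
    i = strnlen(arg, max_len);
    while (i > 0 && isblank((unsigned char)arg[i - 1]))
        i--;
    *len = i;
    return arg;
}
```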

[dpdk-dev] [PATCH v4 02/17] eal: new eal option '--lcores' for cpu assignment

2015-02-09 Thread Liang, Cunming



> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, February 09, 2015 4:00 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 02/17] eal: new eal option '--lcores' for 
> cpu
> assignment
> 
> Hi,
> 
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > It supports one new eal long option '--lcores' for EAL thread cpuset 
> > assignment.
> >
> > The format pattern:
> > --lcores='lcores[@cpus]<,lcores[@cpus]>'
> > lcores, cpus could be a single digit/range or a group.
> > '(' and ')' are necessary if it's a group.
> > If '@cpus' is not supplied, cpus takes the same value as lcores.
> >
> > e.g. '1,2@(5-7),(3-5)@(0,2),(0,6),7-8' means starting 9 EAL threads as below
> >   lcore 0 runs on cpuset 0x41 (cpu 0,6)
> >   lcore 1 runs on cpuset 0x2 (cpu 1)
> >   lcore 2 runs on cpuset 0xe0 (cpu 5,6,7)
> >   lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2)
> >   lcore 6 runs on cpuset 0x41 (cpu 0,6)
> >   lcore 7 runs on cpuset 0x80 (cpu 7)
> >   lcore 8 runs on cpuset 0x100 (cpu 8)
> >
> > Signed-off-by: Cunming Liang 
> > ---
> >  lib/librte_eal/common/eal_common_launch.c  |   1 -
> >  lib/librte_eal/common/eal_common_options.c | 300
> -
> >  lib/librte_eal/common/eal_options.h|   2 +
> >  lib/librte_eal/linuxapp/eal/Makefile   |   1 +
> >  4 files changed, 299 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/eal_common_launch.c
> b/lib/librte_eal/common/eal_common_launch.c
> > index 599f83b..2d732b1 100644
> > --- a/lib/librte_eal/common/eal_common_launch.c
> > +++ b/lib/librte_eal/common/eal_common_launch.c
> > @@ -117,4 +117,3 @@ rte_eal_mp_wait_lcore(void)
> > rte_eal_wait_lcore(lcore_id);
> > }
> >  }
> > -
> 
> 
> This line should be removed from the patch.
[LCM] Accept.
> 
> 
> > diff --git a/lib/librte_eal/common/eal_common_options.c
> b/lib/librte_eal/common/eal_common_options.c
> > index 67e02dc..29ebb6f 100644
> > --- a/lib/librte_eal/common/eal_common_options.c
> > +++ b/lib/librte_eal/common/eal_common_options.c
> > @@ -45,6 +45,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include "eal_internal_cfg.h"
> >  #include "eal_options.h"
> > @@ -85,6 +86,7 @@ eal_long_options[] = {
> > {OPT_XEN_DOM0, 0, 0, OPT_XEN_DOM0_NUM},
> > {OPT_CREATE_UIO_DEV, 1, NULL, OPT_CREATE_UIO_DEV_NUM},
> > {OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
> > +   {OPT_LCORES, 1, 0, OPT_LCORES_NUM},
> > {0, 0, 0, 0}
> >  };
> >
> > @@ -255,9 +257,11 @@ eal_parse_corelist(const char *corelist)
> > if (min == RTE_MAX_LCORE)
> > min = idx;
> > for (idx = min; idx <= max; idx++) {
> > -   cfg->lcore_role[idx] = ROLE_RTE;
> > -   lcore_config[idx].core_index = count;
> > -   count++;
> > +   if (cfg->lcore_role[idx] != ROLE_RTE) {
> > +   cfg->lcore_role[idx] = ROLE_RTE;
> > +   lcore_config[idx].core_index = count;
> > +   count++;
> > +   }
> > }
> > min = RTE_MAX_LCORE;
> > } else
> > @@ -292,6 +296,279 @@ eal_parse_master_lcore(const char *arg)
> > return 0;
> >  }
> >
> > +/*
> > + * Parse elem, the elem could be single number/range or '(' ')' group
> > + * Within group elem, '-' used for a range separator;
> > + *',' used for a single number.
> > + */
> > +static int
> > +eal_parse_set(const char *input, uint16_t set[], unsigned num)
> 
> It's not very clear what elem is. Maybe it could be a bit reworded.
> What about naming the function "eal_parse_cpuset()" instead?
[LCM] Since it parses not only cpusets but also lcore sets, 
'eal_parse_cpuset' would not be accurate.
The set/elem here stands for a single number (e.g. 1), a number range (e.g. 
4-6) or a group (e.g. (3,4-8,9) ).
I'll reword the comment to make it easier to understand. Thanks.
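A minimal standalone sketch of the three element forms described above: a single number such as `1`, a range such as `4-6`, and a group such as `(3,4-8,9)`. It is not the patch's `eal_parse_set()`. The names `sketch_parse_set()` and `parse_index()`, and the convention of returning the number of characters consumed (or -1 on error), are assumptions for illustration; only the `uint16_t set[]` / `num` signature mirrors the patch.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse one decimal index at *str, require it < num,
 * and advance *str past it. Returns 0 on success, -1 on error. */
static int parse_index(const char **str, unsigned num, unsigned *out)
{
	char *end = NULL;
	unsigned long v;

	if (!isdigit((unsigned char)**str))
		return -1;
	errno = 0;
	v = strtoul(*str, &end, 10);
	if (errno != 0 || end == *str || v >= num)
		return -1;
	*out = (unsigned)v;
	*str = end;
	return 0;
}

/* Sketch: parse "N", "N-M" or "(a,b-c,...)" and mark chosen indices in set[].
 * Returns the number of characters consumed, or -1 on error. */
static int sketch_parse_set(const char *input, uint16_t set[], unsigned num)
{
	const char *str = input;
	unsigned min, max, i;
	int grouped;

	memset(set, 0, num * sizeof(uint16_t));
	while (isblank((unsigned char)*str))
		str++;

	grouped = (*str == '(');
	if (grouped)
		str++;

	for (;;) {
		while (isblank((unsigned char)*str))
			str++;
		if (parse_index(&str, num, &min) < 0)
			return -1;
		max = min;
		while (isblank((unsigned char)*str))
			str++;
		if (*str == '-') {		/* range element: min-max */
			str++;
			while (isblank((unsigned char)*str))
				str++;
			if (parse_index(&str, num, &max) < 0 || max < min)
				return -1;
		}
		for (i = min; i <= max; i++)
			set[i] = 1;
		while (isblank((unsigned char)*str))
			str++;
		if (!grouped)
			break;			/* bare number or range: one element */
		if (*str == ',') {
			str++;
			continue;
		}
		if (*str == ')') {
			str++;
			break;
		}
		return -1;			/* unterminated group */
	}
	return (int)(str - input);
}
```

For example, `"(3,4-8,9)"` marks indices 3 to 9 and consumes all 9 characters, while `"(3,4"` is rejected because the group is never closed.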
> 
> 
> > +{
> > +   unsigned idx;
> > +   const char *str = input;
> > +   char *end = NULL;
> > +   unsigned min, max;
> > +
> > +   memset(set, 0, num * sizeof(uint16_t));
> > +
> > +   while (isblank(*str))
> > +   str++;
> > +
> > +   /* only a digit or a left bracket qualifies as a start point */
> > +   if ((!isdigit(*str) && *str != '(') || *str == '\0')
> > +   return -1;
> > +
> > +   /* process single number or single range of number */
> > +   if (*str != '(') {
> > +   errno = 0;
> > +   idx = strtoul(str, &end, 10);
> > +   if (errno || end == NULL || idx >= num)
> > +   return -1;
> > +   else {
> > +   while (isblank(*end))
> > +   end++;
> > +
> > +   min = idx;
> > +   max = idx;
> > +   if (*end ==

[dpdk-dev] [PATCH v4 01/17] eal: add cpuset into per EAL thread lcore_config

2015-02-09 Thread Liang, Cunming



> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, February 09, 2015 4:00 AM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 01/17] eal: add cpuset into per EAL thread
> lcore_config
> 
> Hi,
> 
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > The patch adds 'cpuset' into the per-lcore configuration 'lcore_config[]',
> > as an lcore is no longer always pinned 1:1 to a physical CPU.
> > The lcore now stands for an EAL thread rather than a logical CPU.
> >
> > It doesn't change the default 1:1 mapping, but allows an EAL thread
> > to be affinitized to multiple CPUs.
> >
> > [...]
> > diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
> > index 65ee87d..a34d500 100644
> > --- a/lib/librte_eal/bsdapp/eal/eal_memory.c
> > +++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
> > @@ -45,6 +45,8 @@
> >  #include "eal_internal_cfg.h"
> >  #include "eal_filesystem.h"
> >
> > +/* avoid a redefinition conflict with the FreeBSD header */
> > +#undef PAGE_SIZE
> >  #define PAGE_SIZE (sysconf(_SC_PAGESIZE))
> 
> I don't see the link with the patch. Should this go somewhere else?
> 
> 
> >
> >  /*
> > diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
> > index 49b2c03..4c7d6bb 100644
> > --- a/lib/librte_eal/common/include/rte_lcore.h
> > +++ b/lib/librte_eal/common/include/rte_lcore.h
> > @@ -50,6 +50,13 @@ extern "C" {
> >
> >  #define LCORE_ID_ANY -1/**< Any lcore. */
> >
> > +#if defined(__linux__)
> > +   typedef cpu_set_t rte_cpuset_t;
> > +#elif defined(__FreeBSD__)
> > +#include 
> > +   typedef cpuset_t rte_cpuset_t;
> > +#endif
> > +
> 
> Should we also define RTE_CPU_SETSIZE?
> For linux, should  be included?
[LCM] It uses the fixed-size cpuset; it won't use CPU_ALLOC() to get a pointer 
to a cpuset.
RTE_CPU_SETSIZE is always equal to sizeof(rte_cpuset_t).
> 
> If I understand correctly, after the patch series, users of
> rte_thread_set_affinity() and rte_thread_get_affinity() are
> supposed to use the macros from sched.h to access this
> cpuset parameter. So I'm wondering whether it would be better to
> use cpu_set_t from libc instead of redefining rte_cpuset_t.
> 
> To reword my question: what is the purpose of redefining
> cpu_set_t as rte_cpuset_t if we still need to use all the
> libc API to access it?
[LCM] On Linux the type is *cpu_set_t*, but on FreeBSD it's *cpuset_t*.
The purpose of *rte_cpuset_t* is to give EAL a consistent type definition, 
and to avoid lots of #ifdefs for this difference.
On either Linux or FreeBSD, the libc macros can still be used to set an 
rte_cpuset_t.
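To illustrate the point about the libc macros, here is a small sketch, assuming glibc on Linux or the FreeBSD `<sys/cpuset.h>` header, in which one typedef hides the two spellings while `CPU_ZERO()`, `CPU_SET()` and `CPU_ISSET()` are used unchanged. The name `sketch_cpuset_t` and both helper functions are hypothetical; only the typedef idea comes from the patch.

```c
#define _GNU_SOURCE /* glibc: exposes cpu_set_t and the CPU_* macros */
#include <sched.h>

/* One EAL-wide name for the two platform spellings (the patch's idea). */
#if defined(__linux__)
typedef cpu_set_t sketch_cpuset_t;
#elif defined(__FreeBSD__)
#include <sys/param.h>
#include <sys/cpuset.h>
typedef cpuset_t sketch_cpuset_t;
#else
#error "platform not covered by this sketch"
#endif

/* Build a set containing CPUs first..last using only the libc macros. */
static void sketch_cpuset_fill(sketch_cpuset_t *set, int first, int last)
{
	int cpu;

	CPU_ZERO(set);
	for (cpu = first; cpu <= last && cpu < CPU_SETSIZE; cpu++)
		CPU_SET(cpu, set);
}

/* Count members with CPU_ISSET (CPU_COUNT is not available everywhere). */
static int sketch_cpuset_count(const sketch_cpuset_t *set)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, set))
			n++;
	return n;
}
```

Note that `CPU_SETSIZE` counts CPUs (bits), whereas `sizeof()` of the set counts bytes; with glibc these are 1024 and 128 respectively, so any size constant derived from the type should state which of the two it means.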
> 
> 
> Regards,
> Olivier

[dpdk-dev] mbuf: how to set data to NULL?

2015-02-09 Thread Kavanagh, Mark B

Hi Bruce,

As a follow-on to my previous question: what I'm really getting at is 
understanding the implications of removing the data pointer, and determining 
whether it's possible to replicate the behavior observed in DPDK 1.7 (which we 
need in our use case).

Take this situation for example:

DPDK 1.7: I want to set an mbuf's data to NULL:
=>   buf.data = NULL;
     Then, when I subsequently attempt to access the mbuf's data 
     section, rte_pktmbuf_mtod(buf) returns NULL.

DPDK 1.8: I want to set an mbuf's data to NULL:
=>   buf.data_off = 0;  (is this correct?)
     Then, if I attempt to access the mbuf's data, 
     rte_pktmbuf_mtod(buf) returns buf_addr, not NULL.

Is it possible in DPDK 1.8 to replicate the same behavior observed in 1.7?

Btw, in our use case a data_len of 0 doesn't necessarily indicate a data value 
of NULL.

Thanks,
Mark

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Wednesday, December 17, 2014 4:50 PM
> To: Kavanagh, Mark B
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] mbuf: how to set data to NULL?
> 
> On Wed, Dec 17, 2014 at 04:44:15PM +, Kavanagh, Mark B wrote:
> > Hi,
> >
> > DPDK 1.8.0 removes the data pointer from the mbuf structure, such that the
> > start of the data in the segment buffer must be calculated (i.e. buf_addr +
> > data_off = 'data').
> >
> > Given this, what is the best approach to set the mbuf data to NULL
> > (previously mbuf.data = NULL)?
> >
> > As I see it, given an initialized mbuf, such that buf_addr is non-null, and
> > data_off = RTE_PKTMBUF_HEADROOM, is it fair to say that the best solution is
> > to memset to 0 from location (buf_addr + data_off) for a length of
> > (data_len - data_off)?
> >
> > Thanks in advance,
> > Mark
> 
> Why not just set data_len = 0 to indicate an empty mbuf?
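The DPDK 1.8 arithmetic described in this thread (data = `buf_addr + data_off`) can be modelled with a cut-down struct. The sketch uses a hypothetical `struct sketch_mbuf` rather than the real `struct rte_mbuf`, and shows why setting `data_off = 0` yields `buf_addr` rather than NULL: as long as `buf_addr` is non-NULL, no offset value makes the computed pointer NULL.

```c
#include <stdint.h>

/* Hypothetical stand-in holding only the mbuf fields discussed here. */
struct sketch_mbuf {
	void     *buf_addr;	/* start of the segment buffer */
	uint16_t  data_off;	/* offset of packet data from buf_addr */
	uint16_t  data_len;	/* bytes of data in this segment */
};

/* Same computation as described for DPDK 1.8: buf_addr + data_off. */
static void *sketch_mtod(const struct sketch_mbuf *m)
{
	return (char *)m->buf_addr + m->data_off;
}
```

With `buf_addr` pointing at a real buffer, `data_off = 0` simply returns the buffer start, which matches the behavior Mark observed.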

[dpdk-dev] [PATCH 1/4] pci: allow access to PCI config space

2015-02-09 Thread David Marchand

Hello Stephen,

- It looks a bit odd to me: we end up with something asymmetric between uio
and vfio with respect to PCI config space.
Can we have an API that is consistent between the two?
Does this mean that your PMD cannot work / has not been used with vfio?

- Anyway, I suppose we could reuse this API to remove the RTE_PCI_CONFIG
#ifdef / hardcoded stuff from linux eal / igb_uio.
Opinion?
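The accessors in the patch quoted below are thin wrappers around `pread()`/`pwrite()` on a per-device fd. A minimal sketch of how a caller might read 16-bit fields with that approach, assuming the fd refers to a config-space image such as the sysfs `config` file the patch opens; `sketch_cfg_read16()` and `sketch_print_ids()` are hypothetical helpers. The vendor and device IDs sit at offsets 0x00 and 0x02 of the standard PCI header, and config space is little-endian.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read a 16-bit little-endian field at 'offset' of config space.
 * Returns 0 on success, -1 on an error or a short read. */
static int sketch_cfg_read16(int fd, off_t offset, uint16_t *val)
{
	uint8_t b[2];

	if (pread(fd, b, sizeof(b), offset) != (ssize_t)sizeof(b))
		return -1;
	*val = (uint16_t)(b[0] | (b[1] << 8));	/* little-endian */
	return 0;
}

/* Print "vendor:device" from standard header offsets 0x00 and 0x02. */
static int sketch_print_ids(int fd)
{
	uint16_t vendor, device;

	if (sketch_cfg_read16(fd, 0x00, &vendor) < 0 ||
	    sketch_cfg_read16(fd, 0x02, &device) < 0)
		return -1;
	printf("%04x:%04x\n", vendor, device);
	return 0;
}
```

A caller should treat a short read as an error: on Linux, the sysfs `config` file typically exposes only the first 64 bytes of config space to unprivileged users, and `rte_eal_pci_read_config()` in the patch returns the raw `pread()` result.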


-- 
David Marchand


On Fri, Feb 6, 2015 at 7:36 PM, Stephen Hemminger <
stephen at networkplumber.org> wrote:

> From: Stephen Hemminger 
>
> Some drivers need the ability to access PCI config space (for example, for power
> management). This adds an abstraction to do this; it is only implemented
> on Linux, but should be possible on BSD.
>
> Signed-off-by: Stephen Hemminger 
> ---
>  lib/librte_eal/common/include/rte_pci.h | 29 +
>  lib/librte_eal/linuxapp/eal/eal_pci.c   | 15 +
>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c   | 10 +
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  2 ++
>  4 files changed, 56 insertions(+)
>
> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
> index 66ed793..a78081f 100644
> --- a/lib/librte_eal/common/include/rte_pci.h
> +++ b/lib/librte_eal/common/include/rte_pci.h
> @@ -152,6 +152,7 @@ struct rte_pci_device {
> uint16_t max_vfs;   /**< sriov enable if not zero */
> int numa_node;  /**< NUMA node connection */
> struct rte_devargs *devargs;/**< Device user arguments */
> +   int config_fd;  /**< PCI config access */
>  };
>
>  /** Any PCI device identifier (vendor, device, ...) */
> @@ -298,6 +299,34 @@ void rte_eal_pci_register(struct rte_pci_driver *driver);
>   */
>  void rte_eal_pci_unregister(struct rte_pci_driver *driver);
>
> +/**
> + * Read PCI config space.
> + *
> + * @param device
> + *   A pointer to a rte_pci_device structure describing the device
> + *   to use
> + * @param buf
> + *   A data buffer where the bytes should be read into
> + * @param len
> + *   The length of the data buffer.
> + */
> +int rte_eal_pci_read_config(const struct rte_pci_device *device,
> +   void *buf, size_t len, off_t offset);
> +
> +/**
> + * Write PCI config space.
> + *
> + * @param device
> + *   A pointer to a rte_pci_device structure describing the device
> + *   to use
> + * @param buf
> + *   A data buffer containing the bytes to be written
> + * @param len
> + *   The length of the data buffer.
> + */
> +int rte_eal_pci_write_config(const struct rte_pci_device *device,
> +const void *buf, size_t len, off_t offset);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
> index b5f5410..5bcfffa 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -233,6 +233,7 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t bus,
> dev->addr.bus = bus;
> dev->addr.devid = devid;
> dev->addr.function = function;
> +   dev->config_fd = -1;
>
> /* get vendor id */
> snprintf(filename, sizeof(filename), "%s/vendor", dirname);
> @@ -592,6 +593,20 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
> return 1;
>  }
>
> +/* Read PCI config space. */
> +int rte_eal_pci_read_config(const struct rte_pci_device *device,
> +   void *buf, size_t len, off_t offset)
> +{
> +   return pread(device->config_fd, buf, len, offset);
> +}
> +
> +/* Write PCI config space. */
> +int rte_eal_pci_write_config(const struct rte_pci_device *device,
> +const void *buf, size_t len, off_t offset)
> +{
> +   return pwrite(device->config_fd, buf, len, offset);
> +}
> +
>  /* Init the PCI EAL subsystem */
>  int
>  rte_eal_pci_init(void)
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> index e53f06b..0396eaa 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> @@ -213,6 +213,7 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
> struct dirent *e;
> DIR *dir;
> char dirname[PATH_MAX];
> +   char filename[PATH_MAX];
>
> /* depending on kernel version, uio can be located in uio/uioX
>  * or uio:uioX */
> @@ -268,6 +269,15 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
> if (e == NULL)
> return -1;
>
> +   /* Open fd for access to PCI config */
> +   snprintf(filename, sizeof(filename), "%s/device/config", dirname);
> +   dev->config_fd = open(filename, O_RDWR);
> +   if (dev->config_fd < 0) {
> +   RTE_LOG(ERR, EAL, "%s(): cannot open %s: %s\n",
> +

[dpdk-dev] [PATCH v2 01/15] mbuf: add definitions of unified packet types

2015-02-09 Thread Bruce Richardson

On Mon, Feb 09, 2015 at 02:40:35PM +0800, Helin Zhang wrote:
> There are only 6 bit flags in ol_flags for indicating packet types,
> which is not enough to describe all the possible packet types that hardware
> can recognize. For example, i40e hardware can recognize more than 150
> packet types. A unified packet type is composed of tunnel type, L3 type,
> L4 type and inner L3 type fields, and can be stored in the mbuf field
> 'packet_type', which is enlarged from 16 bits to 32 bits in the mbuf structure.
> Accordingly, the structure 'rte_kni_mbuf' needs to be modified as well.
> 
> Signed-off-by: Helin Zhang 
> Signed-off-by: Cunming Liang 
> Signed-off-by: Jijiang Liu 
> ---
>  .../linuxapp/eal/include/exec-env/rte_kni_common.h |   4 +-
>  lib/librte_mbuf/rte_mbuf.h | 113 +++--
>  2 files changed, 108 insertions(+), 9 deletions(-)
> 
> v2 changes:
> * Enlarged the packet_type field from 16 bits to 32 bits.
> * Redefined the packet type sub-fields.
> * Updated the 'struct rte_kni_mbuf' for KNI according to the mbuf changes.
>

Since these changes to the mbuf will break the operation of the vector driver,
that vector driver needs to be taken into account here.

Some suggestions/options:
1. Temporarily disable the VPMD at compile time or at run time as part of this
patch, and put the vector changes as the next patch (re-enabling the driver too)
2. Put in the minimum changes for the new mbuf layout into this patch. It will
make this patch a little longer, but may still be doable as it's only a couple
of fields changing, not the whole structure.

/Bruce
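To make the idea of a unified packet type concrete, here is a sketch of packing four sub-fields (tunnel, L3, L4, inner L3) into one 32-bit word. The positions and widths (8 bits each) are invented for illustration and are not the layout defined in the patch; the `SK_PTYPE_*` names and helpers are hypothetical.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
#define SK_PTYPE_L3_SHIFT        0
#define SK_PTYPE_L4_SHIFT        8
#define SK_PTYPE_TUNNEL_SHIFT    16
#define SK_PTYPE_INNER_L3_SHIFT  24
#define SK_PTYPE_FIELD_MASK      0xffu

/* Combine the four sub-fields into one 32-bit packet type. */
static inline uint32_t sk_ptype_pack(uint8_t tunnel, uint8_t l3,
				     uint8_t l4, uint8_t inner_l3)
{
	return ((uint32_t)l3 << SK_PTYPE_L3_SHIFT) |
	       ((uint32_t)l4 << SK_PTYPE_L4_SHIFT) |
	       ((uint32_t)tunnel << SK_PTYPE_TUNNEL_SHIFT) |
	       ((uint32_t)inner_l3 << SK_PTYPE_INNER_L3_SHIFT);
}

/* Extract one sub-field given its shift. */
static inline uint8_t sk_ptype_field(uint32_t ptype, unsigned shift)
{
	return (uint8_t)((ptype >> shift) & SK_PTYPE_FIELD_MASK);
}
```

Whatever the real widths are, the design point is the same: independent sub-fields let a driver report combinations (for example an inner L3 type under a given tunnel) that a handful of single-bit flags cannot express.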

[dpdk-dev] [PATCH] x32 ABI support, first iteration

2015-02-09 Thread Ananyev, Konstantin

> Subject: [PATCH] x32 ABI support, first iteration
> 
> Signed-off-by: Konstantin Ananyev 
> Signed-off-by: Daniel Mrzyglod 
> ---
>  config/defconfig_x86_x32-native-linuxapp-gcc | 46 
>  mk/arch/x86_x32/rte.vars.mk  | 63 
> 
>  2 files changed, 109 insertions(+)
>  create mode 100644 config/defconfig_x86_x32-native-linuxapp-gcc
>  create mode 100644 mk/arch/x86_x32/rte.vars.mk
> 
> --

Acked-by: Konstantin Ananyev 

> 1.9.1

[dpdk-dev] [PATCH 2/2] i40e:enable TSO support

2015-02-09 Thread David Marchand

Hello,

This patch does two things at the same time.
Please split this to make it easier to understand (see comments below).


On Mon, Feb 9, 2015 at 7:32 AM, Jijiang Liu  wrote:

> This patch enables the i40e TSO feature for both non-tunneled and
> tunneled packets.
>
> Signed-off-by: Jijiang Liu 
> Signed-off-by: Miroslaw Walukiewicz 
> ---
>  lib/librte_pmd_i40e/i40e_rxtx.c |   99
> ---
>  lib/librte_pmd_i40e/i40e_rxtx.h |   13 +
>  2 files changed, 85 insertions(+), 27 deletions(-)
>
> diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c
> index 349c1e5..9b9bdcd 100644
> --- a/lib/librte_pmd_i40e/i40e_rxtx.c
> +++ b/lib/librte_pmd_i40e/i40e_rxtx.c
> @@ -465,16 +465,13 @@ static inline void
>  i40e_txd_enable_checksum(uint64_t ol_flags,
> uint32_t *td_cmd,
> uint32_t *td_offset,
> -   uint8_t l2_len,
> -   uint16_t l3_len,
> -   uint8_t outer_l2_len,
> -   uint16_t outer_l3_len,
> +   union i40e_tx_offload tx_offload,
> uint32_t *cd_tunneling)
>  {
> /* UDP tunneling packet TX checksum offload */
> if (unlikely(ol_flags & PKT_TX_OUTER_IP_CKSUM)) {
>
> -   *td_offset |= (outer_l2_len >> 1)
> +   *td_offset |= (tx_offload.outer_l2_len >> 1)
> << I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
>
> if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
> @@ -485,25 +482,35 @@ i40e_txd_enable_checksum(uint64_t ol_flags,
> *cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV6;
>
> /* Now set the ctx descriptor fields */
> -   *cd_tunneling |= (outer_l3_len >> 2) <<
> +   *cd_tunneling |= (tx_offload.outer_l3_len >> 2) <<
> I40E_TXD_CTX_QW0_EXT_IPLEN_SHIFT |
> -   (l2_len >> 1) <<
> +   (tx_offload.l2_len >> 1) <<
> I40E_TXD_CTX_QW0_NATLEN_SHIFT;
>
> } else
> -   *td_offset |= (l2_len >> 1)
> +   *td_offset |= (tx_offload.l2_len >> 1)
> << I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
>
> /* Enable L3 checksum offloads */
> if (ol_flags & PKT_TX_IP_CKSUM) {
> *td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4_CSUM;
> -   *td_offset |= (l3_len >> 2) << I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
> +   *td_offset |= (tx_offload.l3_len >> 2)
> +   << I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
> } else if (ol_flags & PKT_TX_IPV4) {
> *td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4;
> -   *td_offset |= (l3_len >> 2) << I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
> +   *td_offset |= (tx_offload.l3_len >> 2)
> +<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
> } else if (ol_flags & PKT_TX_IPV6) {
> *td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV6;
> -   *td_offset |= (l3_len >> 2) << I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
> +   *td_offset |= (tx_offload.l3_len >> 2)
> +<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
> +   }
>

To me, this first part is a rework.
Please split it out.



> +
> +   if (ol_flags & PKT_TX_TCP_SEG) {
> +   *td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
> +   *td_offset |= (tx_offload.l4_len >> 2)
> +   << I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
> +   return;
> }
>
> /* Enable L4 checksum offloads */
> @@ -1154,7 +1161,7 @@ i40e_calc_context_desc(uint64_t flags)
>  {
> uint64_t mask = 0ULL;
>
> -   mask |= PKT_TX_OUTER_IP_CKSUM;
> +   mask |= (PKT_TX_OUTER_IP_CKSUM | PKT_TX_TCP_SEG);
>

You are adding this offload flag unconditionally.
Is this intended?
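For readers following the TSO hunks, the sketch below models two pieces of arithmetic from the quoted code: header lengths are packed into the offset word in 2-byte words for L2 (hence `>> 1`) and 4-byte dwords for L3 and L4 (hence `>> 2`), and the TSO payload length is `pkt_len` minus the sum of all header lengths, as in `i40e_set_tso_ctx()`. The shift values 0, 7 and 14 are assumptions standing in for the `I40E_TX_DESC_LENGTH_*_SHIFT` constants; check `i40e_type.h` before relying on them. The struct and function names are hypothetical, and only the non-tunneled offset word is shown.

```c
#include <stdint.h>

/* Assumed field positions; verify against I40E_TX_DESC_LENGTH_*_SHIFT. */
#define SK_MACLEN_SHIFT 0	/* L2 length, in 2-byte words */
#define SK_IPLEN_SHIFT  7	/* L3 length, in 4-byte dwords */
#define SK_L4LEN_SHIFT  14	/* L4 length, in 4-byte dwords */

/* Hypothetical mirror of the header lengths carried in tx_offload. */
struct sk_tx_lens {
	uint16_t outer_l2_len, outer_l3_len;
	uint16_t l2_len, l3_len, l4_len;
};

/* Pack non-tunneled header lengths into a td_offset-style word. */
static uint32_t sk_td_offset(const struct sk_tx_lens *l)
{
	return ((uint32_t)(l->l2_len >> 1) << SK_MACLEN_SHIFT) |
	       ((uint32_t)(l->l3_len >> 2) << SK_IPLEN_SHIFT) |
	       ((uint32_t)(l->l4_len >> 2) << SK_L4LEN_SHIFT);
}

/* TSO payload: everything after the (outer + inner) headers.
 * Returns 0 when l4_len is 0, like the early return in the patch. */
static uint32_t sk_tso_payload_len(uint32_t pkt_len, const struct sk_tx_lens *l)
{
	uint32_t hdr_len;

	if (l->l4_len == 0)
		return 0;
	hdr_len = (uint32_t)l->outer_l2_len + l->outer_l3_len +
		  l->l2_len + l->l3_len + l->l4_len;
	return hdr_len < pkt_len ? pkt_len - hdr_len : 0;
}
```

For a plain Ethernet/IPv4/TCP frame of 1514 bytes with 14-, 20- and 20-byte headers, the payload handed to segmentation is 1460 bytes.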


>  #ifdef RTE_LIBRTE_IEEE1588
> mask |= PKT_TX_IEEE1588_TMST;
> @@ -1165,6 +1172,41 @@ i40e_calc_context_desc(uint64_t flags)
> return 0;
>  }
>
> +/* set i40e TSO context descriptor */
> +static inline uint64_t
> +i40e_set_tso_ctx(struct rte_mbuf *mbuf, union i40e_tx_offload tx_offload)
> +{
> +
> +   uint64_t ctx_desc = 0;
> +   uint32_t cd_cmd, hdr_len, cd_tso_len;
> +
> +
> +   if (!tx_offload.l4_len) {
> +   PMD_DRV_LOG(DEBUG, "L4 length set to 0");
> +   return ctx_desc;
> +   }
> +
> +   /**
> +* in case of tunneling packet, the outer_l2_len and
> +* outer_l3_len must be 0.
> +*/
> +   hdr_len = tx_offload.outer_l2_len +
> +   tx_offload.outer_l3_len +
> +   tx_offload.l2_len +
> +   tx_offload.l3_len +
> +   tx_offload.l4_len;
> +
> +   cd_cmd = I40E_TX_CTX_DESC_TSO;
> +   cd_tso_len = mbuf->pkt_len - hdr_len;
> +   ctx_desc |= ((uint64_t)cd_cmd <<

[dpdk-dev] [PATCH v4 25/26] virtio: Fix wmb issue

2015-02-09 Thread Ouyang Changchun

Use virtio_wmb() instead of virtio_rmb() here: a store memory barrier is needed before the avail index is updated.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtqueue.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_pmd_virtio/virtqueue.h b/lib/librte_pmd_virtio/virtqueue.h
index 6c45c27..41dda50 100644
--- a/lib/librte_pmd_virtio/virtqueue.h
+++ b/lib/librte_pmd_virtio/virtqueue.h
@@ -266,7 +266,7 @@ virtqueue_full(const struct virtqueue *vq)
 static inline void
 vq_update_avail_idx(struct virtqueue *vq)
 {
-   virtio_rmb();
+   virtio_wmb();
vq->vq_ring.avail->idx = vq->vq_avail_idx;
 }

-- 
1.8.4.2
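The fix above is about ordering: the ring entry must be visible before the new `avail->idx` is. A minimal C11 sketch of that publish/consume pattern, in which a release fence plays the role of `virtio_wmb()` and an acquire fence the role of `virtio_rmb()`. The ring layout and function names are hypothetical, and real virtio shares the ring with a device or host rather than another C11 thread, so this is an analogy rather than the driver's code.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SK_RING_SIZE 8	/* must be a power of two */

/* Hypothetical, cut-down "avail" ring: slots plus a published index. */
struct sk_avail_ring {
	uint16_t ring[SK_RING_SIZE];
	_Atomic uint16_t idx;		/* read by the consumer */
};

/* Producer: fill a slot, then publish the index. The release fence
 * (write-barrier role) orders the slot store before the idx store. */
static void sk_publish(struct sk_avail_ring *r, uint16_t *shadow_idx,
		       uint16_t desc)
{
	r->ring[*shadow_idx & (SK_RING_SIZE - 1)] = desc;
	(*shadow_idx)++;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->idx, *shadow_idx, memory_order_relaxed);
}

/* Consumer: load idx, then (acquire fence, read-barrier role) read the
 * slot. Returns 1 if an entry was consumed, 0 if nothing is new. */
static int sk_consume(struct sk_avail_ring *r, uint16_t *last_seen,
		      uint16_t *out)
{
	uint16_t idx = atomic_load_explicit(&r->idx, memory_order_relaxed);

	if (idx == *last_seen)
		return 0;
	atomic_thread_fence(memory_order_acquire);
	*out = r->ring[*last_seen & (SK_RING_SIZE - 1)];
	(*last_seen)++;
	return 1;
}
```

A read barrier on the producer side does not order the producer's own stores, which is why the original `virtio_rmb()` was the wrong primitive at this point.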

[dpdk-dev] [PATCH v4 24/26] virtio: Remove hotspots

2015-02-09 Thread Ouyang Changchun

Remove hotspots by deferring initializations that are unnecessary when the function returns early.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_rxtx.c | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c b/lib/librte_pmd_virtio/virtio_rxtx.c
index c6d9ae7..0225cc9 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -476,13 +476,13 @@ uint16_t
 virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 {
struct virtqueue *rxvq = rx_queue;
-   struct virtio_hw *hw = rxvq->hw;
+   struct virtio_hw *hw;
struct rte_mbuf *rxm, *new_mbuf;
-   uint16_t nb_used, num, nb_rx = 0;
+   uint16_t nb_used, num, nb_rx;
uint32_t len[VIRTIO_MBUF_BURST_SZ];
struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ];
int error;
-   uint32_t i, nb_enqueued = 0;
+   uint32_t i, nb_enqueued;
const uint32_t hdr_size = sizeof(struct virtio_net_hdr);

nb_used = VIRTQUEUE_NUSED(rxvq);
@@ -499,6 +499,11 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)

num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
PMD_RX_LOG(DEBUG, "used:%d dequeue:%d", nb_used, num);
+
+   hw = rxvq->hw;
+   nb_rx = 0;
+   nb_enqueued = 0;
+
for (i = 0; i < num ; i++) {
rxm = rcv_pkts[i];

@@ -568,17 +573,17 @@ virtio_recv_mergeable_pkts(void *rx_queue,
uint16_t nb_pkts)
 {
struct virtqueue *rxvq = rx_queue;
-   struct virtio_hw *hw = rxvq->hw;
+   struct virtio_hw *hw;
struct rte_mbuf *rxm, *new_mbuf;
-   uint16_t nb_used, num, nb_rx = 0;
+   uint16_t nb_used, num, nb_rx;
uint32_t len[VIRTIO_MBUF_BURST_SZ];
struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ];
struct rte_mbuf *prev;
int error;
-   uint32_t i = 0, nb_enqueued = 0;
-   uint32_t seg_num = 0;
-   uint16_t extra_idx = 0;
-   uint32_t seg_res = 0;
+   uint32_t i, nb_enqueued;
+   uint32_t seg_num;
+   uint16_t extra_idx;
+   uint32_t seg_res;
const uint32_t hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);

nb_used = VIRTQUEUE_NUSED(rxvq);
@@ -590,6 +595,14 @@ virtio_recv_mergeable_pkts(void *rx_queue,

PMD_RX_LOG(DEBUG, "used:%d\n", nb_used);

+   hw = rxvq->hw;
+   nb_rx = 0;
+   i = 0;
+   nb_enqueued = 0;
+   seg_num = 0;
+   extra_idx = 0;
+   seg_res = 0;
+
while (i < nb_used) {
struct virtio_net_hdr_mrg_rxbuf *header;

-- 
1.8.4.2

[dpdk-dev] [PATCH v4 21/26] example/vhost: Add vlan-strip cmd line option

2015-02-09 Thread Ouyang Changchun

Support turning RX VLAN stripping on/off on the host; this gives the guest the
chance to use its own software VLAN stripping.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 6af7874..1876c8e 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -159,6 +159,9 @@ static uint32_t num_devices;
 static uint32_t zero_copy;
 static int mergeable;

+/* Do vlan strip on host, enabled on default */
+static uint32_t vlan_strip = 1;
+
 /* number of descriptors to apply*/
 static uint32_t num_rx_descriptor = RTE_TEST_RX_DESC_DEFAULT_ZCP;
 static uint32_t num_tx_descriptor = RTE_TEST_TX_DESC_DEFAULT_ZCP;
@@ -564,6 +567,7 @@ us_vhost_usage(const char *prgname)
"   --rx-retry-delay [0-N]: timeout(in usecond) between retries on RX. This makes effect only if retries on rx enabled\n"
"   --rx-retry-num [0-N]: the number of retries on rx. This makes effect only if retries on rx enabled\n"
"   --mergeable [0|1]: disable(default)/enable RX mergeable buffers\n"
+   "   --vlan-strip [0|1]: disable/enable(default) RX VLAN strip on host\n"
"   --stats [0-N]: 0: Disable stats, N: Time in seconds to print stats\n"
"   --dev-basename: The basename to be used for the character device.\n"
"   --zero-copy [0|1]: disable(default)/enable rx/tx "
@@ -591,6 +595,7 @@ us_vhost_parse_args(int argc, char **argv)
{"rx-retry-delay", required_argument, NULL, 0},
{"rx-retry-num", required_argument, NULL, 0},
{"mergeable", required_argument, NULL, 0},
+   {"vlan-strip", required_argument, NULL, 0},
{"stats", required_argument, NULL, 0},
{"dev-basename", required_argument, NULL, 0},
{"zero-copy", required_argument, NULL, 0},
@@ -691,6 +696,22 @@ us_vhost_parse_args(int argc, char **argv)
}
}

+   /* Enable/disable RX VLAN strip on host. */
+   if (!strncmp(long_option[option_index].name,
+   "vlan-strip", MAX_LONG_OPT_SZ)) {
+   ret = parse_num_opt(optarg, 1);
+   if (ret == -1) {
+   RTE_LOG(INFO, VHOST_CONFIG,
+   "Invalid argument for VLAN strip [0|1]\n");
+   us_vhost_usage(prgname);
+   return -1;
+   } else {
+   vlan_strip = !!ret;
+   vmdq_conf_default.rxmode.hw_vlan_strip =
+   vlan_strip;
+   }
+   }
+
/* Enable/disable stats. */
if (!strncmp(long_option[option_index].name, "stats", MAX_LONG_OPT_SZ)) {
ret = parse_num_opt(optarg, INT32_MAX);
@@ -950,7 +971,9 @@ link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)
dev->device_fh);

/* Enable stripping of the vlan tag as we handle routing. */
-   rte_eth_dev_set_vlan_strip_on_queue(ports[0], (uint16_t)vdev->vmdq_rx_q, 1);
+   if (vlan_strip)
+   rte_eth_dev_set_vlan_strip_on_queue(ports[0],
+   (uint16_t)vdev->vmdq_rx_q, 1);

/* Set device as ready for RX. */
vdev->ready = DEVICE_RX;
-- 
1.8.4.2
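The option handling in this patch follows a common pattern: parse a bounded decimal value, then normalise it to a boolean. A sketch of that pattern, assuming a `parse_num_opt()`-like helper that rejects empty strings, trailing garbage and values above the maximum; `sk_parse_num_opt()` and `sk_set_vlan_strip()` are hypothetical stand-ins, not the example application's functions.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal option value in [0, max]; -1 on any error.
 * 'max' is assumed to fit in an int (e.g. 1 or INT32_MAX). */
static int sk_parse_num_opt(const char *arg, uint32_t max)
{
	char *end = NULL;
	unsigned long num;

	if (arg == NULL || arg[0] == '\0')
		return -1;
	errno = 0;
	num = strtoul(arg, &end, 10);
	if (errno != 0 || end == arg || *end != '\0' || num > max)
		return -1;
	return (int)num;
}

/* Apply --vlan-strip [0|1]; on error, leave *flag unchanged. */
static int sk_set_vlan_strip(const char *arg, uint32_t *flag)
{
	int ret = sk_parse_num_opt(arg, 1);

	if (ret == -1)
		return -1;
	*flag = !!ret;
	return 0;
}
```

Keeping the flag untouched on error preserves the enabled-by-default behavior when the caller prints usage and exits.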

[dpdk-dev] [PATCH v4 20/26] example/vhost: Avoid inserting vlan twice

2015-02-09 Thread Ouyang Changchun

Check whether the packet is already VLAN-tagged; if so, avoid inserting a
duplicate VLAN tag into it.

This can happen when the guest is able to insert VLAN tags itself.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 45 -
 1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 04f0118..6af7874 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1115,6 +1115,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag)
unsigned len, ret, offset = 0;
const uint16_t lcore_id = rte_lcore_id();
struct virtio_net *dev = vdev->dev;
+   struct ether_hdr *nh;

/*check if destination is local VM*/
if ((vm2vm_mode == VM2VM_SOFTWARE) && (virtio_tx_local(vdev, m) == 0)) {
@@ -1135,28 +1136,38 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag)
tx_q = &lcore_tx_queue[lcore_id];
len = tx_q->len;

-   m->ol_flags = PKT_TX_VLAN_PKT;
+   nh = rte_pktmbuf_mtod(m, struct ether_hdr *);
+   if (unlikely(nh->ether_type == rte_cpu_to_be_16(ETHER_TYPE_VLAN))) {
+   /* Guest has inserted the vlan tag. */
+   struct vlan_hdr *vh = (struct vlan_hdr *) (nh + 1);
+   uint16_t vlan_tag_be = rte_cpu_to_be_16(vlan_tag);
+   if ((vm2vm_mode == VM2VM_HARDWARE) &&
+   (vh->vlan_tci != vlan_tag_be))
+   vh->vlan_tci = vlan_tag_be;
+   } else {
+   m->ol_flags = PKT_TX_VLAN_PKT;

-   /*
-* Find the right seg to adjust the data len when offset is
-* bigger than tail room size.
-*/
-   if (unlikely(vm2vm_mode == VM2VM_HARDWARE)) {
-   if (likely(offset <= rte_pktmbuf_tailroom(m)))
-   m->data_len += offset;
-   else {
-   struct rte_mbuf *seg = m;
+   /*
+* Find the right seg to adjust the data len when offset is
+* bigger than tail room size.
+*/
+   if (unlikely(vm2vm_mode == VM2VM_HARDWARE)) {
+   if (likely(offset <= rte_pktmbuf_tailroom(m)))
+   m->data_len += offset;
+   else {
+   struct rte_mbuf *seg = m;

-   while ((seg->next != NULL) &&
-   (offset > rte_pktmbuf_tailroom(seg)))
-   seg = seg->next;
+   while ((seg->next != NULL) &&
+   (offset > rte_pktmbuf_tailroom(seg)))
+   seg = seg->next;

-   seg->data_len += offset;
+   seg->data_len += offset;
+   }
+   m->pkt_len += offset;
}
-   m->pkt_len += offset;
-   }

-   m->vlan_tci = vlan_tag;
+   m->vlan_tci = vlan_tag;
+   }

tx_q->m_table[len] = m;
len++;
-- 
1.8.4.2
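The check this patch adds can be illustrated on a raw Ethernet frame: the EtherType sits at byte offset 12 in network byte order, so it must be compared against `htons(0x8100)`, and an existing tag's TCI (bytes 14 and 15) can be overwritten in place instead of adding a second tag. A minimal sketch; the `sk_` helper names are hypothetical.

```c
#include <arpa/inet.h>	/* htons */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ETHERTYPE_VLAN 0x8100

/* Return 1 if the frame already carries an 802.1Q tag. */
static int sk_is_vlan_tagged(const uint8_t *frame, size_t len)
{
	uint16_t type_be;

	if (len < 14)
		return 0;
	memcpy(&type_be, frame + 12, sizeof(type_be));	/* network order */
	return type_be == htons(SK_ETHERTYPE_VLAN);
}

/* Overwrite the TCI of an already-tagged frame; -1 if not tagged. */
static int sk_set_tci(uint8_t *frame, size_t len, uint16_t tci)
{
	uint16_t tci_be = htons(tci);

	if (len < 18 || !sk_is_vlan_tagged(frame, len))
		return -1;
	memcpy(frame + 14, &tci_be, sizeof(tci_be));
	return 0;
}
```

Using `memcpy()` rather than casting the frame pointer avoids unaligned 16-bit loads.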

[dpdk-dev] [PATCH v4 19/26] ether: Fix vlan strip/insert issue

2015-02-09 Thread Ouyang Changchun

The VLAN EtherType must be converted from CPU byte order to big-endian (network byte order) before it is compared with, or stored into, ether_type.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_ether/rte_ether.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ether.h b/lib/librte_ether/rte_ether.h
index 74f71c2..0797908 100644
--- a/lib/librte_ether/rte_ether.h
+++ b/lib/librte_ether/rte_ether.h
@@ -351,7 +351,7 @@ static inline int rte_vlan_strip(struct rte_mbuf *m)
struct ether_hdr *eh
 = rte_pktmbuf_mtod(m, struct ether_hdr *);

-   if (eh->ether_type != ETHER_TYPE_VLAN)
+   if (eh->ether_type != rte_cpu_to_be_16(ETHER_TYPE_VLAN))
return -1;

struct vlan_hdr *vh = (struct vlan_hdr *)(eh + 1);
@@ -401,7 +401,7 @@ static inline int rte_vlan_insert(struct rte_mbuf **m)
return -ENOSPC;

memmove(nh, oh, 2 * ETHER_ADDR_LEN);
-   nh->ether_type = ETHER_TYPE_VLAN;
+   nh->ether_type = rte_cpu_to_be_16(ETHER_TYPE_VLAN);

vh = (struct vlan_hdr *) (nh + 1);
vh->vlan_tci = rte_cpu_to_be_16((*m)->vlan_tci);
-- 
1.8.4.2
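`rte_vlan_insert()` makes room for the tag by moving the two MAC addresses 4 bytes towards the buffer start, leaving the original EtherType in place as the encapsulated type. A standalone sketch of the same byte shuffle on a plain buffer, writing the TPID and TCI in big-endian order as this patch requires; `sk_vlan_insert()` and its headroom parameter are hypothetical.

```c
#include <arpa/inet.h>	/* htons */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ETHER_ADDR_LEN 6
#define SK_ETHERTYPE_VLAN 0x8100

/* Insert an 802.1Q tag into an untagged frame that starts 'headroom'
 * bytes into buf (needs headroom >= 4). Returns the new frame start,
 * or NULL if there is no room. */
static uint8_t *sk_vlan_insert(uint8_t *buf, size_t headroom, uint16_t tci)
{
	uint16_t tpid_be = htons(SK_ETHERTYPE_VLAN);
	uint16_t tci_be = htons(tci);
	uint8_t *old_start, *new_start;

	if (headroom < 4)
		return NULL;
	old_start = buf + headroom;
	new_start = old_start - 4;
	/* dst + src MAC move forward; the old EtherType now follows the tag */
	memmove(new_start, old_start, 2 * SK_ETHER_ADDR_LEN);
	memcpy(new_start + 12, &tpid_be, sizeof(tpid_be));
	memcpy(new_start + 14, &tci_be, sizeof(tci_be));
	return new_start;
}
```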

[dpdk-dev] [PATCH v4 18/26] virtio: Fix descriptor index issue

2015-02-09 Thread Ouyang Changchun

It should use the vring descriptor index, rather than the used-ring index, to 
index vq_descx.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c b/lib/librte_pmd_virtio/virtio_rxtx.c
index 580701a..a82e8eb 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -144,9 +144,9 @@ virtio_xmit_cleanup(struct virtqueue *vq, uint16_t num)

used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
uep = &vq->vq_ring.used->ring[used_idx];
-   dxp = &vq->vq_descx[used_idx];

desc_idx = (uint16_t) uep->id;
+   dxp = &vq->vq_descx[desc_idx];
vq->vq_used_cons_idx++;
vq_ring_free_chain(vq, desc_idx);

-- 
1.8.4.2
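Why the index matters: the used-ring slot only records where a completion was reported, while `uep->id` names the descriptor chain that completed, and the two differ whenever completions arrive out of order. A minimal sketch with hypothetical types mirroring `struct vring_used_elem` and the `vq_descx` bookkeeping array:

```c
#include <stdint.h>

#define SK_NENTRIES 4	/* ring size, a power of two */

struct sk_used_elem { uint32_t id; uint32_t len; };	/* like vring_used_elem */
struct sk_desc_extra { void *cookie; };			/* per-descriptor data */

/* Return the bookkeeping entry for the next completed buffer. */
static struct sk_desc_extra *
sk_next_completed(const struct sk_used_elem *used,
		  struct sk_desc_extra *descx, uint16_t *used_cons_idx)
{
	uint16_t slot = (uint16_t)(*used_cons_idx & (SK_NENTRIES - 1));
	uint16_t desc_idx = (uint16_t)used[slot].id;

	(*used_cons_idx)++;
	return &descx[desc_idx];	/* indexed by descriptor id, not slot */
}
```

If the device completes descriptor 2 before descriptor 0, used slot 0 holds `id = 2`; indexing by the slot would return the wrong cookie.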

[dpdk-dev] [PATCH v4 17/26] virtio: Use port IO to get PCI resource.

2015-02-09 Thread Ouyang Changchun

Allow virtio to work without UIO, for security reasons; this matches 6WIND's 
virtio-net-pmd.

Signed-off-by: Changchun Ouyang 
---
changes in v3:
  Removed the RTE_EAL_PORT_IO macro;
  the virtio PMD can get the address via either UIO or ioports.
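The ioports method reads `/proc/ioports`, whose lines look like `  c000-c03f : 0000:00:03.0`. A sketch of matching one such line against a PCI address and extracting the port range; `sk_match_ioports_line()` is a hypothetical helper, and the patch's own parsing may differ.

```c
#include <stdio.h>
#include <string.h>

/* Match one /proc/ioports-style line against pci_id.
 * Returns 1 and fills start/end if the owner is exactly pci_id,
 * 0 if the line belongs to something else, -1 if it is malformed. */
static int sk_match_ioports_line(const char *line, const char *pci_id,
				 unsigned int *start, unsigned int *end)
{
	size_t id_len = strlen(pci_id);
	unsigned int s, e;
	int n = 0;

	/* "%n" is only set if the " : " separator matched */
	if (sscanf(line, " %x-%x : %n", &s, &e, &n) != 2 || n == 0)
		return -1;
	if (strncmp(line + n, pci_id, id_len) != 0)
		return 0;
	/* owner must end right after the id (allow a trailing newline) */
	if (line[n + id_len] != '\0' && line[n + id_len] != '\n')
		return 0;
	*start = s;
	*end = e;
	return 1;
}
```

The size of the region is then `end - start + 1` (0x40 bytes for `c000-c03f`).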

 lib/librte_pmd_virtio/Makefile|   2 +
 lib/librte_pmd_virtio/virtio_ethdev.c | 136 +-
 2 files changed, 121 insertions(+), 17 deletions(-)

diff --git a/lib/librte_pmd_virtio/Makefile b/lib/librte_pmd_virtio/Makefile
index 456095b..08fa27a 100644
--- a/lib/librte_pmd_virtio/Makefile
+++ b/lib/librte_pmd_virtio/Makefile
@@ -54,4 +54,6 @@ DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_mempool lib/librte_mbuf
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_net lib/librte_malloc

+CFLAGS_virtio_ethdev.o += -Wno-cast-qual
+
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c b/lib/librte_pmd_virtio/virtio_ethdev.c
index 8cd2d51..1163d42 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -38,6 +38,7 @@
 #include 
 #ifdef RTE_EXEC_ENV_LINUXAPP
 #include 
+#include 
 #endif

 #include 
@@ -408,11 +409,13 @@ static void
 virtio_dev_close(struct rte_eth_dev *dev)
 {
struct virtio_hw *hw = dev->data->dev_private;
+   struct rte_pci_device *pci_dev = dev->pci_dev;

PMD_INIT_LOG(DEBUG, "virtio_dev_close");

/* reset the NIC */
-   vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
+   if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+   vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
vtpci_reset(hw);
virtio_dev_free_mbufs(dev);
 }
@@ -845,9 +848,9 @@ parse_sysfs_value(const char *filename, unsigned long *val)
return 0;
 }

-static int get_uio_dev(struct rte_pci_addr *loc, char *buf, unsigned int buflen)
+static int get_uio_dev(struct rte_pci_addr *loc, char *buf, unsigned int buflen,
+   unsigned int *uio_num)
 {
-   unsigned int uio_num;
struct dirent *e;
DIR *dir;
char dirname[PATH_MAX];
@@ -884,18 +887,18 @@ static int get_uio_dev(struct rte_pci_addr *loc, char *buf, unsigned int buflen)

/* first try uio%d */
errno = 0;
-   uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
+   *uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
-   snprintf(buf, buflen, "%s/uio%u", dirname, uio_num);
+   snprintf(buf, buflen, "%s/uio%u", dirname, *uio_num);
break;
}

/* then try uio:uio%d */
errno = 0;
-   uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
+   *uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
snprintf(buf, buflen, "%s/uio:uio%u", dirname,
-uio_num);
+*uio_num);
break;
}
}
@@ -928,13 +931,16 @@ virtio_has_msix(const struct rte_pci_addr *loc)
 }

 /* Extract I/O port numbers from sysfs */
-static int virtio_resource_init(struct rte_pci_device *pci_dev)
+static int virtio_resource_init_by_uio(struct rte_pci_device *pci_dev)
 {
char dirname[PATH_MAX];
char filename[PATH_MAX];
unsigned long start, size;
+   unsigned int uio_num;
+   struct rte_pci_driver *pci_drv =
+   (struct rte_pci_driver *)pci_dev->driver;

-   if (get_uio_dev(&pci_dev->addr, dirname, sizeof(dirname)) < 0)
+   if (get_uio_dev(&pci_dev->addr, dirname, sizeof(dirname), &uio_num) < 0)
return -1;

/* get portio size */
@@ -959,8 +965,100 @@ static int virtio_resource_init(struct rte_pci_device *pci_dev)
PMD_INIT_LOG(DEBUG,
 "PCI Port IO found start=0x%lx with size=0x%lx",
 start, size);
+
+   /* save fd */
+   memset(dirname, 0, sizeof(dirname));
+   snprintf(dirname, sizeof(dirname), "/dev/uio%u", uio_num);
+   pci_dev->intr_handle.fd = open(dirname, O_RDWR);
+   if (pci_dev->intr_handle.fd < 0) {
+   PMD_INIT_LOG(ERR, "Cannot open %s: %s\n",
+   dirname, strerror(errno));
+   return -1;
+   }
+
+   pci_dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
+   pci_drv->drv_flags |= RTE_PCI_DRV_INTR_LSC;
+
return 0;
 }
+
+/* Extract port I/O numbers from proc/ioports */
+static int virtio_resource_init_by_ioports(struct rte_pci_device *pci_dev)
+{
+   uint16_t start, end;
+   int size;
+   FILE *fp;
+   char *line = NULL;
+   char pci_id[16];
+   int found = 0;
+

[dpdk-dev] [PATCH v4 16/26] virtio: Free mbuf's with threshold

2015-02-09 Thread Ouyang Changchun

This makes the virtio driver work like ixgbe: transmit buffers are
held until a transmit threshold is reached. The previous behavior
was to hold mbufs until the ring entry was reused, which used
more memory than needed.
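A counter-level sketch of the threshold policy described above, in the style of ixgbe's `tx_free_thresh`: completed buffers are held, then reclaimed in one batch once the number of free slots drops below the threshold. The struct and function are hypothetical; 32 is the patch's `DEFAULT_TX_FREE_THRESH`, while the exact trigger condition here is an assumption and may differ from the patch's test.

```c
#include <stdint.h>

/* Hypothetical TX queue bookkeeping, reduced to counters. */
struct sk_txq {
	uint16_t free_thresh;	/* reclaim when fewer slots than this are free */
	uint16_t nb_free;	/* descriptors available to the transmit path */
	uint16_t nb_done;	/* completed by the device, not yet reclaimed */
	uint32_t nb_reclaimed;	/* running total of buffers freed */
};

/* Reclaim all completed descriptors, but only when free slots run low. */
static void sk_maybe_cleanup(struct sk_txq *q)
{
	if (q->nb_free >= q->free_thresh)
		return;				/* keep holding for now */
	q->nb_free = (uint16_t)(q->nb_free + q->nb_done);
	q->nb_reclaimed += q->nb_done;		/* stands in for freeing mbufs */
	q->nb_done = 0;
}
```

Batching amortises the cost of walking the used ring and freeing mbufs over many packets, at the price of holding up to a threshold's worth of completed buffers.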

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c |  7 ++--
 lib/librte_pmd_virtio/virtio_rxtx.c   | 75 +--
 lib/librte_pmd_virtio/virtqueue.h |  3 +-
 3 files changed, 60 insertions(+), 25 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c b/lib/librte_pmd_virtio/virtio_ethdev.c
index b30ab2a..8cd2d51 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -176,15 +176,16 @@ virtio_send_command(struct virtqueue *vq, struct virtio_pmd_ctrl *ctrl,

virtqueue_notify(vq);

-   while (vq->vq_used_cons_idx == vq->vq_ring.used->idx)
+   rte_rmb();
+   while (vq->vq_used_cons_idx == vq->vq_ring.used->idx) {
+   rte_rmb();
usleep(100);
+   }

while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
uint32_t idx, desc_idx, used_idx;
struct vring_used_elem *uep;

-   virtio_rmb();
-
used_idx = (uint32_t)(vq->vq_used_cons_idx
& (vq->vq_nentries - 1));
uep = &vq->vq_ring.used->ring[used_idx];
diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c b/lib/librte_pmd_virtio/virtio_rxtx.c
index b6d6832..580701a 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -129,17 +129,32 @@ virtqueue_dequeue_burst_rx(struct virtqueue *vq, struct rte_mbuf **rx_pkts,
return i;
 }

+#ifndef DEFAULT_TX_FREE_THRESH
+#define DEFAULT_TX_FREE_THRESH 32
+#endif
+
+/* Cleanup from completed transmits. */
 static void
-virtqueue_dequeue_pkt_tx(struct virtqueue *vq)
+virtio_xmit_cleanup(struct virtqueue *vq, uint16_t num)
 {
-   struct vring_used_elem *uep;
-   uint16_t used_idx, desc_idx;
+   uint16_t i, used_idx, desc_idx;
+   for (i = 0; i < num; i++) {
+   struct vring_used_elem *uep;
+   struct vq_desc_extra *dxp;
+
+   used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
+   uep = &vq->vq_ring.used->ring[used_idx];
+   dxp = &vq->vq_descx[used_idx];
+
+   desc_idx = (uint16_t) uep->id;
+   vq->vq_used_cons_idx++;
+   vq_ring_free_chain(vq, desc_idx);

-   used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
-   uep = &vq->vq_ring.used->ring[used_idx];
-   desc_idx = (uint16_t) uep->id;
-   vq->vq_used_cons_idx++;
-   vq_ring_free_chain(vq, desc_idx);
+   if (dxp->cookie != NULL) {
+   rte_pktmbuf_free(dxp->cookie);
+   dxp->cookie = NULL;
+   }
+   }
 }


@@ -203,8 +218,6 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)

idx = head_idx;
dxp = &txvq->vq_descx[idx];
-   if (dxp->cookie != NULL)
-   rte_pktmbuf_free(dxp->cookie);
dxp->cookie = (void *)cookie;
dxp->ndescs = needed;

@@ -404,6 +417,7 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 {
uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
struct virtqueue *vq;
+   uint16_t tx_free_thresh;
int ret;

PMD_INIT_FUNC_TRACE();
@@ -421,6 +435,22 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
return ret;
}

+   tx_free_thresh = tx_conf->tx_free_thresh;
+   if (tx_free_thresh == 0)
+   tx_free_thresh =
+   RTE_MIN(vq->vq_nentries / 4, DEFAULT_TX_FREE_THRESH);
+
+   if (tx_free_thresh >= (vq->vq_nentries - 3)) {
+   RTE_LOG(ERR, PMD, "tx_free_thresh must be less than the "
+   "number of TX entries minus 3 (%u)."
+   " (tx_free_thresh=%u port=%u queue=%u)\n",
+   vq->vq_nentries - 3,
+   tx_free_thresh, dev->data->port_id, queue_idx);
+   return -EINVAL;
+   }
+
+   vq->vq_free_thresh = tx_free_thresh;
+
dev->data->tx_queues[queue_idx] = vq;
return 0;
 }
@@ -688,11 +718,9 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
struct virtqueue *txvq = tx_queue;
struct rte_mbuf *txm;
-   uint16_t nb_used, nb_tx, num;
+   uint16_t nb_used, nb_tx;
int error;

-   nb_tx = 0;
-
if (unlikely(nb_pkts < 1))
return nb_pkts;

@@ -700,21 +728,26 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
nb_used = VIRTQUEUE_NUSED(txvq);

virtio_rmb();
+   if (likely(nb_used > txvq->vq_free_thresh))
+   virtio_xmit_cleanup(txvq, nb_used);

-

[dpdk-dev] [PATCH v4 15/26] virtio: Add ability to set MAC address

2015-02-09 Thread Ouyang Changchun

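Setting the default MAC address needs special handling: use the
control queue command when the host supports it, otherwise write
the address to the device config space.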
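A rough sketch of the selection logic in `virtio_mac_addr_set()`: keep a local copy, then prefer the atomic control-queue command when the host offers it, otherwise fall back to the config-space MAC. The feature bit numbers are those of the virtio-net specification (VIRTIO_NET_F_MAC is bit 5, VIRTIO_NET_F_CTRL_MAC_ADDR is bit 23). `struct mac_state`, `enum mac_set_method` and `mac_addr_set_sketch()` are hypothetical, and the sketch only reports which path would be taken instead of touching a device.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Feature bit numbers from the virtio-net specification. */
#define FEAT_BIT_MAC           5  /* VIRTIO_NET_F_MAC */
#define FEAT_BIT_CTRL_MAC_ADDR 23 /* VIRTIO_NET_F_CTRL_MAC_ADDR */

enum mac_set_method {
	MAC_SET_CTRL_CMD, /* atomic update via the control virtqueue */
	MAC_SET_CONFIG,   /* write the MAC field in device config space */
	MAC_SET_LOCAL,    /* neither offered: only the local copy changes */
};

/* Hypothetical driver-side state. */
struct mac_state {
	uint64_t host_features;
	uint8_t mac[6];
};

static int has_feature(uint64_t features, unsigned int bit)
{
	return (features >> bit) & 1;
}

/* Record the new address, then report which update path would be used. */
static enum mac_set_method mac_addr_set_sketch(struct mac_state *st,
					       const uint8_t mac[6])
{
	assert(st != NULL && mac != NULL);
	memcpy(st->mac, mac, sizeof(st->mac));

	if (has_feature(st->host_features, FEAT_BIT_CTRL_MAC_ADDR))
		return MAC_SET_CTRL_CMD;
	if (has_feature(st->host_features, FEAT_BIT_MAC))
		return MAC_SET_CONFIG;
	return MAC_SET_LOCAL;
}
```

Checking the control-queue feature first matches the patch's ordering: when both are offered, the atomic command wins.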

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_ether/rte_ethdev.h |  5 +
 lib/librte_pmd_virtio/virtio_ethdev.c | 24 
 2 files changed, 29 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 94d6b2b..5a54276 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1240,6 +1240,10 @@ typedef void (*eth_mac_addr_add_t)(struct rte_eth_dev *dev,
  uint32_t vmdq);
 /**< @internal Set a MAC address into Receive Address Address Register */

+typedef void (*eth_mac_addr_set_t)(struct rte_eth_dev *dev,
+ struct ether_addr *mac_addr);
+/**< @internal Set a MAC address into Receive Address Register */
+
 typedef int (*eth_uc_hash_table_set_t)(struct rte_eth_dev *dev,
  struct ether_addr *mac_addr,
  uint8_t on);
@@ -1459,6 +1463,7 @@ struct eth_dev_ops {
priority_flow_ctrl_set_t   priority_flow_ctrl_set; /**< Setup priority flow control.*/
eth_mac_addr_remove_t  mac_addr_remove; /**< Remove MAC address */
eth_mac_addr_add_t mac_addr_add;  /**< Add a MAC address */
+   eth_mac_addr_set_t mac_addr_set;  /**< Set a MAC address */
eth_uc_hash_table_set_t    uc_hash_table_set;  /**< Set Unicast Table Array */
eth_uc_all_hash_table_set_t uc_all_hash_table_set;  /**< Set Unicast hash bitmap */
eth_mirror_rule_set_t  mirror_rule_set;  /**< Add a traffic mirror rule.*/
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c b/lib/librte_pmd_virtio/virtio_ethdev.c
index 0e74eea..b30ab2a 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -90,6 +90,8 @@ static void virtio_mac_addr_add(struct rte_eth_dev *dev,
struct ether_addr *mac_addr,
uint32_t index, uint32_t vmdq __rte_unused);
 static void virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
+static void virtio_mac_addr_set(struct rte_eth_dev *dev,
+   struct ether_addr *mac_addr);

 static int virtio_dev_queue_stats_mapping_set(
__rte_unused struct rte_eth_dev *eth_dev,
@@ -518,6 +520,7 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
.vlan_filter_set = virtio_vlan_filter_set,
.mac_addr_add= virtio_mac_addr_add,
.mac_addr_remove = virtio_mac_addr_remove,
+   .mac_addr_set= virtio_mac_addr_set,
 };

 static inline int
@@ -733,6 +736,27 @@ virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
virtio_mac_table_set(hw, uc, mc);
 }

+static void
+virtio_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+
+   memcpy(hw->mac_addr, mac_addr, ETHER_ADDR_LEN);
+
+   /* Use atomic update if available */
+   if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+   struct virtio_pmd_ctrl ctrl;
+   int len = ETHER_ADDR_LEN;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET;
+
+   memcpy(ctrl.data, mac_addr, ETHER_ADDR_LEN);
+   virtio_send_command(hw->cvq, &ctrl, &len, 1);
+   } else if (vtpci_with_feature(hw, VIRTIO_NET_F_MAC))
+   virtio_set_hwaddr(hw);
+}
+
 static int
 virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 {
-- 
1.8.4.2

[dpdk-dev] [PATCH v4 14/26] virtio: Add support for multiple mac addresses

2015-02-09 Thread Ouyang Changchun

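Add support for multiple MAC addresses to virtio. The unicast and
multicast address tables are sent to the host with the
VIRTIO_NET_CTRL_MAC_TABLE_SET control command.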
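For reference, the payload that `virtio_mac_table_set()` sends is two tables back to back, unicast first: each is a 32-bit entry count followed by that many 6-byte addresses, and an address is multicast when the least significant bit of its first octet is set. The sketch below builds such a payload from a flat address array. `mac_table_build()` and its helpers are hypothetical; it writes the counts little-endian (the virtio 1.0 layout; legacy devices use guest-native order, which is the same on x86), and, like `virtio_mac_addr_remove()`, it skips all-zero slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Group (multicast/broadcast) addresses have the LSB of octet 0 set. */
static int mac_is_multicast(const uint8_t *mac)
{
	return mac[0] & 0x01;
}

static int mac_is_zero(const uint8_t *mac)
{
	static const uint8_t zero[ETH_ALEN];

	return memcmp(mac, zero, ETH_ALEN) == 0;
}

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

/* Write one table (count + matching entries) at 'out'; return its size. */
static size_t put_table(const uint8_t *addrs, size_t n, int want_mc,
			uint8_t *out)
{
	size_t off = 4;
	uint32_t count = 0;

	for (size_t i = 0; i < n; i++) {
		const uint8_t *a = addrs + i * ETH_ALEN;

		if (mac_is_zero(a) || mac_is_multicast(a) != want_mc)
			continue;
		memcpy(out + off, a, ETH_ALEN);
		off += ETH_ALEN;
		count++;
	}
	put_le32(out, count);
	return off;
}

/*
 * Build the MAC_TABLE_SET payload: unicast table, then multicast table.
 * 'out' must hold at least 8 + n * ETH_ALEN bytes. Returns bytes written.
 */
static size_t mac_table_build(const uint8_t *addrs, size_t n, uint8_t *out)
{
	size_t len;

	assert(addrs != NULL && out != NULL);
	len = put_table(addrs, n, 0, out);
	len += put_table(addrs, n, 1, out + len);
	return len;
}
```

The total length corresponds to `len[0] + len[1]` in `virtio_mac_table_set()`, where each part is `entries * ETHER_ADDR_LEN + sizeof(entries)`.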

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 94 ++-
 lib/librte_pmd_virtio/virtio_ethdev.h |  3 +-
 lib/librte_pmd_virtio/virtqueue.h | 34 -
 3 files changed, 127 insertions(+), 4 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c b/lib/librte_pmd_virtio/virtio_ethdev.c
index 591d692..0e74eea 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -86,6 +86,10 @@ static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
 static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);
 static int virtio_vlan_filter_set(struct rte_eth_dev *dev,
uint16_t vlan_id, int on);
+static void virtio_mac_addr_add(struct rte_eth_dev *dev,
+   struct ether_addr *mac_addr,
+   uint32_t index, uint32_t vmdq __rte_unused);
+static void virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);

 static int virtio_dev_queue_stats_mapping_set(
__rte_unused struct rte_eth_dev *eth_dev,
@@ -503,8 +507,6 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
.stats_get   = virtio_dev_stats_get,
.stats_reset = virtio_dev_stats_reset,
.link_update = virtio_dev_link_update,
-   .mac_addr_add= NULL,
-   .mac_addr_remove = NULL,
.rx_queue_setup  = virtio_dev_rx_queue_setup,
/* meaningfull only to multiple queue */
.rx_queue_release= virtio_dev_rx_queue_release,
@@ -514,6 +516,8 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
/* collect stats per queue */
.queue_stats_mapping_set = virtio_dev_queue_stats_mapping_set,
.vlan_filter_set = virtio_vlan_filter_set,
+   .mac_addr_add= virtio_mac_addr_add,
+   .mac_addr_remove = virtio_mac_addr_remove,
 };

 static inline int
@@ -644,6 +648,92 @@ virtio_get_hwaddr(struct virtio_hw *hw)
 }

 static int
+virtio_mac_table_set(struct virtio_hw *hw,
+const struct virtio_net_ctrl_mac *uc,
+const struct virtio_net_ctrl_mac *mc)
+{
+   struct virtio_pmd_ctrl ctrl;
+   int err, len[2];
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_TABLE_SET;
+
+   len[0] = uc->entries * ETHER_ADDR_LEN + sizeof(uc->entries);
+   memcpy(ctrl.data, uc, len[0]);
+
+   len[1] = mc->entries * ETHER_ADDR_LEN + sizeof(mc->entries);
+   memcpy(ctrl.data + len[0], mc, len[1]);
+
+   err = virtio_send_command(hw->cvq, &ctrl, len, 2);
+   if (err != 0)
+   PMD_DRV_LOG(NOTICE, "mac table set failed: %d", err);
+
+   return err;
+}
+
+static void
+virtio_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
+   uint32_t index, uint32_t vmdq __rte_unused)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+   const struct ether_addr *addrs = dev->data->mac_addrs;
+   unsigned int i;
+   struct virtio_net_ctrl_mac *uc, *mc;
+
+   if (index >= VIRTIO_MAX_MAC_ADDRS) {
+   PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
+   return;
+   }
+
+   uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(uc->entries));
+   uc->entries = 0;
+   mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(mc->entries));
+   mc->entries = 0;
+
+   for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
+   const struct ether_addr *addr
+   = (i == index) ? mac_addr : addrs + i;
+   struct virtio_net_ctrl_mac *tbl
+   = is_multicast_ether_addr(addr) ? mc : uc;
+
+   memcpy(&tbl->macs[tbl->entries++], addr, ETHER_ADDR_LEN);
+   }
+
+   virtio_mac_table_set(hw, uc, mc);
+}
+
+static void
+virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+   struct ether_addr *addrs = dev->data->mac_addrs;
+   struct virtio_net_ctrl_mac *uc, *mc;
+   unsigned int i;
+
+   if (index >= VIRTIO_MAX_MAC_ADDRS) {
+   PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
+   return;
+   }
+
+   uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(uc->entries));
+   uc->entries = 0;
+   mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(mc->entries));
+   mc->entries = 0;
+
+   for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
+   struct virtio_net_ctrl_mac *tbl;
+
+   if (i == index || is_zero_ether_addr(addrs + i))
+   continue;
+
+   tbl = is_multicast_ether_addr(addrs + i) ? mc : uc;
+   memcpy(&tbl->macs[tbl->entries++], addrs + i, ETHER_ADDR_LEN);
+

[dpdk-dev] [PATCH v4 13/26] virtio: Add support for vlan filtering

2015-02-09 Thread Ouyang Changchun

Virtio supports vlan filtering.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c b/lib/librte_pmd_virtio/virtio_ethdev.c
index 39b1fb4..591d692 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -84,6 +84,8 @@ static void virtio_dev_tx_queue_release(__rte_unused void *txq);
 static void virtio_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats);
 static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
 static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);
+static int virtio_vlan_filter_set(struct rte_eth_dev *dev,
+   uint16_t vlan_id, int on);

 static int virtio_dev_queue_stats_mapping_set(
__rte_unused struct rte_eth_dev *eth_dev,
@@ -511,6 +513,7 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
.tx_queue_release= virtio_dev_tx_queue_release,
/* collect stats per queue */
.queue_stats_mapping_set = virtio_dev_queue_stats_mapping_set,
+   .vlan_filter_set = virtio_vlan_filter_set,
 };

 static inline int
@@ -640,14 +643,31 @@ virtio_get_hwaddr(struct virtio_hw *hw)
}
 }

+static int
+virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+   struct virtio_pmd_ctrl ctrl;
+   int len;
+
+   if (!vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN))
+   return -ENOTSUP;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_VLAN;
+   ctrl.hdr.cmd = on ? VIRTIO_NET_CTRL_VLAN_ADD : VIRTIO_NET_CTRL_VLAN_DEL;
+   memcpy(ctrl.data, &vlan_id, sizeof(vlan_id));
+   len = sizeof(vlan_id);
+
+   return virtio_send_command(hw->cvq, &ctrl, &len, 1);
+}

 static void
 virtio_negotiate_features(struct virtio_hw *hw)
 {
uint32_t host_features, mask;

-   mask = VIRTIO_NET_F_CTRL_VLAN;
-   mask |= VIRTIO_NET_F_CSUM | VIRTIO_NET_F_GUEST_CSUM;
+   /* checksum offload not implemented */
+   mask = VIRTIO_NET_F_CSUM | VIRTIO_NET_F_GUEST_CSUM;

/* TSO and LRO are only available when their corresponding
 * checksum offload feature is also negotiated.
@@ -1058,6 +1078,13 @@ virtio_dev_configure(struct rte_eth_dev *dev)

hw->vlan_strip = rxmode->hw_vlan_strip;

+   if (rxmode->hw_vlan_filter
+   && !vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN)) {
+   PMD_DRV_LOG(NOTICE,
+   "vlan filtering not available on this host");
+   return -ENOTSUP;
+   }
+
if (vtpci_irq_config(hw, 0) == VIRTIO_MSI_NO_VECTOR) {
PMD_DRV_LOG(ERR, "failed to set config vector");
return -EBUSY;
-- 
1.8.4.2
