[dpdk-dev] [PATCH v2] kni: fix unused variable compile error

2016-10-14 Thread Thomas Monjalon
2016-10-14 17:41, Ferruh Yigit:
> compile error:
>   CC [M]  .../lib/librte_eal/linuxapp/kni/kni_misc.o
> cc1: warnings being treated as errors
> .../lib/librte_eal/linuxapp/kni/kni_misc.c: In function 'kni_exit_net':
> .../lib/librte_eal/linuxapp/kni/kni_misc.c:113:18:
> error: unused variable 'knet'
> 
> For kernel versions < v3.1, mutex_destroy() is a macro that does nothing;
> this causes an unused variable warning for knet, which is used in
> mutex_destroy().
> 
> mutex_destroy() was converted into a static inline function by commit:
> Linux: 4582c0a4866e ("mutex: Make mutex_destroy() an inline function")
> 
> To fix the warning, the unused attribute is added to the knet variable.
> 
> Fixes: 93a298b34e1b ("kni: support core id parameter in single threaded mode")
> 
> Signed-off-by: Ferruh Yigit 

Applied, thanks


[dpdk-dev] 17.02 Roadmap

2016-10-14 Thread Thomas Monjalon
2016-10-14 10:29, Stephen Hemminger:
> It seems like a lot of these features are focused too narrowly on exposing
> features that exist on specific Intel hardware. The concept of a general
> purpose Dataplane Development Kit is that applications can be written
> against a generic API (like any operating system) and will run on a wide
> variety of hardware. This concept seems to be getting lost as the DPDK is
> becoming more of a platform for exposing whatever cool hardware features
> exist.
> 
> I would propose that no new feature be allowed in the DPDK unless it
> can be supported on all device types. Yes, that means you have to build
> and test software emulation layers for all other devices. The current
> model is more of a hardware test bed.

Thanks for the reminder, Stephen. It is a good goal.
I think the software emulation idea is finding its way.
As for forbidding new hardware features without emulation support,
that has to be discussed.


[dpdk-dev] [PATCH v2 3/5] i40e: enable i40e vector PMD on ARMv8a platform

2016-10-14 Thread Jerin Jacob
On Fri, Oct 14, 2016 at 09:30:02AM +0530, Jianbo Liu wrote:
> Signed-off-by: Jianbo Liu 

Reviewed-by: Jerin Jacob 

> ---
>  config/defconfig_arm64-armv8a-linuxapp-gcc | 1 -
>  doc/guides/nics/features/i40e_vec.ini  | 1 +
>  doc/guides/nics/features/i40e_vf_vec.ini   | 1 +
>  3 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
> index a0f4473..6321884 100644
> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
> @@ -45,6 +45,5 @@ CONFIG_RTE_TOOLCHAIN_GCC=y
>  CONFIG_RTE_EAL_IGB_UIO=n
>  
>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
> -CONFIG_RTE_LIBRTE_I40E_INC_VECTOR=n
>  
>  CONFIG_RTE_SCHED_VECTOR=n
> diff --git a/doc/guides/nics/features/i40e_vec.ini b/doc/guides/nics/features/i40e_vec.ini
> index 0953d84..edd6b71 100644
> --- a/doc/guides/nics/features/i40e_vec.ini
> +++ b/doc/guides/nics/features/i40e_vec.ini
> @@ -37,3 +37,4 @@ Linux UIO= Y
>  Linux VFIO   = Y
>  x86-32   = Y
>  x86-64   = Y
> +ARMv8= Y
> diff --git a/doc/guides/nics/features/i40e_vf_vec.ini b/doc/guides/nics/features/i40e_vf_vec.ini
> index 2a44bf6..d6674f7 100644
> --- a/doc/guides/nics/features/i40e_vf_vec.ini
> +++ b/doc/guides/nics/features/i40e_vf_vec.ini
> @@ -26,3 +26,4 @@ Linux UIO= Y
>  Linux VFIO   = Y
>  x86-32   = Y
>  x86-64   = Y
> +ARMv8= Y
> -- 
> 2.4.11
> 


[dpdk-dev] [PATCH v2 2/5] i40e: implement vector PMD for ARM architecture

2016-10-14 Thread Jerin Jacob
On Fri, Oct 14, 2016 at 09:30:01AM +0530, Jianbo Liu wrote:
> Use ARM NEON intrinsic to implement i40e vPMD
> 
> Signed-off-by: Jianbo Liu 

I'm not entirely familiar with i40e internals. The patch looks OK in terms
of using NEON instructions.

Acked-by: Jerin Jacob 

> ---
>  drivers/net/i40e/Makefile |   4 +
>  drivers/net/i40e/i40e_rxtx_vec_neon.c | 614 
> ++
>  2 files changed, 618 insertions(+)
>  create mode 100644 drivers/net/i40e/i40e_rxtx_vec_neon.c
> 


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-14 Thread Jerin Jacob
On Fri, Oct 14, 2016 at 10:30:33AM +, Hemant Agrawal wrote:

> > > Am I reading this correctly that there is no way to support an
> > > indefinite waiting capability? Or is this just saying that if a timed
> > > wait is performed there are min/max limits for the wait duration?
> > 
> > Application can wait indefinite if required. see
> > RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT configuration option.
> > 
> > Trivial applications may not need different wait values on each dequeue.
> > This is a performance optimization opportunity for the implementation.
> 
>  Jerin, it is irrespective of the wait configuration, whether you are using
> per-device wait or per-dequeue wait.
>  Can the value of MAX_U32 or MAX_U64 be treated as an infinite wait?

That would be yet another check in the fast path of the implementation, I
think, for a more fine-grained wait scheme. Let the application configure the
device with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT so that the implementation can
have two different function-pointer-based dequeue functions if required.

With the RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT configuration, MAX_U64 implicitly
becomes an infinite wait, as the wait value is a uint64_t.
I can add this info in v3 if required.

Jerin


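To illustrate the two wait schemes discussed above, here is a minimal,
hypothetical sketch in C. Only RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT comes from
the discussion above; the other names (rte_event_dev_config, dequeue_wait_ns,
rte_event_dequeue, dev_id, port_id) are assumptions and may differ in the
final libeventdev API:

    /* Hypothetical illustration only -- field and function names are
     * assumed and may change in the final libeventdev API. */
    struct rte_event_dev_config cfg = {
        /* Per-device wait: one value applied to every dequeue call. */
        .dequeue_wait_ns = 1000,
    };

    /* Alternatively, request a wait value per dequeue call: */
    cfg.event_dev_cfg |= RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT;

    /* With per-dequeue waits, the wait argument is a uint64_t, so
     * passing UINT64_MAX effectively means "wait forever". */
    struct rte_event ev;
    uint16_t nb = rte_event_dequeue(dev_id, port_id, &ev, UINT64_MAX);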

[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

2016-10-14 Thread Yuanhan Liu
On Thu, Oct 13, 2016 at 11:23:44AM +0200, Maxime Coquelin wrote:
> I was going to re-run some PVP benchmarks with 0% pkt loss, as I had
> some strange results last week.
> 
> The problem is that your series no longer applies cleanly due to the
> history rewrite of next-virtio's master branch.
> Any chance you could send me a rebased version so that I can apply the series?

I think it's pointless to do that now: it won't be merged after all.
We have refactored it into a new series; please help review it if you
have time :)

BTW, apologies that I forgot to include your Reviewed-by for the
first patch. I intended to do that ...

--yliu


[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path

2016-10-14 Thread Maxime Coquelin


On 10/14/2016 09:24 AM, Wang, Zhihong wrote:
>
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Maxime Coquelin
>> Sent: Tuesday, September 27, 2016 4:43 PM
>> To: yuanhan.liu at linux.intel.com; Xie, Huawei ;
>> dev at dpdk.org
>> Cc: vkaplans at redhat.com; mst at redhat.com;
>> stephen at networkplumber.org; Maxime Coquelin
>> 
>> Subject: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to
>> the TX path
>>
>> Indirect descriptors are usually supported by virtio-net devices,
>> allowing a larger number of requests to be dispatched.
>>
>> When the virtio device sends a packet using indirect descriptors,
>> only one slot is used in the ring, even for large packets.
>>
>> The main effect is to improve the 0% packet loss benchmark.
>> A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
>> (fwd io for host, macswap for VM) on DUT shows a +50% gain for
>> zero loss.
>>
>> On the downside, a micro-benchmark using testpmd txonly in the VM and
>> rxonly on the host shows a loss between 1 and 4%. But depending on
>> the needs, the feature can be disabled at VM boot time by passing the
>> indirect_desc=off argument to the vhost-user device in QEMU.
>>
>> Signed-off-by: Maxime Coquelin 
>
>
> Hi Maxime,
>
> Seems this patch doesn't work with Windows virtio guests in my test.
> Have you done similar tests before?
>
> The way I test:
>
>  1. Make sure https://patchwork.codeaurora.org/patch/84339/ is applied
>
>  2. Start testpmd with iofwd between 2 vhost ports
>
>  3. Start 2 Windows guests connected to the 2 vhost ports
>
>  4. Disable firewall and assign IP to each guest using ipconfig
>
>  5. Use ping to test connectivity
>
> When I disable this patch by setting:
>
> 0ULL << VIRTIO_RING_F_INDIRECT_DESC,
>
> the connection is fine, but when I restore:
>
> 1ULL << VIRTIO_RING_F_INDIRECT_DESC,
>
> the connection is broken.

Just noticed I didn't reply to all this morning.
I sent a debug patch to Zhihong, which shows that indirect desc chaining
looks OK.

On my side, I just set up 2 Windows 2016 VMs, and confirmed the issue.
I'll continue the investigation early next week.

Has anyone already tested Windows guest with vhost-net, which also has
indirect descs support?


Regards,
Maxime
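
For reference, the indirect_desc toggle mentioned in the patch description
maps to a virtio device property on the QEMU command line. A hedged example
invocation (socket path and netdev/device ids are placeholders):

    qemu-system-x86_64 ... \
        -chardev socket,id=char0,path=/tmp/vhost-user0.sock \
        -netdev type=vhost-user,id=net0,chardev=char0 \
        -device virtio-net-pci,netdev=net0,indirect_desc=off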


[dpdk-dev] [PATCH v2] kni: fix unused variable compile error

2016-10-14 Thread Ferruh Yigit
compile error:
  CC [M]  .../lib/librte_eal/linuxapp/kni/kni_misc.o
cc1: warnings being treated as errors
.../lib/librte_eal/linuxapp/kni/kni_misc.c: In function 'kni_exit_net':
.../lib/librte_eal/linuxapp/kni/kni_misc.c:113:18:
error: unused variable 'knet'

For kernel versions < v3.1, mutex_destroy() is a macro that does nothing;
this causes an unused variable warning for knet, which is used in
mutex_destroy().

mutex_destroy() was converted into a static inline function by commit:
Linux: 4582c0a4866e ("mutex: Make mutex_destroy() an inline function")

To fix the warning, the unused attribute is added to the knet variable.

Fixes: 93a298b34e1b ("kni: support core id parameter in single threaded mode")

Signed-off-by: Ferruh Yigit 
---

v2:
* updated commit log with more details on Linux version that issue
  occurs
---
 lib/librte_eal/linuxapp/kni/kni_misc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c b/lib/librte_eal/linuxapp/kni/kni_misc.c
index 3303d9b..497db9b 100644
--- a/lib/librte_eal/linuxapp/kni/kni_misc.c
+++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
@@ -110,9 +110,11 @@ kni_init_net(struct net *net)
 static void __net_exit
 kni_exit_net(struct net *net)
 {
-   struct kni_net *knet = net_generic(net, kni_net_id);
+   struct kni_net *knet __maybe_unused;

+   knet = net_generic(net, kni_net_id);
	mutex_destroy(&knet->kni_kthread_lock);
+
 #ifndef HAVE_SIMPLIFIED_PERNET_OPERATIONS
kfree(knet);
 #endif
-- 
2.7.4



[dpdk-dev] [PATCH v1 2/2] doc: update poll mode driver guide

2016-10-14 Thread Bernard Iremonger
add information about new ixgbe PMD API.

Signed-off-by: Bernard Iremonger 
---
 doc/guides/prog_guide/poll_mode_drv.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
index bf3ea9f..3a400b2 100644
--- a/doc/guides/prog_guide/poll_mode_drv.rst
+++ b/doc/guides/prog_guide/poll_mode_drv.rst
@@ -356,3 +356,9 @@ Some additions in the metadata scheme are as follows:
 An example where queue numbers are used is as follows: ``tx_q7_bytes`` which
 indicates this statistic applies to queue number 7, and represents the number
 of transmitted bytes on that queue.
+
+Extended ixgbe PMD API
+~~~~~~~~~~~~~~~~~~~~~~
+
+In DPDK release v16.11 an API for ixgbe specific functions has been added to the ixgbe PMD.
+The declarations for the API functions are in the header ``rte_pmd_ixgbe.h``.
-- 
2.10.1



[dpdk-dev] [PATCH v1 1/2] doc: update ixgbe guide

2016-10-14 Thread Bernard Iremonger
add information about new ixgbe PMD API.

Signed-off-by: Bernard Iremonger 
---
 doc/guides/nics/ixgbe.rst | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/ixgbe.rst b/doc/guides/nics/ixgbe.rst
index ed260c4..3b6851b 100644
--- a/doc/guides/nics/ixgbe.rst
+++ b/doc/guides/nics/ixgbe.rst
@@ -1,5 +1,5 @@
 ..  BSD LICENSE
-Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
@@ -147,6 +147,11 @@ The following MACROs are used for these three features:

 *   ETH_TXQ_FLAGS_NOXSUMTCP

+Application Programming Interface
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In DPDK release v16.11 an API for ixgbe specific functions has been added to the ixgbe PMD.
+The declarations for the API functions are in the header ``rte_pmd_ixgbe.h``.

 Sample Application Notes
 
-- 
2.10.1



[dpdk-dev] [PATCH v1 0/2] doc: ixgbe updates

2016-10-14 Thread Bernard Iremonger
Update two rst files to announce the ixgbe PMD APIs.

Bernard Iremonger (2):
  doc: update ixgbe guide
  doc: update poll mode driver guide

 doc/guides/nics/ixgbe.rst   | 7 ++-
 doc/guides/prog_guide/poll_mode_drv.rst | 6 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

-- 
2.10.1



[dpdk-dev] [PATCH v7 7/7] vhost: retrieve avail head once

2016-10-14 Thread Yuanhan Liu
There is no need to retrieve the latest avail head every time we enqueue
a packet in the mergeable Rx path by

avail_idx = *((volatile uint16_t *)&vq->avail->idx);

Instead, we could just retrieve it once at the beginning of the enqueue
path. This could diminish the cache penalty slightly, because the virtio
driver could be updating it while vhost is reading it (for each packet).

Signed-off-by: Yuanhan Liu 
Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/virtio_net.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 12a037b..b784dba 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -387,10 +387,10 @@ fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx,
  */
 static inline int
 reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size,
-   struct buf_vector *buf_vec, uint16_t *num_buffers)
+   struct buf_vector *buf_vec, uint16_t *num_buffers,
+   uint16_t avail_head)
 {
uint16_t cur_idx;
-   uint16_t avail_idx;
uint32_t vec_idx = 0;
uint16_t tries = 0;

@@ -401,8 +401,7 @@ reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size,
cur_idx  = vq->last_avail_idx;

while (size > 0) {
-   avail_idx = *((volatile uint16_t *)&vq->avail->idx);
-   if (unlikely(cur_idx == avail_idx))
+   if (unlikely(cur_idx == avail_head))
return -1;

if (unlikely(fill_vec_buf(vq, cur_idx, &vec_idx, buf_vec,
@@ -523,6 +522,7 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
uint32_t pkt_idx = 0;
uint16_t num_buffers;
struct buf_vector buf_vec[BUF_VECTOR_MAX];
+   uint16_t avail_head;

LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);
if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
@@ -542,11 +542,12 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);

vq->shadow_used_idx = 0;
+   avail_head = *((volatile uint16_t *)&vq->avail->idx);
for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;

if (unlikely(reserve_avail_buf_mergeable(vq, pkt_len, buf_vec,
-&num_buffers) < 0)) {
+   &num_buffers, avail_head) < 0)) {
LOG_DEBUG(VHOST_DATA,
"(%d) failed to get enough desc from vring\n",
dev->vid);
-- 
1.9.0



[dpdk-dev] [PATCH v7 6/7] vhost: prefetch avail ring

2016-10-14 Thread Yuanhan Liu
Signed-off-by: Yuanhan Liu 
Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/virtio_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 2bdc2fe..12a037b 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -539,6 +539,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
if (count == 0)
return 0;

+   rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
+
vq->shadow_used_idx = 0;
for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
-- 
1.9.0



[dpdk-dev] [PATCH v7 5/7] vhost: shadow used ring update

2016-10-14 Thread Yuanhan Liu
From: Zhihong Wang 

The basic idea is to shadow the used ring update: update them into a
local buffer first, and then flush them all to the virtio used vring
at once in the end.

And since we do avail ring reservation before enqueuing data, we would
know which and how many descs will be used, which means we could update
the shadow used ring at reservation time. It also introduces another
slight advantage: we no longer need to access desc->flags inside
copy_mbuf_to_desc_mergeable().

Signed-off-by: Zhihong Wang 
Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost.c  |  13 +++-
 lib/librte_vhost/vhost.h  |   3 +
 lib/librte_vhost/vhost_user.c |  23 +--
 lib/librte_vhost/virtio_net.c | 138 +-
 4 files changed, 113 insertions(+), 64 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 469117a..d8116ff 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -121,9 +121,18 @@ static void
 free_device(struct virtio_net *dev)
 {
uint32_t i;
+   struct vhost_virtqueue *rxq, *txq;

-   for (i = 0; i < dev->virt_qp_nb; i++)
-   rte_free(dev->virtqueue[i * VIRTIO_QNUM]);
+   for (i = 0; i < dev->virt_qp_nb; i++) {
+   rxq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ];
+   txq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ];
+
+   rte_free(rxq->shadow_used_ring);
+   rte_free(txq->shadow_used_ring);
+
+   /* rxq and txq are allocated together as queue-pair */
+   rte_free(rxq);
+   }

rte_free(dev);
 }
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 17c557f..acec772 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -105,6 +105,9 @@ struct vhost_virtqueue {
uint16_tlast_zmbuf_idx;
struct zcopy_mbuf   *zmbufs;
struct zcopy_mbuf_list  zmbuf_list;
+
+   struct vring_used_elem  *shadow_used_ring;
+   uint16_tshadow_used_idx;
 } __rte_cache_aligned;

 /* Old kernels have no such macro defined */
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 3074227..6b83c15 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -198,6 +198,15 @@ vhost_user_set_vring_num(struct virtio_net *dev,
}
}

+   vq->shadow_used_ring = rte_malloc(NULL,
+   vq->size * sizeof(struct vring_used_elem),
+   RTE_CACHE_LINE_SIZE);
+   if (!vq->shadow_used_ring) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "failed to allocate memory for shadow used ring.\n");
+   return -1;
+   }
+
return 0;
 }

@@ -711,6 +720,8 @@ static int
 vhost_user_get_vring_base(struct virtio_net *dev,
  struct vhost_vring_state *state)
 {
+   struct vhost_virtqueue *vq = dev->virtqueue[state->index];
+
/* We have to stop the queue (virtio) if it is running. */
if (dev->flags & VIRTIO_DEV_RUNNING) {
dev->flags &= ~VIRTIO_DEV_RUNNING;
@@ -718,7 +729,7 @@ vhost_user_get_vring_base(struct virtio_net *dev,
}

/* Here we are safe to get the last used index */
-   state->num = dev->virtqueue[state->index]->last_used_idx;
+   state->num = vq->last_used_idx;

RTE_LOG(INFO, VHOST_CONFIG,
"vring base idx:%d file:%d\n", state->index, state->num);
@@ -727,13 +738,15 @@ vhost_user_get_vring_base(struct virtio_net *dev,
 * sent and only sent in vhost_vring_stop.
 * TODO: cleanup the vring, it isn't usable since here.
 */
-   if (dev->virtqueue[state->index]->kickfd >= 0)
-   close(dev->virtqueue[state->index]->kickfd);
+   if (vq->kickfd >= 0)
+   close(vq->kickfd);

-   dev->virtqueue[state->index]->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+   vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;

if (dev->dequeue_zero_copy)
-   free_zmbufs(dev->virtqueue[state->index]);
+   free_zmbufs(vq);
+   rte_free(vq->shadow_used_ring);
+   vq->shadow_used_ring = NULL;

return 0;
 }
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index b5ba633..2bdc2fe 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -91,6 +91,56 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t qp_nb)
return (is_tx ^ (idx & 1)) == 0 && idx < qp_nb * VIRTIO_QNUM;
 }

+static inline void __attribute__((always_inline))
+do_flush_shadow_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq,
+ uint16_t to, uint16_t from, uint16_t size)
+{
+   rte_memcpy(&vq->used->ring[to],
+   &vq->shadow_used_ring[from],
+   size * sizeof(struct vring_used_elem));

[dpdk-dev] [PATCH v7 3/7] vhost: simplify mergeable Rx vring reservation

2016-10-14 Thread Yuanhan Liu
Let it return "num_buffers" we reserved, so that we could re-use it
with copy_mbuf_to_desc_mergeable() directly, instead of calculating
it again there.

Meanwhile, the return type of copy_mbuf_to_desc_mergeable is changed
to "int". -1 will be return on error.

Signed-off-by: Yuanhan Liu 
Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/virtio_net.c | 41 +
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index d4fc62a..1a40c91 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -336,7 +336,7 @@ fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx,
  */
 static inline int
 reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size,
-   uint16_t *end, struct buf_vector *buf_vec)
+   struct buf_vector *buf_vec, uint16_t *num_buffers)
 {
uint16_t cur_idx;
uint16_t avail_idx;
@@ -370,19 +370,18 @@ reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size,
return -1;
}

-   *end = cur_idx;
+   *num_buffers = cur_idx - vq->last_used_idx;
return 0;
 }

-static inline uint32_t __attribute__((always_inline))
+static inline int __attribute__((always_inline))
 copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
-   uint16_t end_idx, struct rte_mbuf *m,
-   struct buf_vector *buf_vec)
+   struct rte_mbuf *m, struct buf_vector *buf_vec,
+   uint16_t num_buffers)
 {
struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
uint32_t vec_idx = 0;
-   uint16_t start_idx = vq->last_used_idx;
-   uint16_t cur_idx = start_idx;
+   uint16_t cur_idx = vq->last_used_idx;
uint64_t desc_addr;
uint32_t desc_chain_head;
uint32_t desc_chain_len;
@@ -394,21 +393,21 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
struct rte_mbuf *hdr_mbuf;

if (unlikely(m == NULL))
-   return 0;
+   return -1;

LOG_DEBUG(VHOST_DATA, "(%d) current index %d | end index %d\n",
dev->vid, cur_idx, end_idx);

desc_addr = gpa_to_vva(dev, buf_vec[vec_idx].buf_addr);
if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr)
-   return 0;
+   return -1;

hdr_mbuf = m;
hdr_addr = desc_addr;
hdr_phys_addr = buf_vec[vec_idx].buf_addr;
rte_prefetch0((void *)(uintptr_t)hdr_addr);

-   virtio_hdr.num_buffers = end_idx - start_idx;
+   virtio_hdr.num_buffers = num_buffers;
LOG_DEBUG(VHOST_DATA, "(%d) RX: num merge buffers %d\n",
dev->vid, virtio_hdr.num_buffers);

@@ -440,7 +439,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,

desc_addr = gpa_to_vva(dev, buf_vec[vec_idx].buf_addr);
if (unlikely(!desc_addr))
-   return 0;
+   return -1;

/* Prefetch buffer address. */
rte_prefetch0((void *)(uintptr_t)desc_addr);
@@ -489,7 +488,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
offsetof(struct vring_used, ring[used_idx]),
sizeof(vq->used->ring[used_idx]));

-   return end_idx - start_idx;
+   return 0;
 }

 static inline uint32_t __attribute__((always_inline))
@@ -497,8 +496,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
struct rte_mbuf **pkts, uint32_t count)
 {
struct vhost_virtqueue *vq;
-   uint32_t pkt_idx = 0, nr_used = 0;
-   uint16_t end;
+   uint32_t pkt_idx = 0;
+   uint16_t num_buffers;
struct buf_vector buf_vec[BUF_VECTOR_MAX];

LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);
@@ -519,22 +518,24 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;

-   if (unlikely(reserve_avail_buf_mergeable(vq, pkt_len,
-&end, buf_vec) < 0)) {
+   if (unlikely(reserve_avail_buf_mergeable(vq, pkt_len, buf_vec,
+&num_buffers) < 0)) {
LOG_DEBUG(VHOST_DATA,
"(%d) failed to get enough desc from vring\n",
dev->vid);
break;
}

-   nr_used = copy_mbuf_to_desc_mergeable(dev, vq, end,
- pkts[pkt_idx], buf_vec);
+

[dpdk-dev] [PATCH v7 2/7] vhost: optimize cache access

2016-10-14 Thread Yuanhan Liu
From: Zhihong Wang 

This patch reorders the code to delay virtio header write to improve
cache access efficiency for cases where the mrg_rxbuf feature is turned
on. CPU pipeline stall cycles can be significantly reduced.

Virtio header writes and mbuf data copies are all remote store operations,
which take a long time to finish. It's a good idea to put them together
to remove bubbles in between, to let as many remote store instructions
as possible go into the store buffer at the same time to hide latency, and
to let the H/W prefetcher go to work as early as possible.

On a Haswell machine, about 100 cycles can be saved per packet by this
patch alone. Taking 64B packets traffic for example, this means about 60%
efficiency improvement for the enqueue operation.

Signed-off-by: Zhihong Wang 
Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/virtio_net.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 812e5d3..d4fc62a 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -390,6 +390,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
uint32_t desc_offset, desc_avail;
uint32_t cpy_len;
uint16_t desc_idx, used_idx;
+   uint64_t hdr_addr, hdr_phys_addr;
+   struct rte_mbuf *hdr_mbuf;

if (unlikely(m == NULL))
return 0;
@@ -401,17 +403,15 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr)
return 0;

-   rte_prefetch0((void *)(uintptr_t)desc_addr);
+   hdr_mbuf = m;
+   hdr_addr = desc_addr;
+   hdr_phys_addr = buf_vec[vec_idx].buf_addr;
+   rte_prefetch0((void *)(uintptr_t)hdr_addr);

virtio_hdr.num_buffers = end_idx - start_idx;
LOG_DEBUG(VHOST_DATA, "(%d) RX: num merge buffers %d\n",
dev->vid, virtio_hdr.num_buffers);

-   virtio_enqueue_offload(m, &virtio_hdr.hdr);
-   copy_virtio_net_hdr(dev, desc_addr, virtio_hdr);
-   vhost_log_write(dev, buf_vec[vec_idx].buf_addr, dev->vhost_hlen);
-   PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0);
-
desc_avail  = buf_vec[vec_idx].buf_len - dev->vhost_hlen;
desc_offset = dev->vhost_hlen;
desc_chain_head = buf_vec[vec_idx].desc_idx;
@@ -456,6 +456,16 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
mbuf_avail  = rte_pktmbuf_data_len(m);
}

+   if (hdr_addr) {
+   virtio_enqueue_offload(hdr_mbuf, &virtio_hdr.hdr);
+   copy_virtio_net_hdr(dev, hdr_addr, virtio_hdr);
+   vhost_log_write(dev, hdr_phys_addr, dev->vhost_hlen);
+   PRINT_PACKET(dev, (uintptr_t)hdr_addr,
+dev->vhost_hlen, 0);
+
+   hdr_addr = 0;
+   }
+
cpy_len = RTE_MIN(desc_avail, mbuf_avail);
rte_memcpy((void *)((uintptr_t)(desc_addr + desc_offset)),
rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
-- 
1.9.0



[dpdk-dev] [PATCH v7 1/7] vhost: remove useless volatile

2016-10-14 Thread Yuanhan Liu
From: Zhihong Wang 

last_used_idx is a local var; there is no need to decorate it
with "volatile".

Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/vhost.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 53dbf33..17c557f 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -85,7 +85,7 @@ struct vhost_virtqueue {
uint32_tsize;

uint16_tlast_avail_idx;
-   volatile uint16_t   last_used_idx;
+   uint16_tlast_used_idx;
 #define VIRTIO_INVALID_EVENTFD (-1)
 #define VIRTIO_UNINITIALIZED_EVENTFD   (-2)

-- 
1.9.0



[dpdk-dev] [PATCH v7 0/7] vhost: optimize mergeable Rx path

2016-10-14 Thread Yuanhan Liu
This is a new set of patches to optimize the mergeable Rx code path.
No refactoring (rewrite) was made this time. It just applies some
findings from Zhihong (kudos to him!) that could improve the mergeable
Rx path on the old code.

The two major factors that could improve the performance greatly are:

- copy virtio header together with packet data. This could remove
  the bubbles between the two copies to optimize the cache access.

  This is implemented in patch 2 "vhost: optimize cache access"

- shadow used ring update and update them at once

  The basic idea is to update used ring in a local buffer and flush
  them to the virtio used ring at once in the end. Again, this is
  for optimizing the cache access.

  This is implemented in patch 5 "vhost: shadow used ring update"
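
  A minimal sketch of the shadow-ring idea, simplified from patch 5
  (VQ_SIZE is a placeholder; the wrap-around handling and vhost logging
  of the real patch are omitted):

    /* Stage used-ring updates in a local (shadow) array first. */
    struct vring_used_elem shadow[VQ_SIZE];
    uint16_t shadow_idx = 0;

    /* One entry per reserved descriptor chain, filled at reservation
     * time since we already know which descs will be used. */
    shadow[shadow_idx].id  = desc_chain_head;
    shadow[shadow_idx].len = desc_chain_len;
    shadow_idx++;

    /* ... repeat for the whole burst, then flush to the guest-visible
     * used ring at once: */
    rte_memcpy(&vq->used->ring[used_idx], shadow,
               shadow_idx * sizeof(struct vring_used_elem));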

The two optimizations could yield a 40+% performance gain in micro testing
and 20+% in PVP case testing with 64B packet size.

Besides that, there are some tiny optimizations, such as prefetching the
avail ring (patch 6) and retrieving the avail head once (patch 7).

Note: the shadow used ring technique could also be applied to the non-mrg
Rx path (and even the dequeue path). I didn't do that for two reasons:

- we already update the used ring in batches in both paths: it's just not
  shadowed first.

- it's a bit too late to make many changes at this stage: RC1 is out.

Please help testing.

Thanks.

--yliu

Cc: Jianbo Liu 
---
Yuanhan Liu (4):
  vhost: simplify mergeable Rx vring reservation
  vhost: use last avail idx for avail ring reservation
  vhost: prefetch avail ring
  vhost: retrieve avail head once

Zhihong Wang (3):
  vhost: remove useless volatile
  vhost: optimize cache access
  vhost: shadow used ring update

 lib/librte_vhost/vhost.c  |  13 ++-
 lib/librte_vhost/vhost.h  |   5 +-
 lib/librte_vhost/vhost_user.c |  23 +++--
 lib/librte_vhost/virtio_net.c | 193 +-
 4 files changed, 149 insertions(+), 85 deletions(-)

-- 
1.9.0



[dpdk-dev] 16.07.1 stable patches review and test

2016-10-14 Thread Yuanhan Liu
Hi,

I have applied most of the bug-fixing patches (listed below) to the 16.07
stable branch at

http://dpdk.org/browse/dpdk-stable/

Please help review and test. The planned date for the final release
is Oct 26th. Before that, please shout if anyone has objections to these
patches being applied.

Thanks.

--yliu

---
Alejandro Lucero (1):
  net/nfp: fix copying MAC address

Aleksey Katargin (1):
  table: fix symbol exports

Alex Zelezniak (1):
  net/ixgbe: fix VF reset to apply to correct VF

Ali Volkan Atli (1):
  net/e1000: fix returned number of available Rx descriptors

Arek Kusztal (1):
  app/test: fix verification of digest for GCM

Beilei Xing (2):
  net/i40e: fix dropping packets with ethertype 0x88A8
  net/i40e: fix parsing QinQ packets type

Bruce Richardson (1):
  net/mlx: fix debug build with gcc 6.1

Christian Ehrhardt (1):
  examples/ip_pipeline: fix Python interpreter

Deepak Kumar Jain (2):
  crypto/null: fix key size increment value
  crypto/qat: fix FreeBSD build

Dror Birkman (1):
  net/pcap: fix memory leak in jumbo frames

Ferruh Yigit (2):
  app/testpmd: fix help of MTU set commmand
  pmdinfogen: fix clang build

Gary Mussar (1):
  tools: fix virtio interface name when binding

Gowrishankar Muthukrishnan (1):
  examples/ip_pipeline: fix lcore mapping for ppc64

Hiroyuki Mikita (1):
  sched: fix releasing enqueued packets

James Poole (1):
  app/testpmd: fix timeout in Rx queue flushing

Jianfeng Tan (3):
  net/virtio_user: fix first queue pair without multiqueue
  net/virtio_user: fix wrong sequence of messages
  net/virtio_user: fix error management during init

Jim Harris (1):
  contigmem: zero all pages during mmap

John Daley (1):
  net/enic: fix bad L4 checksum flag on ICMP packets

Karmarkar Suyash (1):
  timer: fix lag delay

Maciej Czekaj (1):
  mem: fix crash on hugepage mapping error

Nelson Escobar (1):
  net/enic: fix freeing memory for descriptor ring

Olivier Matz (4):
  app/testpmd: fix crash when mempool allocation fails
  tools: fix json output of pmdinfo
  mbuf: fix error handling on pool creation
  mem: fix build with -O1

Pablo de Lara (3):
  hash: fix ring size
  hash: fix false zero signature key hit lookup
  crypto: fix build with icc

Qi Zhang (1):
  net/i40e/base: fix UDP packet header

Rich Lane (1):
  net/i40e: fix null pointer dereferences when using VMDq+RSS

Weiliang Luo (1):
  mempool: fix corruption due to invalid handler

Xiao Wang (5):
  net/fm10k: fix MAC address removal from switch
  net/ixgbe/base: fix pointer check
  net/ixgbe/base: fix check for NACK
  net/ixgbe/base: fix possible corruption of shadow RAM
  net/ixgbe/base: fix skipping PHY config

Yangchao Zhou (1):
  pci: fix memory leak when detaching device

Yury Kylulin (2):
  net/ixgbe: fix mbuf leak during Rx queue release
  net/i40e: fix mbuf leak during Rx queue release

Zhiyong Yang (1):
  net/virtio: fix xstats name


[dpdk-dev] [PATCH v2 1/3] lib/librte_port: enable file descriptor port support

2016-10-14 Thread Thomas Monjalon
2016-10-12 20:44, Dumitrescu, Cristian:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > This patchset was probably not tested as it does not compile.
> > And it could be useless if a TAP PMD is integrated.
> > I suggest to wait 17.02 cycle and see.
> 
> This patch was tested by me and Jasvinder as well and it works brilliantly.
> 
> We did not enable stats when testing, will sort out the missing semicolon 
> issue in the stats macros and resend v3 asap. This is a trivial issue, no 
> need to wait for 17.02.

So the stats were not tested.

> This is not conflicting with TAP PMD, and as said the scope of this 
> supersedes the TAP PMD.

The v3 has been applied and it breaks FreeBSD compilation now.

I felt it was not ready but you won with the words "it works brilliantly" ;)
(sorry, I could not resist making the joke)



[dpdk-dev] [PATCH v6 6/6] testpmd: use Tx preparation in csum engine

2016-10-14 Thread Tomasz Kulasek
Removed pseudo-header calculation for udp/tcp/tso packets from the
application and used the Tx preparation API for packet preparation and
verification.

Adding an additional step to the csum engine costs about a 3-4% performance
drop on my setup with the ixgbe driver. It's caused mostly by the need
to re-access and modify packet data.

Signed-off-by: Tomasz Kulasek 
---
 app/test-pmd/csumonly.c |   36 +---
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 57e6ae2..6f33ae9 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -112,15 +112,6 @@ struct simple_gre_hdr {
 } __attribute__((__packed__));

 static uint16_t
-get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
-{
-   if (ethertype == _htons(ETHER_TYPE_IPv4))
-   return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-   else /* assume ethertype == ETHER_TYPE_IPv6 */
-   return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
-static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
if (ethertype == _htons(ETHER_TYPE_IPv4))
@@ -370,32 +361,24 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
/* do not recalculate udp cksum if it was 0 */
if (udp_hdr->dgram_cksum != 0) {
udp_hdr->dgram_cksum = 0;
-   if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+   if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM)
ol_flags |= PKT_TX_UDP_CKSUM;
-   udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-   info->ethertype, ol_flags);
-   } else {
+   else
udp_hdr->dgram_cksum =
get_udptcp_checksum(l3_hdr, udp_hdr,
info->ethertype);
-   }
}
} else if (info->l4_proto == IPPROTO_TCP) {
tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + info->l3_len);
tcp_hdr->cksum = 0;
-   if (tso_segsz) {
+   if (tso_segsz)
ol_flags |= PKT_TX_TCP_SEG;
-   tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-   ol_flags);
-   } else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+   else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM)
ol_flags |= PKT_TX_TCP_CKSUM;
-   tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-   ol_flags);
-   } else {
+   else
tcp_hdr->cksum =
get_udptcp_checksum(l3_hdr, tcp_hdr,
info->ethertype);
-   }
} else if (info->l4_proto == IPPROTO_SCTP) {
sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + info->l3_len);
sctp_hdr->cksum = 0;
@@ -648,6 +631,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
uint16_t nb_rx;
uint16_t nb_tx;
+   uint16_t nb_prep;
uint16_t i;
uint64_t rx_ol_flags, tx_ol_flags;
uint16_t testpmd_ol_flags;
@@ -857,7 +841,13 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
printf("\n");
}
}
-   nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+   nb_prep = rte_eth_tx_prep(fs->tx_port, fs->tx_queue, pkts_burst,
+   nb_rx);
+   if (nb_prep != nb_rx)
+   printf("Preparing packet burst to transmit failed: %s\n",
+   rte_strerror(rte_errno));
+
+   nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_prep);
/*
 * Retry if necessary
 */
-- 
1.7.9.5



[dpdk-dev] [PATCH v6 5/6] ixgbe: add Tx preparation

2016-10-14 Thread Tomasz Kulasek
Signed-off-by: Tomasz Kulasek 
---
 drivers/net/ixgbe/ixgbe_ethdev.c |3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |5 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 +-
 drivers/net/ixgbe/ixgbe_rxtx.h   |2 ++
 4 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 4ca5747..4c6a8e1 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -517,6 +517,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
.nb_max = IXGBE_MAX_RING_DESC,
.nb_min = IXGBE_MIN_RING_DESC,
.nb_align = IXGBE_TXD_ALIGN,
+   .nb_seg_max = IXGBE_TX_MAX_SEG,
+   .nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };

 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1103,6 +1105,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prep = &ixgbe_prep_pkts;

/*
 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..e229cf5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,9 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
  struct rte_eth_rss_conf *rss_conf);

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 2ce8234..83db18f 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -70,6 +70,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -87,6 +88,9 @@
PKT_TX_TCP_SEG | \
PKT_TX_OUTER_IP_CKSUM)

+#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
+   (PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
+
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -905,6 +909,56 @@ end_of_tx:

 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+   int i, ret;
+   uint64_t ol_flags;
+   struct rte_mbuf *m;
+   struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+   for (i = 0; i < nb_pkts; i++) {
+   m = tx_pkts[i];
+   ol_flags = m->ol_flags;
+
+   /**
+* Check if packet meets requirements for number of segments
+*
+* NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and non-TSO
+*/
+
+   if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+   rte_errno = -EINVAL;
+   return i;
+   }
+
+   if (ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
+   rte_errno = -EINVAL;
+   return i;
+   }
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+   ret = rte_validate_tx_offload(m);
+   if (ret != 0) {
+   rte_errno = ret;
+   return i;
+   }
+#endif
+   ret = rte_phdr_cksum_fix(m);
+   if (ret != 0) {
+   rte_errno = ret;
+   return i;
+   }
+   }
+
+   return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2282,6 +2336,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
PMD_INIT_LOG(DEBUG, "Using simple tx code path");
+   dev->tx_pkt_prep = NULL;
 #ifdef RTE_IXGBE_INC_VECTOR
if (txq->tx_rs_thresh <= 

[dpdk-dev] [PATCH v6 4/6] i40e: add Tx preparation

2016-10-14 Thread Tomasz Kulasek
Signed-off-by: Tomasz Kulasek 
---
 drivers/net/i40e/i40e_ethdev.c |3 ++
 drivers/net/i40e/i40e_rxtx.c   |   72 +++-
 drivers/net/i40e/i40e_rxtx.h   |8 +
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 5af0e43..dab0d48 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -936,6 +936,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
	dev->dev_ops = &i40e_eth_dev_ops;
dev->rx_pkt_burst = i40e_recv_pkts;
dev->tx_pkt_burst = i40e_xmit_pkts;
+   dev->tx_pkt_prep = i40e_prep_pkts;

/* for secondary processes, we don't initialise any further as primary
 * has already done this work. Only check we don't need a different
@@ -2629,6 +2630,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
.nb_max = I40E_MAX_RING_DESC,
.nb_min = I40E_MIN_RING_DESC,
.nb_align = I40E_ALIGN_RING_DESC,
+   .nb_seg_max = I40E_TX_MAX_SEG,
+   .nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
};

if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 7ae7d9f..3e2c428 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 

 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,17 @@
PKT_TX_TCP_SEG | \
PKT_TX_OUTER_IP_CKSUM)

+#define I40E_TX_OFFLOAD_MASK (  \
+   PKT_TX_IP_CKSUM |   \
+   PKT_TX_L4_MASK |\
+   PKT_TX_OUTER_IP_CKSUM | \
+   PKT_TX_TCP_SEG |\
+   PKT_TX_QINQ_PKT |   \
+   PKT_TX_VLAN_PKT)
+
+#define I40E_TX_OFFLOAD_NOTSUP_MASK \
+   (PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
  struct rte_mbuf **tx_pkts,
  uint16_t nb_pkts);
@@ -1411,6 +1424,61 @@ i40e_xmit_pkts_simple(void *tx_queue,
return nb_tx;
 }

+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts)
+{
+   int i, ret;
+   uint64_t ol_flags;
+   struct rte_mbuf *m;
+
+   for (i = 0; i < nb_pkts; i++) {
+   m = tx_pkts[i];
+   ol_flags = m->ol_flags;
+
+   /**
+* m->nb_segs is uint8_t, so m->nb_segs is always less than
+* I40E_TX_MAX_SEG.
+* We check only a condition for m->nb_segs > I40E_TX_MAX_MTU_SEG.
+*/
+   if (!(ol_flags & PKT_TX_TCP_SEG)) {
+   if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+   rte_errno = -1;
+   return i;
+   }
+   } else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+   (m->tso_segsz > I40E_MAX_TSO_MSS)) {
+   /* MSS outside the range (256B - 9674B) are considered malicious */
+   rte_errno = -EINVAL;
+   return i;
+   }
+
+   if (ol_flags & I40E_TX_OFFLOAD_NOTSUP_MASK) {
+   rte_errno = -EINVAL;
+   return i;
+   }
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+   ret = rte_validate_tx_offload(m);
+   if (ret != 0) {
+   rte_errno = ret;
+   return i;
+   }
+#endif
+   ret = rte_phdr_cksum_fix(m);
+   if (ret != 0) {
+   rte_errno = ret;
+   return i;
+   }
+   }
+   return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -2763,9 +2831,11 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
dev->tx_pkt_burst = i40e_xmit_pkts_simple;
}
+   dev->tx_pkt_prep = NULL;
} else {
PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
dev->tx_pkt_burst = i40e_xmit_pkts;
+   

[dpdk-dev] [PATCH v6 3/6] fm10k: add Tx preparation

2016-10-14 Thread Tomasz Kulasek
Signed-off-by: Tomasz Kulasek 
---
 drivers/net/fm10k/fm10k.h|6 +
 drivers/net/fm10k/fm10k_ethdev.c |5 
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..c6fed21 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
#define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))

+#define FM10K_TX_MAX_SEG UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,9 @@ fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index c804436..dffb6d1 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1446,6 +1446,8 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
.nb_max = FM10K_MAX_TX_DESC,
.nb_min = FM10K_MIN_TX_DESC,
.nb_align = FM10K_MULT_TX_DESC,
+   .nb_seg_max = FM10K_TX_MAX_SEG,
+   .nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
};

dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2754,8 +2756,10 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
fm10k_txq_vec_setup(txq);
}
dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+   dev->tx_pkt_prep = NULL;
} else {
dev->tx_pkt_burst = fm10k_xmit_pkts;
+   dev->tx_pkt_prep = fm10k_prep_pkts;
PMD_INIT_LOG(DEBUG, "Use regular Tx func");
}
 }
@@ -2834,6 +2838,7 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
	dev->dev_ops = &fm10k_eth_dev_ops;
	dev->rx_pkt_burst = &fm10k_recv_pkts;
	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prep = &fm10k_prep_pkts;

/* only initialize in the primary process */
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 32cc7ff..7ca28c0 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@

 #include 
 #include 
+#include 
 #include "fm10k.h"
 #include "base/fm10k_type.h"

@@ -65,6 +66,15 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif

+#define FM10K_TX_OFFLOAD_MASK (  \
+   PKT_TX_VLAN_PKT |\
+   PKT_TX_IP_CKSUM |\
+   PKT_TX_L4_MASK | \
+   PKT_TX_TCP_SEG)
+
+#define FM10K_TX_OFFLOAD_NOTSUP_MASK \
+   (PKT_TX_OFFLOAD_MASK ^ FM10K_TX_OFFLOAD_MASK)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -597,3 +607,41 @@ fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,

return count;
 }
+
+uint16_t
+fm10k_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts)
+{
+   int i, ret;
+   struct rte_mbuf *m;
+
+   for (i = 0; i < nb_pkts; i++) {
+   m = tx_pkts[i];
+
+   if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+   (m->tso_segsz < FM10K_TSO_MINMSS)) {
+   rte_errno = -EINVAL;
+   return i;
+   }
+
+   if (m->ol_flags & FM10K_TX_OFFLOAD_NOTSUP_MASK) {
+   rte_errno = -EINVAL;
+   return i;
+   }
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+   ret = rte_validate_tx_offload(m);
+   if (ret != 0) {
+   rte_errno = ret;
+   return i;
+   }
+#endif
+   ret = rte_phdr_cksum_fix(m);
+   if (ret != 0) {
+   rte_errno = ret;
+   return i;
+   }
+   }
+
+   return i;
+}
-- 
1.7.9.5



[dpdk-dev] [PATCH v6 2/6] e1000: add Tx preparation

2016-10-14 Thread Tomasz Kulasek
Signed-off-by: Tomasz Kulasek 
---
 drivers/net/e1000/e1000_ethdev.h |   11 
 drivers/net/e1000/em_ethdev.c|5 +++-
 drivers/net/e1000/em_rxtx.c  |   48 ++-
 drivers/net/e1000/igb_ethdev.c   |4 +++
 drivers/net/e1000/igb_rxtx.c |   52 +-
 5 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID   RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START  RTE_INTR_VEC_RXTX_OFFSET

+#define IGB_TX_MAX_SEG UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG  UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
uint32_t flags;
@@ -315,6 +320,9 @@ void eth_igb_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);

@@ -376,6 +384,9 @@ void eth_em_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);

diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 7cf5f0c..17b45cb 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
	eth_dev->dev_ops = &eth_em_ops;
	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prep = (eth_tx_prep_t)&eth_em_prep_pkts;

/* for secondary processes, we don't initialise any further as primary
 * has already done this work. Only check we don't need a different
@@ -1067,6 +1068,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
.nb_max = E1000_MAX_RING_DESC,
.nb_min = E1000_MIN_RING_DESC,
.nb_align = EM_TXD_ALIGN,
+   .nb_seg_max = EM_TX_MAX_SEG,
+   .nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
};

dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 41f51c0..3af2f69 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 

 #include "e1000_logs.h"
@@ -77,6 +78,14 @@

 #define E1000_RXDCTL_GRAN  0x0100 /* RXDCTL Granularity */

+#define E1000_TX_OFFLOAD_MASK ( \
+   PKT_TX_IP_CKSUM |   \
+   PKT_TX_L4_MASK |\
+   PKT_TX_VLAN_PKT)
+
+#define E1000_TX_OFFLOAD_NOTSUP_MASK \
+   (PKT_TX_OFFLOAD_MASK ^ E1000_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -618,6 +627,43 @@ end_of_tx:

 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts)
+{
+   int i, ret;
+   struct rte_mbuf *m;
+
+   for (i = 0; i < nb_pkts; i++) {
+   m = tx_pkts[i];
+
+   if (m->ol_flags & E1000_TX_OFFLOAD_NOTSUP_MASK) {
+   rte_errno = -EINVAL;
+   return i;
+   }
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+   ret = rte_validate_tx_offload(m);
+   if (ret != 0) {
+   rte_errno = ret;
+   return i;
+   }
+#endif
+   ret = rte_phdr_cksum_fix(m);
+   if (ret != 0) {
+   rte_errno = ret;

[dpdk-dev] [PATCH v6 1/6] ethdev: add Tx preparation

2016-10-14 Thread Tomasz Kulasek
Added API for `rte_eth_tx_prep`

uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

uint16_t nb_seg_max;
/**< Max number of segments per whole packet. */

uint16_t nb_mtu_seg_max;
/**< Max number of segments per one MTU */

Created the `rte_pkt.h` header with commonly used functions:

int rte_validate_tx_offload(struct rte_mbuf *m)
to validate general requirements for tx offload in a packet, such as
flag completeness. In the current implementation this function is called
optionally, when RTE_LIBRTE_ETHDEV_DEBUG is enabled.

int rte_phdr_cksum_fix(struct rte_mbuf *m)
to fix the pseudo-header checksum for TSO and non-TSO tcp/udp packets
before hardware tx checksum offload.
 - for non-TSO tcp/udp packets the full pseudo-header checksum is
   calculated and set.
 - for TSO the IP payload length is not included.
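
For a non-TSO IPv4/TCP packet, the pseudo-header fix amounts to the sketch
below (a simplified illustration; the real rte_phdr_cksum_fix() in this
patch also handles IPv6, UDP and the TSO case):

    /* Locate the L3/L4 headers using the offsets set by the application. */
    struct ipv4_hdr *ipv4 = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
                                                    m->l2_len);
    struct tcp_hdr *tcp = (struct tcp_hdr *)((char *)ipv4 + m->l3_len);

    /* HW L4 checksum offload expects the checksum field to be seeded
     * with the pseudo-header checksum before transmission. */
    tcp->cksum = rte_ipv4_phdr_cksum(ipv4, m->ol_flags);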


PERFORMANCE TESTS
-

This feature was tested with modified csum engine from test-pmd.

The packet checksum preparation was moved from application to Tx
preparation step placed before burst.

We may expect some overhead costs caused by:
1) using an additional callback before the burst,
2) rescanning the burst,
3) additional condition checking (packet validation),
4) worse optimization (e.g. packet data access, etc.)

We tested it using the ixgbe Tx preparation implementation with some parts
disabled to have comparable information about the impact of different
parts of the implementation.

IMPACT:

1) For an unimplemented Tx preparation callback the performance impact is
   negligible,
2) For the packet condition check without checksum modifications (nb_segs,
   available offloads, etc.) it is 14626628/14252168 (~2.62% drop),
3) Full support in the ixgbe driver (point 2 + packet checksum
   initialization) is 14060924/13588094 (~3.48% drop)


Signed-off-by: Tomasz Kulasek 
---
 config/common_base|1 +
 lib/librte_ether/rte_ethdev.h |   85 +
 lib/librte_mbuf/rte_mbuf.h|9 +++
 lib/librte_net/Makefile   |3 +-
 lib/librte_net/rte_pkt.h  |  137 +
 5 files changed, 234 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_pkt.h

diff --git a/config/common_base b/config/common_base
index c7fd3db..619284b 100644
--- a/config/common_base
+++ b/config/common_base
@@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_PREP=y

 #
 # Support NIC bypass logic
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 38641e8..a10ed9c 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@ extern "C" {
 #include 
 #include 
 #include 
+#include 
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -699,6 +700,8 @@ struct rte_eth_desc_lim {
uint16_t nb_max;   /**< Max allowed number of descriptors. */
uint16_t nb_min;   /**< Min allowed number of descriptors. */
uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+   uint16_t nb_seg_max; /**< Max number of segments per whole packet. */
+   uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };

 /**
@@ -1188,6 +1191,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */

+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+  struct rte_mbuf **tx_pkts,
+  uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
   struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1622,6 +1630,7 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+   eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare 
function. */
struct rte_eth_dev_data *data;  /**< Pointer to device data */
const struct eth_driver *driver;/**< Driver for this device */
const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2816,6 +2825,82 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, 
nb_pkts);
 }

+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to 

[dpdk-dev] [PATCH v6 0/6] add Tx preparation

2016-10-14 Thread Tomasz Kulasek
As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Depending on the HW offloads requested, different NIC models may impose
different requirements on packets to be transmitted, in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segments
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and currently it is left
   to the application.

2) Different hardware may have different requirements for TX offloads;
   different subsets may be supported, and so on.

3) Some parameters (e.g. the number of segments in the ixgbe driver) may
   hang the device. These parameters may vary between devices.

   For example i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation. While ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g. some
   will require pseudo-header checksum precalculation, sometimes done in
   a different way depending on the packet type, and so on). Currently
   the application needs to take care of this.

5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst lets
   the application prepare the packet burst in a form acceptable to the
   specific device.

6) Some additional checks may be done in debug mode, keeping the tx_burst
   implementation clean.


PROPOSAL:
---------

To help the user deal with all these varieties, we propose to:

1) Introduce the rte_eth_tx_prep() function to do the necessary
   preparations of a packet burst to be safely transmitted on the device
   for the desired HW offloads (set/reset checksum fields according to
   the hardware requirements) and to check HW constraints (number of
   segments per packet, etc).

   Since the limitations and requirements may differ between devices,
   this requires extending the rte_eth_dev structure with a new function
   pointer, "tx_pkt_prep", which can be implemented in the driver to
   prepare and verify packets in a device-specific way before the burst,
   and thus prevent the application from sending malformed packets.

2) Also, new fields will be introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the max
   number of segments in TSO and non-TSO packets acceptable by the device.

   This information helps the application avoid building malformed (or
   malicious) packets; a sketch of the check follows below.
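
   A sketch of how an application could consult these limits before
   chaining segments, assuming the new fields are exposed through
   dev_info.tx_desc_lim (port_id and mbuf are illustrative):

   struct rte_eth_dev_info dev_info;

   rte_eth_dev_info_get(port_id, &dev_info);

   /* a non-TSO packet must respect the per-MTU segment limit */
   if (mbuf->nb_segs > dev_info.tx_desc_lim.nb_mtu_seg_max) {
           /* coalesce the chain or drop it: tx_prep would reject it */
   }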


APPLICATION (CASE OF USE):
--------------------------

1) The application should initialize the burst of packets to send, set
   the required tx offload flags and required fields, like l2_len, l3_len,
   l4_len, and tso_segsz

2) The application passes the burst to rte_eth_tx_prep to check the
   conditions required to send the packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send the valid packets
   and/or to recover the invalid ones if the function fails.

e.g.

for (i = 0; i < nb_pkts; i++) {

/* initialize or process packet */

bufs[i]->tso_segsz = 800;
bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
| PKT_TX_IP_CKSUM;
bufs[i]->l2_len = sizeof(struct ether_hdr);
bufs[i]->l3_len = sizeof(struct ipv4_hdr);
bufs[i]->l4_len = sizeof(struct tcp_hdr);
}

/* Prepare burst of TX packets */
nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

if (nb_prep < nb_pkts) {
printf("tx_prep failed\n");

/* nb_prep indicates here the first invalid packet. rte_eth_tx_prep
 * can be used on the remaining packets to find further invalid ones.
 */

}

/* Send burst of TX packets */
nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

/* Free any unsent packets. */
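
/* For example (a sketch; `i` as declared in the loop above): */
for (i = nb_tx; i < nb_prep; i++)
	rte_pktmbuf_free(bufs[i]);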


v5 changes:
 - rebased csum engine modification
 - added information to the csum engine about performance tests
 - some performance improvements

v4 changes:
 - tx_prep is now set to the default behavior (NULL) for the simple/vector
   path in the fm10k, i40e and ixgbe drivers to increase performance, as
   Tx offloads are intentionally not available on those paths

v3 changes:
 - reworked the csum testpmd engine instead of adding a new one,
 - fixed the checksum initialization procedure to include also outer
   checksum offloads,
 - some minor formatting and optimizations

v2 changes:
 - rte_eth_tx_prep() returns the number of packets when the device doesn't
   support the tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP, allowing tx_prep to be turned off


Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: use Tx preparation in csum engine

 app/test-pmd/csumonly.c  |   36 --
 config/common_base   |1 +
 drivers/net/e1000/e1000_ethdev.h |   11 +++
 drivers/net/e1000/em_ethdev.c|5 +-
 drivers/net/e1000/em_rxtx.c  |   48 

[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-14 Thread Bruce Richardson
On Wed, Oct 12, 2016 at 01:00:16AM +0530, Jerin Jacob wrote:
> Thanks to Intel and NXP folks for the positive and constructive feedback
> I've received so far. Here is the updated RFC(v2).
> 
> I've attempted to address as many comments as possible.
> 
> This series adds rte_eventdev.h to the DPDK tree with
> adequate documentation in doxygen format.
> 
> Updates are also available online:
> 
> Related draft header file (this patch):
> https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h
> 
> PDF version(doxgen output):
> https://rawgit.com/jerinjacobk/libeventdev/master/librte_eventdev_v2.pdf
> 
> Repo:
> https://github.com/jerinjacobk/libeventdev
> 

Thanks for all the work on this.


> +/* Event device configuration bitmap flags */
> +#define RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT (1 << 0)
> +/**< Override the global *dequeue_wait_ns* and use per dequeue wait in ns.
> + *  \see rte_event_dequeue_wait_time(), rte_event_dequeue()
> + */

Can you clarify why this is needed? If an app wants to use the same
dequeue wait times for all dequeues can it not specify that itself via
the wait time parameter, rather than having a global dequeue wait value?

/Bruce


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-14 Thread Francois Ozog
Dear Jerin,

Very nice work!

This new RFC version opens the way to a unified conceptual model of
Software Defined Data Planes supported by diverse implementations such
as OpenDataPlane and DPDK.

I think this is an important signal to the industry.

François-Frédéric


From: dev  on behalf of Jerin Jacob

Sent: Tuesday, October 11, 2016 9:30 PM
To: dev at dpdk.org
Cc: thomas.monjalon at 6wind.com; bruce.richardson at intel.com;
narender.vangati at intel.com; hemant.agrawal at nxp.com;
gage.eads at intel.com; Jerin Jacob
Subject: [dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven
programming model framework for DPDK

Thanks to Intel and NXP folks for the positive and constructive feedback
I've received so far. Here is the updated RFC(v2).

I've attempted to address as many comments as possible.

This series adds rte_eventdev.h to the DPDK tree with
adequate documentation in doxygen format.

Updates are also available online:

Related draft header file (this patch):
https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h

PDF version(doxgen output):
https://rawgit.com/jerinjacobk/libeventdev/master/librte_eventdev_v2.pdf

Repo:
https://github.com/jerinjacobk/libeventdev

v1..v2

- Added Cavium, Intel, NXP copyrights in header file

- Changed the concept of flow queues to flow ids.
This is to avoid dictating a specific structure to hold the flows.
A s/w implementation can do atomic load balancing on multiple
flow ids more efficiently than maintaining each event in a specific flow queue.

- Changed the scheduling group to an event queue.
A scheduling group is more a stream of events, so an event queue is a better
 abstraction.

- Introduced the event port concept. Instead of tying eventdev access to the
lcore, a higher level of abstraction called an event port is needed, which is
the application's i/f to the eventdev to dequeue and enqueue events.
One or more event queues can be linked to a single event port.
There can be more than one event port per lcore allowing multiple lightweight
threads to have their own i/f into eventdev, if the implementation supports it.
An event port will be bound to a lcore or a lightweight thread to keep
portable application workflow.
An event port abstraction also encapsulates the dequeue depth and enqueue
depth for scheduler implementations which can schedule multiple events at a
time and output events that can be buffered.

- Added configuration options with event queue(nb_atomic_flows,
nb_atomic_order_sequences, single consumer etc)
and event port(dequeue_queue_depth, enqueue_queue_depth etc) to define the
limits on the resource usage.(Useful for optimized software implementation)

- Introduced RTE_EVENT_DEV_CAP_QUEUE_QOS and RTE_EVENT_DEV_CAP_EVENT_QOS
schemes of priority handling

- Added event port to event queue servicing priority.
This allows two event ports to connect to the same event queue with
different priorities.

- Changed the workflow as schedule/dequeue/enqueue.
An implementation is free to define schedule as NOOP.
A distributed s/w scheduler can use this to schedule events;
also a centralized s/w scheduler can make this a NOOP on non-scheduler cores.

- Removed Cavium HW specific schedule_from_group API

- Removed Cavium HW specific ctxt_update/ctxt_wait APIs.
 Introduced a more generic "event pinning" concept, i.e.
if the normal workflow is a dequeue -> do work based on event type -> enqueue,
a pin_event argument to enqueue
(where the pinned event is returned through the normal dequeue)
allows the application workflow to remain the same whether or not an
implementation supports it.

- Added dequeue() burst variant

- Added the definition of a closed/open system - where an open system is
memory backed and a closed-system eventdev has limited capacity.
In such systems, it is also useful to denote per event port how many packets
can be active in the system.
This can serve as a threshold for ethdev like devices so they don't overwhelm
core to core events.

- Added the option to specify the maximum amount of time (in ns) the
application needs to wait on dequeue()

- Removed the scheme of expressing the number of flows in log2 format

Open items, or items needing improvement.

- Abstract the differences in event QoS management with different
priority schemes
available in different HW or SW implementations with portable
application workflow.

Based on the feedback, there are three different kinds of QoS support
available in three different HW or SW implementations.
1) Priority associated with the event queue
2) Priority associated with each event enqueue
(Same flow can have two different priority on two separate enqueue)
3) Priority associated with the flow(each flow has unique priority)

In v2, the differences are abstracted based on device capability
(RTE_EVENT_DEV_CAP_QUEUE_QOS for the first scheme,
RTE_EVENT_DEV_CAP_EVENT_QOS for the second and third schemes).
This scheme would call for different 

[dpdk-dev] [PATCHv3] examples/l3fwd: em: use hw accelerated crc hash function for arm64

2016-10-14 Thread Hemant Agrawal
If machine-level CRC extensions are available, offload the
hash to the machine-provided functions; e.g. the ARMv8-A CRC
extensions support this.
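
For context, DEFAULT_HASH_FUNC is what l3fwd plugs into the hash table
parameters; a simplified sketch (the entry count and table name are
illustrative, not taken from the example's actual setup code):

struct rte_hash_parameters ipv4_l3fwd_em_params = {
	.name = "ipv4_l3fwd_em",
	.entries = 1024 * 1024,            /* illustrative size */
	.key_len = sizeof(union ipv4_5tuple_host),
	.hash_func = DEFAULT_HASH_FUNC,    /* rte_hash_crc or rte_jhash */
	.hash_func_init_val = 0,
};

struct rte_hash *h = rte_hash_create(&ipv4_l3fwd_em_params);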

Signed-off-by: Hemant Agrawal 
Reviewed-by: Jerin Jacob 
---
 examples/l3fwd/l3fwd_em.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index 89a68e6..9cc4460 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -57,13 +57,17 @@

 #include "l3fwd.h"

-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) || defined(RTE_MACHINE_CPUFLAG_CRC32)
+#define EM_HASH_CRC 1
+#endif
+
+#ifdef EM_HASH_CRC
#include <rte_hash_crc.h>
 #define DEFAULT_HASH_FUNC   rte_hash_crc
 #else
#include <rte_jhash.h>
 #define DEFAULT_HASH_FUNC   rte_jhash
-#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+#endif

 #define IPV6_ADDR_LEN 16

@@ -168,17 +172,17 @@ ipv4_hash_crc(const void *data, __rte_unused uint32_t 
data_len,
t = k->proto;
p = (const uint32_t *)&k->port_src;

-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+#ifdef EM_HASH_CRC
init_val = rte_hash_crc_4byte(t, init_val);
init_val = rte_hash_crc_4byte(k->ip_src, init_val);
init_val = rte_hash_crc_4byte(k->ip_dst, init_val);
init_val = rte_hash_crc_4byte(*p, init_val);
-#else /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+#else
init_val = rte_jhash_1word(t, init_val);
init_val = rte_jhash_1word(k->ip_src, init_val);
init_val = rte_jhash_1word(k->ip_dst, init_val);
init_val = rte_jhash_1word(*p, init_val);
-#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+#endif

return init_val;
 }
@@ -190,16 +194,16 @@ ipv6_hash_crc(const void *data, __rte_unused uint32_t 
data_len,
const union ipv6_5tuple_host *k;
uint32_t t;
const uint32_t *p;
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+#ifdef EM_HASH_CRC
const uint32_t  *ip_src0, *ip_src1, *ip_src2, *ip_src3;
const uint32_t  *ip_dst0, *ip_dst1, *ip_dst2, *ip_dst3;
-#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+#endif

k = data;
t = k->proto;
p = (const uint32_t *)&k->port_src;

-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+#ifdef EM_HASH_CRC
ip_src0 = (const uint32_t *) k->ip_src;
ip_src1 = (const uint32_t *)(k->ip_src+4);
ip_src2 = (const uint32_t *)(k->ip_src+8);
@@ -218,14 +222,14 @@ ipv6_hash_crc(const void *data, __rte_unused uint32_t 
data_len,
init_val = rte_hash_crc_4byte(*ip_dst2, init_val);
init_val = rte_hash_crc_4byte(*ip_dst3, init_val);
init_val = rte_hash_crc_4byte(*p, init_val);
-#else /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+#else
init_val = rte_jhash_1word(t, init_val);
init_val = rte_jhash(k->ip_src,
sizeof(uint8_t) * IPV6_ADDR_LEN, init_val);
init_val = rte_jhash(k->ip_dst,
sizeof(uint8_t) * IPV6_ADDR_LEN, init_val);
init_val = rte_jhash_1word(*p, init_val);
-#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+#endif
return init_val;
 }

-- 
1.9.1



[dpdk-dev] [PATCH v2 4/4] eal/linux: generalize PCI kernel driver extraction to EAL

2016-10-14 Thread Shreyansh Jain
From: Jan Viktorin 

Generalize the PCI-specific pci_get_kernel_driver_by_path. The function
is general enough; we have just moved it to eal.c, changed the prefix to
rte_eal, and provided it privately to other parts of EAL.
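
A sketch of how a non-PCI caller could use the generalized helper (the
platform-bus path and dev_name are hypothetical):

char filename[PATH_MAX];
char drv_name[PATH_MAX];
int ret;

snprintf(filename, sizeof(filename),
	 "/sys/bus/platform/devices/%s/driver", dev_name);
ret = rte_eal_get_kernel_driver_by_path(filename, drv_name);
if (ret == 0)
	RTE_LOG(DEBUG, EAL, "%s is bound to %s\n", dev_name, drv_name);
else if (ret == 1)
	RTE_LOG(DEBUG, EAL, "%s has no kernel driver\n", dev_name);
else
	RTE_LOG(ERR, EAL, "cannot resolve the driver of %s\n", dev_name);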

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
--
Changes since v1:
 - update BSD support for unbind kernel driver

---
 lib/librte_eal/bsdapp/eal/eal.c   |  7 +++
 lib/librte_eal/common/eal_private.h   | 14 ++
 lib/librte_eal/linuxapp/eal/eal.c | 29 +
 lib/librte_eal/linuxapp/eal/eal_pci.c | 31 +--
 4 files changed, 51 insertions(+), 30 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 5271fc2..9b93da3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -640,3 +640,10 @@ rte_eal_unbind_kernel_driver(const char *devpath 
__rte_unused,
 {
return -ENOTSUP;
 }
+
+int
+rte_eal_get_kernel_driver_by_path(const char *filename __rte_unused,
+ char *dri_name __rte_unused)
+{
+   return -ENOTSUP;
+}
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index b0c208a..c8c2131 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -269,6 +269,20 @@ int rte_eal_check_module(const char *module_name);
 int rte_eal_unbind_kernel_driver(const char *devpath, const char *devid);

 /**
+ * Extract the kernel driver name from the absolute path to the driver.
+ *
+ * @param filename  path to the driver ("/driver")
+ * @path  dri_name  target buffer where to place the driver name
+ *  (should be at least PATH_MAX long)
+ *
+ * @return
+ *  -1   on failure
+ *   0   when successful
+ *   1   when there is no such driver
+ */
+int rte_eal_get_kernel_driver_by_path(const char *filename, char *dri_name);
+
+/**
  * Get cpu core_id.
  *
  * This function is private to the EAL.
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 5f6676d..00af21c 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -969,3 +969,32 @@ error:
fclose(f);
return -1;
 }
+
+int
+rte_eal_get_kernel_driver_by_path(const char *filename, char *dri_name)
+{
+   int count;
+   char path[PATH_MAX];
+   char *name;
+
+   if (!filename || !dri_name)
+   return -1;
+
+   count = readlink(filename, path, PATH_MAX);
+   if (count >= PATH_MAX)
+   return -1;
+
+   /* For device does not have a driver */
+   if (count < 0)
+   return 1;
+
+   path[count] = '\0';
+
+   name = strrchr(path, '/');
+   if (name) {
+   strncpy(dri_name, name + 1, strlen(name + 1) + 1);
+   return 0;
+   }
+
+   return -1;
+}
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index a03553f..e1cf9e8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -78,35 +78,6 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev)
return rte_eal_unbind_kernel_driver(devpath, devid);
 }

-static int
-pci_get_kernel_driver_by_path(const char *filename, char *dri_name)
-{
-   int count;
-   char path[PATH_MAX];
-   char *name;
-
-   if (!filename || !dri_name)
-   return -1;
-
-   count = readlink(filename, path, PATH_MAX);
-   if (count >= PATH_MAX)
-   return -1;
-
-   /* For device does not have a driver */
-   if (count < 0)
-   return 1;
-
-   path[count] = '\0';
-
-   name = strrchr(path, '/');
-   if (name) {
-   strncpy(dri_name, name + 1, strlen(name + 1) + 1);
-   return 0;
-   }
-
-   return -1;
-}
-
 /* Map pci device */
 int
 rte_eal_pci_map_device(struct rte_pci_device *dev)
@@ -354,7 +325,7 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t 
bus,

/* parse driver */
snprintf(filename, sizeof(filename), "%s/driver", dirname);
-   ret = pci_get_kernel_driver_by_path(filename, driver);
+   ret = rte_eal_get_kernel_driver_by_path(filename, driver);
if (ret < 0) {
RTE_LOG(ERR, EAL, "Fail to get kernel driver\n");
free(dev);
-- 
2.7.4



[dpdk-dev] [PATCH v2 3/4] eal/linux: generalize PCI kernel unbinding driver to EAL

2016-10-14 Thread Shreyansh Jain
From: Jan Viktorin 

Generalize the PCI-specific pci_unbind_kernel_driver. It is now divided
into two parts. First, determination of the path and string identification
of the device to be unbound. Second, the actual unbind operation which is
generic.

BSD implementation updated as ENOTSUP
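
A sketch of the resulting split from a hypothetical non-PCI caller's point
of view (the platform-bus path and dev_name are illustrative):

char devpath[PATH_MAX];
char devid[BUFSIZ];

/* part one: subsystem-specific path and device identification */
snprintf(devpath, sizeof(devpath),
	 "/sys/bus/platform/devices/%s", dev_name);
snprintf(devid, sizeof(devid), "%s\n", dev_name);

/* part two: the generic write to "<devpath>/driver/unbind", now in EAL */
if (rte_eal_unbind_kernel_driver(devpath, devid) < 0)
	RTE_LOG(ERR, EAL, "cannot unbind %s\n", dev_name);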

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
--
Changes since v1:
 - update BSD support for unbind kernel driver

---
 lib/librte_eal/bsdapp/eal/eal.c   |  7 +++
 lib/librte_eal/bsdapp/eal/eal_pci.c   |  4 ++--
 lib/librte_eal/common/eal_private.h   | 13 +
 lib/librte_eal/linuxapp/eal/eal.c | 26 ++
 lib/librte_eal/linuxapp/eal/eal_pci.c | 33 +
 5 files changed, 57 insertions(+), 26 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 35e3117..5271fc2 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -633,3 +633,10 @@ rte_eal_process_type(void)
 {
return rte_config.process_type;
 }
+
+int
+rte_eal_unbind_kernel_driver(const char *devpath __rte_unused,
+const char *devid __rte_unused)
+{
+   return -ENOTSUP;
+}
diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 7ed0115..703f034 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -89,11 +89,11 @@

 /* unbind kernel driver for this device */
 int
-pci_unbind_kernel_driver(struct rte_pci_device *dev __rte_unused)
+pci_unbind_kernel_driver(struct rte_pci_device *dev)
 {
RTE_LOG(ERR, EAL, "RTE_PCI_DRV_FORCE_UNBIND flag is not implemented "
"for BSD\n");
-   return -ENOTSUP;
+   return rte_eal_unbind_kernel_driver(dev);
 }

 /* Map pci device */
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index 9e7d8f6..b0c208a 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -256,6 +256,19 @@ int rte_eal_alarm_init(void);
 int rte_eal_check_module(const char *module_name);

 /**
+ * Unbind kernel driver bound to the device specified by the given devpath,
+ * and its string identification.
+ *
+ * @param devpath  path to the device directory ("/sys/.../devices/")
+ * @param devid  identification of the device ()
+ *
+ * @return
+ *  -1  unbind has failed
+ *   0  module has been unbound
+ */
+int rte_eal_unbind_kernel_driver(const char *devpath, const char *devid);
+
+/**
  * Get cpu core_id.
  *
  * This function is private to the EAL.
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 2075282..5f6676d 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -943,3 +943,29 @@ rte_eal_check_module(const char *module_name)
/* Module has been found */
return 1;
 }
+
+int
+rte_eal_unbind_kernel_driver(const char *devpath, const char *devid)
+{
+   char filename[PATH_MAX];
+   FILE *f;
+
+   snprintf(filename, sizeof(filename),
+"%s/driver/unbind", devpath);
+
+   f = fopen(filename, "w");
+   if (f == NULL) /* device was not bound */
+   return 0;
+
+   if (fwrite(devid, strlen(devid), 1, f) == 0) {
+   RTE_LOG(ERR, EAL, "%s(): could not write to %s\n", __func__,
+   filename);
+   goto error;
+   }
+
+   fclose(f);
+   return 0;
+error:
+   fclose(f);
+   return -1;
+}
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 876ba38..a03553f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -59,38 +59,23 @@ int
 pci_unbind_kernel_driver(struct rte_pci_device *dev)
 {
int n;
-   FILE *f;
-   char filename[PATH_MAX];
-   char buf[BUFSIZ];
+   char devpath[PATH_MAX];
+   char devid[BUFSIZ];
struct rte_pci_addr *loc = &dev->addr;

-   /* open /sys/bus/pci/devices/:BB:CC.D/driver */
-   snprintf(filename, sizeof(filename),
-   "%s/" PCI_PRI_FMT "/driver/unbind", pci_get_sysfs_path(),
+   /* devpath /sys/bus/pci/devices/:BB:CC.D */
+   snprintf(devpath, sizeof(devpath),
+   "%s/" PCI_PRI_FMT, pci_get_sysfs_path(),
loc->domain, loc->bus, loc->devid, loc->function);

-   f = fopen(filename, "w");
-   if (f == NULL) /* device was not bound */
-   return 0;
-
-   n = snprintf(buf, sizeof(buf), PCI_PRI_FMT "\n",
+   n = snprintf(devid, sizeof(devid), PCI_PRI_FMT "\n",
 loc->domain, loc->bus, loc->devid, loc->function);
-   if ((n < 0) || (n >= (int)sizeof(buf))) {
+   if ((n < 0) || (n >= (int)sizeof(devid))) {
RTE_LOG(ERR, EAL, "%s(): snprintf failed\n", __func__);
-   goto error;
- 

[dpdk-dev] [PATCH v2 2/4] eal: generalize PCI map/unmap resource to EAL

2016-10-14 Thread Shreyansh Jain
From: Jan Viktorin 

The functions pci_map_resource and pci_unmap_resource are generic, so the
pci_* prefix can be omitted. The functions are moved to eal_common_dev.c
so they can be reused by other infrastructure.
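
A sketch of such reuse outside PCI - mapping a device resource from an
arbitrary file descriptor (the device file and length are illustrative,
and the fragment assumes the usual mman/fcntl/unistd headers):

int fd = open("/dev/uio0", O_RDWR);
size_t len = 4096;              /* illustrative resource length */
void *addr;

if (fd < 0)
	return -1;

addr = rte_eal_map_resource(NULL, fd, 0, len, 0);
close(fd);                      /* the mapping stays valid after close */
if (addr == MAP_FAILED)
	return -1;

/* ... use the mapped resource ... */

rte_eal_unmap_resource(addr, len);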

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c |  2 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  2 ++
 lib/librte_eal/common/eal_common_dev.c  | 39 +
 lib/librte_eal/common/eal_common_pci.c  | 39 -
 lib/librte_eal/common/eal_common_pci_uio.c  | 16 +-
 lib/librte_eal/common/include/rte_dev.h | 32 
 lib/librte_eal/common/include/rte_pci.h | 32 
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c   |  2 +-
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c  |  5 ++--
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  2 ++
 10 files changed, 89 insertions(+), 82 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 8b3ed88..7ed0115 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -228,7 +228,7 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, 
int res_idx,

/* if matching map is found, then use it */
offset = res_idx * pagesz;
-   mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+   mapaddr = rte_eal_map_resource(NULL, fd, (off_t)offset,
(size_t)dev->mem_resource[res_idx].len, 0);
close(fd);
if (mapaddr == MAP_FAILED)
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 2f81f7c..11d9f59 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -170,6 +170,8 @@ DPDK_16.11 {
rte_delay_us_callback_register;
rte_eal_dev_attach;
rte_eal_dev_detach;
+   rte_eal_map_resource;
+   rte_eal_unmap_resource;
rte_eal_vdrv_register;
rte_eal_vdrv_unregister;

diff --git a/lib/librte_eal/common/eal_common_dev.c 
b/lib/librte_eal/common/eal_common_dev.c
index 4f3b493..457d227 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -151,3 +152,41 @@ err:
RTE_LOG(ERR, EAL, "Driver cannot detach the device (%s)\n", name);
return -EINVAL;
 }
+
+/* map a particular resource from a file */
+void *
+rte_eal_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
+int additional_flags)
+{
+   void *mapaddr;
+
+   /* Map the Memory resource of device */
+   mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
+   MAP_SHARED | additional_flags, fd, offset);
+   if (mapaddr == MAP_FAILED) {
+   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s"
+   " (%p)\n", __func__, fd, requested_addr,
+   (unsigned long)size, (unsigned long)offset,
+   strerror(errno), mapaddr);
+   } else
+   RTE_LOG(DEBUG, EAL, "  Device memory mapped at %p\n", mapaddr);
+
+   return mapaddr;
+}
+
+/* unmap a particular resource */
+void
+rte_eal_unmap_resource(void *requested_addr, size_t size)
+{
+   if (requested_addr == NULL)
+   return;
+
+   /* Unmap the Memory resource of device */
+   if (munmap(requested_addr, size)) {
+   RTE_LOG(ERR, EAL, "%s(): cannot munmap(%p, 0x%lx): %s\n",
+   __func__, requested_addr, (unsigned long)size,
+   strerror(errno));
+   } else
+   RTE_LOG(DEBUG, EAL, "  Device memory unmapped at %p\n",
+   requested_addr);
+}
diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 638cd86..464acc1 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -67,7 +67,6 @@
 #include 
 #include 
 #include 
-#include 

 #include 
 #include 
@@ -114,44 +113,6 @@ static struct rte_devargs *pci_devargs_lookup(struct 
rte_pci_device *dev)
return NULL;
 }

-/* map a particular resource from a file */
-void *
-pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
-int additional_flags)
-{
-   void *mapaddr;
-
-   /* Map the PCI memory resource of device */
-   mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
-   MAP_SHARED | additional_flags, fd, offset);
-   if (mapaddr == MAP_FAILED) {
-   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s 
(%p)\n",
-   __func__, fd, requested_addr,
-   (unsigned 

[dpdk-dev] [PATCH v2 1/4] eal: generalize PCI kernel driver enum to EAL

2016-10-14 Thread Shreyansh Jain
From: Jan Viktorin 

Signed-off-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 

--
Changes since v0:
 - fix compilation error due to missing include
---
 lib/librte_eal/common/include/rte_dev.h | 12 
 lib/librte_eal/common/include/rte_pci.h |  9 -
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h
index b3873bd..e73b0fa 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -109,6 +109,18 @@ struct rte_mem_resource {
void *addr; /**< Virtual address, NULL when not mapped. */
 };

+/**
+ * Kernel driver passthrough type
+ */
+enum rte_kernel_driver {
+   RTE_KDRV_UNKNOWN = 0,
+   RTE_KDRV_IGB_UIO,
+   RTE_KDRV_VFIO,
+   RTE_KDRV_UIO_GENERIC,
+   RTE_KDRV_NIC_UIO,
+   RTE_KDRV_NONE,
+};
+
 /** Double linked list of device drivers. */
 TAILQ_HEAD(rte_driver_list, rte_driver);
 /** Double linked list of devices. */
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 9ce8847..2c7046f 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -135,15 +135,6 @@ struct rte_pci_addr {

 struct rte_devargs;

-enum rte_kernel_driver {
-   RTE_KDRV_UNKNOWN = 0,
-   RTE_KDRV_IGB_UIO,
-   RTE_KDRV_VFIO,
-   RTE_KDRV_UIO_GENERIC,
-   RTE_KDRV_NIC_UIO,
-   RTE_KDRV_NONE,
-};
-
 /**
  * A structure describing a PCI device.
  */
-- 
2.7.4



[dpdk-dev] [PATCH v2 0/4] Generalize PCI specific EAL function/structures

2016-10-14 Thread Shreyansh Jain
(Rebased these over HEAD fed622dfd)

These patches were initially part of Jan's original series on SoC
Framework ([1],[2]). An update to that series, without these patches,
was posted here [3].

The main motivation for these is the aim of introducing a non-PCI-centric
subsystem in EAL. As of now the first use case is SoC, but it is not
limited to that.

The 4 patches in this series are independent of each other, as well as of
the SoC framework. All of them focus on generalizing structures or
functions present in the PCI-specific code into the EAL common area (or
splitting a function to be more useful).

 - 0001: move the rte_kernel_driver enum from rte_pci to rte_dev. As of
   now this enum is embedded in rte_pci_device but, going ahead, it
   can be part of other rte_xxx_device structures. Either way, it has no
   impact on PCI.
 - 0002: Functions pci_map_resource/pci_unmap_resource are moved to EAL
   common as rte_eal_map_resource/rte_eal_unmap_resource, respectively.
 - 0003: Split the  pci_unbind_kernel_driver into two, still working on
   the PCI BDF sysfs layout, first handles the file path (and validations)
   and second does the actual unbind. The second part might be useful in
   case of non-PCI layouts.
   -- This is useful for other subsystems, parallel to PCI, which
      require MMAP support.
 `- an equivalent NOTSUP function for BSD has been added in v1
 - 0004: Move pci_get_kernel_driver_by_path to
   rte_eal_get_kernel_driver_by_path in EAL common. This function is
   generic for any sysfs-compliant driver and can be re-used by other
   non-PCI subsystems.
`- an equivalent NOTSUP function for BSD has been added in v1

Changes since v1
 - Rebased over master (fed622dfd)
 - Added dummy functions for BSD for unbind and kernel driver fetch
   functions (patches 003, 004)

Changes since v0 [4]:
 - Fix for checkpatch and check-git-log
 - Fix missing include in patch 0001
 - Drop patch 2 for splitting sysfs into a sub-function taking file
   handle. This patch doesn't really fit into the model of PCI->EAL
   movement of generic functions which other patches relate to.
   Also, taking cue from review comment [5], it might not have a
   viable use-case as of now.

[1] http://dpdk.org/ml/archives/dev/2016-January/030915.html
[2] http://www.dpdk.org/ml/archives/dev/2016-May/038486.html
[3] http://dpdk.org/ml/archives/dev/2016-August/045993.html
[4] http://dpdk.org/ml/archives/dev/2016-September/046035.html
[5] http://dpdk.org/ml/archives/dev/2016-September/046041.html

Jan Viktorin (4):
  eal: generalize PCI kernel driver enum to EAL
  eal: generalize PCI map/unmap resource to EAL
  eal/linux: generalize PCI kernel unbinding driver to EAL
  eal/linux: generalize PCI kernel driver extraction to EAL

 lib/librte_eal/bsdapp/eal/eal.c | 14 ++
 lib/librte_eal/bsdapp/eal/eal_pci.c |  6 +--
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  2 +
 lib/librte_eal/common/eal_common_dev.c  | 39 
 lib/librte_eal/common/eal_common_pci.c  | 39 
 lib/librte_eal/common/eal_common_pci_uio.c  | 16 ---
 lib/librte_eal/common/eal_private.h | 27 +++
 lib/librte_eal/common/include/rte_dev.h | 44 ++
 lib/librte_eal/common/include/rte_pci.h | 41 
 lib/librte_eal/linuxapp/eal/eal.c   | 55 ++
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 62 -
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c   |  2 +-
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c  |  5 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  2 +
 14 files changed, 208 insertions(+), 146 deletions(-)

-- 
2.7.4



[dpdk-dev] [PATCH 0/6] vhost: add Tx zero copy support

2016-10-14 Thread linhaifeng
On 2016/10/10 16:03, Yuanhan Liu wrote:
> On Sun, Oct 09, 2016 at 06:46:44PM +0800, linhaifeng wrote:
>> On 2016/8/23 16:10, Yuanhan Liu wrote:
>>> The basic idea of Tx zero copy is, instead of copying data from the
>>> desc buf, here we let the mbuf reference the desc buf addr directly.
>>
>> Is there a problem when pushing a vlan tag to an mbuf which references
>> the desc buf addr directly?
> 
> Yes, you can't do that when zero copy is enabled, due to following code
> piece:
> 
> +   if (unlikely(dev->dequeue_zero_copy && (hpa = gpa_to_hpa(dev,
> +   desc->addr + desc_offset, cpy_len)))) {
> +   cur->data_len = cpy_len;
> ==> +   cur->data_off = 0;
> +   cur->buf_addr = (void *)(uintptr_t)desc_addr;
> +   cur->buf_physaddr = hpa;
> 
> The marked line basically makes the mbuf has no headroom to use.
> 
>   --yliu
> 
>> We know that if the guest uses virtio_net (kernel), the skb may have no headroom.
> 
> .
>

It is OK to set data_off to zero.
But we can also use a 128-byte headroom when the guest uses the virtio_net
PMD, just not for the virtio_net kernel driver.

I think it's better to add a headroom size to the desc, and have the kernel
driver support setting the headroom size.
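
To make the constraint concrete, a minimal sketch (an illustration, not
code from the patch set): pushing a VLAN tag needs mbuf headroom, which
the zero-copy path removes by forcing data_off to 0:

static int
try_push_vlan(struct rte_mbuf *m)
{
	/* rte_pktmbuf_prepend() returns NULL when the headroom is smaller
	 * than sizeof(struct vlan_hdr) (4 bytes) - always the case once
	 * data_off has been set to 0 */
	char *p = rte_pktmbuf_prepend(m, sizeof(struct vlan_hdr));

	return (p == NULL) ? -ENOSPC : 0;
}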




[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-14 Thread Eads, Gage
Thanks Jerin, this looks good. I've put a few notes/questions inline.

Thanks,
Gage

>  -Original Message-
>  From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
>  Sent: Tuesday, October 11, 2016 2:30 PM
>  To: dev at dpdk.org
>  Cc: thomas.monjalon at 6wind.com; Richardson, Bruce
>  ; Vangati, Narender
>  ; hemant.agrawal at nxp.com; Eads, Gage
>  ; Jerin Jacob 
>  Subject: [dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming
>  model framework for DPDK
>  
>  Thanks to Intel and NXP folks for the positive and constructive feedback
>  I've received so far. Here is the updated RFC(v2).
>  
>  I've attempted to address as many comments as possible.
>  
>  This series adds rte_eventdev.h to the DPDK tree with
>  adequate documentation in doxygen format.
>  
>  Updates are also available online:
>  
>  Related draft header file (this patch):
>  https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h
>  
>  PDF version(doxgen output):
>  https://rawgit.com/jerinjacobk/libeventdev/master/librte_eventdev_v2.pdf
>  
>  Repo:
>  https://github.com/jerinjacobk/libeventdev
>  
>  v1..v2
>  
>  - Added Cavium, Intel, NXP copyrights in header file
>  
>  - Changed the concept of flow queues to flow ids.
>  This is avoid dictating a specific structure to hold the flows.
>  A s/w implementation can do atomic load balancing on multiple
>  flow ids more efficiently than maintaining each event in a specific flow 
> queue.
>  
>  - Change the scheduling group to event queue.
>  A scheduling group is more a stream of events, so an event queue is a better
>   abstraction.
>  
>  - Introduced event port concept, Instead of trying eventdev access to the 
> lcore,
>  a higher level of abstraction called event port is needed which is the
>  application i/f to the eventdev to dequeue and enqueue the events.
>  One or more event queues can be linked to single event port.
>  There can be more than one event port per lcore allowing multiple lightweight
>  threads to have their own i/f into eventdev, if the implementation supports 
> it.
>  An event port will be bound to a lcore or a lightweight thread to keep
>  portable application workflow.
>  An event port abstraction also encapsulates dequeue depth and enqueue depth
>  for
>  a scheduler implementations which can schedule multiple events at a time and
>  output events that can be buffered.
>  
>  - Added configuration options with event queue(nb_atomic_flows,
>  nb_atomic_order_sequences, single consumer etc)
>  and event port(dequeue_queue_depth, enqueue_queue_depth etc) to define
>  the
>  limits on the resource usage.(Useful for optimized software implementation)
>  
>  - Introduced RTE_EVENT_DEV_CAP_QUEUE_QOS and
>  RTE_EVENT_DEV_CAP_EVENT_QOS
>  schemes of priority handling
>  
>  - Added event port to event queue servicing priority.
>  This allows two event ports to connect to the same event queue with
>  different priorities.
>  
>  - Changed the workflow as schedule/dequeue/enqueue.
>  An implementation is free to define schedule as NOOP.
>  A distributed s/w scheduler can use this to schedule events;
>  also a centralized s/w scheduler can make this a NOOP on non-scheduler cores.
>  
>  - Removed Cavium HW specific schedule_from_group API
>  
>  - Removed Cavium HW specific ctxt_update/ctxt_wait APIs.
>   Introduced a more generic "event pinning" concept. i.e
>  If the normal workflow is a dequeue -> do work based on event type ->
>  enqueue,
>  a pin_event argument to enqueue
>  where the pinned event is returned through the normal dequeue)
>  allows application workflow to remain the same whether or not an
>  implementation supports it.
>  
>  - Added dequeue() burst variant
>  
>  - Added the definition of a closed/open system - where open system is memory
>  backed and closed system eventdev has limited capacity.
>  In such systems, it is also useful to denote per event port how many packets
>  can be active in the system.
>  This can serve as a threshold for ethdev like devices so they don't overwhelm
>  core to core events.
>  
>  - Added the option to specify maximum amount of time(in ns) application needs
>  wait on dequeue()
>  
>  - Removed the scheme of expressing the number of flows in log2 format
>  
>  Open item or the item needs improvement.
>  
>  - Abstract the differences in event QoS management with different priority
>  schemes
>  available in different HW or SW implementations with portable application
>  workflow.
>  
>  Based on the feedback, there three different kinds of QoS support available 
> in
>  three different HW or SW implementations.
>  1) Priority associated with the event queue
>  2) Priority associated with each event enqueue
>  (Same flow can have two different priority on two separate enqueue)
>  3) Priority associated with the flow(each flow has unique priority)
>  
>  In v2, The differences abstracted based on device capability
>  

[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-14 Thread Jerin Jacob
On Thu, Oct 13, 2016 at 11:14:38PM -0500, Bill Fischofer wrote:
> Hi Jerin,

Hi Bill,

Thanks for the review.

[snip]
> > + * If the device init operation is successful, the correspondence between
> > + * the device identifier assigned to the new device and its associated
> > + * *rte_event_dev* structure is effectively registered.
> > + * Otherwise, both the *rte_event_dev* structure and the device
> > identifier are
> > + * freed.
> > + *
> > + * The functions exported by the application Event API to setup a device
> > + * designated by its device identifier must be invoked in the following
> > order:
> > + * - rte_event_dev_configure()
> > + * - rte_event_queue_setup()
> > + * - rte_event_port_setup()
> > + * - rte_event_port_link()
> > + * - rte_event_dev_start()
> > + *
> > + * Then, the application can invoke, in any order, the functions
> > + * exported by the Event API to schedule events, dequeue events, enqueue
> > events,
> > + * change event queue(s) to event port [un]link establishment and so on.
> > + *
> > + * Application may use rte_event_[queue/port]_default_conf_get() to get
> > the
> > + * default configuration to set up an event queue or event port by
> > + * overriding few default values.
> > + *
> > + * If the application wants to change the configuration (i.e. call
> > + * rte_event_dev_configure(), rte_event_queue_setup(), or
> > + * rte_event_port_setup()), it must call rte_event_dev_stop() first to
> > stop the
> > + * device and then do the reconfiguration before calling
> > rte_event_dev_start()
> > + * again. The schedule, enqueue and dequeue functions should not be
> > invoked
> > + * when the device is stopped.
> >
> 
> Given this requirement, the question is what happens to events that are "in
> flight" at the time rte_event_dev_stop() is called? Is stop an asynchronous
> operation that quiesces the event _dev and allows in-flight events to drain
> from queues/ports prior to fully stopping, or is some sort of separate
> explicit quiesce mechanism required? If stop is synchronous and simply
> halts the event_dev, then how is an application to know if subsequent
> configure/setup calls would leave these pending events with no place to
> stand?
>

From an application API perspective rte_event_dev_stop() is a synchronous
function.
If the stop has been called for re-configuring the number of queues, ports
etc. of the device, then "in flight" entry preservation will be
implementation defined; otherwise "in flight" entries will be preserved.

[snip]

> > +extern int
> > +rte_event_dev_socket_id(uint8_t dev_id);
> > +
> > +/* Event device capability bitmap flags */
> > +#define RTE_EVENT_DEV_CAP_QUEUE_QOS(1 << 0)
> > +/**< Event scheduling prioritization is based on the priority associated
> > with
> > + *  each event queue.
> > + *
> > + *  \see rte_event_queue_setup(), RTE_EVENT_QUEUE_PRIORITY_NORMAL
> > + */
> > +#define RTE_EVENT_DEV_CAP_EVENT_QOS(1 << 1)
> > +/**< Event scheduling prioritization is based on the priority associated
> > with
> > + *  each event. Priority of each event is supplied in *rte_event*
> > structure
> > + *  on each enqueue operation.
> > + *
> > + *  \see rte_event_enqueue()
> > + */
> > +
> > +/**
> > + * Event device information
> > + */
> > +struct rte_event_dev_info {
> > +   const char *driver_name;/**< Event driver name */
> > +   struct rte_pci_device *pci_dev; /**< PCI information */
> > +   uint32_t min_dequeue_wait_ns;
> > +   /**< Minimum supported global dequeue wait delay(ns) by this
> > device */
> > +   uint32_t max_dequeue_wait_ns;
> > +   /**< Maximum supported global dequeue wait delay(ns) by this
> > device */
> > +   uint32_t dequeue_wait_ns;
> >
> 
> Am I reading this correctly that there is no way to support an indefinite
> waiting capability? Or is this just saying that if a timed wait is
> performed there are min/max limits for the wait duration?

The application can wait indefinitely if required; see the
RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT configuration option.

A trivial application may not need different wait values on each dequeue.
This is a performance optimization opportunity for the implementation.

> 
> 
> > +   /**< Configured global dequeue wait delay(ns) for this device */
> > +   uint8_t max_event_queues;
> > +   /**< Maximum event_queues supported by this device */
> > +   uint32_t max_event_queue_flows;
> > +   /**< Maximum supported flows in an event queue by this device*/
> > +   uint8_t max_event_queue_priority_levels;
> > +   /**< Maximum number of event queue priority levels by this device.
> > +* Valid when the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability
> > +*/
> > +   uint8_t nb_event_queues;
> > +   /**< Configured number of event queues for this device */
> >
> 
> Is 256 a sufficient number of queues? While various SoCs may have limits,
> why impose such a small limit architecturally?

Each event 

[dpdk-dev] [PATCH] kni: fix unused variable compile error

2016-10-14 Thread Thomas Monjalon
2016-10-14 12:24, Ferruh Yigit:
> compile error:
>   CC [M]  .../lib/librte_eal/linuxapp/kni/kni_misc.o
> cc1: warnings being treated as errors
> .../lib/librte_eal/linuxapp/kni/kni_misc.c: In function 'kni_exit_net':
> .../lib/librte_eal/linuxapp/kni/kni_misc.c:113:18:
> error: unused variable 'knet'
> 
> For some kernel versions mutex_destroy() is a macro and does nothing;
> this causes an unused variable warning for knet, which is used in mutex_destroy.
> 
> Added unused attribute to the knet variable.
> 
> Fixes: 93a298b34e1b ("kni: support core id parameter in single threaded mode")
> 
> Signed-off-by: Ferruh Yigit 

That's why supporting an out-of-tree kernel module is a nightmare.
Compilation breaks every time with various kernels :(

Please could you tell which Linux versions are affected?


[dpdk-dev] [PATCH v2] net/ixgbe: support multiqueue mode VMDq DCB with SRIOV

2016-10-14 Thread Bernard Iremonger
modify ixgbe_dcb_tx_hw_config function.
modify ixgbe_dev_mq_rx_configure function.
modify ixgbe_configure_dcb function.

Changes in v2:
Rebased to DPDK v16.11-rc1

Signed-off-by: Rahul R Shah 
Signed-off-by: Bernard Iremonger 
---
 drivers/net/ixgbe/ixgbe_ethdev.c |  9 -
 drivers/net/ixgbe/ixgbe_rxtx.c   | 37 +
 2 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 4ca5747..114698d 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1977,6 +1977,8 @@ ixgbe_check_mq_mode(struct rte_eth_dev *dev)
/* check multi-queue mode */
switch (dev_conf->rxmode.mq_mode) {
case ETH_MQ_RX_VMDQ_DCB:
+   PMD_INIT_LOG(INFO, "ETH_MQ_RX_VMDQ_DCB mode supported 
in SRIOV");
+   break;
case ETH_MQ_RX_VMDQ_DCB_RSS:
/* DCB/RSS VMDQ in SRIOV mode, not implement yet */
PMD_INIT_LOG(ERR, "SRIOV active,"
@@ -2012,11 +2014,8 @@ ixgbe_check_mq_mode(struct rte_eth_dev *dev)

switch (dev_conf->txmode.mq_mode) {
case ETH_MQ_TX_VMDQ_DCB:
-   /* DCB VMDQ in SRIOV mode, not implement yet */
-   PMD_INIT_LOG(ERR, "SRIOV is active,"
-   " unsupported VMDQ mq_mode tx %d.",
-   dev_conf->txmode.mq_mode);
-   return -EINVAL;
+   PMD_INIT_LOG(INFO, "ETH_MQ_TX_VMDQ_DCB mode supported 
in SRIOV");
+   break;
default: /* ETH_MQ_TX_VMDQ_ONLY or ETH_MQ_TX_NONE */
dev->data->dev_conf.txmode.mq_mode = 
ETH_MQ_TX_VMDQ_ONLY;
break;
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 2ce8234..bb13889 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -3313,15 +3313,16 @@ ixgbe_vmdq_dcb_configure(struct rte_eth_dev *dev)

 /**
  * ixgbe_dcb_config_tx_hw_config - Configure general DCB TX parameters
- * @hw: pointer to hardware structure
+ * @dev: pointer to eth_dev structure
  * @dcb_config: pointer to ixgbe_dcb_config structure
  */
 static void
-ixgbe_dcb_tx_hw_config(struct ixgbe_hw *hw,
+ixgbe_dcb_tx_hw_config(struct rte_eth_dev *dev,
   struct ixgbe_dcb_config *dcb_config)
 {
uint32_t reg;
uint32_t q;
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);

PMD_INIT_FUNC_TRACE();
if (hw->mac.type != ixgbe_mac_82598EB) {
@@ -3339,11 +3340,17 @@ ixgbe_dcb_tx_hw_config(struct ixgbe_hw *hw,
if (dcb_config->vt_mode)
reg |= IXGBE_MTQC_VT_ENA;
IXGBE_WRITE_REG(hw, IXGBE_MTQC, reg);
-
-   /* Disable drop for all queues */
-   for (q = 0; q < 128; q++)
-   IXGBE_WRITE_REG(hw, IXGBE_QDE,
-   (IXGBE_QDE_WRITE | (q << IXGBE_QDE_IDX_SHIFT)));
+   if (RTE_ETH_DEV_SRIOV(dev).active == 0) {
+   /* Disable drop for all queues in VMDQ mode*/
+   for (q = 0; q < 128; q++)
+   IXGBE_WRITE_REG(hw, IXGBE_QDE,
+   (IXGBE_QDE_WRITE | (q << 
IXGBE_QDE_IDX_SHIFT) | IXGBE_QDE_ENABLE));
+   } else {
+   /* Enable drop for all queues in SRIOV mode */
+   for (q = 0; q < 128; q++)
+   IXGBE_WRITE_REG(hw, IXGBE_QDE,
+   (IXGBE_QDE_WRITE | (q << 
IXGBE_QDE_IDX_SHIFT)));
+   }

/* Enable the Tx desc arbiter */
reg = IXGBE_READ_REG(hw, IXGBE_RTTDCS);
@@ -3378,7 +3385,7 @@ ixgbe_vmdq_dcb_hw_tx_config(struct rte_eth_dev *dev,
vmdq_tx_conf->nb_queue_pools == ETH_16_POOLS ? 0x : 
0x);

/*Configure general DCB TX parameters*/
-   ixgbe_dcb_tx_hw_config(hw, dcb_config);
+   ixgbe_dcb_tx_hw_config(dev, dcb_config);
 }

 static void
@@ -3661,7 +3668,7 @@ ixgbe_dcb_hw_configure(struct rte_eth_dev *dev,
/*get DCB TX configuration parameters from rte_eth_conf*/
ixgbe_dcb_tx_config(dev, dcb_config);
/*Configure general DCB TX parameters*/
-   ixgbe_dcb_tx_hw_config(hw, dcb_config);
+   ixgbe_dcb_tx_hw_config(dev, dcb_config);
break;
default:
PMD_INIT_LOG(ERR, "Incorrect DCB TX 

[dpdk-dev] [PATCH v3 2/2] mempool: pktmbuf pool default fallback for mempool ops error

2016-10-14 Thread Olivier Matz
Hi Hemant,

Sorry for the late answer. Please see some comments inline.

On 10/13/2016 03:15 PM, Hemant Agrawal wrote:
> Hi Olivier,
> Any updates w.r.t this patch set?
> 
> Regards
> Hemant
> On 9/22/2016 6:42 PM, Hemant Agrawal wrote:
>> Hi Olivier
>>
>> On 9/19/2016 7:27 PM, Olivier Matz wrote:
>>> Hi Hemant,
>>>
>>> On 09/16/2016 06:46 PM, Hemant Agrawal wrote:
 In the rte_pktmbuf_pool_create, if the default external mempool is
 not available, the implementation can default to "ring_mp_mc", which
 is a software implementation.

 Signed-off-by: Hemant Agrawal 
 ---
 Changes in V3:
 * adding warning message to say that falling back to default sw pool
 ---
  lib/librte_mbuf/rte_mbuf.c | 8 
  1 file changed, 8 insertions(+)

 diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
 index 4846b89..8ab0eb1 100644
 --- a/lib/librte_mbuf/rte_mbuf.c
 +++ b/lib/librte_mbuf/rte_mbuf.c
 @@ -176,6 +176,14 @@ rte_pktmbuf_pool_create(const char *name,
 unsigned n,

  rte_errno = rte_mempool_set_ops_byname(mp,
  RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL);
 +
 +/* on error, try falling back to the software based default
 pool */
 +if (rte_errno == -EOPNOTSUPP) {
 +RTE_LOG(WARNING, MBUF, "Default HW Mempool not supported. "
 +"falling back to sw mempool \"ring_mp_mc\"");
 +rte_errno = rte_mempool_set_ops_byname(mp, "ring_mp_mc",
 NULL);
 +}
 +
  if (rte_errno != 0) {
  RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
  return NULL;

>>>
>>> Without adding a new method ".supported()", the first call to
>>> rte_mempool_populate() could return the same error ENOTSUP. In this
>>> case, it is still possible to fallback.
>>>
>> It will be bit late.
>>
>> On failure, then we have to set the default ops and do a goto before
>> rte_pktmbuf_pool_init(mp, &mbp_priv);

I still think we can do the job without adding the .supported() method.
The following code is just an (untested) example:

struct rte_mempool *
rte_pktmbuf_pool_create(const char *name, unsigned n,
unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
int socket_id)
{
struct rte_mempool *mp;
struct rte_pktmbuf_pool_private mbp_priv;
unsigned elt_size;
int ret;
const char *ops[] = {
RTE_MBUF_DEFAULT_MEMPOOL_OPS, "ring_mp_mc", NULL,
};
const char **op;

if (RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) != priv_size) {
RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n",
priv_size);
rte_errno = EINVAL;
return NULL;
}
elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size +
(unsigned)data_room_size;
mbp_priv.mbuf_data_room_size = data_room_size;
mbp_priv.mbuf_priv_size = priv_size;

for (op = &ops[0]; *op != NULL; op++) {
mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
if (mp == NULL)
return NULL;

ret = rte_mempool_set_ops_byname(mp, *op, NULL);
if (ret != 0) {
RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
rte_mempool_free(mp);
if (ret == -ENOTSUP)
continue;
rte_errno = -ret;
return NULL;
}
rte_pktmbuf_pool_init(mp, &mbp_priv);

ret = rte_mempool_populate_default(mp);
if (ret < 0) {
rte_mempool_free(mp);
if (ret == -ENOTSUP)
continue;
rte_errno = -ret;
return NULL;
}
break; /* success: the pool is set up and populated */
}

rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);

return mp;
}


>>> I've just submitted an RFC, which I think is quite linked:
>>> http://dpdk.org/ml/archives/dev/2016-September/046974.html
>>> Assuming a new parameter "mempool_ops" is added to
>>> rte_pktmbuf_pool_create(), would it make sense to fallback to
>>> "ring_mp_mc"? What about just returning ENOTSUP? The application could
>>> do the job and decide which sw fallback to use.
>>
>> We ran into this issue when trying to run the standard DPDK examples
>> (l3fwd) in a VM. Do you think it is practical to add fallback handling in
>> each of the DPDK examples?

OK. What is still unclear to me is how the software is aware of the
different hardware-assisted handlers. Moreover, we could imagine more
software handlers, which could be used depending on the use case.

I think this choice has to be made by the user or the application:

- the application may want to use a specific (sw or hw) handler: in
  this case, it wants to be notified if it fails, instead of having
  a quiet fallback to ring_mp_mc
- if several handlers are available, the application may want to
  try them in a specific order
- maybe some handlers will have some limitations with some
  configurations or driver? The 

[dpdk-dev] [PATCH v5 1/6] ethdev: add Tx preparation

2016-10-14 Thread Kulasek, TomaszX
Hi Thomas,

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, October 13, 2016 21:21
> To: Kulasek, TomaszX 
> Cc: dev at dpdk.org; Ananyev, Konstantin 
> Subject: Re: [PATCH v5 1/6] ethdev: add Tx preparation
> 
> Hi,
> 
> 2016-10-13 19:36, Tomasz Kulasek:
> > Added API for `rte_eth_tx_prep`
> >
> > uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
> > struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> >
> > Added fields to the `struct rte_eth_desc_lim`:
> >
> > uint16_t nb_seg_max;
> > /**< Max number of segments per whole packet. */
> >
> > uint16_t nb_mtu_seg_max;
> > /**< Max number of segments per one MTU */
> >
> > Created `rte_pkt.h` header with common used functions:
> 
> Same comment as in previous revision:
> this description lacks the usability and performance considerations.
> 
> > +static inline uint16_t
> > +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id
> __rte_unused,
> > +   struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)
> 
> Doxygen still do not parse it well (same issue as previous revision).
> 
> > +/**
> > + * Fix pseudo header checksum for TSO and non-TSO tcp/udp packets
> before
> > + * hardware tx checksum.
> > + * For non-TSO tcp/udp packets full pseudo-header checksum is counted
> and set.
> > + * For TSO the IP payload length is not included.
> > + */
> > +static inline int
> > +rte_phdr_cksum_fix(struct rte_mbuf *m)
> 
> You probably don't need this function since the recent improvements from
> Olivier.

Do you mean this improvement: "net: add function to calculate a checksum in a 
mbuf"
http://dpdk.org/dev/patchwork/patch/16542/

I see only full raw checksum computation on the mbuf in Olivier's patches,
while this function computes only the pseudo-header checksum to be used
with tx offload.

Tomasz


[dpdk-dev] [PATCH] kni: fix unused variable compile error

2016-10-14 Thread Ferruh Yigit
compile error:
  CC [M]  .../lib/librte_eal/linuxapp/kni/kni_misc.o
cc1: warnings being treated as errors
.../lib/librte_eal/linuxapp/kni/kni_misc.c: In function 'kni_exit_net':
.../lib/librte_eal/linuxapp/kni/kni_misc.c:113:18:
error: unused variable 'knet'

For some kernel versions mutex_destroy() is a macro and does nothing;
this causes an unused variable warning for knet, which is used in mutex_destroy.

Added unused attribute to the knet variable.

Fixes: 93a298b34e1b ("kni: support core id parameter in single threaded mode")

Signed-off-by: Ferruh Yigit 
---
 lib/librte_eal/linuxapp/kni/kni_misc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c 
b/lib/librte_eal/linuxapp/kni/kni_misc.c
index 3303d9b..497db9b 100644
--- a/lib/librte_eal/linuxapp/kni/kni_misc.c
+++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
@@ -110,9 +110,11 @@ kni_init_net(struct net *net)
 static void __net_exit
 kni_exit_net(struct net *net)
 {
-   struct kni_net *knet = net_generic(net, kni_net_id);
+   struct kni_net *knet __maybe_unused;

+   knet = net_generic(net, kni_net_id);
mutex_destroy(>kni_kthread_lock);
+
 #ifndef HAVE_SIMPLIFIED_PERNET_OPERATIONS
kfree(knet);
 #endif
-- 
2.7.4
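
For reference, the __maybe_unused annotation used above comes from the
kernel's compiler attribute headers and expands (modulo compiler version
checks) to:

#define __maybe_unused __attribute__((unused))

so the declaration stays valid on kernels whose mutex_destroy() consumes
its argument, while silencing the warning where the macro expands to
nothing.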



[dpdk-dev] [PATCH v2] examples/l3fwd: em: use hw accelerated crc hash function for arm64

2016-10-14 Thread Hemant Agrawal
On 10/13/2016 7:06 PM, Jerin Jacob wrote:
> On Fri, Oct 14, 2016 at 12:17:05AM +0530, Hemant Agrawal wrote:
>> if machine-level CRC extensions are available, offload the
>> hash to machine-provided functions, e.g. armv8-a CRC extensions
>> support it
>>
>> Signed-off-by: Hemant Agrawal 
>> Reviewed-by: Jerin Jacob 
>> ---
>>  examples/l3fwd/l3fwd_em.c | 24 ++--
>>  1 file changed, 14 insertions(+), 10 deletions(-)
>>
>> diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
>> index 89a68e6..d92d0aa 100644
>> --- a/examples/l3fwd/l3fwd_em.c
>> +++ b/examples/l3fwd/l3fwd_em.c
>> @@ -57,13 +57,17 @@
>>
>>  #include "l3fwd.h"
>>
>> -#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>> +#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) && 
>> defined(RTE_MACHINE_CPUFLAG_CRC32)
>
> This will always evaluate as FALSE.
>
> Please change to logical OR operation here. ie #if 
> defined(RTE_MACHINE_CPUFLAG_SSE4_2) ||
> defined(RTE_MACHINE_CPUFLAG_CRC32)
>
Oops! Will fix it.

>> +#define EM_HASH_CRC 1
>> +#endif
>
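
For clarity, the corrected guard that Jerin suggests reads:

#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) || defined(RTE_MACHINE_CPUFLAG_CRC32)
#define EM_HASH_CRC 1
#endif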




[dpdk-dev] [PATCH v9] drivers/net:new PMD using tun/tap host interface

2016-10-14 Thread Ferruh Yigit
On 10/13/2016 11:03 PM, Keith Wiles wrote:
> The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces
> on the local host. The PMD allows for DPDK and the host to
> communicate using a raw device interface on the host and in
> the DPDK application. The device created is a Tap device with
> a L2 packet header.
> 
> v9 - Fix up the docs to use correct syntax
> v8 - Fix issue with tap_tx_queue_setup() not return zero on success.
> v7 - Reword the comment in common_base and fix the data->name issue
> v6 - fixed the checkpatch issues
> v5 - merge in changes from list review see related emails
>  fixed many minor edits
> v4 - merge with latest driver changes
> v3 - fix includes by removing ifdef for other type besides Linux
>  Fix the copyright notice in the Makefile
> v2 - merge all of the patches into one patch
>  Fix a typo on naming the tap device
>  Update the maintainers list
> 
> Signed-off-by: Keith Wiles 
> ---

Reviewed-by: Ferruh Yigit 


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-14 Thread Hemant Agrawal
 Hi Bill/Jerin,

> 
> Thanks for the review.
> 
> [snip]
> > > + * If the device init operation is successful, the correspondence
> > > + between
> > > + * the device identifier assigned to the new device and its
> > > + associated
> > > + * *rte_event_dev* structure is effectively registered.
> > > + * Otherwise, both the *rte_event_dev* structure and the device
> > > identifier are
> > > + * freed.
> > > + *
> > > + * The functions exported by the application Event API to setup a
> > > + device
> > > + * designated by its device identifier must be invoked in the
> > > + following
> > > order:
> > > + * - rte_event_dev_configure()
> > > + * - rte_event_queue_setup()
> > > + * - rte_event_port_setup()
> > > + * - rte_event_port_link()
> > > + * - rte_event_dev_start()
> > > + *
> > > + * Then, the application can invoke, in any order, the functions
> > > + * exported by the Event API to schedule events, dequeue events,
> > > + enqueue
> > > events,
> > > + * change event queue(s) to event port [un]link establishment and so on.
> > > + *
> > > + * Application may use rte_event_[queue/port]_default_conf_get() to
> > > + get
> > > the
> > > + * default configuration to set up an event queue or event port by
> > > + * overriding few default values.
> > > + *
> > > + * If the application wants to change the configuration (i.e. call
> > > + * rte_event_dev_configure(), rte_event_queue_setup(), or
> > > + * rte_event_port_setup()), it must call rte_event_dev_stop() first
> > > + to
> > > stop the
> > > + * device and then do the reconfiguration before calling
> > > rte_event_dev_start()
> > > + * again. The schedule, enqueue and dequeue functions should not be
> > > invoked
> > > + * when the device is stopped.
> > >
> >
> > Given this requirement, the question is what happens to events that
> > are "in flight" at the time rte_event_dev_stop() is called? Is stop an
> > asynchronous operation that quiesces the event _dev and allows
> > in-flight events to drain from queues/ports prior to fully stopping,
> > or is some sort of separate explicit quiesce mechanism required? If
> > stop is synchronous and simply halts the event_dev, then how is an
> > application to know if subsequent configure/setup calls would leave
> > these pending events with no place to stand?
> >
> 
> From an application API perspective rte_event_dev_stop() is a synchronous
> function.
> If the stop has been called for re-configuring the number of queues, ports,
> etc. of the device, then "in flight" entry preservation will be
> implementation-defined; otherwise, "in flight" entries will be preserved.
> 
> [snip]
> 
> > > +extern int
> > > +rte_event_dev_socket_id(uint8_t dev_id);
> > > +
> > > +/* Event device capability bitmap flags */
> > > +#define RTE_EVENT_DEV_CAP_QUEUE_QOS(1 << 0)
> > > +/**< Event scheduling prioritization is based on the priority
> > > +associated
> > > with
> > > + *  each event queue.
> > > + *
> > > + *  \see rte_event_queue_setup(), RTE_EVENT_QUEUE_PRIORITY_NORMAL
> > > +*/
> > > +#define RTE_EVENT_DEV_CAP_EVENT_QOS(1 << 1)
> > > +/**< Event scheduling prioritization is based on the priority
> > > +associated
> > > with
> > > + *  each event. Priority of each event is supplied in *rte_event*
> > > structure
> > > + *  on each enqueue operation.
> > > + *
> > > + *  \see rte_event_enqueue()
> > > + */
> > > +
> > > +/**
> > > + * Event device information
> > > + */
> > > +struct rte_event_dev_info {
> > > +   const char *driver_name;/**< Event driver name */
> > > +   struct rte_pci_device *pci_dev; /**< PCI information */
> > > +   uint32_t min_dequeue_wait_ns;
> > > +   /**< Minimum supported global dequeue wait delay(ns) by this
> > > device */
> > > +   uint32_t max_dequeue_wait_ns;
> > > +   /**< Maximum supported global dequeue wait delay(ns) by this
> > > device */
> > > +   uint32_t dequeue_wait_ns;
> > >
> >
> > Am I reading this correctly that there is no way to support an
> > indefinite waiting capability? Or is this just saying that if a timed
> > wait is performed there are min/max limits for the wait duration?
> 
> An application can wait indefinitely if required; see
> RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT configuration option.
> 
> A trivial application may not need different wait values on each dequeue.
> This is a performance optimization opportunity for the implementation.

 Jerin, this applies irrespective of the wait configuration, whether you are
using a per-device wait or a per-dequeuer wait.
 Can the value of MAX_U32 or MAX_U64 be treated as an infinite wait?

> 
> >
> >
> > > +   /**< Configured global dequeue wait delay(ns) for this device */
> > > +   uint8_t max_event_queues;
> > > +   /**< Maximum event_queues supported by this device */
> > > +   uint32_t max_event_queue_flows;
> > > +   /**< Maximum supported flows in an event queue by this device*/
> > > +   uint8_t max_event_queue_priority_levels;
> > > +   
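
As an aside for readers skimming this RFC: the mandatory setup order quoted
earlier in the thread (configure, queue setup, port setup, link, start) maps
to a pseudocode sequence like the sketch below. Argument lists are
illustrative placeholders only; the exact signatures are defined by the RFC
and may still change:

/* Pseudocode sketch of the documented setup order; the conf structs
 * and ids are placeholders, not the RFC's exact types. */
rte_event_dev_configure(dev_id, &dev_conf);
rte_event_queue_setup(dev_id, queue_id, &queue_conf);
rte_event_port_setup(dev_id, port_id, &port_conf);
rte_event_port_link(dev_id, port_id, NULL, 0);	/* link all queues */
rte_event_dev_start(dev_id);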

[dpdk-dev] 17.02 Roadmap

2016-10-14 Thread Stephen Hemminger
On Mon, 10 Oct 2016 16:13:42 +
"O'Driscoll, Tim"  wrote:

> We published our initial roadmap for 17.02 at the end of August. Since then 
> we've been doing more detailed planning and would like to provide an update 
> on the features that we plan to submit for this release. This is our current 
> plan, which should hopefully remain fairly stable now:
> 
> Consistent Filter API: Add support for the Consistent Filter API (see 
> http://dpdk.org/ml/archives/dev/2016-September/047924.html) for IGB, IXGBE 
> and I40E.
> 
> Elastic Flow Distributor: The Elastic Flow Distributor (EFD) is a flow-based 
> load balancing library which scales linearly for both lookup and insert with 
> the number of threads or cores.  EFD lookup uses a "perfect hashing" scheme 
> where only the information needed to compute a key's value (and not the key 
> itself) is stored in the lookup table, thus reducing CPU cache storage 
> requirements. 
> 
> Extended Stats (Latency and Bit Rate Statistics): Enhance the Extended NIC 
> Stats (Xstats) implementation to support the collection and reporting of 
> latency and bit rate measurements. Latency statistics will include min, max 
> and average latency, and jitter. Bit rate statistics will include peak and 
> average bit rate aggregated over a user-defined time period. This will be 
> implemented for IXGBE and I40E.
> 
> Run-Time Configuration of Packet Type (PTYPE) for I40E: At the moment all 
> packet types in DPDK are statically defined. This makes impossible to add new 
> values without first defining them statically and then recompiling DPDK. The 
> ability to configure packet types at run time will be added for I40E.
> 
> Packet Distributor Enhancements: Enhancements will be made to the Packet 
> Distributor library to improve performance:
> 1. Introduce burst functionality to allow batches of packets to be sent to 
> workers.
> 2. Improve the performance of the flow/core affinity through the use of 
> SSE/AVX instructions.
> 
> Add MACsec for IXGBE: MACsec support will be added for IXGBE. Ethdev API 
> primitives will be added to create/delete/enable/disable SC/SA, Next_PN etc. 
> similar to those used in Linux for the macsec_ops. Sample apps (l3fwd, 
> testpmd, etc.) will be updated to support MACsec for the IXGBE. 
> 
> Enhance AESNI_GCM PMD: The current AESNI_GCM PMD is limited to AES-128 and 
> does not support other features such as "any AAD length value". It will be 
> updated to use a newer GCM implementation supporting AES128/192/256 and other 
> features.
> 
> Create Crypto Performance Test App: A new app, similar to testpmd, will be 
> created to allow crypto performance to be tested using any crypto PMD and any 
> supported crypto algorithm.
> 
> Enable Cipher-Only and Hash-Only Support in AESNI_MB PMD: Support will be 
> added for cipher-only and hash-only operations in the AESNI_MB PMD.
> 
> Support Chained Mbufs in Cryptodev: Currently, an application using the 
> cryptodev API needs to reserve a continuous block of memory for mbufs. 
> Support will be added for chaining of mbufs in both the QAT and SW PMDs 
> supported by cryptodev.
> 
> Optimize Vhost-User Performance for Large Packets: A new memory copy function 
> optimized for core-to-core memory copy which will be added. This will be 
> beneficial for virtualization cases involving large packets, but it can be 
> used for other core-to-core cases as well.
> 
> Support New Device Types in Vhost-User: Support will be added to vhost-user 
> for new device types including vhost-scsi and vhost-blk.
> 
> Interrupt Mode Support in Virtio PMD: Support for interrupt mode will be 
> added to the virtio PMD.
> 
> Virtio-User as an Alternative Exception Path: Investigate the use of 
> virtio-user and vhost-net as an alternative exception path to KNI that does 
> not require out of tree drivers. This work is still at an experimental stage, 
> so it may not be included in 17.02.
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of O'Driscoll, Tim
> > Sent: Wednesday, August 31, 2016 11:32 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] 17.02 Roadmap
> > 
> > Below are the features that we're planning to submit for the 17.02
> > release. We'll submit a patch to update the roadmap page with this info.
> > 
> > Some things will obviously change during planning/development, so we'll
> > provide a more detailed update in late September/early October. After
> > that, things should hopefully be relatively stable.
> > 
> > It would be good if others are also willing to share their plans so that
> > we can build up a complete picture of what's planned for 17.02 and make
> > sure there's no duplication.
> > 
> > 
> > Consistent Filter API phase 2: Extend support for the Consistent Filter
> > API that will be first implemented in 16.11 to IGB and FM10K.
> > 
> > Elastic Flow Distributor: The Elastic Flow Distributor (EFD) is a flow-
> > based load balancing library which scales 

[dpdk-dev] [PATCH] mempool: Add sanity check when secondary link in less mempools than primary

2016-10-14 Thread Olivier Matz
Hi Jean,

On 10/12/2016 10:04 PM, Jean Tourrilhes wrote:
> mempool: Add sanity check when secondary link in less mempools than primary
> 
> If the primary and secondary process were build using different build
> systems, the list of constructors included by the linker in each
> binary might be different. Mempools are registered via constructors, so
> the linker magic will directly impact which tailqs are registered with
> the primary and the secondary.
> 
> DPDK currently assumes that the secondary has a superset of the
> mempools registered at the primary, and they are in the same order
> (same index in primary and secondary). In some build scenario, the
> secondary might not initialise any mempools at all. This would result
> in an obscure segfault when trying to use the mempool. Instead, fail
> early with a more explicit error message.
> 
> Signed-off-by: Jean Tourrilhes 
> ---
>  lib/librte_mempool/rte_mempool.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/lib/librte_mempool/rte_mempool.c 
> b/lib/librte_mempool/rte_mempool.c
> index 2e28e2e..4fe9158 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -1275,6 +1275,16 @@ rte_mempool_lookup(const char *name)
>   return NULL;
>   }
>  
> + /* Sanity check : secondary may have initialised less mempools
> +  * than primary due to linker and constructor magic. Note that
> +  * this does not address the case where the constructor order
> +  * is different between primary and secondary and where the index
> +  * points to the wrong ops. Jean II */
> + if (mp->ops_index >= (int32_t) rte_mempool_ops_table.num_ops) {
> + /* Do not dump mempool list, it will segfault. */
> + rte_panic("Cannot find ops for mempool, ops_index %d, num_ops %d - maybe due to build process or linker configuration\n",
> +   mp->ops_index, rte_mempool_ops_table.num_ops);
> + }
> +
>   return mp;
>  }
>  
> 

I'm not really fan of this. I think the configuration and build system
of primary and secondaries should be the same to avoid this kind of
issues. Some other issues may happen if the configuration is different,
for instance the size of structures may be different.

There is already a lot of mess due to primary/secondary at many places
in the code, I'm not sure adding more is really desirable.

Regards,
Olivier
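
For context on the "linker magic" above: mempool handlers land in
rte_mempool_ops_table through constructors emitted by the
MEMPOOL_REGISTER_OPS macro, so whether the linker keeps that object file
decides whether the handler (and its ops index) exists in a given binary.
A minimal sketch with hypothetical stub callbacks:

#include <rte_mempool.h>

static int my_alloc(struct rte_mempool *mp) { (void)mp; return 0; }
static void my_free(struct rte_mempool *mp) { (void)mp; }
static int my_enqueue(struct rte_mempool *mp, void * const *obj_table,
		      unsigned int n)
{ (void)mp; (void)obj_table; (void)n; return 0; }
static int my_dequeue(struct rte_mempool *mp, void **obj_table,
		      unsigned int n)
{ (void)mp; (void)obj_table; (void)n; return 0; }
static unsigned int my_get_count(const struct rte_mempool *mp)
{ (void)mp; return 0; }

static const struct rte_mempool_ops my_ops = {
	.name = "my_handler",
	.alloc = my_alloc,
	.free = my_free,
	.enqueue = my_enqueue,
	.dequeue = my_dequeue,
	.get_count = my_get_count,
};

/* Expands to a constructor that registers the ops at load time. */
MEMPOOL_REGISTER_OPS(my_ops);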


[dpdk-dev] [PATCH v3] vhost: Only access header if offloading is supported in dequeue path

2016-10-14 Thread Maxime Coquelin
If offloading features are not negotiated, parsing the virtio header
is not needed.

Micro-benchmark with testpmd shows that the gain is +4% with indirect
descriptors, +1% when using direct descriptors.

Signed-off-by: Maxime Coquelin 
---
Changes since v2:
=
 - Simplify code by translating first desc address
   unconditionnaly (Yuanhan)
 - Instead of checking features again, check whether
   hdr has been assign to call offload function.

Changes since v1:
=
 - Rebased
 - Fix early out check in vhost_dequeue_offload

 lib/librte_vhost/virtio_net.c | 33 +
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 812e5d3..15ef0b0 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -555,6 +555,18 @@ rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
return virtio_dev_rx(dev, queue_id, pkts, count);
 }

+static inline bool
+virtio_net_with_host_offload(struct virtio_net *dev)
+{
+   if (dev->features &
+   ((1ULL << VIRTIO_NET_F_CSUM) | (1ULL << VIRTIO_NET_F_HOST_ECN) |
+(1ULL << VIRTIO_NET_F_HOST_TSO4) | (1ULL << VIRTIO_NET_F_HOST_TSO6) |
+(1ULL << VIRTIO_NET_F_HOST_UFO)))
+   return true;
+
+   return false;
+}
+
 static void
 parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
 {
@@ -607,6 +619,9 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct 
rte_mbuf *m)
void *l4_hdr = NULL;
struct tcp_hdr *tcp_hdr = NULL;

+   if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
+   return;
+
parse_ethernet(m, &l4_proto, &l4_hdr);
if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
if (hdr->csum_start == (m->l2_len + m->l3_len)) {
@@ -702,7 +717,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc 
*descs,
uint32_t mbuf_avail, mbuf_offset;
uint32_t cpy_len;
struct rte_mbuf *cur = m, *prev = m;
-   struct virtio_net_hdr *hdr;
+   struct virtio_net_hdr *hdr = NULL;
/* A counter to avoid desc dead loop chain */
uint32_t nr_desc = 1;

@@ -715,8 +730,10 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct 
vring_desc *descs,
if (unlikely(!desc_addr))
return -1;

-   hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
-   rte_prefetch0(hdr);
+   if (virtio_net_with_host_offload(dev)) {
+   hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
+   rte_prefetch0(hdr);
+   }

/*
 * A virtio driver normally uses at least 2 desc buffers
@@ -733,18 +750,18 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct 
vring_desc *descs,
if (unlikely(!desc_addr))
return -1;

-   rte_prefetch0((void *)(uintptr_t)desc_addr);
-
desc_offset = 0;
desc_avail  = desc->len;
nr_desc+= 1;
-
-   PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
} else {
desc_avail  = desc->len - dev->vhost_hlen;
desc_offset = dev->vhost_hlen;
}

+   rte_prefetch0((void *)(uintptr_t)(desc_addr + desc_offset));
+
+   PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset), desc_avail, 0);
+
mbuf_offset = 0;
mbuf_avail  = m->buf_len - RTE_PKTMBUF_HEADROOM;
while (1) {
@@ -831,7 +848,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc 
*descs,
prev->data_len = mbuf_offset;
m->pkt_len+= mbuf_offset;

-   if (hdr->flags != 0 || hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE)
+   if (hdr)
vhost_dequeue_offload(hdr, m);

return 0;
-- 
2.7.4



[dpdk-dev] [PATCH] net/mlx5: fix hash key size retrieval

2016-10-14 Thread Nelio Laranjeiro
Return RSS key size in struct rte_eth_dev_info.

Fixes: 0f6f219e7919 ("app/testpmd: fix RSS hash key size")

Signed-off-by: Nelio Laranjeiro 
---
 drivers/net/mlx5/mlx5_ethdev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index c1c2d26..b8b3ea9 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -601,6 +601,9 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *info)
 * size if it is not fixed.
 * The API should be updated to solve this problem. */
info->reta_size = priv->ind_table_max_size;
+   info->hash_key_size = ((*priv->rss_conf) ?
+  (*priv->rss_conf)[0]->rss_key_len :
+  0);
info->speed_capa =
ETH_LINK_SPEED_1G |
ETH_LINK_SPEED_10G |
-- 
2.1.4



[dpdk-dev] [PATCH v2 5/5] maintainers: claim i40e vector PMD on ARM

2016-10-14 Thread Jianbo Liu
Signed-off-by: Jianbo Liu 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8f5fa82..621bda6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -151,6 +151,7 @@ F: lib/librte_acl/acl_run_neon.*
 F: lib/librte_lpm/rte_lpm_neon.h
 F: lib/librte_hash/rte*_arm64.h
 F: drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+F: drivers/net/i40e/i40e_rxtx_vec_neon.c
 F: drivers/net/virtio/virtio_rxtx_simple_neon.c

 EZchip TILE-Gx
-- 
2.4.11



[dpdk-dev] [PATCH v2 4/5] i40e: make vector driver filenames consistent

2016-10-14 Thread Jianbo Liu
To be consistent with the naming for ARM NEON implementation,
i40e_rxtx_vec.c is renamed to i40e_rxtx_vec_sse.c.

Signed-off-by: Jianbo Liu 
---
 drivers/net/i40e/Makefile | 4 ++--
 drivers/net/i40e/{i40e_rxtx_vec.c => i40e_rxtx_vec_sse.c} | 0
 2 files changed, 2 insertions(+), 2 deletions(-)
 rename drivers/net/i40e/{i40e_rxtx_vec.c => i40e_rxtx_vec_sse.c} (100%)

diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile
index 9e92b38..13085fb 100644
--- a/drivers/net/i40e/Makefile
+++ b/drivers/net/i40e/Makefile
@@ -100,7 +100,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_rxtx.c
 ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_INC_VECTOR) += i40e_rxtx_vec_neon.c
 else
-SRCS-$(CONFIG_RTE_LIBRTE_I40E_INC_VECTOR) += i40e_rxtx_vec.c
+SRCS-$(CONFIG_RTE_LIBRTE_I40E_INC_VECTOR) += i40e_rxtx_vec_sse.c
 endif
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_ethdev_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_pf.c
@@ -108,7 +108,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_fdir.c

 # vector PMD driver needs SSE4.1 support
 ifeq ($(findstring RTE_MACHINE_CPUFLAG_SSE4_1,$(CFLAGS)),)
-CFLAGS_i40e_rxtx_vec.o += -msse4.1
+CFLAGS_i40e_rxtx_vec_sse.o += -msse4.1
 endif


diff --git a/drivers/net/i40e/i40e_rxtx_vec.c 
b/drivers/net/i40e/i40e_rxtx_vec_sse.c
similarity index 100%
rename from drivers/net/i40e/i40e_rxtx_vec.c
rename to drivers/net/i40e/i40e_rxtx_vec_sse.c
-- 
2.4.11



[dpdk-dev] [PATCH v2 3/5] i40e: enable i40e vector PMD on ARMv8a platform

2016-10-14 Thread Jianbo Liu
Signed-off-by: Jianbo Liu 
---
 config/defconfig_arm64-armv8a-linuxapp-gcc | 1 -
 doc/guides/nics/features/i40e_vec.ini  | 1 +
 doc/guides/nics/features/i40e_vf_vec.ini   | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
b/config/defconfig_arm64-armv8a-linuxapp-gcc
index a0f4473..6321884 100644
--- a/config/defconfig_arm64-armv8a-linuxapp-gcc
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -45,6 +45,5 @@ CONFIG_RTE_TOOLCHAIN_GCC=y
 CONFIG_RTE_EAL_IGB_UIO=n

 CONFIG_RTE_LIBRTE_FM10K_PMD=n
-CONFIG_RTE_LIBRTE_I40E_INC_VECTOR=n

 CONFIG_RTE_SCHED_VECTOR=n
diff --git a/doc/guides/nics/features/i40e_vec.ini 
b/doc/guides/nics/features/i40e_vec.ini
index 0953d84..edd6b71 100644
--- a/doc/guides/nics/features/i40e_vec.ini
+++ b/doc/guides/nics/features/i40e_vec.ini
@@ -37,3 +37,4 @@ Linux UIO= Y
 Linux VFIO   = Y
 x86-32   = Y
 x86-64   = Y
+ARMv8= Y
diff --git a/doc/guides/nics/features/i40e_vf_vec.ini 
b/doc/guides/nics/features/i40e_vf_vec.ini
index 2a44bf6..d6674f7 100644
--- a/doc/guides/nics/features/i40e_vf_vec.ini
+++ b/doc/guides/nics/features/i40e_vf_vec.ini
@@ -26,3 +26,4 @@ Linux UIO= Y
 Linux VFIO   = Y
 x86-32   = Y
 x86-64   = Y
+ARMv8= Y
-- 
2.4.11



[dpdk-dev] [PATCH v2 2/5] i40e: implement vector PMD for ARM architecture

2016-10-14 Thread Jianbo Liu
Use ARM NEON intrinsic to implement i40e vPMD

Signed-off-by: Jianbo Liu 
---
 drivers/net/i40e/Makefile |   4 +
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 614 ++
 2 files changed, 618 insertions(+)
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_neon.c

diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile
index 53fe145..9e92b38 100644
--- a/drivers/net/i40e/Makefile
+++ b/drivers/net/i40e/Makefile
@@ -97,7 +97,11 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_dcb.c

 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_rxtx.c
+ifeq ($(CONFIG_RTE_ARCH_ARM64),y)
+SRCS-$(CONFIG_RTE_LIBRTE_I40E_INC_VECTOR) += i40e_rxtx_vec_neon.c
+else
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_INC_VECTOR) += i40e_rxtx_vec.c
+endif
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_ethdev_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_pf.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_fdir.c
diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c 
b/drivers/net/i40e/i40e_rxtx_vec_neon.c
new file mode 100644
index 000..011c54e
--- /dev/null
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -0,0 +1,614 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2016, Linaro Limited
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+
+#include "base/i40e_prototype.h"
+#include "base/i40e_type.h"
+#include "i40e_ethdev.h"
+#include "i40e_rxtx.h"
+#include "i40e_rxtx_vec_common.h"
+
+#include <arm_neon.h>
+
+#pragma GCC diagnostic ignored "-Wcast-qual"
+
+static inline void
+i40e_rxq_rearm(struct i40e_rx_queue *rxq)
+{
+   int i;
+   uint16_t rx_id;
+   volatile union i40e_rx_desc *rxdp;
+   struct i40e_rx_entry *rxep = &rxq->sw_ring[rxq->rxrearm_start];
+   struct rte_mbuf *mb0, *mb1;
+   uint64x2_t dma_addr0, dma_addr1;
+   uint64x2_t zero = vdupq_n_u64(0);
+   uint64_t paddr;
+   uint8x8_t p;
+
+   rxdp = rxq->rx_ring + rxq->rxrearm_start;
+
+   /* Pull 'n' more MBUFs into the software ring */
+   if (unlikely(rte_mempool_get_bulk(rxq->mp,
+ (void *)rxep,
+ RTE_I40E_RXQ_REARM_THRESH) < 0)) {
+   if (rxq->rxrearm_nb + RTE_I40E_RXQ_REARM_THRESH >=
+   rxq->nb_rx_desc) {
+   for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
+   rxep[i].mbuf = >fake_mbuf;
+   vst1q_u64((uint64_t *)&rxdp[i].read, zero);
+   }
+   }
+   rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+   RTE_I40E_RXQ_REARM_THRESH;
+   return;
+   }
+
+   p = vld1_u8((uint8_t *)&rxq->mbuf_initializer);
+
+   /* Initialize the mbufs in vector, process 2 mbufs in one loop */
+   for (i = 0; i < RTE_I40E_RXQ_REARM_THRESH; i += 2, rxep += 2) {
+   mb0 = rxep[0].mbuf;
+   mb1 = rxep[1].mbuf;
+
+/* Flush mbuf with pkt template.
+* Data to be rearmed is 6 bytes long.
+* Though, RX will overwrite ol_flags that are coming next
+* anyway. So overwrite whole 8 bytes with one load:
+* 6 bytes of rearm_data plus first 2 bytes of ol_flags.
+*/
+  

[dpdk-dev] [PATCH v2 1/5] i40e: extract non-x86 specific code from vector driver

2016-10-14 Thread Jianbo Liu
move scalar code which does not use x86 intrinsic functions to new file
"i40e_rxtx_vec_common.h", while keeping x86 code in i40e_rxtx_vec.c.
This allows the scalar code to to be shared among vector drivers for
different platforms.

Signed-off-by: Jianbo Liu 
---
 drivers/net/i40e/i40e_rxtx_vec.c| 196 +
 drivers/net/i40e/i40e_rxtx_vec_common.h | 251 
 2 files changed, 255 insertions(+), 192 deletions(-)
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_common.h

diff --git a/drivers/net/i40e/i40e_rxtx_vec.c b/drivers/net/i40e/i40e_rxtx_vec.c
index 0ee0241..3607312 100644
--- a/drivers/net/i40e/i40e_rxtx_vec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec.c
@@ -39,6 +39,7 @@
 #include "base/i40e_type.h"
 #include "i40e_ethdev.h"
 #include "i40e_rxtx.h"
+#include "i40e_rxtx_vec_common.h"

#include <tmmintrin.h>

@@ -445,68 +446,6 @@ i40e_recv_pkts_vec(void *rx_queue, struct rte_mbuf 
**rx_pkts,
return _recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
 }

-static inline uint16_t
-reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs,
-  uint16_t nb_bufs, uint8_t *split_flags)
-{
-   struct rte_mbuf *pkts[RTE_I40E_VPMD_RX_BURST]; /*finished pkts*/
-   struct rte_mbuf *start = rxq->pkt_first_seg;
-   struct rte_mbuf *end =  rxq->pkt_last_seg;
-   unsigned pkt_idx, buf_idx;
-
-   for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
-   if (end != NULL) {
-   /* processing a split packet */
-   end->next = rx_bufs[buf_idx];
-   rx_bufs[buf_idx]->data_len += rxq->crc_len;
-
-   start->nb_segs++;
-   start->pkt_len += rx_bufs[buf_idx]->data_len;
-   end = end->next;
-
-   if (!split_flags[buf_idx]) {
-   /* it's the last packet of the set */
-   start->hash = end->hash;
-   start->ol_flags = end->ol_flags;
-   /* we need to strip crc for the whole packet */
-   start->pkt_len -= rxq->crc_len;
-   if (end->data_len > rxq->crc_len) {
-   end->data_len -= rxq->crc_len;
-   } else {
-   /* free up last mbuf */
-   struct rte_mbuf *secondlast = start;
-
-   while (secondlast->next != end)
-   secondlast = secondlast->next;
-   secondlast->data_len -= (rxq->crc_len -
-   end->data_len);
-   secondlast->next = NULL;
-   rte_pktmbuf_free_seg(end);
-   end = secondlast;
-   }
-   pkts[pkt_idx++] = start;
-   start = end = NULL;
-   }
-   } else {
-   /* not processing a split packet */
-   if (!split_flags[buf_idx]) {
-   /* not a split packet, save and skip */
-   pkts[pkt_idx++] = rx_bufs[buf_idx];
-   continue;
-   }
-   end = start = rx_bufs[buf_idx];
-   rx_bufs[buf_idx]->data_len += rxq->crc_len;
-   rx_bufs[buf_idx]->pkt_len += rxq->crc_len;
-   }
-   }
-
-   /* save the partial packet for next time */
-   rxq->pkt_first_seg = start;
-   rxq->pkt_last_seg = end;
-   memcpy(rx_bufs, pkts, pkt_idx * (sizeof(*pkts)));
-   return pkt_idx;
-}
-
  /* vPMD receive routine that reassembles scattered packets
  * Notice:
  * - nb_pkts < RTE_I40E_DESCS_PER_LOOP, just return no packet
@@ -572,73 +511,6 @@ vtx(volatile struct i40e_tx_desc *txdp,
vtx1(txdp, *pkt, flags);
 }

-static inline int __attribute__((always_inline))
-i40e_tx_free_bufs(struct i40e_tx_queue *txq)
-{
-   struct i40e_tx_entry *txep;
-   uint32_t n;
-   uint32_t i;
-   int nb_free = 0;
-   struct rte_mbuf *m, *free[RTE_I40E_TX_MAX_FREE_BUF_SZ];
-
-   /* check DD bits on threshold descriptor */
-   if ((txq->tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-   rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
-   rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
-   return 0;
-
-   n = txq->tx_rs_thresh;
-
-/* first buffer to free from S/W ring is at index
- * tx_next_dd - (tx_rs_thresh-1)
- */
-   txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
-   m = 

[dpdk-dev] [PATCH v2 0/5] i40e: vector poll-mode driver on ARM64

2016-10-14 Thread Jianbo Liu
This patch set is to implement i40e vector PMD on ARM64.
For x86, vPMD is only reorganized, there should be no performance loss.

v1 -> v2
- rebase to dpdk-next-net/rel_16_11

Jianbo Liu (5):
  i40e: extract non-x86 specific code from vector driver
  i40e: implement vector PMD for ARM architecture
  i40e: enable i40e vector PMD on ARMv8a platform
  i40e: make vector driver filenames consistent
  maintainers: claim i40e vector PMD on ARM

 MAINTAINERS|   1 +
 config/defconfig_arm64-armv8a-linuxapp-gcc |   1 -
 doc/guides/nics/features/i40e_vec.ini  |   1 +
 doc/guides/nics/features/i40e_vf_vec.ini   |   1 +
 drivers/net/i40e/Makefile  |   8 +-
 drivers/net/i40e/i40e_rxtx_vec_common.h| 251 +
 drivers/net/i40e/i40e_rxtx_vec_neon.c  | 614 +
 .../i40e/{i40e_rxtx_vec.c => i40e_rxtx_vec_sse.c}  | 196 +--
 8 files changed, 878 insertions(+), 195 deletions(-)
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_common.h
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_neon.c
 rename drivers/net/i40e/{i40e_rxtx_vec.c => i40e_rxtx_vec_sse.c} (78%)

-- 
2.4.11



[dpdk-dev] [PATCH] eal: avoid unnecessary conflicts over rte_config file

2016-10-14 Thread John Ousterhout
It sounds like my patch would break some existing software, so it probably
doesn't make sense right now.

I'd still argue that the current mechanism has a number of problems, and it
should probably undergo a comprehensive overhaul at some point in the
future.

-John-

On Thu, Oct 13, 2016 at 2:39 PM, Tahhan, Maryam 
wrote:

> > Hi John,
> >
> > > Before this patch, DPDK used the file ~/.rte_config as a lock to
> > > detect potential interference between multiple DPDK applications
> > > running on the same machine. However, if a single user ran DPDK
> > > applications concurrently on several different machines, and if the
> > > user's home directory was shared between the machines via NFS, DPDK
> > > would incorrectly detect conflicts for all but the first application
> > > and abort them. This patch fixes the problem by incorporating the
> > > machine name into the config file name (e.g., ~/.rte_hostname_config).
> > >
> > > Signed-off-by: John Ousterhout 
> > > ---
> > >  doc/guides/prog_guide/multi_proc_support.rst | 11 +++
> > >  lib/librte_eal/common/eal_common_proc.c  |  8 ++--
> > >  lib/librte_eal/common/eal_filesystem.h   | 15 +--
> > >  3 files changed, 22 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/doc/guides/prog_guide/multi_proc_support.rst
> > > b/doc/guides/prog_guide/multi_proc_support.rst
> > > index badd102..a54fa1c 100644
> > > --- a/doc/guides/prog_guide/multi_proc_support.rst
> > > +++ b/doc/guides/prog_guide/multi_proc_support.rst
> > > @@ -129,10 +129,13 @@ Support for this usage scenario is provided
> > > using the ``--file-prefix`` paramete
> > >
> > >  By default, the EAL creates hugepage files on each hugetlbfs
> > > filesystem using the rtemap_X filename,  where X is in the range 0 to
> the
> > maximum number of hugepages -1.
> > > -Similarly, it creates shared configuration files, memory mapped in
> > > each process, using the /var/run/.rte_config filename, -when run as
> > > root (or $HOME/.rte_config when run as a non-root user; -if filesystem
> and
> > device permissions are set up to allow this).
> > > -The rte part of the filenames of each of the above is configurable
> using the
> > file-prefix parameter.
> > > +Similarly, it creates shared configuration files, memory mapped in
> each
> > process.
> > > +When run as root, the name of the configuration file will be
> > > +/var/run/.rte_*host*_config, where *host* is the name of the machine.
> > > +When run as a non-root user, the name of the configuration file
> > > +will be $HOME/.rte_*host*_config (if filesystem and device permissions
> > are set up to allow this).
> > > +If the ``--file-prefix`` parameter has been specified, its value will
> > > +be used in place of "rte" in the file names.
> >
> > I am not sure that we need to handle all such cases inside EAL.
> > User can easily overcome that problem by just adding something like:
> > --file-prefix=`uname -n`
> > to his command-line.
> > Konstantin
> >
>
> I agree with Konstantin, there's no need to include the hostname in the
> rte config file + I'm not sure this will be backward compatible with
> existing DPDK applications that use secondary processes that use the
> config file (as in, multiprocess DPDK applications in use would all need to
> be updated). What Konstantin suggests fixes the issue you were encountering
> without breaking backward compatibility.
> In addition, the hostname is not unique... you could in theory have 2
> hosts with the same hostname and encounter the issue you were seeing again.
>
> > >
> > >  In addition to specifying the file-prefix parameter,  any DPDK
> > > applications that are to be run side-by-side must explicitly limit
> their
> > memory use.
> > > diff --git a/lib/librte_eal/common/eal_common_proc.c
> > > b/lib/librte_eal/common/eal_common_proc.c
> > > index 12e0fca..517aa0c 100644
> > > --- a/lib/librte_eal/common/eal_common_proc.c
> > > +++ b/lib/librte_eal/common/eal_common_proc.c
> > > @@ -45,12 +45,8 @@ rte_eal_primary_proc_alive(const char
> > > *config_file_path)
> > >
> > > if (config_file_path)
> > > config_fd = open(config_file_path, O_RDONLY);
> > > -   else {
> > > -   char default_path[PATH_MAX+1];
> > > -   snprintf(default_path, PATH_MAX, RUNTIME_CONFIG_FMT,
> > > -default_config_dir, "rte");
> > > -   config_fd = open(default_path, O_RDONLY);
> > > -   }
> > > +   else
> > > +   config_fd = open(eal_runtime_config_path(), O_RDONLY);
> > > if (config_fd < 0)
> > > return 0;
> > >
> > > diff --git a/lib/librte_eal/common/eal_filesystem.h
> > > b/lib/librte_eal/common/eal_filesystem.h
> > > index fdb4a70..4929aa3 100644
> > > --- a/lib/librte_eal/common/eal_filesystem.h
> > > +++ b/lib/librte_eal/common/eal_filesystem.h
> > > @@ -41,7 +41,7 @@
> > >  #define EAL_FILESYSTEM_H
> > >
> > >  /** Path of rte config file. */
> > > -#define RUNTIME_CONFIG_FMT "%s/.%s_config"
> > > +#define 
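
For reference, the workaround suggested earlier in this thread requires no
EAL change at all: launching each instance with a host-specific prefix
(application name and other EAL options hypothetical) gives every host its
own config file, e.g.

./dpdk-app --file-prefix=$(uname -n) -c 0x3 -n 4

which produces /var/run/.<hostname>_config instead of the shared
/var/run/.rte_config.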

[dpdk-dev] [PATCH] mempool: Add sanity check when secondary link in less mempools than primary

2016-10-14 Thread Jean Tourrilhes
On Fri, Oct 14, 2016 at 10:23:31AM +0200, Olivier Matz wrote:
> Hi Jean,
> 
> I'm not really fan of this. I think the configuration and build system
> of primary and secondaries should be the same to avoid this kind of
> issues.

You are not going to convert all existing applications to the
DPDK build system. I believe that restricting the build system is
unrealistic; it would restrict DPDK secondary processes to toy examples only.
Note that libdpdk.a is tricky to use outside the DPDK build
system and require some quirks even for primary applications (see
Snort DPDK patches). I would say that DPDK is not very friendly to
foreign applications and their build system in general.

> Some other issues may happen if the configuration is different,
> for instance the size of structures may be different.

Impossible, because then libdpdk.a would not work. Remember we
are talking of using the exact same libdpdk.a in primary and
secondary, and therefore any structure used in libdpdk.a has to
match. And the structures used in the app has to match libdpdk.a as
well.

> There is already a lot of mess due to primary/secondary at many places
> in the code, I'm not sure adding more is really desirable.

Yes, one solution is obviously to get rid of secondary entirely.
Personally, I believe it's pretty close to working, the number
of issues I found is manageable. I have a complex application (Snort)
working that way without any issues. If DPDK wants to support
secondary, you might as well make it work for everybody.
We could discuss better solutions to those issues. For
example, the tailq subsystem has a better solution. But, I'm not going
to waste time if secondary is deprecated.

> Regards,
> Olivier

Regards,

Jean


[dpdk-dev] [PATCH v2] vhost: Only access header if offloading is supported in dequeue path

2016-10-14 Thread Maxime Coquelin


On 10/11/2016 11:01 AM, Yuanhan Liu wrote:
> On Tue, Oct 11, 2016 at 09:45:27AM +0200, Maxime Coquelin wrote:
>> @@ -684,12 +699,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct 
>> vring_desc *descs,
>>struct rte_mempool *mbuf_pool)
>>  {
>>  struct vring_desc *desc;
>> -uint64_t desc_addr;
>> +uint64_t desc_addr = 0;
>>  uint32_t desc_avail, desc_offset;
>>  uint32_t mbuf_avail, mbuf_offset;
>>  uint32_t cpy_len;
>>  struct rte_mbuf *cur = m, *prev = m;
>> -struct virtio_net_hdr *hdr;
>> +struct virtio_net_hdr *hdr = NULL;
>>  /* A counter to avoid desc dead loop chain */
>>  uint32_t nr_desc = 1;
>>
>> @@ -698,12 +713,14 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct 
>> vring_desc *descs,
>>  (desc->flags & VRING_DESC_F_INDIRECT))
>>  return -1;
>>
>> -desc_addr = gpa_to_vva(dev, desc->addr);
>> -if (unlikely(!desc_addr))
>> -return -1;
>> +if (virtio_net_with_host_offload(dev)) {
>> +desc_addr = gpa_to_vva(dev, desc->addr);
>> +if (unlikely(!desc_addr))
>> +return -1;
>>
>> -hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
>> -rte_prefetch0(hdr);
>> +hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
>> +rte_prefetch0(hdr);
>> +}
>>
>>  /*
>>   * A virtio driver normally uses at least 2 desc buffers
>> @@ -720,18 +737,24 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct 
>> vring_desc *descs,
>>  if (unlikely(!desc_addr))
>>  return -1;
>>
>> -rte_prefetch0((void *)(uintptr_t)desc_addr);
>> -
>>  desc_offset = 0;
>>  desc_avail  = desc->len;
>>  nr_desc+= 1;
>> -
>> -PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
>>  } else {
>> +if (!desc_addr) {
>> +desc_addr = gpa_to_vva(dev, desc->addr);
>> +if (unlikely(!desc_addr))
>> +return -1;
>> +}
>> +
>
> I think this piece of code make things a bit complex. I think what you
> want to achieve is, besides saving hdr prefetch, to save one call to
> gpa_to_vva() for the non-ANY_LAYOUT case. Does that matter too much?
>
> How about just saving the hdr prefetch?
>
>   if (virtio_net_with_host_offload(dev)) {
>   hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
>   rte_prefetch0(hdr);
>   }
Oops, your reply slipped through the cracks...

You're right, it doesn't matter too much; the thing to avoid is
definitely the hdr prefetch and access.

I'm sending a v3 now.

Thanks,
Maxime


[dpdk-dev] [PATCH] app/test: add crypto continual tests

2016-10-14 Thread Jain, Deepak K


> -Original Message-
> From: Kusztal, ArkadiuszX
> Sent: Thursday, October 13, 2016 1:18 PM
> To: dev at dpdk.org
> Cc: Trahe, Fiona ; De Lara Guarch, Pablo
> ; Griffin, John  intel.com>;
> Jain, Deepak K ; Kusztal, ArkadiuszX
> 
> Subject: [PATCH] app/test: add crypto continual tests
> 
> This commit adds continual performance tests to the Intel(R) QuickAssist
> Technology test suite. Performance tests are run continually with some
> number of repeating loops.
> 
> Signed-off-by: Arek Kusztal 
> ---
>  app/test/test_cryptodev_perf.c | 133
> -
>  1 file changed, 119 insertions(+), 14 deletions(-)
> 
> diff --git a/app/test/test_cryptodev_perf.c
> b/app/test/test_cryptodev_perf.c index 43a7166..dd741fa 100644
> --- a/app/test/test_cryptodev_perf.c
> --
> 2.1.0
Acked-by: Deepak Kumar Jain 


[dpdk-dev] [PATCH] app/test: add tests with corrupted data for QAT test suite

2016-10-14 Thread Jain, Deepak K


> -Original Message-
> From: Kusztal, ArkadiuszX
> Sent: Thursday, October 13, 2016 11:04 AM
> To: dev at dpdk.org
> Cc: Trahe, Fiona ; Jain, Deepak K
> ; De Lara Guarch, Pablo
> ; Griffin, John  intel.com>;
> Kusztal, ArkadiuszX 
> Subject: [PATCH] app/test: add tests with corrupted data for QAT test suite
> 
> This commit adds tests with corrupted data to the Intel QuickAssist
> Technology test suite in test_cryptodev.c
> 
> Signed-off-by: Arek Kusztal 
> ---
>  app/test/test_cryptodev.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
>  };
> --
> 2.1.0
Acked-by: Deepak Kumar Jain 


[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path

2016-10-14 Thread Wang, Zhihong


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wang, Zhihong
> Sent: Friday, October 14, 2016 3:25 PM
> To: Maxime Coquelin ;
> yuanhan.liu at linux.intel.com; Xie, Huawei ;
> dev at dpdk.org
> Cc: vkaplans at redhat.com; mst at redhat.com;
> stephen at networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support
> to the TX path
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Maxime Coquelin
> > Sent: Tuesday, September 27, 2016 4:43 PM
> > To: yuanhan.liu at linux.intel.com; Xie, Huawei ;
> > dev at dpdk.org
> > Cc: vkaplans at redhat.com; mst at redhat.com;
> > stephen at networkplumber.org; Maxime Coquelin
> > 
> > Subject: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to
> > the TX path
> >
> > Indirect descriptors are usually supported by virtio-net devices,
> > allowing to dispatch a larger number of requests.
> >
> > When the virtio device sends a packet using indirect descriptors,
> > only one slot is used in the ring, even for large packets.
> >
> > The main effect is to improve the 0% packet loss benchmark.
> > A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
> > (fwd io for host, macswap for VM) on DUT shows a +50% gain for
> > zero loss.
> >
> > On the downside, micro-benchmark using testpmd txonly in VM and
> > rxonly on host shows a loss between 1 and 4%.i But depending on
> > the needs, feature can be disabled at VM boot time by passing
> > indirect_desc=off argument to vhost-user device in Qemu.
> >
> > Signed-off-by: Maxime Coquelin 
> 
> 
> Hi Maxime,
> 
> Seems this patch can't work with a Windows virtio guest in my test.
> Have you done similar tests before?
> 
> The way I test:
> 
>  1. Make sure https://patchwork.codeaurora.org/patch/84339/ is applied
> 
>  2. Start testpmd with iofwd between 2 vhost ports
> 
>  3. Start 2 Windows guests connected to the 2 vhost ports

The mrg_rxbuf feature is on.

> 
>  4. Disable firewall and assign IP to each guest using ipconfig
> 
>  5. Use ping to test connectivity
> 
> When I disable this patch by setting:
> 
> 0ULL << VIRTIO_RING_F_INDIRECT_DESC,
> 
> the connection is fine, but when I restore:
> 
> 1ULL << VIRTIO_RING_F_INDIRECT_DESC,
> 
> the connection is broken.
> 
> 
> Thanks
> Zhihong
> 



[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path

2016-10-14 Thread Wang, Zhihong


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Maxime Coquelin
> Sent: Tuesday, September 27, 2016 4:43 PM
> To: yuanhan.liu at linux.intel.com; Xie, Huawei ;
> dev at dpdk.org
> Cc: vkaplans at redhat.com; mst at redhat.com;
> stephen at networkplumber.org; Maxime Coquelin
> 
> Subject: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to
> the TX path
> 
> Indirect descriptors are usually supported by virtio-net devices,
> allowing to dispatch a larger number of requests.
> 
> When the virtio device sends a packet using indirect descriptors,
> only one slot is used in the ring, even for large packets.
> 
> The main effect is to improve the 0% packet loss benchmark.
> A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
> (fwd io for host, macswap for VM) on DUT shows a +50% gain for
> zero loss.
> 
> On the downside, micro-benchmark using testpmd txonly in VM and
> rxonly on host shows a loss between 1 and 4%.i But depending on
> the needs, feature can be disabled at VM boot time by passing
> indirect_desc=off argument to vhost-user device in Qemu.
> 
> Signed-off-by: Maxime Coquelin 


Hi Maxime,

Seems this patch can't work with a Windows virtio guest in my test.
Have you done similar tests before?

The way I test:

 1. Make sure https://patchwork.codeaurora.org/patch/84339/ is applied

 2. Start testpmd with iofwd between 2 vhost ports

 3. Start 2 Windows guests connected to the 2 vhost ports

 4. Disable firewall and assign IP to each guest using ipconfig

 5. Use ping to test connectivity

When I disable this patch by setting:

0ULL << VIRTIO_RING_F_INDIRECT_DESC,

the connection is fine, but when I restore:

1ULL << VIRTIO_RING_F_INDIRECT_DESC,

the connection is broken.


Thanks
Zhihong




[dpdk-dev] [PATCH v5 1/6] ethdev: add Tx preparation

2016-10-14 Thread Thomas Monjalon
2016-10-14 14:02, Kulasek, TomaszX:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > 2016-10-13 19:36, Tomasz Kulasek:
> > > +/**
> > > + * Fix pseudo header checksum for TSO and non-TSO tcp/udp packets
> > before
> > > + * hardware tx checksum.
> > > + * For non-TSO tcp/udp packets full pseudo-header checksum is counted
> > and set.
> > > + * For TSO the IP payload length is not included.
> > > + */
> > > +static inline int
> > > +rte_phdr_cksum_fix(struct rte_mbuf *m)
> > 
> > You probably don't need this function since the recent improvements from
> > Olivier.
> 
> Do you mean this improvement: "net: add function to calculate a checksum in a 
> mbuf"
> http://dpdk.org/dev/patchwork/patch/16542/
> 
> I see only full raw checksum computation on the mbuf in Olivier's patches,
> while this function computes only the pseudo-header checksum to be used with
> tx offload.

OK. Please check what exists already in librte_net (especially rte_ip.h)
and try to re-use code if possible. Thanks
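
For readers new to the proposal, the intended call pattern is to run the
prepare step on a burst just before transmitting it. A hedged usage sketch;
handle_prep_failure() is a hypothetical application callback, and the retry
or drop policy is application-specific:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Hypothetical application hook for packets that fail preparation. */
static void handle_prep_failure(struct rte_mbuf **pkts, uint16_t ok,
				uint16_t total);

static void
send_burst(uint8_t port_id, uint16_t queue_id,
	   struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t nb_prep;

	/* Validate/fix offload metadata; returns the count of packets
	 * that can safely be handed to rte_eth_tx_burst(). */
	nb_prep = rte_eth_tx_prep(port_id, queue_id, pkts, nb_pkts);
	if (nb_prep < nb_pkts)
		handle_prep_failure(pkts, nb_prep, nb_pkts);

	(void)rte_eth_tx_burst(port_id, queue_id, pkts, nb_prep);
}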



[dpdk-dev] [PATCH] doc: how to build KASUMI as shared library

2016-10-14 Thread Jain, Deepak K


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pablo de Lara
> Sent: Thursday, October 13, 2016 8:34 PM
> To: dev at dpdk.org
> Cc: De Lara Guarch, Pablo 
> Subject: [dpdk-dev] [PATCH] doc: how to build KASUMI as shared library
> 
> Libsso KASUMI library has to be built with specific parameters to make the
> KASUMI PMD be built as a shared library, so a note has been added in its
> documentation.
> 
> Signed-off-by: Pablo de Lara 
> ---
>  doc/guides/cryptodevs/kasumi.rst | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> +  make KASUMI_CFLAGS=-DKASUMI_C
> +
> 
>  Initialization
>  --
> --
> 2.7.4
Acked-by: Deepak Kumar Jain 


[dpdk-dev] [PATCH] doc: ZUC PMD cannot be built as a shared library

2016-10-14 Thread Jain, Deepak K


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pablo de Lara
> Sent: Thursday, October 13, 2016 8:35 PM
> To: dev at dpdk.org
> Cc: De Lara Guarch, Pablo 
> Subject: [dpdk-dev] [PATCH] doc: ZUC PMD cannot be built as a shared
> library
> 
> ZUC PMD cannot be built as a shared library, due to the fact that some
> assembly code in the underlying libsso library is not relocatable.
> This will be fixed in the future, but for the moment, it is added as a 
> limitation
> of the PMD.
> 
> Signed-off-by: Pablo de Lara 
> ---
>  doc/guides/cryptodevs/zuc.rst | 3 +++
>  1 file changed, 3 insertions(+)
> 
> 
>  Installation
>  
> --
> 2.7.4
Acked-by: Deepak Kumar Jain 


[dpdk-dev] [PATCH] doc: fix libcrypto title

2016-10-14 Thread Jain, Deepak K


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pablo de Lara
> Sent: Thursday, October 13, 2016 8:34 PM
> To: dev at dpdk.org
> Cc: De Lara Guarch, Pablo 
> Subject: [dpdk-dev] [PATCH] doc: fix libcrypto title
> 
> Libcrypto documentation was missing the equal signs ("="), in its title, so it
> was not present in the documentation generated.
> 
> Fixes: d61f70b4c918 ("crypto/libcrypto: add driver for OpenSSL library")
> 
> Signed-off-by: Pablo de Lara 
> ---
>  doc/guides/cryptodevs/libcrypto.rst | 1 +
>  1 file changed, 1 insertion(+)
> --
> 2.7.4
Acked-by: Deepak Kumar Jain 


[dpdk-dev] [PATCH v9] drivers/net:new PMD using tun/tap host interface

2016-10-14 Thread Mcnamara, John


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Keith Wiles
> Sent: Thursday, October 13, 2016 11:04 PM
> To: dev at dpdk.org
> Cc: pmatilai at redhat.com; yuanhan.liu at linux.intel.com; Yigit, Ferruh
> 
> Subject: [dpdk-dev] [PATCH v9] drivers/net:new PMD using tun/tap host
> interface
> 
> The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces on the
> local host. The PMD allows for DPDK and the host to communicate using a
> raw device interface on the host and in the DPDK application. The device
> created is a Tap device with a L2 packet header.
> 
> v9 - Fix up the docs to use correct syntax
> v8 - Fix issue with tap_tx_queue_setup() not return zero on success.
> v7 - Reword the comment in common_base and fix the data->name issue
> v6 - fixed the checkpatch issues
> v5 - merge in changes from list review see related emails
>  fixed many minor edits
> v4 - merge with latest driver changes
> v3 - fix includes by removing ifdef for other type besides Linux
>  Fix the copyright notice in the Makefile
> v2 - merge all of the patches into one patch
>  Fix a typo on naming the tap device
>  Update the maintainers list
> 
> Signed-off-by: Keith Wiles 

For the doc part of the patch:

Acked-by: John McNamara 



[dpdk-dev] [dpdk-announce] release candidate 16.11-rc1

2016-10-14 Thread Thomas Monjalon
A new DPDK release candidate is ready for testing:
http://dpdk.org/browse/dpdk/tag/?id=v16.11-rc1

It is the first release candidate for DPDK 16.11.
It happens a bit late, though there are still some features missing.
This version must be released before mid-November.
Therefore we have 3 weeks to make the validation, fixes and decide
which remaining features can enter without hurting the validation process.
The easy decision would be to stop accepting large patches and new features.

The release notes shall be completed:
http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/release_16_11.rst
Some highlights:
- EAL device object rework
- more offloads
- improved vhost
- virtio for NEON
- new crypto libraries
- usual updates of drivers
- ivshmem library removal

Please start now discussing the changes you plan to do for
the next release cycle (17.02).

Thank you everyone


[dpdk-dev] [PATCH v8 0/2] modify callback for VF management

2016-10-14 Thread Thomas Monjalon
2016-10-10 15:34, Bernard Iremonger:
> This patchset modifies the callback function for VF management.
> 
> A third parameter has been added to the _rte_eth_dev_callback_process
> function. All references to this function have been updated.
> Changes have been made to the ixgbe_rcv_msg_from_vf function to
> use the new callback parameter.

Applied, thanks


[dpdk-dev] [PATCH v3] drivers: prefix driver REGISTER macro with RTE PMD

2016-10-14 Thread Thomas Monjalon
2016-10-10 11:13, Shreyansh Jain:
> All macros related to driver registration were renamed from DRIVER_*
> to RTE_PMD_*
> 
> This includes:
> 
>  DRIVER_REGISTER_PCI -> RTE_PMD_REGISTER_PCI
>  DRIVER_REGISTER_PCI_TABLE -> RTE_PMD_REGISTER_PCI_TABLE
>  DRIVER_REGISTER_VDEV -> RTE_PMD_REGISTER_VDEV
>  DRIVER_REGISTER_PARAM_STRING -> RTE_PMD_REGISTER_PARAM_STRING
>  DRIVER_EXPORT_* -> RTE_PMD_EXPORT_*
> 
> Fix PMDINFOGEN tool to look for matches of RTE_PMD_REGISTER_*.
> 
> Signed-off-by: Shreyansh Jain 

Applied, thanks
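
In driver sources the rename is mechanical; for a hypothetical PMD
(macro arguments illustrative only):

/* before */
DRIVER_REGISTER_PCI(net_my_pmd, my_pci_driver);
/* after */
RTE_PMD_REGISTER_PCI(net_my_pmd, my_pci_driver);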


[dpdk-dev] [PATCH v2 0/5] implement new Rx checksum flag

2016-10-14 Thread Thomas Monjalon
> > Xiao Wang (5):
> >   net/fm10k: fix Rx checksum flags
> >   net/fm10k: implement new Rx checksum flag
> >   net/e1000: implement new Rx checksum flag
> >   net/ixgbe: implement new Rx checksum flag
> >   net/i40e: implement new Rx checksum flag
> 
> Acked-by : Jing Chen 

Applied directly in mainline on Ferruh's advice


[dpdk-dev] ixgbe: support checksum flags in sse vector Rx function

2016-10-14 Thread Thomas Monjalon
2016-10-06 15:00, Remy Horton:
> On 07/07/2016 13:19, Olivier Matz wrote:
> [..]
> > Signed-off-by: Maxime Leroy 
> > Signed-off-by: Olivier Matz 
> > ---
> >  drivers/net/ixgbe/ixgbe_rxtx_vec_common.h |  8 ++---
> >  drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c   |  6 
> >  drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c| 50 
> > +--
> >  3 files changed, 42 insertions(+), 22 deletions(-)
> 
> Acked-by: Remy Horton 

Applied directly in mainline on Ferruh's advice


[dpdk-dev] [PATCH v2] examples/l3fwd: em: use hw accelerated crc hash function for arm64

2016-10-14 Thread Hemant Agrawal
If machine-level CRC extensions are available, offload the
hash to machine-provided functions, e.g. armv8-a CRC extensions
support it.

Signed-off-by: Hemant Agrawal 
Reviewed-by: Jerin Jacob 
---
 examples/l3fwd/l3fwd_em.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index 89a68e6..d92d0aa 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -57,13 +57,17 @@

 #include "l3fwd.h"

-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) && defined(RTE_MACHINE_CPUFLAG_CRC32)
+#define EM_HASH_CRC 1
+#endif
+
+#ifdef EM_HASH_CRC
 #include 
 #define DEFAULT_HASH_FUNC   rte_hash_crc
 #else
 #include 
 #define DEFAULT_HASH_FUNC   rte_jhash
-#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+#endif

 #define IPV6_ADDR_LEN 16

@@ -168,17 +172,17 @@ ipv4_hash_crc(const void *data, __rte_unused uint32_t 
data_len,
t = k->proto;
p = (const uint32_t *)>port_src;

-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+#ifdef EM_HASH_CRC
init_val = rte_hash_crc_4byte(t, init_val);
init_val = rte_hash_crc_4byte(k->ip_src, init_val);
init_val = rte_hash_crc_4byte(k->ip_dst, init_val);
init_val = rte_hash_crc_4byte(*p, init_val);
-#else /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+#else
init_val = rte_jhash_1word(t, init_val);
init_val = rte_jhash_1word(k->ip_src, init_val);
init_val = rte_jhash_1word(k->ip_dst, init_val);
init_val = rte_jhash_1word(*p, init_val);
-#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+#endif

return init_val;
 }
@@ -190,16 +194,16 @@ ipv6_hash_crc(const void *data, __rte_unused uint32_t 
data_len,
const union ipv6_5tuple_host *k;
uint32_t t;
const uint32_t *p;
-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+#ifdef EM_HASH_CRC
const uint32_t  *ip_src0, *ip_src1, *ip_src2, *ip_src3;
const uint32_t  *ip_dst0, *ip_dst1, *ip_dst2, *ip_dst3;
-#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+#endif

k = data;
t = k->proto;
p = (const uint32_t *)&k->port_src;

-#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+#ifdef EM_HASH_CRC
ip_src0 = (const uint32_t *) k->ip_src;
ip_src1 = (const uint32_t *)(k->ip_src+4);
ip_src2 = (const uint32_t *)(k->ip_src+8);
@@ -218,14 +222,14 @@ ipv6_hash_crc(const void *data, __rte_unused uint32_t data_len,
init_val = rte_hash_crc_4byte(*ip_dst2, init_val);
init_val = rte_hash_crc_4byte(*ip_dst3, init_val);
init_val = rte_hash_crc_4byte(*p, init_val);
-#else /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+#else
init_val = rte_jhash_1word(t, init_val);
init_val = rte_jhash(k->ip_src,
sizeof(uint8_t) * IPV6_ADDR_LEN, init_val);
init_val = rte_jhash(k->ip_dst,
sizeof(uint8_t) * IPV6_ADDR_LEN, init_val);
init_val = rte_jhash_1word(*p, init_val);
-#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+#endif
return init_val;
 }

-- 
1.9.1
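
To see where DEFAULT_HASH_FUNC ends up, a sketch of the exact-match
table setup that consumes it; the entry count is illustrative, and
union ipv4_5tuple_host is the key type already defined in l3fwd_em.c:

    #include <rte_hash.h>

    static struct rte_hash *
    setup_em_table(void)
    {
        struct rte_hash_parameters params = {
            .name = "ipv4_l3fwd_em_hash",
            .entries = 1024,                 /* illustrative size */
            .key_len = sizeof(union ipv4_5tuple_host),
            .hash_func = DEFAULT_HASH_FUNC,  /* rte_hash_crc with HW CRC,
                                              * rte_jhash otherwise */
            .hash_func_init_val = 0,
        };

        return rte_hash_create(&params);
    }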



[dpdk-dev] [PATCH v3 12/12] net/virtio: add Tso support

2016-10-14 Thread Yuanhan Liu
On Thu, Oct 13, 2016 at 04:16:11PM +0200, Olivier Matz wrote:
> +/* When doing TSO, the IP length is not included in the pseudo header
> + * checksum of the packet given to the PMD, but for virtio it is
> + * expected.
> + */
> +static void
> +virtio_tso_fix_cksum(struct rte_mbuf *m)
> +{
> + /* common case: header is not fragmented */
> + if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> + m->l4_len)) {
> + struct ipv4_hdr *iph;
> + struct ipv6_hdr *ip6h;
> + struct tcp_hdr *th;
> + uint16_t prev_cksum, new_cksum, ip_len, ip_paylen;
> + uint32_t tmp;
...
> + } else {

As discussed just now, if you drop the else part, you can add my
Acked-by for the whole virtio changes, and my Reviewed-by for all mbuf
and other changes.

Thomas, please pick them up yourself directly, since this series depends
on other patches that will be (or already have been?) picked by you.

Thanks.

--yliu
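
For reference, a self-contained sketch of the contiguous-header path
that is being kept: fold the IP payload length into the TCP
pseudo-header checksum, which virtio expects for TSO. Names follow the
DPDK headers of this era (IPv4 only, for brevity); this is a sketch of
the discussed logic, not the merged code:

    #include <rte_common.h>
    #include <rte_mbuf.h>
    #include <rte_ip.h>
    #include <rte_tcp.h>
    #include <rte_byteorder.h>

    static void
    tso_fix_cksum_sketch(struct rte_mbuf *m)
    {
        struct ipv4_hdr *iph;
        struct tcp_hdr *th;
        uint16_t ip_paylen;
        uint32_t tmp;

        iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
        th = RTE_PTR_ADD(iph, m->l3_len);
        ip_paylen = rte_cpu_to_be_16(
            rte_be_to_cpu_16(iph->total_length) - m->l3_len);

        /* one's-complement add of ip_paylen into the existing
         * pseudo-header checksum, folding the carry */
        tmp = th->cksum;
        tmp += ip_paylen;
        th->cksum = (uint16_t)((tmp & 0xffff) + (tmp >> 16));
    }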


[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support

2016-10-14 Thread Yuanhan Liu
On Thu, Oct 13, 2016 at 05:45:21PM +0200, Olivier Matz wrote:
> >> If you have a packet split like this:
> >> 
> >>  mbuf segment 1             mbuf segment 2
> >>  --------------------------  -------------------------
> >> | Ethernet header | IP hea|  |der | TCP header | data
> >>  --------------------------  -------------------------
> >>                     ^
> >>                    iph
> >
> >Thanks, that's clear. How would you be able to access the TCP header
> >from the first mbuf then? I mean, how is the following code supposed
> >to work?
> >
> >prev_cksum.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
> > m->l2_len + m->l3_len + 16);
> >
> 
> Oh I see... Sorry there was a confusion on my side with another (internal) 
> macro that browses the segments if the offset ils not in the first one.
> 
> If you agree, let's add the code without the else part, I'll fix it for the 
> rc2.

Good. That's okay to me.

--yliu
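
The confusion above is easy to hit because rte_pktmbuf_mtod_offset()
only points into the first segment. A sketch of a helper that does walk
the chain; this is illustrative only, not a DPDK API:

    #include <rte_mbuf.h>

    /* read one byte at a packet-relative offset, hopping segments;
     * returns 0 and fills *val, or -1 if off is past the packet end */
    static inline int
    mbuf_read_u8(const struct rte_mbuf *m, uint32_t off, uint8_t *val)
    {
        while (m != NULL && off >= rte_pktmbuf_data_len(m)) {
            off -= rte_pktmbuf_data_len(m);
            m = m->next;
        }
        if (m == NULL)
            return -1;
        *val = *rte_pktmbuf_mtod_offset(m, const uint8_t *, off);
        return 0;
    }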


[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support

2016-10-14 Thread Yuanhan Liu
On Thu, Oct 13, 2016 at 05:15:24PM +0200, Olivier MATZ wrote:
> 
> 
> On 10/13/2016 05:01 PM, Yuanhan Liu wrote:
> >On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote:
> >>
> >>
> >>On 10/13/2016 04:16 PM, Yuanhan Liu wrote:
> >>>On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote:
> 
> 
> On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
> >On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
> >>+/* When doing TSO, the IP length is not included in the pseudo header
> >>+ * checksum of the packet given to the PMD, but for virtio it is
> >>+ * expected.
> >>+ */
> >>+static void
> >>+virtio_tso_fix_cksum(struct rte_mbuf *m)
> >>+{
> >>+   /* common case: header is not fragmented */
> >>+   if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> >>+   m->l4_len)) {
> >...
> >>+   /* replace it in the packet */
> >>+   th->cksum = new_cksum;
> >>+   } else {
> >...
> >>+   /* replace it in the packet */
> >>+   *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>+   m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
> >>+   *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>+   m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
> >>+   }
> >
> >The tcp header will always be in the mbuf, right? Otherwise, you can't
> >update the cksum field here. What's the point of introducing the "else
> >clause" then?
> 
> Sorry, I don't see the problem you're pointing out here.
> 
> What I want to solve here is to support the cases where the mbuf is
> segmented in the middle of the network header (which is probably a rare
> case).
> >>>
> >>>How is it going to get segmented?
> >>
> >>The mbuf is given by the application. So if the application generates a
> >>segmented mbuf, it should work.
> >>
> >>This could happen for instance if the application uses mbuf clones to share
> >>the IP/TCP/data part of the mbuf and prepend a specific Ethernet/vlan for
> >>different destination.
> >>
> >>
> In the "else" part, I only access the mbuf byte by byte using the
> rte_pktmbuf_mtod_offset() accessor. An alternative would have been to copy
> the header in a linear buffer, fix the checksum, then copy it again in the
> packet, but there is no mbuf helpers to do these copies for now.
> >>>
> >>>In the "else" clause, the ip header is still in the mbuf, right?
> >>>Why do you have to access it like this:
> >>>
> >>>   ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>>   m->l2_len) >> 4;
> >>>
> >>>Why can't you just use
> >>>
> >>>   iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> >>>   iph->version_ihl ;
> >>
> >>AFAIK, there is no requirement that each network header has to be contiguous
> >>in a mbuf segment.
> >>
> >>Of course, a split in the middle of a network header probably never
> >>happens... but we never know, as it is not forbidden. I think the code
> >>should be robust enough to avoid accesses to wrong addresses.
> >>
> >>Hope it's clear enough :)
> >
> >Thanks, but not really. Maybe let me ask this way: what would go wrong
> >if we used
> >	iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> >to access the IP header? Is it about endianness?
> 
> If you have a packet split like this:
> 
>   mbuf segment 1             mbuf segment 2
>  --------------------------  -------------------------
> | Ethernet header | IP hea|  |der | TCP header | data
>  --------------------------  -------------------------
>                     ^
>                    iph

Thanks, that's clear. How would you be able to access the TCP header
from the first mbuf then? I mean, how is the following code supposed
to work?

prev_cksum.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
m->l2_len + m->l3_len + 16);

> The IP header is not contiguous, so accessing the end of the structure
> would hit a wrong location.
> 
> >One more question is do you have any case to trigger the "else" clause?
> 
> No, but I think it may happen.

A piece of untested code is not to be trusted though ...

--yliu
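
The safe pattern implied by the diagram, as a sketch: take a direct
struct pointer only when the whole header is known to be contiguous in
the first segment, otherwise fall back to byte-wise access. This is
illustrative, not the submitted patch:

    #include <rte_mbuf.h>
    #include <rte_ip.h>

    static uint8_t
    ip_version_of(struct rte_mbuf *m)
    {
        if (rte_pktmbuf_data_len(m) >= m->l2_len +
                sizeof(struct ipv4_hdr)) {
            /* whole IPv4 header contiguous: pointer access is safe */
            struct ipv4_hdr *iph = rte_pktmbuf_mtod_offset(m,
                    struct ipv4_hdr *, m->l2_len);
            return iph->version_ihl >> 4;
        }
        /* header may straddle segments: touch only the first byte,
         * which this sketch assumes still sits in the first segment */
        return *rte_pktmbuf_mtod_offset(m, uint8_t *, m->l2_len) >> 4;
    }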


[dpdk-dev] [PATCH v3 00/19] KNI checkpatch cleanup

2016-10-14 Thread Thomas Monjalon
2016-09-26 16:39, Ferruh Yigit:
> KNI checkpatch cleanup, mostly non-functional but cosmetic modifications.
> Only functional change is related logging, switched to kernel dynamic
> logging and compile time KNI debug options removed, some log message
> levels updated.

Applied, thanks

Note that it is generally preferred to fix coding style when working
on code changes and avoid mass cleanup.
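
As a sketch of the dynamic-logging direction: a compile-time guard of
the kind the series removes becomes an always-compiled pr_debug() call
that is toggled at runtime. The KNI_DEBUG_RX guard and message text are
illustrative:

    #include <linux/kernel.h>
    #include <linux/netdevice.h>

    /* illustrative receive-path logging helper */
    static void
    kni_log_rx(struct net_device *dev, int num)
    {
    #ifdef KNI_DEBUG_RX          /* the old compile-time style */
        printk(KERN_DEBUG "%s: rx %d packets\n", dev->name, num);
    #endif
        /* new style: always compiled, enabled on demand via
         * /sys/kernel/debug/dynamic_debug/control */
        pr_debug("%s: rx %d packets\n", dev->name, num);
    }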


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-14 Thread Bill Fischofer
Hi Jerin,

This looks reasonable and seems a welcome addition to DPDK. A few questions
noted inline:

On Tue, Oct 11, 2016 at 2:30 PM, Jerin Jacob  wrote:

> Thanks to Intel and NXP folks for the positive and constructive feedback
> I've received so far. Here is the updated RFC(v2).
>
> I've attempted to address as many comments as possible.
>
> This series adds rte_eventdev.h to the DPDK tree with
> adequate documentation in doxygen format.
>
> Updates are also available online:
>
> Related draft header file (this patch):
> https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h
>
> PDF version(doxgen output):
> https://rawgit.com/jerinjacobk/libeventdev/master/librte_eventdev_v2.pdf
>
> Repo:
> https://github.com/jerinjacobk/libeventdev
>
> v1..v2
>
> - Added Cavium, Intel, NXP copyrights in header file
>
> - Changed the concept of flow queues to flow ids.
> This is to avoid dictating a specific structure to hold the flows.
> A s/w implementation can do atomic load balancing on multiple
> flow ids more efficiently than maintaining each event in a specific flow
> queue.
>
> - Changed the scheduling group to an event queue.
> A scheduling group is more a stream of events, so an event queue is a
> better abstraction.
>
> - Introduced the event port concept. Instead of tying eventdev access
> to the lcore, a higher-level abstraction called an event port is
> needed; it is the application's interface into the eventdev for
> dequeuing and enqueuing events.
> One or more event queues can be linked to a single event port.
> There can be more than one event port per lcore, allowing multiple
> lightweight threads to have their own interface into the eventdev, if
> the implementation supports it.
> An event port will be bound to an lcore or a lightweight thread to
> keep the application workflow portable.
> An event port abstraction also encapsulates a dequeue depth and an
> enqueue depth, for scheduler implementations that can schedule
> multiple events at a time and buffer output events.
>
> - Added configuration options to the event queue (nb_atomic_flows,
> nb_atomic_order_sequences, single consumer, etc.) and to the event
> port (dequeue_queue_depth, enqueue_queue_depth, etc.) to define limits
> on resource usage. (Useful for optimized software implementations.)
>
> - Introduced RTE_EVENT_DEV_CAP_QUEUE_QOS and RTE_EVENT_DEV_CAP_EVENT_QOS
> schemes of priority handling
>
> - Added event-port-to-event-queue servicing priority.
> This allows two event ports to connect to the same event queue with
> different priorities.
>
> - Changed the workflow to schedule/dequeue/enqueue.
> An implementation is free to define schedule as a NOOP.
> A distributed s/w scheduler can use this to schedule events; a
> centralized s/w scheduler can make this a NOOP on non-scheduler cores.
>
> - Removed Cavium HW specific schedule_from_group API
>
> - Removed Cavium HW specific ctxt_update/ctxt_wait APIs.
> Introduced a more generic "event pinning" concept: if the normal
> workflow is dequeue -> do work based on event type -> enqueue, then a
> pin_event argument to enqueue (where the pinned event is returned
> through the normal dequeue) allows the application workflow to remain
> the same whether or not an implementation supports it.
>
> - Added dequeue() burst variant
>
> - Added the definition of a closed/open system - where an open system
> is memory backed and a closed-system eventdev has limited capacity.
> In such systems, it is also useful to denote, per event port, how many
> packets can be active in the system.
> This can serve as a threshold for ethdev-like devices so they don't
> overwhelm core-to-core events.
>
> - Added the option to specify the maximum amount of time (in ns) the
> application needs to wait on dequeue()
>
> - Removed the scheme of expressing the number of flows in log2 format
>
> Open items, or items that need improvement.
> 
> - Abstract the differences in event QoS management across the
> different priority schemes available in different HW or SW
> implementations, with a portable application workflow.
>
> Based on the feedback, there are three different kinds of QoS support
> available in three different HW or SW implementations.
> 1) Priority associated with the event queue
> 2) Priority associated with each event enqueue
> (the same flow can have two different priorities on two separate
> enqueues)
> 3) Priority associated with the flow (each flow has a unique priority)
>
> In v2, the differences are abstracted based on device capability
> (RTE_EVENT_DEV_CAP_QUEUE_QOS for the first scheme,
> RTE_EVENT_DEV_CAP_EVENT_QOS for the second and third scheme).
> This scheme would call for a different application workflow for
> nontrivial QoS-enabled applications.
>
> Looking forward to getting comments from both application and driver
> implementation perspectives.
>
> /Jerin
>
> ---
>  doc/api/doxy-api-index.md  |1 +
>  doc/api/doxy-api.conf  |1 +
>  lib/librte_eventdev/rte_eventdev.h | 1204 
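
To make the schedule/dequeue/enqueue workflow concrete, a sketch of one
worker's loop under the v2 model. The function names and signatures are
assumptions paraphrased from the draft header linked above and may
differ from it; quit, process_event() and NEXT_STAGE_QUEUE are
hypothetical application pieces:

    static volatile int quit;   /* hypothetical termination flag */

    static void
    worker_loop(uint8_t dev_id, uint8_t port_id, uint64_t wait_ns)
    {
        while (!quit) {
            struct rte_event ev;

            /* a NOOP for HW schedulers, or for a centralized s/w
             * scheduler on non-scheduler cores */
            rte_event_schedule(dev_id);

            /* wait up to wait_ns for an event on this port */
            if (rte_event_dequeue(dev_id, port_id, &ev, wait_ns) == 0)
                continue;

            process_event(&ev);             /* work based on event type */

            ev.queue_id = NEXT_STAGE_QUEUE; /* forward to next stage */
            rte_event_enqueue(dev_id, port_id, &ev, NULL /* no pin */);
        }
    }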

[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support

2016-10-14 Thread Yuanhan Liu
On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote:
> >In the "else" clause, the ip header is still in the mbuf, right?
> >Why do you have to access it like this:
> >
> > ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
> > m->l2_len) >> 4;
> >
> >Why can't you just use
> >
> > iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> > iph->version_ihl ;
> 
> AFAIK, there is no requirement that each network header has to be contiguous
> in a mbuf segment.
> 
> Of course, a split in the middle of a network header probably never
> happens... but we never know, as it is not forbidden. I think the code
> should be robust enough to avoid accesses to wrong addresses.

One more question is do you have any case to trigger the "else" clause?

--yliu


[dpdk-dev] [PATCH v2 12/12] virtio: add Tso support

2016-10-14 Thread Yuanhan Liu
On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote:
> 
> 
> On 10/13/2016 04:16 PM, Yuanhan Liu wrote:
> >On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote:
> >>
> >>
> >>On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
> >>>On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
> +/* When doing TSO, the IP length is not included in the pseudo header
> + * checksum of the packet given to the PMD, but for virtio it is
> + * expected.
> + */
> +static void
> +virtio_tso_fix_cksum(struct rte_mbuf *m)
> +{
> + /* common case: header is not fragmented */
> + if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> + m->l4_len)) {
> >>>...
> + /* replace it in the packet */
> + th->cksum = new_cksum;
> + } else {
> >>>...
> + /* replace it in the packet */
> + *rte_pktmbuf_mtod_offset(m, uint8_t *,
> + m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
> + *rte_pktmbuf_mtod_offset(m, uint8_t *,
> + m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
> + }
> >>>
> >>>The tcp header will always be in the mbuf, right? Otherwise, you can't
> >>>update the cksum field here. What's the point of introducing the "else
> >>>clause" then?
> >>
> >>Sorry, I don't see the problem you're pointing out here.
> >>
> >>What I want to solve here is to support the cases where the mbuf is
> >>segmented in the middle of the network header (which is probably a rare
> >>case).
> >
> >How is it going to get segmented?
> 
> The mbuf is given by the application. So if the application generates a
> segmented mbuf, it should work.
> 
> This could happen for instance if the application uses mbuf clones to share
> the IP/TCP/data part of the mbuf and prepend a specific Ethernet/vlan for
> different destination.
> 
> 
> >>In the "else" part, I only access the mbuf byte by byte using the
> >>rte_pktmbuf_mtod_offset() accessor. An alternative would have been to copy
> >>the header in a linear buffer, fix the checksum, then copy it again in the
> >>packet, but there is no mbuf helpers to do these copies for now.
> >
> >In the "else" clause, the ip header is still in the mbuf, right?
> >Why do you have to access it like this:
> >
> > ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
> > m->l2_len) >> 4;
> >
> >Why can't you just use
> >
> > iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> > iph->version_ihl ;
> 
> AFAIK, there is no requirement that each network header has to be contiguous
> in a mbuf segment.
> 
> Of course, a split in the middle of a network header probably never
> happens... but we never know, as it is not forbidden. I think the code
> should be robust enough to avoid accesses to wrong addresses.
> 
> Hope it's clear enough :)

Thanks, but not really. Maybe let me ask this way: what would go wrong
if we used
	iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
to access the IP header? Is it about endianness?

--yliu