date:20160602

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Wiles, Keith


On 6/2/16, 12:11 PM, "Neil Horman"  wrote:

>
>1) The definition of a config structure that can be passed to rte_eal_init,
>defining the configuration for that running process

Having a configuration structure means we have to have an ABI change to that 
structure anytime we add or remove an option. I was thinking a very simple DB 
of some kind would be better. Have the code query the DB to obtain the needed 
information. The APIs used to query and set the DB needs to be very easy to use 
as well.

Maybe each option can define its own structure if needed or just a simple 
variable type can be used for the basic types (int, string, bool, ?)

Would this work better in the long run, does a fixed structure still make sense?

>
>2) The creation and use of an API that various DPDK libraries can use to
>retrieve that structure (or elements thereof), based on some explicit or 
>imlicit
>id, so that the configuration can be used (I'm thinking here specifically of
>multiple dpdk applications using a dpdk shared library)
>
>3) The removal of the eal_parse_args code from the core dpdk library entirely,
>packaging it instead as its own library that interprets command line arguments
>as currently defined, and populates an instance of the structure defined in (1)
>
>4) Altering the Makefiles, so that the example apps link against the new 
>library
>in (3), altering the app source code to work with the config structure defined
>in (1)
>
>With those steps, I think we will remove the command line bits from the dpdk
>core, and do so without altering the user experience for any of the sample apps
>(which will demonstrate to other developers that the same can be done with 
>their
>applications).  From there we will be free to create alternate methods of
>populating the config struct defined in (1) (via JSON file, YAML, XML, or
>whatever).
>
>Neil
>
>> >> 
>> >> For the purposes of the example apps, it would seem that either JSON, 
>> >> YAML, or
>> >> the above Lua format would work just fine.
>> >
>> >+1
>> >
>> 
>> Regards,
>> ++Keith
>> 
>> 
>

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Wiles, Keith

On 6/2/16, 12:11 PM, "Neil Horman"  wrote:

>On Thu, Jun 02, 2016 at 01:53:32PM +, Wiles, Keith wrote:
>> 
>> On 6/2/16, 8:19 AM, "Thomas Monjalon"  wrote:
>> 
>> >2016-06-02 06:41, Neil Horman:
>> >> I'm not sure why you're focusing no selecting a config file format at 
>> >> all.  Why
>> 
>> The reason is I am on now looking at formats is because I have been thinking 
>> about this issue for some time and already understand your comments. I agree 
>> with you and Thomas, which to me would be the details needing to be done as 
>> part of the project. I guess I found the config file format a lot more fun 
>> to define. ?
>> 
>
>Sure, it is more fun to define, but I think its likely the wrong problem to
>solve (or perhaps not even the wrong problem, but rather the less pressing
>problem).

It is not the wrong problem, just a different starting point in the overall 
problem from the changes below. Now, I believe we have come full circle to 
identify the whole problem.

Let me look at the problem some and see if I can identify a configuration 
structure.
>> 
>> Regards,
>> ++Keith

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Matthew Hall

On Thu, Jun 02, 2016 at 06:34:58PM -0400, Neil Horman wrote:
> > This sort of code is very 1970s / ioctl / messy binary. And doesn't buy any 
> > performance advantage because it's just for config.
> > 
> What!?  I can't even parse that sentence.

I would not want to have to use the structure you proposed in user-readable 
code. It looked a lot like ugly ioctl stuff and I found the sysctl style 
interface easier to read. I don't see why that would be hard for anyone to 
parse but nevertheless.

> > https://www.freebsd.org/cgi/man.cgi?sysctl(3)
> > 
> I can't even begin to understand what you're after here.  sysctl provides a
> heirarchy in _exactly_ the same way that I just proposed, by texual 
> consistency
> in naming.

I didn't object to the hierarchy part, but the user hostility of the example 
proposed.

> > http://json-c.github.io/json-c/json-c-0.12/doc/html/json__object_8h.html
> > 
> So, this is a fine interface to convert text config to a code format, but 
> thats
> a decision that application should be making, not something dpdk should 
> mandate

You're thinking way too narrowly here for what I am working to convey. I 
wasn't meaning to say JSON had to be used. I was saying, the kind of 
lightweight object-based API they used for modeling JSON has worked very well 
for modeling config data inside of my app. IE, simple functions for working 
with the following sort of entities (which are used in many file / interchange 
systems like JSON, MsgPack, YAML, etc.):

Objects:
* hashes, arbitrarily nested
* arrays, arbitrarily nested

Atoms:
* strings - textual
* strings - binary (something we should add for DPDK)
* integers
* floats / doubles
* booleans

In general I am seeing two good approaches for nesting:

1. name nesting like MIB variable "x.y.z.a.b.c" - this is how sysctl works
2. object nesting- this is how JSON, YAML, MsgPack, INI (implicitly w/ section 
names), XML etc. work...

to express this in the Python / Ruby / JS style syntax it would be:

config['x']['y']['z']['a']['b']['c']
using json-c it would be like

json_object_object_get()... until a json_object_TYPE_get().

What I've done for these in the past, is to make something that can parse the 
sysctl-style name x.y.z.0.a.b.c, detect if each dotted-item is a string, in 
which case reach inside the dict for the string or return NULL if not found, 
and if it's a number reach inside the array for that index and return NULL if 
not found. Here is a Python example how to take the sysctl style and look it 
up inside some objects. The same thing could be done using anything with at 
least as rich of features as what json-c provides...

RE_IS_INT = re.compile('^[0-9]+$')
def retrieve_path(data, path):
if isinstance(path, basestring):
path = path.split('.')

if isinstance(data, Mapping):
result = data.get(path[0])
else:
if not RE_IS_INT.match(str(path[0])):
return None
i = int(path[0])
result = data[i] if len(data) > i else None

if len(path) == 1:
return result
else:
if result:
return fetch(result, path[1:])
else:
return None

> Neil

Matthew

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Neil Horman

On Thu, Jun 02, 2016 at 01:53:55PM -0700, Matthew Hall wrote:
> On Thu, Jun 02, 2016 at 04:08:37PM -0400, Neil Horman wrote:
> > struct key_vals {
> > char *key;
> > union {
> > ulong longval;
> > void *ptrval;
> > } value;
> > };
> > 
> > struct config {
> > size_t count;
> > struct key_vals kvp[0];
> > };
> 
> This sort of code is very 1970s / ioctl / messy binary. And doesn't buy any 
> performance advantage because it's just for config.
> 
What!?  I can't even parse that sentence.  Of course its just for config, we're
talking about a configuration structure. If you want to make it more
complex/heirarchical/whatever, fine, propose a way to do that that isnt ABI
variant in response to config additions.  Its just a starting point.

> Something that looks more like sysctl MIBs with hierarchical names or like 
> JSON w/ a hierarchy of hash tables and arrays is much less user-hostile.
> 

> https://www.freebsd.org/cgi/man.cgi?sysctl(3)
> 
I can't even begin to understand what you're after here.  sysctl provides a
heirarchy in _exactly_ the same way that I just proposed, by texual consistency
in naming.

> http://json-c.github.io/json-c/json-c-0.12/doc/html/json__object_8h.html
> 
So, this is a fine interface to convert text config to a code format, but thats
a decision that application should be making, not something dpdk should mandate

Neil

> Matthew.
>

[dpdk-dev] [PATCH v5 0/6] Virtio-net PMD: QEMU QTest extension for container

2016-06-02 Thread Tetsuya Mukawa

Hi Yuanhan,

On 2016/06/02 16:31, Yuanhan Liu wrote:
> But still, I'd ask do we really need 2 virtio for container solutions?

I appreciate your comments.
Let me have time to discuss it with our team.

Thanks,
Tetsuya

[dpdk-dev] [PATCH] ixgbe: fix unused value

2016-06-02 Thread Daniel Mrzyglod

An assigned value that is never used may represent unnecessary computation,
an incorrect algorithm, or possibly the need for cleanup or refactoring.

In reassemble_packets: A value assigned to a variable is never used.

Fixes: cf4b4708a88a ("ixgbe: improve slow-path perf with vector scattered Rx")
Coverity ID 13335

Signed-off-by: Daniel Mrzyglod 
---
 drivers/net/ixgbe/ixgbe_rxtx_vec.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c 
b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index e97ea82..61c7aad 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -458,7 +458,6 @@ reassemble_packets(struct ixgbe_rx_queue *rxq, struct 
rte_mbuf **rx_bufs,
end->data_len);
secondlast->next = NULL;
rte_pktmbuf_free_seg(end);
-   end = secondlast;
}
pkts[pkt_idx++] = start;
start = end = NULL;
-- 
2.5.5

[dpdk-dev] [PATCH v3 13/13] enic: fix Tx IP and UDP/TCP checksum offload

2016-06-02 Thread John Daley

Private/confilicting ol_flags where used to enable UDP/TCP Tx
offloads. Use the common flags in PKT_TX_L4_MASK to support them.
Also, do some minor code rearranging for slightly better performane.

Fixes: fefed3d1e62c ("enic: new driver")
Signed-off-by: John Daley 
---
 drivers/net/enic/enic.h  |  1 -
 drivers/net/enic/enic_rxtx.c | 28 ++--
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 4b76e6d..1e6914e 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -62,7 +62,6 @@
 #define ENICPMD_SETTING(enic, f) ((enic->config.flags & VENETF_##f) ? 1 : 0)

 #define ENICPMD_BDF_LENGTH  13   /* :00:00.0'\0' */
-#define PKT_TX_TCP_UDP_CKSUM0x6000
 #define ENIC_CALC_IP_CKSUM  1
 #define ENIC_CALC_TCP_UDP_CKSUM 2
 #define ENIC_MAX_MTU9000
diff --git a/drivers/net/enic/enic_rxtx.c b/drivers/net/enic/enic_rxtx.c
index 350d04b..8fe334f 100644
--- a/drivers/net/enic/enic_rxtx.c
+++ b/drivers/net/enic/enic_rxtx.c
@@ -385,10 +385,10 @@ uint16_t enic_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,
struct enic *enic = vnic_dev_priv(wq->vdev);
unsigned short vlan_id;
uint64_t ol_flags;
+   uint64_t ol_flags_mask;
unsigned int wq_desc_avail;
int head_idx;
struct vnic_wq_buf *buf;
-   unsigned int hw_ip_cksum_enabled;
unsigned int desc_count;
struct wq_enet_desc *descs, *desc_p, desc_tmp;
uint16_t mss;
@@ -400,10 +400,10 @@ uint16_t enic_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,
wq_desc_avail = vnic_wq_desc_avail(wq);
head_idx = wq->head_idx;
desc_count = wq->ring.desc_count;
+   ol_flags_mask = PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | PKT_TX_L4_MASK;

nb_pkts = RTE_MIN(nb_pkts, ENIC_TX_XMIT_MAX);

-   hw_ip_cksum_enabled = enic->hw_ip_checksum;
for (index = 0; index < nb_pkts; index++) {
tx_pkt = *tx_pkts++;
nb_segs = tx_pkt->nb_segs;
@@ -415,10 +415,9 @@ uint16_t enic_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,

pkt_len = tx_pkt->pkt_len;
data_len = tx_pkt->data_len;
-   vlan_id = tx_pkt->vlan_tci;
ol_flags = tx_pkt->ol_flags;
-
mss = 0;
+   vlan_id = 0;
vlan_tag_insert = 0;
bus_addr = (dma_addr_t)
   (tx_pkt->buf_physaddr + tx_pkt->data_off);
@@ -428,14 +427,23 @@ uint16_t enic_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,

eop = (data_len == pkt_len);

-   if (ol_flags & PKT_TX_VLAN_PKT)
-   vlan_tag_insert = 1;
+   if (ol_flags & ol_flags_mask) {
+   if (ol_flags & PKT_TX_VLAN_PKT) {
+   vlan_tag_insert = 1;
+   vlan_id = tx_pkt->vlan_tci;
+   }

-   if (hw_ip_cksum_enabled && (ol_flags & PKT_TX_IP_CKSUM))
-   mss |= ENIC_CALC_IP_CKSUM;
+   if (ol_flags & PKT_TX_IP_CKSUM)
+   mss |= ENIC_CALC_IP_CKSUM;

-   if (hw_ip_cksum_enabled && (ol_flags & PKT_TX_TCP_UDP_CKSUM))
-   mss |= ENIC_CALC_TCP_UDP_CKSUM;
+   /* Nic uses just 1 bit for UDP and TCP */
+   switch (ol_flags & PKT_TX_L4_MASK) {
+   case PKT_TX_TCP_CKSUM:
+   case PKT_TX_UDP_CKSUM:
+   mss |= ENIC_CALC_TCP_UDP_CKSUM;
+   break;
+   }
+   }

wq_enet_desc_enc(_tmp, bus_addr, data_len, mss, 0, 0, eop,
 eop, 0, vlan_tag_insert, vlan_id, 0);
-- 
2.7.0

[dpdk-dev] [PATCH v3 12/13] enic: expand local Tx mbuf flags variable to 64-bits

2016-06-02 Thread John Daley

The offload flags variable (ol_flags) in rte_mbuf structure is 64-bits,
so local copy of it must be 64-bits too. Moreover bit comparison between
16-bits variable and 64-bits value make no sense. This breaks Tx vlan
IP and L4 offloads.

CID 13218 : Operands don't affect result (CONSTANT_EXPRESSION_RESULT)
result_independent_of_operands: ol_flags & (18014398509481984ULL /* 1ULL
<< 54 */) is always 0 regardless of the values of its operands. This
occurs as the logical operand of if.

Coverity issue: 13218
Fixes: fefed3d1e62c ("enic: new driver")

Suggested-by: Piotr Azarewicz 
Signed-off-by: John Daley 
---
This is essentially patch http://www.dpdk.org/dev/patchwork/patch/12642
applied after the enic_send_packet function was melded into the main
transmit funciton.

 drivers/net/enic/enic_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/enic/enic_rxtx.c b/drivers/net/enic/enic_rxtx.c
index 7527bce..350d04b 100644
--- a/drivers/net/enic/enic_rxtx.c
+++ b/drivers/net/enic/enic_rxtx.c
@@ -384,7 +384,7 @@ uint16_t enic_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,
struct vnic_wq *wq = (struct vnic_wq *)tx_queue;
struct enic *enic = vnic_dev_priv(wq->vdev);
unsigned short vlan_id;
-   unsigned short ol_flags;
+   uint64_t ol_flags;
unsigned int wq_desc_avail;
int head_idx;
struct vnic_wq_buf *buf;
-- 
2.7.0

[dpdk-dev] [PATCH v3 11/13] enic: add an enic assert macro

2016-06-02 Thread John Daley

Add an ASSERT macro for the enic driver which is enabled when the log
level is >= RTE_LOG_DEBUG. Assert that number of mbufs to return to
the pool in the Tx function is never greater than the max allowed.

Signed-off-by: John Daley 
---
 drivers/net/enic/enic.h  | 12 
 drivers/net/enic/enic_rxtx.c |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 4eb28ee..4b76e6d 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -187,6 +187,18 @@ enic_ring_incr(uint32_t n_descriptors, uint32_t idx)
return idx;
 }

+#if RTE_LOG_LEVEL >= RTE_LOG_DEBUG
+#define ENIC_ASSERT(cond)  \
+   do {\
+   if (unlikely(!(cond))) {\
+   rte_panic("line %d\tassert \"" #cond "\""   \
+   "failed\n", __LINE__);  \
+   }   \
+   } while (0)
+#else
+#define ENIC_ASSERT(cond) do {} while (0)
+#endif
+
 extern void enic_fdir_stats_get(struct enic *enic,
struct rte_eth_fdir_stats *stats);
 extern int enic_fdir_add_fltr(struct enic *enic,
diff --git a/drivers/net/enic/enic_rxtx.c b/drivers/net/enic/enic_rxtx.c
index a04ebd0..7527bce 100644
--- a/drivers/net/enic/enic_rxtx.c
+++ b/drivers/net/enic/enic_rxtx.c
@@ -343,6 +343,7 @@ static inline void enic_free_wq_bufs(struct vnic_wq *wq, 
u16 completed_index)
buf = >bufs[tail_idx];
m = (struct rte_mbuf *)(buf->mb);
if (likely(m->pool == pool)) {
+   ENIC_ASSERT(nb_free < ENIC_MAX_WQ_DESCS);
free[nb_free++] = m;
} else {
rte_mempool_put_bulk(pool, (void *)free, nb_free);
-- 
2.7.0

[dpdk-dev] [PATCH v3 10/13] enic: remove unused files and functions and variables

2016-06-02 Thread John Daley

Remove some files, functions and variables left unused after
Tx performance improvements.

Signed-off-by: John Daley 
---
 drivers/net/enic/base/enic_vnic_wq.h | 63 
 drivers/net/enic/base/vnic_cq.h  | 44 -
 drivers/net/enic/base/vnic_wq.c  |  4 +--
 drivers/net/enic/base/vnic_wq.h  |  6 +---
 drivers/net/enic/enic.h  |  1 -
 drivers/net/enic/enic_main.c |  2 +-
 drivers/net/enic/enic_rxtx.c |  1 -
 7 files changed, 4 insertions(+), 117 deletions(-)
 delete mode 100644 drivers/net/enic/base/enic_vnic_wq.h

diff --git a/drivers/net/enic/base/enic_vnic_wq.h 
b/drivers/net/enic/base/enic_vnic_wq.h
deleted file mode 100644
index 55c5664..000
--- a/drivers/net/enic/base/enic_vnic_wq.h
+++ /dev/null
@@ -1,63 +0,0 @@
-/*
- * Copyright 2008-2015 Cisco Systems, Inc.  All rights reserved.
- * Copyright 2007 Nuova Systems, Inc.  All rights reserved.
- *
- * Copyright (c) 2015, Cisco Systems, Inc.
- * All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- *
- * 1. Redistributions of source code must retain the above copyright
- * notice, this list of conditions and the following disclaimer.
- *
- * 2. Redistributions in binary form must reproduce the above copyright
- * notice, this list of conditions and the following disclaimer in
- * the documentation and/or other materials provided with the
- * distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
- * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
- * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
- * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
- * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
- * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
- * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
- * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
- * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
- * POSSIBILITY OF SUCH DAMAGE.
- *
- */
-
-#ifndef _ENIC_VNIC_WQ_H_
-#define _ENIC_VNIC_WQ_H_
-
-#include "vnic_dev.h"
-#include "vnic_cq.h"
-
-static inline void enic_vnic_post_wq_index(struct vnic_wq *wq)
-{
-   /* Adding write memory barrier prevents compiler and/or CPU
-* reordering, thus avoiding descriptor posting before
-* descriptor is initialized. Otherwise, hardware can read
-* stale descriptor fields.
-   */
-   wmb();
-   iowrite32(wq->head_idx, >ctrl->posted_index);
-}
-
-static inline void enic_vnic_post_wq(struct vnic_wq *wq,
-void *os_buf, uint8_t cq_entry)
-{
-   struct vnic_wq_buf *buf = >bufs[wq->head_idx];
-   buf->mb = os_buf;
-   wq->head_idx = enic_ring_incr(wq->ring.desc_count, wq->head_idx);
-
-   if (cq_entry)
-   enic_vnic_post_wq_index(wq);
-}
-
-#endif /* _ENIC_VNIC_WQ_H_ */
diff --git a/drivers/net/enic/base/vnic_cq.h b/drivers/net/enic/base/vnic_cq.h
index 922391b..13ab87c 100644
--- a/drivers/net/enic/base/vnic_cq.h
+++ b/drivers/net/enic/base/vnic_cq.h
@@ -90,50 +90,6 @@ struct vnic_cq {
 #endif
 };

-static inline unsigned int vnic_cq_service(struct vnic_cq *cq,
-   unsigned int work_to_do,
-   int (*q_service)(struct vnic_dev *vdev, struct cq_desc *cq_desc,
-   u8 type, u16 q_number, u16 completed_index, void *opaque),
-   void *opaque)
-{
-   struct cq_desc *cq_desc;
-   unsigned int work_done = 0;
-   u16 q_number, completed_index;
-   u8 type, color;
-   struct rte_mbuf **rx_pkts = opaque;
-   unsigned int ret;
-
-   cq_desc = (struct cq_desc *)((u8 *)cq->ring.descs +
-   cq->ring.desc_size * cq->to_clean);
-   cq_desc_dec(cq_desc, , ,
-   _number, _index);
-
-   while (color != cq->last_color) {
-   if (opaque)
-   opaque = (void *)&(rx_pkts[work_done]);
-
-   ret = (*q_service)(cq->vdev, cq_desc, type,
-   q_number, completed_index, opaque);
-   cq->to_clean++;
-   if (cq->to_clean == cq->ring.desc_count) {
-   cq->to_clean = 0;
-   cq->last_color = cq->last_color ? 0 : 1;
-   }
-
-   cq_desc = (struct cq_desc *)((u8 *)cq->ring.descs +
-   cq->ring.desc_size * cq->to_clean);
-   cq_desc_dec(cq_desc, , ,
-   _number, _index);
-
-   if (ret)
-   work_done++;
-   if (work_done >= work_to_do)
-   break;
-   }
-
-   return work_done;
-}
-

[dpdk-dev] [PATCH v3 09/13] enic: optimize the Tx function

2016-06-02 Thread John Daley

Reduce host CPU overhead of Tx packet processing:
* Use local variables inside per packet loop instead of fields in structs.
* Factor book keeping and conditionals out of the per packet loop where
  possible.
* Post buffers to the nic at a maximum of every 64 packets

Signed-off-by: Nelson Escobar 
Signed-off-by: John Daley 
---
 drivers/net/enic/base/vnic_wq.h |   1 +
 drivers/net/enic/enic_res.h |   2 +-
 drivers/net/enic/enic_rxtx.c| 167 +++-
 3 files changed, 83 insertions(+), 87 deletions(-)

diff --git a/drivers/net/enic/base/vnic_wq.h b/drivers/net/enic/base/vnic_wq.h
index 689b81c..7a66813 100644
--- a/drivers/net/enic/base/vnic_wq.h
+++ b/drivers/net/enic/base/vnic_wq.h
@@ -67,6 +67,7 @@ struct vnic_wq_ctrl {

 /* 16 bytes */
 struct vnic_wq_buf {
+   struct rte_mempool *pool;
void *mb;
 };

diff --git a/drivers/net/enic/enic_res.h b/drivers/net/enic/enic_res.h
index 955db71..3c8e303 100644
--- a/drivers/net/enic/enic_res.h
+++ b/drivers/net/enic/enic_res.h
@@ -53,7 +53,7 @@

 #define ENIC_NON_TSO_MAX_DESC  16
 #define ENIC_DEFAULT_RX_FREE_THRESH32
-#define ENIC_TX_POST_THRESH(ENIC_MIN_WQ_DESCS / 2)
+#define ENIC_TX_XMIT_MAX   64

 #define ENIC_SETTING(enic, f) ((enic->config.flags & VENETF_##f) ? 1 : 0)

diff --git a/drivers/net/enic/enic_rxtx.c b/drivers/net/enic/enic_rxtx.c
index ec8d90a..ba15670 100644
--- a/drivers/net/enic/enic_rxtx.c
+++ b/drivers/net/enic/enic_rxtx.c
@@ -374,114 +374,109 @@ unsigned int enic_cleanup_wq(__rte_unused struct enic 
*enic, struct vnic_wq *wq)
return 0;
 }

-void enic_post_wq_index(struct vnic_wq *wq)
-{
-   enic_vnic_post_wq_index(wq);
-}
-
-void enic_send_pkt(struct enic *enic, struct vnic_wq *wq,
-  struct rte_mbuf *tx_pkt, unsigned short len,
-  uint8_t sop, uint8_t eop, uint8_t cq_entry,
-  uint16_t ol_flags, uint16_t vlan_tag)
-{
-   struct wq_enet_desc *desc, *descs;
-   uint16_t mss = 0;
-   uint8_t vlan_tag_insert = 0;
-   uint64_t bus_addr = (dma_addr_t)
-   (tx_pkt->buf_physaddr + tx_pkt->data_off);
-
-   descs = (struct wq_enet_desc *)wq->ring.descs;
-   desc = descs + wq->head_idx;
-
-   if (sop) {
-   if (ol_flags & PKT_TX_VLAN_PKT)
-   vlan_tag_insert = 1;
-
-   if (enic->hw_ip_checksum) {
-   if (ol_flags & PKT_TX_IP_CKSUM)
-   mss |= ENIC_CALC_IP_CKSUM;
-
-   if (ol_flags & PKT_TX_TCP_UDP_CKSUM)
-   mss |= ENIC_CALC_TCP_UDP_CKSUM;
-   }
-   }
-
-   wq_enet_desc_enc(desc,
-   bus_addr,
-   len,
-   mss,
-   0 /* header_length */,
-   0 /* offload_mode WQ_ENET_OFFLOAD_MODE_CSUM */,
-   eop,
-   cq_entry,
-   0 /* fcoe_encap */,
-   vlan_tag_insert,
-   vlan_tag,
-   0 /* loopback */);
-
-   enic_vnic_post_wq(wq, (void *)tx_pkt, cq_entry);
-}
-
 uint16_t enic_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts)
 {
uint16_t index;
-   unsigned int frags;
-   unsigned int pkt_len;
-   unsigned int seg_len;
-   unsigned int inc_len;
+   unsigned int pkt_len, data_len;
unsigned int nb_segs;
-   struct rte_mbuf *tx_pkt, *next_tx_pkt;
+   struct rte_mbuf *tx_pkt;
struct vnic_wq *wq = (struct vnic_wq *)tx_queue;
struct enic *enic = vnic_dev_priv(wq->vdev);
unsigned short vlan_id;
unsigned short ol_flags;
-   uint8_t last_seg, eop;
-   unsigned int host_tx_descs = 0;
+   unsigned int wq_desc_avail;
+   int head_idx;
+   struct vnic_wq_buf *buf;
+   unsigned int hw_ip_cksum_enabled;
+   unsigned int desc_count;
+   struct wq_enet_desc *descs, *desc_p, desc_tmp;
+   uint16_t mss;
+   uint8_t vlan_tag_insert;
+   uint8_t eop;
+   uint64_t bus_addr;

+   enic_cleanup_wq(enic, wq);
+   wq_desc_avail = vnic_wq_desc_avail(wq);
+   head_idx = wq->head_idx;
+   desc_count = wq->ring.desc_count;
+
+   nb_pkts = RTE_MIN(nb_pkts, ENIC_TX_XMIT_MAX);
+
+   hw_ip_cksum_enabled = enic->hw_ip_checksum;
for (index = 0; index < nb_pkts; index++) {
tx_pkt = *tx_pkts++;
-   inc_len = 0;
nb_segs = tx_pkt->nb_segs;
-   if (nb_segs > vnic_wq_desc_avail(wq)) {
+   if (nb_segs > wq_desc_avail) {
if (index > 0)
-   enic_post_wq_index(wq);
-
-   /* wq cleanup and try again */
-   if (!enic_cleanup_wq(enic, wq) ||
-   (nb_segs > vnic_wq_desc_avail(wq))) {
-   return index;
-   }
+

[dpdk-dev] [PATCH v3 08/13] enic: refactor Tx mbuf recycling

2016-06-02 Thread John Daley

Mbufs were returned to the pool one at a time. Use rte_mempool_put_bulk
instead. There were muiltiple function calls for each buffer returned.
Refactor this code into just 2 functions.

Signed-off-by: John Daley 
---
 drivers/net/enic/base/vnic_wq.h | 27 -
 drivers/net/enic/enic_rxtx.c| 54 ++---
 2 files changed, 35 insertions(+), 46 deletions(-)

diff --git a/drivers/net/enic/base/vnic_wq.h b/drivers/net/enic/base/vnic_wq.h
index fe46bb4..689b81c 100644
--- a/drivers/net/enic/base/vnic_wq.h
+++ b/drivers/net/enic/base/vnic_wq.h
@@ -177,33 +177,6 @@ buf_idx_incr(uint32_t n_descriptors, uint32_t idx)
return idx;
 }

-static inline void vnic_wq_service(struct vnic_wq *wq,
-   struct cq_desc *cq_desc, u16 completed_index,
-   void (*buf_service)(struct vnic_wq *wq,
-   struct cq_desc *cq_desc, struct vnic_wq_buf *buf, void *opaque),
-   void *opaque)
-{
-   struct vnic_wq_buf *buf;
-   unsigned int to_clean = wq->tail_idx;
-
-   buf = >bufs[to_clean];
-   while (1) {
-
-   (*buf_service)(wq, cq_desc, buf, opaque);
-
-   wq->ring.desc_avail++;
-
-
-   to_clean = buf_idx_incr(wq->ring.desc_count, to_clean);
-
-   if (to_clean == completed_index)
-   break;
-
-   buf = >bufs[to_clean];
-   }
-   wq->tail_idx = to_clean;
-}
-
 void vnic_wq_free(struct vnic_wq *wq);
 int vnic_wq_alloc(struct vnic_dev *vdev, struct vnic_wq *wq, unsigned int 
index,
unsigned int desc_count, unsigned int desc_size);
diff --git a/drivers/net/enic/enic_rxtx.c b/drivers/net/enic/enic_rxtx.c
index 2a54333..ec8d90a 100644
--- a/drivers/net/enic/enic_rxtx.c
+++ b/drivers/net/enic/enic_rxtx.c
@@ -326,33 +326,49 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
return nb_rx;
 }

-static void enic_wq_free_buf(struct vnic_wq *wq,
-   __rte_unused struct cq_desc *cq_desc,
-   struct vnic_wq_buf *buf,
-   __rte_unused void *opaque)
+static inline void enic_free_wq_bufs(struct vnic_wq *wq, u16 completed_index)
 {
-   enic_free_wq_buf(wq, buf);
-}
-
-static int enic_wq_service(struct vnic_dev *vdev, struct cq_desc *cq_desc,
-   __rte_unused u8 type, u16 q_number, u16 completed_index, void *opaque)
-{
-   struct enic *enic = vnic_dev_priv(vdev);
+   struct vnic_wq_buf *buf;
+   struct rte_mbuf *m, *free[ENIC_MAX_WQ_DESCS];
+   unsigned int nb_to_free, nb_free = 0, i;
+   struct rte_mempool *pool;
+   unsigned int tail_idx;
+   unsigned int desc_count = wq->ring.desc_count;
+
+   nb_to_free = enic_ring_sub(desc_count, wq->tail_idx, completed_index)
+  + 1;
+   tail_idx = wq->tail_idx;
+   buf = >bufs[tail_idx];
+   pool = ((struct rte_mbuf *)buf->mb)->pool;
+   for (i = 0; i < nb_to_free; i++) {
+   buf = >bufs[tail_idx];
+   m = (struct rte_mbuf *)(buf->mb);
+   if (likely(m->pool == pool)) {
+   free[nb_free++] = m;
+   } else {
+   rte_mempool_put_bulk(pool, (void *)free, nb_free);
+   free[0] = m;
+   nb_free = 1;
+   pool = m->pool;
+   }
+   tail_idx = enic_ring_incr(desc_count, tail_idx);
+   buf->mb = NULL;
+   }

-   vnic_wq_service(>wq[q_number], cq_desc,
-   completed_index, enic_wq_free_buf,
-   opaque);
+   rte_mempool_put_bulk(pool, (void **)free, nb_free);

-   return 0;
+   wq->tail_idx = tail_idx;
+   wq->ring.desc_avail += nb_to_free;
 }

-unsigned int enic_cleanup_wq(struct enic *enic, struct vnic_wq *wq)
+unsigned int enic_cleanup_wq(__rte_unused struct enic *enic, struct vnic_wq 
*wq)
 {
-   u16 completed_index = *((uint32_t *)wq->cqmsg_rz->addr) & 0x;
+   u16 completed_index;
+
+   completed_index = *((uint32_t *)wq->cqmsg_rz->addr) & 0x;

if (wq->last_completed_index != completed_index) {
-   enic_wq_service(enic->vdev, NULL, 0, wq->index,
-   completed_index, NULL);
+   enic_free_wq_bufs(wq, completed_index);
wq->last_completed_index = completed_index;
}
return 0;
-- 
2.7.0

[dpdk-dev] [PATCH v3 07/13] enic: use Tx completion messages instead of descriptors

2016-06-02 Thread John Daley

The NIC can either DMA a separate completion message for each
completed send or periodically just DMA an index of the last
completed send. Switch to the second method which improves
cache locality and performance.

Signed-off-by: John Daley 
---
 drivers/net/enic/base/vnic_wq.c |  1 +
 drivers/net/enic/base/vnic_wq.h |  3 +++
 drivers/net/enic/enic_main.c| 43 -
 drivers/net/enic/enic_rxtx.c| 11 +++
 4 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/drivers/net/enic/base/vnic_wq.c b/drivers/net/enic/base/vnic_wq.c
index ab81c7e..cfef1af 100644
--- a/drivers/net/enic/base/vnic_wq.c
+++ b/drivers/net/enic/base/vnic_wq.c
@@ -142,6 +142,7 @@ void vnic_wq_init(struct vnic_wq *wq, unsigned int cq_index,
vnic_wq_init_start(wq, cq_index, 0, 0,
error_interrupt_enable,
error_interrupt_offset);
+   wq->last_completed_index = 0;
 }

 void vnic_wq_error_out(struct vnic_wq *wq, unsigned int error)
diff --git a/drivers/net/enic/base/vnic_wq.h b/drivers/net/enic/base/vnic_wq.h
index a6759f5..fe46bb4 100644
--- a/drivers/net/enic/base/vnic_wq.h
+++ b/drivers/net/enic/base/vnic_wq.h
@@ -38,6 +38,7 @@

 #include "vnic_dev.h"
 #include "vnic_cq.h"
+#include 

 /* Work queue control */
 struct vnic_wq_ctrl {
@@ -79,6 +80,8 @@ struct vnic_wq {
unsigned int tail_idx;
unsigned int pkts_outstanding;
unsigned int socket_id;
+   const struct rte_memzone *cqmsg_rz;
+   uint16_t last_completed_index;
 };

 static inline unsigned int vnic_wq_desc_avail(struct vnic_wq *wq)
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 5bf5fcf..eaa206e 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -97,7 +97,6 @@ enic_rxmbuf_queue_release(struct enic *enic, struct vnic_rq 
*rq)
}
 }

-
 void enic_set_hdr_split_size(struct enic *enic, u16 split_hdr_size)
 {
vnic_set_hdr_split_size(enic->vdev, split_hdr_size);
@@ -235,12 +234,26 @@ void enic_init_vnic_resources(struct enic *enic)
unsigned int error_interrupt_enable = 1;
unsigned int error_interrupt_offset = 0;
unsigned int index = 0;
+   unsigned int cq_idx;

for (index = 0; index < enic->rq_count; index++) {
vnic_rq_init(>rq[index],
enic_cq_rq(enic, index),
error_interrupt_enable,
error_interrupt_offset);
+
+   cq_idx = enic_cq_rq(enic, index);
+   vnic_cq_init(>cq[cq_idx],
+   0 /* flow_control_enable */,
+   1 /* color_enable */,
+   0 /* cq_head */,
+   0 /* cq_tail */,
+   1 /* cq_tail_color */,
+   0 /* interrupt_enable */,
+   1 /* cq_entry_enable */,
+   0 /* cq_message_enable */,
+   0 /* interrupt offset */,
+   0 /* cq_message_addr */);
}

for (index = 0; index < enic->wq_count; index++) {
@@ -248,22 +261,19 @@ void enic_init_vnic_resources(struct enic *enic)
enic_cq_wq(enic, index),
error_interrupt_enable,
error_interrupt_offset);
-   }

-   vnic_dev_stats_clear(enic->vdev);
-
-   for (index = 0; index < enic->cq_count; index++) {
-   vnic_cq_init(>cq[index],
+   cq_idx = enic_cq_wq(enic, index);
+   vnic_cq_init(>cq[cq_idx],
0 /* flow_control_enable */,
1 /* color_enable */,
0 /* cq_head */,
0 /* cq_tail */,
1 /* cq_tail_color */,
0 /* interrupt_enable */,
-   1 /* cq_entry_enable */,
-   0 /* cq_message_enable */,
+   0 /* cq_entry_enable */,
+   1 /* cq_message_enable */,
0 /* interrupt offset */,
-   0 /* cq_message_addr */);
+   (u64)enic->wq[index].cqmsg_rz->phys_addr);
}

vnic_intr_init(>intr,
@@ -507,6 +517,7 @@ void enic_free_wq(void *txq)
struct vnic_wq *wq = (struct vnic_wq *)txq;
struct enic *enic = vnic_dev_priv(wq->vdev);

+   rte_memzone_free(wq->cqmsg_rz);
vnic_wq_free(wq);
vnic_cq_free(>cq[enic->rq_count + wq->index]);
 }
@@ -517,6 +528,8 @@ int enic_alloc_wq(struct enic *enic, uint16_t queue_idx,
int err;
struct vnic_wq *wq = >wq[queue_idx];
unsigned int cq_index = enic_cq_wq(enic, queue_idx);
+   char name[NAME_MAX];
+   static int instance;

wq->socket_id = socket_id;
if (nb_desc) {
@@ -552,6 +565,18 @@ int enic_alloc_wq(struct enic *enic, uint16_t queue_idx,
dev_err(enic, "error

[dpdk-dev] [PATCH v3 06/13] enic: streamline mbuf handling in Tx path

2016-06-02 Thread John Daley

The list of mbufs held by the driver on Tx was allocated in chunks
(a hold-over from the enic kernel mode driver). The structure used
next pointers across chunks which led to cache misses.

Allocate the array used to hold mbufs in flight on Tx with
rte_zmalloc_socket(). Remove unnecessary fields from the structure
and use head and tail pointers instead of next pointers.

Signed-off-by: John Daley 
---
 drivers/net/enic/base/enic_vnic_wq.h | 26 +++--
 drivers/net/enic/base/vnic_wq.c  | 75 +---
 drivers/net/enic/base/vnic_wq.h  | 56 ++-
 drivers/net/enic/enic.h  | 24 
 drivers/net/enic/enic_main.c |  4 +-
 drivers/net/enic/enic_rxtx.c | 23 +++
 6 files changed, 75 insertions(+), 133 deletions(-)

diff --git a/drivers/net/enic/base/enic_vnic_wq.h 
b/drivers/net/enic/base/enic_vnic_wq.h
index b019109..55c5664 100644
--- a/drivers/net/enic/base/enic_vnic_wq.h
+++ b/drivers/net/enic/base/enic_vnic_wq.h
@@ -40,37 +40,21 @@

 static inline void enic_vnic_post_wq_index(struct vnic_wq *wq)
 {
-   struct vnic_wq_buf *buf = wq->to_use;
-
/* Adding write memory barrier prevents compiler and/or CPU
 * reordering, thus avoiding descriptor posting before
 * descriptor is initialized. Otherwise, hardware can read
 * stale descriptor fields.
*/
wmb();
-   iowrite32(buf->index, >ctrl->posted_index);
+   iowrite32(wq->head_idx, >ctrl->posted_index);
 }

 static inline void enic_vnic_post_wq(struct vnic_wq *wq,
-void *os_buf, dma_addr_t dma_addr,
-unsigned int len, int sop,
-uint8_t desc_skip_cnt, uint8_t cq_entry,
-uint8_t compressed_send, uint64_t wrid)
+void *os_buf, uint8_t cq_entry)
 {
-   struct vnic_wq_buf *buf = wq->to_use;
-
-   buf->sop = sop;
-   buf->cq_entry = cq_entry;
-   buf->compressed_send = compressed_send;
-   buf->desc_skip_cnt = desc_skip_cnt;
-   buf->os_buf = os_buf;
-   buf->dma_addr = dma_addr;
-   buf->len = len;
-   buf->wr_id = wrid;
-
-   buf = buf->next;
-   wq->ring.desc_avail -= desc_skip_cnt;
-   wq->to_use = buf;
+   struct vnic_wq_buf *buf = >bufs[wq->head_idx];
+   buf->mb = os_buf;
+   wq->head_idx = enic_ring_incr(wq->ring.desc_count, wq->head_idx);

if (cq_entry)
enic_vnic_post_wq_index(wq);
diff --git a/drivers/net/enic/base/vnic_wq.c b/drivers/net/enic/base/vnic_wq.c
index a3ef417..ab81c7e 100644
--- a/drivers/net/enic/base/vnic_wq.c
+++ b/drivers/net/enic/base/vnic_wq.c
@@ -59,71 +59,30 @@ int vnic_wq_alloc_ring(struct vnic_dev *vdev, struct 
vnic_wq *wq,

 static int vnic_wq_alloc_bufs(struct vnic_wq *wq)
 {
-   struct vnic_wq_buf *buf;
-   unsigned int i, j, count = wq->ring.desc_count;
-   unsigned int blks = VNIC_WQ_BUF_BLKS_NEEDED(count);
-
-   for (i = 0; i < blks; i++) {
-   wq->bufs[i] = kzalloc(VNIC_WQ_BUF_BLK_SZ(count), GFP_ATOMIC);
-   if (!wq->bufs[i])
-   return -ENOMEM;
-   }
-
-   for (i = 0; i < blks; i++) {
-   buf = wq->bufs[i];
-   for (j = 0; j < VNIC_WQ_BUF_BLK_ENTRIES(count); j++) {
-   buf->index = i * VNIC_WQ_BUF_BLK_ENTRIES(count) + j;
-   buf->desc = (u8 *)wq->ring.descs +
-   wq->ring.desc_size * buf->index;
-   if (buf->index + 1 == count) {
-   buf->next = wq->bufs[0];
-   break;
-   } else if (j + 1 == VNIC_WQ_BUF_BLK_ENTRIES(count)) {
-   buf->next = wq->bufs[i + 1];
-   } else {
-   buf->next = buf + 1;
-   buf++;
-   }
-   }
-   }
-
-   wq->to_use = wq->to_clean = wq->bufs[0];
-
+   unsigned int count = wq->ring.desc_count;
+   /* Allocate the mbuf ring */
+   wq->bufs = (struct vnic_wq_buf *)rte_zmalloc_socket("wq->bufs",
+   sizeof(struct vnic_wq_buf) * count,
+   RTE_CACHE_LINE_SIZE, wq->socket_id);
+   wq->head_idx = 0;
+   wq->tail_idx = 0;
+   if (wq->bufs == NULL)
+   return -ENOMEM;
return 0;
 }

 void vnic_wq_free(struct vnic_wq *wq)
 {
struct vnic_dev *vdev;
-   unsigned int i;

vdev = wq->vdev;

vnic_dev_free_desc_ring(vdev, >ring);

-   for (i = 0; i < VNIC_WQ_BUF_BLKS_MAX; i++) {
-   if (wq->bufs[i]) {
-   kfree(wq->bufs[i]);
-   wq->bufs[i] = NULL;
-   }
-   }
-
+   rte_free(wq->bufs);
wq->ctrl = NULL;
 }

-int

[dpdk-dev] [PATCH v3 04/13] enic: put Tx and Rx functions into same file

2016-06-02 Thread John Daley

Signed-off-by: John Daley 
---
 drivers/net/enic/Makefile  |   2 +-
 drivers/net/enic/enic.h|   3 +
 drivers/net/enic/enic_ethdev.c |  65 --
 drivers/net/enic/enic_main.c   |  82 +--
 drivers/net/enic/enic_rx.c | 343 -
 drivers/net/enic/enic_rxtx.c   | 481 +
 6 files changed, 486 insertions(+), 490 deletions(-)
 delete mode 100644 drivers/net/enic/enic_rx.c
 create mode 100644 drivers/net/enic/enic_rxtx.c

diff --git a/drivers/net/enic/Makefile b/drivers/net/enic/Makefile
index f316274..3926b79 100644
--- a/drivers/net/enic/Makefile
+++ b/drivers/net/enic/Makefile
@@ -53,7 +53,7 @@ VPATH += $(SRCDIR)/src
 #
 SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_main.c
-SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_rx.c
+SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_clsf.c
 SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_res.c
 SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += base/vnic_cq.c
diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 9b6f349..62a8c12 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -208,4 +208,7 @@ extern void enic_clsf_destroy(struct enic *enic);
 uint16_t enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);

+uint16_t enicpmd_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+  uint16_t nb_pkts);
+void enic_free_wq_buf(__rte_unused struct vnic_wq *wq, struct vnic_wq_buf 
*buf);
 #endif /* _ENIC_H_ */
diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index 6bea940..fab8124 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -519,71 +519,6 @@ static void enicpmd_remove_mac_addr(struct rte_eth_dev 
*eth_dev, __rte_unused ui
enic_del_mac_address(enic);
 }

-
-static uint16_t enicpmd_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
-   uint16_t nb_pkts)
-{
-   uint16_t index;
-   unsigned int frags;
-   unsigned int pkt_len;
-   unsigned int seg_len;
-   unsigned int inc_len;
-   unsigned int nb_segs;
-   struct rte_mbuf *tx_pkt, *next_tx_pkt;
-   struct vnic_wq *wq = (struct vnic_wq *)tx_queue;
-   struct enic *enic = vnic_dev_priv(wq->vdev);
-   unsigned short vlan_id;
-   unsigned short ol_flags;
-   uint8_t last_seg, eop;
-   unsigned int host_tx_descs = 0;
-
-   for (index = 0; index < nb_pkts; index++) {
-   tx_pkt = *tx_pkts++;
-   inc_len = 0;
-   nb_segs = tx_pkt->nb_segs;
-   if (nb_segs > vnic_wq_desc_avail(wq)) {
-   if (index > 0)
-   enic_post_wq_index(wq);
-
-   /* wq cleanup and try again */
-   if (!enic_cleanup_wq(enic, wq) ||
-   (nb_segs > vnic_wq_desc_avail(wq))) {
-   return index;
-   }
-   }
-
-   pkt_len = tx_pkt->pkt_len;
-   vlan_id = tx_pkt->vlan_tci;
-   ol_flags = tx_pkt->ol_flags;
-   for (frags = 0; inc_len < pkt_len; frags++) {
-   if (!tx_pkt)
-   break;
-   next_tx_pkt = tx_pkt->next;
-   seg_len = tx_pkt->data_len;
-   inc_len += seg_len;
-
-   host_tx_descs++;
-   last_seg = 0;
-   eop = 0;
-   if ((pkt_len == inc_len) || !next_tx_pkt) {
-   eop = 1;
-   /* post if last packet in batch or > thresh */
-   if ((index == (nb_pkts - 1)) ||
-  (host_tx_descs > ENIC_TX_POST_THRESH)) {
-   last_seg = 1;
-   host_tx_descs = 0;
-   }
-   }
-   enic_send_pkt(enic, wq, tx_pkt, (unsigned short)seg_len,
- !frags, eop, last_seg, ol_flags, vlan_id);
-   tx_pkt = next_tx_pkt;
-   }
-   }
-
-   enic_cleanup_wq(enic, wq);
-   return index;
-}
-
 static const struct eth_dev_ops enicpmd_eth_dev_ops = {
.dev_configure= enicpmd_dev_configure,
.dev_start= enicpmd_dev_start,
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index e4ccc7d..f41ef86 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -58,7 +58,6 @@
 #include "vnic_cq.h"
 #include "vnic_intr.h"
 #include "vnic_nic.h"
-#include "enic_vnic_wq.h"

 static inline int enic_is_sriov_vf(struct enic *enic)
 {
@@ -104,7 +103,7 @@ void

[dpdk-dev] [PATCH v3 03/13] enic: count truncated packets

2016-06-02 Thread John Daley

Truncated packets occur on enic if an mbuf is not big enough to
receive it or there aren't enough mbufs if rx scatter is in use.
They show up as error packets but unlike other error packets (like
packets bad FCS) there are no nic drop counts incremented for them.
Truncated packets are calculated by subtracting hardware errors from
software errors. Note: this causes transient inaccuracies in the
ipackets count. Also, the length of truncated packets are counted
in ibytes even though truncated packets are dropped which can make
ibytes be slightly higher than it should be.

Signed-off-by: Nelson Escobar 
Signed-off-by: John Daley 
---
 drivers/net/enic/enic.h  |  1 +
 drivers/net/enic/enic_main.c | 21 +
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 584d49b..9b6f349 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -93,6 +93,7 @@ struct enic_fdir {

 struct enic_soft_stats {
rte_atomic64_t rx_nombuf;
+   rte_atomic64_t rx_packet_errors;
 };

 /* Per-instance private data structure */
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index c002ef3..e4ccc7d 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -215,12 +215,14 @@ static void enic_clear_soft_stats(struct enic *enic)
 {
struct enic_soft_stats *soft_stats = >soft_stats;
rte_atomic64_clear(_stats->rx_nombuf);
+   rte_atomic64_clear(_stats->rx_packet_errors);
 }

 static void enic_init_soft_stats(struct enic *enic)
 {
struct enic_soft_stats *soft_stats = >soft_stats;
rte_atomic64_init(_stats->rx_nombuf);
+   rte_atomic64_init(_stats->rx_packet_errors);
enic_clear_soft_stats(enic);
 }

@@ -234,14 +236,26 @@ void enic_dev_stats_clear(struct enic *enic)
 void enic_dev_stats_get(struct enic *enic, struct rte_eth_stats *r_stats)
 {
struct vnic_stats *stats;
-   struct enic_soft_stats *soft_stats;
+   struct enic_soft_stats *soft_stats = >soft_stats;
+   int64_t rx_truncated;
+   uint64_t rx_packet_errors;

if (vnic_dev_stats_dump(enic->vdev, )) {
dev_err(enic, "Error in getting stats\n");
return;
}

-   r_stats->ipackets = stats->rx.rx_frames_ok;
+   /* The number of truncated packets can only be calculated by
+* subtracting a hardware counter from error packets received by
+* the driver. Note: this causes transient inaccuracies in the
+* ipackets count. Also, the length of truncated packets are
+* counted in ibytes even though truncated packets are dropped
+* which can make ibytes be slightly higher than it should be.
+*/
+   rx_packet_errors = rte_atomic64_read(_stats->rx_packet_errors);
+   rx_truncated = rx_packet_errors - stats->rx.rx_errors;
+
+   r_stats->ipackets = stats->rx.rx_frames_ok - rx_truncated;
r_stats->opackets = stats->tx.tx_frames_ok;

r_stats->ibytes = stats->rx.rx_bytes_ok;
@@ -250,9 +264,8 @@ void enic_dev_stats_get(struct enic *enic, struct 
rte_eth_stats *r_stats)
r_stats->ierrors = stats->rx.rx_errors + stats->rx.rx_drop;
r_stats->oerrors = stats->tx.tx_errors;

-   r_stats->imissed = stats->rx.rx_no_bufs;
+   r_stats->imissed = stats->rx.rx_no_bufs + rx_truncated;

-   soft_stats = >soft_stats;
r_stats->rx_nombuf = rte_atomic64_read(_stats->rx_nombuf);
 }

-- 
2.7.0

[dpdk-dev] [PATCH v3 02/13] enic: drop bad packets and remove unused Rx error flag

2016-06-02 Thread John Daley

Following the discussions from:
http://dpdk.org/ml/archives/dev/2015-July/021721.html
http://dpdk.org/ml/archives/dev/2016-April/038143.html

Remove the unused flag from enic driver. Also, the enic driver is
modified to drop bad packets.

Signed-off-by: Olivier Matz 
Signed-off-by: John Daley 
---
 drivers/net/enic/enic_rx.c | 39 +--
 1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/drivers/net/enic/enic_rx.c b/drivers/net/enic/enic_rx.c
index 89c62ce..c72a80a 100644
--- a/drivers/net/enic/enic_rx.c
+++ b/drivers/net/enic/enic_rx.c
@@ -134,20 +134,15 @@ enic_cq_rx_desc_n_bytes(struct cq_desc *cqd)
 }

 static inline uint8_t
-enic_cq_rx_to_pkt_err_flags(struct cq_desc *cqd, uint64_t *pkt_err_flags_out)
+enic_cq_rx_check_err(struct cq_desc *cqd)
 {
struct cq_enet_rq_desc *cqrd = (struct cq_enet_rq_desc *)cqd;
uint16_t bwflags;
-   int ret = 0;
-   uint64_t pkt_err_flags = 0;

bwflags = enic_cq_rx_desc_bwflags(cqrd);
-   if (unlikely(enic_cq_rx_desc_packet_error(bwflags))) {
-   pkt_err_flags = PKT_RX_MAC_ERR;
-   ret = 1;
-   }
-   *pkt_err_flags_out = pkt_err_flags;
-   return ret;
+   if (unlikely(enic_cq_rx_desc_packet_error(bwflags)))
+   return 1;
+   return 0;
 }

 /*
@@ -243,7 +238,7 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
struct enic *enic = vnic_dev_priv(rq->vdev);
unsigned int rx_id;
struct rte_mbuf *nmb, *rxmb;
-   uint16_t nb_rx = 0;
+   uint16_t nb_rx = 0, nb_err = 0;
uint16_t nb_hold;
struct vnic_cq *cq;
volatile struct cq_desc *cqd_ptr;
@@ -259,7 +254,6 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
volatile struct rq_enet_desc *rqd_ptr;
dma_addr_t dma_addr;
struct cq_desc cqd;
-   uint64_t ol_err_flags;
uint8_t packet_error;

/* Check for pkts available */
@@ -280,7 +274,7 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
}

/* A packet error means descriptor and data are untrusted */
-   packet_error = enic_cq_rx_to_pkt_err_flags(, _err_flags);
+   packet_error = enic_cq_rx_check_err();

/* Get the mbuf to return and replace with one just allocated */
rxmb = rq->mbuf_ring[rx_id];
@@ -307,20 +301,21 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
rqd_ptr->length_type = cpu_to_le16(nmb->buf_len
   - RTE_PKTMBUF_HEADROOM);

+   /* Drop incoming bad packet */
+   if (unlikely(packet_error)) {
+   rte_pktmbuf_free(rxmb);
+   nb_err++;
+   continue;
+   }
+
/* Fill in the rest of the mbuf */
rxmb->data_off = RTE_PKTMBUF_HEADROOM;
rxmb->nb_segs = 1;
rxmb->next = NULL;
rxmb->port = enic->port_id;
-   if (!packet_error) {
-   rxmb->pkt_len = enic_cq_rx_desc_n_bytes();
-   rxmb->packet_type = enic_cq_rx_flags_to_pkt_type();
-   enic_cq_rx_to_pkt_flags(, rxmb);
-   } else {
-   rxmb->pkt_len = 0;
-   rxmb->packet_type = 0;
-   rxmb->ol_flags = 0;
-   }
+   rxmb->pkt_len = enic_cq_rx_desc_n_bytes();
+   rxmb->packet_type = enic_cq_rx_flags_to_pkt_type();
+   enic_cq_rx_to_pkt_flags(, rxmb);
rxmb->data_len = rxmb->pkt_len;

/* prefetch mbuf data for caller */
@@ -331,7 +326,7 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
rx_pkts[nb_rx++] = rxmb;
}

-   nb_hold += nb_rx;
+   nb_hold += nb_rx + nb_err;
cq->to_clean = rx_id;

if (nb_hold > rq->rx_free_thresh) {
-- 
2.7.0

[dpdk-dev] [PATCH v3 01/13] enic: fix Rx drop counters

2016-06-02 Thread John Daley

rx_no_bufs is a hardware counter of packets dropped on the
interface due to no host buffers and should be used to update
r_stats->imissed counter instead of rx_nombuf.

Include rx_drop in ierrors. rx_drop is incremented if packets
arrive when the receive queue is disabled.

Add a structure and functions for initializing and clearing
software counters. Add count of Rx mbuf allocation failures
(rx_nombuf) as the first counter.

Fixes: fefed3d1e62c ("enic: new driver")
Signed-off-by: John Daley 
---
 drivers/net/enic/enic.h  |  7 +++
 drivers/net/enic/enic_main.c | 24 +---
 drivers/net/enic/enic_rx.c   |  5 +
 3 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 09f3853..584d49b 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -91,6 +91,10 @@ struct enic_fdir {
struct enic_fdir_node *nodes[ENICPMD_FDIR_MAX];
 };

+struct enic_soft_stats {
+   rte_atomic64_t rx_nombuf;
+};
+
 /* Per-instance private data structure */
 struct enic {
struct enic *next;
@@ -133,6 +137,9 @@ struct enic {
/* interrupt resource */
struct vnic_intr intr;
unsigned int intr_count;
+
+   /* software counters */
+   struct enic_soft_stats soft_stats;
 };

 static inline unsigned int enic_cq_rq(__rte_unused struct enic *enic, unsigned 
int rq)
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index bbbe660..c002ef3 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -211,15 +211,30 @@ void enic_send_pkt(struct enic *enic, struct vnic_wq *wq,
  0 /*wrid*/);
 }

+static void enic_clear_soft_stats(struct enic *enic)
+{
+   struct enic_soft_stats *soft_stats = >soft_stats;
+   rte_atomic64_clear(_stats->rx_nombuf);
+}
+
+static void enic_init_soft_stats(struct enic *enic)
+{
+   struct enic_soft_stats *soft_stats = >soft_stats;
+   rte_atomic64_init(_stats->rx_nombuf);
+   enic_clear_soft_stats(enic);
+}
+
 void enic_dev_stats_clear(struct enic *enic)
 {
if (vnic_dev_stats_clear(enic->vdev))
dev_err(enic, "Error in clearing stats\n");
+   enic_clear_soft_stats(enic);
 }

 void enic_dev_stats_get(struct enic *enic, struct rte_eth_stats *r_stats)
 {
struct vnic_stats *stats;
+   struct enic_soft_stats *soft_stats;

if (vnic_dev_stats_dump(enic->vdev, )) {
dev_err(enic, "Error in getting stats\n");
@@ -232,12 +247,13 @@ void enic_dev_stats_get(struct enic *enic, struct 
rte_eth_stats *r_stats)
r_stats->ibytes = stats->rx.rx_bytes_ok;
r_stats->obytes = stats->tx.tx_bytes_ok;

-   r_stats->ierrors = stats->rx.rx_errors;
+   r_stats->ierrors = stats->rx.rx_errors + stats->rx.rx_drop;
r_stats->oerrors = stats->tx.tx_errors;

-   r_stats->imissed = stats->rx.rx_drop;
+   r_stats->imissed = stats->rx.rx_no_bufs;

-   r_stats->rx_nombuf = stats->rx.rx_no_bufs;
+   soft_stats = >soft_stats;
+   r_stats->rx_nombuf = rte_atomic64_read(_stats->rx_nombuf);
 }

 void enic_del_mac_address(struct enic *enic)
@@ -795,6 +811,8 @@ int enic_setup_finish(struct enic *enic)
 {
int ret;

+   enic_init_soft_stats(enic);
+
ret = enic_set_rss_nic_cfg(enic);
if (ret) {
dev_err(enic, "Failed to config nic, aborting.\n");
diff --git a/drivers/net/enic/enic_rx.c b/drivers/net/enic/enic_rx.c
index f92f6bc..89c62ce 100644
--- a/drivers/net/enic/enic_rx.c
+++ b/drivers/net/enic/enic_rx.c
@@ -275,10 +275,7 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
/* allocate a new mbuf */
nmb = rte_mbuf_raw_alloc(rq->mp);
if (nmb == NULL) {
-   dev_err(enic, "RX mbuf alloc failed port=%u qid=%u",
-   enic->port_id, (unsigned)rq->index);
-   rte_eth_devices[enic->port_id].
-   data->rx_mbuf_alloc_failed++;
+   rte_atomic64_inc(>soft_stats.rx_nombuf);
break;
}

-- 
2.7.0

[dpdk-dev] [PATCH v3 00/13] enic counter fixes and Tx optimization

2016-06-02 Thread John Daley

The first 3 patches are related to drop counters. The remaining
patches make up refactoring, cleanup bug fixes and optimization of
the Tx path.

Changes since v2 are:
- Piotr Azarewicz's ol_flags patch
  http://www.dpdk.org/dev/patchwork/patch/12642
- fix Tx IP and UDP/TCP checksum offload

John Daley (13):
  enic: fix Rx drop counters
  enic: drop bad packets and remove unused Rx error flag
  enic: count truncated packets
  enic: put Tx and Rx functions into same file
  enic: remove some unused functions in Tx path
  enic: streamline mbuf handling in Tx path
  enic: use Tx completion messages instead of descriptors
  enic: refactor Tx mbuf recycling
  enic: optimize the Tx function
  enic: remove unused files and functions and variables
  enic: add an enic assert macro
  enic: expand local Tx mbuf flags variable to 64-bits
  enic: fix Tx IP and UDP/TCP checksum offload

 drivers/net/enic/Makefile|   2 +-
 drivers/net/enic/base/enic_vnic_wq.h |  79 --
 drivers/net/enic/base/vnic_cq.h  |  44 
 drivers/net/enic/base/vnic_wq.c  |  80 ++
 drivers/net/enic/base/vnic_wq.h  | 118 ++---
 drivers/net/enic/enic.h  |  48 +++-
 drivers/net/enic/enic_ethdev.c   |  67 +
 drivers/net/enic/enic_main.c | 156 +--
 drivers/net/enic/enic_res.h  |  80 +-
 drivers/net/enic/enic_rx.c   | 351 -
 drivers/net/enic/enic_rxtx.c | 490 +++
 11 files changed, 642 insertions(+), 873 deletions(-)
 delete mode 100644 drivers/net/enic/base/enic_vnic_wq.h
 delete mode 100644 drivers/net/enic/enic_rx.c
 create mode 100644 drivers/net/enic/enic_rxtx.c

-- 
2.7.0

[dpdk-dev] [PATCH] mbuf: extend rte_mbuf_prefetch_part* to support more prefetching methods

2016-06-02 Thread Jianbo Liu

On 2 June 2016 at 15:10, Olivier MATZ  wrote:
> Hi Jianbo,
>
> On 06/01/2016 05:29 AM, Jianbo Liu wrote:
>>> enum rte_mbuf_prefetch_type {
>>> > PREFETCH0,
>>> > PREFETCH1,
>>> > ...
>>> > };
>>> >
>>> > static inline void
>>> > rte_mbuf_prefetch_part1(enum rte_mbuf_prefetch_type type,
>>> > struct rte_mbuf *m)
>>> > {
>>> > switch (type) {
>>> > case PREFETCH0:
>>> > rte_prefetch0(>cacheline0);
>>> > break;
>>> > case PREFETCH1:
>>> > rte_prefetch1(>cacheline0);
>>> > break;
>>> > ...
>>> > }
>>> >
>> How about adding these to forbid the illegal use of this macro?
>> enum rte_mbuf_prefetch_type {
>>  ENUM_prefetch0,
>>  ENUM_prefetch1,
>>  ...
>> };
>>
>> #define RTE_MBUF_PREFETCH_PART1(type, m) \
>> if (ENUM_##type == ENUM_prefretch0) \
>> rte_prefetch0(&(m)->cacheline0);   \
>> else if (ENUM_##type == ENUM_prefetch1) \
>> rte_prefetch1(&(m)->cacheline0); \
>> 
>>
>
> As Stephen stated, a static inline is better than a macro, mainly
> because it is understood by the compiler instead of beeing a dumb
> code replacement.
>
> Any reason why you would prefer a macro in that case?
>
For the simplicity reason. If not, we may have to write several
similar functions for different prefetchings.

[dpdk-dev] [PATCH] mbuf: extend rte_mbuf_prefetch_part* to support more prefetching methods

2016-06-02 Thread Jianbo Liu

On 1 June 2016 at 14:00, Jerin Jacob  wrote:
> On Wed, Jun 01, 2016 at 11:29:47AM +0800, Jianbo Liu wrote:
>> On 1 June 2016 at 03:28, Olivier MATZ  wrote:
>> > Hi Jianbo,
>> >
>> > On 05/31/2016 05:06 AM, Jianbo Liu wrote:
>> >> Change the inline function to macro with parameters
>> >>
>> >> Signed-off-by: Jianbo Liu 
>> >>
>> >> [...]
[...]
>> It's for performance consideration, and only on armv8a platform.
>
> Strictly it is not armv8 specific, IA also implemented this API with
> _MM_HINT_NTA hint.

I mean this patch is only for ixgbe vector PMD on armv8 platform.

>
> Do we really need non-temporal/transient version of prefetch for ixgbe?

Strictly speaking, we don't have to since we don't know how APPs use
the mbuf header.
But, is it high possibility that the second part is used only once or
short period because prefetching is done only when split_packet is not
NULL?

> If so, for x86 also it makes sense to keep it? Right?
>
> The primary use case for transient version would be use with pipe line
> line mode where the same cpu wont consume the packet.
>
> /**
>  * Prefetch a cache line into all cache levels (non-temporal/transient
>  * version)
>  *
>  * The non-temporal prefetch is intended as a prefetch hint that
>  * processor will
>  * use the prefetched data only once or short period, unlike the
>  * rte_prefetch0() function which imply that prefetched data to use
>  * repeatedly.
>  *
>  * @param p
>  *   Address to prefetch
>  */
> static inline void rte_prefetch_non_temporal(const volatile void *p);
>
>>
>> >
>> > By the way, I did not try to apply the patch, but it looks
>> > it's on top of dpdk-next-net/rel_16_07, right?
>> >
>> Yes

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Neil Horman

On Thu, Jun 02, 2016 at 07:41:10PM +, Wiles, Keith wrote:
> 
> On 6/2/16, 12:11 PM, "Neil Horman"  wrote:
> 
> >
> >1) The definition of a config structure that can be passed to rte_eal_init,
> >defining the configuration for that running process
> 
> Having a configuration structure means we have to have an ABI change to that 
> structure anytime we add or remove an option. I was thinking a very simple DB 
> of some kind would be better. Have the code query the DB to obtain the needed 
> information. The APIs used to query and set the DB needs to be very easy to 
> use as well.

Thats a fair point.  A decent starting point is likely a simple struct that
looks like this:

struct key_vals {
char *key;
union {
ulong longval;
void *ptrval;
} value;
};

struct config {
size_t count;
struct key_vals kvp[0];
};

> 
> Maybe each option can define its own structure if needed or just a simple 
> variable type can be used for the basic types (int, string, bool, ?)
> 
Well, if you have config sections that require mulitiple elements, I'd handle
that with naming, i.e. if you have a config group that has an int and char
value, I'd name them "group.intval", and "group.charval", so they are
independently searchable, but linked from a nomenclature standpoint.

> Would this work better in the long run, does a fixed structure still make 
> sense?
> 
No. I think you're ABI concerns are valid, but the above is likely a good
starting point to address them.

Best
Neil


> >
> >2) The creation and use of an API that various DPDK libraries can use to
> >retrieve that structure (or elements thereof), based on some explicit or 
> >imlicit
> >id, so that the configuration can be used (I'm thinking here specifically of
> >multiple dpdk applications using a dpdk shared library)
> >
> >3) The removal of the eal_parse_args code from the core dpdk library 
> >entirely,
> >packaging it instead as its own library that interprets command line 
> >arguments
> >as currently defined, and populates an instance of the structure defined in 
> >(1)
> >
> >4) Altering the Makefiles, so that the example apps link against the new 
> >library
> >in (3), altering the app source code to work with the config structure 
> >defined
> >in (1)
> >
> >With those steps, I think we will remove the command line bits from the dpdk
> >core, and do so without altering the user experience for any of the sample 
> >apps
> >(which will demonstrate to other developers that the same can be done with 
> >their
> >applications).  From there we will be free to create alternate methods of
> >populating the config struct defined in (1) (via JSON file, YAML, XML, or
> >whatever).
> >
> >Neil
> >
> >> >> 
> >> >> For the purposes of the example apps, it would seem that either JSON, 
> >> >> YAML, or
> >> >> the above Lua format would work just fine.
> >> >
> >> >+1
> >> >
> >> 
> >> Regards,
> >> ++Keith
> >> 
> >> 
> >
> 
> 
>

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Yuanhan Liu

On Wed, Jun 01, 2016 at 03:00:11PM +, Wiles, Keith wrote:
> I have been looking at a number of different options here and the direction I 
> was thinking was using a file for the options and configurations with the 
> data in a clean format.

It should be helpful and handy for productive usage. But for development
and debugging, I'd say CLI options is more convenient and flexible. I
would be more willing to fiddle with CLI options than editing config files.

In another word, +1, but I would also assume that we will keep the CLI
options.

> It could have been a INI file or JSON or XML, but they all seem to have some 
> problems I do not like. The INI file is too flat and I wanted a hierarchy in 
> the data, the JSON data is similar and XML is just hard to read. I wanted to 
> be able to manage multiple applications and possible system the DPDK/app 
> runs. The problem with the above formats is they are just data and not easy 
> to make decisions about the system and applications at runtime.

__Just__ want to increase the chaos a bit, here is another option:
YAML, which supports comments.

> 
> If the ?database? of information could be queried by the EAL, drivers and 
> application then we do not need to try and create a complex command line. It 
> would be nice to execute a DPDK applications like this:
> 
> ./some_dpdk_app ?config-file dpdk-config-filename

It could be simpler if you hardcode a default config file, say
/etc/dpdk.conf.

I'm thinking OVS guys would be happy to see that? :)

--yliu

[dpdk-dev] [PATCH v6 1/5] mempool: support external handler

2016-06-02 Thread Jan Viktorin

On Thu, 2 Jun 2016 12:23:41 +0100
"Hunt, David"  wrote:

> On 6/1/2016 6:54 PM, Jan Viktorin wrote:
> >
> > mp->populated_size--;
> > @@ -383,13 +349,16 @@ rte_mempool_populate_phys(struct rte_mempool *mp, 
> > char *vaddr,
> > unsigned i = 0;
> > size_t off;
> > struct rte_mempool_memhdr *memhdr;
> > -   int ret;
> >   
> > /* create the internal ring if not already done */
> > if ((mp->flags & MEMPOOL_F_RING_CREATED) == 0) {
> > -   ret = rte_mempool_ring_create(mp);
> > -   if (ret < 0)
> > -   return ret;
> > +   rte_errno = 0;
> > +   mp->pool = rte_mempool_ops_alloc(mp);
> > +   if (mp->pool == NULL) {
> > +   if (rte_errno == 0)
> > +   return -EINVAL;
> > +   return -rte_errno;
> > Are you sure the rte_errno is a positive value?  
> 
> If the user returns EINVAL, or similar, we want to return negative, as 
> per the rest of DPDK.

Oh, yes... you're right.

> 
> >> @@ -204,9 +205,13 @@ struct rte_mempool_memhdr {
> >>   struct rte_mempool {
> >>char name[RTE_MEMPOOL_NAMESIZE]; /**< Name of mempool. */
> >>struct rte_ring *ring;   /**< Ring to store objects. */
> >> +  union {
> >> +  void *pool;  /**< Ring or pool to store objects */  
> > What about calling this pdata or priv? I think, it can improve some doc 
> > comments.
> > Describing a "pool" may refer to both the rte_mempool itself or to the 
> > mp->pool
> > pointer. The "priv" would help to understand the code a bit.
> >
> > Then the rte_mempool_alloc_t can be called rte_mempool_priv_alloc_t. Or 
> > something
> > similar. It's than clear enough, what the function should do...  
> 
> I'd lean towards pdata or maybe pool_data. Not sure about the function 
> name change though,
> the function does return a pool_data pointer, which we put into 
> mp->pool_data.

Yes. But from the name of the function, it is difficult to understand what is 
its
purpose.

> 
> >> +  uint64_t *pool_id;   /**< External mempool identifier */  
> > Why is pool_id a pointer? Is it a typo? I've understood it should be 64b 
> > wide
> > from the discussion (Olivier, Jerin, David):  
> 

[...]

> 
> >> +typedef void (*rte_mempool_free_t)(void *p);
> >> +
> >> +/**
> >> + * Put an object in the external pool.
> >> + * The *p pointer is the opaque data for a given mempool handler (ring,
> >> + * array, linked list, etc)  
> > The obj_table is not documented. Is it really a table? I'd called an array 
> > instead.  
> 
> You're probably right, but it's always been called obj_table, and I'm 
> not sure
> this patchset is the place to change it.  Maybe a follow up patch?

In that case, it's OK.

> 
> >> + */
> >> +typedef int (*rte_mempool_put_t)(void *p, void * const *obj_table, 
> >> unsigned n);  
> > unsigned int  
> Done
> >  
> >> +
> >> +/** Get an object from the external pool. */
> >> +typedef int (*rte_mempool_get_t)(void *p, void **obj_table, unsigned n);  
> > unsigned int  
> Done
> >  
> >> +
> >> +/** Return the number of available objects in the external pool. */  
> > Is the number of available objects or the total number of all objects
> > (so probably a constant value)?  
> 
> It's intended to be the umber of available objects

OK, forget it.

> 
> >  
> >> +typedef unsigned (*rte_mempool_get_count)(void *p);  
> > Should it be const void *p?  
> 
> Yes, it could be. Changed.
> 
> >  

[...]

> 
> >> + * @return
> >> + *   - 0: Sucess; the new handler is configured.
> >> + *   - EINVAL - Invalid handler name provided
> >> + *   - EEXIST - mempool already has a handler assigned  
> > They are returned as -EINVAL and -EEXIST.
> >
> > IMHO, using "-" here is counter-intuitive:
> >
> >   - EINVAL
> >
> > does it mean a positive with "-" or negative value?  
> 
> EINVAL is positive, so it's returning negative. Common usage in DPDK, 
> afaics.

Yes, of course. But it is not so clear from the doc. I've already wrote
a code checking the positive error codes. That was because of no "minus
sign" in the doc. So, my code was wrong and it took me a while to find
out the reason... I supposed, the positive value was intentional. Finally,
I had to lookup the source code (the calling path) to verify...

> 
> 
> >> + */
> >> +int
> >> +rte_mempool_set_handler(struct rte_mempool *mp, const char *name);  
> > rte_mempool_set_ops
> >
> > What about rte_mempool_set_ops_byname? Not a big deal...  
> 
> I agree.  rte_mempool_set_ops_byname
> 
> >> +
> >> +/**
> >> + * Register an external pool handler.  
> > Register mempool operations  
> 
> Done
> 
> >> + *
> >> + * @param h
> >> + *   Pointer to the external pool handler
> >> + * @return
> >> + *   - >=0: Sucess; return the index of the handler in the table.
> >> + *   - EINVAL - missing callbacks while registering handler
> >> + *   - ENOSPC - the maximum number of handlers has been reached  
> > - -EINVAL
> > - -ENOSPC  
> 
> :)

[dpdk-dev] [PATCH 0/5] cxgbe: add features to CXGBE PMD

2016-06-02 Thread Bruce Richardson

On Fri, May 06, 2016 at 01:13:14PM +0530, Rahul Lakkireddy wrote:
> This patch series add some features to CXGBE PMD.
> 
> Patch 1 fixes a bug where reading/writing PCI config space in BSD fails
> with EPERM due to missing write permission when opening /dev/pci/.
> 
> Patch 2 adds support to access PCI config space for CXGBE PMD.
> 
> Patch 3 programs PCIe completion timeout to 4 sec.
> 
> Patch 4 adds support to get/set EEPROM.
> 
> Patch 5 adds support to get register dump.
> 
> Rahul Lakkireddy (5):
>   pci: fix access to PCI config space in bsd
>   cxgbe: add support to access PCI config space
>   cxgbe: set default PCIe completion timeout
>   cxgbe: add support to get/set EEPROM
>   cxgbe: add support to get register dump
>
Applied to dpdk-next-net/rel_16_07

/Bruce

[dpdk-dev] [PATCH v5 0/6] Virtio-net PMD: QEMU QTest extension for container

2016-06-02 Thread Yuanhan Liu

On Thu, Jun 02, 2016 at 12:29:39PM +0900, Tetsuya Mukawa wrote:
> The patches will work on below patch series.
>  - [PATCH v5 0/8] virtio support for container
> 
> It seems his implementation will be changed a bit.
> So, this patch series are also going to be changed to follow his 
> implementation.

Hi Tetsuya,

TBH, I was considering to reject your v4: the code was quite messy. But
this v5 changed my mind a bit: it's much cleaner.

But still, I'd ask do we really need 2 virtio for container solutions?

That results to the same question that I'm sure you have already
answered before: in which way your solution outweighs Jianfeng's?

The reason I want to ask again is: 1), I wasn't actively participating
the discussion in last release, besides some common comments on virtio,
2), maybe it's time to make a decision that should we take one solution
only, if so, which one, or should we take both?

Thomas is Cc'ed, hope he can help on the decision making.

--yliu

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Thomas Monjalon

2016-06-02 06:41, Neil Horman:
> I'm not sure why you're focusing no selecting a config file format at all.  
> Why
> not just focus on removing the argument parsing from the core rte_eal_init 
> code,
> instead passing in a configuration struct that is stored and queried per
> application.  Leave the parsing of a config file and population of that config
> struct as an exercize to the application developer.  That way a given
> application can use command line options, config files, or whatever method 
> they
> choose, which would be in keeping with traditional application design.
> 
> For the purposes of the example apps, it would seem that either JSON, YAML, or
> the above Lua format would work just fine.

+1

[dpdk-dev] [PATCH] scripts: check commits with checkpatch

2016-06-02 Thread Thomas Monjalon

2016-06-02 13:33, Bruce Richardson:
> Testing this out here, I find that git format-patch includes the diff stats
> in the output, which then can trigger long-line warnings for the commit 
> message.
> 
>   WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description 
> (prefer a maximum 75 chars per line)
>   #8:
>examples/my-new-test-app/basicfwd.c | 342 
> +
> 
> To fix this, I suggest replacing "format-patch --stdout" with "show 
> --format=email"
> since that should give the same output just without the change stats.

or format-patch --no-stat

[dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy

2016-06-02 Thread Jerin Jacob

On Thu, Jun 02, 2016 at 09:36:34AM +0200, Olivier MATZ wrote:
> Hi Jerin,
> 
> On 06/01/2016 09:00 AM, Jerin Jacob wrote:
> > On Tue, May 31, 2016 at 11:05:30PM +0200, Olivier MATZ wrote:
> >> Today, the objects pointers are reversed only in the get(). It means
> >> that this code:
> >>
> >>rte_mempool_get_bulk(mp, table, 4);
> >>for (i = 0; i < 4; i++)
> >>printf("obj = %p\n", t[i]);
> >>rte_mempool_put_bulk(mp, table, 4);
> >>
> >>
> >>printf("-\n");
> >>rte_mempool_get_bulk(mp, table, 4);
> >>for (i = 0; i < 4; i++)
> >>printf("obj = %p\n", t[i]);
> >>rte_mempool_put_bulk(mp, table, 4);
> >>
> >> prints:
> >>
> >>addr1
> >>addr2
> >>addr3
> >>addr4
> >>-
> >>addr4
> >>addr3
> >>addr2
> >>addr1
> >>
> >> Which is quite strange.
> > 
> > IMO, It is the expected LIFO behavior. Right ?
> > 
> > What is not expected is the following, which is the case after change. Or 
> > Am I
> > missing something here?
> > 
> > addr1
> > addr2
> > addr3
> > addr4
> > -
> > addr1
> > addr2
> > addr3
> > addr4
> > 
> >>
> >> I don't think it would be an issue to replace the loop by a
> >> rte_memcpy(), it may increase the copy speed and it will be
> >> more coherent with the put().
> >>
> 
> I think the LIFO behavior should occur on a per-bulk basis. I mean,
> it should behave like in the exemplaes below:
> 
>   // pool cache is in state X
>   obj1 = mempool_get(mp)
>   obj2 = mempool_get(mp)
>   mempool_put(mp, obj2)
>   mempool_put(mp, obj1)
>   // pool cache is back in state X
> 
>   // pool cache is in state X
>   bulk1 = mempool_get_bulk(mp, 16)
>   bulk2 = mempool_get_bulk(mp, 16)
>   mempool_put_bulk(mp, bulk2, 16)
>   mempool_put_bulk(mp, bulk1, 16)
>   // pool cache is back in state X
> 

Per entry LIFO behavior make more sense in _bulk_ case as recently en-queued 
buffer
comes out for next "get" makes more chance that buffer in Last level cache.

> Note that today it's not the case for bulks, since object addresses
> are reversed only in get(), we are not back in the original state.
> I don't really see the advantage of this.
> 
> Removing the reversing may accelerate the cache in case of bulk get,
> I think.

I tried in my setup, it was dropping the performance. Have you got
improvement in any setup?

Jerin

> 
> Regards,
> Olivier

[dpdk-dev] [PATCH] mbuf: extend rte_mbuf_prefetch_part* to support more prefetching methods

2016-06-02 Thread Jerin Jacob

On Thu, Jun 02, 2016 at 05:04:13PM +0800, Jianbo Liu wrote:
> On 1 June 2016 at 14:00, Jerin Jacob  
> wrote:
> > On Wed, Jun 01, 2016 at 11:29:47AM +0800, Jianbo Liu wrote:
> >> On 1 June 2016 at 03:28, Olivier MATZ  wrote:
> >> > Hi Jianbo,
> >> >
> >> > On 05/31/2016 05:06 AM, Jianbo Liu wrote:
> >> >> Change the inline function to macro with parameters
> >> >>
> >> >> Signed-off-by: Jianbo Liu 
> >> >>
> >> >> [...]
> [...]
> >> It's for performance consideration, and only on armv8a platform.
> >
> > Strictly it is not armv8 specific, IA also implemented this API with
> > _MM_HINT_NTA hint.
> 
> I mean this patch is only for ixgbe vector PMD on armv8 platform.
> 
> >
> > Do we really need non-temporal/transient version of prefetch for ixgbe?
> 
> Strictly speaking, we don't have to since we don't know how APPs use
> the mbuf header.

Then IMO it makes sense to keep the same behavior as x86 ixgbe driver.
Then on the upside, We may not need the new macros for part prefetching

Jerin

> But, is it high possibility that the second part is used only once or
> short period because prefetching is done only when split_packet is not
> NULL?
> 
> > If so, for x86 also it makes sense to keep it? Right?
> >
> > The primary use case for transient version would be use with pipe line
> > line mode where the same cpu wont consume the packet.
> >
> > /**
> >  * Prefetch a cache line into all cache levels (non-temporal/transient
> >  * version)
> >  *
> >  * The non-temporal prefetch is intended as a prefetch hint that
> >  * processor will
> >  * use the prefetched data only once or short period, unlike the
> >  * rte_prefetch0() function which imply that prefetched data to use
> >  * repeatedly.
> >  *
> >  * @param p
> >  *   Address to prefetch
> >  */
> > static inline void rte_prefetch_non_temporal(const volatile void *p);
> >
> >>
> >> >
> >> > By the way, I did not try to apply the patch, but it looks
> >> > it's on top of dpdk-next-net/rel_16_07, right?
> >> >
> >> Yes

[dpdk-dev] [PATCH] app/testpmd: log mbuf pool creation

2016-06-02 Thread De Lara Guarch, Pablo



> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Monday, May 30, 2016 1:04 PM
> To: dev at dpdk.org
> Cc: De Lara Guarch, Pablo
> Subject: [PATCH] app/testpmd: log mbuf pool creation
> 
> Enhance the logs related to mbuf pool creation. Display an info level
> log when creating the mbuf, and display the error as a string on failure.
> 
> After the patch, we have:
> 
>   [...]
>   EAL:   probe driver: 8086:10fb rte_ixgbe_pmd
>   USER1: create a new mbuf pool : n=331456, \
>   size=2176, socket=0
>   EAL: Error - exiting with code: 1
> Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate \
>   memory
> 
> Signed-off-by: Olivier Matz 

Acked-by: Pablo de Lara

[dpdk-dev] [PATCH v3] virtio: split virtio rx/tx queue

2016-06-02 Thread Yuanhan Liu

On Thu, Jun 02, 2016 at 06:38:49AM +, Xie, Huawei wrote:
> On 6/1/2016 3:13 PM, Yuanhan Liu wrote:
> > On Mon, May 30, 2016 at 05:06:20PM +0800, Huawei Xie wrote:
> >> We keep a common vq structure, containing only vq related fields,
> >> and then split others into RX, TX and control queue respectively.
> >>
> >> Signed-off-by: Huawei Xie 
> >> ---
> >> v2:
> >> - don't split virtio_dev_rx/tx_queue_setup
> >> v3:
> >> - fix some 80 char warnings
> >> - fix other newer version checkpatch warnings
> >> - remove '\n' in PMD_RX_LOG
> > Weird, I still saw them.
> 
> Maybe missed this.
> 
> 
> >
> > Besides that, I found a crash issue with this patch applied. You could
> > easily reproduce it by:
> >
> > testpmd> start tx_first
> > testpmd> quit
> >
> > [82774.055297] testpmd[9661]: segfault at 94 ip 004f7ef0 sp 
> > 7ffcb1fa
> > 66c0 error 4 in testpmd[40+25b000]
> > ./t.pmd: line 11:  9661 Segmentation fault   (core dumped) 
> > $RTE_SDK/$RTE
> > _TARGET/app/testpmd -c 0x1f -n 4 -- --nb-cores=4 -i --disable-hw-vlan 
> > --txqflags
> >  0xf00 --rxq=$nr_queues --txq=$nr_queues --rxd=256 --txd=256
> >
> > --yliu
> >
> 
> Couldn't reproduce. Seems like resource free issue. Do you test with
> multiple queues?

Nope, nr_queues is 1 by default.

--yliu

[dpdk-dev] [PATCH v7 0/5] mempool: add external mempool manager

2016-06-02 Thread Hunt, David

Since the cover letter seems to have gone missing, sending it again:

Here's the latest version of the External Mempool Manager patchset.
It's re-based on top of the latest head as of 19/5/2016, including
Olivier's 35-part patch series on mempool re-org [1]

[1] http://dpdk.org/ml/archives/dev/2016-May/039229.html


v7 changes:

  * Changed rte_mempool_handler_table to rte_mempool_ops_table
  * Changed hander_idx to ops_index in rte_mempool struct
  * Reworked comments in rte_mempool.h around ops functions
  * Changed rte_mempool_hander.c to rte_mempool_ops.c
  * Changed all functions containing _handler_ to _ops_
  * Now there is no mention of 'handler' left
  * Other small changes out of review of mailing list

v6 changes:

  * Moved the flags handling from rte_mempool_create_empty to
rte_mempool_create, as it's only there for backward compatibility
  * Various comment additions and cleanup
  * Renamed rte_mempool_handler to rte_mempool_ops
  * Added a union for *pool and u64 pool_id in struct rte_mempool
  * split the original patch into a few parts for easier review.
  * rename functions with _ext_ to _ops_.
  * addressed review comments
  * renamed put and get functions to enqueue and dequeue
  * changed occurences of rte_mempool_ops to const, as they
contain function pointers (security)
  * split out the default external mempool handler into a separate
patch for easier review

v5 changes:
  * rebasing, as it is dependent on another patch series [1]

v4 changes (Olivier Matz):
  * remove the rte_mempool_create_ext() function. To change the handler, the
user has to do the following:
- mp = rte_mempool_create_empty()
- rte_mempool_set_handler(mp, "my_handler")
- rte_mempool_populate_default(mp)
This avoids to add another function with more than 10 arguments, 
duplicating
the doxygen comments
  * change the api of rte_mempool_alloc_t: only the mempool pointer is 
required
as all information is available in it
  * change the api of rte_mempool_free_t: remove return value
  * move inline wrapper functions from the .c to the .h (else they won't be
inlined). This implies to have one header file (rte_mempool.h), or it
would have generate cross dependencies issues.
  * remove now unused MEMPOOL_F_INT_HANDLER (note: it was misused anyway due
to the use of && instead of &)
  * fix build in debug mode (__MEMPOOL_STAT_ADD(mp, put_pool, n) remaining)
  * fix build with shared libraries (global handler has to be declared in
the .map file)
  * rationalize #include order
  * remove unused function rte_mempool_get_handler_name()
  * rename some structures, fields, functions
  * remove the static in front of rte_tailq_elem rte_mempool_tailq (comment
from Yuanhan)
  * test the ext mempool handler in the same file than standard mempool 
tests,
avoiding to duplicate the code
  * rework the custom handler in mempool_test
  * rework a bit the patch selecting default mbuf pool handler
  * fix some doxygen comments

v3 changes:
  * simplified the file layout, renamed to rte_mempool_handler.[hc]
  * moved the default handlers into rte_mempool_default.c
  * moved the example handler out into app/test/test_ext_mempool.c
  * removed is_mc/is_mp change, slight perf degredation on sp cached 
operation
  * removed stack hanler, may re-introduce at a later date
  * Changes out of code reviews

v2 changes:
  * There was a lot of duplicate code between rte_mempool_xmem_create and
rte_mempool_create_ext. This has now been refactored and is now
hopefully cleaner.
  * The RTE_NEXT_ABI define is now used to allow building of the library
in a format that is compatible with binaries built against previous
versions of DPDK.
  * Changes out of code reviews. Hopefully I've got most of them included.

The External Mempool Manager is an extension to the mempool API that allows
users to add and use an external mempool manager, which allows external 
memory
subsystems such as external hardware memory management systems and software
based memory allocators to be used with DPDK.

The existing API to the internal DPDK mempool manager will remain unchanged
and will be backward compatible. However, there will be an ABI breakage, as
the mempool struct is changing. These changes are all contained withing
RTE_NEXT_ABI defs, and the current or next code can be changed with
the CONFIG_RTE_NEXT_ABI config setting

There are two aspects to external mempool manager.
   1. Adding the code for your new mempool operations (ops). This is
  achieved by adding a new mempool ops source file into the
  librte_mempool library, and using the REGISTER_MEMPOOL_HANDLER macro.
   2. Using the new API to call rte_mempool_create_empty and
  rte_mempool_set_ops to create a new mempool
  using the name parameter to identify which ops to use.

New API calls added
  1. A new rte_mempool_create_empty() function
  2. rte_mempool_set_ops_byname() which sets the mempool's ops (functions)

[dpdk-dev] [PATCH v7 5/5] mbuf: allow apps to change default mempool ops

2016-06-02 Thread David Hunt

By default, the mempool ops used for mbuf allocations is a multi
producer and multi consumer ring. We could imagine a target (maybe some
network processors?) that provides an hardware-assisted pool
mechanism. In this case, the default configuration for this architecture
would contain a different value for RTE_MBUF_DEFAULT_MEMPOOL_OPS.

Signed-off-by: Olivier Matz 
Signed-off-by: David Hunt 
---
 config/common_base |  1 +
 lib/librte_mbuf/rte_mbuf.c | 26 ++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/config/common_base b/config/common_base
index 47c26f6..899c038 100644
--- a/config/common_base
+++ b/config/common_base
@@ -394,6 +394,7 @@ CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n
 #
 CONFIG_RTE_LIBRTE_MBUF=y
 CONFIG_RTE_LIBRTE_MBUF_DEBUG=n
+CONFIG_RTE_MBUF_DEFAULT_MEMPOOL_OPS="ring_mp_mc"
 CONFIG_RTE_MBUF_REFCNT_ATOMIC=y
 CONFIG_RTE_PKTMBUF_HEADROOM=128

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index eec1456..491230c 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -153,6 +153,7 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
int socket_id)
 {
+   struct rte_mempool *mp;
struct rte_pktmbuf_pool_private mbp_priv;
unsigned elt_size;

@@ -167,10 +168,27 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
mbp_priv.mbuf_data_room_size = data_room_size;
mbp_priv.mbuf_priv_size = priv_size;

-   return rte_mempool_create(name, n, elt_size,
-   cache_size, sizeof(struct rte_pktmbuf_pool_private),
-   rte_pktmbuf_pool_init, _priv, rte_pktmbuf_init, NULL,
-   socket_id, 0);
+   mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
+sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
+   if (mp == NULL)
+   return NULL;
+
+   rte_errno = rte_mempool_set_ops_byname(mp,
+   RTE_MBUF_DEFAULT_MEMPOOL_OPS);
+   if (rte_errno != 0) {
+   RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
+   return NULL;
+   }
+   rte_pktmbuf_pool_init(mp, _priv);
+
+   if (rte_mempool_populate_default(mp) < 0) {
+   rte_mempool_free(mp);
+   return NULL;
+   }
+
+   rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
+
+   return mp;
 }

 /* do some sanity checks on a mbuf: panic if it fails */
-- 
2.5.5

[dpdk-dev] [PATCH v7 4/5] app/test: test external mempool manager

2016-06-02 Thread David Hunt

Use a minimal custom mempool external ops and check that it also
passes basic mempool autotests.

Signed-off-by: Olivier Matz 
Signed-off-by: David Hunt 
---
 app/test/test_mempool.c | 114 
 1 file changed, 114 insertions(+)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index b586249..52d6f4e 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -83,6 +83,97 @@
 static rte_atomic32_t synchro;

 /*
+ * Simple example of custom mempool structure. Holds pointers to all the
+ * elements which are simply malloc'd in this example.
+ */
+struct custom_mempool {
+   rte_spinlock_t lock;
+   unsigned count;
+   unsigned size;
+   void *elts[];
+};
+
+/*
+ * Loop through all the element pointers and allocate a chunk of memory, then
+ * insert that memory into the ring.
+ */
+static void *
+custom_mempool_alloc(const struct rte_mempool *mp)
+{
+   struct custom_mempool *cm;
+
+   cm = rte_zmalloc("custom_mempool",
+   sizeof(struct custom_mempool) + mp->size * sizeof(void *), 0);
+   if (cm == NULL)
+   return NULL;
+
+   rte_spinlock_init(>lock);
+   cm->count = 0;
+   cm->size = mp->size;
+   return cm;
+}
+
+static void
+custom_mempool_free(void *p)
+{
+   rte_free(p);
+}
+
+static int
+custom_mempool_put(void *p, void * const *obj_table, unsigned n)
+{
+   struct custom_mempool *cm = (struct custom_mempool *)p;
+   int ret = 0;
+
+   rte_spinlock_lock(>lock);
+   if (cm->count + n > cm->size) {
+   ret = -ENOBUFS;
+   } else {
+   memcpy(>elts[cm->count], obj_table, sizeof(void *) * n);
+   cm->count += n;
+   }
+   rte_spinlock_unlock(>lock);
+   return ret;
+}
+
+
+static int
+custom_mempool_get(void *p, void **obj_table, unsigned n)
+{
+   struct custom_mempool *cm = (struct custom_mempool *)p;
+   int ret = 0;
+
+   rte_spinlock_lock(>lock);
+   if (n > cm->count) {
+   ret = -ENOENT;
+   } else {
+   cm->count -= n;
+   memcpy(obj_table, >elts[cm->count], sizeof(void *) * n);
+   }
+   rte_spinlock_unlock(>lock);
+   return ret;
+}
+
+static unsigned
+custom_mempool_get_count(void *p)
+{
+   struct custom_mempool *cm = (struct custom_mempool *)p;
+
+   return cm->count;
+}
+
+static struct rte_mempool_ops mempool_ops_custom = {
+   .name = "custom_handler",
+   .alloc = custom_mempool_alloc,
+   .free = custom_mempool_free,
+   .put = custom_mempool_put,
+   .get = custom_mempool_get,
+   .get_count = custom_mempool_get_count,
+};
+
+MEMPOOL_REGISTER_OPS(mempool_ops_custom);
+
+/*
  * save the object number in the first 4 bytes of object data. All
  * other bytes are set to 0.
  */
@@ -477,6 +568,7 @@ test_mempool(void)
 {
struct rte_mempool *mp_cache = NULL;
struct rte_mempool *mp_nocache = NULL;
+   struct rte_mempool *mp_ext = NULL;

rte_atomic32_init();

@@ -505,6 +597,27 @@ test_mempool(void)
goto err;
}

+   /* create a mempool with an external handler */
+   mp_ext = rte_mempool_create_empty("test_ext",
+   MEMPOOL_SIZE,
+   MEMPOOL_ELT_SIZE,
+   RTE_MEMPOOL_CACHE_MAX_SIZE, 0,
+   SOCKET_ID_ANY, 0);
+
+   if (mp_ext == NULL) {
+   printf("cannot allocate mp_ext mempool\n");
+   goto err;
+   }
+   if (rte_mempool_set_ops_byname(mp_ext, "custom_handler") < 0) {
+   printf("cannot set custom handler\n");
+   goto err;
+   }
+   if (rte_mempool_populate_default(mp_ext) < 0) {
+   printf("cannot populate mp_ext mempool\n");
+   goto err;
+   }
+   rte_mempool_obj_iter(mp_ext, my_obj_init, NULL);
+
/* retrieve the mempool from its name */
if (rte_mempool_lookup("test_nocache") != mp_nocache) {
printf("Cannot lookup mempool from its name\n");
@@ -545,6 +658,7 @@ test_mempool(void)
 err:
rte_mempool_free(mp_nocache);
rte_mempool_free(mp_cache);
+   rte_mempool_free(mp_ext);
return -1;
 }

-- 
2.5.5

[dpdk-dev] [PATCH v7 3/5] mempool: add default external mempool ops

2016-06-02 Thread David Hunt

The first patch in this series added the framework for an external
mempool manager. This patch in the series adds a set of default
ops (functioni callbacks) based on rte_ring.

Signed-off-by: Olivier Matz 
Signed-off-by: David Hunt 
---
 lib/librte_mempool/Makefile  |   1 +
 lib/librte_mempool/rte_mempool.h |   2 +
 lib/librte_mempool/rte_mempool_default.c | 153 +++
 3 files changed, 156 insertions(+)
 create mode 100644 lib/librte_mempool/rte_mempool_default.c

diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 4cbf772..8cac29b 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 2
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_default.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index c33eeb8..b90f57c 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -413,6 +413,8 @@ extern struct rte_mempool_ops_table rte_mempool_ops_table;
 static inline struct rte_mempool_ops *
 rte_mempool_ops_get(int ops_index)
 {
+   RTE_VERIFY(ops_index < RTE_MEMPOOL_MAX_OPS_IDX);
+
return _mempool_ops_table.ops[ops_index];
 }

diff --git a/lib/librte_mempool/rte_mempool_default.c 
b/lib/librte_mempool/rte_mempool_default.c
new file mode 100644
index 000..f2e7d95
--- /dev/null
+++ b/lib/librte_mempool/rte_mempool_default.c
@@ -0,0 +1,153 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+static int
+common_ring_mp_put(void *p, void * const *obj_table, unsigned n)
+{
+   return rte_ring_mp_enqueue_bulk((struct rte_ring *)p, obj_table, n);
+}
+
+static int
+common_ring_sp_put(void *p, void * const *obj_table, unsigned n)
+{
+   return rte_ring_sp_enqueue_bulk((struct rte_ring *)p, obj_table, n);
+}
+
+static int
+common_ring_mc_get(void *p, void **obj_table, unsigned n)
+{
+   return rte_ring_mc_dequeue_bulk((struct rte_ring *)p, obj_table, n);
+}
+
+static int
+common_ring_sc_get(void *p, void **obj_table, unsigned n)
+{
+   return rte_ring_sc_dequeue_bulk((struct rte_ring *)p, obj_table, n);
+}
+
+static unsigned
+common_ring_get_count(void *p)
+{
+   return rte_ring_count((struct rte_ring *)p);
+}
+
+
+static void *
+common_ring_alloc(const struct rte_mempool *mp)
+{
+   int rg_flags = 0, ret;
+   char rg_name[RTE_RING_NAMESIZE];
+   struct rte_ring *r;
+
+   ret = snprintf(rg_name, sizeof(rg_name),
+   RTE_MEMPOOL_MZ_FORMAT, mp->name);
+   if (ret < 0 || ret >= (int)sizeof(rg_name)) {
+   rte_errno = ENAMETOOLONG;
+   return NULL;
+   }
+
+   /* ring flags */
+   if (mp->flags & MEMPOOL_F_SP_PUT)
+   rg_flags |= RING_F_SP_ENQ;
+   if (mp->flags & MEMPOOL_F_SC_GET)
+   rg_flags |= RING_F_SC_DEQ;
+
+   /* Allocate the ring that will be used to store objects.
+* Ring functions will return appropriate errors if we are
+* running as a secondary

[dpdk-dev] [PATCH v7 2/5] mempool: remove rte_ring from rte_mempool struct

2016-06-02 Thread David Hunt

Now that we're moving to an external mempoool handler, which
uses a void *pool_data as a pointer to the pool data, remove the
unneeded ring pointer from the mempool struct.

Signed-off-by: David Hunt 
---
 app/test/test_mempool_perf.c | 1 -
 lib/librte_mempool/rte_mempool.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index cdc02a0..091c1df 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -161,7 +161,6 @@ per_lcore_mempool_test(__attribute__((unused)) void *arg)
   n_get_bulk);
if (unlikely(ret < 0)) {
rte_mempool_dump(stdout, mp);
-   rte_ring_dump(stdout, mp->ring);
/* in this case, objects are lost... */
return -1;
}
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index a6b28b0..c33eeb8 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -204,7 +204,6 @@ struct rte_mempool_memhdr {
  */
 struct rte_mempool {
char name[RTE_MEMPOOL_NAMESIZE]; /**< Name of mempool. */
-   struct rte_ring *ring;   /**< Ring to store objects. */
union {
void *pool_data; /**< Ring or pool to store objects */
uint64_t pool_id;/**< External mempool identifier */
-- 
2.5.5

[dpdk-dev] [PATCH v7 1/5] mempool: support external mempool operations

2016-06-02 Thread David Hunt

Until now, the objects stored in a mempool were internally stored in a
ring. This patch introduces the possibility to register external handlers
replacing the ring.

The default behavior remains unchanged, but calling the new function
rte_mempool_set_handler() right after rte_mempool_create_empty() allows
the user to change the handler that will be used when populating
the mempool.

Signed-off-by: Olivier Matz 
Signed-off-by: David Hunt 
---
 lib/librte_mempool/Makefile  |   1 +
 lib/librte_mempool/rte_mempool.c |  71 ---
 lib/librte_mempool/rte_mempool.h | 240 ---
 lib/librte_mempool/rte_mempool_ops.c | 141 
 4 files changed, 389 insertions(+), 64 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_ops.c

diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 43423e0..4cbf772 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -42,6 +42,7 @@ LIBABIVER := 2

 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index b54de43..1c61c57 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -148,7 +148,7 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
phys_addr_t physaddr)
 #endif

/* enqueue in ring */
-   rte_ring_sp_enqueue(mp->ring, obj);
+   rte_mempool_ops_enqueue_bulk(mp, , 1);
 }

 /* call obj_cb() for each mempool element */
@@ -303,40 +303,6 @@ rte_mempool_xmem_usage(__rte_unused void *vaddr, uint32_t 
elt_num,
return (size_t)paddr_idx << pg_shift;
 }

-/* create the internal ring */
-static int
-rte_mempool_ring_create(struct rte_mempool *mp)
-{
-   int rg_flags = 0, ret;
-   char rg_name[RTE_RING_NAMESIZE];
-   struct rte_ring *r;
-
-   ret = snprintf(rg_name, sizeof(rg_name),
-   RTE_MEMPOOL_MZ_FORMAT, mp->name);
-   if (ret < 0 || ret >= (int)sizeof(rg_name))
-   return -ENAMETOOLONG;
-
-   /* ring flags */
-   if (mp->flags & MEMPOOL_F_SP_PUT)
-   rg_flags |= RING_F_SP_ENQ;
-   if (mp->flags & MEMPOOL_F_SC_GET)
-   rg_flags |= RING_F_SC_DEQ;
-
-   /* Allocate the ring that will be used to store objects.
-* Ring functions will return appropriate errors if we are
-* running as a secondary process etc., so no checks made
-* in this function for that condition.
-*/
-   r = rte_ring_create(rg_name, rte_align32pow2(mp->size + 1),
-   mp->socket_id, rg_flags);
-   if (r == NULL)
-   return -rte_errno;
-
-   mp->ring = r;
-   mp->flags |= MEMPOOL_F_RING_CREATED;
-   return 0;
-}
-
 /* free a memchunk allocated with rte_memzone_reserve() */
 static void
 rte_mempool_memchunk_mz_free(__rte_unused struct rte_mempool_memhdr *memhdr,
@@ -354,7 +320,7 @@ rte_mempool_free_memchunks(struct rte_mempool *mp)
void *elt;

while (!STAILQ_EMPTY(>elt_list)) {
-   rte_ring_sc_dequeue(mp->ring, );
+   rte_mempool_ops_dequeue_bulk(mp, , 1);
(void)elt;
STAILQ_REMOVE_HEAD(>elt_list, next);
mp->populated_size--;
@@ -383,13 +349,16 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char 
*vaddr,
unsigned i = 0;
size_t off;
struct rte_mempool_memhdr *memhdr;
-   int ret;

/* create the internal ring if not already done */
if ((mp->flags & MEMPOOL_F_RING_CREATED) == 0) {
-   ret = rte_mempool_ring_create(mp);
-   if (ret < 0)
-   return ret;
+   rte_errno = 0;
+   mp->pool_data = rte_mempool_ops_alloc(mp);
+   if (mp->pool_data == NULL) {
+   if (rte_errno == 0)
+   return -EINVAL;
+   return -rte_errno;
+   }
}

/* mempool is already populated */
@@ -703,7 +672,7 @@ rte_mempool_free(struct rte_mempool *mp)
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);

rte_mempool_free_memchunks(mp);
-   rte_ring_free(mp->ring);
+   rte_mempool_ops_free(mp);
rte_memzone_free(mp->mz);
 }

@@ -815,6 +784,20 @@ rte_mempool_create_empty(const char *name, unsigned n, 
unsigned elt_size,
RTE_PTR_ADD(mp, MEMPOOL_HEADER_SIZE(mp, 0));

te->data = mp;
+
+   /*
+* Since we have 4 combinations of the SP/SC/MP/MC examine the flags to
+* set the correct index into the table of ops structs.
+*/
+   if (flags & (MEMPOOL_F_SP_PUT | MEMPOOL_F_SC_GET))
+   rte_mempool_set_ops_byname(mp, "ring_sp_sc");
+   else if (flags & MEMPOOL_F_SP_PUT)
+

[dpdk-dev] [PATCH v7 0/5] mempool: add external mempool manager

2016-06-02 Thread David Hunt

Here's the latest version of the External Mempool Manager patchset.
It's re-based on top of the latest head as of 19/5/2016, including
Olivier's 35-part patch series on mempool re-org [1]

[1] http://dpdk.org/ml/archives/dev/2016-May/039229.html


v7 changes:

 * Changed rte_mempool_handler_table to rte_mempool_ops_table
 * Changed hander_idx to ops_index in rte_mempool struct
 * Reworked comments in rte_mempool.h around ops functions
 * Changed rte_mempool_hander.c to rte_mempool_ops.c
 * Changed all functions containing _handler_ to _ops_
 * Now there is no mention of 'handler' left
 * Other small changes out of review of mailing list

v6 changes:

 * Moved the flags handling from rte_mempool_create_empty to
   rte_mempool_create, as it's only there for backward compatibility
 * Various comment additions and cleanup
 * Renamed rte_mempool_handler to rte_mempool_ops
 * Added a union for *pool and u64 pool_id in struct rte_mempool
 * split the original patch into a few parts for easier review.
 * rename functions with _ext_ to _ops_.
 * addressed review comments
 * renamed put and get functions to enqueue and dequeue
 * changed occurences of rte_mempool_ops to const, as they
   contain function pointers (security)
 * split out the default external mempool handler into a separate
   patch for easier review

v5 changes:
 * rebasing, as it is dependent on another patch series [1]

v4 changes (Olivier Matz):
 * remove the rte_mempool_create_ext() function. To change the handler, the
   user has to do the following:
   - mp = rte_mempool_create_empty()
   - rte_mempool_set_handler(mp, "my_handler")
   - rte_mempool_populate_default(mp)
   This avoids to add another function with more than 10 arguments, duplicating
   the doxygen comments
 * change the api of rte_mempool_alloc_t: only the mempool pointer is required
   as all information is available in it
 * change the api of rte_mempool_free_t: remove return value
 * move inline wrapper functions from the .c to the .h (else they won't be
   inlined). This implies to have one header file (rte_mempool.h), or it
   would have generate cross dependencies issues.
 * remove now unused MEMPOOL_F_INT_HANDLER (note: it was misused anyway due
   to the use of && instead of &)
 * fix build in debug mode (__MEMPOOL_STAT_ADD(mp, put_pool, n) remaining)
 * fix build with shared libraries (global handler has to be declared in
   the .map file)
 * rationalize #include order
 * remove unused function rte_mempool_get_handler_name()
 * rename some structures, fields, functions
 * remove the static in front of rte_tailq_elem rte_mempool_tailq (comment
   from Yuanhan)
 * test the ext mempool handler in the same file than standard mempool tests,
   avoiding to duplicate the code
 * rework the custom handler in mempool_test
 * rework a bit the patch selecting default mbuf pool handler
 * fix some doxygen comments

v3 changes:
 * simplified the file layout, renamed to rte_mempool_handler.[hc]
 * moved the default handlers into rte_mempool_default.c
 * moved the example handler out into app/test/test_ext_mempool.c
 * removed is_mc/is_mp change, slight perf degredation on sp cached operation
 * removed stack hanler, may re-introduce at a later date
 * Changes out of code reviews

v2 changes:
 * There was a lot of duplicate code between rte_mempool_xmem_create and
   rte_mempool_create_ext. This has now been refactored and is now
   hopefully cleaner.
 * The RTE_NEXT_ABI define is now used to allow building of the library
   in a format that is compatible with binaries built against previous
   versions of DPDK.
 * Changes out of code reviews. Hopefully I've got most of them included.

The External Mempool Manager is an extension to the mempool API that allows
users to add and use an external mempool manager, which allows external memory
subsystems such as external hardware memory management systems and software
based memory allocators to be used with DPDK.

The existing API to the internal DPDK mempool manager will remain unchanged
and will be backward compatible. However, there will be an ABI breakage, as
the mempool struct is changing. These changes are all contained withing
RTE_NEXT_ABI defs, and the current or next code can be changed with
the CONFIG_RTE_NEXT_ABI config setting

There are two aspects to external mempool manager.
  1. Adding the code for your new mempool operations (ops). This is
 achieved by adding a new mempool ops source file into the
 librte_mempool library, and using the REGISTER_MEMPOOL_HANDLER macro.
  2. Using the new API to call rte_mempool_create_empty and
 rte_mempool_set_ops to create a new mempool
 using the name parameter to identify which ops to use.

New API calls added
 1. A new rte_mempool_create_empty() function
 2. rte_mempool_set_ops_byname() which sets the mempool's ops (functions)
 3. An rte_mempool_populate_default() and rte_mempool_populate_anon() functions
which populates the mempool using the relevant ops

Several

[dpdk-dev] [PATCH] scripts: check commits with checkpatch

2016-06-02 Thread Bruce Richardson

On Thu, Jun 02, 2016 at 03:15:54PM +0200, Thomas Monjalon wrote:
> 2016-06-02 13:33, Bruce Richardson:
> > Testing this out here, I find that git format-patch includes the diff stats
> > in the output, which then can trigger long-line warnings for the commit 
> > message.
> > 
> > WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description 
> > (prefer a maximum 75 chars per line)
> > #8:
> >  examples/my-new-test-app/basicfwd.c | 342 
> > +
> > 
> > To fix this, I suggest replacing "format-patch --stdout" with "show 
> > --format=email"
> > since that should give the same output just without the change stats.
> 
> or format-patch --no-stat
>
Sure, whatever works! :-) It's just a longer command :-)

/Bruce

[dpdk-dev] [PATCH v4 6/9] app/pdump: add pdump tool for packet capturing

2016-06-02 Thread Ananyev, Konstantin



> -Original Message-
> From: Pattan, Reshma
> Sent: Thursday, June 02, 2016 1:32 PM
> To: Ananyev, Konstantin; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v4 6/9] app/pdump: add pdump tool for packet 
> capturing
> 
> 
> 
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Tuesday, May 31, 2016 6:21 PM
> > To: Pattan, Reshma ; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v4 6/9] app/pdump: add pdump tool for packet
> > capturing
> >
> >
> >
> > > -Original Message-
> > > From: Pattan, Reshma
> > > Sent: Tuesday, May 31, 2016 3:50 PM
> > > To: Ananyev, Konstantin; dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH v4 6/9] app/pdump: add pdump tool for
> > > packet capturing
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Ananyev, Konstantin
> > > > Sent: Friday, May 27, 2016 4:22 PM
> > > > To: Pattan, Reshma ; dev at dpdk.org
> > > > Cc: Pattan, Reshma 
> > > > Subject: RE: [dpdk-dev] [PATCH v4 6/9] app/pdump: add pdump tool for
> > > > packet capturing
> > > >
> > > >
> > > > > +static int
> > > > > +parse_num_mbufs(const char *key __rte_unused, const char *value,
> > > > > +void
> > > > > +*extra_args) {
> > > > > + int n;
> > > > > + struct pdump_tuples *pt = extra_args;
> > > > > +
> > > > > + n = atoi(value);
> > > > > + if (n > 1024)
> > > > > + pt->total_num_mbufs = (uint16_t) n;
> > > > > + else {
> > > > > + printf("total-num-mbufs %d invalid - must be > 1024\n", 
> > > > > n);
> > > > > + return -1;
> > > > > + }
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > >
> > > >
> > > > You have several parse functions - doing almost the same thing:
> > > > convert string to integer value and then check that this valu is
> > > > within specific range.
> > > > Why not to introduce one function that would accept as extra_args
> > > > pointer to the struct {uint64_t v; uint64_t min; uint64_t max; } So
> > > > inside that function you can check that: v >= min && v < max or so.
> > > > Then you can use that function all over the places.
> > > > Another possibility just have parse function that only does
> > > > conversion without any boundary checking, and make boundary check later
> > in parse_pdump().
> > > > In both cases you can re-use same parse function.
> > > >
> > >
> > > Yes, I do have 4 functions but in all I am not checking min and max
> > > values and log message also differs in each function.  So I would like to 
> > > retain
> > this as it is now.
> >
> > What for?
> >
> 
> OK, I will make changes to have parse function only for conversion and  
> boundary checks will be done
> In parse_pdump(). This looks ok to me.
> 
> I agree with rest of the comments and will take care of the same in next 
> revision of patch set.

Ok, thanks.
Konstantin

[dpdk-dev] [PATCH] crypto: fix null pointer dereferencing

2016-06-02 Thread De Lara Guarch, Pablo

Hi Deepak,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Deepak Kumar Jain
> Sent: Monday, May 23, 2016 4:47 PM
> To: Doherty, Declan
> Cc: Jain, Deepak K; dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] crypto: fix null pointer dereferencing
> 
> From: "Jain, Deepak K" 
> 
> Fix null pointer dereferencing by reporing if null and
> exiting the function.
> 
> Fixes: c0f87eb5252b ("cryptodev: change burst API to be crypto op oriented")
> Coverity issue: 126584
> 
> Signed-off-by: Deepak Kumar Jain 

Could you change the title to "aesni_mb: fix null pointer dereferencing",
to specify that the patch is for the aesni mb pmd?

Thanks,
Pablo

[dpdk-dev] [PATCH v2 0/3] Keep-alive enhancements

2016-06-02 Thread Tahhan, Maryam

> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Remy Horton
> Sent: Wednesday, May 18, 2016 10:30 AM
> To: dev at dpdk.org; Mcnamara, John 
> Subject: [dpdk-dev] [PATCH v2 0/3] Keep-alive enhancements
> 
> This patchset adds enhancements to the keepalive core monitoring
> and reporting sub-system. The first is support for idled (sleeping and
> frequency-stepped) CPU cores, and the second is support for
> applications to be notified of active as well as faulted cores. The latter
> is to allow core state to be relayed to external (secondary) processes,
> which is demonstrated by changes to the l2fed-keepalive example.
> 
> --

Acked-by: Maryam Tahhan

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Matthew Hall

On Thu, Jun 02, 2016 at 04:08:37PM -0400, Neil Horman wrote:
> struct key_vals {
>   char *key;
>   union {
>   ulong longval;
>   void *ptrval;
>   } value;
> };
> 
> struct config {
>   size_t count;
>   struct key_vals kvp[0];
> };

This sort of code is very 1970s / ioctl / messy binary. And doesn't buy any 
performance advantage because it's just for config.

Something that looks more like sysctl MIBs with hierarchical names or like 
JSON w/ a hierarchy of hash tables and arrays is much less user-hostile.

https://www.freebsd.org/cgi/man.cgi?sysctl(3)

http://json-c.github.io/json-c/json-c-0.12/doc/html/json__object_8h.html

Matthew.

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Wiles, Keith

On 6/2/16, 8:19 AM, "Thomas Monjalon"  wrote:

>2016-06-02 06:41, Neil Horman:
>> I'm not sure why you're focusing no selecting a config file format at all.  
>> Why

The reason is I am on now looking at formats is because I have been thinking 
about this issue for some time and already understand your comments. I agree 
with you and Thomas, which to me would be the details needing to be done as 
part of the project. I guess I found the config file format a lot more fun to 
define. ?

>> not just focus on removing the argument parsing from the core rte_eal_init 
>> code,
>> instead passing in a configuration struct that is stored and queried per
>> application.  Leave the parsing of a config file and population of that 
>> config
>> struct as an exercize to the application developer.  That way a given
>> application can use command line options, config files, or whatever method 
>> they
>> choose, which would be in keeping with traditional application design.

Moving the code out of DPDK into multiple different libraries one for 
converting command line to config structure (support the current options) and 
possibly some config file format library to config structure would give options 
for the developers. DPDK just needs to be driven by a configuration structure 
or set of APIs and not use args/argv directly.

Moving the current args/argv code out of DPDK into a library should be easy (I 
guess) and I am willing to do that work if we think it is needed today.

>> 
>> For the purposes of the example apps, it would seem that either JSON, YAML, 
>> or
>> the above Lua format would work just fine.
>
>+1
>

Regards,
++Keith

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Matthew Hall

On Thu, Jun 02, 2016 at 07:41:10PM +, Wiles, Keith wrote:
> Would this work better in the long run, does a fixed structure still make 
> sense?

This right here is why I suggested libjson-c as an example. It has a nice API 
like this already:

http://json-c.github.io/json-c/json-c-0.12/doc/html/json__object_8h.html

Matthew.

[dpdk-dev] [PATCH] vhost: fix segfault on bad descriptor address.

2016-06-02 Thread Ilya Maximets

Hi, Rich.
Thank you for testing and analysing.

On 01.06.2016 01:06, Rich Lane wrote:
> On Fri, May 20, 2016 at 5:50 AM, Ilya Maximets  > wrote:
> 
> In current implementation guest application can reinitialize vrings
> by executing start after stop. In the same time host application
> can still poll virtqueue while device stopped in guest and it will
> crash with segmentation fault while vring reinitialization because
> of dereferencing of bad descriptor addresses.
> 
> 
> I see a performance regression with this patch at large packet sizes (> 768 
> bytes). rte_vhost_enqueue_burst is consuming 10% more cycles. Strangely, 
> there's actually a ~1% performance improvement at small packet sizes.
> 
> The regression happens with GCC 4.8.4 and 5.3.0, but not 6.1.1.
> 
> AFAICT this is just the compiler generating bad code. One difference is that 
> it's storing the offset on the stack instead of in a register. A workaround 
> is to move the !desc_addr check outside the unlikely macros.
> 
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -147,10 +147,10 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct 
> vhost_virtqueue *vq,
> struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 
> 0};
> 
> desc = >desc[desc_idx];
> -   if (unlikely(desc->len < vq->vhost_hlen))
> +   desc_addr = gpa_to_vva(dev, desc->addr);
> +   if (unlikely(desc->len < vq->vhost_hlen || !desc_addr))
> 
> 
>  Workaround: change to "if (unlikely(desc->len < vq->vhost_hlen) || 
> !desc_addr)".
> 
> return -1;
> 
> 
> -   desc_addr = gpa_to_vva(dev, desc->addr);
> rte_prefetch0((void *)(uintptr_t)desc_addr);
> 
> virtio_enqueue_offload(m, _hdr.hdr);
> @@ -184,6 +184,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct 
> vhost_virtqueue *vq,
> 
> desc = >desc[desc->next];
> desc_addr   = gpa_to_vva(dev, desc->addr);
> +   if (unlikely(!desc_addr))
> 
> 
> Workaround: change to "if (!desc_addr)".
>  
> 
> +   return -1;
> +
> desc_offset = 0;
> desc_avail  = desc->len;
> }
> 

What about other places? Is there same issues or it's only inside 
copy_mbuf_to_desc() ?

Best regards, Ilya Maximets.

[dpdk-dev] [PATCH] scripts: check commits with checkpatch

2016-06-02 Thread Bruce Richardson

On Thu, Jun 02, 2016 at 11:13:15AM +0200, Thomas Monjalon wrote:
> The new option -n allows to give a number of commits to check
> from the git HEAD.
> If neither -n nor patch files are given, the commits after
> origin/master are checked.
> 

Yep, I love it, exactly what I wanted to replace my own checkpatch script 
wrapper!

> Signed-off-by: Thomas Monjalon 
> ---
>  scripts/checkpatches.sh | 45 -
>  1 file changed, 36 insertions(+), 9 deletions(-)
> 
> diff --git a/scripts/checkpatches.sh b/scripts/checkpatches.sh

> @@ -74,17 +76,42 @@ if [ ! -x "$DPDK_CHECKPATCH_PATH" ] ; then
>   exit 1
>  fi
>  
> +total=0
>  status=0
> -for p in "$@" ; do
> - ! $verbose || printf '\n### %s\n\n' "$p"
> - report=$($DPDK_CHECKPATCH_PATH $options "$p" 2>/dev/null)
> +
> +check () { #   
> + total=$(($total + 1))
> + ! $verbose || printf '\n### %s\n\n' "$3"
> + if [ -n "$1" ] ; then
> + report=$($DPDK_CHECKPATCH_PATH $options "$1" 2>/dev/null)
> + elif [ -n "$2" ] ; then
> + report=$(git format-patch --stdout -1 $commit |
> + $DPDK_CHECKPATCH_PATH $options - 2>/dev/null)
> + fi

Testing this out here, I find that git format-patch includes the diff stats
in the output, which then can trigger long-line warnings for the commit message.

WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description 
(prefer a maximum 75 chars per line)
#8:
 examples/my-new-test-app/basicfwd.c | 342 
+

To fix this, I suggest replacing "format-patch --stdout" with "show 
--format=email"
since that should give the same output just without the change stats.

With this one adjustment:

Acked-by: Bruce Richardson

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Neil Horman

On Thu, Jun 02, 2016 at 01:53:32PM +, Wiles, Keith wrote:
> 
> On 6/2/16, 8:19 AM, "Thomas Monjalon"  wrote:
> 
> >2016-06-02 06:41, Neil Horman:
> >> I'm not sure why you're focusing no selecting a config file format at all. 
> >>  Why
> 
> The reason is I am on now looking at formats is because I have been thinking 
> about this issue for some time and already understand your comments. I agree 
> with you and Thomas, which to me would be the details needing to be done as 
> part of the project. I guess I found the config file format a lot more fun to 
> define. ?
> 

Sure, it is more fun to define, but I think its likely the wrong problem to
solve (or perhaps not even the wrong problem, but rather the less pressing
problem).

> >> not just focus on removing the argument parsing from the core rte_eal_init 
> >> code,
> >> instead passing in a configuration struct that is stored and queried per
> >> application.  Leave the parsing of a config file and population of that 
> >> config
> >> struct as an exercize to the application developer.  That way a given
> >> application can use command line options, config files, or whatever method 
> >> they
> >> choose, which would be in keeping with traditional application design.
> 
> Moving the code out of DPDK into multiple different libraries one for 
> converting command line to config structure (support the current options) and 
> possibly some config file format library to config structure would give 
> options for the developers. DPDK just needs to be driven by a configuration 
> structure or set of APIs and not use args/argv directly.

Yes.  So we agree?

> 
> Moving the current args/argv code out of DPDK into a library should be easy 
> (I guess) and I am willing to do that work if we think it is needed today.
> 
Yes, I think thats the more pressing problem.  To ennumerate, I think whats
really called for is:

1) The definition of a config structure that can be passed to rte_eal_init,
defining the configuration for that running process

2) The creation and use of an API that various DPDK libraries can use to
retrieve that structure (or elements thereof), based on some explicit or imlicit
id, so that the configuration can be used (I'm thinking here specifically of
multiple dpdk applications using a dpdk shared library)

3) The removal of the eal_parse_args code from the core dpdk library entirely,
packaging it instead as its own library that interprets command line arguments
as currently defined, and populates an instance of the structure defined in (1)

4) Altering the Makefiles, so that the example apps link against the new library
in (3), altering the app source code to work with the config structure defined
in (1)

With those steps, I think we will remove the command line bits from the dpdk
core, and do so without altering the user experience for any of the sample apps
(which will demonstrate to other developers that the same can be done with their
applications).  From there we will be free to create alternate methods of
populating the config struct defined in (1) (via JSON file, YAML, XML, or
whatever).

Neil

> >> 
> >> For the purposes of the example apps, it would seem that either JSON, 
> >> YAML, or
> >> the above Lua format would work just fine.
> >
> >+1
> >
> 
> Regards,
> ++Keith
> 
>

[dpdk-dev] [PATCH v4 6/9] app/pdump: add pdump tool for packet capturing

2016-06-02 Thread Pattan, Reshma



> -Original Message-
> From: Ananyev, Konstantin
> Sent: Tuesday, May 31, 2016 6:21 PM
> To: Pattan, Reshma ; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v4 6/9] app/pdump: add pdump tool for packet
> capturing
> 
> 
> 
> > -Original Message-
> > From: Pattan, Reshma
> > Sent: Tuesday, May 31, 2016 3:50 PM
> > To: Ananyev, Konstantin; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v4 6/9] app/pdump: add pdump tool for
> > packet capturing
> >
> >
> >
> > > -Original Message-
> > > From: Ananyev, Konstantin
> > > Sent: Friday, May 27, 2016 4:22 PM
> > > To: Pattan, Reshma ; dev at dpdk.org
> > > Cc: Pattan, Reshma 
> > > Subject: RE: [dpdk-dev] [PATCH v4 6/9] app/pdump: add pdump tool for
> > > packet capturing
> > >
> > >
> > > > +static int
> > > > +parse_num_mbufs(const char *key __rte_unused, const char *value,
> > > > +void
> > > > +*extra_args) {
> > > > +   int n;
> > > > +   struct pdump_tuples *pt = extra_args;
> > > > +
> > > > +   n = atoi(value);
> > > > +   if (n > 1024)
> > > > +   pt->total_num_mbufs = (uint16_t) n;
> > > > +   else {
> > > > +   printf("total-num-mbufs %d invalid - must be > 1024\n", 
> > > > n);
> > > > +   return -1;
> > > > +   }
> > > > +
> > > > +   return 0;
> > > > +}
> > >
> > >
> > > You have several parse functions - doing almost the same thing:
> > > convert string to integer value and then check that this valu is
> > > within specific range.
> > > Why not to introduce one function that would accept as extra_args
> > > pointer to the struct {uint64_t v; uint64_t min; uint64_t max; } So
> > > inside that function you can check that: v >= min && v < max or so.
> > > Then you can use that function all over the places.
> > > Another possibility just have parse function that only does
> > > conversion without any boundary checking, and make boundary check later
> in parse_pdump().
> > > In both cases you can re-use same parse function.
> > >
> >
> > Yes, I do have 4 functions but in all I am not checking min and max
> > values and log message also differs in each function.  So I would like to 
> > retain
> this as it is now.
> 
> What for?
> 

OK, I will make changes to have parse function only for conversion and  
boundary checks will be done
In parse_pdump(). This looks ok to me. 

I agree with rest of the comments and will take care of the same in next 
revision of patch set.

> >
> > Thanks,
> > Reshma

[dpdk-dev] [PATCH v1 2/2] virtio, qtest: Add functionality to handle interrupt

2016-06-02 Thread Tetsuya Mukawa

The patch adds functionality to handle interrupt from pci device of
QEMU guest. To handle the interrupts, the patch adds to initialize piix3
pci device.

Signed-off-by: Tetsuya Mukawa 
---
 drivers/net/virtio/virtio_ethdev.c |   7 +-
 drivers/net/virtio/virtio_qtest/qtest.h|   3 +-
 drivers/net/virtio/virtio_qtest/qtest_utils.c  | 225 -
 drivers/net/virtio/virtio_qtest/qtest_utils.h  |  68 ++-
 drivers/net/virtio/virtio_qtest/virtio_qtest_dev.c |  23 ++-
 drivers/net/virtio/virtio_qtest/virtio_qtest_pci.c |  64 --
 6 files changed, 360 insertions(+), 30 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 8b5fb66..e8737ab 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1043,7 +1043,6 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
struct virtio_net_config *config;
struct virtio_net_config local_config;
struct rte_pci_device *pci_dev;
-   uint32_t dev_flags = RTE_ETH_DEV_DETACHABLE;
int ret;

RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct 
virtio_net_hdr_mrg_rxbuf));
@@ -1067,8 +1066,9 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

pci_dev = eth_dev->pci_dev;

+   eth_dev->data->dev_flags |= RTE_ETH_DEV_DETACHABLE;
if (pci_dev) {
-   ret = vtpci_init(pci_dev, hw, _flags);
+   ret = vtpci_init(pci_dev, hw, _dev->data->dev_flags);
if (ret)
return ret;
}
@@ -1086,10 +1086,9 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

/* If host does not support status then disable LSC */
if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS))
-   dev_flags &= ~RTE_ETH_DEV_INTR_LSC;
+   eth_dev->data->dev_flags &= ~RTE_ETH_DEV_INTR_LSC;

rte_eth_copy_pci_info(eth_dev, pci_dev);
-   eth_dev->data->dev_flags = dev_flags;

rx_func_get(eth_dev);

diff --git a/drivers/net/virtio/virtio_qtest/qtest.h 
b/drivers/net/virtio/virtio_qtest/qtest.h
index 534c5a0..7e8c093 100644
--- a/drivers/net/virtio/virtio_qtest/qtest.h
+++ b/drivers/net/virtio/virtio_qtest/qtest.h
@@ -35,7 +35,7 @@
 #define _VIRTIO_QTEST_H_

 #define QTEST_DRV_NAME "eth_virtio_qtest"
-#define QTEST_DEVICE_NUM2
+#define QTEST_DEVICE_NUM3

 #include 
 #include 
@@ -43,6 +43,7 @@
 /* Device information */
 #define VIRTIO_NET_DEVICE_ID0x1000
 #define VIRTIO_NET_VENDOR_ID0x1af4
+#define VIRTIO_NET_IRQ_NUM  10
 #define IVSHMEM_DEVICE_ID   0x1110
 #define IVSHMEM_VENDOR_ID   0x1af4
 #define PIIX3_DEVICE_ID 0x7000
diff --git a/drivers/net/virtio/virtio_qtest/qtest_utils.c 
b/drivers/net/virtio/virtio_qtest/qtest_utils.c
index 27118fb..c5a3a7a 100644
--- a/drivers/net/virtio/virtio_qtest/qtest_utils.c
+++ b/drivers/net/virtio/virtio_qtest/qtest_utils.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 

@@ -43,6 +44,12 @@
 #include "../virtio_ethdev.h"
 #include "qtest_utils.h"

+/* PIIX3 configuration registers */
+#define PIIX3_REG_ADDR_PIRQA0x60
+#define PIIX3_REG_ADDR_PIRQB0x61
+#define PIIX3_REG_ADDR_PIRQC0x62
+#define PIIX3_REG_ADDR_PIRQD0x63
+
 /* ivshmem configuration */
 #define IVSHMEM_PROTOCOL_VERSION0

@@ -74,6 +81,14 @@ struct qtest_session {
size_t evq_total_len;

union qtest_pipefds msgfds;
+
+   int irqno;
+   pthread_t intr_th;
+   int intr_th_started;
+   int eventfd;
+   rte_atomic16_t enable_intr;
+   rte_intr_callback_fn cb;
+   void *cb_arg;
 };

 static int
@@ -230,6 +245,29 @@ qtest_pci_inb(struct qtest_session *s, uint8_t bus, 
uint8_t device,
return (tmp >> ((offset & 0x3) * 8)) & 0xff;
 }

+static void
+qtest_pci_outb(struct qtest_session *s, uint8_t bus, uint8_t device,
+   uint8_t function, uint8_t offset, uint8_t value)
+{
+   uint32_t addr, tmp, pos;
+
+   addr = PCI_CONFIG_ADDR(bus, device, function, offset);
+   pos = (offset % 4) * 8;
+
+   if (pthread_mutex_lock(>qtest_session_lock) < 0)
+   rte_panic("Cannot lock mutex\n");
+
+   qtest_raw_out(s, 0xcf8, addr, 'l');
+   tmp = qtest_raw_in(s, 0xcfc, 'l');
+   tmp = (tmp & ~(0xff << pos)) | (value << pos);
+
+   qtest_raw_out(s, 0xcf8, addr, 'l');
+   qtest_raw_out(s, 0xcfc, tmp, 'l');
+
+   if (pthread_mutex_unlock(>qtest_session_lock) < 0)
+   rte_panic("Cannot unlock mutex\n");
+}
+
 static uint32_t
 qtest_pci_inl(struct qtest_session *s, uint8_t bus, uint8_t device,
uint8_t function, uint8_t offset)
@@ -466,15 +504,112 @@ qtest_get_bar_size(struct qtest_session *s, const char 
*name,
return 0;
 }

+int
+qtest_intr_enable(struct qtest_session *s)
+{
+

[dpdk-dev] [PATCH v1 1/2] virtio: Handle interrupt things under vtpci abstraction

2016-06-02 Thread Tetsuya Mukawa

So far, interrupts from PCI devices are handled in virtio_ethdev
directly. The patch changes it, and try to handle it under vtpci
abstraction. The patch is needed because virtio-qtest needs to handle
interrupts from virtual pci devices.

Signed-off-by: Tetsuya Mukawa 
---
 drivers/net/virtio/virtio_ethdev.c | 10 ++---
 drivers/net/virtio/virtio_pci.c| 86 ++
 drivers/net/virtio/virtio_pci.h|  7 
 3 files changed, 72 insertions(+), 31 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index c35d1c0..8b5fb66 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1011,7 +1011,7 @@ virtio_interrupt_handler(__rte_unused struct 
rte_intr_handle *handle,
isr = vtpci_isr(hw);
PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);

-   if (rte_intr_enable(>pci_dev->intr_handle) < 0)
+   if (hw->vtpci_ops->intr_enable(hw) < 0)
PMD_DRV_LOG(ERR, "interrupt enable failed");

if (isr & VIRTIO_PCI_ISR_CONFIG) {
@@ -1170,7 +1170,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

/* Setup interrupt callback  */
if (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
-   rte_intr_callback_register(_dev->intr_handle,
+   hw->vtpci_ops->intr_cb_register(hw,
   virtio_interrupt_handler, eth_dev);

virtio_dev_cq_start(eth_dev);
@@ -1205,7 +1205,7 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)

/* reset interrupt callback  */
if (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
-   rte_intr_callback_unregister(_dev->intr_handle,
+   hw->vtpci_ops->intr_cb_unregister(hw,
virtio_interrupt_handler,
eth_dev);
rte_eal_pci_unmap_device(pci_dev);
@@ -1294,7 +1294,7 @@ virtio_dev_start(struct rte_eth_dev *dev)
return -ENOTSUP;
}

-   if (rte_intr_enable(>pci_dev->intr_handle) < 0) {
+   if (hw->vtpci_ops->intr_enable(hw) < 0) {
PMD_DRV_LOG(ERR, "interrupt enable failed");
return -EIO;
}
@@ -1398,7 +1398,7 @@ virtio_dev_stop(struct rte_eth_dev *dev)
hw->started = 0;

if (dev->data->dev_conf.intr_conf.lsc)
-   rte_intr_disable(>pci_dev->intr_handle);
+   hw->vtpci_ops->intr_disable(hw);

memset(, 0, sizeof(link));
virtio_dev_atomic_write_link_status(dev, );
diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 6bd239c..acbc9b1 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -71,6 +71,32 @@ check_vq_phys_addr_ok(struct virtqueue *vq)
return 1;
 }

+static int
+intr_cb_register(struct virtio_hw *hw,
+   rte_intr_callback_fn cb, void *cb_arg)
+{
+   return rte_intr_callback_register(>dev->intr_handle, cb, cb_arg);
+}
+
+static int
+intr_cb_unregister(struct virtio_hw *hw,
+   rte_intr_callback_fn cb, void *cb_arg)
+{
+   return rte_intr_callback_register(>dev->intr_handle, cb, cb_arg);
+}
+
+static int
+intr_enable(struct virtio_hw *hw)
+{
+   return rte_intr_enable(>dev->intr_handle);
+}
+
+static int
+intr_disable(struct virtio_hw *hw)
+{
+   return rte_intr_disable(>dev->intr_handle);
+}
+
 static void
 legacy_read_dev_config(struct virtio_hw *hw, size_t offset,
   void *dst, int length)
@@ -234,19 +260,23 @@ legacy_virtio_resource_init(struct rte_pci_device 
*pci_dev,
 }

 static const struct virtio_pci_ops legacy_ops = {
-   .read_dev_cfg   = legacy_read_dev_config,
-   .write_dev_cfg  = legacy_write_dev_config,
-   .reset  = legacy_reset,
-   .get_status = legacy_get_status,
-   .set_status = legacy_set_status,
-   .get_features   = legacy_get_features,
-   .set_features   = legacy_set_features,
-   .get_isr= legacy_get_isr,
-   .set_config_irq = legacy_set_config_irq,
-   .get_queue_num  = legacy_get_queue_num,
-   .setup_queue= legacy_setup_queue,
-   .del_queue  = legacy_del_queue,
-   .notify_queue   = legacy_notify_queue,
+   .read_dev_cfg   = legacy_read_dev_config,
+   .write_dev_cfg  = legacy_write_dev_config,
+   .reset  = legacy_reset,
+   .get_status = legacy_get_status,
+   .set_status = legacy_set_status,
+   .get_features   = legacy_get_features,
+   .set_features   = legacy_set_features,
+   .get_isr= legacy_get_isr,
+   .set_config_irq = legacy_set_config_irq,
+   .get_queue_num  = legacy_get_queue_num,
+   .setup_queue= legacy_setup_queue,
+

[dpdk-dev] [PATCH v1 0/2] Supplement patches for virtio-qtest to support LSC interrupt

2016-06-02 Thread Tetsuya Mukawa

This is patches to support LSC interrupt handling for virtio-qtest.
This patches should be on below patches.
 - [PATCH v5 0/6] Virtio-net PMD: QEMU QTest extension for container

To support LSC interrupts, vtpci abstraction was expanded to handle interrupt 
from
pci devices.
Actually, this PMD is handling a virtual virtio-net device. So handling 
interrupts
are a bit different from actual pci devices. In this case, all interrupts are 
come
from unix domain socket connected to QEMU.


Tetsuya Mukawa (2):
  virtio: Handle interrupt things under vtpci abstraction
  virtio, qtest: Add functionality to handle interrupt

 drivers/net/virtio/virtio_ethdev.c |  17 +-
 drivers/net/virtio/virtio_pci.c|  86 +---
 drivers/net/virtio/virtio_pci.h|   7 +
 drivers/net/virtio/virtio_qtest/qtest.h|   3 +-
 drivers/net/virtio/virtio_qtest/qtest_utils.c  | 225 -
 drivers/net/virtio/virtio_qtest/qtest_utils.h  |  68 ++-
 drivers/net/virtio/virtio_qtest/virtio_qtest_dev.c |  23 ++-
 drivers/net/virtio/virtio_qtest/virtio_qtest_pci.c |  64 --
 8 files changed, 432 insertions(+), 61 deletions(-)

-- 
2.7.4

[dpdk-dev] [PATCH v5 6/6] virtio: Add QTest support for virtio-net PMD

2016-06-02 Thread Tetsuya Mukawa

The patch adds a new virtio-net PMD configuration that allows the PMD to
work on host as if the PMD is in VM.
Here is new configuration for virtio-net PMD.
 - CONFIG_RTE_VIRTIO_QTEST
To use this mode, EAL needs map all hugepages as one file. Also the file
should be mapped between (1 << 31) and (1 << 44). And start address
should be aligned by EAL memory size.

To allocate like above, use below '--base-virtaddr' option with
appropriate value.
If EAL cannot allocate memory like above, the PMD will return error
while initialization. In the case, try other values.
Later supplement patches will help allocating EAL memory like above.

To prepare virtio-net device on host, the users need to invoke QEMU
process in special QTest mode. This mode is mainly used for testing QEMU
devices from outer process. In this mode, no guest runs.
Here is QEMU command line.

 $ qemu-system-x86_64 \
 -machine pc-i440fx-1.4,accel=qtest \
 -display none -qtest-log /dev/null \
 -qtest unix:/tmp/socket,server \
 -netdev type=tap,script=/etc/qemu-ifup,id=net0,queues=1 \
 -device
virtio-net-pci,netdev=net0,mq=on,disable-modern=false,addr=3 \
 -chardev socket,id=chr1,path=/tmp/ivshmem,server \
 -device ivshmem,size=1G,chardev=chr1,vectors=1,addr=4

 * Should use QEMU-2.6, or above.
 * QEMU process is needed per port.
 * virtio-1.0 device are only supported.
 * The vhost backends like vhost-net and vhost-user can be specified.
 * In most cases, just using above command is enough, but you can also
   specify other QEMU virtio-net options like mac address.
 * Only checked "pc-i440fx-1.4" machine, but may work with other
   machines.
 * Should not add "--enable-kvm" to QEMU command line.

After invoking QEMU, the PMD can connect to QEMU process using unix
domain sockets. Over these sockets, virtio-net and ivshmem in QEMU
are probed by the PMD.
Here is example of command line.

 $ testpmd -c f -n 1 -m 1024 --no-pci --base-virtaddr=0x4 \
  --vdev="eth_virtio_qtest0,qtest=/tmp/socket,ivshmem=/tmp/ivshmem"\
  -- --disable-hw-vlan --txqflags=0xf00 -i

Please specify same unix domain sockets and memory size in both QEMU
and DPDK command lines like above.
The share memory size should be power of 2, because ivshmem only
accepts such memory size.

Signed-off-by: Tetsuya Mukawa 
---
 drivers/net/virtio/Makefile|   1 +
 drivers/net/virtio/virtio_ethdev.c |   3 +-
 drivers/net/virtio/virtio_ethdev.h |   1 +
 drivers/net/virtio/virtio_qtest/qtest.h|  95 +
 drivers/net/virtio/virtio_qtest/virtio_qtest_dev.c | 393 +
 drivers/net/virtio/virtio_qtest/virtio_qtest_dev.h |  42 +++
 drivers/net/virtio/virtqueue.h |   6 +-
 7 files changed, 536 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_qtest/qtest.h
 create mode 100644 drivers/net/virtio/virtio_qtest/virtio_qtest_dev.c
 create mode 100644 drivers/net/virtio/virtio_qtest/virtio_qtest_dev.h

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 1c86d9d..5933205 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -66,6 +66,7 @@ endif
 ifeq ($(CONFIG_RTE_VIRTIO_QTEST),y)
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_qtest/qtest_utils.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_qtest/virtio_qtest_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_qtest/virtio_qtest_dev.c
 endif

 # this lib depends upon:
diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index f8972f2..c35d1c0 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -59,7 +59,6 @@
 #include "virtqueue.h"
 #include "virtio_rxtx.h"

-static int eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev);
 static int  virtio_dev_configure(struct rte_eth_dev *dev);
 static int  virtio_dev_start(struct rte_eth_dev *dev);
 static void virtio_dev_stop(struct rte_eth_dev *dev);
@@ -1179,7 +1178,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
return 0;
 }

-static int
+int
 eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
 {
struct rte_pci_device *pci_dev;
diff --git a/drivers/net/virtio/virtio_ethdev.h 
b/drivers/net/virtio/virtio_ethdev.h
index 284afaa..cbb03f5 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -114,6 +114,7 @@ uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct 
rte_mbuf **tx_pkts,
uint16_t nb_pkts);

 int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
+int eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev);

 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
diff --git a/drivers/net/virtio/virtio_qtest/qtest.h 
b/drivers/net/virtio/virtio_qtest/qtest.h
new file mode 100644
index 000..534c5a0
--- /dev/null
+++ b/drivers/net/virtio/virtio_qtest/qtest.h
@@ -0,0 +1,95 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016

[dpdk-dev] [PATCH v5 5/6] virtio: Add QTest support to vtpci abstraction

2016-06-02 Thread Tetsuya Mukawa

The patch adds QTest support to vtpci abstraction.
With this patch, only modern virtio device will be supported.
This implementation will be used by later QTest extension patch
of virtio-net PMD.

Signed-off-by: Tetsuya Mukawa 
---
 drivers/net/virtio/Makefile|   1 +
 drivers/net/virtio/virtio_qtest/virtio_qtest_pci.c | 407 +
 drivers/net/virtio/virtio_qtest/virtio_qtest_pci.h |  39 ++
 3 files changed, 447 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_qtest/virtio_qtest_pci.c
 create mode 100644 drivers/net/virtio/virtio_qtest/virtio_qtest_pci.h

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 0b1ccff..1c86d9d 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -65,6 +65,7 @@ endif

 ifeq ($(CONFIG_RTE_VIRTIO_QTEST),y)
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_qtest/qtest_utils.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_qtest/virtio_qtest_pci.c
 endif

 # this lib depends upon:
diff --git a/drivers/net/virtio/virtio_qtest/virtio_qtest_pci.c 
b/drivers/net/virtio/virtio_qtest/virtio_qtest_pci.c
new file mode 100644
index 000..d715b13
--- /dev/null
+++ b/drivers/net/virtio/virtio_qtest/virtio_qtest_pci.c
@@ -0,0 +1,407 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "../virtio_logs.h"
+#include "../virtio_pci.h"
+#include "../virtqueue.h"
+
+#include "qtest_utils.h"
+#include "virtio_qtest_pci.h"
+
+static inline int
+check_vq_phys_addr_ok(struct virtqueue *vq)
+{
+   /* Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
+* and only accepts 32 bit page frame number.
+* Check if the allocated physical memory exceeds 16TB.
+*/
+   if ((vq->vq_ring_mem + vq->vq_ring_size - 1) >>
+   (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+   PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
+   return 0;
+   }
+
+   return 1;
+}
+
+static inline uint8_t
+qtest_read8(struct virtio_hw *hw, uint8_t *addr)
+{
+   return qtest_read(hw->virtio_user_dev, (uint64_t)addr, 'b');
+}
+
+static inline void
+qtest_write8(struct virtio_hw *hw, uint8_t val, uint8_t *addr)
+{
+   return qtest_write(hw->virtio_user_dev, (uint64_t)addr, val, 'b');
+}
+
+static inline uint16_t
+qtest_read16(struct virtio_hw *hw, uint16_t *addr)
+{
+   return qtest_read(hw->virtio_user_dev, (uint64_t)addr, 'w');
+}
+
+static inline void
+qtest_write16(struct virtio_hw *hw, uint16_t val, uint16_t *addr)
+{
+   return qtest_write(hw->virtio_user_dev, (uint64_t)addr, val, 'w');
+}
+
+static inline uint32_t
+qtest_read32(struct virtio_hw *hw, uint32_t *addr)
+{
+   return qtest_read(hw->virtio_user_dev, (uint64_t)addr, 'l');
+}
+
+static inline void
+qtest_write32(struct virtio_hw *hw, uint32_t val, uint32_t *addr)
+{
+   return qtest_write(hw->virtio_user_dev, (uint64_t)addr, val, 'l');
+}
+
+static inline void
+qtest_write64_twopart(struct virtio_hw *hw,
+   uint64_t val, uint32_t *lo, uint32_t *hi)
+{
+   qtest_write32(hw, val & ((1ULL << 32) - 1), lo);
+   qtest_write32(hw, val >> 32,hi);
+}
+
+static void
+qtest_read_dev_config(struct virtio_hw *hw, size_t offset,
+

[dpdk-dev] [PATCH v5 4/6] virtio, qtest: Add misc functions to handle pci information

2016-06-02 Thread Tetsuya Mukawa

The patch adds below functions.
 - qtest_read_pci_cfg
 - qtest_get_bar_addr
 - qtest_get_bar_size
These are used for handling pci device information.
It will be called by later patches.

Signed-off-by: Tetsuya Mukawa 
---
 drivers/net/virtio/virtio_qtest/qtest_utils.c | 77 +++
 drivers/net/virtio/virtio_qtest/qtest_utils.h | 56 +++
 2 files changed, 133 insertions(+)

diff --git a/drivers/net/virtio/virtio_qtest/qtest_utils.c 
b/drivers/net/virtio/virtio_qtest/qtest_utils.c
index 9bc1fca..27118fb 100644
--- a/drivers/net/virtio/virtio_qtest/qtest_utils.c
+++ b/drivers/net/virtio/virtio_qtest/qtest_utils.c
@@ -389,6 +389,83 @@ qtest_find_device(struct qtest_session *s, const char 
*name)
return NULL;
 }

+/*
+ * The function is used for reading pci configuration space of specifed device.
+ */
+int
+qtest_read_pci_cfg(struct qtest_session *s, const char *name,
+   void *buf, size_t len, off_t offset)
+{
+   struct qtest_pci_device *dev;
+   uint32_t i;
+   uint8_t *p = buf;
+
+   dev = qtest_find_device(s, name);
+   if (dev == NULL) {
+   PMD_DRV_LOG(ERR, "Cannot find specified device: %s", name);
+   return -1;
+   }
+
+   for (i = 0; i < len; i++) {
+   *(p + i) = qtest_pci_inb(s,
+   dev->bus_addr, dev->device_addr, 0, offset + i);
+   }
+
+   return 0;
+}
+
+static struct qtest_pci_bar *
+qtest_get_bar(struct qtest_session *s, const char *name, uint8_t bar)
+{
+   struct qtest_pci_device *dev;
+
+   if (bar >= NB_BAR) {
+   PMD_DRV_LOG(ERR, "Invalid bar is specified: %u", bar);
+   return NULL;
+   }
+
+   dev = qtest_find_device(s, name);
+   if (dev == NULL) {
+   PMD_DRV_LOG(ERR, "Cannot find specified device: %s", name);
+   return NULL;
+   }
+
+   if (dev->bar[bar].type == QTEST_PCI_BAR_DISABLE) {
+   PMD_DRV_LOG(ERR, "Cannot find valid BAR(%s): %u", name, bar);
+   return NULL;
+   }
+
+   return >bar[bar];
+}
+
+int
+qtest_get_bar_addr(struct qtest_session *s, const char *name,
+   uint8_t bar, uint64_t **addr)
+{
+   struct qtest_pci_bar *bar_ptr;
+
+   bar_ptr = qtest_get_bar(s, name, bar);
+   if (bar_ptr == NULL)
+   return -1;
+
+   *addr = (uint64_t *)bar_ptr->region_start;
+   return 0;
+}
+
+int
+qtest_get_bar_size(struct qtest_session *s, const char *name,
+   uint8_t bar, uint64_t *size)
+{
+   struct qtest_pci_bar *bar_ptr;
+
+   bar_ptr = qtest_get_bar(s, name, bar);
+   if (bar_ptr == NULL)
+   return -1;
+
+   *size = bar_ptr->region_size;
+   return 0;
+}
+
 static void
 qtest_event_send(struct qtest_session *s, char *buf)
 {
diff --git a/drivers/net/virtio/virtio_qtest/qtest_utils.h 
b/drivers/net/virtio/virtio_qtest/qtest_utils.h
index 6c70552..e41374f 100644
--- a/drivers/net/virtio/virtio_qtest/qtest_utils.h
+++ b/drivers/net/virtio/virtio_qtest/qtest_utils.h
@@ -218,6 +218,62 @@ void qtest_write(struct qtest_session *s, uint64_t addr,

 /**
  * @internal
+ * Read pci configuration space of QEMU guest.
+ *
+ * @param s
+ *   The pointer to qtest session structure.
+ * @param name
+ *   The name of pci device.
+ * @param buf
+ *   The pointer to the buffer.
+ * @param len
+ *   Length to read.
+ * @param offset
+ *   Offset of pci configuration space.
+ * @return
+ *   0 on success, negative on error
+ */
+int qtest_read_pci_cfg(struct qtest_session *s, const char *name,
+   void *buf, size_t len, off_t offset);
+
+/**
+ * @internal
+ * Get BAR address of a specified pci device.
+ *
+ * @param s
+ *   The pointer to qtest session structure.
+ * @param name
+ *   The name of pci device.
+ * @param bar
+ *   The index of BAR. Should be between 0 to 5.
+ * @param addr
+ *   The pointer to store BAR address.
+ * @return
+ *   0 on success, negative on error
+ */
+int qtest_get_bar_addr(struct qtest_session *s, const char *name,
+   uint8_t bar, uint64_t **addr);
+
+/**
+ * @internal
+ * Get BAR size of a specified pci device.
+ *
+ * @param s
+ *   The pointer to qtest session structure.
+ * @param name
+ *   The name of pci device.
+ * @param bar
+ *   The index of BAR. Should be between 0 to 5.
+ * @param size
+ *   The pointer to store BAR size.
+ * @return
+ *   0 on success, negative on error
+ */
+int qtest_get_bar_size(struct qtest_session *s, const char *name,
+   uint8_t bar, uint64_t *size);
+
+/**
+ * @internal
  * Initialization function of general device.
  *
  * @param s
-- 
2.7.4

[dpdk-dev] [PATCH v5 3/6] virtio, qtest: Add functionality to share memory between QTest guest

2016-06-02 Thread Tetsuya Mukawa

The patch adds functionality to share memory between QTest guest and
DPDK application using ivshmem device.
The shared memory will be all EAL memory on hugepages. This memory will
be accessed by QEMU vcpu and DPDK application using same address.

Signed-off-by: Tetsuya Mukawa 
---
 drivers/net/virtio/virtio_qtest/qtest_utils.c | 189 +-
 drivers/net/virtio/virtio_qtest/qtest_utils.h |   4 +-
 2 files changed, 189 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio/virtio_qtest/qtest_utils.c 
b/drivers/net/virtio/virtio_qtest/qtest_utils.c
index 2c088f0..9bc1fca 100644
--- a/drivers/net/virtio/virtio_qtest/qtest_utils.c
+++ b/drivers/net/virtio/virtio_qtest/qtest_utils.c
@@ -43,6 +43,9 @@
 #include "../virtio_ethdev.h"
 #include "qtest_utils.h"

+/* ivshmem configuration */
+#define IVSHMEM_PROTOCOL_VERSION0
+
 #define PCI_CONFIG_ADDR(_bus, _device, _function, _offset) ( \
(1 << 31) | ((_bus) & 0xff) << 16 | ((_device) & 0x1f) << 11 | \
((_function) & 0x7) << 8 | ((_offset) & 0xfc))
@@ -59,6 +62,7 @@ union qtest_pipefds {

 struct qtest_session {
int qtest_socket;
+   int ivshmem_socket;
pthread_mutex_t qtest_session_lock;

struct qtest_pci_device_list head;
@@ -411,6 +415,7 @@ qtest_close_sockets(struct qtest_session *s)
qtest_close_one_socket(>qtest_socket);
qtest_close_one_socket(>msgfds.readfd);
qtest_close_one_socket(>msgfds.writefd);
+   qtest_close_one_socket(>ivshmem_socket);
 }

 static void
@@ -716,6 +721,172 @@ qtest_register_target_devices(struct qtest_session *s,
 }

 static int
+qtest_send_message_to_ivshmem(int sock_fd, uint64_t client_id, int shm_fd)
+{
+   struct iovec iov;
+   struct msghdr msgh;
+   size_t fdsize = sizeof(int);
+   char control[CMSG_SPACE(fdsize)];
+   struct cmsghdr *cmsg;
+   int ret;
+
+   memset(, 0, sizeof(msgh));
+   iov.iov_base = _id;
+   iov.iov_len = sizeof(client_id);
+
+   msgh.msg_iov = 
+   msgh.msg_iovlen = 1;
+
+   if (shm_fd >= 0) {
+   msgh.msg_control = 
+   msgh.msg_controllen = sizeof(control);
+   cmsg = CMSG_FIRSTHDR();
+   cmsg->cmsg_len = CMSG_LEN(fdsize);
+   cmsg->cmsg_level = SOL_SOCKET;
+   cmsg->cmsg_type = SCM_RIGHTS;
+   memcpy(CMSG_DATA(cmsg), _fd, fdsize);
+   }
+
+   do {
+   ret = sendmsg(sock_fd, , 0);
+   } while (ret < 0 && errno == EINTR);
+
+   if (ret < 0) {
+   PMD_DRV_LOG(ERR, "sendmsg error");
+   return ret;
+   }
+
+   return ret;
+}
+
+/* This function is came from ../virtio_user/vhost_user.c
+ *
+ * Two possible options:
+ * 1. Match HUGEPAGE_INFO_FMT to find the file storing struct hugepage_file
+ * array. This is simple but cannot be used in secondary process because
+ * secondary process will close and munmap that file.
+ * 2. Match HUGEFILE_FMT to find hugepage files directly.
+ *
+ * We choose option 2.
+ */
+struct hugepage_file_info {
+   uint64_t addr;/**< virtual addr */
+   size_t   size;/**< the file size */
+   char path[PATH_MAX];  /**< path to backing file */
+};
+
+static int
+get_hugepage_file_info(struct hugepage_file_info huges[], int max)
+{
+   int idx;
+   FILE *f;
+   char buf[BUFSIZ], *tmp, *tail;
+   char *str_underline, *str_start;
+   int huge_index;
+   uint64_t v_start, v_end;
+
+   f = fopen("/proc/self/maps", "r");
+   if (!f) {
+   PMD_DRV_LOG(ERR, "cannot open /proc/self/maps");
+   return -1;
+   }
+
+   idx = 0;
+   while (fgets(buf, sizeof(buf), f) != NULL) {
+   sscanf(buf, "%" PRIx64 "-%" PRIx64, _start, _end);
+
+   tmp = strchr(buf, ' ') + 1; /** skip address */
+   tmp = strchr(tmp, ' ') + 1; /** skip perm */
+   tmp = strchr(tmp, ' ') + 1; /** skip offset */
+   tmp = strchr(tmp, ' ') + 1; /** skip dev */
+   tmp = strchr(tmp, ' ') + 1; /** skip inode */
+   while (*tmp == ' ') /** skip spaces */
+   tmp++;
+   tail = strrchr(tmp, '\n');  /** remove newline if exists */
+   if (tail)
+   *tail = '\0';
+
+   /* Match HUGEFILE_FMT, aka "%s/%smap_%d",
+* which is defined in eal_filesystem.h
+*/
+   str_underline = strrchr(tmp, '_');
+   if (!str_underline)
+   continue;
+
+   str_start = str_underline - strlen("map");
+   if (str_start < tmp)
+   continue;
+
+   if (sscanf(str_start, "map_%d", _index) != 1)
+   continue;
+
+   if (idx >= max) {
+   PMD_DRV_LOG(ERR, "Exceed maximum of %d", max);
+   goto error;
+

[dpdk-dev] [PATCH v5 2/6] virtio, qtest: Add pci device initialization function to qtest utils

2016-06-02 Thread Tetsuya Mukawa

The patch adds general pci device initialization functionality to
qtest utils. It initializes pci devices using qtest messaging.

Signed-off-by: Tetsuya Mukawa 
---
 drivers/net/virtio/virtio_qtest/qtest_utils.c | 349 +-
 drivers/net/virtio/virtio_qtest/qtest_utils.h | 114 -
 2 files changed, 461 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtio_qtest/qtest_utils.c 
b/drivers/net/virtio/virtio_qtest/qtest_utils.c
index 3ad8f9e..2c088f0 100644
--- a/drivers/net/virtio/virtio_qtest/qtest_utils.c
+++ b/drivers/net/virtio/virtio_qtest/qtest_utils.c
@@ -43,6 +43,10 @@
 #include "../virtio_ethdev.h"
 #include "qtest_utils.h"

+#define PCI_CONFIG_ADDR(_bus, _device, _function, _offset) ( \
+   (1 << 31) | ((_bus) & 0xff) << 16 | ((_device) & 0x1f) << 11 | \
+   ((_function) & 0x7) << 8 | ((_offset) & 0xfc))
+
 union qtest_pipefds {
struct {
int pipefd[2];
@@ -57,6 +61,8 @@ struct qtest_session {
int qtest_socket;
pthread_mutex_t qtest_session_lock;

+   struct qtest_pci_device_list head;
+
pthread_t event_th;
int event_th_started;
char *evq;
@@ -195,6 +201,119 @@ qtest_raw_write(struct qtest_session *s, uint64_t addr, 
uint32_t val, char type)
 }

 /*
+ * qtest_pci_inX/outX are used for accessing PCI configuration space.
+ * The functions are implemented based on PCI configuration space
+ * specification.
+ * Accroding to the spec, access size of read()/write() should be 4 bytes.
+ */
+static int
+qtest_pci_inb(struct qtest_session *s, uint8_t bus, uint8_t device,
+   uint8_t function, uint8_t offset)
+{
+   uint32_t tmp;
+
+   tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+   if (pthread_mutex_lock(>qtest_session_lock) < 0)
+   rte_panic("Cannot lock mutex\n");
+
+   qtest_raw_out(s, 0xcf8, tmp, 'l');
+   tmp = qtest_raw_in(s, 0xcfc, 'l');
+
+   if (pthread_mutex_unlock(>qtest_session_lock) < 0)
+   rte_panic("Cannot unlock mutex\n");
+
+   return (tmp >> ((offset & 0x3) * 8)) & 0xff;
+}
+
+static uint32_t
+qtest_pci_inl(struct qtest_session *s, uint8_t bus, uint8_t device,
+   uint8_t function, uint8_t offset)
+{
+   uint32_t tmp;
+
+   tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+   if (pthread_mutex_lock(>qtest_session_lock) < 0)
+   rte_panic("Cannot lock mutex\n");
+
+   qtest_raw_out(s, 0xcf8, tmp, 'l');
+   tmp = qtest_raw_in(s, 0xcfc, 'l');
+
+   if (pthread_mutex_unlock(>qtest_session_lock) < 0)
+   rte_panic("Cannot unlock mutex\n");
+
+   return tmp;
+}
+
+static void
+qtest_pci_outl(struct qtest_session *s, uint8_t bus, uint8_t device,
+   uint8_t function, uint8_t offset, uint32_t value)
+{
+   uint32_t tmp;
+
+   tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+   if (pthread_mutex_lock(>qtest_session_lock) < 0)
+   rte_panic("Cannot lock mutex\n");
+
+   qtest_raw_out(s, 0xcf8, tmp, 'l');
+   qtest_raw_out(s, 0xcfc, value, 'l');
+
+   if (pthread_mutex_unlock(>qtest_session_lock) < 0)
+   rte_panic("Cannot unlock mutex\n");
+}
+
+static uint64_t
+qtest_pci_inq(struct qtest_session *s, uint8_t bus, uint8_t device,
+   uint8_t function, uint8_t offset)
+{
+   uint32_t tmp;
+   uint64_t val;
+
+   tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+   if (pthread_mutex_lock(>qtest_session_lock) < 0)
+   rte_panic("Cannot lock mutex\n");
+
+   qtest_raw_out(s, 0xcf8, tmp, 'l');
+   val = (uint64_t)qtest_raw_in(s, 0xcfc, 'l');
+
+   tmp = PCI_CONFIG_ADDR(bus, device, function, offset + 4);
+
+   qtest_raw_out(s, 0xcf8, tmp, 'l');
+   val |= (uint64_t)qtest_raw_in(s, 0xcfc, 'l') << 32;
+
+   if (pthread_mutex_unlock(>qtest_session_lock) < 0)
+   rte_panic("Cannot unlock mutex\n");
+
+   return val;
+}
+
+static void
+qtest_pci_outq(struct qtest_session *s, uint8_t bus, uint8_t device,
+   uint8_t function, uint8_t offset, uint64_t value)
+{
+   uint32_t tmp;
+
+   tmp = PCI_CONFIG_ADDR(bus, device, function, offset);
+
+   if (pthread_mutex_lock(>qtest_session_lock) < 0)
+   rte_panic("Cannot lock mutex\n");
+
+   qtest_raw_out(s, 0xcf8, tmp, 'l');
+   qtest_raw_out(s, 0xcfc, (uint32_t)(value & 0x), 'l');
+
+   tmp = PCI_CONFIG_ADDR(bus, device, function, offset + 4);
+
+   qtest_raw_out(s, 0xcf8, tmp, 'l');
+   qtest_raw_out(s, 0xcfc, (uint32_t)(value >> 32), 'l');
+
+   if (pthread_mutex_unlock(>qtest_session_lock) < 0)
+   rte_panic("Cannot unlock mutex\n");
+}
+
+/*
  * qtest_in/out are used for accessing ioport of qemu guest.
  * qtest_read/write are used for accessing memory of qemu guest.
  */
@@ -254,6 +373,18 @@ qtest_write(struct qtest_session *s, uint64_t addr, 
uint64_t val, char type)

[dpdk-dev] [PATCH v5 1/6] virtio, qtest: Add QTest utility basic functions

2016-06-02 Thread Tetsuya Mukawa

The patch adds basic functions for accessing to QEMU quest that runs in
QTest mode. The functions will be used by virtio container extension
that can access to the above guest.

Signed-off-by: Tetsuya Mukawa 
---
 config/common_linuxapp|   2 +
 drivers/net/virtio/Makefile   |   4 +
 drivers/net/virtio/virtio_qtest/qtest_utils.c | 480 ++
 drivers/net/virtio/virtio_qtest/qtest_utils.h | 119 +++
 4 files changed, 605 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_qtest/qtest_utils.c
 create mode 100644 drivers/net/virtio/virtio_qtest/qtest_utils.h

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 946a6d4..3bf6237 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -46,3 +46,5 @@ CONFIG_RTE_LIBRTE_POWER=y

 # Enable virtio-user
 CONFIG_RTE_VIRTIO_VDEV=y
+# Enable virtio-qtest
+CONFIG_RTE_VIRTIO_QTEST=y
diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 13b2b75..0b1ccff 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -63,6 +63,10 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += 
virtio_user/virtio_user_dev.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_pci.c
 endif

+ifeq ($(CONFIG_RTE_VIRTIO_QTEST),y)
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_qtest/qtest_utils.c
+endif
+
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_mempool lib/librte_mbuf
diff --git a/drivers/net/virtio/virtio_qtest/qtest_utils.c 
b/drivers/net/virtio/virtio_qtest/qtest_utils.c
new file mode 100644
index 000..3ad8f9e
--- /dev/null
+++ b/drivers/net/virtio/virtio_qtest/qtest_utils.c
@@ -0,0 +1,480 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 IGEL Co., Ltd. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of IGEL Co., Ltd. nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "../virtio_logs.h"
+#include "../virtio_ethdev.h"
+#include "qtest_utils.h"
+
+union qtest_pipefds {
+   struct {
+   int pipefd[2];
+   };
+   struct {
+   int readfd;
+   int writefd;
+   };
+};
+
+struct qtest_session {
+   int qtest_socket;
+   pthread_mutex_t qtest_session_lock;
+
+   pthread_t event_th;
+   int event_th_started;
+   char *evq;
+   char *evq_dequeue_ptr;
+   size_t evq_total_len;
+
+   union qtest_pipefds msgfds;
+};
+
+static int
+qtest_raw_send(int fd, char *buf, size_t count)
+{
+   size_t len = count;
+   size_t total_len = 0;
+   int ret = 0;
+
+   while (len > 0) {
+   ret = write(fd, buf, len);
+   if (ret == -1) {
+   if (errno == EINTR)
+   continue;
+   return ret;
+   }
+   if (ret == (int)len)
+   break;
+   total_len += ret;
+   buf += ret;
+   len -= ret;
+   }
+   return total_len + ret;
+}
+
+static int
+qtest_raw_recv(int fd, char *buf, size_t count)
+{
+   size_t len = count;
+   size_t total_len = 0;
+   int ret = 0;
+
+   while (len > 0) {
+   ret = read(fd, buf, len);
+   if (ret <= 0) {
+   if (errno == EINTR)

[dpdk-dev] [PATCH v5 0/6] Virtio-net PMD: QEMU QTest extension for container

2016-06-02 Thread Tetsuya Mukawa

The patches will work on below patch series.
 - [PATCH v5 0/8] virtio support for container

It seems his implementation will be changed a bit.
So, this patch series are also going to be changed to follow his implementation.


[Changes]
v5 changes:
 - Rebase on latest dpdk-next-virtio.
 - Follow Jianfeng's implementation to support virtual virtio-net device.
 - Split the patch series like followings.
   - This patch series.
 Only support basic functions.
 The functions to handle LSC interrupt and '--range-virtaddr' was
 removed from this patch series.
 This patch needs EAL memory mapped between (1<<31) to (1<<44).
 To allocate such a memory, just assume the user will use '--base-virtaddr'.
 If appropriate memory cannot be allocated, this PMD will exit as error.
 Then the users can try other values.
  - Supplement patches to support link status interrupt.
  - Supplement patches to support '--range-virtaddr'.
This EAL option will help to allocate memory mapped between (1<<31) to
(1<<44).

v4 changes:
 - Rebase on latest master.
 - Split patches.
 - To abstract qtest code more, change interface between current virtio
   code and qtest code.
 - Rename qtest.c to qtest_utils.c
 - Change implementation like below.
   - Set pci device information out of qtest abstraction, then pass it to
 qtest to initialize devices.
 - Remove redundant condition checking from qtest_raw_send/recv().
 - Fix return value of qtest_raw_send().

v3 changes:
 - Rebase on latest master.
 - remove "-qtest-virtio" option, then add "--range-virtaddr" and
   "--align-memsize" options.
 - Fix typos in qtest.c

v2 changes:
 - Rebase on above patch seiries.
 - Rebase on master
 - Add "--qtest-virtio" EAL option.
 - Fixes in qtest.c
  - Fix error handling for the case qtest connection is closed.
  - Use eventfd for interrupt messaging.
  - Use linux header for PCI register definitions.
  - Fix qtest_raw_send/recv to handle error correctly.
  - Fix bit mask of PCI_CONFIG_ADDR.
  - Describe memory and ioport usage of qtest guest in qtest.c
  - Remove loop that is for finding PCI devices.


[Abstraction]

Normally, virtio-net PMD only works on VM, because there is no virtio-net 
device on host.
This patches extend  virtio-net PMD to be able to work on host as virtual PMD.
But we didn't implement virtio-net device as a part of virtio-net PMD.
To prepare virtio-net device for the PMD, start QEMU process with special QTest 
mode, then connect it from virtio-net PMD through unix domain socket.

The PMD can connect to anywhere QEMU virtio-net device can.
For example, the PMD can connects to vhost-net kernel module and vhost-user 
backend application.
Similar to virtio-net PMD on QEMU, application memory that uses virtio-net PMD 
will be shared between vhost backend application.
But vhost backend application memory will not be shared.

Main target of this PMD is container like docker, rkt, lxc and etc.
We can isolate related processes(virtio-net PMD process, QEMU and vhost-user 
backend process) by container.
But, to communicate through unix domain socket, shared directory will be needed.


[How to use]

 Please use QEMU-2.5.1, or above.
 (So far, QEMU-2.5.1 hasn't been released yet, so please checkout master from 
QEMU repository)

 - Compile
 Set "CONFIG_RTE_VIRTIO_VDEV_QTEST=y" in config/common_linux.
 Then compile it.

 - Start QEMU like below.
 $ qemu-system-x86_64 \
  -machine pc-i440fx-1.4,accel=qtest \
  -display none -qtest-log /dev/null \
  -qtest unix:/tmp/socket,server \
  -netdev type=tap,script=/etc/qemu-ifup,id=net0,queues=1 \
  -device 
virtio-net-pci,netdev=net0,mq=on,disable-modern=false,addr=3 \
  -chardev socket,id=chr1,path=/tmp/ivshmem,server \
  -device ivshmem,size=1G,chardev=chr1,vectors=1,addr=4

 - Start DPDK application like below
 $ testpmd -c f -n 1 -m 1024 --no-pci --base-virtaddr=0x4 \
 --vdev="eth_virtio_qtest0,qtest=/tmp/socket,ivshmem=/tmp/ivshmem"\
 -- --disable-hw-vlan --txqflags=0xf00 -i

(*1) Please Specify same memory size in QEMU and DPDK command line.
(*2) Should use qemu-2.5.1, or above.
(*3) QEMU process is needed per port.
(*4) virtio-1.0 device are only supported.
(*5) The vhost backends like vhost-net and vhost-user can be specified.
(*6) In most cases, just using above command is enough, but you can also
 specify other QEMU virtio-net options.
(*7) Only checked "pc-i440fx-1.4" machine, but may work with other
 machines. It depends on a machine has piix3 south bridge.
 If the machine doesn't have, virtio-net PMD cannot receive status
 changed interrupts.
(*8) Should not add "--enable-kvm" to QEMU command line.


[Detailed Description]

 - virtio-net device implementation
The PMD uses QEMU virtio-net device. To do that, QEMU QTest functionality is 
used.
QTest is a test framework of QEMU devices. It allows us to implement a device 
driver

[dpdk-dev] [PATCH v6 1/5] mempool: support external handler

2016-06-02 Thread Hunt, David



On 6/1/2016 6:54 PM, Jan Viktorin wrote:
>
>   mp->populated_size--;
> @@ -383,13 +349,16 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char 
> *vaddr,
>   unsigned i = 0;
>   size_t off;
>   struct rte_mempool_memhdr *memhdr;
> - int ret;
>   
>   /* create the internal ring if not already done */
>   if ((mp->flags & MEMPOOL_F_RING_CREATED) == 0) {
> - ret = rte_mempool_ring_create(mp);
> - if (ret < 0)
> - return ret;
> + rte_errno = 0;
> + mp->pool = rte_mempool_ops_alloc(mp);
> + if (mp->pool == NULL) {
> + if (rte_errno == 0)
> + return -EINVAL;
> + return -rte_errno;
> Are you sure the rte_errno is a positive value?

If the user returns EINVAL, or similar, we want to return negative, as 
per the rest of DPDK.

>> @@ -204,9 +205,13 @@ struct rte_mempool_memhdr {
>>   struct rte_mempool {
>>  char name[RTE_MEMPOOL_NAMESIZE]; /**< Name of mempool. */
>>  struct rte_ring *ring;   /**< Ring to store objects. */
>> +union {
>> +void *pool;  /**< Ring or pool to store objects */
> What about calling this pdata or priv? I think, it can improve some doc 
> comments.
> Describing a "pool" may refer to both the rte_mempool itself or to the 
> mp->pool
> pointer. The "priv" would help to understand the code a bit.
>
> Then the rte_mempool_alloc_t can be called rte_mempool_priv_alloc_t. Or 
> something
> similar. It's than clear enough, what the function should do...

I'd lean towards pdata or maybe pool_data. Not sure about the function 
name change though,
the function does return a pool_data pointer, which we put into 
mp->pool_data.

>> +uint64_t *pool_id;   /**< External mempool identifier */
> Why is pool_id a pointer? Is it a typo? I've understood it should be 64b wide
> from the discussion (Olivier, Jerin, David):

Yes, typo.

>   uint32_t trailer_size;   /**< Size of trailer (after elt). */
>   
>   unsigned private_data_size;  /**< Size of private data. */
> + /**
> +  * Index into rte_mempool_handler_table array of mempool handler ops
> s/rte_mempool_handler_table/rte_mempool_ops_table/

Done.

>> + * structs, which contain callback function pointers.
>> + * We're using an index here rather than pointers to the callbacks
>> + * to facilitate any secondary processes that may want to use
>> + * this mempool.
>> + */
>> +int32_t handler_idx;
> s/handler_idx/ops_idx/
>
> What about ops_index? Not a big deal...

I agree ops_index is better. Changed.

>>   
>>  struct rte_mempool_cache *local_cache; /**< Per-lcore local cache */
>>   
>> @@ -325,6 +338,199 @@ void rte_mempool_check_cookies(const struct 
>> rte_mempool *mp,
>>   #define __mempool_check_cookies(mp, obj_table_const, n, free) do {} 
>> while(0)
>>   #endif /* RTE_LIBRTE_MEMPOOL_DEBUG */
>>   
>> +#define RTE_MEMPOOL_HANDLER_NAMESIZE 32 /**< Max length of handler name. */
>> +
>> +/**
>> + * typedef for allocating the external pool.
> What about:
>
> function prototype to provide an implementation specific data
>
>> + * The handler's alloc function does whatever it needs to do to grab memory
>> + * for this handler, and sets the *pool opaque pointer in the rte_mempool
>> + * struct. In the default handler, *pool points to a ring,in other handlers,
> What about:
>
> The function should provide a memory heap representation or another private 
> data
> used for allocation by the rte_mempool_ops. E.g. the default ops provides an
> instance of the rte_ring for this purpose.
>
>> + * it will mostlikely point to a different type of data structure.
>> + * It will be transparent to the application programmer.
> I'd add something like this:
>
> The function should not touch the given *mp instance.

Agreed. Reworked somewhat.


> ...because it's goal is NOT to set the mp->pool, this is done by the
> rte_mempool_populate_phys - the caller of this rte_mempool_ops_alloc.
>
> This is why I've suggested to pass the rte_mempool as const in the v5.
> Is there any reason to modify the rte_mempool contents by the implementation?
> I think, we just need to read the flags, socket_id, etc.

Yes, I agree it should be const. Changed.

>> + */
>> +typedef void *(*rte_mempool_alloc_t)(struct rte_mempool *mp);
>> +
>> +/** Free the external pool opaque data (that pointed to by *pool) */
> What about:
>
> /** Free the opaque private data stored in the mp->pool pointer. */

I've merged the two versions of the comment:
/** Free the opaque private data pointed to by mp->pool_data pointer */


>> +typedef void (*rte_mempool_free_t)(void *p);
>> +
>> +/**
>> + * Put an object in the external pool.
>> + * The *p pointer is the opaque data for a given mempool handler (ring,
>> + * array, linked list, etc)
> The obj_table is not documented. Is it really a table? I'd called an

[dpdk-dev] [PATCH] librte_sched: fix compile error on unused parameter red

2016-06-02 Thread Jasvinder Singh

This commit fixes the following compile error messages when
CONFIG_RTE_SCHED_RED=n and CONFIG_RTE_SCHED_COLLECT_STATS=y;

rte_sched.c: In function ?rte_sched_port_update_subport_stats_on_drop?:
rte_sched.c:1090:41: error: unused parameter ?red? [-Werror=unused-parameter]
struct rte_mbuf *pkt, uint32_t red)
 ^
rte_sched.c: In function ?rte_sched_port_update_queue_stats_on_drop?:
rte_sched.c:1116:39: error: unused parameter ?red? [-Werror=unused-parameter]
struct rte_mbuf *pkt, uint32_t red)

Fixes: 4d51afb5cdb6 ("sched: keep track of RED drops")

Signed-off-by: Jasvinder Singh 
Acked-by: Cristian Dumitrescu 
---
 lib/librte_sched/rte_sched.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 1609ea8..8696423 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -1084,10 +1084,17 @@ rte_sched_port_update_subport_stats(struct 
rte_sched_port *port, uint32_t qindex
s->stats.n_bytes_tc[tc_index] += pkt_len;
 }

+#ifdef RTE_SCHED_RED
 static inline void
 rte_sched_port_update_subport_stats_on_drop(struct rte_sched_port *port,
-   uint32_t qindex,
-   struct rte_mbuf *pkt, uint32_t red)
+   uint32_t qindex,
+   struct rte_mbuf *pkt, uint32_t 
red)
+#else
+static inline void
+rte_sched_port_update_subport_stats_on_drop(struct rte_sched_port *port,
+   uint32_t qindex,
+   struct rte_mbuf *pkt, 
__rte_unused uint32_t red)
+#endif
 {
struct rte_sched_subport *s = port->subport + (qindex / 
rte_sched_port_queues_per_subport(port));
uint32_t tc_index = (qindex >> 2) & 0x3;
@@ -1110,10 +1117,17 @@ rte_sched_port_update_queue_stats(struct rte_sched_port 
*port, uint32_t qindex,
qe->stats.n_bytes += pkt_len;
 }

+#ifdef RTE_SCHED_RED
 static inline void
 rte_sched_port_update_queue_stats_on_drop(struct rte_sched_port *port,
- uint32_t qindex,
- struct rte_mbuf *pkt, uint32_t red)
+   uint32_t qindex,
+   struct rte_mbuf *pkt, uint32_t 
red)
+#else
+static inline void
+rte_sched_port_update_queue_stats_on_drop(struct rte_sched_port *port,
+   uint32_t qindex,
+   struct rte_mbuf *pkt, 
__rte_unused uint32_t red)
+#endif
 {
struct rte_sched_queue_extra *qe = port->queue_extra + qindex;
uint32_t pkt_len = pkt->pkt_len;
-- 
2.5.5

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Marc

On 1 June 2016 at 20:51, Thomas Monjalon  wrote:

> Hi Keith,
>
> I'll try to bring more context to this discussion below.
>
> 2016-06-01 15:00, Wiles, Keith:
> > Started from the link below, but did not want to highjack the thread.
> > http://dpdk.org/ml/archives/dev/2016-June/040021.html
> >
> > I was thinking about this problem from a user perspective and command
> > line options are very difficult to manage specifically when you have
> > a large number of options as we have in dpdk.
>
> The user uses an application.
> It is up to the application to let users do some configuration.
>
> > I see all of these options as a type of database of information for
> > the DPDK and the application, because the application command line
> > options are also getting very complex as well.
>
> DPDK is a collection of libraries.
> There is no command line options in a library.
> So we should not be talking about such issue. But...
>
> ... configuration of the DPDK libraries must be improved.
> We need some clean API to let the application configure a lot of things
> at runtime (during initialization or after).
> Ideally the API should not use an argc/argv format.
>
> We also have a lot of applications for tests or examples which use a
> common configuration scheme based on command line options.
> It is only for test and demonstration purpose. So it is not so important
> and must not be complex to maintain.
> I also think that we should avoid having to modify a configuration file
> for test applications. I like launching a freshly built testpmd with a
> copy-pasted command line without having to create a temporary
> configuration file.
>
> Instead of wrapping a messy configuration interface, we should proceed
> with this steps (in this order):
> - implement clean configuration API
> - move command line options parsing in a separate library
> - implement an alternative to the options parsing library, as an example
> - remove the options parsing library if the alternative is better
>
>
Fully agree on all that you say. To me:

* +1 on staying away from XML and JSON.

* INI is an option, but if there is the need of hierarchical another option
is libconfig
(
http://www.hyperrealm.com/libconfig/libconfig_manual.html#Configuration-Files)
that to me is more readable and user-friendly than JSON (not to mention
XML).

* As you, Thomas say, and as it has been discussed previously [1]; it would
be good that eal_init was not depending on argv and had a _simple_, and
with reasonable defaults, struct-based init API, and build wrapper
libraries on top of that, one being the command-line and another being a
configuration file (although they would be connected somehow).

Marc

[1] http://dpdk.org/ml/archives/dev/2013-August/000374.html

[dpdk-dev] [PATCH] scripts: check commits with checkpatch

2016-06-02 Thread Thomas Monjalon

The new option -n allows to give a number of commits to check
from the git HEAD.
If neither -n nor patch files are given, the commits after
origin/master are checked.

Signed-off-by: Thomas Monjalon 
---
 scripts/checkpatches.sh | 45 -
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/scripts/checkpatches.sh b/scripts/checkpatches.sh
index 4cae86d..24a246d 100755
--- a/scripts/checkpatches.sh
+++ b/scripts/checkpatches.sh
@@ -48,17 +48,19 @@ NEW_TYPEDEFS,COMPARISON_TO_NULL"

 print_usage () {
cat <<- END_OF_HELP
-   usage: $(basename $0) [-q] [-v] [patch1 [patch2] ...]]
+   usage: $(basename $0) [-q] [-v] [-nX|patch1 [patch2] ...]]

Run Linux kernel checkpatch.pl with DPDK options.
The environment variable DPDK_CHECKPATCH_PATH must be set.
END_OF_HELP
 }

+number=0
 quiet=false
 verbose=false
-while getopts hqv ARG ; do
+while getopts hn:qv ARG ; do
case $ARG in
+   n ) number=$OPTARG ;;
q ) quiet=true && options="$options --no-summary" ;;
v ) verbose=true ;;
h ) print_usage ; exit 0 ;;
@@ -74,17 +76,42 @@ if [ ! -x "$DPDK_CHECKPATCH_PATH" ] ; then
exit 1
 fi

+total=0
 status=0
-for p in "$@" ; do
-   ! $verbose || printf '\n### %s\n\n' "$p"
-   report=$($DPDK_CHECKPATCH_PATH $options "$p" 2>/dev/null)
+
+check () { #   
+   total=$(($total + 1))
+   ! $verbose || printf '\n### %s\n\n' "$3"
+   if [ -n "$1" ] ; then
+   report=$($DPDK_CHECKPATCH_PATH $options "$1" 2>/dev/null)
+   elif [ -n "$2" ] ; then
+   report=$(git format-patch --stdout -1 $commit |
+   $DPDK_CHECKPATCH_PATH $options - 2>/dev/null)
+   fi
[ $? -ne 0 ] || continue
-   $verbose || printf '\n### %s\n\n' "$p"
+   $verbose || printf '\n### %s\n\n' "$3"
printf '%s\n' "$report" | head -n -6
status=$(($status + 1))
-done
-pass=$(($# - $status))
-$quiet || printf '%d/%d valid patch' $pass $#
+}
+
+if [ -z "$1" ] ; then
+   if [ $number -eq 0 ] ; then
+   commits=$(git rev-list origin/master..)
+   else
+   commits=$(git rev-list --max-count=$number HEAD)
+   fi
+   for commit in $commits ; do
+   subject=$(git log --format='%s' -1 $commit)
+   check '' $commit "$subject"
+   done
+else
+   for patch in "$@" ; do
+   subject=$(sed -n 's,^Subject: ,,p' "$patch")
+   check "$patch" '' "$subject"
+   done
+fi
+pass=$(($total - $status))
+$quiet || printf '%d/%d valid patch' $pass $total
 $quiet || [ $pass -le 1 ] || printf 'es'
 $quiet || printf '\n'
 exit $status
-- 
2.7.0

[dpdk-dev] [PATCH v6 1/5] mempool: support external handler

2016-06-02 Thread Hunt, David

On 6/1/2016 6:54 PM, Jan Viktorin wrote:
> Hello David,
>
> the rename s/handler/ops/ has a lot of residues. Sorry for that :). I tried to
> mark most of them. Otherwise, I couldn't see many more serious issues for now.

Ah, I had assumed that we were just talking about the 
rte_mempool_handler_ops structure,
not a global replace of 'handler' with 'ops'.  It does make sense to 
change it to ops, so we don't have
two words describing the same entity. I'll change to ops.

Just, note the s/pool/priv/ rename suggestion.

I prefer your suggestion of pdata rather than priv, how about "pool_data"?

Again, thanks for the comprehensive review.

Regards,
Dave.

[...]

[dpdk-dev] [PATCH v6 7/7] virtio-user: add a new vdev named virtio-user

2016-06-02 Thread Jianfeng Tan

Add a new virtual device named vhost-user, which can be used just like
eth_ring, eth_null, etc. To reuse the code of original virtio, we do
some adjustment in virtio_ethdev.c, such as remove key _static_ of
eth_virtio_dev_init() so that it can be reused in virtual device; and
we add some check to make sure it will not crash.

Configured parameters include:
  - queues (optional, 1 by default), number of queue pairs, multi-queue
not supported for now.
  - cq (optional, 0 by default), not supported for now.
  - mac (optional), random value will be given if not specified.
  - queue_size (optional, 256 by default), size of virtqueues.
  - path (madatory), path of vhost, depends on the file type, vhost
user if the given path points to a unix socket; vhost-net if the
given path points to a char device.
  - ifname (optional), specify the name of backend tap device; only
valid when backend is vhost-net.

When enable CONFIG_RTE_VIRTIO_VDEV (enabled by default), the compiled
library can be used in both VM and container environment.

Examples:
path_vhost=/dev/vhost-net # use vhost-net as a backend
path_vhost= # use vhost-user as a backend

sudo ./examples/l2fwd/build/l2fwd -c 0x10 -n 4 \
--socket-mem 0,1024 --no-pci --file-prefix=l2fwd \
--vdev=virtio-user0,mac=00:01:02:03:04:05,path=$path_vhost -- -p 0x1

Known issues:
 - Control queue and multi-queue are not supported yet.
 - Cannot work with --huge-unlink.
 - Cannot work with no-huge.
 - Cannot work when there are more than VHOST_MEMORY_MAX_NREGIONS(8)
   hugepages.
 - Root privilege is a must (mainly becase of sorting hugepages according
   to physical address).
 - Applications should not use file name like HUGEFILE_FMT ("%smap_%d").

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-by: Neil Horman 
---
 doc/guides/rel_notes/release_16_07.rst  |  11 ++
 doc/guides/sample_app_ug/vhost.rst  |  17 +++
 drivers/net/virtio/virtio_ethdev.c  |  19 ++-
 drivers/net/virtio/virtio_ethdev.h  |   2 +
 drivers/net/virtio/virtio_user_ethdev.c | 218 
 5 files changed, 260 insertions(+), 7 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst 
b/doc/guides/rel_notes/release_16_07.rst
index f6d543c..78787ca 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -34,6 +34,17 @@ This section should contain new features added in this 
release. Sample format:

   Refer to the previous release notes for examples.

+* **Virtio support for containers.**
+
+  Add a new virtual device, named virtio-user, to support virtio for 
containers.
+
+  Known limitations:
+
+  * Control queue and multi-queue are not supported yet.
+  * Cannot work with --huge-unlink.
+  * Cannot work with --no-huge.
+  * Cannot work when there are more than VHOST_MEMORY_MAX_NREGIONS(8) 
hugepages.
+  * Root privilege is a must for sorting hugepages by physical address.

 Resolved Issues
 ---
diff --git a/doc/guides/sample_app_ug/vhost.rst 
b/doc/guides/sample_app_ug/vhost.rst
index 5f81802..a93e54d 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -833,3 +833,20 @@ For example:
 The above message indicates that device 0 has been registered with MAC address 
cc:bb:bb:bb:bb:bb and VLAN tag 1000.
 Any packets received on the NIC with these values is placed on the devices 
receive queue.
 When a virtio-net device transmits packets, the VLAN tag is added to the 
packet by the DPDK vhost sample code.
+
+Running virtio-user with vhost-switch
+-
+
+We can also use virtio-user with vhost-switch now.
+Virtio-user is a virtual device that can be run in a application (container) 
parallelly with vhost in the same OS,
+aka, there is no need to start a VM. We just run it with a different 
--file-prefix to avoid startup failure.
+
+.. code-block:: console
+
+cd ${RTE_SDK}/x86_64-native-linuxapp-gcc/app
+./testpmd -c 0x3 -n 4 --socket-mem 1024 --no-pci 
--file-prefix=virtio-user-testpmd \
+--vdev=virtio-user0,mac=00:01:02:03:04:05,path=$path_vhost \
+-- -i --txqflags=0xf01 --disable-hw-vlan
+
+There is no difference on the vhost side.
+Pleae note that there are some limitations (see release note for more 
information) in the usage of virtio-user.
diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 90f1a28..e1d5f0b 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -59,7 +59,6 @@
 #include "virtqueue.h"
 #include "virtio_rxtx.h"

-static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
 static int eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev);
 static int  virtio_dev_configure(struct rte_eth_dev *dev);
 static int  virtio_dev_start(struct rte_eth_dev *dev);
@@ -1042,7 +1041,7 @@ rx_func_get(struct rte_eth_dev *eth_dev)
  * This function is based on probe() function in virtio_pci.c
  * It returns 0

[dpdk-dev] [PATCH v6 6/7] virtio-user: add new virtual pci driver for virtio

2016-06-02 Thread Jianfeng Tan

This patch implements another new instance of struct virtio_pci_ops to
drive the virtio-user virtual device. Instead of rd/wr ioport or PCI
configuration space, this virtual pci driver will rd/wr the virtual
device struct virtio_user_hw, and when necessary, invokes APIs provided
by device emulation later to start/stop the device.

  --
  | -- |
  | | virtio driver  | |> (virtio_user_ethdev.c)
  | -- |
  | |  |
  | -- | -->  virtio-user PMD
  | | device emulate | |
  | || |
  | | vhost adapter  | |
  | -- |
  --
|
|
|
   --
   | vhost backend  |
   --

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-by: Neil Horman 
---
 drivers/net/virtio/Makefile |   1 +
 drivers/net/virtio/virtio_pci.h |   1 +
 drivers/net/virtio/virtio_user_ethdev.c | 218 
 3 files changed, 220 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_user_ethdev.c

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 68068bd..d913df0 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -60,6 +60,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_kernel.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_dev.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user_ethdev.c
 endif

 # this lib depends upon:
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index a76daf7..d10d013 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -260,6 +260,7 @@ struct virtio_hw {
struct virtio_pci_common_cfg *common_cfg;
struct virtio_net_config *dev_cfg;
const struct virtio_pci_ops *vtpci_ops;
+   void*virtio_user_dev;
 };

 /*
diff --git a/drivers/net/virtio/virtio_user_ethdev.c 
b/drivers/net/virtio/virtio_user_ethdev.c
new file mode 100644
index 000..0ea3f23
--- /dev/null
+++ b/drivers/net/virtio/virtio_user_ethdev.c
@@ -0,0 +1,218 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "virtio_logs.h"
+#include "virtio_pci.h"
+#include "virtqueue.h"
+#include "virtio_user/virtio_user_dev.h"
+
+#define virtio_user_get_dev(hw) \
+   ((struct virtio_user_dev *)(hw)->virtio_user_dev);
+
+static void
+virtio_user_read_dev_config(struct virtio_hw *hw, uint64_t offset,
+void *dst, int length)
+{
+   int i;
+   struct virtio_user_dev *dev = virtio_user_get_dev(hw);
+
+   if (offset == offsetof(struct virtio_net_config, mac) &&
+   length == ETHER_ADDR_LEN) {
+   for (i = 0; i < ETHER_ADDR_LEN; ++i)
+   ((uint8_t *)dst)[i] = dev->mac_addr[i];
+   return;
+   }
+
+   if (offset == offsetof(struct virtio_net_config, status))
+   *(uint16_t *)dst = dev->status;
+
+   if (offset == offsetof(struct virtio_net_config, max_virtqueue_pairs))
+   *(uint16_t *)dst =

[dpdk-dev] [PATCH v6 4/7] virtio-user: add vhost adapter layer

2016-06-02 Thread Jianfeng Tan

This patch is to provide vhost adapter layer implementations. Instead
of relying on a hypervisor to translate between device emulation and
vhost backend, here we directly talk with vhost backend through the
vhost file. Depending on the type of vhost file,
  - vhost-user is used if the given path points to a unix socket;
  - vhost-kernel is used if the given path points to a char device.

Here three main APIs are provided to upper layer (device emulation):
  - vhost_user_setup(), to set up env to talk to a vhost user backend;
  - vhost_kernel_setup(), to set up env to talk to a vhost kernel backend.
  - vhost_call(), to provide a unified interface to communicate with
vhost backend.

  --
  | -- |
  | | virtio driver  | |
  | -- |
  | |  |
  | -- | -->  virtio-user PMD
  | | device emulate | |
  | || |
  | | vhost adapter  |-|> (vhost_user.c, vhost_kernel.c, vhost.c)
  | -- |
  --
|
| -- --> (vhost-user protocol or vhost-net ioctls)
|
   --
   | vhost backend  |
   --

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-by: Neil Horman 
---
 config/common_linuxapp|   3 +
 drivers/net/virtio/Makefile   |   6 +
 drivers/net/virtio/virtio_user/vhost.c| 105 +++
 drivers/net/virtio/virtio_user/vhost.h| 222 +++
 drivers/net/virtio/virtio_user/vhost_kernel.c | 254 +
 drivers/net/virtio/virtio_user/vhost_user.c   | 378 ++
 6 files changed, 968 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_user/vhost.c
 create mode 100644 drivers/net/virtio/virtio_user/vhost.h
 create mode 100644 drivers/net/virtio/virtio_user/vhost_kernel.c
 create mode 100644 drivers/net/virtio/virtio_user/vhost_user.c

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 7e698e2..946a6d4 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -43,3 +43,6 @@ CONFIG_RTE_LIBRTE_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_POWER=y
+
+# Enable virtio-user
+CONFIG_RTE_VIRTIO_VDEV=y
diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index ef84f60..c9f2bc0 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -55,6 +55,12 @@ ifeq ($(findstring 
RTE_MACHINE_CPUFLAG_SSSE3,$(CFLAGS)),RTE_MACHINE_CPUFLAG_SSSE
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
 endif

+ifeq ($(CONFIG_RTE_VIRTIO_VDEV),y)
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_kernel.c
+endif
+
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_mempool lib/librte_mbuf
diff --git a/drivers/net/virtio/virtio_user/vhost.c 
b/drivers/net/virtio/virtio_user/vhost.c
new file mode 100644
index 000..1944a97
--- /dev/null
+++ b/drivers/net/virtio/virtio_user/vhost.c
@@ -0,0 +1,105 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH

[dpdk-dev] [PATCH v6 3/7] virtio: enable use virtual address to fill desc

2016-06-02 Thread Jianfeng Tan

This patch is related to how to calculate relative address for vhost
backend.

The principle is that: based on one or multiple shared memory regions,
vhost maintains a reference system with the frontend start address,
backend start address, and length for each segment, so that each
frontend address (GPA, Guest Physical Address) can be translated into
vhost-recognizable backend address. To make the address translation
efficient, we need to maintain as few regions as possible. In the case
of VM, GPA is always locally continuous. But for some other case, like
virtio-user, we use virtual address here.

It basically means:
  a. when set_base_addr, VA address is used;
  b. when preparing RX's descriptors, VA address is used;
  c. when transmitting packets, VA is filled in TX's descriptors;
  d. in TX and CQ's header, VA is used.

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-by: Neil Horman 
---
 drivers/net/virtio/virtio_ethdev.c  | 25 -
 drivers/net/virtio/virtio_rxtx.c|  5 ++---
 drivers/net/virtio/virtio_rxtx_simple.c | 13 +++--
 drivers/net/virtio/virtqueue.h  | 13 -
 4 files changed, 41 insertions(+), 15 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 781886d..90f1a28 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -167,14 +167,14 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,
 * One RX packet for ACK.
 */
vq->vq_ring.desc[head].flags = VRING_DESC_F_NEXT;
-   vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mz->phys_addr;
+   vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mem;
vq->vq_ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
vq->vq_free_cnt--;
i = vq->vq_ring.desc[head].next;

for (k = 0; k < pkt_num; k++) {
vq->vq_ring.desc[i].flags = VRING_DESC_F_NEXT;
-   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
+   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mem
+ sizeof(struct virtio_net_ctrl_hdr)
+ sizeof(ctrl->status) + sizeof(uint8_t)*sum;
vq->vq_ring.desc[i].len = dlen[k];
@@ -184,7 +184,7 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,
}

vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
-   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
+   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mem
+ sizeof(struct virtio_net_ctrl_hdr);
vq->vq_ring.desc[i].len = sizeof(ctrl->status);
vq->vq_free_cnt--;
@@ -419,8 +419,6 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
vq->virtio_net_hdr_mem = hdr_mz->phys_addr;

memset(hdr_mz->addr, 0, hdr_mz_sz);
-   vring_hdr_desc_init(vq);
-
} else if (queue_type == VTNET_CQ) {
/* Allocate a page for control vq command, data and status */
snprintf(vq_name, sizeof(vq_name), "port%d_cvq_hdrzone",
@@ -441,6 +439,23 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
memset(vq->virtio_net_hdr_mz->addr, 0, PAGE_SIZE);
}

+   /* For virtio-user case (that is when dev->pci_dev is NULL), we use
+* virtual address. And we need properly set _offset_, please see
+* MBUF_DATA_DMA_ADDR in virtqueue.h for more information.
+*/
+   if (dev->pci_dev)
+   vq->offset = offsetof(struct rte_mbuf, buf_physaddr);
+   else {
+   vq->vq_ring_mem = (phys_addr_t)vq->mz->addr;
+   vq->offset = offsetof(struct rte_mbuf, buf_addr);
+   if (vq->virtio_net_hdr_mz)
+   vq->virtio_net_hdr_mem =
+   (phys_addr_t)vq->virtio_net_hdr_mz->addr;
+   }
+
+   if (queue_type == VTNET_TQ)
+   vring_hdr_desc_init(vq);
+
if (hw->vtpci_ops->setup_queue(hw, vq) < 0) {
PMD_INIT_LOG(ERR, "setup_queue failed");
virtio_dev_queue_release(vq);
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index f326222..5b0c3df 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -193,8 +193,7 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct 
rte_mbuf *cookie)

start_dp = vq->vq_ring.desc;
start_dp[idx].addr =
-   (uint64_t)(cookie->buf_physaddr + RTE_PKTMBUF_HEADROOM
-   - hw->vtnet_hdr_size);
+   MBUF_DATA_DMA_ADDR(cookie, vq->offset) - hw->vtnet_hdr_size;
start_dp[idx].len =
cookie->buf_len - RTE_PKTMBUF_HEADROOM + hw->vtnet_hdr_size;
start_dp[idx].flags =  VRING_DESC_F_WRITE;
@@ -265,7 +264,7 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct 
rte_mbuf *cookie,

[dpdk-dev] [PATCH v6 2/7] virtio: clean up virtio_dev_queue_setup

2016-06-02 Thread Jianfeng Tan

Abstract vring hdr desc init as an inline method.

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
---
 drivers/net/virtio/virtio_ethdev.c | 42 ++
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index a3031e4..781886d 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -278,6 +278,26 @@ virtio_dev_queue_release(struct virtqueue *vq)
}
 }

+static void
+vring_hdr_desc_init(struct virtqueue *vq)
+{
+   int i;
+   struct virtio_tx_region *txr = vq->virtio_net_hdr_mz->addr;
+
+   for (i = 0; i < vq->vq_nentries; i++) {
+   struct vring_desc *start_dp = txr[i].tx_indir;
+
+   vring_desc_init(start_dp, RTE_DIM(txr[i].tx_indir));
+
+   /* first indirect descriptor is always the tx header */
+   start_dp->addr = vq->virtio_net_hdr_mem + i * sizeof(*txr) +
+offsetof(struct virtio_tx_region, tx_hdr);
+
+   start_dp->len = vq->hw->vtnet_hdr_size;
+   start_dp->flags = VRING_DESC_F_NEXT;
+   }
+}
+
 int virtio_dev_queue_setup(struct rte_eth_dev *dev,
int queue_type,
uint16_t queue_idx,
@@ -375,8 +395,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,

if (queue_type == VTNET_TQ) {
const struct rte_memzone *hdr_mz;
-   struct virtio_tx_region *txr;
-   unsigned int i;
+   size_t hdr_mz_sz = vq_size * sizeof(struct virtio_tx_region);

/*
 * For each xmit packet, allocate a virtio_net_hdr
@@ -385,7 +404,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d_hdrzone",
 dev->data->port_id, queue_idx);
hdr_mz = rte_memzone_reserve_aligned(vq_name,
-vq_size * sizeof(*txr),
+hdr_mz_sz,
 socket_id, 0,
 RTE_CACHE_LINE_SIZE);
if (hdr_mz == NULL) {
@@ -399,21 +418,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
vq->virtio_net_hdr_mz = hdr_mz;
vq->virtio_net_hdr_mem = hdr_mz->phys_addr;

-   txr = hdr_mz->addr;
-   memset(txr, 0, vq_size * sizeof(*txr));
-   for (i = 0; i < vq_size; i++) {
-   struct vring_desc *start_dp = txr[i].tx_indir;
-
-   vring_desc_init(start_dp, RTE_DIM(txr[i].tx_indir));
-
-   /* first indirect descriptor is always the tx header */
-   start_dp->addr = vq->virtio_net_hdr_mem
-   + i * sizeof(*txr)
-   + offsetof(struct virtio_tx_region, tx_hdr);
-
-   start_dp->len = vq->hw->vtnet_hdr_size;
-   start_dp->flags = VRING_DESC_F_NEXT;
-   }
+   memset(hdr_mz->addr, 0, hdr_mz_sz);
+   vring_hdr_desc_init(vq);

} else if (queue_type == VTNET_CQ) {
/* Allocate a page for control vq command, data and status */
-- 
2.1.4

[dpdk-dev] [PATCH v6 1/7] virtio: hide phys addr check inside pci ops

2016-06-02 Thread Jianfeng Tan

This patch is to move phys addr check from virtio_dev_queue_setup
to pci ops. To makt that happen, make sure virtio_ops.setup_queue
return the result if we pass through the check.

Signed-off-by: Jianfeng Tan 
Signed-off-by: Huawei Xie 
Acked-by: Yuanhan Liu 
---
 drivers/net/virtio/virtio_ethdev.c | 17 +
 drivers/net/virtio/virtio_pci.c| 30 --
 drivers/net/virtio/virtio_pci.h|  2 +-
 3 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index c3fb628..a3031e4 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -364,17 +364,6 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
}
}

-   /*
-* Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
-* and only accepts 32 bit page frame number.
-* Check if the allocated physical memory exceeds 16TB.
-*/
-   if ((mz->phys_addr + vq->vq_ring_size - 1) >> 
(VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
-   PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
-   virtio_dev_queue_release(vq);
-   return -ENOMEM;
-   }
-
memset(mz->addr, 0, sizeof(mz->len));
vq->mz = mz;
vq->vq_ring_mem = mz->phys_addr;
@@ -446,7 +435,11 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
memset(vq->virtio_net_hdr_mz->addr, 0, PAGE_SIZE);
}

-   hw->vtpci_ops->setup_queue(hw, vq);
+   if (hw->vtpci_ops->setup_queue(hw, vq) < 0) {
+   PMD_INIT_LOG(ERR, "setup_queue failed");
+   virtio_dev_queue_release(vq);
+   return -EINVAL;
+   }

vq->configured = 1;
*pvq = vq;
diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 9cdca06..6bd239c 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -55,6 +55,22 @@
  */
 #define VIRTIO_PCI_CONFIG(hw) (((hw)->use_msix) ? 24 : 20)

+static inline int
+check_vq_phys_addr_ok(struct virtqueue *vq)
+{
+   /* Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
+* and only accepts 32 bit page frame number.
+* Check if the allocated physical memory exceeds 16TB.
+*/
+   if ((vq->vq_ring_mem + vq->vq_ring_size - 1) >>
+   (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+   PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
+   return 0;
+   }
+
+   return 1;
+}
+
 static void
 legacy_read_dev_config(struct virtio_hw *hw, size_t offset,
   void *dst, int length)
@@ -143,15 +159,20 @@ legacy_get_queue_num(struct virtio_hw *hw, uint16_t 
queue_id)
return dst;
 }

-static void
+static int
 legacy_setup_queue(struct virtio_hw *hw, struct virtqueue *vq)
 {
uint32_t src;

+   if (!check_vq_phys_addr_ok(vq))
+   return -1;
+
rte_eal_pci_ioport_write(>io, >vq_queue_index, 2,
 VIRTIO_PCI_QUEUE_SEL);
src = vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
rte_eal_pci_ioport_write(>io, , 4, VIRTIO_PCI_QUEUE_PFN);
+
+   return 0;
 }

 static void
@@ -367,12 +388,15 @@ modern_get_queue_num(struct virtio_hw *hw, uint16_t 
queue_id)
return io_read16(>common_cfg->queue_size);
 }

-static void
+static int
 modern_setup_queue(struct virtio_hw *hw, struct virtqueue *vq)
 {
uint64_t desc_addr, avail_addr, used_addr;
uint16_t notify_off;

+   if (!check_vq_phys_addr_ok(vq))
+   return -1;
+
desc_addr = vq->mz->phys_addr;
avail_addr = desc_addr + vq->vq_nentries * sizeof(struct vring_desc);
used_addr = RTE_ALIGN_CEIL(avail_addr + offsetof(struct vring_avail,
@@ -400,6 +424,8 @@ modern_setup_queue(struct virtio_hw *hw, struct virtqueue 
*vq)
PMD_INIT_LOG(DEBUG, "\t used_addr: %" PRIx64, used_addr);
PMD_INIT_LOG(DEBUG, "\t notify addr: %p (notify offset: %u)",
vq->notify_addr, notify_off);
+
+   return 0;
 }

 static void
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 554efea..a76daf7 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -234,7 +234,7 @@ struct virtio_pci_ops {
uint16_t (*set_config_irq)(struct virtio_hw *hw, uint16_t vec);

uint16_t (*get_queue_num)(struct virtio_hw *hw, uint16_t queue_id);
-   void (*setup_queue)(struct virtio_hw *hw, struct virtqueue *vq);
+   int (*setup_queue)(struct virtio_hw *hw, struct virtqueue *vq);
void (*del_queue)(struct virtio_hw *hw, struct virtqueue *vq);
void (*notify_queue)(struct virtio_hw *hw, struct virtqueue *vq);
 };
-- 
2.1.4

[dpdk-dev] [PATCH v6 0/7] virtio support for container

2016-06-02 Thread Jianfeng Tan

v6:
 - Move driver related code into from driver/net/virtio/virtio-user/ to
   driver/net/virtio/ directory, inside virtio_user_ethdev.c.
 - Rename vdev to virtio_user in comments and code.
 - Merge code, which lies in virtio_user_pci.c, into virtio_user_ethdev.c.
 - Add some comments at virtio-user special handling at virtio_dev_ethdev.c.
 - Merge document update into the 7nd commit where virtio-user is added.
 - Add usage with vhost-switch in vhost.rst.

v5:
 - Rename struct virtio_user_hw to struct virtio_user_dev.
 - Rename "vdev_private" to "virtio_user_dev".
 - Move special handling into virtio_ethdev.c from queue_setup().
 - Add vring in virtio_user_dev (remove rte_eth_dev_data), so that
   device does not depend on driver's data structure (rte_eth_dev_data).
 - Remove update on doc/guides/nics/overview.rst, because virtio-user has
   exact feature set with virtio.
 - Change "unsigned long int" to "uint64_t", "unsigned" to "uint32_t".
 - Remove unnecessary cast in vdev_read_dev_config().
 - Add functions in virtio_user_dev.c with prefix of "virtio_user_".
 - Rebase on virtio-next-virtio.

v4:
 - Avoid using dev_type, instead use (eth_dev->pci_device is NULL) to
   judge if it's virtual device or physical device.
 - Change the added device name to virtio-user.
 - Split into vhost_user.c, vhost_kernel.c, vhost.c, virtio_user_pci.c,
   virtio_user_dev.c.
 - Move virtio-user specific data from struct virtio_hw into struct
   virtio_user_hw.
 - Add support to send reset_owner message.
 - Change del_queue implementation. (This need more check)
 - Remove rte_panic(), and superseded with log.
 - Add reset_owner into virtio_pci_ops.reset.
 - Merge parameter "rx" and "tx" to "queues" to emliminate confusion.
 - Move get_features to after set_owner.
 - Redefine path in virtio_user_hw from char * to char [].

v3:
 - Remove --single-file option; do no change at EAL memory.
 - Remove the added API rte_eal_get_backfile_info(), instead we check all
   opened files with HUGEFILE_FMT to find hugepage files owned by DPDK.
 - Accordingly, add more restrictions at "Known issue" section.
 - Rename parameter from queue_num to queue_size for confusion.
 - Rename vhost_embedded.c to rte_eth_virtio_vdev.c.
 - Move code related to the newly added vdev to rte_eth_virtio_vdev.c, to
   reuse eth_virtio_dev_init(), remove its static declaration.
 - Implement dev_uninit() for rte_eth_dev_detach().
 - WARN -> ERR, in vhost_embedded.c
 - Add more commit message for clarify the model.

v2:
 - Rebase on the patchset of virtio 1.0 support.
 - Fix cannot create non-hugepage memory.
 - Fix wrong size of memory region when "single-file" is used.
 - Fix setting of offset in virtqueue to use virtual address.
 - Fix setting TUNSETVNETHDRSZ in vhost-user's branch.
 - Add mac option to specify the mac address of this virtual device.
 - Update doc.

This patchset is to provide high performance networking interface (virtio)
for container-based DPDK applications. The way of starting DPDK apps in
containers with ownership of NIC devices exclusively is beyond the scope.
The basic idea here is to present a new virtual device (named virtio-user),
which can be discovered and initialized by DPDK. To minimize the change,
we reuse already-existing virtio PMD code (driver/net/virtio/).

Background: Previously, we usually use a virtio device in the context of
QEMU/VM as below pic shows. Virtio nic is emulated in QEMU, and usually
presented in VM as a PCI device.

  --
  |  virtio driver |  ->  VM
  --
|
| --> (over PCI bus or MMIO or Channel I/O)
|
  --
  | device emulate |
  ||  ->  QEMU
  | vhost adapter  |
  --
|
| --> (vhost-user protocol or vhost-net ioctls)
|
  --
  | vhost backend  |
  --

Compared to QEMU/VM case, virtio support for contaner requires to embedded
device framework inside the virtio PMD. So this converged driver actually
plays three roles:
  - virtio driver to drive this new kind of virtual device;
  - device emulation to present this virtual device and reponse to the
virtio driver, which is originally by QEMU;
  - and the role to communicate with vhost backend, which is also
originally by QEMU.

The code layout and functionality of each module:

  --
  | -- |
  | | virtio driver  | |> (virtio_user_ethdev.c)
  | -- |
  | |  |
  | -- | -->  virtio-user PMD
  | | device emulate |-|> (virtio_user_dev.c)
  | || |
  | | vhost adapter  |-|> (vhost_user.c, vhost_kernel.c, vhost.c)
  | -- |
  --
 |
 | -- --> (vhost-user protocol or vhost-net ioctls)
 |
   --
   | vhost backend  |
   --

How to share memory? In VM's case, qemu always

[dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy

2016-06-02 Thread Olivier MATZ

Hi Jerin,

On 06/01/2016 09:00 AM, Jerin Jacob wrote:
> On Tue, May 31, 2016 at 11:05:30PM +0200, Olivier MATZ wrote:
>> Today, the objects pointers are reversed only in the get(). It means
>> that this code:
>>
>>  rte_mempool_get_bulk(mp, table, 4);
>>  for (i = 0; i < 4; i++)
>>  printf("obj = %p\n", t[i]);
>>  rte_mempool_put_bulk(mp, table, 4);
>>
>>
>>  printf("-\n");
>>  rte_mempool_get_bulk(mp, table, 4);
>>  for (i = 0; i < 4; i++)
>>  printf("obj = %p\n", t[i]);
>>  rte_mempool_put_bulk(mp, table, 4);
>>
>> prints:
>>
>>  addr1
>>  addr2
>>  addr3
>>  addr4
>>  -
>>  addr4
>>  addr3
>>  addr2
>>  addr1
>>
>> Which is quite strange.
> 
> IMO, It is the expected LIFO behavior. Right ?
> 
> What is not expected is the following, which is the case after change. Or Am I
> missing something here?
> 
> addr1
> addr2
> addr3
> addr4
> -
> addr1
> addr2
> addr3
> addr4
> 
>>
>> I don't think it would be an issue to replace the loop by a
>> rte_memcpy(), it may increase the copy speed and it will be
>> more coherent with the put().
>>

I think the LIFO behavior should occur on a per-bulk basis. I mean,
it should behave like in the exemplaes below:

  // pool cache is in state X
  obj1 = mempool_get(mp)
  obj2 = mempool_get(mp)
  mempool_put(mp, obj2)
  mempool_put(mp, obj1)
  // pool cache is back in state X

  // pool cache is in state X
  bulk1 = mempool_get_bulk(mp, 16)
  bulk2 = mempool_get_bulk(mp, 16)
  mempool_put_bulk(mp, bulk2, 16)
  mempool_put_bulk(mp, bulk1, 16)
  // pool cache is back in state X

Note that today it's not the case for bulks, since object addresses
are reversed only in get(), we are not back in the original state.
I don't really see the advantage of this.

Removing the reversing may accelerate the cache in case of bulk get,
I think.

Regards,
Olivier

[dpdk-dev] [PATCH v5 7/8] virtio-user: add a new vdev named virtio-user

2016-06-02 Thread Tan, Jianfeng

Hi Yuanhan,


On 6/1/2016 4:26 PM, Yuanhan Liu wrote:
> On Mon, May 30, 2016 at 10:55:38AM +, Jianfeng Tan wrote:
>> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c 
>> b/drivers/net/virtio/virtio_user/virtio_user_dev.c
>> index 41d8ad1..5e4f60b 100644
>> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
>> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
>> @@ -166,3 +166,312 @@ int virtio_user_stop_device(struct virtio_user_dev 
>> *dev)
>>  return vhost_call(dev->vhostfd, dev->type, VHOST_MSG_RESET_OWNER, NULL);
>>   }
>>   
>> +static inline void parse_mac(struct virtio_user_dev *dev, const char *mac)
> Note that this is a slight coding style offensive.

OK, I'll fix it.

>
>> +{
>> +int i, r;
>> +uint32_t tmp[ETHER_ADDR_LEN];
>> +
>> +if (!mac)
>> +return;
>> +
>> +r = sscanf(mac, "%x:%x:%x:%x:%x:%x", [0],
>> +[1], [2], [3], [4], [5]);
>> +if (r == ETHER_ADDR_LEN) {
>> +for (i = 0; i < ETHER_ADDR_LEN; ++i)
>> +dev->mac_addr[i] = (uint8_t)tmp[i];
>> +dev->mac_specified = 1;
>> +} else {
>> +/* ignore the wrong mac, use random mac */
>> +PMD_DRV_LOG(ERR, "wrong format of mac: %s", mac);
>> +}
>> +}
>> +
>> +static int
>> +virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
>> + int queue_size, const char *mac, char *ifname)
> As stated in last email, we should move all others (except above 2
> functions) to the driver layer, where they belong to.

OK, I'll create a virtio_user_ethdev.c to store these things.

Thanks,
Jianfeng

>
>   --yliu

[dpdk-dev] [PATCH] vhost: fix segfault on bad descriptor address.

2016-06-02 Thread Rich Lane

On Thu, Jun 2, 2016 at 3:46 AM, Ilya Maximets 
wrote:

> Hi, Rich.
> Thank you for testing and analysing.
>
> On 01.06.2016 01:06, Rich Lane wrote:
> > On Fri, May 20, 2016 at 5:50 AM, Ilya Maximets  > wrote:
> >
> > In current implementation guest application can reinitialize vrings
> > by executing start after stop. In the same time host application
> > can still poll virtqueue while device stopped in guest and it will
> > crash with segmentation fault while vring reinitialization because
> > of dereferencing of bad descriptor addresses.
> >
> >
> > I see a performance regression with this patch at large packet sizes (>
> 768 bytes). rte_vhost_enqueue_burst is consuming 10% more cycles.
> Strangely, there's actually a ~1% performance improvement at small packet
> sizes.
> >
> > The regression happens with GCC 4.8.4 and 5.3.0, but not 6.1.1.
> >
> > AFAICT this is just the compiler generating bad code. One difference is
> that it's storing the offset on the stack instead of in a register. A
> workaround is to move the !desc_addr check outside the unlikely macros.
> >
> > --- a/lib/librte_vhost/vhost_rxtx.c
> > +++ b/lib/librte_vhost/vhost_rxtx.c
> > @@ -147,10 +147,10 @@ copy_mbuf_to_desc(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
> > struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0,
> 0, 0}, 0};
> >
> > desc = >desc[desc_idx];
> > -   if (unlikely(desc->len < vq->vhost_hlen))
> > +   desc_addr = gpa_to_vva(dev, desc->addr);
> > +   if (unlikely(desc->len < vq->vhost_hlen || !desc_addr))
> >
> >
> >  Workaround: change to "if (unlikely(desc->len < vq->vhost_hlen) ||
> !desc_addr)".
> >
> > return -1;
> >
> >
> > -   desc_addr = gpa_to_vva(dev, desc->addr);
> > rte_prefetch0((void *)(uintptr_t)desc_addr);
> >
> > virtio_enqueue_offload(m, _hdr.hdr);
> > @@ -184,6 +184,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
> >
> > desc = >desc[desc->next];
> > desc_addr   = gpa_to_vva(dev, desc->addr);
> > +   if (unlikely(!desc_addr))
> >
> >
> > Workaround: change to "if (!desc_addr)".
> >
> >
> > +   return -1;
> > +
> > desc_offset = 0;
> > desc_avail  = desc->len;
> > }
> >
>
> What about other places? Is there same issues or it's only inside
> copy_mbuf_to_desc() ?
>

Only copy_mbuf_to_desc has the issue.

[dpdk-dev] [PATCH] mbuf: extend rte_mbuf_prefetch_part* to support more prefetching methods

2016-06-02 Thread Olivier MATZ

Hi Jianbo,

On 06/01/2016 05:29 AM, Jianbo Liu wrote:
>> enum rte_mbuf_prefetch_type {
>> > PREFETCH0,
>> > PREFETCH1,
>> > ...
>> > };
>> >
>> > static inline void
>> > rte_mbuf_prefetch_part1(enum rte_mbuf_prefetch_type type,
>> > struct rte_mbuf *m)
>> > {
>> > switch (type) {
>> > case PREFETCH0:
>> > rte_prefetch0(>cacheline0);
>> > break;
>> > case PREFETCH1:
>> > rte_prefetch1(>cacheline0);
>> > break;
>> > ...
>> > }
>> >
> How about adding these to forbid the illegal use of this macro?
> enum rte_mbuf_prefetch_type {
>  ENUM_prefetch0,
>  ENUM_prefetch1,
>  ...
> };
> 
> #define RTE_MBUF_PREFETCH_PART1(type, m) \
> if (ENUM_##type == ENUM_prefretch0) \
> rte_prefetch0(&(m)->cacheline0);   \
> else if (ENUM_##type == ENUM_prefetch1) \
> rte_prefetch1(&(m)->cacheline0); \
> 
> 

As Stephen stated, a static inline is better than a macro, mainly
because it is understood by the compiler instead of beeing a dumb
code replacement.

Any reason why you would prefer a macro in that case?

Regards
Olivier

[dpdk-dev] [PATCH] ivshmem: add all memzones of mempool to metada

2016-06-02 Thread Olivier MATZ

Hi Ferruh,

Thank you for fixing this issue.

On 06/01/2016 03:18 PM, Ferruh Yigit wrote:
> [PATCH] ivshmem: add all memzones of mempool to metada

Minor comment: it seems the title is truncated

> +static int
> +add_mempool_to_metadata(const struct rte_mempool *mp,
> + struct ivshmem_config *config)
> +{
> + struct rte_mempool_memhdr *memhdr;
> + int ret;
> +
> + ret = add_mempool_memzone_to_metadata(mp, config);
>   if (ret < 0)
>   return -1;
>  
> + STAILQ_FOREACH(memhdr, >mem_list, next) {
> + ret = add_mempool_memzone_to_metadata(memhdr->addr, config);
> + if (ret < 0)
> + return -1;
> + }
> +
> + /* mempool consists of memzone and ring */
>   return add_ring_to_metadata(mp->ring, config);
>  }
>  

In case you missed it: there is a function
rte_mempool_mem_iter() that can be used to browse the
memory chunks of a mempool. It's probably less convenient
to use compared to directly browsing the list, but it
may be more resistant to api changes.

Apart from that:
Acked-by: Olivier Matz 

Thanks

[dpdk-dev] [PATCH] virtio: use volatile to get used->idx in the loop

2016-06-02 Thread Xie, Huawei

On 6/2/2016 4:52 PM, Yuanhan Liu wrote:
> On Thu, Jun 02, 2016 at 08:39:36AM +, Xie, Huawei wrote:
>> On 6/1/2016 2:03 PM, Yuanhan Liu wrote:
>>> On Wed, Jun 01, 2016 at 05:40:08AM +, Xie, Huawei wrote:
 On 5/30/2016 4:20 PM, Yuanhan Liu wrote:
> On Wed, May 25, 2016 at 12:16:41AM +0800, Huawei Xie wrote:
>> There is no external function call or any barrier in the loop,
>> the used->idx would only be retrieved once.
>>
>> Signed-off-by: Huawei Xie 
>> ---
>>  drivers/net/virtio/virtio_ethdev.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio/virtio_ethdev.c 
>> b/drivers/net/virtio/virtio_ethdev.c
>> index c3fb628..f6d6305 100644
>> --- a/drivers/net/virtio/virtio_ethdev.c
>> +++ b/drivers/net/virtio/virtio_ethdev.c
>> @@ -204,7 +204,8 @@ virtio_send_command(struct virtqueue *vq, struct 
>> virtio_pmd_ctrl *ctrl,
>>  usleep(100);
>>  }
>>  
>> -while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
>> +while (vq->vq_used_cons_idx !=
>> +   *((volatile uint16_t *)(>vq_ring.used->idx))) {
> I'm wondering maybe we could fix VIRTQUEUE_NUSED (which has no such
> qualifier) and use this macro here?
>
> If you check the reference of that macro, you might find similar
> issues, say, it is also used inside the while-loop of
> virtio_recv_mergeable_pkts().
>
>   --yliu
>  
>
 Yes, seems it has same issue though haven't confirmed with asm code.
>>> So, move the "volatile" qualifier to VIRTQUEUE_NUSED?
>>>
>>> --yliu
>>>
>> Yes, anyway this is just intermediate fix. In next patch, will declare
>> the idx as volatile, and remove the qualifier in the macro.
> Hmm.., why we need an intermediate fix then, if we can come up with an
> ultimate fix very quickly?
>
>   --yliu
>
... Either is OK. I have no preference.

[dpdk-dev] [PATCH] virtio: use volatile to get used->idx in the loop

2016-06-02 Thread Xie, Huawei

On 6/1/2016 2:03 PM, Yuanhan Liu wrote:
> On Wed, Jun 01, 2016 at 05:40:08AM +, Xie, Huawei wrote:
>> On 5/30/2016 4:20 PM, Yuanhan Liu wrote:
>>> On Wed, May 25, 2016 at 12:16:41AM +0800, Huawei Xie wrote:
 There is no external function call or any barrier in the loop,
 the used->idx would only be retrieved once.

 Signed-off-by: Huawei Xie 
 ---
  drivers/net/virtio/virtio_ethdev.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/drivers/net/virtio/virtio_ethdev.c 
 b/drivers/net/virtio/virtio_ethdev.c
 index c3fb628..f6d6305 100644
 --- a/drivers/net/virtio/virtio_ethdev.c
 +++ b/drivers/net/virtio/virtio_ethdev.c
 @@ -204,7 +204,8 @@ virtio_send_command(struct virtqueue *vq, struct 
 virtio_pmd_ctrl *ctrl,
usleep(100);
}
  
 -  while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
 +  while (vq->vq_used_cons_idx !=
 + *((volatile uint16_t *)(>vq_ring.used->idx))) {
>>> I'm wondering maybe we could fix VIRTQUEUE_NUSED (which has no such
>>> qualifier) and use this macro here?
>>>
>>> If you check the reference of that macro, you might find similar
>>> issues, say, it is also used inside the while-loop of
>>> virtio_recv_mergeable_pkts().
>>>
>>> --yliu
>>>  
>>>
>> Yes, seems it has same issue though haven't confirmed with asm code.
> So, move the "volatile" qualifier to VIRTQUEUE_NUSED?
>
>   --yliu
>

Yes, anyway this is just intermediate fix. In next patch, will declare
the idx as volatile, and remove the qualifier in the macro.

[dpdk-dev] [PATCH v2] qat: fix phys address of content descriptor

2016-06-02 Thread Jain, Deepak K

> -Original Message-
> From: Kusztal, ArkadiuszX
> Sent: Wednesday, June 1, 2016 11:52 AM
> To: dev at dpdk.org
> Cc: Trahe, Fiona ; Griffin, John
> ; Jain, Deepak K ;
> olivier.matz at 6wind.com; thomas.monjalon at 6wind.com; Kusztal, ArkadiuszX
> 
> Subject: [PATCH v2] qat: fix phys address of content descriptor
> 
> From: Arkadiusz Kusztal 
> 
> Fix an error with computation of physical address of content descriptor in the
> symmetric operations session
> 
> Fixes: 1703e94ac5ce ("qat: add driver for QuickAssist devices")
> 
> Signed-off-by: Arkadiusz Kusztal 
> ---
> v2: Added fixes line to commit message

Acked-by: Deepak Kumar JAIN

[dpdk-dev] [PATCH v3] virtio: split virtio rx/tx queue

2016-06-02 Thread Xie, Huawei

On 6/2/2016 4:07 PM, Xie, Huawei wrote:
> We keep a common vq structure, containing only vq related fields,
> and then split others into RX, TX and control queue respectively.
>
> Signed-off-by: Huawei Xie 
sorry, this is v4.

[dpdk-dev] about rx checksum flags

2016-06-02 Thread Chandran, Sugesh

Hi Olivier,

Thank you for working on this..
A comment on the proposal is given below,


Regards
_Sugesh

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev,
> Konstantin
> Sent: Wednesday, June 1, 2016 10:07 AM
> To: Stephen Hemminger ; Olivier MATZ
> 
> Cc: Yuanhan Liu ; dev at dpdk.org; Richardson,
> Bruce ; Adrien Mazarguil
> ; Tan, Jianfeng 
> Subject: Re: [dpdk-dev] about rx checksum flags
> 
> 
> 
> > -Original Message-
> > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > Sent: Tuesday, May 31, 2016 11:03 PM
> > To: Olivier MATZ
> > Cc: Yuanhan Liu; dev at dpdk.org; Ananyev, Konstantin; Richardson, Bruce;
> > Adrien Mazarguil; Tan, Jianfeng
> > Subject: Re: [dpdk-dev] about rx checksum flags
> >
> > On Tue, 31 May 2016 22:58:57 +0200
> > Olivier MATZ  wrote:
> >
> > > Hi Stephen,
> > >
> > > On 05/31/2016 10:28 PM, Stephen Hemminger wrote:
> > > > On Tue, 31 May 2016 21:11:59 +0200 Olivier MATZ
> > > >  wrote:
> > > >
> > > >>
> > > >>
> > > >> On 05/31/2016 10:09 AM, Yuanhan Liu wrote:
> > > >>> On Mon, May 30, 2016 at 05:26:21PM +0200, Olivier Matz wrote:
> > >   PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the
> > >  packet  data, but the integrity of the L4 header is verified.
> > >    -> the application can process the packet but must not verify the
> > >   checksum by sw. It has to take care to recalculate the cksum
> > >   if the packet is transmitted (either by sw or using tx
> > >  offload)
> > > >>>
> > > >>> I like the explanation you made at [1] better :)
> > > >>>
> > > >>> So in general, I think this proposal is good to have.
> > > >>
> > > >> Thanks everyone for your feedback.
> > > >>
> > > >> I'll try to send a first patch proposition soon.
> > > >>
> > > >> Regards,
> > > >> Olivier
> > > >
> > > > I think it is time to ditch the old definitions of Rx checksum and
> > > > instead use something more compatiable with virtio (and Linux). I.e
> have three values
> > > >   1) checksum is know good for packet contents
> > > >   2) checksum value one's complement for packet contents
> > > >   3) checksum is undetermined
> > > > The original definition seems to be Intel HW centric and applies
> > > > to a limited range of devices making it unusable by general application.
> > > >
> > > > Break the ABI, and ditch the old values (ok mark
> > > > PKT_RX_L4_CKSUM_BAD as __deprecated and remove all usage).
> > > >
> > >
> > > Don't you think knowing that a checksum is bad could be useful?
> >
> > Not really. They should be mark as undetermined, then software can
> > recheck for the possibly buggy hardware.
> 
> Hmm, I don't see the point here.
> If the HW clearly reports that checksum is invalid (not unknown), why SW has
> to assume it is ' undetermined' and recheck it?
> To me that means just wasted cycles.
> In general, it sounds like really strange approach to me:
> write your SW with assumption that all HW you are going to use will not work
> correctly.
> 
> >
> > > In that case the application can drop/log the packet without any
> > > additional cpu cost.
> > >
> > > What do you mean by beeing unusable by general application?
> >
> > Right now application can only see "known bad" or "indeterminate"
> > there is no way to no which packets are good. Since good is the
> > desired/expected case, software has to checksum every packet.
> >
> > >
> > > I think the "2)" also requires a csum_start + csum_offset in mbuf
> > > structure, right?
> >
> > Not really, it would mean having a way to get the raw one's complement
> > sum out of the hardware. This is a good way to support the tunnel
> > protocol du jour without having to have firmware support.
> > Unfortunately, most hardware vendors don't believe in doing it that way.
> 
> It might be a good feature to have, but if most HW vendors don't support it
> why to bother?
> 
> >
> >
> > > Do you also suggest to drop IP checksum flags?
> >
> > IP checksum offload is mostly useless. If application needs to look at
> > IP, it can do whole checksum in very few instructions, the whole
> > header is in the same cache line as src/dst so the HW offload is really no
> help.
> >
[Sugesh] The checksum offload can boost the tunneling performance in OVS.
I guess the IP checksum also important as L4. In some cases, UDP checksum is
zero and no need to validate it. But Ip checksum is present on all the packets 
and that must be
validated all  the time. At higher packet rate, the ip checksum offload can 
offer slight 
performance improvement. What do you think??

> > >
> > > Will it be possible to manage tunnel checksums?
> > >
> > > I think this would be a pretty big change. If there is no additional
> > > argument than beeing more compatible with virtio/linux, I'm
> > > wondering if it's worth breaking the API. Let's wait for other opinions.
> 
> I think that what Olivier proposed is good enough and definitely a step
> forward from what we have right

[dpdk-dev] [PATCH] e1000: configure VLAN TPID

2016-06-02 Thread Wang, Xiao W

Hi Beilei,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Beilei Xing
> Sent: Thursday, April 21, 2016 4:56 PM
> To: Lu, Wenzhuo 
> Cc: dev at dpdk.org; Xing, Beilei 
> Subject: [dpdk-dev] [PATCH] e1000: configure VLAN TPID
> 
> This patch enables configuring the ether types of both inner and outer VLANs.
> Note that TPID of single or inner VLAN is read only.
> 
> Signed-off-by: Beilei Xing 
> ---
>  drivers/net/e1000/igb_ethdev.c | 34 +++---
>  1 file changed, 31 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
> index e0053fe..c957fbe 100644
> --- a/drivers/net/e1000/igb_ethdev.c
> +++ b/drivers/net/e1000/igb_ethdev.c
> @@ -86,6 +86,14 @@
>  #define E1000_INCVALUE_82576 (16 << IGB_82576_TSYNC_SHIFT)
>  #define E1000_TSAUXC_DISABLE_SYSTIME 0x8000
> 
> +/* CTRL_EXT bit mask*/
> +#define E1000_CTRL_EXT_EXT_VLAN  (1 << 26)
> +
> +/* VLAN Ether Type bit mask */
> +#define E1000_VET_VET_EXT0x
> +
> +#define E1000_VET_VET_EXT_SHIFT  16
> +
>  static int  eth_igb_configure(struct rte_eth_dev *dev);  static int
> eth_igb_start(struct rte_eth_dev *dev);  static void eth_igb_stop(struct
> rte_eth_dev *dev); @@ -2242,13 +2250,33 @@ eth_igb_vlan_tpid_set(struct
> rte_eth_dev *dev,  {
>   struct e1000_hw *hw =
>   E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> - uint32_t reg = ETHER_TYPE_VLAN;
> + uint32_t reg;
> + uint32_t qinq;
>   int ret = 0;
> 
> + qinq = E1000_READ_REG(hw, E1000_CTRL_EXT);
> + qinq &= E1000_CTRL_EXT_EXT_VLAN;
> +
>   switch (vlan_type) {
>   case ETH_VLAN_TYPE_INNER:
> - reg |= (tpid << 16);
> - E1000_WRITE_REG(hw, E1000_VET, reg);
> + if (qinq)
> + PMD_DRV_LOG(WARNING,
> + "inner vlan ether type is read-only\n");

Add: ret = -ENOTSUP or something else, so that the programmer can handle it.
The same for below.

> + else {
> + PMD_DRV_LOG(ERR, "not set QinQ on yet\n");
> + ret = -EIO;
> + }
> + break;
> + case ETH_VLAN_TYPE_OUTER:
> + if (qinq) {
> + reg = E1000_READ_REG(hw, E1000_VET);
> + reg = (reg & (~E1000_VET_VET_EXT)) |
> + ((uint32_t)tpid << E1000_VET_VET_EXT_SHIFT);
> + E1000_WRITE_REG(hw, E1000_VET, reg);
> + } else
> + PMD_DRV_LOG(WARNING,
> + "single vlan ether type is read-only\n");
> +
>   break;
>   default:
>   ret = -EINVAL;
> --
> 2.5.0

Since both inner and outer tpid are considered in this patch, the comment in
rte_ethdev.h "vlan_tpid_set;  /**< Outer VLAN TPID Setup. */" should be 
changed
accordingly.

Best Regards,
Xiao

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-02 Thread Neil Horman

On Wed, Jun 01, 2016 at 03:00:11PM +, Wiles, Keith wrote:
> Started from the link below, but did not want to highjack the thread.
> http://dpdk.org/ml/archives/dev/2016-June/040021.html
> 
> I was thinking about this problem from a user perspective and command line 
> options are very difficult to manage specifically when you have a large 
> number of options as we have in dpdk. I see all of these options as a type of 
> database of information for the DPDK and the application, because the 
> application command line options are also getting very complex as well.
> 
> I have been looking at a number of different options here and the direction I 
> was thinking was using a file for the options and configurations with the 
> data in a clean format. It could have been a INI file or JSON or XML, but 
> they all seem to have some problems I do not like. The INI file is too flat 
> and I wanted a hierarchy in the data, the JSON data is similar and XML is 
> just hard to read. I wanted to be able to manage multiple applications and 
> possible system the DPDK/app runs. The problem with the above formats is they 
> are just data and not easy to make decisions about the system and 
> applications at runtime.
> 
> If the ?database? of information could be queried by the EAL, drivers and 
> application then we do not need to try and create a complex command line. It 
> would be nice to execute a DPDK applications like this:
> 
> ./some_dpdk_app ?config-file dpdk-config-filename
> 
> The dpdk-config-filename could contain a lot of information and be able to 
> startup multiple different applications. The dpdk-config-file could also 
> include other config files to complete the configuration. The format of the 
> data in the config file needs to be readable, but allow the user to put in 
> new options, needs to be hierarchical in nature and have some simple 
> functions to execute if required.
> 
> The solution I was thinking is the file information is really just a fragment 
> of a scripting language, which the DPDK application contains this scripting 
> language interpreter. I was looking at using Lua lua.org as the scripting 
> language interpreter it is small and easy to understand. Python and others 
> are very big and require a lot of resources and Lua requires very few system 
> resources. Also I did not want to have to write a parser (lex/yacc). The 
> other nice feature of Lua is you can create a sandbox for the code to run in 
> and limit the type of system resources and APIs that can be accessed by the 
> application and configuration. Lua can be trimmed down to a fairly small size 
> and builds on just about any system or we can just install Lua on the system 
> without changes from a rpm or deb.
> 
> I use Lua in pktgen at this time and the interface between ?C? and Lua is 
> very simple and easy. Currently I include Lua in Pktgen, but I could have 
> just used a system library.
> 
> The data in the config file can be data statements along with some limited 
> code to make some data changes at run time without having to modify the real 
> application. Here is a simple config file I wrote: Some of the options do not 
> make sense to be in the file at the same time, but wanted to see all of the 
> options. The mk_lcore_list() and mk_coremap() are just Lua functions we can 
> preload to help convert the simple strings into real data in this case tables 
> of information. The application could be something like pktgen = { map = { ? 
> }, more_options = 1, } this allows the same file to possible contain many 
> application configurations. Needs a bit more work.
> 
> dpdk_default = {
> lcore_mask = 0xFF00,
> lcore_list = mk_lcore_list("0-7", 10, "14-16"),
> coremap = mk_coremap("(0-7)@0,10,(14-16)@1"),
> master_lcore = 1,
> log_level = 7,
> ranks = 4,
> channels = 2,
> memory = 512,
> socket_mem = { 512, 512 },
> huge_dir = "/mnt/huge",
> base_virtaddr = 0,
> create_uio_dev = true,
> vfio_intr = "legacy",
> xen_dom0 = false,
> proc_type = "auto",
> pci_blacklist = {
> "08:00.0",
> "08:00.1",
> "09:00.0",
> "09:00.1",
> "83:00.1",
> "87:00.0",
> "87:00.1",
> "89:00.0",
> "89:00.1"
> },
> pci_whitelist = {
> },
> vdev = {
> eth_pcap0 = { iface = "eth2" },
> eth_pcap1 = { iface = "eth3" },
> },
> driver = { },
> syslog = true,
> vmware_tsc_map = false,
> file_prefix = "pg",
> huge_unlink = true,
> no_huge = false,
> no_pci = false,
> no_hpet = false,
> no_shconf = false,
> }
> 
> pktgen_data = {
>map = { ? },
>more-data = 1,
> }
> 
> The EAL, driver, application, ? would query an API to access the data and the 
> application can change his options quickly without modifying the code.
> 
> Anyway comments are welcome.
>  
> Regards,
> Keith

I'm not sure why you're focusing no selecting a config file

[dpdk-dev] [PATCH v3] virtio: split virtio rx/tx queue

2016-06-02 Thread Xie, Huawei

On 6/1/2016 3:13 PM, Yuanhan Liu wrote:
> On Mon, May 30, 2016 at 05:06:20PM +0800, Huawei Xie wrote:
>> We keep a common vq structure, containing only vq related fields,
>> and then split others into RX, TX and control queue respectively.
>>
>> Signed-off-by: Huawei Xie 
>> ---
>> v2:
>> - don't split virtio_dev_rx/tx_queue_setup
>> v3:
>> - fix some 80 char warnings
>> - fix other newer version checkpatch warnings
>> - remove '\n' in PMD_RX_LOG
> Weird, I still saw them.

Maybe missed this.


>
> Besides that, I found a crash issue with this patch applied. You could
> easily reproduce it by:
>
> testpmd> start tx_first
> testpmd> quit
>
> [82774.055297] testpmd[9661]: segfault at 94 ip 004f7ef0 sp 
> 7ffcb1fa
> 66c0 error 4 in testpmd[40+25b000]
> ./t.pmd: line 11:  9661 Segmentation fault   (core dumped) 
> $RTE_SDK/$RTE
> _TARGET/app/testpmd -c 0x1f -n 4 -- --nb-cores=4 -i --disable-hw-vlan 
> --txqflags
>  0xf00 --rxq=$nr_queues --txq=$nr_queues --rxd=256 --txd=256
>
>   --yliu
>

Couldn't reproduce. Seems like resource free issue. Do you test with
multiple queues?

[dpdk-dev] [PATCH v2 1/2] ixgbe: VF supports mailbox interruption for PF link up/down

2016-06-02 Thread Wu, Jingjing



> -Original Message-
> From: Lu, Wenzhuo
> Sent: Wednesday, June 01, 2016 9:53 AM
> To: dev at dpdk.org
> Cc: Wu, Jingjing; Lu, Wenzhuo
> Subject: [PATCH v2 1/2] ixgbe: VF supports mailbox interruption for PF link
> up/down
> 
> In this scenario, kernel PF + DPDK VF, when PF finds the link state changes,
> up -> down or down -> up, it will send a message to VF by mailbox. This link
> state change may be triggered by PHY disconnection/reconnection,
> configuration like *ifconfig down/up* or interface parameter, like MTU,
> change.
> This patch enables the support of the mailbox interruption, so VF can receive
> the message of link up/down.
> After VF receives this message, VF port need to be reset to recover. So the
> handler of this message registers a reset callback to let APP reset the VF 
> port.
> 
> Signed-off-by: Wenzhuo Lu 

Acked-by: Jingjing Wu

[dpdk-dev] [PATCH] i40e: fix flexible payload selection

2016-06-02 Thread Zhe Tao

On Thu, May 12, 2016 at 04:11:40PM +0800, Jingjing Wu wrote:
> When setting up flexible payload selection rules, it is allowed
> that setting value to 63 to disable the rule (NONUSE_FLX_PIT_DEST_OFF).
> However, MK_FLX_PIT macro is always adding an offset value 50
> (I40E_FLX_OFFSET_IN_FIELD_VECTOR), it will be set to "63 + 50" and
> when setting NONUSE_FLX_PIT_DEST_OFF to disable it. It breaks
> the functionality.
> This patch fixes this issue.
> 
> Fixes: d8b90c4eabe9 ("i40e: take flow director flexible payload
>   configuration")
> 
> Reported-by: Michael Habibi 
> Signed-off-by: Jingjing Wu 
> ---
>  drivers/net/i40e/i40e_fdir.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
> index 8aa41e5..efbcd18 100644
> --- a/drivers/net/i40e/i40e_fdir.c
> +++ b/drivers/net/i40e/i40e_fdir.c
> @@ -94,7 +94,9 @@
>   I40E_PRTQF_FLX_PIT_SOURCE_OFF_MASK) | \
>   (((fsize) << I40E_PRTQF_FLX_PIT_FSIZE_SHIFT) & \
>   I40E_PRTQF_FLX_PIT_FSIZE_MASK) | \
> - dst_offset) + I40E_FLX_OFFSET_IN_FIELD_VECTOR) << \
> + dst_offset) == NONUSE_FLX_PIT_DEST_OFF ? \
> + NONUSE_FLX_PIT_DEST_OFF : \
> + ((dst_offset) + I40E_FLX_OFFSET_IN_FIELD_VECTOR)) << \
>   I40E_PRTQF_FLX_PIT_DEST_OFF_SHIFT) & \
>   I40E_PRTQF_FLX_PIT_DEST_OFF_MASK))
>  
> -- 
> 2.4.0
Acked-by: Zhe Tao

[dpdk-dev] [PATCH 0/2] NSH packet type support in i40e

2016-06-02 Thread Zhe Tao

On Tue, May 03, 2016 at 01:51:10PM +0800, Jingjing Wu wrote:
> NSH packet can be recognized by Intel X710/XL710 series. This
> patch set enables it.
> 
> Jingjing Wu (2):
>   mbuf: new NSH packet type
>   i40e: NSH packet type support
> 
>  app/test-pmd/rxonly.c  |  3 +++
>  doc/guides/rel_notes/release_16_07.rst |  2 ++
>  drivers/net/i40e/i40e_rxtx.c   | 27 +++
>  lib/librte_mbuf/rte_mbuf.h |  7 +++
>  4 files changed, 39 insertions(+)
> 
> -- 
> 2.4.0
Acked-by: Zhe Tao

[dpdk-dev] [PATCH v2 01/15] i40e/base: remove HMC AQ APIs

2016-06-02 Thread Gu, YongjieX

Tested-by: Yongjie Gu 

- Check patch: success
- Apply patch: success
- compilation: success
OS: fedora20
GCC: gcc_x86-64, 4.8.3
ICC: 16.0.2
Commit: 587d684d70f9d7f74e77a886c58103b40409caea
i686-native-linuxapp-icc: compile pass
x86_64-native-linuxapp-gcc-combined: compile pass
i686-native-linuxapp-gcc: compile pass
x86_64-native-linuxapp-gcc: compile pass
x86_64-native-linuxapp-icc: compile pass
x86_64-native-linuxapp-gcc-debug: compile pass
x86_64-native-linuxapp-gcc-shared: compile pass
x86_64-native-linuxapp-clang: compile pass

- dts validation: 
 -- Test Commit: c8c33ad7f94c59d1c0676af0cfd61207b3e808db
 -- OS/Kernel: Fedora22/4.2.8-200.fc22.x86_64
 -- GCC: gcc version 5.1.1
 -- CPU: Intel(R) Xeon(R) CPU E5-2658 v2 @ 2.40GHz
 -- NIC: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ [8086:1584]
 -- total 106,failed 10(Detailed case list see in the attachment,10 failed 
cases also exist in daily regression test)

Thanks
Yongjie

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Helin Zhang
Sent: Tuesday, May 24, 2016 2:23 PM
To: dev at dpdk.org
Cc: Zhang, Helin
Subject: [dpdk-dev] [PATCH v2 01/15] i40e/base: remove HMC AQ APIs

HMC AQ APIs were removed from the latest datasheet, and hence remove its 
implementations and relevant.

Signed-off-by: Helin Zhang 
---
 drivers/net/i40e/base/i40e_adminq.c |  4 ---
 drivers/net/i40e/base/i40e_adminq_cmd.h | 25 --
 drivers/net/i40e/base/i40e_common.c | 61 -
 drivers/net/i40e/base/i40e_prototype.h  |  8 -
 4 files changed, 98 deletions(-)

diff --git a/drivers/net/i40e/base/i40e_adminq.c 
b/drivers/net/i40e/base/i40e_adminq.c
index 222add4..0e4198e 100644
--- a/drivers/net/i40e/base/i40e_adminq.c
+++ b/drivers/net/i40e/base/i40e_adminq.c
@@ -683,10 +683,6 @@ enum i40e_status_code i40e_init_adminq(struct i40e_hw *hw)
hw->aq.nvm_release_on_done = false;
hw->nvmupd_state = I40E_NVMUPD_STATE_INIT;

-   ret_code = i40e_aq_set_hmc_resource_profile(hw,
-   I40E_HMC_PROFILE_DEFAULT,
-   0,
-   NULL);
 #endif /* PF_DRIVER */
ret_code = I40E_SUCCESS;

diff --git a/drivers/net/i40e/base/i40e_adminq_cmd.h 
b/drivers/net/i40e/base/i40e_adminq_cmd.h
index fe9d5b5..58ba609 100644
--- a/drivers/net/i40e/base/i40e_adminq_cmd.h
+++ b/drivers/net/i40e/base/i40e_adminq_cmd.h
@@ -224,10 +224,6 @@ enum i40e_admin_queue_opc {
i40e_aqc_opc_resume_port_tx = 0x041C,
i40e_aqc_opc_configure_partition_bw = 0x041D,

-   /* hmc */
-   i40e_aqc_opc_query_hmc_resource_profile = 0x0500,
-   i40e_aqc_opc_set_hmc_resource_profile   = 0x0501,
-
/* phy commands*/
i40e_aqc_opc_get_phy_abilities  = 0x0600,
i40e_aqc_opc_set_phy_config = 0x0601,
@@ -1646,27 +1642,6 @@ struct i40e_aqc_configure_partition_bw_data {

 I40E_CHECK_STRUCT_LEN(0x22, i40e_aqc_configure_partition_bw_data);

-/* Get and set the active HMC resource profile and status.
- * (direct 0x0500) and (direct 0x0501)
- */
-struct i40e_aq_get_set_hmc_resource_profile {
-   u8  pm_profile;
-   u8  pe_vf_enabled;
-   u8  reserved[14];
-};
-
-I40E_CHECK_CMD_LENGTH(i40e_aq_get_set_hmc_resource_profile);
-
-enum i40e_aq_hmc_profile {
-   /* I40E_HMC_PROFILE_NO_CHANGE= 0, reserved */
-   I40E_HMC_PROFILE_DEFAULT= 1,
-   I40E_HMC_PROFILE_FAVOR_VF   = 2,
-   I40E_HMC_PROFILE_EQUAL  = 3,
-};
-
-#define I40E_AQ_GET_HMC_RESOURCE_PROFILE_PM_MASK   0xF
-#define I40E_AQ_GET_HMC_RESOURCE_PROFILE_COUNT_MASK0x3F
-
 /* Get PHY Abilities (indirect 0x0600) uses the generic indirect struct */

 /* set in param0 for get phy abilities to report qualified modules */ diff 
--git a/drivers/net/i40e/base/i40e_common.c 
b/drivers/net/i40e/base/i40e_common.c
index ef3425e..7a5f754 100644
--- a/drivers/net/i40e/base/i40e_common.c
+++ b/drivers/net/i40e/base/i40e_common.c
@@ -3240,67 +3240,6 @@ enum i40e_status_code 
i40e_aq_debug_write_register(struct i40e_hw *hw,  }

 /**
- * i40e_aq_get_hmc_resource_profile
- * @hw: pointer to the hw struct
- * @profile: type of profile the HMC is to be set as
- * @pe_vf_enabled_count: the number of PE enabled VFs the system has
- * @cmd_details: pointer to command details structure or NULL
- *
- * query the HMC profile of the device.
- **/
-enum i40e_status_code i40e_aq_get_hmc_resource_profile(struct i40e_hw *hw,
-   enum i40e_aq_hmc_profile *profile,
-   u8 *pe_vf_enabled_count,
-   struct i40e_asq_cmd_details *cmd_details)
-{
-   struct i40e_aq_desc desc;
-   struct

[dpdk-dev] [PATCH v3] virtio: split virtio rx/tx queue

2016-06-02 Thread Huawei Xie

We keep a common vq structure, containing only vq related fields,
and then split others into RX, TX and control queue respectively.

Signed-off-by: Huawei Xie 
---
v2:
- don't split virtio_dev_rx/tx_queue_setup
v3:
- fix some 80 char warnings
- fix other newer version checkpatch warnings

- remove hdr zone allocation for RX queue
v4:
- remove '\n' in PMD_RX_LOG
- fix some conversions between vq and rx/txvq in virtio_dev_free_mbufs

 drivers/net/virtio/virtio_ethdev.c  | 374 ++--
 drivers/net/virtio/virtio_ethdev.h  |   2 +-
 drivers/net/virtio/virtio_pci.c |   4 +-
 drivers/net/virtio/virtio_pci.h |   3 +-
 drivers/net/virtio/virtio_rxtx.c| 282 +---
 drivers/net/virtio/virtio_rxtx.h|  56 -
 drivers/net/virtio/virtio_rxtx_simple.c |  83 +++
 drivers/net/virtio/virtqueue.h  |  70 +++---
 8 files changed, 496 insertions(+), 378 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index c3fb628..cba01d1 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -114,40 +114,61 @@ struct rte_virtio_xstats_name_off {
 };

 /* [rt]x_qX_ is prepended to the name string here */
-static const struct rte_virtio_xstats_name_off rte_virtio_q_stat_strings[] = {
-   {"good_packets",   offsetof(struct virtqueue, packets)},
-   {"good_bytes", offsetof(struct virtqueue, bytes)},
-   {"errors", offsetof(struct virtqueue, errors)},
-   {"multicast_packets",  offsetof(struct virtqueue, multicast)},
-   {"broadcast_packets",  offsetof(struct virtqueue, broadcast)},
-   {"undersize_packets",  offsetof(struct virtqueue, size_bins[0])},
-   {"size_64_packets",offsetof(struct virtqueue, size_bins[1])},
-   {"size_65_127_packets",offsetof(struct virtqueue, size_bins[2])},
-   {"size_128_255_packets",   offsetof(struct virtqueue, size_bins[3])},
-   {"size_256_511_packets",   offsetof(struct virtqueue, size_bins[4])},
-   {"size_512_1023_packets",  offsetof(struct virtqueue, size_bins[5])},
-   {"size_1024_1517_packets", offsetof(struct virtqueue, size_bins[6])},
-   {"size_1518_max_packets",  offsetof(struct virtqueue, size_bins[7])},
+static const struct rte_virtio_xstats_name_off rte_virtio_rxq_stat_strings[] = 
{
+   {"good_packets",   offsetof(struct virtnet_rx, stats.packets)},
+   {"good_bytes", offsetof(struct virtnet_rx, stats.bytes)},
+   {"errors", offsetof(struct virtnet_rx, stats.errors)},
+   {"multicast_packets",  offsetof(struct virtnet_rx, 
stats.multicast)},
+   {"broadcast_packets",  offsetof(struct virtnet_rx, 
stats.broadcast)},
+   {"undersize_packets",  offsetof(struct virtnet_rx, 
stats.size_bins[0])},
+   {"size_64_packets",offsetof(struct virtnet_rx, 
stats.size_bins[1])},
+   {"size_65_127_packets",offsetof(struct virtnet_rx, 
stats.size_bins[2])},
+   {"size_128_255_packets",   offsetof(struct virtnet_rx, 
stats.size_bins[3])},
+   {"size_256_511_packets",   offsetof(struct virtnet_rx, 
stats.size_bins[4])},
+   {"size_512_1023_packets",  offsetof(struct virtnet_rx, 
stats.size_bins[5])},
+   {"size_1024_1517_packets", offsetof(struct virtnet_rx, 
stats.size_bins[6])},
+   {"size_1518_max_packets",  offsetof(struct virtnet_rx, 
stats.size_bins[7])},
 };

-#define VIRTIO_NB_Q_XSTATS (sizeof(rte_virtio_q_stat_strings) / \
-   sizeof(rte_virtio_q_stat_strings[0]))
+/* [rt]x_qX_ is prepended to the name string here */
+static const struct rte_virtio_xstats_name_off rte_virtio_txq_stat_strings[] = 
{
+   {"good_packets",   offsetof(struct virtnet_tx, stats.packets)},
+   {"good_bytes", offsetof(struct virtnet_tx, stats.bytes)},
+   {"errors", offsetof(struct virtnet_tx, stats.errors)},
+   {"multicast_packets",  offsetof(struct virtnet_tx, 
stats.multicast)},
+   {"broadcast_packets",  offsetof(struct virtnet_tx, 
stats.broadcast)},
+   {"undersize_packets",  offsetof(struct virtnet_tx, 
stats.size_bins[0])},
+   {"size_64_packets",offsetof(struct virtnet_tx, 
stats.size_bins[1])},
+   {"size_65_127_packets",offsetof(struct virtnet_tx, 
stats.size_bins[2])},
+   {"size_128_255_packets",   offsetof(struct virtnet_tx, 
stats.size_bins[3])},
+   {"size_256_511_packets",   offsetof(struct virtnet_tx, 
stats.size_bins[4])},
+   {"size_512_1023_packets",  offsetof(struct virtnet_tx, 
stats.size_bins[5])},
+   {"size_1024_1517_packets", offsetof(struct virtnet_tx, 
stats.size_bins[6])},
+   {"size_1518_max_packets",  offsetof(struct virtnet_tx, 
stats.size_bins[7])},
+};
+
+#define VIRTIO_NB_RXQ_XSTATS (sizeof(rte_virtio_rxq_stat_strings) / \
+

[dpdk-dev] [RFC] kernel paramters like DPDK CLI options

2016-06-02 Thread Yuanhan Liu

On Wed, Jun 01, 2016 at 04:03:07PM +0200, Thomas Monjalon wrote:
> 2016-06-01 21:19, Yuanhan Liu:
> > On Wed, Jun 01, 2016 at 02:39:28PM +0200, Thomas Monjalon wrote:
> > > I was thinking to implement the library options parsing in DPDK.
> > > But if the application implements its own options parsing without using
> > > the DPDK one, yes the option parsing is obviously done in the application.
> > > 
> > > > I'd say, that would work, but I see inflexibility and some drawbacks:
> > > > 
> > > > - I would assume "--pciopt" has the input style of
> > > > 
> > > >   "domain:bus:devid:func,option1,option2,..."
> > > > 
> > > >   It then looks hard to me to use it: I need figure out the
> > > >   pci id first.
> > > 
> > > What do you suggest instead of PCI id?
> > 
> > That might depend on what you need: if you want to configure a specific
> > device, yes, PCI is needed (or even, a must). In such case, the --pciopt
> > or the side effect of --pci-whitelist could help. I confess this would
> > be helpful in some cases.
> > 
> > I guess there is another need is that we might want to pass an option
> > to a driver, say ixgbe, that would work for all devices operated by it.
> > In such case, we could use driver name (see the example below).
> > 
> > > > - For the libraries, we have to write code to add new options for
> > > >   each applictions. With the generic option, user just need use it;
> > > >   don't need write a single line of code, which could save user's
> > > >   effort. It also gives user an united interface.
> > > 
> > > Applications can leverage the DPDK options parsing (without writing
> > > any new line of code).
> > > 
> > > >   And to make it clear, as stated, I don't object to having an API.
> > > >   I mean, the generic option gives us a chance to do the right
> > > >   configuration at startup time: it would still invoke the right
> > > >   API to do the right thing in the end.
> > > 
> > > I agree. I just want to split your --extra-option proposal in few options
> > > with a bit more context.
> > 
> > Firstly, the name "--extra-option" might not be well taken :(
> > I just want to show the idea first.
> > 
> > Secondly, splitting it to more options would result to quite many
> > options introduced; it's also hard to list all of them. User intend
> > to get lost if so many options provided. It also introduces more
> > chaos, especially when we are going to add one option for each
> > library.
> > 
> > For the context issue, I guess we could address it by adding a prefix.
> > Such as,
> > 
> > --extra-options (or --whatever) "vhost.owner=kvm:kvm virtio.force_legacy
> >  mempool.handler=xxx"
> > 
> > Based on that, we could introduce other sematics, to let it be:
> > 
> > driver.opt | driver.pci_id.opt
> > 
> > Where,
> > 
> > - driver.opt
> >   applies to all devices operated by this driver
> > 
> > - driver.pci_id.opt
> >   applies only to a specific device with the same pci ID.
> > 
> > This could save us changing the --pci-whitelist sematic, yet it saves
> > us introducing a new option (--pciopt).
> > 
> > What do you think of it?
> 
> I like the idea :)

Superb!

> One important benefit of having only one option is to make easier to rename
> in applications to e.g. --dpdk-options and pass the string to the DPDK
> parsing function.
> I think we must allow several occurences of this new option on the CLI.

No idea so far; I'm thinking one should be enough. But I also see no
issue when allowing several occurences. Let's recheck it later.

> 
> At the end, the main issue is to find a shiny name for this option ;)

You know what, I'm really not good at naming, so might need your
help :-)

--yliu

91 matches

Mail list logo