date:20160406

[dpdk-dev] [PATCH] ixgbe: cleanup whitespace and formatting issues

2016-04-06 Thread Lu, Wenzhuo

Hi Stephen,


> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, April 5, 2016 9:56 AM
> To: Lu, Wenzhuo
> Cc: Zhang, Helin; Ananyev, Konstantin; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ixgbe: cleanup whitespace and formatting
> issues
> 
> On Tue, 5 Apr 2016 00:57:16 +
> "Lu, Wenzhuo"  wrote:
> 
> > Hi Stephen,
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen
> > > Hemminger
> > > Sent: Tuesday, April 5, 2016 12:14 AM
> > > To: Zhang, Helin; Ananyev, Konstantin
> > > Cc: dev at dpdk.org; Stephen Hemminger
> > > Subject: [dpdk-dev] [PATCH] ixgbe: cleanup whitespace and formatting
> > > issues
> > >
> > > This driver was one of the originals and has lots of little whitespace 
> > > issues.
> > >
> > > PS: I know Intel doesn't like whitespace changes, there is never a
> > > good time to do this, but no resuliting binary changes and it is
> > > unlikely that more changes to this driver will occur this late in release 
> > > cycle.
> > >
> > > Signed-off-by: Stephen Hemminger 
> > Thanks for this patch. I think it's good to make the format better :)
> > But there's some checkpatch error and warnings, like this,
> > ERROR: "foo* bar" should be "foo *bar"
> > #508: FILE: drivers/net/ixgbe/ixgbe_ethdev.c:4256:
> > +ixgbe_uc_hash_table_set(struct rte_eth_dev *dev, struct ether_addr*
> > +mac_addr,
> >
> 
> I didn't fix all the checkpatch errors, just the easy ones.
So, If you don't object, I can try to make a new version with the checkpatch 
error/warning fixed.

[dpdk-dev] [RFC] vhost-user public struct refactor (was Re: [PATCH RFC 2/4] vhost: make buf vector for scatter RX) local.

2016-04-06 Thread Flavio Leitner

On Tue, Apr 05, 2016 at 01:47:33PM +0800, Yuanhan Liu wrote:
> On Fri, Feb 19, 2016 at 03:06:50PM +0800, Yuanhan Liu wrote:
> > On Fri, Feb 19, 2016 at 09:32:41AM +0300, Ilya Maximets wrote:
> > > Array of buf_vector's is just an array for temporary storing information
> > > about available descriptors. It used only locally in virtio_dev_merge_rx()
> > > and there is no reason for that array to be shared.
> > > 
> > > Fix that by allocating local buf_vec inside virtio_dev_merge_rx().
> > > 
> > > Signed-off-by: Ilya Maximets 
> > > ---
> > >  lib/librte_vhost/rte_virtio_net.h |  1 -
> > >  lib/librte_vhost/vhost_rxtx.c | 45 
> > > ---
> > >  2 files changed, 23 insertions(+), 23 deletions(-)
> > > 
> > > diff --git a/lib/librte_vhost/rte_virtio_net.h 
> > > b/lib/librte_vhost/rte_virtio_net.h
> > > index 10dcb90..ae1e4fb 100644
> > > --- a/lib/librte_vhost/rte_virtio_net.h
> > > +++ b/lib/librte_vhost/rte_virtio_net.h
> > > @@ -91,7 +91,6 @@ struct vhost_virtqueue {
> > >   int kickfd; /**< Currently unused 
> > > as polling mode is enabled. */
> > >   int enabled;
> > >   uint64_treserved[16];   /**< Reserve some 
> > > spaces for future extension. */
> > > - struct buf_vector   buf_vec[BUF_VECTOR_MAX];/**< for 
> > > scatter RX. */
> > >  } __rte_cache_aligned;
> > 
> > I like this kind of cleanup, however, it breaks ABI.
> 
> So, I was considering to add vhost-user Tx delayed-copy (or zero copy)
> support recently, which comes to yet another ABI violation, as we need
> add a new field to virtio_memory_regions struct to do guest phys addr
> to host phys addr translation. You may ask, however, that why do we need
> expose virtio_memory_regions struct to users at all?
> 
> You are right, we don't have to. And here is the thing: we exposed way
> too many fields (or even structures) than necessary. Say, vhost_virtqueue
> struct should NOT be exposed to user at all: application just need to
> tell the right queue id to locate a specific queue, and that's all.
> The structure should be defined in an internal header file. With that,
> we could do any changes to it we want, without worrying about that we
> may offense the painful ABI rules.
> 
> Similar changes could be done to virtio_net struct as well, just exposing
> very few fields that are necessary and moving all others to an internal
> structure.
> 
> Huawei then suggested a more radical yet much cleaner one: just exposing
> a virtio_net handle to application, just like the way kernel exposes an
> fd to user for locating a specific file. However, it's more than an ABI
> change; it's also an API change: some fields are referenced by applications,
> such as flags, virt_qp_nb. We could expose some new functions to access
> them though.
> 
> I'd vote for this one, as it sounds very clean to me. This would also
> solve the block issue of this patch. Though it would break OVS, I'm thinking
> that'd be okay, as OVS has dependence on DPDK version: what we need to
> do is just to send few patches to OVS, and let it points to next release,
> say DPDK v16.07. Flavio, please correct me if I'm wrong.

There is a plan to use vHost PMD, so from OVS point of view the virtio
stuff would be hidden because vhost PMD would look like just as a
regular ethernet, right?

I think we are waiting for 16.04 to be released with that so we can
start push changes to OVS as well.

-- 
fbl

[dpdk-dev] [RFC] vhost-user public struct refactor (was Re: [PATCH RFC 2/4] vhost: make buf vector for scatter RX) local.

2016-04-06 Thread Yuanhan Liu

On Wed, Apr 06, 2016 at 01:14:09AM -0300, Flavio Leitner wrote:
> > 
> > I'd vote for this one, as it sounds very clean to me. This would also
> > solve the block issue of this patch. Though it would break OVS, I'm thinking
> > that'd be okay, as OVS has dependence on DPDK version: what we need to
> > do is just to send few patches to OVS, and let it points to next release,
> > say DPDK v16.07. Flavio, please correct me if I'm wrong.
> 
> There is a plan to use vHost PMD,

Great.

> so from OVS point of view the virtio
> stuff would be hidden because vhost PMD would look like just as a
> regular ethernet, right?

Yes.

--yliu

[dpdk-dev] [PATCH] vhost: Fix retrieval of numa information in PMD

2016-04-06 Thread Yuanhan Liu

On Tue, Apr 05, 2016 at 05:09:47PM +0100, Ciara Loftus wrote:
> After some testing, it was found that retrieving numa information
> about a vhost device via a call to get_mempolicy is more
> accurate when performed during the new_device callback versus
> the vring_state_changed callback, in particular upon initial boot
> of the VM.  Performing this check during new_device is also
> potentially more efficient as this callback is only triggered once
> during device initialisation, compared with vring_state_changed
> which may be called multiple times depending on the number of
> queues assigned to the device.
> 
> Reorganise the code to perform this check and assign the correct
> socket_id to the device during the new_device callback.
> 
> Signed-off-by: Ciara Loftus 
> ---

Acked-by: Yuanhan Liu 

--yliu

[dpdk-dev] [RFC] vhost-user public struct refactor (was Re: [PATCH RFC 2/4] vhost: make buf vector for scatter RX) local.

2016-04-06 Thread Ilya Maximets

--- Original Message ---
Sender : Flavio Leitner
Date : Apr 06, 2016 07:14 (GMT+03:00)
Title : Re: [RFC] vhost-user public struct refactor (was Re: [dpdk-dev] [PATCH 
RFC 2/4] vhost: make buf vector for scatter RX) local.

On Tue, Apr 05, 2016 at 01:47:33PM +0800, Yuanhan Liu wrote:
> On Fri, Feb 19, 2016 at 03:06:50PM +0800, Yuanhan Liu wrote:
> > On Fri, Feb 19, 2016 at 09:32:41AM +0300, Ilya Maximets wrote:
> > > Array of buf_vector's is just an array for temporary storing information
> > > about available descriptors. It used only locally in virtio_dev_merge_rx()
> > > and there is no reason for that array to be shared.
> > > 
> > > Fix that by allocating local buf_vec inside virtio_dev_merge_rx().
> > > 
> > > Signed-off-by: Ilya Maximets 
> > > ---
> > >  lib/librte_vhost/rte_virtio_net.h |  1 -
> > >  lib/librte_vhost/vhost_rxtx.c | 45 
> > > ---
> > >  2 files changed, 23 insertions(+), 23 deletions(-)
> > > 
> > > diff --git a/lib/librte_vhost/rte_virtio_net.h 
> > > b/lib/librte_vhost/rte_virtio_net.h
> > > index 10dcb90..ae1e4fb 100644
> > > --- a/lib/librte_vhost/rte_virtio_net.h
> > > +++ b/lib/librte_vhost/rte_virtio_net.h
> > > @@ -91,7 +91,6 @@ struct vhost_virtqueue {
> > >   int kickfd; /**< Currently unused as polling mode is enabled. */
> > >   int enabled;
> > >   uint64_t reserved[16]; /**< Reserve some spaces for future extension. */
> > > - struct buf_vector buf_vec[BUF_VECTOR_MAX]; /**< for scatter RX. */
> > >  } __rte_cache_aligned;
> > 
> > I like this kind of cleanup, however, it breaks ABI.
> 
> So, I was considering to add vhost-user Tx delayed-copy (or zero copy)
> support recently, which comes to yet another ABI violation, as we need
> add a new field to virtio_memory_regions struct to do guest phys addr
> to host phys addr translation. You may ask, however, that why do we need
> expose virtio_memory_regions struct to users at all?
> 
> You are right, we don't have to. And here is the thing: we exposed way
> too many fields (or even structures) than necessary. Say, vhost_virtqueue
> struct should NOT be exposed to user at all: application just need to
> tell the right queue id to locate a specific queue, and that's all.
> The structure should be defined in an internal header file. With that,
> we could do any changes to it we want, without worrying about that we
> may offense the painful ABI rules.
> 
> Similar changes could be done to virtio_net struct as well, just exposing
> very few fields that are necessary and moving all others to an internal
> structure.
> 
> Huawei then suggested a more radical yet much cleaner one: just exposing
> a virtio_net handle to application, just like the way kernel exposes an
> fd to user for locating a specific file. However, it's more than an ABI
> change; it's also an API change: some fields are referenced by applications,
> such as flags, virt_qp_nb. We could expose some new functions to access
> them though.
> 
> I'd vote for this one, as it sounds very clean to me. This would also
> solve the block issue of this patch. Though it would break OVS, I'm thinking
> that'd be okay, as OVS has dependence on DPDK version: what we need to
> do is just to send few patches to OVS, and let it points to next release,
> say DPDK v16.07. Flavio, please correct me if I'm wrong.

> There is a plan to use vHost PMD, so from OVS point of view the virtio
> stuff would be hidden because vhost PMD would look like just as a
> regular ethernet, right?

But we still need to have access to virtqueue_enabe/disable notifications to
work properly. How this will be done if virtqueue will be hidden from user?

Best regards, Ilya Maximets.

[dpdk-dev] DPDK namespace

2016-04-06 Thread Yuanhan Liu

On Tue, Apr 05, 2016 at 05:31:22PM +0300, Arnon Warshavsky wrote:
> On Tue, Apr 5, 2016 at 5:13 PM, Trahe, Fiona  wrote:
> 
> >
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> > > Sent: Tuesday, April 05, 2016 2:57 PM
> > > To: dev at dpdk.org
> > > Subject: [dpdk-dev] DPDK namespace
> > >
> > > DPDK is going to be more popular in Linux distributions.
> > > It means people will have some DPDK files in their /usr/include and some
> > DPDK
> > > libraries on their system.
> > >
> > > Let's imagine someone trying to compile an application which needs
> > > rte_ethdev.h. He has to figure out that this "rte header" is provided by
> > the DPDK.
> > > Hopefully it will be explained on StackOverflow that RTE stands for DPDK.
> > > Then someone else will try to run a binary without having installed the
> > DPDK
> > > libraries. The linker will require libethdev.so (no prefix here).
> > > StackOverflow will probably have another good answer (among wrong ones):
> > > "Hey Sherlock Holmes, have you tried to install the DPDK library?"
> > > Followed by an insight: "You know, the DPDK naming is weird..."
> > > And we could continue the story with developers having some naming clash
> > > because of some identifiers not prefixed at all.
> > >
> > > The goal of this email is to get some feedback on how important it is to
> > fix the
> > > DPDK namespace.
> > >
> > > If there is enough agreement that we should do something, I suggest to
> > > introduce the "dpdk_" prefix slowly and live with both "rte_" and "dpdk_"
> > > during some time.
> > > We could start using the new prefix for the new APIs (example: crypto)
> > or when
> > > there is a significant API break (example: mempool).
> > >
> > > Opinions welcome!
> > I don't have an opinion on how important it is to fix the namespace,
> > though it does seem like a good idea.
> > However if it's to be done, in my opinion it should be completed quickly
> > or will just cause more confusion.
> > So if rte_cryptoxxx becomes dpdk_cryptoxxx all other libraries should
> > follow in next release or two, with
> > the resulting ABI compatibility handling. Maybe with dual naming handled
> > for several releases, but a
> > clear end date when all are converted.
> > Else there will be many years with a mix of rte_ and dpdk_
> >
> >
> 
> Googling rte functions or error codes usually takes you to dpdk dev email
> archive so I don't think it is that much difficult to figure out where rte
> comes from.
> Other than that , except for my own refactoring pains when replacing a dpdk
> version, I do not see a major reason why not.
> If Going for dpdk_ prefix, I agree with the quick death approach.

+1: it's a bit weird to keep both, especially for a long while, that
every time we turn a rte_ prefix to dpdk_ prefix, we break applications.
Instead of breaking applications many times, I'd prefer to break once.
Therefore, applications could do a simple global rte_ -> dpdk_ substitute:
it doesn't sound that painful then.

And here are few more comments:

- we should add rte_/dpdk_ prefix to all public structures as well.

  I'm thinking we are doing well here. I'm just aware that vhost lib
  does a bad job, which is something I proposed to fix in next release.

- If we do the whole change once, I'd suggest to do it ASAP when this
  release is over.

  It should be a HUGE change that touches a lot of code, if we do it
  later, at a stage that a lot of patches for new features have been
  made or sent out, all of them need rebase. That'd be painful.

--yliu

[dpdk-dev] [PATCH] vhost: Fix retrieval of numa information in PMD

2016-04-06 Thread Tan, Jianfeng

Hi,

Just out of interest, seems that the message handling thread which runs 
new_device() is pthread_create() from the thread which calls the 
dev_start(), usually master thread, right? But it's not necessary to be 
the master thread to poll pkts from this vhost port, right? So what's 
the significance to record the numa_node information of message handling 
thread here? Shall we make the decision of numa_realloc based on the 
final PMD thread who is responsible for polling this vhost port?

It's not related to this patch itself. And it seems good to me.


Thanks,
Jianfeng



On 4/6/2016 12:09 AM, Ciara Loftus wrote:
> After some testing, it was found that retrieving numa information
> about a vhost device via a call to get_mempolicy is more
> accurate when performed during the new_device callback versus
> the vring_state_changed callback, in particular upon initial boot
> of the VM.  Performing this check during new_device is also
> potentially more efficient as this callback is only triggered once
> during device initialisation, compared with vring_state_changed
> which may be called multiple times depending on the number of
> queues assigned to the device.
>
> Reorganise the code to perform this check and assign the correct
> socket_id to the device during the new_device callback.
>
> Signed-off-by: Ciara Loftus 
> ---
>   drivers/net/vhost/rte_eth_vhost.c | 28 ++--
>   1 file changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/vhost/rte_eth_vhost.c 
> b/drivers/net/vhost/rte_eth_vhost.c
> index 4cc6bec..b1eb082 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -229,6 +229,9 @@ new_device(struct virtio_net *dev)
>   struct pmd_internal *internal;
>   struct vhost_queue *vq;
>   unsigned i;
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> + int newnode, ret;
> +#endif
>   
>   if (dev == NULL) {
>   RTE_LOG(INFO, PMD, "Invalid argument\n");
> @@ -244,6 +247,17 @@ new_device(struct virtio_net *dev)
>   eth_dev = list->eth_dev;
>   internal = eth_dev->data->dev_private;
>   
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> + ret  = get_mempolicy(, NULL, 0, dev,
> + MPOL_F_NODE | MPOL_F_ADDR);
> + if (ret < 0) {
> + RTE_LOG(ERR, PMD, "Unknown numa node\n");
> + return -1;
> + }
> +
> + eth_dev->data->numa_node = newnode;
> +#endif
> +
>   for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
>   vq = eth_dev->data->rx_queues[i];
>   if (vq == NULL)
> @@ -352,9 +366,6 @@ vring_state_changed(struct virtio_net *dev, uint16_t 
> vring, int enable)
>   struct rte_vhost_vring_state *state;
>   struct rte_eth_dev *eth_dev;
>   struct internal_list *list;
> -#ifdef RTE_LIBRTE_VHOST_NUMA
> - int newnode, ret;
> -#endif
>   
>   if (dev == NULL) {
>   RTE_LOG(ERR, PMD, "Invalid argument\n");
> @@ -370,17 +381,6 @@ vring_state_changed(struct virtio_net *dev, uint16_t 
> vring, int enable)
>   eth_dev = list->eth_dev;
>   /* won't be NULL */
>   state = vring_states[eth_dev->data->port_id];
> -
> -#ifdef RTE_LIBRTE_VHOST_NUMA
> - ret  = get_mempolicy(, NULL, 0, dev,
> - MPOL_F_NODE | MPOL_F_ADDR);
> - if (ret < 0) {
> - RTE_LOG(ERR, PMD, "Unknown numa node\n");
> - return -1;
> - }
> -
> - eth_dev->data->numa_node = newnode;
> -#endif
>   rte_spinlock_lock(>lock);
>   state->cur[vring] = enable;
>   state->max_vring = RTE_MAX(vring, state->max_vring);

[dpdk-dev] [RFC] vhost-user public struct refactor (was Re: [PATCH RFC 2/4] vhost: make buf vector for scatter RX) local.

2016-04-06 Thread Yuanhan Liu

On Wed, Apr 06, 2016 at 05:11:01AM +, Ilya Maximets wrote:
> > There is a plan to use vHost PMD, so from OVS point of view the virtio
> > stuff would be hidden because vhost PMD would look like just as a
> > regular ethernet, right?
> 
> But we still need to have access to virtqueue_enabe/disable notifications to
> work properly. How this will be done if virtqueue will be hidden from user?

Do you mean vring_state_changed() callback? It will not be removed.
BTW, when using vhost pmd, you will not be aware of such callback:
it will be translated to a RTE_ETH_EVENT_QUEUE_STATE interrupt.

OTOH, I have a simple git grep of "vq" from ovs dpdk netdev code,
it returns nothing. So, I don't think that will matter?

--yliu

[dpdk-dev] [PATCH] vhost: Fix retrieval of numa information in PMD

2016-04-06 Thread Yuanhan Liu

On Wed, Apr 06, 2016 at 01:32:12PM +0800, Tan, Jianfeng wrote:
> Hi,
> 
> Just out of interest, seems that the message handling thread which runs
> new_device() is pthread_create() from the thread which calls the
> dev_start(), usually master thread, right? But it's not necessary to be the
> master thread to poll pkts from this vhost port, right? So what's the
> significance to record the numa_node information of message handling thread
> here? Shall we make the decision of numa_realloc based on the final PMD
> thread who is responsible for polling this vhost port?

It doesn't matter on which core we made the decision: the result
would be same since we are querying the numa node info of the
virtio_net dev struct.

--yliu
> 
> It's not related to this patch itself. And it seems good to me.
> 
> 
> Thanks,
> Jianfeng
> 
> 
> 
> On 4/6/2016 12:09 AM, Ciara Loftus wrote:
> >After some testing, it was found that retrieving numa information
> >about a vhost device via a call to get_mempolicy is more
> >accurate when performed during the new_device callback versus
> >the vring_state_changed callback, in particular upon initial boot
> >of the VM.  Performing this check during new_device is also
> >potentially more efficient as this callback is only triggered once
> >during device initialisation, compared with vring_state_changed
> >which may be called multiple times depending on the number of
> >queues assigned to the device.
> >
> >Reorganise the code to perform this check and assign the correct
> >socket_id to the device during the new_device callback.
> >
> >Signed-off-by: Ciara Loftus 
> >---
> >  drivers/net/vhost/rte_eth_vhost.c | 28 ++--
> >  1 file changed, 14 insertions(+), 14 deletions(-)
> >
> >diff --git a/drivers/net/vhost/rte_eth_vhost.c 
> >b/drivers/net/vhost/rte_eth_vhost.c
> >index 4cc6bec..b1eb082 100644
> >--- a/drivers/net/vhost/rte_eth_vhost.c
> >+++ b/drivers/net/vhost/rte_eth_vhost.c
> >@@ -229,6 +229,9 @@ new_device(struct virtio_net *dev)
> > struct pmd_internal *internal;
> > struct vhost_queue *vq;
> > unsigned i;
> >+#ifdef RTE_LIBRTE_VHOST_NUMA
> >+int newnode, ret;
> >+#endif
> > if (dev == NULL) {
> > RTE_LOG(INFO, PMD, "Invalid argument\n");
> >@@ -244,6 +247,17 @@ new_device(struct virtio_net *dev)
> > eth_dev = list->eth_dev;
> > internal = eth_dev->data->dev_private;
> >+#ifdef RTE_LIBRTE_VHOST_NUMA
> >+ret  = get_mempolicy(, NULL, 0, dev,
> >+MPOL_F_NODE | MPOL_F_ADDR);
> >+if (ret < 0) {
> >+RTE_LOG(ERR, PMD, "Unknown numa node\n");
> >+return -1;
> >+}
> >+
> >+eth_dev->data->numa_node = newnode;
> >+#endif
> >+
> > for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> > vq = eth_dev->data->rx_queues[i];
> > if (vq == NULL)
> >@@ -352,9 +366,6 @@ vring_state_changed(struct virtio_net *dev, uint16_t 
> >vring, int enable)
> > struct rte_vhost_vring_state *state;
> > struct rte_eth_dev *eth_dev;
> > struct internal_list *list;
> >-#ifdef RTE_LIBRTE_VHOST_NUMA
> >-int newnode, ret;
> >-#endif
> > if (dev == NULL) {
> > RTE_LOG(ERR, PMD, "Invalid argument\n");
> >@@ -370,17 +381,6 @@ vring_state_changed(struct virtio_net *dev, uint16_t 
> >vring, int enable)
> > eth_dev = list->eth_dev;
> > /* won't be NULL */
> > state = vring_states[eth_dev->data->port_id];
> >-
> >-#ifdef RTE_LIBRTE_VHOST_NUMA
> >-ret  = get_mempolicy(, NULL, 0, dev,
> >-MPOL_F_NODE | MPOL_F_ADDR);
> >-if (ret < 0) {
> >-RTE_LOG(ERR, PMD, "Unknown numa node\n");
> >-return -1;
> >-}
> >-
> >-eth_dev->data->numa_node = newnode;
> >-#endif
> > rte_spinlock_lock(>lock);
> > state->cur[vring] = enable;
> > state->max_vring = RTE_MAX(vring, state->max_vring);

[dpdk-dev] [RFC] vhost-user public struct refactor (was Re: [PATCH RFC 2/4] vhost: make buf vector for scatter RX) local.

2016-04-06 Thread Ilya Maximets

--- Original Message ---
Sender : Yuanhan Liu
Date : Apr 06, 2016 08:38 (GMT+03:00)
Title : Re: Re: [RFC] vhost-user public struct refactor (was Re: [dpdk-dev] 
[PATCH RFC 2/4] vhost: make buf vector for scatter RX) local.

On Wed, Apr 06, 2016 at 05:11:01AM +, Ilya Maximets wrote:
> > > There is a plan to use vHost PMD, so from OVS point of view the virtio
> > > stuff would be hidden because vhost PMD would look like just as a
> > > regular ethernet, right?
> > 
> > But we still need to have access to virtqueue_enabe/disable notifications to
> > work properly. How this will be done if virtqueue will be hidden from user?
>
> Do you mean vring_state_changed() callback? It will not be removed.
> BTW, when using vhost pmd, you will not be aware of such callback:
> it will be translated to a RTE_ETH_EVENT_QUEUE_STATE interrupt.
>
> OTOH, I have a simple git grep of "vq" from ovs dpdk netdev code,
> it returns nothing. So, I don't think that will matter?

OK, thanks for clarifying.
I guess, all should be fine in that case. Thank you.

Best regards, Ilya Maximets.

[dpdk-dev] [PATCH] vhost: Fix retrieval of numa information in PMD

2016-04-06 Thread Tan, Jianfeng



On 4/6/2016 1:44 PM, Yuanhan Liu wrote:
> On Wed, Apr 06, 2016 at 01:32:12PM +0800, Tan, Jianfeng wrote:
>> Hi,
>>
>> Just out of interest, seems that the message handling thread which runs
>> new_device() is pthread_create() from the thread which calls the
>> dev_start(), usually master thread, right? But it's not necessary to be the
>> master thread to poll pkts from this vhost port, right? So what's the
>> significance to record the numa_node information of message handling thread
>> here? Shall we make the decision of numa_realloc based on the final PMD
>> thread who is responsible for polling this vhost port?
> It doesn't matter on which core we made the decision: the result
> would be same since we are querying the numa node info of the
> virtio_net dev struct.

OK, according to your hint, read numa_realloc() again, it's to keep *dev 
(struct virtio_net), *dev->virtqueue[], on the same numa socket of 
shared memory with virtio?

And numa_realloc() is called in vhost_set_vring_addr(), which is after 
new_device()?


>
>   --yliu
>> It's not related to this patch itself. And it seems good to me.
>>
>>
>> Thanks,
>> Jianfeng
>>
>>
>>
>> On 4/6/2016 12:09 AM, Ciara Loftus wrote:
>>> After some testing, it was found that retrieving numa information
>>> about a vhost device via a call to get_mempolicy is more
>>> accurate when performed during the new_device callback versus
>>> the vring_state_changed callback, in particular upon initial boot
>>> of the VM.  Performing this check during new_device is also
>>> potentially more efficient as this callback is only triggered once
>>> during device initialisation, compared with vring_state_changed
>>> which may be called multiple times depending on the number of
>>> queues assigned to the device.
>>>
>>> Reorganise the code to perform this check and assign the correct
>>> socket_id to the device during the new_device callback.
>>>
>>> Signed-off-by: Ciara Loftus 
>>> ---
>>>   drivers/net/vhost/rte_eth_vhost.c | 28 ++--
>>>   1 file changed, 14 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/net/vhost/rte_eth_vhost.c 
>>> b/drivers/net/vhost/rte_eth_vhost.c
>>> index 4cc6bec..b1eb082 100644
>>> --- a/drivers/net/vhost/rte_eth_vhost.c
>>> +++ b/drivers/net/vhost/rte_eth_vhost.c
>>> @@ -229,6 +229,9 @@ new_device(struct virtio_net *dev)
>>> struct pmd_internal *internal;
>>> struct vhost_queue *vq;
>>> unsigned i;
>>> +#ifdef RTE_LIBRTE_VHOST_NUMA
>>> +   int newnode, ret;
>>> +#endif
>>> if (dev == NULL) {
>>> RTE_LOG(INFO, PMD, "Invalid argument\n");
>>> @@ -244,6 +247,17 @@ new_device(struct virtio_net *dev)
>>> eth_dev = list->eth_dev;
>>> internal = eth_dev->data->dev_private;
>>> +#ifdef RTE_LIBRTE_VHOST_NUMA
>>> +   ret  = get_mempolicy(, NULL, 0, dev,
>>> +   MPOL_F_NODE | MPOL_F_ADDR);
>>> +   if (ret < 0) {
>>> +   RTE_LOG(ERR, PMD, "Unknown numa node\n");
>>> +   return -1;
>>> +   }
>>> +
>>> +   eth_dev->data->numa_node = newnode;

If it's correct of above analysis, dev, here, could be numa_realloc() later?

Thanks,
Jianfeng


>>> +#endif
>>> +
>>> for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
>>> vq = eth_dev->data->rx_queues[i];
>>> if (vq == NULL)
>>> @@ -352,9 +366,6 @@ vring_state_changed(struct virtio_net *dev, uint16_t 
>>> vring, int enable)
>>> struct rte_vhost_vring_state *state;
>>> struct rte_eth_dev *eth_dev;
>>> struct internal_list *list;
>>> -#ifdef RTE_LIBRTE_VHOST_NUMA
>>> -   int newnode, ret;
>>> -#endif
>>> if (dev == NULL) {
>>> RTE_LOG(ERR, PMD, "Invalid argument\n");
>>> @@ -370,17 +381,6 @@ vring_state_changed(struct virtio_net *dev, uint16_t 
>>> vring, int enable)
>>> eth_dev = list->eth_dev;
>>> /* won't be NULL */
>>> state = vring_states[eth_dev->data->port_id];
>>> -
>>> -#ifdef RTE_LIBRTE_VHOST_NUMA
>>> -   ret  = get_mempolicy(, NULL, 0, dev,
>>> -   MPOL_F_NODE | MPOL_F_ADDR);
>>> -   if (ret < 0) {
>>> -   RTE_LOG(ERR, PMD, "Unknown numa node\n");
>>> -   return -1;
>>> -   }
>>> -
>>> -   eth_dev->data->numa_node = newnode;
>>> -#endif
>>> rte_spinlock_lock(>lock);
>>> state->cur[vring] = enable;
>>> state->max_vring = RTE_MAX(vring, state->max_vring);

[dpdk-dev] [PATCH] vhost: Fix retrieval of numa information in PMD

2016-04-06 Thread Yuanhan Liu

On Wed, Apr 06, 2016 at 02:05:51PM +0800, Tan, Jianfeng wrote:
> 
> 
> On 4/6/2016 1:44 PM, Yuanhan Liu wrote:
> >On Wed, Apr 06, 2016 at 01:32:12PM +0800, Tan, Jianfeng wrote:
> >>Hi,
> >>
> >>Just out of interest, seems that the message handling thread which runs
> >>new_device() is pthread_create() from the thread which calls the
> >>dev_start(), usually master thread, right? But it's not necessary to be the
> >>master thread to poll pkts from this vhost port, right? So what's the
> >>significance to record the numa_node information of message handling thread
> >>here? Shall we make the decision of numa_realloc based on the final PMD
> >>thread who is responsible for polling this vhost port?
> >It doesn't matter on which core we made the decision: the result
> >would be same since we are querying the numa node info of the
> >virtio_net dev struct.
> 
> OK, according to your hint, read numa_realloc() again, it's to keep *dev
> (struct virtio_net), *dev->virtqueue[], on the same numa socket of shared
> memory with virtio?

Sort of, and I'm guessing that the comment have already made it clear?

/*
 * Reallocate virtio_dev and vhost_virtqueue data structure to make them on 
the
 * same numa node as the memory of vring descriptor.
 */

> 
> And numa_realloc() is called in vhost_set_vring_addr(), which is after
> new_device()?

It's actually before new_device().

--yliu

[dpdk-dev] [PATCH] vhost: Fix retrieval of numa information in PMD

2016-04-06 Thread Tan, Jianfeng



On 4/6/2016 2:17 PM, Yuanhan Liu wrote:
> On Wed, Apr 06, 2016 at 02:05:51PM +0800, Tan, Jianfeng wrote:
>>
>> On 4/6/2016 1:44 PM, Yuanhan Liu wrote:
>>> On Wed, Apr 06, 2016 at 01:32:12PM +0800, Tan, Jianfeng wrote:
 Hi,

 Just out of interest, seems that the message handling thread which runs
 new_device() is pthread_create() from the thread which calls the
 dev_start(), usually master thread, right? But it's not necessary to be the
 master thread to poll pkts from this vhost port, right? So what's the
 significance to record the numa_node information of message handling thread
 here? Shall we make the decision of numa_realloc based on the final PMD
 thread who is responsible for polling this vhost port?
>>> It doesn't matter on which core we made the decision: the result
>>> would be same since we are querying the numa node info of the
>>> virtio_net dev struct.
>> OK, according to your hint, read numa_realloc() again, it's to keep *dev
>> (struct virtio_net), *dev->virtqueue[], on the same numa socket of shared
>> memory with virtio?
> Sort of, and I'm guessing that the comment have already made it clear?
>
>  /*
>   * Reallocate virtio_dev and vhost_virtqueue data structure to make them 
> on the
>   * same numa node as the memory of vring descriptor.
>   */
>
>> And numa_realloc() is called in vhost_set_vring_addr(), which is after
>> new_device()?
> It's actually before new_device().
>
>   --yliu

Yes, exactly as you said. Thanks for explanation!

Jianfeng

[dpdk-dev] [PATCH] vhost: Fix retrieval of numa information in PMD

2016-04-06 Thread Tetsuya Mukawa

On 2016/04/06 1:09, Ciara Loftus wrote:
> After some testing, it was found that retrieving numa information
> about a vhost device via a call to get_mempolicy is more
> accurate when performed during the new_device callback versus
> the vring_state_changed callback, in particular upon initial boot
> of the VM.  Performing this check during new_device is also
> potentially more efficient as this callback is only triggered once
> during device initialisation, compared with vring_state_changed
> which may be called multiple times depending on the number of
> queues assigned to the device.
>
> Reorganise the code to perform this check and assign the correct
> socket_id to the device during the new_device callback.
>
> Signed-off-by: Ciara Loftus 
> ---
>  drivers/net/vhost/rte_eth_vhost.c | 28 ++--
>  1 file changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/vhost/rte_eth_vhost.c 
> b/drivers/net/vhost/rte_eth_vhost.c
> index 4cc6bec..b1eb082 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
>

Hi,

I appreciate fixing it.
Just one worry is that state changed event may be occurred before new
device event.
The users should not call rte_eth_dev_socket_id() until new device event
comes, even if they catch queue state events.
Otherwise, they will get wrong socket id to call
rte_eth_rx/tx_queue_setup().

So how about commenting it in 'rte_eth_vhost.h'?

Thanks,
Tetsuya

[dpdk-dev] [PATCH] vhost: ABI/API change announcement due to refactor

2016-04-06 Thread Yuanhan Liu

We currently exposed way too many fields (or even structures) than
necessary. For example, vhost_virtqueue struct should NOT be exposed
to user at all: application just need to tell the right queue id to
locate a specific queue, and that's all. Instead, the structure should
be defined in an internal header file. With that, we could do any changes
to it we want, without worrying about that we may offense the painful
ABI rules.

Similar changes could be done to virtio_net struct as well, just exposing
very few fields that are necessary and moving all others to an internal
structure.

Huawei then suggested a more radical yet much cleaner one: just exposing
a virtio_net handle to application, just like the way kernel exposes an
fd to user for locating a specific file, and exposing some new functions
to access those old fields, such as flags, virt_qp_nb.

With this change, we're likely to be free from ABI violations forever
(well, except when we have to extend the virtio_net_device_ops struct).
For example, following nice cleanup would not be a blocking one then:

http://dpdk.org/ml/archives/dev/2016-February/033528.html

Suggested-by: Huawei Xie 
Cc: Ilya Maximets 
Signed-off-by: Yuanhan Liu 
---
 doc/guides/rel_notes/deprecation.rst | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index ad31355..7d16d86 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -40,3 +40,10 @@ Deprecation Notices
   The existing API will be backward compatible, but there will be new API
   functions added to facilitate the creation of mempools using an external
   handler. The 16.07 release will contain these changes.
+
+* A librte_vhost public structures refactor is planned for DPDK 16.07
+  that requires both ABI and API change.
+  The proposed refactor would expose DPDK vhost dev to applications as
+  a handle, like the way kernel exposes an fd to user for locating a
+  specific file, and to keep all major structures internally, so that
+  we are likely to be free from ABI violations in future.
-- 
1.9.0

[dpdk-dev] [PATCH] vhost: Fix retrieval of numa information in PMD

2016-04-06 Thread Tetsuya Mukawa

On 2016/04/06 16:17, Yuanhan Liu wrote:
> On Wed, Apr 06, 2016 at 03:49:25PM +0900, Tetsuya Mukawa wrote:
>> On 2016/04/06 1:09, Ciara Loftus wrote:
>>> After some testing, it was found that retrieving numa information
>>> about a vhost device via a call to get_mempolicy is more
>>> accurate when performed during the new_device callback versus
>>> the vring_state_changed callback, in particular upon initial boot
>>> of the VM.  Performing this check during new_device is also
>>> potentially more efficient as this callback is only triggered once
>>> during device initialisation, compared with vring_state_changed
>>> which may be called multiple times depending on the number of
>>> queues assigned to the device.
>>>
>>> Reorganise the code to perform this check and assign the correct
>>> socket_id to the device during the new_device callback.
>>>
>>> Signed-off-by: Ciara Loftus 
>>> ---
>>>  drivers/net/vhost/rte_eth_vhost.c | 28 ++--
>>>  1 file changed, 14 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/net/vhost/rte_eth_vhost.c 
>>> b/drivers/net/vhost/rte_eth_vhost.c
>>> index 4cc6bec..b1eb082 100644
>>> --- a/drivers/net/vhost/rte_eth_vhost.c
>>> +++ b/drivers/net/vhost/rte_eth_vhost.c
>>>
>> Hi,
>>
>> I appreciate fixing it.
>> Just one worry is that state changed event may be occurred before new
>> device event.
>> The users should not call rte_eth_dev_socket_id() until new device event
>> comes, even if they catch queue state events.
>> Otherwise, they will get wrong socket id to call
>> rte_eth_rx/tx_queue_setup().
> There is no way to guarantee that the socket id stuff would work
> perfectly in vhost, right? I mean, it's likely that virtio device
> would allocate memory from 2 or more sockets.
>
> So, it doesn't matter too much whether it's set perfectly right
> or not. Instead, we should assign it with a saner value instead
> of a obvious wrong one when new_device() is not invoked yet. So,
> I'd suggest to make an assignment first based on vhost_dev (or
> whatever) struct, and then make it "right" at new_device()
> callback?

Yes, I agree with you idea.

Thanks,
Tetsuya

>> So how about commenting it in 'rte_eth_vhost.h'?
> It asks a different usage than other PMDs, which I don't think
> it's a good idea.
>
>   --yliu

[dpdk-dev] [PATCH v4] i40e: fix TSO issue for tx function

2016-04-06 Thread Zhe Tao

Issue:

when using the following CLI in testpmd to enable ipv6 TSO feature
(set --txqflags=0 in the testpmd command)
set verbose 1
csum set ip hw 0
csum set udp hw 0
csum set tcp hw 0
csum set sctp hw 0
csum set outer-ip hw 0
csum parse_tunnel on 0
tso set 800 0
set fwd csum
start

We will not get what we want, the ipv6 packets sent out from IXIA can be 
received by i40e, but cannot forward to another port.
The root cause is when HW doing the TSO offload for packets, it does not only
depends on the context descriptor to define the MSS and TSO payload size, it
also need to know whether this packets is ipv4 or ipv6, we use 
i40e_txd_enable_checksum to generate the related fields for data descriptor.
But PMD will not call i40e_txd_enable_checksum if only the TSO offload flag is
set. The reason why ipv4 works fine for TSO in testpmd csum mode is csum engine
will set the ip csum flag when the packet is ipv4 and TSO is enabled but 
will not set the flag for ipv6 and this flag will cause the
i40e_txd_enable_checksum to be invoked. For both the cases the TSO flag will be
set, so we need to use TSO flag to trigger the i40e_txd_enable_checksum.
The right logic here is we enable csum offload for both ipv4 and ipv6 when TSO
flag is set.

Fixes: e3f0151f (i40e: enable Tx checksum only for offloaded packets)

Signed-off-by: Zhe Tao 

---
v2: changed condition check for ipv6 TSO checksum offload
use a more clear check method which include both ipv4 & ipv6 TSO
v3: edited the commit log and title because this problem exists for both
ipv4 and ipv6
v4: moved the history under the commit log 
---
 drivers/net/i40e/i40e_rxtx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 59909f3..4d35d83 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -76,6 +76,7 @@
 #define I40E_TX_CKSUM_OFFLOAD_MASK (\
PKT_TX_IP_CKSUM |\
PKT_TX_L4_MASK | \
+   PKT_TX_TCP_SEG | \
PKT_TX_OUTER_IP_CKSUM)

 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
-- 
2.1.4

[dpdk-dev] [PATCH v2] doc: update overview

2016-04-06 Thread Thomas Monjalon

2016-04-05 10:54, Wenzhuo Lu:
> Update the overview.rst for e1000, igb, ixgbe.
[...]
> stats per queue  X
>  X

You have not filled stats per queue. Is it because of an issue
with queue_stats_mapping?

[...]
> usage docX
>  X
> design doc
> perf doc

Do you plan to complete the docs for these drivers?

[dpdk-dev] [PATCH v2] doc: update overview

2016-04-06 Thread Thomas Monjalon

2016-04-05 10:54, Wenzhuo Lu:
> Update the overview.rst for e1000, igb, ixgbe.
> 
> v2:
> - Some "X"s are put in the wrong place, correct it.
> 
> Signed-off-by: Wenzhuo Lu 

Applied, thanks

[dpdk-dev] [PATCH v13 4/8] ethdev: rename link speed constants

2016-04-06 Thread Weglicki, MichalX

Hello, 

I have a question about this patch. 

As far as I see changing ETH_LINK_SPEED_ to ETH_SPEED_NUM_ is rather cosmetic 
change, am I right? 

Disadvantage of that is that it breaks compatibility with OVS (compilation) and 
thus it also breaks backward compatibility between OVS & DPDK. Is it really 
necessary? 

It would be great to keep ABI & API stable if possible. 

Br, 
Michal. 

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Marc Sune
Sent: Saturday, March 26, 2016 1:27 AM
To: Thomas Monjalon ; Xu, Qian Q ; Xing, Beilei ; dev at dpdk.org; Ananyev, 
Konstantin ; Lu, Wenzhuo ; Richardson, Bruce ; Glynn, Michael J 

Cc: Marc Sune 
Subject: [dpdk-dev] [PATCH v13 4/8] ethdev: rename link speed constants

The speed numbers ETH_LINK_SPEED_ are renamed ETH_SPEED_NUM_.
The prefix ETH_LINK_SPEED_ is kept for AUTONEG and will be used
for bit flags in next patch.

Signed-off-by: Marc Sune 
---
 app/test-pmd/cmdline.c| 10 +-
 app/test/virtual_pmd.c|  2 +-
 drivers/net/af_packet/rte_eth_af_packet.c |  2 +-
 drivers/net/bonding/rte_eth_bond_8023ad.c | 12 ++--
 drivers/net/cxgbe/base/t4_hw.c|  8 
 drivers/net/e1000/em_ethdev.c |  8 
 drivers/net/e1000/igb_ethdev.c|  8 
 drivers/net/ena/ena_ethdev.c  |  2 +-
 drivers/net/i40e/i40e_ethdev.c| 30 +++---
 drivers/net/i40e/i40e_ethdev_vf.c |  2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c  | 22 +++---
 drivers/net/mpipe/mpipe_tilegx.c  |  4 ++--
 drivers/net/nfp/nfp_net.c |  2 +-
 drivers/net/null/rte_eth_null.c   |  2 +-
 drivers/net/pcap/rte_eth_pcap.c   |  2 +-
 drivers/net/ring/rte_eth_ring.c   |  2 +-
 drivers/net/szedata2/rte_eth_szedata2.c   |  8 
 drivers/net/vmxnet3/vmxnet3_ethdev.c  |  2 +-
 drivers/net/xenvirt/rte_eth_xenvirt.c |  2 +-
 lib/librte_ether/rte_ethdev.h | 29 ++---
 20 files changed, 83 insertions(+), 76 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index eb7bbb4..815b53b 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -1006,20 +1006,20 @@ parse_and_check_speed_duplex(char *speedstr, char 
*duplexstr, uint16_t *speed)
}

if (!strcmp(speedstr, "10")) {
-   *speed = ETH_LINK_SPEED_10;
+   *speed = ETH_SPEED_NUM_10M;
} else if (!strcmp(speedstr, "100")) {
-   *speed = ETH_LINK_SPEED_100;
+   *speed = ETH_SPEED_NUM_100M;
} else {
if (duplex != ETH_LINK_FULL_DUPLEX) {
printf("Invalid speed/duplex parameters\n");
return -1;
}
if (!strcmp(speedstr, "1000")) {
-   *speed = ETH_LINK_SPEED_1000;
+   *speed = ETH_SPEED_NUM_1G;
} else if (!strcmp(speedstr, "1")) {
-   *speed = ETH_LINK_SPEED_10G;
+   *speed = ETH_SPEED_NUM_10G;
} else if (!strcmp(speedstr, "4")) {
-   *speed = ETH_LINK_SPEED_40G;
+   *speed = ETH_SPEED_NUM_40G;
} else if (!strcmp(speedstr, "auto")) {
*speed = ETH_LINK_SPEED_AUTONEG;
} else {
diff --git a/app/test/virtual_pmd.c b/app/test/virtual_pmd.c
index b1d40d7..b4bd2f2 100644
--- a/app/test/virtual_pmd.c
+++ b/app/test/virtual_pmd.c
@@ -604,7 +604,7 @@ virtual_ethdev_create(const char *name, struct ether_addr 
*mac_addr,
TAILQ_INIT(&(eth_dev->link_intr_cbs));

eth_dev->data->dev_link.link_status = ETH_LINK_DOWN;
-   eth_dev->data->dev_link.link_speed = ETH_LINK_SPEED_1;
+   eth_dev->data->dev_link.link_speed = ETH_SPEED_NUM_10G;
eth_dev->data->dev_link.link_duplex = ETH_LINK_FULL_DUPLEX;

eth_dev->data->mac_addrs = rte_zmalloc(name, ETHER_ADDR_LEN, 0);
diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index dee7b59..641f849 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -116,7 +116,7 @@ static const char *valid_arguments[] = {
 static const char *drivername = "AF_PACKET PMD";

 static struct rte_eth_link pmd_link = {
-   .link_speed = 1,
+   .link_speed = ETH_SPEED_NUM_10G,
.link_duplex = ETH_LINK_FULL_DUPLEX,
.link_status = ETH_LINK_DOWN,
 };
diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c 
b/drivers/net/bonding/rte_eth_bond_8023ad.c
index 1b7e93a..ac8306f 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad.c
+++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
@@ -711,22 +711,22 @@ link_speed_key(uint16_t speed) {
case ETH_LINK_SPEED_AUTONEG:
key_speed = 0x00;

[dpdk-dev] [PATCH v2] doc: update nic overview

2016-04-06 Thread Thomas Monjalon

2016-04-01 16:55, Chen Jing D:
> -   stats per queue  X
>  X
> +   stats per queue  X
>  X

I think you should fill "stats per queue"

> BSD nic_uio  X   X X X X

What is the issue with BSD?

[dpdk-dev] [PATCH] doc: add enic features to nic features matrix

2016-04-06 Thread Thomas Monjalon

2016-04-05 11:46, John Daley:
> usage docX   X X  
>  X

doc/guides/nics/enic.rst ?
I'll fill "usage doc".

Applied, thanks

[dpdk-dev] [PATCH] doc: announce ABI change for rte_port_source_params structure

2016-04-06 Thread Azarewicz, PiotrX T

> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Fan Zhang
> > Sent: Thursday, March 31, 2016 2:29 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH] doc: announce ABI change for
> > rte_port_source_params structure
> >
> > Several new fields will be added to structure rte_port_source_params
> > for source port enhancement with pcap file reading support.
> >
> > Signed-off-by: Fan Zhang 
> > Acked-by: Cristian Dumitrescu 
> 
> Acked-by: Jasvinder Singh 

Acked-by: Piotr Azarewicz

[dpdk-dev] [PATCH v13 4/8] ethdev: rename link speed constants

2016-04-06 Thread Thomas Monjalon

2016-04-06 08:34, Weglicki, MichalX:
> Hello, 
> 
> I have a question about this patch. 
> 
> As far as I see changing ETH_LINK_SPEED_ to ETH_SPEED_NUM_ is rather cosmetic 
> change, am I right? 

No.
ETH_LINK_SPEED was used for configuration and reported speed.
Now the configuration is done with a bitmap filled with new ETH_LINK_SPEED
and the old numerical values are kept for other usages as ETH_SPEED_NUM.

> Disadvantage of that is that it breaks compatibility with OVS (compilation) 
> and thus it also breaks backward compatibility between OVS & DPDK. Is it 
> really necessary? 

Yes that's why this change was discussed 9 months ago and announced in the
previous release notes.

> It would be great to keep ABI & API stable if possible.

We try to keep it stable when possible and continue to bring some improvements.

[dpdk-dev] [dpdk-dev, 07/10] lib: fix missing include dependencies

2016-04-06 Thread Adrien Mazarguil

Hi Jan,

Replying below as well.

On Tue, Apr 05, 2016 at 10:23:04PM +0200, Jan Viktorin wrote:
> Hello Adrien,
> 
> just quickly skimming through the ARM fixes...
> 
> On Tue,  5 Apr 2016 16:08:07 +0200
> Adrien Mazarguil  wrote:
> 
> > Exported header files for use by applications should be self sufficient and
> > allow out of order inclusion. Moreover, they must include all the system
> > headers they need for types and macros.
> > 
> > This commit prevents the following errors:
> > 
> >  error: `RTE_MAX_LCORE' undeclared here (not in a function)
> >  error: `RTE_LPM_VALID_EXT_ENTRY_BITMASK' undeclared (first use in this 
> > function)
> >  error: #error "Unsupported cache line size"
> >  error: `asm' undeclared (first use in this function)
> >  error: implicit declaration of function `[...]'
> >  error: unknown type name `[...]'
> >  error: field `mac_addr' has incomplete type
> >  error: `CHAR_BIT' undeclared here (not in a function)
> >  error: `struct timespec' declared inside parameter list
> > 
> > Signed-off-by: Adrien Mazarguil 
> > 
> > ---
> [...]
> > +
> >  #ifdef __cplusplus
> >  extern "C" {
> >  #endif
> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_byteorder.h 
> > b/lib/librte_eal/common/include/arch/arm/rte_byteorder.h
> > index 3f2dd1f..c2078e7 100644
> > --- a/lib/librte_eal/common/include/arch/arm/rte_byteorder.h
> > +++ b/lib/librte_eal/common/include/arch/arm/rte_byteorder.h
> > @@ -37,6 +37,9 @@
> >  #  error Platform must be built with CONFIG_RTE_FORCE_INTRINSICS
> >  #endif
> >  
> > +#include 
> > +#include 
> 
> Why not to place it into the extern "C" { block? There is already:
> 
> #include "generic/rte_byteorder.h"

Right, I did not do it because headers may eventually contain C++
compatibility code someday, so I think we should avoid #includes inside
extern "C" blocks. C++ compliant headers should provide their own blocks,
also I'm not sure how well it mixes with system includes having their own
compatibility layer.

I agree we need consistency, so what about a commit to move all #includes
outside of such blocks instead?

> > +#include 
> 
> I don't see any reason for this. The header does not use anything
> special. Just "asm", but that should be a keyword...

Unfortunately it's a nonstandard keyword which is defined as __asm__ in
rte_common.h, itself an extension keyword compilers will swallow without
complaining thanks to these "__".

> >  #ifdef __cplusplus
> >  extern "C" {
> >  #endif
> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_prefetch_64.h 
> > b/lib/librte_eal/common/include/arch/arm/rte_prefetch_64.h
> > index 3ed46a4..600c6f0 100644
> > --- a/lib/librte_eal/common/include/arch/arm/rte_prefetch_64.h
> > +++ b/lib/librte_eal/common/include/arch/arm/rte_prefetch_64.h
> > @@ -33,6 +33,8 @@
> >  #ifndef _RTE_PREFETCH_ARM_64_H_
> >  #define _RTE_PREFETCH_ARM_64_H_
> >  
> > +#include 
> 
> Same here.

Same reason here.

-- 
Adrien Mazarguil
6WIND

[dpdk-dev] [PATCH] doc: announce xstats api change for 16.07

2016-04-06 Thread Van Haaren, Harry

+ David Harton,

> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Subject: Re: [dpdk-dev] [PATCH] doc: announce xstats api change for 16.07
> 2016-04-05 18:58, Harry van Haaren:
> > +* ABI change is planned for the xstats API

> Have you already submitted a RFC patch to let us have an opinion on the 
> change?
> We need, at least, to see the structure changes.

This API break is to allow changing the API of xstats given the conversation 
that was on-list, as discussed in this thread:

http://thread.gmane.org/gmane.comp.networking.dpdk.devel/31728/focus=31903

The issue we are going to fix is that currently PMDs copy strings when 
retrieving statistics, which causes unnecessary overhead. The implementation is 
not decided yet, but using an int->value mapping seems logical.

The rte_eth_xstats struct size may be modified, and the API to retrieve 
statistics will be modified.

[dpdk-dev] [PATCH v2] [PATCH] doc: update supported features of virtio

2016-04-06 Thread Thomas Monjalon

2016-04-05 11:31, Jianfeng Tan:
> Update the overview.rst for virtio.
> 
> Note: virtio is a para-virtualization device, which indicates that its
> features depend on not only front end but also back end. Here by X, we
> just mean the feature is supported in front end.
> 
> Signed-off-by: Jianfeng Tan 

Applied, thanks

[dpdk-dev] [PATCH v13 4/8] ethdev: rename link speed constants

2016-04-06 Thread Weglicki, MichalX

Hello, 

Thank you for your answer. 

I didn't mention that patch is cosmetic, rather naming. 

But all clear, I somehow missed it 9 months ago, and couldn't find it. 

Is there any official place where I can find upcoming ABI & API changes, or I 
have to dig through mailing list? 

Thank you in advance. 

Br, 
Michal. 

-Original Message-
From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
Sent: Wednesday, April 6, 2016 9:53 AM
To: Weglicki, MichalX 
Cc: Marc Sune ; Xu, Qian Q ; 
Xing, Beilei ; dev at dpdk.org; Ananyev, Konstantin 
; Lu, Wenzhuo ; 
Richardson, Bruce ; Glynn, Michael J 
; Gray, Mark D 
Subject: Re: [dpdk-dev] [PATCH v13 4/8] ethdev: rename link speed constants

2016-04-06 08:34, Weglicki, MichalX:
> Hello, 
> 
> I have a question about this patch. 
> 
> As far as I see changing ETH_LINK_SPEED_ to ETH_SPEED_NUM_ is rather cosmetic 
> change, am I right? 

No.
ETH_LINK_SPEED was used for configuration and reported speed.
Now the configuration is done with a bitmap filled with new ETH_LINK_SPEED
and the old numerical values are kept for other usages as ETH_SPEED_NUM.

> Disadvantage of that is that it breaks compatibility with OVS (compilation) 
> and thus it also breaks backward compatibility between OVS & DPDK. Is it 
> really necessary? 

Yes that's why this change was discussed 9 months ago and announced in the
previous release notes.

> It would be great to keep ABI & API stable if possible.

We try to keep it stable when possible and continue to bring some improvements.

--
Intel Research and Development Ireland Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263


This e-mail and any attachments may contain confidential material for the sole
use of the intended recipient(s). Any review or distribution by others is
strictly prohibited. If you are not the intended recipient, please contact the
sender and delete all copies.

[dpdk-dev] [PATCH] doc: announce xstats api change for 16.07

2016-04-06 Thread Thomas Monjalon

2016-04-06 09:02, Van Haaren, Harry:
> + David Harton,
> 
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Subject: Re: [dpdk-dev] [PATCH] doc: announce xstats api change for 16.07
> > 2016-04-05 18:58, Harry van Haaren:
> > > +* ABI change is planned for the xstats API
> 
> > Have you already submitted a RFC patch to let us have an opinion on the 
> > change?
> > We need, at least, to see the structure changes.
> 
> This API break is to allow changing the API of xstats given the conversation 
> that was on-list, as discussed in this thread:
> 
> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/31728/focus=31903
> 
> The issue we are going to fix is that currently PMDs copy strings when 
> retrieving statistics, which causes unnecessary overhead. The implementation 
> is not decided yet, but using an int->value mapping seems logical.

I am not sure performance is so much critical when retrieving statistics.
The extended stats can be infinitely extended. So a string identifier seems
a lot more natural.
I do not agree to add a new numeric identifier in the API each time a driver
wants to report a specific statistic for debugging purpose.

> The rte_eth_xstats struct size may be modified, and the API to retrieve 
> statistics will be modified.

[dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x

2016-04-06 Thread Thomas Monjalon

> > It seems that with gcc >5.x and -O2/-O3 optimization breaks packet grouping
> > algorithm.
> > 
> > When last packet pointer "lp" and "pnum->u64" buffer points the same
> > memory buffer, high optimization can cause unpredictable results. It seems
> > that assignment of precalculated group sizes may interfere with
> > initialization of new group size when lp points value inside current group
> > and didn't should be changed.
> > 
> > With gcc >5.x and optimization we cannot be sure which assignment will be
> > done first, so the group size can be counted incorrectly.
> > 
> > This patch eliminates intersection of assignment of initial group size
> > (lp[0] = 1) and precalculated group sizes when gptbl[v].idx < 4.
> > 
> > Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")
> > 
> > Signed-off-by: Tomasz Kulasek 
> 
> Acked-by: Konstantin Ananyev 

Applied, thanks

[dpdk-dev] [PATCH v13 4/8] ethdev: rename link speed constants

2016-04-06 Thread Thomas Monjalon

2016-04-06 09:16, Weglicki, MichalX:
> Hello, 
> 
> Thank you for your answer. 
> 
> I didn't mention that patch is cosmetic, rather naming. 
> 
> But all clear, I somehow missed it 9 months ago, and couldn't find it. 
> 
> Is there any official place where I can find upcoming ABI & API changes, or I 
> have to dig through mailing list? 

Yes, the API and ABI changes are described in the release notes:

http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/release_16_04.rst#n471
And the next deprecations are announced in another section:
http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/deprecation.rst

PS: please answer inline

-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] 
> 2016-04-06 08:34, Weglicki, MichalX:
> > Hello, 
> > 
> > I have a question about this patch. 
> > 
> > As far as I see changing ETH_LINK_SPEED_ to ETH_SPEED_NUM_ is rather 
> > cosmetic change, am I right? 
> 
> No.
> ETH_LINK_SPEED was used for configuration and reported speed.
> Now the configuration is done with a bitmap filled with new ETH_LINK_SPEED
> and the old numerical values are kept for other usages as ETH_SPEED_NUM.
> 
> > Disadvantage of that is that it breaks compatibility with OVS (compilation) 
> > and thus it also breaks backward compatibility between OVS & DPDK. Is it 
> > really necessary? 
> 
> Yes that's why this change was discussed 9 months ago and announced in the
> previous release notes.
> 
> > It would be great to keep ABI & API stable if possible.
> 
> We try to keep it stable when possible and continue to bring some 
> improvements.

[dpdk-dev] [PATCH] vhost: Fix retrieval of numa information in PMD

2016-04-06 Thread Loftus, Ciara

> 
> On Wed, Apr 06, 2016 at 03:49:25PM +0900, Tetsuya Mukawa wrote:
> > On 2016/04/06 1:09, Ciara Loftus wrote:
> > > After some testing, it was found that retrieving numa information
> > > about a vhost device via a call to get_mempolicy is more
> > > accurate when performed during the new_device callback versus
> > > the vring_state_changed callback, in particular upon initial boot
> > > of the VM.  Performing this check during new_device is also
> > > potentially more efficient as this callback is only triggered once
> > > during device initialisation, compared with vring_state_changed
> > > which may be called multiple times depending on the number of
> > > queues assigned to the device.
> > >
> > > Reorganise the code to perform this check and assign the correct
> > > socket_id to the device during the new_device callback.
> > >
> > > Signed-off-by: Ciara Loftus 
> > > ---
> > >  drivers/net/vhost/rte_eth_vhost.c | 28 ++--
> > >  1 file changed, 14 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/drivers/net/vhost/rte_eth_vhost.c
> b/drivers/net/vhost/rte_eth_vhost.c
> > > index 4cc6bec..b1eb082 100644
> > > --- a/drivers/net/vhost/rte_eth_vhost.c
> > > +++ b/drivers/net/vhost/rte_eth_vhost.c
> > >
> >
> > Hi,
> >
> > I appreciate fixing it.
> > Just one worry is that state changed event may be occurred before new
> > device event.
> > The users should not call rte_eth_dev_socket_id() until new device event
> > comes, even if they catch queue state events.
> > Otherwise, they will get wrong socket id to call
> > rte_eth_rx/tx_queue_setup().
> 
> There is no way to guarantee that the socket id stuff would work
> perfectly in vhost, right? I mean, it's likely that virtio device
> would allocate memory from 2 or more sockets.
> 
> So, it doesn't matter too much whether it's set perfectly right
> or not. Instead, we should assign it with a saner value instead
> of a obvious wrong one when new_device() is not invoked yet. So,
> I'd suggest to make an assignment first based on vhost_dev (or
> whatever) struct, and then make it "right" at new_device()
> callback?

Thanks for the feedback.
At the moment with this patch numa_node is initially set to rte_socket_id() 
during  pmd init and then updated to the correct value during new_device.
Are you suggesting we set it again in between these two steps ("based on 
vhost_dev")? If so where do you think would be a good place?

Thanks,
Ciara

> 
> > So how about commenting it in 'rte_eth_vhost.h'?
> 
> It asks a different usage than other PMDs, which I don't think
> it's a good idea.
> 
>   --yliu

[dpdk-dev] Fw: dpdk-armv7-testing - Build # 43 - Failure!

2016-04-06 Thread Jan Viktorin

Hello Adrien,

I've failed to build the patch set

[PATCH 00/10] Fix build errors related to exported headers

properly. Please, see the attached log file. It's something with a
header file inclusion. It should be probably:

#include "rte_ether.h"

Regards
Jan

-- 
   Jan Viktorin  E-mail: Viktorin at RehiveTech.com
   System Architect  Web:www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic

[dpdk-dev] [PATCH] autotests: fix eal flags test

2016-04-06 Thread Thomas Monjalon

> > Since commit a88ba49e51, values larger than 4 are allowed,
> > the autotests need to be updated accordingly.
> > 
> > Fixes: a88ba49e51 ("config: fix CPU and memory parameters on IBM
> > POWER8")
> > Signed-off-by: Olivier Matz 
> 
> Acked-by: Pablo de Lara

[dpdk-dev] Fw: dpdk-armv7-testing - Build # 43 - Failure!

2016-04-06 Thread Thomas Monjalon

Hi Jan,

2016-04-06 11:53, Jan Viktorin:
> Please, see the attached log file.

The text attachments are filtered out on the mailing list because
it is not convenient or impossible to read in the archives.
Please use inline text, next time. Thanks

[dpdk-dev] [PATCH v3 0/4] fix creation of duplicate lpm and hash

2016-04-06 Thread De Lara Guarch, Pablo



> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, April 05, 2016 4:52 PM
> To: Richardson, Bruce; De Lara Guarch, Pablo
> Cc: dev at dpdk.org; Olivier Matz
> Subject: Re: [dpdk-dev] [PATCH v3 0/4] fix creation of duplicate lpm and hash
> 
> 2016-04-05 13:53, Olivier Matz:
> > Seen while trying to fix the func_reentrancy autotest. The
> > series addresses several issues:
> >
> > 1/ Hash and lpm return a pointer to an existing object if the user requests
> the
> >creation with an already existing name. This look dangerous: when an
> object
> >is returned, the user does not know if it should be freed or not.
> >
> > 2/ There is a race condition in cuckoo_hash as the lock is not held in
> >rte_hash_create(). We could find some cases where NULL is returned
> when the
> >object already exists (ex: when rte_ring_create() fails).
> >
> > 3/ There is a race condition func_reentrancy that can fail even if the 
> > tested
> >API behaves correctly.
> 
> Pablo, Bruce,
> What do you think of these fixes for 16.04?

I was reviewing them yesterday, but couldn't finish in time. I will do now.

[dpdk-dev] [PATCH] autotests: increase memory for group_2

2016-04-06 Thread Thomas Monjalon

> > The hash test (located in group_2) may require more than 64MB of memory,
> > especially if the memory is physically fragmented, making the test to
> > fail. So increase the memory to 128MB to avoid this issue.
> > 
> > Signed-off-by: Olivier Matz 
> 
> Acked-by: Pablo de Lara 

Applied, thanks

[dpdk-dev] [PATCH] xstats: fix behavior when a null array is provided

2016-04-06 Thread Thomas Monjalon

2016-04-05 11:06, Van Haaren, Harry:
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> > Subject: [dpdk-dev] [PATCH] xstats: fix behavior when a null array is 
> > provided
> > 
> > Coverity reports an issue in ethdev:
> > 
> >   *** CID 124562:  Null pointer dereferences  (FORWARD_NULL)
> >   /lib/librte_ether/rte_ethdev.c: 1518 in rte_eth_xstats_get()
> >   1512
> >   1513  /* global stats */
> >   1514  for (i = 0; i < RTE_NB_STATS; i++) {
> >   1515  stats_ptr = RTE_PTR_ADD(_stats,
> >   1516
> >   rte_stats_strings[i].offset);
> >   1517  val = *stats_ptr;
> >   >>> CID 124562:  Null pointer dereferences  (FORWARD_NULL)
> >   >>> Dereferencing null pointer "xstats".
> >   1518 snprintf(xstats[count].name,
> >   sizeof(xstats[count].name),
> >   1519  "%s", rte_stats_strings[i].name);
> >   1520xstats[count++].value = val;
> >   1521}
> >   1522
> >   1523  /* per-rxq stats */
> > 
> > If a user calls rte_eth_xstats_get(portid, NULL, n) with n != 0,
> > it may result in a crash. Although the API documentation says that
> > n is the size of the table and xstats can be NULL if n == 0, we
> > can add an additional check here to make Coverity happy.
> > 
> > In that case, the return value is the same than when n == 0 is
> > passed, it returns the number of statistics.
> > 
> > Fixes: ce757f5c9a ("ethdev: new method to retrieve extended statistics")
> > Signed-off-by: Olivier Matz 
> 
> I'm unsure on how verbose commit messages are ideal,
> but there's certainly enough description here :)
> 
> Acked-by: Harry van Haaren 

Applied, thanks

[dpdk-dev] Fw: dpdk-armv7-testing - Build # 43 - Failure!

2016-04-06 Thread Jan Viktorin

On Wed, 06 Apr 2016 12:05:31 +0200
Thomas Monjalon  wrote:

> Hi Jan,
> 
> 2016-04-06 11:53, Jan Viktorin:
> > Please, see the attached log file.  
> 
> The text attachments are filtered out on the mailing list because
> it is not convenient or impossible to read in the archives.
> Please use inline text, next time. Thanks

I am sorry, didn't notice that. Nobody has pointed to this before.

dpdk-armv7-testing - Build # 44 - Still Failing

See the following log:

[...]
ln -nsf `/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/scripts/relpath.sh 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/lib/librte_cmdline/cmdline.h 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include` 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include
  SYMLINK-FILE include/cmdline_parse.h
ln -nsf `/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/scripts/relpath.sh 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/lib/librte_cmdline/cmdline_parse.h
 /var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include` 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include
  SYMLINK-FILE include/cmdline_parse_num.h
ln -nsf `/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/scripts/relpath.sh 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/lib/librte_cmdline/cmdline_parse_num.h
 /var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include` 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include
  SYMLINK-FILE include/cmdline_parse_ipaddr.h
ln -nsf `/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/scripts/relpath.sh 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/lib/librte_cmdline/cmdline_parse_ipaddr.h
 /var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include` 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include
  SYMLINK-FILE include/cmdline_parse_etheraddr.h
ln -nsf `/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/scripts/relpath.sh 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/lib/librte_cmdline/cmdline_parse_etheraddr.h
 /var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include` 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include
  SYMLINK-FILE include/cmdline_parse_string.h
ln -nsf `/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/scripts/relpath.sh 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/lib/librte_cmdline/cmdline_parse_string.h
 /var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include` 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include
  SYMLINK-FILE include/cmdline_rdline.h
ln -nsf `/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/scripts/relpath.sh 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/lib/librte_cmdline/cmdline_rdline.h
 /var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include` 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include
  SYMLINK-FILE include/cmdline_vt100.h
ln -nsf `/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/scripts/relpath.sh 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/lib/librte_cmdline/cmdline_vt100.h
 /var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include` 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include
  SYMLINK-FILE include/cmdline_socket.h
ln -nsf `/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/scripts/relpath.sh 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/lib/librte_cmdline/cmdline_socket.h
 /var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include` 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include
  SYMLINK-FILE include/cmdline_cirbuf.h
ln -nsf `/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/scripts/relpath.sh 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/lib/librte_cmdline/cmdline_cirbuf.h
 /var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include` 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include
  SYMLINK-FILE include/cmdline_parse_portlist.h
ln -nsf `/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/scripts/relpath.sh 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/lib/librte_cmdline/cmdline_parse_portlist.h
 /var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include` 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include
  INSTALL-LIB librte_cmdline.a
cp -f librte_cmdline.a 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/lib
== Build lib/librte_ether
/opt/gcc/br2-arm32-glibc-4.9.x/usr/bin/arm-buildroot-linux-gnueabi-gcc 
-Wp,-MD,./.rte_ethdev.o.d.tmp -mfloat-abi=softfp -mfloat-abi=softfp 
-mfloat-abi=softfp -pthread  -march=armv7-a -mtune=cortex-a9 -mfpu=neon 
-DRTE_MACHINE_CPUFLAG_NEON  
-I/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include -include 
/var/lib/jenkins/jobs/dpdk-armv7-testing/workspace/build/include/rte_config.h 
-O3 -W -Wall -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations 
-Wold-style-definition -Wpointer-arith -Wcast-align -Wnested-externs 
-Wcast-qual -Wformat-nonliteral -Wformat-security -Wundef -Wwrite-strings 
-Werror

[dpdk-dev] [PATCH] vhost: fix coverity defect

2016-04-06 Thread Thomas Monjalon

> Fix following coverity defect:
> 
> 291 void
> 292 vhost_destroy_device(struct vhost_device_ctx ctx)
> 293 {
> 294 struct virtio_net *dev = get_device(ctx);
> 295
> >>> CID 124565:  Null pointer dereferences  (NULL_RETURNS)
> >>> Dereferencing a null pointer "dev".
> 
> Fixes: 45ca9c6f7bc6 ("vhost: get rid of linked list for devices")
> 
> Reported-by: John McNamara 
> Signed-off-by: Yuanhan Liu 

Applied, thanks

[dpdk-dev] [PATCH v2] virtio: use zeroed memory for simple TX header

2016-04-06 Thread Thomas Monjalon

> > For simple TX the virtio-net header must be zeroed, but it was using memory
> > that had been initialized with indirect descriptor tables. This resulted in
> > "unsupported gso type" errors from librte_vhost.
> > 
> > We can use the same memory for every descriptor to save cachelines in the
> > vswitch.
> > 
> > Fixes: 6dc5de3a (virtio: use indirect ring elements)
> > Signed-off-by: Rich Lane 
> 
> Acked-by: Yuanhan Liu 

Applied, thanks

[dpdk-dev] [PATCH v2] igb: fix i350 VF RX issue

2016-04-06 Thread Thomas Monjalon

> A problem is found on i350 VF. We found TX will happen once
> per 4 packets. If only 1~3 packets are received, they will
> not be forwarded. But the real problem is on RX side. The
> reason is the default RX write-back threshold is changed to
> 4, so every first 3 packets may be hung there.
> 
> This patch checks the RX wthresh when setting up the RX
> queue, and forces it to be 1, so every packet can be handled
> immediately.
> 
> v2:
> - Add missed signoff.
> 
> Fixes: 4a41c17dba18 (igb: set default thresholds based on MAC type)
> Signed-off-by: Wenzhuo Lu 
> Acked-by: Konstantin Ananyev 

Applied, thanks

[dpdk-dev] [PATCH v2] ixgbe: fix occasional timeouts when starting VF

2016-04-06 Thread Iremonger, Bernard

Hi Thomas,

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, April 5, 2016 4:48 PM
> To: Iremonger, Bernard 
> Cc: dev at dpdk.org; Lu, Wenzhuo 
> Subject: Re: [dpdk-dev] [PATCH v2] ixgbe: fix occasional timeouts when
> starting VF
> 
> 2016-04-05 15:55, Bernard Iremonger:
> > -   poll_ms = 10;
> > +   poll_ms = RTE_IXGBE_REGISTER_POLL_WAIT_15_MS;
> [...]
> >  #define RTE_IXGBE_REGISTER_POLL_WAIT_10_MS  10
> > +#define RTE_IXGBE_REGISTER_POLL_WAIT_15_MS  15
> >  #define RTE_IXGBE_WAIT_100_US   100
> 
> I don't understand why these constants are needed.
> There is no semantic, just an arbitrary number.

I am seeing intermittent failures in the following test at line 4969 in 
ixgbe_rxtx.c:

while (--poll_ms && !(txdctl & IXGBE_TXDCTL_ENABLE))

Increasing the value of poll_ms does not address the root cause of this failure.
This needs more investigation.

Self NAK.

[dpdk-dev] [PATCH v2] ethdev: refine new API to query supported ptypes

2016-04-06 Thread Jianfeng Tan

This change is to  make user code simpler. For PMDs which do not fill any
packet types, return 0 instead of -ENOTSUP as suggested by Bruce.

Usually, users only care if the required (by ptype_mask) ptypes can be
filled by the specified PMD. If the PMD implements dev_supported_ptypes_get
func is not important. And the introduce of another return value (-ENOTSUP)
would increase the complexity of user programs to check it.

Besides, there are ways to know if a PMD implements the func:
  a. see doc/guides/nics/overview.rst.
  b. use (~1) as parameter ptype_mask, then check if return 0.

Suggested-by: Bruce Richardson 
Signed-off-by: Jianfeng Tan 
Acked-by: Konstantin Ananyev 
---
 v2: exclude the commit of updating doc/guides/nics/overview.rst.

 lib/librte_ether/rte_ethdev.c | 3 +--
 lib/librte_ether/rte_ethdev.h | 9 ++---
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index fd49b26..5a16d5e 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1670,8 +1670,7 @@ rte_eth_dev_get_supported_ptypes(uint8_t port_id, 
uint32_t ptype_mask,

RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
dev = _eth_devices[port_id];
-   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_supported_ptypes_get,
-   -ENOTSUP);
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_supported_ptypes_get, 0);
all_ptypes = (*dev->dev_ops->dev_supported_ptypes_get)(dev);

if (!all_ptypes)
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 37ddd51..022733e 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2379,6 +2379,9 @@ void rte_eth_dev_info_get(uint8_t port_id, struct 
rte_eth_dev_info *dev_info);
  * @note
  *   Better to invoke this API after the device is already started or rx burst
  *   function is decided, to obtain correct supported ptypes.
+ * @note
+ *   if a given PMD does not report what ptypes it supports, then the supported
+ *   ptype count is reported as 0.
  * @param port_id
  *   The port identifier of the Ethernet device.
  * @param ptype_mask
@@ -2388,9 +2391,9 @@ void rte_eth_dev_info_get(uint8_t port_id, struct 
rte_eth_dev_info *dev_info);
  * @param num
  *  Size of the array pointed by param ptypes.
  * @return
- *   - (>0) Number of supported ptypes. If it exceeds param num, exceeding
- *  packet types will not be filled in the given array.
- *   - (0 or -ENOTSUP) if PMD does not fill the specified ptype.
+ *   - (>=0) Number of supported ptypes. If the number of types exceeds num,
+ *   only num entries will be filled into the ptypes array, but the 
full
+ *   count of supported ptypes will be returned.
  *   - (-ENODEV) if *port_id* invalid.
  */
 int rte_eth_dev_get_supported_ptypes(uint8_t port_id, uint32_t ptype_mask,
-- 
2.1.4

[dpdk-dev] [PATCH] cryptodev: Remove EXPERIMENTAL label

2016-04-06 Thread Fiona Trahe

The cryptodev API was introduced in the DPDK 2.2 release.
Since then it has
 - been reviewed and iterated for the DPDK 16.04 release
 - had extensive use by the l2fwd-crypto app,
the ipsec-secgw example app,
the test app.
We believe it is now stable and the EXPERIMENTAL label should be removed.

v2:
- remove extra empty line

Signed-off-by: Fiona Trahe 
---
 MAINTAINERS  | 2 +-
 config/common_base   | 1 -
 lib/librte_cryptodev/rte_cryptodev.h | 3 ---
 3 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 85d72ca..a7570cd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -231,7 +231,7 @@ M: Thomas Monjalon 
 F: lib/librte_ether/
 F: scripts/test-null.sh

-Crypto API - EXPERIMENTAL
+Crypto API
 M: Declan Doherty 
 F: lib/librte_cryptodev/
 F: app/test/test_cryptodev*
diff --git a/config/common_base b/config/common_base
index abd6a64..0124e86 100644
--- a/config/common_base
+++ b/config/common_base
@@ -327,7 +327,6 @@ CONFIG_RTE_PMD_PACKET_PREFETCH=y

 #
 # Compile generic crypto device library
-# EXPERIMENTAL: API may change without prior notice
 #
 CONFIG_RTE_LIBRTE_CRYPTODEV=y
 CONFIG_RTE_LIBRTE_CRYPTODEV_DEBUG=n
diff --git a/lib/librte_cryptodev/rte_cryptodev.h 
b/lib/librte_cryptodev/rte_cryptodev.h
index 568ffbb..d47f1e8 100644
--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -39,9 +39,6 @@
  *
  * Defines RTE Crypto Device APIs for the provisioning of cipher and
  * authentication operations.
- *
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  */

 #ifdef __cplusplus
-- 
2.1.0

[dpdk-dev] [PATCH v3 0/4] fix creation of duplicate lpm and hash

2016-04-06 Thread Olivier Matz



On 04/06/2016 12:32 PM, De Lara Guarch, Pablo wrote:
> 
> I wonder if you should include something in release notes.
> We are fixing the API, so I guess we don't need to follow the deprecation 
> process, but at least a note in the documentation?

Good idea, I'll send a v4 with the deprecation notice.

> Apart from that,
> 
> Series-acked-by: Pablo de Lara 

Thank you for the review.

Olivier

[dpdk-dev] [PATCH] doc: announce xstats api change for 16.07

2016-04-06 Thread Van Haaren, Harry

> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Subject: Re: [dpdk-dev] [PATCH] doc: announce xstats api change for 16.07
> > The issue we are going to fix is that currently PMDs copy strings when 
> > retrieving
> statistics, which causes unnecessary overhead. The implementation is not 
> decided yet, but
> using an int->value mapping seems logical.

> I am not sure performance is so much critical when retrieving statistics.

In the previous discussion David was concerned about performance impact
of string copies, are those concerns still present David?

> The extended stats can be infinitely extended. So a string identifier seems
> a lot more natural.

I'm not suggesting that the string identifier is removed totally.

> I do not agree to add a new numeric identifier in the API each time a driver
> wants to report a specific statistic for debugging purpose. 

And I agree - the ints are just an index to xstats arrays, no eth-dev wide 
enums here.
The proposal is to make the API more flexible, see example:
http://thread.gmane.org/gmane.comp.networking.dpdk.devel/31728/focus=32795

This more flexible API would allow other types of information about
statistics be retrieved too.

For now, the sent patch announces that the API/ABI may change, and we can
discuss details of API as development starts.

[dpdk-dev] [PATCH v3 0/4] fix creation of duplicate lpm and hash

2016-04-06 Thread De Lara Guarch, Pablo

> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Wednesday, April 06, 2016 12:15 PM
> To: De Lara Guarch, Pablo; dev at dpdk.org
> Cc: Richardson, Bruce
> Subject: Re: [dpdk-dev] [PATCH v3 0/4] fix creation of duplicate lpm and hash
> 
> 
> 
> On 04/06/2016 12:32 PM, De Lara Guarch, Pablo wrote:
> >
> > I wonder if you should include something in release notes.
> > We are fixing the API, so I guess we don't need to follow the deprecation
> process, but at least a note in the documentation?
> 
> Good idea, I'll send a v4 with the deprecation notice.

Well, not sure if this needs a deprecation notice.
I mean, it is an API fix: yes, this is changing what the function returns
in a particular situation (when the hash/lpm already exists) ,
but it was going against the API documentation, so a deprecation notice should 
not be necessary.
(just my opinion, I could be quite wrong here :P).

I was thinking more on adding a note in Resolved issues.

Thanks,
Pablo
> 
> > Apart from that,
> >
> > Series-acked-by: Pablo de Lara 
> 
> Thank you for the review.
> 
> Olivier

[dpdk-dev] [PATCH v4] i40e: fix TSO issue for tx function

2016-04-06 Thread Ananyev, Konstantin



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhe Tao
> Sent: Wednesday, April 06, 2016 9:17 AM
> To: dev at dpdk.org
> Cc: Tao, Zhe; Wu, Jingjing
> Subject: [dpdk-dev] [PATCH v4] i40e: fix TSO issue for tx function
> 
> Issue:
> 
> when using the following CLI in testpmd to enable ipv6 TSO feature
> (set --txqflags=0 in the testpmd command)
>   set verbose 1
>   csum set ip hw 0
>   csum set udp hw 0
>   csum set tcp hw 0
>   csum set sctp hw 0
>   csum set outer-ip hw 0
>   csum parse_tunnel on 0
>   tso set 800 0
>   set fwd csum
>   start
> 
> We will not get what we want, the ipv6 packets sent out from IXIA can be
> received by i40e, but cannot forward to another port.
> The root cause is when HW doing the TSO offload for packets, it does not only
> depends on the context descriptor to define the MSS and TSO payload size, it
> also need to know whether this packets is ipv4 or ipv6, we use
> i40e_txd_enable_checksum to generate the related fields for data descriptor.
> But PMD will not call i40e_txd_enable_checksum if only the TSO offload flag is
> set. The reason why ipv4 works fine for TSO in testpmd csum mode is csum 
> engine
> will set the ip csum flag when the packet is ipv4 and TSO is enabled but
> will not set the flag for ipv6 and this flag will cause the
> i40e_txd_enable_checksum to be invoked. For both the cases the TSO flag will 
> be
> set, so we need to use TSO flag to trigger the i40e_txd_enable_checksum.
> The right logic here is we enable csum offload for both ipv4 and ipv6 when TSO
> flag is set.
> 
> Fixes: e3f0151f (i40e: enable Tx checksum only for offloaded packets)
> 
> Signed-off-by: Zhe Tao 
> 
> ---
> v2: changed condition check for ipv6 TSO checksum offload
> use a more clear check method which include both ipv4 & ipv6 TSO
> v3: edited the commit log and title because this problem exists for both
> ipv4 and ipv6
> v4: moved the history under the commit log
> ---
>  drivers/net/i40e/i40e_rxtx.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index 59909f3..4d35d83 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -76,6 +76,7 @@
>  #define I40E_TX_CKSUM_OFFLOAD_MASK (  \
>   PKT_TX_IP_CKSUM |\
>   PKT_TX_L4_MASK | \
> + PKT_TX_TCP_SEG | \
>   PKT_TX_OUTER_IP_CKSUM)
> 
>  static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
> --

Acked-by: Konstantin Ananyev 

> 2.1.4

[dpdk-dev] [PATCH v3 0/4] fix creation of duplicate lpm and hash

2016-04-06 Thread Olivier Matz



On 04/06/2016 01:20 PM, De Lara Guarch, Pablo wrote:
>> On 04/06/2016 12:32 PM, De Lara Guarch, Pablo wrote:
>>>
>>> I wonder if you should include something in release notes.
>>> We are fixing the API, so I guess we don't need to follow the deprecation
>> process, but at least a note in the documentation?
>>
>> Good idea, I'll send a v4 with the deprecation notice.
> 
> Well, not sure if this needs a deprecation notice.
> I mean, it is an API fix: yes, this is changing what the function returns
> in a particular situation (when the hash/lpm already exists) ,
> but it was going against the API documentation, so a deprecation notice 
> should not be necessary.
> (just my opinion, I could be quite wrong here :P).
> 
> I was thinking more on adding a note in Resolved issues.

Yes, agree, it's a bug fix.

Another argument to not follow the API change process is that
the initial behavior was to return EEXIST, but it was changed
by this commit:

  http://dpdk.org/browse/dpdk/commit/?id=916e4f4f4e

By the way, the "Fixes:" line was not referencing this commit
in the v3, I'll also change that in v4.

Thanks,
Olivier

[dpdk-dev] DPDK namespace

2016-04-06 Thread Panu Matilainen

On 04/06/2016 08:26 AM, Yuanhan Liu wrote:
> On Tue, Apr 05, 2016 at 05:31:22PM +0300, Arnon Warshavsky wrote:
>> On Tue, Apr 5, 2016 at 5:13 PM, Trahe, Fiona  
>> wrote:
>>
>>>
>>>
 -Original Message-
 From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
 Sent: Tuesday, April 05, 2016 2:57 PM
 To: dev at dpdk.org
 Subject: [dpdk-dev] DPDK namespace

 DPDK is going to be more popular in Linux distributions.
 It means people will have some DPDK files in their /usr/include and some
>>> DPDK
 libraries on their system.

 Let's imagine someone trying to compile an application which needs
 rte_ethdev.h. He has to figure out that this "rte header" is provided by
>>> the DPDK.
 Hopefully it will be explained on StackOverflow that RTE stands for DPDK.
 Then someone else will try to run a binary without having installed the
>>> DPDK
 libraries. The linker will require libethdev.so (no prefix here).
 StackOverflow will probably have another good answer (among wrong ones):
 "Hey Sherlock Holmes, have you tried to install the DPDK library?"
 Followed by an insight: "You know, the DPDK naming is weird..."
 And we could continue the story with developers having some naming clash
 because of some identifiers not prefixed at all.

 The goal of this email is to get some feedback on how important it is to
>>> fix the
 DPDK namespace.

 If there is enough agreement that we should do something, I suggest to
 introduce the "dpdk_" prefix slowly and live with both "rte_" and "dpdk_"
 during some time.
 We could start using the new prefix for the new APIs (example: crypto)
>>> or when
 there is a significant API break (example: mempool).

 Opinions welcome!
>>> I don't have an opinion on how important it is to fix the namespace,
>>> though it does seem like a good idea.
>>> However if it's to be done, in my opinion it should be completed quickly
>>> or will just cause more confusion.
>>> So if rte_cryptoxxx becomes dpdk_cryptoxxx all other libraries should
>>> follow in next release or two, with
>>> the resulting ABI compatibility handling. Maybe with dual naming handled
>>> for several releases, but a
>>> clear end date when all are converted.
>>> Else there will be many years with a mix of rte_ and dpdk_
>>>
>>>
>>
>> Googling rte functions or error codes usually takes you to dpdk dev email
>> archive so I don't think it is that much difficult to figure out where rte
>> comes from.
>> Other than that , except for my own refactoring pains when replacing a dpdk
>> version, I do not see a major reason why not.
>> If Going for dpdk_ prefix, I agree with the quick death approach.
>
> +1: it's a bit weird to keep both, especially for a long while, that
> every time we turn a rte_ prefix to dpdk_ prefix, we break applications.
> Instead of breaking applications many times, I'd prefer to break once.
> Therefore, applications could do a simple global rte_ -> dpdk_ substitute:
> it doesn't sound that painful then.

I concur. If (and I think that should be a pretty big IF) the prefix is 
to be changed then its better done in one fast sweep than gradually.

Gratuitious (or nearly so) change is always extremely annoying, and the 
longer it takes the more painful it is. Application developers wont much 
care what the prefix is as long as its consistent, but if they're forced 
to track prefix changes across several releases with different libraries 
moving at different pace, they WILL be calling for bloody murder :)

As for rte_ being strange for DPDK - yes it is, but it takes like 5 
minutes to get over it. It would help to have it explained on dpdk.org 
FAQ: "Due to historical reasons, DPDK libraries are prefixed rte_ 
instead of dpdk_ because  and changing it is unnecessarily painful."

>
> And here are few more comments:
>
> - we should add rte_/dpdk_ prefix to all public structures as well.
>
>I'm thinking we are doing well here. I'm just aware that vhost lib
>does a bad job, which is something I proposed to fix in next release.

Yup, all public symbols should be prefixed. What the exact prefix is 
isn't that important really.

>
> - If we do the whole change once, I'd suggest to do it ASAP when this
>release is over.
>
>It should be a HUGE change that touches a lot of code, if we do it
>later, at a stage that a lot of patches for new features have been
>made or sent out, all of them need rebase. That'd be painful.

Nod, that's yet another aspect to consider.

So to summarize, I'm not strongly opposed to doing a one-time mass rte_ 
-> dpdk_ prefix change, but it needs to be one big sweep all at once, or 
not do it at all. Gradual change is a suicide.

Keeping rte_ is not the end of the world by any means, especially when 
applied consistently and explained someplace.

- Panu -

[dpdk-dev] [dpdk-dev, 07/10] lib: fix missing include dependencies

2016-04-06 Thread Jan Viktorin

On Wed, 6 Apr 2016 10:54:14 +0200
Adrien Mazarguil  wrote:

> Hi Jan,
> 
> Replying below as well.
> 
[...]
> > > --- a/lib/librte_eal/common/include/arch/arm/rte_byteorder.h
> > > +++ b/lib/librte_eal/common/include/arch/arm/rte_byteorder.h
> > > @@ -37,6 +37,9 @@
> > >  #  error Platform must be built with CONFIG_RTE_FORCE_INTRINSICS
> > >  #endif
> > >  
> > > +#include 
> > > +#include   
> > 
> > Why not to place it into the extern "C" { block? There is already:
> > 
> > #include "generic/rte_byteorder.h"  
> 
> Right, I did not do it because headers may eventually contain C++
> compatibility code someday, so I think we should avoid #includes inside
> extern "C" blocks. C++ compliant headers should provide their own blocks,
> also I'm not sure how well it mixes with system includes having their own
> compatibility layer.
> 
> I agree we need consistency, so what about a commit to move all #includes
> outside of such blocks instead?

Yes, I agree.

> 
> > > +#include   
> > 
> > I don't see any reason for this. The header does not use anything
> > special. Just "asm", but that should be a keyword...  
> 
> Unfortunately it's a nonstandard keyword which is defined as __asm__ in
> rte_common.h, itself an extension keyword compilers will swallow without
> complaining thanks to these "__".

OK.

> 
> > >  #ifdef __cplusplus
> > >  extern "C" {
> > >  #endif
> > > diff --git a/lib/librte_eal/common/include/arch/arm/rte_prefetch_64.h 
> > > b/lib/librte_eal/common/include/arch/arm/rte_prefetch_64.h
> > > index 3ed46a4..600c6f0 100644
> > > --- a/lib/librte_eal/common/include/arch/arm/rte_prefetch_64.h
> > > +++ b/lib/librte_eal/common/include/arch/arm/rte_prefetch_64.h
> > > @@ -33,6 +33,8 @@
> > >  #ifndef _RTE_PREFETCH_ARM_64_H_
> > >  #define _RTE_PREFETCH_ARM_64_H_
> > >  
> > > +#include   
> > 
> > Same here.  
> 
> Same reason here.

OK.

Regards
Jan

[dpdk-dev] [PATCH] doc: announce xstats api change for 16.07

2016-04-06 Thread Thomas Monjalon

2016-04-06 11:16, Van Haaren, Harry:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > The issue we are going to fix is that currently PMDs copy strings when 
> > > retrieving
> > statistics, which causes unnecessary overhead. The implementation is not 
> > decided yet, but
> > using an int->value mapping seems logical.
> 
> > I am not sure performance is so much critical when retrieving statistics.
> 
> In the previous discussion David was concerned about performance impact
> of string copies, are those concerns still present David?
> 
> > The extended stats can be infinitely extended. So a string identifier seems
> > a lot more natural.
> 
> I'm not suggesting that the string identifier is removed totally.
> 
> > I do not agree to add a new numeric identifier in the API each time a driver
> > wants to report a specific statistic for debugging purpose. 
> 
> And I agree - the ints are just an index to xstats arrays, no eth-dev wide 
> enums here.
> The proposal is to make the API more flexible, see example:
> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/31728/focus=32795
> 
> This more flexible API would allow other types of information about
> statistics be retrieved too.

OK I think I start to understand.

> For now, the sent patch announces that the API/ABI may change, and we can
> discuss details of API as development starts.

This should not be the normal process.
It is important to understand what should be the changes to decide of
announcing or not a deprecation.
In the case of the mempool reworks, the patch have been sent and discussed
on the mailing list.
Given the previous explanations (and knowing you did good job on stats),
I give my
Acked-by: Thomas Monjalon

[dpdk-dev] [PATCH] ena: icc fix compilation warning #3656

2016-04-06 Thread Ferruh Yigit

 With (ICC) 16.0.2 20160204, getting following warnings:

 .../drivers/net/ena/base/ena_com.c(492): error #3656: variable
 "flags" may be used before its value is set
 ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);

.../drivers/net/ena/base/ena_com.c(1971): error #3656: variable
 "mem_handle" may be used before its value is set
 ENA_MEM_ALLOC_COHERENT(ena_dev->dmadev, len,

For both warnings the variable value is ignored, so there is no defect.
To comfort compiler warning, a initial value provided to variables.

Signed-off-by: Ferruh Yigit 
---
 drivers/net/ena/base/ena_com.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ena/base/ena_com.c b/drivers/net/ena/base/ena_com.c
index f886760..a21a951 100644
--- a/drivers/net/ena/base/ena_com.c
+++ b/drivers/net/ena/base/ena_com.c
@@ -478,7 +478,7 @@ ena_com_wait_and_process_admin_cq_polling(
struct ena_comp_ctx *comp_ctx,
struct ena_com_admin_queue *admin_queue)
 {
-   unsigned long flags;
+   unsigned long flags = 0;
u64 start_time;
int ret;

@@ -1964,7 +1964,7 @@ int ena_com_get_dev_extended_stats(struct ena_com_dev 
*ena_dev, char *buff,
int ret = 0;
struct ena_admin_aq_get_stats_cmd get_cmd;
struct ena_admin_acq_get_stats_resp get_resp;
-   ena_mem_handle_t mem_handle;
+   ena_mem_handle_t mem_handle = 0;
void *virt_addr;
dma_addr_t phys_addr;

-- 
2.5.5

[dpdk-dev] DPDK namespace

2016-04-06 Thread Ananyev, Konstantin



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Panu Matilainen
> Sent: Wednesday, April 06, 2016 1:08 PM
> To: Yuanhan Liu; Arnon Warshavsky
> Cc: Trahe, Fiona; Thomas Monjalon; dev at dpdk.org
> Subject: Re: [dpdk-dev] DPDK namespace
> 
> On 04/06/2016 08:26 AM, Yuanhan Liu wrote:
> > On Tue, Apr 05, 2016 at 05:31:22PM +0300, Arnon Warshavsky wrote:
> >> On Tue, Apr 5, 2016 at 5:13 PM, Trahe, Fiona  
> >> wrote:
> >>
> >>>
> >>>
>  -Original Message-
>  From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
>  Sent: Tuesday, April 05, 2016 2:57 PM
>  To: dev at dpdk.org
>  Subject: [dpdk-dev] DPDK namespace
> 
>  DPDK is going to be more popular in Linux distributions.
>  It means people will have some DPDK files in their /usr/include and some
> >>> DPDK
>  libraries on their system.
> 
>  Let's imagine someone trying to compile an application which needs
>  rte_ethdev.h. He has to figure out that this "rte header" is provided by
> >>> the DPDK.
>  Hopefully it will be explained on StackOverflow that RTE stands for DPDK.
>  Then someone else will try to run a binary without having installed the
> >>> DPDK
>  libraries. The linker will require libethdev.so (no prefix here).
>  StackOverflow will probably have another good answer (among wrong ones):
>  "Hey Sherlock Holmes, have you tried to install the DPDK library?"
>  Followed by an insight: "You know, the DPDK naming is weird..."
>  And we could continue the story with developers having some naming clash
>  because of some identifiers not prefixed at all.
> 
>  The goal of this email is to get some feedback on how important it is to
> >>> fix the
>  DPDK namespace.
> 
>  If there is enough agreement that we should do something, I suggest to
>  introduce the "dpdk_" prefix slowly and live with both "rte_" and "dpdk_"
>  during some time.
>  We could start using the new prefix for the new APIs (example: crypto)
> >>> or when
>  there is a significant API break (example: mempool).
> 
>  Opinions welcome!
> >>> I don't have an opinion on how important it is to fix the namespace,
> >>> though it does seem like a good idea.
> >>> However if it's to be done, in my opinion it should be completed quickly
> >>> or will just cause more confusion.
> >>> So if rte_cryptoxxx becomes dpdk_cryptoxxx all other libraries should
> >>> follow in next release or two, with
> >>> the resulting ABI compatibility handling. Maybe with dual naming handled
> >>> for several releases, but a
> >>> clear end date when all are converted.
> >>> Else there will be many years with a mix of rte_ and dpdk_
> >>>
> >>>
> >>
> >> Googling rte functions or error codes usually takes you to dpdk dev email
> >> archive so I don't think it is that much difficult to figure out where rte
> >> comes from.
> >> Other than that , except for my own refactoring pains when replacing a dpdk
> >> version, I do not see a major reason why not.
> >> If Going for dpdk_ prefix, I agree with the quick death approach.
> >
> > +1: it's a bit weird to keep both, especially for a long while, that
> > every time we turn a rte_ prefix to dpdk_ prefix, we break applications.
> > Instead of breaking applications many times, I'd prefer to break once.
> > Therefore, applications could do a simple global rte_ -> dpdk_ substitute:
> > it doesn't sound that painful then.
> 
> I concur. If (and I think that should be a pretty big IF) the prefix is
> to be changed then its better done in one fast sweep than gradually.
> 
> Gratuitious (or nearly so) change is always extremely annoying, and the
> longer it takes the more painful it is. Application developers wont much
> care what the prefix is as long as its consistent, but if they're forced
> to track prefix changes across several releases with different libraries
> moving at different pace, they WILL be calling for bloody murder :)
> 
> As for rte_ being strange for DPDK - yes it is, but it takes like 5
> minutes to get over it. It would help to have it explained on dpdk.org
> FAQ: "Due to historical reasons, DPDK libraries are prefixed rte_
> instead of dpdk_ because  name> and changing it is unnecessarily painful."
> 
> >
> > And here are few more comments:
> >
> > - we should add rte_/dpdk_ prefix to all public structures as well.
> >
> >I'm thinking we are doing well here. I'm just aware that vhost lib
> >does a bad job, which is something I proposed to fix in next release.
> 
> Yup, all public symbols should be prefixed. What the exact prefix is
> isn't that important really.
> 
> >
> > - If we do the whole change once, I'd suggest to do it ASAP when this
> >release is over.
> >
> >It should be a HUGE change that touches a lot of code, if we do it
> >later, at a stage that a lot of patches for new features have been
> >made or sent out, all of them need

[dpdk-dev] DPDK namespace

2016-04-06 Thread Jay Rolette

On Wed, Apr 6, 2016 at 12:26 AM, Yuanhan Liu 
wrote:

> On Tue, Apr 05, 2016 at 05:31:22PM +0300, Arnon Warshavsky wrote:
> > Googling rte functions or error codes usually takes you to dpdk dev email
> > archive so I don't think it is that much difficult to figure out where
> rte
> > comes from.
> > Other than that , except for my own refactoring pains when replacing a
> dpdk
> > version, I do not see a major reason why not.
> > If Going for dpdk_ prefix, I agree with the quick death approach.
>
> +1: it's a bit weird to keep both, especially for a long while, that
> every time we turn a rte_ prefix to dpdk_ prefix, we break applications.
> Instead of breaking applications many times, I'd prefer to break once.
> Therefore, applications could do a simple global rte_ -> dpdk_ substitute:
> it doesn't sound that painful then.
>
> And here are few more comments:
>
> - we should add rte_/dpdk_ prefix to all public structures as well.
>
>   I'm thinking we are doing well here. I'm just aware that vhost lib
>   does a bad job, which is something I proposed to fix in next release.
>
> - If we do the whole change once, I'd suggest to do it ASAP when this
>   release is over.
>
>   It should be a HUGE change that touches a lot of code, if we do it
>   later, at a stage that a lot of patches for new features have been
>   made or sent out, all of them need rebase. That'd be painful.
>

This last point that yliu brings up is the one that worries me. DPDK
already has a serious problem with the patch backlog. It can take months
for even small bug fixes to get merged.

>From the app side, it's not the end of the world if there is a one time,
single shot change to the prefix. However, I'm not sure it's worth it
because of the impact on patches that have been submitted. As others have
mentioned, google can sort the rte_* references just fine.

Jay

[dpdk-dev] DPDK namespace

2016-04-06 Thread Mcnamara, John



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jay Rolette
> Sent: Wednesday, April 6, 2016 1:41 PM
> To: Yuanhan Liu 
> Cc: Arnon Warshavsky ; Trahe, Fiona
> ; Thomas Monjalon ;
> dev at dpdk.org
> Subject: Re: [dpdk-dev] DPDK namespace
> 
> On Wed, Apr 6, 2016 at 12:26 AM, Yuanhan Liu 
> wrote:
> 
> This last point that yliu brings up is the one that worries me. DPDK
> already has a serious problem with the patch backlog. It can take months
> for even small bug fixes to get merged.
> 
> From the app side, it's not the end of the world if there is a one time,
> single shot change to the prefix. However, I'm not sure it's worth it
> because of the impact on patches that have been submitted. As others have
> mentioned, google can sort the rte_* references just fine.

Hi,

Agreed on patches. And if we wanted to do a Long Term Support branch/release 
based on DPDK 2.2.0 or 16.04 then backporting fixes would be a pain.

John.
--

[dpdk-dev] [PATCH] doc: update multi process memory image

2016-04-06 Thread Harry van Haaren

This patch updates the titles in the multiprocess memory image
to read "Primary Process" and "Secondary Process" instead of
"DPDK Server Process" and "Customer Client Process".

The rest of the image has been converted from PNG to SVG.

Signed-off-by: Harry van Haaren 

---

PS: Have to send this patch with git send-email --no-validate, as the PNG
content being removed has too many characters on one line. I will test
applying it from patchwork and self-NACK if it doesn't apply.


 doc/guides/prog_guide/img/multi_process_memory.svg | 467 -
 1 file changed, 445 insertions(+), 22 deletions(-)

diff --git a/doc/guides/prog_guide/img/multi_process_memory.svg 
b/doc/guides/prog_guide/img/multi_process_memory.svg
index fee95a0..03f17d7 100644
--- a/doc/guides/prog_guide/img/multi_process_memory.svg
+++ b/doc/guides/prog_guide/img/multi_process_memory.svg
@@ -41,17 +41,115 @@
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#;
xmlns:svg="http://www.w3.org/2000/svg;
xmlns="http://www.w3.org/2000/svg;
-   xmlns:xlink="http://www.w3.org/1999/xlink;
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd;
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape;
-   width="975.5733"
-   height="710.76343"
+   width="964.6286"
+   height="712.48572"
id="svg2"
version="1.1"
-   inkscape:version="0.48.4 r9939"
+   inkscape:version="0.91 r13725"
sodipodi:docname="multi_process_memory.svg">
   
+ id="defs4">
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+  
   
   
   
-
+ id="layer2"
+ inkscape:label="Boxes"
+ style="display:inline"
+ transform="translate(-6.971426,-3.4491554)">
+
+
+
+
+
+
+
+
+
+  
+  
+  
+Primary 
Process
+Secondary Process
+struct rte_config
+struct hugepage[]
+
+  
+  IPC Queue
+
+
+  
+  IPC Queue
+
+HugepageDPDKMemory
+
+  
+  Mbuf Pool
+
+Local Pointers
+Local Pointers
+Local Data
+Local Data
+  
+  
+
+
+
+
+
+
+
+
   
 
-- 
2.5.0

[dpdk-dev] [PATCH v4 0/4] fix creation of duplicate lpm and hash

2016-04-06 Thread Olivier Matz

Seen while trying to fix the func_reentrancy autotest. The
series addresses several issues:

1/ Hash and lpm return a pointer to an existing object if the user requests the
   creation with an already existing name. This look dangerous: when an object
   is returned, the user does not know if it should be freed or not.

2/ There is a race condition in cuckoo_hash as the lock is not held in
   rte_hash_create(). We could find some cases where NULL is returned when the
   object already exists (ex: when rte_ring_create() fails).

3/ There is a race condition func_reentrancy that can fail even if the tested
   API behaves correctly.


RFC -> v1:

- split the patch in 4 patches
- on error, set rte_errno to EEXIST when relevant
- fix locking in cuckoo_hash creation

v1 -> v2:

- fix compilation issue in cuckoo hash
- update the hash test to conform to the new behavior
- rework locking modification in cuckoo_hash
- passed autotests: hash, lpm, lpm6, func_reentrancy

v2 -> v3:

- rebase against head
- add "Fixes:" in commit messages
- properly set lpm or hash pointers to NULL on error before returning

v3 -> v4:

- add entries in the release note to explain the API changes
  in lpm and hash
- correct the "Fixes:" line in the first 2 patches

Olivier Matz (4):
  lpm: allocation of an existing object should fail
  hash: allocation of an existing object should fail
  hash: keep the list locked at creation
  autotest: fix func reentrancy

 app/test/test_func_reentrancy.c| 31 ++-
 app/test/test_hash.c   | 65 +++---
 app/test/test_lpm6.c   |  2 +-
 doc/guides/rel_notes/release_16_04.rst | 17 
 lib/librte_hash/rte_cuckoo_hash.c  | 72 --
 lib/librte_hash/rte_fbk_hash.c |  5 ++-
 lib/librte_lpm/rte_lpm.c   | 10 -
 lib/librte_lpm/rte_lpm6.c  |  5 ++-
 8 files changed, 120 insertions(+), 87 deletions(-)

-- 
2.1.4

[dpdk-dev] [PATCH v4 1/4] lpm: allocation of an existing object should fail

2016-04-06 Thread Olivier Matz

Change rte_lpm*_create() functions to return NULL and set rte_errno to
EEXIST when the object name already exists. This is the behavior
described in the API documentation in the header file.

These functions were returning a pointer to the existing object in that
case, but it is a problem as the caller did not know if the object had
to be freed or not.

Doing this change also makes the lpm API more consistent with the other
APIs (mempool, rings, ...).

Fixes: 916e4f4f4e ("memory: fix for multi process support")
Signed-off-by: Olivier Matz 
Acked-by: Pablo de Lara 
---
 app/test/test_lpm6.c   |  2 +-
 doc/guides/rel_notes/release_16_04.rst |  9 +
 lib/librte_lpm/rte_lpm.c   | 10 --
 lib/librte_lpm/rte_lpm6.c  |  5 -
 4 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/app/test/test_lpm6.c b/app/test/test_lpm6.c
index 1f88d7a..b464342 100644
--- a/app/test/test_lpm6.c
+++ b/app/test/test_lpm6.c
@@ -222,7 +222,7 @@ test1(void)

/* rte_lpm6_create: lpm name == LPM2 */
lpm3 = rte_lpm6_create("LPM1", SOCKET_ID_ANY, );
-   TEST_LPM_ASSERT(lpm3 == lpm1);
+   TEST_LPM_ASSERT(lpm3 == NULL);

rte_lpm6_free(lpm1);
rte_lpm6_free(lpm2);
diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index d6e358f..9a6fd4a 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -432,6 +432,15 @@ Libraries

   Fixed core dump issue on txq and swq when dropless is set to yes.

+* **lpm: Fixed return value when allocating an existing object.**
+
+  Changed the ``rte_lpm*_create()`` functions to return ``NULL`` and set
+  ``rte_errno`` to ``EEXIST` when the object name already exists. This is
+  the behavior described in the API documentation in the header file.
+  The previous behavior was to return a pointer to the existing object in
+  that case, preventing the caller to know if the object had to be freed
+  or not.
+

 Examples
 
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index bd3563f..8bdf606 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -209,8 +209,11 @@ rte_lpm_create_v20(const char *name, int socket_id, int 
max_rules,
if (strncmp(name, lpm->name, RTE_LPM_NAMESIZE) == 0)
break;
}
-   if (te != NULL)
+   lpm = NULL;
+   if (te != NULL) {
+   rte_errno = EEXIST;
goto exit;
+   }

/* allocate tailq entry */
te = rte_zmalloc("LPM_TAILQ_ENTRY", sizeof(*te), 0);
@@ -280,8 +283,11 @@ rte_lpm_create_v1604(const char *name, int socket_id,
if (strncmp(name, lpm->name, RTE_LPM_NAMESIZE) == 0)
break;
}
-   if (te != NULL)
+   lpm = NULL;
+   if (te != NULL) {
+   rte_errno = EEXIST;
goto exit;
+   }

/* allocate tailq entry */
te = rte_zmalloc("LPM_TAILQ_ENTRY", sizeof(*te), 0);
diff --git a/lib/librte_lpm/rte_lpm6.c b/lib/librte_lpm/rte_lpm6.c
index 4c44cd7..ba4353c 100644
--- a/lib/librte_lpm/rte_lpm6.c
+++ b/lib/librte_lpm/rte_lpm6.c
@@ -182,8 +182,11 @@ rte_lpm6_create(const char *name, int socket_id,
if (strncmp(name, lpm->name, RTE_LPM6_NAMESIZE) == 0)
break;
}
-   if (te != NULL)
+   lpm = NULL;
+   if (te != NULL) {
+   rte_errno = EEXIST;
goto exit;
+   }

/* allocate tailq entry */
te = rte_zmalloc("LPM6_TAILQ_ENTRY", sizeof(*te), 0);
-- 
2.1.4

[dpdk-dev] [PATCH v4 3/4] hash: keep the list locked at creation

2016-04-06 Thread Olivier Matz

To avoid a race condition while creating a new hash object, the
list has to be locked before the lookup, and released only once the
new object is added in the list.

As the lock is held by the rte_ring_create(), move its creation at the
beginning of the function and only take the lock after the ring is
created to avoid a deadlock.

Fixes: 48a3991196 ("hash: replace with cuckoo hash implementation")
Signed-off-by: Olivier Matz 
Acked-by: Pablo de Lara 
---
 lib/librte_hash/rte_cuckoo_hash.c | 70 ++-
 1 file changed, 40 insertions(+), 30 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index b00cc12..7b7d1f8 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -295,19 +295,48 @@ rte_hash_create(const struct rte_hash_parameters *params)
if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT)
hw_trans_mem_support = 1;

+   /* Store all keys and leave the first entry as a dummy entry for 
lookup_bulk */
+   if (hw_trans_mem_support)
+   /*
+* Increase number of slots by total number of indices
+* that can be stored in the lcore caches
+* except for the first cache
+*/
+   num_key_slots = params->entries + (RTE_MAX_LCORE - 1) *
+   LCORE_CACHE_SIZE + 1;
+   else
+   num_key_slots = params->entries + 1;
+
+   snprintf(ring_name, sizeof(ring_name), "HT_%s", params->name);
+   r = rte_ring_create(ring_name, rte_align32pow2(num_key_slots),
+   params->socket_id, 0);
+   if (r == NULL) {
+   RTE_LOG(ERR, HASH, "memory allocation failed\n");
+   goto err;
+   }
+
snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);

-   /* Guarantee there's no existing */
-   h = rte_hash_find_existing(params->name);
-   if (h != NULL) {
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* guarantee there's no existing: this is normally already checked
+* by ring creation above */
+   TAILQ_FOREACH(te, hash_list, next) {
+   h = (struct rte_hash *) te->data;
+   if (strncmp(params->name, h->name, RTE_HASH_NAMESIZE) == 0)
+   break;
+   }
+   h = NULL;
+   if (te != NULL) {
rte_errno = EEXIST;
-   return NULL;
+   te = NULL;
+   goto err_unlock;
}

te = rte_zmalloc("HASH_TAILQ_ENTRY", sizeof(*te), 0);
if (te == NULL) {
RTE_LOG(ERR, HASH, "tailq entry allocation failed\n");
-   goto err;
+   goto err_unlock;
}

h = (struct rte_hash *)rte_zmalloc_socket(hash_name, sizeof(struct 
rte_hash),
@@ -315,7 +344,7 @@ rte_hash_create(const struct rte_hash_parameters *params)

if (h == NULL) {
RTE_LOG(ERR, HASH, "memory allocation failed\n");
-   goto err;
+   goto err_unlock;
}

const uint32_t num_buckets = rte_align32pow2(params->entries)
@@ -327,23 +356,10 @@ rte_hash_create(const struct rte_hash_parameters *params)

if (buckets == NULL) {
RTE_LOG(ERR, HASH, "memory allocation failed\n");
-   goto err;
+   goto err_unlock;
}

const uint32_t key_entry_size = sizeof(struct rte_hash_key) + 
params->key_len;
-
-   /* Store all keys and leave the first entry as a dummy entry for 
lookup_bulk */
-   if (hw_trans_mem_support)
-   /*
-* Increase number of slots by total number of indices
-* that can be stored in the lcore caches
-* except for the first cache
-*/
-   num_key_slots = params->entries + (RTE_MAX_LCORE - 1) *
-   LCORE_CACHE_SIZE + 1;
-   else
-   num_key_slots = params->entries + 1;
-
const uint64_t key_tbl_size = (uint64_t) key_entry_size * num_key_slots;

k = rte_zmalloc_socket(NULL, key_tbl_size,
@@ -351,7 +367,7 @@ rte_hash_create(const struct rte_hash_parameters *params)

if (k == NULL) {
RTE_LOG(ERR, HASH, "memory allocation failed\n");
-   goto err;
+   goto err_unlock;
}

 /*
@@ -393,14 +409,6 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->cmp_jump_table_idx = KEY_OTHER_BYTES;
 #endif

-   snprintf(ring_name, sizeof(ring_name), "HT_%s", params->name);
-   r = rte_ring_create(ring_name, rte_align32pow2(num_key_slots),
-   params->socket_id, 0);
-   if (r == NULL) {
-   RTE_LOG(ERR, HASH, "memory allocation failed\n");
-   goto err;
-   }
-
if (hw_trans_mem_support) {
h->local_free_slots =

[dpdk-dev] [PATCH v4 2/4] hash: allocation of an existing object should fail

2016-04-06 Thread Olivier Matz

Change rte_hash*_create() functions to return NULL and set rte_errno to
EEXIST when the object name already exists. This is the behavior
described in the API documentation in the header file.

These functions were returning a pointer to the existing object in that
case, but it is a problem as the caller did not know if the object had
to be freed or not.

Doing this change also makes the hash API more consistent with the other
APIs (mempool, rings, ...).

Fixes: 916e4f4f4e ("memory: fix for multi process support")
Signed-off-by: Olivier Matz 
Acked-by: Pablo de Lara 
---
 app/test/test_hash.c   | 65 --
 doc/guides/rel_notes/release_16_04.rst |  8 +
 lib/librte_hash/rte_cuckoo_hash.c  |  6 ++--
 lib/librte_hash/rte_fbk_hash.c |  5 ++-
 4 files changed, 39 insertions(+), 45 deletions(-)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 2f3d884..adbdb4a 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -805,15 +805,11 @@ fbk_hash_unit_test(void)
RETURN_IF_ERROR_FBK(handle == NULL, "fbk hash creation should have 
succeeded");

tmp = rte_fbk_hash_create(_params_same_name_2);
-   RETURN_IF_ERROR_FBK(tmp == NULL, "fbk hash creation should have 
succeeded");
-   if (tmp != handle) {
-   printf("ERROR line %d: hashes should have been the 
same\n", __LINE__);
-   rte_fbk_hash_free(handle);
-   rte_fbk_hash_free(tmp);
-   return -1;
-   }
+   if (tmp != NULL)
+   rte_fbk_hash_free(tmp);
+   RETURN_IF_ERROR_FBK(tmp != NULL, "fbk hash creation should have 
failed");

-   /* we are not freeing tmp or handle here because we need a hash list
+   /* we are not freeing  handle here because we need a hash list
 * to be not empty for the next test */

/* create a hash in non-empty list - good for coverage */
@@ -988,7 +984,7 @@ static int test_fbk_hash_find_existing(void)
  */
 static int test_hash_creation_with_bad_parameters(void)
 {
-   struct rte_hash *handle;
+   struct rte_hash *handle, *tmp;
struct rte_hash_parameters params;

handle = rte_hash_create(NULL);
@@ -1038,7 +1034,23 @@ static int test_hash_creation_with_bad_parameters(void)
return -1;
}

+   /* test with same name should fail */
+   memcpy(, _params, sizeof(params));
+   params.name = "same_name";
+   handle = rte_hash_create();
+   if (handle == NULL) {
+   printf("Cannot create first hash table with 'same_name'\n");
+   return -1;
+   }
+   tmp = rte_hash_create();
+   if (tmp != NULL) {
+   printf("Creation of hash table with same name should fail\n");
+   rte_hash_free(handle);
+   rte_hash_free(tmp);
+   return -1;
+   }
rte_hash_free(handle);
+
printf("# Test successful. No more errors expected\n");

return 0;
@@ -1051,12 +1063,12 @@ static int test_hash_creation_with_bad_parameters(void)
 static int
 test_hash_creation_with_good_parameters(void)
 {
-   struct rte_hash *handle, *tmp;
+   struct rte_hash *handle;
struct rte_hash_parameters params;

/* create with null hash function - should choose DEFAULT_HASH_FUNC */
memcpy(, _params, sizeof(params));
-   params.name = "same_name";
+   params.name = "name";
params.hash_func = NULL;
handle = rte_hash_create();
if (handle == NULL) {
@@ -1064,37 +1076,6 @@ test_hash_creation_with_good_parameters(void)
return -1;
}

-   /* this test is trying to create a hash with the same name as previous 
one.
-* this should return a pointer to the hash we previously created.
-* the previous hash isn't freed exactly for the purpose of it being in
-* the hash list.
-*/
-   memcpy(, _params, sizeof(params));
-   params.name = "same_name";
-   tmp = rte_hash_create();
-
-   /* check if the returned handle is actually equal to the previous hash 
*/
-   if (handle != tmp) {
-   rte_hash_free(handle);
-   rte_hash_free(tmp);
-   printf("Creating hash with existing name was successful\n");
-   return -1;
-   }
-
-   /* try creating hash when there already are hashes in the list.
-* the previous hash is not freed to have a non-empty hash list.
-* the other hash that's in the list is still pointed to by "handle" 
var.
-*/
-   memcpy(, _params, sizeof(params));
-   params.name = "different_name";
-   tmp = rte_hash_create();
-   if (tmp == NULL) {
-   rte_hash_free(handle);
-   printf("Creating hash with valid parameters failed\n");
-   return -1;
-   }
-
-   rte_hash_free(tmp);
rte_hash_free(handle);

return 0;
diff

[dpdk-dev] [PATCH v4 4/4] autotest: fix func reentrancy

2016-04-06 Thread Olivier Matz

The previous code in func_reentrancy autotest was doing in parallel
something close to:

  name = "common_name";
  do several times {
  obj = allocate_an_object(name)   // obj = ring, mempool, hash, lpm, ...
  if (obj == NULL && lookup(name) == NULL)
  return TEST_FAIL;
  }

This code is not safe. For instance:

   mempool_create() is called on core 0, it creates a ring. At the same
   time on core 1, mempool_create() is called too and the creation of the
   ring fails (EEXIST). But the mempool lookup can fail on core 1 if
   the mempool is not added in the list by core 0.

This commit fixes the func_reentrancy autotest that now works with all
tested class of objects.

Fixes: 104a92bd02 ("app: add reentrancy tests")
Signed-off-by: Olivier Matz 
Acked-by: Pablo de Lara 
---
 app/test/test_func_reentrancy.c | 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/app/test/test_func_reentrancy.c b/app/test/test_func_reentrancy.c
index 5d09296..300a3bc 100644
--- a/app/test/test_func_reentrancy.c
+++ b/app/test/test_func_reentrancy.c
@@ -83,6 +83,7 @@ typedef void (*case_clean_t)(unsigned lcore_id);

 #define MAX_LCORES RTE_MAX_MEMZONE / (MAX_ITER_TIMES * 4U)

+static rte_atomic32_t obj_count = RTE_ATOMIC32_INIT(0);
 static rte_atomic32_t synchro = RTE_ATOMIC32_INIT(0);

 #define WAIT_SYNCHRO_FOR_SLAVES()   do{ \
@@ -100,6 +101,7 @@ test_eal_init_once(__attribute__((unused)) void *arg)

WAIT_SYNCHRO_FOR_SLAVES();

+   rte_atomic32_set(_count, 1); /* silent the check in the caller */
if (rte_eal_init(0, NULL) != -1)
return -1;

@@ -122,8 +124,8 @@ ring_create_lookup(__attribute__((unused)) void *arg)
/* create the same ring simultaneously on all threads */
for (i = 0; i < MAX_ITER_TIMES; i++) {
rp = rte_ring_create("fr_test_once", 4096, SOCKET_ID_ANY, 0);
-   if ((NULL == rp) && (rte_ring_lookup("fr_test_once") == NULL))
-   return -1;
+   if (rp != NULL)
+   rte_atomic32_inc(_count);
}

/* create/lookup new ring several times */
@@ -172,8 +174,8 @@ mempool_create_lookup(__attribute__((unused)) void *arg)
NULL, NULL,
my_obj_init, NULL,
SOCKET_ID_ANY, 0);
-   if ((NULL == mp) && (rte_mempool_lookup("fr_test_once") == 
NULL))
-   return -1;
+   if (mp != NULL)
+   rte_atomic32_inc(_count);
}

/* create/lookup new ring several times */
@@ -238,8 +240,8 @@ hash_create_free(__attribute__((unused)) void *arg)
hash_params.name = "fr_test_once";
for (i = 0; i < MAX_ITER_TIMES; i++) {
handle = rte_hash_create(_params);
-   if ((NULL == handle) && (rte_hash_find_existing("fr_test_once") 
== NULL))
-   return -1;
+   if (handle != NULL)
+   rte_atomic32_inc(_count);
}

/* create mutiple times simultaneously */
@@ -306,8 +308,8 @@ fbk_create_free(__attribute__((unused)) void *arg)
fbk_params.name = "fr_test_once";
for (i = 0; i < MAX_ITER_TIMES; i++) {
handle = rte_fbk_hash_create(_params);
-   if ((NULL == handle) && 
(rte_fbk_hash_find_existing("fr_test_once") == NULL))
-   return -1;
+   if (handle != NULL)
+   rte_atomic32_inc(_count);
}

/* create mutiple fbk tables simultaneously */
@@ -372,8 +374,8 @@ lpm_create_free(__attribute__((unused)) void *arg)
/* create the same lpm simultaneously on all threads */
for (i = 0; i < MAX_ITER_TIMES; i++) {
lpm = rte_lpm_create("fr_test_once",  SOCKET_ID_ANY, );
-   if ((NULL == lpm) && (rte_lpm_find_existing("fr_test_once") == 
NULL))
-   return -1;
+   if (lpm != NULL)
+   rte_atomic32_inc(_count);
}

/* create mutiple fbk tables simultaneously */
@@ -432,10 +434,12 @@ launch_test(struct test_case *pt_case)
unsigned lcore_id;
unsigned cores_save = rte_lcore_count();
unsigned cores = RTE_MIN(cores_save, MAX_LCORES);
+   unsigned count;

if (pt_case->func == NULL)
return -1;

+   rte_atomic32_set(_count, 0);
rte_atomic32_set(, 0);

RTE_LCORE_FOREACH_SLAVE(lcore_id) {
@@ -462,6 +466,13 @@ launch_test(struct test_case *pt_case)
pt_case->clean(lcore_id);
}

+   count = rte_atomic32_read(_count);
+   if (count != 1) {
+   printf("%s: common object allocated %d times (should be 1)\n",
+   pt_case->name, count);
+   ret = -1;
+   }
+
return ret;
 }

-- 
2.1.4

[dpdk-dev] [PATCH v2] i40evf: fix link info update

2016-04-06 Thread Thomas Monjalon

2016-04-05 10:01, Jingjing Wu:
> The issue is the VF's link speed kept as 10G and status always was up.
> It did not change even the physical link's status changed.
> This patch fixes this issue to make VF's link info consistent with
> physical link.
> 
> Fixes: 4861cde46116 (i40e: new poll mode driver)
> Signed-off-by: Jingjing Wu 

Applied, thanks

[dpdk-dev] [PATCH v4] i40e: fix TSO issue for tx function

2016-04-06 Thread Thomas Monjalon

> > We will not get what we want, the ipv6 packets sent out from IXIA can be
> > received by i40e, but cannot forward to another port.
> > The root cause is when HW doing the TSO offload for packets, it does not 
> > only
> > depends on the context descriptor to define the MSS and TSO payload size, it
> > also need to know whether this packets is ipv4 or ipv6, we use
> > i40e_txd_enable_checksum to generate the related fields for data descriptor.
> > But PMD will not call i40e_txd_enable_checksum if only the TSO offload flag 
> > is
> > set. The reason why ipv4 works fine for TSO in testpmd csum mode is csum 
> > engine
> > will set the ip csum flag when the packet is ipv4 and TSO is enabled but
> > will not set the flag for ipv6 and this flag will cause the
> > i40e_txd_enable_checksum to be invoked. For both the cases the TSO flag 
> > will be
> > set, so we need to use TSO flag to trigger the i40e_txd_enable_checksum.
> > The right logic here is we enable csum offload for both ipv4 and ipv6 when 
> > TSO
> > flag is set.
> > 
> > Fixes: e3f0151f (i40e: enable Tx checksum only for offloaded packets)
> > 
> > Signed-off-by: Zhe Tao 
> 
> Acked-by: Konstantin Ananyev 

Applied, thanks

[dpdk-dev] [PATCH] doc: announce xstats api change for 16.07

2016-04-06 Thread David Harton (dharton)



> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, April 06, 2016 8:14 AM
> To: Van Haaren, Harry 
> Cc: David Harton (dharton) ; dev at dpdk.org; Tahhan,
> Maryam ; olivier.matz at 6wind.com
> Subject: Re: [dpdk-dev] [PATCH] doc: announce xstats api change for 16.07
> 
> 2016-04-06 11:16, Van Haaren, Harry:
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > The issue we are going to fix is that currently PMDs copy strings
> > > > when retrieving
> > > statistics, which causes unnecessary overhead. The implementation is
> > > not decided yet, but using an int->value mapping seems logical.
> >
> > > I am not sure performance is so much critical when retrieving
> statistics.
> >
> > In the previous discussion David was concerned about performance
> > impact of string copies, are those concerns still present David?
> >
> > > The extended stats can be infinitely extended. So a string
> > > identifier seems a lot more natural.
> >
> > I'm not suggesting that the string identifier is removed totally.
> >
> > > I do not agree to add a new numeric identifier in the API each time
> > > a driver wants to report a specific statistic for debugging purpose.
> >
> > And I agree - the ints are just an index to xstats arrays, no eth-dev
> wide enums here.

Yes, I abandoned the idea of a set of stats ids.  I can see where registration 
will be problematic and cumbersome to driver developers.

> > The proposal is to make the API more flexible, see example:
> > http://thread.gmane.org/gmane.comp.networking.dpdk.devel/31728/focus=3
> > 2795
> >
> > This more flexible API would allow other types of information about
> > statistics be retrieved too.

I have prototyped this.  If there is interest/acceptance I can work on making 
an official patch to share back to the community.

Using this method still gives the flexibility the current API desires while 
giving the user the control to only obtain the counters.  This of course 
assumes that the counters per device are static but that seems a safe bet.

> 
> OK I think I start to understand.
> 
> > For now, the sent patch announces that the API/ABI may change, and we
> > can discuss details of API as development starts.
> 
> This should not be the normal process.
> It is important to understand what should be the changes to decide of
> announcing or not a deprecation.
> In the case of the mempool reworks, the patch have been sent and discussed
> on the mailing list.
> Given the previous explanations (and knowing you did good job on stats), I
> give my
> Acked-by: Thomas Monjalon 

Thanks for considering this.

Regards,
Dave

[dpdk-dev] [PATCH] ena: icc fix compilation warning #3656

2016-04-06 Thread Thomas Monjalon

2016-04-06 13:17, Ferruh Yigit:
>  With (ICC) 16.0.2 20160204, getting following warnings:
> 
>  .../drivers/net/ena/base/ena_com.c(492): error #3656: variable
>  "flags" may be used before its value is set
>  ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
> 
> .../drivers/net/ena/base/ena_com.c(1971): error #3656: variable
>  "mem_handle" may be used before its value is set
>  ENA_MEM_ALLOC_COHERENT(ena_dev->dmadev, len,
> 
> For both warnings the variable value is ignored, so there is no defect.
> To comfort compiler warning, a initial value provided to variables.
> 
> Signed-off-by: Ferruh Yigit 

Applied, thanks

[dpdk-dev] [PATCH] doc: announce xstats api change for 16.07

2016-04-06 Thread Van Haaren, Harry

> From: David Harton (dharton) [mailto:dharton at cisco.com]
> Subject: RE: [dpdk-dev] [PATCH] doc: announce xstats api change for 16.07
> > This should not be the normal process.
> > It is important to understand what should be the changes to decide of
> > announcing or not a deprecation.
> > In the case of the mempool reworks, the patch have been sent and discussed
> > on the mailing list.
> > Given the previous explanations (and knowing you did good job on stats), I
> > give my
> > Acked-by: Thomas Monjalon 
> 
> Thanks for considering this.

Would you mind giving the patch an Ack David?

Then we'll have the 3 Acks needed and can fix this in the next release.
Cheers, -Harry

[dpdk-dev] [PATCH] doc: announce xstats api change for 16.07

2016-04-06 Thread David Harton (dharton)


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Harry van Haaren
> Sent: Tuesday, April 05, 2016 1:58 PM
> To: dev at dpdk.org
> Cc: maryam.tahhan at intel.com; Harry van Haaren  intel.com>
> Subject: [dpdk-dev] [PATCH] doc: announce xstats api change for 16.07
> 
> This patch adds a notice that the API for the xstats functionality will be
> modified in the 16.07 release, with no backwards compatibility planned as
> it would require code duplication in each PMD that supports xstats.
> 
> Signed-off-by: Harry van Haaren 
> ---
>  doc/guides/rel_notes/deprecation.rst | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 98d5529..13c3a95 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -54,3 +54,8 @@ Deprecation Notices
>induce a modification of the rte_mempool structure, plus a
>modification of the API of rte_mempool_obj_iter(), implying a breakage
>of the ABI.
> +
> +* ABI change is planned for the xstats API and rte_eth_xstats struct,
> +to
> +  facilitate updating to an API that allows retrieval of values without
> +any
> +  string copies or parsing. No backwards compatibility is planned, as
> +it would
> +  require code duplication in every PMD that supports xstats.
> --
> 2.5.0

Acked-by: David Harton

[dpdk-dev] [PATCH] doc: announce xstats api change for 16.07

2016-04-06 Thread Tahhan, Maryam

> From: Van Haaren, Harry
> Sent: Tuesday, April 5, 2016 6:58 PM
> To: dev at dpdk.org
> Cc: Tahhan, Maryam ; Van Haaren, Harry
> 
> Subject: [PATCH] doc: announce xstats api change for 16.07
> 
> This patch adds a notice that the API for the xstats functionality will be
> modified in the 16.07 release, with no backwards compatibility planned
> as it would require code duplication in each PMD that supports xstats.
> 
> Signed-off-by: Harry van Haaren 
> ---

Acked-by: Maryam Tahhan

[dpdk-dev] [PATCH v3 3/4] bnx2x: Enhance stats get

2016-04-06 Thread Van Haaren, Harry

> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Rasesh Mody
> Subject: [dpdk-dev] [PATCH v3 3/4] bnx2x: Enhance stats get

Hi Rasesh,

> + snprintf(xstats[num].name, sizeof(xstats[num].name), "brb_drops");

I don't understand what a "brb" drop is.


> + snprintf(xstats[num].name, sizeof(xstats[num].name), "tx_pfc");

Similarly here, and with some other of the xstats strings, it doesn't
become clear to me what exactly the value represents.

"mac_filter_discard" is descriptive and readable, but the next stat
has "mf_tag_discard" - these small inconsistencies make it much harder
(impossible?) to scrap the xstats strings and retrieve useful metadata.

I'll suggest leaving the xstats implementation part of this patch until
the next release, and we can align on the names of the stats.

-Harry

[dpdk-dev] [PATCH v3 1/4] bnx2x: Update documentation

2016-04-06 Thread Bruce Richardson

On Tue, Apr 05, 2016 at 05:37:05PM -0700, Rasesh Mody wrote:
> Signed-off-by: Harish Patil 
> Signed-off-by: Rasesh Mody 
> ---
>  doc/guides/nics/bnx2x.rst|1 +
>  doc/guides/nics/overview.rst |   22 +++---
>  2 files changed, 12 insertions(+), 11 deletions(-)
> 
Thanks for the patchset. Looking at it though, some of the changes are bug or
documentation fixes which would be good to get into 16.04 e.g. patch 1, and the
fix for the regular stats in patch 2, while other parts of the set, e.g. the TX
optimisations, or the new xstats functionality, are larger changes. Given we
are within days of a final release, the latter changes not really suitable for 
merging at this point. 
As per Thomas' guidance, the only code changes are meant to be bug fixes,
in case a change introduces a last minute issue.

Any chance you could split this patchset in two, with the doc and bug fixes 
alone
in one set, and the feature changes in a separate set for 16.07?

Thanks,
/Bruce

[dpdk-dev] [PATCH v2] ethdev: refine new API to query supported ptypes

2016-04-06 Thread Thomas Monjalon

2016-04-06 11:51, Jianfeng Tan:
> This change is to  make user code simpler. For PMDs which do not fill any
> packet types, return 0 instead of -ENOTSUP as suggested by Bruce.
> 
> Usually, users only care if the required (by ptype_mask) ptypes can be
> filled by the specified PMD. If the PMD implements dev_supported_ptypes_get
> func is not important. And the introduce of another return value (-ENOTSUP)
> would increase the complexity of user programs to check it.
> 
> Besides, there are ways to know if a PMD implements the func:
>   a. see doc/guides/nics/overview.rst.
>   b. use (~1) as parameter ptype_mask, then check if return 0.
> 
> Suggested-by: Bruce Richardson 
> Signed-off-by: Jianfeng Tan 
> Acked-by: Konstantin Ananyev 

Applied, thanks

[dpdk-dev] DPDK namespace

2016-04-06 Thread Wiles, Keith

>On 04/06/2016 08:26 AM, Yuanhan Liu wrote:
>> On Tue, Apr 05, 2016 at 05:31:22PM +0300, Arnon Warshavsky wrote:
>>> On Tue, Apr 5, 2016 at 5:13 PM, Trahe, Fiona  
>>> wrote:
>>>


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Tuesday, April 05, 2016 2:57 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] DPDK namespace
>
> DPDK is going to be more popular in Linux distributions.
> It means people will have some DPDK files in their /usr/include and some
 DPDK
> libraries on their system.
>
> Let's imagine someone trying to compile an application which needs
> rte_ethdev.h. He has to figure out that this "rte header" is provided by
 the DPDK.
> Hopefully it will be explained on StackOverflow that RTE stands for DPDK.
> Then someone else will try to run a binary without having installed the
 DPDK
> libraries. The linker will require libethdev.so (no prefix here).
> StackOverflow will probably have another good answer (among wrong ones):
> "Hey Sherlock Holmes, have you tried to install the DPDK library?"
> Followed by an insight: "You know, the DPDK naming is weird..."
> And we could continue the story with developers having some naming clash
> because of some identifiers not prefixed at all.
>
> The goal of this email is to get some feedback on how important it is to
 fix the
> DPDK namespace.
>
> If there is enough agreement that we should do something, I suggest to
> introduce the "dpdk_" prefix slowly and live with both "rte_" and "dpdk_"
> during some time.
> We could start using the new prefix for the new APIs (example: crypto)
 or when
> there is a significant API break (example: mempool).
>
> Opinions welcome!
 I don't have an opinion on how important it is to fix the namespace,
 though it does seem like a good idea.
 However if it's to be done, in my opinion it should be completed quickly
 or will just cause more confusion.
 So if rte_cryptoxxx becomes dpdk_cryptoxxx all other libraries should
 follow in next release or two, with
 the resulting ABI compatibility handling. Maybe with dual naming handled
 for several releases, but a
 clear end date when all are converted.
 Else there will be many years with a mix of rte_ and dpdk_


>>>
>>> Googling rte functions or error codes usually takes you to dpdk dev email
>>> archive so I don't think it is that much difficult to figure out where rte
>>> comes from.
>>> Other than that , except for my own refactoring pains when replacing a dpdk
>>> version, I do not see a major reason why not.
>>> If Going for dpdk_ prefix, I agree with the quick death approach.
>>
>> +1: it's a bit weird to keep both, especially for a long while, that
>> every time we turn a rte_ prefix to dpdk_ prefix, we break applications.
>> Instead of breaking applications many times, I'd prefer to break once.
>> Therefore, applications could do a simple global rte_ -> dpdk_ substitute:
>> it doesn't sound that painful then.
>
>I concur. If (and I think that should be a pretty big IF) the prefix is 
>to be changed then its better done in one fast sweep than gradually.
>
>Gratuitious (or nearly so) change is always extremely annoying, and the 
>longer it takes the more painful it is. Application developers wont much 
>care what the prefix is as long as its consistent, but if they're forced 
>to track prefix changes across several releases with different libraries 
>moving at different pace, they WILL be calling for bloody murder :)
>
>As for rte_ being strange for DPDK - yes it is, but it takes like 5 
>minutes to get over it. It would help to have it explained on dpdk.org 
>FAQ: "Due to historical reasons, DPDK libraries are prefixed rte_ 
>instead of dpdk_ because name> and changing it is unnecessarily painful."

As I understand RTE is from the ?Run Time Environment? which was the primary 
set of API?s at the time and it just kept getting propagated :-)

>
>>
>> And here are few more comments:
>>
>> - we should add rte_/dpdk_ prefix to all public structures as well.
>>
>>I'm thinking we are doing well here. I'm just aware that vhost lib
>>does a bad job, which is something I proposed to fix in next release.
>
>Yup, all public symbols should be prefixed. What the exact prefix is 
>isn't that important really.
>
>>
>> - If we do the whole change once, I'd suggest to do it ASAP when this
>>release is over.
>>
>>It should be a HUGE change that touches a lot of code, if we do it
>>later, at a stage that a lot of patches for new features have been
>>made or sent out, all of them need rebase. That'd be painful.
>
>Nod, that's yet another aspect to consider.
>
>So to summarize, I'm not strongly opposed to doing a one-time mass rte_ 
>-> dpdk_ prefix change, but it needs to be one big sweep all at once, or 
>not do

[dpdk-dev] [PATCH v4 0/4] fix creation of duplicate lpm and hash

2016-04-06 Thread Thomas Monjalon

2016-04-06 15:27, Olivier Matz:
> Seen while trying to fix the func_reentrancy autotest. The
> series addresses several issues:
> 
> 1/ Hash and lpm return a pointer to an existing object if the user requests 
> the
>creation with an already existing name. This look dangerous: when an object
>is returned, the user does not know if it should be freed or not.
> 
> 2/ There is a race condition in cuckoo_hash as the lock is not held in
>rte_hash_create(). We could find some cases where NULL is returned when the
>object already exists (ex: when rte_ring_create() fails).
> 
> 3/ There is a race condition func_reentrancy that can fail even if the tested
>API behaves correctly.

Applied, thanks

[dpdk-dev] [PATCH] aesni_gcm: fix incorrect supported key sizes

2016-04-06 Thread Pablo de Lara

AES-GCM PMD only supports 128-bit keys, but not 192 and 256,
which the capabilities structure was reporting.

Fixes: 27cf2d1b18e1 ("examples/l2fwd-crypto: discover capabilities")

Signed-off-by: Pablo de Lara 
---
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c 
b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
index 4dec8dd..e824d4b 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
@@ -48,8 +48,8 @@ static const struct rte_cryptodev_capabilities 
aesni_gcm_pmd_capabilities[] = {
.block_size = 16,
.key_size = {
.min = 16,
-   .max = 32,
-   .increment = 8
+   .max = 16,
+   .increment = 0
},
.digest_size = {
.min = 8,
@@ -73,8 +73,8 @@ static const struct rte_cryptodev_capabilities 
aesni_gcm_pmd_capabilities[] = {
.block_size = 16,
.key_size = {
.min = 16,
-   .max = 32,
-   .increment = 8
+   .max = 16,
+   .increment = 0
},
.iv_size = {
.min = 16,
-- 
2.5.5

[dpdk-dev] [PATCH] vhost: Fix retrieval of numa information in PMD

2016-04-06 Thread Thomas Monjalon

2016-04-07 00:09, Yuanhan Liu:
> On Wed, Apr 06, 2016 at 09:37:53AM +, Loftus, Ciara wrote:
> > > On Wed, Apr 06, 2016 at 03:49:25PM +0900, Tetsuya Mukawa wrote:
> > > > Hi,
> > > >
> > > > I appreciate fixing it.
> > > > Just one worry is that state changed event may be occurred before new
> > > > device event.
> > > > The users should not call rte_eth_dev_socket_id() until new device event
> > > > comes, even if they catch queue state events.
> > > > Otherwise, they will get wrong socket id to call
> > > > rte_eth_rx/tx_queue_setup().
> > > 
> > > There is no way to guarantee that the socket id stuff would work
> > > perfectly in vhost, right? I mean, it's likely that virtio device
> > > would allocate memory from 2 or more sockets.
> > > 
> > > So, it doesn't matter too much whether it's set perfectly right
> > > or not. Instead, we should assign it with a saner value instead
> > > of a obvious wrong one when new_device() is not invoked yet. So,
> > > I'd suggest to make an assignment first based on vhost_dev (or
> > > whatever) struct, and then make it "right" at new_device()
> > > callback?
> > 
> > Thanks for the feedback.
> > At the moment with this patch numa_node is initially set to rte_socket_id() 
> > during  pmd init and then updated to the correct value during new_device.
> > Are you suggesting we set it again in between these two steps ("based on 
> > vhost_dev")? If so where do you think would be a good place?
> 
> Oh, I was not aware of that. Then I think we are fine here.

Please Yuanhan, could you be more explicit?

[dpdk-dev] [PATCH] aesni_gcm: fix incorrect supported key sizes

2016-04-06 Thread Thomas Monjalon

2016-04-06 16:39, Pablo de Lara:
> AES-GCM PMD only supports 128-bit keys, but not 192 and 256,
> which the capabilities structure was reporting.
> 
> Fixes: 27cf2d1b18e1 ("examples/l2fwd-crypto: discover capabilities")

I think you mean
Fixes: 26c2e4ad5ad4 ("cryptodev: add capabilities discovery")

[dpdk-dev] [PATCH] cryptodev: Remove EXPERIMENTAL label

2016-04-06 Thread Thomas Monjalon

2016-04-06 12:05, Fiona Trahe:
> The cryptodev API was introduced in the DPDK 2.2 release.
> Since then it has
>  - been reviewed and iterated for the DPDK 16.04 release
>  - had extensive use by the l2fwd-crypto app,
>   the ipsec-secgw example app,
>   the test app.
> We believe it is now stable and the EXPERIMENTAL label should be removed.

I thought it was a good idea to use a dpdk_ prefix in this new "stable" lib,
but it appears that almost everybody is against having a mix of prefixes.

Acked-by: Thomas Monjalon 

> v2:

The v2 is missing in the title (option -v2 in git).

[dpdk-dev] [PATCH] cryptodev: Remove EXPERIMENTAL label

2016-04-06 Thread Trahe, Fiona



> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, April 06, 2016 5:37 PM
> To: Trahe, Fiona
> Cc: dev at dpdk.org; Doherty, Declan
> Subject: Re: [dpdk-dev] [PATCH] cryptodev: Remove EXPERIMENTAL label
> 
> 2016-04-06 12:05, Fiona Trahe:
> > The cryptodev API was introduced in the DPDK 2.2 release.
> > Since then it has
> >  - been reviewed and iterated for the DPDK 16.04 release
> >  - had extensive use by the l2fwd-crypto app,
> > the ipsec-secgw example app,
> > the test app.
> > We believe it is now stable and the EXPERIMENTAL label should be removed.
> 
> I thought it was a good idea to use a dpdk_ prefix in this new "stable" lib, 
> but it
> appears that almost everybody is against having a mix of prefixes.
> 
> Acked-by: Thomas Monjalon 
> 
> > v2:
> 
> The v2 is missing in the title (option -v2 in git).
Sorry :(
Do  you want me to resend with updated subject?

[dpdk-dev] [PATCH 1/4] app/test: enhance test_port_ring_writer

2016-04-06 Thread Dumitrescu, Cristian

Hi Robert,

Sorry for my delay, I am traveling this week, I will reply as soon as I find a 
slot to focus on this, hopefully in the next couple of days, thanks for your 
patience.

Regards,
Cristian

> -Original Message-
> From: Sanford, Robert [mailto:rsanford at akamai.com]
> Sent: Friday, April 1, 2016 12:43 PM
> To: dev at dpdk.org; Dumitrescu, Cristian 
> Subject: Re: [dpdk-dev] [PATCH 1/4] app/test: enhance
> test_port_ring_writer
> 
> We don't need to change this line, because we never access more than
> RTE_PORT_IN_BURST_SIZE_MAX (64) elements in this array:
> 
> - struct rte_mbuf *mbuf[RTE_PORT_IN_BURST_SIZE_MAX];
> + struct rte_mbuf *mbuf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> 
> 
> --
> Robert
> 
> >Add code to send two 60-packet bursts to a ring port_out.
> >This tests a ring writer buffer overflow problem and fix
> >(in patch 2/4).
> >
> >Signed-off-by: Robert Sanford 
> >---
> > app/test/test_table_ports.c |   27 +--
> > 1 files changed, 25 insertions(+), 2 deletions(-)
> >
> >diff --git a/app/test/test_table_ports.c b/app/test/test_table_ports.c
> >index 2532367..0c0ec0a 100644
> >--- a/app/test/test_table_ports.c
> >+++ b/app/test/test_table_ports.c
> >@@ -149,8 +149,8 @@ test_port_ring_writer(void)
> >
> > /* -- Traffic TX -- */
> > int expected_pkts, received_pkts;
> >-struct rte_mbuf *mbuf[RTE_PORT_IN_BURST_SIZE_MAX];
> >-struct rte_mbuf *res_mbuf[RTE_PORT_IN_BURST_SIZE_MAX];
> >+struct rte_mbuf *mbuf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> >+struct rte_mbuf *res_mbuf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> >
> > port_ring_writer_params.ring = RING_TX;
> > port_ring_writer_params.tx_burst_sz =
> RTE_PORT_IN_BURST_SIZE_MAX;
> >@@ -216,5 +216,28 @@ test_port_ring_writer(void)
> > for (i = 0; i < RTE_PORT_IN_BURST_SIZE_MAX; i++)
> > rte_pktmbuf_free(res_mbuf[i]);
> >
> >+/* TX Bulk - send two 60-packet bursts */
> >+uint64_t pkt_mask = 0xfff0ULL;
> >+
> >+for (i = 0; i < 4; i++)
> >+mbuf[i] = NULL;
> >+for (i = 4; i < 64; i++)
> >+mbuf[i] = rte_pktmbuf_alloc(pool);
> >+rte_port_ring_writer_ops.f_tx_bulk(port, mbuf, pkt_mask);
> >+for (i = 4; i < 64; i++)
> >+mbuf[i] = rte_pktmbuf_alloc(pool);
> >+rte_port_ring_writer_ops.f_tx_bulk(port, mbuf, pkt_mask);
> >+rte_port_ring_writer_ops.f_flush(port);
> >+
> >+expected_pkts = 2 * 60;
> >+received_pkts =
> rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
> >+(void **)res_mbuf, 2 * RTE_PORT_IN_BURST_SIZE_MAX);
> >+
> >+if (received_pkts != expected_pkts)
> >+return -10;
> >+
> >+for (i = 0; i < received_pkts; i++)
> >+rte_pktmbuf_free(res_mbuf[i]);
> >+
> > return 0;
> > }
> >--
> >1.7.1
> 
>

[dpdk-dev] [PATCH] vhost: Fix retrieval of numa information in PMD

2016-04-06 Thread Thomas Monjalon

2016-04-06 13:03, Yuanhan Liu:
> On Tue, Apr 05, 2016 at 05:09:47PM +0100, Ciara Loftus wrote:
> > After some testing, it was found that retrieving numa information
> > about a vhost device via a call to get_mempolicy is more
> > accurate when performed during the new_device callback versus
> > the vring_state_changed callback, in particular upon initial boot
> > of the VM.  Performing this check during new_device is also
> > potentially more efficient as this callback is only triggered once
> > during device initialisation, compared with vring_state_changed
> > which may be called multiple times depending on the number of
> > queues assigned to the device.
> > 
> > Reorganise the code to perform this check and assign the correct
> > socket_id to the device during the new_device callback.
> > 
> > Signed-off-by: Ciara Loftus 
> 
> Acked-by: Yuanhan Liu 

Fixes: ee584e9710b9 ("vhost: add driver on top of the library")

Applied, thanks

[dpdk-dev] [PATCH] cryptodev: Remove EXPERIMENTAL label

2016-04-06 Thread Thomas Monjalon

2016-04-06 16:46, Trahe, Fiona:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > v2:
> > 
> > The v2 is missing in the title (option -v2 in git).
> Sorry :(
> Do  you want me to resend with updated subject?

No, it was just for information for next times.

[dpdk-dev] [PATCH] aesni_gcm: fix incorrect supported key sizes

2016-04-06 Thread Thomas Monjalon

2016-04-06 18:29, Thomas Monjalon:
> 2016-04-06 16:39, Pablo de Lara:
> > AES-GCM PMD only supports 128-bit keys, but not 192 and 256,
> > which the capabilities structure was reporting.
> > 
> > Fixes: 27cf2d1b18e1 ("examples/l2fwd-crypto: discover capabilities")
> 
> I think you mean
> Fixes: 26c2e4ad5ad4 ("cryptodev: add capabilities discovery")

Applied, thanks

[dpdk-dev] [PATCH 0/3] Fixes/extra documentation for Crypto PMDs

2016-04-06 Thread Thomas Monjalon

2016-04-04 14:14, Pablo de Lara:
> This patchset fixes some minor typos in the Crypto PMD documentation
> and add an extra section to help user to initialize the devices.
> 
> The patchset depends on:
> "doc: update the snow3g library information" 
> (http://dpdk.org/dev/patchwork/patch/11889/)
> 
> Pablo de Lara (3):
>   doc: fix typo in Crypto NULL PMD title
>   doc: remove unnecessary underline characters in AES GCM PMD
>   doc: add initialization section for virtual crypto PMDs

Applied, thanks

[dpdk-dev] [PATCH] doc: update the snow3g library information

2016-04-06 Thread Thomas Monjalon

> > A new process to request the libsso library required by the SNOW3G PMD
> > has been put in place, through a website, replacing the previous email 
> > method.
> > This commit updates the SNOW3G documentation, to reflect this change.
> >
> > Since the library does not support newer gcc versions, the documentation
> > also contains a patch to make the library work with gcc > 5.0.
> >
> > Signed-off-by: Pablo de Lara 
> 
> Acked-by: Declan Doherty 

Applied, thanks

[dpdk-dev] [PATCH] cryptodev: renaming 2 elements for clarity

2016-04-06 Thread Thomas Monjalon

2016-04-04 18:14, Fiona Trahe:
> renamed rte_cryptodev_sym_session.type -> dev_type
> (as it's not a session type, but a device type)
> 
> renamed rte_crypto_sym_op.type -> sess_type
> (as it's not an op type, but a session type)
> 
> Signed-off-by: Fiona Trahe 

Applied, thanks

[dpdk-dev] [PATCH] cryptodev: Remove EXPERIMENTAL label

2016-04-06 Thread Thomas Monjalon

> > The cryptodev API was introduced in the DPDK 2.2 release.
> > Since then it has
> >  - been reviewed and iterated for the DPDK 16.04 release
> >  - had extensive use by the l2fwd-crypto app,
> > the ipsec-secgw example app,
> > the test app.
> > We believe it is now stable and the EXPERIMENTAL label should be removed.
> 
> I thought it was a good idea to use a dpdk_ prefix in this new "stable" lib,
> but it appears that almost everybody is against having a mix of prefixes.
> 
> Acked-by: Thomas Monjalon 

Applied, thanks

[dpdk-dev] Kernel panic in KNI

2016-04-06 Thread Jay Rolette

I had a system lockup hard a couple of days ago and all we were able to get
was a photo of the LCD monitor with most of the kernel panic on it. No way
to scroll back the buffer and nothing in the logs after we rebooted. Not
surprising with a kernel panic due to an exception during interrupt
processing. We have a serial console attached in case we are able to get it
to happen again, but it's not easy to reproduce (hours of runtime for this
instance).

Ran the photo through OCR software to get a text version of the dump, so
possible I missed some fixups in this:

[39178.433262] RDX: 00ba RSI: 881fd2f350ee RDI:
a12520669126180a
[39178.464020] RBP: 880433966970 R08: a12520669126180a R09:
881fd2f35000
[39178.495091] R10:  R11: 881fd2f88000 R12:
883fdla75ee8
[39178.526594] R13: 00ba R14: 7fdad5a66780 R15:
883715ab6780
[39178.559011] FS:  77fea740() GS:88lfffc0()
knlGS:
[39178.592005] CS:  0010 DS:  ES:  CR0: 80050033
[39178.623931] CR2: 77ea2000 CR3: 001fd156f000 CR4:
001407f0
[39178.656187] Stack:
[39178.689025] c067c7ef 00ba 00ba
881fd2f88000
[39178.722682] 4000 8B3fd0bbd09c 883fdla75ee8
8804339bb9c8
[39178.756525] 81658456 881fcd2ec40c c0680700
880436bad800
[39178.790577] Call Trace:
[39178.824420] [] ? kni_net_tx+0xef/0x1a0 [rte_kni]
[39178.859190] [] dev_hard_start_xmit+0x316/0x5c0
[39178.893426] [] sch_direct_xmit+0xee/0xic0
[39178.927435] [l __dev_queue_xmit+0x200/0x4d0
[39178.961684] [l dev_queue_xmit+0x10/0x20
[39178.996194] [] neigh_connected_output+0x67/0x100
[39179.031098] [] ip_finish_output+0xid8/0x850
[39179.066709] [l ip_output+0x58/0x90
[39179.101551] [] ip_local_out_sk+0x30/0x40
[39179.136823] [] ip_queue_xmit+0xl3f/0x3d0
[39179.171742] [] tcp_transmit_skb+0x47c/0x900
[39179.206854] [l tcp_write_xmit+0x110/0xcb0
[39179.242335] [] __tcp_push_pending_frames+0x2e/0xc0
[39179.277632] [] tcp_push+0xec/0x120
[39179.311768] [] tcp_sendmsg+0xb9/0xce0
[39179.346934] [] ? tcp_recvmsg+0x6e2/0xba0
[39179.385586] [] inet_sendmsg+0x64/0x60
[39179.424228] [] ? apparmor_socket_sendmsg+0x21/0x30
[39179.4586581 [] sock_sendmsg+0x86/0xc0
[39179.493220] [] ? __inet_stream_connect+0xa5/0x320
[39179.528033] [] ? __fdget+0x13/0x20
[39179.561214] [] SYSC_sendto+0x121/0x1c0
[39179.594665] [] ? aa_sk_perm.isra.4+0x6d/0x150
[39179.6268931 [] ? read_tsc+0x9/0x20
[39179.6586541 [] ? ktime_get_ts+0x48/0xe0
[39179.689944] [] SyS_sendto+0xe/0x10
[39179.719575] [] system_call_fastpath+0xia/0xif
[39179.748760] Code: 43 58 48 Zb 43 50 88 43 4e 5b 5d c3 66 Of if 84 00 00
00 00 00 e8 fb fb ff ff eb e2 90 90 90 90 90 90 90
 90 48 89 f8 48 89 d1  a4 c3 03 83 eZ 07 f3 48 .15 89 di f3 a4 c3 20 4c
8b % 4c 86
[39179.808690] RIP  [] memcpy+0x6/0x110
[39179.837238]  RSP 
[39179.933755] ---[ end trace 2971562f425e2cf8 ]---
[39179.964856] Kernel panic - not syncing: Fatal exception in interrupt
[39179.992896] Kernel Offset: 0x0 from 0x8100 (relocation
range: 0x8000-0xbfff)
[39180.024617] ---[ end Kernel panic - not syncing: Fatal exception in
interrupt

It blew up when kni_net_tx() called memcpy() to copy data from the skb to
an mbuf.

Disclosure: I'm not a Linux device driver guy. I dip into the kernel as
needed. Plenty of experience doing RTOS and bare metal development, but not
a Linux kernel expert.

What context does kni_net_tx() run in? On the receive path, my
understanding is that KNI always runs in process context on a kthread. I've
been assuming that the transmit path was also in process context (albeit on
the app's process), so the "Fatal exception in interrupt" is throwing me.

Does kni_net_tx() ever run in interrupt (or soft-interrupt) context?

Thanks,
Jay

[dpdk-dev] DPDK namespace

2016-04-06 Thread Dave Neary

Hi,

On 04/06/2016 08:07 AM, Panu Matilainen wrote:
>> +1: it's a bit weird to keep both, especially for a long while, that
>> every time we turn a rte_ prefix to dpdk_ prefix, we break applications.
>> Instead of breaking applications many times, I'd prefer to break once.
>> Therefore, applications could do a simple global rte_ -> dpdk_
>> substitute:
>> it doesn't sound that painful then.
> 
> I concur. If (and I think that should be a pretty big IF) the prefix is
> to be changed then its better done in one fast sweep than gradually.
> 
> Gratuitious (or nearly so) change is always extremely annoying, and the
> longer it takes the more painful it is. Application developers wont much
> care what the prefix is as long as its consistent, but if they're forced
> to track prefix changes across several releases with different libraries
> moving at different pace, they WILL be calling for bloody murder :)

How about the idea of creating (at switch over time) an optionally
installable dpdk_compat package that just has a list of #defines for the
old symbols pointing them at the new symbols? That would also allow
people with old applications to update DPDK without having to modify
their applications.

Thanks,
Dave.

-- 
Dave Neary - NFV/SDN Community Strategy
Open Source and Standards, Red Hat - http://community.redhat.com
Ph: +1-978-399-2182 / Cell: +1-978-799-3338

[dpdk-dev] Potential deadlock in KNI RX path

2016-04-06 Thread Jay Rolette

Over a year ago, Neil pointed out that calling netif_rx() from
kni_net_rx_normal() was a bug and could cause lockups. Here's the comment:

http://dpdk.org/ml/archives/dev/2015-March/015783.html

Looking at the current code base, it is still calling netif_rx() instead of
netif_rx_ni() as suggested.

Did that fall through the cracks or is there disagreement about whether it
was the right thing to do?

Thanks,
Jay

[dpdk-dev] Kernel panic in KNI

2016-04-06 Thread Sanford, Robert

Hi Jay,

I won't try to interpret your kernel stack trace. But, I'll tell you about
a KNI-related problem that we once experienced, and the symptom was a
kernel hang.

The problem was that we were passing mbufs allocated out of one mempool,
to a KNI context that we had set up with a different mempool (on a
different CPU socket). The KNI kernel driver, converts the user-space mbuf
virtual address (VA) to a kernel VA by adding the difference between the
user and kernel VAs of the mempool used to create the KNI context. So, if
an mbuf comes from a different mempool, the calculated address will
probably be VERY BAD.

Could this be your problem?

--
Robert


On 4/6/16 4:16 PM, "Jay Rolette"  wrote:

>I had a system lockup hard a couple of days ago and all we were able to
>get
>was a photo of the LCD monitor with most of the kernel panic on it. No way
>to scroll back the buffer and nothing in the logs after we rebooted. Not
>surprising with a kernel panic due to an exception during interrupt
>processing. We have a serial console attached in case we are able to get
>it
>to happen again, but it's not easy to reproduce (hours of runtime for this
>instance).
>
>Ran the photo through OCR software to get a text version of the dump, so
>possible I missed some fixups in this:
>
>[39178.433262] RDX: 00ba RSI: 881fd2f350ee RDI:
>a12520669126180a
>[39178.464020] RBP: 880433966970 R08: a12520669126180a R09:
>881fd2f35000
>[39178.495091] R10:  R11: 881fd2f88000 R12:
>883fdla75ee8
>[39178.526594] R13: 00ba R14: 7fdad5a66780 R15:
>883715ab6780
>[39178.559011] FS:  77fea740() GS:88lfffc0()
>knlGS:
>[39178.592005] CS:  0010 DS:  ES:  CR0: 80050033
>[39178.623931] CR2: 77ea2000 CR3: 001fd156f000 CR4:
>001407f0
>[39178.656187] Stack:
>[39178.689025] c067c7ef 00ba 00ba
>881fd2f88000
>[39178.722682] 4000 8B3fd0bbd09c 883fdla75ee8
>8804339bb9c8
>[39178.756525] 81658456 881fcd2ec40c c0680700
>880436bad800
>[39178.790577] Call Trace:
>[39178.824420] [] ? kni_net_tx+0xef/0x1a0 [rte_kni]
>[39178.859190] [] dev_hard_start_xmit+0x316/0x5c0
>[39178.893426] [] sch_direct_xmit+0xee/0xic0
>[39178.927435] [l __dev_queue_xmit+0x200/0x4d0
>[39178.961684] [l dev_queue_xmit+0x10/0x20
>[39178.996194] [] neigh_connected_output+0x67/0x100
>[39179.031098] [] ip_finish_output+0xid8/0x850
>[39179.066709] [l ip_output+0x58/0x90
>[39179.101551] [] ip_local_out_sk+0x30/0x40
>[39179.136823] [] ip_queue_xmit+0xl3f/0x3d0
>[39179.171742] [] tcp_transmit_skb+0x47c/0x900
>[39179.206854] [l tcp_write_xmit+0x110/0xcb0
>[39179.242335] [] __tcp_push_pending_frames+0x2e/0xc0
>[39179.277632] [] tcp_push+0xec/0x120
>[39179.311768] [] tcp_sendmsg+0xb9/0xce0
>[39179.346934] [] ? tcp_recvmsg+0x6e2/0xba0
>[39179.385586] [] inet_sendmsg+0x64/0x60
>[39179.424228] [] ? apparmor_socket_sendmsg+0x21/0x30
>[39179.4586581 [] sock_sendmsg+0x86/0xc0
>[39179.493220] [] ? __inet_stream_connect+0xa5/0x320
>[39179.528033] [] ? __fdget+0x13/0x20
>[39179.561214] [] SYSC_sendto+0x121/0x1c0
>[39179.594665] [] ? aa_sk_perm.isra.4+0x6d/0x150
>[39179.6268931 [] ? read_tsc+0x9/0x20
>[39179.6586541 [] ? ktime_get_ts+0x48/0xe0
>[39179.689944] [] SyS_sendto+0xe/0x10
>[39179.719575] [] system_call_fastpath+0xia/0xif
>[39179.748760] Code: 43 58 48 Zb 43 50 88 43 4e 5b 5d c3 66 Of if 84 00 00
>00 00 00 e8 fb fb ff ff eb e2 90 90 90 90 90 90 90
> 90 48 89 f8 48 89 d1  a4 c3 03 83 eZ 07 f3 48 .15 89 di f3 a4 c3 20
>4c
>8b % 4c 86
>[39179.808690] RIP  [] memcpy+0x6/0x110
>[39179.837238]  RSP 
>[39179.933755] ---[ end trace 2971562f425e2cf8 ]---
>[39179.964856] Kernel panic - not syncing: Fatal exception in interrupt
>[39179.992896] Kernel Offset: 0x0 from 0x8100 (relocation
>range: 0x8000-0xbfff)
>[39180.024617] ---[ end Kernel panic - not syncing: Fatal exception in
>interrupt
>
>It blew up when kni_net_tx() called memcpy() to copy data from the skb to
>an mbuf.
>
>Disclosure: I'm not a Linux device driver guy. I dip into the kernel as
>needed. Plenty of experience doing RTOS and bare metal development, but
>not
>a Linux kernel expert.
>
>What context does kni_net_tx() run in? On the receive path, my
>understanding is that KNI always runs in process context on a kthread.
>I've
>been assuming that the transmit path was also in process context (albeit
>on
>the app's process), so the "Fatal exception in interrupt" is throwing me.
>
>Does kni_net_tx() ever run in interrupt (or soft-interrupt) context?
>
>Thanks,
>Jay

[dpdk-dev] [PATCH] vhost: call rte_vhost_enable_guest_notification only on enabled queues

2016-04-06 Thread Rich Lane

If the vhost PMD were configured with more queues than the guest, the old
code would segfault in rte_vhost_enable_guest_notification due to a NULL
virtqueue pointer.

Fixes: ee584e9710b9 ("vhost: add driver on top of the library")
Signed-off-by: Rich Lane 
---
 drivers/net/vhost/rte_eth_vhost.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index b1eb082..310cbef 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -265,7 +265,6 @@ new_device(struct virtio_net *dev)
vq->device = dev;
vq->internal = internal;
vq->port = eth_dev->data->port_id;
-   rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
}
for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
vq = eth_dev->data->tx_queues[i];
@@ -274,9 +273,11 @@ new_device(struct virtio_net *dev)
vq->device = dev;
vq->internal = internal;
vq->port = eth_dev->data->port_id;
-   rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
}

+   for (i = 0; i < dev->virt_qp_nb * VIRTIO_QNUM; i++)
+   rte_vhost_enable_guest_notification(dev, i, 0);
+
dev->flags |= VIRTIO_DEV_RUNNING;
dev->priv = eth_dev;
eth_dev->data->dev_link.link_status = ETH_LINK_UP;
-- 
1.9.1

93 matches

Mail list logo