[dpdk-dev] [PATCH] vhost: add back support for concurrent enqueue

2016-08-23 Thread Rich Lane
On Mon, Aug 22, 2016 at 7:16 AM, Yuanhan Liu 
wrote:

> On Thu, Aug 18, 2016 at 11:27:06AM -0700, Rich Lane wrote:
> > On Mon, Aug 15, 2016 at 7:37 PM, Yuanhan Liu <
> yuanhan.liu at linux.intel.com>
> > wrote:
> >
> > On Mon, Aug 15, 2016 at 01:00:24PM -0700, Rich Lane wrote:
> > > Concurrent enqueue is an important performance optimization when
> the
> > number
> > > of cores used for switching is different than the number of vhost
> queues.
> > > I've observed a 20% performance improvement compared to a strategy
> that
> > > binds queues to cores.
> > >
> > > The atomic cmpset is only executed when the application calls
> > > rte_vhost_enqueue_burst_mp. Benchmarks show no performance impact
> > > when not using concurrent enqueue.
> > >
> > > Mergeable RX buffers aren't supported by concurrent enqueue to
> minimize
> > > code complexity.
> >
> > I think that would break things when Mergeable rx is enabled (which
> is
> > actually enabled by default).
> >
> >
> > Would it be reasonable to return -ENOTSUP in this case, and restrict
> concurrent
> > enqueue
> > to devices where VIRTIO_NET_F_MRG_RXBUF is disabled?
> >
> > I could also add back concurrent enqueue support for mergeable RX, but I
> was
> > hoping to avoid
> > that since the mergeable codepath is already complex and wouldn't be
> used in
> > high performance
> > deployments.
>
> Another note is that, as you might also have noticed, Zhihong made a patch
> set [0] to optimize the enqueue code path (mainly the mergeable path). It
> is basically a rewrite from scratch, which removes the desc buf reservation,
> meaning it would be harder to do concurrent enqueue support based on that.
>
> [0]: [PATCH v3 0/5] vhost: optimize enqueue (Zhihong Wang, Aug 19)
>

Yes, I'd noticed that these patches would conflict. Once the vhost-cuse
removal and Zhihong's patches have been merged, I'll send a new version.


[dpdk-dev] [PATCH] vhost: add back support for concurrent enqueue

2016-08-18 Thread Rich Lane
On Mon, Aug 15, 2016 at 7:37 PM, Yuanhan Liu 
wrote:

> On Mon, Aug 15, 2016 at 01:00:24PM -0700, Rich Lane wrote:
> > Concurrent enqueue is an important performance optimization when the
> number
> > of cores used for switching is different than the number of vhost queues.
> > I've observed a 20% performance improvement compared to a strategy that
> > binds queues to cores.
> >
> > The atomic cmpset is only executed when the application calls
> > rte_vhost_enqueue_burst_mp. Benchmarks show no performance impact
> > when not using concurrent enqueue.
> >
> > Mergeable RX buffers aren't supported by concurrent enqueue to minimize
> > code complexity.
>
> I think that would break things when Mergeable rx is enabled (which is
> actually enabled by default).
>

Would it be reasonable to return -ENOTSUP in this case, and restrict
concurrent enqueue
to devices where VIRTIO_NET_F_MRG_RXBUF is disabled?

I could also add back concurrent enqueue support for mergeable RX, but I
was hoping to avoid
that since the mergeable codepath is already complex and wouldn't be used
in high performance
deployments.


> Besides that, as mentioned in last week's f2f talk, do you think adding
> a new flag RTE_VHOST_USER_CONCURRENT_ENQUEUE (for
> rte_vhost_driver_register())
> __might__ be a better idea? That could save us an API, to which I don't
> object though.
>

Sure, I can add a flag instead. That will be similar to how the rte_ring
library picks the enqueue method.


[dpdk-dev] [PATCH] vhost: add back support for concurrent enqueue

2016-08-15 Thread Rich Lane
Concurrent enqueue is an important performance optimization when the number
of cores used for switching is different than the number of vhost queues.
I've observed a 20% performance improvement compared to a strategy that
binds queues to cores.

The atomic cmpset is only executed when the application calls
rte_vhost_enqueue_burst_mp. Benchmarks show no performance impact
when not using concurrent enqueue.

Mergeable RX buffers aren't supported by concurrent enqueue to minimize
code complexity.

Partially reverts 39449e74 ("vhost: remove concurrent enqueue") and
includes a fix from "vhost: avoid reordering of used->idx and last_used_idx
updating".

Signed-off-by: Rich Lane 
---
 lib/librte_vhost/rte_vhost_version.map |  6 +++
 lib/librte_vhost/rte_virtio_net.h  | 19 +
 lib/librte_vhost/vhost-net.h   |  2 +
 lib/librte_vhost/vhost_rxtx.c  | 77 ++
 lib/librte_vhost/virtio-net.c  |  2 +
 5 files changed, 97 insertions(+), 9 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost_version.map 
b/lib/librte_vhost/rte_vhost_version.map
index 5ceaa8a..ca9d49e 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -30,3 +30,9 @@ DPDK_16.07 {
rte_vhost_get_queue_num;

 } DPDK_2.1;
+
+DPDK_16.11 {
+   global:
+
+   rte_vhost_enqueue_burst_mp;
+} DPDK_16.07;
diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index 9caa622..0f05917 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -175,6 +175,25 @@ uint16_t rte_vhost_enqueue_burst(int vid, uint16_t 
queue_id,
struct rte_mbuf **pkts, uint16_t count);

 /**
+ * This function adds buffers to the virtio device's RX virtqueue. Buffers can
+ * be received from the physical port or from another virtual device. A packet
+ * count is returned to indicate the number of packets that were successfully
+ * added to the RX queue. This version is multi-producer safe.
+ * @param vid
+ *  virtio-net device ID
+ * @param queue_id
+ *  virtio queue index in mq case
+ * @param pkts
+ *  array to contain packets to be enqueued
+ * @param count
+ *  packets num to be enqueued
+ * @return
+ *  num of packets enqueued
+ */
+uint16_t rte_vhost_enqueue_burst_mp(int vid, uint16_t queue_id,
+   struct rte_mbuf **pkts, uint16_t count);
+
+/**
  * This function gets guest buffers from the virtio device TX virtqueue,
  * construct host mbufs, copies guest buffer content to host mbufs and
  * store them in pkts to be processed.
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 38593a2..ea9c377 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -72,6 +72,8 @@ struct vhost_virtqueue {

/* Last index used on the available ring */
volatile uint16_t   last_used_idx;
+   /* Used for multiple devices reserving buffers */
+   volatile uint16_t   last_used_idx_res;
 #define VIRTIO_INVALID_EVENTFD (-1)
 #define VIRTIO_UNINITIALIZED_EVENTFD   (-2)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 08a73fd..eea810e 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -212,6 +212,47 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
return 0;
 }

+/*
+ * As many data cores may want to access available buffers
+ * they need to be reserved.
+ */
+static inline uint32_t
+reserve_avail_buf(struct vhost_virtqueue *vq, uint32_t count, bool mp,
+ uint16_t *start)
+{
+   uint16_t res_start_idx;
+   uint16_t avail_idx;
+   uint16_t free_entries;
+   int success;
+
+   count = RTE_MIN(count, (uint32_t)MAX_PKT_BURST);
+
+again:
+   res_start_idx = mp ? vq->last_used_idx_res : vq->last_used_idx;
+   avail_idx = *((volatile uint16_t *)&vq->avail->idx);
+
+   free_entries = avail_idx - res_start_idx;
+   count = RTE_MIN(count, free_entries);
+   if (count == 0)
+   return 0;
+
+   if (mp) {
+   uint16_t res_end_idx = res_start_idx + count;
+
+   /*
+    * update vq->last_used_idx_res atomically; try again if failed.
+    */
+   success = rte_atomic16_cmpset(&vq->last_used_idx_res,
+   res_start_idx, res_end_idx);
+   if (unlikely(!success))
+   goto again;
+   }
+
+   *start = res_start_idx;
+
+   return count;
+}
+
 /**
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
  * be received from the physical port or from another virtio device. A packet
@@ -221,10 +262,10 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
  */
 static inline uint32_t __attribute__((always_inline))
 virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
-  

[dpdk-dev] [PATCH] net/i40e: fix null pointer dereferences when using VMDQ+RSS

2016-08-02 Thread Rich Lane
When using VMDQ+RSS, the queue ids used by the application are not
contiguous (see i40e_pf_config_rss). Most of the driver already handled
this, but there were a few cases where it assumed all configured queues
had been setup.

Fixes: 4861cde46116 ("i40e: new poll mode driver")
Fixes: 6b4537128394 ("i40e: free queue memory when closing")
Fixes: 8e109464c022 ("i40e: allow vector Rx and Tx usage")

Signed-off-by: Rich Lane 
---
 drivers/net/i40e/i40e_rxtx.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 554d167..0556a4d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -2948,11 +2948,15 @@ i40e_dev_clear_queues(struct rte_eth_dev *dev)
PMD_INIT_FUNC_TRACE();

for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   if (!dev->data->tx_queues[i])
+   continue;
i40e_tx_queue_release_mbufs(dev->data->tx_queues[i]);
i40e_reset_tx_queue(dev->data->tx_queues[i]);
}

for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   if (!dev->data->rx_queues[i])
+   continue;
i40e_rx_queue_release_mbufs(dev->data->rx_queues[i]);
i40e_reset_rx_queue(dev->data->rx_queues[i]);
}
@@ -2966,12 +2970,16 @@ i40e_dev_free_queues(struct rte_eth_dev *dev)
PMD_INIT_FUNC_TRACE();

for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   if (!dev->data->rx_queues[i])
+   continue;
i40e_dev_rx_queue_release(dev->data->rx_queues[i]);
dev->data->rx_queues[i] = NULL;
}
dev->data->nb_rx_queues = 0;

for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   if (!dev->data->tx_queues[i])
+   continue;
i40e_dev_tx_queue_release(dev->data->tx_queues[i]);
dev->data->tx_queues[i] = NULL;
}
@@ -3154,7 +3162,7 @@ i40e_set_rx_function(struct rte_eth_dev *dev)
struct i40e_rx_queue *rxq =
dev->data->rx_queues[i];

-   if (i40e_rxq_vec_setup(rxq)) {
+   if (rxq && i40e_rxq_vec_setup(rxq)) {
ad->rx_vec_allowed = false;
break;
}
@@ -3216,7 +3224,8 @@ i40e_set_rx_function(struct rte_eth_dev *dev)
for (i = 0; i < dev->data->nb_rx_queues; i++) {
struct i40e_rx_queue *rxq = dev->data->rx_queues[i];

-   rxq->rx_using_sse = rx_using_sse;
+   if (rxq)
+   rxq->rx_using_sse = rx_using_sse;
}
}
 }
@@ -3255,7 +3264,7 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
struct i40e_tx_queue *txq =
dev->data->tx_queues[i];

-   if (i40e_txq_vec_setup(txq)) {
+   if (txq && i40e_txq_vec_setup(txq)) {
ad->tx_vec_allowed = false;
break;
}
-- 
1.9.1



[dpdk-dev] [PATCH] doc: deprecate vhost-cuse

2016-07-27 Thread Rich Lane
On Wed, Jul 27, 2016 at 1:31 AM, Thomas Monjalon 
wrote:

> > > Vhost-cuse was invented before vhost-user existed. Both are actually
> > > doing the same thing: a vhost-net implementation in user space. But they
> > > are not exactly the same thing.
> > >
> > > Firstly, vhost-cuse is harder to use; no one seems to care about it,
> > > either. Furthermore, since v2.1, a large majority of development effort
> > > has gone to vhost-user. For example, we extended the vhost-user spec to
> > > add multiple queue support. We also added vhost-user live migration in
> > > v16.04 and, most recently, vhost-user reconnect, which allows a vhost app
> > > to restart without restarting the guest. Both of them are very important
> > > features for production usage, and neither of them works for vhost-cuse.
> > >
> > > You can now see that the difference between vhost-user and vhost-cuse is
> > > big (and will get bigger and bigger as time moves forward), that you
> > > should never use vhost-cuse, and that we should drop it completely.
> > >
> > > The removal would also result in a much cleaner code base, allowing us
> > > to do all kinds of extensions more easily.
> > >
> > > So this marks vhost-cuse as deprecated in this release; it will be
> > > removed in the next release (v16.11).
> > >
> > > Signed-off-by: Yuanhan Liu 
> >
> > Acked-by: Ciara Loftus 
>
> Acked-by: Thomas Monjalon 
>

Acked-by: Rich Lane 


[dpdk-dev] [PATCH] vhost: fix segfault on bad descriptor address.

2016-07-13 Thread Rich Lane
On Wednesday, July 13, 2016, Yuanhan Liu 
wrote:

> On Wed, Jul 13, 2016 at 10:34:07AM +0300, Ilya Maximets wrote:
> > This scenario is fixed somehow, I agree. But this patch is still needed
> > to protect vhost from an untrusted VM, from a malicious or buggy virtio
> > application. Maybe we could change the commit message and resend this
> > patch as a security enhancement? What do you think?
>
> Indeed, but I'm a bit concerned about the performance regression found
> by Rich, yet I am not quite sure why it happens, though Rich claimed
> that it seems to be a problem related to compiler.


The workaround I suggested solves the performance regression. But even if
it hadn't, this is a security fix that should be merged regardless of the
performance impact.


[dpdk-dev] [PATCH] vhost: reset queue state in destroy_device

2016-07-01 Thread Rich Lane
Fixes a bug where rte_eth_vhost_get_queue_event would not return enabled queues
after a guest application restart.

Fixes: ee584e9710b9 ("vhost: add driver on top of the library")
Signed-off-by: Rich Lane 
---
 drivers/net/vhost/rte_eth_vhost.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index c5bbb87..33e9728 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -303,6 +303,7 @@ destroy_device(int vid)
struct internal_list *list;
char ifname[PATH_MAX];
unsigned i;
+   struct rte_vhost_vring_state *state;

rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
list = find_internal_resource(ifname);
@@ -345,6 +346,15 @@ destroy_device(int vid)
vq->vid = -1;
}

+   state = vring_states[eth_dev->data->port_id];
+   rte_spinlock_lock(&state->lock);
+   for (i = 0; i <= state->max_vring; i++) {
+   state->cur[i] = false;
+   state->seen[i] = false;
+   }
+   state->max_vring = 0;
+   rte_spinlock_unlock(&state->lock);
+
RTE_LOG(INFO, PMD, "Connection closed\n");

_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
-- 
1.9.1



[dpdk-dev] [PATCH v3 08/20] vhost: introduce new API to export numa node

2016-06-08 Thread Rich Lane
On Mon, Jun 6, 2016 at 8:51 PM, Yuanhan Liu 
wrote:
>
> @@ -248,14 +248,9 @@ new_device(struct virtio_net *dev)
> internal = eth_dev->data->dev_private;
>
>  #ifdef RTE_LIBRTE_VHOST_NUMA
> -   ret  = get_mempolicy(&newnode, NULL, 0, dev,
> -   MPOL_F_NODE | MPOL_F_ADDR);
> -   if (ret < 0) {
> -   RTE_LOG(ERR, PMD, "Unknown numa node\n");
> -   return -1;
> -   }
> -
> -   eth_dev->data->numa_node = newnode;
> +   newnode = rte_vhost_get_numa_node(dev->vid);
> +   if (newnode > 0)
>

Should be "newnode >= 0".

+   eth_dev->data->numa_node = newnode;
>  #endif
>


[dpdk-dev] [PATCH] vhost: fix segfault on bad descriptor address.

2016-06-02 Thread Rich Lane
On Thu, Jun 2, 2016 at 3:46 AM, Ilya Maximets 
wrote:

> Hi, Rich.
> Thank you for testing and analysing.
>
> On 01.06.2016 01:06, Rich Lane wrote:
> > On Fri, May 20, 2016 at 5:50 AM, Ilya Maximets <i.maximets at samsung.com> wrote:
> >
> > In the current implementation the guest application can reinitialize
> > vrings by executing start after stop. At the same time the host
> > application can still poll the virtqueue while the device is stopped in
> > the guest, and it will crash with a segmentation fault during vring
> > reinitialization because it dereferences bad descriptor addresses.
> >
> >
> > I see a performance regression with this patch at large packet sizes (>
> 768 bytes). rte_vhost_enqueue_burst is consuming 10% more cycles.
> Strangely, there's actually a ~1% performance improvement at small packet
> sizes.
> >
> > The regression happens with GCC 4.8.4 and 5.3.0, but not 6.1.1.
> >
> > AFAICT this is just the compiler generating bad code. One difference is
> that it's storing the offset on the stack instead of in a register. A
> workaround is to move the !desc_addr check outside the unlikely macros.
> >
> > --- a/lib/librte_vhost/vhost_rxtx.c
> > +++ b/lib/librte_vhost/vhost_rxtx.c
> > @@ -147,10 +147,10 @@ copy_mbuf_to_desc(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
> > struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0,
> 0, 0}, 0};
> >
> > desc = &vq->desc[desc_idx];
> > -   if (unlikely(desc->len < vq->vhost_hlen))
> > +   desc_addr = gpa_to_vva(dev, desc->addr);
> > +   if (unlikely(desc->len < vq->vhost_hlen || !desc_addr))
> >
> >
> >  Workaround: change to "if (unlikely(desc->len < vq->vhost_hlen) ||
> !desc_addr)".
> >
> > return -1;
> >
> >
> > -   desc_addr = gpa_to_vva(dev, desc->addr);
> > rte_prefetch0((void *)(uintptr_t)desc_addr);
> >
> > virtio_enqueue_offload(m, &virtio_hdr.hdr);
> > @@ -184,6 +184,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
> >
> > desc = &vq->desc[desc->next];
> > desc_addr   = gpa_to_vva(dev, desc->addr);
> > +   if (unlikely(!desc_addr))
> >
> >
> > Workaround: change to "if (!desc_addr)".
> >
> >
> > +   return -1;
> > +
> > desc_offset = 0;
> > desc_avail  = desc->len;
> > }
> >
>
> What about other places? Is there same issues or it's only inside
> copy_mbuf_to_desc() ?
>

Only copy_mbuf_to_desc has the issue.


[dpdk-dev] [PATCH] vhost: fix segfault on bad descriptor address.

2016-05-31 Thread Rich Lane
On Fri, May 20, 2016 at 5:50 AM, Ilya Maximets 
wrote:

> In the current implementation the guest application can reinitialize
> vrings by executing start after stop. At the same time the host
> application can still poll the virtqueue while the device is stopped in
> the guest, and it will crash with a segmentation fault during vring
> reinitialization because it dereferences bad descriptor addresses.
>

I see a performance regression with this patch at large packet sizes (> 768
bytes). rte_vhost_enqueue_burst is consuming 10% more cycles. Strangely,
there's actually a ~1% performance improvement at small packet sizes.

The regression happens with GCC 4.8.4 and 5.3.0, but not 6.1.1.

AFAICT this is just the compiler generating bad code. One difference is
that it's storing the offset on the stack instead of in a register. A
workaround is to move the !desc_addr check outside the unlikely macros.

--- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -147,10 +147,10 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
> struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0},
> 0};
>
> desc = &vq->desc[desc_idx];
> -   if (unlikely(desc->len < vq->vhost_hlen))
> +   desc_addr = gpa_to_vva(dev, desc->addr);
> +   if (unlikely(desc->len < vq->vhost_hlen || !desc_addr))
>

 Workaround: change to "if (unlikely(desc->len < vq->vhost_hlen) ||
!desc_addr)".

return -1;


> -   desc_addr = gpa_to_vva(dev, desc->addr);
> rte_prefetch0((void *)(uintptr_t)desc_addr);
>
> virtio_enqueue_offload(m, &virtio_hdr.hdr);
> @@ -184,6 +184,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
>
> desc = &vq->desc[desc->next];
> desc_addr   = gpa_to_vva(dev, desc->addr);
> +   if (unlikely(!desc_addr))
>

Workaround: change to "if (!desc_addr)".


> +   return -1;
> +
> desc_offset = 0;
> desc_avail  = desc->len;
> }
>


[dpdk-dev] [PATCH v2 00/19] vhost ABI/API refactoring

2016-05-26 Thread Rich Lane
On Thu, May 12, 2016 at 10:24 PM, Yuanhan Liu 
wrote:

> v2: - exported ifname as well to fix a vhost-pmd issue reported
>   by Rich
> - separated the big patch that introduces several new APIs
>   into some small patches.
> - updated release note
> - updated version.map
>

Tested-by: Rich Lane 
Acked-by: Rich Lane 


[dpdk-dev] [PATCH] af_packet: add byte counters

2016-05-26 Thread Rich Lane
On Thu, May 26, 2016 at 7:47 AM, Ferruh Yigit 
wrote:

> On 5/25/2016 10:03 PM, Rich Lane wrote:
> > Signed-off-by: Rich Lane 
>
> Reviewed-by: Ferruh Yigit 
>
>
> While testing this, independent from patch applied or not , I am getting
> following assertion after random time:
>
> testpmd> PANIC in rte_mbuf_raw_alloc():
> line    assert "rte_mbuf_refcnt_read(m) == 0" failed
>
> I may be doing something wrong, do you observe this Rich?
>

No, I don't see that assertion failure in testpmd or my application.

Thanks for the review!


[dpdk-dev] [PATCH] af_packet: add byte counters

2016-05-25 Thread Rich Lane
Signed-off-by: Rich Lane 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index f17bd7e..2d7f344 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -78,6 +78,7 @@ struct pkt_rx_queue {

volatile unsigned long rx_pkts;
volatile unsigned long err_pkts;
+   volatile unsigned long rx_bytes;
 };

 struct pkt_tx_queue {
@@ -90,6 +91,7 @@ struct pkt_tx_queue {

volatile unsigned long tx_pkts;
volatile unsigned long err_pkts;
+   volatile unsigned long tx_bytes;
 };

 struct pmd_internals {
@@ -131,6 +133,7 @@ eth_af_packet_rx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
uint8_t *pbuf;
struct pkt_rx_queue *pkt_q = queue;
uint16_t num_rx = 0;
+   unsigned long num_rx_bytes = 0;
unsigned int framecount, framenum;

if (unlikely(nb_pkts == 0))
@@ -167,9 +170,11 @@ eth_af_packet_rx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
/* account for the receive frame */
bufs[i] = mbuf;
num_rx++;
+   num_rx_bytes += mbuf->pkt_len;
}
pkt_q->framenum = framenum;
pkt_q->rx_pkts += num_rx;
+   pkt_q->rx_bytes += num_rx_bytes;
return num_rx;
 }

@@ -186,6 +191,7 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
struct pollfd pfd;
struct pkt_tx_queue *pkt_q = queue;
uint16_t num_tx = 0;
+   unsigned long num_tx_bytes = 0;
int i;

if (unlikely(nb_pkts == 0))
@@ -219,6 +225,7 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
ppd = (struct tpacket2_hdr *) pkt_q->rd[framenum].iov_base;

num_tx++;
+   num_tx_bytes += mbuf->pkt_len;
rte_pktmbuf_free(mbuf);
}

@@ -229,6 +236,7 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
pkt_q->framenum = framenum;
pkt_q->tx_pkts += num_tx;
pkt_q->err_pkts += nb_pkts - num_tx;
+   pkt_q->tx_bytes += num_tx_bytes;
return num_tx;
 }

@@ -287,13 +295,16 @@ eth_stats_get(struct rte_eth_dev *dev, struct 
rte_eth_stats *igb_stats)
 {
unsigned i, imax;
unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+   unsigned long rx_bytes_total = 0, tx_bytes_total = 0;
const struct pmd_internals *internal = dev->data->dev_private;

imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS ?
internal->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS);
for (i = 0; i < imax; i++) {
igb_stats->q_ipackets[i] = internal->rx_queue[i].rx_pkts;
+   igb_stats->q_ibytes[i] = internal->rx_queue[i].rx_bytes;
rx_total += igb_stats->q_ipackets[i];
+   rx_bytes_total += igb_stats->q_ibytes[i];
}

imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS ?
@@ -301,13 +312,17 @@ eth_stats_get(struct rte_eth_dev *dev, struct 
rte_eth_stats *igb_stats)
for (i = 0; i < imax; i++) {
igb_stats->q_opackets[i] = internal->tx_queue[i].tx_pkts;
igb_stats->q_errors[i] = internal->tx_queue[i].err_pkts;
+   igb_stats->q_obytes[i] = internal->tx_queue[i].tx_bytes;
tx_total += igb_stats->q_opackets[i];
tx_err_total += igb_stats->q_errors[i];
+   tx_bytes_total += igb_stats->q_obytes[i];
}

igb_stats->ipackets = rx_total;
+   igb_stats->ibytes = rx_bytes_total;
igb_stats->opackets = tx_total;
igb_stats->oerrors = tx_err_total;
+   igb_stats->obytes = tx_bytes_total;
 }

 static void
@@ -316,12 +331,15 @@ eth_stats_reset(struct rte_eth_dev *dev)
unsigned i;
struct pmd_internals *internal = dev->data->dev_private;

-   for (i = 0; i < internal->nb_queues; i++)
+   for (i = 0; i < internal->nb_queues; i++) {
internal->rx_queue[i].rx_pkts = 0;
+   internal->rx_queue[i].rx_bytes = 0;
+   }

for (i = 0; i < internal->nb_queues; i++) {
internal->tx_queue[i].tx_pkts = 0;
internal->tx_queue[i].err_pkts = 0;
+   internal->tx_queue[i].tx_bytes = 0;
}
 }

-- 
1.9.1



[dpdk-dev] [PATCH v2 6/6] vhost: add pmd client and reconnect option

2016-05-25 Thread Rich Lane
>
> @@ -817,6 +821,9 @@ rte_pmd_vhost_devinit(const char *name, const char
> *params)
> int ret = 0;
> char *iface_name;
> uint16_t queues;
> +   uint64_t flags = 0;
> +   int client_mode;
> +   int reconnect;
>

client_mode and reconnect are not initialized if the arguments aren't
passed.


[dpdk-dev] [PATCH 0/3] [RFC] vhost: micro vhost optimization

2016-05-10 Thread Rich Lane
I see a significant performance improvement with these patches, around 5%
at 64 bytes.

The one patch that didn't give any performance boost for me was "vhost:
arrange virtio_net fields for better cache sharing".

Tested-by: Rich Lane 

On Mon, May 2, 2016 at 5:46 PM, Yuanhan Liu 
wrote:

> Here is a small patch set does the micro optimization, which brings about
> 10% performance boost in my 64B packet testing, with the following topo:
>
> pkt generator <--> NIC <--> Virtio NIC
>
> Patch 1 pre updates the used ring and update them in batch. It should be
> feasible from my understanding: there will be no issue, guest driver will
> not start processing them as far as we haven't updated the "used->idx"
> yet. I could miss something though.
>
> Patch 2 saves one check for small packets (that can be hold in one desc
> buf and mbuf).
>
> Patch 3 moves several frequently used fields into one cache line, for
> better cache sharing.
>
> Note that this patch set is based on my latest vhost ABI refactoring
> patchset.
>
>
> ---
> Yuanhan Liu (3):
>   vhost: pre update used ring for Tx and Rx
>   vhost: optimize dequeue for small packets
>   vhost: arrange virtio_net fields for better cache sharing
>
>  lib/librte_vhost/vhost-net.h  |   8 +--
>  lib/librte_vhost/vhost_rxtx.c | 110
> --
>  2 files changed, 68 insertions(+), 50 deletions(-)
>
> --
> 1.9.0
>
>


[dpdk-dev] [PATCH 10/16] vhost: export vid as the only interface to applications

2016-05-10 Thread Rich Lane
On Tue, May 10, 2016 at 9:39 AM, Yuanhan Liu 
wrote:
>
> Rich, would you help try by adding following line there and
> do a test? It would be great if this patch has your Tested-by :)
>
> internal->vid = vid;
>

The problem is new_device has already returned before that point because
find_internal_resource failed.

I suggest adding a cookie parameter to rte_vhost_driver_register and
passing the cookie to each of the vhost_ops. The PMD can use pmd_internals
for the cookie and the whole internal_list can go away.

ps. Could you push git branches somewhere for these larger vhost patch
series? That would make it a lot easier to test than getting patches
individually from patchwork.


[dpdk-dev] [PATCH 10/16] vhost: export vid as the only interface to applications

2016-05-10 Thread Rich Lane
On Mon, May 2, 2016 at 3:25 PM, Yuanhan Liu 
wrote:

> With all the previous prepare works, we are just one step away from
> the final ABI refactoring. That is, to change current API to let them
> stick to vid instead of the old virtio_net dev.
>

This patch removes the only assignment to internal->vid in the PMD. It's
initialized to zero, so only the first vhost connection will work.


[dpdk-dev] [PATCH] vhost: call rte_vhost_enable_guest_notification only on enabled queues

2016-04-06 Thread Rich Lane
If the vhost PMD were configured with more queues than the guest, the old
code would segfault in rte_vhost_enable_guest_notification due to a NULL
virtqueue pointer.

Fixes: ee584e9710b9 ("vhost: add driver on top of the library")
Signed-off-by: Rich Lane 
---
 drivers/net/vhost/rte_eth_vhost.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index b1eb082..310cbef 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -265,7 +265,6 @@ new_device(struct virtio_net *dev)
vq->device = dev;
vq->internal = internal;
vq->port = eth_dev->data->port_id;
-   rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
}
for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
vq = eth_dev->data->tx_queues[i];
@@ -274,9 +273,11 @@ new_device(struct virtio_net *dev)
vq->device = dev;
vq->internal = internal;
vq->port = eth_dev->data->port_id;
-   rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
}

+   for (i = 0; i < dev->virt_qp_nb * VIRTIO_QNUM; i++)
+   rte_vhost_enable_guest_notification(dev, i, 0);
+
dev->flags |= VIRTIO_DEV_RUNNING;
dev->priv = eth_dev;
eth_dev->data->dev_link.link_status = ETH_LINK_UP;
-- 
1.9.1



[dpdk-dev] [PATCH v2] virtio: use zeroed memory for simple TX header

2016-04-04 Thread Rich Lane
For simple TX the virtio-net header must be zeroed, but it was using memory
that had been initialized with indirect descriptor tables. This resulted in
"unsupported gso type" errors from librte_vhost.

We can use the same memory for every descriptor to save cachelines in the
vswitch.

Fixes: 6dc5de3a (virtio: use indirect ring elements)
Signed-off-by: Rich Lane 
---
v1-v2:
- Use offsetof to get address of tx_hdr

 drivers/net/virtio/virtio_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 2b88efd..ef21d8e 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -377,7 +377,7 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
vq->vq_ring.desc[i + mid_idx].next = i;
vq->vq_ring.desc[i + mid_idx].addr =
vq->virtio_net_hdr_mem +
-   i * vq->hw->vtnet_hdr_size;
+   offsetof(struct virtio_tx_region, 
tx_hdr);
vq->vq_ring.desc[i + mid_idx].len =
vq->hw->vtnet_hdr_size;
vq->vq_ring.desc[i + mid_idx].flags =
-- 
1.9.1



[dpdk-dev] [PATCH] virtio: use zeroed memory for simple TX header

2016-04-04 Thread Rich Lane
On Mon, Apr 4, 2016 at 1:05 PM, Yuanhan Liu 
wrote:

> On Mon, Apr 04, 2016 at 03:13:37PM +0200, Thomas Monjalon wrote:
> > Huawei, Yuanhan, any comment?
> >
> > 2016-03-31 13:01, Rich Lane:
> > > vq->vq_ring.desc[i + mid_idx].next = i;
> > > vq->vq_ring.desc[i + mid_idx].addr =
> > > -   vq->virtio_net_hdr_mem +
> > > -   i * vq->hw->vtnet_hdr_size;
> > > +   vq->virtio_net_hdr_mem;
>
> I could be wrong, but this looks like a special case of when i == 0,
> which by no means guarantees zeroed memory? Huawei, do
> you have time to check this patch?


This bug exists because the type of the objects pointed to by
virtio_net_hdr_mem changed in 6dc5de3a (virtio: use indirect ring
elements), but because it isn't a C pointer the compiler didn't catch the
type mismatch. We could also fix it with:

vq->virtio_net_hdr_mem + i * sizeof(struct virtio_tx_region) +
offsetof(struct virtio_tx_region, tx_hdr)

Given that tx_hdr is the first member in struct virtio_tx_region, and using
a single header optimizes cache use, that simplifies to the code in my
patch. The virtio-net header is never written to by simple TX so it remains
zeroed.

I can respin the patch using offsetof if that's preferred.

Note that right now virtio simple TX is broken with DPDK vhost due to the
flood of error messages.


[dpdk-dev] [PATCH] virtio: use zeroed memory for simple TX header

2016-03-31 Thread Rich Lane
For simple TX the virtio-net header must be zeroed, but it was using memory
that had been initialized with indirect descriptor tables. This resulted in
"unsupported gso type" errors from librte_vhost.

We can use the same memory for every descriptor to save cachelines in the
vswitch.

Signed-off-by: Rich Lane 
---
 drivers/net/virtio/virtio_rxtx.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 2b88efd..1df2df6 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -376,8 +376,7 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
vq->vq_ring.avail->ring[i] = i + mid_idx;
vq->vq_ring.desc[i + mid_idx].next = i;
vq->vq_ring.desc[i + mid_idx].addr =
-   vq->virtio_net_hdr_mem +
-   i * vq->hw->vtnet_hdr_size;
+   vq->virtio_net_hdr_mem;
vq->vq_ring.desc[i + mid_idx].len =
vq->hw->vtnet_hdr_size;
vq->vq_ring.desc[i + mid_idx].flags =
-- 
1.9.1



[dpdk-dev] [PATCH v6] cfgfile: support looking up sections by index

2016-02-25 Thread Rich Lane
This is useful when sections have duplicate names.

Signed-off-by: Rich Lane 
---
v5->v6:
- Reordered sectionname argument in comment.
v4->v5:
- Reordered sectionname argument.
v3->v4:
- Added section name return value.
- Updated API docs for other functions.
v2->v3:
- Added check for index < 0.
v1->v2:
- Added new symbol to version script.

 lib/librte_cfgfile/rte_cfgfile.c   | 18 ++
 lib/librte_cfgfile/rte_cfgfile.h   | 39 ++
 lib/librte_cfgfile/rte_cfgfile_version.map |  6 +
 3 files changed, 63 insertions(+)

diff --git a/lib/librte_cfgfile/rte_cfgfile.c b/lib/librte_cfgfile/rte_cfgfile.c
index 1cd523f..75625a2 100644
--- a/lib/librte_cfgfile/rte_cfgfile.c
+++ b/lib/librte_cfgfile/rte_cfgfile.c
@@ -333,6 +333,24 @@ rte_cfgfile_section_entries(struct rte_cfgfile *cfg, const 
char *sectionname,
return i;
 }

+int
+rte_cfgfile_section_entries_by_index(struct rte_cfgfile *cfg, int index,
+   char *sectionname,
+   struct rte_cfgfile_entry *entries, int max_entries)
+{
+   int i;
+   const struct rte_cfgfile_section *sect;
+
+   if (index < 0 || index >= cfg->num_sections)
+   return -1;
+
+   sect = cfg->sections[index];
+   snprintf(sectionname, CFG_NAME_LEN, "%s", sect->name);
+   for (i = 0; i < max_entries && i < sect->num_entries; i++)
+   entries[i] = *sect->entries[i];
+   return i;
+}
+
 const char *
 rte_cfgfile_get_entry(struct rte_cfgfile *cfg, const char *sectionname,
const char *entryname)
diff --git a/lib/librte_cfgfile/rte_cfgfile.h b/lib/librte_cfgfile/rte_cfgfile.h
index d443782..834f828 100644
--- a/lib/librte_cfgfile/rte_cfgfile.h
+++ b/lib/librte_cfgfile/rte_cfgfile.h
@@ -126,6 +126,9 @@ int rte_cfgfile_has_section(struct rte_cfgfile *cfg, const 
char *sectionname);
 /**
 * Get number of entries in given config file section
 *
+* If multiple sections have the given name this function operates on the
+* first one.
+*
 * @param cfg
 *   Config file
 * @param sectionname
@@ -138,6 +141,9 @@ int rte_cfgfile_section_num_entries(struct rte_cfgfile *cfg,

 /** Get section entries as key-value pairs
 *
+* If multiple sections have the given name this function operates on the
+* first one.
+*
 * @param cfg
 *   Config file
 * @param sectionname
@@ -155,8 +161,38 @@ int rte_cfgfile_section_entries(struct rte_cfgfile *cfg,
struct rte_cfgfile_entry *entries,
int max_entries);

+/** Get section entries as key-value pairs
+*
+* The index of a section is the same as the index of its name in the
+* result of rte_cfgfile_sections. This API can be used when there are
+* multiple sections with the same name.
+*
+* @param cfg
+*   Config file
+* @param index
+*   Section index
+* @param sectionname
+*   Pre-allocated string of at least CFG_NAME_LEN characters where the
+*   section name is stored after successful invocation.
+* @param entries
+*   Pre-allocated array of at least max_entries entries where the section
+*   entries are stored as key-value pair after successful invocation
+* @param max_entries
+*   Maximum number of section entries to be stored in entries array
+* @return
+*   Number of entries populated on success, negative error code otherwise
+*/
+int rte_cfgfile_section_entries_by_index(struct rte_cfgfile *cfg,
+   int index,
+   char *sectionname,
+   struct rte_cfgfile_entry *entries,
+   int max_entries);
+
 /** Get value of the named entry in named config file section
 *
+* If multiple sections have the given name this function operates on the
+* first one.
+*
 * @param cfg
 *   Config file
 * @param sectionname
@@ -172,6 +208,9 @@ const char *rte_cfgfile_get_entry(struct rte_cfgfile *cfg,

 /** Check if given entry exists in named config file section
 *
+* If multiple sections have the given name this function operates on the
+* first one.
+*
 * @param cfg
 *   Config file
 * @param sectionname
diff --git a/lib/librte_cfgfile/rte_cfgfile_version.map 
b/lib/librte_cfgfile/rte_cfgfile_version.map
index bf6c6fd..f6a27a9 100644
--- a/lib/librte_cfgfile/rte_cfgfile_version.map
+++ b/lib/librte_cfgfile/rte_cfgfile_version.map
@@ -13,3 +13,9 @@ DPDK_2.0 {

local: *;
 };
+
+DPDK_2.3 {
+   global:
+
+   rte_cfgfile_section_entries_by_index;
+} DPDK_2.0;
-- 
1.9.1



[dpdk-dev] [PATCH v5] cfgfile: support looking up sections by index

2016-02-22 Thread Rich Lane
This is useful when sections have duplicate names.

Signed-off-by: Rich Lane 
---
v4->v5:
- Reordered sectionname argument.
v3->v4:
- Added section name return value.
- Updated API docs for other functions.
v2->v3:
- Added check for index < 0.
v1->v2:
- Added new symbol to version script.

 lib/librte_cfgfile/rte_cfgfile.c   | 18 ++
 lib/librte_cfgfile/rte_cfgfile.h   | 39 ++
 lib/librte_cfgfile/rte_cfgfile_version.map |  6 +
 3 files changed, 63 insertions(+)

diff --git a/lib/librte_cfgfile/rte_cfgfile.c b/lib/librte_cfgfile/rte_cfgfile.c
index 1cd523f..75625a2 100644
--- a/lib/librte_cfgfile/rte_cfgfile.c
+++ b/lib/librte_cfgfile/rte_cfgfile.c
@@ -333,6 +333,24 @@ rte_cfgfile_section_entries(struct rte_cfgfile *cfg, const 
char *sectionname,
return i;
 }

+int
+rte_cfgfile_section_entries_by_index(struct rte_cfgfile *cfg, int index,
+   char *sectionname,
+   struct rte_cfgfile_entry *entries, int max_entries)
+{
+   int i;
+   const struct rte_cfgfile_section *sect;
+
+   if (index < 0 || index >= cfg->num_sections)
+   return -1;
+
+   sect = cfg->sections[index];
+   snprintf(sectionname, CFG_NAME_LEN, "%s", sect->name);
+   for (i = 0; i < max_entries && i < sect->num_entries; i++)
+   entries[i] = *sect->entries[i];
+   return i;
+}
+
 const char *
 rte_cfgfile_get_entry(struct rte_cfgfile *cfg, const char *sectionname,
const char *entryname)
diff --git a/lib/librte_cfgfile/rte_cfgfile.h b/lib/librte_cfgfile/rte_cfgfile.h
index d443782..689fd66 100644
--- a/lib/librte_cfgfile/rte_cfgfile.h
+++ b/lib/librte_cfgfile/rte_cfgfile.h
@@ -126,6 +126,9 @@ int rte_cfgfile_has_section(struct rte_cfgfile *cfg, const 
char *sectionname);
 /**
 * Get number of entries in given config file section
 *
+* If multiple sections have the given name this function operates on the
+* first one.
+*
 * @param cfg
 *   Config file
 * @param sectionname
@@ -138,6 +141,9 @@ int rte_cfgfile_section_num_entries(struct rte_cfgfile *cfg,

 /** Get section entries as key-value pairs
 *
+* If multiple sections have the given name this function operates on the
+* first one.
+*
 * @param cfg
 *   Config file
 * @param sectionname
@@ -155,8 +161,38 @@ int rte_cfgfile_section_entries(struct rte_cfgfile *cfg,
struct rte_cfgfile_entry *entries,
int max_entries);

+/** Get section entries as key-value pairs
+*
+* The index of a section is the same as the index of its name in the
+* result of rte_cfgfile_sections. This API can be used when there are
+* multiple sections with the same name.
+*
+* @param cfg
+*   Config file
+* @param index
+*   Section index
+* @param entries
+*   Pre-allocated array of at least max_entries entries where the section
+*   entries are stored as key-value pair after successful invocation
+* @param max_entries
+*   Maximum number of section entries to be stored in entries array
+* @param sectionname
+*   Pre-allocated string of at least CFG_NAME_LEN characters where the
+*   section name is stored after successful invocation.
+* @return
+*   Number of entries populated on success, negative error code otherwise
+*/
+int rte_cfgfile_section_entries_by_index(struct rte_cfgfile *cfg,
+   int index,
+   char *sectionname,
+   struct rte_cfgfile_entry *entries,
+   int max_entries);
+
 /** Get value of the named entry in named config file section
 *
+* If multiple sections have the given name this function operates on the
+* first one.
+*
 * @param cfg
 *   Config file
 * @param sectionname
@@ -172,6 +208,9 @@ const char *rte_cfgfile_get_entry(struct rte_cfgfile *cfg,

 /** Check if given entry exists in named config file section
 *
+* If multiple sections have the given name this function operates on the
+* first one.
+*
 * @param cfg
 *   Config file
 * @param sectionname
diff --git a/lib/librte_cfgfile/rte_cfgfile_version.map 
b/lib/librte_cfgfile/rte_cfgfile_version.map
index bf6c6fd..f6a27a9 100644
--- a/lib/librte_cfgfile/rte_cfgfile_version.map
+++ b/lib/librte_cfgfile/rte_cfgfile_version.map
@@ -13,3 +13,9 @@ DPDK_2.0 {

local: *;
 };
+
+DPDK_2.3 {
+   global:
+
+   rte_cfgfile_section_entries_by_index;
+} DPDK_2.0;
-- 
1.9.1



[dpdk-dev] [PATCH v3] vhost: remove vhost_net_device_ops

2016-02-19 Thread Rich Lane
The indirection is unnecessary because there is only one implementation
of the vhost common code. Removing it makes the code more readable.

Signed-off-by: Rich Lane 
Acked-by: Yuanhan Liu 
---
v2->v3:
- Rebased.
v1->v2:
- Fix long lines.

 examples/vhost_xen/virtio-net.h   |  2 -
 lib/librte_vhost/vhost-net.h  | 42 +---
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  | 27 
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c |  4 +-
 lib/librte_vhost/vhost_user/vhost-net-user.c  | 23 +++
 lib/librte_vhost/vhost_user/virtio-net-user.c |  6 +-
 lib/librte_vhost/virtio-net.c | 94 +--
 7 files changed, 73 insertions(+), 125 deletions(-)

diff --git a/examples/vhost_xen/virtio-net.h b/examples/vhost_xen/virtio-net.h
index c8c5a7a..ab69726 100644
--- a/examples/vhost_xen/virtio-net.h
+++ b/examples/vhost_xen/virtio-net.h
@@ -110,6 +110,4 @@ struct virtio_net_device_ops {
void (* destroy_device) (volatile struct virtio_net *); /* Remove 
device. */
 };

-struct vhost_net_device_ops const * get_virtio_net_callbacks(void);
-
 #endif
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index affbd1a..f193a1f 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -43,8 +43,6 @@

 #include "rte_virtio_net.h"

-extern struct vhost_net_device_ops const *ops;
-
 /* Macros for printing using RTE_LOG */
 #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
 #define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
@@ -85,36 +83,28 @@ struct vhost_device_ctx {
uint64_tfh; /* Populated with fi->fh to track the device 
index. */
 };

-/*
- * Structure contains function pointers to be defined in virtio-net.c. These
- * functions are called in CUSE context and are used to configure devices.
- */
-struct vhost_net_device_ops {
-   int (*new_device)(struct vhost_device_ctx);
-   void (*destroy_device)(struct vhost_device_ctx);
-
-   void (*set_ifname)(struct vhost_device_ctx,
-   const char *if_name, unsigned int if_len);
-
-   int (*get_features)(struct vhost_device_ctx, uint64_t *);
-   int (*set_features)(struct vhost_device_ctx, uint64_t *);
+int vhost_new_device(struct vhost_device_ctx);
+void vhost_destroy_device(struct vhost_device_ctx);

-   int (*set_vring_num)(struct vhost_device_ctx, struct vhost_vring_state 
*);
-   int (*set_vring_addr)(struct vhost_device_ctx, struct vhost_vring_addr 
*);
-   int (*set_vring_base)(struct vhost_device_ctx, struct vhost_vring_state 
*);
-   int (*get_vring_base)(struct vhost_device_ctx, uint32_t, struct 
vhost_vring_state *);
+void vhost_set_ifname(struct vhost_device_ctx,
+   const char *if_name, unsigned int if_len);

-   int (*set_vring_kick)(struct vhost_device_ctx, struct vhost_vring_file 
*);
-   int (*set_vring_call)(struct vhost_device_ctx, struct vhost_vring_file 
*);
+int vhost_get_features(struct vhost_device_ctx, uint64_t *);
+int vhost_set_features(struct vhost_device_ctx, uint64_t *);

-   int (*set_backend)(struct vhost_device_ctx, struct vhost_vring_file *);
+int vhost_set_vring_num(struct vhost_device_ctx, struct vhost_vring_state *);
+int vhost_set_vring_addr(struct vhost_device_ctx, struct vhost_vring_addr *);
+int vhost_set_vring_base(struct vhost_device_ctx, struct vhost_vring_state *);
+int vhost_get_vring_base(struct vhost_device_ctx,
+   uint32_t, struct vhost_vring_state *);

-   int (*set_owner)(struct vhost_device_ctx);
-   int (*reset_owner)(struct vhost_device_ctx);
-};
+int vhost_set_vring_kick(struct vhost_device_ctx, struct vhost_vring_file *);
+int vhost_set_vring_call(struct vhost_device_ctx, struct vhost_vring_file *);

+int vhost_set_backend(struct vhost_device_ctx, struct vhost_vring_file *);

-struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
+int vhost_set_owner(struct vhost_device_ctx);
+int vhost_reset_owner(struct vhost_device_ctx);

 /*
  * Backend-specific cleanup. Defined by vhost-cuse and vhost-user.
diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c 
b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
index ae7ad8d..c613e68 100644
--- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
@@ -58,7 +58,6 @@ static const char cuse_device_name[] = "/dev/cuse";
 static const char default_cdev[] = "vhost-net";

 static struct fuse_session *session;
-struct vhost_net_device_ops const *ops;

 /*
  * Returns vhost_device_ctx from given fuse_req_t. The index is populated later
@@ -86,7 +85,7 @@ vhost_net_open(fuse_req_t req, struct fuse_file_info *fi)
struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
int err = 0;

-   err = ops->new_device(ctx);
+   err = vhost_new_device(ctx);
if (err == -1) {
fuse_reply_err(req, EPERM);
return;
@@ -108,7 +107,7 @

[dpdk-dev] [PATCH v2] vhost: remove vhost_net_device_ops

2016-02-16 Thread Rich Lane
The indirection is unnecessary because there is only one implementation
of the vhost common code. Removing it makes the code more readable.

Signed-off-by: Rich Lane 
---
v1->v2:
- Fix long lines.

 examples/vhost_xen/virtio-net.h   |  2 -
 lib/librte_vhost/vhost-net.h  | 41 +---
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  | 27 
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c |  4 +-
 lib/librte_vhost/vhost_user/vhost-net-user.c  | 23 +++
 lib/librte_vhost/vhost_user/virtio-net-user.c |  6 +-
 lib/librte_vhost/virtio-net.c | 94 +--
 7 files changed, 73 insertions(+), 124 deletions(-)

diff --git a/examples/vhost_xen/virtio-net.h b/examples/vhost_xen/virtio-net.h
index c8c5a7a..ab69726 100644
--- a/examples/vhost_xen/virtio-net.h
+++ b/examples/vhost_xen/virtio-net.h
@@ -110,6 +110,4 @@ struct virtio_net_device_ops {
void (* destroy_device) (volatile struct virtio_net *); /* Remove 
device. */
 };

-struct vhost_net_device_ops const * get_virtio_net_callbacks(void);
-
 #endif
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index c69b60b..afa9829 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -43,8 +43,6 @@

 #include "rte_virtio_net.h"

-extern struct vhost_net_device_ops const *ops;
-
 /* Macros for printing using RTE_LOG */
 #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
 #define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
@@ -85,34 +83,27 @@ struct vhost_device_ctx {
uint64_tfh; /* Populated with fi->fh to track the device 
index. */
 };

-/*
- * Structure contains function pointers to be defined in virtio-net.c. These
- * functions are called in CUSE context and are used to configure devices.
- */
-struct vhost_net_device_ops {
-   int (*new_device)(struct vhost_device_ctx);
-   void (*destroy_device)(struct vhost_device_ctx);
+int vhost_new_device(struct vhost_device_ctx);
+void vhost_destroy_device(struct vhost_device_ctx);

-   void (*set_ifname)(struct vhost_device_ctx,
-   const char *if_name, unsigned int if_len);
+void vhost_set_ifname(struct vhost_device_ctx,
+   const char *if_name, unsigned int if_len);

-   int (*get_features)(struct vhost_device_ctx, uint64_t *);
-   int (*set_features)(struct vhost_device_ctx, uint64_t *);
+int vhost_get_features(struct vhost_device_ctx, uint64_t *);
+int vhost_set_features(struct vhost_device_ctx, uint64_t *);

-   int (*set_vring_num)(struct vhost_device_ctx, struct vhost_vring_state 
*);
-   int (*set_vring_addr)(struct vhost_device_ctx, struct vhost_vring_addr 
*);
-   int (*set_vring_base)(struct vhost_device_ctx, struct vhost_vring_state 
*);
-   int (*get_vring_base)(struct vhost_device_ctx, uint32_t, struct 
vhost_vring_state *);
+int vhost_set_vring_num(struct vhost_device_ctx, struct vhost_vring_state *);
+int vhost_set_vring_addr(struct vhost_device_ctx, struct vhost_vring_addr *);
+int vhost_set_vring_base(struct vhost_device_ctx, struct vhost_vring_state *);
+int vhost_get_vring_base(struct vhost_device_ctx,
+   uint32_t, struct vhost_vring_state *);

-   int (*set_vring_kick)(struct vhost_device_ctx, struct vhost_vring_file 
*);
-   int (*set_vring_call)(struct vhost_device_ctx, struct vhost_vring_file 
*);
+int vhost_set_vring_kick(struct vhost_device_ctx, struct vhost_vring_file *);
+int vhost_set_vring_call(struct vhost_device_ctx, struct vhost_vring_file *);

-   int (*set_backend)(struct vhost_device_ctx, struct vhost_vring_file *);
-
-   int (*set_owner)(struct vhost_device_ctx);
-   int (*reset_owner)(struct vhost_device_ctx);
-};
+int vhost_set_backend(struct vhost_device_ctx, struct vhost_vring_file *);

+int vhost_set_owner(struct vhost_device_ctx);
+int vhost_reset_owner(struct vhost_device_ctx);

-struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c 
b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
index ae7ad8d..c613e68 100644
--- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
@@ -58,7 +58,6 @@ static const char cuse_device_name[] = "/dev/cuse";
 static const char default_cdev[] = "vhost-net";

 static struct fuse_session *session;
-struct vhost_net_device_ops const *ops;

 /*
  * Returns vhost_device_ctx from given fuse_req_t. The index is populated later
@@ -86,7 +85,7 @@ vhost_net_open(fuse_req_t req, struct fuse_file_info *fi)
struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
int err = 0;

-   err = ops->new_device(ctx);
+   err = vhost_new_device(ctx);
if (err == -1) {
fuse_reply_err(req, EPERM);
return;
@@ -108,7 +107,7 @@ vhost_net_release(fuse_req_t req, struct fuse_file_info *fi)
int err =

[dpdk-dev] [PATCH] vhost: remove vhost_net_device_ops

2016-02-10 Thread Rich Lane
The indirection is unnecessary because there is only one implementation of the
vhost common code. Removing it makes the code more readable.

Signed-off-by: Rich Lane 
---
 examples/vhost_xen/virtio-net.h   |  2 -
 lib/librte_vhost/vhost-net.h  | 40 +---
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  | 27 
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c |  4 +-
 lib/librte_vhost/vhost_user/vhost-net-user.c  | 23 +++
 lib/librte_vhost/vhost_user/virtio-net-user.c |  6 +-
 lib/librte_vhost/virtio-net.c | 92 ---
 7 files changed, 70 insertions(+), 124 deletions(-)

diff --git a/examples/vhost_xen/virtio-net.h b/examples/vhost_xen/virtio-net.h
index c8c5a7a..ab69726 100644
--- a/examples/vhost_xen/virtio-net.h
+++ b/examples/vhost_xen/virtio-net.h
@@ -110,6 +110,4 @@ struct virtio_net_device_ops {
void (* destroy_device) (volatile struct virtio_net *); /* Remove 
device. */
 };

-struct vhost_net_device_ops const * get_virtio_net_callbacks(void);
-
 #endif
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index c69b60b..81274df 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -43,8 +43,6 @@

 #include "rte_virtio_net.h"

-extern struct vhost_net_device_ops const *ops;
-
 /* Macros for printing using RTE_LOG */
 #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
 #define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
@@ -85,34 +83,26 @@ struct vhost_device_ctx {
uint64_tfh; /* Populated with fi->fh to track the device 
index. */
 };

-/*
- * Structure contains function pointers to be defined in virtio-net.c. These
- * functions are called in CUSE context and are used to configure devices.
- */
-struct vhost_net_device_ops {
-   int (*new_device)(struct vhost_device_ctx);
-   void (*destroy_device)(struct vhost_device_ctx);
+int vhost_new_device(struct vhost_device_ctx);
+void vhost_destroy_device(struct vhost_device_ctx);

-   void (*set_ifname)(struct vhost_device_ctx,
-   const char *if_name, unsigned int if_len);
+void vhost_set_ifname(struct vhost_device_ctx,
+   const char *if_name, unsigned int if_len);

-   int (*get_features)(struct vhost_device_ctx, uint64_t *);
-   int (*set_features)(struct vhost_device_ctx, uint64_t *);
+int vhost_get_features(struct vhost_device_ctx, uint64_t *);
+int vhost_set_features(struct vhost_device_ctx, uint64_t *);

-   int (*set_vring_num)(struct vhost_device_ctx, struct vhost_vring_state 
*);
-   int (*set_vring_addr)(struct vhost_device_ctx, struct vhost_vring_addr 
*);
-   int (*set_vring_base)(struct vhost_device_ctx, struct vhost_vring_state 
*);
-   int (*get_vring_base)(struct vhost_device_ctx, uint32_t, struct 
vhost_vring_state *);
+int vhost_set_vring_num(struct vhost_device_ctx, struct vhost_vring_state *);
+int vhost_set_vring_addr(struct vhost_device_ctx, struct vhost_vring_addr *);
+int vhost_set_vring_base(struct vhost_device_ctx, struct vhost_vring_state *);
+int vhost_get_vring_base(struct vhost_device_ctx, uint32_t, struct 
vhost_vring_state *);

-   int (*set_vring_kick)(struct vhost_device_ctx, struct vhost_vring_file 
*);
-   int (*set_vring_call)(struct vhost_device_ctx, struct vhost_vring_file 
*);
+int vhost_set_vring_kick(struct vhost_device_ctx, struct vhost_vring_file *);
+int vhost_set_vring_call(struct vhost_device_ctx, struct vhost_vring_file *);

-   int (*set_backend)(struct vhost_device_ctx, struct vhost_vring_file *);
-
-   int (*set_owner)(struct vhost_device_ctx);
-   int (*reset_owner)(struct vhost_device_ctx);
-};
+int vhost_set_backend(struct vhost_device_ctx, struct vhost_vring_file *);

+int vhost_set_owner(struct vhost_device_ctx);
+int vhost_reset_owner(struct vhost_device_ctx);

-struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c 
b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
index ae7ad8d..c613e68 100644
--- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
@@ -58,7 +58,6 @@ static const char cuse_device_name[] = "/dev/cuse";
 static const char default_cdev[] = "vhost-net";

 static struct fuse_session *session;
-struct vhost_net_device_ops const *ops;

 /*
  * Returns vhost_device_ctx from given fuse_req_t. The index is populated later
@@ -86,7 +85,7 @@ vhost_net_open(fuse_req_t req, struct fuse_file_info *fi)
struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
int err = 0;

-   err = ops->new_device(ctx);
+   err = vhost_new_device(ctx);
if (err == -1) {
fuse_reply_err(req, EPERM);
return;
@@ -108,7 +107,7 @@ vhost_net_release(fuse_req_t req, struct fuse_file_info *fi)
int err = 0;
struct vhost_device_ctx

[dpdk-dev] [PATCH v2] cfgfile: support looking up sections by index

2016-02-10 Thread Rich Lane
On Tue, Feb 2, 2016 at 7:10 AM, Dumitrescu, Cristian <
cristian.dumitrescu at intel.com> wrote:

>
>
> > -Original Message-----
> > From: Rich Lane [mailto:rich.lane at bigswitch.com]
> > Sent: Tuesday, January 19, 2016 4:42 AM
> > To: dev at dpdk.org
> > Cc: Dumitrescu, Cristian ; Panu
> Matilainen
> > 
> > Subject: [PATCH v2] cfgfile: support looking up sections by index
> >
> > This is useful when sections have duplicate names.
>
> Hi Rich,
>
> You are right, I can see this is needed to allow multiple sections with
> identical names in the same configuration file. When sections with the same
> name are not allowed, this is not needed, as the current API is
> sufficient.
>
> To me, having several sections with the same name does not look like a
> good idea; in fact it might indicate an application design flaw, as
> differentiating between sections with the same name can only be done based
> on the position of the section in the configuration file, which is an
> error-prone option. Some examples:
> 1. While maintaining a large config file, keeping a specific section at a
> fixed position quickly becomes a pain, and shifting the position of the
> section up or down can completely change the application behavior;
> 2. Using basic pre-processors (like CPP or M4) or scripts, the master
> configuration file can include other configuration files, with the
> inclusion of each decided at run-time based on application command line
> parameters, so the position of certain sections is actually not known until
> run-time.
>
> Can you provide some examples when having multiple sections with the same
> name is a key requirement?
>

My application uses a config file to assign RX/TX queues to cores. Ports
and cores have names (e.g. "core.fwd0"), but there's no reason to name
queues. The order of the queue sections is unimportant.
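As a hypothetical sketch (the section and key names below are invented for
illustration, not taken from the application), such a config file might
contain named port/core sections plus anonymous queue sections that are
distinguishable only by index:

```
; hypothetical config: names are illustrative only
[port.xe0]
pci = 0000:01:00.0

[core.fwd0]
lcore = 1

; two sections with the same name -- reachable only by index
[queue]
port = xe0
core = fwd0
direction = rx

[queue]
port = xe0
core = fwd0
direction = tx
```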


> A straight forward workaround to having multiple sections with the same
> name is to add a number to the name of each such section. Using the current
> API, all the sections with the same prefix name can be read gracefully. As
> an example, the ip_pipeline application allows multiple sections with the
> same name prefix but a different number prefix:
> PIPELINE0, PIPELINE1, ...
> LINK0, LINK1, ...
> MEMPOOL0, MEMPOOL1, ...
> RXQ0.0, RXQ0.1, RXQ1.0, ...
> TXQ0.0, TXQ0.1, TXQ1.0, ...
>
> Is there a reason why this approach is not acceptable for your application?
>

It is less usable. The person writing the config file must generate those
suffixes, which are irrelevant to what they're trying to express. Their sole
effect is to add another source of errors.

Additionally, this API allows reading a config file in linear rather than
quadratic time with respect to the number of sections. While my config files
aren't long enough to make this a requirement, it certainly seems desirable.


[dpdk-dev] [PATCH v3] cfgfile: support looking up sections by index

2016-02-10 Thread Rich Lane
This is useful when sections have duplicate names.

Signed-off-by: Rich Lane 
---
v2->v3:
- Added check for index < 0.
v1->v2:
- Added new symbol to version script.

 lib/librte_cfgfile/rte_cfgfile.c   | 16 
 lib/librte_cfgfile/rte_cfgfile.h   | 23 +++
 lib/librte_cfgfile/rte_cfgfile_version.map |  6 ++
 3 files changed, 45 insertions(+)

diff --git a/lib/librte_cfgfile/rte_cfgfile.c b/lib/librte_cfgfile/rte_cfgfile.c
index a677dad..eba0713 100644
--- a/lib/librte_cfgfile/rte_cfgfile.c
+++ b/lib/librte_cfgfile/rte_cfgfile.c
@@ -333,6 +333,22 @@ rte_cfgfile_section_entries(struct rte_cfgfile *cfg, const 
char *sectionname,
return i;
 }

+int
+rte_cfgfile_section_entries_by_index(struct rte_cfgfile *cfg, int index,
+   struct rte_cfgfile_entry *entries, int max_entries)
+{
+   int i;
+   const struct rte_cfgfile_section *sect;
+
+   if (index < 0 || index >= cfg->num_sections)
+   return -1;
+
+   sect = cfg->sections[index];
+   for (i = 0; i < max_entries && i < sect->num_entries; i++)
+   entries[i] = *sect->entries[i];
+   return i;
+}
+
 const char *
 rte_cfgfile_get_entry(struct rte_cfgfile *cfg, const char *sectionname,
const char *entryname)
diff --git a/lib/librte_cfgfile/rte_cfgfile.h b/lib/librte_cfgfile/rte_cfgfile.h
index d443782..8e69971 100644
--- a/lib/librte_cfgfile/rte_cfgfile.h
+++ b/lib/librte_cfgfile/rte_cfgfile.h
@@ -155,6 +155,29 @@ int rte_cfgfile_section_entries(struct rte_cfgfile *cfg,
struct rte_cfgfile_entry *entries,
int max_entries);

+/** Get section entries as key-value pairs
+*
+* The index of a section is the same as the index of its name in the
+* result of rte_cfgfile_sections. This API can be used when there are
+* multiple sections with the same name.
+*
+* @param cfg
+*   Config file
+* @param index
+*   Section index
+* @param entries
+*   Pre-allocated array of at least max_entries entries where the section
+*   entries are stored as key-value pair after successful invocation
+* @param max_entries
+*   Maximum number of section entries to be stored in entries array
+* @return
+*   Number of entries populated on success, negative error code otherwise
+*/
+int rte_cfgfile_section_entries_by_index(struct rte_cfgfile *cfg,
+   int index,
+   struct rte_cfgfile_entry *entries,
+   int max_entries);
+
 /** Get value of the named entry in named config file section
 *
 * @param cfg
diff --git a/lib/librte_cfgfile/rte_cfgfile_version.map 
b/lib/librte_cfgfile/rte_cfgfile_version.map
index bf6c6fd..f6a27a9 100644
--- a/lib/librte_cfgfile/rte_cfgfile_version.map
+++ b/lib/librte_cfgfile/rte_cfgfile_version.map
@@ -13,3 +13,9 @@ DPDK_2.0 {

local: *;
 };
+
+DPDK_2.3 {
+   global:
+
+   rte_cfgfile_section_entries_by_index;
+} DPDK_2.0;
-- 
1.9.1



[dpdk-dev] [PATCH v3] vhost: fix leak of fds and mmaps

2016-02-10 Thread Rich Lane
The common vhost code only supported a single mmap per device. vhost-user
worked around this by saving the address/length/fd of each mmap after the end
of the rte_virtio_memory struct. This only works if the vhost-user code frees
dev->mem, since the common code is unaware of the extra info. The
VHOST_USER_RESET_OWNER message is one situation where the common code frees
dev->mem and leaks the fds and mappings. This happens every time I shut down a
VM.

The new code calls back into the implementation (vhost-user or vhost-cuse) to
clean up these resources.

The vhost-cuse changes are only compile tested.

Signed-off-by: Rich Lane 
Acked-by: Yuanhan Liu 
---
v2->v3:
- Rename "impl" to "backend".

v1->v2:
- Call into vhost-user/vhost-cuse to free mmaps.

 lib/librte_vhost/vhost-net.h  |  6 ++
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 12 
 lib/librte_vhost/vhost_user/vhost-net-user.c  |  1 -
 lib/librte_vhost/vhost_user/virtio-net-user.c | 25 ++---
 lib/librte_vhost/vhost_user/virtio-net-user.h |  1 -
 lib/librte_vhost/virtio-net.c |  8 +---
 6 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index c69b60b..affbd1a 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -115,4 +115,10 @@ struct vhost_net_device_ops {


 struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
+
+/*
+ * Backend-specific cleanup. Defined by vhost-cuse and vhost-user.
+ */
+void vhost_backend_cleanup(struct virtio_net *dev);
+
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c 
b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
index ae2c3fa..374c884 100644
--- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
@@ -421,3 +421,15 @@ int cuse_set_backend(struct vhost_device_ctx ctx, struct 
vhost_vring_file *file)

return ops->set_backend(ctx, file);
 }
+
+void
+vhost_backend_cleanup(struct virtio_net *dev)
+{
+   /* Unmap QEMU memory file if mapped. */
+   if (dev->mem) {
+   munmap((void *)(uintptr_t)dev->mem->mapped_address,
+   (size_t)dev->mem->mapped_size);
+   free(dev->mem);
+   dev->mem = NULL;
+   }
+}
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c 
b/lib/librte_vhost/vhost_user/vhost-net-user.c
index 8b7a448..336efba 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -347,7 +347,6 @@ vserver_message_handler(int connfd, void *dat, int *remove)
close(connfd);
*remove = 1;
free(cfd_ctx);
-   user_destroy_device(ctx);
ops->destroy_device(ctx);

return;
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c 
b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 2934d1c..993ed71 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -339,21 +339,6 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
 }

 void
-user_destroy_device(struct vhost_device_ctx ctx)
-{
-   struct virtio_net *dev = get_device(ctx);
-
-   if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
-   notify_ops->destroy_device(dev);
-
-   if (dev && dev->mem) {
-   free_mem_region(dev);
-   free(dev->mem);
-   dev->mem = NULL;
-   }
-}
-
-void
 user_set_protocol_features(struct vhost_device_ctx ctx,
   uint64_t protocol_features)
 {
@@ -365,3 +350,13 @@ user_set_protocol_features(struct vhost_device_ctx ctx,

dev->protocol_features = protocol_features;
 }
+
+void
+vhost_backend_cleanup(struct virtio_net *dev)
+{
+   if (dev->mem) {
+   free_mem_region(dev);
+   free(dev->mem);
+   dev->mem = NULL;
+   }
+}
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h 
b/lib/librte_vhost/vhost_user/virtio-net-user.h
index b82108d..1140ee1 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.h
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -55,5 +55,4 @@ int user_get_vring_base(struct vhost_device_ctx, struct 
vhost_vring_state *);
 int user_set_vring_enable(struct vhost_device_ctx ctx,
  struct vhost_vring_state *state);

-void user_destroy_device(struct vhost_device_ctx);
 #endif
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index de78a0f..cf2560e 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -199,13 +199,7 @@ cleanup_device(struct virtio_net *dev, int destroy)
 {
uint32_t i;

-   /* Unmap QEMU memory file if mapped. */
-   if (dev->mem) {

[dpdk-dev] [PATCH v6 0/2] Add VHOST PMD

2016-02-02 Thread Rich Lane
Looks good. I tested the queue state change code in particular and didn't
find any problems.

Reviewed-by: Rich Lane 
Tested-by: Rich Lane 


[dpdk-dev] [PATCH v2 1/1] vhost: fix leak of fds and mmaps

2016-01-19 Thread Rich Lane
On Sun, Jan 17, 2016 at 11:58 PM, Yuanhan Liu 
wrote:

> On Sun, Jan 17, 2016 at 11:57:18AM -0800, Rich Lane wrote:
> > The common vhost code only supported a single mmap per device. vhost-user
> > worked around this by saving the address/length/fd of each mmap after
> the end
> > of the rte_virtio_memory struct. This only works if the vhost-user code
> frees
> > dev->mem, since the common code is unaware of the extra info. The
> > VHOST_USER_RESET_OWNER message is one situation where the common code
> frees
> > dev->mem and leaks the fds and mappings. This happens every time I shut
> down a
> > VM.
> >
> > The new code calls back into the implementation (vhost-user or
> vhost-cuse) to
> > clean up these resources.
> >
> > The vhost-cuse changes are only compile tested.
> >
> > Signed-off-by: Rich Lane 
> > ---
> > v1->v2:
> > - Call into vhost-user/vhost-cuse to free mmaps.
> >
> >  lib/librte_vhost/vhost-net.h  |  6 ++
> >  lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 12 
> >  lib/librte_vhost/vhost_user/vhost-net-user.c  |  1 -
> >  lib/librte_vhost/vhost_user/virtio-net-user.c | 25
> ++---
> >  lib/librte_vhost/vhost_user/virtio-net-user.h |  1 -
> >  lib/librte_vhost/virtio-net.c |  8 +---
> >  6 files changed, 29 insertions(+), 24 deletions(-)
> >
> > diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
> > index c69b60b..e8d7477 100644
> > --- a/lib/librte_vhost/vhost-net.h
> > +++ b/lib/librte_vhost/vhost-net.h
> > @@ -115,4 +115,10 @@ struct vhost_net_device_ops {
> >
> >
> >  struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
> > +
> > +/*
> > + * Implementation-specific cleanup. Defined by vhost-cuse and
> vhost-user.
> > + */
> > +void vhost_impl_cleanup(struct virtio_net *dev);
>
> TBH, I am not quite like "_impl_"; maybe "_backend_" is better?
>

If you have a strong preference I will change it. Let me know.


> OTOH, what I thought of has slight difference than yours: not
> necessary to export a function, but instead, call the vhost
> backend specific unmap function inside the backend itself. Say,
> call vhost_user_unmap() on RESET_OWNER and connection close.
> What do you think of that?


The munmap must be done after the notify_ops->destroy_device callback. That
means the backend can't call it before reset_owner() or destroy_device(). The
munmap could be done afterwards, but that requires saving dev->mem in the
caller in the case of destroy_device. The cleanest solution is for the vhost
common code to ask the backend to clean up at the correct time.
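
The ordering argument above can be sketched in a few lines. This is a
minimal stand-in, not the actual DPDK code: the types are hypothetical,
and vhost_backend_cleanup here just clears a pointer where the real
vhost-user/vhost-cuse backends would munmap and free. The point is that
the common cleanup path fires the application's destroy_device callback
while dev->mem is still valid, and only then hands off to the backend.

```c
#include <stddef.h>

struct virtio_net { void *mem; int destroyed; };

/* Backend-specific hook: the real vhost-user backend would call
 * free_mem_region()/munmap() here; this sketch just drops the pointer. */
static void vhost_backend_cleanup(struct virtio_net *dev)
{
    dev->mem = NULL;
}

/* Stand-in for notify_ops->destroy_device: the application callback
 * may still dereference dev->mem safely at this point. */
static void notify_destroy_device(struct virtio_net *dev)
{
    dev->destroyed = (dev->mem != NULL);
}

/* Common cleanup path: callback first, backend unmap second. */
static void cleanup_device(struct virtio_net *dev)
{
    notify_destroy_device(dev);
    vhost_backend_cleanup(dev);
}
```

Reversing the two calls in cleanup_device is exactly the bug being
discussed: the callback would run against an already-unmapped dev->mem.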


[dpdk-dev] [PATCH v2] cfgfile: support looking up sections by index

2016-01-18 Thread Rich Lane
This is useful when sections have duplicate names.

Signed-off-by: Rich Lane 
---
v1->v2:
- Added new symbol to version script.

 lib/librte_cfgfile/rte_cfgfile.c   | 16 
 lib/librte_cfgfile/rte_cfgfile.h   | 23 +++
 lib/librte_cfgfile/rte_cfgfile_version.map |  6 ++
 3 files changed, 45 insertions(+)

diff --git a/lib/librte_cfgfile/rte_cfgfile.c b/lib/librte_cfgfile/rte_cfgfile.c
index a677dad..0bb40a4 100644
--- a/lib/librte_cfgfile/rte_cfgfile.c
+++ b/lib/librte_cfgfile/rte_cfgfile.c
@@ -333,6 +333,22 @@ rte_cfgfile_section_entries(struct rte_cfgfile *cfg, const 
char *sectionname,
return i;
 }

+int
+rte_cfgfile_section_entries_by_index(struct rte_cfgfile *cfg, int index,
+   struct rte_cfgfile_entry *entries, int max_entries)
+{
+   int i;
+   const struct rte_cfgfile_section *sect;
+
+   if (index >= cfg->num_sections)
+   return -1;
+
+   sect = cfg->sections[index];
+   for (i = 0; i < max_entries && i < sect->num_entries; i++)
+   entries[i] = *sect->entries[i];
+   return i;
+}
+
 const char *
 rte_cfgfile_get_entry(struct rte_cfgfile *cfg, const char *sectionname,
const char *entryname)
diff --git a/lib/librte_cfgfile/rte_cfgfile.h b/lib/librte_cfgfile/rte_cfgfile.h
index d443782..8e69971 100644
--- a/lib/librte_cfgfile/rte_cfgfile.h
+++ b/lib/librte_cfgfile/rte_cfgfile.h
@@ -155,6 +155,29 @@ int rte_cfgfile_section_entries(struct rte_cfgfile *cfg,
struct rte_cfgfile_entry *entries,
int max_entries);

+/** Get section entries as key-value pairs
+*
+* The index of a section is the same as the index of its name in the
+* result of rte_cfgfile_sections. This API can be used when there are
+* multiple sections with the same name.
+*
+* @param cfg
+*   Config file
+* @param index
+*   Section index
+* @param entries
+*   Pre-allocated array of at least max_entries entries where the section
+*   entries are stored as key-value pair after successful invocation
+* @param max_entries
+*   Maximum number of section entries to be stored in entries array
+* @return
+*   Number of entries populated on success, negative error code otherwise
+*/
+int rte_cfgfile_section_entries_by_index(struct rte_cfgfile *cfg,
+   int index,
+   struct rte_cfgfile_entry *entries,
+   int max_entries);
+
 /** Get value of the named entry in named config file section
 *
 * @param cfg
diff --git a/lib/librte_cfgfile/rte_cfgfile_version.map 
b/lib/librte_cfgfile/rte_cfgfile_version.map
index bf6c6fd..f6a27a9 100644
--- a/lib/librte_cfgfile/rte_cfgfile_version.map
+++ b/lib/librte_cfgfile/rte_cfgfile_version.map
@@ -13,3 +13,9 @@ DPDK_2.0 {

local: *;
 };
+
+DPDK_2.3 {
+   global:
+
+   rte_cfgfile_section_entries_by_index;
+} DPDK_2.0;
-- 
1.9.1
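
A toy model of why the by-index lookup is needed, with made-up section
data (not the rte_cfgfile implementation itself): when two sections share
a name, a by-name lookup can only ever reach the first one, while a
by-index lookup reaches every section.

```c
#include <string.h>

struct section { const char *name; int value; };

static struct section sections[] = {
    { "port", 1 }, { "port", 2 }, { "global", 3 },
};
#define NSECT ((int)(sizeof(sections) / sizeof(sections[0])))

/* By-name lookup: returns the first match only, so the second "port"
 * section is unreachable (the limitation the patch works around). */
static int lookup_by_name(const char *name)
{
    for (int i = 0; i < NSECT; i++)
        if (strcmp(sections[i].name, name) == 0)
            return sections[i].value;
    return -1;
}

/* By-index lookup: mirrors rte_cfgfile_section_entries_by_index,
 * including the bounds check returning a negative error code. */
static int lookup_by_index(int index)
{
    if (index < 0 || index >= NSECT)
        return -1;
    return sections[index].value;
}
```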



[dpdk-dev] [PATCH v2 1/1] vhost: fix leak of fds and mmaps

2016-01-17 Thread Rich Lane
The common vhost code only supported a single mmap per device. vhost-user
worked around this by saving the address/length/fd of each mmap after the end
of the rte_virtio_memory struct. This only works if the vhost-user code frees
dev->mem, since the common code is unaware of the extra info. The
VHOST_USER_RESET_OWNER message is one situation where the common code frees
dev->mem and leaks the fds and mappings. This happens every time I shut down a
VM.

The new code calls back into the implementation (vhost-user or vhost-cuse) to
clean up these resources.

The vhost-cuse changes are only compile tested.

Signed-off-by: Rich Lane 
---
v1->v2:
- Call into vhost-user/vhost-cuse to free mmaps.

 lib/librte_vhost/vhost-net.h  |  6 ++
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 12 
 lib/librte_vhost/vhost_user/vhost-net-user.c  |  1 -
 lib/librte_vhost/vhost_user/virtio-net-user.c | 25 ++---
 lib/librte_vhost/vhost_user/virtio-net-user.h |  1 -
 lib/librte_vhost/virtio-net.c |  8 +---
 6 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index c69b60b..e8d7477 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -115,4 +115,10 @@ struct vhost_net_device_ops {


 struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
+
+/*
+ * Implementation-specific cleanup. Defined by vhost-cuse and vhost-user.
+ */
+void vhost_impl_cleanup(struct virtio_net *dev);
+
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c 
b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
index ae2c3fa..06bfd5f 100644
--- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
@@ -421,3 +421,15 @@ int cuse_set_backend(struct vhost_device_ctx ctx, struct 
vhost_vring_file *file)

return ops->set_backend(ctx, file);
 }
+
+void
+vhost_impl_cleanup(struct virtio_net *dev)
+{
+   /* Unmap QEMU memory file if mapped. */
+   if (dev->mem) {
+   munmap((void *)(uintptr_t)dev->mem->mapped_address,
+   (size_t)dev->mem->mapped_size);
+   free(dev->mem);
+   dev->mem = NULL;
+   }
+}
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c 
b/lib/librte_vhost/vhost_user/vhost-net-user.c
index 8b7a448..336efba 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -347,7 +347,6 @@ vserver_message_handler(int connfd, void *dat, int *remove)
close(connfd);
*remove = 1;
free(cfd_ctx);
-   user_destroy_device(ctx);
ops->destroy_device(ctx);

return;
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c 
b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 2934d1c..190679f 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -339,21 +339,6 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
 }

 void
-user_destroy_device(struct vhost_device_ctx ctx)
-{
-   struct virtio_net *dev = get_device(ctx);
-
-   if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
-   notify_ops->destroy_device(dev);
-
-   if (dev && dev->mem) {
-   free_mem_region(dev);
-   free(dev->mem);
-   dev->mem = NULL;
-   }
-}
-
-void
 user_set_protocol_features(struct vhost_device_ctx ctx,
   uint64_t protocol_features)
 {
@@ -365,3 +350,13 @@ user_set_protocol_features(struct vhost_device_ctx ctx,

dev->protocol_features = protocol_features;
 }
+
+void
+vhost_impl_cleanup(struct virtio_net *dev)
+{
+   if (dev->mem) {
+   free_mem_region(dev);
+   free(dev->mem);
+   dev->mem = NULL;
+   }
+}
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h 
b/lib/librte_vhost/vhost_user/virtio-net-user.h
index b82108d..1140ee1 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.h
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -55,5 +55,4 @@ int user_get_vring_base(struct vhost_device_ctx, struct 
vhost_vring_state *);
 int user_set_vring_enable(struct vhost_device_ctx ctx,
  struct vhost_vring_state *state);

-void user_destroy_device(struct vhost_device_ctx);
 #endif
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index de78a0f..50fc68c 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -199,13 +199,7 @@ cleanup_device(struct virtio_net *dev, int destroy)
 {
uint32_t i;

-   /* Unmap QEMU memory file if mapped. */
-   if (dev->mem) {
-   munmap((void *)(uintptr_t)dev->mem->mapped_address,
-  

[dpdk-dev] [PATCH] cfgfile: support looking up sections by index

2016-01-16 Thread Rich Lane
This is useful when sections have duplicate names.

Signed-off-by: Rich Lane 
---
 lib/librte_cfgfile/rte_cfgfile.c | 16 
 lib/librte_cfgfile/rte_cfgfile.h | 23 +++
 2 files changed, 39 insertions(+)

diff --git a/lib/librte_cfgfile/rte_cfgfile.c b/lib/librte_cfgfile/rte_cfgfile.c
index a677dad..0bb40a4 100644
--- a/lib/librte_cfgfile/rte_cfgfile.c
+++ b/lib/librte_cfgfile/rte_cfgfile.c
@@ -333,6 +333,22 @@ rte_cfgfile_section_entries(struct rte_cfgfile *cfg, const 
char *sectionname,
return i;
 }

+int
+rte_cfgfile_section_entries_by_index(struct rte_cfgfile *cfg, int index,
+   struct rte_cfgfile_entry *entries, int max_entries)
+{
+   int i;
+   const struct rte_cfgfile_section *sect;
+
+   if (index >= cfg->num_sections)
+   return -1;
+
+   sect = cfg->sections[index];
+   for (i = 0; i < max_entries && i < sect->num_entries; i++)
+   entries[i] = *sect->entries[i];
+   return i;
+}
+
 const char *
 rte_cfgfile_get_entry(struct rte_cfgfile *cfg, const char *sectionname,
const char *entryname)
diff --git a/lib/librte_cfgfile/rte_cfgfile.h b/lib/librte_cfgfile/rte_cfgfile.h
index d443782..8e69971 100644
--- a/lib/librte_cfgfile/rte_cfgfile.h
+++ b/lib/librte_cfgfile/rte_cfgfile.h
@@ -155,6 +155,29 @@ int rte_cfgfile_section_entries(struct rte_cfgfile *cfg,
struct rte_cfgfile_entry *entries,
int max_entries);

+/** Get section entries as key-value pairs
+*
+* The index of a section is the same as the index of its name in the
+* result of rte_cfgfile_sections. This API can be used when there are
+* multiple sections with the same name.
+*
+* @param cfg
+*   Config file
+* @param index
+*   Section index
+* @param entries
+*   Pre-allocated array of at least max_entries entries where the section
+*   entries are stored as key-value pair after successful invocation
+* @param max_entries
+*   Maximum number of section entries to be stored in entries array
+* @return
+*   Number of entries populated on success, negative error code otherwise
+*/
+int rte_cfgfile_section_entries_by_index(struct rte_cfgfile *cfg,
+   int index,
+   struct rte_cfgfile_entry *entries,
+   int max_entries);
+
 /** Get value of the named entry in named config file section
 *
 * @param cfg
-- 
1.9.1



[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-12 Thread Rich Lane
See my reply to "mem: add API to obstain memory-backed file info" for a
workaround. With fixes for that and the TUNSETVNETHDRSZ issue I was able to
get traffic running over vhost-user.


[dpdk-dev] [PATCH 2/4] mem: add API to obstain memory-backed file info

2016-01-11 Thread Rich Lane
On Sun, Jan 10, 2016 at 3:43 AM, Jianfeng Tan 
wrote:

> @@ -1157,6 +1180,20 @@ rte_eal_hugepage_init(void)
> mcfg->memseg[0].len = internal_config.memory;
> mcfg->memseg[0].socket_id = socket_id;
>
> +   hugepage = create_shared_memory(eal_hugepage_info_path(),
> +   sizeof(struct hugepage_file));
> +   hugepage->orig_va = addr;
> +   hugepage->final_va = addr;
> +   hugepage->physaddr = rte_mem_virt2phy(addr);
> +   hugepage->size = pagesize;
>

Should this be "hugepage->size = internal_config.memory"? Otherwise the
vhost-user memtable entry has a size of only 2MB.


[dpdk-dev] [PATCH] vhost: fix leak of fds and mmaps

2016-01-05 Thread Rich Lane
The common vhost code only supported a single mmap per device. vhost-user
worked around this by saving the address/length/fd of each mmap after the end
of the rte_virtio_memory struct. This only works if the vhost-user code frees
dev->mem, since the common code is unaware of the extra info. The
VHOST_USER_RESET_OWNER message is one situation where the common code frees
dev->mem and leaks the fds and mappings. This happens every time I shut down a
VM.

The new code does not keep the fds around since they aren't required for
munmap. It saves the address/length in a new structure which is read by the
common code.

The vhost-cuse changes are only compile tested.

Signed-off-by: Rich Lane 
---
 lib/librte_vhost/rte_virtio_net.h | 14 +++--
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 24 ---
 lib/librte_vhost/vhost_user/virtio-net-user.c | 90 ---
 lib/librte_vhost/virtio-net.c | 24 ++-
 lib/librte_vhost/virtio-net.h |  3 +
 5 files changed, 75 insertions(+), 80 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index 10dcb90..5233879 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -144,16 +144,22 @@ struct virtio_memory_regions {
uint64_taddress_offset; /**< Offset of region for 
address translation. */
 };

+/**
+ * Record a memory mapping so that it can be munmap'd later.
+ */
+struct virtio_memory_mapping {
+   void *addr;
+   size_t length;
+};

 /**
  * Memory structure includes region and mapping information.
  */
 struct virtio_memory {
-   uint64_tbase_address;   /**< Base QEMU userspace address of the 
memory file. */
-   uint64_tmapped_address; /**< Mapped address of memory file base 
in our applications memory space. */
-   uint64_tmapped_size;/**< Total size of memory file. */
uint32_tnregions;   /**< Number of memory regions. */
-   struct virtio_memory_regions  regions[0]; /**< Memory region 
information. */
+   uint32_tnmappings;  /**< Number of memory mappings */
+   struct virtio_memory_regionsregions[VHOST_MEMORY_MAX_NREGIONS]; 
/**< Memory region information. */
+   struct virtio_memory_mappingmappings[VHOST_MEMORY_MAX_NREGIONS]; 
/**< Memory mappings */
 };

 /**
diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c 
b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
index ae2c3fa..1cd0c52 100644
--- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
@@ -278,15 +278,20 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
if (dev == NULL)
return -1;

-   if (dev->mem && dev->mem->mapped_address) {
-   munmap((void *)(uintptr_t)dev->mem->mapped_address,
-   (size_t)dev->mem->mapped_size);
-   free(dev->mem);
+   if (nregions > VHOST_MEMORY_MAX_NREGIONS) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "(%"PRIu64") Too many memory regions (%u, max %u)\n",
+   dev->device_fh, nregions,
+   VHOST_MEMORY_MAX_NREGIONS);
+   return -1;
+   }
+
+   if (dev->mem) {
+   rte_vhost_free_mem(dev->mem);
dev->mem = NULL;
}

-   dev->mem = calloc(1, sizeof(struct virtio_memory) +
-   sizeof(struct virtio_memory_regions) * nregions);
+   dev->mem = calloc(1, sizeof(*dev->mem));
if (dev->mem == NULL) {
RTE_LOG(ERR, VHOST_CONFIG,
"(%"PRIu64") Failed to allocate memory for dev->mem\n",
@@ -325,9 +330,10 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
dev->mem = NULL;
return -1;
}
-   dev->mem->mapped_address = mapped_address;
-   dev->mem->base_address = base_address;
-   dev->mem->mapped_size = mapped_size;
+
+   rte_vhost_add_mapping(dev->mem,
+   (void *)(uintptr_t)mapped_address,
+   mapped_size);
}
}

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c 
b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 2934d1c..492927a 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -48,18 +48,6 @@
 #include "vhost-net-user.h"
 #include "vhost-net.h"

-struct orig_region_map {
-   int fd;
-   uint64_t mapped_address;
-   uint64_t mapped_size;
-   uint64_t blksz;
-};
-
-#define orig_region(ptr, nregions) \
-  

[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-28 Thread Rich Lane
On Wed, Dec 23, 2015 at 11:58 PM, Tetsuya Mukawa  wrote:
>
> Hi Rich and Yuanhan,
>
> I guess we have 2 implementations here.
>
> 1. rte_eth_vhost_get_queue_event() returns each event.
> 2. rte_eth_vhost_get_queue_status() returns current status of the queues.
>
> I guess option "2" is more generic manner to handle interrupts from
> device driver.
> In the case of option "1", if DPDK application doesn't call
> rte_eth_vhost_get_queue_event(), the vhost PMD needs to keep all events.
> This may exhaust memory.
>

Option 1 can be implemented in constant space by only tracking the latest
state of each queue. I pushed a rough implementation to the
vhost-queue-callback branch of https://github.com/rlane/dpdk.
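
The constant-space scheme can be sketched as below. The names are
hypothetical (this is not the code from the branch above): each queue
keeps only its latest enable state plus a "changed" bit, so repeated
toggles overwrite in place and memory never grows with the event count.

```c
#include <stdbool.h>

#define MAX_QUEUES 8

struct queue_state {
    bool enabled;
    bool changed;   /* set by the vhost thread, cleared by the app */
};

static struct queue_state rx_queues[MAX_QUEUES];

/* Called from the vring_state_changed callback: later events simply
 * overwrite earlier ones for the same queue. */
static void record_event(int qid, bool enable)
{
    rx_queues[qid].enabled = enable;
    rx_queues[qid].changed = true;
}

/* Polled by the application: returns a queue id with a pending state
 * change (clearing its changed bit), or -1 when nothing is pending. */
static int get_queue_event(bool *enable)
{
    for (int q = 0; q < MAX_QUEUES; q++) {
        if (rx_queues[q].changed) {
            rx_queues[q].changed = false;
            *enable = rx_queues[q].enabled;
            return q;
        }
    }
    return -1;
}
```

A real implementation would also need the writer/reader accesses to be
made thread-safe (e.g. with atomics), which this sketch omits.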

> One more example is current link status interrupt handling.
> Actually ethdev API just returns current status of the port.
> What do you think?
>

Option 2 adds a small burden to the application but I'm fine with this as
long as it's thread-safe (see my comments below).


> > An issue with having the application dig into struct virtio_net is that
> it
> > can only be safely accessed from
> > a callback on the vhost thread.
>
> Here is one of example how to invoke a callback handler registered by
> DPDK application from the PMD.
>
>   _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC);
>
> Above function is called by interrupt handling thread of the PMDs.
>
> Please check implementation of above function.
> A callback handler that DPDK application registers is called in
> "interrupt handling context".
> (I mean the interrupt handling thread of the PMD calls the callback
> handler of DPDK application also.)
> Anyway, I guess the callback handler of DPDK application can access to
> "struct virtio_net" safely.


> > A typical application running its control
> > plane on lcore 0 would need to
> > copy all the relevant info from struct virtio_net before sending it over.
>
> Could you please describe it more?
> Sorry, probably I don't understand correctly which restriction make you
> copy data.
> (As described above, the callback handler registered by DPDK application
> can safely access "to struct virtio_net". Does this solve the copy issue?)
>

The ethdev event callback can safely access struct virtio_net, yes. The
problem is that a real application will likely want to handle queue state
changes as part of its main event loop running on a separate thread.
Because no other thread can safely access struct virtio_net, the callback
would need to copy the queue states out of struct virtio_net into another
data structure before sending it to the main thread.

Instead of having the callback muck around with struct virtio_net, I would
prefer an API that I could call from any thread to get the current queue
states. This also removes struct virtio_net from the PMD's API, which I
think is a win.


> > As you mentioned, queues for a single vhost port could be located on
> > different NUMA nodes. I think this
> > is an uncommon scenario but if needed you could add an API to retrieve
> the
> > NUMA node for a given port
> > and queue.
> >
>
> I agree this is very specific for vhost, because in the case of generic
> PCI device, all queues of a port are on same NUMA node.
> Anyway, because it's very specific for vhost, I am not sure we should
> add ethdev API to handle this.
>
> If we handle it by vhost PMD API, we probably have 2 options also here.
>
> 1. Extend "struct rte_eth_vhost_queue_event , and use
> rte_eth_vhost_get_queue_event() like you described.
> struct rte_eth_vhost_queue_event
> {
> uint16_t queue_id;
> bool rx;
> bool enable;
> +  int socket_id;
> };
>
> 2. rte_eth_vhost_get_queue_status() returns current socket_ids of all
> queues.
>

Up to you, but I think we can skip this for the time being because it
would be unusual for a guest to place virtqueues for one PCI device on
different NUMA nodes.


[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-23 Thread Rich Lane
On Wed, Dec 23, 2015 at 7:09 PM, Tetsuya Mukawa  wrote:

> On 2015/12/22 13:47, Rich Lane wrote:
> > On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <
> yuanhan.liu at linux.intel.com>
> > wrote:
> >
> >> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> >>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
> >> in a few
> >>> ways:
> >> Rich, thanks for the info!
> >>
> >>> 1. new_device/destroy_device: Link state change (will be covered by the
> >> link
> >>> status interrupt).
> >>> 2. new_device: Add first queue to datapath.
> >> I'm wondering why vring_state_changed() is not used, as it will also be
> >> triggered at the beginning, when the default queue (the first queue) is
> >> enabled.
> >>
> > Turns out I'd misread the code and it's already using the
> > vring_state_changed callback for the
> > first queue. Not sure if this is intentional but vring_state_changed is
> > called for the first queue
> > before new_device.
> >
> >
> >>> 3. vring_state_changed: Add/remove queue to datapath.
> >>> 4. destroy_device: Remove all queues (vring_state_changed is not called
> >> when
> >>> qemu is killed).
> >> I had a plan to invoke vring_state_changed() to disable all vrings
> >> when destroy_device() is called.
> >>
> > That would be good.
> >
> >
> >>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
> >> You can get the 'struct virtio_net' dev from all above callbacks.
> >
> >
> >> 1. Link status interrupt.
> >>
> >> To vhost pmd, new_device()/destroy_device() equals to the link status
> >> interrupt, where new_device() is a link up, and destroy_device() is link
> >> down().
> >>
> >>
> >>> 2. New queue_state_changed callback. Unlike vring_state_changed this
> >> should
> >>> cover the first queue at new_device and removal of all queues at
> >>> destroy_device.
> >> As stated above, vring_state_changed() should be able to do that, except
> >> the one on destroy_device(), which is not done yet.
> >>
> >>> 3. Per-queue or per-device NUMA node info.
> >> You can query the NUMA node info implicitly by get_mempolicy(); check
> >> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
> >>
> > Your suggestions are exactly how my application is already working. I was
> > commenting on the
> > proposed changes to the vhost PMD API. I would prefer to
> > use RTE_ETH_EVENT_INTR_LSC
> > and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
> > of these vhost-specific
> > hacks. The queue state change callback is the one new API that needs to
> be
> > added because
> > normal NICs don't have this behavior.
> >
> > You could add another rte_eth_event_type for the queue state change
> > callback, and pass the
> > queue ID, RX/TX direction, and enable bit through cb_arg.
>
> Hi Rich,
>
> So far, EAL provides rte_eth_dev_callback_register() for event handling.
> DPDK app can register callback handler and "callback argument".
> And EAL will call callback handler with the argument.
> Anyway, vhost library and PMD cannot change the argument.
>

You're right, I'd mistakenly thought that the PMD controlled the void *
passed to the callback.

Here's a thought:

struct rte_eth_vhost_queue_event {
uint16_t queue_id;
bool rx;
bool enable;
};

int rte_eth_vhost_get_queue_event(uint8_t port_id, struct
rte_eth_vhost_queue_event *event);

On receiving the ethdev event the application could repeatedly call
rte_eth_vhost_get_queue_event
to find out what happened.

An issue with having the application dig into struct virtio_net is that it
can only be safely accessed from a callback on the vhost thread. A typical
application running its control plane on lcore 0 would need to copy all
the relevant info from struct virtio_net before sending it over.

As you mentioned, queues for a single vhost port could be located on
different NUMA nodes. I think this is an uncommon scenario, but if needed
you could add an API to retrieve the NUMA node for a given port and queue.


[dpdk-dev] [PATCH] i40e: fix inverted check for ETH_TXQ_FLAGS_NOREFCOUNT

2015-12-23 Thread Rich Lane
The no-refcount path was being taken without the application opting in to it.

Reported-by: Mike Stolarchuk 
Signed-off-by: Rich Lane 
---
 drivers/net/i40e/i40e_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 39d94ec..d0bdeb9 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1762,7 +1762,7 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
for (i = 0; i < txq->tx_rs_thresh; i++)
rte_prefetch0((txep + i)->mbuf);

-   if (!(txq->txq_flags & (uint32_t)ETH_TXQ_FLAGS_NOREFCOUNT)) {
+   if (txq->txq_flags & (uint32_t)ETH_TXQ_FLAGS_NOREFCOUNT) {
for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
rte_mempool_put(txep->mbuf->pool, txep->mbuf);
txep->mbuf = NULL;
-- 
1.9.1



[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-22 Thread Rich Lane
On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu 
wrote:

> On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
> > The queue state change callback is the one new API that needs to be
> > added because
> > normal NICs don't have this behavior.
>
> Again I'd ask, will vring_state_changed() be enough, when above issues
> are resolved: vring_state_changed() will be invoked at new_device()/
> destroy_device(), and of course, ethtool change?


It would be sufficient. It is not a great API, though, because it requires
the application to do the conversion from struct virtio_net to a DPDK port
number, and from a virtqueue index to a DPDK queue id and direction. Also,
the current implementation often makes this callback when the vring state
has not actually changed (enabled -> enabled and disabled -> disabled).

If you're asking about using vring_state_changed() _instead_ of the link
status event and rte_eth_dev_socket_id(), then yes, it still works. I'd
only consider that a stopgap until the real ethdev APIs are implemented.

I'd suggest to add RTE_ETH_EVENT_QUEUE_STATE_CHANGE rather than
create another callback registration API.

Perhaps we could merge the basic PMD which I think is pretty solid and then
continue the API discussion with patches to it.


[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-21 Thread Rich Lane
On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu 
wrote:

> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> > I'm using the vhost callbacks and struct virtio_net with the vhost PMD
> in a few
> > ways:
>
> Rich, thanks for the info!
>
> >
> > 1. new_device/destroy_device: Link state change (will be covered by the
> link
> > status interrupt).
> > 2. new_device: Add first queue to datapath.
>
> I'm wondering why vring_state_changed() is not used, as it will also be
> triggered at the beginning, when the default queue (the first queue) is
> enabled.
>

Turns out I'd misread the code and it's already using the
vring_state_changed callback for the first queue. Not sure if this is
intentional, but vring_state_changed is called for the first queue before
new_device.


> > 3. vring_state_changed: Add/remove queue to datapath.
> > 4. destroy_device: Remove all queues (vring_state_changed is not called
> when
> > qemu is killed).
>
> I had a plan to invoke vring_state_changed() to disable all vrings
> when destroy_device() is called.
>

That would be good.


> > 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>
> You can get the 'struct virtio_net' dev from all above callbacks.



> 1. Link status interrupt.
>
> To vhost pmd, new_device()/destroy_device() equals to the link status
> interrupt, where new_device() is a link up, and destroy_device() is link
> down().
>
>
> > 2. New queue_state_changed callback. Unlike vring_state_changed this
> should
> > cover the first queue at new_device and removal of all queues at
> > destroy_device.
>
> As stated above, vring_state_changed() should be able to do that, except
> the one on destroy_device(), which is not done yet.
>
> > 3. Per-queue or per-device NUMA node info.
>
> You can query the NUMA node info implicitly by get_mempolicy(); check
> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
>

Your suggestions are exactly how my application is already working. I was
commenting on the proposed changes to the vhost PMD API. I would prefer to
use RTE_ETH_EVENT_INTR_LSC and rte_eth_dev_socket_id for consistency with
other NIC drivers, instead of these vhost-specific hacks. The queue state
change callback is the one new API that needs to be added, because normal
NICs don't have this behavior.

You could add another rte_eth_event_type for the queue state change
callback, and pass the queue ID, RX/TX direction, and enable bit through
cb_arg. The application would never need to touch struct virtio_net.


[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-18 Thread Rich Lane
I'm using the vhost callbacks and struct virtio_net with the vhost PMD in a
few ways:

1. new_device/destroy_device: Link state change (will be covered by the
link status interrupt).
2. new_device: Add first queue to datapath.
3. vring_state_changed: Add/remove queue to datapath.
4. destroy_device: Remove all queues (vring_state_changed is not called
when qemu is killed).
5. new_device and struct virtio_net: Determine NUMA node of the VM.

The vring_state_changed callback is necessary because the VM might not be
using the maximum number of RX queues. If I boot Linux in the VM it will
start out using one RX queue, which can be changed with ethtool. The DPDK
app in the host needs to be notified that it can start sending traffic to
the new queue.

The vring_state_changed callback is also useful for guest TX queues to
avoid reading from an inactive queue.
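As a sketch of the application side, a handler for these notifications might keep a per-vring enabled bitmap that the datapath consults before touching a queue. The bitmap and function names here are illustrative, not part of the vhost API:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative application-side bookkeeping: one bit per vring,
 * updated from vring_state_changed notifications.  The datapath
 * checks the bit before enqueuing to or dequeuing from a queue. */
static uint64_t vring_enabled_mask;

static void
on_vring_state_changed(uint16_t vring_id, int enable)
{
        if (vring_id >= 64)
                return;
        if (enable)
                vring_enabled_mask |= UINT64_C(1) << vring_id;
        else
                vring_enabled_mask &= ~(UINT64_C(1) << vring_id);
}

static int
vring_is_enabled(uint16_t vring_id)
{
        return vring_id < 64 &&
               ((vring_enabled_mask >> vring_id) & 1) != 0;
}
```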

API I'd like to have:

1. Link status interrupt.
2. New queue_state_changed callback. Unlike vring_state_changed this should
cover the first queue at new_device and removal of all queues at
destroy_device.
3. Per-queue or per-device NUMA node info.

On Thu, Dec 17, 2015 at 8:28 PM, Tetsuya Mukawa  wrote:

> On 2015/12/18 13:15, Yuanhan Liu wrote:
> > On Fri, Dec 18, 2015 at 12:15:42PM +0900, Tetsuya Mukawa wrote:
> >> On 2015/12/17 20:42, Yuanhan Liu wrote:
> >>> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
>  The vhost PMD will be a wrapper of vhost library, but some of vhost
>  library APIs cannot be mapped to ethdev library APIs.
>  Because of this, in some cases, we still need to use vhost library
> APIs
>  for a port created by the vhost PMD.
> 
>  Currently, when virtio device is created and destroyed, vhost library
>  will call one of the callback handlers. The vhost PMD needs to use this
>  pair of callback handlers to know which virtio devices are actually
>  connected.
>  Because we can register only one pair of callbacks to vhost library,
> if
>  the PMD use it, DPDK applications cannot have a way to know the
> events.
> 
>  This may break legacy DPDK applications that use the vhost library. To
> prevent
>  it, this patch adds one more pair of callbacks to vhost library
> especially
>  for the vhost PMD.
>  With the patch, legacy applications can use the vhost PMD even if
> they need
>  additional specific handling for virtio device creation and
> destruction.
> 
>  For example, legacy application can call
>  rte_vhost_enable_guest_notification() in callbacks to change setting.
> >>> TBH, I never liked it since the beginning. Introducing two callbacks
> >>> for one event is a bit messy, and therefore error prone.
> >> I agree with you.
> >>
> >>> I have been thinking about this occasionally over the last few weeks,
> >>> and have come up with something: we may introduce another layer of
> >>> callbacks based on the vhost pmd itself, via a new API:
> >>>
> >>> rte_eth_vhost_register_callback().
> >>>
> >>> And we then call those new callbacks inside the vhost pmd new_device()
> >>> and vhost pmd destroy_device() implementations.
> >>>
> >>> And we could have the same callbacks vhost has, but I'm thinking
> >>> that new_device() and destroy_device() don't sound like good names
> >>> for a PMD driver. Maybe a name like "link_state_changed" is better?
> >>>
> >>> What do you think of that?
> >> Yes,  "link_state_changed" will be good.
> >>
> >> BTW, I thought it was ok that an DPDK app that used vhost PMD called
> >> vhost library APIs directly.
> >> But perhaps that seems strange to you. Is that correct?
> > Unluckily, that's true :)
> >
> >> If so, how about implementing legacy status interrupt mechanism to vhost
> >> PMD?
> >> For example, an DPDK app can register callback handler like
> >> "examples/link_status_interrupt".
> >>
> >> Also, if the app doesn't call vhost library APIs directly,
> >> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
> >> need to handle virtio device structure anymore.
> >>
> >>>
> >>> On the other hand, I'm still thinking is that really necessary to let
> >>> the application be able to call vhost functions like
> rte_vhost_enable_guest_notification()
> >>> with the vhost PMD driver?
> >> Basic concept of my patch is that vhost PMD will provides the features
> >> that vhost library provides.
> > I don't think that's necessary. Let's just treat it as a normal pmd
> > driver, having nothing to do with vhost library.
> >
> >> How about removing rte_vhost_enable_guest_notification() from "vhost
> >> library"?
> >> (I also not sure what are use cases)
> >> If we can do this, vhost PMD also doesn't need to take care of it.
> >> Or if rte_vhost_enable_guest_notification() will be removed in the
> >> future, vhost PMD is able to ignore it.
> > You could either call it in vhost-pmd (which you already have done that),
> > or ignore it in vhost-pmd, but don't remove it from the vhost library.
> >
> >> Please let me correct up my 

[dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst

2015-12-11 Thread Rich Lane
On Wed, Dec 2, 2015 at 10:06 PM, Yuanhan Liu 
wrote:
>
> +static inline struct rte_mbuf *
> +copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> + uint16_t desc_idx, struct rte_mempool *mbuf_pool)
> +{
>
...

> +
> +   desc = &vq->desc[desc_idx];
> +   desc_addr = gpa_to_vva(dev, desc->addr);
> +   rte_prefetch0((void *)(uintptr_t)desc_addr);
> +
> +   /* Discard virtio header */
> +   desc_avail  = desc->len - vq->vhost_hlen;


If desc->len is zero (happens all the time when restarting DPDK apps in the
guest) then desc_avail will be huge.
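A hedged sketch of the guard this implies, with plain integers standing in for the real vq->vhost_hlen and desc->len fields:

```c
#include <assert.h>
#include <stdint.h>

/* If desc->len is smaller than the virtio-net header (e.g. zero after
 * a guest restart), the unsigned subtraction would wrap to a huge
 * value, so bail out instead of treating it as available data.
 * The parameters are mocks of the descriptor fields. */
static int
checked_desc_avail(uint32_t desc_len, uint32_t vhost_hlen,
                   uint32_t *desc_avail)
{
        if (desc_len < vhost_hlen)
                return -1;      /* malformed or stale descriptor */
        *desc_avail = desc_len - vhost_hlen;
        return 0;
}
```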


> +   desc_offset = vq->vhost_hlen;
> +
> +   mbuf_avail  = 0;
> +   mbuf_offset = 0;
> +   while (desc_avail || (desc->flags & VRING_DESC_F_NEXT) != 0) {

> +   /* This desc reaches its end, get the next one */
> +   if (desc_avail == 0) {
> +   desc = &vq->desc[desc->next];
>

Need to check desc->next against vq->size.

There should be a limit on the number of descriptors in a chain to prevent
an infinite loop.
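The two checks together might look like the sketch below, using a mock descriptor ring rather than the real vring structures: validate each next index against the ring size, and cap the chain length at the ring size so a cycle cannot loop forever.

```c
#include <assert.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT 1

/* Mock of a vring descriptor; only the chaining fields matter here. */
struct mock_desc {
        uint16_t flags;
        uint16_t next;
};

/* Walk a descriptor chain while validating next indices against the
 * ring size and capping the chain length, so a corrupted or malicious
 * ring cannot cause an out-of-bounds read or an infinite loop.
 * Returns the chain length, or -1 on a bad ring. */
static int
walk_desc_chain(const struct mock_desc *ring, uint16_t vq_size,
                uint16_t head)
{
        uint16_t idx = head;
        int len = 0;

        while (1) {
                if (idx >= vq_size || ++len > vq_size)
                        return -1;      /* out of range, or a cycle */
                if (!(ring[idx].flags & VRING_DESC_F_NEXT))
                        return len;
                idx = ring[idx].next;
        }
}
```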

 uint16_t
>  rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
> struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> count)
>  {
> ...
> avail_idx =  *((volatile uint16_t *)&vq->avail->idx);
> -
> -   /* If there are no available buffers then return. */
> -   if (vq->last_used_idx == avail_idx)
> +   free_entries = avail_idx - vq->last_used_idx;
> +   if (free_entries == 0)
> return 0;
>

A common problem is that avail_idx goes backwards when the guest zeroes its
virtqueues. This function could check for free_entries > vq->size and reset
vq->last_used_idx.
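The check amounts to pure 16-bit arithmetic, sketched here under the assumption that an "impossible" free-entry count signals a guest reset; the function name is illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* free_entries is computed with unsigned 16-bit wraparound, so an
 * avail index that went "backwards" (e.g. after the guest zeroed its
 * virtqueues) shows up as a value larger than the ring size.  In that
 * case resynchronize last_used_idx instead of dequeuing from a stale
 * ring. */
static uint16_t
sane_free_entries(uint16_t avail_idx, uint16_t *last_used_idx,
                  uint16_t vq_size)
{
        uint16_t free_entries = (uint16_t)(avail_idx - *last_used_idx);

        if (free_entries > vq_size) {
                *last_used_idx = avail_idx;     /* guest restarted */
                return 0;
        }
        return free_entries;
}
```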

+   LOG_DEBUG(VHOST_DATA, "(%"PRIu64") about to dequene %u buffers\n",
> +   dev->device_fh, count);
>

Typo at "dequene".


[dpdk-dev] [PATCH 2/5] vhost: refactor virtio_dev_rx

2015-12-11 Thread Rich Lane
On Wed, Dec 2, 2015 at 10:06 PM, Yuanhan Liu 
wrote:

> +static inline int __attribute__((always_inline))
> +copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
> + struct rte_mbuf *m, uint16_t desc_idx, uint32_t *copied)
> +{
> ...
> +   while (1) {
> +   /* done with current mbuf, fetch next */
> +   if (mbuf_avail == 0) {
> +   m = m->next;
> +   if (m == NULL)
> +   break;
> +
> +   mbuf_offset = 0;
> +   mbuf_avail  = rte_pktmbuf_data_len(m);
> +   }
> +
> +   /* done with current desc buf, fetch next */
> +   if (desc_avail == 0) {
> +   if ((desc->flags & VRING_DESC_F_NEXT) == 0) {
> +   /* Room in vring buffer is not enough */
> +   return -1;
> +   }
> +
> +   desc = &vq->desc[desc->next];
> +   desc_addr   = gpa_to_vva(dev, desc->addr);
> +   desc_offset = 0;
> +   desc_avail  = desc->len;
> +   }
> +
> +   COPY(desc_addr + desc_offset,
> +   rte_pktmbuf_mtod_offset(m, uint64_t, mbuf_offset));
> +   PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset),
> +cpy_len, 0);
> +   }
> +   *copied = rte_pktmbuf_pkt_len(m);
>

AFAICT m will always be NULL at this point so the call to rte_pktmbuf_pkt_len
will segfault.


[dpdk-dev] [PATCH v4 2/2] vhost: Add VHOST PMD

2015-11-20 Thread Rich Lane
On Thu, Nov 12, 2015 at 9:20 PM, Tetsuya Mukawa  wrote:

> +static uint16_t
> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
>
...

> +
> +   /* Enqueue packets to guest RX queue */
> +   nb_tx = rte_vhost_enqueue_burst(r->device,
> +   r->virtqueue_id, bufs, nb_bufs);
> +
> +   r->tx_pkts += nb_tx;
> +   r->err_pkts += nb_bufs - nb_tx;
>

I don't think a full TX queue is counted as an error by physical NIC PMDs
like ixgbe and i40e. It is counted as an error by the af_packet, pcap, and
ring PMDs. I'd suggest not counting it as an error because it's a common
and expected condition, and the application might just retry the TX later.

Are the byte counts left out because it would be a performance hit? It
seems like it would be a minimal cost given how much we're already touching
each packet.
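The suggested accounting could be sketched as follows; the queue struct and field names are illustrative stand-ins for the vhost PMD's internals, with the key point being that the shortfall goes to no error counter:

```c
#include <assert.h>
#include <stdint.h>

/* Mock TX queue stats; a full guest ring is not an error, so only
 * what was accepted is counted and the caller may retry the rest. */
struct mock_txq {
        uint64_t tx_pkts;
        uint64_t tx_bytes;
};

static uint16_t
account_tx(struct mock_txq *q, uint16_t nb_tx, const uint32_t *pkt_lens)
{
        uint16_t i;

        q->tx_pkts += nb_tx;
        for (i = 0; i < nb_tx; i++)
                q->tx_bytes += pkt_lens[i];
        /* Unsent packets are simply returned to the application;
         * nothing is added to an error counter. */
        return nb_tx;
}
```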


> +static int
> +new_device(struct virtio_net *dev)
> +{
>
...

> +
> +   if ((dev->virt_qp_nb < internal->nb_rx_queues) ||
> +   (dev->virt_qp_nb < internal->nb_tx_queues)) {
> +   RTE_LOG(INFO, PMD, "Not enough queues\n");
> +   return -1;
> +   }
>

Would it make sense to take the minimum of the guest and host queuepairs
and use that below in place of nb_rx_queues/nb_tx_queues? That way the host
can support a large maximum number of queues and each guest can choose how
many it wants to use. The host application will receive vring_state_changed
callbacks for each queue the guest activates.
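The suggestion reduces to taking a minimum, sketched below with illustrative parameter names in place of the real dev/internal fields:

```c
#include <assert.h>
#include <stdint.h>

/* Use the smaller of the guest's queue-pair count and the host's
 * configured queue counts, instead of failing when they differ. */
static uint16_t
usable_queue_pairs(uint16_t guest_qp_nb, uint16_t host_rx_queues,
                   uint16_t host_tx_queues)
{
        uint16_t host_qp = host_rx_queues < host_tx_queues ?
                           host_rx_queues : host_tx_queues;

        return guest_qp_nb < host_qp ? guest_qp_nb : host_qp;
}
```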


[dpdk-dev] [RFC PATCH 0/2] Virtio-net PMD Extension to work on host.

2015-11-19 Thread Rich Lane
What's the reason for using qemu as a middleman? Couldn't the new PMD
itself open /dev/vhost-net or the vhost-user socket and send the commands
to set up virtqueues? That was the approach taken by Jianfeng's earlier RFC.

On Thu, Nov 19, 2015 at 2:57 AM, Tetsuya Mukawa  wrote:

> THIS IS A PoC IMPLEMENTATION.
>
> [Abstraction]
>
> Normally, the virtio-net PMD only works in a VM, because there is no
> virtio-net device on the host.
> This RFC patch extends the virtio-net PMD to be able to work on the host
> as a virtual PMD.
> But we didn't implement virtio-net device as a part of virtio-net PMD.
> To prepare virtio-net device for the PMD, start QEMU process with special
> QTest mode, then connect it from virtio-net PMD through unix domain socket.
>
> The PMD can connect to anywhere QEMU virtio-net device can.
> For example, the PMD can connect to the vhost-net kernel module and to
> vhost-user backend applications.
> As with the virtio-net PMD on QEMU, the memory of an application that
> uses the virtio-net PMD will be shared with the vhost backend application,
> but the vhost backend application's memory will not be shared.
>
> Main target of this PMD is container like docker, rkt, lxc and etc.
> We can isolate related processes(virtio-net PMD process, QEMU and
> vhost-user backend process) by container.
> But, to communicate through unix domain socket, shared directory will be
> needed.
>
>
> [How to use]
>
> So far, we need QEMU patch to connect to vhost-user backend.
> Please check known issue in later section.
> Because of this, I will describe example of using vhost-net kernel module.
>
>  - Compile
>  Set "CONFIG_RTE_VIRTIO_VDEV=y" in config/common_linux.
>  Then compile it.
>
>  - Start QEMU like below.
>  $ sudo qemu-system-x86_64 -qtest unix:/tmp/qtest0,server -machine
> accel=qtest \
>-display none -qtest-log /dev/null \
>-netdev
> type=tap,script=/etc/qemu-ifup,id=net0,vhost=on \
>-device virtio-net-pci,netdev=net0 \
>-chardev
> socket,id=chr1,path=/tmp/ivshmem0,server \
>-device ivshmem,size=1G,chardev=chr1,vectors=1
>
>  - Start DPDK application like below
>  $ sudo ./testpmd -c f -n 1 -m 1024 --shm \
>
> --vdev="eth_cvio0,qtest=/tmp/qtest0,ivshmem=/tmp/ivshmem0" -- \
>   --disable-hw-vlan --txqflags=0xf00 -i
>
>  - Check created tap device.
>
> (*1) Please Specify same memory size in QEMU and DPDK command line.
>
>
> [Detailed Description]
>
>  - virtio-net device implementation
> The PMD uses QEMU virtio-net device. To do that, QEMU QTest functionality
> is used.
> QTest is a test framework of QEMU devices. It allows us to implement a
> device driver outside of QEMU.
> With QTest, we can implement DPDK application and virtio-net PMD as
> standalone process on host.
> When QEMU is invoked in QTest mode, no guest code will run.
> To know more about QTest, see below.
> http://wiki.qemu.org/Features/QTest
>
>  - probing devices
> QTest provides a unix domain socket. Through this socket, the driver process
> can access the I/O ports and memory of the QEMU virtual machine.
> The PMD will send I/O port accesses to probe pci devices.
> If we can find virtio-net and ivshmem device, initialize the devices.
> Also, I/O port accesses of virtio-net PMD will be sent through socket, and
> virtio-net PMD can initialize the virtio-net device on QEMU correctly.
>
>  - ivshmem device to share memory
> To share memory that virtio-net PMD process uses, ivshmem device will be
> used.
> Because ivshmem device can only handle one file descriptor, shared memory
> should consist of one file.
> To allocate such a memory, EAL has new option called "--shm".
> If the option is specified, EAL will open a file and allocate memory from
> hugepages.
> While initializing the ivshmem device, we can set its BAR (Base Address
> Register). It determines at which address the QEMU vcpu accesses this
> shared memory.
> We will specify the host physical address of the shared memory as this
> address.
> It is very useful because we don't need to apply patch to QEMU to
> calculate address offset.
> (For example, if virtio-net PMD process will allocate memory from shared
> memory, then specify the physical address of it to virtio-net register,
> QEMU virtio-net device can understand it without calculating address
> offset.)
>
>  - Known limitation
> So far, the PMD doesn't handle interrupts from QEMU devices.
> Because of this, VIRTIO_NET_F_STATUS functionality is dropped.
> But without it, we can use all virtio-net functions.
>
>  - Known issues
> So far, to use vhost-user, we need to apply vhost-user patch to QEMU and
> DPDK vhost library.
> This is because, QEMU will not send memory information and file descriptor
> of ivshmem device to vhost-user backend.
> (Anyway, vhost-net kernel module can receive the information. So
> vhost-user behavior will not be correct. I will submit the patch to QEMU
> soon)
> Also, we may have an issue in DPDK vhost library to 

[dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len

2015-11-17 Thread Rich Lane
On Tue, Nov 17, 2015 at 6:56 PM, Yuanhan Liu 
wrote:

> @@ -519,6 +526,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t
> queue_id,
> goto merge_rx_exit;
> } else {
> update_secure_len(vq, res_cur_idx,
> &secure_len, &vec_idx);
> +   if (secure_len == 0)
> +   goto merge_rx_exit;
> res_cur_idx++;
> }
> } while (pkt_len > secure_len);
>

I think this needs to check whether secure_len was modified. secure_len is
read-write and could have a nonzero value going into the call. It could be
cleaner to give update_secure_len a return value saying whether it was able
to reserve any buffers.
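A sketch of that interface change, with mocked descriptor lengths standing in for the real vring; the helper returns whether anything was reserved instead of callers inferring it from a possibly pre-nonzero secure_len:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative reservation helper: returns 1 if a buffer was
 * reserved (secure_len grew), 0 if nothing usable was found. */
static int
reserve_buffers(const uint32_t *desc_lens, uint16_t nb_desc,
                uint16_t idx, uint32_t *secure_len)
{
        if (idx >= nb_desc || desc_lens[idx] == 0)
                return 0;               /* nothing reserved */
        *secure_len += desc_lens[idx];
        return 1;                       /* reserved one buffer */
}
```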

Otherwise looks good, thanks!


[dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len

2015-11-17 Thread Rich Lane
On Tue, Nov 17, 2015 at 5:23 AM, Yuanhan Liu 
wrote:

> On Thu, Nov 12, 2015 at 01:46:03PM -0800, Rich Lane wrote:
> > You can reproduce this with l2fwd and the vhost PMD.
> >
> > You'll need this patch on top of the vhost PMD patches:
> > --- a/lib/librte_vhost/virtio-net.c
> > +++ b/lib/librte_vhost/virtio-net.c
> > @@ -471,7 +471,7 @@ reset_owner(struct vhost_device_ctx ctx)
> > return -1;
> >
> > if (dev->flags & VIRTIO_DEV_RUNNING)
> > -   notify_ops->destroy_device(dev);
> > +   notify_destroy_device(dev);
> >
> > cleanup_device(dev);
> > reset_device(dev);
> >
> > 1. Start l2fwd on the host: l2fwd -l 0,1 --vdev eth_null --vdev
> > eth_vhost0,iface=/run/vhost0.sock -- -p3
> > 2. Start a VM using vhost-user and set up uio, hugepages, etc.
> > 3. Start l2fwd inside the VM: l2fwd -l 0,1 --vdev eth_null -- -p3
> > 4. Kill the l2fwd inside the VM with SIGINT.
> > 5. Start l2fwd inside the VM.
> > 6. l2fwd on the host crashes.
> >
> > I found the source of the memory corruption by setting a watchpoint in
> > gdb: watch -l rte_eth_devices[1].data->rx_queues
>
> Rich,
>
> Thanks for the detailed steps for reproducing this issue, and sorry for
> being a bit late: I finally got the time to dig this issue today.
>
> Put simply, buffer overflow is not the root cause, but the fact "we do
> not release resource on stop/exit" is.
>
> And here is how the issue arises.  After step 4 (terminating l2fwd), neither
> the l2fwd nor the virtio pmd driver does any resource release. Hence,
> l2fwd at HOST will not notice such a change, still trying to receive and
> queue packets to the vhost dev. It's not an issue as long as we don't
> start l2fwd again, for there are actually no packets to forward, and
> rte_vhost_dequeue_burst returns from:
>
> 596 avail_idx =  *((volatile uint16_t *)&vq->avail->idx);
> 597
> 598 /* If there are no available buffers then return. */
> 599 if (vq->last_used_idx == avail_idx)
> 600 return 0;
>
> But just at the init stage while starting l2fwd (step 5),
> rte_eal_memory_init()
> resets all huge page memory to zero, resulting in all vq->desc[] items
> being reset to zero, which in turn ends up with secure_len being set
> with 0 at return.
>
> (BTW, I'm not quite sure why the huge page memory reset inside the VM
> would result in a vq->desc reset).
>
> The vq desc reset results in a dead loop at virtio_dev_merge_rx(),
> as update_secure_len() keeps setting secure_len with 0:
>
> 511do {
> 512avail_idx = *((volatile uint16_t
> *)&vq->avail->idx);
> 513if (unlikely(res_cur_idx == avail_idx))
> {
> 514LOG_DEBUG(VHOST_DATA,
> 515"(%"PRIu64") Failed "
> 516"to get enough desc
> from "
> 517"vring\n",
> 518dev->device_fh);
> 519goto merge_rx_exit;
> 520} else {
> 521update_secure_len(vq,
> res_cur_idx, &secure_len, &vec_idx);
> 522res_cur_idx++;
> 523}
> 524} while (pkt_len > secure_len);
>
> The dead loop causes vec_idx to keep increasing; it quickly overflows,
> leading to the crash you saw in the end.
>
> So, the following would resolve this issue, in a right way (I
> guess), and it's for virtio-pmd and l2fwd only so far.
>
> ---
> diff --git a/drivers/net/virtio/virtio_ethdev.c
> b/drivers/net/virtio/virtio_ethdev.c
> index 12fcc23..8d6bf56 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1507,9 +1507,12 @@ static void
>  virtio_dev_stop(struct rte_eth_dev *dev)
>  {
> struct rte_eth_link link;
> +   struct virtio_hw *hw = dev->data->dev_private;
>
> PMD_INIT_LOG(DEBUG, "stop");
>
> +   vtpci_reset(hw);
> +
> if (dev->data->dev_conf.intr_conf.lsc)
> rte_intr_disable(&dev->pci_dev->intr_handle);
>
> diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
> index 720fd5a..565f648 100644
> --- a/examples/l2fwd/main.c
> +++ b/examples/l2fwd/main.c
> @@ -44,6 +44,7 @@
>  #

[dpdk-dev] [PATCH v3 2/2] vhost: Add VHOST PMD

2015-11-12 Thread Rich Lane
>
> +   if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
> +   ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
> +_iface, _name);
> +   if (ret < 0)
> +   goto out_free;
> +   }
>

I noticed that the strdup in eth_dev_vhost_create crashes if you don't pass
the iface option, so this should probably return an error if the option
doesn't exist.


[dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len

2015-11-12 Thread Rich Lane
You can reproduce this with l2fwd and the vhost PMD.

You'll need this patch on top of the vhost PMD patches:
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -471,7 +471,7 @@ reset_owner(struct vhost_device_ctx ctx)
return -1;

if (dev->flags & VIRTIO_DEV_RUNNING)
-   notify_ops->destroy_device(dev);
+   notify_destroy_device(dev);

cleanup_device(dev);
reset_device(dev);

1. Start l2fwd on the host: l2fwd -l 0,1 --vdev eth_null --vdev
eth_vhost0,iface=/run/vhost0.sock -- -p3
2. Start a VM using vhost-user and set up uio, hugepages, etc.
3. Start l2fwd inside the VM: l2fwd -l 0,1 --vdev eth_null -- -p3
4. Kill the l2fwd inside the VM with SIGINT.
5. Start l2fwd inside the VM.
6. l2fwd on the host crashes.

I found the source of the memory corruption by setting a watchpoint in
gdb: watch -l rte_eth_devices[1].data->rx_queues

On Thu, Nov 12, 2015 at 1:23 AM, Yuanhan Liu 
wrote:

> On Thu, Nov 12, 2015 at 12:02:33AM -0800, Rich Lane wrote:
> > The guest could trigger this buffer overflow by creating a cycle of
> descriptors
> > (which would also cause an infinite loop). The more common case is that
> > vq->avail->idx jumps out of the range [last_used_idx,
> last_used_idx+256). This
> > happens nearly every time when restarting a DPDK app inside a VM
> connected to a
> > vhost-user vswitch because the virtqueue memory allocated by the
> previous run
> > is zeroed.
>
> Hi,
>
> I somehow was aware of this issue before while reading the code.
> Thinking that we never met that, I delayed the fix (it was still
> in my TODO list).
>
> Would you please tell me the steps (commands would be better) to
> reproduce your issue? I'd like to know more about the isue: I'm
> guessing maybe we need fix it with a bit more cares.
>
> --yliu
> >
> > Signed-off-by: Rich Lane 
> > ---
> >  lib/librte_vhost/vhost_rxtx.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_vhost/vhost_rxtx.c
> b/lib/librte_vhost/vhost_rxtx.c
> > index 9322ce6..d95b478 100644
> > --- a/lib/librte_vhost/vhost_rxtx.c
> > +++ b/lib/librte_vhost/vhost_rxtx.c
> > @@ -453,7 +453,7 @@ update_secure_len(struct vhost_virtqueue *vq,
> uint32_t id,
> >   vq->buf_vec[vec_id].desc_idx = idx;
> >   vec_id++;
> >
> > - if (vq->desc[idx].flags & VRING_DESC_F_NEXT) {
> > + if (vq->desc[idx].flags & VRING_DESC_F_NEXT && vec_id < BUF_VECTOR_MAX) {
> >   idx = vq->desc[idx].next;
> >   next_desc = 1;
> >   }
> > --
> > 1.9.1
>


[dpdk-dev] [PATCH] vhost: reset device properly

2015-11-12 Thread Rich Lane
On Wed, Nov 11, 2015 at 8:10 PM, Yuanhan Liu 
wrote:

> Currently, we reset all fields of a device to zero when reset
> happens, which is wrong, since for some fields like device_fh,
> ifname, and virt_qp_nb, they should be same and be kept after
> reset until the device is removed. And this is what's the new
> helper function reset_device() for.
>
> And use rte_zmalloc() instead of rte_malloc, so that we could
> avoid init_device(), which basically does zero reset only so far.
> Hence, init_device() is dropped in this patch.
>
> This patch also removes a hack of using the offset a specific
> field (which is virtqueue now) inside of `virtio_net' structure
> to do reset, which could be broken easily if someone changed the
> field order without caution.
>
> Cc: Tetsuya Mukawa 
> Cc: Xie Huawei 
> Signed-off-by: Yuanhan Liu 
>

I had a patch that just saved the ifname but this is much better.

Acked-by: Rich Lane 


[dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len

2015-11-12 Thread Rich Lane
The guest could trigger this buffer overflow by creating a cycle of descriptors
(which would also cause an infinite loop). The more common case is that
vq->avail->idx jumps out of the range [last_used_idx, last_used_idx+256). This
happens nearly every time when restarting a DPDK app inside a VM connected to a
vhost-user vswitch because the virtqueue memory allocated by the previous run
is zeroed.

Signed-off-by: Rich Lane 
---
 lib/librte_vhost/vhost_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 9322ce6..d95b478 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -453,7 +453,7 @@ update_secure_len(struct vhost_virtqueue *vq, uint32_t id,
vq->buf_vec[vec_id].desc_idx = idx;
vec_id++;

-   if (vq->desc[idx].flags & VRING_DESC_F_NEXT) {
+   if (vq->desc[idx].flags & VRING_DESC_F_NEXT && vec_id < BUF_VECTOR_MAX) {
idx = vq->desc[idx].next;
next_desc = 1;
}
-- 
1.9.1



[dpdk-dev] [PATCH] vhost: make destroy callback on VHOST_USER_RESET_OWNER

2015-11-09 Thread Rich Lane
QEMU sends this message first when shutting down. There was previously no way
for the dataplane to know that the virtio_net instance had become unusable and
it would segfault when trying to do RX/TX.

Signed-off-by: Rich Lane 
---
 lib/librte_vhost/virtio-net.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 14278de..39a6a5e 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -436,6 +436,9 @@ reset_owner(struct vhost_device_ctx ctx)
if (dev == NULL)
return -1;

+   if (dev->flags & VIRTIO_DEV_RUNNING)
+   notify_ops->destroy_device(dev);
+
device_fh = dev->device_fh;
cleanup_device(dev);
init_device(dev);
-- 
1.9.1



[dpdk-dev] [PATCH] default to using all cores if no -c, -l, or --lcores options given

2015-09-25 Thread Rich Lane
This is a useful default for simple applications where the assignment of lcores
to CPUs doesn't matter. It's also useful for more complex applications that
automatically assign tasks to cores based on the NUMA topology.

Signed-off-by: Rich Lane 
---
 app/test/test_eal_flags.c|  9 +++--
 doc/guides/freebsd_gsg/build_sample_apps.rst |  2 +-
 doc/guides/linux_gsg/build_sample_apps.rst   |  2 +-
 lib/librte_eal/common/eal_common_lcore.c |  3 +++
 lib/librte_eal/common/eal_common_options.c   | 11 +--
 5 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index e6f7035..e0aee2d 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -571,11 +571,16 @@ test_missing_c_flag(void)
 "-n", "3", "--lcores",
 "0-1,2@(5-7),(3-5)@(0,2),(0,6),7"};

+   if (launch_proc(argv2) != 0) {
+   printf("Error - "
+  "process did not run ok when missing -c flag\n");
+   return -1;
+   }
+
if (launch_proc(argv1) == 0
-   || launch_proc(argv2) == 0
|| launch_proc(argv3) == 0) {
printf("Error - "
-  "process ran without error when missing -c flag\n");
+  "process ran without error with invalid -c flag\n");
return -1;
}
if (launch_proc(argv4) != 0) {
diff --git a/doc/guides/freebsd_gsg/build_sample_apps.rst 
b/doc/guides/freebsd_gsg/build_sample_apps.rst
index acd0311..a89055f 100644
--- a/doc/guides/freebsd_gsg/build_sample_apps.rst
+++ b/doc/guides/freebsd_gsg/build_sample_apps.rst
@@ -114,7 +114,7 @@ The following is the list of options that can be given to 
the EAL:

 .. code-block:: console

-./rte-app -c COREMASK -n NUM [-b ] [-r NUM] [-v] 
[--proc-type <primary|secondary|auto>]
+./rte-app -n NUM [-c COREMASK] [-b ] [-r NUM] [-v] 
[--proc-type <primary|secondary|auto>]

 .. note::

diff --git a/doc/guides/linux_gsg/build_sample_apps.rst 
b/doc/guides/linux_gsg/build_sample_apps.rst
index e0de2f5..07d84df 100644
--- a/doc/guides/linux_gsg/build_sample_apps.rst
+++ b/doc/guides/linux_gsg/build_sample_apps.rst
@@ -109,7 +109,7 @@ The following is the list of options that can be given to 
the EAL:

 .. code-block:: console

-./rte-app -c COREMASK -n NUM [-b ] 
[--socket-mem=MB,...] [-m MB] [-r NUM] [-v] [--file-prefix] [--proc-type 
<primary|secondary|auto>] [-- xen-dom0]
+./rte-app -n NUM [-c COREMASK] [-b ] 
[--socket-mem=MB,...] [-m MB] [-r NUM] [-v] [--file-prefix] [--proc-type 
<primary|secondary|auto>] [-- xen-dom0]

 The EAL options are as follows:

diff --git a/lib/librte_eal/common/eal_common_lcore.c 
b/lib/librte_eal/common/eal_common_lcore.c
index 845140b..a4263ba 100644
--- a/lib/librte_eal/common/eal_common_lcore.c
+++ b/lib/librte_eal/common/eal_common_lcore.c
@@ -63,6 +63,8 @@ rte_eal_cpu_init(void)
 * ones and enable them by default.
 */
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+   lcore_config[lcore_id].core_index = count;
+
/* init cpuset for per lcore config */
CPU_ZERO(&lcore_config[lcore_id].cpuset);

@@ -70,6 +72,7 @@ rte_eal_cpu_init(void)
lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
if (lcore_config[lcore_id].detected == 0) {
config->lcore_role[lcore_id] = ROLE_OFF;
+   lcore_config[lcore_id].core_index = -1;
continue;
}

diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 1f459ac..2e40857 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -93,7 +93,6 @@ eal_long_options[] = {
{0, 0, NULL, 0}
 };

-static int lcores_parsed;
 static int master_lcore_parsed;
 static int mem_parsed;

@@ -212,7 +211,6 @@ eal_parse_coremask(const char *coremask)
return -1;
/* Update the count of enabled logical cores of the EAL configuration */
cfg->lcore_count = count;
-   lcores_parsed = 1;
return 0;
 }

@@ -279,7 +277,6 @@ eal_parse_corelist(const char *corelist)
/* Update the count of enabled logical cores of the EAL configuration */
cfg->lcore_count = count;

-   lcores_parsed = 1;
return 0;
 }

@@ -569,7 +566,6 @@ eal_parse_lcores(const char *lcores)
goto err;

cfg->lcore_count = count;
-   lcores_parsed = 1;
ret = 0;

 err:
@@ -820,11 +816,6 @@ eal_check_common_options(struct internal_config 
*internal_cfg)
 {
struct rte_confi