Re: [PATCH 2/2] virtio_balloon: free some memory from baloon on OOM

2014-10-15 Thread Denis V. Lunev

On 14/10/14 13:10, Michael S. Tsirkin wrote:

On Tue, Oct 14, 2014 at 10:14:05AM +1030, Rusty Russell wrote:

Michael S. Tsirkin m...@redhat.com writes:


On Mon, Oct 13, 2014 at 04:02:52PM +1030, Rusty Russell wrote:

Denis V. Lunev d...@parallels.com writes:

From: Raushaniya Maksudova rmaksud...@parallels.com

Excessive virtio_balloon inflation can cause invocation of the OOM-killer
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless
it is often the case that these control tools do not have enough time
to react to a fast-changing memory load. As a result the OS runs out of
memory and invokes the OOM-killer. Balancing memory via the virtio
balloon should not cause the termination of processes while there are
pages in the balloon. Currently there is no way for the virtio balloon
driver to free some memory at the last moment before a process gets
killed by the OOM-killer.


This makes some amount of sense.


This reminds me of the balloon fs that Google once proposed.
This really needs to be controlled from host though.
At the moment host does not expect guest to deflate before
requests.
So as a minimum, add a feature bit for this.  What if you want a mix of
mandatory and optional ballooning? I guess we can use multiple balloons,
is that the idea?


Trying to claw back some pages on OOM is almost certainly correct,
even if the host doesn't expect it.  It's roughly equivalent to not
giving up pages in the first place.


Well the difference is that there are management tools that
poll balloon in host until they see balloon size reaches
the expected value.

They don't expect the balloon to shrink below num_pages and will respond in various
unexpected ways, e.g. by killing the VM, if it does.
Killing a userspace process within the guest might be better
for VM health.

Besides the fact that we have always done it like this, these tools seem to have
a basis in the spec.
Specifically, this is based on this text from the spec:
the device asks for a certain amount of memory, and the driver
supplies it (or withdraws it, if the device has more than it asks for).
This allows the guest to adapt to changes in allowance of underlying
physical memory.

and

The device is driven by the receipt of a configuration change interrupt.




Cheers,
Rusty.
PS.  Yes, a real guest-driven balloon is preferable, but that's a much
  larger task.



Any objection to making the feature depend on a feature flag?




OK, I got the point. This sounds good to me. We will prepare a patch for
the kernel and the proper bits (a command-line option) for QEMU.
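
(For reference, the last-moment deflate being discussed can be sketched as an
OOM notifier in the balloon driver. This is only an illustration of the idea,
not the actual patch; the nb member, the leak_balloon() call and the fixed
256-page amount are assumptions here.)

static int virtballoon_oom_notify(struct notifier_block *nb,
                                  unsigned long dummy, void *parm)
{
        struct virtio_balloon *vb = container_of(nb, struct virtio_balloon, nb);
        unsigned long *freed = parm;

        /* Deflate the balloon by a fixed amount so the guest can make
         * progress instead of killing a process. */
        leak_balloon(vb, 256);
        update_balloon_size(vb);
        *freed += 256;

        return NOTIFY_OK;
}

/* registered once at probe time, e.g.:
 *      vb->nb.notifier_call = virtballoon_oom_notify;
 *      register_oom_notifier(&vb->nb);
 */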


Re: [PATCH net-next RFC 1/3] virtio: support for urgent descriptors

2014-10-15 Thread Rusty Russell
Jason Wang jasow...@redhat.com writes:
 Below should be useful for some experiments Jason is doing.
 I thought I'd send it out for early review/feedback.

 event idx feature allows us to defer interrupts until
 a specific # of descriptors were used.
 Sometimes it might be useful to get an interrupt after
 a specific descriptor, regardless.
 This adds a descriptor flag for this, and an API
 to create an urgent output descriptor.
 This is still an RFC:
 we'll need a feature bit for drivers to detect this,
 but we've run out of feature bits for virtio 0.X.
 For experimentation purposes, drivers can assume
 this is set, or add a driver-specific feature bit.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com

The new VRING_DESC_F_URGENT bit is theoretically nicer, but for
networking (which tends to take packets in order) couldn't we just set
the event counter to give us a tx interrupt at the packet we want?

Cheers,
Rusty.
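
(Background for the event-counter question: with VIRTIO_RING_F_EVENT_IDX the
device interrupts only once its used index passes the driver-written
used_event value, per this helper from include/uapi/linux/virtio_ring.h,
shown here for context.)

static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
{
        /* Interrupt only if event_idx falls in the window (old, new_idx]. */
        return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
}

So a driver that wants a tx interrupt at a particular packet could, in
principle, point the used event at that entry, along the lines of
vring_used_event(&vq->vring) = idx_of_that_packet; (illustrative only, not an
existing helper).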


Re: [PATCH 2/2] virtio_balloon: free some memory from baloon on OOM

2014-10-15 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Tue, Oct 14, 2014 at 10:14:05AM +1030, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Mon, Oct 13, 2014 at 04:02:52PM +1030, Rusty Russell wrote:
  Denis V. Lunev d...@parallels.com writes:
   From: Raushaniya Maksudova rmaksud...@parallels.com
  
   Excessive virtio_balloon inflation can cause invocation of the OOM-killer
   when Linux is under severe memory pressure. Various mechanisms are
   responsible for correct virtio_balloon memory management. Nevertheless
   it is often the case that these control tools do not have enough time
   to react to a fast-changing memory load. As a result the OS runs out of
   memory and invokes the OOM-killer. Balancing memory via the virtio
   balloon should not cause the termination of processes while there are
   pages in the balloon. Currently there is no way for the virtio balloon
   driver to free some memory at the last moment before a process gets
   killed by the OOM-killer.
  
  This makes some amount of sense.
 
  This reminds me of the balloon fs that Google once proposed.
  This really needs to be controlled from host though.
  At the moment host does not expect guest to deflate before
  requests.
  So as a minimum, add a feature bit for this.  What if you want a mix of
  mandatory and optional ballooning? I guess we can use multiple balloons,
  is that the idea?
 
 Trying to claw back some pages on OOM is almost certainly correct,
 even if the host doesn't expect it.  It's roughly equivalent to not
 giving up pages in the first place.

 Well the difference is that there are management tools that
 poll balloon in host until they see balloon size reaches
 the expected value.

 They don't expect the balloon to shrink below num_pages and will respond in
 various unexpected ways, e.g. by killing the VM, if it does.
 Killing a userspace process within the guest might be better
 for VM health.

 Besides the fact that we have always done it like this, these tools seem to have
 a basis in the spec.
 Specifically, this is based on this text from the spec:
   the device asks for a certain amount of memory, and the driver
   supplies it (or withdraws it, if the device has more than it asks for).
   This allows the guest to adapt to changes in allowance of underlying
   physical memory.

 and

   The device is driven by the receipt of a configuration change interrupt.



 Cheers,
 Rusty.
 PS.  Yes, a real guest-driven balloon is preferable, but that's a much
  larger task.


 Any objection to making the feature depend on a feature flag?

If you believe a guest which does this will cause drastic failure on the
host side (ie. killing the VM), then yes, we can do this.

However, I'm not aware of anything that sophisticated...

Cheers,
Rusty.


Re: [PATCH] virtio_balloon: Convert vballon kthread into a workqueue

2014-10-15 Thread Rusty Russell
Petr Mladek pmla...@suse.cz writes:
 Workqueues have a clean and rich API for all basic operations. The code is
 usually simpler and more readable, and it can easily be tuned for the given
 purpose.

OK, sure.

 -static void fill_balloon(struct virtio_balloon *vb, size_t num)
 +static void fill_balloon(struct virtio_balloon *vb, size_t diff)
  {
        struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
 +      size_t num;
 +      bool done;

        /* We can only do one array worth at a time. */
 -      num = min(num, ARRAY_SIZE(vb->pfns));
 +      num = min(diff, ARRAY_SIZE(vb->pfns));
 +      done = (num == diff) ? true : false;

        mutex_lock(&vb->balloon_lock);
        for (vb->num_pfns = 0; vb->num_pfns < num;
 @@ -143,6 +143,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
                                           VIRTIO_BALLOON_PAGES_PER_PAGE);
                        /* Sleep for at least 1/5 of a second before retry. */
                        msleep(200);
 +                      done = false;
                        break;
                }
                set_page_pfns(vb->pfns + vb->num_pfns, page);
 @@ -154,6 +155,9 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
        if (vb->num_pfns != 0)
                tell_host(vb, vb->inflate_vq);
        mutex_unlock(&vb->balloon_lock);
 +
 +      if (!done)
 +              queue_work(vb->wq, &vb->wq_work);
  }

Hmm, this is a bit convoluted.  I would spell it out by keeping a
num_done counter:

        if (num_done < diff)
                queue_work(vb->wq, &vb->wq_work);

  static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 @@ -168,20 +172,25 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
        }
  }

 -static void leak_balloon(struct virtio_balloon *vb, size_t num)
 +static void leak_balloon(struct virtio_balloon *vb, size_t diff)
  {
        struct page *page;
        struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
 +      size_t num;
 +      bool done;

        /* We can only do one array worth at a time. */
 -      num = min(num, ARRAY_SIZE(vb->pfns));
 +      num = min(diff, ARRAY_SIZE(vb->pfns));
 +      done = (num == diff) ? true : false;

        mutex_lock(&vb->balloon_lock);
        for (vb->num_pfns = 0; vb->num_pfns < num;
             vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
                page = balloon_page_dequeue(vb_dev_info);
 -              if (!page)
 +              if (!page) {
 +                      done = false;
                        break;
 +              }
                set_page_pfns(vb->pfns + vb->num_pfns, page);
                vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
        }
 @@ -195,6 +204,9 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
        tell_host(vb, vb->deflate_vq);
        mutex_unlock(&vb->balloon_lock);
        release_pages_by_pfn(vb->pfns, vb->num_pfns);
 +
 +      if (!done)
 +              queue_work(vb->wq, &vb->wq_work);
  }

Here too.

 @@ -471,11 +469,13 @@ static int virtballoon_probe(struct virtio_device *vdev)
        if (err)
                goto out_free_vb_mapping;

 -      vb->thread = kthread_run(balloon, vb, "vballoon");
 -      if (IS_ERR(vb->thread)) {
 -              err = PTR_ERR(vb->thread);
 +      vb->wq = alloc_workqueue("vballoon_wq",
 +                               WQ_FREEZABLE | WQ_MEM_RECLAIM, 0);
 +      if (!vb->wq) {
 +              err = -ENOMEM;
                goto out_del_vqs;
        }
 +      INIT_WORK(&vb->wq_work, balloon);

That looks racy to me; shouldn't we init vb->wq_work before allocating the wq?
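i.e. roughly this ordering (a sketch only, reusing the names from the hunk above):

        INIT_WORK(&vb->wq_work, balloon);
        vb->wq = alloc_workqueue("vballoon_wq",
                                 WQ_FREEZABLE | WQ_MEM_RECLAIM, 0);
        if (!vb->wq) {
                err = -ENOMEM;
                goto out_del_vqs;
        }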

Otherwise, looks good!

Thanks,
Rusty.
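
(For concreteness, the num_done suggestion above could be spelled out for
fill_balloon() roughly as below. This is an untested sketch: vb->wq and
vb->wq_work come from Petr's patch, the rest follows the existing driver code.)

static void fill_balloon(struct virtio_balloon *vb, size_t diff)
{
        struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
        size_t num, num_done = 0;

        /* We can only do one array worth at a time. */
        num = min(diff, ARRAY_SIZE(vb->pfns));

        mutex_lock(&vb->balloon_lock);
        for (vb->num_pfns = 0; vb->num_pfns < num;
             vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
                struct page *page = balloon_page_enqueue(vb_dev_info);

                if (!page) {
                        /* Sleep for at least 1/5 of a second before retry. */
                        msleep(200);
                        break;
                }
                set_page_pfns(vb->pfns + vb->num_pfns, page);
                vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
                num_done += VIRTIO_BALLOON_PAGES_PER_PAGE;
        }

        if (vb->num_pfns != 0)
                tell_host(vb, vb->inflate_vq);
        mutex_unlock(&vb->balloon_lock);

        /* Didn't manage the full request: let the work item try again. */
        if (num_done < diff)
                queue_work(vb->wq, &vb->wq_work);
}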


Re: [PATCH v4 03/25] virtio-pci: move freeze/restore to virtio core

2014-10-15 Thread Paul Bolle
On Mon, 2014-10-13 at 10:48 +0300, Michael S. Tsirkin wrote:
 This is in preparation to extending config changed event handling
 in core.
 Wrapping these in an API also seems to make for a cleaner code.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com

This patch landed in today's linux-next (next-20141015).

  include/linux/virtio.h      |  6 +
  drivers/virtio/virtio.c     | 54 +
  drivers/virtio/virtio_pci.c | 54 ++---
  3 files changed, 62 insertions(+), 52 deletions(-)
 
 diff --git a/include/linux/virtio.h b/include/linux/virtio.h
 index 3c19bd3..8df7ba8 100644
 --- a/include/linux/virtio.h
 +++ b/include/linux/virtio.h
 @@ -78,6 +78,7 @@ bool virtqueue_is_broken(struct virtqueue *vq);
  /**
   * virtio_device - representation of a device using virtio
   * @index: unique position on the virtio bus
 + * @failed: saved value for CONFIG_S_FAILED bit (for restore)

s/CONFIG_S_FAILED/VIRTIO_CONFIG_S_FAILED/?

   * @dev: underlying device.
   * @id: the device type identification (used to match it with a driver).
   * @config: the configuration ops for this device.


Paul Bolle



Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt

2014-10-15 Thread Jason Wang
On 10/15/2014 07:06 AM, Michael S. Tsirkin wrote:
 On Tue, Oct 14, 2014 at 02:53:27PM -0400, David Miller wrote:
  From: Jason Wang jasow...@redhat.com
  Date: Sat, 11 Oct 2014 15:16:43 +0800
  
   We currently free old transmitted packets in ndo_start_xmit(), so any
   packet must also be orphaned there. This was used to reduce the overhead of
   tx interrupts to achieve better performance. But this may not work for some
   protocols such as TCP streams. TCP depends on the value of sk_wmem_alloc to
   implement various optimizations for small-packet streams such as TCP small
   queues and auto corking. But orphaning packets early in ndo_start_xmit()
   more or less disables such things, since sk_wmem_alloc is no longer accurate.
   This leads to extra-low throughput for TCP streams of small writes.

   This series tries to solve the issue by enabling tx interrupts for all TCP
   packets other than the ones with the push bit set or pure ACKs. This is done
   through the support of urgent descriptors, which can force an interrupt for a
   specified packet. If the tx interrupt is enabled for a packet, there's no need
   to orphan it in ndo_start_xmit(); we can free it in tx napi, which is scheduled
   by the tx interrupt. Then sk_wmem_alloc is more accurate than before and TCP
   can batch more for small writes. Larger skbs are produced by TCP in this
   case, improving both throughput and cpu utilization.

   Tests show great improvements on small-write tcp streams. For most of the
   other cases, the throughput and cpu utilization are the same as in the
   past. Only in a few cases was more cpu utilization noticed, which needs more
   investigation.

   Review and comments are welcome.
  
  I think proper accounting and queueing (at all levels, not just TCP
  sockets) is more important than trying to skim a bunch of cycles by
  avoiding TX interrupts.
  
  Having an event to free the SKB is absolutely essential for the stack
  to operate correctly.
  
  And with virtio-net you don't even have the excuse that the HW
  unfortunately doesn't have an appropriate TX event.
  
  So please don't play games, and instead use TX interrupts all the
  time.  You can mitigate them in various ways, but don't turn them on
  selectively based upon traffic type, that's terrible.
  
  You can even use ->xmit_more to defer the TX interrupt indication to
  the final TX packet in the chain.
 I guess we can just defer the kick, interrupt will naturally be
 deferred as well.
 This should solve the problem for old hosts as well.

Interrupts were delayed but not reduced; to support this we need to publish
the avail idx as the used event. This should reduce tx interrupts in the case
of bulk dequeuing.

I will draft a new rfc series containing this.

 We'll also need to implement bql for this.
 Something like the below?
 Completely untested - posting here to see if I figured the
 API out correctly. Has to be applied on top of the previous patch.

Looks so. I believe it's better to have, but not a must.
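
(For reference, BQL for virtio-net would roughly mean bracketing transmit and
completion with the byte-queue-limit helpers; the snippet below is an
illustration of the idea, not Michael's actual patch.)

        /* in start_xmit(), after the skb has been added to the vq: */
        struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);

        netdev_tx_sent_queue(txq, skb->len);

        /* and on the completion path, once free_old_xmit_skbs() has summed
         * the bytes/packets it freed: */
        netdev_tx_completed_queue(txq, tx_packets, tx_bytes);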


[RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()

2014-10-15 Thread Jason Wang
Accumulate the sent packets and sent bytes in local variables and perform a
single u64_stats_update_begin/end() after.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/virtio_net.c |   12 
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..a4d56b8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
        unsigned int len;
        struct virtnet_info *vi = sq->vq->vdev->priv;
        struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+       u64 tx_bytes = 0, tx_packets = 0;
 
        while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
                pr_debug("Sent skb %p\n", skb);
 
-               u64_stats_update_begin(&stats->tx_syncp);
-               stats->tx_bytes += skb->len;
-               stats->tx_packets++;
-               u64_stats_update_end(&stats->tx_syncp);
+               tx_bytes += skb->len;
+               tx_packets++;
 
                dev_kfree_skb_any(skb);
        }
+
+       u64_stats_update_begin(&stats->tx_syncp);
+       stats->tx_bytes += tx_bytes;
+       stats->tx_packets =+ tx_packets;
+       u64_stats_update_end(&stats->tx_syncp);
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
-- 
1.7.1



[RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()

2014-10-15 Thread Jason Wang
This patch introduces virtqueue_enable_cb_avail() to publish the avail idx
as the used event. This could be used with batched buffer submission to
reduce the number of tx interrupts.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/virtio/virtio_ring.c |   22 --
 include/linux/virtio.h   |2 ++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1b3929f..d67fbf8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
         * entry. Always do both to keep code simple. */
        vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
        /* Make sure used event never go backwards */
-       if (!vring_need_event(vring_used_event(&vq->vring),
-                             vq->vring.avail->idx, last_used_idx))
+       if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
+           !vring_need_event(vring_used_event(&vq->vring),
+                             vq->vring.avail->idx, last_used_idx)) {
                vring_used_event(&vq->vring) = last_used_idx;
+       }
        END_USE(vq);
        return last_used_idx;
 }
 EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
 
+bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
+{
+       struct vring_virtqueue *vq = to_vvq(_vq);
+       bool ret;
+
+       START_USE(vq);
+       vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
+       vring_used_event(&vq->vring) = vq->vring.avail->idx;
+       ret = vring_need_event(vq->vring.avail->idx,
+                              vq->last_used_idx, vq->vring.used->idx);
+       END_USE(vq);
+
+       return ret;
+}
+EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
+
 /**
  * virtqueue_poll - query pending used buffers
  * @vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b46671e..bfaf058 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
 
+bool virtqueue_enable_cb_avail(struct virtqueue *vq);
+
 bool virtqueue_poll(struct virtqueue *vq, unsigned);
 
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
-- 
1.7.1



[RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net

2014-10-15 Thread Jason Wang
According to David, proper accounting and queueing (at all levels, not
just TCP sockets) is more important than trying to skim a bunch of
cycles by avoiding TX interrupts. Having an event to free the SKB is
absolutely essential for the stack to operate correctly.

This series tries to enable tx interrupt for virtio-net. The idea is
simple: enable tx interrupt and schedule a tx napi to free old xmit
skbs.

Several notes:
- Tx interrupt storm avoidance when the queue is about to be full is
  kept. Since we may enable callbacks in both ndo_start_xmit() and tx
  napi, patch 1 adds a check to make sure the used event never goes
  back. This prevents the napi from wrongly re-enabling the callbacks
  after delayed callbacks were used.
- For bulk dequeuing, there's no need to enable a tx interrupt for each
  packet. The last patch only enables the tx interrupt for the final skb in
  the chain, through xmit_more and a new helper that publishes the current
  avail idx as the used event.

This series fixes several issues of the original rfc pointed out by Michael.

Smoke testing is done; a complete netperf run will follow. Sending the
series out for early review.

Thanks

Jason Wang (6):
  virtio: make sure used event never go backwards
  virtio: introduce virtio_enable_cb_avail()
  virtio-net: small optimization on free_old_xmit_skbs()
  virtio-net: return the number of packets sent in free_old_xmit_skbs()
  virtio-net: enable tx interrupt
  virtio-net: enable tx interrupt only for the final skb in the chain

 drivers/net/virtio_net.c |  125 ++
 drivers/virtio/virtio_ring.c |   25 -
 include/linux/virtio.h   |2 +
 3 files changed, 115 insertions(+), 37 deletions(-)



[RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain

2014-10-15 Thread Jason Wang
With the help of xmit_more and virtqueue_enable_cb_avail(), this patch
enables the tx interrupt only for the final skb in the chain, if
needed. This helps to mitigate tx interrupts.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/virtio_net.c |   10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2afc2e2..5943f3f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -993,12 +993,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
                                virtqueue_disable_cb(sq->vq);
                        }
                }
-       } else if (virtqueue_enable_cb(sq->vq)) {
-               free_old_xmit_skbs(sq, qsize);
        }
 
-       if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
+       if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more) {
                virtqueue_kick(sq->vq);
+               if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS &&
+                   unlikely(virtqueue_enable_cb_avail(sq->vq))) {
+                       free_old_xmit_skbs(sq, qsize);
+                       virtqueue_disable_cb(sq->vq);
+               }
+       }
 
        return NETDEV_TX_OK;
 }
-- 
1.7.1



[RFC PATCH net-next 5/6] virtio-net: enable tx interrupt

2014-10-15 Thread Jason Wang
Orphaning skbs in ndo_start_xmit() breaks socket accounting and packet
queuing. This in fact breaks lots of things such as pktgen and several
TCP optimizations, and it also means BQL can't be implemented for
virtio-net.

This patch tries to solve this issue by enabling the tx interrupt. To
avoid introducing extra spinlocks, a tx napi is scheduled to free
those packets.

More tx interrupt mitigation methods could be used on top.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/virtio_net.c |  125 +++---
 1 files changed, 85 insertions(+), 40 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ccf98f9..2afc2e2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {
 
/* Name of the send queue: output.$index */
char name[40];
+
+   struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
        return p;
 }
 
+static int free_old_xmit_skbs(struct send_queue *sq, int budget)
+{
+       struct sk_buff *skb;
+       unsigned int len;
+       struct virtnet_info *vi = sq->vq->vdev->priv;
+       struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+       u64 tx_bytes = 0, tx_packets = 0;
+
+       while (tx_packets < budget &&
+              (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+               pr_debug("Sent skb %p\n", skb);
+
+               tx_bytes += skb->len;
+               tx_packets++;
+
+               dev_kfree_skb_any(skb);
+       }
+
+       u64_stats_update_begin(&stats->tx_syncp);
+       stats->tx_bytes += tx_bytes;
+       stats->tx_packets =+ tx_packets;
+       u64_stats_update_end(&stats->tx_syncp);
+
+       return tx_packets;
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
        struct virtnet_info *vi = vq->vdev->priv;
+       struct send_queue *sq = &vi->sq[vq2txq(vq)];
 
-       /* Suppress further interrupts. */
-       virtqueue_disable_cb(vq);
-
-       /* We were probably waiting for more output buffers. */
-       netif_wake_subqueue(vi->dev, vq2txq(vq));
+       if (napi_schedule_prep(&sq->napi)) {
+               __napi_schedule(&sq->napi);
+       }
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -774,7 +801,39 @@ again:
return received;
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+       struct send_queue *sq =
+               container_of(napi, struct send_queue, napi);
+       struct virtnet_info *vi = sq->vq->vdev->priv;
+       struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+       unsigned int r, sent = 0;
+
+again:
+       __netif_tx_lock(txq, smp_processor_id());
+       virtqueue_disable_cb(sq->vq);
+       sent += free_old_xmit_skbs(sq, budget - sent);
+
+       if (sent < budget) {
+               r = virtqueue_enable_cb_prepare(sq->vq);
+               napi_complete(napi);
+               __netif_tx_unlock(txq);
+               if (unlikely(virtqueue_poll(sq->vq, r)) &&
+                   napi_schedule_prep(napi)) {
+                       virtqueue_disable_cb(sq->vq);
+                       __napi_schedule(napi);
+                       goto again;
+               }
+       } else {
+               __netif_tx_unlock(txq);
+       }
+
+       netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+       return sent;
+}
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
+
 /* must be called with local_bh_disable()d */
 static int virtnet_busy_poll(struct napi_struct *napi)
 {
@@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev)
if (!try_fill_recv(vi-rq[i], GFP_KERNEL))
schedule_delayed_work(vi-refill, 0);
virtnet_napi_enable(vi-rq[i]);
+   napi_enable(vi-sq[i].napi);
}
 
return 0;
 }
 
-static int free_old_xmit_skbs(struct send_queue *sq)
-{
-   struct sk_buff *skb;
-   unsigned int len;
-   struct virtnet_info *vi = sq-vq-vdev-priv;
-   struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
-   u64 tx_bytes = 0, tx_packets = 0;
-
-   while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
-   pr_debug(Sent skb %p\n, skb);
-
-   tx_bytes += skb-len;
-   tx_packets++;
-
-   dev_kfree_skb_any(skb);
-   }
-
-   u64_stats_update_begin(stats-tx_syncp);
-   stats-tx_bytes += tx_bytes;
-   stats-tx_packets =+ tx_packets;
-   u64_stats_update_end(stats-tx_syncp);
-
-   return tx_packets;
-}
-
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
struct skb_vnet_hdr *hdr;
@@ -917,6 +952,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff 
*skb)
sg_set_buf(sq-sg, hdr, hdr_len);
num_sg = 

Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()

2014-10-15 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
 This patch introduces virtio_enable_cb_avail() to publish avail idx
 and used event. This could be used by batched buffer submitting to
 reduce the number of tx interrupts.
 
 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  drivers/virtio/virtio_ring.c |   22 --
  include/linux/virtio.h   |2 ++
  2 files changed, 22 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 index 1b3929f..d67fbf8 100644
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue 
 *_vq)
* entry. Always do both to keep code simple. */
   vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT;
   /* Make sure used event never go backwards */
 - if (!vring_need_event(vring_used_event(vq-vring),
 -   vq-vring.avail-idx, last_used_idx))
 + if (vq-vring.avail-idx != vring_used_event(vq-vring) 
 + !vring_need_event(vring_used_event(vq-vring),
 +   vq-vring.avail-idx, last_used_idx)) {
   vring_used_event(vq-vring) = last_used_idx;
 + }
   END_USE(vq);
   return last_used_idx;
  }
  EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);


I see you are also changing virtqueue_enable_cb_prepare, why?

 +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
 +{
 + struct vring_virtqueue *vq = to_vvq(_vq);
 + bool ret;
 +
 + START_USE(vq);
 + vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT;
 + vring_used_event(vq-vring) = vq-vring.avail-idx;
 + ret = vring_need_event(vq-vring.avail-idx,
 +vq-last_used_idx, vq-vring.used-idx);
 + END_USE(vq);
 +
 + return ret;
 +}
 +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
 +
  /**
   * virtqueue_poll - query pending used buffers
   * @vq: the struct virtqueue we're talking about.

Could not figure out what this does.
Please add documentation.

 diff --git a/include/linux/virtio.h b/include/linux/virtio.h
 index b46671e..bfaf058 100644
 --- a/include/linux/virtio.h
 +++ b/include/linux/virtio.h
 @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
  
  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
  
 +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
 +
  bool virtqueue_poll(struct virtqueue *vq, unsigned);
  
  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
 -- 
 1.7.1


Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards

2014-10-15 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
 This patch checks the new event idx to make sure used event idx never
 goes back. This is used to synchronize the calls between
 virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
 
 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com

The implication being that moving the event idx back might cause some race
condition?  If yes, please describe the race explicitly.
Is there a bug we need to fix in stable?
Please also explicitly describe a configuration that causes the event idx
to go back.

All this info should go in the commit log.

 ---
  drivers/virtio/virtio_ring.c |7 +--
  1 files changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 index 3b1f89b..1b3929f 100644
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
        u16 last_used_idx;
 
        START_USE(vq);
 -
 +      last_used_idx = vq->last_used_idx;
        /* We optimistically turn back on interrupts, then check if there was
         * more to do. */
        /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
         * either clear the flags bit or point the event index at the next
         * entry. Always do both to keep code simple. */
        vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
 -      vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
 +      /* Make sure used event never go backwards */

s/go/goes/

 +      if (!vring_need_event(vring_used_event(&vq->vring),
 +                            vq->vring.avail->idx, last_used_idx))
 +              vring_used_event(&vq->vring) = last_used_idx;

The result will be that the driver will *not* get an interrupt
on the next consumed buffer, which is likely not what the driver
intended when it called virtqueue_enable_cb.

Instead, how about we simply document the requirement that drivers always
call either virtqueue_enable_cb_delayed or virtqueue_enable_cb,
but not both?


   END_USE(vq);
   return last_used_idx;
  }
 -- 
 1.7.1


Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()

2014-10-15 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
 Accumulate the sent packets and sent bytes in local variables and perform a
 single u64_stats_update_begin/end() after.
 
 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com

Not sure how much it's worth but since Eric suggested it ...

Acked-by: Michael S. Tsirkin m...@redhat.com

 ---
  drivers/net/virtio_net.c |   12 
  1 files changed, 8 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 3d0ce44..a4d56b8 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
   unsigned int len;
   struct virtnet_info *vi = sq-vq-vdev-priv;
   struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
 + u64 tx_bytes = 0, tx_packets = 0;
  
   while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
   pr_debug(Sent skb %p\n, skb);
  
 - u64_stats_update_begin(stats-tx_syncp);
 - stats-tx_bytes += skb-len;
 - stats-tx_packets++;
 - u64_stats_update_end(stats-tx_syncp);
 + tx_bytes += skb-len;
 + tx_packets++;
  
   dev_kfree_skb_any(skb);
   }
 +
 + u64_stats_update_begin(stats-tx_syncp);
 + stats-tx_bytes += tx_bytes;
 + stats-tx_packets =+ tx_packets;
 + u64_stats_update_end(stats-tx_syncp);
  }
  
  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 -- 
 1.7.1


Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()

2014-10-15 Thread Eric Dumazet
On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote:
 Accumulate the sent packets and sent bytes in local variables and perform a
 single u64_stats_update_begin/end() after.
 
 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  drivers/net/virtio_net.c |   12 
  1 files changed, 8 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 3d0ce44..a4d56b8 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
   unsigned int len;
   struct virtnet_info *vi = sq-vq-vdev-priv;
   struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
 + u64 tx_bytes = 0, tx_packets = 0;
  
   while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
   pr_debug(Sent skb %p\n, skb);
  
 - u64_stats_update_begin(stats-tx_syncp);
 - stats-tx_bytes += skb-len;
 - stats-tx_packets++;
 - u64_stats_update_end(stats-tx_syncp);
 + tx_bytes += skb-len;
 + tx_packets++;
  
   dev_kfree_skb_any(skb);
   }
 +
 + u64_stats_update_begin(stats-tx_syncp);
 + stats-tx_bytes += tx_bytes;
 + stats-tx_packets =+ tx_packets;


stats-tx_packets += tx_packets;


 + u64_stats_update_end(stats-tx_syncp);
  }
  
  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)




Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt

2014-10-15 Thread Eric Dumazet
On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote:

...

 +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
 +{
 + struct sk_buff *skb;
 + unsigned int len;
 + struct virtnet_info *vi = sq-vq-vdev-priv;
 + struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
 + u64 tx_bytes = 0, tx_packets = 0;
 +
 + while (tx_packets  budget 
 +(skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
 + pr_debug(Sent skb %p\n, skb);
 +
 + tx_bytes += skb-len;
 + tx_packets++;
 +
 + dev_kfree_skb_any(skb);
 + }
 +
 + u64_stats_update_begin(stats-tx_syncp);
 + stats-tx_bytes += tx_bytes;
 + stats-tx_packets =+ tx_packets;


stats-tx_packets += tx_packets;

 + u64_stats_update_end(stats-tx_syncp);
 +
 + return tx_packets;
 +}



Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards

2014-10-15 Thread Jason Wang
On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
 On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
 This patch checks the new event idx to make sure used event idx never
 goes back. This is used to synchronize the calls between
 virtqueue_enable_cb_delayed() and virtqueue_enable_cb().

 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 the implication being that moving event idx back might cause some race
 condition?  

This will cause a race condition when the tx interrupt is enabled. Consider
the following case:

1) tx napi was scheduled
2) start_xmit() calls virtqueue_enable_cb_delayed() and disables the cb [used
event is vq->last_used_idx + 3/4 of the pending bufs]
3) tx napi re-enables the callback by virtqueue_enable_cb() [used event is
vq->last_used_idx]

After step 3, the used event has moved back, and an unnecessary tx interrupt is
triggered.
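
(To make the step numbers concrete, an illustrative timeline with made-up
index values, not taken from the patch:)

/*
 * Hypothetical values, for illustration only:
 *   vq->last_used_idx = 100, 8 buffers still pending (avail idx = 108)
 *
 *   step 2: virtqueue_enable_cb_delayed()  -> used_event = 100 + 6  (3/4 of 8)
 *   step 3: virtqueue_enable_cb()          -> used_event = 100      (moved back)
 *
 * The device now interrupts as soon as used->idx reaches 101, even though
 * the driver had asked to be woken only around index 106.
 */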
 If yes but please describe the race explicitly.
 Is there a bug we need to fix on stable?

Looks like not; the current code does not have such a race condition.
 Please also explicitly describe a configuration that causes event idx
 to go back.

 All this info should go in the commit log.

Will do this.
 ---
  drivers/virtio/virtio_ring.c |7 +--
  1 files changed, 5 insertions(+), 2 deletions(-)

 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 index 3b1f89b..1b3929f 100644
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue 
 *_vq)
  u16 last_used_idx;
  
  START_USE(vq);
 -
 +last_used_idx = vq-last_used_idx;
  /* We optimistically turn back on interrupts, then check if there was
   * more to do. */
  /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
   * either clear the flags bit or point the event index at the next
   * entry. Always do both to keep code simple. */
  vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT;
 -vring_used_event(vq-vring) = last_used_idx = vq-last_used_idx;
 +/* Make sure used event never go backwards */
 s/go/goes/

 +if (!vring_need_event(vring_used_event(vq-vring),
 +  vq-vring.avail-idx, last_used_idx))
 +vring_used_event(vq-vring) = last_used_idx;
 The result will be that driver will *not* get an interrupt
 on the next consumed buffer, which is likely not what driver
 intended when it called virtqueue_enable_cb.

This will only happen when we want to delay the interrupt for the next few
consumed buffers (virtqueue_enable_cb_delayed() was called). For the
other case, vq->last_used_idx should be ahead of the previous used event. Do
you see any other case?

 Instead, how about we simply document the requirement that drivers always
 call either virtqueue_enable_cb_delayed or virtqueue_enable_cb,
 but not both?

We need to call them both when the tx interrupt is enabled, I believe.

  END_USE(vq);
  return last_used_idx;
  }
 -- 
 1.7.1



Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt

2014-10-15 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote:
 Orphan skb in ndo_start_xmit() breaks socket accounting and packet
 queuing. This in fact breaks lots of things such as pktgen and several
 TCP optimizations. And also make BQL can't be implemented for
 virtio-net.
 
 This patch tries to solve this issue by enabling tx interrupt. To
 avoid introducing extra spinlocks, a tx napi was scheduled to free
 those packets.
 
 More tx interrupt mitigation method could be used on top.
 
 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  drivers/net/virtio_net.c |  125 
 +++---
  1 files changed, 85 insertions(+), 40 deletions(-)
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index ccf98f9..2afc2e2 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -72,6 +72,8 @@ struct send_queue {
  
   /* Name of the send queue: output.$index */
   char name[40];
 +
 + struct napi_struct napi;
  };
  
  /* Internal representation of a receive virtqueue */
 @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue 
 *rq, gfp_t gfp_mask)
   return p;
  }
  
 +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
 +{
 + struct sk_buff *skb;
 + unsigned int len;
 + struct virtnet_info *vi = sq-vq-vdev-priv;
 + struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
 + u64 tx_bytes = 0, tx_packets = 0;
 +
 + while (tx_packets  budget 
 +(skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
 + pr_debug(Sent skb %p\n, skb);
 +
 + tx_bytes += skb-len;
 + tx_packets++;
 +
 + dev_kfree_skb_any(skb);
 + }
 +
 + u64_stats_update_begin(stats-tx_syncp);
 + stats-tx_bytes += tx_bytes;
 + stats-tx_packets =+ tx_packets;
 + u64_stats_update_end(stats-tx_syncp);
 +
 + return tx_packets;
 +}
 +
  static void skb_xmit_done(struct virtqueue *vq)
  {
   struct virtnet_info *vi = vq-vdev-priv;
 + struct send_queue *sq = vi-sq[vq2txq(vq)];
  
 - /* Suppress further interrupts. */
 - virtqueue_disable_cb(vq);
 -
 - /* We were probably waiting for more output buffers. */
 - netif_wake_subqueue(vi-dev, vq2txq(vq));
 + if (napi_schedule_prep(sq-napi)) {
 + __napi_schedule(sq-napi);
 + }
  }
  
  static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
 @@ -774,7 +801,39 @@ again:
   return received;
  }
  
 +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 +{
 + struct send_queue *sq =
 + container_of(napi, struct send_queue, napi);
 + struct virtnet_info *vi = sq-vq-vdev-priv;
 + struct netdev_queue *txq = netdev_get_tx_queue(vi-dev, vq2txq(sq-vq));
 + unsigned int r, sent = 0;
 +
 +again:
 + __netif_tx_lock(txq, smp_processor_id());
 + virtqueue_disable_cb(sq-vq);
 + sent += free_old_xmit_skbs(sq, budget - sent);
 +
 + if (sent  budget) {
 + r = virtqueue_enable_cb_prepare(sq-vq);
 + napi_complete(napi);
 + __netif_tx_unlock(txq);
 + if (unlikely(virtqueue_poll(sq-vq, r)) 


So you are enabling the callback on the next packet,
which is almost sure to cause an interrupt storm
in the guest.


I think it's a bad idea; this is why I used
enable_cb_delayed in my patch.




 + napi_schedule_prep(napi)) {
 + virtqueue_disable_cb(sq-vq);
 + __napi_schedule(napi);
 + goto again;
 + }
 + } else {
 + __netif_tx_unlock(txq);
 + }
 +
 + netif_wake_subqueue(vi-dev, vq2txq(sq-vq));
 + return sent;
 +}
 +
  #ifdef CONFIG_NET_RX_BUSY_POLL
 +
  /* must be called with local_bh_disable()d */
  static int virtnet_busy_poll(struct napi_struct *napi)
  {
 @@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev)
   if (!try_fill_recv(vi-rq[i], GFP_KERNEL))
   schedule_delayed_work(vi-refill, 0);
   virtnet_napi_enable(vi-rq[i]);
 + napi_enable(vi-sq[i].napi);
   }
  
   return 0;
  }
  
 -static int free_old_xmit_skbs(struct send_queue *sq)
 -{
 - struct sk_buff *skb;
 - unsigned int len;
 - struct virtnet_info *vi = sq-vq-vdev-priv;
 - struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
 - u64 tx_bytes = 0, tx_packets = 0;
 -
 - while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
 - pr_debug(Sent skb %p\n, skb);
 -
 - tx_bytes += skb-len;
 - tx_packets++;
 -
 - dev_kfree_skb_any(skb);
 - }
 -
 - u64_stats_update_begin(stats-tx_syncp);
 - stats-tx_bytes += tx_bytes;
 - stats-tx_packets =+ tx_packets;
 - u64_stats_update_end(stats-tx_syncp);
 -
 - return tx_packets;
 -}
 -

So you end up 

Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()

2014-10-15 Thread Jason Wang
On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote:
 On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
 This patch introduces virtio_enable_cb_avail() to publish avail idx
 and used event. This could be used by batched buffer submitting to
 reduce the number of tx interrupts.

 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  drivers/virtio/virtio_ring.c |   22 --
  include/linux/virtio.h   |2 ++
  2 files changed, 22 insertions(+), 2 deletions(-)

 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 index 1b3929f..d67fbf8 100644
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue 
 *_vq)
   * entry. Always do both to keep code simple. */
  vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT;
  /* Make sure used event never go backwards */
 -if (!vring_need_event(vring_used_event(vq-vring),
 -  vq-vring.avail-idx, last_used_idx))
 +if (vq-vring.avail-idx != vring_used_event(vq-vring) 
 +!vring_need_event(vring_used_event(vq-vring),
 +  vq-vring.avail-idx, last_used_idx)) {
  vring_used_event(vq-vring) = last_used_idx;
 +}
  END_USE(vq);
  return last_used_idx;
  }
  EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);

 I see you are also changing virtqueue_enable_cb_prepare, why?

This is also used to prevent it from moving the used event backwards.
This may happen when we handle tx napi after we have published the avail idx
as the used event (virtqueue_enable_cb_avail() was called).

 +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
 +{
 +struct vring_virtqueue *vq = to_vvq(_vq);
 +bool ret;
 +
 +START_USE(vq);
 +vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT;
 +vring_used_event(vq-vring) = vq-vring.avail-idx;
 +ret = vring_need_event(vq-vring.avail-idx,
 +   vq-last_used_idx, vq-vring.used-idx);
 +END_USE(vq);
 +
 +return ret;
 +}
 +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
 +
  /**
   * virtqueue_poll - query pending used buffers
   * @vq: the struct virtqueue we're talking about.
 Could not figure out what this does.
 Please add documentation.


Sure, does something like the below explain what this function does?

/**
 * virtqueue_enable_cb_avail - restart callbacks after disable_cb.
 * @vq: the struct virtqueue we're talking about.
 *
 * This re-enables callbacks but hints to the other side to delay
 * interrupts until all of the available buffers have been processed;
 * it returns false if there is at least one pending buffer in the queue,
 * to detect a possible race between the driver checking for more work,
 * and enabling callbacks.
 *
 * Caller must ensure we don't call this with other virtqueue
 * operations at the same time (except where noted).
 */

 diff --git a/include/linux/virtio.h b/include/linux/virtio.h
 index b46671e..bfaf058 100644
 --- a/include/linux/virtio.h
 +++ b/include/linux/virtio.h
 @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
  
  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
  
 +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
 +
  bool virtqueue_poll(struct virtqueue *vq, unsigned);
  
  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
 -- 
 1.7.1


Re: [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain

2014-10-15 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 03:25:30PM +0800, Jason Wang wrote:
 With the help of xmit_more and virtqueue_enable_cb_avail(), this patch
 enable tx interrupt only for the final skb in the chain if
 needed. This will help to mitigate tx interrupts.
 
 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com

I think you should split xmit_more stuff out.



 ---
  drivers/net/virtio_net.c |   10 +++---
  1 files changed, 7 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 2afc2e2..5943f3f 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -993,12 +993,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, 
 struct net_device *dev)
   virtqueue_disable_cb(sq-vq);
   }
   }
 - } else if (virtqueue_enable_cb(sq-vq)) {
 - free_old_xmit_skbs(sq, qsize);
   }
  
 - if (__netif_subqueue_stopped(dev, qnum) || !skb-xmit_more)
 + if (__netif_subqueue_stopped(dev, qnum) || !skb-xmit_more) {
   virtqueue_kick(sq-vq);
 + if (sq-vq-num_free = 2 +MAX_SKB_FRAGS 
 + unlikely(virtqueue_enable_cb_avail(sq-vq))) {
 + free_old_xmit_skbs(sq, qsize);
 + virtqueue_disable_cb(sq-vq);
 + }
 + }
  
   return NETDEV_TX_OK;
  }
 -- 
 1.7.1


Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt

2014-10-15 Thread Jason Wang
On 10/15/2014 05:37 PM, Eric Dumazet wrote:
 On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote:

 ...

 +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
 +{
 +struct sk_buff *skb;
 +unsigned int len;
 +struct virtnet_info *vi = sq-vq-vdev-priv;
 +struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
 +u64 tx_bytes = 0, tx_packets = 0;
 +
 +while (tx_packets  budget 
 +   (skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
 +pr_debug(Sent skb %p\n, skb);
 +
 +tx_bytes += skb-len;
 +tx_packets++;
 +
 +dev_kfree_skb_any(skb);
 +}
 +
 +u64_stats_update_begin(stats-tx_syncp);
 +stats-tx_bytes += tx_bytes;
 +stats-tx_packets =+ tx_packets;
   
   stats-tx_packets += tx_packets;

My bad, will correct it in next version.

Thanks


Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt

2014-10-15 Thread Jason Wang
On 10/15/2014 06:18 PM, Michael S. Tsirkin wrote:
 On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote:
  Orphan skb in ndo_start_xmit() breaks socket accounting and packet
  queuing. This in fact breaks lots of things such as pktgen and several
  TCP optimizations. And also make BQL can't be implemented for
  virtio-net.
  
  This patch tries to solve this issue by enabling tx interrupt. To
  avoid introducing extra spinlocks, a tx napi was scheduled to free
  those packets.
  
  More tx interrupt mitigation method could be used on top.
  
  Cc: Rusty Russell ru...@rustcorp.com.au
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
   drivers/net/virtio_net.c |  125 
  +++---
   1 files changed, 85 insertions(+), 40 deletions(-)
  
  diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
  index ccf98f9..2afc2e2 100644
  --- a/drivers/net/virtio_net.c
  +++ b/drivers/net/virtio_net.c
  @@ -72,6 +72,8 @@ struct send_queue {
   
 /* Name of the send queue: output.$index */
 char name[40];
  +
  +  struct napi_struct napi;
   };
   
   /* Internal representation of a receive virtqueue */
  @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue 
  *rq, gfp_t gfp_mask)
 return p;
   }
   
  +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
  +{
  +  struct sk_buff *skb;
  +  unsigned int len;
  +  struct virtnet_info *vi = sq-vq-vdev-priv;
  +  struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
  +  u64 tx_bytes = 0, tx_packets = 0;
  +
  +  while (tx_packets  budget 
  + (skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
  +  pr_debug(Sent skb %p\n, skb);
  +
  +  tx_bytes += skb-len;
  +  tx_packets++;
  +
  +  dev_kfree_skb_any(skb);
  +  }
  +
  +  u64_stats_update_begin(stats-tx_syncp);
  +  stats-tx_bytes += tx_bytes;
  +  stats-tx_packets =+ tx_packets;
  +  u64_stats_update_end(stats-tx_syncp);
  +
  +  return tx_packets;
  +}
  +
   static void skb_xmit_done(struct virtqueue *vq)
   {
 struct virtnet_info *vi = vq-vdev-priv;
  +  struct send_queue *sq = vi-sq[vq2txq(vq)];
   
  -  /* Suppress further interrupts. */
  -  virtqueue_disable_cb(vq);
  -
  -  /* We were probably waiting for more output buffers. */
  -  netif_wake_subqueue(vi-dev, vq2txq(vq));
  +  if (napi_schedule_prep(sq-napi)) {
  +  __napi_schedule(sq-napi);
  +  }
   }
   
   static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
  @@ -774,7 +801,39 @@ again:
 return received;
   }
   
  +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
  +{
  +  struct send_queue *sq =
  +  container_of(napi, struct send_queue, napi);
  +  struct virtnet_info *vi = sq-vq-vdev-priv;
  +  struct netdev_queue *txq = netdev_get_tx_queue(vi-dev, vq2txq(sq-vq));
  +  unsigned int r, sent = 0;
  +
  +again:
  +  __netif_tx_lock(txq, smp_processor_id());
  +  virtqueue_disable_cb(sq-vq);
  +  sent += free_old_xmit_skbs(sq, budget - sent);
  +
  +  if (sent  budget) {
  +  r = virtqueue_enable_cb_prepare(sq-vq);
  +  napi_complete(napi);
  +  __netif_tx_unlock(txq);
  +  if (unlikely(virtqueue_poll(sq-vq, r)) 
 So you are enabling callback on the next packet,
 which is almost sure to cause an interrupt storm
 on the guest.


 I think it's a bad idea, this is why I used
 enable_cb_delayed in my patch.

Right, will do this, but we may also need to make sure the used event never
goes back since we may call virtqueue_enable_cb_avail().


  +  napi_schedule_prep(napi)) {
  +  virtqueue_disable_cb(sq-vq);
  +  __napi_schedule(napi);
  +  goto again;
  +  }
  +  } else {
  +  __netif_tx_unlock(txq);
  +  }
  +
  +  netif_wake_subqueue(vi-dev, vq2txq(sq-vq));
  +  return sent;
  +}
  +
   #ifdef CONFIG_NET_RX_BUSY_POLL
  +
   /* must be called with local_bh_disable()d */
   static int virtnet_busy_poll(struct napi_struct *napi)
   {
  @@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev)
 if (!try_fill_recv(vi-rq[i], GFP_KERNEL))
 schedule_delayed_work(vi-refill, 0);
 virtnet_napi_enable(vi-rq[i]);
  +  napi_enable(vi-sq[i].napi);
 }
   
 return 0;
   }
   
  -static int free_old_xmit_skbs(struct send_queue *sq)
  -{
  -  struct sk_buff *skb;
  -  unsigned int len;
  -  struct virtnet_info *vi = sq-vq-vdev-priv;
  -  struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
  -  u64 tx_bytes = 0, tx_packets = 0;
  -
  -  while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
  -  pr_debug(Sent skb %p\n, skb);
  -
  -  tx_bytes += skb-len;
  -  tx_packets++;
  -
  -  dev_kfree_skb_any(skb);
  -  }
  -
  -  u64_stats_update_begin(stats-tx_syncp);
  -  stats-tx_bytes += tx_bytes;
  -  stats-tx_packets =+ 

Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards

2014-10-15 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 06:13:19PM +0800, Jason Wang wrote:
 On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
  On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
  This patch checks the new event idx to make sure used event idx never
  goes back. This is used to synchronize the calls between
  virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
 
  Cc: Rusty Russell ru...@rustcorp.com.au
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  the implication being that moving event idx back might cause some race
  condition?  
 
 This will cause race condition when tx interrupt is enabled. Consider
 the following cases
 
 1) tx napi was scheduled
 2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used
 event is vq-last_used_idx + 3/4 pendg bufs]
 3) tx napi enable the callback by virtqueue_enable_cb() [ used event is
 vq-last_used_idx ]
  
 After step 3, used event was moved back, unnecessary tx interrupt was
 triggered.

Well, unnecessary interrupts are safe.
With your patch, the caller of virtqueue_enable_cb will not get an
interrupt on the next buffer, which is not safe.

If you don't want an interrupt on the next buffer, don't
call virtqueue_enable_cb.


  If yes but please describe the race explicitly.
  Is there a bug we need to fix on stable?
 
 Looks not, current code does not have such race condition.
  Please also explicitly describe a configuration that causes event idx
  to go back.
 
  All this info should go in the commit log.
 
 Will do this.
  ---
   drivers/virtio/virtio_ring.c |7 +--
   1 files changed, 5 insertions(+), 2 deletions(-)
 
  diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
  index 3b1f89b..1b3929f 100644
  --- a/drivers/virtio/virtio_ring.c
  +++ b/drivers/virtio/virtio_ring.c
  @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct 
  virtqueue *_vq)
 u16 last_used_idx;
   
 START_USE(vq);
  -
  +  last_used_idx = vq-last_used_idx;
 /* We optimistically turn back on interrupts, then check if there was
  * more to do. */
 /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
  * either clear the flags bit or point the event index at the next
  * entry. Always do both to keep code simple. */
 vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT;
  -  vring_used_event(vq-vring) = last_used_idx = vq-last_used_idx;
  +  /* Make sure used event never go backwards */
  s/go/goes/
 
  +  if (!vring_need_event(vring_used_event(vq-vring),
  +vq-vring.avail-idx, last_used_idx))
  +  vring_used_event(vq-vring) = last_used_idx;
  The result will be that driver will *not* get an interrupt
  on the next consumed buffer, which is likely not what driver
  intended when it called virtqueue_enable_cb.
 
 This will only happen when we want to delay the interrupt for next few
 consumed buffers (virtqueue_enable_cb_delayed() was called). For the
 other case, vq-last_used_idx should be ahead of previous used event. Do
 you see any other case?

Call virtqueue_enable_cb_delayed, then later call virtqueue_enable_cb.  If
the event index is not updated in virtqueue_enable_cb, the driver will not
get an interrupt on the next buffer.
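
To make the disagreement concrete, here is a small self-contained sketch
(not from any of the patches; the indices are invented for illustration)
that replays that sequence with the vring_need_event() check the device
side uses:

#include <stdint.h>
#include <stdio.h>

/* Same check the device performs after bumping used->idx from old to new,
 * against the event index the driver published. */
static int vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

int main(void)
{
	uint16_t last_used = 100;	/* vq->last_used_idx in the driver */
	uint16_t outstanding = 8;	/* buffers still in flight */

	/* virtqueue_enable_cb_delayed(): event parked 3/4 into the backlog. */
	uint16_t ev_delayed = last_used + outstanding * 3 / 4;	/* 106 */
	/* virtqueue_enable_cb(): event parked at last_used_idx. */
	uint16_t ev_plain = last_used;				/* 100 */

	/* Device now completes one more buffer: used->idx goes 100 -> 101. */
	printf("delayed event kept:  interrupt? %d\n",
	       vring_need_event(ev_delayed, 101, 100));	/* 0 - suppressed */
	printf("event moved back:    interrupt? %d\n",
	       vring_need_event(ev_plain, 101, 100));	/* 1 - fires */
	return 0;
}

With the guard from patch 1 the delayed value is kept, which is exactly the
"no interrupt on the next buffer" behaviour being objected to here, and the
suppression the patch is after.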


 
  Instead, how about we simply document the requirement that drivers either
  always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
  but not both?
 
 We need call them both when tx interrupt is enabled I believe.

Can you pls reply to my patch and document issues you see?

 
 END_USE(vq);
 return last_used_idx;
   }
  -- 
  1.7.1


Re: [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain

2014-10-15 Thread Jason Wang
On 10/15/2014 06:22 PM, Michael S. Tsirkin wrote:
 On Wed, Oct 15, 2014 at 03:25:30PM +0800, Jason Wang wrote:
 With the help of xmit_more and virtqueue_enable_cb_avail(), this patch
 enable tx interrupt only for the final skb in the chain if
 needed. This will help to mitigate tx interrupts.

 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 I think you should split xmit_more stuff out.

Not much difference, but if you prefer I will do this.


Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()

2014-10-15 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 06:19:15PM +0800, Jason Wang wrote:
 On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote:
  On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
  This patch introduces virtio_enable_cb_avail() to publish avail idx
  and used event. This could be used by batched buffer submitting to
  reduce the number of tx interrupts.
 
  Cc: Rusty Russell ru...@rustcorp.com.au
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
   drivers/virtio/virtio_ring.c |   22 --
   include/linux/virtio.h   |2 ++
   2 files changed, 22 insertions(+), 2 deletions(-)
 
  diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
  index 1b3929f..d67fbf8 100644
  --- a/drivers/virtio/virtio_ring.c
  +++ b/drivers/virtio/virtio_ring.c
  @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct 
  virtqueue *_vq)
  * entry. Always do both to keep code simple. */
 vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT;
 /* Make sure used event never go backwards */
  -  if (!vring_need_event(vring_used_event(vq-vring),
  -vq-vring.avail-idx, last_used_idx))
  +  if (vq-vring.avail-idx != vring_used_event(vq-vring) 
  +  !vring_need_event(vring_used_event(vq-vring),
  +vq-vring.avail-idx, last_used_idx)) {
 vring_used_event(vq-vring) = last_used_idx;
  +  }
 END_USE(vq);
 return last_used_idx;
   }
   EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
 
  I see you are also changing virtqueue_enable_cb_prepare, why?
 
 This is also used to prevent it from moving the used event backwards.
 This may happen when we handle tx napi after we publish the avail idx as
 used event (virtqueue_enable_cb_avail() was called).

So it's wrong exactly in the same way.

But also, please document this stuff, don't put
unrelated changes in a patch called introduce
virtqueue_enable_cb_avail.
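
For reference, the interaction being described is roughly this (indices
invented for illustration; the condition is the one added to
virtqueue_enable_cb_prepare() in this patch, shown here with comments):

/* 1) start_xmit() batches skbs and calls virtqueue_enable_cb_avail():
 *        used_event = avail->idx                (say 116)
 * 2) tx napi then calls virtqueue_enable_cb_prepare(); without the extra
 *    check it would rewrite
 *        used_event = last_used_idx             (say 100)
 *    i.e. drag the event index backwards again.  Hence the guard: */
if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
    !vring_need_event(vring_used_event(&vq->vring),
		      vq->vring.avail->idx, last_used_idx))
	vring_used_event(&vq->vring) = last_used_idx;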


 
  +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
  +{
  +  struct vring_virtqueue *vq = to_vvq(_vq);
  +  bool ret;
  +
  +  START_USE(vq);
  +  vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT;
  +  vring_used_event(vq-vring) = vq-vring.avail-idx;
  +  ret = vring_need_event(vq-vring.avail-idx,
  + vq-last_used_idx, vq-vring.used-idx);
  +  END_USE(vq);
  +
  +  return ret;
  +}
  +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
  +
   /**
* virtqueue_poll - query pending used buffers
* @vq: the struct virtqueue we're talking about.
  Could not figure out what this does.
  Please add documentation.
 
 
 Sure, does something like the below explain what this function does?
 
 /**
  * virtqueue_enable_cb_avail - restart callbacks after disable_cb.
  * @vq: the struct virtqueue we're talking about.
  *
  * This re-enables callbacks but hints to the other side to delay
  * interrupts until all of the available buffers have been processed;


So this is like virtqueue_enable_cb_delayed but even more
aggressive?
 I think it's too aggressive: it's better to wake up the guest
 after you are through most of the buffers, but not all,
 so guest and host can work in parallel.


  * it returns false if there is at least one pending buffer in the queue,
  * to detect a possible race between the driver checking for more work,
  * and enabling callbacks.
  *
  * Caller must ensure we don't call this with other virtqueue
  * operations at the same time (except where noted).
  */
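
For a rough feel of how far each variant pushes the interrupt out, the
following toy calculation compares them (the 3/4 factor comes from
virtqueue_enable_cb_delayed() in virtio_ring.c, the avail-based value from
the hunk above; the queue indices are made up):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint16_t last_used_idx = 100;	/* completions already processed */
	uint16_t avail_idx = 116;	/* buffers made available so far */
	uint16_t outstanding = avail_idx - last_used_idx;

	/* Where each helper parks the published used event: */
	uint16_t ev_cb      = last_used_idx;				/* enable_cb */
	uint16_t ev_delayed = last_used_idx + outstanding * 3 / 4;	/* enable_cb_delayed */
	uint16_t ev_avail   = avail_idx;				/* enable_cb_avail */

	printf("enable_cb:         event at %u -> irq on the next completion\n", ev_cb);
	printf("enable_cb_delayed: event at %u -> irq ~3/4 through the backlog\n", ev_delayed);
	printf("enable_cb_avail:   event at %u -> irq only once everything queued so far is done\n", ev_avail);
	return 0;
}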
 
  diff --git a/include/linux/virtio.h b/include/linux/virtio.h
  index b46671e..bfaf058 100644
  --- a/include/linux/virtio.h
  +++ b/include/linux/virtio.h
  @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
   
   unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
   
  +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
  +
   bool virtqueue_poll(struct virtqueue *vq, unsigned);
   
   bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
  -- 
  1.7.1


Re: [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain

2014-10-15 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 06:31:19PM +0800, Jason Wang wrote:
 On 10/15/2014 06:22 PM, Michael S. Tsirkin wrote:
  On Wed, Oct 15, 2014 at 03:25:30PM +0800, Jason Wang wrote:
  With the help of xmit_more and virtqueue_enable_cb_avail(), this patch
  enable tx interrupt only for the final skb in the chain if
  needed. This will help to mitigate tx interrupts.
 
  Cc: Rusty Russell ru...@rustcorp.com.au
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  I think you should split xmit_more stuff out.
 
 No much difference but if you prefer I will do this.

In fact I did it exactly like this already :)

-- 
MST


Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards

2014-10-15 Thread Jason Wang
On 10/15/2014 06:32 PM, Michael S. Tsirkin wrote:
 On Wed, Oct 15, 2014 at 06:13:19PM +0800, Jason Wang wrote:
 On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
 On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
 This patch checks the new event idx to make sure used event idx never
 goes back. This is used to synchronize the calls between
 virtqueue_enable_cb_delayed() and virtqueue_enable_cb().

 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 the implication being that moving event idx back might cause some race
 condition?  
 This will cause race condition when tx interrupt is enabled. Consider
 the following cases

 1) tx napi was scheduled
 2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used
 event is vq-last_used_idx + 3/4 pendg bufs]
 3) tx napi enable the callback by virtqueue_enable_cb() [ used event is
 vq-last_used_idx ]
  
 After step 3, used event was moved back, unnecessary tx interrupt was
 triggered.
 Well unnecessary interrupts are safe.

But that is what we want to reduce.
 With your patch caller of virtqueue_enable_cb will not get an
 interrupt on the next buffer which is not safe.

 If you don't want an interrupt on the next buffer, don't
 call virtqueue_enable_cb.

So something like this patch should be done somewhere else in the virtio
core. Virtio-net cannot do this since it does not have knowledge of the
event index.

 If yes but please describe the race explicitly.
 Is there a bug we need to fix on stable?
 Looks not, current code does not have such race condition.
 Please also explicitly describe a configuration that causes event idx
 to go back.

 All this info should go in the commit log.
 Will do this.
 ---
  drivers/virtio/virtio_ring.c |7 +--
  1 files changed, 5 insertions(+), 2 deletions(-)

 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 index 3b1f89b..1b3929f 100644
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct 
 virtqueue *_vq)
u16 last_used_idx;
  
START_USE(vq);
 -
 +  last_used_idx = vq-last_used_idx;
/* We optimistically turn back on interrupts, then check if there was
 * more to do. */
/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
 * either clear the flags bit or point the event index at the next
 * entry. Always do both to keep code simple. */
vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT;
 -  vring_used_event(vq-vring) = last_used_idx = vq-last_used_idx;
 +  /* Make sure used event never go backwards */
 s/go/goes/

 +  if (!vring_need_event(vring_used_event(vq-vring),
 +vq-vring.avail-idx, last_used_idx))
 +  vring_used_event(vq-vring) = last_used_idx;
 The result will be that driver will *not* get an interrupt
 on the next consumed buffer, which is likely not what driver
 intended when it called virtqueue_enable_cb.
 This will only happen when we want to delay the interrupt for next few
 consumed buffers (virtqueue_enable_cb_delayed() was called). For the
 other case, vq-last_used_idx should be ahead of previous used event. Do
 you see any other case?
 Call virtqueue_enable_cb_delayed, later call virtqueue_enable_cb.  If
 event index is not updated in virtqueue_enable_cb, driver will not get
 an interrupt on the next buffer.

This is just what we want, I think. The interrupt was not lost but fired
after 3/4 of the pending buffers were consumed. Do you see any real issue with this?

 Instead, how about we simply document the requirement that drivers either
 always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
 but not both?
 We need call them both when tx interrupt is enabled I believe.
 Can you pls reply to my patch and document issues you see?


In the previous reply you said you're using
virtqueue_enable_cb_delayed(), so there is no race in your patch.



Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()

2014-10-15 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 09:49:01AM +, David Laight wrote:
 From: Of Michael S. Tsirkin
  On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
   Accumulate the sent packets and sent bytes in local variables and perform 
   a
   single u64_stats_update_begin/end() after.
  
   Cc: Rusty Russell ru...@rustcorp.com.au
   Cc: Michael S. Tsirkin m...@redhat.com
   Signed-off-by: Jason Wang jasow...@redhat.com
  
  Not sure how much it's worth but since Eric suggested it ...
 
 Probably depends on the actual cost of u64_stats_update_begin/end
 against the likely extra saving of the tx_bytes and tx_packets
 values onto the stack across the call to dev_kfree_skb_any().
 (Which depends on the number of caller saved registers.)

Yea, some benchmark results would be nice to see.


  Acked-by: Michael S. Tsirkin m...@redhat.com
  
   ---
drivers/net/virtio_net.c |   12 
1 files changed, 8 insertions(+), 4 deletions(-)
  
   diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
   index 3d0ce44..a4d56b8 100644
   --- a/drivers/net/virtio_net.c
   +++ b/drivers/net/virtio_net.c
   @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue 
   *sq)
 unsigned int len;
 struct virtnet_info *vi = sq-vq-vdev-priv;
 struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
   + u64 tx_bytes = 0, tx_packets = 0;
 
 tx_packets need only be 'unsigned int'.
 The same is almost certainly true of tx_bytes.
 
   David
 
 while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
 pr_debug(Sent skb %p\n, skb);
  
   - u64_stats_update_begin(stats-tx_syncp);
   - stats-tx_bytes += skb-len;
   - stats-tx_packets++;
   - u64_stats_update_end(stats-tx_syncp);
   + tx_bytes += skb-len;
   + tx_packets++;
  
 dev_kfree_skb_any(skb);
 }
   +
   + u64_stats_update_begin(stats-tx_syncp);
   + stats-tx_bytes += tx_bytes;
   + stats-tx_packets =+ tx_packets;
   + u64_stats_update_end(stats-tx_syncp);
}
  
static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
   --
   1.7.1


RE: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()

2014-10-15 Thread David Laight
From: Michael S. Tsirkin
 On Wed, Oct 15, 2014 at 09:49:01AM +, David Laight wrote:
  From: Of Michael S. Tsirkin
   On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
Accumulate the sent packets and sent bytes in local variables and 
perform a
single u64_stats_update_begin/end() after.
   
Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
  
   Not sure how much it's worth but since Eric suggested it ...
 
  Probably depends on the actual cost of u64_stats_update_begin/end
  against the likely extra saving of the tx_bytes and tx_packets
  values onto the stack across the call to dev_kfree_skb_any().
  (Which depends on the number of caller saved registers.)
 
 Yea, some benchmark results would be nice to see.

If there are likely to be multiple skbs on the queue, the fastest
code would probably do one 'virtqueue_get_all()' that returned
a linked list of buffers, then follow the list to get the stats,
and follow it again to free the skbs.

David
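
Purely as a sketch of that idea (there is no virtqueue_get_all() in the
tree, so the helper and the skb->next chaining convention below are
hypothetical), the driver-side loop might then look like:

/* Hypothetical helper: drain every completed buffer in one call and return
 * the skbs chained through skb->next. */
struct sk_buff *virtqueue_get_all(struct virtqueue *vq);	/* does not exist */

static void free_old_xmit_skbs_batched(struct send_queue *sq)
{
	struct virtnet_info *vi = sq->vq->vdev->priv;
	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
	struct sk_buff *list = virtqueue_get_all(sq->vq);
	struct sk_buff *skb;
	unsigned int tx_bytes = 0, tx_packets = 0;	/* unsigned int, per the note above */

	for (skb = list; skb; skb = skb->next) {	/* pass 1: stats */
		tx_bytes += skb->len;
		tx_packets++;
	}

	u64_stats_update_begin(&stats->tx_syncp);
	stats->tx_bytes += tx_bytes;
	stats->tx_packets += tx_packets;
	u64_stats_update_end(&stats->tx_syncp);

	while (list) {					/* pass 2: free */
		skb = list;
		list = list->next;
		dev_kfree_skb_any(skb);
	}
}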

   Acked-by: Michael S. Tsirkin m...@redhat.com
  
---
 drivers/net/virtio_net.c |   12 
 1 files changed, 8 insertions(+), 4 deletions(-)
   
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..a4d56b8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue 
*sq)
unsigned int len;
struct virtnet_info *vi = sq-vq-vdev-priv;
struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
+   u64 tx_bytes = 0, tx_packets = 0;
 
  tx_packets need only be 'unsigned int'.
  The same is almost certainly true of tx_bytes.
 
  David
 
while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
pr_debug(Sent skb %p\n, skb);
   
-   u64_stats_update_begin(stats-tx_syncp);
-   stats-tx_bytes += skb-len;
-   stats-tx_packets++;
-   u64_stats_update_end(stats-tx_syncp);
+   tx_bytes += skb-len;
+   tx_packets++;
   
dev_kfree_skb_any(skb);
}
+
+   u64_stats_update_begin(stats-tx_syncp);
+   stats-tx_bytes += tx_bytes;
+   stats-tx_packets =+ tx_packets;
+   u64_stats_update_end(stats-tx_syncp);
 }
   
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
--
1.7.1


Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()

2014-10-15 Thread Jason Wang
On 10/15/2014 06:41 PM, Michael S. Tsirkin wrote:
 On Wed, Oct 15, 2014 at 06:19:15PM +0800, Jason Wang wrote:
 On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote:
 On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
 This patch introduces virtio_enable_cb_avail() to publish avail idx
 and used event. This could be used by batched buffer submitting to
 reduce the number of tx interrupts.

 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  drivers/virtio/virtio_ring.c |   22 --
  include/linux/virtio.h   |2 ++
  2 files changed, 22 insertions(+), 2 deletions(-)

 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 index 1b3929f..d67fbf8 100644
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct 
 virtqueue *_vq)
 * entry. Always do both to keep code simple. */
vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT;
/* Make sure used event never go backwards */
 -  if (!vring_need_event(vring_used_event(vq-vring),
 -vq-vring.avail-idx, last_used_idx))
 +  if (vq-vring.avail-idx != vring_used_event(vq-vring) 
 +  !vring_need_event(vring_used_event(vq-vring),
 +vq-vring.avail-idx, last_used_idx)) {
vring_used_event(vq-vring) = last_used_idx;
 +  }
END_USE(vq);
return last_used_idx;
  }
  EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);

 I see you are also changing virtqueue_enable_cb_prepare, why?
 This is also used to prevent it from moving the used event backwards.
 This may happens when we handle tx napi after we publish avail idx as
 used event (virtqueue_enable_cb_avail() was called).
 So it's wrong exactly in the same way.

 But also, please document this stuff, don't put
 unrelated changes in a patch called introduce
 virtqueue_enable_cb_avail.


 +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
 +{
 +  struct vring_virtqueue *vq = to_vvq(_vq);
 +  bool ret;
 +
 +  START_USE(vq);
 +  vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT;
 +  vring_used_event(vq-vring) = vq-vring.avail-idx;
 +  ret = vring_need_event(vq-vring.avail-idx,
 + vq-last_used_idx, vq-vring.used-idx);
 +  END_USE(vq);
 +
 +  return ret;
 +}
 +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
 +
  /**
   * virtqueue_poll - query pending used buffers
   * @vq: the struct virtqueue we're talking about.
 Could not figure out what this does.
 Please add documentation.

 Sure, does something like below explain what does this function do?

 /**  
   

  * virtqueue_enable_cb_avail - restart callbacks after
 disable_cb.   
  * @vq: the struct virtqueue we're talking
 about.  
  *   
   

  * This re-enables callbacks but hints to the other side to
 delay  
  * interrupts all of the available buffers have been processed; 

 So this is like virtqueue_enable_cb_delayed but even more
 aggressive?
 I think it's too agressive: it's better to wake up guest
 after you are through most of buffers, but not all,
 so guest and host can work in parallel.

Note that:

- it was only used when there are still only a few free slots left (which is
greater than 2 + MAX_SKB_FRAGS)
- my patch keeps the free_old_xmit_skbs() at the beginning of
start_xmit(), so the tx skb reclaiming does not depend totally on the tx
interrupt. If more packets come, we'd expect some of them to be freed in
ndo_start_xmit(). If not, we may finally trigger
virtqueue_enable_cb_delayed().

So it is probably not as aggressive as it looks. I will benchmark this; see
the sketch below.
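
A condensed sketch of that flow (simplified from the RFC hunks quoted in
this thread, not the literal patch; error handling and the xmit_more
batching are omitted):

static netdev_tx_t start_xmit_sketch(struct sk_buff *skb, struct net_device *dev)
{
	struct virtnet_info *vi = netdev_priv(dev);
	int qnum = skb_get_queue_mapping(skb);
	struct send_queue *sq = &vi->sq[qnum];

	/* Reclaim completed skbs here as well, so reclaiming does not
	 * depend on the tx interrupt alone. */
	free_old_xmit_skbs(sq, napi_weight);

	xmit_skb(sq, skb);
	virtqueue_kick(sq->vq);

	if (sq->vq->num_free < 2 + MAX_SKB_FRAGS) {
		/* Only when the ring is nearly full: stop the queue and arm
		 * a callback (delayed or avail-based) for the in-flight skbs. */
		netif_stop_subqueue(dev, qnum);
		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
			/* More completions arrived meanwhile: reclaim and
			 * restart the queue right away. */
			free_old_xmit_skbs(sq, napi_weight);
			if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
				netif_start_subqueue(dev, qnum);
		}
	}
	return NETDEV_TX_OK;
}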


  * it returns false if there are at least one pending buffer in the
 queue,  
  * to detect a possible race between the driver checking for more
 work,
  * and enabling
 callbacks. 
  *   
   

  * Caller must ensure we don't call this with other
 virtqueue  
  * operations at the same time (except where
 noted).   
  */

 diff --git a/include/linux/virtio.h b/include/linux/virtio.h
 index b46671e..bfaf058 100644
 --- a/include/linux/virtio.h
 +++ b/include/linux/virtio.h
 @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
  
  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
  
 +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
 +
  bool virtqueue_poll(struct virtqueue *vq, unsigned);
  
  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
 -- 
 1.7.1

Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt

2014-10-15 Thread Jason Wang
On 10/15/2014 06:43 PM, Michael S. Tsirkin wrote:
 On Wed, Oct 15, 2014 at 06:25:25PM +0800, Jason Wang wrote:
  On 10/15/2014 06:18 PM, Michael S. Tsirkin wrote:
   On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote:
Orphan skb in ndo_start_xmit() breaks socket accounting and packet
queuing. This in fact breaks lots of things such as pktgen and 
several
TCP optimizations. And also make BQL can't be implemented for
virtio-net.

This patch tries to solve this issue by enabling tx interrupt. To
avoid introducing extra spinlocks, a tx napi was scheduled to free
those packets.

More tx interrupt mitigation method could be used on top.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/virtio_net.c |  125 
+++---
 1 files changed, 85 insertions(+), 40 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ccf98f9..2afc2e2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {
 
   /* Name of the send queue: output.$index */
   char name[40];
+
+  struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -217,15 +219,40 @@ static struct page *get_a_page(struct 
receive_queue *rq, gfp_t gfp_mask)
   return p;
 }
 
+static int free_old_xmit_skbs(struct send_queue *sq, int budget)
+{
+  struct sk_buff *skb;
+  unsigned int len;
+  struct virtnet_info *vi = sq-vq-vdev-priv;
+  struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
+  u64 tx_bytes = 0, tx_packets = 0;
+
+  while (tx_packets  budget 
+ (skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
+  pr_debug(Sent skb %p\n, skb);
+
+  tx_bytes += skb-len;
+  tx_packets++;
+
+  dev_kfree_skb_any(skb);
+  }
+
+  u64_stats_update_begin(stats-tx_syncp);
+  stats-tx_bytes += tx_bytes;
+  stats-tx_packets =+ tx_packets;
+  u64_stats_update_end(stats-tx_syncp);
+
+  return tx_packets;
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
   struct virtnet_info *vi = vq-vdev-priv;
+  struct send_queue *sq = vi-sq[vq2txq(vq)];
 
-  /* Suppress further interrupts. */
-  virtqueue_disable_cb(vq);
-
-  /* We were probably waiting for more output buffers. */
-  netif_wake_subqueue(vi-dev, vq2txq(vq));
+  if (napi_schedule_prep(sq-napi)) {
+  __napi_schedule(sq-napi);
+  }
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long 
mrg_ctx)
@@ -774,7 +801,39 @@ again:
   return received;
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+  struct send_queue *sq =
+  container_of(napi, struct send_queue, napi);
+  struct virtnet_info *vi = sq-vq-vdev-priv;
+  struct netdev_queue *txq = netdev_get_tx_queue(vi-dev, 
vq2txq(sq-vq));
+  unsigned int r, sent = 0;
+
+again:
+  __netif_tx_lock(txq, smp_processor_id());
+  virtqueue_disable_cb(sq-vq);
+  sent += free_old_xmit_skbs(sq, budget - sent);
+
+  if (sent  budget) {
+  r = virtqueue_enable_cb_prepare(sq-vq);
+  napi_complete(napi);
+  __netif_tx_unlock(txq);
+  if (unlikely(virtqueue_poll(sq-vq, r)) 
   So you are enabling callback on the next packet,
   which is almost sure to cause an interrupt storm
   on the guest.
  
  
   I think it's a bad idea, this is why I used
   enable_cb_delayed in my patch.
  
  Right, will do this, but may also need to make sure used event never
  goes back since we may call virtqueue_enable_cb_avail().
 That's why my patch always calls virtqueue_enable_cb_delayed.
 So no need for hacks.

 Maybe you can review my patch and comment?

I think I missed the virtqueue_enable_cb_delayed() in your patch when I
was reviewing it. Will do.


Re: [PATCH RFC] virtio_net: enable tx interrupt

2014-10-15 Thread Jason Wang
On 10/15/2014 05:53 AM, Michael S. Tsirkin wrote:
 On newer hosts that support delayed tx interrupts,
 we probably don't have much to gain from orphaning
 packets early.

 Based on patch by Jason Wang.

 Note: this will likely degrade performance for hosts without event idx
 support.  Various fallback options are available, including
 orphaning conditionally.
 Testing TBD.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  drivers/net/virtio_net.c | 119 
 +--
  1 file changed, 83 insertions(+), 36 deletions(-)

 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 6b6e136..62c059d 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -72,6 +72,8 @@ struct send_queue {
  
   /* Name of the send queue: output.$index */
   char name[40];
 +
 + struct napi_struct napi;
  };
  
  /* Internal representation of a receive virtqueue */
 @@ -211,15 +213,38 @@ static struct page *get_a_page(struct receive_queue 
 *rq, gfp_t gfp_mask)
   return p;
  }
  
 +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
 +{
 + struct sk_buff *skb;
 + unsigned int len;
 + struct virtnet_info *vi = sq-vq-vdev-priv;
 + struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
 + int sent = 0;
 +
 + while (sent  budget 
 +(skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
 + pr_debug(Sent skb %p\n, skb);
 +
 + u64_stats_update_begin(stats-tx_syncp);
 + stats-tx_bytes += skb-len;
 + stats-tx_packets++;
 + u64_stats_update_end(stats-tx_syncp);
 +
 + dev_kfree_skb_any(skb);
 + sent++;
 + }
 +
 + return sent;
 +}
 +
  static void skb_xmit_done(struct virtqueue *vq)
  {
   struct virtnet_info *vi = vq-vdev-priv;
 + struct send_queue *sq = vi-sq[vq2txq(vq)];
  
 - /* Suppress further interrupts. */
 - virtqueue_disable_cb(vq);
 -
 - /* We were probably waiting for more output buffers. */
 - netif_wake_subqueue(vi-dev, vq2txq(vq));
 + if (napi_schedule_prep(sq-napi)) {
 + __napi_schedule(sq-napi);
 + }
  }
  
  static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
 @@ -766,6 +791,37 @@ again:
   return received;
  }
  
 +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 +{
 + struct send_queue *sq =
 + container_of(napi, struct send_queue, napi);
 + struct virtnet_info *vi = sq-vq-vdev-priv;
 + struct netdev_queue *txq = netdev_get_tx_queue(vi-dev, vq2txq(sq-vq));
 + unsigned int r, sent = 0;
 +
 +again:
 + __netif_tx_lock(txq, smp_processor_id());
 + virtqueue_disable_cb(sq-vq);
 + sent += free_old_xmit_skbs(sq, budget - sent);
 +
 + if (sent  budget) {
 + r = virtqueue_enable_cb_prepare(sq-vq);

So even though virtqueue_enable_cb_delayed() was used in start_xmit(), this
can move the used index backwards and trigger unnecessary interrupts.
 + napi_complete(napi);
 + __netif_tx_unlock(txq);
 + if (unlikely(virtqueue_poll(sq-vq, r)) 
 + napi_schedule_prep(napi)) {
 + virtqueue_disable_cb(sq-vq);
 + __napi_schedule(napi);
 + goto again;
 + }
 + } else {
 + __netif_tx_unlock(txq);
 + }
 +
 + netif_wake_subqueue(vi-dev, vq2txq(sq-vq));
 + return sent;
 +}
 +
  #ifdef CONFIG_NET_RX_BUSY_POLL
  /* must be called with local_bh_disable()d */
  static int virtnet_busy_poll(struct napi_struct *napi)
 @@ -814,30 +870,12 @@ static int virtnet_open(struct net_device *dev)
   if (!try_fill_recv(vi-rq[i], GFP_KERNEL))
   schedule_delayed_work(vi-refill, 0);
   virtnet_napi_enable(vi-rq[i]);
 + napi_enable(vi-sq[i].napi);
   }
  
   return 0;
  }
  
 -static void free_old_xmit_skbs(struct send_queue *sq)
 -{
 - struct sk_buff *skb;
 - unsigned int len;
 - struct virtnet_info *vi = sq-vq-vdev-priv;
 - struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
 -
 - while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
 - pr_debug(Sent skb %p\n, skb);
 -
 - u64_stats_update_begin(stats-tx_syncp);
 - stats-tx_bytes += skb-len;
 - stats-tx_packets++;
 - u64_stats_update_end(stats-tx_syncp);
 -
 - dev_kfree_skb_any(skb);
 - }
 -}
 -
  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
  {
   struct skb_vnet_hdr *hdr;
 @@ -902,7 +940,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff 
 *skb)
   sg_set_buf(sq-sg, hdr, hdr_len);
   num_sg = skb_to_sgvec(skb, sq-sg + 1, 0, skb-len) + 1;
   }
 - return virtqueue_add_outbuf(sq-vq, sq-sg, num_sg, skb, GFP_ATOMIC);
 +
 + return virtqueue_add_outbuf(sq-vq, sq-sg, num_sg, 

Re: [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net

2014-10-15 Thread Jason Wang
On 10/15/2014 06:25 PM, Michael S. Tsirkin wrote:
 On Wed, Oct 15, 2014 at 03:25:24PM +0800, Jason Wang wrote:
 According to David, proper accounting and queueing (at all levels, not
 just TCP sockets) is more important than trying to skim a bunch of
 cycles by avoiding TX interrupts.
 He also mentioned we should find other ways to batch


Right.
 Having an event to free the SKB is
 absolutely essential for the stack to operate correctly.

 This series tries to enable tx interrupt for virtio-net. The idea is
 simple: enable tx interrupt and schedule a tx napi to free old xmit
 skbs.

 Several notes:
 - Tx interrupt storm avoidance when queue is about to be full is
   kept.
 But queue is typically *not* full. More important to avoid interrupt
 storms in that case IMO.

Yes.
 Since we may enable callbacks in both ndo_start_xmit() and tx
   napi, patch 1 adds a check to make sure used event never go
   back. This will let the napi not enable the callbacks wrongly after
   delayed callbacks was used.
 So why not just use delayed callbacks?

This means the tx interrupts are coalesced in a somewhat adaptive way.
We need a benchmark to see its effect.

 - For bulk dequeuing, there's no need to enable tx interrupt for each
   packet. The last patch only enable tx interrupt for the final skb in
   the chain through xmit_more and a new helper to publish current avail
   idx as used event.

 This series fixes several issues of original rfc pointed out by Michael.
 Could you list the issues, for ease of review?

Probably just one:

- Move the virtqueue_disable_cb() from skb_xmit_done() into
virtnet_poll_tx() under tx lock.
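
Condensed from the hunks above, that change amounts to the following (the
interrupt handler only schedules the tx NAPI; the callback is disabled
later, under the tx lock, inside virtnet_poll_tx()):

static void skb_xmit_done(struct virtqueue *vq)
{
	struct virtnet_info *vi = vq->vdev->priv;
	struct send_queue *sq = &vi->sq[vq2txq(vq)];

	/* Was: virtqueue_disable_cb(vq); netif_wake_subqueue(vi->dev, ...);
	 * Now the ring is only touched from the NAPI poll path, serialized
	 * by the tx lock. */
	if (napi_schedule_prep(&sq->napi))
		__napi_schedule(&sq->napi);
}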


Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards

2014-10-15 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 06:44:41PM +0800, Jason Wang wrote:
 On 10/15/2014 06:32 PM, Michael S. Tsirkin wrote:
  On Wed, Oct 15, 2014 at 06:13:19PM +0800, Jason Wang wrote:
  On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
  On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
  This patch checks the new event idx to make sure used event idx never
  goes back. This is used to synchronize the calls between
  virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
 
  Cc: Rusty Russell ru...@rustcorp.com.au
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  the implication being that moving event idx back might cause some race
  condition?  
  This will cause race condition when tx interrupt is enabled. Consider
  the following cases
 
  1) tx napi was scheduled
  2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used
  event is vq-last_used_idx + 3/4 pendg bufs]
  3) tx napi enable the callback by virtqueue_enable_cb() [ used event is
  vq-last_used_idx ]
   
  After step 3, used event was moved back, unnecessary tx interrupt was
  triggered.
  Well unnecessary interrupts are safe.
 
 But it that is what we want to reduce.

It's all about correctness. I don't think mixing enable_cb
and enable_cb_delayed makes sense, let's just make
virtio behave correctly if that happens, no need to
optimize for that.


  With your patch caller of virtqueue_enable_cb will not get an
  interrupt on the next buffer which is not safe.
 
  If you don't want an interrupt on the next buffer, don't
  call virtqueue_enable_cb.
 
 So something like this patch should be done in virtio core somewhere
 else. Virtio-net can not do this since it does not have the knowledge of
 event index.

Take a look at my patch - no calls to enable_cb, only
enable_cb_delayed, so we should be fine.

 
  If yes but please describe the race explicitly.
  Is there a bug we need to fix on stable?
  Looks not, current code does not have such race condition.
  Please also explicitly describe a configuration that causes event idx
  to go back.
 
  All this info should go in the commit log.
  Will do this.
  ---
   drivers/virtio/virtio_ring.c |7 +--
   1 files changed, 5 insertions(+), 2 deletions(-)
 
  diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
  index 3b1f89b..1b3929f 100644
  --- a/drivers/virtio/virtio_ring.c
  +++ b/drivers/virtio/virtio_ring.c
  @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct 
  virtqueue *_vq)
   u16 last_used_idx;
   
   START_USE(vq);
  -
  +last_used_idx = vq-last_used_idx;
   /* We optimistically turn back on interrupts, then check if 
  there was
* more to do. */
   /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
* either clear the flags bit or point the event index at the 
  next
* entry. Always do both to keep code simple. */
   vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT;
  -vring_used_event(vq-vring) = last_used_idx = 
  vq-last_used_idx;
  +/* Make sure used event never go backwards */
  s/go/goes/
 
  +if (!vring_need_event(vring_used_event(vq-vring),
  +  vq-vring.avail-idx, last_used_idx))
  +vring_used_event(vq-vring) = last_used_idx;
  The result will be that driver will *not* get an interrupt
  on the next consumed buffer, which is likely not what driver
  intended when it called virtqueue_enable_cb.
  This will only happen when we want to delay the interrupt for next few
  consumed buffers (virtqueue_enable_cb_delayed() was called). For the
  other case, vq-last_used_idx should be ahead of previous used event. Do
  you see any other case?
  Call virtqueue_enable_cb_delayed, later call virtqueue_enable_cb.  If
  event index is not updated in virtqueue_enable_cb, driver will not get
  an interrupt on the next buffer.
 
 This is just what we want I think. The interrupt was not lost but fired
 after 3/4 pending buffers were consumed. Do you see any real issue on this?

Yes, this violates the API. For example, the device might never
consume the rest of the buffers.


 
  Instead, how about we simply document the requirement that drivers either
  always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
  but not both?
  We need call them both when tx interrupt is enabled I believe.
  Can you pls reply to my patch and document issues you see?
 
 
 In the previous reply you said you're using
 virtuqueue_enable_cb_delayed(), so no race in your patch.

OK so you think my patch is also correct, but that yours gives better
efficiency?

-- 
MST


Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()

2014-10-15 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 06:58:15PM +0800, Jason Wang wrote:
 On 10/15/2014 06:41 PM, Michael S. Tsirkin wrote:
  On Wed, Oct 15, 2014 at 06:19:15PM +0800, Jason Wang wrote:
  On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote:
  On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
  This patch introduces virtio_enable_cb_avail() to publish avail idx
  and used event. This could be used by batched buffer submitting to
  reduce the number of tx interrupts.
 
  Cc: Rusty Russell ru...@rustcorp.com.au
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
   drivers/virtio/virtio_ring.c |   22 --
   include/linux/virtio.h   |2 ++
   2 files changed, 22 insertions(+), 2 deletions(-)
 
  diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
  index 1b3929f..d67fbf8 100644
  --- a/drivers/virtio/virtio_ring.c
  +++ b/drivers/virtio/virtio_ring.c
  @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct 
  virtqueue *_vq)
* entry. Always do both to keep code simple. */
   vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT;
   /* Make sure used event never go backwards */
  -if (!vring_need_event(vring_used_event(vq-vring),
  -  vq-vring.avail-idx, last_used_idx))
  +if (vq-vring.avail-idx != vring_used_event(vq-vring) 
  +!vring_need_event(vring_used_event(vq-vring),
  +  vq-vring.avail-idx, last_used_idx)) {
   vring_used_event(vq-vring) = last_used_idx;
  +}
   END_USE(vq);
   return last_used_idx;
   }
   EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
 
  I see you are also changing virtqueue_enable_cb_prepare, why?
  This is also used to prevent it from moving the used event backwards.
  This may happens when we handle tx napi after we publish avail idx as
  used event (virtqueue_enable_cb_avail() was called).
  So it's wrong exactly in the same way.
 
  But also, please document this stuff, don't put
  unrelated changes in a patch called introduce
  virtqueue_enable_cb_avail.
 
 
  +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
  +{
  +struct vring_virtqueue *vq = to_vvq(_vq);
  +bool ret;
  +
  +START_USE(vq);
  +vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT;
  +vring_used_event(vq-vring) = vq-vring.avail-idx;
  +ret = vring_need_event(vq-vring.avail-idx,
  +   vq-last_used_idx, vq-vring.used-idx);
  +END_USE(vq);
  +
  +return ret;
  +}
  +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
  +
   /**
* virtqueue_poll - query pending used buffers
* @vq: the struct virtqueue we're talking about.
  Could not figure out what this does.
  Please add documentation.
 
  Sure, does something like below explain what does this function do?
 
  /**
  
 
   * virtqueue_enable_cb_avail - restart callbacks after
  disable_cb.   
   * @vq: the struct virtqueue we're talking
  about.  
   * 
  
 
   * This re-enables callbacks but hints to the other side to
  delay  
   * interrupts all of the available buffers have been processed; 
 
  So this is like virtqueue_enable_cb_delayed but even more
  aggressive?
  I think it's too agressive: it's better to wake up guest
  after you are through most of buffers, but not all,
  so guest and host can work in parallel.
 
 Note that:
 
 - it was only used when there are still few of free slots (which is
 greater than 2 + MAX_SKB_FRAGS)
 - my patch keeps the free_old_xmit_skbs() in the beginning of
 start_xmit(), so the tx skb reclaiming does not depends totally on tx
 interrupt. If more packets comes, we'd expect some of them were freed in
 ndo_start_xmit(). If not, finally we may trigger the
 virtqueue_enable_cb_delayed().
 
 So probably not as aggressive as it looks. I will do benchmark on this.

Mine too:
	} else if (virtqueue_enable_cb_delayed(sq->vq)) {
		free_old_xmit_skbs(txq, sq, qsize);
	}



 
 
   * it returns false if there are at least one pending buffer in the
  queue,  
   * to detect a possible race between the driver checking for more
  work,
   * and enabling
  callbacks. 
   * 
  
 
   * Caller must ensure we don't call this with other
  virtqueue  
   * operations at the same time (except where
  noted).   
   */
 
  diff --git a/include/linux/virtio.h b/include/linux/virtio.h
  index b46671e..bfaf058 100644
  --- a/include/linux/virtio.h
  +++ b/include/linux/virtio.h
  @@ -65,6 +65,8 @@ bool 

Re: [PATCH RFC] virtio_net: enable tx interrupt

2014-10-15 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 07:04:20PM +0800, Jason Wang wrote:
 On 10/15/2014 05:53 AM, Michael S. Tsirkin wrote:
  On newer hosts that support delayed tx interrupts,
  we probably don't have much to gain from orphaning
  packets early.
 
  Based on patch by Jason Wang.
 
  Note: this will likely degrade performance for hosts without event idx
  support.  Various fallback options are available, including
  orphaning conditionally.
  Testing TBD.
 
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   drivers/net/virtio_net.c | 119 
  +--
   1 file changed, 83 insertions(+), 36 deletions(-)
 
  diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
  index 6b6e136..62c059d 100644
  --- a/drivers/net/virtio_net.c
  +++ b/drivers/net/virtio_net.c
  @@ -72,6 +72,8 @@ struct send_queue {
   
  /* Name of the send queue: output.$index */
  char name[40];
  +
  +   struct napi_struct napi;
   };
   
   /* Internal representation of a receive virtqueue */
  @@ -211,15 +213,38 @@ static struct page *get_a_page(struct receive_queue 
  *rq, gfp_t gfp_mask)
  return p;
   }
   
  +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
  +{
  +   struct sk_buff *skb;
  +   unsigned int len;
  +   struct virtnet_info *vi = sq-vq-vdev-priv;
  +   struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
  +   int sent = 0;
  +
  +   while (sent  budget 
  +  (skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
  +   pr_debug(Sent skb %p\n, skb);
  +
  +   u64_stats_update_begin(stats-tx_syncp);
  +   stats-tx_bytes += skb-len;
  +   stats-tx_packets++;
  +   u64_stats_update_end(stats-tx_syncp);
  +
  +   dev_kfree_skb_any(skb);
  +   sent++;
  +   }
  +
  +   return sent;
  +}
  +
   static void skb_xmit_done(struct virtqueue *vq)
   {
  struct virtnet_info *vi = vq-vdev-priv;
  +   struct send_queue *sq = vi-sq[vq2txq(vq)];
   
  -   /* Suppress further interrupts. */
  -   virtqueue_disable_cb(vq);
  -
  -   /* We were probably waiting for more output buffers. */
  -   netif_wake_subqueue(vi-dev, vq2txq(vq));
  +   if (napi_schedule_prep(sq-napi)) {
  +   __napi_schedule(sq-napi);
  +   }
   }
   
   static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
  @@ -766,6 +791,37 @@ again:
  return received;
   }
   
  +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
  +{
  +   struct send_queue *sq =
  +   container_of(napi, struct send_queue, napi);
  +   struct virtnet_info *vi = sq-vq-vdev-priv;
  +   struct netdev_queue *txq = netdev_get_tx_queue(vi-dev, vq2txq(sq-vq));
  +   unsigned int r, sent = 0;
  +
  +again:
  +   __netif_tx_lock(txq, smp_processor_id());
  +   virtqueue_disable_cb(sq-vq);
  +   sent += free_old_xmit_skbs(sq, budget - sent);
  +
  +   if (sent  budget) {
  +   r = virtqueue_enable_cb_prepare(sq-vq);
 
 So even virtqueue_enable_cb_delayed() was used in start_xmit(). This can
 move used index backwards to trigger unnecessary interrupts.

Good point. I'll rework this to use virtqueue_enable_cb_delayed.

virtqueue_enable_cb_delayed_prepare might be nice to
reduce lock contention, but that needs to be benchmarked.
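
No such helper exists yet, so the shape below is purely hypothetical, but it
would presumably mirror the existing virtqueue_enable_cb_prepare() /
virtqueue_poll() split, letting the recheck run after the tx lock is dropped
(what exactly the returned cookie should encode for the delayed case is an
open question):

	if (sent < budget) {
		/* Hypothetical: arm a *delayed* event while still holding
		 * the tx lock, recheck for missed completions afterwards. */
		r = virtqueue_enable_cb_delayed_prepare(sq->vq);
		napi_complete(napi);
		__netif_tx_unlock(txq);
		if (unlikely(virtqueue_poll(sq->vq, r)) &&
		    napi_schedule_prep(napi)) {
			virtqueue_disable_cb(sq->vq);
			__napi_schedule(napi);
			goto again;
		}
	} else {
		__netif_tx_unlock(txq);
	}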


  +   napi_complete(napi);
  +   __netif_tx_unlock(txq);
  +   if (unlikely(virtqueue_poll(sq-vq, r)) 
  +   napi_schedule_prep(napi)) {
  +   virtqueue_disable_cb(sq-vq);
  +   __napi_schedule(napi);
  +   goto again;
  +   }
  +   } else {
  +   __netif_tx_unlock(txq);
  +   }
  +
  +   netif_wake_subqueue(vi-dev, vq2txq(sq-vq));
  +   return sent;
  +}
  +
   #ifdef CONFIG_NET_RX_BUSY_POLL
   /* must be called with local_bh_disable()d */
   static int virtnet_busy_poll(struct napi_struct *napi)
  @@ -814,30 +870,12 @@ static int virtnet_open(struct net_device *dev)
  if (!try_fill_recv(vi-rq[i], GFP_KERNEL))
  schedule_delayed_work(vi-refill, 0);
  virtnet_napi_enable(vi-rq[i]);
  +   napi_enable(vi-sq[i].napi);
  }
   
  return 0;
   }
   
  -static void free_old_xmit_skbs(struct send_queue *sq)
  -{
  -   struct sk_buff *skb;
  -   unsigned int len;
  -   struct virtnet_info *vi = sq-vq-vdev-priv;
  -   struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
  -
  -   while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
  -   pr_debug(Sent skb %p\n, skb);
  -
  -   u64_stats_update_begin(stats-tx_syncp);
  -   stats-tx_bytes += skb-len;
  -   stats-tx_packets++;
  -   u64_stats_update_end(stats-tx_syncp);
  -
  -   dev_kfree_skb_any(skb);
  -   }
  -}
  -
   static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
   {
  struct skb_vnet_hdr *hdr;
  @@ -902,7 +940,9 @@ static int xmit_skb(struct send_queue *sq, struct 
  sk_buff *skb)
  

Re: [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net

2014-10-15 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 07:14:14PM +0800, Jason Wang wrote:
 On 10/15/2014 06:25 PM, Michael S. Tsirkin wrote:
  On Wed, Oct 15, 2014 at 03:25:24PM +0800, Jason Wang wrote:
  According to David, proper accounting and queueing (at all levels, not
  just TCP sockets) is more important than trying to skim a bunch of
  cycles by avoiding TX interrupts.
  He also mentioned we should find other ways to batch
 
 
 Right.
  Having an event to free the SKB is
  absolutely essential for the stack to operate correctly.
 
  This series tries to enable tx interrupt for virtio-net. The idea is
  simple: enable tx interrupt and schedule a tx napi to free old xmit
  skbs.
 
  Several notes:
  - Tx interrupt storm avoidance when queue is about to be full is
kept.
  But queue is typically *not* full. More important to avoid interrupt
  storms in that case IMO.
 
 Yes.
  Since we may enable callbacks in both ndo_start_xmit() and tx
napi, patch 1 adds a check to make sure used event never go
back. This will let the napi not enable the callbacks wrongly after
delayed callbacks was used.
  So why not just use delayed callbacks?
 
 This means the tx interrupt are coalesced in a somewhat adaptive way.
 Need benchmark to see its effect.

I think it's a minimal change, and does not need new APIs.
If that's not optimal, we can do smarter things on top.

 
  - For bulk dequeuing, there's no need to enable tx interrupt for each
packet. The last patch only enable tx interrupt for the final skb in
the chain through xmit_more and a new helper to publish current avail
idx as used event.
 
  This series fixes several issues of original rfc pointed out by Michael.
  Could you list the issues, for ease of review?
 
 Probably just one:
 
 - Move the virtqueue_disable_cb() from skb_xmit_done() into
 virtnet_poll_tx() under tx lock.

I think I did this already, I'll recheck.



Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()

2014-10-15 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 10:51:32AM +, David Laight wrote:
 From: Michael S. Tsirkin
  On Wed, Oct 15, 2014 at 09:49:01AM +, David Laight wrote:
   From: Of Michael S. Tsirkin
On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
 Accumulate the sent packets and sent bytes in local variables and 
 perform a
 single u64_stats_update_begin/end() after.

 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
   
Not sure how much it's worth but since Eric suggested it ...
  
   Probably depends on the actual cost of u64_stats_update_begin/end
   against the likely extra saving of the tx_bytes and tx_packets
   values onto the stack across the call to dev_kfree_skb_any().
   (Which depends on the number of caller saved registers.)
  
  Yea, some benchmark results would be nice to see.
 
 I there are likely to be multiple skb on the queue the fastest
 code would probably do one 'virtqueue_get_all()' that returned
 a linked list of buffers, then follow the list to get the stats,
 and follow it again to free the skb.
 
   David

Interesting. Each time we tried playing with linked lists in the past,
it simply destroyed performance.
Maybe this case is different, but I have my doubts.

Acked-by: Michael S. Tsirkin m...@redhat.com
   
 ---
  drivers/net/virtio_net.c |   12 
  1 files changed, 8 insertions(+), 4 deletions(-)

 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 3d0ce44..a4d56b8 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct 
 send_queue *sq)
   unsigned int len;
   struct virtnet_info *vi = sq-vq-vdev-priv;
   struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
 + u64 tx_bytes = 0, tx_packets = 0;
  
   tx_packets need only be 'unsigned int'.
   The same is almost certainly true of tx_bytes.
  
 David
  
   while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) {
   pr_debug(Sent skb %p\n, skb);

 - u64_stats_update_begin(stats-tx_syncp);
 - stats-tx_bytes += skb-len;
 - stats-tx_packets++;
 - u64_stats_update_end(stats-tx_syncp);
 + tx_bytes += skb-len;
 + tx_packets++;

   dev_kfree_skb_any(skb);
   }
 +
 + u64_stats_update_begin(stats-tx_syncp);
 + stats-tx_bytes += tx_bytes;
 + stats-tx_packets =+ tx_packets;
 + u64_stats_update_end(stats-tx_syncp);
  }

  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 --
 1.7.1


[PATCH] virtio_net: fix use after free

2014-10-15 Thread Michael S. Tsirkin
commit 0b725a2ca61bedc33a2a63d0451d528b268cf975
net: Remove ndo_xmit_flush netdev operation, use signalling instead.

added code that looks at skb->xmit_more after the skb has
been put in the TX VQ. Since some paths process the ring and free the skb
immediately, this can cause a use after free.

Fix by storing xmit_more in a local variable.

Cc: David S. Miller da...@davemloft.net
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

David, am I using the API correctly?
Seems to work for me.
You used __netif_subqueue_stopped but that seems to use
a slightly more expensive test_bit internally.
The reason I added a variable for the txq here is because it's handy for
BQL patch later on.


 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..13d0a8b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -920,6 +920,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	int qnum = skb_get_queue_mapping(skb);
 	struct send_queue *sq = &vi->sq[qnum];
 	int err;
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
+	bool kick = !skb->xmit_more;
 
 	/* Free up any pending old buffers before queueing new ones. */
 	free_old_xmit_skbs(sq);
@@ -956,7 +958,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 
-	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
+	if (kick || netif_xmit_stopped(txq))
 		virtqueue_kick(sq->vq);
 
return NETDEV_TX_OK;
-- 
MST


RE: [PATCH] virtio_net: fix use after free

2014-10-15 Thread David Laight
From: Michael S. Tsirkin
 commit 0b725a2ca61bedc33a2a63d0451d528b268cf975
 net: Remove ndo_xmit_flush netdev operation, use signalling instead.
 
 added code that looks at skb-xmit_more after the skb has
 been put in TX VQ. Since some paths process the ring and free the skb
 immediately, this can cause use after free.
 
 Fix by storing xmit_more in a local variable.
 
 Cc: David S. Miller da...@davemloft.net
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 David, am I using the API correctly?
 Seems to work for me.
 You used __netif_subqueue_stopped but that seems to use
 a slightly more expensive test_bit internally.
 The reason I added a variable for the txq here is because it's handy for
 BQL patch later on.
 
 
  drivers/net/virtio_net.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 3d0ce44..13d0a8b 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -920,6 +920,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
 net_device *dev)
   int qnum = skb_get_queue_mapping(skb);
   struct send_queue *sq = vi-sq[qnum];
   int err;
 + struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);

Do you need to cache 'txq' on the stack for the entire call?
Looks like it is only needed when 'kick' is true.
I've not looked to see if that saves both 'dev' and 'qnum' being kept.

In any case it isn't mentioned in the commit message.

David

 + bool kick = !skb-xmit_more;
 
   /* Free up any pending old buffers before queueing new ones. */
   free_old_xmit_skbs(sq);
 @@ -956,7 +958,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
 net_device *dev)
   }
   }
 
 - if (__netif_subqueue_stopped(dev, qnum) || !skb-xmit_more)
 + if (kick || netif_xmit_stopped(txq))
   virtqueue_kick(sq-vq);
 
   return NETDEV_TX_OK;
 --
 MST


Re: [PATCH] virtio_net: fix use after free

2014-10-15 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 01:24:57PM +, David Laight wrote:
 From: Michael S. Tsirkin
  commit 0b725a2ca61bedc33a2a63d0451d528b268cf975
  net: Remove ndo_xmit_flush netdev operation, use signalling instead.
  
  added code that looks at skb-xmit_more after the skb has
  been put in TX VQ. Since some paths process the ring and free the skb
  immediately, this can cause use after free.
  
  Fix by storing xmit_more in a local variable.
  
  Cc: David S. Miller da...@davemloft.net
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
  
  David, am I using the API correctly?
  Seems to work for me.
  You used __netif_subqueue_stopped but that seems to use
  a slightly more expensive test_bit internally.
  The reason I added a variable for the txq here is because it's handy for
  BQL patch later on.
  
  
   drivers/net/virtio_net.c | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)
  
  diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
  index 3d0ce44..13d0a8b 100644
  --- a/drivers/net/virtio_net.c
  +++ b/drivers/net/virtio_net.c
  @@ -920,6 +920,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, 
  struct net_device *dev)
  int qnum = skb_get_queue_mapping(skb);
  struct send_queue *sq = vi-sq[qnum];
  int err;
  +   struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 
 Do you need to cache 'txq' on stack for the entire call?
 Looks like it is only needed when 'kick' is true.
 I've not looked to see if saves both 'dev' and 'qnum' being kept.
 
 In any case it isn't mentioned in the commit message.
 
   David

The code seems slightly neater this way; I haven't bothered to
micro-optimize it to this level yet.
Want to benchmark and send a patch on top?
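
For reference, David's suggestion would presumably boil down to something
like this on top (untested sketch, worth measuring before sending):

	/* Look the txq up only on the kick path, so 'dev' and 'qnum' need
	 * not stay live across the rest of start_xmit(). */
	if (kick || netif_xmit_stopped(netdev_get_tx_queue(dev, qnum)))
		virtqueue_kick(sq->vq);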

  +   bool kick = !skb-xmit_more;
  
  /* Free up any pending old buffers before queueing new ones. */
  free_old_xmit_skbs(sq);
  @@ -956,7 +958,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, 
  struct net_device *dev)
  }
  }
  
  -   if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
  +   if (kick || netif_xmit_stopped(txq))
  virtqueue_kick(sq->vq);
  
  return NETDEV_TX_OK;
  --
  MST
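
To spell out the race being fixed in this thread, here is a minimal,
stand-alone sketch that is not part of the patch: a plain userspace
analogue in which struct buf, enqueue() and xmit() are hypothetical
stand-ins for the skb, the TX virtqueue handoff and start_xmit(). The
point it illustrates is that any flag living in the buffer must be read
and cached before the buffer is handed to a path that may free it.

/* Illustrative userspace analogue only; not kernel code. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct buf {
	bool more;		/* analogue of skb->xmit_more */
};

/* Consumer that may free the buffer immediately (analogue of the
 * TX virtqueue path that processes the ring right away). */
static void enqueue(struct buf *b)
{
	free(b);
}

static void xmit(struct buf *b)
{
	bool kick = !b->more;	/* read while the buffer is still ours */

	enqueue(b);		/* b may already be freed past this point */

	if (kick)		/* safe: uses the cached copy, not b->more */
		printf("kick the device\n");
}

int main(void)
{
	struct buf *b = malloc(sizeof(*b));

	if (!b)
		return 1;
	b->more = false;
	xmit(b);
	return 0;
}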


[PATCH RFC v2 1/3] virtio_net: enable tx interrupt

2014-10-15 Thread Michael S. Tsirkin
On newer hosts that support delayed tx interrupts,
we probably don't have much to gain from orphaning
packets early.

Based on patch by Jason Wang.

Note: this might degrade performance for
hosts without event idx support.
Should be addressed by the next patch.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
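
Background for the diff below: the patch moves tx completion out of the
virtqueue callback and into a NAPI poll handler. A generic sketch of the
NAPI contract it relies on is shown here; it is illustrative only, where
my_dev, my_clean_tx() and my_tx_irq_enable() are hypothetical driver
hooks, while netif_napi_add(), napi_complete(), napi_enable() and
NAPI_POLL_WEIGHT are the real kernel API.

#include <linux/netdevice.h>

struct my_dev {
	struct net_device *netdev;
	struct napi_struct tx_napi;
};

/* Hypothetical driver hook: reclaim up to @budget completed tx buffers
 * and return how many were reclaimed.  Stubbed out here. */
static int my_clean_tx(struct my_dev *md, int budget)
{
	return 0;
}

/* Hypothetical hook re-enabling tx completion interrupts.  Stub. */
static void my_tx_irq_enable(struct my_dev *md)
{
}

/* NAPI poll callback: do at most @budget units of work; only when less
 * than @budget was done, complete NAPI and re-enable interrupts. */
static int my_tx_poll(struct napi_struct *napi, int budget)
{
	struct my_dev *md = container_of(napi, struct my_dev, tx_napi);
	int done = my_clean_tx(md, budget);

	if (done < budget) {
		napi_complete(napi);
		my_tx_irq_enable(md);
	}
	return done;
}

/* Registration at probe time; NAPI_POLL_WEIGHT is the usual default. */
static void my_setup_tx_napi(struct my_dev *md)
{
	netif_napi_add(md->netdev, &md->tx_napi, my_tx_poll, NAPI_POLL_WEIGHT);
	napi_enable(&md->tx_napi);
}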
 drivers/net/virtio_net.c | 137 ---
 1 file changed, 94 insertions(+), 43 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 13d0a8b..a9bf178 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {
 
/* Name of the send queue: output.$index */
char name[40];
+
+   struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -217,15 +219,37 @@ static struct page *get_a_page(struct receive_queue *rq, 
gfp_t gfp_mask)
return p;
 }
 
+static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
+{
+   struct sk_buff *skb;
+   unsigned int len;
+   struct virtnet_info *vi = sq->vq->vdev->priv;
+   struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+   unsigned int packets = 0;
+
+   while (packets < budget &&
+  (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+   pr_debug("Sent skb %p\n", skb);
+
+   u64_stats_update_begin(&stats->tx_syncp);
+   stats->tx_bytes += skb->len;
+   stats->tx_packets++;
+   u64_stats_update_end(&stats->tx_syncp);
+
+   dev_kfree_skb_any(skb);
+   packets++;
+   }
+
+   return packets;
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 struct virtnet_info *vi = vq->vdev->priv;
+   struct send_queue *sq = &vi->sq[vq2txq(vq)];
 
-   /* Suppress further interrupts. */
-   virtqueue_disable_cb(vq);
-
-   /* We were probably waiting for more output buffers. */
-   netif_wake_subqueue(vi->dev, vq2txq(vq));
+   if (napi_schedule_prep(&sq->napi))
+   __napi_schedule(&sq->napi);
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -774,6 +798,37 @@ again:
return received;
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+   struct send_queue *sq =
+   container_of(napi, struct send_queue, napi);
+   struct virtnet_info *vi = sq->vq->vdev->priv;
+   struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+   unsigned int sent = 0;
+   bool enable_done;
+
+again:
+   __netif_tx_lock(txq, smp_processor_id());
+   virtqueue_disable_cb(sq->vq);
+   sent += free_old_xmit_skbs(sq, budget - sent);
+
+   if (sent < budget) {
+   enable_done = virtqueue_enable_cb_delayed(sq->vq);
+   napi_complete(napi);
+   __netif_tx_unlock(txq);
+   if (unlikely(enable_done) && napi_schedule_prep(napi)) {
+   virtqueue_disable_cb(sq->vq);
+   __napi_schedule(napi);
+   goto again;
+   }
+   } else {
+   __netif_tx_unlock(txq);
+   }
+
+   netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+   return sent;
+}
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 /* must be called with local_bh_disable()d */
 static int virtnet_busy_poll(struct napi_struct *napi)
@@ -822,30 +877,12 @@ static int virtnet_open(struct net_device *dev)
 if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 schedule_delayed_work(&vi->refill, 0);
 virtnet_napi_enable(&vi->rq[i]);
+   napi_enable(&vi->sq[i].napi);
}
 
return 0;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
-{
-   struct sk_buff *skb;
-   unsigned int len;
-   struct virtnet_info *vi = sq->vq->vdev->priv;
-   struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
-
-   while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-   pr_debug("Sent skb %p\n", skb);
-
-   u64_stats_update_begin(&stats->tx_syncp);
-   stats->tx_bytes += skb->len;
-   stats->tx_packets++;
-   u64_stats_update_end(&stats->tx_syncp);
-
-   dev_kfree_skb_any(skb);
-   }
-}
-
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
struct skb_vnet_hdr *hdr;
@@ -911,7 +948,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff 
*skb)
 sg_set_buf(sq->sg, hdr, hdr_len);
 num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
 }
-   return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
+
+   return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb,
+   GFP_ATOMIC);
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -919,12 +958,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 

[PATCH RFC v2 3/3] virtio-net: optimize free_old_xmit_skbs stats

2014-10-15 Thread Michael S. Tsirkin
From: Jason Wang jasow...@redhat.com

We already have counters for sent packets and sent bytes.
Use them to reduce the number of u64_stats_update_begin/end().

Take care not to bother with stats update when called
speculatively.

Based on a patch by Jason Wang.

Cc: Rusty Russell ru...@rustcorp.com.au
Signed-off-by: Jason Wang jasow...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
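
The idea, shown out of context for clarity: accumulate totals in local
variables during the reclaim loop and publish them with a single
u64_stats_update_begin()/end() pair, skipping the update entirely when
nothing was reclaimed (the speculative call from start_xmit). The sketch
below is illustrative only; struct my_stats and reclaim_one() are
hypothetical, while the u64_stats_* helpers are the real kernel API.

#include <linux/u64_stats_sync.h>

struct my_stats {
	struct u64_stats_sync syncp;
	u64 tx_bytes;
	u64 tx_packets;
};

/* Hypothetical: return the length of the next completed buffer,
 * or 0 when there is nothing left to reclaim.  Stubbed out here. */
static unsigned int reclaim_one(void)
{
	return 0;
}

static unsigned int reclaim_tx(struct my_stats *stats, int budget)
{
	unsigned int packets = 0, bytes = 0, len;

	while (packets < budget && (len = reclaim_one()) != 0) {
		bytes += len;
		packets++;
	}

	/* Called speculatively from the xmit path too: skip the stats
	 * machinery entirely when nothing was reclaimed. */
	if (!packets)
		return 0;

	u64_stats_update_begin(&stats->syncp);
	stats->tx_bytes += bytes;
	stats->tx_packets += packets;
	u64_stats_update_end(&stats->syncp);

	return packets;
}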
 drivers/net/virtio_net.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8dea411..4e12023 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -233,16 +233,22 @@ static unsigned int free_old_xmit_skbs(struct 
netdev_queue *txq,
   (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 pr_debug("Sent skb %p\n", skb);
 
-   u64_stats_update_begin(&stats->tx_syncp);
-   stats->tx_bytes += skb->len;
 bytes += skb->len;
-   stats->tx_packets++;
-   u64_stats_update_end(&stats->tx_syncp);
+   packets++;
 
dev_kfree_skb_any(skb);
-   packets++;
}
 
+   /* Avoid overhead when no packets have been processed
+    * happens when called speculatively from start_xmit. */
+   if (!packets)
+   return 0;
+
+   u64_stats_update_begin(&stats->tx_syncp);
+   stats->tx_bytes += bytes;
+   stats->tx_packets += packets;
+   u64_stats_update_end(&stats->tx_syncp);
+
netdev_tx_completed_queue(txq, packets, bytes);
 
return packets;
-- 
MST



[PATCH RFC v2 2/3] virtio_net: bql

2014-10-15 Thread Michael S. Tsirkin
Improve tx batching using byte queue limits.
Should be especially effective for MQ.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
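
Byte queue limits in brief: the driver reports the bytes it queues for
transmission and the bytes it later completes, and the core uses those
figures to cap how much data may sit in the hardware queue, keeping
latency low while still batching. The sketch below is a generic
illustration of that contract, not virtio-specific and not part of the
patch; the my_* wrappers are hypothetical, the netdev_tx_* helpers are
the real API and must always be paired (or reset together), otherwise
the queue will stall.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* xmit path: account the skb before it can possibly be completed. */
static void my_xmit_account(struct netdev_queue *txq, struct sk_buff *skb)
{
	netdev_tx_sent_queue(txq, skb->len);
}

/* completion path: report everything reclaimed in this batch. */
static void my_tx_complete_account(struct netdev_queue *txq,
				   unsigned int packets, unsigned int bytes)
{
	netdev_tx_completed_queue(txq, packets, bytes);
}

/* queue reset (e.g. device down/up): clear both sides of the
 * accounting together. */
static void my_tx_reset_account(struct netdev_queue *txq)
{
	netdev_tx_reset_queue(txq);
}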
 drivers/net/virtio_net.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a9bf178..8dea411 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -219,13 +219,15 @@ static struct page *get_a_page(struct receive_queue *rq, 
gfp_t gfp_mask)
return p;
 }
 
-static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
+static unsigned int free_old_xmit_skbs(struct netdev_queue *txq,
+  struct send_queue *sq, int budget)
 {
struct sk_buff *skb;
unsigned int len;
 struct virtnet_info *vi = sq->vq->vdev->priv;
 struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
unsigned int packets = 0;
+   unsigned int bytes = 0;
 
 while (packets < budget &&
    (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
@@ -233,6 +235,7 @@ static unsigned int free_old_xmit_skbs(struct send_queue 
*sq, int budget)
 
 u64_stats_update_begin(&stats->tx_syncp);
 stats->tx_bytes += skb->len;
+   bytes += skb->len;
 stats->tx_packets++;
 u64_stats_update_end(&stats->tx_syncp);
 
@@ -240,6 +243,8 @@ static unsigned int free_old_xmit_skbs(struct send_queue 
*sq, int budget)
packets++;
}
 
+   netdev_tx_completed_queue(txq, packets, bytes);
+
return packets;
 }
 
@@ -810,7 +815,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int 
budget)
 again:
__netif_tx_lock(txq, smp_processor_id());
 virtqueue_disable_cb(sq->vq);
-   sent += free_old_xmit_skbs(sq, budget - sent);
+   sent += free_old_xmit_skbs(txq, sq, budget - sent);
 
 if (sent < budget) {
 enable_done = virtqueue_enable_cb_delayed(sq->vq);
@@ -962,12 +967,13 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 bool kick = !skb->xmit_more;
 bool stopped;
+   unsigned int bytes = skb->len;
 
 virtqueue_disable_cb(sq->vq);
 
/* We are going to push one skb.
 * Try to pop one off to free space for it. */
-   free_old_xmit_skbs(sq, 1);
+   free_old_xmit_skbs(txq, sq, 1);
 
/* Try to transmit */
err = xmit_skb(sq, skb);
@@ -983,6 +989,12 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
return NETDEV_TX_OK;
}
 
+   netdev_tx_sent_queue(txq, bytes);
+
+   /* Kick early so device can process descriptors in parallel with us. */
+   if (kick)
+   virtqueue_kick(sq->vq);
+
/* Apparently nice girls don't return TX_BUSY; stop the queue
 * before it gets out of hand.  Naturally, this wastes entries. */
 if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
@@ -997,7 +1009,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
 
 if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
/* More just got used, free them then recheck. */
-   free_old_xmit_skbs(sq, qsize);
+   free_old_xmit_skbs(txq, sq, qsize);
 if (stopped && sq->vq->num_free >= 2+MAX_SKB_FRAGS)
netif_start_subqueue(dev, qnum);
}
-- 
MST



[PATCH 1/2] virtio_balloon: return the amount of freed memory from leak_balloon()

2014-10-15 Thread Denis V. Lunev
From: Raushaniya Maksudova rmaksud...@parallels.com

This value will be used in the next patch to report the amount of
freed memory to the OOM killer.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f893148..66cac10 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -168,8 +168,9 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned 
int num)
}
 }
 
-static void leak_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 {
+   unsigned num_freed_pages;
struct page *page;
 struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
 
@@ -186,6 +187,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
 vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
 
+   num_freed_pages = vb->num_pfns;
/*
 * Note that if
 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
@@ -195,6 +197,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
 tell_host(vb, vb->deflate_vq);
 mutex_unlock(&vb->balloon_lock);
 release_pages_by_pfn(vb->pfns, vb->num_pfns);
+   return num_freed_pages;
 }
 
 static inline void update_stat(struct virtio_balloon *vb, int idx,
-- 
1.9.1



[PATCH v3 0/2] shrink virtio balloon on OOM in guest

2014-10-15 Thread Denis V. Lunev
Excessive virtio_balloon inflation can cause invocation of the OOM-killer when
Linux is under severe memory pressure. Various mechanisms are responsible for
correct virtio_balloon memory management. Nevertheless, it is often the case
that these control tools do not have enough time to react to a fast-changing
memory load. As a result, the OS runs out of memory and invokes the OOM-killer.
Balancing memory by use of the virtio balloon should not cause the termination
of processes while there are still pages in the balloon. Currently there is no
way for the virtio balloon driver to free memory at the last moment before
some process gets killed by the OOM-killer.

This does not open a security hole, as the balloon itself runs inside the
guest OS and works in cooperation with the host. Thus such improvements on
the guest side should be considered normal.

To solve the problem, introduce a virtio_balloon callback which is expected
to be called from the OOM notifier call chain in the out_of_memory() function.
If the virtio balloon can release some memory, the system will return and
retry the allocation that forced the out-of-memory killer to run.

Patch 1 of this series prepares for this by making leak_balloon() return the
number of freed pages. Patch 2 implements the virtio_balloon callback itself;
a minimal sketch of the mechanism is shown below.
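
The sketch below shows the notifier mechanism in isolation, as a
stand-alone module; it is illustrative only. release_some_pages() is a
hypothetical stand-in for leak_balloon() plus update_balloon_size(),
while notifier_block, register_oom_notifier() and
unregister_oom_notifier() are the real kernel API used by patch 2.

#include <linux/module.h>
#include <linux/notifier.h>
#include <linux/oom.h>

/* Hypothetical reclaim hook: return the number of pages released. */
static unsigned long release_some_pages(void)
{
	return 0;
}

/* Called from the OOM notifier chain; *parm accumulates freed pages. */
static int my_oom_notify(struct notifier_block *nb,
			 unsigned long dummy, void *parm)
{
	unsigned long *freed = parm;

	*freed += release_some_pages();
	return NOTIFY_OK;
}

static struct notifier_block my_oom_nb = {
	.notifier_call = my_oom_notify,
};

static int __init my_init(void)
{
	return register_oom_notifier(&my_oom_nb);
}

static void __exit my_exit(void)
{
	unregister_oom_notifier(&my_oom_nb);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");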

Changes from v2:
- added a feature bit to control OOM balloon behavior from the host
Changes from v1:
- minor cosmetic tweaks suggested by rusty@

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com



[PATCH 2/2] virtio_balloon: free some memory from balloon on OOM

2014-10-15 Thread Denis V. Lunev
From: Raushaniya Maksudova rmaksud...@parallels.com

Excessive virtio_balloon inflation can cause invocation of the OOM-killer
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless,
it is often the case that these control tools do not have enough time
to react to a fast-changing memory load. As a result, the OS runs out of
memory and invokes the OOM-killer. Balancing memory by use of the virtio
balloon should not cause the termination of processes while there are
pages in the balloon. Currently there is no way for the virtio balloon
driver to free some memory at the last moment before some process gets
killed by the OOM-killer.

This does not open a security hole, as the balloon itself runs inside
the guest OS and works in cooperation with the host. Thus such
improvements on the guest side should be considered normal.

To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the OOM notifier call chain in the
out_of_memory() function. If the virtio balloon can release some memory,
the system will return and retry the allocation that forced the
out-of-memory killer to run.

Allocate a virtio feature bit for this: it is not set by default, so
the guest will not deflate the virtio balloon on OOM without explicit
permission from the host.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 52 +
 include/uapi/linux/virtio_balloon.h |  1 +
 2 files changed, 53 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 66cac10..88d73a0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -28,6 +28,7 @@
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <linux/balloon_compaction.h>
+#include <linux/oom.h>
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -36,6 +37,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
+#define OOM_VBALLOON_DEFAULT_PAGES 256
+#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
+
+static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
+module_param(oom_pages, int, S_IRUSR | S_IWUSR);
MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 
 struct virtio_balloon
 {
@@ -71,6 +78,9 @@ struct virtio_balloon
/* Memory statistics */
int need_stats_update;
struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
+
+   /* To register callback in oom notifier call chain */
+   struct notifier_block nb;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -290,6 +300,38 @@ static void update_balloon_size(struct virtio_balloon *vb)
  actual);
 }
 
+/*
+ * virtballoon_oom_notify - release pages when system is under severe
+ * memory pressure (called from out_of_memory())
+ * @self : notifier block struct
+ * @dummy: not used
+ * @parm : returned - number of freed pages
+ *
+ * The balancing of memory by use of the virtio balloon should not cause
+ * the termination of processes while there are pages in the balloon.
+ * If virtio balloon manages to release some memory, it will make the
+ * system return and retry the allocation that forced the OOM killer
+ * to run.
+ */
+static int virtballoon_oom_notify(struct notifier_block *self,
+ unsigned long dummy, void *parm)
+{
+   struct virtio_balloon *vb;
+   unsigned long *freed;
+   unsigned num_freed_pages;
+
+   vb = container_of(self, struct virtio_balloon, nb);
+   if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+   return NOTIFY_OK;
+
+   freed = parm;
+   num_freed_pages = leak_balloon(vb, oom_pages);
+   update_balloon_size(vb);
+   *freed += num_freed_pages;
+
+   return NOTIFY_OK;
+}
+
 static int balloon(void *_vballoon)
 {
struct virtio_balloon *vb = _vballoon;
@@ -446,6 +488,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
if (err)
goto out_free_vb;
 
+   vb->nb.notifier_call = virtballoon_oom_notify;
+   vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
+   err = register_oom_notifier(&vb->nb);
+   if (err < 0)
+   goto out_oom_notify;
+
 vb->thread = kthread_run(balloon, vb, "vballoon");
 if (IS_ERR(vb->thread)) {
 err = PTR_ERR(vb->thread);
@@ -455,6 +503,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
return 0;
 
 out_del_vqs:
+   unregister_oom_notifier(&vb->nb);
+out_oom_notify:
 vdev->config->del_vqs(vdev);
 out_free_vb:
kfree(vb);
@@ -479,6 +529,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
 {

Re: [PATCH] virtio_net: fix use after free

2014-10-15 Thread David Miller
From: Michael S. Tsirkin m...@redhat.com
Date: Wed, 15 Oct 2014 16:23:28 +0300

 You used __netif_subqueue_stopped but that seems to use
 a slightly more expensive test_bit internally.

More expensive in what sense?  It should be roughly the same
as x & y sans the volatile.

Anyways I'm ambivalent and I want to see this bug fixed, so I'll
apply your patch.

Thanks!