Re: [PATCH v4 04/25] virtio: defer config changed notifications

2014-10-14 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 Defer config changed notifications that arrive during
 probe/scan/freeze/restore.

 This will allow drivers to set DRIVER_OK earlier, without worrying about
 racing with config change interrupts.

 This change will also benefit old hypervisors (before 2009)
 that send interrupts without checking DRIVER_OK: previously,
 the callback could race with driver-specific initialization.

 This will also help simplify drivers.

But AFAICT you never *read* dev->config_changed.

You unconditionally trigger a config_changed event in
virtio_config_enable().  That's a bit weird, but probably OK.

How about the following change (on top of your patch).  I
think the renaming is clearer, and note the added if() test in
virtio_config_enable().

If you approve, I'll fold it in.

Cheers,
Rusty.

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 2536701b098b..df598dd8c5c8 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -122,7 +122,7 @@ static void __virtio_config_changed(struct virtio_device *dev)
	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
 
	if (!dev->config_enabled)
-		dev->config_changed = true;
+		dev->config_change_pending = true;
	else if (drv && drv->config_changed)
		drv->config_changed(dev);
 }
@@ -148,8 +148,9 @@ static void virtio_config_enable(struct virtio_device *dev)
 {
	spin_lock_irq(&dev->config_lock);
	dev->config_enabled = true;
-	__virtio_config_changed(dev);
-	dev->config_changed = false;
+	if (dev->config_change_pending)
+		__virtio_config_changed(dev);
+	dev->config_change_pending = false;
	spin_unlock_irq(&dev->config_lock);
 }
 
@@ -253,7 +254,7 @@ int register_virtio_device(struct virtio_device *dev)
 
	spin_lock_init(&dev->config_lock);
	dev->config_enabled = false;
-	dev->config_changed = false;
+	dev->config_change_pending = false;
 
/* We always start by resetting the device, in case a previous
 * driver messed it up.  This also tests that code path a little. */
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 5636b119dc25..65261a7244fc 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -80,7 +80,7 @@ bool virtqueue_is_broken(struct virtqueue *vq);
  * @index: unique position on the virtio bus
  * @failed: saved value for CONFIG_S_FAILED bit (for restore)
  * @config_enabled: configuration change reporting enabled
- * @config_changed: configuration change reported while disabled
+ * @config_change_pending: configuration change reported while disabled
  * @config_lock: protects configuration change reporting
  * @dev: underlying device.
  * @id: the device type identification (used to match it with a driver).
@@ -94,7 +94,7 @@ struct virtio_device {
int index;
bool failed;
bool config_enabled;
-   bool config_changed;
+   bool config_change_pending;
spinlock_t config_lock;
struct device dev;
struct virtio_device_id id;
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/3] virtio_net: pass well-formed sgs to virtqueue_add_*()

2014-10-14 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Fri, Sep 05, 2014 at 12:40:50PM +0200, Paolo Bonzini wrote:
 Il 03/09/2014 06:29, Rusty Russell ha scritto:
  +	sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
 
 I think 2 is enough here.  That said...
 
 	sg_set_buf(rq->sg, &hdr->hdr, sizeof hdr->hdr);
 -
 	skb_to_sgvec(skb, rq->sg + 1, 0, skb->len);
  
 	err = virtqueue_add_inbuf(rq->vq, rq->sg, 2, skb, gfp);
 
 ... skb_to_sgvec will already make the sg well formed, so the
 sg_init_table is _almost_ redundant; it is only there to remove
 intermediate end marks.  The block layer takes care to remove
 them, but skb_to_sgvec doesn't.

sg_init_table is still needed if CONFIG_DEBUG_SG, so I don't
think it's worth it.

Thanks,
Rusty.


Re: [PATCH 2/2] virtio_balloon: free some memory from baloon on OOM

2014-10-14 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:

 On Mon, Oct 13, 2014 at 04:02:52PM +1030, Rusty Russell wrote:
 Denis V. Lunev d...@parallels.com writes:
  From: Raushaniya Maksudova rmaksud...@parallels.com
 
  Excessive virtio_balloon inflation can cause invocation of the
  OOM-killer when Linux is under severe memory pressure. Various
  mechanisms are responsible for correct virtio_balloon memory
  management. Nevertheless, it is often the case that these control
  tools do not have enough time to react to a fast-changing memory
  load. As a result, the OS runs out of memory and invokes the
  OOM-killer. Balancing memory by use of the virtio balloon should
  not cause the termination of processes while there are pages in
  the balloon. Currently there is no way for the virtio balloon
  driver to free some memory at the last moment before some process
  gets killed by the OOM-killer.
 
 This makes some amount of sense.

 This reminds me of the balloon fs that Google once proposed.
 This really needs to be controlled from host though.
 At the moment host does not expect guest to deflate before
 requests.
 So as a minimum, add a feature bit for this.  What if you want a mix of
 mandatory and optional ballooning? I guess we can use multiple balloons,
 is that the idea?

Trying to claw back some pages on OOM is almost certainly correct,
even if the host doesn't expect it.  It's roughly equivalent to not
giving up pages in the first place.

Cheers,
Rusty.
PS.  Yes, a real guest-driven balloon is preferable, but that's a much
 larger task.



Re: [PATCH v4 04/25] virtio: defer config changed notifications

2014-10-14 Thread Michael S. Tsirkin
On Tue, Oct 14, 2014 at 11:01:12AM +1030, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  Defer config changed notifications that arrive during
  probe/scan/freeze/restore.
 
  This will allow drivers to set DRIVER_OK earlier, without worrying about
  racing with config change interrupts.
 
  This change will also benefit old hypervisors (before 2009)
  that send interrupts without checking DRIVER_OK: previously,
  the callback could race with driver-specific initialization.
 
  This will also help simplify drivers.
 
 But AFAICT you never *read* dev->config_changed.
 
 You unconditionally trigger a config_changed event in
 virtio_config_enable().  That's a bit weird, but probably OK.
 
 How about the following change (on top of your patch).  I
 think the renaming is clearer, and note the added if() test in
 virtio_config_enable().
 
 If you approve, I'll fold it in.
 
 Cheers,
 Rusty.

Hi Rusty,
I'm okay with both changes.
You can fold it in if you prefer, or just make it a patch on top.

 diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
 index 2536701b098b..df598dd8c5c8 100644
 --- a/drivers/virtio/virtio.c
 +++ b/drivers/virtio/virtio.c
 @@ -122,7 +122,7 @@ static void __virtio_config_changed(struct virtio_device *dev)
 	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
  
 	if (!dev->config_enabled)
 -		dev->config_changed = true;
 +		dev->config_change_pending = true;
 	else if (drv && drv->config_changed)
 		drv->config_changed(dev);
  }
 @@ -148,8 +148,9 @@ static void virtio_config_enable(struct virtio_device *dev)
  {
 	spin_lock_irq(&dev->config_lock);
 	dev->config_enabled = true;
 -	__virtio_config_changed(dev);
 -	dev->config_changed = false;
 +	if (dev->config_change_pending)
 +		__virtio_config_changed(dev);
 +	dev->config_change_pending = false;
 	spin_unlock_irq(&dev->config_lock);
  }
  
 @@ -253,7 +254,7 @@ int register_virtio_device(struct virtio_device *dev)
  
 	spin_lock_init(&dev->config_lock);
 	dev->config_enabled = false;
 -	dev->config_changed = false;
 +	dev->config_change_pending = false;
  
   /* We always start by resetting the device, in case a previous
* driver messed it up.  This also tests that code path a little. */
 diff --git a/include/linux/virtio.h b/include/linux/virtio.h
 index 5636b119dc25..65261a7244fc 100644
 --- a/include/linux/virtio.h
 +++ b/include/linux/virtio.h
 @@ -80,7 +80,7 @@ bool virtqueue_is_broken(struct virtqueue *vq);
   * @index: unique position on the virtio bus
   * @failed: saved value for CONFIG_S_FAILED bit (for restore)
   * @config_enabled: configuration change reporting enabled
 - * @config_changed: configuration change reported while disabled
 + * @config_change_pending: configuration change reported while disabled
   * @config_lock: protects configuration change reporting
   * @dev: underlying device.
   * @id: the device type identification (used to match it with a driver).
 @@ -94,7 +94,7 @@ struct virtio_device {
   int index;
   bool failed;
   bool config_enabled;
 - bool config_changed;
 + bool config_change_pending;
   spinlock_t config_lock;
   struct device dev;
   struct virtio_device_id id;


Re: [PATCH 2/2] virtio_balloon: free some memory from baloon on OOM

2014-10-14 Thread Michael S. Tsirkin
On Tue, Oct 14, 2014 at 10:14:05AM +1030, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Mon, Oct 13, 2014 at 04:02:52PM +1030, Rusty Russell wrote:
  Denis V. Lunev d...@parallels.com writes:
   From: Raushaniya Maksudova rmaksud...@parallels.com
  
   Excessive virtio_balloon inflation can cause invocation of the
   OOM-killer when Linux is under severe memory pressure. Various
   mechanisms are responsible for correct virtio_balloon memory
   management. Nevertheless, it is often the case that these control
   tools do not have enough time to react to a fast-changing memory
   load. As a result, the OS runs out of memory and invokes the
   OOM-killer. Balancing memory by use of the virtio balloon should
   not cause the termination of processes while there are pages in
   the balloon. Currently there is no way for the virtio balloon
   driver to free some memory at the last moment before some process
   gets killed by the OOM-killer.
  
  This makes some amount of sense.
 
  This reminds me of the balloon fs that Google once proposed.
  This really needs to be controlled from host though.
  At the moment host does not expect guest to deflate before
  requests.
  So as a minimum, add a feature bit for this.  What if you want a mix of
  mandatory and optional ballooning? I guess we can use multiple balloons,
  is that the idea?
 
 Trying to claw back some pages on OOM is almost certainly correct,
 even if the host doesn't expect it.  It's roughly equivalent to not
 giving up pages in the first place.

Well, the difference is that there are management tools in the host
that poll the balloon until they see its size reach the expected value.

They don't expect the balloon to shrink below num_pages, and may respond
in various unexpected ways, e.g. by killing the VM, if it does.
Killing a userspace process within the guest might be better
for VM health.

Besides the fact that we have always done it like this, these tools seem
to have a basis in the spec.
Specifically, this behaviour rests on the following text from the spec:

"the device asks for a certain amount of memory, and the driver
supplies it (or withdraws it, if the device has more than it asks for).
This allows the guest to adapt to changes in allowance of underlying
physical memory."

and

"The device is driven by the receipt of a configuration change interrupt."



 Cheers,
 Rusty.
 PS.  Yes, a real guest-driven balloon is preferable, but that's a much
  larger task.


Any objection to making the feature depend on a feature flag?

-- 
MST


Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt

2014-10-14 Thread David Miller
From: Jason Wang jasow...@redhat.com
Date: Sat, 11 Oct 2014 15:16:43 +0800

 We currently free old transmitted packets in ndo_start_xmit(), so any
 packet must also be orphaned there. This was done to reduce the overhead
 of tx interrupts and achieve better performance, but it may not work
 well for some protocols such as TCP streams. TCP depends on the value of
 sk_wmem_alloc to implement various optimizations for streams of small
 packets, such as TCP small queues and auto-corking. Orphaning packets
 early in ndo_start_xmit() more or less disables these, since
 sk_wmem_alloc is no longer accurate. This leads to very low throughput
 for TCP streams of small writes.
 
 This series tries to solve the issue by enabling tx interrupts for all
 TCP packets other than those with the push bit set or pure ACKs. This is
 done through support for urgent descriptors, which can force an
 interrupt for a specified packet. If a tx interrupt is enabled for a
 packet, there's no need to orphan it in ndo_start_xmit(); we can free it
 in the tx napi scheduled by the tx interrupt. sk_wmem_alloc is then more
 accurate than before, and TCP can batch more small writes. TCP also
 produces larger skbs in this case, improving both throughput and cpu
 utilization.
 
 Tests show great improvements on small-write tcp streams. For most other
 cases, throughput and cpu utilization are the same as before. Only in a
 few cases was higher cpu utilization noticed, which needs more
 investigation.
 
 Review and comments are welcome.

I think proper accounting and queueing (at all levels, not just TCP
sockets) is more important than trying to skim a bunch of cycles by
avoiding TX interrupts.

Having an event to free the SKB is absolutely essential for the stack
to operate correctly.

And with virtio-net you don't even have the excuse of "the HW
unfortunately doesn't have an appropriate TX event".

So please don't play games, and instead use TX interrupts all the
time.  You can mitigate them in various ways, but don't turn them on
selectively based upon traffic type, that's terrible.

You can even use ->xmit_more to defer the TX interrupt indication to
the final TX packet in the chain.


Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt

2014-10-14 Thread Michael S. Tsirkin
On Tue, Oct 14, 2014 at 02:53:27PM -0400, David Miller wrote:
 From: Jason Wang jasow...@redhat.com
 Date: Sat, 11 Oct 2014 15:16:43 +0800
 
  We currently free old transmitted packets in ndo_start_xmit(), so any
  packet must also be orphaned there. This was done to reduce the overhead
  of tx interrupts and achieve better performance, but it may not work
  well for some protocols such as TCP streams. TCP depends on the value of
  sk_wmem_alloc to implement various optimizations for streams of small
  packets, such as TCP small queues and auto-corking. Orphaning packets
  early in ndo_start_xmit() more or less disables these, since
  sk_wmem_alloc is no longer accurate. This leads to very low throughput
  for TCP streams of small writes.
  
  This series tries to solve the issue by enabling tx interrupts for all
  TCP packets other than those with the push bit set or pure ACKs. This is
  done through support for urgent descriptors, which can force an
  interrupt for a specified packet. If a tx interrupt is enabled for a
  packet, there's no need to orphan it in ndo_start_xmit(); we can free it
  in the tx napi scheduled by the tx interrupt. sk_wmem_alloc is then more
  accurate than before, and TCP can batch more small writes. TCP also
  produces larger skbs in this case, improving both throughput and cpu
  utilization.
  
  Tests show great improvements on small-write tcp streams. For most other
  cases, throughput and cpu utilization are the same as before. Only in a
  few cases was higher cpu utilization noticed, which needs more
  investigation.
  
  Review and comments are welcome.
 
 I think proper accounting and queueing (at all levels, not just TCP
 sockets) is more important than trying to skim a bunch of cycles by
 avoiding TX interrupts.
 
 Having an event to free the SKB is absolutely essential for the stack
 to operate correctly.
 
 And with virtio-net you don't even have the excuse of "the HW
 unfortunately doesn't have an appropriate TX event".
 
 So please don't play games, and instead use TX interrupts all the
 time.  You can mitigate them in various ways, but don't turn them on
 selectively based upon traffic type, that's terrible.

This got me thinking: how about using virtqueue_enable_cb_delayed
for this mitigation?

It's pretty easy to implement - I'll send a proof of concept patch
separately.



Re: [PATCH net-next RFC 3/3] virtio-net: conditionally enable tx interrupt

2014-10-14 Thread Michael S. Tsirkin
On Sat, Oct 11, 2014 at 03:16:46PM +0800, Jason Wang wrote:
 We used to free transmitted packets in ndo_start_xmit() to get better
 performance. One side effect is that skb_orphan() needs to be called in
 ndo_start_xmit(), which in fact makes sk_wmem_alloc inaccurate. For TCP,
 this means several optimizations, such as TCP small queues and
 auto-corking, cannot work well. This can lead to very low throughput for
 streams of small packets.
 
 Thanks to urgent descriptor support, this patch tries to solve the issue
 by enabling the tx interrupt selectively for stream packets. This means
 we don't need to orphan TCP stream packets in ndo_start_xmit(), but
 instead enable the tx interrupt for those packets. After we get a tx
 interrupt, a tx napi is scheduled to free those packets.
 
 With this method, sk_wmem_alloc of a TCP socket is more accurate than
 before, which lets TCP batch more through TSQ and auto-corking.
 
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  drivers/net/virtio_net.c | 164 
 ---
  1 file changed, 128 insertions(+), 36 deletions(-)
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 5810841..b450fc4 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -72,6 +72,8 @@ struct send_queue {
  
   /* Name of the send queue: output.$index */
   char name[40];
 +
 + struct napi_struct napi;
  };
  
  /* Internal representation of a receive virtqueue */
 @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
  }
  
 +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
 +{
 +	struct sk_buff *skb;
 +	unsigned int len;
 +	struct virtnet_info *vi = sq->vq->vdev->priv;
 +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 +	int sent = 0;
 +
 +	while (sent < budget &&
 +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 +		pr_debug("Sent skb %p\n", skb);
 +
 +		u64_stats_update_begin(&stats->tx_syncp);
 +		stats->tx_bytes += skb->len;
 +		stats->tx_packets++;
 +		u64_stats_update_end(&stats->tx_syncp);
 +
 +		dev_kfree_skb_any(skb);
 +		sent++;
 +	}
 +
 +	return sent;
 +}
 +
  static void skb_xmit_done(struct virtqueue *vq)
  {
 	struct virtnet_info *vi = vq->vdev->priv;
 +	struct send_queue *sq = &vi->sq[vq2txq(vq)];
  
 -	/* Suppress further interrupts. */
 -	virtqueue_disable_cb(vq);
 -
 -	/* We were probably waiting for more output buffers. */
 -	netif_wake_subqueue(vi->dev, vq2txq(vq));
 +	if (napi_schedule_prep(&sq->napi)) {
 +		virtqueue_disable_cb(vq);
 +		virtqueue_disable_cb_urgent(vq);

This disable_cb is no longer safe in the xmit_done callback,
since the queue can be running at the same time.

You must do it under the tx lock. And yes, this likely will not
work well without event_idx. We'll probably need extra
synchronization for such old hosts.



 +		__napi_schedule(&sq->napi);
 + }
  }
  
  static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
 @@ -772,7 +799,38 @@ again:
   return received;
  }
  
 +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 +{
 +	struct send_queue *sq =
 +		container_of(napi, struct send_queue, napi);
 +	struct virtnet_info *vi = sq->vq->vdev->priv;
 +	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
 +	unsigned int r, sent = 0;
 +
 +again:
 +	__netif_tx_lock(txq, smp_processor_id());
 +	sent += free_old_xmit_skbs(sq, budget - sent);
 +
 +	if (sent < budget) {
 +		r = virtqueue_enable_cb_prepare_urgent(sq->vq);
 +		napi_complete(napi);
 +		__netif_tx_unlock(txq);
 +		if (unlikely(virtqueue_poll(sq->vq, r)) &&
 +		    napi_schedule_prep(napi)) {
 +			virtqueue_disable_cb_urgent(sq->vq);
 +			__napi_schedule(napi);
 +			goto again;
 +		}
 +	} else {
 +		__netif_tx_unlock(txq);
 +	}
 +
 +	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
 +	return sent;
 +}
 + return sent;
 +}
 +
  #ifdef CONFIG_NET_RX_BUSY_POLL
 +
  /* must be called with local_bh_disable()d */
  static int virtnet_busy_poll(struct napi_struct *napi)
  {
 @@ -820,31 +878,13 @@ static int virtnet_open(struct net_device *dev)
 		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(&vi->rq[i]);
 +		napi_enable(&vi->sq[i].napi);
   }
  
   return 0;
  }
  
 -static void free_old_xmit_skbs(struct send_queue *sq)
 -{
 -	struct sk_buff *skb;
 -	unsigned int len;
 -	struct virtnet_info *vi = sq->vq->vdev->priv;
 -	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 -
 -	while ((skb = 

[PATCH RFC] virtio_net: enable tx interrupt

2014-10-14 Thread Michael S. Tsirkin
On newer hosts that support delayed tx interrupts,
we probably don't have much to gain from orphaning
packets early.

Based on patch by Jason Wang.

Note: this will likely degrade performance for hosts without event idx
support.  Various fallback options are available, including
orphaning conditionally.
Testing TBD.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/net/virtio_net.c | 119 +--
 1 file changed, 83 insertions(+), 36 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6b6e136..62c059d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {
 
/* Name of the send queue: output.$index */
char name[40];
+
+   struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -211,15 +213,38 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
	return p;
 }
 
+static int free_old_xmit_skbs(struct send_queue *sq, int budget)
+{
+	struct sk_buff *skb;
+	unsigned int len;
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	int sent = 0;
+
+	while (sent < budget &&
+	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		pr_debug("Sent skb %p\n", skb);
+
+		u64_stats_update_begin(&stats->tx_syncp);
+		stats->tx_bytes += skb->len;
+		stats->tx_packets++;
+		u64_stats_update_end(&stats->tx_syncp);
+
+		dev_kfree_skb_any(skb);
+		sent++;
+	}
+
+	return sent;
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
	struct virtnet_info *vi = vq->vdev->priv;
+	struct send_queue *sq = &vi->sq[vq2txq(vq)];
 
-	/* Suppress further interrupts. */
-	virtqueue_disable_cb(vq);
-
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi_schedule_prep(&sq->napi)) {
+		__napi_schedule(&sq->napi);
+	}
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -766,6 +791,37 @@ again:
return received;
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq =
+		container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+	unsigned int r, sent = 0;
+
+again:
+	__netif_tx_lock(txq, smp_processor_id());
+	virtqueue_disable_cb(sq->vq);
+	sent += free_old_xmit_skbs(sq, budget - sent);
+
+	if (sent < budget) {
+		r = virtqueue_enable_cb_prepare(sq->vq);
+		napi_complete(napi);
+		__netif_tx_unlock(txq);
+		if (unlikely(virtqueue_poll(sq->vq, r)) &&
+		    napi_schedule_prep(napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(napi);
+			goto again;
+		}
+	} else {
+		__netif_tx_unlock(txq);
+	}
+
+	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+	return sent;
+}
+
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 /* must be called with local_bh_disable()d */
 static int virtnet_busy_poll(struct napi_struct *napi)
@@ -814,30 +870,12 @@ static int virtnet_open(struct net_device *dev)
		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
			schedule_delayed_work(&vi->refill, 0);
		virtnet_napi_enable(&vi->rq[i]);
+		napi_enable(&vi->sq[i].napi);
}
 
return 0;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
-{
-	struct sk_buff *skb;
-	unsigned int len;
-	struct virtnet_info *vi = sq->vq->vdev->priv;
-	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
-
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		pr_debug("Sent skb %p\n", skb);
-
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
-
-		dev_kfree_skb_any(skb);
-	}
-}
-
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
struct skb_vnet_hdr *hdr;
@@ -902,7 +940,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
	sg_set_buf(sq->sg, hdr, hdr_len);
	num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
	}
-	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
+
+	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb,
+				    GFP_ATOMIC);
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -910,10 +950,9 @@ static netdev_tx_t 

Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt

2014-10-14 Thread Michael S. Tsirkin
On Tue, Oct 14, 2014 at 02:53:27PM -0400, David Miller wrote:
 From: Jason Wang jasow...@redhat.com
 Date: Sat, 11 Oct 2014 15:16:43 +0800
 
  We currently free old transmitted packets in ndo_start_xmit(), so any
  packet must also be orphaned there. This was done to reduce the overhead
  of tx interrupts and achieve better performance, but it may not work
  well for some protocols such as TCP streams. TCP depends on the value of
  sk_wmem_alloc to implement various optimizations for streams of small
  packets, such as TCP small queues and auto-corking. Orphaning packets
  early in ndo_start_xmit() more or less disables these, since
  sk_wmem_alloc is no longer accurate. This leads to very low throughput
  for TCP streams of small writes.
  
  This series tries to solve the issue by enabling tx interrupts for all
  TCP packets other than those with the push bit set or pure ACKs. This is
  done through support for urgent descriptors, which can force an
  interrupt for a specified packet. If a tx interrupt is enabled for a
  packet, there's no need to orphan it in ndo_start_xmit(); we can free it
  in the tx napi scheduled by the tx interrupt. sk_wmem_alloc is then more
  accurate than before, and TCP can batch more small writes. TCP also
  produces larger skbs in this case, improving both throughput and cpu
  utilization.
  
  Tests show great improvements on small-write tcp streams. For most other
  cases, throughput and cpu utilization are the same as before. Only in a
  few cases was higher cpu utilization noticed, which needs more
  investigation.
  
  Review and comments are welcome.
 
 I think proper accounting and queueing (at all levels, not just TCP
 sockets) is more important than trying to skim a bunch of cycles by
 avoiding TX interrupts.
 
 Having an event to free the SKB is absolutely essential for the stack
 to operate correctly.
 
 And with virtio-net you don't even have the excuse of "the HW
 unfortunately doesn't have an appropriate TX event".
 
 So please don't play games, and instead use TX interrupts all the
 time.  You can mitigate them in various ways, but don't turn them on
 selectively based upon traffic type, that's terrible.
 
 You can even use ->xmit_more to defer the TX interrupt indication to
 the final TX packet in the chain.

I guess we can just defer the kick; the interrupt will naturally be
deferred as well.
This should solve the problem for old hosts too.

We'll also need to implement BQL for this.
Something like the below?
Completely untested - posting here to see if I figured the
API out correctly. Has to be applied on top of the previous patch.

---

virtio_net: bql + xmit_more

Signed-off-by: Michael S. Tsirkin m...@redhat.com

---

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 62c059d..c245047 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -213,13 +213,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
	return p;
 }
 
-static int free_old_xmit_skbs(struct send_queue *sq, int budget)
+static int free_old_xmit_skbs(struct netdev_queue *txq,
+			      struct send_queue *sq, int budget)
 {
	struct sk_buff *skb;
	unsigned int len;
	struct virtnet_info *vi = sq->vq->vdev->priv;
	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
	int sent = 0;
+	unsigned int bytes = 0;
 
	while (sent < budget &&
	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
@@ -227,6 +229,7 @@ static int free_old_xmit_skbs(struct send_queue *sq, int budget)
 
		u64_stats_update_begin(&stats->tx_syncp);
		stats->tx_bytes += skb->len;
+		bytes += skb->len;
		stats->tx_packets++;
		u64_stats_update_end(&stats->tx_syncp);
 
@@ -234,6 +237,8 @@ static int free_old_xmit_skbs(struct send_queue *sq, int budget)
		sent++;
	}
 
+	netdev_tx_completed_queue(txq, sent, bytes);
+
	return sent;
 }
 
@@ -802,7 +807,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 again:
	__netif_tx_lock(txq, smp_processor_id());
	virtqueue_disable_cb(sq->vq);
-	sent += free_old_xmit_skbs(sq, budget - sent);
+	sent += free_old_xmit_skbs(txq, sq, budget - sent);
 
	if (sent < budget) {
		r = virtqueue_enable_cb_prepare(sq->vq);
@@ -951,6 +956,9 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
	int qnum = skb_get_queue_mapping(skb);
	struct send_queue *sq = &vi->sq[qnum];
	int err, qsize = virtqueue_get_vring_size(sq->vq);
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
+	bool kick = !skb->xmit_more || netif_xmit_stopped(txq);
+	unsigned int bytes = skb->len;
 
	virtqueue_disable_cb(sq->vq);
 
@@ -967,7 +975,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
	dev_kfree_skb_any(skb);

Re: [PATCH RFC] virtio_net: enable tx interrupt

2014-10-14 Thread Michael S. Tsirkin
On Wed, Oct 15, 2014 at 12:53:59AM +0300, Michael S. Tsirkin wrote:
  static void skb_xmit_done(struct virtqueue *vq)
  {
 	struct virtnet_info *vi = vq->vdev->priv;
 +	struct send_queue *sq = &vi->sq[vq2txq(vq)];
  
 -	/* Suppress further interrupts. */
 -	virtqueue_disable_cb(vq);
 -

One note here: the current code seems racy because of the
virtqueue_disable_cb in skb_xmit_done that I'm dropping here: there's
no guarantee we don't get an interrupt while the tx ring is running, and
if that happens we can end up with interrupts disabled forever.

 -	/* We were probably waiting for more output buffers. */
 -	netif_wake_subqueue(vi->dev, vq2txq(vq));
 +	if (napi_schedule_prep(&sq->napi)) {
 +		__napi_schedule(&sq->napi);
 +	}
  }
  
  static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)


Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt

2014-10-14 Thread Jason Wang
On 10/15/2014 05:51 AM, Michael S. Tsirkin wrote:
 On Tue, Oct 14, 2014 at 02:53:27PM -0400, David Miller wrote:
 From: Jason Wang jasow...@redhat.com
 Date: Sat, 11 Oct 2014 15:16:43 +0800

 We currently free old transmitted packets in ndo_start_xmit(), so any
 packet must also be orphaned there. This was done to reduce the overhead of
 tx interrupts and achieve better performance. But this may not work well for
 some protocols such as TCP streams. TCP depends on the value of sk_wmem_alloc
 to implement various optimizations for small-packet streams, such as TCP
 small queues and auto corking. But orphaning packets early in ndo_start_xmit()
 more or less disables such optimizations, since sk_wmem_alloc is no longer
 accurate. This leads to extra-low throughput for TCP streams of small writes.

 This series tries to solve this issue by enabling tx interrupts for all TCP
 packets other than those carrying a push bit or pure ACKs. This is done
 through support for an urgent descriptor, which can force an interrupt for a
 specified packet. If the tx interrupt is enabled for a packet, there is no
 need to orphan it in ndo_start_xmit(); we can free it in tx napi, which is
 scheduled by the tx interrupt. Then sk_wmem_alloc is more accurate than
 before and TCP can batch more for small writes. TCP produces larger skbs in
 this case, improving both throughput and cpu utilization.

 Tests show great improvements on small-write TCP streams. For most of the
 other cases, the throughput and cpu utilization are the same as in the
 past. Only in a few cases was more cpu utilization noticed, which needs more
 investigation.

 Review and comments are welcome.
 I think proper accounting and queueing (at all levels, not just TCP
 sockets) is more important than trying to skim a bunch of cycles by
 avoiding TX interrupts.

 Having an event to free the SKB is absolutely essential for the stack
 to operate correctly.

 And with virtio-net you don't even have the excuse of "the HW
 unfortunately doesn't have an appropriate TX event."

 So please don't play games, and instead use TX interrupts all the
 time.  You can mitigate them in various ways, but don't turn them on
 selectively based upon traffic type, that's terrible.
 This got me thinking: how about using virtqueue_enable_cb_delayed
 for this mitigation?

Should work; another possible solution is interrupt coalescing, which
can also speed up the case without the event index.
 It's pretty easy to implement - I'll send a proof of concept patch
 separately.


Re: [PATCH net-next RFC 3/3] virtio-net: conditionally enable tx interrupt

2014-10-14 Thread Jason Wang
On 10/15/2014 05:51 AM, Michael S. Tsirkin wrote:
 On Sat, Oct 11, 2014 at 03:16:46PM +0800, Jason Wang wrote:
  We used to free transmitted packets in ndo_start_xmit() to get better
  performance. One side effect is that skb_orphan() needs to be called in
  ndo_start_xmit(), which makes sk_wmem_alloc inaccurate in fact. For the
  TCP protocol, this means several optimizations such as TCP small queues
  and auto corking could not work well. This can lead to extra-low
  throughput for small-packet streams.

  Thanks to urgent descriptor support, this patch tries to solve this
  issue by enabling the tx interrupt selectively for stream packets. This
  means we don't need to orphan TCP stream packets in ndo_start_xmit(),
  but instead enable the tx interrupt for those packets. After we get a
  tx interrupt, a tx napi is scheduled to free those packets.

  With this method, sk_wmem_alloc of the TCP socket is more accurate than
  in the past, which lets TCP batch more through TSQ and auto corking.
  
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
   drivers/net/virtio_net.c | 164 
  ---
   1 file changed, 128 insertions(+), 36 deletions(-)
  
  diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
  index 5810841..b450fc4 100644
  --- a/drivers/net/virtio_net.c
  +++ b/drivers/net/virtio_net.c
  @@ -72,6 +72,8 @@ struct send_queue {
   
 /* Name of the send queue: output.$index */
 char name[40];
  +
  +  struct napi_struct napi;
   };
   
   /* Internal representation of a receive virtqueue */
  @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 return p;
   }
   
   +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
   +{
   +  struct sk_buff *skb;
   +  unsigned int len;
   +  struct virtnet_info *vi = sq->vq->vdev->priv;
   +  struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
   +  int sent = 0;
   +
   +  while (sent < budget &&
   +         (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
   +          pr_debug("Sent skb %p\n", skb);
   +
   +          u64_stats_update_begin(&stats->tx_syncp);
   +          stats->tx_bytes += skb->len;
   +          stats->tx_packets++;
   +          u64_stats_update_end(&stats->tx_syncp);
   +
   +          dev_kfree_skb_any(skb);
   +          sent++;
   +  }
   +
   +  return sent;
   +}
  +
    static void skb_xmit_done(struct virtqueue *vq)
    {
  struct virtnet_info *vi = vq->vdev->priv;
   +  struct send_queue *sq = vi->sq[vq2txq(vq)];
    
   -  /* Suppress further interrupts. */
   -  virtqueue_disable_cb(vq);
   -
   -  /* We were probably waiting for more output buffers. */
   -  netif_wake_subqueue(vi->dev, vq2txq(vq));
   +  if (napi_schedule_prep(&sq->napi)) {
   +  virtqueue_disable_cb(vq);
   +  virtqueue_disable_cb_urgent(vq);
 This disable_cb is no longer safe in the xmit_done callback,
 since the queue can be running at the same time.

 You must do it under the tx lock. And yes, this likely will not
 work well without event_idx. We'll probably need extra
 synchronization for such old hosts.




Yes, and the virtqueue_enable_cb_prepare() in virtnet_poll_tx() needs to
be synced with virtqueue_enable_cb_delayed(). Otherwise an old idx will
be published and we may still see a tx interrupt storm.


Re: [PATCH RFC] virtio_net: enable tx interrupt

2014-10-14 Thread Jason Wang
On 10/15/2014 07:11 AM, Michael S. Tsirkin wrote:
 On Wed, Oct 15, 2014 at 12:53:59AM +0300, Michael S. Tsirkin wrote:
   static void skb_xmit_done(struct virtqueue *vq)
   {
  struct virtnet_info *vi = vq->vdev->priv;
   +  struct send_queue *sq = vi->sq[vq2txq(vq)];
   
  -  /* Suppress further interrupts. */
  -  virtqueue_disable_cb(vq);
  -
 One note here: current code seems racy because of doing
 virtqueue_disable_cb from skb_xmit_done that I'm dropping here: there's
 no guarantee we don't get an interrupt while tx ring is running, and if
 that happens we can end up with interrupts disabled forever.


Looks harmless since:

- if event index is enabled, virtqueue_disable_cb() does nothing in fact.
- if event index is disabled, we don't depend on tx interrupt and when
num_free is low we will try to enable the tx interrupt again.


[PATCH 1/2] virtio_balloon: return the amount of freed memory from leak_balloon()

2014-10-14 Thread Denis V. Lunev
From: Raushaniya Maksudova rmaksud...@parallels.com

This value will be useful in the next patch to provide the amount of
freed memory to the OOM killer.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f893148..66cac10 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -168,8 +168,9 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
}
 }
 
-static void leak_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 {
+   unsigned num_freed_pages;
struct page *page;
 struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
 
@@ -186,6 +187,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
 
+   num_freed_pages = vb->num_pfns;
/*
 * Note that if
 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
@@ -195,6 +197,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 tell_host(vb, vb->deflate_vq);
 mutex_unlock(&vb->balloon_lock);
 release_pages_by_pfn(vb->pfns, vb->num_pfns);
+   return num_freed_pages;
 }
 
 static inline void update_stat(struct virtio_balloon *vb, int idx,
-- 
1.9.1



[PATCH 2/2] virtio_balloon: free some memory from balloon on OOM

2014-10-14 Thread Denis V. Lunev
From: Raushaniya Maksudova rmaksud...@parallels.com

Excessive virtio_balloon inflation can cause invocation of the OOM-killer
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless,
it is often the case that these control tools do not have enough time
to react to a fast-changing memory load. As a result the OS runs out of
memory and invokes the OOM-killer. The balancing of memory by use of the
virtio balloon should not cause the termination of processes while there
are pages in the balloon. Currently there is no way for the virtio
balloon driver to free some memory at the last moment before a process
gets killed by the OOM-killer.

This does not open a security hole, as the balloon itself runs inside
the guest OS and works in cooperation with the host. Thus such
improvements on the guest side should be considered normal.

To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in the
out_of_memory() function. If the virtio balloon can release some memory,
it will make the system return and retry the allocation that forced the
out-of-memory killer to run.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 48 +
 1 file changed, 48 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 66cac10..ab1fa69 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -28,6 +28,7 @@
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <linux/balloon_compaction.h>
+#include <linux/oom.h>
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -36,6 +37,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
+#define OOM_VBALLOON_DEFAULT_PAGES 256
+#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
+
+static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
+module_param(oom_pages, int, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 
 struct virtio_balloon
 {
@@ -71,6 +78,9 @@ struct virtio_balloon
/* Memory statistics */
int need_stats_update;
struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
+
+   /* To register callback in oom notifier call chain */
+   struct notifier_block nb;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -290,6 +300,35 @@ static void update_balloon_size(struct virtio_balloon *vb)
  actual);
 }
 
+/*
+ * virtballoon_oom_notify - release pages when system is under severe
+ *  memory pressure (called from out_of_memory())
+ * @self : notifier block struct
+ * @dummy: not used
+ * @parm : returned - number of freed pages
+ *
+ * The balancing of memory by use of the virtio balloon should not cause
+ * the termination of processes while there are pages in the balloon.
+ * If virtio balloon manages to release some memory, it will make the
+ * system return and retry the allocation that forced the OOM killer
+ * to run.
+ */
+static int virtballoon_oom_notify(struct notifier_block *self,
+ unsigned long dummy, void *parm)
+{
+   unsigned num_freed_pages;
+   unsigned long *freed;
+   struct virtio_balloon *vb;
+
+   freed = parm;
+   vb = container_of(self, struct virtio_balloon, nb);
+   num_freed_pages = leak_balloon(vb, oom_pages);
+   update_balloon_size(vb);
+   *freed += num_freed_pages;
+
+   return NOTIFY_OK;
+}
+
 static int balloon(void *_vballoon)
 {
struct virtio_balloon *vb = _vballoon;
@@ -446,6 +485,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
if (err)
goto out_free_vb;
 
+   vb->nb.notifier_call = virtballoon_oom_notify;
+   vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
+   err = register_oom_notifier(&vb->nb);
+   if (err < 0)
+   goto out_oom_notify;
+
 vb->thread = kthread_run(balloon, vb, "vballoon");
 if (IS_ERR(vb->thread)) {
 err = PTR_ERR(vb->thread);
@@ -455,6 +500,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
return 0;
 
 out_del_vqs:
+   unregister_oom_notifier(&vb->nb);
+out_oom_notify:
 vdev->config->del_vqs(vdev);
 out_free_vb:
kfree(vb);
@@ -479,6 +526,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
 {
 struct virtio_balloon *vb = vdev->priv;
 
+   unregister_oom_notifier(&vb->nb);
 kthread_stop(vb->thread);
remove_common(vb);
kfree(vb);
-- 
1.9.1


Re: [PATCH RFC] virtio_net: enable tx interrupt

2014-10-14 Thread Jason Wang
On 10/15/2014 12:33 PM, Jason Wang wrote:
 On 10/15/2014 07:11 AM, Michael S. Tsirkin wrote:
  On Wed, Oct 15, 2014 at 12:53:59AM +0300, Michael S. Tsirkin wrote:
static void skb_xmit_done(struct virtqueue *vq)
{
  struct virtnet_info *vi = vq->vdev->priv;
    + struct send_queue *sq = vi->sq[vq2txq(vq)];

   - /* Suppress further interrupts. */
   - virtqueue_disable_cb(vq);
   -
  One note here: current code seems racy because of doing
  virtqueue_disable_cb from skb_xmit_done that I'm dropping here: there's
  no guarantee we don't get an interrupt while tx ring is running, and if
  that happens we can end up with interrupts disabled forever.
 
 Looks harmless since:

 - if event index is enabled, virtqueue_disable_cb() does nothing in fact.
 - if event index is disabled, we don't depend on tx interrupt and when
 num_free is low we will try to enable the tx interrupt again.

OK, I think I get you here. By 'current' you mean the RFC I posted.