Re: [PATCH 2/2] virtio_balloon: free some memory from baloon on OOM
On 14/10/14 13:10, Michael S. Tsirkin wrote:

On Tue, Oct 14, 2014 at 10:14:05AM +1030, Rusty Russell wrote:

Michael S. Tsirkin m...@redhat.com writes:

On Mon, Oct 13, 2014 at 04:02:52PM +1030, Rusty Russell wrote:

Denis V. Lunev d...@parallels.com writes:

From: Raushaniya Maksudova rmaksud...@parallels.com

Excessive virtio_balloon inflation can cause invocation of the OOM-killer when Linux is under severe memory pressure. Various mechanisms are responsible for correct virtio_balloon memory management. Nevertheless, it is often the case that these control tools do not have enough time to react to a fast-changing memory load. As a result, the OS runs out of memory and invokes the OOM-killer. The balancing of memory by use of the virtio balloon should not cause the termination of processes while there are pages in the balloon. Currently there is no way for the virtio balloon driver to free some memory at the last moment, before some process gets killed by the OOM-killer.

This makes some amount of sense. This reminds me of the balloon fs that Google once proposed.

This really needs to be controlled from the host though. At the moment the host does not expect the guest to deflate before requests. So as a minimum, add a feature bit for this. What if you want a mix of mandatory and optional ballooning? I guess we can use multiple balloons; is that the idea?

Trying to claw back some pages on OOM is almost certainly correct, even if the host doesn't expect it. It's roughly equivalent to not giving up pages in the first place.

Well, the difference is that there are management tools that poll the balloon in the host until they see the balloon size reach the expected value. They don't expect the balloon to shrink below num_pages and will respond in various unexpected ways, e.g. by killing the VM, if it does.

Killing a userspace process within the guest might be better for VM health.

Besides the fact that we always did it like this, these tools seem to have a basis in the spec.
Specifically, this is based on this text from the spec: "the device asks for a certain amount of memory, and the driver supplies it (or withdraws it, if the device has more than it asks for). This allows the guest to adapt to changes in allowance of underlying physical memory." and "The device is driven by the receipt of a configuration change interrupt."

Cheers, Rusty.

PS. Yes, a real guest-driven balloon is preferable, but that's a much larger task.

Any objection to making the feature depend on a feature flag?

OK, I got the point. This sounds good to me. We will prepare a patch for the kernel and the proper bits (a command line option) for QEMU.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
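The deflate-on-OOM idea above boils down to a simple rule: when the OOM path fires, release up to some number of pages back to the guest, never more than the balloon actually holds, and report the freed count so the OOM killer can decide whether it still needs a victim. A minimal userspace sketch of just that accounting (hypothetical names and a toy struct; the real driver would hook the kernel's OOM notifier and walk its page list):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified balloon state -- not the real driver struct. */
struct balloon {
    size_t num_pages;   /* pages currently held in the balloon */
};

/*
 * Sketch of the deflate-on-OOM policy: free up to 'want' pages back to
 * the guest, capped by what the balloon holds.  Returns the number of
 * pages released, which the OOM path would report as memory freed.
 */
static size_t balloon_oom_release(struct balloon *b, size_t want)
{
    size_t freed = (want < b->num_pages) ? want : b->num_pages;

    b->num_pages -= freed;
    return freed;
}
```

The cap is the important invariant: the balloon can shrink to zero but never goes negative, and a zero return tells the OOM path the balloon has nothing left to give.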
Re: [PATCH net-next RFC 1/3] virtio: support for urgent descriptors
Jason Wang jasow...@redhat.com writes:

Below should be useful for some experiments Jason is doing. I thought I'd send it out for early review/feedback.

The event idx feature allows us to defer interrupts until a specific # of descriptors were used. Sometimes it might be useful to get an interrupt after a specific descriptor, regardless. This adds a descriptor flag for this, and an API to create an urgent output descriptor. This is still an RFC: we'll need a feature bit for drivers to detect this, but we've run out of feature bits for virtio 0.X. For experimentation purposes, drivers can assume this is set, or add a driver-specific feature bit.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com

The new VRING_DESC_F_URGENT bit is theoretically nicer, but for networking (which tends to take packets in order) couldn't we just set the event counter to give us a tx interrupt at the packet we want?

Cheers, Rusty.
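Rusty's alternative relies on the event-idx mechanism: instead of a per-descriptor URGENT flag, the driver writes the index of the packet it cares about into the used-event field, and the device interrupts only once its used index moves past that point. The decision is the vring_need_event() test from the virtio spec, shown here as a self-contained sketch (all arithmetic wraps mod 2^16):

```c
#include <assert.h>
#include <stdint.h>

/*
 * vring_need_event() as defined by the virtio spec: returns nonzero when
 * the device, having just advanced its used index from 'old' to 'new_idx',
 * has passed the driver-written 'event_idx' and should raise an interrupt.
 * The unsigned 16-bit subtraction handles index wraparound for free.
 */
static int vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}
```

Setting event_idx = N asks for an interrupt when the used index crosses N+1, i.e. right after the packet occupying slot N completes; that is how a driver could target "the packet we want" without a new descriptor flag.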
Re: [PATCH 2/2] virtio_balloon: free some memory from baloon on OOM
Michael S. Tsirkin m...@redhat.com writes:

On Tue, Oct 14, 2014 at 10:14:05AM +1030, Rusty Russell wrote:

Michael S. Tsirkin m...@redhat.com writes:

On Mon, Oct 13, 2014 at 04:02:52PM +1030, Rusty Russell wrote:

Denis V. Lunev d...@parallels.com writes:

From: Raushaniya Maksudova rmaksud...@parallels.com

Excessive virtio_balloon inflation can cause invocation of the OOM-killer when Linux is under severe memory pressure. Various mechanisms are responsible for correct virtio_balloon memory management. Nevertheless, it is often the case that these control tools do not have enough time to react to a fast-changing memory load. As a result, the OS runs out of memory and invokes the OOM-killer. The balancing of memory by use of the virtio balloon should not cause the termination of processes while there are pages in the balloon. Currently there is no way for the virtio balloon driver to free some memory at the last moment, before some process gets killed by the OOM-killer.

This makes some amount of sense. This reminds me of the balloon fs that Google once proposed.

This really needs to be controlled from the host though. At the moment the host does not expect the guest to deflate before requests. So as a minimum, add a feature bit for this. What if you want a mix of mandatory and optional ballooning? I guess we can use multiple balloons, is that the idea?

Trying to claw back some pages on OOM is almost certainly correct, even if the host doesn't expect it. It's roughly equivalent to not giving up pages in the first place.

Well, the difference is that there are management tools that poll the balloon in the host until they see the balloon size reach the expected value. They don't expect the balloon to shrink below num_pages and will respond in various unexpected ways, e.g. by killing the VM, if it does.

Killing a userspace process within the guest might be better for VM health.

Besides the fact that we always did it like this, these tools seem to have a basis in the spec.
Specifically, this is based on this text from the spec: "the device asks for a certain amount of memory, and the driver supplies it (or withdraws it, if the device has more than it asks for). This allows the guest to adapt to changes in allowance of underlying physical memory." and "The device is driven by the receipt of a configuration change interrupt."

Cheers, Rusty.

PS. Yes, a real guest-driven balloon is preferable, but that's a much larger task.

Any objection to making the feature depend on a feature flag?

If you believe a guest which does this will cause drastic failure on the host side (ie. killing the VM), then yes, we can do this. However, I'm not aware of anything that sophisticated...

Cheers, Rusty.
Re: [PATCH] virtio_balloon: Convert vballon kthread into a workqueue
Petr Mladek pmla...@suse.cz writes:

Workqueues have a clean and rich API for all basic operations. The code is usually easier and more readable. It can be easily tuned for the given purpose.

OK, sure.

-static void fill_balloon(struct virtio_balloon *vb, size_t num)
+static void fill_balloon(struct virtio_balloon *vb, size_t diff)
 {
 	struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
+	size_t num;
+	bool done;
 
 	/* We can only do one array worth at a time. */
-	num = min(num, ARRAY_SIZE(vb->pfns));
+	num = min(diff, ARRAY_SIZE(vb->pfns));
+	done = (num == diff) ? true : false;
 
 	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
@@ -143,6 +143,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 				    VIRTIO_BALLOON_PAGES_PER_PAGE);
 			/* Sleep for at least 1/5 of a second before retry. */
 			msleep(200);
+			done = false;
 			break;
 		}
 		set_page_pfns(vb->pfns + vb->num_pfns, page);
@@ -154,6 +155,9 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 	if (vb->num_pfns != 0)
 		tell_host(vb, vb->inflate_vq);
 	mutex_unlock(&vb->balloon_lock);
+
+	if (!done)
+		queue_work(vb->wq, &vb->wq_work);
 }

Hmm, this is a bit convoluted. I would spell it out by keeping a num_done counter:

	if (num_done < diff)
		queue_work(vb->wq, &vb->wq_work);

 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -168,20 +172,25 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 	}
 }
 
-static void leak_balloon(struct virtio_balloon *vb, size_t num)
+static void leak_balloon(struct virtio_balloon *vb, size_t diff)
 {
 	struct page *page;
 	struct balloon_dev_info *vb_dev_info = vb->vb_dev_info;
+	size_t num;
+	bool done;
 
 	/* We can only do one array worth at a time. */
-	num = min(num, ARRAY_SIZE(vb->pfns));
+	num = min(diff, ARRAY_SIZE(vb->pfns));
+	done = (num == diff) ? true : false;
 
 	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
 	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
 		page = balloon_page_dequeue(vb_dev_info);
-		if (!page)
+		if (!page) {
+			done = false;
 			break;
+		}
 		set_page_pfns(vb->pfns + vb->num_pfns, page);
 		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
 	}
@@ -195,6 +204,9 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 		tell_host(vb, vb->deflate_vq);
 	mutex_unlock(&vb->balloon_lock);
 	release_pages_by_pfn(vb->pfns, vb->num_pfns);
+
+	if (!done)
+		queue_work(vb->wq, &vb->wq_work);
 }

Here too.

@@ -471,11 +469,13 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	if (err)
 		goto out_free_vb_mapping;
 
-	vb->thread = kthread_run(balloon, vb, "vballoon");
-	if (IS_ERR(vb->thread)) {
-		err = PTR_ERR(vb->thread);
+	vb->wq = alloc_workqueue("vballoon_wq",
+				 WQ_FREEZABLE | WQ_MEM_RECLAIM, 0);
+	if (!vb->wq) {
+		err = -ENOMEM;
 		goto out_del_vqs;
 	}
+	INIT_WORK(&vb->wq_work, balloon);

That looks racy to me; shouldn't we init vb->wq_work before allocating the wq?

Otherwise, looks good!

Thanks, Rusty.
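The requeue-until-done pattern Rusty is reviewing, handle at most one pfn array's worth per work invocation and queue_work() again whenever the target was not reached, can be sketched outside the kernel like this (CHUNK stands in for ARRAY_SIZE(vb->pfns); all names are illustrative, not the driver's):

```c
#include <assert.h>
#include <stddef.h>

enum { CHUNK = 128 };   /* stands in for ARRAY_SIZE(vb->pfns) */

/*
 * One "work item" invocation: handle at most CHUNK pages of the
 * remaining 'diff'.  Returns how much it processed; the real driver
 * would call queue_work() on itself when this is less than diff.
 */
static size_t balloon_work_once(size_t diff)
{
    return (diff < CHUNK) ? diff : CHUNK;
}

/* Equivalent of the workqueue re-queueing loop: count how many
 * invocations it takes to consume the whole diff. */
static unsigned runs_needed(size_t diff)
{
    unsigned runs = 0;

    while (diff > 0) {
        diff -= balloon_work_once(diff);
        runs++;
    }
    return runs;
}
```

This is the whole point of the done/num_done bookkeeping in the patch: the work item must know whether it finished, because nothing else will re-trigger it.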
Re: [PATCH v4 03/25] virtio-pci: move freeze/restore to virtio core
On Mon, 2014-10-13 at 10:48 +0300, Michael S. Tsirkin wrote:

This is in preparation to extending config changed event handling in core. Wrapping these in an API also seems to make for cleaner code.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com

This patch landed in today's linux-next (next-20141015).

 include/linux/virtio.h      |  6 +
 drivers/virtio/virtio.c     | 54 +
 drivers/virtio/virtio_pci.c | 54 ++---
 3 files changed, 62 insertions(+), 52 deletions(-)

diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 3c19bd3..8df7ba8 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -78,6 +78,7 @@ bool virtqueue_is_broken(struct virtqueue *vq);
 /**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
+ * @failed: saved value for CONFIG_S_FAILED bit (for restore)

s/CONFIG_S_FAILED/VIRTIO_CONFIG_S_FAILED/?

  * @dev: underlying device.
  * @id: the device type identification (used to match it with a driver).
  * @config: the configuration ops for this device.

Paul Bolle
Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt
On 10/15/2014 07:06 AM, Michael S. Tsirkin wrote:

On Tue, Oct 14, 2014 at 02:53:27PM -0400, David Miller wrote:

From: Jason Wang jasow...@redhat.com
Date: Sat, 11 Oct 2014 15:16:43 +0800

We currently free old transmitted packets in ndo_start_xmit(), so any packet must also be orphaned there. This was used to reduce the overhead of the tx interrupt to achieve better performance. But this may not work for some protocols such as TCP streams. TCP depends on the value of sk_wmem_alloc to implement various optimizations for small-packet streams, such as TCP small queues and auto corking. But orphaning packets early in ndo_start_xmit() more or less disables such things, since sk_wmem_alloc is not accurate. This leads to extra-low throughput for TCP streams of small writes.

This series tries to solve this issue by enabling tx interrupts for all TCP packets other than the ones with the push bit or pure ACKs. This is done through the support of urgent descriptors, which can force an interrupt for a specified packet. If the tx interrupt is enabled for a packet, there's no need to orphan it in ndo_start_xmit(); we can free it in the tx napi which is scheduled by the tx interrupt. Then sk_wmem_alloc is more accurate than before and TCP can batch more for small writes. Larger skbs are produced by TCP in this case, improving both throughput and cpu utilization.

Tests show great improvements on small-write TCP streams. For most of the other cases, the throughput and cpu utilization are the same as before. In only a few cases was more cpu utilization noticed, which needs more investigation.

Review and comments are welcome.

I think proper accounting and queueing (at all levels, not just TCP sockets) is more important than trying to skim a bunch of cycles by avoiding TX interrupts. Having an event to free the SKB is absolutely essential for the stack to operate correctly. And with virtio-net you don't even have the excuse of "the HW unfortunately doesn't have an appropriate TX event".
So please don't play games, and instead use TX interrupts all the time. You can mitigate them in various ways, but don't turn them on selectively based upon traffic type, that's terrible. You can even use ->xmit_more to defer the TX interrupt indication to the final TX packet in the chain.

I guess we can just defer the kick; the interrupt will naturally be deferred as well. This should solve the problem for old hosts too.

Interrupts were delayed but not reduced; to support this we need to publish the avail idx as the used event. This should reduce the tx interrupts in the case of bulk dequeuing. I will draft a new rfc series containing this.

We'll also need to implement bql for this. Something like the below? Completely untested - posting here to see if I figured the API out correctly. Has to be applied on top of the previous patch.

Looks so. I believe it's better to have, but not a must.
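The "defer the kick" idea discussed above can be sketched as a tiny model: buffers are queued per packet, but the device is only notified once per burst, on the packet whose xmit_more-style hint says the burst is over (names are illustrative stand-ins for batching virtqueue_kick() calls in a driver, not the real API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy tx queue: counts device notifications ("kicks"). */
struct txq {
    size_t pending;   /* buffers queued since the last kick */
    size_t kicks;     /* total notifications to the device  */
};

/*
 * Deferred-kick policy: queue every packet, but only kick the device
 * when the caller's xmit_more-style hint says this is the last packet
 * of the burst.  One notification then covers the whole chain, and
 * with event idx the device's completion interrupt batches the same way.
 */
static void txq_xmit(struct txq *q, bool more)
{
    q->pending++;
    if (!more) {
        q->kicks++;
        q->pending = 0;
    }
}
```

The point of the exchange above is that this batching alone delays interrupts but does not reduce their count; reducing them additionally needs the avail idx published as the used event.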
[RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
Accumulate the sent packets and sent bytes in local variables and perform a single u64_stats_update_begin/end() after.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/virtio_net.c | 12 
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..a4d56b8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	unsigned int len;
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	u64 tx_bytes = 0, tx_packets = 0;
 
 	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
+		tx_bytes += skb->len;
+		tx_packets++;
 
 		dev_kfree_skb_any(skb);
 	}
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += tx_bytes;
+	stats->tx_packets =+ tx_packets;
+	u64_stats_update_end(&stats->tx_syncp);
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
-- 
1.7.1
[RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
This patch introduces virtio_enable_cb_avail() to publish the avail idx as the used event. This could be used by batched buffer submission to reduce the number of tx interrupts.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/virtio/virtio_ring.c | 22 --
 include/linux/virtio.h       |  2 ++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1b3929f..d67fbf8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 	 * entry. Always do both to keep code simple. */
 	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
 	/* Make sure used event never go backwards */
-	if (!vring_need_event(vring_used_event(&vq->vring),
-			      vq->vring.avail->idx, last_used_idx))
+	if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
+	    !vring_need_event(vring_used_event(&vq->vring),
+			      vq->vring.avail->idx, last_used_idx)) {
 		vring_used_event(&vq->vring) = last_used_idx;
+	}
 	END_USE(vq);
 	return last_used_idx;
 }
 EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
 
+bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	bool ret;
+
+	START_USE(vq);
+	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
+	vring_used_event(&vq->vring) = vq->vring.avail->idx;
+	ret = vring_need_event(vq->vring.avail->idx,
+			       vq->last_used_idx, vq->vring.used->idx);
+	END_USE(vq);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
+
 /**
  * virtqueue_poll - query pending used buffers
  * @vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b46671e..bfaf058 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
 
+bool virtqueue_enable_cb_avail(struct virtqueue *vq);
+
 bool virtqueue_poll(struct virtqueue *vq, unsigned);
 
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
-- 
1.7.1
[RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
According to David, proper accounting and queueing (at all levels, not just TCP sockets) is more important than trying to skim a bunch of cycles by avoiding TX interrupts. Having an event to free the SKB is absolutely essential for the stack to operate correctly.

This series tries to enable the tx interrupt for virtio-net. The idea is simple: enable the tx interrupt and schedule a tx napi to free old xmit skbs.

Several notes:

- Tx interrupt storm avoidance when the queue is about to be full is kept. Since we may enable callbacks in both ndo_start_xmit() and the tx napi, patch 1 adds a check to make sure the used event never goes back. This keeps the napi from wrongly enabling the callbacks after delayed callbacks were used.

- For bulk dequeuing, there's no need to enable the tx interrupt for each packet. The last patch only enables the tx interrupt for the final skb in the chain, through xmit_more and a new helper to publish the current avail idx as the used event.

This series fixes several issues of the original rfc pointed out by Michael. Smoke testing is done; a complete netperf test will follow. Sending the series out for early review.

Thanks

Jason Wang (6):
  virtio: make sure used event never go backwards
  virtio: introduce virtio_enable_cb_avail()
  virtio-net: small optimization on free_old_xmit_skbs()
  virtio-net: return the number of packets sent in free_old_xmit_skbs()
  virtio-net: enable tx interrupt
  virtio-net: enable tx interrupt only for the final skb in the chain

 drivers/net/virtio_net.c     | 125 ++
 drivers/virtio/virtio_ring.c |  25 -
 include/linux/virtio.h       |   2 +
 3 files changed, 115 insertions(+), 37 deletions(-)
[RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
With the help of xmit_more and virtqueue_enable_cb_avail(), this patch enables the tx interrupt only for the final skb in the chain if needed. This helps to mitigate tx interrupts.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/virtio_net.c | 10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2afc2e2..5943f3f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -993,12 +993,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 			virtqueue_disable_cb(sq->vq);
 		}
-	} else if (virtqueue_enable_cb(sq->vq)) {
-		free_old_xmit_skbs(sq, qsize);
 	}
 
-	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
+	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more) {
 		virtqueue_kick(sq->vq);
+		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS &&
+		    unlikely(virtqueue_enable_cb_avail(sq->vq))) {
+			free_old_xmit_skbs(sq, qsize);
+			virtqueue_disable_cb(sq->vq);
+		}
+	}
 
 	return NETDEV_TX_OK;
 }
-- 
1.7.1
[RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
Orphaning the skb in ndo_start_xmit() breaks socket accounting and packet queuing. This in fact breaks lots of things, such as pktgen and several TCP optimizations, and it also means BQL cannot be implemented for virtio-net. This patch tries to solve this issue by enabling the tx interrupt. To avoid introducing extra spinlocks, a tx napi is scheduled to free those packets. More tx interrupt mitigation methods could be used on top.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/virtio_net.c | 125 +++---
 1 files changed, 85 insertions(+), 40 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ccf98f9..2afc2e2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {
 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
+static int free_old_xmit_skbs(struct send_queue *sq, int budget)
+{
+	struct sk_buff *skb;
+	unsigned int len;
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	u64 tx_bytes = 0, tx_packets = 0;
+
+	while (tx_packets < budget &&
+	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		pr_debug("Sent skb %p\n", skb);
+
+		tx_bytes += skb->len;
+		tx_packets++;
+
+		dev_kfree_skb_any(skb);
+	}
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += tx_bytes;
+	stats->tx_packets =+ tx_packets;
+	u64_stats_update_end(&stats->tx_syncp);
+
+	return tx_packets;
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct send_queue *sq = &vi->sq[vq2txq(vq)];
 
-	/* Suppress further interrupts. */
-	virtqueue_disable_cb(vq);
-
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi_schedule_prep(&sq->napi)) {
+		__napi_schedule(&sq->napi);
+	}
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -774,7 +801,39 @@ again:
 	return received;
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq =
+		container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+	unsigned int r, sent = 0;
+
+again:
+	__netif_tx_lock(txq, smp_processor_id());
+	virtqueue_disable_cb(sq->vq);
+	sent += free_old_xmit_skbs(sq, budget - sent);
+
+	if (sent < budget) {
+		r = virtqueue_enable_cb_prepare(sq->vq);
+		napi_complete(napi);
+		__netif_tx_unlock(txq);
+		if (unlikely(virtqueue_poll(sq->vq, r)) &&
+		    napi_schedule_prep(napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(napi);
+			goto again;
+		}
+	} else {
+		__netif_tx_unlock(txq);
+	}
+
+	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+	return sent;
+}
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
+
 /* must be called with local_bh_disable()d */
 static int virtnet_busy_poll(struct napi_struct *napi)
 {
@@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev)
 		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(&vi->rq[i]);
+		napi_enable(&vi->sq[i].napi);
 	}
 
 	return 0;
 }
 
-static int free_old_xmit_skbs(struct send_queue *sq)
-{
-	struct sk_buff *skb;
-	unsigned int len;
-	struct virtnet_info *vi = sq->vq->vdev->priv;
-	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
-	u64 tx_bytes = 0, tx_packets = 0;
-
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		pr_debug("Sent skb %p\n", skb);
-
-		tx_bytes += skb->len;
-		tx_packets++;
-
-		dev_kfree_skb_any(skb);
-	}
-
-	u64_stats_update_begin(&stats->tx_syncp);
-	stats->tx_bytes += tx_bytes;
-	stats->tx_packets =+ tx_packets;
-	u64_stats_update_end(&stats->tx_syncp);
-
-	return tx_packets;
-}
-
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
 	struct skb_vnet_hdr *hdr;
@@ -917,6 +952,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 	sg_set_buf(sq->sg, hdr, hdr_len);
 	num_sg =
Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:

This patch introduces virtio_enable_cb_avail() to publish the avail idx as the used event. This could be used by batched buffer submission to reduce the number of tx interrupts.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/virtio/virtio_ring.c | 22 --
 include/linux/virtio.h       |  2 ++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1b3929f..d67fbf8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 	 * entry. Always do both to keep code simple. */
 	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
 	/* Make sure used event never go backwards */
-	if (!vring_need_event(vring_used_event(&vq->vring),
-			      vq->vring.avail->idx, last_used_idx))
+	if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
+	    !vring_need_event(vring_used_event(&vq->vring),
+			      vq->vring.avail->idx, last_used_idx)) {
 		vring_used_event(&vq->vring) = last_used_idx;
+	}
 	END_USE(vq);
 	return last_used_idx;
 }
 EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);

I see you are also changing virtqueue_enable_cb_prepare, why?

+bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	bool ret;
+
+	START_USE(vq);
+	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
+	vring_used_event(&vq->vring) = vq->vring.avail->idx;
+	ret = vring_need_event(vq->vring.avail->idx,
+			       vq->last_used_idx, vq->vring.used->idx);
+	END_USE(vq);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
+
 /**
  * virtqueue_poll - query pending used buffers
  * @vq: the struct virtqueue we're talking about.

Could not figure out what this does. Please add documentation.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b46671e..bfaf058 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
 
+bool virtqueue_enable_cb_avail(struct virtqueue *vq);
+
 bool virtqueue_poll(struct virtqueue *vq, unsigned);
 
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
-- 
1.7.1
Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:

This patch checks the new event idx to make sure the used event idx never goes back. This is used to synchronize the calls between virtqueue_enable_cb_delayed() and virtqueue_enable_cb().

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com

The implication being that moving the event idx back might cause some race condition? If yes, please describe the race explicitly. Is there a bug we need to fix on stable? Please also explicitly describe a configuration that causes the event idx to go back. All this info should go in the commit log.

---
 drivers/virtio/virtio_ring.c | 7 +-
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 3b1f89b..1b3929f 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 	u16 last_used_idx;
 
 	START_USE(vq);
-
+	last_used_idx = vq->last_used_idx;
 	/* We optimistically turn back on interrupts, then check if there was
 	 * more to do. */
 	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
 	 * either clear the flags bit or point the event index at the next
 	 * entry. Always do both to keep code simple. */
 	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
-	vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
+	/* Make sure used event never go backwards */

s/go/goes/

+	if (!vring_need_event(vring_used_event(&vq->vring),
+			      vq->vring.avail->idx, last_used_idx))
+		vring_used_event(&vq->vring) = last_used_idx;

The result will be that the driver will *not* get an interrupt on the next consumed buffer, which is likely not what the driver intended when it called virtqueue_enable_cb. Instead, how about we simply document the requirement that drivers either always call virtqueue_enable_cb_delayed or virtqueue_enable_cb, but not both?
 	END_USE(vq);
 	return last_used_idx;
 }
-- 
1.7.1
Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:

Accumulate the sent packets and sent bytes in local variables and perform a single u64_stats_update_begin/end() after.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com

Not sure how much it's worth, but since Eric suggested it ...

Acked-by: Michael S. Tsirkin m...@redhat.com

---
 drivers/net/virtio_net.c | 12 
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..a4d56b8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	unsigned int len;
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	u64 tx_bytes = 0, tx_packets = 0;
 
 	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
+		tx_bytes += skb->len;
+		tx_packets++;
 
 		dev_kfree_skb_any(skb);
 	}
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += tx_bytes;
+	stats->tx_packets =+ tx_packets;
+	u64_stats_update_end(&stats->tx_syncp);
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
-- 
1.7.1
Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote:

Accumulate the sent packets and sent bytes in local variables and perform a single u64_stats_update_begin/end() after.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/virtio_net.c | 12 
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..a4d56b8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	unsigned int len;
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	u64 tx_bytes = 0, tx_packets = 0;
 
 	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
+		tx_bytes += skb->len;
+		tx_packets++;
 
 		dev_kfree_skb_any(skb);
 	}
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += tx_bytes;
+	stats->tx_packets =+ tx_packets;

	stats->tx_packets += tx_packets;

+	u64_stats_update_end(&stats->tx_syncp);
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote: ... +static int free_old_xmit_skbs(struct send_queue *sq, int budget) +{ + struct sk_buff *skb; + unsigned int len; + struct virtnet_info *vi = sq-vq-vdev-priv; + struct virtnet_stats *stats = this_cpu_ptr(vi-stats); + u64 tx_bytes = 0, tx_packets = 0; + + while (tx_packets budget +(skb = virtqueue_get_buf(sq-vq, len)) != NULL) { + pr_debug(Sent skb %p\n, skb); + + tx_bytes += skb-len; + tx_packets++; + + dev_kfree_skb_any(skb); + } + + u64_stats_update_begin(stats-tx_syncp); + stats-tx_bytes += tx_bytes; + stats-tx_packets =+ tx_packets; stats-tx_packets += tx_packets; + u64_stats_update_end(stats-tx_syncp); + + return tx_packets; +} ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote: This patch checks the new event idx to make sure used event idx never goes back. This is used to synchronize the calls between virtqueue_enable_cb_delayed() and virtqueue_enable_cb(). Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com the implication being that moving event idx back might cause some race condition? This will cause race condition when tx interrupt is enabled. Consider the following cases 1) tx napi was scheduled 2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used event is vq-last_used_idx + 3/4 pendg bufs] 3) tx napi enable the callback by virtqueue_enable_cb() [ used event is vq-last_used_idx ] After step 3, used event was moved back, unnecessary tx interrupt was triggered. If yes but please describe the race explicitly. Is there a bug we need to fix on stable? Looks not, current code does not have such race condition. Please also explicitly describe a configuration that causes event idx to go back. All this info should go in the commit log. Will do this. --- drivers/virtio/virtio_ring.c |7 +-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 3b1f89b..1b3929f 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq) u16 last_used_idx; START_USE(vq); - +last_used_idx = vq-last_used_idx; /* We optimistically turn back on interrupts, then check if there was * more to do. */ /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to * either clear the flags bit or point the event index at the next * entry. Always do both to keep code simple. 
*/ vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT; -vring_used_event(vq-vring) = last_used_idx = vq-last_used_idx; +/* Make sure used event never go backwards */ s/go/goes/ +if (!vring_need_event(vring_used_event(vq-vring), + vq-vring.avail-idx, last_used_idx)) +vring_used_event(vq-vring) = last_used_idx; The result will be that driver will *not* get an interrupt on the next consumed buffer, which is likely not what driver intended when it called virtqueue_enable_cb. This will only happen when we want to delay the interrupt for next few consumed buffers (virtqueue_enable_cb_delayed() was called). For the other case, vq-last_used_idx should be ahead of previous used event. Do you see any other case? Instead, how about we simply document the requirement that drivers either always call virtqueue_enable_cb_delayed or virtqueue_enable_cb but not both? We need call them both when tx interrupt is enabled I believe. END_USE(vq); return last_used_idx; } -- 1.7.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote: Orphan skb in ndo_start_xmit() breaks socket accounting and packet queuing. This in fact breaks lots of things such as pktgen and several TCP optimizations. And also make BQL can't be implemented for virtio-net. This patch tries to solve this issue by enabling tx interrupt. To avoid introducing extra spinlocks, a tx napi was scheduled to free those packets. More tx interrupt mitigation method could be used on top. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/net/virtio_net.c | 125 +++--- 1 files changed, 85 insertions(+), 40 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index ccf98f9..2afc2e2 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -72,6 +72,8 @@ struct send_queue { /* Name of the send queue: output.$index */ char name[40]; + + struct napi_struct napi; }; /* Internal representation of a receive virtqueue */ @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask) return p; } +static int free_old_xmit_skbs(struct send_queue *sq, int budget) +{ + struct sk_buff *skb; + unsigned int len; + struct virtnet_info *vi = sq-vq-vdev-priv; + struct virtnet_stats *stats = this_cpu_ptr(vi-stats); + u64 tx_bytes = 0, tx_packets = 0; + + while (tx_packets budget +(skb = virtqueue_get_buf(sq-vq, len)) != NULL) { + pr_debug(Sent skb %p\n, skb); + + tx_bytes += skb-len; + tx_packets++; + + dev_kfree_skb_any(skb); + } + + u64_stats_update_begin(stats-tx_syncp); + stats-tx_bytes += tx_bytes; + stats-tx_packets =+ tx_packets; + u64_stats_update_end(stats-tx_syncp); + + return tx_packets; +} + static void skb_xmit_done(struct virtqueue *vq) { struct virtnet_info *vi = vq-vdev-priv; + struct send_queue *sq = vi-sq[vq2txq(vq)]; - /* Suppress further interrupts. 
*/ - virtqueue_disable_cb(vq); - - /* We were probably waiting for more output buffers. */ - netif_wake_subqueue(vi-dev, vq2txq(vq)); + if (napi_schedule_prep(sq-napi)) { + __napi_schedule(sq-napi); + } } static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx) @@ -774,7 +801,39 @@ again: return received; } +static int virtnet_poll_tx(struct napi_struct *napi, int budget) +{ + struct send_queue *sq = + container_of(napi, struct send_queue, napi); + struct virtnet_info *vi = sq-vq-vdev-priv; + struct netdev_queue *txq = netdev_get_tx_queue(vi-dev, vq2txq(sq-vq)); + unsigned int r, sent = 0; + +again: + __netif_tx_lock(txq, smp_processor_id()); + virtqueue_disable_cb(sq-vq); + sent += free_old_xmit_skbs(sq, budget - sent); + + if (sent budget) { + r = virtqueue_enable_cb_prepare(sq-vq); + napi_complete(napi); + __netif_tx_unlock(txq); + if (unlikely(virtqueue_poll(sq-vq, r)) So you are enabling callback on the next packet, which is almost sure to cause an interrupt storm on the guest. I think it's a bad idea, this is why I used enable_cb_delayed in my patch. 
+ napi_schedule_prep(napi)) { + virtqueue_disable_cb(sq-vq); + __napi_schedule(napi); + goto again; + } + } else { + __netif_tx_unlock(txq); + } + + netif_wake_subqueue(vi-dev, vq2txq(sq-vq)); + return sent; +} + #ifdef CONFIG_NET_RX_BUSY_POLL + /* must be called with local_bh_disable()d */ static int virtnet_busy_poll(struct napi_struct *napi) { @@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev) if (!try_fill_recv(vi-rq[i], GFP_KERNEL)) schedule_delayed_work(vi-refill, 0); virtnet_napi_enable(vi-rq[i]); + napi_enable(vi-sq[i].napi); } return 0; } -static int free_old_xmit_skbs(struct send_queue *sq) -{ - struct sk_buff *skb; - unsigned int len; - struct virtnet_info *vi = sq-vq-vdev-priv; - struct virtnet_stats *stats = this_cpu_ptr(vi-stats); - u64 tx_bytes = 0, tx_packets = 0; - - while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) { - pr_debug(Sent skb %p\n, skb); - - tx_bytes += skb-len; - tx_packets++; - - dev_kfree_skb_any(skb); - } - - u64_stats_update_begin(stats-tx_syncp); - stats-tx_bytes += tx_bytes; - stats-tx_packets =+ tx_packets; - u64_stats_update_end(stats-tx_syncp); - - return tx_packets; -} - So you end up
Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote: This patch introduces virtio_enable_cb_avail() to publish avail idx and used event. This could be used by batched buffer submitting to reduce the number of tx interrupts. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/virtio/virtio_ring.c | 22 -- include/linux/virtio.h |2 ++ 2 files changed, 22 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 1b3929f..d67fbf8 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq) * entry. Always do both to keep code simple. */ vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT; /* Make sure used event never go backwards */ -if (!vring_need_event(vring_used_event(vq-vring), - vq-vring.avail-idx, last_used_idx)) +if (vq-vring.avail-idx != vring_used_event(vq-vring) +!vring_need_event(vring_used_event(vq-vring), + vq-vring.avail-idx, last_used_idx)) { vring_used_event(vq-vring) = last_used_idx; +} END_USE(vq); return last_used_idx; } EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare); I see you are also changing virtqueue_enable_cb_prepare, why? This is also used to prevent it from moving the used event backwards. This may happens when we handle tx napi after we publish avail idx as used event (virtqueue_enable_cb_avail() was called). 
+bool virtqueue_enable_cb_avail(struct virtqueue *_vq) +{ +struct vring_virtqueue *vq = to_vvq(_vq); +bool ret; + +START_USE(vq); +vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT; +vring_used_event(vq-vring) = vq-vring.avail-idx; +ret = vring_need_event(vq-vring.avail-idx, + vq-last_used_idx, vq-vring.used-idx); +END_USE(vq); + +return ret; +} +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail); + /** * virtqueue_poll - query pending used buffers * @vq: the struct virtqueue we're talking about. Could not figure out what this does. Please add documentation. Sure, does something like below explain what does this function do? /** * virtqueue_enable_cb_avail - restart callbacks after disable_cb. * @vq: the struct virtqueue we're talking about. * * This re-enables callbacks but hints to the other side to delay * interrupts all of the available buffers have been processed; * it returns false if there are at least one pending buffer in the queue, * to detect a possible race between the driver checking for more work, * and enabling callbacks. * * Caller must ensure we don't call this with other virtqueue * operations at the same time (except where noted). */ diff --git a/include/linux/virtio.h b/include/linux/virtio.h index b46671e..bfaf058 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq); unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq); +bool virtqueue_enable_cb_avail(struct virtqueue *vq); + bool virtqueue_poll(struct virtqueue *vq, unsigned); bool virtqueue_enable_cb_delayed(struct virtqueue *vq); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
On Wed, Oct 15, 2014 at 03:25:30PM +0800, Jason Wang wrote: With the help of xmit_more and virtqueue_enable_cb_avail(), this patch enables tx interrupt only for the final skb in the chain if needed. This will help to mitigate tx interrupts. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com I think you should split xmit_more stuff out. --- drivers/net/virtio_net.c | 10 +++--- 1 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 2afc2e2..5943f3f 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -993,12 +993,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev) virtqueue_disable_cb(sq->vq); } } - } else if (virtqueue_enable_cb(sq->vq)) { - free_old_xmit_skbs(sq, qsize); } - if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more) + if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more) { virtqueue_kick(sq->vq); + if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS && + unlikely(virtqueue_enable_cb_avail(sq->vq))) { + free_old_xmit_skbs(sq, qsize); + virtqueue_disable_cb(sq->vq); + } + } return NETDEV_TX_OK; } -- 1.7.1
Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
On 10/15/2014 05:37 PM, Eric Dumazet wrote: On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote: ... +static int free_old_xmit_skbs(struct send_queue *sq, int budget) +{ +struct sk_buff *skb; +unsigned int len; +struct virtnet_info *vi = sq->vq->vdev->priv; +struct virtnet_stats *stats = this_cpu_ptr(vi->stats); +u64 tx_bytes = 0, tx_packets = 0; + +while (tx_packets < budget && + (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) { +pr_debug("Sent skb %p\n", skb); + +tx_bytes += skb->len; +tx_packets++; + +dev_kfree_skb_any(skb); +} + +u64_stats_update_begin(&stats->tx_syncp); +stats->tx_bytes += tx_bytes; +stats->tx_packets =+ tx_packets; stats->tx_packets += tx_packets; My bad, will correct it in next version. Thanks
Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
On 10/15/2014 06:18 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote: Orphan skb in ndo_start_xmit() breaks socket accounting and packet queuing. This in fact breaks lots of things such as pktgen and several TCP optimizations. And also make BQL can't be implemented for virtio-net. This patch tries to solve this issue by enabling tx interrupt. To avoid introducing extra spinlocks, a tx napi was scheduled to free those packets. More tx interrupt mitigation method could be used on top. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/net/virtio_net.c | 125 +++--- 1 files changed, 85 insertions(+), 40 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index ccf98f9..2afc2e2 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -72,6 +72,8 @@ struct send_queue { /* Name of the send queue: output.$index */ char name[40]; + + struct napi_struct napi; }; /* Internal representation of a receive virtqueue */ @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask) return p; } +static int free_old_xmit_skbs(struct send_queue *sq, int budget) +{ + struct sk_buff *skb; + unsigned int len; + struct virtnet_info *vi = sq-vq-vdev-priv; + struct virtnet_stats *stats = this_cpu_ptr(vi-stats); + u64 tx_bytes = 0, tx_packets = 0; + + while (tx_packets budget + (skb = virtqueue_get_buf(sq-vq, len)) != NULL) { + pr_debug(Sent skb %p\n, skb); + + tx_bytes += skb-len; + tx_packets++; + + dev_kfree_skb_any(skb); + } + + u64_stats_update_begin(stats-tx_syncp); + stats-tx_bytes += tx_bytes; + stats-tx_packets =+ tx_packets; + u64_stats_update_end(stats-tx_syncp); + + return tx_packets; +} + static void skb_xmit_done(struct virtqueue *vq) { struct virtnet_info *vi = vq-vdev-priv; + struct send_queue *sq = vi-sq[vq2txq(vq)]; - /* Suppress further interrupts. 
*/ - virtqueue_disable_cb(vq); - - /* We were probably waiting for more output buffers. */ - netif_wake_subqueue(vi-dev, vq2txq(vq)); + if (napi_schedule_prep(sq-napi)) { + __napi_schedule(sq-napi); + } } static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx) @@ -774,7 +801,39 @@ again: return received; } +static int virtnet_poll_tx(struct napi_struct *napi, int budget) +{ + struct send_queue *sq = + container_of(napi, struct send_queue, napi); + struct virtnet_info *vi = sq-vq-vdev-priv; + struct netdev_queue *txq = netdev_get_tx_queue(vi-dev, vq2txq(sq-vq)); + unsigned int r, sent = 0; + +again: + __netif_tx_lock(txq, smp_processor_id()); + virtqueue_disable_cb(sq-vq); + sent += free_old_xmit_skbs(sq, budget - sent); + + if (sent budget) { + r = virtqueue_enable_cb_prepare(sq-vq); + napi_complete(napi); + __netif_tx_unlock(txq); + if (unlikely(virtqueue_poll(sq-vq, r)) So you are enabling callback on the next packet, which is almost sure to cause an interrupt storm on the guest. I think it's a bad idea, this is why I used enable_cb_delayed in my patch. Right, will do this, but may also need to make sure used event never goes back since we may call virtqueue_enable_cb_avail(). 
+ napi_schedule_prep(napi)) { + virtqueue_disable_cb(sq-vq); + __napi_schedule(napi); + goto again; + } + } else { + __netif_tx_unlock(txq); + } + + netif_wake_subqueue(vi-dev, vq2txq(sq-vq)); + return sent; +} + #ifdef CONFIG_NET_RX_BUSY_POLL + /* must be called with local_bh_disable()d */ static int virtnet_busy_poll(struct napi_struct *napi) { @@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev) if (!try_fill_recv(vi-rq[i], GFP_KERNEL)) schedule_delayed_work(vi-refill, 0); virtnet_napi_enable(vi-rq[i]); + napi_enable(vi-sq[i].napi); } return 0; } -static int free_old_xmit_skbs(struct send_queue *sq) -{ - struct sk_buff *skb; - unsigned int len; - struct virtnet_info *vi = sq-vq-vdev-priv; - struct virtnet_stats *stats = this_cpu_ptr(vi-stats); - u64 tx_bytes = 0, tx_packets = 0; - - while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) { - pr_debug(Sent skb %p\n, skb); - - tx_bytes += skb-len; - tx_packets++; - - dev_kfree_skb_any(skb); - } - - u64_stats_update_begin(stats-tx_syncp); - stats-tx_bytes += tx_bytes; - stats-tx_packets =+
Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
On Wed, Oct 15, 2014 at 06:13:19PM +0800, Jason Wang wrote: On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote: This patch checks the new event idx to make sure used event idx never goes back. This is used to synchronize the calls between virtqueue_enable_cb_delayed() and virtqueue_enable_cb(). Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com the implication being that moving event idx back might cause some race condition? This will cause race condition when tx interrupt is enabled. Consider the following cases 1) tx napi was scheduled 2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used event is vq-last_used_idx + 3/4 pendg bufs] 3) tx napi enable the callback by virtqueue_enable_cb() [ used event is vq-last_used_idx ] After step 3, used event was moved back, unnecessary tx interrupt was triggered. Well unnecessary interrupts are safe. With your patch caller of virtqueue_enable_cb will not get an interrupt on the next buffer which is not safe. If you don't want an interrupt on the next buffer, don't call virtqueue_enable_cb. If yes but please describe the race explicitly. Is there a bug we need to fix on stable? Looks not, current code does not have such race condition. Please also explicitly describe a configuration that causes event idx to go back. All this info should go in the commit log. Will do this. --- drivers/virtio/virtio_ring.c |7 +-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 3b1f89b..1b3929f 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq) u16 last_used_idx; START_USE(vq); - + last_used_idx = vq-last_used_idx; /* We optimistically turn back on interrupts, then check if there was * more to do. 
*/ /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to * either clear the flags bit or point the event index at the next * entry. Always do both to keep code simple. */ vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT; - vring_used_event(vq-vring) = last_used_idx = vq-last_used_idx; + /* Make sure used event never go backwards */ s/go/goes/ + if (!vring_need_event(vring_used_event(vq-vring), +vq-vring.avail-idx, last_used_idx)) + vring_used_event(vq-vring) = last_used_idx; The result will be that driver will *not* get an interrupt on the next consumed buffer, which is likely not what driver intended when it called virtqueue_enable_cb. This will only happen when we want to delay the interrupt for next few consumed buffers (virtqueue_enable_cb_delayed() was called). For the other case, vq-last_used_idx should be ahead of previous used event. Do you see any other case? Call virtqueue_enable_cb_delayed, later call virtqueue_enable_cb. If event index is not updated in virtqueue_enable_cb, driver will not get an interrupt on the next buffer. Instead, how about we simply document the requirement that drivers either always call virtqueue_enable_cb_delayed or virtqueue_enable_cb but not both? We need call them both when tx interrupt is enabled I believe. Can you pls reply to my patch and document issues you see? END_USE(vq); return last_used_idx; } -- 1.7.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
On 10/15/2014 06:22 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 03:25:30PM +0800, Jason Wang wrote: With the help of xmit_more and virtqueue_enable_cb_avail(), this patch enables tx interrupt only for the final skb in the chain if needed. This will help to mitigate tx interrupts. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com I think you should split xmit_more stuff out. Not much difference, but if you prefer I will do this.
Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
On Wed, Oct 15, 2014 at 06:19:15PM +0800, Jason Wang wrote: On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote: This patch introduces virtio_enable_cb_avail() to publish avail idx and used event. This could be used by batched buffer submitting to reduce the number of tx interrupts. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/virtio/virtio_ring.c | 22 -- include/linux/virtio.h |2 ++ 2 files changed, 22 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 1b3929f..d67fbf8 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq) * entry. Always do both to keep code simple. */ vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT; /* Make sure used event never go backwards */ - if (!vring_need_event(vring_used_event(vq-vring), -vq-vring.avail-idx, last_used_idx)) + if (vq-vring.avail-idx != vring_used_event(vq-vring) + !vring_need_event(vring_used_event(vq-vring), +vq-vring.avail-idx, last_used_idx)) { vring_used_event(vq-vring) = last_used_idx; + } END_USE(vq); return last_used_idx; } EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare); I see you are also changing virtqueue_enable_cb_prepare, why? This is also used to prevent it from moving the used event backwards. This may happens when we handle tx napi after we publish avail idx as used event (virtqueue_enable_cb_avail() was called). So it's wrong exactly in the same way. But also, please document this stuff, don't put unrelated changes in a patch called introduce virtqueue_enable_cb_avail. 
+bool virtqueue_enable_cb_avail(struct virtqueue *_vq) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + bool ret; + + START_USE(vq); + vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT; + vring_used_event(vq-vring) = vq-vring.avail-idx; + ret = vring_need_event(vq-vring.avail-idx, + vq-last_used_idx, vq-vring.used-idx); + END_USE(vq); + + return ret; +} +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail); + /** * virtqueue_poll - query pending used buffers * @vq: the struct virtqueue we're talking about. Could not figure out what this does. Please add documentation. Sure, does something like below explain what does this function do? /** * virtqueue_enable_cb_avail - restart callbacks after disable_cb. * @vq: the struct virtqueue we're talking about. * * This re-enables callbacks but hints to the other side to delay * interrupts all of the available buffers have been processed; So this is like virtqueue_enable_cb_delayed but even more aggressive? I think it's too agressive: it's better to wake up guest after you are through most of buffers, but not all, so guest and host can work in parallel. * it returns false if there are at least one pending buffer in the queue, * to detect a possible race between the driver checking for more work, * and enabling callbacks. * * Caller must ensure we don't call this with other virtqueue * operations at the same time (except where noted). 
*/ diff --git a/include/linux/virtio.h b/include/linux/virtio.h index b46671e..bfaf058 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq); unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq); +bool virtqueue_enable_cb_avail(struct virtqueue *vq); + bool virtqueue_poll(struct virtqueue *vq, unsigned); bool virtqueue_enable_cb_delayed(struct virtqueue *vq); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
On Wed, Oct 15, 2014 at 06:31:19PM +0800, Jason Wang wrote: On 10/15/2014 06:22 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 03:25:30PM +0800, Jason Wang wrote: With the help of xmit_more and virtqueue_enable_cb_avail(), this patch enables tx interrupt only for the final skb in the chain if needed. This will help to mitigate tx interrupts. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com I think you should split xmit_more stuff out. Not much difference, but if you prefer I will do this. In fact I did it exactly like this already :) -- MST
Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
On 10/15/2014 06:32 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 06:13:19PM +0800, Jason Wang wrote: On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote: This patch checks the new event idx to make sure used event idx never goes back. This is used to synchronize the calls between virtqueue_enable_cb_delayed() and virtqueue_enable_cb(). Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com the implication being that moving event idx back might cause some race condition? This will cause race condition when tx interrupt is enabled. Consider the following cases 1) tx napi was scheduled 2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used event is vq-last_used_idx + 3/4 pendg bufs] 3) tx napi enable the callback by virtqueue_enable_cb() [ used event is vq-last_used_idx ] After step 3, used event was moved back, unnecessary tx interrupt was triggered. Well unnecessary interrupts are safe. But it that is what we want to reduce. With your patch caller of virtqueue_enable_cb will not get an interrupt on the next buffer which is not safe. If you don't want an interrupt on the next buffer, don't call virtqueue_enable_cb. So something like this patch should be done in virtio core somewhere else. Virtio-net can not do this since it does not have the knowledge of event index. If yes but please describe the race explicitly. Is there a bug we need to fix on stable? Looks not, current code does not have such race condition. Please also explicitly describe a configuration that causes event idx to go back. All this info should go in the commit log. Will do this. 
--- drivers/virtio/virtio_ring.c |7 +-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 3b1f89b..1b3929f 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq) u16 last_used_idx; START_USE(vq); - + last_used_idx = vq-last_used_idx; /* We optimistically turn back on interrupts, then check if there was * more to do. */ /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to * either clear the flags bit or point the event index at the next * entry. Always do both to keep code simple. */ vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT; - vring_used_event(vq-vring) = last_used_idx = vq-last_used_idx; + /* Make sure used event never go backwards */ s/go/goes/ + if (!vring_need_event(vring_used_event(vq-vring), +vq-vring.avail-idx, last_used_idx)) + vring_used_event(vq-vring) = last_used_idx; The result will be that driver will *not* get an interrupt on the next consumed buffer, which is likely not what driver intended when it called virtqueue_enable_cb. This will only happen when we want to delay the interrupt for next few consumed buffers (virtqueue_enable_cb_delayed() was called). For the other case, vq-last_used_idx should be ahead of previous used event. Do you see any other case? Call virtqueue_enable_cb_delayed, later call virtqueue_enable_cb. If event index is not updated in virtqueue_enable_cb, driver will not get an interrupt on the next buffer. This is just what we want I think. The interrupt was not lost but fired after 3/4 pending buffers were consumed. Do you see any real issue on this? Instead, how about we simply document the requirement that drivers either always call virtqueue_enable_cb_delayed or virtqueue_enable_cb but not both? We need call them both when tx interrupt is enabled I believe. Can you pls reply to my patch and document issues you see? 
In the previous reply you said you're using virtqueue_enable_cb_delayed(), so there is no race in your patch.
Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
On Wed, Oct 15, 2014 at 09:49:01AM +, David Laight wrote: From: Of Michael S. Tsirkin On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote: Accumulate the sent packets and sent bytes in local variables and perform a single u64_stats_update_begin/end() after. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com Not sure how much it's worth but since Eric suggested it ... Probably depends on the actual cost of u64_stats_update_begin/end against the likely extra saving of the tx_bytes and tx_packets values onto the stack across the call to dev_kfree_skb_any(). (Which depends on the number of caller saved registers.) Yea, some benchmark results would be nice to see. Acked-by: Michael S. Tsirkin m...@redhat.com --- drivers/net/virtio_net.c | 12 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 3d0ce44..a4d56b8 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq) unsigned int len; struct virtnet_info *vi = sq-vq-vdev-priv; struct virtnet_stats *stats = this_cpu_ptr(vi-stats); + u64 tx_bytes = 0, tx_packets = 0; tx_packets need only be 'unsigned int'. The same is almost certainly true of tx_bytes. 
David

	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
		pr_debug("Sent skb %p\n", skb);
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
+		tx_bytes += skb->len;
+		tx_packets++;

		dev_kfree_skb_any(skb);
	}
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += tx_bytes;
+	stats->tx_packets += tx_packets;
+	u64_stats_update_end(&stats->tx_syncp);
 }

 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
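The shape of the patch, accumulate in locals and commit under one begin/end pair, can be sketched with a minimal single-threaded stand-in for u64_stats_sync. This is an illustration only: stats_commit() and free_batch() are invented names, and the real u64_stats_update_begin/end is per-CPU and compiles to nothing on 64-bit architectures.

```c
#include <assert.h>
#include <stdint.h>

/* Toy seqcount-protected stats; a stand-in for struct virtnet_stats. */
struct stats {
	unsigned seq;		/* odd while a write is in progress */
	uint64_t tx_bytes;
	uint64_t tx_packets;
};

/* One write-side critical section per batch instead of per packet. */
static void stats_commit(struct stats *s, uint64_t bytes, uint64_t packets)
{
	s->seq++;		/* u64_stats_update_begin() */
	s->tx_bytes += bytes;
	s->tx_packets += packets;
	s->seq++;		/* u64_stats_update_end() */
}

/* Mirrors the patched free_old_xmit_skbs(): sum in locals, commit once. */
static void free_batch(struct stats *s, const unsigned lens[], int n)
{
	uint64_t tx_bytes = 0, tx_packets = 0;
	for (int i = 0; i < n; i++) {
		tx_bytes += lens[i];
		tx_packets++;	/* dev_kfree_skb_any(skb) would go here */
	}
	stats_commit(s, tx_bytes, tx_packets);
}
```

For a batch of three packets this bumps the sequence counter once (two increments) rather than three times, which is the whole saving being weighed against the cost of keeping two extra locals live across the free.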
RE: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
From: Michael S. Tsirkin On Wed, Oct 15, 2014 at 09:49:01AM +, David Laight wrote: From: Of Michael S. Tsirkin On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote: Accumulate the sent packets and sent bytes in local variables and perform a single u64_stats_update_begin/end() after. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com Not sure how much it's worth but since Eric suggested it ... Probably depends on the actual cost of u64_stats_update_begin/end against the likely extra saving of the tx_bytes and tx_packets values onto the stack across the call to dev_kfree_skb_any(). (Which depends on the number of caller saved registers.) Yea, some benchmark results would be nice to see. I there are likely to be multiple skb on the queue the fastest code would probably do one 'virtqueue_get_all()' that returned a linked list of buffers, then follow the list to get the stats, and follow it again to free the skb. David Acked-by: Michael S. Tsirkin m...@redhat.com --- drivers/net/virtio_net.c | 12 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 3d0ce44..a4d56b8 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq) unsigned int len; struct virtnet_info *vi = sq-vq-vdev-priv; struct virtnet_stats *stats = this_cpu_ptr(vi-stats); + u64 tx_bytes = 0, tx_packets = 0; tx_packets need only be 'unsigned int'. The same is almost certainly true of tx_bytes. 
David

	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
		pr_debug("Sent skb %p\n", skb);
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
+		tx_bytes += skb->len;
+		tx_packets++;

		dev_kfree_skb_any(skb);
	}
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += tx_bytes;
+	stats->tx_packets += tx_packets;
+	u64_stats_update_end(&stats->tx_syncp);
 }

 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
--
1.7.1
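David's suggested 'virtqueue_get_all()' does not exist in the virtio API; the sketch below is a guess at what he describes, using a toy skb type and an array standing in for the used ring. The idea is one harvesting pass that builds an intrusive list, after which the list can be walked twice: once for stats, once to free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy sk_buff with an intrusive link for the harvested list. */
struct fake_skb {
	unsigned len;
	struct fake_skb *next;
};

/* Hypothetical "get all" step: pop every completed buffer in one pass.
 * Building from the tail keeps the original completion order. */
static struct fake_skb *get_all(struct fake_skb *ring[], int n)
{
	struct fake_skb *head = NULL;
	while (n--) {
		ring[n]->next = head;
		head = ring[n];
	}
	return head;
}

/* First walk: gather the stats without touching the allocator. */
static uint64_t sum_bytes(const struct fake_skb *skb)
{
	uint64_t bytes = 0;
	for (; skb; skb = skb->next)
		bytes += skb->len;
	return bytes;
	/* a second walk would then dev_kfree_skb_any() each entry */
}
```

Whether the extra pointer chasing beats the current loop is exactly the open question in the thread; as Michael notes below, linked lists have hurt performance in past experiments.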
Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
On 10/15/2014 06:41 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 06:19:15PM +0800, Jason Wang wrote: On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote: This patch introduces virtio_enable_cb_avail() to publish avail idx and used event. This could be used by batched buffer submitting to reduce the number of tx interrupts. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/virtio/virtio_ring.c | 22 -- include/linux/virtio.h |2 ++ 2 files changed, 22 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 1b3929f..d67fbf8 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq) * entry. Always do both to keep code simple. */ vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT; /* Make sure used event never go backwards */ - if (!vring_need_event(vring_used_event(vq-vring), -vq-vring.avail-idx, last_used_idx)) + if (vq-vring.avail-idx != vring_used_event(vq-vring) + !vring_need_event(vring_used_event(vq-vring), +vq-vring.avail-idx, last_used_idx)) { vring_used_event(vq-vring) = last_used_idx; + } END_USE(vq); return last_used_idx; } EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare); I see you are also changing virtqueue_enable_cb_prepare, why? This is also used to prevent it from moving the used event backwards. This may happens when we handle tx napi after we publish avail idx as used event (virtqueue_enable_cb_avail() was called). So it's wrong exactly in the same way. But also, please document this stuff, don't put unrelated changes in a patch called introduce virtqueue_enable_cb_avail. 
+bool virtqueue_enable_cb_avail(struct virtqueue *_vq) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + bool ret; + + START_USE(vq); + vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT; + vring_used_event(vq-vring) = vq-vring.avail-idx; + ret = vring_need_event(vq-vring.avail-idx, + vq-last_used_idx, vq-vring.used-idx); + END_USE(vq); + + return ret; +} +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail); + /** * virtqueue_poll - query pending used buffers * @vq: the struct virtqueue we're talking about. Could not figure out what this does. Please add documentation. Sure, does something like below explain what does this function do? /** * virtqueue_enable_cb_avail - restart callbacks after disable_cb. * @vq: the struct virtqueue we're talking about. * * This re-enables callbacks but hints to the other side to delay * interrupts all of the available buffers have been processed; So this is like virtqueue_enable_cb_delayed but even more aggressive? I think it's too agressive: it's better to wake up guest after you are through most of buffers, but not all, so guest and host can work in parallel. Note that: - it was only used when there are still few of free slots (which is greater than 2 + MAX_SKB_FRAGS) - my patch keeps the free_old_xmit_skbs() in the beginning of start_xmit(), so the tx skb reclaiming does not depends totally on tx interrupt. If more packets comes, we'd expect some of them were freed in ndo_start_xmit(). If not, finally we may trigger the virtqueue_enable_cb_delayed(). So probably not as aggressive as it looks. I will do benchmark on this. * it returns false if there are at least one pending buffer in the queue, * to detect a possible race between the driver checking for more work, * and enabling callbacks. * * Caller must ensure we don't call this with other virtqueue * operations at the same time (except where noted). 
*/ diff --git a/include/linux/virtio.h b/include/linux/virtio.h index b46671e..bfaf058 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq); unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq); +bool virtqueue_enable_cb_avail(struct virtqueue *vq); + bool virtqueue_poll(struct virtqueue *vq, unsigned); bool virtqueue_enable_cb_delayed(struct virtqueue *vq); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo
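The effect virtqueue_enable_cb_avail() aims for can be seen from the device side of the event-index protocol. The helper below is a user-space model of the spec's signalling rule, not driver code: after bumping used->idx from old to new, the device interrupts iff the guest's published used_event falls in [old, new), computed wrap-safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device-side decision under VIRTIO_RING_F_EVENT_IDX: signal the guest
 * after moving used->idx from old_used to new_used iff used_event lies
 * in the window [old_used, new_used), modulo 2^16. */
static bool device_should_signal(uint16_t used_event, uint16_t new_used,
				 uint16_t old_used)
{
	return (uint16_t)(new_used - used_event - 1)
		< (uint16_t)(new_used - old_used);
}
```

So a driver that publishes a far-ahead index as the used event (as virtqueue_enable_cb_avail() does with avail->idx) suppresses the per-buffer interrupts that a plain used_event = last_used_idx would generate; the interrupt only arrives once the device's used index moves past the published point. This is the aggressiveness Michael objects to above: with the event parked at avail->idx, none of the currently outstanding buffers trigger a signal.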
Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
On 10/15/2014 06:43 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 06:25:25PM +0800, Jason Wang wrote: On 10/15/2014 06:18 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote: Orphan skb in ndo_start_xmit() breaks socket accounting and packet queuing. This in fact breaks lots of things such as pktgen and several TCP optimizations. And also make BQL can't be implemented for virtio-net. This patch tries to solve this issue by enabling tx interrupt. To avoid introducing extra spinlocks, a tx napi was scheduled to free those packets. More tx interrupt mitigation method could be used on top. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/net/virtio_net.c | 125 +++--- 1 files changed, 85 insertions(+), 40 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index ccf98f9..2afc2e2 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -72,6 +72,8 @@ struct send_queue { /* Name of the send queue: output.$index */ char name[40]; + + struct napi_struct napi; }; /* Internal representation of a receive virtqueue */ @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask) return p; } +static int free_old_xmit_skbs(struct send_queue *sq, int budget) +{ + struct sk_buff *skb; + unsigned int len; + struct virtnet_info *vi = sq-vq-vdev-priv; + struct virtnet_stats *stats = this_cpu_ptr(vi-stats); + u64 tx_bytes = 0, tx_packets = 0; + + while (tx_packets budget + (skb = virtqueue_get_buf(sq-vq, len)) != NULL) { + pr_debug(Sent skb %p\n, skb); + + tx_bytes += skb-len; + tx_packets++; + + dev_kfree_skb_any(skb); + } + + u64_stats_update_begin(stats-tx_syncp); + stats-tx_bytes += tx_bytes; + stats-tx_packets =+ tx_packets; + u64_stats_update_end(stats-tx_syncp); + + return tx_packets; +} + static void skb_xmit_done(struct virtqueue *vq) { struct virtnet_info *vi = 
vq-vdev-priv; + struct send_queue *sq = vi-sq[vq2txq(vq)]; - /* Suppress further interrupts. */ - virtqueue_disable_cb(vq); - - /* We were probably waiting for more output buffers. */ - netif_wake_subqueue(vi-dev, vq2txq(vq)); + if (napi_schedule_prep(sq-napi)) { + __napi_schedule(sq-napi); + } } static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx) @@ -774,7 +801,39 @@ again: return received; } +static int virtnet_poll_tx(struct napi_struct *napi, int budget) +{ + struct send_queue *sq = + container_of(napi, struct send_queue, napi); + struct virtnet_info *vi = sq-vq-vdev-priv; + struct netdev_queue *txq = netdev_get_tx_queue(vi-dev, vq2txq(sq-vq)); + unsigned int r, sent = 0; + +again: + __netif_tx_lock(txq, smp_processor_id()); + virtqueue_disable_cb(sq-vq); + sent += free_old_xmit_skbs(sq, budget - sent); + + if (sent budget) { + r = virtqueue_enable_cb_prepare(sq-vq); + napi_complete(napi); + __netif_tx_unlock(txq); + if (unlikely(virtqueue_poll(sq-vq, r)) So you are enabling callback on the next packet, which is almost sure to cause an interrupt storm on the guest. I think it's a bad idea, this is why I used enable_cb_delayed in my patch. Right, will do this, but may also need to make sure used event never goes back since we may call virtqueue_enable_cb_avail(). That's why my patch always calls virtqueue_enable_cb_delayed. So no need for hacks. Maybe you can review my patch and comment? I think I miss the virtqueue_enable_cb_delayed() in your patch when I'm reviewing it. Will do. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
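The disable/drain/re-arm/re-check shape of virtnet_poll_tx() discussed above is worth spelling out, since the re-check is what closes the lost-wakeup race. The sketch below is a single-threaded model with invented types; irq_armed and napi_scheduled are flags standing in for the callback state and NAPI scheduling, not real kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

struct txq {
	int pending;		/* completed buffers waiting to be reaped */
	bool irq_armed;		/* stand-in for callbacks enabled */
	bool napi_scheduled;
};

/* Mirrors the shape of virtnet_poll_tx(): work with notifications off,
 * re-arm only when under budget, then look again for completions that
 * slipped in between the drain and the re-arm. */
static int poll_tx(struct txq *q, int budget)
{
	int sent = 0;

	q->irq_armed = false;			/* virtqueue_disable_cb() */
	while (sent < budget && q->pending > 0) {
		q->pending--;			/* virtqueue_get_buf() + free */
		sent++;
	}
	if (sent < budget) {
		q->napi_scheduled = false;	/* napi_complete() */
		q->irq_armed = true;		/* virtqueue_enable_cb_prepare() */
		if (q->pending > 0) {		/* virtqueue_poll(): lost the race */
			q->irq_armed = false;
			q->napi_scheduled = true;	/* __napi_schedule() */
		}
	}
	return sent;
}
```

Exhausting the budget leaves NAPI scheduled with interrupts off, so the remaining completions are reaped by polling; only a partial drain hands control back to the interrupt path.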
Re: [PATCH RFC] virtio_net: enable tx interrupt
On 10/15/2014 05:53 AM, Michael S. Tsirkin wrote: On newer hosts that support delayed tx interrupts, we probably don't have much to gain from orphaning packets early. Based on patch by Jason Wang. Note: this will likely degrade performance for hosts without event idx support. Various fallback options are available, including orphaning conditionally. Testing TBD. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/net/virtio_net.c | 119 +-- 1 file changed, 83 insertions(+), 36 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 6b6e136..62c059d 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -72,6 +72,8 @@ struct send_queue { /* Name of the send queue: output.$index */ char name[40]; + + struct napi_struct napi; }; /* Internal representation of a receive virtqueue */ @@ -211,15 +213,38 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask) return p; } +static int free_old_xmit_skbs(struct send_queue *sq, int budget) +{ + struct sk_buff *skb; + unsigned int len; + struct virtnet_info *vi = sq-vq-vdev-priv; + struct virtnet_stats *stats = this_cpu_ptr(vi-stats); + int sent = 0; + + while (sent budget +(skb = virtqueue_get_buf(sq-vq, len)) != NULL) { + pr_debug(Sent skb %p\n, skb); + + u64_stats_update_begin(stats-tx_syncp); + stats-tx_bytes += skb-len; + stats-tx_packets++; + u64_stats_update_end(stats-tx_syncp); + + dev_kfree_skb_any(skb); + sent++; + } + + return sent; +} + static void skb_xmit_done(struct virtqueue *vq) { struct virtnet_info *vi = vq-vdev-priv; + struct send_queue *sq = vi-sq[vq2txq(vq)]; - /* Suppress further interrupts. */ - virtqueue_disable_cb(vq); - - /* We were probably waiting for more output buffers. 
*/ - netif_wake_subqueue(vi-dev, vq2txq(vq)); + if (napi_schedule_prep(sq-napi)) { + __napi_schedule(sq-napi); + } } static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx) @@ -766,6 +791,37 @@ again: return received; } +static int virtnet_poll_tx(struct napi_struct *napi, int budget) +{ + struct send_queue *sq = + container_of(napi, struct send_queue, napi); + struct virtnet_info *vi = sq-vq-vdev-priv; + struct netdev_queue *txq = netdev_get_tx_queue(vi-dev, vq2txq(sq-vq)); + unsigned int r, sent = 0; + +again: + __netif_tx_lock(txq, smp_processor_id()); + virtqueue_disable_cb(sq-vq); + sent += free_old_xmit_skbs(sq, budget - sent); + + if (sent budget) { + r = virtqueue_enable_cb_prepare(sq-vq); So even virtqueue_enable_cb_delayed() was used in start_xmit(). This can move used index backwards to trigger unnecessary interrupts. + napi_complete(napi); + __netif_tx_unlock(txq); + if (unlikely(virtqueue_poll(sq-vq, r)) + napi_schedule_prep(napi)) { + virtqueue_disable_cb(sq-vq); + __napi_schedule(napi); + goto again; + } + } else { + __netif_tx_unlock(txq); + } + + netif_wake_subqueue(vi-dev, vq2txq(sq-vq)); + return sent; +} + #ifdef CONFIG_NET_RX_BUSY_POLL /* must be called with local_bh_disable()d */ static int virtnet_busy_poll(struct napi_struct *napi) @@ -814,30 +870,12 @@ static int virtnet_open(struct net_device *dev) if (!try_fill_recv(vi-rq[i], GFP_KERNEL)) schedule_delayed_work(vi-refill, 0); virtnet_napi_enable(vi-rq[i]); + napi_enable(vi-sq[i].napi); } return 0; } -static void free_old_xmit_skbs(struct send_queue *sq) -{ - struct sk_buff *skb; - unsigned int len; - struct virtnet_info *vi = sq-vq-vdev-priv; - struct virtnet_stats *stats = this_cpu_ptr(vi-stats); - - while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) { - pr_debug(Sent skb %p\n, skb); - - u64_stats_update_begin(stats-tx_syncp); - stats-tx_bytes += skb-len; - stats-tx_packets++; - u64_stats_update_end(stats-tx_syncp); - - dev_kfree_skb_any(skb); - } -} - static int 
xmit_skb(struct send_queue *sq, struct sk_buff *skb) { struct skb_vnet_hdr *hdr; @@ -902,7 +940,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb) sg_set_buf(sq-sg, hdr, hdr_len); num_sg = skb_to_sgvec(skb, sq-sg + 1, 0, skb-len) + 1; } - return virtqueue_add_outbuf(sq-vq, sq-sg, num_sg, skb, GFP_ATOMIC); + + return virtqueue_add_outbuf(sq-vq, sq-sg, num_sg,
Re: [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
On 10/15/2014 06:25 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 03:25:24PM +0800, Jason Wang wrote: According to David, proper accounting and queueing (at all levels, not just TCP sockets) is more important than trying to skim a bunch of cycles by avoiding TX interrupts. He also mentioned we should find other ways to batch Right. Having an event to free the SKB is absolutely essential for the stack to operate correctly. This series tries to enable tx interrupt for virtio-net. The idea is simple: enable tx interrupt and schedule a tx napi to free old xmit skbs. Several notes: - Tx interrupt storm avoidance when queue is about to be full is kept. But queue is typically *not* full. More important to avoid interrupt storms in that case IMO. Yes. Since we may enable callbacks in both ndo_start_xmit() and tx napi, patch 1 adds a check to make sure used event never go back. This will let the napi not enable the callbacks wrongly after delayed callbacks was used. So why not just use delayed callbacks? This means the tx interrupt are coalesced in a somewhat adaptive way. Need benchmark to see its effect. - For bulk dequeuing, there's no need to enable tx interrupt for each packet. The last patch only enable tx interrupt for the final skb in the chain through xmit_more and a new helper to publish current avail idx as used event. This series fixes several issues of original rfc pointed out by Michael. Could you list the issues, for ease of review? Probably just one: - Move the virtqueue_disable_cb() from skb_xmit_done() into virtnet_poll_tx() under tx lock. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
On Wed, Oct 15, 2014 at 06:44:41PM +0800, Jason Wang wrote: On 10/15/2014 06:32 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 06:13:19PM +0800, Jason Wang wrote: On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote: This patch checks the new event idx to make sure used event idx never goes back. This is used to synchronize the calls between virtqueue_enable_cb_delayed() and virtqueue_enable_cb(). Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com the implication being that moving event idx back might cause some race condition? This will cause race condition when tx interrupt is enabled. Consider the following cases 1) tx napi was scheduled 2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used event is vq-last_used_idx + 3/4 pendg bufs] 3) tx napi enable the callback by virtqueue_enable_cb() [ used event is vq-last_used_idx ] After step 3, used event was moved back, unnecessary tx interrupt was triggered. Well unnecessary interrupts are safe. But it that is what we want to reduce. It's all about correctness. I don't think mixing enable_cb and enable_cb_delayed makes sense, let's just make virtio behave correctly if that happens, no need to optimize for that. With your patch caller of virtqueue_enable_cb will not get an interrupt on the next buffer which is not safe. If you don't want an interrupt on the next buffer, don't call virtqueue_enable_cb. So something like this patch should be done in virtio core somewhere else. Virtio-net can not do this since it does not have the knowledge of event index. Take a look at my patch - no calls to enable_cb, only enable_cb_delayed, so we should be fine. If yes but please describe the race explicitly. Is there a bug we need to fix on stable? Looks not, current code does not have such race condition. 
Please also explicitly describe a configuration that causes event idx to go back. All this info should go in the commit log. Will do this. --- drivers/virtio/virtio_ring.c |7 +-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 3b1f89b..1b3929f 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq) u16 last_used_idx; START_USE(vq); - +last_used_idx = vq-last_used_idx; /* We optimistically turn back on interrupts, then check if there was * more to do. */ /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to * either clear the flags bit or point the event index at the next * entry. Always do both to keep code simple. */ vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT; -vring_used_event(vq-vring) = last_used_idx = vq-last_used_idx; +/* Make sure used event never go backwards */ s/go/goes/ +if (!vring_need_event(vring_used_event(vq-vring), + vq-vring.avail-idx, last_used_idx)) +vring_used_event(vq-vring) = last_used_idx; The result will be that driver will *not* get an interrupt on the next consumed buffer, which is likely not what driver intended when it called virtqueue_enable_cb. This will only happen when we want to delay the interrupt for next few consumed buffers (virtqueue_enable_cb_delayed() was called). For the other case, vq-last_used_idx should be ahead of previous used event. Do you see any other case? Call virtqueue_enable_cb_delayed, later call virtqueue_enable_cb. If event index is not updated in virtqueue_enable_cb, driver will not get an interrupt on the next buffer. This is just what we want I think. The interrupt was not lost but fired after 3/4 pending buffers were consumed. Do you see any real issue on this? Yes, this violates the API. For example device might never consume the rest of buffers. 
Instead, how about we simply document the requirement that drivers either always call virtqueue_enable_cb_delayed or virtqueue_enable_cb but not both? We need call them both when tx interrupt is enabled I believe. Can you pls reply to my patch and document issues you see? In the previous reply you said you're using virtuqueue_enable_cb_delayed(), so no race in your patch. OK so you think my patch is also correct, but that yours gives better efficiency? -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
On Wed, Oct 15, 2014 at 06:58:15PM +0800, Jason Wang wrote: On 10/15/2014 06:41 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 06:19:15PM +0800, Jason Wang wrote: On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote: This patch introduces virtio_enable_cb_avail() to publish avail idx and used event. This could be used by batched buffer submitting to reduce the number of tx interrupts. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/virtio/virtio_ring.c | 22 -- include/linux/virtio.h |2 ++ 2 files changed, 22 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 1b3929f..d67fbf8 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq) * entry. Always do both to keep code simple. */ vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT; /* Make sure used event never go backwards */ -if (!vring_need_event(vring_used_event(vq-vring), - vq-vring.avail-idx, last_used_idx)) +if (vq-vring.avail-idx != vring_used_event(vq-vring) +!vring_need_event(vring_used_event(vq-vring), + vq-vring.avail-idx, last_used_idx)) { vring_used_event(vq-vring) = last_used_idx; +} END_USE(vq); return last_used_idx; } EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare); I see you are also changing virtqueue_enable_cb_prepare, why? This is also used to prevent it from moving the used event backwards. This may happens when we handle tx napi after we publish avail idx as used event (virtqueue_enable_cb_avail() was called). So it's wrong exactly in the same way. But also, please document this stuff, don't put unrelated changes in a patch called introduce virtqueue_enable_cb_avail. 
+bool virtqueue_enable_cb_avail(struct virtqueue *_vq) +{ +struct vring_virtqueue *vq = to_vvq(_vq); +bool ret; + +START_USE(vq); +vq-vring.avail-flags = ~VRING_AVAIL_F_NO_INTERRUPT; +vring_used_event(vq-vring) = vq-vring.avail-idx; +ret = vring_need_event(vq-vring.avail-idx, + vq-last_used_idx, vq-vring.used-idx); +END_USE(vq); + +return ret; +} +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail); + /** * virtqueue_poll - query pending used buffers * @vq: the struct virtqueue we're talking about. Could not figure out what this does. Please add documentation. Sure, does something like below explain what does this function do? /** * virtqueue_enable_cb_avail - restart callbacks after disable_cb. * @vq: the struct virtqueue we're talking about. * * This re-enables callbacks but hints to the other side to delay * interrupts all of the available buffers have been processed; So this is like virtqueue_enable_cb_delayed but even more aggressive? I think it's too agressive: it's better to wake up guest after you are through most of buffers, but not all, so guest and host can work in parallel. Note that: - it was only used when there are still few of free slots (which is greater than 2 + MAX_SKB_FRAGS) - my patch keeps the free_old_xmit_skbs() in the beginning of start_xmit(), so the tx skb reclaiming does not depends totally on tx interrupt. If more packets comes, we'd expect some of them were freed in ndo_start_xmit(). If not, finally we may trigger the virtqueue_enable_cb_delayed(). So probably not as aggressive as it looks. I will do benchmark on this. Mine too: } else if (virtqueue_enable_cb_delayed(sq-vq)) { free_old_xmit_skbs(txq, sq, qsize); } * it returns false if there are at least one pending buffer in the queue, * to detect a possible race between the driver checking for more work, * and enabling callbacks. * * Caller must ensure we don't call this with other virtqueue * operations at the same time (except where noted). 
*/ diff --git a/include/linux/virtio.h b/include/linux/virtio.h index b46671e..bfaf058 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -65,6 +65,8 @@ bool
Re: [PATCH RFC] virtio_net: enable tx interrupt
On Wed, Oct 15, 2014 at 07:04:20PM +0800, Jason Wang wrote: On 10/15/2014 05:53 AM, Michael S. Tsirkin wrote: On newer hosts that support delayed tx interrupts, we probably don't have much to gain from orphaning packets early. Based on patch by Jason Wang. Note: this will likely degrade performance for hosts without event idx support. Various fallback options are available, including orphaning conditionally. Testing TBD. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/net/virtio_net.c | 119 +-- 1 file changed, 83 insertions(+), 36 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 6b6e136..62c059d 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -72,6 +72,8 @@ struct send_queue { /* Name of the send queue: output.$index */ char name[40]; + + struct napi_struct napi; }; /* Internal representation of a receive virtqueue */ @@ -211,15 +213,38 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask) return p; } +static int free_old_xmit_skbs(struct send_queue *sq, int budget) +{ + struct sk_buff *skb; + unsigned int len; + struct virtnet_info *vi = sq-vq-vdev-priv; + struct virtnet_stats *stats = this_cpu_ptr(vi-stats); + int sent = 0; + + while (sent budget + (skb = virtqueue_get_buf(sq-vq, len)) != NULL) { + pr_debug(Sent skb %p\n, skb); + + u64_stats_update_begin(stats-tx_syncp); + stats-tx_bytes += skb-len; + stats-tx_packets++; + u64_stats_update_end(stats-tx_syncp); + + dev_kfree_skb_any(skb); + sent++; + } + + return sent; +} + static void skb_xmit_done(struct virtqueue *vq) { struct virtnet_info *vi = vq-vdev-priv; + struct send_queue *sq = vi-sq[vq2txq(vq)]; - /* Suppress further interrupts. */ - virtqueue_disable_cb(vq); - - /* We were probably waiting for more output buffers. 
*/ - netif_wake_subqueue(vi-dev, vq2txq(vq)); + if (napi_schedule_prep(sq-napi)) { + __napi_schedule(sq-napi); + } } static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx) @@ -766,6 +791,37 @@ again: return received; } +static int virtnet_poll_tx(struct napi_struct *napi, int budget) +{ + struct send_queue *sq = + container_of(napi, struct send_queue, napi); + struct virtnet_info *vi = sq-vq-vdev-priv; + struct netdev_queue *txq = netdev_get_tx_queue(vi-dev, vq2txq(sq-vq)); + unsigned int r, sent = 0; + +again: + __netif_tx_lock(txq, smp_processor_id()); + virtqueue_disable_cb(sq-vq); + sent += free_old_xmit_skbs(sq, budget - sent); + + if (sent budget) { + r = virtqueue_enable_cb_prepare(sq-vq); So even virtqueue_enable_cb_delayed() was used in start_xmit(). This can move used index backwards to trigger unnecessary interrupts. Good point. I'll rework this to use virtqueue_enable_cb_delayed. virtqueue_enable_cb_delayed_prepare might be nice to reduce lock contention, but that needs to be benchmarked. 
+ napi_complete(napi); + __netif_tx_unlock(txq); + if (unlikely(virtqueue_poll(sq-vq, r)) + napi_schedule_prep(napi)) { + virtqueue_disable_cb(sq-vq); + __napi_schedule(napi); + goto again; + } + } else { + __netif_tx_unlock(txq); + } + + netif_wake_subqueue(vi-dev, vq2txq(sq-vq)); + return sent; +} + #ifdef CONFIG_NET_RX_BUSY_POLL /* must be called with local_bh_disable()d */ static int virtnet_busy_poll(struct napi_struct *napi) @@ -814,30 +870,12 @@ static int virtnet_open(struct net_device *dev) if (!try_fill_recv(vi-rq[i], GFP_KERNEL)) schedule_delayed_work(vi-refill, 0); virtnet_napi_enable(vi-rq[i]); + napi_enable(vi-sq[i].napi); } return 0; } -static void free_old_xmit_skbs(struct send_queue *sq) -{ - struct sk_buff *skb; - unsigned int len; - struct virtnet_info *vi = sq-vq-vdev-priv; - struct virtnet_stats *stats = this_cpu_ptr(vi-stats); - - while ((skb = virtqueue_get_buf(sq-vq, len)) != NULL) { - pr_debug(Sent skb %p\n, skb); - - u64_stats_update_begin(stats-tx_syncp); - stats-tx_bytes += skb-len; - stats-tx_packets++; - u64_stats_update_end(stats-tx_syncp); - - dev_kfree_skb_any(skb); - } -} - static int xmit_skb(struct send_queue *sq, struct sk_buff *skb) { struct skb_vnet_hdr *hdr; @@ -902,7 +940,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
Re: [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
On Wed, Oct 15, 2014 at 07:14:14PM +0800, Jason Wang wrote: On 10/15/2014 06:25 PM, Michael S. Tsirkin wrote: On Wed, Oct 15, 2014 at 03:25:24PM +0800, Jason Wang wrote: According to David, proper accounting and queueing (at all levels, not just TCP sockets) is more important than trying to skim a bunch of cycles by avoiding TX interrupts. He also mentioned we should find other ways to batch Right. Having an event to free the SKB is absolutely essential for the stack to operate correctly. This series tries to enable tx interrupt for virtio-net. The idea is simple: enable tx interrupt and schedule a tx napi to free old xmit skbs. Several notes: - Tx interrupt storm avoidance when queue is about to be full is kept. But queue is typically *not* full. More important to avoid interrupt storms in that case IMO. Yes. Since we may enable callbacks in both ndo_start_xmit() and tx napi, patch 1 adds a check to make sure used event never go back. This will let the napi not enable the callbacks wrongly after delayed callbacks was used. So why not just use delayed callbacks? This means the tx interrupt are coalesced in a somewhat adaptive way. Need benchmark to see its effect. I think it's a minimal change, and does not need new APIs. If that's not optimal, we can do smarter things on top. - For bulk dequeuing, there's no need to enable tx interrupt for each packet. The last patch only enable tx interrupt for the final skb in the chain through xmit_more and a new helper to publish current avail idx as used event. This series fixes several issues of original rfc pointed out by Michael. Could you list the issues, for ease of review? Probably just one: - Move the virtqueue_disable_cb() from skb_xmit_done() into virtnet_poll_tx() under tx lock. I think I did this already, I'll recheck. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
On Wed, Oct 15, 2014 at 10:51:32AM +, David Laight wrote:
From: Michael S. Tsirkin
On Wed, Oct 15, 2014 at 09:49:01AM +, David Laight wrote:
From: Of Michael S. Tsirkin
On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:

Accumulate the sent packets and sent bytes in local variables and perform a single u64_stats_update_begin/end() afterwards.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com

Not sure how much it's worth, but since Eric suggested it ...

Probably depends on the actual cost of u64_stats_update_begin/end() against the likely extra saving of the tx_bytes and tx_packets values onto the stack across the call to dev_kfree_skb_any(). (Which depends on the number of caller-saved registers.)

Yea, some benchmark results would be nice to see.

If there are likely to be multiple skbs on the queue, the fastest code would probably do one 'virtqueue_get_all()' that returned a linked list of buffers, then follow the list to get the stats, and follow it again to free the skbs.

David

Interesting. Each time we tried playing with linked lists in the past, it simply destroyed performance. Maybe this case is different, but I have my doubts.

Acked-by: Michael S. Tsirkin m...@redhat.com

---
 drivers/net/virtio_net.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..a4d56b8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	unsigned int len;
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	u64 tx_bytes = 0, tx_packets = 0;

tx_packets need only be 'unsigned int'. The same is almost certainly true of tx_bytes.

David

 	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
+		tx_bytes += skb->len;
+		tx_packets++;
 		dev_kfree_skb_any(skb);
 	}
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += tx_bytes;
+	stats->tx_packets += tx_packets;
+	u64_stats_update_end(&stats->tx_syncp);
 }

 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
--
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] virtio_net: fix use after free
commit 0b725a2ca61bedc33a2a63d0451d528b268cf975 ("net: Remove ndo_xmit_flush netdev operation, use signalling instead.") added code that looks at skb->xmit_more after the skb has been put in the TX VQ. Since some paths process the ring and free the skb immediately, this can cause a use after free. Fix by storing xmit_more in a local variable.

Cc: David S. Miller da...@davemloft.net
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

David, am I using the API correctly? Seems to work for me. You used __netif_subqueue_stopped but that seems to use a slightly more expensive test_bit internally. The reason I added a variable for the txq here is because it's handy for the BQL patch later on.

 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..13d0a8b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -920,6 +920,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	int qnum = skb_get_queue_mapping(skb);
 	struct send_queue *sq = &vi->sq[qnum];
 	int err;
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
+	bool kick = !skb->xmit_more;

 	/* Free up any pending old buffers before queueing new ones. */
 	free_old_xmit_skbs(sq);
@@ -956,7 +958,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}

-	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
+	if (kick || netif_xmit_stopped(txq))
 		virtqueue_kick(sq->vq);

 	return NETDEV_TX_OK;
--
MST
RE: [PATCH] virtio_net: fix use after free
From: Michael S. Tsirkin

commit 0b725a2ca61bedc33a2a63d0451d528b268cf975 ("net: Remove ndo_xmit_flush netdev operation, use signalling instead.") added code that looks at skb->xmit_more after the skb has been put in the TX VQ. Since some paths process the ring and free the skb immediately, this can cause a use after free. Fix by storing xmit_more in a local variable.

Cc: David S. Miller da...@davemloft.net
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

David, am I using the API correctly? Seems to work for me. You used __netif_subqueue_stopped but that seems to use a slightly more expensive test_bit internally. The reason I added a variable for the txq here is because it's handy for the BQL patch later on.

 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..13d0a8b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -920,6 +920,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	int qnum = skb_get_queue_mapping(skb);
 	struct send_queue *sq = &vi->sq[qnum];
 	int err;
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);

Do you need to cache 'txq' on the stack for the entire call? Looks like it is only needed when 'kick' is true. I've not looked to see if it saves both 'dev' and 'qnum' being kept. In any case it isn't mentioned in the commit message.

David

+	bool kick = !skb->xmit_more;

 	/* Free up any pending old buffers before queueing new ones. */
 	free_old_xmit_skbs(sq);
@@ -956,7 +958,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}

-	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
+	if (kick || netif_xmit_stopped(txq))
 		virtqueue_kick(sq->vq);

 	return NETDEV_TX_OK;
--
MST
Re: [PATCH] virtio_net: fix use after free
On Wed, Oct 15, 2014 at 01:24:57PM +, David Laight wrote:
From: Michael S. Tsirkin

commit 0b725a2ca61bedc33a2a63d0451d528b268cf975 ("net: Remove ndo_xmit_flush netdev operation, use signalling instead.") added code that looks at skb->xmit_more after the skb has been put in the TX VQ. Since some paths process the ring and free the skb immediately, this can cause a use after free. Fix by storing xmit_more in a local variable.

Cc: David S. Miller da...@davemloft.net
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

David, am I using the API correctly? Seems to work for me. You used __netif_subqueue_stopped but that seems to use a slightly more expensive test_bit internally. The reason I added a variable for the txq here is because it's handy for the BQL patch later on.

 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..13d0a8b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -920,6 +920,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	int qnum = skb_get_queue_mapping(skb);
 	struct send_queue *sq = &vi->sq[qnum];
 	int err;
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);

Do you need to cache 'txq' on the stack for the entire call? Looks like it is only needed when 'kick' is true. I've not looked to see if it saves both 'dev' and 'qnum' being kept. In any case it isn't mentioned in the commit message.

David

Code seems slightly neater this way; I haven't bothered to micro-optimize it to this level yet. Want to benchmark and send a patch on top?

+	bool kick = !skb->xmit_more;

 	/* Free up any pending old buffers before queueing new ones. */
 	free_old_xmit_skbs(sq);
@@ -956,7 +958,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}

-	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
+	if (kick || netif_xmit_stopped(txq))
 		virtqueue_kick(sq->vq);

 	return NETDEV_TX_OK;
--
MST
[PATCH RFC v2 1/3] virtio_net: enable tx interrupt
On newer hosts that support delayed tx interrupts, we probably don't have much to gain from orphaning packets early. Based on a patch by Jason Wang.

Note: this might degrade performance for hosts without event idx support. This should be addressed by the next patch.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/net/virtio_net.c | 137 +++++++++++++++++++++++++++++++---------------
 1 file changed, 94 insertions(+), 43 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 13d0a8b..a9bf178 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {

 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };

 /* Internal representation of a receive virtqueue */
@@ -217,15 +219,37 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }

+static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
+{
+	struct sk_buff *skb;
+	unsigned int len;
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	unsigned int packets = 0;
+
+	while (packets < budget &&
+	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		pr_debug("Sent skb %p\n", skb);
+
+		u64_stats_update_begin(&stats->tx_syncp);
+		stats->tx_bytes += skb->len;
+		stats->tx_packets++;
+		u64_stats_update_end(&stats->tx_syncp);
+
+		dev_kfree_skb_any(skb);
+		packets++;
+	}
+
+	return packets;
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct send_queue *sq = &vi->sq[vq2txq(vq)];

-	/* Suppress further interrupts. */
-	virtqueue_disable_cb(vq);
-
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi_schedule_prep(&sq->napi))
+		__napi_schedule(&sq->napi);
 }

 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -774,6 +798,37 @@ again:
 	return received;
 }

+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq =
+		container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+	unsigned int sent = 0;
+	bool enable_done;
+
+again:
+	__netif_tx_lock(txq, smp_processor_id());
+	virtqueue_disable_cb(sq->vq);
+	sent += free_old_xmit_skbs(sq, budget - sent);
+
+	if (sent < budget) {
+		enable_done = virtqueue_enable_cb_delayed(sq->vq);
+		napi_complete(napi);
+		__netif_tx_unlock(txq);
+		if (unlikely(enable_done) && napi_schedule_prep(napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(napi);
+			goto again;
+		}
+	} else {
+		__netif_tx_unlock(txq);
+	}
+
+	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+	return sent;
+}
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 /* must be called with local_bh_disable()d */
 static int virtnet_busy_poll(struct napi_struct *napi)
@@ -822,30 +877,12 @@ static int virtnet_open(struct net_device *dev)
 		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(&vi->rq[i]);
+		napi_enable(&vi->sq[i].napi);
 	}

 	return 0;
 }

-static void free_old_xmit_skbs(struct send_queue *sq)
-{
-	struct sk_buff *skb;
-	unsigned int len;
-	struct virtnet_info *vi = sq->vq->vdev->priv;
-	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
-
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		pr_debug("Sent skb %p\n", skb);
-
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
-
-		dev_kfree_skb_any(skb);
-	}
-}
-
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
 	struct skb_vnet_hdr *hdr;
@@ -911,7 +948,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 		sg_set_buf(sq->sg, hdr, hdr_len);
 		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
 	}
-	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
+
+	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb,
+				    GFP_ATOMIC);
 }

 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -919,12 +958,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct
[PATCH RFC v2 3/3] virtio-net: optimize free_old_xmit_skbs stats
From: Jason Wang jasow...@redhat.com

We already have counters for sent packets and sent bytes. Use them to reduce the number of u64_stats_update_begin/end() calls. Take care not to bother with the stats update when called speculatively.

Based on a patch by Jason Wang.

Cc: Rusty Russell ru...@rustcorp.com.au
Signed-off-by: Jason Wang jasow...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/net/virtio_net.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8dea411..4e12023 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -233,16 +233,22 @@ static unsigned int free_old_xmit_skbs(struct netdev_queue *txq,
 	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);

-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
 		bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
+		packets++;

 		dev_kfree_skb_any(skb);
-		packets++;
 	}

+	/* Avoid overhead when no packets have been processed
+	 * happens when called speculatively from start_xmit. */
+	if (!packets)
+		return 0;
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += bytes;
+	stats->tx_packets += packets;
+	u64_stats_update_end(&stats->tx_syncp);
+
 	netdev_tx_completed_queue(txq, packets, bytes);

 	return packets;
--
MST
[PATCH RFC v2 2/3] virtio_net: bql
Improve tx batching using byte queue limits. Should be especially effective for MQ.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/net/virtio_net.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a9bf178..8dea411 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -219,13 +219,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }

-static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
+static unsigned int free_old_xmit_skbs(struct netdev_queue *txq,
+				       struct send_queue *sq, int budget)
 {
 	struct sk_buff *skb;
 	unsigned int len;
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 	unsigned int packets = 0;
+	unsigned int bytes = 0;

 	while (packets < budget &&
 	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
@@ -233,6 +235,7 @@ static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
 		u64_stats_update_begin(&stats->tx_syncp);
 		stats->tx_bytes += skb->len;
+		bytes += skb->len;
 		stats->tx_packets++;
 		u64_stats_update_end(&stats->tx_syncp);
@@ -240,6 +243,8 @@ static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
 		packets++;
 	}

+	netdev_tx_completed_queue(txq, packets, bytes);
+
 	return packets;
 }

@@ -810,7 +815,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 again:
 	__netif_tx_lock(txq, smp_processor_id());
 	virtqueue_disable_cb(sq->vq);
-	sent += free_old_xmit_skbs(sq, budget - sent);
+	sent += free_old_xmit_skbs(txq, sq, budget - sent);

 	if (sent < budget) {
 		enable_done = virtqueue_enable_cb_delayed(sq->vq);
@@ -962,12 +967,13 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 	bool kick = !skb->xmit_more;
 	bool stopped;
+	unsigned int bytes = skb->len;

 	virtqueue_disable_cb(sq->vq);

 	/* We are going to push one skb.
 	 * Try to pop one off to free space for it. */
-	free_old_xmit_skbs(sq, 1);
+	free_old_xmit_skbs(txq, sq, 1);

 	/* Try to transmit */
 	err = xmit_skb(sq, skb);
@@ -983,6 +989,12 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		return NETDEV_TX_OK;
 	}

+	netdev_tx_sent_queue(txq, bytes);
+
+	/* Kick early so device can process descriptors in parallel with us. */
+	if (kick)
+		virtqueue_kick(sq->vq);
+
 	/* Apparently nice girls don't return TX_BUSY; stop the queue
 	 * before it gets out of hand. Naturally, this wastes entries. */
 	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
@@ -997,7 +1009,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
 			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq, qsize);
+			free_old_xmit_skbs(txq, sq, qsize);
 			if (stopped && sq->vq->num_free >= 2+MAX_SKB_FRAGS)
 				netif_start_subqueue(dev, qnum);
 		}
--
MST
[PATCH 1/2] virtio_balloon: return the amount of freed memory from leak_balloon()
From: Raushaniya Maksudova rmaksud...@parallels.com

This value would be useful in the next patch to provide the amount of freed memory for the OOM killer.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f893148..66cac10 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -168,8 +168,9 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 	}
 }

-static void leak_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 {
+	unsigned num_freed_pages;
 	struct page *page;
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
@@ -186,6 +187,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
 	}

+	num_freed_pages = vb->num_pfns;
 	/*
 	 * Note that if
 	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
@@ -195,6 +197,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 	tell_host(vb, vb->deflate_vq);
 	mutex_unlock(&vb->balloon_lock);
 	release_pages_by_pfn(vb->pfns, vb->num_pfns);
+	return num_freed_pages;
 }

 static inline void update_stat(struct virtio_balloon *vb, int idx,
--
1.9.1
[PATCH v3 0/2] shrink virtio balloon on OOM in guest
Excessive virtio_balloon inflation can cause invocation of the OOM-killer when Linux is under severe memory pressure. Various mechanisms are responsible for correct virtio_balloon memory management. Nevertheless, it is often the case that these control tools do not have enough time to react to a fast-changing memory load. As a result, the OS runs out of memory and invokes the OOM-killer. The balancing of memory by use of the virtio balloon should not cause the termination of processes while there are pages in the balloon. Currently there is no way for the virtio balloon driver to free memory at the last moment before some process gets killed by the OOM-killer.

This does not open a security hole, as the balloon itself runs inside the guest OS and works in cooperation with the host. Thus some improvements from the guest side should be considered normal.

To solve the problem, introduce a virtio_balloon callback which is expected to be called from the OOM notifier call chain in the out_of_memory() function. If the virtio balloon can release some memory, it will make the system return and retry the allocation that forced the out-of-memory killer to run.

Patch 1 of this series adds support for the implementation of the virtio_balloon callback: leak_balloon() now returns the number of freed pages. Patch 2 implements the virtio_balloon callback itself.

Changes from v2:
- added a feature bit to control OOM balloon behavior from the host

Changes from v1:
- minor cosmetic tweaks suggested by rusty@

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
[PATCH 2/2] virtio_balloon: free some memory from balloon on OOM
From: Raushaniya Maksudova rmaksud...@parallels.com

Excessive virtio_balloon inflation can cause invocation of the OOM-killer when Linux is under severe memory pressure. Various mechanisms are responsible for correct virtio_balloon memory management. Nevertheless, it is often the case that these control tools do not have enough time to react to a fast-changing memory load. As a result, the OS runs out of memory and invokes the OOM-killer. The balancing of memory by use of the virtio balloon should not cause the termination of processes while there are pages in the balloon. Currently there is no way for the virtio balloon driver to free some memory at the last moment before some process gets killed by the OOM-killer.

This does not open a security hole, as the balloon itself runs inside the guest OS and works in cooperation with the host. Thus some improvements from the guest side should be considered normal.

To solve the problem, introduce a virtio_balloon callback which is expected to be called from the OOM notifier call chain in the out_of_memory() function. If the virtio balloon can release some memory, it will make the system return and retry the allocation that forced the out-of-memory killer to run.

Allocate a virtio feature bit for this: it is not set by default, so the guest will not deflate the virtio balloon on OOM without explicit permission from the host.

Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Rusty Russell ru...@rustcorp.com.au
CC: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_balloon.c     | 52 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/virtio_balloon.h |  1 +
 2 files changed, 53 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 66cac10..88d73a0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -28,6 +28,7 @@
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <linux/balloon_compaction.h>
+#include <linux/oom.h>

 /*
  * Balloon device works in 4K page units. So each page is pointed to by
@@ -36,6 +37,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
+#define OOM_VBALLOON_DEFAULT_PAGES 256
+#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
+
+static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
+module_param(oom_pages, int, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(oom_pages, "pages to free on OOM");

 struct virtio_balloon
 {
@@ -71,6 +78,9 @@ struct virtio_balloon
 	/* Memory statistics */
 	int need_stats_update;
 	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
+
+	/* To register callback in oom notifier call chain */
+	struct notifier_block nb;
 };

 static struct virtio_device_id id_table[] = {
@@ -290,6 +300,38 @@ static void update_balloon_size(struct virtio_balloon *vb)
 		      &actual);
 }

+/*
+ * virtballoon_oom_notify - release pages when system is under severe
+ *			    memory pressure (called from out_of_memory())
+ * @self : notifier block struct
+ * @dummy: not used
+ * @parm : returned - number of freed pages
+ *
+ * The balancing of memory by use of the virtio balloon should not cause
+ * the termination of processes while there are pages in the balloon.
+ * If virtio balloon manages to release some memory, it will make the
+ * system return and retry the allocation that forced the OOM killer
+ * to run.
+ */
+static int virtballoon_oom_notify(struct notifier_block *self,
+				  unsigned long dummy, void *parm)
+{
+	struct virtio_balloon *vb;
+	unsigned long *freed;
+	unsigned num_freed_pages;
+
+	vb = container_of(self, struct virtio_balloon, nb);
+	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+		return NOTIFY_OK;
+
+	freed = parm;
+	num_freed_pages = leak_balloon(vb, oom_pages);
+	update_balloon_size(vb);
+	*freed += num_freed_pages;
+
+	return NOTIFY_OK;
+}
+
 static int balloon(void *_vballoon)
 {
 	struct virtio_balloon *vb = _vballoon;
@@ -446,6 +488,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	if (err)
 		goto out_free_vb;

+	vb->nb.notifier_call = virtballoon_oom_notify;
+	vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
+	err = register_oom_notifier(&vb->nb);
+	if (err < 0)
+		goto out_oom_notify;
+
 	vb->thread = kthread_run(balloon, vb, "vballoon");
 	if (IS_ERR(vb->thread)) {
 		err = PTR_ERR(vb->thread);
@@ -455,6 +503,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	return 0;

 out_del_vqs:
+	unregister_oom_notifier(&vb->nb);
+out_oom_notify:
 	vdev->config->del_vqs(vdev);
 out_free_vb:
 	kfree(vb);
@@ -479,6 +529,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
 {
Re: [PATCH] virtio_net: fix use after free
From: Michael S. Tsirkin m...@redhat.com
Date: Wed, 15 Oct 2014 16:23:28 +0300

You used __netif_subqueue_stopped but that seems to use a slightly more expensive test_bit internally.

More expensive in what sense? It should be roughly the same as x & y sans the volatile. Anyways, I'm ambivalent and I want to see this bug fixed, so I'll apply your patch.

Thanks!