Re: [PATCH net-next RFC 2/5] vhost: introduce helper to prefetch desc index
On Fri, Sep 22, 2017 at 04:02:32PM +0800, Jason Wang wrote:
> This patch introduces vhost_prefetch_desc_indices() which could batch
> descriptor indices fetching and used ring updating. This intends to
> reduce the cache misses of indices fetching and updating and reduce
> cache line bounce when virtqueue is almost full. copy_to_user() was
> used in order to benefit from modern cpus that support fast string
> copy. Batched virtqueue processing will be the first user.
>
> Signed-off-by: Jason Wang
> ---
>  drivers/vhost/vhost.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/vhost/vhost.h |  3 +++
>  2 files changed, 58 insertions(+)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index f87ec75..8424166d 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2437,6 +2437,61 @@ struct vhost_msg_node *vhost_dequeue_msg(struct vhost_dev *dev,
>  }
>  EXPORT_SYMBOL_GPL(vhost_dequeue_msg);
>
> +int vhost_prefetch_desc_indices(struct vhost_virtqueue *vq,
> +				struct vring_used_elem *heads,
> +				u16 num, bool used_update)

why do you need to combine used update with prefetch?

> +{
> +	int ret, ret2;
> +	u16 last_avail_idx, last_used_idx, total, copied;
> +	__virtio16 avail_idx;
> +	struct vring_used_elem __user *used;
> +	int i;
> +
> +	if (unlikely(vhost_get_avail(vq, avail_idx, &vq->avail->idx))) {
> +		vq_err(vq, "Failed to access avail idx at %p\n",
> +		       &vq->avail->idx);
> +		return -EFAULT;
> +	}
> +	last_avail_idx = vq->last_avail_idx & (vq->num - 1);
> +	vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
> +	total = vq->avail_idx - vq->last_avail_idx;
> +	ret = total = min(total, num);
> +
> +	for (i = 0; i < ret; i++) {
> +		ret2 = vhost_get_avail(vq, heads[i].id,
> +				       &vq->avail->ring[last_avail_idx]);
> +		if (unlikely(ret2)) {
> +			vq_err(vq, "Failed to get descriptors\n");
> +			return -EFAULT;
> +		}
> +		last_avail_idx = (last_avail_idx + 1) & (vq->num - 1);
> +	}
> +
> +	if (!used_update)
> +		return ret;
> +
> +	last_used_idx = vq->last_used_idx & (vq->num - 1);
> +	while (total) {
> +		copied = min((u16)(vq->num - last_used_idx), total);
> +		ret2 = vhost_copy_to_user(vq,
> +					  &vq->used->ring[last_used_idx],
> +					  &heads[ret - total],
> +					  copied * sizeof(*used));
> +
> +		if (unlikely(ret2)) {
> +			vq_err(vq, "Failed to update used ring!\n");
> +			return -EFAULT;
> +		}
> +
> +		last_used_idx = 0;
> +		total -= copied;
> +	}
> +
> +	/* Only get avail ring entries after they have been exposed by guest. */
> +	smp_rmb();

Barrier before return is a very confusing API. I guess it's designed to
be used in a specific way to make it necessary - but what is it?

> +	return ret;
> +}
> +EXPORT_SYMBOL(vhost_prefetch_desc_indices);
>
>  static int __init vhost_init(void)
>  {
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 39ff897..16c2cb6 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -228,6 +228,9 @@ ssize_t vhost_chr_read_iter(struct vhost_dev *dev, struct iov_iter *to,
>  ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
>  			     struct iov_iter *from);
>  int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled);
> +int vhost_prefetch_desc_indices(struct vhost_virtqueue *vq,
> +				struct vring_used_elem *heads,
> +				u16 num, bool used_update);
>
>  #define vq_err(vq, fmt, ...) do { \
>  	pr_debug(pr_fmt(fmt), ##__VA_ARGS__); \
> --
> 2.7.4
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next RFC 0/5] batched tx processing in vhost_net
On Fri, Sep 22, 2017 at 04:02:30PM +0800, Jason Wang wrote:
> Hi:
>
> This series tries to implement basic tx batched processing. This is
> done by prefetching descriptor indices and update used ring in a
> batch. This intends to speed up used ring updating and improve the
> cache utilization. Test shows about ~22% improvement in tx pps.
> Please review.
>
> Jason Wang (5):
>   vhost: split out ring head fetching logic
>   vhost: introduce helper to prefetch desc index
>   vhost: introduce vhost_add_used_idx()

Please squash these new APIs into where they are used. This split-up
just makes review harder for me as I can't figure out how the new APIs
are used.

>   vhost_net: rename VHOST_RX_BATCH to VHOST_NET_BATCH

This is ok as a separate patch.

>   vhost_net: basic tx virtqueue batched processing
>
>  drivers/vhost/net.c   | 221 ++++++++++++++++++++++++-----------------
>  drivers/vhost/vhost.c | 165 ++++++++++++++++++++++++-----
>  drivers/vhost/vhost.h |   9 ++
>  3 files changed, 270 insertions(+), 125 deletions(-)
>
> --
> 2.7.4
Re: [PATCH net-next RFC 3/5] vhost: introduce vhost_add_used_idx()
On Fri, Sep 22, 2017 at 04:02:33PM +0800, Jason Wang wrote:
> This patch introduces a helper which just increase the used idx. This
> will be used in pair with vhost_prefetch_desc_indices() by batching
> code.
>
> Signed-off-by: Jason Wang
> ---
>  drivers/vhost/vhost.c | 33 +++++++++++++++++++++++++++++++++
>  drivers/vhost/vhost.h |  1 +
>  2 files changed, 34 insertions(+)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 8424166d..6532cda 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2178,6 +2178,39 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
>  }
>  EXPORT_SYMBOL_GPL(vhost_add_used);
>
> +int vhost_add_used_idx(struct vhost_virtqueue *vq, int n)
> +{
> +	u16 old, new;
> +
> +	old = vq->last_used_idx;
> +	new = (vq->last_used_idx += n);
> +	/* If the driver never bothers to signal in a very long while,
> +	 * used index might wrap around. If that happens, invalidate
> +	 * signalled_used index we stored. TODO: make sure driver
> +	 * signals at least once in 2^16 and remove this.
> +	 */
> +	if (unlikely((u16)(new - vq->signalled_used) < (u16)(new - old)))
> +		vq->signalled_used_valid = false;
> +
> +	/* Make sure buffer is written before we update index. */
> +	smp_wmb();
> +	if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx),
> +			   &vq->used->idx)) {
> +		vq_err(vq, "Failed to increment used idx");
> +		return -EFAULT;
> +	}
> +	if (unlikely(vq->log_used)) {
> +		/* Log used index update. */
> +		log_write(vq->log_base,
> +			  vq->log_addr + offsetof(struct vring_used, idx),
> +			  sizeof(vq->used->idx));
> +		if (vq->log_ctx)
> +			eventfd_signal(vq->log_ctx, 1);
> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(vhost_add_used_idx);
> +
>  static int __vhost_add_used_n(struct vhost_virtqueue *vq,
>  			      struct vring_used_elem *heads,
>  			      unsigned count)
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 16c2cb6..5dd6c05 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -199,6 +199,7 @@ int __vhost_get_vq_desc(struct vhost_virtqueue *vq,
>  void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
>
>  int vhost_vq_init_access(struct vhost_virtqueue *);
> +int vhost_add_used_idx(struct vhost_virtqueue *vq, int n);
> int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
> int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
>		     unsigned count);

Please change the API to hide the fact that there's an index that needs
to be updated.

> --
> 2.7.4
Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched processing
On Fri, Sep 22, 2017 at 04:02:35PM +0800, Jason Wang wrote:
> This patch implements basic batched processing of tx virtqueue by
> prefetching desc indices and updating used ring in a batch. For
> non-zerocopy case, vq->heads were used for storing the prefetched
> indices and updating used ring. It is also a requirement for doing
> more batching on top. For zerocopy case and for simplicity, batched
> processing were simply disabled by only fetching and processing one
> descriptor at a time, this could be optimized in the future.
>
> XDP_DROP (without touching skb) on tun (with Moongen in guest) with
> zerocopy disabled:
>
> Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz:
> Before: 3.20Mpps
> After: 3.90Mpps (+22%)
>
> No differences were seen with zerocopy enabled.
>
> Signed-off-by: Jason Wang

So where is the speedup coming from? I'd guess the ring is hot in
cache, it's faster to access it in one go, then pass many packets to
net stack. Is that right?

Another possibility is better code cache locality.

So how about this patchset is refactored:
1. use existing APIs, just: first get packets, then transmit them all,
   then use them all
2. add new APIs and move the loop into vhost core for more speedups

> ---
>  drivers/vhost/net.c   | 215 +++++++++++++++++++++++-------------------
>  drivers/vhost/vhost.c |   2 +-
>  2 files changed, 121 insertions(+), 96 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index c89640e..c439892 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -408,27 +408,25 @@ static int vhost_net_enable_vq(struct vhost_net *n,
>  	return vhost_poll_start(poll, sock->file);
>  }
>
> -static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
> -				    struct vhost_virtqueue *vq,
> -				    struct iovec iov[], unsigned int iov_size,
> -				    unsigned int *out_num, unsigned int *in_num)
> +static bool vhost_net_tx_avail(struct vhost_net *net,
> +			       struct vhost_virtqueue *vq)
>  {
>  	unsigned long uninitialized_var(endtime);
> -	int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> -				  out_num, in_num, NULL, NULL);
>
> -	if (r == vq->num && vq->busyloop_timeout) {
> -		preempt_disable();
> -		endtime = busy_clock() + vq->busyloop_timeout;
> -		while (vhost_can_busy_poll(vq->dev, endtime) &&
> -		       vhost_vq_avail_empty(vq->dev, vq))
> -			cpu_relax();
> -		preempt_enable();
> -		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> -				      out_num, in_num, NULL, NULL);
> -	}
> +	if (!vq->busyloop_timeout)
> +		return false;
>
> -	return r;
> +	if (!vhost_vq_avail_empty(vq->dev, vq))
> +		return true;
> +
> +	preempt_disable();
> +	endtime = busy_clock() + vq->busyloop_timeout;
> +	while (vhost_can_busy_poll(vq->dev, endtime) &&
> +	       vhost_vq_avail_empty(vq->dev, vq))
> +		cpu_relax();
> +	preempt_enable();
> +
> +	return !vhost_vq_avail_empty(vq->dev, vq);
>  }
>
>  static bool vhost_exceeds_maxpend(struct vhost_net *net)
> @@ -446,8 +444,9 @@ static void handle_tx(struct vhost_net *net)
>  {
>  	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
>  	struct vhost_virtqueue *vq = &nvq->vq;
> +	struct vring_used_elem used, *heads = vq->heads;
>  	unsigned out, in;
> -	int head;
> +	int avails, head;
>  	struct msghdr msg = {
>  		.msg_name = NULL,
>  		.msg_namelen = 0,
> @@ -461,6 +460,7 @@ static void handle_tx(struct vhost_net *net)
>  	struct socket *sock;
>  	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
>  	bool zcopy, zcopy_used;
> +	int i, batched = VHOST_NET_BATCH;
>
>  	mutex_lock(&vq->mutex);
>  	sock = vq->private_data;
> @@ -475,6 +475,12 @@ static void handle_tx(struct vhost_net *net)
>  	hdr_size = nvq->vhost_hlen;
>  	zcopy = nvq->ubufs;
>
> +	/* Disable zerocopy batched fetching for simplicity */
> +	if (zcopy) {
> +		heads = &used;
> +		batched = 1;
> +	}
> +
>  	for (;;) {
>  		/* Release DMAs done buffers first */
>  		if (zcopy)
> @@ -486,95 +492,114 @@ static void handle_tx(struct vhost_net *net)
>  		if (unlikely(vhost_exceeds_maxpend(net)))
>  			break;
>
> -		head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
> -						ARRAY_SIZE(vq->iov),
> -						&out, &in);
> +		avails = vhost_prefetch_desc_indices(vq, heads, batched, !zcopy);
>  		/* On error, stop handling until the next kick. */
> -		if (unlikely(head < 0))
> +		if (unlikely(avails
Re: [PATCH net-next RFC 3/5] vhost: introduce vhost_add_used_idx()
On 2017年09月27日 03:13, Michael S. Tsirkin wrote: On Fri, Sep 22, 2017 at 04:02:33PM +0800, Jason Wang wrote: This patch introduces a helper which just increase the used idx. This will be used in pair with vhost_prefetch_desc_indices() by batching code. Signed-off-by: Jason Wang--- drivers/vhost/vhost.c | 33 + drivers/vhost/vhost.h | 1 + 2 files changed, 34 insertions(+) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 8424166d..6532cda 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2178,6 +2178,39 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len) } EXPORT_SYMBOL_GPL(vhost_add_used); +int vhost_add_used_idx(struct vhost_virtqueue *vq, int n) +{ + u16 old, new; + + old = vq->last_used_idx; + new = (vq->last_used_idx += n); + /* If the driver never bothers to signal in a very long while, +* used index might wrap around. If that happens, invalidate +* signalled_used index we stored. TODO: make sure driver +* signals at least once in 2^16 and remove this. +*/ + if (unlikely((u16)(new - vq->signalled_used) < (u16)(new - old))) + vq->signalled_used_valid = false; + + /* Make sure buffer is written before we update index. */ + smp_wmb(); + if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx), + >used->idx)) { + vq_err(vq, "Failed to increment used idx"); + return -EFAULT; + } + if (unlikely(vq->log_used)) { + /* Log used index update. 
*/ + log_write(vq->log_base, + vq->log_addr + offsetof(struct vring_used, idx), + sizeof(vq->used->idx)); + if (vq->log_ctx) + eventfd_signal(vq->log_ctx, 1); + } + return 0; +} +EXPORT_SYMBOL_GPL(vhost_add_used_idx); + static int __vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads, unsigned count) diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 16c2cb6..5dd6c05 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -199,6 +199,7 @@ int __vhost_get_vq_desc(struct vhost_virtqueue *vq, void vhost_discard_vq_desc(struct vhost_virtqueue *, int n); int vhost_vq_init_access(struct vhost_virtqueue *); +int vhost_add_used_idx(struct vhost_virtqueue *vq, int n); int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len); int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads, unsigned count);

Please change the API to hide the fact that there's an index that needs to be updated.

In fact, an interesting optimization on top is to just call vhost_add_used_idx(vq, n) instead of n calls to vhost_add_used_idx(vq, 1). That's the reason I leave n in the API.

Thanks

-- 2.7.4
Re: [PATCH net-next RFC 0/5] batched tx processing in vhost_net
On 2017年09月27日 03:26, Michael S. Tsirkin wrote: On Fri, Sep 22, 2017 at 04:02:30PM +0800, Jason Wang wrote: Hi: This series tries to implement basic tx batched processing. This is done by prefetching descriptor indices and update used ring in a batch. This intends to speed up used ring updating and improve the cache utilization. Test shows about ~22% improvement in tx pps. Please review. Jason Wang (5): vhost: split out ring head fetching logic vhost: introduce helper to prefetch desc index vhost: introduce vhost_add_used_idx()

Please squash these new APIs into where they are used. This split-up just makes review harder for me as I can't figure out how the new APIs are used.

Ok. Thanks

vhost_net: rename VHOST_RX_BATCH to VHOST_NET_BATCH

This is ok as a separate patch.

vhost_net: basic tx virtqueue batched processing

drivers/vhost/net.c | 221 ++++++++++++++++-----------
drivers/vhost/vhost.c | 165 +++++++++++++------
drivers/vhost/vhost.h | 9 ++
3 files changed, 270 insertions(+), 125 deletions(-)

-- 2.7.4
Re: [PATCH v1 4/4] KVM/vmx: enable lbr for the guest
On 09/27/2017 12:41 AM, Andi Kleen wrote:

1) vCPU context switching and guest side task switching are not identical. That is, when the vCPU is scheduled out, the guest task on the vCPU may not

guest task lifetime has nothing to do with this. It's completely independent of what you do here on the VCPU level.

run out its time slice yet, so the task will continue to run when the vCPU is scheduled in by the host (lbr wasn't saved by the guest task when the vCPU is scheduled out in this case). It is possible to have the vCPU which runs the guest task (in use of lbr) scheduled out, followed by a new host task being scheduled in on the pCPU to run. It is not guaranteed that the new host task does not use the LBR feature on the pCPU.

Sure it may use the LBR, and the normal perf context switch will switch it and everything works fine. It's like any other per-task LBR user.

OK, I see the point, thanks. Why couldn't we save the LBR_SELECT via task switching too?

Best, Wei
Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched processing
On 2017年09月27日 03:25, Michael S. Tsirkin wrote: On Fri, Sep 22, 2017 at 04:02:35PM +0800, Jason Wang wrote: This patch implements basic batched processing of tx virtqueue by prefetching desc indices and updating used ring in a batch. For non-zerocopy case, vq->heads were used for storing the prefetched indices and updating used ring. It is also a requirement for doing more batching on top. For zerocopy case and for simplicity, batched processing were simply disabled by only fetching and processing one descriptor at a time, this could be optimized in the future. XDP_DROP (without touching skb) on tun (with Moongen in guest) with zerocopy disabled: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz: Before: 3.20Mpps After: 3.90Mpps (+22%) No differences were seen with zerocopy enabled. Signed-off-by: Jason Wang

So where is the speedup coming from? I'd guess the ring is hot in cache, it's faster to access it in one go, then pass many packets to net stack. Is that right? Another possibility is better code cache locality.

Yes, I think the speed up comes from:
- less cache misses
- less cache line bounce when virtqueue is about to be full (guest is faster than host which is the case of MoonGen)
- less memory barriers
- possible faster copy speed by using copy_to_user() on modern CPUs

So how about this patchset is refactored: 1. use existing APIs, just: first get packets, then transmit them all, then use them all

Looks like the current API cannot get packets first; it only supports getting packets one by one (if you mean vhost_get_vq_desc()). And used ring updating may get more misses in this case.

2. add new APIs and move the loop into vhost core for more speedups

I don't see any advantages; looks like we would just need some e.g. callbacks in this case.

Thanks
Re: [PATCH net-next RFC 2/5] vhost: introduce helper to prefetch desc index
On 2017年09月27日 03:19, Michael S. Tsirkin wrote: On Fri, Sep 22, 2017 at 04:02:32PM +0800, Jason Wang wrote: This patch introduces vhost_prefetch_desc_indices() which could batch descriptor indices fetching and used ring updating. This intends to reduce the cache misses of indices fetching and updating and reduce cache line bounce when virtqueue is almost full. copy_to_user() was used in order to benefit from modern cpus that support fast string copy. Batched virtqueue processing will be the first user. Signed-off-by: Jason Wang--- drivers/vhost/vhost.c | 55 +++ drivers/vhost/vhost.h | 3 +++ 2 files changed, 58 insertions(+) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index f87ec75..8424166d 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2437,6 +2437,61 @@ struct vhost_msg_node *vhost_dequeue_msg(struct vhost_dev *dev, } EXPORT_SYMBOL_GPL(vhost_dequeue_msg); +int vhost_prefetch_desc_indices(struct vhost_virtqueue *vq, + struct vring_used_elem *heads, + u16 num, bool used_update) why do you need to combine used update with prefetch? For better performance and I believe we don't care about the overhead when we meet errors in tx. 
+{ + int ret, ret2; + u16 last_avail_idx, last_used_idx, total, copied; + __virtio16 avail_idx; + struct vring_used_elem __user *used; + int i; + + if (unlikely(vhost_get_avail(vq, avail_idx, &vq->avail->idx))) { + vq_err(vq, "Failed to access avail idx at %p\n", + &vq->avail->idx); + return -EFAULT; + } + last_avail_idx = vq->last_avail_idx & (vq->num - 1); + vq->avail_idx = vhost16_to_cpu(vq, avail_idx); + total = vq->avail_idx - vq->last_avail_idx; + ret = total = min(total, num); + + for (i = 0; i < ret; i++) { + ret2 = vhost_get_avail(vq, heads[i].id, + &vq->avail->ring[last_avail_idx]); + if (unlikely(ret2)) { + vq_err(vq, "Failed to get descriptors\n"); + return -EFAULT; + } + last_avail_idx = (last_avail_idx + 1) & (vq->num - 1); + } + + if (!used_update) + return ret; + + last_used_idx = vq->last_used_idx & (vq->num - 1); + while (total) { + copied = min((u16)(vq->num - last_used_idx), total); + ret2 = vhost_copy_to_user(vq, + &vq->used->ring[last_used_idx], + &heads[ret - total], + copied * sizeof(*used)); + + if (unlikely(ret2)) { + vq_err(vq, "Failed to update used ring!\n"); + return -EFAULT; + } + + last_used_idx = 0; + total -= copied; + } + + /* Only get avail ring entries after they have been exposed by guest. */ + smp_rmb();

Barrier before return is a very confusing API. I guess it's designed to be used in a specific way to make it necessary - but what is it?

Looks like a mistake; we need to do this after reading avail_idx.
Thanks + return ret; +} +EXPORT_SYMBOL(vhost_prefetch_desc_indices); static int __init vhost_init(void) { diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 39ff897..16c2cb6 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -228,6 +228,9 @@ ssize_t vhost_chr_read_iter(struct vhost_dev *dev, struct iov_iter *to, ssize_t vhost_chr_write_iter(struct vhost_dev *dev, struct iov_iter *from); int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled); +int vhost_prefetch_desc_indices(struct vhost_virtqueue *vq, + struct vring_used_elem *heads, + u16 num, bool used_update); #define vq_err(vq, fmt, ...) do { \ pr_debug(pr_fmt(fmt), ##__VA_ARGS__); \ -- 2.7.4
Re: [PATCH v1 0/4] Enable LBR for the guest
> On the other side, it seems that the (guest) kernel driver also works
> without the above being supported, should we change it to report error
> and stop using the PMU features when the check of the above two fails
> (at intel_pmu_init())?

You could add the extra check for the LBR code yes, although it already
checks if the LBR MSRs work, so in practice it's likely already covered.

-Andi
Re: [PATCH v1 4/4] KVM/vmx: enable lbr for the guest
> 1) vCPU context switching and guest side task switching are not identical.
> That is, when the vCPU is scheduled out, the guest task on the vCPU may not

guest task lifetime has nothing to do with this. It's completely
independent of what you do here on the VCPU level.

> run out its time slice yet, so the task will continue to run when the vCPU
> is scheduled in by the host (lbr wasn't saved by the guest task when the
> vCPU is scheduled out in this case).
>
> It is possible to have the vCPU which runs the guest task (in use of lbr)
> scheduled out, followed by a new host task being scheduled in on the pCPU
> to run. It is not guaranteed that the new host task does not use the LBR
> feature on the pCPU.

Sure it may use the LBR, and the normal perf context switch will switch
it and everything works fine. It's like any other per-task LBR user.

> 2) Sometimes, people may want this usage: "perf record -b
> ./qemu-system-x86_64 ...", which will need lbr to be used in KVM as well.

In this obscure case you can disable LBR support for the guest. The
common case is far more important.

It sounds like you didn't do any performance measurements. I expect the
performance of your current solution to be terrible. e.g. a normal perf
PMI does at least 1 MSR reads and 4+ MSR writes for a single counter.
With multiple counters it gets worse. For each of those you'll need to
exit. Adding something to the entry/exit list is similar to the cost of
doing explicit RD/WRMSRs.

On Skylake we have 32*3=96 MSRs for the LBRs. So with the 5 exits and
entries, you're essentially doing 5*2*96=18432 extra MSR accesses for
each PMI. MSR access is 100+ cycles at least, for writes it is far more
expensive.

-Andi
Re: [PATCH net-next RFC 0/5] batched tx processing in vhost_net
On Fri, Sep 22, 2017 at 04:02:30PM +0800, Jason Wang wrote:
> Hi:
>
> This series tries to implement basic tx batched processing. This is
> done by prefetching descriptor indices and update used ring in a
> batch. This intends to speed up used ring updating and improve the
> cache utilization.

Interesting, thanks for the patches. So IIUC most of the gain is really
overcoming some of the shortcomings of virtio 1.0 wrt cache utilization?

Which is fair enough (1.0 is already deployed) but I would like to avoid
making 1.1 support harder, and this patchset does this unfortunately,
see comments on individual patches. I'm sure it can be addressed though.

> Test shows about ~22% improvement in tx pps.

Is this with or without tx napi in guest?

> Please review.
>
> Jason Wang (5):
>   vhost: split out ring head fetching logic
>   vhost: introduce helper to prefetch desc index
>   vhost: introduce vhost_add_used_idx()
>   vhost_net: rename VHOST_RX_BATCH to VHOST_NET_BATCH
>   vhost_net: basic tx virtqueue batched processing
>
>  drivers/vhost/net.c   | 221 ++++++++++++++++++++++++-----------------
>  drivers/vhost/vhost.c | 165 ++++++++++++++++++++-----
>  drivers/vhost/vhost.h |   9 ++
>  3 files changed, 270 insertions(+), 125 deletions(-)
>
> --
> 2.7.4
Re: [PATCH v1 0/4] Enable LBR for the guest
On 09/25/2017 10:59 PM, Andi Kleen wrote: On Mon, Sep 25, 2017 at 12:44:52PM +0800, Wei Wang wrote: This patch series enables the Last Branch Recording feature for the guest. Instead of trapping each LBR stack MSR access, the MSRs are passthroughed to the guest. Those MSRs are switched (i.e. load and saved) on VMExit and VMEntry. Test: Try "perf record -b ./test_program" on guest. I don't see where you expose the PERF capabilities MSR? That's normally needed for LBR too to report the version number. It was missed, thanks for pointing it out. I also found KVM/QEMU doesn't expose CPUID.PDCM, will add that too. Since for now we are enabling LBR, I plan to expose only "PERF_CAP & 0x3f" to the guest, which reports the LBR format only. On the other side, it seems that the (guest) kernel driver also works without the above being supported, should we change it to report error and stop using the PMU features when the check of the above two fails (at intel_pmu_init())? Best, Wei
Re: [PATCH v1 4/4] KVM/vmx: enable lbr for the guest
On 09/25/2017 10:57 PM, Andi Kleen wrote:

+static void auto_switch_lbr_msrs(struct vcpu_vmx *vmx) +{ + int i; + struct perf_lbr_stack lbr_stack; + + perf_get_lbr_stack(&lbr_stack); + + add_atomic_switch_msr(vmx, MSR_LBR_SELECT, 0, 0); + add_atomic_switch_msr(vmx, lbr_stack.lbr_tos, 0, 0); + + for (i = 0; i < lbr_stack.lbr_nr; i++) { + add_atomic_switch_msr(vmx, lbr_stack.lbr_from + i, 0, 0); + add_atomic_switch_msr(vmx, lbr_stack.lbr_to + i, 0, 0); + if (lbr_stack.lbr_info) + add_atomic_switch_msr(vmx, lbr_stack.lbr_info + i, 0, + 0); + }

That will be really expensive and add a lot of overhead to every entry/exit. perf can already context switch the LBRs on task context switch. With that you can just switch LBR_SELECT, which is *much* cheaper because there are far less context switches than exit/entries. It implies that when KVM is running it needs to prevent perf from enabling LBRs in the context of KVM, but that should be straight forward.

I kind of have a different thought here:

1) vCPU context switching and guest side task switching are not identical. That is, when the vCPU is scheduled out, the guest task on the vCPU may not run out its time slice yet, so the task will continue to run when the vCPU is scheduled in by the host (lbr wasn't saved by the guest task when the vCPU is scheduled out in this case). It is possible to have the vCPU which runs the guest task (in use of lbr) scheduled out, followed by a new host task being scheduled in on the pCPU to run. It is not guaranteed that the new host task does not use the LBR feature on the pCPU.

2) Sometimes, people may want this usage: "perf record -b ./qemu-system-x86_64 ...", which will need lbr to be used in KVM as well.

I think one possible optimization we could do would be to add the LBR MSRs to auto switching when the guest requests to enable the feature, and remove them when being disabled. This will need to trap guest access to MSR_DEBUGCTL.
Best, Wei
Re: [PATCH v5 REPOST 1/6] hw_random: place mutex around read functions and buffers.
> > A bit late to a party, but:
> >
> > On Mon, Dec 8, 2014 at 12:50 AM, Amos Kong wrote:
> > From: Rusty Russell
> >
> > There's currently a big lock around everything, and it means that we
> > can't query sysfs (eg /sys/devices/virtual/misc/hw_random/rng_current)
> > while the rng is reading. This is a real problem when the rng is slow,
> > or blocked (eg. virtio_rng with qemu's default /dev/random backend)
> >
> > This doesn't help (it leaves the current lock untouched), just adds a
> > lock to protect the read function and the static buffers, in preparation
> > for transition.
> >
> > Signed-off-by: Rusty Russell
> > ---
> ...
> >
> > @@ -160,13 +166,14 @@ static ssize_t rng_dev_read(struct file *filp, char
> > __user *buf,
> > 		goto out_unlock;
> > 	}
> >
> > +	mutex_lock(&reading_mutex);
>
> I think this breaks O_NONBLOCK: we have hwrng core thread that
> constantly pumps underlying rng for data; the thread takes the mutex
> and calls rng_get_data() that blocks until RNG responds. This means
> that even if user specified O_NONBLOCK here we'll be waiting until
> [hwrng] thread releases reading_mutex before we can continue.

I think for 'virtio_rng' with O_NONBLOCK, 'rng_get_data' returns without waiting for data, which lets the mutex be used by any other waiting threads?
rng_dev_read -> rng_get_data -> virtio_read

static int virtio_read(struct hwrng *rng, void *buf, size_t size, bool wait)
{
	int ret;
	struct virtrng_info *vi = (struct virtrng_info *)rng->priv;

	if (vi->hwrng_removed)
		return -ENODEV;

	if (!vi->busy) {
		vi->busy = true;
		init_completion(&vi->have_data);
		register_buffer(vi, buf, size);
	}

	if (!wait)
		return 0;

	ret = wait_for_completion_killable(&vi->have_data);
	if (ret < 0)
		return ret;

	vi->busy = false;

	return vi->data_avail;
}

> > 	if (!data_avail) {
> > 		bytes_read = rng_get_data(current_rng, rng_buffer,
> > 			rng_buffer_size(),
> > 			!(filp->f_flags & O_NONBLOCK));
> > 		if (bytes_read < 0) {
> > 			err = bytes_read;
> > -			goto out_unlock;
> > +			goto out_unlock_reading;
> > 		}
> > 		data_avail = bytes_read;
> > 	}
>
> Thanks.
>
> --
> Dmitry
Re: [PATCH] KVM: s390: Disable CONFIG_S390_GUEST_OLD_TRANSPORT by default
On 09/25/2017 04:45 PM, Thomas Huth wrote:
> There is no recent user space application available anymore which still
> supports this old virtio transport, so let's disable this by default.
>
> Signed-off-by: Thomas Huth

thanks, applied.

> ---
>  arch/s390/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index 48af970..923bf04 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -930,7 +930,7 @@ config S390_GUEST
>  	  the KVM hypervisor.
>
>  config S390_GUEST_OLD_TRANSPORT
> -	def_bool y
> +	def_bool n
>  	prompt "Guest support for old s390 virtio transport (DEPRECATED)"
>  	depends on S390_GUEST
>  	help
Re: [PATCH] KVM: s390: Disable CONFIG_S390_GUEST_OLD_TRANSPORT by default
On Tue, Sep 26, 2017 at 12:41:41PM +0200, Christian Borntraeger wrote:
> On 09/26/2017 12:40 PM, Heiko Carstens wrote:
> > On Mon, Sep 25, 2017 at 08:37:36PM +0200, Christian Borntraeger wrote:
> > > On 09/25/2017 07:54 PM, Halil Pasic wrote:
> > > > On 09/25/2017 04:45 PM, Thomas Huth wrote:
> > > > > There is no recent user space application available anymore which still
> > > > > supports this old virtio transport, so let's disable this by default.
> > > > >
> > > > > Signed-off-by: Thomas Huth
> > > >
> > > > I don't have any objections, but there may be something I'm not aware of.
> > > > Let's see what Connie says. From my side it's ack.
> > > >
> > > > Via whom is this supposed to go in? Looking at the MAINTAINERS, I would
> > > > say Martin or Heiko but I don't see them among the recipients.
> > >
> > > FWIW as the original author of that transport
> > > Acked-by: Christian Borntraeger
> > >
> > > I can pick this up for Martins/Heikos tree if you want.
> >
> > When will this code be removed?
> >
> > When the config option was initially added Conny said it should survive
> > "probably two kernel releases or so, depending on feedback".
> > It was merged for v4.8. Now we are five kernel releases later...
>
> Let's wait for one release and then get rid of it?

So it's going to be removed with the next merge window.
Where is the patch? ;)
Re: [PATCH] KVM: s390: Disable CONFIG_S390_GUEST_OLD_TRANSPORT by default
On Tue, Sep 26, 2017 at 12:57:26PM +0200, Thomas Huth wrote:
> On 26.09.2017 12:47, Heiko Carstens wrote:
> > So it's going to be removed with the next merge window.
> > Where is the patch? ;)
>
> Hmm, so far the code was always enabled by default, so in the unlikely
> case that somebody is still using it, they might not have noticed that
> this is marked as deprecated. So maybe it's better to keep the code for
> two more releases, but disable it by default, so that people have time
> to realize that it is going away? ... just my 0.02 € ... but if you
> prefer, I can also write a patch that removes it immediately instead.

Switching the default from yes to no won't prevent anybody from
selecting it again. So I don't see any value in waiting even longer.

Besides, the original commit already printed a message to the console
that the transport is deprecated. See commit 3b2fbb3f06ef
("virtio/s390: deprecate old transport").
Re: [PATCH] KVM: s390: Disable CONFIG_S390_GUEST_OLD_TRANSPORT by default
On Mon, Sep 25, 2017 at 08:37:36PM +0200, Christian Borntraeger wrote:
> On 09/25/2017 07:54 PM, Halil Pasic wrote:
> > On 09/25/2017 04:45 PM, Thomas Huth wrote:
> > > There is no recent user space application available anymore which still
> > > supports this old virtio transport, so let's disable this by default.
> > >
> > > Signed-off-by: Thomas Huth
> >
> > I don't have any objections, but there may be something I'm not aware of.
> > Let's see what Connie says. From my side it's ack.
> >
> > Via whom is this supposed to go in? Looking at the MAINTAINERS, I would
> > say Martin or Heiko but I don't see them among the recipients.
>
> FWIW as the original author of that transport
> Acked-by: Christian Borntraeger
>
> I can pick this up for Martins/Heikos tree if you want.

When will this code be removed?

When the config option was initially added Conny said it should survive
"probably two kernel releases or so, depending on feedback".
It was merged for v4.8. Now we are five kernel releases later...
Re: [PATCH] KVM: s390: Disable CONFIG_S390_GUEST_OLD_TRANSPORT by default
On 09/26/2017 12:40 PM, Heiko Carstens wrote:
> On Mon, Sep 25, 2017 at 08:37:36PM +0200, Christian Borntraeger wrote:
> > On 09/25/2017 07:54 PM, Halil Pasic wrote:
> > > On 09/25/2017 04:45 PM, Thomas Huth wrote:
> > > > There is no recent user space application available anymore which still
> > > > supports this old virtio transport, so let's disable this by default.
> > > >
> > > > Signed-off-by: Thomas Huth
> > >
> > > I don't have any objections, but there may be something I'm not aware of.
> > > Let's see what Connie says. From my side it's ack.
> > >
> > > Via whom is this supposed to go in? Looking at the MAINTAINERS, I would
> > > say Martin or Heiko but I don't see them among the recipients.
> >
> > FWIW as the original author of that transport
> > Acked-by: Christian Borntraeger
> >
> > I can pick this up for Martins/Heikos tree if you want.
>
> When will this code be removed?
>
> When the config option was initially added Conny said it should survive
> "probably two kernel releases or so, depending on feedback".
> It was merged for v4.8. Now we are five kernel releases later...

Let's wait for one release and then get rid of it?