Re: [RFC PATCH V3 1/5] vhost: generalize adding used elem
On Mon, Jan 07, 2019 at 03:00:17PM +0800, Jason Wang wrote: > > On 2019/1/5 上午8:33, Sean Christopherson wrote: > > On Fri, Jan 04, 2019 at 04:29:34PM -0500, Michael S. Tsirkin wrote: > > > On Sat, Dec 29, 2018 at 08:46:52PM +0800, Jason Wang wrote: > > > > Use one generic vhost_copy_to_user() instead of two dedicated > > > > accessor. This will simplify the conversion to fine grain > > > > accessors. About 2% improvement of PPS were seen during vitio-user > > > > txonly test. > > > > > > > > Signed-off-by: Jason Wang > > > I don't hve a problem with this patch but do you have > > > any idea how come removing what's supposed to be > > > an optimization speeds things up? > > With SMAP, the 2x vhost_put_user() will also mean an extra STAC/CLAC pair, > > which is probably slower than the overhead of CALL+RET to whatever flavor > > of copy_user_generic() gets used. CALL+RET is really the only overhead > > since all variants of copy_user_generic() unroll accesses smaller than > > 64 bytes, e.g. on a 64-bit system, __copy_to_user() will write all 8 > > bytes in a single MOV. > > > > Removing the special casing also eliminates a few hundred bytes of code > > as well as the need for hardware to predict count==1 vs. count>1. > > > > Yes, I don't measure, but STAC/CALC is pretty expensive when we are do very > small copies based on the result of nosmap PPS. > > Thanks Yes all this really looks like a poster child for uaccess_begin/end plus unsafe accesses. And if these APIs don't do the job for us then maybe better ones are needed ... -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH V3 1/5] vhost: generalize adding used elem
On 2019/1/5 上午8:33, Sean Christopherson wrote: On Fri, Jan 04, 2019 at 04:29:34PM -0500, Michael S. Tsirkin wrote: On Sat, Dec 29, 2018 at 08:46:52PM +0800, Jason Wang wrote: Use one generic vhost_copy_to_user() instead of two dedicated accessor. This will simplify the conversion to fine grain accessors. About 2% improvement of PPS were seen during vitio-user txonly test. Signed-off-by: Jason Wang I don't hve a problem with this patch but do you have any idea how come removing what's supposed to be an optimization speeds things up? With SMAP, the 2x vhost_put_user() will also mean an extra STAC/CLAC pair, which is probably slower than the overhead of CALL+RET to whatever flavor of copy_user_generic() gets used. CALL+RET is really the only overhead since all variants of copy_user_generic() unroll accesses smaller than 64 bytes, e.g. on a 64-bit system, __copy_to_user() will write all 8 bytes in a single MOV. Removing the special casing also eliminates a few hundred bytes of code as well as the need for hardware to predict count==1 vs. count>1. Yes, I don't measure, but STAC/CALC is pretty expensive when we are do very small copies based on the result of nosmap PPS. Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH V3 1/5] vhost: generalize adding used elem
On Sat, Dec 29, 2018 at 08:46:52PM +0800, Jason Wang wrote: > Use one generic vhost_copy_to_user() instead of two dedicated > accessor. This will simplify the conversion to fine grain > accessors. About 2% improvement of PPS were seen during vitio-user > txonly test. > > Signed-off-by: Jason Wang I don't hve a problem with this patch but do you have any idea how come removing what's supposed to be an optimization speeds things up? > --- > drivers/vhost/vhost.c | 11 +-- > 1 file changed, 1 insertion(+), 10 deletions(-) > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index 55e5aa662ad5..f179b5ee14c4 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -2174,16 +2174,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue > *vq, > > start = vq->last_used_idx & (vq->num - 1); > used = vq->used->ring + start; > - if (count == 1) { > - if (vhost_put_user(vq, heads[0].id, >id)) { > - vq_err(vq, "Failed to write used id"); > - return -EFAULT; > - } > - if (vhost_put_user(vq, heads[0].len, >len)) { > - vq_err(vq, "Failed to write used len"); > - return -EFAULT; > - } > - } else if (vhost_copy_to_user(vq, used, heads, count * sizeof *used)) { > + if (vhost_copy_to_user(vq, used, heads, count * sizeof *used)) { > vq_err(vq, "Failed to write used"); > return -EFAULT; > } > -- > 2.17.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC PATCH V3 1/5] vhost: generalize adding used elem
Use one generic vhost_copy_to_user() instead of two dedicated accessor. This will simplify the conversion to fine grain accessors. About 2% improvement of PPS were seen during vitio-user txonly test. Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 11 +-- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 55e5aa662ad5..f179b5ee14c4 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2174,16 +2174,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq, start = vq->last_used_idx & (vq->num - 1); used = vq->used->ring + start; - if (count == 1) { - if (vhost_put_user(vq, heads[0].id, >id)) { - vq_err(vq, "Failed to write used id"); - return -EFAULT; - } - if (vhost_put_user(vq, heads[0].len, >len)) { - vq_err(vq, "Failed to write used len"); - return -EFAULT; - } - } else if (vhost_copy_to_user(vq, used, heads, count * sizeof *used)) { + if (vhost_copy_to_user(vq, used, heads, count * sizeof *used)) { vq_err(vq, "Failed to write used"); return -EFAULT; } -- 2.17.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization