Re: [V9fs-developer] [PATCH] net/9p: fix error path of p9_virtio_probe

2018-05-24 Thread Greg Kurz
On Thu, 24 May 2018 11:10:21 +0100
Jean-Philippe Brucker <jean-philippe.bruc...@arm.com> wrote:

> Currently when virtio_find_single_vq fails, we go through del_vqs which
> throws a warning (Trying to free already-free IRQ). Skip del_vqs if vq
> allocation failed.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.bruc...@arm.com>
> ---

Reviewed-by: Greg Kurz <gr...@kaod.org>

>  net/9p/trans_virtio.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
> index 4d0372263e5d..1c87eee522b7 100644
> --- a/net/9p/trans_virtio.c
> +++ b/net/9p/trans_virtio.c
> @@ -562,7 +562,7 @@ static int p9_virtio_probe(struct virtio_device *vdev)
>   chan->vq = virtio_find_single_vq(vdev, req_done, "requests");
>   if (IS_ERR(chan->vq)) {
>   err = PTR_ERR(chan->vq);
> - goto out_free_vq;
> + goto out_free_chan;
>   }
>   chan->vq->vdev->priv = chan;
>   spin_lock_init(>lock);
> @@ -615,6 +615,7 @@ static int p9_virtio_probe(struct virtio_device *vdev)
>   kfree(tag);
>  out_free_vq:
>   vdev->config->del_vqs(vdev);
> +out_free_chan:
>   kfree(chan);
>  fail:
>   return err;



Re: [PATCH] net/9p: avoid -ERESTARTSYS leak to userspace

2018-03-10 Thread Greg Kurz
Hi Andrew,

Thank you very much for taking care of this.

Please find my answers to your remarks below.

On Fri, 9 Mar 2018 14:12:52 -0800
Andrew Morton <a...@linux-foundation.org> wrote:

> On Fri, 09 Mar 2018 21:41:38 +0100 Greg Kurz <gr...@kaod.org> wrote:
> 
> > If it was interrupted by a signal, the 9p client may need to send some
> > more requests to the server for cleanup before returning to userspace.
> > 
> > To avoid such a last minute request to be interrupted right away, the
> > client memorizes if a signal is pending, clears TIF_SIGPENDING, handles
> > the request and calls recalc_sigpending() before returning.
> > 
> > Unfortunately, if the transmission of this cleanup request fails for any
> > reason, the transport returns an error and the client propagates it right
> > away, without calling recalc_sigpending().
> > 
> > This ends up with -ERESTARTSYS from the initially interrupted request
> > crawling up to syscall exit, with TIF_SIGPENDING cleared by the cleanup
> > request. The specific signal handling code, which is responsible for
> > converting -ERESTARTSYS to -EINTR is not called, and userspace receives
> > the confusing errno value:
> > 
> > open: Unknown error 512 (512)
> > 
> > This is really hard to hit in real life. I discovered the issue while
> > working on hot-unplug of a virtio-9p-pci device with an instrumented
> > QEMU allowing to control request completion.
> > 
> > Both p9_client_zc_rpc() and p9_client_rpc() functions have this buggy
> > error path actually. Their code flow is a bit obscure and the best
> > thing to do would probably be a full rewrite: to really ensure this
> > situation of clearing TIF_SIGPENDING and returning -ERESTARTSYS can
> > never happen.
> > 
> > But given the general lack of interest for the 9p code, I won't risk
> > breaking more things. So this patch simply fixes the buggy paths in
> > both functions with a trivial label+goto.
> > 
> > Thanks to Laurent Dufour for his help and suggestions on how to find
> > the root cause and how to fix it.  
> 
> That's a fairly straightforward-looking bug.  However the code still
> looks a bit racy:
> 
> 
> : if (signal_pending(current)) {
> : sigpending = 1;
> : clear_thread_flag(TIF_SIGPENDING);
> : } else
> : sigpending = 0;
> : 
> 
> what happens if signal_pending(current) becomes true right here?
>

Depending on the transport c->trans_mod->request() could possibly
return -ERESTARTSYS, and we would then recalc_sigpending() and
propagate -ERESTARTSYS to the caller.

> : err = c->trans_mod->request(c, req);
> 
> 
> I'm surprised that the 9p client is mucking with signals at all. 
> Signals are a userspace IPC scheme and kernel code should instead be
> using the more powerful messaging mechanisms which we've developed. 
> Ones which don't disrupt userspace state.
> 
> Why is this happening?  Is there some userspace/kernel interoperation
> involved?
> 

Before commit 9523feac272ccad2ad8186ba4fcc89103754de52, we used to rely
on wait_event_interruptible() instead of wait_event_killable(). IIUC,
the purpose of all this sigpending tweaking was just to allow subsequent
cleanup related requests to be issued, without them being interrupted right
away because of the initial signal.

BTW the issue tentatively fixed by commit 
9523feac272ccad2ad8186ba4fcc89103754de52
was more deeply investigated since then. It was caused by a bug in QEMU that 
got 
fixed, and will be available in the next release (2.12). And to cope with 
existing
buggy QEMUs, we can assume that a -EINTR response from the server necessarily 
means
the corresponding request has been successfully canceled.

Cheers,

--
Greg


[PATCH] net/9p: avoid -ERESTARTSYS leak to userspace

2018-03-09 Thread Greg Kurz
If it was interrupted by a signal, the 9p client may need to send some
more requests to the server for cleanup before returning to userspace.

To avoid such a last minute request to be interrupted right away, the
client memorizes if a signal is pending, clears TIF_SIGPENDING, handles
the request and calls recalc_sigpending() before returning.

Unfortunately, if the transmission of this cleanup request fails for any
reason, the transport returns an error and the client propagates it right
away, without calling recalc_sigpending().

This ends up with -ERESTARTSYS from the initially interrupted request
crawling up to syscall exit, with TIF_SIGPENDING cleared by the cleanup
request. The specific signal handling code, which is responsible for
converting -ERESTARTSYS to -EINTR is not called, and userspace receives
the confusing errno value:

open: Unknown error 512 (512)

This is really hard to hit in real life. I discovered the issue while
working on hot-unplug of a virtio-9p-pci device with an instrumented
QEMU allowing to control request completion.

Both p9_client_zc_rpc() and p9_client_rpc() functions have this buggy
error path actually. Their code flow is a bit obscure and the best
thing to do would probably be a full rewrite: to really ensure this
situation of clearing TIF_SIGPENDING and returning -ERESTARTSYS can
never happen.

But given the general lack of interest for the 9p code, I won't risk
breaking more things. So this patch simply fixes the buggy paths in
both functions with a trivial label+goto.

Thanks to Laurent Dufour for his help and suggestions on how to find
the root cause and how to fix it.

Signed-off-by: Greg Kurz <gr...@kaod.org>
---
 net/9p/client.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/9p/client.c b/net/9p/client.c
index b433aff5ff13..e6cae8332e2e 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -769,7 +769,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char 
*fmt, ...)
if (err < 0) {
if (err != -ERESTARTSYS && err != -EFAULT)
c->status = Disconnected;
-   goto reterr;
+   goto recalc_sigpending;
}
 again:
/* Wait for the response */
@@ -804,6 +804,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char 
*fmt, ...)
if (req->status == REQ_STATUS_RCVD)
err = 0;
}
+recalc_sigpending:
if (sigpending) {
spin_lock_irqsave(>sighand->siglock, flags);
recalc_sigpending();
@@ -867,7 +868,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client 
*c, int8_t type,
if (err == -EIO)
c->status = Disconnected;
if (err != -ERESTARTSYS)
-   goto reterr;
+   goto recalc_sigpending;
}
if (req->status == REQ_STATUS_ERROR) {
p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err);
@@ -885,6 +886,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client 
*c, int8_t type,
if (req->status == REQ_STATUS_RCVD)
err = 0;
}
+recalc_sigpending:
if (sigpending) {
spin_lock_irqsave(>sighand->siglock, flags);
recalc_sigpending();



Re: [V9fs-developer] [PATCH] net/9p: avoid -ERESTARTSYS leak to userspace

2018-02-20 Thread Greg Kurz
Hi Al,

It's been two years without any sign of life from 9p maintainers... :-\

Would you apply (or nack) this patch ?

Thanks,

--
Greg

PS: in the case you apply it, probable Cc sta...@vger.kernel.org as well


On Thu, 08 Feb 2018 18:38:49 +0100

Greg Kurz <gr...@kaod.org> wrote:

> If it was interrupted by a signal, the 9p client may need to send some
> more requests to the server for cleanup before returning to userspace.
> 
> To avoid such a last minute request to be interrupted right away, the
> client memorizes if a signal is pending, clear TIF_SIGPENDING, handle
> the request and call recalc_sigpending() before returning.
> 
> Unfortunately, if the transmission of this cleanup request fails for any
> reason, the transport returns an error and the client propagates it right
> away, without calling recalc_sigpending().
> 
> This ends up with -ERESTARTSYS from the initially interrupted request
> crawling up to syscall exit, with TIF_SIGPENDING cleared by the cleanup
> request. The specific signal handling code, which is responsible for
> converting -ERESTARTSYS to -EINTR is not called, and userspace receives
> the confusing errno value:
> 
> open: Unknown error 512 (512)
> 
> This is really hard to hit in real life. I discovered the issue while
> working on hot-unplug of a virtio-9p-pci device with an instrumented
> QEMU allowing to control request completion.
> 
> Both p9_client_zc_rpc() and p9_client_rpc() functions have this buggy
> error path actually. Their code flow is a bit obscure and the best
> thing to do would probably be a full rewrite: to really ensure this
> situation of clearing TIF_SIGPENDING and returning -ERESTARTSYS can
> never happen.
> 
> But given the general lack of interest for the 9p code, I won't risk
> breaking more things. So this patch simply fix the buggy paths in both
> functions with a trivial label+goto.
> 
> Thanks to Laurent Dufour for his help and suggestions on how to find
> the root cause and how to fix it.
> 
> Signed-off-by: Greg Kurz <gr...@kaod.org>
> ---
>  net/9p/client.c |6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 4c8cf9c1631a..5154eaf19fff 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -769,7 +769,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const 
> char *fmt, ...)
>   if (err < 0) {
>   if (err != -ERESTARTSYS && err != -EFAULT)
>   c->status = Disconnected;
> - goto reterr;
> + goto recalc_sigpending;
>   }
>  again:
>   /* Wait for the response */
> @@ -804,6 +804,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const 
> char *fmt, ...)
>   if (req->status == REQ_STATUS_RCVD)
>   err = 0;
>   }
> +recalc_sigpending:
>   if (sigpending) {
>   spin_lock_irqsave(>sighand->siglock, flags);
>   recalc_sigpending();
> @@ -867,7 +868,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client 
> *c, int8_t type,
>   if (err == -EIO)
>   c->status = Disconnected;
>   if (err != -ERESTARTSYS)
> - goto reterr;
> + goto recalc_sigpending;
>   }
>   if (req->status == REQ_STATUS_ERROR) {
>   p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err);
> @@ -885,6 +886,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client 
> *c, int8_t type,
>   if (req->status == REQ_STATUS_RCVD)
>   err = 0;
>   }
> +recalc_sigpending:
>   if (sigpending) {
>   spin_lock_irqsave(>sighand->siglock, flags);
>   recalc_sigpending();
> 
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> V9fs-developer mailing list
> v9fs-develo...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/v9fs-developer



Re: [V9fs-developer] [PATCH] 9p/trans_virtio: discard zero-length reply

2018-02-08 Thread Greg Kurz
Ping ?

Michael,

Since this is virtio code and you have acked the QEMU part of the fix already,
would you be kind enough to take this through your tree ?

Cheers,

--
Greg

On Mon, 22 Jan 2018 22:02:05 +0100
Greg Kurz <gr...@kaod.org> wrote:

> When a 9p request is successfully flushed, the server is expected to just
> mark it as used without sending a 9p reply (ie, without writing data into
> the buffer). In this case, virtqueue_get_buf() will return len == 0 and
> we must not report a REQ_STATUS_RCVD status to the client, otherwise the
> client will erroneously assume the request has not been flushed.
> 
> Signed-off-by: Greg Kurz <gr...@kaod.org>
> ---
>  net/9p/trans_virtio.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
> index 0845aad4ba51..ca08c72ef4de 100644
> --- a/net/9p/trans_virtio.c
> +++ b/net/9p/trans_virtio.c
> @@ -171,7 +171,8 @@ static void req_done(struct virtqueue *vq)
>   spin_unlock_irqrestore(>lock, flags);
>   /* Wakeup if anyone waiting for VirtIO ring space. */
>   wake_up(chan->vc_wq);
> - p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
> + if (len)
> + p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
>   }
>  }
>  
> 
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> V9fs-developer mailing list
> v9fs-develo...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/v9fs-developer



[PATCH] net/9p: avoid -ERESTARTSYS leak to userspace

2018-02-08 Thread Greg Kurz
If it was interrupted by a signal, the 9p client may need to send some
more requests to the server for cleanup before returning to userspace.

To avoid such a last minute request to be interrupted right away, the
client memorizes if a signal is pending, clear TIF_SIGPENDING, handle
the request and call recalc_sigpending() before returning.

Unfortunately, if the transmission of this cleanup request fails for any
reason, the transport returns an error and the client propagates it right
away, without calling recalc_sigpending().

This ends up with -ERESTARTSYS from the initially interrupted request
crawling up to syscall exit, with TIF_SIGPENDING cleared by the cleanup
request. The specific signal handling code, which is responsible for
converting -ERESTARTSYS to -EINTR is not called, and userspace receives
the confusing errno value:

open: Unknown error 512 (512)

This is really hard to hit in real life. I discovered the issue while
working on hot-unplug of a virtio-9p-pci device with an instrumented
QEMU allowing to control request completion.

Both p9_client_zc_rpc() and p9_client_rpc() functions have this buggy
error path actually. Their code flow is a bit obscure and the best
thing to do would probably be a full rewrite: to really ensure this
situation of clearing TIF_SIGPENDING and returning -ERESTARTSYS can
never happen.

But given the general lack of interest for the 9p code, I won't risk
breaking more things. So this patch simply fix the buggy paths in both
functions with a trivial label+goto.

Thanks to Laurent Dufour for his help and suggestions on how to find
the root cause and how to fix it.

Signed-off-by: Greg Kurz <gr...@kaod.org>
---
 net/9p/client.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/9p/client.c b/net/9p/client.c
index 4c8cf9c1631a..5154eaf19fff 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -769,7 +769,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char 
*fmt, ...)
if (err < 0) {
if (err != -ERESTARTSYS && err != -EFAULT)
c->status = Disconnected;
-   goto reterr;
+   goto recalc_sigpending;
}
 again:
/* Wait for the response */
@@ -804,6 +804,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char 
*fmt, ...)
if (req->status == REQ_STATUS_RCVD)
err = 0;
}
+recalc_sigpending:
if (sigpending) {
spin_lock_irqsave(>sighand->siglock, flags);
recalc_sigpending();
@@ -867,7 +868,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client 
*c, int8_t type,
if (err == -EIO)
c->status = Disconnected;
if (err != -ERESTARTSYS)
-   goto reterr;
+   goto recalc_sigpending;
}
if (req->status == REQ_STATUS_ERROR) {
p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err);
@@ -885,6 +886,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client 
*c, int8_t type,
if (req->status == REQ_STATUS_RCVD)
err = 0;
}
+recalc_sigpending:
if (sigpending) {
spin_lock_irqsave(>sighand->siglock, flags);
recalc_sigpending();



[PATCH] 9p/trans_virtio: discard zero-length reply

2018-01-22 Thread Greg Kurz
When a 9p request is successfully flushed, the server is expected to just
mark it as used without sending a 9p reply (ie, without writing data into
the buffer). In this case, virtqueue_get_buf() will return len == 0 and
we must not report a REQ_STATUS_RCVD status to the client, otherwise the
client will erroneously assume the request has not been flushed.

Signed-off-by: Greg Kurz <gr...@kaod.org>
---
 net/9p/trans_virtio.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 0845aad4ba51..ca08c72ef4de 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -171,7 +171,8 @@ static void req_done(struct virtqueue *vq)
spin_unlock_irqrestore(>lock, flags);
/* Wakeup if anyone waiting for VirtIO ring space. */
wake_up(chan->vc_wq);
-   p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
+   if (len)
+   p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
}
 }
 



Re: [BUG/RFC] vhost: net: big endian viring access despite virtio 1

2017-01-27 Thread Greg Kurz
On Fri, 27 Jan 2017 13:24:13 +0100
Halil Pasic  wrote:

> On 01/26/2017 08:20 PM, Michael S. Tsirkin wrote:
> > On Thu, Jan 26, 2017 at 06:39:14PM +0100, Halil Pasic wrote:  
> >>
> >> Hi!
> >>
> >> Recently I have been investigating some strange migration problems on
> >> s390x.
> >>
> >> It turned out under certain circumstances vhost_net corrupts avail.idx by
> >> using wrong endianness.  
> 
> [..]
> 
> >> -8<--  
> >> >From b26e2bbdc03832a0204ee2b42967a1b49a277dc8 Mon Sep 17 00:00:00 2001  
> >> From: Halil Pasic 
> >> Date: Thu, 26 Jan 2017 00:06:15 +0100
> >> Subject: [PATCH] vhost: remove useless/dangerous reset of is_le
> >>
> >> The reset of is_le does no good, but it contributes its fair share to a
> >> bug in vhost_net, which occurs if we have some oldubufs when stopping and
> >> setting a fd = -1 as a backend. Instead of doing something convoluted in
> >> vhost_net, let's just get rid of the reset.
> >>
> >> Signed-off-by: Halil Pasic 
> >> Fixes: commit 2751c9882b94 
> >> ---
> >>  drivers/vhost/vhost.c | 4 +---
> >>  1 file changed, 1 insertion(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >> index d643260..08072a2 100644
> >> --- a/drivers/vhost/vhost.c
> >> +++ b/drivers/vhost/vhost.c
> >> @@ -1714,10 +1714,8 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq)
> >> int r;
> >> bool is_le = vq->is_le;
> >>
> >> -   if (!vq->private_data) {
> >> -   vhost_reset_is_le(vq);
> >> +   if (!vq->private_data)
> >> return 0;
> >> -   }
> >>
> >> vhost_init_is_le(vq);  
> > 
> > 
> > I think you do need to reset it, just maybe within vhost_init_is_le.
> > 
> > if (vhost_has_feature(vq, VIRTIO_F_VERSION_1))
> > vq->is_le = true;
> > else
> > vhost_reset_is_le(vq);
> > 
> >   
> 
> That is a very good point! I have overlooked that while the 
> CONFIG_VHOST_CROSS_ENDIAN_LEGACY variant
> 
> static void vhost_init_is_le(struct vhost_virtqueue *vq)
> {
> /* Note for legacy virtio: user_be is initialized at reset time
>  * according to the host endianness. If userspace does not set an
>  * explicit endianness, the default behavior is native endian, as
>  * expected by legacy virtio.
>  */
> vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be;
> }
> 
> is fine the other variant 
> 
> static void vhost_init_is_le(struct vhost_virtqueue *vq)
> {
> if (vhost_has_feature(vq, VIRTIO_F_VERSION_1))
> vq->is_le = true;
> }
> is a very strange initializer (makes assumptions about the state
> to be initialized).
> 
> I agree, setting native endianness there sounds very reasonable.
> 
> I have a question regarding readability. IMHO the relationship
> of reset_is_le and int_is_le is a bit confusing, and I'm afraid
> it could become even more confusing with using reset in one of
> the init_is_le's.
> 
> How about we do the following?
> 
> static void vhost_init_is_le(struct vhost_virtqueue *vq)
> {
> if (vhost_has_feature(vq, VIRTIO_F_VERSION_1))
> vq->is_le = true;
> +   else
> +   vq->is_le = virtio_legacy_is_little_endian();
> 
> }
> 
> static void vhost_reset_is_le(struct vhost_virtqueue *vq)
> {
> -   vq->is_le = virtio_legacy_is_little_endian();
> +   vhost_init_is_le(vq);
> }
> 
> That way we would have correct endianness both after reset
> and after init, I think :).
> 

Yes, I think this is what we need.

Cheers.

--
Greg

> Thank you very much!
> 
> Halil
> 
> 



Re: [PATCH V4 0/3] basic busy polling support for vhost_net

2016-03-09 Thread Greg Kurz
On Fri,  4 Mar 2016 06:24:50 -0500
Jason Wang  wrote:

> This series tries to add basic busy polling for vhost net. The idea is
> simple: at the end of tx/rx processing, busy polling for new tx added
> descriptor and rx receive socket for a while. The maximum number of
> time (in us) could be spent on busy polling was specified ioctl.
> 
> Test A were done through:
> 
> - 50 us as busy loop timeout
> - Netperf 2.6
> - Two machines with back to back connected mlx4

Hi Jason,

Could this also improve performance if both VMs are
on the same host system ?

> - Guest with 1 vcpus and 1 queue
> 
> Results:
> - Obvious improvements (%5 - 20%) for latency (TCP_RR).
> - Get a better or minor regression on most of the TX tests, but see
>   some regression on 4096 size.
> - Except for 8 sessions of 4096 size RX, have a better or same
>   performance.
> - CPU utilization were incrased as expected.
> 
> TCP_RR:
> size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
> 1/ 1/   +8%/  -32%/   +8%/   +8%/   +7%
> 1/50/   +7%/  -19%/   +7%/   +7%/   +1%
> 1/   100/   +5%/  -21%/   +5%/   +5%/0%
> 1/   200/   +5%/  -21%/   +7%/   +7%/   +1%
>64/ 1/  +11%/  -29%/  +11%/  +11%/  +10%
>64/50/   +7%/  -19%/   +8%/   +8%/   +2%
>64/   100/   +8%/  -18%/   +9%/   +9%/   +2%
>64/   200/   +6%/  -19%/   +6%/   +6%/0%
>   256/ 1/   +7%/  -33%/   +7%/   +7%/   +6%
>   256/50/   +7%/  -18%/   +7%/   +7%/0%
>   256/   100/   +9%/  -18%/   +8%/   +8%/   +2%
>   256/   200/   +9%/  -18%/  +10%/  +10%/   +3%
>  1024/ 1/  +20%/  -28%/  +20%/  +20%/  +19%
>  1024/50/   +8%/  -18%/   +9%/   +9%/   +2%
>  1024/   100/   +6%/  -19%/   +5%/   +5%/0%
>  1024/   200/   +8%/  -18%/   +9%/   +9%/   +2%
> Guest TX:
> size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
>64/ 1/   -5%/  -28%/  +11%/  +12%/  +10%
>64/ 4/   -2%/  -26%/  +13%/  +13%/  +13%
>64/ 8/   -6%/  -29%/   +9%/  +10%/  +10%
>   512/ 1/  +15%/   -7%/  +13%/  +11%/   +3%
>   512/ 4/  +17%/   -6%/  +18%/  +13%/  +11%
>   512/ 8/  +14%/   -7%/  +13%/   +7%/   +7%
>  1024/ 1/  +27%/   -2%/  +26%/  +29%/  +12%
>  1024/ 4/   +8%/   -9%/   +6%/   +1%/   +6%
>  1024/ 8/  +41%/  +12%/  +34%/  +20%/   -3%
>  4096/ 1/  -22%/  -21%/  -36%/  +81%/+1360%
>  4096/ 4/  -57%/  -58%/ +286%/  +15%/+2074%
>  4096/ 8/  +67%/  +70%/  -45%/   -8%/  +63%
> 16384/ 1/   -2%/   -5%/   +5%/   -3%/  +80%
> 16384/ 4/0%/0%/0%/   +4%/ +138%
> 16384/ 8/0%/0%/0%/   +1%/  +41%
> 65535/ 1/   -3%/   -6%/   +2%/  +11%/ +113%
> 65535/ 4/   -2%/   -1%/   -2%/   -3%/ +484%
> 65535/ 8/0%/   +1%/0%/   +2%/  +40%
> Guest RX:
> size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
>64/ 1/  +31%/   -3%/   +8%/   +8%/   +8%
>64/ 4/  +11%/  -17%/  +13%/  +14%/  +15%
>64/ 8/   +4%/  -23%/  +11%/  +11%/  +12%
>   512/ 1/  +24%/0%/  +18%/  +14%/   -8%
>   512/ 4/   +4%/  -15%/   +6%/   +5%/   +6%
>   512/ 8/  +26%/0%/  +21%/  +10%/   +3%
>  1024/ 1/  +88%/  +47%/  +69%/  +44%/  -30%
>  1024/ 4/  +18%/   -5%/  +19%/  +16%/   +2%
>  1024/ 8/  +15%/   -4%/  +13%/   +8%/   +1%
>  4096/ 1/   -3%/   -5%/   +2%/   -2%/  +41%
>  4096/ 4/   +2%/   +3%/  -20%/  -14%/  -24%
>  4096/ 8/  -43%/  -45%/  +69%/  -24%/  +94%
> 16384/ 1/   -3%/  -11%/  +23%/   +7%/  +42%
> 16384/ 4/   -3%/   -3%/   -4%/   +5%/ +115%
> 16384/ 8/   -1%/0%/   -1%/   -3%/  +32%
> 65535/ 1/   +1%/0%/   +2%/0%/  +66%
> 65535/ 4/   -1%/   -1%/0%/   +4%/ +492%
> 65535/ 8/0%/   -1%/   -1%/   +4%/  +38%
> 
> Changes from V3:
> - drop single_task_running()
> - use cpu_relax_lowlatency() instead of cpu_relax()
> 
> Changes from V2:
> - rename vhost_vq_more_avail() to vhost_vq_avail_empty(). And return
> false we __get_user() fails.
> - do not bother premmptions/timers for good path.
> - use vhost_vring_state as ioctl parameter instead of reinveting a new
> one.
> - add the unit of timeout (us) to the comment of new added ioctls
> 
> Changes from V1:
> - remove the buggy vq_error() in vhost_vq_more_avail().
> - leave vhost_enable_notify() untouched.
> 
> Changes from RFC V3:
> - small tweak on the code to avoid multiple duplicate conditions in
> critical path when busy loop is not enabled.
> - add the test result of multiple VMs
> 
> Changes from RFC V2:
> - poll also at the end of rx handling
> - factor out the polling logic and optimize the code a little bit
> - add two ioctls to get and set the busy poll timeout
> - test on ixgbe (which can give more stable and reproducable numbers)
> instead of mlx4.
> 
> Changes from RFC V1:
> - add a comment for vhost_has_work() to explain why it could be
> lockless
> - add param description for busyloop_timeout
> - split out the busy polling logic into a new helper
> - check and exit the loop when 

Re: [PATCH v2 0/3] vhost: cross-endian code cleanup

2016-02-16 Thread Greg Kurz
On Tue, 16 Feb 2016 17:34:13 +0200
"Michael S. Tsirkin" <m...@redhat.com> wrote:

> On Tue, Feb 16, 2016 at 03:54:18PM +0100, Greg Kurz wrote:
> > This series is a new tentative to have cleaner cross-endian code.
> > 
> > Patches 1/3 is new: it fixes a side-effect in case vhost_init_used() fails.
> > 
> > Patch 2/3 comes from v1: it renames cross-endian helpers
> > 
> > Patch 3/3 is new: it simply renames vhost_init_used() as suggested by 
> > Michael.  
> 
> Is this on top of my tree?
> 

No it's on top of Linus's but I can rebase.

> > ---
> > 
> > Greg Kurz (3):
> >   vhost: fix error path in vhost_init_used()
> >   vhost: rename cross-endian helpers
> >   vhost: rename vhost_init_used()
> > 
> > 
> >  drivers/vhost/net.c   |2 +-
> >  drivers/vhost/scsi.c  |2 +-
> >  drivers/vhost/test.c  |2 +-
> >  drivers/vhost/vhost.c |   49 
> > +
> >  drivers/vhost/vhost.h |2 +-
> >  5 files changed, 41 insertions(+), 16 deletions(-)  
> 



[PATCH v2 2/3] vhost: rename cross-endian helpers

2016-02-16 Thread Greg Kurz
The default use case for vhost is when the host and the vring have the
same endianness (default native endianness). But there are cases where
they differ and vhost should byteswap when accessing the vring.

The first case is when the host is big endian and the vring belongs to
a virtio 1.0 device, which is always little endian.

This is covered by the vq->is_le field. This field is initialized when
userspace calls the VHOST_SET_FEATURES ioctl. It is reset when the device
stops.

We already have a vhost_init_is_le() helper, but the reset operation is
opencoded as follows:

vq->is_le = virtio_legacy_is_little_endian();

It isn't clear that we are resetting vq->is_le here.

This patch moves the code to a helper with a more explicit name.

The other case where we may have to byteswap is when the architecture can
switch endianness at runtime (bi-endian). If endianness differs in the host
and in the guest, then legacy devices need to be used in cross-endian mode.

This mode is available with CONFIG_VHOST_CROSS_ENDIAN_LEGACY=y, which
introduces a vq->user_be field. Userspace may enable cross-endian mode
by calling the SET_VRING_ENDIAN ioctl before the device is started. The
cross-endian mode is disabled when the device is stopped.

The current names of the helpers that manipulate vq->user_be are unclear.

This patch renames those helpers to clearly show that this is cross-endian
stuff and with explicit enable/disable semantics.

No behaviour change.

Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com>
---
Changes since v1:
- use init/reset semantics for vq->is_le (Michael)
- use enable/disable semantics for vq->user_be (Cornelia)
- rewrote the changelog
---
 drivers/vhost/vhost.c |   30 --
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 236553e81027..69f6463e11bd 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -43,11 +43,21 @@ enum {
 #define vhost_avail_event(vq) ((__virtio16 __user *)>used->ring[vq->num])
 
 #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY
-static void vhost_vq_reset_user_be(struct vhost_virtqueue *vq)
+static void vhost_disable_cross_endian(struct vhost_virtqueue *vq)
 {
vq->user_be = !virtio_legacy_is_little_endian();
 }
 
+static void vhost_enable_cross_endian_big(struct vhost_virtqueue *vq)
+{
+   vq->user_be = true;
+}
+
+static void vhost_enable_cross_endian_little(struct vhost_virtqueue *vq)
+{
+   vq->user_be = false;
+}
+
 static long vhost_set_vring_endian(struct vhost_virtqueue *vq, int __user 
*argp)
 {
struct vhost_vring_state s;
@@ -62,7 +72,10 @@ static long vhost_set_vring_endian(struct vhost_virtqueue 
*vq, int __user *argp)
s.num != VHOST_VRING_BIG_ENDIAN)
return -EINVAL;
 
-   vq->user_be = s.num;
+   if (s.num == VHOST_VRING_BIG_ENDIAN)
+   vhost_enable_cross_endian_big(vq);
+   else
+   vhost_enable_cross_endian_little(vq);
 
return 0;
 }
@@ -91,7 +104,7 @@ static void vhost_init_is_le(struct vhost_virtqueue *vq)
vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be;
 }
 #else
-static void vhost_vq_reset_user_be(struct vhost_virtqueue *vq)
+static void vhost_disable_cross_endian(struct vhost_virtqueue *vq)
 {
 }
 
@@ -113,6 +126,11 @@ static void vhost_init_is_le(struct vhost_virtqueue *vq)
 }
 #endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */
 
+static void vhost_reset_is_le(struct vhost_virtqueue *vq)
+{
+   vq->is_le = virtio_legacy_is_little_endian();
+}
+
 static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
poll_table *pt)
 {
@@ -276,8 +294,8 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vq->call = NULL;
vq->log_ctx = NULL;
vq->memory = NULL;
-   vq->is_le = virtio_legacy_is_little_endian();
-   vhost_vq_reset_user_be(vq);
+   vhost_reset_is_le(vq);
+   vhost_disable_cross_endian(vq);
 }
 
 static int vhost_worker(void *data)
@@ -1159,7 +1177,7 @@ int vhost_init_used(struct vhost_virtqueue *vq)
bool is_le = vq->is_le;
 
if (!vq->private_data) {
-   vq->is_le = virtio_legacy_is_little_endian();
+   vhost_reset_is_le(vq);
return 0;
}
 



[PATCH v2 3/3] vhost: rename vhost_init_used()

2016-02-16 Thread Greg Kurz
Looking at how callers use this, maybe we should just rename init_used
to vhost_vq_init_access. The _used suffix was a hint that we
access the vq used ring. But maybe what callers care about is
that it must be called after access_ok.

Also, this function manipulates the vq->is_le field which isn't related
to the vq used ring.

This patch simply renames vhost_init_used() to vhost_vq_init_access() as
suggested by Michael.

No behaviour change.

Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com>
---
 drivers/vhost/net.c   |2 +-
 drivers/vhost/scsi.c  |2 +-
 drivers/vhost/test.c  |2 +-
 drivers/vhost/vhost.c |4 ++--
 drivers/vhost/vhost.h |2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 9eda69e40678..7bd75ff8be26 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -917,7 +917,7 @@ static long vhost_net_set_backend(struct vhost_net *n, 
unsigned index, int fd)
 
vhost_net_disable_vq(n, vq);
vq->private_data = sock;
-   r = vhost_init_used(vq);
+   r = vhost_vq_init_access(vq);
if (r)
goto err_used;
r = vhost_net_enable_vq(n, vq);
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 29cfc57d496e..f898686cdd93 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1274,7 +1274,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
vq = >vqs[i].vq;
mutex_lock(>mutex);
vq->private_data = vs_tpg;
-   vhost_init_used(vq);
+   vhost_vq_init_access(vq);
mutex_unlock(>mutex);
}
ret = 0;
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index f2882ac98726..388eec4e1a90 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -196,7 +196,7 @@ static long vhost_test_run(struct vhost_test *n, int test)
oldpriv = vq->private_data;
vq->private_data = priv;
 
-   r = vhost_init_used(>vqs[index]);
+   r = vhost_vq_init_access(>vqs[index]);
 
mutex_unlock(>mutex);
 
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 69f6463e11bd..328c54ab0154 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1170,7 +1170,7 @@ static int vhost_update_avail_event(struct 
vhost_virtqueue *vq, u16 avail_event)
return 0;
 }
 
-int vhost_init_used(struct vhost_virtqueue *vq)
+int vhost_vq_init_access(struct vhost_virtqueue *vq)
 {
__virtio16 last_used_idx;
int r;
@@ -1200,7 +1200,7 @@ err:
vq->is_le = is_le;
return r;
 }
-EXPORT_SYMBOL_GPL(vhost_init_used);
+EXPORT_SYMBOL_GPL(vhost_vq_init_access);
 
 static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
  struct iovec iov[], int iov_size)
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index d3f767448a72..8f0dd0d915d4 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -148,7 +148,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *,
  struct vhost_log *log, unsigned int *log_num);
 void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
 
-int vhost_init_used(struct vhost_virtqueue *);
+int vhost_vq_init_access(struct vhost_virtqueue *);
 int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
 int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
 unsigned count);



[PATCH v2 0/3] vhost: cross-endian code cleanup

2016-02-16 Thread Greg Kurz
This series is a new tentative to have cleaner cross-endian code.

Patches 1/3 is new: it fixes a side-effect in case vhost_init_used() fails.

Patch 2/3 comes from v1: it renames cross-endian helpers

Patch 3/3 is new: it simply renames vhost_init_used() as suggested by Michael.

---

Greg Kurz (3):
  vhost: fix error path in vhost_init_used()
  vhost: rename cross-endian helpers
  vhost: rename vhost_init_used()


 drivers/vhost/net.c   |2 +-
 drivers/vhost/scsi.c  |2 +-
 drivers/vhost/test.c  |2 +-
 drivers/vhost/vhost.c |   49 +
 drivers/vhost/vhost.h |2 +-
 5 files changed, 41 insertions(+), 16 deletions(-)



[PATCH v2 1/3] vhost: fix error path in vhost_init_used()

2016-02-16 Thread Greg Kurz
We don't want side effects. If something fails, we rollback vq->is_le to
its previous value.

Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com>
---
 drivers/vhost/vhost.c |   15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index ad2146a9ab2d..236553e81027 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1156,6 +1156,8 @@ int vhost_init_used(struct vhost_virtqueue *vq)
 {
__virtio16 last_used_idx;
int r;
+   bool is_le = vq->is_le;
+
if (!vq->private_data) {
vq->is_le = virtio_legacy_is_little_endian();
return 0;
@@ -1165,15 +1167,20 @@ int vhost_init_used(struct vhost_virtqueue *vq)
 
r = vhost_update_used_flags(vq);
if (r)
-   return r;
+   goto err;
vq->signalled_used_valid = false;
-   if (!access_ok(VERIFY_READ, >used->idx, sizeof vq->used->idx))
-   return -EFAULT;
+   if (!access_ok(VERIFY_READ, >used->idx, sizeof vq->used->idx)) {
+   r = -EFAULT;
+   goto err;
+   }
r = __get_user(last_used_idx, >used->idx);
if (r)
-   return r;
+   goto err;
vq->last_used_idx = vhost16_to_cpu(vq, last_used_idx);
return 0;
+err:
+   vq->is_le = is_le;
+   return r;
 }
 EXPORT_SYMBOL_GPL(vhost_init_used);
 



Re: [PATCH 1/2] vhost: helpers to enable/disable vring endianness

2016-02-10 Thread Greg Kurz
On Wed, 10 Feb 2016 17:08:52 +0200
"Michael S. Tsirkin" <m...@redhat.com> wrote:

> On Wed, Feb 10, 2016 at 01:11:34PM +0100, Greg Kurz wrote:
> > On Wed, 10 Feb 2016 13:21:22 +0200
> > "Michael S. Tsirkin" <m...@redhat.com> wrote:
> >   
> > > On Wed, Jan 13, 2016 at 06:09:41PM +0100, Greg Kurz wrote:  
> > > > The default use case for vhost is when the host and the vring have the
> > > > same endianness (default native endianness). But there are cases where
> > > > they differ and vhost should byteswap when accessing the vring:
> > > > - the host is big endian and the vring comes from a virtio 1.0 device
> > > >   which is always little endian
> > > > - the architecture is bi-endian and the vring comes from a legacy virtio
> > > >   device with a different endianness than the endianness of the host 
> > > > (aka
> > > >   legacy cross-endian)
> > > > 
> > > > These cases are handled by the vq->is_le and the optional vq->user_be,
> > > > with the following logic:
> > > > - if none of the fields is enabled, vhost access the vring without 
> > > > byteswap
> > > > - if the vring is virtio 1.0 and the host is big endian, vq->is_le is
> > > >   enabled to enforce little endian access to the vring
> > > > - if the vring is legacy cross-endian, userspace enables vq->user_be
> > > >   to inform vhost about the vring endianness. This endianness is then
> > > >   enforced for vring accesses through vq->is_le again
> > > > 
> > > > The logic is unclear in the current code.
> > > > 
> > > > This patch introduces helpers with explicit enable and disable 
> > > > semantics,
> > > > for better clarity.
> > > > 
> > > > No behaviour change.
> > > > 
> > > > Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com>
> > > > ---
> > > >  drivers/vhost/vhost.c |   28 +++-
> > > >  1 file changed, 19 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > index ad2146a9ab2d..e02e06755ab7 100644
> > > > --- a/drivers/vhost/vhost.c
> > > > +++ b/drivers/vhost/vhost.c
> > > > @@ -43,11 +43,16 @@ enum {
> > > >  #define vhost_avail_event(vq) ((__virtio16 __user 
> > > > *)>used->ring[vq->num])
> > > >  
> > > >  #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY
> > > > -static void vhost_vq_reset_user_be(struct vhost_virtqueue *vq)
> > > > +static void vhost_disable_user_be(struct vhost_virtqueue *vq)
> > > >  {
> > > > vq->user_be = !virtio_legacy_is_little_endian();
> > > >  }
> > > >  
> > > 
> > > Hmm this doesn't look like an improvement to me.
> > > What does it mean to disable big endian? Make it little endian?  
> > 
> > The default behavior for the device is native endian.
> > 
> > The SET_VRING_ENDIAN ioctl is used to make the device big endian
> > on little endian hosts, hence "enabling" cross-endian mode...
> >   
> > > Existing reset seems to make sense.
> > >   
> > 
> > ... and we "disable" cross-endian mode on reset.  
> 
> OK but that's enable cross-endian. Not enable be.  We could have
> something like vhost_disable_user_byte_swap though each time we try,
> the result makes my head hurt: "swap" is a relative thing and hard to
> keep track of.
> 

What about having something like below then ?

vhost_enable_cross_endian_big() to be called on little endian hosts with big 
endian devices

vhost_enable_cross_endian_little() to be called on big endian hosts with little 
endian devices

vhost_disable_cross_endian() to go back to the native endian default

> > > > +static void vhost_enable_user_be(struct vhost_virtqueue *vq, bool 
> > > > user_be)
> > > > +{
> > > > +   vq->user_be = user_be;
> > > > +}
> > > > +
> > > 
> > > And this is maybe "init_user_be"?
> > >   
> > 
> > Anyway I don't mind changing the names to reset/init_user_be if you think it
> > is clearer.
> >   
> > > >  static long vhost_set_vring_endian(struct vhost_virtqueue *vq, int 
> > > > __user *argp)
> > > >  {
> > > > struct vhost_vring_state s;
> > > > 

Re: [PATCH 2/2] vhost: disentangle vring endianness stuff from the core code

2016-02-10 Thread Greg Kurz
On Wed, 10 Feb 2016 14:23:33 +0100
Cornelia Huck <cornelia.h...@de.ibm.com> wrote:

> On Wed, 10 Feb 2016 14:08:43 +0100
> Greg Kurz <gk...@linux.vnet.ibm.com> wrote:
> 
> > But you are right, there is a bug: we should rollback if vhost_init_used()
> > fails. Something like below:
> > 
> >  err_used:
> > vq->private_data = oldsock;
> > vhost_net_enable_vq(n, vq);
> > +   vhost_adjust_vring_endian(vq);  
> 
> Shouldn't we switch back before we reenable? Or have I lost myself in
> this maze here again?
> 

I haven't spotted any path under vhost_net_enable_vq() that needs
the vring endianness, but indeed it looks safer to switch back
before a worker thread may be woken up.

> > if (ubufs)
> > vhost_net_ubuf_put_wait_and_free(ubufs);
> >  err_ubufs:  



Re: [PATCH 1/2] vhost: helpers to enable/disable vring endianness

2016-02-10 Thread Greg Kurz
On Wed, 10 Feb 2016 13:21:22 +0200
"Michael S. Tsirkin" <m...@redhat.com> wrote:

> On Wed, Jan 13, 2016 at 06:09:41PM +0100, Greg Kurz wrote:
> > The default use case for vhost is when the host and the vring have the
> > same endianness (default native endianness). But there are cases where
> > they differ and vhost should byteswap when accessing the vring:
> > - the host is big endian and the vring comes from a virtio 1.0 device
> >   which is always little endian
> > - the architecture is bi-endian and the vring comes from a legacy virtio
> >   device with a different endianness than the endianness of the host (aka
> >   legacy cross-endian)
> > 
> > These cases are handled by the vq->is_le and the optional vq->user_be,
> > with the following logic:
> > - if none of the fields is enabled, vhost access the vring without byteswap
> > - if the vring is virtio 1.0 and the host is big endian, vq->is_le is
> >   enabled to enforce little endian access to the vring
> > - if the vring is legacy cross-endian, userspace enables vq->user_be
> >   to inform vhost about the vring endianness. This endianness is then
> >   enforced for vring accesses through vq->is_le again
> > 
> > The logic is unclear in the current code.
> > 
> > This patch introduces helpers with explicit enable and disable semantics,
> > for better clarity.
> > 
> > No behaviour change.
> > 
> > Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com>
> > ---
> >  drivers/vhost/vhost.c |   28 +++-
> >  1 file changed, 19 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index ad2146a9ab2d..e02e06755ab7 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -43,11 +43,16 @@ enum {
> >  #define vhost_avail_event(vq) ((__virtio16 __user 
> > *)>used->ring[vq->num])
> >  
> >  #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY
> > -static void vhost_vq_reset_user_be(struct vhost_virtqueue *vq)
> > +static void vhost_disable_user_be(struct vhost_virtqueue *vq)
> >  {
> > vq->user_be = !virtio_legacy_is_little_endian();
> >  }
> >
> 
> Hmm this doesn't look like an improvement to me.
> What does it mean to disable big endian? Make it little endian?

The default behavior for the device is native endian.

The SET_VRING_ENDIAN ioctl is used to make the device big endian
on little endian hosts, hence "enabling" cross-endian mode...

> Existing reset seems to make sense.
> 

... and we "disable" cross-endian mode on reset.

> > +static void vhost_enable_user_be(struct vhost_virtqueue *vq, bool user_be)
> > +{
> > +   vq->user_be = user_be;
> > +}
> > +  
> 
> And this is maybe "init_user_be"?
> 

Anyway I don't mind changing the names to reset/init_user_be if you think it
is clearer.

> >  static long vhost_set_vring_endian(struct vhost_virtqueue *vq, int __user 
> > *argp)
> >  {
> > struct vhost_vring_state s;
> > @@ -62,7 +67,7 @@ static long vhost_set_vring_endian(struct vhost_virtqueue 
> > *vq, int __user *argp)
> > s.num != VHOST_VRING_BIG_ENDIAN)
> > return -EINVAL;
> >  
> > -   vq->user_be = s.num;
> > +   vhost_enable_user_be(vq, !!s.num);
> >  
> > return 0;
> >  }
> > @@ -81,7 +86,7 @@ static long vhost_get_vring_endian(struct vhost_virtqueue 
> > *vq, u32 idx,
> > return 0;
> >  }
> >  
> > -static void vhost_init_is_le(struct vhost_virtqueue *vq)
> > +static void vhost_enable_is_le(struct vhost_virtqueue *vq)
> >  {
> > /* Note for legacy virtio: user_be is initialized at reset time
> >  * according to the host endianness. If userspace does not set an  
> 
> Same thing really. I'd rather add "reset_is_le".
> 
> > @@ -91,7 +96,7 @@ static void vhost_init_is_le(struct vhost_virtqueue *vq)
> > vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be;
> >  }
> >  #else
> > -static void vhost_vq_reset_user_be(struct vhost_virtqueue *vq)
> > +static void vhost_disable_user_be(struct vhost_virtqueue *vq)
> >  {
> >  }
> >  
> > @@ -106,13 +111,18 @@ static long vhost_get_vring_endian(struct 
> > vhost_virtqueue *vq, u32 idx,
> > return -ENOIOCTLCMD;
> >  }
> >  
> > -static void vhost_init_is_le(struct vhost_virtqueue *vq)
> > +static void vhost_enable_is_le(struct vhost_virtqueue *vq)
> >  {
> > 

Re: [PATCH 2/2] vhost: disentangle vring endianness stuff from the core code

2016-02-10 Thread Greg Kurz
On Wed, 10 Feb 2016 13:48:09 +0200
"Michael S. Tsirkin" <m...@redhat.com> wrote:

> On Wed, Jan 13, 2016 at 06:09:47PM +0100, Greg Kurz wrote:
> > The way vring endianness is being handled currently obfuscates
> > the code in vhost_init_used().
> > 
> > This patch tries to fix that by doing the following:
> > - move the the code that adjusts endianness to a dedicated helper
> > - export this helper so that backends explicitely call it
> > 
> > No behaviour change.
> > 
> > Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com>
> > ---
> >  drivers/vhost/net.c   |3 +++
> >  drivers/vhost/scsi.c  |3 +++
> >  drivers/vhost/test.c  |2 ++
> >  drivers/vhost/vhost.c |   16 +++-
> >  drivers/vhost/vhost.h |1 +
> >  5 files changed, 20 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 9eda69e40678..df01c939cd00 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -917,6 +917,9 @@ static long vhost_net_set_backend(struct vhost_net *n, 
> > unsigned index, int fd)
> >  
> > vhost_net_disable_vq(n, vq);
> > vq->private_data = sock;
> > +
> > +   vhost_adjust_vring_endian(vq);
> > +
> > r = vhost_init_used(vq);
> > if (r)
> > goto err_used;  
> 
> 
> This is in fact a bug in existing code: if vhost_init_used
> fails, it preferably should not have side-effects.
> It's best to update it last thing.
> 

I'm afraid we can't because the following path needs the
vring endianness:

vhost_init_used()->vhost_update_used_flags()->cpu_to_vhost16()

But you are right, there is a bug: we should rollback if vhost_init_used()
fails. Something like below:

 err_used:
vq->private_data = oldsock;
vhost_net_enable_vq(n, vq);
+   vhost_adjust_vring_endian(vq);
if (ubufs)
vhost_net_ubuf_put_wait_and_free(ubufs);
 err_ubufs:

> > diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> > index 29cfc57d496e..5a8363bfcb74 100644
> > --- a/drivers/vhost/scsi.c
> > +++ b/drivers/vhost/scsi.c
> > @@ -1274,6 +1274,9 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
> > vq = >vqs[i].vq;
> > mutex_lock(>mutex);
> > vq->private_data = vs_tpg;
> > +
> > +   vhost_adjust_vring_endian(vq);
> > +
> > vhost_init_used(vq);
> > mutex_unlock(>mutex);
> > }
> > diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
> > index f2882ac98726..75e3e0e9f5a8 100644
> > --- a/drivers/vhost/test.c
> > +++ b/drivers/vhost/test.c
> > @@ -196,6 +196,8 @@ static long vhost_test_run(struct vhost_test *n, int 
> > test)
> > oldpriv = vq->private_data;
> > vq->private_data = priv;
> >  
> > +   vhost_adjust_vring_endian(vq);
> > +
> > r = vhost_init_used(>vqs[index]);
> >  
> > mutex_unlock(>mutex);
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index e02e06755ab7..b0a00340309e 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -123,6 +123,15 @@ static void vhost_disable_is_le(struct vhost_virtqueue 
> > *vq)
> > vq->is_le = virtio_legacy_is_little_endian();
> >  }
> >  
> > +void vhost_adjust_vring_endian(struct vhost_virtqueue *vq)
> > +{
> > +   if (!vq->private_data)
> > +   vhost_disable_is_le(vq);
> > +   else
> > +   vhost_enable_is_le(vq);
> > +}
> > +EXPORT_SYMBOL_GPL(vhost_adjust_vring_endian);
> > +
> >  static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
> > poll_table *pt)
> >  {  
> 
> I'd prefer "vhost_update_is_le" here. "endian" might also mean
> "user_be". But see below pls.
> 

Ok, I will rename if this patch survives the review.

> 
> > @@ -1166,12 +1175,9 @@ int vhost_init_used(struct vhost_virtqueue *vq)
> >  {
> > __virtio16 last_used_idx;
> > int r;
> > -   if (!vq->private_data) {
> > -   vhost_disable_is_le(vq);
> > -   return 0;
> > -   }
> >  
> > -   vhost_enable_is_le(vq);
> > +   if (!vq->private_data)
> > +   return 0;
> >  
> > r = vhost_update_used_flags(vq);
> > if (r)  
> 
> L

Re: [PATCH 0/2] vhost: cross-endian code cleanup

2016-01-27 Thread Greg Kurz
On Wed, 13 Jan 2016 18:09:34 +0100
Greg Kurz <gk...@linux.vnet.ibm.com> wrote:

> This series is a respin of the following patch:
> 
> http://patchwork.ozlabs.org/patch/565921/
> 
> Patch 1 is preliminary work: it gives better names to the helpers that are
> involved in cross-endian support.
> 
> Patch 2 is actually a v2 of the original patch. All devices now call a
> helper in the generic code, which DTRT according to vq->private_data, as
> suggested by Michael.
> 

Hi Michael,

This is just a friendly reminder for this series, which was
reviewed by Cornelia already.

Thanks.

--
Greg

> ---
> 
> Greg Kurz (2):
>   vhost: helpers to enable/disable vring endianness
>   vhost: disentangle vring endianness stuff from the core code
> 
> 
>  drivers/vhost/net.c   |3 +++
>  drivers/vhost/scsi.c  |3 +++
>  drivers/vhost/test.c  |2 ++
>  drivers/vhost/vhost.c |   40 
>  drivers/vhost/vhost.h |1 +
>  5 files changed, 37 insertions(+), 12 deletions(-)
> 
> ___
> Virtualization mailing list
> virtualizat...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> 



Re: [PATCH] vhost: move is_le setup to the backend

2015-12-16 Thread Greg Kurz
On Thu, 12 Nov 2015 15:28:19 +0100
Greg Kurz <gk...@linux.vnet.ibm.com> wrote:

> On Thu, 12 Nov 2015 15:46:30 +0200
> "Michael S. Tsirkin" <m...@redhat.com> wrote:
> 
> > On Fri, Oct 30, 2015 at 12:42:35PM +0100, Greg Kurz wrote:
> > > The vq->is_le field is used to fix endianness when accessing the vring via
> > > the cpu_to_vhost16() and vhost16_to_cpu() helpers in the following cases:
> > > 
> > > 1) host is big endian and device is modern virtio
> > > 
> > > 2) host has cross-endian support and device is legacy virtio with a 
> > > different
> > >endianness than the host
> > > 
> > > Both cases rely on the VHOST_SET_FEATURES ioctl, but 2) also needs the
> > > VHOST_SET_VRING_ENDIAN ioctl to be called by userspace. Since vq->is_le
> > > is only needed when the backend is active, it was decided to set it at
> > > backend start.
> > > 
> > > This is currently done in vhost_init_used()->vhost_init_is_le() but it
> > > obfuscates the core vhost code. This patch moves the is_le setup to a
> > > dedicated function that is called from the backend code.
> > > 
> > > Note vhost_net is the only backend that can pass vq->private_data == NULL 
> > > to
> > > vhost_init_used(), hence the "if (sock)" branch.
> > > 
> > > No behaviour change.
> > > 
> > > Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com>
> > 
> > I plan to look at this next week, busy with QEMU 2.5 now.
> > 
> 
> I don't have any deadline for this since this is only a cleanup tentative.
> 
> Thanks.
> 

... but ping anyway since it was a month ago :)

Cheers.

--
Greg

> > > ---
> > >  drivers/vhost/net.c   |6 ++
> > >  drivers/vhost/scsi.c  |3 +++
> > >  drivers/vhost/test.c  |2 ++
> > >  drivers/vhost/vhost.c |   12 +++-
> > >  drivers/vhost/vhost.h |1 +
> > >  5 files changed, 19 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > index 9eda69e40678..d6319cb2664c 100644
> > > --- a/drivers/vhost/net.c
> > > +++ b/drivers/vhost/net.c
> > > @@ -917,6 +917,12 @@ static long vhost_net_set_backend(struct vhost_net 
> > > *n, unsigned index, int fd)
> > >  
> > >   vhost_net_disable_vq(n, vq);
> > >   vq->private_data = sock;
> > > +
> > > + if (sock)
> > > + vhost_set_is_le(vq);
> > > + else
> > > + vq->is_le = virtio_legacy_is_little_endian();
> > > +
> > >   r = vhost_init_used(vq);
> > >   if (r)
> > >   goto err_used;
> > > diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> > > index e25a23692822..e2644a301fa5 100644
> > > --- a/drivers/vhost/scsi.c
> > > +++ b/drivers/vhost/scsi.c
> > > @@ -1276,6 +1276,9 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
> > >   vq = >vqs[i].vq;
> > >   mutex_lock(>mutex);
> > >   vq->private_data = vs_tpg;
> > > +
> > > + vhost_set_is_le(vq);
> > > +
> > >   vhost_init_used(vq);
> > >   mutex_unlock(>mutex);
> > >   }
> > > diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
> > > index f2882ac98726..b1c7df502211 100644
> > > --- a/drivers/vhost/test.c
> > > +++ b/drivers/vhost/test.c
> > > @@ -196,6 +196,8 @@ static long vhost_test_run(struct vhost_test *n, int 
> > > test)
> > >   oldpriv = vq->private_data;
> > >   vq->private_data = priv;
> > >  
> > > + vhost_set_is_le(vq);
> > > +
> > >   r = vhost_init_used(>vqs[index]);
> > >  
> > >   mutex_unlock(>mutex);
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index eec2f11809ff..6be863dcbd13 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -113,6 +113,12 @@ static void vhost_init_is_le(struct vhost_virtqueue 
> > > *vq)
> > >  }
> > >  #endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */
> > >  
> > > +void vhost_set_is_le(struct vhost_virtqueue *vq)
> > > +{
> > > + vhost_init_is_le(vq);
> > > +}
> > > +EXPORT_SYMBOL_GPL(vhost_

Re: [PATCH] vhost: move is_le setup to the backend

2015-11-12 Thread Greg Kurz
On Fri, 30 Oct 2015 12:42:35 +0100
Greg Kurz <gk...@linux.vnet.ibm.com> wrote:

> The vq->is_le field is used to fix endianness when accessing the vring via
> the cpu_to_vhost16() and vhost16_to_cpu() helpers in the following cases:
> 
> 1) host is big endian and device is modern virtio
> 
> 2) host has cross-endian support and device is legacy virtio with a different
>endianness than the host
> 
> Both cases rely on the VHOST_SET_FEATURES ioctl, but 2) also needs the
> VHOST_SET_VRING_ENDIAN ioctl to be called by userspace. Since vq->is_le
> is only needed when the backend is active, it was decided to set it at
> backend start.
> 
> This is currently done in vhost_init_used()->vhost_init_is_le() but it
> obfuscates the core vhost code. This patch moves the is_le setup to a
> dedicated function that is called from the backend code.
> 
> Note vhost_net is the only backend that can pass vq->private_data == NULL to
> vhost_init_used(), hence the "if (sock)" branch.
> 
> No behaviour change.
> 
> Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com>
> ---

Ping ?

>  drivers/vhost/net.c   |6 ++
>  drivers/vhost/scsi.c  |3 +++
>  drivers/vhost/test.c  |2 ++
>  drivers/vhost/vhost.c |   12 +++-
>  drivers/vhost/vhost.h |1 +
>  5 files changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 9eda69e40678..d6319cb2664c 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -917,6 +917,12 @@ static long vhost_net_set_backend(struct vhost_net *n, 
> unsigned index, int fd)
> 
>   vhost_net_disable_vq(n, vq);
>   vq->private_data = sock;
> +
> + if (sock)
> + vhost_set_is_le(vq);
> + else
> + vq->is_le = virtio_legacy_is_little_endian();
> +
>   r = vhost_init_used(vq);
>   if (r)
>   goto err_used;
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index e25a23692822..e2644a301fa5 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -1276,6 +1276,9 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
>   vq = >vqs[i].vq;
>   mutex_lock(>mutex);
>   vq->private_data = vs_tpg;
> +
> + vhost_set_is_le(vq);
> +
>   vhost_init_used(vq);
>   mutex_unlock(>mutex);
>   }
> diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
> index f2882ac98726..b1c7df502211 100644
> --- a/drivers/vhost/test.c
> +++ b/drivers/vhost/test.c
> @@ -196,6 +196,8 @@ static long vhost_test_run(struct vhost_test *n, int test)
>   oldpriv = vq->private_data;
>   vq->private_data = priv;
> 
> + vhost_set_is_le(vq);
> +
>   r = vhost_init_used(>vqs[index]);
> 
>   mutex_unlock(>mutex);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index eec2f11809ff..6be863dcbd13 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -113,6 +113,12 @@ static void vhost_init_is_le(struct vhost_virtqueue *vq)
>  }
>  #endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */
> 
> +void vhost_set_is_le(struct vhost_virtqueue *vq)
> +{
> + vhost_init_is_le(vq);
> +}
> +EXPORT_SYMBOL_GPL(vhost_set_is_le);
> +
>  static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
>   poll_table *pt)
>  {
> @@ -1156,12 +1162,8 @@ int vhost_init_used(struct vhost_virtqueue *vq)
>  {
>   __virtio16 last_used_idx;
>   int r;
> - if (!vq->private_data) {
> - vq->is_le = virtio_legacy_is_little_endian();
> + if (!vq->private_data)
>   return 0;
> - }
> -
> - vhost_init_is_le(vq);
> 
>   r = vhost_update_used_flags(vq);
>   if (r)
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 4772862b71a7..8a62041959fe 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -162,6 +162,7 @@ bool vhost_enable_notify(struct vhost_dev *, struct 
> vhost_virtqueue *);
> 
>  int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
>   unsigned int log_num, u64 len);
> +void vhost_set_is_le(struct vhost_virtqueue *vq);
> 
>  #define vq_err(vq, fmt, ...) do {  \
>   pr_debug(pr_fmt(fmt), ##__VA_ARGS__);   \
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vhost: move is_le setup to the backend

2015-11-12 Thread Greg Kurz
On Thu, 12 Nov 2015 15:46:30 +0200
"Michael S. Tsirkin" <m...@redhat.com> wrote:

> On Fri, Oct 30, 2015 at 12:42:35PM +0100, Greg Kurz wrote:
> > The vq->is_le field is used to fix endianness when accessing the vring via
> > the cpu_to_vhost16() and vhost16_to_cpu() helpers in the following cases:
> > 
> > 1) host is big endian and device is modern virtio
> > 
> > 2) host has cross-endian support and device is legacy virtio with a 
> > different
> >endianness than the host
> > 
> > Both cases rely on the VHOST_SET_FEATURES ioctl, but 2) also needs the
> > VHOST_SET_VRING_ENDIAN ioctl to be called by userspace. Since vq->is_le
> > is only needed when the backend is active, it was decided to set it at
> > backend start.
> > 
> > This is currently done in vhost_init_used()->vhost_init_is_le() but it
> > obfuscates the core vhost code. This patch moves the is_le setup to a
> > dedicated function that is called from the backend code.
> > 
> > Note vhost_net is the only backend that can pass vq->private_data == NULL to
> > vhost_init_used(), hence the "if (sock)" branch.
> > 
> > No behaviour change.
> > 
> > Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com>
> 
> I plan to look at this next week, busy with QEMU 2.5 now.
> 

I don't have any deadline for this since this is only a cleanup tentative.

Thanks.

> > ---
> >  drivers/vhost/net.c   |6 ++
> >  drivers/vhost/scsi.c  |3 +++
> >  drivers/vhost/test.c  |2 ++
> >  drivers/vhost/vhost.c |   12 +++-
> >  drivers/vhost/vhost.h |1 +
> >  5 files changed, 19 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 9eda69e40678..d6319cb2664c 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -917,6 +917,12 @@ static long vhost_net_set_backend(struct vhost_net *n, 
> > unsigned index, int fd)
> >  
> > vhost_net_disable_vq(n, vq);
> > vq->private_data = sock;
> > +
> > +   if (sock)
> > +   vhost_set_is_le(vq);
> > +   else
> > +   vq->is_le = virtio_legacy_is_little_endian();
> > +
> > r = vhost_init_used(vq);
> > if (r)
> > goto err_used;
> > diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> > index e25a23692822..e2644a301fa5 100644
> > --- a/drivers/vhost/scsi.c
> > +++ b/drivers/vhost/scsi.c
> > @@ -1276,6 +1276,9 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
> > vq = >vqs[i].vq;
> > mutex_lock(>mutex);
> > vq->private_data = vs_tpg;
> > +
> > +   vhost_set_is_le(vq);
> > +
> > vhost_init_used(vq);
> > mutex_unlock(>mutex);
> > }
> > diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
> > index f2882ac98726..b1c7df502211 100644
> > --- a/drivers/vhost/test.c
> > +++ b/drivers/vhost/test.c
> > @@ -196,6 +196,8 @@ static long vhost_test_run(struct vhost_test *n, int 
> > test)
> > oldpriv = vq->private_data;
> > vq->private_data = priv;
> >  
> > +   vhost_set_is_le(vq);
> > +
> > r = vhost_init_used(>vqs[index]);
> >  
> > mutex_unlock(>mutex);
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index eec2f11809ff..6be863dcbd13 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -113,6 +113,12 @@ static void vhost_init_is_le(struct vhost_virtqueue 
> > *vq)
> >  }
> >  #endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */
> >  
> > +void vhost_set_is_le(struct vhost_virtqueue *vq)
> > +{
> > +   vhost_init_is_le(vq);
> > +}
> > +EXPORT_SYMBOL_GPL(vhost_set_is_le);
> > +
> >  static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
> > poll_table *pt)
> >  {
> > @@ -1156,12 +1162,8 @@ int vhost_init_used(struct vhost_virtqueue *vq)
> >  {
> > __virtio16 last_used_idx;
> > int r;
> > -   if (!vq->private_data) {
> > -   vq->is_le = virtio_legacy_is_little_endian();
> > +   if (!vq->private_data)
> > return 0;
> > -   }
> > -
> > -   vhost_init_is_le(vq);
> >  
> > r = vhost_update_used_flags(vq);
> > i

[PATCH] vhost: move is_le setup to the backend

2015-10-30 Thread Greg Kurz
The vq->is_le field is used to fix endianness when accessing the vring via
the cpu_to_vhost16() and vhost16_to_cpu() helpers in the following cases:

1) host is big endian and device is modern virtio

2) host has cross-endian support and device is legacy virtio with a different
   endianness than the host

Both cases rely on the VHOST_SET_FEATURES ioctl, but 2) also needs the
VHOST_SET_VRING_ENDIAN ioctl to be called by userspace. Since vq->is_le
is only needed when the backend is active, it was decided to set it at
backend start.

This is currently done in vhost_init_used()->vhost_init_is_le() but it
obfuscates the core vhost code. This patch moves the is_le setup to a
dedicated function that is called from the backend code.

Note vhost_net is the only backend that can pass vq->private_data == NULL to
vhost_init_used(), hence the "if (sock)" branch.

No behaviour change.

Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com>
---
 drivers/vhost/net.c   |6 ++
 drivers/vhost/scsi.c  |3 +++
 drivers/vhost/test.c  |2 ++
 drivers/vhost/vhost.c |   12 +++-
 drivers/vhost/vhost.h |1 +
 5 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 9eda69e40678..d6319cb2664c 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -917,6 +917,12 @@ static long vhost_net_set_backend(struct vhost_net *n, 
unsigned index, int fd)
 
vhost_net_disable_vq(n, vq);
vq->private_data = sock;
+
+   if (sock)
+   vhost_set_is_le(vq);
+   else
+   vq->is_le = virtio_legacy_is_little_endian();
+
r = vhost_init_used(vq);
if (r)
goto err_used;
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index e25a23692822..e2644a301fa5 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1276,6 +1276,9 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
vq = >vqs[i].vq;
mutex_lock(>mutex);
vq->private_data = vs_tpg;
+
+   vhost_set_is_le(vq);
+
vhost_init_used(vq);
mutex_unlock(>mutex);
}
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index f2882ac98726..b1c7df502211 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -196,6 +196,8 @@ static long vhost_test_run(struct vhost_test *n, int test)
oldpriv = vq->private_data;
vq->private_data = priv;
 
+   vhost_set_is_le(vq);
+
r = vhost_init_used(>vqs[index]);
 
mutex_unlock(>mutex);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index eec2f11809ff..6be863dcbd13 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -113,6 +113,12 @@ static void vhost_init_is_le(struct vhost_virtqueue *vq)
 }
 #endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */
 
+void vhost_set_is_le(struct vhost_virtqueue *vq)
+{
+   vhost_init_is_le(vq);
+}
+EXPORT_SYMBOL_GPL(vhost_set_is_le);
+
 static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
poll_table *pt)
 {
@@ -1156,12 +1162,8 @@ int vhost_init_used(struct vhost_virtqueue *vq)
 {
__virtio16 last_used_idx;
int r;
-   if (!vq->private_data) {
-   vq->is_le = virtio_legacy_is_little_endian();
+   if (!vq->private_data)
return 0;
-   }
-
-   vhost_init_is_le(vq);
 
r = vhost_update_used_flags(vq);
if (r)
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 4772862b71a7..8a62041959fe 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -162,6 +162,7 @@ bool vhost_enable_notify(struct vhost_dev *, struct 
vhost_virtqueue *);
 
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
unsigned int log_num, u64 len);
+void vhost_set_is_le(struct vhost_virtqueue *vq);
 
 #define vq_err(vq, fmt, ...) do {  \
pr_debug(pr_fmt(fmt), ##__VA_ARGS__);   \

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vhost: fix performance on LE hosts

2015-10-27 Thread Greg Kurz
On Tue, 27 Oct 2015 11:37:39 +0200
"Michael S. Tsirkin" <m...@redhat.com> wrote:
> commit 2751c9882b947292fcfb084c4f604e01724af804 ("vhost: cross-endian
> support for legacy devices") introduced a minor regression: even with
> cross-endian disabled, and even on LE host, vhost_is_little_endian is
> checking is_le flag so there's always a branch.
> 
> To fix, simply check virtio_legacy_is_little_endian first.
> 
> Cc: Greg Kurz <gk...@linux.vnet.ibm.com>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

Oops my bad for this oversight...

Reviewed-by: Greg Kurz <gk...@linux.vnet.ibm.com>

> ---
>  drivers/vhost/vhost.h | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 4772862..d3f7674 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -183,10 +183,17 @@ static inline bool vhost_has_feature(struct 
> vhost_virtqueue *vq, int bit)
>   return vq->acked_features & (1ULL << bit);
>  }
> 
> +#ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY
>  static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq)
>  {
>   return vq->is_le;
>  }
> +#else
> +static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq)
> +{
> + return virtio_legacy_is_little_endian() || vq->is_le;
> +}
> +#endif
> 
>  /* Memory accessors */
>  static inline u16 vhost16_to_cpu(struct vhost_virtqueue *vq, __virtio16 val)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Fix AF_PACKET ABI breakage in 4.2

2015-09-25 Thread Greg Kurz
On Thu, 24 Sep 2015 12:50:59 +0300
"Michael S. Tsirkin" <m...@redhat.com> wrote:

> On Thu, Sep 24, 2015 at 09:25:45AM +0200, Greg Kurz wrote:
> > On Wed, 23 Sep 2015 19:45:08 +0100
> > David Woodhouse <dw...@infradead.org> wrote:
> > 
> > > Commit 7d82410950aa ("virtio: add explicit big-endian support to memory
> > > accessors") accidentally changed the virtio_net header used by
> > > AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian.
> > > 
> > 
> > Hi David,
> > 
> > Oops my bad... I obviously overlooked this one when adding cross-endian
> > support.
> > 
> > > Since virtio_legacy_is_little_endian() is a very long identifier,
> > > define a VIO_LE macro and use that throughout the code instead of the
> > 
> > VIO usually refers to virtual IO adapters for the PowerPC pSeries platform.
> > 
> > > hard-coded 'false' for little-endian.
> > > 
> > 
> > What about introducing dedicated accessors as it is done in many other
> > locations where we do virtio byteswap ? Something like:
> > 
> > static inline bool packet_is_little_endian(void)
> > {
> > return virtio_legacy_is_little_endian();
> > }
> > 
> > static inline u16 packet16_to_cpu(__virtio16 val)
> > {
> > return __virtio16_to_cpu(packet_is_little_endian(), val);
> > }
> > 
> > static inline __virtio16 cpu_to_packet16(u16 val)
> > {
> > return __cpu_to_virtio16(packet_is_little_endian(), val);
> > }
> > 
> > It results in prettier code IMHO. Have a look at drivers/net/tun.c or
> > drivers/vhost/vhost.c.
> > 
> > > This restores the ABI to match 4.1 and earlier kernels, and makes my
> > > test program work again.
> > > 
> > 
> > BTW, there is still work to do if we want to support cross-endian legacy or
> > virtio 1 on a big endian arch...
> > 
> > Cheers.
> > 
> > --
> > Greg
> 
> It seems the API that we have is a confusing one.
> 
> virtio endian-ness is either native or little, depending on a flag, so
> __virtio16_to_cpu seems to mean "either native to cpu or little to cpu
> depending on flag".
> 
> It used to be like that, but not anymore.
> 
> This leads to all kind of bugs.
> 
> For example, I have only now realized vhost_is_little_endian isn't a
> constant on LE hosts if cross-endian support is not compiled.
> 
> I think we need to fix it, but also think about a better API.
> 

Your original API is good and works well for all the callers that
don't care for cross-endian support.

I guess we'd rather move the cross-endian burden to the few callers who
need it, i.e. tun, macvtap and vhost when cross-endian is compiled.

More or less like this:

/* Original API : either little-endian or native */
static inline u16 __virtio16_to_cpu(bool little_endian, __virtio16 val)
{
if (little_endian)
return le16_to_cpu((__force __le16)val);
else
return (__force u16)val;
}

#ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY

static inline u16 vhost16_to_cpu(struct vhost_virtqueue *vq, __virtio16 val)
{
/* little-endian because of virtio 1 */
if (vhost_has_feature(vq, VIRTIO_F_VERSION_1))
return __virtio16_to_cpu(true, val);

#ifdef __LITTLE_ENDIAN
/* native little-endian */
return (__force u16)val;
#else
/* native big-endian */
return be16_to_cpu((__force __be16)val);
#endif
}

#else

static inline u16 vhost16_to_cpu(struct vhost_virtqueue *vq, __virtio16 val)
{
#ifdef __LITTLE_ENDIAN
/* fast path for little-endian host */
return __virtio16_to_cpu(true, val);
#else
/* slow path for big-endian host */
return __virtio16_to_cpu(vhost_has_feature(vq, VIRTIO_F_VERSION_1), 
val);
#endif
}

#endif

Does it make sense ?

> 
> > > Signed-off-by: David Woodhouse <david.woodho...@intel.com>
> > > ---
> > > On Wed, 2015-09-23 at 11:09 -0700, David Miller wrote:
> > > > > +#define VIO_LE virtio_legacy_is_little_endian()
> > > > 
> > > > When you define a shorthand macro, the defines to a function call,
> > > > make the macro have parenthesis too.
> > > 
> > > In which case I suppose it also wants to be lower-case. Although
> > > "function call" is a bit strong since it's effectively just a constant.
> > > I'm still wondering if it'd be nicer just to use (__force u16) instead.
> > > 
> > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > > index 7b8e39a..aa4b15c 100644
> > > --- a

Re: [PATCH v2] Fix AF_PACKET ABI breakage in 4.2

2015-09-24 Thread Greg Kurz
On Wed, 23 Sep 2015 19:45:08 +0100
David Woodhouse  wrote:

> Commit 7d82410950aa ("virtio: add explicit big-endian support to memory
> accessors") accidentally changed the virtio_net header used by
> AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian.
> 

Hi David,

Oops my bad... I obviously overlooked this one when adding cross-endian
support.

> Since virtio_legacy_is_little_endian() is a very long identifier,
> define a VIO_LE macro and use that throughout the code instead of the

VIO usually refers to virtual IO adapters for the PowerPC pSeries platform.

> hard-coded 'false' for little-endian.
> 

What about introducing dedicated accessors as it is done in many other
locations where we do virtio byteswap ? Something like:

static inline bool packet_is_little_endian(void)
{
return virtio_legacy_is_little_endian();
}

static inline u16 packet16_to_cpu(__virtio16 val)
{
return __virtio16_to_cpu(packet_is_little_endian(), val);
}

static inline __virtio16 cpu_to_packet16(u16 val)
{
return __cpu_to_virtio16(packet_is_little_endian(), val);
}

It results in prettier code IMHO. Have a look at drivers/net/tun.c or
drivers/vhost/vhost.c.

> This restores the ABI to match 4.1 and earlier kernels, and makes my
> test program work again.
> 

BTW, there is still work to do if we want to support cross-endian legacy or
virtio 1 on a big endian arch...

Cheers.

--
Greg

> Signed-off-by: David Woodhouse 
> ---
> On Wed, 2015-09-23 at 11:09 -0700, David Miller wrote:
> > > +#define VIO_LE virtio_legacy_is_little_endian()
> > 
> > When you define a shorthand macro, the defines to a function call,
> > make the macro have parenthesis too.
> 
> In which case I suppose it also wants to be lower-case. Although
> "function call" is a bit strong since it's effectively just a constant.
> I'm still wondering if it'd be nicer just to use (__force u16) instead.
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 7b8e39a..aa4b15c 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -230,6 +230,8 @@ struct packet_skb_cb {
>   } sa;
>  };
>  
> +#define vio_le() virtio_legacy_is_little_endian()
> +
>  #define PACKET_SKB_CB(__skb) ((struct packet_skb_cb *)((__skb)->cb))
>  
>  #define GET_PBDQC_FROM_RB(x) ((struct tpacket_kbdq_core *)(&(x)->prb_bdqc))
> @@ -2680,15 +2682,15 @@ static int packet_snd(struct socket *sock, struct 
> msghdr *msg, size_t len)
>   goto out_unlock;
>  
>   if ((vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
> - (__virtio16_to_cpu(false, vnet_hdr.csum_start) +
> -  __virtio16_to_cpu(false, vnet_hdr.csum_offset) + 2 >
> -   __virtio16_to_cpu(false, vnet_hdr.hdr_len)))
> - vnet_hdr.hdr_len = __cpu_to_virtio16(false,
> -  __virtio16_to_cpu(false, vnet_hdr.csum_start) +
> - __virtio16_to_cpu(false, vnet_hdr.csum_offset) 
> + 2);
> + (__virtio16_to_cpu(vio_le(), vnet_hdr.csum_start) +
> +  __virtio16_to_cpu(vio_le(), vnet_hdr.csum_offset) + 2 >
> +   __virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len)))
> + vnet_hdr.hdr_len = __cpu_to_virtio16(vio_le(),
> +  __virtio16_to_cpu(vio_le(), 
> vnet_hdr.csum_start) +
> + __virtio16_to_cpu(vio_le(), 
> vnet_hdr.csum_offset) + 2);
>  
>   err = -EINVAL;
> - if (__virtio16_to_cpu(false, vnet_hdr.hdr_len) > len)
> + if (__virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len) > len)
>   goto out_unlock;
>  
>   if (vnet_hdr.gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> @@ -2731,7 +2733,7 @@ static int packet_snd(struct socket *sock, struct 
> msghdr *msg, size_t len)
>   hlen = LL_RESERVED_SPACE(dev);
>   tlen = dev->needed_tailroom;
>   skb = packet_alloc_skb(sk, hlen + tlen, hlen, len,
> -__virtio16_to_cpu(false, vnet_hdr.hdr_len),
> +__virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len),
>  msg->msg_flags & MSG_DONTWAIT, );
>   if (skb == NULL)
>   goto out_unlock;
> @@ -2778,8 +2780,8 @@ static int packet_snd(struct socket *sock, struct 
> msghdr *msg, size_t len)
>  
>   if (po->has_vnet_hdr) {
>   if (vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> - u16 s = __virtio16_to_cpu(false, vnet_hdr.csum_start);
> - u16 o = __virtio16_to_cpu(false, vnet_hdr.csum_offset);
> + u16 s = __virtio16_to_cpu(vio_le(), 
> vnet_hdr.csum_start);
> + u16 o = __virtio16_to_cpu(vio_le(), 
> vnet_hdr.csum_offset);
>   if (!skb_partial_csum_set(skb, s, o)) {
>   err = -EINVAL;
> 

Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-13 Thread Greg Kurz
On Thu,  9 Jul 2015 09:49:05 +0200
Thomas Huth th...@redhat.com wrote:

 The option for supporting cross-endianness legacy guests in
 the vhost and tun code should only be available on systems
 that support cross-endian guests.
 
 Signed-off-by: Thomas Huth th...@redhat.com

Acked-by: Greg Kurz gk...@linux.vnet.ibm.com

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PULL] virtio/vhost: cross endian support

2015-07-02 Thread Greg Kurz
On Thu, 2 Jul 2015 08:01:28 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Wed, Jul 01, 2015 at 12:02:50PM -0700, Linus Torvalds wrote:
  On Wed, Jul 1, 2015 at 2:31 AM, Michael S. Tsirkin m...@redhat.com wrote:
   virtio/vhost: cross endian support
  
  Ugh. Does this really have to be dynamic?
  
  Can't virtio do the sane thing, and just use a _fixed_ endianness?
  
  Doing a unconditional byte swap is faster and simpler than the crazy
  conditionals. That's true regardless of endianness, but gets to be
  even more so if the fixed endianness is little-endian, since BE is
  not-so-slowly fading from the world.
  
 Linus
 
 Yea, well - support for legacy BE guests on the new LE hosts is
 exactly the motivation for this.
 
 I dislike it too, but there are two redeeming properties that
 made me merge this:
 
 1.  It's a trivial amount of code: since we wrap host/guest accesses
 anyway, almost all of it is well hidden from drivers.
 
 2.  Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY -
 and when it's clear, there's zero overhead (as some point it was
 tested by compiling with and without the patches, got the same
 stripped binary).
 
 Maybe we could create a Kconfig symbol to enforce point (2): prevent
 people from enabling it e.g. on x86. I will look into this - but it can
 be done by a patch on top, so I think this can be merged as is.
 

This cross-endian *oddity* is targeting PowerPC book3s_64 processors... I
am not aware of any other users. Maybe create a symbol that would
be only selected by PPC_BOOK3S_64 ?


 Or do you know of someone using kernel with all config options enabled
 undiscriminately?
 
 Thanks,
 

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html