Re: [V9fs-developer] [PATCH] net/9p: fix error path of p9_virtio_probe
On Thu, 24 May 2018 11:10:21 +0100 Jean-Philippe Brucker <jean-philippe.bruc...@arm.com> wrote: > Currently when virtio_find_single_vq fails, we go through del_vqs which > throws a warning (Trying to free already-free IRQ). Skip del_vqs if vq > allocation failed. > > Signed-off-by: Jean-Philippe Brucker <jean-philippe.bruc...@arm.com> > --- Reviewed-by: Greg Kurz <gr...@kaod.org> > net/9p/trans_virtio.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c > index 4d0372263e5d..1c87eee522b7 100644 > --- a/net/9p/trans_virtio.c > +++ b/net/9p/trans_virtio.c > @@ -562,7 +562,7 @@ static int p9_virtio_probe(struct virtio_device *vdev) > chan->vq = virtio_find_single_vq(vdev, req_done, "requests"); > if (IS_ERR(chan->vq)) { > err = PTR_ERR(chan->vq); > - goto out_free_vq; > + goto out_free_chan; > } > chan->vq->vdev->priv = chan; > spin_lock_init(>lock); > @@ -615,6 +615,7 @@ static int p9_virtio_probe(struct virtio_device *vdev) > kfree(tag); > out_free_vq: > vdev->config->del_vqs(vdev); > +out_free_chan: > kfree(chan); > fail: > return err;
Re: [PATCH] net/9p: avoid -ERESTARTSYS leak to userspace
Hi Andrew, Thank you very much for taking care of this. Please find my answers to your remarks below. On Fri, 9 Mar 2018 14:12:52 -0800 Andrew Morton <a...@linux-foundation.org> wrote: > On Fri, 09 Mar 2018 21:41:38 +0100 Greg Kurz <gr...@kaod.org> wrote: > > > If it was interrupted by a signal, the 9p client may need to send some > > more requests to the server for cleanup before returning to userspace. > > > > To avoid such a last minute request to be interrupted right away, the > > client memorizes if a signal is pending, clears TIF_SIGPENDING, handles > > the request and calls recalc_sigpending() before returning. > > > > Unfortunately, if the transmission of this cleanup request fails for any > > reason, the transport returns an error and the client propagates it right > > away, without calling recalc_sigpending(). > > > > This ends up with -ERESTARTSYS from the initially interrupted request > > crawling up to syscall exit, with TIF_SIGPENDING cleared by the cleanup > > request. The specific signal handling code, which is responsible for > > converting -ERESTARTSYS to -EINTR is not called, and userspace receives > > the confusing errno value: > > > > open: Unknown error 512 (512) > > > > This is really hard to hit in real life. I discovered the issue while > > working on hot-unplug of a virtio-9p-pci device with an instrumented > > QEMU allowing to control request completion. > > > > Both p9_client_zc_rpc() and p9_client_rpc() functions have this buggy > > error path actually. Their code flow is a bit obscure and the best > > thing to do would probably be a full rewrite: to really ensure this > > situation of clearing TIF_SIGPENDING and returning -ERESTARTSYS can > > never happen. > > > > But given the general lack of interest for the 9p code, I won't risk > > breaking more things. So this patch simply fixes the buggy paths in > > both functions with a trivial label+goto. > > > > Thanks to Laurent Dufour for his help and suggestions on how to find > > the root cause and how to fix it. > > That's a fairly straightforward-looking bug. However the code still > looks a bit racy: > > > : if (signal_pending(current)) { > : sigpending = 1; > : clear_thread_flag(TIF_SIGPENDING); > : } else > : sigpending = 0; > : > > what happens if signal_pending(current) becomes true right here? > Depending on the transport c->trans_mod->request() could possibly return -ERESTARTSYS, and we would then recalc_sigpending() and propagate -ERESTARTSYS to the caller. > : err = c->trans_mod->request(c, req); > > > I'm surprised that the 9p client is mucking with signals at all. > Signals are a userspace IPC scheme and kernel code should instead be > using the more powerful messaging mechanisms which we've developed. > Ones which don't disrupt userspace state. > > Why is this happening? Is there some userspace/kernel interoperation > involved? > Before commit 9523feac272ccad2ad8186ba4fcc89103754de52, we used to rely on wait_event_interruptible() instead of wait_event_killable(). IIUC, the purpose of all this sigpending tweaking was just to allow subsequent cleanup related requests to be issued, without them being interrupted right away because of the initial signal. BTW the issue tentatively fixed by commit 9523feac272ccad2ad8186ba4fcc89103754de52 was more deeply investigated since then. It was caused by a bug in QEMU that got fixed, and will be available in the next release (2.12). And to cope with existing buggy QEMUs, we can assume that a -EINTR response from the server necessarily means the corresponding request has been successfully canceled. Cheers, -- Greg
[PATCH] net/9p: avoid -ERESTARTSYS leak to userspace
If it was interrupted by a signal, the 9p client may need to send some more requests to the server for cleanup before returning to userspace. To avoid such a last minute request to be interrupted right away, the client memorizes if a signal is pending, clears TIF_SIGPENDING, handles the request and calls recalc_sigpending() before returning. Unfortunately, if the transmission of this cleanup request fails for any reason, the transport returns an error and the client propagates it right away, without calling recalc_sigpending(). This ends up with -ERESTARTSYS from the initially interrupted request crawling up to syscall exit, with TIF_SIGPENDING cleared by the cleanup request. The specific signal handling code, which is responsible for converting -ERESTARTSYS to -EINTR is not called, and userspace receives the confusing errno value: open: Unknown error 512 (512) This is really hard to hit in real life. I discovered the issue while working on hot-unplug of a virtio-9p-pci device with an instrumented QEMU allowing to control request completion. Both p9_client_zc_rpc() and p9_client_rpc() functions have this buggy error path actually. Their code flow is a bit obscure and the best thing to do would probably be a full rewrite: to really ensure this situation of clearing TIF_SIGPENDING and returning -ERESTARTSYS can never happen. But given the general lack of interest for the 9p code, I won't risk breaking more things. So this patch simply fixes the buggy paths in both functions with a trivial label+goto. Thanks to Laurent Dufour for his help and suggestions on how to find the root cause and how to fix it. Signed-off-by: Greg Kurz <gr...@kaod.org> --- net/9p/client.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/9p/client.c b/net/9p/client.c index b433aff5ff13..e6cae8332e2e 100644 --- a/net/9p/client.c +++ b/net/9p/client.c @@ -769,7 +769,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) if (err < 0) { if (err != -ERESTARTSYS && err != -EFAULT) c->status = Disconnected; - goto reterr; + goto recalc_sigpending; } again: /* Wait for the response */ @@ -804,6 +804,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) if (req->status == REQ_STATUS_RCVD) err = 0; } +recalc_sigpending: if (sigpending) { spin_lock_irqsave(>sighand->siglock, flags); recalc_sigpending(); @@ -867,7 +868,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type, if (err == -EIO) c->status = Disconnected; if (err != -ERESTARTSYS) - goto reterr; + goto recalc_sigpending; } if (req->status == REQ_STATUS_ERROR) { p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err); @@ -885,6 +886,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type, if (req->status == REQ_STATUS_RCVD) err = 0; } +recalc_sigpending: if (sigpending) { spin_lock_irqsave(>sighand->siglock, flags); recalc_sigpending();
Re: [V9fs-developer] [PATCH] net/9p: avoid -ERESTARTSYS leak to userspace
Hi Al, It's been two years without any sign of life from 9p maintainers... :-\ Would you apply (or nack) this patch ? Thanks, -- Greg PS: in the case you apply it, probable Cc sta...@vger.kernel.org as well On Thu, 08 Feb 2018 18:38:49 +0100 Greg Kurz <gr...@kaod.org> wrote: > If it was interrupted by a signal, the 9p client may need to send some > more requests to the server for cleanup before returning to userspace. > > To avoid such a last minute request to be interrupted right away, the > client memorizes if a signal is pending, clear TIF_SIGPENDING, handle > the request and call recalc_sigpending() before returning. > > Unfortunately, if the transmission of this cleanup request fails for any > reason, the transport returns an error and the client propagates it right > away, without calling recalc_sigpending(). > > This ends up with -ERESTARTSYS from the initially interrupted request > crawling up to syscall exit, with TIF_SIGPENDING cleared by the cleanup > request. The specific signal handling code, which is responsible for > converting -ERESTARTSYS to -EINTR is not called, and userspace receives > the confusing errno value: > > open: Unknown error 512 (512) > > This is really hard to hit in real life. I discovered the issue while > working on hot-unplug of a virtio-9p-pci device with an instrumented > QEMU allowing to control request completion. > > Both p9_client_zc_rpc() and p9_client_rpc() functions have this buggy > error path actually. Their code flow is a bit obscure and the best > thing to do would probably be a full rewrite: to really ensure this > situation of clearing TIF_SIGPENDING and returning -ERESTARTSYS can > never happen. > > But given the general lack of interest for the 9p code, I won't risk > breaking more things. So this patch simply fix the buggy paths in both > functions with a trivial label+goto. > > Thanks to Laurent Dufour for his help and suggestions on how to find > the root cause and how to fix it. > > Signed-off-by: Greg Kurz <gr...@kaod.org> > --- > net/9p/client.c |6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/net/9p/client.c b/net/9p/client.c > index 4c8cf9c1631a..5154eaf19fff 100644 > --- a/net/9p/client.c > +++ b/net/9p/client.c > @@ -769,7 +769,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const > char *fmt, ...) > if (err < 0) { > if (err != -ERESTARTSYS && err != -EFAULT) > c->status = Disconnected; > - goto reterr; > + goto recalc_sigpending; > } > again: > /* Wait for the response */ > @@ -804,6 +804,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const > char *fmt, ...) > if (req->status == REQ_STATUS_RCVD) > err = 0; > } > +recalc_sigpending: > if (sigpending) { > spin_lock_irqsave(>sighand->siglock, flags); > recalc_sigpending(); > @@ -867,7 +868,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client > *c, int8_t type, > if (err == -EIO) > c->status = Disconnected; > if (err != -ERESTARTSYS) > - goto reterr; > + goto recalc_sigpending; > } > if (req->status == REQ_STATUS_ERROR) { > p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err); > @@ -885,6 +886,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client > *c, int8_t type, > if (req->status == REQ_STATUS_RCVD) > err = 0; > } > +recalc_sigpending: > if (sigpending) { > spin_lock_irqsave(>sighand->siglock, flags); > recalc_sigpending(); > > > -- > Check out the vibrant tech community on one of the world's most > engaging tech sites, Slashdot.org! http://sdm.link/slashdot > ___ > V9fs-developer mailing list > v9fs-develo...@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/v9fs-developer
Re: [V9fs-developer] [PATCH] 9p/trans_virtio: discard zero-length reply
Ping ? Michael, Since this is virtio code and you have acked the QEMU part of the fix already, would you be kind enough to take this through your tree ? Cheers, -- Greg On Mon, 22 Jan 2018 22:02:05 +0100 Greg Kurz <gr...@kaod.org> wrote: > When a 9p request is successfully flushed, the server is expected to just > mark it as used without sending a 9p reply (ie, without writing data into > the buffer). In this case, virtqueue_get_buf() will return len == 0 and > we must not report a REQ_STATUS_RCVD status to the client, otherwise the > client will erroneously assume the request has not been flushed. > > Signed-off-by: Greg Kurz <gr...@kaod.org> > --- > net/9p/trans_virtio.c |3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c > index 0845aad4ba51..ca08c72ef4de 100644 > --- a/net/9p/trans_virtio.c > +++ b/net/9p/trans_virtio.c > @@ -171,7 +171,8 @@ static void req_done(struct virtqueue *vq) > spin_unlock_irqrestore(>lock, flags); > /* Wakeup if anyone waiting for VirtIO ring space. */ > wake_up(chan->vc_wq); > - p9_client_cb(chan->client, req, REQ_STATUS_RCVD); > + if (len) > + p9_client_cb(chan->client, req, REQ_STATUS_RCVD); > } > } > > > > -- > Check out the vibrant tech community on one of the world's most > engaging tech sites, Slashdot.org! http://sdm.link/slashdot > ___ > V9fs-developer mailing list > v9fs-develo...@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/v9fs-developer
[PATCH] net/9p: avoid -ERESTARTSYS leak to userspace
If it was interrupted by a signal, the 9p client may need to send some more requests to the server for cleanup before returning to userspace. To avoid such a last minute request to be interrupted right away, the client memorizes if a signal is pending, clear TIF_SIGPENDING, handle the request and call recalc_sigpending() before returning. Unfortunately, if the transmission of this cleanup request fails for any reason, the transport returns an error and the client propagates it right away, without calling recalc_sigpending(). This ends up with -ERESTARTSYS from the initially interrupted request crawling up to syscall exit, with TIF_SIGPENDING cleared by the cleanup request. The specific signal handling code, which is responsible for converting -ERESTARTSYS to -EINTR is not called, and userspace receives the confusing errno value: open: Unknown error 512 (512) This is really hard to hit in real life. I discovered the issue while working on hot-unplug of a virtio-9p-pci device with an instrumented QEMU allowing to control request completion. Both p9_client_zc_rpc() and p9_client_rpc() functions have this buggy error path actually. Their code flow is a bit obscure and the best thing to do would probably be a full rewrite: to really ensure this situation of clearing TIF_SIGPENDING and returning -ERESTARTSYS can never happen. But given the general lack of interest for the 9p code, I won't risk breaking more things. So this patch simply fix the buggy paths in both functions with a trivial label+goto. Thanks to Laurent Dufour for his help and suggestions on how to find the root cause and how to fix it. Signed-off-by: Greg Kurz <gr...@kaod.org> --- net/9p/client.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/9p/client.c b/net/9p/client.c index 4c8cf9c1631a..5154eaf19fff 100644 --- a/net/9p/client.c +++ b/net/9p/client.c @@ -769,7 +769,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) if (err < 0) { if (err != -ERESTARTSYS && err != -EFAULT) c->status = Disconnected; - goto reterr; + goto recalc_sigpending; } again: /* Wait for the response */ @@ -804,6 +804,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) if (req->status == REQ_STATUS_RCVD) err = 0; } +recalc_sigpending: if (sigpending) { spin_lock_irqsave(>sighand->siglock, flags); recalc_sigpending(); @@ -867,7 +868,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type, if (err == -EIO) c->status = Disconnected; if (err != -ERESTARTSYS) - goto reterr; + goto recalc_sigpending; } if (req->status == REQ_STATUS_ERROR) { p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err); @@ -885,6 +886,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type, if (req->status == REQ_STATUS_RCVD) err = 0; } +recalc_sigpending: if (sigpending) { spin_lock_irqsave(>sighand->siglock, flags); recalc_sigpending();
[PATCH] 9p/trans_virtio: discard zero-length reply
When a 9p request is successfully flushed, the server is expected to just mark it as used without sending a 9p reply (ie, without writing data into the buffer). In this case, virtqueue_get_buf() will return len == 0 and we must not report a REQ_STATUS_RCVD status to the client, otherwise the client will erroneously assume the request has not been flushed. Signed-off-by: Greg Kurz <gr...@kaod.org> --- net/9p/trans_virtio.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c index 0845aad4ba51..ca08c72ef4de 100644 --- a/net/9p/trans_virtio.c +++ b/net/9p/trans_virtio.c @@ -171,7 +171,8 @@ static void req_done(struct virtqueue *vq) spin_unlock_irqrestore(>lock, flags); /* Wakeup if anyone waiting for VirtIO ring space. */ wake_up(chan->vc_wq); - p9_client_cb(chan->client, req, REQ_STATUS_RCVD); + if (len) + p9_client_cb(chan->client, req, REQ_STATUS_RCVD); } }
Re: [BUG/RFC] vhost: net: big endian viring access despite virtio 1
On Fri, 27 Jan 2017 13:24:13 +0100 Halil Pasicwrote: > On 01/26/2017 08:20 PM, Michael S. Tsirkin wrote: > > On Thu, Jan 26, 2017 at 06:39:14PM +0100, Halil Pasic wrote: > >> > >> Hi! > >> > >> Recently I have been investigating some strange migration problems on > >> s390x. > >> > >> It turned out under certain circumstances vhost_net corrupts avail.idx by > >> using wrong endianness. > > [..] > > >> -8<-- > >> >From b26e2bbdc03832a0204ee2b42967a1b49a277dc8 Mon Sep 17 00:00:00 2001 > >> From: Halil Pasic > >> Date: Thu, 26 Jan 2017 00:06:15 +0100 > >> Subject: [PATCH] vhost: remove useless/dangerous reset of is_le > >> > >> The reset of is_le does no good, but it contributes its fair share to a > >> bug in vhost_net, which occurs if we have some oldubufs when stopping and > >> setting a fd = -1 as a backend. Instead of doing something convoluted in > >> vhost_net, let's just get rid of the reset. > >> > >> Signed-off-by: Halil Pasic > >> Fixes: commit 2751c9882b94 > >> --- > >> drivers/vhost/vhost.c | 4 +--- > >> 1 file changed, 1 insertion(+), 3 deletions(-) > >> > >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > >> index d643260..08072a2 100644 > >> --- a/drivers/vhost/vhost.c > >> +++ b/drivers/vhost/vhost.c > >> @@ -1714,10 +1714,8 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq) > >> int r; > >> bool is_le = vq->is_le; > >> > >> - if (!vq->private_data) { > >> - vhost_reset_is_le(vq); > >> + if (!vq->private_data) > >> return 0; > >> - } > >> > >> vhost_init_is_le(vq); > > > > > > I think you do need to reset it, just maybe within vhost_init_is_le. > > > > if (vhost_has_feature(vq, VIRTIO_F_VERSION_1)) > > vq->is_le = true; > > else > > vhost_reset_is_le(vq); > > > > > > That is a very good point! I have overlooked that while the > CONFIG_VHOST_CROSS_ENDIAN_LEGACY variant > > static void vhost_init_is_le(struct vhost_virtqueue *vq) > { > /* Note for legacy virtio: user_be is initialized at reset time > * according to the host endianness. If userspace does not set an > * explicit endianness, the default behavior is native endian, as > * expected by legacy virtio. > */ > vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be; > } > > is fine the other variant > > static void vhost_init_is_le(struct vhost_virtqueue *vq) > { > if (vhost_has_feature(vq, VIRTIO_F_VERSION_1)) > vq->is_le = true; > } > is a very strange initializer (makes assumptions about the state > to be initialized). > > I agree, setting native endianness there sounds very reasonable. > > I have a question regarding readability. IMHO the relationship > of reset_is_le and int_is_le is a bit confusing, and I'm afraid > it could become even more confusing with using reset in one of > the init_is_le's. > > How about we do the following? > > static void vhost_init_is_le(struct vhost_virtqueue *vq) > { > if (vhost_has_feature(vq, VIRTIO_F_VERSION_1)) > vq->is_le = true; > + else > + vq->is_le = virtio_legacy_is_little_endian(); > > } > > static void vhost_reset_is_le(struct vhost_virtqueue *vq) > { > - vq->is_le = virtio_legacy_is_little_endian(); > + vhost_init_is_le(vq); > } > > That way we would have correct endianness both after reset > and after init, I think :). > Yes, I think this is what we need. Cheers. -- Greg > Thank you very much! > > Halil > >
Re: [PATCH V4 0/3] basic busy polling support for vhost_net
On Fri, 4 Mar 2016 06:24:50 -0500 Jason Wangwrote: > This series tries to add basic busy polling for vhost net. The idea is > simple: at the end of tx/rx processing, busy polling for new tx added > descriptor and rx receive socket for a while. The maximum number of > time (in us) could be spent on busy polling was specified ioctl. > > Test A were done through: > > - 50 us as busy loop timeout > - Netperf 2.6 > - Two machines with back to back connected mlx4 Hi Jason, Could this also improve performance if both VMs are on the same host system ? > - Guest with 1 vcpus and 1 queue > > Results: > - Obvious improvements (%5 - 20%) for latency (TCP_RR). > - Get a better or minor regression on most of the TX tests, but see > some regression on 4096 size. > - Except for 8 sessions of 4096 size RX, have a better or same > performance. > - CPU utilization were incrased as expected. > > TCP_RR: > size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/ > 1/ 1/ +8%/ -32%/ +8%/ +8%/ +7% > 1/50/ +7%/ -19%/ +7%/ +7%/ +1% > 1/ 100/ +5%/ -21%/ +5%/ +5%/0% > 1/ 200/ +5%/ -21%/ +7%/ +7%/ +1% >64/ 1/ +11%/ -29%/ +11%/ +11%/ +10% >64/50/ +7%/ -19%/ +8%/ +8%/ +2% >64/ 100/ +8%/ -18%/ +9%/ +9%/ +2% >64/ 200/ +6%/ -19%/ +6%/ +6%/0% > 256/ 1/ +7%/ -33%/ +7%/ +7%/ +6% > 256/50/ +7%/ -18%/ +7%/ +7%/0% > 256/ 100/ +9%/ -18%/ +8%/ +8%/ +2% > 256/ 200/ +9%/ -18%/ +10%/ +10%/ +3% > 1024/ 1/ +20%/ -28%/ +20%/ +20%/ +19% > 1024/50/ +8%/ -18%/ +9%/ +9%/ +2% > 1024/ 100/ +6%/ -19%/ +5%/ +5%/0% > 1024/ 200/ +8%/ -18%/ +9%/ +9%/ +2% > Guest TX: > size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/ >64/ 1/ -5%/ -28%/ +11%/ +12%/ +10% >64/ 4/ -2%/ -26%/ +13%/ +13%/ +13% >64/ 8/ -6%/ -29%/ +9%/ +10%/ +10% > 512/ 1/ +15%/ -7%/ +13%/ +11%/ +3% > 512/ 4/ +17%/ -6%/ +18%/ +13%/ +11% > 512/ 8/ +14%/ -7%/ +13%/ +7%/ +7% > 1024/ 1/ +27%/ -2%/ +26%/ +29%/ +12% > 1024/ 4/ +8%/ -9%/ +6%/ +1%/ +6% > 1024/ 8/ +41%/ +12%/ +34%/ +20%/ -3% > 4096/ 1/ -22%/ -21%/ -36%/ +81%/+1360% > 4096/ 4/ -57%/ -58%/ +286%/ +15%/+2074% > 4096/ 8/ +67%/ +70%/ -45%/ -8%/ +63% > 16384/ 1/ -2%/ -5%/ +5%/ -3%/ +80% > 16384/ 4/0%/0%/0%/ +4%/ +138% > 16384/ 8/0%/0%/0%/ +1%/ +41% > 65535/ 1/ -3%/ -6%/ +2%/ +11%/ +113% > 65535/ 4/ -2%/ -1%/ -2%/ -3%/ +484% > 65535/ 8/0%/ +1%/0%/ +2%/ +40% > Guest RX: > size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/ >64/ 1/ +31%/ -3%/ +8%/ +8%/ +8% >64/ 4/ +11%/ -17%/ +13%/ +14%/ +15% >64/ 8/ +4%/ -23%/ +11%/ +11%/ +12% > 512/ 1/ +24%/0%/ +18%/ +14%/ -8% > 512/ 4/ +4%/ -15%/ +6%/ +5%/ +6% > 512/ 8/ +26%/0%/ +21%/ +10%/ +3% > 1024/ 1/ +88%/ +47%/ +69%/ +44%/ -30% > 1024/ 4/ +18%/ -5%/ +19%/ +16%/ +2% > 1024/ 8/ +15%/ -4%/ +13%/ +8%/ +1% > 4096/ 1/ -3%/ -5%/ +2%/ -2%/ +41% > 4096/ 4/ +2%/ +3%/ -20%/ -14%/ -24% > 4096/ 8/ -43%/ -45%/ +69%/ -24%/ +94% > 16384/ 1/ -3%/ -11%/ +23%/ +7%/ +42% > 16384/ 4/ -3%/ -3%/ -4%/ +5%/ +115% > 16384/ 8/ -1%/0%/ -1%/ -3%/ +32% > 65535/ 1/ +1%/0%/ +2%/0%/ +66% > 65535/ 4/ -1%/ -1%/0%/ +4%/ +492% > 65535/ 8/0%/ -1%/ -1%/ +4%/ +38% > > Changes from V3: > - drop single_task_running() > - use cpu_relax_lowlatency() instead of cpu_relax() > > Changes from V2: > - rename vhost_vq_more_avail() to vhost_vq_avail_empty(). And return > false we __get_user() fails. > - do not bother premmptions/timers for good path. > - use vhost_vring_state as ioctl parameter instead of reinveting a new > one. > - add the unit of timeout (us) to the comment of new added ioctls > > Changes from V1: > - remove the buggy vq_error() in vhost_vq_more_avail(). > - leave vhost_enable_notify() untouched. > > Changes from RFC V3: > - small tweak on the code to avoid multiple duplicate conditions in > critical path when busy loop is not enabled. > - add the test result of multiple VMs > > Changes from RFC V2: > - poll also at the end of rx handling > - factor out the polling logic and optimize the code a little bit > - add two ioctls to get and set the busy poll timeout > - test on ixgbe (which can give more stable and reproducable numbers) > instead of mlx4. > > Changes from RFC V1: > - add a comment for vhost_has_work() to explain why it could be > lockless > - add param description for busyloop_timeout > - split out the busy polling logic into a new helper > - check and exit the loop when
Re: [PATCH v2 0/3] vhost: cross-endian code cleanup
On Tue, 16 Feb 2016 17:34:13 +0200 "Michael S. Tsirkin" <m...@redhat.com> wrote: > On Tue, Feb 16, 2016 at 03:54:18PM +0100, Greg Kurz wrote: > > This series is a new tentative to have cleaner cross-endian code. > > > > Patches 1/3 is new: it fixes a side-effect in case vhost_init_used() fails. > > > > Patch 2/3 comes from v1: it renames cross-endian helpers > > > > Patch 3/3 is new: it simply renames vhost_init_used() as suggested by > > Michael. > > Is this on top of my tree? > No it's on top of Linus's but I can rebase. > > --- > > > > Greg Kurz (3): > > vhost: fix error path in vhost_init_used() > > vhost: rename cross-endian helpers > > vhost: rename vhost_init_used() > > > > > > drivers/vhost/net.c |2 +- > > drivers/vhost/scsi.c |2 +- > > drivers/vhost/test.c |2 +- > > drivers/vhost/vhost.c | 49 > > + > > drivers/vhost/vhost.h |2 +- > > 5 files changed, 41 insertions(+), 16 deletions(-) >
[PATCH v2 2/3] vhost: rename cross-endian helpers
The default use case for vhost is when the host and the vring have the same endianness (default native endianness). But there are cases where they differ and vhost should byteswap when accessing the vring. The first case is when the host is big endian and the vring belongs to a virtio 1.0 device, which is always little endian. This is covered by the vq->is_le field. This field is initialized when userspace calls the VHOST_SET_FEATURES ioctl. It is reset when the device stops. We already have a vhost_init_is_le() helper, but the reset operation is opencoded as follows: vq->is_le = virtio_legacy_is_little_endian(); It isn't clear that we are resetting vq->is_le here. This patch moves the code to a helper with a more explicit name. The other case where we may have to byteswap is when the architecture can switch endianness at runtime (bi-endian). If endianness differs in the host and in the guest, then legacy devices need to be used in cross-endian mode. This mode is available with CONFIG_VHOST_CROSS_ENDIAN_LEGACY=y, which introduces a vq->user_be field. Userspace may enable cross-endian mode by calling the SET_VRING_ENDIAN ioctl before the device is started. The cross-endian mode is disabled when the device is stopped. The current names of the helpers that manipulate vq->user_be are unclear. This patch renames those helpers to clearly show that this is cross-endian stuff and with explicit enable/disable semantics. No behaviour change. Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com> --- Changes since v1: - use init/reset semantics for vq->is_le (Michael) - use enable/disable semantics for vq->user_be (Cornelia) - rewrote the changelog --- drivers/vhost/vhost.c | 30 -- 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 236553e81027..69f6463e11bd 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -43,11 +43,21 @@ enum { #define vhost_avail_event(vq) ((__virtio16 __user *)>used->ring[vq->num]) #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY -static void vhost_vq_reset_user_be(struct vhost_virtqueue *vq) +static void vhost_disable_cross_endian(struct vhost_virtqueue *vq) { vq->user_be = !virtio_legacy_is_little_endian(); } +static void vhost_enable_cross_endian_big(struct vhost_virtqueue *vq) +{ + vq->user_be = true; +} + +static void vhost_enable_cross_endian_little(struct vhost_virtqueue *vq) +{ + vq->user_be = false; +} + static long vhost_set_vring_endian(struct vhost_virtqueue *vq, int __user *argp) { struct vhost_vring_state s; @@ -62,7 +72,10 @@ static long vhost_set_vring_endian(struct vhost_virtqueue *vq, int __user *argp) s.num != VHOST_VRING_BIG_ENDIAN) return -EINVAL; - vq->user_be = s.num; + if (s.num == VHOST_VRING_BIG_ENDIAN) + vhost_enable_cross_endian_big(vq); + else + vhost_enable_cross_endian_little(vq); return 0; } @@ -91,7 +104,7 @@ static void vhost_init_is_le(struct vhost_virtqueue *vq) vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be; } #else -static void vhost_vq_reset_user_be(struct vhost_virtqueue *vq) +static void vhost_disable_cross_endian(struct vhost_virtqueue *vq) { } @@ -113,6 +126,11 @@ static void vhost_init_is_le(struct vhost_virtqueue *vq) } #endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */ +static void vhost_reset_is_le(struct vhost_virtqueue *vq) +{ + vq->is_le = virtio_legacy_is_little_endian(); +} + static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh, poll_table *pt) { @@ -276,8 +294,8 @@ static void vhost_vq_reset(struct vhost_dev *dev, vq->call = NULL; vq->log_ctx = NULL; vq->memory = NULL; - vq->is_le = virtio_legacy_is_little_endian(); - vhost_vq_reset_user_be(vq); + vhost_reset_is_le(vq); + vhost_disable_cross_endian(vq); } static int vhost_worker(void *data) @@ -1159,7 +1177,7 @@ int vhost_init_used(struct vhost_virtqueue *vq) bool is_le = vq->is_le; if (!vq->private_data) { - vq->is_le = virtio_legacy_is_little_endian(); + vhost_reset_is_le(vq); return 0; }
[PATCH v2 3/3] vhost: rename vhost_init_used()
Looking at how callers use this, maybe we should just rename init_used to vhost_vq_init_access. The _used suffix was a hint that we access the vq used ring. But maybe what callers care about is that it must be called after access_ok. Also, this function manipulates the vq->is_le field which isn't related to the vq used ring. This patch simply renames vhost_init_used() to vhost_vq_init_access() as suggested by Michael. No behaviour change. Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com> --- drivers/vhost/net.c |2 +- drivers/vhost/scsi.c |2 +- drivers/vhost/test.c |2 +- drivers/vhost/vhost.c |4 ++-- drivers/vhost/vhost.h |2 +- 5 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 9eda69e40678..7bd75ff8be26 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -917,7 +917,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) vhost_net_disable_vq(n, vq); vq->private_data = sock; - r = vhost_init_used(vq); + r = vhost_vq_init_access(vq); if (r) goto err_used; r = vhost_net_enable_vq(n, vq); diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 29cfc57d496e..f898686cdd93 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -1274,7 +1274,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs, vq = >vqs[i].vq; mutex_lock(>mutex); vq->private_data = vs_tpg; - vhost_init_used(vq); + vhost_vq_init_access(vq); mutex_unlock(>mutex); } ret = 0; diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c index f2882ac98726..388eec4e1a90 100644 --- a/drivers/vhost/test.c +++ b/drivers/vhost/test.c @@ -196,7 +196,7 @@ static long vhost_test_run(struct vhost_test *n, int test) oldpriv = vq->private_data; vq->private_data = priv; - r = vhost_init_used(>vqs[index]); + r = vhost_vq_init_access(>vqs[index]); mutex_unlock(>mutex); diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 69f6463e11bd..328c54ab0154 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1170,7 +1170,7 @@ static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event) return 0; } -int vhost_init_used(struct vhost_virtqueue *vq) +int vhost_vq_init_access(struct vhost_virtqueue *vq) { __virtio16 last_used_idx; int r; @@ -1200,7 +1200,7 @@ err: vq->is_le = is_le; return r; } -EXPORT_SYMBOL_GPL(vhost_init_used); +EXPORT_SYMBOL_GPL(vhost_vq_init_access); static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len, struct iovec iov[], int iov_size) diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index d3f767448a72..8f0dd0d915d4 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -148,7 +148,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *, struct vhost_log *log, unsigned int *log_num); void vhost_discard_vq_desc(struct vhost_virtqueue *, int n); -int vhost_init_used(struct vhost_virtqueue *); +int vhost_vq_init_access(struct vhost_virtqueue *); int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len); int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads, unsigned count);
[PATCH v2 0/3] vhost: cross-endian code cleanup
This series is a new tentative to have cleaner cross-endian code. Patches 1/3 is new: it fixes a side-effect in case vhost_init_used() fails. Patch 2/3 comes from v1: it renames cross-endian helpers Patch 3/3 is new: it simply renames vhost_init_used() as suggested by Michael. --- Greg Kurz (3): vhost: fix error path in vhost_init_used() vhost: rename cross-endian helpers vhost: rename vhost_init_used() drivers/vhost/net.c |2 +- drivers/vhost/scsi.c |2 +- drivers/vhost/test.c |2 +- drivers/vhost/vhost.c | 49 + drivers/vhost/vhost.h |2 +- 5 files changed, 41 insertions(+), 16 deletions(-)
[PATCH v2 1/3] vhost: fix error path in vhost_init_used()
We don't want side effects. If something fails, we rollback vq->is_le to its previous value. Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com> --- drivers/vhost/vhost.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index ad2146a9ab2d..236553e81027 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1156,6 +1156,8 @@ int vhost_init_used(struct vhost_virtqueue *vq) { __virtio16 last_used_idx; int r; + bool is_le = vq->is_le; + if (!vq->private_data) { vq->is_le = virtio_legacy_is_little_endian(); return 0; @@ -1165,15 +1167,20 @@ int vhost_init_used(struct vhost_virtqueue *vq) r = vhost_update_used_flags(vq); if (r) - return r; + goto err; vq->signalled_used_valid = false; - if (!access_ok(VERIFY_READ, >used->idx, sizeof vq->used->idx)) - return -EFAULT; + if (!access_ok(VERIFY_READ, >used->idx, sizeof vq->used->idx)) { + r = -EFAULT; + goto err; + } r = __get_user(last_used_idx, >used->idx); if (r) - return r; + goto err; vq->last_used_idx = vhost16_to_cpu(vq, last_used_idx); return 0; +err: + vq->is_le = is_le; + return r; } EXPORT_SYMBOL_GPL(vhost_init_used);
Re: [PATCH 1/2] vhost: helpers to enable/disable vring endianness
On Wed, 10 Feb 2016 17:08:52 +0200 "Michael S. Tsirkin" <m...@redhat.com> wrote: > On Wed, Feb 10, 2016 at 01:11:34PM +0100, Greg Kurz wrote: > > On Wed, 10 Feb 2016 13:21:22 +0200 > > "Michael S. Tsirkin" <m...@redhat.com> wrote: > > > > > On Wed, Jan 13, 2016 at 06:09:41PM +0100, Greg Kurz wrote: > > > > The default use case for vhost is when the host and the vring have the > > > > same endianness (default native endianness). But there are cases where > > > > they differ and vhost should byteswap when accessing the vring: > > > > - the host is big endian and the vring comes from a virtio 1.0 device > > > > which is always little endian > > > > - the architecture is bi-endian and the vring comes from a legacy virtio > > > > device with a different endianness than the endianness of the host > > > > (aka > > > > legacy cross-endian) > > > > > > > > These cases are handled by the vq->is_le and the optional vq->user_be, > > > > with the following logic: > > > > - if none of the fields is enabled, vhost access the vring without > > > > byteswap > > > > - if the vring is virtio 1.0 and the host is big endian, vq->is_le is > > > > enabled to enforce little endian access to the vring > > > > - if the vring is legacy cross-endian, userspace enables vq->user_be > > > > to inform vhost about the vring endianness. This endianness is then > > > > enforced for vring accesses through vq->is_le again > > > > > > > > The logic is unclear in the current code. > > > > > > > > This patch introduces helpers with explicit enable and disable > > > > semantics, > > > > for better clarity. > > > > > > > > No behaviour change. > > > > > > > > Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com> > > > > --- > > > > drivers/vhost/vhost.c | 28 +++- > > > > 1 file changed, 19 insertions(+), 9 deletions(-) > > > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > > > > index ad2146a9ab2d..e02e06755ab7 100644 > > > > --- a/drivers/vhost/vhost.c > > > > +++ b/drivers/vhost/vhost.c > > > > @@ -43,11 +43,16 @@ enum { > > > > #define vhost_avail_event(vq) ((__virtio16 __user > > > > *)>used->ring[vq->num]) > > > > > > > > #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY > > > > -static void vhost_vq_reset_user_be(struct vhost_virtqueue *vq) > > > > +static void vhost_disable_user_be(struct vhost_virtqueue *vq) > > > > { > > > > vq->user_be = !virtio_legacy_is_little_endian(); > > > > } > > > > > > > > > > Hmm this doesn't look like an improvement to me. > > > What does it mean to disable big endian? Make it little endian? > > > > The default behavior for the device is native endian. > > > > The SET_VRING_ENDIAN ioctl is used to make the device big endian > > on little endian hosts, hence "enabling" cross-endian mode... > > > > > Existing reset seems to make sense. > > > > > > > ... and we "disable" cross-endian mode on reset. > > OK but that's enable cross-endian. Not enable be. We could have > something like vhost_disable_user_byte_swap though each time we try, > the result makes my head hurt: "swap" is a relative thing and hard to > keep track of. > What about having something like below then ? vhost_enable_cross_endian_big() to be called on little endian hosts with big endian devices vhost_enable_cross_endian_little() to be called on big endian hosts with little endian devices vhost_disable_cross_endian() to go back to the native endian default > > > > +static void vhost_enable_user_be(struct vhost_virtqueue *vq, bool > > > > user_be) > > > > +{ > > > > + vq->user_be = user_be; > > > > +} > > > > + > > > > > > And this is maybe "init_user_be"? > > > > > > > Anyway I don't mind changing the names to reset/init_user_be if you think it > > is clearer. > > > > > > static long vhost_set_vring_endian(struct vhost_virtqueue *vq, int > > > > __user *argp) > > > > { > > > > struct vhost_vring_state s; > > > >
Re: [PATCH 2/2] vhost: disentangle vring endianness stuff from the core code
On Wed, 10 Feb 2016 14:23:33 +0100 Cornelia Huck <cornelia.h...@de.ibm.com> wrote: > On Wed, 10 Feb 2016 14:08:43 +0100 > Greg Kurz <gk...@linux.vnet.ibm.com> wrote: > > > But you are right, there is a bug: we should rollback if vhost_init_used() > > fails. Something like below: > > > > err_used: > > vq->private_data = oldsock; > > vhost_net_enable_vq(n, vq); > > + vhost_adjust_vring_endian(vq); > > Shouldn't we switch back before we reenable? Or have I lost myself in > this maze here again? > I haven't spotted any path under vhost_net_enable_vq() that needs the vring endianness, but indeed it looks safer to switch back before a worker thread may be woken up. > > if (ubufs) > > vhost_net_ubuf_put_wait_and_free(ubufs); > > err_ubufs:
Re: [PATCH 1/2] vhost: helpers to enable/disable vring endianness
On Wed, 10 Feb 2016 13:21:22 +0200 "Michael S. Tsirkin" <m...@redhat.com> wrote: > On Wed, Jan 13, 2016 at 06:09:41PM +0100, Greg Kurz wrote: > > The default use case for vhost is when the host and the vring have the > > same endianness (default native endianness). But there are cases where > > they differ and vhost should byteswap when accessing the vring: > > - the host is big endian and the vring comes from a virtio 1.0 device > > which is always little endian > > - the architecture is bi-endian and the vring comes from a legacy virtio > > device with a different endianness than the endianness of the host (aka > > legacy cross-endian) > > > > These cases are handled by the vq->is_le and the optional vq->user_be, > > with the following logic: > > - if none of the fields is enabled, vhost access the vring without byteswap > > - if the vring is virtio 1.0 and the host is big endian, vq->is_le is > > enabled to enforce little endian access to the vring > > - if the vring is legacy cross-endian, userspace enables vq->user_be > > to inform vhost about the vring endianness. This endianness is then > > enforced for vring accesses through vq->is_le again > > > > The logic is unclear in the current code. > > > > This patch introduces helpers with explicit enable and disable semantics, > > for better clarity. > > > > No behaviour change. > > > > Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com> > > --- > > drivers/vhost/vhost.c | 28 +++- > > 1 file changed, 19 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > > index ad2146a9ab2d..e02e06755ab7 100644 > > --- a/drivers/vhost/vhost.c > > +++ b/drivers/vhost/vhost.c > > @@ -43,11 +43,16 @@ enum { > > #define vhost_avail_event(vq) ((__virtio16 __user > > *)>used->ring[vq->num]) > > > > #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY > > -static void vhost_vq_reset_user_be(struct vhost_virtqueue *vq) > > +static void vhost_disable_user_be(struct vhost_virtqueue *vq) > > { > > vq->user_be = !virtio_legacy_is_little_endian(); > > } > > > > Hmm this doesn't look like an improvement to me. > What does it mean to disable big endian? Make it little endian? The default behavior for the device is native endian. The SET_VRING_ENDIAN ioctl is used to make the device big endian on little endian hosts, hence "enabling" cross-endian mode... > Existing reset seems to make sense. > ... and we "disable" cross-endian mode on reset. > > +static void vhost_enable_user_be(struct vhost_virtqueue *vq, bool user_be) > > +{ > > + vq->user_be = user_be; > > +} > > + > > And this is maybe "init_user_be"? > Anyway I don't mind changing the names to reset/init_user_be if you think it is clearer. > > static long vhost_set_vring_endian(struct vhost_virtqueue *vq, int __user > > *argp) > > { > > struct vhost_vring_state s; > > @@ -62,7 +67,7 @@ static long vhost_set_vring_endian(struct vhost_virtqueue > > *vq, int __user *argp) > > s.num != VHOST_VRING_BIG_ENDIAN) > > return -EINVAL; > > > > - vq->user_be = s.num; > > + vhost_enable_user_be(vq, !!s.num); > > > > return 0; > > } > > @@ -81,7 +86,7 @@ static long vhost_get_vring_endian(struct vhost_virtqueue > > *vq, u32 idx, > > return 0; > > } > > > > -static void vhost_init_is_le(struct vhost_virtqueue *vq) > > +static void vhost_enable_is_le(struct vhost_virtqueue *vq) > > { > > /* Note for legacy virtio: user_be is initialized at reset time > > * according to the host endianness. If userspace does not set an > > Same thing really. I'd rather add "reset_is_le". > > > @@ -91,7 +96,7 @@ static void vhost_init_is_le(struct vhost_virtqueue *vq) > > vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be; > > } > > #else > > -static void vhost_vq_reset_user_be(struct vhost_virtqueue *vq) > > +static void vhost_disable_user_be(struct vhost_virtqueue *vq) > > { > > } > > > > @@ -106,13 +111,18 @@ static long vhost_get_vring_endian(struct > > vhost_virtqueue *vq, u32 idx, > > return -ENOIOCTLCMD; > > } > > > > -static void vhost_init_is_le(struct vhost_virtqueue *vq) > > +static void vhost_enable_is_le(struct vhost_virtqueue *vq) > > { > >
Re: [PATCH 2/2] vhost: disentangle vring endianness stuff from the core code
On Wed, 10 Feb 2016 13:48:09 +0200 "Michael S. Tsirkin" <m...@redhat.com> wrote: > On Wed, Jan 13, 2016 at 06:09:47PM +0100, Greg Kurz wrote: > > The way vring endianness is being handled currently obfuscates > > the code in vhost_init_used(). > > > > This patch tries to fix that by doing the following: > > - move the the code that adjusts endianness to a dedicated helper > > - export this helper so that backends explicitely call it > > > > No behaviour change. > > > > Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com> > > --- > > drivers/vhost/net.c |3 +++ > > drivers/vhost/scsi.c |3 +++ > > drivers/vhost/test.c |2 ++ > > drivers/vhost/vhost.c | 16 +++- > > drivers/vhost/vhost.h |1 + > > 5 files changed, 20 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c > > index 9eda69e40678..df01c939cd00 100644 > > --- a/drivers/vhost/net.c > > +++ b/drivers/vhost/net.c > > @@ -917,6 +917,9 @@ static long vhost_net_set_backend(struct vhost_net *n, > > unsigned index, int fd) > > > > vhost_net_disable_vq(n, vq); > > vq->private_data = sock; > > + > > + vhost_adjust_vring_endian(vq); > > + > > r = vhost_init_used(vq); > > if (r) > > goto err_used; > > > This is in fact a bug in existing code: if vhost_init_used > fails, it preferably should not have side-effects. > It's best to update it last thing. > I'm afraid we can't because the following path needs the vring endianness: vhost_init_used()->vhost_update_used_flags()->cpu_to_vhost16() But you are right, there is a bug: we should rollback if vhost_init_used() fails. Something like below: err_used: vq->private_data = oldsock; vhost_net_enable_vq(n, vq); + vhost_adjust_vring_endian(vq); if (ubufs) vhost_net_ubuf_put_wait_and_free(ubufs); err_ubufs: > > diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c > > index 29cfc57d496e..5a8363bfcb74 100644 > > --- a/drivers/vhost/scsi.c > > +++ b/drivers/vhost/scsi.c > > @@ -1274,6 +1274,9 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs, > > vq = >vqs[i].vq; > > mutex_lock(>mutex); > > vq->private_data = vs_tpg; > > + > > + vhost_adjust_vring_endian(vq); > > + > > vhost_init_used(vq); > > mutex_unlock(>mutex); > > } > > diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c > > index f2882ac98726..75e3e0e9f5a8 100644 > > --- a/drivers/vhost/test.c > > +++ b/drivers/vhost/test.c > > @@ -196,6 +196,8 @@ static long vhost_test_run(struct vhost_test *n, int > > test) > > oldpriv = vq->private_data; > > vq->private_data = priv; > > > > + vhost_adjust_vring_endian(vq); > > + > > r = vhost_init_used(>vqs[index]); > > > > mutex_unlock(>mutex); > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > > index e02e06755ab7..b0a00340309e 100644 > > --- a/drivers/vhost/vhost.c > > +++ b/drivers/vhost/vhost.c > > @@ -123,6 +123,15 @@ static void vhost_disable_is_le(struct vhost_virtqueue > > *vq) > > vq->is_le = virtio_legacy_is_little_endian(); > > } > > > > +void vhost_adjust_vring_endian(struct vhost_virtqueue *vq) > > +{ > > + if (!vq->private_data) > > + vhost_disable_is_le(vq); > > + else > > + vhost_enable_is_le(vq); > > +} > > +EXPORT_SYMBOL_GPL(vhost_adjust_vring_endian); > > + > > static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh, > > poll_table *pt) > > { > > I'd prefer "vhost_update_is_le" here. "endian" might also mean > "user_be". But see below pls. > Ok, I will rename if this patch survives the review. > > > @@ -1166,12 +1175,9 @@ int vhost_init_used(struct vhost_virtqueue *vq) > > { > > __virtio16 last_used_idx; > > int r; > > - if (!vq->private_data) { > > - vhost_disable_is_le(vq); > > - return 0; > > - } > > > > - vhost_enable_is_le(vq); > > + if (!vq->private_data) > > + return 0; > > > > r = vhost_update_used_flags(vq); > > if (r) > > L
Re: [PATCH 0/2] vhost: cross-endian code cleanup
On Wed, 13 Jan 2016 18:09:34 +0100 Greg Kurz <gk...@linux.vnet.ibm.com> wrote: > This series is a respin of the following patch: > > http://patchwork.ozlabs.org/patch/565921/ > > Patch 1 is preliminary work: it gives better names to the helpers that are > involved in cross-endian support. > > Patch 2 is actually a v2 of the original patch. All devices now call a > helper in the generic code, which DTRT according to vq->private_data, as > suggested by Michael. > Hi Michael, This is just a friendly reminder for this series, which was reviewed by Cornelia already. Thanks. -- Greg > --- > > Greg Kurz (2): > vhost: helpers to enable/disable vring endianness > vhost: disentangle vring endianness stuff from the core code > > > drivers/vhost/net.c |3 +++ > drivers/vhost/scsi.c |3 +++ > drivers/vhost/test.c |2 ++ > drivers/vhost/vhost.c | 40 > drivers/vhost/vhost.h |1 + > 5 files changed, 37 insertions(+), 12 deletions(-) > > ___ > Virtualization mailing list > virtualizat...@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/virtualization >
Re: [PATCH] vhost: move is_le setup to the backend
On Thu, 12 Nov 2015 15:28:19 +0100 Greg Kurz <gk...@linux.vnet.ibm.com> wrote: > On Thu, 12 Nov 2015 15:46:30 +0200 > "Michael S. Tsirkin" <m...@redhat.com> wrote: > > > On Fri, Oct 30, 2015 at 12:42:35PM +0100, Greg Kurz wrote: > > > The vq->is_le field is used to fix endianness when accessing the vring via > > > the cpu_to_vhost16() and vhost16_to_cpu() helpers in the following cases: > > > > > > 1) host is big endian and device is modern virtio > > > > > > 2) host has cross-endian support and device is legacy virtio with a > > > different > > >endianness than the host > > > > > > Both cases rely on the VHOST_SET_FEATURES ioctl, but 2) also needs the > > > VHOST_SET_VRING_ENDIAN ioctl to be called by userspace. Since vq->is_le > > > is only needed when the backend is active, it was decided to set it at > > > backend start. > > > > > > This is currently done in vhost_init_used()->vhost_init_is_le() but it > > > obfuscates the core vhost code. This patch moves the is_le setup to a > > > dedicated function that is called from the backend code. > > > > > > Note vhost_net is the only backend that can pass vq->private_data == NULL > > > to > > > vhost_init_used(), hence the "if (sock)" branch. > > > > > > No behaviour change. > > > > > > Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com> > > > > I plan to look at this next week, busy with QEMU 2.5 now. > > > > I don't have any deadline for this since this is only a cleanup tentative. > > Thanks. > ... but ping anyway since it was a month ago :) Cheers. -- Greg > > > --- > > > drivers/vhost/net.c |6 ++ > > > drivers/vhost/scsi.c |3 +++ > > > drivers/vhost/test.c |2 ++ > > > drivers/vhost/vhost.c | 12 +++- > > > drivers/vhost/vhost.h |1 + > > > 5 files changed, 19 insertions(+), 5 deletions(-) > > > > > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c > > > index 9eda69e40678..d6319cb2664c 100644 > > > --- a/drivers/vhost/net.c > > > +++ b/drivers/vhost/net.c > > > @@ -917,6 +917,12 @@ static long vhost_net_set_backend(struct vhost_net > > > *n, unsigned index, int fd) > > > > > > vhost_net_disable_vq(n, vq); > > > vq->private_data = sock; > > > + > > > + if (sock) > > > + vhost_set_is_le(vq); > > > + else > > > + vq->is_le = virtio_legacy_is_little_endian(); > > > + > > > r = vhost_init_used(vq); > > > if (r) > > > goto err_used; > > > diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c > > > index e25a23692822..e2644a301fa5 100644 > > > --- a/drivers/vhost/scsi.c > > > +++ b/drivers/vhost/scsi.c > > > @@ -1276,6 +1276,9 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs, > > > vq = >vqs[i].vq; > > > mutex_lock(>mutex); > > > vq->private_data = vs_tpg; > > > + > > > + vhost_set_is_le(vq); > > > + > > > vhost_init_used(vq); > > > mutex_unlock(>mutex); > > > } > > > diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c > > > index f2882ac98726..b1c7df502211 100644 > > > --- a/drivers/vhost/test.c > > > +++ b/drivers/vhost/test.c > > > @@ -196,6 +196,8 @@ static long vhost_test_run(struct vhost_test *n, int > > > test) > > > oldpriv = vq->private_data; > > > vq->private_data = priv; > > > > > > + vhost_set_is_le(vq); > > > + > > > r = vhost_init_used(>vqs[index]); > > > > > > mutex_unlock(>mutex); > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > > > index eec2f11809ff..6be863dcbd13 100644 > > > --- a/drivers/vhost/vhost.c > > > +++ b/drivers/vhost/vhost.c > > > @@ -113,6 +113,12 @@ static void vhost_init_is_le(struct vhost_virtqueue > > > *vq) > > > } > > > #endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */ > > > > > > +void vhost_set_is_le(struct vhost_virtqueue *vq) > > > +{ > > > + vhost_init_is_le(vq); > > > +} > > > +EXPORT_SYMBOL_GPL(vhost_
Re: [PATCH] vhost: move is_le setup to the backend
On Fri, 30 Oct 2015 12:42:35 +0100 Greg Kurz <gk...@linux.vnet.ibm.com> wrote: > The vq->is_le field is used to fix endianness when accessing the vring via > the cpu_to_vhost16() and vhost16_to_cpu() helpers in the following cases: > > 1) host is big endian and device is modern virtio > > 2) host has cross-endian support and device is legacy virtio with a different >endianness than the host > > Both cases rely on the VHOST_SET_FEATURES ioctl, but 2) also needs the > VHOST_SET_VRING_ENDIAN ioctl to be called by userspace. Since vq->is_le > is only needed when the backend is active, it was decided to set it at > backend start. > > This is currently done in vhost_init_used()->vhost_init_is_le() but it > obfuscates the core vhost code. This patch moves the is_le setup to a > dedicated function that is called from the backend code. > > Note vhost_net is the only backend that can pass vq->private_data == NULL to > vhost_init_used(), hence the "if (sock)" branch. > > No behaviour change. > > Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com> > --- Ping ? > drivers/vhost/net.c |6 ++ > drivers/vhost/scsi.c |3 +++ > drivers/vhost/test.c |2 ++ > drivers/vhost/vhost.c | 12 +++- > drivers/vhost/vhost.h |1 + > 5 files changed, 19 insertions(+), 5 deletions(-) > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c > index 9eda69e40678..d6319cb2664c 100644 > --- a/drivers/vhost/net.c > +++ b/drivers/vhost/net.c > @@ -917,6 +917,12 @@ static long vhost_net_set_backend(struct vhost_net *n, > unsigned index, int fd) > > vhost_net_disable_vq(n, vq); > vq->private_data = sock; > + > + if (sock) > + vhost_set_is_le(vq); > + else > + vq->is_le = virtio_legacy_is_little_endian(); > + > r = vhost_init_used(vq); > if (r) > goto err_used; > diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c > index e25a23692822..e2644a301fa5 100644 > --- a/drivers/vhost/scsi.c > +++ b/drivers/vhost/scsi.c > @@ -1276,6 +1276,9 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs, > vq = >vqs[i].vq; > mutex_lock(>mutex); > vq->private_data = vs_tpg; > + > + vhost_set_is_le(vq); > + > vhost_init_used(vq); > mutex_unlock(>mutex); > } > diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c > index f2882ac98726..b1c7df502211 100644 > --- a/drivers/vhost/test.c > +++ b/drivers/vhost/test.c > @@ -196,6 +196,8 @@ static long vhost_test_run(struct vhost_test *n, int test) > oldpriv = vq->private_data; > vq->private_data = priv; > > + vhost_set_is_le(vq); > + > r = vhost_init_used(>vqs[index]); > > mutex_unlock(>mutex); > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index eec2f11809ff..6be863dcbd13 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -113,6 +113,12 @@ static void vhost_init_is_le(struct vhost_virtqueue *vq) > } > #endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */ > > +void vhost_set_is_le(struct vhost_virtqueue *vq) > +{ > + vhost_init_is_le(vq); > +} > +EXPORT_SYMBOL_GPL(vhost_set_is_le); > + > static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh, > poll_table *pt) > { > @@ -1156,12 +1162,8 @@ int vhost_init_used(struct vhost_virtqueue *vq) > { > __virtio16 last_used_idx; > int r; > - if (!vq->private_data) { > - vq->is_le = virtio_legacy_is_little_endian(); > + if (!vq->private_data) > return 0; > - } > - > - vhost_init_is_le(vq); > > r = vhost_update_used_flags(vq); > if (r) > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h > index 4772862b71a7..8a62041959fe 100644 > --- a/drivers/vhost/vhost.h > +++ b/drivers/vhost/vhost.h > @@ -162,6 +162,7 @@ bool vhost_enable_notify(struct vhost_dev *, struct > vhost_virtqueue *); > > int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log, > unsigned int log_num, u64 len); > +void vhost_set_is_le(struct vhost_virtqueue *vq); > > #define vq_err(vq, fmt, ...) do { \ > pr_debug(pr_fmt(fmt), ##__VA_ARGS__); \ > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost: move is_le setup to the backend
On Thu, 12 Nov 2015 15:46:30 +0200 "Michael S. Tsirkin" <m...@redhat.com> wrote: > On Fri, Oct 30, 2015 at 12:42:35PM +0100, Greg Kurz wrote: > > The vq->is_le field is used to fix endianness when accessing the vring via > > the cpu_to_vhost16() and vhost16_to_cpu() helpers in the following cases: > > > > 1) host is big endian and device is modern virtio > > > > 2) host has cross-endian support and device is legacy virtio with a > > different > >endianness than the host > > > > Both cases rely on the VHOST_SET_FEATURES ioctl, but 2) also needs the > > VHOST_SET_VRING_ENDIAN ioctl to be called by userspace. Since vq->is_le > > is only needed when the backend is active, it was decided to set it at > > backend start. > > > > This is currently done in vhost_init_used()->vhost_init_is_le() but it > > obfuscates the core vhost code. This patch moves the is_le setup to a > > dedicated function that is called from the backend code. > > > > Note vhost_net is the only backend that can pass vq->private_data == NULL to > > vhost_init_used(), hence the "if (sock)" branch. > > > > No behaviour change. > > > > Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com> > > I plan to look at this next week, busy with QEMU 2.5 now. > I don't have any deadline for this since this is only a cleanup tentative. Thanks. > > --- > > drivers/vhost/net.c |6 ++ > > drivers/vhost/scsi.c |3 +++ > > drivers/vhost/test.c |2 ++ > > drivers/vhost/vhost.c | 12 +++- > > drivers/vhost/vhost.h |1 + > > 5 files changed, 19 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c > > index 9eda69e40678..d6319cb2664c 100644 > > --- a/drivers/vhost/net.c > > +++ b/drivers/vhost/net.c > > @@ -917,6 +917,12 @@ static long vhost_net_set_backend(struct vhost_net *n, > > unsigned index, int fd) > > > > vhost_net_disable_vq(n, vq); > > vq->private_data = sock; > > + > > + if (sock) > > + vhost_set_is_le(vq); > > + else > > + vq->is_le = virtio_legacy_is_little_endian(); > > + > > r = vhost_init_used(vq); > > if (r) > > goto err_used; > > diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c > > index e25a23692822..e2644a301fa5 100644 > > --- a/drivers/vhost/scsi.c > > +++ b/drivers/vhost/scsi.c > > @@ -1276,6 +1276,9 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs, > > vq = >vqs[i].vq; > > mutex_lock(>mutex); > > vq->private_data = vs_tpg; > > + > > + vhost_set_is_le(vq); > > + > > vhost_init_used(vq); > > mutex_unlock(>mutex); > > } > > diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c > > index f2882ac98726..b1c7df502211 100644 > > --- a/drivers/vhost/test.c > > +++ b/drivers/vhost/test.c > > @@ -196,6 +196,8 @@ static long vhost_test_run(struct vhost_test *n, int > > test) > > oldpriv = vq->private_data; > > vq->private_data = priv; > > > > + vhost_set_is_le(vq); > > + > > r = vhost_init_used(>vqs[index]); > > > > mutex_unlock(>mutex); > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > > index eec2f11809ff..6be863dcbd13 100644 > > --- a/drivers/vhost/vhost.c > > +++ b/drivers/vhost/vhost.c > > @@ -113,6 +113,12 @@ static void vhost_init_is_le(struct vhost_virtqueue > > *vq) > > } > > #endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */ > > > > +void vhost_set_is_le(struct vhost_virtqueue *vq) > > +{ > > + vhost_init_is_le(vq); > > +} > > +EXPORT_SYMBOL_GPL(vhost_set_is_le); > > + > > static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh, > > poll_table *pt) > > { > > @@ -1156,12 +1162,8 @@ int vhost_init_used(struct vhost_virtqueue *vq) > > { > > __virtio16 last_used_idx; > > int r; > > - if (!vq->private_data) { > > - vq->is_le = virtio_legacy_is_little_endian(); > > + if (!vq->private_data) > > return 0; > > - } > > - > > - vhost_init_is_le(vq); > > > > r = vhost_update_used_flags(vq); > > i
[PATCH] vhost: move is_le setup to the backend
The vq->is_le field is used to fix endianness when accessing the vring via the cpu_to_vhost16() and vhost16_to_cpu() helpers in the following cases: 1) host is big endian and device is modern virtio 2) host has cross-endian support and device is legacy virtio with a different endianness than the host Both cases rely on the VHOST_SET_FEATURES ioctl, but 2) also needs the VHOST_SET_VRING_ENDIAN ioctl to be called by userspace. Since vq->is_le is only needed when the backend is active, it was decided to set it at backend start. This is currently done in vhost_init_used()->vhost_init_is_le() but it obfuscates the core vhost code. This patch moves the is_le setup to a dedicated function that is called from the backend code. Note vhost_net is the only backend that can pass vq->private_data == NULL to vhost_init_used(), hence the "if (sock)" branch. No behaviour change. Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com> --- drivers/vhost/net.c |6 ++ drivers/vhost/scsi.c |3 +++ drivers/vhost/test.c |2 ++ drivers/vhost/vhost.c | 12 +++- drivers/vhost/vhost.h |1 + 5 files changed, 19 insertions(+), 5 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 9eda69e40678..d6319cb2664c 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -917,6 +917,12 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) vhost_net_disable_vq(n, vq); vq->private_data = sock; + + if (sock) + vhost_set_is_le(vq); + else + vq->is_le = virtio_legacy_is_little_endian(); + r = vhost_init_used(vq); if (r) goto err_used; diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index e25a23692822..e2644a301fa5 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -1276,6 +1276,9 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs, vq = >vqs[i].vq; mutex_lock(>mutex); vq->private_data = vs_tpg; + + vhost_set_is_le(vq); + vhost_init_used(vq); mutex_unlock(>mutex); } diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c index f2882ac98726..b1c7df502211 100644 --- a/drivers/vhost/test.c +++ b/drivers/vhost/test.c @@ -196,6 +196,8 @@ static long vhost_test_run(struct vhost_test *n, int test) oldpriv = vq->private_data; vq->private_data = priv; + vhost_set_is_le(vq); + r = vhost_init_used(>vqs[index]); mutex_unlock(>mutex); diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index eec2f11809ff..6be863dcbd13 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -113,6 +113,12 @@ static void vhost_init_is_le(struct vhost_virtqueue *vq) } #endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */ +void vhost_set_is_le(struct vhost_virtqueue *vq) +{ + vhost_init_is_le(vq); +} +EXPORT_SYMBOL_GPL(vhost_set_is_le); + static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh, poll_table *pt) { @@ -1156,12 +1162,8 @@ int vhost_init_used(struct vhost_virtqueue *vq) { __virtio16 last_used_idx; int r; - if (!vq->private_data) { - vq->is_le = virtio_legacy_is_little_endian(); + if (!vq->private_data) return 0; - } - - vhost_init_is_le(vq); r = vhost_update_used_flags(vq); if (r) diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 4772862b71a7..8a62041959fe 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -162,6 +162,7 @@ bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *); int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log, unsigned int log_num, u64 len); +void vhost_set_is_le(struct vhost_virtqueue *vq); #define vq_err(vq, fmt, ...) do { \ pr_debug(pr_fmt(fmt), ##__VA_ARGS__); \ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost: fix performance on LE hosts
On Tue, 27 Oct 2015 11:37:39 +0200 "Michael S. Tsirkin" <m...@redhat.com> wrote: > commit 2751c9882b947292fcfb084c4f604e01724af804 ("vhost: cross-endian > support for legacy devices") introduced a minor regression: even with > cross-endian disabled, and even on LE host, vhost_is_little_endian is > checking is_le flag so there's always a branch. > > To fix, simply check virtio_legacy_is_little_endian first. > > Cc: Greg Kurz <gk...@linux.vnet.ibm.com> > Signed-off-by: Michael S. Tsirkin <m...@redhat.com> Oops my bad for this oversight... Reviewed-by: Greg Kurz <gk...@linux.vnet.ibm.com> > --- > drivers/vhost/vhost.h | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h > index 4772862..d3f7674 100644 > --- a/drivers/vhost/vhost.h > +++ b/drivers/vhost/vhost.h > @@ -183,10 +183,17 @@ static inline bool vhost_has_feature(struct > vhost_virtqueue *vq, int bit) > return vq->acked_features & (1ULL << bit); > } > > +#ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY > static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq) > { > return vq->is_le; > } > +#else > +static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq) > +{ > + return virtio_legacy_is_little_endian() || vq->is_le; > +} > +#endif > > /* Memory accessors */ > static inline u16 vhost16_to_cpu(struct vhost_virtqueue *vq, __virtio16 val) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Fix AF_PACKET ABI breakage in 4.2
On Thu, 24 Sep 2015 12:50:59 +0300 "Michael S. Tsirkin" <m...@redhat.com> wrote: > On Thu, Sep 24, 2015 at 09:25:45AM +0200, Greg Kurz wrote: > > On Wed, 23 Sep 2015 19:45:08 +0100 > > David Woodhouse <dw...@infradead.org> wrote: > > > > > Commit 7d82410950aa ("virtio: add explicit big-endian support to memory > > > accessors") accidentally changed the virtio_net header used by > > > AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian. > > > > > > > Hi David, > > > > Oops my bad... I obviously overlooked this one when adding cross-endian > > support. > > > > > Since virtio_legacy_is_little_endian() is a very long identifier, > > > define a VIO_LE macro and use that throughout the code instead of the > > > > VIO usually refers to virtual IO adapters for the PowerPC pSeries platform. > > > > > hard-coded 'false' for little-endian. > > > > > > > What about introducing dedicated accessors as it is done in many other > > locations where we do virtio byteswap ? Something like: > > > > static inline bool packet_is_little_endian(void) > > { > > return virtio_legacy_is_little_endian(); > > } > > > > static inline u16 packet16_to_cpu(__virtio16 val) > > { > > return __virtio16_to_cpu(packet_is_little_endian(), val); > > } > > > > static inline __virtio16 cpu_to_packet16(u16 val) > > { > > return __cpu_to_virtio16(packet_is_little_endian(), val); > > } > > > > It results in prettier code IMHO. Have a look at drivers/net/tun.c or > > drivers/vhost/vhost.c. > > > > > This restores the ABI to match 4.1 and earlier kernels, and makes my > > > test program work again. > > > > > > > BTW, there is still work to do if we want to support cross-endian legacy or > > virtio 1 on a big endian arch... > > > > Cheers. > > > > -- > > Greg > > It seems the API that we have is a confusing one. > > virtio endian-ness is either native or little, depending on a flag, so > __virtio16_to_cpu seems to mean "either native to cpu or little to cpu > depending on flag". > > It used to be like that, but not anymore. > > This leads to all kind of bugs. > > For example, I have only now realized vhost_is_little_endian isn't a > constant on LE hosts if cross-endian support is not compiled. > > I think we need to fix it, but also think about a better API. > Your original API is good and works well for all the callers that don't care for cross-endian support. I guess we'd rather move the cross-endian burden to the few callers who need it, i.e. tun, macvtap and vhost when cross-endian is compiled. More or less like this: /* Original API : either little-endian or native */ static inline u16 __virtio16_to_cpu(bool little_endian, __virtio16 val) { if (little_endian) return le16_to_cpu((__force __le16)val); else return (__force u16)val; } #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY static inline u16 vhost16_to_cpu(struct vhost_virtqueue *vq, __virtio16 val) { /* little-endian because of virtio 1 */ if (vhost_has_feature(vq, VIRTIO_F_VERSION_1)) return __virtio16_to_cpu(true, val); #ifdef __LITTLE_ENDIAN /* native little-endian */ return (__force u16)val; #else /* native big-endian */ return be16_to_cpu((__force __be16)val); #endif } #else static inline u16 vhost16_to_cpu(struct vhost_virtqueue *vq, __virtio16 val) { #ifdef __LITTLE_ENDIAN /* fast path for little-endian host */ return __virtio16_to_cpu(true, val); #else /* slow path for big-endian host */ return __virtio16_to_cpu(vhost_has_feature(vq, VIRTIO_F_VERSION_1), val); #endif } #endif Does it make sense ? > > > > Signed-off-by: David Woodhouse <david.woodho...@intel.com> > > > --- > > > On Wed, 2015-09-23 at 11:09 -0700, David Miller wrote: > > > > > +#define VIO_LE virtio_legacy_is_little_endian() > > > > > > > > When you define a shorthand macro, the defines to a function call, > > > > make the macro have parenthesis too. > > > > > > In which case I suppose it also wants to be lower-case. Although > > > "function call" is a bit strong since it's effectively just a constant. > > > I'm still wondering if it'd be nicer just to use (__force u16) instead. > > > > > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > > > index 7b8e39a..aa4b15c 100644 > > > --- a
Re: [PATCH v2] Fix AF_PACKET ABI breakage in 4.2
On Wed, 23 Sep 2015 19:45:08 +0100 David Woodhousewrote: > Commit 7d82410950aa ("virtio: add explicit big-endian support to memory > accessors") accidentally changed the virtio_net header used by > AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian. > Hi David, Oops my bad... I obviously overlooked this one when adding cross-endian support. > Since virtio_legacy_is_little_endian() is a very long identifier, > define a VIO_LE macro and use that throughout the code instead of the VIO usually refers to virtual IO adapters for the PowerPC pSeries platform. > hard-coded 'false' for little-endian. > What about introducing dedicated accessors as it is done in many other locations where we do virtio byteswap ? Something like: static inline bool packet_is_little_endian(void) { return virtio_legacy_is_little_endian(); } static inline u16 packet16_to_cpu(__virtio16 val) { return __virtio16_to_cpu(packet_is_little_endian(), val); } static inline __virtio16 cpu_to_packet16(u16 val) { return __cpu_to_virtio16(packet_is_little_endian(), val); } It results in prettier code IMHO. Have a look at drivers/net/tun.c or drivers/vhost/vhost.c. > This restores the ABI to match 4.1 and earlier kernels, and makes my > test program work again. > BTW, there is still work to do if we want to support cross-endian legacy or virtio 1 on a big endian arch... Cheers. -- Greg > Signed-off-by: David Woodhouse > --- > On Wed, 2015-09-23 at 11:09 -0700, David Miller wrote: > > > +#define VIO_LE virtio_legacy_is_little_endian() > > > > When you define a shorthand macro, the defines to a function call, > > make the macro have parenthesis too. > > In which case I suppose it also wants to be lower-case. Although > "function call" is a bit strong since it's effectively just a constant. > I'm still wondering if it'd be nicer just to use (__force u16) instead. > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > index 7b8e39a..aa4b15c 100644 > --- a/net/packet/af_packet.c > +++ b/net/packet/af_packet.c > @@ -230,6 +230,8 @@ struct packet_skb_cb { > } sa; > }; > > +#define vio_le() virtio_legacy_is_little_endian() > + > #define PACKET_SKB_CB(__skb) ((struct packet_skb_cb *)((__skb)->cb)) > > #define GET_PBDQC_FROM_RB(x) ((struct tpacket_kbdq_core *)(&(x)->prb_bdqc)) > @@ -2680,15 +2682,15 @@ static int packet_snd(struct socket *sock, struct > msghdr *msg, size_t len) > goto out_unlock; > > if ((vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) && > - (__virtio16_to_cpu(false, vnet_hdr.csum_start) + > - __virtio16_to_cpu(false, vnet_hdr.csum_offset) + 2 > > - __virtio16_to_cpu(false, vnet_hdr.hdr_len))) > - vnet_hdr.hdr_len = __cpu_to_virtio16(false, > - __virtio16_to_cpu(false, vnet_hdr.csum_start) + > - __virtio16_to_cpu(false, vnet_hdr.csum_offset) > + 2); > + (__virtio16_to_cpu(vio_le(), vnet_hdr.csum_start) + > + __virtio16_to_cpu(vio_le(), vnet_hdr.csum_offset) + 2 > > + __virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len))) > + vnet_hdr.hdr_len = __cpu_to_virtio16(vio_le(), > + __virtio16_to_cpu(vio_le(), > vnet_hdr.csum_start) + > + __virtio16_to_cpu(vio_le(), > vnet_hdr.csum_offset) + 2); > > err = -EINVAL; > - if (__virtio16_to_cpu(false, vnet_hdr.hdr_len) > len) > + if (__virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len) > len) > goto out_unlock; > > if (vnet_hdr.gso_type != VIRTIO_NET_HDR_GSO_NONE) { > @@ -2731,7 +2733,7 @@ static int packet_snd(struct socket *sock, struct > msghdr *msg, size_t len) > hlen = LL_RESERVED_SPACE(dev); > tlen = dev->needed_tailroom; > skb = packet_alloc_skb(sk, hlen + tlen, hlen, len, > -__virtio16_to_cpu(false, vnet_hdr.hdr_len), > +__virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len), > msg->msg_flags & MSG_DONTWAIT, ); > if (skb == NULL) > goto out_unlock; > @@ -2778,8 +2780,8 @@ static int packet_snd(struct socket *sock, struct > msghdr *msg, size_t len) > > if (po->has_vnet_hdr) { > if (vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) { > - u16 s = __virtio16_to_cpu(false, vnet_hdr.csum_start); > - u16 o = __virtio16_to_cpu(false, vnet_hdr.csum_offset); > + u16 s = __virtio16_to_cpu(vio_le(), > vnet_hdr.csum_start); > + u16 o = __virtio16_to_cpu(vio_le(), > vnet_hdr.csum_offset); > if (!skb_partial_csum_set(skb, s, o)) { > err = -EINVAL; >
Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests
On Thu, 9 Jul 2015 09:49:05 +0200 Thomas Huth th...@redhat.com wrote: The option for supporting cross-endianness legacy guests in the vhost and tun code should only be available on systems that support cross-endian guests. Signed-off-by: Thomas Huth th...@redhat.com Acked-by: Greg Kurz gk...@linux.vnet.ibm.com -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PULL] virtio/vhost: cross endian support
On Thu, 2 Jul 2015 08:01:28 +0200 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jul 01, 2015 at 12:02:50PM -0700, Linus Torvalds wrote: On Wed, Jul 1, 2015 at 2:31 AM, Michael S. Tsirkin m...@redhat.com wrote: virtio/vhost: cross endian support Ugh. Does this really have to be dynamic? Can't virtio do the sane thing, and just use a _fixed_ endianness? Doing a unconditional byte swap is faster and simpler than the crazy conditionals. That's true regardless of endianness, but gets to be even more so if the fixed endianness is little-endian, since BE is not-so-slowly fading from the world. Linus Yea, well - support for legacy BE guests on the new LE hosts is exactly the motivation for this. I dislike it too, but there are two redeeming properties that made me merge this: 1. It's a trivial amount of code: since we wrap host/guest accesses anyway, almost all of it is well hidden from drivers. 2. Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY - and when it's clear, there's zero overhead (as some point it was tested by compiling with and without the patches, got the same stripped binary). Maybe we could create a Kconfig symbol to enforce point (2): prevent people from enabling it e.g. on x86. I will look into this - but it can be done by a patch on top, so I think this can be merged as is. This cross-endian *oddity* is targeting PowerPC book3s_64 processors... I am not aware of any other users. Maybe create a symbol that would be only selected by PPC_BOOK3S_64 ? Or do you know of someone using kernel with all config options enabled undiscriminately? Thanks, -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html