Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()

2024-03-27 Thread Gavin Shan

On 3/27/24 17:42, Jason Wang wrote:

On Wed, Mar 27, 2024 at 3:35 PM Gavin Shan  wrote:


On 3/27/24 14:08, Gavin Shan wrote:

On 3/27/24 12:44, Jason Wang wrote:

On Wed, Mar 27, 2024 at 10:34 AM Jason Wang  wrote:

On Wed, Mar 27, 2024 at 7:39 AM Gavin Shan  wrote:


A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
Will Deacon . Otherwise, it's not ensured the
available ring entries pushed by guest can be observed by vhost
in time, leading to stale available ring entries fetched by vhost
in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
grace-hopper (ARM64) platform.

/home/gavin/sandbox/qemu.main/build/qemu-system-aarch64  \
-accel kvm -machine virt,gic-version=host -cpu host  \
-smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
-m 4096M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=4096M\
 :   \
-netdev tap,id=vnet0,vhost=true  \
-device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
 :
guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
virtio_net virtio0: output.0:id 100 is not a head!

Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
should be safe until vq->avail_idx is changed by commit 275bf960ac697
("vhost: better detection of available buffers").

Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
Cc:  # v4.11+
Reported-by: Yihuang Yu 
Signed-off-by: Gavin Shan 
---
   drivers/vhost/vhost.c | 11 ++-
   1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 045f666b4f12..00445ab172b3 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct 
vhost_virtqueue *vq)
  r = vhost_get_avail_idx(vq, _idx);
  if (unlikely(r))
  return false;
+
  vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
+   if (vq->avail_idx != vq->last_avail_idx) {
+   /* Similar to what's done in vhost_get_vq_desc(), we need
+* to ensure the available ring entries have been exposed
+* by guest.
+*/


We need to be more verbose here. For example, which load needs to be
ordered with which load.

The rmb in vhost_get_vq_desc() is used to order the load of avail idx
and the load of head. It is paired with e.g virtio_wmb() in
virtqueue_add_split().

vhost_vq_avail_empty() are mostly used as a hint in
vhost_net_busy_poll() which is under the protection of the vq mutex.

An exception is the tx_can_batch(), but in that case it doesn't even
want to read the head.


Ok, if it is needed only in that path, maybe we can move the barriers there.



[cc Will Deacon]

Jason, appreciate for your review and comments. I think PATCH[1/2] is
the fix for the hypothesis, meaning PATCH[2/2] is the real fix. However,
it would be nice to fix all of them in one shoot. I will try with PATCH[2/2]
only to see if our issue will disappear or not. However, the issue still
exists if PATCH[2/2] is missed.



Jason, PATCH[2/2] is sufficient to fix our current issue. I tried with 
PATCH[2/2]
only and unable to hit the issue. However, PATCH[1/2] may be needed by other 
scenarios.
So it would be nice to fix them in one shoot.


Yes, see below.





Firstly, We were failing on the transmit queue and {tvq, rvq}->busyloop_timeout
== false if I remember correctly. So the added smp_rmb() in 
vhost_vq_avail_empty()
is only a concern to tx_can_batch(). A mutex isn't enough to ensure the order
for the available index and available ring entry (head). For example, 
vhost_vq_avail_empty()
called by tx_can_batch() can see next available index, but its corresponding
available ring entry (head) may not be seen by vhost yet if smp_rmb() is missed.
The next call to get_tx_bufs(), where the available ring entry (head) doesn't
arrived yet, leading to stale available ring entry (head) being fetched.

handle_tx_copy
  get_tx_bufs // smp_rmb() won't be executed when 
vq->avail_idx != vq->last_avail_idx
  tx_can_batch
vhost_vq_avail_empty  // vq->avail_idx is updated from 
vq->avail->idx

The reason why I added smp_rmb() to vhost_vq_avail_empty() is because the 
function
is a exposed API, even it's only used by drivers/vhost/net.c at present. It 
means
the API has been broken internally. So it seems more appropriate to fix it up in
vhost_vq_avail_empty() so that the API's users needn't worry about the memory 
access
order.


When tx_can_batch returns true it means there's still pending tx
buffers. Since it might read indices so it still can bypass the
smp_rmb() in the vhost_get_vq_desc().

I'd suggest adding those above to change log.

With this,

Acked-by: Jason Wang 



Thanks, Jason. The change log has 

Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()

2024-03-27 Thread Gavin Shan

On 3/27/24 22:07, Michael S. Tsirkin wrote:

On Wed, Mar 27, 2024 at 09:38:45AM +1000, Gavin Shan wrote:

A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
Will Deacon . Otherwise, it's not ensured the
available ring entries pushed by guest can be observed by vhost
in time, leading to stale available ring entries fetched by vhost
in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
grace-hopper (ARM64) platform.

   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64  \
   -accel kvm -machine virt,gic-version=host -cpu host  \
   -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
   -m 4096M,slots=16,maxmem=64G \
   -object memory-backend-ram,id=mem0,size=4096M\
:   \
   -netdev tap,id=vnet0,vhost=true  \
   -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
:
   guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
   virtio_net virtio0: output.0:id 100 is not a head!

Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
should be safe until vq->avail_idx is changed by commit 275bf960ac697
("vhost: better detection of available buffers").

Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
Cc:  # v4.11+
Reported-by: Yihuang Yu 
Signed-off-by: Gavin Shan 
---
  drivers/vhost/vhost.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 045f666b4f12..00445ab172b3 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct 
vhost_virtqueue *vq)
r = vhost_get_avail_idx(vq, _idx);
if (unlikely(r))
return false;
+
vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
+   if (vq->avail_idx != vq->last_avail_idx) {
+   /* Similar to what's done in vhost_get_vq_desc(), we need
+* to ensure the available ring entries have been exposed
+* by guest.
+*/


A slightly clearer comment:

/* Since we have updated avail_idx, the following call to
  * vhost_get_vq_desc will read available ring entries.
  * Make sure that read happens after the avail_idx read.
  */

Pls repost with that, and I will apply.

Also add suggested-by for will.



Sure, the suggested comments have been included to v3.




+   smp_rmb();
+   return false;
+   }
  
-	return vq->avail_idx == vq->last_avail_idx;

+   return true;
  }
  EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);


As a follow-up patch, we should clean out code duplication that
accumulated with 3 places reading avail idx in essentially
the same way - this duplication is what causes the mess in
the 1st place.



Yes, nice idea. I've added PATCH[v3 3/3] to improve vhost_get_avail_idx()
to handle the memory barrier since all the callers have the concern.

v3: 
https://lore.kernel.org/virtualization/20240328002149.1141302-1-gs...@redhat.com/

Thanks,
Gavin




Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()

2024-03-27 Thread Michael S. Tsirkin
On Wed, Mar 27, 2024 at 09:38:45AM +1000, Gavin Shan wrote:
> A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
> Will Deacon . Otherwise, it's not ensured the
> available ring entries pushed by guest can be observed by vhost
> in time, leading to stale available ring entries fetched by vhost
> in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
> grace-hopper (ARM64) platform.
> 
>   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64  \
>   -accel kvm -machine virt,gic-version=host -cpu host  \
>   -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
>   -m 4096M,slots=16,maxmem=64G \
>   -object memory-backend-ram,id=mem0,size=4096M\
>:   \
>   -netdev tap,id=vnet0,vhost=true  \
>   -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
>:
>   guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
>   virtio_net virtio0: output.0:id 100 is not a head!
> 
> Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
> should be safe until vq->avail_idx is changed by commit 275bf960ac697
> ("vhost: better detection of available buffers").
> 
> Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
> Cc:  # v4.11+
> Reported-by: Yihuang Yu 
> Signed-off-by: Gavin Shan 
> ---
>  drivers/vhost/vhost.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 045f666b4f12..00445ab172b3 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, 
> struct vhost_virtqueue *vq)
>   r = vhost_get_avail_idx(vq, _idx);
>   if (unlikely(r))
>   return false;
> +
>   vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
> + if (vq->avail_idx != vq->last_avail_idx) {
> + /* Similar to what's done in vhost_get_vq_desc(), we need
> +  * to ensure the available ring entries have been exposed
> +  * by guest.
> +  */

A slightly clearer comment:

/* Since we have updated avail_idx, the following call to
 * vhost_get_vq_desc will read available ring entries.
 * Make sure that read happens after the avail_idx read.
 */

Pls repost with that, and I will apply.

Also add suggested-by for will.


> + smp_rmb();
> + return false;
> + }
>  
> - return vq->avail_idx == vq->last_avail_idx;
> + return true;
>  }
>  EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);

As a follow-up patch, we should clean out code duplication that
accumulated with 3 places reading avail idx in essentially
the same way - this duplication is what causes the mess in
the 1st place.






> -- 
> 2.44.0




Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()

2024-03-27 Thread Jason Wang
On Wed, Mar 27, 2024 at 3:35 PM Gavin Shan  wrote:
>
> On 3/27/24 14:08, Gavin Shan wrote:
> > On 3/27/24 12:44, Jason Wang wrote:
> >> On Wed, Mar 27, 2024 at 10:34 AM Jason Wang  wrote:
> >>> On Wed, Mar 27, 2024 at 7:39 AM Gavin Shan  wrote:
> 
>  A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
>  Will Deacon . Otherwise, it's not ensured the
>  available ring entries pushed by guest can be observed by vhost
>  in time, leading to stale available ring entries fetched by vhost
>  in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
>  grace-hopper (ARM64) platform.
> 
> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64  \
> -accel kvm -machine virt,gic-version=host -cpu host  \
> -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
> -m 4096M,slots=16,maxmem=64G \
> -object memory-backend-ram,id=mem0,size=4096M\
>  :   \
> -netdev tap,id=vnet0,vhost=true  \
> -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
>  :
> guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
> virtio_net virtio0: output.0:id 100 is not a head!
> 
>  Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
>  should be safe until vq->avail_idx is changed by commit 275bf960ac697
>  ("vhost: better detection of available buffers").
> 
>  Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
>  Cc:  # v4.11+
>  Reported-by: Yihuang Yu 
>  Signed-off-by: Gavin Shan 
>  ---
>    drivers/vhost/vhost.c | 11 ++-
>    1 file changed, 10 insertions(+), 1 deletion(-)
> 
>  diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>  index 045f666b4f12..00445ab172b3 100644
>  --- a/drivers/vhost/vhost.c
>  +++ b/drivers/vhost/vhost.c
>  @@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, 
>  struct vhost_virtqueue *vq)
>   r = vhost_get_avail_idx(vq, _idx);
>   if (unlikely(r))
>   return false;
>  +
>   vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
>  +   if (vq->avail_idx != vq->last_avail_idx) {
>  +   /* Similar to what's done in vhost_get_vq_desc(), we need
>  +* to ensure the available ring entries have been exposed
>  +* by guest.
>  +*/
> >>>
> >>> We need to be more verbose here. For example, which load needs to be
> >>> ordered with which load.
> >>>
> >>> The rmb in vhost_get_vq_desc() is used to order the load of avail idx
> >>> and the load of head. It is paired with e.g virtio_wmb() in
> >>> virtqueue_add_split().
> >>>
> >>> vhost_vq_avail_empty() are mostly used as a hint in
> >>> vhost_net_busy_poll() which is under the protection of the vq mutex.
> >>>
> >>> An exception is the tx_can_batch(), but in that case it doesn't even
> >>> want to read the head.
> >>
> >> Ok, if it is needed only in that path, maybe we can move the barriers 
> >> there.
> >>
> >
> > [cc Will Deacon]
> >
> > Jason, appreciate for your review and comments. I think PATCH[1/2] is
> > the fix for the hypothesis, meaning PATCH[2/2] is the real fix. However,
> > it would be nice to fix all of them in one shoot. I will try with PATCH[2/2]
> > only to see if our issue will disappear or not. However, the issue still
> > exists if PATCH[2/2] is missed.
> >
>
> Jason, PATCH[2/2] is sufficient to fix our current issue. I tried with 
> PATCH[2/2]
> only and unable to hit the issue. However, PATCH[1/2] may be needed by other 
> scenarios.
> So it would be nice to fix them in one shoot.

Yes, see below.

>
>
> > Firstly, We were failing on the transmit queue and {tvq, 
> > rvq}->busyloop_timeout
> > == false if I remember correctly. So the added smp_rmb() in 
> > vhost_vq_avail_empty()
> > is only a concern to tx_can_batch(). A mutex isn't enough to ensure the 
> > order
> > for the available index and available ring entry (head). For example, 
> > vhost_vq_avail_empty()
> > called by tx_can_batch() can see next available index, but its corresponding
> > available ring entry (head) may not be seen by vhost yet if smp_rmb() is 
> > missed.
> > The next call to get_tx_bufs(), where the available ring entry (head) 
> > doesn't
> > arrived yet, leading to stale available ring entry (head) being fetched.
> >
> >handle_tx_copy
> >  get_tx_bufs // smp_rmb() won't be executed when 
> > vq->avail_idx != vq->last_avail_idx
> >  tx_can_batch
> >vhost_vq_avail_empty  // vq->avail_idx is updated from 
> > vq->avail->idx
> >
> > The reason why I added smp_rmb() to vhost_vq_avail_empty() is because the 
> > function
> > is a exposed API, even it's 

Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()

2024-03-27 Thread Gavin Shan

On 3/27/24 14:08, Gavin Shan wrote:

On 3/27/24 12:44, Jason Wang wrote:

On Wed, Mar 27, 2024 at 10:34 AM Jason Wang  wrote:

On Wed, Mar 27, 2024 at 7:39 AM Gavin Shan  wrote:


A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
Will Deacon . Otherwise, it's not ensured the
available ring entries pushed by guest can be observed by vhost
in time, leading to stale available ring entries fetched by vhost
in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
grace-hopper (ARM64) platform.

   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64  \
   -accel kvm -machine virt,gic-version=host -cpu host  \
   -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
   -m 4096M,slots=16,maxmem=64G \
   -object memory-backend-ram,id=mem0,size=4096M    \
    :   \
   -netdev tap,id=vnet0,vhost=true  \
   -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
    :
   guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
   virtio_net virtio0: output.0:id 100 is not a head!

Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
should be safe until vq->avail_idx is changed by commit 275bf960ac697
("vhost: better detection of available buffers").

Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
Cc:  # v4.11+
Reported-by: Yihuang Yu 
Signed-off-by: Gavin Shan 
---
  drivers/vhost/vhost.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 045f666b4f12..00445ab172b3 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct 
vhost_virtqueue *vq)
 r = vhost_get_avail_idx(vq, _idx);
 if (unlikely(r))
 return false;
+
 vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
+   if (vq->avail_idx != vq->last_avail_idx) {
+   /* Similar to what's done in vhost_get_vq_desc(), we need
+    * to ensure the available ring entries have been exposed
+    * by guest.
+    */


We need to be more verbose here. For example, which load needs to be
ordered with which load.

The rmb in vhost_get_vq_desc() is used to order the load of avail idx
and the load of head. It is paired with e.g virtio_wmb() in
virtqueue_add_split().

vhost_vq_avail_empty() are mostly used as a hint in
vhost_net_busy_poll() which is under the protection of the vq mutex.

An exception is the tx_can_batch(), but in that case it doesn't even
want to read the head.


Ok, if it is needed only in that path, maybe we can move the barriers there.



[cc Will Deacon]

Jason, appreciate for your review and comments. I think PATCH[1/2] is
the fix for the hypothesis, meaning PATCH[2/2] is the real fix. However,
it would be nice to fix all of them in one shoot. I will try with PATCH[2/2]
only to see if our issue will disappear or not. However, the issue still
exists if PATCH[2/2] is missed.



Jason, PATCH[2/2] is sufficient to fix our current issue. I tried with 
PATCH[2/2]
only and unable to hit the issue. However, PATCH[1/2] may be needed by other 
scenarios.
So it would be nice to fix them in one shoot.



Firstly, We were failing on the transmit queue and {tvq, rvq}->busyloop_timeout
== false if I remember correctly. So the added smp_rmb() in 
vhost_vq_avail_empty()
is only a concern to tx_can_batch(). A mutex isn't enough to ensure the order
for the available index and available ring entry (head). For example, 
vhost_vq_avail_empty()
called by tx_can_batch() can see next available index, but its corresponding
available ring entry (head) may not be seen by vhost yet if smp_rmb() is missed.
The next call to get_tx_bufs(), where the available ring entry (head) doesn't
arrived yet, leading to stale available ring entry (head) being fetched.

   handle_tx_copy
     get_tx_bufs // smp_rmb() won't be executed when vq->avail_idx 
!= vq->last_avail_idx
     tx_can_batch
   vhost_vq_avail_empty  // vq->avail_idx is updated from vq->avail->idx

The reason why I added smp_rmb() to vhost_vq_avail_empty() is because the 
function
is a exposed API, even it's only used by drivers/vhost/net.c at present. It 
means
the API has been broken internally. So it seems more appropriate to fix it up in
vhost_vq_avail_empty() so that the API's users needn't worry about the memory 
access
order.





+   smp_rmb();
+   return false;
+   }

-   return vq->avail_idx == vq->last_avail_idx;
+   return true;
  }
  EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);


Thanks,
Gavin




Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()

2024-03-26 Thread Gavin Shan

On 3/27/24 12:44, Jason Wang wrote:

On Wed, Mar 27, 2024 at 10:34 AM Jason Wang  wrote:

On Wed, Mar 27, 2024 at 7:39 AM Gavin Shan  wrote:


A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
Will Deacon . Otherwise, it's not ensured the
available ring entries pushed by guest can be observed by vhost
in time, leading to stale available ring entries fetched by vhost
in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
grace-hopper (ARM64) platform.

   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64  \
   -accel kvm -machine virt,gic-version=host -cpu host  \
   -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
   -m 4096M,slots=16,maxmem=64G \
   -object memory-backend-ram,id=mem0,size=4096M\
:   \
   -netdev tap,id=vnet0,vhost=true  \
   -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
:
   guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
   virtio_net virtio0: output.0:id 100 is not a head!

Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
should be safe until vq->avail_idx is changed by commit 275bf960ac697
("vhost: better detection of available buffers").

Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
Cc:  # v4.11+
Reported-by: Yihuang Yu 
Signed-off-by: Gavin Shan 
---
  drivers/vhost/vhost.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 045f666b4f12..00445ab172b3 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct 
vhost_virtqueue *vq)
 r = vhost_get_avail_idx(vq, _idx);
 if (unlikely(r))
 return false;
+
 vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
+   if (vq->avail_idx != vq->last_avail_idx) {
+   /* Similar to what's done in vhost_get_vq_desc(), we need
+* to ensure the available ring entries have been exposed
+* by guest.
+*/


We need to be more verbose here. For example, which load needs to be
ordered with which load.

The rmb in vhost_get_vq_desc() is used to order the load of avail idx
and the load of head. It is paired with e.g virtio_wmb() in
virtqueue_add_split().

vhost_vq_avail_empty() are mostly used as a hint in
vhost_net_busy_poll() which is under the protection of the vq mutex.

An exception is the tx_can_batch(), but in that case it doesn't even
want to read the head.


Ok, if it is needed only in that path, maybe we can move the barriers there.



[cc Will Deacon]

Jason, appreciate for your review and comments. I think PATCH[1/2] is
the fix for the hypothesis, meaning PATCH[2/2] is the real fix. However,
it would be nice to fix all of them in one shoot. I will try with PATCH[2/2]
only to see if our issue will disappear or not. However, the issue still
exists if PATCH[2/2] is missed.

Firstly, We were failing on the transmit queue and {tvq, rvq}->busyloop_timeout
== false if I remember correctly. So the added smp_rmb() in 
vhost_vq_avail_empty()
is only a concern to tx_can_batch(). A mutex isn't enough to ensure the order
for the available index and available ring entry (head). For example, 
vhost_vq_avail_empty()
called by tx_can_batch() can see next available index, but its corresponding
available ring entry (head) may not be seen by vhost yet if smp_rmb() is missed.
The next call to get_tx_bufs(), where the available ring entry (head) doesn't
arrived yet, leading to stale available ring entry (head) being fetched.

  handle_tx_copy
get_tx_bufs // smp_rmb() won't be executed when vq->avail_idx 
!= vq->last_avail_idx
tx_can_batch
  vhost_vq_avail_empty  // vq->avail_idx is updated from vq->avail->idx

The reason why I added smp_rmb() to vhost_vq_avail_empty() is because the 
function
is a exposed API, even it's only used by drivers/vhost/net.c at present. It 
means
the API has been broken internally. So it seems more appropriate to fix it up in
vhost_vq_avail_empty() so that the API's users needn't worry about the memory 
access
order.





+   smp_rmb();
+   return false;
+   }

-   return vq->avail_idx == vq->last_avail_idx;
+   return true;
  }
  EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);


Thanks,
Gavin




Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()

2024-03-26 Thread Jason Wang
On Wed, Mar 27, 2024 at 10:34 AM Jason Wang  wrote:
>
> On Wed, Mar 27, 2024 at 7:39 AM Gavin Shan  wrote:
> >
> > A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
> > Will Deacon . Otherwise, it's not ensured the
> > available ring entries pushed by guest can be observed by vhost
> > in time, leading to stale available ring entries fetched by vhost
> > in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
> > grace-hopper (ARM64) platform.
> >
> >   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64  \
> >   -accel kvm -machine virt,gic-version=host -cpu host  \
> >   -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
> >   -m 4096M,slots=16,maxmem=64G \
> >   -object memory-backend-ram,id=mem0,size=4096M\
> >:   \
> >   -netdev tap,id=vnet0,vhost=true  \
> >   -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
> >:
> >   guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
> >   virtio_net virtio0: output.0:id 100 is not a head!
> >
> > Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
> > should be safe until vq->avail_idx is changed by commit 275bf960ac697
> > ("vhost: better detection of available buffers").
> >
> > Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
> > Cc:  # v4.11+
> > Reported-by: Yihuang Yu 
> > Signed-off-by: Gavin Shan 
> > ---
> >  drivers/vhost/vhost.c | 11 ++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 045f666b4f12..00445ab172b3 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, 
> > struct vhost_virtqueue *vq)
> > r = vhost_get_avail_idx(vq, _idx);
> > if (unlikely(r))
> > return false;
> > +
> > vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
> > +   if (vq->avail_idx != vq->last_avail_idx) {
> > +   /* Similar to what's done in vhost_get_vq_desc(), we need
> > +* to ensure the available ring entries have been exposed
> > +* by guest.
> > +*/
>
> We need to be more verbose here. For example, which load needs to be
> ordered with which load.
>
> The rmb in vhost_get_vq_desc() is used to order the load of avail idx
> and the load of head. It is paired with e.g virtio_wmb() in
> virtqueue_add_split().
>
> vhost_vq_avail_empty() are mostly used as a hint in
> vhost_net_busy_poll() which is under the protection of the vq mutex.
>
> An exception is the tx_can_batch(), but in that case it doesn't even
> want to read the head.

Ok, if it is needed only in that path, maybe we can move the barriers there.

Thanks

>
> Thanks
>
>
> > +   smp_rmb();
> > +   return false;
> > +   }
> >
> > -   return vq->avail_idx == vq->last_avail_idx;
> > +   return true;
> >  }
> >  EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
> >
> > --
> > 2.44.0
> >




Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()

2024-03-26 Thread Jason Wang
On Wed, Mar 27, 2024 at 7:39 AM Gavin Shan  wrote:
>
> A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
> Will Deacon . Otherwise, it's not ensured the
> available ring entries pushed by guest can be observed by vhost
> in time, leading to stale available ring entries fetched by vhost
> in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
> grace-hopper (ARM64) platform.
>
>   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64  \
>   -accel kvm -machine virt,gic-version=host -cpu host  \
>   -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
>   -m 4096M,slots=16,maxmem=64G \
>   -object memory-backend-ram,id=mem0,size=4096M\
>:   \
>   -netdev tap,id=vnet0,vhost=true  \
>   -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
>:
>   guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
>   virtio_net virtio0: output.0:id 100 is not a head!
>
> Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
> should be safe until vq->avail_idx is changed by commit 275bf960ac697
> ("vhost: better detection of available buffers").
>
> Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
> Cc:  # v4.11+
> Reported-by: Yihuang Yu 
> Signed-off-by: Gavin Shan 
> ---
>  drivers/vhost/vhost.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 045f666b4f12..00445ab172b3 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, 
> struct vhost_virtqueue *vq)
> r = vhost_get_avail_idx(vq, _idx);
> if (unlikely(r))
> return false;
> +
> vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
> +   if (vq->avail_idx != vq->last_avail_idx) {
> +   /* Similar to what's done in vhost_get_vq_desc(), we need
> +* to ensure the available ring entries have been exposed
> +* by guest.
> +*/

We need to be more verbose here. For example, which load needs to be
ordered with which load.

The rmb in vhost_get_vq_desc() is used to order the load of avail idx
and the load of head. It is paired with e.g virtio_wmb() in
virtqueue_add_split().

vhost_vq_avail_empty() are mostly used as a hint in
vhost_net_busy_poll() which is under the protection of the vq mutex.

An exception is the tx_can_batch(), but in that case it doesn't even
want to read the head.

Thanks


> +   smp_rmb();
> +   return false;
> +   }
>
> -   return vq->avail_idx == vq->last_avail_idx;
> +   return true;
>  }
>  EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
>
> --
> 2.44.0
>