Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-14 Thread Lee Jones
On Mon, 14 Mar 2022, Michael S. Tsirkin wrote:

> On Mon, Mar 14, 2022 at 08:43:02AM +, Lee Jones wrote:
> > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > during virtqueue clean-up and we mitigate the reported issues.
> > 
> > Also WARN() as a precautionary measure.  The purpose of this is to
> > capture possible future race conditions which may pop up over time.
> > 
> > Cc: 
> > Signed-off-by: Lee Jones 
> 
> Pls refer to my previous responses to this patch.  I'd like to see an
> argument for why this will make future bugs less and not more likely.

If you think the previous 'check owner' patch fixes all of the
concurrency issues, then this patch can be dropped.

-- 
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-14 Thread Michael S. Tsirkin
On Mon, Mar 14, 2022 at 08:43:02AM +, Lee Jones wrote:
> vhost_vsock_handle_tx_kick() already holds the mutex during its call
> to vhost_get_vq_desc().  All we have to do here is take the same lock
> during virtqueue clean-up and we mitigate the reported issues.
> 
> Also WARN() as a precautionary measure.  The purpose of this is to
> capture possible future race conditions which may pop up over time.
> 
> Cc: 
> Signed-off-by: Lee Jones 

Pls refer to my previous responses to this patch.  I'd like to see an
argument for why this will make future bugs less and not more likely.


> ---
>  drivers/vhost/vhost.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 59edb5a1ffe28..bbaff6a5e21b8 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>   int i;
>  
>   for (i = 0; i < dev->nvqs; ++i) {
> + mutex_lock(&dev->vqs[i]->mutex);
>   if (dev->vqs[i]->error_ctx)
>   eventfd_ctx_put(dev->vqs[i]->error_ctx);
>   if (dev->vqs[i]->kick)
> @@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>   if (dev->vqs[i]->call_ctx.ctx)
>   eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
>   vhost_vq_reset(dev, dev->vqs[i]);
> + mutex_unlock(&dev->vqs[i]->mutex);
>   }
>   vhost_dev_free_iovecs(dev);
>   if (dev->log_ctx)
> -- 
> 2.35.1.723.g4982287a31-goog



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-14 Thread Greg KH
On Mon, Mar 14, 2022 at 08:43:02AM +, Lee Jones wrote:
> vhost_vsock_handle_tx_kick() already holds the mutex during its call
> to vhost_get_vq_desc().  All we have to do here is take the same lock
> during virtqueue clean-up and we mitigate the reported issues.
> 
> Also WARN() as a precautionary measure.  The purpose of this is to
> capture possible future race conditions which may pop up over time.

These two sentences do not match your actual patch :(

> Cc: 
> Signed-off-by: Lee Jones 

What commit caused this problem?  Can you add a Fixes: line as well for
this?

thanks,

greg k-h


[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-14 Thread Lee Jones
vhost_vsock_handle_tx_kick() already holds the mutex during its call
to vhost_get_vq_desc().  All we have to do here is take the same lock
during virtqueue clean-up and we mitigate the reported issues.

Also WARN() as a precautionary measure.  The purpose of this is to
capture possible future race conditions which may pop up over time.

Cc: 
Signed-off-by: Lee Jones 
---
 drivers/vhost/vhost.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 59edb5a1ffe28..bbaff6a5e21b8 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
int i;
 
for (i = 0; i < dev->nvqs; ++i) {
+   mutex_lock(&dev->vqs[i]->mutex);
if (dev->vqs[i]->error_ctx)
eventfd_ctx_put(dev->vqs[i]->error_ctx);
if (dev->vqs[i]->kick)
@@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
if (dev->vqs[i]->call_ctx.ctx)
eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
vhost_vq_reset(dev, dev->vqs[i]);
+   mutex_unlock(&dev->vqs[i]->mutex);
}
vhost_dev_free_iovecs(dev);
if (dev->log_ctx)
-- 
2.35.1.723.g4982287a31-goog
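For readers less familiar with vhost, the pattern this patch relies on (the clean-up path takes the same per-virtqueue mutex that the kick handler already holds) can be sketched in userspace C. This is a minimal, hypothetical analogue, not vhost code: `struct toy_vq`, `toy_vq_init()`, `toy_handle_kick()` and `toy_vq_cleanup()` are invented names, a pthread mutex stands in for the kernel's `struct mutex`, and the `backend` pointer plays the role of the per-queue state a handler checks before doing any work.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue. The fields mirror the kind of
 * state vhost_dev_cleanup() tears down, not the real struct layout. */
struct toy_vq {
	pthread_mutex_t mutex; /* plays the role of vq->mutex */
	void *backend;         /* NULL once the queue has been reset */
	int kick_refs;         /* stands in for the eventfd references */
};

static void toy_vq_init(struct toy_vq *vq, void *backend)
{
	pthread_mutex_init(&vq->mutex, NULL);
	vq->backend = backend;
	vq->kick_refs = 1;
}

/* Handler path: works only while holding the mutex and bails out if the
 * queue has already been reset. Returns 1 if it did any work. */
static int toy_handle_kick(struct toy_vq *vq)
{
	int did_work = 0;

	pthread_mutex_lock(&vq->mutex);
	if (vq->backend != NULL) {
		/* ... fetch and process descriptors here ... */
		did_work = 1;
	}
	pthread_mutex_unlock(&vq->mutex);
	return did_work;
}

/* Clean-up path: drop references and reset under the same mutex, so a
 * concurrent handler sees either the old state or the reset state. */
static void toy_vq_cleanup(struct toy_vq *vq)
{
	pthread_mutex_lock(&vq->mutex);
	if (vq->kick_refs > 0)
		vq->kick_refs--;
	vq->backend = NULL;
	pthread_mutex_unlock(&vq->mutex);
}
```

With both paths serialised on one lock, a handler either runs entirely before the reset or sees `backend == NULL` and returns. The lock cannot help once the structure that contains it has been freed, which is the limitation raised later in the thread.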



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-09 Thread Leon Romanovsky
On Tue, Mar 08, 2022 at 09:57:57AM +0100, Greg KH wrote:
> On Tue, Mar 08, 2022 at 08:10:06AM +, Lee Jones wrote:
> > On Mon, 07 Mar 2022, Greg KH wrote:
> > 
> > > On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > 
> > > > Also WARN() as a precautionary measure.  The purpose of this is to
> > > > capture possible future race conditions which may pop up over time.
> > > > 
> > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > 
> > > > Cc: 
> > > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > > Signed-off-by: Lee Jones 
> > > > ---
> > > >  drivers/vhost/vhost.c | 10 ++
> > > >  1 file changed, 10 insertions(+)
> > > > 
> > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > > --- a/drivers/vhost/vhost.c
> > > > +++ b/drivers/vhost/vhost.c
> > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > int i;
> > > >  
> > > > for (i = 0; i < dev->nvqs; ++i) {
> > > > +   /* No workers should run here by design. However, races have
> > > > +* previously occurred where drivers have been unable to flush
> > > > +* all work properly prior to clean-up.  Without a successful
> > > > +* flush the guest will malfunction, but avoiding host memory
> > > > +* corruption in those cases does seem preferable.
> > > > +*/
> > > > +   WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > 
> > > So you are trading one syzbot triggered issue for another one in the
> > > future?  :)
> > > 
> > > If this ever can happen, handle it, but don't log it with a WARN_ON() as
> > > that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
> > > you want that to happen?
> > 
> > No, Syzbot doesn't report warnings, only BUGs and memory corruption.
> 
> Has it changed?  Last I looked, it did trigger on WARN_* calls, which
> has resulted in a huge number of kernel fixes because of that.
> 
> > > And what happens if the mutex is locked _RIGHT_ after you checked it?
> > > You still have a race...
> > 
> > No, we miss a warning that one time.  Memory is still protected.
> 
> Then don't warn on something that doesn't matter.  This line can be
> dropped as there's nothing anyone can do about it, right?

Greg, at least two other reviewers said that this line shouldn't be there
at all.

https://lore.kernel.org/all/cacgkmesjmcnqpjxpjxl0wufbmg8arnumep4yjuxqznmkr1n...@mail.gmail.com/
https://lore.kernel.org/all/YiG61RqXFvq%2Ft0fB@unreal/
https://lore.kernel.org/all/YiETnIcfZCLb63oB@unreal/

Thanks

> 
> thanks,
> 
> greg k-h


Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Michael S. Tsirkin
On Tue, Mar 08, 2022 at 01:17:03PM +, Lee Jones wrote:
> On Tue, 08 Mar 2022, Michael S. Tsirkin wrote:
> 
> > On Tue, Mar 08, 2022 at 12:45:19PM +0100, Greg KH wrote:
> > > On Tue, Mar 08, 2022 at 05:55:58AM -0500, Michael S. Tsirkin wrote:
> > > > On Tue, Mar 08, 2022 at 10:57:42AM +0100, Greg KH wrote:
> > > > > On Tue, Mar 08, 2022 at 09:15:27AM +, Lee Jones wrote:
> > > > > > On Tue, 08 Mar 2022, Greg KH wrote:
> > > > > > 
> > > > > > > On Tue, Mar 08, 2022 at 08:10:06AM +, Lee Jones wrote:
> > > > > > > > On Mon, 07 Mar 2022, Greg KH wrote:
> > > > > > > > 
> > > > > > > > > On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> > > > > > > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > > > > > > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > > > > > > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > > > > > > > 
> > > > > > > > > > Also WARN() as a precautionary measure.  The purpose of this is to
> > > > > > > > > > capture possible future race conditions which may pop up over time.
> > > > > > > > > > 
> > > > > > > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > > > > > > > 
> > > > > > > > > > Cc: 
> > > > > > > > > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > > > > > > > > Signed-off-by: Lee Jones 
> > > > > > > > > > ---
> > > > > > > > > >  drivers/vhost/vhost.c | 10 ++
> > > > > > > > > >  1 file changed, 10 insertions(+)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > > > > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > > > > > > > > --- a/drivers/vhost/vhost.c
> > > > > > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > > > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > > > > > int i;
> > > > > > > > > >  
> > > > > > > > > > for (i = 0; i < dev->nvqs; ++i) {
> > > > > > > > > > +   /* No workers should run here by design. However, races have
> > > > > > > > > > +* previously occurred where drivers have been unable to flush
> > > > > > > > > > +* all work properly prior to clean-up.  Without a successful
> > > > > > > > > > +* flush the guest will malfunction, but avoiding host memory
> > > > > > > > > > +* corruption in those cases does seem preferable.
> > > > > > > > > > +*/
> > > > > > > > > > +   WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > > > > > > > 
> > > > > > > > > So you are trading one syzbot triggered issue for another one in the
> > > > > > > > > future?  :)
> > > > > > > > > 
> > > > > > > > > If this ever can happen, handle it, but don't log it with a WARN_ON() as
> > > > > > > > > that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
> > > > > > > > > you want that to happen?
> > > > > > > > 
> > > > > > > > No, Syzbot doesn't report warnings, only BUGs and memory corruption.
> > > > > > > 
> > > > > > > Has it changed?  Last I looked, it did trigger on WARN_* calls, which
> > > > > > > has resulted in a huge number of kernel fixes because of that.
> > > > > > 
> > > > > > Everything is customisable in syzkaller, so maybe there are specific
> > > > > > builds with panic_on_warn enabled, but none that I'm involved with
> > > > > > do.
> > > > > 
> > > > > Many systems run with panic-on-warn (i.e. the cloud), as they want to
> > > > > drop a box and restart it if anything goes wrong.
> > > > > 
> > > > > That's why syzbot reports on WARN_* calls.  They should never be
> > > > > reachable by userspace actions.
> > > > > 
> > > > > > Here follows a topical example.  The report above in the Link: tag
> > > > > > comes with a crashlog [0].  In there you can see the WARN() at the
> > > > > > bottom of vhost_dev_cleanup() trigger many times due to a populated
> > > > > > (non-flushed) worker list, before finally tripping the BUG() which
> > > > > > triggers the report:
> > > > > > 
> > > > > > [0] https://syzkaller.appspot.com/text?tag=CrashLog=16a61fce70
> > > > > 
> > > > > Ok, so both happen here.  But don't add a warning for something that
> > > > > can't happen.  Just handle it and move on.  It looks like you are
> > > > > handling it in this code, so please drop the WARN_ON().
> > > > > 
> > > > > thanks,
> > > > > 
> > > > > greg k-h
> > > > 
> > > > Hmm. Well this will mean if we ever reintroduce the bug then
> > > > syzkaller will not catch it for us :( And the bug is there,
> > > > it just results in a hard to reproduce error 

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Lee Jones
On Tue, 08 Mar 2022, Michael S. Tsirkin wrote:

> On Tue, Mar 08, 2022 at 12:45:19PM +0100, Greg KH wrote:
> > On Tue, Mar 08, 2022 at 05:55:58AM -0500, Michael S. Tsirkin wrote:
> > > On Tue, Mar 08, 2022 at 10:57:42AM +0100, Greg KH wrote:
> > > > On Tue, Mar 08, 2022 at 09:15:27AM +, Lee Jones wrote:
> > > > > On Tue, 08 Mar 2022, Greg KH wrote:
> > > > > 
> > > > > > On Tue, Mar 08, 2022 at 08:10:06AM +, Lee Jones wrote:
> > > > > > > On Mon, 07 Mar 2022, Greg KH wrote:
> > > > > > > 
> > > > > > > > On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> > > > > > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > > > > > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > > > > > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > > > > > > 
> > > > > > > > > Also WARN() as a precautionary measure.  The purpose of this is to
> > > > > > > > > capture possible future race conditions which may pop up over time.
> > > > > > > > > 
> > > > > > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > > > > > > 
> > > > > > > > > Cc: 
> > > > > > > > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > > > > > > > Signed-off-by: Lee Jones 
> > > > > > > > > ---
> > > > > > > > >  drivers/vhost/vhost.c | 10 ++
> > > > > > > > >  1 file changed, 10 insertions(+)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > > > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > > > > > > > --- a/drivers/vhost/vhost.c
> > > > > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > > > >   int i;
> > > > > > > > >  
> > > > > > > > >   for (i = 0; i < dev->nvqs; ++i) {
> > > > > > > > > + /* No workers should run here by design. However, races have
> > > > > > > > > +  * previously occurred where drivers have been unable to flush
> > > > > > > > > +  * all work properly prior to clean-up.  Without a successful
> > > > > > > > > +  * flush the guest will malfunction, but avoiding host memory
> > > > > > > > > +  * corruption in those cases does seem preferable.
> > > > > > > > > +  */
> > > > > > > > > + WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > > > > > > 
> > > > > > > > So you are trading one syzbot triggered issue for another one in the
> > > > > > > > future?  :)
> > > > > > > > 
> > > > > > > > If this ever can happen, handle it, but don't log it with a WARN_ON() as
> > > > > > > > that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
> > > > > > > > you want that to happen?
> > > > > > > 
> > > > > > > No, Syzbot doesn't report warnings, only BUGs and memory corruption.
> > > > > > 
> > > > > > Has it changed?  Last I looked, it did trigger on WARN_* calls, which
> > > > > > has resulted in a huge number of kernel fixes because of that.
> > > > > 
> > > > > Everything is customisable in syzkaller, so maybe there are specific
> > > > > builds with panic_on_warn enabled, but none that I'm involved with
> > > > > do.
> > > > 
> > > > Many systems run with panic-on-warn (i.e. the cloud), as they want to
> > > > drop a box and restart it if anything goes wrong.
> > > > 
> > > > That's why syzbot reports on WARN_* calls.  They should never be
> > > > reachable by userspace actions.
> > > > 
> > > > > Here follows a topical example.  The report above in the Link: tag
> > > > > comes with a crashlog [0].  In there you can see the WARN() at the
> > > > > bottom of vhost_dev_cleanup() trigger many times due to a populated
> > > > > (non-flushed) worker list, before finally tripping the BUG() which
> > > > > triggers the report:
> > > > > 
> > > > > [0] https://syzkaller.appspot.com/text?tag=CrashLog=16a61fce70
> > > > 
> > > > Ok, so both happen here.  But don't add a warning for something that
> > > > can't happen.  Just handle it and move on.  It looks like you are
> > > > handling it in this code, so please drop the WARN_ON().
> > > > 
> > > > thanks,
> > > > 
> > > > greg k-h
> > > 
> > > Hmm. Well this will mean if we ever reintroduce the bug then
> > > syzkaller will not catch it for us :( And the bug is there,
> > > it just results in a hard to reproduce error for userspace.
> > 
> > Is this an error you can recover from in the kernel?  What is userspace
> > supposed to know with this information when it sees it?
> 
> IIUC we are talking about a use after free here since we somehow
> managed to have a pointer to the 

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Michael S. Tsirkin
On Tue, Mar 08, 2022 at 12:45:19PM +0100, Greg KH wrote:
> On Tue, Mar 08, 2022 at 05:55:58AM -0500, Michael S. Tsirkin wrote:
> > On Tue, Mar 08, 2022 at 10:57:42AM +0100, Greg KH wrote:
> > > On Tue, Mar 08, 2022 at 09:15:27AM +, Lee Jones wrote:
> > > > On Tue, 08 Mar 2022, Greg KH wrote:
> > > > 
> > > > > On Tue, Mar 08, 2022 at 08:10:06AM +, Lee Jones wrote:
> > > > > > On Mon, 07 Mar 2022, Greg KH wrote:
> > > > > > 
> > > > > > > On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> > > > > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > > > > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > > > > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > > > > > 
> > > > > > > > Also WARN() as a precautionary measure.  The purpose of this is to
> > > > > > > > capture possible future race conditions which may pop up over time.
> > > > > > > > 
> > > > > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > > > > > 
> > > > > > > > Cc: 
> > > > > > > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > > > > > > Signed-off-by: Lee Jones 
> > > > > > > > ---
> > > > > > > >  drivers/vhost/vhost.c | 10 ++
> > > > > > > >  1 file changed, 10 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > > > > > > --- a/drivers/vhost/vhost.c
> > > > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > > > int i;
> > > > > > > >  
> > > > > > > > for (i = 0; i < dev->nvqs; ++i) {
> > > > > > > > +   /* No workers should run here by design. However, races have
> > > > > > > > +* previously occurred where drivers have been unable to flush
> > > > > > > > +* all work properly prior to clean-up.  Without a successful
> > > > > > > > +* flush the guest will malfunction, but avoiding host memory
> > > > > > > > +* corruption in those cases does seem preferable.
> > > > > > > > +*/
> > > > > > > > +   WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > > > > > 
> > > > > > > So you are trading one syzbot triggered issue for another one in the
> > > > > > > future?  :)
> > > > > > > 
> > > > > > > If this ever can happen, handle it, but don't log it with a WARN_ON() as
> > > > > > > that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
> > > > > > > you want that to happen?
> > > > > > 
> > > > > > No, Syzbot doesn't report warnings, only BUGs and memory corruption.
> > > > > 
> > > > > Has it changed?  Last I looked, it did trigger on WARN_* calls, which
> > > > > has resulted in a huge number of kernel fixes because of that.
> > > > 
> > > > Everything is customisable in syzkaller, so maybe there are specific
> > > > builds with panic_on_warn enabled, but none that I'm involved with
> > > > do.
> > > 
> > > Many systems run with panic-on-warn (i.e. the cloud), as they want to
> > > drop a box and restart it if anything goes wrong.
> > > 
> > > That's why syzbot reports on WARN_* calls.  They should never be
> > > reachable by userspace actions.
> > > 
> > > > Here follows a topical example.  The report above in the Link: tag
> > > > comes with a crashlog [0].  In there you can see the WARN() at the
> > > > bottom of vhost_dev_cleanup() trigger many times due to a populated
> > > > (non-flushed) worker list, before finally tripping the BUG() which
> > > > triggers the report:
> > > > 
> > > > [0] https://syzkaller.appspot.com/text?tag=CrashLog=16a61fce70
> > > 
> > > Ok, so both happen here.  But don't add a warning for something that
> > > can't happen.  Just handle it and move on.  It looks like you are
> > > handling it in this code, so please drop the WARN_ON().
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > 
> > Hmm. Well this will mean if we ever reintroduce the bug then
> > syzkaller will not catch it for us :( And the bug is there,
> > it just results in a hard to reproduce error for userspace.
> 
> Is this an error you can recover from in the kernel?  What is userspace
> supposed to know with this information when it sees it?

IIUC we are talking about a use after free here, since we somehow
managed to have a pointer to the device in a worker while the
device is being destroyed.

That's the point of the warning, as a use after free is hard to debug.
You ask: can we recover from a use after free?
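Both descriptions in this exchange (a worker still holding a pointer to the device while it is destroyed, and the crashlog showing a populated, non-flushed work list at clean-up time) point at one ordering invariant: all queued work has to be flushed before teardown. A single-threaded sketch of that invariant follows; `struct toy_dev`, `toy_queue_work()`, `toy_flush()`, `toy_cleanup()` and `toy_count_work()` are hypothetical, and a fixed-size array stands in for the kernel's work list. It illustrates the ordering only and is not meant to represent the actual vhost fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WORK 8

struct toy_dev;
typedef void (*toy_work_fn)(struct toy_dev *dev);

/* Hypothetical device with a simple pending-work queue. */
struct toy_dev {
	toy_work_fn pending[TOY_MAX_WORK];
	size_t npending;
	int processed;  /* counts work items that have run */
	bool torn_down;
};

/* Refuses new work once teardown has happened or the queue is full. */
static bool toy_queue_work(struct toy_dev *dev, toy_work_fn fn)
{
	if (dev->torn_down || dev->npending == TOY_MAX_WORK)
		return false;
	dev->pending[dev->npending++] = fn;
	return true;
}

/* Runs every queued item; afterwards no queued work refers to dev. */
static void toy_flush(struct toy_dev *dev)
{
	for (size_t i = 0; i < dev->npending; i++)
		dev->pending[i](dev);
	dev->npending = 0;
}

/* Teardown reports the violation to its caller instead of proceeding
 * while work is still queued. */
static bool toy_cleanup(struct toy_dev *dev)
{
	if (dev->npending != 0)
		return false;
	dev->torn_down = true;
	return true;
}

/* Example work item. */
static void toy_count_work(struct toy_dev *dev)
{
	dev->processed++;
}
```

`toy_cleanup()` returns an error rather than warning, in the spirit of the "handle it and move on" advice in this thread; the point is that the flush has to come first.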

As regards the added lock, IIUC it kind of shifts the use after free
window to later and since we 

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Greg KH
On Tue, Mar 08, 2022 at 05:55:58AM -0500, Michael S. Tsirkin wrote:
> On Tue, Mar 08, 2022 at 10:57:42AM +0100, Greg KH wrote:
> > On Tue, Mar 08, 2022 at 09:15:27AM +, Lee Jones wrote:
> > > On Tue, 08 Mar 2022, Greg KH wrote:
> > > 
> > > > On Tue, Mar 08, 2022 at 08:10:06AM +, Lee Jones wrote:
> > > > > On Mon, 07 Mar 2022, Greg KH wrote:
> > > > > 
> > > > > > On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> > > > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > > > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > > > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > > > > 
> > > > > > > Also WARN() as a precautionary measure.  The purpose of this is to
> > > > > > > capture possible future race conditions which may pop up over time.
> > > > > > > 
> > > > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > > > > 
> > > > > > > Cc: 
> > > > > > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > > > > > Signed-off-by: Lee Jones 
> > > > > > > ---
> > > > > > >  drivers/vhost/vhost.c | 10 ++
> > > > > > >  1 file changed, 10 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > > > > > --- a/drivers/vhost/vhost.c
> > > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > >   int i;
> > > > > > >  
> > > > > > >   for (i = 0; i < dev->nvqs; ++i) {
> > > > > > > + /* No workers should run here by design. However, races have
> > > > > > > +  * previously occurred where drivers have been unable to flush
> > > > > > > +  * all work properly prior to clean-up.  Without a successful
> > > > > > > +  * flush the guest will malfunction, but avoiding host memory
> > > > > > > +  * corruption in those cases does seem preferable.
> > > > > > > +  */
> > > > > > > + WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > > > > 
> > > > > > So you are trading one syzbot triggered issue for another one in the
> > > > > > future?  :)
> > > > > > 
> > > > > > If this ever can happen, handle it, but don't log it with a WARN_ON() as
> > > > > > that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
> > > > > > you want that to happen?
> > > > > 
> > > > > No, Syzbot doesn't report warnings, only BUGs and memory corruption.
> > > > 
> > > > Has it changed?  Last I looked, it did trigger on WARN_* calls, which
> > > > has resulted in a huge number of kernel fixes because of that.
> > > 
> > > Everything is customisable in syzkaller, so maybe there are specific
> > > builds with panic_on_warn enabled, but none that I'm involved with
> > > do.
> > 
> > Many systems run with panic-on-warn (i.e. the cloud), as they want to
> > drop a box and restart it if anything goes wrong.
> > 
> > That's why syzbot reports on WARN_* calls.  They should never be
> > reachable by userspace actions.
> > 
> > > Here follows a topical example.  The report above in the Link: tag
> > > comes with a crashlog [0].  In there you can see the WARN() at the
> > > bottom of vhost_dev_cleanup() trigger many times due to a populated
> > > (non-flushed) worker list, before finally tripping the BUG() which
> > > triggers the report:
> > > 
> > > [0] https://syzkaller.appspot.com/text?tag=CrashLog=16a61fce70
> > 
> > Ok, so both happen here.  But don't add a warning for something that
> > can't happen.  Just handle it and move on.  It looks like you are
> > handling it in this code, so please drop the WARN_ON().
> > 
> > thanks,
> > 
> > greg k-h
> 
> Hmm. Well this will mean if we ever reintroduce the bug then
> syzkaller will not catch it for us :( And the bug is there,
> it just results in a hard to reproduce error for userspace.

Is this an error you can recover from in the kernel?  What is userspace
supposed to know with this information when it sees it?

> Not sure what to do here. Export panic_on_warn flag to modules
> and check it here?

Hah, no, never do that :)

thanks,

greg k-h
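The disagreement here turns on what a WARN does in production. In the kernel, WARN_ON() evaluates to its condition, so a caller can write `if (WARN_ON(cond)) return ...;`, and with the panic_on_warn setting enabled the warning itself brings the machine down. The sketch below models both choices in userspace; `toy_panic_on_warn`, `toy_panicked`, `toy_warn_on()`, `cleanup_with_warn()` and `cleanup_quietly()` are hypothetical stand-ins, and "panicking" only sets a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical knobs standing in for the panic_on_warn setting and for
 * the machine going down; nothing here touches a real kernel. */
static bool toy_panic_on_warn;
static bool toy_panicked;

/* Like WARN_ON(), returns the condition so callers can branch on it;
 * unlike it, "panicking" just records that it would have happened. */
static bool toy_warn_on(bool cond, const char *what)
{
	if (cond) {
		fprintf(stderr, "WARNING: %s\n", what);
		if (toy_panic_on_warn)
			toy_panicked = true;
	}
	return cond;
}

/* Variant 1: warn, then recover. If userspace can reach this path, a
 * panic_on_warn machine can be taken down on demand. */
static int cleanup_with_warn(bool busy)
{
	if (toy_warn_on(busy, "queue busy during clean-up"))
		return -1;
	return 0;
}

/* Variant 2: handle it and move on, with no warning. */
static int cleanup_quietly(bool busy)
{
	if (busy)
		return -1;
	return 0;
}
```

With the flag set, only the warning variant "panics", even though both variants recover in exactly the same way. The trade-off Michael describes is the other side of this: the quiet variant also gives a fuzzer nothing to report.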


Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Michael S. Tsirkin
On Tue, Mar 08, 2022 at 08:01:32AM +, Lee Jones wrote:
> On Mon, 07 Mar 2022, Michael S. Tsirkin wrote:
> 
> > On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > > during virtqueue clean-up and we mitigate the reported issues.
> > 
> > Pls just basically copy the code comment here. This just confuses.
> > 
> > > Also WARN() as a precautionary measure.  The purpose of this is to
> > > capture possible future race conditions which may pop up over time.
> > > 
> > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > 
> > And this is a bug we already fixed, right?
> 
> Well, this was the bug I set out to fix.
> 
> I didn't know your patch was in flight at the time.
> 
> > > Cc: 
> > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > 
> > not really applicable anymore ...
> 
> I can remove these if it helps.

yes let's do that pls.



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Michael S. Tsirkin
On Tue, Mar 08, 2022 at 08:08:25AM +, Lee Jones wrote:
> On Tue, 08 Mar 2022, Jason Wang wrote:
> 
> > On Tue, Mar 8, 2022 at 3:18 AM Lee Jones  wrote:
> > >
> > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > > during virtqueue clean-up and we mitigate the reported issues.
> > >
> > > Also WARN() as a precautionary measure.  The purpose of this is to
> > > capture possible future race conditions which may pop up over time.
> > >
> > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > >
> > > Cc: 
> > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > Signed-off-by: Lee Jones 
> > > ---
> > >  drivers/vhost/vhost.c | 10 ++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > int i;
> > >
> > > for (i = 0; i < dev->nvqs; ++i) {
> > > +   /* No workers should run here by design. However, races have
> > > +* previously occurred where drivers have been unable to flush
> > > +* all work properly prior to clean-up.  Without a successful
> > > +* flush the guest will malfunction, but avoiding host memory
> > > +* corruption in those cases does seem preferable.
> > > +*/
> > > +   WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > +
> > 
> > I don't get how this can help, the mutex could be grabbed in the
> > middle of the above and below line.
> 
> The worst that happens in this slim scenario is we miss a warning.
> The mutexes below will still function as expected and prevent possible
> memory corruption.

Maybe. Or maybe corruption already happened and this is the
fallout.
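Jason's objection and Lee's reply can be made concrete. mutex_is_locked() only reports a snapshot, so the mutex can be taken between the WARN_ON() check and the mutex_lock() that follows it. The userspace sketch below replays that interleaving step by step in a single thread; `toy_mutex_is_locked()`, `struct replay_result` and `replay_check_then_lock()` are hypothetical, `pthread_mutex_trylock()` approximates the kernel helper, and in step 3 it also stands in for the blocking mutex_lock() so that the replay does not deadlock on itself.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace approximation of mutex_is_locked(): try to take the lock and
 * release it straight away. The answer may be stale by the time the
 * caller acts on it. */
static bool toy_mutex_is_locked(pthread_mutex_t *m)
{
	int rc = pthread_mutex_trylock(m);

	if (rc != 0)
		return rc == EBUSY;
	pthread_mutex_unlock(m);
	return false;
}

struct replay_result {
	bool warned;          /* did the WARN_ON-style check fire? */
	bool cleanup_entered; /* did clean-up get in while the worker held it? */
};

/* Replays one interleaving deterministically:
 *   1. clean-up checks the mutex and sees it free (no warning),
 *   2. a "worker" takes the mutex before clean-up reaches its lock,
 *   3. clean-up tries to take the mutex and is kept out. */
static struct replay_result replay_check_then_lock(void)
{
	pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
	struct replay_result r;

	r.warned = toy_mutex_is_locked(&m);                 /* step 1 */
	pthread_mutex_lock(&m);                             /* step 2: worker */
	r.cleanup_entered = pthread_mutex_trylock(&m) == 0; /* step 3 */
	pthread_mutex_unlock(&m);                           /* worker finishes */
	pthread_mutex_destroy(&m);
	return r;
}
```

The replay reports that the check saw the mutex as free while clean-up was still kept out by the lock itself, which matches Lee's "we miss a warning that one time" description. It says nothing about the case Michael raises, where the memory holding the mutex is no longer valid.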

> > > +   mutex_lock(&dev->vqs[i]->mutex);
> > > if (dev->vqs[i]->error_ctx)
> > > eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > > if (dev->vqs[i]->kick)
> > > @@ -700,6 +709,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > if (dev->vqs[i]->call_ctx.ctx)
> > > eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
> > > vhost_vq_reset(dev, dev->vqs[i]);
> > > +   mutex_unlock(&dev->vqs[i]->mutex);
> > > }
> > 
> > I'm not sure it's correct to assume some behaviour of a buggy device.
> > For the device mutex, we use that to protect more than just err/call
> > and vq.
> 
> When I authored this, I did so as *the* fix.  However, since the cause
> of today's crash has now been patched, this has become a belt and
> braces solution.  Michael's addition of the WARN() also has the
> benefit of providing us with an early warning system for future
> breakages.  Personally, I think it's kinda neat.
> 
> -- 
> Lee Jones [李琼斯]
> Principal Technical Lead - Developer Services
> Linaro.org │ Open source software for Arm SoCs
> Follow Linaro: Facebook | Twitter | Blog

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Michael S. Tsirkin
On Tue, Mar 08, 2022 at 09:57:57AM +0100, Greg KH wrote:
> > > And what happens if the mutex is locked _RIGHT_ after you checked it?
> > > You still have a race...
> > 
> > No, we miss a warning that one time.  Memory is still protected.
> 
> Then don't warn on something that doesn't matter.  This line can be
> dropped as there's nothing anyone can do about it, right?

I mean, the reason I wanted the warning is because there's a kernel
bug, and it will break userspace. warning is just telling us this.
is the bug reachable from userspace? if we knew that, we wouldn't
need the lock ...

-- 
MST



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Michael S. Tsirkin
On Tue, Mar 08, 2022 at 10:57:42AM +0100, Greg KH wrote:
> On Tue, Mar 08, 2022 at 09:15:27AM +, Lee Jones wrote:
> > On Tue, 08 Mar 2022, Greg KH wrote:
> > 
> > > On Tue, Mar 08, 2022 at 08:10:06AM +, Lee Jones wrote:
> > > > On Mon, 07 Mar 2022, Greg KH wrote:
> > > > 
> > > > > On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> > > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > > > 
> > > > > > Also WARN() as a precautionary measure.  The purpose of this is to
> > > > > > capture possible future race conditions which may pop up over time.
> > > > > > 
> > > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > > > 
> > > > > > Cc: 
> > > > > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > > > > Signed-off-by: Lee Jones 
> > > > > > ---
> > > > > >  drivers/vhost/vhost.c | 10 ++
> > > > > >  1 file changed, 10 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > > > > --- a/drivers/vhost/vhost.c
> > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > int i;
> > > > > >  
> > > > > > for (i = 0; i < dev->nvqs; ++i) {
> > > > > > +   /* No workers should run here by design. However, races have
> > > > > > +* previously occurred where drivers have been unable to flush
> > > > > > +* all work properly prior to clean-up.  Without a successful
> > > > > > +* flush the guest will malfunction, but avoiding host memory
> > > > > > +* corruption in those cases does seem preferable.
> > > > > > +*/
> > > > > > +   WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > > > 
> > > > > So you are trading one syzbot triggered issue for another one in the
> > > > > future?  :)
> > > > > 
> > > > > If this ever can happen, handle it, but don't log it with a WARN_ON() as
> > > > > that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
> > > > > you want that to happen?
> > > > 
> > > > No, Syzbot doesn't report warnings, only BUGs and memory corruption.
> > > 
> > > Has it changed?  Last I looked, it did trigger on WARN_* calls, which
> > > has resulted in a huge number of kernel fixes because of that.
> > 
> > Everything is customisable in syzkaller, so maybe there are specific
> > builds with panic_on_warn enabled, but none that I'm involved with
> > do.
> 
> Many systems run with panic-on-warn (i.e. the cloud), as they want to
> drop a box and restart it if anything goes wrong.
> 
> That's why syzbot reports on WARN_* calls.  They should never be
> reachable by userspace actions.
> 
> > Here follows a topical example.  The report above in the Link: tag
> > comes with a crashlog [0].  In there you can see the WARN() at the
> > bottom of vhost_dev_cleanup() trigger many times due to a populated
> > (non-flushed) worker list, before finally tripping the BUG() which
> > triggers the report:
> > 
> > [0] https://syzkaller.appspot.com/text?tag=CrashLog=16a61fce70
> 
> Ok, so both happen here.  But don't add a warning for something that
> can't happen.  Just handle it and move on.  It looks like you are
> handling it in this code, so please drop the WARN_ON().
> 
> thanks,
> 
> greg k-h

Hmm. Well this will mean if we ever reintroduce the bug then
syzkaller will not catch it for us :( And the bug is there,
it just results in a hard to reproduce error for userspace.

Not sure what to do here. Export panic_on_warn flag to modules
and check it here?


-- 
MST



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Lee Jones
On Tue, 08 Mar 2022, Greg KH wrote:

> On Tue, Mar 08, 2022 at 09:15:27AM +, Lee Jones wrote:
> > On Tue, 08 Mar 2022, Greg KH wrote:
> > 
> > > On Tue, Mar 08, 2022 at 08:10:06AM +, Lee Jones wrote:
> > > > On Mon, 07 Mar 2022, Greg KH wrote:
> > > > 
> > > > > On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> > > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > > > 
> > > > > > Also WARN() as a precautionary measure.  The purpose of this is to
> > > > > > capture possible future race conditions which may pop up over time.
> > > > > > 
> > > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > > > 
> > > > > > Cc: 
> > > > > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > > > > Signed-off-by: Lee Jones 
> > > > > > ---
> > > > > >  drivers/vhost/vhost.c | 10 ++
> > > > > >  1 file changed, 10 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > > > > --- a/drivers/vhost/vhost.c
> > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > int i;
> > > > > >  
> > > > > > for (i = 0; i < dev->nvqs; ++i) {
> > > > > > +   /* No workers should run here by design. However, races have
> > > > > > +* previously occurred where drivers have been unable to flush
> > > > > > +* all work properly prior to clean-up.  Without a successful
> > > > > > +* flush the guest will malfunction, but avoiding host memory
> > > > > > +* corruption in those cases does seem preferable.
> > > > > > +*/
> > > > > > +   WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > > > 
> > > > > So you are trading one syzbot triggered issue for another one in the
> > > > > future?  :)
> > > > > 
> > > > > If this ever can happen, handle it, but don't log it with a WARN_ON() as
> > > > > that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
> > > > > you want that to happen?
> > > > 
> > > > No, Syzbot doesn't report warnings, only BUGs and memory corruption.
> > > 
> > > Has it changed?  Last I looked, it did trigger on WARN_* calls, which
> > > has resulted in a huge number of kernel fixes because of that.
> > 
> > Everything is customisable in syzkaller, so maybe there are specific
> > builds with panic_on_warn enabled, but none that I'm involved with
> > do.
> 
> Many systems run with panic-on-warn (i.e. the cloud), as they want to
> drop a box and restart it if anything goes wrong.
> 
> That's why syzbot reports on WARN_* calls.  They should never be
> reachable by userspace actions.
> 
> > Here follows a topical example.  The report above in the Link: tag
> > comes with a crashlog [0].  In there you can see the WARN() at the
> > bottom of vhost_dev_cleanup() trigger many times due to a populated
> > (non-flushed) worker list, before finally tripping the BUG() which
> > triggers the report:
> > 
> > [0] https://syzkaller.appspot.com/text?tag=CrashLog=16a61fce70
> 
> Ok, so both happen here.  But don't add a warning for something that
> can't happen.  Just handle it and move on.  It looks like you are
> handling it in this code, so please drop the WARN_ON().

Happy to oblige.

Let's give Michael a chance to speak, then I'll fix it up if he agrees.

-- 
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Greg KH
On Tue, Mar 08, 2022 at 09:15:27AM +, Lee Jones wrote:
> On Tue, 08 Mar 2022, Greg KH wrote:
> 
> > On Tue, Mar 08, 2022 at 08:10:06AM +, Lee Jones wrote:
> > > On Mon, 07 Mar 2022, Greg KH wrote:
> > > 
> > > > On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > > 
> > > > > Also WARN() as a precautionary measure.  The purpose of this is to
> > > > > capture possible future race conditions which may pop up over time.
> > > > > 
> > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > > 
> > > > > Cc: 
> > > > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > > > Signed-off-by: Lee Jones 
> > > > > ---
> > > > >  drivers/vhost/vhost.c | 10 ++
> > > > >  1 file changed, 10 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > > > --- a/drivers/vhost/vhost.c
> > > > > +++ b/drivers/vhost/vhost.c
> > > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > >   int i;
> > > > >  
> > > > >   for (i = 0; i < dev->nvqs; ++i) {
> > > > > + /* No workers should run here by design. However, races have
> > > > > +  * previously occurred where drivers have been unable to flush
> > > > > +  * all work properly prior to clean-up.  Without a successful
> > > > > +  * flush the guest will malfunction, but avoiding host memory
> > > > > +  * corruption in those cases does seem preferable.
> > > > > +  */
> > > > > + WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > > 
> > > > So you are trading one syzbot triggered issue for another one in the
> > > > future?  :)
> > > > 
> > > > If this ever can happen, handle it, but don't log it with a WARN_ON() as
> > > > that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
> > > > you want that to happen?
> > > 
> > > No, Syzbot doesn't report warnings, only BUGs and memory corruption.
> > 
> > Has it changed?  Last I looked, it did trigger on WARN_* calls, which
> > has resulted in a huge number of kernel fixes because of that.
> 
> Everything is customisable in syzkaller, so maybe there are specific
> builds with panic_on_warn enabled, but none that I'm involved with
> do.

Many systems run with panic-on-warn (i.e. the cloud), as they want to
drop a box and restart it if anything goes wrong.

That's why syzbot reports on WARN_* calls.  They should never be
reachable by userspace actions.

> Here follows a topical example.  The report above in the Link: tag
> comes with a crashlog [0].  In there you can see the WARN() at the
> bottom of vhost_dev_cleanup() trigger many times due to a populated
> (non-flushed) worker list, before finally tripping the BUG() which
> triggers the report:
> 
> [0] https://syzkaller.appspot.com/text?tag=CrashLog=16a61fce70

Ok, so both happen here.  But don't add a warning for something that
can't happen.  Just handle it and move on.  It looks like you are
handling it in this code, so please drop the WARN_ON().
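Greg's objection rests on how warnings behave on machines running with panic-on-warn: there the kernel turns any WARN into a panic, which is why a WARN_ON() that userspace can reach is treated as a bug in its own right. The toy userspace model below contrasts the two options discussed in this thread, warning and then handling versus handling silently. It is a sketch only: `toy_warn_on()`, `toy_panic_on_warn`, `cleanup_step()` and `toy_reset()` are hypothetical names, and the real warning path lives in the kernel, not in anything shown here. The one property it borrows from the kernel is that WARN_ON(cond) evaluates to the value of cond.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the kernel warning path (not kernel code). */
static bool toy_panic_on_warn;  /* models the panic_on_warn setting */
static int toy_warnings;        /* number of warnings emitted */
static bool toy_panicked;       /* set where a real box would panic */

/* Like WARN_ON(): evaluates to cond and reports when cond is true. */
static bool toy_warn_on(bool cond)
{
	if (cond) {
		toy_warnings++;
		if (toy_panic_on_warn)
			toy_panicked = true;  /* panic-on-warn boxes go down here */
	}
	return cond;
}

/* A clean-up step that may find the queue still busy. With warn set
 * it reports first; in both cases it carries on to the locked path. */
static void cleanup_step(bool queue_busy, bool warn)
{
	if (warn)
		(void)toy_warn_on(queue_busy);
	/* ... take the per-queue lock and clean up either way ... */
}

/* Start a fresh scenario with the given panic_on_warn setting. */
static void toy_reset(bool panic_on_warn)
{
	toy_panic_on_warn = panic_on_warn;
	toy_warnings = 0;
	toy_panicked = false;
}
```

With `toy_reset(true)` a busy queue plus `warn == true` sets `toy_panicked`, while `warn == false` leaves both counters untouched. That second case is also the cost Michael points out: once the warning is gone, a reintroduced bug no longer surfaces in fuzzing runs.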

thanks,

greg k-h


Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Lee Jones
On Tue, 08 Mar 2022, Greg KH wrote:

> On Tue, Mar 08, 2022 at 08:10:06AM +, Lee Jones wrote:
> > On Mon, 07 Mar 2022, Greg KH wrote:
> > 
> > > On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > 
> > > > Also WARN() as a precautionary measure.  The purpose of this is to
> > > > capture possible future race conditions which may pop up over time.
> > > > 
> > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > 
> > > > Cc: 
> > > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > > Signed-off-by: Lee Jones 
> > > > ---
> > > >  drivers/vhost/vhost.c | 10 ++
> > > >  1 file changed, 10 insertions(+)
> > > > 
> > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > > --- a/drivers/vhost/vhost.c
> > > > +++ b/drivers/vhost/vhost.c
> > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > int i;
> > > >  
> > > > for (i = 0; i < dev->nvqs; ++i) {
> > > > +   /* No workers should run here by design. However, races have
> > > > +* previously occurred where drivers have been unable to flush
> > > > +* all work properly prior to clean-up.  Without a successful
> > > > +* flush the guest will malfunction, but avoiding host memory
> > > > +* corruption in those cases does seem preferable.
> > > > +*/
> > > > +   WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > 
> > > So you are trading one syzbot triggered issue for another one in the
> > > future?  :)
> > > 
> > > If this ever can happen, handle it, but don't log it with a WARN_ON() as
> > > that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
> > > you want that to happen?
> > 
> > No, Syzbot doesn't report warnings, only BUGs and memory corruption.
> 
> Has it changed?  Last I looked, it did trigger on WARN_* calls, which
> has resulted in a huge number of kernel fixes because of that.

Everything is customisable in syzkaller, so maybe there are specific
builds with panic_on_warn enabled, but none that I'm involved with
do.

Here follows a topical example.  The report above in the Link: tag
comes with a crashlog [0].  In there you can see the WARN() at the
bottom of vhost_dev_cleanup() trigger many times due to a populated
(non-flushed) worker list, before finally tripping the BUG() which
triggers the report:

[0] https://syzkaller.appspot.com/text?tag=CrashLog=16a61fce70

> > > And what happens if the mutex is locked _RIGHT_ after you checked it?
> > > You still have a race...
> > 
> > No, we miss a warning that one time.  Memory is still protected.
> 
> Then don't warn on something that doesn't matter.  This line can be
> dropped as there's nothing anyone can do about it, right?

You'll have to take that point up with Michael.

-- 
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Greg KH
On Tue, Mar 08, 2022 at 08:10:06AM +, Lee Jones wrote:
> On Mon, 07 Mar 2022, Greg KH wrote:
> 
> > On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > > during virtqueue clean-up and we mitigate the reported issues.
> > > 
> > > Also WARN() as a precautionary measure.  The purpose of this is to
> > > capture possible future race conditions which may pop up over time.
> > > 
> > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > 
> > > Cc: 
> > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > Signed-off-by: Lee Jones 
> > > ---
> > >  drivers/vhost/vhost.c | 10 ++
> > >  1 file changed, 10 insertions(+)
> > > 
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > >   int i;
> > >  
> > >   for (i = 0; i < dev->nvqs; ++i) {
> > > + /* No workers should run here by design. However, races have
> > > +  * previously occurred where drivers have been unable to flush
> > > +  * all work properly prior to clean-up.  Without a successful
> > > +  * flush the guest will malfunction, but avoiding host memory
> > > +  * corruption in those cases does seem preferable.
> > > +  */
> > > + WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > 
> > So you are trading one syzbot triggered issue for another one in the
> > future?  :)
> > 
> > If this ever can happen, handle it, but don't log it with a WARN_ON() as
> > that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
> > you want that to happen?
> 
> No, Syzbot doesn't report warnings, only BUGs and memory corruption.

Has it changed?  Last I looked, it did trigger on WARN_* calls, which
has resulted in a huge number of kernel fixes because of that.

> > And what happens if the mutex is locked _RIGHT_ after you checked it?
> > You still have a race...
> 
> No, we miss a warning that one time.  Memory is still protected.

Then don't warn on something that doesn't matter.  This line can be
dropped as there's nothing anyone can do about it, right?

thanks,

greg k-h


Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Lee Jones
On Tue, 08 Mar 2022, Lee Jones wrote:

> On Mon, 07 Mar 2022, Greg KH wrote:
> 
> > On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > > during virtqueue clean-up and we mitigate the reported issues.
> > > 
> > > Also WARN() as a precautionary measure.  The purpose of this is to
> > > capture possible future race conditions which may pop up over time.
> > > 
> > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > 
> > > Cc: 
> > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > Signed-off-by: Lee Jones 
> > > ---
> > >  drivers/vhost/vhost.c | 10 ++
> > >  1 file changed, 10 insertions(+)
> > > 
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > >   int i;
> > >  
> > >   for (i = 0; i < dev->nvqs; ++i) {
> > > + /* No workers should run here by design. However, races have
> > > +  * previously occurred where drivers have been unable to flush
> > > +  * all work properly prior to clean-up.  Without a successful
> > > +  * flush the guest will malfunction, but avoiding host memory
> > > +  * corruption in those cases does seem preferable.
> > > +  */
> > > + WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > 
> > So you are trading one syzbot triggered issue for another one in the
> > future?  :)
> > 
> > If this ever can happen, handle it, but don't log it with a WARN_ON() as
> > that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
> > you want that to happen?
> 
> No, Syzbot doesn't report warnings, only BUGs and memory corruption.
> 
> > And what happens if the mutex is locked _RIGHT_ after you checked it?
> > You still have a race...
> 
> No, we miss a warning that one time.  Memory is still protected.

I didn't mean those "no"s to sound so negative, sorry. :)

-- 
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Lee Jones
On Mon, 07 Mar 2022, Greg KH wrote:

> On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > during virtqueue clean-up and we mitigate the reported issues.
> > 
> > Also WARN() as a precautionary measure.  The purpose of this is to
> > capture possible future race conditions which may pop up over time.
> > 
> > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > 
> > Cc: 
> > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > Signed-off-by: Lee Jones 
> > ---
> >  drivers/vhost/vhost.c | 10 ++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 59edb5a1ffe28..ef7e371e3e649 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > int i;
> >  
> > for (i = 0; i < dev->nvqs; ++i) {
> > +   /* No workers should run here by design. However, races have
> > +* previously occurred where drivers have been unable to flush
> > +* all work properly prior to clean-up.  Without a successful
> > +* flush the guest will malfunction, but avoiding host memory
> > +* corruption in those cases does seem preferable.
> > +*/
> > +   WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> 
> So you are trading one syzbot triggered issue for another one in the
> future?  :)
> 
> If this ever can happen, handle it, but don't log it with a WARN_ON() as
> that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
> you want that to happen?

No, Syzbot doesn't report warnings, only BUGs and memory corruption.

> And what happens if the mutex is locked _RIGHT_ after you checked it?
> You still have a race...

No, we miss a warning that one time.  Memory is still protected.

-- 
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Lee Jones
On Tue, 08 Mar 2022, Jason Wang wrote:

> On Tue, Mar 8, 2022 at 3:18 AM Lee Jones  wrote:
> >
> > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > during virtqueue clean-up and we mitigate the reported issues.
> >
> > Also WARN() as a precautionary measure.  The purpose of this is to
> > capture possible future race conditions which may pop up over time.
> >
> > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> >
> > Cc: 
> > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > Signed-off-by: Lee Jones 
> > ---
> >  drivers/vhost/vhost.c | 10 ++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 59edb5a1ffe28..ef7e371e3e649 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > int i;
> >
> > for (i = 0; i < dev->nvqs; ++i) {
> > +   /* No workers should run here by design. However, races have
> > +* previously occurred where drivers have been unable to flush
> > +* all work properly prior to clean-up.  Without a successful
> > +* flush the guest will malfunction, but avoiding host memory
> > +* corruption in those cases does seem preferable.
> > +*/
> > +   WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > +
> 
> I don't get how this can help, the mutex could be grabbed in the
> middle of the above and below line.

The worst that happens in this slim scenario is we miss a warning.
The mutexes below will still function as expected and prevent possible
memory corruption.
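The disagreement here is about what a `mutex_is_locked()` check actually buys. It is only a snapshot, while the subsequent `mutex_lock()` is what serialises clean-up against the worker. A small single-threaded model can replay the exact interleaving Jason describes. This is a userspace sketch, not kernel code: `toy_mutex`, `toy_trylock()` and `replay_check_then_lock()` are hypothetical names, the steps are replayed in a fixed order, and a failed `toy_trylock()` stands in for the point where the real `mutex_lock()` in `vhost_dev_cleanup()` would sleep until the worker releases the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock standing in for the per-virtqueue mutex. */
enum toy_actor { NOBODY = 0, CLEANUP = 1, WORKER = 2 };

struct toy_mutex {
	bool locked;
	enum toy_actor owner;
};

/* Like mutex_is_locked(): the answer is only true at this instant. */
static bool toy_is_locked(const struct toy_mutex *m)
{
	return m->locked;
}

/* Returns false where a real mutex_lock() would have to wait. */
static bool toy_trylock(struct toy_mutex *m, enum toy_actor who)
{
	if (m->locked)
		return false;
	m->locked = true;
	m->owner = who;
	return true;
}

static void toy_unlock(struct toy_mutex *m, enum toy_actor who)
{
	assert(m->locked && m->owner == who);
	m->locked = false;
	m->owner = NOBODY;
}

struct toy_trace {
	bool warned;          /* did the WARN_ON()-style check fire? */
	bool cleanup_waited;  /* was clean-up kept out while the worker ran? */
	bool reset_locked;    /* did clean-up hold the lock for its reset? */
};

/* Replays: check (lock free) -> worker locks -> clean-up locks. */
static struct toy_trace replay_check_then_lock(void)
{
	struct toy_mutex m = { false, NOBODY };
	struct toy_trace t = { false, false, false };
	bool worker_got_it;

	/* 1. Clean-up samples the lock: it is free, so no warning. */
	t.warned = toy_is_locked(&m);

	/* 2. The worker takes the lock in the window after the check. */
	worker_got_it = toy_trylock(&m, WORKER);
	assert(worker_got_it);
	(void)worker_got_it;

	/* 3. Clean-up tries to lock and is kept out (would sleep). */
	t.cleanup_waited = !toy_trylock(&m, CLEANUP);

	/* 4. The worker finishes and drops the lock. */
	toy_unlock(&m, WORKER);

	/* 5. Clean-up now owns the lock, resets the queue, releases it. */
	t.reset_locked = toy_trylock(&m, CLEANUP);
	if (t.reset_locked)
		toy_unlock(&m, CLEANUP);

	return t;
}
```

In this trace `warned` is false, so the warning is missed, while `cleanup_waited` and `reset_locked` are both true: the reset never overlaps the worker. The model does not address the separate question of whether a worker should still be running at this stage at all.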

> > +   mutex_lock(&dev->vqs[i]->mutex);
> > if (dev->vqs[i]->error_ctx)
> > eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > if (dev->vqs[i]->kick)
> > @@ -700,6 +709,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > if (dev->vqs[i]->call_ctx.ctx)
> > eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
> > vhost_vq_reset(dev, dev->vqs[i]);
> > +   mutex_unlock(&dev->vqs[i]->mutex);
> > }
> 
> I'm not sure it's correct to assume some behaviour of a buggy device.
> For the device mutex, we use that to protect more than just err/call
> and vq.

When I authored this, I did so as *the* fix.  However, since the cause
of today's crash has now been patched, this has become a belt and
braces solution.  Michael's addition of the WARN() also has the
benefit of providing us with an early warning system for future
breakages.  Personally, I think it's kinda neat.

-- 
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-08 Thread Lee Jones
On Mon, 07 Mar 2022, Michael S. Tsirkin wrote:

> On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > during virtqueue clean-up and we mitigate the reported issues.
> 
> Pls just basically copy the code comment here. this just confuses.
> 
> > Also WARN() as a precautionary measure.  The purpose of this is to
> > capture possible future race conditions which may pop up over time.
> > 
> > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> 
> And this is a bug we already fixed, right?

Well, this was the bug I set out to fix.

I didn't know your patch was in flight at the time.

> > Cc: 
> > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> 
> not really applicable anymore ...

I can remove these if it helps.

-- 
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-07 Thread Jason Wang
On Tue, Mar 8, 2022 at 3:18 AM Lee Jones  wrote:
>
> vhost_vsock_handle_tx_kick() already holds the mutex during its call
> to vhost_get_vq_desc().  All we have to do here is take the same lock
> during virtqueue clean-up and we mitigate the reported issues.
>
> Also WARN() as a precautionary measure.  The purpose of this is to
> capture possible future race conditions which may pop up over time.
>
> Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
>
> Cc: 
> Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> Signed-off-by: Lee Jones 
> ---
>  drivers/vhost/vhost.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 59edb5a1ffe28..ef7e371e3e649 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> int i;
>
> for (i = 0; i < dev->nvqs; ++i) {
> +   /* No workers should run here by design. However, races have
> +* previously occurred where drivers have been unable to flush
> +* all work properly prior to clean-up.  Without a successful
> +* flush the guest will malfunction, but avoiding host memory
> +* corruption in those cases does seem preferable.
> +*/
> +   WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> +

I don't get how this can help, the mutex could be grabbed in the
middle of the above and below line.

> +   mutex_lock(&dev->vqs[i]->mutex);
> if (dev->vqs[i]->error_ctx)
> eventfd_ctx_put(dev->vqs[i]->error_ctx);
> if (dev->vqs[i]->kick)
> @@ -700,6 +709,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> if (dev->vqs[i]->call_ctx.ctx)
> eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
> vhost_vq_reset(dev, dev->vqs[i]);
> +   mutex_unlock(&dev->vqs[i]->mutex);
> }

I'm not sure it's correct to assume some behaviour of a buggy device.
For the device mutex, we use that to protect more than just err/call
and vq.

Thanks

> vhost_dev_free_iovecs(dev);
> if (dev->log_ctx)
> --
> 2.35.1.616.g0bdcbb4464-goog
>
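The shape of the patched loop (for each virtqueue: take its lock, drop the context references it holds, reset the queue, release the lock) can be modelled in userspace as below. This is a sketch under stated assumptions, not the vhost code: `toy_ctx`, `toy_vq`, `toy_drop()` and `toy_dev_cleanup()` are hypothetical names, a plain refcount stands in for the eventfd contexts and kick file reference, and `toy_drop()` folds the put and the pointer clearing into one step, whereas the real function does the puts first and then calls `vhost_vq_reset()`, which resets many more fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted context (stands in for eventfd/file refs). */
struct toy_ctx {
	int refs;
};

/* Hypothetical virtqueue: a flag models the per-queue mutex. */
struct toy_vq {
	bool locked;
	struct toy_ctx *error_ctx;
	struct toy_ctx *kick;
	struct toy_ctx *call_ctx;
};

static void toy_ctx_put(struct toy_ctx *ctx)
{
	assert(ctx->refs > 0);
	ctx->refs--;
}

/* Put the reference if one is held, then clear the slot (the clearing
 * stands in for what vhost_vq_reset() does in the real code). */
static void toy_drop(struct toy_ctx **slot)
{
	if (*slot)
		toy_ctx_put(*slot);
	*slot = NULL;
}

/* Each queue is torn down only while its own lock is held. */
static void toy_dev_cleanup(struct toy_vq **vqs, int nvqs)
{
	for (int i = 0; i < nvqs; ++i) {
		struct toy_vq *vq = vqs[i];

		/* Single-threaded model: nobody else can hold the lock here;
		 * the real mutex_lock() would sleep until the worker is done. */
		assert(!vq->locked);
		vq->locked = true;

		toy_drop(&vq->error_ctx);
		toy_drop(&vq->kick);
		toy_drop(&vq->call_ctx);

		vq->locked = false;
	}
}
```

After `toy_dev_cleanup()` every held reference has been put exactly once, every slot is NULL and every queue lock is released. Jason's caveat still applies: the device-level mutex guards more state than this per-queue lock, so a model like this says nothing about those other fields.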



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-07 Thread Michael S. Tsirkin
On Mon, Mar 07, 2022 at 08:33:27PM +0100, Greg KH wrote:
> On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > during virtqueue clean-up and we mitigate the reported issues.
> > 
> > Also WARN() as a precautionary measure.  The purpose of this is to
> > capture possible future race conditions which may pop up over time.
> > 
> > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > 
> > Cc: 
> > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > Signed-off-by: Lee Jones 
> > ---
> >  drivers/vhost/vhost.c | 10 ++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 59edb5a1ffe28..ef7e371e3e649 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > int i;
> >  
> > for (i = 0; i < dev->nvqs; ++i) {
> > +   /* No workers should run here by design. However, races have
> > +* previously occurred where drivers have been unable to flush
> > +* all work properly prior to clean-up.  Without a successful
> > +* flush the guest will malfunction, but avoiding host memory
> > +* corruption in those cases does seem preferable.
> > +*/
> > +   WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> 
> So you are trading one syzbot triggered issue for another one in the
> future?  :)
> 
> If this ever can happen, handle it, but don't log it with a WARN_ON() as
> that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
> you want that to happen?
> 
> And what happens if the mutex is locked _RIGHT_ after you checked it?
> You still have a race...
> 
> thanks,
> 
> greg k-h

Well, it's a symptom of a kernel bug. I guess people with panic-on-warn
are not worried about DoS and are more worried about integrity
and security... am I right?

-- 
MST



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-07 Thread Michael S. Tsirkin
On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> vhost_vsock_handle_tx_kick() already holds the mutex during its call
> to vhost_get_vq_desc().  All we have to do here is take the same lock
> during virtqueue clean-up and we mitigate the reported issues.

Pls just basically copy the code comment here. This just confuses.

> Also WARN() as a precautionary measure.  The purpose of this is to
> capture possible future race conditions which may pop up over time.
> 
> Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00

And this is a bug we already fixed, right?

> Cc: 
> Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com

not really applicable anymore ...

> Signed-off-by: Lee Jones 
> ---
>  drivers/vhost/vhost.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 59edb5a1ffe28..ef7e371e3e649 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>   int i;
>  
>   for (i = 0; i < dev->nvqs; ++i) {
> + /* No workers should run here by design. However, races have
> +  * previously occurred where drivers have been unable to flush
> +  * all work properly prior to clean-up.  Without a successful
> +  * flush the guest will malfunction, but avoiding host memory
> +  * corruption in those cases does seem preferable.
> +  */
> + WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> +
> + mutex_lock(&dev->vqs[i]->mutex);
>   if (dev->vqs[i]->error_ctx)
>   eventfd_ctx_put(dev->vqs[i]->error_ctx);
>   if (dev->vqs[i]->kick)
> @@ -700,6 +709,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>   if (dev->vqs[i]->call_ctx.ctx)
>   eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
>   vhost_vq_reset(dev, dev->vqs[i]);
> + mutex_unlock(&dev->vqs[i]->mutex);
>   }
>   vhost_dev_free_iovecs(dev);
>   if (dev->log_ctx)
> -- 
> 2.35.1.616.g0bdcbb4464-goog



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-07 Thread Greg KH
On Mon, Mar 07, 2022 at 07:17:57PM +, Lee Jones wrote:
> vhost_vsock_handle_tx_kick() already holds the mutex during its call
> to vhost_get_vq_desc().  All we have to do here is take the same lock
> during virtqueue clean-up and we mitigate the reported issues.
> 
> Also WARN() as a precautionary measure.  The purpose of this is to
> capture possible future race conditions which may pop up over time.
> 
> Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> 
> Cc: 
> Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> Signed-off-by: Lee Jones 
> ---
>  drivers/vhost/vhost.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 59edb5a1ffe28..ef7e371e3e649 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>   int i;
>  
>   for (i = 0; i < dev->nvqs; ++i) {
> + /* No workers should run here by design. However, races have
> +  * previously occurred where drivers have been unable to flush
> +  * all work properly prior to clean-up.  Without a successful
> +  * flush the guest will malfunction, but avoiding host memory
> +  * corruption in those cases does seem preferable.
> +  */
> + WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));

So you are trading one syzbot triggered issue for another one in the
future?  :)

If this ever can happen, handle it, but don't log it with a WARN_ON() as
that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
you want that to happen?

And what happens if the mutex is locked _RIGHT_ after you checked it?
You still have a race...

thanks,

greg k-h
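Greg's objection is a check-then-act (TOCTOU) race: `mutex_is_locked()` only samples the lock state, so another thread can take the mutex between that check and the `mutex_lock()` that follows. A userspace sketch, assuming POSIX threads as a stand-in for kernel mutexes: `lock_and_report_contention()` is a hypothetical helper that uses `pthread_mutex_trylock()` so that observing contention and acquiring the lock happen in one step. In the kernel, `mutex_trylock()` would play the analogous role; whether warning on contention is desirable at all is exactly what the thread is debating.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical helper: acquire 'm' and report whether it was already
 * held at the moment we tried. The observation and the acquisition are
 * a single atomic trylock, so nothing can slip in between them, unlike
 * a separate is_locked() check followed by lock(). */
static bool lock_and_report_contention(pthread_mutex_t *m)
{
	if (pthread_mutex_trylock(m) == 0)
		return false;          /* uncontended: we now own the lock */

	/* Any trylock failure (normally EBUSY) is treated as contention:
	 * record it, then block until the current holder releases it. */
	pthread_mutex_lock(m);
	return true;
}
```

The caller always ends up owning the mutex, so the "was it contended?" answer can feed a warning without reopening the window Greg describes.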


[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-07 Thread Lee Jones
vhost_vsock_handle_tx_kick() already holds the mutex during its call
to vhost_get_vq_desc().  All we have to do here is take the same lock
during virtqueue clean-up and we mitigate the reported issues.

Also WARN() as a precautionary measure.  The purpose of this is to
capture possible future race conditions which may pop up over time.

Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00

Cc: 
Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
Signed-off-by: Lee Jones 
---
 drivers/vhost/vhost.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 59edb5a1ffe28..ef7e371e3e649 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 	int i;
 
 	for (i = 0; i < dev->nvqs; ++i) {
+		/* No workers should run here by design. However, races have
+		 * previously occurred where drivers have been unable to flush
+		 * all work properly prior to clean-up.  Without a successful
+		 * flush the guest will malfunction, but avoiding host memory
+		 * corruption in those cases does seem preferable.
+		 */
+		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
+
+		mutex_lock(&dev->vqs[i]->mutex);
 		if (dev->vqs[i]->error_ctx)
 			eventfd_ctx_put(dev->vqs[i]->error_ctx);
 		if (dev->vqs[i]->kick)
@@ -700,6 +709,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 		if (dev->vqs[i]->call_ctx.ctx)
 			eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
 		vhost_vq_reset(dev, dev->vqs[i]);
+		mutex_unlock(&dev->vqs[i]->mutex);
 	}
 	vhost_dev_free_iovecs(dev);
 	if (dev->log_ctx)
-- 
2.35.1.616.g0bdcbb4464-goog
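A userspace sketch of the locking pattern this patch relies on: the kick handler and the clean-up path serialise on the same per-virtqueue mutex, and the handler gives up if the backend pointer has been cleared (the thread notes that vhost_vq_reset() sets vq->private to NULL). This is a POSIX-threads analogy, not vhost code; `struct vq_model`, `handle_kick()` and `cleanup_vq()` are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue: a mutex plus a backend
 * pointer that the handler dereferences (cf. vq->private_data). */
struct vq_model {
	pthread_mutex_t mutex;
	int *backend;       /* NULL once the queue has been reset */
	int handled;        /* number of kicks that did real work */
};

/* Worker side: take the vq mutex before touching the queue and bail
 * out if the backend has already been cleared. */
static int handle_kick(struct vq_model *vq)
{
	int did_work = 0;

	pthread_mutex_lock(&vq->mutex);
	if (vq->backend) {
		(*vq->backend)++;   /* "use" the backend */
		vq->handled++;
		did_work = 1;
	}
	pthread_mutex_unlock(&vq->mutex);
	return did_work;
}

/* Clean-up side: hold the same mutex while resetting, so a concurrent
 * handler sees either the old state or the fully reset one, never a
 * half-torn-down queue. */
static void cleanup_vq(struct vq_model *vq)
{
	pthread_mutex_lock(&vq->mutex);
	vq->backend = NULL;
	pthread_mutex_unlock(&vq->mutex);
}
```

Holding the lock narrows the window but does not stop a worker that is still running; as the thread discusses, the worker still has to be stopped before clean-up.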



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-04 Thread Lee Jones
On Fri, 04 Mar 2022, Michael S. Tsirkin wrote:

> On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > to vhost_get_vq_desc().  All we have to do is take the same lock
> > during virtqueue clean-up and we mitigate the reported issues.
> > 
> > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > 
> > Cc: 
> > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > Signed-off-by: Lee Jones 
> 
> OK so please post series with this and the warning
> cleaned up comments and commit logs explaining that
> this is just to make debugging easier in case
> we have issues in the future, it's not a bugfix.

No problem.

Just to clarify, drop Cc: Stable also?

-- 
Lee Jones [李琼斯]

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-04 Thread Michael S. Tsirkin
On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> vhost_vsock_handle_tx_kick() already holds the mutex during its call
> to vhost_get_vq_desc().  All we have to do is take the same lock
> during virtqueue clean-up and we mitigate the reported issues.
> 
> Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> 
> Cc: 
> Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> Signed-off-by: Lee Jones 

OK so please post a series with this and the warning,
with cleaned-up comments and commit logs explaining that
this is just to make debugging easier in case
we have issues in the future; it's not a bugfix.

> ---
>  drivers/vhost/vhost.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 59edb5a1ffe28..bbaff6a5e21b8 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>   int i;
>  
>   for (i = 0; i < dev->nvqs; ++i) {
> + mutex_lock(&dev->vqs[i]->mutex);
>   if (dev->vqs[i]->error_ctx)
>   eventfd_ctx_put(dev->vqs[i]->error_ctx);
>   if (dev->vqs[i]->kick)
> @@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>   if (dev->vqs[i]->call_ctx.ctx)
>   eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
>   vhost_vq_reset(dev, dev->vqs[i]);
> + mutex_unlock(&dev->vqs[i]->mutex);
>   }
>   vhost_dev_free_iovecs(dev);
>   if (dev->log_ctx)
> -- 
> 2.35.1.574.g5d30c73bfb-goog



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-04 Thread Michael S. Tsirkin
On Wed, Mar 02, 2022 at 04:36:43PM +0100, Stefano Garzarella wrote:
> On Wed, Mar 02, 2022 at 09:50:38AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Mar 02, 2022 at 03:11:21PM +0100, Stefano Garzarella wrote:
> > > On Wed, Mar 02, 2022 at 08:35:08AM -0500, Michael S. Tsirkin wrote:
> > > > On Wed, Mar 02, 2022 at 10:34:46AM +0100, Stefano Garzarella wrote:
> > > > > On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > > to vhost_get_vq_desc().  All we have to do is take the same lock
> > > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > > >
> > > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > >
> > > > > This issue is similar to [1] that should be already fixed upstream by [2].
> > > > >
> > > > > However I think this patch would have prevented some issues, because
> > > > > vhost_vq_reset() sets vq->private to NULL, preventing the worker from
> > > > > running.
> > > > >
> > > > > Anyway I think that when we enter in vhost_dev_cleanup() the worker should
> > > > > be already stopped, so it shouldn't be necessary to take the mutex. But in
> > > > > order to prevent future issues maybe it's better to take them, so:
> > > > >
> > > > > Reviewed-by: Stefano Garzarella 
> > > > >
> > > > > [1] https://syzkaller.appspot.com/bug?id=993d8b5e64393ed9e6a70f9ae4de0119c605a822
> > > > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a58da53ffd70294ebea8ecd0eb45fd0d74add9f9
> > > >
> > > >
> > > > Right. I want to queue this but I would like to get a warning
> > > > so we can detect issues like [2] before they cause more issues.
> > > 
> > > I agree, what about moving the warning that we already have higher up, right
> > > at the beginning of the function?
> > > 
> > > I mean something like this:
> > > 
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index 59edb5a1ffe2..1721ff3f18c0 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -692,6 +692,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > >  {
> > > int i;
> > > +   WARN_ON(!llist_empty(&dev->work_list));
> > > +
> > > for (i = 0; i < dev->nvqs; ++i) {
> > > if (dev->vqs[i]->error_ctx)
> > > eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > > @@ -712,7 +714,6 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > dev->iotlb = NULL;
> > > vhost_clear_msg(dev);
> > > wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
> > > -   WARN_ON(!llist_empty(&dev->work_list));
> > > if (dev->worker) {
> > > kthread_stop(dev->worker);
> > > dev->worker = NULL;
> > > 
> > 
> > Hmm I'm not sure why it matters.
> 
> Because after this new patch, putting locks in the while loop, when we
> finish the loop the workers should be stopped, because vhost_vq_reset() sets
> vq->private to NULL.
> 
> But the best thing IMHO is to check that there is no backend set for each
> vq, so the workers have been stopped correctly at this point.
> 
> Thanks,
> Stefano

It's the list of workers waiting to run though. That is not affected by
vq lock at all.
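To illustrate why a virtqueue mutex has no effect on `dev->work_list`: that list is a lock-less list of queued work, so new entries are pushed with atomic operations and only draining or flushing empties it. A minimal C11 sketch of that kind of list, a push-only stack with an atomic "take everything" operation in the spirit of the kernel's `llist_add()` / `llist_del_all()`; `struct work_node`, `work_push()`, `work_take_all()` and `work_list_empty()` are hypothetical names and this is not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct work_node {
	struct work_node *next;
	int id;
};

/* Head of a lock-free singly linked list of pending work. */
struct work_list {
	_Atomic(struct work_node *) first;
};

/* Push with a compare-and-swap loop. No mutex is involved, which is why
 * holding an unrelated (e.g. per-virtqueue) lock cannot keep new work
 * from being queued. On CAS failure 'old' is reloaded automatically. */
static void work_push(struct work_list *l, struct work_node *n)
{
	struct work_node *old = atomic_load(&l->first);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&l->first, &old, n));
}

/* Detach every queued node in one atomic exchange (newest first). */
static struct work_node *work_take_all(struct work_list *l)
{
	return atomic_exchange(&l->first, NULL);
}

static bool work_list_empty(struct work_list *l)
{
	return atomic_load(&l->first) == NULL;
}
```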

-- 
MST



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-04 Thread Lee Jones
On Fri, 04 Mar 2022, Michael S. Tsirkin wrote:

> On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > to vhost_get_vq_desc().  All we have to do is take the same lock
> > during virtqueue clean-up and we mitigate the reported issues.
> > 
> > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > 
> > Cc: 
> > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > Signed-off-by: Lee Jones 
> 
> So combine with the warning patch and update description with
> the comment I posted, explaining it's more a just in case thing.

Will do.  Plan is to submit this on Monday.

-- 
Lee Jones [李琼斯]

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-03 Thread Michael S. Tsirkin
On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> vhost_vsock_handle_tx_kick() already holds the mutex during its call
> to vhost_get_vq_desc().  All we have to do is take the same lock
> during virtqueue clean-up and we mitigate the reported issues.
> 
> Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> 
> Cc: 
> Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> Signed-off-by: Lee Jones 

So combine with the warning patch and update description with
the comment I posted, explaining it's more a just in case thing.

> ---
>  drivers/vhost/vhost.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 59edb5a1ffe28..bbaff6a5e21b8 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>   int i;
>  
>   for (i = 0; i < dev->nvqs; ++i) {
> + mutex_lock(&dev->vqs[i]->mutex);
>   if (dev->vqs[i]->error_ctx)
>   eventfd_ctx_put(dev->vqs[i]->error_ctx);
>   if (dev->vqs[i]->kick)
> @@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>   if (dev->vqs[i]->call_ctx.ctx)
>   eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
>   vhost_vq_reset(dev, dev->vqs[i]);
> + mutex_unlock(&dev->vqs[i]->mutex);
>   }
>   vhost_dev_free_iovecs(dev);
>   if (dev->log_ctx)
> -- 
> 2.35.1.574.g5d30c73bfb-goog



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-03 Thread Lee Jones
On Wed, 02 Mar 2022, Stefano Garzarella wrote:

> On Wed, Mar 02, 2022 at 04:49:17PM +, Lee Jones wrote:
> > On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
> > 
> > > On Wed, Mar 02, 2022 at 05:28:31PM +0100, Stefano Garzarella wrote:
> > > > On Wed, Mar 2, 2022 at 3:57 PM Lee Jones  wrote:
> > > > >
> > > > > On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
> > > > >
> > > > > > On Wed, Mar 02, 2022 at 01:56:35PM +, Lee Jones wrote:
> > > > > > > On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
> > > > > > >
> > > > > > > > On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > > > > > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > > > > > to vhost_get_vq_desc().  All we have to do is take the same lock
> > > > > > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > > > > > >
> > > > > > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > > > > > >
> > > > > > > > > Cc: 
> > > > > > > > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > > > > > > > Signed-off-by: Lee Jones 
> > > > > > > > > ---
> > > > > > > > >  drivers/vhost/vhost.c | 2 ++
> > > > > > > > >  1 file changed, 2 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > > > > index 59edb5a1ffe28..bbaff6a5e21b8 100644
> > > > > > > > > --- a/drivers/vhost/vhost.c
> > > > > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > > > > @@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > > > > int i;
> > > > > > > > >
> > > > > > > > > for (i = 0; i < dev->nvqs; ++i) {
> > > > > > > > > +   mutex_lock(&dev->vqs[i]->mutex);
> > > > > > > > > if (dev->vqs[i]->error_ctx)
> > > > > > > > > eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > > > > > > > > if (dev->vqs[i]->kick)
> > > > > > > > > @@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > > > > if (dev->vqs[i]->call_ctx.ctx)
> > > > > > > > > eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
> > > > > > > > > vhost_vq_reset(dev, dev->vqs[i]);
> > > > > > > > > +   mutex_unlock(&dev->vqs[i]->mutex);
> > > > > > > > > }
> > > > > > > >
> > > > > > > > So this is a mitigation plan but the bug is still there though
> > > > > > > > we don't know exactly what it is.  I would prefer adding something like
> > > > > > > > WARN_ON(mutex_is_locked(vqs[i]->mutex) here - does this make sense?
> > > > > > >
> > > > > > > As a rework to this, or as a subsequent patch?
> > > > > >
> > > > > > Can be a separate patch.
> > > > > >
> > > > > > > Just before the first lock I assume?
> > > > > >
> > > > > > I guess so, yes.
> > > > >
> > > > > No problem.  Patch to follow.
> > > > >
> > > > > I'm also going to attempt to debug the root cause, but I'm new to this
> > > > > subsystem to it might take a while for me to get my head around.
> > > >
> > > > IIUC the root cause should be the same as the one we solved here:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a58da53ffd70294ebea8ecd0eb45fd0d74add9f9
> > > >
> > > > The worker was not stopped before calling vhost_dev_cleanup(). So while
> > > > the worker was still running we were going to free memory or initialize
> > > > fields while it was still using virtqueue.
> > > 
> > > Right, and I agree but it's not the root though, we do attempt to stop 
> > > all workers.
> > 
> > Exactly.  This is what happens, but the question I'm going to attempt
> > to answer is *why* does this happen.
> 
> IIUC the worker was still running because the /dev/vhost-vsock file was not
> explicitly closed, so vhost_vsock_dev_release() was called in the do_exit()
> of the process.
> 
> In that case there was the issue, because vhost_dev_check_owner() returned
> false in vhost_vsock_stop() since current->mm was NULL.
> So it returned earlier, without calling vhost_vq_set_backend(vq, NULL).
> 
> This did not stop the worker from continuing to run, causing the multiple
> issues we are seeing.
> 
> current->mm was NULL, because in the do_exit() the address space is cleaned
> in the exit_mm(), which is called before releasing the files into the
> exit_task_work().
> 
> This can be seen from the logs, where we see first the warnings printed by
> vhost_dev_cleanup() and then the panic in the worker (e.g. here
> https://syzkaller.appspot.com/text?tag=CrashLog=16a61fce70)
> 
> Mike also added a few more helpful details in this thread:
> https://lore.kernel.org/virtualization/20220221100500.2x3s2sddqahgdfyt@sgarzare-redhat/T/#ree61316eac63245c9ba3050b44330e4034282cc2

I guess that about sums it up. :)

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-02 Thread Stefano Garzarella

On Wed, Mar 02, 2022 at 04:49:17PM +, Lee Jones wrote:
> On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
>
> > On Wed, Mar 02, 2022 at 05:28:31PM +0100, Stefano Garzarella wrote:
> > > On Wed, Mar 2, 2022 at 3:57 PM Lee Jones  wrote:
> > > >
> > > > On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
> > > >
> > > > > On Wed, Mar 02, 2022 at 01:56:35PM +, Lee Jones wrote:
> > > > > > On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
> > > > > >
> > > > > > > On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > > > > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > > > > to vhost_get_vq_desc().  All we have to do is take the same lock
> > > > > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > > > > >
> > > > > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > > > > >
> > > > > > > > Cc: 
> > > > > > > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > > > > > > Signed-off-by: Lee Jones 
> > > > > > > > ---
> > > > > > > >  drivers/vhost/vhost.c | 2 ++
> > > > > > > >  1 file changed, 2 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > > > index 59edb5a1ffe28..bbaff6a5e21b8 100644
> > > > > > > > --- a/drivers/vhost/vhost.c
> > > > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > > > @@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > > > int i;
> > > > > > > >
> > > > > > > > for (i = 0; i < dev->nvqs; ++i) {
> > > > > > > > +   mutex_lock(&dev->vqs[i]->mutex);
> > > > > > > > if (dev->vqs[i]->error_ctx)
> > > > > > > > eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > > > > > > > if (dev->vqs[i]->kick)
> > > > > > > > @@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > > > if (dev->vqs[i]->call_ctx.ctx)
> > > > > > > > eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
> > > > > > > > vhost_vq_reset(dev, dev->vqs[i]);
> > > > > > > > +   mutex_unlock(&dev->vqs[i]->mutex);
> > > > > > > > }
> > > > > > >
> > > > > > > So this is a mitigation plan but the bug is still there though
> > > > > > > we don't know exactly what it is.  I would prefer adding something like
> > > > > > > WARN_ON(mutex_is_locked(vqs[i]->mutex) here - does this make sense?
> > > > > >
> > > > > > As a rework to this, or as a subsequent patch?
> > > > >
> > > > > Can be a separate patch.
> > > > >
> > > > > > Just before the first lock I assume?
> > > > >
> > > > > I guess so, yes.
> > > >
> > > > No problem.  Patch to follow.
> > > >
> > > > I'm also going to attempt to debug the root cause, but I'm new to this
> > > > subsystem to it might take a while for me to get my head around.
> > >
> > > IIUC the root cause should be the same as the one we solved here:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a58da53ffd70294ebea8ecd0eb45fd0d74add9f9
> > >
> > > The worker was not stopped before calling vhost_dev_cleanup(). So while
> > > the worker was still running we were going to free memory or initialize
> > > fields while it was still using virtqueue.
> >
> > Right, and I agree but it's not the root though, we do attempt to stop all
> > workers.
>
> Exactly.  This is what happens, but the question I'm going to attempt
> to answer is *why* does this happen.

IIUC the worker was still running because the /dev/vhost-vsock file was
not explicitly closed, so vhost_vsock_dev_release() was called in the
do_exit() of the process.

In that case there was the issue, because vhost_dev_check_owner()
returned false in vhost_vsock_stop() since current->mm was NULL.
So it returned earlier, without calling vhost_vq_set_backend(vq, NULL).

This did not stop the worker from continuing to run, causing the
multiple issues we are seeing.

current->mm was NULL, because in the do_exit() the address space is
cleaned in the exit_mm(), which is called before releasing the files
into the exit_task_work().

This can be seen from the logs, where we see first the warnings printed
by vhost_dev_cleanup() and then the panic in the worker (e.g. here
https://syzkaller.appspot.com/text?tag=CrashLog=16a61fce70)

Mike also added a few more helpful details in this thread:
https://lore.kernel.org/virtualization/20220221100500.2x3s2sddqahgdfyt@sgarzare-redhat/T/#ree61316eac63245c9ba3050b44330e4034282cc2


Thanks,
Stefano
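Stefano's root-cause explanation can be modelled in a few lines. A userspace sketch under stated assumptions: `struct task_model`, `struct dev_model`, `owner_matches()`, `vsock_stop_model()` and its `check_owner` flag are all hypothetical. The flag mirrors, as far as I understand it, the direction of the commit linked in this thread (skipping the owner check on the release path); the real code may differ. The point shown is that an owner check comparing against the current task's mm fails once exit_mm() has cleared it, so the early return leaves the backend set and the worker alive.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the device remembers the mm of its owner. During
 * do_exit(), exit_mm() clears the task's mm before the files are
 * released from exit_task_work(). */
struct task_model { void *mm; };
struct dev_model  { void *owner_mm; void *backend; };

/* Loosely modelled on an owner check: current->mm must match. */
static bool owner_matches(const struct dev_model *d, const struct task_model *cur)
{
	return cur->mm == d->owner_mm;
}

/* Returns 0 on success, -1 if the owner check failed. When the check
 * fails, the early return skips clearing the backend, so the worker is
 * never told to stop. */
static int vsock_stop_model(struct dev_model *d, const struct task_model *cur,
			    bool check_owner)
{
	if (check_owner && !owner_matches(d, cur))
		return -1;          /* backend left set: worker keeps running */
	d->backend = NULL;          /* cf. vhost_vq_set_backend(vq, NULL) */
	return 0;
}
```

Calling it with an "exiting" task whose mm is NULL and `check_owner == true` reproduces the early return; passing `false` on the release path clears the backend.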



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-02 Thread Lee Jones
On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:

> On Wed, Mar 02, 2022 at 05:28:31PM +0100, Stefano Garzarella wrote:
> > On Wed, Mar 2, 2022 at 3:57 PM Lee Jones  wrote:
> > >
> > > On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
> > >
> > > > On Wed, Mar 02, 2022 at 01:56:35PM +, Lee Jones wrote:
> > > > > On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
> > > > >
> > > > > > On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > > > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > > > to vhost_get_vq_desc().  All we have to do is take the same lock
> > > > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > > > >
> > > > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > > > >
> > > > > > > Cc: 
> > > > > > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > > > > > Signed-off-by: Lee Jones 
> > > > > > > ---
> > > > > > >  drivers/vhost/vhost.c | 2 ++
> > > > > > >  1 file changed, 2 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > > index 59edb5a1ffe28..bbaff6a5e21b8 100644
> > > > > > > --- a/drivers/vhost/vhost.c
> > > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > > @@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > > int i;
> > > > > > >
> > > > > > > for (i = 0; i < dev->nvqs; ++i) {
> > > > > > > +   mutex_lock(&dev->vqs[i]->mutex);
> > > > > > > if (dev->vqs[i]->error_ctx)
> > > > > > > eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > > > > > > if (dev->vqs[i]->kick)
> > > > > > > @@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > > if (dev->vqs[i]->call_ctx.ctx)
> > > > > > > eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
> > > > > > > vhost_vq_reset(dev, dev->vqs[i]);
> > > > > > > +   mutex_unlock(&dev->vqs[i]->mutex);
> > > > > > > }
> > > > > >
> > > > > > So this is a mitigation plan but the bug is still there though
> > > > > > we don't know exactly what it is.  I would prefer adding something like
> > > > > > WARN_ON(mutex_is_locked(vqs[i]->mutex) here - does this make sense?
> > > > >
> > > > > As a rework to this, or as a subsequent patch?
> > > >
> > > > Can be a separate patch.
> > > >
> > > > > Just before the first lock I assume?
> > > >
> > > > I guess so, yes.
> > >
> > > No problem.  Patch to follow.
> > >
> > > I'm also going to attempt to debug the root cause, but I'm new to this
> > > subsystem to it might take a while for me to get my head around.
> > 
> > IIUC the root cause should be the same as the one we solved here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a58da53ffd70294ebea8ecd0eb45fd0d74add9f9
> > 
> > The worker was not stopped before calling vhost_dev_cleanup(). So while 
> > the worker was still running we were going to free memory or initialize 
> > fields while it was still using virtqueue.
> 
> Right, and I agree but it's not the root though, we do attempt to stop all 
> workers.

Exactly.  This is what happens, but the question I'm going to attempt
to answer is *why* does this happen.

-- 
Lee Jones [李琼斯]

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-02 Thread Michael S. Tsirkin
On Wed, Mar 02, 2022 at 05:28:31PM +0100, Stefano Garzarella wrote:
> On Wed, Mar 2, 2022 at 3:57 PM Lee Jones  wrote:
> >
> > On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
> >
> > > On Wed, Mar 02, 2022 at 01:56:35PM +, Lee Jones wrote:
> > > > On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
> > > >
> > > > > On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > > to vhost_get_vq_desc().  All we have to do is take the same lock
> > > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > > >
> > > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > > >
> > > > > > Cc: 
> > > > > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > > > > Signed-off-by: Lee Jones 
> > > > > > ---
> > > > > >  drivers/vhost/vhost.c | 2 ++
> > > > > >  1 file changed, 2 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > index 59edb5a1ffe28..bbaff6a5e21b8 100644
> > > > > > --- a/drivers/vhost/vhost.c
> > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > @@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > int i;
> > > > > >
> > > > > > for (i = 0; i < dev->nvqs; ++i) {
> > > > > > +   mutex_lock(&dev->vqs[i]->mutex);
> > > > > > if (dev->vqs[i]->error_ctx)
> > > > > > eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > > > > > if (dev->vqs[i]->kick)
> > > > > > @@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > if (dev->vqs[i]->call_ctx.ctx)
> > > > > > eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
> > > > > > vhost_vq_reset(dev, dev->vqs[i]);
> > > > > > +   mutex_unlock(&dev->vqs[i]->mutex);
> > > > > > }
> > > > >
> > > > > So this is a mitigation plan but the bug is still there though
> > > > > we don't know exactly what it is.  I would prefer adding something like
> > > > > WARN_ON(mutex_is_locked(vqs[i]->mutex) here - does this make sense?
> > > >
> > > > As a rework to this, or as a subsequent patch?
> > >
> > > Can be a separate patch.
> > >
> > > > Just before the first lock I assume?
> > >
> > > I guess so, yes.
> >
> > No problem.  Patch to follow.
> >
> > I'm also going to attempt to debug the root cause, but I'm new to this
> > subsystem to it might take a while for me to get my head around.
> 
> IIUC the root cause should be the same as the one we solved here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a58da53ffd70294ebea8ecd0eb45fd0d74add9f9
> 
> The worker was not stopped before calling vhost_dev_cleanup(). So while 
> the worker was still running we were going to free memory or initialize 
> fields while it was still using virtqueue.
> 
> Cheers,
> Stefano

Right, and I agree but it's not the root though, we do attempt to stop all 
workers.

-- 
MST
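The ordering both replies rely on (stop the worker and wait for it to exit before tearing down what it uses) looks like this in a userspace sketch with POSIX threads. `struct worker_model`, `worker_fn()`, `start_worker()` and `stop_then_cleanup()` are hypothetical names. The kernel's `kthread_stop()` follows the same set-a-flag-then-wait-for-exit pattern; in vhost the analogous requirement is that backends are cleared and pending work flushed before vhost_dev_cleanup() resets the virtqueues.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct worker_model {
	pthread_t thread;
	atomic_bool stop;
	atomic_long iterations;
	int *shared;            /* state the worker touches; freed on cleanup */
};

static void *worker_fn(void *arg)
{
	struct worker_model *w = arg;

	while (!atomic_load(&w->stop)) {
		w->shared[0]++;     /* only safe while 'shared' is still alive */
		atomic_fetch_add(&w->iterations, 1);
	}
	return NULL;
}

static int start_worker(struct worker_model *w)
{
	w->shared = calloc(1, sizeof(*w->shared));
	if (!w->shared)
		return -1;
	atomic_init(&w->stop, false);
	atomic_init(&w->iterations, 0);
	return pthread_create(&w->thread, NULL, worker_fn, w);
}

/* Correct order: ask the worker to stop, wait for it to exit, and only
 * then free what it was using. Freeing before pthread_join() would be
 * the use-after-free pattern described in the thread. */
static int stop_then_cleanup(struct worker_model *w)
{
	int rc;

	atomic_store(&w->stop, true);
	rc = pthread_join(w->thread, NULL);
	free(w->shared);
	w->shared = NULL;
	return rc;
}
```

If the stop request is skipped (as in the early-return case Stefano later describes), the join never happens in the right place and the free races with the still-running loop.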



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-02 Thread Stefano Garzarella
On Wed, Mar 2, 2022 at 3:57 PM Lee Jones  wrote:
>
> On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
>
> > On Wed, Mar 02, 2022 at 01:56:35PM +, Lee Jones wrote:
> > > On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
> > >
> > > > On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > to vhost_get_vq_desc().  All we have to do is take the same lock
> > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > >
> > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > >
> > > > > Cc: 
> > > > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > > > Signed-off-by: Lee Jones 
> > > > > ---
> > > > >  drivers/vhost/vhost.c | 2 ++
> > > > >  1 file changed, 2 insertions(+)
> > > > >
> > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > index 59edb5a1ffe28..bbaff6a5e21b8 100644
> > > > > --- a/drivers/vhost/vhost.c
> > > > > +++ b/drivers/vhost/vhost.c
> > > > > @@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > int i;
> > > > >
> > > > > for (i = 0; i < dev->nvqs; ++i) {
> > > > > +   mutex_lock(&dev->vqs[i]->mutex);
> > > > > if (dev->vqs[i]->error_ctx)
> > > > > eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > > > > if (dev->vqs[i]->kick)
> > > > > @@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > if (dev->vqs[i]->call_ctx.ctx)
> > > > > eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
> > > > > vhost_vq_reset(dev, dev->vqs[i]);
> > > > > +   mutex_unlock(&dev->vqs[i]->mutex);
> > > > > }
> > > >
> > > > So this is a mitigation plan but the bug is still there though
> > > > we don't know exactly what it is.  I would prefer adding something like
> > > > WARN_ON(mutex_is_locked(vqs[i]->mutex) here - does this make sense?
> > >
> > > As a rework to this, or as a subsequent patch?
> >
> > Can be a separate patch.
> >
> > > Just before the first lock I assume?
> >
> > I guess so, yes.
>
> No problem.  Patch to follow.
>
> I'm also going to attempt to debug the root cause, but I'm new to this
> subsystem so it might take a while for me to get my head around it.

IIUC the root cause should be the same as the one we solved here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a58da53ffd70294ebea8ecd0eb45fd0d74add9f9

The worker was not stopped before vhost_dev_cleanup() was called, so while
the worker was still running we could free memory or initialize fields
that it was still using through the virtqueue.

Cheers,
Stefano
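The ordering described above can be illustrated with a small userspace model. It is a sketch under stated assumptions, not vhost code: a POSIX thread stands in for the vhost worker kthread, and `struct dev_model`, `dev_start()` and `dev_stop_then_cleanup()` are hypothetical names. The point is only the order of operations: tell the worker to stop, wait for it to exit, and only then free the memory it was using. Swapping the last two steps reproduces the use-after-free pattern described in this message.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device model: a worker thread keeps using @buf until told to stop. */
struct dev_model {
	pthread_t worker;
	atomic_bool stop;
	atomic_long iterations;
	int *buf;               /* memory the worker dereferences, like virtqueue state */
};

static void *worker_fn(void *arg)
{
	struct dev_model *d = arg;

	while (!atomic_load(&d->stop)) {
		d->buf[0]++;        /* only the worker touches buf while it runs */
		atomic_fetch_add(&d->iterations, 1);
		sched_yield();
	}
	return NULL;
}

static int dev_start(struct dev_model *d)
{
	int rc;

	d->buf = calloc(1, sizeof(*d->buf));
	if (d->buf == NULL)
		return -1;
	atomic_init(&d->stop, false);
	atomic_init(&d->iterations, 0);
	rc = pthread_create(&d->worker, NULL, worker_fn, d);
	if (rc != 0) {
		free(d->buf);
		d->buf = NULL;
	}
	return rc;
}

/* Safe teardown order: stop and join the worker first, then free its memory. */
static void dev_stop_then_cleanup(struct dev_model *d)
{
	assert(d->buf != NULL);
	atomic_store(&d->stop, true);
	pthread_join(d->worker, NULL);  /* after this the worker cannot touch buf */
	free(d->buf);                   /* freeing before the join would be a use-after-free */
	d->buf = NULL;
}
```

The join is what provides the guarantee: setting the stop flag alone only asks the worker to exit, it does not wait for an iteration already in progress to finish.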



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-02 Thread Stefano Garzarella

On Wed, Mar 02, 2022 at 09:50:38AM -0500, Michael S. Tsirkin wrote:

> On Wed, Mar 02, 2022 at 03:11:21PM +0100, Stefano Garzarella wrote:
>
> > On Wed, Mar 02, 2022 at 08:35:08AM -0500, Michael S. Tsirkin wrote:
> > > On Wed, Mar 02, 2022 at 10:34:46AM +0100, Stefano Garzarella wrote:
> > > > On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > to vhost_get_vq_desc().  All we have to do is take the same lock
> > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > >
> > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > >
> > > > This issue is similar to [1] that should be already fixed upstream by [2].
> > > >
> > > > However I think this patch would have prevented some issues, because
> > > > vhost_vq_reset() sets vq->private to NULL, preventing the worker from
> > > > running.
> > > >
> > > > Anyway I think that when we enter in vhost_dev_cleanup() the worker should
> > > > be already stopped, so it shouldn't be necessary to take the mutex. But in
> > > > order to prevent future issues maybe it's better to take them, so:
> > > >
> > > > Reviewed-by: Stefano Garzarella 
> > > >
> > > > [1]
> > > > https://syzkaller.appspot.com/bug?id=993d8b5e64393ed9e6a70f9ae4de0119c605a822
> > > > [2]
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a58da53ffd70294ebea8ecd0eb45fd0d74add9f9
> > >
> > >
> > > Right. I want to queue this but I would like to get a warning
> > > so we can detect issues like [2] before they cause more issues.
> >
> > I agree, what about moving the warning that we already have higher up, right
> > at the beginning of the function?
> >
> > I mean something like this:
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 59edb5a1ffe2..1721ff3f18c0 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -692,6 +692,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> >  {
> > int i;
> > +   WARN_ON(!llist_empty(&dev->work_list));
> > +
> > for (i = 0; i < dev->nvqs; ++i) {
> > if (dev->vqs[i]->error_ctx)
> > eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > @@ -712,7 +714,6 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > dev->iotlb = NULL;
> > vhost_clear_msg(dev);
> > wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
> > -   WARN_ON(!llist_empty(&dev->work_list));
> > if (dev->worker) {
> > kthread_stop(dev->worker);
> > dev->worker = NULL;
>
> Hmm I'm not sure why it matters.


Because with this new patch, which takes the locks inside the for loop,
the workers should be stopped once the loop finishes, since
vhost_vq_reset() sets vq->private to NULL.

But IMHO the best thing is to check that no vq has a backend set, which
confirms that the workers have been stopped correctly by this point.


Thanks,
Stefano
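The two checks discussed in this sub-thread (warn if work is still queued, and warn if any virtqueue still has a backend) can be sketched in plain C. This is an illustration under stated assumptions, not the vhost implementation: `struct dev_state`, `struct vq_state` and `cleanup_entry_checks()` are hypothetical, a plain linked-list pointer stands in for the kernel `llist` work list, and `fprintf()` stands in for `WARN_ON()`. Running the checks at function entry means a warning fires before anything is freed, and the per-virtqueue backend check also catches an idle device whose work list happens to be empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins: a queued work item and a virtqueue's backend pointer. */
struct work_node {
	struct work_node *next;
};

struct vq_state {
	void *backend;                 /* non-NULL means the vq is still attached */
};

struct dev_state {
	struct work_node *work_list;   /* plays the role of dev->work_list */
	struct vq_state *vqs;
	int nvqs;
};

/*
 * Sanity checks run at the very start of cleanup, before anything is freed.
 * Returns the number of warnings raised so a caller (or a test) can check it.
 */
static int cleanup_entry_checks(const struct dev_state *dev)
{
	int warnings = 0;

	assert(dev != NULL);

	/* Pending work means the worker may still run after we start freeing. */
	if (dev->work_list != NULL) {
		fprintf(stderr, "WARNING: work still queued at cleanup\n");
		warnings++;
	}

	/* An idle device has an empty work list, so also check every backend. */
	for (int i = 0; i < dev->nvqs; i++) {
		if (dev->vqs[i].backend != NULL) {
			fprintf(stderr, "WARNING: vq %d still has a backend at cleanup\n", i);
			warnings++;
		}
	}
	return warnings;
}
```

Returning a count rather than only printing keeps the helper easy to exercise in a test; the kernel's `WARN_ON()` instead evaluates to the tested condition and dumps a stack trace.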



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-02 Thread Lee Jones
On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:

> On Wed, Mar 02, 2022 at 01:56:35PM +, Lee Jones wrote:
> > On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
> > 
> > > On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > to vhost_get_vq_desc().  All we have to do is take the same lock
> > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > 
> > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > 
> > > > Cc: 
> > > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > > Signed-off-by: Lee Jones 
> > > > ---
> > > >  drivers/vhost/vhost.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > > 
> > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > index 59edb5a1ffe28..bbaff6a5e21b8 100644
> > > > --- a/drivers/vhost/vhost.c
> > > > +++ b/drivers/vhost/vhost.c
> > > > @@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > int i;
> > > >  
> > > > for (i = 0; i < dev->nvqs; ++i) {
> > > > +   mutex_lock(&dev->vqs[i]->mutex);
> > > > if (dev->vqs[i]->error_ctx)
> > > > eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > > > if (dev->vqs[i]->kick)
> > > > @@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > if (dev->vqs[i]->call_ctx.ctx)
> > > > eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
> > > > vhost_vq_reset(dev, dev->vqs[i]);
> > > > +   mutex_unlock(&dev->vqs[i]->mutex);
> > > > }
> > > 
> > > So this is a mitigation plan but the bug is still there though
> > > we don't know exactly what it is.  I would prefer adding something like
> > > WARN_ON(mutex_is_locked(vqs[i]->mutex) here - does this make sense?
> > 
> > As a rework to this, or as a subsequent patch?
> 
> Can be a separate patch.
> 
> > Just before the first lock I assume?
> 
> I guess so, yes.

No problem.  Patch to follow.

I'm also going to attempt to debug the root cause, but I'm new to this
subsystem so it might take a while for me to get my head around it.

-- 
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-02 Thread Michael S. Tsirkin
On Wed, Mar 02, 2022 at 01:56:35PM +, Lee Jones wrote:
> On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
> 
> > On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > to vhost_get_vq_desc().  All we have to do is take the same lock
> > > during virtqueue clean-up and we mitigate the reported issues.
> > > 
> > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > 
> > > Cc: 
> > > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > > Signed-off-by: Lee Jones 
> > > ---
> > >  drivers/vhost/vhost.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index 59edb5a1ffe28..bbaff6a5e21b8 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > >   int i;
> > >  
> > >   for (i = 0; i < dev->nvqs; ++i) {
> > > + mutex_lock(&dev->vqs[i]->mutex);
> > >   if (dev->vqs[i]->error_ctx)
> > >   eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > >   if (dev->vqs[i]->kick)
> > > @@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > >   if (dev->vqs[i]->call_ctx.ctx)
> > >   eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
> > >   vhost_vq_reset(dev, dev->vqs[i]);
> > > + mutex_unlock(&dev->vqs[i]->mutex);
> > >   }
> > 
> > So this is a mitigation plan but the bug is still there though
> > we don't know exactly what it is.  I would prefer adding something like
> > WARN_ON(mutex_is_locked(vqs[i]->mutex) here - does this make sense?
> 
> As a rework to this, or as a subsequent patch?

Can be a separate patch.

> Just before the first lock I assume?

I guess so, yes.


Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-02 Thread Michael S. Tsirkin
On Wed, Mar 02, 2022 at 03:11:21PM +0100, Stefano Garzarella wrote:
> On Wed, Mar 02, 2022 at 08:35:08AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Mar 02, 2022 at 10:34:46AM +0100, Stefano Garzarella wrote:
> > > On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > to vhost_get_vq_desc().  All we have to do is take the same lock
> > > > during virtqueue clean-up and we mitigate the reported issues.
> > > >
> > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > 
> > > This issue is similar to [1] that should be already fixed upstream by [2].
> > > 
> > > However I think this patch would have prevented some issues, because
> > > vhost_vq_reset() sets vq->private to NULL, preventing the worker from
> > > running.
> > > 
> > > Anyway I think that when we enter in vhost_dev_cleanup() the worker should
> > > be already stopped, so it shouldn't be necessary to take the mutex. But in
> > > order to prevent future issues maybe it's better to take them, so:
> > > 
> > > Reviewed-by: Stefano Garzarella 
> > > 
> > > [1]
> > > https://syzkaller.appspot.com/bug?id=993d8b5e64393ed9e6a70f9ae4de0119c605a822
> > > [2] 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a58da53ffd70294ebea8ecd0eb45fd0d74add9f9
> > 
> > 
> > Right. I want to queue this but I would like to get a warning
> > so we can detect issues like [2] before they cause more issues.
> 
> I agree, what about moving the warning that we already have higher up, right
> at the beginning of the function?
> 
> I mean something like this:
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 59edb5a1ffe2..1721ff3f18c0 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -692,6 +692,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>  {
> int i;
> +   WARN_ON(!llist_empty(&dev->work_list));
> +
> for (i = 0; i < dev->nvqs; ++i) {
> if (dev->vqs[i]->error_ctx)
> eventfd_ctx_put(dev->vqs[i]->error_ctx);
> @@ -712,7 +714,6 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> dev->iotlb = NULL;
> vhost_clear_msg(dev);
> wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
> -   WARN_ON(!llist_empty(&dev->work_list));
> if (dev->worker) {
> kthread_stop(dev->worker);
> dev->worker = NULL;
> 

Hmm I'm not sure why it matters.

> And maybe we can also check vq->private and warn in the loop, because the
> work_list may be empty if the device is doing nothing.
> 
> Thanks,
> Stefano



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-02 Thread Stefano Garzarella

On Wed, Mar 02, 2022 at 08:35:08AM -0500, Michael S. Tsirkin wrote:

> On Wed, Mar 02, 2022 at 10:34:46AM +0100, Stefano Garzarella wrote:
> > On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > to vhost_get_vq_desc().  All we have to do is take the same lock
> > > during virtqueue clean-up and we mitigate the reported issues.
> > >
> > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> >
> > This issue is similar to [1] that should be already fixed upstream by [2].
> >
> > However I think this patch would have prevented some issues, because
> > vhost_vq_reset() sets vq->private to NULL, preventing the worker from
> > running.
> >
> > Anyway I think that when we enter in vhost_dev_cleanup() the worker should
> > be already stopped, so it shouldn't be necessary to take the mutex. But in
> > order to prevent future issues maybe it's better to take them, so:
> >
> > Reviewed-by: Stefano Garzarella 
> >
> > [1]
> > https://syzkaller.appspot.com/bug?id=993d8b5e64393ed9e6a70f9ae4de0119c605a822
> > [2]
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a58da53ffd70294ebea8ecd0eb45fd0d74add9f9
>
> Right. I want to queue this but I would like to get a warning
> so we can detect issues like [2] before they cause more issues.

I agree, what about moving the warning that we already have higher up, 
right at the beginning of the function?


I mean something like this:

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 59edb5a1ffe2..1721ff3f18c0 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -692,6 +692,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 {
int i;
 
+   WARN_ON(!llist_empty(&dev->work_list));
+
for (i = 0; i < dev->nvqs; ++i) {
if (dev->vqs[i]->error_ctx)
eventfd_ctx_put(dev->vqs[i]->error_ctx);
@@ -712,7 +714,6 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
dev->iotlb = NULL;
vhost_clear_msg(dev);
wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
-   WARN_ON(!llist_empty(&dev->work_list));
if (dev->worker) {
kthread_stop(dev->worker);
dev->worker = NULL;


And maybe we can also check vq->private and warn in the loop, because 
the work_list may be empty if the device is doing nothing.


Thanks,
Stefano



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-02 Thread Lee Jones
On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:

> On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > to vhost_get_vq_desc().  All we have to do is take the same lock
> > during virtqueue clean-up and we mitigate the reported issues.
> > 
> > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > 
> > Cc: 
> > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > Signed-off-by: Lee Jones 
> > ---
> >  drivers/vhost/vhost.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 59edb5a1ffe28..bbaff6a5e21b8 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > int i;
> >  
> > for (i = 0; i < dev->nvqs; ++i) {
> > +   mutex_lock(&dev->vqs[i]->mutex);
> > if (dev->vqs[i]->error_ctx)
> > eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > if (dev->vqs[i]->kick)
> > @@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > if (dev->vqs[i]->call_ctx.ctx)
> > eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
> > vhost_vq_reset(dev, dev->vqs[i]);
> > +   mutex_unlock(&dev->vqs[i]->mutex);
> > }
> 
> So this is a mitigation plan but the bug is still there though
> we don't know exactly what it is.  I would prefer adding something like
> WARN_ON(mutex_is_locked(vqs[i]->mutex) here - does this make sense?

As a rework to this, or as a subsequent patch?

Just before the first lock I assume?

-- 
Lee Jones [李琼斯]

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-02 Thread Michael S. Tsirkin
On Wed, Mar 02, 2022 at 10:34:46AM +0100, Stefano Garzarella wrote:
> On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > to vhost_get_vq_desc().  All we have to do is take the same lock
> > during virtqueue clean-up and we mitigate the reported issues.
> > 
> > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> 
> This issue is similar to [1] that should be already fixed upstream by [2].
> 
> However I think this patch would have prevented some issues, because
> vhost_vq_reset() sets vq->private to NULL, preventing the worker from
> running.
> 
> Anyway I think that when we enter in vhost_dev_cleanup() the worker should
> be already stopped, so it shouldn't be necessary to take the mutex. But in
> order to prevent future issues maybe it's better to take them, so:
> 
> Reviewed-by: Stefano Garzarella 
> 
> [1]
> https://syzkaller.appspot.com/bug?id=993d8b5e64393ed9e6a70f9ae4de0119c605a822
> [2] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a58da53ffd70294ebea8ecd0eb45fd0d74add9f9


Right. I want to queue this but I would like to get a warning
so we can detect issues like [2] before they cause more issues.


> > 
> > Cc: 
> > Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> > Signed-off-by: Lee Jones 
> > ---
> > drivers/vhost/vhost.c | 2 ++
> > 1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 59edb5a1ffe28..bbaff6a5e21b8 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > int i;
> > 
> > for (i = 0; i < dev->nvqs; ++i) {
> > +   mutex_lock(&dev->vqs[i]->mutex);
> > if (dev->vqs[i]->error_ctx)
> > eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > if (dev->vqs[i]->kick)
> > @@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > if (dev->vqs[i]->call_ctx.ctx)
> > eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
> > vhost_vq_reset(dev, dev->vqs[i]);
> > +   mutex_unlock(&dev->vqs[i]->mutex);
> > }
> > vhost_dev_free_iovecs(dev);
> > if (dev->log_ctx)
> > -- 
> > 2.35.1.574.g5d30c73bfb-goog
> > 



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-02 Thread Michael S. Tsirkin
On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> vhost_vsock_handle_tx_kick() already holds the mutex during its call
> to vhost_get_vq_desc().  All we have to do is take the same lock
> during virtqueue clean-up and we mitigate the reported issues.
> 
> Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> 
> Cc: 
> Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> Signed-off-by: Lee Jones 
> ---
>  drivers/vhost/vhost.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 59edb5a1ffe28..bbaff6a5e21b8 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>   int i;
>  
>   for (i = 0; i < dev->nvqs; ++i) {
> + mutex_lock(&dev->vqs[i]->mutex);
>   if (dev->vqs[i]->error_ctx)
>   eventfd_ctx_put(dev->vqs[i]->error_ctx);
>   if (dev->vqs[i]->kick)
> @@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>   if (dev->vqs[i]->call_ctx.ctx)
>   eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
>   vhost_vq_reset(dev, dev->vqs[i]);
> + mutex_unlock(&dev->vqs[i]->mutex);
>   }

So this is a mitigation plan but the bug is still there though
we don't know exactly what it is.  I would prefer adding something like
WARN_ON(mutex_is_locked(vqs[i]->mutex) here - does this make sense?
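A userspace approximation of the suggested check might look like the following. It is a sketch, assuming `pthread_mutex_trylock()` as a stand-in for `mutex_is_locked()`; `vq_mutex_held_warn()` is a hypothetical helper, not a kernel function. Unlike `mutex_is_locked()`, which only reads the lock state, a successful trylock briefly takes the mutex. Either way the result is a snapshot: it catches a holder present at the moment of the call and can miss one that arrives a moment later, so it serves as a debugging aid rather than a fix.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper approximating WARN_ON(mutex_is_locked(...)) in userspace.
 * Returns true and prints a warning if @m is held at the moment of the call.
 * It never blocks, but the answer can be stale as soon as it returns.
 */
static bool vq_mutex_held_warn(pthread_mutex_t *m, int idx)
{
	int rc;

	assert(m != NULL);
	rc = pthread_mutex_trylock(m);
	if (rc == 0) {
		/* Nobody held it: drop it again, nothing to report. */
		pthread_mutex_unlock(m);
		return false;
	}
	if (rc == EBUSY)
		fprintf(stderr, "WARNING: vq %d mutex held at cleanup entry\n", idx);
	else
		fprintf(stderr, "WARNING: trylock on vq %d failed with %d\n", idx, rc);
	return true;
}
```

Placed just before the `mutex_lock()` in the cleanup loop, as discussed here, such a check would flag a handler that is still inside its critical section when cleanup starts.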



>   vhost_dev_free_iovecs(dev);
>   if (dev->log_ctx)
> -- 
> 2.35.1.574.g5d30c73bfb-goog



Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-02 Thread Lee Jones
On Wed, 02 Mar 2022, Stefano Garzarella wrote:

> On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > to vhost_get_vq_desc().  All we have to do is take the same lock
> > during virtqueue clean-up and we mitigate the reported issues.
> > 
> > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> 
> This issue is similar to [1] that should be already fixed upstream by [2].
> 
> However I think this patch would have prevented some issues, because
> vhost_vq_reset() sets vq->private to NULL, preventing the worker from
> running.
> 
> Anyway I think that when we enter in vhost_dev_cleanup() the worker should
> be already stopped, so it shouldn't be necessary to take the mutex. But in
> order to prevent future issues maybe it's better to take them, so:


> Reviewed-by: Stefano Garzarella 

Thanks for the analysis and the review Stefano.

-- 
Lee Jones [李琼斯]

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-02 Thread Stefano Garzarella

On Wed, Mar 02, 2022 at 07:54:21AM +, Lee Jones wrote:
> vhost_vsock_handle_tx_kick() already holds the mutex during its call
> to vhost_get_vq_desc().  All we have to do is take the same lock
> during virtqueue clean-up and we mitigate the reported issues.
>
> Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00

This issue is similar to [1] that should be already fixed upstream by [2].

However I think this patch would have prevented some issues, because
vhost_vq_reset() sets vq->private to NULL, preventing the worker from
running.

Anyway I think that when we enter in vhost_dev_cleanup() the worker should
be already stopped, so it shouldn't be necessary to take the mutex. But in
order to prevent future issues maybe it's better to take them, so:

Reviewed-by: Stefano Garzarella 

[1]
https://syzkaller.appspot.com/bug?id=993d8b5e64393ed9e6a70f9ae4de0119c605a822
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a58da53ffd70294ebea8ecd0eb45fd0d74add9f9

> Cc: 
> Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
> Signed-off-by: Lee Jones 
> ---
> drivers/vhost/vhost.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 59edb5a1ffe28..bbaff6a5e21b8 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> int i;
>
> for (i = 0; i < dev->nvqs; ++i) {
> +   mutex_lock(&dev->vqs[i]->mutex);
> if (dev->vqs[i]->error_ctx)
> eventfd_ctx_put(dev->vqs[i]->error_ctx);
> if (dev->vqs[i]->kick)
> @@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> if (dev->vqs[i]->call_ctx.ctx)
> eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
> vhost_vq_reset(dev, dev->vqs[i]);
> +   mutex_unlock(&dev->vqs[i]->mutex);
> }
> vhost_dev_free_iovecs(dev);
> if (dev->log_ctx)
> --
> 2.35.1.574.g5d30c73bfb-goog





[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

2022-03-01 Thread Lee Jones
vhost_vsock_handle_tx_kick() already holds the mutex during its call
to vhost_get_vq_desc().  All we have to do is take the same lock
during virtqueue clean-up and we mitigate the reported issues.

Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00

Cc: 
Reported-by: syzbot+adc3cb32385586bec...@syzkaller.appspotmail.com
Signed-off-by: Lee Jones 
---
 drivers/vhost/vhost.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 59edb5a1ffe28..bbaff6a5e21b8 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
int i;
 
for (i = 0; i < dev->nvqs; ++i) {
+   mutex_lock(&dev->vqs[i]->mutex);
if (dev->vqs[i]->error_ctx)
eventfd_ctx_put(dev->vqs[i]->error_ctx);
if (dev->vqs[i]->kick)
@@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
if (dev->vqs[i]->call_ctx.ctx)
eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
vhost_vq_reset(dev, dev->vqs[i]);
+   mutex_unlock(&dev->vqs[i]->mutex);
}
vhost_dev_free_iovecs(dev);
if (dev->log_ctx)
-- 
2.35.1.574.g5d30c73bfb-goog
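The locking idea in this patch can be modelled in userspace. This is a sketch under stated assumptions, not vhost code: POSIX mutexes stand in for the kernel mutex API, the `backend` field stands in for the backend pointer referred to in the thread as vq->private, and `struct vq_model`, `handle_kick()` and `vq_cleanup()` are hypothetical. The handler and the cleanup path serialize on the same per-virtqueue mutex, and because cleanup clears the backend pointer under that mutex, a kick handled afterwards finds no backend and returns without touching the queue. As the replies point out, this mitigates the symptom but does not by itself explain why a worker would still be running during cleanup.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a virtqueue; not the kernel's struct. */
struct vq_model {
	pthread_mutex_t mutex;   /* role of vq->mutex */
	void *backend;           /* role of the vq backend pointer */
	int processed;           /* number of kicks that did real work */
};

/* Kick-handler analogue: takes the vq mutex and does nothing without a backend. */
static int handle_kick(struct vq_model *vq)
{
	int ret = -1;

	pthread_mutex_lock(&vq->mutex);
	if (vq->backend != NULL) {
		vq->processed++;     /* stands in for fetching and handling descriptors */
		ret = 0;
	}
	pthread_mutex_unlock(&vq->mutex);
	return ret;
}

/* Cleanup analogue: resets the vq under the same mutex, as the patch does. */
static void vq_cleanup(struct vq_model *vq)
{
	assert(vq != NULL);
	pthread_mutex_lock(&vq->mutex);
	vq->backend = NULL;      /* like vhost_vq_reset() clearing the backend */
	vq->processed = 0;
	pthread_mutex_unlock(&vq->mutex);
}
```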
