Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue

2023-06-05 Thread Stefano Garzarella

On Thu, Jun 01, 2023 at 11:33:09AM -0500, Mike Christie wrote:

On 6/1/23 2:47 AM, Stefano Garzarella wrote:


static void vhost_worker_free(struct vhost_dev *dev)
{
-    struct vhost_worker *worker = dev->worker;
+    struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk);

-    if (!worker)
+    if (!vtsk)
    return;

-    dev->worker = NULL;
-    WARN_ON(!llist_empty(&worker->work_list));
-    vhost_task_stop(worker->vtsk);
-    kfree(worker);
+    vhost_task_stop(vtsk);
+    WARN_ON(!llist_empty(&dev->worker.work_list));
+    WRITE_ONCE(dev->worker.vtsk, NULL);


The patch LGTM, I just wonder if we should set dev->worker to zero here,


We might want to just set kcov_handle to zero for now.

In 6.3 and older, I think we could do:

1. vhost_dev_set_owner could successfully set dev->worker.
2. vhost_transport_send_pkt runs vhost_work_queue and sees worker
is set and adds the vhost_work to the work_list.
3. vhost_dev_set_owner fails in vhost_attach_cgroups, so we stop
the worker before the work can be run and set worker to NULL.
4. We clear kcov_handle and return.

We leave the work on the work_list.

5. Userspace can then retry vhost_dev_set_owner. If that works, then the
work gets executed ok eventually.

OR

Userspace can just close the device. vhost_vsock_dev_release would
eventually call vhost_dev_cleanup (vhost_dev_flush won't see a worker
so will just return), and that will hit the WARN_ON but we would
proceed ok.

If I do a memset of the worker, then if userspace were to retry
VHOST_SET_OWNER, we would lose the queued work since the work_list would
get zeroed. I think it's unlikely this ever happens, but you know best,
so let me know if this is a real issue.



I don't think it's a problem. Though, you're right, we could hide the
warning and thus future bugs, so it's better as you proposed.


Thanks,
Stefano



Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue

2023-06-01 Thread Mike Christie
On 6/1/23 2:47 AM, Stefano Garzarella wrote:
>>
>> static void vhost_worker_free(struct vhost_dev *dev)
>> {
>> -    struct vhost_worker *worker = dev->worker;
>> +    struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk);
>>
>> -    if (!worker)
>> +    if (!vtsk)
>>     return;
>>
>> -    dev->worker = NULL;
>> -    WARN_ON(!llist_empty(&worker->work_list));
>> -    vhost_task_stop(worker->vtsk);
>> -    kfree(worker);
>> +    vhost_task_stop(vtsk);
>> +    WARN_ON(!llist_empty(&dev->worker.work_list));
>> +    WRITE_ONCE(dev->worker.vtsk, NULL);
> 
> The patch LGTM, I just wonder if we should set dev->worker to zero here,

We might want to just set kcov_handle to zero for now.

In 6.3 and older, I think we could do:

1. vhost_dev_set_owner could successfully set dev->worker.
2. vhost_transport_send_pkt runs vhost_work_queue and sees worker
is set and adds the vhost_work to the work_list.
3. vhost_dev_set_owner fails in vhost_attach_cgroups, so we stop
the worker before the work can be run and set worker to NULL.
4. We clear kcov_handle and return.

We leave the work on the work_list.

5. Userspace can then retry vhost_dev_set_owner. If that works, then the
work gets executed ok eventually.

OR

Userspace can just close the device. vhost_vsock_dev_release would
eventually call vhost_dev_cleanup (vhost_dev_flush won't see a worker
so will just return), and that will hit the WARN_ON but we would
proceed ok.

If I do a memset of the worker, then if userspace were to retry
VHOST_SET_OWNER, we would lose the queued work since the work_list would
get zeroed. I think it's unlikely this ever happens, but you know best,
so let me know if this is a real issue.
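To make the concern concrete, here is a minimal hypothetical sketch (a
simplified stand-in for struct vhost_worker, not the actual vhost code) of
why a full memset would orphan queued work while clearing only kcov_handle
would not:

#include <linux/llist.h>
#include <linux/types.h>

/* Hypothetical, simplified stand-in for struct vhost_worker. */
struct toy_worker {
	struct llist_head work_list;
	u64 kcov_handle;
};

static void toy_cleanup(struct toy_worker *w)
{
	/* Clearing only kcov_handle leaves work_list intact, so work
	 * queued before a failed VHOST_SET_OWNER survives a retry. */
	w->kcov_handle = 0;

	/* memset(w, 0, sizeof(*w)) would instead also reset
	 * work_list.first to NULL and silently drop those entries. */
}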


Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue

2023-06-01 Thread Stefano Garzarella

On Wed, May 31, 2023 at 11:27:12AM -0500, Mike Christie wrote:

On 5/31/23 10:15 AM, Mike Christie wrote:

rcu would work for your case and for what Jason had requested.

Yeah, so you already have some patches?

Do you want to send it to solve this problem?


Yeah, I'll break them out and send them later today when I can retest
rebased patches.



Just one question: do you core vhost developers consider RCU more complex
than switching to READ_ONCE/WRITE_ONCE? I am asking because, for this
immediate regression fix, we could just switch to the latter, like below,
to fix the crash if we think that is simpler.

I think RCU is just a little more complex/invasive because it will have the
extra synchronize_rcu calls.


Yes, you may be right; in this case we should just need
READ_ONCE/WRITE_ONCE, if dev->worker is no longer a pointer.
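The key property of the READ_ONCE/WRITE_ONCE scheme is that each caller
loads the vtsk pointer once and then works only on that local snapshot.
A rough sketch of the idiom (hypothetical helper, reusing the types from
the patch below; kernel context, READ_ONCE() comes from <linux/compiler.h>):

static void toy_queue(struct vhost_dev *dev)
{
	/* One marked load; everything below uses the snapshot, so a
	 * concurrent WRITE_ONCE(dev->worker.vtsk, NULL) cannot be
	 * observed halfway through the function. */
	struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk);

	if (!vtsk)
		return;		/* no worker published (yet, or anymore) */

	wake_up_process(vtsk->task);
}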




diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a92af08e7864..03fd47a22a73 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -235,7 +235,7 @@ void vhost_dev_flush(struct vhost_dev *dev)
{
struct vhost_flush_struct flush;

-   if (dev->worker) {
+   if (READ_ONCE(dev->worker.vtsk)) {
init_completion(&flush.wait_event);
vhost_work_init(&flush.work, vhost_flush_work);

@@ -247,7 +247,9 @@ EXPORT_SYMBOL_GPL(vhost_dev_flush);

void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
{
-   if (!dev->worker)
+   struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk);
+
+   if (!vtsk)
return;

if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
@@ -255,8 +257,8 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
/* We can only add the work to the list after we're
 * sure it was not in the list.
 * test_and_set_bit() implies a memory barrier.
 */
-   llist_add(&work->node, &dev->worker->work_list);
-   wake_up_process(dev->worker->vtsk->task);
+   llist_add(&work->node, &dev->worker.work_list);
+   wake_up_process(vtsk->task);
}
}
EXPORT_SYMBOL_GPL(vhost_work_queue);
@@ -264,7 +266,7 @@ EXPORT_SYMBOL_GPL(vhost_work_queue);
/* A lockless hint for busy polling code to exit the loop */
bool vhost_has_work(struct vhost_dev *dev)
{
-   return dev->worker && !llist_empty(&dev->worker->work_list);
+   return !llist_empty(&dev->worker.work_list);
}
EXPORT_SYMBOL_GPL(vhost_has_work);

@@ -468,7 +470,7 @@ void vhost_dev_init(struct vhost_dev *dev,
dev->umem = NULL;
dev->iotlb = NULL;
dev->mm = NULL;
-   dev->worker = NULL;
+   memset(&dev->worker, 0, sizeof(dev->worker));
dev->iov_limit = iov_limit;
dev->weight = weight;
dev->byte_weight = byte_weight;
@@ -542,46 +544,38 @@ static void vhost_detach_mm(struct vhost_dev *dev)

static void vhost_worker_free(struct vhost_dev *dev)
{
-   struct vhost_worker *worker = dev->worker;
+   struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk);

-   if (!worker)
+   if (!vtsk)
return;

-   dev->worker = NULL;
-   WARN_ON(!llist_empty(&worker->work_list));
-   vhost_task_stop(worker->vtsk);
-   kfree(worker);
+   vhost_task_stop(vtsk);
+   WARN_ON(!llist_empty(&dev->worker.work_list));
+   WRITE_ONCE(dev->worker.vtsk, NULL);


The patch LGTM, I just wonder if we should set dev->worker to zero here,
but maybe we don't need to.

Thanks,
Stefano


}

static int vhost_worker_create(struct vhost_dev *dev)
{
-   struct vhost_worker *worker;
struct vhost_task *vtsk;
char name[TASK_COMM_LEN];
int ret;

-   worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT);
-   if (!worker)
-   return -ENOMEM;
-
-   dev->worker = worker;
-   worker->kcov_handle = kcov_common_handle();
-   init_llist_head(&worker->work_list);
+   dev->worker.kcov_handle = kcov_common_handle();
+   init_llist_head(&dev->worker.work_list);
snprintf(name, sizeof(name), "vhost-%d", current->pid);

-   vtsk = vhost_task_create(vhost_worker, worker, name);
+   vtsk = vhost_task_create(vhost_worker, &dev->worker, name);
if (!vtsk) {
ret = -ENOMEM;
goto free_worker;
}

-   worker->vtsk = vtsk;
+   WRITE_ONCE(dev->worker.vtsk, vtsk);
vhost_task_start(vtsk);
return 0;

free_worker:
-   kfree(worker);
-   dev->worker = NULL;
+   WRITE_ONCE(dev->worker.vtsk, NULL);
return ret;
}

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 0308638cdeee..305ec8593d46 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -154,7 +154,7 @@ struct vhost_dev {
struct vhost_virtqueue **vqs;
int nvqs;
struct eventfd_ctx *log_ctx;
-   struct vhost_worker *worker;
+   struct vhost_worker worker;
struct vhost_iotlb *umem;
struct vhost_iotlb *iotlb;
spinlock_t iotlb_lock;




Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue

2023-05-31 Thread Mike Christie
On 5/31/23 10:15 AM, Mike Christie wrote:
>>> rcu would work for your case and for what Jason had requested.
>> Yeah, so you already have some patches?
>>
>> Do you want to send it to solve this problem?
>>
> Yeah, I'll break them out and send them later today when I can retest
> rebased patches.
> 

Just one question: do you core vhost developers consider RCU more complex
than switching to READ_ONCE/WRITE_ONCE? I am asking because, for this
immediate regression fix, we could just switch to the latter, like below,
to fix the crash if we think that is simpler.

I think RCU is just a little more complex/invasive because it will have the
extra synchronize_rcu calls.


diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a92af08e7864..03fd47a22a73 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -235,7 +235,7 @@ void vhost_dev_flush(struct vhost_dev *dev)
 {
struct vhost_flush_struct flush;
 
-   if (dev->worker) {
+   if (READ_ONCE(dev->worker.vtsk)) {
init_completion(&flush.wait_event);
vhost_work_init(&flush.work, vhost_flush_work);
 
@@ -247,7 +247,9 @@ EXPORT_SYMBOL_GPL(vhost_dev_flush);
 
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 {
-   if (!dev->worker)
+   struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk);
+
+   if (!vtsk)
return;
 
if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
@@ -255,8 +257,8 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
/* We can only add the work to the list after we're
 * sure it was not in the list.
 * test_and_set_bit() implies a memory barrier.
 */
-   llist_add(&work->node, &dev->worker->work_list);
-   wake_up_process(dev->worker->vtsk->task);
+   llist_add(&work->node, &dev->worker.work_list);
+   wake_up_process(vtsk->task);
}
 }
 EXPORT_SYMBOL_GPL(vhost_work_queue);
@@ -264,7 +266,7 @@ EXPORT_SYMBOL_GPL(vhost_work_queue);
 /* A lockless hint for busy polling code to exit the loop */
 bool vhost_has_work(struct vhost_dev *dev)
 {
-   return dev->worker && !llist_empty(&dev->worker->work_list);
+   return !llist_empty(&dev->worker.work_list);
 }
 EXPORT_SYMBOL_GPL(vhost_has_work);
 
@@ -468,7 +470,7 @@ void vhost_dev_init(struct vhost_dev *dev,
dev->umem = NULL;
dev->iotlb = NULL;
dev->mm = NULL;
-   dev->worker = NULL;
+   memset(&dev->worker, 0, sizeof(dev->worker));
dev->iov_limit = iov_limit;
dev->weight = weight;
dev->byte_weight = byte_weight;
@@ -542,46 +544,38 @@ static void vhost_detach_mm(struct vhost_dev *dev)
 
 static void vhost_worker_free(struct vhost_dev *dev)
 {
-   struct vhost_worker *worker = dev->worker;
+   struct vhost_task *vtsk = READ_ONCE(dev->worker.vtsk);
 
-   if (!worker)
+   if (!vtsk)
return;
 
-   dev->worker = NULL;
-   WARN_ON(!llist_empty(&worker->work_list));
-   vhost_task_stop(worker->vtsk);
-   kfree(worker);
+   vhost_task_stop(vtsk);
+   WARN_ON(!llist_empty(&dev->worker.work_list));
+   WRITE_ONCE(dev->worker.vtsk, NULL);
 }
 
 static int vhost_worker_create(struct vhost_dev *dev)
 {
-   struct vhost_worker *worker;
struct vhost_task *vtsk;
char name[TASK_COMM_LEN];
int ret;
 
-   worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT);
-   if (!worker)
-   return -ENOMEM;
-
-   dev->worker = worker;
-   worker->kcov_handle = kcov_common_handle();
-   init_llist_head(&worker->work_list);
+   dev->worker.kcov_handle = kcov_common_handle();
+   init_llist_head(&dev->worker.work_list);
snprintf(name, sizeof(name), "vhost-%d", current->pid);
 
-   vtsk = vhost_task_create(vhost_worker, worker, name);
+   vtsk = vhost_task_create(vhost_worker, &dev->worker, name);
if (!vtsk) {
ret = -ENOMEM;
goto free_worker;
}
 
-   worker->vtsk = vtsk;
+   WRITE_ONCE(dev->worker.vtsk, vtsk);
vhost_task_start(vtsk);
return 0;
 
 free_worker:
-   kfree(worker);
-   dev->worker = NULL;
+   WRITE_ONCE(dev->worker.vtsk, NULL);
return ret;
 }
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 0308638cdeee..305ec8593d46 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -154,7 +154,7 @@ struct vhost_dev {
struct vhost_virtqueue **vqs;
int nvqs;
struct eventfd_ctx *log_ctx;
-   struct vhost_worker *worker;
+   struct vhost_worker worker;
struct vhost_iotlb *umem;
struct vhost_iotlb *iotlb;
spinlock_t iotlb_lock;



Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue

2023-05-31 Thread Mike Christie
On 5/31/23 2:27 AM, Stefano Garzarella wrote:
> On Tue, May 30, 2023 at 6:30 PM  wrote:
>>
>> On 5/30/23 11:17 AM, Stefano Garzarella wrote:
>>> On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote:
 On 5/30/23 11:00 AM, Stefano Garzarella wrote:
> I think it is partially related to commit 6e890c5d5021 ("vhost: use
> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move
> worker thread fields to new struct"). Maybe those commits just
> highlighted an issue that already existed.

 See my mail about the crash. Agree with your analysis about worker->vtsk
 not being set yet. It's a bug from my commit, where I should not have set
 it so early, or I should be checking for

 if (dev->worker && worker->vtsk)

 instead of

 if (dev->worker)
>>>
>>> Yes, though, in my opinion the problem may persist depending on how the
>>> instructions are reordered.
>>
>> Ah ok.
>>
>>>
>>> Should we protect dev->worker() with an RCU to be safe?
>>
>> For those multiple worker patchsets Jason had asked me about supporting
>> where we don't have a worker while we are swapping workers around. To do
>> that I had added rcu around the dev->worker. I removed it in later patchsets
>> because I didn't think anyone would use it.
>>
>> rcu would work for your case and for what Jason had requested.
> 
> Yeah, so you already have some patches?
> 
> Do you want to send it to solve this problem?
> 

Yeah, I'll break them out and send them later today when I can retest
rebased patches.


Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue

2023-05-31 Thread Stefano Garzarella
On Tue, May 30, 2023 at 6:30 PM  wrote:
>
> On 5/30/23 11:17 AM, Stefano Garzarella wrote:
> > On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote:
> >> On 5/30/23 11:00 AM, Stefano Garzarella wrote:
> >>> I think it is partially related to commit 6e890c5d5021 ("vhost: use
> >>> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move
> >>> worker thread fields to new struct"). Maybe those commits just
> >>> highlighted an issue that already existed.
> >>
> >> See my mail about the crash. Agree with your analysis about worker->vtsk
> >> not being set yet. It's a bug from my commit, where I should not have set
> >> it so early, or I should be checking for
> >>
> >> if (dev->worker && worker->vtsk)
> >>
> >> instead of
> >>
> >> if (dev->worker)
> >
> > Yes, though, in my opinion the problem may persist depending on how the
> > instructions are reordered.
>
> Ah ok.
>
> >
> > Should we protect dev->worker() with an RCU to be safe?
>
> For those multiple worker patchsets Jason had asked me about supporting
> where we don't have a worker while we are swapping workers around. To do
> that I had added rcu around the dev->worker. I removed it in later patchsets
> because I didn't think anyone would use it.
>
> rcu would work for your case and for what Jason had requested.

Yeah, so you already have some patches?

Do you want to send it to solve this problem?

Thanks,
Stefano


Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue

2023-05-30 Thread michael . christie
On 5/30/23 11:17 AM, Stefano Garzarella wrote:
> On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote:
>> On 5/30/23 11:00 AM, Stefano Garzarella wrote:
>>> I think it is partially related to commit 6e890c5d5021 ("vhost: use
>>> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move
>>> worker thread fields to new struct"). Maybe those commits just
>>> highlighted an issue that already existed.
>>
>> See my mail about the crash. Agree with your analysis about worker->vtsk
>> not being set yet. It's a bug from my commit, where I should not have set
>> it so early, or I should be checking for
>>
>> if (dev->worker && worker->vtsk)
>>
>> instead of
>>
>> if (dev->worker)
> 
> Yes, though, in my opinion the problem may persist depending on how the
> instructions are reordered.

Ah ok.

> 
> Should we protect dev->worker() with an RCU to be safe?

For those multiple worker patchsets Jason had asked me about supporting
where we don't have a worker while we are swapping workers around. To do
that I had added rcu around the dev->worker. I removed it in later patchsets
because I didn't think anyone would use it.

rcu would work for your case and for what Jason had requested.
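For reference, a minimal sketch of what an RCU-protected dev->worker could
look like (hypothetical code illustrating the pattern, not Mike's actual
patches):

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct toy_worker { int dummy; };

struct toy_dev {
	struct toy_worker __rcu *worker;
};

static void toy_work_queue(struct toy_dev *dev)
{
	struct toy_worker *worker;

	rcu_read_lock();
	worker = rcu_dereference(dev->worker);
	if (worker) {
		/* queue the work here; worker cannot be freed while
		 * we are inside the read-side critical section */
	}
	rcu_read_unlock();
}

static void toy_worker_free(struct toy_dev *dev)
{
	struct toy_worker *worker = rcu_dereference_protected(dev->worker, 1);

	if (!worker)
		return;

	RCU_INIT_POINTER(dev->worker, NULL);
	synchronize_rcu();	/* the extra cost mentioned in this thread */
	kfree(worker);
}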


Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue

2023-05-30 Thread Stefano Garzarella

On Tue, May 30, 2023 at 11:09:09AM -0500, Mike Christie wrote:

On 5/30/23 11:00 AM, Stefano Garzarella wrote:

I think it is partially related to commit 6e890c5d5021 ("vhost: use
vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move
worker thread fields to new struct"). Maybe those commits just
highlighted an issue that already existed.


See my mail about the crash. Agree with your analysis about worker->vtsk
not being set yet. It's a bug from my commit, where I should not have set
it so early, or I should be checking for

if (dev->worker && worker->vtsk)

instead of

if (dev->worker)


Yes, though, in my opinion the problem may persist depending on how the
instructions are reordered.

Should we protect dev->worker() with an RCU to be safe?



One question, though, about the behavior before my commit and what we want
going forward: before that patch we would just drop work if
vhost_work_queue was called before VHOST_SET_OWNER. Was that correct/expected?


I think so, since we ask the guest to call VHOST_SET_OWNER before any
other command.



The call to vhost_work_queue in vhost_vsock_start was only seeing the
works queued after VHOST_SET_OWNER. Did you want works queued before that?



Yes. For example, if an application in the host has tried to connect and
is waiting for a timeout, we already have work queued up to flush as
soon as we start the device. (See commit 0b841030625c ("vhost: vsock:
kick send_pkt worker once device is started")).
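For readers without the tree at hand: that commit makes vhost_vsock_start()
kick the send worker once the device starts, roughly like this (paraphrased
from the commit; check the tree for the exact code):

/* Some packets may have been queued before the device was started,
 * so kick the send_pkt worker to send them. */
vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);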

Thanks,
Stefano



Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue

2023-05-30 Thread Stefano Garzarella

On Tue, May 30, 2023 at 11:01:11AM -0500, Mike Christie wrote:

On 5/30/23 10:58 AM, Mike Christie wrote:

On 5/30/23 8:44 AM, Stefano Garzarella wrote:


At first glance, it looks like an issue when we call vhost_work_queue().
@Mike, does that ring any bells since you recently looked at that code?


I see the bug. I needed to set dev->worker after setting worker->vtsk


Yes, I came to the same conclusion (see my email sent at the same time
:-).


like below:


diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a92af08e7864..7bd95984a501 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -564,7 +564,6 @@ static int vhost_worker_create(struct vhost_dev *dev)
if (!worker)
return -ENOMEM;

-   dev->worker = worker;
worker->kcov_handle = kcov_common_handle();
init_llist_head(&worker->work_list);
snprintf(name, sizeof(name), "vhost-%d", current->pid);
@@ -576,6 +575,7 @@ static int vhost_worker_create(struct vhost_dev *dev)
}

worker->vtsk = vtsk;


Shoot, oh wait, I think I needed an smp_wmb to always make sure worker->vtsk
is set before dev->worker, or vhost_work_queue could still end up seeing
dev->worker set before worker->vtsk, right?


But should we pair smp_wmb() with an smp_rmb() wherever we check 
dev->worker?
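For what it's worth, the pairing being discussed would look roughly like
this hypothetical sketch, using the pre-change layout where dev->worker is
a pointer (an illustration of the ordering, not a proposed patch):

#include <linux/compiler.h>
#include <asm/barrier.h>

static void toy_publish(struct vhost_dev *dev, struct vhost_worker *worker,
			struct vhost_task *vtsk)
{
	worker->vtsk = vtsk;
	/* order the vtsk store before publishing the pointer */
	smp_wmb();
	WRITE_ONCE(dev->worker, worker);
}

static void toy_queue(struct vhost_dev *dev)
{
	struct vhost_worker *worker = READ_ONCE(dev->worker);

	if (!worker)
		return;
	/* pairs with the smp_wmb() in toy_publish(). Strictly, the
	 * worker->vtsk load below is address-dependent on the
	 * dev->worker load, which is why smp_store_release() /
	 * smp_load_acquire(), or RCU, is the more common idiom. */
	smp_rmb();
	wake_up_process(worker->vtsk->task);
}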


Thanks,
Stefano



Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue

2023-05-30 Thread Mike Christie
On 5/30/23 11:00 AM, Stefano Garzarella wrote:
> I think it is partially related to commit 6e890c5d5021 ("vhost: use
> vhost_tasks for worker threads") and commit 1a5f8090c6de ("vhost: move
> worker thread fields to new struct"). Maybe those commits just
> highlighted an issue that already existed.

See my mail about the crash. Agree with your analysis about worker->vtsk
not being set yet. It's a bug from my commit, where I should not have set
it so early, or I should be checking for

if (dev->worker && worker->vtsk)

instead of 

if (dev->worker)

One question, though, about the behavior before my commit and what we want
going forward: before that patch we would just drop work if
vhost_work_queue was called before VHOST_SET_OWNER. Was that correct/expected?

The call to vhost_work_queue in vhost_vsock_start was only seeing the
works queued after VHOST_SET_OWNER. Did you want works queued before that?



Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue

2023-05-30 Thread Mike Christie
On 5/30/23 10:58 AM, Mike Christie wrote:
> On 5/30/23 8:44 AM, Stefano Garzarella wrote:
>>
>> At first glance, it looks like an issue when we call vhost_work_queue().
>> @Mike, does that ring any bells since you recently looked at that code?
> 
> I see the bug. I needed to set dev->worker after setting worker->vtsk
> like below:
> 
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index a92af08e7864..7bd95984a501 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -564,7 +564,6 @@ static int vhost_worker_create(struct vhost_dev *dev)
>   if (!worker)
>   return -ENOMEM;
>  
> - dev->worker = worker;
>   worker->kcov_handle = kcov_common_handle();
>   init_llist_head(&worker->work_list);
>   snprintf(name, sizeof(name), "vhost-%d", current->pid);
> @@ -576,6 +575,7 @@ static int vhost_worker_create(struct vhost_dev *dev)
>   }
>  
>   worker->vtsk = vtsk;

Shoot, oh wait, I think I needed an smp_wmb to always make sure worker->vtsk
is set before dev->worker, or vhost_work_queue could still end up seeing
dev->worker set before worker->vtsk, right?

> + dev->worker = worker;
>   vhost_task_start(vtsk);
>   return 0;
>  



Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue

2023-05-30 Thread Stefano Garzarella
On Tue, May 30, 2023 at 3:44 PM Stefano Garzarella  wrote:
>
> On Tue, May 30, 2023 at 1:24 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, May 30, 2023 at 12:30:06AM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit:933174ae28ba Merge tag 'spi-fix-v6.4-rc3' of 
> > > git://git.ker..
> > > git tree:   upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=138d4ae528
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=f389ffdf4e9ba3f0
> > > dashboard link: 
> > > https://syzkaller.appspot.com/bug?extid=d0d442c22fa8db45ff0e
> > > compiler:   gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU 
> > > Binutils for Debian) 2.35.2
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > Downloadable assets:
> > > disk image: 
> > > https://storage.googleapis.com/syzbot-assets/21a81b8c2660/disk-933174ae.raw.xz
> > > vmlinux: 
> > > https://storage.googleapis.com/syzbot-assets/b4951d89e238/vmlinux-933174ae.xz
> > > kernel image: 
> > > https://storage.googleapis.com/syzbot-assets/21eb405303cc/bzImage-933174ae.xz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the 
> > > commit:
> > > Reported-by: syzbot+d0d442c22fa8db45f...@syzkaller.appspotmail.com
> > >
> > > general protection fault, probably for non-canonical address 
> > > 0xdc0e:  [#1] PREEMPT SMP KASAN
> > > KASAN: null-ptr-deref in range [0x0070-0x0077]
> > > CPU: 0 PID: 29845 Comm: syz-executor.4 Not tainted 
> > > 6.4.0-rc3-syzkaller-00032-g933174ae28ba #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> > > Google 05/16/2023
> > > RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline]
> > > RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248
> > > Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 
> > > 00 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 
> > > 75 42 48 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9
> > > RSP: 0018:c9000333faf8 EFLAGS: 00010202
> > > RAX: dc00 RBX:  RCX: c9000d84d000
> > > RDX: 000e RSI: 841221d7 RDI: 0070
> > > RBP: 88804b6b95b0 R08: 0001 R09: 
> > > R10: 0001 R11:  R12: 88804b6b00b0
> > > R13:  R14: 88804b6b95e0 R15: 88804b6b95c8
> > > FS:  7f3b445ec700() GS:8880b980() 
> > > knlGS:
> > > CS:  0010 DS:  ES:  CR0: 80050033
> > > CR2: 001b2e423000 CR3: 5d734000 CR4: 003506f0
> > > DR0:  DR1:  DR2: 
> > > DR3: 003b DR6: 0ff0 DR7: 0400
> > > Call Trace:
> > >  
> > >  vhost_transport_send_pkt+0x268/0x520 drivers/vhost/vsock.c:288
> > >  virtio_transport_send_pkt_info+0x54c/0x820 
> > > net/vmw_vsock/virtio_transport_common.c:250
> > >  virtio_transport_connect+0xb1/0xf0 
> > > net/vmw_vsock/virtio_transport_common.c:813
> > >  vsock_connect+0x37f/0xcd0 net/vmw_vsock/af_vsock.c:1414
> > >  __sys_connect_file+0x153/0x1a0 net/socket.c:2003
> > >  __sys_connect+0x165/0x1a0 net/socket.c:2020
> > >  __do_sys_connect net/socket.c:2030 [inline]
> > >  __se_sys_connect net/socket.c:2027 [inline]
> > >  __x64_sys_connect+0x73/0xb0 net/socket.c:2027
> > >  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > >  do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
> > >  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > RIP: 0033:0x7f3b4388c169
> > > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 
> > > f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 
> > > ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > > RSP: 002b:7f3b445ec168 EFLAGS: 0246 ORIG_RAX: 002a
> > > RAX: ffda RBX: 7f3b439ac050 RCX: 7f3b4388c169
> > > RDX: 0010 RSI: 2140 RDI: 0004
> > > RBP: 7f3b438e7ca1 R08:  R09: 
> > > R10:  R11: 0246 R12: 
> > > R13: 7f3b43acfb1f R14: 7f3b445ec300 R15: 00022000
> > >  
> > > Modules linked in:
> > > ---[ end trace  ]---
> > > RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline]
> > > RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248
> > > Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 
> > > 00 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 
> > > 75 42 48 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9
> > > RSP: 0018:c9000333faf8 EFLAGS: 00010202
> > > RAX: dc00 RBX:  RCX: c9000d84d000
> > > RDX: 000e RSI: 841221d7 RDI: 0070
> > > RBP: 88804b6b95b0 R08: 

Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue

2023-05-30 Thread Mike Christie
On 5/30/23 8:44 AM, Stefano Garzarella wrote:
> 
> At first glance, it looks like an issue when we call vhost_work_queue().
> @Mike, does that ring any bells since you recently looked at that code?

I see the bug. I needed to set dev->worker after setting worker->vtsk
like below:


diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a92af08e7864..7bd95984a501 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -564,7 +564,6 @@ static int vhost_worker_create(struct vhost_dev *dev)
if (!worker)
return -ENOMEM;
 
-   dev->worker = worker;
worker->kcov_handle = kcov_common_handle();
init_llist_head(&worker->work_list);
snprintf(name, sizeof(name), "vhost-%d", current->pid);
@@ -576,6 +575,7 @@ static int vhost_worker_create(struct vhost_dev *dev)
}
 
worker->vtsk = vtsk;
+   dev->worker = worker;
vhost_task_start(vtsk);
return 0;
 


Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue

2023-05-30 Thread Stefano Garzarella
On Tue, May 30, 2023 at 1:24 PM Michael S. Tsirkin  wrote:
>
> On Tue, May 30, 2023 at 12:30:06AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:933174ae28ba Merge tag 'spi-fix-v6.4-rc3' of git://git.ker..
> > git tree:   upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=138d4ae528
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=f389ffdf4e9ba3f0
> > dashboard link: https://syzkaller.appspot.com/bug?extid=d0d442c22fa8db45ff0e
> > compiler:   gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils 
> > for Debian) 2.35.2
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image: 
> > https://storage.googleapis.com/syzbot-assets/21a81b8c2660/disk-933174ae.raw.xz
> > vmlinux: 
> > https://storage.googleapis.com/syzbot-assets/b4951d89e238/vmlinux-933174ae.xz
> > kernel image: 
> > https://storage.googleapis.com/syzbot-assets/21eb405303cc/bzImage-933174ae.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+d0d442c22fa8db45f...@syzkaller.appspotmail.com
> >
> > general protection fault, probably for non-canonical address 
> > 0xdc0e:  [#1] PREEMPT SMP KASAN
> > KASAN: null-ptr-deref in range [0x0070-0x0077]
> > CPU: 0 PID: 29845 Comm: syz-executor.4 Not tainted 
> > 6.4.0-rc3-syzkaller-00032-g933174ae28ba #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> > Google 05/16/2023
> > RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline]
> > RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248
> > Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 00 
> > 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 
> > 42 48 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9
> > RSP: 0018:c9000333faf8 EFLAGS: 00010202
> > RAX: dc00 RBX:  RCX: c9000d84d000
> > RDX: 000e RSI: 841221d7 RDI: 0070
> > RBP: 88804b6b95b0 R08: 0001 R09: 
> > R10: 0001 R11:  R12: 88804b6b00b0
> > R13:  R14: 88804b6b95e0 R15: 88804b6b95c8
> > FS:  7f3b445ec700() GS:8880b980() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 001b2e423000 CR3: 5d734000 CR4: 003506f0
> > DR0:  DR1:  DR2: 
> > DR3: 003b DR6: 0ff0 DR7: 0400
> > Call Trace:
> >  
> >  vhost_transport_send_pkt+0x268/0x520 drivers/vhost/vsock.c:288
> >  virtio_transport_send_pkt_info+0x54c/0x820 
> > net/vmw_vsock/virtio_transport_common.c:250
> >  virtio_transport_connect+0xb1/0xf0 
> > net/vmw_vsock/virtio_transport_common.c:813
> >  vsock_connect+0x37f/0xcd0 net/vmw_vsock/af_vsock.c:1414
> >  __sys_connect_file+0x153/0x1a0 net/socket.c:2003
> >  __sys_connect+0x165/0x1a0 net/socket.c:2020
> >  __do_sys_connect net/socket.c:2030 [inline]
> >  __se_sys_connect net/socket.c:2027 [inline]
> >  __x64_sys_connect+0x73/0xb0 net/socket.c:2027
> >  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >  do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
> >  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > RIP: 0033:0x7f3b4388c169
> > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 
> > 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff 
> > ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > RSP: 002b:7f3b445ec168 EFLAGS: 0246 ORIG_RAX: 002a
> > RAX: ffda RBX: 7f3b439ac050 RCX: 7f3b4388c169
> > RDX: 0010 RSI: 2140 RDI: 0004
> > RBP: 7f3b438e7ca1 R08:  R09: 
> > R10:  R11: 0246 R12: 
> > R13: 7f3b43acfb1f R14: 7f3b445ec300 R15: 00022000
> >  
> > Modules linked in:
> > ---[ end trace  ]---
> > RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline]
> > RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248
> > Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 00 
> > 00 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 
> > 42 48 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9
> > RSP: 0018:c9000333faf8 EFLAGS: 00010202
> > RAX: dc00 RBX:  RCX: c9000d84d000
> > RDX: 000e RSI: 841221d7 RDI: 0070
> > RBP: 88804b6b95b0 R08: 0001 R09: 
> > R10: 0001 R11:  R12: 88804b6b00b0
> > R13:  R14: 88804b6b95e0 R15: 88804b6b95c8
> > FS:  7f3b445ec700() GS:8880b990() knlGS:
> > CS:  0010 

Re: [syzbot] [kvm?] [net?] [virt?] general protection fault in vhost_work_queue

2023-05-30 Thread Michael S. Tsirkin
On Tue, May 30, 2023 at 12:30:06AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:933174ae28ba Merge tag 'spi-fix-v6.4-rc3' of git://git.ker..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=138d4ae528
> kernel config:  https://syzkaller.appspot.com/x/.config?x=f389ffdf4e9ba3f0
> dashboard link: https://syzkaller.appspot.com/bug?extid=d0d442c22fa8db45ff0e
> compiler:   gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils 
> for Debian) 2.35.2
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> Downloadable assets:
> disk image: 
> https://storage.googleapis.com/syzbot-assets/21a81b8c2660/disk-933174ae.raw.xz
> vmlinux: 
> https://storage.googleapis.com/syzbot-assets/b4951d89e238/vmlinux-933174ae.xz
> kernel image: 
> https://storage.googleapis.com/syzbot-assets/21eb405303cc/bzImage-933174ae.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+d0d442c22fa8db45f...@syzkaller.appspotmail.com
> 
> general protection fault, probably for non-canonical address 
> 0xdc0e:  [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0070-0x0077]
> CPU: 0 PID: 29845 Comm: syz-executor.4 Not tainted 
> 6.4.0-rc3-syzkaller-00032-g933174ae28ba #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 05/16/2023
> RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline]
> RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248
> Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 00 00 
> 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 42 48 
> 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9
> RSP: 0018:c9000333faf8 EFLAGS: 00010202
> RAX: dc00 RBX:  RCX: c9000d84d000
> RDX: 000e RSI: 841221d7 RDI: 0070
> RBP: 88804b6b95b0 R08: 0001 R09: 
> R10: 0001 R11:  R12: 88804b6b00b0
> R13:  R14: 88804b6b95e0 R15: 88804b6b95c8
> FS:  7f3b445ec700() GS:8880b980() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 001b2e423000 CR3: 5d734000 CR4: 003506f0
> DR0:  DR1:  DR2: 
> DR3: 003b DR6: 0ff0 DR7: 0400
> Call Trace:
>  
>  vhost_transport_send_pkt+0x268/0x520 drivers/vhost/vsock.c:288
>  virtio_transport_send_pkt_info+0x54c/0x820 
> net/vmw_vsock/virtio_transport_common.c:250
>  virtio_transport_connect+0xb1/0xf0 
> net/vmw_vsock/virtio_transport_common.c:813
>  vsock_connect+0x37f/0xcd0 net/vmw_vsock/af_vsock.c:1414
>  __sys_connect_file+0x153/0x1a0 net/socket.c:2003
>  __sys_connect+0x165/0x1a0 net/socket.c:2020
>  __do_sys_connect net/socket.c:2030 [inline]
>  __se_sys_connect net/socket.c:2027 [inline]
>  __x64_sys_connect+0x73/0xb0 net/socket.c:2027
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f3b4388c169
> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 
> 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:7f3b445ec168 EFLAGS: 0246 ORIG_RAX: 002a
> RAX: ffda RBX: 7f3b439ac050 RCX: 7f3b4388c169
> RDX: 0010 RSI: 2140 RDI: 0004
> RBP: 7f3b438e7ca1 R08:  R09: 
> R10:  R11: 0246 R12: 
> R13: 7f3b43acfb1f R14: 7f3b445ec300 R15: 00022000
>  
> Modules linked in:
> ---[ end trace  ]---
> RIP: 0010:vhost_work_queue drivers/vhost/vhost.c:259 [inline]
> RIP: 0010:vhost_work_queue+0xfc/0x150 drivers/vhost/vhost.c:248
> Code: 00 00 fc ff df 48 89 da 48 c1 ea 03 80 3c 02 00 75 56 48 b8 00 00 00 00 
> 00 fc ff df 48 8b 1b 48 8d 7b 70 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 42 48 
> 8b 7b 70 e8 95 9e ae f9 5b 5d 41 5c 41 5d e9
> RSP: 0018:c9000333faf8 EFLAGS: 00010202
> RAX: dc00 RBX:  RCX: c9000d84d000
> RDX: 000e RSI: 841221d7 RDI: 0070
> RBP: 88804b6b95b0 R08: 0001 R09: 
> R10: 0001 R11:  R12: 88804b6b00b0
> R13:  R14: 88804b6b95e0 R15: 88804b6b95c8
> FS:  7f3b445ec700() GS:8880b990() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 001b2e428000 CR3: 5d734000 CR4: 003506e0
> DR0:  DR1:  DR2: 
> DR3: 003b DR6: 0ff0 DR7: