Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop

2017-12-14 Thread Ilya Maximets


On 14.12.2017 17:31, Ilya Maximets wrote:
> One update for the testing scenario:
> 
> No need to kill OVS. The issue is reproducible with a simple 'del-port'
> and 'add-port'. The virtio driver in the guest can crash on both operations.
> In my case it most often crashes on 'add-port' after the deletion.
> 
> Hi Maxime,
> I had already seen the patches below and the original Linux kernel virtio issue.
> I just hadn't had enough time to test them.
> Now I have tested the patches below and they fix the virtio driver crash.
> Thanks for the suggestion.
> 
> Michael,
> I tested "[PATCH] virtio_error: don't invoke status callbacks"
> and it fixes the QEMU crash in the case of a broken guest index.
> Thanks.
> 
> Best regards, Ilya Maximets.
> 
> P.S. Previously I mentioned that I could not reproduce the virtio driver
>  crash with "[PATCH] virtio_error: don't invoke status callbacks"

It should be "[PATCH dontapply] virtio: rework set_status callbacks".
Sorry again.

>  applied. I was wrong. I can reproduce it now; my system was misconfigured.
>  Sorry.
> 
> 
> On 14.12.2017 12:01, Maxime Coquelin wrote:
>> Hi Ilya,
>>
>> On 12/14/2017 08:06 AM, Ilya Maximets wrote:
>>> On 13.12.2017 22:48, Michael S. Tsirkin wrote:
 On Wed, Dec 13, 2017 at 04:45:20PM +0300, Ilya Maximets wrote:
>>> That
>>> looks very strange. Some of the functions get 'old_status', others
>>> the 'new_status'. I'm a bit confused.
>>
>> OK, fair enough. Fixed - let's pass old status everywhere,
>> users that need the new one can get it from the vdev.
>>
>>> And it's not functional in current state:
>>>
>>> hw/net/virtio-net.c:264:28: error: ‘status’ undeclared
>>
>> Fixed too. new version below.
>
> This doesn't fix the segmentation fault.

 Hmm you are right. Looking into it.

> I have exactly the same crash stacktrace:
>
> #0  vhost_memory_unmap hw/virtio/vhost.c:446
> #1  vhost_virtqueue_stop hw/virtio/vhost.c:1155
> #2  vhost_dev_stop hw/virtio/vhost.c:1594
> #3  vhost_net_stop_one hw/net/vhost_net.c:289
> #4  vhost_net_stop hw/net/vhost_net.c:368
> #5  virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100) at 
> hw/net/virtio-net.c:180
> #6  virtio_net_set_status (vdev=0x5625f3901100, old_status=<optimized out>) at hw/net/virtio-net.c:254
> #7  virtio_set_status (vdev=vdev@entry=0x5625f3901100, val=<optimized out>) at hw/virtio/virtio.c:1152
> #8  virtio_error (vdev=0x5625f3901100, fmt=fmt@entry=0x5625f014f688 
> "Guest says index %u is available") at hw/virtio/virtio.c:2460

 BTW what is causing this? Why is guest avail index corrupted?
>>>
>>> My testing environment for the issue:
>>>
>>> * QEMU 2.10.1
>>
>> Could you try to backport the patches below and try killing OVS again?
>>
>> commit 2ae39a113af311cb56a0c35b7f212dafcef15303
>> Author: Maxime Coquelin 
>> Date:   Thu Nov 16 19:48:35 2017 +0100
>>
>>     vhost: restore avail index from vring used index on disconnection
>>
>>     vhost_virtqueue_stop() gets avail index value from the backend,
>>     except if the backend is not responding.
>>
>>     It happens when the backend crashes, and in this case the internal
>>     state of the virtio queue is inconsistent, causing packets
>>     to corrupt the vring state.
>>
>>     With a Linux guest, it results in the following error messages on
>>     backend reconnection:
>>
>>     [   22.444905] virtio_net virtio0: output.0:id 0 is not a head!
>>     [   22.446746] net enp0s3: Unexpected TXQ (0) queue failure: -5
>>     [   22.476360] net enp0s3: Unexpected TXQ (0) queue failure: -5
>>
>>     Fixes: 283e2c2adcb8 ("net: virtio-net discards TX data after link down")
>>     Cc: qemu-sta...@nongnu.org
>>     Signed-off-by: Maxime Coquelin 
>>     Reviewed-by: Michael S. Tsirkin 
>>     Signed-off-by: Michael S. Tsirkin 
>>
>> commit 2d4ba6cc741df15df6fbb4feaa706a02e103083a
>> Author: Maxime Coquelin 
>> Date:   Thu Nov 16 19:48:34 2017 +0100
>>
>>     virtio: Add queue interface to restore avail index from vring used index
>>
>>     In case of a backend crash, it is not possible to restore the internal
>>     avail index from the backend value, as the vhost_get_vring_base
>>     callback fails.
>>
>>     This patch provides a new interface to restore the internal avail index
>>     from the vring used index, as done by some vhost-user backends on
>>     reconnection.
>>
>>     Signed-off-by: Maxime Coquelin 
>>     Reviewed-by: Michael S. Tsirkin 
>>     Signed-off-by: Michael S. Tsirkin 
>>
>>
>> Cheers,
>> Maxime
>>
>>
>>
> 
> 



Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop

2017-12-14 Thread Ilya Maximets
One update for the testing scenario:

No need to kill OVS. The issue is reproducible with a simple 'del-port'
and 'add-port'. The virtio driver in the guest can crash on both operations.
In my case it most often crashes on 'add-port' after the deletion.

Hi Maxime,
I had already seen the patches below and the original Linux kernel virtio issue.
I just hadn't had enough time to test them.
Now I have tested the patches below and they fix the virtio driver crash.
Thanks for the suggestion.

Michael,
I tested "[PATCH] virtio_error: don't invoke status callbacks"
and it fixes the QEMU crash in the case of a broken guest index.
Thanks.

Best regards, Ilya Maximets.

P.S. Previously I mentioned that I could not reproduce the virtio driver
 crash with "[PATCH] virtio_error: don't invoke status callbacks"
 applied. I was wrong. I can reproduce it now; my system was misconfigured.
 Sorry.
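
For context, here is a rough sketch of the idea behind the "virtio_error:
don't invoke status callbacks" change mentioned above, assuming it simply
stops virtio_error() from going through virtio_set_status() (and therefore
through the device's set_status callback and the nested vhost_net_stop()
visible in the backtraces below). This is an illustration only, not the
actual patch:

    void virtio_error(VirtIODevice *vdev, const char *fmt, ...)
    {
        va_list ap;

        va_start(ap, fmt);
        error_vreport(fmt, ap);
        va_end(ap);

        if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
            /* Mark the device as needing a reset directly, without calling
             * virtio_set_status() and thus without invoking the device's
             * set_status callback. */
            vdev->status |= VIRTIO_CONFIG_S_NEEDS_RESET;
            virtio_notify_config(vdev);
        }

        vdev->broken = true;
    }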


On 14.12.2017 12:01, Maxime Coquelin wrote:
> Hi Ilya,
> 
> On 12/14/2017 08:06 AM, Ilya Maximets wrote:
>> On 13.12.2017 22:48, Michael S. Tsirkin wrote:
>>> On Wed, Dec 13, 2017 at 04:45:20PM +0300, Ilya Maximets wrote:
>> That
>> looks very strange. Some of the functions get 'old_status', others
>> the 'new_status'. I'm a bit confused.
>
> OK, fair enough. Fixed - let's pass old status everywhere,
> users that need the new one can get it from the vdev.
>
>> And it's not functional in current state:
>>
>> hw/net/virtio-net.c:264:28: error: ‘status’ undeclared
>
> Fixed too. new version below.

 This doesn't fix the segmentation fault.
>>>
>>> Hmm you are right. Looking into it.
>>>
 I have exactly the same crash stacktrace:

 #0  vhost_memory_unmap hw/virtio/vhost.c:446
 #1  vhost_virtqueue_stop hw/virtio/vhost.c:1155
 #2  vhost_dev_stop hw/virtio/vhost.c:1594
 #3  vhost_net_stop_one hw/net/vhost_net.c:289
 #4  vhost_net_stop hw/net/vhost_net.c:368
 #5  virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100) at 
 hw/net/virtio-net.c:180
 #6  virtio_net_set_status (vdev=0x5625f3901100, old_status=<optimized out>) at hw/net/virtio-net.c:254
 #7  virtio_set_status (vdev=vdev@entry=0x5625f3901100, val=<optimized out>) at hw/virtio/virtio.c:1152
 #8  virtio_error (vdev=0x5625f3901100, fmt=fmt@entry=0x5625f014f688 "Guest 
 says index %u is available") at hw/virtio/virtio.c:2460
>>>
>>> BTW what is causing this? Why is guest avail index corrupted?
>>
>> My testing environment for the issue:
>>
>> * QEMU 2.10.1
> 
> Could you try to backport the patches below and try killing OVS again?
> 
> commit 2ae39a113af311cb56a0c35b7f212dafcef15303
> Author: Maxime Coquelin 
> Date:   Thu Nov 16 19:48:35 2017 +0100
> 
>     vhost: restore avail index from vring used index on disconnection
> 
>     vhost_virtqueue_stop() gets avail index value from the backend,
>     except if the backend is not responding.
> 
>     It happens when the backend crashes, and in this case the internal
>     state of the virtio queue is inconsistent, causing packets
>     to corrupt the vring state.
> 
>     With a Linux guest, it results in the following error messages on
>     backend reconnection:
> 
>     [   22.444905] virtio_net virtio0: output.0:id 0 is not a head!
>     [   22.446746] net enp0s3: Unexpected TXQ (0) queue failure: -5
>     [   22.476360] net enp0s3: Unexpected TXQ (0) queue failure: -5
> 
>     Fixes: 283e2c2adcb8 ("net: virtio-net discards TX data after link down")
>     Cc: qemu-sta...@nongnu.org
>     Signed-off-by: Maxime Coquelin 
>     Reviewed-by: Michael S. Tsirkin 
>     Signed-off-by: Michael S. Tsirkin 
> 
> commit 2d4ba6cc741df15df6fbb4feaa706a02e103083a
> Author: Maxime Coquelin 
> Date:   Thu Nov 16 19:48:34 2017 +0100
> 
>     virtio: Add queue interface to restore avail index from vring used index
> 
>     In case of a backend crash, it is not possible to restore the internal
>     avail index from the backend value, as the vhost_get_vring_base
>     callback fails.
> 
>     This patch provides a new interface to restore the internal avail index
>     from the vring used index, as done by some vhost-user backends on
>     reconnection.
> 
>     Signed-off-by: Maxime Coquelin 
>     Reviewed-by: Michael S. Tsirkin 
>     Signed-off-by: Michael S. Tsirkin 
> 
> 
> Cheers,
> Maxime
> 
> 
> 



Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop

2017-12-14 Thread Maxime Coquelin

Hi Ilya,

On 12/14/2017 08:06 AM, Ilya Maximets wrote:

On 13.12.2017 22:48, Michael S. Tsirkin wrote:

On Wed, Dec 13, 2017 at 04:45:20PM +0300, Ilya Maximets wrote:

That
looks very strange. Some of the functions get 'old_status', others
the 'new_status'. I'm a bit confused.


OK, fair enough. Fixed - let's pass old status everywhere,
users that need the new one can get it from the vdev.


And it's not functional in current state:

hw/net/virtio-net.c:264:28: error: ‘status’ undeclared


Fixed too. new version below.


This doesn't fix the segmentation fault.


Hmm you are right. Looking into it.


I have exactly the same crash stacktrace:

#0  vhost_memory_unmap hw/virtio/vhost.c:446
#1  vhost_virtqueue_stop hw/virtio/vhost.c:1155
#2  vhost_dev_stop hw/virtio/vhost.c:1594
#3  vhost_net_stop_one hw/net/vhost_net.c:289
#4  vhost_net_stop hw/net/vhost_net.c:368
#5  virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100) at 
hw/net/virtio-net.c:180
#6  virtio_net_set_status (vdev=0x5625f3901100, old_status=<optimized out>) at 
hw/net/virtio-net.c:254
#7  virtio_set_status (vdev=vdev@entry=0x5625f3901100, val=<optimized out>) at 
hw/virtio/virtio.c:1152
#8  virtio_error (vdev=0x5625f3901100, fmt=fmt@entry=0x5625f014f688 "Guest says 
index %u is available") at hw/virtio/virtio.c:2460


BTW what is causing this? Why is guest avail index corrupted?


My testing environment for the issue:

* QEMU 2.10.1


Could you try to backport the patches below and try killing OVS again?

commit 2ae39a113af311cb56a0c35b7f212dafcef15303
Author: Maxime Coquelin 
Date:   Thu Nov 16 19:48:35 2017 +0100

vhost: restore avail index from vring used index on disconnection

vhost_virtqueue_stop() gets avail index value from the backend,
except if the backend is not responding.

It happens when the backend crashes, and in this case the internal
state of the virtio queue is inconsistent, causing packets
to corrupt the vring state.

With a Linux guest, it results in the following error messages on
backend reconnection:

[   22.444905] virtio_net virtio0: output.0:id 0 is not a head!
[   22.446746] net enp0s3: Unexpected TXQ (0) queue failure: -5
[   22.476360] net enp0s3: Unexpected TXQ (0) queue failure: -5

Fixes: 283e2c2adcb8 ("net: virtio-net discards TX data after link down")

Cc: qemu-sta...@nongnu.org
Signed-off-by: Maxime Coquelin 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 

commit 2d4ba6cc741df15df6fbb4feaa706a02e103083a
Author: Maxime Coquelin 
Date:   Thu Nov 16 19:48:34 2017 +0100

virtio: Add queue interface to restore avail index from vring used index


In case of a backend crash, it is not possible to restore the internal
avail index from the backend value, as the vhost_get_vring_base
callback fails.

This patch provides a new interface to restore the internal avail index
from the vring used index, as done by some vhost-user backends on
reconnection.

Signed-off-by: Maxime Coquelin 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
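
A rough sketch of how the two changes above fit together, based only on the
commit messages quoted here; the function and field names below are
assumptions for illustration, not quotes from the actual patches:

    /* New virtio interface: reset the internal avail index from the used
     * index that the (crashed) backend last wrote back. */
    void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int idx)
    {
        vdev->vq[idx].last_avail_idx = vring_used_idx(&vdev->vq[idx]);
        vdev->vq[idx].shadow_avail_idx = vdev->vq[idx].last_avail_idx;
    }

    /* vhost side, inside vhost_virtqueue_stop() (details elided): fall back
     * to the new interface when the backend no longer answers
     * GET_VRING_BASE, instead of leaving the queue state inconsistent. */
    struct vhost_vring_state state = { .index = idx };
    int r = dev->vhost_ops->vhost_get_vring_base(dev, &state);

    if (r < 0) {
        /* Backend crashed or disconnected: resume from the used index. */
        virtio_queue_restore_last_avail_idx(vdev, idx);
    } else {
        virtio_queue_set_last_avail_idx(vdev, idx, state.num);
    }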


Cheers,
Maxime



Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop

2017-12-13 Thread Ilya Maximets
On 13.12.2017 22:48, Michael S. Tsirkin wrote:
> On Wed, Dec 13, 2017 at 04:45:20PM +0300, Ilya Maximets wrote:
 That
 looks very strange. Some of the functions get 'old_status', others
 the 'new_status'. I'm a bit confused.
>>>
>>> OK, fair enough. Fixed - let's pass old status everywhere,
>>> users that need the new one can get it from the vdev.
>>>
 And it's not functional in current state:

 hw/net/virtio-net.c:264:28: error: ‘status’ undeclared
>>>
>>> Fixed too. new version below.
>>
>> This doesn't fix the segmentation fault.
> 
> Hmm you are right. Looking into it.
> 
>> I have exactly the same crash stacktrace:
>>
>> #0  vhost_memory_unmap hw/virtio/vhost.c:446
>> #1  vhost_virtqueue_stop hw/virtio/vhost.c:1155
>> #2  vhost_dev_stop hw/virtio/vhost.c:1594
>> #3  vhost_net_stop_one hw/net/vhost_net.c:289
>> #4  vhost_net_stop hw/net/vhost_net.c:368
>> #5  virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100) at 
>> hw/net/virtio-net.c:180
>> #6  virtio_net_set_status (vdev=0x5625f3901100, old_status=<optimized out>) 
>> at hw/net/virtio-net.c:254
>> #7  virtio_set_status (vdev=vdev@entry=0x5625f3901100, val=<optimized out>) 
>> at hw/virtio/virtio.c:1152
>> #8  virtio_error (vdev=0x5625f3901100, fmt=fmt@entry=0x5625f014f688 "Guest 
>> says index %u is available") at hw/virtio/virtio.c:2460
> 
> BTW what is causing this? Why is guest avail index corrupted?

My testing environment for the issue:

* QEMU 2.10.1
* testpmd from a slightly outdated DPDK 16.07.0 in the guest, with uio_pci_generic
* OVS 2.8 with DPDK 17.05.2 on the host.
* 2 vhost-user ports in the VM, in server mode.
* testpmd just forwards packets from one port to another with --forward-mode=mac

testpmd with the virtio-net driver sometimes crashes after killing OVS while some
heavy traffic is flowing. After restarting OVS and stopping it again, QEMU
crashes on vhost disconnect due to a virtqueue_get_head() failure.

So, the sequence is as follows:

1. Start OVS, start QEMU, start testpmd, start the external packet generator.
2. pkill -11 ovs-vswitchd
3. If testpmd did not crash in the guest, go to step #1.
4. Start OVS (testpmd with the virtio driver is still in the down state).
5. pkill -11 ovs-vswitchd
6. Observe the QEMU crash.

I suspect that the virtio-net driver from DPDK 16.07.0 corrupts the vrings
just before the crash at step #2.

I didn't actually try to investigate the virtio driver crash because it's a bit
out of my scope, I don't have enough time, and the driver is slightly outdated.
But the stability of QEMU itself is a really important thing.

One interesting thing is that I cannot reproduce the virtio driver crash with
"virtio: rework set_status callbacks" applied. I had to break the vrings manually
to reproduce the original nested call of vhost_net_stop().

> 
>> #9  virtqueue_get_head at hw/virtio/virtio.c:543
>> #10 virtqueue_drop_all hw/virtio/virtio.c:984
>> #11 virtio_net_drop_tx_queue_data hw/net/virtio-net.c:240
>> #12 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
>> #13 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431
>> #14 vhost_net_stop_one at hw/net/vhost_net.c:290
>> #15 vhost_net_stop at hw/net/vhost_net.c:368
>> #16 virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100) at 
>> hw/net/virtio-net.c:180
>> #17 virtio_net_set_status (vdev=0x5625f3901100, old_status=) 
>> at hw/net/virtio-net.c:254
>> #18 qmp_set_link at net/net.c:1430
>> #19 chr_closed_bh at net/vhost-user.c:214
>> #20 aio_bh_call at util/async.c:90
>> #21 aio_bh_poll at util/async.c:118
>> #22 aio_dispatch at util/aio-posix.c:429
>> #23 aio_ctx_dispatch at util/async.c:261
>> #24 g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>> #25 glib_pollfds_poll () at util/main-loop.c:213
>> #26 os_host_main_loop_wait at util/main-loop.c:261
>> #27 main_loop_wait at util/main-loop.c:515
>> #28 main_loop () at vl.c:1917
>> #29 main
>>
>>
>> Actually, the logic doesn't change. In the function virtio_net_vhost_status():
>>
>> -if ((virtio_net_started(n, status) && !nc->peer->link_down) ==
>> +if ((virtio_net_started(n, vdev->status) && !nc->peer->link_down) ==
>>  !!n->vhost_started) {
>>  return;
>>  }
>>
>> previously the new 'status' was checked, and now the new 'vdev->status' is checked.
>> It's the same condition, and it doesn't help because the vhost_started flag is
>> still set to 1.
>> Anyway, nc->peer->link_down is true in our case, so it makes no difference if
>> we change vdev->status.
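
To spell out why that check does not prevent the nested stop in this trace
(a walk-through based only on the quoted code and backtrace, not on the
actual sources):

    /* On the nested entry triggered by virtio_error():
     *
     *   nc->peer->link_down == true  (qmp_set_link("hostnet0", false))
     *   n->vhost_started    == 1     (cleared only after the outer
     *                                 vhost_net_stop() returns)
     *
     * so (virtio_net_started(...) && !nc->peer->link_down) is false while
     * !!n->vhost_started is true; the two sides differ, the early return is
     * skipped, and vhost_net_stop() is entered a second time. */
    if ((virtio_net_started(n, vdev->status) && !nc->peer->link_down) ==
        !!n->vhost_started) {
        return;
    }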
>>
>
> Signed-off-by: Michael S. Tsirkin 
>>>
>>> Still completely untested, sorry about that - hope you can help here.
>>>
>>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>>> index 098bdaa..f5d0ee1 100644
>>> --- a/include/hw/virtio/virtio.h
>>> +++ b/include/hw/virtio/virtio.h
>>> @@ -115,7 +115,7 @@ typedef struct VirtioDeviceClass {
>>>  void (*get_config)(VirtIODevice *vdev, uint8_t *config);
>>>  void (*set_config)(VirtIODevice *vdev, const uint8_t *config);
>>>  void (*reset)(VirtIODevice *vdev);
>>> -void 

Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop

2017-12-13 Thread Michael S. Tsirkin
On Wed, Dec 13, 2017 at 04:45:20PM +0300, Ilya Maximets wrote:
> >> That
> >> looks very strange. Some of the functions get 'old_status', others
> >> the 'new_status'. I'm a bit confused.
> > 
> > OK, fair enough. Fixed - let's pass old status everywhere,
> > users that need the new one can get it from the vdev.
> > 
> >> And it's not functional in current state:
> >>
> >> hw/net/virtio-net.c:264:28: error: ‘status’ undeclared
> > 
> > Fixed too. new version below.
> 
> This doesn't fix the segmentation fault.

Hmm you are right. Looking into it.

> I have exactly the same crash stacktrace:
> 
> #0  vhost_memory_unmap hw/virtio/vhost.c:446
> #1  vhost_virtqueue_stop hw/virtio/vhost.c:1155
> #2  vhost_dev_stop hw/virtio/vhost.c:1594
> #3  vhost_net_stop_one hw/net/vhost_net.c:289
> #4  vhost_net_stop hw/net/vhost_net.c:368
> #5  virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100) at 
> hw/net/virtio-net.c:180
> #6  virtio_net_set_status (vdev=0x5625f3901100, old_status=<optimized out>) 
> at hw/net/virtio-net.c:254
> #7  virtio_set_status (vdev=vdev@entry=0x5625f3901100, val=<optimized out>) 
> at hw/virtio/virtio.c:1152
> #8  virtio_error (vdev=0x5625f3901100, fmt=fmt@entry=0x5625f014f688 "Guest 
> says index %u is available") at hw/virtio/virtio.c:2460

BTW what is causing this? Why is guest avail index corrupted?

> #9  virtqueue_get_head at hw/virtio/virtio.c:543
> #10 virtqueue_drop_all hw/virtio/virtio.c:984
> #11 virtio_net_drop_tx_queue_data hw/net/virtio-net.c:240
> #12 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
> #13 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431
> #14 vhost_net_stop_one at hw/net/vhost_net.c:290
> #15 vhost_net_stop at hw/net/vhost_net.c:368
> #16 virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100) at 
> hw/net/virtio-net.c:180
> #17 virtio_net_set_status (vdev=0x5625f3901100, old_status=) 
> at hw/net/virtio-net.c:254
> #18 qmp_set_link at net/net.c:1430
> #19 chr_closed_bh at net/vhost-user.c:214
> #20 aio_bh_call at util/async.c:90
> #21 aio_bh_poll at util/async.c:118
> #22 aio_dispatch at util/aio-posix.c:429
> #23 aio_ctx_dispatch at util/async.c:261
> #24 g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> #25 glib_pollfds_poll () at util/main-loop.c:213
> #26 os_host_main_loop_wait at util/main-loop.c:261
> #27 main_loop_wait at util/main-loop.c:515
> #28 main_loop () at vl.c:1917
> #29 main
> 
> 
> Actually, the logic doesn't change. In the function virtio_net_vhost_status():
> 
> -if ((virtio_net_started(n, status) && !nc->peer->link_down) ==
> +if ((virtio_net_started(n, vdev->status) && !nc->peer->link_down) ==
>  !!n->vhost_started) {
>  return;
>  }
> 
> previously the new 'status' was checked, and now the new 'vdev->status' is checked.
> It's the same condition, and it doesn't help because the vhost_started flag is
> still set to 1.
> Anyway, nc->peer->link_down is true in our case, so it makes no difference if
> we change vdev->status.
> 
> >>>
> >>> Signed-off-by: Michael S. Tsirkin 
> > 
> > Still completely untested, sorry about that - hope you can help here.
> > 
> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > index 098bdaa..f5d0ee1 100644
> > --- a/include/hw/virtio/virtio.h
> > +++ b/include/hw/virtio/virtio.h
> > @@ -115,7 +115,7 @@ typedef struct VirtioDeviceClass {
> >  void (*get_config)(VirtIODevice *vdev, uint8_t *config);
> >  void (*set_config)(VirtIODevice *vdev, const uint8_t *config);
> >  void (*reset)(VirtIODevice *vdev);
> > -void (*set_status)(VirtIODevice *vdev, uint8_t val);
> > +void (*set_status)(VirtIODevice *vdev, uint8_t old_status);
> >  /* For transitional devices, this is a bitmap of features
> >   * that are only exposed on the legacy interface but not
> >   * the modern one.
> > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> > index 05d1440..b8b07ba 100644
> > --- a/hw/block/virtio-blk.c
> > +++ b/hw/block/virtio-blk.c
> > @@ -814,15 +814,15 @@ static uint64_t virtio_blk_get_features(VirtIODevice 
> > *vdev, uint64_t features,
> >  return features;
> >  }
> >  
> > -static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t status)
> > +static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t old_status)
> >  {
> >  VirtIOBlock *s = VIRTIO_BLK(vdev);
> >  
> > -if (!(status & (VIRTIO_CONFIG_S_DRIVER | VIRTIO_CONFIG_S_DRIVER_OK))) {
> > +if (!(vdev->status & (VIRTIO_CONFIG_S_DRIVER | 
> > VIRTIO_CONFIG_S_DRIVER_OK))) {
> >  assert(!s->dataplane_started);
> >  }
> >  
> > -if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
> > +if (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
> >  return;
> >  }
> >  
> > diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> > index 9470bd7..881b1ff 100644
> > --- a/hw/char/virtio-serial-bus.c
> > +++ b/hw/char/virtio-serial-bus.c
> > @@ -616,7 +616,7 @@ static void guest_reset(VirtIOSerial *vser)
> >  

Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop

2017-12-13 Thread Ilya Maximets
On 11.12.2017 07:35, Michael S. Tsirkin wrote:
> On Fri, Dec 08, 2017 at 05:54:18PM +0300, Ilya Maximets wrote:
>> On 07.12.2017 20:27, Michael S. Tsirkin wrote:
>>> On Thu, Dec 07, 2017 at 09:39:36AM +0300, Ilya Maximets wrote:
 On 06.12.2017 19:45, Michael S. Tsirkin wrote:
> On Wed, Dec 06, 2017 at 04:06:18PM +0300, Ilya Maximets wrote:
>> In case a virtio error occurs after vhost_dev_close(), qemu will crash
>> in the nested cleanup while checking the IOMMU flag, because dev->vdev is
>> already set to zero and the resources are already freed.
>>
>> Example:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> vhost_virtqueue_stop at hw/virtio/vhost.c:1155
>>
>> #0  vhost_virtqueue_stop at hw/virtio/vhost.c:1155
>> #1  vhost_dev_stop at hw/virtio/vhost.c:1594
>> #2  vhost_net_stop_one at hw/net/vhost_net.c:289
>> #3  vhost_net_stop at hw/net/vhost_net.c:368
>>
>> Nested call to vhost_net_stop(). First time was at #14.
>>
>> #4  virtio_net_vhost_status at hw/net/virtio-net.c:180
>> #5  virtio_net_set_status (status=79) at hw/net/virtio-net.c:254
>> #6  virtio_set_status at hw/virtio/virtio.c:1146
>> #7  virtio_error at hw/virtio/virtio.c:2455
>>
>> virtqueue_get_head() failed here.
>>
>> #8  virtqueue_get_head at hw/virtio/virtio.c:543
>> #9  virtqueue_drop_all at hw/virtio/virtio.c:984
>> #10 virtio_net_drop_tx_queue_data at hw/net/virtio-net.c:240
>> #11 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
>> #12 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431
>>
>> vhost_dev_stop() was executed here. dev->vdev == NULL now.
>>
>> #13 vhost_net_stop_one at hw/net/vhost_net.c:290
>> #14 vhost_net_stop at hw/net/vhost_net.c:368
>> #15 virtio_net_vhost_status at hw/net/virtio-net.c:180
>> #16 virtio_net_set_status (status=15) at hw/net/virtio-net.c:254
>> #17 qmp_set_link ("hostnet0", up=false) at net/net.c:1430
>> #18 chr_closed_bh at net/vhost-user.c:214
>> #19 aio_bh_call at util/async.c:90
>> #20 aio_bh_poll at util/async.c:118
>> #21 aio_dispatch at util/aio-posix.c:429
>> #22 aio_ctx_dispatch at util/async.c:261
>> #23 g_main_context_dispatch
>> #24 glib_pollfds_poll at util/main-loop.c:213
>> #25 os_host_main_loop_wait at util/main-loop.c:261
>> #26 main_loop_wait at util/main-loop.c:515
>> #27 main_loop () at vl.c:1917
>> #28 main at vl.c:4795
>>
>> The above backtrace was captured from a qemu crash on vhost disconnect
>> while the virtio driver in the guest was in a failed state.
>>
>> We could just add a check for 'vdev' in 'vhost_dev_has_iommu()', but it
>> would then assert further on while trying to free already freed
>> ioeventfds. The real problem is that we're allowing nested calls to
>> 'vhost_net_stop'.
>>
>> This patch forbids nested calls to 'vhost_net_stop', avoiding any
>> possible double frees and segmentation faults due to the use of already
>> freed resources, by setting the 'vhost_started' flag to zero prior to
>> the 'vhost_net_stop' call.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>
>> This issue was already addressed more than a year ago by the following
>> patch: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html
>> but it was dropped without review because vhost-user re-connection was not
>> yet implemented. The re-connection implementation later fixed most of the
>> nested calls, but a few of them still exist. For example, the above
>> backtrace was captured after a 'virtqueue_get_head()' failure on vhost-user
>> disconnection.
>>
>>  hw/net/virtio-net.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 38674b0..4d95a18 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -177,8 +177,8 @@ static void virtio_net_vhost_status(VirtIONet *n, 
>> uint8_t status)
>>  n->vhost_started = 0;
>>  }
>>  } else {
>> -vhost_net_stop(vdev, n->nic->ncs, queues);
>>  n->vhost_started = 0;
>> +vhost_net_stop(vdev, n->nic->ncs, queues);
>>  }
>>  }
>
> Well the wider context is
>
>
> n->vhost_started = 1;
> r = vhost_net_start(vdev, n->nic->ncs, queues);
> if (r < 0) {
> error_report("unable to start vhost net: %d: "
>  "falling back on userspace virtio", -r);
> n->vhost_started = 0;
> }
> } else {
> vhost_net_stop(vdev, n->nic->ncs, queues);
> n->vhost_started = 0;
>
> So we set it to 

Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop

2017-12-10 Thread Michael S. Tsirkin
On Fri, Dec 08, 2017 at 05:54:18PM +0300, Ilya Maximets wrote:
> On 07.12.2017 20:27, Michael S. Tsirkin wrote:
> > On Thu, Dec 07, 2017 at 09:39:36AM +0300, Ilya Maximets wrote:
> >> On 06.12.2017 19:45, Michael S. Tsirkin wrote:
> >>> On Wed, Dec 06, 2017 at 04:06:18PM +0300, Ilya Maximets wrote:
>  In case a virtio error occurs after vhost_dev_close(), qemu will crash
>  in the nested cleanup while checking the IOMMU flag, because dev->vdev is
>  already set to zero and the resources are already freed.
> 
>  Example:
> 
>  Program received signal SIGSEGV, Segmentation fault.
>  vhost_virtqueue_stop at hw/virtio/vhost.c:1155
> 
>  #0  vhost_virtqueue_stop at hw/virtio/vhost.c:1155
>  #1  vhost_dev_stop at hw/virtio/vhost.c:1594
>  #2  vhost_net_stop_one at hw/net/vhost_net.c:289
>  #3  vhost_net_stop at hw/net/vhost_net.c:368
> 
>  Nested call to vhost_net_stop(). First time was at #14.
> 
>  #4  virtio_net_vhost_status at hw/net/virtio-net.c:180
>  #5  virtio_net_set_status (status=79) at hw/net/virtio-net.c:254
>  #6  virtio_set_status at hw/virtio/virtio.c:1146
>  #7  virtio_error at hw/virtio/virtio.c:2455
> 
>  virtqueue_get_head() failed here.
> 
>  #8  virtqueue_get_head at hw/virtio/virtio.c:543
>  #9  virtqueue_drop_all at hw/virtio/virtio.c:984
>  #10 virtio_net_drop_tx_queue_data at hw/net/virtio-net.c:240
>  #11 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
>  #12 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431
> 
>  vhost_dev_stop() was executed here. dev->vdev == NULL now.
> 
>  #13 vhost_net_stop_one at hw/net/vhost_net.c:290
>  #14 vhost_net_stop at hw/net/vhost_net.c:368
>  #15 virtio_net_vhost_status at hw/net/virtio-net.c:180
>  #16 virtio_net_set_status (status=15) at hw/net/virtio-net.c:254
>  #17 qmp_set_link ("hostnet0", up=false) at net/net.c:1430
>  #18 chr_closed_bh at net/vhost-user.c:214
>  #19 aio_bh_call at util/async.c:90
>  #20 aio_bh_poll at util/async.c:118
>  #21 aio_dispatch at util/aio-posix.c:429
>  #22 aio_ctx_dispatch at util/async.c:261
>  #23 g_main_context_dispatch
>  #24 glib_pollfds_poll at util/main-loop.c:213
>  #25 os_host_main_loop_wait at util/main-loop.c:261
>  #26 main_loop_wait at util/main-loop.c:515
>  #27 main_loop () at vl.c:1917
>  #28 main at vl.c:4795
> 
>  The above backtrace was captured from a qemu crash on vhost disconnect
>  while the virtio driver in the guest was in a failed state.
> 
>  We could just add a check for 'vdev' in 'vhost_dev_has_iommu()', but it
>  would then assert further on while trying to free already freed
>  ioeventfds. The real problem is that we're allowing nested calls to
>  'vhost_net_stop'.
> 
>  This patch forbids nested calls to 'vhost_net_stop', avoiding any
>  possible double frees and segmentation faults due to the use of already
>  freed resources, by setting the 'vhost_started' flag to zero prior to
>  the 'vhost_net_stop' call.
> 
>  Signed-off-by: Ilya Maximets 
>  ---
> 
>  This issue was already addressed more than a year ago by the following
>  patch: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html
>  but it was dropped without review because vhost-user re-connection was not
>  yet implemented. The re-connection implementation later fixed most of the
>  nested calls, but a few of them still exist. For example, the above
>  backtrace was captured after a 'virtqueue_get_head()' failure on vhost-user
>  disconnection.
> 
>   hw/net/virtio-net.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
>  diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>  index 38674b0..4d95a18 100644
>  --- a/hw/net/virtio-net.c
>  +++ b/hw/net/virtio-net.c
>  @@ -177,8 +177,8 @@ static void virtio_net_vhost_status(VirtIONet *n, 
>  uint8_t status)
>   n->vhost_started = 0;
>   }
>   } else {
>  -vhost_net_stop(vdev, n->nic->ncs, queues);
>   n->vhost_started = 0;
>  +vhost_net_stop(vdev, n->nic->ncs, queues);
>   }
>   }
> >>>
> >>> Well the wider context is
> >>>
> >>>
> >>> n->vhost_started = 1;
> >>> r = vhost_net_start(vdev, n->nic->ncs, queues);
> >>> if (r < 0) {
> >>> error_report("unable to start vhost net: %d: "
> >>>  "falling back on userspace virtio", -r);
> >>> n->vhost_started = 0;
> >>> }
> >>> } else {
> >>> vhost_net_stop(vdev, n->nic->ncs, queues);
> >>> n->vhost_started = 0;
> >>>
> >>> So we set it to 1 before start, we should clear after stop.
> >>

Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop

2017-12-08 Thread Ilya Maximets
On 07.12.2017 20:27, Michael S. Tsirkin wrote:
> On Thu, Dec 07, 2017 at 09:39:36AM +0300, Ilya Maximets wrote:
>> On 06.12.2017 19:45, Michael S. Tsirkin wrote:
>>> On Wed, Dec 06, 2017 at 04:06:18PM +0300, Ilya Maximets wrote:
 In case a virtio error occurs after vhost_dev_close(), qemu will crash
 in the nested cleanup while checking the IOMMU flag, because dev->vdev is
 already set to zero and the resources are already freed.

 Example:

 Program received signal SIGSEGV, Segmentation fault.
 vhost_virtqueue_stop at hw/virtio/vhost.c:1155

 #0  vhost_virtqueue_stop at hw/virtio/vhost.c:1155
 #1  vhost_dev_stop at hw/virtio/vhost.c:1594
 #2  vhost_net_stop_one at hw/net/vhost_net.c:289
 #3  vhost_net_stop at hw/net/vhost_net.c:368

 Nested call to vhost_net_stop(). First time was at #14.

 #4  virtio_net_vhost_status at hw/net/virtio-net.c:180
 #5  virtio_net_set_status (status=79) at hw/net/virtio-net.c:254
 #6  virtio_set_status at hw/virtio/virtio.c:1146
 #7  virtio_error at hw/virtio/virtio.c:2455

 virtqueue_get_head() failed here.

 #8  virtqueue_get_head at hw/virtio/virtio.c:543
 #9  virtqueue_drop_all at hw/virtio/virtio.c:984
 #10 virtio_net_drop_tx_queue_data at hw/net/virtio-net.c:240
 #11 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
 #12 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431

 vhost_dev_stop() was executed here. dev->vdev == NULL now.

 #13 vhost_net_stop_one at hw/net/vhost_net.c:290
 #14 vhost_net_stop at hw/net/vhost_net.c:368
 #15 virtio_net_vhost_status at hw/net/virtio-net.c:180
 #16 virtio_net_set_status (status=15) at hw/net/virtio-net.c:254
 #17 qmp_set_link ("hostnet0", up=false) at net/net.c:1430
 #18 chr_closed_bh at net/vhost-user.c:214
 #19 aio_bh_call at util/async.c:90
 #20 aio_bh_poll at util/async.c:118
 #21 aio_dispatch at util/aio-posix.c:429
 #22 aio_ctx_dispatch at util/async.c:261
 #23 g_main_context_dispatch
 #24 glib_pollfds_poll at util/main-loop.c:213
 #25 os_host_main_loop_wait at util/main-loop.c:261
 #26 main_loop_wait at util/main-loop.c:515
 #27 main_loop () at vl.c:1917
 #28 main at vl.c:4795

 The above backtrace was captured from a qemu crash on vhost disconnect
 while the virtio driver in the guest was in a failed state.

 We could just add a check for 'vdev' in 'vhost_dev_has_iommu()', but it
 would then assert further on while trying to free already freed
 ioeventfds. The real problem is that we're allowing nested calls to
 'vhost_net_stop'.

 This patch forbids nested calls to 'vhost_net_stop', avoiding any
 possible double frees and segmentation faults due to the use of already
 freed resources, by setting the 'vhost_started' flag to zero prior to
 the 'vhost_net_stop' call.

 Signed-off-by: Ilya Maximets 
 ---

 This issue was already addressed more than a year ago by the following
 patch: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html
 but it was dropped without review because vhost-user re-connection was not
 yet implemented. The re-connection implementation later fixed most of the
 nested calls, but a few of them still exist. For example, the above
 backtrace was captured after a 'virtqueue_get_head()' failure on vhost-user
 disconnection.

  hw/net/virtio-net.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
 index 38674b0..4d95a18 100644
 --- a/hw/net/virtio-net.c
 +++ b/hw/net/virtio-net.c
 @@ -177,8 +177,8 @@ static void virtio_net_vhost_status(VirtIONet *n, 
 uint8_t status)
  n->vhost_started = 0;
  }
  } else {
 -vhost_net_stop(vdev, n->nic->ncs, queues);
  n->vhost_started = 0;
 +vhost_net_stop(vdev, n->nic->ncs, queues);
  }
  }
>>>
>>> Well the wider context is
>>>
>>>
>>> n->vhost_started = 1;
>>> r = vhost_net_start(vdev, n->nic->ncs, queues);
>>> if (r < 0) {
>>> error_report("unable to start vhost net: %d: "
>>>  "falling back on userspace virtio", -r);
>>> n->vhost_started = 0;
>>> }
>>> } else {
>>> vhost_net_stop(vdev, n->nic->ncs, queues);
>>> n->vhost_started = 0;
>>>
>>> So we set it to 1 before start, we should clear after stop.
>>
>> OK. I agree that clearing it afterwards is a bit safer. But in this case we
>> need a separate flag or some other way to detect that we're already inside
>> 'vhost_net_stop()'.
>>
>> What do you think about that old patch:
>> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html  ?
>>
>> It 

Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop

2017-12-07 Thread Michael S. Tsirkin
On Thu, Dec 07, 2017 at 09:39:36AM +0300, Ilya Maximets wrote:
> On 06.12.2017 19:45, Michael S. Tsirkin wrote:
> > On Wed, Dec 06, 2017 at 04:06:18PM +0300, Ilya Maximets wrote:
> >> In case a virtio error occurs after vhost_dev_close(), qemu will crash
> >> in the nested cleanup while checking the IOMMU flag, because dev->vdev is
> >> already set to zero and the resources are already freed.
> >>
> >> Example:
> >>
> >> Program received signal SIGSEGV, Segmentation fault.
> >> vhost_virtqueue_stop at hw/virtio/vhost.c:1155
> >>
> >> #0  vhost_virtqueue_stop at hw/virtio/vhost.c:1155
> >> #1  vhost_dev_stop at hw/virtio/vhost.c:1594
> >> #2  vhost_net_stop_one at hw/net/vhost_net.c:289
> >> #3  vhost_net_stop at hw/net/vhost_net.c:368
> >>
> >> Nested call to vhost_net_stop(). First time was at #14.
> >>
> >> #4  virtio_net_vhost_status at hw/net/virtio-net.c:180
> >> #5  virtio_net_set_status (status=79) at hw/net/virtio-net.c:254
> >> #6  virtio_set_status at hw/virtio/virtio.c:1146
> >> #7  virtio_error at hw/virtio/virtio.c:2455
> >>
> >> virtqueue_get_head() failed here.
> >>
> >> #8  virtqueue_get_head at hw/virtio/virtio.c:543
> >> #9  virtqueue_drop_all at hw/virtio/virtio.c:984
> >> #10 virtio_net_drop_tx_queue_data at hw/net/virtio-net.c:240
> >> #11 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
> >> #12 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431
> >>
> >> vhost_dev_stop() was executed here. dev->vdev == NULL now.
> >>
> >> #13 vhost_net_stop_one at hw/net/vhost_net.c:290
> >> #14 vhost_net_stop at hw/net/vhost_net.c:368
> >> #15 virtio_net_vhost_status at hw/net/virtio-net.c:180
> >> #16 virtio_net_set_status (status=15) at hw/net/virtio-net.c:254
> >> #17 qmp_set_link ("hostnet0", up=false) at net/net.c:1430
> >> #18 chr_closed_bh at net/vhost-user.c:214
> >> #19 aio_bh_call at util/async.c:90
> >> #20 aio_bh_poll at util/async.c:118
> >> #21 aio_dispatch at util/aio-posix.c:429
> >> #22 aio_ctx_dispatch at util/async.c:261
> >> #23 g_main_context_dispatch
> >> #24 glib_pollfds_poll at util/main-loop.c:213
> >> #25 os_host_main_loop_wait at util/main-loop.c:261
> >> #26 main_loop_wait at util/main-loop.c:515
> >> #27 main_loop () at vl.c:1917
> >> #28 main at vl.c:4795
> >>
> >> The above backtrace was captured from a qemu crash on vhost disconnect
> >> while the virtio driver in the guest was in a failed state.
> >>
> >> We could just add a check for 'vdev' in 'vhost_dev_has_iommu()', but it
> >> would then assert further on while trying to free already freed
> >> ioeventfds. The real problem is that we're allowing nested calls to
> >> 'vhost_net_stop'.
> >>
> >> This patch forbids nested calls to 'vhost_net_stop', avoiding any
> >> possible double frees and segmentation faults due to the use of already
> >> freed resources, by setting the 'vhost_started' flag to zero prior to
> >> the 'vhost_net_stop' call.
> >>
> >> Signed-off-by: Ilya Maximets 
> >> ---
> >>
> >> This issue was already addressed more than a year ago by the following
> >> patch: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html
> >> but it was dropped without review because vhost-user re-connection was not
> >> yet implemented. The re-connection implementation later fixed most of the
> >> nested calls, but a few of them still exist. For example, the above
> >> backtrace was captured after a 'virtqueue_get_head()' failure on vhost-user
> >> disconnection.
> >>
> >>  hw/net/virtio-net.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >> index 38674b0..4d95a18 100644
> >> --- a/hw/net/virtio-net.c
> >> +++ b/hw/net/virtio-net.c
> >> @@ -177,8 +177,8 @@ static void virtio_net_vhost_status(VirtIONet *n, 
> >> uint8_t status)
> >>  n->vhost_started = 0;
> >>  }
> >>  } else {
> >> -vhost_net_stop(vdev, n->nic->ncs, queues);
> >>  n->vhost_started = 0;
> >> +vhost_net_stop(vdev, n->nic->ncs, queues);
> >>  }
> >>  }
> > 
> > Well the wider context is
> > 
> > 
> > n->vhost_started = 1;
> > r = vhost_net_start(vdev, n->nic->ncs, queues);
> > if (r < 0) {
> > error_report("unable to start vhost net: %d: "
> >  "falling back on userspace virtio", -r);
> > n->vhost_started = 0;
> > }
> > } else {
> > vhost_net_stop(vdev, n->nic->ncs, queues);
> > n->vhost_started = 0;
> > 
> > So we set it to 1 before start, we should clear after stop.
> 
> OK. I agree that clearing it afterwards is a bit safer. But in this case we
> need a separate flag or some other way to detect that we're already inside
> 'vhost_net_stop()'.
> 
> What do you think about that old patch:
> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html  ?
> 
> It implements the same thing but introduces additional 

Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop

2017-12-06 Thread Ilya Maximets
On 06.12.2017 19:45, Michael S. Tsirkin wrote:
> On Wed, Dec 06, 2017 at 04:06:18PM +0300, Ilya Maximets wrote:
>> In case a virtio error occurs after vhost_dev_close(), qemu will crash
>> in the nested cleanup while checking the IOMMU flag, because dev->vdev is
>> already set to zero and the resources are already freed.
>>
>> Example:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> vhost_virtqueue_stop at hw/virtio/vhost.c:1155
>>
>> #0  vhost_virtqueue_stop at hw/virtio/vhost.c:1155
>> #1  vhost_dev_stop at hw/virtio/vhost.c:1594
>> #2  vhost_net_stop_one at hw/net/vhost_net.c:289
>> #3  vhost_net_stop at hw/net/vhost_net.c:368
>>
>> Nested call to vhost_net_stop(). First time was at #14.
>>
>> #4  virtio_net_vhost_status at hw/net/virtio-net.c:180
>> #5  virtio_net_set_status (status=79) at hw/net/virtio-net.c:254
>> #6  virtio_set_status at hw/virtio/virtio.c:1146
>> #7  virtio_error at hw/virtio/virtio.c:2455
>>
>> virtqueue_get_head() failed here.
>>
>> #8  virtqueue_get_head at hw/virtio/virtio.c:543
>> #9  virtqueue_drop_all at hw/virtio/virtio.c:984
>> #10 virtio_net_drop_tx_queue_data at hw/net/virtio-net.c:240
>> #11 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
>> #12 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431
>>
>> vhost_dev_stop() was executed here. dev->vdev == NULL now.
>>
>> #13 vhost_net_stop_one at hw/net/vhost_net.c:290
>> #14 vhost_net_stop at hw/net/vhost_net.c:368
>> #15 virtio_net_vhost_status at hw/net/virtio-net.c:180
>> #16 virtio_net_set_status (status=15) at hw/net/virtio-net.c:254
>> #17 qmp_set_link ("hostnet0", up=false) at net/net.c:1430
>> #18 chr_closed_bh at net/vhost-user.c:214
>> #19 aio_bh_call at util/async.c:90
>> #20 aio_bh_poll at util/async.c:118
>> #21 aio_dispatch at util/aio-posix.c:429
>> #22 aio_ctx_dispatch at util/async.c:261
>> #23 g_main_context_dispatch
>> #24 glib_pollfds_poll at util/main-loop.c:213
>> #25 os_host_main_loop_wait at util/main-loop.c:261
>> #26 main_loop_wait at util/main-loop.c:515
>> #27 main_loop () at vl.c:1917
>> #28 main at vl.c:4795
>>
>> The above backtrace was captured from a qemu crash on vhost disconnect
>> while the virtio driver in the guest was in a failed state.
>>
>> We could just add a check for 'vdev' in 'vhost_dev_has_iommu()', but it
>> would then assert further on while trying to free already freed
>> ioeventfds. The real problem is that we're allowing nested calls to
>> 'vhost_net_stop'.
>>
>> This patch forbids nested calls to 'vhost_net_stop', avoiding any
>> possible double frees and segmentation faults due to the use of already
>> freed resources, by setting the 'vhost_started' flag to zero prior to
>> the 'vhost_net_stop' call.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>
>> This issue was already addressed more than a year ago by the following
>> patch: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html
>> but it was dropped without review because vhost-user re-connection was not
>> yet implemented. The re-connection implementation later fixed most of the
>> nested calls, but a few of them still exist. For example, the above
>> backtrace was captured after a 'virtqueue_get_head()' failure on vhost-user
>> disconnection.
>>
>>  hw/net/virtio-net.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 38674b0..4d95a18 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -177,8 +177,8 @@ static void virtio_net_vhost_status(VirtIONet *n, 
>> uint8_t status)
>>  n->vhost_started = 0;
>>  }
>>  } else {
>> -vhost_net_stop(vdev, n->nic->ncs, queues);
>>  n->vhost_started = 0;
>> +vhost_net_stop(vdev, n->nic->ncs, queues);
>>  }
>>  }
> 
> Well the wider context is
> 
> 
> n->vhost_started = 1;
> r = vhost_net_start(vdev, n->nic->ncs, queues);
> if (r < 0) {
> error_report("unable to start vhost net: %d: "
>  "falling back on userspace virtio", -r);
> n->vhost_started = 0;
> }
> } else {
> vhost_net_stop(vdev, n->nic->ncs, queues);
> n->vhost_started = 0;
> 
> So we set it to 1 before start, we should clear after stop.

OK. I agree that clearing it afterwards is a bit safer. But in this case we
need a separate flag or some other way to detect that we're already inside
'vhost_net_stop()'.

What do you think about that old patch:
https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html  ?

It implements the same thing but introduces an additional flag. It could even
still be applicable.
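
A minimal sketch of that guard-flag approach, assuming a hypothetical
'vhost_stopping' field in VirtIONet; the name and exact placement are
illustrative only and are not taken from the old patch:

    static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
    {
        VirtIODevice *vdev = VIRTIO_DEVICE(n);
        int queues = n->multiqueue ? n->max_queues : 1;

        /* ... start path and early-return check unchanged ... */

        if (n->vhost_started && !n->vhost_stopping) {
            n->vhost_stopping = 1;             /* forbid nested stop */
            vhost_net_stop(vdev, n->nic->ncs, queues);
            n->vhost_started = 0;              /* cleared only after stop */
            n->vhost_stopping = 0;
        }
    }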



Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop

2017-12-06 Thread Michael S. Tsirkin
On Wed, Dec 06, 2017 at 04:06:18PM +0300, Ilya Maximets wrote:
> In case a virtio error occurs after vhost_dev_close(), qemu will crash
> in the nested cleanup while checking the IOMMU flag, because dev->vdev is
> already set to zero and the resources are already freed.
> 
> Example:
> 
> Program received signal SIGSEGV, Segmentation fault.
> vhost_virtqueue_stop at hw/virtio/vhost.c:1155
> 
> #0  vhost_virtqueue_stop at hw/virtio/vhost.c:1155
> #1  vhost_dev_stop at hw/virtio/vhost.c:1594
> #2  vhost_net_stop_one at hw/net/vhost_net.c:289
> #3  vhost_net_stop at hw/net/vhost_net.c:368
> 
> Nested call to vhost_net_stop(). First time was at #14.
> 
> #4  virtio_net_vhost_status at hw/net/virtio-net.c:180
> #5  virtio_net_set_status (status=79) at hw/net/virtio-net.c:254
> #6  virtio_set_status at hw/virtio/virtio.c:1146
> #7  virtio_error at hw/virtio/virtio.c:2455
> 
> virtqueue_get_head() failed here.
> 
> #8  virtqueue_get_head at hw/virtio/virtio.c:543
> #9  virtqueue_drop_all at hw/virtio/virtio.c:984
> #10 virtio_net_drop_tx_queue_data at hw/net/virtio-net.c:240
> #11 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
> #12 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431
> 
> vhost_dev_stop() was executed here. dev->vdev == NULL now.
> 
> #13 vhost_net_stop_one at hw/net/vhost_net.c:290
> #14 vhost_net_stop at hw/net/vhost_net.c:368
> #15 virtio_net_vhost_status at hw/net/virtio-net.c:180
> #16 virtio_net_set_status (status=15) at hw/net/virtio-net.c:254
> #17 qmp_set_link ("hostnet0", up=false) at net/net.c:1430
> #18 chr_closed_bh at net/vhost-user.c:214
> #19 aio_bh_call at util/async.c:90
> #20 aio_bh_poll at util/async.c:118
> #21 aio_dispatch at util/aio-posix.c:429
> #22 aio_ctx_dispatch at util/async.c:261
> #23 g_main_context_dispatch
> #24 glib_pollfds_poll at util/main-loop.c:213
> #25 os_host_main_loop_wait at util/main-loop.c:261
> #26 main_loop_wait at util/main-loop.c:515
> #27 main_loop () at vl.c:1917
> #28 main at vl.c:4795
> 
> The above backtrace was captured from a qemu crash on vhost disconnect
> while the virtio driver in the guest was in a failed state.
> 
> We could just add a check for 'vdev' in 'vhost_dev_has_iommu()', but it
> would then assert further on while trying to free already freed
> ioeventfds. The real problem is that we're allowing nested calls to
> 'vhost_net_stop'.
> 
> This patch forbids nested calls to 'vhost_net_stop', avoiding any
> possible double frees and segmentation faults due to the use of already
> freed resources, by setting the 'vhost_started' flag to zero prior to
> the 'vhost_net_stop' call.
> 
> Signed-off-by: Ilya Maximets 
> ---
> 
> This issue was already addressed more than a year ago by the following
> patch: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html
> but it was dropped without review because vhost-user re-connection was not
> yet implemented. The re-connection implementation later fixed most of the
> nested calls, but a few of them still exist. For example, the above
> backtrace was captured after a 'virtqueue_get_head()' failure on vhost-user
> disconnection.
> 
>  hw/net/virtio-net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 38674b0..4d95a18 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -177,8 +177,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t 
> status)
>  n->vhost_started = 0;
>  }
>  } else {
> -vhost_net_stop(vdev, n->nic->ncs, queues);
>  n->vhost_started = 0;
> +vhost_net_stop(vdev, n->nic->ncs, queues);
>  }
>  }

Well the wider context is


n->vhost_started = 1;
r = vhost_net_start(vdev, n->nic->ncs, queues);
if (r < 0) {
error_report("unable to start vhost net: %d: "
 "falling back on userspace virtio", -r);
n->vhost_started = 0;
}
} else {
vhost_net_stop(vdev, n->nic->ncs, queues);
n->vhost_started = 0;

So we set it to 1 before start, we should clear after stop.


> -- 
> 2.7.4