One update for the testing scenario:
There is no need to kill OVS. The issue is reproducible with a simple
'del-port' followed by 'add-port'. The virtio driver in the guest can crash
on either operation. In my case it most often crashes on 'add-port' after
the deletion.

Hi Maxime,
I had already seen the patches below and the original Linux kernel virtio
issue. I just had not had enough time to test them.
I have now tested the patches below, and they fix the virtio driver crash.
Thanks for the suggestion.

Michael,
I tested "[PATCH] virtio_error: don't invoke status callbacks"
and it fixes the QEMU crash in the case of a broken guest index.
Thanks.

Best regards, Ilya Maximets.

P.S. Previously I mentioned that I could not reproduce the virtio driver
crash with "[PATCH] virtio_error: don't invoke status callbacks"
applied. I was wrong: I can reproduce it now. My system was misconfigured.
Sorry.

On 14.12.2017 12:01, Maxime Coquelin wrote:
> Hi Ilya,
>
> On 12/14/2017 08:06 AM, Ilya Maximets wrote:
>> On 13.12.2017 22:48, Michael S. Tsirkin wrote:
>>> On Wed, Dec 13, 2017 at 04:45:20PM +0300, Ilya Maximets wrote:
>>>>>> That
>>>>>> looks very strange. Some of the functions get 'old_status', others
>>>>>> the 'new_status'. I'm a bit confused.
>>>>>
>>>>> OK, fair enough. Fixed - let's pass old status everywhere,
>>>>> users that need the new one can get it from the vdev.
>>>>>
>>>>>> And it's not functional in current state:
>>>>>>
>>>>>> hw/net/virtio-net.c:264:28: error: ‘status’ undeclared
>>>>>
>>>>> Fixed too. new version below.
>>>>
>>>> This doesn't fix the segmentation fault.
>>>
>>> Hmm you are right. Looking into it.
>>>
>>>> I have exactly the same crash stack trace:
>>>>
>>>> #0 vhost_memory_unmap hw/virtio/vhost.c:446
>>>> #1 vhost_virtqueue_stop hw/virtio/vhost.c:1155
>>>> #2 vhost_dev_stop hw/virtio/vhost.c:1594
>>>> #3 vhost_net_stop_one hw/net/vhost_net.c:289
>>>> #4 vhost_net_stop hw/net/vhost_net.c:368
>>>> #5 virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100) at hw/net/virtio-net.c:180
>>>> #6 virtio_net_set_status (vdev=0x5625f3901100, old_status=<optimized out>) at hw/net/virtio-net.c:254
>>>> #7 virtio_set_status (vdev=vdev@entry=0x5625f3901100, val=<optimized out>) at hw/virtio/virtio.c:1152
>>>> #8 virtio_error (vdev=0x5625f3901100, fmt=fmt@entry=0x5625f014f688 "Guest says index %u is available") at hw/virtio/virtio.c:2460
>>>
>>> BTW what is causing this? Why is guest avail index corrupted?
>>
>> My testing environment for the issue:
>>
>> * QEMU 2.10.1
>
> Could you try to backport below patch and try again killing OVS?
>
> commit 2ae39a113af311cb56a0c35b7f212dafcef15303
> Author: Maxime Coquelin <maxime.coque...@redhat.com>
> Date: Thu Nov 16 19:48:35 2017 +0100
>
> vhost: restore avail index from vring used index on disconnection
>
> vhost_virtqueue_stop() gets avail index value from the backend,
> except if the backend is not responding.
>
> It happens when the backend crashes, and in this case, internal
> state of the virtio queue is inconsistent, making packets
> to corrupt the vring state.
>
> With a Linux guest, it results in following error message on
> backend reconnection:
>
> [ 22.444905] virtio_net virtio0: output.0:id 0 is not a head!
> [ 22.446746] net enp0s3: Unexpected TXQ (0) queue failure: -5
> [ 22.476360] net enp0s3: Unexpected TXQ (0) queue failure: -5
>
> Fixes: 283e2c2adcb8 ("net: virtio-net discards TX data after link down")
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Maxime Coquelin <maxime.coque...@redhat.com>
> Reviewed-by: Michael S. Tsirkin <m...@redhat.com>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
>
> commit 2d4ba6cc741df15df6fbb4feaa706a02e103083a
> Author: Maxime Coquelin <maxime.coque...@redhat.com>
> Date: Thu Nov 16 19:48:34 2017 +0100
>
> virtio: Add queue interface to restore avail index from vring used index
>
> In case of backend crash, it is not possible to restore internal
> avail index from the backend value as vhost_get_vring_base
> callback fails.
>
> This patch provides a new interface to restore internal avail index
> from the vring used index, as done by some vhost-user backend on
> reconnection.
>
> Signed-off-by: Maxime Coquelin <maxime.coque...@redhat.com>
> Reviewed-by: Michael S. Tsirkin <m...@redhat.com>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
>
>
> Cheers,
> Maxime
>
>
>