This just happened overnight:

Oct 19 05:49:59 host bash[4647]: qemu-system-x86_64: vfio_err_notifier_handler(0000:03:00.1) Unrecoverable error detected. Please collect any data possible and then kill the guest
Oct 19 05:50:00 host bash[4647]: qemu-system-x86_64: vfio_err_notifier_handler(0000:03:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest

which ended up stopping the guest.  Some quick googling yields a few
threads that look related:

https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg04868.html
https://lists.nongnu.org/archive/html/qemu-devel/2016-07/msg04103.html

However, there doesn't seem to be any actual solution to prevent the error
in the future.  It looks as if "someone's working on it", but it's not
ready yet.

I also noticed this in dmesg (0000:00:02.0 is the Root Port that device
03:00.0 sits behind):

[208697.190826] pcieport 0000:00:02.0: AER: Uncorrected (Non-Fatal) error received: id=0010
[208697.190832] pcieport 0000:00:02.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0010(Requester ID)
[208697.190834] pcieport 0000:00:02.0:   device [8086:6f04] error status/mask=00004000/00000000
[208697.190835] pcieport 0000:00:02.0:    [14] Completion Timeout     (First)
[208697.190837] pcieport 0000:00:02.0: broadcast error_detected message
[208697.190840] pcieport 0000:00:02.0: broadcast mmio_enabled message
[208697.190841] pcieport 0000:00:02.0: broadcast resume message
[208697.190843] pcieport 0000:00:02.0: AER: Device recovery successful
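
For reference, the status word and id in the dmesg lines above can be
decoded by hand. A minimal sketch in Python, assuming the standard PCIe
AER Uncorrectable Error Status bit layout (only a subset of the bits is
listed, and the helper names are made up for illustration):

```python
# Subset of the PCIe AER Uncorrectable Error Status bits (bit number -> name).
# Assumption: standard AER layout; bits not listed here are printed as "bit N".
UNCOR_BITS = {
    4: "Data Link Protocol Error",
    12: "Poisoned TLP",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    18: "Malformed TLP",
    20: "Unsupported Request",
}


def decode_status(status):
    """Return the names of the bits set in an AER uncorrectable status word."""
    return [
        UNCOR_BITS.get(bit, f"bit {bit}")
        for bit in range(32)
        if status & (1 << bit)
    ]


def decode_requester_id(rid):
    """Split a 16-bit requester ID into bus:device.function notation."""
    bus = (rid >> 8) & 0xFF   # upper 8 bits
    dev = (rid >> 3) & 0x1F   # next 5 bits
    fn = rid & 0x7            # lowest 3 bits
    return f"{bus:02x}:{dev:02x}.{fn}"


# Values taken from the dmesg excerpt above.
print(decode_status(0x00004000))
print(decode_requester_id(0x0010))
```

With the values from the log, status 00004000 has only bit 14 set
(Completion Timeout), and id=0010 decodes to 00:02.0, i.e. the error was
reported against the Root Port itself rather than 03:00.0 or 03:00.1.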

Does anyone know the status of this hang/crash and what can be done about
it in the short term?

Thanks,
Chuck
