On Fri, 3 Mar 2017 13:43:32 +0100 Halil Pasic <pa...@linux.vnet.ibm.com> wrote:
> On 03/03/2017 01:21 PM, Cornelia Huck wrote:
> > On Thu, 2 Mar 2017 19:59:42 +0100
> > Halil Pasic <pa...@linux.vnet.ibm.com> wrote:
> >
> >> The function virtio_notify_irqfd used to ignore the return code of
> >> event_notifier_set. Let's fail the device should this occur.
> >
> > I'm wondering if there are reasons for event_notifier_set() to fail
> > beyond "we've hit an internal race and should make an effort to fix
> > that one, or else we have completely messed up in qemu". Marking the
> > device broken tells the guest that there's something wrong with the
> > device, but I think we want qemu bug reports when there's something
> > broken with the irqfd.
> >
>
> That's why the error is logged. I understand virtio_error like something
> suitable for indicating bugs.
>
> What do you suggest? Forcing a dump? I would rather leave it to the
> user to figure out how important is the state sitting in the machine
> and the device, and how much effort does (s)he want to put into recovering
> from the failure.

How likely are those logged messages to be brought to the attention of
the admin? Does any management software flag machines with such error
messages? (That's more of a general question.)

I'd like to have some kind of trigger that rings an alarm bell so that
the admin might consider reporting this, but I don't have a good idea
of how to do that either...