On Tue, May 24, 2016 at 08:54:06PM -0600, Alex Williamson wrote:
> On Tue, 24 May 2016 13:49:12 +0300
> "Michael S. Tsirkin" <m...@redhat.com> wrote:
>
> > On Tue, Apr 26, 2016 at 08:48:15AM -0600, Alex Williamson wrote:
> > > I think that means that if we want to switch from a
> > > simple halt-on-error to a mechanism for the guest to handle recovery,
> > > we need to disable access to the device between being notified that the
> > > error occurred and being notified to resume.
> >
> > But this isn't what happens on bare metal.
> > Errors are reported asynchronously and host might access the device
> > meanwhile. These accesses might or might not trigger more errors, but
> > fundamentally this should not matter too much as device is going to be
> > reset.
>
> Bare metal also doesn't have a hypervisor underneath performing a PCI
> bus reset,

This is where I get lost. I assumed we do reset when guest requests it.
Isn't that the case? Why not?

> there's only one OS trying to control the device at a time,
> so we have some clear differences from bare metal that I don't know we
> can avoid. The thought here was that we need to notify the guest at the
> earliest point we can, but let the host recovery run to completion
> before allowing the user to interact with the device. Perhaps there is
> no need to block region access to the device (i.e. config space & BAR
> resources), but I think we do need to somehow synchronize the bus resets
> or else we get situations like that observed previously where the bus is
> still in reset while userspace tries to proceed with using it.
>

Why do we have to trigger reset upon an error? Why not wait for guest
to request reset?

> The next question then would be whether that's QEMU's job or something
> that should be done in the host kernel. It's been proposed to add yet
> another eventfd for the kernel vfio-pci to signal QEMU when a resume
> notification has occurred, but perhaps the better approach would be for
> the hot reset ioctl (and base reset ioctl) to handle this situation more
> transparently. We could immediately return -EAGAIN and allow QEMU to
> delay itself for any reset ioctl received after the AER error detected
> event, but before the resume event. We could also allow some sort of
> timeout, that the ioctl might enter an interruptible sleep, woken on
> the resume notification or timeout. That sounds a bit better to me as
> the specification of what's allowed between the error detected
> notification and the resume notification is otherwise pretty poorly
> defined.

So if guest started reset, it might take a while for device to come out
of that state, and access during this time might trigger errors. But
that's already possible for guest to trigger, right? How is this
different?

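For illustration only, the -EAGAIN idea quoted above could be sketched
roughly as below. This is a user-space model, not vfio-pci code: struct
aer_gate, aer_gate_error_detected(), aer_gate_resume(), aer_gate_reset()
and reset_with_retry() are all made-up names, and a real ioctl path would
need locking and the actual bus reset.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; not the real vfio-pci structure. */
struct aer_gate {
    bool recovery_in_progress; /* true between "error detected" and "resume" */
};

/* Host AER handler reported an error: host recovery starts. */
static void aer_gate_error_detected(struct aer_gate *g)
{
    g->recovery_in_progress = true;
}

/* Host recovery finished: the resume notification has fired. */
static void aer_gate_resume(struct aer_gate *g)
{
    g->recovery_in_progress = false;
}

/*
 * Stand-in for the reset ioctl. While host recovery is running it
 * refuses with -EAGAIN; otherwise it counts one (pretend) bus reset.
 */
static int aer_gate_reset(struct aer_gate *g, int *resets_done)
{
    if (g->recovery_in_progress)
        return -EAGAIN;
    (*resets_done)++;
    return 0;
}

/*
 * Caller side (what userspace might do): retry a bounded number of
 * times. wait_cb is an assumed hook standing in for "delay until the
 * next attempt"; it may be NULL.
 */
static int reset_with_retry(struct aer_gate *g, int *resets_done,
                            int max_tries,
                            void (*wait_cb)(struct aer_gate *))
{
    int ret = -EAGAIN;

    for (int i = 0; i < max_tries && ret == -EAGAIN; i++) {
        ret = aer_gate_reset(g, resets_done);
        if (ret == -EAGAIN && wait_cb != NULL)
            wait_cb(g);
    }
    return ret;
}
```

The timeout variant from the quoted text would replace the immediate
-EAGAIN with an interruptible sleep woken by the resume event; the
bounded retry loop here is only the userspace-side equivalent.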
> Do you think we can run completely asynchronous, letting the
> host and guest bus resets race? Thanks,
>
> Alex

I have a feeling we need to put some code out, disabled by default,
and see how it behaves in the field. For example ability to trigger UR
errors seems benign but I think we are trying to prevent them now
because of something we saw in the field.

-- 
MST