On Thu, 27 Jul 2017 12:50:42 +0100
"Daniel P. Berrange" <berra...@redhat.com> wrote:

> On Thu, Jul 27, 2017 at 08:53:48PM +1000, David Gibson wrote:
> > On Thu, Jul 27, 2017 at 10:11:48AM +0100, Peter Maydell wrote:  
> > > On 27 July 2017 at 02:30, Michael Roth <mdr...@linux.vnet.ibm.com> wrote: 
> > >  
> > > > In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> > > > quiesced by vfio-pci's finalize() routine until up to 6s after the
> > > > DEVICE_DELETED was emitted, leading to detach-device on the libvirt
> > > > side pretty much always crashing the host.
> > > 
> > > My initial naive thought is that if the host kernel can crash then
> > > this is a host kernel bug... shouldn't the host kernel refuse
> > > the subsequent libvirt rebind if it would cause a crash ?  
> > 
> > I think so too, but I haven't been able to convince Alex.  Nor
> > find time to fix it in the kernel myself.  
> 
> I think we need to fix both the QEMU premature sending of DEVICE_DELETED
> and the kernel bug that allowed the crash.


Where do we stand on this for v2.10?  I'd like to see it get in.  There
may be things to fix in the kernel, some of them may already be fixed
in the latest development kernel, but ultimately the kernel considers
driver binding to be a trusted operation and if userspace doesn't
understand all the dependencies, they shouldn't be doing it.  In this
case libvirt is using the DEVICE_DELETED signal with the assumption
that the device has been fully released by QEMU, which is of course not
accurate (libvirt could test this, but chooses not to).  libvirt
therefore begins trying to unbind a device that is still in use.  We
try to handle that, but see the official kernel stance that userspace
is responsible for understanding device dependencies, so we can only
do so much.
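For illustration, here is a rough sketch of the kind of test libvirt could do before rebinding (Python; `iommu_group_of`, `vfio_group_holders` and `safe_to_rebind` are hypothetical helpers, not libvirt or QEMU code). It assumes the legacy VFIO group interface, where QEMU keeps `/dev/vfio/<group>` open while it is using any device in that IOMMU group, and treats "no process holds the group file open" as a conservative heuristic that the device has been released. Scanning other processes' `/proc/<pid>/fd` requires root.

```python
import os

def iommu_group_of(bdf, sysfs="/sys"):
    """Return the IOMMU group number of a PCI device as a string, e.g. "7".

    /sys/bus/pci/devices/<bdf>/iommu_group is a symlink whose target
    ends in .../kernel/iommu_groups/<N>.
    """
    link = os.path.join(sysfs, "bus", "pci", "devices", bdf, "iommu_group")
    return os.path.basename(os.readlink(link))

def vfio_group_holders(group, proc="/proc"):
    """Return sorted PIDs that still hold /dev/vfio/<group> open."""
    target = "/dev/vfio/" + group
    holders = []
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue  # skip /proc/self, /proc/sys, ...
        fd_dir = os.path.join(proc, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    holders.append(int(pid))
                    break
            except OSError:
                continue  # fd closed while we were scanning
    return sorted(holders)

def safe_to_rebind(bdf, sysfs="/sys", proc="/proc"):
    """Heuristic: only rebind once nobody holds the device's VFIO group."""
    return not vfio_group_holders(iommu_group_of(bdf, sysfs), proc)
```

A caller would poll something like `safe_to_rebind("0000:01:00.0")` (hypothetical address) after DEVICE_DELETED instead of rebinding immediately.  Checking the whole group rather than the single device is deliberate: the group file stays open as long as userspace uses any device in the group.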

IMO, the next step along those lines would be that libvirt needs to
understand that even once a device is fully released from QEMU, it's
not necessarily safe to re-bind the device to a host driver.  If the
device is a member of a group where other devices are still in use by
userspace, this will violate user/host device isolation and the kernel
will crash to protect itself.  At best I may be able to change this to
killing the userspace process making use of the conflicting device,
but the kernel's view is that userspace (libvirt) has mandated that
the device be bound to the host driver and we must make it so; the
user is responsible for the consequences.  Thanks,

Alex
