On Thu, 27 Jul 2017 20:53:48 +1000
David Gibson <da...@gibson.dropbear.id.au> wrote:

> On Thu, Jul 27, 2017 at 10:11:48AM +0100, Peter Maydell wrote:
> > On 27 July 2017 at 02:30, Michael Roth <mdr...@linux.vnet.ibm.com> wrote:  
> > > In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> > > quiesced by vfio-pci's finalize() routine until up to 6s after the
> > > DEVICE_DELETED was emitted, leading to detach-device on the libvirt side
> > > pretty much always crashing the host.
> > 
> > My initial naive thought is that if the host kernel can crash then
> > this is a host kernel bug... shouldn't the host kernel refuse
> > the subsequent libvirt rebind if it would cause a crash ?  
> 
> I think so too, but I haven't been able to convince Alex.  Nor
> find time to fix it in the kernel myself.

It's not me you need to convince, it's GregKH[1].  The interpretation
there is that the user bind request is a mandate and we'll fall over
ourselves to try to do as they ask.  I think the best I might be able
to do is to kill the QEMU process to avoid compromising the kernel
rather than killing the kernel after the isolation compromise has
occurred.  Messing with driver binding is a privileged operation, and
the kernel believes you get to keep all the pieces when it fails.
Sorry.  Thanks,

Alex

[1] https://lkml.org/lkml/2017/7/10/728
