On Fri, Sep 04, 2015 at 12:00:25AM +0530, Aravinda Prasad wrote:
> 
> On Thursday 03 September 2015 11:52 AM, Sam Bobroff wrote:
> > On Thu, Sep 03, 2015 at 03:05:21PM +1000, David Gibson wrote:
> > 
> > [snip]
> > 
> >> Hm.. so why can't the hypervisor code do the retrying?
> > 
> > Aravinda replied to this earlier in the thread:
> > 
> > "Retrying cannot be done internally in h_report_mc_err hcall: only one
> > thread can succeed entering qemu upon parallel hcall and hence retrying
> > inside the hcall will not allow the ibm,nmi-interlock from first CPU to
> > succeed."
> > 
> > I assume that this means that the big QEMU lock is held while an hcall
> > is processed by QEMU, but I haven't checked the code myself.  Actually,
> > even if the lock is normally held, I don't see why these particular
> > hcalls couldn't release the lock.  I'll look into this.
> 
> I am not sure whether we can release this lock inside an hcall. I need
> to check.
I don't see any reason that won't work.  As long as you only touch most
qemu data structures while the lock is held, of course.

> 
> >
> >>>> Also, it looks like the vector will need at least one scratch register
> >>>> (for the hcall number, if nothing else).  Does PAPR specify what SPRGs
> >>>> the vector can clobber?  Obviously it can't be anything the guest
> >>>> kernel uses.
> >>>
> >>> PAPR only says SPRGs 0 to 3 are for software use, but the kernel (see
> >>> arch/powerpc/include/asm/reg.h) defines SPRG2 as an exception scratch
> >>> register so it should be the right one to use here.
> >>
> >> Uh.. no.  If 0..3 are for software (i.e. OS) use, then this needs to
> >> use a different one, since it's being used as a firmware resource
> >> here.  Linux might treat SPRG2 as scratch, but another OS would be
> >> within its rights to use it for something persistent.
> >>
> >> Although, as paulus points out, sc 1 will clobber SRR0/1 anyway, and
> >> if we use a special illegal instruction, then you no longer need a
> >> scratch register.
> >>
> >>>> Btw, does anyone know what happens with the VPA (and dispatch trace
> >>>> log and so forth) on kexec() - it could be subject to the same stale
> >>>> address problem, and rewriting vectors won't save us there.
> >>>
> >>> I asked Michael Ellerman this one and he thinks kexec probably frees
> >>> and re-allocates the VPA.
> >>
> >> Ok.  So the question is: if an explicit deregister is good enough for
> >> the VPA, is it also good enough for the FWNMI vector, in which case
> >> doing it with just a qemu exit and not bouncing through the guest space
> >> is back on the table.
> >>
> >> I guess that's still problematic because there are existing guests
> >> that assume a kexec() will magically wipe the fwnmi vectors away.
> > 
> > Yes, but I think we could handle this separately if necessary: even if
> > we don't need to write anything to the vector, we could still insert a
> > magic value and check for it later.
> > If it's been clobbered by a kexec, go back to the old method.
> 
> "> check for it later" - But is QEMU informed, or does it otherwise get
> to know, when kexec() is issued?

No, but I think Sam is suggesting just rechecking the value when you
catch an MC exception.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson