On Tue, Nov 11, 2014 at 12:45:05PM +0530, Aravinda Prasad wrote:
>
>
> On Tuesday 11 November 2014 08:54 AM, David Gibson wrote:
> > On Wed, Nov 05, 2014 at 12:42:03PM +0530, Aravinda Prasad wrote:
> >> This series of patches add support for fwnmi in powerKVM guests.
> >>
> >> Currently upon machine check exception, if the address in
> >> error belongs to guest then KVM invokes guest's NMI interrupt
> >> vector 0x200.
> >>
> >> This patch series adds functionality where the guest's 0x200
> >> interrupt vector is patched such that QEMU gets control. QEMU
> >> then builds error log and reports the error to OS registered
> >> machine check handlers through RTAS space.
> >>
> >> Apart from this, the patch series also takes care of synchronization
> >> when multiple processors encounter machine check at or about the
> >> same time.
> >>
> >> The patch set was tested by simulating a machine check error in
> >> the guest.
> >>
> >> Changes in v3:
> >>     - Incorporated review comments
> >>     - Byte codes in patch 4/4 are now moved to
> >>       pc-bios/spapr-rtas/spapr-rtas.S as instructions.
> >>     - Defined the RTAS blob in-memory layout.
> >>     - FIX: save and restore cr register in the trampoline
> >>
> >> Changes in v2:
> >>     - Re-based to github.com/agraf/qemu.git branch: ppc-next
> >>     - Merged patches 4 and 5.
> >>     - Incorporated other review comments
> >
> > So, this may not still be possible depending on whether the KVM side
> > of this is already merged, but it occurs to me that there's a simpler
> > way.
>
> The KVM part is already merged. Commit ID: 74845bc

Ok, that makes life harder, though I guess without the qemu code
merged, no-one would be using it yet, so it's not impossible to change
still.

> > Rather than mucking about with having to update the hypervisor on the
> > RTAS location, then have qemu copy the code out of RTAS, patch it and
> > copy it back into the vector, you could instead do this:
>
> Though this is possible, I have a couple of comments below
>
> >
> >   1. Make KVM, instead of immediately delivering a 0x200 for a guest
> > machine check, cause a special exit to qemu.
> >
> >   2. Have the register-nmi RTAS call store the guest side MC handler
> > address in the spapr structure, but perform no actual guest code
> > patching.
> >
> >   3. Allocate the error log buffer independently from the RTAS blob,
> > so qemu always knows where it is.
>
> As per PAPR, the error log buffer should be part of RTAS blob and the
> guest kernel explicitly checks if error log is inside RTAS blob.
> This requires qemu to know the updated RTAS location by the OS which is
> handled in patch 2/4.

Ugh, ok.  That's a pretty stupid interface requirement, even by PAPR
standards, but I guess we're stuck with it.

> >   4. When qemu gets the MC exit condition, instead of going via a
> > patched 0x200 vector, just directly set the guest register state and
> > jump straight into the guest side MC handler.
>
> PAPR mentions:
>
> "R1–7.3.14–8: Once the OS has registered for NMI notification, the
> platform firmware must intercept all System Reset Interrupts on all of
> the OS’s processors."
>
> So do we need to go via 0x200?

I don't see why.  The hypervisor is already intercepting system resets
and machine checks because it's a hypervisor, and from the PAPR
guest's point of view, all it cares about is that you enter its
registered handler with the expected information available.  I don't
see that the guest cares whether you bounce via a vector in guest
space or directly enter the guest supplied handler using hypervisor
magic.
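
For illustration only, steps 1, 2 and 4 above could look roughly like
the C sketch below.  Every name in it is hypothetical (nothing here is
an actual QEMU or KVM identifier), and handing the handler the error
log address in r3 while saving the interrupted r3 is an assumption
about the FWNMI entry convention, not something settled in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest vcpu register state. */
struct guest_cpu {
    uint64_t nip;      /* next instruction to execute */
    uint64_t gpr[32];  /* general purpose registers */
};

/* Hypothetical per-machine state, i.e. what "the spapr structure" would hold. */
struct fwnmi_state {
    bool     registered;   /* has the guest made the register-nmi call? */
    uint64_t mc_handler;   /* guest address of its machine check handler */
    uint64_t errlog_addr;  /* guest address of the error log buffer */
    uint64_t saved_r3;     /* interrupted r3 (assumed to end up in the log) */
};

/* Step 2: register-nmi only records addresses; no guest code is patched. */
static void fwnmi_register(struct fwnmi_state *s,
                           uint64_t mc_handler, uint64_t errlog_addr)
{
    s->mc_handler = mc_handler;
    s->errlog_addr = errlog_addr;
    s->registered = true;
}

/*
 * Steps 1 and 4: called when the (hypothetical) machine check exit
 * arrives from KVM.  Returns false if the guest never registered, in
 * which case the caller would fall back to plain 0x200 delivery.
 */
static bool fwnmi_deliver_mc(struct fwnmi_state *s, struct guest_cpu *cpu)
{
    assert(s != NULL && cpu != NULL);

    if (!s->registered) {
        return false;
    }

    s->saved_r3 = cpu->gpr[3];      /* keep the interrupted r3 */
    cpu->gpr[3] = s->errlog_addr;   /* assumed: r3 points at the error log */
    cpu->nip = s->mc_handler;       /* enter the guest handler directly */
    return true;
}
```

The point of the sketch is only that nothing in the guest's 0x200
vector is touched: the exit handler rewrites register state and the
guest resumes in its own registered handler.
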
Patching the guest's vector actually seems a pretty awful hack that
would only be necessary to work around limitations in the
virtualization capabilities which I don't think we have as of POWER8.

Btw, isn't a "System Reset Interrupt" vector 0x100, not vector 0x200?

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson