Excerpts from Cédric Le Goater's message of February 16, 2022 8:52 pm:
> On 2/16/22 11:25, Nicholas Piggin wrote:
>> This implements the Nested KVM HV hcall API for spapr under TCG.
>>
>> The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
>> L1 is switched back in (returning from the hcall) when an HV exception
>> is sent to the vhyp. Register state is copied in and out according to
>> the nested KVM HV hcall API specification.
>>
>> The hdecr timer is started when the L2 is switched in, and it provides
>> the HDEC / 0x980 return to L1.
>>
>> The MMU re-uses the bare-metal radix two-level page table walker by
>> using the get_pate method to point the MMU to the nested partition
>> table entry. MMU faults due to partition-scope errors raise HV
>> exceptions and are accordingly routed back to the L1.
>>
>> The MMU does not tag translations for the L1 (direct) vs L2 (nested)
>> guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
>> and exit).
>>
>> Reviewed-by: Fabiano Rosas <faro...@linux.ibm.com>
>> Signed-off-by: Nicholas Piggin <npig...@gmail.com>
>
> Reviewed-by: Cédric Le Goater <c...@kaod.org>
>
> Some last comments below,
[...]

>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index edbf3eeed0..852fe61b36 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -199,6 +199,9 @@ struct SpaprMachineState {
>>      bool has_graphics;
>>      uint32_t vsmt; /* Virtual SMT mode (KVM's "core stride") */
>>
>> +    /* Nested HV support (TCG only) */
>> +    uint64_t nested_ptcr;
>> +
>
> this is new state to migrate.
>

[...]

>> +/* Linux 64-bit powerpc pt_regs struct, used by nested HV */
>> +struct kvmppc_pt_regs {
>> +    uint64_t gpr[32];
>> +    uint64_t nip;
>> +    uint64_t msr;
>> +    uint64_t orig_gpr3;    /* Used for restarting system calls */
>> +    uint64_t ctr;
>> +    uint64_t link;
>> +    uint64_t xer;
>> +    uint64_t ccr;
>> +    uint64_t softe;        /* Soft enabled/disabled */
>> +    uint64_t trap;         /* Reason for being here */
>> +    uint64_t dar;          /* Fault registers */
>> +    uint64_t dsisr;        /* on 4xx/Book-E used for ESR */
>> +    uint64_t result;       /* Result of a system call */
>> +};
>
> I think we need to start moving all the spapr hcall definitions under
> spapr_hcall.h. It can come later.

Sure.

[...]

>> diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
>> index dab3dfc76c..b560514560 100644
>> --- a/include/hw/ppc/spapr_cpu_core.h
>> +++ b/include/hw/ppc/spapr_cpu_core.h
>> @@ -48,6 +48,11 @@ typedef struct SpaprCpuState {
>>      bool prod; /* not migrated, only used to improve dispatch latencies */
>>      struct ICPState *icp;
>>      struct XiveTCTX *tctx;
>> +
>> +    /* Fields for nested-HV support */
>> +    bool in_nested; /* true while the L2 is executing */
>> +    CPUPPCState *nested_host_state; /* holds the L1 state while L2 executes */
>> +    int64_t nested_tb_offset; /* L1->L2 TB offset */
>
> This needs a new vmstate.

How about instead of the vmstate (we would need all the L1 state in
nested_host_state as well), we just add a migration blocker in the L2
entry path. We could limit the max hdecr to say 1 second to ensure it
unblocks before long.
I know migration blockers are not preferred but in this case it gives us
some iterations to debug and optimise first, which might change the data
to migrate.

Thanks,
Nick