On 05.11.14 12:24, Aravinda Prasad wrote: > > > On Wednesday 05 November 2014 04:37 PM, Alexander Graf wrote: >> >> >> On 05.11.14 11:37, Aravinda Prasad wrote: >>> >>> >>> On Wednesday 05 November 2014 02:02 PM, Alexander Graf wrote: >>>> >>>> >>>> On 05.11.14 08:13, Aravinda Prasad wrote: >>>>> This patch adds FWNMI support in qemu for powerKVM >>>>> guests by handling the ibm,nmi-register rtas call. >>>>> Whenever OS issues ibm,nmi-register RTAS call, the >>>>> machine check notification address is saved and the >>>>> machine check interrupt vector 0x200 is patched to >>>>> issue a private hcall. >>>>> >>>>> This patch also handles the cases when multi-processors >>>>> experience machine check at or about the same time. >>>>> As per PAPR, subsequent processors serialize waiting >>>>> for the first processor to issue the ibm,nmi-interlock call. >>>>> The second processor retries if the first processor which >>>>> received a machine check is still reading the error log >>>>> and is yet to issue ibm,nmi-interlock call. >>>>> >>>>> Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com> >>>>> --- >>>>> hw/ppc/spapr_hcall.c | 16 +++++++ >>>>> hw/ppc/spapr_rtas.c | 93 >>>>> +++++++++++++++++++++++++++++++++++++++ >>>>> include/hw/ppc/spapr.h | 17 +++++++ >>>>> pc-bios/spapr-rtas/spapr-rtas.S | 38 ++++++++++++++++ >>>>> 4 files changed, 163 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c >>>>> index 8f16160..eceb5e5 100644 >>>>> --- a/hw/ppc/spapr_hcall.c >>>>> +++ b/hw/ppc/spapr_hcall.c >>>>> @@ -97,6 +97,9 @@ struct rtas_mc_log { >>>>> struct rtas_error_log err_log; >>>>> }; >>>>> >>>>> +/* Whether machine check handling is in progress by any CPU */ >>>>> +bool mc_in_progress; >>>>> + >>>>> static void do_spr_sync(void *arg) >>>>> { >>>>> struct SPRSyncState *s = arg; >>>>> @@ -678,6 +681,19 @@ static target_ulong h_report_mc_err(PowerPCCPU *cpu, >>>>> sPAPREnvironment *spapr, >>>>> cpu_synchronize_state(CPU(ppc_env_get_cpu(env))); >>>>> >>>>> /* >>>>> + * Only one VCPU can process machine check NMI at a time. Hence >>>>> + * set the lock mc_in_progress. Once the VCPU finishes processing >>>>> + * NMI, it executes ibm,nmi-interlock and mc_in_progress is unset >>>>> + * in ibm,nmi-interlock handler. Meanwhile if other VCPUs encounter >>>>> + * NMI we return 0 asking the VCPU to retry h_report_mc_err >>>>> + */ >>>>> + if (mc_in_progress == 1) { >>>> >>>> Please don't depend on bools being numbers. Use true / false. For if()s, >>>> just don't use == at all - it makes it more readable. >>> >>> ok >>> >>>> >>>>> + return 0; >>>>> + } >>>>> + >>>>> + mc_in_progress = 1; >>>>> + >>>>> + /* >>>>> * We save the original r3 register in SPRG2 in 0x200 vector, >>>>> * which is patched during call to ibm.nmi-register. Original >>>>> * r3 is required to be included in error log >>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c >>>>> index 2ec2a8e..71c7662 100644 >>>>> --- a/hw/ppc/spapr_rtas.c >>>>> +++ b/hw/ppc/spapr_rtas.c >>>>> @@ -36,6 +36,9 @@ >>>>> >>>>> #include <libfdt.h> >>>>> >>>>> +#define BRANCH_INST_MASK 0xFC000000 >>>>> +extern bool mc_in_progress; >>>> >>>> Please put this into the spapr struct. >>> >>> ok >>> >>>> >>>>> + >>>>> static void rtas_display_character(PowerPCCPU *cpu, sPAPREnvironment >>>>> *spapr, >>>>> uint32_t token, uint32_t nargs, >>>>> target_ulong args, >>>>> @@ -290,6 +293,90 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu, >>>>> rtas_st(rets, 0, ret); >>>>> } >>>>> >>>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu, >>>>> + sPAPREnvironment *spapr, >>>>> + uint32_t token, uint32_t nargs, >>>>> + target_ulong args, >>>>> + uint32_t nret, target_ulong rets) >>>>> +{ >>>>> + int i; >>>>> + uint32_t ori_inst = 0x60630000; >>>>> + uint32_t branch_inst = 0x48000002; >>>>> + target_ulong guest_machine_check_addr; >>>>> + uint32_t trampoline[TRAMPOLINE_INSTS]; >>>>> + int total_inst = sizeof(trampoline) / sizeof(uint32_t); >>>> >>>> ARRAY_SIZE(trampoline), though I don't quite understand why you need a >>>> variable that contains the same value as a constant (TRAMPOLINE_INSTS). >>>> >>>> But since you're moving all of those bits into variable fields on the >>>> rtas blob itself as we discussed in the last version, I guess this code >>>> will go away anyways ;). >>> >>> I think we still need this. We need to patch the KVMPPC_H_REPORT_MC_ERR >>> number and branch address in the trampoline and also, depending on >>> whether the guest running in LE/BE we may need to flip the bits in the >>> trampoline before copying it to 0x200 machine check vector. >>> >>> As rtas-blob is part of the guest memory I do not want to patch these in >>> rtas-blob, hence I copy the trampoline from the rtas-blob to an array, >>> modify accordingly and then move it to 0x200 machine check vector. >> >> Yes, you will still need the array. But the array should be dynamically >> sized based on spapr->rtas_info->fwnmi_size which you extract from the
spapr->rtas_info.fwnmi_size of course ;). No need for yet another allocation to keep track of. >> blob on load. >> >> That way you wouldn't need the "total_inst" variable anymore ;). > > Yes, I will fix it that way. > >> >>> >>>> >>>>> + PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu); >>>>> + >>>>> + /* Store the system reset and machine check address */ >>>>> + guest_machine_check_addr = rtas_ld(args, 1); >>>> >>>> Load or Store? I don't find the comment particularly useful either ;). >>> >>> will reword it or may delete it. >>> >>>> >>>>> + >>>>> + /* >>>>> + * Read the trampoline instructions from RTAS Blob and patch >>>>> + * the KVMPPC_H_REPORT_MC_ERR hcall number and the guest >>>>> + * machine check address before copying to 0x200 vector >>>>> + */ >>>>> + cpu_physical_memory_read(spapr->rtas_addr + RTAS_TRAMPOLINE_OFFSET, >>>>> + trampoline, sizeof(trampoline)); >>>>> + >>>>> + /* Safety Check */ >>>> >>>> Same for this comment. >>> >>> we have only 0x100 bytes that can be copied at 0x200. If the trampoline >>> size exceeds then the next interrupt vector code is overwritten. Hence a >>> safety check during compile time to make sure trampoline is within 0x100 >>> bytes. >> >> I think the check is fine, the comment is just redundant with >> QEMU_BUILD_BUG_ON. Either be more verbose in the comment or remove it > > I will add above lines as comment. Awesome. You can also move the check to rtas load time, since the size will be defined by the blob with your next version ;). Alex