On 7/15/22 12:57, Igor Mammedov wrote: > On Thu, 14 Jul 2022 19:28:19 +0100 > Joao Martins <joao.m.mart...@oracle.com> wrote: > >> It is assumed that the whole GPA space is available to be DMA >> addressable, within a given address space limit, except for a >> tiny region before the 4G. Since Linux v5.4, VFIO validates >> whether the selected GPA is indeed valid i.e. not reserved by >> IOMMU on behalf of some specific devices or platform-defined >> restrictions, and thus failing the ioctl(VFIO_DMA_MAP) with >> -EINVAL. >> >> AMD systems with an IOMMU are examples of such platforms and >> particularly may only have these ranges as allowed: >> >> 0000000000000000 - 00000000fedfffff (0 .. 3.982G) >> 00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G) >> 0000010000000000 - ffffffffffffffff (1Tb .. 16Pb[*]) >> >> We already account for the 4G hole, albeit if the guest is big >> enough we will fail to allocate a guest with >1010G due to the >> ~12G hole at the 1Tb boundary, reserved for HyperTransport (HT). >> >> [*] there is another reserved region unrelated to HT that exists >> in the 256T boundary in Fam 17h according to Errata #1286, >> documeted also in "Open-Source Register Reference for AMD Family >> 17h Processors (PUB)" >> >> When creating the region above 4G, take into account that on AMD >> platforms the HyperTransport range is reserved and hence it >> cannot be used either as GPAs. On those cases rather than >> establishing the start of ram-above-4g to be 4G, relocate instead >> to 1Tb. See AMD IOMMU spec, section 2.1.2 "IOMMU Logical >> Topology", for more information on the underlying restriction of >> IOVAs. >> >> After accounting for the 1Tb hole on AMD hosts, mtree should >> look like: >> >> 0000000000000000-000000007fffffff (prio 0, i/o): >> alias ram-below-4g @pc.ram 0000000000000000-000000007fffffff >> 0000010000000000-000001ff7fffffff (prio 0, i/o): >> alias ram-above-4g @pc.ram 0000000080000000-000000ffffffffff >> >> If the relocation is done or the address space covers it, we >> also add the the reserved HT e820 range as reserved. >> >> Default phys-bits on Qemu is TCG_PHYS_ADDR_BITS (40) which is enough >> to address 1Tb (0xff ffff ffff). On AMD platforms, if a >> ram-above-4g relocation may be desired and the CPU wasn't configured >> with a big enough phys-bits, print an error message to the user >> and do not make the relocation of the above-4g-region if phys-bits >> is too low. >> >> Suggested-by: Igor Mammedov <imamm...@redhat.com> >> Signed-off-by: Joao Martins <joao.m.mart...@oracle.com> >> --- >> hw/i386/pc.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 82 insertions(+) >> >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c >> index cda435e3baeb..17613974163e 100644 >> --- a/hw/i386/pc.c >> +++ b/hw/i386/pc.c >> @@ -880,6 +880,52 @@ static uint64_t pc_get_cxl_range_end(PCMachineState >> *pcms) >> return start; >> } >> >> +static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t >> pci_hole64_size) >> +{ >> + X86CPU *cpu = X86_CPU(first_cpu); >> + >> + /* 32-bit systems don't have hole64 thus return max CPU address */ >> + if (cpu->phys_bits <= 32) { >> + return ((hwaddr)1 << cpu->phys_bits) - 1; >> + } >> + >> + return pc_pci_hole64_start() + pci_hole64_size - 1; >> +} >> + > [...] > >> + >> + /* >> + * Relocating ram-above-4G requires more than TCG_PHYS_ADDR_BITS (40). >> + * So make sure phys-bits is required to be appropriately sized in order >> + * to proceed with the above-4g-region relocation and thus boot. > > drop mention of relocation here as it's orthogonal to the check. > Important thing we are checking here is that max used GPA is > reachable by configured vCPU (physbits). >
OK >> + */ >> + maxusedaddr = pc_max_used_gpa(pcms, pci_hole64_size); >> + maxphysaddr = ((hwaddr)1 << cpu->phys_bits) - 1; >> + if (maxphysaddr < maxusedaddr) { >> + error_report("Address space limit 0x%"PRIx64" < 0x%"PRIx64 >> + " phys-bits too low (%u)", >> + maxphysaddr, maxusedaddr, cpu->phys_bits); >> + exit(EXIT_FAILURE); >> + } > > these hunks should be a separate patch preceding relocation patch > as it basically does max_gpa vs physbits check regardless > of relocation (i.e. relocation is only one of the reasons > max_used_gpa might exceed physbits). > Yeap, makes sense given that this is now generic regardless of AMD 1Tb hole. >> + >> /* >> * Split single memory region and use aliases to address portions of it, >> * done for backwards compatibility with older qemus. >