On 7/15/22 12:57, Igor Mammedov wrote:
> On Thu, 14 Jul 2022 19:28:19 +0100
> Joao Martins <joao.m.mart...@oracle.com> wrote:
> 
>> It is assumed that the whole GPA space is available to be DMA
>> addressable, within a given address space limit, except for a
>> tiny region before the 4G. Since Linux v5.4, VFIO validates
>> whether the selected GPA is indeed valid i.e. not reserved by
>> IOMMU on behalf of some specific devices or platform-defined
>> restrictions, and thus failing the ioctl(VFIO_DMA_MAP) with
>>  -EINVAL.
>>
>> AMD systems with an IOMMU are examples of such platforms and
>> particularly may only have these ranges as allowed:
>>
>>      0000000000000000 - 00000000fedfffff (0      .. 3.982G)
>>      00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
>>      0000010000000000 - ffffffffffffffff (1Tb    .. 16Pb[*])
>>
>> We already account for the 4G hole, albeit if the guest is big
>> enough we will fail to allocate a guest with  >1010G due to the
>> ~12G hole at the 1Tb boundary, reserved for HyperTransport (HT).
>>
>> [*] there is another reserved region unrelated to HT that exists
>> in the 256T boundary in Fam 17h according to Errata #1286,
>> documeted also in "Open-Source Register Reference for AMD Family
>> 17h Processors (PUB)"
>>
>> When creating the region above 4G, take into account that on AMD
>> platforms the HyperTransport range is reserved and hence it
>> cannot be used either as GPAs. On those cases rather than
>> establishing the start of ram-above-4g to be 4G, relocate instead
>> to 1Tb. See AMD IOMMU spec, section 2.1.2 "IOMMU Logical
>> Topology", for more information on the underlying restriction of
>> IOVAs.
>>
>> After accounting for the 1Tb hole on AMD hosts, mtree should
>> look like:
>>
>> 0000000000000000-000000007fffffff (prio 0, i/o):
>>       alias ram-below-4g @pc.ram 0000000000000000-000000007fffffff
>> 0000010000000000-000001ff7fffffff (prio 0, i/o):
>>      alias ram-above-4g @pc.ram 0000000080000000-000000ffffffffff
>>
>> If the relocation is done or the address space covers it, we
>> also add the the reserved HT e820 range as reserved.
>>
>> Default phys-bits on Qemu is TCG_PHYS_ADDR_BITS (40) which is enough
>> to address 1Tb (0xff ffff ffff). On AMD platforms, if a
>> ram-above-4g relocation may be desired and the CPU wasn't configured
>> with a big enough phys-bits, print an error message to the user
>> and do not make the relocation of the above-4g-region if phys-bits
>> is too low.
>>
>> Suggested-by: Igor Mammedov <imamm...@redhat.com>
>> Signed-off-by: Joao Martins <joao.m.mart...@oracle.com>
>> ---
>>  hw/i386/pc.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 82 insertions(+)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index cda435e3baeb..17613974163e 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -880,6 +880,52 @@ static uint64_t pc_get_cxl_range_end(PCMachineState 
>> *pcms)
>>      return start;
>>  }
>>  
>> +static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t 
>> pci_hole64_size)
>> +{
>> +    X86CPU *cpu = X86_CPU(first_cpu);
>> +
>> +    /* 32-bit systems don't have hole64 thus return max CPU address */
>> +    if (cpu->phys_bits <= 32) {
>> +        return ((hwaddr)1 << cpu->phys_bits) - 1;
>> +    }
>> +
>> +    return pc_pci_hole64_start() + pci_hole64_size - 1;
>> +}
>> +
> [...]
> 
>> +
>> +    /*
>> +     * Relocating ram-above-4G requires more than TCG_PHYS_ADDR_BITS (40).
>> +     * So make sure phys-bits is required to be appropriately sized in order
>> +     * to proceed with the above-4g-region relocation and thus boot.
> 
> drop mention of relocation here as it's orthogonal to the check.
> Important thing we are checking here is that max used GPA is
> reachable by configured vCPU (physbits).
> 

OK

>> +     */
>> +    maxusedaddr = pc_max_used_gpa(pcms, pci_hole64_size);
>> +    maxphysaddr = ((hwaddr)1 << cpu->phys_bits) - 1;
>> +    if (maxphysaddr < maxusedaddr) {
>> +        error_report("Address space limit 0x%"PRIx64" < 0x%"PRIx64
>> +                     " phys-bits too low (%u)",
>> +                     maxphysaddr, maxusedaddr, cpu->phys_bits);
>> +        exit(EXIT_FAILURE);
>> +    }
> 
> these hunks should be a separate patch preceding relocation patch
> as it basically does max_gpa vs physbits check regardless
> of relocation (i.e. relocation is only one of the reasons
> max_used_gpa might exceed physbits).
> 
Yeap, makes sense given that this is now generic regardless of AMD 1Tb hole.

>> +
>>      /*
>>       * Split single memory region and use aliases to address portions of it,
>>       * done for backwards compatibility with older qemus.
> 

Reply via email to