On 06/17/20 18:14, Laszlo Ersek wrote:
> On 06/17/20 15:46, Dr. David Alan Gilbert wrote:
>> * Laszlo Ersek (ler...@redhat.com) wrote:
>>> On 06/16/20 19:14, Guilherme Piccoli wrote:
>>>> Thanks Gerd, Dave and Eduardo for the prompt responses!
>>>>
>>>> So, I understand that when we use "-host-phys-bits", we are passing
>>>> the *real* number to the guest, correct? So, in this case we can
>>>> trust that the guest physbits matches the true host physbits.
>>>>
>>>> What if we then have OVMF rely on the physbits *iff*
>>>> "-host-phys-bits" is used (which is the default in RH and a possible
>>>> machine configuration in the libvirt XML on Ubuntu), and have OVMF
>>>> fall back to 36 bits otherwise?
>>>
>>> I've now read the commit message on QEMU commit 258fe08bd341d, and the
>>> complexity is simply stunning.
>>>
>>> Right now, OVMF calculates the guest physical address space size from
>>> various range sizes (such as the hotplug memory area end, and the
>>> default or user-configured PCI64 MMIO aperture), and derives the
>>> minimum suitable guest-phys address width from that address space
>>> size. This width is then exposed to the rest of the firmware with the
>>> CPU HOB (hand-off block), which in turn controls how the GCD (global
>>> coherency domain) memory space map is sized. Etc.
>>>
>>> If QEMU can provide a *reliable* GPA width, over some info channel
>>> (CPUID or even fw_cfg), then the above calculation could be reversed
>>> in OVMF. We could take the width as a given (-> produce the CPU HOB
>>> directly), plus calculate the *remaining* address space between the
>>> GPA space size given by the width, and the end of the memory hotplug
>>> area. If the "remaining size" were negative, then obviously QEMU
>>> would have been misconfigured, so we'd halt the boot. Otherwise, the
>>> remaining area could be used as the PCI64 MMIO aperture (PEI memory
>>> footprint of DXE page tables be darned).
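[Editor's note: the forward calculation (top address -> minimum width) and the proposed reversed calculation (trusted width -> remaining PCI64 aperture) can be sketched as below. This is an illustrative model, not actual OVMF code; the function names and the 36-bit floor are assumptions made for the example.]

```c
#include <stdint.h>

/* Forward calculation: derive the minimum guest-phys address width that
 * covers a given exclusive top guest-physical address. The 36-bit floor
 * mirrors the 36-bit fallback mentioned in the thread (assumption). */
static unsigned min_phys_width(uint64_t top_address)
{
    unsigned width = 36;

    while (width < 64 && ((top_address - 1) >> width) != 0)
        width++;
    return width;
}

/* Reversed calculation: given a trusted width from QEMU and the end of
 * the memory hotplug area, the rest of the GPA space could serve as the
 * PCI64 MMIO aperture. A negative result means QEMU was misconfigured,
 * so the firmware would halt the boot. */
static int64_t remaining_aperture(unsigned width, uint64_t hotplug_end)
{
    uint64_t gpa_space = 1ULL << width;  /* assumes width < 64 */

    return (int64_t)(gpa_space - hotplug_end);
}
```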
>>>
>>>> Now, regarding the problem of whether "to trust or not" the guest's
>>>> physbits, I think it's an orthogonal discussion to some extent. It'd
>>>> be nice to have that check, and, as Eduardo said, to prevent
>>>> migration in such cases. But it's not really a blocker for OVMF's
>>>> big PCI64 aperture if we only increase the aperture _when
>>>> "-host-phys-bits" is used_.
>>>
>>> I don't know what exactly those flags do, but I doubt they are clearly
>>> visible to OVMF in any particular way.
>>
>> The firmware should trust whatever it reads from CPUID and thus gets
>> told by QEMU; if QEMU is doing the wrong thing there, then that's our
>> problem and we need to fix it in QEMU.
>
> This sounds good in principle, but -- as Gerd too has stated, to my
> understanding -- it has the potential to break existing usage.
>
> Consider assigning a single device with a 32G BAR -- right now that's
> supposed to work, without the X-PciMmio64Mb OVMF knob, on even the "most
> basic" hardware (36-bit host phys address width, and EPT supported). If
> OVMF suddenly starts trusting the CPUID from QEMU, and that results in a
> GPA width of 40 bits (i.e. new OVMF is run on old QEMU), then the big
> BAR (and other stuff too) could be allocated from GPA space that EPT is
> actually able to deal with. --> regression for the user.
s/able/unable/, sigh. :/

> Sometimes I can tell users "hey, given that you're building OVMF from
> source, or taking it from a 3rd party origin anyway, can you just run
> upstream QEMU too", but most of the time they just want everything to
> continue working on a 3-year-old Ubuntu LTS release or whatever. :/
>
> And again, this is *without* "X-PciMmio64Mb".
>
> Laszlo
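[Editor's note: the CPUID info channel Dave refers to is architecturally defined: leaf 0x80000008 reports the physical address width in EAX bits 7:0. A minimal sketch of reading it with GCC/Clang's cpuid.h follows; the helper names are mine, not from OVMF or QEMU.]

```c
#include <cpuid.h>

/* CPUID.80000008h:EAX bits 7:0 hold the physical address bit count. */
static unsigned phys_bits_from_eax(unsigned eax)
{
    return eax & 0xffu;
}

/* Returns the advertised physical address width, or -1 if the extended
 * leaf is unavailable (__get_cpuid returns 0 in that case). */
static int read_guest_phys_bits(void)
{
    unsigned eax, ebx, ecx, edx;

    if (!__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx))
        return -1;
    return (int)phys_bits_from_eax(eax);
}
```

On a guest started with "-cpu host,host-phys-bits=on", this returns the real host width; otherwise it returns whatever the CPU model advertises, which is exactly the trust problem discussed above.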