Re: ovmf / PCI passthrough impaired due to very limiting PCI64 aperture

Guilherme Piccoli Wed, 17 Jun 2020 08:59:16 -0700

Can't QEMU read the host physical bits and pass them through fw_cfg as
"real_host_physbits" or something like that?
OVMF could rely on that: if such a property is available, we use it to
extend the PCI64 aperture.
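
To make the arithmetic concrete, here is a rough sketch of the two
directions Laszlo describes below: deriving the width from the address
space size (what OVMF does today), and taking a trusted width as given and
using what is left above the hotplug area as the PCI64 aperture. This is
not OVMF code. The function names, the 36-bit floor and the 52-bit cap are
my assumptions for illustration, and the sketch ignores the alignment a
real aperture would need.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; these names are not OVMF's. */

/* Current direction: the smallest address width whose space covers
 * everything below address_space_end (an exclusive end address).
 * Assumes a 36-bit floor and caps the result at 52 bits. */
static unsigned min_phys_bits(uint64_t address_space_end)
{
    unsigned bits = 36;

    while (bits < 52 && (1ULL << bits) < address_space_end) {
        bits++;
    }
    return bits;
}

/* Reversed direction: take phys_bits as given and hand whatever lies
 * between the end of the memory hotplug area and the top of the GPA space
 * to the PCI64 MMIO aperture.
 * Returns 0 on success, -1 if the inputs are inconsistent (the "negative
 * remaining size" case, where the firmware would halt the boot). */
static int remaining_pci64_aperture(unsigned phys_bits,
                                    uint64_t hotplug_end,
                                    uint64_t *aperture_base,
                                    uint64_t *aperture_size)
{
    uint64_t gpa_space_size;

    assert(aperture_base && aperture_size);

    /* x86-64 MAXPHYADDR cannot exceed 52 bits. */
    if (phys_bits == 0 || phys_bits > 52) {
        return -1;
    }

    gpa_space_size = 1ULL << phys_bits;
    if (hotplug_end > gpa_space_size) {
        return -1;
    }

    *aperture_base = hotplug_end;
    *aperture_size = gpa_space_size - hotplug_end;
    return 0;
}
```

For example, with a trusted width of 40 bits and a hotplug area ending at
64 GiB, the second helper would leave 1 TiB minus 64 GiB for the aperture,
starting at 64 GiB.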


On Wed, Jun 17, 2020 at 12:50 PM Eduardo Habkost <ehabk...@redhat.com> wrote:
>
> On Wed, Jun 17, 2020 at 02:46:52PM +0100, Dr. David Alan Gilbert wrote:
> > * Laszlo Ersek (ler...@redhat.com) wrote:
> > > On 06/16/20 19:14, Guilherme Piccoli wrote:
> > > > Thanks Gerd, Dave and Eduardo for the prompt responses!
> > > >
> > > > So, I understand that when we use "-host-phys-bits", we are
> > > > passing the *real* number to the guest, correct? So, in this case we
> > > > can trust that the guest physbits match the true host physbits.
> > > >
> > > > What if we then have OVMF rely on the physbits *iff*
> > > > "-host-phys-bits" is used (which is the default in RH and a possible
> > > > machine configuration in the libvirt XML on Ubuntu), and have OVMF
> > > > fall back to 36-bit otherwise?
> > >
> > > I've now read the commit message on QEMU commit 258fe08bd341d, and the
> > > complexity is simply stunning.
> > >
> > > Right now, OVMF calculates the guest physical address space size from
> > > various range sizes (such as hotplug memory area end, default or
> > > user-configured PCI64 MMIO aperture), and derives the minimum suitable
> > > guest-phys address width from that address space size. This width is
> > > then exposed to the rest of the firmware with the CPU HOB (hand-off
> > > block), which in turn controls how the GCD (global coherency domain)
> > > memory space map is sized. Etc.
> > >
> > > If QEMU can provide a *reliable* GPA width, in some info channel (CPUID
> > > or even fw_cfg), then the above calculation could be reversed in OVMF.
> > > We could take the width as a given (-> produce the CPU HOB directly),
> > > plus calculate the *remaining* address space between the GPA space size
> > > given by the width, and the end of the memory hotplug area. If the
> > > "remaining size" were negative, then obviously QEMU would have been
> > > misconfigured, so we'd halt the boot. Otherwise, the remaining area
> > > could be used as PCI64 MMIO aperture (PEI memory footprint of DXE page
> > > tables be darned).
> > >
> > > > Now, regarding the problem of whether "to trust or not" the guest's
> > > > physbits, I think it's an orthogonal discussion to some extent. It'd be
> > > > nice to have that check and, as Eduardo said, prevent migration in such
> > > > cases. But it doesn't really prevent a big PCI64 aperture in OVMF if we
> > > > only increase the aperture _when "-host-phys-bits" is used_.
> > >
> > > I don't know what exactly those flags do, but I doubt they are clearly
> > > visible to OVMF in any particular way.
> >
> > The firmware should trust whatever it reads from the cpuid and thus gets
> > told from qemu; if qemu is doing the wrong thing there then that's our
> > problem and we need to fix it in qemu.
>
> It is impossible to provide a MAXPHYADDR that the guest can trust
> unconditionally and allow live migration to hosts with different
> sizes at the same time.
>
> Unless we want to drop support for live migration to hosts with
> different sizes entirely, we need additional bits to tell the
> guest how much it can trust MAXPHYADDR.
>
> --
> Eduardo
>
