* Eduardo Habkost (ehabk...@redhat.com) wrote:
> On Wed, Jun 17, 2020 at 05:17:17PM +0100, Daniel P. Berrangé wrote:
> > On Wed, Jun 17, 2020 at 05:04:12PM +0100, Dr. David Alan Gilbert wrote:
> > > * Eduardo Habkost (ehabk...@redhat.com) wrote:
> > > > On Wed, Jun 17, 2020 at 02:46:52PM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Laszlo Ersek (ler...@redhat.com) wrote:
> > > > > > On 06/16/20 19:14, Guilherme Piccoli wrote:
> > > > > > > Thanks Gerd, Dave and Eduardo for the prompt responses!
> > > > > > >
> > > > > > > So, I understand that when we use "-host-physical-bits", we are
> > > > > > > passing the *real* number to the guest, correct? So, in this case
> > > > > > > we can trust that the guest physbits matches the true host
> > > > > > > physbits.
> > > > > > >
> > > > > > > What if then we have OVMF relying on the physbits *iff*
> > > > > > > "-host-phys-bits" is used (which is the default in RH and a
> > > > > > > possible machine configuration in the libvirt XML on Ubuntu), and
> > > > > > > have OVMF fall back to 36-bit otherwise?
> > > > > >
> > > > > > I've now read the commit message on QEMU commit 258fe08bd341d, and
> > > > > > the complexity is simply stunning.
> > > > > >
> > > > > > Right now, OVMF calculates the guest physical address space size
> > > > > > from various range sizes (such as the hotplug memory area end, and
> > > > > > the default or user-configured PCI64 MMIO aperture), and derives
> > > > > > the minimum suitable guest-phys address width from that address
> > > > > > space size. This width is then exposed to the rest of the firmware
> > > > > > with the CPU HOB (hand-off block), which in turn controls how the
> > > > > > GCD (global coherency domain) memory space map is sized. Etc.
> > > > > >
> > > > > > If QEMU can provide a *reliable* GPA width, in some info channel
> > > > > > (CPUID or even fw_cfg), then the above calculation could be
> > > > > > reversed in OVMF. We could take the width as a given (-> produce
> > > > > > the CPU HOB directly), plus calculate the *remaining* address space
> > > > > > between the GPA space size given by the width, and the end of the
> > > > > > memory hotplug area. If the "remaining size" were negative, then
> > > > > > obviously QEMU would have been misconfigured, so we'd halt the
> > > > > > boot. Otherwise, the remaining area could be used as the PCI64 MMIO
> > > > > > aperture (PEI memory footprint of the DXE page tables be darned).
> > > > > >
> > > > > > > Now, regarding the problem of whether "to trust or not" the
> > > > > > > guest's physbits, I think it's an orthogonal discussion to some
> > > > > > > extent. It'd be nice to have that check, and as Eduardo said,
> > > > > > > prevent migration in such cases. But it's not really preventing
> > > > > > > OVMF's big PCI64 aperture if we only increase the aperture _when
> > > > > > > "-host-physical-bits" is used_.
> > > > > >
> > > > > > I don't know what exactly those flags do, but I doubt they are
> > > > > > clearly visible to OVMF in any particular way.
> > > > >
> > > > > The firmware should trust whatever it reads from CPUID, and thus
> > > > > whatever it is told by QEMU; if QEMU is doing the wrong thing there
> > > > > then that's our problem and we need to fix it in QEMU.
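
To make "trust the CPUID" concrete: the guest-visible physical address width
is what CPUID leaf 0x80000008 reports in EAX bits 7:0. A minimal sketch of
reading it and then doing the sizing the other way round, as Laszlo
describes, might look like the following; the hotplug_end value and the
aperture arithmetic are purely illustrative assumptions, not OVMF's actual
code:

  /* Hedged sketch, not OVMF code: read MAXPHYADDR from CPUID leaf
   * 0x80000008 (EAX[7:0]) and size the rest of the address space from it.
   * "hotplug_end" is an assumed example value, not something QEMU reports
   * this way. */
  #include <inttypes.h>
  #include <stdio.h>
  #include <cpuid.h>              /* GCC/clang __get_cpuid() */

  int main(void)
  {
      unsigned int eax, ebx, ecx, edx;

      if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx)) {
          return 1;               /* extended leaf not available */
      }

      unsigned int phys_bits = eax & 0xff;        /* MAXPHYADDR */
      uint64_t gpa_space = 1ULL << phys_bits;     /* e.g. 39 -> 512 GiB */

      /* Illustrative only: end of the hotplug memory area as the firmware
       * would know it; whatever is left below 2^MAXPHYADDR could become
       * the 64-bit PCI MMIO aperture, or boot is halted if nothing is
       * left. */
      uint64_t hotplug_end = 0x4000000000ULL;     /* assumed: 256 GiB */

      if (gpa_space <= hotplug_end) {
          fprintf(stderr, "phys-bits too small for the configured RAM\n");
          return 1;
      }

      printf("MAXPHYADDR=%u, PCI64 aperture candidate=%" PRIu64 " bytes\n",
             phys_bits, gpa_space - hotplug_end);
      return 0;
  }
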
> > > >
> > > > It is impossible to provide a MAXPHYADDR that the guest can trust
> > > > unconditionally and allow live migration to hosts with different sizes
> > > > at the same time.
> > >
> > > It would be nice to get to a point where we could say that the reported
> > > size is no bigger than the physical hardware. The gotcha here is that
> > > (upstream) QEMU is still reporting 40 by default when even modern Intel
> > > desktop chips are 39.
> > >
> > > > Unless we want to drop support for live migration to hosts with
> > > > different sizes entirely, we need additional bits to tell the guest
> > > > how much it can trust MAXPHYADDR.
> > >
> > > Could we go with host-phys-bits=true by default? That at least means the
> > > normal behaviour is correct; if people want to migrate between hosts
> > > with different sizes, they should set phys-bits (or host-phys-limit) to
> > > the lowest in their set of hardware.
> >
> > Is there any sense in picking the default value based on -cpu selection?
> >
> > If the user has asked for -cpu host, there's no downside to
> > host-phys-bits=true, as the user has intentionally traded off live
> > migration portability already.
>
> Setting host-phys-bits=true when using -cpu host makes a lot of sense, and
> we could start doing that immediately.
>
> > If the user asks for -cpu $MODEL, then could we set phys-bits=NNN for
> > some NNN that is the lowest value for CPUs that are capable of running
> > $MODEL? Or will that get too complicated with the wide range of SKU
> > variants, in particular server vs desktop CPUs?
>
> This makes sense too. We need some help from CPU vendors to get this data
> added to our CPU model table. I'm CCing some Intel and AMD people who
> could help us.
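
For what it's worth, both knobs already exist as x86 -cpu properties, so the
policies above are mostly about changing defaults. A hedged example of the
kind of configuration being discussed (the model name and the value 39 are
only examples, not recommendations, and the rest of the command line is
omitted):

  # -cpu host has already given up migration portability, so mirroring the
  # host's MAXPHYADDR is safe:
  qemu-system-x86_64 -cpu host,host-phys-bits=on ...

  # Named model migrated across a mixed pool: pin phys-bits to the smallest
  # width of any host in the pool (39 is just an example value):
  qemu-system-x86_64 -cpu Skylake-Server,phys-bits=39 ...
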
The per-model phys-bits=NNN part worries me, because I think I agree it's
SKU-dependent, and has been for a long time (on Intel at least), and we
don't even have CPU models for all Intel devices. (My laptop, for example,
is a Kaby Lake with 39 bits physical.) Maybe it works on the more modern
ones where we have 'Icelake-Client' and 'Icelake-Server'.

Dave

> --
> Eduardo

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK