Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it
On 10/07/20 18:02, Eduardo Habkost wrote: > On Fri, Jul 10, 2020 at 09:22:42AM +0200, Paolo Bonzini wrote: >> On 09/07/20 21:13, Eduardo Habkost wrote: Doesn't this require intercepting MOV-to-CR3 when the guest is in PAE mode, so that the hypervisor can validate the high bits in the PDPTEs? >>> If the fix has additional overhead, is the additional overhead >>> bad enough to warrant making it optional? Most existing >>> GUEST_MAXPHYADDR < HOST_MAXPHYADDR guests already work today >>> without the fix. >> >> The problematic case is when host maxphyaddr is 52. That case wouldn't >> work at all without the fix. > > What can QEMU do to do differentiate "can't work at all without > the fix" from "not the best idea, but will probably work"? Blocking guest_maxphyaddr < host_maxphyaddr if maxphyaddr==52 would be a good start. However it would block the default configuration on IceLake processors (which is why Mohammed looked at this thing in the first place). Paolo
Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it
On Fri, Jul 10, 2020 at 09:22:42AM +0200, Paolo Bonzini wrote: > On 09/07/20 21:13, Eduardo Habkost wrote: > >> Doesn't this require intercepting MOV-to-CR3 when the guest is in PAE > >> mode, so that the hypervisor can validate the high bits in the PDPTEs? > > If the fix has additional overhead, is the additional overhead > > bad enough to warrant making it optional? Most existing > > GUEST_MAXPHYADDR < HOST_MAXPHYADDR guests already work today > > without the fix. > > The problematic case is when host maxphyaddr is 52. That case wouldn't > work at all without the fix. What can QEMU do to do differentiate "can't work at all without the fix" from "not the best idea, but will probably work"? -- Eduardo
Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it
On 09/07/20 21:13, Eduardo Habkost wrote: >> Doesn't this require intercepting MOV-to-CR3 when the guest is in PAE >> mode, so that the hypervisor can validate the high bits in the PDPTEs? > If the fix has additional overhead, is the additional overhead > bad enough to warrant making it optional? Most existing > GUEST_MAXPHYADDR < HOST_MAXPHYADDR guests already work today > without the fix. The problematic case is when host maxphyaddr is 52. That case wouldn't work at all without the fix. Paolo
Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it
On 09/07/20 19:00, Jim Mattson wrote: >> >> Mostly fine. Some edge cases, like different page fault errors for >> addresses above GUEST_MAXPHYADDR and below HOST_MAXPHYADDR. Which I >> think Mohammed fixed in the kernel recently. > Doesn't this require intercepting MOV-to-CR3 when the guest is in PAE > mode, so that the hypervisor can validate the high bits in the PDPTEs? In theory yes, but in practice it just means we'd use the AMD behavior of loading guest PDPT entries on demand during address translation (because the PDPT would point to nonexistent memory and cause an EPT violation on the PDE). Paolo
Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it
On Thu, Jul 09, 2020 at 10:00:59AM -0700, Jim Mattson wrote: > On Thu, Jul 9, 2020 at 2:44 AM Gerd Hoffmann wrote: > > > (2) GUEST_MAXPHYADDR < HOST_MAXPHYADDR > > > > Mostly fine. Some edge cases, like different page fault errors for > > addresses above GUEST_MAXPHYADDR and below HOST_MAXPHYADDR. Which I > > think Mohammed fixed in the kernel recently. > > Doesn't this require intercepting MOV-to-CR3 when the guest is in PAE > mode, so that the hypervisor can validate the high bits in the PDPTEs? If the fix has additional overhead, is the additional overhead bad enough to warrant making it optional? Most existing GUEST_MAXPHYADDR < HOST_MAXPHYADDR guests already work today without the fix. -- Eduardo
Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it
On Thu, Jul 9, 2020 at 2:44 AM Gerd Hoffmann wrote: > (2) GUEST_MAXPHYADDR < HOST_MAXPHYADDR > > Mostly fine. Some edge cases, like different page fault errors for > addresses above GUEST_MAXPHYADDR and below HOST_MAXPHYADDR. Which I > think Mohammed fixed in the kernel recently. Doesn't this require intercepting MOV-to-CR3 when the guest is in PAE mode, so that the hypervisor can validate the high bits in the PDPTEs?
Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it
On 09/07/20 11:55, Mohammed Gamal wrote: >> Ideally we would simply outlaw (3), but it's hard for backward >> compatibility reasons. Second best solution is a flag somewhere >> (msr, cpuid, ...) telling the guest firmware "you can use >> GUEST_MAXPHYADDR, we guarantee it is <= HOST_MAXPHYADDR". > Problem is GUEST_MAXPHYADDR > HOST_MAXPHYADDR is actually a supported > configuration on some setups. Namely when memory encryption is enabled > on AMD CPUs[1]. > It's not that bad since there's two MAXPHYADDRs, the one in CPUID and the one computed internally by the kernel. GUEST_MAXPHYADDR greater than the host CPUID maxphyaddr is never supported. Paolo
Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it
On Thu, 2020-07-09 at 11:44 +0200, Gerd Hoffmann wrote: > Hi, > > > > (CCing libvir-list, and people who were included in the OVMF > > > thread[1]) > > > > > > [1] > > > https://lore.kernel.org/qemu-devel/99779e9c-f05f-501b-b4be-ff719f140...@canonical.com/ > > > Also, it's important that we work with libvirt and management > > > software to ensure they have appropriate APIs to choose what to > > > do when a cluster has hosts with different MAXPHYADDR. > > > > There's been so many complex discussions that it is hard to have > > any > > understanding of what we should be doing going forward. There's > > enough > > problems wrt phys bits, that I think we would benefit from a doc > > that > > outlines the big picture expectation for how to handle this in the > > virt stack. > > Well, the fundamental issue is not that hard actually. We have three > cases: > > (1) GUEST_MAXPHYADDR == HOST_MAXPHYADDR > > Everything is fine ;) > > (2) GUEST_MAXPHYADDR < HOST_MAXPHYADDR > > Mostly fine. Some edge cases, like different page fault errors > for > addresses above GUEST_MAXPHYADDR and below > HOST_MAXPHYADDR. Which I > think Mohammed fixed in the kernel recently. > > (3) GUEST_MAXPHYADDR > HOST_MAXPHYADDR > > Broken. If the guest uses addresses above HOST_MAXPHYADDR > everything > goes south. > > The (2) case isn't much of a problem. We only need to figure > whenever > we want qemu allow this unconditionally (current state) or only in > case > the kernel fixes are present (state with this patch applied if I read > it > correctly). > > The (3) case is the reason why guest firmware never ever uses > GUEST_MAXPHYADDR and goes with very conservative heuristics instead, > which in turn leads to the consequences discussed at length in the > OVMF thread linked above. > > Ideally we would simply outlaw (3), but it's hard for backward > compatibility reasons. Second best solution is a flag somewhere > (msr, cpuid, ...) telling the guest firmware "you can use > GUEST_MAXPHYADDR, we guarantee it is <= HOST_MAXPHYADDR". Problem is GUEST_MAXPHYADDR > HOST_MAXPHYADDR is actually a supported configuration on some setups. Namely when memory encryption is enabled on AMD CPUs[1]. > > > As mentioned in the thread quoted above, using host_phys_bits is a > > obvious thing to do when the user requested "-cpu host". > > > > The harder issue is how to handle other CPU models. I had suggested > > we should try associating a phys bits value with them, which would > > probably involve creating Client/Server variants for all our CPU > > models which don't currently have it. I still think that's worth > > exploring as a strategy and with versioned CPU models we should > > be ok wrt back compatibility with that approach. > > Yep, better defaults for GUEST_MAXPHYADDR would be good too, but that > is a separate (although related) discussion. > > take care, > Gerd > [1] - https://lkml.org/lkml/2020/6/19/2371
Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it
Hi, > > (CCing libvir-list, and people who were included in the OVMF > > thread[1]) > > > > [1] > > https://lore.kernel.org/qemu-devel/99779e9c-f05f-501b-b4be-ff719f140...@canonical.com/ > > Also, it's important that we work with libvirt and management > > software to ensure they have appropriate APIs to choose what to > > do when a cluster has hosts with different MAXPHYADDR. > > There's been so many complex discussions that it is hard to have any > understanding of what we should be doing going forward. There's enough > problems wrt phys bits, that I think we would benefit from a doc that > outlines the big picture expectation for how to handle this in the > virt stack. Well, the fundamental issue is not that hard actually. We have three cases: (1) GUEST_MAXPHYADDR == HOST_MAXPHYADDR Everything is fine ;) (2) GUEST_MAXPHYADDR < HOST_MAXPHYADDR Mostly fine. Some edge cases, like different page fault errors for addresses above GUEST_MAXPHYADDR and below HOST_MAXPHYADDR. Which I think Mohammed fixed in the kernel recently. (3) GUEST_MAXPHYADDR > HOST_MAXPHYADDR Broken. If the guest uses addresses above HOST_MAXPHYADDR everything goes south. The (2) case isn't much of a problem. We only need to figure whenever we want qemu allow this unconditionally (current state) or only in case the kernel fixes are present (state with this patch applied if I read it correctly). The (3) case is the reason why guest firmware never ever uses GUEST_MAXPHYADDR and goes with very conservative heuristics instead, which in turn leads to the consequences discussed at length in the OVMF thread linked above. Ideally we would simply outlaw (3), but it's hard for backward compatibility reasons. Second best solution is a flag somewhere (msr, cpuid, ...) telling the guest firmware "you can use GUEST_MAXPHYADDR, we guarantee it is <= HOST_MAXPHYADDR". > As mentioned in the thread quoted above, using host_phys_bits is a > obvious thing to do when the user requested "-cpu host". > > The harder issue is how to handle other CPU models. I had suggested > we should try associating a phys bits value with them, which would > probably involve creating Client/Server variants for all our CPU > models which don't currently have it. I still think that's worth > exploring as a strategy and with versioned CPU models we should > be ok wrt back compatibility with that approach. Yep, better defaults for GUEST_MAXPHYADDR would be good too, but that is a separate (although related) discussion. take care, Gerd
Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it
(CCing libvir-list, and people who were included in the OVMF thread[1]) [1] https://lore.kernel.org/qemu-devel/99779e9c-f05f-501b-b4be-ff719f140...@canonical.com/ On Fri, Jun 19, 2020 at 05:53:44PM +0200, Mohammed Gamal wrote: > If the CPU doesn't support GUEST_MAXPHYADDR < HOST_MAXPHYADDR we > let QEMU choose to use the host MAXPHYADDR and print a warning to the > user. > > Signed-off-by: Mohammed Gamal > --- > target/i386/cpu.c | 11 +++ > 1 file changed, 11 insertions(+) > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > index b1b311baa2..91c57117ce 100644 > --- a/target/i386/cpu.c > +++ b/target/i386/cpu.c > @@ -6589,6 +6589,17 @@ static void x86_cpu_realizefn(DeviceState *dev, Error > **errp) > uint32_t host_phys_bits = x86_host_phys_bits(); > static bool warned; > > + /* > + * If host doesn't support setting physical bits on the guest, > + * report it and return > + */ > + if (cpu->phys_bits < host_phys_bits && > + !kvm_has_smaller_maxphyaddr()) { > + warn_report("Host doesn't support setting smaller phys-bits." > + " Using host phys-bits\n"); > + cpu->phys_bits = host_phys_bits; > + } > + This looks like a regression from existing behavior. Today, using smaller phys-bits doesn't crash most guests, and still allows live migration to smaller hosts. I agree using the host phys-bits is probably a better default, but we shouldn't override options set explicitly in the command line. Also, it's important that we work with libvirt and management software to ensure they have appropriate APIs to choose what to do when a cluster has hosts with different MAXPHYADDR. > /* Print a warning if the user set it to a value that's not the > * host value. > */ > -- > 2.26.2 > -- Eduardo
Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it
On Wed, Jul 08, 2020 at 01:16:21PM -0400, Eduardo Habkost wrote: > (CCing libvir-list, and people who were included in the OVMF > thread[1]) > > [1] > https://lore.kernel.org/qemu-devel/99779e9c-f05f-501b-b4be-ff719f140...@canonical.com/ > > On Fri, Jun 19, 2020 at 05:53:44PM +0200, Mohammed Gamal wrote: > > If the CPU doesn't support GUEST_MAXPHYADDR < HOST_MAXPHYADDR we > > let QEMU choose to use the host MAXPHYADDR and print a warning to the > > user. > > > > Signed-off-by: Mohammed Gamal > > --- > > target/i386/cpu.c | 11 +++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > > index b1b311baa2..91c57117ce 100644 > > --- a/target/i386/cpu.c > > +++ b/target/i386/cpu.c > > @@ -6589,6 +6589,17 @@ static void x86_cpu_realizefn(DeviceState *dev, > > Error **errp) > > uint32_t host_phys_bits = x86_host_phys_bits(); > > static bool warned; > > > > + /* > > +* If host doesn't support setting physical bits on the guest, > > +* report it and return > > +*/ > > + if (cpu->phys_bits < host_phys_bits && > > + !kvm_has_smaller_maxphyaddr()) { > > + warn_report("Host doesn't support setting smaller phys-bits." > > + " Using host phys-bits\n"); > > + cpu->phys_bits = host_phys_bits; > > + } > > + > > This looks like a regression from existing behavior. Today, > using smaller phys-bits doesn't crash most guests, and still > allows live migration to smaller hosts. I agree using the host > phys-bits is probably a better default, but we shouldn't override > options set explicitly in the command line. Yeah, this looks like it would cause a behaviour change / regression so looks questionable. > Also, it's important that we work with libvirt and management > software to ensure they have appropriate APIs to choose what to > do when a cluster has hosts with different MAXPHYADDR. There's been so many complex discussions that it is hard to have any understanding of what we should be doing going forward. There's enough problems wrt phys bits, that I think we would benefit from a doc that outlines the big picture expectation for how to handle this in the virt stack. As mentioned in the thread quoted above, using host_phys_bits is a obvious thing to do when the user requested "-cpu host". The harder issue is how to handle other CPU models. I had suggested we should try associating a phys bits value with them, which would probably involve creating Client/Server variants for all our CPU models which don't currently have it. I still think that's worth exploring as a strategy and with versioned CPU models we should be ok wrt back compatibility with that approach. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it
On 19/06/20 17:53, Mohammed Gamal wrote: > If the CPU doesn't support GUEST_MAXPHYADDR < HOST_MAXPHYADDR we > let QEMU choose to use the host MAXPHYADDR and print a warning to the > user. > > Signed-off-by: Mohammed Gamal > --- > target/i386/cpu.c | 11 +++ > 1 file changed, 11 insertions(+) > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > index b1b311baa2..91c57117ce 100644 > --- a/target/i386/cpu.c > +++ b/target/i386/cpu.c > @@ -6589,6 +6589,17 @@ static void x86_cpu_realizefn(DeviceState *dev, Error > **errp) > uint32_t host_phys_bits = x86_host_phys_bits(); > static bool warned; > > + /* > + * If host doesn't support setting physical bits on the guest, > + * report it and return > + */ > + if (cpu->phys_bits < host_phys_bits && > + !kvm_has_smaller_maxphyaddr()) { > + warn_report("Host doesn't support setting smaller phys-bits." > + " Using host phys-bits\n"); > + cpu->phys_bits = host_phys_bits; > + } > + > /* Print a warning if the user set it to a value that's not the > * host value. > */ > You should remove the existing warning too. Paolo