Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it

2020-07-10 Thread Paolo Bonzini
On 10/07/20 18:02, Eduardo Habkost wrote:
> On Fri, Jul 10, 2020 at 09:22:42AM +0200, Paolo Bonzini wrote:
>> On 09/07/20 21:13, Eduardo Habkost wrote:
 Doesn't this require intercepting MOV-to-CR3 when the guest is in PAE
 mode, so that the hypervisor can validate the high bits in the PDPTEs?
>>> If the fix has additional overhead, is the additional overhead
>>> bad enough to warrant making it optional?  Most existing
>>> GUEST_MAXPHYADDR < HOST_MAXPHYADDR guests already work today
>>> without the fix.
>>
>> The problematic case is when host maxphyaddr is 52.  That case wouldn't
>> work at all without the fix.
> 
> What can QEMU do to do differentiate "can't work at all without
> the fix" from "not the best idea, but will probably work"?

Blocking guest_maxphyaddr < host_maxphyaddr if maxphyaddr==52 would be a
good start.  However it would block the default configuration on IceLake
processors (which is why Mohammed looked at this thing in the first place).

Paolo




Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it

2020-07-10 Thread Eduardo Habkost
On Fri, Jul 10, 2020 at 09:22:42AM +0200, Paolo Bonzini wrote:
> On 09/07/20 21:13, Eduardo Habkost wrote:
> >> Doesn't this require intercepting MOV-to-CR3 when the guest is in PAE
> >> mode, so that the hypervisor can validate the high bits in the PDPTEs?
> > If the fix has additional overhead, is the additional overhead
> > bad enough to warrant making it optional?  Most existing
> > GUEST_MAXPHYADDR < HOST_MAXPHYADDR guests already work today
> > without the fix.
> 
> The problematic case is when host maxphyaddr is 52.  That case wouldn't
> work at all without the fix.

What can QEMU do to do differentiate "can't work at all without
the fix" from "not the best idea, but will probably work"?

-- 
Eduardo




Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it

2020-07-10 Thread Paolo Bonzini
On 09/07/20 21:13, Eduardo Habkost wrote:
>> Doesn't this require intercepting MOV-to-CR3 when the guest is in PAE
>> mode, so that the hypervisor can validate the high bits in the PDPTEs?
> If the fix has additional overhead, is the additional overhead
> bad enough to warrant making it optional?  Most existing
> GUEST_MAXPHYADDR < HOST_MAXPHYADDR guests already work today
> without the fix.

The problematic case is when host maxphyaddr is 52.  That case wouldn't
work at all without the fix.

Paolo




Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it

2020-07-10 Thread Paolo Bonzini
On 09/07/20 19:00, Jim Mattson wrote:
>>
>> Mostly fine.  Some edge cases, like different page fault errors for
>> addresses above GUEST_MAXPHYADDR and below HOST_MAXPHYADDR.  Which I
>> think Mohammed fixed in the kernel recently.
> Doesn't this require intercepting MOV-to-CR3 when the guest is in PAE
> mode, so that the hypervisor can validate the high bits in the PDPTEs?

In theory yes, but in practice it just means we'd use the AMD behavior
of loading guest PDPT entries on demand during address translation
(because the PDPT would point to nonexistent memory and cause an EPT
violation on the PDE).

Paolo




Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it

2020-07-09 Thread Eduardo Habkost
On Thu, Jul 09, 2020 at 10:00:59AM -0700, Jim Mattson wrote:
> On Thu, Jul 9, 2020 at 2:44 AM Gerd Hoffmann  wrote:
> 
> > (2) GUEST_MAXPHYADDR < HOST_MAXPHYADDR
> >
> > Mostly fine.  Some edge cases, like different page fault errors for
> > addresses above GUEST_MAXPHYADDR and below HOST_MAXPHYADDR.  Which I
> > think Mohammed fixed in the kernel recently.
> 
> Doesn't this require intercepting MOV-to-CR3 when the guest is in PAE
> mode, so that the hypervisor can validate the high bits in the PDPTEs?

If the fix has additional overhead, is the additional overhead
bad enough to warrant making it optional?  Most existing
GUEST_MAXPHYADDR < HOST_MAXPHYADDR guests already work today
without the fix.

-- 
Eduardo




Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it

2020-07-09 Thread Jim Mattson
On Thu, Jul 9, 2020 at 2:44 AM Gerd Hoffmann  wrote:

> (2) GUEST_MAXPHYADDR < HOST_MAXPHYADDR
>
> Mostly fine.  Some edge cases, like different page fault errors for
> addresses above GUEST_MAXPHYADDR and below HOST_MAXPHYADDR.  Which I
> think Mohammed fixed in the kernel recently.

Doesn't this require intercepting MOV-to-CR3 when the guest is in PAE
mode, so that the hypervisor can validate the high bits in the PDPTEs?



Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it

2020-07-09 Thread Paolo Bonzini
On 09/07/20 11:55, Mohammed Gamal wrote:
>> Ideally we would simply outlaw (3), but it's hard for backward
>> compatibility reasons.  Second best solution is a flag somewhere
>> (msr, cpuid, ...) telling the guest firmware "you can use
>> GUEST_MAXPHYADDR, we guarantee it is <= HOST_MAXPHYADDR".
> Problem is GUEST_MAXPHYADDR > HOST_MAXPHYADDR is actually a supported
> configuration on some setups. Namely when memory encryption is enabled
> on AMD CPUs[1].
> 

It's not that bad since there's two MAXPHYADDRs, the one in CPUID and
the one computed internally by the kernel.  GUEST_MAXPHYADDR greater
than the host CPUID maxphyaddr is never supported.

Paolo




Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it

2020-07-09 Thread Mohammed Gamal
On Thu, 2020-07-09 at 11:44 +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> > > (CCing libvir-list, and people who were included in the OVMF
> > > thread[1])
> > > 
> > > [1] 
> > > https://lore.kernel.org/qemu-devel/99779e9c-f05f-501b-b4be-ff719f140...@canonical.com/
> > > Also, it's important that we work with libvirt and management
> > > software to ensure they have appropriate APIs to choose what to
> > > do when a cluster has hosts with different MAXPHYADDR.
> > 
> > There's been so many complex discussions that it is hard to have
> > any
> > understanding of what we should be doing going forward. There's
> > enough
> > problems wrt phys bits, that I think we would benefit from a doc
> > that
> > outlines the big picture expectation for how to handle this in the
> > virt stack.
> 
> Well, the fundamental issue is not that hard actually.  We have three
> cases:
> 
> (1) GUEST_MAXPHYADDR == HOST_MAXPHYADDR
> 
> Everything is fine ;)
> 
> (2) GUEST_MAXPHYADDR < HOST_MAXPHYADDR
> 
> Mostly fine.  Some edge cases, like different page fault errors
> for
> addresses above GUEST_MAXPHYADDR and below
> HOST_MAXPHYADDR.  Which I
> think Mohammed fixed in the kernel recently.
> 
> (3) GUEST_MAXPHYADDR > HOST_MAXPHYADDR
> 
> Broken.  If the guest uses addresses above HOST_MAXPHYADDR
> everything
> goes south.
> 
> The (2) case isn't much of a problem.  We only need to figure
> whenever
> we want qemu allow this unconditionally (current state) or only in
> case
> the kernel fixes are present (state with this patch applied if I read
> it
> correctly).
> 
> The (3) case is the reason why guest firmware never ever uses
> GUEST_MAXPHYADDR and goes with very conservative heuristics instead,
> which in turn leads to the consequences discussed at length in the
> OVMF thread linked above.
> 
> Ideally we would simply outlaw (3), but it's hard for backward
> compatibility reasons.  Second best solution is a flag somewhere
> (msr, cpuid, ...) telling the guest firmware "you can use
> GUEST_MAXPHYADDR, we guarantee it is <= HOST_MAXPHYADDR".

Problem is GUEST_MAXPHYADDR > HOST_MAXPHYADDR is actually a supported
configuration on some setups. Namely when memory encryption is enabled
on AMD CPUs[1].

> 
> > As mentioned in the thread quoted above, using host_phys_bits is a
> > obvious thing to do when the user requested "-cpu host".
> > 
> > The harder issue is how to handle other CPU models. I had suggested
> > we should try associating a phys bits value with them, which would
> > probably involve creating Client/Server variants for all our CPU
> > models which don't currently have it. I still think that's worth
> > exploring as a strategy and with versioned CPU models we should
> > be ok wrt back compatibility with that approach.
> 
> Yep, better defaults for GUEST_MAXPHYADDR would be good too, but that
> is a separate (although related) discussion.
> 
> take care,
>   Gerd
> 
[1] - https://lkml.org/lkml/2020/6/19/2371




Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it

2020-07-09 Thread Gerd Hoffmann
  Hi,

> > (CCing libvir-list, and people who were included in the OVMF
> > thread[1])
> > 
> > [1] 
> > https://lore.kernel.org/qemu-devel/99779e9c-f05f-501b-b4be-ff719f140...@canonical.com/

> > Also, it's important that we work with libvirt and management
> > software to ensure they have appropriate APIs to choose what to
> > do when a cluster has hosts with different MAXPHYADDR.
> 
> There's been so many complex discussions that it is hard to have any
> understanding of what we should be doing going forward. There's enough
> problems wrt phys bits, that I think we would benefit from a doc that
> outlines the big picture expectation for how to handle this in the
> virt stack.

Well, the fundamental issue is not that hard actually.  We have three
cases:

(1) GUEST_MAXPHYADDR == HOST_MAXPHYADDR

Everything is fine ;)

(2) GUEST_MAXPHYADDR < HOST_MAXPHYADDR

Mostly fine.  Some edge cases, like different page fault errors for
addresses above GUEST_MAXPHYADDR and below HOST_MAXPHYADDR.  Which I
think Mohammed fixed in the kernel recently.

(3) GUEST_MAXPHYADDR > HOST_MAXPHYADDR

Broken.  If the guest uses addresses above HOST_MAXPHYADDR everything
goes south.

The (2) case isn't much of a problem.  We only need to figure whenever
we want qemu allow this unconditionally (current state) or only in case
the kernel fixes are present (state with this patch applied if I read it
correctly).

The (3) case is the reason why guest firmware never ever uses
GUEST_MAXPHYADDR and goes with very conservative heuristics instead,
which in turn leads to the consequences discussed at length in the
OVMF thread linked above.

Ideally we would simply outlaw (3), but it's hard for backward
compatibility reasons.  Second best solution is a flag somewhere
(msr, cpuid, ...) telling the guest firmware "you can use
GUEST_MAXPHYADDR, we guarantee it is <= HOST_MAXPHYADDR".

> As mentioned in the thread quoted above, using host_phys_bits is a
> obvious thing to do when the user requested "-cpu host".
> 
> The harder issue is how to handle other CPU models. I had suggested
> we should try associating a phys bits value with them, which would
> probably involve creating Client/Server variants for all our CPU
> models which don't currently have it. I still think that's worth
> exploring as a strategy and with versioned CPU models we should
> be ok wrt back compatibility with that approach.

Yep, better defaults for GUEST_MAXPHYADDR would be good too, but that
is a separate (although related) discussion.

take care,
  Gerd




Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it

2020-07-08 Thread Eduardo Habkost
(CCing libvir-list, and people who were included in the OVMF
thread[1])

[1] 
https://lore.kernel.org/qemu-devel/99779e9c-f05f-501b-b4be-ff719f140...@canonical.com/

On Fri, Jun 19, 2020 at 05:53:44PM +0200, Mohammed Gamal wrote:
> If the CPU doesn't support GUEST_MAXPHYADDR < HOST_MAXPHYADDR we
> let QEMU choose to use the host MAXPHYADDR and print a warning to the
> user.
> 
> Signed-off-by: Mohammed Gamal 
> ---
>  target/i386/cpu.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index b1b311baa2..91c57117ce 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6589,6 +6589,17 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
> **errp)
>  uint32_t host_phys_bits = x86_host_phys_bits();
>  static bool warned;
>  
> + /*
> +  * If host doesn't support setting physical bits on the guest,
> +  * report it and return
> +  */
> + if (cpu->phys_bits < host_phys_bits &&
> + !kvm_has_smaller_maxphyaddr()) {
> + warn_report("Host doesn't support setting smaller phys-bits."
> + " Using host phys-bits\n");
> + cpu->phys_bits = host_phys_bits;
> + }
> +

This looks like a regression from existing behavior.  Today,
using smaller phys-bits doesn't crash most guests, and still
allows live migration to smaller hosts.  I agree using the host
phys-bits is probably a better default, but we shouldn't override
options set explicitly in the command line.

Also, it's important that we work with libvirt and management
software to ensure they have appropriate APIs to choose what to
do when a cluster has hosts with different MAXPHYADDR.

>  /* Print a warning if the user set it to a value that's not the
>   * host value.
>   */
> -- 
> 2.26.2
> 

-- 
Eduardo




Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it

2020-07-08 Thread Daniel P . Berrangé
On Wed, Jul 08, 2020 at 01:16:21PM -0400, Eduardo Habkost wrote:
> (CCing libvir-list, and people who were included in the OVMF
> thread[1])
> 
> [1] 
> https://lore.kernel.org/qemu-devel/99779e9c-f05f-501b-b4be-ff719f140...@canonical.com/
> 
> On Fri, Jun 19, 2020 at 05:53:44PM +0200, Mohammed Gamal wrote:
> > If the CPU doesn't support GUEST_MAXPHYADDR < HOST_MAXPHYADDR we
> > let QEMU choose to use the host MAXPHYADDR and print a warning to the
> > user.
> > 
> > Signed-off-by: Mohammed Gamal 
> > ---
> >  target/i386/cpu.c | 11 +++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index b1b311baa2..91c57117ce 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -6589,6 +6589,17 @@ static void x86_cpu_realizefn(DeviceState *dev, 
> > Error **errp)
> >  uint32_t host_phys_bits = x86_host_phys_bits();
> >  static bool warned;
> >  
> > +   /*
> > +* If host doesn't support setting physical bits on the guest,
> > +* report it and return
> > +*/
> > +   if (cpu->phys_bits < host_phys_bits &&
> > +   !kvm_has_smaller_maxphyaddr()) {
> > +   warn_report("Host doesn't support setting smaller phys-bits."
> > +   " Using host phys-bits\n");
> > +   cpu->phys_bits = host_phys_bits;
> > +   }
> > +
> 
> This looks like a regression from existing behavior.  Today,
> using smaller phys-bits doesn't crash most guests, and still
> allows live migration to smaller hosts.  I agree using the host
> phys-bits is probably a better default, but we shouldn't override
> options set explicitly in the command line.

Yeah, this looks like it would cause a behaviour change / regression
so looks questionable.

> Also, it's important that we work with libvirt and management
> software to ensure they have appropriate APIs to choose what to
> do when a cluster has hosts with different MAXPHYADDR.

There's been so many complex discussions that it is hard to have any
understanding of what we should be doing going forward. There's enough
problems wrt phys bits, that I think we would benefit from a doc that
outlines the big picture expectation for how to handle this in the
virt stack.

As mentioned in the thread quoted above, using host_phys_bits is a
obvious thing to do when the user requested "-cpu host".

The harder issue is how to handle other CPU models. I had suggested
we should try associating a phys bits value with them, which would
probably involve creating Client/Server variants for all our CPU
models which don't currently have it. I still think that's worth
exploring as a strategy and with versioned CPU models we should
be ok wrt back compatibility with that approach.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 2/2] x86/cpu: Handle GUEST_MAXPHYADDR < HOST_MAXPHYADDR for hosts that don't support it

2020-06-19 Thread Paolo Bonzini
On 19/06/20 17:53, Mohammed Gamal wrote:
> If the CPU doesn't support GUEST_MAXPHYADDR < HOST_MAXPHYADDR we
> let QEMU choose to use the host MAXPHYADDR and print a warning to the
> user.
> 
> Signed-off-by: Mohammed Gamal 
> ---
>  target/i386/cpu.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index b1b311baa2..91c57117ce 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6589,6 +6589,17 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
> **errp)
>  uint32_t host_phys_bits = x86_host_phys_bits();
>  static bool warned;
>  
> + /*
> +  * If host doesn't support setting physical bits on the guest,
> +  * report it and return
> +  */
> + if (cpu->phys_bits < host_phys_bits &&
> + !kvm_has_smaller_maxphyaddr()) {
> + warn_report("Host doesn't support setting smaller phys-bits."
> + " Using host phys-bits\n");
> + cpu->phys_bits = host_phys_bits;
> + }
> +
>  /* Print a warning if the user set it to a value that's not the
>   * host value.
>   */
> 

You should remove the existing warning too.

Paolo