Re: [libvirt] Qemu, libvirt, and CPU models

2012-03-07 Thread Daniel P. Berrange
On Tue, Mar 06, 2012 at 03:27:53PM -0300, Eduardo Habkost wrote:
> Hi,
>
> Sorry for the long message, but I didn't find a way to summarize the
> questions and issues and make it shorter.
>
> For people who don't know me: I have started to work recently on the
> Qemu CPU model code. I have been looking at how things work on
> libvirt+Qemu today w.r.t. CPU models, and I have some points I would
> like to understand better and see if they can be improved.
>
> I have two main points I would like to understand/discuss:
>
> 1) The relationship between libvirt's cpu_map.xml and the Qemu CPU model
>    definitions.

We have several areas of code in which we use CPU definitions

 - Reporting the host CPU definition (virsh capabilities)
 - Calculating host CPU compatibility / baseline definitions
 - Checking guest / host CPU compatibility
 - Configuring the guest CPU definition

libvirt targets multiple platforms, and our CPU handling code is designed
to be common & sharable across all the libvirt drivers, VMware, Xen, KVM,
LXC, etc. Obviously for container based virt, only the host side of things
is relevant.

The libvirt CPU XML definition consists of

 - Model name
 - Vendor name
 - zero or more feature flags added/removed.

A model name is basically just an alias for a bunch of feature flags,
so that the CPU XML definitions are a) reasonably short b) have
some sensible default baselines.
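
To make the "model name as an alias" idea concrete, here is a minimal Python sketch. The model table and flag lists are made up for illustration and are not the real cpu_map.xml contents; `expand_cpu` is a hypothetical helper, not a libvirt function.

```python
# Hypothetical model table: each model name is shorthand for a set of flags.
# The entries are illustrative only, not copied from cpu_map.xml.
MODELS = {
    "ModelA": {"sse", "sse2"},
    "ModelB": {"sse", "sse2", "ssse3", "sse4.1"},
}


def expand_cpu(model, require=(), disable=()):
    """Return the full flag set for a model plus per-guest adjustments."""
    flags = set(MODELS[model])  # copy, so the table itself is not modified
    flags |= set(require)       # feature flags added on top of the model
    flags -= set(disable)       # feature flags removed from the model
    return flags


print(sorted(expand_cpu("ModelB", require=["aes"], disable=["sse4.1"])))
# ['aes', 'sse', 'sse2', 'ssse3']
```

The guest definition stays short (one model name plus a couple of adjustments) while still describing a complete flag set.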

The cpu_map.xml is the database of the CPU models that libvirt
supports. We use this database to transform the CPU definition
from the guest XML, into the hypervisor's own format.

As luck would have it, the cpu_map.xml file contents match what
QEMU has. This need not be the case though. If there is a model
in the libvirt cpu_map.xml that QEMU doesn't know, we'll just
pick the nearest matching QEMU cpu model & specify the feature
flags to compensate. We could go one step further and just write
out a cpu.conf file that we load in QEMU with -loadconfig.
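
The "nearest model plus compensating flags" approach could be sketched roughly as follows. This assumes "nearest" means the fewest flags to add or remove, which is a simplification chosen for the example and not necessarily how libvirt picks a model; the model names and flags are invented. The output string follows the `-cpu model,+flag,-flag` style.

```python
def nearest_model(wanted, hypervisor_models):
    """Pick the known model whose flag set differs least from `wanted`.

    Returns (name, flags_to_add, flags_to_remove).
    """
    def distance(name):
        # Size of the symmetric difference = number of flags to add or remove.
        return len(wanted ^ hypervisor_models[name])

    # Sorting the names first makes ties resolve deterministically.
    best = min(sorted(hypervisor_models), key=distance)
    have = hypervisor_models[best]
    return best, wanted - have, have - wanted


def to_cpu_arg(name, add, remove):
    """Format the result in the style of a '-cpu model,+flag,-flag' argument."""
    parts = [name]
    parts += ["+" + f for f in sorted(add)]
    parts += ["-" + f for f in sorted(remove)]
    return ",".join(parts)


# Hypothetical models known to the hypervisor.
known = {
    "ModelA": {"sse", "sse2"},
    "ModelB": {"sse", "sse2", "ssse3", "sse4.1"},
}
wanted = {"sse", "ssse3", "sse4.1", "aes"}
print(to_cpu_arg(*nearest_model(wanted, known)))  # ModelB,+aes,-sse2
```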

On Xen we would use the cpu_map.xml to generate the CPUID
masks that Xen expects. Similarly for VMware.

> 2) How we could properly allow CPU models to be changed without breaking
>    existing virtual machines?

What is the scope of changes expected to CPU models?

> 1) Qemu and cpu_map.xml
>
> I would like to understand how cpu_map.xml is supposed to be used, and
> how it is supposed to interact with the CPU model definitions provided
> by Qemu. More precisely:
>
> 1.1) Do we want to eliminate the duplication between the Qemu CPU
>   definitions and cpu_map.xml?

It isn't possible for us to eliminate the libvirt cpu_map.xml, since we
need that across all our hypervisor targets.

> 1.1.1) If we want to eliminate the duplication, how can we accomplish
>   that? What interfaces are you missing that Qemu could provide?
>
> 1.1.2) If the duplication has a purpose and you want to keep
>   cpu_map.xml, then:
>   - First, I would like to understand why libvirt needs cpu_map.xml. Is
>     it part of the public interface of libvirt, or is it just an
>     internal file where libvirt stores non-user-visible data?
>   - How can we make sure there is no confusion between libvirt and Qemu
>     about the CPU models? For example, what if cpu_map.xml says model
>     'Moo' has the flag 'foo' enabled, but Qemu disagrees? How do we
>     guarantee that libvirt gets exactly what it expects from Qemu when
>     it asks for a CPU model? We have -cpu ?dump today, but it's not
>     the best interface we could have. Is there anything in particular
>     you are missing in the Qemu-libvirt interface that would help
>     with that?
>
> 1.2) About the probing of available features on the host system: Qemu
>   has code specialized to query KVM about the available features, and to
>   check what can be enabled and what can't be enabled in a VM. In many
>   cases, the available features match exactly what is returned by the
>   CPUID instruction on the host system, but there are some
>   exceptions:
>   - Some features can be enabled even when the host CPU doesn't support
>     them (because they are completely emulated by KVM, e.g. x2apic).
>   - In many other cases, the feature may be available but we have to
>     check if Qemu+KVM are really able to expose it to the guest (many
>     features work this way, as many depend on specific support by the
>     KVM kernel module and/or Qemu).
>
>   I suppose libvirt does want to check which flags can be enabled in a
>   VM, as it already has checks for host CPU features (e.g.
>   src/cpu/cpu_x86.c:x86Compute()). But I also suppose that libvirt
>   doesn't want to duplicate the KVM feature probing code present in
>   Qemu, and in this case we could have an interface where libvirt could
>   query for the actually-available CPU features. Would it be useful for
>   libvirt? What's the best way to expose this interface?
>
> 1.3) Some features are not plain CPU feature bits: e.g. level=X can be
>   set in the -cpu argument, and other features are enabled/disabled by
>   exposing specific CPUID leaves and not just a feature bit (e.g. PMU
>   CPUID leaf support). I

[libvirt] Qemu, libvirt, and CPU models

2012-03-06 Thread Eduardo Habkost
Hi,

Sorry for the long message, but I didn't find a way to summarize the
questions and issues and make it shorter.

For people who don't know me: I have started to work recently on the
Qemu CPU model code. I have been looking at how things work on
libvirt+Qemu today w.r.t. CPU models, and I have some points I would
like to understand better and see if they can be improved.

I have two main points I would like to understand/discuss:

1) The relationship between libvirt's cpu_map.xml and the Qemu CPU model
   definitions.

2) How we could properly allow CPU models to be changed without breaking
   existing virtual machines?

Note that for all the questions below, I don't expect that we design the
whole solution and discuss every single detail in this thread. I just
want to collect suggestions, information about libvirt requirements and
assumptions, and warnings about expected pitfalls before I start working
on a solution on Qemu.


1) Qemu and cpu_map.xml

I would like to understand how cpu_map.xml is supposed to be used, and
how it is supposed to interact with the CPU model definitions provided
by Qemu. More precisely:

1.1) Do we want to eliminate the duplication between the Qemu CPU
  definitions and cpu_map.xml?

1.1.1) If we want to eliminate the duplication, how can we accomplish
  that? What interfaces are you missing that Qemu could provide?

1.1.2) If the duplication has a purpose and you want to keep
  cpu_map.xml, then:
  - First, I would like to understand why libvirt needs cpu_map.xml. Is
    it part of the public interface of libvirt, or is it just an
    internal file where libvirt stores non-user-visible data?
  - How can we make sure there is no confusion between libvirt and Qemu
    about the CPU models? For example, what if cpu_map.xml says model
    'Moo' has the flag 'foo' enabled, but Qemu disagrees? How do we
    guarantee that libvirt gets exactly what it expects from Qemu when
    it asks for a CPU model? We have -cpu ?dump today, but it's not
    the best interface we could have. Is there anything in particular
    you are missing in the Qemu-libvirt interface that would help
    with that?

1.2) About the probing of available features on the host system: Qemu
  has code specialized to query KVM about the available features, and to
  check what can be enabled and what can't be enabled in a VM. In many
  cases, the available features match exactly what is returned by the
  CPUID instruction on the host system, but there are some
  exceptions:
  - Some features can be enabled even when the host CPU doesn't support
    them (because they are completely emulated by KVM, e.g. x2apic).
  - In many other cases, the feature may be available but we have to
    check if Qemu+KVM are really able to expose it to the guest (many
    features work this way, as many depend on specific support by the
    KVM kernel module and/or Qemu).
  
  I suppose libvirt does want to check which flags can be enabled in a
  VM, as it already has checks for host CPU features (e.g.
  src/cpu/cpu_x86.c:x86Compute()). But I also suppose that libvirt
  doesn't want to duplicate the KVM feature probing code present in
  Qemu, and in this case we could have an interface where libvirt could
  query for the actually-available CPU features. Would it be useful for
  libvirt? What's the best way to expose this interface?
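
The difference between "what CPUID reports on the host" and "what can actually be enabled in a VM" can be sketched with plain sets. This is only an illustration of the two exceptions listed above: the function names are hypothetical, 'foo', 'bar' and 'baz' are placeholder flags, and only x2apic comes from the text.

```python
def guest_available_features(host_cpuid, kvm_passthrough, kvm_emulated):
    """Features that could be exposed to a guest.

    host_cpuid      -- flags reported by CPUID on the host
    kvm_passthrough -- host flags that Qemu+KVM know how to expose
    kvm_emulated    -- flags KVM emulates regardless of host support
    """
    return (host_cpuid & kvm_passthrough) | kvm_emulated


def unavailable(requested, available):
    """Requested flags that cannot be enabled in a VM."""
    return requested - available


host = {"foo", "bar", "baz"}   # 'baz' is on the host but not yet supported
passthrough = {"foo", "bar"}
emulated = {"x2apic"}          # usable even if the host CPU lacks it

available = guest_available_features(host, passthrough, emulated)
print(sorted(available))                                  # ['bar', 'foo', 'x2apic']
print(sorted(unavailable({"foo", "baz", "x2apic"}, available)))  # ['baz']
```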

1.3) Some features are not plain CPU feature bits: e.g. level=X can be
  set in the -cpu argument, and other features are enabled/disabled by
  exposing specific CPUID leaves and not just a feature bit (e.g. PMU
  CPUID leaf support). I suppose libvirt wants to be able to probe for
  those features too, and be able to enable/disable them, right?



2) How to change an existing model and keep existing VMs working?

Sometimes we have to update a CPU model definition because of some bug.
Examples:

- The CPU models Conroe, Penryn and Nehalem have level=2 set. This
  works most of the time, but it breaks CPU core/thread topology enumeration.
  We have to change those CPU models to use level=4 to fix the bug.

- This can happen with plain CPU feature bits, too, not just level:
  sometimes real-world CPU models have a feature that is not supported
  by Qemu+KVM yet, but when the kernel and Qemu finally start to
  support it, we may want to enable it on existing CPU models. Sometimes
  a model simply has the wrong set of feature bits, and we have to fix
  it to have the right set of features.
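
To illustrate the level=2 example: "level" is the highest basic CPUID leaf the guest is told it may query, so a guest limited to level=2 cannot read leaf 4. The Python sketch below only models that check. The helper names are hypothetical, and the "Hypothetical" entry is an invented model added for contrast.

```python
def leaf_visible(level, leaf):
    """A basic CPUID leaf is usable only if it does not exceed the max level."""
    return leaf <= level


def models_needing_fix(levels, required_leaf=4):
    """Names of models whose level hides the required leaf."""
    return sorted(
        name for name, level in levels.items()
        if not leaf_visible(level, required_leaf)
    )


# Levels as described in the text, plus one invented model for contrast.
levels = {"Conroe": 2, "Penryn": 2, "Nehalem": 2, "Hypothetical": 4}
print(models_needing_fix(levels))  # ['Conroe', 'Nehalem', 'Penryn']
```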

But if we simply change the existing model definition, this will break
existing machines:

- Today, it would break on live migration, but that's relatively easy to
  fix: we have to migrate the CPUID information too, to make sure we
  won't change the CPU under the guest OS's feet.

- Even if we fix live migration, simple cold migration will make the
  guest OS see a different CPU after a reboot, and that's undesirable
  too. Even if the Qemu developers disagree with me and decide that this
  is not a problem, libvirt may want to expose a more stable CPU