Re: [libvirt] [Qemu-devel] Qemu, libvirt, and CPU models
On Thu, Mar 08, 2012 at 02:41:54PM +0100, Jiri Denemark wrote: On Wed, Mar 07, 2012 at 19:26:25 -0300, Eduardo Habkost wrote: Awesome. So, if Qemu and libvirt disagrees, libvirt will know that and add the necessary flags? That was my main worry. If disagreement between Qemu and libvirt is not a problem, it would make things much easier. ...but: Is that really implemented? I simply don't see libvirt doing that. I see code that calls -cpu ? to list the available CPU models, but no code calling -cpu ?dump, or parsing the Qemu CPU definition config file. I even removed some random flags from the Nehalem model on my machine (running Fedora 16), and no additional flags were added. Right, currently we only detect if Qemu knows requested CPU model and use another one if not. We should really start using something like -cpu ?dump. However, since qemu may decide to change parts of the model according to, e.g., machine type, we would need something more dynamic. Something like, hey Qemu, this is the machine type and CPU model we want to use, these are the features we want in this model, and we also want few additional features, please, tell us what the resulting CPU configuration is (if it is even possible to deliver such CPU on current host). And the result would be complete CPU model, which may of course be different from what the qemu's configuration file says. We could then use the result to update domain XML (in a way similar to how we handle machine types) so that we can guarantee the guest will always see the same CPU. Once CPU is updated, we could just check with Qemu if it can provide such CPU and start (or refuse to start) the domain. Does it seem reasonable? Absolutely. I would even advise libvirt to refrain from using -cpu ?dump, as its semantics are likely to change. We could go one step further and just write out a cpu.conf file that we load in QEMU with -loadconfig. Sounds good. 
Anyway, I want to make everything configurable on the cpudef config file configurable on the command-line too, so both options (command-line or config file) would work. I'd be afraid of hitting the command line length limit if we specified all CPU details in it :-) True. I am already afraid of hitting the command-line length limit with Qemu as-is right now. ;-) So, it looks like either I am missing something on my tests or libvirt is _not_ probing the Qemu CPU model definitions to make sure libvirt gets all the features it expects. Also, I would like to ask if you have suggestions to implement the equivalent of -cpu ?dump in a more friendly and extensible way. Would a QMP command be a good alternative? Would a command-line option with json output be good enough? I quite like the possible solution Anthony (or perhaps someone else) suggested some time ago (it may however be biased by my memory): qemu could provide a command line option that would take QMP command(s) and the result would be QMP response on stdout. We could use this interface for all kinds of probes with easily parsed output. This is another case where command-line limits could be hit, isn't it? Reading QMP commands from a normal chardev (a socket, or even stdio) is already available, we just need to make sure the query QMP without ever initializing a machine use-case is working and really supported by Qemu. So, about the above: the cases where libvirt thinks a feature is available but Qemu knows it is not available are sort-of OK today, because Qemu would simply refuse to start and an error message would be returned to the user. Really? In my experience qemu just ignored the feature it didn't know about without any error message and started the domain happily. It might be because libvirt doesn't use anything like -cpu ...,check or whatever is needed to make it fail. However, I think we should fix it. Correct, I was assuming that 'enforce' was being used. I forgot that libvirt doesn't use it today. 
I really think libvirt should be using 'enforce'; the only problem is that there may be cases where an existing VM was working (but with a result unpredictable by libvirt), and with 'enforce' it would stop working. This is very likely to happen when using the default qemu64 CPU model, which has some AMD-only CPUID:8000_h bits set, but everybody probably expects it to work on Intel CPU hosts too.

But what about the features that are not available on the host CPU, libvirt will think it can't be enabled, but that _can_ be enabled? x2apic seems to be the only case today, but we may have others in the future.

I think qemu could tell us about those features during the probe phase (my first paragraph) and we would either use them with policy='force' or something similar.

Yes, that's the conclusion I was trying to reach: we really need better CPU feature probing.

-- Eduardo
-- libvir-list mailing list libvir-list@redhat.com
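As a rough illustration of the policy='force' idea above, here is a minimal sketch that builds a libvirt-style <cpu> element carrying a forced feature. The element and attribute names follow the libvirt domain XML format; the helper name cpu_element and the way features are passed in are assumptions made for this example only, not libvirt code.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Feature policies accepted by the libvirt domain XML <feature> element.
VALID_POLICIES = {"force", "require", "optional", "disable", "forbid"}


def cpu_element(model: str, features: Dict[str, str]) -> ET.Element:
    """Build a <cpu> element for a model plus per-feature policies.

    `features` maps a feature name (e.g. "x2apic") to a policy string.
    This is a hypothetical helper for illustration only.
    """
    cpu = ET.Element("cpu", {"match": "exact"})
    ET.SubElement(cpu, "model").text = model
    for name, policy in sorted(features.items()):
        if policy not in VALID_POLICIES:
            raise ValueError("unknown feature policy: %r" % policy)
        ET.SubElement(cpu, "feature", {"policy": policy, "name": name})
    return cpu
```

Calling cpu_element("Westmere", {"x2apic": "force"}) and serializing the result with ET.tostring(..., encoding="unicode") gives a <cpu> element with one <model> child and one <feature> child. Whether 'force' is the right policy for features such as x2apic is exactly what the probing discussion would have to settle.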
Re: [libvirt] [Qemu-devel] Qemu, libvirt, and CPU models
On Wed, 2012-03-07 at 16:07 -0700, Eric Blake wrote:

On 03/07/2012 03:26 PM, Eduardo Habkost wrote:

Thanks a lot for the explanations, Daniel. Comments about specific items inline.

- How can we make sure there is no confusion between libvirt and Qemu about the CPU models? For example, what if cpu_map.xml says model 'Moo' has the flag 'foo' enabled, but Qemu disagrees? How do we guarantee that libvirt gets exactly what it expects from Qemu when it asks for a CPU model? We have -cpu ?dump today, but it's not the best interface we could have. Is there something in particular you are missing in the Qemu-libvirt interface that would help with that?

So, it looks like either I am missing something on my tests or libvirt is _not_ probing the Qemu CPU model definitions to make sure libvirt gets all the features it expects. Also, I would like to ask if you have suggestions to implement the equivalent of -cpu ?dump in a more friendly and extensible way. Would a QMP command be a good alternative? Would a command-line option with json output be good enough?

I'm not sure where we are using -cpu ?dump, but it sounds like we should be.

(Do we have any case of capability-querying being made using QMP before starting any actual VM, today?)

Right now, we have two levels of queries - the 'qemu -help' and 'qemu -device ?' output is gathered up front (we really need to patch things to cache that, rather than repeating it for every VM start).

Eric: In addition to VM start, it appears that the libvirt qemu driver also runs both the 32-bit and 64-bit qemu binaries 3 times each when fetching capabilities, which appears to happen whenever VM state is fetched. I noticed this on an openstack/nova compute node that queries VM state periodically; the queries seemed to be taking a long time.
stracing libvirtd during these queries showed this sequence for each query:

  6461 17:15:25.269464 execve(/usr/bin/qemu, [/usr/bin/qemu, -cpu, ?], [/* 2 vars */]) = 0
  6462 17:15:25.335300 execve(/usr/bin/qemu, [/usr/bin/qemu, -help], [/* 2 vars */]) = 0
  6463 17:15:25.393786 execve(/usr/bin/qemu, [/usr/bin/qemu, -device, ?, -device, pci-assign,?, -device, virtio-blk-pci,?], [/* 2 vars */]) = 0
  6466 17:15:25.841086 execve(/usr/bin/qemu-system-x86_64, [/usr/bin/qemu-system-x86_64, -cpu, ?], [/* 2 vars */]) = 0
  6468 17:15:25.906746 execve(/usr/bin/qemu-system-x86_64, [/usr/bin/qemu-system-x86_64, -help], [/* 2 vars */]) = 0
  6469 17:15:25.980520 execve(/usr/bin/qemu-system-x86_64, [/usr/bin/qemu-system-x86_64, -device, ?, -device, pci-assign,?, -device, virtio-blk-pci,?], [/* 2 vars */]) = 0

Seems to add about a second per VM running on the host. The periodic scan thus takes a couple of minutes on a heavily loaded host -- several 10s of VMs. Not a killer, but we'd like to eliminate it.

I see that libvirt does some level of caching of capabilities, checking the st_mtime of the binaries to detect changes. I haven't figured out when that caching comes into effect, but it doesn't prevent the execs above. So, I created a patch series that caches the results of parsing the output of these calls that I will post shortly for RFC. It eliminates most of such execs. I think it might obviate the existing capabilities caching, but I'm not sure. Haven't had time to look into it.

Later,
Lee Schermerhorn
HPCS

Then we start qemu with -S, query QMP, all before starting the guest (qemu -S is in fact necessary for setting some options that cannot be set in the current CLI but can be set via the monitor) - but right now that is the only point where we query QMP capabilities.

snip

-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
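To make the caching idea concrete, here is a minimal sketch of a probe cache keyed on the binary path, the probe arguments, and the binary's st_mtime. It is not the patch series mentioned above and not libvirt code; the names ProbeCache and run_probe are invented for this example, and it assumes the probe output changes only when the binary itself changes.

```python
import os
import subprocess
from typing import Callable, Dict, List, Tuple

Runner = Callable[[List[str]], str]


def run_probe(argv: List[str]) -> str:
    """Run one probe command (e.g. [binary, "-help"]) and return its stdout."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


class ProbeCache:
    """Cache probe output per (binary, args), invalidated by st_mtime."""

    def __init__(self, runner: Runner = run_probe) -> None:
        self._runner = runner
        # (binary, args) -> (mtime seen when probing, captured output)
        self._entries: Dict[Tuple[str, Tuple[str, ...]], Tuple[float, str]] = {}

    def query(self, binary: str, args: List[str]) -> str:
        mtime = os.stat(binary).st_mtime
        key = (binary, tuple(args))
        cached = self._entries.get(key)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # binary unchanged: reuse the earlier output
        output = self._runner([binary, *args])
        self._entries[key] = (mtime, output)
        return output
```

With one ProbeCache instance shared across queries, repeated calls such as cache.query("/usr/bin/qemu", ["-cpu", "?"]) would exec the binary only once until the file is replaced. The runner is injectable so the cache can be exercised without a qemu binary present.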
Re: [libvirt] [Qemu-devel] Qemu, libvirt, and CPU models
On Wed, Mar 07, 2012 at 19:26:25 -0300, Eduardo Habkost wrote: Awesome. So, if Qemu and libvirt disagrees, libvirt will know that and add the necessary flags? That was my main worry. If disagreement between Qemu and libvirt is not a problem, it would make things much easier. ...but: Is that really implemented? I simply don't see libvirt doing that. I see code that calls -cpu ? to list the available CPU models, but no code calling -cpu ?dump, or parsing the Qemu CPU definition config file. I even removed some random flags from the Nehalem model on my machine (running Fedora 16), and no additional flags were added. Right, currently we only detect if Qemu knows requested CPU model and use another one if not. We should really start using something like -cpu ?dump. However, since qemu may decide to change parts of the model according to, e.g., machine type, we would need something more dynamic. Something like, hey Qemu, this is the machine type and CPU model we want to use, these are the features we want in this model, and we also want few additional features, please, tell us what the resulting CPU configuration is (if it is even possible to deliver such CPU on current host). And the result would be complete CPU model, which may of course be different from what the qemu's configuration file says. We could then use the result to update domain XML (in a way similar to how we handle machine types) so that we can guarantee the guest will always see the same CPU. Once CPU is updated, we could just check with Qemu if it can provide such CPU and start (or refuse to start) the domain. Does it seem reasonable? We could go one step further and just write out a cpu.conf file that we load in QEMU with -loadconfig. Sounds good. Anyway, I want to make everything configurable on the cpudef config file configurable on the command-line too, so both options (command-line or config file) would work. 
I'd be afraid of hitting the command line length limit if we specified all CPU details in it :-)

So, it looks like either I am missing something on my tests or libvirt is _not_ probing the Qemu CPU model definitions to make sure libvirt gets all the features it expects. Also, I would like to ask if you have suggestions to implement the equivalent of -cpu ?dump in a more friendly and extensible way. Would a QMP command be a good alternative? Would a command-line option with json output be good enough?

I quite like the possible solution Anthony (or perhaps someone else) suggested some time ago (it may however be biased by my memory): qemu could provide a command line option that would take QMP command(s) and the result would be QMP response on stdout. We could use this interface for all kinds of probes with easily parsed output.

(Do we have any case of capability-querying being made using QMP before starting any actual VM, today?)

Not really. We only query QMP for the list of available QMP commands that we can use later, once the domain is running.

So, about the above: the cases where libvirt thinks a feature is available but Qemu knows it is not available are sort-of OK today, because Qemu would simply refuse to start and an error message would be returned to the user.

Really? In my experience qemu just ignored the feature it didn't know about without any error message and started the domain happily. It might be because libvirt doesn't use anything like -cpu ...,check or whatever is needed to make it fail. However, I think we should fix it.

But what about the features that are not available on the host CPU, libvirt will think it can't be enabled, but that _can_ be enabled? x2apic seems to be the only case today, but we may have others in the future.

I think qemu could tell us about those features during the probe phase (my first paragraph) and we would either use them with policy='force' or something similar.
Jirka
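For illustration, the fallback discussed in this thread (pick another model when Qemu does not know the requested one, then compensate with feature flags) could be sketched roughly as follows. This is not libvirt's actual algorithm: the distance metric, the function names, and the model tables used in the usage note are assumptions. Only the -cpu model,+flag,-flag syntax and the 'enforce' option are taken from Qemu.

```python
from typing import Dict, FrozenSet, List, Tuple


def nearest_model(
    wanted: FrozenSet[str], known: Dict[str, FrozenSet[str]]
) -> Tuple[str, List[str], List[str]]:
    """Pick the known model whose flag set differs least from `wanted`.

    Returns (model name, flags to add, flags to remove). Ties are broken
    by model name so the choice is deterministic.
    """
    best = min(sorted(known), key=lambda name: len(wanted ^ known[name]))
    add = sorted(wanted - known[best])
    remove = sorted(known[best] - wanted)
    return best, add, remove


def cpu_argument(
    model: str, add: List[str], remove: List[str], enforce: bool = False
) -> str:
    """Format the value for a -cpu option from a model and flag deltas."""
    parts = [model]
    parts += ["+" + flag for flag in add]
    parts += ["-" + flag for flag in remove]
    if enforce:
        parts.append("enforce")  # ask Qemu to fail rather than drop flags
    return ",".join(parts)
```

With two made-up models, ModelA = {sse2} and ModelB = {sse2, ssse3, sse4.1, sse4.2}, and a wanted set of {sse2, ssse3, sse4.1, x2apic}, the sketch selects ModelB and produces "ModelB,+x2apic,-sse4.2". The result is only as good as the `known` table, which is why probing Qemu's real definitions matters.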
Re: [libvirt] [Qemu-devel] Qemu, libvirt, and CPU models
On Wed, Mar 07, 2012 at 04:07:06PM -0700, Eric Blake wrote: (Do we have any case of capability-querying being made using QMP before starting any actual VM, today?) Right now, we have two levels of queries - the 'qemu -help' and 'qemu -device ?' output is gathered up front (we really need to patch things to cache that, rather than repeating it for every VM start). That's what I feared. I was wondering if we had a better machine-friendly interface to make some of these queries, today. Then we start qemu with -S, query QMP, all before starting the guest (qemu -S is in fact necessary for setting some options that cannot be set in the current CLI but can be set via the monitor) - but right now that is the only point where we query QMP capabilities. If QMP can alter the CPU model prior to the initial start of the guest, then that would be a sufficient interface. But I'm worried that once we start qemu, even with qemu -S, that it's too late to alter the CPU model in use by that guest, and that libvirt should instead start querying these things in advance. This is probably true, and I don't see this being changed in the near future. Even if we fix that for CPU initialization, there are many other initialization steps involved that would have to be reworked to allow all capability querying to be made to the same Qemu process that would run the VM later. We definitely want a machine-parseable construct, so querying over QMP rather than '-cpu ?dump' sounds like it might be nicer, but it would also be more work to set up libvirt to do a dry-run query of QMP capabilities without also starting a real guest. On the other hand, with QMP we would have a better interface that could be used for all other queries libvirt has to run. Instead of running Qemu multiple times for capability querying, just start a single Qemu process and make the capability queries using QMP. I don't know if this was discussed or considered before. 
But what about the features that are not available on the host CPU, libvirt will think it can't be enabled, but that _can_ be enabled? x2apic seems to be the only case today, but we may have others in the future.

That's where having an interface to probe qemu to see what capabilities are possible for any given cpu model would be worthwhile, so that libvirt can correlate the feature sets properly.

Yes. The issue currently is that many things don't depend just on static CPU model or machine-type definitions; libvirt also has to know which capabilities the kernel provides and which ones Qemu will really be able to use. It will take a long time to fix this. Some features are simply not configurable yet, even on the command-line. They are just automatically used by Qemu when provided by the kernel.

-- Eduardo
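As a sketch of the "single Qemu process, many QMP queries" idea, the following shows the basic QMP exchange: read the greeting, send qmp_capabilities, then issue commands and read replies as line-delimited JSON. The qmp_capabilities and query-commands commands exist in QMP; the class name QmpSession and the helper list_commands are assumptions for this example, and a CPU-model query command is deliberately left out because the thread does not define one.

```python
import json
import socket
from typing import Any, Dict, List


class QmpSession:
    """Minimal QMP client over an already connected socket."""

    def __init__(self, sock: socket.socket) -> None:
        self._file = sock.makefile("rw", encoding="utf-8", newline="\n")
        greeting = self._read()
        if "QMP" not in greeting:
            raise RuntimeError("peer did not send a QMP greeting")
        # Leave capabilities negotiation mode before sending real commands.
        self.execute("qmp_capabilities")

    def _read(self) -> Dict[str, Any]:
        line = self._file.readline()
        if not line:
            raise ConnectionError("QMP peer closed the connection")
        return json.loads(line)

    def execute(self, command: str) -> Any:
        """Send one command and return the content of its 'return' member."""
        self._file.write(json.dumps({"execute": command}) + "\n")
        self._file.flush()
        while True:
            message = self._read()
            if "event" in message:
                continue  # asynchronous events may arrive between replies
            if "error" in message:
                raise RuntimeError(message["error"].get("desc", "QMP error"))
            return message["return"]


def list_commands(session: QmpSession) -> List[str]:
    """Return the sorted names of the commands the peer reports."""
    return sorted(entry["name"] for entry in session.execute("query-commands"))
```

A caller would connect a UNIX or TCP socket to a Qemu started with a QMP monitor and pass it to QmpSession. Whether Qemu can answer such queries without initializing a machine is the open question raised in this thread, so this only shows the wire exchange.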
Re: [libvirt] [Qemu-devel] Qemu, libvirt, and CPU models
Thanks a lot for the explanations, Daniel. Comments about specific items inline.

On Wed, Mar 07, 2012 at 02:18:28PM +, Daniel P. Berrange wrote:

I have two main points I would like to understand/discuss:

1) The relationship between libvirt's cpu_map.xml and the Qemu CPU model definitions.

We have several areas of code in which we use CPU definitions:

- Reporting the host CPU definition (virsh capabilities)
- Calculating host CPU compatibility / baseline definitions
- Checking guest / host CPU compatibility
- Configuring the guest CPU definition

libvirt targets multiple platforms, and our CPU handling code is designed to be common and shareable across all the libvirt drivers: VMWare, Xen, KVM, LXC, etc. Obviously for container based virt, only the host side of things is relevant.

The libvirt CPU XML definition consists of:

- Model name
- Vendor name
- Zero or more feature flags added/removed

A model name is basically just an alias for a bunch of feature flags, so that the CPU XML definitions are a) reasonably short b) have some sensible default baselines. The cpu_map.xml is the database of the CPU models that libvirt supports. We use this database to transform the CPU definition from the guest XML, into the hypervisor's own format.

Understood. Makes sense.

As luck would have it, the cpu_map.xml file contents match what QEMU has. This need not be the case though. If there is a model in the libvirt cpu_map.xml that QEMU doesn't know, we'll just pick the nearest matching QEMU CPU model and specify the feature flags to compensate.

Awesome. So, if Qemu and libvirt disagrees, libvirt will know that and add the necessary flags? That was my main worry. If disagreement between Qemu and libvirt is not a problem, it would make things much easier.

...but:

Is that really implemented? I simply don't see libvirt doing that. I see code that calls -cpu ? to list the available CPU models, but no code calling -cpu ?dump, or parsing the Qemu CPU definition config file.
I even removed some random flags from the Nehalem model on my machine (running Fedora 16), and no additional flags were added.

We could go one step further and just write out a cpu.conf file that we load in QEMU with -loadconfig.

Sounds good. Anyway, I want to make everything configurable on the cpudef config file configurable on the command-line too, so both options (command-line or config file) would work.

On Xen we would use the cpu_map.xml to generate the CPUID masks that Xen expects. Similarly for VMWare.

2) How we could properly allow CPU models to be changed without breaking existing virtual machines?

What is the scope of changes expected to CPU models ?

We already have at least four cases, affecting different fields of the CPU definitions:

A) Adding/removing flags. Examples:

- When the current set of flags is simply incorrect. See commit df07ec56 on qemu.git, where lots of flags that weren't supposed to be set were removed from some models.
- When a new feature is now supported by Qemu+KVM and it's present on real-world CPUs, but our CPU definitions don't have the feature yet. e.g. x2apic, that is present on real-world Westmere CPUs but disabled on Qemu Westmere CPU definition.

B) Changing level for some reason. One example: Conroe, Penryn and Nehalem have level=2, but need to have level=4 to make CPU topology work, so they have to be changed.

C) Enabling/disabling or overriding specific CPUID leafs. This isn't even configurable on the config files today, but I plan to allow it to be configured, otherwise users won't be able to enable/disable some features that are probed by the guest by simply looking at a CPUID leaf (e.g. the 0xA CPUID leaf that contains PMU information).
The PMU leaf is an example where a CPU looks different by simply using a different Qemu or kernel version, and libvirt can't control the visibility of that feature to the guest:

- If you start a Virtual Machine using Qemu-1.0 today, with the pc-1.0 machine-type, the PMU CPUID leaf won't be visible to the guest (as Qemu-1.0 doesn't support the PMU leaf).
- If you start a Virtual Machine using Qemu-1.1 in the future, using the pc-1.1 machine-type, with a recent kernel, the PMU CPUID leaf _will_ be visible to the guest (as the qemu.git master branch supports it).

Up to now, it is OK because the machine-type in theory helps us control the feature, but we have a problem in this case:

- If you start a Virtual Machine using Qemu-1.1 in the future, using the pc-1.1 machine-type, using exactly the same command-line as above, but using an old kernel, the PMU CPUID leaf will _not_ be visible to the guest.

1) Qemu and cpu_map.xml

I would like to understand how cpu_map.xml is supposed to be used, and how it is supposed to interact with the CPU model definitions provided by Qemu. More precisely:

1.1) Do we want to eliminate the
Re: [libvirt] [Qemu-devel] Qemu, libvirt, and CPU models
On 03/07/2012 03:26 PM, Eduardo Habkost wrote:

Thanks a lot for the explanations, Daniel. Comments about specific items inline.

- How can we make sure there is no confusion between libvirt and Qemu about the CPU models? For example, what if cpu_map.xml says model 'Moo' has the flag 'foo' enabled, but Qemu disagrees? How do we guarantee that libvirt gets exactly what it expects from Qemu when it asks for a CPU model? We have -cpu ?dump today, but it's not the best interface we could have. Is there something in particular you are missing in the Qemu-libvirt interface that would help with that?

So, it looks like either I am missing something on my tests or libvirt is _not_ probing the Qemu CPU model definitions to make sure libvirt gets all the features it expects. Also, I would like to ask if you have suggestions to implement the equivalent of -cpu ?dump in a more friendly and extensible way. Would a QMP command be a good alternative? Would a command-line option with json output be good enough?

I'm not sure where we are using -cpu ?dump, but it sounds like we should be.

(Do we have any case of capability-querying being made using QMP before starting any actual VM, today?)

Right now, we have two levels of queries - the 'qemu -help' and 'qemu -device ?' output is gathered up front (we really need to patch things to cache that, rather than repeating it for every VM start). Then we start qemu with -S, query QMP, all before starting the guest (qemu -S is in fact necessary for setting some options that cannot be set in the current CLI but can be set via the monitor) - but right now that is the only point where we query QMP capabilities.

If QMP can alter the CPU model prior to the initial start of the guest, then that would be a sufficient interface. But I'm worried that once we start qemu, even with qemu -S, that it's too late to alter the CPU model in use by that guest, and that libvirt should instead start querying these things in advance.
We definitely want a machine-parseable construct, so querying over QMP rather than '-cpu ?dump' sounds like it might be nicer, but it would also be more work to set up libvirt to do a dry-run query of QMP capabilities without also starting a real guest.

But what about the features that are not available on the host CPU, libvirt will think it can't be enabled, but that _can_ be enabled? x2apic seems to be the only case today, but we may have others in the future.

That's where having an interface to probe qemu to see what capabilities are possible for any given cpu model would be worthwhile, so that libvirt can correlate the feature sets properly.

That answers most of my questions about how libvirt would handle changes on CPU models. Now we need good mechanisms that allow libvirt to do that. If you have specific requirements or suggestions in mind, please let me know.

I'll let others chime in with more responses, but I do appreciate you taking the time to coordinate this.

-- Eric Blake ebl...@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org