[RFC] CPUID usage for interaction between Hypervisors and Linux.
Hi,

Please find below the proposal for the generic use of the CPUID space allotted to hypervisors. Apart from this CPUID space, it is worth noting that Intel & AMD also reserve the MSRs from 0x40000000 - 0x400000FF for software use. Though the proposal doesn't talk about MSRs right now, we should be aware of these reservations, as we may want to extend the way we use CPUID to MSR usage as well.

While we are at it, we also think we should form a group which has at least one person representing each of the hypervisors interested in generalizing the hypervisor CPUID space for Linux guest OS. This group will be informed whenever a new CPUID leaf from the generic space is to be used. This would help avoid duplicate definitions of a CPUID semantic by two different hypervisors. I think most people are subscribed to LKML or the virtualization lists, and we should use these lists as a platform to decide on things.

Thanks,
Alok

---
Hypervisor CPUID Interface Proposal
---

Intel & AMD have reserved CPUID levels 0x40000000 - 0x400000FF for software use. Hypervisors can use these levels to provide an interface to pass information from the hypervisor to the guest running inside a virtual machine.

This proposal defines a standard framework for the way in which the Linux and hypervisor communities incrementally define this CPUID space.

(This proposal may be adopted by other guest OSes. However, that is not a requirement because a hypervisor can expose a different CPUID interface depending on the guest OS type that is specified by the VM configuration.)

Hypervisor Present Bit:
Bit 31 of ECX of CPUID leaf 0x1.

This bit has been reserved by Intel & AMD for use by hypervisors, and indicates the presence of a hypervisor. Virtual CPUs (hypervisors) set this bit to 1 and physical CPUs (all existing and future CPUs) set this bit to zero. Guest software can probe this bit to detect whether it is running inside a virtual machine.

Hypervisor CPUID Information Leaf:
Leaf 0x40000000.

This leaf returns the CPUID leaf range supported by the hypervisor and the hypervisor vendor signature.

# EAX: The maximum input value for CPUID supported by the hypervisor.
# EBX, ECX, EDX: Hypervisor vendor ID signature.

Hypervisor Specific Leaves:
Leaf range 0x40000001 - 0x4000000F.

These CPUID leaves are reserved as hypervisor-specific leaves. The semantics of these 15 leaves depend on the signature read from the Hypervisor Information Leaf.

Generic Leaves:
Leaf range 0x40000010 - 0x400000FF.

The semantics of these leaves are consistent across all hypervisors. This allows the guest kernel to probe and interpret these leaves without checking for a hypervisor signature.

A hypervisor can indicate that a leaf or a leaf's field is unsupported by returning zero when that leaf or field is probed.

To avoid the situation where multiple hypervisors attempt to define the semantics for the same leaf during development, we can partition the generic leaf space to allow each hypervisor to define a part of the generic space.

For instance:
  VMware could define 0x4000001X
  Xen could define 0x4000002X
  KVM could define 0x4000003X
and so on...

Note that hypervisors can implement any leaves that have been defined in the generic leaf space whenever common features can be found. For example, VMware hypervisors can implement leaves that have been defined in the KVM area 0x4000003X and vice versa.

The kernel can detect the support for a generic field inside leaf 0x400000XY using the following algorithm:

1. Get EAX from Leaf 0x40000000, Hypervisor CPUID information. EAX returns the maximum input value for the hypervisor CPUID space.

   If EAX < 0x400000XY, then the field is not available.

2. Else, extract the field from the target Leaf 0x400000XY by doing cpuid(0x400000XY).

   If (field == 0), this feature is unsupported/unimplemented by the hypervisor.

The kernel should handle this case gracefully so that a hypervisor is never required to support or implement any particular generic leaf.

Definition of the Generic CPUID space.

Leaf 0x40000010, Timing Information.

VMware has defined the first generic leaf to provide timing information. This leaf returns the current TSC frequency and current Bus frequency in kHz.

# EAX: (Virtual) TSC frequency in kHz.
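
As an illustration of the probing algorithm in the proposal above, here is a minimal Python sketch. It cannot execute the CPUID instruction, so `cpuid` is a stand-in callable returning `(eax, ebx, ecx, edx)`; the fake table, the signature string "ExampleHVsig", and the 2,400,000 kHz value are all made up for the example. Leaf numbers are written out as full 32-bit values (0x40000000 for the information leaf, 0x40000010 for the timing leaf).

```python
import struct

HYPERVISOR_PRESENT = 1 << 31   # CPUID leaf 0x1, ECX bit 31
HV_INFO_LEAF = 0x40000000      # EAX = max leaf, EBX/ECX/EDX = signature
TIMING_LEAF = 0x40000010       # first generic leaf; EAX = TSC frequency in kHz
EAX = 0                        # index into the (eax, ebx, ecx, edx) tuple


def decode_signature(ebx, ecx, edx):
    """Concatenate EBX, ECX, EDX (little-endian) into the 12-byte signature."""
    return struct.pack("<III", ebx, ecx, edx).decode("ascii")


def probe_generic_field(cpuid, leaf, reg):
    """Return the field value, or None if it is absent or unsupported."""
    if not cpuid(0x1)[2] & HYPERVISOR_PRESENT:
        return None                      # not running under a hypervisor
    if cpuid(HV_INFO_LEAF)[EAX] < leaf:
        return None                      # step 1: leaf beyond the max leaf
    value = cpuid(leaf)[reg]
    return value if value != 0 else None  # step 2: zero means unsupported


def make_fake_cpuid(table):
    """Hypothetical stand-in for the CPUID instruction; unknown leaves read 0."""
    return lambda leaf: table.get(leaf, (0, 0, 0, 0))


fake_cpuid = make_fake_cpuid({
    0x1: (0, 0, HYPERVISOR_PRESENT, 0),
    HV_INFO_LEAF: (TIMING_LEAF, *struct.unpack("<III", b"ExampleHVsig")),
    TIMING_LEAF: (2_400_000, 0, 0, 0),
})

print(decode_signature(*fake_cpuid(HV_INFO_LEAF)[1:]))
print(probe_generic_field(fake_cpuid, TIMING_LEAF, EAX))
print(probe_generic_field(fake_cpuid, 0x40000011, EAX))
```

Returning `None` for both "leaf out of range" and "field is zero" mirrors the proposal's requirement that the kernel treat an unimplemented leaf gracefully rather than as an error.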
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
Alok Kataria wrote:
> (This proposal may be adopted by other guest OSes. However, that is not a requirement because a hypervisor can expose a different CPUID interface depending on the guest OS type that is specified by the VM configuration.)

Excuse me, but that is blatantly idiotic. Expecting the user to have to configure a VM to match the target OS is *exactly* as stupid as expecting the user to reconfigure the BIOS. It's totally the wrong thing to do.

	-hpa
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
Alok Kataria wrote:
> Hypervisor CPUID Interface Proposal
> ---
> Intel & AMD have reserved CPUID levels 0x40000000 - 0x400000FF for software use. Hypervisors can use these levels to provide an interface to pass information from the hypervisor to the guest running inside a virtual machine.
>
> This proposal defines a standard framework for the way in which the Linux and hypervisor communities incrementally define this CPUID space.

I also observe that your proposal provides no means of positive identification, i.e. that a hypervisor actually conforms to your proposal.

	-hpa
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
Alok Kataria wrote:
> [...]
> To avoid the situation where multiple hypervisors attempt to define the semantics for the same leaf during development, we can partition the generic leaf space to allow each hypervisor to define a part of the generic space.
>
> For instance:
>   VMware could define 0x4000001X
>   Xen could define 0x4000002X
>   KVM could define 0x4000003X
> and so on...

No, we're not getting anywhere. This is an outright broken idea. The space is too small to be able to chop up in this way, and the number of vendors too large to be able to do it without having a central oversight.

The only way this can work is by having explicit positive identification of each group of leaves with a signature. If there's a recognizable signature, then you can inspect the rest of the group; if not, then you can't. That way, you can avoid any leaf usage which doesn't conform to this model, and you can also simultaneously support multiple hypervisor ABIs. It also accommodates existing hypervisor use of this leaf space, even if they currently use a fixed location within it.

A concrete counter-proposal:

The space 0x40000000-0x400000ff is reserved for hypervisor usage.

This region is divided into 16 16-leaf blocks. Each block has the structure:

  0x400000x0:
    eax: max used leaf within the leaf block (max 0x400000xf)
    e[bcd]x: leaf block signature. This may be a hypervisor-specific signature, or a generic signature, depending on the contents of the block

A guest may search for any supported Hypervisor ABIs by inspecting each leaf at 0x400000x0 for a known signature, and then may choose its mode of operation accordingly. It must ignore any unknown signatures, and not touch any of the leaves within an unknown leaf block.

Hypervisor vendors who want to add a
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>> No, we're not getting anywhere. This is an outright broken idea. The space is too small to be able to chop up in this way, and the number of vendors too large to be able to do it without having a central oversight.
>
> I suspect we can get a larger number space if we ask Intel & AMD. In fact, I think we should request that the entire 0x40xxxxxx numberspace is assigned to virtualization *anyway*.

Yes, that would be good. In that case I'd revise my proposal to make each leaf block 256 leaves instead of 16. But it still needs to be a proper enumeration with signatures, rather than assigning fixed points in that space to specific interfaces.

    J
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
Jeremy Fitzhardinge wrote:
> No, we're not getting anywhere. This is an outright broken idea. The space is too small to be able to chop up in this way, and the number of vendors too large to be able to do it without having a central oversight.

I suspect we can get a larger number space if we ask Intel & AMD. In fact, I think we should request that the entire 0x40xxxxxx numberspace is assigned to virtualization *anyway*.

	-hpa
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
H. Peter Anvin wrote:
> With a sufficiently large block, we could use fixed points, e.g. by having each vendor create interfaces in the 0x40SSSSXX range, where SSSS is the PCI ID they use for PCI devices.

Sure, you could do that, but you'd still want to have a signature in 0x40SSSS00 to positively identify the chunk. And what if you wanted more than 256 leaves?

> Note that I said "create interfaces". What matters here is who specified the interface -- for "what hypervisor is this", just use 0x40000000 and disambiguate based on that.

"What hypervisor is this?" isn't a very interesting question; if you're even asking it then it suggests that something has gone wrong. It's much more useful to ask "what interfaces does this hypervisor support?", and enumerating a smallish range of well-known leaves looking for signatures is the simplest way to do that. (We could use signatures derived from the PCI vendor IDs, which would help with managing that namespace.)

    J
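
The signature enumeration argued for in this message can be sketched in a few lines of Python. This is only an illustration of the 16-block, 16-leaf layout from the counter-proposal: `cpuid` is a stand-in callable returning `(eax, ebx, ecx, edx)`, and the three 12-byte signatures and the interface names are hypothetical. A block header that reports EAX as zero is treated as "header leaf only", which is an assumption of the sketch rather than part of the proposal.

```python
import struct

HV_BASE = 0x40000000
BLOCK_LEAVES = 16      # leaves per block in the counter-proposal
NUM_BLOCKS = 16        # 0x40000000 .. 0x400000ff


def scan_leaf_blocks(cpuid, known_signatures):
    """Map each recognized interface name to (base leaf, max usable leaf)."""
    found = {}
    for block in range(NUM_BLOCKS):
        base = HV_BASE + block * BLOCK_LEAVES
        eax, ebx, ecx, edx = cpuid(base)
        signature = struct.pack("<III", ebx, ecx, edx)
        name = known_signatures.get(signature)
        if name is None:
            continue                      # unknown block: do not touch it
        last = base + BLOCK_LEAVES - 1
        # Clamp EAX into the block; EAX == 0 leaves only the header leaf.
        found[name] = (base, min(max(eax, base), last))
    return found


def block_header(max_leaf, signature):
    """Build a fake header leaf: EAX = max leaf, EBX/ECX/EDX = signature."""
    return (max_leaf, *struct.unpack("<III", signature))


# Hypothetical CPUID contents used only for this example.
table = {
    0x40000000: block_header(0x40000003, b"ExampleHVsig"),
    0x40000010: block_header(0x40000012, b"UnknownHVxyz"),
    0x40000030: block_header(0, b"GenericABIv1"),
}
fake_cpuid = lambda leaf: table.get(leaf, (0, 0, 0, 0))
known = {b"ExampleHVsig": "example-hv", b"GenericABIv1": "generic"}

found = scan_leaf_blocks(fake_cpuid, known)
for name, (base, max_leaf) in found.items():
    print(f"{name}: base={base:#x} max_leaf={max_leaf:#x}")
```

After discovery, each interface is addressed as a base leaf plus offsets, so the guest code for a given interface does not depend on which block it was found in.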
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
H. Peter Anvin wrote:
> What you'd want, at least, is a standard CPUID identification and range leaf at the top. 256 leaves is a *lot*, though; I'm not saying one couldn't run out, but it'd be hard. Keep in mind that for large objects there are "counting" CPUID levels, as much as I personally dislike them, and one could easily argue that if you're doing something that would require anywhere near 256 leaves you probably are storing bulk data that belongs elsewhere.

I agree, but it just makes the proposal a bit more brittle.

> Of course, if we had some kind of central authority assigning 8-bit IDs that would be even better, especially since there are tools in the field which already scan on 64K boundaries. I don't know, though, how likely it is that we'll have to deal with 256 hypervisors.

I'm assuming that the likelihood of getting all possible vendors - current and future - to agree to a scheme like this is pretty small. We need to come up with something that will work well when there are non-cooperative parties to deal with.

> I agree completely, of course (except that "what hypervisor is this" still has limited usage, especially when it comes to dealing with bug workarounds. Similar to the way we use CPU vendor IDs and stepping numbers for physical CPUs.)

I guess. It's certainly useful to be able to identify the hypervisor for bug reporting and just general status information. But making functional changes on that basis should be a last resort.

    J
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
Jeremy Fitzhardinge wrote:
> No, we're not getting anywhere. This is an outright broken idea. The space is too small to be able to chop up in this way, and the number of vendors too large to be able to do it without having a central oversight.
>
> The only way this can work is by having explicit positive identification of each group of leaves with a signature. If there's a recognizable signature, then you can inspect the rest of the group; if not, then you can't. That way, you can avoid any leaf usage which doesn't conform to this model, and you can also simultaneously support multiple hypervisor ABIs. It also accommodates existing hypervisor use of this leaf space, even if they currently use a fixed location within it.
>
> A concrete counter-proposal:

Mmm, cpuid bikeshedding :-)

> The space 0x40000000-0x400000ff is reserved for hypervisor usage.
>
> This region is divided into 16 16-leaf blocks. Each block has the structure:
>
>   0x400000x0:
>     eax: max used leaf within the leaf block (max 0x400000xf)

Why even bother with this? It doesn't seem necessary in your proposal.

Regards,

Anthony Liguori
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
Anthony Liguori wrote:
> Mmm, cpuid bikeshedding :-)

My shade of blue is better.

>> The space 0x40000000-0x400000ff is reserved for hypervisor usage.
>>
>> This region is divided into 16 16-leaf blocks. Each block has the structure:
>>
>>   0x400000x0:
>>     eax: max used leaf within the leaf block (max 0x400000xf)
>
> Why even bother with this? It doesn't seem necessary in your proposal.

It allows someone to incrementally add things to their block in a fairly orderly way. But more importantly, it's the prevailing idiom, and the existing and proposed cpuid schemes already do this, so they'd fit in as-is.

    J
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>> Mmm, cpuid bikeshedding :-)
>
> My shade of blue is better.
>
>>> The space 0x40000000-0x400000ff is reserved for hypervisor usage.
>>>
>>> This region is divided into 16 16-leaf blocks. Each block has the structure:
>>>
>>>   0x400000x0:
>>>     eax: max used leaf within the leaf block (max 0x400000xf)
>>
>> Why even bother with this? It doesn't seem necessary in your proposal.
>
> It allows someone to incrementally add things to their block in a fairly orderly way. But more importantly, it's the prevailing idiom, and the existing and proposed cpuid schemes already do this, so they'd fit in as-is.

We just leave eax as zero. It wouldn't be that upsetting to change this, as it would only keep new guests from working on older KVMs.

However, I see little incentive to change anything unless there's something compelling that we would get in return. Since we're only talking about Linux guests, it's just as easy for us to add things to our paravirt_ops implementation as it would be to add things using this new model.

If this was something that other guests were all agreeing to support (even if it was just the BSDs and OpenSolaris), then there may be value to it. Right now, I see no real value in changing the status quo.

Regards,

Anthony Liguori

> J
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
Alok Kataria wrote:
> On Wed, 2008-10-01 at 11:04 -0700, Jeremy Fitzhardinge wrote:
>
> 2. Divergence in the interface provided by the hypervisors:
> The reason we brought up a flat hierarchy is because we think we should be moving towards an approach where the guest code doesn't diverge too much when running under different hypervisors. That is, the guest essentially does the same thing if it's running on, say, Xen or VMware.
>
> This design, IMO, will take us a step backward to what we have already seen with paravirt ops. Each hypervisor (mostly) defines its own cpuid block, and the guest correspondingly needs to have code to handle each of these cpuid blocks, with these blocks mostly being exclusive.

What's wrong with what we have in paravirt_ops? Just agreeing on CPUID doesn't help very much. You still need a mechanism for doing hypercalls to implement anything meaningful. We aren't going to agree on a hypercall mechanism. KVM uses direct hypercall instructions, Xen uses a hypercall page, VMware uses VMI, Hyper-V uses MSR writes. We all have already defined the hypercall namespace in a certain way.

We've already gone down the road of trying to make standard paravirtual interfaces (via virtio). No one was sufficiently interested in collaborating. I don't see why other paravirtualizations are going to be much different.

Regards,

Anthony Liguori
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
* Anthony Liguori wrote:
> We've already gone down the road of trying to make standard paravirtual interfaces (via virtio). No one was sufficiently interested in collaborating. I don't see why other paravirtualizations are going to be much different.

The point is to be able to support those interfaces. Presently a Linux guest will test and find out which HV it's running on, and adapt. Another guest will fail to enlighten itself, and perf will suffer...yadda, yadda.

thanks,
-chris
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
Alok Kataria wrote:
> 1. Kernel complexity: Just thinking about the complexity that this will put in the kernel to handle these multiple ABI signatures and scanning all of these leaf blocks is difficult to digest.

The scanning for the signatures is trivial; it's not a significant amount of code. Actually implementing them is a different matter, but that's the same regardless of where they are placed or how they're discovered. After discovery it's the same either way: there's a leaf base with offsets from it.

> 2. Divergence in the interface provided by the hypervisors:
> The reason we brought up a flat hierarchy is because we think we should be moving towards an approach where the guest code doesn't diverge too much when running under different hypervisors. That is, the guest essentially does the same thing if it's running on, say, Xen or VMware.

I guess, but the bulk of the uses of this stuff are going to be hypervisor-specific. You're hard-pressed to come up with any other generic uses beyond tsc. In general, if a hypervisor is going to put something in a special cpuid leaf, it's because there's no other good way to represent it. Generic things are generally going to appear as an emulated piece of the virtualized platform, in ACPI, DMI, a hardware-defined cpuid leaf, etc...

> 3. Is there a need to do all this over-engineering:
> Aren't we over-engineering a simple interface here? The point is, there are right now 256 cpuid leaves; do we realistically think we are ever going to exhaust all these leaves? We are really surprised to learn that people think this space is too small. It would be interesting to know what uses you might want to put cpuid to.

Look, if you want to propose a way to use that cpuid space in a reasonably flexible way that allows it to be used as the need arises, then we can talk about it. But I think your proposal is a poor way to achieve those ends.

If you want blessing for something that you've already implemented and shipped, well, you don't need anyone's blessing for that.

    J
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
Alok Kataria wrote:
> Your explanation below answers the question you raised, the problem being that we need to have support for each of these different hypercall mechanisms in the kernel. I understand that this was the correct thing to do at that moment. But do we want to go the same way again for CPUID, when we can make it generic (flat enough) for anybody to use it in the same manner and expose a generic interface to the kernel?

But what sort of information can be stored in cpuid that's actually useful? Right now we just use it in KVM for feature bits. Most of the stuff that's interesting is stored in shared memory, because a guest can read that without taking a vmexit or making a hypercall.

We can all agree upon a common mechanism for doing something, but if no one is using that mechanism to do anything significant, what purpose does it serve?

Regards,

Anthony Liguori
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
Chris Wright wrote:
> * Anthony Liguori wrote:
>> We've already gone down the road of trying to make standard paravirtual interfaces (via virtio). No one was sufficiently interested in collaborating. I don't see why other paravirtualizations are going to be much different.
>
> The point is to be able to support those interfaces. Presently a Linux guest will test and find out which HV it's running on, and adapt. Another guest will fail to enlighten itself, and perf will suffer...yadda, yadda.

Agreeing on CPUID does not get us close at all to having shared interfaces for paravirtualization. As I said in another note, there are more fundamental things that we differ on (like hypercall mechanism) that are going to make that challenging.

We already are sharing code, when appropriate (see the Xen/KVM PV clock interface).

Regards,

Anthony Liguori

> thanks,
> -chris
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
Jeremy Fitzhardinge wrote:
> Alok Kataria wrote:
>
> I guess, but the bulk of the uses of this stuff are going to be hypervisor-specific. You're hard-pressed to come up with any other generic uses beyond tsc.

And arguably, storing TSC frequency in CPUID is a terrible interface, because the TSC frequency can change any time a guest is entered. It really should be a shared memory area so that a guest doesn't have to vmexit to read it (like it is with the Xen/KVM paravirt clock).

Regards,

Anthony Liguori

> In general, if a hypervisor is going to put something in a special cpuid leaf, it's because there's no other good way to represent it. Generic things are generally going to appear as an emulated piece of the virtualized platform, in ACPI, DMI, a hardware-defined cpuid leaf, etc...
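
To make the shared-memory argument concrete, the following Python sketch shows a version-counter protocol of the kind such a shared area can use so the guest gets a consistent value without trapping to the hypervisor. It is loosely inspired by the paravirt clock idea mentioned above, but the class, field names, and values are illustrative assumptions, not the actual Xen/KVM ABI, and a real implementation would also need memory barriers that Python does not model.

```python
class SharedTimeRecord:
    """Hypothetical record living in memory shared by host and guest."""

    def __init__(self, tsc_khz):
        self.version = 0        # even: stable, odd: update in progress
        self.tsc_khz = tsc_khz

    def host_update(self, tsc_khz):
        """Host side: bump to odd, write the payload, bump back to even."""
        self.version += 1
        self.tsc_khz = tsc_khz
        self.version += 1


def guest_read_tsc_khz(record):
    """Guest side: retry until a stable, unchanged version brackets the read."""
    while True:
        before = record.version
        if before & 1:
            continue            # host is mid-update; try again
        khz = record.tsc_khz
        if record.version == before:
            return khz          # no update raced with the read


record = SharedTimeRecord(tsc_khz=2_400_000)
print(guest_read_tsc_khz(record))

# The host changes the frequency (for example after migrating the guest);
# the guest sees the new value on its next read, with no CPUID exit needed.
record.host_update(1_800_000)
print(guest_read_tsc_khz(record))
```

The contrast with the CPUID leaf is that the value can be republished at any time by the host, and every guest read is an ordinary memory access.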
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
On Wed, 2008-10-01 at 14:34 -0700, Anthony Liguori wrote:
> Jeremy Fitzhardinge wrote:
> > Alok Kataria wrote:
> >
> > I guess, but the bulk of the uses of this stuff are going to be hypervisor-specific. You're hard-pressed to come up with any other generic uses beyond tsc.
>
> And arguably, storing TSC frequency in CPUID is a terrible interface, because the TSC frequency can change any time a guest is entered. It really should be a shared memory area so that a guest doesn't have to vmexit to read it (like it is with the Xen/KVM paravirt clock).

It's not terrible, it's actually brilliant. TSC is part of the processor architecture; the processor should have a way to tell us what speed it is.

Having a TSC with no interface to determine the frequency is a terrible design flaw. This is what caused the problem in the first place.

And now we're trying to fix with software wizardry what should have been done in hardware in the first place. Once again, para-virtualization is basically useless. We can't agree on a solution without over-designing some complex system with interface signatures and multi-vendor cooperation and nonsense. Solve the non-virtualized problem and the virtualized problem goes away.

Jun, you work at Intel. Can you ask for a new architecturally defined MSR that returns the TSC frequency? Not a virtualization-specific MSR. A real MSR that would exist on physical processors. The TSC started as an MSR anyway. There should be another MSR that tells the frequency. If it's hard to do in hardware, it can be a write-once MSR that gets initialized by the BIOS. It's really a very simple solution to a very common problem. Other MSRs are dedicated to bus speed and so on; this seems remarkably similar.

Once the physical problem is solved, the virtualized problem doesn't even exist. We simply add support for the newly defined MSR and voilà. Other chipmakers will probably agree it's a good idea and go along with it too, and in the meantime, reading a non-existent MSR is a fairly harmlessly handled #GP.

I realize it's the wrong thing for us now, but long term, it's the only architecturally 'correct' approach. You can even extend it to have visible TSC frequency changes clocked via performance counter events (and then get interrupts on those events if you so wish), solving the dynamic problem too.

Paravirtualization is a symptom of an architectural problem. We should always be trying to fix the architecture first.

Zach
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
Zachary Amsden wrote:
> Jun, you work at Intel. Can you ask for a new architecturally defined MSR that returns the TSC frequency? Not a virtualization-specific MSR. A real MSR that would exist on physical processors. The TSC started as an MSR anyway. There should be another MSR that tells the frequency. If it's hard to do in hardware, it can be a write-once MSR that gets initialized by the BIOS. It's really a very simple solution to a very common problem. Other MSRs are dedicated to bus speed and so on; this seems remarkably similar.

Ah, if it was only that simple. Transmeta actually did this, but it's not as useful as you think.

There are at least three crystals in modern PCs: one at 32.768 kHz (for the RTC), one at 14.31818 MHz (PIT, PMTMR and HPET), and one at a higher frequency (often 200 MHz).

All the main data distribution clocks in the system are derived from the third, which is subject to spread-spectrum modulation due to RFI concerns. Therefore, relying on the *nominal* frequency of this clock is vastly incorrect; often by as much as 2%. Spread-spectrum modulation is supposed to vary around zero enough that the spreading averages out, but the only way to know what the center frequency actually is is to average.

Furthermore, this high-frequency clock is generally not calibrated anywhere near as well as the 14 MHz clock; in good designs the 14 MHz is actually a TCXO (temperature compensated crystal oscillator), which is accurate to something like ±2 ppm.

	-hpa
RE: [RFC] CPUID usage for interaction between Hypervisors and Linux.
On 10/1/2008 3:46:45 PM, H. Peter Anvin wrote:

Alok Kataria wrote:

No, that's always a terrible idea. Sure, it's necessary to deal with some backward-compatibility issues, but we shouldn't even consider a new interface which assumes this kind of thing. We want properly enumerable interfaces.

The reason we still have to do this is that Microsoft has already defined a CPUID format which is way different from what you or I are proposing (with the current case of 256 leaves being available). And I doubt they would change the way they deal with it on their OS. With any proposal that we go with, we will have to export a different CPUID interface from the hypervisor for the two OSes in question. So I think this is something that we will have to do anyway and is not worth bringing up in the discussion.

No, that's a good hint that what you and I are proposing is utterly broken and exactly underscores what I have been stressing about noncompliant hypervisors. All I have seen out of Microsoft only covers CPUID levels 0x4000 as a vendor identification leaf and 0x4001 as a hypervisor identification leaf, but you might have access to other information.

No, it says Leaf 0x4001 is the hypervisor vendor-neutral interface identification, which determines the semantics of leaves from 0x4002 through 0x40FF. Leaf 0x4000 returns the vendor identifier signature (i.e. hypervisor identification) and the hypervisor CPUID leaf range, as in the proposal.

This further underscores my belief that using 0x40xx for anything standards-based at all is utterly futile, and that this space should be treated as vendor identification and the rest as vendor-specific. Any hope of creating a standard that's actually usable needs to be outside this space, e.g. in the 0x40xx space I proposed earlier.

Actually I'm not sure I'm following your logic. Are you saying that using 0x40xx for anything standards-based is utterly futile because Microsoft said the range is hypervisor vendor-neutral? Or were you not sure what they meant there? If we are not clear, we can ask them.

-hpa

.
Jun Nakajima | Intel Open Source Technology Center
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization
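As a rough illustration of the vendor identification leaf discussed above, here is a minimal Python sketch of how a guest might decode what that leaf returns. It assumes the common convention that EAX carries the highest hypervisor leaf and that EBX, ECX and EDX together carry a 12-byte ASCII signature in little-endian order; the thread itself does not spell out the register layout. The function names are hypothetical, and the register values are built from sample strings rather than read from real hardware.

```python
import struct

def decode_signature(ebx, ecx, edx):
    """Join three 32-bit register values into the 12-byte ASCII signature.

    Assumes little-endian byte order within each register (an assumption,
    not something stated in the thread).
    """
    raw = struct.pack("<III", ebx, ecx, edx)
    return raw.rstrip(b"\x00").decode("ascii")

def encode_signature(text):
    """Inverse helper, used here only to build sample register values."""
    raw = text.encode("ascii").ljust(12, b"\x00")
    if len(raw) != 12:
        raise ValueError("signature must fit in 12 bytes")
    return struct.unpack("<III", raw)

def parse_vendor_leaf(eax, ebx, ecx, edx):
    """Return (highest hypervisor leaf, vendor signature) for the leaf."""
    return eax, decode_signature(ebx, ecx, edx)

# Sample values only; a real guest would execute CPUID to obtain them.
sample_regs = encode_signature("VMwareVMware")
max_leaf, vendor = parse_vendor_leaf(0x10, *sample_regs)
```

A guest would compare the decoded signature against the vendors it knows and treat the remaining leaves as vendor-specific unless an interface identification leaf says otherwise, which is the point under dispute in this thread.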
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
On Wed, 2008-10-01 at 17:39 -0700, H. Peter Anvin wrote:

third, which is subject to spread-spectrum modulation due to RFI concerns. Therefore, relying on the *nominal* frequency of this clock

I'm not suggesting using the nominal value. I'm suggesting the measurement be done in the one and only place where there is perfect control of the system, the processor boot-strapping in the BIOS. Only the platform designers themselves know the speed of the oscillator which is modulating the clock and so only they should be calibrating the speed of the TSC.

If this modulation really does alter the frequency by +/- 2% (seems high to me, but hey, I don't design motherboards), using an LFO, then basically all the calibration done in Linux is broken and has been for some time. You can't calibrate only once, or risk being off by 2%, you can't calibrate repeatedly and take the fastest estimate, or you are off by 2%, and you can't calibrate repeatedly and take the average without risking SMI noise affecting the lowest clock speed measurement, contributing unknown error.

Hmm. Re-reading your e-mail, I see you are saying the nominal frequency may be off by 2% (and I easily believe that), not necessarily that the frequency modulation may be 2% (which I still think is high). Does anyone know what the actual bounds on spread spectrum modulation are or how fast the clock is modulated?

Zach
Re: [RFC] CPUID usage for interaction between Hypervisors and Linux.
Zachary Amsden wrote:

I'm not suggesting using the nominal value. I'm suggesting the measurement be done in the one and only place where there is perfect control of the system, the processor boot-strapping in the BIOS. Only the platform designers themselves know the speed of the oscillator which is modulating the clock and so only they should be calibrating the speed of the TSC.

No. *No one*, including the manufacturers, knows the speed of the oscillator which is modulating the clock. What you have to do is average over a timespan which is long enough that the SSM averages out (a relatively small fraction of a second).

As for trusting the BIOS on this, that's a total joke. Firmware vendors can't get the most basic details right.

If this modulation really does alter the frequency by +/- 2% (seems high to me, but hey, I don't design motherboards), using an LFO, then basically all the calibration done in Linux is broken and has been for some time. You can't calibrate only once, or risk being off by 2%, you can't calibrate repeatedly and take the fastest estimate, or you are off by 2%, and you can't calibrate repeatedly and take the average without risking SMI noise affecting the lowest clock speed measurement, contributing unknown error.

You have to calibrate over a sample interval long enough that the SSM averages out.

Hmm. Re-reading your e-mail, I see you are saying the nominal frequency may be off by 2% (and I easily believe that), not necessarily that the frequency modulation may be 2% (which I still think is high). Does anyone know what the actual bounds on spread spectrum modulation are or how fast the clock is modulated?

No, I'm saying the frequency modulation may be up to 2%. Typically it is something like [-2%,+0%].

-hpa
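To make the averaging argument concrete, here is a small Python simulation. It is only a sketch: the nominal frequency, the 30 kHz modulation rate and the sinusoidal modulation profile are all assumptions chosen for illustration, and only the [-2%,+0%] spread comes from the message above. It counts cycles of the modulated clock over a window and compares how much the resulting frequency estimate depends on where in the modulation period the window starts.

```python
import math

F_NOMINAL = 2.0e9    # hypothetical nominal clock frequency, Hz
DOWN_SPREAD = 0.02   # [-2%, +0%] spread, as mentioned in the thread
F_MOD = 30.0e3       # assumed modulation rate, Hz (not from the thread)

def cycles_elapsed(t0, duration):
    """Cycle count over [t0, t0 + duration] for a sinusoidally modulated clock.

    Instantaneous frequency is mean + amp * cos(w * t), so it swings
    between F_NOMINAL * (1 - DOWN_SPREAD) and F_NOMINAL.
    """
    mean = F_NOMINAL * (1.0 - DOWN_SPREAD / 2.0)
    amp = F_NOMINAL * DOWN_SPREAD / 2.0
    w = 2.0 * math.pi * F_MOD
    ripple = amp / w * (math.sin(w * (t0 + duration)) - math.sin(w * t0))
    return mean * duration + ripple

def estimate_hz(t0, duration):
    """Frequency estimate from one calibration window."""
    return cycles_elapsed(t0, duration) / duration

def relative_spread(duration, starts=200):
    """Max minus min estimate over many window start phases, relative to nominal."""
    period = 1.0 / F_MOD
    estimates = [estimate_hz(i * period / starts, duration) for i in range(starts)]
    return (max(estimates) - min(estimates)) / F_NOMINAL

short_spread = relative_spread(1e-6)    # 1 microsecond window
long_spread = relative_spread(50e-3)    # 50 millisecond window
```

Under these assumptions the short window reproduces almost the full 2% swing, while the 50 ms window converges on the mean frequency (about 1% below nominal) regardless of start phase. The simulation does not model SMI noise or other measurement error raised earlier in the thread.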