Re: [PATCH 0/24] Nested VMX, v5
On 10/17/2010 02:39 PM, Nadav Har'El wrote:

> On Sun, Oct 17, 2010, Avi Kivity wrote about "Re: [PATCH 0/24] Nested VMX, v5":
> > > patch. In short, try running the L0 kernel with the "nosmp" option,
> > What are the problems with smp?
>
> Unfortunately, there appears to be a bug which causes KVM with nested VMX to hang when SMP is enabled, even if you don't try to use more than one CPU for the guest. I still need to debug this to figure out why.

Well, that seems pretty critical.

> > > give the "-cpu host" option to qemu,
> > Why is this needed?
>
> Qemu has a list of cpu types, and for each type it lists its features. The problem is that Qemu doesn't list the "VMX" feature for any of the CPUs, even those that do have it (like Core 2 Duo). I have a trivial patch to qemu to add the "VMX" feature to those CPUs, which is harmless even if KVM doesn't support nested VMX (qemu will drop features which KVM doesn't support). But until I send such a patch to qemu, the easiest workaround is just to use "-cpu host" - which will (among other things) tell qemu to emulate a machine which has vmx, just like the host does. (I also explained this in the intro to v6 of the patch.)

Ok. I think we can get that patch merged, just so you don't have to re-explain it over and over again. Please post it to qemu-devel.

> > > and the "nested=1 ept=0 vpid=0" options to the kvm-intel module in L0.
> > Why are those needed? Seems trivial to support a non-ept guest on an ept host - all you do is switch cr3 during vmentry and vmexit.
>
> nested=1 is needed because you asked for it *not* to be the default :-)
>
> You're right, ept=1 on the host *could* be supported even before nested ept is supported (this is the mode we called "shadow on ept" in the paper). But at the moment, I believe it doesn't work correctly. I'll add making this case work to my TODO list.
>
> I'm not sure why vpid=0 is needed (but I verified that you get a failed entry if you don't use it).
> I understood that there was some discussion on what is the proper way to do nested vpid, and that in the meantime it isn't supported, but I agree that it should have been possible to use vpid normally to run L1's but avoid using it when running L2's. Again, I'll need to debug this issue to understand how difficult it would be to fix this case.

My feeling is the smp and vpid failures are due to bugs. vpid=0 in particular forces a tlb flush on every exit which might mask your true bug. smp might be due to host vcpu migration. Are we vmclearing the right vmcs?

ept=1 may not be due to a bug per se, but my feeling is that it should be very easy to implement. In particular nsvm started out on npt (but not nnpt) and had issues with shadow-on-shadow (IIRC).

-- 
error compiling committee.c: too many arguments to function

-- 
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] Nested VMX, v5
On Sun, Oct 17, 2010, Avi Kivity wrote about "Re: [PATCH 0/24] Nested VMX, v5":
> > patch. In short, try running the L0 kernel with the "nosmp" option,
> What are the problems with smp?

Unfortunately, there appears to be a bug which causes KVM with nested VMX to hang when SMP is enabled, even if you don't try to use more than one CPU for the guest. I still need to debug this to figure out why.

> > give the "-cpu host" option to qemu,
> Why is this needed?

Qemu has a list of cpu types, and for each type it lists its features. The problem is that Qemu doesn't list the "VMX" feature for any of the CPUs, even those that do have it (like Core 2 Duo). I have a trivial patch to qemu to add the "VMX" feature to those CPUs, which is harmless even if KVM doesn't support nested VMX (qemu will drop features which KVM doesn't support). But until I send such a patch to qemu, the easiest workaround is just to use "-cpu host" - which will (among other things) tell qemu to emulate a machine which has vmx, just like the host does. (I also explained this in the intro to v6 of the patch.)

> > and the "nested=1 ept=0 vpid=0" options to the kvm-intel module in L0.
> Why are those needed? Seems trivial to support a non-ept guest on an ept host - all you do is switch cr3 during vmentry and vmexit.

nested=1 is needed because you asked for it *not* to be the default :-)

You're right, ept=1 on the host *could* be supported even before nested ept is supported (this is the mode we called "shadow on ept" in the paper). But at the moment, I believe it doesn't work correctly. I'll add making this case work to my TODO list.

I'm not sure why vpid=0 is needed (but I verified that you get a failed entry if you don't use it). I understood that there was some discussion on what is the proper way to do nested vpid, and that in the meantime it isn't supported, but I agree that it should have been possible to use vpid normally to run L1's but avoid using it when running L2's.
Again, I'll need to debug this issue to understand how difficult it would be to fix this case.

Nadav.

-- 
Nadav Har'El | Sunday, Oct 17 2010, 9 Heshvan 5771
n...@math.technion.ac.il | Phone +972-523-790466, ICQ 13349191
http://nadav.harel.org.il | Strike not only while the iron is hot, make the iron hot by striking it.
Re: [PATCH 0/24] Nested VMX, v5
On 10/17/2010 02:03 PM, Nadav Har'El wrote:

> On Tue, Jun 15, 2010, Avi Kivity wrote about "Re: [PATCH 0/24] Nested VMX, v5":
> > I've tried to test the patches, but I see a vm-entry failure code 7 on the very first vmentry. Guest is Fedora 12 x86-64 (2.6.32.9-70.fc12).
>
> Hi, as you can see, I posted a new set of patches, which apply to the current trunk. Can you please give it another try? Thanks! Please make sure you follow the instructions in the introduction to the patch. In short, try running the L0 kernel with the "nosmp" option,

What are the problems with smp?

> give the "-cpu host" option to qemu,

Why is this needed?

> and the "nested=1 ept=0 vpid=0" options to the kvm-intel module in L0.

Why are those needed? Seems trivial to support a non-ept guest on an ept host - all you do is switch cr3 during vmentry and vmexit.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 0/24] Nested VMX, v5
On Tue, Jun 15, 2010, Avi Kivity wrote about "Re: [PATCH 0/24] Nested VMX, v5":
> I've tried to test the patches, but I see a vm-entry failure code 7 on the very first vmentry. Guest is Fedora 12 x86-64 (2.6.32.9-70.fc12).

Hi, as you can see, I posted a new set of patches, which apply to the current trunk. Can you please give it another try? Thanks!

Please make sure you follow the instructions in the introduction to the patch. In short, try running the L0 kernel with the "nosmp" option, give the "-cpu host" option to qemu, and the "nested=1 ept=0 vpid=0" options to the kvm-intel module in L0.

Thanks, Nadav.

-- 
Nadav Har'El | Sunday, Oct 17 2010, 9 Heshvan 5771
n...@math.technion.ac.il | Phone +972-523-790466, ICQ 13349191
http://nadav.harel.org.il | This space is for sale - inquire inside.
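[Editor's note: for concreteness, the L0 setup described in this message amounts to something like the following. This is a sketch assembled from the instructions above, not a tested recipe; the disk image name and memory size are placeholders.]

```shell
# Reload kvm-intel in L0 with nesting on and EPT/VPID off, as suggested:
modprobe -r kvm_intel
modprobe kvm_intel nested=1 ept=0 vpid=0

# Boot the L0 kernel with "nosmp" on its kernel command line, then start
# the L1 guest, exposing the host CPU's VMX feature to it:
qemu-system-x86_64 -enable-kvm -cpu host -m 1024 l1-guest.img
```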
Re: [PATCH 0/24] Nested VMX, v5
On Sunday 13 June 2010 20:22:33 Nadav Har'El wrote:
> Hi Avi,
>
> This is a followup of our nested VMX patches that Orit Wasserman posted in December. We've addressed most of the comments and concerns that you and others on the mailing list had with the previous patch set. We hope you'll find these patches easier to understand, and suitable for applying to KVM.
>
> The following 24 patches implement nested VMX support. The patches enable a guest to use the VMX APIs in order to run its own nested guests. I.e., it allows running hypervisors (that use VMX) under KVM. We describe the theory behind this work, our implementation, and its performance characteristics, in IBM Research report H-0282, "The Turtles Project: Design and Implementation of Nested Virtualization", available at:
>
> http://bit.ly/a0o9te
>
> The current patches support running Linux under a nested KVM using shadow page table (with bypass_guest_pf disabled). They support multiple nested hypervisors, which can run multiple guests. Only 64-bit nested hypervisors are supported. SMP is supported. Additional patches for running Windows under nested KVM, and Linux under nested VMware server, and support for nested EPT, are currently running in the lab, and will be sent as follow-on patchsets.

Hi Nadav,

Do you have a tree or code base and instructions to try this patchset? I've spent some time on it, but can't get it right...

-- 
regards
Yang, Sheng

> These patches were written by:
> Abel Gordon, abelg il.ibm.com
> Nadav Har'El, nyh il.ibm.com
> Orit Wasserman, oritw il.ibm.com
> Ben-Ami Yassor, benami il.ibm.com
> Muli Ben-Yehuda, muli il.ibm.com
>
> With contributions by:
> Anthony Liguori, aliguori us.ibm.com
> Mike Day, mdday us.ibm.com
>
> This work was inspired by the nested SVM support by Alexander Graf and Joerg Roedel.
>
> Changes since v4:
> * Rebased to the current KVM tree.
> * Support for lazy FPU loading.
> * Implemented about 90 requests and suggestions made on the mailing list regarding the previous version of this patch set.
> * Split the changes into many more, and better documented, patches.
>
> --
> Nadav Har'El
> IBM Haifa Research Lab
Re: [PATCH 0/24] Nested VMX, v5
On 07/11/2010 06:39 PM, Nadav Har'El wrote:

> On Sun, Jul 11, 2010, Avi Kivity wrote about "Re: [PATCH 0/24] Nested VMX, v5":
> > > nesting-aware L1 guest hypervisors to actually use that internal structure to modify vmcs12 directly, without vmread/vmwrite and exits.
> >
> > No, they can't, since (for writes) L0 might cache the information and not read it again. For reads, L0 might choose to update vmcs12 on demand.
>
> Well, in the current version of the nested code, all L0 does on a L1 vmwrite is to update the in-memory vmcs12 structure. It doesn't update vmcs02, nor cache anything, nor remember what has changed and what hasn't. So replacing it with a direct write to the memory structure should be fine...

Note you said "current version". What if this later changes? So, we cannot allow a guest to access vmcs12 directly. There has to be a protocol that allows the guest to know what it can touch and what it can't (or, tell the host what the guest touched and what it hasn't). Otherwise, we lose the ability to optimize.

> Of course, this situation isn't optimal, and we *should* optimize the number of unnecessary vmwrites on L2 entry and exit (and we actually tried some of this in our tech report), but it's not in the current patch set. When we do these kinds of optimizations, you're right that:
> > A pv vmread/vmwrite needs to communicate with L0 about what fields are valid (likely using available and dirty bitmaps).

It's right even before we do these optimizations, so a pv guest written before the optimizations can run on an optimized host.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 0/24] Nested VMX, v5
On Sun, Jul 11, 2010, Avi Kivity wrote about "Re: [PATCH 0/24] Nested VMX, v5":
> > nesting-aware L1 guest hypervisors to actually use that internal structure to modify vmcs12 directly, without vmread/vmwrite and exits.
>
> No, they can't, since (for writes) L0 might cache the information and not read it again. For reads, L0 might choose to update vmcs12 on demand.

Well, in the current version of the nested code, all L0 does on a L1 vmwrite is to update the in-memory vmcs12 structure. It doesn't update vmcs02, nor cache anything, nor remember what has changed and what hasn't. So replacing it with a direct write to the memory structure should be fine...

Of course, this situation isn't optimal, and we *should* optimize the number of unnecessary vmwrites on L2 entry and exit (and we actually tried some of this in our tech report), but it's not in the current patch set. When we do these kinds of optimizations, you're right that:

> A pv vmread/vmwrite needs to communicate with L0 about what fields are valid (likely using available and dirty bitmaps).

-- 
Nadav Har'El | Sunday, Jul 11 2010, 1 Av 5770
n...@math.technion.ac.il | Phone +972-523-790466, ICQ 13349191
http://nadav.harel.org.il | If marriage was illegal, only outlaws would have in-laws.
Re: [PATCH 0/24] Nested VMX, v5
On 07/11/2010 11:27 AM, Nadav Har'El wrote:

> > 1: Basically there are 2 different types in VMCS, one is defined by hardware, whose layout is unknown to VMM. Another one is defined by VMM (this patch) and used for vmcs12. The former one is using "struct vmcs" to describe its data instance, but the latter one doesn't have a clear definition (or struct vmcs12?). I suggest we can have a distinguishing struct for this, for example "struct sw_vmcs" (software vmcs), or "struct vvmcs" (virtual vmcs).
>
> I decided (but let me know if you have reservations) to use the name "struct vmcs_fields" for the memory structure that contains the long list of vmcs fields. I think this name describes the structure's content well.

I liked vvmcs myself...

> As in the last version of the patches, this list of vmcs fields will not on its own be vmcs12's structure, because vmcs12, as a spec-compliant vmcs, also needs to contain a couple of additional fields in its beginning, and we also need a few more runtime fields.

... for the spec-compliant vmcs in L1's memory.

> > 2: vmcsxy (vmcs12, vmcs02, vmcs01) are for instances of either "struct vmcs" or "struct sw_vmcs", but not for structs. A clear distinction between data structure and instance helps IMO.
>
> I agree with you that using the name "vmcs12" for both the type (struct vmcs12) and an instance of another type (struct vmcs_fields *vmcs12) is somewhat strange, but I can only think of two alternatives:
>
> 1. Invent a new name for "struct vmcs12", say "struct sw_vmcs" as you suggested. But I think it will just make things less clear, because we replace the self-explanatory name vmcs12 by a less clear name.
>
> 2. Stop separating "struct vmcs_fields" (formerly struct shadow_vmcs) and "struct vmcs12" which contains it and a few more fields - and instead put everything in one structure (and call that sw_vmcs or whatever).

I like this.

> These extra fields will not be useful for vmcs01, but it's not a terrible waste (because vmcs01 already doesn't use a lot of these fields).
You don't really need vmcs01 to be a vvmcs (or sw_vmcs). IIRC you only need it when copying around vmcss, which you can avoid completely by initializing vmcs01 and vmcs02 using common initialization routines for the host part.

> Personally, I find these two alternatives even less appealing than the current alternative (with "struct vmcs12" describing vmcs12's type, and it contains a struct vmcs_fields inside). What do you think?

IMO, vmcs_fields is artificial. As soon as you eliminate the vmcs copy, you won't have any use for it, and then you can fold it into its container.

> > 5: guest VMPTRLD emulation. Current patch creates a vmcs02 instance each time the guest does VMPTRLD, and frees the instance at VMCLEAR. The code may fail if the (un-vmcleared) vmcs count exceeds a certain threshold, to avoid denial of service. That is fine, but it brings additional complexity and may pay with a lot of memory. I think we can emulate using the concept of "cached vmcs" here in case the L1 VMM doesn't do vmclear in time. The L0 VMM can simply flush those vmcs02 to guest memory, i.e. vmcs12, per need. For example if the cached vmcs02 exceed 10, we can do an automatic flush.
>
> Right. I've already discussed this idea over the list with Avi Kivity, and it is on my todo list and definitely should be done. The current approach is simpler, because I don't need to add special code for rebuilding a forgotten vmcs02 from vmcs12 - the current prepare_vmcs02 only updates some of the fields, and I'll need to do some testing to figure out what exactly is missing for a full rebuild.

You already support "full rebuild" - that's what happens when you first see a vmcs, when you launch a guest.

> I think the current code is "good enough" as an ad-interim solution, because users that follow the spec will not forget to VMCLEAR anyway (and if they do, only they will suffer). And I wouldn't say that "a lot of memory" is involved - at worst, an L1 can now cause 256 pages, or 1 MB, to be wasted on this.
> More normally, an L1 will only have a few L2 guests, and only spend a few pages for this - certainly much much less than he'd spend on actually holding the L2's memory.

It's perfectly legitimate for a guest to disappear a vmcs. It might swap it to disk, or move it to a separate NUMA node. While I don't expect the first, the second will probably happen sometime.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 0/24] Nested VMX, v5
On 07/11/2010 03:49 PM, Nadav Har'El wrote:

> In any case, the obvious problem with this whole idea on VMX is that it requires a modified guest hypervisor, which reduces its usefulness. This is why we didn't think we should "advertise" the ability to bypass vmread/vmwrite in L1 and write directly to the vmcs12's. But Avi Kivity already asked me to add a document about the vmcs12 internal structure, and once I've done that, I guess you can now consider it "fair" for nesting-aware L1 guest hypervisors to actually use that internal structure to modify vmcs12 directly, without vmread/vmwrite and exits.

No, they can't, since (for writes) L0 might cache the information and not read it again. For reads, L0 might choose to update vmcs12 on demand. A pv vmread/vmwrite needs to communicate with L0 about what fields are valid (likely using available and dirty bitmaps).

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 0/24] Nested VMX, v5
On Sun, Jul 11, 2010, Alexander Graf wrote about "Re: [PATCH 0/24] Nested VMX, v5":
> Thinking about this - it would be perfectly legal to split the VMCS into two separate structs, right? You could have one struct that you map directly into the guest, so modifications to that struct don't trap. Of course the l1 guest shouldn't be able to modify all fields of the VMCS, so you'd still keep a second struct around with shadow fields. While at it, also add a bitmap to store the dirtiness status of your fields in, if you need that.
>
> That way a nesting aware guest could use a PV memory write instead of the (slow) instruction emulation. That should dramatically speed up nesting vmx.

Hi,

We already tried this idea, and described the results in our tech report (see http://www.mulix.org/pubs/turtles/h-0282.pdf). We didn't do things quite as cleanly as you suggested - we didn't split the structure and make only part of it available directly to the guest. Rather, we only did what we had to do to get the performance improvement: we modified L1 to access the VMCS directly, assuming the nested code's vmcs12 structure layout, instead of calling vmread/vmwrite.

As you can see in the various benchmarks in section 4 (Evaluation) of the report, the so-called PV vmread/vmwrite method had a noticeable, though perhaps not as dramatic as you hoped, effect. For example, for the kernbench benchmark, nested kvm overhead (over single-level kvm virtualization) came down from 14.5% to 10.3%, and for the specjbb benchmark, the overhead came down from 7.8% to 6.3%. In a microbenchmark less representative of real-life workloads, we were able to measure a halving of the overhead by adding the PV vmread/vmwrite.

In any case, the obvious problem with this whole idea on VMX is that it requires a modified guest hypervisor, which reduces its usefulness. This is why we didn't think we should "advertise" the ability to bypass vmread/vmwrite in L1 and write directly to the vmcs12's.
But Avi Kivity already asked me to add a document about the vmcs12 internal structure, and once I've done that, I guess you can now consider it "fair" for nesting-aware L1 guest hypervisors to actually use that internal structure to modify vmcs12 directly, without vmread/vmwrite and exits.

By the way, I see on the KVM Forum 2010 schedule that Eddie Dong will be talking about "Examining KVM as Nested Virtualization Friendly Guest". I'm looking forward to reading the proceedings (unfortunately, I won't be able to travel to the actual meeting).

Nadav.

-- 
Nadav Har'El | Sunday, Jul 11 2010, 29 Tammuz 5770
n...@math.technion.ac.il | Phone +972-523-790466, ICQ 13349191
http://nadav.harel.org.il | I used to work in a pickle factory, until I got canned.
Re: [PATCH 0/24] Nested VMX, v5
On 11.07.2010, at 10:27, Nadav Har'El wrote:

> On Fri, Jul 09, 2010, Dong, Eddie wrote about "RE: [PATCH 0/24] Nested VMX, v5":
> > Thanks for the posting and in general the patches are well written. I like the concept of VMCSxy and I feel it is pretty clear (better than my previous naming as well), but there is some confusion inside, especially for the term "shadow" which I find quite hard.
>
> Hi, and thanks for the excellent ideas. As you saw, I indeed started to convert and converge the old terminology (including that ambiguous term "shadow") into the new names vmcs01, vmcs02, vmcs12 - names which we introduced in our technical report. But I have not gone all the way with these changes. I should have, and I'll do it now.
>
> > 1: Basically there are 2 different types in VMCS, one is defined by hardware, whose layout is unknown to VMM. Another one is defined by VMM (this patch) and used for vmcs12. The former one is using "struct vmcs" to describe its data instance, but the latter one doesn't have a clear definition (or struct vmcs12?). I suggest we can have a distinguishing struct for this, for example "struct sw_vmcs" (software vmcs), or "struct vvmcs" (virtual vmcs).
>
> I decided (but let me know if you have reservations) to use the name "struct vmcs_fields" for the memory structure that contains the long list of vmcs fields. I think this name describes the structure's content well.
>
> As in the last version of the patches, this list of vmcs fields will not on its own be vmcs12's structure, because vmcs12, as a spec-compliant vmcs, also needs to contain a couple of additional fields in its beginning, and we also need a few more runtime fields.

Thinking about this - it would be perfectly legal to split the VMCS into two separate structs, right? You could have one struct that you map directly into the guest, so modifications to that struct don't trap.
Of course the l1 guest shouldn't be able to modify all fields of the VMCS, so you'd still keep a second struct around with shadow fields. While at it, also add a bitmap to store the dirtiness status of your fields in, if you need that.

That way a nesting aware guest could use a PV memory write instead of the (slow) instruction emulation. That should dramatically speed up nesting vmx.

Alex
Re: [PATCH 0/24] Nested VMX, v5
On Fri, Jul 09, 2010, Dong, Eddie wrote about "RE: [PATCH 0/24] Nested VMX, v5":
> Thanks for the posting and in general the patches are well written. I like the concept of VMCSxy and I feel it is pretty clear (better than my previous naming as well), but there is some confusion inside, especially for the term "shadow" which I find quite hard.

Hi, and thanks for the excellent ideas. As you saw, I indeed started to convert and converge the old terminology (including that ambiguous term "shadow") into the new names vmcs01, vmcs02, vmcs12 - names which we introduced in our technical report. But I have not gone all the way with these changes. I should have, and I'll do it now.

> 1: Basically there are 2 different types in VMCS, one is defined by hardware, whose layout is unknown to VMM. Another one is defined by VMM (this patch) and used for vmcs12. The former one is using "struct vmcs" to describe its data instance, but the latter one doesn't have a clear definition (or struct vmcs12?). I suggest we can have a distinguishing struct for this, for example "struct sw_vmcs" (software vmcs), or "struct vvmcs" (virtual vmcs).

I decided (but let me know if you have reservations) to use the name "struct vmcs_fields" for the memory structure that contains the long list of vmcs fields. I think this name describes the structure's content well.

As in the last version of the patches, this list of vmcs fields will not on its own be vmcs12's structure, because vmcs12, as a spec-compliant vmcs, also needs to contain a couple of additional fields in its beginning, and we also need a few more runtime fields.

> 2: vmcsxy (vmcs12, vmcs02, vmcs01) are for instances of either "struct vmcs" or "struct sw_vmcs", but not for structs. A clear distinction between data structure and instance helps IMO.
I agree with you that using the name "vmcs12" for both the type (struct vmcs12) and an instance of another type (struct vmcs_fields *vmcs12) is somewhat strange, but I can only think of two alternatives:

1. Invent a new name for "struct vmcs12", say "struct sw_vmcs" as you suggested. But I think it will just make things less clear, because we replace the self-explanatory name vmcs12 by a less clear name.

2. Stop separating "struct vmcs_fields" (formerly struct shadow_vmcs) and "struct vmcs12" which contains it and a few more fields - and instead put everything in one structure (and call that sw_vmcs or whatever). These extra fields will not be useful for vmcs01, but it's not a terrible waste (because vmcs01 already doesn't use a lot of these fields).

Personally, I find these two alternatives even less appealing than the current alternative (with "struct vmcs12" describing vmcs12's type, and it contains a struct vmcs_fields inside). What do you think?

> 3: We may use prefix or suffix in addition to vmcsxy to explicitly state the format of that instance. For example vmcs02 in the current patch is for hardware use, hence it is an instance of "struct vmcs", but vmcs01 is an instance of "struct sw_vmcs". Postfix and prefix help to make this better understood.

I agree. After changing the old name struct shadow_vmcs to vmcs_fields, now I can use a name like vmcs01_fields for the old l1_shadow_vmcs (memory copy of vmcs01's fields) and vmcs01 for the old l1_vmcs (the actual hardware VMCS used to run L1). This is indeed more readable, thanks.

> 4: Rename l2_vmcs to vmcs02, l1_shadow_vmcs to vmcs01, l1_vmcs to vmcs02, with prefix/postfix can strengthen above concept of vmcsxy.

Good ideas. Renamed l2_vmcs, l2_vmcs_list, and the likes, to vmcs02. Renamed l1_shadow_vmcs to vmcs01_fields, and l1_vmcs to vmcs01 (NOT vmcs02).
Renamed l2_shadow_vmcs, l2svmcs, nested_vmcs, and the likes, to vmcs12 (I decided not to use the longer name vmcs12_fields, because I don't think it adds any clarity). I also renamed get_shadow_vmcs to get_vmcs12_fields.

> 5: guest VMPTRLD emulation. Current patch creates a vmcs02 instance each time the guest does VMPTRLD, and frees the instance at VMCLEAR. The code may fail if the (un-vmcleared) vmcs count exceeds a certain threshold, to avoid denial of service. That is fine, but it brings additional complexity and may pay with a lot of memory. I think we can emulate using the concept of "cached vmcs" here in case the L1 VMM doesn't do vmclear in time. The L0 VMM can simply flush those vmcs02 to guest memory, i.e. vmcs12, per need. For example if the cached vmcs02 exceed 10, we can do an automatic flush.

Right. I've already discussed this idea over the list with Avi Kivity, and it is on my todo list and definitely should be done. The current approach is simpler, because I don't need to add special code for rebuilding a forgotten vmcs02 from vmcs12.
RE: [PATCH 0/24] Nested VMX, v5
Nadav Har'El wrote:
> Hi Avi,
>
> This is a followup of our nested VMX patches that Orit Wasserman posted in December. We've addressed most of the comments and concerns that you and others on the mailing list had with the previous patch set. We hope you'll find these patches easier to understand, and suitable for applying to KVM.
>
> The following 24 patches implement nested VMX support. The patches enable a guest to use the VMX APIs in order to run its own nested guests. I.e., it allows running hypervisors (that use VMX) under KVM. We describe the theory behind this work, our implementation, and its performance characteristics, in IBM Research report H-0282, "The Turtles Project: Design and Implementation of Nested Virtualization", available at:
>
> http://bit.ly/a0o9te
>
> The current patches support running Linux under a nested KVM using shadow page table (with bypass_guest_pf disabled). They support multiple nested hypervisors, which can run multiple guests. Only 64-bit nested hypervisors are supported. SMP is supported. Additional patches for running Windows under nested KVM, and Linux under nested VMware server, and support for nested EPT, are currently running in the lab, and will be sent as follow-on patchsets.

Nadav & All:

Thanks for the posting and in general the patches are well written. I like the concept of VMCSxy and I feel it is pretty clear (better than my previous naming as well), but there is some confusion inside, especially for the term "shadow" which I find quite hard.

Comments from me:

1: Basically there are 2 different types in VMCS, one is defined by hardware, whose layout is unknown to VMM. Another one is defined by VMM (this patch) and used for vmcs12. The former one is using "struct vmcs" to describe its data instance, but the latter one doesn't have a clear definition (or struct vmcs12?).
I suggest we can have a distinguishing struct for this, for example "struct sw_vmcs" (software vmcs), or "struct vvmcs" (virtual vmcs).

2: vmcsxy (vmcs12, vmcs02, vmcs01) are for instances of either "struct vmcs" or "struct sw_vmcs", but not for structs. A clear distinction between data structure and instance helps IMO.

3: We may use prefix or suffix in addition to vmcsxy to explicitly state the format of that instance. For example vmcs02 in the current patch is for hardware use, hence it is an instance of "struct vmcs", but vmcs01 is an instance of "struct sw_vmcs". Postfix and prefix help to make this better understood.

4: Rename l2_vmcs to vmcs02, l1_shadow_vmcs to vmcs01, l1_vmcs to vmcs02, with prefix/postfix can strengthen above concept of vmcsxy.

5: guest VMPTRLD emulation. Current patch creates a vmcs02 instance each time the guest does VMPTRLD, and frees the instance at VMCLEAR. The code may fail if the (un-vmcleared) vmcs count exceeds a certain threshold, to avoid denial of service. That is fine, but it brings additional complexity and may pay with a lot of memory. I think we can emulate using the concept of "cached vmcs" here in case the L1 VMM doesn't do vmclear in time. The L0 VMM can simply flush those vmcs02 to guest memory, i.e. vmcs12, per need. For example if the cached vmcs02 exceed 10, we can do an automatic flush.

Thx, Eddie
Re: [PATCH 0/24] Nested VMX, v5
On 06/14/2010 04:03 PM, Nadav Har'El wrote:
> > Let's try to get this merged quickly.
>
> I'll start fixing the individual patches and resending them
> individually, and when I've fixed everything I'll resubmit the whole
> lot. I hope that this time I can do it in a matter of days, not months.

I've tried to test the patches, but I see a vm-entry failure code 7 on the very first vmentry. Guest is Fedora 12 x86-64 (2.6.32.9-70.fc12).

If you can post a git tree with the next round, that will make it easier for people experimenting with the patches.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 0/24] Nested VMX, v5
On Mon, Jun 14, 2010, Avi Kivity wrote about "Re: [PATCH 0/24] Nested VMX, v5":
> Overall, very nice. The finer split and better documentation really
> help reviewing, thanks.

Thank you for the review and all the accurate comments!

> Let's try to get this merged quickly.

I'll start fixing the individual patches and resending them individually, and when I've fixed everything I'll resubmit the whole lot. I hope that this time I can do it in a matter of days, not months.

Thanks, Nadav.

--
Nadav Har'El              | Monday, Jun 14 2010, 2 Tammuz 5770
n...@math.technion.ac.il  |- Phone +972-523-790466, ICQ 13349191
http://nadav.harel.org.il |An egotist is a person of low taste, more
                          |interested in himself than in me.
Re: [PATCH 0/24] Nested VMX, v5
On 06/13/2010 03:22 PM, Nadav Har'El wrote:
> Hi Avi,
>
> This is a followup of our nested VMX patches that Orit Wasserman
> posted in December. We've addressed most of the comments and concerns
> that you and others on the mailing list had with the previous patch
> set. We hope you'll find these patches easier to understand, and
> suitable for applying to KVM.
>
> The following 24 patches implement nested VMX support. The patches
> enable a guest to use the VMX APIs in order to run its own nested
> guests. I.e., it allows running hypervisors (that use VMX) under KVM.
> We describe the theory behind this work, our implementation, and its
> performance characteristics, in IBM Research report H-0282, "The
> Turtles Project: Design and Implementation of Nested Virtualization",
> available at:
>
> http://bit.ly/a0o9te
>
> The current patches support running Linux under a nested KVM using
> shadow page table (with bypass_guest_pf disabled). They support
> multiple nested hypervisors, which can run multiple guests. Only
> 64-bit nested hypervisors are supported. SMP is supported. Additional
> patches for running Windows under nested KVM, and Linux under nested
> VMware server, and support for nested EPT, are currently running in
> the lab, and will be sent as follow-on patchsets.
>
> These patches were written by:
>   Abel Gordon, abelg il.ibm.com
>   Nadav Har'El, nyh il.ibm.com
>   Orit Wasserman, oritw il.ibm.com
>   Ben-Ami Yassor, benami il.ibm.com
>   Muli Ben-Yehuda, muli il.ibm.com
>
> With contributions by:
>   Anthony Liguori, aliguori us.ibm.com
>   Mike Day, mdday us.ibm.com
>
> This work was inspired by the nested SVM support by Alexander Graf
> and Joerg Roedel.
>
> Changes since v4:
> * Rebased to the current KVM tree.
> * Support for lazy FPU loading.
> * Implemented about 90 requests and suggestions made on the mailing
>   list regarding the previous version of this patch set.
> * Split the changes into many more, and better documented, patches.

Overall, very nice. The finer split and better documentation really help reviewing, thanks.
Let's try to get this merged quickly.

--
error compiling committee.c: too many arguments to function