Re: linux-aio usable?
On 03/08/2010 10:19 PM, Michael Tokarev wrote:
> Michael Tokarev wrote:
> []
>> Apparently that does not quite work. I just re-compiled kvm with --enable-linux-aio (actually I just installed libaio-dev on debian and qemu-kvm's configure picked it up automatically), and tried a guest. But any I/O fails.
>
> It has nothing to do with kvm. It is compat_ioctl32 in the kernel wrt aio calls. Historically I've a 64bit kernel with 32bit userland, and tried 32bit kvm too, and that does not work. But 64bit kvm works just fine with aio, and the performance numbers are indeed better.

Can you elaborate? This sounds like a bug that wants to be fixed.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-aio usable?
On 03/08/2010 11:27 PM, Nikola Ciprich wrote:
>> It's faster.
>
> Hi Avi,
> Could You give some rough estimate on how much faster?

The standard "it depends on the workload".

> I'm stuck with glibc-2.5 now, but I'm always eager to improve performance, so I wonder if it would make sense to either port eventfd + aio stuff, or switch to glibc-2.8 for me...

Switching to a modern setup should be much easier and safer. Esp. a modern kernel.

--
error compiling committee.c: too many arguments to function
Re: linux-aio usable?
Avi Kivity wrote:
> On 03/08/2010 10:19 PM, Michael Tokarev wrote:
>> Michael Tokarev wrote:
>> []
>>> Apparently that does not quite work. I just re-compiled kvm with --enable-linux-aio (actually I just installed libaio-dev on debian and qemu-kvm's configure picked it up automatically), and tried a guest. But any I/O fails.
>>
>> It has nothing to do with kvm. It is compat_ioctl32 in the kernel wrt aio calls. Historically I've a 64bit kernel with 32bit userland, and tried 32bit kvm too, and that does not work. But 64bit kvm works just fine with aio, and the performance numbers are indeed better.
>
> Can you elaborate? This sounds like a bug that wants to be fixed.

http://thread.gmane.org/gmane.linux.kernel.aio.general/2891

It's missing compat_ioctl for some of the aio opcodes, namely PREADV and PWRITEV - the only ones used by kvm and the only ones missing in kernel. As far as I can see, current code converts the iocb array just fine, but does not touch the iovec array used with p{read,write}v.

/mjt
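To make the missing step concrete: a 32-bit process lays out each iovec entry as two 32-bit fields, while a 64-bit kernel expects pointer-sized fields, so the array has to be widened entry by entry before use. The following is only a userspace sketch of that widening, not the kernel code. `struct compat_iovec32` and `widen_iovec()` are hypothetical names chosen for illustration, and the field layout is an assumption about the 32-bit ABI.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical 32-bit view of struct iovec (assumed layout). */
struct compat_iovec32 {
    uint32_t iov_base;  /* 32-bit user pointer */
    uint32_t iov_len;   /* 32-bit length */
};

/*
 * Widen `count` 32-bit iovec entries into native struct iovec entries.
 * Returns 0 on success, -1 if either array pointer is NULL.
 */
static int widen_iovec(const struct compat_iovec32 *src,
                       struct iovec *dst, size_t count)
{
    size_t i;

    if (src == NULL || dst == NULL)
        return -1;

    for (i = 0; i < count; i++) {
        /* Zero-extend the 32-bit pointer and length to native width. */
        dst[i].iov_base = (void *)(uintptr_t)src[i].iov_base;
        dst[i].iov_len = (size_t)src[i].iov_len;
    }
    return 0;
}
```

Converting only the iocb array and skipping this step would leave the kernel reading 32-bit entries as if they were 64-bit ones, which matches the failure described above.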
[PATCH] take srcu lock before call to complete_pio()
complete_pio() may use slot table which is protected by srcu.

Signed-off-by: Gleb Natapov <g...@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 703f637..3753c11 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4519,7 +4519,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 		kvm_set_cr8(vcpu, kvm_run->cr8);
 
 	if (vcpu->arch.pio.cur_count) {
+		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 		r = complete_pio(vcpu);
+		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 		if (r)
 			goto out;
 	}
--
			Gleb.
Re: linux-aio usable?
On 03/09/2010 11:19 AM, Michael Tokarev wrote:
>> Can you elaborate? This sounds like a bug that wants to be fixed.
>
> http://thread.gmane.org/gmane.linux.kernel.aio.general/2891
>
> It's missing compat_ioctl for some of the aio opcodes, namely PREADV and PWRITEV - the only ones used by kvm and the only ones missing in kernel. As far as I can see, current code converts the iocb array just fine, but does not touch the iovec array used with p{read,write}v.

Yikes, looks pretty bad.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] KVM: PPC: Do not create debugfs if fail to create vcpu
On 09.03.2010, at 07:13, Wei Yongjun wrote:

> If fail to create the vcpu, we should not create the debugfs for it.
>
> Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>

Good catch. I guess a "goto out" kind of construct would be better, but for a single line of code this is enough. And whoever adds more lines can put the goto into place.

Acked-by: Alexander Graf <ag...@suse.de>

Alex
Re: [PATCH] Inter-VM shared memory PCI device
On Monday 08 March 2010, Cam Macdonell wrote:
> enum ivshmem_registers {
>     IntrMask = 0,
>     IntrStatus = 2,
>     Doorbell = 4,
>     IVPosition = 6,
>     IVLiveList = 8
> };
>
> The first two registers are the interrupt mask and status registers. Interrupts are triggered when a message is received on the guest's eventfd from another VM. Writing to the 'Doorbell' register is how synchronization messages are sent to other VMs.
>
> The IVPosition register is read-only and reports the guest's ID number. The IVLiveList register is also read-only and reports a bit vector of currently live VM IDs.
>
> The Doorbell register is 16-bits, but is treated as two 8-bit values. The upper 8-bits are used for the destination VM ID. The lower 8-bits are the value which will be written to the destination VM and what the guest status register will be set to when the interrupt is triggered in the destination guest. A value of 255 in the upper 8-bits will trigger a broadcast where the message will be sent to all other guests.

This means you have at least two intercepts for each message:

1. Sender writes to doorbell
2. Receiver gets interrupted

With optionally two more intercepts in order to avoid interrupting the receiver every time:

3. Receiver masks interrupt in order to process data
4. Receiver unmasks interrupt when it's done and status is no longer pending

I believe you can do much better than this, you combine status and mask bits, making this level triggered, and move to a bitmask of all guests:

In order to send an interrupt to another guest, the sender first checks the bit for the receiver. If it's '1', no need for any intercept, the receiver will come back anyway. If it's zero, write a '1' bit, which gets OR'd into the bitmask by the host. The receiver gets interrupted at a rising edge and just leaves the bit on, until it's done processing, then turns the bit off by writing a '1' into its own location in the mask.

	Arnd
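The two schemes discussed in this message can be sketched in a few lines of C. This is an illustrative model only, not code from the ivshmem patch: the register offsets come from the quoted enum, while `doorbell_encode()`, `doorbell_dest()`, `doorbell_value()`, `pending_send()` and `pending_ack()` are hypothetical helper names, and the 32-guest pending bitmask is an assumed size chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register offsets as listed in the quoted enum. */
enum ivshmem_registers {
    IntrMask = 0,
    IntrStatus = 2,
    Doorbell = 4,
    IVPosition = 6,
    IVLiveList = 8
};

#define DOORBELL_BROADCAST 255u  /* upper byte 255 means "all other guests" */

/* Pack destination VM ID (upper 8 bits) and value (lower 8 bits). */
static inline uint16_t doorbell_encode(uint8_t dest_vm, uint8_t value)
{
    return (uint16_t)(((uint16_t)dest_vm << 8) | value);
}

static inline uint8_t doorbell_dest(uint16_t db)
{
    return (uint8_t)(db >> 8);
}

static inline uint8_t doorbell_value(uint16_t db)
{
    return (uint8_t)(db & 0xffu);
}

/*
 * Model of the proposed level-triggered scheme (assumes receiver < 32).
 * Returns true if the bit was clear, i.e. the sender has to write it and
 * cause an intercept; false if the receiver is already pending.
 */
static inline bool pending_send(uint32_t *pending, unsigned int receiver)
{
    uint32_t bit = UINT32_C(1) << receiver;

    if (*pending & bit)
        return false;   /* receiver will come back anyway */
    *pending |= bit;    /* in the proposal the host ORs this in */
    return true;
}

/* Receiver clears its own bit once it has finished processing. */
static inline void pending_ack(uint32_t *pending, unsigned int self)
{
    *pending &= ~(UINT32_C(1) << self);
}
```

With these helpers, `doorbell_encode(3, 7)` yields `0x0307`, and a second `pending_send()` to the same receiver before it acknowledges returns false, which is the intercept the proposal aims to save.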
Re: [PATCH 02/15] KVM: PPC: Allow userspace to unset the IRQ line
On 03/08/2010 08:03 PM, Alexander Graf wrote:
> Userspace can tell us that it wants to trigger an interrupt. But so far it can't tell us that it wants to stop triggering one.
>
> So let's interpret the parameter to the ioctl that we have anyways to tell us if we want to raise or lower the interrupt line.

I asked for a KVM_CAP_ for this. What was the conclusion of that thread?

--
error compiling committee.c: too many arguments to function
Re: [PATCH 09/10] Initialize in-kernel irqchip
On 03/02/2010 08:25 PM, Glauber Costa wrote:
> On Tue, Mar 02, 2010 at 01:31:35AM -0300, Marcelo Tosatti wrote:
>> On Fri, Feb 26, 2010 at 05:12:20PM -0300, Glauber Costa wrote:
>>> Now that we have all devices set up, this patch initializes the irqchip. This is dependant on the io-thread, since we need someone to pull ourselves out of the halted state.
>>
>> I don't understand why - it should work without iothread.
>
> with irqchip in kernel, we have to handle halted state in the kernel too.

We still exit on signals, same as tcg w/o iothread.

qemu-kvm had irqchip before iothread, IIRC.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 14/15] KVM: PPC: Add OSI hypercall interface
On 09.03.2010, at 14:00, Avi Kivity wrote:

> On 03/08/2010 08:03 PM, Alexander Graf wrote:
>> MOL uses its own hypercall interface to call back into userspace when the guest wants to do something.
>>
>> So let's implement that as an exit reason, specify it with a CAP and only really use it when userspace wants us to.
>>
>> The only user of it so far is MOL.
>>
>> Signed-off-by: Alexander Graf <ag...@suse.de>
>>
>> ---
>>
>> v1 -> v2:
>>
>>   - Add documentation for OSI exit struct
>> ---
>>  Documentation/kvm/api.txt             | 13 +
>>  arch/powerpc/include/asm/kvm_book3s.h |  5 +
>>  arch/powerpc/include/asm/kvm_host.h   |  2 ++
>>  arch/powerpc/kvm/book3s.c             | 24 ++--
>>  arch/powerpc/kvm/powerpc.c            | 12
>>  include/linux/kvm.h                   |  6 ++
>>  6 files changed, 56 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
>> index 6a19ab6..b2129e8 100644
>> --- a/Documentation/kvm/api.txt
>> +++ b/Documentation/kvm/api.txt
>> @@ -932,6 +932,19 @@ s390 specific.
>>
>>  powerpc specific.
>>
>> +		/* KVM_EXIT_OSI */
>> +		struct {
>> +			__u64 gprs[32];
>> +		} osi;
>> +
>> +MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
>> +hypercalls and exit with this exit struct that contains all the guest gprs.
>> +
>> +If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
>> +Userspace can now handle the hypercall and when it's done modify the gprs as
>> +necessary. Upon guest entry all guest GPRs will then be replaced by the values
>> +in this struct.
>> +
>
> That's migration unsafe. There may not be next guest entry on this host.

It's as unsafe as MMIO then.

> Is using KVM_[GS]ET_REGS problematic for some reason?

It's two additional ioctls for no good reason. We know the interface, so we can model towards it.

Alex
Re: [PATCH 13/15] KVM: Add support for enabling capabilities per-vcpu
On 03/09/2010 03:01 PM, Alexander Graf wrote:
> On 09.03.2010, at 13:56, Avi Kivity wrote:
>
>> On 03/08/2010 08:03 PM, Alexander Graf wrote:
>>> Some times we don't want all capabilities to be available to all our vcpus. One example for that is the OSI interface, implemented in the next patch.
>>>
>>> In order to have a generic mechanism in how to enable capabilities individually, this patch introduces a new ioctl that can be used for this purpose. That way features we don't want in all guests or userspace configurations can just not be enabled and we're good.
>>>
>>> diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
>>> index d170cb4..6a19ab6 100644
>>> --- a/Documentation/kvm/api.txt
>>> +++ b/Documentation/kvm/api.txt
>>> @@ -749,6 +749,21 @@ Writes debug registers into the vcpu.
>>>  See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
>>>  yet and must be cleared on entry.
>>>
>>> +4.34 KVM_ENABLE_CAP
>>> +
>>> +Capability: basic
>>
>> Capability: basic means that the feature was present in 2.6.22. Otherwise you need to specify the KVM_CAP_ that presents this feature.
>>
>>> +Architectures: all
>>
>> But it's implemented for ppc only (other arches will get ENOTTY).
>
> That was the whole idea behind it. if it fails it fails. Nothing we can do about it. If it succeeds - great.

If KVM_CAP_ENABLE_CAP is present, it means the KVM_ENABLE_CAP ioctl will not return ENOTTY (it may return EINVAL if wrong values are present).

ENOTTY means not implemented. 'Architectures: all' means implemented.

>>> +Not all extensions are enabled by default. Using this ioctl the application
>>> +can enable an extension, making it available to the guest.
>>> +
>>> +On systems that do not support this ioctl, it always fails. On systems that
>>> +do support it, it only works for extensions that are supported for enablement.
>>> +As of writing this the only enablement enabled extension is KVM_CAP_PPC_OSI.
>>
>> That needs to be documented. It also needs to be discoverable separately - we can have a kernel with KVM_ENABLE_CAP but without KVM_CAP_PPC_OSI.
>>
>> btw, KVM_CAP_PPC_OSI conflicts with the KVM_CAP_ namespace. Please choose another namespace.
>
> Well I figured it'd be slick to have capabilities get enabled or disabled. That's the whole idea behind making it generic. If I wanted a specific interface I'd go in and create an ioctl ENABLE_OSI_INTERFACE.

Ah, I see. Well, that makes sense. Please document it.

> But this way the detection if a capability exists can be done using the existing CAP detection. It can then be enabled using ENABLE_CAP.

Okay, I agree.

--
error compiling committee.c: too many arguments to function
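The flow the thread converges on (detect the capability with the existing check, then switch it on per vcpu) might look roughly like this from userspace. This is a sketch under assumptions, not the code from the patch: it uses `KVM_CHECK_EXTENSION`, `KVM_ENABLE_CAP` and `struct kvm_enable_cap` as they appear in current `<linux/kvm.h>` headers, which may differ from the version under review here, and `enable_vcpu_cap()` is a hypothetical helper name.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Check that `cap` is advertised on the /dev/kvm descriptor, then enable
 * it on one vcpu. Returns 0 on success, -1 with errno set on failure.
 * errno is ENOTSUP (chosen by this sketch) when the capability is not
 * advertised at all.
 */
int enable_vcpu_cap(int kvm_fd, int vcpu_fd, uint32_t cap)
{
    struct kvm_enable_cap req;
    int ret;

    /* Discovery: works without creating or running a guest. */
    ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, (unsigned long)cap);
    if (ret < 0)
        return -1;
    if (ret == 0) {
        errno = ENOTSUP;
        return -1;
    }

    /* Enablement: flags and args stay zero in this sketch. */
    memset(&req, 0, sizeof(req));
    req.cap = cap;
    if (ioctl(vcpu_fd, KVM_ENABLE_CAP, &req) < 0)
        return -1;

    return 0;
}
```

A caller would pass the descriptor obtained from opening /dev/kvm, a vcpu descriptor, and for the case discussed here `KVM_CAP_PPC_OSI`. Keeping discovery and enablement as separate steps is what lets userspace learn what is available without running a guest, which is the point raised in the review.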
Re: [PATCH 14/15] KVM: PPC: Add OSI hypercall interface
On 09.03.2010, at 14:11, Avi Kivity wrote:

> On 03/09/2010 03:04 PM, Alexander Graf wrote:
>>>> +		/* KVM_EXIT_OSI */
>>>> +		struct {
>>>> +			__u64 gprs[32];
>>>> +		} osi;
>>>> +
>>>> +MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
>>>> +hypercalls and exit with this exit struct that contains all the guest gprs.
>>>> +
>>>> +If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
>>>> +Userspace can now handle the hypercall and when it's done modify the gprs as
>>>> +necessary. Upon guest entry all guest GPRs will then be replaced by the values
>>>> +in this struct.
>>>> +
>>>
>>> That's migration unsafe. There may not be next guest entry on this host.
>>
>> It's as unsafe as MMIO then.
>
> From api.txt:
>
> NOTE: For KVM_EXIT_IO and KVM_EXIT_MMIO, the corresponding operations are complete (and guest state is consistent) only after userspace has re-entered the kernel with KVM_RUN. The kernel side will first finish incomplete operations and then check for pending signals. Userspace can re-enter the guest with an unmasked signal pending to complete pending operations.

Alright - so I add KVM_EXIT_OSI there and be good? :)

>>> Is using KVM_[GS]ET_REGS problematic for some reason?
>>
>> It's two additional ioctls for no good reason. We know the interface, so we can model towards it.
>
> But we need to be migration safe. If the interface is not heavily used, let's not add complications.

MOL uses OSI calls instead of MMIO. So yes, it is heavily used.

Alex
[PATCH] KVM test: Make sure check_image script runs on VMs turned off
As it is hard to guarantee that a qcow2 image will be in a consistent state with a VM turned on, take an extra safety step and make sure the preprocessor shuts down the VMs before the post process command check_image.py runs.

Signed-off-by: Lucas Meneghel Rodrigues <l...@redhat.com>
---
 client/tests/kvm/tests_base.cfg.sample | 2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 340b0c0..beae786 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -1049,6 +1049,8 @@ variants:
         post_command = python scripts/check_image.py;
         remove_image = no
         post_command_timeout = 600
+        kill_vm = yes
+        kill_vm_gracefully = yes
     - vmdk:
         only Fedora Ubuntu Windows
         only smp2
--
1.6.6.1
Re: [PATCH] KVM: PPC: Do not create debugfs if fail to create vcpu
On 03/09/2010 08:13 AM, Wei Yongjun wrote:
> If fail to create the vcpu, we should not create the debugfs for it.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] take srcu lock before call to complete_pio()
On 03/09/2010 12:01 PM, Gleb Natapov wrote:
> complete_pio() may use slot table which is protected by srcu.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 13/15] KVM: Add support for enabling capabilities per-vcpu
On 09.03.2010, at 13:56, Avi Kivity wrote:

> On 03/08/2010 08:03 PM, Alexander Graf wrote:
>> Some times we don't want all capabilities to be available to all our vcpus. One example for that is the OSI interface, implemented in the next patch.
>>
>> In order to have a generic mechanism in how to enable capabilities individually, this patch introduces a new ioctl that can be used for this purpose. That way features we don't want in all guests or userspace configurations can just not be enabled and we're good.
>>
>> diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
>> index d170cb4..6a19ab6 100644
>> --- a/Documentation/kvm/api.txt
>> +++ b/Documentation/kvm/api.txt
>> @@ -749,6 +749,21 @@ Writes debug registers into the vcpu.
>>  See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
>>  yet and must be cleared on entry.
>>
>> +4.34 KVM_ENABLE_CAP
>> +
>> +Capability: basic
>
> Capability: basic means that the feature was present in 2.6.22. Otherwise you need to specify the KVM_CAP_ that presents this feature.
>
>> +Architectures: all
>
> But it's implemented for ppc only (other arches will get ENOTTY).

That was the whole idea behind it. if it fails it fails. Nothing we can do about it. If it succeeds - great.

>> +Not all extensions are enabled by default. Using this ioctl the application
>> +can enable an extension, making it available to the guest.
>> +
>> +On systems that do not support this ioctl, it always fails. On systems that
>> +do support it, it only works for extensions that are supported for enablement.
>> +As of writing this the only enablement enabled extension is KVM_CAP_PPC_OSI.
>
> That needs to be documented. It also needs to be discoverable separately - we can have a kernel with KVM_ENABLE_CAP but without KVM_CAP_PPC_OSI.
>
> btw, KVM_CAP_PPC_OSI conflicts with the KVM_CAP_ namespace. Please choose another namespace.

Well I figured it'd be slick to have capabilities get enabled or disabled. That's the whole idea behind making it generic. If I wanted a specific interface I'd go in and create an ioctl ENABLE_OSI_INTERFACE.

But this way the detection if a capability exists can be done using the existing CAP detection. It can then be enabled using ENABLE_CAP.

> Need to document the structure fields.
>
>> /*
>> @@ -696,6 +705,8 @@ struct kvm_clock_data {
>>  /* Available with KVM_CAP_DEBUGREGS */
>>  #define KVM_GET_DEBUGREGS         _IOR(KVMIO, 0xa1, struct kvm_debugregs)
>>  #define KVM_SET_DEBUGREGS         _IOW(KVMIO, 0xa2, struct kvm_debugregs)
>> +/* No need for CAP, because then it just always fails */
>> +#define KVM_ENABLE_CAP            _IOW(KVMIO, 0xa3, struct kvm_enable_cap)
>
> The CAPs are needed so you can discover what you have without running guests.

The whole point of this extension was to make CAPs not always enabled, but make them possibly enable on demand.

Alex
Re: [PATCH 14/15] KVM: PPC: Add OSI hypercall interface
On 03/09/2010 03:04 PM, Alexander Graf wrote:
>>> +		/* KVM_EXIT_OSI */
>>> +		struct {
>>> +			__u64 gprs[32];
>>> +		} osi;
>>> +
>>> +MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
>>> +hypercalls and exit with this exit struct that contains all the guest gprs.
>>> +
>>> +If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
>>> +Userspace can now handle the hypercall and when it's done modify the gprs as
>>> +necessary. Upon guest entry all guest GPRs will then be replaced by the values
>>> +in this struct.
>>> +
>>
>> That's migration unsafe. There may not be next guest entry on this host.
>
> It's as unsafe as MMIO then.

From api.txt:

NOTE: For KVM_EXIT_IO and KVM_EXIT_MMIO, the corresponding operations are complete (and guest state is consistent) only after userspace has re-entered the kernel with KVM_RUN. The kernel side will first finish incomplete operations and then check for pending signals. Userspace can re-enter the guest with an unmasked signal pending to complete pending operations.

>> Is using KVM_[GS]ET_REGS problematic for some reason?
>
> It's two additional ioctls for no good reason. We know the interface, so we can model towards it.

But we need to be migration safe. If the interface is not heavily used, let's not add complications.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 02/15] KVM: PPC: Allow userspace to unset the IRQ line
On 09.03.2010, at 13:50, Avi Kivity wrote:

> On 03/08/2010 08:03 PM, Alexander Graf wrote:
>> Userspace can tell us that it wants to trigger an interrupt. But so far it can't tell us that it wants to stop triggering one.
>>
>> So let's interpret the parameter to the ioctl that we have anyways to tell us if we want to raise or lower the interrupt line.
>
> I asked for a KVM_CAP_ for this. What was the conclusion of that thread?

Uh - did we come to one? The last thing you said about it was:

> Having individual capabilities makes backporting a lot easier (otherwise you have to backport the whole thing). If the changes are logically separate, I prefer 500 separate capabilities.
>
> However, for a platform bringup, it's okay to have just one capability, assuming none of the changes are applicable to other platforms.

So I assumed it'd be ok to not have one. If you like I can send an additional patch adding the CAP.

Alex
Re: [PATCH 02/15] KVM: PPC: Allow userspace to unset the IRQ line
On 03/09/2010 02:54 PM, Alexander Graf wrote:
> On 09.03.2010, at 13:50, Avi Kivity wrote:
>
>> On 03/08/2010 08:03 PM, Alexander Graf wrote:
>>> Userspace can tell us that it wants to trigger an interrupt. But so far it can't tell us that it wants to stop triggering one.
>>>
>>> So let's interpret the parameter to the ioctl that we have anyways to tell us if we want to raise or lower the interrupt line.
>>
>> I asked for a KVM_CAP_ for this. What was the conclusion of that thread?
>
> Uh - did we come to one? The last thing you said about it was:
>
>> Having individual capabilities makes backporting a lot easier (otherwise you have to backport the whole thing). If the changes are logically separate, I prefer 500 separate capabilities.
>>
>> However, for a platform bringup, it's okay to have just one capability, assuming none of the changes are applicable to other platforms.
>
> So I assumed it'd be ok to not have one. If you like I can send an additional patch adding the CAP.

Well, what's the capability for this patchset? Things like "if you have KVM_CAP_OSI you can assume you have KVM_INTERRUPT_LOWER" don't work for me. A platform cap would be called KVM_CAP_MOL and explicitly document everything in there. And it commits you to not deprecating things individually.

Really, individual caps are better.

--
error compiling committee.c: too many arguments to function
Re: [patch 0/7] kvm-tpr-opt cleanups
On 03/09/2010 02:47 AM, Marcelo Tosatti wrote:
> Prepare kvm-tpr-opt.c for upstream merge.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 14/15] KVM: PPC: Add OSI hypercall interface
On 03/09/2010 03:12 PM, Alexander Graf wrote:
> On 09.03.2010, at 14:11, Avi Kivity wrote:
>
>> On 03/09/2010 03:04 PM, Alexander Graf wrote:
>>>>> +		/* KVM_EXIT_OSI */
>>>>> +		struct {
>>>>> +			__u64 gprs[32];
>>>>> +		} osi;
>>>>> +
>>>>> +MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
>>>>> +hypercalls and exit with this exit struct that contains all the guest gprs.
>>>>> +
>>>>> +If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
>>>>> +Userspace can now handle the hypercall and when it's done modify the gprs as
>>>>> +necessary. Upon guest entry all guest GPRs will then be replaced by the values
>>>>> +in this struct.
>>>>> +
>>>>
>>>> That's migration unsafe. There may not be next guest entry on this host.
>>>
>>> It's as unsafe as MMIO then.
>>
>> From api.txt:
>>
>> NOTE: For KVM_EXIT_IO and KVM_EXIT_MMIO, the corresponding operations are complete (and guest state is consistent) only after userspace has re-entered the kernel with KVM_RUN. The kernel side will first finish incomplete operations and then check for pending signals. Userspace can re-enter the guest with an unmasked signal pending to complete pending operations.
>
> Alright - so I add KVM_EXIT_OSI there and be good? :)

Sure, just verify that the note holds for that case too.

>> But we need to be migration safe. If the interface is not heavily used, let's not add complications.
>
> MOL uses OSI calls instead of MMIO. So yes, it is heavily used.

Ok.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 05/10] Don't call apic functions directly from kvm code
On 02/26/2010 10:12 PM, Glauber Costa wrote:
> It is actually not necessary to call a tpr function to save and load cr8, as cr8 is part of the processor state, and thus, it is much easier to just add it to CPUState.
>
> As for apic base, wrap kvm usages, so we can call either the qemu device, or the in kernel version.
>
>  }
>
> +static void kvm_set_apic_base(CPUState *env, uint64_t val)
> +{
> +    if (!kvm_irqchip_in_kernel())
> +        cpu_set_apic_base(env, val);

What if it is in kernel? Just ignored? Doesn't seem right.

--
error compiling committee.c: too many arguments to function
[PATCH 17/24] KVM: x86 emulator: Use load_segment_descriptor() instead of kvm_load_segment_descriptor()
Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/emulate.c | 10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 094d17c..81ecf47 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1506,7 +1506,7 @@ static int emulate_pop_sreg(struct x86_emulate_ctxt *ctxt,
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 
-	rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)selector, seg);
+	rc = load_segment_descriptor(ctxt, ops, (u16)selector, seg);
 	return rc;
 }
 
@@ -1681,7 +1681,7 @@ static int emulate_ret_far(struct x86_emulate_ctxt *ctxt,
 	rc = emulate_pop(ctxt, ops, &cs, c->op_bytes);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
-	rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)cs, VCPU_SREG_CS);
+	rc = load_segment_descriptor(ctxt, ops, (u16)cs, VCPU_SREG_CS);
 	return rc;
 }
 
@@ -2714,7 +2714,7 @@ special_insn:
 		if (c->modrm_reg == VCPU_SREG_SS)
 			toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_MOV_SS);
 
-		rc = kvm_load_segment_descriptor(ctxt->vcpu, sel, c->modrm_reg);
+		rc = load_segment_descriptor(ctxt, ops, sel, c->modrm_reg);
 
 		c->dst.type = OP_NONE;  /* Disable writeback. */
 		break;
@@ -2889,8 +2889,8 @@ special_insn:
 		goto jmp;
 	case 0xea: /* jmp far */
 	jump_far:
-		if (kvm_load_segment_descriptor(ctxt->vcpu, c->src2.val,
-						VCPU_SREG_CS))
+		if (load_segment_descriptor(ctxt, ops, c->src2.val,
+					    VCPU_SREG_CS))
 			goto done;
 
 		c->eip = c->src.val;
--
1.6.5
[PATCH 15/24] KVM: x86 emulator: Provide more callbacks for x86 emulator.
Provide get_cached_descriptor(), set_cached_descriptor(), get_segment_selector(), set_segment_selector(), get_gdt(), write_std() callbacks.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |  16 +
 arch/x86/kvm/x86.c                 | 130 +++
 2 files changed, 131 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 032d02f..e881618 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -63,6 +63,15 @@ struct x86_emulate_ops {
 			unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
 
 	/*
+	 * write_std: Write bytes of standard (non-emulated/special) memory.
+	 *            Used for descriptor writing.
+	 *  @addr:  [IN ] Linear address to which to write.
+	 *  @val:   [OUT] Value write to memory, zero-extended to 'u_long'.
+	 *  @bytes: [IN ] Number of bytes to write to memory.
+	 */
+	int (*write_std)(unsigned long addr, void *val,
+			 unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
+	/*
 	 * fetch: Read bytes of standard (non-emulated/special) memory.
 	 *        Used for instruction fetch.
 	 *  @addr:  [IN ] Linear address from which to read.
@@ -108,6 +117,13 @@ struct x86_emulate_ops {
 				const void *new,
 				unsigned int bytes,
 				struct kvm_vcpu *vcpu);
+	bool (*get_cached_descriptor)(struct desc_struct *desc,
+				      int seg, struct kvm_vcpu *vcpu);
+	void (*set_cached_descriptor)(struct desc_struct *desc,
+				      int seg, struct kvm_vcpu *vcpu);
+	u16 (*get_segment_selector)(int seg, struct kvm_vcpu *vcpu);
+	void (*set_segment_selector)(u16 sel, int seg, struct kvm_vcpu *vcpu);
+	void (*get_gdt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu);
 	ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
 	void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 41cf54c..f89502d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3077,6 +3077,18 @@ static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
 	return kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, addr, len, v);
 }
 
+static void kvm_set_segment(struct kvm_vcpu *vcpu,
+			    struct kvm_segment *var, int seg)
+{
+	kvm_x86_ops->set_segment(vcpu, var, seg);
+}
+
+void kvm_get_segment(struct kvm_vcpu *vcpu,
+		     struct kvm_segment *var, int seg)
+{
+	kvm_x86_ops->get_segment(vcpu, var, seg);
+}
+
 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
 {
 	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
@@ -3157,14 +3169,18 @@ static int kvm_read_guest_virt_system(gva_t addr, void *val, unsigned int bytes,
 	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, error);
 }
 
-static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes,
-				struct kvm_vcpu *vcpu, u32 *error)
+static int kvm_write_guest_virt_helper(gva_t addr, void *val,
+				       unsigned int bytes,
+				       struct kvm_vcpu *vcpu, u32 access,
+				       u32 *error)
 {
 	void *data = val;
 	int r = X86EMUL_CONTINUE;
 
+	access |= PFERR_WRITE_MASK;
+
 	while (bytes) {
-		gpa_t gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, error);
+		gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr, access, error);
 		unsigned offset = addr & (PAGE_SIZE-1);
 		unsigned towrite = min(bytes, (unsigned)PAGE_SIZE - offset);
 		int ret;
@@ -3187,6 +3203,19 @@ out:
 	return r;
 }
 
+static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes,
+				struct kvm_vcpu *vcpu, u32 *error)
+{
+	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+	return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, access, error);
+}
+
+static int kvm_write_guest_virt_system(gva_t addr, void *val,
+				       unsigned int bytes,
+				       struct kvm_vcpu *vcpu, u32 *error)
+{
+	return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, 0, error);
+}
 
 static int emulator_read_emulated(unsigned long addr,
 				  void *val,
@@ -3453,12 +3482,95 @@ static void emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu)
 	}
 }
 
+static void emulator_get_gdt(struct desc_ptr *dt, struct kvm_vcpu *vcpu)
+{
+	kvm_x86_ops->get_gdt(vcpu, dt);
+}
+
+static bool
[PATCH 14/24] KVM: x86 emulator: cleanup grp3 return value
When x86_emulate_insn() does not know how to emulate an instruction, it exits
via the cannot_emulate label in all cases except when emulating grp3. Fix that.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/emulate.c |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 018abb3..6e2b34b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1394,7 +1394,6 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
 			       struct x86_emulate_ops *ops)
 {
 	struct decode_cache *c = &ctxt->decode;
-	int rc = X86EMUL_CONTINUE;
 
 	switch (c->modrm_reg) {
 	case 0 ... 1:	/* test */
@@ -1407,11 +1406,9 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
 		emulate_1op("neg", c->dst, ctxt->eflags);
 		break;
 	default:
-		DPRINTF("Cannot emulate %02x\n", c->b);
-		rc = X86EMUL_UNHANDLEABLE;
-		break;
+		return 0;
 	}
-	return rc;
+	return 1;
 }
 
 static inline int emulate_grp45(struct x86_emulate_ctxt *ctxt,
@@ -2370,9 +2367,8 @@ special_insn:
 		c->dst.type = OP_NONE;	/* Disable writeback. */
 		break;
 	case 0xf6 ... 0xf7:	/* Grp3 */
-		rc = emulate_grp3(ctxt, ops);
-		if (rc != X86EMUL_CONTINUE)
-			goto done;
+		if (!emulate_grp3(ctxt, ops))
+			goto cannot_emulate;
 		break;
 	case 0xf8: /* clc */
 		ctxt->eflags &= ~EFLG_CF;
-- 
1.6.5
[PATCH 10/24] KVM: x86 emulator: fix mov dr to inject #UD when needed.
If CR4.DE=1, access to registers DR4/DR5 causes #UD.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/emulate.c |   18 ++++++++++++------
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5ba082a..dcb9720 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2527,9 +2527,12 @@ twobyte_insn:
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x21: /* mov from dr to reg */
-		if (emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]))
-			goto cannot_emulate;
-		rc = X86EMUL_CONTINUE;
+		if ((ops->get_cr(4, ctxt->vcpu) & X86_CR4_DE) &&
+		    (c->modrm_reg == 4 || c->modrm_reg == 5)) {
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
+		}
+		emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]);
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x22: /* mov reg, cr */
@@ -2537,9 +2540,12 @@ twobyte_insn:
 		c->dst.type = OP_NONE;
 		break;
 	case 0x23: /* mov from reg to dr */
-		if (emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]))
-			goto cannot_emulate;
-		rc = X86EMUL_CONTINUE;
+		if ((ops->get_cr(4, ctxt->vcpu) & X86_CR4_DE) &&
+		    (c->modrm_reg == 4 || c->modrm_reg == 5)) {
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
+		}
+		emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]);
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x30:
-- 
1.6.5
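
[Editorial illustration, not part of the patch.] The condition added to both MOV DR paths can be read as a single predicate. The sketch below is a standalone restatement under the assumption that CR4.DE is bit 3 of CR4; the helper name mov_dr_raises_ud() and the CR4_DE macro are made up for this example.

```c
#include <stdbool.h>

/* Assumption for this sketch: CR4.DE (Debugging Extensions) is bit 3. */
#define CR4_DE (1ul << 3)

/*
 * Hypothetical helper: returns true when a MOV to/from debug register
 * `dr` should raise #UD, i.e. when CR4.DE is set and the register is
 * DR4 or DR5.
 */
static bool mov_dr_raises_ud(unsigned long cr4, unsigned int dr)
{
	return (cr4 & CR4_DE) && (dr == 4 || dr == 5);
}
```

With CR4.DE clear, the predicate is false for every register, so the patched code falls through to emulator_get_dr()/emulator_set_dr() as before.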
[PATCH 09/24] KVM: x86 emulator: inject #UD on access to non-existing CR
Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/emulate.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 54e62dc..5ba082a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2516,6 +2516,13 @@ twobyte_insn:
 		c->dst.type = OP_NONE;
 		break;
 	case 0x20: /* mov cr, reg */
+		switch (c->modrm_reg) {
+		case 1:
+		case 5 ... 7:
+		case 9 ... 15:
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
+		}
 		c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
-- 
1.6.5
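
[Editorial illustration, not part of the patch.] The switch above rejects CR1, CR5..CR7 and CR9..CR15, leaving CR0, CR2, CR3, CR4 and CR8 as the registers that reach get_cr(). A minimal portable restatement, without the GNU case-range extension, might look like this; the function name is hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the switch in the patch: returns true
 * for control register numbers that the patch treats as non-existing.
 */
static bool mov_cr_raises_ud(unsigned int cr)
{
	switch (cr) {
	case 1:
	case 5:
	case 6:
	case 7:
		return true;
	default:
		/* CR9..CR15 are rejected as well; CR8 is accepted. */
		return cr >= 9 && cr <= 15;
	}
}
```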
[PATCH 06/24] KVM: x86 emulator: fix mov r/m, sreg emulation.
mov r/m, sreg generates #UD if sreg is incorrect.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/emulate.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2cc9ef4..2df510b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2122,12 +2122,11 @@ special_insn:
 	case 0x8c: { /* mov r/m, sreg */
 		struct kvm_segment segreg;
 
-		if (c->modrm_reg <= 5)
+		if (c->modrm_reg <= VCPU_SREG_GS)
 			kvm_get_segment(ctxt->vcpu, &segreg, c->modrm_reg);
 		else {
-			printk(KERN_INFO "0x8c: Invalid segreg in modrm byte 0x%02x\n",
-			       c->modrm);
-			goto cannot_emulate;
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
 		}
 		c->dst.val = segreg.selector;
 		break;
-- 
1.6.5
[PATCH 08/24] KVM: x86 emulator: 0f (20|21|22|23) ignore mod bits.
Recent spec says that for 0f (20|21|22|23) the 2 bits in the mod field are
ignored. Interestingly enough, an older spec says that 11 is the only valid
encoding.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/emulate.c |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1a32b78..54e62dc 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2516,28 +2516,20 @@ twobyte_insn:
 		c->dst.type = OP_NONE;
 		break;
 	case 0x20: /* mov cr, reg */
-		if (c->modrm_mod != 3)
-			goto cannot_emulate;
 		c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x21: /* mov from dr to reg */
-		if (c->modrm_mod != 3)
-			goto cannot_emulate;
 		if (emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]))
 			goto cannot_emulate;
 		rc = X86EMUL_CONTINUE;
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x22: /* mov reg, cr */
-		if (c->modrm_mod != 3)
-			goto cannot_emulate;
 		ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu);
 		c->dst.type = OP_NONE;
 		break;
 	case 0x23: /* mov from reg to dr */
-		if (c->modrm_mod != 3)
-			goto cannot_emulate;
 		if (emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]))
 			goto cannot_emulate;
 		rc = X86EMUL_CONTINUE;
-- 
1.6.5
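
[Editorial illustration, not part of the patch.] For readers unfamiliar with the ModRM layout: the byte splits into mod (bits 7:6), reg (bits 5:3) and rm (bits 2:0). The sketch below decodes those fields and shows that two bytes differing only in mod select the same reg/rm pair, which is why dropping the mod check does not change which registers are used. The struct and function names are made up for this example, and REX extensions are ignored.

```c
#include <stdint.h>

/* Hypothetical container for the three ModRM fields. */
struct modrm_fields {
	uint8_t mod;	/* bits 7:6 */
	uint8_t reg;	/* bits 5:3 */
	uint8_t rm;	/* bits 2:0 */
};

/* Split a raw ModRM byte into its fields (no REX handling). */
static struct modrm_fields decode_modrm(uint8_t byte)
{
	struct modrm_fields f;

	f.mod = (uint8_t)(byte >> 6);
	f.reg = (uint8_t)((byte >> 3) & 7);
	f.rm = (uint8_t)(byte & 7);
	return f;
}
```

For example, 0xd8 (mod=3) and 0x18 (mod=0) both decode to reg=3, rm=0.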
[PATCH 04/24] KVM: Provide current CPL as part of emulator context.
Eliminate the need to call back into KVM to get it from the emulator.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |    1 +
 arch/x86/kvm/emulate.c             |    6 +++---
 arch/x86/kvm/x86.c                 |    1 +
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 0c5caa4..d8b2da0 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -159,6 +159,7 @@ struct x86_emulate_ctxt {
 	struct kvm_vcpu *vcpu;
 
 	unsigned long eflags;
+	int cpl;
 	/* Emulated execution mode, represented by an X86EMUL_MODE value. */
 	int mode;
 	u32 cs_base;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 3d1ee74..ed29a52 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1254,7 +1254,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
 	int rc;
 	unsigned long val, change_mask;
 	int iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
-	int cpl = kvm_x86_ops->get_cpl(ctxt->vcpu);
+	int cpl = ctxt->cpl;
 
 	rc = emulate_pop(ctxt, ops, &val, len);
 	if (rc != X86EMUL_CONTINUE)
@@ -1763,7 +1763,7 @@ static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
 	if (ctxt->mode == X86EMUL_MODE_VM86)
 		return true;
 	iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
-	return kvm_x86_ops->get_cpl(ctxt->vcpu) > iopl;
+	return ctxt->cpl > iopl;
 }
 
 static bool emulator_io_port_access_allowed(struct x86_emulate_ctxt *ctxt,
@@ -1839,7 +1839,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	}
 
 	/* Privileged instruction can be executed only in CPL=0 */
-	if ((c->d & Priv) && kvm_x86_ops->get_cpl(ctxt->vcpu)) {
+	if ((c->d & Priv) && ctxt->cpl) {
 		kvm_inject_gp(ctxt->vcpu, 0);
 		goto done;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f2b61c..9b5fb43 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3499,6 +3499,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 
 		vcpu->arch.emulate_ctxt.vcpu = vcpu;
 		vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
+		vcpu->arch.emulate_ctxt.cpl = kvm_x86_ops->get_cpl(vcpu);
 		vcpu->arch.emulate_ctxt.mode =
 			(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
 			(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
-- 
1.6.5
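
[Editorial illustration, not part of the patch.] Once the CPL is cached in the context, the IOPL comparison in emulator_bad_iopl() needs no callback. The sketch below shows only that final comparison (the real function also special-cases the emulation mode first). It assumes the IOPL field occupies EFLAGS bits 13:12; the struct and macro names are invented for the example.

```c
#include <stdbool.h>

/* Assumption for this sketch: IOPL is the two-bit EFLAGS field at bits 13:12. */
#define EFLAGS_IOPL_MASK	0x3000ul
#define EFLAGS_IOPL_SHIFT	12

/* Hypothetical, trimmed-down emulator context with a cached CPL. */
struct emu_ctxt {
	unsigned long eflags;
	int cpl;
};

/* True when the current privilege level is numerically above IOPL. */
static bool cpl_exceeds_iopl(const struct emu_ctxt *ctxt)
{
	int iopl = (int)((ctxt->eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT);

	return ctxt->cpl > iopl;
}
```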
[PATCH 21/24] KVM: x86 emulator: remove saved_eip
c->eip is never written back in case of emulation failure, so there is no
need to reset it to the old value.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/emulate.c |    9 +--------
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 505dfba..ba1ce61 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2377,7 +2377,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
 	unsigned long memop = 0;
 	u64 msr_data;
-	unsigned long saved_eip = 0;
 	struct decode_cache *c = &ctxt->decode;
 	unsigned int port;
 	int io_dir_in;
@@ -2391,7 +2390,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	 */
 
 	memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
-	saved_eip = c->eip;
 
 	if (ctxt->mode == X86EMUL_MODE_PROT64 && (c->d & No64)) {
 		kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
@@ -2983,11 +2981,7 @@ writeback:
 	kvm_rip_write(ctxt->vcpu, c->eip);
 
 done:
-	if (rc == X86EMUL_UNHANDLEABLE) {
-		c->eip = saved_eip;
-		return -1;
-	}
-	return 0;
+	return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 
 twobyte_insn:
 	switch (c->b) {
@@ -3264,6 +3258,5 @@ twobyte_insn:
 
 cannot_emulate:
 	DPRINTF("Cannot emulate %02x\n", c->b);
-	c->eip = saved_eip;
 	return -1;
 }
-- 
1.6.5
[PATCH 13/24] KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.
If the LOCK prefix is used, the dest arg should be memory; otherwise the
instruction should generate #UD.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/emulate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 45ded7f..018abb3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1838,7 +1838,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	}
 
 	/* LOCK prefix is allowed only with some instructions */
-	if (c->lock_prefix && !(c->d & Lock)) {
+	if (c->lock_prefix && (!(c->d & Lock) || c->dst.type != OP_MEM)) {
 		kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
 		goto done;
 	}
-- 
1.6.5
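
[Editorial illustration, not part of the patch.] The new condition combines two checks: the opcode must be flagged as lockable, and the destination operand must be memory. The sketch restates it as a pure function; the flag value, enum and function name are placeholders for this example and do not match the emulator's real definitions.

```c
#include <stdbool.h>

/* Placeholder operand kinds for this sketch. */
enum operand_type { OPERAND_REG, OPERAND_MEM, OPERAND_IMM, OPERAND_NONE };

/* Placeholder decode flag meaning "LOCK prefix is permitted". */
#define FLAG_LOCK (1u << 0)

/*
 * Returns true when an instruction carrying a LOCK prefix should raise
 * #UD: either the opcode does not allow LOCK at all, or it does but the
 * destination is not a memory operand.
 */
static bool lock_prefix_raises_ud(bool lock_prefix, unsigned int flags,
				  enum operand_type dst)
{
	return lock_prefix && (!(flags & FLAG_LOCK) || dst != OPERAND_MEM);
}
```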
[PATCH 12/24] KVM: x86 emulator: do not call writeback if msr access fails.
Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/emulate.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 6381df9..45ded7f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2559,7 +2559,7 @@ twobyte_insn:
 			| ((u64)c->regs[VCPU_REGS_RDX] << 32);
 		if (kvm_set_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], msr_data)) {
 			kvm_inject_gp(ctxt->vcpu, 0);
-			c->eip = ctxt->eip;
+			goto done;
 		}
 		rc = X86EMUL_CONTINUE;
 		c->dst.type = OP_NONE;
@@ -2568,7 +2568,7 @@ twobyte_insn:
 		/* rdmsr */
 		if (kvm_get_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], &msr_data)) {
 			kvm_inject_gp(ctxt->vcpu, 0);
-			c->eip = ctxt->eip;
+			goto done;
 		} else {
 			c->regs[VCPU_REGS_RAX] = (u32)msr_data;
 			c->regs[VCPU_REGS_RDX] = msr_data >> 32;
-- 
1.6.5
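
[Editorial illustration, not part of the patch.] The context lines of this diff pack and unpack the 64-bit MSR value through the EDX:EAX register pair. The sketch below shows that split in isolation; the function names are invented, and unlike the context line above it also truncates RDX to 32 bits before shifting.

```c
#include <stdint.h>

/* wrmsr direction: build the 64-bit value from EDX (high) and EAX (low). */
static uint64_t msr_from_regs(unsigned long rax, unsigned long rdx)
{
	return (uint64_t)(uint32_t)rax | ((uint64_t)(uint32_t)rdx << 32);
}

/* rdmsr direction: low half goes to RAX, high half to RDX. */
static void msr_to_regs(uint64_t value, unsigned long *rax, unsigned long *rdx)
{
	*rax = (uint32_t)value;
	*rdx = (unsigned long)(value >> 32);
}
```

With the patch applied, neither register is updated when the MSR access fails, because the code jumps to done after injecting #GP and skips writeback.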
[PATCH 19/24] KVM: x86 emulator: fix in/out emulation.
in/out emulation is currently broken. The breakage differs depending on where
the IO device resides. If it is in userspace, the emulator reports emulation
failure since it incorrectly interprets the kvm_emulate_pio() return value.
If the IO device is in the kernel, emulation of 'in' does nothing:
kvm_emulate_pio() stores the result directly into vcpu registers, so the
emulator overwrites the result of emulation when it commits the shadowed
registers.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |    7
 arch/x86/kvm/emulate.c             |   17
 arch/x86/kvm/svm.c                 |   22
 arch/x86/kvm/vmx.c                 |   19
 arch/x86/kvm/x86.c                 |  203
 5 files changed, 139 insertions(+), 129 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 4268330..7d323d5 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -119,6 +119,13 @@ struct x86_emulate_ops {
 				const void *new,
 				unsigned int bytes,
 				struct kvm_vcpu *vcpu);
+
+	int (*pio_in_emulated)(int size, unsigned short port, void *val,
+			       unsigned int count, struct kvm_vcpu *vcpu);
+
+	int (*pio_out_emulated)(int size, unsigned short port, const void *val,
+				unsigned int count, struct kvm_vcpu *vcpu);
+
 	bool (*get_cached_descriptor)(struct desc_struct *desc,
 				      int seg, struct kvm_vcpu *vcpu);
 	void (*set_cached_descriptor)(struct desc_struct *desc,
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 81ecf47..0ec7b9b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -208,12 +208,12 @@ static u32 opcode_table[256] = {
 	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0xE0 - 0xE7 */
 	0, 0, 0, 0,
-	ByteOp | SrcImmUByte, SrcImmUByte,
+	ByteOp | SrcImmUByte | DstAcc, SrcImmUByte | DstAcc,
 	ByteOp | SrcImmUByte, SrcImmUByte,
 	/* 0xE8 - 0xEF */
 	SrcImm | Stack, SrcImm | ImplicitOps,
 	SrcImmU | Src2Imm16 | No64, SrcImmByte | ImplicitOps,
-	SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps,
+	SrcNone | ByteOp | DstAcc, SrcNone | DstAcc,
 	SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps,
 	/* 0xF0 - 0xF7 */
 	0, 0, 0, 0,
@@ -2915,12 +2915,13 @@ special_insn:
 			kvm_inject_gp(ctxt->vcpu, 0);
 			goto done;
 		}
-		if (kvm_emulate_pio(ctxt->vcpu, io_dir_in,
-				   (c->d & ByteOp) ? 1 : c->op_bytes,
-				   port) != 0) {
-			c->eip = saved_eip;
-			goto cannot_emulate;
-		}
+		if (io_dir_in)
+			ops->pio_in_emulated((c->d & ByteOp) ? 1 : c->op_bytes,
+					     port, &c->dst.val, 1, ctxt->vcpu);
+		else
+			ops->pio_out_emulated((c->d & ByteOp) ? 1 : c->op_bytes,
+					      port, &c->regs[VCPU_REGS_RAX], 1,
+					      ctxt->vcpu);
 		break;
 	case 0xf4:              /* hlt */
 		ctxt->vcpu->arch.halt_request = 1;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index def4877..315e8a8 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1488,29 +1488,9 @@ static int shutdown_interception(struct vcpu_svm *svm)
 
 static int io_interception(struct vcpu_svm *svm)
 {
-	u32 io_info = svm->vmcb->control.exit_info_1; /* address size bug? */
-	int size, in, string;
-	unsigned port;
-
 	++svm->vcpu.stat.io_exits;
 
-	svm->next_rip = svm->vmcb->control.exit_info_2;
-
-	string = (io_info & SVM_IOIO_STR_MASK) != 0;
-
-	if (string) {
-		if (emulate_instruction(&svm->vcpu,
-					0, 0, 0) == EMULATE_DO_MMIO)
-			return 0;
-		return 1;
-	}
-
-	in = (io_info & SVM_IOIO_TYPE_MASK) != 0;
-	port = io_info >> 16;
-	size = (io_info & SVM_IOIO_SIZE_MASK) >> SVM_IOIO_SIZE_SHIFT;
-
-	skip_emulated_instruction(&svm->vcpu);
-	return kvm_emulate_pio(&svm->vcpu, in, size, port);
+	return !(emulate_instruction(&svm->vcpu, 0, 0, 0) == EMULATE_DO_MMIO);
 }
 
 static int nmi_interception(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ae3217d..7f33d8e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2974,26 +2974,9 @@ static int handle_triple_fault(struct kvm_vcpu *vcpu)
 
 static int handle_io(struct kvm_vcpu *vcpu)
 {
-	unsigned long exit_qualification;
-	int size, in, string;
-	unsigned port;
-
 	++vcpu->stat.io_exits;
-
[PATCH 11/24] KVM: x86 emulator: fix return values of syscall/sysenter/sysexit emulations
Return X86EMUL_PROPAGATE_FAULT if a fault was injected. Also inject #UD for
those instructions when appropriate.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/emulate.c |   17 +++++++++++------
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index dcb9720..6381df9 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1597,8 +1597,11 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 	u64 msr_data;
 
 	/* syscall is not available in real mode */
-	if (ctxt->mode == X86EMUL_MODE_REAL || ctxt->mode == X86EMUL_MODE_VM86)
-		return X86EMUL_UNHANDLEABLE;
+	if (ctxt->mode == X86EMUL_MODE_REAL ||
+	    ctxt->mode == X86EMUL_MODE_VM86) {
+		kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+		return X86EMUL_PROPAGATE_FAULT;
+	}
 
 	setup_syscalls_segments(ctxt, &cs, &ss);
 
@@ -1648,14 +1651,16 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
 	/* inject #GP if in real mode */
 	if (ctxt->mode == X86EMUL_MODE_REAL) {
 		kvm_inject_gp(ctxt->vcpu, 0);
-		return X86EMUL_UNHANDLEABLE;
+		return X86EMUL_PROPAGATE_FAULT;
 	}
 
 	/* XXX sysenter/sysexit have not been tested in 64bit mode.
 	* Therefore, we inject an #UD.
 	*/
-	if (ctxt->mode == X86EMUL_MODE_PROT64)
-		return X86EMUL_UNHANDLEABLE;
+	if (ctxt->mode == X86EMUL_MODE_PROT64) {
+		kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+		return X86EMUL_PROPAGATE_FAULT;
+	}
 
 	setup_syscalls_segments(ctxt, &cs, &ss);
 
@@ -1710,7 +1715,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
 	if (ctxt->mode == X86EMUL_MODE_REAL ||
 	    ctxt->mode == X86EMUL_MODE_VM86) {
 		kvm_inject_gp(ctxt->vcpu, 0);
-		return X86EMUL_UNHANDLEABLE;
+		return X86EMUL_PROPAGATE_FAULT;
 	}
 
 	setup_syscalls_segments(ctxt, &cs, &ss);
-- 
1.6.5
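
[Editorial illustration, not part of the patch.] After this change, the mode checks at the top of emulate_syscall() and emulate_sysenter() each inject a fault and report it, instead of returning "unhandleable". The sketch below tabulates those early-exit checks as shown in the diff; every enum and function name here is made up for the example and only the checks visible in the hunks are modelled.

```c
/* Placeholder emulation modes for this sketch. */
enum emu_mode { EMU_REAL, EMU_VM86, EMU_PROT16, EMU_PROT32, EMU_PROT64 };

/* Placeholder result codes and fault kinds. */
enum emu_rc { EMU_CONTINUE, EMU_PROPAGATE_FAULT };
enum emu_fault { FAULT_NONE, FAULT_UD, FAULT_GP };

struct precheck {
	enum emu_rc rc;
	enum emu_fault fault;
};

/* syscall: real mode and VM86 get #UD. */
static struct precheck syscall_precheck(enum emu_mode mode)
{
	if (mode == EMU_REAL || mode == EMU_VM86)
		return (struct precheck){ EMU_PROPAGATE_FAULT, FAULT_UD };
	return (struct precheck){ EMU_CONTINUE, FAULT_NONE };
}

/* sysenter: real mode gets #GP, 64-bit mode gets #UD (untested path). */
static struct precheck sysenter_precheck(enum emu_mode mode)
{
	if (mode == EMU_REAL)
		return (struct precheck){ EMU_PROPAGATE_FAULT, FAULT_GP };
	if (mode == EMU_PROT64)
		return (struct precheck){ EMU_PROPAGATE_FAULT, FAULT_UD };
	return (struct precheck){ EMU_CONTINUE, FAULT_NONE };
}
```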
Re: [PATCH 02/24] KVM: Provide callback to get/set control registers in emulator ops.
On 03/09/2010 04:09 PM, Gleb Natapov wrote:
> Use this callback instead of directly call kvm function. Also rename
> realmode_(set|get)_cr to emulator_(set|get)_cr since function has nothing
> to do with real mode.
>
> +	ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
> +	void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
>  };

Note, passing a vcpu means we are still tightly coupled to kvm. Can be
fixed later.

> +static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu)
> +{
> +	unsigned long value;
> +
> +	switch (cr) {
> +	case 0:
> +		value = kvm_read_cr0(vcpu);
> +		break;
> +	case 2:
> +		value = vcpu->arch.cr2;
> +		break;
> +	case 3:
> +		value = vcpu->arch.cr3;
> +		break;
> +	case 4:
> +		value = kvm_read_cr4(vcpu);
> +		break;
> +	case 8:
> +		value = kvm_get_cr8(vcpu);
> +		break;
> +	default:
> +		vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
> +		return 0;

This printk is triggerable by guest code (as the patch didn't introduce
this, it can be fixed later). The emulator should #UD on unrecognised
control registers.

-- 
error compiling committee.c: too many arguments to function
[PATCH 03/24] KVM: remove realmode_lmsw function.
Use the (get|set)_cr callbacks to emulate lmsw inside the emulator.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |    2 --
 arch/x86/kvm/emulate.c          |    4 ++--
 arch/x86/kvm/x86.c              |    7 -------
 3 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e8e108a..1e15a0a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -582,8 +582,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 void kvm_report_emulation_failure(struct kvm_vcpu *cvpu, const char *context);
 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
 void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
-void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
-		   unsigned long *rflags);
 
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d515795..3d1ee74 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2483,8 +2483,8 @@ twobyte_insn:
 			c->dst.val = ops->get_cr(0, ctxt->vcpu);
 			break;
 		case 6: /* lmsw */
-			realmode_lmsw(ctxt->vcpu, (u16)c->src.val,
-				      &ctxt->eflags);
+			ops->set_cr(0, (ops->get_cr(0, ctxt->vcpu) & ~0x0ful) |
+				    (c->src.val & 0x0f), ctxt->vcpu);
 			c->dst.type = OP_NONE;
 			break;
 		case 7: /* invlpg*/
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7b62ef2..8f2b61c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4072,13 +4072,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base)
 	kvm_x86_ops->set_idt(vcpu, &dt);
 }
 
-void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
-		   unsigned long *rflags)
-{
-	kvm_lmsw(vcpu, msw);
-	*rflags = kvm_get_rflags(vcpu);
-}
-
 static int move_to_next_stateful_cpuid_entry(struct kvm_vcpu *vcpu, int i)
 {
 	struct kvm_cpuid_entry2 *e = &vcpu->arch.cpuid_entries[i];
-- 
1.6.5
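
[Editorial illustration, not part of the patch.] The replacement expression for lmsw keeps every CR0 bit above bit 3 and takes the low four bits from the source operand. Pulled out into a standalone function (the name lmsw_merge() is invented here), the bit arithmetic is:

```c
/*
 * Merge the low four bits of `src` into `cr0`, leaving all higher
 * CR0 bits untouched. Mirrors the expression used in the patch.
 */
static unsigned long lmsw_merge(unsigned long cr0, unsigned long src)
{
	return (cr0 & ~0x0ful) | (src & 0x0f);
}
```

For instance, merging a source of 0xfff6 into a CR0 of 0x80000011 gives 0x80000016: only the low nibble changes.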
[PATCH 00/24] [RFC] emulator cleanup
This is the first series of patches that tries to clean up the emulator code.
It is a mix of bug fixes and moving code that does emulation from x86.c to
emulator.c while making it KVM independent. The status of the patches: works
for me. The realtime.flat test now also passes where it failed before.

Gleb Natapov (24):
  KVM: Remove pointer to rflags from realmode_set_cr parameters.
  KVM: Provide callback to get/set control registers in emulator ops.
  KVM: remove realmode_lmsw function.
  KVM: Provide current CPL as part of emulator context.
  KVM: Provide current eip as part of emulator context.
  KVM: x86 emulator: fix mov r/m, sreg emulation.
  KVM: x86 emulator: fix 0f 01 /5 emulation
  KVM: x86 emulator: 0f (20|21|22|23) ignore mod bits.
  KVM: x86 emulator: inject #UD on access to non-existing CR
  KVM: x86 emulator: fix mov dr to inject #UD when needed.
  KVM: x86 emulator: fix return values of syscall/sysenter/sysexit
    emulations
  KVM: x86 emulator: do not call writeback if msr access fails.
  KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.
  KVM: x86 emulator: cleanup grp3 return value
  KVM: x86 emulator: Provide more callbacks for x86 emulator.
  KVM: x86 emulator: Emulate task switch in emulator.c
  KVM: x86 emulator: Use load_segment_descriptor() instead of
    kvm_load_segment_descriptor()
  KVM: Use task switch from emulator.c
  KVM: x86 emulator: fix in/out emulation.
  KVM: x86 emulator: Move string pio emulation into emulator.c
  KVM: x86 emulator: remove saved_eip
  KVM: x86 emulator: restart string instruction without going back to a
    guest.
  KVM: x86 emulator: introduce pio in string read ahead.
  KVM: small kvm_arch_vcpu_ioctl_run() cleanup.

 arch/x86/include/asm/kvm_emulate.h |   41
 arch/x86/include/asm/kvm_host.h    |   10
 arch/x86/kvm/emulate.c             |  813
 arch/x86/kvm/svm.c                 |   22
 arch/x86/kvm/vmx.c                 |   19
 arch/x86/kvm/x86.c                 | 1112
 6 files changed, 1016 insertions(+), 1001 deletions(-)
[PATCH 18/24] KVM: Use task switch from emulator.c
Remove old task switch code from x86.c

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/x86.c |  558
 1 files changed, 18 insertions(+), 540 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f89502d..5171696 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4798,553 +4798,31 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
-static void seg_desct_to_kvm_desct(struct desc_struct *seg_desc, u16 selector,
-				   struct kvm_segment *kvm_desct)
-{
-	kvm_desct->base = get_desc_base(seg_desc);
-	kvm_desct->limit = get_desc_limit(seg_desc);
-	if (seg_desc->g) {
-		kvm_desct->limit <<= 12;
-		kvm_desct->limit |= 0xfff;
-	}
-	kvm_desct->selector = selector;
-	kvm_desct->type = seg_desc->type;
-	kvm_desct->present = seg_desc->p;
-	kvm_desct->dpl = seg_desc->dpl;
-	kvm_desct->db = seg_desc->d;
-	kvm_desct->s = seg_desc->s;
-	kvm_desct->l = seg_desc->l;
-	kvm_desct->g = seg_desc->g;
-	kvm_desct->avl = seg_desc->avl;
-	if (!selector)
-		kvm_desct->unusable = 1;
-	else
-		kvm_desct->unusable = 0;
-	kvm_desct->padding = 0;
-}
-
-static void get_segment_descriptor_dtable(struct kvm_vcpu *vcpu,
-					  u16 selector,
-					  struct desc_ptr *dtable)
-{
-	if (selector & 1 << 2) {
-		struct kvm_segment kvm_seg;
-
-		kvm_get_segment(vcpu, &kvm_seg, VCPU_SREG_LDTR);
-
-		if (kvm_seg.unusable)
-			dtable->size = 0;
-		else
-			dtable->size = kvm_seg.limit;
-		dtable->address = kvm_seg.base;
-	}
-	else
-		kvm_x86_ops->get_gdt(vcpu, dtable);
-}
-
-/* allowed just for 8 bytes segments */
-static int load_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
-					 struct desc_struct *seg_desc)
-{
-	struct desc_ptr dtable;
-	u16 index = selector >> 3;
-	int ret;
-	u32 err;
-	gva_t addr;
-
-	get_segment_descriptor_dtable(vcpu, selector, &dtable);
-
-	if (dtable.size < index * 8 + 7) {
-		kvm_queue_exception_e(vcpu, GP_VECTOR, selector & 0xfffc);
-		return X86EMUL_PROPAGATE_FAULT;
-	}
-	addr = dtable.address + index * 8;
-	ret = kvm_read_guest_virt_system(addr, seg_desc, sizeof(*seg_desc),
-					 vcpu, &err);
-	if (ret == X86EMUL_PROPAGATE_FAULT)
-		kvm_inject_page_fault(vcpu, addr, err);
-
-	return ret;
-}
-
-/* allowed just for 8 bytes segments */
-static int save_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
-					 struct desc_struct *seg_desc)
-{
-	struct desc_ptr dtable;
-	u16 index = selector >> 3;
-
-	get_segment_descriptor_dtable(vcpu, selector, &dtable);
-
-	if (dtable.size < index * 8 + 7)
-		return 1;
-	return kvm_write_guest_virt(dtable.address + index*8, seg_desc, sizeof(*seg_desc), vcpu, NULL);
-}
-
-static gpa_t get_tss_base_addr_write(struct kvm_vcpu *vcpu,
-				     struct desc_struct *seg_desc)
-{
-	u32 base_addr = get_desc_base(seg_desc);
-
-	return kvm_mmu_gva_to_gpa_write(vcpu, base_addr, NULL);
-}
-
-static gpa_t get_tss_base_addr_read(struct kvm_vcpu *vcpu,
-				    struct desc_struct *seg_desc)
-{
-	u32 base_addr = get_desc_base(seg_desc);
-
-	return kvm_mmu_gva_to_gpa_read(vcpu, base_addr, NULL);
-}
-
-static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
-{
-	struct kvm_segment kvm_seg;
-
-	kvm_get_segment(vcpu, &kvm_seg, seg);
-	return kvm_seg.selector;
-}
-
-static int kvm_load_realmode_segment(struct kvm_vcpu *vcpu, u16 selector, int seg)
-{
-	struct kvm_segment segvar = {
-		.base = selector << 4,
-		.limit = 0xffff,
-		.selector = selector,
-		.type = 3,
-		.present = 1,
-		.dpl = 3,
-		.db = 0,
-		.s = 1,
-		.l = 0,
-		.g = 0,
-		.avl = 0,
-		.unusable = 0,
-	};
-	kvm_x86_ops->set_segment(vcpu, &segvar, seg);
-	return X86EMUL_CONTINUE;
-}
-
-static int is_vm86_segment(struct kvm_vcpu *vcpu, int seg)
-{
-	return (seg != VCPU_SREG_LDTR) &&
-		(seg != VCPU_SREG_TR) &&
-		(kvm_get_rflags(vcpu) & X86_EFLAGS_VM);
-}
-
-int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg)
-{
-	struct kvm_segment kvm_seg;
-	struct desc_struct seg_desc;
-	u8 dpl, rpl, cpl;
-	unsigned err_vec = GP_VECTOR;
-	u32 err_code = 0;
-	bool
[PATCH 20/24] KVM: x86 emulator: Move string pio emulation into emulator.c
Currently emulation is done outside of the emulator, so things like doing
ins/outs to/from mmio are broken. It also makes it hard (if not impossible)
to implement single stepping in the future. The implementation in this patch
is not efficient since it exits to userspace for each IO, while the previous
implementation did 'ins' in batches. A further patch that implements pio in
string read ahead addresses this problem.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |    5
 arch/x86/kvm/emulate.c          |   61
 arch/x86/kvm/x86.c              |  204
 3 files changed, 45 insertions(+), 225 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1e15a0a..8507b22 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -224,14 +224,9 @@ struct kvm_pv_mmu_op_buffer {
 
 struct kvm_pio_request {
 	unsigned long count;
-	int cur_count;
-	gva_t guest_gva;
 	int in;
 	int port;
 	int size;
-	int string;
-	int down;
-	int rep;
 };
 
 /*
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0ec7b9b..505dfba 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -151,8 +151,8 @@ static u32 opcode_table[256] = {
 	0, 0, 0, 0,
 	/* 0x68 - 0x6F */
 	SrcImm | Mov | Stack, 0, SrcImmByte | Mov | Stack, 0,
-	SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps, /* insb, insw/insd */
-	SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps, /* outsb, outsw/outsd */
+	DstMem | ByteOp | Mov | String, DstMem | Mov | String, /* insb, insw/insd */
+	SrcMem | ByteOp | ImplicitOps | String, SrcMem | ImplicitOps | String, /* outsb, outsw/outsd */
 	/* 0x70 - 0x77 */
 	SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte,
 	SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte,
@@ -2439,7 +2439,12 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 				goto done;
 			}
 		}
-		c->regs[VCPU_REGS_RCX]--;
+		if (c->src.type == OP_MEM)
+			memop = register_address(c, seg_override_base(ctxt, c),
+						 c->regs[VCPU_REGS_RSI]);
+		if (c->dst.type == OP_MEM)
+			memop = register_address(c, es_base(ctxt),
+						 c->regs[VCPU_REGS_RDI]);
 		c->eip = ctxt->eip;
 	}
 
@@ -2596,20 +2601,14 @@ special_insn:
 			kvm_inject_gp(ctxt->vcpu, 0);
 			goto done;
 		}
-		if (kvm_emulate_pio_string(ctxt->vcpu,
-				1,
-				(c->d & ByteOp) ? 1 : c->op_bytes,
-				c->rep_prefix ?
-				address_mask(c, c->regs[VCPU_REGS_RCX]) : 1,
-				(ctxt->eflags & EFLG_DF),
-				register_address(c, es_base(ctxt),
-						 c->regs[VCPU_REGS_RDI]),
-				c->rep_prefix,
-				c->regs[VCPU_REGS_RDX]) == 0) {
-			c->eip = saved_eip;
-			return -1;
-		}
-		return 0;
+		if (!ops->pio_in_emulated(c->dst.bytes, c->regs[VCPU_REGS_RDX],
+					  &c->dst.val, 1, ctxt->vcpu))
+			goto done; /* IO is needed, skip writeback */
+
+		register_address_increment(c, &c->regs[VCPU_REGS_RDI],
+					   (ctxt->eflags & EFLG_DF) ?
+					   -c->dst.bytes : c->dst.bytes);
+		break;
 	case 0x6e:		/* outsb */
 	case 0x6f:		/* outsw/outsd */
 		if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX],
@@ -2617,21 +2616,14 @@ special_insn:
 			kvm_inject_gp(ctxt->vcpu, 0);
 			goto done;
 		}
-		if (kvm_emulate_pio_string(ctxt->vcpu,
-				0,
-				(c->d & ByteOp) ? 1 : c->op_bytes,
-				c->rep_prefix ?
-				address_mask(c, c->regs[VCPU_REGS_RCX]) : 1,
-				(ctxt->eflags & EFLG_DF),
-				register_address(c,
-						 seg_override_base(ctxt, c),
-						 c->regs[VCPU_REGS_RSI]),
-				c->rep_prefix,
-				c->regs[VCPU_REGS_RDX]) == 0) {
-			c->eip = saved_eip;
-			return -1;
-		}
-		return 0;
+
[PATCH 23/24] KVM: x86 emulator: introduce pio in string read ahead.
To optimize rep ins instruction do IO in big chunks ahead of time instead of doing it only when required during instruction emulation.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |    7 +++
 arch/x86/kvm/emulate.c             |   34 ++
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index f74b4ad..da7a711 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -150,6 +150,12 @@ struct fetch_cache {
 	unsigned long end;
 };
 
+struct read_cache {
+	u8 data[1024];
+	unsigned long pos;
+	unsigned long end;
+};
+
 struct decode_cache {
 	u8 twobyte;
 	u8 b;
@@ -177,6 +183,7 @@ struct decode_cache {
 	void *modrm_ptr;
 	unsigned long modrm_val;
 	struct fetch_cache fetch;
+	struct read_cache io_read;
 };
 
 struct x86_emulate_ctxt {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 76ed77d..987be2a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1222,6 +1222,28 @@ done:
 	return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
 
+static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
+			   struct x86_emulate_ops *ops,
+			   unsigned int size, unsigned short port,
+			   void *dest, unsigned int count)
+{
+	struct read_cache *mc = &ctxt->decode.io_read;
+
+	if (mc->pos == mc->end) { /* refill pio read ahead */
+		unsigned int n = sizeof(mc->data) / size;
+		n = min(n, count);
+		mc->pos = mc->end = 0;
+		if (!ops->pio_in_emulated(size, port, mc->data, n,
+					  ctxt->vcpu))
+			return 0;
+		mc->end = n * size;
+	}
+
+	memcpy(dest, mc->data + mc->pos, size);
+	mc->pos += size;
+	return 1;
+}
+
 static u32 desc_limit_scaled(struct desc_struct *desc)
 {
 	u32 limit = get_desc_limit(desc);
@@ -2601,8 +2623,11 @@ special_insn:
 			kvm_inject_gp(ctxt->vcpu, 0);
 			goto done;
 		}
-		if (!ops->pio_in_emulated(c->dst.bytes, c->regs[VCPU_REGS_RDX],
-					  &c->dst.val, 1, ctxt->vcpu))
+		if (c->rep_prefix)
+			ctxt->restart = true;
+		if (!pio_in_emulated(ctxt, ops, c->dst.bytes,
+				     c->regs[VCPU_REGS_RDX], &c->dst.val,
+				     c->rep_prefix ? c->regs[VCPU_REGS_RCX] : 1))
 			goto done; /* IO is needed, skip writeback */
 
 		register_address_increment(c, &c->regs[VCPU_REGS_RDI],
@@ -2908,8 +2933,9 @@ special_insn:
 			goto done;
 		}
 		if (io_dir_in)
-			ops->pio_in_emulated((c->d & ByteOp) ? 1 : c->op_bytes,
-					     port, &c->dst.val, 1, ctxt->vcpu);
+			pio_in_emulated(ctxt, ops,
+					(c->d & ByteOp) ? 1 : c->op_bytes,
+					port, &c->dst.val, 1);
 		else
 			ops->pio_out_emulated((c->d & ByteOp) ? 1 : c->op_bytes,
 					      port, &c->regs[VCPU_REGS_RAX], 1,
-- 
1.6.5
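The read-ahead idea in this patch can be tried outside the kernel. What follows is only a sketch under stated assumptions, not the patch's code: cached_pio_in() mirrors the refill logic of pio_in_emulated(), while the pio_in_fn callback type and the fake_port_in() device are hypothetical stand-ins for ops->pio_in_emulated() and real hardware.

```c
#include <assert.h>
#include <string.h>

/* Same layout as the read_cache added by the patch. */
struct read_cache {
	unsigned char data[1024];
	unsigned long pos;
	unsigned long end;
};

/* Hypothetical backend type standing in for ops->pio_in_emulated().
 * Returns 0 when the data is not available yet, 1 on success. */
typedef int (*pio_in_fn)(unsigned int size, unsigned short port,
			 void *buf, unsigned int count);

/* Hand out one element per call; refill the cache in bulk when it is empty. */
static int cached_pio_in(struct read_cache *rc, pio_in_fn backend,
			 unsigned int size, unsigned short port,
			 void *dest, unsigned int count)
{
	assert(size == 1 || size == 2 || size == 4);

	if (rc->pos == rc->end) { /* cache empty: read ahead */
		unsigned int n = sizeof(rc->data) / size;

		if (n > count)
			n = count;
		rc->pos = rc->end = 0;
		if (!backend(size, port, rc->data, n))
			return 0;
		rc->end = (unsigned long)n * size;
	}

	memcpy(dest, rc->data + rc->pos, size);
	rc->pos += size;
	return 1;
}

/* Hypothetical device used only for illustration: returns 0, 1, 2, ... */
static unsigned int fake_calls;

static int fake_port_in(unsigned int size, unsigned short port,
			void *buf, unsigned int count)
{
	unsigned char *p = buf;
	unsigned int i;

	(void)port;
	for (i = 0; i < size * count; i++)
		p[i] = (unsigned char)i;
	fake_calls++;
	return 1;
}
```

Reading eight bytes one at a time with count set to 8 reaches the backend only once; the remaining seven reads are served from the cache, which is the saving the patch is after for rep ins.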
[PATCH 07/24] KVM: x86 emulator: fix 0f 01 /5 emulation
It is undefined and should generate #UD.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2df510b..1a32b78 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2486,6 +2486,9 @@ twobyte_insn:
 				  (c->src.val & 0x0f), ctxt->vcpu);
 			c->dst.type = OP_NONE;
 			break;
+		case 5: /* not defined */
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
 		case 7: /* invlpg*/
 			emulate_invlpg(ctxt->vcpu, memop);
 			/* Disable writeback. */
-- 
1.6.5
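For readers not used to the "0f 01 /5" notation: the "/5" is the value of the reg field (bits 5:3) of the ModRM byte that follows the two opcode bytes. A minimal sketch of that extraction is below; the helper names are made up for illustration and do not come from emulate.c.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 5:3 of the ModRM byte select the instruction within an opcode group. */
static unsigned int modrm_reg(uint8_t modrm)
{
	return (modrm >> 3) & 7;
}

/*
 * Hypothetical helper: returns 1 when a 0f 01 /reg encoding is the one
 * this patch turns into #UD, i.e. reg == 5.
 */
static int group7_is_undefined(uint8_t modrm)
{
	return modrm_reg(modrm) == 5;
}
```

A ModRM byte of 0x28 (binary 00 101 000) has reg == 5 and would take the new case 5 branch; 0x38 (reg == 7) is invlpg and would not.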
[PATCH 16/24] KVM: x86 emulator: Emulate task switch in emulator.c
Implement emulation of 16/32 bit task switch in emulator.c

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |    5 +
 arch/x86/kvm/emulate.c             |  564
 2 files changed, 569 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index e881618..4268330 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -11,6 +11,8 @@
 #ifndef _ASM_X86_KVM_X86_EMULATE_H
 #define _ASM_X86_KVM_X86_EMULATE_H
 
+#include <asm/desc_defs.h>
+
 struct x86_emulate_ctxt;
 
 /*
@@ -210,5 +212,8 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt,
 		    struct x86_emulate_ops *ops);
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt,
 		     struct x86_emulate_ops *ops);
+int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
+			 struct x86_emulate_ops *ops,
+			 u16 tss_selector, int reason);
 
 #endif /* _ASM_X86_KVM_X86_EMULATE_H */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 6e2b34b..094d17c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -33,6 +33,7 @@
 #include <asm/kvm_emulate.h>
 
 #include "x86.h"
+#include "tss.h"
 
 /*
  * Opcode effective-address decode tables.
@@ -1218,6 +1219,199 @@ done:
 	return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
 
+static u32 desc_limit_scaled(struct desc_struct *desc)
+{
+	u32 limit = get_desc_limit(desc);
+
+	return desc->g ? (limit << 12) | 0xfff : limit;
+}
+
+static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
+				     struct x86_emulate_ops *ops,
+				     u16 selector, struct desc_ptr *dt)
+{
+	if (selector & 1 << 2) {
+		struct desc_struct desc;
+		memset (dt, 0, sizeof *dt);
+		if(!ops->get_cached_descriptor(&desc, VCPU_SREG_LDTR, ctxt->vcpu))
+			return;
+
+		dt->size = desc_limit_scaled(&desc); /* what if limit > 65535? */
+		dt->address = get_desc_base(&desc);
+	}
+	else
+		ops->get_gdt(dt, ctxt->vcpu);
+}
+
+/* allowed just for 8 bytes segments */
+static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+				   struct x86_emulate_ops *ops,
+				   u16 selector, struct desc_struct *desc)
+{
+	struct desc_ptr dt;
+	u16 index = selector >> 3;
+	int ret;
+	u32 err;
+	ulong addr;
+
+	get_descriptor_table_ptr(ctxt, ops, selector, &dt);
+
+	if (dt.size < index * 8 + 7) {
+		kvm_inject_gp(ctxt->vcpu, selector & 0xfffc);
+		return X86EMUL_PROPAGATE_FAULT;
+	}
+	addr = dt.address + index * 8;
+	ret = ops->read_std(addr, desc, sizeof *desc, ctxt->vcpu, &err);
+	if (ret == X86EMUL_PROPAGATE_FAULT)
+		kvm_inject_page_fault(ctxt->vcpu, addr, err);
+
+	return ret;
+}
+
+/* allowed just for 8 bytes segments */
+static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+				    struct x86_emulate_ops *ops,
+				    u16 selector, struct desc_struct *desc)
+{
+	struct desc_ptr dt;
+	u16 index = selector >> 3;
+	u32 err;
+	ulong addr;
+	int ret;
+
+	get_descriptor_table_ptr(ctxt, ops, selector, &dt);
+
+	if (dt.size < index * 8 + 7) {
+		kvm_inject_gp(ctxt->vcpu, selector & 0xfffc);
+		return X86EMUL_PROPAGATE_FAULT;
+	}
+
+	addr = dt.address + index * 8;
+	ret = ops->write_std(addr, desc, sizeof *desc, ctxt->vcpu, &err);
+	if (ret == X86EMUL_PROPAGATE_FAULT)
+		kvm_inject_page_fault(ctxt->vcpu, addr, err);
+
+	return ret;
+}
+
+static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+				   struct x86_emulate_ops *ops,
+				   u16 selector, int seg)
+{
+	struct desc_struct seg_desc;
+	u8 dpl, rpl, cpl;
+	unsigned err_vec = GP_VECTOR;
+	u32 err_code = 0;
+	bool null_selector = !(selector & ~0x3); /* 0000-0003 are null */
+	int ret;
+
+	memset(&seg_desc, 0, sizeof seg_desc);
+
+	if ((seg <= VCPU_SREG_GS && ctxt->mode == X86EMUL_MODE_VM86)
+	    || ctxt->mode == X86EMUL_MODE_REAL) {
+		/* set real mode segment descriptor */
+		set_desc_base(&seg_desc, selector << 4);
+		set_desc_limit(&seg_desc, 0xffff);
+		seg_desc.type = 3;
+		seg_desc.p = 1;
+		seg_desc.s = 1;
+		goto load;
+	}
+
+	/* NULL selector is not valid for TR, CS and SS */
+	if ((seg == VCPU_SREG_CS || seg == VCPU_SREG_SS || seg == VCPU_SREG_TR)
+	    && null_selector)
+		goto exception;
+
+	/* TR should be in
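The selector and limit arithmetic used by the descriptor helpers in this patch can be checked in isolation. The sketch below is a standalone illustration, assuming the standard x86 selector layout (index in bits 15:3, table indicator in bit 2, RPL in bits 1:0); the function names are hypothetical and only echo the expressions visible in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Descriptor table index: selector bits 15:3. */
static unsigned int selector_index(uint16_t selector)
{
	return selector >> 3;
}

/* Table indicator, bit 2: 0 selects the GDT, 1 selects the LDT. */
static int selector_uses_ldt(uint16_t selector)
{
	return (selector >> 2) & 1;
}

/* Requested privilege level: bits 1:0. */
static unsigned int selector_rpl(uint16_t selector)
{
	return selector & 3;
}

/* Selectors 0000-0003 are null (index 0 in the GDT, any RPL). */
static int selector_is_null(uint16_t selector)
{
	return !(selector & ~0x3);
}

/* With the granularity bit set, the 20-bit limit counts 4 KiB pages. */
static uint32_t limit_scaled(uint32_t limit, int granularity)
{
	return granularity ? (limit << 12) | 0xfff : limit;
}

/* An 8-byte descriptor fits if its last byte lies within the table limit. */
static int descriptor_in_table(uint32_t table_limit, unsigned int index)
{
	return table_limit >= index * 8 + 7;
}
```

For example, selector 0x002b is GDT index 5 with RPL 3, and a limit of 0xfffff with the granularity bit set scales to 0xffffffff, which is the case the "what if limit > 65535?" comment in the patch worries about when the result is stored into a 16-bit table size.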
[PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.
Currently when string instruction is only partially complete we go back to a guest mode, guest tries to reexecute instruction and exits again and at this point emulation continues. Avoid all of this by restarting instruction without going back to a guest mode.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |    1 +
 arch/x86/kvm/emulate.c             |   22 --
 arch/x86/kvm/x86.c                 |   16 +++-
 3 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 7d323d5..f74b4ad 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -193,6 +193,7 @@ struct x86_emulate_ctxt {
 	/* interruptibility state, as a result of execution of STI or MOV SS */
 	int interruptibility;
 
+	bool restart; /* restart string instruction after writeback */
 	/* decode cache */
 	struct decode_cache decode;
 };
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ba1ce61..76ed77d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -925,8 +925,11 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	int mode = ctxt->mode;
 	int def_op_bytes, def_ad_bytes, group;
 
-	/* Shadow copy of register state. Committed on successful emulation. */
+	/* we cannot decode insn before we complete previous rep insn */
+	WARN_ON(ctxt->restart);
+
+	/* Shadow copy of register state. Committed on successful emulation. */
 
 	memset(c, 0, sizeof(struct decode_cache));
 	c->eip = ctxt->eip;
 	ctxt->cs_base = seg_base(ctxt, VCPU_SREG_CS);
@@ -2412,8 +2415,11 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	memop = c->modrm_ea;
 
 	if (c->rep_prefix && (c->d & String)) {
+		ctxt->restart = true;
 		/* All REP prefixes have the same first termination condition */
 		if (c->regs[VCPU_REGS_RCX] == 0) {
+		string_done:
+			ctxt->restart = false;
 			kvm_rip_write(ctxt->vcpu, c->eip);
 			goto done;
 		}
@@ -2425,17 +2431,13 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 		 * 	- if REPNE/REPNZ and ZF = 1 then done
 		 */
 		if ((c->b == 0xa6) || (c->b == 0xa7) ||
-				(c->b == 0xae) || (c->b == 0xaf)) {
+		    (c->b == 0xae) || (c->b == 0xaf)) {
 			if ((c->rep_prefix == REPE_PREFIX) &&
-				((ctxt->eflags & EFLG_ZF) == 0)) {
-					kvm_rip_write(ctxt->vcpu, c->eip);
-					goto done;
-			}
+			    ((ctxt->eflags & EFLG_ZF) == 0))
+				goto string_done;
 			if ((c->rep_prefix == REPNE_PREFIX) &&
-				((ctxt->eflags & EFLG_ZF) == EFLG_ZF)) {
-				kvm_rip_write(ctxt->vcpu, c->eip);
-				goto done;
-			}
+			    ((ctxt->eflags & EFLG_ZF) == EFLG_ZF))
+				goto string_done;
 		}
 		if (c->src.type == OP_MEM)
 			memop = register_address(c, seg_override_base(ctxt, c),
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b25ef4b..82379e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3724,6 +3724,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 		return EMULATE_DONE;
 	}
 
+restart:
 	r = x86_emulate_insn(&vcpu->arch.emulate_ctxt, &emulate_ops);
 	shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
 
@@ -3746,7 +3747,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 
 	if (r) {
 		if (kvm_mmu_unprotect_page_virt(vcpu, cr2))
-			return EMULATE_DONE;
+			goto done;
 		if (!vcpu->mmio_needed) {
 			kvm_report_emulation_failure(vcpu, "mmio");
 			return EMULATE_FAIL;
@@ -3761,6 +3762,10 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 		return EMULATE_DO_MMIO;
 	}
 
+done:
+	if (vcpu->arch.emulate_ctxt.restart)
+		goto restart;
+
 	return EMULATE_DONE;
 }
 EXPORT_SYMBOL_GPL(emulate_instruction);
@@ -4516,6 +4521,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 			goto out;
 		}
 	}
+	if (vcpu->arch.emulate_ctxt.restart) {
+		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+		r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE);
+		srcu_read_unlock(&vcpu->kvm->srcu,
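The termination test that decides whether the string instruction is restarted can be written as a standalone predicate. This is a sketch under assumptions, not the emulator's code: the enum and function names are hypothetical, and only the conditions themselves (RCX == 0, plus the ZF checks for cmps/scas opcodes 0xa6, 0xa7, 0xae, 0xaf) are taken from the hunk above.

```c
#include <assert.h>
#include <stdint.h>

#define EFLG_ZF (1u << 6)	/* zero flag, bit 6 of EFLAGS */

/* Hypothetical encoding of the prefix; the kernel uses its own constants. */
enum rep_prefix { REP_NONE, REPE_PREFIX, REPNE_PREFIX };

/* Only cmps (0xa6, 0xa7) and scas (0xae, 0xaf) look at ZF. */
static int is_cmps_or_scas(uint8_t opcode)
{
	return opcode == 0xa6 || opcode == 0xa7 ||
	       opcode == 0xae || opcode == 0xaf;
}

/* Returns 1 when the repeated string instruction must not run again. */
static int rep_string_done(unsigned long rcx, uint8_t opcode,
			   enum rep_prefix prefix, unsigned long eflags)
{
	/* All REP prefixes share the first termination condition. */
	if (rcx == 0)
		return 1;

	if (is_cmps_or_scas(opcode)) {
		if (prefix == REPE_PREFIX && !(eflags & EFLG_ZF))
			return 1;
		if (prefix == REPNE_PREFIX && (eflags & EFLG_ZF))
			return 1;
	}
	return 0;
}
```

With this predicate, rep movsb (0xa4) keeps going until RCX reaches zero regardless of ZF, while repe cmpsb (0xa6) also stops as soon as ZF is clear.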
Re: [PATCH 15/24] KVM: x86 emulator: Provide more callbacks for x86 emulator.
On 03/09/2010 04:09 PM, Gleb Natapov wrote:
> Provide get_cached_descriptor(), set_cached_descriptor(), get_segment_selector(), set_segment_selector(), get_gdt(), write_std() callbacks.
>
> Signed-off-by: Gleb Natapov g...@redhat.com
> ---
>  arch/x86/include/asm/kvm_emulate.h |   16 +
>  arch/x86/kvm/x86.c                 |  130 +++
>  2 files changed, 131 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
> index 032d02f..e881618 100644
> --- a/arch/x86/include/asm/kvm_emulate.h
> +++ b/arch/x86/include/asm/kvm_emulate.h
> @@ -63,6 +63,15 @@ struct x86_emulate_ops {
>  			unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
>
>  	/*
> +	 * write_std: Write bytes of standard (non-emulated/special) memory.
> +	 *            Used for descriptor writing.
> +	 *  @addr:  [IN ] Linear address to which to write.
> +	 *  @val:   [OUT] Value write to memory, zero-extended to 'u_long'.
> +	 *  @bytes: [IN ] Number of bytes to write to memory.
> +	 */
> +	int (*write_std)(unsigned long addr, void *val,
> +			 unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);

Descriptor writes need an atomic kvm_set_guest_bit(), no?

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 19/24] KVM: x86 emulator: fix in/out emulation.
On 03/09/2010 04:09 PM, Gleb Natapov wrote:
> in/out emulation is broken now. The breakage is different depending on where IO device resides. If it is in userspace emulator reports emulation failure since it incorrectly interprets kvm_emulate_pio() return value. If IO device is in the kernel emulation of 'in' will do nothing since kvm_emulate_pio() stores result directly into vcpu registers, so emulator will overwrite result of emulation during commit of shadowed register.
>
> index def4877..315e8a8 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1488,29 +1488,9 @@ static int shutdown_interception(struct vcpu_svm *svm)
>
>  static int io_interception(struct vcpu_svm *svm)
>  {
> -	u32 io_info = svm->vmcb->control.exit_info_1; /* address size bug? */
> -	int size, in, string;
> -	unsigned port;
> -
>  	++svm->vcpu.stat.io_exits;
>
> -	svm->next_rip = svm->vmcb->control.exit_info_2;
> -
> -	string = (io_info & SVM_IOIO_STR_MASK) != 0;
> -
> -	if (string) {
> -		if (emulate_instruction(&svm->vcpu,
> -					0, 0, 0) == EMULATE_DO_MMIO)
> -			return 0;
> -		return 1;
> -	}
> -
> -	in = (io_info & SVM_IOIO_TYPE_MASK) != 0;
> -	port = io_info >> 16;
> -	size = (io_info & SVM_IOIO_SIZE_MASK) >> SVM_IOIO_SIZE_SHIFT;
> -
> -	skip_emulated_instruction(&svm->vcpu);
> -	return kvm_emulate_pio(&svm->vcpu, in, size, port);
> +	return !(emulate_instruction(&svm->vcpu, 0, 0, 0) == EMULATE_DO_MMIO);
>  }

We don't want to enter the emulator for non-string in/out. Leftover test code?

>
>  static int nmi_interception(struct vcpu_svm *svm)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index ae3217d..7f33d8e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2974,26 +2974,9 @@ static int handle_triple_fault(struct kvm_vcpu *vcpu)
>
>  static int handle_io(struct kvm_vcpu *vcpu)
>  {
> -	unsigned long exit_qualification;
> -	int size, in, string;
> -	unsigned port;
> -
>  	++vcpu->stat.io_exits;
> -	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> -	string = (exit_qualification & 16) != 0;
>
> -	if (string) {
> -		if (emulate_instruction(vcpu, 0, 0, 0) == EMULATE_DO_MMIO)
> -			return 0;
> -		return 1;
> -	}
> -
> -	size = (exit_qualification & 7) + 1;
> -	in = (exit_qualification & 8) != 0;
> -	port = exit_qualification >> 16;
> -
> -	skip_emulated_instruction(vcpu);
> -	return kvm_emulate_pio(vcpu, in, size, port);
> +	return !(emulate_instruction(vcpu, 0, 0, 0) == EMULATE_DO_MMIO);
>  }

Ditto.

-- 
error compiling committee.c: too many arguments to function
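The review point above is that a non-string in/out can be serviced straight from the exit information, without the emulator. As an illustration only, the VMX decoding that the removed handle_io() lines performed can be sketched like this; struct io_exit and decode_vmx_io_exit() are hypothetical names, and the bit positions are the ones visible in the quoted code.

```c
#include <assert.h>

/* Hypothetical container for the decoded fields. */
struct io_exit {
	unsigned int size;	/* access size in bytes: 1, 2 or 4 */
	int in;			/* 1 for in/ins, 0 for out/outs */
	int string;		/* 1 for ins/outs */
	unsigned int port;	/* I/O port number */
};

/* Decode an I/O exit qualification the way the removed handle_io() did. */
static struct io_exit decode_vmx_io_exit(unsigned long exit_qualification)
{
	struct io_exit io;

	io.size = (exit_qualification & 7) + 1;
	io.in = (exit_qualification & 8) != 0;
	io.string = (exit_qualification & 16) != 0;
	/* The quoted code shifts only; the mask is added here for clarity. */
	io.port = (exit_qualification >> 16) & 0xffff;
	return io;
}
```

When string is 0, the decoded size, direction and port are everything kvm_emulate_pio() needed in the old code, which is why the fast path did not have to decode the instruction at all.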
Re: KVM usability
On 03/09/2010 07:32 AM, Avi Kivity wrote:
> On 03/02/2010 04:36 AM, Anthony Liguori wrote:
>>> I keep a patch in the SUSE version for quite some time now that bumps the default to 384 for qemu-kvm. That was the first round number where an openSUSE installation worked.
>>
>> If someone works up a patch and tests at least a couple types of guests to confirm that they all install with that number, I'd be happy to apply it (although we need some trickery to support older pc versions).
>
> We should avoid changing defaults.

I disagree. IMHO, the defaults should represent our best suggestion for any given release. The compatibility machines make it much easier for a user to fix on a particular version of a machine type.

Regards,

Anthony Liguori
Re: KVM usability
On 03/09/2010 08:38 AM, Alexander Graf wrote:
> On 09.03.2010, at 15:32, Dustin Kirkland wrote:
>> On Tue, 2010-03-09 at 15:32 +0200, Avi Kivity wrote:
>>> On 03/02/2010 04:36 AM, Anthony Liguori wrote:
>>>>> I keep a patch in the SUSE version for quite some time now that bumps the default to 384 for qemu-kvm. That was the first round number where an openSUSE installation worked.
>>>>
>>>> If someone works up a patch and tests at least a couple types of guests to confirm that they all install with that number, I'd be happy to apply it (although we need some trickery to support older pc versions).
>>>
>>> We should avoid changing defaults. I don't think in this case it matters, since everyone specifies -m anyway, but as a general rule changing defaults = breakage for the unwary. At least make the default part of the machine type to preserve compatibility.
>>
>> In that case, Alex, where can I find your +384M patch, because I'd like to carry the same one in Ubuntu...
>
> It's all in the openSUSE build service. The direct access URL (login required FWIW) is here:
>
> https://build.opensuse.org/package/view_file?file=kvm-qemu-default-memsize.patch&package=kvm&project=Virtualization
>
> It merely changes DEFAULT_RAM_SIZE in vl.c from 128 to 384 for the kvm package.

We should attempt to do three things with default ram size:

1) bump it up to a more reasonable number
2) make it specified in the global default config
3) make sure we can provide compatibility support for older machine types

> Alex
Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.
On 03/09/2010 04:09 PM, Gleb Natapov wrote:
> Currently when string instruction is only partially complete we go back to a guest mode, guest tries to reexecute instruction and exits again and at this point emulation continues. Avoid all of this by restarting instruction without going back to a guest mode.

What happens if rcx is really big? Going back into the guest gave us a preemption point.

-- 
error compiling committee.c: too many arguments to function
Re: KVM usability
On 03/09/2010 04:50 PM, Anthony Liguori wrote:
>> It's all in the openSUSE build service. The direct access URL (login required FWIW) is here:
>>
>> https://build.opensuse.org/package/view_file?file=kvm-qemu-default-memsize.patch&package=kvm&project=Virtualization
>>
>> It merely changes DEFAULT_RAM_SIZE in vl.c from 128 to 384 for the kvm package.
>
> We should attempt to do three things with default ram size:
>
> 1) bump it up to a more reasonable number
> 2) make it specified in the global default config
> 3) make sure we can provide compatibility support for older machine types

It's really sad, the amount of code needed to change a number.

-- 
error compiling committee.c: too many arguments to function
Re: KVM usability
On 09.03.2010, at 15:32, Dustin Kirkland wrote:
> On Tue, 2010-03-09 at 15:32 +0200, Avi Kivity wrote:
>> On 03/02/2010 04:36 AM, Anthony Liguori wrote:
>>>> I keep a patch in the SUSE version for quite some time now that bumps the default to 384 for qemu-kvm. That was the first round number where an openSUSE installation worked.
>>>
>>> If someone works up a patch and tests at least a couple types of guests to confirm that they all install with that number, I'd be happy to apply it (although we need some trickery to support older pc versions).
>>
>> We should avoid changing defaults. I don't think in this case it matters, since everyone specifies -m anyway, but as a general rule changing defaults = breakage for the unwary. At least make the default part of the machine type to preserve compatibility.
>
> In that case, Alex, where can I find your +384M patch, because I'd like to carry the same one in Ubuntu...

It's all in the openSUSE build service. The direct access URL (login required FWIW) is here:

https://build.opensuse.org/package/view_file?file=kvm-qemu-default-memsize.patch&package=kvm&project=Virtualization

It merely changes DEFAULT_RAM_SIZE in vl.c from 128 to 384 for the kvm package.

Alex
Re: KVM usability
On 03/09/2010 04:49 PM, Anthony Liguori wrote:
> On 03/09/2010 07:32 AM, Avi Kivity wrote:
>> On 03/02/2010 04:36 AM, Anthony Liguori wrote:
>>>> I keep a patch in the SUSE version for quite some time now that bumps the default to 384 for qemu-kvm. That was the first round number where an openSUSE installation worked.
>>>
>>> If someone works up a patch and tests at least a couple types of guests to confirm that they all install with that number, I'd be happy to apply it (although we need some trickery to support older pc versions).
>>
>> We should avoid changing defaults.
>
> I disagree. IMHO, the defaults should represent our best suggestion for any given release. The compatibility machines make it much easier for a user to fix on a particular version of a machine type.

Agreed. I should have said: avoid changing defaults without taking care of backwards compatibility.

-- 
error compiling committee.c: too many arguments to function
Re: KVM usability
On 03/09/2010 08:52 AM, Avi Kivity wrote:
> On 03/09/2010 04:50 PM, Anthony Liguori wrote:
>>> It's all in the openSUSE build service. The direct access URL (login required FWIW) is here:
>>>
>>> https://build.opensuse.org/package/view_file?file=kvm-qemu-default-memsize.patch&package=kvm&project=Virtualization
>>>
>>> It merely changes DEFAULT_RAM_SIZE in vl.c from 128 to 384 for the kvm package.
>>
>> We should attempt to do three things with default ram size:
>>
>> 1) bump it up to a more reasonable number
>> 2) make it specified in the global default config
>> 3) make sure we can provide compatibility support for older machine types
>
> It's really sad, the amount of code needed to change a number.

We don't do enough via a config. If we did, we could just have a 0.12 config version that got frozen over time.

So really, if we can make the mem readable by global config, and we can have machine specific configs, it would simplify the problem in the future so that we just had to bump a number.

Regards,

Anthony Liguori
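To make the three-step proposal concrete, here is a rough sketch of how a default could be resolved once it lives in per-machine data rather than in a single DEFAULT_RAM_SIZE constant. Everything in it is an assumption for illustration: the table, the machine names used as keys, and default_ram_mb() are hypothetical and are not qemu code. Only the numbers 128 and 384 come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-machine defaults; older machine types stay frozen. */
struct machine_defaults {
	const char *name;
	unsigned int ram_mb;
};

static const struct machine_defaults machine_table[] = {
	{ "pc-0.12", 128 },	/* compatibility machine keeps the old value */
	{ "pc-0.13", 384 },	/* newer machine picks up the bumped value */
};

/* Hypothetical global default for machines without their own entry. */
#define GLOBAL_DEFAULT_RAM_MB 384

/* An explicit -m value wins, then the machine type, then the global default. */
static unsigned int default_ram_mb(const char *machine, unsigned int user_mb)
{
	size_t i;

	if (user_mb)
		return user_mb;

	for (i = 0; i < sizeof(machine_table) / sizeof(machine_table[0]); i++)
		if (strcmp(machine_table[i].name, machine) == 0)
			return machine_table[i].ram_mb;

	return GLOBAL_DEFAULT_RAM_MB;
}
```

With a layout like this, bumping the default for a new release means adding one table row, and a guest started with the older machine type keeps seeing 128.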
Re: qemu-kvm upstream segfaults when using -smp 1
On Thu, 2010-03-04 at 19:30 +0100, Jan Kiszka wrote:
> Lucas Meneghel Rodrigues wrote:
>> Hi folks:
>>
>> Today's upstream qemu-kvm.git is crashing when attempting to use -smp 1:
>>
>> 03/04 12:56:12 DEBUG|kvm_vm:0461| Running qemu command:
>> /usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor unix:/tmp/monitor-20100304-125508-G6lf,server,nowait -drive file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 -smp 1 -drive file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp /usr/local/autotest/tests/kvm/images/tftpboot -boot d -bootp /pxelinux.0 -boot n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0
>> 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) kvm_create_vcpu: Bad file descriptor
>> 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) /bin/sh: line 1: 17273 Segmentation fault (core dumped) /usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor unix:/tmp/monitor-20100304-125508-G6lf,server,nowait -drive file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 -smp 1 -drive file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp /usr/local/autotest/tests/kvm/images/tftpboot -boot d -bootp /pxelinux.0 -boot n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0
>> 03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) (Process terminated with status 139)
>>
>> I have opened a bug about it on KVM's bug tracking system on sourceforge.
>>
>> Relevant software versions involved:
>>
>> Commit hash for git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git is 7811d4e8ec057d25db68f900be1f09a142faca49 (tag kvm-88-3686-g7811d4e)
>> Kernel: 2.6.31.12-174.2.22.fc12.x86_64
>>
>> Please let me know if you need more information about it.
>
> Should be fixed by this:
>
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/47883

Ok, seems like the fix was already applied and today's upstream job didn't present any problems (100% PASS across the board for qemu-kvm and qemu :))

Thanks,

Lucas
Re: KVM usability
On 03/02/2010 04:36 AM, Anthony Liguori wrote:
>> I keep a patch in the SUSE version for quite some time now that bumps the default to 384 for qemu-kvm. That was the first round number where an openSUSE installation worked.
>
> If someone works up a patch and tests at least a couple types of guests to confirm that they all install with that number, I'd be happy to apply it (although we need some trickery to support older pc versions).

We should avoid changing defaults. I don't think in this case it matters, since everyone specifies -m anyway, but as a general rule changing defaults = breakage for the unwary. At least make the default part of the machine type to preserve compatibility.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH] Inter-VM shared memory PCI device
On 03/08/2010 07:57 PM, Cam Macdonell wrote:
>> Can you provide a spec that describes the device? This would be useful for maintaining the code, writing guest drivers, and as a framework for review.
>
> I'm not sure if you want the Qemu command-line part as part of the spec here, but I've included for completeness.

I meant something from the guest's point of view, so command line syntax is less important. It should be equally applicable to a real PCI card that works with the same driver.

See http://ozlabs.org/~rusty/virtio-spec/ for an example.

> The Inter-VM Shared Memory PCI device
> ---
>
> BARs
>
> The device supports two BARs. BAR0 is a 256-byte MMIO region to support registers (but might be extended in the future) and BAR1 is used to map the shared memory object from the host. The size of BAR1 is specified on the command-line and must be a power of 2 in size.
>
> Registers
>
> BAR0 currently supports 5 registers of 16-bits each.

Suggest making registers 32-bits, friendlier towards non-x86.

> Registers are used for synchronization between guests sharing the same memory object when interrupts are supported (this requires using the shared memory server).

How does the driver detect whether interrupts are supported or not?

> When using interrupts, VMs communicate with a shared memory server that passes the shared memory object file descriptor using SCM_RIGHTS. The server assigns each VM an ID number and sends this ID number to the Qemu process along with a series of eventfd file descriptors, one per guest using the shared memory server. These eventfds will be used to send interrupts between guests. Each guest listens on the eventfd corresponding to their ID and may use the others for sending interrupts to other guests.
>
> enum ivshmem_registers {
>     IntrMask = 0,
>     IntrStatus = 2,
>     Doorbell = 4,
>     IVPosition = 6,
>     IVLiveList = 8
> };
>
> The first two registers are the interrupt mask and status registers. Interrupts are triggered when a message is received on the guest's eventfd from another VM. Writing to the 'Doorbell' register is how synchronization messages are sent to other VMs.
>
> The IVPosition register is read-only and reports the guest's ID number. The IVLiveList register is also read-only and reports a bit vector of currently live VM IDs.

That limits the number of guests to 16.

> The Doorbell register is 16-bits, but is treated as two 8-bit values. The upper 8-bits are used for the destination VM ID. The lower 8-bits are the value which will be written to the destination VM and what the guest status register will be set to when the interrupt is triggered in the destination guest.

What happens when two interrupts are sent back-to-back to the same guest? Will the first status value be lost?

Also, reading the status register requires a vmexit. I suggest dropping it and requiring the application to manage this information in the shared memory area (where it could do proper queueing of multiple messages).

> A value of 255 in the upper 8-bits will trigger a broadcast where the message will be sent to all other guests.

Please consider adding:

- MSI support
- interrupt on a guest attaching/detaching to the shared memory device

With MSI you could also have the doorbell specify both guest ID and vector number, which may be useful.

Thanks for this - it definitely makes reviewing easier.

-- 
error compiling committee.c: too many arguments to function
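The Doorbell layout in the draft spec (upper 8 bits destination VM ID, lower 8 bits value, 255 meaning broadcast) is easy to get wrong by one shift, so here is a small sketch of it. It follows the 16-bit registers of the draft as quoted, not the 32-bit layout suggested in the review; the register offsets are copied from the quoted enum, while the helper functions and the IVSHMEM_BROADCAST name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets in BAR0, as listed in the draft spec. */
enum ivshmem_registers {
	IntrMask = 0,
	IntrStatus = 2,
	Doorbell = 4,
	IVPosition = 6,
	IVLiveList = 8
};

/* Destination ID that the draft reserves for "all other guests". */
#define IVSHMEM_BROADCAST 255

/* Hypothetical helper: pack a destination VM ID and an 8-bit message. */
static uint16_t ivshmem_doorbell_encode(uint8_t dest_id, uint8_t value)
{
	return (uint16_t)(((unsigned int)dest_id << 8) | value);
}

static uint8_t ivshmem_doorbell_dest(uint16_t doorbell)
{
	return (uint8_t)(doorbell >> 8);
}

static uint8_t ivshmem_doorbell_value(uint16_t doorbell)
{
	return (uint8_t)(doorbell & 0xff);
}

/* Hypothetical helper: test a VM ID against the 16-bit IVLiveList vector. */
static int ivshmem_is_live(uint16_t live_list, unsigned int vm_id)
{
	assert(vm_id < 16);	/* a 16-bit vector can only describe IDs 0..15 */
	return (live_list >> vm_id) & 1;
}
```

The assert in ivshmem_is_live() is the 16-guest limit pointed out in the review: with a 16-bit IVLiveList there is simply no bit for a seventeenth guest, even though the doorbell's 8-bit destination field could address more.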
Re: [PATCH 14/15] KVM: PPC: Add OSI hypercall interface
On 03/08/2010 08:03 PM, Alexander Graf wrote:
> MOL uses its own hypercall interface to call back into userspace when the guest wants to do something.
>
> So let's implement that as an exit reason, specify it with a CAP and only really use it when userspace wants us to.
>
> The only user of it so far is MOL.
>
> Signed-off-by: Alexander Graf ag...@suse.de
>
> ---
>
> v1 -> v2:
>
>   - Add documentation for OSI exit struct
> ---
>  Documentation/kvm/api.txt             |   13 +
>  arch/powerpc/include/asm/kvm_book3s.h |    5 +
>  arch/powerpc/include/asm/kvm_host.h   |    2 ++
>  arch/powerpc/kvm/book3s.c             |   24 ++--
>  arch/powerpc/kvm/powerpc.c            |   12
>  include/linux/kvm.h                   |    6 ++
>  6 files changed, 56 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
> index 6a19ab6..b2129e8 100644
> --- a/Documentation/kvm/api.txt
> +++ b/Documentation/kvm/api.txt
> @@ -932,6 +932,19 @@ s390 specific.
>
>  powerpc specific.
>
> +		/* KVM_EXIT_OSI */
> +		struct {
> +			__u64 gprs[32];
> +		} osi;
> +
> +MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
> +hypercalls and exit with this exit struct that contains all the guest gprs.
> +
> +If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
> +Userspace can now handle the hypercall and when it's done modify the gprs as
> +necessary. Upon guest entry all guest GPRs will then be replaced by the values
> +in this struct.
> +

That's migration unsafe. There may not be next guest entry on this host.

Is using KVM_[GS]ET_REGS problematic for some reason?

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 13/15] KVM: Add support for enabling capabilities per-vcpu
On 03/08/2010 08:03 PM, Alexander Graf wrote:
> Sometimes we don't want all capabilities to be available to all our vcpus. One example for that is the OSI interface, implemented in the next patch.
>
> In order to have a generic mechanism in how to enable capabilities individually, this patch introduces a new ioctl that can be used for this purpose. That way features we don't want in all guests or userspace configurations can just not be enabled and we're good.
>
> diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
> index d170cb4..6a19ab6 100644
> --- a/Documentation/kvm/api.txt
> +++ b/Documentation/kvm/api.txt
> @@ -749,6 +749,21 @@ Writes debug registers into the vcpu.
>  See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
>  yet and must be cleared on entry.
>
> +4.34 KVM_ENABLE_CAP
> +
> +Capability: basic

Capability: basic means that the feature was present in 2.6.22. Otherwise you need to specify the KVM_CAP_ that presents this feature.

> +Architectures: all

But it's implemented for ppc only (other arches will get ENOTTY).

> +Not all extensions are enabled by default. Using this ioctl the application
> +can enable an extension, making it available to the guest.
> +
> +On systems that do not support this ioctl, it always fails. On systems that
> +do support it, it only works for extensions that are supported for enablement.
> +As of writing this the only enablement enabled extenion is KVM_CAP_PPC_OSI.

That needs to be documented. It also needs to be discoverable separately - we can have a kernel with KVM_ENABLE_CAP but without KVM_CAP_PPC_OSI.

btw, KVM_CAP_PPC_OSI conflicts with the KVM_CAP_ namespace. Please choose another namespace.

Need to document the structure fields.

>  /*
> @@ -696,6 +705,8 @@ struct kvm_clock_data {
>  /* Available with KVM_CAP_DEBUGREGS */
>  #define KVM_GET_DEBUGREGS _IOR(KVMIO, 0xa1, struct kvm_debugregs)
>  #define KVM_SET_DEBUGREGS _IOW(KVMIO, 0xa2, struct kvm_debugregs)
> +/* No need for CAP, because then it just always fails */
> +#define KVM_ENABLE_CAP _IOW(KVMIO, 0xa3, struct kvm_enable_cap)

The CAPs are needed so you can discover what you have without running guests.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 07/24] KVM: x86 emulator: fix 0f 01 /5 emulation
On 03/09/2010 04:09 PM, Gleb Natapov wrote:
> It is undefined and should generate #UD.
>
> Signed-off-by: Gleb Natapov g...@redhat.com
> ---
>  arch/x86/kvm/emulate.c | 3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 2df510b..1a32b78 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2486,6 +2486,9 @@ twobyte_insn:
>  				  (c->src.val & 0x0f), ctxt->vcpu);
>  			c->dst.type = OP_NONE;
>  			break;
> +		case 5: /* not defined */
> +			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
> +			goto done;
>  		case 7: /* invlpg*/
>  			emulate_invlpg(ctxt->vcpu, memop);
>  			/* Disable writeback. */

Why is this needed? We can only get here if the guest tricks us (otherwise the #UD would go back to the guest, or rather, we'd trap it to see if it's a hypercall instruction, but not pass it on to the emulator).

--
error compiling committee.c: too many arguments to function
Re: [PATCH bugfix] KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails
On 03/09/2010 07:55 AM, Takuya Yoshikawa wrote:
> svm_create_vcpu() does not free the pages allocated during the creation when it fails to complete the allocations. This patch fixes it.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] KVM call agenda for Mar 9
On 03/09/2010 07:44 AM, Luiz Capitulino wrote:
> On Mon, 8 Mar 2010 22:29:55 -0800 Chris Wright chr...@redhat.com wrote:
>> - virtio-9p passthrough filesystem support
>> - modular command line helpers
>>
>> Please send in any additional agenda items you are interested in covering.
>
> - Summer of code 2010 (do we want to join?)
> - Status of Anthony's patch queue

I was more behind than I thought I was. I will be catching up today. Sorry for the inconvenience.

Regards,

Anthony Liguori
Re: [PATCH 2/4] KVM: Rework VCPU state writeback API
On 03/08/2010 02:33 PM, Marcelo Tosatti wrote:
> On Fri, Mar 05, 2010 at 09:37:26PM -0500, Kevin O'Connor wrote:
>> On Thu, Mar 04, 2010 at 03:35:52PM -0300, Marcelo Tosatti wrote:
>>> On Thu, Mar 04, 2010 at 12:58:58AM -0500, Kevin O'Connor wrote:
>>>> On Thu, Mar 04, 2010 at 01:21:12AM -0300, Marcelo Tosatti wrote:
>>>>> The regression seems to be caused by seabios commit d7e998f. Kevin, the failure can be seen on the attached screenshot, which happens on the first reboot of WinXP 32 installation (after copying files etc).
>>>>
>>>> Sorry - I also noticed a bug in that commit recently. I pushed the fix I had in my local tree.
>>>
>>> Thanks, it does fix the issue here. Anthony can you please update seabios?
>>
>> Neither commit d7e998f nor the fix 8f469b96 are on the SeaBIOS stable branch. Is qemu ready to pull in bigger changes now?
>
> Anthony pulls in seabios master into qemu.git master periodically.

We should be up to date now FWIW.

Regards,

Anthony Liguori
Re: [PATCH 07/24] KVM: x86 emulator: fix 0f 01 /5 emulation
On Tue, Mar 09, 2010 at 04:27:39PM +0200, Avi Kivity wrote:
> On 03/09/2010 04:09 PM, Gleb Natapov wrote:
>> It is undefined and should generate #UD.
>>
>> Signed-off-by: Gleb Natapov g...@redhat.com
>> ---
>>  arch/x86/kvm/emulate.c | 3 +++
>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index 2df510b..1a32b78 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -2486,6 +2486,9 @@ twobyte_insn:
>>  				  (c->src.val & 0x0f), ctxt->vcpu);
>>  			c->dst.type = OP_NONE;
>>  			break;
>> +		case 5: /* not defined */
>> +			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
>> +			goto done;
>>  		case 7: /* invlpg*/
>>  			emulate_invlpg(ctxt->vcpu, memop);
>>  			/* Disable writeback. */
>
> Why is this needed? We can only get here if the guest tricks us (otherwise the #UD would go back to the guest, or rather, we'd trap it to see if it's a hypercall instruction, but not pass it on to the emulator).

For completeness. A lot of code we added recently is there only because the guest can trick us into entering the emulator. Unfortunately we have to take such tricks into account. Without this patch, if the emulator gets here it will report failed emulation.

--
Gleb.
Re: [Qemu-devel] KVM call agenda for Mar 9
On Mon, 8 Mar 2010 22:29:55 -0800 Chris Wright chr...@redhat.com wrote:
> - virtio-9p passthrough filesystem support
> - modular command line helpers
>
> Please send in any additional agenda items you are interested in covering.

- Summer of code 2010 (do we want to join?)
- Status of Anthony's patch queue
Re: [PATCH 07/24] KVM: x86 emulator: fix 0f 01 /5 emulation
On 03/09/2010 04:33 PM, Gleb Natapov wrote:
> On Tue, Mar 09, 2010 at 04:27:39PM +0200, Avi Kivity wrote:
>> On 03/09/2010 04:09 PM, Gleb Natapov wrote:
>>> It is undefined and should generate #UD.
>>>
>>> Signed-off-by: Gleb Natapov g...@redhat.com
>>> ---
>>>  arch/x86/kvm/emulate.c | 3 +++
>>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>>> index 2df510b..1a32b78 100644
>>> --- a/arch/x86/kvm/emulate.c
>>> +++ b/arch/x86/kvm/emulate.c
>>> @@ -2486,6 +2486,9 @@ twobyte_insn:
>>>  				  (c->src.val & 0x0f), ctxt->vcpu);
>>>  			c->dst.type = OP_NONE;
>>>  			break;
>>> +		case 5: /* not defined */
>>> +			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
>>> +			goto done;
>>>  		case 7: /* invlpg*/
>>>  			emulate_invlpg(ctxt->vcpu, memop);
>>>  			/* Disable writeback. */
>>
>> Why is this needed? We can only get here if the guest tricks us (otherwise the #UD would go back to the guest, or rather, we'd trap it to see if it's a hypercall instruction, but not pass it on to the emulator).
>
> For completeness. A lot of code we added recently is there only because the guest can trick us into entering the emulator. Unfortunately we have to take such tricks into account. Without this patch, if the emulator gets here it will report failed emulation.

Okay.

--
error compiling committee.c: too many arguments to function
[PATCH 01/24] KVM: Remove pointer to rflags from realmode_set_cr parameters.
Mov reg, cr instruction doesn't change flags in any meaningful way, so no need to update rflags after instruction execution.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_host.h | 3 +--
 arch/x86/kvm/emulate.c          | 3 +--
 arch/x86/kvm/x86.c              | 4 +---
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ec891a2..3b178d8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -586,8 +586,7 @@ void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
 		   unsigned long *rflags);

 unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr);
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value,
-		     unsigned long *rflags);
+void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value);
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4f6ccab..a91bb42 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2531,8 +2531,7 @@ twobyte_insn:
 	case 0x22: /* mov reg, cr */
 		if (c->modrm_mod != 3)
 			goto cannot_emulate;
-		realmode_set_cr(ctxt->vcpu,
-				c->modrm_reg, c->modrm_val, &ctxt->eflags);
+		realmode_set_cr(ctxt->vcpu, c->modrm_reg, c->modrm_val);
 		c->dst.type = OP_NONE;
 		break;
 	case 0x23: /* mov from reg to dr */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3753c11..d8711fe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4054,13 +4054,11 @@ unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr)
 	return value;
 }

-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val,
-		     unsigned long *rflags)
+void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val)
 {
 	switch (cr) {
 	case 0:
 		kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val));
-		*rflags = kvm_get_rflags(vcpu);
 		break;
 	case 2:
 		vcpu->arch.cr2 = val;
--
1.6.5
[PATCH 24/24] KVM: small kvm_arch_vcpu_ioctl_run() cleanup.
Unify all conditions that get us back into emulator after returning from userspace.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/x86.c | 32 ++--
 1 files changed, 6 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 82379e1..a2c728f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4495,33 +4495,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	if (!irqchip_in_kernel(vcpu->kvm))
 		kvm_set_cr8(vcpu, kvm_run->cr8);

-	if (vcpu->arch.pio.count) {
-		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE);
-		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-		if (r == EMULATE_DO_MMIO) {
-			r = 0;
-			goto out;
+	if (vcpu->arch.pio.count || vcpu->mmio_needed ||
+	    vcpu->arch.emulate_ctxt.restart) {
+		if (vcpu->mmio_needed) {
+			memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
+			vcpu->mmio_read_completed = 1;
+			vcpu->mmio_needed = 0;
 		}
-	}
-	if (vcpu->mmio_needed) {
-		memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
-		vcpu->mmio_read_completed = 1;
-		vcpu->mmio_needed = 0;
-
-		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = emulate_instruction(vcpu, vcpu->arch.mmio_fault_cr2, 0,
-					EMULTYPE_NO_DECODE);
-		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-		if (r == EMULATE_DO_MMIO) {
-			/*
-			 * Read-modify-write. Back to userspace.
-			 */
-			r = 0;
-			goto out;
-		}
-	}
-	if (vcpu->arch.emulate_ctxt.restart) {
 		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 		r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE);
 		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
--
1.6.5
Re: [PATCH] KVM: s390: Fix possible memory leak of in kvm_arch_vcpu_create()
On 03/09/2010 08:37 AM, Wei Yongjun wrote:
> This patch fixed possible memory leak in kvm_arch_vcpu_create() under s390, which would happen when kvm_arch_vcpu_create() fails.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] KVM: s390: Fix possible memory leak of in kvm_arch_vcpu_create()
Wei Yongjun wrote:
> This patch fixed possible memory leak in kvm_arch_vcpu_create() under s390, which would happen when kvm_arch_vcpu_create() fails.

Good catch, thanks!

Acked-by: Carsten Otte co...@de.ibm.com
[PATCH] Inter-VM shared memory PCI device
On Tue, Mar 9, 2010 at 3:29 AM, Avi Kivity a...@redhat.com wrote:
> On 03/08/2010 07:57 PM, Cam Macdonell wrote:
>>> Can you provide a spec that describes the device? This would be useful for maintaining the code, writing guest drivers, and as a framework for review.
>>
>> I'm not sure if you want the Qemu command-line part as part of the spec here, but I've included for completeness.
>
> I meant something from the guest's point of view, so command line syntax is less important. It should be equally applicable to a real PCI card that works with the same driver.
>
> See http://ozlabs.org/~rusty/virtio-spec/ for an example.
>
>> The Inter-VM Shared Memory PCI device
>> ---
>>
>> BARs
>>
>> The device supports two BARs. BAR0 is a 256-byte MMIO region to support registers (but might be extended in the future) and BAR1 is used to map the shared memory object from the host. The size of BAR1 is specified on the command-line and must be a power of 2 in size.
>>
>> Registers
>>
>> BAR0 currently supports 5 registers of 16-bits each.
>
> Suggest making registers 32-bits, friendlier towards non-x86.
>
>> Registers are used for synchronization between guests sharing the same memory object when interrupts are supported (this requires using the shared memory server).
>
> How does the driver detect whether interrupts are supported or not?

At the moment, the VM ID is set to -1 if interrupts aren't supported, but that may not be the clearest way to do things. With UIO is there a way to detect if the interrupt pin is on?

>> When using interrupts, VMs communicate with a shared memory server that passes the shared memory object file descriptor using SCM_RIGHTS. The server assigns each VM an ID number and sends this ID number to the Qemu process along with a series of eventfd file descriptors, one per guest using the shared memory server. These eventfds will be used to send interrupts between guests. Each guest listens on the eventfd corresponding to their ID and may use the others for sending interrupts to other guests.
>>
>> enum ivshmem_registers {
>>     IntrMask = 0,
>>     IntrStatus = 2,
>>     Doorbell = 4,
>>     IVPosition = 6,
>>     IVLiveList = 8
>> };
>>
>> The first two registers are the interrupt mask and status registers. Interrupts are triggered when a message is received on the guest's eventfd from another VM. Writing to the 'Doorbell' register is how synchronization messages are sent to other VMs. The IVPosition register is read-only and reports the guest's ID number. The IVLiveList register is also read-only and reports a bit vector of currently live VM IDs.
>
> That limits the number of guests to 16.

True, it could grow to 32 or 64 without difficulty. We could leave 'liveness' to the user (could be implemented using the shared memory region) or handle it via the interrupts that arrive on guest attach/detach as you suggest below.

>> The Doorbell register is 16-bits, but is treated as two 8-bit values. The upper 8-bits are used for the destination VM ID. The lower 8-bits are the value which will be written to the destination VM and what the guest status register will be set to when the interrupt is triggered in the destination guest.
>
> What happens when two interrupts are sent back-to-back to the same guest? Will the first status value be lost?

Right now, it would be. I believe that eventfd has a counting semaphore option that could prevent loss of status (but limits what the status could be). My understanding of uio_pci interrupt handling is fairly new, but we could have the uio driver store the interrupt statuses to avoid losing them.

> Also, reading the status register requires a vmexit. I suggest dropping it and requiring the application to manage this information in the shared memory area (where it could do proper queueing of multiple messages).
>
>> A value of 255 in the upper 8-bits will trigger a broadcast where the message will be sent to all other guests.
>
> Please consider adding:
>
> - MSI support

Sure, I'll look into it.

> - interrupt on a guest attaching/detaching to the shared memory device

Sure.

> With MSI you could also have the doorbell specify both guest ID and vector number, which may be useful.
>
> Thanks for this - it definitely makes reviewing easier.
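
As an editorial illustration of the Doorbell layout described in the spec above (upper 8 bits are the destination VM ID, lower 8 bits are the value, 255 means broadcast), here is a minimal C sketch. The helper names and the IVSHMEM_BROADCAST_ID macro are hypothetical and do not come from the patch; only the bit layout is taken from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: name is illustrative; the spec only says "a value of 255". */
#define IVSHMEM_BROADCAST_ID 0xffu

/* Pack the destination VM ID (upper 8 bits) and message value (lower 8 bits). */
static uint16_t ivshmem_doorbell_encode(uint8_t dest_id, uint8_t value)
{
    return (uint16_t)(((uint16_t)dest_id << 8) | value);
}

/* Extract the destination VM ID from a 16-bit doorbell write. */
static uint8_t ivshmem_doorbell_dest(uint16_t doorbell)
{
    return (uint8_t)(doorbell >> 8);
}

/* Extract the 8-bit value that becomes the receiver's status. */
static uint8_t ivshmem_doorbell_value(uint16_t doorbell)
{
    return (uint8_t)(doorbell & 0xffu);
}

/* True when the write targets all other guests. */
static bool ivshmem_doorbell_is_broadcast(uint16_t doorbell)
{
    return ivshmem_doorbell_dest(doorbell) == IVSHMEM_BROADCAST_ID;
}
```

A guest driver would write the encoded value to the Doorbell register at offset 4 of BAR0; how that MMIO write is issued depends on the driver framework and is not shown here.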
Re: [PATCH 04/24] KVM: Provide current CPL as part of emulator context.
On Tue, Mar 09, 2010 at 04:24:45PM +0200, Avi Kivity wrote:
> On 03/09/2010 04:09 PM, Gleb Natapov wrote:
>> Eliminate the need to call back into KVM to get it from emulator.
>>
>> @@ -3499,6 +3499,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>>  	vcpu->arch.emulate_ctxt.vcpu = vcpu;
>>  	vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
>> +	vcpu->arch.emulate_ctxt.cpl = kvm_x86_ops->get_cpl(vcpu);
>>  	vcpu->arch.emulate_ctxt.mode =
>>  		(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
>>  		(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
>
> This is an unconditional VMREAD, which is slow (extra slow if nested). Most common emulator ops do not need the cpl.

Will have to make it one of the x86_emulate_ops callbacks then.

--
Gleb.
Re: [PATCH 02/24] KVM: Provide callback to get/set control registers in emulator ops.
On Tue, Mar 09, 2010 at 04:18:09PM +0200, Avi Kivity wrote:
> On 03/09/2010 04:09 PM, Gleb Natapov wrote:
>> Use this callback instead of directly call kvm function. Also rename realmode_(set|get)_cr to emulator_(set|get)_cr since function has nothing to do with real mode.
>>
>> +	ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
>> +	void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
>>  };
>
> Note, passing a vcpu means we are still tightly coupled to kvm. Can be fixed later.

Yes, that is on my todo.

>> +static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu)
>> +{
>> +	unsigned long value;
>> +
>> +	switch (cr) {
>> +	case 0:
>> +		value = kvm_read_cr0(vcpu);
>> +		break;
>> +	case 2:
>> +		value = vcpu->arch.cr2;
>> +		break;
>> +	case 3:
>> +		value = vcpu->arch.cr3;
>> +		break;
>> +	case 4:
>> +		value = kvm_read_cr4(vcpu);
>> +		break;
>> +	case 8:
>> +		value = kvm_get_cr8(vcpu);
>> +		break;
>> +	default:
>> +		vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
>> +		return 0;
>
> This printk is triggerable by guest code (as the patch didn't introduce this, it can be fixed later). The emulator should #UD on unrecognised control registers.

The "inject #UD on access to non-existing CR" patch does this.

--
Gleb.
[PATCH 05/24] KVM: Provide current eip as part of emulator context.
Eliminate the need to call back into KVM to get it from emulator.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |  3 ++-
 arch/x86/kvm/emulate.c             | 12 ++--
 arch/x86/kvm/x86.c                 |  1 +
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index d8b2da0..032d02f 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -140,7 +140,7 @@ struct decode_cache {
 	u8 seg_override;
 	unsigned int d;
 	unsigned long regs[NR_VCPU_REGS];
-	unsigned long eip, eip_orig;
+	unsigned long eip;
 	/* modrm */
 	u8 modrm;
 	u8 modrm_mod;
@@ -159,6 +159,7 @@ struct x86_emulate_ctxt {
 	struct kvm_vcpu *vcpu;

 	unsigned long eflags;
+	unsigned long eip; /* eip before instruction emulation */
 	int cpl;
 	/* Emulated execution mode, represented by an X86EMUL_MODE value. */
 	int mode;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ed29a52..2cc9ef4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -667,7 +667,7 @@ static int do_insn_fetch(struct x86_emulate_ctxt *ctxt,
 	int rc;

 	/* x86 instructions are limited to 15 bytes. */
-	if (eip + size - ctxt->decode.eip_orig > 15)
+	if (eip + size - ctxt->eip > 15)
 		return X86EMUL_UNHANDLEABLE;
 	eip += ctxt->cs_base;
 	while (size--) {
@@ -927,7 +927,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	/* Shadow copy of register state. Committed on successful emulation. */

 	memset(c, 0, sizeof(struct decode_cache));
-	c->eip = c->eip_orig = kvm_rip_read(ctxt->vcpu);
+	c->eip = ctxt->eip;
 	ctxt->cs_base = seg_base(ctxt, VCPU_SREG_CS);
 	memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);

@@ -1874,7 +1874,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 			}
 		}
 		c->regs[VCPU_REGS_RCX]--;
-		c->eip = kvm_rip_read(ctxt->vcpu);
+		c->eip = ctxt->eip;
 	}

 	if (c->src.type == OP_MEM) {
@@ -2443,7 +2443,7 @@ twobyte_insn:
 				goto done;

 			/* Let the processor re-execute the fixed hypercall */
-			c->eip = kvm_rip_read(ctxt->vcpu);
+			c->eip = ctxt->eip;
 			/* Disable writeback. */
 			c->dst.type = OP_NONE;
 			break;
@@ -2547,7 +2547,7 @@ twobyte_insn:
 			| ((u64)c->regs[VCPU_REGS_RDX] << 32);
 		if (kvm_set_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], msr_data)) {
 			kvm_inject_gp(ctxt->vcpu, 0);
-			c->eip = kvm_rip_read(ctxt->vcpu);
+			c->eip = ctxt->eip;
 		}
 		rc = X86EMUL_CONTINUE;
 		c->dst.type = OP_NONE;
@@ -2556,7 +2556,7 @@ twobyte_insn:
 		/* rdmsr */
 		if (kvm_get_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], &msr_data)) {
 			kvm_inject_gp(ctxt->vcpu, 0);
-			c->eip = kvm_rip_read(ctxt->vcpu);
+			c->eip = ctxt->eip;
 		} else {
 			c->regs[VCPU_REGS_RAX] = (u32)msr_data;
 			c->regs[VCPU_REGS_RDX] = msr_data >> 32;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b5fb43..41cf54c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3500,6 +3500,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 	vcpu->arch.emulate_ctxt.vcpu = vcpu;
 	vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
 	vcpu->arch.emulate_ctxt.cpl = kvm_x86_ops->get_cpl(vcpu);
+	vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu);
 	vcpu->arch.emulate_ctxt.mode =
 		(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
 		(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
--
1.6.5
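
As an editorial aside, the 15-byte check that do_insn_fetch() performs in the patch above can be modelled outside the kernel. This is a minimal sketch, not the kernel code: the helper name and the macro are hypothetical, and it assumes `insn_start` plays the role of ctxt->eip (the eip before instruction emulation).

```c
#include <stdbool.h>

/* Assumption: illustrative name for the 15-byte x86 instruction length limit. */
#define X86_MAX_INSN_BYTES 15ul

/*
 * Hypothetical helper: returns true if fetching `size` more bytes at `eip`
 * keeps the instruction that began at `insn_start` within 15 bytes.
 * Mirrors the patch's test "eip + size - ctxt->eip > 15" with the sense inverted.
 */
static bool insn_fetch_in_bounds(unsigned long insn_start, unsigned long eip,
                                 unsigned int size)
{
    return eip + size - insn_start <= X86_MAX_INSN_BYTES;
}
```

Because the arithmetic is unsigned, the subtraction stays well defined even if `eip + size` wraps around the address space.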
Re: [PATCH 04/24] KVM: Provide current CPL as part of emulator context.
On 03/09/2010 04:09 PM, Gleb Natapov wrote:
> Eliminate the need to call back into KVM to get it from emulator.
>
> @@ -3499,6 +3499,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>  	vcpu->arch.emulate_ctxt.vcpu = vcpu;
>  	vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
> +	vcpu->arch.emulate_ctxt.cpl = kvm_x86_ops->get_cpl(vcpu);
>  	vcpu->arch.emulate_ctxt.mode =
>  		(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
>  		(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)

This is an unconditional VMREAD, which is slow (extra slow if nested). Most common emulator ops do not need the cpl.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 14/15] KVM: PPC: Add OSI hypercall interface
On 09.03.2010, at 14:19, Avi Kivity wrote:
> On 03/09/2010 03:12 PM, Alexander Graf wrote:
>> On 09.03.2010, at 14:11, Avi Kivity wrote:
>>> On 03/09/2010 03:04 PM, Alexander Graf wrote:
>>>>>> +	/* KVM_EXIT_OSI */
>>>>>> +	struct {
>>>>>> +		__u64 gprs[32];
>>>>>> +	} osi;
>>>>>> +
>>>>>> +MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
>>>>>> +hypercalls and exit with this exit struct that contains all the guest gprs.
>>>>>> +
>>>>>> +If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
>>>>>> +Userspace can now handle the hypercall and when it's done modify the gprs as
>>>>>> +necessary. Upon guest entry all guest GPRs will then be replaced by the values
>>>>>> +in this struct.
>>>>>> +
>>>>>
>>>>> That's migration unsafe. There may not be next guest entry on this host.
>>>>
>>>> It's as unsafe as MMIO then.
>>>
>>> From api.txt:
>>>
>>> NOTE: For KVM_EXIT_IO and KVM_EXIT_MMIO, the corresponding operations are complete (and guest state is consistent) only after userspace has re-entered the kernel with KVM_RUN. The kernel side will first finish incomplete operations and then check for pending signals. Userspace can re-enter the guest with an unmasked signal pending to complete pending operations.
>>
>> Alright - so I add KVM_EXIT_OSI there and be good? :)
>
> Sure, just verify that the note holds for that case too.

The handling of the hypercall write-back is in the same region as the mmio one. So whatever applies for MMIO entries applies for OSI entries too.

Alex
Re: [PATCH] Inter-VM shared memory PCI device
On 03/09/2010 02:49 PM, Arnd Bergmann wrote:
> On Monday 08 March 2010, Cam Macdonell wrote:
>> enum ivshmem_registers {
>>     IntrMask = 0,
>>     IntrStatus = 2,
>>     Doorbell = 4,
>>     IVPosition = 6,
>>     IVLiveList = 8
>> };
>>
>> The first two registers are the interrupt mask and status registers. Interrupts are triggered when a message is received on the guest's eventfd from another VM. Writing to the 'Doorbell' register is how synchronization messages are sent to other VMs. The IVPosition register is read-only and reports the guest's ID number. The IVLiveList register is also read-only and reports a bit vector of currently live VM IDs.
>>
>> The Doorbell register is 16-bits, but is treated as two 8-bit values. The upper 8-bits are used for the destination VM ID. The lower 8-bits are the value which will be written to the destination VM and what the guest status register will be set to when the interrupt is triggered in the destination guest. A value of 255 in the upper 8-bits will trigger a broadcast where the message will be sent to all other guests.
>
> This means you have at least two intercepts for each message:
>
> 1. Sender writes to doorbell
> 2. Receiver gets interrupted
>
> With optionally two more intercepts in order to avoid interrupting the receiver every time:
>
> 3. Receiver masks interrupt in order to process data
> 4. Receiver unmasks interrupt when it's done and status is no longer pending
>
> I believe you can do much better than this, you combine status and mask bits, making this level triggered, and move to a bitmask of all guests:
>
> In order to send an interrupt to another guest, the sender first checks the bit for the receiver. If it's '1', no need for any intercept, the receiver will come back anyway. If it's zero, write a '1' bit, which gets OR'd into the bitmask by the host. The receiver gets interrupted at a rising edge and just leaves the bit on, until it's done processing, then turns the bit off by writing a '1' into its own location in the mask.

We could make the masking in RAM, not in registers, like virtio, which would require no exits. It would then be part of the application specific protocol and out of scope of this spec.

--
error compiling committee.c: too many arguments to function
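
To make the level-triggered bitmask idea above concrete, here is a minimal C11 sketch of the in-RAM variant. Everything in it is an editorial assumption rather than part of any posted patch: the type and function names are hypothetical, the mask is a single 32-bit word (so at most 32 guests), and the word is assumed to live in the shared memory region so that every guest can update it atomically without an exit.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared word: bit N set means guest N has work pending. */
typedef _Atomic uint32_t pending_mask_t;

/*
 * Sender side. Sets the receiver's bit and returns true only on a 0 -> 1
 * transition, i.e. when the doorbell actually has to be rung. If the bit was
 * already set, the receiver is still processing and will come back anyway.
 * Assumes guest_id < 32.
 */
static bool notify_guest(pending_mask_t *mask, unsigned int guest_id)
{
    uint32_t bit = UINT32_C(1) << guest_id;
    uint32_t old = atomic_fetch_or(mask, bit);

    return (old & bit) == 0;
}

/*
 * Receiver side. Clears the receiver's own bit once it is done processing.
 * Returns true if the bit had been set.
 */
static bool ack_guest(pending_mask_t *mask, unsigned int guest_id)
{
    uint32_t bit = UINT32_C(1) << guest_id;
    uint32_t old = atomic_fetch_and(mask, ~bit);

    return (old & bit) != 0;
}
```

In the register-based scheme described in the quoted text, the host would perform the OR on a register write and the receiver would clear the bit by writing a '1'; the sketch folds the check and the set into one atomic fetch-or instead. A real protocol would also need the receiver to re-check its queue after clearing the bit, to avoid missing a message that arrived in between.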
[PATCH 02/24] KVM: Provide callback to get/set control registers in emulator ops.
Use this callback instead of directly call kvm function. Also rename realmode_(set|get)_cr to emulator_(set|get)_cr since function has nothing to do with real mode.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |   3 +-
 arch/x86/include/asm/kvm_host.h    |   2 -
 arch/x86/kvm/emulate.c             |   7 +-
 arch/x86/kvm/x86.c                 | 114 ++--
 4 files changed, 63 insertions(+), 63 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 2666d7a..0c5caa4 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -108,7 +108,8 @@ struct x86_emulate_ops {
 				const void *new,
 				unsigned int bytes,
 				struct kvm_vcpu *vcpu);
-
+	ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
+	void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
 };

 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3b178d8..e8e108a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -585,8 +585,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
 void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
 		   unsigned long *rflags);

-unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr);
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value);
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a91bb42..d515795 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2480,7 +2480,7 @@ twobyte_insn:
 			break;
 		case 4: /* smsw */
 			c->dst.bytes = 2;
-			c->dst.val = realmode_get_cr(ctxt->vcpu, 0);
+			c->dst.val = ops->get_cr(0, ctxt->vcpu);
 			break;
 		case 6: /* lmsw */
 			realmode_lmsw(ctxt->vcpu, (u16)c->src.val,
@@ -2516,8 +2516,7 @@ twobyte_insn:
 	case 0x20: /* mov cr, reg */
 		if (c->modrm_mod != 3)
 			goto cannot_emulate;
-		c->regs[c->modrm_rm] =
-				realmode_get_cr(ctxt->vcpu, c->modrm_reg);
+		c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x21: /* mov from dr to reg */
@@ -2531,7 +2530,7 @@ twobyte_insn:
 	case 0x22: /* mov reg, cr */
 		if (c->modrm_mod != 3)
 			goto cannot_emulate;
-		realmode_set_cr(ctxt->vcpu, c->modrm_reg, c->modrm_val);
+		ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu);
 		c->dst.type = OP_NONE;
 		break;
 	case 0x23: /* mov from reg to dr */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d8711fe..7b62ef2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3397,12 +3397,70 @@ void kvm_report_emulation_failure(struct kvm_vcpu *vcpu, const char *context)
 }
 EXPORT_SYMBOL_GPL(kvm_report_emulation_failure);

+static u64 mk_cr_64(u64 curr_cr, u32 new_val)
+{
+	return (curr_cr & ~((1ULL << 32) - 1)) | new_val;
+}
+
+static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu)
+{
+	unsigned long value;
+
+	switch (cr) {
+	case 0:
+		value = kvm_read_cr0(vcpu);
+		break;
+	case 2:
+		value = vcpu->arch.cr2;
+		break;
+	case 3:
+		value = vcpu->arch.cr3;
+		break;
+	case 4:
+		value = kvm_read_cr4(vcpu);
+		break;
+	case 8:
+		value = kvm_get_cr8(vcpu);
+		break;
+	default:
+		vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
+		return 0;
+	}
+
+	return value;
+}
+
+static void emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu)
+{
+	switch (cr) {
+	case 0:
+		kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val));
+		break;
+	case 2:
+		vcpu->arch.cr2 = val;
+		break;
+	case 3:
+		kvm_set_cr3(vcpu, val);
+		break;
+	case 4:
+		kvm_set_cr4(vcpu, mk_cr_64(kvm_read_cr4(vcpu), val));
+		break;
+	case 8:
+		kvm_set_cr8(vcpu, val & 0xfUL);
+		break;
+	default:
+		vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
+	}
+}
+
 static struct x86_emulate_ops emulate_ops = {
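
As an editorial aside, the mk_cr_64() helper in the patch above merges a 32-bit write into a 64-bit control register value: the upper 32 bits of the current value are kept and the lower 32 bits are replaced. The following standalone sketch shows the same computation outside the kernel. It assumes stdint types in place of the kernel's u64 and u32, and it is meant only as an illustration of the masking, not as the kernel source.

```c
#include <stdint.h>

/*
 * Keep the upper 32 bits of curr_cr and replace the lower 32 bits with
 * new_val. (1 << 32) - 1 is the low-half mask; its complement selects the
 * upper half.
 */
static uint64_t mk_cr_64(uint64_t curr_cr, uint32_t new_val)
{
    return (curr_cr & ~((UINT64_C(1) << 32) - 1)) | new_val;
}
```

This is why emulator_set_cr() passes the current CR0 or CR4 value together with the new one: a 32-bit mov to the register should not disturb the upper half.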
Re: [PATCH 15/24] KVM: x86 emulator: Provide more callbacks for x86 emulator.
On Tue, Mar 09, 2010 at 04:43:59PM +0200, Avi Kivity wrote: On 03/09/2010 04:09 PM, Gleb Natapov wrote: Provide get_cached_descriptor(), set_cached_descriptor(), get_segment_selector(), set_segment_selector(), get_gdt(), write_std() callbacks. Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/include/asm/kvm_emulate.h | 16 + arch/x86/kvm/x86.c | 130 +++ 2 files changed, 131 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 032d02f..e881618 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -63,6 +63,15 @@ struct x86_emulate_ops { unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error); /* + * write_std: Write bytes of standard (non-emulated/special) memory. + * Used for descriptor writing. + * @addr: [IN ] Linear address to which to write. + * @val: [OUT] Value write to memory, zero-extended to 'u_long'. + * @bytes: [IN ] Number of bytes to write to memory. + */ +int (*write_std)(unsigned long addr, void *val, + unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error); Descriptor writes need an atomic kvm_set_guest_bit(), no? It is? Atomic against what? Current code just writes the whole descriptor using write_std(). -- Gleb.
Re: [PATCH] Inter-VM shared memory PCI device
On Tue, Mar 9, 2010 at 6:03 AM, Avi Kivity a...@redhat.com wrote: On 03/09/2010 02:49 PM, Arnd Bergmann wrote: On Monday 08 March 2010, Cam Macdonell wrote: enum ivshmem_registers { IntrMask = 0, IntrStatus = 2, Doorbell = 4, IVPosition = 6, IVLiveList = 8 }; The first two registers are the interrupt mask and status registers. Interrupts are triggered when a message is received on the guest's eventfd from another VM. Writing to the 'Doorbell' register is how synchronization messages are sent to other VMs. The IVPosition register is read-only and reports the guest's ID number. The IVLiveList register is also read-only and reports a bit vector of currently live VM IDs. The Doorbell register is 16-bits, but is treated as two 8-bit values. The upper 8-bits are used for the destination VM ID. The lower 8-bits are the value which will be written to the destination VM and what the guest status register will be set to when the interrupt is trigger is the destination guest. A value of 255 in the upper 8-bits will trigger a broadcast where the message will be sent to all other guests. This means you have at least two intercepts for each message: 1. Sender writes to doorbell 2. Receiver gets interrupted With optionally two more intercepts in order to avoid interrupting the receiver every time: 3. Receiver masks interrupt in order to process data 4. Receiver unmasks interrupt when it's done and status is no longer pending I believe you can do much better than this, you combine status and mask bits, making this level triggered, and move to a bitmask of all guests: In order to send an interrupt to another guest, the sender first checks the bit for the receiver. If it's '1', no need for any intercept, the receiver will come back anyway. If it's zero, write a '1' bit, which gets OR'd into the bitmask by the host. 
The receiver gets interrupted at a rising edge and just leaves the bit on, until it's done processing, then turns the bit off by writing a '1' into its own location in the mask. We could make the masking in RAM, not in registers, like virtio, which would require no exits. It would then be part of the application-specific protocol and out of scope of this spec. This kind of implementation would be possible now since with UIO it's up to the application whether to mask interrupts or not and what interrupts mean. We could leave the interrupt mask register for those who want that behaviour. Arnd's idea would remove the need for the Doorbell and Mask, but we will always need at least one MMIO register to send whatever interrupts we do send. Cam
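The 16-bit Doorbell layout described in this thread (destination VM ID in the upper 8 bits, message value in the lower 8 bits, 255 meaning broadcast) can be sketched as a few helpers. The function and macro names below are hypothetical and only illustrate the encoding; they are not taken from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* 255 in the upper byte addresses all other guests (hypothetical macro name). */
#define DOORBELL_BROADCAST 0xFFu

/* Pack the destination VM ID (upper 8 bits) and the value (lower 8 bits). */
static uint16_t doorbell_pack(uint8_t dest_vm, uint8_t value)
{
	return (uint16_t)(((uint16_t)dest_vm << 8) | value);
}

/* Destination VM ID carried in a Doorbell write. */
static uint8_t doorbell_dest(uint16_t reg)
{
	return (uint8_t)(reg >> 8);
}

/* Value that ends up in the receiver's status register. */
static uint8_t doorbell_value(uint16_t reg)
{
	return (uint8_t)(reg & 0xFF);
}

/* True when the write targets every other guest. */
static bool doorbell_is_broadcast(uint16_t reg)
{
	return doorbell_dest(reg) == DOORBELL_BROADCAST;
}
```

A guest sending value 0x2A to VM 3 would write doorbell_pack(3, 0x2A), that is 0x032A, to the register.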
Re: KVM usability
On 03/09/2010 04:57 PM, Anthony Liguori wrote: On 03/09/2010 08:52 AM, Avi Kivity wrote: On 03/09/2010 04:50 PM, Anthony Liguori wrote: It's all in the openSUSE build service. The direct access URL (login required FWIW) is here: https://build.opensuse.org/package/view_file?file=kvm-qemu-default-memsize.patch&package=kvm&project=Virtualization It merely changes DEFAULT_RAM_SIZE in vl.c from 128 to 384 for the kvm package. We should attempt to do three things with default ram size: 1) bump it up to a more reasonable number 2) make it specified in the global default config 3) make sure we can provide compatibility support for older machine types It's really sad, the amount of code needed to change a number. We don't do enough via a config. If we did, we could just have a 0.12 config version that got frozen over time. So really, if we can make the mem readable by global config, and we can have machine specific configs, it would simplify the problem in the future so that we just had to bump a number. Perhaps a json representation of things. We already have the parser. -- error compiling committee.c: too many arguments to function
Re: [PATCH 15/24] KVM: x86 emulator: Provide more callbacks for x86 emulator.
On 03/09/2010 06:25 PM, Gleb Natapov wrote: On Tue, Mar 09, 2010 at 04:43:59PM +0200, Avi Kivity wrote: On 03/09/2010 04:09 PM, Gleb Natapov wrote: Provide get_cached_descriptor(), set_cached_descriptor(), get_segment_selector(), set_segment_selector(), get_gdt(), write_std() callbacks. Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/include/asm/kvm_emulate.h | 16 + arch/x86/kvm/x86.c | 130 +++ 2 files changed, 131 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 032d02f..e881618 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -63,6 +63,15 @@ struct x86_emulate_ops { unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error); /* +* write_std: Write bytes of standard (non-emulated/special) memory. +* Used for descriptor writing. +* @addr: [IN ] Linear address to which to write. +* @val: [OUT] Value write to memory, zero-extended to 'u_long'. +* @bytes: [IN ] Number of bytes to write to memory. +*/ + int (*write_std)(unsigned long addr, void *val, +unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error); Descriptor writes need an atomic kvm_set_guest_bit(), no? It is? Atomic against what? Current code just writes the whole descriptor using write_std(). These are accessed-bit changes, and are done atomically in the same way as a page table walk sets the accessed and dirty bits. Presumably the atomic operation is to allow the kernel to scan segments and swap them out if they are not used. -- error compiling committee.c: too many arguments to function
Re: KVM usability
On 03/09/2010 11:11 AM, Avi Kivity wrote: On 03/09/2010 04:57 PM, Anthony Liguori wrote: On 03/09/2010 08:52 AM, Avi Kivity wrote: On 03/09/2010 04:50 PM, Anthony Liguori wrote: It's all in the openSUSE build service. The direct access URL (login required FWIW) is here: https://build.opensuse.org/package/view_file?file=kvm-qemu-default-memsize.patch&package=kvm&project=Virtualization It merely changes DEFAULT_RAM_SIZE in vl.c from 128 to 384 for the kvm package. We should attempt to do three things with default ram size: 1) bump it up to a more reasonable number 2) make it specified in the global default config 3) make sure we can provide compatibility support for older machine types It's really sad, the amount of code needed to change a number. We don't do enough via a config. If we did, we could just have a 0.12 config version that got frozen over time. So really, if we can make the mem readable by global config, and we can have machine specific configs, it would simplify the problem in the future so that we just had to bump a number. Perhaps a json representation of things. We already have the parser. Please no :-) We have a config format, QemuOpts ties nicely into it as does qdev. We just need to represent machine information via QemuOpts and tie -m to manipulating the memory assigned to a machine. IOW, instead of: (machine_init)(ram_addr_t ram_size, const char *boot_device, const char *kernel_filename, const char *kernel_cmdline, const char *initrd_filename, const char *cpu_model) It should be: (machine_init)(QemuOpts *opts); Then we can have a [machine] section in the config where we describe all of these things. Regards, Anthony Liguori
Re: [PATCH] Inter-VM shared memory PCI device
On 03/09/2010 05:27 PM, Cam Macdonell wrote: Registers are used for synchronization between guests sharing the same memory object when interrupts are supported (this requires using the shared memory server). How does the driver detect whether interrupts are supported or not? At the moment, the VM ID is set to -1 if interrupts aren't supported, but that may not be the clearest way to do things. With UIO is there a way to detect if the interrupt pin is on? I suggest not designing the device to uio. Make it a good guest-independent device, and if uio doesn't fit it, change it. Why not support interrupts unconditionally? Is the device useful without interrupts? The Doorbell register is 16-bits, but is treated as two 8-bit values. The upper 8-bits are used for the destination VM ID. The lower 8-bits are the value which will be written to the destination VM and what the guest status register will be set to when the interrupt is trigger is the destination guest. What happens when two interrupts are sent back-to-back to the same guest? Will the first status value be lost? Right now, it would be. I believe that eventfd has a counting semaphore option, that could prevent loss of status (but limits what the status could be). It only counts the number of interrupts (and kvm will coalesce them anyway). My understanding of uio_pci interrupt handling is fairly new, but we could have the uio driver store the interrupt statuses to avoid losing them. There's nowhere to store them if we use ioeventfd/irqfd. I think it's both easier and more efficient to leave this to the application (to store into shared memory). -- error compiling committee.c: too many arguments to function
Re: KVM usability
On 03/09/2010 07:27 PM, Anthony Liguori wrote: Perhaps a json representation of things. We already have the parser. Please no :-) We have a config format, QemuOpts ties nicely into it as does qdev. We just need to represent machine information via QemuOpts and tie -m to manipulating the memory assigned to a machine. IOW, instead of: (machine_init)(ram_addr_t ram_size, const char *boot_device, const char *kernel_filename, const char *kernel_cmdline, const char *initrd_filename, const char *cpu_model) It should be: (machine_init)(QemuOpts *opts); Then we can have a [machine] section in the config where we describe all of these things. Looks good. One day we'll read VHDL descriptions of the device model from the machine config file and tcg them to host native code, and qemu will be pure infrastructure with zero details. -- error compiling committee.c: too many arguments to function
Re: [PATCH] Inter-VM shared memory PCI device
On 03/09/2010 11:28 AM, Avi Kivity wrote: On 03/09/2010 05:27 PM, Cam Macdonell wrote: Registers are used for synchronization between guests sharing the same memory object when interrupts are supported (this requires using the shared memory server). How does the driver detect whether interrupts are supported or not? At the moment, the VM ID is set to -1 if interrupts aren't supported, but that may not be the clearest way to do things. With UIO is there a way to detect if the interrupt pin is on? I suggest not designing the device to uio. Make it a good guest-independent device, and if uio doesn't fit it, change it. You can always fall back to reading the config space directly. It's not strictly required that you stick to the UIO interface. Why not support interrupts unconditionally? Is the device useful without interrupts? You can always just have interrupts enabled and not use them if that's desired. Regards, Anthony Liguori
Re: [PATCH 15/24] KVM: x86 emulator: Provide more callbacks for x86 emulator.
On Tue, Mar 09, 2010 at 07:22:51PM +0200, Avi Kivity wrote: On 03/09/2010 06:25 PM, Gleb Natapov wrote: On Tue, Mar 09, 2010 at 04:43:59PM +0200, Avi Kivity wrote: On 03/09/2010 04:09 PM, Gleb Natapov wrote: Provide get_cached_descriptor(), set_cached_descriptor(), get_segment_selector(), set_segment_selector(), get_gdt(), write_std() callbacks. Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/include/asm/kvm_emulate.h | 16 + arch/x86/kvm/x86.c | 130 +++ 2 files changed, 131 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 032d02f..e881618 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -63,6 +63,15 @@ struct x86_emulate_ops { unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error); /* + * write_std: Write bytes of standard (non-emulated/special) memory. + * Used for descriptor writing. + * @addr: [IN ] Linear address to which to write. + * @val: [OUT] Value write to memory, zero-extended to 'u_long'. + * @bytes: [IN ] Number of bytes to write to memory. + */ + int (*write_std)(unsigned long addr, void *val, + unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error); Descriptor writes need an atomic kvm_set_guest_bit(), no? It is? Atomic against what? Current code just writes the whole descriptor using write_std(). These are accessed-bit changes, and are done atomically in the same way as a page table walk sets the accessed and dirty bits. Presumably the atomic operation is to allow the kernel to scan segments and swap them out if they are not used. We can use the cmpxchg callback for that, no? -- Gleb.
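The compare-and-exchange approach suggested here can be illustrated in user space with C11 atomics. This is only a sketch of the idea, not the emulator's cmpxchg callback: the helper name is hypothetical, and it assumes the descriptor is handled as one 64-bit word, where the accessed bit is bit 0 of the type field (bit 40 of the descriptor).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Accessed bit of a segment descriptor: bit 0 of the access byte (byte 5). */
#define DESC_ACCESSED (1ULL << 40)

/*
 * Set the accessed bit with a compare-and-exchange loop, so a concurrent
 * update to another part of the descriptor is never overwritten.
 * Returns true if this call set the bit, false if it was already set.
 */
static bool desc_set_accessed(_Atomic uint64_t *desc)
{
	uint64_t old = atomic_load(desc);

	while (!(old & DESC_ACCESSED)) {
		if (atomic_compare_exchange_weak(desc, &old, old | DESC_ACCESSED))
			return true;
		/* A failed exchange reloaded 'old'; re-check and retry. */
	}
	return false;
}
```

Unlike writing the whole descriptor back, the loop only commits when the word still holds the value that was read, which is the property an atomic accessed-bit update needs.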
Re: [PATCH 19/24] KVM: x86 emulator: fix in/out emulation.
On Tue, Mar 09, 2010 at 04:47:24PM +0200, Avi Kivity wrote: On 03/09/2010 04:09 PM, Gleb Natapov wrote: in/out emulation is broken now. The breakage is different depending on where IO device resides. If it is in userspace emulator reports emulation failure since it incorrectly interprets kvm_emulate_pio() return value. If IO device is in the kernel emulation of 'in' will do nothing since kvm_emulate_pio() stores result directly into vcpu registers, so emulator will overwrite result of emulation during commit of shadowed register. index def4877..315e8a8 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1488,29 +1488,9 @@ static int shutdown_interception(struct vcpu_svm *svm) static int io_interception(struct vcpu_svm *svm) { -u32 io_info = svm->vmcb->control.exit_info_1; /* address size bug? */ -int size, in, string; -unsigned port; - ++svm->vcpu.stat.io_exits; -svm->next_rip = svm->vmcb->control.exit_info_2; - -string = (io_info & SVM_IOIO_STR_MASK) != 0; - -if (string) { -if (emulate_instruction(&svm->vcpu, -0, 0, 0) == EMULATE_DO_MMIO) -return 0; -return 1; -} - -in = (io_info & SVM_IOIO_TYPE_MASK) != 0; -port = io_info >> 16; -size = (io_info & SVM_IOIO_SIZE_MASK) >> SVM_IOIO_SIZE_SHIFT; - -skip_emulated_instruction(&svm->vcpu); -return kvm_emulate_pio(&svm->vcpu, in, size, port); +return !(emulate_instruction(&svm->vcpu, 0, 0, 0) == EMULATE_DO_MMIO); } We don't want to enter the emulator for non-string in/out. Leftover test code? No, unfortunately this is not leftover. I just don't see a way how we can bypass emulator and still have emulator be able to emulate in/out (for big real mode for instance). The problem is basically described in the commit message. If we have function outside of emulator that does in/out emulation on vcpu directly, then emulator can't use it since committing shadowed registers will overwrite the result of emulation. 
Having two different emulations (one outside of emulator and another in emulator) is also problematic since when userspace returns after IO exit we don't know which emulation to continue. If we want to avoid instruction decoding we can fill in emulation context from exit info as if instruction was already decoded and call emulator. -- Gleb.
Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.
On Tue, Mar 09, 2010 at 04:50:29PM +0200, Avi Kivity wrote: On 03/09/2010 04:09 PM, Gleb Natapov wrote: Currently when string instruction is only partially complete we go back to a guest mode, guest tries to reexecute instruction and exits again and at this point emulation continues. Avoid all of this by restarting instruction without going back to a guest mode. What happens if rcx is really big? Going back into the guest gave us a preemption point. Two solutions. We can check if reschedule is required and yield cpu if needed. Or we can enter guest from time to time. -- Gleb.
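One way to picture the preemption-point concern is a restart loop that handles a bounded number of iterations per call and then returns, so the caller can either reschedule or re-enter the guest before continuing. The sketch below is purely illustrative: struct rep_state, emulate_rep_chunk() and the budget parameter are hypothetical names, not part of the emulator.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state of a partially completed string instruction. */
struct rep_state {
	uint64_t rcx;	/* iterations still to run */
	uint64_t done;	/* iterations emulated so far */
};

/*
 * Emulate at most 'budget' iterations, then return to the caller.
 * Returns true once the instruction has completed (rcx reached 0).
 * A false return is the point where the caller could yield the CPU
 * or go back into the guest before calling again.
 */
static bool emulate_rep_chunk(struct rep_state *s, uint64_t budget)
{
	while (s->rcx != 0 && budget != 0) {
		/* One iteration of the string instruction would be emulated here. */
		s->rcx--;
		s->done++;
		budget--;
	}
	return s->rcx == 0;
}
```

With a fixed budget, even a very large rcx cannot keep the host in the emulation loop for an unbounded time between scheduling opportunities.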
Re: [PATCH] Inter-VM shared memory PCI device
On Tue, Mar 9, 2010 at 10:28 AM, Avi Kivity a...@redhat.com wrote: On 03/09/2010 05:27 PM, Cam Macdonell wrote: Registers are used for synchronization between guests sharing the same memory object when interrupts are supported (this requires using the shared memory server). How does the driver detect whether interrupts are supported or not? At the moment, the VM ID is set to -1 if interrupts aren't supported, but that may not be the clearest way to do things. With UIO is there a way to detect if the interrupt pin is on? I suggest not designing the device to uio. Make it a good guest-independent device, and if uio doesn't fit it, change it. Why not support interrupts unconditionally? Is the device useful without interrupts? Currently my patch works with or without the shared memory server. If you give the parameter -ivshmem 256,foo then this will create (if necessary) and map /dev/shm/foo as the shared region without interrupt support. Some users of shared memory are using it this way. Going forward we can require the shared memory server and always have interrupts enabled. The Doorbell register is 16-bits, but is treated as two 8-bit values. The upper 8-bits are used for the destination VM ID. The lower 8-bits are the value which will be written to the destination VM and what the guest status register will be set to when the interrupt is trigger is the destination guest. What happens when two interrupts are sent back-to-back to the same guest? Will the first status value be lost? Right now, it would be. I believe that eventfd has a counting semaphore option, that could prevent loss of status (but limits what the status could be). It only counts the number of interrupts (and kvm will coalesce them anyway). Right. My understanding of uio_pci interrupt handling is fairly new, but we could have the uio driver store the interrupt statuses to avoid losing them. There's nowhere to store them if we use ioeventfd/irqfd. 
I think it's both easier and more efficient to leave this to the application (to store into shared memory). Agreed. Cam
Re: KVM Guest mmap.c bug
On Mon, Mar 08, 2010 at 03:49:01PM +0100, Andrea Arcangeli wrote: On Mon, Mar 08, 2010 at 03:32:19PM +0200, Avi Kivity wrote: It looks unrelated to kvm, though of course random memory corruption cannot be ruled out. Is npt enabled on the host (cat /sys/module/kvm_amd/parameters/npt)? Andrea, any idea? Basically find_vma(vma->vm_mm, vma->vm_start) doesn't return vma despite vma is the one with the smaller vm_end where the comparison vma->vm_start < vma->vm_end is true (the next vma is null and the prev will have vma->vm_start == prev->vm_end, not <). The bug check looks right, it doesn't seem false positive and this bugcheck indicates that the vma rbtree is memory-corrupted somehow. so yes fiddling with npt on and off sounds a good start, if it's a bug I can confirm it happens with npt on and off. And it also happens on a Nehalem XEON (it just happened). in shadow paging it's unlikely the exact same bug materializes with both npt and without. If the crash happens with npt on and off, then maybe it's not hypervisor related. Could also be bad RAM if it only I doubt it is bad ram! This machine is working (without KVM) for almost 2 years and MCE does not report any problems on the host machine. And it happens on two identical machines (Opteron) and now on the new (5 days old) Intel Nehalem XEON. All guests are running the same kernel. It happens with a kernel compiled by me and from debian SID both 2.6.32.9, and from previous kernel I tried (2.6.31.12 and 2.6.27.45) happens on a single host and all other hosts are fine with same binary guest/host kernels (rbtree walk might stress the memory bus more than other operations). Said that vm_next being null (and if it's null, likely vm_next pointer has no ram bitflip) is a bit weird and not common scenario and this page fault seems triggered with procfs copy_user call which is non standard, so maybe this is a guest bug. It would be interesting to know what is the vm_start address, at the end there are stack, vdso and vsyscall areas. 
I'll make it print vm_start for next reboot. -- Bruno Ribas - ri...@c3sl.ufpr.br http://www.inf.ufpr.br/ribas C3SL: http://www.c3sl.ufpr.br
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
Paul Brook wrote: However, coherence could be made host-type-independent by the host mapping and unmapping pages, so that each page is only mapped into one guest (or guest CPU) at a time. Just like some clustering filesystems do to maintain coherence. You're assuming that a TLB flush implies a write barrier, and a TLB miss implies a read barrier. I'd be surprised if this were true in general. The host driver itself can issue full barriers at the same time as it maps pages on TLB miss, and would probably have to interrupt the guest's SMP KVM threads to insert a full barrier when broadcasting a TLB flush on unmap. -- Jamie
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
Avi Kivity wrote: On 03/08/2010 03:03 PM, Paul Brook wrote: On 03/08/2010 12:53 AM, Paul Brook wrote: Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository. No. All new devices should be fully qdev based. I suspect you've also ignored a load of coherency issues, especially when not using KVM. As soon as you have shared memory in more than one host thread/process you have to worry about memory barriers. Shouldn't it be sufficient to require the guest to issue barriers (and to ensure tcg honours the barriers, if someone wants this with tcg)? In a cross environment that becomes extremely hairy. For example the x86 architecture effectively has an implicit write barrier before every store, and an implicit read barrier before every load. Ah yes. For cross tcg environments you can map the memory using mmio callbacks instead of directly, and issue the appropriate barriers there. That makes sense. It will force an mmio callback for every access to the shared memory, which is ok for correctness but vastly slower when running in TCG compared with KVM. But it's hard to see what else could be done - those implicit write barriers on x86 have to be emulated somehow. For TCG without inter-vm shared memory, those barriers aren't a problem. Non-random-corruption guest behaviour is paramount, so I hope the inter-vm device will add those mmio callbacks for the cross-arch case before it sees much action. (Strictly, it isn't cross-arch, but host-has-more-relaxed-implicit-memory-model-than-guest. I'm assuming TCG doesn't reorder memory instructions). -- Jamie
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
Paul Brook wrote: On 03/08/2010 12:53 AM, Paul Brook wrote: Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository. No. All new devices should be fully qdev based. I suspect you've also ignored a load of coherency issues, especially when not using KVM. As soon as you have shared memory in more than one host thread/process you have to worry about memory barriers. Shouldn't it be sufficient to require the guest to issue barriers (and to ensure tcg honours the barriers, if someone wants this with tcg)? In a cross environment that becomes extremely hairy. For example the x86 architecture effectively has an implicit write barrier before every store, and an implicit read barrier before every load. Btw, x86 doesn't have any implicit barriers due to ordinary loads. Only stores and atomics have implicit barriers, afaik. -- Jamie
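The alternative to relying on a host's implicit ordering is for code that uses the shared region to state its ordering explicitly. A minimal sketch with C11 release/acquire atomics follows, assuming a hypothetical one-slot mailbox placed in the shared memory; the struct and function names are illustrative, and whether such barriers are honoured under TCG is exactly the open question in this thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-slot mailbox living in the shared region. */
struct shm_mailbox {
	uint32_t payload;	/* plain data */
	atomic_uint ready;	/* 0 = empty, 1 = payload valid */
};

/* Producer: write the payload, then publish it with a release store. */
static void mailbox_publish(struct shm_mailbox *mb, uint32_t value)
{
	mb->payload = value;
	/* Release: the payload store is visible before 'ready' reads as 1. */
	atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/* Consumer: returns true and fills *out if a payload was available. */
static bool mailbox_consume(struct shm_mailbox *mb, uint32_t *out)
{
	/* Acquire: the payload load cannot move ahead of this load. */
	if (atomic_load_explicit(&mb->ready, memory_order_acquire) == 0)
		return false;
	*out = mb->payload;
	/* Hand the slot back to the producer. */
	atomic_store_explicit(&mb->ready, 0, memory_order_release);
	return true;
}
```

On x86 the release store and acquire load compile to ordinary moves, while on hosts with a weaker memory model the compiler emits the barriers that are needed, so the same source does not depend on the implicit ordering discussed above.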
report stolen time via pvclock?
Hi, I'm referring to this patchset http://www.mail-archive.com/kvm@vger.kernel.org/msg23810.html of Marcelo Tosatti. It seems it was never included or even discussed, although it's nearly half a year old. I wonder if there is a good reason for that? I'd like to use the steal time for my VMs, as I consider it useful in some cases. Thanks, kr,t
Re: report stolen time via pvclock?
On 03/09/2010 04:30 PM, Marcelo Tosatti wrote: On Tue, Mar 09, 2010 at 09:47:38PM +0100, Thomas Treutner wrote: Hi, I'm referring to this patchset http://www.mail-archive.com/kvm@vger.kernel.org/msg23810.html of Marcelo Tosatti. It seems it was never included or even discussed, although it's nearly half a year old. I wonder if there is a good reason for that? I'd like to use the steal time for my VMs, as I consider it useful in some cases. There is a problem with it: stolen time is accounted separately (in addition to) user/system/idle. And as you noted there seems to be lack of interest in the feature. More like a lack of time to implement it properly for KVM. I know it is a useful feature for system administrators, and should probably try to get it reimplemented so it works right and with lower overhead than full schedstats (I have some ideas on how to achieve that). Thanks for reminding me of this project :)
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
On 03/08/2010 03:54 AM, Jamie Lokier wrote: Alexander Graf wrote: Or we could put in some code that tells the guest the host shm architecture and only accept x86 on x86 for now. If anyone cares for other combinations, they're free to implement them. Seriously, we're looking at an interface designed for kvm here. Let's please keep it as simple and fast as possible for the actual use case, not some theoretically possible ones. The concern is that a perfectly working guest image running on kvm, the guest being some OS or app that uses this facility (_not_ a kvm-only guest driver), is later run on qemu on a different host, and then mostly works except for some silent data corruption. That is not a theoretical scenario. Hint: no matter what you do, shared memory is a hack that's going to lead to subtle failures one way or another. It's useful to support because it has some interesting academic uses but it's not a mechanism that can ever be used for real world purposes. It's impossible to support save/restore correctly. It can never be made to work with TCG in a safe way. That's why I've been advocating keeping this as simple as humanly possible. It's just not worth trying to make this fancier than it needs to be because it will never be fully correct. Regards, Anthony Liguori Well, the bit with this driver is theoretical, obviously :-) But not the bit about moving to a different host. -- Jamie
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
On 03/08/2010 07:16 AM, Avi Kivity wrote: On 03/08/2010 03:03 PM, Paul Brook wrote: On 03/08/2010 12:53 AM, Paul Brook wrote: Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository. No. All new devices should be fully qdev based. I suspect you've also ignored a load of coherency issues, especially when not using KVM. As soon as you have shared memory in more than one host thread/process you have to worry about memory barriers. Shouldn't it be sufficient to require the guest to issue barriers (and to ensure tcg honours the barriers, if someone wants this with tcg)? In a cross environment that becomes extremely hairy. For example the x86 architecture effectively has an implicit write barrier before every store, and an implicit read barrier before every load. Ah yes. For cross tcg environments you can map the memory using mmio callbacks instead of directly, and issue the appropriate barriers there. Not good enough unless you want to severely restrict the use of shared memory within the guest. For instance, it's going to be useful to assume that your atomic instructions remain atomic. Crossing architecture boundaries here makes these assumptions invalid. A barrier is not enough. Shared memory only makes sense when using KVM. In fact, we should actively disable the shared memory device when not using KVM. Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
In a cross environment that becomes extremely hairy. For example the x86 architecture effectively has an implicit write barrier before every store, and an implicit read barrier before every load. Btw, x86 doesn't have any implicit barriers due to ordinary loads. Only stores and atomics have implicit barriers, afaik. As of March 2009[1] Intel guarantees that memory reads occur in order (they may only be reordered relative to writes). It appears AMD do not provide this guarantee, which could be an interesting problem for heterogeneous migration. Paul [1] The most recent docs I have handy. Up to and including Core-2 Duo.